ML Arxiv Haul #8
These have become useful to me as a mental bookmarking exercise and as a summary of interesting papers.
This paper is a year old, but I think it resurfaced in a weird Twitter or podcast deep dive. Previous work had considered whether Dyson spheres could be built around a supermassive black hole and use cosmic microwave background (CMB) radiation. The authors consider other sources of energy which would make a black hole appealing, and how these might be detectable by our telescopes.
After June’s Discovering the Hidden Vocabulary of DALLE-2, it’s interesting to see someone else develop their own take. Millière describes two types of nonsense prompts: “macaronic” which mixes tokens from multiple languages (uccoisegeljaros to mean birds) and “evocative” which mimics scientific names (ceralineus rabaventis to mean insects).
In June I was in a Probabilistic AI class, so I have been trying to pick up some more knowledge about Bayes + ML. The paper discusses how Bayesian systems have been proposed to mathematically remove a record from training (pitched as a way to comply with GDPR). These removal processes can have major negative effects; the authors point to hyperparameters as making or breaking a successful removal.
Work from OpenAI about setting up generative models to not just append text to the end, but form text in the middle of a context. This would be super-cool because existing models and methods could be tuned to do this new task.
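A minimal sketch of how a training document might be rearranged for this kind of infilling, assuming the middle span is simply moved to the end so an ordinary left-to-right model can learn to produce it. The sentinel strings and the `to_infill_example` helper are hypothetical names for illustration, not taken from the paper.

```python
import random

# Hypothetical sentinel strings; a real model would use dedicated special tokens.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"


def to_infill_example(text, rng):
    """Cut text into prefix/middle/suffix and move the middle to the end."""
    # Pick two cut points (they may coincide, giving an empty middle span).
    i, j = sorted(rng.randint(0, len(text)) for _ in range(2))
    prefix, middle, suffix = text[:i], text[i:j], text[j:]
    # A left-to-right model trained on this string sees both sides of the
    # gap before it has to produce the middle.
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"


rng = random.Random(0)
doc = "def add(a, b):\n    return a + b\n"
example = to_infill_example(doc, rng)
print(example)
```

At inference time the same layout would be used as a prompt that stops right after the last sentinel, so whatever the model appends is the text for the gap.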
A Microsoft project (CodeT) specifically studies this in code generation. I’m particularly interested in it because you could write a comment to be infilled, then the code which you’d like to generate, and see what type of comment would have prompted the model.
ferret is a new Python library to take in HuggingFace models and run tons of different explanation methods.
Hadn’t thought about this idea before. The experiment works with the WILDS domain-shift dataset, and at test time it sets up a system that noises some portion of the data through nlpaug.
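As a rough sketch of that test-time setup, the snippet below noises a fixed fraction of examples and leaves the rest untouched. It uses a hand-rolled adjacent-character swap as a stand-in for nlpaug's augmenters, and `swap_noise`, `noise_portion`, the sample sentences, and the 0.5 fraction are all hypothetical choices rather than the paper's.

```python
import random


def swap_noise(text, rng):
    """Swap one random pair of adjacent characters (stand-in for an nlpaug augmenter)."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def noise_portion(texts, fraction, seed=0):
    """Return a copy of texts with round(fraction * n) examples noised, plus their indices."""
    rng = random.Random(seed)
    k = round(fraction * len(texts))
    chosen = set(rng.sample(range(len(texts)), k))
    noised = [swap_noise(t, rng) if idx in chosen else t for idx, t in enumerate(texts)]
    return noised, sorted(chosen)


# Made-up test examples, only for illustration.
test_set = ["great product", "arrived late", "would buy again", "not as described"]
noised_set, noised_idx = noise_portion(test_set, fraction=0.5)
print(noised_idx, noised_set)
```

Returning the chosen indices alongside the noised copy makes it possible to compare model accuracy on the clean and noised subsets separately.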
Facebook/Meta’s plan to match all English Wikipedia articles to their cited sources. This is getting tech press for an ‘accurate’ Wikipedia, as though it will be a super-intelligent no-nonsense AI checking our work, but if a bad article has a bad source, the retrieval model should agree. I also wonder about all of the articles which have non-web sources. This is covered in the article’s discussion section:
“we only considered references corresponding to web pages, but Wikipedia also cites books, scientific articles and other kind of documents. These include other modalities than just text, such as images and videos. To fully assess the quality of Wikipedia references, Side needs to become multi-modal”
After my AI Village talk, I’m continuing to read up on code-generation models. This paper generates new programming problems, which could expand on the existing handful of open-source code-generation problems and prompts.
Facebook/Meta dialogue model related to their ‘BlenderBot’ project. Incorporates retrieval from the web and human feedback.
Builds an ensemble of smaller models and weights their accuracy on different tasks.
A taxonomy for improving ML datasets and models. I think maybe this could help people outside of ML understand where data augmentation etc. fall in this world.
This is a peculiar sub-field which I had not heard about before, where they’ve developed a model which can handle imprecise or low-voltage hardware.
Facebook/Meta paper about a new approach to polynomial models, which are somewhat interpretable, and perform better than other interpretable methods even if they don’t match neural networks.
Improving language models by training them on a WikiData knowledge graph. The very largest models benefit much more from this process.
Introduces NLP concepts and popular biology/medical BERT models to people in the pharma world.
This has been linked from multiple places in the past month. Tabular data (and time-series data, not part of this paper) are still difficult for neural networks. The pro-NN view is that the datasets and models just need to be bigger. This paper covers three things which are interesting about tabular data, and how they perform on different NN architectures.