ML Arxiv Haul #2

Nick Doiron
Feb 5, 2022

About two months ago, I burned through a backlog of ML articles that I had queued up to read ‘soon’. That list has gotten long again, so I’m going to try to write out short summaries as I skim through them.

Combines several existing language benchmarks for long texts (the longest part being NarrativeQA, which has questions about whole books). Several of the datasets come from Project Gutenberg (public domain texts), so unless the books are obscure, I’d worry that language models trained on the public internet might know too much already. For example, if I asked you about Les Misérables, “what was Valjean sentenced to jail for stealing?”, you may have seen a summary and wouldn’t need to extract the answer from the original text.

This is a late 2019 dive into OpenAI’s decision to release GPT-2 in slowly increasing model sizes. Unfortunately it predates the completely privatized and commercialized GPT-3 and DALL-E, and EleutherAI’s reconstructed versions of those models, which add complexity to this discussion.

When we talked about this in a 2020 class on international law and AI, the class learned how GPT-2 was withheld, but not that it had since been fully released and that we weren’t all trapped in a GPT-matrix-web. I would like to place OpenAI’s truth somewhere in the space between ‘cautious AI safety research’, ‘don’t need to market itself with open source anymore’, ‘money’, and ‘only oligarchs and mega-corps could deploy GPT-3 anyway’, backed by some sociological research, but this probably isn’t the most pressing AI & society issue.

Facebook/Meta’s paper about computer vision architectures. There’s a general sense that convolutional networks (such as ResNet) are being extinct-ed by the rise of attention-based vision transformers. The thing is, all of these architectures could be better with some love and care. The researchers improve their ConvNeXt model and reach a new high on ImageNet. I was a bit concerned that maybe this was a purpose-built ImageNet model not applicable to other CV tasks, but they include a mention and an appendix on robustness of the model.

I’ve previously developed BERT and GPT-style language models, fine-tuned them, and submitted one of the first ‘adapters’ to AdapterHub. The idea there was that the last stage of the neural net could be quickly specialized or swapped out for the specific task, like a drill bit. This Google paper takes a new approach by changing the middle layers instead. The authors find the results interesting, but they don’t claim SOTA results at this time.

I want to include my June 2020 Tweet to look smart about this intermediate layer topic.

OOD

This was posted just recently by researchers at the University of Wisconsin. They train a model to be really good at object-recognition boxes and create a new benchmark around out-of-domain objects which the model is unfamiliar with (best explained by the error below).

Predicts accuracy on a test set with a new metric (Average Thresholded Confidence), just by seeing how the model reacts to batches of the new input and without knowing the labels. Their examples of OOD data are WILDS and some interesting ImageNet spinoffs where the test set would be illustrations or other new formats.
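To make the thresholding idea concrete, here is a minimal sketch of how I understand it, not the authors' code. It assumes each prediction comes with a single confidence score (say, the max softmax probability) and that we have a labeled source set. The function names and toy numbers are made up for illustration.

```python
def fit_atc_threshold(source_confidences, source_correct):
    """Pick a threshold so that the share of labeled source examples with
    confidence above it matches the model's accuracy on that source set."""
    n = len(source_confidences)
    accuracy = sum(source_correct) / n
    ranked = sorted(source_confidences, reverse=True)
    n_above = round(accuracy * n)  # how many examples should sit above the threshold
    if n_above == 0:
        return ranked[0]           # nothing is strictly above the top score
    if n_above == n:
        return float("-inf")       # everything is above the threshold
    # Midpoint between the last score kept above and the first score below.
    # Tied confidence scores would make this only approximate.
    return (ranked[n_above - 1] + ranked[n_above]) / 2


def predict_accuracy(target_confidences, threshold):
    """Estimate accuracy on unlabeled data as the share of examples whose
    confidence clears the threshold learned on the source set."""
    above = sum(1 for c in target_confidences if c > threshold)
    return above / len(target_confidences)


# Toy numbers, invented for this example.
source_conf = [0.9, 0.8, 0.6, 0.4]
source_correct = [1, 1, 1, 0]          # 75% source accuracy
threshold = fit_atc_threshold(source_conf, source_correct)

target_conf = [0.95, 0.7, 0.3, 0.2]    # unlabeled, shifted data
estimate = predict_accuracy(target_conf, threshold)
```

The only labels used are on the source side; the target estimate comes entirely from how confident the model is on the new batch.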

Model prompts

Uses feedback from the users about incorrect answers / misunderstood questions to re-prompt GPT-3 and provide better answers in the future. Researchers are from Carnegie Mellon and AllenAI. I think that OpenAI is more likely to take the InstructGPT route (using reinforcement learning from human prompters) but still interesting.

Other NLP

Basic but interesting analysis — if you scramble word order, models can figure out the right word order 87% of the time across language families. So the grammar is only particularly helpful in the remaining cases and unexpected cases (“man bites dog”).

Discusses an ongoing problem in NLG where the most likely text can be boring and repetitive. The researchers analyze human-generated text and discuss an expected information content at each new token. They cover two common methods to select tokens from GPT-3 end probabilities (top-k, nucleus) and come up with their own (‘typical sampling’) to fit this new system. The new method gets high scores on perplexity and a select number of tasks such as summarization. This was relevant to my GPT-NYC interests and I’m happy to see they have a PR open with HuggingFace Transformers.
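For a rough feel of how the three selection rules differ, here is a small pure-Python sketch over a toy next-token distribution. It is my own reading of the methods, not the paper's code or the HuggingFace implementation, and the function names and probabilities are made up. Top-k keeps the k most likely tokens, nucleus keeps the smallest high-probability set that reaches a target mass, and typical sampling (as I understand it) ranks tokens by how close their surprisal is to the distribution's entropy before filling the target mass.

```python
import math


def _smallest_prefix(probs, order, mass):
    """Walk tokens in the given order and keep them until `mass` is covered."""
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= mass:
            break
    return kept


def _renormalize(probs, keep):
    """Zero out every token not in `keep` and rescale the rest to sum to 1."""
    keep_set = set(keep)
    total = sum(probs[i] for i in keep_set)
    return [probs[i] / total if i in keep_set else 0.0 for i in range(len(probs))]


def top_k_filter(probs, k):
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalize(probs, order[:k])


def nucleus_filter(probs, top_p):
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalize(probs, _smallest_prefix(probs, order, top_p))


def typical_filter(probs, mass):
    # Entropy = expected surprisal (-log p) under the distribution.
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    # Rank tokens by how far their surprisal is from that expectation.
    candidates = [i for i in range(len(probs)) if probs[i] > 0]
    order = sorted(candidates, key=lambda i: abs(-math.log(probs[i]) - entropy))
    return _renormalize(probs, _smallest_prefix(probs, order, mass))


# Toy next-token distribution, invented for this example.
probs = [0.5, 0.2, 0.15, 0.1, 0.05]
top_k_result = top_k_filter(probs, 2)
nucleus_result = nucleus_filter(probs, 0.8)
typical_result = typical_filter(probs, 0.3)
```

With these toy numbers, top-k and nucleus both keep the most likely token, while the typical filter drops it because its surprisal is further from the entropy than that of the second and third tokens. To actually draw a token, something like `random.choices(range(len(p)), weights=p)` on any of the filtered lists would do.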
