A menagerie of text ‘attack’ libraries

For image classifiers, a huge ecosystem of augmentation tools can crop, rotate, and otherwise edit input images to add variety to the training data and make classifiers more robust.
It’s reasonable to think that a similar technique could improve text data, but editing a sentence is more than moving pixels in x/y/color space. Changing one word or one letter can render a sentence incoherent or flip its meaning.

Note: there is some evidence that nonsensical data can help with some tasks. In ‘Does Pretraining for Summarization Require Knowledge Transfer?’, authors Krishna, Bigham, and Lipton show this for text summarization, and Uber has experimented with ‘Generative Teaching Networks’ with unreadable digits. But there’s debate about whether this trains a robust model.

Text mutation libraries can serve very different purposes: some run adversarial ‘attacks’ against a trained model, and others add data before training. I’m going to talk about all of them here under the ‘mutation’ label.


This is one of the best-known libraries, with 1,700+ stars. It’s hosted by the QData team at the University of Virginia. They have a ‘model zoo’ that lists pretrained models and the metrics you can expect after running the permutations / attacks.


This library was recently updated by the Natural Language Processing Lab at Tsinghua University. There is integration with HuggingFace models and datasets, which implies it could support almost any English or Chinese transformer model, and also older NLTK / other classifiers.

EDA: Easy Data Augmentation

This is our first option that doesn’t read into the model, and instead edits text to diversify the training data. The paper’s authors come from Carnegie Mellon, Google, MILA, and Dartmouth. A Chinese implementation has more stars than this original repo.

The techniques are simple: synonym replacement, random insertion, random swap, and random deletion.
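To make the four operations concrete, here is a minimal sketch in plain Python. It is not the EDA repo's code: the function names, the `synonyms` dictionary (a stand-in for a real thesaurus such as WordNet), and the parameter choices are all assumptions made for illustration.

```python
import random


def synonym_replacement(words, synonyms, n, rng):
    """Replace up to n words that have an entry in the (hypothetical) synonyms dict."""
    words = list(words)
    candidates = [i for i, w in enumerate(words) if w in synonyms]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        words[i] = rng.choice(synonyms[words[i]])
    return words


def random_insertion(words, synonyms, n, rng):
    """Insert a synonym of a random known word at a random position, n times."""
    words = list(words)
    for _ in range(n):
        candidates = [w for w in words if w in synonyms]
        if not candidates:
            break
        new_word = rng.choice(synonyms[rng.choice(candidates)])
        words.insert(rng.randint(0, len(words)), new_word)
    return words


def random_swap(words, n_swaps, rng):
    """Swap two randomly chosen positions, n_swaps times."""
    words = list(words)
    if len(words) < 2:
        return words
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words, p, rng):
    """Drop each word with probability p, but never return an empty sentence."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    toy_synonyms = {"good": ["fine", "great"], "movie": ["film"]}  # made-up thesaurus
    sentence = "the movie was good and fun".split()
    print(" ".join(synonym_replacement(sentence, toy_synonyms, 1, rng)))
    print(" ".join(random_insertion(sentence, toy_synonyms, 1, rng)))
    print(" ".join(random_swap(sentence, 1, rng)))
    print(" ".join(random_deletion(sentence, 0.2, rng)))
```

Each function returns a new word list, so the original sentence can be reused to generate several augmented variants with the same label.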


Hosted by the University of Pretoria, this augmentation library can replace words with synonyms from WordNet or Word2Vec, or run a sentence through a translator and back to vary it. I recently saw that Dr. Vukosi Marivate is looking for collaborators to update gensim and add features.


Code for a recent paper from researchers at Auburn University and UC Berkeley. They look for nearest neighbors in the embedding space to generate similar text that confounds the model.
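The general idea of a nearest-neighbor substitution can be sketched without any model at all. This is not the paper's method, only a toy illustration: the three-dimensional vectors below are made up, and the helper names are hypothetical. A real attack would take embeddings from a trained model and keep only the candidates that change the classifier's prediction.

```python
import math

# Made-up 3-d vectors for illustration only; real embeddings come from a trained model.
TOY_EMBEDDINGS = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.85, 0.2, 0.05],
    "fine": [0.7, 0.3, 0.1],
    "bad": [-0.8, 0.1, 0.0],
    "movie": [0.0, 0.9, 0.4],
    "film": [0.05, 0.85, 0.45],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(word, embeddings, k=2):
    """Return the k words whose vectors are most similar to `word`."""
    if word not in embeddings:
        return []
    target = embeddings[word]
    scored = [
        (cosine(target, vec), other)
        for other, vec in embeddings.items()
        if other != word
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


def candidate_sentences(sentence, embeddings, k=2):
    """Build variants of a sentence by swapping one word for a near neighbor."""
    words = sentence.split()
    variants = []
    for i, word in enumerate(words):
        for neighbor in nearest_neighbors(word, embeddings, k):
            variants.append(" ".join(words[:i] + [neighbor] + words[i + 1:]))
    return variants


if __name__ == "__main__":
    for variant in candidate_sentences("good movie", TOY_EMBEDDINGS, k=1):
        print(variant)
```

In an attack setting, each variant would then be passed to the classifier, and only those that flip the predicted label would count as adversarial examples.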


Code for a recent paper from Facebook Research. They also use gradients to find words to defeat the classifier.

I found the generated text either not very realistic, or different enough from the original that a changed output class might be warranted.


This is a lesser-known library (12 stars) by Dillon Niederhut, but its add_love permutation made the rounds on Twitter, and was funny enough that I made a note of the library. The idea is that sentiment analysis classifiers can easily confuse a negative sentence with a positive one, simply by ending it with ‘love’.
There’s also add_leet which swaps out characters.
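To show what these two mutations amount to, here is a rough sketch. It does not use or reproduce the library's code: `append_love_sketch`, `leet_sketch`, and the character mapping are hypothetical stand-ins based only on the description above.

```python
# Hypothetical character map; the library's actual substitutions may differ.
LEET_MAP = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"}


def append_love_sketch(text):
    """End the sentence with the word 'love', as the description above suggests."""
    return text.rstrip() + " love"


def leet_sketch(text, mapping=LEET_MAP):
    """Swap each mapped character for its look-alike digit; leave the rest alone."""
    return "".join(mapping.get(ch.lower(), ch) for ch in text)


if __name__ == "__main__":
    review = "I hate this movie"
    print(append_love_sketch(review))
    print(leet_sketch(review))
```

Feeding both the original and the mutated sentence to a sentiment classifier and comparing the labels is one quick way to check whether a model is sensitive to this kind of change.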


This article was written in October 2021. If my recommendations change in the future, I’ll update it on this GitHub README.


