Deciphering Explainable AI, with ELI5

Jul 31, 2019

After running into issues creating machine learning models for new datasets, I decided to try this next Explainable AI library, ELI5, with my proven and popular AOC Reply Dataset from May.

The goal for the project will be to train a classifier to identify ‘troll’ comments, and use the explainable AI tool to highlight which words triggered a troll or not-troll decision. I’ll use only SciKit-Learn and ELI5 and, in a future post, expand on it with other resources such as FastText.

Designing our process

ELI5, named after the internet acronym ‘explain like I’m 5 [years old]’, is a Python library which connects to several machine learning libraries including SciKit-Learn and XGBoost. From this sentence in the docs, it looks like our use case is a common and well-supported one:

ELI5 understands text processing utilities from scikit-learn and can highlight text data accordingly

Remembering how text classifiers work, getting there will be a little complex.

We want a SciKit-only pipeline from text to tokens to vectors.
We need to pick a classifier which can tell these troll and not-troll vectors apart. It would be good to review several different types, and use metrics.classification_report to find out which has better accuracy.
Then we will connect ELI5 to the end of the pipeline, to map the most important features back through our vectorizer, and figure out which words in the text to highlight.

Picking a tokenizer and vectorizer

There are a ton of libraries which could help here, but SciKit-Learn comes with only a few built in. I am using TfidfVectorizer, which the docs tell me is “equivalent to CountVectorizer followed by TfidfTransformer.” What does this all mean?

The key to CountVectorizer is that we’re not using word2vec or a similar pretrained word-association model, but counting how often each word appears in each message. This potentially takes up more memory, because every distinct word in the dataset can take up a column, instead of the 100–300 dimensions typical of word2vec.
The TfidfTransformer portion is explained in the docs:

The goal of using tf-idf … is to scale down the impact of tokens that occur very frequently in a given corpus and that are hence empirically less informative
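To make the "CountVectorizer followed by TfidfTransformer" claim concrete, here is a minimal sketch on three made-up replies (the sentences are invented for illustration, not taken from the AOC dataset). It checks that both routes produce the same matrix, and shows that a word appearing in every document gets the lowest idf weight.

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)

# Toy replies, invented for illustration (not from the AOC dataset).
docs = [
    "the new deal is great",
    "the deal is a joke",
    "the election was rigged",
]

# One step: text -> tf-idf weighted sparse matrix.
tfidf_vec = TfidfVectorizer()
X_direct = tfidf_vec.fit_transform(docs)

# Two steps: raw word counts first, then tf-idf reweighting.
count_vec = CountVectorizer()
counts = count_vec.fit_transform(docs)
X_two_step = TfidfTransformer().fit_transform(counts)

# Both routes should give the same matrix:
# one row per document, one column per distinct word.
same = np.allclose(X_direct.toarray(), X_two_step.toarray())
print("same matrix:", same, "shape:", X_direct.shape)

# 'the' appears in every document, so its idf weight is the smallest.
vocab = tfidf_vec.vocabulary_
idf = tfidf_vec.idf_
print("idf(the) =", idf[vocab["the"]])
print("idf(rigged) =", idf[vocab["rigged"]])
```

Note that the default tokenizer drops one-letter tokens such as "a", so the column count is the number of distinct words of two or more characters.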

I looked for examples of ELI5 with word2vec, but instead I got many results of people asking for simple explanations of what word2vec was. 🤦 Eventually I found this section in the ELI5 docs which explains that the text highlighting features expect us to use SciKit’s built-in tools.

Picking a classifier

The ELI5 text demo uses the classifier LogisticRegressionCV (the ‘CV’ here stands for cross-validation). I started out with this option and trained on an initial 10–18k messages from my AOC dataset (any more and I got a memory error during vectorization, or an imbalance of troll and not-troll messages).

As in my AOC post, I didn’t manually identify every troll Tweet, but flagged any Tweets from users who used a variety of profane or obvious troll words.

I discovered that I needed to vectorize x_train and x_test in the same step, because without a word2vec style reference, their word vectors weren’t lining up at all.
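A common alternative for keeping the columns lined up is to fit the vectorizer once on the training text and then reuse that same fitted vectorizer on the test text. The sketch below assumes toy texts and labels that I made up; the variable names mirror x_train and x_test from above. Words that only occur in the test set are simply ignored by transform.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Toy data, invented for illustration: 1 = troll, 0 = not-troll.
texts = [
    "you lost the election get over it",
    "thank you for fighting for students",
    "what a joke of a congresswoman",
    "great speech last night",
    "go back to bartending",
    "student debt relief would change my life",
    "nobody believes your fake news",
    "proud to have you in congress",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

x_train_text, x_test_text, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels
)

vec = TfidfVectorizer()
# fit_transform learns the vocabulary (the columns) from the training text only.
x_train = vec.fit_transform(x_train_text)
# transform reuses that vocabulary, so the test matrix has identical columns.
x_test = vec.transform(x_test_text)

print(x_train.shape, x_test.shape)
```

Fitting on the training text alone also keeps the idf weights from being influenced by the test set.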

I got a lot of these errors during training with logistic regression:

tweet_classifier_plus_eli5/.env/lib/python3.6/site-packages/sklearn/linear_model/logistic.py:947: ConvergenceWarning: lbfgs failed to converge. Increase the number of iterations.
  "of iterations.", ConvergenceWarning)

tl;dr: in a more professional context I should keep tuning the parameters for logistic regression (for example, raising the iteration limit as the warning suggests), or choose a different model.

I then tried many other estimators from the list in ELI5 docs. I used SciKit’s classification_report feature to analyze their performance.
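Here is a rough sketch of that comparison loop, assuming a tiny invented dataset rather than the real Tweets. To keep the toy example small I use plain LogisticRegression in place of LogisticRegressionCV, and raise max_iter, which is one way to respond to the ConvergenceWarning above. The scores on twelve made-up sentences mean nothing; the point is only the shape of the loop.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, RidgeClassifierCV
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Toy data, invented for illustration: 1 = troll, 0 = not-troll.
texts = [
    "you lost the election get over it",
    "what a joke of a congresswoman",
    "go back to bartending",
    "nobody believes your fake news",
    "you are an embarrassment to the country",
    "leave politics you clown",
    "thank you for fighting for students",
    "great speech last night",
    "student debt relief would change my life",
    "proud to have you in congress",
    "keep up the good work",
    "this policy will help my family",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

x_train_text, x_test_text, y_train, y_test = train_test_split(
    texts, labels, test_size=1 / 3, random_state=0, stratify=labels
)
vec = TfidfVectorizer()
x_train = vec.fit_transform(x_train_text)
x_test = vec.transform(x_test_text)

candidates = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RidgeClassifierCV": RidgeClassifierCV(),
}

reports = {}
for name, clf in candidates.items():
    clf.fit(x_train, y_train)
    predictions = clf.predict(x_test)
    # zero_division=0 avoids warnings when a class is never predicted.
    reports[name] = classification_report(
        y_test,
        predictions,
        target_names=["not-troll", "troll"],
        zero_division=0,
    )
    print(name)
    print(reports[name])
```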

With RidgeClassifierCV (a linear classifier with L2 ‘ridge’ regularization and built-in cross-validation) on another Tweet, the most important words were ‘lost’, ‘reminds’, ‘wapo’, and ‘election’.

wapo? please provide credible sources for your outlandish statements! btw congratulations on losing a sure 25k good paying jobs reminds me of when hillary lost a rigged election. rotfl [pic.twitter.com link]

The keywords here seem to be on the right track. As I re-ran on more Tweets, I could see words such as ‘leave’ and ‘Alex’ were also negative (more troll-ish), and ‘politics’ and ‘anymore’ were positives.

Other well-performing classifiers were NuSVC and XGBoost. These both took longer to train, and are structured differently in ELI5’s explanation output, so I will fall back on RidgeClassifierCV for now.

Displaying ELI5 output

ELI5’s text analysis was confusing at first when I ran it in my console, but it looks great in notebooks — for this section I used NextJournal.
In the following, green contributes toward the final label, and red detracts from it (so a strong green has opposite meanings depending on whether the final label was ‘known weird’ or ‘less weird’).

Common words such as ‘aren’t’, ‘just’, ‘https’, and ‘yes’ have influential scores in this system, which is frustrating, but most words seem to make sense.
Some longer Tweets without profanity are sorted into ‘less weird’ but get a low magnitude score.

If a final score is made up mostly of <BIAS>, then maybe I should consider a Tweet as borderline.
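For a linear model, the breakdown can be reproduced by hand: each word contributes its tf-idf value times the model coefficient, and the intercept is what shows up as <BIAS>. The sketch below is my own approximation with scikit-learn only, not ELI5's code. The toy data, the explain helper, and the bias_share measure are all hypothetical names introduced for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import RidgeClassifier

# Toy data, invented for illustration: 1 = troll, 0 = not-troll.
texts = [
    "you lost the election get over it",
    "what a joke of a congresswoman",
    "go back to bartending",
    "nobody believes your fake news",
    "thank you for fighting for students",
    "great speech last night",
    "proud to have you in congress",
]
labels = [1, 1, 1, 1, 0, 0, 0]

vec = TfidfVectorizer()
clf = RidgeClassifier().fit(vec.fit_transform(texts), labels)

index_to_word = {i: w for w, i in vec.vocabulary_.items()}
coef = np.ravel(clf.coef_)
bias = float(np.ravel(clf.intercept_)[0])


def explain(text):
    """Split a linear model's score into per-word parts plus the bias."""
    row = vec.transform([text])  # sparse row: only known words are stored
    words = {
        index_to_word[j]: float(value * coef[j])
        for j, value in zip(row.indices, row.data)
    }
    score = bias + sum(words.values())
    total = abs(bias) + sum(abs(c) for c in words.values())
    # Hypothetical measure: how much of the total magnitude is just bias.
    bias_share = abs(bias) / total if total > 0 else 1.0
    return {"score": score, "bias": bias, "words": words, "bias_share": bias_share}


result = explain("you lost the election")
for word, contribution in sorted(result["words"].items(), key=lambda kv: kv[1]):
    print(f"{word:>10} {contribution:+.3f}")
print(f"{'<BIAS>':>10} {result['bias']:+.3f}")
print("bias share:", round(result["bias_share"], 2))
```

A Tweet made up entirely of words the model has never seen gets a bias_share of 1.0, which is the extreme version of the borderline case. In a notebook, ELI5's show_prediction(clf, doc, vec=vec) should give the highlighted version of the same kind of breakdown.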

Overlaying results on Twitter

The next step is setting up a local server which will receive Tweet text and overlay an analysis (with a consistent color scheme). Because the ML code is already in Python, I use Flask (see server code).
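As a rough idea of the shape of such a server (this is not the linked server code), here is a minimal Flask sketch. The /analyze route, the JSON field names, and the TOY_WEIGHTS lookup standing in for the trained model are all assumptions made for illustration.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical stand-in for the trained model: a fixed word -> weight table.
# A real server would load the vectorizer and classifier instead.
TOY_WEIGHTS = {"lost": -0.8, "rigged": -0.6, "politics": 0.5}


def score_words(text):
    """Return one {'word', 'weight'} entry per whitespace-separated token."""
    return [
        {"word": token, "weight": TOY_WEIGHTS.get(token.lower(), 0.0)}
        for token in text.split()
    ]


@app.route("/analyze", methods=["POST"])
def analyze():
    # silent=True returns None instead of raising on a bad or missing body.
    payload = request.get_json(silent=True) or {}
    text = payload.get("text", "")
    if not text:
        return jsonify(error="missing 'text'"), 400
    return jsonify(words=score_words(text))


# To serve locally, save this as server.py (hypothetical filename) and run:
#   flask --app server run
```

The browser script would then POST each Tweet's text to this endpoint and color the words using the returned weights.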

When I try to access my local server from the Twitter desktop site, my browser warns me that it’s against the page’s Content Security Policy. After doing some research, I can disable this (free security advice: don’t do this). I’d like to see this through, so I switch to a Firefox dev browser where I’m logged out of Twitter, disable content security, and continue developing the plugin.

I reverse-engineered the word highlighting, which adds a span with background-color set using the HSL color model. I decided for consistency to set the two colors along the same axis, with a red/blue divide. Here is how my browser script handles replies to a post-debate Tweet from Senator Elizabeth Warren:
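As a rough illustration of the highlighting step on its own, the sketch below maps a signed word weight to an HSL background color and wraps the word in a span. It is written in Python to match the server side, and it is not the actual browser script or ELI5's formatter. The function names, the choice of red for negative and blue for positive, and the lightness range are my assumptions.

```python
import html


def weight_to_hsl(weight, max_abs):
    """Map a signed weight to an hsl() string on a red/blue axis.

    Assumption: red (hue 0) for negative weights, blue (hue 240) for positive.
    """
    hue = 240 if weight >= 0 else 0
    strength = min(abs(weight) / max_abs, 1.0) if max_abs > 0 else 0.0
    # 100% lightness is white (no signal); 60% is the strongest tint used here.
    lightness = 100 - 40 * strength
    return f"hsl({hue}, 100%, {lightness:.0f}%)"


def highlight(text, weights):
    """Wrap each weighted word in a span with an HSL background color."""
    max_abs = max((abs(w) for w in weights.values()), default=0.0)
    parts = []
    for token in text.split():
        key = token.lower().strip(".,!?")
        weight = weights.get(key, 0.0)
        safe = html.escape(token)  # never inject raw Tweet text into HTML
        if weight == 0.0:
            parts.append(safe)
        else:
            color = weight_to_hsl(weight, max_abs)
            parts.append(f'<span style="background-color: {color}">{safe}</span>')
    return " ".join(parts)


print(highlight("Hillary lost a rigged election", {"lost": -1.0, "election": -0.5}))
```

Scaling by the largest weight in the Tweet keeps the strongest word at full tint, which makes Tweets comparable visually but hides the absolute magnitude of the scores.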

On AOC’s Twitter, it captured a few negative comments (I don’t find these comments on their own to be racist or profane, but they’ve clearly come to AOC’s page to troll, and I’m going to show these in this post rather than re-sharing a bunch of offensive or racist memes).

For the most part, though, showing a lot of red and blue words together is unintelligible, and the divide is rather confusing:

Some thoughts on how this could be improved:

Either a troll or a non-troll could use the words ‘embarrassing’ or ‘daughters’, but the model is trained on a handful of replies to AOC, where these words are assigned a meaning that doesn’t make sense in a global context.
The phrase ‘student debt relief’ is broken into three words, all of which the model finds ‘known weird’, but together they describe a very liberal position, and the reply was sent as a compliment to Warren. Something could be done here with named entity recognition, bundling phrases together, or finding models which include more context about which words appear alongside each other.
The model’s decision comes from summing up all of the positive and negative scores and making a conclusion. Revealing that process has given me more information as a developer / data scientist, but unless the model and its underlying dataset are significantly improved and diversified (like 10x better), it isn’t friendly enough to show a user.

Next Steps

I should have separate processes for a well-trained model and the server-side code, so that the server can quickly be updated and rebooted without rebuilding the model.
In a future post I will try FastText to see how pre-trained word vectors / word embeddings change classification (I’m confident that it will improve classifier accuracy by A LOT). The trick then will be figuring out my word-highlighting with a system which isn’t natively supported by ELI5.
Flagging Tweets would be a great reinforcement learning project, but I know less about that at this point.
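For the first item, one possible approach is to have a training script save the fitted vectorizer and classifier to disk, and have the server only load that file at startup. The sketch below uses joblib for this, with invented toy data and a hypothetical file name. It only shows the save and load halves side by side.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import RidgeClassifier
from sklearn.pipeline import make_pipeline

# --- Training process (run rarely) ---
# Toy data, invented for illustration: 1 = troll, 0 = not-troll.
texts = [
    "you lost the election get over it",
    "what a joke of a congresswoman",
    "go back to bartending",
    "thank you for fighting for students",
    "great speech last night",
    "proud to have you in congress",
]
labels = [1, 1, 1, 0, 0, 0]

# Bundling vectorizer and classifier keeps their vocabularies in sync.
pipeline = make_pipeline(TfidfVectorizer(), RidgeClassifier())
pipeline.fit(texts, labels)

# Hypothetical location; a real project would pick a stable path.
model_path = os.path.join(tempfile.gettempdir(), "troll_model.joblib")
joblib.dump(pipeline, model_path)

# --- Server process (restarted often) ---
# Loading is fast compared with retraining on the full dataset.
loaded = joblib.load(model_path)
print(loaded.predict(["what a joke"]))
```

Only load joblib or pickle files that you created yourself, since loading them can execute arbitrary code.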