Deciphering Explainable AI, Pt.1

Trying out Seldon.IO’s Alibi

One of the trending topics in machine learning is ‘explainable’ AI, in other words, opening up a neural network or other model whose training process produced a ‘black box’.
You might use this to check what contributes to a model’s success (for example, finding which factors predict real estate value). You might be looking for biases to prevent perpetuating nonsensical or unfair outcomes, as in this case:

How can a system be ‘explained’ anyway?

A machine learning model is a mathematical product, not a magic trick. For toy problems you might be able to solve everything with linear regression, but usually there is no single reveal or one-line formula. So before people build explainable AI (XAI), they need to set some scientific goals.

Patrick Hall of H2O.ai has a pre-print, On Explainable Machine Learning Misconceptions & A More Human-Centered Machine Learning, which single-handedly redirected my expectations from magic tricks to reading up on game theory.

I had two main takeaways:

  • If an AI system is explained, that doesn’t guarantee that it is well-designed or unbiased. Conversely, some black boxes work well 🤷. This complicates the value of making a system ‘explainable’ or ‘interpretable’.
  • Mathematical explanations for a model could be: a labeled decision tree, ICE curves, Shapley values, local linear coefficients, or ‘models of models’.
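To make one of those concrete, here is a small worked example of exact Shapley values for a made-up three-feature model. The pricing function, the baseline, and the ‘replace missing features with a baseline’ value function are my own illustration, not anything from Hall’s paper:

```python
from itertools import combinations
from math import factorial

# Toy "model": a hypothetical pricing function of three features
# (square meters, bedrooms, distance to transit). Purely illustrative.
def model(sqm, bedrooms, transit_km):
    return 1000 * sqm + 5000 * bedrooms - 2000 * transit_km

baseline = (70, 2, 5)    # an "average" house, used when a feature is absent
instance = (120, 3, 1)   # the prediction we want to explain
features = ["sqm", "bedrooms", "transit_km"]
n = len(features)

def value(subset):
    """Model output when only the features in `subset` take the instance's
    values and the rest are held at the baseline."""
    args = [instance[i] if i in subset else baseline[i] for i in range(n)]
    return model(*args)

def shapley(i):
    """Exact Shapley value of feature i via the usual subset formula:
    a weighted average of its marginal contributions."""
    others = [j for j in range(n) if j != i]
    total = 0.0
    for size in range(n):
        for subset in combinations(others, size):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            total += weight * (value(set(subset) | {i}) - value(set(subset)))
    return total

phis = [shapley(i) for i in range(n)]
print(dict(zip(features, phis)))
# The contributions always sum to model(instance) - model(baseline):
print(sum(phis), model(*instance) - model(*baseline))
```

Libraries like the ones below estimate this kind of attribution for models where enumerating every feature subset would be impossible.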

Now that I have a better idea of what XAI is and is not, I’m setting up a blog series about at least three XAI tools, starting with Alibi (this post), then Microsoft’s InterpretML, and Google’s Lucid and ‘What-If’ tool.

Seldon.IO’s Alibi

One company that’s come up in my Twitterverse a few times is Seldon.IO (as in Asimov’s Hari Seldon), and their XAI tool named Alibi.

Alibi has multiple options for analyzing models from scikit-learn or TensorFlow (via Keras). Some practical examples find the most significant pixels in an image, the most significant columns in a table row, or the most significant words in a piece of text when you make a prediction.

A Super-Easy Text Classifier

Let’s see how it visualizes a text classifier (basing this on Alibi’s Anchor Text demo for movie reviews). To make it as super-easy / short-circuit-able as possible, one category starts sentences with ‘Apples’, one category begins with ‘Oranges’, and the third has neither. This last category reveals how the Anchor Text method handles cases where the absence of a word is what matters.

Sure enough, the demo’s scikit-learn LogisticRegression + spaCy model gets 100% accuracy on this super-easy problem.
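The setup looks roughly like this. It follows the pattern of Alibi’s movie-review demo, but the sentences are invented examples and the exact AnchorText constructor arguments and result fields differ between Alibi versions:

```python
import spacy
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from alibi.explainers import AnchorText

# Tiny invented dataset: class 0 starts with 'Apples', class 1 with
# 'Oranges', class 2 mentions neither.
texts = [
    "Apples are on sale at the market",
    "Oranges were squeezed into the juice",
    "This sentence mentions no fruit at all",
]
labels = [0, 1, 2]

clf = Pipeline([("vec", CountVectorizer()), ("lr", LogisticRegression())])
clf.fit(texts, labels)
predict_fn = lambda x: clf.predict(x)

nlp = spacy.load("en_core_web_md")       # spaCy model used for perturbations
explainer = AnchorText(nlp, predict_fn)  # constructor signature varies by Alibi version
explanation = explainer.explain(
    "Apples taste better in the autumn",
    threshold=0.95,
    use_unk=True,  # replace words with 'UNK' rather than with synonyms
)
print(explanation)  # includes the anchor words plus their precision and coverage
```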

  • On the Apples category, we see that AnchorText found ‘Apples’ to be the anchor. Replacing any combination of the other words with the nonsensical ‘UNK’, we get the same result.
  • Instead of UNK, AnchorText has a different mode where it replaces words with synonyms (I got warnings running that code, but after a few minutes it came back with good results; I have a GitHub issue open now asking how this might be fixed).
  • You can see that capitalization is relevant to the vectorizer (many variations of ‘THis’ appear). A sentence beginning with the singular ‘Apple’ was filtered into ‘neither’, too. You might change the vectorizer settings for a model that doesn’t differentiate plurals and other details.
  • In the Neither category, the model predicted correctly, but the anchor came back empty. Actually, every part of the response was empty. I’m still thinking about how that null response could communicate more information.

Eastern Arabic Numerals

Now I’ll try a more serious problem: a TensorFlow classifier of Eastern Arabic numerals. This example is similar to the MNIST handwritten digits you’ve seen in every machine learning tutorial, but with the digits ٠١٢٣٤٥٦٧٨٩, which are often used in countries with an Arabic language or script.
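The classifier itself isn’t the interesting part, but for reference, a small Keras CNN along these lines is enough for the job. This is a sketch with an assumed 28×28 grayscale input and ten classes, not the exact architecture behind the accuracy figure below:

```python
from tensorflow.keras import layers, models

# A minimal CNN for 28x28 grayscale digit images with 10 classes.
cnn = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
cnn.compile(optimizer="adam",
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
# cnn.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
```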

The accuracy of my convolutional neural network (CNN) was 98%. Alibi has a few different options for exploring image models:

TrustScore compares the output of the classifier (in Alibi’s demo, an autoencoder; in my demo, a CNN) to a simpler nearest-neighbors classifier. Though this simpler classifier is less accurate on its own, researchers have found that comparing the two gives a better indicator of whether a prediction is right than the original classifier’s own confidence score.
Here we see examples of digits which received a low trust score. The correct answer appears in the output as ‘label’, and our final prediction would improve if we switched low-scoring results to the ‘closest other class’ provided by the TrustScore library.

I reworked the code to find the lowest-confidence results for any digit. For example, these are the worst ٥s:

I was puzzled that these could be mistaken for ٢, ٧, or ٨, but it’s important to remember that low trust scores mark cases where the original classifier wasn’t confident and the second classifier did better. The most puzzling handwritten digits would be ones where both classifiers failed.

If we combine both models and the TrustScore, we can make a system with improved overall accuracy.
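A hedged sketch of what that combination might look like, using alibi.confidence.TrustScore. The argument names follow the Alibi version I used and may have changed since; x_train, y_train, x_test are the digit arrays from the classifier above, and the 5% cutoff is an arbitrary choice for illustration:

```python
import numpy as np
from alibi.confidence import TrustScore

# Fit the nearest-neighbors side on flattened training images.
ts = TrustScore()
ts.fit(x_train.reshape(len(x_train), -1), y_train, classes=10)

# Score the CNN's predictions: low scores mean the predicted class is
# far away compared to the closest other class.
y_pred = np.argmax(cnn.predict(x_test), axis=1)
scores, closest_class = ts.score(x_test.reshape(len(x_test), -1), y_pred, k=2)

# Wherever trust is very low, prefer the 'closest other class' instead.
threshold = np.percentile(scores, 5)   # illustrative cutoff
combined = np.where(scores < threshold, closest_class, y_pred)
```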

Generating the rest of these figures first requires an autoencoder. This is another type of neural network where, instead of outputting the labels, we get back the same digit… but why? The idea is that by compressing the image (encoding) and then training the model to reconstruct it (decoding), the system will latch onto more general characteristics.
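In Keras, a small convolutional autoencoder might look roughly like this. The architecture is illustrative; Alibi’s CEM example ships its own autoencoder definition:

```python
from tensorflow.keras import layers, models

ae = models.Sequential([
    # Encoder: squeeze the 28x28 digit down to a small representation.
    layers.Conv2D(16, (3, 3), activation="relu", padding="same",
                  input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2), padding="same"),
    layers.Conv2D(8, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2), padding="same"),
    # Decoder: blow it back up to a 28x28 reconstruction.
    layers.Conv2D(8, (3, 3), activation="relu", padding="same"),
    layers.UpSampling2D((2, 2)),
    layers.Conv2D(16, (3, 3), activation="relu", padding="same"),
    layers.UpSampling2D((2, 2)),
    layers.Conv2D(1, (3, 3), activation="sigmoid", padding="same"),
])
ae.compile(optimizer="adam", loss="mse")
# The target is the input itself: the network learns to reproduce the digit.
# ae.fit(x_train, x_train, epochs=5, validation_data=(x_test, x_test))
```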

Then Alibi fits an explainer using what the library calls the Contrastive Explanation Method (CEM).
Let’s start with this one handwritten ٨:

The pertinent positive image below identifies the pixels that were most important for the classifier to decide that this particular digit is a ٨. It seems like the lower left is important because only one other digit (of ٠١٢٣٤٥٦٧ and ٩) ventures there.

As I understand it, in the pertinent negative below, the pixels outside of the ٨ are highlighted based on how important it was for them to stay black (otherwise they would steer the original classifier toward labeling it as another digit).
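Roughly, this is the shape of the CEM calls that produce both images. It’s a sketch: the real example passes many more tuning parameters (kappa, beta, gamma, learning rates, clipping ranges), and names and defaults may differ between Alibi versions. cnn and ae are the classifier and autoencoder sketched above, and idx is an assumed index into the test set:

```python
from alibi.explainers import CEM

idx = 0                                    # index of the handwritten ٨ (assumption)
X = x_test[idx].reshape(1, 28, 28, 1)      # the single digit to explain

# Pertinent positive: which pixels are enough to keep the ٨ prediction?
cem_pp = CEM(cnn, mode='PP', shape=(1, 28, 28, 1), ae_model=ae)
cem_pp.fit(x_train)
explanation_pp = cem_pp.explain(X)

# Pertinent negative: which absent pixels, if added, would flip the prediction?
cem_pn = CEM(cnn, mode='PN', shape=(1, 28, 28, 1), ae_model=ae)
cem_pn.fit(x_train)
explanation_pn = cem_pn.explain(X)

# The explanations carry the PP / PN images to plot alongside the original digit.
```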

The counterfactual image uses pertinent negatives to show us how it would turn this rather hook-like ٢ into another digit.

It becomes more of a ٣ or ٤ in the counterfactual (I don’t really see it here, but it could be something). The documentation points to this paper for a better understanding and example uses of this algorithm.
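For reference, Alibi also ships a standalone counterfactual explainer that follows the same calling pattern as CEM. This is a sketch only: the class name and arguments vary between Alibi versions, and cnn / x_test / idx are the assumed variables from the sketches above:

```python
from alibi.explainers import CounterFactual

X = x_test[idx].reshape(1, 28, 28, 1)   # e.g. the hook-like ٢ from the figure

# target_class='other' asks for any digit other than the original prediction.
cf = CounterFactual(cnn, shape=(1, 28, 28, 1), target_class='other')
explanation_cf = cf.explain(X)
# The explanation contains the perturbed image and the class it now lands in.
```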

At first I was surprised to get explanations at the level of an individual prediction, rather than a global understanding. In practice this might be more useful! If you had a successful machine learning system that occasionally made strange errors, that’s when you would be searching for explanations. If I get my Twitter troll classifier working in scikit-learn, I’d like to see which words put this or that Tweet over the line (maybe that can be the next blog post).

There must be an art to knowing which technique to use, or to aggregating trends across many predictions.

Related Reading

Patrick Hall (author of the misconceptions pre-print above) also maintains a GitHub repo of links, ‘Awesome Machine Learning Interpretability’.

Christoph Molnar has a longer-form book online, Interpretable Machine Learning, covering many of these concepts.

There are MNIST datasets for Devanagari, Telugu, and Bangla numbers here:

A friend shared a link to this Google Brain / PAIR walkthrough on BERT (an advanced pre-trained model in natural language processing that’s a successor to word vectors). This is much more advanced than explaining a classifier, but it certainly is an interesting look into the nuts and bolts.

Uber’s posts on Causal Inference and Mediation Modeling describe methods to explain user behavior in their app. It seems like these techniques could be useful for understanding both AI and human systems.
