What if Explainable AI doesn’t work?

Nick Doiron
4 min read · Nov 10, 2019


There is a strengthening current against the influence of AI systems in areas that can affect people’s livelihoods and freedom. Even though we are in the early days of AI, pilot programs are experimenting with screening job applicants, with suspicion and sentencing decisions in the criminal justice system, and with investigating WIC stores. At least four cities have banned facial recognition use by their governments.

This is not a fringe movement, either: favorite tech skeptic Pinboard drew a lot of appreciative engagement with a Tweet calling out “racist linear algebra” as the current status quo for consequential machine learning.

AlgoFace, a startup navigating this space, pledged:

This year we pledged to never enter any industry where a false positive or false negative from our artificial intelligence technologies can be used to rob a person of their human rights.

We build technology to detect demographic attributes about the human face, but never to detect the identity of the person the face belongs to.

Do you read me, HAL?

We learned decades ago to fear an AI that says “I’m afraid I can’t do that”, but we also don’t want an AI that tells humans “because I said so”.

The technology world saw the controversies over AI and sought a technical solution, usually called either Interpretable Machine Learning or Explainable AI (XAI), depending on how the solution works. Essentially, we can either design machine learning algorithms to have inspectable components (in a strictly linear-algebra example, measuring the weight given to each input feature), or run trained algorithms and neural networks through a black-box analysis (in an NLP example, the LIME algorithm measures the probability of each outcome over several permutations of the text).
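
To make the black-box route concrete, here is a minimal sketch assuming the lime package and a small scikit-learn pipeline; the toy hiring-style data and class names are invented for illustration, not taken from any real system.

```python
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: a stand-in for whatever a real screening model was trained on.
texts = [
    "great candidate with strong relevant skills",
    "no relevant experience or education",
    "excellent education and solid references",
    "weak background, poor fit for the role",
]
labels = [1, 0, 1, 0]  # 1 = hire, 0 = reject

pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())
pipeline.fit(texts, labels)

explainer = LimeTextExplainer(class_names=["reject", "hire"])

# LIME perturbs the input (dropping words), queries the black-box model on each
# permutation, and fits a local linear model to estimate per-word weights.
explanation = explainer.explain_instance(
    "strong education but no relevant experience",
    pipeline.predict_proba,   # any callable that returns class probabilities
    num_features=5,
    num_samples=500,
)
print(explanation.as_list())  # e.g. [('experience', -0.3), ('education', 0.2), ...]
```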

The goal is that an explained applicant-hiring system will show us that its decisions are made on the intended inputs (such as level of education and relevant skills) and not on meaningless inputs (spacing on a CV) or illegally discriminatory information (name or appearance). A bad AI would theoretically be exposed by this system.

When explanations don’t do the work

Vision models need explaining, and those explanations need their own explanations

The first paper to share on this was posted as a preprint by Dr. Cynthia Rudin almost a year ago and recently updated. I learned about it from a blog post by Adrian Colyer. I’ll link to both here:

The blog post covers the paper at a very understandable level, so I won’t repeat it. The key here is to debunk some common visualizations and explanations of computer vision in AI, and to set an ambitious goal: we should build systems out of inspectable/interpretable components, and we should advance interpretable machine learning to show more than a simple attention map.

Text explanations are flawed and can be manipulated

This attention map problem is also discussed in the context of text / NLP, in this paper by Sarthak Jain and Prof. Byron C. Wallace:

In this case, the paper shows that a model’s exposed attention weights, which seemed like the best place to look for a word’s importance, are not so revealing. The authors also present adversarial examples: by tweaking the attention weights of a recurrent neural network, they find an alternative model that still classifies movie reviews successfully, even though its attention highlights common words such as ‘was’.
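
As a hedged sketch of the kind of check this line of work runs (the paper’s actual experiments use trained LSTMs and gradient-based importances), here is one way to ask whether attention agrees with a simpler leave-one-out importance measure; the toy model, tokens, and function names below are all invented for illustration.

```python
import numpy as np
from scipy.stats import kendalltau

def leave_one_out_importance(tokens, predict_proba, target_class):
    """Score each token by how much the target-class probability drops when it is removed."""
    base = predict_proba(tokens)[target_class]
    return np.array([
        base - predict_proba(tokens[:i] + tokens[i + 1:])[target_class]
        for i in range(len(tokens))
    ])

def attention_agreement(attention_weights, tokens, predict_proba, target_class):
    """Kendall's tau between attention weights and leave-one-out importance.
    A low value suggests attention is not pointing at the words that drove the prediction."""
    loo = leave_one_out_importance(tokens, predict_proba, target_class)
    tau, _ = kendalltau(np.asarray(attention_weights), loo)
    return tau

# Toy stand-in model: "sentiment" is just the fraction of tokens equal to "good".
def toy_predict_proba(tokens):
    pos = sum(t == "good" for t in tokens) / max(len(tokens), 1)
    return np.array([1 - pos, pos])

tokens = ["the", "movie", "was", "good", "really", "good"]
attention = np.array([0.05, 0.05, 0.60, 0.10, 0.10, 0.10])  # mostly on "was"
print(attention_agreement(attention, tokens, toy_predict_proba, target_class=1))
```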

Update: Saliency maps don’t work very well either

I saw this paper after completing this post; it compares the maps produced by a saliency method against basic image processing such as edge detection.
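
Here is a minimal sketch of that comparison, assuming scikit-image; the “saliency map” below is a simulated placeholder (noisy edges), since a real one would come from running a gradient-based method on an actual image classifier.

```python
import numpy as np
from skimage import data, filters
from skimage.metrics import structural_similarity

image = data.camera().astype(float) / 255.0   # sample grayscale photo
edges = filters.sobel(image)                  # edge map that never consults any model

# Placeholder "saliency map": stand in a real gradient-based map here;
# for this sketch it is simulated as the edge map plus noise.
rng = np.random.default_rng(0)
saliency = edges + 0.05 * rng.standard_normal(image.shape)

def normalize(x):
    return (x - x.min()) / (x.max() - x.min() + 1e-12)

score = structural_similarity(normalize(saliency), normalize(edges), data_range=1.0)
print(f"SSIM between the saliency map and plain Sobel edges: {score:.2f}")
# A high score means the "explanation" mostly traces edges in the input image,
# which an edge detector can produce without looking at the model at all.
```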

Going forward

I’m reflecting on my previous posts, where I leaned on ELI5/LIME to explain how I classified Tweets. I was using SGDClassifier, not a neural network, so the black-box effect was not as strong. But I could still overlook several confusing outputs in the word-importance maps.

Consider a positive Tweet and a negative Tweet, each only a few dozen words long. Almost any uncommon word could be labeled as contributing to a positive or negative classification and, to a human reader, look accurate!
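
For reference, this is roughly the kind of call those earlier posts were built on: a hedged sketch assuming the eli5 package, with a made-up handful of Tweets standing in for the real dataset.

```python
import eli5
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# Invented mini-dataset standing in for the labeled Tweets from the earlier posts.
tweets = [
    "love this, thank you for standing up for us",
    "absolutely terrible take, do better",
    "so proud of my representative today",
    "this policy is a disaster and you know it",
]
labels = ["positive", "negative", "positive", "negative"]

vectorizer = TfidfVectorizer()
clf = SGDClassifier(random_state=0)
make_pipeline(vectorizer, clf).fit(tweets, labels)

# Per-word contributions to one prediction. Notice how easily an uncommon word
# can pick up a weight that "looks" right to a human reader.
explanation = eli5.explain_prediction(clf, "what a disaster of a thank you tour",
                                      vec=vectorizer)
print(eli5.format_as_text(explanation))
```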

It’s difficult to train a bag-of-words or averaging model on an AOC reply dataset, where common words like ‘bartender’ are used positively, neutrally, or negatively in ways that humans can easily tell apart. If I were diagramming these Tweets for a mathematician unfamiliar with American culture and politics, I would never use a word-by-word heatmap. Instead, I would show links between words, references to unwritten political topics, and the etiquette of replying to a bodega cat photo.
Following the advice of the Rudin paper, it would be nice to build an interpretable machine learning tool that dug into text explanations at this level.
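
As a toy sketch of why the bag-of-words case is hard (with invented replies, not the actual dataset), a linear model ends up with one fixed weight for ‘bartender’ no matter how the word is used:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier

# Invented replies: "bartender" appears with opposite intent in the first two.
replies = [
    "proud that a former bartender is writing policy",   # admiring use
    "go back to being a bartender",                       # dismissive use
    "love her energy in that hearing",
    "worst representative we have ever had",
]
labels = [1, 0, 1, 0]  # 1 = positive reply, 0 = negative reply

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(replies)
clf = SGDClassifier(random_state=0).fit(X, labels)

idx = vectorizer.vocabulary_["bartender"]
print("single learned weight for 'bartender':", clf.coef_[0, idx])
# Whatever this number turns out to be, it is applied identically to both of the
# first two replies; a bag-of-words model has no way to represent the two uses.
```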
