Algorithmic fairness in machine translation
On a proposal rejected from a NeurIPS workshop
I’ve been posting here about machine learning / NLP projects for about 18 months now. This fall, after landing a contract, I resolved that my next milestone would be a paper at a major academic conference by NeurIPS 2021. But then I noticed that some 2020 workshops were still open for submissions. Could I get this done a year ahead of schedule?
I reached out to a few people and eventually recruited a human rights lawyer from the AI & International Law class, and a recent graduate who’s been making his own Indian NLP models. We put together a one-pager proposing a breakout session / panel discussion.
What was it about?
We had a few ideas and citable papers in circulation, constrained somewhat by the workshop topics. If we’d been accepted, my plan was to hold a few calls to get us on the same page ahead of the panel.
Here are the three points that were on my individual agenda:
- What would the ideal output of a translation model look like if we focused on making it fair, interpretable, and explanation-based? Here I thought about the end user as a human translator choosing between suggested translations.
I gave the example of tā in spoken Mandarin Chinese, which can mean he, she, or it (in written Chinese, each pronoun has its own character). Where the translation is ambiguous, maybe we could show explanations for which pronoun should be used based on context.
- What are some more concrete technical solutions? For example, could influence functions, which surface the training examples that most contributed to a model’s decision, be applied to a translation model? (See the influence-function sketch after this list.)
- How do we measure bias in parallel corpora? As an example, the IIT Bombay English-Hindi Parallel Corpus is widely recommended for training translation models, but it includes articles from the early 20th century that carry dated attitudes about gender, race, caste, and colonialism. (See the pronoun-count sketch after this list.)
This brings in the multi-dimensional nature of bias and fairness. Gender bias is often reduced to ‘does the model treat men and women equally?’, which is an easier question to tackle than ‘can we ever remove assumptions about gender from the training data?’ or ‘how should we handle a historical account of apartheid South Africa?’.
Historical words should be translated faithfully, but we shouldn’t allow them to determine what words appear in, for example, modern subtitles and chat tools.
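To make the influence-function bullet more concrete, here is a minimal, hedged sketch on a toy logistic-regression classifier rather than an actual translation model. Everything in it (the synthetic data, the grad_loss and hessian helpers) is invented for illustration; a real sequence-to-sequence model would need Hessian-free approximations instead of an explicit matrix inverse.

```python
# Toy illustration of influence functions (Koh & Liang, 2017) on a small
# logistic-regression model. The core formula is:
#   influence(z_train, z_test) = -grad_L(z_test)^T  H^-1  grad_L(z_train)
import numpy as np

rng = np.random.default_rng(0)

# Tiny synthetic dataset standing in for "training sentences" (features + label).
X = rng.normal(size=(200, 5))
true_w = rng.normal(size=5)
y = (X @ true_w + 0.5 * rng.normal(size=200) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_loss(w, x, y_i, l2=1e-2):
    # Gradient of the regularized logistic loss for a single example.
    p = sigmoid(x @ w)
    return (p - y_i) * x + l2 * w

def hessian(w, X, l2=1e-2):
    # Hessian of the average training loss plus L2, a (d, d) matrix.
    p = sigmoid(X @ w)
    s = p * (1 - p)
    return (X * s[:, None]).T @ X / len(X) + l2 * np.eye(X.shape[1])

# Fit the model with plain gradient descent (the loss is convex).
w = np.zeros(X.shape[1])
for _ in range(500):
    g = np.mean([grad_loss(w, X[i], y[i]) for i in range(len(X))], axis=0)
    w -= 0.5 * g

# Influence of every training example on one held-out "test translation".
x_test, y_test = rng.normal(size=5), 1.0
H_inv = np.linalg.inv(hessian(w, X))
g_test = grad_loss(w, x_test, y_test)
influences = np.array([-g_test @ H_inv @ grad_loss(w, X[i], y[i])
                       for i in range(len(X))])

# The most influential training examples are the ones a translator or auditor
# would want to inspect first when the model makes a questionable choice.
print("Most influential training indices:", np.argsort(-np.abs(influences))[:5])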
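And for the corpus-bias bullet, a deliberately crude sketch: count gendered English pronouns on the English side of a parallel corpus to get a first sense of skew. The file path is a placeholder, and a count like this says nothing about caste, race, or colonial framing; it is only a starting point for the kind of measurement the bullet asks about.

```python
# Crude first pass at "measuring bias in a parallel corpus": count gendered
# English pronouns on the English side and report the ratio.
from collections import Counter
import re

MALE = {"he", "him", "his", "himself"}
FEMALE = {"she", "her", "hers", "herself"}

def pronoun_counts(path):
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            for token in re.findall(r"[a-z']+", line.lower()):
                if token in MALE:
                    counts["male"] += 1
                elif token in FEMALE:
                    counts["female"] += 1
    return counts

if __name__ == "__main__":
    # Placeholder path: substitute the English half of whatever parallel
    # corpus you are auditing (e.g. the IIT Bombay English-Hindi corpus).
    counts = pronoun_counts("parallel_corpus.en")
    total = sum(counts.values()) or 1
    print(counts, f"male share: {counts['male'] / total:.2%}")
```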
On having a facilitator
The workshop asked us to specifically name one member as facilitator. At an in-person conference this person might be the notetaker, but since this would be a recorded Zoom call, I didn’t see the point. I invited someone as facilitator and promised we would organize a panel. Later I wondered: if the role really made no difference, why didn’t I take it myself? I feel some guilt about that.
On being rejected
A month later, I got a rejection notice and some feedback. There was general agreement that we had interesting discussion topics, but some of the material was not new enough, or not connected enough to the core of this workshop. We had one Accept and two Rejects, one of which included a particularly lovely excerpt.
One of my first Tweet-able reactions was: I’m not even in research or academia, so why did I invite a new source of criticism into my life?
But being independent softens the meaning of rejection… I am under no career pressure to submit this topic to another event or go back to some advisor or committee for damage control. I wasn’t even planning on getting into this year’s event.
I should acknowledge some mistakes: first, coloring outside the guidelines; second, not adding detail once the committee clarified that the one-page limit did not include references.
What next?
Aside from my day job,
- I delivered a new talk on JAX (based on this Medium post) to two virtual conferences. I would like to continue that work and make more accessible examples.
- There is an upcoming call for workshops and tutorials for FAccT (a conference on fairness and accountability), so I can aim to be an active participant in the main event or in a workshop.