Interpretable Q&A AI
AllenNLP’s Interpret is designed not just to understand and model human language, but to expose how a language model decides its output. One of their examples, Reading Comprehension, receives a document and a question about its content, and uses AllenNLP’s ELMo-BiDAF model to return an answer and its position in the source text. This led me to think of a few improvements:
- First, setting up the model and evaluating its performance on another company’s English-language Q&A dataset, in this case Google’s Natural Questions (website and Kaggle challenge).
- Replacing the ELMo model with other pipelines for question-answering (such as Transformers).
Testing AllenNLP on Google’s Natural Questions
It’s really easy to get started with AllenNLP’s Q&A pipeline. I wrote a quick script (sketched below) to compare its answers to the gold answers in the training dataset.
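Here’s a minimal sketch of that script, assuming the ELMo-BiDAF archive URL from the AllenNLP demo and the `best_span_str` output key; both are assumptions that may have changed since, so check the current docs:

```python
from allennlp.predictors.predictor import Predictor

# Assumed archive URL for the ELMo-BiDAF reading-comprehension model;
# substitute the current location from the AllenNLP demo/docs if it has moved.
MODEL_URL = "https://allennlp.s3.amazonaws.com/models/bidaf-elmo-model-2018.11.30-charpad.tar.gz"

predictor = Predictor.from_path(MODEL_URL)

def answer(question: str, passage: str) -> str:
    """Return the model's best answer span for a question about the passage."""
    result = predictor.predict(question=question, passage=passage)
    return result["best_span_str"]

# Illustrative comparison against one Natural Questions-style example.
print(answer(
    "when did they finish building the sydney opera house",
    "... The Sydney Opera House was formally completed in 1973 ...",
))
```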
Across several Q&A pairs, AllenNLP appeared to understand whether the answer should be a name, a number, or a range of dates, but it usually picked the wrong one. There were two issues: Google’s Natural Questions includes HTML tags, and its source documents are much longer than the text the original model was trained on.
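Stripping the tags is straightforward. Below is a rough sketch assuming the simplified Natural Questions JSONL release, where the page source and question live in `document_text` and `question_text` fields (adjust the field names if your copy differs):

```python
import json
import re

TAG_RE = re.compile(r"<[^>]+>")

def strip_html(text: str) -> str:
    """Drop HTML tags and collapse the leftover whitespace."""
    return re.sub(r"\s+", " ", TAG_RE.sub(" ", text)).strip()

# Field names assume the "simplified" Natural Questions JSONL format.
with open("simplified-nq-train.jsonl") as f:
    for line in f:
        example = json.loads(line)
        question = example["question_text"]
        passage = strip_html(example["document_text"])
        # ... pass (question, passage) to the predictor as in the script above ...
```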
I tried rerunning the script without the HTML, and was impressed by some answers:
- Q: what are the minds two tracks and what is dual processing
  A: an implicit ( automatic ) , unconscious process and an explicit ( controlled ) , conscious process
- Q: how does bill of rights apply to states
  A: procedurally and substantively
- Q: what episode of how i met your mother is the slap bet
  A: 9th (note that it’s already reading this episode’s wiki article, and just needed to read a sentence about it being the n-th episode)
- Q: when did they finish building the sydney opera house
  A: 1973
Fine-tuning a model on Google’s Natural Questions
There were still plenty of misread answers, so the right approach would be to get a pre-trained Q&A model and fine-tune it on the new dataset. It appears that AllenNLP has a separate allennlp-reading-comprehension repo for doing this, including an addition this month of a Transformers-based model.
That said, I couldn’t figure out how to work with this system. Starting with the same Q&A model produced errors, as did the most recent models posted on https://storage.googleapis.com/allennlp-public-models/. Every model raises a different error on scripts/transformer_qa_eval.py.
I tried a different approach to access TransformerQAPredictor, but that still requires initializing with a model and dataset reader, so I’m looking forward to this being documented.
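For reference, here is the pattern I expected to work, pieced together from AllenNLP’s generic archive/predictor API. The archive filename is a placeholder, and the registration import is my guess at the package name in allennlp-reading-comprehension:

```python
from allennlp.models.archival import load_archive
from allennlp.predictors.predictor import Predictor

# Assumption: this import registers the "transformer_qa" model, dataset reader,
# and predictor with AllenNLP; the package name may differ in the repo.
import allennlp_rc  # noqa: F401

# Placeholder archive -- the public model.tar.gz files I tried raised errors,
# so point this at whichever archive works for you.
archive = load_archive("transformer-qa-model.tar.gz")
predictor = Predictor.from_archive(archive, "transformer_qa")

result = predictor.predict(
    question="when did they finish building the sydney opera house",
    passage="The Sydney Opera House was formally completed in 1973.",
)
print(result)  # the predicted answer span should appear in this output dict
```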
Future Ideas
- Creating a machine learning system to predict which Wikipedia article I should fetch based on a question (so a user can submit just a question, and not the source document).
- For languages other than English, there are multilingual Q&A datasets and challenges.