I originally planned this as a separate website or video series, but it stalled over the past few months. I’ve decided to post it here with only a few edits.
What happens to the experiments that don’t show improvement over a previous baseline? In the data science / machine learning community, we hear that negative results bring balance and objectivity. Yet publication bias persists, and there are few sources of great negative-results content. Here I’ve selected three papers recognized as standout examples of negative results, with added commentary and definitions.
The WinoWhy dataset — presented at ACL 2020 by Hongming Zhang, Xinran Zhao, and Yangqiu Song — offers human explanations for ambiguous sentences.
Bill passed the half-empty plate to John because he was full.
We understand that the ‘he’ refers to Bill. Crowdsourced explanations include:
Bill was full, so he gave the rest of his food to John
Bill is full and couldn't eat more
The purpose of the dataset is to fine-tune better explanatory models. But for each of these explanations, an alternate anti-explanation must also exist in the model’s probability space (e.g. “John was full, so he needed only half”).
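One mechanical way to see this is that any explanation resolving the pronoun to one entity can be mirrored into a grammatical anti-explanation resolving it to the other entity, simply by swapping the two candidate names. A toy sketch (the swap rule is my illustration, not part of the WinoWhy paper):

```python
def anti_explanation(text: str, a: str = "Bill", b: str = "John") -> str:
    """Swap the two candidate referents to build the mirrored anti-explanation."""
    placeholder = "\x00"  # temporary token so the two replacements don't collide
    return text.replace(a, placeholder).replace(b, a).replace(placeholder, b)
```

For example, `anti_explanation("Bill was full, so he gave the rest of his food to John")` returns the mirrored sentence with Bill and John exchanged — a fluent string that a language model necessarily assigns some probability to.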
Early this year, I uploaded a seq2seq model to generate gender counterfactuals in Spanish (el profesor viejo <-> la profesora vieja). Since then, I’ve debugged some issues, built a general-purpose library, and created an initial seq2seq model for Arabic.
The goal is to see whether passing training data through this process produces more generalized training data (data augmentation), improving accuracy and fairness.
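The augmentation step can be sketched with a toy rule-based swap. The real project uses a learned seq2seq model to handle agreement and irregular forms; the word list below is an illustrative assumption covering only the example phrase:

```python
# Toy rule-based Spanish gender-counterfactual swap, for illustration only.
# A learned seq2seq model is needed for real text: agreement, irregular
# forms, and context-dependent words are not handled by a lookup table.

SWAPS = {
    "el": "la", "la": "el",
    "profesor": "profesora", "profesora": "profesor",
    "viejo": "vieja", "vieja": "viejo",
}

def gender_counterfactual(sentence: str) -> str:
    """Swap each known word for its opposite-gender form."""
    return " ".join(SWAPS.get(word, word) for word in sentence.split())

def augment(corpus: list[str]) -> list[str]:
    """Data augmentation: double the corpus by adding counterfactuals."""
    return corpus + [gender_counterfactual(s) for s in corpus]
```

Here `gender_counterfactual("el profesor viejo")` yields `"la profesora vieja"`, and `augment` appends one counterfactual per original sentence, so the model sees both gendered forms during training.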
When I proposed an Arabic version of this project, Wissam Antoun recommended a paper ‘Automatic Gender Identification and Reinflection in Arabic’. The authors at NYU Abu Dhabi released their parallel gender corpus adapted from OpenSubtitles. …
When I recently chatted with members of a fiction-generation AI startup, one of my pre-written questions was whether a model could be trained for a specific non-fiction location. This is based on my idea for a model trained on the AskNYC subreddit.
Note: I wrote up my experience, thoughts, and conversations on the days when I received doses of vaccine, and when I reached statistical immunity. Nothing dramatic happened, but I wanted to have a contemporary account to look back on later.
Compared to others in the US, I was lucky to receive my first dose when I did. In other countries, where access is still being negotiated and a new wave of infections is starting, the wait is even longer. I know this isn’t a great time to read about vaccinations. Sorry.
For people who read this and feel that your personal…
In these first four months of 2021, a few researchers have used my language models or mentioned me in their papers. I feel encouraged and validated to be part of someone else’s work. Sharing models on HuggingFace made it possible for them to continue that work and extend it into new areas.
I’m confident that the next year of papers from researchers in their native languages will greatly surpass my 2020 work.
M. M. Rahman, M. Aktaruzzaman Pramanik, R. Sadik, M. Roy and P. Chakraborty
Compared Multilingual BERT and Bangla-Electra model on three news topic classification tasks: BARD, OSBC, and…
I’m working through the books which I’ve been carrying across the country over the past several months. Soon I’m moving to Colorado and getting my second dose of vaccine. Though COVID has not disappeared here or elsewhere in the world, I’m retiring the title “Pandemic Reads”. Hopefully I can make reading and recommending books a more regular part of my life.
What do I think about reviewing 19 books during a pandemic year? I feel like a determined reader could read two or three books a month and still be thoughtful on Goodreads. So I under-delivered here; I watched…
Facebook recently published “Casual Conversations”, a new benchmark of 45,000 short videos of paid actors, with diversity along speakers’ age, gender, and skin color (Facebook classifies these speakers on the Fitzpatrick skin-type scale rather than by race or ethnicity).
While downloading the dataset, I was listening to a New Naratif podcast about hijab rules in Singapore, and wondered whether Facebook included any hijab-wearing women in a diversity dataset. To make the question applicable to a wider audience, my research question is: how many speakers in the dataset wore a hat or head covering? What types are included in the dataset? …
Arizona opened COVID vaccine registration at their state-run facilities to all adults as of March 24th. By the end of the day, all major locations available in Maricopa County (which includes Phoenix) had updated their websites, too.
I searched for vaccines at CVS, Walgreens, Safeway, and Fry’s Food and Drug. A particular wrinkle in my schedule is that I’m moving, so I’m looking for either the J&J one-dose vaccine, a second dose by April 17th, or a second dose in Colorado (which opens to all adults in mid-April).
I was skimming through Twitter when an unrelated post about keras-tuner got me thinking: should we use hyperparameter tuning for ML fairness? And why was my immediate reaction so negative?
I don’t intend to resolve the question here, but I think arguments could be made for either side, which would make it a good ML interview question.
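Whatever the right answer, here is a minimal sketch of what “tuning for fairness” could mean: searching a hyperparameter (here, just the decision threshold) against an objective that combines accuracy with a demographic-parity penalty. The data, penalty weight, and grid are invented for illustration, not drawn from keras-tuner:

```python
# Toy "fairness-aware tuning": pick a decision threshold that maximizes
# accuracy minus a demographic-parity penalty. All numbers are made up.

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]  # model confidence
labels = [1,   1,   0,   1,   0,   1,   0,   0]     # ground truth
groups = ["a", "b", "a", "b", "a", "b", "a", "b"]   # protected attribute

def evaluate(threshold: float, penalty: float = 1.0) -> float:
    """Combined objective: accuracy minus the demographic-parity gap."""
    preds = [1 if s >= threshold else 0 for s in scores]
    accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)

    def rate(g: str) -> float:
        # Fraction of group g receiving a positive prediction.
        members = [p for p, gr in zip(preds, groups) if gr == g]
        return sum(members) / len(members)

    parity_gap = abs(rate("a") - rate("b"))
    return accuracy - penalty * parity_gap

# Grid-search the threshold, the way a tuner would search any hyperparameter:
best = max([t / 10 for t in range(1, 10)], key=evaluate)
```

The uneasy part, and one side of the interview argument, is that the tuner will happily trade accuracy for a smaller parity gap (or vice versa) depending entirely on the arbitrary `penalty` weight.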