After looking back at the start of Covid from early July, and checking in about ‘things that suck’ in late October, I realized another 4 months would cover a full year, and here we are (almost).

An effective vaccine changed everything

In my October post, I didn’t even mention vaccines and asked if I believed my family could stay covid-free for another year. One older relative had asked if she would ever see the world go back to normal. Within a few weeks, Moderna and Pfizer trials proved >90% effective (much much better than expected). By the end of the year, we were discussing if…


from quarantine

I’m way behind on reading books for reviews, but here are some TV options:

  • Counterpart (2 seasons, 2017–2019)
    I didn’t do a full quarantine rewatch of this sci-fi drama series, but there are certain episodes which I think about often. There are elements about what decisions and events make us who we are. The villains get too much free rein in Season 2, but we do get a reasonable series conclusion.
  • Little Things (3 seasons on Netflix, 2016–2019)
    This is an Indian romantic comedy which started as a YouTube series. Watch with original audio and English subtitles (for frequent Hindi…

In May 2020, I posted a project where I used spaCy and BERT to “flip” gender in Spanish sentences (un profesor viejo <-> una profesora vieja). This was useful for evaluating models’ biases or augmenting training data, but it was slow and depended on hardcoded variables in my script. At the time, I suggested the next step would be a neural network model (seq2seq) of the kind often used to translate or summarize text.
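To make the idea of a “flip” concrete, here is a minimal sketch in Python. It is not the spaCy and BERT approach from the project: it assumes a tiny hand-written lookup table (the hypothetical `FLIP` dictionary) that covers only the example phrase, which also shows why hardcoded rules do not scale.

```python
# Toy lookup table (an assumption for illustration, not the project's data).
FLIP = {"un": "una", "profesor": "profesora", "viejo": "vieja"}
# Add the reverse direction so the swap works both ways.
FLIP.update({feminine: masculine for masculine, feminine in list(FLIP.items())})


def flip_gender(sentence):
    """Swap each known word for its counterpart; leave unknown words alone."""
    return " ".join(FLIP.get(word, word) for word in sentence.split())


print(flip_gender("un profesor viejo"))
print(flip_gender("una profesora vieja"))
```

Any word missing from the table passes through unchanged, so a real system needs either a much larger lexicon or a model that learns agreement rules.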

In addition to bias and data, I’ve collected more reasons to use counterfactuals in any language:

  • train chatbots on an equal selection of messages where the user…


One of the first steps in an NLP pipeline is dividing raw text into words or word-pieces, known as tokens. But what if you don’t have spaces to divide sentences into words?

Background

Sample Thai text from Wikipedia

People do write some spaces in Thai text, as you can see above, but they aren’t universal as they are in English. There is also no set punctuation to end a Thai sentence. This can cause confusion, or poetry, but humans are good at separating words and sentences in context. The difficult part, then, is getting computers to pick up on that context.
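A short Python sketch shows the problem. Splitting on whitespace works for an English sentence but returns a Thai phrase as one long token. The second function is a greedy longest-match segmenter, a classic dictionary-based baseline rather than the method this post goes on to describe; the `TOY_VOCAB` word list and the sample phrase are assumptions chosen for illustration.

```python
def whitespace_tokenize(text):
    """Split on whitespace; enough when words are space-delimited."""
    return text.split()


def longest_match_tokenize(text, vocab):
    """Greedy left-to-right longest-match segmentation against a word list."""
    tokens = []
    i = 0
    max_len = max(len(word) for word in vocab)
    while i < len(text):
        # Try the longest candidate first, shrinking until a vocab word matches.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Unknown character: emit it alone so the loop always advances.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical mini-dictionary: "I", "love", "language", "Thai", "Thai language".
TOY_VOCAB = {"ฉัน", "รัก", "ภาษา", "ไทย", "ภาษาไทย"}

english = "I love the Thai language"
thai = "ฉันรักภาษาไทย"

print(whitespace_tokenize(english))
print(whitespace_tokenize(thai))
print(longest_match_tokenize(thai, TOY_VOCAB))
```

Longest-match depends entirely on the dictionary and cannot use context to resolve ambiguous splits, which is the gap that learned models try to fill.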

When I first heard about this text-parsing…


A small dataset affects four ML models differently

Recently I posted a benchmark summary for three Bangla language models and one multilingual model (Indic-BERT). I’ve bolded any models within 1 percentage point of the top score.

Indic-BERT and my own ELECTRA model performed well on Sentiment Analysis and News Topics, but notably worse on Hate Speech classification, not matching mBERT. What makes this task so difficult, and why does it affect models differently?

Experiment 1: Revised Dataset

When I shared my results, the Indic-BERT team asked some questions and I went back to my original source for the data. …


Why do most of my projects focus on NLP? It’s partly because that’s where I’ve done the most reading of blogs, code, and papers (most papers have actually been pre-prints posted on arXiv.org). This has been a steep learning curve because when I worked as a web developer, I was never expected to look through research papers. Recently I’ve tried to read papers in other subfields of machine learning, on some more beginner-friendly topics.

Best places to start

The #1 piece of advice I ever saw about reading a pre-print is to plan on reading in multiple passes. Huh? But allowing myself to skim…


From video games to overlooking government tech

In the summer of 2016 I started meeting UBI enthusiasts through the civic tech world. In 2018, when Andrew Yang wrote “The War on Normal People” and made headlines by saying he was running for president, I ordered a copy and a “universal basic income” T-shirt (it has a colorful bar chart). Everyone knew the campaign was a long shot, but everyone in the /r/basicincome world was excited to see the mission on TV, promoted by a new, younger voice in a field of exhaustingly familiar politicians. …


On a proposal rejected from a NeurIPS workshop

I’ve been posting here about machine learning / NLP projects for about 18 months now. This fall, after landing a contract, I resolved that my next milestone would be a paper in a major academic conference by NeurIPS 2021. But then I noticed that some 2020 workshops were still open for submissions. Could I get this done a year ahead of time?
I reached out to a few people and eventually recruited a human rights lawyer from the AI & International Law class, and a recent graduate who’s been making his own Indian NLP models. …


In July I wrote “soon this can end and we can all meet” to remember everything I could about the early days of the pandemic, ending travel, following NextStrain, and quarantining.
Now, almost another four months gone, I live in another state with an apartment overlooking the ocean, I have a new freelance gig (alongside my old job), and every day I update a site with the latest voter records from Texas.

But there’s still a pandemic, everywhere.

To escape winter, I left town again. I’d read all about Barbados and Bermuda’s remote work visas, but the costs are considerable. To…


Stories from soybeans and refugees

The Government of Beans: Regulating Life in the Age of Monocrops (Kregg Hetherington, 2020)

This is an in-depth study of large-scale soy farming, and its impact on Paraguayan government and several social groups. I’d compare this book to an NPR story that comes out of the blue and you listen for an hour — some facts and figures, some historical context, some personal narrative as the author moves around and meets sources. I heard about this book through the small network of liberal farming Twitter threads which had also led me to read Uncertain Harvest months ago.

What emerges is a wicked problem, where soy is so profitable that it expands to the limits…

Nick Doiron

Nomadic web developer and mapmaker.
