After looking back in early July at the start of Covid, and checking in about ‘things that suck’ in late October, I realized that another four months would make a full year, and here we are (almost).
In my October post, I didn’t even mention vaccines, and I asked whether I believed my family could stay covid-free for another year. One older relative had asked if she would ever see the world go back to normal. Within a few weeks, the Moderna and Pfizer vaccines proved >90% effective in trials (much, much better than expected). By the end of the year, we were discussing if…
I’m way behind on reading books for reviews, but here are some TV options:
In May 2020, I posted a project where I used spaCy and BERT to “flip” gender in Spanish sentences (un profesor viejo <-> una profesora vieja). This was useful for evaluating models’ biases or augmenting training data, but it was slow and depended on hardcoded variables in my script. At the time, I suggested the next step would be a sequence-to-sequence (seq2seq) neural network, the kind of model often used to translate or summarize text.
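To make the “hardcoded variables” limitation concrete, here is a toy sketch of a lookup-based flip. It is not the spaCy and BERT script from the original project; the `SWAPS` table and the `flip_gender` function are hypothetical names invented for this illustration, and the table covers only the example phrase.

```python
import re

# Hypothetical hand-written swap table; a real script would need far more
# entries, which is exactly why this approach does not scale.
SWAPS = {"un": "una", "profesor": "profesora", "viejo": "vieja"}
# Add the reverse direction so the flip works both ways.
SWAPS.update({feminine: masculine for masculine, feminine in list(SWAPS.items())})


def flip_gender(sentence):
    """Swap each known word for its counterpart; leave unknown words alone."""

    def swap(match):
        word = match.group(0)
        flipped = SWAPS.get(word.lower(), word)
        # Preserve a leading capital letter (e.g. at the start of a sentence).
        return flipped.capitalize() if word[0].isupper() else flipped

    return re.sub(r"\w+", swap, sentence)


print(flip_gender("un profesor viejo"))
print(flip_gender("Una profesora vieja"))
```

Any word missing from the table passes through unchanged, so agreement breaks as soon as a sentence leaves the hardcoded vocabulary.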
In addition to bias and data, I’ve collected more reasons to use counterfactuals in any language:
One of the first steps in an NLP pipeline is dividing raw text into words or word-pieces, known as tokens. But what if you don’t have spaces to divide sentences into words?
People do write some spaces in Thai text, as you can see above, but they aren’t universal as they are in English. There is also no set punctuation to end a Thai sentence. This can cause confusion, or poetry, but humans are good at separating words and sentences from context. The difficult part, then, is getting computers to pick up on that context.
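One classic baseline for text without spaces is greedy longest-match segmentation against a word list. The sketch below is an illustration only: it uses English with the spaces removed as a stand-in for Thai, and `segment_longest_match` and `TOY_VOCABULARY` are made-up names, not part of any library.

```python
def segment_longest_match(text, vocabulary):
    """Greedy left-to-right segmentation: at each position, take the
    longest vocabulary word that matches."""
    max_len = max(len(word) for word in vocabulary)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shrink.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in vocabulary:
                tokens.append(candidate)
                i += length
                break
        else:
            # No vocabulary word starts here: emit one character and move on.
            tokens.append(text[i])
            i += 1
    return tokens


# Toy word list, assumed for this example only.
TOY_VOCABULARY = {"the", "cat", "cats", "sat", "at"}

print(segment_longest_match("thecatsat", TOY_VOCABULARY))
# → ['the', 'cats', 'at']
```

The greedy pass returns “the cats at” rather than “the cat sat”. Both are valid splits by the word list, and only context tells them apart, which is the part humans do easily and a dictionary lookup cannot.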
Recently I posted a benchmark summary for three Bangla language models and one multilingual model (Indic-BERT). I’ve bolded any models within 1 percentage point of the top score.
Indic-BERT and my own ELECTRA model performed well on Sentiment Analysis and News Topics, but notably worse on Hate Speech classification, not matching mBERT. What makes this task so difficult, and why does it affect models differently?
When I shared my results, the Indic-BERT team asked some questions and I went back to my original source for the data. …
Why do most of my projects focus on NLP? It’s partly because that’s where I’ve done the most reading of blogs, code, and papers (most papers have actually been pre-prints posted on arXiv.org). This has been a steep learning curve because when I worked as a web developer, I was never expected to look through research papers. Recently I’ve tried to read papers in other subfields of machine learning, on some more beginner-friendly topics.
The #1 piece of advice I ever saw about reading a pre-print is to plan on reading in multiple passes. Huh? But allowing myself to skim…
In summer of 2016 I started meeting UBI enthusiasts through the civic tech world. In 2018 when Andrew Yang wrote “The War on Normal People” and made headlines by saying he was running for president, I ordered a copy and a “universal basic income” T-shirt (it has a colorful bar chart). Everyone knew the campaign was a long shot, but everyone in the /r/basicincome world was excited to see the mission on TV, promoted by a new, younger voice in a field of exhaustingly familiar politicians. …
I’ve been posting here about machine learning / NLP projects for about 18 months now. This fall, after landing a contract, I resolved that my next milestone would be a paper in a major academic conference by NeurIPS 2021. But then I noticed that some 2020 workshops were still open for submissions. Could I get this done a year ahead of time?
I reached out to a few people and eventually recruited a human rights lawyer from the AI & International Law class, and a recent graduate who’s been making his own Indian NLP models. …
In July I wrote “soon this can end and we can all meet” to remember everything I could about the early days of the pandemic, ending travel, following NextStrain, and quarantining.
Now, with almost another four months gone, I live in another state in an apartment overlooking the ocean, I have a new freelance gig (alongside my old job), and every day I update a site with the latest voter records from Texas.
But there’s still a pandemic, everywhere.
To escape winter, I left town again. I’d read all about Barbados and Bermuda’s remote work visas, but the costs are considerable. To…
This is an in-depth study of large-scale soy farming and its impact on the Paraguayan government and several social groups. I’d compare this book to an NPR story that comes out of the blue and holds you for an hour: some facts and figures, some historical context, some personal narrative as the author moves around and meets sources. I heard about this book through the small network of liberal farming Twitter threads that had also led me to read Uncertain Harvest months ago.
What emerges is a wicked problem, where soy is so profitable that it expands to the limits…
Nomadic web developer and mapmaker.