What makes this Bengali NLP task so difficult?

A small dataset affects four ML models differently

Recently I posted a benchmark summary comparing three Bangla language models and one multilingual model (Indic-BERT). In the results, I've bolded any models within 1 percentage point of the top score.

Experiment 1: Revised Dataset

When I shared my results, the Indic-BERT team asked some questions, so I went back to my original source for the data. The designated train and test CSVs had recently been replaced with a single 'revised' CSV.
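With only one CSV to work from, the train/test split has to be re-created locally. Here is a minimal sketch of how that could look; the filename, label column name, split ratio, and seed are all assumptions for illustration, not the original pipeline.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the single 'revised' CSV (hypothetical filename).
df = pd.read_csv("revised.csv")

# Stratify on the label column (hypothetical name) so class balance
# is preserved in both splits; a fixed seed makes the split reproducible.
train_df, test_df = train_test_split(
    df,
    test_size=0.25,         # assumed split ratio
    stratify=df["label"],
    random_state=42,
)

train_df.to_csv("train.csv", index=False)
test_df.to_csv("test.csv", index=False)
```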

Experiment 2: All Small

Even with the new CSV, I noticed that Hate Speech is the smallest dataset (1,400 rows, with only 1,050 for training). I can't experiment with making this dataset larger (maybe data augmentation another day), so I wondered: what would happen if the other training datasets were downsampled to the same size? That led to my second experiment, sketched below.
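A simple way to run that experiment is to cap every training set at 1,050 rows before fine-tuning. The snippet below is a rough sketch under assumed filenames and dataset names (none of them come from the original post), just to show the downsampling step.

```python
import pandas as pd

HATE_SPEECH_TRAIN_SIZE = 1050  # smallest training set in the benchmark

# Hypothetical dataset names; substitute the actual training CSVs.
for name in ["sentiment", "news_topics", "authorship"]:
    df = pd.read_csv(f"{name}_train.csv")
    # Sample without replacement, with a fixed seed so the subsample
    # is the same across fine-tuning runs.
    small = df.sample(n=HATE_SPEECH_TRAIN_SIZE, random_state=42)
    small.to_csv(f"{name}_train_small.csv", index=False)
```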

Updates?

This article was written in November 2020. For currently recommended models such as MuRIL, I will keep this README up to date: https://github.com/mapmeld/use-this-now/blob/main/README.md#south-asian-language-model-projects
