Jul 18, 2020
Uh-oh, I used the shuffled/deduplicated download from their site. I'm glad that I documented the process, then, and thankful for your response. I was planning to retrain the Hindi model in the near future - I will ask the OSCAR team for the unshuffled data to measure the improvement.