ML Arxiv Haul #6

  • A 2016 classifier (pre-BERT) is used to label Tweets from previously unlabeled sources as African-American English (AAE) or Standard American English (SAE).
  • The classifier labels one hate speech dataset (Davidson 2017) as 70% AAE, even though it is a general-purpose hate speech dataset.
    It labels another hate speech dataset (HateXplain 2021) as ~10% AAE. That dataset includes content from Twitter and Gab (a right-wing network).
  • I’m concerned about whether Tweets are de-duplicated across datasets, unless BERT is being fine-tuned and evaluated on each dataset individually.
  • By 2022 standards, BERT is a small model.
  • The end goal of ColBERT models is document retrieval: instead of showing the answer to a search input, or the sentence containing that answer, it returns a full relevant document. This means that you might want to split the original document into several passages for indexing.
  • ColBERT is its own architecture: you would either start pre-training from scratch or use the existing weights for English.
  • The examples help you get set up with the existing weights and an existing index, but I wasn’t able to figure out how to start from scratch in another language.
  • Text cues and image-text examples are ‘cherry-picked’.
  • DALLE-2 text includes imagined characters. Consider the image included in my Tweet above: how can the lower lines be transcribed?
  • The vocabulary (other than Apoploe) is difficult to reproduce.
    Apoploe works because the model is confused and guesses that it is a bird species name (Latin).
  • The given words may trigger neurons inside of DALLE, but they are random noise. This appeals to linguists who want to talk about what even counts as a language, and how it is not bird+bug. It is still bad news for text filters on a model.
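
On the ColBERT note above about splitting a document into several passages before indexing, here is a minimal sketch of one way to do that. It is not ColBERT's own preprocessing: the helper name split_into_passages, the word-based window, and the max_words and overlap defaults are all my assumptions for illustration.

```python
def split_into_passages(text, max_words=100, overlap=20):
    """Split text into overlapping windows of whitespace-separated words.

    Hypothetical helper: the window size and overlap are illustrative
    defaults, not values taken from ColBERT.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")

    words = text.split()
    step = max_words - overlap  # how far each window advances
    passages = []
    for start in range(0, len(words), step):
        passages.append(" ".join(words[start:start + max_words]))
        # Stop once this window has reached the end of the document,
        # so we do not emit a trailing passage that is pure overlap.
        if start + max_words >= len(words):
            break
    return passages


if __name__ == "__main__":
    document = " ".join(f"word{i}" for i in range(250))
    for i, passage in enumerate(split_into_passages(document)):
        print(i, len(passage.split()), passage[:40])
```

The overlap means a sentence that straddles a boundary still appears whole in at least one passage, at the cost of a somewhat larger index. Each passage would then be indexed separately and mapped back to its source document at query time.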

Web->ML developer and mapmaker.

Nick Doiron