What did Saudi info ops Tweet?

On December 20, Twitter released data on nearly 6,000 accounts that it connected to Saudi information operations and disinformation. The 4.3GB zipfile of text (plus over a terabyte of media) is Twitter’s largest disclosure, dwarfing the 1.2GB text dataset from Russia that it released last year.

Back when I released a dataset of political Tweets, I heard whispers that bot networks were in part driven by Saudi Arabia, including US politics accounts. With a sample of their work now in hand, I wanted to see what I could learn.

1. Language breakdown of Tweets

I used Twitter’s tweet_language column as my source of language.
93% of Tweets were Arabic, and the second highest category, ‘undefined’, was another 4%. The next 15 languages take up nearly 3% of the dataset.

English is the most common language apart from Arabic, but it is a small part (1.5%) of the full dataset.

Disclaimer: I used Pandas groupby and + to combine per-language counts from multiple CSVs. If any one CSV contained no Tweets in a language, the sum for that language became NaN, so I don’t have accurate counts for those languages.
From what I could see, these languages are truly infrequent. Also, any language outside of the top 10–15 is likely mis-categorized. For example, there are 440 ‘Icelandic’ Tweets with content like ‘Skskskskksk’ or app check-ins from Spain.
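The NaN problem in the disclaimer can be avoided by adding the per-file counts with a fill value. This is a minimal sketch rather than the code used for this post: it assumes each CSV has been loaded into a DataFrame with Twitter’s tweet_language column, and the function name and toy data are hypothetical.

```python
from functools import reduce

import pandas as pd


def language_counts(frames):
    """Sum per-language Tweet counts across several DataFrames.

    Series.add(..., fill_value=0) treats a language that is missing
    from one file as 0 instead of NaN, so its running total survives.
    """
    per_file = [df.groupby("tweet_language").size() for df in frames]
    total = reduce(lambda a, b: a.add(b, fill_value=0), per_file)
    return total.astype(int).sort_values(ascending=False)


# Toy stand-ins for two of the CSV files (hypothetical data).
part1 = pd.DataFrame({"tweet_language": ["ar", "ar", "en"]})
part2 = pd.DataFrame({"tweet_language": ["ar", "is"]})

counts = language_counts([part1, part2])
print(counts)
```

With a plain + between the two grouped Series, ‘en’ and ‘is’ would both come out as NaN because each appears in only one file.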

Original Content and Retweets

Arabic and ‘undefined’ remain dominant.
Tweets in Russian, Japanese, and Ukrainian are almost all original (slightly over 90%), while Portuguese and German fall in the ranking once retweets are excluded (they are only 33% and 23% original).
Tweets in Portuguese notably used phrases like ‘📣 Projeto Follow Trick ™ 📣’ and ‘To Gain Followers Follow Me’, which looks more like building up a follower network than running a disinformation campaign.
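The original-versus-retweet percentages per language could be computed along these lines. This is a sketch under assumptions: it supposes the CSVs have an is_retweet column alongside tweet_language (check the readme that ships with the dataset), and the function name and toy data are hypothetical.

```python
import pandas as pd


def original_share(df):
    """Percent of Tweets per language that are not retweets."""
    # The flag may be read as a bool or as the text "True"/"False",
    # depending on how the CSV was loaded; normalize both cases.
    is_rt = df["is_retweet"].astype(str).str.lower().eq("true")
    share = (~is_rt).groupby(df["tweet_language"]).mean() * 100
    return share.sort_values(ascending=False)


# Toy data (hypothetical): 9 of 10 'ru' Tweets and 1 of 4 'de' Tweets are original.
toy = pd.DataFrame({
    "tweet_language": ["ru"] * 10 + ["de"] * 4,
    "is_retweet": [False] * 9 + [True] + [False] + [True] * 3,
})

share = original_share(toy)
print(share.round(1))
```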

‘Korean’ Tweets were almost all actually Arabic retweets, labeled incorrectly because of newlines, emojis, and Korean quotation marks. Here’s a genuine (not info-ops) Tweet which was Retweeted by these accounts and labeled Korean:

Code:

2. Who were these Tweets for? Most popular mentions

In Arabic:
@AdelAliBinAli (Chairman of Ali Bin Ali Group / Holding in Qatar)
@Turki_alalshikh (Saudi Royal Court Advisor, Chairman of the General Entertainment Authority)

In English:
Not only were English Tweets less common than you might think, but the most-mentioned accounts were not related to US politics either.
The top accounts were focused on gaining social media followers, with the tag “#H0MEL3ND”. @LIONxCLAW and @AdryDrive get 241 Retweets on posts like this:

In Russian:

LARRY_B_CAMAPE (social media manager), kvazdopil (photojournalist), vanlovenowS (protected), and varlamov (journalist).
Unlike other languages, only one of the top five accounts was a suspended info ops account.

In Japanese: AizWalenstein and Lv10000_

In Turkish: birsorubinanlam and nededilanbu

In Persian: When I reviewed the mentions in Tweets that Twitter labeled ‘Persian’, the top accounts were mostly Saudi-based accounts posting in Arabic: TopNewsSA featuring news, and SR___74 and 44__kk showing artsy photos and music. I would need to review these mentions further to say more.

Code:

https://gist.github.com/mapmeld/0820a6f83843e1c8266c280ec688761b
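Separately from the gist, here is a minimal sketch of how the most-mentioned accounts per language might be counted. It is not the author’s method: it assumes the Tweet body is in a tweet_text column, pulls @handles out with a regular expression, and uses hypothetical toy data.

```python
import re
from collections import Counter

import pandas as pd

# Twitter handles are ASCII letters, digits, and underscores, up to 15 characters.
MENTION = re.compile(r"@([A-Za-z0-9_]{1,15})")


def top_mentions(df, language, n=5):
    """Return the n most-mentioned handles among Tweets in one language."""
    texts = df.loc[df["tweet_language"] == language, "tweet_text"].dropna()
    counter = Counter()
    for text in texts:
        # Handles are case-insensitive, so count them in lowercase.
        counter.update(name.lower() for name in MENTION.findall(text))
    return counter.most_common(n)


# Toy data (hypothetical Tweets).
toy = pd.DataFrame({
    "tweet_language": ["en", "en", "ar"],
    "tweet_text": [
        "RT @LIONxCLAW: follow back #H0MEL3ND",
        "@LIONxCLAW @AdryDrive thanks",
        "@Turki_alalshikh ...",
    ],
})

top_en = top_mentions(toy, "en")
print(top_en)
```

A regular expression this simple also picks up the part after the @ in email addresses, so real results would need a quick manual review.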

3. Does my AOC Reply Dataset include actors?

Twitter’s disclosures include username hashes, not original usernames, unless the account had many followers, so I couldn’t match accounts against my dataset directly. Instead I filtered the Saudi Tweets with a simple cat tweets.csv | grep -i '@aoc' (and again for RepAOC). She’s mentioned only 20–30 times, mostly to criticize Houthis in Yemen or praise a video, but one user replied to this conversation to recommend La Chula:

“La Chula in Spanish Harlem is an excellent option as well”

Disinfo bots have opinions on tacos, too 🌮 … who knew?
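The grep filter from this section can also be sketched in pandas, which keeps the other columns attached to each match. This assumes the Tweet body is in a tweet_text column; the function name and toy data are hypothetical. Unlike grep -i '@aoc', the pattern here stops at a word boundary, so a handle such as @aoc_fan is not counted.

```python
import pandas as pd


def mentions_aoc(df):
    """Keep rows whose text mentions @AOC or @RepAOC, ignoring case."""
    pattern = r"@(?:rep)?aoc\b"
    mask = df["tweet_text"].str.contains(pattern, case=False, regex=True, na=False)
    return df[mask]


# Toy data (hypothetical Tweets, including a missing text value).
toy = pd.DataFrame({
    "tweet_text": [
        "Thanks @AOC!",
        "hello @RepAOC",
        "@aoc_fan hi",
        None,
        "no mention here",
    ],
})

matches = mentions_aoc(toy)
print(len(matches))
```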

Updates?
