Alif: an Arabic word bank

Resources for puzzles and word games

Two years ago, I made a multilingual crossword puzzle generator and wrote a post about it: Crosswords in Burmese. Building a puzzle took users a frustrating amount of manual work, so I made wiki-crossword, which grabs random articles from Wikipedia and generates puzzles and clues from them.

When I added Arabic script, handling right-to-left text in the game was an interesting problem, but there were other roadblocks that I hadn’t expected. Let me put them in context with word games and puzzles:

  • A simple game encourages you to fill in a missing letter. What does the partial word look like? Suppose you are removing it from العَرَبِيَّة.
    Take out one character and you will see العَرَبِ_ة; by shaping the neighboring letters, we can preserve a more natural العَرَ بـ_ـة
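Here is a minimal sketch of the missing-letter idea in TypeScript. It is not Alif-Toolkit's API: `hideLetter`, `toClusters`, and the joining tables are hypothetical names for illustration, and the tables cover only the basic Arabic letters. The idea is to put a tatweel (U+0640) next to the blank wherever the removed letter would have joined its neighbor, so the neighbors keep their connected shapes.

```typescript
const TATWEEL = "\u0640";
// Letters that join only to the previous letter (partial list, basic Arabic only):
// alef variants, waw with hamza, teh marbuta, dal, thal, reh, zain, waw.
const RIGHT_JOINING_ONLY = "\u0622\u0623\u0624\u0625\u0627\u0629\u062F\u0630\u0631\u0632\u0648";
// Standalone hamza joins to nothing.
const NON_JOINING = "\u0621";

// Short vowels, shadda, sukun, and similar combining marks.
function isMark(ch: string): boolean {
  return /[\u064B-\u065F\u0670]/.test(ch);
}

// Group each base letter with the marks that follow it.
function toClusters(word: string): string[] {
  const clusters: string[] = [];
  for (const ch of word.split("")) {
    if (isMark(ch) && clusters.length > 0) {
      clusters[clusters.length - 1] += ch;
    } else {
      clusters.push(ch);
    }
  }
  return clusters;
}

// Assumes the input is Arabic letters; other characters are not checked.
function joinsToNext(letter: string): boolean {
  return letter !== "" &&
    RIGHT_JOINING_ONLY.indexOf(letter) === -1 &&
    NON_JOINING.indexOf(letter) === -1;
}

function joinsToPrevious(letter: string): boolean {
  return letter !== "" && NON_JOINING.indexOf(letter) === -1;
}

// Replace the letter at `index` (counted in clusters) with a blank,
// adding tatweels so the neighbors keep their joined forms.
function hideLetter(word: string, index: number, blank: string = "_"): string {
  const clusters = toClusters(word);
  if (index < 0 || index >= clusters.length) {
    throw new RangeError("index out of range");
  }
  const removed = clusters[index].charAt(0);
  const prev = index > 0 ? clusters[index - 1].charAt(0) : "";
  const next = index < clusters.length - 1 ? clusters[index + 1].charAt(0) : "";
  const before = clusters.slice(0, index).join("");
  const after = clusters.slice(index + 1).join("");
  const left = joinsToNext(prev) && joinsToPrevious(removed) ? TATWEEL : "";
  const right = joinsToNext(removed) && joinsToPrevious(next) ? TATWEEL : "";
  return before + left + blank + right + after;
}

// al-ʿarabiyyah with vowel marks; cluster 5 is the yeh with shadda and fatha.
const word = "\u0627\u0644\u0639\u064E\u0631\u064E\u0628\u0650\u064A\u0651\u064E\u0629";
console.log(hideLetter(word, 5));
```

Unlike the hand-made example above, this sketch keeps the vowel mark on the letter before the blank; whether to show or strip marks is a game design choice.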

A toolkit and a database

A game developer doesn’t want to stop and parse DBpedia, WikiData, and the Unicode spec before starting to write their game. So I made a thing:

Alif-Toolkit is a TypeScript library that supports all of those functions (and normalization) for any letters in Arabic’s Unicode blocks. All other libraries that I know of are GPL-licensed, so I pored over PDFs and hex codes to make mine both comprehensive and MIT-licensed.

Alif Word Bank is something that I only got running today, but it uses the toolkit, excerpts of articles, and category names to break down words in several ways. Here’s a Persian article on cookies which you can get as a response:

[Image: my concept for showing RTL JSON responses]

Here’s a sample API request which returns names of birds in Persian (Farsi = “fa”), limited to topics that also have an article on the Simple English Wikipedia (this could help cut down on obscure topics, or give you an easy-to-read resource):
alif-word-bank.herokuapp.com/topic/fa/en:bird?inSimple=true&count=20
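If you want to build that request in code, a small helper could look like the sketch below. The path and the `inSimple` and `count` parameters come from the sample URL above; the `https://` scheme, the `buildTopicUrl` name, and the options object are my assumptions, and the shape of the JSON response is not shown here.

```typescript
// Assumption: the service is reachable over HTTPS at this host.
const BASE = "https://alif-word-bank.herokuapp.com";

interface TopicOptions {
  inSimple?: boolean; // only topics with a Simple English Wikipedia article
  count?: number;     // how many words to return
}

// Hypothetical helper: builds /topic/<language>/<topic>?inSimple=...&count=...
function buildTopicUrl(lang: string, topic: string, opts: TopicOptions = {}): string {
  const params: string[] = [];
  if (opts.inSimple !== undefined) {
    params.push(`inSimple=${opts.inSimple}`);
  }
  if (opts.count !== undefined) {
    params.push(`count=${opts.count}`);
  }
  const query = params.length > 0 ? `?${params.join("&")}` : "";
  return `${BASE}/topic/${lang}/${topic}${query}`;
}

const birdsUrl = buildTopicUrl("fa", "en:bird", { inSimple: true, count: 20 });
console.log(birdsUrl);
```

The resulting string can be passed to `fetch` or any HTTP client.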

DBpedia and WikiData, working together

My DBpedia parser is a little marvel. First I pull in their list of all animals, then use a forEach to look up thousands of entries. I quickly got blocked by the server, so I added a random timeout to each call, spacing the requests out to about 0.8 seconds per animal.
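The random-timeout trick can be sketched like this. It is one possible way to do it, not the parser's actual code: `staggeredDelays` and `lookUpAll` are hypothetical names, and the choice of a gap between half and one and a half times the average is my assumption.

```typescript
// Returns a start time (in ms) for each call. Each gap is random,
// between 0.5x and 1.5x the average, so the mean gap is averageMs.
function staggeredDelays(
  count: number,
  averageMs: number = 800,
  random: () => number = Math.random
): number[] {
  const delays: number[] = [];
  let elapsed = 0;
  for (let i = 0; i < count; i++) {
    elapsed += averageMs * (0.5 + random());
    delays.push(Math.round(elapsed));
  }
  return delays;
}

// Still a forEach, but each lookup waits for its own timeout
// instead of firing all at once.
function lookUpAll<T>(items: T[], lookUp: (item: T) => void): void {
  const delays = staggeredDelays(items.length);
  items.forEach((item, i) => {
    setTimeout(() => lookUp(item), delays[i]);
  });
}
```

Passing `random` as a parameter makes the schedule easy to test with a fixed value.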

DBpedia lists categories (in English), names in a handful of languages, and a WikiData ID. To get Persian and the full range of Chinese names (such as Simplified vs. Traditional), I also load the WikiData entity. This is also where I note whether there is a Simple English link.
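As a rough sketch of that second step, the function below reads names and the Simple English link out of a WikiData entity. WikiData's entity JSON keeps labels under `labels` keyed by language code and wiki links under `sitelinks` keyed by site (Simple English is `simplewiki`). The `namesFromEntity` name, the language list, and the sample entity are made up for illustration.

```typescript
// Only the parts of the entity JSON that this sketch reads.
interface WikidataEntity {
  labels?: { [lang: string]: { language: string; value: string } };
  sitelinks?: { [site: string]: { site: string; title: string } };
}

interface AnimalNames {
  names: { [lang: string]: string };
  inSimple: boolean;
}

// Assumption: these are the language codes of interest.
const WANTED_LANGUAGES = ["fa", "zh-hans", "zh-hant"];

function namesFromEntity(
  entity: WikidataEntity,
  languages: string[] = WANTED_LANGUAGES
): AnimalNames {
  const names: { [lang: string]: string } = {};
  const labels = entity.labels || {};
  for (const lang of languages) {
    const label = labels[lang];
    if (label) {
      names[lang] = label.value;
    }
  }
  const sitelinks = entity.sitelinks || {};
  return { names, inSimple: "simplewiki" in sitelinks };
}

// Made-up sample in the same shape, not a real API response.
const sampleEntity: WikidataEntity = {
  labels: {
    fa: { language: "fa", value: "گربه" },
    "zh-hans": { language: "zh-hans", value: "猫" },
    "zh-hant": { language: "zh-hant", value: "貓" },
  },
  sitelinks: {
    simplewiki: { site: "simplewiki", title: "Cat" },
  },
};

console.log(namesFromEntity(sampleEntity));
```

Not every entity has a label for every variant, so missing languages are simply left out of `names`.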

DBpedia also provides a short blurb in each of their languages, but I have different rules for my crossword blurbs, so I prefer parsing the article directly. For example, the article titled “hyperlink” starts with:

In computing, a hyperlink, or simply a link, is a reference to data that the reader can directly follow either by clicking or tapping.

I use the bold markup to hide both “hyperlink” and “link” from the player, which I would’ve missed if I simply did find-and-replace with the title.
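A minimal sketch of that step, assuming the article text is wikitext where bold is written with triple apostrophes (`'''hyperlink'''`). The `hideBoldTerms` name and the blank token are hypothetical, and the sketch ignores edge cases such as bold-italic markup.

```typescript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Collect every bolded term, strip the markup, then blank out
// each term wherever it appears in the blurb.
function hideBoldTerms(wikitext: string, blank: string = "_____"): string {
  const terms: string[] = [];
  let plain = wikitext.replace(/'''(.+?)'''/g, (_match: string, term: string) => {
    terms.push(term);
    return term;
  });
  // Longest first, so "hyperlink" is hidden before "link".
  terms.sort((a, b) => b.length - a.length);
  for (const term of terms) {
    plain = plain.replace(new RegExp(escapeRegExp(term), "gi"), blank);
  }
  return plain;
}

const blurb =
  "In computing, a '''hyperlink''', or simply a '''link''', is a reference to data.";
console.log(hideBoldTerms(blurb));
// → In computing, a _____, or simply a _____, is a reference to data.
```

Because the terms come from the markup rather than the title, alternate names like “link” are caught too.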

Again, nothing super-advanced, but it means a game developer won’t have to scrape and trudge through these sources to get words.

Future goals

  • More categories of articles, including Arabic and Persian category names
