Finalizing the CRUD ML app

Sep 22, 2019

I’ve been working on an incremental machine learning server with an Explainable AI tie-in, and NLP with FastText. Here’s the conclusion (or you can skip to the source code: github.com/mapmeld/crud-ml).

Identifying models and training data in a database

I’ve added PostgreSQL to my server, using the Python module psycopg2 to maintain the connection. When you first create a model with /create, /train/create, or /train_text/create, I record whether it is text-based and return a unique ID to refer to the model from then on.
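As a rough sketch of that registration step: the function below inserts a row for the new model and hands back its ID. The `models` table, its column names, and the use of `uuid` for the ID are assumptions for illustration, not necessarily what the repo does. With psycopg2 you would pass in `conn.cursor()` and call `conn.commit()` afterwards.

```python
import uuid


def register_model(cursor, is_text):
    """Record a new model and return the ID that clients use from then on.

    `cursor` is any DB-API cursor, e.g. one from psycopg2's conn.cursor().
    The `models` table and its columns are hypothetical names.
    """
    model_id = uuid.uuid4().hex  # 32-character random identifier
    # %s is psycopg2's placeholder style; values are passed separately
    # so the driver handles quoting.
    cursor.execute(
        "INSERT INTO models (model_id, is_text) VALUES (%s, %s)",
        (model_id, is_text),
    )
    return model_id
```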

I also upload all of your CSV rows into their own table using an asynchronous call to csvkit. Subsequent calls to /:model/train/insert keep adding to that table (as long as the table was created successfully).
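One way to make that call without blocking the request is to launch csvkit’s `csvsql` command as a background process. This is a sketch under that assumption; the repo may call csvkit differently. The function names, connection string, and table name are placeholders.

```python
import subprocess


def build_csvsql_command(db_url, table, csv_path, append=False):
    """Build the csvsql command line that loads a CSV file into a table."""
    cmd = ["csvsql", "--db", db_url, "--tables", table, "--insert"]
    if append:
        # --no-create skips CREATE TABLE, so rows go into the existing table
        cmd.append("--no-create")
    cmd.append(csv_path)
    return cmd


def start_upload(db_url, table, csv_path, append=False):
    """Start the load in the background and return the process handle."""
    # Popen returns immediately; call .poll() on the result later to check
    # whether the load has finished (0 means success).
    return subprocess.Popen(build_csvsql_command(db_url, table, csv_path, append))
```

The first upload would use `append=False` so the table is created, and later inserts would use `append=True` once the first process has exited successfully.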

You could use any database here, so long as it’s supported by csvkit.

Reviewing training data

This demo is barebones, so I decided to use the DataTables library (based on jQuery) to render rows from the training data table. There is some server-side code to support pagination and sorting.
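In server-side mode, DataTables sends `draw`, `start`, `length`, and `order[0][column]` / `order[0][dir]` with each request, and expects `draw`, `recordsTotal`, `recordsFiltered`, and `data` in the reply. The sketch below shows roughly how those parameters could map to a query. The function names and the 100-row cap are my own choices here, and the table name is assumed to be generated by the server rather than taken from user input.

```python
ALLOWED_DIRECTIONS = {"asc": "ASC", "desc": "DESC"}


def build_page_query(table, columns, args):
    """Turn DataTables server-side parameters into a paginated SQL query.

    `columns` is the ordered list of column names shown in the table, and
    `args` is the request's query-string mapping (e.g. Flask's request.args).
    """
    start = max(int(args.get("start", 0)), 0)
    # Clamp the page size to between 1 and 100 rows
    length = min(max(int(args.get("length", 10)), 1), 100)
    col_index = int(args.get("order[0][column]", 0))
    if not 0 <= col_index < len(columns):
        col_index = 0
    # Only accept known sort directions; anything else falls back to ASC
    direction = ALLOWED_DIRECTIONS.get(args.get("order[0][dir]", "asc"), "ASC")
    sql = 'SELECT * FROM "{}" ORDER BY "{}" {} LIMIT %s OFFSET %s'.format(
        table, columns[col_index], direction
    )
    return sql, (length, start)


def build_response(draw, total, rows):
    """Wrap rows in the JSON structure DataTables expects back."""
    return {
        "draw": int(draw),  # echoed back so DataTables can match replies
        "recordsTotal": total,
        "recordsFiltered": total,  # no search filter in this sketch
        "data": rows,
    }
```

The sort column is picked by index from a known list instead of being read as a name from the request, which keeps user input out of the SQL string.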

Saving time with word vectors during development

It takes a long time to load word vectors into memory, and I don’t want to wait every time I restart the Flask development server. I created a separate word-vector.py server that stays running on port 9000 and returns the word vector (or a dummy value) as needed.
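A minimal sketch of such a side server, using only the standard library: the real word-vector.py may be built differently. The `?word=` query parameter, the zero vector as the dummy value, and the 300-dimension size are assumptions. `VECTORS` would be filled once at startup from the FastText file.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

DIMENSIONS = 300  # assumed vector size
VECTORS = {}      # word -> list of floats, loaded once at startup


def lookup(word, vectors, dimensions):
    """Return the stored vector, or a zero vector as the dummy value."""
    return vectors.get(word, [0.0] * dimensions)


class VectorHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Expect requests such as /?word=clown
        query = parse_qs(urlparse(self.path).query)
        word = query.get("word", [""])[0]
        body = json.dumps(lookup(word, VECTORS, DIMENSIONS)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9000):
    """Run until interrupted; the Flask app can restart freely meanwhile."""
    HTTPServer(("127.0.0.1", port), VectorHandler).serve_forever()
```

Calling `serve()` once in its own terminal keeps the vectors in memory, and the Flask app fetches them over HTTP on each request.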

Viewing weights

I previously highlighted words in Tweets to show how they influenced the prediction, according to ELI5. I can do that here, too. This praise of NetflixMENA and horror movies sounded negative to our classifier, as it saw ‘slasher’ and ‘يهدد حياة الناس’.
Arabic FastText didn’t recognize ‘Clown’, and I haven’t helped it interpret any of the emojis, so they appear unhighlighted.

Changing weights

Now that we have our initial weights, it would be cool to customize these as a final ‘layer’ of the model. This could be considered cheating, as it passes over the model’s original weighting of the word, but it avoids frustrating the final deployer of this classifier.

I make each span clickable; clicking one shows some sample rows that include the word.
You can use a range tool to move each word’s weight between -1 and +1, which updates the highlighted span, final score, and prediction. Each word score is then stored in a model-specific row in a table that I’ve named word_adjust.
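To illustrate the idea of adjustments acting as a final layer, here is a simplified sketch. It assumes the final score is just the sum of per-word weights and that the sign decides the prediction; the actual app’s scoring may differ. The function names are made up for this example, and `adjustments` stands in for the rows read from word_adjust for one model.

```python
def clamp(value, low=-1.0, high=1.0):
    """Keep a weight inside the slider's range."""
    return max(low, min(high, value))


def apply_adjustments(word_weights, adjustments):
    """Override the model's weights with any user-set values.

    Words the user has not adjusted keep their original weight, and
    adjustments for words not in this text are ignored.
    """
    merged = dict(word_weights)
    for word, weight in adjustments.items():
        if word in merged:
            merged[word] = clamp(weight)
    return merged


def score(word_weights):
    """Sum the weights and label the result (simplifying assumption)."""
    total = sum(word_weights.values())
    return total, ("positive" if total >= 0 else "negative")


weights = {"slasher": -0.5, "love": 0.25}
adjusted = apply_adjustments(weights, {"slasher": 0.5})
print(score(weights))   # (-0.25, 'negative')
print(score(adjusted))  # (0.75, 'positive')
```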

The UI isn’t pretty, but it finally accomplishes my goal of turning the ML prediction explanation into an interactive discussion.

mapmeld/crud-ml

This repo is based directly on Amir Ziai's SKLearnFlask ML server, which serves predictions from a scikit-learn model…

github.com

Updates

This article is from 2019. For latest links and recommendations see:
https://github.com/mapmeld/use-this-now#model-editing
I also have a section there about Arabic NLP updates.