Can written words be time series data?

Nick Doiron
2 min read · Oct 26, 2019


I’ve been thinking about this idea for a while: what if I applied a machine learning tool built for time series (stock markets, seasonal demand for candy, etc.) to language?

Choosing a library

I found four libraries on GitHub that would be interesting to apply here.

I decided on Gluon-TS: Prophet is more of a seasonal prediction tool that expects literal datetimes, and the Personae examples are all stock market data.

Setting up the training data

I downloaded Samuel Butler’s English text of the Odyssey and the Iliad from the MIT Classics website. The Iliad has over 156,000 words, which NLTK splits into word tokens. On Google CoLab, I installed the dependencies and downloaded the FastText pre-trained word embeddings for English, turning the token stream into 300 parallel time series, one per embedding dimension.
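The data prep can be sketched like this. The `text_to_series` helper and the toy embedding dict are my own illustrative stand-ins; the real run would tokenize the full Iliad with NLTK and look up actual 300-dimensional FastText vectors instead:

```python
import numpy as np

def text_to_series(tokens, embeddings):
    """Turn a token list into parallel time series:
    series[d][t] = dimension d of the embedding of token t."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    matrix = np.stack(vectors)   # shape: (num_tokens, 300)
    return matrix.T              # shape: (300, num_tokens)

# toy stand-in embeddings; the real FastText vectors are also 300-dimensional
rng = np.random.default_rng(0)
tokens = ["sing", "o", "goddess", "the", "anger", "of", "achilles"]
embeddings = {w: rng.normal(size=300) for w in tokens}

series = text_to_series(tokens, embeddings)
print(series.shape)   # (300, 7): one time series per embedding dimension
```

Each of the 300 rows is then an ordinary univariate time series that a forecasting library can consume.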

Running Gluon-TS 300 times

Here is part of my test data from the Iliad, and predicted continuation of the zeroth dimension of the word vector, using Gluon-TS:

The loss metric, and the wide uncertainty band in the prediction, suggest that almost any word would fit in the predicted range, while the line in the middle (the mean) never strays far from -0.05. That might be a useful way to show uncertainty in a stock movement, but here it doesn’t look promising.
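The “300 times” part is just a loop that fits an independent forecaster per embedding dimension. Here is a minimal sketch of that loop, with a naive mean forecaster standing in for the GluonTS estimator (which would return a probabilistic forecast with the uncertainty band described above):

```python
import numpy as np

def naive_forecast(history, horizon):
    """Stand-in predictor: repeat the historical mean for `horizon` steps.
    The real run would train a GluonTS model on `history` instead."""
    return np.full(horizon, history.mean())

# fake data shaped like the real input: 300 dims x 500 tokens
rng = np.random.default_rng(1)
series = rng.normal(scale=0.1, size=(300, 500))

horizon = 10
predictions = np.stack([
    naive_forecast(series[d], horizon)  # one independent model per dimension
    for d in range(300)
])
print(predictions.shape)   # (300, 10): 10 future steps per embedding dimension
```

Treating the dimensions as 300 unrelated series ignores any correlation between them, which is one plausible reason the forecasts collapse toward a flat mean.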

I decided to continue anyway, in case I later find a different prediction library that could reuse the same code.

Generating words

Once I’ve generated 10 predicted values for each dimension (sampled, not just the mean), I need to turn those numbers back into words. It looks a little like this:

import numpy as np

# vects holds the 300 per-dimension predictions; pick out step w from each
# to reassemble a single 300-dimensional word vector
word = []
for v in vects:
    word.append(v[w])

model_word_vector = np.array(word, dtype='f')
# en_src is the FastText word-vector model; look up the single nearest word
most_similar_words = en_src.most_similar([model_word_vector], [], 1)

