asim.dev

Bot Of The United States

Code Europe 2017 · Warsaw, Poland

by Asim Hussain · 1 November 2017

I open this one with a promise to make everyone in the room insanely rich, and then spend the rest of it explaining how I built a bot that reads Donald Trump’s tweets, guesses whether he’s happy or angry, and trades currencies and gold on the back of it. It’s really a tour through sentiment analysis, Redis and serverless — with Trump’s Twitter feed as the excuse.

AI-generated summary of my talk

Jump into the talk

  1. 0:04 A guaranteed way to get rich
  2. 1:05 BOTUS: the bot that came before mine
  3. 3:08 My version — and the trading rules
  4. 7:17 Step 1: getting the tweets with Logic Apps
  5. 8:17 How sentiment analysis actually works
  6. 11:22 The NLP toolkit: tokens, stemming, n-grams, POS
  7. 16:26 Pre-trained models and Text Analytics
  8. 18:32 Storing time series in Redis sorted sets
  9. 24:37 Making trades, serverless and the four hosting models

A guaranteed way to get rich

I like to start a talk with a question, so I start this one by asking who wants to be rich — and promising a 100% foolproof method. The method is Donald Trump. Whether you love him or hate him, two things are true: he loves Twitter, and he’s influential enough that people argue he moves markets. The example I show is the exact moment he said something bad about Toyota Motor, and the share price dipping right alongside it.

I wasn’t the first to spot this. I found out about it from the Planet Money podcast — and I’m legally obliged to say I’m not affiliated with them and they don’t endorse me. Earlier that year they hired a high-frequency trading firm to see if you could make money off Trump’s tweets, built a bot, and put it live on Twitter under the name BOTUS — Bot of the United States, a play on POTUS. They’d shut it down about two weeks before this talk, because it was only ever for educational purposes and the listeners who’d put money in wanted it back. So I decided to show people how to build their own.

My version, and the rules

The Planet Money bot traded stocks. Mine trades FX currencies, and that’s a deliberate choice. Early in his presidency Trump said a lot about individual companies and moved their share prices; later he largely stopped — possibly someone had a word — but he never stopped talking about countries, positively and negatively, and that makes currency trading genuinely interesting.

The app shows the latest tweets with a face on each one: a smile for positive, an angry face for negative, a neutral face when the algorithm can’t decide. Each tweet triggers a batch of trades, and there’s a portfolio page showing the current value of everything it has bought. At the point I recorded it, it was up 61 dollars — the best investment I’ve ever made.

The rules are deliberately simple. A positive tweet buys 100 dollars of the relevant currency. A negative tweet sells whatever I hold of it. If the currency is the US dollar, I flip it and buy and sell gold instead — say something negative about America and the way to counteract it is to buy gold. This is not a high-frequency engine; it’s a very low-frequency one, holding positions overnight for as long as it takes to make money. And no money was harmed in the making of it — it’s all simulated.
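As a rough sketch of those rules in Python (not the code from the talk): the `Portfolio` class, the `apply_tweet` function and the `XAU` code for gold are names made up for illustration. Two things are assumptions rather than something the talk states: a neutral tweet makes no trade, and for the US dollar the "flip" means a negative tweet buys gold while a positive tweet sells it. Holdings are tracked as dollars spent, which ignores price movement.

```python
from dataclasses import dataclass, field

STAKE_USD = 100.0  # the talk's rule: a positive tweet buys 100 dollars' worth


@dataclass
class Portfolio:
    # Simulated holdings: asset code -> dollars spent on it (hypothetical model).
    holdings: dict = field(default_factory=dict)


def apply_tweet(portfolio, mood, currency):
    """Apply one scored tweet and return the trade as (action, asset, usd_amount)."""
    if mood == "neutral":
        # Assumption: an undecided tweet triggers no trade.
        return ("hold", currency, 0.0)

    if currency == "USD":
        # US dollar tweets are flipped onto gold: bad news for America buys gold.
        asset = "XAU"
        buy = mood == "negative"
    else:
        asset = currency
        buy = mood == "positive"

    if buy:
        portfolio.holdings[asset] = portfolio.holdings.get(asset, 0.0) + STAKE_USD
        return ("buy", asset, STAKE_USD)

    # Selling closes the whole position, whatever its size.
    sold = portfolio.holdings.pop(asset, 0.0)
    return ("sell", asset, sold)
```

Keeping the decision in one pure function over a small state object makes it easy to replay a history of tweets against it without touching real money.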

How sentiment analysis works

To build it you need four things: get the tweets, analyse the text, store the data, and make the trades — and do it in a way that scales elastically, because Twitter can publish a torrent of data in a short space of time. Getting the tweets is almost disappointingly easy: Azure has Logic Apps, a visual workflow tool with a ready-made Twitter connector, so whenever Trump tweets I can kick off a workflow without writing the plumbing myself.

The interesting part is the text. Sentiment analysis is just a trained classifier: you take a pile of tweets that someone has manually labelled positive or negative — and it’s subjective, one person’s “I love that” is another’s sarcasm — feed them to a classifier, and out comes a model. Then you pass new strings to the model and it spits out a number telling you how positive or negative it is. Trump turns out to be a gift here, something Planet Money found too: he’s unambiguous. No double negatives, everything is very very bad or very very good, so the sentiment lands cleanly at one end or the other.
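To make "labelled tweets in, model out, number back" concrete, here is a minimal sketch using a naive Bayes classifier over a bag of words. Naive Bayes is my choice for the illustration; the talk does not say which classifier sits behind the scores. The four training sentences are invented, and a real model would need thousands of labelled examples.

```python
import math
from collections import Counter

# Invented, hand-labelled examples standing in for a real training set.
TRAINING = [
    ("great people very good deal", "positive"),
    ("tremendous success great jobs", "positive"),
    ("very bad deal total disaster", "negative"),
    ("terrible unfair very very bad", "negative"),
]


def train(labelled_tweets):
    """Count word occurrences per label from (text, label) pairs."""
    word_counts = {"positive": Counter(), "negative": Counter()}
    doc_counts = Counter()
    for text, label in labelled_tweets:
        word_counts[label].update(text.lower().split())
        doc_counts[label] += 1
    return word_counts, doc_counts


def score(model, text):
    """Return a number between 0 and 1: near 1 is positive, near 0 is negative."""
    word_counts, doc_counts = model
    vocab = set(word_counts["positive"]) | set(word_counts["negative"])
    total_docs = sum(doc_counts.values())
    log_prob = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        lp = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            # Add-one smoothing so an unseen word does not zero the product.
            lp += math.log((counts[word] + 1) / (total_words + len(vocab)))
        log_prob[label] = lp
    # Turn the two log scores into a single probability of "positive".
    return 1 / (1 + math.exp(log_prob["negative"] - log_prob["positive"]))


model = train(TRAINING)
print(score(model, "great deal"), score(model, "total disaster"))
```

The sketch assumes the training set contains at least one example of each label; with none, the prior would be the logarithm of zero.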

The NLP toolkit

Each piece of text is a feature. The most basic feature is just a bag of words — split the tweet on whitespace, which is called tokenisation, and you can actually train a decent classifier on that alone. But you can do better with a few more tricks, and I run through them quickly so people get the shape of it:

Stemming reduces words to their root, so “love”, “loves” and “loving” count as the same word. It works differently per language.

N-grams take pairs (bigrams) or triples (trigrams) of words instead of single ones, so a phrase like “Asim is” keeps the meaning that its individual words would lose. Two is a good default, but you’d experiment.

Part-of-speech tagging attaches a grammatical code to each word (determiner, proper noun, verb), so order and context aren’t thrown away when the sentence becomes a bag of data.

Word embeddings map words into a vector space, so the model learns that a dog and a cat are both pets and related, while a cat and a truck are not.
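The first three ideas fit in a few lines of plain Python. This is a toy sketch: `crude_stem` is a hypothetical suffix stripper written for this example, not a real stemming algorithm, and a toolkit such as NLTK ships proper stemmers. Part-of-speech tagging and word embeddings need trained models, so they are left out.

```python
def tokenise(text):
    """Split on whitespace and lowercase: the bag-of-words starting point."""
    return text.lower().split()


def crude_stem(word):
    """Strip a few English suffixes. A toy stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s", "e"):
        # Keep at least three characters so short words like "is" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def ngrams(tokens, n=2):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]


tokens = tokenise("Asim is NOT happy")
print(ngrams(tokens))          # bigrams keep "is not" and "not happy" together
print([crude_stem(w) for w in ("love", "loves", "loving")])
```

Each bigram tuple can be used as a feature in exactly the same way as a single word, which is why moving from unigrams to bigrams is usually a small code change.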

You can take all of that, use a toolkit like NLTK, and train your own model — Azure will happily rent you the GPUs. But I prefer the other end of the spectrum: pre-trained models. Azure’s cognitive services hand you the trained model behind a simple HTTP request. I use the Text Analytics API — give it your text, get back sentiment and extracted key phrases like “Mexico”, which tells me which currency to trade. If you can make an HTTP request, you can do AI. And because it’s all wired through Logic Apps connectors, getting tweets in, scoring sentiment and pulling key phrases is mostly drag and drop.
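For anyone calling the service directly rather than through a Logic Apps connector, the request might look roughly like the sketch below. It follows the shape of the v2.0 REST API as it stood around the time of the talk; the region, version path and key are placeholders, the service has been revised since, and the thresholds in `mood_from_score` are invented, so check the current documentation before relying on any of it. The request is built but not sent.

```python
import json
import urllib.request

# Assumptions: region and API version are placeholders, not values from the talk.
ENDPOINT = "https://westus.api.cognitive.microsoft.com/text/analytics/v2.0"


def build_request(operation, tweet_text, api_key):
    """Build (but do not send) a POST for 'sentiment' or 'keyPhrases'."""
    body = {"documents": [{"id": "1", "language": "en", "text": tweet_text}]}
    return urllib.request.Request(
        url=f"{ENDPOINT}/{operation}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": api_key,
        },
        method="POST",
    )


def mood_from_score(score, low=0.4, high=0.6):
    """Map a 0..1 sentiment score to a face; the thresholds are hypothetical."""
    if score >= high:
        return "positive"
    if score <= low:
        return "negative"
    return "neutral"


request = build_request("sentiment", "Mexico is doing a great job", "YOUR-KEY")
# Sending it would be: json.load(urllib.request.urlopen(request))
```

The same builder covers both calls the bot needs: one to `sentiment` for the face, one to `keyPhrases` for the country.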

Redis, and the serverless ladder

I can’t trade on tweets alone — I need prices and somewhere to keep the history of prices, trades and money. I pull currency rates from Open Exchange Rates, and store everything in Redis. Not as a cache — as a database. Redis is an in-memory database, and to show why that matters I use the classic table that scales computer timings to human ones: if a single CPU cycle is one second, reading from main memory is about six minutes, while reading from disk is a vast jump beyond that. Traditional SQL and document databases still go to disk; Redis stays in memory, which makes it insanely fast. The trade-off is that to store ten megabytes you need ten megabytes of RAM.

Most of my data is time series (tweets, prices and trades all happen over time), so I lean on Redis sorted sets. You store a value against a key with a score, and Redis keeps it sorted by that score automatically. I use the currency pair (say) as the key and the Unix timestamp as the score. Because members of a set must be unique, the same rate stored at two different times would collapse into one entry, so I append the time to the value. Then I can pull back every rate between two dates with a single range-by-score query.
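The pattern can be sketched as two small functions. So that the example runs without a server, it uses `FakeSortedSets`, a hypothetical in-memory stand-in written for this example; it mimics the `zadd(key, mapping)` and `zrangebyscore(key, min, max)` methods of the redis-py client, so a real `redis.Redis()` connection should be usable in its place. The key name, the `timestamp:rate` member format and the sample rates are all made up.

```python
import bisect


class FakeSortedSets:
    """In-memory stand-in mimicking two redis-py methods, so the sketch runs offline."""

    def __init__(self):
        self._sets = {}  # key -> list of (score, member), kept sorted by score

    def zadd(self, key, mapping):
        entries = self._sets.setdefault(key, [])
        for member, score in mapping.items():
            # A member appears once per set: re-adding it replaces its old score.
            entries[:] = [e for e in entries if e[1] != member]
            bisect.insort(entries, (score, member))

    def zrangebyscore(self, key, min_score, max_score):
        entries = self._sets.get(key, [])
        return [m for s, m in entries if min_score <= s <= max_score]


def store_rate(client, pair, unix_ts, rate):
    # Prefix the timestamp so equal rates at different times stay distinct members.
    client.zadd(pair, {f"{unix_ts}:{rate}": unix_ts})


def rates_between(client, pair, start_ts, end_ts):
    """Return (timestamp, rate) tuples with start_ts <= timestamp <= end_ts."""
    out = []
    for member in client.zrangebyscore(pair, start_ts, end_ts):
        if isinstance(member, bytes):  # a real Redis client returns bytes by default
            member = member.decode()
        ts, rate = member.split(":")
        out.append((int(ts), float(rate)))
    return out


client = FakeSortedSets()
store_rate(client, "USD/MXN", 1509494400, 19.1)
store_rate(client, "USD/MXN", 1509580800, 19.1)  # same rate, later time
print(rates_between(client, "USD/MXN", 1509494400, 1509580800))
```

Without the timestamp prefix, the second `store_rate` call would overwrite the first, because both would carry the member `19.1`.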

Making the trade is the easy bit: load the portfolio, work out which country and currency the tweet is about, and run it through the engine that decides buy or sell — and all the code is online. The real question is how to host that logic at scale, and I walk through the ladder: on-premise (you own the hardware, the OS, the framework and the function, and you fix the disk when it dies), infrastructure-as-a-service (rent the hardware), platform-as-a-service (just hand over your app), and finally functions-as-a-service, where you give a cloud the single function and it hosts and scales it. That’s serverless, and my favourite definition is that you pay only for actual usage, never predicted usage — deploy a function and never call it, you pay nothing. The function gets an endpoint, which I paste straight back into Logic Apps, and the loop is closed.
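Stripped of any particular cloud's wrapper, a function of this kind is just a stateless handler: a request body comes in, a response goes out. The sketch below covers only the "which currency is this tweet about" step. The payload fields `key_phrases` and `score`, the `handle` name and the lookup table are all hypothetical, and a real platform would wrap this in its own handler signature.

```python
import json

# Hypothetical lookup from a key phrase to an ISO 4217 currency code.
COUNTRY_TO_CURRENCY = {
    "mexico": "MXN",
    "japan": "JPY",
    "china": "CNY",
    "america": "USD",
}


def handle(request_body):
    """Stateless entry point: JSON string in, JSON string out."""
    event = json.loads(request_body)
    for phrase in event.get("key_phrases", []):
        currency = COUNTRY_TO_CURRENCY.get(phrase.lower())
        if currency:
            # First recognised country wins; the trade engine would run here.
            return json.dumps(
                {"status": "matched", "currency": currency, "score": event["score"]}
            )
    # No country recognised, so there is nothing to trade.
    return json.dumps({"status": "ignored"})


print(handle(json.dumps({"key_phrases": ["Wall", "Mexico"], "score": 0.1})))
```

Because the handler keeps no state between calls, the platform is free to run zero copies when nothing is happening and many copies during a burst of tweets.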