I can't do that for you dave: Undefined is not a function

LeadDev London · London, UK

by Asim Hussain · 31 May 2019

This is a ten-minute lightning tour, titled after a mangled HAL 9000 line from “2001: A Space Odyssey”, of what machine learning actually looks like when you’re a JavaScript developer rather than a maths PhD. I run through three AI-powered web apps live, and the whole point is how little stands between you and building this stuff yourself.

AI-generated summary of my talk

Key takeaways

You don’t need to go deep on machine learning — check for an API first. Much of what Microsoft was doing at the time was commoditising these services, so a single HTTP request gets you faces, emotions, captions and more.
TheMojifier posts an image to the Face API, gets back every face plus its emotion (anger, contempt, disgust, fear, happiness, sadness, surprise), and pastes the matching emoji over each one.
You can run real ML in the browser. TensorFlow.js is the whole dependency — TensorFlow rewritten from scratch in JavaScript — and it loads pre-trained models or trains new ones, no extra install.
MobileNet, a model optimised for mobile, picks out one of 1,000 things in an image in roughly four lines of code. The only network call is the image fetch; the recognition happens locally.
For heavier lifting, reach for a bigger API. The Computer Vision API returns a human-readable caption, which my friend Sarah wired up to auto-generate alt-text for screen-readers.
Generative models are further along than you’d think. pix2pix turns a scribbled cat outline into a photorealistic cat in the browser, and related models go outline-to-face, segmentation-to-video, even text-to-image.
The honest framing: half-right captions still pass, beards apparently rule out 100% happiness, and you can start all of this today.

Jump into the talk

A meetup, and a website of the weird

I co-run an AI JavaScript meetup. We started it early in 2018 because we’d noticed JavaScript and machine learning beginning to overlap in genuinely interesting ways. At the end of every meetup someone would wander up with a link to some AI-powered JavaScript site, and after a year of that we collected them all into one place: aijs.rocks. So for the talk I picked three of those apps and walked through, very briefly, what’s actually going on under the hood. The title is the HAL-9000 line, mangled — if you got the reference, you’re old.

TheMojifier, and the lesson hiding inside it

The first one I wrote myself: TheMojifier. Give it an image, it finds every face, works out the emotion on each one, picks the matching emoji and pastes it back over the face. It works with multiple faces, it works on memes, you can add it to your own Slack workspace. The first demo is a photo of my son — that, apparently, is exactly how he emojifies.

The interesting question is how it reads emotion at all. I do run workshops teaching people to train a neural network to detect emotion — but the honest answer, the one people don’t expect, is: there’s an API for that. Microsoft’s Face API takes a posted image and returns every face plus a set of emotion scores: anger, contempt, disgust, fear, happiness, sadness, surprise. (A warning for the bearded among you: you can apparently never be 100% happy. I’m sorry.) That’s the first real lesson of the talk — before you go deep-diving into machine learning, check whether someone has already commoditised the hard part behind a single request.
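
As a rough sketch of that “single request” idea, and not TheMojifier’s actual source, the TypeScript below builds a Face API detect call and picks an emoji from the scores that come back. The emoji table, the function names and the example endpoint are made up for illustration. The response also carried a neutral score that the list above leaves out, so the table includes it. Microsoft has since announced it is retiring the emotion attribute, so treat this as the API as it stood around the time of the talk; it may not be available on a new resource.

```typescript
// Emotion scores as the Face API returned them (each between 0 and 1).
type EmotionScores = Record<string, number>;

interface DetectedFace {
  faceRectangle: { top: number; left: number; width: number; height: number };
  faceAttributes: { emotion: EmotionScores };
}

// Hypothetical emotion-to-emoji table; the talk does not show the real one.
const EMOJI: Record<string, string> = {
  anger: "😠", contempt: "😒", disgust: "🤢", fear: "😨",
  happiness: "😀", neutral: "😐", sadness: "😢", surprise: "😮",
};

// Build the POST request for the detect call. `endpoint` and `key` come from
// your own Azure resource; they are placeholders here.
function buildDetectRequest(endpoint: string, key: string, imageUrl: string) {
  return {
    url: `${endpoint}/face/v1.0/detect?returnFaceAttributes=emotion`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": key,
      },
      body: JSON.stringify({ url: imageUrl }),
    },
  };
}

// Pick the emoji for the highest-scoring known emotion on one face.
// Falls back to the neutral emoji if no known emotion is present.
function emojiFor(face: DetectedFace): string {
  let best = "";
  let bestScore = -Infinity;
  for (const [name, score] of Object.entries(face.faceAttributes.emotion)) {
    if (name in EMOJI && score > bestScore) {
      best = name;
      bestScore = score;
    }
  }
  return EMOJI[best] ?? "😐";
}
```

Sending it is then a matter of `fetch(req.url, req.init)` followed by `await res.json()`, which should yield an array of faces to feed through `emojiFor`, with each `faceRectangle` telling you where to paste the emoji.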

Running the model in your browser

The second app is a simple CodePen: click around, it searches Unsplash for an image, and the percentages in the corner are its guess at what’s inside — terrier, puppy, and so on. The thing worth noticing is that the only network call is fetching the image. The recognition happens in JavaScript, in the browser.

That’s TensorFlow.js. Plain TensorFlow runs heavy numerical computation across GPUs and CPUs, and its core is written in C++. Early in 2018 they shipped TensorFlow.js, the whole thing rewritten from scratch in JavaScript, and the lovely part is that it’s the only dependency you need. No extra install. You can train models from your own data, or load pre-trained ones. This demo uses MobileNet, a model that recognises one of 1,000 things in an image and has been optimised to run on mobile, and it gives you that whole classifier in about four lines of code.
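
The “about four lines” come down to loading the model and calling classify. The sketch below assumes the `@tensorflow-models/mobilenet` package, whose `load()` resolves to an object with a `classify(img, topk)` method. It is typed structurally here so the snippet compiles without the package installed. `toLabels` and `describeImage` are hypothetical helpers, and the CodePen’s own code may well look different.

```typescript
// Shape of one result from MobileNet's classify().
interface Prediction {
  className: string;
  probability: number;
}

// Structural stand-in for the model object returned by `mobilenet.load()`,
// so this sketch does not need the package at compile time.
interface ImageClassifier<Img> {
  classify(img: Img, topk?: number): Promise<Prediction[]>;
}

// Turn predictions into "label: NN%" strings, highest probability first,
// similar to the percentages shown in the corner of the demo.
function toLabels(predictions: Prediction[]): string[] {
  return [...predictions]
    .sort((a, b) => b.probability - a.probability)
    .map((p) => `${p.className}: ${Math.round(p.probability * 100)}%`);
}

// Classify an already-loaded image and return display labels.
// Inference runs locally; no request is made by this function itself.
async function describeImage<Img>(
  model: ImageClassifier<Img>,
  img: Img,
  topk = 3,
): Promise<string[]> {
  return toLabels(await model.classify(img, topk));
}
```

In a page you would get the model with `const model = await mobilenet.load()` and pass an `<img>` element as `img`.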

The catch is size. A model small enough for mobile only knows so much. If you want to really understand what’s in an image, you need a much bigger model trained on much more data, and again you can either build that yourself or just call an API. Microsoft’s Computer Vision API is one such service, and the bit I love is that it returns a caption: a plain, human-readable sentence describing the image. My friend Sarah built a demo wiring that straight into alt-text, the descriptions screen-readers read aloud for people who can’t see the image. Twitter, being Twitter, was quick to point out where it fails: it got a star-filled sky wrong, and got a couple of others about half right. Fifty per cent is a pass mark in my book.
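
To make the caption-to-alt-text step concrete, here is a small sketch. It is not Sarah’s code, and it assumes the describe operation’s response shape: a `description` object holding `captions`, each with `text` and `confidence`. The `altTextFrom` helper and the 0.5 threshold are hypothetical; the threshold is only a nod to the pass mark above.

```typescript
// One candidate caption from the describe response.
interface Caption {
  text: string;
  confidence: number;
}

// The part of the describe response this sketch relies on.
interface DescribeResponse {
  description: { tags: string[]; captions: Caption[] };
}

// Return the most confident caption as alt-text, or an empty string when
// there is no caption or the best one falls below `minConfidence`.
function altTextFrom(res: DescribeResponse, minConfidence = 0.5): string {
  const best = [...res.description.captions].sort(
    (a, b) => b.confidence - a.confidence,
  )[0];
  return best !== undefined && best.confidence >= minConfidence ? best.text : "";
}
```

With a parsed response in hand, usage would be something like `img.alt = altTextFrom(json)`, leaving the caller to decide what to do when the result is empty.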

Drawing the rest of the cat

The last one is my favourite. Running in the browser, you draw a rough outline of a cat and it generates the rest — a photorealistic cat — using pix2pix, a generative neural network that takes your outline as input and paints the image. It doesn’t have to be cats; it works on human faces too, at which point the room stops laughing. From there the inputs get stranger: a segmented depth image driving a generated dancer (a related vid2vid model, not yet in the browser), and even plain text as the input with the images as output.

Which is where I left it. How far off, really, is someone typing “build me an e-commerce app on this reference architecture for a company in Japan with this many customers” and getting it? In 2019 my answer was: not that far off. That’s the whole talk — ten minutes, no maths PhD required, and an npm install or a single API call is enough to start.