Machine Learning in JavaScript

bkanber · on Jan 30, 2014

Author here -- thanks for submitting, xd!

Let me know if you have any questions. I do intend to keep up with this series, although my pace is pretty slow at about one article every three months or so.

There are already a couple of comments about running ML in JS and how JS and the browser environment aren't terribly suited for heavy calculations. First: you're totally correct; second, I chose JS because

1) it's accessible -- whether you're Python or Ruby or PHP on the backend, you're probably comfortable with JS and

2) it demystifies machine learning -- you have to write your ML from scratch, without the help of all those wonderful Python libs, and I think this exercise shows you that it's not so mysterious after all.

Anyway, thanks for reading, and I'll poke in here throughout the day if you have questions.

xd · on Jan 30, 2014

No no, thank you!

I've been building a data management platform for the last 8 years and we are now at the stage where we want to provide tools to help our customers get more from their data than just statistics. As my programming experience is mainly in PHP and JS this set of articles is helping me grasp ML rather than trying to wrap my head around a new language. I'm currently working on k-means clustering and re-implementing everything in PHP to get the best possible understanding I can .. my aim after that is to see how well I can implement things at an SQL level.

bkanber · on Jan 30, 2014

Excellent, I'm glad to hear it! If you ever want to reach out, feel free -- email in my profile.

d_j_b · on Jan 30, 2014

This is great stuff - thank you very much.

>all those wonderful Python libs

As a non-mathematician I have no understanding of how wonderful they really are, which is why this sort of thing is so valuable.

zenocon · on Jan 30, 2014

Article looks really good! Looking forward to a longer read tonight. Minor nit: I'd stick with idiomatic JS style/formatting. Looks like you mixed in styles from some of the other languages (tab indentation, braces, etc.). This is always a religious argument, but I write a lot of different languages, and always just try to stick with the most popular idiomatic way, regardless of whether I care for it or not.

yohanatan · on Jan 30, 2014

Yeah, 8-space tabs are a bit ridiculous. I prefer 2 or at most 3.

usamec · on Jan 30, 2014

"2) it demystifies machine learning -- you have to write your ML from scratch, without the help of all those wonderful Python libs, and I think this exercise shows you that it's not so mysterious after all." - you can demystify machine learning in any better language.

JS and PHP are slow, crappy and bug prone. Sane languages (like Python, C++) have tools to make your job easier (like numpy, blas, eigen and other libraries). They provide fast and reliable math routines so you don't have to worry about some eigenvalue decomposition, matrix multiplication and other problems.

bkanber · on Jan 30, 2014

I'm not sure I understand the point you're trying to make. Are you telling me I should not have written this series? That it's somehow not valuable because you don't like JS? That I can't effectively teach basic machine learning concepts to interested people without forcing linear algebra onto them? That, as a teacher, I can't try to use any and all tools and techniques available to the group I'm trying to teach?

I've even explicitly mentioned that I'm staying away from algorithms that rely on linear algebra, because I'm trying to bring these concepts to people who may not have a CS or mathematical background.

breuleux · on Jan 30, 2014

JS isn't particularly slow any more, thanks to the massive efforts invested in the optimization of the various competing JS engines. It is generally faster than Python 3 (http://benchmarksgame.alioth.debian.org/u64/benchmark.php?te...) and not just a little. Just for the hell of it I compared node and python on naive fibonacci. That is anecdotal, of course, but node is 30 times faster. I do agree with you that JS has horrid, bug-prone semantics, but it's impressively well optimized.

I also agree that other languages offer better tools. For instance, Python has NumPy. However, its core is written in C, not in Python. You can write plugins for Node in C++, so nothing would stop someone from writing a NumPy equivalent for JS. You might even be able to run it in some browsers through something like Emscripten with a performance overhead of 2-3x (I think?)
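For anyone who wants to repeat the naive Fibonacci comparison mentioned above, here is a minimal sketch of the JS side. The helper name `timeFib` and the input size 30 are assumptions made for illustration, not the code that was actually measured, and a single timing like this is as anecdotal as the original.

```javascript
// Naive (exponential-time) recursive Fibonacci, the usual micro-benchmark shape.
function fib(n) {
  return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

// Hypothetical helper: time a single call with millisecond resolution.
function timeFib(n) {
  const start = Date.now();
  const value = fib(n);
  return { n: n, value: value, ms: Date.now() - start };
}

const result = timeFib(30);
console.log('fib(' + result.n + ') = ' + result.value + ' in ' + result.ms + ' ms');
```

Running the same recursion in another language and comparing the elapsed times gives a rough feel for interpreter and JIT overhead, nothing more.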

m1sta_ · on Jan 30, 2014

Let's not forget WebCL. GPU-based matrix multiplication is available in JavaScript too.

kajecounterhack · on Jan 30, 2014

For those who aren't familiar, Andrej Karpathy has done a lot of cool stuff with ML in JS. In particular, he has a CNN library -- deep learning comes to JS!

http://cs.stanford.edu/people/karpathy/convnetjs/

http://cs.stanford.edu/people/karpathy/svmjs/demo/

Heather Arthur (npm libraries brain, classifier) has also done a bunch of cool stuff!

https://github.com/harthur

morganherlocker · on Jan 30, 2014

To those wondering why someone would want ML in JS, there are loads of reasons.

For starters, node.js, which makes most of the arguments regarding server/client moot.

Secondly, there are many client side applications for these types of algorithms as well. K-means clustering, for example, is already used by many mapping libraries to group together large numbers of points[1].

I personally use neural networks and affinity propagation in many of my applications for predictive analysis. This does not have to only be educational, or of a 'toy' nature.

[1] http://danzel.github.io/Leaflet.markercluster/example/marker...
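As a rough illustration of the k-means idea mentioned above (grouping nearby points), here is a minimal sketch of Lloyd's algorithm on 2D points. It is not how any particular mapping library does it; the function names, the sample points, and the choice of starting centroids are all assumptions made for the example.

```javascript
// Squared Euclidean distance between two 2D points [x, y].
function dist2(a, b) {
  const dx = a[0] - b[0];
  const dy = a[1] - b[1];
  return dx * dx + dy * dy;
}

// Index of the centroid closest to point p.
function nearest(p, centroids) {
  let best = 0;
  for (let i = 1; i < centroids.length; i++) {
    if (dist2(p, centroids[i]) < dist2(p, centroids[best])) best = i;
  }
  return best;
}

// Lloyd's algorithm: alternate assignment and centroid update until the
// assignments stop changing. `initial` holds caller-chosen starting centroids
// (an assumption of this sketch; real code usually picks them randomly).
function kmeans(points, initial, maxIterations) {
  let centroids = initial.map(function (c) { return c.slice(); });
  let labels = [];
  for (let iter = 0; iter < maxIterations; iter++) {
    const next = points.map(function (p) { return nearest(p, centroids); });
    const unchanged = labels.length === next.length &&
      next.every(function (l, i) { return l === labels[i]; });
    labels = next;
    if (unchanged) break;
    // Move each centroid to the mean of the points assigned to it.
    centroids = centroids.map(function (c, k) {
      const members = points.filter(function (_, i) { return labels[i] === k; });
      if (members.length === 0) return c; // leave an empty cluster's centroid in place
      const sx = members.reduce(function (s, p) { return s + p[0]; }, 0);
      const sy = members.reduce(function (s, p) { return s + p[1]; }, 0);
      return [sx / members.length, sy / members.length];
    });
  }
  return { centroids: centroids, labels: labels };
}

// Two visually obvious groups of made-up points.
const points = [[1, 1], [1, 2], [2, 1], [8, 8], [9, 8], [8, 9]];
const clusters = kmeans(points, [points[0], points[3]], 20);
console.log(clusters.labels, clusters.centroids);
```

Each label is the index of the cluster a point ended up in, and each centroid is the mean of its cluster, which is roughly what a map would draw as the single grouped marker.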

nashequilibrium · on Jan 30, 2014

Nodejs is for i/o. I know it has workarounds for long-running tasks (the thread pool), but it does not excel at that. Training your model, updating your model, validation, matrix factorization on large datasets, etc.: I just don't see how JavaScript helps here. Maybe just taking the HTTP request and dumping it onto a RabbitMQ queue to classify something, but you still have a whole host of other stuff to deal with.

morganherlocker · on Jan 30, 2014

> Nodejs is for i/o

Node is a general purpose language that can be used for all kinds of things. I switched from Python to node.js about a year ago for exactly the sort of tasks you are describing and could not be happier. Right off the bat I had huge speed improvements.

Also, io is one of the biggest issues with web based data analysis, so it really should not be underestimated. I can do more with less with node than I could with Python. This is especially true with long running tasks where a 1 minute processing time vs a 20 minute processing time might mean you need 1/20th the number of servers in a cluster ($$$).

Of course, this could be a pretty good argument for something even faster/lower level, but for me, node.js struck a good balance between performance and ease of development/ecosystem. As usual, YMMV.

One last point. The language you choose cannot always be the best language for every task you need. Typically you choose a stack based on the most common/important tasks in your infrastructure, then for less common tasks you just make it work with what the chosen language provides. In this case node.js does not need to be the best solution for ML, it just needs to check the box for being possible, so that devs who needed node.js for other reasons now have the ability to add ML to their toolbox.

kimmel · on Jan 31, 2014

> Node is a general purpose language that can be used for all kinds of things.

Node is a library that can be used for all kinds of things. FTFY

morganherlocker · on Jan 31, 2014

Fair enough, it is not a language, but I would not call it a library either. Perhaps a platform?[1] It contains many libraries, but also has a cli tool, packages a runtime, a debugger, and even includes a package manager. It contains the sorts of things that would be packaged when you install languages like Python or Ruby.

[1] http://nodejs.org/

thrush · on Jan 30, 2014

I think they are making the argument for capability rather than best of class. This is really exciting for Javascript and NodeJS devs right now because of the "Give me an inch and I'll take a foot" mentality.

Already__Taken · on Jan 30, 2014

This blog also contains a JavaScript physics series which is very cool.

I do hope this author writes more again; it has been quiet.

bkanber · on Jan 30, 2014

Thanks! -- I am intending to write more, but the pace is sloooowww.

viana007 · on Jan 30, 2014

For neural networks, you can use BrainJS

https://github.com/harthur/brain

nightski · on Jan 30, 2014

So the input/output pairs are a linked list of objects? Which then contain vectors comprised of linked lists? I am not very into JavaScript, but that right there must preclude this from doing anything significant in a reasonable amount of time?

frik · on Jan 30, 2014

That's great, keep up the good work.

For some reason it's somewhat hard to find C-style science code examples in some disciplines. Python feels a bit like a plague in this respect. Every time, I have to wrap my head around it while converting the code to a C-like language (C, C++, PHP, JS).

tlarkworthy · on Jan 30, 2014

The distance to convert math to python is so much shorter than math to C or math to javascript.

You need something like numpy to make working in JavaScript easier before there will be a proliferation of ML in JS.

I really love JS for its distribution and some of the visualizations are amazing. But the low level, numerically stable, matrix math primitives are sorely lacking.

nashequilibrium · on Jan 30, 2014

You need the love of the data science community and they have settled on python, and when they really want to scale they use Java. I noticed Julia becoming very popular and the twitter guys use scala for their stuff.

Already__Taken · on Jan 30, 2014

I feel the people doing heavy work with graphics APIs and 3D work with robots too will push this back into the javascript language given some time.

tlarkworthy · on Jan 30, 2014

I work with robots. The leading middleware is ROS. It is multi-language, but does not support JS out of the box (unlike LISP, Java, C++ and Python), though there is movement there: http://brandonalexander.com/rosnodejs/

I can see JS useful for a UI to a robot, but I can't see it replacing Python for math, or C++ for speed, or LISP for planning systems.

That said, I can imagine node.js being a better async message router than the current C++ one.

ROS is glued together with XML-RPC, which I think was a mistake (why not JSON???)

Joe8Bit · on Jan 30, 2014

If you're looking for more on this, or some general purpose JS NLP, you could do worse than to check out node-natural[0]

[0]: https://github.com/NaturalNode/natural

e12e · on Jan 30, 2014

Interesting, but I wonder about this:

> … well, most of the time. There are some things you really can’t do in PHP or Javascript, but those are the more advanced algorithms that require heavy matrix math.

Leaving out JavaScript (in the browser), it sounds like an odd statement to make about PHP -- after all, one of PHP's strengths is how easy it is to link with C libraries (or others with a C FFI)? Among other things I quickly found:

http://www.php.net/manual/en/intro.lapack.php

code_scrapping · on Jan 30, 2014

I still view JS as a UI-oriented language, and I really don't know why you would want to implement processor-heavy algorithms, which need a lot of data and don't use the network, in a browser environment.

I would still stick to python. Or java. Or anything else which has a clear syntax and can run at a useful speed (I'm not mentioning C++ because of the coding overhead and dirty tricks which makes it a bit unfriendly for learning an algorithm)

dangoor · on Jan 30, 2014

Clarity of syntax is a matter of opinion (personally, I agree that Python is clearer than JS... Java, not so much.)

Implying that JavaScript can't "run at a useful speed" is wrong, using modern implementations. This is especially true for code that runs through lots of repetition as the just-in-time compilers in the JS engines do a remarkable job.

Not to mention that viewing JS as a UI-oriented language seems a bit out of date given the 40k or so packages for Node.js that are in npm.

JavaScript of today is pretty different than JS of 2007, and there are more changes coming with generators, iterators, destructuring, class syntax, arrow functions, promises, etc.

Joe8Bit · on Jan 30, 2014

While I disagree with the comment you're responding to, and agree with yours, there are some interesting problems doing resource heavy operations in ML/NLP in an environment like Node that's inherently single threaded.

I'm actually adding multi-threading to classifier training in node-natural as we speak [0] so it's something I'm recently familiar with. Multi-threading in JS isn't new or particularly exciting (even less so is multithreading in ML/NLP applications) but the marriage of the two has led to a few interesting problems in JS's asynchronous/event based view of the world!

[0]: https://github.com/NaturalNode/natural/issues/124

--

Edited for clarity

dangoor · on Jan 30, 2014

That's a good point. Of course, the problems with shared mutable state are well-documented and I'm glad that JavaScript hasn't headed down that path. But you're right that Node doesn't have good, mature solutions for that yet (short of your central data store option)

xd · on Jan 30, 2014

"Through this series of articles, I’ll teach you the fundamental machine learning algorithms using Javascript"

perimo · on Jan 30, 2014

I kicked around with some JS manifold learning stuff[1] a while back for essentially the same purpose: practice in writing things from scratch, while making it easier for other people to play with.

[1]: https://github.com/perimosocordiae/js_manifolds

ecesena · on Jan 30, 2014

Cool project! Have you already tried asm.js and/or measured performances?

frik · on Jan 31, 2014

You cannot write asm.js by hand (in a sane way... it uses one big array for everything). It's meant to be generated by the Emscripten compiler project (which builds on Clang), so you can compile C/C++ code to asm.js.

But JavaScript engines like V8 with its JIT are way faster than Python. You can even use typed arrays that give you almost native speed for such operations (e.g. matrix). I am coding a 3D game in WebGL and JS is as fast as Java when used in a modern fashion, though JS runs in every browser
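To make the typed-array point concrete, here is a minimal sketch of a matrix multiply over `Float64Array` buffers. The function name `matmul` and the row-major layout are assumptions made for the example, and it makes no claim about how close to native speed this gets on a given engine.

```javascript
// Multiply an (n x m) matrix by an (m x p) matrix.
// Both inputs are flat Float64Array buffers in row-major order.
function matmul(a, b, n, m, p) {
  const out = new Float64Array(n * p); // typed arrays start zero-filled
  for (let i = 0; i < n; i++) {
    for (let k = 0; k < m; k++) {
      const aik = a[i * m + k];
      for (let j = 0; j < p; j++) {
        out[i * p + j] += aik * b[k * p + j];
      }
    }
  }
  return out;
}

const a = new Float64Array([1, 2, 3, 4]); // [[1, 2], [3, 4]]
const b = new Float64Array([5, 6, 7, 8]); // [[5, 6], [7, 8]]
const c = matmul(a, b, 2, 2, 2);
console.log(Array.from(c)); // [[19, 22], [43, 50]] flattened row by row
```

Keeping every element in one contiguous buffer of doubles, rather than in nested plain arrays, is what gives the JIT a predictable memory layout to optimize.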

TeeWEE · on Jan 30, 2014

X in JavaScript..... ugh

if all you have is a hammer, everything looks like a nail

bkanber · on Jan 30, 2014

Have you read the article? I make it pretty apparent that JS is used primarily for its educational value :)

TeeWEE · on Jan 30, 2014

To be honest, I didn't initially. I have just read it.

I think it is a noble thing to explain this in JS. But I don't think "because everybody uses JS" is a good reason to choose JS.

However, your specific use case makes sense. But in a broader sense I see more and more people fleeing to JS because it's what they know.

thrush · on Jan 30, 2014

What alternatives would you recommend for someone new to programming and CS?

I guess fleeing implies that they were using other tools already, but a lot of new devs are going to JS because it just makes sense to start there (lots of flexibility, hyperactive community, education value).

stusmall · on Jan 30, 2014

If you are at the point of your CS education that you are taking a serious look at machine learning and understanding the theory then you shouldn't have a problem translating into whatever your language of choice is. I get why a teacher would just want to pick a language and say "this is what it is in" but I don't get people who need CS concepts taught in their language of choice. The hard part is the theory and not the implementation.

When I took it in university it was taught in language-agnostic pseudocode and we were free to use any language from a long list for our assignments.

bkanber · on Jan 30, 2014

> If you are at the point of your CS education that you are taking a serious look at machine learning and understanding the theory then you shouldn't have a problem translating into whatever your language of choice is.

This feels like begging the question. Why does that need to be the case? Why can't someone strive to learn machine learning _without_ learning a new language? Why can't they get a head start on the concepts early in their career? Is there some requirement that ML _must_ be an advanced topic, only accessible to polyglots that I haven't heard about?

computerslol · on Jan 31, 2014

When I first wrote "Introduction to open heart surgery: your housecat" people asked me "why housecats?". Because they are easy to find, and can get you real experience. No need to work up to open heart surgery, it should be available to anyone that has a sharp implement, a housecat, and a thirst for knowledge. Bloody, messy, knowledge.

stusmall · on Jan 30, 2014

Because the nature of the subject requires a fair amount of background. To truly understand the subject and a lot of the approaches, you need a firm understanding of statistics, data structures, and even some calculus. Usually by the time someone has these subjects down enough for anything substantial in ML, they've seen enough different languages to suss out the general idea of most algorithm sample code.

I'm not saying there isn't room for the easier to understand and easier to read guides to ML. More the better; Mitchell was a beast to read through. It's just the language isn't the hard part of the subject. You are the author of the link, correct? I read through some of it, and it approaches the theory and subject matter in a gentle way, which is what matters. The sample code is easy to read. I've written maybe 100 lines of JS in my life and avoid all web dev like the plague. Your guide is well written and useful. I am not dogging it at all and please don't take it that way. I think it's great!

What I'm saying is if someone is saying to themselves "I would be able to learn machine learning if only there was a guide in X" then they are probably mistaken. The code is easy; the math and theory are what's hard.

bkanber · on Jan 30, 2014

> Its just the language isn't the hard part of the subject.

For you, sure -- but not for everyone.

This series has actually been up for a little over a year now. I get emails from people who didn't know what machine learning was before they started reading the articles, and now they're building some of the most creative and beautiful projects out there. I also get emails from people who need to implement ML in JS or C-like languages but have had trouble seeing the algorithms in full relief when translating from Python, for instance.

The point is, your experience is not everyone's experience. My goal is purely one of accessibility of education. There are smart, talented people who never played with ML simply because they didn't want to dive into a different language, different platform, and different environment just to muck around. There are people who hadn't heard of ML before, but tried it out because JS was right there for them. There are people who stayed away from ML because they thought higher math and a CS education were requirements. Those are facts. This series serves all those people, and it serves them well.

stusmall · on Jan 31, 2014

I just want to make sure that it's clear that I think you are hitting that goal and your posts look great. There is nothing worse than posting something on the internet to have some snarky neckbeard from the peanut gallery put it down for some tangential reason.

xiaoma · on Jan 30, 2014

>"To truly understand the subject and a lot of the approaches a firm understand of statistics, data structures, and even some calculus. Usually by the time someone these subjects down enough for anything substantial in ML then they've seen enough different languages to suss out the general idea of most algorithm sample code."

I learned calculus and linear algebra long before learning to code.

stusmall · on Jan 31, 2014

I don't mean to be snarky, but how was your data structures knowledge then?

He asked "Is there some requirement that ML _must_ be an advanced topic" and I listed a few prerequisite pieces of knowledge that make it fairly advanced. You may learn those prerequisites in a different order but they are required before you properly tackle machine learning without cargoculting through it.

LambdaAlmighty · on Jan 30, 2014

It was only a question of time before the Machine Learning wave arrived at the arguably least suited--but certainly the most popular--platform!

There's money to be made with this combination. The field is ripe.

Good write up too.