Machine Learning in JavaScript (burakkanber.com)
169 points by xd on Jan 30, 2014 | 49 comments



Author here -- thanks for submitting, xd!

Let me know if you have any questions. I do intend to keep up with this series, although my pace is pretty slow at about one article every three months or so.

There are already a couple of comments about running ML in JS and how JS and the browser environment aren't terribly suited for heavy calculations. First: you're totally correct; second, I chose JS because it's

1) accessible -- whether you're Python or Ruby or PHP on the backend, you're probably comfortable with JS and

2) it demystifies machine learning -- you have to write your ML from scratch, without the help of all those wonderful Python libs, and I think this exercise shows you that it's not so mysterious after all.

Anyway, thanks for reading, and I'll poke in here throughout the day if you have questions.


No no, thank you!

I've been building a data management platform for the last 8 years and we are now at the stage where we want to provide tools to help our customers get more from their data than just statistics. As my programming experience is mainly in PHP and JS, this set of articles is helping me grasp ML rather than trying to wrap my head around a new language. I'm currently working on k-means clustering and re-implementing everything in PHP to get the best possible understanding I can. My aim after that is to see how well I can implement things at an SQL level.


Excellent, I'm glad to hear it! If you ever want to reach out, feel free -- email in my profile.


This is great stuff - thank you very much.

>all those wonderful Python libs

As a non-mathematician I have no understanding of how wonderful they really are, which is why this sort of thing is so valuable.


Article looks really good! Looking forward to a longer read tonight. Minor nit: I'd stick with idiomatic JS style/formatting. Looks like you mixed in styles from some of the other languages (tab indentation, braces, etc.). This is always a religious argument, but I write a lot of different languages, and always just try to stick with the most popular idiomatic way, regardless of whether I care for it or not.


Yeah, 8-space tabs are a bit ridiculous. I prefer 2 or at most 3.


"2) it demystifies machine learning -- you have to write your ML from scratch, without the help of all those wonderful Python libs, and I think this exercise shows you that it's not so mysterious after all." - you can demystify machine learning in any better language.

JS and PHP are slow, crappy and bug prone. Sane languages (like Python, C++) have tools to make your job easier (like numpy, blas, eigen and other libraries). They provide fast and reliable math routines so you don't have to worry about some eigenvalue decomposition, matrix multiplication and other problems.


I'm not sure I understand the point you're trying to make. Are you telling me I should not have written this series? That it's somehow not valuable because you don't like JS? That I can't effectively teach basic machine learning concepts to interested people without forcing linear algebra onto them? That, as a teacher, I can't try to use any and all tools and techniques available to the group I'm trying to teach?

I've even explicitly mentioned that I'm staying away from algorithms that rely on linear algebra, because I'm trying to bring these concepts to people who may not have a CS or mathematical background.


JS isn't particularly slow any more, thanks to the massive efforts invested in the optimization of the various competing JS engines. It is generally faster than Python 3 (http://benchmarksgame.alioth.debian.org/u64/benchmark.php?te...) and not just a little. Just for the hell of it I compared node and python on naive fibonacci. That is anecdotal, of course, but node is 30 times faster. I do agree with you that JS has horrid, bug-prone semantics, but it's impressively well optimized.

I also agree that other languages offer better tools. For instance, Python has NumPy. However, its core is written in C, not in Python. You can write native plugins for Node too (in C++), so nothing would stop someone from writing a NumPy equivalent for JS. You might even be able to run it in some browsers through something like Emscripten with a performance overhead of 2-3x (I think?)
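
A naive Fibonacci micro-benchmark of the kind mentioned above might look roughly like the following. This is an illustrative sketch, not the commenter's actual code: the function names `fib` and `timeFib` and the input `n = 30` are assumptions, and the timing you see will depend on your machine and engine.

```javascript
// Naive, exponential-time recursive Fibonacci. It is deliberately
// unoptimized, so it mostly measures function-call overhead in the engine.
function fib(n) {
  return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

// Time a single call. `timeFib` is a hypothetical helper for illustration.
function timeFib(n) {
  const start = Date.now();
  const value = fib(n);
  return { value, elapsedMs: Date.now() - start };
}

const result = timeFib(30); // n = 30 is an arbitrary choice
console.log(`fib(30) = ${result.value} in ${result.elapsedMs} ms`);
```

Running the equivalent recursive function in another language on the same machine gives a rough, anecdotal comparison; it says little about numeric workloads that lean on native libraries.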


Let's not forget WebCL. GPU-based matrix multiplication is available in JavaScript too.


For those who aren't familiar, Andrej Karpathy has done a lot of cool stuff with ML in JS. In particular, he has a CNN library -- deep learning comes to JS!

http://cs.stanford.edu/people/karpathy/convnetjs/

http://cs.stanford.edu/people/karpathy/svmjs/demo/

Heather Arthur (npm libraries brain, classifier) has also done a bunch of cool stuff!

https://github.com/harthur


To those wondering why someone would want ML in JS, there are loads of reasons.

For starters, node.js, which makes most of the arguments regarding server/client moot.

Secondly, there are many client side applications for these types of algorithms as well. K-means clustering, for example, is already used by many mapping libraries to group together large numbers of points[1].

I personally use neural networks and affinity propagation in many of my applications for predictive analysis. This does not have to only be educational, or of a 'toy' nature.

[1] http://danzel.github.io/Leaflet.markercluster/example/marker...
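
For readers who have not seen it, k-means on 2D points (the kind of grouping described above) can be sketched in a few lines of plain JS. This is a minimal sketch under stated assumptions, not how any particular mapping library does it: the names `dist2`, `nearest` and `kmeans` are made up for illustration, and the first `k` points are used as initial centroids to keep the example deterministic.

```javascript
// Squared Euclidean distance between two [x, y] points.
function dist2(a, b) {
  const dx = a[0] - b[0];
  const dy = a[1] - b[1];
  return dx * dx + dy * dy;
}

// Index of the centroid closest to point p.
function nearest(p, centroids) {
  let best = 0;
  for (let i = 1; i < centroids.length; i++) {
    if (dist2(p, centroids[i]) < dist2(p, centroids[best])) best = i;
  }
  return best;
}

// Lloyd's algorithm. Assumption: the first k points seed the centroids;
// real code usually seeds randomly (or with k-means++).
function kmeans(points, k, maxIterations = 100) {
  let centroids = points.slice(0, k).map(p => p.slice());
  let labels = new Array(points.length).fill(-1);

  for (let iter = 0; iter < maxIterations; iter++) {
    // Assignment step: label each point with its nearest centroid.
    const next = points.map(p => nearest(p, centroids));
    const changed = next.some((label, i) => label !== labels[i]);
    labels = next;
    if (!changed) break; // converged: no point switched cluster

    // Update step: move each centroid to the mean of its points.
    const sums = centroids.map(() => [0, 0, 0]); // [sum x, sum y, count]
    points.forEach((p, i) => {
      const s = sums[labels[i]];
      s[0] += p[0];
      s[1] += p[1];
      s[2] += 1;
    });
    centroids = centroids.map((c, i) =>
      sums[i][2] > 0 ? [sums[i][0] / sums[i][2], sums[i][1] / sums[i][2]] : c);
  }
  return { centroids, labels };
}

const points = [[0, 0], [10, 10], [0, 1], [1, 0], [10, 11], [11, 10]];
const clustered = kmeans(points, 2);
console.log(clustered.labels);    // cluster index per point
console.log(clustered.centroids); // one [x, y] per cluster
```

A marker-clustering library would typically work on projected map coordinates and re-cluster per zoom level; the sketch only shows the core assign/update loop.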


Node.js is for I/O. I know it has workarounds for long-running tasks (thread pool), but it does not excel at that. Training your model, updating your model, validation, matrix factorization on large datasets, etc.: I just don't see how JavaScript helps here. Maybe just taking the HTTP request and dumping it onto a RabbitMQ queue to classify something, but you still have a whole host of other stuff to deal with.


> Nodejs is for i/o

Node is a general purpose language that can be used for all kinds of things. I switched from Python to node.js about a year ago for exactly the sort of tasks you are describing and could not be happier. Right off the bat I had huge speed improvements.

Also, I/O is one of the biggest issues with web-based data analysis, so it really should not be underestimated. I can do more with less with Node than I could with Python. This is especially true with long-running tasks, where a 1-minute processing time vs. a 20-minute processing time might mean you need 1/20th the number of servers in a cluster ($$$).

Of course, this could be a pretty good argument for something even faster/lower level, but for me, node.js struck a good balance between performance and ease of development/ecosystem. As usual, YMMV.

One last point. The language you choose cannot always be the best language for every task you need. Typically you choose a stack based on the most common/important tasks in your infrastructure, then for less common tasks you just make it work with what the chosen language provides. In this case node.js does not need to be the best solution for ML, it just needs to check the box for being possible, so that devs who needed node.js for other reasons now have the ability to add ML to their toolbox.


> Node is a general purpose language that can be used for all kinds of things.

Node is a library that can be used for all kinds of things. FTFY


Fair enough, it is not a language, but I would not call it a library either. Perhaps a platform?[1] It contains many libraries, but also has a cli tool, packages a runtime, a debugger, and even includes a package manager. It contains the sorts of things that would be packaged when you install languages like Python or Ruby.

[1] http://nodejs.org/


I think they are making the argument for capability rather than best of class. This is really exciting for Javascript and NodeJS devs right now because of the "Give me an inch and I'll take a foot" mentality.


This blog also contains a JavaScript physics series which is very cool.

I do hope this author writes more again; it has been quiet.


Thanks! -- I am intending to write more, but the pace is sloooowww.


For neural networks, you can use BrainJS

https://github.com/harthur/brain


So the input/output pairs are a linked list of objects? Which then contain vectors comprised of linked lists? I am not very into JavaScript, but that right there must preclude this from doing anything significant in a reasonable amount of time?


That's great, keep up the good work.

For some reason it's somewhat hard to find C-style science code examples in some disciplines. Python feels a bit like a plague in this respect. Every time, I have to wrap my head around it while converting the code to a C-like language (C, C++, PHP, JS).


The distance to convert math to python is so much shorter than math to C or math to javascript.

You need something like numpy to make working in JavaScript easier before there will be a proliferation of ML in JS.

I really love JS for its distribution and some of the visualizations are amazing. But the low level, numerically stable, matrix math primitives are sorely lacking.


You need the love of the data science community, and they have settled on Python; when they really want to scale they use Java. I've noticed Julia becoming very popular, and the Twitter guys use Scala for their stuff.


I feel the people doing heavy work with graphics APIs and 3D work with robots too will push this back into the javascript language given some time.


I work with robots. The leading middleware is ROS. It is multi-language, but does not support JS out of the box (unlike LISP, Java, C++ and Python), though there is movement there: http://brandonalexander.com/rosnodejs/

I can see JS being useful for a UI to a robot, but I can't see it replacing Python for math, or C++ for speed, or LISP for planning systems.

That said, I can imagine node.js being a better async message router than the current C++ one.

ROS is glued together with XML-RPC, which I think was a mistake (why not JSON???)


If you're looking for more on this, or some general-purpose JS NLP, you could do worse than to check out node-natural[0]

[0]: https://github.com/NaturalNode/natural


Interesting, but I wonder about this:

> … well, most of the time. There are some things you really can’t do in PHP or Javascript, but those are the more advanced algorithms that require heavy matrix math.

Leaving out JavaScript (in the browser), it sounds like an odd statement to make about PHP. After all, one of PHP's strengths is how easy it is to link with C libraries (or others with a C FFI). Among other things I quickly found:

http://www.php.net/manual/en/intro.lapack.php


I still view JS as a UI-oriented language, and I really don't know why you would want to implement processor-heavy algorithms, which need a lot of data and don't use the network, in a browser environment.

I would still stick to python. Or java. Or anything else which has a clear syntax and can run at a useful speed (I'm not mentioning C++ because of the coding overhead and dirty tricks which makes it a bit unfriendly for learning an algorithm)


Clarity of syntax is a matter of opinion (personally, I agree that Python is clearer than JS... Java, not so much.)

Implying that JavaScript can't "run at a useful speed" is wrong, using modern implementations. This is especially true for code that runs through lots of repetition as the just-in-time compilers in the JS engines do a remarkable job.

Not to mention that viewing JS as a UI-oriented language seems a bit out of date given the 40k or so packages for Node.js that are in npm.

JavaScript of today is pretty different than JS of 2007, and there are more changes coming with generators, iterators, destructuring, class syntax, arrow functions, promises, etc.


While I disagree with the comment you're responding to, and agree with yours, there are some interesting problems doing resource heavy operations in ML/NLP in an environment like Node that's inherently single threaded.

I'm actually adding multi-threading to classifier training in node-natural as we speak [0] so it's something I'm recently familiar with. Multi-threading in JS isn't new or particularly exciting (even less so is multithreading in ML/NLP applications) but the marriage of the two has led to a few interesting problems in JS's asynchronous/event based view of the world!

[0]: https://github.com/NaturalNode/natural/issues/124

--

Edited for clarity


That's a good point. Of course, the problems with shared mutable state are well-documented and I'm glad that JavaScript hasn't headed down that path. But you're right that Node doesn't have good, mature solutions for that yet (short of your central data store option)


"Through this series of articles, I’ll teach you the fundamental machine learning algorithms using Javascript"


I kicked around with some JS manifold learning stuff[1] a while back for essentially the same purpose: practice in writing things from scratch, while making it easier for other people to play with.

[1]: https://github.com/perimosocordiae/js_manifolds


Cool project! Have you already tried asm.js and/or measured performance?


You cannot write asm.js by hand (in a sane way... it uses one big array for everything). It's meant to be generated by the Emscripten (Clang-based) compiler project, so you can compile C/C++ code to asm.js.

But JavaScript engines like V8 with its JIT are way faster than Python. You can even use typed arrays that give you almost native speed for such operations (e.g. matrix math). I am coding a 3D game in WebGL, and JS is as fast as Java when used in a modern fashion, though JS runs in every browser.
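
To make the typed-array point concrete, here is a minimal sketch of a dense matrix multiply over flat `Float64Array` buffers. The function name `matmul` and the row-major layout are assumptions chosen for illustration; no speed claim is attached to this snippet, since that depends on the engine and the matrix sizes.

```javascript
// Multiply an (n x m) matrix by an (m x p) matrix. Both inputs are stored
// row-major in flat Float64Arrays; the result is a new (n x p) Float64Array.
function matmul(a, b, n, m, p) {
  const out = new Float64Array(n * p); // typed arrays start zero-filled
  for (let i = 0; i < n; i++) {
    for (let k = 0; k < m; k++) {
      const aik = a[i * m + k];
      // Inner loop walks contiguous memory in both `b` and `out`.
      for (let j = 0; j < p; j++) {
        out[i * p + j] += aik * b[k * p + j];
      }
    }
  }
  return out;
}

const a = Float64Array.from([1, 2, 3, 4]); // [[1, 2], [3, 4]]
const b = Float64Array.from([5, 6, 7, 8]); // [[5, 6], [7, 8]]
console.log(Array.from(matmul(a, b, 2, 2, 2))); // [19, 22, 43, 50]
```

Keeping the data in one flat buffer, instead of an array of arrays, gives the JIT fixed element types and contiguous storage to work with.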


X in JavaScript..... ugh

if all you have is a hammer, everything looks like a nail


Have you read the article? I make it pretty apparent that JS is used primarily for its educational value :)


To be honest, I didn't initially. I just read it.

I think it is a noble thing to explain this in JS. But I don't think "because everybody uses JS" is a good reason to choose JS.

However, your specific use case makes sense. But in a broader sense I see more and more people fleeing to JS because it's what they know.


What alternatives would you recommend for someone new to programming and CS?

I guess fleeing implies that they were using other tools already, but a lot of new devs are going to JS because it just makes sense to start there (lots of flexibility, hyperactive community, education value).


If you are at the point of your CS education that you are taking a serious look at machine learning and understanding the theory then you shouldn't have a problem translating into whatever your language of choice is. I get why a teacher would just want to pick a language and say "this is what it is in" but I don't get people who need CS concepts taught in their language of choice. The hard part is the theory and not the implementation.

When I took it in university it was taught in language-agnostic pseudocode and we were free to use any language from a long list for our assignments.


> If you are at the point of your CS education that you are taking a serious look at machine learning and understanding the theory then you shouldn't have a problem translating into whatever your language of choice is.

This feels like begging the question. Why does that need to be the case? Why can't someone strive to learn machine learning _without_ learning a new language? Why can't they get a head start on the concepts early in their career? Is there some requirement that ML _must_ be an advanced topic, only accessible to polyglots that I haven't heard about?


When I first wrote "Introduction to open heart surgery: your housecat" people asked me "why housecats?". Because they are easy to find, and can get you real experience. No need to work up to open heart surgery, it should be available to anyone that has a sharp implement, a housecat, and a thirst for knowledge. Bloody, messy, knowledge.


Because the nature of the subject requires a fair amount of background. To truly understand the subject and a lot of the approaches, you need a firm understanding of statistics, data structures, and even some calculus. Usually by the time someone has these subjects down well enough for anything substantial in ML, they've seen enough different languages to suss out the general idea of most algorithm sample code.

I'm not saying there isn't room for easier-to-understand and easier-to-read guides to ML. The more the better; Mitchell was a beast to read through. It's just that the language isn't the hard part of the subject. You are the author of the link, correct? I read through some of it, and it approaches the theory and subject matter in a gentle way, which is what matters. The sample code is easy to read. I've written maybe 100 lines of JS in my life and avoid all web dev like the plague. Your guide is well written and useful. I am not dogging it at all and please don't take it that way. I think it's great!

What I'm saying is that if someone is saying to themselves "I would be able to learn machine learning if only there was a guide in X" then they are probably mistaken. The code is easy; the math and theory are what is hard.


> Its just the language isn't the hard part of the subject.

For you, sure -- but not for everyone.

This series has actually been up for a little over a year now. I get emails from people who didn't know what machine learning was before they started reading the articles, and now they're building some of the most creative and beautiful projects out there. I also get emails from people who need to implement ML in JS or C-like languages but have had trouble seeing the algorithms in full relief when translating from Python, for instance.

The point is, your experience is not everyone's experience. My goal is purely one of accessibility of education. There are smart, talented people who never played with ML simply because they didn't want to dive into a different language, different platform, and different environment just to muck around. There are people who hadn't heard of ML before, but tried it out because JS was right there for them. There are people who stayed away from ML because they thought higher math and a CS education were requirements. Those are facts. This series serves all those people, and it serves them well.


I just want to make sure that it's clear that I think you are hitting that goal and your posts look great. There is nothing worse than posting something on the internet only to have some snarky neckbeard from the peanut gallery put it down for some tangential reason.


>"To truly understand the subject and a lot of the approaches a firm understand of statistics, data structures, and even some calculus. Usually by the time someone these subjects down enough for anything substantial in ML then they've seen enough different languages to suss out the general idea of most algorithm sample code."

I learned calculus and linear algebra long before learning to code.


I don't mean to be snarky, but how was your data structures knowledge then?

He asked "Is there some requirement that ML _must_ be an advanced topic" and I listed a few prerequisite pieces of knowledge that make it fairly advanced. You may learn those prerequisites in a different order but they are required before you properly tackle machine learning without cargoculting through it.


Only a question of time before the Machine Learning wave arrived at the arguably least suited--but certainly the most popular--platform!

There's money to be made with this combination. The field is ripe.

Good write up too.



