> What we liked the most about Javascript was the fact that it’s an asynchronous language ‘by default’.
That's not strictly true - a "while (true)" will lock up a Node process as far as I can tell. I think a more accurate way of stating it would be that "Javascript APIs and libraries tend to be written with asynchronous use in mind", with lots of callbacks.
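For illustration, here's a minimal Node sketch of that blocking behaviour (route and port are made up): a synchronous loop never yields, so every other request on the process starves, while the callback-style APIs return control immediately.

    // Hypothetical demo server: hitting /spin freezes the whole process,
    // because the event loop never gets control back.
    var http = require('http');

    http.createServer(function (req, res) {
      if (req.url === '/spin') {
        while (true) {}                  // synchronous: starves every other request
      }
      // The "asynchronous by convention" style: the callback runs later,
      // and the current tick finishes immediately.
      setTimeout(function () {
        res.end('hello\n');
      }, 100);
    }).listen(8080);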
If you want something that's async at a deeper level, Erlang is worth a look.
Ah, so it uses preemptive rather than cooperative multitasking. I find that locking the whole thread is usually an error, so preemptive multitasking wouldn't help much there, and I like how cooperative multitasking lets me reason about the program, but I can see the benefits in both approaches.
Erlang does not have a problem with reasoning about the program, because it doesn't share any data. Erlang processes are actually independent from each other in the same way that two OS processes are independent from each other (that is, there are still resource contention issues, but their abstract correctness doesn't unduly depend on each other).
I'm running out of polite ways to say this, but Node advocates really need to learn about other ways of doing things before advocating so confidently that Node's way is better. Cooperative multithreading does not have the reasoning advantage, which is one of the reasons it was abandoned at the OS level so long ago. It's considerably harder to work with than preemptive multithreading combined with the other techniques that have been developed over the decades.
(Also, I said "Node advocates" and not StavrosK or "you" specifically; I mean that more generally than just your post here.)
Oh, I'm speaking from more of a Go/gevent approach. Obviously, a correctly-designed program would try to avoid shared data structures as much as possible, in any language, e.g. for Go you would write goroutines that communicate with channels.
The "reasoning" I was referring to was more that I know that simple things like incrementing a global counter are guaranteed to be atomic. I agree with you that that's still a shared structure, and thus dirty, so I avoid it anyway.
Now that you mention it, I don't actually know why I said that before, since I never write production code like that. I guess I was referring to ad-hoc scripts, where I like knowing that I don't need a lock to guarantee atomic operations, and thus they're a bit easier to write.
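To put the same point in Node terms (since that's what the thread is about): in a single-threaded event loop, callbacks never preempt each other mid-statement, so a plain counter increment needs no lock. A minimal sketch, with made-up task durations:

    // Each callback runs to completion before the next one starts,
    // so `completed += 1` cannot be interleaved by another callback.
    var jobs = [50, 100, 150];   // stand-ins for async task durations (ms)
    var completed = 0;

    jobs.forEach(function (ms) {
      setTimeout(function () {
        completed += 1;          // safe without a lock
        if (completed === jobs.length) {
          console.log('all jobs finished');
        }
      }, ms);
    });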
It will block your process (just like a `receive' does), but a good Erlang/OTP system generally comprises a great many processes. If your design can't tolerate one process blocking, the design is wrong; the system itself is built to stay robust when things block like this. You expect your processes to be blocked, but you design things so that truly concurrent activities are being run in parallel.
It is an order of magnitude easier in Node to block the whole OS process by doing anything CPU-intensive. In Erlang, if you're not using C bindings, there is nothing a single Erlang process can do to block the whole runtime.
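One common Node-side workaround for CPU-heavy work, sketched here with made-up numbers, is to slice the computation and yield back to the event loop between slices (the heavier-weight alternative being a child process or a separate service):

    // Sum 0..n-1 in bounded chunks so other callbacks can run in between.
    function sumSlowly(n, done) {
      var total = 0, i = 0;
      (function step() {
        var stop = Math.min(i + 1e6, n);
        for (; i < stop; i++) {
          total += i;                    // a bounded slice of CPU work
        }
        if (i < n) {
          setImmediate(step);            // yield to the event loop, continue later
        } else {
          done(total);
        }
      })();
    }

    sumSlowly(1e8, function (total) {
      console.log('sum:', total);
    });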
Well, yes, there is. When implementing locks, it depends on how likely contention is and how long it lasts. If the expected cost of busy waiting is less than the expected cost of interacting with a heavier-weight scheduling system (which may mean crossing the kernel boundary), then busy waiting is better. This is, of course, for lower-level code. For example, spin locks (so-called because they spin - or busy wait - on a status variable) are frequently used in kernel code, and I have used them in a memory allocator.
Well, do you mean per process or for the whole node? You'd have to go out of your way in either case. For a single process, you could create a tail-recursive loop that just calls itself. It will run for 2000 reductions, then the scheduler will kick it out, and so on. But it won't block other processes.
To block a whole node you'd perhaps have to write a native function (a NIF) and do it there. Which is also a reason to be careful with NIFs: they can kill predictable latency under load, even if they initially show a performance improvement in serial benchmarks.
I would say yes, it's very easy to write inline anonymous functions which are very very nice for a primarily callback-driven model. I much prefer Python as a language overall, but I think writing code for "Node.py" would be an awful lot harder.
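What that looks like in practice (the file path is just for illustration): the anonymous callback is defined right at the call site, instead of in a separately named handler.

    var fs = require('fs');

    fs.readFile('/etc/hosts', 'utf8', function (err, text) {
      if (err) {
        return console.error('read failed:', err);
      }
      console.log('first line:', text.split('\n')[0]);
    });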
I am using a mix of Ruby, Node and Java in production but after some hard-learned lessons chose to minimize the Node usage to things I absolutely need.
My experience with the Node community:
- Great people (Substack!)
- Great attitude #node.js/freenode
- However, many hours spent on solving bugs in existing libraries.
My experience with maintaining production Node code:
- It may be more maintainable than EventMachine, but it's absolutely not more maintainable than Ruby.
- V8 garbage collection is a real pain when you do work that needs it (this also includes memory held by open sockets).
- Was V8 built for the server? (rhetorical)
In the end, I prefer threaded Ruby code to evented Node code. I try to offset the inefficiencies of "threads vs evented" or "Ruby vs Node" by using the JVM and JRuby.
It seems needlessly limiting to make EventMachine and Node your only candidates. Since you've already got java in the stack, what about clojure? What about go?
I used the examples from the original article. I think the OP ported an EventMachine codebase to Node.
When I said 'Java' I really meant JVM. Specifically I'm using Scala.
Regardless, I have a different Clojure based project, with it as well, I prefer doing what clojure has to offer in terms of concurrency (pmap, etc).
I have dipped a toe in Go. Loved the fact that you get a compiled binary, ecosystem still feels skinny. Ended up concluding that in a year or two it'll be worth revisiting.
Like superfeeder, I have "backend" services, but I also have client-facing services.
I value how Node.js handles slow clients. It also services some more 'utility' use
cases for me, such as reverse proxies, etc.
JRuby and Scala cover IO bound processing for me over the Web.
JRuby covers the majority of the backend services.
I use Scala coupled with Storm. I could have used JRuby here too (much as you can use JRuby with Hadoop but typically don't). Since this
use case actually required the optimization (I wasn't prematurely optimizing), I went as close to bare metal as possible (and yes, that ruled Java out). Previously, this service was a Node.js service and got rewritten into Scala+Storm.
I don't use Akka because I didn't feel it was needed yet. Old school threaded workers with JRuby works fine so far.
I know that Scala is supposed to be a multi-layered solution and it can handle all of this.
However, Ruby and Node bring an ecosystem Scala doesn't have (I'm not moved by the "but Java has a million Jars out there" argument; I'm already integrating with them via JRuby).
And in general Ruby makes me happy (SBT makes me very very angry and sad, for comparison - yet Scala is OK), that simple.
>could you detail how you are using jruby to offset evented systems on node.js ?
Simply because JRuby has real threads over MRI's GIL, and the JVM has a time-proven Server VM that has a very good JIT. I also leave open the option to drop to 'bare' Java.
As the OP mentioned, it's a craft of balancing CPU and memory. In my case I don't mind the extra threads memory and context switching overhead.
> If you are using scala, which async framework are you using - I have heard it is basically scalatra vs spray.io
My Scala use case is with Storm, as mentioned in other reply. You can call that "backend" processing, instead of using it with a Web framework.
> In terms of performance, we also have seen a significant (about 25%) bump in terms of feeds processed per second per server.
They rewrote the entire codebase and obtained just a 25% gain? It doesn't sound like they are very happy to be coding in javascript now, either.
Everyone: please don't rewrite your code, it's almost never worth it. The one exception is rewriting a core part of an algorithm in C for speed (the last 10x speedup).
That's fine. Did the code become easier to maintain? It's not clear from the article. The "The good" section is relatively weak and the "The not-so-good" has some painful points about memory management and api instability.
It sounds a bit like you guys rewrote it because node.js is hot and you then stuck with the rewrite because of sunk costs. Or am I imagining things?
Thanks for sharing your experiences with us. I'm just asking those questions because I want to learn more about them.
Well, we assumed everyone knew about the good... so yes, we're happy with the rewrite and will probably never go back. Also, a 25% saving in servers for us translates into several thousand dollars saved monthly. Not negligible :)
Yeah but in your article you said you're not even sure if this is because of node. You made some architectural changes in your code base and hinted that might have been a reason for the speed boost too.
The article might as well be written as "we refactored our code and got a nice speed boost".
Hard to assess exactly, but we were not adding features or improving the user experience, mostly because maintaining and evolving the previous code base was so costly.
From a long-term maintainability perspective, javascript can be notorious. You could always follow a disciplined approach to development to help with maintainability, but that applies to all programming languages. I don't see anything about javascript that makes it more maintainable, especially for large projects.
Well, again, most of the dependencies we used in Ruby did not see any update in the 3 years we've been using them. We also reported several bugs in those libraries/dependencies which were never fixed. This led us over time to use our own branch of all the significant dependencies we had (including the MySQL gem for EM, the redis gem... and several other key ones).
Most of node modules are still in active development. As I've stated in the blog post, that's a pro and con, but we estimated that the pro was greater than the con :)
They might not be, but that doesn't matter as much as being able to report bugs and have them fixed now, while people are still actually developing/fixing things.
Trying to make do with horrible code for long periods of time, suffering through bugs in every change... Until we finally rewrite it. I've never regretted a rewrite, and it has always ended up significantly better than the original (possibly because the horror threshold for rewriting is high, so that's not saying much).
In my experience it's quite doable (with a good IDE) to refactor the working but messy code into something maintainable. It might take the same amount of time as a rewrite but you'll still have all the features and corner cases covered. Usually refactoring will be much faster though.
I disagree. Incremental changes are not always capable of reaching point B from point A.
A large, extremely messy code-base can make even trivial changes unsafe, let alone large changes.
The effort required not only to figure out what point B is, but also how to safely reach it from point A is immense. Much much harder than a rewrite.
However, it is a very good idea to meticulously read the bad old code and write down a list of things it handles -- to make sure none of it is missed in the rewrite.
I'd look at it from another side. They took a working project which they knew well and transferred it completely to a different platform. The first "complete" version of the rewrite had 25% more throughput.
In that case it's a pretty good achievement, since the rewrite will have a lot more low-hanging fruit regarding performance improvements than a polished, existing product. I'm used to seeing rewrites which are at least a little bit worse than previous versions, due to all the work done to squeeze everything out of version N-1.
"Also, C is not the only language worth rewriting in" Agreed!
I would really like to see someone write a node.js and Go back-end for the same front-end and do a head to head comparison.
As someone who spent a fair amount of time rewriting node.js prototypes in Go, I'm probably biased. I feel like javascript is a much less maintainable language. Perhaps it was the original node.js implementations (I don't think it was), but the Go versions were always faster, used less memory and, IMHO, were more readable.
Nothing too unexpected, off the top of my head, I've noticed:
* Use Go tip; you can grab a snapshot and review all the open issues for that snapshot (most are enhancements).
* Like any re-factor doing it sooner as opposed to later is less work :)
* Do some "from scratch" Go projects before doing re-factor projects to get your legs under you (if they are not already there)
* Write Go in Go, not C/Python/Java in Go. This is harder than you think when you get started, but, if you ask for help and people tell you you are fighting the system, carefully consider their advice.
* A lot of the Go community likes to use single-letter variable names in contexts like receivers and struct state (just look at the stdlib). Buck the system: don't do that; use short camelCase names. The next guy (or you in six months) will be glad you did.
* If you have a Java/C++ background you might often write a single-threaded version of a daemon and multithread it later; this is generally an unnecessary step in Go.
* The Go versions really are not much larger (LOC)
* There is lots of useful Go code on GitHub (don't be afraid to try it).
* If you are doing front-endy kind of stuff, supplement "net/http" with Gorilla where needed whenever you can, rather than rolling your own: http://www.gorillatoolkit.org/
* "go tool pprof" is a great tool; know how to use it and its top20 / web commands. Even if you don't feel the pain, use it and you will learn which things you do are expensive, and it will keep trouble from sneaking up on you.
* If you are using a SQL-based store, use a driver that implements the interfaces in "database/sql" rather than providing its own interface. This will make your life very simple if you need to migrate between MySQL <-> Postgres, etc.
* LiteIDE is a nice, lean, cross-platform Go IDE that includes syntax highlighting, autocomplete (with gocode) and debugging support. The only thing I had to do was write my own syntax highlighting theme, based on Solarized, because I thought the included ones were gross.
It depends on how large the code base is. Rewriting a 10k LOC project isn't a big deal, compared to a 10 million LOC one. And a rewrite doesn't have to target performance only, but ease of maintenance too.
You have better ways to ask for explanations. Not the OP but I can give some reasons for Javascript and Node.js to not be the best option for a rewrite if the goal is longterm maintainability:
- js / node.js is very young, it might fade out of fashion, and in 10 years it is not impossible that good coders in js will be hard to find.
- Readability and simplicity are prominent in this context, and js has a syntax that is less than optimal in this regard.
- Maintenance tooling may be lacking.
If I were to choose a language for a project that I knew would be big and would need care in 10 or 20 years, I'd hesitate, and maybe choose Java (which I hate) or Python (which I hate less). If I were in a mood to take risks, I'd give Go a try.
Technology and business requirements move too fast to worry about writing software that will be around 10 years from now. Choosing the best technology for your particular use case, that will also allow your software to evolve over time, is much more important than trying to predict the future.
Also, I've gotta say, you completely undermined all 3 points of your argument by saying you'd choose Go.
> Technology and business requirements move too fast to worry about writing software that will be around 10 years from now
This is a common belief but I do not think it applies very well in most cases. If you write the latest pic-sharing thing, maybe you can dismiss such worries, but if you build the next Google, enterprise or scientific software, or even something like Github, you should hope your baby will be alive and well in 10 years, and choose your tech stack accordingly. I guess.
Google in 2002 was nothing like Google in 2012. The only thing the same is the name. The entire software industry has changed at least twice in the last decade or so. If more than 10% of the original code that powered Google is still in production, I would be absolutely shocked.
Remember how Twitter was originally written in Rails, and then it collapsed under its own weight when they tried to scale it? Their business requirements changed, so they rewrote the parts that needed to be rewritten. On the flip side, if they had originally set out to try to build the system that now powers Twitter, not only would they likely have built the wrong thing entirely, they never would have launched.
Software services are evolutionary. You have to always be willing to burn pieces of it down and rewrite them as conditions change around you.
Sure, but changing the tech stack is much harder than rewriting some parts of it. And some setups are easier to adapt than others. Thus choosing the best available option when starting a new project is important, and the likelihood of the technology being alive and well in ten years should be taken into consideration, among other parameters (your own experience, the current availability of good developers).
Moreover, the "trendiness" of a technology should be counted as a negative factor, because it would tend to be overestimated and cast shadows on potentially better (but less sexy) alternatives.
I don't see how finding JavaScript programmers is going to be a problem in 10 years... even if they started now it would take at least that long to fully deprecate the language out of the browser. NodeJS may go away, but betting on JavaScript going away is a pretty long-odds bet.
Node.js may be young but 98% of the code can be reused (and if you are careful, you probably wrote the fallbacks already). V8 can be built separately.
When you understand the zen of javascript and try to write in a consistent manner, JS is more readable than C. I equate these concerns with arguments about how lisp or scheme is unreadable.
The ecosystem for tooling is expanding, albeit slowly.
However, most of your code probably could be implemented in a way that can be run in browser (e.g. XLS parser: http://niggler.github.com/js-xls/) which is where I see the real value in node. Aligning the languages means fewer moving parts and potential points of failure (as opposed to having to worry about quirks in implementations of many languages and worrying about features supported in one context but not the other)
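A rough sketch of that "same code in browser and Node" idea (all names here are illustrative, not from the article): detect a CommonJS environment and export there, otherwise attach to the global object.

    (function (root) {
      // Hypothetical shared logic with no environment-specific dependencies.
      function parseRecord(line) {
        var parts = line.split(',');
        return { id: Number(parts[0]), name: parts[1] };
      }

      if (typeof module !== 'undefined' && module.exports) {
        module.exports = { parseRecord: parseRecord };   // Node / CommonJS
      } else {
        root.parseRecord = parseRecord;                  // browser global
      }
    })(this);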
I kind of agree that a 25% gain doesn't justify a codebase rewrite, let alone a language/framework switch. Perhaps they are doing this with future traffic peaks in mind. Premature optimisation, I know. Probably they've already cleared up other important items on their todo list.
What interested me most is the community part. I've always thought ruby/rails community is awesome. Maybe I should start learning some JavaScript now.
Say I'm a Python developer and I'm looking to write services that are fast. Would I likely be better off going down the route of node.js, Go, Haskell, Erlang? I mean they are all fantastic languages and I've dabbled in most of them but from what I've read, Go seems to be the best one to use if you don't want to shake your world up, but if you do, Haskell or Erlang are nice new paradigms to dive into. Is this true?
Erlang has decades of use in fast, robust, large telecommunication systems. Haskell promises more robustness than any other platform as well as performance sometimes comparable to C. Its downside is just that steep learning curve though.
So if you want speed, I'd go for those two out of your list. Go and node.js are still infants in the game, so I don't think there is enough serious software out there built with these to properly judge their effective speeds.
I agree that Erlang is probably the best example of proven to be robust due to use in large telecommunication systems. I would also agree that Go and Node.js are still infants in the game, but I'd argue that there is enough software out there to judge Go's effective speeds.
- Google is using it internally, where speed is an absolute requirement
- Vitess, recently open sourced (and used internally by youtube) would definitely have to be fast for the task youtube is using it for. (http://code.google.com/p/vitess/)
Just my two cents. I would absolutely stay away from node.js if you are in an environment where people touching the code aren't easily accessible, since it's very easy to write javascript that only you understand. The other languages seem to punish it a bit more, while at times it feels as if javascript embraces it.
Of that list, I'd recommend Haskell and Erlang. Haskell in particular is especially interesting if speed is of concern.
Haskell manages to combine the conciseness and elegance of good Python code with static safety not found in other mainstream languages. Speed-wise, it ranges from OK to great, depending on how skilled you are at optimizing Haskell. The tools for optimizing Haskell code are pretty great, though.
I recommend Erlang. But you have to be careful what you mean by "fast".
There is "serially fast", as in fast if you have a single client, but that might not be "fast" anymore when you have 10,000 clients. Erlang will make sure your system stays responsive under load.
Any of those languages will probably do that but I feel that Erlang is the one with most tooling and most practical experience behind its back.
Also, don't give up on Python. Python is an excellent language, and when you don't need five nines of reliability or crazy scalability, a Python gevent or tornado based server might just do the job.
The problem is that the threading solutions for python feel bolted on, and kind of break the whole "pythonic" feel of everything. Of course this is purely subjective, but for me it feels like I'm "going against the grain" and falling out of the domain space of problems the language was meant for.
That, and Go seems to be close enough to python with those advantages built in that it becomes an easy choice for me. Combine that with the fact that my problems are mostly solved easily with the inbuilt libraries, and it's a clear solution.
> The problem is that the threading solutions for python feel bolted on, and kind of break the whole "pythonic" feel of everything.
See, I find Twisted and yield-based concurrency frameworks not Pythonic enough. Threads are just functions that run concurrently. In the case of green threads it is really just as simple as spawn(func) to start a new green thread running function func().
Take a look at these eventlet examples, they are pretty elegant:
Yes there is monkey patching. However, that happens once per program at the very top. That is a small price to pay for the ability to use all the Python libraries out there.
Speaking of libraries. That is one good thing about Python. And probably the reason to stick to it -- the large library ecosystem. Of course it depends on the system you are designing but a large enough system will usually need some other libraries (parsing a protocol, using a work queue etc).
Go and Erlang also have plenty, but not nearly the level and breadth that Python has.
I would recommend Go, since you'll find it easier to pick up, and it was a language designed for fast web services. That, and I've had experience writing web services with Go that have proven to be fast, reliable, easy to extend, and easy to maintain.
Not to mention, dead simple and clear to people who don't even know Go.
Go is actually fast, not just fast in theory or fast when speed isn't important, or whatever other weasel words you care to choose. Goroutines and channels give you the scalability advantages of async programming, without leaving behind readable code. Deployment is super simple because it produces a compiled binary.
Go has a solid foundation, including things like real support for integers, real support for threads, a runtime that was really developed for servers, and a well-thought-out type system. Some people have compared it to static duck typing.
Most of the people writing in JavaScript are not programmers. They lack the training and discipline to write good programs. JavaScript has so much expressive power that they are able to do useful things in it, anyway. This has given JavaScript a reputation of being strictly for the amateurs, that it is not suitable for professional programming. This is simply not the case.
I tried Node, because it was the new hotness and trying new things is great. It was distinctly Not For Me, for many of the same reasons that they've found; many tutorials are out of date in subtle ways (I like to think of this as the Rails effect from way back in the good old fun days), the language is just painful to structure and read over for my syntax processing.
There's definitely a lot to like, and like they said it's still a very new world with a lot of exploration to be done.
I have not tried Node (or in fact JS as a whole) yet. But it seems very hard to avoid JS in these days. I am going to learn it in a few months. Can anyone give me a rough idea how bad it is? (I know some Obj-C so is JS worse than that?)
Coming from Objective-C first, which is generally a rather sensible language, I think you'll be surprised at how stupid, unnecessary and inexcusable many of JavaScript's problems are.
People who only have a PHP background, for instance, have become accustomed to such stupidity. They think it's "normal", solely because they don't really know any better.
Those coming from C, C++, Java, C#, Ruby, Python or most other languages, on the other hand, know that the JavaScript way is not the right way. These people generally have a much harder time coming to terms with JavaScript's numerous issues.
It's not that JavaScript is a bad language, it's just easy to write bad code in JavaScript. I don't get why most people coming from the languages you mentioned like to bash JS so much. I used most of those languages in the past or use them currently (mainly Java and Python), and I like JavaScript the most. The problem is that most people just don't want to learn JS; they write shitty code because it's easier and then complain about how bad the language is. Sure, there are many issues with the language itself, usually because of how flexible it is, but for almost every issue there is a 'good way' of dealing with it.
Unless you are lucky enough to work on only your own code, the fact that it's so easy to write bad code is a massive problem.
My big beef with js is that there's no 'right' way to lay out your code. Trying to figure out how a js module works is always an unnecessarily massive PITA.
I too have been surprised at all of the frameworks coming out for JS to be on more than the browser. From my experience large JS projects are a pain to write and maintain. It's a neat language, but I don't really get why I would want to write my server code in it.
I think that's my biggest problem: my background is heavy on C/C++, and I moved on to C#, Ruby, Python and Obj-C later. Whilst I think Javascript is fine in terms of providing functionality for web pages, the language doesn't lend itself to a large implementation like NodeJS unless you're supremely well versed in it.
I was speaking with a colleague about Node and he suggested using Coffeescript instead of JS directly for the exact reasons you mentioned.
I'm also coming from Obj-C, C and Ruby background and recently read "Javascript: The Good Parts". I'm still looking for the good parts promised in the title and introduction.
I agree; it's a shortcut for people who know JS, not for people who are learning it for the first time. As soon as you need to use or look into the internals of a library that isn't written in CoffeeScript, everything's going to come unstuck pretty quickly.
I personally happen to disagree. (I have production/team experience with both node.js and CoffeeScript.)
For many learning types, I'd guess it's preferable to leisurely learn JavaScript from the shores of CoffeeScript, rather than having to deal with all of JavaScript's absurdity at once.
And when you need to read someone else's JavaScript code, you can usually get away with ignoring boilerplate.
Instead of worrying about how bad JS is, read JavaScript: The Good Parts and Effective JavaScript and focus on how good much of it is. Yes, JavaScript has oddities and there is a quite short list of pitfalls you have to learn to avoid ('==' vs '===', for example), but it also has some very powerful features. Take the time to actually learn JavaScript, as opposed to assuming that it's basically a bad Java (or whatever) with slightly different syntax, and you'll find a pretty neat and powerful language.
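The '==' vs '===' pitfall in a nutshell: '==' applies type coercion before comparing, '===' does not.

    console.log(0 == '');             // true  (both coerced to 0)
    console.log(0 == '0');            // true
    console.log('' == '0');           // false ('==' is not even transitive)
    console.log(0 === '');            // false (strict: different types, no coercion)
    console.log(null == undefined);   // true
    console.log(null === undefined);  // false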
Javascript is overall cleaner than Obj-C. Both languages can of course be used and abused in many ways. Javascript is going to require less code in general. On the flipside, larger projects are harder to keep maintainable.
There are some trivial gotchas in Javascript, but the same can be said for most languages. Most importantly, the weird type coercion system, variable & function hoisting, and prototypal inheritance can cause some gray hairs. All of those can be lived with, and prototypes are in some ways superior to "classical" inheritance.
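Small illustrations of those three gotchas:

    // 1. Type coercion: '+' favours strings, '-' forces numbers.
    console.log(1 + '2');    // '12'
    console.log('3' - 1);    // 2

    // 2. Hoisting: the var declaration moves to the top of its scope,
    //    the assignment does not.
    console.log(x);          // undefined, not a ReferenceError
    var x = 5;

    // 3. Prototypal inheritance: objects delegate directly to other objects.
    var animal = { legs: 4 };
    var dog = Object.create(animal);
    dog.bark = function () { return 'woof'; };
    console.log(dog.legs, dog.bark());   // 4 'woof' (legs found on the prototype)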
C, C++, Java, Javascript, Lua(JIT), Perl and ObjC are languages I normally use.
[1] can give you some insight on the dark side of JS. Also, avoiding JS in some way or another is not that hard, esp. since there're a lot of languages which compile to JS, most notably CoffeeScript and ClojureScript.
As much as I love CoffeeScript, it's not a substitution for learning JavaScript. CoffeeScript is basically just JavaScript with a prettier syntax and if you don't understand JavaScript you won't understand CoffeeScript.
That is completely true: you can't treat CoffeeScript as a replacement for learning Javascript. In fact, I'd say you need to have a pretty strong understanding of JavaScript to really use CoffeeScript well. That said, CoffeeScript does some nice things to prevent typical small js errors for newer devs.
Very true. I love CoffeeScript very, very much. It's my favorite language - I like it even better than ruby, if you believe me! I like using indentation instead of stupid 'end's that only God knows are ending a loop or a def or something else, I like the use of ... instead of * (for splats), and a few other things.
But, as much as I like CS and hate (really hate) JS, I have to agree with you wholeheartedly. You must know JavaScript well. Variable/function hoisting won't bite you in CS, but other JS idiocies will.
Read Douglas Crockford's "JavaScript: The Good Parts" at least twice (and take a lot of notes). It's the best no-bullshit book on JavaScript (the language, not "how to use JavaScript in Node/front-end web design") around.
I like JS and I like NodeJS, but I'm more of an integrator-coder and prefer frameworks, so I miss things like Django, South, mature ORMs, etc. Though I did somewhat overcome this hurdle, I particularly hurt for a migration library when building a NodeJS project (we did find a decent one, but it was young, immature and not well documented, so it took 10x the time to get comfortable working with it, and then it lacked lots of features). I also missed the availability of batteries-included API libraries such as Tastypie and Django REST Framework. Again, it's not hard to code up a few URLs for a model, but then ... HATEOAS ... default REST stuff ... JSONP support ... etc.
I do look forward to switching to NodeJS and I am a fairly early adopter... but it's still been too early for me.