Why Events Are A Bad Idea (for High-concurrency Servers) (usenix.org)
41 points by uriel on Aug 20, 2011 | hide | past | favorite | 40 comments



This has been posted to HN a bunch of times over the years:

http://www.hnsearch.com/search#request/submissions&q=Eve...

What I said about it last time:

The problem with the article is that it's superficial. At its core, it's premised on the idea that threads are inherently easier than events, because they produce straight-line code and automatically handle tracking the state needed across a transaction. If that's true, then the article says threads are just as good as events once we solve all the problems that make threads dangerous or slow.

But it's not a given that threads are easy. As anyone who has spent time doing high-performance threaded code will tell you, there's the effort you have to put in to make sure your threaded code is correct, and then twice as much effort again to make sure it's efficient; in particular, you have to do synchronization without effectively serializing all your threads. Which is a huge pain.

I think this is gradually becoming a moot point, as modern languages --- all of which have some kind of anonymous function and some approximation of closures --- allow us to write evented code that looks and feels like straight-line code. jQuery callbacks in Javascript are the most vivid illustration of this idea (I suppose the node.js people are carrying that even further; I haven't played with it).


I've written both kinds of code and I like threads better. With threads there's an abundant literature about the problems you'll run into. With events I often feel like I'm on my own.

Back in 2002 I did an evaluation of all the open source event-based web servers that were out there for Linux and found that all of them had issues with data corruption. You just couldn't expect to download a 100MB file and get it right every time. Things are better now, but process and thread-based web servers got mature by 1996 if not sooner.


Link, please?


modern languages --- all of which have some kind of anonymous function and some approximation of closures --- allow us to write evented code that looks and feels like straight-line code

Not getting at you, but I'd question the idea that writing code for distributed systems which looks and feels just like code for a single machine is desirable. My gut says the righteous path looks much more like Erlang, treating concurrent programming as a paradigm in its own right and leaving implementation details up to the compiler.


There are threads and then there are threads. I sometimes see people from a C/C++ background assuming that threads means shared memory concurrency, with all the synchronisation problems this entails. If you have a message passing/actor abstraction, so that threads only communicate by sending and receiving messages, writing high performance threaded code is pretty easy. I've done it, as have many others (case in point: any system written in Erlang).

Like you say, a decent language can make event-driven code seem like straight-line code. The Akka framework (http://akka.io/) even makes event-driven code look like message passing. There are two substantive points to be made, I believe:

- If your language's runtime doesn't allow you to use all available CPUs it's a bit lame -- or at least limited in its application. (Hi Node.js.)

- Given one concurrency abstraction you can, most of the time, build other concurrency abstractions. We've seen this above with Akka. To make this feasible your language better provide some lightweight concurrency mechanism -- either lightweight threads or events.


It is a moot point - because CPS transformation is the very process of turning straight-line code into event-style code using closures for callbacks, and it is mechanical. The compiler can do it, and for some languages it does it as a matter of course.
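To make the parent's point concrete, here is a tiny hand-applied CPS transform sketched in Python (the function names are purely illustrative); a compiler performs the same rewrite mechanically:

```python
# Straight-line style: the "thread-like" code a programmer writes.
def total(xs):
    s = 0
    for x in xs:
        s = s + x
    return s

# The same logic after a hand-applied CPS transform: every step takes
# an explicit continuation `k`, so control flow is reified as
# callbacks -- exactly the shape event-driven code takes.
def total_cps(xs, k):
    def loop(i, s):
        if i == len(xs):
            return k(s)        # "return" by invoking the continuation
        return loop(i + 1, s + xs[i])
    return loop(0, 0)

results = []
total_cps([1, 2, 3], results.append)
print(results[0])  # prints 6, same answer as total([1, 2, 3])
```

The transformation is entirely mechanical, which is why a compiler can hide the event-style plumbing from the programmer.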

The problem with threads is that they are more general than a good implementation of events. You've got a lot more rope to hang yourself with.


They acknowledge that shared mutable state is (very) problematic in conjunction with preemptive multitasking, so they're actually arguing for cooperatively-multitasked threads as opposed to events. In both systems, shared-mutable state is being used between the concurrency code units.

I think they're neglecting a third option: shared-immutable state, with very little explicit shared-mutable state. This is the model used by Haskell threads and Erlang processes. Like events/cooperative threads, it does away with the horrible semantics of shared-by-default, and it restores the parallelism benefits lost when switching to cooperative multitasking.

But I think there are still two crucial differences between event-based systems and cooperative threads that I did not see in the paper:

* Events make it clear where context-switch/reentrancy points are. When I call a library function, I generally have a very good idea whether it could "call me back" (have I given it callbacks to call me back?), and if not, I know it is safe to call and I do not have to reason about my internal state at every call site. With cooperative threads, a "yield" point may be lurking in any library you're calling. This effectively means that any library call can potentially do anything, including switching to "higher-level" code that may re-enter your own code, which makes re-entrancy considerations far more complicated. It is solvable, though, if you make sure that yield points are "tagged" with some token that must be passed as an argument. Then a function's signature and call pattern tell you whether or not it can yield.

* Implementing user-level threads may be very cheap compared with standard POSIX threads, but it is still many times more expensive than the explicit data allocations of event-based programs. Even with "split stacks", a user-level thread still pays around 4 KB for its stack; 1 million threads thus take 4 GB of memory before they even do anything. This means you still have to be wary of creating threads as freely as you would register callbacks, which forces you to mix event-based programming, or manual event loops, into your threads. That mixture is worse than the uniformity of either method alone.
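The "tagged yield point" idea from the first bullet can be sketched with an explicit token that any yielding function must accept, so a function's signature advertises whether it can switch. This is a toy Python illustration with hypothetical names, not any real library's API:

```python
class Yielder:
    """Capability token: only holders of this can cooperatively yield."""
    def __init__(self):
        self.switches = 0

    def yield_(self):
        # A real scheduler would context-switch here; we just count.
        self.switches += 1

def pure_helper(x):
    # No Yielder parameter: by construction this can never yield, so
    # callers need not reason about re-entrancy across this call.
    return x + 1

def blocking_helper(y: Yielder, x):
    # Takes a Yielder: the signature itself warns of a possible switch.
    y.yield_()
    return pure_helper(x) * 2

y = Yielder()
print(blocking_helper(y, 20))  # prints 42, after one yield point
print(y.switches)              # prints 1
```

The point is purely static: grepping a call site tells you whether a yield can happen underneath it, without reading the callee's transitive implementation.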


The cooperative multitasking model of F#'s async makes good progress in solving your first point. There are essentially two ways of assigning the result of code blocks to variables: doing "let a = foo bar" which runs the code without a yield and "let! a = foo bar" which can yield as required inside foo. So you always know which library functions are safe (and you can even write unsafe ones yourself) and can write cooperative threads very easily (just as easily as usual threads).
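Python's async/await makes the same distinction syntactically visible: a plain call cannot yield, while an awaited call can, which is a rough analogue of F#'s let/let! split. A minimal sketch using stdlib asyncio:

```python
import asyncio

def pure_step(x):
    # Ordinary call: cannot yield control; safe to call anywhere.
    return x * 2

async def io_step(x):
    # Awaitable: may suspend at the await below.
    await asyncio.sleep(0)   # an explicit, visible yield point
    return x + 1

async def handler():
    a = pure_step(10)        # like F#'s  let a = foo bar
    b = await io_step(a)     # like F#'s  let! b = foo bar
    return b

print(asyncio.run(handler()))  # prints 21
```

In both languages the compiler (or interpreter) enforces the annotation: you cannot accidentally call a suspendable function as if it were pure.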


Is that a language feature specifically tailored for cooperative multitasking?

That sounds good, though passing an argument (a "yielder") that cannot be captured or stored (so you have to take it as an argument) sounds like it would require less language magic to solve the same problem.


You make some good points, but why would every lightweight thread require about 4k of memory? I know for a fact that the Haskell runtime doesn't need that much; it allocates a starting stack (default size 1k, but can be lower) and grows it if necessary. I imagine the techniques here would also apply to other languages.


4k is the paging granularity of x86, so if you want to be able to use hardware-assisted guard pages to know when to grow the stack, you have to allocate in multiples of 4k.


Haskell allocates 1KB (which is also much larger than the allocation required for a typical callback registration in event-based programming) initially, but grows the stack in chunks of 4KB.

These small chunks come at a cost, too. The GHC runtime needs to copy around the edges of the stack at the chunk boundaries, and stack use becomes slightly more expensive.


This paper is one part of the long-running threads vs. events debate, and it probably isn't that helpful to read it in isolation. Erlang, JRockit, and Go implemented the kind of scalable green threads proposed in this paper, although I don't think they were inspired by it. Most programmers who care about high concurrency (other than Erlangers) decided to just bite the bullet and accept the difficulty of writing event-driven code rather than try to make threads scale. The fact that people prefer to struggle along with events doesn't negate their difficulty, though.


That paper is also 8 years old (unless the timestamp on that page is way off) -- which puts it before node even existed.

If it was so easy to implement their proposed high performance thread system (disclaimer: haven't read the paper) -- Why don't we see it in production 8 years later?

I'd argue that we don't see it in production, because it lacks some of the advantages they propose in the abstract.


In this case, as someone who has actually built one of these efficient threading models (and studied these results while in academia), I believe it is only because the larger engineering community has incorrectly associated "threads" with "slow", and refuses to even touch them anymore, especially if it involves modifying the compiler to make the stack ripping efficient (I did not modify the compiler, and instead required the developer to occasionally rip the stack manually). The result "events == threads" is introductory CS material at this point, and was largely established in 1978: this paper only claims to "refine" that result.

Frankly, the event/thread duality result is actually pretty trivial to grok if you bother to look at it. Hell: with only minimal overhead (due to having started with events and having to adapt up), you can use a continuation passing transform on JavaScript and have a threaded node.js (and some people even do this): you'll have roughly the same concurrency and roughly the same memory usage (and the "roughly", again, only comes from the fact that we are building the threads on top of events with an adapter that will be forced to slightly inefficiently store and access its state).


I've run some benchmarks while developing Common Node, which is my stab at implementing the synchronous CommonJS proposals on top of Node using node-fibers. Take a look at https://github.com/olegp/common-node and scroll to the bottom to see response time distribution graphs.

The node-fibers version has roughly 70% of the throughput of node. Memory consumption was on the order of 28MB for node and 32MB for node-fibers. This is more or less unoptimized code with room for improvement.


This kind of thing is in production.

There are systems using light-weight threads to write very high concurrency servers. That's what the Erlang VM does, for example. The GHC Haskell runtime now does this as well. They both use epoll/kqueue/etc. in their I/O systems, but the code itself is threaded in user-space, and these lightweight threads can be mapped onto a pool of OS-level threads so the programs can run on multiple cores.

Even without fancy compiler and runtime support, you can do this. In Python, you can use eventlet or gevent to write very concurrent servers with what looks like threaded code, thanks to coroutines in the interpreter. It's very pleasant.
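The core trick behind those libraries can be sketched with nothing but stdlib generators: each "green thread" yields wherever it is willing to be switched out, and a tiny loop round-robins them. This toy scheduler does no real I/O polling, unlike eventlet's actual hub; it only shows the cooperative interleaving:

```python
from collections import deque

def scheduler(tasks):
    # Toy round-robin scheduler over generator-based green threads.
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run until the task's next yield
            ready.append(task)  # still alive: requeue it
        except StopIteration:
            pass                # task finished

log = []

def worker(name, steps):
    # Looks like straight-line threaded code; `yield` marks the
    # cooperative switch points.
    for i in range(steps):
        log.append((name, i))
        yield

scheduler([worker("a", 2), worker("b", 2)])
print(log)  # prints [('a', 0), ('b', 0), ('a', 1), ('b', 1)]
```

Real libraries replace the bare `yield` with blocking-looking I/O calls that suspend the coroutine until epoll/kqueue reports readiness, which is what makes the code feel threaded.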


We're using this in production right now with greenlet. The way we're using it, it has less expressive power than an event-based approach, but a large majority of the code doesn't actually use the power that we lose by not using events. We can use events when we need events, and when we don't, we can write significantly shorter code that more closely resembles our intent.

I would guess that one large part of the reason that this is uncommon in production is that fresh CS graduates are not familiar with it. Another reason might be that you can't use this model at all on the client side for some popular clients (like web browsers).


Also good reading: try these Google searches:

  threads +posix +signals

  threads +stack +usage

  thread +safe +function


Circa 2003. Fast forward to 2011... YouTube, Facebook et al. seem to have refuted this. Tornado/epoll + nginx ftw


The paper doesn't dispute that you can write good, high-performance servers with event-based programming. It's disputing the claim that you can't do it with threads, and then claiming that threads are an easier model for programming servers.

Here's a thought experiment: if your language of choice offered threads so lightweight as to be almost free, with proper preemptive scheduling, not limited to a single core, with the I/O using something like epoll or kqueue underneath -- would you use it?

Erlang and Haskell offer this right now. Python can give you a passable approximation. It really is as cool as it sounds.


How much of that is inertia is anybody's guess.


I don't think it is inertia; it is just that making stacks small enough is a harder problem than making a fast evented system. More to change. It will take longer or involve more radical change, like shifting to Erlang, say...


Fundamentally, this is a compiler and runtime issue. If the people behind some mainstream programming language wanted to, they could go and add lightweight threads that people could just use. It would take a lot of engineering effort, and there would be some compatibility edge cases, but it could definitely be done.

(In the meantime, I'm happily using eventlet in Python, and some people swear by gevent. These have their shortcomings, but they're worth the trouble, easily.)


Ruby 1.9 has lightweight threads built in (fibers), but you have to write and manage the scheduler on your own. That also means you can't use existing IO libraries, since none of them know how to yield back to your scheduler, but it's only a matter of time before that's a solved problem.


Nice! The big caveat here is that they're cooperatively-scheduled, and they've left the I/O stuff up to libraries. In other words, they look exactly like Python's greenlets, which are very useful when you add the appropriate libraries on top.

Edit: the other big issue here is that blocking operations (e.g. anything in libmysql) are not automatically deferred to a thread pool -- that has to be done manually. Ruby's fibers and Python's greenlets are magical, but not magical enough that you don't occasionally get tackled by fantasy creatures materializing from the aether. If that makes sense.


Interesting.. just when I was about to pick up on Node.js too.


I wouldn't use this as a reason NOT to give Node a try. It's a great system for teaching yourself event-driven execution. The scale this paper is talking about won't be hit by most people playing with Node (from what I've read of the paper so far).

Being familiar with thread and events makes you a better programmer, so go forth and learn! As I see it, scalable concurrency is far from a solved problem, so it's good to see people pushing both sides of the equation.


Plus, node is fun, and has a very nice community. I've been pushing threads in this discussion, but don't let that discourage anybody from playing with node.


As has been stated in other comments here, this is an old paper. Technology (both hardware and software) has changed.


The changes in technology since then have made the paper's point more relevant. Threaded servers can exploit multicore processors, and if they're preemptively scheduled, you can write threaded server code without worrying too terribly much about taking excessive CPU time -- not something you want to do in an event handler on a single thread. Some really solid software implementations of lightweight threading have been made, and they work great.

So, what hardware and software changes were you thinking of?


Anyone from the Node community care to tackle this?


What makes you think that the node community has a well-thought-out rebuttal to a mostly academic debate about whether ideal threads could perform as well as events?


The great thing about this paper is that it goes beyond academic debates over ideal threads, and actually writes code and runs benchmarks. The debate here is practical.


I realize that for most people, "academic" is synonymous with "moot," but all systems papers have experimental results.


This paper has been proven correct, if you replace the word "threads" with "processes" and note that Erlang has provided a solution with the compiler support they advocate. Outside of Erlang (and some equivalently obscure environments), little attempt has been made to make concurrent processes manageable, and this is why you see the predominance of event-based systems. We're in a multi-core era, and you can't really say you're "concurrent" when your event model only runs on a single core.

While I think that if your choices are typical thread support, with all its problems, or an event-based system, then events are the way to go... I believe the great majority of developers have decided to use node or twisted et al. either because they'd never heard of Erlang, never understood its strengths, or weren't willing to deal with its initially alien-looking syntax.

I think this is a mistake, but rather than fight it, I'm working on a solution that allows coffeescript to run within a distributed system based on Erlang. I've had to provide some workarounds and conventions for the coffeescript developer to follow a sequential programming style, while gaining the ability for coffeescript to really run concurrently on multiple CPUs or multiple machines, with a common shared state.

(I described some of this here: http://news.ycombinator.com/item?id=2906290 and went into the architecture a bit more here: http://news.ycombinator.com/item?id=2848647 )

Granted, it is easy for me to say it should be done a different way, and until I publish code it is all talk. I'm very aware of this... but I would like to suggest that there really is a difference in Erlang, and that events are a compromise that don't actually allow for true concurrency. Concurrency is a better solution, isn't that hard to do now (if you write straight-up Erlang), and I'm working on making it even easier (if you're willing to be constrained a little bit, but I think the popularity of node.js shows people are willing).

If it seems I'm shilling for my project, I am a little bit, but I'm also wanting to get it out there so that if I'm making a mistake maybe someone can identify it... or be inspired to build a better way on top of it.

If you want updates, I have a twitter account @nirvanacore that only talks about this project.


If a function call is asynchronous, you can say you are "concurrent", because whatever that function is doing -- I/O, or something else -- is being done concurrently while you are handling another event. Usually Node.js spawns threads to do whatever it has to do, or the operating system does it asynchronously (e.g. using DMA and interrupts).


Right, and that is an improvement. But you don't have a set of V8 virtual machines all running your javascript in parallel. This is what I'm doing (though may not use V8).


Actually, they're arguing for cooperatively-multitasked threads, so scaling to multi-core is apparently not their concern. At least not when compared with event systems which explicitly are not multi-core.


True -- but it is possible to make lightweight preemptively-multitasked threads that get scheduled onto multiple cores. It's just that this paper wasn't talking about it.



