The problems with traditional multi-threaded concurrency go beyond complexity and safety. It also delivers relatively poor performance on modern hardware: shared structures have inherently poor locality, and context switching causes unnecessary data motion down in the silicon. Whether "message passing" avoids this depends on the implementation.
Ironically, the fastest model today on typical multi-core silicon looks a lot like the old-school single-process, single-core event-driven models you used to see when servers actually had a single core and no threads. One process per physical core, locked to the core, that has complete ownership of its resources. Other processes/cores on the same machine are logically treated little differently than if they were on another server. As a bonus, it is very easy to distribute software designed this way.
People used to design software this way before multithreading took off, and in the high-performance computing world they still do, because it has substantially higher throughput and better scalability than either lock-based concurrency or lock-free structures. It has been interesting to see it make a comeback as a model for high-concurrency server software, albeit with some distributed-systems flavor that was not there the first time around.
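For the curious, here is roughly what the pinning looks like. This is just a toy sketch in C++ (the names and layout are mine, not from any particular engine), and it pins threads rather than separate processes to keep it short; it also pins one worker per logical core for brevity. The idea is the same either way: each worker is glued to a core and owns its data outright.

```cpp
// Toy sketch of "one pinned worker per core, owning its own state".
// Assumes Linux + pthreads; build with: g++ -O2 -pthread pin.cpp
#include <pthread.h>
#include <sched.h>
#include <cstdio>
#include <thread>
#include <vector>

struct Worker {
    unsigned core;                 // the core this worker is pinned to
    std::vector<int> private_data; // owned exclusively by this worker, never shared

    void run() {
        // Pin the current thread to its core so the OS never migrates it.
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

        // Event loop stand-in: all work on private_data happens here with no
        // locks, because no other thread ever touches it.
        private_data.assign(1000, 0);
        for (int& x : private_data) ++x;
        std::printf("core %u done\n", core);
    }
};

int main() {
    unsigned cores = std::thread::hardware_concurrency();
    std::vector<Worker> workers(cores);
    std::vector<std::thread> threads;
    for (unsigned c = 0; c < cores; ++c) {
        workers[c].core = c;
        threads.emplace_back(&Worker::run, &workers[c]);
    }
    for (auto& t : threads) t.join();
}
```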
Shared-nothing is great when you can do it. But sometimes the cost of copying is too high, and that's what shared memory is for.
Take, for example, a simple texturing fragment shader in GLSL. You're not going to copy the entire texture to every single GPU unit; it might be a 4096x4096 texture you're rendering only a dozen pixels of. Rather, you take advantage of the caching behavior of the memory hierarchy to have each shading unit only cache the part it needs. This is what shared memory can do for you: it enables you to use the hardware to dynamically distribute the data around to maximize locality.
I did not mean to imply that you are giving every process a copy of all the data. The main trick is decomposing the application, data model, and operations such that every process may have a thousand discrete and disjoint shards of "stuff" that it shares with no other process. The large number of shards per process means that average load across shards will be relatively balanced. The "one shard per server/core" model is popular but poor architecture precisely because it is expensive to keep balanced.
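A toy illustration of the layout I mean (all names and numbers hypothetical): with on the order of a thousand shards per core and a fixed key-to-shard-to-core mapping, no two cores ever own the same data, and load stays statistically balanced even when individual shards run hot or cold.

```cpp
// Sketch of "many disjoint shards per core" with a static ownership mapping.
#include <cstdio>
#include <functional>
#include <string>

constexpr unsigned kCores = 16;
constexpr unsigned kShardsPerCore = 1024;           // ~1000 shards per core
constexpr unsigned kShards = kCores * kShardsPerCore;

// Every key maps to exactly one shard, and every shard is owned by exactly
// one core, so two cores never touch the same data.
unsigned shard_of(const std::string& key) {
    return std::hash<std::string>{}(key) % kShards;
}

unsigned owner_core(unsigned shard) {
    return shard % kCores;  // static assignment; rebalancing would remap shards
}

int main() {
    std::string key = "user:42";
    unsigned s = shard_of(key);
    std::printf("key %s -> shard %u owned by core %u\n",
                key.c_str(), s, owner_core(s));
}
```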
However, in these models you rarely move data between cores because it is expensive, both due to NUMA and cache effects. Instead, you move the operations to the data, just like you would in a big distributed system. This is the part most software engineers are not used to -- you move the operations to the threads that own the data rather than moving the data to the threads (traditional multithreading) that own the operations. Moving operations is almost always much cheaper than moving data, even within a single server, and operations are not updatable shared state.
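As a rough sketch of what "moving the operation to the data" can look like (hypothetical code, not any specific engine; a real implementation would typically use per-core lock-free rings rather than a mutexed queue): the thread that owns the data runs an event loop over a mailbox, and everyone else posts operations into it instead of touching the data.

```cpp
// Minimal sketch of "ship the operation to the core that owns the data".
#include <condition_variable>
#include <cstdio>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

struct Mailbox {
    std::mutex m;
    std::condition_variable cv;
    std::queue<std::function<void()>> ops;
    bool done = false;

    void post(std::function<void()> op) {          // called by other cores
        { std::lock_guard<std::mutex> lk(m); ops.push(std::move(op)); }
        cv.notify_one();
    }
    void run() {                                   // owning thread's event loop
        for (;;) {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&] { return !ops.empty() || done; });
            if (ops.empty()) return;
            auto op = std::move(ops.front());
            ops.pop();
            lk.unlock();
            op();                                  // runs on the data owner's thread
        }
    }
    void stop() {
        { std::lock_guard<std::mutex> lk(m); done = true; }
        cv.notify_one();
    }
};

int main() {
    long counter = 0;                              // owned solely by the worker thread
    Mailbox box;
    std::thread owner([&] { box.run(); });

    // Other threads never touch `counter`; they send the operation instead.
    box.post([&] { counter += 10; });
    box.post([&] { std::printf("counter = %ld\n", counter); });

    box.stop();
    owner.join();
}
```

Note that the only shared state here is the mailbox itself; the data (`counter`) is read and written by exactly one thread, so the hot path needs no locks at all.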
This turns out to be a very effective architecture for highly concurrent, write-heavy software like database engines. It is much faster than, for example, the currently trendy lock-free architectures. Most of the performance benefit comes from much better locality and fewer stalls and context switches, but it has the added benefit of implementation simplicity, since your main execution path is not sharing anything.
Don't you lose all of your performance gains in RPC overhead? And how do you avoid latency in the data thread? (Do you have one thread per lockable object? Won't that be more than one thread per core?) These are the reasons lock-free is so popular.
> Don't you lose all of your performance gains in RPC overhead?
If one did, then why would anyone who knew what they were talking about (or even just knew how to write and use a decent performance test) advocate this method? :)
In a database engine, you ultimately need to move some data around, too. After all, you can't move a network connection to five threads at once, and not all aggregations can be decomposed into pieces that return small amounts of information. Sharing memory often brings quite a substantial speedup.
Agreed. But let's say each of these processes bound to a core needs to log important information. That logging should probably be done on a separate thread. So it is really just a matter of using multithreading where appropriate and not using it just to use it.
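Right, and the usual shape of that is something like the following (a toy sketch with hypothetical names): the pinned worker's hot path just enqueues the line and returns, and a single background thread does the slow write.

```cpp
// Sketch of an async logger: the hot path never blocks on log I/O.
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

class AsyncLogger {
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::string> lines_;
    bool stopping_ = false;
    std::thread writer_;

public:
    AsyncLogger() : writer_([this] { drain(); }) {}
    ~AsyncLogger() {
        { std::lock_guard<std::mutex> lk(m_); stopping_ = true; }
        cv_.notify_one();
        writer_.join();                            // flush remaining lines on shutdown
    }
    void log(std::string line) {                   // cheap: enqueue and return
        { std::lock_guard<std::mutex> lk(m_); lines_.push(std::move(line)); }
        cv_.notify_one();
    }

private:
    void drain() {                                 // the slow write happens here
        for (;;) {
            std::unique_lock<std::mutex> lk(m_);
            cv_.wait(lk, [&] { return !lines_.empty() || stopping_; });
            if (lines_.empty()) return;
            std::string line = std::move(lines_.front());
            lines_.pop();
            lk.unlock();
            std::fprintf(stderr, "%s\n", line.c_str());
        }
    }
};

int main() {
    AsyncLogger log;
    log.log("worker on core 3: flushed 128 pages");  // returns immediately
}
```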