
What's wrong with message passing, though? MPI does it, Erlang does it - surely this paradigm can deliver good performance for data and task parallelism.

I hope Mozilla will continue experimenting with parallel JS. Exciting times!




Message passing can be very efficient for some tasks. But there are cases where it is hard to optimize out copying.

For example, let's say you're running a raytracer, and you have several web workers, each rendering a slice of the frame. Each worker can then transfer its output back to the "main" thread (using existing typed array transfer). But the main thread now has several separate typed arrays, one from each worker. If it wants to combine them all into one contiguous typed array, it needs to copy the data, which is something we'd like to avoid.

In this case, what you really want is a single contiguous typed array, with each worker writing to a slice of it. Something like that would be possible with what is proposed in the blog post.
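
A rough sketch of that "one buffer, many slices" pattern, assuming a SharedArrayBuffer-style API along the lines of what the post describes (the names and sizes here are illustrative, not the exact proposal):

    // main.js: one contiguous shared allocation, each worker gets a slice
    var WIDTH = 800, HEIGHT = 600, WORKERS = 4;
    var shared = new SharedArrayBuffer(WIDTH * HEIGHT * 4);
    var rowsPer = HEIGHT / WORKERS;
    for (var i = 0; i < WORKERS; i++) {
      var w = new Worker("render.js");
      // The shared buffer is shared via postMessage, not copied or transferred.
      w.postMessage({buffer: shared, firstRow: i * rowsPer, rows: rowsPer, width: WIDTH});
    }

    // render.js: a view over just this worker's slice, written in place
    onmessage = function (e) {
      var d = e.data;
      var slice = new Uint8ClampedArray(d.buffer, d.firstRow * d.width * 4, d.rows * d.width * 4);
      // ... raytrace rows [d.firstRow, d.firstRow + d.rows) directly into `slice` ...
      postMessage("done");
    };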


Copying in cache is very fast now. Spending time avoiding copies is no longer always the win it used to be.


In cache is key, though.

Decoded image data is 4 bytes per pixel, so a raytraced image of any sort of reasonable size would barely (if at all) fit in L3 cache even on modern processors. And you need to fit both the source and the destination, right?


The shared ArrayBuffer interface being described here is following the philosophy of the Extensible Web Manifesto [0]. The idea is that libraries providing higher-level APIs and programming models, such as message passing, can be built on top of low-level primitives.

[0] https://extensiblewebmanifesto.org/
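
As a sketch of what such a library could layer on top of the low-level primitive: a minimal blocking one-slot channel, i.e. message passing, built from nothing but a shared buffer and atomic operations. The Atomics names used here are the ones proposed alongside shared memory and are illustrative only:

    // Layout of the shared buffer: index 0 = flag (0 empty, 1 full), index 1 = payload.
    var chan = new Int32Array(new SharedArrayBuffer(8));

    // Producer (one worker):
    function send(value) {
      Atomics.store(chan, 1, value);   // write the payload first
      Atomics.store(chan, 0, 1);       // then publish it by setting the flag
      Atomics.notify(chan, 0, 1);      // wake one waiting consumer
    }

    // Consumer (another worker, with its own view over the same buffer):
    function receive() {
      Atomics.wait(chan, 0, 0);        // block while the flag is still 0
      var value = Atomics.load(chan, 1);
      Atomics.store(chan, 0, 0);       // mark the slot empty again
      return value;
    }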


When you have a large amount of data that needs to be sequentially iterated through by multiple threads — e.g. in games, where the UI thread and the physics thread are often entirely separate but read off a shared world state — message passing falls over. The copies are just too expensive.

Message passing is great for things that use small amounts of data but lots of CPU, though!


Forgive me if my comment seems ignorant; I've had experience with MPI and threaded code, but not professionally, and I don't have any numbers or profiles to offer.

I would think message passing in the MPI sense is acceptable because "the cost of copying" is insignificant compared to "the cost of network latency." When there is very little other overhead (multiple threads performing atomic operations on shared memory), "the cost of copying" becomes relatively significant. And if you want to target JS from existing legacy C++ code that probably won't be rewritten, then the JS execution environment will have to be the one to bend.


Which existing C++ code are you referring to? It seems like a bad idea to compromise the soundness of Javascript to do whatever it is you are talking about doing. I don't think that legacy C++ code is something that should be causing anything to bend.


It starts to become a problem when the messages are very large. Consider the case where you need to send a big set to a thread, so that it can use it as a part of a computation. Conversion to JSON would be too slow, because every element of the set would have to be visited upon invocation.

You often want large computations to be performed in a side thread (to avoid blocking the UI thread). It would be a pity if such computations couldn't take large data structures as arguments, because large computations often take large amounts of data as input.
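
To make the cost concrete, a hypothetical illustration (the worker script and sizes are made up); every message-passing path below touches each element at least once before the worker sees any of it:

    var worker = new Worker("compute.js");
    var bigSet = new Set();
    for (var i = 0; i < 10000000; i++) bigSet.add(i);

    // JSON path: every element is visited to serialize, then again to parse.
    worker.postMessage(JSON.stringify(Array.from(bigSet)));

    // Structured-clone path: no JSON, but the whole structure is still
    // copied element by element into the worker's heap.
    worker.postMessage(Array.from(bigSet));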


I think they mostly mean the overhead of (de)serialization from/into JSON. It's also hard to pass binary data that way.


This was mentioned in one line of the article, so it's easy to miss (I only recently heard about them, which is why I caught it), but transferring binary data can actually be done with Transferable objects already [1] (there's a sketch after the links below).

However, for some high-performance applications even that overhead might be too much, because it requires allocation. Also, having regions of opt-in shared memory allows for higher-level languages/patterns where message passing isn't the perfect answer.

An example off the top of my head that hits both points (no allocation + higher-level patterns) would be to have one worker write something encoded with SBE [2] into a shared buffer and have another consume it.

That will involve zero allocation (thus no GC pressure) and be very fast; for a class of applications, avoiding GC pressure altogether is really important.

It's a little sad that you can't share that back with the main thread, but it's not a deal breaker by any stretch. Think of the main thread on the web like a classic GUI event loop that you use just for rendering; you can still use transferable objects to get data to it, and you should be fine.

1: https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers...

2: https://github.com/real-logic/simple-binary-encoding
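
For reference, the transfer mechanism from [1] looks roughly like this (a sketch; the worker script name is made up). Ownership of the buffer moves instead of being copied, which is why no two threads ever see it at once:

    var worker = new Worker("consumer.js");
    var buf = new ArrayBuffer(1024 * 1024);
    var view = new Float64Array(buf);
    // ... fill `view` with a frame of data ...

    // The second argument is the transfer list: `buf` is moved, not copied.
    worker.postMessage({frame: buf}, [buf]);

    // From here on buf.byteLength === 0 on this side; the memory now
    // belongs to the worker.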


Would it work to have the data structures be copy-on-write? That way, if the worker only reads, it's O(1): you just pass a reference to the worker.

I imagine it'd be a pain to write a garbage collector for something like that.


Copy on write is already implemented-ish with transferable objects [1] (just copy then write the copy).

However, copy on write still requires allocation and for some applications that is a deal breaker.

1: https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers...


Perhaps we need a fast copy mechanism for JS to make message passing a more attractive option? Or make the interpreter recognize such cases and do it under the hood/natively. Keeping my fingers crossed for a more functional approach.



