15 Years of Concurrency (joeduffyblog.com)
355 points by runesoerensen on Dec 1, 2016 | 98 comments



To note:

> Out of the bunch, Rust has impressed me the most. They have delivered on much of what we set out to deliver with Midori, but actually shipped it (whereas we did not). My hat goes off to that team, seriously, because I know first hand what hard, hard, hard work this level of type system hacking is.


But see also:

> In particular, I’m still conflicted about whether all those type system extensions were warranted. Certainly immutability helped with things far beyond safe concurrency. And so did the side-effect annotations, as they commonly helped to root out bugs caused by unintended side-effects. The future for our industry is a massively distributed one, however, where you want simple individual components composed into a larger fabric. In this world, individual nodes are less “precious”, and arguably the correctness of the overall orchestration will become far more important. I do think this points to a more Go-like approach, with a focus on the RPC mechanisms connecting disparate pieces.


I don't think this is incompatible with Rust's goal, which is to bring safety to the sort of low-level domain where C is currently dominant and systems can't afford to be structured as isolated nodes communicating only via RPC. In contrast, Midori was aimed at a higher-level general-purpose domain, like Go (and Erlang, too, which is even better than Go at the "cluster of nodes" approach, yet doesn't have a type system at all).


Could it be that many decisions for Rust were made to make it more appealing to C devs?

Like, opting out of purity annotations or forced immutability?

I'm not saying these decisions are good or bad; I don't know much about systems programming. But I often ask myself what the main motivators for a language are.


FWIW, after spending a couple of years with Rust as my primary programming language, I'm convinced that they made the right decision wrt mutability (immutable by default, but mutability available when needed), as Rust's lifetime system eliminates nearly all the remaining pain points around mutability (data races).

On the other hand, I think it is very important that some notion of purity be incorporated, even if only as an opt-in.


> spending a couple of years with rust as my primary programming language

Oh, how did you achieve that? :)


Probably not with too much difficulty. 1.0's alpha was nearly 2 years ago. The github repo goes back to mid 2010: https://github.com/rust-lang/rust/graphs/contributors



I didn't want to imply that Rust is too young for people to have been spending years using it.

I wanted to know how you could get a Rust-job (using it as primary programming language) today :)


Ah. So, most organizations using Rust in production are moving existing employees over or hiring more generally.

I'd hit up these people: https://www.rust-lang.org/en-US/friends.html


Forced immutability has enormous performance downsides in some cases. Rust can't have that.


That's really interesting in that the Rust developers ended up falling on the side of "no". Immutability is contextual (depending on the type of reference you hold) and purity annotations ended up being completely removed.


I'm wondering why the `mut` keyword is named the way it is when you can still have "interior" mutability regardless of whether you have a `mut` reference or not.

Shouldn't it be called `unique` instead?


What my sibling said, but also, this almost happened. It's called "the mutapocalypse". See here: http://smallcultfollowing.com/babysteps/blog/2014/05/13/focu...

In the end, we decided that these are two equivalent perspectives, and decided to stick with the more traditional mutable/immutable distinction.


The mut keyword on variable bindings doesn't influence uniqueness at all, though. You're probably thinking about the `&mut` references vs the `&` references.
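A rough, purely illustrative sketch of the distinction (names here are mine, not from the thread):

    fn main() {
        // `mut` on a binding only says this binding may be reassigned or
        // mutably borrowed; it says nothing about uniqueness.
        let mut n = 0;
        n += 1;

        // Uniqueness lives in the reference types: `&T` is shared (read-only),
        // `&mut T` is the unique, exclusive reference.
        let excl = &mut n;
        *excl += 1;

        // "Interior" mutability mutates through a *shared* handle, trading the
        // compile-time borrow check for a runtime one.
        let cell = std::cell::RefCell::new(0);
        *cell.borrow_mut() += 1;
        println!("{} {}", n, *cell.borrow());
    }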


One of the things that this blog ought to have clearly mentioned is that all this business about static analysis, ownership types, read-write annotations, purity, immutability and all that is due to a simple conflation of two axes that ought to be orthogonal: mutation and aliasing.

This is what Rust gets right: you can have an object be mutable (in which case you have only one pointer to it), or you can multiply alias pointers, all of which can only read the object. The type system enforces a static RW locking scheme in a manner of speaking; it ensures that only one of those capabilities is available at any one point in time.

In other words, Rust ensures that mutation by itself is not a problem, that passing mutable structures between threads is statically checked for safety. You don't _have_ to make an object immutable in order to share safely by reference.
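To make the "static RW lock" analogy concrete, here is a minimal sketch (illustrative only, not from the article):

    fn main() {
        let mut v = vec![1, 2, 3];

        // Any number of readers may coexist -- the moral equivalent of read locks.
        let (a, b) = (&v, &v);
        println!("{} {}", a.len(), b.len());

        // Exactly one writer, and only while no readers are live -- the write lock.
        let w = &mut v;
        w.push(4);
        // println!("{}", a.len()); // rejected if uncommented: `v` would be
        //                          // borrowed as shared and exclusive at once
        println!("{:?}", v);
    }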

---------------

It'd have been nice to hear about the "P" language, also from MSR (https://www.microsoft.com/en-us/research/publication/p-safe-...). No pointers, just value types, communicating state machines, produces model-checkable code. What's not to like?


> It'd have been nice to hear about the "P" language

And already used in production, even if in a small use case.


For anyone else wondering what "Midori" is, it looks like it was an internal Microsoft research project:

http://joeduffyblog.com/2015/11/03/blogging-about-midori/


I've been reading these, but it's become apparent that the author lives in a universe parallel to mine. I remember reading that in most languages "A null dereference or out-of-bounds array access is treated the same way as a network connectivity problem or parsing error" and scratching my head, because while I know some languages are like that (Java), there are others where a null dereference and the failure to establish a network connection have radically different effects (C++). At another point, discussing error handling, he says "This leads us to a beautiful sweet spot. It does suffer from the performance problems I mentioned earlier, but does very well on all other dimensions." So the author's sweet spot is a spot with poor performance, which also put me off, because his articles all talk about performance but never concretely (in terms of nanoseconds).

Anyway it's an interesting, if long, series but most of the interest for me has been in reading things from a very different perspective from the one I hear at work every day.


I'm not really clear what this comment implies?

What system do you work with that doesn't have null pointers but also cares about eliminating as many fine-grained locks as possible?


I believe the point you are missing: the article compares a null pointer dereference and a network failure.

One of those is routine.

The other is a serious logic error that, strictly speaking, a correct program can never commit.


After re-reading I think I see that point, but I still don't understand the overall comment...

Is the author saying his standard language is C++? Everything I know about Midori said that it was competitive with C++ on performance in almost every way. Regarding error handling in particular, Midori used a combination of returning values and safe exceptions, both of which seem similar to C++.

It also explicitly doesn't have the aforementioned conflation of bugs and runtime errors.


15 Years of Concurrency ... and now we're all using JavaScript, which has a single-thread model, and doesn't allow cross-thread structural sharing of data structures (without going through a number of hoops).


Which, ironically, is a model with probably the best performance and scaling characteristics. If you want high performance, you should avoid sharing mutable state as much as possible.


The bigger problem is that Javascript disallows (or at least, browsers don't currently provide) efficient sharing of immutable values between threads/workers.

Because JS has no notion of immutability, sharing is done via copying - but what's worse, copying via full-blown serialization, not just in-memory blitting or COW. At least, that was the state of things when I last checked a year ago.


Modern JS does allow you to send large buffers from one worker to another with just a pointer copy (because it stops being accessible from the sender).


Yes, but even with that restriction, it doesn't let you transfer ownership of structured messages, only buffers and images; so a serialization roundtrip is needed.


I disagree. Play with Go's goroutines and channels and see how easily you can spread a simple or complex job across 16 cores, and then bring that data all back together. JS has nothing on that.


Pretty good CSP implementation in clojurescript though, which compiles down to JS.


No, it's not. Data sharing is essential for performance.

Besides, the problem with models like those of JS (today, anyway) is that they don't even let you read concurrently, even if there are no mutators.


It's important to distinguish performance (making the most out of a single computer) from scalability (making an overall system that continues to perform well when you throw lots of hardware at it). Data-sharing is essential for performance. It's anathema to scalability, not least because different processes in different datacenters aren't going to be able to share data without explicit messages between them anyway.

Node.js (and PHP, for that matter) often don't get enough credit for their scalability in the large. Perhaps this is because the surrounding distributed-systems ecosystem hasn't grown up around them yet the way it has around, say, Erlang or Java or even C++, but architecturally they're pretty sound if you want to build distributed systems that scale across many computers.


If you're looking at things solely from a distributed systems perspective, sure. But I work on graphics. It'd be irresponsible for me not to embrace shared memory, where it's so essential for performance it's practically impossible to contemplate not having it. (Imagine if you couldn't have multiple fragments read from the same texture!)

We should use all the hardware resources available to us. If our programming models don't allow us to do that, then we should replace them with models that do.


from the article:

So, we made some:

Isolation is paramount, and we will embrace it wherever possible.

Message passing will connect many such isolated parties through strongly typed RPC interfaces.

Namely, inside of a process, there exists a single message loop, and, by default, no extra parallelism.

A “promises-like” programming model will be first class so that:

Synchronous blocking is disallowed. All asynchronous activity in the system is explicit. Sophsiticated (sic) coordination patterns are possible without resorting to locks and events.

with the exception of trying concurrency primitives on heavyweight OS-provided processes, this looks like a recreation/rediscovery of Erlang primitives once again :)

would one be chastised for wagering that Erlang is the Lisp of the concurrency world?

edit-001: slight clarification (hopefully)


Every time I read about concurrency in other languages I see them trying to become Erlang but without the core design decisions that make Erlang works.

Always reminds me of this quote:

"Any sufficiently complicated concurrent program in another language contains an ad hoc informally-specified bug-ridden slow implementation of half of Erlang." - Robert Vriding


The problem is that Erlang is only good at network communication and parallel programming, which is of course to be expected, given the language's design goals.

The problem is when one tries to make a general purpose solution out of that design.

For example, what would Erlang look like in GPGPU programming, video/audio codecs, graphics programming, GUIs, ...?


GUI is not that hard; there are a couple of examples out there. After all, it is just dealing with some binaries.

For GPGPU, video/audio codecs and all that... you want single-threaded things. And there Erlang says "I'm OK with that" and lets you use Native Implemented Functions (NIFs) that enable you to use Rust, C and co. Or you can use what is called a Port and interface with a program outside of Erlang in a safe way. Works quite well.

Erlang is then used as the "concurrent and distributed glue" between your programs. There are a couple of scientific simulations that use Erlang like that in the wild.


Is there some reason Erlang had to be slow for single threaded things? Mixed language programming isn't desirable for many of us. I suspect Erlang could be fast, with just a few changes, but it never will be.


Some.

1) It was never seen as a problem, because the Erlang team believes (and I agree with them) that you are best off letting specialists deal with their specialty area.

2) There has never been an efficient JIT on the BEAM. This is something that is regularly looked at by the OTP team, but it would be a massive effort just to approach the efficiency of other languages that you can already easily interact with.

3) The use cases are small. Let's be honest: the number of problems that are really CPU-bound is limited in day-to-day work.


My last project saturated a cluster of computers for days at a time. Not everyone writes CRUD screens for day-to-day work, honest.

Parallelism is about performance, and it seems contradictory to do parallelism well on top of a slow language. Yes, Erlang can call out to other languages, but so can shell scripts.


Erlang was built as a concurrent language for fault tolerance, not performance. The best example of a trade-off I can think of is the preemptive scheduling: Erlang switches between processes very often, which has a cost in performance but ensure the system stays responsive even if some tasks are doing intensive work (or even stuck in an infinite loop). Every function call, allocation, etc. increments a counter which is used to decide when a process is scheduled out, so that's significant overhead you pay to get a more predictable system.

Some other examples are live code loading, which means you can upgrade a system without restarting or interrupting it, the extensive tracing and statistics, etc. All these little things make it very nice to build and maintain a long-running service, but add up to make the VM slower. If you're doing batch jobs and have no realtime requirement, then the trade-off may not be so attractive.


> The problem is when one tries to make a general purpose solution out of that design.

yup. go(lang) / rust seem to be exploring some ideas with a broadly similar approach, but not _solely_ focused on network programming. are there other languages out there as well?

and a slightly open-ended question as well: in your opinion, do you think a language needs to implement a complete runtime (similar to erlang) with its own 'process' abstraction, scheduler, i/o etc. to provide concurrency support?


I believe Erlang-style, language-level lightweight isolated processes are the only future. They will help not only with concurrency and distributed systems, but also with security and reliability. For example, a wild regexp that takes forever on some inputs will not affect other processes and won't bring the whole system down, and broken input validation will not help any client cross the trust boundary of its process.


Well, C++, D, .NET, Java, Swift, Objective-C, Rust... are all exploring the runtime approach for concurrency support.

The main difference across those languages and Erlang/Go is how they expose the runtime to the developers.

It isn't a black box; rather, they expose a default set of behaviors and a set of APIs that let you fine-tune those behaviors.


> Well, C++, D, .NET, Java, Swift, Objective-C, Rust... are all exploring the runtime approach for concurrency support.

wasn't it Hans Boehm who famously posited (?) that concurrency needs to be part of the language, rather than bolted on after the fact? do you think that makes all these other languages kind of poorer w.r.t. concurrency?

edit-001: specifically, this is the paper that i am talking about: http://www.hpl.hp.com/techreports/2004/HPL-2004-209.pdf


But concurrency is part of the language.

What you are missing is that those languages are quite expressive, thanks to generics, operator overloading, trailing lambdas, and APIs into the runtime that are available as tools for the programmer.

So you get, for example, async/await built into the language, with a default behavior. But at the same time, you get the necessary runtime hooks that allow any library writer to fine-tune it and build other concurrency models on top of it.

For the user of those concurrency models, it is indistinguishable where they are coming from.


The distinguishing factor is preemptive vs cooperative scheduling. In all those other languages each process has to hand control back to the scheduler, meaning runaway code can negatively impact everything; they pause the entire stack to garbage collect; and they can't transparently distribute themselves across machines because of shared memory (you can't transparently pass a pointer).

Those are design decisions you can't bolt on later. Isolation, preemptive scheduling, and message passing without shared memory get you access to all of that.


Yes you can; it has been done in Java, .NET, Active Oberon and Modula-3, just off the top of my head.

It is only a matter of making use of the said compiler hooks and underlying machinery.


I'm not familiar with Active Oberon or Modula-3, but Java and .NET absolutely do not. You can bolt on message passing, but it falls to the VM to properly handle preemptive scheduling and per-process heap isolation.

Mimicking the patterns doesn't replace the VM.


Java and .NET are not only the Java and C# programming languages.

There are for example, distributed agent programming languages built on top of them, including network aware distributed GCs.


So, with Rust, it's more subtle than that. That is, while threading proper isn't part of the language itself, Rust's type system is structured in such a way as to make it possible to build those kinds of libraries. In other words, Rust's focus on aliasability ends up solving these problems.

So for example, in that paper, 4.1 is about the problem of concurrent modifiability. And indeed, it says

> Indeed, under the implementation strategy we outlined above, in which the compiler is unaware of threads, it is allowed to transform code subject only to sequential correctness constraints and hence could generate the code containing a race.

However, in Rust, this re-ordering can't happen: Rust won't let you alias x and y between two threads without some sort of synchronization primitive. But this isn't because Rust knows about concurrency, it's because Rust knows about aliasing. In a sense, Rust-the-language makes this program _impossible to write_, but a library re-enables you to write this program. You need unsafe to do this, but it's all wrapped up inside of the implementation of, for example, Mutex<T>.
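A minimal, purely illustrative sketch of that shape (mine, not from the paper or the article): the data is only reachable through the lock, and handing threads an unsynchronized reference simply doesn't compile:

    use std::sync::{Arc, Mutex};
    use std::thread;

    fn main() {
        let counter = Arc::new(Mutex::new(0));

        let handles: Vec<_> = (0..4)
            .map(|_| {
                let counter = Arc::clone(&counter);
                thread::spawn(move || {
                    // lock() is the only way to reach the data; the guard it
                    // returns derefs to `&mut i32`, and releases the lock when
                    // it goes out of scope.
                    *counter.lock().unwrap() += 1;
                })
            })
            .collect();

        for h in handles {
            h.join().unwrap();
        }
        println!("{}", *counter.lock().unwrap());
        // Handing the threads a plain `&mut` to the same integer instead is a
        // compile error, not a latent data race.
    }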

From the last part of this section:

> Resolving it essentially requires a programming-language-defined and compiler-respected memory model, simply to ensure that the user and compiler can agree on when there is a data race.

We're in agreement here, but the model is built around aliasing, not concurrency.

4.2 is about speculatively executing store instructions. I know less about this, but again, it's built on the idea of two threads accessing data at the same time, unsynchronized. This can't happen in Rust due to the aliasing rules.

4.3 is about register promotion. This cannot happen in Rust, because you don't call a function to acquire the lock, then do whatever you want. Mutex<T> hides the value it's locking inside of itself, unable to be accessed from the outside, and the call to acquire the lock returns a mutable reference to the inner data. The call to acquire the lock is the only way to get said reference, and Rust's aliasing rules will forbid any other kind of access through the returned reference. So this kind of transformation can't happen in Rust either.

Section 5 is about performance. It's true that synchronization primitives are expensive. Rust can again use unsafe code in a disciplined way to provide safe concurrent modification, while ruling out data races entirely. For example, consider a simple map operation. We take an array of integers, and for each element, add one to it. This is an embarrassingly parallel operation, yet, as the paper mentions, with a pthreads-style approach to making it safe, one would need either a single lock around the whole array, which destroys the concurrency entirely, or some set of more fine-grained locks, which introduce cost, as well as limiting the amount of concurrency to some degree.

But with a small utility function [1], which performs a small (i.e., non-atomic) check at runtime, we can safely split up our array into as many disjoint chunks as we'd like, and then pass each one off to its own thread, which is free to do the modification with no more synchronization needed. In fact, libraries like Rayon can even determine roughly the correct amount for you, if you don't want to think about it, and will near-transparently just handle this for you (you change a call from iter() to par_iter() and you're done).

1: https://github.com/rust-lang/rust/blob/f8614c397313db00e4b46...
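For illustration only (my sketch, not the article's code, and assuming a recent toolchain with std scoped threads), roughly what that shape looks like; Rayon's par_iter_mut() gets you the same effect with less ceremony:

    use std::thread;

    fn main() {
        let mut data: Vec<u64> = (0..1_000).collect();

        // chunks_mut hands out disjoint `&mut` slices (built on the same
        // runtime-checked split as split_at_mut), so each thread mutates its
        // chunk with no further synchronization.
        thread::scope(|s| {
            for chunk in data.chunks_mut(250) {
                s.spawn(move || {
                    for x in chunk {
                        *x += 1;
                    }
                });
            }
        });

        assert_eq!(data[0], 1);
    }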

So yeah. I'm in agreement with the paper that the language needs to do _some_ kind of reasoning, but since aliasing and concurrency are so tightly related, I would argue that the language could understand only aliasing, not concurrency, and then library abstractions are sufficient.


The problem is in doing concurrency for the sake of doing concurrency. There is no general-purpose solution for concurrency, because concurrency is not a problem to solve. Either you need to scale way beyond the handful of cores you have on a single machine, in which case you go into distributed systems, or you don't need to scale, in which case you are fine with a single core, possibly with some primitive but safe abstractions for parallelism (not threads).


See Wings3D to find out.


Problem here: RPC and promises are radically at odds with message passing. They enable you to add some concurrency and parallelism to your "single threaded" program, but there are whole classes of problems and optimisations they cannot solve.

In general, synchronous message passing or RPC cannot work. At all. There are too many edge cases where it breaks the laws of physics.


Can you elaborate on your statement:

>"Problem here : RPC and promises are radically against message passing."


https://christophermeiklejohn.com/pl/2016/04/12/rpc.html https://www.youtube.com/watch?v=aDWZyYHj2XM

Christopher is far better than I will ever be at explaining it. In general, RPC has some fundamental problems; among others, it needs you to ship the full context.


Oh this is good reading, thanks!


This looks like it's gonna be a long, interesting read. I better save up my bathroom time...


> I better save up my bathroom time

Unsure about implementation details


Not sure why you got down voted; I feel the same way. I read what I was sure was 3/4 of it, only to realize it was about 1/8, but I'm definitely going to save it and finish it.

I'm slightly (very?) jealous that I am not so passionate about any part of software as the author. I am interested in lots of areas, but no one piece of the puzzle draws me in, and it's bittersweet sometimes to read the thoughts of someone who seems so passionate about their field.


>Not sure why you got down voted

Because the comment adds nothing to the conversation.


It adds humor, and we like that sort of thing on HN


Posts that only try to be funny tend to be downvoted on HN. We try to avoid reddit-style threads where people just post one-liners.


This is a fantastic write-up.

It was back in 2011 that I first started wrapping my head around the complexities of concurrent programming. Even in single-threaded environments, such as a JavaScript process, concurrency woes can escalate quickly in terms of the non-determinism introduced by asynchronous operations (timers, I/O) and their callbacks.

My trying to understand and deal with those complexities resulted in an exploration of functional-reactive programming concepts, which in turn led me to make a "deeper dive" into computer programming language design and various computer science topics. Concurrency is still the CS topic that makes my ears perk up the most.


I've recently stumbled upon a very interesting paper about concurrency. Naming and Synchronization in a Decentralized Computer System:

http://publications.csail.mit.edu/lcs/pubs/pdf/MIT-LCS-TR-20...

It was written in 1978, but if I interpret it correctly, it's still conceptually ahead of everything mentioned in the article (the closest thing being STM). Has anyone else here seen it?


No, but thanks for bringing it up; it looks like an interesting read.


Hey, another article on the HN front page saying "DO NOT MESS WITH STATE" in order to scale:

"Shared-memory multithreading really isn’t the future, we thought, and notably absent from all of my prior work was fixing this problem."

I'm thinking about:

Is immutability the way to scale complex systems?

Can every system be implemented as immutable, by definition?

Do we need to build new data structures (like tries) to be immutability-friendly?

When isn't immutability worth it?


Computer memory is mutable. For multiple processors to make use of the same memory requires contending for that memory somehow (else the processors are not communicating in any meaningful way).

You can mathematically try to hide that however you want, but somewhere deep in your language runtime stack will be a shared-memory concurrent mutable algorithm, be it a lock, queue, or otherwise.

The alternative is to ditch shared memory entirely and use e.g. hardware IPC to manage contention. Tilera tried that, they had 64-core processors with a hardware IPC mesh and incoherent cache. But last I checked they switched focus to coherent shared-memory concurrency. Not sure what the driver for the change was but it's telling.

EDIT: Not to mention that half the article focused on promise-style concurrency, which has nothing to do with shared memory or even multicore. Immutability does nothing to address that I/O exists and has latency, and that people try to hide it in frankly bizarre ways. (Really… not pumping the event loop to avoid deadlock? That's a design issue, you should be using queues instead of mutexes and eliminate cycles if you find you have that problem.)


> Computer memory is mutable. For multiple processors to make use of the same memory requires contending for that memory somehow (else the processors are not communicating in any meaningful way).

That's a bit of a strawman. For example, content addressable memory and message passing will certainly permit "processors ... communicating in [a] meaningful way".


That's why I called out message passing specifically in the very next paragraph.


I will look up Tilera :) Your answer opened my mind, because I hadn't thought about the hardware. It got me thinking: are GPU architectures more friendly to immutability, or do they have the same "problem" of shared memory too? And, by the way, thanks for sharing your knowledge :)


If anything, GPU architectures are even less suited to immutability than CPUs. To get good performance out of a GPU (otherwise, why use them?), you have to avoid thread divergence caused by data-dependent branches as much as possible (I'm simplifying a bit). The simplest and most effective way to do this is to allocate a couple of large arrays and work on them. Immutable data structures, garbage collection, and frequent allocation of small objects, in contrast, will cause a lot of thread divergence.

In addition, immutable data structures tend to be pointer-based. For GPUs (even more so than for CPUs), you want contiguous memory accesses for acceptable performance.


GPUs deal primarily with "embarrassingly parallel" worksets – those worksets where the data can be trivially divided into unrelated chunks, so mutability is not really a concern (since working memory isn't shared). In fact it's typical for an algorithm not to modify its input, and to write its output to a separate buffer. All the synchronization happens at the start and end of the algorithm, when work is passed to/from the CPU.

While it is possible for threads in a (modern) GPU to modify shared memory, it's typically very costly. Usually it takes the form of one thread running at the exclusion of all others, to do something like collecting the results from other threads after they're done running.

Re: Tilera, looks like they got bought out by Mellanox and haven't progressed past where I last kept track of them: http://www.mellanox.com/page/products_dyn?product_family=238... It was a shame, because the IPC mesh was easily twice as fast as the shared memory solutions they were pushing in the Gx series.


Functional array programming on GPUs works pretty well, and is an immutable programming style. Your object granularity tends to be pretty big (entire arrays), so the low-level shared memory concerns that inevitably arise at some point in the software stack are easy to handle for the compiler and/or runtime. Basically, you have much fewer synchronisation points, so you can afford for them to be slow.

It's a more restricted programming style than common functional programming - in particular, recursion and advanced control flow is out - but that's the name of the game on GPUs, no matter what you do.


And there are at least Haskell and F# implementations for GPGPU, not sure about other languages.


Rust is probably the right cost/benefit tradeoff; mutable data structures, but strictly controlled mutability. In safe Rust code, the sort of mutability that gets you in trouble is blocked by the compiler.

I'm not sure there's an immutable language out there that impresses me with its performance at scale. In some ways, GHC works miracles for Haskell, and yet, it can still end up being 2-5x slower than C even if you try to make it fast. I'm not sure if there's anything else out there that's any faster with pervasive immutability. (There are some other "functional programming languages" that are often faster, like OCaml, but they're getting there by letting mutability in.)

As said in the article: "Rust is slightly less opinionated on the overall architecture of your system than Midori was, which means it is easier to adopt piecemeal," emphasis mine. Over the past couple of decades there have been a lot of projects like Midori (though that is one of the better ones) that could solve lots of problems, as long as lots of people dropped everything and adopted them. I've learned from both looking around and my personal experience that while eventually you do have to drop everything and try something new, if you can create a system that you can incrementally improve into, it will outcompete the drop-everything one every time. Even if the tradeoffs might make you sick, the reality is stark; the situation must be desperate before a "drop everything" solution can displace an existing one entirely. In Rust's case, in some ways the most exciting thing about it isn't the features it's shipping or its capabilities or anything else; it's the way you're seeing people carve out a hunk of Firefox or other programs and replace one function, one module, one subsystem at a time with Rust. That's what's missing from so many of the other efforts to address the issues Rust is addressing.


Last I checked, OCaml only has single-threaded concurrency. In my testing comparing Java with Clojure, they clock in about the same, but Clojure pushes you away from locking where Java pushes you toward it, hard. Clojure isn't purely functional, but it's close enough that it makes a huge difference to achieving correctness without the performance compromise.


The multi-core runtime for OCaml already exists; it just isn't feature-complete enough to be integrated into the stable version.

https://github.com/ocamllabs/ocaml-multicore


Echoing what colandrman says in a peer comment, current computers are shared mutable memory, and at least in x64, i.e. servers, coherent too. Immutability is nice, message passing is nice. It's certainly easier to reason about most of the time. But often (not always, sometimes it can be faster) there's a performance price to pay because you're layering an abstraction over the base computational model.

I specialize in lock-free everything mutable programming. It's a bloody nightmare to write correct code in an environment like that - but when done well it performs better than anything else.


Care to share some resources on lock-free concurrent data structures? Recently I saw an HN post about a wait-free stack [1].

[1] https://news.ycombinator.com/item?id=12109206


Erlang systems have been immutable, parallel, and concurrent for > 20 years.


Erlang is elegant and noteworthy, but you'll get more computation done in a single-threaded C program.


Yes, Erlang's implementation is a big problem, but it's not something fundamental to its architecture. Some new language with a fresh implementation could definitely solve that.


Also, Haskell has both useful and safe STM, and a runtime with built in concurrency and parallelism.

(Mentioning Erlang and Haskell because I've shipped solid production systems using both of those, and they work well!)


Funny, because I read it more as learning to cope with state. Do you want to scale for performance? You need to delicately manage state.

You can do local tricks to hide changes in state. Consider that even simple addition operations in your computer are destructive. At the end of the day, though, your application's value is in the state it manages.

There may be a panacea in there somewhere. However, it often seems that performance is heavily sacrificed in the name of scalability. Which is scary, because performance is the main reason my scalable phone is driving me batty.


Performance - immutability comes at a high cost sometimes.


When I saw the example in the paper of building a list by copying the entire list for every new entry and trashing the old one I knew performance would be a problem. A smart compiler can optimize that if it knows what you are doing, but they were writing a research compiler.

There is a tradeoff with this kind of immutable-oriented programming: you get relatively easy parallelism at the cost of heinous memory thrashing, and cache misses are already a huge bottleneck on modern architectures. Rust seems to have struck a good balance here; I'm keeping a close eye on it as it continues to mature.


Another thing to keep in mind is that with everything being immutable you end up with a pure language. Once that happens, the compiler starts being able to perform some really nice rewrites of your code really easily. The problem is that this is all or nothing - unless your language is actually pure you can't do this.

See this blog post for more: https://donsbot.wordpress.com/2008/06/04/haskell-as-fast-as-...


Especially when memory constrained


OK, in the context of critically constrained memory, I understand.

However, do you agree that it is easier to solve memory (by buying new hardware) than to handle race conditions? kkkk

Then maybe we need to think about optimizing our everyday data structures. I have been following a few papers about this, e.g.:

http://dl.acm.org/citation.cfm?id=2658763 http://dl.acm.org/citation.cfm?id=2993251


It's easy to buy memory, but hard to buy L2/L3 cache. The whole point of the exercise is to scale more easily on multicore architectures, but it's no good if you blow out the cache thousands of times per second and bottleneck the system on memory accesses.


Additionally, DRAM and VRAM bandwidth are always at a premium. Whenever you're making a copy (which you do a lot when objects are immutable), you use memory bandwidth.

This is especially important on mobile, FPGAs and in cases where sheer volume of data is huge. (GPU and big data)


In addition to what jandrese said: You can nearly always add main memory capacity, but you cannot easily improve main memory throughput or latency.


Although distributed programming is different from concurrency (though I suppose one could argue), I was hoping for some mention of the Join Calculus, particularly Polyphonic C# [1] (given Duffy's MS background).

I have been meaning to re-investigate the Join Calculus (the last time I did was in OCaml, a long time ago) and to figure out why it never really took off.

Some programming patterns can be found here: https://en.wikipedia.org/wiki/Join-pattern

[1]: https://en.wikipedia.org/wiki/Polyphonic_C_Sharp


True. Both polyphonic C# and JoCaml were lovely projects.


Weird thing is that the inspirations didn't factor in the only commercially-deployed schemes for safe concurrency: Ada's model and Eiffel's SCOOP. The SCOOP model was even ported to Java at one point.


Erlang CSP is also commercially deployed and safe.


On Joe Duffy's profile there is what I thought to be a hexadecimal: 284d69f1cb

I tried converting it to ascii and decimal and nothing interesting came out. Anyone know what that is? Maybe a key signature?


Probably a part of the hash of the first Git commit of his new project.



