What we learned from C++ atomics and memory model standardization [video]

tialaramex · 2024-03-04T18:29:54.000000Z

There's clearly an opportunity for a much longer talk. This one teases that Paul (McKenney, of Linux fame) will have a very different take, but we don't hear what it is; maybe that was presented at another session of this conference.

It's definitely true that translating into "standardese" is a bad idea. Humans don't read standardese and neither do machines; a machine-readable model would be superior (more likely to be correct, more easily tested) for this and for other tricky technical problems. Given that even the for-profit C++ vendors don't use the actual ISO document (it's out of date and pointlessly expensive, so why bother), having an appendix to the "draft" with the machine proofs would be much better.

I think Hans makes an understandable but (IMO) wrong assumption about a benefit from choosing Sequentially Consistent ordering (the C++ default) over caring which order is correct. Much of the tricky concurrent code non-experts are going to write against these APIs will be wrong anyway, even if you provide Sequential Consistency.

As a result, the committee's original assumption wouldn't have helped much. For example, I doubt that when (if? I don't like some of the noises I've been hearing) Microsoft fixes SRWLock, the fix will be an ordering tweak.
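To make the point about sequential consistency concrete, here is a minimal sketch (my own illustration, not something from the talk; `run_workers`, `broken_increment` and `correct_increment` are hypothetical names). Every individual operation in the broken version is seq_cst, yet the code still loses updates, because the bug is in the logic and not in the ordering.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: start `threads` workers that each call `step` on a
// shared counter `iters` times, then return the final counter value.
template <typename Step>
long run_workers(int threads, int iters, Step step) {
    assert(threads > 0 && iters > 0);
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) step(counter);
        });
    }
    for (auto& th : pool) th.join();
    return counter.load();
}

// Broken: both operations default to memory_order_seq_cst, but load-then-store
// is two separate steps, so concurrent increments can overwrite each other.
long broken_increment(int threads, int iters) {
    return run_workers(threads, iters, [](std::atomic<long>& c) {
        long seen = c.load();
        c.store(seen + 1);  // may discard another thread's increment
    });
}

// Correct: one atomic read-modify-write. For a plain count that is only read
// after join(), relaxed ordering is enough.
long correct_increment(int threads, int iters) {
    return run_workers(threads, iters, [](std::atomic<long>& c) {
        c.fetch_add(1, std::memory_order_relaxed);
    });
}
```

With several threads, `correct_increment` always returns `threads * iters`, while `broken_increment` can return anything up to that value. The weaker ordering is in the version that works.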

davidtgoldblatt · 2024-03-04T18:53:56.000000Z

> There's clearly an opportunity for a much longer talk. This one teases that Paul (McKenney, of Linux fame) will have a very different take, but we don't hear what it is; maybe that was presented at another session of this conference.

Paul's talk is here: https://www.youtube.com/watch?v=iJP6DWVrLjM

tialaramex · 2024-03-04T21:16:09.000000Z

Thanks, I think I'll end up watching a good number of these.

jcranmer · 2024-03-04T20:14:47.000000Z

There is a bit of a trend in languages to shift towards more formal semantics. In the C/C++ world, the relaxed memory model and the pointer provenance work are being heavily driven by actual formal semantic models. And many newer languages seem to rely on a more overtly operational semantics description (Java and JavaScript come to mind), although the C and C++ standards themselves are largely free of this. I definitely would like to see formal semantics be defined by the standards, although this is difficult because they still don't exist in complete form, and I'm not sure many committee members have the ability to read or reason about formal semantics very well.

I think the commentary about the atomics memory model kind of not working as intended is useful to note. Consume is useless in practice, and there are probably better railings that could have been put around acquire/release semantics.

I'll have to listen to several of the other talks.

matt_d · 2024-03-04T19:55:09.000000Z

The Future of Weak Memory (FOWM) 2024 talks: https://www.youtube.com/playlist?list=PLyrlk8Xaylp6u1S3R6gH0...

Abstract: https://popl24.sigplan.org/details/fowm-2024-papers/16/What-...

The C++ memory model was first included with thread support in C++11, and then incrementally updated with later revisions. I plan to summarize what I learned, both as a C++ standards committee member, and more recently as a frequent user of this model, mentioning as many of these as I have time for:

The C++ committee began with a view that higher level synchronization facilities like mutexes and barriers should constitute perhaps 90% of thread synchronization, sequentially consistent atomics maybe another 9%, and weakly ordered atomics the other 1%. What I’ve observed in C++ code is often very far from that. I see roughly as much atomics as mutex use, in spite of some official encouragement to the contrary. Much of that uses weakly ordered atomics. I see essentially no clever lock-free data structures, along the lines of lock-free linked lists, in the code I work with. I do see a lot of atomic flags, counters, fixed-size caches implemented with atomics, and the like. Code bases vary, but I think this is not atypical.
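For readers who haven't met these patterns, a rough sketch of the "atomic counter" and "atomic flag" uses described above might look like the following. This is an illustration under my own assumptions, not code from the talk; `Stats`, `Publisher` and `publish_and_consume` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// A statistics counter: nothing else is published through it, so relaxed
// ordering is a common (weakly ordered) choice.
struct Stats {
    std::atomic<unsigned long> hits{0};
    void record() { hits.fetch_add(1, std::memory_order_relaxed); }
    unsigned long snapshot() const {
        return hits.load(std::memory_order_relaxed);
    }
};

// A ready flag guarding plain data: the release store pairs with the acquire
// load, so a reader that sees `ready == true` also sees `payload`.
struct Publisher {
    int payload = 0;  // ordinary, non-atomic data
    std::atomic<bool> ready{false};

    void publish(int value) {
        payload = value;
        ready.store(true, std::memory_order_release);
    }
    bool try_read(int& out) const {
        if (!ready.load(std::memory_order_acquire)) return false;
        out = payload;
        return true;
    }
};

// Spin in a second thread until the value is published, then return it.
int publish_and_consume(int value) {
    Publisher p;
    int seen = -1;
    std::thread consumer([&] {
        while (!p.try_read(seen)) std::this_thread::yield();
    });
    p.publish(value);
    consumer.join();
    assert(seen == value);
    return seen;
}
```

Neither structure is a clever lock-free algorithm; both are the small, local uses of weakly ordered atomics that the abstract says dominate real code.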

In spite of their frequent use, the pay-off from weakly ordered atomics is decreasing, and is much less than it was in Pentium 4 times. The perceived benefit on most modern mainstream CPUs seems to significantly exceed the actual benefit, though probably not so on GPUs. In my mind this casts a bit of doubt on the need to expose dependency-based ordering, as in the unsuccessful memory_order_consume, to the programmer, in spite of an abundance of use cases. Even memory_order_seq_cst is often not significantly slower. I’ll illustrate with a microbenchmark.
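The microbenchmark from the talk is not reproduced in this abstract. As a stand-in, here is a minimal single-threaded sketch of how one might time stores under different orderings; `ns_per_store` is a hypothetical helper, and a loop like this measures only the uncontended case, so treat any numbers it produces with caution.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical harness, NOT the benchmark from the talk.
// Returns the average time in nanoseconds of one uncontended atomic store
// performed with the memory ordering given as the template argument.
template <std::memory_order Order>
double ns_per_store(std::uint64_t iters) {
    assert(iters > 0);
    std::atomic<std::uint64_t> cell{0};

    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iters; ++i) {
        cell.store(i, Order);
    }
    const auto stop = std::chrono::steady_clock::now();

    // Read the cell back so the stores have an observable result.
    assert(cell.load(std::memory_order_relaxed) == iters - 1);

    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iters);
}
```

Comparing `ns_per_store<std::memory_order_relaxed>(n)` with `ns_per_store<std::memory_order_seq_cst>(n)` for the same `n` gives a rough per-store cost on a given machine. Results depend heavily on the architecture and on what the surrounding code does: on x86-64, for instance, a seq_cst store is typically an `xchg` (or a `mov` plus a fence) while a relaxed store is a plain `mov`, so a store-only loop like this one is close to a worst case for seq_cst.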

We initially knew way too little about implementability on various architectures. This came back to bite us recently [Lahav et al.]. This remains scary in places. Hardware constraints forced us into a change that makes the interaction between acquire/release and seq_cst hard to explain, and far less intuitive than I would like. It seems to be generally believed that this is hard or impossible to avoid with very high levels of concurrency, as with GPUs.

We knew at the start that the out-of-thin-air problem would be an issue. We initially tried to side-step it, which was a worse disaster than the current hand-waving. This has not stopped memory_order_relaxed from being widely used. Practical code seems to work, but it is not provably correct given the C++ spec, and I will argue that the line between this and non-working code will inherently remain too fuzzy for working programmers. [P1217]
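The out-of-thin-air problem is usually illustrated with a litmus test of the following shape. This is the textbook example as I understand it, not code from the talk, and `oota_litmus` is a hypothetical name. Each thread copies one variable into the other with relaxed operations, so any nonzero result would have to justify itself circularly.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Two threads copy x into y and y into x using only relaxed operations.
// Both variables start at 0 and no other value is ever written, so a result
// such as r1 == r2 == 42 would have appeared "out of thin air".
std::pair<int, int> oota_litmus() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int r1 = 0;
    int r2 = 0;

    std::thread a([&] {
        r1 = x.load(std::memory_order_relaxed);
        y.store(r1, std::memory_order_relaxed);
    });
    std::thread b([&] {
        r2 = y.load(std::memory_order_relaxed);
        x.store(r2, std::memory_order_relaxed);
    });
    a.join();
    b.join();
    return {r1, r2};
}
```

Real implementations return `{0, 0}` here, and the standard asks implementations not to produce values that circularly depend on their own computation. The difficulty the abstract refers to is that this is stated as a recommendation and not as a precise rule in the formal model, which is why code built on `memory_order_relaxed` works in practice without being provably correct.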

Unsurprisingly, programmers very rarely read the memory model in the standard. We learned that compiler writers commonly do not either. The real audience for language memory models mostly consists of researchers who generate instruction mapping tables for particular architectures. The translation from a mathematical model to standardese is both error-prone and largely pointless. We need to find a way to avoid the standardese.

Atomics mappings are part of the platform application binary interface, and need to be standardized. They often include arbitrary conventions that need to be consistently followed by all compilers on a system for all programming languages. Later evolution of these conventions is not always practical. I’ll give a recent RISC-V example of such a problem.