Can somebody weigh in on their claims of performance differences? IIUC that's the strongest argument so far against sequential consistency by default, yet I'm not sure I understand their evidence. I haven't followed their references yet, but after reading their arguments I'm not sure just what the overhead is now, and what they expect it to be in the future (tbh, I would have expected them to be much clearer about this point, since it is so crucial). They say they expect the overhead to be reduced substantially to the point of not being worth caring about, but is that actually true/likely? I'm not familiar with this enough to judge on my own. My suspicion here is that they (cheekily) say that the overhead can be reduced, without having to prove that it will be reduced.
The alternative, if it is genuinely more expensive to implement SC guarantees in hardware, is that we simply stop teaching people that "A;B" means A is executed and then B. Maybe there really should be an exception saying "but nobody is allowed to look at any intermediate states, unless explicitly allowed". We could also just teach the full meaning of it from the start; it can't be that difficult. Their argument seems to be that non-SC is much less convenient, which I agree with.
On a scale from plenty-real to not-real-at-all, how real are the hardware performance limitations exactly?
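For anyone wondering what that "nobody is allowed to look at any intermediate states" caveat would have to cover, here is a minimal sketch of my own (using C++ atomics; the variable and thread names are just for illustration). It's the classic store-buffering litmus test: under sequential consistency at least one thread must see the other's store, but with relaxed ordering the hardware or compiler may reorder each thread's store past its load, so both threads can read 0.

    #include <atomic>
    #include <thread>
    #include <cstdio>

    // Store-buffering litmus test. Under SC, at least one of r1/r2 must be 1.
    // With relaxed ordering, the "impossible" outcome r1 == 0 && r2 == 0 is
    // permitted, because each thread's store may be reordered past its load.
    std::atomic<int> x{0}, y{0};
    int r1, r2;

    int main() {
        std::thread t1([] {
            x.store(1, std::memory_order_relaxed);   // A
            r1 = y.load(std::memory_order_relaxed);  // B
        });
        std::thread t2([] {
            y.store(1, std::memory_order_relaxed);   // A
            r2 = x.load(std::memory_order_relaxed);  // B
        });
        t1.join();
        t2.join();
        std::printf("r1=%d r2=%d\n", r1, r2);  // r1=0 r2=0 is allowed here
    }

This is exactly the "A;B doesn't mean what you were taught" case: within each thread A comes before B in program order, yet the observable behaviour is as if they ran out of order.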
The paper reports overhead numbers from existing research. For instance, see Figure 18 in http://arxiv.org/abs/1312.1411, which shows the cost of SC for memcached - 1% on x86 and 3% on ARM.
This overhead is primarily due to the cost of fences on existing hardware. What we (not so cheekily) say is this is likely to get better as hardware platforms provide better implementations for fences.
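To make that cost a bit more concrete, here is a rough illustration (mine, not taken from the paper) of where those fences show up: on current x86 a sequentially consistent store typically compiles to a store plus an mfence (or an xchg), while a relaxed store is just a mov; on ARM the SC store needs a barrier or an ordered-store instruction. That per-access fence is what the overhead figures measure, and cheaper hardware implementations of it are what would shrink them.

    #include <atomic>

    std::atomic<int> flag{0};

    // No ordering promise: on x86 this is typically a plain "mov", no fence.
    void store_relaxed() {
        flag.store(1, std::memory_order_relaxed);
    }

    // Sequentially consistent: on x86 this is typically "xchg" or
    // "mov + mfence"; on ARM it becomes a barrier / ordered store
    // (e.g. dmb or stlr). The fence drains the store buffer, and that
    // stall is the per-access cost behind the overhead numbers.
    void store_seq_cst() {
        flag.store(1, std::memory_order_seq_cst);
    }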
> The paper reports overhead numbers from existing research. For instance, see Figure 18 in http://arxiv.org/abs/1312.1411, which shows the cost of SC for memcached - 1% on x86 and 3% on ARM.
But that's the bit I don't find nearly convincing enough. You say (p.5) that you're going to "rebut these arguments" that "SC is too expensive", but the main figure of 1%/3% is from a non-standard research-level static analysis tool that, if I read that paper correctly, works on code bases up to 10k LOC and takes a few minutes to run to produce the 1%/3% figure. Can that really be generalised? I'm not quite sure. The other tools in the comparison did much worse, which I think may be closer to what one would get in practice. So I think that's not a good rebuttal: if you consider the tools actually available, SC may well be too expensive.
I'm not saying you're wrong, just that I don't think you've proven your case very clearly. I was kind of expecting a much clearer rebuttal than I found, sorry about the snark.
Yes, you did read correctly. The 1-3% does assume a non-standard whole-program analysis. Something more practical on existing hardware will look more like the E numbers (for escape analysis): 17.5% on x86 and 6.8% on ARM. An even dumber analysis (H) adds up to 37.5% overhead on x86.
It is important to realize that the overhead numbers are not so huge (like 5x or 20x) that one could simply write SC off.
As we say in the paper, these overheads, however small they might be, will be unacceptable for low-level libraries written in C/C++. The main argument of the paper was that these overheads are acceptable for "safe" languages like Java/C#, which anyway add other (useful) overheads such as garbage collection, bounds checking, etc. to help out the programmer.
Even for C/C++, it will still make sense to have SC by default - the programmer is responsible for explicitly turning off SC semantics in the parts of the code where the inserted fences hurt performance. This is a much better tradeoff - safe by default and performance by choice.
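As a rough sketch of that "safe by default, performance by choice" tradeoff in today's C++ (my example, not the paper's): std::atomic operations are sequentially consistent unless the programmer explicitly asks for something weaker in a hot path.

    #include <atomic>

    std::atomic<long> events{0};

    // Safe by default: the unadorned operation is sequentially consistent.
    void record_event() {
        events.fetch_add(1);  // defaults to std::memory_order_seq_cst
    }

    // Performance by choice: in a profiled hot path where only the final
    // total matters, the programmer explicitly weakens the ordering and
    // takes on the burden of reasoning about it.
    void record_event_hot() {
        events.fetch_add(1, std::memory_order_relaxed);
    }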
Good article, but I was kinda disappointed not to have what I expected to be a good read about the evolution of how people use semicolons in natural language. Now I want to read that too.
I don't do much threaded programming - I think it's come up once in my career? (and fortunately at the time I was pairing with a Java guru) - so I don't completely understand the issues here.
Can someone link to a good introduction to the problem? I just wasn't able to completely understand it from the examples in the linked article.
I had a completely different reading. I thought the fable portrayed Alice as thoughtful and bright. The punchline is that this talented student was right to leave programming for cooking because the lack of sequential consistency in high-level languages is absurd.
I've seen that a few times: brilliant college minds failing at programming because of legacy and ad-hoc features. They preferred maths and physics, which are not devoid of arbitrariness either, but whose arbitrary parts are probably smaller and more stable over time.
But isn't the argument for stronger consistency based on the existence of a "saner" architecture like x86? And wasn't that argument insufficient both when the article was written (1998) and now, given that ARM predates the article and its popularity has increased significantly since then?
Then ignoring the existence of ARM's weaker memory model (versus the stronger x86/x64 one) is even stranger. And even though the x86/x64 model is stronger, it still needs some fences, and using them all the time would be too slow. So I still don't really understand the article's arguments.
The article does not ignore architectures like ARM. You do need fences, but not all the time - the compiler can avoid fences in places where there is no danger of violating sequential consistency (e.g. on accesses to provably local objects).
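A hand-written analogue of what such an analysis buys (my sketch, not the paper's tooling): if an object provably never escapes its thread, no other thread can race on it, so even under an SC-by-default rule its accesses can remain ordinary fence-free loads and stores; only the accesses to genuinely shared data have to pay for ordering.

    #include <atomic>
    #include <vector>

    std::atomic<long> shared_total{0};

    long sum_chunk(const std::vector<long>& chunk) {
        // 'local_sum' never escapes this function, so no other thread can
        // observe it; a compiler may keep these as plain, fence-free loads
        // and stores even if the language promises SC everywhere.
        long local_sum = 0;
        for (long v : chunk)
            local_sum += v;

        // Only this publication step touches shared state, so only here
        // does the implementation have to pay for SC ordering (a fence or
        // an ordered atomic instruction, depending on the hardware).
        shared_total.fetch_add(local_sum, std::memory_order_seq_cst);
        return local_sum;
    }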
Yeah, that's my understanding too. It's not "math is hard, let's go cooking", it's "fuck this nonsense". It's not math problems that are the problem here, it's inconsistent execution in the name of performance.
To be precise, Alice quits programming because modern languages lack a proper concurrency specification, and thus lack actual "Math" in this area. I can sympathize with that.
The concurrency behavior of Java/C++/etc is specified, it's just that the specification doesn't match the model given in the article. It follows a different mathematical model—that doesn't mean that it "lacks actual math".
What don't you like about that? If you heat the vegetables and oil together, they can slowly lose their water and essentially boil, and taste boring; if you let the oil get to the right temperature and then add the vegetables, then they'll cook up nicely, with plenty of Maillard reactions (browning), and taste delicious.