I think the hard part is that x86 effectively has only one atomic ordering, so none of the weaker modes appear to do anything. As such, it’s really hard to build intuition about them unless you spend a lot of time writing such code on ARM, which wasn’t that common in the industry, and today most people use higher-level abstractions anyway.
By databases, do you mean those running on DEC Alphas? Because that was a niche system that few would have had experience with. If you meant to compare in terms of consistency semantics, sure, but there are meaningful differences between the consistency semantics of concurrent database transactions and atomic orderings in a multithreaded context.
Java’s memory model “wrestling” was about defining the model formally in an era of multithreading, and the result is largely sequentially consistent: no weakly consistent orderings allowed.
The C++ memory model was definitely the first large-scale adoption of a weaker consistency model I’m aware of, and it was done so that compilers could properly optimize for ARM CPUs: this was C++11, when mobile CPUs were very much front of mind. Weak consistency remains really difficult to reason about, and even harder to experiment with if you primarily work with x86, and there’s very little validation tooling that can help you gain confidence that your code is correct. Of course, you can follow common “patterns” (e.g. loads are always acquire and stores are release), but fully grokking correctness and being able to play with the model in interesting ways is no small task, no matter how many learning resources are out there.
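For context, here’s a minimal sketch of that acquire/release pattern in C++ (the `ready`/`payload` names are just illustrative):

    #include <atomic>
    #include <cassert>
    #include <thread>

    // Classic message-passing idiom: the release store to `ready`
    // publishes the plain write to `payload`; the acquire load of
    // `ready` guarantees the reader then sees that write.
    std::atomic<bool> ready{false};
    int payload = 0;

    void producer() {
        payload = 42;                                  // plain write
        ready.store(true, std::memory_order_release);  // publish
    }

    void consumer() {
        while (!ready.load(std::memory_order_acquire)) {}  // wait for publish
        assert(payload == 42);  // guaranteed by the acquire/release pairing
    }

    int main() {
        std::thread t1(producer), t2(consumer);
        t1.join();
        t2.join();
    }

And this is exactly the testing problem: on x86 both the release store and the acquire load compile to plain movs, so getting the orderings wrong often won’t visibly fail there, which is why x86-only testing builds so little intuition.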
Nit: x86 has acquire/release and seq_cst for loads/stores (it technically also has relaxed, but mapping it to C++11 relaxed is not useful). What x86 lacks is weaker orderings for RMW operations, but a lot of useful lock-free algorithms are implementable entirely or mostly with loads and stores, and using non-seq-cst stores for those can be a significant win on x86.
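To make that concrete, a small sketch (codegen details vary by compiler and target, but this is what GCC and Clang typically emit for x86-64):

    #include <atomic>

    std::atomic<int> x{0};

    void store_release(int v) {
        // Typically a plain `mov`: under TSO, ordinary x86 stores
        // already have release semantics.
        x.store(v, std::memory_order_release);
    }

    void store_seq_cst(int v) {
        // Typically an `xchg` (implicitly locked), which drains the
        // store buffer and is considerably more expensive.
        x.store(v, std::memory_order_seq_cst);
    }

That per-store cost difference is where the win comes from in load/store-only lock-free algorithms.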
You mean x86-64, right? I would imagine 32-bit x86 doesn’t have those instructions?
I’m also kind of curious whether a lot of modern code compiled for x86 would see consistency issues when run on old CPUs from before TSO was formalized (like a Pentium II multiprocessor server).
Embedded devices did not necessarily use the C++ memory model, and definitely not in the 90s; they were very likely in-order CPUs with unsophisticated compilers to boot, so atomics didn’t matter too much anyway (volatile was sufficient). They may have had a weaker memory model, but at the same time multithreading on embedded did not really exist; it was only being introduced into the industry with any real seriousness around that time (threading on Linux started to shake out around the mid-90s).