>> In the get method, you're returning a pointer to the element within the queue...

hi_dang_ · on July 8, 2024

I sincerely doubt the big HFT firms use anything of Fowler’s. Their optimizations go all the way down to building their own hardware. Low latency is very context-dependent, and Amdahl’s law applies here.

JedMartin · on July 9, 2024

I have absolutely no idea how this works in Java, but in C++, there are a few reasons you need std::atomic here:

1. You need to make sure that modifying the producer/consumer position is actually atomic. This may end up being the same instruction that the compiler would use for modifying a non-atomic variable, but that will depend on your target architecture and the size of the data type. Without std::atomic, the compiler may also generate multiple instructions to implement that load/store, or use an instruction that is not atomic at the CPU level. See [1] for more information.

2. You're using the positions for synchronization between the producer and consumer. When incrementing the reader position, you're basically freeing a slot for the producer, which means that you need to make sure all reads from that slot happen before you do it. When incrementing the producer position, you're indicating that the slot is ready to be consumed, so you need to make sure that all the stores to that slot happen before that. Things may go wrong here due to reordering by the compiler or by the CPU [2], so you need to instruct both that a certain memory ordering is required. Reordering by the compiler can be prevented using a compiler-level memory barrier, which in GCC and Clang is asm volatile("" ::: "memory"). Depending on your CPU architecture, you may or may not need to add a memory barrier instruction as well to prevent reordering by the CPU at runtime. The good news is that std::atomic does all that for you if you pick the right memory ordering, and by default, it uses the strongest one (sequentially-consistent ordering). I think in this particular case you could relax the constraints a bit and use memory_order_acquire when loading the other side's position and memory_order_release when storing your own [3].

[1] https://preshing.com/20130618/atomic-vs-non-atomic-operation...

[2] https://en.wikipedia.org/wiki/Memory_ordering

[3] https://en.cppreference.com/w/cpp/atomic/memory_order
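
A minimal sketch of the acquire/release pairing from point 2, assuming a fixed-capacity single-producer/single-consumer ring buffer. The names SpscQueue, try_push, try_pop, write_pos_, read_pos_ and run_demo are hypothetical and not taken from the article's code. It needs C++17 for std::optional.

```cpp
#include <atomic>
#include <cstddef>
#include <optional>
#include <thread>

// Hypothetical SPSC ring buffer: exactly one thread calls try_push,
// exactly one other thread calls try_pop.
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer thread only.
    bool try_push(const T& value) {
        // Only the producer writes write_pos_, so a relaxed load is enough.
        const std::size_t w = write_pos_.load(std::memory_order_relaxed);
        // Acquire: the consumer's reads of a freed slot happen before we reuse it.
        const std::size_t r = read_pos_.load(std::memory_order_acquire);
        if (w - r == Capacity) return false;  // full
        slots_[w % Capacity] = value;
        // Release: the store to the slot becomes visible before the new position.
        write_pos_.store(w + 1, std::memory_order_release);
        return true;
    }

    // Consumer thread only.
    std::optional<T> try_pop() {
        // Only the consumer writes read_pos_, so a relaxed load is enough.
        const std::size_t r = read_pos_.load(std::memory_order_relaxed);
        // Acquire: pairs with the producer's release store to write_pos_.
        const std::size_t w = write_pos_.load(std::memory_order_acquire);
        if (r == w) return std::nullopt;  // empty
        T value = slots_[r % Capacity];
        // Release: our read of the slot completes before the slot is freed.
        read_pos_.store(r + 1, std::memory_order_release);
        return value;
    }

private:
    T slots_[Capacity];
    std::atomic<std::size_t> write_pos_{0};
    std::atomic<std::size_t> read_pos_{0};
};

// Demo: the producer pushes 1..n, the consumer sums everything it pops.
long long run_demo(int n) {
    SpscQueue<int, 8> q;
    long long sum = 0;
    std::thread producer([&] {
        for (int i = 1; i <= n; ++i)
            while (!q.try_push(i)) std::this_thread::yield();
    });
    std::thread consumer([&] {
        for (int received = 0; received < n;) {
            if (auto v = q.try_pop()) { sum += *v; ++received; }
            else std::this_thread::yield();
        }
    });
    producer.join();
    consumer.join();
    return sum;
}
```

Each side ends up using both orderings: it acquires the other side's position and releases its own. This version copies the element out instead of returning a pointer into the buffer, so the slot can be freed as soon as try_pop returns.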

6keZbCECT2uB · on July 9, 2024

Fowler's implementation is written in Java, which has a different memory model from C++. For another example of the Java memory model versus a different language's, see Jon Gjengset's port of ConcurrentHashMap to Rust.