> (It might appear that a queue using a lock can also be stall-free, since the amount of work it does while locked is bounded; however that fails once the thread holding the lock is preempted by either a context switch or a hardware interrupt. To really avoid stalls in that case, you have to either block preemption and interrupts, or perhaps you might be able to make it work with some priority inheritance scheme. And yeah, I know that the hardware can "stall" while reading or updating a shared variable, but the amount of work the hardware does in that case is bounded.)
If you absolutely need non-blocking locks, then Linux has pthread_mutex_trylock() and Windows has TryEnterCriticalSection(). (Remember: on the uncontended fast path, Linux pthread_mutexes and Windows CriticalSections both behave like spinlocks; they only fall back into the kernel under contention.)
> That is, if you allow data to be discarded when the queue is full, a lock-free queue can also be stall-free.
And in practice, how is that different from a locked queue that simply discards data when pthread_mutex_trylock() fails? In both the locked and the lock-free implementation, you are assuming that the programmer can live with discarded data (or at the very least: that nonblocking matters more than the discarded data).
It's trivial to write an implementation with pthread_mutex_trylock() that is stall-free when you can simply discard data.
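To make that concrete, here's a minimal sketch of what I mean (my own naming and fixed capacity, purely illustrative): a push that never blocks, and drops the item either when the lock is busy or when the queue is full.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical bounded queue -- names and capacity are illustrative. */
#define QCAP 1024

typedef struct {
    pthread_mutex_t lock;
    int items[QCAP];
    int count;
} locked_queue;

/* Stall-free push: never waits. Drops the item when the lock is
 * contended OR when the queue is full. Returns true if enqueued. */
bool try_push(locked_queue *q, int item) {
    if (pthread_mutex_trylock(&q->lock) != 0)
        return false;              /* lock busy: discard rather than stall */
    bool ok = false;
    if (q->count < QCAP) {
        q->items[q->count++] = item;
        ok = true;
    }                              /* queue full: also discard */
    pthread_mutex_unlock(&q->lock);
    return ok;
}
```

The bounded critical section is the whole point: the only way this push "waits" is the bounded hardware-level work inside trylock itself.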
> And in practice, how is that different from a locked queue that simply discards data when pthread_mutex_trylock() fails?
The difference is that it only discards data when the queue is full. If you discard data when a trylock fails, you'll randomly discard data just because the other side happened to be looking at the queue at that moment. In the common situation where the consumer is being fast enough to always drain the queue before it fills, it's the difference between not discarding any data versus discarding an unpredictable subset of the data.
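The distinction shows up directly in the failure condition of the push. In a single-producer/single-consumer ring (a generic sketch using C11 atomics, not the blogpost's code), the push can only fail on a genuinely full ring; there is no code path where a concurrent read by the consumer causes a drop:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative SPSC ring: push fails ONLY when the ring is full, never
 * because the consumer happens to be reading at the same moment. */
#define RCAP 1024  /* power of two, so unsigned wraparound stays correct */

typedef struct {
    _Atomic unsigned head;   /* written only by the consumer */
    _Atomic unsigned tail;   /* written only by the producer */
    int items[RCAP];
} spsc_ring;

bool ring_push(spsc_ring *r, int item) {
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RCAP)
        return false;        /* full: the ONLY case where data is dropped */
    r->items[tail % RCAP] = item;
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

bool ring_pop(spsc_ring *r, int *out) {
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (tail == head)
        return false;        /* empty */
    *out = r->items[head % RCAP];
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}
```

Compare that with the trylock version, where a push can fail merely because the consumer holds the lock for its (bounded) critical section.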
> In the common situation where the consumer is being fast enough to always drain the queue before it fills
There is literally no application I'm aware of where you can hold this assumption, if only because of the existence of swap memory and the way OSes page tasks in and out.
If you accept data-loss in the case of "poor performance on the consumer", you are tacitly accepting data-loss whenever that thread hits the "wrong" memory address that happened to have been swapped to disk, and now has to wait 10ms+ before that task can continue.
It's effectively random data loss from the perspective of the user.
------------
EDIT: I had a tangent here, but I probably shouldn't try to write a lock-free queue in one try in a stream of consciousness post. I just deleted it all, I probably had a subtle bug somewhere anyway.
------------
Seriously: look at the code in the blogpost. I'm not 100% convinced it's bug-free. The consumer thread is still interacting with the "write" and "watermark" variables, and those two variables are NOT being updated atomically as a pair by the producer thread.
I think they manage to squeak by due to SeqCst, but it's a very difficult "proof". I'm not entirely sure I can "prove" that the blogpost's code is correct.
But hey, if you think the code is fine, convince me why the following is NOT a bug.
==== In the "producer" thread ====
buffer.write.store(buffer.write.load() + write_len)
Mind you, I don't think it's a bug. But it's really difficult for me to prove it. So I'm curious: why are you confident in their methodology when the above is such a difficult line to grok? And there's not a single compare-and-swap mentioned in the entire blogpost (so they're definitely not using the "standard" methodology for lock-free code).
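For what it's worth, the usual argument for why that line is safe in a single-producer design goes like this: "write" has exactly one writer, so the non-atomic load-modify-store sequence can't race with another store; the atomic store only has to publish the new value to the consumer. A minimal sketch of that single-writer pattern (my naming, not the blogpost's):

```c
#include <stdatomic.h>
#include <stddef.h>

/* Single-writer cursor: only the producer thread ever calls advance(),
 * so the separate load + add + store below is not a racy read-modify-write:
 * no other thread stores to `write`. The consumer only loads it. */
typedef struct {
    _Atomic size_t write;
} producer_cursor;

void advance(producer_cursor *c, size_t write_len) {
    /* Same shape as buffer.write.store(buffer.write.load() + write_len):
     * correct ONLY under the single-producer invariant. Break that
     * invariant and you need an atomic fetch_add or a CAS loop instead. */
    size_t w = atomic_load_explicit(&c->write, memory_order_relaxed);
    atomic_store_explicit(&c->write, w + write_len, memory_order_release);
}
```

That's the informal argument; whether the blogpost's combination of "write" and "watermark" updates preserves the invariants the consumer needs is exactly the part I find hard to prove.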