> 2. Lockfree can be faster -- This is the case most people hope for.
It's not about being faster; it's about never returning control to the OS if you want to achieve soft real-time in one of your threads (a common application being audio processing, where you have to consistently fit as much work as you can into every 1.33 milliseconds for the more demanding users).
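For what it's worth, that ~1.33 ms figure is consistent with one common setup (my assumption, not stated by the parent): a 64-sample audio buffer at a 48 kHz sample rate.

```cpp
// Assumed configuration (my numbers, not the parent's): a 64-sample
// audio buffer at a 48 kHz sample rate.
constexpr double kSamples = 64.0;
constexpr double kRateHz = 48000.0;
// 64 / 48000 s = ~1.333 ms: the deadline the audio thread must hit
// on every single buffer, which is why an unbounded kernel wait is
// unacceptable on that thread.
constexpr double kDeadlineMs = kSamples / kRateHz * 1000.0;
```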
Mutexes act as spinlocks for roughly the first ~1000 cycles on both Linux and Windows. If you're in and out before those ~1000 cycles are up, the typical mutex (be it a pthread_mutex or a Windows CriticalSection) will never hit kernel code.
If you take more than ~1000 cycles, then yes, the OS takes over as a fallback heuristic. But most code guarded by a spinlock won't take more than ~50 cycles.
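A rough sketch of that spin-then-fall-back behavior (illustrative only: the 1000-iteration threshold is just the heuristic mentioned above, and `std::this_thread::yield()` stands in for the kernel-level wait a real adaptive mutex performs):

```cpp
#include <atomic>
#include <thread>

// Sketch of a hybrid "adaptive" lock: spin briefly in user space,
// then fall back to yielding to the scheduler. Real mutexes use a
// proper kernel wait (futex / WaitOnAddress), not yield.
class HybridLock {
    std::atomic<bool> held{false};
public:
    void lock() {
        // Fast path: bounded spinning, never enters the kernel.
        for (int i = 0; i < 1000; ++i) {
            bool expected = false;
            if (held.compare_exchange_weak(expected, true,
                                           std::memory_order_acquire))
                return;
        }
        // Slow path: give the CPU back between attempts.
        bool expected = false;
        while (!held.compare_exchange_weak(expected, true,
                                           std::memory_order_acquire)) {
            expected = false;  // CAS overwrote it with the current value
            std::this_thread::yield();
        }
    }
    void unlock() { held.store(false, std::memory_order_release); }
};
```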
What can the OS do for you if your consuming thread has, for example, taken a lock and is now stalled on a page fault, triggering IO from a spinning disk that could take seconds to finish? Nothing that I'm aware of - you just have to wait until it runs again and releases the lock.
That's why people write lock-free - they don't want a stall or the death of one thread to stall the other threads. Deadlock, livelock, priority inversion.
Read the Wikipedia page on non-blocking algorithms for a starting point on the motivations for lock-free.
I mean, what happens if the writer thread stalls out in the lock free case? The reader thread runs out of things to do and is also going to be scheduled out (eventually).
Like, that's the OS's job. To remove tasks from the "runnable" state when they stall out.
> That's why people write lock-free - they don't want a stall or death of the one thread to stall other threads.
Erm, if you're saying that the reader thread will be highly utilized... that's not likely the case if the writer thread in this "lock free" implementation stalls out.
Let's say the writer thread hits a blocking call and waits on the hard drive for ~10 milliseconds. What do you think will happen to the reader thread in the "lock-free" implementation discussed in this blog post?
> I mean, what happens if the writer thread stalls out in the lock free case? The reader thread runs out of things to do and is also going to be scheduled out (eventually).
Yeah it keeps running, exactly as you say! It can keep working on jobs already on the queue independently.
Are you arguing that you personally don't think it's worth doing lock-free for various practical reasons? Or arguing that's never why anyone does it?
Because the latter is simply false - I've done it for exactly those reasons professionally, both in academia and in industry, so there's one data point for you, and the literature discusses this use-case extensively, so I know other people do as well.
> Are you arguing that you personally don't think it's worth doing lock-free for various practical reasons? Or arguing that's never why anyone does it?
No. Lock-free is great. When it is done correctly.
The code in this blog post was NOT done correctly. It only works in the case of 1-reader + 1-writer, so the "advantages" you're talking about are extremely limited.
You have to keep the overall big picture in mind. Lock-free code is incredibly difficult to write and usually comes at great cost in complexity. However... sometimes lock-free makes life easier (okay, use it in that case). Sometimes lock-free can be faster (okay... use it in that case... if speed is important to you).
But in many, many cases, locked data-structures are just easier to write, test, and debug.
Yeah, but the "queue-of-queues" is a more traditional structure, familiar to people who work with lock-free algorithms.
I don't have time to do a serious code review of everything... but immediately upon looking at that code, it's clearly written to a higher standard: acquire/release fences are used in (seemingly) the correct spots, just as an example.
Like, the real issue I have here is the quality of the code demonstrated in the blog post. A lock-free queue-of-queues implementation makes sense in my brain. What's going on in the blog post ultimately doesn't.
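For contrast, here's roughly what a correctly-fenced 1-reader + 1-writer (SPSC) ring buffer looks like - a sketch of the general technique, not the blog's code or the queue-of-queues code, with the acquire/release pairs in the conventional spots:

```cpp
#include <atomic>
#include <cstddef>

// Sketch of a single-producer / single-consumer ring buffer.
// Correct ONLY for exactly one reader thread and one writer thread.
// Capacity is N-1 (one slot is sacrificed to distinguish full/empty).
template <typename T, size_t N>
class SpscQueue {
    T buf[N];
    std::atomic<size_t> head{0};  // advanced only by the consumer
    std::atomic<size_t> tail{0};  // advanced only by the producer
public:
    bool push(const T& v) {  // producer thread only
        size_t t = tail.load(std::memory_order_relaxed);
        size_t next = (t + 1) % N;
        if (next == head.load(std::memory_order_acquire))
            return false;  // full
        buf[t] = v;
        // Release: publishes the write to buf[t] to the consumer.
        tail.store(next, std::memory_order_release);
        return true;
    }
    bool pop(T& out) {  // consumer thread only
        size_t h = head.load(std::memory_order_relaxed);
        // Acquire: pairs with the producer's release store to tail.
        if (h == tail.load(std::memory_order_acquire))
            return false;  // empty
        out = buf[h];
        // Release: tells the producer the slot is free for reuse.
        head.store((h + 1) % N, std::memory_order_release);
        return true;
    }
};
```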
> Erm, if you're saying that the reader thread will be highly utilized... that's not likely the case if the writer thread in this "lock free" implementation stalls out.
That does not mean the reader thread has nothing to do. In the most common case, the reader thread produces an output and the writer thread sends messages that change how that output is generated - and even if there's no message, the reader thread has to keep going at a steady rate.
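A minimal sketch of that pattern (illustrative names, assuming a single writer and a single reader): the reader applies a control message if one happens to be available and otherwise keeps producing output at its steady rate - it never blocks on the writer.

```cpp
#include <atomic>

// Trivial single-slot message box, safe for exactly one writer
// thread and one reader thread (illustrative, not the blog's code).
struct Mailbox {
    std::atomic<bool> full{false};
    float value = 1.0f;
    bool post(float v) {  // writer thread
        if (full.load(std::memory_order_acquire)) return false;
        value = v;
        full.store(true, std::memory_order_release);  // publish
        return true;
    }
    bool take(float& v) {  // reader thread
        if (!full.load(std::memory_order_acquire)) return false;
        v = value;
        full.store(false, std::memory_order_release);  // slot free again
        return true;
    }
};

// One processing cycle on the reader thread: non-blocking. If the
// writer is stalled, we simply see no message and keep going with
// the current parameters.
void process_block(Mailbox& msgs, float& gain, float* out, int n) {
    float g;
    if (msgs.take(g)) gain = g;                  // apply message if any
    for (int i = 0; i < n; ++i) out[i] *= gain;  // steady-rate output
}
```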