One thing I was surprised to learn about spinlocks is that not all spinlocks are spinlocks! At at least one company I have worked for, spinlocks are actually just locks that are biased to spin, but still fall back to kernel-based locking and scheduling eventually. Likewise, that company's main mutex class spins for a while before locking.
When I asked why, the rationale was something like, "If a spinlock is ever held for more than such-and-such a number of microseconds, it's an error that it's a spinlock and it should really be a mutex." Instead of degenerating in that case, they just made the spinlock actually act like a mutex.
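The general shape is easy to sketch. This is not their actual code, just a minimal illustration of a lock that spins on try_lock for a bounded number of iterations and then falls back to a blocking kernel-backed lock; the spin budget of 4000 is arbitrary, and the pause hint assumes x86 with GCC or Clang:

    #include <mutex>

    // A lock biased to spin but backed by a real mutex. Sketch only.
    class SpinThenBlockLock {
        std::mutex m_;
    public:
        void lock() {
            for (int i = 0; i < 4000; ++i) {   // arbitrary spin budget
                if (m_.try_lock())
                    return;                    // got it while spinning
    #if defined(__x86_64__) || defined(__i386__)
                __builtin_ia32_pause();        // be polite to the sibling hyperthread
    #endif
            }
            m_.lock();                         // give up and let the kernel block us
        }
        void unlock() { m_.unlock(); }
    };

In a true spinlock that final m_.lock() wouldn't exist; the fallback is exactly the "just make it act like a mutex" behavior described above.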
True spinlocks in user space are rarely a good idea. You would use a spinlock in the first place because you assume the critical section is short, but the kernel often gets in the way of that assumption: the thread in the critical section can run out of its time slice, an interrupt can come in, and so on.
And once this happens, you now have a descheduled thread that owns the lock and a spinning thread waiting to acquire it, which is a recipe for priority inversion. As far as the kernel can tell, it’s the spinning thread that’s doing some important work so it has no reason to preempt it and schedule the waiting thread, which is exactly the wrong thing to do in this situation.
Spins are so short and well-defined that I wonder why the kernel doesn't just examine a handful of instructions around the program counter and guess if the thread is spinning or not before scheduling it for another time slice. Seems like a really nice and simple kind of heuristic to implement.
Your idea works well for other kinds of more abstract critical sections too, such as code regions where a garbage collection should be temporarily deferred.
Or in embedded systems where code is running at supervisor level, regions where an interrupt should be deferred, thereby allowing interrupt-disable and interrupt-enable instructions to be avoided.
An alternative which is reasonably fast on modern CPUs is to have a spinlock implementation in the kernel VDSO (or equivalent).
Something a bit like that was done in uCLinux for old ARMs, except instead of a spinlock, they put atomic compare-and-swap in the VDSO, as the CPU didn't have an instruction for that. The kernel checked the userspace address on interrupts, and altered the PC register to redo the compare if the sequence was interrupted.
So that didn't extend the timeslice to allow the operation to complete. Instead it unwound the operation to ensure correct semantics. With a spinlock/mutex combination, cutting short the spinning phase when the task is descheduled by the kernel would be similar, and would allow the "normal" spin count to be unlimited.
Oh jeeze, I don't think there'd be anything simple about having to divine facts about user space from the control flow graph near the interrupted instruction on basically every thread switch.
You don't need to worry about the CFG here, a spinlock is literally just a ~4-instruction loop (or a few more, depending on the form you use). All you need is to handle a few common codegen patterns. Like if you see the instruction pointer in the middle of a mov + xchg + test + jne sequence then you know it's a spinlock. If you don't detect it in some canonical form then you're back where we are now; whatever. It's not complicated.
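For reference, here's roughly what that canonical loop looks like at the source level; a std::atomic_flag test-and-set loop like this typically compiles on x86 to the xchg/test/jne pattern mentioned above, though the exact codegen depends on the compiler:

    #include <atomic>

    std::atomic_flag locked = ATOMIC_FLAG_INIT;

    void spin_lock() {
        // the entire contended path is this handful of instructions
        while (locked.test_and_set(std::memory_order_acquire)) {
            // busy-wait
        }
    }

    void spin_unlock() {
        locked.clear(std::memory_order_release);
    }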
I'm not sure which issue you're referring to, but the one I was trying to address was the one above about the spinning itself causing a priority inversion: "you now have a descheduled thread that owns the lock and a spinning thread waiting to acquire it, which is a recipe for priority inversion."
A) it's not just a priority inversion. The problem happens with simple round robin schedulers too.
B) The problem is the descheduled thread which has the lock and isn't in the lock sequence. Simply killing the time slice of the spinning thread without knowing what it's waiting on may lead to even worse behavior.
The answer here is simple: just don't use pure spinlocks if you can be preempted.
> Simply killing the time slice of the spinning thread without knowing what it's waiting on may lead to even worse behavior.
It's not obvious to me how likely this is compared to the other case, but if you're trying to make a case for why a kernel doing that would result in worse behavior in typical cases, it would probably help to explain this.
> The answer here is simple: just don't use pure spinlocks if you can be preempted.
I don't get why you're arguing with me on this. I wasn't telling anyone to use spinlocks, nor claiming this is the one and only problem with spinlocks. I was just saying a kernel could be a little smarter about a particular case by examining the instruction sequence.
Your idea doesn't solve the problem. By the time the lock acquirer is pre-empted and the kernel has a chance to do something clever, a significant fraction of a timeslice has already been wasted.
Actually, many mutex implementations first spin in a loop and only then enter a wait state. One prominent example is Windows' CRITICAL_SECTION (where the spin count is even configurable). However, I would never call it a spinlock. A mutex might contain a spinlock but not vice versa. After all, the defining property of a spinlock is that it never goes to sleep. One use case outside kernel programming is protecting shared resources in (soft) real-time applications where the use of mutexes is forbidden (e.g. real-time audio programming). Often you can use atomic variables or lock-free queues instead, but sometimes a spinlock is the only practical solution.
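For the curious, configuring the spin count looks like this (a small sketch against the Win32 API; the count of 4000 is just an example value, tune it to your contention profile):

    #include <windows.h>

    CRITICAL_SECTION cs;

    void init() {
        // Spin up to 4000 times before falling back to a kernel wait.
        InitializeCriticalSectionAndSpinCount(&cs, 4000);
    }

    void do_work() {
        EnterCriticalSection(&cs);
        // ... touch the shared state ...
        LeaveCriticalSection(&cs);
    }

If I remember the docs correctly, the spin count is ignored on single-processor systems, which makes sense: there's no point spinning for a lock holder that can't currently be running.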
FreeBSD has "adaptive locks" in the kernel, but the logic works the other way: A sleep mutex might spin for a while before sleeping. Going the other way wouldn't work because there's some contexts (e.g. in a lightweight interrupt handler) where you can't sleep.
What you describe isn't a spinlock then. It's a regular blocking IPC primitive with an optimization layer that can busy-wait a bit.
As the term is pervasively used in industry, "spinlocks" always imply locking against other physical hardware, and that can't be done without spinning (I mean, you could fall back to blocking, but then the blocking implementation in the kernel would have to lock its own data structures to effect the scheduling; it's a chicken and egg problem). They usually, but not quite always, imply protection against asynchronous interrupts or signals too.
Yes, but it's not so uncommon for "spinlocks" to evolve into not-actually-spinlocks while keeping the name. In the Linux kernel, `spinlock_t` is actually a mutex if `PREEMPT_RT` is enabled [1]. And Darwin has some code to do this in userspace [2], although I'm not sure if it's actually used.
You're referring to a Linux Futex, which is a user-space construct.
In kernel-level code, for areas of the kernel that must use real spinlocks, they do not fall back to the equivalent of a Futex, because that would mean the current thread of execution would be swapped out, and kernel-level spinlocks are designed for critical sections that cannot be swapped out. In critical sections of the kernel that can be swapped out, you'd use a Mutex, which does the equivalent of a user-level Futex (and swaps out) when the Mutex is contended.
One thing I don’t get is how blocking is implemented once you have basic mutual exclusion working. Let's say you try to take the lock, but it's held, so you spin waiting for it. But say in your OS you now want the thread to go to sleep until it's available... how is that actually implemented without spinning and using CPU?
>But say in your OS you now want the thread to go to sleep until it's available... how is that actually implemented without spinning and using CPU?
If you're facing such low-latency constraints that you need a spinlock, you wouldn't want the thread to sleep; instead you'd pin a dedicated core and have it constantly spinning. That's how it's done in HFT. You can use a fancy network card like SolarFlare with efvi_drivers to bypass the kernel completely for network IO (spin-waiting on the network card). Basically you need to treat the kernel as hostile and do everything you can to keep it away from your low-latency threads.
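Stripped of the kernel-bypass NIC part, the pattern looks something like this (Linux-specific sketch; the core number and the work_ready flag are placeholders for whatever your event source really is):

    #ifndef _GNU_SOURCE
    #define _GNU_SOURCE 1   // for CPU_SET and pthread_setaffinity_np
    #endif
    #include <atomic>
    #include <pthread.h>
    #include <sched.h>

    std::atomic<bool> work_ready{false};     // stand-in for "data arrived"

    void pin_to_core(int core) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

    void poll_loop() {
        pin_to_core(3);                      // an isolated core, chosen arbitrarily
        for (;;) {
            while (!work_ready.load(std::memory_order_acquire)) {
                // spin: never sleep, never hand the core back to the scheduler
            }
            work_ready.store(false, std::memory_order_relaxed);
            // ... handle the event ...
        }
    }

In practice you'd also isolate that core from the scheduler (isolcpus, cpusets) and move IRQs off it, so the kernel really does stay out of the way.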
To have a thread block on a lock, you need to keep track of the fact that the thread is waiting on that lock, and then when the lock is unlocked, wake that thread (or at least one thread that's blocked on it).
That could be a wait queue, or it could be keeping track of what lock (if any) a thread is blocked on and iterating all threads to see if anything is blocked (this is relatively easy to implement, but not great if you have many threads), or ???. If you're running SMP or a re-entrant kernel, you need to be careful about how you do your checks so you don't have race conditions; you don't want a thread blocked on an unlocked lock.
The OS schedules threads that it wants to run. When you consider which thread to schedule, don't schedule a thread that is waiting for a lock that isn't free yet.
Google for how the Linux Futex works. In short it's done by task switching after double checking the spinlock is still held. The kernel can do such a thing without the threat of a race condition because it can ensure no other task is running that would modify the spinlock state.
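A stripped-down version of that idea, assuming Linux and glossing over the usual contended/uncontended optimizations, looks roughly like this:

    #include <atomic>
    #include <linux/futex.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    // 0 = unlocked, 1 = locked. Simplified: a real implementation tracks a
    // separate "contended" state so unlock() only calls FUTEX_WAKE when
    // someone is actually waiting.
    std::atomic<int> state{0};

    static long futex(std::atomic<int>* addr, int op, int val) {
        // Relies on std::atomic<int> having the same layout as int (true on Linux).
        return syscall(SYS_futex, reinterpret_cast<int*>(addr), op, val,
                       nullptr, nullptr, 0);
    }

    void lock() {
        int expected = 0;
        while (!state.compare_exchange_weak(expected, 1, std::memory_order_acquire)) {
            // The lock is (probably) held. Ask the kernel to put us to sleep,
            // but only if the word is still 1 -- this re-check closes the race
            // between "I saw it locked" and "I went to sleep".
            futex(&state, FUTEX_WAIT, 1);
            expected = 0;
        }
    }

    void unlock() {
        state.store(0, std::memory_order_release);
        futex(&state, FUTEX_WAKE, 1);        // wake at most one sleeper
    }

The key property is exactly what the parent describes: FUTEX_WAIT re-checks the value inside the kernel before sleeping, so a waiter can never block on a lock that has just been released.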
The kernel knows about mutexes, and so the thread waiting on the lock won't be scheduled again until the mutex has been released. Whether the lock can be acquired depends on the scheduler and how many other threads are waiting on the same lock.