Threads can infect each other with their low priority (github.com/dobiasd)
68 points by Dobiasd on Dec 10, 2019 | 35 comments



As far as I can establish with a cursory skim, the article describes priority inversion, and states:

"OS schedulers might implement different techniques to lessen the severity of this problem, but it's far from being solved once and for all."

The article fails to describe what "solved once and for all" might mean.

Given that Priority Ceiling Protocol and Priority Inheritance are two common solutions used in real-time schedulers, I would like to understand why priority inversion is not a solved problem.

On the other hand, I have yet to learn whether Mac OS has any synchronisation primitives that mitigate mutex priority inversion.


> Given that Priority Ceiling Protocol and Priority Inheritance are two common solutions used in real-time schedulers, I would like to understand why priority inversion is not a solved problem.

Because Windows and Linux are not designed for hard real time. There's a "rt_mutex" thing in Linux, which does priority inheritance, but it's not used much.

QNX took this seriously; it has priority inheritance not just for mutexes, but for interprocess messaging and I/O too. But it's a real-time OS. It passes through not only the priority, but also the CPU quantum for non-hard-real-time tasks. So a group of rapidly interacting processes is effectively scheduled as if it were one process that happens to cross address spaces.

On most non-real-time OSs, if you have a microservices-type program which is rapidly passing control back and forth between multiple processes, it will work fine as long as there's spare CPU time. If there's no spare CPU time, performance suddenly drops badly. That's because the processes being released by mutex unlocks go to the end of the line for CPU time.

This is something microkernel designers take seriously, because it has a big impact on microkernel performance. Unikernel designers, not so much.


I agree: there are operating systems that make few guarantees about protection from priority inversion, at least without specific programmer intervention. As a programmer you need to be aware that priority inversion can be a thing.

However, this has no bearing on whether or not priority inversion is a "solved problem". "Solved problem" generally means that there are well-known, widely studied technical solutions to a problem. My understanding right now is that priority inversion is a solved problem -- it is a standard topic in any introductory textbook on real-time systems, with the solutions I stated. But maybe I'm wrong -- certainly the author seems to think so -- hence my question stands: I would like to understand why priority inversion is not a solved problem.


Sorry to double-reply, but the paper I was thinking of was probably [0]

It seems very informative. Doesn't appear to be peer-reviewed, mind.

[0] http://www.cse.uaa.alaska.edu/~ssiewert/a320_doc/Yodaiken-20...


> hence my question stands: I would like to understand why priority inversion is not a solved problem.

I believe it's because although complete solutions do exist, they have prohibitive overheads.

Annoyingly I can't find the relevant paper I'm thinking of.


> I would like to understand why priority inversion is not a solved problem.

Maybe I just was not careful enough in choosing this phrasing when writing the article.

Do you think it would make sense if I change "but it's far from being solved once and for all." to "each of them with different upsides and downsides."?


Trade-offs, yes. I think I understand now that you meant that application-level priority inversion is commonly not avoided automatically by the OS -- programmers still need to worry about it. Even if theoretical solutions exist, they may not be available on your chosen platform, or they may need to be manually configured (e.g. PTHREAD_PRIO_INHERIT).
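
For example, with POSIX threads the inheritance protocol is something you have to ask for per mutex. A minimal sketch (illustrative only, not from the article):

    #include <pthread.h>

    int main() {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        // Request priority inheritance: a low-priority thread holding this mutex
        // gets boosted to the priority of the highest-priority thread blocked on it.
        pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);

        pthread_mutex_t mutex;
        pthread_mutex_init(&mutex, &attr);
        pthread_mutexattr_destroy(&attr);

        // ... use the mutex as usual ...

        pthread_mutex_destroy(&mutex);
        return 0;
    }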


Thanks. I committed a change (https://github.com/Dobiasd/articles/commit/8ac8db11eb95de688...). Feel free to open a PR if you have a better phrasing in mind. :)


> Because Windows and Linux are not designed for hard real time.

True, although RT kernels have been developed for both.

Windows actually features Autoboost (priority inheritance). Since Windows also has IO priority, when a higher-priority thread blocks on a mutex owned by a lower-priority thread, the boost applies to both the thread priority and the IO request priority of the lock owner.


> On the other hand, I have yet to learn whether Mac OS has any synchronisation primitives that mitigate mutex priority inversion.

It does! xnu supports priority inheritance through "turnstiles", a kernel-internal mechanism which is used by default by a number of locking primitives (list at [1]), including normal pthread mutexes (though not read-write locks [2]), as well as the os_unfair_lock API (via the ulock syscalls). With pthread mutexes, you can actually explicitly request priority inheritance by calling pthread_mutexattr_setprotocol [3] with PTHREAD_PRIO_INHERIT; the Apple implementation supports it, but currently ignores the protocol setting and just gives all mutexes priority inheritance.

[1] https://opensource.apple.com/source/xnu/xnu-4903.241.1/osfmk...

[2] https://opensource.apple.com/source/libpthread/libpthread-33..., search for _kwq_use_turnstile

[3] https://linux.die.net/man/3/pthread_mutexattr_setprotocol


> xnu supports priority inheritance through "turnstiles"

Do you have any information on how long this "turnstiles" feature has been available? [1] I was aware that the userspace pthread libraries implement but ignore PTHREAD_PRIO_INHERIT, but wasn't aware that the underlying implementation gave priority inheritance semantics.

Update: Looks like the "turnstiles" feature may have its origins in Solaris. [2]

[1] This link would suggest as of Mac OS 10.14: http://newosxbook.com/articles/12-10.14.html

[2] http://sunsite.uakom.sk/sunworldonline/swol-08-1999/swol-08-...


Yeah, looks like 10.14 and iOS 12. I actually hadn't realized it was so new. Before that, it seems that priority inheritance ("promotion") had long existed for kernel-internal locks [1], but not for anything in userland. It also seems that the behavior of supporting but ignoring PTHREAD_PRIO_INHERIT long predates turnstiles, which is pretty annoying.

As for origins... xnu's implementation appears to be written from scratch, but in terms of inspiration for the design and name, I'm not sure. Aside from Solaris, another possibility is FreeBSD [2] (but note that xnu's turnstiles are in the Mach side of the codebase rather than the BSD side).

[1] https://opensource.apple.com/source/xnu/xnu-1504.15.3/osfmk/...

[2] https://github.com/freebsd/freebsd/commits/master/sys/sys/tu...


Priority inheritance isn't always trivial to implement. The protocol by which a high-priority thread passes its higher priority to a lower-priority thread needs to be synchronous; otherwise both could end up executing concurrently at high priority. Or you could make the protocol asynchronous but the scheduler fair enough, dividing the time between the two threads, or perhaps have the threads swap priorities while the originally-lower-priority thread works on behalf of the other.

With asynchronous designs you also have a problem identifying when the lower-priority thread is done working on behalf of the higher-priority thread. You can work around this by only letting the worker inherit the higher priority for a small amount of time, but this is guesswork: after all, the worker itself could be using async I/O and processing many different clients' requests over any time span, so how can the scheduler know when it's working on behalf of one client or another?

If you trust the lower-priority threads, then just let them manage their own priorities according to those of the clients they are serving. Now you need something like SCM_RIGHTS/SCM_CREDS to pass priority tokens/info around.
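
A minimal sketch of that last idea, assuming the worker already runs under a realtime scheduling policy (so sched_priority values are meaningful) and each request carries its client's priority; the request layout and handle() are made-up names for illustration:

    #include <pthread.h>

    struct request {
        int client_priority;  // hypothetical field: priority of the requesting thread
        // ... payload ...
    };

    static void handle(const request&) { /* hypothetical: do the actual work */ }

    // The worker temporarily adopts the client's priority while servicing its request.
    void service(const request& req, int worker_priority) {
        if (req.client_priority > worker_priority) {
            pthread_setschedprio(pthread_self(), req.client_priority);
        }
        handle(req);
        // Drop back down once the work for this client is done -- which is exactly
        // the step that gets hard to place once the worker goes asynchronous.
        pthread_setschedprio(pthread_self(), worker_priority);
    }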

Asynchrony is essential, so I suspect that automatic priority inheritance can't quite be had.


Those solutions raise the low-priority task to the higher priority, which breaks a deadlock but doesn't fix the latency issue.

The low-priority task was low priority for a reason: it wasn't supposed to soak up CPU cycles needed for higher-priority tasks. Those solutions do just that - making a long-running background task run before low-latency, more critical tasks increases their latency (delays their response to time-sensitive events).


Grand Central Dispatch can resolve certain kinds of priority inversions involving owned locks.


Solution from the article: Don't fiddle with thread priorities.

Solution that works well in real life: Use lock-free (or, better still, wait-free) data structures.

The trick is that those of us in the devices and media world, despite not always using RT schedulers, generally need (largely) consistently prioritized performance for critical operations. For example, I actually need my device's audio buffer callback to be timely to avoid displeasing gaps.

How have whole industries managed to deliver sellable products in these spaces without resorting to RT kernels? In many cases, careful uses of thread priorities and, yes, mutexes, have been involved.
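
In the audio case the lock-free structure is often just a single-producer/single-consumer ring buffer, so the callback never has to block on a mutex held by a preempted thread. A minimal sketch (illustrative only; names and memory orders are mine, not from the article):

    #include <array>
    #include <atomic>
    #include <cstddef>
    #include <optional>

    template <typename T, std::size_t Capacity>
    class SpscRingBuffer {
    public:
        // Called by the producer thread only.
        bool push(const T& value) {
            const auto head = head_.load(std::memory_order_relaxed);
            const auto next = (head + 1) % Capacity;
            if (next == tail_.load(std::memory_order_acquire)) {
                return false;  // full: drop or retry, but never block
            }
            buffer_[head] = value;
            head_.store(next, std::memory_order_release);
            return true;
        }

        // Called by the consumer thread only (e.g. the audio callback).
        std::optional<T> pop() {
            const auto tail = tail_.load(std::memory_order_relaxed);
            if (tail == head_.load(std::memory_order_acquire)) {
                return std::nullopt;  // empty
            }
            T value = buffer_[tail];
            tail_.store((tail + 1) % Capacity, std::memory_order_release);
            return value;
        }

    private:
        std::array<T, Capacity> buffer_{};
        std::atomic<std::size_t> head_{0};  // next slot to write
        std::atomic<std::size_t> tail_{0};  // next slot to read
    };

Neither side ever waits on the other, so a preempted low-priority consumer can delay data, but it can never make the producer miss its deadline.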

This article feels like it was written by someone half-way through:

Junior engineer: Just use thread priority.

Senior engineer: No! Thread priority is a minefield.

Principal engineer: We're going to carefully walk through this minefield.


It would be nice, though, if media and games could get soft real-time guarantees so as to avoid priority inversions involving things like the Spotlight indexer.

I got dinged on a performance review once at Apple for negatively comparing the MacOS process and IO schedulers to Linux while I was working on Final Cut/QuickTime 10 years ago. Apparently "why can't we adapt their algorithm to our stuff, it's not a secret" is not constructive enough.


To be fair, the author says not to fiddle with priorities if possible.

> Conclusion: If possible, avoid headaches by just not fiddling around with thread priorities.

I often see developers using complex and potentially dangerous techniques when there are simpler options available. I imagine that's what the author is warning against.


> This article feels like it was written by someone half-way through [...]

You might be right. I'm not a principal engineer. :)

The first solution in the wild we came up with back then, when confronted with this problem, actually was to use a lock-free queue. Only later did we figure out that we did not need the prioritizing after all, and our software worked just fine with all threads having the default priority.


Jr./Sr./PE is just a strawman for illustrative purposes. No ranking aspersions intended... In order I've been SDE, Sr. SDE, Senior PE, Sr. SDE, PE... It's pretty meaningless outside of the scope of one company's system.

There are plenty of hardware/software systems out there that leverage thread priority to keep low latency operations humming along (e.g. clearing a hardware FIFO) or keep background operations out of the way (ish) when the system is heavily loaded.

The interactions between threads are often sparse and directional (e.g. producer/consumer) in a reasonably designed system. Needing to play with priority to govern the interactions across one's own internal thread boundaries can be a bad code smell. Using it to better satisfy external restrictions (e.g. user input, hardware interaction, callback handling) isn't too uncommon if you want to maximize the utility of your hardware.


> If possible, avoid headaches by just not fiddling around with thread priorities.

No, learn the ins and outs of the primitives of your OS. Threads are perfectly fine to run at different priorities, but as soon as you start communicating between them they become a chain of sorts, and if resources are exhausted (such as a queue reaching a maximum size), or if you artificially couple the threads using some kind of sync mechanism, then your priorities will not be what you want them to be.

Prioritization works well for independent threads, not for dependent threads, which are somewhat closer to co-routines that happen to use the thread scheduler rather than calling each other directly.


The priority inversion the article describes does not need an exhausted queue, merely a used one.


That's because that particular queue is not implemented lock-free. If it were, you'd have to exhaust it first.


This article seems like 3 pages of my CS Operating Systems unit summarised into an article. Since I started reading Hacker News I've always assumed the user base typically has at least a CS education.

Am I wrong for assuming that this article seems fairly low level for the user base here? It just seems to me that if you can read this article and understand push, pop, thread priority, mutex, transitive, etc., then it's more than likely that someone has already lectured at you about the issues that can arise with using mutexes for locking.


Setting aside that many here are self-taught or haven't even reached university age yet (the first class when I went to university had an interesting bimodal distribution: half the class already knew how to program well, the other half had no idea), there are other reasons why an article like this is useful:

* Some might not have learned that particular piece of knowledge, either because their particular OS class didn't cover it, or because they skipped it;

* Others might have learned it, but later forgotten all about it, or at least some of the details;

* Even if they do remember it, this article can act as a reminder ("pay attention if you set thread priorities, because priority inversion").


I would say your assumption is wrong. A lot of people have a CS background, but there are many others that don't.


Also, there are different approaches. I hardly do any low-level threading, but I'm aware of the fact that locking and context switching can be expensive. I don't think that one always has to learn theory first; it can also go the other way.


The guy who started ycomb has a philosophy degree.

I've been reading HN since 2011, and the only programming course I've taken is a (bad) intro to programming that all engineers had to take.

Maybe most of what I know about CS I've learned reading it on HN.


This is called priority inversion.

One way to deal with this is to make it so that when a low-priority thread dequeues a message from a high-priority thread, the low-priority thread temporarily inherits the client's higher priority and later goes back to its lower priority. The problem is identifying when to go back. Solaris doors was an IPC mechanism that did this well, but at the price of being synchronous -- when you throw asynchrony in, that approach doesn't work, and you really do want asynchrony. If you trust the low-priority threads enough, you can let them pick a priority according to that of the client they are servicing at any moment.


I was hoping for a cool story of OS bugs corrupting threads. Instead, as others pointed out, this is just a more verbose explanation of priority inversion.

I would still like to read the zombie injection story.



Note that this is only catastrophic when using realtime thread priorities, i.e. thread priority kinds that cause a runnable higher priority thread to run instead of a lower priority one regardless of how much time they have been running already.

On general-purpose OSes, the commonly used thread priorities are not realtime, and you need admin/root to set realtime thread priorities.
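
For example, on Linux an unprivileged process asking for a realtime (SCHED_FIFO) priority typically just gets EPERM. A minimal sketch:

    #include <cerrno>
    #include <cstdio>
    #include <pthread.h>
    #include <sched.h>

    int main() {
        sched_param param{};
        param.sched_priority = 10;  // realtime priority, only meaningful for SCHED_FIFO/SCHED_RR
        const int rc = pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
        if (rc == EPERM) {
            std::printf("realtime priority denied: needs root, CAP_SYS_NICE, or an RLIMIT_RTPRIO allowance\n");
        }
        return 0;
    }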

Also, indefinite deadlock can only happen if realtime threads want to take up more CPU than is available (which in particular requires not having more cores than realtime threads), since otherwise they will eventually all be sleeping/waiting, allowing any threads waiting on the mutex to run.


Can this be summed up as "A consumer cannot be faster than the producer"?


The infection is the other way around here: the producer thread is slowed down by the low priority of the consumer.

TL;DR for the article: the consumer has a lower priority and gets switched out while holding the mutex, but doesn't get rescheduled for a while. The producer then has to wait for the consumer to get rescheduled and unlock the mutex before it can enqueue something.
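
Roughly the shape of that scenario (a sketch, not the article's actual code): one mutex-protected queue shared by a time-critical producer and a low-priority consumer. If the consumer is preempted while holding the mutex, the producer blocks in push until the consumer gets scheduled again.

    #include <mutex>
    #include <queue>
    #include <thread>

    std::mutex m;
    std::queue<int> q;

    void producer() {  // meant to run at high priority / be time critical
        for (int i = 0;; ++i) {
            std::lock_guard<std::mutex> lock(m);  // may block on the preempted consumer
            q.push(i);
        }
    }

    void consumer() {  // meant to run at low priority in the background
        for (;;) {
            std::lock_guard<std::mutex> lock(m);  // preemption while holding this stalls the producer
            if (!q.empty()) q.pop();
        }
    }

    int main() {  // runs forever; only meant to show the locking pattern
        std::thread c(consumer);
        std::thread p(producer);
        p.join();
        c.join();
    }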


Isn't this an issue on the OS side? It should only actually lock the mutex once the scheduler returns, and not on the entrance to the function.



