The "Benaphore" was a pretty old idea, but the futex isn't just "Oh, it's a Benaphore but on Linux". The essential trick is that you don't actually need a kernel object as your "locking primitive" at all, and that idea is where this goes from "Yeah, everybody knows that" to "OK, our OS should add this feature ASAP".
Instead of an OS synchronisation object which is used to handle conflicts, with the futex design the OS carries a list of address -> thread mappings. If a thread T is asleep on a futex at address X, the address X goes in the list pointing to thread T. When the OS is asked to wake the X futex, it walks the list and wakes T.
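To make that concrete, here's a toy model of the bookkeeping in Python. It's a sketch, not how any real kernel does it: `ToyFutexTable`, `memory` and the addresses are names I made up, threads are just strings, and the "only sleep if the word still holds the expected value" check is modelled on how Linux's FUTEX_WAIT behaves.

```python
from collections import defaultdict, deque


class ToyFutexTable:
    """Toy model of the OS side: no per-lock object, only address -> sleepers."""

    def __init__(self):
        # address X -> queue of threads asleep on X (hypothetical structure)
        self.waiters = defaultdict(deque)

    def wait(self, memory, addr, expected, thread):
        # Only go to sleep if the word still holds the value the caller saw.
        if memory[addr] != expected:
            return False
        self.waiters[addr].append(thread)
        return True

    def wake(self, addr, count=1):
        # Walk the sleepers recorded for this address and wake up to `count`.
        woken = []
        queue = self.waiters.get(addr)
        while queue and len(woken) < count:
            woken.append(queue.popleft())
        if queue is not None and not queue:
            # Nobody is asleep on X any more, so no OS state remains for it.
            del self.waiters[addr]
        return woken


# The futex words themselves are plain user memory, modelled here as a dict.
memory = {0x1000: 1, 0x2000: 0}
table = ToyFutexTable()

slept = table.wait(memory, 0x1000, expected=1, thread="T")    # T sleeps on X
refused = table.wait(memory, 0x2000, expected=1, thread="U")  # value changed
woken = table.wake(0x1000)                                    # wake the X futex
```

Notice the table only has an entry while somebody is actually asleep on that address. An uncontended futex costs the OS nothing.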
The giveaway is the limits. With something like Benaphores you're constrained: these are a costly OS-wide resource, and I think BeOS only allowed 65536 per machine or something. But a futex is just memory, so there's no reason to have any limit at all.