
> So ok() only has to check if threads have a reference to <oldval> that they are still using.

Right. The problem with doing this while other threads are running is that ok() can return true, correctly so for its point in time, and then immediately after ok() returns another thread could enter objc_msgSend() while you are inside free(). Maybe ok() ran on thread A while thread B was sitting right at unrelated function foo()'s "call objc_msgSend" instruction. The check is OK at a point in time, but by the time you enter free(), thread B may have done an unsafe read.
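
A sketch of that window (hypothetical names from this thread, not the real runtime symbols):

    /* Thread A (the writer): */
    if (ok()) {               /* true at this instant: no thread's pc  */
                              /* is inside the BAD range...            */
        /* ...but thread B, sitting at foo()'s "call objc_msgSend"     */
        /* instruction, passes the check, enters objc_msgSend here,    */
        /* and loads <oldval>...                                       */
        free(oldval);         /* ...so this free races with B's read   */
    }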

> A thread that wasn't in BAD enters BAD. But that will use <newval>, so that's fine too.

You can't guarantee that it will see <newval>. Whether or not it does depends on timing: for example, maybe the thread that calls ok() takes a page fault, or is running on a very busy CPU with lots of preemption happening. That shifts the timing in the direction of this being unsafe. Cache coherence may also be an issue here.




You have to read and write cache atomically (which you can do). So then after

    cache = <newval>
...any future reads of cache will use <newval> (or <newer-val>, but not <oldval>).

That's what makes the trick work w/o having to pause threads.
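
In C11 terms (a sketch, not the runtime's actual code), that atomic swap might look like:

    #include <stdatomic.h>

    static _Atomic(void *) cache;   /* hypothetical shared cache pointer */

    /* Writer: atomically publish <newval>, getting <oldval> back to
       free later. atomic_exchange defaults to seq_cst ordering. */
    void *swap_cache(void *newval) {
        return atomic_exchange(&cache, newval);
    }

    /* Reader (what objc_msgSend would do): never a torn value -- it
       sees <oldval>, <newval>, or something newer. */
    void *read_cache(void) {
        return atomic_load(&cache);
    }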


> cache = <newval>

As written, that is not a guarantee that all cores will see <newval> immediately. It's very CPU-specific, but you may need memory fences to achieve this.
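
In C11 terms (again a sketch, standing in for the CPU-specific details), the difference is roughly:

    #include <stdatomic.h>

    static _Atomic(void *) cache;

    void publish(void *newval) {
        /* Release ordering: everything written into *newval becomes
           visible to any reader that observes the new pointer. The
           plain "cache = <newval>" store promises none of this. */
        atomic_store_explicit(&cache, newval, memory_order_release);
    }

    void *lookup(void) {
        /* Without acquire (or consume) ordering here, a reader may,
           in the abstract C11 model, see the pointer before the data
           it points to. */
        return atomic_load_explicit(&cache, memory_order_acquire);
    }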

Further, in my opinion it's kind of playing with fire.

Edit: Also, it is my impression reading the article that <oldval> is actually a shared list of old caches to be freed (gOldCachesList). That makes it a lot more complicated than your example snippet and leaves more potential for nasty synchronization problems.


Strictly speaking you're right. However it's not necessary that other cores see the new store immediately: it's only necessary that they see it before the memory is freed, which occurs after a substantial delay sufficient to prevent these out-of-order issues on other CPUs. This is the technique by which memory barriers are elided on the read side (but not the write side).
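
That's essentially the RCU-style bargain: the write side pays for the ordering and for a grace period, so the read side can be a bare load. A sketch, with a hypothetical wait_for_grace_period() standing in for the runtime's pc-scanning check:

    #include <stdatomic.h>
    #include <stdlib.h>

    static _Atomic(void *) cache;

    void wait_for_grace_period(void);   /* hypothetical */

    /* Read side: a plain load, no barrier at all. */
    void *lookup(void) {
        return atomic_load_explicit(&cache, memory_order_relaxed);
    }

    /* Write side: publish, then wait until no reader can still be
       using <oldval> -- long enough for every core to have seen the
       new pointer. Only then is the free safe. */
    void replace(void *newval) {
        void *oldval = atomic_exchange(&cache, newval);
        wait_for_grace_period();
        free(oldval);
    }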


Yes, you kind of do need the read barrier, otherwise you get exactly the problem I described with the stale cache.

You can't "kinda" have synchronization just as a design for a mutex can't be a "kinda" lock... You may get away with it for a while, but like I said, playing with fire.


This is both clever and terrifying :). How do they prevent the compiler from scheduling the load early, observing, and using the stale cache pointer? Do they at least have a compiler fence on the read side?
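
For reference, a compiler-only fence on the read side would look something like this in C (hypothetical names; it constrains the optimizer but emits no barrier instruction):

    #include <stdatomic.h>

    extern void *cache_slot;    /* hypothetical */

    void *read_side(void) {
        /* Forbids the compiler from hoisting or sinking memory
           accesses across this point; no CPU barrier is emitted.
           GCC/Clang spelling: __asm__ __volatile__("" ::: "memory"); */
        atomic_signal_fence(memory_order_seq_cst);
        return cache_slot;
    }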


The read-side code in question is hand-written assembly, isn't it? And on the hardware side, observing the pc will probably result in an actual memory barrier happening anyway (even if no signal is sent internally, the kernel must still suspend the thread to capture its register state, and then it needs a barrier to communicate that result back to the asking thread).


Yes, perhaps that part is in the hand-written assembly portion that's mentioned.


Nothing prevents using a stale cache pointer. The stale cache still contains correct data, so using it is benign, up until it gets deallocated.

A stale cache pointer may become wrong if a method implementation is replaced, e.g. by a category, but replacing a method while calling it in another thread is inherently racy.


By stale I meant holding a ref to a cache about to be deleted.


Yes, obviously you may need barriers or fences depending on the exact architectural details, and as another commenter mentioned you may be able to elide the read barrier if you know what is going on.

This is all about optimizing things so the read side doesn't have to do any synchronization. The actual implementation takes locks in the functions that mutate the cache; it is just done in such a way that the read side always sees consistent data without taking a lock. Since the write side holds locks, safely queueing stuff for deletion is actually pretty trivial.
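
A sketch of that shape (hypothetical names standing in for the runtime's lock and gOldCachesList):

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdlib.h>

    static pthread_mutex_t write_lock = PTHREAD_MUTEX_INITIALIZER;
    static _Atomic(void *) cache;
    static void *old_caches[64];          /* gOldCachesList analogue */
    static int   n_old;

    int no_thread_pc_in_bad_range(void);  /* hypothetical ok() analogue */

    void mutate_cache(void *newval) {
        pthread_mutex_lock(&write_lock);      /* writers serialize...   */
        old_caches[n_old++] = atomic_exchange(&cache, newval);
        if (no_thread_pc_in_bad_range())      /* ...so queueing and     */
            while (n_old > 0)                 /* reaping old caches is  */
                free(old_caches[--n_old]);    /* trivially safe         */
        pthread_mutex_unlock(&write_lock);    /* readers never lock     */
    }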

As for playing with fire... the core dispatch routine of your runtime is exactly the place you want to do complicated things that exploit intimate knowledge of your architecture. It is a small piece of code that is incredibly hot and will get a lot of scrutiny.


Indeed. One nitpick though: you can't get around pausing the thread briefly or else waiting for it to hit an interrupt, because there's no way to observe another CPU core's registers without its cooperation.


True, I should have said it is not concurrently stopping all the threads. If you actually need the register state of a thread that is currently running on another processor, it may be stopped to get it. (If you ask for the state of a thread that is not currently executing, you may get its saved register state out of the kernel without having to pause its execution, though being able to tell what is and is not actually running on other CPUs at that granularity is not really feasible.)
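
On Darwin that machinery is the Mach thread-state interface; roughly (x86_64 flavor, error handling elided -- a reconstruction, not the runtime's actual code):

    #include <mach/mach.h>

    /* Read another thread's program counter. Suspending first keeps
       the snapshot stable; the kernel stops the thread briefly if it
       is running on another core. */
    uint64_t thread_pc(thread_act_t thread) {
        x86_thread_state64_t state;
        mach_msg_type_number_t count = x86_THREAD_STATE64_COUNT;

        thread_suspend(thread);
        thread_get_state(thread, x86_THREAD_STATE64,
                         (thread_state_t)&state, &count);
        thread_resume(thread);

        return state.__rip;
    }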


As written, it's pseudo-code. ;) Yup, it's tricky and platform-specific, but the point is it can be done (or at least that's my understanding -- I do not have first-hand knowledge).



