
> So ok() only has to check if threads have a reference to <oldval> that they are still using.

Right. The problem with doing this while other threads are running is that ok() can return true, correctly so for its point in time, and then immediately after ok() returns another thread could enter objc_msgSend() while you are inside free(). Maybe ok() ran on thread A while thread B was sitting right at unrelated function foo()'s "call objc_msgSend" instruction. The check is OK at a point in time, but by the time you enter free(), thread B may have done an unsafe read.
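
A sketch of that window (hypothetical names from this thread, not the real runtime symbols):

    /* Thread A (the writer): */
    if (ok()) {               /* true at this instant: no thread's pc  */
                              /* is inside the BAD range...            */
        /* ...but thread B, sitting at foo()'s "call objc_msgSend"     */
        /* instruction, passes the check, enters objc_msgSend here,    */
        /* and loads <oldval>...                                       */
        free(oldval);         /* ...so this free races with B's read   */
    }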

> A thread that wasn't in BAD enters BAD. But that will use <newval>, so that's fine too.

You can't guarantee that it will see <newval>. Whether or not it does depends on timing: for example, maybe the thread that calls ok() takes a page fault, or is running on a very busy CPU with lots of preemption happening. That shifts the timing in the direction of this being unsafe. Cache coherence may also be an issue here.




You have to read and write cache atomically (which you can do). So then after

    cache = <newval>
...any future reads of cache will use <newval> (or <newer-val>, but not <oldval>).

That's what makes the trick work w/o having to pause threads.
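
In C11 terms (a sketch, not the runtime's actual code), that atomic swap might look like:

    #include <stdatomic.h>

    static _Atomic(void *) cache;   /* hypothetical shared cache pointer */

    /* Writer: atomically publish <newval>, getting <oldval> back to
       free later. atomic_exchange defaults to seq_cst ordering. */
    void *swap_cache(void *newval) {
        return atomic_exchange(&cache, newval);
    }

    /* Reader (what objc_msgSend would do): never a torn value -- it
       sees <oldval>, <newval>, or something newer. */
    void *read_cache(void) {
        return atomic_load(&cache);
    }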


> cache = <newval>

As written, that is not a guarantee that all cores will see <newval> immediately. It's very CPU-specific, but you may need memory fences to achieve this.
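
In C11 terms (again a sketch, standing in for the CPU-specific details), the difference is roughly:

    #include <stdatomic.h>

    static _Atomic(void *) cache;

    void publish(void *newval) {
        /* Release ordering: everything written into *newval becomes
           visible to any reader that observes the new pointer. The
           plain "cache = <newval>" store promises none of this. */
        atomic_store_explicit(&cache, newval, memory_order_release);
    }

    void *lookup(void) {
        /* Without acquire (or consume) ordering here, a reader may,
           in the abstract C11 model, see the pointer before the data
           it points to. */
        return atomic_load_explicit(&cache, memory_order_acquire);
    }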

Further, in my opinion it's kind of playing with fire.

Edit: Also, it is my impression reading the article that <oldval> is actually a shared list of old caches to be freed (gOldCachesList). That makes it a lot more complicated than your example snippet and leaves more potential for nasty synchronization problems.


Strictly speaking you're right. However it's not necessary that other cores see the new store immediately: it's only necessary that they see it before the memory is freed, which occurs after a substantial delay sufficient to prevent these out-of-order issues on other CPUs. This is the technique by which memory barriers are elided on the read side (but not the write side).
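
That's essentially the RCU-style bargain: the write side pays for the ordering and for a grace period, so the read side can be a bare load. A sketch, with a hypothetical wait_for_grace_period() standing in for the runtime's pc-scanning check:

    #include <stdatomic.h>
    #include <stdlib.h>

    static _Atomic(void *) cache;

    void wait_for_grace_period(void);   /* hypothetical */

    /* Read side: a plain load, no barrier at all. */
    void *lookup(void) {
        return atomic_load_explicit(&cache, memory_order_relaxed);
    }

    /* Write side: publish, then wait until no reader can still be
       using <oldval> -- long enough for every core to have seen the
       new pointer. Only then is the free safe. */
    void replace(void *newval) {
        void *oldval = atomic_exchange(&cache, newval);
        wait_for_grace_period();
        free(oldval);
    }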


Yes, you kind of do need the read barrier, otherwise you get exactly the problem I described with the stale cache.

You can't "kinda" have synchronization just as a design for a mutex can't be a "kinda" lock... You may get away with it for a while, but like I said, playing with fire.


This is both clever and terrifying :). How do they prevent the compiler from scheduling the load early, observing, and using the stale cache pointer? Do they at least have a compiler fence on the read side?
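
For reference, a compiler-only fence on the read side would look something like this in C (hypothetical names; it constrains the optimizer but emits no barrier instruction):

    #include <stdatomic.h>

    extern void *cache_slot;    /* hypothetical */

    void *read_side(void) {
        /* Forbids the compiler from hoisting or sinking memory
           accesses across this point; no CPU barrier is emitted.
           GCC/Clang spelling: __asm__ __volatile__("" ::: "memory"); */
        atomic_signal_fence(memory_order_seq_cst);
        return cache_slot;
    }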


The read-side code in question is hand-written assembly, isn't it? And on the hardware side, observing the pc will probably result in an actual memory barrier happening anyway (even if no signal is sent internally, the kernel must still suspend the thread to capture its register state, and then it needs a barrier to communicate that result back to the asking thread).


Yes, perhaps that part is in the hand-written assembly portion that's mentioned.


Nothing prevents using a stale cache pointer. The stale cache still contains correct data, so using it is benign, up until it gets deallocated.

A stale cache pointer may become wrong if a method implementation is replaced, e.g. by a category, but replacing a method while calling it in another thread is inherently racy.


By stale I meant holding a ref to a cache about to be deleted.


Yes, obviously you may need barriers or fences depending on the exact architectural details, and as another commenter mentioned you may be able to elide the read barrier if you know what is going on.

This is all about optimizing things so the read side doesn't have to do any synchronization. The actual implementation takes locks in the functions that mutate the cache; it is just done in such a way that the read side always sees consistent data without taking a lock. Since the write side holds locks, safely queueing stuff for deletion is actually pretty trivial.
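
A sketch of that shape (hypothetical names standing in for the runtime's lock and gOldCachesList):

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdlib.h>

    static pthread_mutex_t write_lock = PTHREAD_MUTEX_INITIALIZER;
    static _Atomic(void *) cache;
    static void *old_caches[64];          /* gOldCachesList analogue */
    static int   n_old;

    int no_thread_pc_in_bad_range(void);  /* hypothetical ok() analogue */

    void mutate_cache(void *newval) {
        pthread_mutex_lock(&write_lock);      /* writers serialize...   */
        old_caches[n_old++] = atomic_exchange(&cache, newval);
        if (no_thread_pc_in_bad_range())      /* ...so queueing and     */
            while (n_old > 0)                 /* reaping old caches is  */
                free(old_caches[--n_old]);    /* trivially safe         */
        pthread_mutex_unlock(&write_lock);    /* readers never lock     */
    }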

As for playing with fire... the core dispatch routine of your runtime is exactly the place you want to do complicated things that exploit intimate knowledge of your architecture. It is a small piece of code that is incredibly hot and will get a lot of scrutiny.


Indeed. One nitpick though: you can't get around pausing the thread briefly or else waiting for it to hit an interrupt, because there's no way to observe another CPU core's registers without its cooperation.


True, I should have said it is not concurrently stopping all the threads. If you actually need the register state of a thread that is currently running on another processor, it may be stopped to get it. (If you ask for the state of a thread that is not currently executing, you may get its saved register state out of the kernel without having to pause its execution, though being able to tell what is and is not actually running on other CPUs at that granularity is not really feasible.)
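
On Darwin that machinery is the Mach thread-state interface; roughly (x86_64 flavor, error handling elided -- a reconstruction, not the runtime's actual code):

    #include <mach/mach.h>

    /* Read another thread's program counter. Suspending first keeps
       the snapshot stable; the kernel stops the thread briefly if it
       is running on another core. */
    uint64_t thread_pc(thread_act_t thread) {
        x86_thread_state64_t state;
        mach_msg_type_number_t count = x86_THREAD_STATE64_COUNT;

        thread_suspend(thread);
        thread_get_state(thread, x86_THREAD_STATE64,
                         (thread_state_t)&state, &count);
        thread_resume(thread);

        return state.__rip;
    }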


As written, it's pseudo-code. ;) Yup, it's tricky and platform-specific, but the point is it can be done (or at least that's my understanding -- I do not have first-hand knowledge).



