The case where big and little both want the same thing in their caches should be handled by the usual cache-coherency traffic between the CPUs, which ensures they don't disagree about what's in their L1 caches (very handwaved, because I don't know the details).
The reason for the manual cache operations is that they're generating JITted code -- on ARM, to ensure that what you execute is the same thing you just wrote, you have to (1) clean the data cache, so your changes get out to main memory[*], and then (2) invalidate the icache, so that execution fetches the fresh data from memory rather than using stale info. This clean-and-invalidate operation is usually informally called a flush, though it isn't really one in ARM terminology.
[*] not actually main memory, usually: it only has to go out to the "point of unification" where the iside and dside come together, which is probably the L2 cache.
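For the curious, the sequence looks roughly like this on ARMv8. A minimal sketch, hardcoding a 64-byte line size (the very assumption that bites on big.LITTLE -- real code should read the line sizes from CTR_EL0 on the CPU doing the flush); the function name is illustrative:

    /* Minimal sketch of the ARMv8 clean-then-invalidate ("flush")
     * sequence described above. Assumes a fixed 64-byte line size for
     * both caches; real code should query CTR_EL0 instead. */
    #include <stdint.h>

    void flush_jit_range(char *start, char *end)
    {
        const uintptr_t line = 64;                 /* assumed line size */
        uintptr_t a;

        /* (1) clean dcache by VA out to the point of unification */
        for (a = (uintptr_t)start & ~(line - 1); a < (uintptr_t)end; a += line)
            __asm__ volatile("dc cvau, %0" :: "r"(a) : "memory");
        __asm__ volatile("dsb ish" ::: "memory");  /* wait for cleans */

        /* (2) invalidate icache by VA over the same range */
        for (a = (uintptr_t)start & ~(line - 1); a < (uintptr_t)end; a += line)
            __asm__ volatile("ic ivau, %0" :: "r"(a) : "memory");
        __asm__ volatile("dsb ish" ::: "memory");  /* wait for invalidates */
        __asm__ volatile("isb" ::: "memory");      /* discard fetched stale code */
    }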
Not just that, the ISA usually requires that a cache invalidation instruction be issued regardless of whether the chip's coherency will automatically detect and invalidate it.
In cases such as this post, it is perfectly valid for the silicon engineers to say that it's the software's fault for not adhering to the ISA.
Thanks for the update on the manual cache management!
I'll admit that I find the question of dissimilar cache line sizes within the same cache hierarchy intriguing from an architecture point of view. It has me doodling all sorts of questions into my notebook.
I don't see why they have to do this in userspace at all. If they did:
* allocate read/write buffer
* JIT instructions into it
* change mapping to read/execute
* run the JITted code
Then the kernel manages flushing the data caches on the mapping change, and Mono gets to wrap a Somebody Else's Problem field around it (a sketch of that flow follows below). It sounds like they are instead:
* allocate read/write/execute buffer
* JIT instructions into it
* manually flush relevant data caches (with an assumption that the cache line size is constant)
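For concreteness, the first flow might look like this. A minimal sketch, assuming POSIX mmap/mprotect; jit_compile_into() stands in for the actual code generator and is hypothetical:

    /* Sketch of the RW -> RX flow from the first list above. The kernel
     * performs whatever cache maintenance the architecture requires when
     * the pages become executable. */
    #include <stddef.h>
    #include <sys/mman.h>

    void jit_compile_into(void *buf, size_t len);  /* hypothetical codegen */

    typedef void (*jit_fn)(void);

    jit_fn jit_emit(size_t len)
    {
        /* allocate read/write (not executable) buffer */
        void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED)
            return NULL;

        jit_compile_into(buf, len);   /* JIT instructions into it */

        /* change mapping to read/execute; cache flushing is now
         * Somebody Else's (the kernel's) Problem */
        if (mprotect(buf, len, PROT_READ | PROT_EXEC) != 0) {
            munmap(buf, len);
            return NULL;
        }
        return (jit_fn)buf;           /* run the JITted code */
    }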
That approach is harder to use in practice than it sounds. It's not like people haven't tried it.
The OS only lets you allocate in large granules, like 4k or 16k, and the vast majority of methods are significantly smaller than that, meaning a JIT must colocate multiple methods in the same allocation block or waste a significant amount of memory.
We could get around that by remapping memory between read/write and read/execute and having the OS solve the problem for us. Except for a couple of small details: modifying a memory mapping is very expensive, and we're, well, in the performance business; and Mono is multi-threaded, so one thread might be executing code from the exact page we just made non-executable.
This approach, IIRC, was tried by Firefox, as it has some security advantages, but was discarded due to the measurable performance impact - and they don't have the second problem, as JS is single-threaded.
How is this safe in the multithreaded case anyway? Suppose a thread has just written a new JITted method and is flushing the i$ on the CPU it's executing on, but gets scheduled away part-way through the flush. If you were very unlucky, couldn't another thread then get scheduled on that CPU and try to execute the just-written method, which was never fully flushed from that CPU's cache?
Multi-threaded safety is simply due to the JIT controlling the visibility of the newly compiled code. First flush, then make it visible for execution; you can't go wrong with that, and scheduling won't matter.
Things get a lot more complicated when it comes to code patching, but the principle is similar.
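In code, the ordering might look like this. A minimal sketch, assuming C11 atomics; flush_jit_range() is the clean+invalidate sequence sketched earlier in the thread, and the other names are illustrative:

    /* Sketch of "first flush, then make it visible": no thread can obtain
     * a pointer to the new method before the flush has been issued. */
    #include <stdatomic.h>
    #include <stddef.h>

    void flush_jit_range(char *start, char *end);  /* see earlier sketch */

    typedef void (*jit_fn)(void);
    static _Atomic(jit_fn) method_entry;           /* NULL until published */

    void publish_method(char *code, size_t len)
    {
        flush_jit_range(code, code + len);         /* 1: flush */
        atomic_store_explicit(&method_entry,       /* 2: publish */
                              (jit_fn)code, memory_order_release);
    }

    void call_method(void)
    {
        jit_fn fn = atomic_load_explicit(&method_entry, memory_order_acquire);
        if (fn)
            fn();   /* only reachable after the flush completed */
    }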
I don't think that helps - the point is that the flush might not be effective if the flushing thread gets scheduled away from the core which has the stale I$ before it manages to fully issue the flush.
Or is the flush guaranteed to flush all cores' caches? That would be a fairly unusual design.
IC IVAU instructions are broadcast to all cores in the same 'inner shareable domain' (all cores running the same OS instance are in the same inner shareable domain).
Firefox is shipping W^X for jitcode, as far as I know. Or at least https://bugzilla.mozilla.org/show_bug.cgi?id=1215479 is marked as fixed and I don't see any obvious bugs blocking it that are fallout from that change.
And the "Firefox (Ion)" and "Firefox (non writable jitcode)" lines on https://arewefastyet.com/ seem to coincide...
But yes, actually making this work in practice is not at all simple.
I see the argument about the page size potentially being large relative to the size of a jitted function. But,
> modifying a memory mapping is very expensive
It used to be true that operations on memory mappings were appallingly expensive. However, the advent of virtualization has driven a significant change in performance. IIRC, ARMv8 has a TLB invalidation operation that is per-entry, addressed by the virtual address being invalidated. You don't need to flush the entire TLB.
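For reference, the per-entry invalidate is a privileged instruction, so the kernel issues it on the JIT's behalf as part of the mprotect. A rough sketch (not actual kernel code):

    /* Illustrative ARMv8 per-entry TLB invalidate by virtual address,
     * broadcast inner-shareable so other cores drop the stale entry too.
     * TLBI is EL1-only; userspace gets this via the kernel's mprotect path. */
    static inline void tlbi_page(unsigned long va)
    {
        unsigned long arg = va >> 12;     /* Xt holds VA[55:12] */
        __asm__ volatile("dsb ishst\n\t"  /* make the PTE update visible */
                         "tlbi vae1is, %0\n\t"
                         "dsb ish\n\t"    /* wait for the broadcast */
                         "isb"
                         :: "r"(arg) : "memory");
    }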