I found the cheribuild ( https://github.com/CTSRD-CHERI/cheribuild ) script for building a cross compiler, VM image, and CHERI RISC-V emulator to be very easy to use and effective.
My code mostly compiled on CHERI without issue-- I was hoping to find some interesting bug as that's a pretty realistic prospect when porting to CHERI but didn't do so on the code I've tried so far.
CHERI is a research project currently. It has a number of large outstanding issues - CHERI essentially needs to do a whole-system stop-the-world garbage collection phase for precise tag revocation in order to provide non-stochastic temporal safety, iirc, which is alluded to in the OP as the "revoker" (google "CHERIvoke sweep") - and is still under active research. It is not something that can just be rolled out to production on one of the highest-profile devices in the world. Porting code to run on CHERI, especially if you want to actually use the security features it provides, is also a large undertaking, and would be a massive userspace change. Apple is doing a lot of very good work in similar spaces, between PAC and Firebloom and related security measures, but there's a very big difference between that and CHERI.
Stop-the-world is only brief, to scan the register file of each thread and any other capabilities held in the kernel (asynchronous I/O, signal handlers, etc). Once done the entire memory sweep can be done concurrent with execution.
As for porting, it very much depends what you're doing. Operating systems and language runtimes, especially those with JITs, have intimate knowledge of the architecture and like to play cute tricks with pointers, so those are disproportionately involved to port. General user code requires very little, if any, porting. In one study, a basic KDE+X11 desktop stack was ported to CHERI, seeing 0.026% LoC changes across 6 million LoC, or 1584 lines. It's non-zero, and of course there is a lot of code out there so even a tiny fraction of it isn't insignificant, but it is very small as these things go.
> Stop-the-world is only brief, to scan the register file of each thread and any other capabilities held in the kernel (asynchronous I/O, signal handlers, etc). Once done the entire memory sweep can be done concurrent with execution.
Is there documentation of the scheme that allows this? I did some quick Googling and found CHERIvoke (which is not concurrent at all) and Cornucopia (which requires scanning some user pages while the world is stopped). Are you referring to something newer?
Potentially (likely) uninformed question: Doesn't this have the potential side-effect of wrecking L1/L2 caches? If so that could get quite pathological for many workloads.
Or is there some new fancy dedicated hardware registers involved?
You’re talking about a depressing future and all I’m hearing is a depressing present where you think the best way to fight for software freedom is to ensure people get hacked rather than, like, actual policies designed around giving control back to users. I can think of few things less empowering than software that is designed to be intentionally worse for an auxiliary side benefit because we can’t get our act together and figure out a real way to manifest what we actually want.
> you think the best way to fight for software freedom is to ensure people get hacked rather than, like, actual policies designed around giving control back to users
It's just being realistic. Obviously, it would be great if there were laws guaranteeing the users' right to have full control over devices they've purchased, but I think most people have accepted that this ship has sailed. So we can only really hang on to the little control we still have, which is usually achieved without the manufacturer's permission.
No, that’s just being stupid. There are thousands of people in Europe petitioning right now to give people the freedom to run games even after the company that runs them shuts down. People are working on accessibility technologies that help put computers in the hands of more people than ever before. Countries are slowly turning the screws on companies who abuse their monopolies to make worse products for people. Meanwhile, userbinator thinks the best way to ensure software freedom is to argue on an Internet forum about why everyone should use C.
BTW: Stallman is brought up too, which is actually very relevant to why this is such a losing argument: jailbreaks are cool and all, but today they are basically irrelevant for the average user. It's just like Stallman's fight for the GPL: nice, but only really relevant to people who compile their software themselves. People like having control over their software, but approaching it from the perspective of a software engineer who can write code or hack things is very exclusionary.
What does jailbreaking have to do with memory safety? I thought the CPU that will only boot signed binaries was the end of jailbreaking.
I, for one, would appreciate a device that isn't riddled with memory bugs that are exploitable principally by nation states; one might say buggy code is the backdoor free software folk are so paranoid about.
The overhead of 128-bit pointers is a pretty big downside, I'd imagine, especially with only 8 GB of RAM on some models. Also, it confuses any code that assumes you can convert a pointer to a long (or long long) and back.
Plenty of applications would be just fine with a 5x slowdown vs. state-of-the-art CPUs and could desperately use the security improvement.
Other than the temporal safety overheads (which you could skip and still have spatial safety worth using), the worst CHERI does is increase the memory bandwidth needed for pointers, so in a pathological case you might imagine a 2x slowdown.
In some cases it could make things faster and simpler though, since the CHERI approach can get process isolation without a MMU or TLB, so those overheads could be reduced or eliminated.
> Also, it confuses any code that assumes you can convert a pointer to a long (or long long) and back
Such code is already pretty unportable on existing systems.
You can't implement an xor linked list on CHERI (at least not in the pure mode)... hardly seems like much of a loss. :)
lol. We had 32-bit pointers on 386/486 systems in the early '90s, on systems with frequently less than 4MB of RAM.
If making pointers 4x bigger on systems with 2000x the amount of memory is a problem, I'm not sure how we're going to be able to solve anything ever again.
> Also, it confuses any code that assumes you can convert a pointer to a long (or long long) and back.
Well, that's why we invented intptr_t. Back in... checks notes C99. 25 years ago.
Correct. In fact, since memory accesses must be aligned (I think Linux uses 4-byte alignment, which is the alignment of instructions), the lower bits of every code pointer can be used for this purpose.
ARM (as in 32-bit ARM) already does this. If the lowest bit of the pointer is set to one when control jumps there, the target is treated as Thumb (2-byte) instructions rather than A32 (4-byte) ones; that lowest bit is still masked away, of course.
The QARMA block cipher ARM proposes for PAC (but does not mandate; any can be used) "signs" (it isn't really signing, more like a MAC, but whatever) pointers with an authentication code as small as 3 bits.
That might sound nuts, but a 3-bit code means you have a 1/8 chance of brute-forcing it correctly. Raise that to the power of the number of required forged pointers and you can see how an exploit can be made very unreliable. E.g. if you need 4 forged pointers, you now have a (1/8)^4 = 1/4096 chance of succeeding.
I mean the extra 8 bytes of RAM per pointer, every time you store an object in memory that refers to another object, or that points to a vtable. No amount of register size will get around that.
Companies’ goals include backward compatibility, ease of integration with existing hardware blocks (especially GPUs/AI), raw speed, memory efficiency, and energy use. Most security features either work against these or are perceived to. They also usually cost more in development while slowing it down. If they don’t, you might run into unknowns that risk tanking the whole project. The Snowden leaks showed many big companies were also paid to make their software vulnerable on purpose, which CHERI would work against.
There are also motivations. Companies usually don’t care about real security because they’ll be rich anyway. FOSS people are motivated by what they enjoy, which is rarely securing the codebase. Relevant to Apple and Google, some companies will ignore good solutions, even permissively licensed ones, because they’re Not Invented Here. It has to be their way or it won’t happen at all.
Any of these might be motivators for Apple not baking in something like CHERI. I agree with you that they’re in the best position. If I were ever hired by them and able to influence a big decision, baking memory safety into the CPU in a compatibility-preserving way would be one of the first decisions I’d make. It would be a huge differentiator.
Because they already have PAC, use their own Safe C dialect for iBoot, and have the long term roadmap to drag the hordes in their ecosystem into Swift.
I also expect them to eventually adopt ARM MTE as well.
Not being well versed in this area ... do you believe CHERI is providing any safety (or security) that macOS/iOS don't already have in place on Apple Silicon devices?
Yes, it does additional stuff; however, it isn't under Apple's control, it is being researched at Microsoft's Cambridge labs, so Apple will most likely go its own way with PAC and MTE.
CHERI is very ambitious but also very different from what we have now in a lot of subtle ways.
TL;DR: CHERI changes how pointers work and needs both hardware and software support all through the stack, from the kernel through drivers to your end-user application. Companies will be very careful (slow) about evaluating and adopting it; there are many challenges, both technical and organizational (people).
TL;DR2: CHERI is experimental in both software and hardware. I think there isn't even a Python interpreter that works with fully enforced (pure-capability) CHERI.
Now, to be clear, how much your software needs to change depends a bit on whether you use the pure-capability ABI or the hybrid ABI. But if you use the hybrid ABI and don't recompile your code in a way that might require code changes, you are not getting any security benefit, and at least the kernel (and in turn in-kernel drivers) should all work in pure-capability mode.
But that means your pointers are now all 128-bit capabilities: an address plus bounds and permission metadata, with a separate out-of-band validity tag. And while a lot of standard-conforming C/C++ code that doesn't do anything fancy will work, a lot of C/C++ code does fancy things and will not. E.g. you can't cast a CHERI pointer to an int and back and expect it to work, as the int represents only the address, not the capability metadata or tag (though doing so is highly problematic anyway due to pointer provenance). C/C++ programs written with a "C is just high-level assembly", "pointers are just memory addresses", or "data is just bits in RAM" mindset already cause a lot of issues today due to UB, and with CHERI these approaches work even less.
And even if your program does work, the 128-bit pointer size might cause performance problems it didn't have before, requiring optimizations it didn't need before.
And none of this even considers whether CHERI can be integrated effectively into current Apple ARM hardware without causing layout or performance issues. It might, but it also might not; I think only Apple can answer that properly. I mean, it is definitely possible, but it comes at a (work time = money) cost that might not be small.
If I were Apple, I would wait to adopt it until the ecosystem around CHERI becomes more mature, and once it starts showing signs of success, actively help it mature. But for that you need:
- non-experimental ARM ISA extensions
- better language support (I don't think there is any Swift CHERI support)
- more experience with porting large software stacks to it
- porting of their kernel and drivers
- porting of their core libraries (for simplicity, let's assume iOS), notably e.g. their JavaScript VM
- adapting their current iOS-specific security mechanisms to CHERI
- creating a lot of tooling to make migration feasible for developers (no less effort than they had to go through for the x86->ARM migration)
And even then, they would still need to support running apps in a compatibility mode for many years, since they can't expect existing apps to be made CHERI-compatible when they can't expect all of the software stack those apps use to be made compatible. They might simply not care about smaller devs and force them, but they can't do that with the core apps from many of the larger companies that people expect.
And all of these steps come with super fun surprise issues, as there is code you can't make work with pure-capability CHERI because it's too heavily designed around "clever tricks" that don't work with CHERI. For example, how does it affect JITs/AOT compilers, and could there be cases where you can't make them work in pure-capability mode in a performant way? E.g. can you even make WASM AOT compilation work with CHERI, and if not, can you require browsers to now ship both CHERI-WASM and non-CHERI-WASM???
Cool. Wasn’t aware of CHERI, but was thinking about a “CHERI light” that would use the regular x86/arm MMU to enforce some compartmentalization within programs the other day. Anybody know if there is such a project?
Not exactly the same, but Mozilla had a wacky idea of compiling C to WASM and then WASM back to native code, which gives a natively running program that sandboxes its own memory accesses:
Intel MPX had both hardware support for "bounds tables" and bounds checks[0], along with a "memory key" system[1] more similar to CHERI for memory tagging (only at page granularity, however). Both were massive failures and they are no longer supporting the extensions on new chips going forward.
MPX is dead, but memory protection keys are a distinct feature and work fine (with some limitations); I haven't seen anything about them being removed at all and Intel doesn't note them in their deprecated features list. AMD even added support for them in Zen3.
I worked on a prototype to add kernel-mediated segmentation to Linux. It's easy to associate a segment map with a thread, manage maps in virtual addresses, and provide a call gate, which is a registered procedure that can change a map.
Making it airtight and useful in an existing programming environment is less straightforward.
Old guy here -- I came of age in the PDP-11 era. I thought it was funny, just a gentle parody of a certain I'm-being-patient writing style.
I am sure the author does not really believe older people can't understand CHERI. One of the original contributors to CHERI is Peter G. Neumann, born 1932, programming since the 1950s. Chisnall (the author of this piece) and Neumann are among the co-authors of a paper on CHERI presented at ASPLOS'24 this year [1].
I guess this shows the risk in using humor in scientific writing -- you never know how it will land.
We are roughly the same age. I did read this as an attempt at humor, but of the same sort as racist or sexist humor, relying on a bigoted "it's funny because it's true" mindset. I suppose the reason is because that sort of sentiment is often seriously held in our industry.
It made it impossible for me to continue reading the paper, though. Ah well.