> For all major architectures and CHERI there is only one address space as far as I’m concerned, but it’s potentially worth opening the door for properly talking about pointers in segmented architectures here.
I regret to inform you that x86_64 (which is probably a “major architecture”) is segmented and has three address spaces. In practice, it has two. In user code, those address spaces are normal memory and TLS and, in kernel code, they are normal memory and percpu memory.
And the proposed scheme in this article is, on first read, amazing. It would make the epic hackery in the kernel percpu code safer (in Rust, anyway). The major caveat is that, for even slightly decent codegen, the address space of an x86_64 pointer should be part of its type, not its value. Even so, this scheme still works:
ptr.with_addr(usize) -> ptr
has a type! If the input pointer is TLS or percpu, so is the output.
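A minimal sketch of that idea in Rust, assuming hypothetical marker types `Normal` and `Tls` and a hypothetical wrapper `SpacePtr` (none of these are real standard library or kernel APIs). The address space lives in the type, and `with_addr` only swaps the offset:

```rust
use std::marker::PhantomData;

// Hypothetical address-space markers, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Normal;
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Tls;

// An address tagged with its address space at the type level.
// The marker is zero-sized, so this is still one machine word.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SpacePtr<S> {
    addr: usize,
    _space: PhantomData<S>,
}

impl<S> SpacePtr<S> {
    pub fn new(addr: usize) -> Self {
        SpacePtr { addr, _space: PhantomData }
    }

    pub fn addr(self) -> usize {
        self.addr
    }

    // Same address space in, same address space out; only the address changes.
    pub fn with_addr(self, addr: usize) -> SpacePtr<S> {
        SpacePtr { addr, _space: PhantomData }
    }
}
```

With this shape, passing a `SpacePtr<Tls>` where a `SpacePtr<Normal>` is expected is a compile error, which is roughly the property the comment is after.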
Segments are somewhat different on x86-64, which uses them to just compute a virtual address that you can just use as a normal pointer. There’s no “fat” pointer really involved here; the segment register just provides a convenient place to stash the base address of the TLS or whatever you’re using it for. This is why you can store a normal pointer that points into the TLS. This is different from segmentation in the past where you couldn’t really keep everything in your address space at once, so every use of the pointer required arithmetic, or with CHERI where you need to pass around the full “big” pointer to have the right capability to access the memory.
There are C embedded extensions for separate address spaces. So the idea is not totally new. But AIUI, the C extensions still require some way of converting a pointer in a specified address space to the usual "generic" pointer type where the address space could only be part of the value.
It’s a funny register: you don’t want to read it (and before Ivy Bridge, user code couldn’t read it). The base is added by the CPU when you dereference the pointer. So code like:
__thread int array[100];
array[n] += 5;
Effectively computes the offset portion of array+n and then, using CPU assistance, accesses that memory relative to the segment.
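For comparison, a rough Rust counterpart of that C snippet, using the standard `thread_local!` macro. The helper name `bump` is made up for illustration, and whether a given build turns the access into a segment-relative instruction depends on the target and TLS model, so treat that part as an assumption:

```rust
use std::cell::RefCell;

thread_local! {
    // One private copy of the array per thread, like `__thread int array[100]`.
    static ARRAY: RefCell<[i32; 100]> = RefCell::new([0; 100]);
}

// Hypothetical helper: does `array[n] += 5` on the calling thread's copy
// and returns the new value.
pub fn bump(n: usize) -> i32 {
    ARRAY.with(|a| {
        let mut a = a.borrow_mut();
        a[n] += 5;
        a[n]
    })
}
```

Each thread sees its own array, so a fresh thread calling `bump(3)` starts from zero regardless of what other threads did.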
Having watched CHERI go through its evolution for the last couple years, I am extremely glad that there’s now a modern thing that breaks the “pointers are integers” assumptions that people have mostly been OK with making for several decades, thinking that those were just things that “old” computers did and were now irrelevant to support. I’m really hoping to see architectures start re-adding trapping arithmetic too so we can start taking advantage of all the things in C that everyone hates being UB but that can be repurposed into catching real bugs in hardware. I suspect this is going to break a lot of code that’s been playing fast-and-loose with these things, but “all the world’s a VAX” is a thing we got over before and now it’s going to be replaced with “Intel”.
What's funny about "all the world's a VAX" is that the VAX could trap on integer overflow. I suspect that C compilers on VAX never set that mode though (it's a flag in the status register).
IIRC the Alpha did that as well but due to instruction pipelining it was more complicated in practice. I want to say it was used on some SCADA software but that's an ancient memory -- and it wasn't even everywhere in that software as I remember an integer overflow error in an alarm processing program that we had to hot-patch one night with live data, that was exciting.
Making myself nostalgic for the Alpha again ... I'll be over in the corner crying into my beer.
As a former JavaScript VM hacker, I think what I want the most is a conditional call instruction (like on arm32). The conditional flags are already computed for arithmetic, and there are already conditional jumps. JS VMs use them to check for overflow out of the small integer range (31 bits, but shifted one left, so a 32-bit arith op generates the right flags) followed by a branch to out-of-line code. That out-of-line code has to be duplicated for every different check site, since a branch doesn't keep track of where it came from. But a conditional call instruction would, either pushing the return address or putting it in a register. This allows a single, shared routine. You can then use this for all kinds of things: inline bump-pointer allocation with an automatic bailout to a slowpath, safety checks of all kinds, profiling, deoptimization, you name it.
What I wouldn't give for a conditional call instruction!
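To make the small-integer trick concrete, here is a sketch in Rust, assuming a hypothetical tagging scheme where a 31-bit integer is stored shifted left by one (the names `tag`, `untag`, and `add_smi` are invented here, not taken from any real VM). Because both operands are pre-shifted, a plain 32-bit add overflows exactly when the 31-bit result would not fit, and the `None` case stands in for the branch to out-of-line code:

```rust
/// Tag a 31-bit integer by shifting it left one bit (low bit stays 0).
/// Returns None if `v` does not fit in 31 bits.
pub fn tag(v: i32) -> Option<i32> {
    // Same result as `v << 1`, but reports overflow instead of wrapping.
    v.checked_mul(2)
}

/// Recover the integer from its tagged form (arithmetic shift keeps the sign).
pub fn untag(t: i32) -> i32 {
    t >> 1
}

/// Add two tagged integers. The 32-bit overflow check doubles as the
/// 31-bit range check; None means "take the slow path".
pub fn add_smi(a: i32, b: i32) -> Option<i32> {
    a.checked_add(b)
}
```

In generated machine code the `checked_add` would be an add followed by a jump-on-overflow, which is the per-site branch the comment wants to replace with one shared conditional call target.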
I imagine this isn’t the first time you’ve realized you wanted this so I’m curious: what’s stopping this today? CPU designers being lethargic? Technical complexity? Something else?
It seems like the kind of feature that competition among CPU manufacturers would drive towards an implementation out of a desire to claim faster performance on a workload that people care about.
I’m way out of my depth here, but as far as I know flags have a non-trivial interaction with superscalar execution.
Not sure how accurate it is, but I can imagine it being similar to parallelizing a program with independent units of work vs trying to do the same for threads that depend on each other’s output.
Flags are treated as just another kind of dependency in the register renaming phase, though it's a lot hairier because they probably need to be tracked on an individual (per-bit) basis.
As a sibling commenter noted, a conditional call is basically just a macro for a (negated) conditional branch jump over a call instruction, but with a different prediction.
Such VMs would be nearly the only thing to benefit from them, and, still, only in code size (which wouldn't be in cache anyway) and codegen speed, i.e. not anything related to actual runtime speed.
You can't use them for actually conditionally calling functions because functions need arguments, thrash registers, and whatever else functions do.
How would "conditional call" differ from the obvious pattern of forward branching on the negative of the original condition to skip over a call instruction? The pattern could easily be macro-fused in hardware if deemed worthwhile.
"Default" (static) prediction matters very little in modern processors though. Branch prediction is largely based on dynamic data about the actual program run.
They still do matter because programs are huge, with lots of cold and warm (not hot) code, and there are only so many branch prediction entries. Compilers have been tuned for several decades with this assumption in mind and processors, in turn, keep reinforcing it, because they keep running code produced by such compilers. It'll take some very extensive measurements to show that it is no longer the case.
The fact that modern CPU architectures in general have abandoned custom prediction hints suggests that static branch prediction is being viewed as less important.
x86 had INTO, interrupt when overflow, which basically gave this functionality. It would allow software to treat an overflow basically like a segmentation violation.
Coming back to our discussions, Android 13 is finally making ARM MTE a reality, so that is yet another front.
While it is only available as a developer-mode flag, this might mean it will be enabled by default on Android 14, as this is how these kinds of features tend to be integrated into Android (1 OS generation for testing purposes and fine tuning).
I find it a bit ironic that for all the hate Oracle gets, Solaris SPARC has been the most successful UNIX to tame C, and that it has taken so long for these kinds of features to spread to other platforms.
Coming from a position knowing essentially nothing about computer architecture or compiler fundamentals, CHERI seems like a bandaid in the same way that garbage collection is a bandaid. I may be way off base, but it feels like the things CHERI ensures at “run time” _are_ enforceable at “compile time”. I’m not saying it doesn’t have a place though - we definitely don’t live in a world where such things are enforced so we need systems to check things at run time. It looks like a good solution but doesn’t feel like the “right” thing. Am I getting the right idea of what its purpose is?
I think there's two important things to note here.
1. Bandaids are critical. We can't just fix the billion lines of C code out there by replacing it - we need strategies that wholesale destroy bug classes. CHERI does that. If you concretely removed temporal unsafety from all of an operating system's userland, just by recompiling, you'd have done the world a great service.
2. CHERI is very strong. It's not like some mitigation techniques, which rely on increasingly niche threat models (ASLR makes less and less sense as the world pushes towards "send code, not data" - to be fair, pax/grsec explicitly noted this 20 years ago, so it's hardly the fault of ASLR). CHERI totally wrecks spatial memory unsafety. That's fucking huge. The death of buffer overflows. Combined with other techniques like PAC, the memory unsafety threat becomes seriously less critical, or at least that's my position on it - would love to hear someone point out why I'm overly optimistic on this.
In theory you could remove all bounds checking from code and leave it up to the hardware to enforce.
It also is just sane. Like, I think containers are "sane" - they finally split the OS into disparate pieces. I think that NX is sane - why should memory be RWX by default? "A pointer for an allocation can dereference memory anywhere in the address space" is not sane, so hardware enforcing sanity is good.
Where we can enforce sanity, we should. CHERI does that imo.
Also, idk, even in memory safe languages it's not like bounds just go away. You get compile time assurances that either the bounds are gone and that's safe, or they're not gone and that's safe.
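A small Rust sketch of those two cases (the function names are made up for illustration). In the first, the iterator never produces an out-of-range index, so no per-element check is needed; in the second, the index comes from outside and the check stays, surfaced as an `Option`:

```rust
/// Bounds are "gone and that's safe": the iterator cannot step outside the slice.
pub fn sum_all(xs: &[u32]) -> u64 {
    xs.iter().map(|&x| u64::from(x)).sum()
}

/// Bounds are "not gone and that's safe": an arbitrary index is checked,
/// and an out-of-range access yields a fallback instead of touching memory.
pub fn get_or_zero(xs: &[u32], i: usize) -> u32 {
    xs.get(i).copied().unwrap_or(0)
}
```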
> We can't just fix the billion lines of C code out there by replacing it - we need strategies that wholesale destroy bug classes.
Won't any C code making use of int-to-pointer casts fail to work on CHERI? And isn't that a property of virtually every C codebase in existence? It sounds like that code will need to be rewritten either way.
> CHERI totally wrecks spatial memory unsafety. That's fucking huge. The death of buffer overflows.
You'll have to elaborate on what PAC is, but leaving use-after-free/temporal memory unsafety on the table is a pretty conspicuous hole. If the answer for that involves runtime overhead, then it will be tough to sell moving to a new architecture as opposed to just using runtime mitigations on existing architectures (or, at the limit, rewriting the important bits in Rust).
No, as my article notes, because C has a dozen slightly different definitions for "integers that are basically pointers" they were able to keep size_t 64-bit and make intptr_t into a pointer-sized integer and mark it as something the compiler should manipulate as if it was 'void*' (because it is as far as CHERI is concerned). This is still a messy hack but it vaguely functions.
Rust doesn't have this luxury because it only defines an exact-pointer-sized integer, so it has to map usize to intptr_t and bloat up everything really badly.
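As a quick illustration of the assumption being discussed, this sketch checks that `usize` and a thin raw pointer have the same size and that an address survives a round trip through `usize`. Both hold on conventional targets such as x86_64; on a CHERI-style target the pointer would be wider than the address, which is exactly the mismatch described above. The function names are invented here:

```rust
use std::mem::size_of;

/// True on conventional targets, where an address is the whole pointer.
pub fn usize_matches_pointer_width() -> bool {
    size_of::<usize>() == size_of::<*const u8>()
}

/// Pointer -> integer -> pointer round trip, the pattern that only works
/// if the integer can carry everything the pointer needs.
pub fn round_trip(x: &u32) -> u32 {
    let addr = x as *const u32 as usize;
    let p = addr as *const u32;
    // Safe here only because `p` has the same address as the live reference `x`.
    unsafe { *p }
}
```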
'Gankra answers the cast part but for the other question: PAC is "pointer authentication codes", which is pretty much what it sounds like. It's a CFI+data integrity protection measure. Re: temporal memory safety, I don't think MTE solves everything but my understanding was you can revoke a capability for a region when the memory is freed and then mint it with a new one when allocating it out again, which would make stale pointers to it/UAF fault on access. Will have to check what the exact details are.
> leaving use-after-free/temporal memory unsafety on the table is a pretty conspicuous hole
It's not a hole, it's just not addressed by CHERI explicitly. It's like saying that ROP is a "hole" in NX - it's just not part of the NX threat model (well, it is/was, since they knew about it - but that is not the point).
Yes, temporal safety is a concern. But removing spatial memory unsafety as an entire primitive will have consequences for practical exploitation of temporal vulnerabilities in some cases. A full exploit chain is often going to abuse both properties - though not always.
But also, CHERI plays well with other mitigations, like MTE (to which PAC is a precursor; I should have said MTE).
MTE does address temporal safety. But a flaw is that it relies on pointer metadata that isn't protected against spatial unsafety. So CHERI reinforces that protection.
The point being, removing spatial unsafety in the way that CHERI does (ie: enforced by hardware) will have significant impact across the board for security.
Whether it plays out as well as I hope remains to be seen, but I think it would make practical exploitation of many temporal vulnerabilities more difficult.
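For readers unfamiliar with the tagging idea, here is a toy software model in Rust. It is an analogy only: real MTE keeps a 4-bit tag per 16-byte granule in hardware and a copy in the pointer's top bits, while `TaggedHeap` and `TaggedPtr` are names invented for this sketch. The point is just that freeing retags the memory, so a stale pointer's tag no longer matches:

```rust
/// Toy heap: one small tag and one value per slot.
pub struct TaggedHeap {
    tags: Vec<u8>,
    data: Vec<u64>,
}

/// Toy pointer: a slot index plus the tag it was issued with.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TaggedPtr {
    index: usize,
    tag: u8,
}

impl TaggedHeap {
    pub fn new(slots: usize) -> Self {
        TaggedHeap { tags: vec![0; slots], data: vec![0; slots] }
    }

    // Move the slot to the next 4-bit tag value (wraps after 15).
    fn retag(&mut self, index: usize) -> u8 {
        let tag = (self.tags[index] + 1) & 0x0F;
        self.tags[index] = tag;
        tag
    }

    /// Hand out a pointer to `index` carrying a fresh tag.
    pub fn alloc(&mut self, index: usize) -> TaggedPtr {
        let tag = self.retag(index);
        TaggedPtr { index, tag }
    }

    /// Freeing retags the slot, which invalidates outstanding pointers.
    pub fn free(&mut self, p: TaggedPtr) {
        if self.tags[p.index] == p.tag {
            self.retag(p.index);
        }
    }

    /// Returns false (a stand-in for a fault) on a tag mismatch.
    pub fn store(&mut self, p: TaggedPtr, value: u64) -> bool {
        if self.tags[p.index] != p.tag {
            return false;
        }
        self.data[p.index] = value;
        true
    }

    /// Returns None (a stand-in for a fault) on a tag mismatch.
    pub fn load(&self, p: TaggedPtr) -> Option<u64> {
        if self.tags[p.index] == p.tag {
            Some(self.data[p.index])
        } else {
            None
        }
    }
}
```

Since the tag travels inside the pointer value, anything that can overwrite pointer bits can also overwrite the tag, which is the weakness the comment says CHERI's spatial protection would cover. With only 16 tag values, a stale pointer can also match again by chance after enough reuse.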
> it will be tough to sell moving to a new architecture as opposed to just using runtime mitigations on existing architectures (or, at the limit, rewriting the important bits in Rust).
FWIW PAC is already deployed on every Android device afaik. I'd love to see everything rewritten in Rust but I'd still want CHERI.
> If we can find a legitimate user client that provides an implementation of getTargetAndTrapForIndex() that returns a pointer to an IOExternalTrap residing in writable memory, then all we have to do is replace trap->func with a PACIZA'd function pointer (that is, a pointer signed under APIAKey with context 0). That means only a partial PAC bypass, such as the ability to forge just PACIZA pointers, would be sufficient.
Maybe I'm lazy but it seems like you can just store your fat 129-bit pointers in a hashmap and lookup normal 64-bit addresses in the hashmap to dereference them. This defeats the security model and is a little slow but means you don't need to change any code to run on CHERI.
Maybe I am misunderstanding how this works, but if you did that, you would no longer have a pointer with a valid metadata flag, and if you try to use it, you get a fault (at least on the Morello board that ARM has built). Perhaps you could do this on something that is just emulating CHERI, but at that point why are you doing this at all?
I'm not a systems level programmer but I personally super appreciate this kind of dialog and communication about the flaws in the programming language from prominent members in the community. I've seen it a lot more in Rust (and C++) than in most other languages. I wonder if there is a correlation between how much a language grows and changes and how many blog posts there are from prominent members in the language community outlining flaws? (Tangentially, I would love a search engine where I could track metadata like this)
I personally think that way too often the discourse from the core team / stewards / BDFLs etc is wholly positive about their own community + the technical merits of the programming language they are representing. A healthy dose of "here are some flaws" is very refreshing and humbling, and makes me appreciate the community more.
Very well written and approachable. I think everything noted makes perfect sense, really.
Semi-related, CHERI is so cool. I find it extremely promising and I truly hope that it sees widespread adoption. I really think if CHERI, or approaches similar, gain widespread adoption we will see a new era of security, similar to if not more significant than when Chrome hit the scene for desktops.
They’ve improved some of the things, but the ergonomics of “I have an UnsafeXPointer and I want an UnsafeYPointer” could be significantly improved. Plus, Swift really needs better documentation as to how aliasing works when external code captures a pointer and reinterprets it. My current understanding is that it’s far more lenient than what C allows (basically exposing LLVM’s model directly) but this isn’t really documented anywhere that I could find.
Interesting write-up. Unsafe pointers are definitely due for an overhaul.
Suggestion on syntax: just use ‘->’
It has the familiarity from C already. Most Rust unsafe programmers are long time C/C++ folks. You can define them to only work on raw pointers (like C) and always produce pointers (unlike C). And using it means no auto-deref magic.
ptr->field in C is (*ptr).field, at which point you are in whatever C's equivalent of a "place (lvalue) expression" is. This creates a weird discontinuity where you -> for the first step and then use `.` for subsequent steps and then do the standard "just kidding, it was a pointer offset all along" thing of slapping `&` in front of it.
My proposed ~ always keeps you indirected so you just use ~ all the way until you actually want to load a value from memory (which in C would implicitly happen whenever you have a nested ->).
I think the familiarity might be a double edged sword, since AIUI the proposed operator has different semantics. "ptr~field~field2" is not "ptr->field->field2", it's more like "&(&ptr->field)->field2". Therefore I feel like the familiarity might just confuse people or give them a false sense of security.
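For reference, this is roughly how the "stay indirected, never load" path is spelled in Rust today with `std::ptr::addr_of!`. The struct layout and the function name `field2_ptr` are made up for illustration; the macro computes the address of a nested field from a raw pointer without reading memory or creating an intermediate reference:

```rust
use std::ptr::addr_of;

// Hypothetical layout, repr(C) so the field offsets are predictable.
#[repr(C)]
pub struct Inner {
    pub a: u8,
    pub field2: u32,
}

#[repr(C)]
pub struct Outer {
    pub pad: u64,
    pub field: Inner,
}

/// Roughly what `ptr~field~field2` is meant to express:
/// pure pointer offsetting, no load from `ptr`.
pub fn field2_ptr(ptr: *const Outer) -> *const u32 {
    // Safety: `ptr` must point into an allocation large enough for an `Outer`.
    unsafe { addr_of!((*ptr).field.field2) }
}
```

Note that the `.` steps inside the macro are all offsets on the same allocation, whereas C's `ptr->field->field2` would load a pointer from `field` and follow it.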
Meh. It’s harmless. You get a helpful compiler error the first time and you very quickly learn. I’ve been a C programmer for >20 years and this slight difference wouldn’t bother me.
The problem is not invalid code being rejected with an error, but rather code that is accepted but does something unexpected. Lints can help with that, but that has many limits.