It seems to me that there are three pieces we'd like to have, to show that we can trust Rust's borrow checker as much as we'd like to:
- a runtime model of a hypothetical dialect of Rust without a borrow checker, defining which programs have undefined behaviour because they access memory "wrongly";
- a compiler for that dialect, in practice built on LLVM, that gives LLVM permission to invoke undefined behaviour only in cases where the model says that's allowed;
- a demonstration that any program in the hypothetical dialect that can invoke undefined behaviour either uses `unsafe` or is rejected by the borrow checker.
As I understand it, this paper is a plan for providing the first, and the compiler people think it's possible to use it to provide the second.
Has the third been done, or is someone working on it?
> Has the third been done, or is someone working on it?
There has been a lot of work on this as part of the RustBelt project (that the OP is also a part of). But it's not "done" in any sense, for example traits are still unmodeled and a lot of the unsoundness of some kind or another that's been found in Rust has to do with them. The first goal is being worked on implicitly by the 'Rustonomicon' folks, who are trying to characterize "unsafe" Rust properly. Unsafe Rust is not "free of the borrow checker" in practice, but it's the part of the language that lets you do things which might be UB. So the two are obviously related.
> In addition, we plan to connect Stacked Borrows with the formal Rust type system of RustBelt [Jung et al. 2018] to verify that in fact all safe Rust programs comply with Stacked Borrows.
> a demonstration that any program in the hypothetical dialect that can invoke undefined behaviour either uses `unsafe` or is rejected by the borrow checker.
This isn’t quite what they’re doing here. They’re trying to define what guarantees the compiler should make about its treatment of unsafe code to strike a balance between usability and the ability of the compiler to optimize.
> Has the third been done, or is someone working on it?
Can this be done though? The way I understand this: Either 1. the whole model is specified and can be proven, (which wouldn't need the second step) or 2. you can only find specific counter-examples, not demonstrate there aren't any. (That would be like a solution to the halting problem) Did I miss something?
If you're trying to analyse a program statically and find some property of its runtime behaviour, the halting problem only gets in the way when you need to be accurate about both whether the runtime behaviour does or doesn't have the property.
Here, it's OK if the borrow checker rejects some programs which the stacked-borrows runtime model would say don't in fact invoke undefined behaviour, so long as it doesn't accept any which do.
(And it surely will give some rejections of that sort, for example if I violate the borrowing rules in a function which is never called.)
Also, this is the primary reason that unsafe exists— there exists code that is both useful and correct but the borrow checker can’t properly verify. It’s a marker that tells the compiler it’s not responsible for verifying that code.
To clarify, the borrow checker is not disabled in unsafe blocks. Unsafe blocks just allow raw pointer dereferencing, which can be used to bypass the borrow checker.
Didn't really get it. Does it speed programs up, or allow better guessing or when borrow checking isn't actually violated? If the former, what kind of situations is the program sped up in?
It's a paper about improving compiler optimisations by allowing Rust to tell LLVM whether two pointers will or will not alias (refer to the same block of memory). This aliasing information can provide some pretty significant speedups in a variety of common operations (for instance, memmove becomes a memcpy). I'm not a compiler developer so I unfortunately can't give you a specific example.
I believe it's about creating a set of rules that when followed allow people to write `unsafe` code safely. It sounds like currently if you write `unsafe` code, the optimizer could introduce undefined behavior. This paper is creating a set of rules so that if your `unsafe` code follows those rules, there won't be undefined behavior.
No, that's not quite right. Unsafe code written today does not by default have undefined behaviour -- if it did, it would be a useless tool. You can write programs using unsafe that have undefined behaviour (of course) but that's a bug in the code written.
This paper isn't about that (nor is it necessarily about unsafe Rust). It's about compiler optimisations. Because Rust has very clear rules about references and borrows, the idea is to come up with a model for Rust code that allows for a particular optimisation (namely, to allow telling the compiler that two pointers do not alias -- do not refer to the same memory). Obviously unsafe code would need to make sure it obeys this new memory model, otherwise the code would be invoking undefined behaviour.
Yes, and extending that model to unsafe code. It would make violations of the model have undefined results, which allows the compiler to make optimizations that assume the model is followed even in situations where that can’t be statically proved.
Hopefully the programmer will be informed whenever the compiler decides it can treat some code as "undefined"? Otherwise, it can be the source of bad results.
> Hopefully the programmer will be informed whenever the compiler decides it can treat some code as "undefined"?
That's the wrong mental model. The compiler doesn't decide to treat some code as "undefined". Instead, the compiler assumes all code is not "undefined", and generates code according to that. The compiler might notice that the code is doing bad things, and warn the developer (or even return an error), but that's not guaranteed, since undefined behavior is generally things the compiler cannot easily check.
For an example, Rust guarantees that no two mutable references can reference the same place at the same time, even in unsafe code. If a function receives a pair of mutable references, the compiler assumes that they do not overlap. If they in fact overlap, the code generated by the compiler might do the wrong thing.
"The fact is, undefined compiler behavior is never a good idea. Not for
serious projects.
Performance doesn't come from occasional small and odd
micro-optimizations. I care about performance a lot, and I actually
look at generated code and do profiling etc. None of those three
options have ever shown up as issues. But the incorrect code they
generate? It has."
In Rust, that’s more or less the meaning of the unsafe keyword. It marks code that does things the compiler can’t verify as correct, but it’s always been expected that unsafe code will maintain the invariants the Rust compiler uses. This is an attempt to formalize that general expectation.
My take on this: There are certain properties all rust code should have.
In safe rust, the compiler gives a mathematical proof this is so. In unsafe code, the compiler does not, but still assumes these properties are holding. The burden to make sure everything is correct falls on the shoulders of the programmers.
But nobody knows exactly what these properties are. There is a big part known, but the edges are fuzzy. So unsafe code is required to follow rules nobody knows.
This effort is part of the work to clarify the rules. If accepted, both the compiler and unsafe programmer knows how far they are allowed to go
No, it doesn't. There's a set of rules that are proven to have a set of desirable properties (from the viewpoint of compiler optimisations and programmer freedom, in a reasonable balance), and an accompanying interpreter that checks those rules, and turns undefined behaviour regarding aliasing into _runtime_ errors.