For any would-be readers - please ignore the clickbait-y headline! I went into this expecting to hate-read someone complaining how Rust is too hard and they don't need it anyway because they never make memory management errors in C++ or something. This is nothing like that at all!
This is an excellent article about completely different issues, much deeper and more interesting than that headline implies.
Agreed. I see this as more "foundational" thinking rather than a focus on specific languages or behaviors, which seems to be a flawed model to begin with.
> The EverCrypt project did phenomenal work producing formally verified (including side-channel resistant) cryptography libraries but a formally verified program only has the properties that are proven if the axioms hold. If you don't have memory safety then these axioms don't hold.
This is not true. You can have formally verified programs in a non-memory safe language.
The author is mixing discussions of memory-safe languages vs memory-safe programs.
They make this "anti-distinction" because they actually want to run both kinds of programs (w/ and w/o compile-time memory guarantees) in a sandbox that itself provides the memory-safe guarantees. Of course, they will sell you that sandbox.
But who wrote the sandbox, and here we are: what are its guarantees conditioned on?
> You can take for granted the fact that any out-of-bounds access and any use-after-free will trap. And that means that you can share an object with another compartment by passing it a pointer. You can rely on the fact that, if you pass another compartment a pointer to an on-stack object (or any pointer that you've explicitly marked as ephemeral), any attempt to capture that pointer will trap. You can share a read-only view of a buffer, by providing a pointer without write permission, or of a complex data structure by providing one without transitive write permission.
So, forgive my ignorance, but this is great for _your_ code, not so helpful for the code of others. They could still read the data from a buffer the not-so-safe code used to copy the data you sent it, and no amount of "marking it ephemeral" can save you unless the safety is somehow transitive. That doesn't seem possible.
> vacuum-cleaner model of software development': point your vacuum cleaner at the Internet, vacuum up all of the components you need, and ship them. Only now you can audit exactly what each of those third-party components has access to. Because even assembly code has to follow the core rules for memory safety, you can write policies about what they should be able to access and audit them before you sign a firmware image.
Yeah - fine. You allow openssl to see your data, then component X is exploited and reads data from an openssl _copy_ of that data. Right?
To me, he means that the axioms of your formally verified program do not hold in an environment without memory safety. He isn't saying your formally verified program is good in Rust but not in C; he's saying that if you put it in the same memory space as an unsafe program, all bets are off. You can formally verify, for example, the BPF engine in Linux, and some other code in the kernel can still void that guarantee.
> You can have formally verified programs in a non-memory safe language.
But you can't have formally verified libraries in a non-memory safe language; or rather, when you use such a library in a non-memory safe language, the proofs don't carry over because the ambient program may break the underlying assumptions at any point in execution.
It doesn't matter if the library carefully tracks its buffers if the main program can accidentally overwrite the library's book-keeping structures between two calls into the library.
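A minimal sketch of that failure mode (the names `verified_append` and `State` are made up for illustration): the library's own code never steps out of bounds, but an out-of-bounds write elsewhere in the same address space can land on the library's bookkeeping between two calls, so the properties proven about it stop holding at runtime.

```c
#include <string.h>

/* Hypothetical "formally verified" library. Every proof about it assumes
 * nothing else ever writes to `state`. */
struct State { size_t len; size_t cap; char buf[64]; };
static struct State state = { 0, 64, {0} };

void verified_append(char c) {
    if (state.len < state.cap)        /* the bounds check the proofs rely on */
        state.buf[state.len++] = c;
}

/* Memory-unsafe code living in the same address space. */
char scratch[16];

int main(void) {
    verified_append('a');
    /* Out-of-bounds write: with an unlucky layout this smashes state.len
     * and state.cap, so the next call's bounds check reasons from garbage.
     * The library did nothing wrong; its axioms were broken from outside. */
    memset(scratch, 0xff, 256);
    verified_append('b');
    return 0;
}
```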
"If addition doesn't work, I have no way of reasoning about anything in a program."
Hardware addition is filled with edge cases that cause it to not work as expected; I don't see it as that much different from the memory safety edge cases in most programming models. So by corollary is there no way to reason about any program that uses hardware addition?
>Hardware addition is filled with edge cases that cause it to not work as expected
I am not aware of a single case where integer addition does not work as expected at the hardware level. Of course if you have a different expectation than what the hardware does it could be called "unexpected", but I would classify this more as "ignorance". I think we can reword it as "addition must be predictable and consistent".
This is not the case in standard C, because addition can produce a value that, due to hardware semantics, violates the axioms the compiler assumes it obeys. Most C compilers allow you to opt into hardware semantics for integer arithmetic, at which point you can safely use the result of adding any two integer values. That is really the crux of the matter here: if I write "a = b + c", can I safely examine the value "a"? In C, you cannot. You must first make sure that b and c fulfil the criteria for safe use with the "+" operator. Or, as is more usual, close your eyes and hope they do. Or just turn on hardware semantics and only turn them off for specific very hot regions of the code where you can actually prove that the input values obey the required criteria.
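A small sketch of the two options, "close your eyes and hope" versus "make sure b and c fulfil the criteria" (the function names are mine; the overflow check is the usual portable idiom, not anything from the article):

```c
#include <limits.h>
#include <stdio.h>

/* Undefined behavior if b + c overflows int: under default compiler
 * settings the result, and the rest of the program, has no defined meaning. */
int add_hopeful(int b, int c) { return b + c; }

/* The "check the criteria first" version: rule out overflow before adding. */
int add_checked(int b, int c, int *out) {
    if ((c > 0 && b > INT_MAX - c) || (c < 0 && b < INT_MIN - c))
        return 0;                        /* would overflow: refuse */
    *out = b + c;
    return 1;
}

int main(void) {
    int a;
    if (add_checked(INT_MAX, 1, &a))
        printf("sum = %d\n", a);
    else
        printf("refused: would overflow\n");
    return 0;
}
```

Compiling with -fwrapv on GCC or Clang is the "opt into hardware semantics" route: signed overflow then wraps in two's complement and add_hopeful becomes well-defined for all inputs, at the cost of some optimizations.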
The difference is that hardware will treat addition in a predictable and consistent manner, but C and C++ will not. In C and C++, if you overflow two signed integers, the behavior of your entire program is undefined, not simply the result of the addition.
That's the difference between something that is safe and local but perhaps unexpected because you lack knowledge of how it works, and something that is unsafe because there is absolutely no way to know what will happen and the consequences can be global.
> The difference is that hardware will treat addition in a predictable and consistent manner, but C and C++ will not. In C and C++, if you overflow two signed integers, the behavior of your entire program is undefined, not simply the result of the addition.
On the same hardware, yes, but the same C or C++ program may behave differently on different hardware specifically because the C abstract machine doesn't define what's supposed to happen. This leaves it up to (to your point) the compiler or the hardware what happens in, e.g., an overflow condition.
If you're planning on running the program on more than one CPU revision then I'd argue it introduces a similar level of risk, although one that's probably less frequently realised.
Leaving the behavior up to the compiler (or hardware) is not undefined behavior; that is unspecified or implementation-defined behavior.
Undefined behavior really does mean that the program has no valid semantics. There is no proper interpretation of how the program is to behave if signed overflow happens. It's not simply that the interpretation is beyond the language and left to the hardware or the operating system; it's that the standard places no requirements on the program's behavior at all.
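A concrete consequence, hedged as typical GCC/Clang behavior at -O2 rather than anything guaranteed (the function names are made up): because signed overflow is undefined, the compiler may assume it cannot happen and delete code that only matters when it does.

```c
#include <limits.h>
#include <stdio.h>

/* No defined execution can overflow x + 1, so an optimizing compiler is
 * entitled to fold this to "return 1" and delete the comparison, even
 * though the programmer wrote it as an overflow check. */
int still_fits(int x) {
    return x + 1 > x;
}

/* Contrast with implementation-defined behavior: converting an
 * out-of-range value to int yields some implementation-chosen result
 * (or raises an implementation-defined signal), so the program keeps a
 * meaning. Signed overflow gives it none. */
int narrow(long v) {
    return (int)v;
}

int main(void) {
    printf("%d\n", still_fits(INT_MAX));  /* often 1 at -O2, may be 0 at -O0 */
    return 0;
}
```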
Google also makes claims[1] corroborating this ratio in their own work. It's an endless ammo supply for the memory safety argument. (The linked story doesn't argue otherwise, despite the headline.)
That is correct. Measurements that include all vulnerabilities put memory safety ones somewhere in the ballpark of 15%. A pretty extreme minority overall.
This doesn’t gel with the hive mind though, so you probably won’t see it pop up terribly often.
Then you get headlines like this: "Microsoft: 70 percent of all security bugs are memory safety issues", which is flat-out wrong and is contradicted by the first line of the article itself.
The HN post that we are commenting on even says "To me, the fact 70% of security vulnerabilities arise from a lack of memory safety is not the reason that memory safety is important." which is also false and certainly needs clarification.
Perhaps the correct statement is something like "70% of security vulnerabilities in large projects written in C or C++ are memory safety issues", with the understanding that there is a lot of software that is subject to that statistic and a lot that isn't.
Achieving memory safety is fine, but there are other kinds of remote code execution vulnerability: for example, when the program blithely accepts some data from an untrusted source and then executes it. We should seek to mitigate all such attacks, not just the sexy one.
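A deliberately bad, minimal sketch of that class of bug: nothing here violates memory safety, yet the program hands execution over to whatever an untrusted peer sends it.

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char line[256];
    /* Read a "command" from an untrusted source (stdin standing in for the
     * network) and execute it. Every access is in bounds; the vulnerability
     * is in the design, and memory safety does nothing to flag it. */
    if (fgets(line, sizeof line, stdin))
        system(line);
    return 0;
}
```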
A big blue button called "Continue" will in fact download and run a LinkedIn app (I think). To actually "Continue" (i.e. carry on reading using Chrome on my Android device) you have to press the other, less obvious button. How ridiculous is that? Dark patterns from ostensibly reliable actors are really annoying.