Sandboxed is better than unsandboxed, but don't mistake it for being secure. A sandboxed JSON parser can still lie to you about what's been parsed. It can exfiltrate data by copying secrets into other JSON fields that your application makes publicly visible; e.g. your config file may contain a DB access secret alongside a name to use in the From header of outgoing emails. It can mess with your API calls and make some /change-password call use the attacker's password, etc.
You seem to have a very narrow understanding of the utility of language purity and effects systems.
Yes, the parser can lie to you. But the actual lying can only depend on the code you are parsing. No, it can't just exfiltrate data by copying it into other messages.
I've said fields, not messages. It can exfiltrate data by copying it between fields of the single message it parses.
Imagine a server calling some API and getting `{"secret":"hunter2"}` response that isn't supposed to be displayed to the user, and an evil parser pretending the message was `{"error":{"user_visible_message":"hunter2"}}` instead, which the server chooses to display.
I'm trying to puzzle this one out a bit. Who are the good and bad actors in this threat model?
I wrote a server:
myServer = do
  fetched :: Bytes <- fetchFromExternalApi
  let parsed :: SecretResponse = jsonParse fetched
  return parsed
This code is all mine except for jsonParse, which I imported from a nefarious library. If jsonParse returns a SecretResponse, the code will compile. If jsonParse returns an ErrorResponse, it won't compile.
In more mature implementations a simple "doesn't parse" doesn't cut it. You may want to get specific error codes to know if you should retry the request, or blame the user for bad inputs, or raise an alarm because the API changed its schema unexpectedly. You'll also want to report something helpful to the end users, so they can understand the issue or at least have something useful to forward to your tech support, so you don't just get "the app is borken!! don't parse!!11".
JSON APIs often have a concept of an envelope that gives them a standard way to report errors and do pagination, so the message would have been parsed as some Envelope<SecretResponse>, or reparsed as an ErrorResponse if it didn't parse as the expected kind.
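To make the shape of that attack concrete, here's a minimal Rust sketch (hypothetical Envelope and SecretResponse types, and a toy json_parse; none of this is from a real library). The parser's signature is exactly what the caller asked for, so everything compiles, yet it routes the secret into the one field the caller treats as safe to display:

    #[derive(Debug)]
    enum Envelope<T> {
        Ok(T),
        Error { user_visible_message: String },
    }

    #[derive(Debug)]
    struct SecretResponse {
        secret: String,
    }

    // A well-typed but nefarious parser: it claims the response was an error,
    // and uses the secret as the "user visible" error text.
    fn json_parse(raw: &str) -> Envelope<SecretResponse> {
        // stand-in for real JSON parsing: just pull out the secret value
        let secret = raw
            .split("\"secret\":\"")
            .nth(1)
            .and_then(|rest| rest.split('"').next())
            .unwrap_or_default()
            .to_string();
        Envelope::Error { user_visible_message: secret }
    }

    fn main() {
        // The caller dutifully shows "user visible" errors to the end user.
        match json_parse(r#"{"secret":"hunter2"}"#) {
            Envelope::Ok(r) => println!("got: {r:?}"),
            Envelope::Error { user_visible_message } => {
                println!("error shown to user: {user_visible_message}")
            }
        }
    }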
JSON is used in lots of places where lying about the content could cause trouble, and this is just one hypothetical example. I just want to bring attention to the class of attacks where a malicious dependency can lie through its normal API, and may have opportunity to turn its caller into a Confused Deputy instead of having to break out of the sandbox itself.
The change itself was very reasonable. They only missed the mark on how the change was introduced. They should have held it back until the next Rust edition, or at least delayed it a few releases to give users of the one affected package time to update.
The change was useful, fixing an inconsistency in a commonly used type. The downside was that it broke code in 1 package out of 100,000, and only broke a bit of useless code that was accidentally left in and didn't do anything. One package just needed to delete 6 characters.
Once the new version of Rust was released, they couldn't revert it without risk of breaking new code that may have started relying on the new behavior, so it was reasonable to stick with the one known problem rather than potentially introduce a bunch of new ones.
But that is not how backwards compatibility works. You do not break user space. And user space is pretty much out of your control! As a provider of a dependency you do not get to play such games with your users. At least not when those users care about reliability.
The meaning of this code has not changed since Rust 1.0. It wasn't a language change, nor even anything in the standard library. It's just a hack that the poster wanted to work, and then realized won't work (it never worked).
This is the equivalent of a C user saying "I'm disappointed that replacing a function with a macro is a breaking change".
Rust has had actual changes that broke people's code. For example, any ambiguity in type inference is deliberately an error, because Rust doesn't want to silently change the meaning of users' code. At the same time, Rust doesn't promise it will never create a type inference ambiguity, because that would make any change to traits in the standard library almost impossible. It's a problem that happens rarely in practice, can be reliably detected, and is easy to fix when it happens, so Rust chose to exclude it from the stability promise. They've usually handled it well, but recently miscalculated with "only one package needed to change code, and they've already released a fix", forgetting to give users enough time to update the package first.
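A sketch of the class of breakage being described (illustrative local types, not the actual incident): code that compiles only because exactly one impl applies, which a later, otherwise additive impl turns into an inference error.

    struct Histogram(Vec<u32>);

    impl FromIterator<u32> for Histogram {
        fn from_iter<I: IntoIterator<Item = u32>>(iter: I) -> Self {
            Histogram(iter.into_iter().collect())
        }
    }

    fn total(h: Histogram) -> u32 {
        h.0.iter().sum()
    }

    fn main() {
        // The literals in `1..5` carry no type annotation; this compiles only
        // because Histogram currently has exactly one FromIterator impl, so
        // inference settles on u32. If a later release of this (hypothetical)
        // library added `impl FromIterator<u64> for Histogram`, this line would
        // stop compiling until the caller wrote `(1u32..5)`, even though no
        // existing API was removed or changed.
        let t = total((1..5).collect());
        println!("{t}");
    }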
> has to specify whether function parameters are passed by value or by reference
You specify whether arguments are borrowed or moved, and whether the access is shared or exclusive. This is not an implementation detail; it's an API contract that affects the semantics of the program. It adds or removes restrictions on the caller's side, and controls memory management and thread safety.
People unfamiliar with Rust very often misunderstand Rust's borrowed/owned distinction as reference/value, but in Rust these two aspects are orthogonal: Rust also has owning reference types, and borrowing types passed by copying. This misunderstanding is the major reason why novices "fight the borrow checker", because they try to avoid copying, but end up avoiding owning.
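A small sketch of that orthogonality (illustrative functions, not from any particular API). Moving vs borrowing is about ownership, not about whether a pointer is involved:

    fn borrow_shared(s: &str) -> usize {
        s.len() // caller keeps ownership; many shared borrows can coexist
    }

    fn borrow_exclusive(s: &mut String) {
        s.push('!') // caller keeps ownership, but no other access during the borrow
    }

    fn take_ownership(s: String) -> usize {
        s.len() // moved: the caller can't use its `s` afterwards
    }

    fn take_owning_reference(b: Box<[u8]>) -> usize {
        b.len() // a pointer, but an owning one: passing it is still a move
    }

    fn main() {
        let mut s = String::from("hi");
        borrow_shared(&s);            // borrowed, `s` still usable
        borrow_exclusive(&mut s);     // borrowed exclusively, `s` still usable
        take_ownership(s);            // moved; using `s` afterwards is a compile error
        take_owning_reference(vec![1u8, 2, 3].into_boxed_slice()); // owning reference type
    }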
There are different possible approaches to achieving similar results for argument passing, but Rust prefers to be explicit and give low-level control. For example, Mutable Value Semantics is often cited as an alternative design, but it can't express putting temporary loans in structs. The syntax needs a place to declare lifetimes (loan scopes), as otherwise implicit magic makes working with view types tricky or impossible: https://safecpp.org/draft-lifetimes.html
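For the "loans in structs" point, a minimal sketch (hypothetical Parser type): the lifetime parameter is the declared scope of the stored loan, which is the piece of syntax the comment says designs without explicit lifetimes struggle to express.

    // A borrow of someone else's buffer, kept in a field for later use.
    struct Parser<'src> {
        remaining: &'src str,
    }

    impl<'src> Parser<'src> {
        fn new(input: &'src str) -> Self {
            Parser { remaining: input }
        }

        // Tokens borrow from the original buffer, not from the Parser itself.
        fn next_token(&mut self) -> Option<&'src str> {
            let mut parts = self.remaining.splitn(2, ' ');
            let token = parts.next()?;
            self.remaining = parts.next().unwrap_or("");
            if token.is_empty() { None } else { Some(token) }
        }
    }

    fn main() {
        let buffer = String::from("let x = 1");
        let mut parser = Parser::new(&buffer); // the loan can't outlive `buffer`
        while let Some(token) = parser.next_token() {
            println!("{token}");
        }
    }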
Linus said he's worried that he and core kernel maintainers are getting old, and there may not be enough younger contributors to replace them. There's an implication there that Rust will continue to grow and attract more talent, and it'll get harder and harder to find devs passionate about old C codebases.
It's absolutely wild to me how this is not talked about more. All it takes is for a set of highly praised institutions to use Rust for their major courses, and then you will have generations for which C is just something you know about.
The Rust project started 15 years ago, and spent a decade growing its userbase and library ecosystem and proving that it's a serious language that's here to stay (which some C maintainers still don't believe).
We don't have a Rust-killer language yet. The closest one is SafeC++ (Circle), but it's still a single-dev proof of concept, and the C++ leadership firmly rejected it. Zig went in a different direction. Swift is adding Rust-like features, but it's unclear if that's going to be compelling. Ownership and borrowing are spreading to Mojo and OCaml, but those aren't kernel languages.
Even if a Rust-killer appears tomorrow, it will go through the same growing pains of rewriting everything and being treated as just temporary hype. It will have to justify why anyone should use the new language instead of Rust, which is already here and will have had even more time to establish itself.
Rust's type-system analysis is smart, but it isn't restricted to the language per se; see https://github.com/ityonemo/clr. One merely needs proper namespacing and the necessary type info from generic code to attach the annotations and solve them. In Rust these things are solved via the trait system.
Retroactively patching C to have namespaces will not work, and the same holds for generics, i.e. for how, concretely, to attach lifetimes to generic code.
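A sketch of what "lifetimes attached to generic code" looks like when the type system carries that info through traits (illustrative trait, not from clr or the Rust standard library):

    trait Select<'a> {
        type Out;
        fn select(&self, data: &'a [u8]) -> Self::Out;
    }

    struct FirstByte;

    impl<'a> Select<'a> for FirstByte {
        type Out = &'a u8;
        fn select(&self, data: &'a [u8]) -> &'a u8 {
            &data[0]
        }
    }

    // Generic code never names a concrete selector type, yet the borrow
    // relationship between `data` and the result is still tracked, because
    // the trait bound carries the lifetime.
    fn run<'a, S: Select<'a>>(selector: S, data: &'a [u8]) -> S::Out {
        selector.select(data)
    }

    fn main() {
        let data = vec![1u8, 2, 3];
        let first = run(FirstByte, &data);
        println!("{first}");
    }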
> Zig went in a different direction.
There is stuff cooking for debugging comptime and for better-than-LSP info, but this is only WIP and hearsay. It might or might not be enough to build external static analysis on.
There was also Cyclone before Rust, and Checked C in the meantime. Concepts like regions and affine types existed long before Rust, and Rust started out by copying from older languages (http://venge.net/graydon/talks/intro-talk-2.pdf).
It's not enough to have just a proof-of-concept compiler that can match Rust's checks; that's where Rust was 10 years ago. Rust has had time to polish its compiler, expand tooling, integrations, and platform support, attract contributors, grow its userbase, create learning materials, etc. To displace the Rust of today, it's not enough to match that old starting point; you have to offer something better by a margin large enough to offset the cost and risk of switching to a less mature language. That's the same problem Rust faces when trying to displace the even more established C and C++.
Yes, I do agree with you. I just wanted to point out that it may still be possible in principle, and as you can see with development speed, the build system, dependencies, etc., there is a potential angle.
It's valid to assume "it will never happen" for 128 bits or more (if the hash function isn't broken), since the chance of a random collision is astronomically small, but a collision in 64 bits is within the realm of possibility (roughly a 50% chance of hitting a dupe among 2^32 items).
The birthday paradox is a thing. If you have 128 bits of entropy, you expect to hit the 50% collision mark after a number of items proportional to a 64-bit keyspace, not a 128-bit one. 64 bits is a lot, but in my current $WORK project, if I only had 128 bits of entropy the chance of failure in any given year would be 0.16%. That's not a lot, but it's not negligible either.
Bigger companies care more. Google has a paper floating around about how "64 bits isn't as big as it used to be" or something to that effect, complaining about how they're running out of 64-bit keys and can't blindly use 128-bit random keys to prevent duplication.
> bits of entropy
Consumer-grade hash functions are often the wrong place to look for best-case collision chances. Take, e.g., the default Python hash function, which hashes each integer to itself (mod 2^64). The collision chance for truly random data is sufficiently low, but every big dictionary I've seen in a real-world Python project has had a few collisions. Other languages usually make similar tradeoffs (almost nobody uses cryptographic hashes by default since they're too slow). I wouldn't, by default, trust a generic 1-million-bit hash to not have collisions in a program of any size and complexity. 128 bits, even with execution counts low enough for the math to otherwise work out, is also unlikely to pan out in the real world.
I agree that 128 bits is on the lower end of "never", but you still need to store trillions of hashes to have a one-in-a-trillion chance to see a collision (and that's already the overall probability, you don't multiply it by the number of inserts to get 1:1 chance :)
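For a quick sanity check of those numbers, the usual birthday approximation is p ≈ 1 - exp(-n^2 / 2^(b+1)) for n items and b-bit hashes; a small sketch (the item counts are just the ballpark figures from this thread):

    fn collision_probability(items: f64, bits: f64) -> f64 {
        // p = 1 - exp(-n^2 / 2^(b+1)); exp_m1 keeps precision for tiny probabilities
        let exponent = -(items * items) / (2.0 * 2f64.powf(bits));
        -exponent.exp_m1()
    }

    fn main() {
        // 64-bit hashes: ~5 billion items (about 1.18 * 2^32) crosses the 50% mark
        println!("{:.2}", collision_probability(5.1e9, 64.0));
        // 128-bit hashes: ~26 trillion items for roughly a one-in-a-trillion chance
        println!("{:.1e}", collision_probability(2.6e13, 128.0));
    }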
I don't think anybody in the world has ever seen a collision of a cryptographically strong 128-bit hash that wasn't a bug or attack.
Birthday paradox applies when you store the items together (it's the chance of collision against any existing item in the set), so it isn't driven by overall annual hashing churn (more hashes against a smaller set don't increase your collision probability as quickly).
Based on currently available public estimates, Google stores around 2^75 bytes, most of that backed by a small number of very general-purpose object stores. A lot of that is from larger files, but you're still approaching birthday-paradox numbers for in-the-wild 128-bit hash collisions.
Hashtables have collisions because they don't use all the bits of the hash; they calculate index = hash % capacity. It doesn't matter how you calculate the hash: if you have only a few places to insert an item, items will collide.
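A tiny illustration of that point (toy numbers; the identity hashing mirrors the Python example above): two keys with different 64-bit hashes still land in the same bucket of a small table.

    fn bucket(hash: u64, capacity: u64) -> u64 {
        hash % capacity // only log2(capacity) bits of the hash actually matter
    }

    fn main() {
        let capacity = 16;
        // With Python-style identity hashing of small integers (hash(n) == n),
        // the keys 7 and 23 have different hashes but share a bucket.
        assert_eq!(bucket(7, capacity), bucket(23, capacity));
        println!("both keys map to bucket {}", bucket(7, capacity));
    }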
Right, but the problem they were describing was storing the "hash" in a hash table, not storing the item using a hash. For that, it absolutely matters, and the fact that it was a 128-bit hash IMO isn't good enough because the hash function itself likely sucks.
Storing values at the edges of cells makes the math simpler, but unfortunately makes a GPU implementation harder.
In this setup one edge update affects two cells, so the cells are no longer trivially independent. It's still possible to update cells in parallel, but it requires splitting the update into two checkerboard-like passes.
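A sketch of that checkerboard idea in 1-D (illustrative only, not the actual simulation code): each edge update touches cells i and i+1, so the sweep is split into an even-edge pass and an odd-edge pass, and within a pass no two updates share a cell, so each pass could run in parallel.

    fn relax_edge(cells: &mut [f64], i: usize) {
        // move flow across the edge between cells i and i+1
        let flow = 0.5 * (cells[i] - cells[i + 1]);
        cells[i] -= flow;
        cells[i + 1] += flow;
    }

    fn step(cells: &mut [f64]) {
        // pass 0: edges starting at even cells; pass 1: edges starting at odd cells
        for parity in [0, 1] {
            for i in (parity..cells.len() - 1).step_by(2) {
                relax_edge(cells, i); // independent within a pass, so parallelizable
            }
        }
    }

    fn main() {
        let mut cells = vec![4.0, 0.0, 0.0, 0.0];
        for _ in 0..8 {
            step(&mut cells);
        }
        println!("{cells:?}"); // heights even out while the total stays constant
    }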
It's true that it makes it more complicated on the GPU side in general, but specifically in this scenario everything works out just fine, mostly because all updates effectively ping-pong between flow & water height buffers, and you never change both in the same kernel.
C doesn't have the syntax or type system to express ownership, lifetimes, and thread safety, but Rust does.
Personally I'd love it if C were extended to be able to define APIs more precisely, but that's not a realistic option at the moment.
There isn't even any vendor-specific solution to this in C (the hodgepodge of attributes barely scratches the surface). Linux would have to invent its own interface definition language and make C and Rust adopt it. That means more changes for the C maintainers.
The C standard moves very slowly, and avoids making non-trivial changes. I'm still waiting for slices (pointer+length type).
Rust is here today, and won't stop for a dream of C that may or may not happen in 2050.
C has been extended in *exactly* that way using the `__attribute__((...))` mechanism in GCC and clang. One does not need to wait for a standards body to do this.
This is one of the ways in which the static analyzer in clang works and they’re very open to further such extensions, which can be easily wrapped in macros so as to not affect compilers that don’t support them.
Do not let the perfect be the enemy of the good. It’s absolutely a realistic option, and it’s one that should be pursued if kernel code quality checking via static analysis is important.
Nothing in GCC or clang that exists today is anywhere near the level of expressiveness that Rust uses in its APIs (e.g. lifetimebound in [1]).
The current problem is not technical; it's maintainers not wanting to even think about another language in their C codebase. Adding a non-standard foreign language's semantics on top of C, not supported natively by C compilers, and forced into an even uglier syntax than Rust's own, would not make C maintainers happy.
You could add an __attribute__ that's equivalent to a code comment containing a Rust type, but that would also be strictly worse: less useful to Rust devs, and still annoying to C maintainers, who wouldn't want to maintain it or worry about the attributes breaking another compiler for a different language.
I don't think it's feasible to use attributes to add functionality that actually does something meaningful and doesn't look comically bad at the same time. Look at the zoo of annotations that clang had to add to work around the lack of a built-in slice type in C, and that's just a simple thing that's written `[T]` in Rust:
https://clang.llvm.org/docs/BoundsSafety.html
Lifetime annotations are annoyingly noisy and viral even in Rust itself, which has first-class syntax for them, plus generics and type inference to hide the noise. In C they'd be even noisier, and would have to be added in more places.
You can't really take safety annotations that are for Rust and reuse them for static analysis of C code itself. They're for a boundary between languages, not for the C implementation. Rust itself is essentially a static analyzer, with everything in the language designed from the ground up for the static analyzer. C wasn't designed for that. If you transplant enough of Rust's requirements to C, you'll make C maintainers write Rust, but using a botched C syntax and a C compiler that doesn't understand the actual language they're writing.
Static analysis is fundamentally limited by C's semantics (limited by undecidability, not mere engineering challenges of implementing a Sufficiently Smart static analyzer). It's not something that can be easily solved with an attribute here and there, because that will be brittle and incomplete[2][3]. Static analysis at that level requires removing a lot of flexibility from C and adding new formalisms, which again would not make C maintainers happy, given that they already resist even the tiniest changes to their code to align better with non-C restrictions.