"The purpose of this event is to provide information on the TRACTOR technical goals and challenges, address questions from potential proposers, and provide an opportunity for potential proposers to consider how their research may align with the TRACTOR program objectives."
That sounds ... hard. Especially since idiomatic Rust, as written by skilled programmers, looks nothing like C, and most interesting code is written in C++ anyway.
Isn't it equivalent to statically determining the lifetimes of all allocations in the C program, including those implemented using custom allocators or those that cross into proprietary libraries? There's been a lot of research into this sort of thing over the years without much success. C/C++ programs can do things like tie allocation lifetimes to what buttons a user clicks, without ref counting or other mechanisms to ensure safety. It's not a good idea, but they can do it.
The other obvious problem with trying to write such a static analysis is that the programs you're analyzing are by definition buggy and the lifetimes might not make sense (if they did, they wouldn't have memory safety holes and wouldn't need to be replaced). The only research I've seen on this problem of statically detecting what lifetimes should be assumes the code being analyzed is actually correct to begin with. I guess you could aim for a program that detects where lifetimes can't be worked out and asks the developer for help, though.
DARPA is basically a state-sponsored VC that optimizes for completely different things. Instead of looking for 100x financial returns, they want technical advantages for the United States. The "moat" is the hardness of developing and operationalizing those technologies first.
Decades ago, as my father explained to me, ARPA (no "D" at that time) was happy if 1% of their projects went all the way through to successful deployment. If they had a higher success rate it would mean they weren't aiming high enough.
Yeah, I meant by number. But also: ARPA didn't commercialize the Internet! They explicitly refused to commercialize it; commercialization only happened after an Act of Congress induced interconnections between NSFNET and commercial networks.
I can't find any clear references to DARPA (or ARPA) being involved in Ada's development. It was a DoD program but, well, the DoD is notoriously large and multi-headed.
(But even if DARPA was involved in Ada: I think it's clear, at this point, that Ada has been a resounding success in a small number of domains without successfully breaking into general-purpose adoption. I don't have a particular value judgment associated with that, but from a strategic perspective it makes a lot of sense for DARPA to focus program analysis research on popular general-purpose languages -- there's just more labor and talent available.)
it was depressing when RH dropped ada support.
sure, it was gcc, but it was so nice to have an ada compiler part of the default gcc installation.
gnat needs money. well deserved. but adoption needs a free, easy to install compiler.
5 years ago i had the pleasure of resurrecting a dead system. it was about 30k of ada, let's call it ada 87 (!). unknown compiler, 32 bit, 68K processor, 16 MB memory, unknown OS.
code was compiling in 2 days, running in 2 weeks. i needed to change from using 32 bit floats to 64 bit floats (seems positional data is a little more accurate in 2020). 1 declaration in 1 package spec and a recompile, and all my positions are good.
npm ERR! install Couldn't read dependencies
npm ERR! package.json ENOENT, open '/boeing/787-9/flaps-up.json'
npm ERR! package.json This is most likely not a problem with npm itself.
npm ERR! package.json npm can't find a package.json file in your current directory.
speaking of hard, the DOE actually funds a project that has been around for 20+ years now (ROSE) that involves (among other things) doing static analysis on and automatically translating between C/C++/Cuda and even high level languages like Python as well as HPC variants of C/C++. They have a combined AST that supports all of those languages with the same set of node types essentially. Quite cool. I got to work on it when I was an intern at Livermore, summer of 2014.
I have already seen legacy projects that were designed using Rational Rose, but for some reason I thought it was only a commercial name, not an actual system. Thanks, I learned something today!
Most of what they use it for is static analysis, but the funding comes from its ability to translate old simulation code to HPC-ready code. I think they even support fortran IIRC
I have to imagine that in the general case it will be a translation to unsafe Rust, with occasional isolated leaf nodes being translated to safe Rust.
If you think it's hard wrestling with the borrow checker, just imagine how much harder it is to write an automatic translation to borrow-checker-approved code that accounts for all the possible program space of C and all its celebrated undefined behavior. A classic problem of writing compilers is that the space of valid programs is much larger than the space of programs which will compile.
A quick web search reveals some other efforts, such as c2rust [1]. I wonder how TRACTOR differs.
> I have to imagine that in the general case it will be a translation to unsafe Rust, with occasional isolated leaf nodes being translated to safe Rust.
That’s not what they are aiming for. FTA: “The goal is to achieve the same quality and style that a skilled Rust developer would produce”
> just imagine how much harder it is to write automatic translation to borrow-checker-approved code that accounts for all the possible program space of C and all it's celebrated undefined behavior
Nitpick: undefined behavior gives the compiler leeway in deciding what a program does, so the more undefined behavior a C program invokes, the easier it is to translate its code to rust.
(Doing that translation in such a way that the behavior remains what gcc, clang or “most C compilers” do may be harder, but I’m not sure of that)
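To make the "leeway" point concrete, consider signed overflow, which is UB in C. A translator is free to pick any of these semantics for a C `x + 1` (a toy sketch of mine, not anything TRACTOR has committed to); only the first matches what most compilers happen to emit on today's hardware:

// Three defensible Rust translations of C's `x + 1` on a signed int:
fn bump_wrapping(x: i32) -> i32 {
    x.wrapping_add(1) // two's-complement wraparound, what C backends usually emit
}

fn bump_checked(x: i32) -> Option<i32> {
    x.checked_add(1) // surface the overflow to the caller
}

fn bump_saturating(x: i32) -> i32 {
    x.saturating_add(1) // clamp at i32::MAX
}

fn main() {
    assert_eq!(bump_wrapping(i32::MAX), i32::MIN);
    assert_eq!(bump_checked(i32::MAX), None);
    assert_eq!(bump_saturating(i32::MAX), i32::MAX);
}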
> undefined behavior gives the compiler leeway in deciding what a program does, so the more undefined behavior a C program invokes, the easier it is to translate its code to rust.
That's the kind of language lawyer approach that caused a rebellion in the last decade amongst C programmers against irresponsible compiler optimizations. "Who cares if your program actually works as intended? My optimization is legal according to the standard, it's your program that's written to exploit loopholes".
I don't see any evidence that that's the attitude being taken by TRACTOR — I sure hope it isn't. But hell, even if the result is unreliable in practice, I suppose that if somebody gets to claim "it works" then the incentives are aligned to produce garbage.
> Who cares if your program actually works as intended? My optimization is legal according to the standard, it's your program that's written to exploit loopholes".
If your program invokes undefined behaviour, it's invalid and non-portable. Out of bounds array accesses are UB, yet a program containing them may just happen to work.
It won't be portable even between different compiler versions.
The C standard is a 2-way contract: the programmer doesn't produce code that invokes undefined behaviour, and the compiler returns a standard-conforming executable.
If undefined behavior is invalid, then reject the program instead of "optimizing" it. This "oh look undefined behavior I'm gonna turn the entire function into a no-op" nonsense is completely unacceptable. It's adversarial and borders on malicious. Null pointer check deletion can turn bugs into exploitable vulnerabilities.
> If undefined behavior is invalid, then reject the program instead of "optimizing" it.
Undefined behavior is usually the result of a runtime situation; it is usually not obvious from the code alone whether it could or could not happen, so the compiler cannot reject the program.
The 'UB-based' optimization is just the assumption that the code is correct and therefore the UB situation cannot happen at runtime.
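The same contract shows up in unsafe Rust, which makes for a compact illustration (a contrived sketch; `get_unchecked` is documented as UB on an out-of-bounds index):

fn read_then_check(v: &[u32], i: usize) -> u32 {
    // Caller promises i < v.len(); violating that promise is UB.
    let x = unsafe { *v.get_unchecked(i) };
    // Having assumed the line above is correct, an optimizer may treat
    // `i >= v.len()` as unreachable and delete this branch entirely:
    if i >= v.len() {
        return 0;
    }
    x
}

The compiler isn't detecting UB here; it's assuming the absence of UB, exactly as described above.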
The C++ forward progress guarantee enables more optimizations since it allows the compiler to reason more easily about loops:
> The standards added the forward progress guarantees to change an optimization problem from "solve the halting problem" to "there will be observable side effects in the forms of termination, I/O, volatile, and/or atomic synchronization, any other operation can be reordered". The former is generally impossible to solve, whereas the latter is eminently tractable.
But yeah, that's one of the more foot-gunny UB rules that Rust does not have. But it does mean Rust doesn't mark functions as `mustprogress` in LLVM IR, which means it misses out on whatever optimizations that enables.
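For contrast, a sketch of the Rust side (this is a well-defined, if useless, Rust program, while the analogous side-effect-free loop in C++ is UB and may be removed):

// rustc cannot tag this `mustprogress` for LLVM: a side-effect-free
// infinite loop is legal Rust and really must loop forever.
fn spin() -> ! {
    loop {}
}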
> This "oh look undefined behavior I'm gonna turn the entire function into a no-op" nonsense is completely unacceptable. It's adversarial and borders on malicious.
You significantly underestimate how much UB people write and overestimate how good the end result would be if the current approach were not taken.
The C standard with its extensive undefined behavior causes programmers and compiler writers to be at odds. In a sane world, "undefined behavior" wouldn't be assumed to mean "the programmer must have meant for me to optimize this whole section of code away". We aren't on the same team, even if I believe that all parties are acting with the best of intentions.
I don't feel that the Rust language situation incentivizes such awful conflict, and it's one of many reasons I now try really hard to avoid C and use Rust instead.
A funny thing about this problem is that it gets worse the more formally correct your implementation is. Undefined behavior is undefined, so it's outside the model, and if your program is a 100% correct implementation of a model then how can it know what to do about something outside it?
But I don't think defining all behavior helps. The defined behavior could be /wrong/, and now you can't find it because the program using it is valid, so it can't be detected with UBSan.
Doing one funny thing on platform A and a different funny thing on platform B when an edge case arises is way better than completely deleting the code on all platforms with no warning.
> I don't see any evidence that that's the attitude being taken by TRACTOR — I sure hope it isn't.
I don’t see any way it can do otherwise. As a simple example, what would one translate this C statement to:
int i;
…
i = abs(i);
? I would expect TRACTOR to generate (assuming 64-bit integers):
let i: i64;
…
i = i.abs();
However, that can panic in debug mode and return a negative number in release mode (https://doc.rust-lang.org/stable/std/primitive.i64.html#meth...), and there’s no way for TRACTOR to know whether that makes the program “work as intended”. That code may have worked (fine, or fine enough) for decades because its standard library returns zero for abs(INT_MIN).
It's possible to preserve the semantics of the original program using unsafe Rust. [1]
unsafe {
    let mut i: std::os::raw::c_int
        = std::mem::MaybeUninit::uninit().assume_init();
    // ...
    i = libc::abs(i);
}
That's grotesque, but it is idiomatic Rust insofar as it lays bare many of the assumptions in the C code and gives the programmer the opportunity to fix them. It is what I would personally want TRACTOR to generate if it could not prove that `i` can never take on the value `libc::INT_MIN`.
Given that generated code, I could then piecemeal migrate the unsafe bits to cleaner, idiomatic safe rust: possibly your code but more likely `i::wrapping_abs()` or similar.
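Concretely, that migration might land on something like this (my sketch; `wrapping_abs` is the right pick only if the original code really relied on two's-complement wraparound rather than the zero-returning libc mentioned above):

fn main() {
    let i = i32::MIN;
    // Makes the wraparound explicit: abs(INT_MIN) stays INT_MIN.
    assert_eq!(i.wrapping_abs(), i32::MIN);
    // Or keep the footgun visible and report the overflow instead:
    assert!(i.checked_abs().is_none());
}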
What will TRACTOR choose? At least for this example, they don't have to choose inappropriate pruning of undefined behavior. They claim the following:
> The goal is to achieve the same quality and style that a skilled Rust developer would produce, thereby eliminating the entire class of memory safety security vulnerabilities present in C programs.
If they're going to uphold the same "quality", the translation you presented doesn't cut it. But you may be right and they will go down the path of claiming that a garbage translation is technically valid under undefined behavior and therefore “quality” — if so, I will shun them.
> It's possible to preserve the semantics of the original program using unsafe Rust
Because of the leeway the C standard gives you, you can preserve the semantics of the C program by just calling abs, and I think that’s the best you can do.
What the compiler does may be different for different compilers, different compiler versions or different compilation flags, so if all you have is the C source code, there’s no way to preserve the semantics of the machine code that the C compiler generates.
You could special-case all of them, but even then, there is the problem that a C compiler, even in a single translation unit, can inline one call and then apply some transformations while compiling another call to a call to a library function, making the semantics of overflow in one location different from that in another.
If you want to replicate that, I’d say you aren’t writing a C to rust translator, but a (C + assembly) to rust translator.
Also, if you go this route, you’d have to do similar gnarly stuff for all arithmetic on integers where you cannot prove there will not be overflow. I would not call the resulting code idiomatic rust.
What you describe is antithetical to idiomatic Rust, written by a skilled Rust programmer.
To uphold the spirit of Rust, a C program must go through a process where assumptions are laid bare and footguns are dismantled. Applying an automatic process which arbitrarily changes the behavior from the implementation-dependent compilation of a C program just gets you a messy slop of hidden bugs collected inside an opaque, "safe" garbage can.
You don't get to Rust's reliability by applying a translation which discards it!
> Also, if you go this route, you’d have to do similar gnarly stuff for all arithmetic on integers where you cannot prove there will not be overflow.
Damn straight. That's what C is! It was always this bad, as those of us who have struggled to control it can attest. Faithful translation to unsafe Rust just makes it obvious.
Mmm, I went back and read the docs for MaybeUninit more carefully and that's a good point.
It may be better to just leave the assignment off the declaration. If the variable is read before it's initialized to something, we'll get a Rust compilation error, forcing programmer intervention. Detecting actual bugs that would result in memory errors and forcing them to be resolved is very much in the spirit of Rust. TRACTOR may aspire to gift C programs with memory safety for free, but it won't always be possible.
Of course if TRACTOR can determine through static analysis that the uninitialized read can't cause problems, it might emit different code.
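For the simple cases, deferred initialization already does what you want with zero unsafe (a minimal sketch):

fn main() {
    let i: i32; // declared, not yet initialized
    // println!("{i}"); // error[E0381]: used binding `i` isn't initialized
    i = 42; // rustc proves every read happens after this assignment
    println!("{i}");
}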
> undefined behavior gives the compiler leeway in deciding what a program does, so the more undefined behavior a C program invokes, the easier it is to translate its code to rust.
You assume that the compiler can determine what behavior is undefined. It can't. C compilers don't just look at some individual line of the program and say "oh, that's undefined, unleash the nasal demons". C compilers look at code, reason that if such-and-such variable has a certain value (say, a null or invalid pointer), then such-and-such operation is undefined (say, dereferencing that variable), and therefore on the next line that variable can be assumed not to have that bad value. Despite all the FUD, this is a very limited power. C compilers don't usually know the actual values in question, all they do is exclude some invalid ones.
I (not the person you are replying to) do understand that's how compilers interact with UB. However, a wealth of experience has shown us that the assumption "UB doesn't occur" is completely false. It is, in my opinion, quite irresponsible for compiler writers to continue to use a known-false assumption when building the optimizer. I don't really care how much speed it costs, we need to stop building software on a shaky foundation like that.
Soon (or actually, already) we'll have MTE and CHERI, and then that C undefined behavior will be giving you security improvements as well as speed improvements.
Can't design a system that 100% crashes on invalid behavior if you've declared that behavior is valid, because then someone is relying on it.
I have to think the approach will be something like "AI summarizes the features of the program into some kind of technical language, then the AI synthesizes Rust code that covers the same feature set".
It would be most interesting if the approach were not to feed the AI the original program but rather the manual for the program. That said, it's rare that a manual captures all of the nuances of the program, so a view into the source code is probably necessary, at least for getting the ground truth.
"AI more or less sort of summarizes the features of the program into some approximate kind of technical language, then the AI synthesizes something not too far from Rust code that hopefully covers aspirationally the same feature set".
Ghidra, which is decompilation software, already manages to produce almost-valid C from assembly, and it does so without AI. I know nothing about how it works, but just from that, I'm guessing that producing almost-valid Rust from C code would be a simpler problem to solve.
In theory, a codebase is a language precisely describing a program. The same program can be described in other languages. So that’s what you’re asking the LLM to do, in the same way you can describe a flower in either English or Spanish.
Write tests for your C code. Run c2rust (mechanical translation), including the tests. Let a LLM/MCTS/verifier loop go to town. Verifier here means it passes compiler checks, tests, sanitizers and miri.
Additional training data can be generated by running mrustc or by inlining unsafe code (from std/core/leaf crates) into safe code and running semantics-preserving mechanical refactorings on the code.
I did mention using sanitizers in the verification step of the optimization loop. The optimization goal here would be reducing the lines of `unsafe` while preserving program semantics.
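The verifier's behavioral-equivalence check could be as unglamorous as a differential test between the mechanical translation and the refactored candidate (a self-contained sketch with made-up function names; a real setup would call the c2rust output over FFI and feed it fuzzer corpora):

// Stand-in for c2rust-style mechanically translated output.
unsafe fn legacy_sum(buf: *const u8, n: usize) -> u64 {
    let mut total = 0u64;
    for k in 0..n {
        total += *buf.add(k) as u64;
    }
    total
}

// Candidate safe refactor produced by the LLM loop.
fn safe_sum(buf: &[u8]) -> u64 {
    buf.iter().map(|&b| b as u64).sum()
}

fn main() {
    for case in [&b""[..], &b"a"[..], &b"hello world"[..]] {
        let legacy = unsafe { legacy_sum(case.as_ptr(), case.len()) };
        assert_eq!(legacy, safe_sum(case), "divergence on {case:?}");
    }
}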
Just to be clear to others, Dan is the DARPA PM on this - he convinced DARPA internally it was worth funding other people to do the work, so he himself / his research group won't be doing this work. He's on leave from Rice for a few years to be a PM at DARPA's I2O.
And while DARPA doesn't directly care about research publications as an outcome, there's certainly a publishable research component to this, as well as a lot of lower papers-per-$ engineering and validation work. A lot of the contracts they hand out end up going to some kind of contractor prime (BBN, Raytheon, that kind of company) with one or more academic subs. The academic subs publish.
Really?! The Linux kernel is a _pretty enormous_ counterexample, as are many of the userland tools of most desktop Linux distros.
I am also a key developer of an entirely-written-in-C tool which I'd venture that [a large fraction of desktop Linux users in corporate environments use on a regular basis](https://gitlab.com/openconnect/openconnect).
The refusal to use C++ in Linux isn't entirely rational. Nobody else makes that decision. Other kernels are a mix of C and C++ (macOS/iOS, Windows, even hobby operating systems like SerenityOS).
Then you get into stuff that's not kernels and the user-spaces are again mostly all C++. The few exceptions that exist are coming out of the 90s UNIX culture, stuff like Apache or nginx. Beyond that it's all C++ or managed languages.
Lowering is typically easier than lifting (or brightening). When you lower, you can erase higher-level semantics that aren't relevant; when you lift, you generally want to compose lower-level program behaviors into their idiomatic (and typically safer) equivalent.
COBOL migrations are tar pits of replicating 40+ years of undocumented niche business logic for a given field, edge cases included, that was "commonly understood" by people who are now retired or dead. Don't get your hopes up.
MicroFocus has COBOL compilers for Java and .NET, as do other COBOL vendors still in business.
Usually the biggest issue is that most of the porting attempts don't start there; rather they go for the rewrite from scratch, and "let's not pay the licenses for those cross-compilers".
If anything, I think the failure to hit L5 self-driving after billions of dollars and millions of man hours invested is probably reflective of how automatic C to Rust translation will go. We'll cruise 90% of the way, but the last 10% will prove insurmountable with current technology.
Think about the number of C programs in the wild that rely on compiler-specific or libc-specific or platform-specific behavior, or even undefined behavior plus the dumb luck of a certain brittle combination of {compiler version} ∩ {libc version} ∩ {linker version} ∩ {build flags} emitting workable machine code. There's a huge chunk of C software where there's not enough context within the source itself (or even source plus build scripts) to understand the behavior. It's not even clear that this is a solvable problem in the abstract.
None of that is to say that DARPA shouldn't fund this. Research isn't always about finding an industrial strength end product; the knowledge and expertise gained along the way is important too.
This is the exact formulation of the argument before computers beat humans at chess, or drew pictures, or represented color correctly, or... Self driving cars will be solved. There is at least one general purpose computer that can solve it already (a human brain), so a purpose-built computer can also be made to solve it.
In 10 (or 2 or 50 or X) years when Chevy, Ford, and others are rolling out cheap self driving this argument stops working. The important thing is that this argument stops working with no change in how hard C to Rust conversion is.
We really should be looking at the specifics of both problems. What makes computer language translation hard? Why is driving hard? One needs to be correct while inferring intent and possibly reformulating code to meet new restrictions. The other needs to make snap judgments in realtime and avoid hitting things, even if that just means stopping, preferring safety over motion. One problem can be solved piecewise without significant regard to time; the other must be solved in realtime, as it happens, without ever producing an unsafe output.
These problems really aren't analogous.
I think you picked self driving cars just because it is a big and only partially solved problem. One could just as easily pick a big solved problem or a big unstarted problem and formulate equally bad arguments.
I am not saying this problem is easy, just that it seems solvable with sufficient effort.
I'd put money on the solutions to said problems looking largely the same though - big ass machine learning models.
My prediction is that a tool like copilot (but specialized to this domain) will do the bulk of source code conversions, with a really smart human coming behind to validate.
With you, except for the conclusion "[ the tool ] will do the bulk of source code conversions, with a really smart human coming behind to validate".
The director orders the use of the tool when the dev team got downsized (and the two most-seniors left for greener pastures just after that). Validation is in the "extensive" tests anyway, we have those, right, so the new intern shall have a look, make it all work (fudge the tests where possible and remove the persistently failing ones as they've probably been always broken). The salesman said it comes from the DOA or DOD or something. If the spooks can do it so can we.
> This is the exact formulation of the argument before computers beat humans at chess, or drew pictures, or represented color correctly, or...
Which are things that took 20 or 50 years longer than expected in some cases.
> I think you picked self driving cars just because it is a big and only partially solved problem. One could just as easily pick a big solved problem or a big unstarted problem and formulate equally bad arguments.
But C to Rust translation is a big and only partially solved problem.
Ok, but if, say, 90% of small projects can use it as a direct, no-pain bridge, that can be a huge win.
Even if it can only "handle 90% well" of the transition for any project, this is still interesting. Unlike cars on the road, most code transition projects out there don't need to be 100% fine to provide some useful value.
Even if every project can only be 90% done, that’s a huge win. Best would be if it could just wrap the C equivalent code into an unsafe block which would be automatically triaged for human review.
Just getting something vaguely Rust shaped which can compile is the first step in overcoming the inertia to leave the program in its current language.
c2rust exists today, and pretty much satisfies this. I've used it to convert a few legacy math libraries to unsafe rust, and then been able to do the unsafe->safe refactor in the relative comfort of the full rust toolset (analyser + IDE + tests)
There is real utility in slowly fleshing out the number of transforms in a tool like c2rust that can recognise high-level constructs in C code and produce idiomatic safe equivalents in rust.
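As a sketch of the before/after shape such a transform takes (hand-written here, not actual c2rust output):

// What a literal translation of
//     for (i = 0; i < n; i++) dst[i] = 2 * src[i];
// looks like, pointers and all:
unsafe fn double_raw(src: *const i32, dst: *mut i32, n: usize) {
    let mut i = 0;
    while i < n {
        *dst.add(i) = 2 * *src.add(i);
        i += 1;
    }
}

// The idiomatic safe equivalent a recognizer could produce:
fn double_safe(src: &[i32], dst: &mut [i32]) {
    for (d, s) in dst.iter_mut().zip(src) {
        *d = 2 * *s;
    }
}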
"real" (large) C/C++ programs get much of their complexity from the fact that it's hundred of "sources" (both compiled and libraries) that sometimes, or even often, share global state and at best use a form of "opportunistic sharing". Global variables are (deliberately, and justifiedly-so) hard in rust, but (too) trivial in C/C++, cross-references / pointer chains / multi-references likewise. And once you enter threading, it becomes even harder to output "good" rust code - you'd have to prove func() is called from threaded code and should in rust best take Arc<> or some such instead of a pointer.
It'll be great for "pure" functions. For the grimey parts of the world, funcs taking pointer args and returning pointers, for things that access and modify global data without locks, for threaded code with implicit (and undocumented) locking, the tool would add most value. If it can. Even only by saying "this code looks grimey. here's why. A bit of FFI will also be thrown in because it links against 100 libraries. I suggest changes along those lines ... use one of the 2000000 hint flags to pick-your-evil".
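Even the "simple" end of that, a bare C global, forces the translator to pick a synchronization story. One possible shape (a minimal sketch; an AtomicU64 would do for a plain counter too):

use std::sync::Mutex;

// C: `static long hits;` mutated from anywhere, racily.
// Rust: the sharing and the locking have to be spelled out.
static HITS: Mutex<u64> = Mutex::new(0);

fn record_hit() {
    *HITS.lock().unwrap() += 1;
}

fn main() {
    let handles: Vec<_> = (0..4)
        .map(|_| std::thread::spawn(|| (0..1000).for_each(|_| record_hit())))
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(*HITS.lock().unwrap(), 4000);
}

Now imagine a tool having to decide, per global, between this, an atomic, a thread-local, or threading the state through as a parameter.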
In addition to the other replies, this is a one-time project. After everything (or almost everything) has been translated, you're done, you won't be running into new edge cases.
> You can attach about a hundred asterisks to that.
Not in San Francisco. There are about 300 Waymo cars safely driving in one of the most difficult urban environments around (think steep hills, fog, construction, crazy traffic, crazy drivers, crazier pedestrians). Five years ago this was "someday" science-fiction. Frankly I trust them much more than human drivers and envision a future utopia where human drivers are banned from urban centers.
To get back on topic, I don't think automatic programming language translation is nearly as hard, especially since we have a deterministic model of the machines it runs on. I can see a possible approach where AI systems take the assembler code of a C++ program, then translate that into Rust, or anything else. Can they get 100% accuracy and bit-for-bit compatibility on output? I would not bet against it.
Opinions about automated driving systems vary. Just from my own experience doing business all around San Francisco I have seen at least a half dozen instances of Waymo vehicles making unsafe maneuvers. Responders have told me and local government officials that Waymo vehicles frequently fail to acknowledge emergency situations or respond to driving instructions. Driving is a social exercise which requires understanding of a number of abstractions.
Isn't 100% accuracy (relatively) easy? c2rust already does that, or at least comes close, as far as I know.
Getting identical outputs on safe executions, catching any unsafe behavior (at translation-time or run-time), and producing efficient, maintainable code all at once is a million times harder.
Well, Claude 3.5 can do translation from one language to another in a fairly competent manner if the languages are close enough. I've used it for that task myself with success (Java -> JavaScript).
But, this isn't just about rewriting code from one language to another. It's about reverse engineering complex information out of the code, which may not be immediately visible in it, and then finding a way to make it "safe" according to Rust's type system. Where's the training data for that? It'd be really hard even for skilled humans.
Personally I think the most pragmatic way to make C/C++ memory safe quicker is one of two approaches:
1. Incrementally. Make std::vector[] properly bounds checked (still not done even in chrome!), convert allocations to allocations that know their own size and do bounds checking e.g. https://issues.chromium.org/issues/40285824
2. Or, go the whole hog and use runtime techniques like garbage collection and runtime bounds checks.
A good example of approach (2) is Managed Sulong, which extends the JVM to execute LLVM bitcode directly whilst exposing to the C/C++/FORTRAN a virtualized Linux syscall interface. The whole piece of code can be sandboxed with permissions, and memory safety errors are caught at runtime. The compiler tries to optimize out as many bounds checks as possible. The interesting thing about this approach is it doesn't require big changes to the source code (as long as it's already been ported to Linux), which means the work of making something safe can be done by teams independent of the original authors. In practice "rewrite it in Rust" will usually mean a fork, which introduces lots of complicated technical, cultural and economic issues.
Managed Sulong is also a research project and has a bunch of problems to solve, for instance it needs to lose the JITC dependency and go fully AOT compiled (doable, there's no theoretical issue with it and much of the needed infra already exists). And performance/memory usage can always be improved of course, it regresses vs the original C. But those are "just" systems engineering problems, not rewrite-the-world and solve-static-analysis problems.
Disclosure: I do work part time at Oracle Labs which developed Managed Sulong, but I don't work on it.
> But, this isn't just about rewriting code from one language to another. It's about reverse engineering complex information out of the code, which may not be immediately visible in it, and then finding a way to make it "safe" according to Rust's type system. Where's the training data for that? It'd be really hard even for skilled humans.
That might not be too bad.
A combination of a formal system and an LLM might work here. Suppose we see a C function
void somefn(char* buf, int n);
First question: is "buf" a pointer to an array, or a pointer to a single char? That can be answered by looking at what the function does with "buf", and what callers pass to it.
If it's an array, how big is it? We don't have enough info to know that yet. But a reasonable guess, and one that an LLM might make, is that the length of buf is "n".
Following that assumption, it's reasonable to translate this to Rust as
fn somefn(buf: &[u8])
and, if n is needed within the function, use
buf.len()
The next step is to validate that guess. The run-time approach is to write all calls to "somefn" with
assert!(buf.len() == n);
somefn(buf, n);
Maybe formal methods can prove the assert true, and we can take it out. Or if a SAT solver or a fuzz tester can generate a counterexample, we know that the guess was wrong and this has to be done the hard way, as
fn somefn(buf: &[u8], n: usize)
implying more subscript checks inside "somefn".
The idea is to recognize common C idioms and do clean translations to Rust for them. This should handle a high percentage of cases.
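Putting the pieces together, the optimistic translation plus the instrumented call site might look like this (all names hypothetical; the body is a stand-in):

// The guessed signature: `n` folded into the slice.
fn somefn(buf: &mut [u8]) {
    for b in buf.iter_mut() {
        *b ^= 0x5A; // stand-in for whatever the C body did to buf[0..n]
    }
}

// Each original call site keeps a guard until the guess is proven.
fn call_site(data: &mut [u8], n: usize) {
    assert!(data.len() == n, "translation guess violated: len != n");
    somefn(data);
}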
Yes, this is similar to what IntelliJ does for Java->Kotlin. Do a first pass that's extremely non-idiomatic and mechanical, then do lots of automated refactoring to bring it closer to idiomatic.
But if you're going to do it that way, the right place to start is probably a safer form of C++, not Rust. That way code can be ported file-at-a-time or even function-at-a-time, and so you'll have a chance to run the assertions in the context of the original code. Which of course may not have good test coverage, as C codebases often don't, so you'll have to be testing your assertions in production.
std::vector[] has had bounds checking since forever if you set the correct compiler flag. Since they aren't using it, this is a choice; presumably they prefer the speed gain.
You mean _GLIBCXX_DEBUG? It's got some issues. Linux only, it doesn't always work [1] and it's all or nothing. What's really needed is the ability to selectively opt-out on a per-instantiation level so very hot paths can keep the needed performance whilst all the rest gets opted into safety checks.
but it doesn't seem to actually make std::vector[] safe.
It's frustrating that low hanging fruit like this doesn't get harvested.
[1] "although there are precondition checks for some string operations, e.g. operator[], they will not always be run when using the char and wchar_t specializations (std::string and std::wstring)."
With MSVC you can use _CONTAINER_DEBUG_LEVEL=1 to get a fast bounds check that can be used in release builds. Or just use it in development to catch errors.
> We talked about this at the weekly maintainer meeting and decided that we're not comfortable enough with the (lack of) design of this feature to begin documenting it for wide usage.
As far as I am aware, the standard doesn't mandate bounds checking for std::vector::operator[] and probably never will for backwards compatibility reasons. Most standard library implementations have opt-out std::vector[] bounds checking in unoptimized builds, but not in optimized builds.
I tried a toy example with GCC [1], Clang [2], and MSVC [3], and none of them emit bounds checks with basic optimization flags.
As I said, you need the correct flag set. MSVC uses _CONTAINER_DEBUG_LEVEL=1 and it can be used in release. They have had this feature since 2010 or so, though the flag name has changed.
In my experience claude.ai has near perfect grasp of what a program (that fits its window) is written to do. It can already make a program in another language that can do the same. What this means is that the cost of a full rewrite is going to come down dramatically over the next few years.
This is an excellent example of government action I like to see, as it isn't about favoritism or the swamp dynamics. Just provide a target, a bounty and no or low barriers to entry.
This challenge does push Rust out in front of everybody. That's a mixed blessing. I hope this challenge gets modified to not specify the target language, but instead the requirement of memory and type safety. Rust is likely an intermediate stop on the way to something better, and it shouldn't matter if that language is called Rust 2.0 or something else.
As a reminder, DARPA has funded self-driving car research since at least the 1980s with the Autonomous Land Vehicle (ALV) project, plus the DARPA Grand Challenges, and more.
I have been aware of this proposed initiative for some time and I find it interesting that it is now becoming public. It is a very ambitious proposal and I agree that this level of ambition is appropriate for DARPA's mission and I wish them well.
As a Rust advocate in this domain I have attempted to temper the expectations of those driving this proposal with due respect to the feasibility of automatic translation from C to Rust. The fundamental obstacle that I foresee remains that C source code contains less information than Rust source code. In order to translate C code to Rust code that missing information must be produced by someone or something. It is easy to prove that it is impossible to infallibly generate this missing information for the same reason that scaling an image to make it larger cannot infallibly produce bits of information that were not captured by the original image. Instead we must extrapolate (invent) the missing information from the existing source code. To extrapolate correctly we must exercise judgement and this is a fallible process especially when exercised in large quantities by unsupervised language models. I have proposed solutions that I believe would go some way towards addressing these problems but I will decline to go into detail.
Ultimately I will say that I believe that it is possible for this project to achieve a measure of success, although it must be undertaken with caution and with measured expectations. At the same time it should be emphasized it is also possible that no public result will come of this project and so I caution those here against reading too much into this at this time. In particular I would remind everyone that the government is not a singular entity and so I would not interpret this project as a blanket denouncement against C or vice versa as a blanket blessing of Rust. Each agency will set its own direction and timelines for the adoption of memory-safe technologies. For example NIST recommends Rust as well as Ada SPARK in addition to various hardened dialects of C/C++.
> In order to translate C code to Rust code that missing information must be produced by someone or something.
If you don't go for preserving the formal semantics of C code and instead only require the test-suite to still pass after translation that can provide a lot of wiggle room for the translation. This is how oxidation projects often work in practice.
Fuzzers can also help with generating additional test data to get good branch coverage.
I'm personally not a fan of "rewrite the world in Rust" mentality, but that being said, if one is planning to port a project to a new language or platform, mechanical translation is a poor means of doing so. Spend the time planning better architecture and designing a better software system, and find a way to replace it piece by piece. Don't build a castle in the sky, because it will never reach the ground. If you've decided to use Rust for this system, that's fine. But, write Rust. Don't try to back-port C into Rust.
I think a far better and more mature process is to update C to modern C and use a model checker such as CBMC to verify memory, resource, and integer math safety. One gets the same safety as a gradual Rust rewrite, but the code base, knowledge base, and developers can be maintained.
> I think a far better and more mature process is to update C to modern C and use a model checker such as CBMC to verify memory, resource, and integer math safety.
No chance. CBMC is amazing, but have you actually tried formally verifying a "real" program?
I agree replacing with a hand-architected Rust version is clearly the better solution but also more expensive. I think they're going for an RLBox style "improve security significantly with little-to-no effort" type product here. That doesn't mean you shouldn't do a full manual rewrite if you have the resources, but it's better than nothing if you haven't.
> No chance. CBMC is amazing, but have you actually tried formally verifying a "real" program?
Yes. Every day. It's actually quite easy to do. Write shadow methods covering the resources and function contracts of called functions, then verify the function. Repeat all the way up and down the stack. It adds about 30% overhead over just TDD development.
Last time I tried CBMC, it ended up running out of memory for relatively small programs, do you encounter any resource usage issues with it? I'm learning Frama-C and I find it more predictable, although the non-determinism of solvers shocked me when I first tried to prove non-trivial programs. I guess ideally I would like something even more explicit than Frama-C.
CBMC works best on functions, not programs. You want to isolate an individual function, then provide shadows of the functions it calls. The shadows should have nondeterministic behavior (cover every possible error condition) and otherwise follow the same memory and resource rules as the original function. For instance, if shadowing a function that reads a buffer, the shadow should ensure full buffer access as part of its assertions.
The biggest issue you will run into with bounded model checking is recursion and looping. In these cases, you want to refactor the code to make it easier to formally verify outside of the loop. Capture and assert on loop variants / invariants, and feed these forward in assertions on code.
There's no way I can capture all of this in an HN comment, but to get CBMC to work, you need to break down your code.
Thanks, that was really helpful. Relying on getting shadow functions right does seem icky, but I guess the improved productivity of CBMC should make up for it. Definitely going to give it another chance!
You're welcome. I've been meaning to write a blog article on the subject, because it is a subtle thing to get working.
Think of shadow functions as the specifications that you are building. Unlike proof assistants or Frama-C, you write specifications in C itself, and they work similarly to code. Often, the same contracts you write in these specifications can be shared by both the shadow functions and the real functions they shadow.
I take a bottom-up approach to model checking. I'll start by model checking the lowest level code, then I'll shadow this code to model check code that depends on it. In this way, I can increase the level of abstraction for model checking, focusing just on the side effects and contracts of functions I shadow, and move up the stack toward more and more general code.
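For readers who'd rather see it than read it: the same shadow-and-assume style carries over almost verbatim to Kani, the CBMC-based model checker for Rust (a minimal sketch using Kani's harness API; the thread's examples are C + CBMC, this is just the analogous tooling in the language used elsewhere here):

// The low-level function being verified.
fn clamp_index(i: usize, len: usize) -> usize {
    if i >= len { len - 1 } else { i }
}

#[cfg(kani)]
#[kani::proof]
fn clamp_index_stays_in_bounds() {
    let i: usize = kani::any();   // nondeterministic input, like nondet_uint()
    let len: usize = kani::any();
    kani::assume(len > 0);        // precondition, like __CPROVER_assume
    assert!(clamp_index(i, len) < len); // checked for all inputs, not sampled
}

Once `clamp_index` is proven, its callers can be verified against a shadow that only asserts the contract (result < len) instead of re-running the body.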
What do you mean by "non-determinism of solvers"? AFAIK, unless your proof finishes really close to the timeout, it is pretty uncommon that a failed proof obligation (PO) suddenly succeeds, and vice-versa, if the code/the annotations are not modified.
Bounded model checking has changed things. C on its own can't solve these problems. Likewise, Rust on its own -- while it can solve memory errors -- can't demonstrate safety from all errors that lead to CVEs.
Practical formal methods using a tool like CBMC can make C safer. The existing code base can be made safer without porting it to a new language or using experimental mechanical translation. This isn't just something for C. Such tools exist for many languages now, including Rust, so that even Rust can be made safer.
> Getting people to use stuff like CBMC is like trying to boil the ocean.
That's like saying, "Getting everyone to use Rust or TDD or X is like trying to boil the ocean."
It's impossible to solve all things for all people at once. But, that doesn't mean that we can't advocate for tooling that can be used today to build safer software. This goes beyond C, as such tools and techniques are being ported to many languages and platforms.
Rust is a solution that works for some people. Modern C with bounded model checking is another solution that works for some other people. I'm certainly not going to change the minds of folks who have decided to port a project to Rust and who are willing to spend the engineering budget for this. But, hopefully, I can convince someone to try bounded model checking instead of maintaining the status quo. Because, the status quo is where we are with projects like the Linux kernel. Linux may pay lip service to Rust folks and allow them to write some components in that language, but the majority of the kernel is still in C and is not being properly vetted for these vulnerabilities, as we can see with the stream of CVEs coming out weekly.
> WG14 can solve those problems, they decided it isn't their priority to fix C.
WG14 must maintain some semblance of backwards compatibility with previous versions of C. It's no good to make a feature that breaks older code. This happens from time to time -- old school K&R C won't work in a C18 or C23 compliant compiler -- but efforts are made to keep that legacy code compiling, for good or ill.
Yep, but we have to deal with what we have. For better or for worse, C remains where it is. We can either use process and tools to improve existing C, or throw our hands up.
I prefer to work toward fixing what is. We are unlikely to see things like array slices in C, and even if such features were added, this does nothing to fix the billions of lines of legacy code out there.
The programmers have changed, the machines have changed, the literature has changed, the compilers have changed a lot. You can still write and run the old insecure code, but you'll get warnings and hit stack canaries and your colleagues will gasp at you and your merge requests will be rejected.
I respectfully disagree. GP claimed that nothing has changed [regarding string and array security bugs in C] in 50 years. I responded that many relevant factors have changed, such that people tend to write different code now which is less susceptible to those bugs. Of course the same old bugs are possible, and sometimes good coders will still write them. Still I argue that there has been meaningful change since there are more protections against writing bugs in the first place, less incentive to write dangerous code, and more security for when (some) bugs still appear.
You've made three true statements, but I don't agree if you're implying that they prove that "nothing has changed". Bugs still appear, but they are significantly less common (per project or line not per year) and not as damaging when they occur. This is a non-trivial change for the better in the realm of C application quality.
There are more slaves in the world now than ever before in history, but global society has still made great progress on eliminating it in the last thousand years.
Not that it matters, but isn't that technically ANSI C(89)? If I remember correctly, the first ISO C standard is instead C90, which is basically identical to C89.
This is definitely a pie-in-the-sky DARPA challenge that would be great to have around as we migrate away from legacy systems. However, even taking your functions/methods in one language and giving them to ChatGPT and asking it to translate them to a different language generally doesn't work. Asking ChatGPT the initial problem you're trying to solve works more frequently, but still generally doesn't work. You still need to do a lot of tinkering and thinking to get even basic things that it outputs to work.
If you have dormant code, as in running everywhere but not getting worked on anywhere, a "translate to shitty rust before ever touching it again" approach has a certain appeal. Not the appeal of an obviously good idea: chances are the "shitty rust" created through translation would be much worse to work on than C with some background noise of bugs (bugs that would also be present in the "shitty rust", thanks to faithful translation). In C, people have an idea about how to deal with the problems. In "shitty rust", it's, well, shitty, because rust people are not used to that stuff.
But there's a non-zero chance that someone could develop a skillset for iteratively cleaning up into something tolerable.
And then there are non-goal things that could grow out of the project, e.g. some form of linter feedback "can't translate into tolerable rust because of x, y and z". C people could look into that, and once the code is translatable into good rust, why translate.
If that was an outcome of the project, some people might find it easier to describe their solution in runnable C and let the "translator/linter" guide them to a non-broken approach.
I'd certainly consider all these positive outcomes quite unlikely, but isn't it pretty much the job description of DARPA to do the occasional dark horse bet?
In my experience (supporting a machine-translated codebase which resulted in shitty Java) your theory doesn't play out.
If you give developers a shitty codebase then those developers will leave to work somewhere else.
After a few years of working on this codebase we had 88% turnover. 1 in 10 developers remembered the original project's design philosophy and intention.
GP was proposing a different situation where the source code is not changing or changing very rarely. If you have a high churn codebase, obviously the maintenance experience will worsen dramatically after machine translation (at least with many current tools), so your experience is not unexpected.
> I'm personally not a fan of "rewrite the world in Rust" mentality
There is no such mentality anywhere. There is a ton of software that's much better off left alone in a dynamic language, or a statically typed language with a garbage collector (like Golang). Good engineers understand the idea of using the right tool for the job.
The push is to start reducing those memory safety CVEs because they have been proven to be a real problem, many times over.
> mechanical translation is a poor means of doing so
Agreed. If we could automatically and reliably translate C/C++ to Rust it would have been done already.
> Spend the time planning better architecture and designing a better software system, and find a way to replace it piece by piece.
OK, I am just saying that somewhere along that process people might get a bout of confidence and tell themselves "oh, we're doing C much better now, we no longer write memory safety bugs, can't we stop here?" and they absolutely will. Cue another hilarious buffer overflow CVE 6 months later.
> I think a far better and more mature process is to update C to modern C and use a model checker such as CBMC to verify memory, resource, and integer math safety.
A huge investment. If you are going to do that then you might as well just move to Rust.
> One gets the same safety as a gradual Rust rewrite
Maybe, but that sounds fairly uncertain or far from a clear takeaway to me.
Rewriting is rarely a good idea in general. Rust proponents like to pretend that it is impossible to avoid safety issues in C while safety is automatically given in Rust. But it is not so simple in reality.
I don't like generalizations... in general. :D (Addressing your "rewrites are rarely a good idea in general" here.)
My experience tells me that if a tech stack supports certain safety guarantees by default that this leads to measurable reduction of those safety problems when you switch to the stack. People love convenient defaults, that's a fact of life.
The apparently inconvenient truth is that most programmers are quite average and you can't rely on them going above and beyond to reduce memory safety errors.
So I don't buy the good old argument of "just hire better C programmers". We still have a ton of buffer overflow CVEs regardless.
And I never "pretended it's impossible to avoid safety issues in C". I'll appreciate if you don't clump me in some imaginary group of "Rust proponents".
What I'm saying is this: use the right tool for the job. The C devs have been given decades and yet memory safety CVEs are still prevalent.
What conclusion would you arrive at if you were in my place -- i.e. not coding C for a living for like 18 years now but still witnessing it periodically crapping the bed?
I'm curious of your take on this. Again, what other conclusion would you arrive at?
I am complaining about the usual phrases which are part of the Rust marketing, like the "just hiring better C programmers did not work" or the "why are there still CVEs" pseudo-arguments, etc.
For example, let's look at the "hire better C programmers does not work" argument. Like every good propaganda it starts with a truism: In this case that even highly skilled C/C++ programmers will make mistakes that could lead to exploitable memory safety issues. The problem comes from exaggerating this to the idea that "all hope is lost and nothing can be done". In reality one can obviously do a lot of things to improve safety in C/C++. And even one short look at CVEs should make it clear that there is often huge room for improvements even with relatively simple measures. For example, a lot of memory safety bugs in C/C++ come from open-coded string or buffer manipulation. But it is not exactly rocket science to abstract this away behind a safer interface. But once this is understood, the obvious conclusion is that addressing some of these low-hanging fruits would be far more effective in improving safety than wasting a lot of time and effort in rewriting in Rust.
> In reality one can obviously do a lot of things to improve safety in C/C++.
That's not "in reality", that's "in theory". Because in actual reality, people still write the good old buffer overflow bugs to this day.
I don't think anyone reasonable is disputing that we indeed can improve C/C++ programming. The argument of myself and many others like myself is: "a lot can be done but for one reason or another it is STILL NOT being done". Likely the classic cost cutting but there are likely other factors at play as well.
> But once this is understood, the obvious conclusion is that addressing some of these low-hanging fruits would be far more effective in improving safety than wasting a lot of time and effort in rewriting in Rust.
Explain why this has not been done yet. Explain why Microsoft, Google and various intelligence agencies attribute memory safety bugs to between 60% to 75% of all CVEs and demonstrable exploits that they are aware of.
Please do, I am listening. Why has almost nothing been done yet?
Secondly, "wasting a lot of time and effort in rewriting in Rust" is an empty claim. To demonstrate why, I ask you this: at which point does the continued cost of investing in endlessly patching C/C++ and all its glorious foot-guns become bigger than the cost of a rewrite?
Surely at some point just endlessly throwing money at something that gives you a 1% return on investment (in terms of getting more stable and less dangerously buggy) does indeed get more expensive than starting over?
I have no clear answer because it depends on the organization, the tenure of C/C++ and the devs in the org, and many others. It's strange that you pretend to have the answer.
> That's not "in reality", that's "in theory". Because in actual reality, people still write the good old buffer overflow bugs to this day.
That's because while the technology exists, it is not widely communicated. That's not a fault of C, and that's not something that any language can solve.
> Explain why this has not been done yet.
See above.
The technology to make C and C++ safer is not yet widely used. But, it exists and it is being used. I use it on every firmware and OS project that I currently work on. The code we produce is free of memory errors, integer errors, API misuse errors, resource management errors, cryptography errors, confused deputization errors, and a host of other errors that our specifications are designed to catch. That goes well beyond what Rust or any other language can provide on its own. But, to be fair, Rust developers can do this using similar tooling.
It's laudable that you wish to rid the world of memory errors. I want to normalize going three or four steps further. Rust by itself won't get us there.
The proof that the said technology has failed its purpose, as the C and C++ culture keeps resisting its adoption, is that all CPU vendors are now integrating hardware memory tagging as the ultimate weapon against memory corruption exploits.
Solaris has already been doing it since 2015, ARM more recently; we have Microsoft putting big bucks into CHERI (including custom FPGA boards for testing), the new CoPilot+ PCs architecture with Pluton, and while AMD/Intel attempts weren't quite right, like MPX, they will surely do something for x64 as well.
> The proven fact that the said technology has failed its purpose,
How, because other solutions are being explored? That's not due to a failure of one thing, but because both defense in depth and a desire to fix existing systems with no additional engineering are paths that security researchers and vendors explore. Not everyone will converge on a single solution, even when that solution is practical.
Just because something is not being used universally doesn't mean that it has failed. More so, it is not widely known about, and there persist rumors that it requires extraordinary effort, often reinforced by well-meaning but rather outdated advice.
> desire to fix existing systems with no additional engineering
I, too, enjoy sci-fi.
> Just because something is not being used universally doesn't mean that it has failed.
You are only correct in the dictionary sense of these words. The fact is that a lot of programmers are vain creatures prone to ego, and they make their chosen technical stack part of their core identity. This prevents them from being flexible; they get rigid as they age, and they become part of the very problems they so passionately wanted to fix when they were young.
None of that is made easier by the managerial class, which absolutely loves and financially rewards the programmers who don't want to rock the boat.
So I'd say that the said CBMC, and likely other tools in the same area, has more or less failed if it could not convince a critical mass of C/C++ devs to use it and finally start keeping up with Rust (and the other languages @pjmlp mentioned).
> Moreso, it is not widely known about, and there persists rumors that it requires extraordinary effort, often reinforced by well meaning, but rather outdated advice.
The victims of Heartbleed and many other CVEs don't care. The breaches happened anyway.
I am amazed at your desire to downplay the problem and keep claiming that eventually stuff will work out.
I disagree. And I'll repeat a very core part of my argument: C/C++ devs were handed a monopoly in their areas for decades and they still can't arrive at a set of common techniques that reduce or eliminate memory safety bugs.
I am not impressed. And I am not even a particularly good programmer. Just a diligent guy with average programming ability whose only unique trait is that he refuses to accept the status quo and always looks at how stuff can be improved. But this has taken me a long way.
I was characterizing these hardware changes as being fantasy, so I'm glad you agree.
> So I'd say if the said CBMC, and likely other tools in the same area, has more or less failed if it could not convince a critical mass of C/C++ devs to use it
So, in the same vein, Rust has failed because it has only been around for a similar amount of time and people still use C/C++?
> The victims of Heartbleed and many other CVEs don't care. The breaches happened anyway.
I fail to see how a CVE that occurred due to poor engineering practices has anything to do with the adoption of good engineering practices and tooling. Yes, Heartbleed is why we need this tooling.
You are simultaneously arguing that if we could just adopt Rust, our problems would be solved, and that since another technology has not yet been adopted, it has failed. In your telling, Rust isn't adopted due to programmer ego, but tooling that does the same thing as Rust and more has not been adopted because it has failed. Do you not see the logical inconsistency in your position?
> So, in the same vein, Rust has failed because it has only been around for a similar amount of time and people still use C/C++?
Yes, it kind of failed there indeed. And I even hinted at why: Rust is far from perfect and its async implementation is a cobbled-together mess. Golang's model reads much better, though I hate its foot-guns quite a lot (like writing to a closed channel leading to a panic; who thought that was a good idea?).
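For contrast, a minimal sketch (values illustrative) of how Rust's std::sync::mpsc handles the same situation: sending after the receiver is gone returns an Err for the caller to handle, instead of panicking.

```rust
use std::sync::mpsc;

fn main() {
    let (tx, rx) = mpsc::channel::<u8>();
    drop(rx); // the receiving side goes away, "closing" the channel

    // Sending now yields an Err instead of panicking, so the sender is
    // forced to handle the closed-channel case explicitly.
    match tx.send(42) {
        Ok(()) => println!("sent"),
        Err(e) => println!("channel closed: {e}"),
    }
}
```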
> I fail to see how a CVE that occurred due to poor engineering practices has anything to do with the adoption of good engineering practices and tooling. Yes, Heartbleed is why we need this tooling.
You can't see it? But... the good practices do lead to less of these CVEs as you yourself seem to realize? I don't get this part of your comment.
> You are simultaneously arguing that if we could just adopt Rust, our problems would be solved, but since another technology has not yet been adopted, it has failed.
You have answered it yourself: a lot of people see manual wrangling of `void**` as a badge of honor and their ego takes over (and the fear of being displaced, of course). I claim that Rust is not being more widely adopted due to programmer ego and fear of being obsolete. The fear of the end of nice salaries because they belong to a diminishing cohort of old-school cowboys.
Who would not fear that? Who would want that to end?
> Do you not see the logical inconsistency in your position?
No, and I don't get your argument. The reasons for C/C++ devs not improving the memory safety of their code, and the reasons for them not adopting Rust are very different. Not only is the analogy bad, it is plain inapplicable.
---
But it also does not help that HN reacts like a virgin schoolgirl pinched on the arse whenever Rust is mentioned. I've coded in it for a few years; I loved it, I hated the bad parts and called them out, but even to this day I very quickly and easily get branded as a Rust fanboy, even though my comment history shows balanced criticism of it. People don't care. People are emotional and are quick to put you in a camp that's easy to hate.
That is the part that I truly hate. No objective debate.
Too expensive to move to Rust? GOOD! That's an amazing argument; we could talk about that for weeks and get very interesting insights in both directions.
People unwilling to get re-trained? Also a good argument, with big potential for interesting insights!
But almost everything else is at the level of a heated table debate after the 11th beer. Pretty meh and very uninteresting. No idea why I keep engaging; I think I am just bitter that people who REALLY should know better are reacting on emotion and not on merit. But that's on me. We all have our intolerances to the reality we inhabit. This is one of mine.
That's a rather cynical interpretation of these initiatives. CHERI, for instance, has been in development for twenty years. It predates the general availability of open source tools like CBMC and of languages like Rust. But that doesn't make the concept redundant or obsolete. It makes it complementary.
Hardware security is complementary to software security. Mitigations at the hardware level, the hypervisor level, and the operating system level complement architectural, process, and tooling decisions made at the software level.
Defense in depth is a good thing. There can always be errors in one layer or another, regardless of software solution, operating system, hypervisor, or hardware. I can wax poetic about current CPU vulnerabilities that must be managed in firmware or operating systems.
Many of the issues caused by C are solved by Modula-2, Object Pascal and Ada; we didn't need to wait for Rust. But those aren't the languages that come for free with UNIX.
Or even better, they would be solved by C itself, if WG 14 cared even a little about providing proper support for slices, proper arrays and proper string types, even if only as library vocabulary types.
But what to expect, when even Dennis Ritchie wasn't able to get his approach to slices being worked on by WG 14.
So hardware memory tagging, and sandboxed enclaves it is.
There is nothing wrong with defense in depth. But, this is not where things stop.
I make extensive use of bounded model checking in my C development. I also use privilege separation, serialization between separate processes, process isolation, and sandboxing. That's not because bounded model checking has somehow failed, but because humans are fallible. I can formally verify the code I write, but unless I'm running bare metal firmware, I also have to deal with an operating system and libraries that aren't under my direct control. These also have vulnerabilities.
That's not a trivial thing. The average software stack running on a server -- regardless of whether it is written in C, Rust, Modula-2, Pascal, Ada, or constructively proven Lean extracted to C++ -- still goes through tens of millions of lines of system software that is definitely NOT safe. All of that code is out of a developer's control for now. Admins can continually apply patches, but until those projects employ similar technology, they are themselves a risk.
One day, hopefully, all software and firmware will go through bounded model checking as a matter of course. Until then, we work with what we can, and we fix what we can. We can also rely on hardware mitigations where applicable. That's not failure as you have claimed, but practical reality.
> I make extensive use of bounded model checking in my C development...
I would absolutely love it if you were the majority, alas you are not.
I emulate exhaustive pattern matching in my main language of choice, which does not have it (it's not Rust or OCaml or Haskell), because I saw how beneficial and useful it is. And sadly, many of the other devs using that language don't do so, and I have made a good buck going after them and fixing their mistakes.
I don't doubt your abilities as a person. I doubt the abilities of the corpus of C/C++ devs at large.
Well, that's something I hope to change. The tools required to write safer software exist. They just aren't widely distributed yet.
I can say, without ego, that I'm a reasonably good software developer. But, it is the tooling and process that I use that allows me to build safer software and that makes me a reasonably good developer. The same is true of Rust developers.
I can teach these skills to other developers, and in fact, I have plans to do so.
I don't expect things to change overnight, any more than I expect things to be rewritten in Rust overnight. C++ has been around for nearly 40 years, and software is still written in C. But, we can do better, and we must do better.
There are no silver bullets. But, that doesn't mean that we should dismiss tooling that is not well understood in order to chase unrealistic goals, like rewriting extant code bases in a different language to achieve security goals. Or, worse, as this article suggests, using mechanical translation to somehow capture the features of error-prone software without carrying over the errors.
Better process and better tooling allows us to write better software. Bounded model checking is an incredibly useful bit of tooling that allows us, within context of the software, to demonstrate that certain conditions do not arise. This includes memory errors, resource errors, and other classes of errors. The limitation is the faithfulness of the translation to SMT and the complexity of the code being modeled. The former has gotten quite good with CBMC 6, and the latter can be managed through careful refactoring and shadow function substitution.
Is it magic? There is no such thing. But, it is a practical tool that is available for use today.
One need not wait until an entire web browser is verified using it. It can scale to that, but web browsers these days are basically operating systems and suites of software in one; given their unreasonable size and scope with respect to this challenge, that's like saying, "verify all software, then blog about it."
> That's not a fault of C, and that's not something that any language can solve.
If you say so. Rust clearly does, and before you go saying "but `unsafe` exists!" I'll have to remind you that (1) scarcely any Rust dev reaches for it and (2) it still keeps quite a lot of guarantees and only relaxes some. Some, not all. Not even most.
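A minimal sketch of the "relaxes some, not all" point: `unsafe` unlocks raw-pointer dereference (and a handful of similar powers), while borrow checking and slice bounds checks stay fully on inside the block.

```rust
fn main() {
    let x: i32 = 5;
    let p: *const i32 = &x;

    unsafe {
        // Dereferencing a raw pointer is one of the few extra operations
        // `unsafe` permits.
        println!("{}", *p);
    }

    let v = vec![1, 2, 3];
    let i = 2;
    // Indexing is still bounds-checked, inside or outside `unsafe`, and
    // the borrow checker never switches off.
    println!("{}", v[i]);
}
```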
> It's laudable that you wish to rid the world of memory errors. I want to normalize going three or four steps further. Rust by itself won't get us there.
Well now we are on the same page. I never said "ONLY Rust will save us"; I am saying that Rust clearly can get us further than we are right now. If there's something even more accessible, less verbose, and without such a cobbled-together Frankenstein async implementation as Rust's, I'll start using it tomorrow.
Until it exists at the kernel layer, the firmware layer, the runtime library layer, and the application layer, these issues still exist. CVEs come out weekly for memory errors in Linux, in firmware, in operating system libraries, and in application libraries. We need to think beyond rewriting code in one language or platform, and instead think about technologies that we can apply to all languages and platforms, including C and Rust.
> I am saying that Rust clearly can get us further than we are right now.
As can bounded model checking, without having to teach developers a new language with new idioms.
> If there's something even more accessible, less verbose, and with not such a cobbled together Frankenstein async implementation...
Indeed there is. Reach for the bounded model checker that works with your existing language or platform. Pore over the manual, and look at existing practical examples.
If you like Rust, feel free to use it. But if you prefer C/C++, Pascal, Ada, Python, C#, Java, or Modula-2, that's fine. Either use an existing bounded model checker for that language or port CProver / GOTO to that platform. Rust developers ported CProver to Rust via Kani, because they also recognize that writing safer code can't be done by language alone.
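As a sketch of what that looks like on the Rust side, here is a minimal Kani proof harness; the function and the property are illustrative, not taken from any real code base. Run it with `cargo kani` in a project with Kani installed.

```rust
fn clamp_index(idx: usize, len: usize) -> usize {
    if len == 0 { 0 } else if idx >= len { len - 1 } else { idx }
}

#[cfg(kani)]
#[kani::proof]
fn check_clamp_index() {
    let idx: usize = kani::any(); // nondeterministic: stands in for any input
    let len: usize = kani::any();
    kani::assume(len > 0);        // stated precondition
    // The solver explores every path; if this assertion can fail, Kani
    // reports a concrete counterexample trace.
    assert!(clamp_index(idx, len) < len);
}
```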
I don't think it's necessary to push people to use different languages or platforms to write safer code. They just need to use or port existing tooling and learn safer coding practices. If I come at firmware developers or old school OS developers with "we need to use Rust", the conversation is immediately shut down and I'm considered a fool. If, instead, I show them tooling that allows them to maintain their existing code base and make it safer, I get much further.
Respectfully, that's a rather extraordinary claim. There are model checkers that use separate specification languages, but there are also model checkers embedded in the host language.
CBMC translates C -- the same language -- to an SMT solver. A different target but the same language.
It is true that new idioms will often be discovered along the way of converting existing C to pass the bounded model checker in every branch condition and in every case. However, software that is already relatively safe will require very little modification. I've seen it go both ways. Simpler code bases can pass model checks relatively unscathed. More complex code bases require refactoring to pass model checking.
To my point, the code base can remain in C, and can be model checked gradually. It doesn't have to be ported to a different language or platform. But, it will require added assertions and some refactoring to make the execution of code more clear. It's still in C. The specifications are specified in C using regular assertions. The only thing that changes is that one will often use shadow methods -- still written in C but simpler than the functions they are shadowing -- in order to model check other functions.
Other bounded model checkers like JBMC, Kani, or PolySpace work in similar ways.
There definitely is. Mainstream and official Rust community material is generally sane, but the meme did not come from nowhere. The rewrite-everything people are out there.
Meh, there are zealots in every community -- we're not even talking programming language communities only. Not even programming either. Everywhere.
No idea why people over-reacted so much to one particular 0.1% of fanatics. It's a pretty normal state of affairs. Point me at your hobby group, and even if it has only 20 people I can bet my balls that at least 1 of them is a fanatic.
Overreacting to fanatics is also a normal state of affairs, so don't act surprised. :) By their nature fanatics almost always make a disproportionate amount of noise, and if you're outside the community you often can't tell the difference: you don't know which of the loudmouths, if any, the community's members actually pay attention to, etc. And even more broadly, a small number of people can cause a lot of damage.
> A huge investment. If you are going to do that then you might as well just move to Rust.
People say that, but the people who say this rarely have any practical experience using CBMC. It's very straightforward to use. I could teach a developer to use it reliably, on practical software, in a month.
I am not denying it, nor am I claiming that "just move to Rust" is a universal escape hatch.
What I am saying is that if it were as simple as "just learn CBMC", then maybe Microsoft and Google would not have published their studies demonstrating that 60%-75% of all CVEs are memory safety errors like buffer under- and over-flows.
These studies aren't wrong. But that's also because neither Microsoft nor Google makes use of practical formal methods in day-to-day development. Both have research teams and pie-in-the-sky projects, not dissimilar to this DARPA project. But when it comes down to the nitty-gritty development cycle, both companies use decades-old software development practices.
A lot of people are reading this as a call or demand to translate all C and C++ code to Rust, but (despite the catchy project name), I don't read the abstract in that way. There are two related but separate paragraphs.
1. C and C++ just aren't safe enough at large. Even with careful programming and good tooling, so many vulnerabilities are caused by their unsafe by default designs. Therefore, as much code as possible should be translated to or written in "safe" languages (especially ones that guarantee memory safety).
2. We are funding and calling for software to translate existing C code into Rust.
It's not a consensus to rewrite the world in Rust. It's a consensus to migrate to safe languages, which Rust is an example of, and a program that targets Rust in such migration.
So when those languages have 'unsafe' constructs, what are the rules going to be around using them? Without a defined set of rules here, you're just going to end up right back where you started.
> to migrate to safe languages, which Rust is an example of
Rust has a safe mode. It is _not_ a safe language. To do anything interesting you will require unsafe blocks. This will not get you very much.
Meanwhile you have tons of garbage collected languages that don't even let the programmer touch pointers. Why aren't those considered? The reason is performance. And because Rust programmers "care" so much about performance you're not ever going to solve the fundamental problem with that language.
Do you want performance or safety? You can't have both.
> Rust has a safe mode. It is _not_ a safe language. To do anything interesting you will require unsafe blocks. This will not get you very much.
1. There are plenty of interesting programs which don't require unsafe.
2. Even if your program does require unsafe, Rust still limits where the unsafety is. This lets you focus your scrutiny on the small section of the program which is critical for safety guarantees to hold. That is still a win.
You can do tons of stuff with purely safe Rust. The main things that you can't do are FFI, making self-referential structures, and dereferencing raw pointers.
And unsafe isn't a problem. It's a point of potential danger to be heavily audited, tested, and understood. Having the entire language unsafe by default is an obviously worse situation. This is throwing the baby out with the bathwater, like rallying against seat belts because you can still die while wearing one. An improvement is still an improvement. I don't understand why people criticizing Rust tend so heavily to let perfect be the enemy of good.
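A minimal sketch of that containment, with an illustrative function: a single audited `unsafe` block behind a safe signature that callers cannot misuse.

```rust
/// Returns the first half of `buf`.
///
/// The one `unsafe` block below is the only code that needs a safety
/// audit; every caller goes through this safe signature.
pub fn first_half(buf: &[u8]) -> &[u8] {
    let mid = buf.len() / 2;
    // SAFETY: `mid <= buf.len()` always holds, so the range is in bounds.
    unsafe { buf.get_unchecked(..mid) }
}

fn main() {
    assert_eq!(first_half(&[1, 2, 3, 4]), &[1, 2]);
}
```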
> I don't understand why people criticizing Rust tend so heavily to let perfect be the enemy of good.
if you've convinced yourself that you're special and all problems with c are solved by trying harder, clearly everyone else is just lazy. with that line of logic, there's nothing to fix with c. rust is not just redundant, but also aggravating, since its popularity causes the cognitive dissonance to start creeping in.
maybe i can make mistakes? should we improve tooling somewhat? no, it's the children who are wrong.
> all problems with c are solved by trying harder, clearly everyone else is just lazy.
If you're even remotely familiar with professional C development then you know this is unironically true. Tooling does exist to offer memory-safety features in C; it's just far more complicated than using a safe language from the outset. Nobody wants to use Valgrind when your linter can do the same job without leaving your editor.
Most of today's high-performance code, whatever the source language, is compiled through LLVM to the same IR that clang produces when compiling C. Unless you're a GCC pundit, it doesn't make sense to reject the direction the industry is headed in.
> maybe i can make mistakes? should we improve tooling somewhat?
People still die while wearing seatbelts, helmets and motorbike protective gear, body armor, bullet proof vests, yet many more survive, than those not wearing any of those in similar situations.
I'm really surprised this can work at all in any automated way. You can't just make a line-by-line transcription of a typical C program into Rust. Pointers and aliasing are ubiquitous in C programs, and they are exactly the concepts Rust explicitly restricts. You have to rethink many typical constructs at a high level to rewrite a C program in Rust, unless you wrap the whole thing in "unsafe."
Line by line is infeasible, which is precisely why you need to use AI to make larger semantic inferences.
You also don't have to one-shot translate everything. One of the valuable things about the Rust compiler is it gives lots of specific information that you can feed back into an LLM to iterate.
I've been working on similar problems for my startup (grit.io) and think C -> Rust is definitely tractable in the near term. Definitely not easy but certainly solvable.
That's probably the route they would take, but the C AST won't have ownership attributes. You'd have to discover those yourself.
ASTs also don't have much info on threading (that's more or less limited to "the program starts a thread with entry point foo at some time" and "foo waits for another thread to finish").
Foundation models aren't primarily trained on ASTs, so you're typically going to have worse results than just using text unless you do extensive fine-tuning yourself.
ASTs also generally don't actually have magical information in them. They won't solve the lifetime issues for you.
> Pointers and aliasing are ubiquitous in c programs
If we ignore multi-threaded programs, is long-term aliasing actually ubiquitous in C programs? For many programs, I would expect most of it to happen within the scope of a single function (and within it, across function calls, but there borrowing will solve this, won't it?)
If so, I would try to tackle that as one sub-problem (you have to start somewhere), and detecting how data gets shared between threads as another. For the latter, I expect that many programs will have some implicit ownership rule such as "thread T1 puts stuff in queue Q where thread T2 will pick it up" that can be translated as "putting it in the queue transfers ownership".
Detecting such rules may not be easy, but it doesn't look completely out of reach to me either, and that would be good enough for a research project.
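That implicit rule is exactly what Rust's channels already encode; a minimal sketch with illustrative data:

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();

    // "Thread T1 puts stuff in queue Q..."
    let t1 = thread::spawn(move || {
        let work = vec![1u8, 2, 3];
        tx.send(work).unwrap(); // ownership of `work` moves with the message
        // `work` is unusable past this point; the compiler enforces the rule
    });

    // "...where thread T2 will pick it up" -- the receiver now owns the data.
    let received = rx.recv().unwrap();
    assert_eq!(received, vec![1, 2, 3]);
    t1.join().unwrap();
}
```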
For a naive newcomer - could you go line by line, wrap the whole thing in “unsafe”, compile to an identical binary, and then slowly peel away the “unsafe” while continuing to validate equivalence?
That would at least get you to as much rust as possible, and then let engineers tackle rethinking just those concepts.
Converting C to legal (unsafe) Rust is quite possible; there is indeed already a tool that does this (https://github.com/immunant/c2rust).
The problem you run into is that the conversion is so pedantically correct that the resulting code is useless. The result retains all of the problems that the C code has, and is so far from idiomatic Rust that it's easier to toss the code and start from scratch. Progressively lifting unsafe Rust to safe Rust is a very tall order, and the project I mentioned had a tool for that... which is now abandoned and unmaintained.
At the end of the day, the chief issue with converting to safe Rust is not just that you have to copy semantics over, but you also have to recover a lot of high-level preconditions. Turning pointers into slices is perhaps the easiest task of the lot; given the very strict mutability rules in Rust, you also have to work out when and where to insert things like Cell or Rc or Mutex or what have you, as well as building out lifetime analysis. And chances are the original code doesn't get all these rules right, which is why there are bugs in the first place.
Solving that problem is the goal of this DARPA proposal, or perhaps more accurately, determining how feasible it is to solve that problem automatically. Personally, I think the better answer is to have a semi-automated approach, where users provide as input the final Rust struct layouts (and possibly parts of the API, to fix lifetime issues), and the tool automates the drudgery of getting the same logic ported to that mapping.
Right. Used c2rust once. Been there, done that. The Rust code that comes out is awful. Does the same thing as the C code, bugs and all. You don't get Rust subscript check errors, you get segfaults from unsafe Rust code. What comes out is hopeless for manual "refactoring".
The hardest part may be Rust's affine type rules. Reference use in Rust is totally different than pointers in C/C++. Object parenting relationships are hard to express in Rust.
What you want is to bridge incompatibilities between different high-level languages with a low-level intermediary, so you aren't stuck attempting to convert one high-level hardware abstraction directly into another high-level hardware abstraction.
> Those involved with the oversight of C and C++ have pushed back, arguing that proper adherence to ISO standards and diligent application of testing tools can achieve comparable results without reinventing everything in Rust.
If you stick to extremely stringent coding practices and incorporate third-party static verification tools that require riddling your code with proprietary annotations, then sure, you can achieve comparable results with C/C++.
It's quite hilarious to see the pushback Rust gets from the C/C++ community. Obviously their decades of hard work and experience with those languages are overriding their reasoning circuits. Who in their right mind would defend a language with such major and obvious design flaws when a genuine alternative is there?
Many of the most widely used languages have obvious major design flaws. (JavaScript is one obvious candidate, python is another. How did a language which has no built-in floating point type become the number one language for numerical analysis?)
The real question is what tradeoffs you are making and what you are gaining. Rust makes certain memory safety guarantees about the program at compile time, but at the same time it disallows perfectly safe constructions, which can exist in C++, as well.
I think DARPA is making the right decision about choosing Rust as the language for low level systems programming. For national security related matters you'd definitely want the certainty Rust brings.
The reason I personally chose Rust as my go-to language for low-level programming is that, despite learning systems programming in college, I pretty much never used it outside of school. Meaning I didn't have any of that knowledge that C and C++ programmers had built up over years of experience. So I decided that instead of having to deal with unknown skill deficiencies in writing concurrent software and managing memory, I'd rather just have a compiler scream at me. I don't regret the decision.
Also, I remember writing an async TCP implementation in college with C++ using Boost. Rust tooling is just so far ahead of that.
> I think DARPA is making the right decision about choosing Rust as the language for low level systems programming. For national security related matters you'd definitely want the certainty Rust brings.
I see this differently: DARPA bets on different baskets in parallel. This is just one basket, if they are wrong it doesn't matter because there are other bets to reduce the general risk.
I don't see anyone defending JavaScript. In fact a whole lot of people are using typescript now because JavaScript is just so bad.
As for python, that's a good point. I guess it's just because it's easy to use and all the numerical stuff is done with C bindings anyway?
But the C++ situation is genuinely different. There's a reason governments are now calling upon developers to just let it die already[0]. That design flaw is so bad it's causing genuine harm.
>I guess it's just because it's easy to use and all the numerical stuff is done with c-bindings anyway?
No, it's horrible, because now you have both python types and numpy types, which don't really interact well with one another. If you are using a language made for numerical analysis (e.g. Julia), a lot of headaches disappear instantly.
Python is 100% just a case of a language being used because it is being used. It has, by itself, few merits for many of the tasks it is actually being used for.
>design flaw
It is a tradeoff though. Rust is paying that tradeoff by being very restrictive about certain patterns and being in general quite complex to learn.
Honestly, it feels too limiting. I know about unsafe and all that, but there's just something about managing memory manually. Zig is a good middle ground imo.
I think it depends on the domain. You don't need Rust's memory safety guarantees everywhere for everything. But if you're writing some sensitive piece of code where security is crucial, it seems crazy not to use a language like Rust.
I don't see this working. There are abstractions in C which are not replicable in Rust without major changes.
In C, having two separate data structures which carry an identical pointer and write to it is a common occurrence. This cannot be trivially replicated in Rust and will need some reasonably clever intervention.
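For what it's worth, the usual intervention looks something like this sketch (struct names illustrative): shared ownership via Rc, with the aliased writes checked at runtime by RefCell instead of being forbidden at compile time.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Two separate structures holding "the same pointer", as in the C pattern.
struct Writer { buf: Rc<RefCell<Vec<u8>>> }
struct Logger { buf: Rc<RefCell<Vec<u8>>> }

fn main() {
    let shared = Rc::new(RefCell::new(Vec::new()));
    let w = Writer { buf: Rc::clone(&shared) };
    let l = Logger { buf: Rc::clone(&shared) };

    // Both structures write through their copy of the pointer.
    w.buf.borrow_mut().push(1);
    l.buf.borrow_mut().push(2);
    assert_eq!(*shared.borrow(), vec![1, 2]);
}
```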
Is this supposed to be automatic? And if so, wouldn't any program that can automatically port C to Rust by necessity contain all the functionality needed to make the C code itself safe?
I don't think a reasonable reading of the statement implies "fully automated", at which point the answer to the question is no.
Obviously some C code isn't just "not verifiable correct" but "actually wrong in a memory unsafe way". That code isn't going to be automatically translated without human intervention because, how could it be, there is no correct equivalent code. The tooling is going to have to have an escape hatch where it says "I don't know what this code is meant to do, and I know it isn't meant to do what it does do (violate promises to the compiler), help me human".
On a theoretical level it's not possible for that escape hatch to be used only when undefined behaviour actually occurs (Rice's theorem). On a practical level it's probably not even desirable to try, because sufficiently obtuse code shouldn't just be blindly translated.
So what I imagine the tooling ends up looking like is an interactive tool that does the vast majority of the work for you, but is guided by a human, and ultimately as a result of that human guidance doesn't end up with exactly equivalent code, just code that serves the same purpose.
Since you did not specify that you wish to preserve all behaviors of the C code, there are trivial solutions to this problem. For example, one could replace all dynamic memory allocations with fixed buffers (set at translation time), and reject all inputs that do not fit in those buffers.
It's good to see DARPA pushing on this. It's a hard problem, but by no means impossible. Translating to safe Rust, though, is going to be really tough. There's a C to Rust translator now, but what comes out is horrible Rust, which just rewrites C pointer manipulation as unsafe Rust struct manipulation. The result is less maintainable than the original.
So what would it take to actually do this right? The two big problems are 1) array sizes, and 2) non-affine pointer usage. Pointer arithmetic is also hard, but rare. Most pointer arithmetic can be expressed as slices.
Every array in C has a size. It's just that the compiler doesn't know what it is.
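A minimal sketch of what recovering that size looks like at an FFI edge (the signature is hypothetical): once the pointer-plus-length pair becomes a slice, the size travels with the value and everything downstream is bounds-checked.

```rust
use std::slice;

/// Hypothetical FFI entry point: the classic C "pointer plus length" pair.
///
/// # Safety
/// The caller must pass a pointer to `len` valid, initialized `u32`s.
unsafe extern "C" fn sum(ptr: *const u32, len: usize) -> u32 {
    // The size the C type system never carried is now part of the value.
    let data: &[u32] = unsafe { slice::from_raw_parts(ptr, len) };
    data.iter().sum() // safe, bounds-checked code from here on
}

fn main() {
    let xs = [1u32, 2, 3];
    let total = unsafe { sum(xs.as_ptr(), xs.len()) };
    assert_eq!(total, 6);
}
```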
I once tried to use c2rust as a starting point for rustification of code and... it's not even good at that. The code is so freakishly literal to the original C semantics that you can't even take the non-pointery bits, strip off the unsafe block, and use that as a basis.
(To give you a sense, it translates something like a + 1 to a.wrapping_add(1i32), and my recollection is that for (int i = 0; i < 10; i++) gets helpfully turned into a while loop instead of a for loop).
In general, the various challenges that all need to be solved that aren't solved yet are:
a) when is integer overflow intentional in the original code, so that you know when to use a wrapping op instead of the regular Rust operators? (see the sketch after this list)
b) how to convert unions into Rust enums
c) when pointers are slices, and what corresponds to the length of the slice
d) convert pointers to references, and know when they're mutable or const references
e) work out lifetime annotations where necessary
f) know when to add interior mutability to structs
g) wrap things in Mutex/RwLock/etc. for multithreaded access
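To make (a) concrete, a minimal sketch of the two translations a tool has to choose between (constants illustrative, in the style of an FNV hash):

```rust
fn main() {
    // FNV-1a-style hashing relies on well-defined unsigned wraparound; a
    // faithful translation must spell that out with a wrapping operation:
    let h: u32 = 0x811c_9dc5;
    let hashed = h.wrapping_mul(16_777_619);

    // Where overflow would be a genuine bug in the original, the plain
    // operator is the right translation: it panics in debug builds and
    // surfaces the defect instead of silently wrapping.
    let total = 2u32 * 3;

    println!("{hashed} {total}");
}
```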
We're a very long way from having full-application conversion workable, and that might be sufficiently difficult that it's impossible.
That doesn't mention the affine type problem. Rust references are restricted to single ownership. If A has a reference to B, B can't have a reference to A. Bi-directional references are not only a common idiom in C, they're an inherent part of C++ objects.
Rust has to use reference counts in such situations. You have an Rc wrapped around structs, sometimes a RefCell, and .borrow() calls that panic when you have a conflict. C code translates badly into that kind of structure.
Static analysis might help find .borrow() and .borrow_mut() calls that will panic, or which won't panic. It's very similar to finding lock deadlocks of the type where one thread locks the same lock twice.
(If static analysis shows that no .borrow() or .borrow_mut() on a given RefCell will ever panic, you don't really need the RefCell's runtime checks. That's worth pursuing as a way to allow Rust to have back references.)
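A minimal sketch of that structure, with illustrative types:

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Parent {
    children: RefCell<Vec<Rc<Child>>>, // downward edges stay strong
}

struct Child {
    parent: Weak<Parent>, // the back edge is weak, so ref counts never cycle
}

fn main() {
    let parent = Rc::new(Parent { children: RefCell::new(Vec::new()) });
    let child = Rc::new(Child { parent: Rc::downgrade(&parent) });

    // This borrow_mut() is exactly the kind of call that panics at runtime
    // if any other borrow of `children` is still live.
    parent.children.borrow_mut().push(Rc::clone(&child));

    assert!(child.parent.upgrade().is_some()); // the back reference resolves
}
```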
I'd lump that analysis somewhere in (d)-(g), because you have to remember that &mut is also noalias, and work out the downstream implications of that. It's probably presumptuous of me to assume a particular workflow for reconstructing the ownership model to express in Rust, and dividing it into the steps I did isn't the only way to do it.
In any case, it's the difficulty of that reconstruction step that leaves me thinking that automated conversion of whole-application to Rust is a near-impossibility. Conversion of an individual function that works on plain-old-data structures is probably doable, if somewhat challenging.
An off-the-cuff idea I just had is to implement a semi-automated transformation, where the user has to input what the final conversion of a struct type should look like (including all Cell/Rc/whatever wrappers as needed), and the tool can use that to work out the rest of the translation. There's probably a lot of ways that can go horribly wrong, but it seems more feasible than trying to figure out what all of the wrappers need to be.
In my understanding, this is a call for proposals to do the work; there is no detailed discussion yet. That will come when there are actual responses to this call.
I've tried that thing. The Rust that comes out is terrible. It converts C into a set of Rust function calls which explicitly emulate C semantics by manipulating raw pointers. It doesn't even convert C arrays to a Vec. It's a brute-force transliteration, not a translation.
I and someone else ran this on a JPEG 2000 decoder that sometimes crashed with a bad memory reference. The Rust version crashed with the same bad memory reference. It's bug-compatible.
What comes out is totally unreadable and much bigger than the original C code. Manual "refactoring" of that output is hopeless.
> Any automatic translation is bug-compatible with the original. Did you expect it to divine some requirements?
That would be useless when translating C to Rust. Yes, I would expect the tool to point out the flaws in the original memory handling and only translate the corrected code. This is far from easy, since some information (intent) is missing, but a good coder could do it on decent codebases. The question is, can an automated tool do it too? We'll see.
It doesn't make sense to convert a C array to a Vec: the Vec type is a growable array, but the C array isn't growable. It makes sense to convert to Rust's array type, which has a fixed size. And then we realise there's a problem at API boundaries, because C's arrays decay to pointers, so the moment we touch an API boundary all safety is destroyed.
Firstly, that's not an array. C has actual arrays, even though they decay to pointers at API edges, and what you've made with malloc is not an array. I'll disregard C++ new and new[].
But also, it's definitely not a growable array. Box::new_uninit_slice makes the thing you've got here: a heap allocation of some specific size, which doesn't magically grow (or shrink) and isn't initialized yet.
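A quick sketch of the distinction, using an initialized Box<[u8]> in place of Box::new_uninit_slice for brevity:

```rust
fn main() {
    // A real C-style array, like `uint8_t a[16]`: the size is part of the type.
    let a: [u8; 16] = [0; 16];

    // What malloc(16) actually hands you: a fixed-size heap allocation.
    // Box<[u8]> models that; it neither grows nor shrinks.
    let b: Box<[u8]> = vec![0u8; 16].into_boxed_slice();

    // A growable array is a different thing entirely.
    let mut v: Vec<u8> = Vec::with_capacity(16);
    v.push(1); // only the Vec can do this

    println!("{} {} {}", a.len(), b.len(), v.len());
}
```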
> I ran this on a JPEG 2000 decoder that sometimes crashed with a bad memory reference. The Rust version crashed with the same bad memory reference. It's bug-compatible.
Of course it is. The README says it generates unsafe rust in the first paragraph, what did you expect?
I think it's a really fascinating experiment, and IMHO it's pretty remarkable what it can do. This is an incredibly difficult problem after all...
It seems easy (relatively speaking) to directly translate C to Rust if you're allowed to use unsafe and don't make an effort to actually verify the soundness of the code.
But if you need to verify the soundness and fix bugs while translating it? That's really hard, and that's what it sounds like what TRACTOR wants to do.
Using "unsafe" doesn't automatically make Rust useless, of course, but the example on the c2rust website itself doesn't make any effort to verify its usage of unsafe (you can easily read memory out of bounds just by changing "n" to "n + 1" in the example loop). Sadly, that is a much, much harder problem to solve even for fairly basic C programs.
Eh, if c2rust "seems fairly easy" to you, I can pretty much guarantee you don't appreciate the complexity involved. Just take a look at the commit log...
As I mentioned elsewhere (https://news.ycombinator.com/item?id=41113257), that tool is pretty much useless unless you have some checkbox that says "no C code allowed anywhere". It's not even a feasible starting point for refactoring because the code is so far from idiomatic Rust.
This is a terrible idea. In order to get rid of one specific class of bugs, you want to risk introducing logic errors and performance issues and make the code harder to maintain.
Not to mention that this quote is incredibly scary. This is someone we are trusting to make this decision?
"You can go to any of the LLM websites, start chatting with one of the AI chatbots, and all you need to say is 'here's some C code, please translate it to safe idiomatic Rust code,' cut, paste, and something comes out, and it's often very good, but not always," said Dan Wallach, DARPA program manager for TRACTOR, in a statement.
The problems come with maintaining the translated code bases:
1. You start with a code base written in C and a team of C engineers who have a good mental model of the code base and can maintain it.
2. An automatically translated Rust code base. Potentially (I'd say probably, but that is just my gut feeling) harder to read and understand than the original one.
3. Now you need a team of Rust engineers that have a good mental model of the code base that was generated.
If you already have that team of Rust engineers, I'd rather let them rewrite the code manually as they can improve it and have the correct mental model from the start.
Difficult: most C programs I know would convert to one single large "unsafe" block...
One might argue that re-writing from scratch is the safer option; and a re-write is also an opportunity to do things differently (read: improve the architecture by using what one has learned), despite the much-feared "second system" syndrome.
But there's nothing wrong with spending some research dollars on tooling for "assisted legacy rewrites". DARPA and its sister agency IARPA fund step innovation (high risk, high reward), and this is an area where good things can potentially come.
Would be nice if they could first hire all the smart engineers (that Mozilla laid off) to continue working on the language itself.
Async is still a half-finished mess even for people who use it every day. That's my main annoyance, but there are many others (trait specialization, orphan rule limits, HKT, etc.).
Ah! Now's my chance to cheaply break into the dev field by becoming an expert Rust-all-the-C fairy who can flutter into high liability industries and get paid big bucks rebuilding the wheel into a memory-safe wheel!
(I'm only being half-facetious, I fear I may never break in!)
Russ Cox gave a GopherCon talk on the effort to automatically convert the Go compiler from C to Go (in the early days). Lots of interesting IRL issues / solutions in there.
I am a total C shill... I'll admit it. I'm just starting to learn it, and the only reason I picked it is that I wanted a low-level language that most systems run.
That said, while I can acknowledge the benefits of memory safety, I would personally choose zig over rust.
All things considered, I know you can do some safety checks for C using the compiler, and that helps reduce the odds of memory issues.
Idk what Rust's main libraries look like (are they even called that?), but I know C's standard libraries have made learning stuff easier. Is there any way to know if your libraries in Rust are using unsafe code? Will that just spit out compile-time errors?
Every tool has its own specific quirks. Over many years of using a tool, "expertise" is the intimate knowledge of those quirks and how to use that tool most effectively. Changing tools requires you to gain expertise again. You're going to be less proficient in the new tool for a long time, and make a lot of mistakes.
Considering we already know how to make C/C++ programs memory safe, it's bizarre that people would ditch all of their expertise, and the years and years of perfecting the operation of those programs, and throw all that out the window because they can't be bothered to use a particular set of functions [that enforce memory safety].
If you're going to go to all of the trouble to gain expertise in an entirely new tool, plus porting a legacy program to the new tool, I think you need a better rationale than "it does memory safety now". You should have more to show for your efforts than just that, and take advantage of the situation to add more value.
But even proficient C and C++ programmers continue to produce code with memory safety issues leading to remote code execution exploits. This argument doesn’t hold up to the actual experience of large C and C++ projects.
They aren't trying to prevent them. It's trivial to prevent them if you actually put effort into it; if you don't, it's going to be vulnerable. This is true of all security concerns.
"You aren't trying hard enough" isn't a serious approach to security: if it was, we wouldn't require seatbelts in cars or health inspections in restaurants.
(It's also not clear that they aren't trying hard enough: Google, Apple, etc. have billions of dollars riding on the safety of their products, but still largely fail to produce memory-safe C and C++ codebases.)
In the case of OpenSSL, Big Tech clearly neglected proper support until after the Heartbleed vulnerability. Prior to Heartbleed, the OpenSSL Software Foundation only received about $2K annually in donations and employed just one full-time employee [1]. Given the project's critical role in internet security, Big Tech's neglect raises concerns about their quality assurance practices for less critical projects.
The OpenSSL Foundation is not exempt from criticism despite the inadequate funding. Heartbleed was discovered by security researchers using fuzz testing, but proactive fuzz testing should have been a standard practice from the start.
OpenSSL is not a great example, either before or after funding — it’s a notoriously poorly architected codebase with multiple layers of flawed abstractions. I meant things more like Chromium, WebKit, etc.: these have dozens to hundreds of professional top-bracket C and C++ developers working on them, and they still can’t avoid memory corruption bugs.
Good True C Programmers had guard rails | canary bytes | etc. to detect and avoid actual buffer overflow (into unallocated memory) rather than technical buffer overflow (reading|writing past the end of a char|byte array).
> Considering we already know how to make C/C++ programs memory safe...
I think that the legion of memory bugs which still occur in C/C++ programs are proof of one of two things:
1. We (the industry as a whole) do not actually know how to make these programs memory safe, or
2. Knowing how to make programs memory safe in C/C++ is not sufficient to prevent memory safety issues.
Either way, it seems clear that something needs to be done and that the status quo in C/C++ programming is not enough. I'm not saying Rust will be the right answer in the end (I do like it, but there's a ton of hype and hype makes me distrustful), but I can't fault people for wanting to try something new.
Why would AI be competent at finding bugs? Most non-trivial bugs I find are about unexpected interactions between distinct pieces of code. That seems totally infeasible for an LLM to be good at.
It’s not any more of a joke than the hee-haw nonsense that using an LLM to translate working C code into something else will yield a result with fewer bugs.
I program in C++ and am very happy to do so. Modern C++ is very safe and actually fun to program in. It gives me enormous expressivity, extraordinary performance, and safety when I need it. I'm not building space shuttles, I'm building 3D experiences, so I'm not terribly concerned about crashes. But even for me, I've not run into a memory corruption bug in recent memory (10-15 years).
Bash C/C++ all you want. I'm happy to keep using it to my advantage.
What is the learning curve for newbies to avoid critical segfaults? If you still have to walk a tightrope to get code across, wouldn't everyone benefit from a plankway with guardrails instead?
I'm not dissing C or C++ in any way; I've used them. But I recognize there are some major footguns that aren't easy to avoid, causing a much longer learning curve than necessary to get things built. Rust at least seems determined to address them, good or bad!
To a first approximation, avoid using raw pointers. They should almost never be needed in application code. Use C++'s standard library facilities for smart pointers and containers instead. They are masterpieces of engineering, and work extremely well.
I program in modern C++ as well (C++23). I disagree with both "very safe" and "fun". Even with C++23 there are innumerable footguns throughout both the language and the standard library. Debugging code is also a mess. Good luck getting anything done without paying for an IDE, and even then it can be a struggle.
Of all the languages I use, C/C++ have the least need for paid tools.
I use emacs (and vim), make, and Boost's b2 build system for most of my programming. Although on Windows, Visual Studio is a joy to use. On Linux I use gdb. Works fine. I also use static analysers and valgrind. But I come from a tradition of Unix and living on the command line.
I've tried CLion, because I pay for IntelliJ IDEA for other programming (I also have to write JavaScript and Python). But while it's nice, there is nothing there that I couldn't do without.
If you stick to the C++ standard library and Boost, turn on all warnings, and are reasonably competent, you won't encounter any bugs so serious that your program crashes inexplicably.
There's no direct translation back and forth between unsafe C and safe Rust, though, and there are infinitely many memory-safe interpretations possible. If you're only interested in a black-box executable where certain tests pass, then I suppose you could just save the C and delete the Rust. But the Rust has more information (which can't be deduced deterministically from the C). C would first get transpiled to unsafe Rust. Then some intelligence (A.I. or human) would get rid of all the unsafe keywords by making new design decisions that affect how the executable works inside. Each intelligence will do it differently. New edge-case tests might give different outputs depending on when you transpiled the C. It'd be better to save the Rust and make future changes without worrying whether the latest A.I. will make the same design decisions as the prior A.I. each time you compile.
Yeah. I'm not sure how transpiling to Rust is that much different from using the various standard analysis tools and back-porting things like the counted_by attribute.
Also, the lowest-hanging fruit in C would be adding the ability to box and unbox fat pointers to objects.
I guess it is a consensus like `goto considered harmful` or `numbering should start at zero`, which is not a perfect consensus, but as much of a consensus as you can reach for such a disparate community.
i like the idea but i struggle to see how one can go about doing 'safe' disk reads, having 'safe' ways to manage global resources in kernel land (page tables, descriptor tables etc) and a lot of other stuff. perhaps if those devices also have rust in their firmware they can reply safely?? genuinely curious because i went back to C from rust in my OS. i could not figure it out (maybe i am not a darpa level engineer but i did work at a similar place doing similar things).
id be excited if this gets solved. rust is a lot more comfy for higher level kernel stuff.
Anyone interested in this should apply, but also look into one of the small software consultants that does a lot of government contracting. Those consultants will likely also be involved in this and more potential opportunities to work on this. Also, the private sector pays (much) better and you'll have liaisons to handle most of the bureaucratic nonsense that accompanies a government job, especially within the morass of the DoD. Going into this without being prepared for immense political nonsense will not be effective.
Isn't it weird that they don't mention the already-existing open-source project they funded, c2rust? And no mention of the company behind it, Immunant, either.
They didn't explain why they've chosen Rust. There are a lot of memory-safe languages besides Rust, especially in application-level area (not systems-level like Rust).
There are a lot of memory safe languages; there are fewer that have (1) marginal runtime requirements, (2) transparent interop/FFI with existing C codebases, (3) enable both spatial and temporal memory safety without GC, and (4) have significant development momentum behind them. Rust doesn't have to be unique among these qualifications, but it's currently preeminent.
Yes, but you assume all their projects need all 4 of these. I like Rust, but it's a bad choice for many areas (e.g. aforementioned application-level code). I'd expect serious decisions to at least take that into account.
I'm not assuming anything of the sort. These are just properties that make Rust a nice target for automatic translation of C programs; there are myriad factors guaranteeing that nowhere close to 100% of programs (C, application-level, or otherwise) will be suitable for translation.
Apart from runtime/embedded requirements, there's the big question of how you represent what C is doing in other languages that don't have interior pointers and pointer casting. For example, in C I might have a `struct foo*` that aliases the 7th element of a `struct foo[]` array. How do you represent that in Java or Python? I don't think you can use regular objects or regular arrays/lists from either of those languages, because you need assignments through the pointer (of the whole `struct foo`, not just individual field writes) to affect the array. Even worse, in C I might have a `const char*` that aliases the same element and expects every write to affect its bytes. To model all this you'd need some Frankenstein, technically-Turing-complete, giant-bytestring-that-represents-all-of-memory thing that wouldn't really be Java or Python in any meaningful sense, wouldn't be remotely readable or maintainable, and wouldn't be able to interoperate with any existing libraries.
In Rust you presumably do all of that with raw pointers, which leaves you with a big unsafe mess to clean up over time, and I imagine a lot of the hard work of this project is trying to minimize that mess. But at least the mess that you have is recognizably Rust, and incremental cleanup is possible.
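A minimal sketch of that interior-pointer pattern as it lands in unsafe Rust (struct and values illustrative):

```rust
#[derive(Debug)]
struct Foo { x: i32 } // stand-in for the C struct

fn main() {
    let mut arr = [
        Foo { x: 0 }, Foo { x: 1 }, Foo { x: 2 }, Foo { x: 3 },
        Foo { x: 4 }, Foo { x: 5 }, Foo { x: 6 }, Foo { x: 7 },
    ];

    // The interior pointer: an alias of the 7th element, like `&arr[6]` in C.
    let p: *mut Foo = &mut arr[6];

    unsafe {
        *p = Foo { x: 42 }; // a whole-struct write through the alias...
    }
    println!("{:?}", arr[6]); // ...is visible through the array itself
}
```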
I've spent the past few months translating a C library heavy in pointer arithmetic to TypeScript. Concessions have to be made here and there, but I ended up making utility classes to capture some of the functionality. Structs can be represented as types, since they can also be expressed as unions, similar to structs. These const types can have fields updated in place and inherit properties from other variables, similar to passing by reference, which JS can do (pass by sharing), or use a deep clone to copy.

As far as affecting the underlying bytes as a type, I've come up with something I call byte type reflection: a union type which does self-inference on the object properties in order to flatten itself into a bytearray, so that the usual object indexing and length properties automatically apply only to the byte array as it has been expressed (the underlying object remains as well). C automatically does this, so there is some overhead here that cannot be removed. Pointer arithmetic can be applied with an iterator class which keeps track of the underlying data object, but sadly that does count as another copy. Array splicing can substitute for creating a view of a pointer array, which is not optimal, but there are some Kotlin-esque utilities that create array views which can be used.

Surprisingly, the floating point values, which I expected to be way off and can only express as a number type, are close enough. I use Deno FFI, so there's plenty of room to go back to unmanaged code for optimizations, and WASM can be tapped into easily. For me those values are what is important, and it does the job adequately. The code is also way more resilient to runtime errors, as opposed to the C library, which has a tendency to just blow up.

TLDR: Don't let it stop you until you try, because you might just be surprised at how it turns out. If the function calls of a library are only 2-3 levels deep, how much "performance" are you really gaining by keeping it that way? Marshalling code is the usual answer, and Deno FFI does an amazing job at that.
Naah, I believe in some areas like DARPA's a lot of folks still do C out of tradition only. Same as in banking they still use COBOL -- way too many existing problems and integrations are already in COBOL.
In DARPA I think a lot of control software is written in C, even though some of their controllers can even run Java.
So a large-scale effort is needed to refactor all the infrastructure and processes.
The problem with Ada was the price of the compilers, and that for the few UNIX vendors that bothered with Ada, like Sun, it was an additional license on top of the already expensive SunOS/Solaris Developer SDK.
Thus the push for C and C++, alongside security certifications, where those languages feel like using Ada with a C-like syntax.
Nowadays we live in a world where developers refuse to pay for their tools like other professionals do, but hey, Rust is free beer, unlike the several-million-per-seat licenses of the Ada vendors, of which there are still 7 in business.
slight tangent, but I think it would be amazing if AI could write device drivers. Full-featured GPU drivers for, say, OpenBSD. What does it need? Probably the state machines of the GPUs, how to enable various modes, how to feed data in/out of the device at intended speed, how to load shaders.
Why can't AI learn to do that? Its reward could be getting past the initialization and getting to the default state of the driver. It could be trained on hundreds of GPU drivers, not only for the minutiae of how to load values into the control registers, but the bigger picture of what it actually means.
Do you know what an AI trained on hundreds of books looks like? Even with millions of books it cannot write a coherent chapter, much less an entire book.
This is a genuinely terrible idea. It is exactly the thing AI is bad at, high degree of accuracy over long stretches of output.
LLMs are bad at exactly those things which you need to make a GPU driver. Extremely high accuracy over long distances. AI totally falls apart when trying to write a novel, how could it write a GPU driver?
They just are terrible at producing long coherent segments of text.
>They are excellent at stealing artists' content.
AI doesn't "steal" anything. It is matrix multiplication. AI companies are exercising fair use to create derived works using matrix multiplication. Not only is it obviously fair use, it is also what every other artist does.
>I don't think it will be long for them to steal content from authors.
Most authors who have digitally available works have almost certainly had their works used as training data.
there's literally no such thing as theft. you only ever bring an object from one location to another, and all 3D planes are relative to the earth/sun/galaxy's orbit. there is no such thing as location.
I think this is indirectly a great argument for automated test generation or equivalence checking. The reason is that these translations might change the function of the code. Automated testing would show whether or not that happened. It also reveals many bugs.
So, they should solve total, automated testing first. Maybe in parallel. Then use it for equivalence checks.
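Something like this toy sketch (my example, nothing DARPA-specific) is what I mean by an equivalence check: exhaustively compare a stand-in for the legacy behavior against its supposed translation, and fail loudly on the first divergence:

    // Stand-in for behavior lifted from the C original.
    fn legacy_saturating_add(a: u8, b: u8) -> u8 {
        let s = a as u16 + b as u16;
        if s > 255 { 255 } else { s as u8 }
    }

    // Stand-in for the machine-translated Rust.
    fn translated_saturating_add(a: u8, b: u8) -> u8 {
        a.saturating_add(b)
    }

    fn main() {
        // The input space is small enough to check exhaustively here;
        // real code would use property-based or fuzz testing instead.
        for a in 0..=255u8 {
            for b in 0..=255u8 {
                assert_eq!(
                    legacy_saturating_add(a, b),
                    translated_saturating_add(a, b),
                    "divergence at ({a}, {b})"
                );
            }
        }
        println!("implementations agree on all 65536 inputs");
    }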
It sounds near-impossible to me to convert C or C++ into as-safe-as-possible Rust code, because the original intentions of the developer are missing. However, I wonder if some clever generative AI could be taught to recognize sufficient C programming patterns in relevant code bases to make the problem tractable.
Surely this could be better pitched to researchers as just another AI benchmark, a bit like ARC Prize? ;)
There could be some existing C projects that are already public, with tests for feedback during development iteration and some holdout tests, and some holdout projects too, with a leaderboard and prizes.
For preferences about converted code quality, both automated assessment and human preferences could be ranked with Elo? Kaggle is made for this sort of thing, I think.
I'm sure Google Deepmind and others have some MCTS agents that could do a great job with a bit of effort.
And as with most other competitions/benchmarks, the result is likely optimizing for the benchmark and not the wider goal ;-). It's difficult to get a serious effort without people trying to game the benchmark.
People could try, but it would not help much with the withheld datasets, and it would be possible to add more to them.
If the withheld data was closed, and only available to an assessment system, gaming that would be pretty difficult. Scale.com's SEAL Leaderboards take a similar approach. The ARC Prize still exists too, and it's waiting for winners.
I get the idea of moving to more memory safety, but the whole "rewrite everything in Rust" trend feels really misguided, because if you're talking about being able to trust code and code safety:
- Rust's compiler is 1.8 million lines of recursively compiled code, how can you or anyone know that what was written is actually trustworthy? Also memory safety is just a very small part of being able to actually trust code.
- C compiles down to straightforward assembly, almost like a direct translation, so you can at least verify that smaller programs that you write in C actually do compile down to assembly you expect, and compose those smaller programs into larger ones.
- C has valgrind and ASAN, so it's at least possible to write safe code with coding discipline, and plenty of software has been able to do this for decades.
- A lot of (almost all) higher level programming languages are written in C, which means that those languages just need to make sure they get the compiler and GC right, and then those languages can be used for general purpose, scripting, "low level" high level code like Go or OCaml, etc.
- There are many C compilers and only one Rust compiler, and it's unclear whether it'll really be feasible to have more than one Rust compiler due to the complexity of the language. So you're putting a lot of trust into a small group of people, and even if they're the most amazing, most ethical people, surely if a lot of critical infra is based on Rust they'll get targeted in some way.
- Something being open source doesn't mean it's been fully audited. We've seen all sorts of security vulnerabilities cause a world of hurt for a lot of people, all coming from open source code, often very small libraries that should be much easier to audit than projects with millions of lines of code.
- Similarly, Rust does not translate to straightforward assembly, and again that would seem to be impossible given the complexity of the language.
- There was an interesting project I came across called CompCert, which aims to have a C compiler that's formally verified (in Coq) to translate into the assembly you expect. Something like a recursively compiled CompCert C -> OCaml -> Coq -> CompCert would be an interesting undertaking, which would make OCaml and Coq themselves built on formally verified code, but I'm not sure if that'll really work and I suspect it's too complicated.
- I think Rust might be able to solve some of these problems if they had a fully formally verified toolchain, where the verifier is itself formally verified and the compiler is verified by that tool; then you would know that you can trust the whole thing. Still, the level of complexity and the inability to at least manually audit the core of it makes me suspect it's too complicated and would still be based on trust of some sort.
- I still think that static analysis and building higher level languages on top of C is a better approach, and working on formal verification from there, because there are really small C compilers like tinycc that are ~50k LOCs, which can be hand verified. You can compile chibi-scheme with tinycc, for example, which is also about ~50k LOCs of C, and so you get a higher level language from about 100k LOCs (tcc and chibi), which is feasible for an ordinary but motivated dev to manually audit to know that it's producing sound assembly and not something wonky or sketchy. Ideally we should be building compilers and larger systems that are formally verified, but I think the core of whatever the formally verified system is has to be hand verifiable in some way in order to be trustworthy, so that you can by induction trust whatever gets built up from that, and I think that would need to require a straightforward translation into assembly, with ideally open source ISA and hardware, and a small enough codebase to be manually audited like the tinycc and chibi-scheme example I gave.
- Worst case everyone kind of shrugs it all off and just trusts all of these layers of complexity, which can be like C -> recursively compiled higher level lang -> coffeescript-like layer on top -> framework, which is apparently a thing now, and just hope that all of these layers of millions of lines of code of complexity don't explode in some weird way, intentionally or unintentionally.
- Best case of the worst case is that all of our appliances are now "smart" appliances, and then one day they just transform into robots that start chasing you around the house, all the while the Transformers cartoon theme is playing in the background, which would match up nicely with the current trend of everything being both terrifying and hilarious in a really bizarre way.
I have fully drunk the RIIR koolaid, but SQLite would be near the absolute bottom of my prioritization list. Care to explain? SQLite is extensively tested, has requirements to run on ~every platform and be backwards compatible, and has a relatively small blast radius if there is a C-derived bug. There is much more fertile ground in any number of core system services (networking, sudo, DNS, etc.).
Zig is not safe; it's a C with a better templating system (comptime).
Ada is not popular enough, is my guess.
To be fair, writing everything in Ada SPARK would make code way more secure, simply because you'd need to write your pre-conditions, invariants, and post-conditions upfront and prove they hold. But no one seems to want to think about lifetimes, let alone about programming in more mathematical terms.
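A rough Rust analogue (my sketch, not from the parent; SPARK proves contracts like these statically, whereas this only checks them at runtime):

    // Runtime-checked contracts in the SPARK spirit (toy example).
    fn withdraw(balance: u32, amount: u32) -> u32 {
        // Precondition: you cannot withdraw more than the balance.
        assert!(amount <= balance, "precondition violated");
        let new_balance = balance - amount;
        // Postcondition: the balance decreased by exactly `amount`.
        debug_assert!(new_balance + amount == balance);
        new_balance
    }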
a) if every C program could be translated into an equivalent safe Rust program, that would mean that each C program is as safe as its safe Rust equivalent.
b) since there are C programs that are open to memory corruption in a way safe Rust isn't, this corruptibility would need to be translated into partially unsafe Rust. Congrats, you now have a corruptible Rust program, what's the point again??
c) so DARPA must be trying to fix/change what the program is doing when switching to Rust. But then how do you discern which behaviour is intended and which is not? Doesn't this run directly into the undecidability/uncomputability of the halting problem!?!
Memory corruption is undefined behavior and means the compiler is free to do anything it wants.
Anything it wants... and that includes doing something entirely safe and reasonable.
If you write out of bounds, the compiler is allowed to shut the program down in a controlled manner. It's allowed to transparently resize the array for you. Etc.
You "anything it wants" folks really annoy me a little.
If the compiler can, compile-time, detect that code is prone to memory corruption, it can warn the developer.
If it can't detect it at compile time, will it add some sort of magic signal-handler heuristic to determine whether a segfault occurred due to a runtime-provable specific instance of memory corruption, and hence format your hard drive, while for runtime-indeterminable kinds it'd rather fry CPU core seven preemptively? And then that behaviour changes in the next version to blinking SOS on the network cable LEDs?
I mean, it would be cool if compilers used their "freedom" here to output nagging messages: "the mem-safe UB brigade told you so, told you so, told you so...". The fact that they don't tells me, at least, that compiler developers follow Postel's law -- be strict in what you emit but lenient in what you process. They're reasonable people, not some sort of crusaders out there to get you in the most excruciatingly painful ways. Undefined behaviour isn't unreasonable behaviour.
    #include <stdio.h>

    int main(void) {
        int a[10];
        a[20] = 100;           /* out-of-bounds write: undefined behavior */
        printf("%d\n", a[20]); /* out-of-bounds read: also undefined */
    }
Because accessing a[20] is undefined behavior, it is legal to translate the program to the following Rust code (which crashes with an out-of-bounds error message at runtime).
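A minimal sketch of that translation (the `#[allow]` is the "tell it to go ahead anyway" part mentioned below):

    #[allow(unconditional_panic)] // rustc statically spots the constant out-of-bounds index
    fn main() {
        let mut a = [0i32; 10];
        a[20] = 100; // panics: index out of bounds: the len is 10 but the index is 20
        println!("{}", a[20]); // never reached
    }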
It gives a different result than gcc, but one that is both valid and useful. And that's why machine-translating to Rust could have benefits in practice. Contrary to simon_void's assertion, you can translate a corruptible program to a non-corruptible one.
(In this particular case the error is simple enough that the compiler catches it and we have to tell it to go ahead anyway, but in more complicated cases it won't be. So please don't get hung up on this point.)
>Doesn't this run directly into the undecidability/uncomputability of the halting problem!?!
The programmer gets to decide. DARPA does not expect the translator program to autonomously output a perfect Rust program. It just wants a "*high degree of automation* towards translating legacy C to Rust" (from the sam.gov link in the submission, emphasis mine).
I remember Ada getting pushed at a time when many in the computer industry were pushing Pascal as both a systems and a teaching language. Ada was a lot like Pascal, which I think caused an immediate violent reaction in some people. (E.g. the implementers of every other programming language were pissed that BASIC was so hegemonic, but they never asked "why?", or whether their alternatives were really any better.)
In the early 1980s, microcomputer implementations such as UCSD Pascal were absolutely horrific in terms of performance, plus they were missing the features you'd need to do actual systems programming work. In the middle of the decade you got Turbo Pascal, which could compile programs before you aged to death and also extended Pascal sufficiently to compete with C. But then you had C, and the three-letter agencies were still covering up everything they knew about buffer overflows.
I don’t know Rust but even if the Rust is just as unsafe in certain blocks, simply being translated to Rust removes a lot of corporate resistance to adopt the language.
Getting people to adopt a new language can be a lot of work. I remember people claiming they missed header files in Swift, so they wanted to stick with Objective-C.
Indeed. There have been UB bugs in the standard library caused by unsafe blocks.
Those are bugs. They are faults in the code. They need to be fixed. They are not UB-as-a-feature like in C/C++. “Well watch out for those traps every time you use this.”
This is like getting mad that a programming language boasts that it produces great binaries and yet the compiler has a test suite to catch bugs in the emitted assembly. That’s literally what you are doing.
> Those are bugs. They are faults in the code. They need to be fixed. They are not UB-as-a-feature like in C/C++.
Rust has UB-as-a-feature too. They could have eliminated UB from the language entirely, but they chose not to (for very valid reasons in my opinion).
UB is a set of contracts that you as the author agree to never violate. In return, you get faster code under the assumption that you never actually encounter a UB condition. If you violate those contracts in Rust and actually encounter UB, that's a bug, that's a fault in the code. If you violate those contracts in C++, that's a bug, that's a fault in the code. This is the same in both languages.
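To make that concrete, a toy sketch (my example, not the parent's) of such a contract in Rust, using `std::hint::unreachable_unchecked`:

    use std::hint::unreachable_unchecked;

    /// Contract: `d` must be in 0..=3. In exchange, the optimizer may
    /// compile the match to an unchecked jump table with no fallback.
    unsafe fn direction_name(d: u8) -> &'static str {
        match d {
            0 => "north",
            1 => "east",
            2 => "south",
            3 => "west",
            // Reaching this arm is UB -- a bug in the caller, exactly
            // as violating a contract would be in C++.
            _ => unreachable_unchecked(),
        }
    }

    fn main() {
        // SAFETY: 2 is within the documented contract.
        println!("{}", unsafe { direction_name(2) });
    }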
It's true that Rust UB can only arise from unsafe blocks, but it is not limited to unsafe blocks. Rust UB has "spooky action at a distance" the same way C++ UB does. In other words, you can write UB free code in Rust, but if any third party code encounters UB (including the standard library), your safe code is now potentially infected by UB as well. This is also the same in both languages.
There are good reasons to favor Rust's flavor of UB over C++'s, but I keep seeing these same incorrect arguments getting repeated everywhere, which is frustrating.
> It's true that Rust UB can only arise from unsafe blocks, but it is not limited to unsafe blocks.
This is correct, and it's hard to teach, and I agree that a lot of folks get it wrong. (Here's my attempt: https://jacko.io/safety_and_soundness.html.) But I think this comment is understating how big of a difference this makes:
1. Rust has a large, powerful safe subset, which includes lots of real-world programs. Unsafe code is an advanced topic, and beginners don't need to learn about it to start getting their work done. Beginners can contribute to big projects without touching the unsafe parts (as you clarified, that means the module privacy boundaries that include unsafe code, not just the unsafe blocks), and reviewers don't need to be paranoid about every line.
2. A lot of real-world unsafe Rust is easy to audit, because you can grep for `unsafe` in a big codebase and zoom right to the parts you need to look at. Again, as you pointed out, those blocks might not be the whole story, and you do need to read what they're doing to see how much code they "infect". But an experienced Rust programmer can audit a well-written codebase in minutes. It's not always that smooth, of course, but it's a totally different world when that's even possible.
> There are good reasons to favor Rust's flavor of UB over C++'s, but I keep seeing these same incorrect arguments getting repeated everywhere, which is frustrating.
Tell me what I wrote that was incorrect. I called them UB bugs in the standard library. If they were trivial bugs that caused some defined-behavior logic bug while used outside of the standard library then it wouldn’t rise to the level of being called an UB bug.
That's the part that's incorrect. That, plus the implication that UB is a bug in Rust, but not in C++. As I said, the existence of UB is a feature in both languages and actually encountering UB is a bug in both languages. You can play with the semantics of the word "feature" but I don't think it's possible to find a definition that captures C++ UB and excludes Rust UB without falling into a double standard. Unfortunately double standards on UB are pretty common in conversations about C++ and Rust.
Do you think UB-as-feature is something that someone would honestly describe C or C++ as? It's a pretty demeaning way of framing things. Indeed it's a tongue-in-cheek remark, a whimsical exaggeration/description of the by-default UB of those languages, which was added to the end of a completely factual description of the role that finding UB in the safe Rust subset of the standard library serves.
Of course one cannot, from the Rust side so to speak, use tongue-in-cheek, off-hand remarks in these discussions; one must painstakingly add footnotes and caveats, and list and mention every trivial fact like "you can get UB in unsafe blocks"[1], or else you have a "double standard".
[1] Obligatory footnote: even though all participants in the discussion clearly knows this already.
> Do you think UB-as-feature is something that someone would honestly describe C or C++ as?
Yes. That's how I describe it. That's also how Ralf Jung (long time Rust contributor and one of the main people behind Miri) describes UB in both Rust and C++ (although he says C++ overdoes it) [1]
The thing I edited out of my comment was "motte and bailey fallacy" because, after reflecting a bit, I thought it was unfair. But now you're actually trying to retroactively reframe it as a joke.
> Yes. That's how I describe it. That's also how Ralf Jung (long time Rust contributor and one of the main people behind Miri) describes UB in both Rust and C++ (although he says C++ overdoes it) [1]
Okay. Then I was wrong about that.
> The thing I edited out of my comment was "motte and bailey fallacy" because after reflecting a bit I thought it was unfair. But now you're actually trying to retroactively reframe as a joke.
What a coincidence. I had written on a post-it note that you were going to pull out an Internet Fallacy. (I guess it’s more about rhetoric.)
I guess you’ve never seen someone explain after the fact that they were being tongue in cheek (it’s not a joke, it’s an exaggeration)? Because jokes, sarcastic remarks are always clearly labelled and unambiguous? Okay then. I guess it was a Motte and Bailey.
> this infernal 'internal compiler representation'
What makes MIR "infernal"?
> I'm not even sure what is even remotely confusing about that?
You posted a link to a tool that executes pure Rust libraries and evaluates memory accesses (both from safe and unsafe Rust code) to assert whether they conform to the Rust memory model. It sits in the same space as Valgrind. You left it open to interpretation with really no other context. We can be excused for not knowing what you were trying to say. I personally still don't.
Miri is a MIR interpreter aimed at unsafe Rust, not safe Rust. Taking a swipe at the fact that it operates on an internal representation is very weird; almost all static and dynamic analysis tools work on some kind of IR or decomposed program representation.
> Miri is an Undefined Behavior detection tool for Rust. It can run binaries and test suites of cargo projects and detect unsafe code that fails to uphold its safety requirements.
> ... detect unsafe code that fails ...
Show me the documented safe Rust code that causes UB without using any unsafe blocks outside of the standard library.
There are some soundness holes in the implementation that can cause this. Just like any project, the compiler can have bugs. They’ll be fixed just like any bug.
So, the reason I posted my original reply, is that at one of my $DAYJOBs, we recently had a 3-day outage on some service, related to Rust. Something like using AVX to read, like, up to 7 bytes too many from an array.
Nothing really major -- we have a 10-day backup window, and the damage was limited to 4 days, so we were able to identify and fix all affected cases. But the person-to-git-blame for this issue happened to be one of my mentees, and... they were blown away by it.
As in: literally heartbroken. Unable to talk about it. "But the compiler said it was okay!", crying. One of my coworkers pointed at Miri, which correctly warned about the issue at hand, at which point I recommended incorporating that tool into the build pipeline, as well as (the usual advice in cases such as this) improving unit tests and focusing on the X-1 and X+1 cases that might be problematic.
To this day, I'm truly worried about my mentee. I'm just a C# wagie, and I fully accept that my code, my language, my compiler, and my runtime environment are all shit.
But, as evidenced by my experience and supported by the voting in this thread, it seems that Rust users self-identify with the absolute infallibility of anything related to the language, and react quite violently and self-destructively to any evidence to the contrary.
As a community leader, do you see any room for improvement there? And if not, what would it take to convince you?
> As in: literally heartbroken. Unable to talk about it.
I would hope that this person improves as an engineer, because this isn't particularly professional behavior, from the way you describe it.
> "But the compiler said it was okay!"
Given that you'd have to use unsafe to do this, the compiler can't say it was okay. It sounds like this person may not fully understand Rust either.
> it seems that Rust users seem to self-identify with the absolute infallibility of anything relate to the language, and react quite violently and self-destructively to any evidence to the contrary.
I don't see how this generalizes. You had one (apparently junior, given "mentee"?) person make a mistake and respond poorly to feedback. You also barged into this thread and made incorrect statements about Rust, and were downvoted for it. That doesn't mean that Rust users think everything is perfect.
> As a community leader, do you see any room for improvement there?
I do think sometimes enthusiastic people who don't understand things misrepresent the thing they're enthusiastic about, but that's a human problem, not a Rust problem. I do not think there's a way to fix that, no.
It'd require using unsafe code somewhere in the stack. Not necessarily by the mentee. It's possible that the AVX code wasn't properly hidden behind a safe abstraction in a library.
OK, so here's my heartfelt plea: remove the 'unsafe' keyword from Rust?
Sure, not being able to do basic things like IO might be a bit of a limitation at first, but, that's all worth it, I guess?
Again: I'm pointing out to you that your absolutist stance on 'unsafe' and 'UB' is doing more harm than good.
You continue to choose to ignore this, which is your right. But as a "community leader" you could and should do better. As could I, I guess, by simply ignoring you, but the mental-health issues I see you cause in real life make that sort of hard...
I don't know if he's choosing to ignore it, or if it's simply hard to figure out exactly what you're saying. Your comments are unfocused in a way that makes it hard to engage with any specific point.
The points are:
* Unsafe Rust is required to uphold specific guarantees to not cause undefined behavior. This can be tricky, but it's not impossible; it just involves a lot of care and some tooling like Miri for those specific situations. The situation is the same as in pretty much the entirety of C and C++, plus Rust's reference-safety rules.
* Safe Rust is designed to not cause any UB on its own. It can only "bleed" UB from incorrect unsafe code. Without any incorrect unsafe code, this is easy to work with and involves much less work and care.
* Therefore, keeping your unsafe blocks small and in dedicated crates where they can be individually tested increases the quality and reliability of the codebase (see the sketch after this list).
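As a toy sketch (my example) of such a safe abstraction: the bounds check establishes exactly the invariant the unsafe call needs, so every caller stays in safe Rust:

    /// The emptiness check establishes exactly the invariant that
    /// `get_unchecked` requires, so no safe caller can trigger UB here.
    fn first_or_zero(xs: &[i32]) -> i32 {
        if xs.is_empty() {
            0
        } else {
            // SAFETY: we just verified `xs` has at least one element.
            unsafe { *xs.get_unchecked(0) }
        }
    }

    fn main() {
        assert_eq!(first_or_zero(&[]), 0);
        assert_eq!(first_or_zero(&[7, 8]), 7);
    }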
Surely you can see that it's an improvement over the previous status quo. I don't know what absolutist stance you're talking about. Most Rust fans I know, including myself, accept that Rust is an imperfect language, representing an improvement over C and C++. It's not just hypothetical either. Rust has brought demonstrated improvement in reliability for us, and for some of the biggest companies in the world who now lean on it to reduce their rate of defects.
Given the story at hand, it sounds like the mentee incorrectly assumed the compiler would prevent UB even in unsafe blocks. They wouldn't be saying "But the compiler said it was okay" if it wasn't unsafe code they had written.
I think the story is just somebody who didn't actually learn unsafe Rust properly (and I'm struggling to give it the benefit of the doubt, as it sounds quite exaggerated; I couldn't imagine a novice Rust dev literally crying because they thought unsafe blocks couldn't cause UB. If you were that emotionally attached to the language, I'd expect you to have learned what unsafe means).
The Rust community as a whole very much promotes the idea of trusting the compiler. Which is a very useful thing, especially for folks coming from other languages like C. It's not perfect, of course, as the compiler has bugs, but I think it's still a good thing to teach.
You should never do this if you work at a company large enough to have a compiler team, btw, because they're going to fork the compiler and put bugs in it.
Conversely, if you never encounter bugs in a component, it means it's not being improved fast enough.
Don't worry, your language and especially the runtime and compiler are great. Particularly so in the last few years. I wouldn't worry about the noise, maybe it concerns C++, but C# is a strict productivity upgrade for general-purpose applications despite some* of the dated bits in the language (but not the runtime).
* like the un-unified representation of nullable reference types and structs under generics, for example, or just the accumulated weight of features over the years; it still makes most other alternatives look abysmal in comparison
> I'm just a C# wagie, and I fully accept that my code, my language, my compiler, and my runtime environment are all shit.
What is shit about those things for C#? That’s the application programming language that seems to get the least flak out of all of them.
If I’m using an alpha or beta compiler, I might suspect a compiler bug from time to time… not really when I’m working in a decades-old, very established language.
Java is an underpowered clone of ObjC and C# is a slightly less underpowered clone of Java.
So they fixed the biggest issues (at least it has value types), but it has nullable classes, collection types are mutable, integer overflow doesn't trap, it doesn't have nearly enough program verification features (aka dependent types), etc.
Worst of all it was written by enterprise programmers, who think programs get better designed when you put all their types four namespaces deep. I assume whoever named System.Collections.ArrayList keeps everything in their house in one of those filing cabinets with the tiny drawers.
Yes, in particular some interactions with LLVM have caused some frustrating UB. But those are considered implementation bugs, rather than user bugs, and all the conditions Miri states at the top are relevant primarily in unsafe code, which contradicts the OP's point, which is that there are tons of documented cases of UB in safe Rust. This is not true. There are a few documented cases, and most have been fixed. It's nowhere close to the world of C or C++'s UB minefield.
I believe it goes something like, "I have constructed a strawman that Rust claims that all code written in it is automatically safe by all conceivable definitions of safe, but look, ha ha, here's something that detects unsafe code in Rust!", and I don't mean "code marked in unsafe blocks".
It's a concatenation of several logical fallacies in a row; equivocation, straw manning, binary thinking about safety, several others. It's hard to pick the main one, but I'd go with the dominant problem being a serious case of binary thinking about what "safety" is. Of course, if the commentor is using anything other than Idris for all their programming, they're probably not actually acting on their own accusations.
> Of course, if the commentor is using anything other than Idris
I'm sure the Idris compiler has bugs somewhere too. If the OP actually programs, they are violating their rationale (I'm quite sure assembly or assembled binary aren't ok either).
Just to find agreement about the terminology: wouldn't we call all code that is not inside an unsafe block "safe"? If so, then adding "generally" is superfluous, right?
If not, then how is "generally safe" different from "not inside an unsafe block?"
I didn't expect you to outright confirm that you are using the "solve all programming problems ever" strawman, but, err, thanks for the proof I guess. I thought maybe I went a bit overboard in the reading between the lines but I guess I nailed it.
They are claiming that because code in ‘unsafe’ blocks in Rust can have undefined behavior, that the language is no safer than C.
This does not settle the debate because unsafe is rarely needed for a typical Rust program. In addition, the presence of an unsafe block also alerts the reader that the set of possible errors is greatly increased for that part of the code and more careful auditing is needed.
It’s a little like saying traffic lights are useless because emergency responders need to drive through them sometimes, so we should just leave intersections completely unsignaled and expect drivers to do better.
Rust is by default restrictive and requires you to explicitly make it unsafe; C/C++ are by default unsafe and require you to explicitly make them restrictive.
It is a tool for checking that your unsafe code doesn't cause UB. It doesn't really settle anything, but the commenter uses it as a gotcha to say "rust is no better than C, because you still can compile code that contains UB".
Well, the general 'Rewrite All in Rust' consensus is that it solves all general programming problems, ever.
Yet, the linked repository shows a huge list of cases in which simple, documented use of Rust can cause Undefined Behavior (a.k.a. 'UB')
Pretty much every argument of Rust advocates against C/C++ boils down to either 'but memory safety' or 'but UB'.
Yet there are many convincing counter-arguments that boil down to 'but CompCert' or similar, and, as the linked repository shows, there might be at least some truth in there?
No serious person claims that Rust solves every problem ever.
Also, many people cite things like Cargo as a reason to prefer Rust over C and C++, as well as other things. UB is a big part of it, of course, but it isn’t the only thing.
I selected it for performance reasons myself; the UB protection was a nice benefit that I expected. Cargo wasn't expected and is extremely nice coming from the cmake, conan, vcpkg, and duct-tape world I came from.
You're available as an expert witness to that fact?
Because, eh, well, in at least one of the Rust-related situations that I'm involved in right now, someone might soon very well require the services of a person both as wise and reluctant-to-offer-any-kind-of-compromise as yourself...
The situation you've alluded to in another thread seems to involve an unsafe block (since it's using a type which is only usable in an unsafe block).
Let me be even more explicit than steveklabnik here. If your code, including any libraries you link to, is 100% Rust and free of any unsafe blocks, then (barring compiler bugs) it is impossible to execute undefined behavior. If your code has an unsafe block, then it is possible to execute undefined behavior. Note that it is possible for safe code to execute undefined behavior, IF there was an unsafe block that did an operation requiring the programmer to promise something was true that was not.
For example, there is an unsafe method that will let you convert a pointer to a reference with an arbitrary lifetime. If you wrap that in a safe function, you can return a reference to an object whose lifetime has ended, and cause undefined behavior when attempting to use that reference--the attempt can even be outside the unsafe block. But were that unsafe block that upgraded the lifetime not present, then you couldn't cause the later undefined behavior to happen.
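A minimal sketch of that kind of unsound wrapper (my toy example):

    // Looks like a safe function, but the unsafe block inside makes a
    // false promise about lifetimes.
    fn laundered() -> &'static i32 {
        let x = 42;
        let p: *const i32 = &x;
        // UNSOUND: this claims the pointee lives forever; `x` dies when
        // the function returns.
        unsafe { &*p }
    }

    fn main() {
        let r = laundered(); // an ordinary-looking, entirely "safe" call
        println!("{}", r);   // UB: reads a dead stack slot
    }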
In short, an unsafe block is where the compiler can no longer guarantee that the conditions that prevent the ability to observe undefined behavior are present, and it is up to the programmer to ensure that these conditions are met, and even and especially ensure that they continue to be met after the unsafe block completes. I do worry that too many programmers are blasé about the last bit, and it sounds like your coworker may fall into that category. But Rust has always maintained this principle.
OK, you truly seem not to understand how much damage you're dealing to the general population with absolutist statements like this, do you? Nor do you seem to understand "compromise", like, at all, because you seem to equate it with "tit for tat", which is unsurprising, but still... disappointing.
In any case, I'm truly done here, in all senses of the word, but I still wish you and your acolytes the absolute best.
What are you talking about? Yes, it's impossible to have UB in safe Rust unless there's some obscure compiler bug or something. This isn't a controversial statement.
> Well, the general 'Rewrite All in Rust' consensus is that it solves all general programming problems, ever.
a) There is no such consensus. The actual consensus is that even if Rust solved all problems, it would not be financially feasible to rewrite pretty much any substantial project.
b) While Rust does solve many problems, it is nowhere close to solving all safety problems, otherwise there would be no `unsafe` keyword. Alas, fully proving safety in an impure, Turing-complete language is mathematically impossible.
c) The only reason you would think that there's some sort of woke Rust lobby, is if you spend way too much time subjecting yourself to opinions of literal sixteen year olds on twitter.
Towards general mental health. I'm just a C# wage slave, and I'll admit, when prompted, that my language, its vendor, its runtime environment, and its general approach are, to put it kindly, flawed.
However, as evidenced by the arguments and voting in this thread, Rust proponents will take no criticism, whatsoever.
I linked to a GitHub repository that documents many, many instances in which generally safe Rust causes UB.
The same kind of UB that recently hit one of my coworkers, caused a 3-day outage and now (despite all my counseling to the contrary!) will burn them out permanently.
My only request: can you guys please back off just a little bit? Programming is already hard enough without the purity wars you're stoking all the time...
to be fair, from his perspective, it's often the rusty crowd who is stoking the flame wars - this sounds like a reaction to them.
how often do we hear something like "C and C++ are horribly flawed and completely unsafe. it's basically a crime against humankind and gross negligence to use them"?
i get weary of that kind of thing too. i wouldn't approach it by reacting in the same way as the GP comment, but i get it. and it's not really that much of a strawman. it's more exasperation and sarcasm.
personally, i'm very interested in rust. but every time someone at best "overhypes" it or at worst outright dogs on other languages, it's a negative point toward dealing with the whole rust ecosystem.
In all honesty, I don't see that sort of thing posted except maybe the overly naive excited "omg I love rust" post in /r/rust from someone just learning it which no one should be taking as credible.
I do, however, see people trot out the oft-repeated "rust evangelists want to rewrite everything in rust" or "rust people say programming C++ is a crime against humanity", but it seems to me that's the only place I see this argument. In other words, it's a simple strawman.
People can, in the most neutral way possible, point out facts about how safe or unsafe Rust is compared to C and C++. People will STILL complain about how the Rust zealots are bullying their language. This is how it plays out every time.
You can look at this thread. The "exasperation and sarcasm" is stupid and one-sided. "But," they always say, "that's just a reaction to a previous debate" -- because the Rust zealots are always in the rear-view mirror, never in front of them.
How about complaining about something in Rust... that is actually bad? Like how un-ergonomic async is? Or how pointy and awkward the syntax can be? Instead they choose to fight the losing battle over whether C and Rust are equally unsafe, or whether Rust's safety actually matters, depending on the phase of the moon. Then they whine about tone and zealotry when they realize that arguing against Rust's safety from the C and C++ side is a losing battle and they have run out of arguments.