> have to imagine that in the general case it will be a translation to unsafe Ru...

rectang · 2024-07-30T21:14:53 1722374093

> undefined behavior gives the compiler leeway in deciding what a program does, so the more undefined behavior a C program invokes, the easier it is to translate its code to rust.

That's the kind of language lawyer approach that caused a rebellion in the last decade amongst C programmers against irresponsible compiler optimizations. "Who cares if your program actually works as intended? My optimization is legal according to the standard, it's your program that's written to exploit loopholes".

I don't see any evidence that that's the attitude being taken by TRACTOR — I sure hope it isn't. But hell, even if the result is unreliable in practice, I suppose that if somebody gets to claim "it works" then the incentives are aligned to produce garbage.

atiedebee · 2024-07-30T21:29:15 1722374955

> "Who cares if your program actually works as intended? My optimization is legal according to the standard, it's your program that's written to exploit loopholes".

If your program invokes undefined behaviour, it's invalid and non-portable. Out of bounds array accesses are UB, yet a program containing them may just happen to work. It won't be portable even between different compiler versions.

The C standard is a two-way contract: the programmer doesn't write code that invokes undefined behaviour, and the compiler returns a standard-conforming executable.

matheusmoreira · 2024-07-30T23:27:21 1722382041

If undefined behavior is invalid, then reject the program instead of "optimizing" it. This "oh look undefined behavior I'm gonna turn the entire function into a no-op" nonsense is completely unacceptable. It's adversarial and borders on malicious. Null pointer check deletion can turn bugs into exploitable vulnerabilities.

zajio1am · 2024-07-31T02:13:50 1722392030

> If undefined behavior is invalid, then reject the program instead of "optimizing" it.

Undefined behavior is usually the result of a runtime situation. It is usually not obvious from the code alone whether it can happen, so the compiler cannot reject the program.

'UB-based' optimization is just the assumption that the code is correct, and therefore that the UB situation cannot occur at runtime.
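
A minimal Rust sketch of the same idea, using the standard `slice::get_unchecked` as a stand-in for a C array access (the function names are made up for illustration). Whether the unchecked version hits UB depends entirely on the runtime value of `i`, so there is nothing for the compiler to reject at compile time; it just compiles the access on the assumption that the caller kept the promise.

```rust
/// Safe version: the bounds check happens at run time and yields `None`
/// when `i` is out of range.
fn read_checked(data: &[i32], i: usize) -> Option<i32> {
    data.get(i).copied()
}

/// Unchecked version, roughly analogous to indexing a C array.
///
/// # Safety
/// The caller must guarantee `i < data.len()`; otherwise this is
/// undefined behavior.
unsafe fn read_unchecked(data: &[i32], i: usize) -> i32 {
    // No bounds check is emitted here; the compiler assumes the index is valid.
    unsafe { *data.get_unchecked(i) }
}
```

Both functions compile for every possible `i`; only the first one has a defined result when `i` is out of bounds.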

grumpyprole · 2024-07-31T07:55:00 1722412500

Usually, but not always. For example, the removal of an empty, effect-free infinite loop. This should be an error.

the8472 · 2024-07-31T08:19:11 1722413951

The C++ forward progress guarantee enables more optimizations since it allows the compiler to reason more easily about loops:

> The standards added the forward progress guarantees to change an optimization problem from "solve the halting problem" to "there will be observable side effects in the forms of termination, I/O, volatile, and/or atomic synchronization, any other operation can be reordered". The former is generally impossible to solve, whereas the latter is eminently tractable.

But yeah, that's one of the more foot-gunny UB rules, and one that Rust does not have. The flip side is that rustc doesn't mark functions as `mustprogress` in LLVM IR, so it misses out on whatever optimizations that attribute enables.
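
To make the "halting problem" point concrete, here is a small Rust sketch (the function name is hypothetical). The loop has no I/O, volatile access, or atomics, and nobody has proved that it terminates for every input. Under a forward progress guarantee the optimizer may assume it does; Rust makes no such assumption, so a non-terminating call simply spins.

```rust
/// Counts Collatz steps until `n` reaches 1.
///
/// The loop is free of observable side effects. Whether it terminates for
/// every positive `n` is an open problem, and for `n == 0` it never does.
/// Note: `3 * n + 1` can overflow for very large `n` (panics in debug builds).
fn collatz_steps(mut n: u64) -> u64 {
    let mut steps = 0;
    while n != 1 {
        n = if n % 2 == 0 { n / 2 } else { 3 * n + 1 };
        steps += 1;
    }
    steps
}
```

For example, `collatz_steps(27)` returns 111, while `collatz_steps(0)` loops forever, which is defined behavior in Rust.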

Avamander · 2024-07-31T09:29:10 1722418150

> This "oh look undefined behavior I'm gonna turn the entire function into a no-op" nonsense is completely unacceptable. It's adversarial and borders on malicious.

You significantly underestimate how much UB people write, and overestimate how good the end result would be if compilers did not take the current approach.

rectang · 2024-07-30T21:41:18 1722375678

The C standard with its extensive undefined behavior causes programmers and compiler writers to be at odds. In a sane world, "undefined behavior" wouldn't be assumed to mean "the programmer must have meant for me to optimize this whole section of code away". We aren't on the same team, even if I believe that all parties are acting with the best of intentions.

I don't feel that the Rust language situation incentivizes such awful conflict, and it's one of many reasons I now try really hard to avoid C and use Rust instead.

astrange · 2024-07-31T01:39:28 1722389968

A funny thing about this problem is that it gets worse the more formally correct your implementation is. Undefined behavior is undefined, so it's outside the model, and if your program is a 100% correct implementation of a model then how can it know what to do about something outside it?

But I don't think defining all behavior helps. The defined behavior could be /wrong/, and now you can't find it because the program using it is valid, so it can't be detected with UBSan.

Asooka · 2024-07-30T21:51:02 1722376262

Doing one funny thing on platform A and a different funny thing on platform B when an edge case arises is way better than completely deleting the code on all platforms with no warning.

Someone · 2024-07-31T14:57:14 1722437834

> I don't see any evidence that that's the attitude being taken by TRACTOR — I sure hope it isn't.

I don’t see any way it can do otherwise. As a simple example, what would one translate this C statement to:

  int i;
  …
  i = abs(i);

? I would expect TRACTOR to generate (assuming 64-bit integers):

  let mut i: i64;
  …
  i = i.abs();

However, that can panic in debug mode and return a negative number in release mode (https://doc.rust-lang.org/stable/std/primitive.i64.html#meth...), and there’s no way for TRACTOR to know whether that makes the program “work as intended”. That code may have worked fine (or fine enough) for decades because its standard library returns zero for abs(INT_MIN).

rectang · 2024-07-31T17:01:10 1722445270

It's possible to preserve the semantics of the original program using unsafe Rust. [1]

    unsafe {
        let mut i: std::os::raw::c_int
            = std::mem::MaybeUninit::uninit().assume_init();
        // ...
        i = libc::abs(i);
    }

That's grotesque, but it is idiomatic Rust insofar as it lays bare many of the assumptions in the C code and gives the programmer the opportunity to fix them. It is what I would personally want TRACTOR to generate if it could not prove that `i` can never take on the value `libc::INT_MIN`.

Given that generated code, I could then migrate the unsafe bits piecemeal to cleaner, idiomatic safe Rust: possibly your code, but more likely `i.wrapping_abs()` or similar.
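
For reference, a short sketch of the explicit alternatives the standard library offers for the overflow case (the helper name `abs_variants` is made up; the four methods are real `i64` methods). Each one pins down what happens for `i64::MIN`, instead of leaving it to the build profile.

```rust
/// Returns, in order: checked, wrapping, saturating, and overflowing abs.
fn abs_variants(i: i64) -> (Option<i64>, i64, i64, (i64, bool)) {
    (
        i.checked_abs(),     // None when the result does not fit
        i.wrapping_abs(),    // two's-complement wraparound: MIN stays MIN
        i.saturating_abs(),  // clamps to i64::MAX
        i.overflowing_abs(), // wrapped value plus an overflow flag
    )
}
```

Which of these matches the original program's intent is exactly the decision a human (or a very careful tool) still has to make.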

What will TRACTOR choose? At least for this example, they don't have to choose inappropriate pruning of undefined behavior. They claim the following:

> The goal is to achieve the same quality and style that a skilled Rust developer would produce, thereby eliminating the entire class of memory safety security vulnerabilities present in C programs.

If they're going to uphold the same "quality", the translation you presented doesn't cut it. But you may be right, and they may go down the path of claiming that a garbage translation is technically valid under undefined behavior and therefore "quality". If so, I will shun them.

[1] https://play.rust-lang.org/?version=stable&mode=debug&editio...

Someone · 2024-07-31T19:02:30 1722452550

> It's possible to preserve the semantics of the original program using unsafe Rust

Because of the leeway the C standard gives you, you can preserve the semantics of the C program by just calling abs, and I think that’s the best you can do.

What the compiler does may be different for different compilers, different compiler versions or different compilation flags, so if all you have is the C source code, there’s no way to preserve the semantics of the machine code that the C compiler generates.

You could special-case all of them, but even then, there is the problem that a C compiler, even in a single translation unit, can inline one call and then apply some transformations while compiling another call to a call to a library function, making the semantics of overflow in one location different from that in another.

If you want to replicate that, I’d say you aren’t writing a C to Rust translator, but a (C + assembly) to Rust translator.

Also, if you go this route, you’d have to do similar gnarly stuff for all arithmetic on integers where you cannot prove there will not be overflow. I would not call the resulting code idiomatic Rust.
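
As a rough illustration of what that looks like for a single C `a + b` on `int`, here are two candidate translations (function names are hypothetical; `checked_add` and `wrapping_add` are standard integer methods). Neither is "the" correct one, since signed overflow is UB in C and the source alone does not say which behavior the program depended on.

```rust
use std::os::raw::c_int;

/// Makes the overflow case, which is UB in C, an explicit `None`.
fn add_checked(a: c_int, b: c_int) -> Option<c_int> {
    a.checked_add(b)
}

/// Picks two's-complement wraparound, which many C programs observe in
/// practice but which the C standard does not promise.
fn add_wrapping(a: c_int, b: c_int) -> c_int {
    a.wrapping_add(b)
}
```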

rectang · 2024-07-31T20:21:38 1722457298

What you describe is antithetical to idiomatic Rust, written by a skilled Rust programmer.

To uphold the spirit of Rust, a C program must go through a process where assumptions are laid bare and footguns are dismantled. Applying an automatic process which arbitrarily changes the behavior from the implementation-dependent compilation of a C program just gets you a messy slop of hidden bugs collected inside an opaque, "safe" garbage can.

You don't get to Rust's reliability by applying a translation which discards it!

> Also, if you go this route, you’d have to do similar gnarly stuff for all arithmetic on integers where you cannot prove there will not be overflow.

Damn straight. That's what C is! It was always this bad, as those of us who have struggled to control it can attest. Faithful translation to unsafe Rust just makes it obvious.

artikae · 2024-08-01T22:06:31 1722549991

The first line is already UB. `assume_init` requires the contents to be initialized, hence the name.
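
For contrast, a minimal sketch of a sound use of `MaybeUninit`, where the slot is written before `assume_init` is called (the function name and the value `-7` are placeholders for illustration):

```rust
use std::mem::MaybeUninit;
use std::os::raw::c_int;

fn init_then_read() -> c_int {
    // Reserving uninitialized storage is fine by itself.
    let mut slot = MaybeUninit::<c_int>::uninit();
    // Initialize it; `write` does not read the old contents.
    slot.write(-7);
    // SAFETY: `slot` was initialized by the `write` call above.
    let i = unsafe { slot.assume_init() };
    i.wrapping_abs()
}
```

Calling `assume_init` without the preceding `write` is the UB case described above, regardless of whether the value is ever used.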

rectang · 2024-08-02T04:07:41 1722571661

Mmm, I went back and read the docs for MaybeUninit more carefully and that's a good point.

It may be better to just leave the assignment off the declaration. If the variable is read before it's initialized to something, we'll get a Rust compilation error, forcing programmer intervention. Detecting actual bugs that would result in memory errors and forcing them to be resolved is very much in the spirit of Rust. TRACTOR may aspire to gift C programs with memory safety for free, but it won't always be possible.
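
A small sketch of that approach, with a made-up function and values. Rust accepts a declaration without an initializer as long as every path assigns the variable before it is read:

```rust
fn deferred_init(flag: bool) -> i32 {
    // Declared without a value; the compiler tracks initialization per path.
    let i: i32;
    if flag {
        i = -3;
    } else {
        i = 4;
    }
    // Accepted only because both branches assigned `i`. Deleting either
    // assignment turns this read into a compile error (E0381).
    i.wrapping_abs()
}
```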

Of course, if TRACTOR can determine through static analysis that the uninitialized read can't cause problems, it might emit different code.

derdi · 2024-07-30T21:34:56 1722375296

> undefined behavior gives the compiler leeway in deciding what a program does, so the more undefined behavior a C program invokes, the easier it is to translate its code to rust.

You assume that the compiler can determine what behavior is undefined. It can't. C compilers don't just look at some individual line of the program and say "oh, that's undefined, unleash the nasal demons". C compilers look at code, reason that if such-and-such variable has a certain value (say, a null or invalid pointer), then such-and-such operation is undefined (say, dereferencing that variable), and therefore on the next line that variable can be assumed not to have that bad value. Despite all the FUD, this is a very limited power. C compilers don't usually know the actual values in question, all they do is exclude some invalid ones.
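
The same reasoning can be sketched with a raw pointer in unsafe Rust, where dereferencing null is also UB (the function is hypothetical and only illustrates what an optimizer is permitted to conclude, not what any particular compiler version actually emits):

```rust
/// # Safety
/// `p` must be non-null, aligned, and point to a valid `i32`.
unsafe fn read_then_check(p: *const i32) -> i32 {
    // Dereferencing a null pointer is UB, so after this line the optimizer
    // may assume `p` is non-null.
    let v = unsafe { *p };
    // Under that assumption this branch is dead code and may be removed.
    if p.is_null() {
        return -1;
    }
    v
}
```

The compiler never learns the actual value of `p`; it only rules out the one value that would have made the earlier dereference invalid.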

bigstrat2003 · 2024-07-30T23:12:15 1722381135

I (not the person you are replying to) do understand that's how compilers interact with UB. However, a wealth of experience has shown us that the assumption "UB doesn't occur" is completely false. It is, in my opinion, quite irresponsible for compiler writers to continue to use a known-false assumption when building the optimizer. I don't really care how much speed it costs, we need to stop building software on a shaky foundation like that.

astrange · 2024-07-31T01:41:22 1722390082

Soon (or actually, already) we'll have MTE and CHERI, and then that C undefined behavior will be giving you security improvements as well as speed improvements.

Can't design a system that 100% crashes on invalid behavior if you've declared that behavior is valid, because then someone is relying on it.