Maybe consider a different color palette. I have red/green color blindness (like ~8% of the male population), and I can only barely differentiate the colors.
That's a good point I hadn't thought about. It's an obvious mistake on my part. Perhaps I will make the colors configurable. Interesting that none of the reviewers raised this point.
I wouldn't call some cases of UB, like overflowing a signed int or running into a dangling reference, "actively pushing the compiler". I'm having a hard time believing you never ran into these or similar issues, having "programmed for a long time".
For other cases (e.g. aliasing, alignment), I'd agree that one is rather safe as long as the dangerous tools (e.g. reinterpret_cast) are not used.
As others are pointing out, the C standard does allow this.
There is no safe way to check for undefined behavior (UB) after it has happened, because the whole program is immediately invalidated.
This has caused a Linux kernel exploit in the past [1], with GCC removing a null pointer check after a pointer had been dereferenced. Null pointer dereferences are UB, thus GCC was allowed to remove the following check against null. In the kernel, accessing a null ptr is technically fine, so the Linux kernel is now compiled with -fno-delete-null-pointer-checks, extending the list of differences between standard C and Linux kernel C.
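A minimal sketch of the dereference-before-check pattern involved (made-up names, not the actual kernel code):

  struct dev { int flags; };

  int dev_poll(struct dev *d)
  {
      int flags = d->flags;  /* dereference happens first: the compiler may infer d != NULL */
      if (!d)                /* ...which makes this check dead code that GCC may delete */
          return -1;         /* (unless -fno-delete-null-pointer-checks is passed)      */
      return flags;
  }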
> because the whole program is immediately invalidated.
The problem is the program isn't invalidated, it's compiled and run.
The malicious compiler introducing security bugs from Ken Thompson's "Reflections on Trusting Trust" is real, and it's the C standard.
I will grant that trying to detect UB at runtime may impose serious performance penalties, since it's very hard to do arithmetic without risking it. But at compile time? If a situation has been statically determined to invoke UB that should be a compile time error.
Also, if an optimizer determines that an entire statement has no effect, that should be at least a warning. (C lacks C#'s concept of a "code analysis hint", which has individually configurable severity levels.)
> If a situation has been statically determined to invoke UB that should be a compile time error.
That's simply not how the compiler works.
There is (presumably, I haven't actually looked) no boolean function in GCC called is_undefined_behavior().
It's just that each optimization pass in the compiler can (and does) assume that UB doesn't happen, and results like the article's are then essentially emergent behavior.
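A tiny made-up example of that emergent behavior: because signed overflow is assumed not to happen, a comparison that could only be false after an overflow can be folded away.

  /* GCC/Clang at -O2 will typically compile this to "return 1;", since
     x + 1 can only fail to be greater than x if the addition overflows,
     which the compiler assumes never happens. */
  int always_true(int x)
  {
      return x + 1 > x;
  }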
C++ bans undefined behavior during constant evaluation, so you can force GCC to prove that code has no undefined behavior by sprinkling constexpr into declarations where applicable.
Constant-evaluated expressions with undefined behavior are ill-formed, but constexpr-annotated functions that may, in some invocations, result in undefined behavior are not.
Does that mean it's acceptable for GCC to reformat my hard drive?
Just because something is UD doesn't give anyone a license to do crazy things.
If I misspell --help I expect the program to do something reasonable. If I invoke UD I still expect the program to do something reasonable.
Removing checks for an overflow because overflows 'can't happen' is just crazy.
UD is supposed to allow C to be implemented on different architectures: if you don't know whether it will overflow to INT_MIN, it makes sense to leave the implementation open. If I, the user, know what happens when an int overflows, then I should be able to make use of that and guard against it myself. A compiler undermining that is a bug and user hostile.
No, it's not, and I don't know why you'd think so. UB is a concept applying to C programs, not GCC invocations.
> UD is supposed to allow C to be implemented on different architectures: if you don't know whether it will overflow to INT_MIN, it makes sense to leave the implementation open. If I, the user, know what happens when an int overflows, then I should be able to make use of that and guard against it myself.
I think you're confusing UB with unspecified and implementation defined behavior. It's fine if you think something shouldn't be UB, but you have to go lobbying the C standard for that. Compiler writers aren't to blame here.
This has come up before, because, in some technical sense, the C standard does indeed not define what a "gcc" is, so "gcc --help" is undefined behavior according to the C standard, because the C standard does not define the behavior. By the same token, instrument flight rules are undefined behavior.
A slightly less textualist approach to language recognizes that when we talk about C and UB, we mean behavior, which is undefined, of operations otherwise defined by the C standard.
I think this is confusing undefined behavior with behavior of something that is undefined. And either way, the C standard explicitly applies to C programs, so even this cute "textualist" interpretation would be wrong, IMO.
But it is a simple example to illustrate how programs react when they receive something that isn't in the spec.
GCC could do anything with Gcc --hlep, just like it could do anything with INT_MAX + 1. That doesn't mean that all options open to it are reasonable.
If I typed in GCC --hlep I would be reasonably pissed that it deleted my hard drive. You pointing out that GCC never made any claims about what would happen if I did that doesn't make it OK.
If you come across UD, there are reasonable and unreasonable ways to deal with it. Reformatting your hard drive, which is presumably allowed by the C standard, isn't reasonable. I would contend that removing checks is also unreasonable.
The general thinking seems to be that UB can do anything so you can't complain, whatever that anything is.
That would logically include reformatting your hard drive.
I definitely disagree with that POV; if you don't accept that UB can result in anything, then the line needs to be drawn somewhere.
I would contend that UB stems from the hardware. C won't take responsibility for what the hardware does. Neither will it step in to change what the hardware does. That in turn means the compiler shouldn't optimise around UB, because the behaviour is undefined.
>No, it's not, and I don't know why you'd think so. UB is a concept applying to C programs, not GCC invocations
What should happen when I invoke --hlep then?
The program could give an error, could warn that it's an unrecognised flag, could ask you if you meant --help, could infer you meant help and give you that, or it could give you a choo-choo train running across the screen. Or it could reformat your hard drive. Just because it isn't specifically listed as UD doesn't mean it's not: if it isn't defined, then it's undefined. The question is what the reasonable thing to do is when someone types --hlep. I hope we can agree reformatting your hard drive isn't the most reasonable thing to do.
>I think you're confusing UB with unspecified and implementation defined behavior
Am I? What's the reason for not defining integer overflow? Yes unspecified behaviour could be used to allow portability, but so can undefined.
>It's fine if you think something shouldn't be UB, but you have to go lobbying the C standard for that. Compiler writers aren't to blame here.
I'm not saying it shouldn't be UB. I'm saying there's reasonable and unreasonable things to do when you encounter UB. In the article the author took reasonable steps to protect themselves and the compiler undermined that. That isn't reasonable. In exactly the same way that --hlep shouldn't lead to my hard drive getting reformatted.
C gives you enough rope to hang yourself. It isn't required for GCC to tie the noose and stick your head in it though.
I think you're confusing UB with unspecified and implementation defined behavior
> What should happen when I invoke --hlep then? The program could give an error, could warn that it's an unrecognised flag, could ask you if you meant --help, could infer you meant help and give you that, or it could give you a choo-choo train running across the screen. Or it could reformat your hard drive. Just because it isn't specifically listed as UD doesn't mean it's not: if it isn't defined, then it's undefined. The question is what the reasonable thing to do is when someone types --hlep. I hope we can agree reformatting your hard drive isn't the most reasonable thing to do.
I honestly don't understand the point of this paragraph.
> Am I? What's the reason for not defining integer overflow? Yes unspecified behaviour could be used to allow portability, but so can undefined.
Yes, you are confused about that. UB is precisely the kind of behavior that the C standard deemed unsuitable to define as implementation-defined or anything else, and it usually has really good reasons for that. You could look them up instead of asking rhetorically.
> I'm not saying it shouldn't be UB. I'm saying there's reasonable and unreasonable things to do when you encounter UB. In the article the author took reasonable steps to protect themselves and the compiler undermined that. That isn't reasonable. In exactly the same way that --hlep shouldn't lead to my hard drive getting reformatted.
Again, you seem to fundamentally misunderstand how compilers work in this case. They largely don't "encounter" UB; their optimization passes are coded with the assumption that UB can't happen. The ability to do that is fundamentally the point of UB. Situations like the one in the article are not a specific act of the compiler to screw you in particular, but an emergent result.
Additionally, I think you're also confusing Undefined Behavior with 'behavior of something that is undefined'. These are not the same things.
>Again, you seem to fundamentally misunderstand how compilers work in this case. They largely don't "encounter" UB; their optimization passes are coded with the assumption that UB can't happen
Which is as wrong as coding GCC to assume --hlep can't happen.
It will happen and you need to deal with it when it does, and there are reasonable and unreasonable ways of dealing with that.
If you don't understand my --hlep example how about:
Int mian () {
What should the compiler do there? The same rules apply: should it reformat your hard drive, or warn you that it can't find such a function? There are reasonable and unreasonable ways to deal with behaviour that hasn't been defined.
If I put in INT_MAX + 1 it isn't reasonable to reformat my hard drive. The compiler doesn't have carte blanche to do what it likes just because it's UD. It should be doing something reasonable. To me removing an overflow check isn't reasonable.
If you want to have a debate about what is reasonable we can have that debate, but if you're going to say UB means anything can happen then I'm just going to ask why it shouldn't reformat your hard drive.
> It will happen and you need to deal with it when it does, and there are reasonable and unreasonable ways of dealing with that.
A compiler's handling of UB simply can't work the same way handling flag passing works in GCC. Fundamentally.
With GCC, the example is something like:
if (strcmp(argv[1], "--help") == 0) { /* do help */ } else { /* handle it not being help, for example 'hlep' or whatever */ }
Here, GCC can precisely control what happens when you pass 'hlep'.
Compilers don't and can't work this way. There is no 'if (is_undefined_behavior(ast)) { /* screw the user */ }'. UB is a property of an execution, i.e. what happens at runtime, and can't _generally_ be detected at compile time. And you very probably do not want checks for every operation that can result in UB at runtime! (But if you do, that's what UBSan is!).
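For instance, a toy program like the following (hypothetical snippet; the exact diagnostic text varies by compiler and version) gets a runtime report instead of silent misbehavior when built with UBSan:

  /* ub_demo.c -- build with: gcc -O2 -fsanitize=undefined ub_demo.c */
  #include <limits.h>
  #include <stdio.h>

  int main(void)
  {
      int x = INT_MAX;
      int y = x + 1;          /* signed overflow: UBSan reports this at runtime */
      printf("%d\n", y);
      return 0;
  }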
So, the only way to handle UB is either
1) Leaving the semantics of those situations undefined (== not occurring), and coding the transformation passes (so also opt passes) that way.
or
2) Defining some semantics for those cases.
But 2) is just implementation defined behavior! And that is what you're arguing for here. You want signed integer overflow to be unspecified or implementation defined behavior. That's fine, but a job for the committee.
It's basically dead code removal. X supposedly can't happen so you never need to check for X.
The instance in the article is about checking for an overflow. The author was handling the situation. C handed him the rope; he used it sensibly, checking for overflow. GCC took the rope and wrapped it around his neck. Fine, GCC (and C) can't detect overflow at compile time and doesn't want to get involved in runtime checks. Leave it to the user then. But GCC isn't leaving it to the user; it's undermining the user.
Re 2): are you referring to GCC's committee or the C committee?
I don't mind what it's deemed to be; I expect GCC to do something reasonable with it. Whatever happens, a behaviour needs to be decided by someone. Some of those behaviours are reasonable, some aren't. If you're doing a check for UB, the reasonable thing, to me, is to maintain that check.
I could make a choice when I write an app to assume that user input never exceeds 100 bytes. I could document it, saying anything could happen otherwise, then reasonably (well, many people would disagree) leave it there; that is my choice.
If you came along and put 101 bytes of input in, you would complain if my app then reformatted your hard drive. Wouldn't you also complain if GCC did the same?
There's at least a post a week complaining about user-hostile practices with regard to apps. Why do compiler writers get a free pass?
If I put up code assuming user input would be less than 100 bytes, documented or not, someone would raise that as an issue, so why the double standard?
I'm not even advocating the equivalent of safe user input handling. I'm advocating that even when you go outside the bounds of what is defined, you do something reasonable.
> If you're doing a check for UB, the reasonable thing, to me is to maintain that check.
The problem is that you need to do the check before you cause UB, not after, and here the check appears after. If you do the check before, the compiler will not touch it.
The compiler can't know that this code is part of a UB check (so it should leave it alone), whereas this other code here isn't a UB check but is just computation (so it should assume no UB and optimise it). It just optimises everything, and assumes you don't cause UB anywhere.
Now, I'm not defending this approach, but C works like this for performance and portability reasons. There are modern alternatives that give you most or all of the performance without all these traps.
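A rough sketch of the difference, loosely modelled on the article's snippet (tab and its size are assumptions here, not the article's actual code):

  #include <stdint.h>

  static uint8_t tab[256];              /* size assumed for illustration */

  /* Check AFTER the arithmetic: if x * 0x1ff overflowed, UB has already
     happened, so the compiler may assume i is non-negative and drop "i >= 0". */
  int lookup_after(int32_t x)
  {
      int32_t i = x * 0x1ff / 0xffff;
      if (i >= 0 && i < (int32_t)sizeof(tab))
          return tab[i];
      return -1;
  }

  /* Check BEFORE the arithmetic: every value involved is still well defined,
     so the compiler cannot remove the test. */
  int lookup_before(int32_t x)
  {
      if (x < 0 || x > INT32_MAX / 0x1ff)
          return -1;
      int32_t i = x * 0x1ff / 0xffff;
      if (i < (int32_t)sizeof(tab))
          return tab[i];
      return -1;
  }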
How would you do the check in the article in a more performant way?
Philosophically I'm not sure it's even possible. Sure, you could do the check before the overflow, but any way you slice it that calculation ultimately applies to something that is going to be UB, so the compiler is free to optimise it out? Yes, you can make it unrelated enough that the compiler doesn't realise. But really, if the compiler can always assume you aren't going to overflow integers, then it should be able to optimise away 'stupid' questions like 'if I add x and y, would that be an overflow?'.
>The compiler can't know that this code is part of a UB check
If it doesn't know what the code is, then it shouldn't be deleting it. It has just rearranged code that it knows is UB, and it is now faced with a check on that UB. It could (and does) decide that this can't possibly happen, because 'UB'. It could instead decide that it is UB, so it doesn't know whether this check is meaningful or not, and leave the check alone. This, to me, is the original point of UB: C doesn't know whether your machine is 1s complement, 2s complement or 3s complement, so it leaves it to the programmer to deal with the situation. If the programmer knows he's working on 2s complement machines that overflow predictably, he can work on that assumption; the compiler isn't expected to know, but it should stay out of the way because the programmer does. The performance of C, as I understood it, comes from the overflow check being optional: you aren't forced to check. But you are required to ensure that the check is done if needed, or deal with the consequences.
Would you get rid of something you don't understand because you can't see it doing anything useful? Or would you keep it because you don't know what you might break when you delete it? GCC in this case is deleting something it doesn't understand. Why is that not a bug?
> Sure you could do the check before the overflow but any way you slice it that calculation ultimately applies to something that is going to be UB so the compiler is free to optimise it out?
No, if you never do the calculation it's not going to be UB.
int8_t x = some_input();
if (x > 10) return bad_value;
else x *= 10;
There is no UB here, because we never execute the multiplication in cases where it would have otherwise been UB. The compiler is not free to remove the check, because it can't prove that the value is not > 10.
> It has just rearranged code that it knows is UB
No - that's the problem. The compiler doesn't know that the code is UB, because this depends on the exact values at runtime, which the compiler doesn't know.
In some limited cases it could perform data flow analysis and know for sure that it will be UB, but those cases are very limited. In general there is no way for it to know. So there are three things it could do:
A) Warn/error if there could possibly be UB. This would result in warnings in hundreds of thousands of pieces of legitimate code, where there are in fact guarantees about the value but the compiler can't prove or see it. It would require much more verbose code to work around these, or changing the language significantly. For example, you could represent this in the type system, or have annotations.
B) Insert runtime checks for the UB. This would have a significant performance overhead, as there are lots of "innocent" operations in the language that, in the right circumstances, lead to UB. So we would bloat the code with a lot of branches, 99.999% of which will never ever be taken, filling up the instruction cache and branch predictor. You get something more like (the runtime behaviour of) Python or JavaScript. Or even C if you enable UBSan.
C) Assume that the programmer has inserted these checks where they are needed, and omitted them where they are not. You get performance, but in exchange for that you are responsible for avoiding UB. This is what C chooses.
> C doesn't know whether your machine is 1s complement, 2s complement or 3s complement, it leaves it to the programmer to deal with the situation, if the programmer knows he's working on 2s complement machines that overflow predictably he can work on that assumption, the compiler isn't expected to know, but it should stay out of the way because the programmer does
This is mostly right, but with the caveat that you can't invoke UB. If you want to deal with whatever the underlying representation is, cast it to an unsigned type and then do whatever you want with it. The compiler will not mess with your unsigned arithmetic, because it's allowed to wrap around. But for signed types, you are promising to the compiler that you won't cause overflow. In exchange the compiler promises you fast signed arithmetic.
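A hedged sketch of that idea (the helper name is made up):

  #include <stdint.h>

  /* Wrapping signed addition done through unsigned arithmetic: unsigned
     overflow is defined (modulo 2^32), and converting the result back to
     int32_t is implementation-defined rather than UB (strictly, it may
     also raise an implementation-defined signal) -- on mainstream
     two's-complement compilers it simply wraps. */
  static int32_t wrapping_add(int32_t a, int32_t b)
  {
      return (int32_t)((uint32_t)a + (uint32_t)b);
  }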
This promise is part of the language, not part of GCC. If you removed that promise, you would have to pay the price in reduced performance.
Could you have a C compiler that inserts these checks? Yes (see UBSan). But you would be throwing away performance - it would be slower than GCC/Clang/MSVC/etc. If you're writing performance-sensitive software, you are better off either ensuring you never trigger UB, or use another language like Rust. If performance is not so important, you are probably better off writing the thing in Go/JavaScript/whatever.
>No, if you never do the calculation it's not going to be UB.
int8_t x = some_input();
if (x > 10) return bad_value;
else x *= 10
In this simple case, yes. But what if you don't know what you're going to multiply by? What if you can't say that x is a bad value?
If you have:
long long x = ?;
long long y = ?;
if (????) x *= y;
I don't know the answer to this. I've looked online and the answers invoke UB. The best I can think of is a LUT of safe/unsafe combinations, but that isn't faster, and at that point you may as well give up on the MUL hardware in your CPU. I'm not even sure how to safely calculate the LUT; I suppose you could iterate with additions, subtracting the current total from INT_MAX and checking whether that's bigger than the number you're about to add.
But that's frankly stupid. And again, you are basically checking whether something is going to be UB, which 'can't happen', so the compiler is therefore free to remove the check.
Or do you roll your own data type with unsigned ints and a sign bit? But then what's the point of having signed ints, and what happens to C's speed? Or is there some bit twiddling you can do?
>No - that's the problem. The compiler doesn't know that the code is UB
OK, I should properly have said: code it can't prove isn't UB.
If it can't say x + y isn't an overflow, it shouldn't just assume it can't be.
If y is 1 and x is probably 9, it wouldn't be reasonable to assume the sum is 10.
>C) Assume that the programmer has inserted these checks where they are needed, and omitted them where they are not. You get performance, but in exchange for that you are responsible for avoiding UB
You get the performance by avoiding option B. I'm not even sure the programmer is responsible for avoiding UB; UB just doesn't give guarantees about what will happen. You should still be able to invoke it and, I would contend, expect the compiler to do something reasonable.
It is tedious but possible to check for overflow before multiplying signed integers.
long long x = (...);
long long y = (...);
long long z;

// Portable (LLONG_MAX/LLONG_MIN are the standard names from <limits.h>)
bool ok = x == 0 || y == 0;
if (!ok && x == LLONG_MIN) {
    ok = y == 1;                    // -LLONG_MIN itself would overflow, so handle it separately
} else if (!ok) {
    long long a = x > 0 ? x : -x;   // |x|  (safe: x != LLONG_MIN here)
    long long b = y < 0 ? y : -y;   // -|y| (always representable)
    if ((x > 0) == (y > 0))
        ok = -LLONG_MAX / a <= b;   // same signs: product must not exceed LLONG_MAX
    else
        ok = LLONG_MIN / a <= b;    // mixed signs: product must not go below LLONG_MIN
}
if (ok)
    z = x * y;

// Compiler-specific (GCC/Clang)
bool ok = !__builtin_smulll_overflow(x, y, &z);
> It's fine if you think something shouldn't be UB, but you have to go lobbying the C standard for that. Compiler writers aren't to blame here.
I'm glad I don't live in your country, where the C standard has been incorporated into law, making it illegal for compiler writers to do things that are helpful to programmers and end users, but aren't required by the standard.
> UD is supposed to allow C to be implemented on different architectures
No, that's wrong. Implementation-Defined Behavior is supposed to allow C to be implemented on different architectures. In those cases, the implementation must define the behavior itself, and stick with it. UB, on the other hand, exists for compiler authors to optimize.
If you want to be mad at someone, be mad at the C standard for defining so much stuff as UB instead of implementation-defined behavior. Integer overflow should really be implementation-defined instead.
Not only to optimize but to write safety tools. If you defined all the behavior, and then someone used some rare behavior like integer overflow by accident, it'd be harder to detect that since you have to assume it was intentional.
UB is also very much based around software incompatibilities though, not just the ability to optimise stuff.
But where IB can have useful definitions to document, UB was left undefined because the behaviours were considered sufficiently divergent that allowing them all was useless, and so it was much easier to just forbid them.
You're getting it backwards. UB doesn't immediately stop compilation only due to implementation-defined backward compatibility: you don't want to break compilation of existing programs each time the compiler converges on the C spec and identifies an implementation of undefined behavior.
And since you want some cross-compiler compatibility, you also import third parties' implementation-defined UB.
This is not some reasonable conceptual decision; the proper way would be to reject compilation on each UB. The reality is that the proper way would be too harsh on existing codebases, making people use a less strict compiler or not update versions, which are undesirable effects for compiler writers.
I can't really follow. What would be wrong with making -fwrapv the default? I.e., let the compiler assume signed integers are two's complement on platforms where that holds (i.e. virtually everything in use today), and stop assuming "a + 1 < a" cannot be true for signed ints. How would that make existing code worse, or break it? It's basically what you already get with -O0, AFAICT, so any such program would already be broken with optimizations turned off.
I think I misunderstood your comment, sorry, but I have difficulty understanding how that's different from how things work already, then. You either have to rely on the compiler author having chosen what you expect (not the case here), or check for yourself and hope it won't change.
Well no, it's a compilation error, you need at the very least a semicolon after hlep and from there on it depends on what GCC is. If it's a function you need parentheses around --hlep, if it's a type you need to remove the --, if it's a variable you need to put a semicolon after it,...
Because GCC is all-caps I'm guessing it's a macro, so here's an example of how you could write it (though it won't be UB): https://godbolt.org/z/dYMddrTjj
I'm not sure if you're supporting my pov by showing the absurdity of the other position???
Yeah sure, if my phone auto incorrects gcc to GCC then that is technically meaningless so you're completely free to interpret my comment how you want.
..... Although..... GCC stands for GNU Compiler Collection so it can be reasonably capitalised, so maybe then, rather than saying anything goes we should do something reasonable because then you aren't left saying something really stupid if you're wrong???
Parent's point is that when the standard talks about UB, it refers to translating C code. So parent cheekily interpreted your comment about command-line flags (which are outside the remit of the standard) as code instead. I thought it was fitting.
The example here doesn't have compile-time known undefined behavior though; as-is, the program is well-formed assuming you give it safe arguments (which is a valid assumption in plenty of scenarios), and the check in question is even kept to an extent. Actual compile-time UB is usually reported. (also, even if the compiler didn't utilize UB and kept wrapping integer semantics, the code would still be partly broken were it instead, say, "x * 0x1f0 / 0xffff", as the multiplication could overflow to 0)
The problem with making the compiler give warnings on dead code elimination (which is what deleting things after UB really boils down to) is that it just happens so much, due to macros, inlining, or anything where you may check the same condition once it has already been asserted (by a previous check, or by construction). So you'd need some way to trace back whether the dead-ness comes directly from user-written UB (as opposed to compiler-introduced UB, which a compiler can do if it doesn't change the resulting behavior; or user-intended dead code, which is gonna be extremely subjective) which is a lot more complicated. And dead code elimination isn't even the only way UB is used by a compiler.
> also, even if the compiler didn't utilize UB and kept wrapping integer semantics, the code would still be partly broken were it instead, say, "x * 0x1f0 / 0xffff", as the multiplication could overflow to 0
That's the most important point! You simply cannot detect overflow when multiplying integers in C after the fact. This is not GCC's fault.
I agree that some of the optimizations exploiting UB are too aggressive, but the article presents a really bad example.
> If a situation has been statically determined to invoke UB that should be a compile time error.
But you typically can’t prove that. There’s lots of code where you could prove it might happen at runtime for some inputs, but proving that such inputs occur would, at least, require whole-program analysis. The moment a program reads outside data at runtime, chances are it becomes impossible.
If you want to ban all code that might invoke it, it boils down to requiring programmers to think about adding checks around every addition, multiplication, subtraction, etc. in their code, and to add them to most of them. Programmers would then want the compiler to include such checks for them, and C would no longer be C.
C will accept every valid program, at the cost of also accepting some invalid programs. Rust will reject every invalid program, at the cost of also rejecting some valid ones.
("unsafe" (aka "trust me" mode) means that's not quite true, and so do some of the warnings and errors that you can enable on a C compiler, but it's close enough)
> But you typically can’t prove that. There’s lots of code where you could prove it might happen at runtime for some inputs, but proving that such inputs occur would, at least, require whole-program analysis. The moment a program reads outside data at runtime, chances are it becomes impossible.
No, I specifically ruled out doing that in my comment.
I was referring to the situation where a null check was deleted because the compiler found UB through static analysis.
(Or specifically, placing a null check after a possibly-null usage. It is wrong to assume that, after a possibly-null usage, the possibly-null variable is definitely not null.)
As I recall, the compiler didn't know it had found undefined behaviour. An optimisation pass saw "this pointer is dereferenced", and from that inferred that if execution continued, the pointer can't be null.
If the pointer can't be null, then code that only executes when it is null is dead code that can be pruned.
Voila, null check removed. And most relevantly, it didn't at any point know "this is undefined behaviour". At worst it assumed that dereferencing a null would mean it wouldn't keep executing.
The compiler didn't find UB. What it saw was a pointer dereference, followed by some code later on that checked if the pointer was null.
Various optimisation phases in compilers try to establish the possible values (or ranges) of variables, and later phases can then use this to improve calculations and comparisons. It's very generic, and useful in many circumstances. For example, if the compiler can see that an integer variable 'i' can only take the values 0-5, it could optimise away a later check of 'i<10'.
In this specific case, the compiler reasoned that the pointer variable could not be zero, and so checks for it being zero were pointless.
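A small made-up illustration of that kind of value-range reasoning:

  /* The compiler can see that i is always in [0, 5], so "i < 10" is
     provably true and the branch can be folded away. */
  int pick(unsigned int j)
  {
      int i = (int)(j % 6u);
      if (i < 10)
          return i;
      return -1;             /* unreachable as far as the optimiser is concerned */
  }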
Something similar happens in the article's example. At this point the compiler knows x's possible range is non-negative.
int32_t i = x * 0x1ff / 0xffff;
A non-negative multiplied and divided by positive numbers means that i's possible range is also non-negative (this is where the undefinedness of integer overflow comes in - x * 0x1ff can't have a negative result without overflow occurring).
if (i >= 0 && i < sizeof(tab)) {
The first conditional is trivially true now, because of our established bounds on i, so it can just be replaced with "true". This is what causes the code to behave contrary to the OP's expectations: with his execution environment in the overflow case we can end up with a negative value in i.
It is probably more precise to say “if the pointer is null, then it doesn’t matter what I do here, so I am permitted to eliminate this” than to say that it can’t be null here. (It can’t be both null and defined behavior.)
I'm not sure that's right. The compiler isn't tracking undefined behaviour, it is tracking possible values. It just happens that one specific input into determining these values is the fact "a valid program can't dereference a null pointer", so if the source code ever dereferences a pointer, the compiler is free to reason that the pointer cannot therefore be null.
In essence, the compiler is allowed to assume that your code is valid and will only do valid things.
Consider function inlining, or use of a macro for some generic code. For safety, we include a null check in the inlined code. But then we call it from a site where the variable is known to not be null.
The compiler hasn't found UB through static analysis, it has found a redundant null check.
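A made-up sketch of that inlining scenario:

  #include <stddef.h>
  #include <string.h>

  /* Generic helper with a defensive null check. */
  static inline size_t safe_len(const char *s)
  {
      if (s == NULL)          /* kept wherever the compiler can't prove anything */
          return 0;
      return strlen(s);
  }

  size_t greeting_len(void)
  {
      /* After inlining, the argument is a non-null string literal, so the
         inlined null check is simply redundant and gets dropped.
         No UB is involved at all. */
      return safe_len("hello");
  }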
> I was referring to the situation where a null check was deleted because the compiler found UB through static analysis.
You can say that, but in practice -O0 is fairly close to what you're asking for already. Most people are 100% unwilling to live with that performance tradeoff. We know that because almost no one builds production software without optimizations enabled.
The compiler is not intelligent. It just tries to make deductions that let it optimize programs to run faster. 99.999% of the time, when it removes a "useless" null check (aka a branch that has to be predicted, eats up branch-prediction buffer space, and bloats the number of instructions), it really is useless. The compiler can't tell the difference between the useless ones and the security-critical ones, because they all look the same and are illegal by the rules of the language.
Even if you mandate that null checks can't be removed that doesn't fix all the other situations where inserting the relevant safety checks have huge perf costs or where making something safe reduces to the halting problem.
FWIW I agree that the committee should undertake an effort to convert UB to implementation-defined where possible... for example just mandate twos complement integer representations and make signed integer overflow ID.
To illustrate the complexity: most loops end up using an int, which is 32-bit on most 64-bit platforms. If you require signed integer wrapping, that slows down such loops, because the compiler must insert artificial operations to make a 64-bit register perform 32-bit wrapping, and we can't change the size of int at this point.
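A sketch of the kind of loop where this shows up (made-up function; the exact effect depends on the target and compiler):

  /* Because signed overflow is UB, the compiler may assume "i * stride"
     never wraps and strength-reduce it to a single 64-bit pointer
     increment. With mandated 32-bit wrapping semantics it generally has
     to keep truncating and sign-extending the 32-bit product instead. */
  void clear_strided(float *a, int n, int stride)
  {
      for (int i = 0; i < n; i++)
          a[i * stride] = 0.0f;
  }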
> FWIW I agree that the committee should undertake an effort to convert UB to implementation-defined where possible... for example just mandate twos complement integer representations and make signed integer overflow ID.
To accommodate trapping implementations you'd have to make it "implementation-defined or an implementation-defined signal is raised", which, as it happens, is exactly the wording for when an out-of-range value is assigned to a signed type. In practice it means you have to avoid it in your code anyway, because "an implementation-defined signal is raised" means "your program may abort and you can't stop it".
But again, the compiler did not find UB through static analysis. The compiler inferred that the pointer could not be null and removed a redundant check.
For example, would you not expect a compiler to remove a redundant bounds check if it can infer that an index can't be out of range?
The compiler made a dangerous assumption that the standard permits ("the author surely has guaranteed, through means I can't analyze, that this pointer will never be null").
Then it encountered evidence explicitly contradicting that assumption (a meaningless null check), and it handled it not by changing its assumption, but by quietly removing the evidence.
> For example, would you not expect a compiler to remove a redundant bounds check if it can infer that an index can't be out of range?
If it can infer it from actually good evidence, sure. But using "a pointer was dereferenced" as evidence "this pointer is safe to dereference" is comically bad evidence that only the C standard could come up with.
If I had written the above code, I had clearly done something wrong. I would not want the compiler to remove the second check. I'd want it to (at the very least) warn me about an unreachable return statement, so that I could remove the actual meaningless code.
It's been long enough since I wrote C that I'm not familiar with that noreturn syntax or the contract I guess it implies, but control-flow analysis that can prove the code will never be run should ideally warn me about it, so that I can remove it from the source code, rather than quietly removing it from the object code.
I'm not demanding that it should happen in every case, but the cases where it's undecidable whether a statement is reachable or not, obviously it's undecidable for purposes of optimizing away the statement too.
The first check might be in a completely different function in another module (for example a postcondition check before a return). Removing dead code is completely normal and desirable, warning every time it happens would be completely pointless and wrong.
libX_foo from libX gets at some point updated to abort if the return value would be null. After interprocedural analysis (possibly during LTO) the compiler infers that the if statement is redundant.
Should the compiler complain? Should you remove the check?
Consider that libX_foo returning not-null might not be part of the contract and just an implementation detail of this version.
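A sketch of the situation (with libX_foo as the hypothetical library function from above):

  /* Caller code. The documented contract: libX_foo() may return NULL. */
  extern struct widget *libX_foo(void);
  extern void use_widget(struct widget *w);

  int do_work(void)
  {
      struct widget *w = libX_foo();
      if (w == NULL)      /* correct per the contract... */
          return -1;      /* ...yet removable if, in *this* version of libX, the
                             compiler can see (via inlining or LTO) that libX_foo
                             never returns NULL. The check still belongs in the
                             source, for versions that do return NULL. */
      use_widget(w);
      return 0;
  }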
> How is it an “implementation detail” whether a procedure can return null? That's always an important part of its interface.
In gpderetta's example, the interface contract for that function says "it can return null" (which is why the calling code has to check for null). The implementation for this particular version of the libX code, however, never returns null. That is, when the calling code is linked together with that particular version of the libX interface, and the compiler can see both the caller and the implementation (due to link-time optimization or similar), it can remove the null check in the caller. But it shouldn't complain, because the null check is correct, and will be used when the program is linked with a different version of the libX code which happens to be able to return null.
For a more concrete example: libX_foo is a function which does some calculations, and allocates temporary memory for these calculations, and this temporary allocation can fail. A later version of libX_foo changes the code so it no longer needs a temporary memory allocation, so it no longer can fail.
And LTO is not even necessary. It could be an inline function defined in a header coming from libX (this kind of thing is very common in C++ with template-heavy code). The program still cannot assume a particular version of libX, so it still needs the null check, even though in some versions of libX the compiler will remove it.
The contract is that libX_foo can return null. But a specific implementation might not. Now you need to remove the caller side check to shut up the compiler which will leave you exposed to a future update making full use of the contract.
Also consider code that call libX_foo via a pointer. After specialization the compiler might see that the check is redundant, but you can't remove the check because the function might still be called with other function pointers making full use of the contract.
I'd expect any reasonable library to say “libX_foo returns null if [something happens]”. What use is there in a procedure that can just return null whenever it feels like it?
It returns null when it fails to do its task for some reason. It is not unreasonable for the condition for that failure to be complex enough or change over time so it doesn't make sense to spell it out in the interface contract.
You typically can't prove it, but if and when you can prove it, you should definitely warn about it or even refuse to compile.
Things like that meaningless null check mentioned can definitely be found statically (the meaningless arithmetic sanity check in OP's example, I'm not so sure, at least not with C's types).
So, how much effort should the standard require a compiler to make for “if and when you can prove it”? You can’t, for example, reasonably require a compiler to know whether Fermat’s theorem is true if that’s needed to prove it.
There are languages that specify what a compiler has to do (e.g. Java w.r.t. “definite assignment” (https://docs.oracle.com/javase/specs/jls/se9/html/jls-16.htm...)), and thus require compilers to reject some programs that otherwise would be valid and run without any issues, but C chose to not do that, so compilers are free to not do anything there.
Everyone wants to drag nontermination into this, but in the OP's example, the compiler already had proof that the conditional would never evaluate to true. What you can or can't prove in the bigger picture isn't so interesting when we already have the proof we need right now.
It's just that it used this proof to remove the conditional evaluation (and the branch) instead of warning the user that he was making a nonsensical if statement.
So to the question of "when can we hope to do it" the answer is, "not in all cases, sure, but certainly in this case".
> Compiling and running as if nothing is amiss is exactly how UB is allowed to look like.
Yes, and this is a "billion-dollar mistake" that's responsible for an ongoing flow of CVEs.
(the proposal to replace "undefined" with "implementation-defined" may be the only way of fixing this, and that gets slightly easier to do as the number of actively maintained C implementations shrinks)
You can already do that to some extent. There's tons of compiler flags that make C more defined. Eg both clang and gcc support `-fno-strict-overflow` to define signed integer overflow as wraparound according to two's complement.
-fwrapv introduces runtime bugs on purpose! The last thing you want is an unexpected situation where n is an integer and n+1 is somehow less than n. And of course that bug has good chances of leading to UB elsewhere, such as a bad subscript. If you want to protect from UB on int overflow, -ftrapv (not -fwrapv) is the only sane approach. Then at least you'll throw an exception, similar to range checking subscripts.
It is sad that we don't get hardware assistance for that trap on any widespread CPU, at least none that I know of.
Without it, you have to do convoluted things like rearranging the expression into unnatural forms (move the addition to the right but invert it to a subtraction, etc.), special-casing INT_MAX/INT_MIN, and so on - which you then have to hope the compiler is smart enough to optimize, which it often isn't (oh how ironic).
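For example, the "invert to subtraction" dance for addition looks something like this (a generic sketch, not code from this thread):

  #include <limits.h>
  #include <stdbool.h>

  /* Detect whether a + b would overflow without ever computing a + b:
     move one operand across the comparison as a subtraction, so every
     intermediate value stays in range. */
  static bool add_would_overflow(int a, int b)
  {
      if (b > 0)
          return a > INT_MAX - b;   /* a + b would exceed INT_MAX */
      else
          return a < INT_MIN - b;   /* a + b would go below INT_MIN */
  }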
We've got a few components written in C that I'm (partially) responsible for. It's mostly maintenance, but for reasons like this I run that code with -O0 in production, and add all those kinds of flags.
I'd be curious to know how much production code today that's written in C is that performance critical, i.e. depends on all those bonkers exploits of UB for optimizations. The Linux kernel seems to do fine without this.
I'm fairly confident in declaring the answer to your question: None.
Most programs rarely issue all the instructions that a CPU can handle simultaneously; they are stuck waiting on memory or on linear dependencies. An extra compile-out-able conditional typically doesn't touch memory and is off the linear dependency path, which makes it virtually free.
So the actual real-world overhead ends up at less than 1%, but in most cases something that is indistinguishable from 0.
If you care that much about 1% you are probably already writing the most performance critical parts in Assembly anyway.
> If you care that much about 1% you are probably already writing the most performance critical parts in Assembly anyway.
I call this the hotspot fallacy, and it is a common one. It assumes there are relatively small performance-critical parts that can be rewritten in assembly. Yes, sometimes there is a hotspot, but by no means always. A lot of the people who care about 1% are running gigabyte binaries on datacenter-scale computers without hotspots.
I had read it a long time ago, and had since forgotten the source. I've spent a few hours trying to find it in bug trackers. Really glad to have the link now, thanks!
The C standard doesn't really matter. Standards don't compile or run code. Only thing that matters is what the compilers do. "Linux kernel C" is a vastly superior language simply because it attempts to force the compiler to define what used to be undefined.
This -fno-delete-null-pointer-checks flag is just yet another fix for insane compiler behavior, and it's not the first time I've seen them do it. I've read about the Linux kernel's trouble with strict aliasing, and honestly I don't blame them for turning it off so they could do their type punning in peace. Wouldn't be surprised if they also had lots more flags like -fwrapv and whatnot.
I don't believe that it does. If the invalid arithmetic proceeds without crashing, and produces a value in the int32_t i variable, then that issue is settled. The subsequent statement should behave according to accessing that value.
"Possible undefined behavior ranges from ignoring the situation completely with unpredictable
results, to behaving during translation or program execution in a documented manner characteristic of the
environment (with or without the issuance of a diagnostic message), to terminating a translation or
execution (with the issuance of a diagnostic message)."
Ignoring the situation completely means exactly that: completely. The situation is not being ignored completely if the compilation of something which follows is predicated upon the earlier situation being free of undefined behavior.
OK, so since the situation is not being ignored completely, and translation or execution is not terminated with a diagnostic message, it must be that this is an example of "behaving in a documented manner characteristic of the implementation". Well, what is the characteristic; where is it documented? That part of the UB definition refers to documented extensions; this doesn't look like one.
What is "characteristic of the implementation" is in fact that when you multiply two signed integers together with overflow, that you get a particular result. A predictable result characteristic of how that machine performs the multiplication. If the intent is to provide a documented, characteristics behavior, that would be the thing to document: you get the machine multiplication, like in assembly language.
> I don't believe that it does. If the invalid arithmetic proceeds without crashing, and produces a value in the int32_t i variable, then that issue is settled. The subsequent statement should behave according to accessing that value.
You may dislike it, but that is not how UB in C and C++ works. See [1] for a guide to UB in C/C++ that may already have been posted elsewhere here.
It is a common misconception that UB on a particular operation means "undefined result", but that is not the case. UB means there are no constraints whatsoever on the behavior of the program after UB, often referred to as "may delete all your files". See [2] for a real-world demo doing that.
> If the invalid arithmetic proceeds without crashing, and produces a value in the int32_t i variable, then that issue is settled. The subsequent statement should behave according to accessing that value.
The C standard imposes no such constraint on undefined behaviour, neither is it the case that real compilers always behave as if it did.
Even if this solution cannot be used for the Linux kernel, for user programs written in C the undefined behavior should always be converted into defined behavior by using compilation options like "-fsanitize=undefined -fsanitize-undefined-trap-on-error".
It is hard to overestimate the importance of the OEIS in enumerative combinatorics.
I discovered the main results of my PhD thesis essentially as follows:
1. Find complicated construction A, hoping to prove some new results.
2. Fail to sufficiently understand/analyze A.
3. Write computer program to analyze characteristics of A for small n.
4. Using OEIS, discover that apparently A is (in some sense) equivalent to some completely different construction B, which is much simpler and well-understood.
5. Show desired result as well as further other results using B and variations of it.