One of the things this person misses is a lot of these are undefined with the explicit goal to let compiler authors take advantage of them.
So his real argument is with the C committee for not defining these things the way he wants them. His claim that what should happen is obvious is pretty strange (i.e. "it should do exactly whatever the compiler is trying to prove it doesn't").
It's essentially a complaint that compiler authors should follow a mystical spirit of C that he seems to have in his head.
(Plus, you know, he's kind of a dick about it all)
I read things like "We had plans to turn this optimization into a production feature, but eventually realized that the GCC maintainers only care about certain benchmarks and treat production programmers as adversaries that deserve getting their code broken. ... The current GCC and Clang maintainers suggest that we should convert programs to “C”. How would that turn out for Gforth?"
I was intrigued, so I went looking for the mailing list discussion where the gcc or clang people suggested this or where any of this happened.
I can't find it.
On either set of mailing lists.
I searched for gforth, I searched for anton, etc.
Nothing.
> lot of these are undefined with the explicit goal to let compiler authors take advantage of them.
Constructs should be undefined because the current landscape of C implementations is surveyed, and those constructs are found to be nonportable to such an extent that their behavior cannot even be "implementation-defined" or "unspecified".
That's the beginning and end of it, period.
That's why certain things were undefined or unspecified in the first ANSI C standard. For instance, different compilers had different evaluation order for function arguments, without documenting which, or guaranteeing that it would always be the way it is.
The committee didn't sit there and decide "let's make evaluation orders undefined for the sake of blazing speed, even though all C compilers today are doing it left to right. Let's have it so that after 1989, they can use whatever order they want!"
It was, "Ooops, we can't codify a specific evaluation order, because looking at what's out there, there isn't one. If we were to codify an order, implementations not having the order that we codify would have to be modified to conform with our standardized language."
As time goes on, these stupid things should be tightened. Over the course of thirty years, it's okay to codify some gratuitous lack of definedness in a language area and make everyone straighten it out in their implementations.
I believe the order of evaluation of function arguments is unspecified (though it might lead to undefined behaviour in some cases, I'm not sure).
---
Signed integer overflow is more interesting. On most platforms, it is very well defined: it either wraps around or it saturates. On one's complement machines I don't know, though I'd expect a deterministic (yet mostly nonsensical) result.
So it might sound reasonable to have signed integer overflow be implementation defined. Except that another reasonable behaviour is to trigger a trap, just like divide by zero. So they put it into the "anything can happen" bucket, that is, undefined. And now we have compilers taking advantage of this, by assuming signed overflow simply never happens. Which sometimes lets them deduce interesting things such as `k<k+1` evaluates to `true` no matter what.
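To make that concrete, here is a minimal sketch (my own example, not from the paper) of the kind of check that evaporates under that assumption; a compiler that exploits signed-overflow UB may fold the comparison to a constant at -O2:

    #include <limits.h>
    #include <stdio.h>

    /* Under the "signed overflow never happens" assumption, k < k + 1
       can be folded to 1, so this overflow check may be compiled away. */
    static int next_is_larger(int k)
    {
        return k < k + 1;
    }

    int main(void)
    {
        /* May print 1 even for INT_MAX, where k + 1 would overflow. */
        printf("%d\n", next_is_larger(INT_MAX));
        return 0;
    }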
I'm not sure the standard bodies anticipated that. I bet they were thinking something along the lines of "the runtime can assume it never happens, and fail to perform the relevant checks in the name of performance". I'm not sure they originally interpreted this as "the compiler can assume it never happens, and treat the alternative as dead code".
Overall, I see 3 kinds of undefined behaviour: the most innocuous one is when the runtime is allowed to trigger a trap, and crash the program right away. The second one is when the runtime is allowed to not perform the relevant check, and go bananas because of mangled memory or something. This one is more serious, and the source of many security vulnerabilities. The third one is when the compiler can assume the relevant undefined behaviour doesn't even happen, and make lots of interesting deductions from this assumption. This one is mostly insane, though I can understand why it would be nice in some limited cases.
We need 3 buckets for undefined behaviour. "Anything can happen" is too coarse grained.
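To illustrate what I mean, here are sketches of the three kinds (my own examples; the exact behaviour of course depends on the platform and compiler):

    /* Kind 1: trap right away, e.g. integer division by zero typically
       raises SIGFPE on x86. */
    int kind1(int x) { return 1 / x; }

    /* Kind 2: no check is performed and memory gets mangled, e.g. an
       out-of-bounds write past a 16-element array. */
    void kind2(int *a) { a[16] = 42; }   /* assuming a has 16 elements */

    /* Kind 3: the compiler assumes the UB cannot happen and prunes
       "dead" code, e.g. a NULL check that follows a dereference may be
       removed. */
    int kind3(int *p)
    {
        int v = *p;                 /* implies p != NULL */
        if (p == NULL) return -1;   /* may be deleted as dead code */
        return v;
    }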
While his tone may be unwarranted, I think his claims are more reasonable than you suggest. I'm not sure where the actual bug reports are, but here's a long but excellent thread on comp.arch where Anton makes his point more fully: http://compgroups.net/comp.arch/if-it-were-easy/2993157
Edit: In another comment, you say "The whole thing is like complaining that people follow the law, instead of the spirit of the law." I think that might be accurate. Impractical as it may be, there are many (including me and presumably Anton) who believe that society benefits if individuals hold themselves to a higher standard than obeying the letter of the law while flouting the spirit.
Whether his claims are warranted or not, his vitriol is still directed at the wrong place. If the behavior is as easy to define, obvious, and beneficial as he claims, he should just submit defect reports and get it fixed. Blaming the compiler authors is just silly (in the sense that it will never change anything, and is not effective).
As for folks "holding themselves to a higher standard", it's not really a sane thing to do here, because there is no universally agreed upon, or heck, even "precisely defined" set of "bad things" that they should not do. Depending on who you ask, you will find differing answers on what people think should not be taken advantage of.
IE despite his claims, it's not even clear what this "higher standard" is supposed to look like.
His point as I understand it is that the people who would be doing the fixing are unlikely to agree that the defects he is reporting are actually defects, and that reporting them is therefore futile, absent a shift in consensus on the topic of undefined behavior. The paper exists in order to help promote that shift.
This is not an effective way, at all, to promote such a shift, of course. It's also not like this is the first paper. There are plenty of papers (he even cites a bunch) saying the same thing. So I'm not sure why anyone would think this would do much ...
Convince any of the 100+ vendors producing gcc forks to do it, have their customers sing their praises, and others will follow.
yelling about it on the internet is not likely to do much :)
> One of the things this person misses is a lot of these are undefined with the explicit goal to let compiler authors take advantage of them.
It doesn't look like it's missing it to me:
"C compiler maintainers also claim that these “optimizations” give speedups for other programs. However, that would require programmers to convert their C* programs to “C” first, a process which can produce worse code (Section 3), and more importantly, requires an effort that would be much more effective if directed at source-level optimizations."
The goal is to let compiler authors take advantage of undefined behavior for, among other things, performance. The counterclaim is that these goals are noble, but that letting compiler authors take advantage of undefined behavior as a means to achieve those goals is, unfortunately, counterproductive.
> Convince any of the 100+ vendors producing gcc forks to do it, have their customers sing their praises, and others will follow.
So... yell (positive things) on the internet?
> yelling about it on the internet is not likely to do much :)
Oh. Well, for what it's worth, I agree. As such, I spend my energy investing in staying the hell away from C when possible. I then yell positive things on the internet about alternatives for my own amusement. Still doesn't do much, but it does a little.
Anton is a compiler writer himself (part of the Gforth team, which I lead). We do submit bug reports. They get rejected as "invalid", on UB. It is pretty clear that the way we use GCC to implement Gforth is C* and not "C", we use extensions to GCC as source level optimizations. Without them, Gforth would be a factor 3-5 slower than it is now. We experience slowdowns in that order when we have to switch back to "C" instead of finding a workaround for that particular "optimization".
So sane behavior of the compiler is absolutely necessary to create fast programs. The tone reflects more than 10 years of frustration about "C" maintainers, and their inability to deal with the needs of their customers, and our long-term goal is to completely throw out C, writing our own code generator instead. It's going to give us another factor 2-5 (depending on the problem; some microbenchmarks might gain a factor 10) over the C*-based solution we have now. And, at the same time, we can forget about releasing a bug workaround patch for Gforth for every new GCC release, as it was the case for GCC 3.x and 4.x (GCC 5.x is better, some of the criticism has been heard). This is feasible, as the number of popular architectures has decreased so far that a small team can support the remaining ones (which is essentially x64, ARM, and ARM64, with some variations within the family).
As a GNU project, we understand ourselves as customers of the other, very important GNU project, GCC, on which we, and almost everybody else in the GNU project, depend. If it's broken, it is too bad. And if the bug report process is broken due to the "UB=we can blow up your program" attitude, it's worse. If the competing compiler maintainers, clang, have the same attitude, we can't even switch.
We are also part of a language standard effort (Forth, of course), so we know how such a committee thinks; this is not different just because the language is different. What we don't understand is why the C standard has a lot of UB, and only little IB (implementation-defined behavior). For example integer overflow: this is a very obvious candidate for IB, as the behavior of the underlying hardware defines integer overflow; usually as wrapv in two's complement, sometimes as trapv, but the other options like one's complement and such are non-existent. The result of such an operation is never "undefined", and as wrapv two's complement is the dominating implementation, it is pretty portable, as well.
Or what pointers look like (pointer comparison): you know what your pointers will look like when you implement a C compiler, so you can fully specify the behavior of pointer comparisons. It's IB, because segmented architectures like 8086 with their overlapping segments simply don't give reliable and fast results. But if you don't program for MS-DOS, you don't need to care: usually pointers are just register-sized integers pointing to a flat, non-segmented memory, and pointer comparison is just compiled to integer comparison.
Bottom line: If you don't understand what your underlying machine does, and how your code generator translates statements to instructions, and therefore you think that x-1>x never can become true, you shouldn't be writing compilers.
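To spell out the x-1>x example (a sketch; what you actually get depends on flags and compiler version):

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        int x = INT_MIN;
        /* On two's-complement hardware with wrapping semantics (e.g.
           gcc -fwrapv), x - 1 wraps to INT_MAX, so x-1 > x is true here.
           With the default UB-based treatment, the compiler may instead
           fold x-1 > x to 0. */
        printf("%d\n", x - 1 > x);
        return 0;
    }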
> So his real argument is with the C committee for not defining these things the way he wants them.
To some extent yes. But if you take it to that extreme (where you really have to understand the standard at the language-lawyer level), C becomes an unreasonably complicated language where you can only trust experts to have a deep enough understanding to write correct code.
I've deleted my post, but in this case, I was speaking to two actual LLVM developers, not just random dudes on IRC. Anyway, it's not their fault, "get what you pay for" and all that, and like I said, I like and use LLVM and really can't complain. :)
I don't think he's being tongue in cheek. I bet you he's genuinely angry and resentful. From his point of view, he once had a lovely language, C*, and he can't use it any more, and is being forced into assembly instead, for no reason other than the perversity of the maintainers of the only two C compilers that he feels that he can use.
I don't have enough karma to get a down vote button. Even if I did, I'm not sure that voting on your own posts is allowed. I up voted you instead.
Yes, the title is a play on the llvm post. I could be wrong about this, but the way I took it was like this:
"No, don't you presume to tell me what I need to know about undefined behavior. I am your customer, and here's what you need to know about me. Stop treating me like an enemy."
But you could be right. I'm bad enough at detecting jokes and sarcasm in person with people I know. Over a textual medium with a stranger, I don't really trust myself much.
I'm a compiler writer. I'm not actually unsympathetic to these points, and agree that compilers intended for practical use had better have strong justification for surprising behavior.
But the pervasively nasty tone of the paper is not going to make many friends among the people whose work the authors are trying to influence.
I agree about the unpleasant tone of the paper, putting "facts" and "optimizations" in quotes, implying they are not real, and talking about "claims" of GCC maintainers, and cracks about psychic powers.
But there is a good underlying point to the paper, and one I tend to agree with - a compiler's behavior should give considerable weight to how the language is used more than taking a literal view of what the words mean in the spec.
Taking things to an absurd extreme to make the point, "undefined behavior" implies that a compiler writer can insert code to reformat the hard disk if it encountered such code. Granted, the point of a spec is to define things so that compiler writers don't have to divine intent, but perfection is never attained and hence compiler writers have to use their judgment.
I know in my compiler work, I have backed out some optimizations that were allowed by the spec, but broke too many programs. The inconsequential speed improvement was just not worth it.
It may be unpleasant, but the reality is that many programmers do feel that way. I can say that as a kernel programmer, I approach using a new version of gcc with a particular version of dread. What sort of random bugs that might end up causing kernel crashes or data corruption will arise when I try using a new compiler?
As a result, I'm quite sympathetic to the attitude of "any sufficiently advanced compiler is indistinguishable from a malicious adversary"....
I'm just a crummy engineer that writes a fair amount of embedded firmware. When this subject comes up I've found mainstream programmers' attitude towards these sorts of concerns to be actually hostile. A lot of programmers think that when the specification gives the compiler maintainers a choice between making the code do something platform-specific or something silently malicious, they should do the malicious thing in order to 'teach the programmer a lesson'.
I'm with you. These guys aren't helping us, our customers or the people that pay the bills.
These guys put in thousands of hours to maintain and improve a compiler you get for free. Thousands of programmers welcome each and every release, jumping to the change log. Every major GCC version lately introduced significant optimizer improvements, new diagnostics, new warnings, faster compile times, etc.
How you can claim these guys aren't helping "us" is beyond me. I am developing a small project with just a little over 1000 customers. GCC team efforts over the last few years saved thousands of dollars in hardware and electricity costs as well as hundreds of hours of computation, just because GCC 5.3 produced very significantly faster code than GCC 4.x for low x.
Bugs are a different issue than the optimizer being more aggressive. We would all like to avoid bugs but it's hard. There will never be improvement if we don't allow for the possibility of bugs being introduced.
I immediately thought about the same language. It's clean, safe, fast and explicit, even if a bit verbose. With a quite nice compiler, last time I checked (some 10 years ago), and without "undefined behaviour".
Optimally, a compiler's version of a language is better defined simply by its own definitions. It's just problematic if the behavior is changed between versions, e.g. for the sake of optimizations. It's also bad if undefined behavior is expected to be reliable where the original implementation wasn't intended to be meaningful in the first place.
If those two cases are hard to tell apart, the process of announcing this is at fault, I guess: the error messages and warnings, the documentation, the info hidden in discussions on the mailing lists or in bug reports. But the gcc docs seem fairly extensive, and I can't claim to have read them completely, or the C standard.
"a compiler's behavior should give considerable weight to how the language is used more than taking a literal view of what the words mean in the spec."
This is generally true, but the problem in this case is that the committee history of a lot of what is being complained about shows it was added specifically to let compilers reason the way they do.
The committee didn't sit down and say "hey, this should be undefined, because it sucked". Instead, what happened is that committee members and compiler writers sat down, the compiler guys said "if you make this undefined, we can optimize this thing over here better", and the committee said "great, sounds awesome".
Complaining that the compiler writers then go and do that is misguided :)
(This covers most of the specific examples given, BTW. There are other examples where he dismissively claims that what should happen is obvious.)
The whole thing is like complaining that people follow the law, instead of the spirit of the law.
The question I have is whether committee members are sufficiently representative of programmers in the trenches, and whether committee members are sufficiently informed about the impact of their saying, "great, sounds awesome".
Sometimes yes, sometimes no. But in some sense, it doesn't matter.
They are the ones defining C. If people don't like it, they should create Cgood or whatever.
This has been tried before, and it has never caught on.
I'm not worried. If the programmers in the trenches really hate it so much, C will eventually die off in favor of something better :)
To be fair, Cgood doesn't catch on because no compiler vendors implement it, and the C standards committee is at least somewhat beholden to the compiler vendors, because if the compiler vendors en masse decided not to implement their standard, then the standard isn't worth the paper it's printed on.
Compiler vendors are people, and in fact, if you don't like that set of people, there are plenty of vendors who have forks :)
IE if you don't like gcc 4.8's treatment of behavior, it's not like there aren't 50 different arm forks from different vendors you can use, with or without their SOC's.
Yet none of these vendors have apparently heard enough from customers that they have made what are pretty simple patches.
It's really, really hard to sell someone on a non-standard implementation of a popular language like C. For example, I've had customers beg me to add an extension to C. I implement it, and present it to them. They refuse to use it - because then their code would be dependent on a non-Standard feature.
The solution (for me, anyway) was to create an entirely new language. The changes that nobody wanted in C (not because they were bad, but because they were non-Standard) found plenty of traction and users in D.
This is a very good observation. The exception is strict subsets though; there are plenty of people using MISRA, and that's an extremely restrictive subset of C. Similarly this would be a dialect of C that is identical in behavior for conforming C programs, rather than C with some extensions.
> This is entirely true, but at the same time:
> Compiler vendors are people, and in fact, if you don't like that set of people, there are plenty of vendors who have forks :)
With all due respect, that sounds a bit like "if you don't like this particular law, you're always free to try a coup/revolution/run your own country".
I don't know much about the situation in standardized C, but I would argue that the point of standards is to reduce fragmentation and discuss language features in a way that all implementation can benefit from it. Demanding that you break the standard if you don't like a part of it kind of defeats that point, I think.
A different (and IMO more sensible) approach is shown by the WHATWG developing HTML5. Even though input primarily comes from browser vendors (which would be the equivalent of compiler vendors here), there is a rather strict requirement that new features must break as little existing usage of HTML as possible and should never introduce new security vulnerabilities. (Where those requirements collide, the second one takes precedence.) This is done by extensive studies of how HTML is used "in the wild".
"With all due respect, but that sounds a bit like "if you don't like this particular law, you're always free to try a coup/revolution/run your own country".
"
IMHO, it's closer to, if you don't like SF, you can move to any of the cities around it :)
"Demanding that you break the standard if you don't like a part of it kind of defeats that point, I think."
Which, humorously, is precisely what Anton wants in some cases.
"A different (and IMO more sensible) approach is shown by the WHATWG developing HTML5. Even though input primarily comes from browser vendors (which would be the equivalent to compiler vendors here), there is a rather strict requirement that new features must break a little as possible existing usage of HTML and should never introduce new security vulnerabilities. (Where those requirements collide, the second one takes precedence). This is done by extensive studies of how HTML is used "in the wild"."
This is in actuality, how most of C/C++ is developed. With a bit more formality due to it being an international standard.
The truth is this code has pretty much never had defined behavior.
As far as I'm aware, the author was using optimizations in quotes ("optimizations") to parallel the previously defined C* and "C" distinctions. Hence why section 2.1 is called 'Optimization* and "Optimization".' He's not suggesting they are not real optimizations, but that they are a class of optimizations which target the "C" subset rather than the C* superset.
At least, that was my understanding of it. People seem to be reading into the quotes as indicative of tone rather than notation, so I wanted to put a different perspective out there.
You read one intention of the quotes right. But they are also intended to be scare quotes; if the result of an "optimization" is equivalent to the behaviour intended by the programmer that also worked with earlier versions of the compiler, it's a real optimization, if not, it's miscompilation. E.g., "optimizing" a bounded loop into an endless loop is not a real optimization.
The problem is that many programmers who are actually competent and care about speed welcome each and every new GCC release with joy, checking the change-log for optimizer improvements. I mean, people who actually check the docs instead of ranting about how int overflow doesn't behave according to some wishful thinking.
>>The inconsequential speed improvement was just not worth it.
Yeah, but the author is talking about very significant speed improvements (often bigger than 5%) which only break really bad code (relying on behavior of int overflow).
> really bad code (relying on behavior of int overflow).
How's that "bad code"? On most desktop architecture, integers are coded as 2's complement, and wrap around. Modulo arithmetic is weird, but it does have some uses. Then on DSP, integers generally saturate upon overflow.
Those behaviours are both perfectly well defined. Why can't one expect to be able to rely on them? Why integer overflow can't simply be "implementation defined"?
Because of traps, such as we see upon divide by zero? Because some architecture might produce nonsensical results upon overflow? Those are put into the "undefined" bucket, but they're quite different from "the compiler can assume it never happens, and treat the alternative as dead code".
>>How's that "bad code"? On most desktop architectures, integers are coded as 2's complement, and wrap around.
It doesn't matter, the language specs are very clear about it.
Think about it: if you write a piece of code using unsigned ints which you know is going to overflow, what is the first thing you do? You check the docs for what happens. Does it cap? Does it wrap around? You check the standard and it says that they wrap around. Now you use that in your code, because maybe that's a useful behavior.
You do the same thing with signed ints: you check the standard and it says that it's undefined, which means you, as a programmer, promise you won't do that, so you don't.
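A small sketch of that contrast (unsigned wraparound is defined by the standard; the signed case is your promise not to do it):

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned int u = UINT_MAX;
        printf("%u\n", u + 1u);   /* defined: wraps around to 0 */

        int s = INT_MAX;
        /* s + 1 would be undefined behavior: the programmer's side of
           the contract is to never evaluate it. */
        (void)s;
        return 0;
    }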
>>Those behaviours are both perfectly well defined. Why can't one expect to be able to rely on them?
Because language specs says so.
>>Why integer overflow can't simply be "implementation defined"?
Maybe it could and maybe it would be a better language. It has been undefined behavior for a long time though, and that means your end of the contract as a programmer is that you won't invoke it.
Your argument just comes down to: "Gee, maybe C would be a better language if signed overflow were implementation-defined and not UB". Maybe that's true, I don't know. It doesn't matter though; it's similar to saying Python would be a better language if it didn't have an interpreter lock and then complaining that your multithreaded program doesn't use all the cores as it should.
You sound like programming languages are a Given™, never to be modified —or at least not by us lowly programmers. That's too pessimistic for my taste.
As for whether C would be better off… that's beside the point. Taken to its extreme, undefined behaviour upon signed integer overflow breaks C's own philosophy.
C is supposed to be close to the metal. Originally, it was. So you'd expect that if some `ADD R1 R2` instruction triggers an overflow, you might observe funny results that might have funny consequences down the line. You'd further expect that the `+` operator has the exact same consequences, because C is close to the metal.
What you do not expect, is this (imagine you're writing a PID for a DSP, where integer arithmetic saturates):
    int signal = input + 1;
    if (signal == input) { // tests for saturation
        // dead code, because undefined behaviour
        //
        // The only way for this test to be true is
        // to overflow signed integer addition.
        // Signed integer overflow is undefined, so
        // the compiler is allowed to assume it never
        // happens.
        //
        // Therefore, the test above is always false,
        // and the code right here is marked as dead.
    }
Dead code is not whatever funny behaviour that might have risen from the `ADD R1 R2` above. That's the compiler doing weird stuff because "undefined behaviour" allowed it to go crazy. This is not what you'd expect of a low level language. Craziness is supposed to come from the platform, not the compiler.
Now C being what it is, the quick fix would be to use INT_MAX instead. It's the portable way to do this test, and would avoid that crazy dead code elimination. But this is not enough: if `input == INT_MAX`, we still have an overflow, and who knows what would happen. The real fix would be something like this:
    int signal = input + (input == INT_MAX ? 0 : 1);
    if (signal == INT_MAX) {
        // live code, yay!
    }
I have to emulate saturation in software! Why?!? The platform does this already with the `ADD` instruction, why can't I just use it? Why am I even using a low level language?
In this particular case, the spirit of the C standard was clearly to have addition map to whatever hardware instruction was available. If the `ADD` wrapped around, so would `+`. If it saturated, so would `+`. And if it trapped like a divide by zero, so would `+`.
Instead, the compiler writers took the standard to the letter, and saw "undefined" as a license to eliminate code that wasn't dead by any such low level assumption. Doing this clearly violates the idea that C is close to the metal.
Is your positive experience with C, or C++? I think a lot of the friction is that the two languages often share a compiler but not a philosophy. Modern templated C++ depends heavily on the compiler's ability to optimize out redundant code, while many C programmers still want more of a WYSIWYG compiler. I often get the feeling that many people involved with GCC would be happier if they could drop C compatibility and just work on a C++ compiler.
Ertl's work is at a low enough level where he knows in advance what the assembly of the optimized loops should look like. For example, here's a big improvement to the Python interpreter that's based on his work: https://bugs.python.org/issue4753. What he, and I, and some small but significant number of others, wish for is that there was a way to write C that will reliably produce the output we envision. The chances that an optimizer will make better code than this is low, whereas the chance that attempts to optimize the code will produce worse code is very high.
I'd be interested to see an "out of sample" longitudinal look at performance of some optimized C code with different versions of GCC. That is, ignore the code used in the standard benchmarks, since the optimization strategies have been consciously chosen to do well on that code, and instead look at code that was written for high performance with older compilers but hasn't yet been tested with current compilers. I don't know for sure what it would show, but I'm guessing it would be more of a mixed bag rather than a monotonic improvement.
It's very positive. Python used to be my favorite programming language, now it's C. I don't code anything in C++ so I don't know; the experience was very negative when I learnt some of it a long time ago, but maybe it would be different today.
>>What he, and I, and some small but significant number of others, wish for is that there was a way to write C that will reliably produce the output we envision. The chances that an optimizer will make better code than this is low, whereas the chance that attempts to optimize the code will produce worse code is very high.
Ok, I can definitely see the point. Still, it irks me when something as simple as integer overflow is the source of rants about UB. I do agree that it would be nice to have some universal way to drop to assembly. I am probably in the rare position that I write code which sells because of performance but I am not that good with assembly (the reason is that I am relatively new and work in a niche). That means that optimizer improvements are more important for me than a "stick to what is written even if you don't understand it" kind of compiler.
>> I don't know for sure what it would show, but I'm guessing it would be more of a mixed bag rather than a monotonic improvement.
It could be because programmers fit what they do to the compiler. My approach is to just test a lot of things and leave what sticks so my code is naturally a better fit to a compiler I work with. Still, updating my old GCC to 4.8 was a huge boost in speed and updating to 5.3 recently gave me additional free 4%-5% while compiling faster and offering some new warnings.
"facts" absolutely should be in quotes there. He's describing assertions like "x cannot be null", something that is inferred from the code and treated as fact but turns out not to be true at runtime.
> I know in my compiler work, I have backed out some optimizations that were allowed by the spec, but broke too many programs. The inconsequential speed improvement was just not worth it.
Then perhaps those cases should be reported to the spec committee to make the optimization not allowed?
I can't currently say that I am a compiler writer, since I'm working on static analysis tools, but I have been a compiler writer for more of my career than I have been anything else, and I was thoroughly amused by the tone of the paper. It's not particularly professional, and it's not likely to change minds among the people he is excoriating, but I don't think that's the point. Instead, I think he's trying to shift the "Overton window", to embolden people uncomfortable with the status quo, so that they can argue for what they want instead of accepting the opinions of compiler writers as some kind of unchangeable holy writ.
I think you should consider the perspective of programmers who have to write code where all the side effects aren't hidden behind the operating system's API calls.
    int d[16];

    int SATD (void)
    {
        int satd = 0, dd, k;
        for (dd=d[k=0]; k<16; dd=d[++k]) {
            satd += (dd < 0 ? -dd : dd);
        }
        return satd;
    }
This gets optimized into an infinite loop, and the paper says that this is incorrect. I'm on the side of the compiler. This code is broken, if you do an out-of-bounds access then an infinite loop is the least of your worries.
Independent of undefined behavior: Who writes such god-awful code?
There:
    for (k=0; k < 16; k++)
    {
        dd = d[k];
        satd += (dd < 0 ? -dd : dd);
    }
Just one more line, less error-prone, idiomatic for-loop and variable accesses (therefore easier and faster to comprehend), and best of all: no undefined behavior.
("But if my code is simple, how will other people know what a genius I am?")
There's no way I would accept the code from that benchmark in a code review.
On a reasonable machine (e.g., x86, x86_64, ARM, ARM64), in a reasonable context (i.e., not the very tippy top stack frame), the one-past-the-end read is harmless. The value is unused, and the read itself is not going to segfault because it's going to read the frame pointer or return address out of the stack frame.
... unless your implementation does bounds checking at runtime. This actually happens with the new crop of sanitizers, or even with the new hardware-accelerated bounds checking features of modern Intel CPUs.
It would be a shame if people had to disable critical safety features to work around crashes caused by such "harmless" out-of-bounds reads.
That's also taking the view that the code is executing pointless busy work as the result is meaningless -- seems fair enough to optimise the whole thing away to me under those circumstances, although I suppose the compiler ought to put in some "random" return value. Four would probably do nicely as it's already been proven to be the best random number.
When you say "optimize the whole thing away", do you mean that it's reasonable to skip that particular load or reasonable to skip the entire loop? Anton and I (and I presume 'swolchok') are all for reordering the code to avoid the the unnecessary load, and if it had some benefit, would probably be fine with replacing it with some random or non-random number.
The part that makes us cringe is for the compiler to reason backward from the undefined load to removing the entire loop, even for the values that are within range. While accepting that compiler would be spec compliant in doing so (replacing main() with "return 0" would also be spec compliant), we question whether that really makes for a better/faster/safer user experience. Essentially, we think that the clear intent of the code should have greater influence on the particular optimizations that are chosen.
I do agree that this isn't quite so straightforward, but I also can't see that it's necessary for the compiler to guess what is meant here. Even supposing the array is initialised elsewhere there is a read past the end of the array -- I hope we'd both agree that a segfault in response is perfectly reasonable.
If that's the case then maybe forcing a segfault is better than optimising the loop away, but I do err on the compiler's side here. As others have pointed out, the compiler warns that undefined behaviour is invoked by the code and it probably doesn't actually do what the author thinks it does. The diagnostic also isn't required by the standard.
The clear intent of the code certainly isn't clear to me -- whatever calculation is done involving a value read from past the end of the array only tells me that the return value isn't used in any meaningful way, and if that's the case then why not eliminate the code?
[EDIT] Ok, just re-read the code, I guess the value is actually unused so the function return value is presumably OK, but I think the option of the segfault is still reasonable for the compiler to do.
A segmentation violation when doing the out-of-bounds access would be a possible behaviour of C* code.
Concerning the intent: Note that the result of the out-of-bounds access is not being used. Looks to me like SATD() just produces the sum of absolute values of d.
Even with buggy code, you want the behavior of the generated code to reflect the behavior of the source code, or debugging becomes an exercise in frustration (especially, but not only on platforms where running a debugger is not an option).
Yeah, this is undefined behaviour and should not be expected to work, 100% agreed.
Wouldn't it be faster to remove that costly ternary conditional operator by zeroing out the first bit? Well, there's one degenerate case, for minvalue of int, but that is degenerate either way.
> Wouldn't it be faster to remove that costly ternary conditional operator by zeroing out the first bit?
No, that wouldn't work for two's-complement numbers [1], only for sign-magnitude numbers (like IEEE floating-point numbers). For example, the value -2 is stored as 11111110 (in an 8-bit signed integer); setting the most significant bit to 0 would result in 01111110, or 126 in decimal.
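A quick sketch, if you want to see it for yourself (assuming 8-bit two's-complement chars, which is what mainstream platforms give you):

    #include <stdio.h>

    int main(void)
    {
        signed char v = -2;                               /* stored as 0xFE */
        unsigned char cleared = (unsigned char)v & 0x7F;  /* clear the sign bit */
        printf("%d\n", cleared);                          /* prints 126, not 2 */
        return 0;
    }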
But they only change their mind for this particular code because it is part of a benchmark, which is the kind of code they support instead of useful ones.
What really happened: It was diagnosed, and they produced a fix for the spec source code.
SPEC said it does not plan to offer the fix.
So they then went and said "okay, I guess we'll try to add an option to give people a chance to have this random crap work".
The flag was added to disable this assumption so people could use it if they need it.
I'm not sure what better response you want from developers than that.
Nowhere, in any of the bug reports, will you see someone say "yeah, we gotta make sure this still works because it's a benchmark".
They treat it like any other undefined behavior bug, and say "that's undefined behavior".
You can look through the bug database and find literally tens of thousands of cases where they say that; benchmarks, non-benchmarks, etc. They do it whether it's important for optimization or not.
It's really pretty consistent.
So taking this example and extrapolating that the only reason it was done is because it's a benchmark is, well, pretty typical of this author :)
The "optimization" was added and turned on by default (for some higher optimization levels) in a pre-release of gcc-4.8. After it turned out to miscompile SPEC, they disabled it for the case occuring there (it still seems to be on by default for other cases); it was disabled in released gcc-4.8 and all following gcc versions. When I asked a gcc maintainer about that, he wrote that it turned out that this "optimization" did not really provide a significant performance benefit; I asked how they evaluated that and he wrote that they used a set of relevant benchmarks. I have seen little reluctance by the gcc maintainers in enabling "optimizations" by default, even if they miscompile real-world programs. My conclusion is that SPEC has a special status among gcc maintainers, but draw your own conclusions.
You are wrong about the timeline, it was added in 2012 well before the 4.8 branch point (it's closer to the 4.7 timeline). It was not added in a pre-release of gcc 4.8, and it was actually added to target some common cases. And in fact, it existed before then in other forms (I can point you at code back to 2004 doing the same thing). It has been on by default there for a long time. It happened to not miscompile SPEC before then by luck.
". After it turned out to miscompile SPEC, they disabled it for the case occuring there (it still seems to be on by default for other cases);"
No, it was not. It is on by default for all cases, AFAICT.
Please point to the patch you think disabled it.
" it was disabled in released gcc-4.8 and all following gcc versions."
Again, I don't see a patch that does this. If this has happened, I imagine it's not on purpose.
    faggressive-loop-optimizations
    Common Report Var(flag_aggressive_loop_optimizations) Optimization Init(1)
    Aggressively optimize loops using language constraints.
Nowhere will you see something that says it is not the default, or will you see changes mentioning the default was changed.
" When I asked a gcc maintainer about that, he wrote that it turned out that this "optimization" did not really provide a significant performance benefit; I asked how they evaluated that and he wrote that they used a set of relevant benchmarks."
Which "gcc maintainer" did you ask?
I ask because I'm a listed maintainer for the SCEV related optimizations (see https://gcc.gnu.org/svn/gcc/trunk/MAINTAINERS) and you didn't ask me :)
I asked Sebastian as well, and you didn't ask him.
While compiler vendors do target SPEC, GCC generally does not, and in fact, has pretty much repeatedly refused optimizations that are only useful for SPEC.
FYI: putting optimization in quotes repeatedly does not help your case.
I tested compiling SATD() with gcc-4.8.2 and gcc-5.2.0 with -O3, and they do not "optimize" SATD() into an infinite loop. That some version between 4.7 and 4.8 did actually do it was well publicized, and is also documented in bug report 53073; this bug report was resolved as "INVALID", yet somehow a change was introduced before 4.8 was released that results in compiling SPEC as intended. If you want to know which patch achieved that, you can bisect the commits.
The gcc maintainer who justified that change to me is Andrew Haley. He may not be working on SCEV, whatever that is, but, unlike you, he admits to this change that has provably happened.
"I tested compiling SATD() with gcc-4.8.2 and gcc-5.2.0 with -O3, and they do not "optimize" SATD() into an infinite loop."
and from this you assume it was specifically changed (much like you assume a lot of bad faith throughout), instead of "just falling by the wayside as the result of some other change", which is infinitely more likely.
I have no desire to know what patch did it, actually; I'm pointing out you are, pretty much everywhere, going off half-cocked with assumptions and accusations.
"The gcc maintainer who justified that change to me is Andrew Haley. He may not be working on SCEV, whatever that is, but, unlike you, he admits to this change that has provably happened."
Andrew pretty much has only ever worked on the java front end, and that's what he maintains. SCEV is the thing that changed here.
I'm not sure why Andrew would have ever said anything about SPEC (since it's not java ;P), but I actually at this point suspect more bad faith on your part, given what you've said so far.
The loop update expression, whenever the code reaches the end of the loop, effectively becomes
    dd=d[++k];
    if(k<16) goto startOfLoop;
That, in turn, is equivalent to:
    k+=1;
    dd=d[k];
    if(k<16) goto startOfLoop;
In the second line, k is used to index into array d. Conforming programs do not access memory outside of arrays, so the compiler can infer that
    0 <= k < 16
Given that, it is clear that the goto startOfLoop branch will always be taken. Inside the loop, the program accesses some memory, but that data never is visible to the outside world, so the compiler can optimize away the memory reads and writes.
The compiler is that aggressive because it has to be in real world programs. For example, compilers using C as their target instruction set may generate bounds checks for every array access, trusting the C compiler to remove as many as possible of them. Macros are another common cause of superfluous source statements that programmers will expect the compiler to optimize away.
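A sketch of what such generated bounds checks might look like (hypothetical generator output; after inlining, the C compiler can prove the check redundant and drop it):

    #include <stdlib.h>

    int a[16];

    /* Hypothetical access helper emitted by a code generator that
       targets C: every load is bounds-checked. */
    static int checked_load(int i)
    {
        if (i < 0 || i >= 16)
            abort();
        return a[i];
    }

    int sum(void)
    {
        int s = 0;
        /* After inlining checked_load, the compiler can see that
           0 <= i < 16 always holds and remove the check. */
        for (int i = 0; i < 16; i++)
            s += checked_load(i);
        return s;
    }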
I think it will be very tricky to detect cases where compilers should be wary of applying 'obvious' optimizations.
I would be pretty miffed if I wrote a, say, Pascal compiler with array bounds checks, and the compiler "optimized" the checks away just because accessing a[i] would be undefined behaviour. OTOH, in a loop like
    for (i=0; i<16; i++) {
        if (i>=0 && i<16)
            do something with a[i]
        else
            report an error;
    }
you can certainly optimize the if to if(1). That's an optimization*.
No, it would not. Compilers can remove boundary checks that provably happen after accessing an item of an array, not those before the array is accessed.
For your example, the Pascal code (at least, I hope this is Pascal; it has been a while since I wrote any):
    if (i >= 0) and (i < 16) then
    begin
        x := a[i]
    end
    else
    begin
        ReportError;
    end
A good Pascal compiler, like a good C compiler, would optimise away that second boundary check and the call to RuntimeAbort. Neither compiler is allowed to optimise away the first check.
The point of my posting was that you don't need "optimizations" to optimize away redundant bounds checks, optimization* is able to do it just fine. Sorry if I did not get that across clearly. What does your "No, it would not" refer to?
Raymond Chen has a series of blog posts (e.g. https://blogs.msdn.microsoft.com/oldnewthing/20140627-00/?p=...) that cover some surprising consequences that undefined behavior can have on program optimization. I'm kind of sympathetic to the point of view that these are abuses of the freedoms granted to the compiler, and that it would be better for it to just generate instructions that match the illegal code. But as others have pointed out, you can generally get the compiler to warn you when it detects something like this, and anyone who uses C and doesn't turn on "treat warnings as errors" gets what they deserve. If you want a language that does runtime bounds checking there are plenty to choose from.
The author doesn't understand the market demands for a compiler:
1) Customers who want their text-size to be as small as possible so they can shave $0.0015 off their BOM by using smaller ROM parts
2) Chip vendors who have 7 figure sales that will fall-through if they can't show a 0.7% higher Dhrystone MIPS (yes Dhrystone, the benchmark that was obsolete 20 years ago).
3) Managers who make buying decisions purely based on a set of their own pet benchmarks, with no regard to anything in this paper.
Damn right a compiler will take advantage of every optimization opportunity afforded by the standard. Good compilers will have ways of being much less aggressive about optimizations; I bet gcc and clang each have some -f flags that will disable certain optimizations (not to mention just good old -O0); I know most commercial compilers do.
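For example (these flags do exist in current gcc and clang as far as I know, but check your compiler's documentation before relying on them):

    -fwrapv                          make signed overflow wrap (defined) instead of UB
    -fno-strict-aliasing             disable type-based aliasing assumptions
    -fno-delete-null-pointer-checks  keep null checks the compiler would otherwise drop
    -O0                              disable optimization altogether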
[edit]
I would like to see some less-optimizable C standard catch on, so that there could be a mode common to multiple compilers that makes programmers happy. Unfortunately attempts to create such a dialect have failed to get off the ground.
[edit2]
Part of this is because there is less of a consensus on what are optimizations* and "optimizations" than the author would believe.
As for the market demands on compilers, I hope that this paper influences that in more reasonable directions. In the meantime, I have nothing against having an option (or more) for "optimizations". Enabling new "optimizations" by default however has the effect of miscompiling existing programs and is not such a good idea. At the very least there should be one flag for disabling "optimizations" without disabling optimizations.
Concerning what are optimizations: If the observable behaviour does not change (even when there is undefined behaviour), it's an optimization. Also, register allocation and rearranging stuff in memory is an optimization (because programs that rely on that not changing break on maintenance). Other transformations are, at best, "optimizations". If you have any questions about specific transformations, please ask.
Optimizations based on undefined behavior are actually based on the belief that the code in question doesn't have undefined behavior (that the programmer is taking care of any possibility of undefined behavior, where that possibility exists).
This assumption should be made only if the programmer who wrote that code is a machine: a machine translating something which itself doesn't have undefined behavior, into C that doesn't have undefined behavior.
That assumption should not be made when the programmer is a person. Or, if the assumption is used, it should be accompanied by a diagnostic. "I'm expressing what appears to be your intent using a more efficient approach, which breaks in a surprising way if you made a mistake."
For instance, consider
int compare(int x, unsigned char y)
{
return x + y > x;
}
This is always true, or else the behavior is undefined. So it can be replaced with:
int compare(int x, unsigned char y)
{
(void) x; /* somewhat accepted de facto idiom for ignoring arguments */
(void) y;
return 1;
}
And GCC does turn it into "return 1". But why the hell would a human write a complicated function in place of just constant 1?
If a machine generated this due to following a template of code, that is understandable, and making it return 1 is helpful --- assuming that the generator is carefully written to avoid relying on undefined behavior.
Even if the generator actually relies on undefined behavior, the issue can be fixed in the generator, which can be run to regenerate all the code; but there is no such easy band-aid for human-written code. (A code generator can even have a catalog of what version of what compiler brand supports what undefined behavior in what way, and tune its output accordingly to what compiler is used in the subsequent pass.)
What is unhelpful on the part of compiler writers is supporting some undefined behavior as a de facto extension for many years, and then backpedaling on it.
One day your code that depends on signed integers having two's complement "odometer wraparound semantics" stops working after years or decades because of some compiler cowboy who was in millennial kindergarten when that code was written.
What I think many people fail to realize is that a large part of the code, as a C compiler sees it, is machine-generated. While I certainly do not tend to literally write out `if (1 != 0)` conditionals in my code, these and other similarly trivial expressions will be introduced en masse after macro expansion and aggressive inlining.
And in this context of macro-expanded, inlined code, exploiting the assumption that no undefined behavior occurs starts making a lot more sense. Sure, if you consider something like
    int x = *ptr;
    if (ptr != NULL) {
        *ptr = 1;
    }
you may not unreasonably say that, as there is an explicit ptr != NULL check in there, probably that's what the programmer intended -- so we should not replace the ptr != NULL check with a constant true just because that pointer was already dereferenced.
However if the real code is
    int x = *ptr;
    do_something_with_ptr(ptr);
and the do_something_with_ptr() function just happens to contain a NULL-check at some point (because it is written for a more generic purpose and not use in just this place), then not dropping that NULL-check becomes a very significant missed optimization opportunity.
If the compiler wouldn't optimize that out of consideration for the possibility of the former hand-written code, that would incur costs on the programmer. He might have to split do_something_with_ptr() into two functions, one doing NULL checks and one not doing it. He may have to add explicit compiler assumption / unreachability annotations.
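For what it's worth, here is a sketch of what such an annotation could look like (using __builtin_unreachable(), a GCC/Clang extension; other compilers spell the idea differently, e.g. MSVC's __assume):

    /* Hypothetical "assume" helper built on a GCC/Clang extension. */
    static inline void assume(int cond)
    {
        if (!cond)
            __builtin_unreachable();
    }

    void do_something_with_ptr(int *ptr);  /* generic helper with its own NULL check */

    void caller(int *ptr)
    {
        assume(ptr != NULL);        /* lets the helper's NULL check be dropped
                                       after inlining, without relying on the
                                       compiler reasoning backward from *ptr */
        do_something_with_ptr(ptr);
    }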
The whole argument goes both ways: performing optimizations that break broken code has a cost (people have to fix it), and not performing those optimizations for working code also has a cost (people have to manually optimize it).
Out of all the ridiculous claims in that .pdf, many of them pointed out by others, I find this especially entertaining:
But do “optimizations” actually produce speedups for benchmarks? Despite frequent claims by compiler maintainers that they do, they rarely present numbers to support these claims. E.g., Chris Lattner (from Clang) wrote a three-part blog posting about undefined behavior, with most of the first part devoted to “optimizations”, yet does not provide any speedup numbers. On the GCC side, when asked for numbers, one developer presented numbers he had from some unnamed source from IBM's XLC compiler, not GCC; these numbers show a speedup factor 1.24 for SPECint from assuming that signed overflow does not happen (i.e., corresponding to the difference between -fwrapv and the default on GCC and Clang).
Fortunately, Wang et al. [WCC+12] performed their own experiments compiling SPECint 2006 for AMD64 with both gcc-4.7 and clang-3.1 with default “optimizations” and with those “optimizations” disabled that they could identify, and running the results on a Core i7-980. They found speed differences on two out of the twelve SPECint benchmarks: 456.hmmer exhibits a speedup by 1.072 (GCC) or 1.09 (Clang) from assuming that signed overflow does not happen. For 462.libquantum there is a speedup by 1.063 (GCC) or 1.118 (Clang) from assuming that pointers to different types don't alias. If the other benchmarks don't show a speed difference, this is an overall SPECint improvement by a factor 1.011 (GCC) or 1.017 (Clang) from “optimizations”.
I mean, the speed-up of 7.2%, 9%, 6.3%, 11.8% for specific cases is huge. It will make real time and money difference for people.
He is advocating against having that just because he thinks what signed integer overflow should do is obvious. Entitlement, ignorance, incompetence; I am not sure which one it is. It takes 1 minute to figure out that signed integer overflow is undefined and act accordingly.
Developer time is a zero-sum game. If one of these optimizations bites me and I spend one day figuring out what happened, I don't spend one day doing optimization somewhere else, where a much greater speedup can be achieved. Or, to put it in your money-difference terms, I could be adding value to the program somewhere else, adding or polishing features.
It's a balancing game. Looking at it through the lens of % speedup, ignoring everything else is just stupid. In a perfect world we'd make decisions based on numbers. He provided his. Where are yours?
I half expected the document to present statistical evidence about how programmers tend to behave, and use that to generate advice for compiler writers.
The article is a nice read (and doesn't present that kind of statistical evidence), but I'm still wondering, has anyone done a significant study of how programmers work? I would be really fascinated to read an analysis of, say, data taken from full-time software engineers at a given company where all of their actions are recorded (key-strokes, compiler runs, error messages, etc.). Similar to how online competitive gamers have their actions recorded to the microsecond to identify weak points.
It would even be interesting to know, e.g., what is the net average number of lines of code created/deleted per day?
The average doesn't matter; specific problems are reported on the mailing lists, for example. You wouldn't sway everyone's writing style with some statistical evidence, not in a giant ecosystem like C/unix, and it's not a democracy either where 51% would just ignore the other 49%'s existence.
What? "tested and production" programs might be conforming according to the C standard, but that's only because it's a largely useless term: [a] conforming program is one that is acceptable to a conforming implementation. [C99, 4]
If the implementation is conforming (which might not be the case) and has accepted the program (which could be chock full of cruft), then the program is conforming. The "conforming" designation simply doesn't assure us of any measure of quality, other than that the program isn't so bad that it doesn't compile and link (with that one toolchain which produced its executable).
If the program is tested and in production using a compiler that is operated in a nonconforming mode (often the default on many compilers), then acceptance of the program doesn't speak to its conformance at all.
This kind of twaddle in the abstract is a very bad way to start out a paper. (And I'm one of the "converted" people; you don't have to work hard to convince me about the dangers of optimizing based on "no UB" assumptions, without a single diagnostic squawk.)
"Conforming program" may be a largely useless term, but it's one of the two conformance levels for programs that the C standard defines. The other is "strictly conforming program", and it does not include any terminating C program, so it's also a largely useless term.
Now C compiler maintainers justify "optimizations" with the C standard, and indeed, a compiler that behaves like that is a "conforming implementation" (the only conformance level that the C standard defines for compilers). Is that a useful term?
Yes, we should be exploring the fringes of the C standard (in particular, "optimizations") less, because that is not useful territory; instead, we should orient ourselves by looking at the C programs that are out there.
But anyway, my thinking when I wrote that was: if somebody comes at you with the frequent stance of "your program is non-standard crap full of bugs", when it worked as intended with all previous versions of the compiler, you can shoot back at them with "it's a conforming program". Of course, if the other side is a language lawyer, that approach will not help, as you demonstrate, but if not, maybe you can get the other side to take a more productive stance.
"Strictly conforming program" is a set that certainly includes terminating programs; I think you mean that it has a shortcoming because it doesn't exclude non-terminating programs? But a program that keeps running forever, servicing commands entered by a user, is nonterminating, yet possibly correct and useful. Nontermination doesn't constitute misuse of the language as such.
> you can shoot back at them with "it's a conforming program"
Not really; retorting with a useless term from ISO C doesn't help in any way. You can only effectively shoot back by demonstrating that all the claims that the program is buggy are rooted in constructs and uses which are in fact defined on all the platforms which the program explicitly supports. (Just not ISO-C-defined.)
If you're doing something that isn't defined by ISO C, and isn't documented by your compiler or library either, then you may in fact have a real problem.
But portability to platforms which are not specified for the program is a specification issue, not a coding issue. If the specification is broadened to encompass those platforms, then it becomes a coding issue.
I wouldn't listen to anyone who complains that, say, some program of mine only works on two's complement machines, not ones with sign-magnitude integers. My response would not even be "patches welcome" (a patch to "fix" that would definitely not be welcome).
On the other hand, merely calling a function which isn't in the C program and isn't in ISO C is undefined behavior, as is including a nonstandard header file like #include <unistd.h>. We can make a conforming implementation in which #include <unistd.h> and a call to fork() reformats the disk.
Simply accusing a program of undefined behavior isn't damning in and of itself; hardly any useful program can escape that blame.
Basically, I can out-language-lawyer anyone who criticizes my code from that perspective, so I'm safe. :)
"Strictly conforming program" excludes implementation-defined behaviour, and AFAICS all ways to terminate a C program are implementation-defined behaviour, so terminating C programs are not strictly conforming.
"C" is actually a little bit wider than "strictly conforming", because it includes implementation-defined behaviour (I did not know that when I wrote the paper).
So "C" does not actually correspond to a conformance level defined in the C standard. So while the "C" advocates like to support their stance with language lawyering, they actually pick those pieces of the C standard that suit them and ignore the others. That is fine with me, but if the standard is not the guideline, what is? For me it is the corpus of existing working programs; "C" advocates seem to be more interested in benchmarks.
The diagram on page 13 has linear interpolation plotted between the datapoints. That's one step away from fitting an arbitrary polynomial to the points. Don't do it; implying there are measurements between point releases isn't very sensible.