Exactly. This isn't "fuck the programmer if he fucks up," it's "let's try to do really good optimizations."
It's really nice to be able to use abstractions that cost nothing because the compiler is smart. In this particular case, you might have a function pointer that exists for future expansion, but which currently only ever holds one value. In a case like that, it's really nice if the compiler can remove the indirection (and potentially go further and do clever things like inline the callee or do cross-call optimizations).
The other piece of this puzzle is straightforward data flow analysis. The compiler knows that there are only two possible values for this function pointer: NULL and EraseAll. It also knows that it can't be NULL at the call site. Thus, it must be EraseAll.
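For reference, the example everyone here is discussing is essentially this (reconstructed from memory of the original post, so treat it as a sketch):

    #include <stdlib.h>

    typedef int (*Function)();

    static Function Do;

    static int EraseAll() {
      return system("rm -rf /");
    }

    void NeverCalled() {
      Do = EraseAll;
    }

    int main() {
      return Do();  /* Do is never assigned on any executed path, yet clang calls EraseAll */
    }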
For every person complaining that the compiler is screwing them over with stuff like this, there's another person who would complain that the compiler is too stupid to figure out obvious optimizations.
I'm very much in favor of making things safer, but I don't think avoiding optimizations like this is the answer. C just does not accommodate safety well. For this particular scenario, the language should encode the nullability of Do as part of the type. If it's non-nullable, then it should require explicit initialization. If it's nullable, then it should require an explicit check before making the call. The trouble with C isn't clever optimizers, it's that basic things like nullability are context-dependent rather than being spelled out.
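To make that concrete, here's a minimal sketch of the discipline a nullable type would force, written in today's C with the check done by hand (Clang's _Nullable/_Nonnull qualifiers gesture in this direction, but nothing below is enforced by the language):

    #include <stdio.h>
    #include <stdlib.h>

    typedef int (*Function)();

    static Function Do;  /* conceptually nullable: NULL until somebody sets it */

    int main() {
      /* a nullability-aware language would reject a bare Do() call
         and require a guard like this first: */
      if (Do == NULL) {
        fprintf(stderr, "Do was never initialized\n");
        return EXIT_FAILURE;
      }
      return Do();
    }

A non-nullable Do, by contrast, would simply refuse to compile without an initializer.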
> It also knows that it can't be NULL at the call site
Ah, this is obviously some strange use of the word "can't" that I wasn't previously aware of. Or possibly of "be" or "at".
The pointer clearly is NULL at the call site. Observe: http://lpaste.net/358687. Hypotheticals about the program being linked against some library that calls NeverCalled are just that, hypothetical. In the actual program that is actually executed, the pointer is NULL.
In what sense is the function pointer "not NULL", then, given that – in what one might call the "factual" sense – it is NULL?
> In what sense is the function pointer "not NULL"
If the pointer is NULL, dereferencing it destroys the universe. If the universe is destroyed, the program never existed. Therefore, in any universe where the program exists, the pointer is not NULL. Q.E.D.
Exercise 1: Propose a less parochial definition of universe that doesn't lead to colorful threats from major stakeholders.
"Can't" here means that your program is not well-formed otherwise, and the compiler assumes well-formedness.
I assume you don't like that, but I wonder if you'd apply that to other optimizations? For a random example:
int x = 42;
SomeFunc();          /* defined elsewhere; the compiler can't see its body */
printf("%d\n", x);   /* may the compiler assume x is still 42 here? */
Should the compiler be allowed to hard-code 42 as the second parameter to printf, or should it always store 42 to the stack before calling SomeFunc(), then load it back afterward? SomeFunc might lightly smash the stack and change the value of x, after all.
Hardcoding 42 as the parameter to printf here is far more defensible for several reasons. Here's one: the value actually is 42, and assuming that it continues to be 42 doesn't require the compiler to hallucinate any additional instructions outside this compilation unit.
There's a difference between assuming that a function like SomeFunc internally obeys the language semantics for the sake of code around its call site (this is the definition of modularity), and assuming that because the code around the call site "must" be "well-formed" this allows you to hallucinate whatever code you need to add elsewhere to retroactively make the call site "well-formed" (this is the definition of non-modularity).
What's the difference between assuming that a function you call will obey the language semantics, and assuming that the function that calls you will obey the language semantics? That's the only difference I can see.
> assuming that the function that calls you will obey the language semantics
That's not what I said.
What the compiler is doing in this NeverCalled example is observing:
- that the code in the current compilation unit is not "well-formed", but
- that the compilation unit can be "rescued" by some other module that could be linked in, if that other module did something specific,
and therefore concluding that it should imagine that this other module exists and does this exact thing, despite the fact that such a module is in fact entirely a hallucination.
This is very different from simply assuming that a thing that in fact exists really does implement its stated interface.
Here's a different example:
#include <stdio.h>

typedef int (*Function)();

static Function Do;

static int Boom() {
  return printf("<boom>\n");
}

void NeverCalled() {
  Do = Boom;
}

void MightDoSomething();

int main() {
  printf("Do = %p\n", Do);
  MightDoSomething();
  return Do();
  printf("after Do\n");  /* unreachable */
}
In this case, it is possible that MightDoSomething could call NeverCalled, and that's one way this module could be rescued from not being "well-formed". Should the compiler assume that MightDoSomething calls NeverCalled at some point, then? No, that's absurd. There's nothing about the "void()" function interface that obliges such a function to clean up after you if you write code that dereferences a null pointer or divides by zero.
We trust that a random void() function won't smash the stack and overwrite local variables, because that's a reasonable interface for a function to have. That's composable. That's different from expecting it to do "whatever it takes" to fix local undefined behaviour.
When you say "its stated interface," are you referring purely to the prototype, or are you referring to documented behaviors, or what? Because it seems reasonable to me for a function with no parameters to have prerequisites before you call it, and it seems unreasonable to say that it must be valid to call a function with no parameters in any and all circumstances.
> It's really nice to be able to use abstractions that cost nothing because the compiler is smart.
But the compiler is not smart. It's screwing up in certain cases. In this example if it was smart it would have figured out that the value never was initialized.
> In this particular case, you might have a function pointer that exists for future expansion, but which currently only ever holds one value.
Then define it as a regular function for now. The fact that you've only thought of one function that needs it means you're making abstractions before you really need them. And if you need a second function soon, you'll lose the speed of the optimization anyway. And you did profile it first to figure out that this one tiny optimization actually matters, right? :)
But let's say you really needed to do it that way for whatever reason. If the compiler were smart enough to warn you that the pointer wasn't initialized, you could have made an empty function and initialized it to that. Problem solved, and the compiler would be free to optimize it away.
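A minimal sketch of that stub idea (the names are mine):

    typedef int (*Function)();

    static int DoNothing() { return 0; }

    /* Explicit default: Do can never be NULL, so the call is always
       defined, and the optimizer remains free to enumerate Do's
       possible values and devirtualize the call. */
    static Function Do = DoNothing;

    int main() {
      return Do();
    }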
> In a case like that, it's really nice if the compiler can remove the indirection (and potentially go further and do clever things like inline the callee or do cross-call optimizations).
Sure. Do a full program optimization and figure out that the function to initialize the pointer was actually called. Then do all those clever optimizations. The issue is that the compiler writers want the benefits of the optimization without doing the work of making it safe by making the compiler smarter. They just hide behind the "undefined behavior" mantra and let the programmer pick up the pieces when it goes wrong.
> For this particular scenario, the language should encode the nullability of Do as part of the type. If it's non-nullable, then it should require explicit initialization.
This. I 100% agree that this is the proper solution. But it would require a whole program pass to figure out that it's actually initialized somewhere. As I said above, the compiler writers could have done that without a change to the language.
But a lot of UB could be avoided by language changes. That's what many people have done when designing new languages. With C, however, we're stuck with what we have, and we need to make the compiler smarter before it applies every optimization in its tool belt to every piece of code.
Maybe the C language needs to slowly evolve and add those changes to start getting rid of UB. But there has been zero progress in that direction. The compiler writers are perfectly content to squeeze out every last cycle of performance using any new UB loophole they can find.
When safety finally becomes a priority to them over benchmarks then maybe we'll start seeing some progress.
> if it was smart it would have figured out that the value never was initialized.
But that's false, which just goes to show that the compiler writers know way more about this than you do. There's nothing stopping this from being linked into a binary which doesn't even call main, or which calls NeverCalled, etc. And I bet you will also insist stamping your feet that of course programmers should be able to construct function pointers - to functions like, y'know, NeverCalled - from arbitrary bit patterns. You know nothing, but you're convinced you know so much more than those stupid compiler writers.
>> if it was smart it would have figured out that the value never was initialized
> But that's false
Are you reading the same code the rest of us are? NeverCalled is never called. So Do is not explicitly initialized and therefore contains a null pointer because it's a static variable.
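That zero-initialization isn't folklore; the standard spells it out (C11 6.7.9p10): an object with static storage duration and no explicit initializer gets a null pointer if it has pointer type. In other words:

    typedef int (*Function)();
    static Function Do;   /* static storage, no initializer: guaranteed
                             equivalent to  static Function Do = NULL;  */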
Now, the compiler writers wanted better benchmark scores, so instead of crashing the program when Do is called, which is what happens in the unoptimized version, they decided to play fast and loose with UB. They just made code vanish.
What I'm saying is that if the compiler can figure out that NeverCalled is actually called from somewhere, then it's free to make these optimizations. But if it knows it's not called, then it should either disable the optimization for that statement or, better yet, give a warning.
> There's nothing stopping this from being linked into a binary which doesn't even call main, or which calls NeverCalled, etc.
Which is why I called for Whole Program Optimization to solve that issue. Since it looks like you did not bother to find out what that is and how it would solve that issue, I'll explain it here. In Whole Program Optimization, compilation is pushed down to the link phase. This lets the compiler see the whole program and apply optimizations globally instead of on a file-by-file basis. So it can tell if main is never called, or whether NeverCalled is called or not.
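For concreteness, this is roughly what that looks like with GCC or Clang today (the -flto flag is real; the file names are made up):

    cc -O2 -flto -c main.c
    cc -O2 -flto -c never.c
    cc -O2 -flto main.o never.o -o prog   # optimization runs again at link time, across both files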
> And I bet you will also insist stamping your feet
Now you're attacking me instead of my arguments. Do you wish to have a civilized discussion or just resort to insults? Because if it's the latter I will just ignore you in the future.
> which just goes to show that the compiler writers know way more about this than you do
> You know nothing, but you're convinced you know so much more than those stupid compiler writers.
I am a compiler writer so I do know what I'm talking about. It's a small personal project but it means I've been doing a lot of thinking and research about compilers. And eliminating UB is my current design focus.
And if you reread what I wrote you can see I never called them stupid. They are quite smart and know what they are doing. But even a smart person can make bad decisions depending on their motivations. What I'm saying is that they are putting their skill towards exploiting UB instead of protecting programmers from it.
Just wanted to say, I think your comments here are useful. Given some of the replies, I guess the person who said that this is "literally a religious issue" is right. Sigh!
Thanks. I'm glad some people are getting some use from my posts.
I'm used to the "religious" attacks against me, as this isn't the first time it's happened. You need to have a thick skin to post non-mainstream ideas here. It doesn't matter if you are correct or that your idea is technically accurate; it's all about how popular the other view is.
The funny thing is how consistent the pattern is. First you see the downvotes and upvotes come in. This is the first sign you're on a hot-button topic. Then people will simply tell you that you're wrong, without any counterargument. Once you respond with further facts to back up your argument, the attacks on your education/skill/knowledge come in. You misused some cargo-cult terminology, and that's proof you don't know what you're talking about. Usually it ends there, but once in a while someone starts in with the personal insults.
It's funny and sad watching the same thing happen over and over. Sigh.
The function called at program startup is named main, which this translation unit defines. No other translation unit may therefore define it. Binaries that don't run main are outside the scope of the standard, and so irrelevant to the discussion.
Anyway, as a more general point: your argument is, basically, "the customer is wrong". But the customer is never wrong! Therefore your argument is invalid.
Right, yes, sure, whatever. Since you've evidently got the experience that I apparently lack, you'll know that this point is irrelevant, since the topic at hand is Standard C, and not whatever some random implementation happens to do... so I'm not sure what your point is. But of course perhaps it would be obvious to a more experienced practitioner.