Hacker News
LLVM Is Smarter Than Me (sulami.xyz)
44 points by nopipeline on April 21, 2024 | 17 comments



This is, of course, a hyper-specific optimization that is nearly useless in practice.

Converting an awful algorithm into a superior form (from O(N) to O(1)) is a neat trick, but it can obviously only apply if:

- The algorithm can be easily detected by the compiler, and there is actually a replacement algorithm that does exactly the same thing within defined behavior. These cases are very rare.

- The programmer is unable or unwilling to identify the optimization, which in most cases means they do not understand their algorithm well or do not care about performance.

Just as an example, this substitution CANNOT be applied if you replace ints with floats. The O(1) algorithm is of course still generally preferable in that case, but a compiler will never help you with that if you use floats.


Actually, that was the reason Compiler Explorer was created in the first place: Matt Godbolt wanted an easy way to settle the usual "I know better than the compiler" discussions around using C++ abstractions.


Does anyone know how the compiler comes up with the closed-form solution? Is there a hardcoded list of common patterns and their solutions in the compiler, or is it really derived from the code itself?


I wondered the same thing and found this to be interesting: https://stackoverflow.com/questions/74417624/how-does-clang-...


It's more like a hundred matched patterns, with more added once in a while.


Yeah, one suspects this is a dumb micro-optimisation to make benchmark figures look better with little practical application.


Doing micro optimizations is precisely the compiler's job - so I don't have to worry about it.


Yeah, but all micro optimisations come with costs, from compiler time to maintenance overhead. Someone's still got to ask if it's worth it. And in this case, I just can't see many regular programs containing code likely to trip this, but benchmarks? All the time.


A small correction: the sigma summation should go to n-1, not n.

It's weird that LLVM computes n(n-1)/2 in such a convoluted way, namely:

(n-1)(n-2)/2 + n - 1


It has to. Just imagine what happens when in the iterative algorithm

count = count + i

is just short of overflowing. In that case the O(N) algorithm has no overflow, but look at the closed-form solution: surely

(count)*(count-1)

would overflow.

A compiler must never be allowed to swap a non-overflowing computation for an overflowing one, which is one big limitation of these optimizations.


Sure, but in the scenario you describe, (count-1)*(count-2) also overflows, and that's why the generated code already widens to 64 bits for the multiply. (See my cousin comment to this one.)


IIRC that way of doing the summation is specifically done to avoid issues with overflow.


I feel like you're probably right, but I can't figure out the exact reason.

The multiplication of (n-1)(n-2) already expands to 64 bits. It has to: Even when the loop accumulator doesn't overflow when counting up to n(n-1)/2 in the abstract machine, (n-1)(n-2) will be larger (and possibly overflow 32 bits) for most n. (The highest valid n is 92682, I think.)

So surely once you're doing that, I'd think n(n-1)/2 can be done in 64 bits just as easily.

    unsigned int sum2(unsigned int n) {
        unsigned long ln = n;
        return ln * (ln - 1) / 2;
    }

    a.out <+0>:  pushq  %rbp
    a.out <+1>:  movq   %rsp, %rbp
    a.out <+4>:  movl   %edi, %ecx
    a.out <+6>:  leaq   -0x1(%rcx), %rax
    a.out <+10>: imulq  %rcx, %rax
    a.out <+14>: shrq   %rax
    a.out <+17>: popq   %rbp
    a.out <+18>: retq



Check out the optimizations LLVM does using GVN and SCEV; it blows my mind every time.


The sum of i for i = 0..n is n(n+1)/2, not n(n-1)/2.


Oh, good catch. Corrected.




