Hacker News
LLVM Is Smarter Than Me (sulami.xyz)
44 points by nopipeline on April 21, 2024 | 17 comments



This is, of course, a hyper-specific optimization that is nearly useless in practice.

Converting an awful algorithm into a superior form (from O(N) to O(1)) is a neat trick, but it can obviously only apply if:

- The algorithm can be easily detected by the compiler, and there is actually a replacement algorithm that does exactly the same thing within defined behavior. These cases are very rare.

- The programmer is unable or unwilling to identify the optimization, which in most cases means they do not understand their algorithm well or do not care about performance.

Just as an example, this substitution CANNOT be applied if you replace ints with floats. The O(1) algorithm is of course still generally preferable in that case, but a compiler will never help you with that if you use floats.


Actually, that was the reason Compiler Explorer was created in the first place: Matt Godbolt wanted an easy way to settle the usual "I know better than the compiler" discussions around using C++ abstractions.


Does anyone know how the compiler comes up with the closed-form solution? Is there a hardcoded list of common patterns and their solutions in the compiler, or is it really derived from the code itself?


I wondered the same thing and found this to be interesting: https://stackoverflow.com/questions/74417624/how-does-clang-...


It's more like a hundred matched patterns, with more added once in a while.


Yeah, one suspects this is a dumb micro-optimisation to make benchmark figures look better with little practical application.


Doing micro optimizations is precisely the compiler's job - so I don't have to worry about it.


Yeah, but all micro optimisations come with costs, from compiler time to maintenance overhead. Someone's still got to ask if it's worth it. And in this case, I just can't see many regular programs containing code likely to trip this, but benchmarks? All the time.


A small correction: the sigma summation should go to n-1, not n.

It's weird that LLVM computes n(n-1)/2 in such a convoluted way, namely:

(n-1)(n-2)/2 + n - 1


It has to. Just imagine what happens when in the iterative algorithm

count = count + i

is just short of overflowing. In that case the O(N) algorithm has no overflow, but look at the closed-form solution: surely

(count)*(count-1)

would overflow.

A compiler must never be allowed to swap a non-overflowing computation for an overflowing one, which is one big limitation of these optimizations.


Sure, but in the scenario you describe, (count-1)*(count-2) also overflows, and that's why the generated code already widens to 64 bits for the multiply. (See my cousin comment to this one.)


IIRC that way of doing the summation is specifically done to avoid issues with overflow.


I feel like you're probably right, but I can't figure out the exact reason.

The multiplication of (n-1)(n-2) already expands to 64 bits. It has to: Even when the loop accumulator doesn't overflow when counting up to n(n-1)/2 in the abstract machine, (n-1)(n-2) will be larger (and possibly overflow 32 bits) for most n. (The highest valid n is 92682, I think.)

So surely once you're doing that, I'd think n(n-1)/2 can be done in 64 bits just as easily.

    unsigned int sum2(unsigned int n) {
        unsigned long ln = n;
        return ln * (ln - 1) / 2;
    }

    a.out <+0>:  pushq  %rbp
    a.out <+1>:  movq   %rsp, %rbp
    a.out <+4>:  movl   %edi, %ecx
    a.out <+6>:  leaq   -0x1(%rcx), %rax
    a.out <+10>: imulq  %rcx, %rax
    a.out <+14>: shrq   %rax
    a.out <+17>: popq   %rbp
    a.out <+18>: retq



Check out the optimizations LLVM does using GVN and SCEV; it blows my mind every time.


The sum of i for i = 0..n is n(n+1)/2, not n(n-1)/2.


Oh, good catch. Corrected.




