This depends entirely on compiler support. Intel's ICX compiler can easily vectorize a sigmoid loop by calling SVML's vectorized expf function: https://godbolt.org/z/no6zhYGK6

If you implement a scalar expf in a vectorizer-friendly way, and it's visible to the compiler, then it could also be vectorized: https://godbolt.org/z/zxTn8hbEe
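
Something like this rough sketch is the kind of loop being discussed (not necessarily the exact code in the godbolt links):

    /* Scalar sigmoid over an array; auto-vectorizable if the compiler has a
       vectorized expf (e.g. SVML) or can inline a SIMD-friendly scalar one. */
    #include <math.h>

    void sigmoid(const float *restrict in, float *restrict out, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = 1.0f / (1.0f + expf(-in[i]));
        }
    }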




gcc and clang are also capable of it, given certain compiler flags: https://godbolt.org/z/z766hc64n


Thanks, I didn't know about this. Interesting that it seems to require fast-math.


That means it’ll never be used. -ffast-math is verboten in most serious codebases.


Seems "-fno-math-errno" is enough for clang. gcc needs a whole "-fno-math-errno -funsafe-math-optimizations -ffinite-math-only".


So is the optimization "wrong" or "unsafe" even in the case of Intel's ICX compiler? Is it that you can't express the right (error) semantics in C?

I'm just wondering why those two require the flag and the other doesn't.


ICX appears to just default to fast-math: https://godbolt.org/z/jzPazGjoh

Requiring -fno-math-errno is sane enough; essentially no one needs math errno anyway (and that flag is needed to vectorize even sqrt, for which there's a proper hardware SIMD instruction, but which obviously doesn't set errno on a negative input or whatever).
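
To make that concrete, a minimal sketch (exact codegen of course depends on the compiler and flags):

    /* Without -fno-math-errno the compiler must assume sqrtf can set errno on
       negative inputs, so it keeps the scalar libm call path; with the flag it
       can use the hardware SIMD sqrt instruction and vectorize the loop. */
    #include <math.h>

    void sqrt_all(const float *restrict in, float *restrict out, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = sqrtf(in[i]);
        }
    }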

Whether near-infinity or inf/NaN values are handled properly probably depends on the vector math library used. And there's also the potential concern that the scalar and vector exp() likely give different results, leading to weird behavior, which might be the justification for -funsafe-math-optimizations.


I’m not sure what “suited to SIMD” means exactly in this context. I mean, it is clearly possible for a compiler to apply some SIMD optimizations. But the program is essentially expressed as a sequential thing, and then the compiler discovers the SIMD potential. Of course, we write programs that we hope will make it easy to discover that potential. But it can be difficult to reason about how a compiler is going to optimize, for anything other than a simple loop.


Suited for SIMD means you write the scalar equivalent of what you'd do on a single element in a SIMD implementation.

E.g. you avoid lookup tables when you can, or only use small ones you know fit in one or two SIMD registers. gcc and clang can't vectorize it as is, but they do if you remove the branches that handle infinity and over/under-flow.

In the godbolt link I copied the musl expf implementation, and ICX was able to vectorize it even though it uses a LUT too large for SIMD registers.
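
For illustration, a SIMD-friendly scalar expf in that style might look roughly like this (a sketch only, not the code in the godbolt link: the coefficients are plain Taylor terms for 2^f, and infinity/overflow/underflow handling is deliberately omitted, which is exactly the trade-off described above):

    #include <math.h>
    #include <stdint.h>

    /* No lookup table, no special-case branches; accuracy is only a few
       decimal digits since the coefficients are Taylor terms, not minimax. */
    static inline float expf_simd_friendly(float x) {
        float t = x * 1.4426950409f;   /* x / ln(2) */
        float k = floorf(t);           /* integer part: the power-of-two exponent */
        float f = t - k;               /* fractional part in [0, 1) */
        /* polynomial approximation of 2^f */
        float p = 1.0f + f * (0.69314718f + f * (0.24022651f
                + f * (0.05550411f + f * (0.00961813f + f * 0.00133336f))));
        /* build 2^k from exponent bits instead of calling ldexpf */
        union { uint32_t u; float f32; } s;
        s.u = (uint32_t)((int32_t)k + 127) << 23;
        return p * s.f32;
    }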

#pragma omp simd and equivalents will encourage the compiler to vectorize a specific loop and produce a warning if a loop isn't vectorized.
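
For example (a sketch; build with -fopenmp-simd or -fopenmp in gcc/clang):

    #include <math.h>

    void apply_exp(const float *restrict in, float *restrict out, int n) {
        /* asks the compiler to vectorize this specific loop */
        #pragma omp simd
        for (int i = 0; i < n; i++) {
            out[i] = expf(-in[i]);
        }
    }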


I shouldn’t have started my comment with the sort of implied question or note of confusion. Sorry, that was unclear communication.

I agree that it is possible to write some C programs that some compilers will be able to discover the parallel potential of. But it isn’t very ergonomic or dependable. So, I think this is not a strong counter-argument to the theory of the blog post. It is possible to write SIMD friendly C, but often it is easier for the programmer to fall back to intrinsics to express their intent.


It means auto-vectorization: write scalar code that the compiler can automatically vectorize using SIMD instructions.



