This depends entirely on compiler support. Intel's ICX compiler can easily vectorize a sigmoid loop by calling SVML's vectorized expf function: https://godbolt.org/z/no6zhYGK6

If you implement a scalar expf in a vectorizer-friendly way, and it's visible to the compiler, then it could also be vectorized: https://godbolt.org/z/zxTn8hbEe
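
Something like this rough sketch is the kind of loop being discussed (not necessarily the exact code in the godbolt links):

    /* Scalar sigmoid over an array; auto-vectorizable if the compiler has a
       vectorized expf (e.g. SVML) or can inline a SIMD-friendly scalar one. */
    #include <math.h>

    void sigmoid(const float *restrict in, float *restrict out, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = 1.0f / (1.0f + expf(-in[i]));
        }
    }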




gcc and clang are also capable of it, given certain compiler flags: https://godbolt.org/z/z766hc64n


Thanks, I didn't know about this. Interesting that it seems to require fast-math.


That means it’ll never be used. -ffast-math is verboten in most serious codebases.


Seems "-fno-math-errno" is enough for clang. gcc needs a whole "-fno-math-errno -funsafe-math-optimizations -ffinite-math-only".


So is the optimization "wrong" or "unsafe" even in the case of Intel's ICX compiler? Is it that you can't express the right (error) semantics in C?

I'm just wondering why those two require the flag and the other doesn't.


ICX appears to just default to fast-math: https://godbolt.org/z/jzPazGjoh

Requiring -fno-math-errno is sane enough; essentially no one needs math errno anyway (and that flag is needed to vectorize even sqrt, for which there's a proper hardware SIMD instruction, but which obviously doesn't set errno on a negative input or whatever).
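
To make that concrete, a minimal sketch (exact codegen of course depends on the compiler and flags):

    /* Without -fno-math-errno the compiler must assume sqrtf can set errno on
       negative inputs, so it keeps the scalar libm call path; with the flag it
       can use the hardware SIMD sqrt instruction and vectorize the loop. */
    #include <math.h>

    void sqrt_all(const float *restrict in, float *restrict out, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = sqrtf(in[i]);
        }
    }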

Whether near-infinity or inf/NaN values are handled properly probably depends on the vector math library used. And there's also the potential concern that the scalar and vector exp() likely give different results, leading to weird behavior, which might be the justification for -funsafe-math-optimizations.


I’m not sure what “suited to SIMD” means exactly in this context. I mean, it is clearly possible for a compiler to apply some SIMD optimizations. But the program is essentially expressed as a sequential thing, and then the compiler discovers the SIMD potential. Of course, we write programs that we hope will make it easy to discover that potential. But it can be difficult to reason about how a compiler is going to optimize, for anything other than a simple loop.


Suited for SIMD means you write the scalar equivalent of what you'd do on a single element in a SIMD implementation.

E.g. you avoid lookup tables when you can, or only use small ones you know fit in one or two SIMD registers. gcc and clang can't vectorize it as is, but they do if you remove the branches that handle infinity and over/under-flow.

In the godbolt link I copied the musl expf implementation, and ICX was able to vectorize it even though it uses a LUT too large for SIMD registers.
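
For illustration, a SIMD-friendly scalar expf in that style might look roughly like this (a sketch only, not the code in the godbolt link: the coefficients are plain Taylor terms for 2^f, and infinity/overflow/underflow handling is deliberately omitted, which is exactly the trade-off described above):

    #include <math.h>
    #include <stdint.h>

    /* No lookup table, no special-case branches; accuracy is only a few
       decimal digits since the coefficients are Taylor terms, not minimax. */
    static inline float expf_simd_friendly(float x) {
        float t = x * 1.4426950409f;   /* x / ln(2) */
        float k = floorf(t);           /* integer part: the power-of-two exponent */
        float f = t - k;               /* fractional part in [0, 1) */
        /* polynomial approximation of 2^f */
        float p = 1.0f + f * (0.69314718f + f * (0.24022651f
                + f * (0.05550411f + f * (0.00961813f + f * 0.00133336f))));
        /* build 2^k from exponent bits instead of calling ldexpf */
        union { uint32_t u; float f32; } s;
        s.u = (uint32_t)((int32_t)k + 127) << 23;
        return p * s.f32;
    }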

#pragma omp simd and equivalents will encourage the compiler to vectorize a specific loop and produce a warning if a loop isn't vectorized.
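
For example (a sketch; build with -fopenmp-simd or -fopenmp in gcc/clang):

    #include <math.h>

    void apply_exp(const float *restrict in, float *restrict out, int n) {
        /* asks the compiler to vectorize this specific loop */
        #pragma omp simd
        for (int i = 0; i < n; i++) {
            out[i] = expf(-in[i]);
        }
    }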


I shouldn’t have started my comment with the sort of implied question or note of confusion. Sorry, that was unclear communication.

I agree that it is possible to write some C programs that some compilers will be able to discover the parallel potential of. But it isn’t very ergonomic or dependable. So, I think this is not a strong counter-argument to the theory of the blog post. It is possible to write SIMD friendly C, but often it is easier for the programmer to fall back to intrinsics to express their intent.


It means auto-vectorization: write scalar code that the compiler can automatically vectorize using SIMD instructions.



