Basic Chacha C implementations do not get auto-vectorized into ultra-efficient code. The most efficient implementations use intrinsics or assembly and process 4 (SSE2/AVX/NEON) or 8 (AVX2) Chacha20 blocks at once. This works because the layout of variables and operations was designed for efficient SIMD use, and because the blocks are independent of each other. (Shay's Chacha20 implementation is also not the fastest!)
Basic GNU C implementations don't get auto-vectorised, full stop. But with a little effort Chacha20 can be made to vectorise. The implementation here is vectorised by GNU C:
If "ultra-efficient code" means what a programmer highly skilled in some amd64 implementation (Intel Core 2, AMD Bulldozer, ...) could produce for that implementation, then yes, I doubt GNU C produces it. But the odds are that GNU C's output runs faster than that guru's code on other amd64 implementations.
That Salsa implementation is not being vectorized? Salsa also requires some values to be shuffled around to work in SSE registers; djb made a bit of a boo-boo when designing it. Chacha fixes that, so its SIMD implementations are a bit more straightforward.