> Only the non portable AVX2 based libraries are faster
The paper in OP reports roughly twice the speed of the authors' previous AVX2 code. I think it is superior to TurboBase64 here (though with more stringent hardware requirements).
Even 5% sounds sketchy indeed. However, AVX2 (aka Haswell New Instructions) hasn't quite been on every Intel processor this decade: it only arrived with the 4th generation (Haswell, late 2013), and only in Core-branded processors.
I think the 40 GB/s figure is for data that doesn't fit in L1 cache. If you check the results graph, memcpy is ahead by a wide margin on data that fits in L1; beyond that, they are nearly head to head.
It is faster than SSE on Intel/AMD and NEON SIMD on ARM; see the benchmarks at Turbo Base64: https://github.com/powturbo/TurboBase64