Hacker News new | past | comments | ask | show | jobs | submit login

libdivide tl;dr:

> libdivide allows you to replace expensive integer divides with comparatively cheap multiplication and bitshifts. Compilers usually do this, but only when the divisor is known at compile time. libdivide allows you to take advantage of it at runtime. The result is that integer division can become faster - a lot faster. [...] divide SIMD vectors by runtime constants, [1]

> libdivide.h is a header-only C/C++ library for optimizing integer division. Integer division is one of the slowest instructions on most CPUs e.g. on current x64 CPUs a 64-bit integer division has a latency of up to 90 clock cycles whereas a multiplication has a latency of only 3 clock cycles. libdivide allows you to replace expensive integer divsion instructions by a sequence of shift, add and multiply instructions that will calculate the integer division much faster.

> On current CPUs you can get a speedup of up to 10x for 64-bit integer division and a speedup of up to to 5x for 32-bit integer division when using libdivide. libdivide also supports SSE2, AVX2 and AVX512 vector division which provides an even larger speedup. You can test how much speedup you can achieve on your CPU using the benchmark program.[2]

[1] https://libdivide.com/ [2] https://github.com/ridiculousfish/libdivide




The obvious question this invites: if this is generally faster for unknown values, why don't compilers use this optimization directly in emitted code?


They could, but it's less profitable, and cost models are, as always, a big problem. If we have:

  loop for i from 0 to n
      f(a[i]/y)
How big does n have to be before it makes sense to compute a reciprocal for y? And how frequently is it actually that big during the runtime of the code?


> Compilers usually do this, but only when the divisor is known at compile time

if the divisor is in a variable or otherwise 'hidden', the compiler can't deduce enought to get to it, is my guess.


Calculating the divisor values is also expensive, this works when you do this work once and then do the efficient divides multiple times.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: