You are right that micro-benchmarks are a bit suspicious, but they are better than nothing.
Btw, have a look at https://godbolt.org/z/zMarEnYP5 to see what Clang come up with on her own.
#include<stdint.h> uint64_t div(uint64_t nlz) { __builtin_assume(nlz <= 63); return (63 - nlz) / 7; } div: # @div xori a0, a0, 63 andi a1, a0, 255 li a2, 37 mul a1, a1, a2 srli a1, a1, 8 subw a0, a0, a1 slli a0, a0, 56 srli a0, a0, 57 add a0, a0, a1 srli a0, a0, 2 ret
I haven't analysed it in detail. But it looks like Clang doesn't seem to mind multiplication.
You are right that micro-benchmarks are a bit suspicious, but they are better than nothing.
Btw, have a look at https://godbolt.org/z/zMarEnYP5 to see what Clang come up with on her own.
This uses Risc-V assembly. Just for fun. x86 is also fascinating.I haven't analysed it in detail. But it looks like Clang doesn't seem to mind multiplication.