I wonder whether the author ever ran a benchmark? Alas, the commenting system on...

plesner · on March 1, 2023

Author her. I didn't run benchmarks. I'm suspicious of micro-benchmarks and I don't have a context where I can try it against realistic data. Also, I just enjoy the maths of it even if it turns out not to make a huge performance difference in practice.

eru · on March 2, 2023

Thanks for replying! It's definitely a nice write-up.

You are right that micro-benchmarks are a bit suspicious, but they are better than nothing.

Btw, have a look at https://godbolt.org/z/zMarEnYP5 to see what Clang come up with on her own.

    #include<stdint.h>

    uint64_t div(uint64_t nlz) {
        __builtin_assume(nlz <= 63);
        return (63 - nlz) / 7;
    }

    div:                                    # @div
            xori    a0, a0, 63
            andi    a1, a0, 255
            li      a2, 37
            mul     a1, a1, a2
            srli    a1, a1, 8
            subw    a0, a0, a1
            slli    a0, a0, 56
            srli    a0, a0, 57
            add     a0, a0, a1
            srli    a0, a0, 2
            ret

This uses Risc-V assembly. Just for fun. x86 is also fascinating.

I haven't analysed it in detail. But it looks like Clang doesn't seem to mind multiplication.

Cthulhu_ · on March 1, 2023

I hope this isn't considered doxing, but he has his name and location on the blurb on the side; you may be able to reach him through linkedin: https://www.linkedin.com/in/christianplesnerhansen/