What if the multi-precision code is written in C?

You can detect the carry out of (a+b) branch-free in C (for 32-bit unsigned values) with: ((a&b) | ((a|b) & ~(a+b))) >> 31

So 64-bit add in C is:

   f_low = a_low + b_low
   c_high = ((a_low & b_low) | ((a_low | b_low) & ~f_low)) >> 31
   f_high = a_high + b_high + c_high
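Wrapped up as a compilable function (just a sketch; the add64 name and the uint32_t types are mine, the body is exactly the three lines above):

    #include <stdint.h>

    /* 64-bit add from 32-bit halves with branch-free carry detection */
    static void add64(uint32_t a_high, uint32_t a_low,
                      uint32_t b_high, uint32_t b_low,
                      uint32_t *f_high, uint32_t *f_low)
    {
        *f_low = a_low + b_low;
        /* carry out of bit 31: both top bits set, or one set and the
           sum's top bit clear */
        uint32_t c_high = ((a_low & b_low) | ((a_low | b_low) & ~*f_low)) >> 31;
        *f_high = a_high + b_high + c_high;
    }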
So for RISC-V with gcc 8.2.0 and -O2 -S -c I get:

        add     a1,a3,a2
        or      a5,a3,a2
        not     a7,a1
        and     a5,a5,a7
        and     a3,a3,a2
        or      a5,a5,a3
        srli    a5,a5,31
        add     a4,a4,a6
        add     a4,a4,a5
But for ARM I get (with gcc 9.3.1):

        add     ip, r2, r1
        orr     r3, r2, r1
        and     r1, r1, r2
        bic     r3, r3, ip
        orr     r3, r3, r1
        lsr     r3, r3, #31
        add     r2, r2, lr
        add     r2, r2, r3
It's shorter because ARM has bic. Neither compiler figures out how to use carry-related instructions.

Ah! But! There is a GCC built-in, __builtin_uadd_overflow(), that replaces the first two C lines above: c_high = __builtin_uadd_overflow(a_low, b_low, &f_low);
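Spelled out (a sketch, reusing the names above; the built-in stores the truncated 32-bit sum through the pointer and returns the carry):

    c_high = __builtin_uadd_overflow(a_low, b_low, &f_low);
    f_high = a_high + b_high + c_high;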

So with this:

RISC-V:

        add     a3,a4,a3
        sltu    a4,a3,a4
        add     a5,a5,a2
        add     a5,a5,a4
ARM:

        adds    r2, r3, r2
        movcs   r1, #1
        movcc   r1, #0
        add     r3, r3, ip
        add     r3, r3, r1
RISC-V is faster..

EDIT: Clang has one better: __builtin_addc().

    f_low = __builtin_addcl(a_low, b_low, 0, &c);
    f_high = __builtin_addcl(a_high, b_high, c, &junk);
x86:

        addl    8(%rdi), %eax
        adcl    4(%rdi), %ecx
ARM:

        adds    w8, w8, w10
        add     w9, w11, w9
        cinc    w9, w9, hs
RISC-V:

        add     a1, a4, a5
        add     a6, a2, a3
        sltu    a2, a2, a3
        add     a6, a6, a2
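For reference, the __builtin_addc version filled out with declarations looks something like this (a sketch; I've used the plain unsigned flavour, __builtin_addc, so the operands stay 32-bit halves as in the earlier snippets):

    unsigned c, junk;
    f_low  = __builtin_addc(a_low,  b_low,  0, &c);     /* carry out in c  */
    f_high = __builtin_addc(a_high, b_high, c, &junk);  /* carry in from c */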



> RISC-V is faster..

I find it funny that you fall into the same pitfall as the author did.

Faster on which CPU?

The author doesn't measure on any CPU, so here there are dozens of people hypothesizing whether fusion happens or not, and what the impact is.


All other things being equal, you would prefer smaller code for better cache use.


8x 2-byte instructions (16 bytes) lead to smaller code than 4x 8-byte instructions (32 bytes).

Counting number of instructions isn't really a good metric for that either.


> Faster on which CPU?

Perhaps faster here just means fewer instructions, considering that the number of instructions is what has been discussed?


Right, but all of these architectures can handle many combinations of instructions in one cycle, so instruction count is not really a great proxy for speed.

Same for code size. If the instructions are half the size, having 1.5x more instructions still means smaller binaries.


Note that with the newly-ratified B extension, RISC-V has BIC (called ANDN) as well as ORN and XNOR.

In addition to the actual ALU instructions doing the add with carry, for bignums it's important to include the load and store instructions. Even in L1 cache it's typically 2 or 3 or 4 cycles to do the load, which makes one or two extra instructions for the arithmetic less important. Once you get to bignums large enough to stream from RAM (e.g. calculating pi to a few billion digits) it's completely irrelevant.
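For example, the inner loop of a bignum add is one load per source limb, the add-with-carry, and a store per result limb; something like this sketch using the Clang built-in from upthread (the bignum_add name and signature are mine):

    #include <stddef.h>

    /* Add the n-limb numbers a and b into r; returns the final carry.
       Each iteration does two loads and one store besides the adds. */
    static unsigned long bignum_add(unsigned long *r, const unsigned long *a,
                                    const unsigned long *b, size_t n)
    {
        unsigned long carry = 0;
        for (size_t i = 0; i < n; i++)
            r[i] = __builtin_addcl(a[i], b[i], carry, &carry);
        return carry;
    }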



