It's shorter because ARM has bic. Neither one figures out to use carry related instructions.
Ah! But! There is a gcc builtin, __builtin_uadd_overflow(), that replaces the first two C lines above: c_high = __builtin_uadd_overflow(a_low, b_low, &f_low);
Note that with the newly-ratified B extension, RISC-V has BIC (called ANDN) as well as ORN and XNOR.
In addition to the actual ALU instructions doing the add with carry, for bignums it's important to include the load and store instructions. Even in L1 cache it's typically 2 or 3 or 4 cycles to do the load, which makes one or two extra instructions for the arithmetic less important. Once you get to bignums large enough to stream from RAM (e.g. calculating pi to a few billion digits) it's completely irrelevant.
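To make the load/store point concrete, here is a rough sketch of a multi-limb add loop. The function name bignum_add and the little-endian 32-bit limb layout are assumptions for illustration, and the carry is detected with a plain unsigned comparison rather than any particular carry instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* r = a + b over n little-endian 32-bit limbs; returns the final carry.
   Every limb costs two loads and one store around the arithmetic. */
static uint32_t bignum_add(uint32_t *r, const uint32_t *a,
                           const uint32_t *b, size_t n)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t x = a[i];                 /* load */
        uint32_t y = b[i];                 /* load */
        uint32_t partial = x + y;          /* wraps modulo 2^32 */
        uint32_t sum = partial + carry;
        /* A carry happened if either addition wrapped around. */
        carry = (partial < x) | (sum < partial);
        r[i] = sum;                        /* store */
    }
    return carry;
}
```

Per limb that is three memory operations against a handful of ALU operations, so shaving one or two ALU instructions changes the total less than it first appears.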
You can detect the carry out of (a+b) in C branch-free (for 32-bit unsigned a and b) with: ((a&b) | ((a|b) & ~(a+b))) >> 31
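As a quick sanity check of that expression, here is a small sketch that wraps it in a function and compares it against a reference computed by widening to 64 bits. The names carry32 and carry32_ref are made up for this example.

```c
#include <stdint.h>

/* Carry out of the 32-bit sum a + b, computed without branches.
   Bit 31 carries out when both top bits are set, or when at least
   one is set and the top bit of the sum came out clear. */
static uint32_t carry32(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;                  /* wraps modulo 2^32 */
    return ((a & b) | ((a | b) & ~sum)) >> 31;
}

/* Reference answer via 64-bit widening, for cross-checking only. */
static uint32_t carry32_ref(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a + b) >> 32);
}
```

Only bit 31 of the combined expression matters; the shift throws the other bits away.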
So 64-bit add in C is:
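The listing itself is not shown here, so what follows is only a sketch of what such a function could look like, assuming the 64-bit operands are split into 32-bit halves. The function name add64 and the choice to return the result packed into a uint64_t are assumptions for illustration; a_low, b_low, f_low and c_high follow the names used in the builtin call quoted in this thread.

```c
#include <stdint.h>

/* Adds (a_high:a_low) + (b_high:b_low) using only 32-bit operations.
   The result is packed into a uint64_t purely so it is easy to check. */
static uint64_t add64(uint32_t a_high, uint32_t a_low,
                      uint32_t b_high, uint32_t b_low)
{
    uint32_t f_low = a_low + b_low;        /* low half, may wrap */
    /* Branch-free carry out of the low half. */
    uint32_t c_high = ((a_low & b_low) | ((a_low | b_low) & ~f_low)) >> 31;
    uint32_t f_high = a_high + b_high + c_high;
    return ((uint64_t)f_high << 32) | f_low;
}
```

For example, add64(0, 0xFFFFFFFF, 0, 1) carries out of the low half and yields 0x100000000.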
So for RISC-V in gcc 8.2.0 with -O2 -S -c:
But for ARM I get (with gcc 9.3.1):
It's shorter because ARM has bic. Neither one figures out to use carry related instructions.
Ah! But! There is a gcc builtin, __builtin_uadd_overflow(), that replaces the first two C lines above: c_high = __builtin_uadd_overflow(a_low, b_low, &f_low);
So with this:
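A sketch of the same add with the builtin standing in for the first two lines, assuming gcc or clang and a 32-bit unsigned int. The function name add64_builtin is hypothetical, and the packed uint64_t return value is only there to make the result easy to check.

```c
#include <stdint.h>

/* Same 64-bit add, with the compiler builtin producing both the low
   sum and the carry. Assumes unsigned int is 32 bits wide. */
static uint64_t add64_builtin(unsigned a_high, unsigned a_low,
                              unsigned b_high, unsigned b_low)
{
    unsigned f_low;
    /* Stores the wrapped sum in f_low and returns true on overflow. */
    unsigned c_high = __builtin_uadd_overflow(a_low, b_low, &f_low);
    unsigned f_high = a_high + b_high + c_high;
    return ((uint64_t)f_high << 32) | f_low;
}
```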
RISC-V:
ARM:
RISC-V is faster.
EDIT: Clang has one better: __builtin_addc().
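A sketch of how the add might look with __builtin_addc, assuming a 32-bit unsigned int. The names addc_limb and add64_addc are hypothetical, and the comparison-based fallback for compilers that lack the builtin is an addition of this sketch, not something from the post.

```c
#include <stdint.h>

#ifndef __has_builtin
#define __has_builtin(x) 0   /* compilers without the feature-test macro */
#endif

/* One limb of add-with-carry: returns x + y + carry_in and stores the
   carry out (0 or 1) in *carry_out. */
static unsigned addc_limb(unsigned x, unsigned y, unsigned carry_in,
                          unsigned *carry_out)
{
#if __has_builtin(__builtin_addc)
    return __builtin_addc(x, y, carry_in, carry_out);
#else
    /* Fallback: a carry happened if either addition wrapped around. */
    unsigned partial = x + y;
    unsigned sum = partial + carry_in;
    *carry_out = (partial < x) | (sum < partial);
    return sum;
#endif
}

/* 64-bit add built from two chained limbs. */
static uint64_t add64_addc(unsigned a_high, unsigned a_low,
                           unsigned b_high, unsigned b_low)
{
    unsigned carry, unused;
    unsigned f_low = addc_limb(a_low, b_low, 0u, &carry);
    unsigned f_high = addc_limb(a_high, b_high, carry, &unused);
    return ((uint64_t)f_high << 32) | f_low;
}
```

The carry-in parameter is what makes this chain cleanly: the carry out of one limb feeds straight into the next, which is the shape a longer bignum loop wants.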
x86:
ARM:
RISC-V: