Minor note - RISC-V's Zba (included in RVA22, which afaik is set to be the basel...

mbitsnbites · on Jan 18, 2024

For loads you can usually hide the clobbering by using the intended target register (the one you are loading into) as the temporary register, e.g.:

      // a0 = load [a1 + a2*4]
      slli    a0,a2,2
      add     a0,a0,a1
      lw      a0,0(a0)

...but not always.

dzaima · on Jan 18, 2024

Isn't that the same approach that gets you from 2 clobbered registers to 1 for stores? I.e. it's usually possible, but it needs compiler participation. If we assume the compiler will allocate load registers properly, it shouldn't have any problem reducing the store clobbers from 2 to 1 either, which is what I understood to be the difficult part. (Of course 1 clobber is still worse than 0, but at least it means a fused instruction never has to write 2 registers, at least in this case.)

mbitsnbites · on Jan 18, 2024

True, but for loads there is at least a theoretical way to avoid clobbers. For stores there is no way.

Stores: 1-2 superfluous results

Loads: 0-2 superfluous results
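To make the counts above concrete, here is a toy Python model (not a real RISC-V simulator) that executes the three-instruction address sequences written in assembler syntax and counts the registers each one writes besides the intended result. The temporaries `t0`/`t1`, the initial register and memory values, the helper names `run`/`superfluous_results`, and the definition of "superfluous result" as "any register written other than the load's destination" are assumptions made for illustration.

```python
import re

MASK = 0xFFFFFFFF  # treat registers as 32-bit (RV32-style); an assumption of this toy model


def run(seq, regs, mem):
    """Execute a tiny subset of RISC-V (slli, add, lw, sw) written in
    assembler syntax; return the set of registers the sequence wrote."""
    written = set()
    for line in seq:
        op, rest = line.split(None, 1)
        args = [x.strip() for x in rest.split(",")]
        if op == "slli":                      # slli rd, rs1, shamt
            regs[args[0]] = (regs[args[1]] << int(args[2])) & MASK
            written.add(args[0])
        elif op == "add":                     # add rd, rs1, rs2
            regs[args[0]] = (regs[args[1]] + regs[args[2]]) & MASK
            written.add(args[0])
        elif op in ("lw", "sw"):              # lw rd, off(rs1) / sw rs2, off(rs1)
            off, base = re.fullmatch(r"(-?\d+)\((\w+)\)", args[1]).groups()
            addr = (regs[base] + int(off)) & MASK
            if op == "lw":
                regs[args[0]] = mem[addr]
                written.add(args[0])
            else:
                mem[addr] = regs[args[0]]     # a store writes no register
        else:
            raise ValueError(f"unsupported instruction: {line}")
    return written


def superfluous_results(seq, is_load):
    """Run seq with a1 = base, a2 = index and count the registers it wrote
    besides the intended result (a0 for a load; a store has no result)."""
    regs = {"a0": 0x1234, "a1": 0x1000, "a2": 3, "t0": 0, "t1": 0}
    mem = {0x100C: 0xBEEF}                    # 0x1000 + 3*4 = 0x100C
    written = run(seq, regs, mem)
    if is_load:
        assert regs["a0"] == 0xBEEF           # a0 = load [a1 + a2*4]
        return len(written - {"a0"})
    assert mem[0x100C] == 0x1234              # store [a1 + a2*4] = a0
    return len(written)


CASES = [
    ("load, reuse a0", ["slli a0,a2,2", "add a0,a0,a1", "lw a0,0(a0)"], True),
    ("load, one temp", ["slli t0,a2,2", "add t0,t0,a1", "lw a0,0(t0)"], True),
    ("load, two temps", ["slli t0,a2,2", "add t1,t0,a1", "lw a0,0(t1)"], True),
    # a0 holds the value being stored, so it cannot double as the temporary
    ("store, one temp", ["slli t0,a2,2", "add t0,t0,a1", "sw a0,0(t0)"], False),
    ("store, two temps", ["slli t0,a2,2", "add t1,t0,a1", "sw a0,0(t1)"], False),
]

if __name__ == "__main__":
    for name, seq, is_load in CASES:
        print(f"{name:<17} superfluous results: {superfluous_results(seq, is_load)}")
```

The load cases come out as 0, 1 and 2 and the store cases as 1 and 2. The store can never reach 0 because `sw` has no destination register that could absorb the intermediate address.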