
Minor note: RISC-V's Zba extension (included in RVA22, which afaik is set to be the baseline requirement for Android) has sh2add, which computes x+y*4 in a single instruction and gets that store down to two instructions (testable with -march=rv64gczba). That doesn't affect the overall point about the instruction fusion approach needlessly clobbering registers, though. (And shouldn't indexed loads be even worse, with three potential output registers in baseline RV64GC and still two with Zba?)
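A minimal C sketch for trying this out, assuming a RISC-V cross compiler is at hand. The function name `store_indexed` is made up for illustration; the argument order is chosen so that, under the standard calling convention, the value, base and index arrive in a0, a1 and a2. The instruction sequences in the comments are what one would expect, not guaranteed output; the exact code depends on compiler and version.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: base[index] = value, i.e. store to [a1 + a2*4].
 *
 * Expected with -march=rv64gc (three instructions, one temporary):
 *     slli   t0, a2, 2      # t0 = index * 4
 *     add    t0, t0, a1     # t0 = base + index * 4
 *     sw     a0, 0(t0)
 *
 * Expected with -march=rv64gczba (two instructions, one temporary):
 *     sh2add t0, a2, a1     # t0 = (a2 << 2) + a1
 *     sw     a0, 0(t0)
 *
 * Either way the address lands in a register the program never asked for.
 */
void store_indexed(int32_t value, int32_t *base, size_t index)
{
    base[index] = value;
}
```

Compiling this with -O2 -S under both -march settings should show the difference, if the toolchain supports Zba.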



For loads you can usually hide the clobbering by using the intended target register (the one you want to load to) as the temporary register, e.g.:

      // a0 = load [a1 + a2*4]
      slli    a0,a2,2
      add     a0,a0,a1
      lw      a0,0(a0)
...but not always.
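The same trick combined with Zba, as a rough sketch. The function name `load_indexed` is hypothetical, and the registers in the comment follow the snippet above rather than what the C calling convention would assign to these arguments, so real compiler output will use different register names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns base[index], i.e. load from [base + index*4].
 *
 * With Zba, reusing the destination register as the address temporary
 * would look like this (registers as in the snippet above):
 *     sh2add a0, a2, a1     # a0 = (a2 << 2) + a1
 *     lw     a0, 0(a0)      # a0 = loaded value; the address is overwritten
 *
 * The only register written is the one the load targets anyway.
 */
int32_t load_indexed(const int32_t *base, size_t index)
{
    return base[index];
}
```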


That's the same approach that gets you from 2 to 1 clobbered registers for stores, no? I.e. usually possible, but it needs compiler participation. If we assume the compiler will optimize load register usage properly, it shouldn't have any problem reducing the store clobbers from 2 to 1 either, which I understood to be the difficult part. (Of course 1 clobber is still worse than 0, but at least it means no fused instruction with 2 writes, at least here.)


True, but for loads there is at least a theoretical way to avoid clobbers. For stores there is no way.

Stores: 1-2 superfluous results

Loads: 0-2 superfluous results



