Whatever plan you have needs to be far faster than an L1 cache access to even make sense. I can imagine some situations with very heavy register pressure where keeping values in registers is much better than going to cache.