Yes, and the instruction cache becomes stale as well. I guess one way to avoid that is to lay out 16 code blocks back-to-back, one per register, and then jmp into the block that handles the right register. A JMP is pretty cheap, and the target is likely to be in cache anyway.
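To make the idea concrete, here is a rough sketch in C rather than assembly. It assumes a made-up `regfile_t` array standing in for the 16 machine registers, and the names `INC_BLOCK` and `inc_reg` are hypothetical. Each `case` plays the role of one block with its register number fixed at compile time, and the `switch` plays the role of the jmp. A compiler will often turn a dense switch like this into a jump table, but that is not guaranteed.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS 16

/* Hypothetical register file standing in for 16 machine registers. */
typedef struct {
    uint64_t r[NUM_REGS];
} regfile_t;

/* One block per register. The register number is a compile-time constant
   inside each block, like a hand-written block with the register hard-coded. */
#define INC_BLOCK(n) case n: rf->r[n] += 1; break;

/* Select the block for register idx instead of patching an instruction. */
static void inc_reg(regfile_t *rf, unsigned idx)
{
    assert(idx < NUM_REGS);
    switch (idx) {
    INC_BLOCK(0)  INC_BLOCK(1)  INC_BLOCK(2)  INC_BLOCK(3)
    INC_BLOCK(4)  INC_BLOCK(5)  INC_BLOCK(6)  INC_BLOCK(7)
    INC_BLOCK(8)  INC_BLOCK(9)  INC_BLOCK(10) INC_BLOCK(11)
    INC_BLOCK(12) INC_BLOCK(13) INC_BLOCK(14) INC_BLOCK(15)
    default: break; /* not reached when idx < NUM_REGS */
    }
}
```

Nothing is rewritten at run time, so there is no stale instruction cache to worry about. The cost is 16 copies of the code plus one indirect jump per call.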