The JIT is the magic sauce here, and this is a pretty routine optimization.
Step 1, inline repeat.
Step 2, remove the intermediate array allocation.
Step 3, allocate a character array sized for the pad + `str`.
Step 4, use one of the many CPU instructions to repeatedly copy the padding character, and then `str`, into that same array of characters.
None of these optimizations would be out of the question for the JIT (and I'd expect them). You don't need the cache at all; it's just a waste. The only thing it saves is creating the intermediate string, which is HIGHLY likely to be optimized away with the simple code.
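
To make the steps concrete, here is a rough sketch of what they amount to if you wrote the result by hand. I'm assuming Java 11+ here, and `LeftPadSketch`, `simpleLeftPad`, and `fusedLeftPad` are names I made up for illustration. This is not what any particular JIT is guaranteed to emit, just the shape of code the steps describe.

```java
import java.util.Arrays;

public class LeftPadSketch {
    // The "simple code": builds an intermediate padding string, then concatenates.
    static String simpleLeftPad(String str, int width, char pad) {
        int padLen = Math.max(0, width - str.length());
        return String.valueOf(pad).repeat(padLen) + str;
    }

    // Hand-written equivalent of steps 1-4 (hypothetical, for illustration only).
    static String fusedLeftPad(String str, int width, char pad) {
        int padLen = Math.max(0, width - str.length());
        // Step 3: one array sized for the pad + str, no intermediate string.
        char[] out = new char[padLen + str.length()];
        // Step 4: fill the padding, then copy str into the same array.
        Arrays.fill(out, 0, padLen, pad);
        str.getChars(0, str.length(), out, padLen);
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(simpleLeftPad("42", 5, '0')); // prints 00042
        System.out.println(fusedLeftPad("42", 5, '0'));  // prints 00042
    }
}
```

The point is that both methods produce the same string, so a cache of padding strings only saves the intermediate allocation in the first one, and that allocation is exactly what the optimizer would be expected to remove.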