The micro-op cache is very small, on the order of 1.5K uops AFAIK, and it can be repopulated quite quickly. So yes, the performance hit should be quite small. You should presumably also be able to reduce the performance hit if you reduce the frequency of context switches, which should get easier the more cores you have, if I'm not mistaken. That is, the OS can have its own dedicated core, and some programs can be more or less pinned to other cores where they are rarely interrupted.
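For illustration, a minimal Linux-only sketch of that kind of pinning using sched_setaffinity(); the core number is an arbitrary placeholder and error handling is kept to a minimum:

```c
/* Minimal sketch (Linux-specific): restrict the current process to one core
 * so it is rarely migrated or interrupted. Core 3 is an arbitrary choice. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(3, &set);                       /* allow core 3 only */

    /* pid 0 means "the calling process" */
    if (sched_setaffinity(0, sizeof(set), &set) == -1) {
        perror("sched_setaffinity");
        return 1;
    }

    printf("now running on core %d\n", sched_getcpu());
    return 0;
}
```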
> You should presumably also be able to reduce the performance hit if you reduce the frequency of context switches, which should get easier the more cores you have, if I'm not mistaken.
Context switches due to preemption don't happen that often unless your CPU is oversubscribed. Most context switches come from blocking syscalls, especially the ones used to wait on contended locks. Reducing those takes a lot more optimization work.
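You can see that split for yourself: here's a small sketch (Linux/POSIX, workload and numbers purely illustrative) that reads the process's own counters via getrusage(), where voluntary switches roughly correspond to blocking syscalls and involuntary ones to preemption:

```c
/* Minimal sketch (Linux/POSIX): split the process's context switches into
 * voluntary (gave up the CPU via a blocking syscall, e.g. waiting on a lock)
 * and involuntary (preempted by the scheduler). */
#include <stdio.h>
#include <unistd.h>
#include <sys/resource.h>

int main(void)
{
    struct rusage ru;

    sleep(1);                               /* a blocking syscall: voluntary switch */

    if (getrusage(RUSAGE_SELF, &ru) == -1) {
        perror("getrusage");
        return 1;
    }

    printf("voluntary   (blocking syscalls): %ld\n", ru.ru_nvcsw);
    printf("involuntary (preemption):        %ld\n", ru.ru_nivcsw);
    return 0;
}
```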
Given the hockey-stick growth in core counts coming at us, I see pinning and better temporal avoidance (not time-sharing a core across trust domains) as the likely solutions. High-security code will be pinned to its own core, running in its own memory area.
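As a rough sketch of what that could look like in practice: start a "high-security" worker thread pinned to a core the kernel otherwise leaves alone (say, one reserved with the isolcpus= boot parameter). Core 7 and the worker body are placeholders; build with -pthread:

```c
/* Rough sketch (Linux, glibc): pin a "high-security" worker thread to an
 * isolated core before it starts running. Core 7 is a placeholder for a
 * core reserved with isolcpus=. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *secure_worker(void *arg)
{
    (void)arg;
    printf("secure worker on core %d\n", sched_getcpu());
    /* ... sensitive work here, ideally confined to its own memory area ... */
    return NULL;
}

int main(void)
{
    cpu_set_t set;
    pthread_attr_t attr;
    pthread_t t;

    CPU_ZERO(&set);
    CPU_SET(7, &set);                       /* the isolated core (placeholder) */

    /* set the affinity on the attribute so the thread never runs elsewhere */
    pthread_attr_init(&attr);
    pthread_attr_setaffinity_np(&attr, sizeof(set), &set);

    pthread_create(&t, &attr, secure_worker, NULL);
    pthread_join(t, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}
```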