Yup. Clock-for-clock, Skylake is only about 10-20% faster than Sandy Bridge; IPC improvements have been <5% per year. Most of the apparent improvement since then has come from slowly cranking up the stock clockrates. If you overclock a Sandy Bridge to >4 GHz, which is extremely reasonable, it keeps up just fine with a Skylake in most tasks.
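Quick sanity check on how those two figures fit together. This is just compound-growth arithmetic, and the 4-year gap between the two architectures is my own round-number assumption:

```python
def annual_rate(total_gain, years):
    """Per-year improvement that compounds to `total_gain` over `years`."""
    return total_gain ** (1.0 / years) - 1.0

YEARS = 4  # assumed gap between Sandy Bridge and Skylake, rounded

low = annual_rate(1.10, YEARS)   # 10% total clock-for-clock gain
high = annual_rate(1.20, YEARS)  # 20% total clock-for-clock gain

print(f"10% total -> {low:.1%} per year")   # about 2.4%
print(f"20% total -> {high:.1%} per year")  # about 4.7%
```

So a 10-20% total gain lines up with the "<5% per year" figure.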
CPU performance is largely "good enough" for most users. OS bloat has finally stopped: Win8.1 is just as fast as Win7 (and is more stable) and Win10 is faster and skinnier. Most users don't do anything intensive and probably wouldn't even notice if you substituted in a low-end processor. For those that do have big needs, GPU offloading has taken off in a big way.
This is kind of unfortunate in other respects. CPU performance (especially single-threaded) is extremely important for high refresh rates. At 144 Hz there's no margin for any weak link in the system. But I recognize that I'm kind of a niche user in that regard.
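To put a number on "no margin", here's the per-frame time budget at a few refresh rates. It's plain arithmetic; the function name is just something I made up for the example:

```python
def frame_budget_ms(refresh_hz):
    """Time available to produce one frame, in milliseconds."""
    return 1000.0 / refresh_hz

for hz in (60, 120, 144):
    print(f"{hz} Hz -> {frame_budget_ms(hz):.2f} ms per frame")
```

At 144 Hz everything (game logic, draw calls, the GPU) has to fit in a bit under 7 ms, versus almost 17 ms at 60 Hz.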
Sometimes - there are multiple types of execution units in a CPU core (even multiples of the same type), and a thread can dispatch to multiple units at once (superscalar execution). The core can also reorder the instruction stream to keep all the units occupied (out-of-order execution), preemptively execute along the most likely direction a branch will take (speculative execution), etc.
Basically, it's all a massive game to keep all the units of a core busy to execute the desired instruction stream as fast as possible. Over time, successive CPU architectures have gotten better at playing the game: better occupancy, more execution units, and more powerful units (SSE, AVX, etc), which translates into a greater number of instructions executed per clock cycle (IPC).
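Here's a toy model of the "game", if it helps. It is nowhere near how a real core schedules things: every instruction takes one cycle, the core can issue `width` instructions per cycle, and `deps` is a made-up dependency list. It only shows why reordering keeps more units busy:

```python
def cycles_to_finish(deps, width, out_of_order):
    """Count cycles needed to issue every instruction.

    deps[i] lists the earlier instructions that instruction i waits on.
    A result becomes usable the cycle after its instruction issues.
    """
    done = set()
    pending = list(range(len(deps)))
    cycles = 0
    while pending:
        issued = []
        for i in pending:
            if len(issued) == width:
                break  # every issue slot is taken this cycle
            if all(d in done for d in deps[i]):
                issued.append(i)
            elif not out_of_order:
                break  # in-order core stalls behind the blocked instruction
        done.update(issued)
        pending = [i for i in pending if i not in issued]
        cycles += 1
    return cycles

# Two independent chains of three instructions each: 0->1->2 and 3->4->5
deps = [[], [0], [1], [], [3], [4]]

in_order = cycles_to_finish(deps, width=2, out_of_order=False)
reordered = cycles_to_finish(deps, width=2, out_of_order=True)
print(in_order, reordered)  # 5 3
```

Same six instructions, same two units, but the reordering core interleaves the two chains and finishes in 3 cycles instead of 5. That's higher IPC from better occupancy alone.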
That's why a Skylake is much faster than a Pentium 4, even though the P4 might run at a higher clockrate. The Skylake has better IPC.
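As a back-of-the-envelope sketch: throughput is roughly IPC times clock. The IPC values and clocks below are made-up illustrative numbers, not measurements of either chip:

```python
def instructions_per_second(ipc, clock_ghz):
    """Rough throughput: instructions per cycle times cycles per second."""
    return ipc * clock_ghz * 1e9

# Hypothetical numbers, chosen only to show the shape of the tradeoff
p4 = instructions_per_second(ipc=0.8, clock_ghz=3.8)       # higher clock, low IPC
skylake = instructions_per_second(ipc=2.5, clock_ghz=3.5)  # lower clock, high IPC

print(f"Skylake / P4 throughput: {skylake / p4:.2f}x")
```

The chip with the lower clock still comes out well ahead, because the IPC gap is much bigger than the clock gap.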
And as a side note: what Hyperthreading does is duplicate the part of the core that manages registers and instruction dispatch for a thread. So you have a second thread that can utilize any execution units that the first thread left unoccupied.
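A toy way to picture it, assuming a core with 4 issue slots per cycle and invented per-cycle demand for each thread (real cores are far messier, and this model just drops whatever doesn't fit in a cycle):

```python
def slots_used(slots_per_cycle, demand_a, demand_b=None):
    """Total issue slots filled over the run.

    demand_x[i] is how many instructions thread x has ready in cycle i.
    Thread A gets first pick; thread B only fills what A left empty.
    """
    used = 0
    for i, a in enumerate(demand_a):
        issued = min(a, slots_per_cycle)
        if demand_b is not None:
            issued += min(demand_b[i], slots_per_cycle - issued)
        used += issued
    return used

SLOTS = 4
demand_a = [4, 1, 0, 2, 1, 0, 3, 1]  # hypothetical bursty first thread
demand_b = [2, 2, 2, 2, 2, 2, 2, 2]  # hypothetical steady second thread

total = SLOTS * len(demand_a)
alone = slots_used(SLOTS, demand_a)
shared = slots_used(SLOTS, demand_a, demand_b)
print(f"one thread:  {alone}/{total} slots used")   # 12/32
print(f"two threads: {shared}/{total} slots used")  # 25/32
```

The second thread doesn't make the first one any faster; it just soaks up slots that would otherwise sit idle.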
Bulldozer works somewhat similarly: two threads share a single module (what AMD markets as two cores), and each module has a pair of integer execution clusters, one per thread, but a single shared floating-point unit. So kinda like a Super-Hyperthreading, where they include a duplicate of (what they hope is) the most needed execution unit. Doesn't always work out in reality though.