The GPU model is a proven way to get better performance out of traditional CPUs, too. You'll get the best performance from a modern CPU if you treat each SIMD lane as a parallel execution over homogeneous data and also spread the work across multiple cores. See ispc and Unity's Burst Compiler as examples.
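To make that concrete, here's a minimal sketch (not ispc or Burst themselves; the function and array names are made up): the same "one lane per data element" idea written as a branch-free loop over flat arrays, with an OpenMP pragma asking the compiler to vectorize across lanes and split the work across cores.

    #include <cstddef>

    // Hypothetical kernel: homogeneous work per element, no branches in the
    // body, so each loop iteration can map cleanly onto one SIMD lane.
    void scale_and_clamp(const float* xs, const float* ys, float* out,
                         std::size_t n, float k, float lo, float hi) {
        #pragma omp parallel for simd
        for (std::size_t i = 0; i < n; ++i) {
            float v = xs[i] * k + ys[i];          // same op on every lane
            v = v < lo ? lo : (v > hi ? hi : v);  // branch-free clamp
            out[i] = v;
        }
    }

Built with something like -O3 -march=native -fopenmp, the inner iterations get packed into SIMD lanes and the outer loop is divided across threads, which is roughly the shape ispc/Burst push you toward.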
I think there is a view that this style of programming only applies to traditionally high-compute areas like games, HPC, rendering, ML, etc., but we've recently seen core building blocks of "normal" web applications like hash tables (https://code.fb.com/developer-tools/f14/) and JSON parsers (https://github.com/lemire/simdjson) get massive performance gains from SIMD.
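The hash table case shows how small the SIMD footprint can be. Here's a sketch of the general trick F14-style tables rely on (not F14's actual layout or code; the names are made up): keep one hash byte per slot and compare a whole 16-slot group against the probe's tag in a single instruction, getting back a bitmask of candidate slots to check.

    #include <immintrin.h>
    #include <cstdint>

    // Compare 16 per-slot tag bytes against the probe tag at once (SSE2).
    // Bit i of the result is set if slot i is a candidate match.
    std::uint32_t match_mask(const std::uint8_t tags[16], std::uint8_t probe_tag) {
        __m128i group  = _mm_loadu_si128(reinterpret_cast<const __m128i*>(tags));
        __m128i needle = _mm_set1_epi8(static_cast<char>(probe_tag));
        __m128i eq     = _mm_cmpeq_epi8(group, needle);
        return static_cast<std::uint32_t>(_mm_movemask_epi8(eq));
    }

One instruction filters a whole group of slots, so most probes only ever touch the handful of keys whose tag byte actually matched.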
Note that SIMD on CPUs is somewhat different from GPU-style "SIMD", particularly when you want fixed-width SIMD versus arbitrary-width vector processing (a la SPMD, ISPC, CUDA, etc.).
JSON parsing, for example, doesn't scale with wide vectors the way typical graphics workloads do. It's just that the traditional model of byte-by-byte parsing is very inefficient, and SIMD implementations exploit some of the parallelism CPUs would otherwise leave unused. It's still very much a latency-bound problem with complex control flow, which is why it wouldn't run well on a GPU yet runs well with CPU SIMD.
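As a rough illustration (heavily simplified, and not simdjson's actual code; the function name is made up), the byte-parallel part amounts to classifying 16 input bytes at a time and producing a bitmask of where the interesting characters sit, instead of testing one byte per loop iteration.

    #include <immintrin.h>
    #include <cstdint>

    // Scan 16 bytes of input at once (SSE2) and return a bitmask with a set
    // bit wherever a '"' character appears.
    std::uint32_t quote_mask_16(const char* buf) {
        __m128i chunk  = _mm_loadu_si128(reinterpret_cast<const __m128i*>(buf));
        __m128i quotes = _mm_cmpeq_epi8(chunk, _mm_set1_epi8('"'));
        return static_cast<std::uint32_t>(_mm_movemask_epi8(quotes));
    }

The caller still has to walk that mask and deal with escapes, strings, and nesting, which is where the latency-bound, branchy part of the problem remains.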