The Connection Machine / Thinking Machines Corporation style of "scalar-like SIMD" was already around in the late 1980s and early 1990s. Look up *Lisp, the parallel language that Thinking Machines Corporation used. You can still find *Lisp manuals today. https://en.wikipedia.org/wiki/*Lisp
Intel implemented SIMD using a technique often called SWAR, SIMD within a register (starting with the 64-bit MMX registers), which eventually evolved into XMM (SSE), YMM (AVX), and ZMM (AVX-512).
Today's GPUs are programmed in the old 1980s Connection Machine *Lisp style, which is clearly the source of inspiration for HLSL / GLSL / OpenCL / CUDA / etc.
Granted, today's GPUs are still SWAR (a GCN VGPR really is just a 64-lane x 32-bit register). Languages like ispc (the Intel SPMD Program Compiler) show that we can indeed implement a CUDA-like language on top of AVX / XMM registers.
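A rough sketch of what that means, in plain C rather than ispc or CUDA: the programmer writes a scalar-looking kernel for one lane, and the compiler runs it across a whole vector register, using a per-lane mask to handle branches. Everything here (LANES, vreg, vmask, kernel_spmd) is a hypothetical name made up for illustration, and the loops stand in for what would be single vector instructions.

```c
#include <stdint.h>

#define LANES 8  /* assumed width: 8 x 32-bit lanes, like one AVX register */

/* One "vector register": LANES 32-bit values processed in lockstep. */
typedef struct { int32_t lane[LANES]; } vreg;

/* Per-lane execution mask: 1 = lane active, 0 = lane idle. */
typedef struct { uint8_t lane[LANES]; } vmask;

/* The scalar-looking kernel the programmer writes for ONE lane:
 *     if (x < 0) x = -x; else x = x * 2;
 * This function is roughly what an SPMD compiler lowers it to. */
vreg kernel_spmd(vreg x)
{
    vmask m_then, m_else;
    vreg out = x;

    /* Evaluate the branch condition for every lane at once. */
    for (int i = 0; i < LANES; i++) {
        m_then.lane[i] = (uint8_t)(x.lane[i] < 0);
        m_else.lane[i] = (uint8_t)!m_then.lane[i];
    }

    /* "then" side: every lane steps through it, only masked lanes commit. */
    for (int i = 0; i < LANES; i++)
        if (m_then.lane[i]) out.lane[i] = -x.lane[i];

    /* "else" side: same again with the inverted mask. */
    for (int i = 0; i < LANES; i++)
        if (m_else.lane[i]) out.lane[i] = x.lane[i] * 2;

    return out;
}
```

Both sides of the branch get executed for the whole register, and the mask decides which lanes keep the result. Swap the 8 lanes for 64 and you have roughly the picture of a GCN wavefront operating on a VGPR.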
I don't think any of it was based on *Lisp. Graphics mostly developed independently. And as such, we use different words and different terminology sometimes, like saying "scalar ISA" when we talk about designing ISAs that don't mandate cross-lane interaction in one thread. Sorry!
As far as I know, the first paper on using SIMD for graphics was Pixar's "Channel Processor", or Chap [0], in 1984. This later became one of the core implementation details of their REYES algorithm [1]. By 1989, they had their own RenderMan Shading Language [2], an improved version of Chap, and you can see the similarities from just the snippet at the start of the code. Microsoft took major inspiration from this when designing HLSL, which NVIDIA then started to extend with their own Cg compiler. 3dlabs then copy/pasted this for GLSL.