Because it pushes the CUDA ecosystem into dominance on yet another platform. You could run your acceleration routine on anything from a smartphone or Raspberry Pi to an enterprise accelerator, one codebase. And it would be everywhere, a de facto capability of most reference-implementation ARM devices.
(And sure, OpenCL too, but that's too loose a standard to have any platform effect. Everyone implements it a little differently and has to port it to their own compiler/hardware/etc., so there's no common codebase and toolchain that everyone can share the way there is with CUDA.)
Everyone laughed at Huang saying that NVIDIA is a software company. He was right.
Nobody wants to run CUDA anymore. All the mobile SoCs jumped right over that one and have AI coprocessors now. CUDA is what runs on the developer workstation, not the "edge device" as they call it. Like there was a tiny window of software supremacy there, then everyone remembered how matrix multiplication works.
What are you even talking about? Jumped over? It was literally never an option (and still isn't). That said, compute shaders are indeed used on mobile SoCs, but even then the install base is pretty abysmal.
Maybe NVIDIA will decide to push this type of tech a lot harder.