What is needed are true NPUs as dedicated co-processors, especially for prosumer desktop systems (devs, other professionals, gamers). GPUs work in the enterprise, but they're a hassle to use for AI on the personal computing side of the market. Especially VRAM limitations, but also the lack of a standard open API other than Vulkan (again, using video stuff for AI).
Compared to CUDA, Vulkan is... not fun to code compute in! The serialization bridge and duplicating data structures and functions between CPU and GPU is tedious.