Also, CUDA is starting to matter less: both PyTorch and TensorFlow have numerous other backends, and the programmer doesn't need to know what the code runs on. The GGML project has also shown that you can get a long way with a good CPU, plenty of ordinary RAM, and 4- or 8-bit weights, with no CUDA in sight. You can definitely enter this domain without having a full-fledged CUDA replacement from the start.
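To show why low-bit weights make CPU inference practical, here is a minimal sketch of block-wise symmetric 4-bit quantization. It is loosely in the spirit of what GGML does, but it is not GGML's actual storage format. The block size of 32, the float32 per-block scale, and all function names are assumptions for illustration. Each block stores 4-bit integers plus one scale, so 32 weights take about 20 bytes instead of 128 as float32.

```python
import math
from typing import List, Tuple

# Assumption: illustrative block size, not taken from any specific GGML format.
BLOCK_SIZE = 32
SCALE_BYTES = 4  # assumption: one float32 scale stored per block

Block = Tuple[float, List[int]]


def quantize_block_4bit(weights: List[float]) -> Block:
    """Map one block of floats to ints in [-8, 7] plus a single float scale."""
    max_abs = max(abs(w) for w in weights)
    # Choose the scale so the largest magnitude maps to 7; guard all-zero blocks.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    # Round to the nearest step and clamp to the signed 4-bit range.
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return scale, q


def quantize(weights: List[float]) -> List[Block]:
    """Split a flat weight list into blocks and quantize each one independently."""
    return [
        quantize_block_4bit(weights[i:i + BLOCK_SIZE])
        for i in range(0, len(weights), BLOCK_SIZE)
    ]


def dequantize(blocks: List[Block]) -> List[float]:
    """Reconstruct approximate float weights: value = scale * q."""
    out: List[float] = []
    for scale, q in blocks:
        out.extend(scale * v for v in q)
    return out


def packed_size_bytes(n_weights: int) -> int:
    """Storage if two 4-bit values are packed per byte, plus one scale per block."""
    n_blocks = math.ceil(n_weights / BLOCK_SIZE)
    return n_blocks * (BLOCK_SIZE // 2 + SCALE_BYTES)


if __name__ == "__main__":
    w = [0.12, -0.5, 0.33, 0.9, -0.01] * 8  # 40 example weights
    blocks = quantize(w)
    approx = dequantize(blocks)
    print("max abs error:", max(abs(a - b) for a, b in zip(w, approx)))
    print("float32 bytes:", 4 * len(w), "packed 4-bit bytes:", packed_size_bytes(len(w)))
```

The per-block scale is the design choice that keeps the error bounded: each reconstructed weight is off by at most half a quantization step for its own block, so a few large outliers only hurt accuracy locally rather than across the whole tensor.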