As I understand, Vulkan allows to run custom code on GPU, including the code to multiply matrices. Can one simply use Vulkan and ignore CUDA, PyTorch and ROCm?
You probably can, but why would you? The main (only?) reason to ignore the CUDA-based stack is so that you could save a bit of money by using some other hardware instead of nVidia. So the amount of engineering labor/costs you should be willing to accept is directly tied to how much hardware you intend to buy or rent and what % discount, if any, the alternative hardware enables compared to nVidia.
So if you'd want to ignore CUDA+PyTorch and reimplement all of what you need on top of Vulkan.... well, that becomes worthy of discussion only if you expect to spend a lot on hardware, if you really consider that savings on hardware can recoup many engineer-years of costs - otherwise it's more effective to just go with the flow.
of course, but then you are just recreating CUDA. And that won’t scale well across an industry since each company would have their own language. AMD can just do what you are describing and then sell it as a standard.
I mean they literally did that, but then dropped it so yea