I don't understand your argument, so maybe this is off base, but if you are saying people in industry aren't replacing their supercomputers with commodity GPUs, you're wrong. Both Apple and Google have massive purchase orders for commodity NVIDIA GPUs, because they aren't just cheaper, they are better at this application. And I imagine other companies do as well.
Edit: "replace" is probably not the right word. This is work that the old systems don't do well, but of course they aren't throwing out x86 racks for GPUs. For machine learning applications they are just buying GPUs instead of more of the same.
They aren't buying consumer GPUs. They may not be buying NVIDIA's dedicated servers, but they aren't running GeForce chips either.
If nothing else, that is because you cannot virtualize GeForce-line GPUs, and there is no GPUDirect or NVLink support, etc.
If you are telling me that Google is buying GeForce GPUs and flashing them with a custom BIOS ripped off a Quadro card so they can do PCIe passthrough in a hypervisor and initialize the cards, then sorry, I'm not buying it.
Containers would imply there is no hypervisor involved, only a DRI device exposed by the kernel and bind-mounted into the container's namespace. You would still need support for multiple contexts, but that doesn't require multiple (virtual) PCI devices or an IOMMU.
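To make the container case concrete, here is a rough Python sketch of what "exposing the device node" looks like. It assumes a Linux host where the GPU driver creates nodes under `/dev/dri` and a runtime that accepts `docker run`-style `--device` arguments. The helper names and the `some-image` placeholder are made up for illustration, and NVIDIA's proprietary driver may use different device nodes.

```python
import glob
import os
import stat


def find_dri_nodes(dev_dir="/dev/dri"):
    """Return the character-device nodes found under dev_dir (may be empty)."""
    nodes = []
    for path in sorted(glob.glob(os.path.join(dev_dir, "*"))):
        try:
            mode = os.stat(path).st_mode
        except OSError:
            continue  # node vanished or is unreadable; skip it
        if stat.S_ISCHR(mode):
            nodes.append(path)
    return nodes


def container_device_args(nodes):
    """Build one `--device host:container` pair per node (hypothetical helper)."""
    args = []
    for node in nodes:
        args += ["--device", f"{node}:{node}"]
    return args


if __name__ == "__main__":
    # "some-image" is a placeholder, not a real image name.
    cmd = ["docker", "run", "--rm"] + container_device_args(find_dri_nodes()) + ["some-image"]
    print(" ".join(cmd))
```

Nothing here touches PCI passthrough or an IOMMU. The kernel driver on the host owns the card, and the container only gets a file handle to it.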