I still, for the life of me, can't understand why Google doesn't just start selling their TPUs to everyone. Nvidia wouldn't be anywhere near their size if they only made H100s available through their DGX Cloud, which is effectively what Google is doing by only making TPUs available through Google Cloud.
Good hardware, good software support, and the market is starving for performant competitors to the H100s (and soon B100s). Would sell like hotcakes.
It is an absolutely massive amount of work to turn something designed for your custom software stack and data centers (custom rack designs, water cooling, etc.) into a COTS product that is plug-and-play; not just technically but also in things like sales, support, etc. You are introducing a massive number of new problems to solve and pay for. And the in-house designs like TPUs (or Meta's accelerators) are cost-effective in part because they don't do that stuff at all. They would not be as cheap per unit of work if they also had to pay off all that other stuff. They have also had very strong demand for TPUs internally, which takes priority over GCP.
Do you mean, sell TPU hardware to other companies that would run it in their data centers? I can't imagine that would ever really work. The only reason TPUs work at Google is that they have huge teams across many different areas to keep them running (SRE, hardware repair, SWE, hardware infra) and it's coupled to the design of the data centers. To vend and externalize the software would require Google to set up similar teams for external customers (well beyond what Google Cloud provides for TPUs today) just to eke out some margin of profit. Plus, there is a whole proprietary stack running under the hood that Google wouldn't want to share with potential competitors.
Google used to sell a search appliance-in-a-box and eventually lost interest because hardware is so high-touch.
> Google used to sell a search appliance-in-a-box and eventually lost interest because hardware is so high-touch.
We had a GSA for intranet search, and other than the paint it was a standard Dell server. I remember not being impressed by what the GSA could do.
We also had Google Urchin for web analytics. It wasn't a hardware appliance, but the product wasn't very impressive either. They then killed that and tried to get you onto Google Analytics.
They just didn't commit to these on-premises enterprise products.
And undercut what they'd like to use as a huge motivator for people moving to GCP? Not likely. Even if they wanted to, they can't keep up with their own internal demand.
Beyond that, they might not be as stable or resilient outside of the closely curated confines of their own data centers. In that case selling them would be more of an embarrassment.
> Beyond that, they might not be as stable or resilient outside of the closely curated confines of their own data centers. In that case selling them would be more of an embarrassment.
Once you go out of your heavily curated hardware stack, the headaches multiply exponentially.
The impression I got from this thread yesterday is that Google's having difficulty keeping up with the heavy internal demand for TPUs: https://news.ycombinator.com/item?id=39670121