TPUs have the second-best software stack after CUDA, though. JAX and TensorFlow support them ahead of CUDA in some cases, and TPU is the only non-CUDA PyTorch environment that comes close to CUDA for support.
Google has historically been weak at breaking into markets that someone else has already established, and I think the TPUs are suffering the same fate. There is not enough investment in making the chips compatible with anything other than Google's preferred stack (which happens to not be the established industry stack). Committing to getting PyTorch to switch from device = "cuda" to device = "tpu" (or whatever) without breaking models would go a long way, imo.
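To make that concrete, the gap today looks roughly like this (a sketch assuming current PyTorch/XLA; the model here is just a placeholder):

    import torch
    import torch.nn as nn

    model = nn.Linear(128, 10)

    # CUDA: the path virtually all existing code already assumes
    if torch.cuda.is_available():
        model = model.to("cuda")

    # TPU today: a separate torch_xla package and an explicit XLA device handle,
    # plus xm.mark_step() / xm.optimizer_step(...) calls inside the training loop
    try:
        import torch_xla.core.xla_model as xm
        model = model.to(xm.xla_device())
    except ImportError:
        pass  # torch_xla only installs on TPU hosts

If "tpu" worked as a drop-in device string the way "cuda" does, most models could move over without touching the training loop.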
I always thought Google was actually pretty good at taking over established or rising markets, depending on the opportunity or threat they see from a competitor, either by timely acquisition and/or by scaling faster thanks to their own infrastructure capabilities.
- Google search (vs previous entrenched search engines in the early '00s)
- Adsense/doubleclick (vs early ad networks at the time)
- Gmail (vs AOL, Hotmail, etc)
- Android (vs iOS, Palm, etc)
- Chrome (vs all other browsers)
Sure, I'm picking the obvious winners, but these are all market leaders now (Android by global share) where earlier incumbents were big, but not Google-big.
Even if Google's use of TPUs is purely self-serving, it will have a noticeable effect on their ability to scale their consumer AI usage at diminishing costs. Their ability to scale AI inference to meet "Google scale" demand, and do it cheaply (at least by industry standards), will make them formidable in the "AI race". This is why Altman/Microsoft and others are investing heavily in AI chips.
But I don't think their TPU will be only self-serving; rather, they'll scale its use through GCP for enterprise customers to run AI. Microsoft is already tapping its enterprise customers for this new "product". But those kinds of customers will care more about cost than anything else.
The long-term game here is a cost game, and Google is very, very good at that and has a head start on the chip side.
TPUs were originally intended to be just for internal use (to keep Google from being dependent on Intel and Nvidia). Making them an external product through cloud was a mistake (in my opinion). It was a huge drain on internal resources in many ways, and few customers were truly using them in the optimal way. They also competed with Google's own Nvidia GPU offering in cloud.
The TPU hardware is great in a lot of ways and it allowed Google to move quickly in ML research and product deployments, but I don't think it was ever a money-maker for cloud.
Having used it heavily, I can say it is nowhere near painless. Where can you get a TPU? To train models you basically need to use GCP services. There are multiple services that offer TPU support: Cloud AI Platform, GKE, and Vertex AI. For GPU you can have a machine and run any TF version you like; for TPU you need different nodes depending on the TF version. Which TF versions are supported per GCP service is inconsistent: some versions are supported on Cloud AI Platform but not Vertex AI and vice versa. I have had a lot of difficulty trying to upgrade to recent TF versions, only to discover the inconsistent service support.
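For a sense of what that looks like in code, here is roughly the boilerplate TF needs just to talk to a TPU (a sketch; "my-tpu-node" is a placeholder, and the node's runtime version has to match the TF version you're running, which is exactly where the per-service inconsistency bites):

    import tensorflow as tf

    # Connect to a pre-provisioned TPU node and build a distribution strategy.
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="my-tpu-node")
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)

    with strategy.scope():
        model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
        model.compile(optimizer="adam", loss="mse")

    # On a GPU VM the equivalent is: install whatever TF build you want and call model.fit().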
Additionally, many operations that run on GPU are just unsupported on TPU. Sparse tensors have pretty limited support, and there's a bunch of models that will crash on TPU and require refactoring. Sometimes pretty heavy, thousands-of-lines refactoring.
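A toy example of the kind of rewrite I mean (illustrative only, not from a real model): XLA wants static shapes, so ops whose output size depends on the data, like boolean_mask or many sparse ops, usually have to be replaced with fixed-shape masking:

    import tensorflow as tf

    x = tf.random.normal([8, 128])
    keep = tf.cast(tf.random.uniform([8]) > 0.5, tf.float32)

    # GPU-friendly: output row count depends on the data, which XLA can't compile.
    # selected = tf.boolean_mask(x, tf.cast(keep, tf.bool))

    # TPU-friendly: keep the full static shape and zero out the dropped rows.
    selected = x * keep[:, None]

Multiply that by every such op in a big codebase and you get the thousands-of-lines refactors.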
Edit: PyTorch is even worse. PyTorch does not implement efficient TPU device data loading and generally has poor performance, nowhere near comparable to TensorFlow/JAX numbers. I'm unaware of any PyTorch benchmark where TPU actually wins. For TensorFlow/JAX, if you can get it running and your model suits TPU assumptions (so a basic CNN), then yes, it can be cost effective. For PyTorch, even simple cases tend to lose.
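For reference, this is roughly the extra machinery PyTorch needs on TPU (a sketch with placeholder model and data; MpDeviceLoader is the usual way to pre-stage batches onto the TPU, which is part of what the stack has to get right):

    import torch
    import torch.nn as nn
    import torch_xla.core.xla_model as xm
    import torch_xla.distributed.parallel_loader as pl

    device = xm.xla_device()
    model = nn.Linear(128, 10).to(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    data = torch.utils.data.TensorDataset(torch.randn(1024, 128), torch.randn(1024, 10))
    loader = torch.utils.data.DataLoader(data, batch_size=64)
    # Wraps the host-side loader so batches are staged onto the TPU ahead of compute.
    tpu_loader = pl.MpDeviceLoader(loader, device)

    for xb, yb in tpu_loader:
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(xb), yb)
        loss.backward()
        xm.optimizer_step(opt)  # XLA-aware replacement for opt.step()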
> Mojo is a closed source language that will never reach mainstream adoption among ML engineers and scientists.
[Citation needed]
The creator, Chris Lattner, previously created LLVM, Clang, and Swift. In each case he said these projects would be open sourced, and in each case they were. In each case they reached mainstream adoption in their respective target markets.
He's stated that Mojo will be open source.
If you're going to claim with great confidence that this language will have a different outcome to his previous ones, then you probably should have some strong evidence for that.
Hmm, the creator says (from his podcast with Lex Fridman, when I listened to him) that they are open sourcing it, but that it is a project born out of a private effort at their company and is still being used privately. So the aim is to open source it while taking community input and updating their private code to reflect the evolving design, so that when they release it, their internal lang and the open-sourced lang will not diverge.
Of course not ideal, but better than "open sourcing" it and refusing every request because it does not work for their codebase. Worse than having it open source from the get-go, of course.
Assuming that day comes, does it have a competitor in the works? A Python superset, compatible with Python libs, but one that lets you go bare metal to the point of directly programming GPUs and TPUs without CUDA or anything?
"Never" means you believe it will never be open sourced, or that a competitor will surpass it by the time it is open sourced, or that the premise of the lang is flawed and we don't need such a thing. Which one is it?
From what I see, they have a pretty active community and there is demand for such a system.
The GitHub repo says something similar:
>This repo is the beginning of our Mojo open source effort. We've started with Mojo code examples and documentation, and we'll add the Mojo standard library as soon as we get the necessary infrastructure in place. The challenge is that we use Mojo pervasively inside Modular and we need to make sure that community contributions can proceed smoothly with good build and testing tools that will allow this repo to become the source of truth (right now it is not). We'll progressively add the necessary components, such as continuous integration, build tools, and more source code over time.
Doesn't really matter. Google's infra is all the client you need to keep pouring tens of billions into a project like this. Bonus if others start using it more in the cloud, but they have so much use for accelerators across their own projects that they aren't going to stop.
The media keeps missing the real lock-in Nvidia has: CUDA. It's not the hardware. It's the ability for someone to use it painlessly.