
My friend, you've come to the right place. I happen to be a 33yo fellow dinosaur. If you thought I was some ML guru, know that I spent the last few months watching an 18yo and a 24yo scale GPT models to 50B parameters -- 'cause they work 16 hours a day, dealing with all of tensorflow's BS. So yeah, you're not alone in feeling like a dinosaur-aged mid-thirties hacker, watching the ML world fly by.

That being said, though, it's so cool that TFRC (the TensorFlow Research Cloud) is available to people like you and me. I was nobody at all. Gwern and I were screwing around with GPT at the time -- in fact, I owe Gwern everything, because he's the reason we ended up applying. I thought TFRC was some soulless Google crap that came with a pile of caveats, just like lots of other Google projects. Boy, was I wrong. So of course I'll ELI5 anything you want to know; it's the least I can do to repay TFRC for granting me superpowers.

>> That blog post I just linked to is running off of a TPU right now. Because it's literally just an ubuntu server.

> It's not literally running on a TPU, is it? I assume it's running on that Ubuntu server that has good ol' CPU that is running the web service + a TPU accelerator doing the number crunching. Or is my world view out of date?

Your confusion here is entirely reasonable. It took a long, long time for me to finally realize that when you hear "a TPU," you should think "a gigantic Ubuntu server with 8 hardware accelerators attached" -- the same mental model as a server with 8 GPUs in it.

It's that simple. I thought TPUs were this weird hardware thing. No no, they're just big Ubuntu servers that have 8 hardware accelerators attached. In the same way that you'd use GPUs to accelerate things, you can use Jax to accelerate whatever you want. (I love how friggin' effortless it feels to use TPU accelerators now, thanks to jax.)
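To make that concrete, here's a hedged sketch (mine, not from the original post) of what "use jax to accelerate whatever you want" looks like. Nothing in it is TPU-specific: jax picks whatever backend it finds (the MXU cores on a TPU VM, otherwise GPU or plain CPU), and if jax isn't installed it falls back to plain Python so the sketch still runs:

```python
# Hedged sketch: jax-accelerated number crunching. On a TPU VM the
# default backend is the attached accelerators; elsewhere it's CPU.
try:
    import jax
    import jax.numpy as jnp

    @jax.jit                          # compile once for the default backend
    def dot(a, b):
        return jnp.dot(a, b)

    x = jnp.ones((8, 8))
    result = float(dot(x, x)[0, 0])   # 8.0, computed by the accelerator
except ImportError:
    # jax isn't installed on this machine; compute the same 8.0 by hand
    result = sum(1.0 * 1.0 for _ in range(8))

print(result)
```

Either way you get the same answer; the only thing that changes is what hardware did the multiplying.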

So the ELI5 is, when you get your hands on a TPU VM, you get a behemoth of an Ubuntu server -- but it's still "just an Ubuntu server":

  $ tpu-ssh 71
  [...]
  Last login: Sun Jul  4 00:26:35 2021 from 47.232.103.82
  shawn@t1v-n-0f45785c-w-0:~$ uname -a
  Linux t1v-n-0f45785c-w-0 5.4.0-1043-gcp #46-Ubuntu SMP Mon Apr 19 19:17:04 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Good ol' x86_64.

Now, here's the crazy part. Until one month ago, it was impossible for us to SSH into TPUs, let alone use the accelerators for anything. That means nobody has had time yet to integrate TPU accelerators into their products.

What I mean is -- you're absolutely correct, my blog is merely running on "an Ubuntu server," whereas I was claiming that it's being "powered by a TPU." It's not using any of the TPU accelerators for anything at all (at least, not for the blog).

But it's easy to imagine a future where, once people realize how effortless it is to use jax to do some heavy lifting, people are going to start adding jax acceleration all over the place.

It feels like a matter of time till one day, you'll run `sudo apt-get install npm` on your TPU, and then it'll turn out that the latest nodejs is being accelerated by the MXU cores. Because that's a thing you can do now. One of the big value-adds here is "libtpu" -- it's a C library that gives you low-level access to the MXU cores that are attached directly to your gigantic Ubuntu server (aka "your TPU".)

Here, check this out: https://github.com/tensorflow/tensorflow/blob/master/tensorf...

Wanna see a magic trick? That's a single, self-contained C file. I was shocked that the instructions to run this were in the comments at the top:

  // To compile: gcc -o libtpu_client libtpu_client.c -ldl
  // To run: sudo ./libtpu_client
... so I SSH'ed into a TPU, ran that, and presto. I was staring at console output indicating that I had just done some high performance number crunching. No python, no jax, nothing -- you have low-level access to everything. It's just a C API.
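In the same spirit, you don't even need C to poke at it: here's a hedged Python sketch that loads libtpu the same way that C file does with dlopen, via ctypes. The library name "libtpu.so" is an assumption on my part; on a machine without libtpu the load just fails gracefully instead of crashing:

```python
# Hedged sketch: dlopen libtpu from python via ctypes.
# "libtpu.so" is an assumed library name; on a non-TPU machine
# the load fails and we report that instead of crashing.
import ctypes
import ctypes.util

def load_libtpu():
    """Try to load libtpu; return the handle, or None if it's absent."""
    for name in ("libtpu.so", ctypes.util.find_library("tpu")):
        if not name:
            continue
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue
    return None

handle = load_libtpu()
print("libtpu loaded" if handle else "no libtpu on this machine")
```

On a laptop you'll see the "no libtpu" branch; on a TPU VM you'd get a live handle you could pull the C API's symbols out of.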

So, all of that being said, I feel like I can address your questions properly now:

> Again, I have some hesitations interpreting this literally. I assume what you're saying is "Google runs a Jupyter server somewhere in the cloud and it gives you access to TPU compute".

A TPU is just an Ubuntu server. The MXU cores are hardware devices attached directly to that server (physically). So when you SSH in, you get a normal server you're familiar with, and you can optionally accelerate anything you can imagine.

(Till recently, it was a total pain in the ass to accelerate anything. Jax changes all that, and libtpu is going to shock the hell out of nvidia when they realize that TPUs are about to eat away at their DGX market. 'Cause libtpu gives you everything nvcc/CUDA gives you -- it's just a matter of time till people build tooling around it and package it up nicely.)

So nope, there's no TPU compute. It's just ye ole Ubuntu server, and it happens to have 8 massive hardware accelerators attached physically. You'd run a jupyter server the same way you run anything else.

So when that jupyter server executes `import jax; jax.devices()`, it's literally equivalent to you SSH'ing in, typing `python3`, and doing the same thing. Jax is essentially a convenience layer over the APIs that libtpu gives you at a low level.
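You can check that equivalence yourself; a hedged sketch (again assuming jax is installed -- without it, you just get the fallback):

```python
# Hedged sketch: enumerate accelerator devices, exactly the same call
# whether typed into a jupyter cell or a bare `python3` over SSH.
try:
    import jax
    devices = jax.devices()   # on a TPU VM: a list of TpuDevice objects
except ImportError:
    devices = []              # jax not installed on this machine

print(len(devices), "devices visible to jax")
```

There's no special "jupyter mode" or "cloud mode"; it's one process talking to libtpu, wherever that process happens to be running.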

Man, I suck at ELI5s. The point is, you can go as low as you want ("just write C! no dependencies! no handholding!") or as high as you want ("jax makes everything easy; if you want to get stuff done, just `import jax` and start doing numerical operations, 'cause every operation by default will be accelerated by the MXU cores -- the things attached physically to the TPU.")

This might clarify things:

  shawn@t1v-n-0f45785c-w-0:~$ ls /dev | grep accel
  accel0
  accel1
  accel2
  accel3
That's where all the low-level magic happens. I was curious how libtpu worked, so I spent a night ripping it apart in the Hopper debugger. libtpu consists of a few underlying libraries which interact with the /dev/accel* device nodes to do the low-level communication. Theoretically, you could reverse engineer libtpu and send signals directly to the hardware yourself. You'd need ~infinite time to figure it out, but it is indeed theoretically possible.
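For what it's worth, that `ls /dev | grep accel` check is easy to reproduce from stdlib Python; a minimal sketch (on a non-TPU machine the list is simply empty):

```python
# Stdlib-only sketch of `ls /dev | grep accel`: enumerate the
# accelerator device nodes that libtpu talks to. Empty on non-TPU boxes.
import glob

def accel_devices():
    """Return the /dev/accel* device nodes, sorted."""
    return sorted(glob.glob("/dev/accel*"))

print(accel_devices())   # on a TPU VM: ['/dev/accel0', ..., '/dev/accel3']
```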

> I don't think I could run, say, a Linux Desktop app with a GUI (falls under "whatever the heck I want")

You can!

> on a TPU if I wanted to, correct?

You should want to! It's easy!

> But, in case I could,

You can! (Sorry for being super annoying; I'm just so excited that it's finally possible. I've waited years...)

> how would I get that kind of direct / low level access to it?

SSH in, then use libtpu for low-level access via C APIs, or jax in python for high-level convenience.

> Are they just giving you a pointer to your instance and you get complete control?

I get total control. I've never once felt like "Oh, that's weird... it blew up. It works on a regular Ubuntu server. Must be some unfortunate TPU corner case..."

It's the opposite. Everything works by default, everything is candy and unicorns and rainbows, and hacking on all of this stuff has been the best damn two years of my life.

Now, I'll calm down and make sure I'm answering your questions properly. The truth is, I'm not quite sure what "complete control" means. But if I wanted to, I could SSH in right now and set up an instance of Hacker News, and then expose it to the world. Hell, I'll just do that:

https://tpucity.gpt4.org/item?id=1

That took like ... 10 minutes to set up. (It's also the world's shittiest HN instance. I'll shut it down soon.)

Here's an archive url:

https://web.archive.org/web/20210704132824/https://tpucity.g...

So yes. You have total control. And as I say there:

> This is such a stupid demo. But suffice to say, if you can get Lisp running on a TPU, you can get anything to run.

> Theoretically, arc could use the MXU cores to accelerate its numerical operations, thanks to libtpu.

Have fun.

DM me on twitter if you run into any roadblocks whatsoever: https://twitter.com/theshawwn (Happy to help with anything; even basic questions are more than welcome.)




Hey man, thanks! HN, every once in a while, is a magical place :)

> Man, I suck at ELI5s.

Nah, I enjoyed reading this. I got it now.



