
I don't get your excitement. How is this different from using an 8x GPU box? With eight Quadro 8000 cards you have access to 384GB of memory to train your models.



Mostly because TPUs are within reach of hobbyists. After all, they run on Colab for free.

In a business context, TPUs seem far cheaper. A preemptible TPUv2-8 only costs $1.35/hr. It looks like 8x Quadro 8000's would cost >$40k.


Colab is great, can’t argue with free, but in a business context if you look here https://cloud.google.com/tpu/pricing#pricing_example_using_a...

the TPU equivalent of 8x Quadro 8000 would be something between a TPU v2-32 and a TPU v3-32, and the monthly cost of a TPU v2-32 is ~$8k, plus the cost of a beefy VM. Assuming the GPU build sets you back ~$60k, it pays for itself after about 6 months and saves you ~$8k/mo from then on.
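
Back-of-envelope on that break-even point (a rough sketch; the ~$2k/mo for the host VM is my own guess, not a quoted price):

    # Rough break-even math for GPU capex vs. renting a TPU v2-32.
    gpu_capex = 60_000          # ~$60k for the 8x Quadro 8000 build
    tpu_monthly = 8_000         # ~$8k/mo for a TPU v2-32 (on-demand)
    vm_monthly = 2_000          # assumed cost of the "beefy VM" (my guess)

    breakeven_months = gpu_capex / (tpu_monthly + vm_monthly)
    print(breakeven_months)     # 6.0 -> after ~6 months the GPU box is cheaper

(Preemptible or committed-use TPU pricing would push that break-even point further out.)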


A single TPUv2-8 matches 8x Quadro 8000 in terms of available memory. (Sort of; the available memory is 300GB, whereas for 8x Quadro 8000 it's 384GB.)

TPU pods actually don't require a beefy VM; I'm using a 2GB RAM one.


In the link I posted: tpu v2-8 has 64GB of total memory, v2-32 has 256GB.
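
(For reference, those numbers are just per-core HBM added up; a quick sketch of the arithmetic:)

    # Where the 64GB / 256GB figures come from: TPU v2 has 8GB of HBM per core.
    hbm_per_core_gb = 8
    print(8 * hbm_per_core_gb)    # v2-8:  64 GB of HBM
    print(32 * hbm_per_core_gb)   # v2-32: 256 GB of HBM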

As for the beefy VM: can you do heavy data preprocessing on TPUs? For example, elastic distortions or scaling for images? Probably not, because that usually involves OpenCV or similar libraries.


The link is talking about per-core memory. A TPUv2-8 has 300GB system memory, which you can use for training. You can verify this using the notebooks above.

(If a TPUv2-8 has 64GB memory, how can it fine tune GPT-2 1.5B using Adam with batch size 4? That requires almost 300GB.)


This is interesting. Is there an official specification clarifying this somewhere? Where’s this 300GB of memory physically located?

Are you paying on-demand or preemptible prices? Have you tried larger pod slices to see if they have even more of this “system memory”?


Yeah, I've seen pod slices allocate 7TB.

A TPUv3 pod is actually a bunch of individual TPUv3-8's linked together. There are 8 cores per device, so a TPUv3-512 has 512 cores divided by 8 cores per device = 64 individual TPUs. (You can get each individual TPU's IP address using `gcloud compute tpus list`: https://imgur.com/Qym4l17)

The big question is, since there are 64 individual TPUs, does that mean we have access to 300GB * 64 = 19.2 TB of memory?

I haven't tested that, but I would bet the answer is yes, for two reasons. 1. I've seen allocations of up to 7TB according to memory usage logs, so 19TB doesn't seem far fetched in comparison. 2. If you create 64 individual TPUv3-8's, then you definitely will have access to 300GB of memory on each TPU, so it's the same engineering problem either way.
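
To make the arithmetic concrete (a minimal sketch; the 300GB-per-device figure is the number observed above, not an official spec):

    # How much host memory a TPUv3-512 slice might expose, if each
    # underlying TPUv3-8 device really has ~300GB of system memory.
    cores = 512
    cores_per_device = 8
    mem_per_device_gb = 300     # observed on a TPUv3-8, not documented

    devices = cores // cores_per_device          # 64 individual TPUs
    total_mem_tb = devices * mem_per_device_gb / 1000
    print(devices, total_mem_tb)                 # 64 devices, ~19.2 TB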

Right now, people only seem to use the TPU's CPU for infeed processing / input pipeline transformations. But the CPU is quite fast – it's almost as fast as an actual TPU core.
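
(To the earlier question about preprocessing: image ops like scaling or random flips can be expressed with tf.image inside the tf.data input pipeline, which runs on the host CPU rather than the TPU cores. A minimal sketch, assuming TF 2.x; the dummy dataset and sizes are made up:)

    import tensorflow as tf

    # Dummy dataset standing in for decoded (image, label) pairs.
    images = tf.random.uniform([16, 300, 300, 3])
    labels = tf.zeros([16], dtype=tf.int32)
    ds = tf.data.Dataset.from_tensor_slices((images, labels))

    def preprocess(image, label):
        # These ops run on the TPU host CPU as part of the infeed pipeline,
        # not on the TPU cores themselves.
        image = tf.image.resize(image, [224, 224])
        image = tf.image.random_flip_left_right(image)
        image = tf.cast(image, tf.bfloat16) / 255.0
        return image, label

    ds = (ds.map(preprocess, num_parallel_calls=tf.data.experimental.AUTOTUNE)
            .batch(8, drop_remainder=True)
            .prefetch(tf.data.experimental.AUTOTUNE))

Something genuinely custom like elastic distortions would take more work, but plain scaling/augmentation doesn't need OpenCV.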

I wrote up some more about this in a tweet chain if you're interested: https://twitter.com/theshawwn/status/1223395022814339073

Also, if you want to play around with a few TPUv3-8's and you have a GCE project, feel free to DM me on twitter. We just figured out how to forward TPUs to VMs in different projects: https://twitter.com/theshawwn/status/1221241517626445826
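
Once you have access, pointing TensorFlow at a remote TPU is just the standard resolver dance; a minimal sketch, assuming TF 2.x and a made-up grpc address (this isn't the cross-project forwarding trick itself, which is in the tweet):

    import tensorflow as tf

    # Hypothetical address; in practice it comes from `gcloud compute tpus list`.
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(
        tpu='grpc://10.240.1.2:8470')
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)

    with strategy.scope():
        # Placeholder model just to show where your training code goes.
        model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
        model.compile(optimizer='adam', loss='mse')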

> Is there an official specification clarifying this somewhere?

Not that I've seen. I stumbled across it by accident. https://twitter.com/theshawwn/status/1163799288771698688


So you're saying the system memory is 300GB and you can train your model on the CPU instead? Well, yeah, you can always do that, but training will be slow because your model isn't being trained on the GPU. What's the point?


It's not that slow. And you can use many TPUs together to make up the speed difference.


If that were the case, why would anyone buy GPUs? I invite you to retrain a state-of-the-art model of your choice on a CPU and see how far you get.


We fine-tuned GPT-2 1.5B for subreddit simulator using this technique. https://www.reddit.com/r/SubSimulatorGPT2Meta/comments/entfg...



