kristjansson on April 11, 2022 | on: DeepMind’s New Language Model, Chinchilla

To elaborate on the sibling comment: main memory is much bigger than GPU memory, but CPUs are much, much slower than GPUs at this kind of work. It would be a challenge to merely run a model like this on a CPU, and totally infeasible to train one. So the challenge is to fit the model into the memory of a single GPU you can afford, coordinate multiple GPUs, or efficiently page from main memory into GPU memory.
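
To put rough numbers on the "fit into a single GPU" problem, here is a back-of-envelope sketch. Everything in it is an assumption for illustration: the 70B parameter count is the figure reported for Chinchilla, the 80 GiB card is a hypothetical accelerator, 2 bytes per parameter assumes fp16 weights, and 16 bytes per parameter is a common rule of thumb for mixed-precision Adam training (fp16 weights and gradients, plus fp32 master weights and two fp32 optimizer moments). It ignores activations and any other runtime buffers, so real requirements are higher.

```python
import math

# All constants below are illustrative assumptions, not measured values.
N_PARAMS = 70e9        # assumed parameter count (reported size of Chinchilla)
GPU_MEM_GIB = 80       # assumed memory of one hypothetical accelerator
INFERENCE_BYTES = 2    # fp16 weights only
TRAINING_BYTES = 16    # fp16 weights + fp16 grads + fp32 master copy + 2 fp32 Adam moments


def weights_gib(n_params: float, bytes_per_param: int) -> float:
    """Memory needed for per-parameter state, in GiB (activations not included)."""
    return n_params * bytes_per_param / 2**30


def gpus_needed(n_params: float, bytes_per_param: int, gpu_mem_gib: float) -> int:
    """Minimum number of GPUs whose combined memory holds the per-parameter state."""
    return math.ceil(weights_gib(n_params, bytes_per_param) / gpu_mem_gib)


if __name__ == "__main__":
    for label, nbytes in [("inference", INFERENCE_BYTES), ("training", TRAINING_BYTES)]:
        gib = weights_gib(N_PARAMS, nbytes)
        count = gpus_needed(N_PARAMS, nbytes, GPU_MEM_GIB)
        print(f"{label}: {gib:.0f} GiB of state, at least {count} GPUs of {GPU_MEM_GIB} GiB")
```

Under these assumptions the weights alone do not fit on one card even for inference, and the training state spills across more than a dozen, which is why the options come down to a bigger GPU, several coordinated GPUs, or paging from main memory.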