To elaborate on the sibling comment: main memory is much bigger than GPU memory, but CPUs are much, much slower than GPUs at this kind of workload. It would be a challenge to merely run a model like this on a CPU, and totally infeasible to train one. So the challenge is to fit the model into the memory of a single GPU you can afford, coordinate multiple GPUs, or efficiently page from main memory into GPU memory.
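
For a rough sense of the numbers, here is a back-of-the-envelope sketch of the "does it fit" question. Everything specific in it is my assumption, not something from the thread: the 6B parameter count, the 24 GiB GPU, the 128 GiB of main memory, and the helper names are all hypothetical. It counts only weights for inference (fp16) and weights, gradients, and two Adam moment buffers for training (fp32), and ignores activations, so real usage would be higher.

```python
# Back-of-the-envelope memory estimate. All sizes are illustrative assumptions.
GIB = 1024 ** 3


def inference_bytes(n_params, bytes_per_param=2):
    """Weights only (fp16 by default); ignores activations and caches."""
    return n_params * bytes_per_param


def training_bytes(n_params, bytes_per_param=4):
    """fp32 weights + gradients + two Adam moment buffers; ignores activations."""
    return n_params * bytes_per_param * 4


def fits(required_bytes, device_gib):
    """True if the estimate fits in a device with device_gib GiB of memory."""
    return required_bytes <= device_gib * GIB


n_params = 6_000_000_000  # hypothetical 6B-parameter model
gpu_gib = 24              # hypothetical single GPU
ram_gib = 128             # hypothetical main memory

for label, need in [("inference", inference_bytes(n_params)),
                    ("training", training_bytes(n_params))]:
    print(f"{label}: {need / GIB:.1f} GiB, "
          f"fits GPU: {fits(need, gpu_gib)}, fits RAM: {fits(need, ram_gib)}")
```

Under these assumed numbers, training state fits in main memory but not on the single GPU, which is exactly the situation where you end up choosing between multiple GPUs and paging.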