You can think of a parameter as a number you can tweak while training. This netw...

sirk390 · on April 11, 2022

And if every parameter is one byte, the minimum, it will take at least 70gb to save or share this model. So it's still way to big to package directly in a app.

cshimmin · on April 11, 2022

From the paper, they are using bfloat16, so I guess two bytes. But distributing and "packaging into an app" are not at all of practical interest for these kinds of models. You (a consumer) would interact via some API service, with the model running on a hardware-accelerated compute cloud.

In any case, during training (where the model is run in possibly large batches), and even during inference, the size of the parameters is completely dwarfed by the intermediate tensor representations.

brrrrrm · on April 11, 2022

> even during inference, the size of the parameters is completely dwarfed by the intermediate tensor representations

What makes you say this?

cshimmin · on April 11, 2022

It's especially true for models that do some kind of weight sharing, which is very common (CNNs, RNNs, transformers, etc). For a concrete example, consider a layer from an image convolutional network, which maps from a 3-dim colorspace to a 128-dim feature space. Assuming a 5x5 kernel that's about 10k parameters. However, after applying this layer, you go from having an (B,H,W,3) tensor to a (B,H-4,W-4,128) tensor, where H,W are the height and width of the image, and B is the number of images in the batch. If you're working with even moderately high resolution images, the memory required for these intermediate tensors at each layer is much larger than the parameters.

Something similar applies for RNNs (same weights applied at each element of a sequence), GNNs and transformers (same weights applied at each pair of data).

lostmsu · on April 11, 2022

Have you seen modern games?

sva_ · on April 11, 2022

I doubt they load that amount of data in memory

replygirl · on April 11, 2022

I'm thinking about upgrading from 64gb to 128gb so i can use all my Cities: Skylines assets in the same map

lostmsu · on April 11, 2022

Right, they usually stream assets as they are requested. Large models do the same.