
You can think of a parameter as a number you can tweak while training. This network has 70B such numbers.



And if every parameter is one byte, the minimum, it will take at least 70 GB to save or share this model. So it's still way too big to package directly in an app.


From the paper, they are using bfloat16, so I guess two bytes. But distributing and "packaging into an app" are not at all of practical interest for these kinds of models. You (a consumer) would interact via some API service, with the model running on a hardware-accelerated compute cloud.
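To make the arithmetic concrete, here is a rough sketch of the weight storage at one and two bytes per parameter. The helper name `model_size_gb` is made up for illustration, and it counts only the raw weights in decimal gigabytes (no optimizer state, no file-format overhead).

```python
def model_size_gb(num_params: int, bytes_per_param: int) -> float:
    """Storage for the raw weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 70_000_000_000  # 70B parameters, as stated in the thread

size_one_byte = model_size_gb(NUM_PARAMS, 1)  # one byte per parameter
size_bf16 = model_size_gb(NUM_PARAMS, 2)      # bfloat16 is two bytes per parameter

print(f"1 byte/param:  {size_one_byte:.0f} GB")
print(f"2 bytes/param: {size_bf16:.0f} GB")
```

So with bfloat16 the weights alone come to roughly twice the one-byte estimate.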

In any case, during training (where the model is run in possibly large batches), and even during inference, the size of the parameters is completely dwarfed by the intermediate tensor representations.


> even during inference, the size of the parameters is completely dwarfed by the intermediate tensor representations

What makes you say this?


It's especially true for models that do some kind of weight sharing, which is very common (CNNs, RNNs, transformers, etc.). For a concrete example, consider a layer from an image convolutional network that maps from a 3-dim colorspace to a 128-dim feature space. Assuming a 5x5 kernel, that's about 10k parameters (5 x 5 x 3 x 128 = 9,600 weights, plus 128 biases). However, after applying this layer, you go from having a (B,H,W,3) tensor to a (B,H-4,W-4,128) tensor, where H,W are the height and width of the image, and B is the number of images in the batch. If you're working with even moderately high resolution images, the memory required for these intermediate tensors at each layer is much larger than the parameters.

Something similar applies to RNNs (same weights applied at each element of a sequence), GNNs and transformers (same weights applied at each pair of data).
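The convolution example above can be checked with a few lines of arithmetic. This is only a sketch: the batch size of 1 and the 512x512 image are assumed values picked for illustration, the function names are hypothetical, and the layer is taken to be an unpadded, stride-1 convolution with a bias term.

```python
def conv_param_count(in_ch: int, out_ch: int, k: int, bias: bool = True) -> int:
    """Number of weights (plus optional biases) in a k x k convolution layer."""
    return k * k * in_ch * out_ch + (out_ch if bias else 0)


def conv_output_elements(batch: int, h: int, w: int, out_ch: int, k: int) -> int:
    """Element count of the output of an unpadded, stride-1 k x k convolution."""
    return batch * (h - k + 1) * (w - k + 1) * out_ch


# Layer from the comment: 3 input channels, 128 output channels, 5x5 kernel.
params = conv_param_count(in_ch=3, out_ch=128, k=5)

# Assumed input for illustration: one 512x512 image (B=1, H=W=512).
activations = conv_output_elements(batch=1, h=512, w=512, out_ch=128, k=5)

print(f"parameters:          {params:,}")
print(f"output activations:  {activations:,}")
print(f"activations / params: {activations / params:,.0f}x")
```

Under these assumptions the single output tensor has a few thousand times as many elements as the layer has parameters, and that ratio grows linearly with the batch size.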


Have you seen modern games?


I doubt they load that amount of data in memory


I'm thinking about upgrading from 64 GB to 128 GB so I can use all my Cities: Skylines assets in the same map


Right, they usually stream assets as they are requested. Large models do the same.



