Welcome to the party! I joined ML because I realized I could help. You can too. I bet you’re both already thinking of clever ways to deal with massive models from an infrastructure standpoint. That’s just one of hundreds of interesting problems.
To elaborate on the sibling comment: main memory is much bigger, but CPUs are much, much slower. It would be a challenge to merely run a model like this on CPU, and totally infeasible to train one. So the challenge is to fit the model into the memory of a single GPU you can afford, coordinate multiple GPUs, or efficiently page weights from main memory into GPU memory.
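To put rough numbers on that, here's a back-of-envelope sketch in Python (the 175B parameter count is just an assumed example, and this counts only the raw weights, not activations or optimizer state):

    # Rough memory needed just to hold the weights of a large model.
    def weight_memory_gb(n_params_billion: float, bytes_per_param: int) -> float:
        # billions of params * bytes each = GB directly
        return n_params_billion * bytes_per_param

    for precision, nbytes in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
        print(f"175B params @ {precision}: {weight_memory_gb(175, nbytes):.0f} GB")
    # fp32: 700 GB, fp16: 350 GB, int8: 175 GB -- far more than the
    # 12-24 GB on a typical consumer GPU, hence the sharding and paging.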
Is there any source that explains what billions of parameters actually are?
In my mind a parameter would be something like: language, dialect, perhaps context parameters (food, dinner, lunch, travel), and if we then talk about language plus audio, perhaps sound waves or gender.
Or are they context parameters that give you insight? Like, are a billion parameters literally something like travel=false, travel-europe=true, people-speaking=..., age, height, ...?
Parameters are just floating-point numbers. At most they can be seen as degrees of freedom, a bit like the coefficients of a polynomial used in curve fitting.
They're too abstract to assign much meaning to individual parameters, as our understanding of why their values are exactly the way they are is extremely limited.
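To make the curve-fitting analogy concrete, here's a tiny sketch (the degree-5 polynomial and the sine data are arbitrary choices):

    import numpy as np

    x = np.linspace(0, 1, 50)
    y = np.sin(2 * np.pi * x) + 0.1 * np.random.randn(50)  # noisy samples

    coeffs = np.polyfit(x, y, deg=5)  # a degree-5 fit has 6 parameters
    print(coeffs)  # six floats, meaningless individually; an LLM is the
                   # same idea scaled up to billions of them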
A parameter is a "weight" in this case (one of the lines drawn from neuron to neuron in the usual diagrams). The neurons are effectively runtime values, or "activations." Parameters (weights) are updated during training and then held constant during "inference" (also called "prediction").
There's unfortunately a ton of jargon, and different groups use different words for the same things.
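A minimal sketch of that weight/activation split (assuming PyTorch; the layer sizes are arbitrary):

    import torch
    import torch.nn as nn

    layer = nn.Linear(4, 3)  # 4*3 weights + 3 biases = 15 parameters
    print(sum(p.numel() for p in layer.parameters()))  # 15, learned during
                                                       # training, then frozen

    x = torch.randn(1, 4)    # some input
    print(layer(x))          # activations: runtime values, recomputed per input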
A parameter is a scalar value; in a transformer, most of them live in the attention matrices and feed-forward matrices. You'll also hear them called "weights." Any intro to DL course will cover these in detail. I recommend starting with Andrew Ng's Coursera class on Intro to Machine Learning, although there may be better ones out there now.
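For a rough sense of where the counts come from, here's a back-of-envelope sketch (d_model and d_ff are assumed example sizes; it ignores biases, embeddings, and layer norms):

    # Parameters in one transformer block's big matrices.
    d_model, d_ff = 4096, 16384

    attention = 4 * d_model * d_model   # Q, K, V, and output projections
    feedforward = 2 * d_model * d_ff    # up- and down-projection matrices

    print(f"attention:   {attention:,}")    # 67,108,864
    print(f"feedforward: {feedforward:,}")  # 134,217,728
    # ~200M scalars per block; ~50 such blocks gets you to ~10B parameters.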