Welcome to the party! I joined ML because I realized I could help. You can too. I bet you’re both already thinking of clever ways to deal with massive models from an infrastructure standpoint. That’s just one of hundreds of interesting problems.
To elaborate on the sibling comment: main memory is much bigger, but CPUs are much, much slower. It would be a challenge to merely run a model like this on CPU, and totally infeasible to train one. So the challenge is to fit the model into the memory of a single GPU you can afford, coordinate multiple GPUs, or efficiently page weights from main memory into GPU memory.
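To put rough numbers on that, here's a back-of-envelope sketch in Python (the 175B parameter count is just an assumed example, and this counts only the raw weights, not activations or optimizer state):

    # Rough memory needed just to hold the weights of a large model.
    def weight_memory_gb(n_params_billion: float, bytes_per_param: int) -> float:
        # billions of params * bytes each = GB directly
        return n_params_billion * bytes_per_param

    for precision, nbytes in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
        print(f"175B params @ {precision}: {weight_memory_gb(175, nbytes):.0f} GB")
    # fp32: 700 GB, fp16: 350 GB, int8: 175 GB -- far more than the
    # 12-24 GB on a typical consumer GPU, hence the sharding and paging.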
Is there any source that explains what billions of parameters actually are?
In my mind a parameter would be something like: language, dialect, perhaps context parameters (food, dinner, lunch, travel), and if we then talk about language plus audio, perhaps sound waves or gender.
Or are they context parameters that give you insight? Like, are a billion parameters literally something like travel=false, travel-europe=true, people-speaking=..., age, height, ...?
Parameters are just floating-point numbers. At most they can be seen as degrees of freedom, a bit like the coefficients of a polynomial used in curve fitting.
They're too abstract to assign much meaning to individual parameters, as our understanding of why their values are exactly the way they are is extremely limited.
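To make the curve-fitting analogy concrete, here's a tiny sketch (the degree-5 polynomial and the sine data are arbitrary choices):

    import numpy as np

    x = np.linspace(0, 1, 50)
    y = np.sin(2 * np.pi * x) + 0.1 * np.random.randn(50)  # noisy samples

    coeffs = np.polyfit(x, y, deg=5)  # a degree-5 fit has 6 parameters
    print(coeffs)  # six floats, meaningless individually; an LLM is the
                   # same idea scaled up to billions of them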
A parameter is a "weight" in this case (one of the lines drawn from neuron to neuron in the usual diagrams). The neurons are effectively runtime values, or "activations." Parameters (weights) are updated during training and then held constant during "inference" (also called "prediction").
There's unfortunately a ton of jargon, and different groups use different words for the same things.
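A minimal sketch of that weight/activation split (assuming PyTorch; the layer sizes are arbitrary):

    import torch
    import torch.nn as nn

    layer = nn.Linear(4, 3)  # 4*3 weights + 3 biases = 15 parameters
    print(sum(p.numel() for p in layer.parameters()))  # 15, learned during
                                                       # training, then frozen

    x = torch.randn(1, 4)    # some input
    print(layer(x))          # activations: runtime values, recomputed per input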
A parameter is a scalar value; in a transformer, most of them live in the attention matrices and feed-forward matrices. You'll also hear them called "weights." Any intro to DL course will cover these in detail. I recommend starting with Andrew Ng's Coursera class on Intro to Machine Learning, although there may be better ones out there now.
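For a rough sense of where the counts come from, here's a back-of-envelope sketch (d_model and d_ff are assumed example sizes; it ignores biases, embeddings, and layer norms):

    # Parameters in one transformer block's big matrices.
    d_model, d_ff = 4096, 16384

    attention = 4 * d_model * d_model   # Q, K, V, and output projections
    feedforward = 2 * d_model * d_ff    # up- and down-projection matrices

    print(f"attention:   {attention:,}")    # 67,108,864
    print(f"feedforward: {feedforward:,}")  # 134,217,728
    # ~200M scalars per block; ~50 such blocks gets you to ~10B parameters.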