
Unlike biplanes, CPUs with more transistors are more powerful than those with fewer. And adding more CPU cores keeps increasing the number of threads you can run at the same time.

Why would LLMs be more like the biplane analogy, and less like the CPU analogy?




In general you can view "understanding" as a compression of information. You take in a bunch of information, detect an underlying pattern, and remember the pattern and necessary context, instead of the entire input.
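As a toy sketch of what "compression" means here (the sequence and the rule are made up purely for illustration):

  # Instead of memorizing every value, store the rule that generates them.
  raw_data = [3 * n + 1 for n in range(1000)]   # 1000 stored numbers

  def pattern(n):
      # the "understood" rule: one line replaces the whole list
      return 3 * n + 1

  assert all(pattern(n) == raw_data[n] for n in range(1000))
  print(pattern(5000))  # 15001 -- an input that was never stored

The rule is a far smaller description than the raw list, and it also answers questions the raw list never contained.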

The "problem" with larger neural networks is that they can store more information, so they can substitute understanding with memorization. Something similar happens with human students, who can stuff lots of raw information into short-term-memory, but to fit it into the much more precious long-term-memory you have to "understand" the topic, not just memorize it. In neural networks we call that memorization a failure to generalize. Just like a human, a network that just memorizes doesn't do well if you ask it about anything slightly different than the training data.

Of course it's a balancing act, because a network that's too small doesn't have enough capacity to store the "understanding" and world model it needs. A lot of the original premise of OpenAI was to figure out whether LLMs keep getting better as you make them bigger, and so far that has held up. But there is bound to be a ceiling, where making the model bigger starts making it dumber.



