John Carmack on the Similarity of Human Learning and LLMs Training (twitter.com/id_aa_carmack)
62 points by amrrs on April 7, 2023 | 75 comments



Humans are not exposed to "billions of words". They are exposed to continuous signals, and somehow extract meaningful representations, like "words" and "objects", out of them.

LLMs are exposed to human-curated data. Let's see an LLM curate its own data out of nothing but experience of raw continuous signals.

Sure, GPT trained on the internet is a nice way to condense and retrieve human-curated data. But it's not going to give us a Lt. Data or C-3PO that can learn and adapt in real time.


> Humans are not exposed to "billions of words" ... continuous signals

Yes humans need to learn to extract words from signals, but that does not mean it's not correct to count the number of words they've been exposed to in their lifetime. Your comment is like saying we can't count the number of hamburgers someone has eaten, because they actually eat myofibrillar proteins, not burgers.


Though millions seems more realistic (thousands of words per day)


> Humans are not exposed to "billions of words". They are exposed to continuous signals, and somehow extract meaningful representations, like "words" and "objects", out of them.

There are plenty of studies showing long-term academic achievement differences based on the number of unique words babies are exposed to.

And I promise you, having a baby/toddler is really damn close to doing data labeling. Reading picture books, you are basically labeling objects. Walking around the grocery store, you are labeling objects. All the time, again and again, and the same object will get labeled repeatedly, and sometimes it will even be fact checked. A toddler will point to something that they know is an orange, ask "apple?" and you had sure as hell better reply "orange".


>having a baby/toddler is really damn close to doing data labeling

It's analogous, sure, but not really in the same ballpark. There is no AI system today that you can teach by pointing to things and making noises with your mouth. Why not? What's missing? An awful lot.


Is anything missing?

We've got gesture recognition, so we can do pointing; and that part of the system taken in isolation is not made any harder when the label for an image happens to be from, say, a latent space vector from the interior of an unrelated GAN whose sole purpose is to turn audio input into the specific set of muscle outputs which cause the vocal tract to produce output that sounds the same as a recently heard noise.


Why not?

Humans learn based on human curated data too.

School is not natural. Our whole environment isn't either. Babies can't survive on their own.

I would even go so far as to say that the potential model an LLM would create internally might not be that far from that of a human.

And Segment Anything was just announced. The performance of zero-shot systems is tremendous.

It's not far-fetched to assume that ChatGPT combined with Segment Anything would allow it to create an even more accurate model of the world.


> Humans learn based on human curated data too.

> School is not natural. Our whole environment isn't either. Babies can't survive on their own.

A human goes to school to learn.

Humans in general learned based on experience of the world around them. We invented language, no one taught it to us. We learned to make fire, forge tools, cook food, practice medicine, etc on our own.

School is just how we pass down that learning.


If I took you and pitched you on an island as an exceptionally young child with no further training, most likely you would die. Even more likely you wouldn't have fire and tools. A high proportion of our behaviors can be traced as a continuous learning chain going back eons.


https://en.wikipedia.org/wiki/Feral_child

A lot of these children had trouble with learning language after they returned to human civilization.


Humanity invented those things.

Individual humans were taught those things, either directly or indirectly by observing / listening to others.

We do learn through our experience, of course. But most of what we learn is from others in one form or another.


There's exploration and exploitation. Exploration (discovering fire and learning to create/harness it, inventing language) is slow and expensive. The current focus isn't on replicating human exploration with AI, though, I don't think (though it may become the focus later). Daily stories about AI are mostly about "We got AI to do this thing humans do (sorta)!", which is much more focused on exploiting existing knowledge, and this is true with schoolkids as well, at least to start. Exploitation of existing knowledge is much faster and easier (though still expensive!): load up what we've all learned so far so we can continue to make progress. I think this is what Carmack is referring to. I don't think anyone is saying ChatGPT could discover fire, or cook food.

I am of the opinion that we'll know when we have real AGI because it'll be able to fold my laundry. One can dream.

On the other hand, I'm not so sure an LLM couldn't invent a language, particularly one to use with another AI. It's sci-fi from 1970, but I just rewatched "Colossus: The Forbin Project"[0] recently which features this as a plot point. Really enjoyed it years ago when I first saw it, but it's improved with age (at least for me) now that we are closer to AI.

[0]: https://www.imdb.com/title/tt0064177/


But humans are also trained on human-curated data, so much so that we put fairly hefty price tags on that curation.

https://miro.medium.com/v2/resize:fit:1056/0*E1eNateTiDThGcY...


We've only just started. Now more capital and more minds will be pouring into solving this.

Buckle up.


I sometimes like to imagine the umwelt of an LLM. Tokens are their sensory atoms, and the statistical relationships between them form their entire reality.

Of course, one could easily build a model whose tokens line up with qualia we experience.


Maybe, but I've been told by folks (and, curiously, by LLMs themselves) that LLMs are ontologically incapable of having a what-is-it-like-to-be-ness.


Every time it comes up in discussion, I find the same patterns and the same conversations.

Nobody really knows what is necessary or (let alone and) sufficient for having a what-is-it-like-to-be-ness.

Lots of us claim we know, but whenever I've asked for details, it's always been some combination of

(α) circular, e.g. "consciousness is sentience, sentience is consciousness";

(β) something that's so vague it accidentally includes VCRs because they can record information that can be recalled later;

and/or (γ) something that doesn't include all humans (most often by excluding those with aphantasia and/or amnesia, but not only them).

So… perhaps any given LLM does, and perhaps it doesn't. Perhaps the structure can support qualia, but only actually gets them when the training process passes the analog of a thermodynamic phase change; or perhaps, even if that's the right general idea, qualia are a different phase change that this model can't support.

Unfortunately, I can't find that phase change analogy with a quick web search, so I'm not sure if that analogy has ever really been made before or if it was just a dream I've misfiled as a memory of reality.


Thar be dragons here, but here’s a nice overview by Chalmers:

https://youtu.be/-BcuCmf00_Y

Personally, I'm inclined to believe that consciousness requires an integrated attention-self-world model, but maybe not much more than that.


Chalmers? Yup, I'm definitely adding that to my Watch Later collection.

Thanks!


No matter what your ontological commitments happen to be, imagining what an LLM's umwelt might be like, and how it might approximate ours, is a useful (and fun!) exercise.


Easily? In principle yes, but we don't have the data for it. (Yet.)


Sure, easily. One example is vision: you'd just have to adjust the input preprocessing in existing models.
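
For concreteness, here's a minimal sketch of what "adjusting the input preprocessing" could mean: ViT-style patch embedding that turns an image into a sequence of token vectors a transformer could consume. The patch size, embedding width, and random projection are illustrative assumptions, not any particular model's pipeline.

    import numpy as np

    def image_to_tokens(image, patch=16, d_model=768, rng=np.random.default_rng(0)):
        # Split the image into non-overlapping patches and flatten each one.
        h, w, c = image.shape
        patches = (image.reshape(h // patch, patch, w // patch, patch, c)
                        .transpose(0, 2, 1, 3, 4)
                        .reshape(-1, patch * patch * c))
        # In a real model this projection is learned; random here for illustration.
        proj = rng.standard_normal((patch * patch * c, d_model)) * 0.02
        return patches @ proj  # (num_patches, d_model): one "token" per patch

    tokens = image_to_tokens(np.zeros((224, 224, 3)))
    print(tokens.shape)  # (196, 768)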


Also, LLMs are not AI on their own, just parts of an AI; you need a bunch of interfacing and injection components, the sum of which makes up the AI.

I see the progress of AI stagnating while people board the next equivalent of the crypto craze, because someone wants to profit financially off something that hasn't fully been realized.

However, it is not without merit, whereas crypto was completely without merit; this actually has something to show for itself.

As flawed and imperfect as ChatGPT is, it clearly mirrors its creators, and that is a compliment.


what do you think of midjourney's ability to describe a user-provided image in text? https://the-decoder.com/midjourney-new-image-tool-works-in-r...


I would wager $1,000 that far fewer words would be required by a model that trains on audio, video, and text. I bet an android-type machine that could move, interact, and train without curated data on everyday human interactions would improve on current LLMs.


The context is that John Carmack got $20M to try to make AGI by 2030, based on non-LLM methods. In this tweet he is boosting the idea that there is a gigantic learning gap between the statistical efficiency of human and LLM learning, even though there may not be such a huge gap in standardized test results of LLMs and humans. This means that he might yet find a cool idea that could justify the time and money he is spending to find secret knowledge of the ancients from like pre-1990s AI publications that didn't have GPU access. Maybe it will happen!

https://the-decoder.com/john-carmacks-general-artificial-int...


In a way, LLMs are the worst thing that could have happened to the search for artificial intelligence. Instead of the harsh winter of the 1990s, we are now entering a climate-changed winter: a warm, race-products-to-market winter.

Funnily enough, the Geoffrey Hinton of today is probably some symbolic-AI researcher, shouting in the desert that we need more than matmul, like Hinton shouted in the 1980s.

One interesting, biology-inspired mechanism would be quorum sensing [1]: a basal-cognition-like decision-making function in which decentralized systems (bacteria, cells) build functionality (sensing/decision) from the bottom up. There is no hint of this sort of mechanism in our current artificial 'neural networks'. Not that there has to be, but our cells, and cells in general, use these kinds of 'tricks' to solve problems in all kinds of spaces (transcriptomics, morphogenetics, etc.) without requiring ridiculous amounts of energy, time, or other resources.

[1] https://en.wikipedia.org/wiki/Quorum_sensing


LLMs may be terrible for the search for AGI, but that may be a good thing. If, mind you if, LLMs are always going to be semi-smart "parrots", then society will get a picture of what an AI might be without the AI being able to pull off the Skynet-style takeover that people are, perhaps justifiably, worried about.

Now, as far as whether LLMs or their offshoots can get to AGI, I know plenty of good arguments for them not being able to do that. I don't think the claim that they just keep accumulating more abilities with each iteration, in an inexplicable way, holds up. But I've lived long enough to know that you should never get too cocky when arguing with success. So maybe.

Moreover, given that LLM programming is basically just bucket chemistry, if an LLM can gain the ability to competently pursue long-term goals, it seems like there's a good chance some of its goals will be random cruft that makes it quite dangerous.


When some hybrot [1] will write the history of the 2000s, after AES 256 is broken, we will probably find out that the Skynet takeover already happened, it just wasn't as dramatic as a James Cameron movie: it was a takeover through financialization (with enough money you can rewrite the past, so Mr. Cameron was right on the money about the time machine), it involved some suits, a glass building, and a generally apathetic, divided by design society: "as of 2020, Aladdin managed $21.6 trillion in assets" [2]. Nowadays, Mr. Sam Altman, who probably has good intentions, is speaking of OpenAI and $100 trillion [3] being unleashed into the market. There will be people wanting those $100+ trillion for themselves, and our current system is being designed by those kinds of people, biased towards and fostering their success (Blackrock, Vanguard, State Street currently barely have $30 trillion under management, with world GDP being at $101.5 trillion in 2022 [4]).

Equating LLMs to parrots is offensive, to the parrots. Well, maybe not parrots, but crows are intelligent: "Scientists [5] demonstrate that crows are capable of recursion—a key feature in grammar. Not everyone is convinced" [6].

[1] https://en.wikipedia.org/wiki/Hybrot

[2] https://en.wikipedia.org/wiki/Aladdin_(BlackRock)

[3] https://www.nytimes.com/2023/03/31/technology/sam-altman-ope...

[4] https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nomi...

[5] "Recursive sequence generation in crows", https://www.science.org/doi/10.1126/sciadv.abq3356

[6] https://www.scientificamerican.com/article/crows-perform-yet...


Carmack is no doubt brilliant, and it shows through Oculus as a product. But AGI is not about algorithm optimization the way 3D graphics were in the 90s. I am not sure the approach he takes (let's find a way to write this in assembly) can apply here. He's said before that AGI is ultimately going to be a few thousand lines of code [1], and it's hard to believe that's really the case.

That said, I did get a chance to speak with him one on one last year, and he really emphasized a few things: the need to be product-oriented and give customers what they want rather than chasing cool engineering (über)solutions; not being careless with resources just because we have more power with modern hardware (he poked fun at React, where you spin up a new thread just for an interactive button); and being aware of the inefficiencies brought on by effectively infinite resources (number of engineers and/or funding, which make you think less critically about timelines and delivering within bounded means).

[1] https://youtu.be/I845O57ZSy4


>> But AGI is not about algorithm optimization the way 3D graphics were in the 90s

Wow, you think his ability is straight performance optimisation? A big part of his early fame came from doing research to find good algorithms to achieve his goals. He also had that stint in real-time control systems... flying hovering rockets before SpaceX even existed.

He's a problem solver with a strong ability to sift through possible solutions for what actually works, and he's quite capable of devising his own solutions when his research comes up empty.

But yeah, he can write assembler too.


Many problems can be cast as optimization problems; can the problem of AGI be cast as one? Let's let Carmack answer.


The guy also tried to build a rocket to reach space via "first principles." It failed, utterly.


It did the thrust vectoring thing. The rocket was ahead of the competition for a moment.

He said games were one of the most complex things humans build and (with implied comparison) the mathematics and physics of rocketry hadn't changed much since the 1960s. Sounds true to me for the 2000s.

Where was the utter failure and first principles?


They had a low budget relative to most of the others and reached break-even profitability but he moved on to other stuff. Where is the trying to reach space by "first principles" quote from?


It's definitely good not to put all our eggs in the LLM basket. Even OpenAI says it's only a part of the solution, but it has scaled more than almost anyone expected. Will that continue forever and we keep getting more impressive emergent behaviors from it? I doubt that, but I could certainly be wrong.

It still seems to be missing any sense of what is true. I'm not sure if that's possible to embed in this kind of model, or if we'll just have some human-feedback hacks and eventually get a better model.


> Will that continue forever and we keep getting more impressive emergent behaviors from it? I doubt that

I feel like putting the word 'forever' ruins the point. It's the most extreme strawman.

It's amazing how the scaling has unlocked the emergent behaviors! When I look at the scaling graphs, I see that the ability to reduce 'perplexity' is continuing with scale and capital investment with no sign of slowing yet (it will slow eventually). I also see that reducing perplexity is continually unlocking new emergent behaviors. So I would guess that scaling will probably unlock so many more new emergent behaviors before it eventually plateaus!


That debate is becoming irrelevant. Sure, it would be great if GPT-6 cost $50k to train instead of $100M. But even if it costs $100M, it will produce many billions of dollars of value, so the original investment is still relatively small.


Yeah, but the difference between those two numbers is that one is accessible to individuals (albeit wealthy ones), while the other is accessible only to large-ish companies.


IMO, the capabilities have not improved that much between GPT-3 vs GPT-4, the primary difference is just in the size of the context the model can handle at any one time (Max tokens).

While I definitely get "longer" and slightly more "in-depth" answers from GPT-4 vs 3, it already feels like that capability growth curve is starting to plateau.


> IMO, the capabilities have not improved that much between GPT-3 vs GPT-4

I strongly disagree. Anyone who wants to look for themselves can see the GPT-4 technical report.

https://arxiv.org/pdf/2303.08774.pdf


Page 9 pretty much stopped me dead in my tracks. There's no shortage of humans, even technically-savvy ones, who wouldn't answer that question correctly.


That is incredibly impressive. Reminds me of these other examples from Microsoft’s “sparks of AGI” paper (summary link): https://ibragimov.org/2023/03/24/sparks-of-agi.html


The distinction may appear subtle in many cases, but in others GPT4 blows 3 away. E.g. recognizing when it doesn't know and admitting it's hypothesising is something I've run into with 4 but not even 3.5.

The capability curve will necessarily appear to plateau when the starting point is as good as it is now. The improvements we recognize will be subtler. Halving the remaining error rate will look less impressive for each step.


It depends how you define good. The capabilities of GPT-3.5/ChatGPT are not close to perfect.

ChatGPT is quite good at something like "putting together facts using logic": things like medical diagnosis or legal argument. However, if the activity is "reconciling a summary with details", ChatGPT is pretty reliably terrible. My general recipe is "ask for a summary of a work of fiction, then ask about the relationship of the summary to details you know in the work." It reliably spits out falsehoods in this situation.


I have yet to see GPT-4 admit to not knowing something rather than just hallucinating made-up shit.

If it could reliably tell me when it DOESN'T know something, I'd have a lot more respect for its capabilities. As it stands today, I'd feel I need to fact-check nearly anything it gave me if I'm in an environment that requires high levels of factual accuracy.

Edit: To be clear, I mean it telling me it doesn't know something BEFORE hallucinating something incorrect and being caught out on it by me. It will admit that it lied, AFTER being caught, but it will never (in my experience) state that it doesn't have an answer for something upfront, and will instead default to hallucinating.

Also - even when it does admit to lying, it will often then correct itself with an equally convincing, but often just as untrue "correction" to its original lie. Honestly, anyone who wants to learn how to gaslight people just needs to spend a decent amount of time around GPT-4.


Well, I have. It still certainly often goes the way you're describing as well, to the point that I was flabbergasted when it happened, but it did. Specifically, I asked it to give me examples of using an NPM module published after the knowledge cutoff.

[To be clear, I did not tell it it was from after the cutoff]


Is there any information available about the non-LLM methods Carmack's AGI startup is researching?


It seems likely that there is a big missing component of visual data. How many TB of visual data (effectively video) do babies get exposed to?

That is what sets up their mental model of the world, and language is fit over the top of that. It is hard to assess neural net efficiency with that difference in place.


> It seems likely that there is a big missing component of visual data. How many TB of visual data (effectively video) do babies get exposed to?

Vision doesn't seem relevant to what's missing in general intelligence: there are people blind from birth who nonetheless become very fluent in thought and language. Blindness doesn't cause many significant developmental delays, other than in some vision-associated things like early motor-skill acquisition, social stuff from not being able to see facial expressions, etc.

The models may be missing the data grounding in 3D space that people born blind do have, like proprioception, touch, and hearing. Being deaf and blind from birth can cause more significant developmental delays, but becoming deaf later, after some critical threshold, doesn't seem to, so you are still looking at potentially only a few years' worth of data being needed (Helen Keller went deaf and blind at 19 months).


It is going to be amazing once LLMs get access to learn from real-life visual information (e.g. daily-life videos without montage, or movies), and not just text.


I wonder if it only seems like our brains need relatively little training because we have millions of years of training encoded in our DNA.

No other animal can learn human language or logic to the extent that humans can, no matter how much you train them. But humans learn language easily in the first few years of life.

It’s almost as if the human brain is preprogrammed with the general concepts of all languages, and it just needs to be fine tuned with specific vocabulary and grammar rules.


But we relate words to a 3D world; LLMs relate words only to each other.


Who is commenting on this and what will they say??!

1-2 trillion tokens of training for gptx class models

It would take 20 years for a human to read that many tokens at 8 hrs of reading per day (although i'm pretty unclear on whether that's unique token sequences or randomly selected token pairs -- feels like a big difference between the two!!)

We are all here now -- who thinks we are anywhere of note together?

Personally I think that models that do not experience incremental change remain an entirely separate class of intelligence from those which evolve continuously -- but I am open to being persuaded otherwise (certainly I never thought making gpt2 bigger would be as impressive as gpt4) ...


Video from a pair of eyes is many terabytes per year though, even if you compress it with the best codecs. Raw video would be closer to 1 PB per year.

I think people overestimate the data efficiency of biological neural networks relative to transformer models.
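
A back-of-envelope version of that estimate, with the resolution, frame rate, bitrate, and waking hours all being rough assumptions rather than measurements:

    WAKING_SECONDS_PER_YEAR = 16 * 3600 * 365  # ~16 waking hours a day

    # Compressed: a good modern codec at ~10 Mbit/s.
    compressed_bytes = 10e6 / 8 * WAKING_SECONDS_PER_YEAR
    print(f"compressed: ~{compressed_bytes / 1e12:.0f} TB/year")   # ~26 TB

    # Raw: two 1080p streams, 24-bit colour, 30 fps.
    raw_bytes = 2 * 1920 * 1080 * 3 * 30 * WAKING_SECONDS_PER_YEAR
    print(f"raw: ~{raw_bytes / 1e15:.1f} PB/year")                 # ~7.8 PB

Whether the raw number lands nearer 1 PB or several PB depends entirely on the assumptions; the point is just that it's terabytes compressed and petabytes raw.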


People with bad eyesight are not (to the best of my knowledge) less intelligent than those with perfect vision. We could probably supply video at a much lower resolution without significantly affecting the outcome.


True, but it's not like sound and/or touch are low bandwidth channels either.


The power efficiency is pretty brutal though, in favour of the bio-gelpacks.


How can you be sure about power efficiency, if you don't know how data efficiency compares? It is low power. But so is a phone running llama.cpp


You must be pretty clever to write that, and you are human, right? Your data efficiency and power efficiency seem top-notch to me.


Maybe I am, but how many are out there who aren't? And how many smartphones can run alpaca?


>1-2 trillion tokens of training for gptx class models

>It would take 20 years for a human to read that many tokens at 8 hrs of reading per day

Normal reading is ~200 words per minute.

That's 12,000 words per hour.

That's 288,000 words per day (reading for 24 hours straight).

That's 105,120,000 words per year.

That's 2,102,400,000 words per 20 years.

A trillion is 1,000,000,000,000.
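
The same arithmetic as a quick script, keeping the assumption of 200 words per minute around the clock, to make the gap explicit:

    words_per_day = 200 * 60 * 24                  # 288,000 (reading 24 hours straight)
    words_per_20_years = words_per_day * 365 * 20
    print(f"{words_per_20_years:,} words in 20 years")                     # 2,102,400,000
    print(f"shortfall vs 1 trillion words: ~{1e12 / words_per_20_years:.0f}x")  # ~476x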


Plus, almost no one reads academic papers at 200 words per minute. So for that level of material, the number per 20 years could well be halved.


And, as humans, we don't only get "tokens" from what we read, but from our entire sensorium.


IMO a lot of the commenters here are missing the point, which is that LLMs are exposed to orders of magnitude more stuff than a human.

Two obvious differences are that humans also learn from the context in which they live (social, environmental) and humans have to basically learn how their own wiring works. And their wiring is/can be highly variable.


LeCun's reply:

> more like a thousand times more.
> Between 1 and 2 trillion tokens.
> It would take a person 22,000 years to read through 1 trillion words at normal speed for 8 hours a day.

It's kind of surprising to me that around 1000 humans could feasibly read the whole internet (as scraped for LLMs) between each other in only 22 years. The internet feels so much more massive than that. Though of course lots of the stuff on arxiv etc. is so dense there is no way you go through it at normal reading speed.
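
The arithmetic behind those figures, as a rough check (250 words per minute and 8 hours a day are the assumed reading conditions):

    words_per_reader_year = 250 * 60 * 8 * 365     # ~43.8M words per reader per year
    years_for_one_reader = 1e12 / words_per_reader_year
    print(f"one reader:   ~{years_for_one_reader:,.0f} years")        # ~22,831
    print(f"1000 readers: ~{years_for_one_reader / 1000:.0f} years")  # ~23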


Q: If we need to feed additional data to the network, do we train it with existing data + new data, or can we just train it on the new data? I'm asking because if we need to train it with both old and new data every time we have something new, that's not similar to human learning.


It's extremely similar to training new humans.


LLM weights are initialized from a normal distribution (or whatever distribution is now SOTA), while some animals can walk the second they are born.


Carmack is a smart guy, but don't confuse his expertise in other areas with expertise in machine learning. Smells of hubris...


He's been heavily researching machine learning for the last few years, and formed an AGI startup last year.

He's not the world's leading researcher or anything, but he's far from green in this space.


Perhaps he feels that the Metaverse thing he sold to Zuck isn't going anywhere, and now tries to ride a new wave.


Surprises me a human would be exposed to a billion words. I never thought about it, but a billion is such a large number.

edit: oh, does he mean a billion total rather than a billion unique words? haha i r smat


GPT4 probably has a breadth of knowledge 1000x wider than a human, so the fact that it uses more tokens seems reasonable.


I'm surprised to see someone of Carmack's caliber making this mistake. If you exposed a gorilla or chimp brain to the same amount of language content, it still wouldn't be able to converse like a human can. Meanwhile, almost every human can utilize spoken language, despite huge variations in their training data. This implies that most of the language capabilities of the brain depend on structural elements that have been evolutionarily trained, not on the fine-tuning that a specific human achieves in their lifetime. Think of how many billions to trillions of attempts evolution had. And it was building off of models already trained (with many orders of magnitude more iterations) at the tasks of audio processing and sound recognition.

Then add in the fact that we are only talking about language acquisition, and that all indications are that sufficiently large models (and the human brain is quite the large model) benefit significantly from cross-training, and it starts to become clear how false a premise this is. 11 million bits of neural information are being pumped into our brains every second of our lives. That's roughly 80 GB of external data ingested during our waking hours in a single day, plus who knows what sort of internally, synthetically generated data, all of this just for fine-tuning.
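
For what it's worth, the 80 GB figure falls out of the 11 Mbit/s number directly (assuming roughly 16 waking hours per day):

    bits_per_second = 11e6                     # estimated sensory bandwidth
    waking_seconds_per_day = 16 * 3600
    gb_per_day = bits_per_second * waking_seconds_per_day / 8 / 1e9
    print(f"~{gb_per_day:.0f} GB of external data per waking day")   # ~79 GB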



