Given a long enough context, AGI is not useless without vast knowledge. You could always put a bootstrap sequence into the context (think of the Arecibo Message), followed by your prompt. A sufficiently general reasoner with enough compute should be able to establish the context and reason about your prompt.
Yes, but that effectively just recreates the pretraining. You're going to have to explain everything down to what an atom is, and cover essentially all human knowledge, if you want any ability to consider abstract solutions that call on lessons from foreign domains.
There's a reason people with comparable intelligence operate at varying degrees of effectiveness, and it has to do with how knowledgeable they are.
Would that make in-context learning a superset or a subset of pretraining?
This paper claimed transformers learn a gradient-descent mesa-optimizer as part of in-context learning, while being guided by the pretraining objective, and as the parent mentioned, any general reasoner can bootstrap a world model from first principles.
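As a toy sketch of that mesa-optimization claim (my own illustration, not the paper's construction): for in-context linear regression, an unnormalized linear-attention readout over the context tokens produces exactly the same prediction as one gradient-descent step on the in-context least-squares loss, starting from zero weights.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, eta = 4, 32, 0.1                 # feature dim, context length, step size

w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))            # in-context inputs x_i
y = X @ w_true                         # in-context targets y_i
x_test = rng.normal(size=d)            # query token

# One GD step on L(w) = 0.5 * sum_i (y_i - w.x_i)^2 from w = 0:
# the gradient at 0 is -sum_i y_i x_i, so w_1 = eta * sum_i y_i x_i
w_1 = eta * (y @ X)
gd_pred = w_1 @ x_test

# Linear attention with keys = x_i, values = y_i, query = x_test, no softmax
attn_pred = eta * sum(y_i * (x_i @ x_test) for x_i, y_i in zip(X, y))

print(np.isclose(gd_pred, attn_pred))  # True: the two predictions coincide
```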
> Would that make in-context learning a superset or a subset of pretraining?
I guess a superset. But it doesn't really matter either way. Ultimately, there's no useful distinction between pretraining and in-context learning; the split is just an artifact of the current technology.
And no, I don't think knowledge of language is necessary. To give a concrete example, tokens from the TinyStories dataset (~1 GB in size) are known to be sufficient to bootstrap basic language.
I'm not at all experienced in neuroscience, but I think that humans and other animals primarily gain intelligence by learning from their sensory input.
This is pretty vague. I certainly don't think mastery of any concept invented in the last thousand years would be considered encoded in genes, though we would want or expect an AGI to be able to learn calculus, for instance. In terms of "encoded in genes", I'd say most of what is asked or expected of AGI goes beyond what feral children (https://en.wikipedia.org/wiki/Feral_child) were able to demonstrate.
I don't disagree, but I think there is much more information encoded in the brain than in the genome. I believe this phenomenon is called the genomic bottleneck.
There are several orders of magnitude more neural connections in a human brain than there are base pairs in the human genome. I would also assume that a neural connection can form in more than four possible ways, while there are only four possible base pairs. Also, most genetic information corresponds to lower-level functions.
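A rough back-of-envelope check on that gap, using widely cited estimates (my numbers, not the commenter's):

```python
base_pairs  = 3.2e9    # human genome, ~2 bits of information per base pair
synapses    = 1.0e14   # synaptic connections; estimates range ~1e14 to 1e15
genome_bits = 2 * base_pairs

print(synapses / base_pairs)   # ~3e4  -> four to five orders of magnitude
print(synapses / genome_bits)  # ~1.6e4, even counting the genome in bits
```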
I don't think so. A lot of useful specialized problems are just patterns. Imagine your IDE could take five examples of matching strings and produce a regex you could count on working. It doesn't need to know the capital of Togo, metabolic pathways of the eukaryotic cell, or human psychology.
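As a toy sketch of that kind of tool (hypothetical, not a real IDE feature): induce a character-class regex from a handful of example strings and verify that it matches all of them.

```python
import re

def induce_pattern(examples):
    """Generalize the first example character by character, then verify against the rest."""
    def classify(ch):
        if ch.isdigit():
            return r"\d"
        if ch.isalpha():
            return "[A-Za-z]"
        return re.escape(ch)

    # Naive assumption: all examples share the same length and character classes.
    pattern = "".join(classify(ch) for ch in examples[0])
    assert all(re.fullmatch(pattern, e) for e in examples), "examples don't share a shape"
    return pattern

samples = ["2021-03-14", "1999-12-31", "2024-01-01", "1987-06-05", "2010-11-09"]
print(induce_pattern(samples))   # a date-shaped pattern like \d\d\d\d\-\d\d\-\d\d
```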
For that matter, if it worked without any pre-training, that would mean it could generalize to any new programming language, library, or entirely new task. You could use it to analyze the grammar of a dying African language, write stories in the style of Hemingway, or diagnose cancer from patient data. In all of these, there are only so many samples to fit on.
Of course, none of us have exhaustive knowledge. I don't know the capital of Togo.
But I do have enough knowledge to know what an IDE is and where it sits in the technology stack; I know what a string is and everything it relies on, and so on. There's a huge body of knowledge required to even begin approaching the problem. If you posed that challenge to an intelligent person from 2,000 years ago, they would just stare at you blankly. It doesn't matter how intelligent they are; they have no context to understand anything about the task.
> If you posed that challenge to an intelligent person from 2,000 years ago, they would just stare at you blankly.
That depends on how you pose it. If I give you a long enough series of ordered cards, you'll begin, on some basic level, to understand their spatiotemporal dynamics. You'll get the intuition that there's a stack of heads scanning the input, moving forward each turn, either growing the mark, falling back, or aborting. If I'm not constrained to using matrices, I can draw you a state diagram, which has much clearer immediate metaphors than colored squares.
Do these explanations correspond to some priors in human cognition? I suppose. But I don't think you strictly need them for effective few-shot learning. My main point is that learning itself is a skill, which generalist LLMs do possess, but only as one of their competencies.
Well, Dr. Michael Levin would agree with you, in the sense that he ascribes intelligence to any system that can accomplish a goal through multiple pathways. For instance, the single-celled Lacrymaria, lacking a brain or nervous system, can still navigate its environment to find food and fulfill its metabolic needs.
However, I assumed that what we're talking about when we discuss AGI is what we'd expect a human to be able to accomplish in the world, at our scale. The examples of learning without knowledge you've given represent, to my mind at least, a lower level of intelligence that doesn't really approach human-level AGI.
> A lot of useful specialized problems are just patterns.
> It doesn't need to know the capital of Togo, metabolic pathways of the eukaryotic cell, or human psychology.
What if knowing those things distills down to a pattern that matches a pattern of your code and vice versa? There's a pattern in everything, so know everything, and be ready to pattern match.
If you just look at object-oriented programming, you can easily see how knowing a lot translates into abstract concepts. There's no reason those concepts can't be translated bidirectionally.
> The pretraining is the knowledge, not the intelligence.
I thought the knowledge was the training set and the intelligence was the emergent side effect of reproducing that knowledge while making sure the reproduction is not rote memorisation?
I'd say that it takes intelligence to encode knowledge, and the more knowledge you have, the more intelligently you can encode further knowledge, in a virtuous cycle. But once you have a data set of knowledge, there's nothing to emerge and there are no side effects; it just sits there doing nothing. The intelligence is in the algorithms that access that encoded knowledge to produce something else.
It takes knowledge to even know they're flawed, noisy, and disconnected. There's no reason to "correct" anything, unless you have knowledge that applying previously "understood" data has in fact produced deficient results in some application.
That's reinforcement learning -- an algorithm that requires accurate knowledge acquisition to be effective.
Every statistical machine learning algorithm, including RL, deals with noisy data. The process of fitting aims to remove the sampling noise, revealing the population distribution, thereby compressing it into a model.
The argument being advanced is that intelligence is the proposal of more parsimonious models, aka compression.
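A minimal sketch of that fitting-as-compression point, under a simple linear-regression setup of my own choosing: 1,000 noisy samples get summarized by two fitted parameters that approximate the population slope and intercept that generated them.

```python
import numpy as np

rng = np.random.default_rng(0)
true_slope, true_intercept = 2.0, -1.0

x = rng.uniform(-5, 5, size=1000)
y = true_slope * x + true_intercept + rng.normal(scale=1.0, size=1000)  # sampling noise

slope, intercept = np.polyfit(x, y, deg=1)   # least-squares fit
print(slope, intercept)                      # close to 2.0 and -1.0: the noise averages out

# 2,000 raw numbers are "compressed" into 2 parameters plus a noise model --
# the sense in which a more parsimonious model is a shorter description.
```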