>It is silly, the logic is the same: "Only a (world-altering) 'AGI' could do [test]" -> test is passed -> no (world-altering) 'AGI' -> conclude that [test] is not a sufficient test for (world-altering) 'AGI' -> chase new benchmark.
Basically nobody today thinks beating a single benchmark and nothing else will make you a General Intelligence. As you've already pointed out, even the maintainers of ARC-AGI do not think this.
>If you want to play games about how to define AGI go ahead.
I'm not playing any games. ENIAC cannot do 99% of the things people use computers to do today and yet barely anybody will tell you it wasn't the first general purpose computer.
On the contrary, it is people who seem to think "General" is a moniker for everything under the sun (and then some) that are playing games with definitions.
>People have been claiming for years that we've already reached AGI and with every improvement they have to bizarrely claim anew that now we've really achieved AGI.
Who are these people? Do you have any examples at all? Genuine question.
>But after a few months people realize it still doesn't do what you would expect of an AGI and so you chase some new benchmark ("just one more eval").
What do you expect from 'AGI'? Everybody seems to have different expectations, much of it rooted in science fiction rather than reality, so this is a moot point. What exactly is world-altering to you? Genuinely, do you have anything other than an "I'll know it when I see it"?
If you introduce technology most people adopt, is that world altering or are you waiting for Skynet ?
> Basically nobody today thinks beating a single benchmark and nothing else will make you a General Intelligence.
People's comments, including in this very thread, seem to suggest otherwise (cf. the comments about "goal post moving"). Are you saying that it wasn't a widespread belief that a chess-playing computer would require AGI? Or that Go was at some point the new test for AGI? Or the Turing test?
> I'm not playing any games... "General" is a moniker for everything under the sun that are playing games with definitions.
People have a colloquial understanding of AGI whose consequence is a significant change to daily life, not the tortured technical definition that you are using. Again, your definition isn't something anyone cares about (except maybe in the legal contract between OpenAI and Microsoft).
> Who are these people ? Do you have any examples at all. Genuine question
How about you? I get the impression that you think AGI was achieved some time ago. It's a bit difficult to simultaneously argue both that we achieved AGI in GPT-N and also that GPT-(N+X) is now the real breakthrough AGI while claiming that your definition of AGI is useful.
> What do you expect from 'AGI'?
I think everyone's definition of AGI includes, as a component, significant changes to the world, which probably would be something like rapid GDP growth or unemployment (though you could have either of those without AGI). The fact that you have to argue about what the word "general" technically means is proof that we don't have AGI in a sense that anyone cares about.
>People's comments, including in this very thread, seem to suggest otherwise (c.f. comments about "goal post moving").
But you don't see this kind of discussion on the narrow models/techniques that made strides on this benchmark, do you ?
>People have a colloquial understanding of AGI whose consequence is a significant change to daily life, not the tortured technical definition that you are using
And ChatGPT has represented a significant change to the daily lives of many. It's the fastest-adopted software product in history. In just two years, it has become one of the ten most visited sites in the world. A lot of people have had the work they do change significantly since its release. This is why I ask: what is world-altering?
>How about you? I get the impression that you think AGI was achieved some time ago.
Sure
>It's a bit difficult to simultaneously argue both that we achieved AGI in GPT-N and also that GPT-(N+X) is now the real breakthrough AGI
I have never claimed GPT-(N+X) is the "new breakthrough AGI". As far as I'm concerned, we hit AGI some time ago and are making strides in competence and/or enabling even more capabilities.
You can recognize ENIAC as a general purpose computer and also recognize the breakthroughs in computing since then. They're not mutually exclusive.
And personally, I'm more impressed with o3's FrontierMath score than its ARC score.
>I think everyone's definition of AGI includes, as a component, significant changes to the world
Sure
>which probably would be something like rapid GDP growth or unemployment
What people imagine as "significant change" is definitely not in any broad agreement.
Even in science fiction, the existence of general intelligences more competent than today's LLMs is not necessarily a precursor to massive unemployment or GDP growth.
And for a lot of people, the clincher stopping them from calling a machine AGI is not even any of these things. For some, that it is "sentient" or "cannot lie" is far more important than any spike of unemployment.
> But you don't see this kind of discussion on the narrow models/techniques that made strides on this benchmark, do you ?
I don't understand what you are getting at.
Ultimately there is no axiomatic definition of the term AGI. I don't think the colloquial understanding of the word is what you think it is (i.e., if you had described to people, pre-ChatGPT, today's ChatGPT behavior, including all its limitations and failings and the fact that there was no change in GDP, unemployment, etc., and asked whether that was AGI, I seriously doubt they would say yes).
More importantly, I don't think anyone would say their life is much different from a few years ago, yet separately they would say that under AGI it would be.
But the point that started all this discussion is that these "evals" are not good proxies for AGI, and no one is moving goalposts even if they realize this fact only after the tests have been beaten. You can foolishly define AGI as beating ARC, but the moment ARC is beaten you realize that you don't care about that definition at all. That doesn't change if you make a 10- or 100-benchmark suite.
If such discussions only happen when LLMs make strides on the benchmark, then it's not just about beating the benchmark but also about what kind of system is beating it.
>You can foolishly define AGI as beating ARC but the moment ARC is beaten you realize that you don't care about that definition at all.
If you change your definition of AGI the moment a test is beaten then yes, you are simply moving goalposts.
If you care about other impacts like "unemployment" and "GDP rising" but don't give any time or opportunity to see whether the model is capable of causing them, then you don't really care about those things and are just mindlessly shifting goalposts.
How does such a person know o3 won't cause mass unemployment? The model hasn't even been released yet.
> If such discussions only happen when LLMs make strides on the benchmark, then it's not just about beating the benchmark but also about what kind of system is beating it.
I still don't understand the point you are making. Nobody is arguing that discrete program search is AGI (and the same counter-arguments would apply if they did).
> If you change your definition of AGI the moment a test is beaten then yes, you are simply moving goalposts.
I don't think anyone changes their definition, they just erroneously assume that any system that succeeds on the test must do so only because it has general intelligence (that was the argument for chess playing for example). When it turns out that you can pass the test with much narrower capabilities they recognize that it was a bad test (unfortunately they often replace the bad test with another bad test and repeat the error).
> If you care about other impacts like "unemployment" and "GDP rising" but don't give any time or opportunity to see whether the model is capable of causing them, then you don't really care about those things and are just mindlessly shifting goalposts.
We are talking about what models are doing now (is AGI here now), not what some imaginary research breakthroughs might accomplish. o3 is not going to materially change GDP or unemployment. (If you are confident otherwise, please say how much you are willing to wager on it.)
I'm not talking about any imaginary research breakthroughs. I'm talking about today, right now. We have a model unveiled today that seems to be a large improvement across several benchmarks but hasn't been released yet.
You can be confident all you want, but until the model has been given the chance to have (or not have) the effect you think it won't, it's just an assertion that may or may not be entirely wrong.
If you say "this model passed this benchmark I thought would indicate AGI but didn't do this or that so I won't acknowledge it" then I can understand that. I may not agree on what the holdups are but I understand that.
If however you say "this model passed this benchmark I thought would indicate AGI, but I don't think it's going to be able to do this or that, so it's not AGI," then I'm sorry, but that's just nonsense.
My thoughts or bets are irrelevant here.
A few days ago I saw someone seriously comparing a site with nearly 4B visits a month, in under 2 years, to Bitcoin and VR. People are so up in their bubbles and so assured in their way of thinking that they can't see what's right in front of them, never mind predict future usefulness. I'm just not interested in engaging with "I think it won't" arguments when I can just wait and see.
I'm not saying you are one of such people. I just have no interest in such arguments.
My bet? There's no way I would make a bet like that without playing with the model first. Why would I? Why would you?
> I'm not talking about any imaginary research breakthroughs. I'm talking about today, right now.
So was I, explicitly. I said that today we don't have the large-scale societal changes that people have conventionally associated with the term AGI. I also explicitly said that I don't believe o3 will change this, and your comments seem to suggest neither do you (you seem to prefer to emphasize that it isn't literally impossible that o3 will make these transformative changes).
> If however you're "this model passed this benchmark I thought would indicate AGI but I don't think it's going to be able to do this or that so it's not AGI" then I'm sorry but that's just nonsense.
The entire point of the original chess example was to show that it is in fact the correct reaction to repudiate incorrect beliefs about naive litmus tests of AGI-ness. If we did what you are arguing, should we accept that AGI occurred after chess was beaten, because a lot of people believed that was the litmus test? Or should we praise people who stuck to their original beliefs after they were proven wrong instead of correcting them? That's why I said it was silly at the outset.
> My thoughts or bets are irrelevant here
No, they show you don't actually believe we have society-transformative AGI today (or will when o3 is released) but get upset when someone points that out.
> I'm just not interested in engaging "I think It won't" arguments when I can just wait and see.
A lot of life is about making decisions based on predictions about the future, including consequential decisions about societal investment, personal career choices, etc. For many things there isn't a "wait and see" approach; you are making implicit or explicit decisions even by maintaining the status quo. People who make bad or unsubstantiated arguments create a toxic environment in which those decisions are made, leading to personal and public harm. The most important example of this is the decision to dramatically increase energy usage to accommodate AI models, despite impending climate catastrophe, on the blind faith that AI will somehow fix it all (which is far from the "wait and see" approach that you are supposedly advocating, by the way; this is an active decision).
> My bet ? There's no way i would make a bet like that without playing with the model first. Why would I ? Why Would you ?
You can have beliefs based on limited information. People do this all the time. And if you actually revealed that belief, it would demonstrate that you don't currently believe o3 is likely to be world-transformative.
>You can have beliefs based on limited information. People do this all the time. And if you actually revealed that belief it would demonstrate that you don’t actually currently believe o3 is likely to be world transformative
Cool... but I don't want to in this matter.
I think the models we have today are already transformative. I don't know if o3 is capable of causing sci-fi mass unemployment (for white-collar work), and I wouldn't have anything other than essentially a wild guess until it is released. I don't want to make a wild guess. Having beliefs on limited information is often necessary, but it isn't some virtue, and in my opinion it should be avoided when unnecessary. It is definitely not necessary to make a wild guess about the capabilities of a model that will be released next month.
>The entire point of the original chess example was to show that it is in fact the correct reaction to repudiate incorrect beliefs about naive litmus tests of AGI-ness. If we did what you are arguing, should we accept that AGI occurred after chess was beaten, because a lot of people believed that was the litmus test?
Like I said, if you have some other caveats that weren't met, then that's fine. But it's hard to take seriously when you don't.
>This model was trained to pass this test, it was trained heavily on the example questions, so it was a narrow technique.
You are allowed to train on the train set. That's the entire point of the test.
>We even have proof that it isn't AGI, since it scores horribly on ARC-AGI 2. It overfitted for this test.
ARC-AGI 2 does not even exist yet. All we have are "early signs", and not that that would be proof of anything. Whether I believe the models are generally intelligent or not doesn't depend on ARC.
> You are allowed to train on the train set. That's the entire point of the test.
Right, but by training on those examples you are creating a narrow model. The whole point of training questions is to create narrow models, like all the models we built before.
That doesn't make any sense. Training on the train set does not make the model's capabilities narrow. Models are narrow when you can't train them to do anything else even if you wanted to.
You are not narrow for undergoing training, and it's honestly kind of ridiculous to think so. Not even the ARC maintainers believe so.
> Training on the train set does not make the models capabilities narrow
Humans didn't need to see the training set to pass this; the AI needing it means it is narrower than humans, at least on these kinds of tasks.
The system might be more general than previous models, but still not as general as humans, and the G in AGI typically means being as general as humans. We are moving towards more general models, but still not at the level where we call them AGI.