
> Nowhere do they define "AGI"

Ummm, maybe you should have looked? At the top of the very first prediction, here: https://www.metaculus.com/questions/5121/date-of-artificial-...

We will thus define "an AI system" as a single unified software system that can satisfy the following criteria, all completable by at least some humans.

Able to reliably pass a 2-hour, adversarial Turing test during which the participants can send text, images, and audio files (as is done in ordinary text messaging applications) during the course of their conversation. An 'adversarial' Turing test is one in which the human judges are instructed to ask interesting and difficult questions, designed to advantage human participants, and to successfully unmask the computer as an impostor. A single demonstration of an AI passing such a Turing test, or one that is sufficiently similar, will be sufficient for this condition, so long as the test is well-designed to the estimation of Metaculus Admins.

Has general robotic capabilities, of the type able to autonomously, when equipped with appropriate actuators and when given human-readable instructions, satisfactorily assemble a (or the equivalent of a) circa-2021 Ferrari 312 T4 1:8 scale automobile model. A single demonstration of this ability, or a sufficiently similar demonstration, will be considered sufficient.

High competency across diverse fields of expertise, as measured by achieving at least 75% accuracy in every task and 90% mean accuracy across all tasks in the Q&A dataset developed by Dan Hendrycks et al.

Able to get top-1 strict accuracy of at least 90.0% on interview-level problems found in the APPS benchmark introduced by Dan Hendrycks, Steven Basart et al. Top-1 accuracy is distinguished, as in the paper, from top-k accuracy in which k outputs from the model are generated, and the best output is selected.

By "unified" we mean that the system is integrated enough that it can, for example, explain its reasoning on a Q&A task, or verbally report its progress and identify objects during model assembly. (This is not really meant to be an additional capability of "introspection" so much as a provision that the system not simply be cobbled together as a set of sub-systems specialized to tasks like the above, but rather a single system applicable to many problems.)

Resolution will come from any of three forms, whichever comes first: (1) direct demonstration of such a system achieving ALL of the above criteria, (2) confident credible statement by its developers that an existing system is able to satisfy these criteria, or (3) judgement by a majority vote in a special committee composed of the question author and two AI experts chosen in good faith by him, for the sole purpose of resolving this question. Resolution date will be the first date at which the system (subsequently judged to satisfy the criteria) and its capabilities are publicly described in a talk, press release, paper, or other report available to the general public.
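
For concreteness, the two quantitative criteria boil down to simple threshold checks. A minimal sketch in Python, with made-up numbers and function names of my own choosing:

    def passes_qa_criterion(task_accuracies):
        """At least 75% on every task AND a 90% mean across all tasks
        (the Hendrycks et al. Q&A criterion above)."""
        scores = list(task_accuracies.values())
        return min(scores) >= 0.75 and sum(scores) / len(scores) >= 0.90

    def passes_apps_criterion(top1_strict):
        """Top-1 strict accuracy of at least 90.0% on interview-level
        APPS problems: the single generated program must pass every
        test case, no best-of-k selection."""
        return top1_strict >= 0.90

    # Made-up numbers, purely for illustration:
    example = {"us_history": 0.93, "college_physics": 0.72, "law": 0.91}
    print(passes_qa_criterion(example))   # False: physics is under the 75% floor
    print(passes_apps_criterion(0.86))    # False: short of the 90% bar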




I think we're reaching a point where the Turing test is no longer useful. If you get into the nitty-gritty of it (instead of just handwaving "computer should act like person"), it's about roleplaying a fake identity. Which is a specific skill, not a general test of competence.


The Turing test seems to be a product of an era when the nature and capabilities of artificial intelligence were still in the realm of the unknown. Because of that, it was difficult to conceive of a specific test that could measure its abilities. So the test ended up using human intelligence, the most advanced form of intelligence known at the time, as the benchmark for AI.

To illustrate, imagine if an extraterrestrial race created a Turing-style test, with their intelligence serving as the gold standard. Unless their cognitive processes closely mirrored ours, it's doubtful that humans would pass such an examination.


Thank you. It was arguably never useful beyond an intuition pump. It's a test of credulity, of susceptibility to pareidolia, not reasoning ability.


Correct, which is part of the reason the "weak" AGI date is still relatively far out. Will anyone bother dumbing down an AI to pass a Turing Test? "Oh, a human can't write a poem that fast -- it's an AI!"


Yup, missed that, thanks. Has anyone scored GPT-4 on the APPS benchmark?

I believe that if you take multimodal GPT-4 and integrate it with Eleven Labs and Whisper, there is a shot at passing that extended Turing test, if the test is designed fairly. The wording is still a bit ambiguous.
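
Roughly this loop, using the APIs as they exist in mid-2023 (the keys, voice id, and file name are placeholders; treat this as a sketch, not a reference implementation):

    import openai, requests

    openai.api_key = "OPENAI_KEY"  # placeholder

    def reply_to_voice_message(audio_path):
        # 1. Speech -> text with Whisper.
        with open(audio_path, "rb") as f:
            text_in = openai.Audio.transcribe("whisper-1", f)["text"]
        # 2. Text -> text with GPT-4.
        chat = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": text_in}],
        )
        text_out = chat["choices"][0]["message"]["content"]
        # 3. Text -> speech with Eleven Labs (returns audio bytes).
        tts = requests.post(
            "https://api.elevenlabs.io/v1/text-to-speech/VOICE_ID",
            headers={"xi-api-key": "ELEVEN_KEY"},  # placeholder key
            json={"text": text_out},
        )
        return tts.content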

Also, assembling that particular scale model is probably challenging, but it is not really a general task. It could probably be achieved, given a 3-4 month engineering effort, with simulated sensors and effectors and some advanced techniques for interpreting and acting on those kinds of instructions (maybe training an existing multimodal LLM and integrating it with some kind of RL-based robot controller?). The controller could be integrated with the LLM such that the system reports its progress and identifies objects during assembly, roughly as in the sketch below.
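
Very roughly, the loop I'm imagining; every class and method here is a hypothetical stub, not a real API:

    class StubPlanner:                       # stand-in for a multimodal LLM
        def plan(self, instruction):
            return ["do: " + instruction]    # text -> sub-goals
        def report_progress(self, step, observation):
            return "completed '%s', saw %s" % (step, observation)

    class StubController:                    # stand-in for an RL policy
        def execute(self, step):
            return ["gearbox", "rear wing"]  # pretend object detections

    def assemble(instructions, planner, controller):
        for instruction in instructions:
            for step in planner.plan(instruction):
                obs = controller.execute(step)
                # The "unified system" provision: the same model that
                # plans also narrates progress and identifies objects.
                print(planner.report_progress(step, obs))

    assemble(["attach the rear wing to the gearbox"],
             StubPlanner(), StubController())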

So my takeaway is that, with some serious attempts and an honest assessment of this bar, an AI could pass it this year or next. I don't know exactly how far GPT-4 is from the 75%/90% thresholds, but I doubt it is far, so I expect that if not GPT-4, then GPT-4.5 or 5 could pass, given some engineering effort aimed at the test competencies.

If people really are thinking 2030 or 2040 when they read "AGI" and respond to that poll (I suspect some didn't read the definition) then that would indicate that people are just ignorant of the reality of how far along we are, or in denial. Or a little of both.


You do realize that many, if not most, humans would fail this test, right?


Yes, you'll find that any testable definition of AGI that has not been passed yet would be unpassable for a big chunk of the human population.

In other words, "General", "Artificial", and "Intelligent" have each been passed. That's why a few papers/researchers opt to call these models "General Artificial Intelligence" instead:

https://jamanetwork.com/journals/jama/article-abstract/28064...

https://arxiv.org/abs/2303.12003

Or some such variant like "General Purpose Technologies", as OpenAI did:

https://arxiv.org/abs/2303.10130

since "AGI" has so much baggage with posts shifting at the speed of light.


AGI is competing with human culture as a whole.

Individual humans are not exactly the best of all possible tests for AGI.


Yes, but humans as a group can do it. An AGI needs to show that a similar number of AGI instances can do the same, given the same starting template.

The AGI will need to look at all of the tasks as written, determine what the success criteria are, and then combine that into a single set of answers. The instructions are in human-readable form, not machine-readable. It can use as many or as few AGI instances as it needs to accomplish this.

It's the same as if we gave these instructions to a human with sufficient skill and resources to delegate.
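
A toy sketch of that delegation pattern, where Agent is a hypothetical stand-in for an instance spun up from the same starting template:

    from concurrent.futures import ThreadPoolExecutor

    class Agent:                             # same "starting template"
        def solve(self, task):
            return "answer to: " + task      # placeholder behaviour

    def coordinate(tasks):
        # One coordinator reads the human-readable tasks, spins up as
        # many workers as it needs, and merges their answers.
        with ThreadPoolExecutor() as pool:
            answers = pool.map(lambda t: Agent().solve(t), tasks)
        return dict(zip(tasks, answers))     # one combined answer set

    print(coordinate(["pass the Turing test", "assemble the model"]))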





