

herval 11 months ago | on: OpenAI O3 breakthrough high score on ARC-AGI-PUB

Depends on the task, no? Do you have a sense of what kind of tasks this benchmark includes? Are they more “general,” such that random people would fare well, or more specialized (i.e., something a STEM grad studied that isn’t common knowledge)?

judge2020 11 months ago

It does, which is why I don’t really buy that any test like this is great for actually determining “AGI”. A true AGI would be able to continuously train and create new LLMs that enable it to become an SME in entirely new areas.
