Hacker News

Depends on the task, no?

Do you have a sense of what kinds of tasks this benchmark includes? Are they more “general,” such that random people would fare well, or more specialized (i.e., something a STEM grad studied that isn’t common knowledge)?



It does, which is why I don’t really subscribe to the idea that any test like this is great for actually determining “AGI”. A true AGI would be able to continuously train and create new LLMs that enable it to become a subject-matter expert (SME) in entirely new areas.



