Hacker News new | past | comments | ask | show | jobs | submit login

there are many comments in internet about this, that only subset of frontier math benchmark is "field medal level research", and o3 likely scored on easier subset.

Also, all that stuff is shady in the way that it is just numbers from OAI, which are not reproducible on benchmark sponsored by OAI. If we say OAI could be bad actor, they had plenty of opportunities to cheat on this.




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: