
In most non-competitive coding benchmarks (Aider, LiveBench, SWE-bench), o1 ranks worse than Sonnet, so the benchmarks aren't saying anything different. Or at least it did: the new checkpoint from 2 days ago finally pushed o1 over Sonnet on LiveBench.


