og_kalu 6 months ago | on: OpenAI O3 breakthrough high score on ARC-AGI-PUB

In most non-competitive coding benchmarks (Aider, LiveBench, SWE-bench), o1 ranks below Sonnet, so the benchmarks aren't saying anything different. Or at least it did: the new checkpoint released 2 days ago finally pushed o1 past Sonnet on LiveBench.