Hacker News

throwaway71271 23 days ago | on: OpenAI O3 breakthrough high score on ARC-AGI-PUB

    | Name                                 | Semi-private eval | Public eval |
    |--------------------------------------|-------------------|-------------|
    | Jeremy Berman                        | 53.6%             | 58.5%       |
    | Akyürek et al.                       | 47.5%             | 62.8%       |
    | Ryan Greenblatt                      | 43%               | 42%         |
    | OpenAI o1-preview (pass@1)           | 18%               | 21%         |
    | Anthropic Claude 3.5 Sonnet (pass@1) | 14%               | 21%         |
    | OpenAI GPT-4o (pass@1)               | 5%                | 9%          |
    | Google Gemini 1.5 (pass@1)           | 4.5%              | 8%          |

https://arxiv.org/pdf/2412.04604

kandesbunzler 23 days ago

Why is this missing the o1 release / o1 pro models? Would love to know how much better they are.

Freebytes 22 days ago

This might be because they are reporting single-step results, and I do not think o1 is single-step.

aimanbenbaha 23 days ago

Akyürek et al. use test-time compute.
