| Name                                 | Semi-private eval | Public eval |
|--------------------------------------|-------------------|-------------|
| Jeremy Berman                        | 53.6%             | 58.5%       |
| Akyürek et al.                       | 47.5%             | 62.8%       |
| Ryan Greenblatt                      | 43%               | 42%         |
| OpenAI o1-preview (pass@1)           | 18%               | 21%         |
| Anthropic Claude 3.5 Sonnet (pass@1) | 14%               | 21%         |
| OpenAI GPT-4o (pass@1)               | 5%                | 9%          |
| Google Gemini 1.5 (pass@1)           | 4.5%              | 8%          |