
The fact that GPT-4.1 was the judge does not convince me of the validity of the benchmark.




It’s probably just that they started before GPT-5 was released. It’s a good judge.

it's an odd choice. I'd be curious why they picked that. it's not the cheapest, most expensive, best, or worst.

It does have a relatively large context window and, IME, is very good at format adherence.


You may be looking at our first benchmarks on the homepage. The latest ones for the Search API were conducted against GPT-5: https://parallel.ai/blog/introducing-parallel-search


