The fact that GPT-4.1 was the judge does not convince me of the validity of the ben...

ripped_britches · 2025-11-06T18:59:46 1762455586

It’s probably just that they started before GPT-5 was released. It’s a good judge.

tacoooooooo · 2025-11-06T18:43:34 1762454614

it's an odd choice. I'd be curious why they picked that. it's not the cheapest, most expensive, best, or worst.

It does have a relatively large context window, and ime is very good at format adherence

lukaslevert · 2025-11-06T21:50:55 1762465855

You may be looking at our first benchmarks on the homepage. The latest ones for the Search API were conducted against GPT-5: https://parallel.ai/blog/introducing-parallel-search