
If we really want to imagine a cold-war-style solution, the two teams could meet in an empty warehouse, bring one computer with the model, one with the benchmarks, and connect them with a USB cable.

In practice, I assume they just handed over the benchmarks and relied on the honor system that they wouldn't cheat, yeah. They can always cook up a new test set for next time; it's only 10% of the benchmark content anyway, and the results are pretty close.




There's no honor system when there are billions of dollars at stake x) I'm highly, highly skeptical of these benchmarks because of intentional cheating and accidental contamination.





