If we really want to imagine a cold-war-style solution, the two teams could meet in an empty warehouse, bring one computer with the model, one with the benchmarks, and connect them with a USB cable.
In practice I assume they just gave them the benchmarks and took it on the honor system that they wouldn't cheat, yeah. They can always cook up a new test set for next time; it's only 10% of the benchmark content anyway, and the results are pretty close.
There's no honor system when there are billions of dollars at stake x) I'm highly skeptical of these benchmarks because of intentional cheating and accidental contamination.