I mean, sure, none of these models are being optimized for being the top of the leader board. They aren't even being optimized for the same things, so any comparison is going to be somewhat questionable.
But the claim I'm refuting here is "It's extremely cheap, efficient and kicks the ass of the leader of the market", and I think the leaderboard being topped by a cheap google model is pretty conclusive that that statement is not true. Is competitive with? Sure. Kicks the ass of? No.
google absolutely games for lmsys benchmarks with markdown styling. r1 is better than google flash thinking, you are putting way too much faith in lmsys
But the claim I'm refuting here is "It's extremely cheap, efficient and kicks the ass of the leader of the market", and I think the leaderboard being topped by a cheap google model is pretty conclusive that that statement is not true. Is competitive with? Sure. Kicks the ass of? No.