
I mean, sure, none of these models are being optimized to top the leaderboard. They aren't even being optimized for the same things, so any comparison is going to be somewhat questionable.

But the claim I'm refuting here is "It's extremely cheap, efficient and kicks the ass of the leader of the market", and I think the leaderboard being topped by a cheap Google model is pretty conclusive evidence that the statement is not true. Is it competitive with the leader? Sure. Does it kick the leader's ass? No.






Google absolutely games the LMSYS benchmarks with markdown styling. R1 is better than Google Flash Thinking; you are putting way too much faith in LMSYS.


