I didn't see evidence of cheating in the article. Having a slightly differently tuned version of 4 is not the most dastardly thing that can be done. Everything else is insinuation.
Well, we'll see whether they suffer any consequences for this and whether they cheated too hard, but being perceived as best in class is arguably worth even more than actually being best in class, especially if differences in performance are hard to perceive anecdotally.
The goal is long-term control over a technology's market share, as winner-take-all dynamics are in play here.
> Critics have pointed out that xAI’s approach involves running Grok 3 multiple times and cherry-picking the best output while comparing it against single runs of competitor models.
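
To put a rough number on why that comparison is lopsided, here is a minimal sketch. It assumes each attempt succeeds independently with the same probability `p`, and that "cherry-picking" always finds a correct answer when at least one attempt has one (an upper bound on what selection can do). The function names and the values of `p` and `n` are made up for illustration; none of this comes from the article or from any real benchmark.

```python
import random


def expected_best_of_n(p: float, n: int) -> float:
    """Chance that at least one of n independent attempts succeeds."""
    return 1.0 - (1.0 - p) ** n


def simulate_single_run(p: float, trials: int, rng: random.Random) -> float:
    """Fraction of questions solved when each one gets exactly one attempt."""
    solved = sum(rng.random() < p for _ in range(trials))
    return solved / trials


def simulate_best_of_n(p: float, n: int, trials: int, rng: random.Random) -> float:
    """Fraction of questions solved when the best of n attempts is kept."""
    solved = sum(
        any(rng.random() < p for _ in range(n)) for _ in range(trials)
    )
    return solved / trials


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so reruns give the same numbers
    p, n, trials = 0.4, 8, 20_000  # hypothetical values, not measured ones
    print("single run :", simulate_single_run(p, trials, rng))
    print("best of n  :", simulate_best_of_n(p, n, trials, rng))
    print("closed form:", expected_best_of_n(p, n))
```

Under those assumptions a model that gets a question right 40% of the time on one try gets it right about 98% of the time with the best of 8 tries, since 1 - 0.6^8 is roughly 0.983. So a best-of-N score set against a competitor's single-run score says very little about which model is actually better.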