The pundits response to the (alleged) plateau was proportional to the certainty with which CEOs of frontier labs discussed pre-training scaling. The o3 result is from scaling test time compute, which represents a meaningful change in how you would build out compute for scaling (single supercluster --> presence in regions close to users). Thus it is important to discuss.