Of course, optimizing for the best models would result in a mix of GPU spend and ML researchers experimenting with efficiency. And it may not make much sense to spend money on efficiency research since, as has happened, those gains are often shared for free anyway.
What I was cautioning people about was that you might not want to spend $500B on NVidia hardware only to find out rather quickly that you didn't need to. You'd have all this CapEx that you now have to try to recoup from customers for something that has essentially been commoditized. That's a whole lot of money to lose very quickly. Plus there is a zero-sum power dynamic at play between the CEO and ML researchers.
Not necessarily, if you are pushing against a data wall. One could ask: after adjusting for DS's efficiency gains, how much more compute has OpenAI spent? Is their model correspondingly better? For that matter, DS could easily afford more than $6 million in compute, so why didn't they just push the scaling further?
Because they're able to pass a training signal on tons of newly generated tokens based on whether they result in a correct answer, rather than just fitting existing tokens.
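Roughly, the contrast looks like this (a toy PyTorch-style sketch; the tiny model, the verifier, and the rollout length are all made-up placeholders, not anyone's actual training code):

    # Toy contrast between the two signals. Everything here is hypothetical:
    # a tiny model, a stand-in verifier, short rollouts.
    import torch
    import torch.nn.functional as F

    vocab, hidden = 100, 32
    model = torch.nn.Sequential(torch.nn.Embedding(vocab, hidden),
                                torch.nn.Linear(hidden, vocab))

    def supervised_step(tokens):
        # Fit existing tokens: next-token cross-entropy on a fixed corpus.
        logits = model(tokens[:-1])
        return F.cross_entropy(logits, tokens[1:])

    def rl_step(prompt, verifier, n_samples=4, rollout_len=8):
        # Generate new tokens, then weight their log-probs by whether a
        # verifier judged the completion correct (REINFORCE-style).
        losses = []
        for _ in range(n_samples):
            seq, logps = prompt.clone(), []
            for _ in range(rollout_len):
                dist = torch.distributions.Categorical(logits=model(seq)[-1])
                tok = dist.sample()
                logps.append(dist.log_prob(tok))
                seq = torch.cat([seq, tok.unsqueeze(0)])
            reward = 1.0 if verifier(seq) else 0.0  # e.g. did the answer check out
            losses.append(-reward * torch.stack(logps).sum())
        return torch.stack(losses).mean()

The point is the second function: the gradient flows through tokens the model just produced, weighted by a cheap correctness check, so the amount of training signal isn't capped by how much existing text you have.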