No, it's a full model. It's just... put most concisely, the headline figure doesn't include the actual costs.
Claude gave me a good analogy after I'd been struggling for hours: it's like a restaurant owner pricing meals by accounting only for the grill's gas bill.
The thing is, that framing elides a lot, and you could argue it out and theoretically no one would be wrong. But "$5.5 million" elides so much info as to be silly.
e.g. they used 2048 H100 GPUs for 2 months. That hardware alone is ~$72 million. And we're still not even approaching the real bill for the infrastructure. And for every success there are another N runs that failed; N = 2 would be an absurdly conservative estimate.
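Rough math behind that, using my own assumed prices (not from any paper): ~$35k per H100 x 2048 GPUs ≈ $72M just to own the cards. Do rental-style accounting instead and you get the small number: 2048 GPUs x ~1,460 hours over two months ≈ 3M GPU-hours, and at the ~$2/GPU-hour rate people commonly quote, that's ~$6M, i.e. the neighborhood the $5.5M headline lives in.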
People are reading the number and thinking it says something about American AI lab efficiency. Rather, it says something about how fast it is to copy when you can scaffold by training on another model's outputs. That's not a bad thing, or at least not a unique phenomenon. That's why it's hard to talk about this, IMHO.