I cannot tell if this comment was made in jest or in earnest.
As far as I understand, earlier GPT generations required a fixed amount of compute per generated token.
But given the tremendous load on their systems, I wouldn’t be surprised if OpenAI were quietly routing requests to a smaller model whenever it predicts it can get away with it. (Is there evidence for this?)
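For context on why that would be tempting: in a dense transformer, the forward-pass cost per token is roughly proportional to parameter count (the standard ~2N FLOPs/token approximation), regardless of how "hard" the prompt is. A back-of-envelope sketch, with model sizes as illustrative assumptions:

```python
# Rough FLOPs per generated token for a dense transformer:
# approximately 2 * (parameter count), independent of prompt difficulty.
# Model names/sizes below are assumptions for illustration only.

def flops_per_token(n_params: float) -> float:
    """Approximate forward-pass FLOPs to generate one token (~2N rule)."""
    return 2 * n_params

for name, n in [("175B-parameter model (GPT-3 scale)", 175e9),
                ("7B-parameter model", 7e9)]:
    print(f"{name}: ~{flops_per_token(n):.1e} FLOPs/token")
```

By that arithmetic, serving a ~7B model instead of a ~175B one cuts per-token compute by roughly 25x, which is exactly why silently downgrading would be such an attractive lever under load.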