> I am pretty sure that GPT-3.5 should be thought of as GPT-4-lite, in the sense that it uses techniques and compute of the GPT-4 era rather than the GPT-3 era
Compute of the "GPT-3 era" vs the "GPT-3.5 era" is identical; this is not a distinguishing factor. The architecture is also roughly identical, both are dense transformers. The only significant difference between 3.5 and 3 is the size of the model and whether it uses RLHF.
Yes, you're right about the compute. Let me try to make my point differently: GPT-3 and GPT-4 were models that, when they were released, represented the best OpenAI could do, while GPT-3.5 was an intentionally smaller (than they could train) model. I'm seeing it as GPT-3.5 = GPT-4-70b.
So to estimate when the next "best we can do" model might be released, we should look at the gap between the releases of GPT-3 and GPT-4, not between GPT-4-70b and GPT-4. That's my understanding, dunno.
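For what it's worth, here is a back-of-the-envelope sketch of that timing argument. The dates are approximate (GPT-3 paper around May 2020, GPT-4 announcement in March 2023), and the "same cadence" assumption is just for illustration:

```python
from datetime import date

# Approximate release dates (from memory, treat as rough):
gpt3_release = date(2020, 5, 28)   # GPT-3 paper
gpt4_release = date(2023, 3, 14)   # GPT-4 announcement

# Gap between consecutive "best we can do" models
gap = gpt4_release - gpt3_release
print(f"GPT-3 -> GPT-4 gap: {gap.days} days (~{gap.days / 365:.1f} years)")

# Naive projection: assume the next flagship follows a similar cadence
projected_next = gpt4_release + gap
print(f"Naive projection for the next flagship: {projected_next}")
```

The point is just that the GPT-3.5-to-GPT-4 interval would give a much shorter (and, on this view, misleading) estimate than the GPT-3-to-GPT-4 interval.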