I fully disagree.

> I am pretty sure that GPT-3.5 should be thought of as GPT-4-lite, in the sense that it uses techniques and compute of the GPT-4 era rather than the GPT-3 era

Compute in the "GPT-3 era" and the "GPT-3.5 era" was identical, so compute is not a distinguishing factor. The architecture is also roughly identical: both are dense transformers. The only significant differences between 3.5 and 3 are the size of the model and whether it uses RLHF.


Yes, you're right about the compute. Let me make my point differently: GPT-3 and GPT-4 were, at release, the best models OpenAI could train, while GPT-3.5 was intentionally smaller than what they could have trained. I'm seeing it as GPT-3.5 = GPT-4-70b. So to estimate when the next "best we can do" model might be released, we should look at the gap between the releases of GPT-3 and GPT-4, not between GPT-4-70b and GPT-4. That's my understanding, dunno.
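Back-of-the-envelope, using the public dates (the GPT-3 paper in May 2020, the GPT-4 launch in March 2023) and naively assuming the cadence between "best we can do" models holds, which is obviously a one-data-point extrapolation:

```python
from datetime import date

# Public dates: GPT-3 paper (May 28, 2020), GPT-4 launch (March 14, 2023).
gpt3_release = date(2020, 5, 28)
gpt4_release = date(2023, 3, 14)

# Gap between consecutive frontier ("best we can do") models.
frontier_gap = gpt4_release - gpt3_release
print(f"GPT-3 -> GPT-4 gap: {frontier_gap.days} days (~{frontier_gap.days / 365:.1f} years)")

# Naive projection: assume the same cadence repeats (a big assumption).
next_frontier = gpt4_release + frontier_gap
print(f"Projected next frontier release: {next_frontier}")
```

That comes out to a gap of about 2.8 years, projecting to roughly the end of 2025, for whatever that's worth.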


GPT-4 only started training at roughly the same time as, or after, the release of GPT-3.5, so I'm not sure where you're getting "intentionally smaller" from.


Ah, I misremembered GPT-3.5 as being released around the time of ChatGPT.


oh you remembered correctly; those are the same thing

actually i was wrong about when gpt-4 started training; the time i gave was roughly when they finished