
I wonder how much of ChatGPT's regression is due to it being trained on new content that itself originated from ChatGPT. Blog and SEO spam full of ChatGPT fluff is going through the roof; eventually all of that will get crawled too, and the model will just get reinforced on its own output. Or is that not a concern?



0.1% chance

My reasons are:

- I don't recall seeing any evidence that OpenAI has included new pretraining data beyond the previous cutoff (Sept. 2021?) for GPT-3.5 or GPT-4.

- Maybe they did finetuning or RLHF on new data, but that is likely to be highly curated data.

- AI generated content should be absolutely tiny in comparison to the data they are already working with.



