
Yes. The models require training data, and they have already been fed the internet.

More and more of the content generated since then is LLM-generated and therefore useless as training data.

The models get worse, not better, when fed their own output, and right now they are running out of training data.
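A toy illustration of that degradation (often called model collapse): fit a Gaussian to some data, generate new "data" from the fit, refit on the generated data, and repeat. Real generators under-sample the tails of their training distribution; the clipping below mimics that, and the fitted spread collapses within a few generations. This is a sketch of the mechanism with made-up sizes and thresholds, not anyone's actual training pipeline:

    # Toy model-collapse demo: each generation is "trained" (fit)
    # only on the previous generation's output.
    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(0.0, 1.0, size=5000)  # generation 0: "human" data

    for gen in range(10):
        mu, sigma = data.mean(), data.std()  # fit the "model"
        print(f"gen {gen}: sigma = {sigma:.3f}")
        samples = rng.normal(mu, sigma, size=20000)
        # Generators favor typical output (truncated / low-temperature
        # sampling), so the tails of the distribution get lost:
        data = samples[np.abs(samples - mu) < 1.5 * sigma][:5000]

Run it and sigma shrinks every generation, which is the sense in which a model fed its own output gets worse rather than better.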

This is why Reddit just turned profitable: AI companies buy its text to train their models, because it is at least somewhat human-written.

Of course, even Reddit is crawling with LLM-generated text, so yes, it is coming to a halt.

Data is not the only factor; architecture improvements, data filtering, etc. matter too.
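As a rough sketch of what data filtering can look like in practice: cheap quality heuristics plus duplicate removal over a scraped corpus. The thresholds and the exact-hash dedup below are illustrative assumptions; production pipelines typically use tuned classifiers and near-duplicate detection (e.g. MinHash):

    # Illustrative corpus filtering: heuristic quality checks + exact dedup.
    # All thresholds are made up for the sketch, not tuned values.
    import hashlib
    import re

    def looks_like_quality_text(doc: str) -> bool:
        words = doc.split()
        if len(words) < 50:                                 # too short to be useful
            return False
        if sum(c.isalpha() for c in doc) / len(doc) < 0.6:  # mostly markup/symbols
            return False
        if len(set(words)) / len(words) < 0.3:              # highly repetitive
            return False
        return True

    def dedup_key(doc: str) -> str:
        normalized = re.sub(r"\s+", " ", doc.lower()).strip()
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def filter_corpus(docs):
        seen = set()
        for doc in docs:
            key = dedup_key(doc)
            if key in seen or not looks_like_quality_text(doc):
                continue
            seen.add(key)
            yield doc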
