Hacker News

Not anymore. There's arguably more than enough data to form a base for strong LLMs; extra data is nice, but doesn't have to come in such quantity.

(In fact, there's value in trying to filter excess crap out of existing training sets.)




We're talking about LLM-driven search engines here; the assumption is that they will always need up-to-date information. A "strong LLM" can't give you the latest on the presidential election if its knowledge cutoff is in 2023, so these companies' "solution" is to scrape today's New York Times and have the LLM write a summary.


LLMs aren't embodied. They cannot break news, as they have no ability to gather fresh news.


You still need fresh data for many use cases.


Uh, sorry, what?

What happens when you need to search something new? Just hallucinations all the way down?



