
> ModernBERT with the extended context has solved natural language web search. I mean it as no exaggeration that _everything_ google does for search is now obsolete.

The web UI for people using search may be obsolete, but search itself is hotter than ever: every AI needs it, both web-scale and local, because models don't have recent information in them and can't reliably quote from memory.
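
To make the quoted claim concrete, here's a deliberately minimal sketch of dense retrieval with ModernBERT: embed the query and the documents, then rank by cosine similarity. It assumes the answerdotai/ModernBERT-base checkpoint; a real system would use a retrieval-fine-tuned variant and an approximate-nearest-neighbor index rather than brute-force scoring.

    # Minimal sketch: ModernBERT embeddings + cosine ranking.
    # answerdotai/ModernBERT-base is the base checkpoint (8192-token
    # context); a retrieval-tuned variant would score far better.
    import torch
    from transformers import AutoTokenizer, AutoModel

    tok = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
    model = AutoModel.from_pretrained("answerdotai/ModernBERT-base").eval()

    def embed(texts):
        batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**batch).last_hidden_state   # (batch, seq, dim)
        mask = batch["attention_mask"].unsqueeze(-1)    # ignore padding
        vecs = (hidden * mask).sum(1) / mask.sum(1)     # mean-pool tokens
        return torch.nn.functional.normalize(vecs, dim=-1)

    docs = ["ModernBERT extends the context window to 8192 tokens.",
            "Classic BERT models are capped at 512 tokens."]
    scores = embed(["long context encoder"]) @ embed(docs).T
    print(docs[scores.argmax().item()])                 # best match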

And models often make reasoning errors. Many users will want to check that the sources actually substantiate the conclusion.

The point is that the secret sauce in Google's search was better retrieval, and the assertion above is that the advantage there is gone. While crawling the web isn't a piece of cake, it's a much smaller moat than retrieval quality was.

Eh, I don't really see that.

Crawling the web is itself a huge moat, because a large number of sites block 'abusive' crawlers, with exceptions for Google and possibly Bing.

For example, just try to crawl a site like Reddit and see how long it takes before you're blocked and served a "please pay us for our data" message.
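
A quick way to see the two walls a new crawler hits: robots.txt disallows unknown agents, and the server rate-limits or blocks the rest. A small sketch; the user-agent string here is hypothetical, and big sites also block by IP reputation, not just robots.txt.

    # Check robots.txt first, then see what the server actually returns.
    import urllib.robotparser
    import requests

    UA = "MyNewSearchBot/0.1 (+https://example.com/bot)"   # hypothetical

    rp = urllib.robotparser.RobotFileParser("https://www.reddit.com/robots.txt")
    rp.read()

    url = "https://www.reddit.com/r/programming/"
    if not rp.can_fetch(UA, url):
        print("robots.txt disallows this agent")
    else:
        r = requests.get(url, headers={"User-Agent": UA}, timeout=10)
        print(r.status_code)   # blocked crawlers typically see 403 or 429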


My experience running a few hundred very successful shops (hundreds of thousands of orders per month) is that there's no need for quotes around 'abusive'.

95% of our load is from crawlers, so we have to pick who to serve.
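
"Picking who to serve" can't rely on the User-Agent header alone, since it's trivially spoofed. One common approach, which Google itself documents for Googlebot, is a reverse-then-forward DNS check; a sketch (the IP is just an example from a published Googlebot range):

    # Verify a claimed Googlebot IP via reverse + forward DNS lookup.
    import socket

    def is_real_googlebot(ip: str) -> bool:
        try:
            host = socket.gethostbyaddr(ip)[0]       # reverse lookup
            if not host.endswith((".googlebot.com", ".google.com")):
                return False
            # forward-confirm: the name must resolve back to the same IP
            return ip in socket.gethostbyname_ex(host)[2]
        except (socket.herror, socket.gaierror):
            return False

    print(is_real_googlebot("66.249.66.1"))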

If they want our data, all they need to do is offer a way for us to send it. We're happy to increase exposure, and shopping-aggregation-site updates are our second-highest-priority task, after price and availability updates; see the sketch below.
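
What that amounts to is a push feed, the model Google Merchant Center and most shopping aggregators already use for product data. A hedged sketch; the endpoint, token, and schema here are all hypothetical:

    # Push model: export a product feed and send it whenever price or
    # availability changes, instead of being crawled.
    import json
    import requests

    feed = [{
        "sku": "A-100",
        "title": "Example Widget",
        "price": "19.99",
        "currency": "EUR",
        "in_stock": True,
    }]

    resp = requests.post(
        "https://aggregator.example.com/v1/product-feed",  # hypothetical
        headers={"Authorization": "Bearer <token>",
                 "Content-Type": "application/json"},
        data=json.dumps(feed),
        timeout=30,
    )
    resp.raise_for_status()                                # 2xx or raise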


Crawling may be tricky, but it's a piece of cake compared to doing good retrieval.
