Hosting costs are so minimal today that I don't think crawling is a natural monopoly. How much would it really cost a site to be crawled by 100 search engines?
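
Back of the envelope (all numbers here are assumptions, not from any real site): a 100,000-page site averaging 100 KB per page, fully re-crawled once a day by 100 engines, works out to

    100,000 pages x 100 KB x 100 crawlers ≈ 1 TB/day of egress

That's a few tens of dollars a day at typical metered CDN egress rates and close to free on flat-rate hosting, and in practice most crawlers fetch far less than a full copy every day.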





A potentially shocking amount, depending on the desired freshness, if the bot isn’t custom-tailored to the site. I worked at a job-posting site, and Googlebot would nearly take down our search infrastructure because it crawled jobs by running searches rather than going through the index.

Bots are typically tuned to work across generic sites rather than to crawl any particular site efficiently.


Where is the cost coming from? Wouldn't a crawler mostly just be accessing cached static assets served by a CDN?

And what do you mean by your search infrastructure? Are you talking about Elasticsearch or some equivalent?


No, in our case they were indexing job posts by sending search requests. I.e., instead of pulling down the JSON files for the jobs, they would find them by sending queries like “New York City, New York software engineer” to our search. Those were generally not cached, because they weren’t searches a human would run (humans used the location dropdown).

I didn’t work on search, but yeah, something like Elasticsearch. Googlebot was a majority of our search traffic at times.
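
For what it's worth, the blunt mitigation for that pattern (just a sketch, not what we actually ran; the /search and /jobs/ paths and the sitemap URL are made up) is to fence internal search off in robots.txt and point crawlers at a sitemap of the postings instead:

    # Keep generic crawlers off internal search URLs; list postings in a sitemap instead
    User-agent: *
    Disallow: /search
    Allow: /jobs/
    Sitemap: https://example.com/sitemap-jobs.xml

Well-behaved crawlers like Googlebot honor the Disallow and shift to the sitemap URLs, which can be served as cached static pages; it obviously does nothing for bots that ignore robots.txt.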



