Ask HN: Is it possible to overwhelm a web-crawler with infinite links?
4 points by hellbanner on Aug 16, 2015 | 2 comments
mysite.com/:id

shows some text and links to

mysite.com/:id+1

Would a crawler keep hitting this infinite chain of pages? What scale would it take to bring down a crawler?
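
Roughly what I mean, as a minimal sketch (hypothetical Flask handler; the text and route are made up):

    # Every page links to the "next" id, so a naive crawler that follows
    # every link it sees never runs out of URLs to fetch.
    from flask import Flask

    app = Flask(__name__)

    @app.route("/<int:page_id>")
    def page(page_id: int):
        # Each response costs bandwidth plus a little CPU to render.
        return (
            f"<html><body>"
            f"<p>Some text for page {page_id}</p>"
            f'<a href="/{page_id + 1}">next</a>'
            f"</body></html>"
        )

    if __name__ == "__main__":
        app.run()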



Consider that every page the crawler downloads is a page's worth of bandwidth your server must deliver. If these pages are generated dynamically, your server also needs to spend the computation to generate them.
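
For a rough sense of the cost on your side (all numbers assumed, purely for illustration):

    # Back-of-envelope estimate with made-up figures.
    page_bytes = 20_000          # ~20 KB per generated page
    cpu_ms_per_page = 5          # render cost per page
    crawler_rps = 50             # requests per second from one crawler

    bandwidth_bytes_per_s = page_bytes * crawler_rps        # ~1 MB/s outbound
    cpu_cores_busy = cpu_ms_per_page * crawler_rps / 1000   # ~0.25 of one core
    print(bandwidth_bytes_per_s, cpu_cores_busy)

That is what a single well-behaved crawler costs you; the crawler itself spends far less per page than your server does to produce it.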

Add to this the fact that a crawler can stop and back off when its resources are being strained, but your web server, which may be serving content that is "mission-critical" for lack of a better phrase, probably needs to maintain low latency the whole time. In addition, there are multiple web crawlers out there that could all hit your server at the same time.

Given that, I think the scheme you've described is a mug's game: for every dollar's worth of damage you do to a crawler, you do more than a dollar's worth of damage to yourself.


It might be possible to kill a naive crawler with an infinitely spiralling link web. However, most modern crawlers are built with this in mind and will stop once a certain recursion depth is reached.
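
The depth-limit idea looks something like this sketch (not any real crawler's code; the cut-off and fetch_links helper are made up):

    # Breadth-first crawl that drops anything beyond MAX_DEPTH, so an
    # infinite chain of /1 -> /2 -> /3 ... gets cut off after a few hops.
    from collections import deque

    MAX_DEPTH = 5  # arbitrary cut-off for illustration

    def crawl(start_url, fetch_links):
        """fetch_links(url) -> list of absolute URLs found on that page."""
        seen = {start_url}
        queue = deque([(start_url, 0)])
        while queue:
            url, depth = queue.popleft()
            if depth >= MAX_DEPTH:
                continue  # refuse to spiral any deeper
            for link in fetch_links(url):
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
        return seen

Real crawlers also cap pages per host and throttle request rate, which blunts the trap even further.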



