Hacker News
riffraff
on Sept 29, 2013
| on:
How to write a crawler
In a couple of projects I worked on, we also stored visited URLs in a set of Bloom filters, which were themselves kept in flat files on disk.
At some point, querying the DB to check which URLs you already have can become quite heavy.
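For anyone who wants to try the idea, here is a rough sketch of a single Bloom filter for visited URLs that can be dumped to and reloaded from a flat file. It is not the code from those projects: the class name `BloomFilter`, the defaults (`num_bits`, `num_hashes`), the SHA-256 double-hashing scheme, and the file layout are all assumptions made for illustration.

```python
import hashlib
import os
import tempfile


class BloomFilter:
    """Fixed-size Bloom filter backed by a bytearray (illustrative sketch)."""

    def __init__(self, num_bits=1 << 20, num_hashes=7, bits=None):
        # num_bits and num_hashes are assumed defaults; tune them for your URL count.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bits if bits is not None else bytearray((num_bits + 7) // 8)

    def _positions(self, url):
        # Double hashing: derive all bit positions from two 64-bit
        # slices of a single SHA-256 digest.
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, url):
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, url):
        # False means "definitely not seen"; True means "probably seen".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(url))

    def save(self, path):
        # Assumed flat-file layout: 8-byte bit count, 2-byte hash count, raw bits.
        with open(path, "wb") as f:
            f.write(self.num_bits.to_bytes(8, "big"))
            f.write(self.num_hashes.to_bytes(2, "big"))
            f.write(self.bits)

    @classmethod
    def load(cls, path):
        with open(path, "rb") as f:
            num_bits = int.from_bytes(f.read(8), "big")
            num_hashes = int.from_bytes(f.read(2), "big")
            bits = bytearray(f.read())
        return cls(num_bits, num_hashes, bits)


# Usage: record a URL, persist the filter, and check membership after reloading.
seen = BloomFilter()
seen.add("http://example.com/")
path = os.path.join(tempfile.mkdtemp(), "visited.bloom")
seen.save(path)
restored = BloomFilter.load(path)
print("http://example.com/" in restored)
```

The check never touches the DB. A Bloom filter can return false positives, so a crawler using this would occasionally skip a URL it has not actually visited, but it never reports a visited URL as new.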