
@AwkwardPanda

And how does a site opt out of your scraping? Do you have a unique user-agent when you scrape? A set of IPs?




Hi, currently there is nothing of this sort. The user agents are random. I have a couple of servers doing the scraping in real time. The IPs are not static.

Let me see if I can build an opt-out list. But wouldn't it defeat the entire purpose of this app?


You should just declare your bot in its user-agent string. Most publishers won't even bother to block it, but giving publishers the option is correct etiquette for any bot. Random user agents amount to cloaked scraping.
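For what it's worth, a rough sketch of what that could look like in Python, using only the standard library. The bot name `ExampleNewsBot` and the info URL are made up for illustration (nothing here is from the actual app), and the helper names `is_allowed` and `build_request` are hypothetical:

```python
from urllib.request import Request
from urllib.robotparser import RobotFileParser

# Hypothetical bot identity: a fixed name plus a URL where publishers
# could read about the bot and how to opt out. Not from the actual app.
BOT_USER_AGENT = "ExampleNewsBot/1.0 (+https://example.com/bot-info)"


def is_allowed(robots_txt: str, url: str, user_agent: str = BOT_USER_AGENT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def build_request(url: str) -> Request:
    """Build a request that identifies the bot instead of sending a random user agent."""
    return Request(url, headers={"User-Agent": BOT_USER_AGENT})
```

With a stable name like this, a publisher who wants out only needs two lines in robots.txt (`User-agent: ExampleNewsBot` followed by `Disallow: /`), and the scraper skips any URL for which `is_allowed` returns False before sending the request from `build_request`.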


You know the answer to this.



