
>The argument is that you are stealing resources (computer time) from the site owner.

That's why 3taps is getting the data from Google's cache without touching Craigslist's servers.




How the heck are they scraping Google without being banned or rate limited?


There are a lot of companies that are scraping Google quite successfully. Many of these are for 'rank checking' services that provide ranking data for certain keywords over time; these are heavily used by SEO and marketing agencies.

The two that jump to mind are Authority Labs and SEOmoz.

I guess: a shed load of proxies. :)


Amazon or other clouds out there. Just auto-provision your instances (lots of them): scrape, sleep, wake, scrape, sleep...


It's not impossible. You just tell Google not to cache.
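
For what it's worth, here is a rough sketch of what "telling Google not to cache" could look like, assuming the standard `noarchive` robots directive, which can be sent either as a meta tag in the page or as an `X-Robots-Tag` response header. The helper name `with_noarchive` and the constant `NOARCHIVE_META` are made up for illustration and are not from any particular framework.

```python
# Meta-tag form: goes inside the <head> of each page you don't want cached.
NOARCHIVE_META = '<meta name="robots" content="noarchive">'


def with_noarchive(headers):
    """Return a copy of the response headers with the noarchive directive added.

    Hypothetical helper: how you attach headers to a response depends on
    your web server or framework.
    """
    merged = dict(headers)                 # leave the caller's dict untouched
    merged["X-Robots-Tag"] = "noarchive"   # header form of the same directive
    return merged


if __name__ == "__main__":
    print(with_noarchive({"Content-Type": "text/html"}))
```

The header form is handy for non-HTML responses where there is no place to put a meta tag.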


And makes it impossible to block them as a side effect.



