Is there a SOTA library for common web scraping issues at scale (especially distributed over a cluster of nodes): captcha detection, IP rotation, rate throttling, queue management, etc.?



What's a "SOTA library"?


A contextual guess: "'State of the art' library"

In other words: is there a drop-in library that solves all the big common issues people run into when scraping websites in the wild?

At least, that's how I read it.


There is no "state of the art" library for building your own Google. But rate throttling/limiting can be done with Redis, and rotating IPs is still just rate limiting with Redis. Captcha detection: you have to pay $$ for that, I think.
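
To make the Redis idea concrete, here is a rough sketch of a fixed-window limiter built on INCR plus EXPIRE, reused to pick an outgoing proxy. This is one possible reading of the comment, not a recommended design. The names allow_request, pick_proxy, the "rate:" key prefix, and the proxy addresses are all made up for illustration. FakeRedis is a hypothetical in-memory stand-in so the snippet runs without a server; the limiter only needs a client with incr(key) and expire(key, seconds), which a redis-py client also provides.

```python
import time


class FakeRedis:
    """Hypothetical in-memory stand-in exposing only incr() and expire()."""

    def __init__(self):
        self._data = {}  # key -> [count, expires_at or None]

    def _purge(self, key):
        # Drop the key if its expiry time has passed.
        entry = self._data.get(key)
        if entry and entry[1] is not None and entry[1] <= time.monotonic():
            del self._data[key]

    def incr(self, key):
        self._purge(key)
        entry = self._data.setdefault(key, [0, None])
        entry[0] += 1
        return entry[0]

    def expire(self, key, seconds):
        if key in self._data:
            self._data[key][1] = time.monotonic() + seconds
            return True
        return False


def allow_request(client, key, limit, window_seconds):
    """Fixed window: allow at most `limit` calls per window for `key`."""
    count = client.incr(key)
    if count == 1:
        # First hit in this window starts the expiry clock.
        client.expire(key, window_seconds)
    return count <= limit


def pick_proxy(client, proxies, limit, window_seconds):
    """Return the first proxy still under its own limit, or None."""
    for proxy in proxies:
        if allow_request(client, f"rate:{proxy}", limit, window_seconds):
            return proxy
    return None


if __name__ == "__main__":
    r = FakeRedis()
    proxies = ["10.0.0.1:8080", "10.0.0.2:8080"]  # placeholder addresses
    for _ in range(3):
        print(pick_proxy(r, proxies, limit=1, window_seconds=60))
```

Against a real Redis, the INCR and EXPIRE calls are two separate round trips, so a crash between them could leave a counter with no expiry; a Lua script or a transaction would be the usual way to close that gap. Captcha detection and queue management are not covered here.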



