Is there a state-of-the-art (SOTA) library for common web-scraping problems at scale (especially when distributed over a cluster of nodes), such as CAPTCHA detection, IP rotation, rate throttling, and queue management?
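There is no "state-of-the-art library" that lets you build your own Google. However, rate throttling/limiting can be done with Redis, and IP rotation is really still rate limiting with Redis: you track how many requests each IP has made and switch IPs when one hits its limit. For CAPTCHA detection/solving, I think you will have to pay for a service.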
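Here is a minimal sketch of what "rate limiting with Redis" and "IP rotation as rate limiting" could look like, assuming a simple fixed-window counter. It only relies on the Redis `INCR` and `EXPIRE` commands, which redis-py exposes as `redis.Redis.incr` and `redis.Redis.expire`, so in a cluster every node can share one Redis instance. The names `allow_request`, `pick_proxy`, `InMemoryStore`, and the `ratelimit:...` key format are hypothetical and made up for illustration. `InMemoryStore` is only a single-process stand-in so the snippet runs without a Redis server.

```python
import time
from typing import Dict, Optional, Protocol, Sequence


class CounterStore(Protocol):
    """The two Redis commands this sketch needs; a redis.Redis client provides both."""

    def incr(self, name: str, amount: int = 1) -> int: ...
    def expire(self, name: str, seconds: int) -> bool: ...


class InMemoryStore:
    """Hypothetical single-process stand-in for Redis, for local testing only.

    It never actually deletes keys; correctness here comes from the window id
    being part of the key, so old windows are simply never read again.
    """

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}

    def incr(self, name: str, amount: int = 1) -> int:
        self.counts[name] = self.counts.get(name, 0) + amount
        return self.counts[name]

    def expire(self, name: str, seconds: int) -> bool:
        return name in self.counts  # real Redis would drop the key after `seconds`


def allow_request(store: CounterStore, resource: str, limit: int,
                  window_s: int, now: Optional[float] = None) -> bool:
    """Fixed-window limiter: at most `limit` calls per `window_s` seconds per resource."""
    now = time.time() if now is None else now
    window_id = int(now // window_s)  # all calls in the same window share one counter
    key = f"ratelimit:{resource}:{window_id}"
    count = store.incr(key)
    if count == 1:
        # First hit in this window: let Redis clean the counter up afterwards.
        store.expire(key, window_s)
    return count <= limit


def pick_proxy(store: CounterStore, proxies: Sequence[str], limit: int,
               window_s: int, now: Optional[float] = None) -> Optional[str]:
    """IP rotation as rate limiting: return the first proxy still under its budget."""
    for proxy in proxies:
        if allow_request(store, f"proxy:{proxy}", limit, window_s, now):
            return proxy
    return None  # every proxy is exhausted for this window; the caller should back off


if __name__ == "__main__":
    store = InMemoryStore()  # swap in redis.Redis(host=..., port=...) on a real cluster
    proxies = ["10.0.0.1", "10.0.0.2"]
    for _ in range(3):
        print(pick_proxy(store, proxies, limit=1, window_s=60, now=0.0))
```

With `limit=1` this hands out each proxy once per 60-second window and then returns `None`. The design is deliberately naive. `INCR` followed by `EXPIRE` is not atomic, so a worker crashing in between could leave a key without a TTL. On a real cluster you would probably wrap the two calls in a `MULTI`/`EXEC` pipeline or a Lua script. You might also use a sliding window or token bucket if bursts at window boundaries matter.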