SpiderKeeper: Admin UI for scrapy/open source scrapinghub (github.com/dormymo)
107 points by r_singh on March 7, 2020 | 14 comments



SpiderKeeper last received an update 2 years ago. Scrapyd last received an update 9 months ago. Scrapy (the actual spider framework) is being actively maintained.


Loads of unmaintained open source projects have been making it to the front page lately.


This project seems to be a more recently updated fork, although it's also not maintained:

https://github.com/fliot/ScrapyKeeper


Are these spiders able to fetch content that loads in segments after scroll events?


Scrapy doesn't run JavaScript, so by itself it can't scrape client-rendered content at all. There are projects that combine Scrapy with a headless browser or JavaScript engine.
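To illustrate the point above, here is a minimal sketch using only the Python standard library rather than Scrapy itself. The page markup, the `ItemCollector` class, and the `static_items` helper are all made up for illustration. It shows that a parser which never executes scripts only sees the items present in the served HTML, not the ones a browser would append after a scroll event.

```python
from html.parser import HTMLParser

# Hypothetical page: one item is in the served HTML; further items would
# only be appended by the script when a real browser fires scroll events.
PAGE = """
<html><body>
<ul id="feed"><li class="item">first item</li></ul>
<script>
  window.addEventListener("scroll", function () {
    var li = document.createElement("li");
    li.className = "item";
    li.textContent = "loaded on scroll";
    document.getElementById("feed").appendChild(li);
  });
</script>
</body></html>
"""


class ItemCollector(HTMLParser):
    """Collects the text of every <li class="item"> in the static markup."""

    def __init__(self):
        super().__init__()
        self.in_item = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "li" and ("class", "item") in attrs:
            self.in_item = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_item = False

    def handle_data(self, data):
        # Script bodies also arrive here as raw text, but they are ignored
        # because no <li class="item"> is open at that point.
        if self.in_item:
            self.items[-1] += data


def static_items(html):
    """Return the item texts visible without running any JavaScript."""
    collector = ItemCollector()
    collector.feed(html)
    return collector.items


if __name__ == "__main__":
    print(static_items(PAGE))
```

The script is never run, so the "loaded on scroll" item never shows up in the result. Getting at that kind of content needs either a browser engine in the loop or direct requests to whatever endpoint the page's script would call.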


How does this compare with a general solution like Apache Airflow?


The following is a simplification but should suffice.

Airflow is for hooking together and managing a bunch of data sources, data-processing nodes, and data sinks.

This project seems like a GUI for Scrapy, which is a web crawling framework, i.e. one kind of data source.


Anyone here using this in production (even for a side project)?

What are some of the challenges you've faced as opposed to just using ScrapingHub's Scrapy Cloud service?


I am using this in production inside a docker container. It does the job pretty well.


All such admin tools use scrapyd, which is not maintained and not good enough to be used in production.
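For context, tools like this generally drive scrapyd through its HTTP JSON API. Below is a rough sketch, assuming a scrapyd instance on its default port 6800, of what a "schedule this spider" call looks like. The helper name, project name, and spider name are placeholders, and the request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_schedule_request(base_url, project, spider):
    """Build (but do not send) a POST to scrapyd's schedule.json endpoint."""
    # scrapyd expects form-encoded 'project' and 'spider' fields.
    data = urlencode({"project": project, "spider": spider}).encode("ascii")
    url = base_url.rstrip("/") + "/schedule.json"
    return Request(url, data=data, method="POST")


if __name__ == "__main__":
    # "myproject" and "myspider" are placeholder names for this sketch.
    req = build_schedule_request("http://localhost:6800", "myproject", "myspider")
    print(req.get_method(), req.full_url, req.data)
    # To actually schedule the job against a running scrapyd:
    #   from urllib.request import urlopen
    #   print(urlopen(req).read())
```

An admin UI is mostly a wrapper around calls like this one, so it inherits whatever limitations scrapyd has.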


What do you recommend using instead?


One of the companies I've worked for developed an in-house solution (with some domain-specific stuff).

I've had to write some kind of admin tool for one of my clients.

And then I started another project with Scrapy, and I feel like I need to develop yet another tool for scrapyd management :)


I think it can be faster than a normal basic function. It increases the case memory.


Nice. I'm stealing this concept for my scraping engine.



