Do you have recommendations for platforms that monitor scraping fleets?
I occasionally code scrapers for quick data aggregation, but have trouble running anything long-term because it can be a chore to monitor. I've been looking into various options for self-hosting some sort of monitor/dashboard that can send alerts but haven't found anything satisfying yet.
Only Scrapy support atm, but additional scraping frameworks/languages are on the roadmap. Feedback would be great so we can take it into account when prioritizing some over others :-)
I'd create a simple health check based on the integrity of the data you retrieve.
Think about giving each run a score based on how the data is shaped. If it's missing prices, for example, the score immediately drops to zero, the database update is skipped, and an alert goes out.
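Something like this is what I mean. A minimal sketch in Python; the field names (`price`, `title`) and the `db`/`alert` callables are hypothetical placeholders for whatever your pipeline actually uses:

```python
def health_score(items):
    """Return 1.0 if every scraped item has the critical fields, else 0.0."""
    if not items:
        return 0.0
    for item in items:
        # A missing critical field (e.g. price) drops the score straight to zero.
        if item.get("price") is None or item.get("title") is None:
            return 0.0
    return 1.0

def process_batch(items, db, alert):
    """Skip the database update and alert when the batch fails the check."""
    if health_score(items) == 0.0:
        alert("scrape failed integrity check; database not updated")
        return False
    db.insert(items)
    return True
```

The point is that the check runs on the scraped output itself, so it catches silent breakage (a site changed its markup, selectors now return nothing) that a process-level "is the scraper running" monitor would miss.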
I would recommend against building a score and would stick to booleans whenever possible. Having a numeric score means you're trying to extract nice-to-have data points, and in my experience this always leads to messy codebases and false assumptions.
If you allow "slightly unhealthy" scrapers in production, the state of your scrapers will almost inevitably settle at "slightly unhealthy". Save yourself the trouble and treat it as either "it works" or "it doesn't work", no in between. Your first iterations will probably break every day, but eventually you'll get to a happy place.
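To make the contrast concrete, here's a sketch of the boolean version. The required field names and the minimum-count threshold are assumptions you'd replace with your own invariants; the key property is that a batch either passes every check or fails outright:

```python
# Hypothetical invariants: adjust to whatever your scraper must guarantee.
REQUIRED_FIELDS = ("url", "title", "price")

def is_healthy(items, expected_min_count=1):
    """True only if the batch is big enough AND every item is complete."""
    if len(items) < expected_min_count:
        return False
    # No partial credit: one missing or empty field fails the whole batch.
    return all(
        item.get(field) not in (None, "")
        for item in items
        for field in REQUIRED_FIELDS
    )
```

With a boolean there's nothing to tune and no threshold to argue about: a failing run is a bug to fix, not a number to watch drift.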