This is a good list. I did this at a medium scale once (about 10,000 feeds that needed to be checked once per minute).

My favorite thing he mentioned is that various tags can have different meanings: published, updated, description, content, subtitle. To do this at scale you need per-feed configuration that specifies which tag actually holds each piece of information. Does <published> mean published, or does it actually mean updated? Everyone does it differently.
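
To make the per-feed configuration idea concrete, here is a minimal sketch of how it could look. It is not what we actually ran. The feed URLs, the `FEED_OVERRIDES` table, and the helper names are all made up for illustration, and it only handles Atom-namespaced tags.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Default assumption: each tag means what its name says.
DEFAULT_MAP = {"published": "published", "updated": "updated", "body": "content"}

# Hypothetical per-feed overrides: normalized field -> tag that really holds it.
# None means "this feed does not provide that field".
FEED_OVERRIDES = {
    "https://example.com/feed-b": {
        "updated": "published",  # this feed rewrites <published> on every edit
        "published": None,
        "body": "summary",
    },
}

def field_map(feed_url):
    """Merge the defaults with any overrides configured for this feed."""
    return {**DEFAULT_MAP, **FEED_OVERRIDES.get(feed_url, {})}

def normalize_entry(entry, mapping):
    """Read each normalized field from the tag the mapping points at."""
    return {
        field: entry.findtext(ATOM + tag) if tag else None
        for field, tag in mapping.items()
    }

SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <published>2024-01-02T00:00:00Z</published>
    <summary>Hello</summary>
  </entry>
</feed>"""

entry = ET.fromstring(SAMPLE).find(ATOM + "entry")
print(normalize_entry(entry, field_map("https://example.com/feed-b")))
```

The point is that the parsing code stays generic and all the per-publisher weirdness lives in data, so fixing a misbehaving feed is a config change rather than a code change.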

And the etag thing. Yeah…

One thing he didn’t mention is media. I think the HN crowd really likes RSS because the mostly-text tech blogs they like to read all support it, and it seems to work fine. But a lot of the population likes to read content that has embedded images and videos. Even slideshows sometimes. There are RSS extensions for this, but they suck for all the same reasons.

At my company we ended up abandoning RSS and writing a customizable web scraper instead (ingesting HTML pages). It was actually a lot easier than dealing with RSS.