Scrapy's crawling and CSS/XPath selectors are fine, but the pipeline stage after that annoys me, especially getting the data into a SQLite database. I wish cleaning up the data were a series of transformations on SQL tables instead of a bunch of work on Python models.
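
A minimal sketch of that idea, using only the standard-library `sqlite3` module: the pipeline writes items into a raw staging table exactly as scraped, and the cleanup happens afterwards as a SQL transformation. The class follows the classic Scrapy item pipeline method names (`open_spider`, `process_item`, `close_spider`), but the item fields (`name`, `price`), the table names, the default database path, and the cleaning rules are all assumptions made up for illustration.

```python
import sqlite3


class RawSQLitePipeline:
    """Scrapy-style item pipeline that stores items uncleaned.

    Duck-typed on purpose: it follows the item pipeline method names
    but does not import Scrapy. Field and table names are hypothetical.
    """

    def __init__(self, db_path="scraped.db"):  # hypothetical default path
        self.db_path = db_path

    def open_spider(self, spider):
        self.conn = sqlite3.connect(self.db_path)
        # Staging table: everything is TEXT, nothing is validated yet.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS raw_items (name TEXT, price TEXT)"
        )

    def process_item(self, item, spider):
        # Insert values exactly as scraped; no cleaning in Python.
        self.conn.execute(
            "INSERT INTO raw_items (name, price) VALUES (?, ?)",
            (item.get("name"), item.get("price")),
        )
        return item

    def close_spider(self, spider):
        clean_with_sql(self.conn)
        self.conn.commit()
        self.conn.close()


def clean_with_sql(conn):
    """Rebuild the cleaned table from the raw one, purely in SQL."""
    conn.executescript(
        """
        DROP TABLE IF EXISTS products;
        CREATE TABLE products AS
        SELECT DISTINCT
            TRIM(name) AS name,
            -- Strip a currency symbol and thousands separators, then cast.
            CAST(REPLACE(REPLACE(TRIM(price), '$', ''), ',', '') AS REAL) AS price
        FROM raw_items
        WHERE name IS NOT NULL AND TRIM(name) <> '';
        """
    )
```

In a real project this class would be enabled through the `ITEM_PIPELINES` setting. Because `raw_items` is kept as scraped, the cleaning query can be changed and rerun against the same database without crawling again.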