But they are also significantly slower and consume hundreds of times more memory, and they will still break on the first small change in the DOM tree along the path you rely on, which a smart regexp can sometimes survive. For instance, if you're only looking for the title and the price of a single product (a common requirement for spiders), you can extract both with a regexp without caring about the rest of the HTML page at all: just pick the <h1> and something that looks like a dollar value. I've had spiders like that keep working after a full redesign of a site where even the platform was moved from WP to Magento. No DOM parser tied to a fixed path could handle that, plus they would be too slow in the first place. You need to pick the right tool for the job, everything else is BS...
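
To make the <h1>-plus-dollar-value idea concrete, here is a rough Python sketch. The function name `extract_product`, the patterns, and the sample markup are all made up for illustration and are not from a real spider. It also assumes the first <h1> is the product title and the first dollar amount on the page is the price, which won't hold on every site (a "free shipping over $50" banner in the header would fool it).

```python
import re
from html import unescape

# First <h1>...</h1> on the page, whatever attributes or nesting it has.
H1_RE = re.compile(r"<h1\b[^>]*>(.*?)</h1>", re.IGNORECASE | re.DOTALL)
# Any tag, used to strip markup nested inside the <h1>.
TAG_RE = re.compile(r"<[^>]+>")
# Something that looks like a dollar value: $19.99, $ 1,299.00, $5
PRICE_RE = re.compile(r"\$\s?(\d{1,3}(?:,\d{3})+(?:\.\d{2})?|\d+(?:\.\d{2})?)")


def extract_product(page):
    """Return (title, price) as strings, or None for whatever is missing."""
    title = None
    m = H1_RE.search(page)
    if m:
        # Drop nested tags, decode entities, collapse whitespace.
        title = " ".join(unescape(TAG_RE.sub(" ", m.group(1))).split())

    price = None
    m = PRICE_RE.search(page)
    if m:
        price = m.group(1).replace(",", "")  # "1,299.00" -> "1299.00"
    return title, price


# Two hypothetical layouts for the same product, before and after a redesign.
old_layout = ('<div id="content"><h1 class="entry-title">Blue Widget</h1>'
              '<span class="price">$1,299.00</span></div>')
new_layout = ('<main><section><h1><span itemprop="name">Blue Widget</span></h1>'
              '<div class="price-box"><span>$ 1,299.00</span></div></section></main>')

print(extract_product(old_layout))
print(extract_product(new_layout))
```

Both calls return the same title and price even though the surrounding markup is completely different, because nothing in the patterns depends on where the <h1> or the price sits in the tree.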