
The long tail is tough, but rules are useful when you only need to work with a small number of sites and, as you point out, less "modern" ones. (News sites tend to be consistently manageable but, yes, smaller e-commerce players tend to adopt more modern techniques, as befits fashion-forward product lines, naturally.)
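
To make "rules" concrete, here is a rough sketch of what per-site rules can look like when the site list is small. Everything in it is an assumption for illustration: the hostnames, the `RULES` table, and the `extract_title` helper are hypothetical, and it uses only the Python standard library rather than any particular scraping tool.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical per-site rules: hostname -> (tag, class) that holds the title.
RULES = {
    "news.example.com": ("h1", "headline"),
    "shop.example.com": ("span", "product-title"),
}


class FirstMatchText(HTMLParser):
    """Collects the text of the first element matching (tag, class)."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0      # > 0 while inside the matched element
        self.done = False   # True once the first match has closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            # Track nested elements with the same tag name.
            if tag == self.tag:
                self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_title(url, html):
    """Apply the rule for the URL's host; return None if no rule or no match."""
    rule = RULES.get(urlparse(url).hostname)
    if rule is None:
        return None  # the long tail: no rule written for this site
    parser = FirstMatchText(*rule)
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.parts).split())
    return text or None
```

The sketch also shows where rules break down: a site with no entry in the table yields nothing, and a page whose markup is filled in by JavaScript after load gives the rule nothing to match in the raw HTML.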

Our (Diffbot) approach is to learn what news and product (and other) pages look like, which obviates rules management. We also fully execute JS when rendering.

The web keeps evolving though, dang it. Tricky thing!




Unfortunately Diffbot is not open source. Are you planning any F/OSS offerings?



