
I was retrieving text from news sites, so URLs were not that relevant.

Some news services will re-issue a story with more information while keeping the same title and description, so a full-text check is necessary. I computed a secure hash of the text and compared the hashes.
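Here is a minimal Python sketch of that full-text check. It assumes SHA-256 from the standard `hashlib` module as the "secure hash" and collapses whitespace before hashing, so reflowed but otherwise identical text is not flagged. The function names are hypothetical, not the poster's actual code.

```python
import hashlib

def text_fingerprint(text: str) -> str:
    """Return a SHA-256 hex digest of the article body.

    Assumption: runs of whitespace are collapsed first, so line-wrapping
    changes don't count as edits. Drop this step to detect any change at all.
    """
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def is_reissued(old_text: str, new_text: str) -> bool:
    """True if the body changed, even when the title and description did not."""
    return text_fingerprint(old_text) != text_fingerprint(new_text)
```

Storing `text_fingerprint(body)` next to each title lets a crawler notice when a story has been re-issued under the same headline. Note that any single-character edit changes the digest completely, so this detects change but says nothing about how much changed.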




There are two extremes I know of here: a same or similar title with changing content (we hit gold in SEO, let's keep updating this "10 best foos for baring" page), and a changing title with the same content (anyone doing serious A/B testing).


Both are true. The internet has way more covers than actual books, you could say :). Content is very much repackaged over and over again.

However, I found that URLs don't change as much as titles and slightly edited texts. It will happen, of course, but to go beyond that you would need a similarity hash of the page's actual content, and even that reaches its limits pretty quickly:

Sometimes a changed title plus a few edits can change the whole narrative of a near-identical article. It looks like even AI can't currently solve that. Interestingly, I've seen that happen even at pretty large newspapers: "whatever clicks"...
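One common kind of similarity hash is Charikar-style SimHash. The sketch below is an assumption, not what either poster used. It treats lowercased, whitespace-separated words as features, hashes each with `hashlib.blake2b`, and lets every word vote on every bit of a 64-bit fingerprint. Near-identical texts end up a small Hamming distance apart, and that is also where the limit above shows up: a few edits that change the narrative barely move the fingerprint.

```python
import hashlib

def simhash(text: str, bits: int = 64) -> int:
    """SimHash fingerprint of `text`, using lowercased words as features."""
    counts = [0] * bits
    for word in text.lower().split():
        # Stable per-word hash with exactly `bits` bits
        # (bits must be a multiple of 8, between 8 and 512).
        digest = hashlib.blake2b(word.encode("utf-8"), digest_size=bits // 8).digest()
        h = int.from_bytes(digest, "big")
        # Each word votes +1 for bit positions it sets and -1 for those it clears.
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    # Bit i of the fingerprint is set where positive votes outweigh negative ones.
    return sum(1 << i for i in range(bits) if counts[i] > 0)

def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

Two pages might be treated as the same article when `hamming(simhash(a), simhash(b))` is at most a few bits out of 64. That threshold is an assumption you would need to tune. Because this version ignores word order, reordered paragraphs hash identically, and small rewrites that flip the meaning can still pass as duplicates.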

Since zebra is meant to enable debate and the exchange of perspectives on specific content, I think only URLs give some degree of certainty that two items point to the same content.

Happy to hear other ideas!



