
I posted this a few days ago... I wonder why HN didn't keep the original post.

https://news.ycombinator.com/item?id=16159589





An improvement to the Hacker News website would be to compute and compare hashes of weblinks so that the same link is not reposted multiple times. Later posters of a link could then be redirected to the original submission.
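
A minimal sketch of that idea in Python, assuming a simple in-memory map from URL hash to the first submission's id (the names here are illustrative, not HN's actual internals):

    import hashlib

    seen = {}  # sha256 of the submitted URL -> id of the first submission

    def submit(url, item_id):
        key = hashlib.sha256(url.encode("utf-8")).hexdigest()
        if key in seen:
            return seen[key]    # redirect the poster to the earlier item
        seen[key] = item_id     # first time this exact URL has been posted
        return item_id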


It wouldn't work - it's trivial to add meaningless query parameters or anchors that would change the hash but still lead to the same content. And stripping them wouldn't work either, because some sites use them to route to content.
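
To make the objection concrete, here is a sketch of the usual workaround - normalizing the URL by dropping fragments and common tracking parameters - along with why it can't be applied blindly:

    from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

    def normalize(url):
        # Heuristic only: drop the fragment and utm_* tracking parameters.
        # Some sites legitimately use query strings to route to content,
        # so stripping everything would break those links.
        parts = urlparse(url)
        query = [(k, v) for k, v in parse_qsl(parts.query)
                 if not k.startswith("utm_")]
        return urlunparse(parts._replace(query=urlencode(query), fragment=""))

    # These all point at the same article but hash differently as-is:
    #   https://example.com/story?utm_source=twitter
    #   https://example.com/story#comments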

What might work is hashing the text and outbound links of submitted pages, and building something like a similarity index over text, metadata and the link graph, but that would probably still be fragile, and definitely be too much effort for a site with as little traffic as this.
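
One way such a similarity index could work, sketched here as Jaccard similarity over word shingles of the extracted page text (the shingle size and threshold are arbitrary illustrations):

    def shingles(text, n=5):
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0

    # Flag a new submission as a likely duplicate if its text is close
    # enough to an already-indexed page, e.g.:
    #   jaccard(shingles(new_text), shingles(old_text)) > 0.8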


They could capture the canonical URL from the meta tags in the page. I don't think they do currently.


Assuming a site has one, although most news sites probably do. Facebook Open Graph and other social media tags are worth looking for as well. Unfortunately, they're not always trustworthy.
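
A sketch of that lookup with BeautifulSoup, preferring the canonical link and falling back to the Open Graph URL; since the tags aren't always trustworthy, the submitted URL is kept as the last resort:

    from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

    def canonical_url(html, submitted_url):
        soup = BeautifulSoup(html, "html.parser")
        link = soup.find("link", rel="canonical")
        if link and link.get("href"):
            return link["href"]
        og = soup.find("meta", property="og:url")
        if og and og.get("content"):
            return og["content"]
        return submitted_url  # neither tag present, or not trusted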


It would still pick up some of the low-hanging fruit though, which is better than nothing.


I think they do. I have submitted a link before and been redirected to the existing comments page.


It does that, but only if the other submission was made very recently and/or(?) has points over a threshold - otherwise duplicate submissions are explicitly allowed.


This behavior happens already, but only if the link has gotten sufficient attention.


Why even hash the weblinks? You can just store them directly, no?


... and I tried to see comments on it days ago.

https://news.ycombinator.com/item?id=16149039

It is strange how no one was interested when it was previously submitted, and suddenly it attracts a lot of interest.


Timing is one of the most important parts of getting "hits" on social media, and while it can be optimized for, it's not always controllable. A post may succeed because someone made an attention-getting comment, for example, and that's just the luck of the commenter running across the link when he or she had the time and inclination to leave a note.

Social is fickle that way, and since most social algorithms strongly consider post recency and other time-sensitive factors in their ranking, duplicates should be allowed within a reasonable time frame, because you never know when the critical path will get hit.



