It wouldn't work - it's trivial to add meaningless query parameters or anchors that change the hash but still lead to the same content. And stripping them wouldn't work either, because some sites use them to route to content.
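To make both failure modes concrete, here's a rough sketch. The helper names (`url_hash`, `strip_query_and_fragment`) and the example.com URLs are made up for illustration, and SHA-256 is just a stand-in for whatever hash would actually be used.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def url_hash(url: str) -> str:
    """Hash the raw URL string (hypothetical dedupe key)."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def strip_query_and_fragment(url: str) -> str:
    """Drop the query string and fragment, keeping scheme, host and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


# Same page, but a junk parameter and an anchor give a different hash.
plain = "https://example.com/story"
padded = "https://example.com/story?utm_source=feed#top"
print(url_hash(plain) == url_hash(padded))  # False

# Stripping fixes that case...
print(strip_query_and_fragment(padded) == plain)  # True

# ...but collapses two different pages on a site that routes by query string.
first = "https://example.com/index.php?id=1"
second = "https://example.com/index.php?id=2"
print(strip_query_and_fragment(first) == strip_query_and_fragment(second))  # True
```

So you either miss trivially disguised duplicates or you wrongly merge distinct pages, depending on which way you go.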
What might work is hashing the text and outbound link content of submitted pages, and building something like a similarity index of text, metadata and a graph of links, but that would probably still be fragile, and would definitely be too much effort for a site with as little traffic as this.
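For a sense of what the simplest version of that might look like, here's a minimal sketch. It is not a real index: it swaps plain hashing for word shingles compared with Jaccard similarity, leaves metadata out entirely, and the function names and the 0.7 text weight are arbitrary assumptions.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of overlapping k-word tuples in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union; 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def page_similarity(text_a, links_a, text_b, links_b, text_weight=0.7):
    """Weighted mix of text similarity and outbound-link overlap (weight is a guess)."""
    text_score = jaccard(shingles(text_a), shingles(text_b))
    link_score = jaccard(set(links_a), set(links_b))
    return text_weight * text_score + (1 - text_weight) * link_score
```

Two submissions scoring above some threshold would be flagged as probable duplicates. Picking that threshold, and coping with boilerplate text shared across a whole site, is where the fragility comes in.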
Assuming a site has one, although most news sites probably do. Facebook Open Graph and other social media tags are worth looking for as well. Unfortunately, they're not always trustworthy.
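If anyone wants to poke at the Open Graph side, a rough sketch using only the standard library's `html.parser` could look like this. The class and function names are made up, and it only reads `<meta property="og:...">` tags, keeping the first value seen for each property.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect og:* meta tags from an HTML document (first value per property wins)."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        content = attr.get("content")
        if prop.startswith("og:") and content is not None:
            self.tags.setdefault(prop, content)


def open_graph_tags(html: str) -> dict:
    """Return a dict of og:* properties found in the page source."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.tags
```

Given the trust problem, the value of `og:url` is best treated as a hint to compare against the submitted URL rather than as the answer.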