I find it interesting that they're manually approving the hits, because, as they indicate, most hits are (nearly) identical.
It shouldn't be too difficult to automate this, though. Exact duplicates can be discarded trivially, and hits that differ only by a few swapped words or letters can be caught with a string-similarity measure such as edit distance.
I had a look at the source code, and it does quite a bit of filtering, particularly around making sure the words are unique, and there is a primitive character comparison algorithm.
The code could be simplified by using Python's set() for the uniqueness checks, and improved by replacing the character comparison with a standard Levenshtein (edit distance) function.
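To make that concrete, here's a rough sketch of what I have in mind. It is not taken from their code: the names `levenshtein` and `filter_hits`, the `max_distance` threshold of 2, and the assumption that hits are plain strings are all mine. Note that plain Levenshtein counts two swapped adjacent letters as 2 edits, so the threshold would need tuning against real data.

```python
def levenshtein(a, b):
    """Textbook dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def filter_hits(hits, max_distance=2):
    """Drop exact duplicates, then drop hits too similar to one already kept.

    max_distance is a hypothetical threshold, not a value from the original code.
    """
    seen = set()  # exact duplicates are caught with a set lookup
    kept = []
    for hit in hits:
        if hit in seen:
            continue
        seen.add(hit)
        # Near-duplicate check: compare against every hit kept so far.
        if any(levenshtein(hit, k) <= max_distance for k in kept):
            continue
        kept.append(hit)
    return kept
```

Comparing each hit against everything kept so far is quadratic, which should be fine for a few thousand hits but would need something smarter for a much larger list.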