The hashes involved in stuff like this, as with copyright auto-matching, are perceptual hashes (https://en.wikipedia.org/wiki/Perceptual_hashing), not cryptographic hashes. False matches are common enough that perceptual hashing attacks are already being used to manipulate search engine results (see, for example, this paper on the subject: https://gangw.cs.illinois.edu/PHashing.pdf).
That seems like very relevant information that the court didn't consider. If this were a cryptographic hash, I would say with high confidence that it is the same image, and so Google effectively examined it. There is a minuscule chance that some unrelated file (which might not even be a picture) produces the same hash, but the universe will likely end before that happens, so the courts could reasonably treat it as the same image for search purposes. Because perceptual hashes produce many false positives, however, there are reasonable odds that the image is legal, and so a higher standard for a search is needed: a warrant.
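To make the collision argument concrete: a cryptographic hash like SHA-256 changes completely after even a one-byte edit, so a match is overwhelming evidence of a byte-identical file. (An illustrative sketch only, with made-up placeholder bytes; this says nothing about the specific hashes Google uses.)

```python
import hashlib

data    = b"...original image bytes..."
tweaked = b"...original image byteZ..."  # one byte changed

h1 = hashlib.sha256(data).hexdigest()
h2 = hashlib.sha256(tweaked).hexdigest()

# The two digests share essentially no structure: roughly half of all
# bits differ after any edit, however small (the avalanche effect).
print(h1)
print(h2)
print(h1 == h2)  # False
```

A perceptual hash is designed for exactly the opposite behavior: small edits leave the hash nearly unchanged, which is what makes false positives possible in the first place.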
>so the courts can consider it the same image for search purposes
An important part of the ruling seems to be that neither Google nor the police had the original image or any information about it, so the police viewing the image gave them more information than the hash match gave Google. For example, consider how the case would have changed if the suspect had appeared in the image, or if the image had turned out not to be CSAM but instead showed the suspect storing drugs somewhere, or was even, somehow, something entirely legal but embarrassing to the suspect. None of that depends on the type of hash.
It shouldn't. Google hasn't otherwise seen the image, so no employee could have witnessed a crime. Reportedly, many perfectly legal images end up in these almost entirely unaccountable databases.
That makes sense: if they used a cryptographic hash, people could get around it by making tiny changes to the file. I've used some reverse image search tools, which use perceptual hashing under the hood, to find the original source for art that gets shared without attribution (SauceNAO is pretty solid). They're good, but they definitely have false positives.
Now you’ve got me interested in what’s going on under the hood, lol. It’s probably like any other statistical model: you can decrease your false negatives (images people have cropped or added watermarks or text to), but at the cost of increased false positives.
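That tradeoff falls directly out of the matching threshold. Perceptual hashes are typically compared by Hamming distance (number of differing bits), and the cutoff you pick trades false negatives against false positives. A toy sketch with made-up 16-bit hashes (real ones are usually 64 bits or more):

```python
def hamming(a: int, b: int) -> int:
    """Count differing bits between two hash values."""
    return bin(a ^ b).count("1")

original  = 0b1011001110001111
cropped   = original ^ 0b0000000000000001  # near-duplicate: 1 bit flipped
unrelated = original ^ 0b1100110001110000  # distant image: 7 bits flipped

# A loose threshold catches the cropped copy (fewer false negatives)
# but starts admitting unrelated images too (more false positives).
for threshold in (0, 4, 10):
    print(threshold,
          hamming(original, cropped) <= threshold,
          hamming(original, unrelated) <= threshold)
# prints:
# 0 False False
# 4 True False
# 10 True True
```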
Rather simple methods are surprisingly effective [1]. There's sure to be more NN fanciness nowadays (like Apple's proposed NeuralHash), but I've used the algorithms described in [1] to great effect in the not-too-distant past. The HN discussion linked in that article is also worth a read.
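For a sense of how simple those methods are, here's the classic average hash (aHash): downscale to an 8x8 grayscale grid, threshold each pixel against the mean brightness, and pack the bits. A minimal pure-Python sketch that assumes the downscaling has already happened (a real implementation would first resize the image, e.g. with Pillow):

```python
def average_hash(pixels):
    """aHash: threshold each pixel against the mean brightness and
    pack the resulting 64 bits into an integer.

    `pixels` is assumed to already be an 8x8 grayscale grid; a real
    implementation would resize the source image down to 8x8 first.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (p > mean)
    return bits

# Toy gradient "image" and a uniformly brightened copy of it.
img = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
brighter = [[min(p + 10, 255) for p in row] for row in img]

h1, h2 = average_hash(img), average_hash(brighter)
print(bin(h1 ^ h2).count("1"))  # prints 0: brightening didn't change the hash
```

Because every pixel is compared to the image's own mean, a uniform brightness shift mostly cancels out, which is exactly the kind of robustness a cryptographic hash can't give you.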
This submission is the first I've heard of the concept. Are there OSS implementations available? Could I use this to, say, deduplicate resized or re-JPEG-compressed images?