The hashes involved in stuff like this, as with copyright auto-matching, are perceptual hashes (https://en.wikipedia.org/wiki/Perceptual_hashing), not cryptographic hashes. False matches are common enough that perceptual hashing attacks are already being used to manipulate search engine results (see, for example, this paper on the subject: https://gangw.cs.illinois.edu/PHashing.pdf).
That seems like very relevant information that the court didn't consider. If this were a cryptographic hash, I would say with high confidence that it is the same image, and so Google effectively examined it. There is a minuscule chance that some unrelated file (which might not even be a picture) produces the same hash, but the universe will likely end before that happens, so the courts could reasonably treat it as the same image for search purposes. Because perceptual hashes produce many false positives, however, there are reasonable odds that the image is legal, and so a higher standard for a search is needed: a warrant.
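To make the collision argument concrete: a cryptographic hash like SHA-256 changes completely after even a one-byte edit, so a match is overwhelming evidence of a byte-identical file. (An illustrative sketch only, with made-up placeholder bytes; this says nothing about the specific hashes Google uses.)

```python
import hashlib

data    = b"...original image bytes..."
tweaked = b"...original image byteZ..."  # one byte changed

h1 = hashlib.sha256(data).hexdigest()
h2 = hashlib.sha256(tweaked).hexdigest()

# The two digests share essentially no structure: roughly half of all
# bits differ after any edit, however small (the avalanche effect).
print(h1)
print(h2)
print(h1 == h2)  # False
```

A perceptual hash is designed for exactly the opposite behavior: small edits leave the hash nearly unchanged, which is what makes false positives possible in the first place.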
>so the courts can consider it the same image for search purposes
An important part of the ruling seems to be that neither Google nor the police had the original image or any information about it, so the police viewing the image gave them more information than the hash match gave Google. For example, consider how the case would have changed if the suspect had appeared in the image, or if the image had turned out not to be CSAM but instead showed the suspect storing drugs somewhere, or was even, somehow, something entirely legal but embarrassing to the suspect. None of that depends on the type of hash.
It shouldn't. Google hasn't otherwise seen the image, so no employee could have witnessed a crime. Reportedly, many perfectly legal images end up in these almost entirely unaccountable databases.
That makes sense: if they used a cryptographic hash, people could get around it by making tiny changes to the file. I've used some reverse image search tools, which use perceptual hashing under the hood, to find the original source for art that gets shared without attribution (SauceNAO is pretty solid). They're good, but they definitely have false positives.
Now you’ve got me interested in what’s going on under the hood, lol. It’s probably like any other statistical model: you can decrease your false negatives (images people have cropped or added watermarks or text to), but at the cost of increased false positives.
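That tradeoff falls directly out of the matching threshold. Perceptual hashes are typically compared by Hamming distance (number of differing bits), and the cutoff you pick trades false negatives against false positives. A toy sketch with made-up 16-bit hashes (real ones are usually 64 bits or more):

```python
def hamming(a: int, b: int) -> int:
    """Count differing bits between two hash values."""
    return bin(a ^ b).count("1")

original  = 0b1011001110001111
cropped   = original ^ 0b0000000000000001  # near-duplicate: 1 bit flipped
unrelated = original ^ 0b1100110001110000  # distant image: 7 bits flipped

# A loose threshold catches the cropped copy (fewer false negatives)
# but starts admitting unrelated images too (more false positives).
for threshold in (0, 4, 10):
    print(threshold,
          hamming(original, cropped) <= threshold,
          hamming(original, unrelated) <= threshold)
# prints:
# 0 False False
# 4 True False
# 10 True True
```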
Rather simple methods are surprisingly effective [1]. There's sure to be more NN fanciness nowadays (like Apple's proposed NeuralHash), but I've used the algorithms described in [1] to great effect in the not-too-distant past. The HN discussion linked in that article is also worth a read.
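For a sense of how simple those methods are, here's the classic average hash (aHash): downscale to an 8x8 grayscale grid, threshold each pixel against the mean brightness, and pack the bits. A minimal pure-Python sketch that assumes the downscaling has already happened (a real implementation would first resize the image, e.g. with Pillow):

```python
def average_hash(pixels):
    """aHash: threshold each pixel against the mean brightness and
    pack the resulting 64 bits into an integer.

    `pixels` is assumed to already be an 8x8 grayscale grid; a real
    implementation would resize the source image down to 8x8 first.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (p > mean)
    return bits

# Toy gradient "image" and a uniformly brightened copy of it.
img = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
brighter = [[min(p + 10, 255) for p in row] for row in img]

h1, h2 = average_hash(img), average_hash(brighter)
print(bin(h1 ^ h2).count("1"))  # prints 0: brightening didn't change the hash
```

Because every pixel is compared to the image's own mean, a uniform brightness shift mostly cancels out, which is exactly the kind of robustness a cryptographic hash can't give you.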
This submission is the first I've heard of the concept. Are there OSS implementations available? Could I use this to, say, deduplicate resized or re-JPEG-compressed images?