I'm a big ACLU supporter, but I thought this was poorly done. They never released either the database or the code for this test, and had configured the recognition level against Amazon's recommendations.
In the ACLU's posts, they say that they used the "default match settings". In their response [0] to Amazon's response, the ACLU links to a guide published by Amazon [1] intended to "Identify Persons of Interest for Law Enforcement" that does use `searchFaceRequest?.faceMatchThreshold = 0.85;` (this is still the case today).
Fast Company [2] writes about this as well: "The ACLU in both tests used an 80% match confidence threshold, which is Amazon’s default setting, but Amazon says it encourages law enforcement to use a 99% threshold for spotting a match." That bit of the article links to the CompareFaces API documentation [3] which (still) states "By default, only faces with a similarity score of greater than or equal to 80% are returned in the response".
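To make the parameter concrete, here's a minimal boto3 sketch of where that threshold sits in the two Rekognition calls being discussed. The bucket, file, and collection names are placeholders, and this is not the ACLU's actual code:

```python
# Minimal boto3 sketch: where the discussed threshold parameters appear in the
# two Rekognition operations. Bucket, file, and collection names are placeholders.
import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")

# CompareFaces: per the docs quoted above, only matches with similarity >= 80%
# are returned when SimilarityThreshold is left at its default.
compare = rekognition.compare_faces(
    SourceImage={"S3Object": {"Bucket": "example-bucket", "Name": "probe.jpg"}},
    TargetImage={"S3Object": {"Bucket": "example-bucket", "Name": "candidate.jpg"}},
    # SimilarityThreshold=99.0,  # the value Amazon says law enforcement should use
)
for match in compare["FaceMatches"]:
    print("CompareFaces similarity:", match["Similarity"])

# SearchFacesByImage against an indexed collection: the law-enforcement guide
# cited above sets the equivalent threshold to 0.85 (boto3 expresses it as a
# percentage, so 85.0 here); the documented default is 80%.
search = rekognition.search_faces_by_image(
    CollectionId="example-collection",
    Image={"S3Object": {"Bucket": "example-bucket", "Name": "probe.jpg"}},
    FaceMatchThreshold=85.0,
    MaxFaces=5,
)
for match in search["FaceMatches"]:
    print(match["Face"]["ExternalImageId"], match["Similarity"])
```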
My guess is: Amazon writes that 99% is the recommended threshold because there is little chance of false positive. If an agency is deploying this system and complains that it isn’t working well, a solutions architect will say in a meeting (but not in writing) to lower the threshold. If it comes out that false positives occur, AWS isn’t responsible.
True, but one can't blame Amazon in this case. The agency can reduce the threshold to 0% (ad absurdum) and claim that Amazon's technology is not working.
I guess one must define "adequate" first. The default value of 80% could work fine if you are developing a "find your celebrity doppelgänger" game, while law enforcement should probably use 99%.
I think there's a point at which, as the threshold increases, the probability of the true positive being in your results starts to drop severely. 99% might be useless if you only have one or two hits and they are unlikely to be correct. You can't assume that the one you're looking for will be a 100% match; if it were, you'd just set the threshold to that, presto.
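To illustrate that trade-off, here's a toy simulation. The score distributions below are entirely made up (not Rekognition's real behavior); the point is only that raising the threshold cuts false positives sharply but past some point also starts discarding genuine matches:

```python
# Toy simulation, not real Rekognition data: similarity scores for genuine and
# impostor pairs are drawn from invented distributions to show how the
# false-positive / true-positive trade-off changes nonlinearly with the threshold.
import numpy as np

rng = np.random.default_rng(0)
genuine = np.clip(rng.normal(0.93, 0.05, 1_000), 0, 1)      # true matches
impostor = np.clip(rng.normal(0.70, 0.08, 100_000), 0, 1)   # everyone else

for threshold in (0.80, 0.90, 0.95, 0.99):
    kept = (genuine >= threshold).mean()          # fraction of true matches kept
    false_hits = int((impostor >= threshold).sum())
    print(f"threshold {threshold:.2f}: true matches kept {kept:.1%}, "
          f"false positives {false_hits}")
```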
>Fast Company [2] writes about this as well: "The ACLU in both tests used an 80% match confidence threshold, which is Amazon’s default setting, but Amazon says it encourages law enforcement to use a 99% threshold for spotting a match."
Then this whole thing is potentially misleading because there's a huge difference between 80% and 99%. It's probably nonlinear and they could possibly see their false matches drop to 0. This is not a fair test - or rather, the conclusions are not quite supported by the parameters.
Not that I'm defending police use of facial recognition tech, I think it's abhorrent, though possibly inevitable.
If they make a facial recognition tool available to law enforcement, and the marketing says it "requires no machine learning expertise to use", then I think it's fair to look at any value of the threshold parameter they make available. Especially a parameter that, by changing it, will give you the answer you want more often.
I'm deeply troubled by the text I've seen here implying this threshold is some accuracy percentage or positive predictive value percentage. Unless God is working behind the scenes at AWS they can't make any claim about the accuracy of the model on an as yet unseen population of images.
That's even before getting to the more esoteric map vs territory concerns like identical twins, altered images, adversarial makeup and masks, etc.
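A back-of-the-envelope Bayes' rule calculation (all numbers assumed for illustration) shows why a "99% confidence" match is not the same as a 99% chance that the flagged person is the right one: the positive predictive value also depends on how rare true matches are in the searched population.

```python
# Back-of-the-envelope Bayes' rule calculation with assumed numbers, to show
# that a match "confidence" threshold is not the probability that a flagged
# person is actually the person being searched for.
def positive_predictive_value(sensitivity, false_positive_rate, prevalence):
    """P(truly a match | system reports a match)."""
    true_pos = sensitivity * prevalence
    false_pos = false_positive_rate * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Assume the system catches 99% of true matches and wrongly flags only 0.1%
# of non-matches, but only 1 in 10,000 people searched is actually the target.
ppv = positive_predictive_value(0.99, 0.001, 1 / 10_000)
print(f"{ppv:.1%}")  # roughly 9%: most flagged people are still false positives
```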
Just to make sure I understand, which "whole thing" is misleading? The ACLU's test? Amazon's response?
As for the test, you say it's not a fair test. The point / conversation right now seems to be about the choice of parameters used by the ACLU. As far as I see / understand, the ACLU used the default parameters (and/or those recommended in the documentation / articles that are still up today with those same non-99% values).
My cynical guess would be "whatever the lowest number they can get away with using".
I would bet good money that cops' KPI goals benefit from false positives, since they'll reward a higher "number of identified/interviewed suspects" and "number of arrests" as a positive thing even if "number of convictions" doesn't line up.
Even more cynically, I'd bet this is a powerful technique for ambitious cop promotion, and that there's little blowback for fraudulently manipulating parameters that adversely affect POC much more significantly than white people.
Thinking about it, I'm now recalling the multiple reports of police departments claiming to not be using clearview.ai, only to have to backtrack when clearview's customer data got popped and it became public knowledge that individual cops were signing up for free trials, which their department/management either chose to hide or didn't know about. That's reasonably compelling circumstantial evidence to me that ambitious cops are quick to jump on unproven and unauthorised technology with insufficient oversight, or with management actively avoiding oversight for them...
In regards to the KPIs, this is a known reality. Most states get money from the federal gov highway safety program. Then the states disburse it to local police depts, and they expect high numbers of citations (or even warnings) to be reported back up the chain. It is only for DUI that verdicts are considered, and that's only amongst the smarter states. Related to crime, there are NO KPIs based on the final outcome - all are based on the elements the police are able to carry out and be accountable for on their own. This makes sense in some ways beyond self-promotion. I will say also that the general inflation of KPIs in order to justify promotions, grant renewals, etc. is RAMPANT in state and local govs, but especially in policing when it comes to new tech investments and promotions.
Wouldn't it be more likely that they say "ok, we can interview/investigate/whatever X number of people" and then they adjust the threshold to produce that number? If 80% gives them 10,000 hits and 99% gives them one or none, then nobody is going to just go with either setting.
I'd guess that with the potato quality of facial pictures from incidents (security or phone cameras), you might want lower-confidence matches to get results out of lousy pictures.
> had configured the recognition level against Amazon's recommendations.
Citations?
My understanding was that the ACLU used the default settings.
July 26, 2018 — Amazon states that it guides law enforcement customers to set a threshold of 95% for face recognition. Amazon also notes that, if its face recognition product is used with the default settings, it won’t “identify[] individuals with a reasonable level of certainty.”
July 27, 2018 — Amazon writes that even 95% is an unacceptably low threshold, and states that 99% is the appropriate threshold for law enforcement.
Either way, the defaults are the problem if the application is law enforcement.
"Defaults have such powerful and pervasive effects on consumer behavior that they could be considered “hidden persuaders” in some settings. Ignoring defaults is not a sound option for marketers or consumer policy makers. The authors identify three theoretical causes of default effects—implied endorsement, cognitive biases, and effort..."
I do not understand; a fair test is one that replicates reality. I am also wondering whether each city has a different software package with its own config, or an IT guy who tweaks the config.
I posted this in another comment, but I tried to recreate this using a default 70% match threshold. The dataset was 440 images of congressmen and 1,756 mugshots. There were ten mismatches, at between 70% and 77% certainty.
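For anyone who wants to try the same thing, here's roughly how such a test could be set up with boto3. The directory names and collection name are assumptions, the 70% threshold mirrors the comment above, and this is not the commenter's actual code:

```python
# Rough sketch of reproducing that kind of test with boto3. Directory names,
# the collection name, and the 70% threshold are assumptions, not the
# commenter's actual setup.
import os
import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")
COLLECTION = "mugshot-test"

rekognition.create_collection(CollectionId=COLLECTION)

# Index the mugshots (local JPEGs in ./mugshots/) into the collection.
for filename in os.listdir("mugshots"):
    with open(os.path.join("mugshots", filename), "rb") as f:
        rekognition.index_faces(
            CollectionId=COLLECTION,
            Image={"Bytes": f.read()},
            ExternalImageId=os.path.splitext(filename)[0],
            MaxFaces=1,
        )

# Search each congressional portrait (./congress/) against the collection.
for filename in os.listdir("congress"):
    with open(os.path.join("congress", filename), "rb") as f:
        result = rekognition.search_faces_by_image(
            CollectionId=COLLECTION,
            Image={"Bytes": f.read()},
            FaceMatchThreshold=70.0,
            MaxFaces=1,
        )
    for match in result["FaceMatches"]:
        print(f'{filename} -> {match["Face"]["ExternalImageId"]} '
              f'({match["Similarity"]:.1f}%)')
```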