Lest you think this is merely an academic problem, I ran into a catastrophic, real-world case where it mattered.
I lost a notebook at a big box store. It had major sentimental value to me[1]. I called their Lost & Found and asked if someone had returned a green notebook. They insisted they didn't have one.
When I went to the store in person, they had it. Because they felt that it wasn't green, but blue. And (presumably) that no one would describe it as green, so they should return False for "matches what a green-notebook-seeking human wants?"
> Because they felt that it wasn't green, but blue.
This is just bizarre on their part. "Guy's calling in about a green notebook; one of the three items in our lost and found is a notebook, but... oh, it's a blue one. Just tell him we don't have it."
Yes! Fortunately, I did have my name in the back. But I never anticipated that someone would so adamantly think it was some other color. I could understand someone calling it blue, but not "definitely not green".
There's a wall at my parents' house which to me always looked green. I referred to it as green and my sister and mother seemed to get downright angry, insisting it was blue. I don't really care either way, it just looks green to me. I can't stop seeing it as green.
If you were to analyze the wavelengths of light the wall, or your notebook, reflects, you'd find a certain amount of blue and a certain amount of green. The thresholds our eyes detect, and the thresholds relative to those at which we subjectively decide on the dominant color, vary a lot from person to person.
But people take their subjective perception dead seriously, since to each of us our own perspective is the only one that matters.
Technically, the color of that notebook is closer to "teal" than to "green" or "blue" - teal being a color kinda halfway between blue and green (fancy that).
From the "uninterested big-box store employee" perspective, though - I would think it to be more likely described as "green", not "blue". Maybe "turquoise" if they know what that is (of course, if they get that fancy with their colors, they just might say "teal" as well).
But if we were looking at this on an RGB scale, it wouldn't be #00F - it would be closer to #0F0 - possibly something like #088, which sits almost exactly halfway between green and blue.
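If you want to check that "halfway" intuition, here's a tiny sketch using only Python's standard colorsys module (taking #088 as shorthand for #008888):

    import colorsys

    # Expand #088 to its RGB components and convert to HSV.
    r, g, b = 0x00 / 255, 0x88 / 255, 0x88 / 255
    hue_degrees = colorsys.rgb_to_hsv(r, g, b)[0] * 360

    # 180.0 degrees: dead center between pure green (120) and pure blue (240).
    print(hue_degrees)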
FWIW - glad it was returned to you, but I don't understand how they could think it to be "blue".
Then again, that whole dress thing - and a whole slew of other research on how people perceive (and describe) colors - definitely show it's possible...sigh.
Your situation specifically brings up the interesting distinctions (or lack thereof) between blue and green throughout history and across cultures. If you haven't already heard of it you should google it yourself, but here's a Wikipedia article that touches on it: https://en.m.wikipedia.org/wiki/Blue–green_distinction_in_la...
Even that doesn't fully specify the problem since we don't know the prior probability of a random car being blue. Reword the question to be "rainbow", "chartreuse" or some other less common car color and the probability should go down.
I don't think this can really be answered. The mathematical probability arguments are ignoring important implications of the result.
If 999 out of 1000 said it was blue, you could dismiss the last guy as crazy or blind or something. But 10% providing a different answer means something strange is going on. Maybe the color is some borderline shade, or the lighting is weird, or people are being coerced, or.... Without more information, we can't really tell what's going on, so the answer pretty much has to be "who knows?"
I think this point would be more relevant to something like the modified question proposed by jbob2000:
> It's an interesting idea executed poorly. What if we changed it to: "1000 people were asked how many lights were lit up in a row of 4 lit up lights. 900 people said there were 4 lights."
But in the case of color identification, I feel like 10% disagreement is par for the course -- it doesn't imply anything weird is going on.
The other 100 people didn't necessarily provide a different answer. It just says they were shown a car and they didn't say it was blue. Maybe they just didn't feel like commenting on the colour of the car. In fact, we don't even know if they were asked the colour.
A lot will depend on how you perform the survey, too. If you presented it as a serious psychological study and paid people to participate, having 10% disagreement would be huge. If it's a survey put up before YouTube videos, it would actually be weird for so many people to have agreed. The probability depends on too many unknown factors.
The (well, a) problem lies in treating "is it really blue" as some definite, knowable thing. But that's never directly observable. We only use that question as an "intermediate step" in answering other observables:
1) "Will [x% of] people emit 'that's blue' when asked about its color?"
2) "Does it reflect light within [specified spectrum] under [specified condition]?
3) "Will Scanner model X emit True or False when it scans this?"
Depending on what question you're asking, it may or may not be blue in that sense. As in my other comment [1], 10% saying "green" may be enough for you to consider it green for the purposes of "does the guy at Lost and Found who's asking for a green object possibly own this item?" But a 10% green response may be false for "is this blue enough to meet this UX standard?"
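To make that concrete, here's a minimal sketch (the 5% and 95% thresholds are made up purely for illustration) showing how the same survey result can answer two of those purposes differently:

    def possible_match(survey, sought, threshold=0.05):
        # Purpose 2: could the person asking for a <sought>-colored item own this?
        # Even a small minority calling it <sought> is enough to warrant checking.
        total = sum(survey.values())
        return survey.get(sought, 0) / total >= threshold

    def meets_ux_blue_standard(survey, threshold=0.95):
        # A stricter purpose: is this unambiguously blue? Demand near-unanimity.
        total = sum(survey.values())
        return survey.get("blue", 0) / total >= threshold

    responses = {"blue": 900, "green": 100}
    print(possible_match(responses, "green"))   # True  -> "hey, that might be a match"
    print(meets_ux_blue_standard(responses))    # False -> not blue for this purpose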
If you were to ask me, then you would have a 10% chance that you catch me in the wrong mood and my answer to any questions would be "f___ you". That percentage might double if you were to ask me a stupid question about the color of a car when the answer is obvious.
Human color perception is a minefield. Your eyes play lots of tricks on you without you ever realizing it. A 10% divergence is entirely believable when asking color questions.
A very slightly different question is much easier to answer:
>900 out of 1000 people say a car is blue. What's the probability that, when asked, you'd say that the car is blue?
This is, IMO, 90%. Whatever skew there is between car-is-blue and survey-response-is-blue should be about the same between you and the general population. Unless, of course, you have some reason to believe that you're different than the overall population.
It doesn't matter how the question is asked either, or why people are saying what they're saying. 10% of people could be trolls who say that the car is colored "like a lizard person".
I don't know; perhaps the car is cyan, perhaps it is painted with a 3-stage pearl and mostly viewed from the same direction. Most people are not very sophisticated at discerning color. I cannot trust that the author is certain the car is blue to start with. Pure statistics are not enough here.
In that case I would trust the math, but I'd suggest that first we somehow classify the complexity of the scenarios before we jump straight to "9 out of 10 respondents voted "fake news" therefore it is fake news".
I feel like all the mucking about with Bayes' Theorem, while technically correct, is completely missing the point. 100 people getting a color wrong seems impossible, so if it were to happen, I would look for deeper explanations rather than jumping to probabilities. Perhaps it's because those people are from a country/culture that doesn't distinguish blue and green (many don't, see https://en.wikipedia.org/wiki/Blue%E2%80%93green_distinction...).
While this particular example is unimportant, I think it illustrates a point that just paving over statistical oddities can cause you to skip important investigations.
I think we can go a step further and say that the basic Bayes treatment shown is mathematically incorrect. It assumes conditional independence and equal accuracy between judges, which is almost certainly not the case here.
If a full 10% of people who observe a car say that it is not blue, I strongly doubt their evaluations are independent. Rather, I would guess that most of them are making the same assessment, like "my culture doesn't distinguish blue and green" or "I am a person who does not consider cyan to be blue". So simply calculating odds based on 1000 conditionally-independent assessments isn't a valid approach.
Less formally: I expect a car with 900 votes for 'blue' to be a different color than a car with 999 votes for 'blue'. Is each car blue in the binary sense we're talking about? Well, for that we'd need an objective standard of blue, which the problem quietly failed to set.
It also really matters how the question is asked. If people are asked to say what the color is, then many might say 'blue', but others might give a more specific shade of blue (navy, periwinkle, turquoise, etc).
That is very different than if they were asked the yes/no question of 'is this car blue?'
> 100 people getting a color wrong seems impossible, so if it were to happen, I would look for deeper explanations rather than jumping to probabilities.
There is no probability - you can't apply probability theory at all, since it's easy to make a "Dutch Book" against any numerical answer (this can be accomplished by selecting the initial pool of 1000 as you please). All this hinges on:
"All they know is that 900 people said it was blue, and 100 did not."
Meaning, those who are being asked for a probability don't know who selected those people or how.
This crops up with many false descriptions of the "Monty Hall Paradox" as well. Some descriptions also allow Dutch Book defeats, so probability can't be applied to the problem as it is (falsely) described.
The principle here is that you can't and shouldn't apply probability to questions about a deck of cards, if someone else can select which cards are in the deck, including 52 copies of the same card, either before or after your guess or bet. They'll take your money.
When Anderson et al changed the meaning, mid-game, of "Triple A rating" for subprime bonds and the like before 2008, they pulled exactly this sort of trick, fooling those who thought they could apply probability calculations to a situation where probability didn't apply - since the only thing that mattered was some executive's guess about how likely it was that he would end up in jail for rigging the system. (Not at all likely, we know now!)
When I was young there were a lot of "nine out of ten doctors recommend our cigarettes" ads also based on the same trick, and it must have worked on a lot of people, 'cause it was very common.
As with the common misdescriptions of the Monty Hall Problem, it's possible the writer meant to describe a quite different problem, but as the problem is described here no probability can be inferred.
Can Bayes's formula be applied usefully to controversial public arguments? For example (and here I'm attempting to choose a real example but also avoid a political discussion), if 900 out of 1000 people believe OJ Simpson murdered Nicole Brown Simpson and 100 out of 1000 people believe he didn't, does this provide any useful information about the likelihood that Simpson murdered Nicole Brown?
This might be what the original question-poster was attempting to answer with his blue car question, using a cleaned-up example.
> if 900 out of 1000 people believe OJ Simpson murdered Nicole Brown Simpson and 100 out of 1000 people believe he didn't, does this provide any useful information about the likelihood that Simpson murdered Nicole Brown?
Not in the same way. Note the key assumption in the accepted answer: a 10% false positive rate. That is, we assume (for good reason) that on average the population is fairly accurate at identifying and naming colors correctly.
The analogous assumption in the OJ example would be "given media-filtered information about an emotionally-charged murder trial, most people accurately assess guilt with 90% probability." This is clearly false.
But note that our entire criminal justice system does assume that "given all the facts as presented by a prosecutor and defense attorney over the course of a trial, people instructed to vote 'not guilty' unless they are sure of guilt 'beyond a reasonable doubt' will have a very low false positive rate." And here the exponent is only 12.
Well, ok, fair enough, but I was using that number as an example. The point is you can look at the term (false positive)^900 and see that its tininess will dominate.
Bayes' formula can be applied to compute probabilities for basically anything, but you have to watch your assumptions. For example, the calculation crucially depends on every person actually having seen the car, and on their answers being conditionally independent given the color of the car. If e.g. only 10 had seen the car and each told 100 others, that works out to a completely different number.
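To see how much the independence assumption matters, here's a rough sketch (borrowing the accepted answer's assumption that each independent observer reports the true color with probability 0.9):

    import math

    def log10_likelihood_ratio(n_blue, n_not_blue, accuracy=0.9):
        # log10 of P(responses | car is blue) / P(responses | car is not blue),
        # assuming conditionally independent observers with the given accuracy.
        per_vote = math.log10(accuracy / (1 - accuracy))
        return (n_blue - n_not_blue) * per_vote

    # 1000 independent observers: overwhelming evidence (about 10^763).
    print(log10_likelihood_ratio(900, 100))

    # Only 10 actually saw the car (9 said blue) and the rest just repeated them:
    # still decent evidence, but weaker by a factor of roughly 10^756.
    print(log10_likelihood_ratio(9, 1))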
If you want to make a probabilistic argument in a public debate, you probably won't have enough information to reach reliable numbers, and it won't convince anyone who doesn't already agree with your conclusion. (Assuming they didn't doze off when you mentioned math ...)
This is an interesting point, although I think the cleaned-up question misses the mark. Most people asked about that murder would have similar definitions of 'murder', and you'd basically be gathering predictions about the state of the world.
Many people asked about a blue car will have different standards for 'blue', so our data is distorted by the possibility that people can agree on the hue of the car, but not the 'blueness' of it.
The SE answer looks like it assumes that blue cars aren't "stupidly rare", but I don't see that reflected in any of the math.
Is there a word for that assumption in statistics? I'm guessing this is something that is so obvious to a statistician that they don't even think to include it. But without seeing the work the layperson is likely to throw up their arms and say, "Not enough information."
The assumption is worked in. He assumes a base rate of only 0.1% (that is, only 0.1% of all cars are blue), and then shows that even with this it is astronomically unlikely the car isn't blue. He could have explicitly calculated how low a base rate you'd need for, say, a 50% chance the car is blue, and that number would be astronomically small as well.
While working the numbers was needed for the SE answer, I actually think it obscures the intuition.
The high order bit here is that there's only a 10% chance of a false positive, and so you're raising 0.1 to the 900th power. Everything else is a second order term relative to that, and you can instantly see the answer will be "nearly certain".
The answer calculates a likelihood ratio of 10^763, so for a 50% chance you'd need a prior probability of less than 1/(10^763 + 1) ≈ 10^(-763), which would imply that there is most likely not a single blue car in the universe.
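For anyone who wants to sanity-check those magnitudes, here's a quick back-of-the-envelope in log space (same assumptions as the accepted answer: 90% per-person accuracy and the deliberately pessimistic 0.1% prior that a random car is blue):

    import math

    # Likelihood ratio: (0.9/0.1)^900 * (0.1/0.9)^100 = 9^800
    log10_lr = 800 * math.log10(9)

    prior = 0.001  # pessimistic base rate of blue cars
    log10_posterior_odds = math.log10(prior / (1 - prior)) + log10_lr

    print(round(log10_lr, 1))              # ~763.4, i.e. a likelihood ratio near 10^763
    print(round(log10_posterior_odds, 1))  # ~760.4: P(not blue) is around 10^-760

    # To drag the posterior down to 50%, the prior odds would have to be about
    # 10^-763, i.e. a prior probability of roughly 1/(10^763 + 1).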
Not really related to the article (well... perhaps it is...) but we recently acquired a Subaru that is officially "gray" but looks quite clearly blue most of the time. Strangely, though, it looks non-blue sometimes. My best theory is that it is reflecting the sky color. I've since come across other people who own the same color Subaru and report the same controversy over its color.
Yeah, I think this question is underspecified. We're happily saying "not all judges are accurate, so use statistics to get an answer", but we aren't setting any actual boundary.
If only 600 people said the car was blue, I'd expect blue-green or blue-grey paint, and the answer to "is it blue?" would depend on who defined blue.
As far as I know, color is interpreted differently between cultures, so you'd have to know how the people define "blue" (which you touched on), or how many of them even speak a language that has a word for it.
A question like this has a lot of assumptions built into it. We prefer to ignore such messy details, and assume our own experience as universal, in pursuit of what we like to call "rationality".
There is an issue with colour blind people but also with a few tones of "blue" (or "green").
I had this argument in real life, many years ago, about what is called "petrol blue" in some car offerings. Standing in front of the same car, I saw it as blue, and a friend of mine saw it as green.
After long discussions, and having observed the car from all possible angles, taking into account the light, etc. we came to agree that it was BOTH blue and green.
Try it yourself with a range (roughly) between RGB 07636E and RGB 1D4D6E.
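A quick way to see why that range is contested (standard-library colorsys again): both endpoints have hues in the cyan band, between pure green at 120 degrees and pure blue at 240.

    import colorsys

    for hex_color in ("07636E", "1D4D6E"):
        r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4))
        hue = colorsys.rgb_to_hsv(r, g, b)[0] * 360
        print(hex_color, round(hue))  # 07636E -> ~186, 1D4D6E -> ~204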
Off topic, but this reminds me of the New riddle of induction [1]. This is a philosophical argument about predicates (i.e. properties that things have that are either true or false) and what it means to use induction to confirm such predicates. An example used in the paper is determining that all emeralds are green. It turns out to be quite a riddle.
The problem is much wider and is also faced by content creators (e.g. on YouTube): what is the probability that my videos are good, based on the likes and dislikes I get?
Coloration is absolutely subjective, in that different people see colors differently, even without obvious vision problems like colorblindness.
My wife and I often argue about whether something is blue or green, or blue or purple. We're both rather good at discerning tiny changes in hue (I'm one of the few men gifted in this, I believe), yet we're both certain of our conflicting answers.
Considering that in just about every poll I see, even if it's asking "should parents be legally allowed to barbecue their infant children on a spit and serve them for dinner?" there is invariably 10% who are "undecided", I'm going to go with an answer of "0.999999".
Though the top-voted answer does a better job of giving a reasonably sound defensible answer, we both come to the same conclusion.
I'm the jerkwad who always answers undecided, because invariably these multiple choice answers force you to choose between poorly-defined extremes.
This one is pretty clear, but I could imagine some context, in some (perhaps post-apocalyptic) society, where legally killing and eating your young is a morally justified position. It's certainly seen in the natural world with chimpanzees and many other creatures.
I didn't always do well academically because of this, but I like to think it might help me as a programmer?
One highly-ranked answer uses Bayes' formula and requires estimating probabilities like the probability that a random car is blue. ... Following the metaphor, you'd have to ... estimate the probability that god exists first in order to decide what the poll means about the probability that god exists.
Another answer comes with a disclaimer that the problem only works out in a straightforward way if blue cars aren't a super-rare, unbelievable occurrence. I don't think a poll about gods similarly lends itself to an obvious answer.
The top answers are essentially well-thought-out probability arguments describing the Lizardman Constant [1]. Essentially, random humans who don't care very much about your survey are likely to respond in arbitrary or perverse ways, making them a very noisy data source. Small signals (e.g., 10% or less of a sample) are likely to be meaningless. It's nice to know that there's a solid argument that this doesn't compromise the validity of large signals, at least.
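As a toy illustration (taking the ~4% figure usually quoted for the Lizardman Constant purely as a ballpark, and assuming the worst case where all the noise lands on one option):

    noise = 0.04  # rough "lizardman" rate: respondents answering arbitrarily

    for observed in (0.10, 0.90):
        # Worst case: every noisy respondent happened to pick this option.
        worst_case_share = min(noise, observed) / observed
        print(f"{observed:.0%} of responses: up to {worst_case_share:.0%} could be noise")

    # 10% of responses: up to 40% could be noise
    # 90% of responses: up to 4% could be noise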
Here's the notebook (from the lost-and-found story above): http://i.imgur.com/AlQAZBJ.jpg
So, for the linked question: whenever answering a question, you need to know why you're answering it. It affects the answer! Consider these purposes:
1) "I want to know if other people will agree that this is a green notebook."
2) "I want to know if I should say this definitely-doesn't-match when someone comes looking for a green notebook."
3) "I want to know if this notebook reflects almost entirely green light."
Case 2 is the one I was interested in. In that case, 10% of respondents are enough to say "hey, that might be a match".
[1] I know, "you shouldn't have brought it out with you".