When intuition and math probably look wrong (2010)

Xcelerate · on Feb 16, 2017

How you state the problem has a lot to do with the resulting probabilities. Many times the answer is ambiguous until you definitively identify the space of states that a state can be drawn from.

Bertrand's paradox is an example of this issue: https://en.wikipedia.org/wiki/Bertrand_paradox_(probability)

tel · on Feb 16, 2017

This example along with all the similar ones drives home the basic idea that probability is tied to its model and its model is often best phrased as a generative story. If we agree on the generative story we'll agree on the model and then your intuition probably works just fine. In situations where the story is obscured—in this case, only the end results are mentioned so that the story is ambiguous—then you can have ambiguous answers.

gpawl · on Feb 17, 2017

After a long writeup about the treachery of ambiguous problem statements, the article ends with ... a misleading ambiguous statement:

> the ukulele-playing and dancing ambitions would affect the probabilities about the sex of his sibling.

I suspect that this whole article would have been much better if it had been written by one of the interviewees instead of the journalist. Sigh.

tylerhou · on Feb 16, 2017

The best intuition I've heard for the two-child problem is as follows.

Suppose you have two doors with two children behind them. The 1/2 chance of a boy version works like this: Say you randomly open one of the two doors and there is a boy. That doesn't give you any information about the other child, so you say that the probability of the other one being a boy is 1/2.

1/3: This time beforehand you know that there is at least one boy. Then you open a door and it happens to be a boy. That tells you something about the other door, which alters its probability to 1/3.
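
For anyone who wants to check the two door versions, here is a rough sketch by exact enumeration. It assumes each child is independently a boy or a girl with probability 1/2; the function names are made up for illustration. The 1/2 version opens a random door and sees a boy; the 1/3 version is read here as conditioning only on the family having at least one boy.

```python
from fractions import Fraction
from itertools import product

# Assumption: each child is independently a boy ("b") or a girl ("g"), 50/50.
families = list(product("bg", repeat=2))  # bb, bg, gb, gg, each with probability 1/4


def prob_two_boys_given_door_shows_boy():
    """Open one of the two doors uniformly at random and see a boy."""
    saw_boy = Fraction(0)
    saw_boy_and_two_boys = Fraction(0)
    for family in families:
        for door in (0, 1):
            weight = Fraction(1, 4) * Fraction(1, 2)  # family, then door
            if family[door] == "b":
                saw_boy += weight
                if family == ("b", "b"):
                    saw_boy_and_two_boys += weight
    return saw_boy_and_two_boys / saw_boy


def prob_two_boys_given_at_least_one_boy():
    """Only told that at least one of the two children is a boy."""
    with_boy = [f for f in families if "b" in f]
    return Fraction(sum(f == ("b", "b") for f in with_boy), len(with_boy))


print(prob_two_boys_given_door_shows_boy())    # 1/2
print(prob_two_boys_given_at_least_one_boy())  # 1/3
```

The only difference between the two functions is how the observation was produced, which is the whole point of the puzzle.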

notahacker · on Feb 17, 2017

Interesting you mentioned doors, since my first thought on reading the article was of the Monty Hall Problem, specifically is this guy trying to parody the Monty Hall problem by coming up with purposely obtuse interpretations of purposely vague questions to complicate what would normally be considered a trivial conditional probability question?

But the Monty Hall problem came later. This just feels like the probability equivalent of Zeno's Paradox, except revolving around arbitrarily shifting unstated assumptions about selection effects rather than arbitrarily reducing distances travelled.

pizza · on Feb 16, 2017

For the sequential case:

P(2 boys | >= 1 boy) = P(>= 1 boy | 2 boys) * P(2 boys) / P(>= 1 boy)

where P(>= 1 boy among the 2 kids): of the four equally likely outcomes gg, bg, gb, bb, three (bg, gb, bb) qualify = 3/4

P(2 boys): of gg, bg, gb, bb, only bb qualifies = 1/4

and trivially P(>= 1 boy | 2 boys) = 1

So P(2 boys | >= 1 boy) = 1 * (1/4) / (3/4) = 1/3

kpil · on Feb 16, 2017

Stupid question perhaps, but why not:

  boy           girl
  boy(notgirl)  boy
  boy           boy(notgirl) <-reversed order
  girl          boy

?

acqq · on Feb 16, 2017

You were probably annoyed before you had read the whole article, like I was at that moment. But it goes on:

"Not so fast, says probabilist Yuval Peres of Microsoft Research. That naïve answer of 1/2? In real life, he says, that will usually be the most reasonable one."

"If I specifically selected him because he was a boy born on Tuesday (and if I would have kept quiet had neither of my children qualified), then the 13/27 probability is correct. But if I randomly chose one of my two children to describe and then reported the child’s sex and birthday, and he just happened to be a boy born on Tuesday," "the probability that the other child will be a boy will indeed be 1/2."

kpil · on Feb 16, 2017

Well I got annoyed a little bit further down when they started to coalesce events arbitrarily, and stopped reading.... I probably don't understand why it's meaningful or even valid to do that though...

On the other hand, I spent a rather sad year reading statistics because I thought it was interesting, and got so bored that I got a job as a developer instead, so I am rather confident that it's not for me to truly understand...

gpawl · on Feb 17, 2017

Sadly, many people will skim the article, including the incorrect first half, and come away from the article with increased misunderstanding.

pirocks · on Feb 16, 2017

I too had this question.

natosaichek · on Feb 17, 2017

I'm not sure I totally understand, but maybe someone can confirm this for me. If we add more "extraneous" information, it seems like it pushes the probability closer to (the naive answer of) 1/2. If we add lots of extraneous information, does it get really close? What if we did something like this:

I have two children, one of whom is a black-haired, blue-eyed son with an owl-shaped birthmark on his right leg born on a Tuesday in Argentina during an eclipse while a flock of 231 seagulls circled clockwise overhead. What is the probability that I have two boys?

Am I just muddying the water, or is the probability vanishingly close to 1/2?

08-15 · on Feb 17, 2017

> is the probability vanishingly close to 1/2?

It is. Here's how it clicked for me:

Forget about Tuesday; someone tells you "I have two kids, at least one of whom is a boy who is special". Now you get these cases for the two children: Bg, gB, Bb, bB, BB (where g is a girl, b is a boy, B is a special boy). Without the last case (two special boys), everything is symmetrical, and the probability that the other child is a boy is 1/2. The less likely it is that the guy has two special boys, the closer the answer is to 1/2.

This also means that the everyday answer is actually 1/3, because parents always think that all of their children are special, so the problem reverts to the simple Two Children Problem with the equally likely cases BG, GB, and BB (yes, the girls are special, too).
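
To put numbers on this, here is a small sketch under the assumption that every boy is independently "special" with some probability p (p = 1/7 for "born on a Tuesday", p = 1 for "all boys are special"), and that the parent speaks up exactly when at least one child is a special boy. The function name is hypothetical. Working it through by hand, the result simplifies to (2 - p) / (4 - p).

```python
from fractions import Fraction


def prob_two_boys_given_special_boy(p):
    """P(two boys | at least one special boy), for 0 < p <= 1.

    Assumptions: each child is a boy with probability 1/2, each boy is
    special with probability p, and everything is independent.
    """
    p = Fraction(p)
    # Two boys (prob 1/4) and not both of them ordinary.
    both_boys_and_special = Fraction(1, 4) * (1 - (1 - p) ** 2)
    # Each child is a special boy with probability p/2.
    at_least_one_special = 1 - (1 - p / 2) ** 2
    return both_boys_and_special / at_least_one_special


print(prob_two_boys_given_special_boy(1))                # 1/3
print(prob_two_boys_given_special_boy(Fraction(1, 7)))   # 13/27
print(float(prob_two_boys_given_special_boy(Fraction(1, 10**6))))
```

As p shrinks, the BB case (two special boys) becomes negligible and the answer creeps up toward 1/2 without ever reaching it.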

tunesmith · on Feb 17, 2017

Peter Norvig has an iPython (jupyter) notebook that explores this same puzzle: http://nbviewer.jupyter.org/url/norvig.com/ipython/Probabili...

seycombi · on Feb 16, 2017

Argument by Gary Foshee

"My solution was based on set theory. Look at the entire set of all families with two children. Then look at a subset: those with two boys. Then look at another subset: those with a boy born on Tuesday. If you look at it that way, then 13/27 is the correct answer."

kgwgk · on Feb 17, 2017

13/27 is the correct probability if we got the answer "I have two children, one of whom is a son born on a Tuesday" by dialing random numbers from the phone book and asking "could you confirm if you have two children, one of whom is a son born on Tuesday?" Which might or might not be equivalent to the actual data generation process.

kgwgk · on Feb 17, 2017

That's one way to look at it, but not the only one. Look at the entire set of children. Then the subset of those born on Tuesday. Then the subset who have exactly one sibling. In that case the probability is 50%.
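
Both ways of looking at it can be checked by brute force. The sketch below assumes sex and weekday of birth are uniform and independent, restricts "the set of children" to sons born on Tuesday in two-child families, and uses made-up function names. Sampling families that contain a Tuesday-born son gives 13/27; sampling Tuesday-born sons and asking about their sibling gives 1/2.

```python
from fractions import Fraction
from itertools import product

SEXES = ("boy", "girl")
DAYS = range(7)   # assumption: days numbered 0..6, with 1 standing for Tuesday
TUESDAY = 1

children = list(product(SEXES, DAYS))         # 14 equally likely kinds of child
families = list(product(children, repeat=2))  # 196 equally likely families


def is_tuesday_boy(child):
    return child == ("boy", TUESDAY)


def both_boys(family):
    return all(sex == "boy" for sex, _day in family)


def sample_families():
    """Pick a family uniformly among those with at least one Tuesday boy."""
    qualifying = [f for f in families if any(is_tuesday_boy(c) for c in f)]
    return Fraction(sum(both_boys(f) for f in qualifying), len(qualifying))


def sample_children():
    """Pick a Tuesday boy uniformly among all Tuesday boys; look at his sibling."""
    picked = 0
    picked_and_both_boys = 0
    for family in families:
        for index in (0, 1):
            if is_tuesday_boy(family[index]):
                picked += 1
                picked_and_both_boys += both_boys(family)
    return Fraction(picked_and_both_boys, picked)


print(sample_families())  # 13/27
print(sample_children())  # 1/2
```

A family with two Tuesday-born sons is counted once in the first function and twice in the second, which is where the two answers part ways.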

kutkloon7 · on Feb 16, 2017

It always helps me to picture a probability space (as in, an actual physical space, where size correlates to probability).

Assuming that children are always a boy or a girl, and are equally likely to be a boy or a girl, the 'complete space' of families with two children would be distributed as ABBC where A means two boys, B means a boy and a girl, and C means two girls. By excluding C from the space, ABB remains. So indeed, the probability would be 1/3 that both children would be boys.

But now I have made another assumption, namely that the family is picked uniformly at random from all families with two children with at least one boy.

As you can see, there are quite a lot of assumptions. One convention seems to be that when you can't tell for sure how something is distributed, it is uniformly distributed.

For example, when a person blindly grabs a ball from a box with one red and one blue ball (and no other balls), the probability is not always 50% that he grabs the red ball. The red ball might be a bowling ball and the blue one might be the size of a marble.

While in this example it is reasonable to assume a uniform distribution, you can point out similar, but more subtle assumptions in most questions about probability.

A more complicated example: Weatherman A predicts the weather right 70% of the time. Weatherman B predicts the weather right 60% of the time. Weatherman A predicts rain tomorrow, weatherman B predicts dry weather. What is the probability that it will rain tomorrow?

The 'right' answer is 14/23. In order to arrive at this answer, you need to assume that the predictions are statistically independent (which is an unrealistic assumption). Indeed, it is easy to sketch a situation in which the probability is different: it is always dry, weatherman A predicts rain 30% of the time, and weatherman B predicts rain 40% of the time. This is consistent with the question, and the probability that there will be rain tomorrow is obviously 0%.

This lack of rigor always bothers me.

mgraczyk · on Feb 16, 2017

Your second example has completely different assumptions than your first. To get 14/23, you don't have to assume that the predictions are independent, only that they are independent given future weather.

Let W be 1 if it will rain tomorrow and 0 if it will not rain, with 0.5 probability either way. Let A be weatherman A's prediction and let B be weatherman B's prediction. We have

  P(W = 0) = P(W = 1) = 0.5
  P(A = w | W = w) = 0.7 # Weatherman A is right 70% of the time
  P(B = w | W = w) = 0.6 # Weatherman B is right 60% of the time

  P(A, B | W) = P(A | W) P(B | W)

Now we want to know "Weatherman A predicts rain tomorrow, weatherman B predicts dry weather. What is the probability that it will rain tomorrow?"

That is, what is P(W=1 | A=1, B=0)?

    P(W=1 | A=1, B=0) = P(A=1, B=0 | W=1) P(W=1) / P(A=1, B=0)
    = P(A=1 | W=1) P(B=0 | W=1) P(W=1) / (sum_w P(A=1, B=0|w)P(w))
    = 0.7*0.4*0.5 / (P(A=1|w=0)P(B=0|w=0)P(w=0) + P(A=1|w=1)P(B=0|w=1)P(w=1))
    = 14/23

If you don't like the assumption that the predictions are statistically independent (herding, etc.), then you just have to come up with the conditional joint P(A,B | W). That wouldn't be difficult given a small amount of data since W is binary. You would put a Dirichlet prior on the distribution (basically just a beta distribution with an additional dimension) and essentially just count the times each triple (a, b, w) happens.
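
A minimal sketch of that counting approach, assuming a symmetric Dirichlet prior expressed as a pseudocount added to each of the four (a, b) cells for each value of w. The function names and the toy observations are invented purely to exercise the code; they are not real forecast data.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

CELLS = list(product((0, 1), repeat=2))  # all (a, b) prediction pairs


def estimate_conditional_joint(observations, pseudocount=1):
    """Posterior-mean estimate of P(A=a, B=b | W=w) from (a, b, w) triples."""
    counts = Counter(observations)
    table = {}
    for w in (0, 1):
        total = sum(counts[(a, b, w)] + pseudocount for a, b in CELLS)
        for a, b in CELLS:
            table[(a, b, w)] = Fraction(counts[(a, b, w)] + pseudocount, total)
    return table


def prob_rain(table, a, b, prior_rain=Fraction(1, 2)):
    """P(W=1 | A=a, B=b); the prior on rain is an explicit assumption."""
    rain = table[(a, b, 1)] * prior_rain
    dry = table[(a, b, 0)] * (1 - prior_rain)
    return rain / (rain + dry)


# Made-up (a, b, w) observations, only for illustration.
toy = [(1, 0, 1), (1, 0, 1), (1, 1, 1), (0, 0, 0)]
table = estimate_conditional_joint(toy)
print(prob_rain(table, a=1, b=0))
```

No independence between A and B is assumed here; any herding shows up directly in the estimated table.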

Still, the problem isn't a lack of rigor, it's a lack of clarity in stated assumptions.

To be specific, you didn't state the assumption that P(w)=0.5, which you used to compute 14/23.
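
A quick sketch of how much the answer leans on that prior, keeping the conditional independence assumption and the 70% / 60% accuracies from the problem (assumed to apply equally to rainy and dry days). The function name is made up.

```python
from fractions import Fraction


def prob_rain_given_a_rain_b_dry(prior_rain,
                                 acc_a=Fraction(7, 10),
                                 acc_b=Fraction(6, 10)):
    """P(W=1 | A=1, B=0), assuming A and B are independent given W."""
    prior_rain = Fraction(prior_rain)
    rain = acc_a * (1 - acc_b) * prior_rain        # A right, B wrong, it rains
    dry = (1 - acc_a) * acc_b * (1 - prior_rain)   # A wrong, B right, it is dry
    return rain / (rain + dry)


print(prob_rain_given_a_rain_b_dry(Fraction(1, 2)))  # 14/23
print(prob_rain_given_a_rain_b_dry(Fraction(1, 4)))  # 14/41
print(prob_rain_given_a_rain_b_dry(0))               # 0
```

Only the P(w) = 0.5 prior reproduces 14/23; a drier climate pulls the answer down, all the way to 0 if it never rains.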

kutkloon7 · on Feb 17, 2017

I don't understand your point. I was arguing that the way the problem is posed seems to imply a unique solution. I was showing the problem was ill-posed, and that many probability problems have similar subtle or less subtle hidden assumptions. Here, this assumption is P(A, B | W) = P(A | W) P(B | W). I don't think you even need P(W = 0) = P(W = 1) = 0.5.

These assumptions usually seem quite natural to make, but can be very unrealistic (why would the predictions of weathermen be independent? I would bet they are not in reality). This is a very bad way to teach students. It is always very important to know which assumptions you are making, and if textbooks do this wrong, it will be nearly impossible for students to get this right.

I would think of a student who struggles with this problem as a better mathematician than the student who just uses P(A, B | W) = P(A | W) P(B | W) 'because the formula is in his textbook', but the second student is more likely to get rewarded (especially in American education, since the USA seems to be especially fond of textbooks which give a ready-made recipe for every problem that a student is supposed to solve).

mgraczyk · on Feb 22, 2017

> I don't think you even need P(W = 0) = P(W = 1) = 0.5.

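You do. With a general prior p = P(W = 1), the same calculation gives 0.28p / (0.28p + 0.18(1 - p)), which equals 14/23 only when p = 0.5.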

In the article, the prior of interest is the prior gender of a child, which can be safely assumed to be 0.5. Similarly, independence of children's genders is a very good approximation as well. It wasn't necessary for the article to state these assumptions because they are obvious common knowledge.

seycombi · on Feb 16, 2017

Best advice (in general not just this particular problem) I heard was by Joe Blitzstein (Harvard): Practice, practice, practice.

Much of statistics/probability is about pattern recognition, and developing pattern recognition requires lots of practice.

I enjoyed and learned a lot from his Harvard course Statistics 110: Probability

Video Lectures http://projects.iq.harvard.edu/stat110/youtube