You can argue either way. Dividing by n gives the maximum likelihood estimator, while dividing by n-1 gives the unbiased estimator. It depends on what you want. Most people prefer unbiased estimators when they are available.
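For concreteness, here are the two estimators written out for a sample x_1, ..., x_n (standard notation; the standard deviation estimates are the square roots of these):

    \[
      \hat{\sigma}^2_{\mathrm{MLE}} \;=\; \frac{1}{n}\sum_{i=1}^{n}\bigl(x_i-\bar{x}\bigr)^2,
      \qquad
      s^2 \;=\; \frac{1}{n-1}\sum_{i=1}^{n}\bigl(x_i-\bar{x}\bigr)^2,
      \qquad
      \bar{x} \;=\; \frac{1}{n}\sum_{i=1}^{n}x_i .
    \]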
What does it mean for it to be "unbiased"? What does it mean to "use up" a degree of freedom?
I don't mind if things just can't really be explained intuitively because they are fundamentally technical, but your explanation and the parent's both sound like they're explaining things in plain, common language while not actually doing so, because it isn't clear what those plain words mean in this context.
Unbiased means that if I draw infinitely many random samples from a population and average a statistic (in this case the variance, the square of the standard deviation) across all the samples, the answer will be the statistic computed from the population itself. If one divides by n instead of n-1, the variance estimate will be too small by a factor of (n-1)/n on average. One reading this might think, "Wait! We're going to infinity, so the ratio converges to 1." That's true if the size of each sample also goes to infinity, but not if we draw millions of ten-item samples.
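A quick simulation sketch (standard-library Python, written just to illustrate the point, with a normal distribution of variance 4 as an arbitrary choice of population) shows the (n-1)/n shortfall on ten-item samples:

    # Sketch: average the divide-by-n and divide-by-(n-1) variance estimates
    # over many small samples and compare them to the true variance.
    import random

    random.seed(0)
    true_var = 4.0            # samples drawn from N(0, sd=2), so true variance is 4
    n, trials = 10, 200_000

    sum_biased = sum_unbiased = 0.0
    for _ in range(trials):
        xs = [random.gauss(0, 2) for _ in range(n)]
        mean = sum(xs) / n
        ss = sum((x - mean) ** 2 for x in xs)    # sum of squared deviations
        sum_biased += ss / n                     # divide by n
        sum_unbiased += ss / (n - 1)             # divide by n - 1

    print("true variance:       ", true_var)
    print("mean of /n estimate: ", sum_biased / trials)    # ~ true_var * (n-1)/n = 3.6
    print("mean of /(n-1):      ", sum_unbiased / trials)  # ~ true_var = 4.0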
As for using up a degree of freedom, the easiest way to build intuition for why this is a useful concept is to think about very small samples. Let's say I draw a sample of one item. By definition that item is equal to the sample mean, so I receive no information about the standard deviation. By contrast, if someone had told me the population mean in advance, I could learn a bit about the standard deviation from a single sample. This effect carries on beyond one item, in diminishing amounts. Imagine I draw two items. There's some probability that they're both on the same side of the population mean; in that case, I'll place my sample mean between those numbers and underestimate the standard deviation. Note that I'd still underestimate it in those cases even with the bias correction; it's just that the correction compensates enough that the variance estimate balances out over all cases.
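Here is a sketch of that two-item case (my own illustration, using a standard normal so the true mean is 0 and the true variance is 1): it checks how often both draws land on the same side of the true mean, and what the estimates look like in those cases versus overall.

    # Sketch: two-item samples from N(0,1). Check how often both draws land on the
    # same side of the true mean (0), and compare variance estimates in those cases.
    import random

    random.seed(1)
    trials = 200_000
    same_side = 0
    var_n_same = var_n1_same = 0.0      # averages restricted to same-side pairs
    var_n1_all = 0.0                    # corrected estimate over all pairs

    for _ in range(trials):
        a, b = random.gauss(0, 1), random.gauss(0, 1)
        mean = (a + b) / 2
        ss = (a - mean) ** 2 + (b - mean) ** 2
        var_n1_all += ss / 1            # divide by n-1 = 1
        if (a > 0) == (b > 0):
            same_side += 1
            var_n_same += ss / 2        # divide by n = 2
            var_n1_same += ss / 1

    print("fraction same side:            ", same_side / trials)        # ~0.5
    print("avg /n variance, same side:    ", var_n_same / same_side)    # well below 1
    print("avg /(n-1) variance, same side:", var_n1_same / same_side)   # still below 1
    print("avg /(n-1) variance, all pairs:", var_n1_all / trials)       # ~1.0, unbiased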
A simple, concrete way to convince yourself that this is real is to consider a variable that has an equal probability of being 1 or 0. Its standard deviation is 0.5 (variance 0.25). But if we randomly sample two items, 50% of the time they'll be the same and we'll estimate the variance as zero. The other 50% of the time, we'll get the right answer. Hence, our average estimate is half the right answer, which is exactly what the correction factor n/(n-1) = 2 undoes: it makes the estimated variance double what it should be half the time while remaining zero in the other cases, so it comes out right on average. This also suggests why dividing by n is referred to as the maximum likelihood estimator.
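Since a two-item sample of a fair 0/1 variable has only four equally likely outcomes, you can check this exactly by enumeration (a sketch of my own, working with the variance so the arithmetic is exact):

    # Sketch: enumerate all four equally likely two-item samples of a fair 0/1
    # variable and average the divide-by-n and divide-by-(n-1) variance estimates.
    from itertools import product

    true_var = 0.25                             # variance of a fair 0/1 variable (sd = 0.5)
    samples = list(product([0, 1], repeat=2))   # (0,0), (0,1), (1,0), (1,1)

    est_n, est_n1 = [], []
    for a, b in samples:
        mean = (a + b) / 2
        ss = (a - mean) ** 2 + (b - mean) ** 2
        est_n.append(ss / 2)                    # divide by n:   0, 0.25, 0.25, 0
        est_n1.append(ss / 1)                   # divide by n-1: 0, 0.5,  0.5,  0

    print("true variance:       ", true_var)
    print("average /n estimate: ", sum(est_n) / 4)    # 0.125 -> half the right answer
    print("average /(n-1):      ", sum(est_n1) / 4)   # 0.25  -> unbiased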
This is very helpful, especially the example at the end, thanks. I think the difficult part to understand is that dividing by n leads to an estimate that is somehow too small. Intuition says that dividing by n would just give you the true average.
The argument is that this factor can minimize the mean squared error of the variance estimate, at the cost of some bias. In general, the small correction to the lead factor depends on the fourth moment. For a Gaussian, you get 1/(n+1).
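A simulation sketch of that trade-off (my own, assuming the factor in question is the divide-by-(n+1) estimator for Gaussian data), comparing the mean squared error of the three divisors:

    # Sketch: compare mean squared error of variance estimators that divide the
    # sum of squared deviations by n-1, n, and n+1, for Gaussian samples.
    import random

    random.seed(2)
    true_var = 1.0
    n, trials = 10, 200_000

    sq_err = {n - 1: 0.0, n: 0.0, n + 1: 0.0}
    for _ in range(trials):
        xs = [random.gauss(0, 1) for _ in range(n)]
        mean = sum(xs) / n
        ss = sum((x - mean) ** 2 for x in xs)
        for d in sq_err:
            sq_err[d] += (ss / d - true_var) ** 2

    for d in sorted(sq_err):
        print(f"divide by {d:2d}: MSE ~ {sq_err[d] / trials:.4f}")
    # Expect the n+1 divisor to have the smallest MSE and n-1 the largest,
    # even though n-1 is the only unbiased choice of the three.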