For Amazon, though, which is the example in Evan Miller's post, I don't really get why you'd first dichotomize the five-star rating into positive vs. negative and then use Wilson intervals. Just construct a run-of-the-mill 95% confidence interval for the mean of a continuous distribution and sort by the (still plausible) worst case scenario a.k.a. the lower bound of that: `mean - 1.96 * SE`, where the standard error is `SE = stddev(scores)/sqrt(n)`.
Because of the central limit theorem, this works even if the scores are not normally distributed, as long as there is a reasonable number of ratings.
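To make that concrete, here is a minimal Python sketch of sorting by the lower bound. The function name `lower_bound`, the item names and the sample ratings are all made up for illustration, and returning negative infinity for fewer than two ratings is my own assumption about how to handle items with too little data.

```python
import math
import statistics


def lower_bound(scores, z=1.96):
    """Lower end of an approximate 95% confidence interval for the mean rating."""
    n = len(scores)
    if n < 2:
        # Assumption: with fewer than two ratings there is no sample
        # standard deviation, so rank the item last.
        return float("-inf")
    se = statistics.stdev(scores) / math.sqrt(n)  # SE = stddev(scores)/sqrt(n)
    return statistics.mean(scores) - z * se


# Hypothetical items: a few near-perfect ratings vs. many good ones.
items = {
    "few_perfect": [5, 5, 4],
    "many_good": [5, 4, 5, 4, 5, 4, 5, 4, 5, 4, 5, 4],
}

# Sort by the worst plausible mean, best first.
ranked = sorted(items, key=lambda name: lower_bound(items[name]), reverse=True)
```

Here `many_good` ends up ahead of `few_perfect` even though its raw mean is lower, because its interval is much narrower. One caveat: if every rating of an item is identical, the sample standard deviation is zero and the bound collapses to the mean.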
For better accuracy with small samples you could use the multinomial distribution instead. The covariance matrix for the rating probabilities can be found here for example: http://www.math.wsu.edu/faculty/genz/papers/mvnsing/node8.ht...
Then the variance of the expected rating can be calculated as a weighted sum of the entries of the covariance matrix, where the weight on each entry is the product of the two star values it corresponds to.
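A rough sketch of that calculation in Python, assuming the usual multinomial covariance for the estimated probabilities, i.e. `p_i * (1 - p_i) / n` on the diagonal and `-p_i * p_j / n` off it (I am assuming this matches the linked page). The function name and the `counts` layout are mine, and the probabilities are simply plug-in estimates from the observed counts.

```python
def expected_rating_variance(counts):
    """Variance of the estimated mean rating under a multinomial model.

    counts[i] is the number of ratings with (i + 1) stars.
    """
    n = sum(counts)
    stars = range(1, len(counts) + 1)
    p = [c / n for c in counts]  # plug-in estimates of the rating probabilities

    var = 0.0
    for i, star_i in enumerate(stars):
        for j, star_j in enumerate(stars):
            # Multinomial covariance of the estimated probabilities.
            if i == j:
                cov = p[i] * (1 - p[i]) / n
            else:
                cov = -p[i] * p[j] / n
            # Weighted sum: each entry is weighted by the product of star values.
            var += star_i * star_j * cov
    return var


# Hypothetical item with one 4-star and two 5-star ratings.
variance = expected_rating_variance([0, 0, 0, 1, 2])
standard_error = variance ** 0.5
```

With plug-in probabilities this works out to the population variance of the scores divided by `n`, so for anything but tiny samples it lands very close to the plain `stddev(scores)/sqrt(n)` standard error.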
These companies really should be hiring statistics consultants instead of relying on the intuitions of their programmers.
I'd prefer to just treat scores as continuous and correct using `t_ppf(.975, n-1)` instead of the normal approximation (1.96) but I suppose working from a multinomial distribution would give pretty similar results.
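For what it's worth, the t-based version could look something like this in Python. It assumes SciPy is available and takes `t_ppf` to mean `scipy.stats.t.ppf`; the function name `t_lower_bound` is made up for the example.

```python
import math
import statistics

from scipy.stats import t


def t_lower_bound(scores, level=0.95):
    """Lower end of a two-sided t confidence interval for the mean rating."""
    n = len(scores)  # needs n >= 2 for a sample standard deviation
    se = statistics.stdev(scores) / math.sqrt(n)
    # Critical value with n - 1 degrees of freedom, replacing the 1.96.
    crit = t.ppf(1 - (1 - level) / 2, n - 1)
    return statistics.mean(scores) - crit * se


# Three ratings: the critical value is about 4.30 rather than 1.96,
# so the bound is pushed down much further than with the normal approximation.
bound = t_lower_bound([5, 5, 4])
```

The difference from 1.96 shrinks quickly as `n` grows, so this mostly matters for items with only a handful of ratings.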
You're still relying on the central limit theorem (i.e. a reasonable amount of data): using t instead of z just corrects for the fact that you only have sample variances instead of population variances.
However, I suppose it's not unreasonable to assume that the ratings are likely to have a bell-shaped distribution (which could be checked), so the normal/t approximation is probably going to be OK.