From Wikipedia description: Anscombe's quartet comprises four data sets that have nearly identical simple descriptive statistics, yet have very different distributions and appear very different when graphed. Each dataset consists of eleven (x,y) points. They were constructed in 1973 by the statistician Francis Anscombe to demonstrate both the importance of graphing data before analyzing it, and the effect of outliers and other influential observations on statistical properties.
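For anyone who wants to check the "nearly identical statistics" claim themselves, here is a minimal sketch in plain Python. The numbers are typed in from the commonly published version of the quartet, not from anything in this thread, so treat them as an assumption and verify against a trusted copy; `pearson` and `summary` are just illustrative names.

```python
from statistics import mean, stdev

# Anscombe's quartet as commonly published (assumed values, please double-check).
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # x for datasets I-III
quartet = {
    "I": (x_shared,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_shared,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_shared,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, computed by hand."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Per dataset: mean and sample standard deviation of x and y, plus correlation.
summary = {
    name: (mean(xs), mean(ys), stdev(xs), stdev(ys), pearson(xs, ys))
    for name, (xs, ys) in quartet.items()
}

for name, stats in summary.items():
    print(name, [round(s, 2) for s in stats])
```

The four printed rows should come out almost the same, which is the whole point: only a scatter plot shows the line, the curve, the outlier, and the single high-leverage point.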
Yes, it's a funny and insightful lesson on how weak the basic indicators (mean, standard deviation, correlation) can be, on the usefulness and limits of box plots with quartiles and whiskers, and on the benefit of violin plots.
> For the median and average to be equal, the points less than the median and greater than the median must have the same distribution (i.e., there must be the same number of points that are somewhat larger and somewhat smaller and much larger and much smaller).
[0, 2, 5, 9, 9] has both median and mean = 5, but the two sides don't really have the same distribution.
Totally true... any thoughts on how I could rephrase? I guess it's more that the "weight" of the points above and below the median should be the same, meaning their total distances from the median balance out. Symmetric distributions always have that; asymmetric ones may or may not. Definitely open to revising...
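One way to make the "weight" idea concrete, as a rough sketch: add up the distances from the median on each side and compare. The mean equals the median exactly when the two totals match, because the difference between mean and median is (above total minus below total) divided by the number of points. `deviation_weights` is a made-up helper name, and the skewed list is just an illustrative example.

```python
from statistics import mean, median


def deviation_weights(points):
    """Return (total distance below the median, total distance above it)."""
    m = median(points)
    below = sum(m - p for p in points if p < m)
    above = sum(p - m for p in points if p > m)
    return below, above


# The counterexample from the parent comment: not symmetric, but balanced.
balanced = [0, 2, 5, 9, 9]
# A hypothetical skewed list where the weights do not balance.
skewed = [0, 1, 2, 3, 20]

for points in (balanced, skewed):
    below, above = deviation_weights(points)
    print(points, "mean:", mean(points), "median:", median(points),
          "below:", below, "above:", above)
```

For [0, 2, 5, 9, 9] both totals are 8 (5 + 3 below, 4 + 4 above), so the mean and median agree even though the two sides look nothing alike.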
This is excellent information, thank you for posting this! I was not familiar with this example previously, but it is a perfect example of summary statistics not capturing certain distributions well. It's very approachable, even if you had to limit the discussion to mean and variance alone. Bookmarked, and much appreciated.
That's really nifty, wish I'd heard about it earlier. Might go back and add a link to it in the post at some point too! Very useful. Definitely know I wasn't breaking new ground or anything, but fun to see it represented so succinctly.
Very discouraging if one is trying to analyze data algorithmically. Often when faced with a problem in statistics, the answer is: "Look at the graph and use intuition!".