Good opportunity to plug https://en.wikipedia.org/wiki/Anscombe%27s_quartet : if...

pdpi · on Sept 14, 2021

There's a fun paper by Autodesk where they make datasets that look whatever way you want them to.

https://www.autodesk.com/research/publications/same-stats-di...

s_gourichon · on Sept 14, 2021

Yes, a lesson that is both funny and insightful: how weak basic indicators (mean, standard deviation, correlation) can be, the uses and limits of box plots with quartiles and whiskers, and the benefit of violin plots.

Definitely worth a quick look.

djk447 · on Sept 14, 2021

NB: Post author here.

This is great! So fun...will have to use in the future...

pdpi · on Sept 14, 2021

While I have your attention...

> For the median and average to be equal, the points less than the median and greater than the median must have the same distribution (i.e., there must be the same number of points that are somewhat larger and somewhat smaller and much larger and much smaller).

[0, 2, 5, 9, 9] has both median and mean = 5, but the two sides don't really have the same distribution.
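
A quick check of that arithmetic, as a minimal sketch using only Python's standard `statistics` module (the variable names are just for illustration):

```python
import statistics

data = [0, 2, 5, 9, 9]

mean = statistics.mean(data)      # (0 + 2 + 5 + 9 + 9) / 5
median = statistics.median(data)  # middle value of the sorted list

# Points strictly below and strictly above the median
below = [x for x in data if x < median]
above = [x for x in data if x > median]

print(mean, median)  # 5 5
print(below, above)  # [0, 2] [9, 9]
```

The two sides have the same number of points but clearly different shapes: one is spread out, the other is a repeated value.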

djk447 · on Sept 14, 2021

Totally true...thoughts on how I could rephrase? I guess it's more the "weight" of points greater than and less than the median should be the same, so symmetric distributions definitely have it, asymmetric may or may not. Definitely open to revising...
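
One way to make the "weight" idea concrete (an interpretation, not wording from the post): since the mean is the median plus the average signed deviation from the median, the two are equal exactly when the total distance of the points below the median matches the total distance of the points above it. A small Python sketch, where `deviation_weights` is a hypothetical helper name:

```python
import statistics

def deviation_weights(data):
    """Return (total distance below the median, total distance above it)."""
    m = statistics.median(data)
    below = sum(m - x for x in data if x < m)
    above = sum(x - m for x in data if x > m)
    return below, above

# Balanced weights: mean == median == 5, even though the sides differ in shape
print(deviation_weights([0, 2, 5, 9, 9]))   # (8, 8)

# Unbalanced weights: median is still 5, but the mean is pulled up to 7.2
print(deviation_weights([0, 2, 5, 9, 20]))  # (8, 19)
```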

pdpi · on Sept 14, 2021

> symmetric distributions definitely have it, asymmetric may or may not

Doesn't have to be any more complicated than that. It's more a curio than an important point anyway :)

doctorsher · on Sept 14, 2021

This is excellent information, thank you for posting this! I was not familiar with this example previously, but it is a perfect example of summary statistics not capturing certain distributions well. It's very approachable, even if you had to limit the discussion to mean and variance alone. Bookmarked, and much appreciated.
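
Even restricted to mean and variance, the effect is easy to reproduce. The sketch below uses two made-up toy datasets (not taken from Anscombe's quartet or the Autodesk paper) that share the same mean and population variance but have very different shapes:

```python
import statistics

# Illustrative toy data, invented for this example
bimodal = [-1, -1, -1, -1, 1, 1, 1, 1]  # two clusters, nothing in the middle
spiky = [-2, 0, 0, 0, 0, 0, 0, 2]       # mostly zeros plus two outliers

for name, data in [("bimodal", bimodal), ("spiky", spiky)]:
    # Both datasets report mean 0 and population variance 1
    print(name, statistics.mean(data), statistics.pvariance(data))
```

A histogram of each would look nothing alike, yet the two summary numbers cannot tell them apart.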

djk447 · on Sept 14, 2021

NB: Post author here.

That's really nifty, wish I'd heard about it earlier. Might go back and add a link to it in the post at some point too! Very useful. Definitely know I wasn't breaking new ground or anything, but fun to see it represented so succinctly.

hoseja · on Sept 15, 2021

> importance of graphing data before analyzing it

Very discouraging if one is trying to analyze data algorithmically. Often, when faced with a problem in statistics, the answer is "Look at the graph and use intuition!"

potatoman22 · on Sept 15, 2021

Interesting, thanks for the concept. What's one to do for high dimensional data then?