
Understanding statistics (and how to cheat with them).


IMO, the key to understanding statistics is learning how to quantify uncertainty in datasets. The key to cheating is to either downplay or overplay that uncertainty when interpreting which conclusions you can draw.


Can you elaborate on what you mean by this? Are you referring to something ethical?


I see a lot of phrases like "studies show that" these days. Somehow, in our science-based, sophisticated society, everybody likes to throw studies at each other to prove their own narrative. But I don't understand them. I am not that statistically illiterate: I know the difference between mean, mode, median and stdev (and when to use which). When I dig deeper into a study, I find hypothesis-testing terms like p-values, r^2 and so on ("our hypothesis was proven because p > 0.9"). But here my knowledge ends. Is p > 0.9 good? Or did they just tune the data to get a p-value that high? Or is the whole method garbage, so that the study could not be replicated with the same p-value? I also want to know how to cheat with statistics, because these studies are made by people who might get paid to prove a certain point (e.g. "My institute gets paid by Mars, hence I'll downplay the health effects of sugar in daily nutrition and amplify the positive effects of <some chemical found in chocolate>"), or who just want to show significance for their research because they have worked on it for the last 10 years and it's "their baby".
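To make the "did they just tune the data" worry concrete, here is a rough sketch of one common way to cheat: measure many outcomes and report only the best one. By the usual convention a small p-value (often p < 0.05) is what counts as evidence, so that is the threshold used here. Everything in it is made up for illustration: the function names, the 20 outcomes per study, the group size of 30, and the simplifying assumption that the data are normal with a known variance of 1 (which lets a plain z-test stand in for a t-test). Both groups are always drawn from the same distribution, so every "significant" result is a false positive.

```python
import random
from statistics import NormalDist, fmean


def two_sided_p(group_a, group_b):
    """Two-sided z-test p-value for a difference in means.

    Assumes equal group sizes and a known variance of 1 (illustrative only).
    """
    n = len(group_a)
    z = (fmean(group_a) - fmean(group_b)) / (2 / n) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate(studies=1000, outcomes=20, n=30, alpha=0.05, seed=0):
    """Return (honest_rate, cherry_rate) of false positives under no real effect."""
    rng = random.Random(seed)
    honest_hits = 0   # only the first, pre-chosen outcome is tested
    cherry_hits = 0   # the smallest p-value among all outcomes is reported
    for _ in range(studies):
        p_values = []
        for _ in range(outcomes):
            # Both groups come from the same distribution: there is no effect.
            a = [rng.gauss(0, 1) for _ in range(n)]
            b = [rng.gauss(0, 1) for _ in range(n)]
            p_values.append(two_sided_p(a, b))
        honest_hits += p_values[0] < alpha
        cherry_hits += min(p_values) < alpha
    return honest_hits / studies, cherry_hits / studies


honest_rate, cherry_rate = simulate()
print(f"one pre-chosen outcome:   {honest_rate:.0%} of studies look significant")
print(f"best of 20 outcomes:      {cherry_rate:.0%} of studies look significant")
```

With one pre-chosen outcome, roughly alpha (about 5%) of the no-effect studies should still come out "significant" by chance. Picking the best of 20 independent outcomes should push that to around 1 - 0.95^20, or about 64%, which is why a lone small p-value says little unless you know how many things were tested.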


Where to start?


On my reading list so far:
D. Huff et al - How to Lie with Statistics
J. Ellenberg - How Not to Be Wrong: The Power of Mathematical Thinking

IIRC Bill Gates recommended them.



