Causal Inference in Statistics: A Primer (ucla.edu)
130 points by dstein64 on Feb 9, 2016 | 27 comments


Judea Pearl's work on causality is some of the most important statistical work happening these days. We've long known how to use statistics to find correlations and make inferences, but he put causality on a firm mathematical basis, and discovered fascinating statistics along the way. This book should be a blast.


His WSJ article about the murder of his son (WSJ journalist Daniel Pearl) was well-considered, too: http://online.wsj.com/article/SB123362422088941893.html


I am not sure how that can be called well considered. He really shouldn't throw stones from a glass house. It is clearly a pro-Israel piece; he even alludes to the bulldozing regime as being acceptable without directly saying it. Shameful to use his son's death for this propaganda.


> Shameful to use his son's death for this propaganda.

I don't have strong feelings towards his views, but I don't see how it can be called propaganda. He presents a reasoned argument, in contrast to the violence that was visited upon his child. I consider that admirable.


Ever since David Hume[1], a couple of hundred years ago (and arguably long before him), causality has been recognized to stand on very shaky philosophical grounds. It will be interesting to learn what statisticians can make of it. Even if there is in fact a firm mathematical foundation on which causality can rest, that in itself is problematic because mathematicians themselves have mostly given up trying to find foundations for mathematics.

[1] - https://en.wikipedia.org/wiki/David_Hume


They're different questions to some extent.

In the statistical literature on causality, the counterfactual definition is almost universally accepted: if you do X then Y happens, but if you hadn't done X, Y wouldn't have happened. So the main challenge that statisticians are tackling is not the ontological question of whether causality can really be said to exist in the world, but rather the practical question of how to make measurements performed at different times, in different places, on different people, using different machinery, as comparable as possible.
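
For intuition, here is a minimal Python sketch of that counterfactual definition (all numbers made up): each unit carries two potential outcomes, only one of which is ever observed, and randomizing the treatment lets a simple difference in means recover the average causal effect.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # Two potential outcomes per unit: Y(0) if untreated, Y(1) if treated.
    # The true average causal effect is 2.0 by construction.
    y0 = rng.normal(loc=0.0, scale=1.0, size=n)
    y1 = y0 + 2.0

    # Randomized treatment decides which of the two outcomes we observe;
    # the other one is the counterfactual and is never seen.
    treated = rng.random(n) < 0.5
    y_observed = np.where(treated, y1, y0)

    # Under randomization, a difference in means estimates E[Y(1) - Y(0)].
    ate = y_observed[treated].mean() - y_observed[~treated].mean()
    print(ate)  # ~2.0

Everything downstream of this - confounding, adjustment, matching - is about getting that comparison right when the assignment isn't randomized.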


Can you recommend any casual (article-sized) reading about this? Sounds really interesting!

Edit: xtacy's post answers me entirely.


For a quick and general-purpose introduction to causality, the epilogue of an earlier book by Pearl is great: "The Art and Science of Cause and Effect" http://bayes.cs.ucla.edu/BOOK-2K/causality2-epilogue.pdf


Thank you! This rather blew my mind and I hope others take time to try it out.


Check out his Turing Award lecture: http://amturing.acm.org/vp/pearl_2658896.cfm


I was looking for a review, and found Gelman's recommendation of

Causal Inference for Statistics, Social, and Biomedical Sciences

and mention of this book: http://www.hsph.harvard.edu/miguel-hernan/causal-inference-b...

Does anyone have a comment on those? I've read Pearl's two earlier books, and found the one on causality quite hard to navigate. The basic ideas are cool, but it's hard to connect the more advanced theorems with anything I could actually implement.


I too found Pearl's book hard to navigate on first attempt. Do not let that stop you! After a hiatus, I stumbled upon this blog post [1], which explained the core ideas in Pearl's framework beautifully, in simple language. My advice is to persist, fill any holes in the fundamentals (mostly basic probability), and persist. After working out the examples in the blog post on paper and contrasting them with other ideas out there (the potential outcomes framework), it became quite clear what Pearl was trying to articulate.

Pearl is also an enthusiastic speaker. You can search for his talks online at various venues (Stanford, Microsoft Research, etc.) to learn more.

[1] http://www.michaelnielsen.org/ddi/if-correlation-doesnt-impl...


Gelman's own book (with Jennifer Hill) also has some good, practical techniques on causal inference. [0]

Morgan and Winship's Counterfactuals and Causal Inference: Methods and Principles for Social Research [1] is also really good. Be sure to get the second edition; it's much better than the first.

[0] http://www.amazon.com/Analysis-Regression-Multilevel-Hierarc...

[1] http://www.amazon.com/Counterfactuals-Causal-Inference-Princ...


I studied the Hernan/Robins book for a course on causal inference, and I love it. But frankly it's a pretty niche topic, and so for the non-statisticians here on HN who are trying to get better at statistics, just keep in mind that there are so many other topics you probably want to tackle first. Many of the concerns addressed in the Hernan/Robins book probably wouldn't even make sense to you if you didn't have a firm statistical background already.

Gelman/Hill's Data Analysis Using Regression and Multilevel/Hierarchical Models or Angrist/Pischke's Mastering 'Metrics embed ideas and techniques from causal inference into the broader context of regression modeling, which makes these books more immediately useful. Those would probably be my two recommendations for non-statisticians.


I'm a very big fan of Miguel Hernan's work, and have found him fairly approachable - he's made a decent stab at taking some of the more opaque bits of epidemiological methods and making them clearer.


They're both great books, but they make heavier use of potential outcome notation and focus less on graph-theoretic formulations than Pearl's work.


Just so people know, there is a competing/complementary approach to causality in statistics, called the potential outcomes or (Neyman-)Rubin causal model, which, as I understand it, is currently more popular than Pearl's graphical/do-calculus approach.


There is also a time-series-related body of work on estimating causal impact using the notion of counterfactuals. Google has sponsored research in the field [1] and also released an R package [2].

[1] http://research.google.com/pubs/pub41854.html

[2] https://google.github.io/CausalImpact/CausalImpact.html
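
For intuition only, here is a stripped-down Python sketch of the counterfactual idea behind that line of work (this is not the CausalImpact API, and the data and numbers are synthetic): fit the pre-intervention relationship between the affected series and a control series, predict the counterfactual post-intervention series, and read the impact off the difference.

    import numpy as np

    rng = np.random.default_rng(1)
    t, t0 = 200, 150  # series length; intervention happens at time t0

    # Synthetic data: a control series, and a treated series that tracks
    # it in the pre-period, plus a lift of +5.0 after the intervention.
    control = 10 + np.cumsum(rng.normal(0.0, 0.5, t))
    treated = 2.0 + 1.5 * control + rng.normal(0.0, 1.0, t)
    treated[t0:] += 5.0

    # Fit treated ~ a + b * control on the pre-intervention period only.
    X_pre = np.column_stack([np.ones(t0), control[:t0]])
    coef, *_ = np.linalg.lstsq(X_pre, treated[:t0], rcond=None)

    # Counterfactual: what the treated series would have looked like
    # post-intervention had nothing happened.
    X_post = np.column_stack([np.ones(t - t0), control[t0:]])
    counterfactual = X_post @ coef

    impact = treated[t0:] - counterfactual
    print(impact.mean())  # ~5.0

The real package uses a Bayesian structural time-series model rather than this plain regression, which also gives it uncertainty intervals around the counterfactual.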


Philosophy 101 would tell us that statistics can capture only correlations, not causation. Causation requires a different kind of knowledge: knowledge of what is beyond appearances.


When we see that two events are correlated (which we need some kind of statistics to do), we can tell a story (a theory, or an explanation) about how one event causes the other. If the explanation stands up to rational testing over time (where statistics are an important tool), then we have gained knowledge - one plausible explanation of what is "beyond appearances".

Therefore statistics are useful both before positing an explanation and afterwards, to falsify it.


That theory or explanation requires the domain knowledge I am talking about.

Mere statistics about appearances is not enough.

To make it clear: statistics is obviously useful. It just cannot infer any proposition of the form "x is y for all values of x".


I agree. I should have said that explicitly before.


I just unflagged this comment. While it's actually somewhat wrong, it makes an important point.

Statistics can be used to discover a causal relationship. It can't give you an absolute answer, but it can give you a statistical likelihood of causality. That's a pretty important step forward.

That's what this is about.


Likelihood or probability makes sense only in models where one knows with maximum certainty that one has captured every relevant variable, its weight, and every kind of possible event in the distribution. Otherwise the whole model is merely a story. An illusion. Failure to capture reality adequately is where so-called "black swans" come from.

This is the meaning behind the "correlation is not causation" meme. There is nothing wrong with Bayesian reasoning, except when it is applied to an inadequate dataset, which is almost always the case.

Would you like to elaborate on "somewhat wrong", with quotations from Principles of Mathematics, for example?


> Would you like to elaborate on "somewhat wrong", with quotations from Principles of Mathematics, for example?

It's "somewhat wrong" because in some cases it is possible to derive causation using statistical methods.

Have you read the linked book? It should answer your questions. If not, I'll point you to Michael Nielsen's post [1], where he explains "how the causal calculus can sometimes (but not always!) be used to infer causation from a set of data, even when a randomized controlled experiment is not possible" and describes some of the limits of the causal calculus.

It's a pretty long post, but the gist of it is that in some circumstances it's possible to build a world model of an imaginary controlled, randomized experiment and then see if non-controlled, real world data matches those expectations.

What that gives you is an estimate of the interventional distribution: the probability of the outcome under the hypothetical intervention. A rough numerical sketch of the adjustment step is below.

[1] http://www.michaelnielsen.org/ddi/if-correlation-doesnt-impl...
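
Concretely, here is a small Python simulation of that adjustment step (synthetic data, binary variables, all parameters invented): a confounder Z drives both X and Y, so the naive contrast P(Y|X=1) - P(Y|X=0) overstates the effect, while the backdoor adjustment P(Y|do(X=x)) = sum over z of P(z) P(Y|x,z) recovers what the imaginary randomized experiment would measure.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 1_000_000

    # Confounder Z drives both the "cause" X and the outcome Y.
    z = (rng.random(n) < 0.5).astype(int)
    x = (rng.random(n) < 0.2 + 0.6 * z).astype(int)            # Z makes X more likely
    y = (rng.random(n) < 0.1 + 0.2 * x + 0.5 * z).astype(int)  # true effect of X is +0.2

    # Naive observational contrast P(Y|X=1) - P(Y|X=0): confounded by Z.
    naive = y[x == 1].mean() - y[x == 0].mean()  # ~0.5

    # Backdoor adjustment: P(Y|do(X=x)) = sum over z of P(z) * P(Y|X=x, Z=z),
    # i.e. the result of the imaginary randomized experiment.
    def p_do(x_val):
        return sum((z == zv).mean() * y[(x == x_val) & (z == zv)].mean()
                   for zv in (0, 1))

    adjusted = p_do(1) - p_do(0)  # ~0.2, the true causal effect
    print(naive, adjusted)

The point is that the adjusted number matches what a randomized experiment would have reported, even though it was computed from purely observational data (given the assumed graph).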


The probability calculus has nothing to do with the correctness of the probabilities supplied, in the very same way the propositional calculus has nothing to do with the validity of the given propositions. Nor does any other calculus, for that matter.

Inference is the application of valid heuristics. Mere statistics is not sufficient.


It actually is possible to infer causality from statistics. That's (partially) what this author's work is all about. For a brief explanation of how that's possible, see this post: http://lesswrong.com/lw/ev3/causal_diagrams_and_causal_model...



