
So they went to fish for correlations and found this? EDA at its best...



I feel like we need a saying for sentiments like this that's the opposite of the age-old "correlation does not equal causation", something like "correlation does not mean automatically dismiss".


The comment you replied to has nothing to do with "correlation does not equal causation". The point was that fishing for results, by trying everything, is guaranteed to find something.

So if, in a study like this, they looked at differences in mortality rate between: 1) men and women, 2) younger and older people, 3) younger and older surgeons, 4) surgeries in summer vs winter, 5) surgeries in morning vs evening, 6) etc etc.... and it came down all the way to n) surgeries on birthdays to find an effect. Then it would be almost guaranteed that such a finding is spurious.
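
To put rough numbers on "almost guaranteed" (my own back-of-the-envelope, assuming independent comparisons each tested at the usual p < .05 on pure noise):

    # Chance of at least one spurious "significant" result among n independent
    # comparisons of pure noise, each tested at alpha = 0.05 (illustrative numbers)
    alpha = 0.05
    for n in (5, 10, 20, 50):
        p_any = 1 - (1 - alpha) ** n
        print(f"{n:2d} comparisons -> {p_any:.0%} chance of a spurious finding")
    # 5 -> 23%, 10 -> 40%, 20 -> 64%, 50 -> 92%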


> Then it would be almost guaranteed that such a finding is spurious.

But that's the fallacy. You can't just preemptively assume that there are no real correlations.

You definitely want to use a smaller p threshold when you look for more things, but it's quite possible to hit real correlations with a pile of plausible hypotheses.

As an example: Let's say just 1/150 of your hypotheses hit a real correlation, and you're inappropriately using a p<.05 test. Tiny signal, huge noise. But even in that pessimistic case, more than 10% of your positives are real. Far from a guarantee.
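
The arithmetic, spelled out (my assumption: the one real correlation is always detected, i.e. power = 1):

    # Sketch of the 1-in-150 scenario above (assumes the real effect is always detected)
    hypotheses, real, alpha = 150, 1, 0.05
    false_positives = (hypotheses - real) * alpha   # ~7.45 expected spurious hits
    true_positives = real                           # detected by assumption
    print(true_positives / (true_positives + false_positives))  # ~0.118 -> over 10% of positives are real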


> [...] but it's quite possible to hit real correlations with a pile of plausible hypotheses.

Yes, of course. But the trouble is that, if you go on this p-hacking expedition, you are guaranteed to find correlations even in pure noise. So if you use a procedure that will find something in noise, you cannot also use it to claim to have found something in your data.
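
A toy simulation of such a procedure, on made-up data that is pure noise (nothing to do with the actual study):

    import numpy as np

    rng = np.random.default_rng(0)
    n_patients, n_hypotheses = 1000, 50

    # A fake continuous "outcome" and 50 fake binary "risk factors", all independent noise
    outcome = rng.normal(size=n_patients)
    factors = rng.integers(0, 2, size=(n_hypotheses, n_patients))

    hits = 0
    for f in factors:
        a, b = outcome[f == 1], outcome[f == 0]
        # z test on the difference in group means (large n, so a normal approximation is fine)
        z = (a.mean() - b.mean()) / np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
        if abs(z) > 1.96:   # two-sided p < .05
            hits += 1
    print(hits, '"significant" correlations found in pure noise')  # typically a handful

Run it with different seeds and you'll usually (about 92% of the time, per the 1 - 0.95^50 figure above) fish at least one thing out.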

In the words of statistics philosopher Deborah Mayo - "A conjecture passes a test only if a refutation would probably have occurred if it's false". In this case no refutation would have occurred if the correlation is false. Hence the result is equivalent to no test having actually been performed.

Or, a more simplistic example, imagine if someone observes an asteroid and says "it might be aliens". Some astrophysicists then show that all the observed properties of the object behave just as we would expect them to for an asteroid. But the person might then reply with: "yeah, but it still might have been aliens".

I feel that the same is true for "yeah, but the correlation might still be true".


> In the words of statistics philosopher Deborah Mayo - "A conjecture passes a test only if a refutation would probably have occurred if it's false".

Sure, one weak result out of many doesn't pass. But not passing is a far cry from "almost guaranteed" to be spurious.

> Hence the result is equivalent to no test having actually been performed.

A result like that takes a big list of plausible correlations and distills it down. If you think even a handful of the original list items are likely to have merit, then the distilled list is useful for suggesting where you should collect more data.

> Or, a more simplistic example, imagine if someone observes an asteroid and says "it might be aliens".

What fraction of asteroids do you expect to be aliens?

If it's one in a billion, then cutting the list by a factor of 20 is useless. If it's one in a hundred, then cutting the list by a factor of 20 is very helpful.
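
Rough numbers for that intuition (my own illustration: treat "cutting the list by a factor of 20" as a 20:1 likelihood ratio in favour of the exotic explanation):

    # Posterior probability after evidence with a 20x likelihood ratio (illustrative priors)
    def posterior(prior, likelihood_ratio=20):
        return prior * likelihood_ratio / (prior * likelihood_ratio + (1 - prior))

    print(posterior(1e-9))   # ~2e-8 -> still effectively zero: useless
    print(posterior(0.01))   # ~0.17 -> worth a closer look: very helpful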

> I feel that the same is true for "yeah, but the correlation might still be true".

It depends on the original list being sufficiently plausible. You can't distill tap water into vodka.


The only way to make the results believable is if they pre-registered the study based on a proposed mechanism of action, and then validated the results. Otherwise we can never know how many different attempts at crafting signal from the noise were attempted.


This is a huge problem in scientific papers; it's very, very common to see results for all kinds of metrics with confidence intervals or p-values and then see a few "significant" measurements, without a mention of the fact that multiple tests were made - and implicitly, possibly many more tests were made during the exploratory phase of the research. What does significance even mean then? Hard to say (there are techniques to try and compensate of course, but they have their own issues).

One simple way we can at least mitigate that problem is by requiring far lower p-values (or wider CIs), and where that's not feasible, by requiring a much clearer-eyed explanation and acceptance of the fact that such research cannot be trivially supported by statistics, and instead additionally requires careful experimental setup and consideration of causal networks.
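
For what it's worth, the mechanical part of "requiring far lower p-values" is straightforward; here's a minimal sketch with made-up p-values (Bonferroni and Benjamini-Hochberg written out by hand, not anything from the paper):

    # Illustrative p-values from m = 6 hypothetical tests
    pvals = [0.001, 0.012, 0.02, 0.03, 0.2, 0.5]
    m, alpha = len(pvals), 0.05

    # Bonferroni: control the chance of ANY false positive -> per-test threshold alpha/m
    bonferroni_hits = [p for p in pvals if p < alpha / m]                 # [0.001]

    # Benjamini-Hochberg: control the expected FRACTION of false positives
    ranked = sorted(pvals)
    cutoff = max((p for i, p in enumerate(ranked, 1) if p <= alpha * i / m), default=None)
    bh_hits = [p for p in ranked if cutoff is not None and p <= cutoff]   # [0.001, 0.012, 0.02, 0.03]

The hard part is the one the parent comment points at: you have to know how many tests were actually run, including the exploratory ones.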

Basically: if you have p = 0.0001 or whatever, I'm more willing to believe that publication biases and multiple testing aren't super likely to cause false positives that often. But without that, you want a clear hypothesis and a proposed way to test it published beforehand, and just one test, and ideally a clear hypothesis about causation etc. too, so you can critically push and prod the results to try and distinguish noise from signal. A p = 0.03 just isn't very convincing at all.

In general, I think modern science is too reliant on statistics over complex systems. In the effort to tease out significance, it has to correct for all kinds of known interference (confounders) and other effects, and thus needs more advanced statistical models and less general assumptions about distributions (whether for significance or for mathematical tractability), to the point that it's very hard for anyone to say they didn't make some systematic error somewhere. And sure, being an expert in the subject matter and having an expert statistician on hand helps, but making reasoning errors is too easy; too human to reliably avoid.

Instead of seeking signals in noise, we should target research more narrowly at parts of the puzzle that we can measure better, then use classical plain logic to put the pieces together - not try to measure the whole thing in one go. After we put all the well-measured pieces together, validating with tricky statistics is reasonable as a sanity check, but not much more than that. If common sense is hard, statistics is harder, even for statisticians.

Interpreting results like this as any more than "huh, that's something we could look into" is unwise.


Why are you and others assuming this was the case and it wasn't the original hypothesis they had in mind?

They explain the reasoning for selecting this hypothesis to test:

>Operations performed on birthdays of surgeons might provide a unique opportunity to assess the relationship between personal distractions and patient outcomes, under the hypothesis that surgeons may be more likely to become distracted or feel rushed to finish procedures on their birthdays, and therefore patient outcomes might worsen on those days.


Because a popular hobby on HN is trying to find mistakes in studies by guessing from the title without reading the article.

Incidentally, that is how bias in the real world often works - people making different assumptions about different groups in the absence of evidence.


Did they pre-register before conducting the study? If not, then how can you possibly assume the opposite? They can always write the justification for choosing the hypothesis after the fact.

EDIT: This need not be done maliciously; that could very well be the actual reason they decided to look at birthdays. What is concealed is how many other possibilities, equally well justified, were considered.


Correlation doesn't imply causation but it could co-occur with it

Too long

Correlative causation is more correlated with causation than non-correlative causation

Too tongue twisting

Correlation is closer to causation than no correlation

I like that

Ignoring possible causation is correlated with stupidity

Too cruel

Correlation does not equal causation but they are correlated

Not bad

Anyone else?


Correlation correlates with causation.


I always liked xkcd's "Correlation doesn't imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing 'look over there'." (https://xkcd.com/552/) although it is rather long.



The website is currently unavailable so I can't read the article. However, p-hacking, to which I believe the parent comment was referring, is a separate issue.

I agree that the incessant belaboring of the difference between correlation and causation in these types of threads is tiresome, but I don't think it applies in this case.


How about "if you find a correlation and want to report this as if you think it's significant, then you should formulate a specific hypothesis and perform independent experiments to test it, rather than publishing junk."


Apparently this isn't junk: I couldn't access the original at the time of posting, so I relied on a second-hand, bogus interpretation. Apologies!


Do we? Wait an hour and there will be a bunch of plausible-sounding explanations that "invalidate" the idea that "they're just negligent on their birthday".


Indeed, and this finding could at the very least be said to have some plausible underlying (perhaps indirect) causation.


Well, that's how you discover things. You go fish for correlations, and publish any interesting findings, so another group can get independent data and check them.

Meanwhile, a news reporter gets your (probably spurious) correlations and announces them to the entire world as "the TRUTH! Science says so, and you don't doubt science, do you?"

That's basically how science gets done on any complex field where we can't test things directly.


Relevant xkcd: https://xkcd.com/882/



