
Yes, exactly. There's a real-life example of an author group who got sufficiently annoyed at a reviewer requesting an inappropriate subgroup analysis: https://www.thelancet.com/journals/lancet/article/PIIS0140-6... Reviewers asked for the subgroup analysis; authors said "no, this is statistically nonsense"; reviewers said "yes, but we'll reject the paper otherwise"; authors said "ok, but only if you let us also split on astrological sign".

Result: paper reports that aspirin has an effect, but only if you're not a Gemini or Libra. Too good.



The two signs bracketing the summer months (I presume this was in the northern hemisphere?). That's a potentially interesting finding.

> inappropriate subgroup analysis

No subgroup is inappropriate unless you know all of the values of all of the parameters. But sure, the intent they tried to demonstrate (take all sub-group analyses with a grain of salt) is a good thing to remember.

Edit: To whoever downvoted me, time of year of birth does have real-world effects. https://www.livescience.com/13958-birth-month-health-effects...

> Previous studies have found similar links between spring births and various disorders, including schizophrenia, multiple sclerosis and even Type 1 diabetes. It's possible these diseases are linked to some environmental influence during gestation or the first few months of life, though researchers aren't sure what that could be.

> The leading candidates including vitamin D levels, infections that come and go seasonally, changes in nutrition, and even possibly weather fluctuations, Handunnetthi told LiveScience.

Now perhaps all of this is just bad science and these correlations are just statistical anomalies. But perhaps they aren't.


I think you got downvotes because you misunderstood what people were taking umbrage with in the first example.

It’s not that birth time can never have real effects. It’s that if you keep rolling a die long enough, eventually you’ll hit a “statistically unlikely” event like rolling 4 fives in a row or hitting 1 2 3 4 in order.

Extraneous subgroup analyses are like rolling the die again. Say you’re using a significance threshold of p < .05. That means that when there is no real relationship, about 1 test in 20 will still come out “significant” purely due to random chance.

If you do a bunch of extraneous sub group analyses like the reviewer wanted, you’re banking on the statistical likelihood that eventually you’ll get the result you want even if it’s not a real relationship.
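To put a rough number on the "rolling the die again" point, here is a minimal sketch. It assumes a 0.05 threshold, that no subgroup has any real effect, and that the subgroup tests are independent (an idealization; real subgroups need not be). The function names are made up for illustration.

```python
import random

ALPHA = 0.05  # assumed significance threshold


def familywise_error_rate(num_tests: int, alpha: float = ALPHA) -> float:
    """Chance of at least one false positive among independent tests
    when there is no real effect anywhere: 1 - (1 - alpha)^k."""
    return 1.0 - (1.0 - alpha) ** num_tests


def simulate(num_tests: int, trials: int = 100_000,
             alpha: float = ALPHA, seed: int = 0) -> float:
    """Monte Carlo check of the formula. With no real effect, each
    p-value is modeled as a uniform draw on [0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One "study": run num_tests subgroup tests, see if any is "significant".
        if any(rng.random() < alpha for _ in range(num_tests)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for k in (1, 12, 20):
        print(k, round(familywise_error_rate(k), 3), round(simulate(k), 3))
```

Under those assumptions, splitting on 12 star signs gives roughly a 46% chance that at least one sign shows a "significant" effect by luck alone.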


This is what follow-up studies are for. To separate the wheat from the chaff. Don't separate before. See my rant below as to why I think pre-hoc decisions on analysis are a bad idea: https://news.ycombinator.com/item?id=35883276

At the very least, I'd like to see people say in advance which parameters they are interested in. That sort of thing is fine and important to avoid un-backed p-hacking. But for researchers who come after, for their sake, if you are not publishing the entire data so that they can reanalyze it de novo, please do as much analysis as possible (and record as many parameters as possible), if only in the supplementary data.

Science can only build on previous science if the authors of that previous science allow it to happen.


> No subgroup is inappropriate unless you know all of the values of all of the parameters

I don't know how to put it less bluntly: you're incorrect.

> But sure, the intent they tried to demonstrate (take all sub-group analyses with a grain of salt)

That's not what they intended to demonstrate. They intended to demonstrate that you need a _reason_ to want to split, and that reason needs to be given _ahead_ of the analysis. If you see the results _and then choose_ a new data analysis (that includes new subgroup analyses), your procedure is no longer statistically sound.

This is a specifically bad statistical practice called HARKing https://en.wikipedia.org/wiki/HARKing


> They intended to demonstrate that you need a _reason_ to want to split

No, no, no. FFS no! Sure, this is a good thing to do if you are trying to prove a hypothesis. But if you are trying to explore for truly novel and unexpected linkages, p-hack to your heart's content, form hypotheses, and then do follow-up studies to see whether there's really something there!

Ignoring possibilities is bad exploratory science. And, as a reader, it's quite annoying to read old studies only to find that the authors didn't bother splitting on the parameter you are currently interested in.

Split them all, if only in the supplementary data, and let future studies sort them out.


Your understanding of significance is flawed, and there is an XKCD for that: https://xkcd.com/882/


No, read my responses to the other commenters above you, who at least made arguments and didn't post a strawman, if you want to know why I believe you are wrong.


I’m not a stats expert, but it seems blatantly obvious that subgroup selection could be inherently problematic because how do you choose the groups in an effective and responsible way?

I mean, just the statement

> no subgroup is inappropriate unless you know all the values of all the parameters

seems implausible and unlikely.

In the one stats class I took, we talked a lot about how selection bias was a huge concern. Why wouldn’t subgroup selection bias also be a concern?

I dunno, maybe I’m wrong, but I’m dubious.


I just ranted to the top two commenters so I'll keep this response brief.

Say in advance which parameters and subgroups you, as a researcher, care about in terms of significance. And keep your conclusions and discussions focused on the results for those groups. Report all of the other stuff, whether p-significant, or not, as supplementary data.

Hopefully, yours is not the only study that will use your data. Don't limit future researchers to your hypotheses.


That's great for Geminis and Libras, but what if you're a Sagittarius?



