>Wouldn't this open the door to even more p-hacking than today - unscrupulous researchers taking others' data and finding some kind of signal in it?
I would argue that the more data is out there, the harder it is to "p-hack": it's much more difficult to find some weird correlation if you have a huge population. Also, I would not call researchers who use others' data "unscrupulous". In fact, that has been done in meta-studies already and is exactly the point of the exercise: increase the amount of data published, so that if some "unscrupulous" or "erroneous" researcher publishes a study with some spurious correlation, we can look at a lot more data to see whether it also exists there.
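To make that concrete, here is a toy simulation (a sketch, not a proof). It assumes everything is pure Gaussian noise, and the sample sizes, the number of candidate variables, and the function names are made up for illustration. It picks the strongest correlation among many noise variables in a small study, then measures the same relationship in a larger, independent data set. One caveat: at a fixed significance threshold the false-positive rate does not depend on sample size, so what more data really buys is smaller chance correlations and the ability to check a finding on independent data.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def noise(n, rng):
    """n independent standard-normal draws: no real signal anywhere."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def strongest_spurious(n_samples, n_candidates, rng):
    """Correlate many noise variables with a noise outcome, keep the largest |r|."""
    outcome = noise(n_samples, rng)
    corrs = [pearson(noise(n_samples, rng), outcome) for _ in range(n_candidates)]
    return max(corrs, key=abs)


rng = random.Random(0)  # fixed seed so the run is repeatable

# "p-hacked" finding: best of 50 candidate variables in a 20-subject study
r_small = strongest_spurious(n_samples=20, n_candidates=50, rng=rng)

# The same hunt in a much larger data set: chance correlations are far smaller
r_large = strongest_spurious(n_samples=2000, n_candidates=50, rng=rng)

# Replication: the "winning" variable is still just noise in fresh data
r_replication = pearson(noise(5000, rng), noise(5000, rng))

print(f"best of 50, n=20:     r = {r_small:+.3f}")
print(f"best of 50, n=2000:   r = {r_large:+.3f}")
print(f"replication, n=5000:  r = {r_replication:+.3f}")
```

The small study should yield an impressive-looking correlation purely by chance, while the large data set and the replication should both stay close to zero.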
>My understanding was that the gold standard for research is to come up with a hypothesis, design an experiment to test that hypothesis, collect data in such a way that you have statistical independence between the variables you wish to measure, and finally analyze the data and check whether you have confirmed your hypothesis or it remains unconfirmed.
>But data collected correctly to test one hypothesis is not necessarily good for testing other hypotheses - e.g. perhaps it was irrelevant for your research if all the animals were female and might have had birth defects, but that doesn't mean your data is useful for someone studying the whole population.
It is true that data collected for one purpose is not always good for another. However, very often it is, and many discoveries were made by looking for something new in data collected for a completely different purpose.
>This would mean that not only do you have to publish the raw data, but also describe your data collection methods, your assumptions, things you didn't check etc. How far are you from publishing your whole paper at that point?
Good reproducible science should do that anyway. You should always keep a lab book that clearly documents what you are doing, so that someone else (or your future self) can reproduce the results. So the requirement I'm asking for just forces people to do good science. I can tell from experience that this documentation is still very far away from a whole paper.
I think your points are correct. I just wanted to add that I didn't mean to call anyone who uses data published by others unscrupulous; I was only talking about researchers who would go p-hacking in others' data.