
I don't believe publishing non-results or negative results would be helpful, at least not in the traditional sense. Apart from the work of writing them up, it is already hard enough to keep up with the volume of publications coming out in most fields. Non-result publications would just not be read, and they would make finding the right literature that much more difficult.

What is really needed is publication of the data. The big problem is that there are no good systems for data publication that don't require significant extra work. Ideally the data would be made available directly from some electronic lab notebook system, with the relevant metadata attached. At the moment all systems are miles away from that (even before we consider how bad record keeping is in many research labs).

If funding agencies really wanted to do something about open data, they would invest significantly in a good data system and make data entry mandatory, not just at the end of a study but during it (private at first, then made available at the discretion of the PI or after some time).




Wouldn't this open the door to even more p-hacking than today - unscrupulous researchers taking others' data and finding some kind of signal in it?
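To put a rough number on that worry, here is a minimal sketch of the standard multiple-comparisons arithmetic. It assumes every test is independent and every null hypothesis is true, which real datasets will not satisfy exactly; the function name and the 0.05 threshold are illustrative choices, not anything from a specific study.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests.

    Assumes every null hypothesis is true and the tests are independent,
    so each test alone comes up "significant" with probability alpha.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    # The more hypotheses someone tries on the same data, the more likely
    # it is that at least one of them crosses the threshold by chance.
    for num_tests in (1, 10, 50):
        rate = familywise_error_rate(num_tests)
        print(f"{num_tests:>3} tests -> P(at least one false positive) = {rate:.3f}")
```

Under those assumptions, the chance of at least one chance "finding" grows quickly with the number of hypotheses tried.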

My understanding was that the gold standard for research is to come up with a hypothesis, design an experiment to test that hypothesis, collect data in such a way that you have statistical independence between the variables you wish to measure, and finally analyze the data and check whether you have confirmed your hypothesis or whether it remains unconfirmed.

But data collected correctly to test one hypothesis is not necessarily good for testing other hypotheses - e.g. perhaps it was irrelevant for your research if all the animals were female and might have had birth defects, but that doesn't mean your data is useful for someone studying the whole population.

This would mean that not only do you have to publish the raw data, but also describe your data collection methods, your assumptions, things you didn't check etc. How far are you from publishing your whole paper at that point?


>Wouldn't this open the door to even more p-hacking than today - unscrupulous researchers taking others' data and finding some kind of signal in it?

I would argue that the more data is out there, the harder it is to "p-hack", i.e. it's much more difficult to find some weird correlation if you have a huge population. Also, I would not call researchers who use others' data "unscrupulous". In fact, that is already being done in meta-studies and is exactly the point of the exercise: increase the amount of data published, so that if some "unscrupulous" or "erroneous" researcher publishes a study with some spurious correlation, we can look at a lot more data to see whether it also exists there.
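A small simulation can illustrate the sample-size part of this argument. This is only a sketch under stated assumptions: all variables are independent Gaussian noise, so any correlation found is spurious by construction, and the helper names, the 20 variables, and the sample sizes are made up for illustration. It says nothing about other forms of p-hacking, such as flexible exclusion rules or outcome switching.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def max_spurious_correlation(num_samples, num_variables=20, seed=0):
    """Largest |r| over all pairs of independent noise variables.

    Every column is unrelated random noise, so the value returned is the
    strongest "signal" a determined searcher could dig out by chance.
    """
    rng = random.Random(seed)
    columns = [
        [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
        for _ in range(num_variables)
    ]
    return max(abs(pearson(a, b)) for a, b in combinations(columns, 2))


if __name__ == "__main__":
    for num_samples in (20, 200, 2000):
        strongest = max_spurious_correlation(num_samples)
        print(f"n = {num_samples:>4}: strongest chance correlation |r| = {strongest:.2f}")
```

With few samples, scanning all 190 pairs typically turns up at least one sizeable correlation; with thousands of samples the strongest chance correlation shrinks toward zero, which is the effect described above.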

>My understanding was that the gold standard for research is to come up with a hypothesis, design an experiment to test that hypothesis, collect data in such a way that you have statistical independence between the variables you wish to measure, and finally analyze the data and check whether you have confirmed your hypothesis or whether it remains unconfirmed.

>But data collected correctly to test one hypothesis is not necessarily good for testing other hypotheses - e.g. perhaps it was irrelevant for your research if all the animals were female and might have had birth defects, but that doesn't mean your data is useful for someone studying the whole population.

It is true that data collected for one purpose is not always good for another. Very often it is, though, and many discoveries were made by looking for something new in data collected for a completely different purpose.

>This would mean that not only do you have to publish the raw data, but also describe your data collection methods, your assumptions, things you didn't check etc. How far are you from publishing your whole paper at that point?

Good reproducible science should do that anyway. You should always keep a lab notebook that clearly documents what you are doing, so that someone else (or your future self) can reproduce the results. So the requirement I'm asking for simply forces people to do good science. I can tell you from experience that this documentation is still very far from a whole paper.


Your points are very correct, I think. I just wanted to add that I didn't mean to call anyone who uses data published by others unscrupulous; I was only talking about researchers who would go hunting through others' data for something to p-hack.



