Hacker News

I think the entire concept of "stealing from the research productivity..." is absurd. Repeated analysis of my data set only improves the quality of my work, either by strengthening the argument via a fresh set of eyeballs or by showing flaws in my work which I can improve upon... both of which are things I want as a scientist.

I feel like hidden datasets are a real problem. In fields like medicine, where some authority will look at the data eventually, there's at least some quality control (debatable as to how much), but there are entire sections of science where it would be rather trivial to fake entire studies without even collecting data... especially since redoing experiments seems to have a bad rep, too.

Pretty broken system overall. I'd love it if there was a step in the acceptance of papers that would say: paper accepted on the condition that the data set is made available. You still get "the glory" because the stuff is published and you're the first source on it, but now the data is also available.

Additionally, one of the goals of science is being reproducible and transparent. If an experiment is well described and reproducible, additive data sets could be built. Run a similar enough experiment but with another demographic, add to the data set...etc.

Edit: I also don't see why the role of "data gatherer" can't be more prestigious. I mean sure, traditionally you gather data for a reason and want to answer a research question with it, and that's what you're judged by. However, there's tremendous value in identifying that no good data set exists for some area and then outlining a solid, transparent and scientific process of collecting that data and executing it. I'd call that a valuable paper even if no hypotheses are tested. As long as there's a place where you can publish those papers and make the data available it would probably also be a paper that is good for your career by a metric that seems to matter a lot. You'd potentially get a lot of citations, since everyone who conducts analysis based on the data set, extends it, etc. would cite you.




> Edit: I also don't see why the role of "data gatherer" can't be more prestigious.

I agree wholeheartedly. If datasets are so valuable that lots of people can easily capitalise on them and produce great science, then the creation of those datasets should be rewarded similarly to an extremely valuable paper.

Machine learning seems to be a field where this is going pretty well: people are publishing their models and datasets more, so I can grab a trained model of a huge image recognition neural net and try it out on my own data.

> As long as there's a place where you can publish those papers and make the data available it would probably also be a paper that is good for your career by a metric that seems to matter a lot.

I'd personally like to see a shift away from requiring a paper to cite a dataset. But I don't think that really alters your point.

(Disclaimer: I work for Digital Science, which is a parent company of figshare, but this is a personal opinion.)



