Criticism of “Research Parasites” Moves Medical Journal in Wrong Direction (statnews.com)
94 points by tokenadult on Feb 2, 2016 | 70 comments



The system within which many researchers must work to advance their careers is fundamentally at odds with sharing data. Collecting data is unglamorous, laborious, and often very expensive. This places much data collection firmly within the purview of research institutions, where resources and cheap labor are abundant. To progress in this context, one must publish. It isn't enough only to have collected the data; the researcher must also be the one to analyze it, and to do so before someone else does. Thus, for any self-interested person in research, there is no incentive to disclose hard-won data to the public at large, effectively crowd-sourcing its analysis and forgoing the career advantages and opportunities that a large store of useful data offers. One may find this unethical, but few who have committed many years of hard labor to advance in a research context would do otherwise.


So what you're saying is essentially that the point of science is to advance careers, not to find insights and come closer to the truth? That may very well be true, but it would be a devastating account of the state of today's science. And it would desperately need to be fixed.


I don't think the parent is saying that's what science is for, but rather that this is what the incentive structure currently looks like. We mythologise science and scientists as being above this, but they aren't. Incentives matter.


No. Scientists, by and large, want everyone to find insights and come closer to the truth.

But the reality is that both career advancement and funding for further experiments are almost entirely based on the number of publications in prestigious journals.

And effort put into collecting data is not by itself publishable: you must add analysis, and very often analysis that confirms a new hypothesis or invalidates previous beliefs. Journals prefer to publish exciting results, and "we ran experiment X and found no significant effects" is usually rejected.

Fixing this is desperately needed, but so far nobody has come up with a solution that would actually work.

A middle ground would be to require data publication after a certain time period. For example, NASA and ESA release original data 6-12 months after collection. Releasing original data together with the paper also works.


> Fixing this is desperately needed, but so far nobody has come up with a solution that would actually fix this.

Wouldn't rewarding people for "cites" of their dataset be a way to fix this?

I.e., if someone uses your data, they have to cite you, and that citation is just as good as someone citing an analysis paper.


Exactly. I think publishing the data _alongside_ your publication is the best way to go about it.


I agree with your idealism, so why don't you and I team up, forget earning a salary and just give away anything we come up with for others to use without restriction? That would be the ultimate way of advancing insights and coming closer to the truth...

Edit: I'd love to, actually, but perhaps I'll wait for the post-scarcity, Star Trek economy to become mainstream.


No, of course he wasn't saying that. But there is a problem with the reward structure of the system.

Scientists have to eat and pay the bills, too. Unless you're really rich, it's tough to do much serious science in your spare time on weekends, paying for equipment from your own pocket. So the idea is that a scientist gets a job that pays him/her to do science. The system rewards scientists who advance knowledge and penalizes those who do not.

How do we determine whom to reward? It is extremely difficult to evaluate most scientists' work unless you are an expert in the same field. So we let peer reviewers evaluate the work, and reward scientists for publications in prestigious peer-reviewed journals.

The problem is that these incentives work against some kinds of sharing.

It's not all bad. A scientist whose work never sees the light of day is fired; no publications means no job. But sharing ideas beyond those that must be published is something the system tends to punish.

What to do about it? It's easy to propose some radical revision of the system. But we still have to deal with the question of how to evaluate the work of someone whose ideas can only really be understood after a decade of specialized study.


I don't believe my comment could reasonably be interpreted in that light. I made no normative claims. I just stated what I believe to be the situation.


There are plenty in it just for the career, gaming the incentives of an imperfect system. But not all are like this; it is usually obvious (at least in my field) who is real and who is just collecting a check.


Having a lot of data is a big incentive to find someone who can analyze it correctly, because it will raise the impact of your scientific output. It is not as bad as people describe here. Even large companies need to publish and share in order to show that they know their business and in order to have any shot at a collaboration with a large hospital (or another company). Such a hospital will also want to publish and share in general. A publication by a company also signals: "Hey, come to us, maybe we can work out a deal and provide a nice service based on our work." Yes, there are NDAs, contracts, and patents, but that company paid a lot of people for their efforts with only a small chance at ROI; it is only fair they are rewarded.

An optimum is often found that favors sharing as little as possible, but often "as little as possible" is still quite a lot. Also, how can people label this as unethical when the data would not even have been gathered if the collectors had known beforehand that someone would force them into sharing, diminishing the data's return on investment?

I'm also pretty sure the sharing of a complete data set will get you a lot of citations, so there is a financial incentive alongside the moral one.


It's fine for researchers, in most cases, to keep the data to themselves until publication. This allows them to do the analysis first.

But many clinical trial researchers want to withhold the data even after publication. This is fundamentally at odds with how science works. It means no one can independently evaluate the results.


One way to solve this problem is to create a data license that allows data sharing for replication use only until some embargo date set by the researcher. This would allow external researchers to immediately replicate published results, but would prohibit them from unfairly benefiting from someone else's hard work.


This exists and has been used successfully in a number of large consortia, like the Cancer Genome Atlas. It essentially says that the data goes live immediately, but the researchers who generated the data have exclusive publishing rights for 6 months from that date. After that, it's fair game for anyone.


It isn't such a terrible thing. Unlike corporations, which routinely withhold data generated in clinical trials, academic researchers must eventually publish their data; otherwise their careers will suffer.


I follow JAMA's publications and routinely (and politely) request data from authors. I've never once been sent or guided towards data. I asked Dr. Vivian Fonseca for the data associated with this study: http://www.ncbi.nlm.nih.gov/pubmed/23218892 The text is free, and you'll see he says on page 1, "the raw data are available from Pamlab LLC." I asked Pamlab for the data and they stonewalled. I asked Dr. Fonseca and he stonewalled. I asked the Journal and they said, "Not our job." And that's an article that says the data are available!


This is why it is so important to ensure data is deposited in an independent repository at the time of publication. It is too easy to claim the data will be made available and then simply not follow through, and there is little recourse.

It would be really worthwhile for the International Committee of Medical Journal Editors to hear your story, as they are proposing that authors should be allowed to withhold data for six months after publication. See my comment elsewhere in this thread: https://news.ycombinator.com/item?id=11019760

If you have access to PubMed Commons, I would post a comment warning others that the authors refused to make the data available despite promising to do so in the paper, and that people should treat the results with caution.


That kind of situation seems pretty simple to handle: request the data upon review, and hold the paper until the data has been submitted.

Or shame them academically. You'd be surprised how far that goes.


AIUI these researchers are complaining about what might become of their data after they have published their studies.

Edit: no, on closer reading, I am incorrect. My bad.


Very sad to hear. Too bad so much focus is on the individual - advancing one's career is at odds with advancing the science that you care about.


It's not just career advancement, it's often also career survival. No papers - no funding. No funding - no job.


I wonder how much the bottleneck of resources (money, etc.) would be trimmed down if centers of research were spread throughout the country rather than sitting on very expensive real estate, i.e. on a college campus? I'm probably vastly oversimplifying the problem.


Edit: ah, never mind - bit emotional this morning. Covered by hannob below.


For an idea of what some of the backlash about the NEJM editorial looks like, simply check Twitter for #researchparasites: https://twitter.com/search?q=%23researchparasites -- and for a quicker version, some of the most influential voices are summarized here: http://www.forbes.com/sites/davidshaywitz/2016/01/21/data-sc...

The paragraph from NEJM's piece that drew the brunt of the scorn was the following:

A second concern held by some is that a new class of research person will emerge — people who had nothing to do with the design and execution of the study but use another group’s data for their own ends, possibly stealing from the research productivity planned by the data gatherers, or even use the data to try to disprove what the original investigators had posited. There is concern among some front-line researchers that the system will be taken over by what some researchers have characterized as “research parasites.”

The meme has already given birth to at least one parody account: https://twitter.com/dataparasite and one rather tongue-in-cheek domain purchase: http://researchparasites.com


"...or even use the data to try to disprove what the original investigators had posited."

How is this a bad thing?

The original investigators - as scientists and seekers of the valid, the factual, the accurate - should welcome competing attempts, especially given that such attempts may reveal a faulty original analysis.


Indeed. It is such a bizarrely anti-science attitude. I suspect the mindset is not about advancing scientific discourse, but about maintaining the sense of prestige the journal's name carries.


I'm not sure how well that prestige is earned. NEJM has to retract substantially more papers than any other high-profile journal:

http://retractionwatch.com/2011/08/11/is-it-time-for-a-retra...

If the papers they publish get more scrutiny, they will undoubtedly have to retract even more.


I wonder if this is not a concern about high-quality rebuttals but about poor, fast analysis conducted with an agenda - the worry that some people will root through the data to find any small issues and blow them far out of proportion.

I'm not sure I agree with it, but I think it's possible that's more what they're talking about. Rather than, say, a lab coming back and saying "we analysed the BICEP data carefully and found the signal is possibly the result of dust".


... there is just so much wrong with that editorial


I think the entire concept of "stealing from the research productivity..." is absurd. Repeated analysis of my data set only improves the quality of my work, by strengthening the argument via a fresh set of eyeballs or by showing flaws in my work which I can improve upon... both of which are things I want as a scientist.

I feel like hidden datasets are a real problem. In fields like medicine, where some authority will look at the data eventually, there's at least some quality control (debatable as to how much), but there are entire sections of science where it would be rather trivial to fake entire studies without even collecting data... especially since redoing experiments seems to have a bad rep, too.

Pretty broken system overall. I'd love it if there were a step in the acceptance of papers that would say... paper accepted under the provision that the data set is made available. You still get "the glory" because the stuff is published and you're the first source on it, but now the data is also available.

Additionally, one of the goals of science is being reproducible and transparent. If an experiment is well described and reproducible, additive data sets could be built. Run a similar enough experiment but with another demographic, add to the data set...etc.

Edit: I also don't see why the role of "data gatherer" can't be more prestigious. I mean, sure, traditionally you gather data for a reason and want to answer a research question with it, and that's what you're judged by. However, there's tremendous value in identifying that no good data set exists for some area and then outlining a solid, transparent, and scientific process of collecting that data and executing it. I'd call that a valuable paper even if no hypotheses are tested. As long as there's a place where you can publish those papers and make the data available it would probably also be a paper that is good for your career by a metric that seems to matter a lot. You'd potentially get a lot of citations since everyone who conducts analysis based on the data set or extends it etc. would cite you.


> Edit: I also don't see why the role of "data gatherer" can't be more prestigious.

I agree wholeheartedly. If datasets are so valuable that lots of people can easily capitalise on them and produce great science, then the creation of those datasets should be rewarded similarly to an extremely valuable paper.

Machine learning seems to be a field where this is going pretty well: people are publishing their models and datasets more, so I can grab a trained model of a huge image-recognition neural net and try it out on my own data.
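
For instance, a minimal present-day sketch of that workflow, assuming the torchvision library (the ResNet-50 choice and "my_photo.jpg" are just illustrative stand-ins):

    import torch
    from PIL import Image
    from torchvision import models

    # Grab a model whose weights were published alongside the paper.
    weights = models.ResNet50_Weights.DEFAULT
    model = models.resnet50(weights=weights)
    model.eval()

    # Published weights ship with the preprocessing they expect.
    preprocess = weights.transforms()

    # Try the shared model on your own data.
    batch = preprocess(Image.open("my_photo.jpg")).unsqueeze(0)
    with torch.no_grad():
        probs = model(batch).softmax(dim=1)

    top = probs.argmax(dim=1).item()
    print(weights.meta["categories"][top], probs[0, top].item())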

> As long as there's a place where you can publish those papers and make the data available it would probably also be a paper that is good for your career by a metric that seems to matter a lot.

I'd personally like to see a shift away from requiring a paper to cite a dataset. But I don't think that really alters your point.

(Disclaimer: I work for Digital Science, which is a parent company of figshare, but this is a personal opinion.)


Oh brother. They are setting science and medicine back by decades. All those conspiracy theorists who won't vaccinate their children because of "big pharma" hiding data can now use this editorial, with some credibility, to say that scientific studies are biased and lack transparency.

The authors of this editorial should be stood down.


Though not a scientist or researcher by any measure, I like to think that I understand both viewpoints... it seems there are two quite distinct positions:

1) Defenders of one of the tenets of the Scientific Method--peer review--and the transparency that that concept demands...

2) Those wary of what the current environment implies might befall those who are totally transparent with both research data and results...

"Research Parisites" are being deemed as synonymous with "Patent Trolls"...

It will be interesting to see how this is sorted... competition is fierce for funding... important fundamental principles, and billions of dollars, are at stake...


Peer review is a tenet of the Scientific Method? I don't think so. Independent replication definitely has been considered so: https://en.wikipedia.org/wiki/Nullius_in_verba

I guess that could be considered a specific type of peer review. Getting useful feedback is always good too, whether related to science or not. Maybe of interest:

http://michaelnielsen.org/blog/three-myths-about-scientific-...


To understand the medical profession point of view, you need to understand that the "material" used for their research is a rare and precious commodity: humans with a particular condition.

Gathering enough patients to do a study on some particular disease is often difficult and the result of a career-long reputation. Doctors usually refer to it as their "recruitment", i.e., patients referred to them because they are the reference in this field - something that these "parasite" researchers would get credit for without having to earn it.

That's why I believe they are so protective with data.


Yes, medical data is expensive and hard to obtain. But that's not (at least in the direct sense) the reason why so many medical researchers want to keep the data to themselves.

The actual reason is that there is a whole cottage industry based on trading access to data for paper co-authorship. If the researchers were to share their data openly, the person who comes after them would be able to just cite the data without having to beg the researcher for access while promising him/her a sweet co-authorship spot in return.


In CS (which is admittedly quite different), one counterbalancing incentive is that citations are at least as important as coauthorships in recent years, and releasing data is one relatively easy way to get a boost in citations. If you just release results, then people only cite your paper if they're doing particularly related work. But if you also release an accompanying dataset, all sorts of random people will cite this paper as the source of the data (if the data is interesting, anyway), even if they don't care about the paper itself.


> The actual reason is that there is a whole cottage industry based on trading access to data for paper co-authorship.

Is that a problem? That's the way it should be. Requiring that the data be shared does not rule out a requirement that the person who created the dataset is automatically a coauthor.


Thanks for sharing a professional insider perspective. But wouldn't any other researcher who obtains the data from doctors who recruited the patients have to give credit to the doctors who shared their data? Isn't that a much better professional reputation to have than the reputation of doctors who refuse to share data?

Just in the last two weeks, the International Committee of Medical Journal Editors proposed new rules[1] about sharing clinical trial data. Commentators think that kind of data sharing is a very good idea.[2] As a reader of medical research who has heard a lot of "war stories" about the medical research process from family members who have observed that process at first hand, it seems to me that having more eyes on each data set is nearly always a good idea.

[1] http://annals.org/article.aspx?articleid=2482115

[2] http://www.statnews.com/pharmalot/2016/01/27/proposal-data-s...


Sharing the data doesn't get you promoted in academia. Writing papers where you repeatedly mine the hand-crafted-over-20-years dataset does. If you give your dataset away for free, you're destroying your only way to advance in academic medicine.

What we need are alternative funding models for academic medicine. With all the negative press and ethical issues from pharma funding the studies, and the NIH cutting their funding, publish or perish is even more important if you want a sliver of a chance to get some of the remaining grant money.


So I guess the concern "that data sharing would require them to commit scarce resources with little direct benefit" is just a pretext then. Obviously, just putting some data up on a website is cheap.


It is cheap, but how do you measure that? There's no impact factor or associated journal publication.

Since simply providing the data can't be measured for tenure or fit into a line item on a CV, academia as a whole assigns it zero value. This then means that simply sharing data is a huge net negative given the high costs of acquiring it.


I am not a doctor. But I come from a family of doctors.

Patient privacy aside, I do agree with you and with the critics of the New England. Science is about reproducible experiments, and in medicine, because of the constraints on the "research material", most studies are on the edges of statistical significance and are run by professionals who aren't trained statisticians.

The only mitigant to that is data sharing.


"But wouldn't any other researcher who obtains the data from doctors who recruited the patients have to give credit to the doctors who shared their data?"

In academia, it's better to be the primary author of a paper. More primary-author publications make it easier to get funding, keep your job, and advance your career. If you share the data that you might have spent a lot of time and money collecting, you will gain either a citation or a non-primary author position. This is worse for your career. I don't like the system. Most academics would prefer to share, to have their work verified or even to see what others might find, but the incentives are to keep it private so that you can squeeze as many primary papers (or papers from your research group) as possible out of a data set before sharing the data.


You're probably very close to correct on this, but that doesn't make it right, nor the moral thing to do.

Also, a lot of patients end up exploited by docs and companies who sell these very expensive samples taken from patients with rare diseases. The patients don't benefit from that, but the docs and medical institutions do. So while we try to "protect" medical research for the sake of doctors, they are not only using their patients for profit but also hindering medical research in the name of those profits.

And yes, I worked for a medical lab where I helped identify and sell patient blood and tumor samples with rare forms of blood cancers. I'm not necessarily against selling patient samples where the patient doesn't benefit, but blindly protecting doctors' claims to research data doesn't tell the whole story.


Just to nitpick it is more important to be the corresponding author. While there is often a fight over who gets to be the primary author (i.e. the junior person who did all the work), it is nothing compared to the fights that go on between the PIs as to who will be the corresponding author. Grants and millions of dollars are at stake in these fights.


The backlash is from physician scientists (among others), though. It's not really "medical" vs "non-medical" so much as it is "closed" vs "open", with physician scientists on both sides.


I'm a clinical research ex-data-minion.

In that research context, it can take teams of tens to hundreds of people years of work to establish a usable dataset which can win grant money to both serve patient populations and reinforce the long-term sustainability of a clinical/academic institution.

You bet your ass that data isn't going to be given out. It's not about covering your ass. It's about stewardship of data which has been culled via consent from participants, often of a very sensitive nature, with specific and explicit limits to how it will be used, managed, and shared.

It's a gold mine that was built from years of getting at the forefront of a niche and then racing year after year to stay there.

The gold from the mine isn't used just to line someone's pockets or add lines to a CV. That gold is often used to provide treatment for people who would otherwise be left to the whims of state legislatures that play political football with Medicaid and with funding for health services for underserved populations.

Also, from my experience, investigators sitting on good datasets often WANT to share them, since sharing establishes increased value and bodes well for future grant competitiveness. Here again... bring your credentials, since you are being made privy to something bordering on sacred. When I see a term like "research parasites," I know who they are talking about, and no, they are not going to be accepted as collaborators. What disqualifies them might be the lack of any facet necessary to carry that data stewardship at or beyond what the originator has done - and that is not something you can easily demonstrate outside of the existing systems.


It's really very simple -- either publicly supported science is meant to serve the common good, in which case total data transparency is required, or publicly supported science is actually meant to be a vehicle for corporate profits, in which case all scientific data is proprietary and those who share research data are cheating stockholders.

All this apart from the most basic principle of scientific philosophy, in which earnest, transparent efforts to find out which theories cannot be falsified have the highest priority.


I was about to make much the same point, only I'd put it this way:

Scientific research is supposed to be a public good in the economic sense[0]. If it isn't done openly and transparently then it isn't one, and it most certainly shouldn't be publicly funded. This manifestly isn't how it works currently - see e.g. world v Elsevier et al - and this is really just a plea to retain the broken status quo.

If researchers want their work to be of benefit solely or primarily to themselves or their paymasters - a violation of the Mertonian norms [1] - they can go work in industry.

Incentives matter, of course, and what's at the root of this is that scientists - no matter how much or how often they may insist otherwise - are just ordinary self-interested human beings.

[0] https://en.m.wikipedia.org/wiki/Public_good [1] https://en.m.wikipedia.org/wiki/Mertonian_norms


You're very perceptive, my friend...


I'm laughing at your comment because it assumes the data is valuable! In reality a significant chunk of it is made up; this is more about people hiding their arses than corporate profits!


> ... because it assumes the data is valuable! In reality a significant chunk of it is made up ...

Yes, but made-up data is also valuable. Were this not true, the world of advertising would collapse.

An earnest economic evaluation has to assess the value of data apart from the issue of whether it accurately reflects reality.


And why would they make up the data? Not saying they don't - Wakefield certainly did.


Millions of reasons, but the problem is that scientists are not accountable - especially if their fabrication agrees with pre-existing intuition.


> ... but the problem is that scientists are not accountable.

Arguable, but if they don't have to release their data, then very true -- they aren't accountable. By requiring scientists to release their data, we make them accountable for any shenanigans that might otherwise be concealed.


This editorial was sparked by a recent proposal by the International Committee of Medical Journal Editors (ICMJE). They are considering requiring that those publishing clinical trial reports must also share the underlying data.

The ICMJE says that authors should have the ability to prevent release of the data for six months after publication. This will prevent researchers from performing independent analyses of a study, until well after its conclusions have been trumpeted in the press. Also, while many papers currently promise data or other materials are "available on request," anyone who has tried to actually get these data knows that this is often just wishful thinking. Only by requiring data availability in an independent repository at publication can journal editors and reviewers ensure that it will be available to other researchers and the public.

The ICMJE is accepting comments on this proposal through April 18:

https://forms.acponline.org/webform/comments-icmje%E2%80%99s...

If you are a patient or a researcher, please comment.


I don't understand the redaction. The title of the journal is partly redacted... but how many words could fit in that blank space, even ignoring word-length analysis? "Review", "Journal", maybe a couple more. And then they also have a link to nejm.org and a seal that reads "New England Journal of Medicine".

Am I missing something? That redaction is... odd.


The original line from the editorial was: "There is concern among some front-line researchers that the system will be taken over by what some researchers have characterized as 'research parasites.' This issue of the Journal offers a product of data sharing that is exactly the opposite. The new investigators arrived on the scene with their own ideas and worked symbiotically, rather than parasitically, with the investigators holding the data, moving the field forward in a way that neither group could have done on its own."

I do not read that as the editor calling researchers parasites. Rather, the editorial is exploring different ways of data sharing and collaboration.

See the NEJM editorial that sparked the controversy at http://www.nejm.org/doi/full/10.1056/NEJMe1516564

The blog post appears to go to some effort to misinterpret the editorial.


How about separating data gathering from scientific analysis? The government would fund data gathering based on what scientists need and make that data publicly available, thus shifting the responsibility for data validity, at least in part, from scientists to dedicated "data gatherers". I think everyone would win from this model.



If anyone's interested I recently blogged / commented about this issue as well: https://betterscience.org/archives/7-Data-Sharing-and-the-Re...


The Partially Derivative guys covered this on their latest podcast.[1] By which I mean they bristled at the thought of being called "parasites".

1. http://www.partiallyderivative.com/news/2016/1/25/episode-38...


tl;dr, the Market has broken Science.


This is not at all obvious; would you care to expand?


The entire editorial is premised on the notion that some "parasites" shouldn't have access to scrutinise the data because it reduces the market incentives.


What market? Markets balance supply and demand via prices or some analogue thereof. I thoroughly and enthusiastically agree that incentives are the basic problem, but incentives != markets.


The "price" is reputation, and your name on a paper. It's a market of influence and prestige!


And that balances supply and demand how? I understand where you're coming from, but "influence and prestige" - unless they are traded - are not market goods. To preempt: competition based on status signals also != markets.

Though what the NEJM authors seem to desire is more akin to a market - "I'll trade you this data for the prestige of a co-authorship" - this is not "markets broke science"; this is "scientists believe an explicit market structure will maximise their utility".

These seem to me to be radically different statements, though this may be a matter of perspective.


We need government intervention. The scientific community has shown itself unable to produce reproducible results, and unwilling to share raw data. If the NIH demands that PubMed index our work, it should likewise demand that all the results are published.



