"Confirmatory labs would be less dependent on positive results than the original researchers, a situation that should promote the publication of null and negative results. They would be rewarded by authorship on published papers, service fees, or both. They would also be more motivated to build a reputation for quality and competence than to achieve a particular finding."
Sounds great, but how would this actually work? Nobody is going to get juicy grants from existing funding agencies for being a "confirmatory" lab. Nature sure as hell isn't going to pay for this. Most researchers probably can't afford to pay an outside lab to duplicate their research. Is Nature going to suddenly start refusing papers whose results haven't been reproduced elsewhere? That's basically suicide for their journal, because researchers are frequently racing other researchers to publish first. Why publish with a journal that requires you to double your budget to pay a confirmatory lab and then wait months or years for them to do the job? The pressure to publish elsewhere first will be intense.
I have a simpler solution.
Don't just slap the names of confirmatory lab authors onto other papers. Publish original papers, and publish confirmatory papers with the same prominence as the originals. Hell, devote a portion of Nature to doing just that. Currently, if you want to publish a paper confirming someone else's original findings, not even a third-rate journal will touch it unless you put some kind of novel-sounding spin on it. Nature should use all that scummy impact-factor gaming they do to make confirmatory papers respectable. Only when the work of reproducing results earns labs respect will funding agencies start supporting "confirmation labs". At present, such "unoriginal", "hack" work is not respected at all, and Nature is a big part of the reason why.
> Most researchers probably can't afford to pay an outside lab to duplicate their research.
Even if they could, we probably don't want researchers paying to have their own results duplicated. This would create perverse incentives, similar to what happened with investment banks and credit rating agencies. If the original researchers must get their results confirmed in order to publish, and they are the ones paying for the confirmation, they will naturally tend to choose confirmatory labs that are more likely to confirm their findings. Since those labs would then rely on the researchers for funding, they would be under pressure to adapt their methodologies in ways that make confirmation more likely (even when the original study may not warrant it).
We want confirmatory labs to have no stake in either confirming or disproving a particular study, only an interest in improving the overall quality of research.
Since a journal's reputation depends (at least in part) on the quality of the research it publishes, journals would seem to be the natural source of funding for confirmatory labs. Whether they'd actually be willing to do it is another matter...
Any confirmatory lab would have to be licensed in order to get grant money for the work, just like the CPAs who perform audits. Sure, there is some corruption and drift toward hiring more lenient firms, but it basically works.
Side note: it is weird to me that everyone talks about whether researchers can afford to pay for confirmation, but researchers never pay for anything; grants pay for everything. The granting institutions might even be excited to try a confirmation process.
That won't get you a PhD, though. PhD programs generally require original research, and PhD students are usually the ones doing those experiments.
You're right, this would really alter the dynamics of a PhD. As is, it takes ~2-3 years from the end of the experiments and the start of writing until the article is actually released in print. Granted, this varies tremendously by field, but let's just call it ~2.5 years of 'publication hell' on average for an article. If you add the need for confirmation, you are adding not only the time it takes to re-run the experiments, but also the time it takes to write them up and go through another, shorter round of 'publication hell'. Let's call that 6 more months of experiments and 6 more months of publication hell. Then you have ~3.5 years of sitting on the data after you last touched the bench for that experiment.
Many PhD programs have requirements for the number of first-author papers you must publish to get your degree. I know of one that requires 3 first-author papers to graduate, which is admittedly a little high. Running with these very loosey-goosey numbers: say your experiments work well and take only 6 months to perform. That's a total of 4 years just for the first paper to get out, no sooner. While you are working on the drafts, you also perform another experiment immediately after finishing the first one; let's call it 9 months, to account for the added time it takes to get the first one typed up. Now we are at 1 year and 3 months with 2 successful experiments and some progress toward getting the first one published. Do that again, and you are at 2 full years with some progress on your first and second papers. That means that, with the new confirmation studies to do, the earliest you can get out of grad school is ~5.5 years in. That is not too bad, right?
Now what happens when that first study was not repeatable and you have to start over? You have to do another experiment, starting the ~3.5-year process all over again in your 4th year. Now we are at 7.5 years total. What happens when the last experiment is non-repeatable? You have to do another experiment starting at year 5.5, for a total of 9 years in grad school, minimum. And yes, you can be trying for 4th and 5th papers this whole time, shortening your time in school, but each of those also has some chance of failure.
And this is only if you are running experiments all day, each one (repeats and new studies included) takes only 6 months to do successfully, nothing fails on you more than once, and the redo of the last experiment succeeds.
Yes, these numbers are a bit out-of-the-hat, but I think we can all see that either the length of stay in grad school must increase by a lot, or the required number of papers must decrease by a lot, if papers are to be required at all. Either way, you are not getting much education while some lab elsewhere (maybe your competitors) sits there trying to replicate your experiments, with unknown effort and unknown success, and you are just waiting and praying that you don't get screwed into a decade of grad school.
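The timeline arithmetic above is easy to lose track of, so here is a minimal Python sketch that replays it using only the thread's own rough guesses: 2.5 years of publication delay, 6 months each for the confirmation experiments and the confirmation write-up, 6 months for the first experiment and 9 months for each later one. The function names are hypothetical. Like the commenter, the sketch assumes a failed confirmation is discovered at the paper's would-be publication date and that the redo simply goes through the 3.5-year pipeline again, with no extra bench time counted.

```python
# Back-of-the-envelope PhD timeline under the commenter's assumptions.
# Every figure below is a rough guess from the thread, not measured data.
PUB_DELAY = 2.5            # years of "publication hell" for the original paper
CONFIRM_EXPERIMENTS = 0.5  # years for an outside lab to re-run the experiments
CONFIRM_PUB = 0.5          # years of extra review for the confirmation
PIPELINE = PUB_DELAY + CONFIRM_EXPERIMENTS + CONFIRM_PUB  # 3.5 years


def experiment_end_times(n_papers, first=0.5, later=0.75):
    """Years at which each paper's experiments finish: the first takes
    6 months, each later one 9 months (bench time plus drafting overhead)."""
    times, t = [], 0.0
    for i in range(n_papers):
        t += first if i == 0 else later
        times.append(t)
    return times


def graduation_year(n_papers=3, failed=()):
    """Year the last required paper is out. A paper whose confirmation fails
    is restarted when the failure becomes known (its would-be publication
    date) and runs through the full pipeline once more; like the commenter,
    no additional bench time is added for the redo."""
    done = []
    for i, end in enumerate(experiment_end_times(n_papers)):
        out = end + PIPELINE
        if i in failed:
            out += PIPELINE
        done.append(out)
    return max(done)


print(graduation_year())               # 5.5
print(graduation_year(failed={0}))     # 7.5
print(graduation_year(failed={0, 2}))  # 9.0
```

Changing `failed` shows how a single non-replicating result pushes graduation out by years; a paper failing confirmation twice, which the commenter sets aside, is not modeled.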
"You must include a confirmatory study by an independent lab in order to publish this research, BUT we will give you an accept/reject decision on your paper prior to doing that study."
That way, there's much less bias in the confirmatory results, since the paper gets published either way. And if the paper would be rejected even after a successful confirmation, doing the confirmation first would be a ton of wasted effort and money, which is a major disincentive to trying at all.
As it should be, science is about experiments and predictive results, not outcomes being "desirable". Let's incentivise that.
I think there's a broader problem, though: the rigid coupling between a) whether you've published, b) whether you've done something worthwhile, and c) the minimal size of a publishable unit.
This forces scientists to "repurpose" the journal article to signal a lot more than "hey, we did this and something interesting is going on." Anyone who contributed wants their name on it, and the publication is viewed as a reward rather than a test of an idea's merit (as in the "we promise to publish even if you're wrong" approach you're describing).
I think it would be better if we had some fine-grained system for documenting everyone's contribution, like a git DAG. Then you could separate the question of "Alice collected this data" from "Bob proposed a great hypothesis about it" and "Charlie tried to replicate it" and "so-and-so has something publishable".
The lay public thinks "peer reviewed" means that others have tried it and validated the results. What it really tends to mean is that a peer looked at the procedures and results and that it passes the "sniff test" and generally doesn't have any glaring errors.
The more subtle problem is that in some circles, it isn't even that. Since fewer and fewer people want to be the person who damaged someone else's work and/or career, it's a blanket pass.
We're drifting away from scientific study and critical thinking to "reasonable" approaches and not upsetting doctrine and/or your superiors. That looks less and less like science and more like religion.
Here's a leading scientist's description of peer review:
Peer review works superbly to separate valid science from nonsense, or, in [Thomas] Kuhnian terms, to ensure that the current paradigm has been respected. It works less well as a means of choosing between competing valid ideas, in part because the peer doing the reviewing is often a competitor for the same resources ... sought by the authors. It works very poorly in catching cheating or fraud, because all scientists are socialized to believe that even their toughest competitor is rigorously honest in the reporting of scientific results ... It certainly does not ensure that the work has been fully vetted in terms of the data analysis and the proper application of research methods.
From: Reference Manual on Scientific Evidence [for U.S. federal judges], Third Edition; How Science Works section by David Goodstein, Caltech Physics Professor and former Provost; published by National Academies Press (2011)
Peer review is usually quite a good way to identify valid science. Of course, a referee will occasionally fail to appreciate a truly visionary or revolutionary idea, but by and large, peer review works pretty well so long as scientific validity is the only issue at stake. However, it is not at all suited to arbitrate an intense competition for research funds or for editorial space in prestigious journals. There are many reasons for this, not the least being the fact that the referees have an obvious conflict of interest, since they are themselves competitors for the same resources. This point seems to be another one of those relativistic anomalies, obvious to any outside observer, but invisible to those of us who are falling into the black hole. It would take impossibly high ethical standards for referees to avoid taking advantage of their privileged anonymity to advance their own interests, but as time goes on, more and more referees have their ethical standards eroded as a consequence of having themselves been victimized by unfair reviews when they were authors. Peer review is thus one among many examples of practices that were well suited to the time of exponential expansion, but will become increasingly dysfunctional in the difficult future we face.
This has not been even close to true in my experience. Reviews are usually unsigned, so there's very little social pressure to "let things slide". On the other hand, a non-trivial number of reviewers seem to think the review process is an opportunity to "rough up the competition" instead of an opportunity to offer constructive feedback.
That doesn't rule out that it's a blanket pass for some papers. How about if we phrase it this way:
In some cases, peer review works correctly with constructive feedback. In some cases, peer review is abused to "rough up the competition" and slow down the progress of science. In yet more cases, peer review is a blanket pass when the correct actors all align on that paper. There is no way to determine which of these cases apply to any particular paper.
Or put another way, all of these experiences can be true and correct at the same time.
I'm willing to believe that the skids could be greased for some papers, based on the trendiness of the topic or the authors' reputations. I'm also willing to believe that this is hard for people without the relevant expertise to detect.
However, I think describing peer review, generally, as "a blanket pass" is going much too far. If anything, I wish it were harder on actual methodological errors while being much, much more permissive of (openly-disclosed) ambiguities in the data or gaps in the theories. Right now, people tend to 'write around' issues in their data, lest a reviewer argue that this invalidates the entire experiment. Looking at papers from the 1980s and 1990s, it's amazing to me how much more frank the authors were.
Sorry! I'm not trying to, but I'm also not sure how else to interpret them. As I said, I'm willing to believe that papers occasionally slip through the cracks (the arsenic life thing from 2011 could have been caught in review, for example), but the idea that a decent number of papers just slide through the peer review process is totally unlike what I've experienced with my own papers and my friends' and colleagues' papers.
> What it really tends to mean is that a peer looked at the procedures and results and that it passes the "sniff test" and generally doesn't have any glaring errors.
It sometimes means that.
But there are studies that fail the smell test like a refuse heap and still somehow pass a "rigorous" peer-review process.
Remember when George Ricaurte, who by the way was already pretty obviously a charlatan ONDCP whore at the time, injected baboons with what he said was a normal dose of MDMA (2 mg/kg) and found severe neurotoxicity? [0]
Yeah, well, two of the five baboons died. I remember the very day that study was published, in effing Science. It was all over the news, including the front page of the NYT.
But plenty of us in the drug policy reform movement (and, for that matter, those of us who had used MDMA a few times) knew immediately (and said so) that this study was obviously flawed because, well, people don't die from a normal dose of MDMA. Sure enough, it later turned out that Ricaurte had injected those poor baboons with a 2 mg/kg dose of methamphetamine, not MDMA. He said that there had been a "labeling error," which his supplier denied.
There are examples like this every day.
The peer review process is only as good as the political will toward righteous honesty: the state has muscled out-and-out deceit through this system often enough to make any thinking person doubt its capacity even as an effective "sniff test."
> The lay public thinks "peer reviewed" means that others have tried it and validated the results. What it really tends to mean is that a peer looked at the procedures and results and that it passes the "sniff test" and generally doesn't have any glaring errors.
> The more subtle problem is that in some circles, it isn't even that. Since fewer and fewer people want to be the person who damaged someone else's work and/or career, it's a blanket pass.
From my experience in the biomedical review process, I would characterize the process as brutal, at least for top venues and federal grants.
> We're drifting away from scientific study and critical thinking to "reasonable" approaches and not upsetting doctrine and/or your superiors. That looks less and less like science and more like religion.
I mostly agree that there is friction with established doctrine/superiors, but hasn't this always been there? It seems hard to find a major scientific discovery that didn't have some established concept (and proponents) to push against.
Conveniently enough, that topic was discussed here a while back:
> They show that the premature deaths of elite scientists affect the dynamics of scientific discovery. Following such deaths, scientists who were not collaborators with the deceased stars become more visible, and they advance novel ideas through increased publications within the field of the deceased star.
I reject the idea of a religion-like science. I would say it has become what it is now because of the economic lens society has adopted to manage it, rather than because of irrational thinking.
Apparently, science production doesn't scale well, because scientists, when asked to compete for their livelihood, find it easier to fool their managers than to produce legit science.
Said one Chinese scientist under Mao or soon after:
The Academy of Sciences is the Academy of Sciences ... It is not the Academy of Production. It is a place where one studies, not a place where one plants cabbages. It is not a potato patch, it is a place where one does science ...
You challenge my point of "not upsetting doctrine and/or your superiors" by saying some scientists find it "easier to fool their managers than to produce legit science."
In my reading, pyrale was agreeing with you, but instead of attributing the problem to some vague social effect, was putting the blame specifically on the way we fund science.
Isn't the approach different in different fields? I remember reading about recent advanced papers in mathematics that were published but then left "in the void" for a while, because it took so long for peers to actually read, understand and try to challenge the proofs.
And when the proofs become terabytes of data produced by a program, and the reviewer has to write another program just to verify that the proof is sound, it's going to become intractable.
Maybe we should encourage it financially. Like bug bounties.
I have an idea: if a research study doesn't go the way you thought it would, put it out there.
We need a central repository like arXiv where we dump the experiments that didn't work out, so that we can quickly compare a "successful" one to the ones done before. That gives us a better idea of whether the data is just a fluke.
The papers wouldn't have to be super involved. What did you do? What were the statistical conclusions? Give an upper-level undergraduate or an early master's student some experience writing up a procedure. It shouldn't take more than a couple of hours, but it could save a lot of time dealing with publication bias.
Writing a good technical blog post takes 10+ hours. Writing a complete academic paper with references, related work, methods, graphics, etc. can take days, weeks, or even months (e.g., for thesis work).
It's not just that negative results largely go unvalued by the academic status quo... it's also often just not worth the effort to write them up beyond simple documentation in your lab notebook. As a researcher, the lasting impact that we care about is (simply put) in the positive results, and in being the first to achieve them.
Anecdotally, researchers generally won't do *exhaustive* literature reviews of less reputable sources for negative results before attempting reproduction. At the very least, you'd need better organizational structures to make finding negative results easier, something other than citations alone. Maybe this could be a section on each Nature paper with links to positive and negative reproduction notes?
The focus shouldn't be on positive results, the focus should be on publishing results. A correctly incentivized researcher would strive to publish, and strive to improve knowledge in their field.
What prevents scientists from publishing several failures and arbitrarily jumping ship on any experiment, because it's faster to fail than it is to succeed?
The reason to publish failure, currently, is to challenge accepted knowledge. This can be very good for one's career. This is also done, to a lesser degree, for assumed knowledge.
But I think you'd be able to control for random failure publishing if some rigor (i.e. why did you do this) was involved.
I really don't know how incentives would be changed to allow for this though.
Currently the market (i.e. citations and novelty) provides the incentives. It is gamed, but I think ongoing discussion of the problems does mitigate that over time. Like everything in academia, it takes a long time.
If you never get a positive result then you're probably just not good at science. Also, I can't imagine any scientist who gets up and tries to fail every day.
However, they're still advancing knowledge. Just because your experiment doesn't work doesn't mean it wasn't worthwhile.
If you publish your failure then future researchers don't have to waste their time walking down the same path.
> If you never get a positive result then you're probably just not good at science. Also I can't imagine any scientists who gets up and tries to fail everyday.
Science is this: Try to disprove (i.e. fail) your hypothesis. If you've failed sufficiently, then you have something that can be published.
> Just because your experiment doesn't work doesn't mean it wasn't worthwhile.
If experiments don't work then you need to redo them to figure out why they didn't work. An experiment with a negative result is not an experiment that didn't work. It worked, but gave you an answer you did not expect. Verify.
Every scientist I know (and I know a lot) and have interacted with understands that a negative result is not a worthwhile result.
> If you publish your failure then future researchers don't have to waste their time walking down the same path.
Scientists often publish negative results. This is mostly done if it disproves some other published work that you do not believe, through some experience of your own, and want to test with a different test or a more rigorous test. But if it does not challenge something that's accepted, it's unlikely to get acknowledged and is therefore of low value to the lab and to the scientist that is publishing. They cannot afford to put that above their own career success.
And if they did, they would find it harder and harder to get grant money and would therefore fail in academia completely. You cannot (except in extreme edge cases) contribute to science without money.
Just publishing the raw data and the methods section would be enough, imo. Alas, getting scientists to publish the data sets behind their research is surprisingly hard, for various reasons.
Publishing poorly-written crap will make you look worse than publishing nothing at all.
There's already a terrible stigma around negative results. Grant committees and tenure committees would subconsciously like to believe that you are not a scientist who needs to test hypotheses, but a clairvoyant superhero who only has correct hypotheses. Yes, this is ridiculous and anti-scientific.
> There's already a terrible stigma around negative results. Grant committees and tenure committees would subconsciously like to believe that you are not a scientist who needs to test hypotheses, but a clairvoyant superhero who only has correct hypotheses. Yes, this is ridiculous and anti-scientific.
this is exactly why an archive of these sorts of results is necessary. so that we can see that this is part of the process, de-stigmatize it, and use those findings to make more progress and design better experiments to gain further insight into the world.
replication and null results are an important part of the scientific process and building up our overall store of knowledge. it can't all be sexy groundbreaking proofs of new hypotheses.
Everyone would have to agree to publish their rough notes (on the web). If this was the norm, then no one would be singled out for criticism. Of course, norms would have to change.
The article is calling for demanding more effort to get a paper published. That would be a change of norms. There's no real way to solve the reproducibility problem without changing norms.
Publishing notes seems easiest. It would be like "open source" research.
Not going to happen. You don't understand the forces at play when biomedical researchers (I'm one too) do the work. A major reason, more specific to this type of researcher, for holding back data is that it might be repurposed for later goals. It's capital. Acquiring the types of negative data the article talks about (mouse experiments) takes big dollars. I'm not going to throw it away in an archive. It's like donating some really expensive clothes to a Goodwill bin after you saved up to buy them, only to realize after buying them that they're the wrong size. You're going to try to make them work, often for years, before you give up on them.
If you take issue with my comment and have ever worked for a startup that didn't pan out (should be many of us here on HN), think of the scenario where someone tells you, "hey, why don't you just be a good person and open source your app and give all your customers to XYZ?" (Setting aside the armchair-quarterback guilt imposed on you.) There are a few who do that, but if you've spent any length of time on your business, you're going to think of other ways to repurpose the investment you made before you just go dumping it in an archive like GitHub or whatever.
I think the notion that negative outcomes are capital is a good one. And if the research involves public money, then we have a right to see them, so that we are not spending dwindling public money on unnecessary redundancy (as opposed to productive or confirmatory kinds).
The second point, about open-sourcing tech that does not achieve its goal, is also a good one. People who do that with great docs etc. get a lot of respect. People who dump badly done work do not.
Businesses that fail but care about their customers often "donate" them to a competitor.
I think these ideas are quite valid and can and should be realized more often.
> People who do that with great docs etc get a lot of respect.
That's the issue. Publishing run-of-the-mill negative results right now doesn't get you much of anything.
I would love to live in a world where hiring committees and funding agencies say "Well, 3/5 of his experiments flopped, but we looked at the papers and they were all carefully thought out and well-run. Hire him/give him some cash!" Alas, I don't.
The article even says, "We trust that reviewers and tenure committees will find appropriate ways to credit papers that include confirmation." The issue is that this needs to come first for anything to happen.
There seems to be a lot of sharing amongst artificial intelligence researchers nowadays and it seems to be accelerating innovation.
Sharing of medical research could potentially accelerate innovation that could save a lot of lives (including those of the researchers themselves and their loved ones).
It's curious that these industries are so different in this respect.
What would have to happen in order to increase the amount of sharing of medical research?
I would say this is more illusory than real. In areas with intense private R&D competition, such as AI and databases, it is common to regularly publish research that makes you appear relevant without publishing any of the internal research that actually matters. The incentives are aligned to encourage this.
It is unfortunate but I have seen it first-hand in a few interesting computer science research areas, and academia simply doesn't have the resources to keep pace with the output that happens under NDA at many private research labs.
Negative results are one of the biggest things missing from publications. I highly suggest the book "The Antidote" by Burkeman. Several chapters cover our current Western view of failure and how our attempts to distance ourselves from it can hinder research, product development and even personal development.
I think there was even an article posted here a few months back on how the lack of negative results in science really hurts the community. People may work on something for half a year, consider it a failure, and just dump it; and other people go down the same exact path with the same methods. (If you publish a negative result, someone may pick it up and ask, "I wonder if they tried x or y" and attempt the experiment again).
How can an experiment 'not work out'? Do you mean a negative result? Not getting evidence for your hypothesis is not 'not working'. That's a crazy way to approach science. It is more information to adjust your hypothesis. Or do you mean a failure such as broken equipment or an infected sample meaning you have no data? Well then what would you put in the paper?
Universities are charging hundreds of thousands of dollars in tuition, paying adjunct professors a pittance to teach classes, and researchers are dependent on grant funding for their research. Where is the money going?
> How can an experiment 'not work out'? Do you mean a negative result? Not getting evidence for your hypothesis is not 'not working'. That's a crazy way to approach science.
yes, i think this is exactly what StClair means. and i agree (and i'd guess that StClair would agree) that this is "a crazy way to approach science." i think that's the point, that far too many null results get thrown out, when those are actually useful data points for the wider world to see.
I don't understand - experiments only work if they prove your assumptions? If your assumption was wrong then the experiment is working if it won't confirm it.
Or your experiment was faulty. Or the effect was too small to show up in your experiment, or any of many things that cause an experiment to be inconclusive. Getting a negative result with high confidence is real work. Failing to get a positive result is not the same thing.
"Experiment A is based on the assumption that Y is true. Halfway through doing A, it's quite clear that Y is at the very least not true. Thus, Experiment A doesn't work."
Most people I know who have experiments not work out mean something like that - there's something not successful about the experiment itself, not just that they didn't get the answer they were hoping for.
a friend of mine (who just finished a clinical psych PhD) had exactly this idea a couple years ago. a null results archive, so that experiments that don't "work out" don't just get thrown away.
The obvious question is "who's going to do the confirmation work?"
I think that master's/bachelor's students should be able to handle that work. A new grant mechanism, training grants for master's/bachelor's students that fund replication, would get the job done with a lot of nice side effects.
As a Middle American, I'm actually serious when I say this could be a way for Middle America to get back on its feet. I'm a biomedical researcher at a university hospital, and trainees of all types have proliferated there. What's holding back the basic research side? It's an assembly line, much like the one the Midwest knows well from the automobile industry. However, universities aren't good businesses, and what's missing is big pharma companies paying employees fair wages to do this work. Then again, companies won't save the day either, because putting people to work will mean they won't get sick, which is bad for business. So, oh well, on to the next approach.
The article refers chiefly to repeating mouse or rat experiments.
The obstacle there isn't what level of training researcher has (as long as it's sufficient), but who is going to pay for it.
At the scale proposed (a 6-fold greater number of mice per experiment than is usual) the cost of testing only the core hypothesis is easily over $100K. In addition there is the time involved, which can be from months to years, depending on the experiment.
I don't think they're proposing to scale up to that level, which would indeed result in excessive costs.
Even scaling up mouse experiments would be quite costly, and beyond the size of most research grants. There would have to be new funding mechanisms in place for work like this.
The article proposes that this would be more economical in the long run, and so the NIH et al. should be in its favor. Perhaps they will create a new facility for just such work? Though that seems unlikely in the current funding climate.
Also, the limited supply of researchers and facilities wiling to put up with death threats, breakins, firebombings, being beaten up on the street by groups of masked assailants...
No thanks. Papers involving animals are already backbreakingly slow compared with cell-based or in vitro work. I know because I've been lapped by my colleagues using more simple systems as I slog through our paper we got rejected from Nature because the reviewers suggested another 3 years worth of experiments. Yep, year 5 into this single project, which we knew the outcome for 4 years ago. Not excited about this proposal at all.
Look I'm all for rigor but how about the people trying to make money off the deal pay for all the work and keep people like me out of it. Or don't allow the people trying to make money interpret the results of such preliminary studies so liberally. It's like the education system. Scientists like teachers, both of which don't make much money and do all the labor, don't want more hoops jump through.
Sorry, which parts of the proposal are you responding to? The article is more specific than "more rigor". Are you objecting to the stricter p-value threshold? The independent confirmation?
The author argues that a single, higher-quality confirmatory experiment can replace gathering lots of statistics for the exploratory experiments:
> Unlike clinical studies, most preclinical research papers describe a long chain of experiments, all incrementally building support for the same hypothesis. Such papers often include more than a dozen separate in vitro and animal experiments, with each one required to reach statistical significance. We argue that, as long as there is a final, impeccable study that confirms the hypothesis, the earlier experiments in this chain do not need to be held to the same rigid statistical standard.
> one that incorporates an independent, statistically rigorous confirmation of a researcher's central hypothesis. We call this large confirmatory study a preclinical trial. These would be more formal and rigorous than the typical preclinical testing conducted in academic labs, and would adopt many practices of a clinical trial.
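The quoted argument can be made concrete with a back-of-the-envelope sketch: if a paper chains k independent experiments and each has power p against a real effect, the chance that every one reaches significance is p^k. Independence, equal power across experiments, and the example numbers are illustrative assumptions, not figures from the article.

```python
def prob_all_significant(power: float, k: int) -> float:
    """Chance that k independent experiments, each with the given power
    against a real effect, all reach significance."""
    return power ** k


def required_power_each(target: float, k: int) -> float:
    """Per-experiment power needed so that all k independent experiments
    succeed together with probability `target`."""
    return target ** (1 / k)


if __name__ == "__main__":
    k = 12  # "more than a dozen" experiments, per the quoted passage
    print(f"P(all {k} significant at 80% power each) = {prob_all_significant(0.8, k):.3f}")
    print(f"Power needed per experiment for 80% overall = {required_power_each(0.8, k):.3f}")
```

With 12 experiments at 80% power each, only about 7% of papers about true effects would clear every hurdle honestly; keeping an 80% chance overall would require roughly 98% power per experiment, which is one way to see why a single well-powered confirmatory study is attractive.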
As you can see above and from your quotation (and like many other folks who come in to save the day), this article is heavy on plans and short on who is going to do the work. Of course I support papers where every single experiment doesn't have to play p < 0.05 games, but other parts of the article wander in other directions. That's all I'm reacting to.
A higher threshold for publication can be imposed without detailing who does what work. You might argue that this means less research will get produced, but it's probably worth it. Practitioners underestimate the difficulty of transferring knowledge to outsiders because of frictions due to trust, clarity, and tacit knowledge.
When you call for everyone to scale their experiments up by sixfold, I think you also need to consider the logistics of doing that. I'm totally in favor of better, more rigorous experiments, but I know that we couldn't afford the time, space, or gear needed to do that right now.
I don't think they want this to apply to all papers, just a certain class of papers. But maybe you have already considered that (since your reviewers are obviously already pinging you about more testing) and you still disagree?
I don't think this is a good idea because it would increase the politicking in scientific publication. Specifically, no one is going to want to do the reproduction work, so reproduction work will be seen as a favor from one scientist to another. Moreover, in specialized fields, scientists are just as likely to see each other as competitors as collaborators. I strongly suspect there would be a lot of gamesmanship, with scientists refusing to do reproduction work (or dragging their feet on it) for new studies that threaten to disrupt the status quo that has made them successful.
I would absolutely kill to get a job doing replication.
What I always hated about science was that things weren't allowed not to work out. Even if you find something directly opposed to your hypothesis, you are somehow supposed to pretend that it worked out "just as planned".
It's toxic, boring and leads to bad science.
And so, for me, I would absolutely love to be in a place where I got to run well-powered studies and aim to just figure out the right answer rather than build my career on a bunch of unrepeatable statistical flukes.
That being said, my PhD is in Psychology, so they probably won't be hiring me to run animal-model studies.
I really like this idea, as long as Nature puts its space where its mouth is (which it won't, as it runs at least one of these articles per year and they don't appear to have made any impact).
Pretty much this. I cannot help but see a replication requirement like this turning things even more political. Young New Investigator's Lab, who has very little to trade, is going to struggle, while someone who can conjure a postdoc out of thin air for someone's student will probably be able to find someone.
Just to note, there's a tradeoff here - not publishing work until you are massively certain of it would also cause real life harm. Reducing the p value doesn't automatically reduce harm.
Physicists require extremely low values before confirming a discovery has been made, but that's different from requiring it before publishing.
The problem is with people interpreting published work as if, once it's published, it's completely certain.
Maybe each publication should come with a headline 'confidence' stat beside the title. I guess this is a step in that direction.
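One possible form for such a headline "confidence" stat is a z-score, the "sigma" that physicists quote. Here is a minimal sketch of converting between p-values and sigmas, assuming a standard normal test statistic; the function names are hypothetical. The particle-physics 5σ discovery convention is usually stated as a one-sided tail probability of about 3 × 10⁻⁷.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def sigma_to_p(sigma: float, two_sided: bool = False) -> float:
    """Tail probability beyond `sigma` standard deviations."""
    tail = _STD_NORMAL.cdf(-sigma)  # upper tail, obtained via symmetry
    return 2 * tail if two_sided else tail


def p_to_sigma(p: float, two_sided: bool = False) -> float:
    """Number of standard deviations corresponding to a p-value."""
    tail = p / 2 if two_sided else p
    return -_STD_NORMAL.inv_cdf(tail)


if __name__ == "__main__":
    print(f"5 sigma, one-sided: p = {sigma_to_p(5):.2e}")
    for p in (0.05, 0.06, 0.01):
        print(f"two-sided p = {p}: {p_to_sigma(p, two_sided=True):.2f} sigma")
```

For example, a two-sided p = 0.05 is about 1.96σ and p = 0.06 about 1.88σ, while 5σ corresponds to a one-sided p of roughly 2.9 × 10⁻⁷.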
I'd argue the fetishization of a specific p-value threshold, be it p<0.05, p<0.01, or even p<0.0001, is a much bigger problem. There is an excellent quote from Rosnow and Rosenthal:
> “[D]ichotomous significance testing has no ontological basis. That is, we want to underscore that, surely, God loves the .06 nearly as much as the .05. Can there be any doubt that God views the strength of evidence for or against the null as a fairly continuous function of the magnitude of p?”
Wouldn't you prefer a few correct but tentative studies that hint at an effect while paving the way for larger, more expensive replications to a scenario where the data is sliced, diced, and tortured to hit some arbitrary p-value threshold?
The problem here is that a single paper is often the product of months if not years of work. Saying "now add more work" without more grant money is going to be difficult to swallow. Even worse, departments would need to hire even more PhDs who are unemployable after they graduate.
Pre-registration is nice, and larger samples/greater power are necessary. Tightening the p-value threshold may indirectly filter out some false positives, but it kind of misses the underlying issue of p-hacking, some of which would be solved by pre-registration.
The authors' suggestions are preventative in nature, but what I would like to see above all else is a requirement that researchers publish the raw data and make their statistical analyses minimally reproducible, something which could be satisfied by publishing scripts or Excel macros along with instructions for any non-automated data stitching. Experiments frequently implode at the analysis phase, which then gets intentionally or unintentionally masked in ambiguous, poorly written methods sections. Giving others access to the data allows errors to be spotted sooner after publication, and lets alternative hypotheses and analyses be tested against the published results. It's also sometimes the only way of spotting abnormalities resulting from the data collection process itself. Again, this is not a means of preventing errors, but a low-friction way of discovering them. Maybe having everything in the open would light a fire under some researchers to be more thorough, though.
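A minimal simulation of why slicing and dicing inflates false positives: if there is no real effect and an analyst tries k independent analyses, declaring success if any reaches p < α, the false-positive rate becomes 1 − (1 − α)^k. The sketch assumes independent tests with p-values uniformly distributed under the null; real forking analyses are usually positively correlated, which typically inflates the rate somewhat less, so treat this as an upper-end illustration.

```python
import random


def analytic_false_positive_rate(alpha: float, k: int) -> float:
    """Probability that at least one of k independent null tests has p < alpha."""
    return 1 - (1 - alpha) ** k


def simulated_false_positive_rate(alpha: float, k: int, trials: int = 100_000,
                                  seed: int = 0) -> float:
    """Monte Carlo check: under a true null, a continuous test's p-value is
    Uniform(0, 1), so we draw k p-values per 'study' and count studies where
    the smallest one clears the threshold."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    hits = 0
    for _ in range(trials):
        if min(rng.random() for _ in range(k)) < alpha:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        print(k, round(analytic_false_positive_rate(0.05, k), 3),
              simulated_false_positive_rate(0.05, k, trials=20_000))
```

With α = 0.05 and 10 tries, roughly 40% of studies of a nonexistent effect would produce at least one "significant" result.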
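As an illustration of what a "minimally reproducible" analysis script might look like, here is a sketch that reads raw data from a CSV and runs a two-sided permutation test on the difference in group means with a fixed random seed, so anyone rerunning it on the same file gets the identical p-value. The column names `group` and `value`, the seed, and the permutation count are hypothetical choices, and a permutation test is just one reasonable analysis among many.

```python
import csv
import random
import sys
from statistics import mean


def load_groups(path: str) -> dict[str, list[float]]:
    """Read raw data from a CSV with one row per animal and (hypothetical)
    columns 'group' and 'value'."""
    groups: dict[str, list[float]] = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            groups.setdefault(row["group"], []).append(float(row["value"]))
    return groups


def permutation_test(a: list[float], b: list[float], n_perm: int = 10_000,
                     seed: int = 12345) -> float:
    """Two-sided permutation test on the difference in group means.

    The fixed seed makes the reported p-value exactly reproducible.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (extreme + 1) / (n_perm + 1)


if __name__ == "__main__" and len(sys.argv) > 1:
    groups = load_groups(sys.argv[1])
    (name_a, a), (name_b, b) = sorted(groups.items())[:2]
    print(f"{name_a} vs {name_b}: permutation p = {permutation_test(a, b):.4f}")
```

It would be run as, e.g., `python analysis.py raw_data.csv` (hypothetical file names); publishing the script, the seed, and the raw file together is what lets a reader reproduce the reported number exactly.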
Again, statistics applied to partially observable and partially understood phenomena yields nonsense. If not all the variables are controlled, or not all possible causes have been taken into account, the result will be a mere aggregation of observations.
What's true for coins and dice does not apply to partially observable environments with multiple causes and as-yet-unknown control mechanisms.
Statistics is not applicable to imaginary models based on unproven assumptions or premises.
Quote: "Our proposal is a new type of paper for animal studies of disease therapies or preventions: one that incorporates an independent, statistically rigorous confirmation of a researcher's central hypothesis."
This probably won't happen right away, but it's a terrific and necessary idea that we need in order to move forward. It will revolutionize biology and medicine, and it will end the field of psychology as we know it.
It might make it better, but we're nowhere near being able to describe (e.g.) group behavior from first principles or ion channel kinetics. Despite your link, there is a lot of solid psych research. There are (obviously) discredited theories and cranks too, but psychologists characterized rods and cones way before the biologists found them, for one example.
> Despite your link, there is a lot of solid psych research.
Yes, solid, but lacking the dimension of falsifiable theories about the mind. Tax accounting is also solid research.
> ... but psychologists characterized rods and cones way before the biologists found them, for one example.
Those weren't psychological studies. Psychology is study of the mind and behavior. Rods and cones are neither. When a psychologist studies something biological, it's not psychology any more.
Psychophysics and perception research is widely considered to be part of psychology, and has strong, falsifiable theories about how sensory stimuli are encoded and processed. Using purely behavioural methods, psychophysicists figured out that there were three color-sensitive "sensors" and pinned down their properties. I'm not sure it suddenly becomes biology because someone later found the cellular substrate, nor did it become chemistry when someone figured out the structure of opsin.
Likewise, I'd argue that a lot of the learning stuff (e.g., reinforcement learning) also describes the mind's operation in testable and falsifiable ways.
Quote: "Psychology is the study of the mind and behavior."
> and has strong, falsifiable theories about how sensory stimuli are encoded and processed.
Studies that aren't based on empirical evidence, that aren't based on theories about nature, cannot be falsified. We know a lot about behavior, but we have no empirical theories about it -- for that, we have to wait for neuroscience.
> I'm not sure it suddenly becomes biology because someone later found the cellular substrate, nor did it become chemistry when someone figured out the structure of opsin.
Of course it becomes biology/chemistry. But the connection between someone's ideas about the mind and biology can only be conjecture until neuroscience produces a physical theory that makes such a connection -- and at that point, mind studies will be abandoned.
> ... also describes the mind's operation in testable and falsifiable ways.
The mind is not a physical thing, consequently it cannot produce empirical evidence or falsifiability, two of science's fundamental requirements. If one psychological experiment asserts that X is so, and another asserts that X is not so, that's not a falsification, it's a contradiction. The difference? A contradiction can itself be contradicted in another experiment (something often seen in psychology), but a scientific falsification is conclusive.
All this talk about empirical evidence, theories and falsifiability may seem overly philosophical until one realizes this is how we keep religion out of science classrooms.
Let me give you an example, from visual perception.
The "Feature Integration Theory" suggests that low-level image features are processed in parallel: you extract information about color, orientation, and movement in parallel across the entire visual field. However, these representations are separate, and a second, serial process is needed to combine information across them.
This makes specific, testable predictions. Suppose you're searching for a red triangle. If this shape is embedded in a sea of green triangles, your reaction time shouldn't vary as a function of the number of green triangles. The same thing should happen if the red triangle is surrounded by red circles: reaction times should be relatively constant regardless of the number of red circles. However, if you need to find a red triangle embedded in a mix of red circles and green triangles, a) you should be slower and b) your reaction time should be a function of the total number of shapes.
I'd argue that this theory is empirical (run the experiment, record reaction times) and about as falsifiable as it gets (it's easy to test the difference in the RT-vs.-item-count slopes).
I'd also argue that this is a computational theory describing how visual search works, without worrying about the underlying implementation of that process. Clearly, it would be interesting to know that too, but it's certainly not necessary. David Marr proposed that cognitive processes could be studied on three levels: computational (what's the problem), algorithmic (what's a way to solve the problem), and implementation (what do the neurons do to run that algorithm), and each level was largely independent of the ones below.
You seem to be missing the point that, no matter how many hypotheses we make about the inner workings of the brain, we cannot turn them into science without either confirming or refuting them by direct examination of the brain itself. As long as we're hypothesizing about mechanisms that remain beyond direct observation, it's speculation. One cannot falsify a speculation.
Psychology doesn't study the brain, it studies the mind.
> I'd argue that this theory is empirical ...
The observation is empirical but the theory isn't. It cannot become science without validation by way of empirical evidence.
> I'd also argue that this is a computational theory describing how visual search works, without worrying about the underlying implementation of that process.
Yes -- and because we cannot directly observe the processes we're hypothesizing about, we cannot make them a matter of empirical evidence, therefore we have no basis for falsification.
Quote: "Criterion of falsifiability, in the philosophy of science, a standard of evaluation of putatively scientific theories, according to which a theory is genuinely scientific only if it is possible in principle to establish that it is false."
> Clearly, it would be interesting to know that too, but it's certainly not necessary.
Only necessary for science, otherwise not important.
On the contrary, the point I'm trying to make is that you can study the processing done by whatever's in your skull (mind, brain, GPU, nanobots, whatever) while being totally agnostic about the underlying hardware. You can develop theories about this, test them, falsify them, and revise them.
Returning to the feature integration theory, it says that during singleton search (red vs. green), reaction times should be constant regardless of the number of items, while reaction time should be a linear function of the number of items when the search involves combining information from multiple feature channels.
You can test this with a junky laptop or even some drawings and a stopwatch. In fact, if you really want, I'll send you a script so you can test it yourself. You can falsify this: just fit lines to the (item count, RT) data and see if the slopes match the predictions. People have, in fact, done this, and have shown that this explanation of visual search isn't quite complete--weird things happen when the target is very rare, for example.
Can you explain exactly what some neuroscience data would add here? Look, I'm not some hardcore dualist. I work in a systems neuroscience group and completely agree that brain-based theories are more interesting than phenomenological ones, which is why I put up with the hours, pay, etc. However, this doesn't make those psychological theories any less valid, nor does it make psychology less of a science.
Anyway, there are lots of other processes that are not directly observable. Gravity, for example, wasn't directly observed until last year. Evolution can't be observed directly either.
> Can you explain exactly what some neuroscience data would add here?
Certainly -- it would move the issue from the metaphysical to the physical realm. That might make it science. There's no science of the metaphysical.
> Anyway, there are lots of other processes that are not directly observable.
Yes, but not scientific ones.
> Gravity, for example, wasn't directly observed until last year.
Orbital mechanics both predicts and observes gravity. Each successful spacecraft journey represents another successful prediction of the physical theory of gravity. Dark Energy shows that a physical gravitation theory can be potentially falsified, a property all self-respecting scientific theories must have.
Gravitational time dilation represents an empirical confirmation of Einstein's General Relativity, our present theory about gravity. To be accurate, the atomic clocks on board GPS satellites must take this time dilation into account (as well as that from Special Relativity).
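The size of the effect can be checked with a back-of-the-envelope calculation in the weak-field approximation: gravity makes the orbiting clock gain about GM/c² × (1/R_ground − 1/r_orbit) per unit time, while orbital speed makes it lose about v²/(2c²), with v² = GM/r for a circular orbit. The sketch assumes a circular orbit of radius about 26,561 km, a ground clock at Earth's mean radius, and ignores Earth's rotation; it reproduces the commonly quoted figures of roughly +45.7 μs/day and −7.2 μs/day, a net of about 38 μs/day.

```python
GM_EARTH = 3.986004418e14  # Earth's gravitational parameter, m^3/s^2
C = 299_792_458.0          # speed of light, m/s
R_GROUND = 6.371e6         # Earth's mean radius, m (assumed ground-clock location)
R_GPS = 2.6561e7           # approximate GPS orbital radius, m (assumed circular)
SECONDS_PER_DAY = 86_400


def gravitational_gain_us_per_day(r_ground: float = R_GROUND, r_orbit: float = R_GPS) -> float:
    """Weak-field GR: the higher clock runs fast by GM/c^2 * (1/r_ground - 1/r_orbit)."""
    rate = GM_EARTH / C**2 * (1 / r_ground - 1 / r_orbit)
    return rate * SECONDS_PER_DAY * 1e6


def velocity_loss_us_per_day(r_orbit: float = R_GPS) -> float:
    """Special relativity: a clock moving at speed v runs slow by about v^2 / (2 c^2);
    for a circular orbit v^2 = GM / r. Earth's own rotation is ignored."""
    v_squared = GM_EARTH / r_orbit
    return v_squared / (2 * C**2) * SECONDS_PER_DAY * 1e6


if __name__ == "__main__":
    gain = gravitational_gain_us_per_day()
    loss = velocity_loss_us_per_day()
    print(f"GR: +{gain:.1f} us/day, SR: -{loss:.1f} us/day, net: +{gain - loss:.1f} us/day")
```

Left uncorrected, a drift of tens of microseconds per day would translate into position errors of kilometres per day, since light travels about 300 m per microsecond.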
Einstein rings represent another direct observation of gravity. I think you mean that gravitational waves weren't observed until last year. Predicted long ago, finally observed.
> Evolution can't be observed directly either.
There is copious empirical evidence for evolution. Start with how and why antibiotics lose their effectiveness over time. Then move on to laboratory studies of the evolution of Drosophila melanogaster (fruit fly), chosen for its short reproduction cycle. Examples abound, all empirical and falsifiable.
These are all examples of experimental confirmation of empirical theories, all potentially falsifiable by observing nature.
I'm still a little hung up on the metaphysical part though.
Suppose we ignore all the baggage around the "mind" and just consider an input/output relationship: a visual stimulus goes in and some response comes out. We can formulate a theory about the transfer function that maps between them, and then test it by applying different stimuli and comparing the observed response with the expected one.
A) Suppose the device is a machine. It initially looks like it responds by beeping when illuminated. However, we falsify this theory by finding different patterns of light and dark, some of which cause the machine to beep and others that don't, falsifying that theory. This suggests, though, that some feature of the image may matter, so we test whether varying the color of the light matters (it doesn't) and whether the relative spacing of light and dark does (it does). We try more features and eventually discover that it responds to every barcode we try, but to nothing else, and hypothesize that it's a barcode scanner.
B) Suppose we're recording from a single neuron in a rat or cat's brain while the animal views a screen. In early experiments, we discover that cells in this brain area--and this neuron specifically--respond to visual stimuli. We adjust the stimuli and note that the neuron only responds to some of them, so we construct a quantitative model giving the expected distribution of responses as a function of stimulus features. This suggests more tests of the model--perhaps the model has very limited spatial support and thus claims that far-apart stimuli have no effect. We present the animal (and thus, the neuron) with stimuli outside the supported region and, to our surprise, the responses change. We revise the model to include some suppressive interactions, and try again....
C) Suppose we're recording the behavioural responses of a human subject. We show the subject pictures of other humans, and ask them to report whether the individual shown is a man or woman. We hypothesise that certain features in the image guide this decision, so we modify the images to enhance or degrade those features and repeat the experiment. Some of these changes have no effect, while others increase or decrease the speed and accuracy with which the subject responds. So, we modify the images more selectively, or only in certain locations, and repeat the experiment, revising our model as we go.
It seems like you would admit A and B as being "scientific", but think that C is flawed. Is this right?
> Suppose the device is a machine. It initially looks like it responds by beeping when illuminated. However, we falsify this theory by finding different patterns of light and dark, some of which cause the machine to beep and others that don't, falsifying that theory.
But that's not a theory, it's an observation, and it cannot be falsified, only contradicted. We observe the machine's outputs without any deep understanding of the reasons for the behavior or a grasp of why it's acting as it is. Therefore when we draw a conclusion about a repetitive pattern, and make an assertion about the pattern, we could easily be contradicted by another observer seeing a different pattern and drawing a different conclusion. Those are contradictions, not falsifications.
Say we're aliens visiting Earth, and we see cars moving along a road. By observation we conclude that the cars have to stay in line; no one can force their way through all the other cars. It's a "theory".
Then a fire truck appears and does exactly what we asserted could not happen -- it makes all the other cars move out of the way. But our "theory" is not falsified, it's contradicted. A falsification would require (a) a deep understanding of why cars behave the way they do, and (b) a theoretical falsification based on that deep understanding, not simply a new observation that contradicts an old one.
Another example. For a while there was a mental illness called "Asperger Syndrome". It came into being in meetings of psychologists who talked about it, and who eventually voted it into the DSM.
Everybody liked this new mental illness, it became very popular. Some psychologists even claimed that a lot of famous people had it -- Isaac Newton, Thomas Jefferson, Albert Einstein and Bill Gates, to name just a few. This roster of famous "Aspies" made the mental illness even more popular, especially among young people.
Then things got out of control, and people were actually proactively demanding the diagnosis, for themselves and/or their children. The fact that they could collect Social Security disability payments might have been a factor.
Seeing the clamor about this disease and fearing the consequences of a public backlash, the psychologists held another vote and voted Asperger Syndrome out of the DSM (http://www.nytimes.com/2009/11/03/health/03asperger.html).
So, was Asperger Syndrome falsified? No, not at all. It wasn't falsified because it was never more than an observation -- it never had a theoretical basis. As a result, Asperger Syndrome is neither true nor is it false, and anyone can contradict anyone else while discussing it. By the way, the same thing was true about homosexuality about 30 years ago, with the same controversy and the same outcome -- it was a recognized, listed mental illness, then it wasn't.
This is not science. And it won't be until we understand the brain. Understanding the mind is not only not helping, it's an obstacle, because people have come to think of the mind as a cause of behavior, when it's clearly an effect of the workings of the brain, and science can't be based on effects -- it must be based on causes.
I was with you until here. I'd say you were getting religious if you tried to represent the mind as anything other than an emergent physical phenomenon.
> I'd say you were getting religious if you tried to represent the mind as anything other than an emergent physical phenomenon.
A similar argument could be made from the opposite perspective. The mind is not part of nature, of reality. It's an artificial philosophical construct, arising from observations of behavior, and on that basis extrapolating the existence of something for which there's no direct evidence. Not unlike God in that respect.
By definition, an emergent physical phenomenon would be something that ultimately emerged in physical form. The mind cannot do that. I think you're speaking of behavior, which does meet the definition.
I think you're putting a lot of weight on "study the mind" versus "study the brain." Not many psychologists are dualists.
They tend to say "mind" because they want to abstract over the actual hardware--they're interested in studying what happens in the skull, not how it is implemented. This doesn't make it any less scientific than a biologist who studies where birds fly while neglecting the aerodynamics of the birds' wings.
> They tend to say "mind" because they want to abstract over the actual hardware--they're interested in studying what happens in the skull, not how it is implemented.
Yes, but asking "what" isn't science by itself; science tries to say how a particular observation came about.
Also, psychologists don't really know what's happening inside our skulls. Studying the mind can't tell us what the brain is doing or how, any more than a computer display can tell us how a CPU works -- for the latter, we need access to the source, not its external manifestation.
* If I say, “The night sky is filled with tiny points of light,” I've offered a description. Another observer might contradict my description, for example by emerging from his cave on an overcast night and not seeing any points of light, but the contradicting observation can itself be contradicted on the next clear night, without any chance for resolution. So, apart from being shallow, inconclusive and trivial, it's not science.
* If I say, “Those points of light are distant thermonuclear furnaces like our sun,” I've offered an explanation, one that makes predictions about phenomena not yet observed and that's falsifiable by empirical test. On the basis of this explanation we might build a small-scale star (a fusion reactor) to see if our experiment shows any similarity to the spectra and behavior of stars. This deep explanation represents a theoretical claim that's linked to other areas of human knowledge, predicts phenomena not yet observed and is conclusively falsifiable by comparison with reality (our fusion reactor might fail to imitate the stars). It's science.
> This doesn't make it any less scientific ...
But it does -- one cannot falsify an observation, one can only contradict it.
> This doesn't make it any less scientific than a biologist who studies where birds fly ...
If the biologist only records where the birds flew, it's not science. If he crafts a testable, falsifiable theory about why the birds flew there, that might be science. Science is about crafting theories that go beyond simple observation, that explain them and suggest new phenomena not yet observed.