Just 11% of 53 cancer research papers were reproducible (nature.com)
292 points by vog on May 9, 2013 | 179 comments



I came to the biological sciences trained as a chemist. As I began working on my Ph.D. and encountered various papers on cell signaling research (the field into which much "cancer research" falls), it was blindingly apparent to me, perhaps because of my chemistry background, what was going on...

Cell Signaling (and much of Biology) research is in its Alchemy phase. Now, alchemy often gets a bad rap as being completely worthless, snake-oil type stuff, but this was not the case at all. Rather, individuals were working on an area about which almost nothing was known, and (more importantly) for which the key central organizing laws were not yet revealed (for Alchemy: the atomic theory of matter and chemical bonding, for Cell Signaling: how individual proteins and small molecules interact). Furthermore, the goals were lofty and almost certainly unattainable (for Alchemy: turn base metals into gold, for Cell Signaling: cure cancer), driving people to do "rush" work. It's not that the results are completely invalid, or that the experiments are useless. It's just that everyone feels like they're so close to a solution (Alchemists were way off with Phlogiston, and I'll bet Cell Signaling researchers are similarly clueless as to what really matters) that no one takes the time to step back and synthesize the results in an attempt to see the forest for the trees.

Cell Signaling will, eventually, have its Joseph Priestley, its Dmitri Mendeleev.

From experience, the fact that 89% of these "cancer research" papers are not reproducible almost certainly has less to do with fraudulent data and much more to do with crazily complex experimental setups that end up probing a half-dozen experimental variables all at once (without the researchers even grasping that this is going on).

Yeah, publish or perish sucks, but what sucks more is the death of basic science research. The Alchemists eventually became Chemists because they refocused on core principles (atomic theory, bonding) and forgot about the lofty goals (turn lead into gold)...

...but try telling any politician that they should decrease cancer research funding and refocus on genetics, structural biology, and evolution research. I'd love to know how they respond.


>Cell Signaling will, eventually, have its Joseph Priestley, its Dmitri Mendeleev.

That's very optimistic, but not necessarily true (even ignoring the question of when). Priestley succeeded so well because he developed simple, reproducible experiments (expose mercury oxide to sunlight to produce oxygen etc.) that he was happy to share for no financial gain (even traveling to others' labs to help them reproduce his work [1]). The current environment of paywalls and competitive grants seems hardly conducive to the rise of similar figures.

[1] http://en.wikipedia.org/wiki/Experiments_and_Observations_on...


Part of why it may not be true is that we lack the language or conceptual vocabulary to describe how living systems really work.

When we engineer things, it's one-part-one-function. Living systems are every-part-every-function network graphs with weighted edges that are subject to dynamic reconfiguration.

A cell isn't a computer that runs "code" in the form of DNA, nor is it a "machine" as we understand it. A "structured cloud of probabilistic quantum interactions among molecular nanomachines" is a bit more accurate.

Until we conquer these meta-problems, we won't understand the cell or the genome. I don't think they can be understood classically -- and I don't mean classically in the sense of omitting quantum mechanics. (Though that's true too.) I mean classically in the sense of linear fixed-relationship cause and effect machines.


What do paywalls possibly have to do with the reproducibility of results? Anyone who has access to a lab capable of even attempting to reproduce such research is either 1) at an institution that has access to those journals or 2) can use google to find the papers anyway. Hopefully both.

The fact that taxpayer-funded research is put behind paywalls is a travesty, but it seems wrong to claim it actually hinders researchers from doing anything.


    > What do paywalls possible have to do with the reproducibility
    > of results? Anyone who has access to a lab
Actually, there's no institution subscribed to every journal. There are, in fact, many papers inaccessible to even the most funded labs. Spotty coverage.


No, no institution subscribes to every journal. However, see point 2. Given the title and author, how many of those papers can't you find with Google / Google Scholar (or by emailing the author)?

Paywalls for publicly funded research are wrong, but the cases where they prevent people at research institutions from doing research seem to be tiny.


I'm published in a highly-ranked journal that our university doesn't have access to.

Sorta funny.


Can you find the paper online by searching for it?


If they were open, you would have an online service that could automatically point out contradicting studies, and even make suggestions.


Even if they are closed, you can link to the papers (unless the DMCA metastasized again). Granted, someone can't make an automatic tool to find such studies, but that would be hard anyway, and presumably the authors are aware of the studies they are refuting and could tag them.

Now, people trying to make policy decisions could not read or evaluate the studies. If we assume they are capable of doing so in the first place (i.e. they're not Lamar Smith), then we have an actual argument for getting rid of paywalls. I still think the best one is that it's a waste of money and a tax on universities/grants.


I don't understand why science should be a closed guild in the first place.


Because these days it is institutionalized, and in our society that means it is funded by capitalistic entities, i.e. corporations.

In other words, the profit motive dictates that making money to enrich a few individuals is more noble than sharing knowledge for the good of society.


Science is funded entirely by corporations? Tell me, where can I buy stock in the NSF, DARPA, NIH? Or where can I find the articles of incorporation for the MacArthur Foundation or any of the other non-profits that also fund research. Or which corporation is paying my PhD stipend --- I really ought to short their stock since they were dumb enough to hire me.


Those are government entities, and the government's parent company is the corporatocracy that owns all the politicians and media.

Science is a business in the service of global capitalism. Truth hurts.

I will give you the MacArthur foundation and certain other non-profits. But most non-profits are also operated by the corporatocratic elite, like everything else on this planet.


You're right - if you define 'funded by corporations' sufficiently loosely (to include everything from governmental bodies to universities to charitable foundations) that the statement 'science is entirely funded by corporations' becomes a tautology, it becomes clear that science is entirely funded by corporations! I for one am shocked and appalled.


You're just being pedantic. The thrust of his comment was clear, if not perfectly correct. "Fund" is well understood as a euphemism for "control".

It would be ideal if science were "funded" (read: controlled) by the public in the interest of the public good, but in reality it is controlled by the corporatocracy.

Is Goldman Sachs "funded" by the government or is the government "funded" by Goldman Sachs? The best understanding is that the government is a corporate subsidiary of Goldman Sachs, a profit-centre that Goldman Sachs invests in and controls by means of investment, and those investments reap profits.

Technically you could say that GS is funded by taxpayers, but funding implies both investment and control. So it's more accurate to say that GS funds the government, because this makes it clear where the locus of control is. Citizens are indentured servants that pay tribute to GS via taxes. Academics are indentured servants that receive money and position in exchange for their services to the subsidiary of GS called NSF.

Having said that, I think many foundations are genuinely independent of GS control, but they still can't be said to be funded by the public for the public. They are just altruistic private parties as opposed to the megalithic non-altruistic faction that is the corporatocracy.


I keep coming back to this essay called "Can a Biologist Fix a Radio?": http://protein.bio.msu.ru/biokhimiya/contents/v69/pdf/bcm_14...

The thing that keeps coming to mind is that it's easy to hand-wave pseudo-understanding of systems using enough jargon and vague diagrams. And if you use enough jargon and cloak things behind complex methodologies (the mechanisms to even begin investigating how cellular machinery works are enormously complex) it's easy to hide a lack of understanding. And I can't help but think a similar sort of problem exists in software. The tools and terminology we have for explaining and understanding software systems leave a lot to be desired, and one of the ways that is revealed is in the diagrams we use for describing systems. At a fairly fine grain we can use UML diagrams, but these tell us almost nothing, and in complex systems they simply become a giant hairball. Typically we end up describing a system as a series of layers, or as an interconnected set of services, but I can't shake the notion that such diagrams are more like Figure 3a in the essay above than like 3b: they are the crudest sketch of a system, they don't touch on the nature or interrelation of the components.


> It's not that the results are completely invalid

Especially if only 11% of them are reproducible, they're only 89% invalid.

I'd go so far to say: If the reproducibility rate is 11%, you're not doing science, you're just pursuing funding.


You're assuming that only "reproducible" results are valid, which is incorrect. Results that are reproducible within a lab, but not when attempted by other people in different settings, indicate that there's an unaccounted-for variable, and those fall into the 89%. There's still something valuable there, it's just that we don't yet know all the variables.

We could just give up, or we could try to continue study of something incredibly complex. Given that we are gaining some ground, it's clearly not a useless endeavor, it's just an extremely difficult one.


If it's not reproducible, it is not science.


Define reproducible, in the terms of the precise actions that people take. Is it reproducible if the same scientist repeats the experiment in the same lab and gets the same results? Because that's the current bar of reproducibility, and that 89% that is not "reproducible" certainly passed that bar.

What was being tested was a different lab, with different materials, trying to get the "same" results, for some definition of same. If you give 100 programmers an algorithms book, and tell them to produce code for a binary search, and only 25% of the programmers are able to make something that works, does that mean that binary search is only 25% reproducible?

If five different companies benchmark five different web frameworks for their application, and come to 2-4 different answers about which one is the 'best,' does that mean that the benchmarks are not reproducible? Of course not.

What's being highlighted here in this study is the extreme diversity of biological models. And one doesn't necessarily expect exact reproducibility in other people's hands, because we simply don't have technology to characterize every single aspect of a biological model, and it's impossible sometimes to even recreate the exact same biological context. Is something "reproducible" if it means that it replicates in 5% of other cell lines, 25% of other cell lines?


> Is it reproducible if the same scientist repeats the experiment in the same lab and gets the same results? Because that's the current bar of reproducibility, and that 89% that is not "reproducible" certainly passed that bar.

I don't follow you here. The above does not seem to be the current meaning of "reproducible":

http://en.wikipedia.org/wiki/Reproducibility

The same person doing the same experiment is repeatable, not reproducible. And I don't believe even the repeatable bar has been met, as very few projects have funding to do the same experiment twice.

The fact that a given investigator can "repeat" his experiment carries very low weight among professional scientists, because we are all human. Irving Langmuir's famous talk about Pathological Science, and especially the sad story of N-rays, is a warning to every scientist.

http://en.wikipedia.org/wiki/Pathological_science

http://www.cs.princeton.edu/~ken/Langmuir/langB.htm#Nrays


So there is a theory that if something doesn't reproduce, it's because the other guy was just incompetent. That may be the case, but just like everyone wants to believe they're above average, everyone will want to go with the theory that the other guys just aren't any good, when I suspect that will be much less of a factor. At any rate, when you start trying these drugs on the wide diversity of the patient population, if they're not super robust, they won't be of much use anyway.


Huge swaths of astronomy are functionally unreproducible. We can argue about the math, but many phenomena exist as a single example and/or are basically static on our time scales. The best we can do is see if the math seems to produce similar-looking structures when (sparsely) simulated.


There's a difference between an observation and an experiment. Lab experiments should be reproducible. The fact that astronomical events are not reproducible does not make the study of them unscientific, but it also doesn't imply that lab experiments should be one time events.


Would there be a problem with just calling it something else other than science then? I don't see the need to bend definitions of words to account for inconvenient circumstances. I am a programmer and do hard stuff, I don't require people to call me a scientist. Mathematicians do hard stuff, they don't complain that they're not called scientists. Engineers, too, are not scientists. It's not pejorative, just a statement of fact. If what you say is correct, then what is the issue with just saying astronomy is not a scientific field?


Well attempting to exclude astronomy/astrophysics from the umbrella of science would be bending the definition far more than the current status quo.

Part of this is that science isn't about experimentation, it's about observation. We perform experiments when possible so that we have more stuff to observe, or more controlled events.

Much of astronomy, atmospheric physics, geology, medicine, the "soft" sciences and I'm sure plenty of other "hard" fields are at the mercy of certain phenomena having sweet FA for data points. And I'm sure they all do what astrophysicists do: make sure that what we do have plenty of data for works; make our extrapolations with as few assumptions/rounding errors as we can; and revisit existing models anytime we find a new data point.

It's as scientific as anything; it's simply going to take longer to sort out in some cases.


I once had a very long and intense argument with a guy who was offended because I thought that I wasn't a scientist even though I studied Software Engineering (not even Computer Science).

Apparently people take that crap seriously.


I think this is a straw man. Astronomers, astrophysicists, etc. go to enormous lengths to address these issues over time and are well aware of the shortcomings of their work. When black holes were predicted, none had been observed. I'd suggest that astronomy is a terrible place to make claims about irreproducibility.

Cosmology on the other hand...


It's really critical that we don't confuse research that's not reproducible with fraud. Very few scientific theories survive unmodified over time, so lack of reproducibility isn't a criticism, and we really need to move the debate past this. Every theory is expected to be inaccurate, as it only explains the data using the understanding of the time, but this isn't an indictment of the research or the researcher, and studies of outright fraud indicate that it actually only happens around 1% of the time.

Reproducibility isn't about calling out people whose work isn't reproducible, it's about identifying and promoting the most robust stuff.


There are lots of parts of science that can't be experimented on (e.g. astronomy). Even for those parts that are experimental, just because you're wrong doesn't mean you're not doing science.

The current test (if I remember my philosophy of science correctly) is about falsifiability - it's not science if its claims can't be disproven. From this perspective, bad experiments are still science - someone predicted that similar experiments would behave similarly, and their prediction was falsified. This is how science is supposed to work.

It gets problematic when any failure to reproduce instinctively gets explained away as experimental error on the part of the second experimenter. Even worse is when experimenters (as in this case) work to have failures to reproduce hidden from the scientific community (the authors of this study had to sign contracts that they would not identify specific failing studies before they were given the necessary data about experimental procedure).



That's not true. Part of science is doing experimentation. If an experiment doesn't reproduce, you need to find out why that is. The hypothesize-and-test part of the scientific process is every bit as much science as the rest, even if your tests show that an idea is wrong, or that more is going on than you thought.


What about the experiments run by using the LHC? No other organization has a similarly sized particle accelerator, so by your definition it is not science because it cannot be reproduced elsewhere?


If 11% of the papers are reproducible and people are trying to reproduce the results, then you're still making progress. Just the slow and expensive kind. Considering how complex the subject matter is, I don't think it's reasonable to expect anything else.

The problem is people are not trying to reproduce results which harms the field and slows everything down.


Actually, very few people are trying to reproduce results. It is a lose-lose situation. Either you confirm the previous results, which won't get published, or you can't confirm them, which means you are either not as competent as the original researcher or you have to embarrass your colleague and bring shame on your profession. Neither of which gets you published or helps your career or gets you more funding. The incentives are all screwed up.

These Amgen researchers had to sign agreements that they would not publish the results of their attempts to reproduce these experiments before they could get enough detail to attempt to reproduce them. Clearly this is not how science is supposed to work, but it is exactly how businesses operate. Very sad.


After seeing how much of the research is done, I'd agree with this more.

This is a problem that is getting worse as funding is getting cut: people feel they need to get a paper out, regardless of the results. You get positive results, make a story up about it, then run to publish it before trying to reproduce it or look further into the data. While this doesn't happen in every lab, I'm unhappy to say that I've seen this happen in many "high impact" labs.


Why do you say funding is getting cut? NIH's budget has almost doubled in the last decade [1] and many of the other funders have seen similar growth as well as new funders appearing every year.

I don't think the problem is lack of funding but screwed-up incentives. When medical research became focused on funding, the quality of the results suffered. And if the vast majority of landmark cancer research can't be reproduced, much of that money was wasted.

The solution will require a huge cultural change, which may be impossible. However, step one is recognizing the problem. And some efforts are already underway, such as journals like PLoS which publish negative results, and more recently The Reproducibility Project and the Reproducibility Initiative [2,3]. Still, it will be difficult.

[1] http://officeofbudget.od.nih.gov/pdfs/spending_history/Mecha...

[2] http://www.openscienceframework.org/project/EZcUj/wiki/home

[3] https://www.scienceexchange.com/reproducibility


http://news.sciencemag.org/scienceinsider/2013/05/nih-detail...

Already-funded grants are getting cut ~20% across the board. There are a ton of cuts going on right now to the NIH budget; google it and take a look. This has been happening for years now: trying to get an R01 (large research grant) is becoming more and more difficult, and it isn't helped by the constant changes in requirements.

Every other point I agree with 100%; the culture change has to happen. Nothing is impossible, and it just takes the right people to make the right things happen. Everyone recognizes the problem; I can't tell you how many times I hear people complaining about the same problems over and over. The problem is that they aren't taking action, and with no action, nothing is going to take place. While there are many efforts in place (my project being one of them, http://omnisci.org), they need to be implemented properly. The same rule of startups applies to science: the idea/concept means nothing without proper execution.

While the culture change may be slow, the academic world is having a really hard time keeping up. NIH is also fighting to stay afloat. I have a few friends who work as program officers and they really have a negative outlook on the future of research funding.


You are right about the sequester cuts. I was looking at the annual numbers on the NIH front page, which didn't include 2013. I wonder why 5.5% overall cuts translate to 20% cuts. The SciMag article makes it seem like they were only cutting the number of grants, not the size, which kinda makes sense. Perhaps they are treating funded grants worse, which seems crazy. Wouldn't this potentially waste the money already spent if the project can't be finished on 20% less?

Good luck with omnisci.org; this is the sort of thing that would help: open sharing of data, techniques and negative results. If this were the norm, things could be very different. But one thing I have learned is that it is very hard to change an organization's culture.


I've seen this happen as well, but I think the problem is more with the "make up a story" part and less with the "run to publish" part. I've seen really really interesting results that defy explanation get passed over for publication in favor of something more mundane that can "tell a story" because, it seems, stories get funded...intriguing research? not so much...


This is a pretty dubious justification. Whether or not Alchemy led to later easily reproducible results, the alchemists couldn't have engaged in an effort to present themselves as what we now know as scientists.

On the contrary, it seems logical to expect that a parade of fake and often impressive results would act as a damper on real, modest gains getting attention and funding.


So, basically, it's a combination of: A) a really hard problem to solve; and B) a heck of a lot of pressure to get it solved.


What is more troubling is not that so few of these results are reproducible, but that it appears almost no one is trying to reproduce the results of earlier studies. The ability of the scientists who wrote the paper to even get access to the resources necessary to try and reproduce the results is limited.

I'm reminded of Feynman's Cargo Cult Science:

"I was shocked to hear of an experiment done at the big accelerator at the National Accelerator Laboratory, where a person used deuterium. In order to compare his heavy hydrogen results to what might happen with light hydrogen he had to use data from someone else's experiment on light hydrogen, which was done on different apparatus. When asked why, he said it was because he couldn't get time on the program (because there's so little time and it's such expensive apparatus) to do the experiment with light hydrogen on this apparatus because there wouldn't be any new result. And so the men in charge of programs at NAL are so anxious for new results, in order to get more money to keep the thing going for public relations purposes, they are destroying--possibly--the value of the experiments themselves, which is the whole purpose of the thing. It is often hard for the experimenters there to complete their work as their scientific integrity demands."

For anyone who hasn't read it, the whole thing is excellent http://www.lhup.edu/~DSIMANEK/cargocul.htm


If I had to sum up science in one phrase, I wouldn't say anything about the "scientific process" or anything like that. I would say: "Look for reasons why you are wrong, not reasons why you are right." You can always find reasons why you're right. You can always do an astrology forecast and find someone for whom it was dead-on accurate. That's not the problem with astrology, the problem lies in how often it is wrong. It's wrong so often it's useless. But if you only examine the positive evidence in favor of it, you will never come to that conclusion.

The theories that are powerful and worthwhile are the ones that are rarely or never wrong. Can't always get "never". It's a complicated world and we aren't all physicists. But we at least ought to be able to get "rarely", and if you can't, well, I guess it's not science then. That's OK. Unfortunately, not everything is amenable to science, though you can still approach it in this spirit of trying to see how you might be wrong rather than proving yourself right.

Once you start looking around with that standard, it's not hard to see how little science is really being done. Why are we publishing these dubious studies? Because for all the scientific trappings we claim, with statistics and p-values and carefully-written recordings of their putative procedure written in precisely the right way to make it sound like everything was recorded (while still leaving out an arbitrary number of relevant details), we've created a system where we are telling people to look for reasons why they are right... or we won't publish their results. Guess what kind of results we get with that?

If you start from the idea that you need to look for why you are wrong, the scientific method will fall out of that, along with any local adjustments and elaborations you may need, and every discipline, sub-discipline, and indeed at times even individual experiments need adjustments. If you start with "The Scientific Method", but you don't understand where it came from, how to use it, or what it is really telling you, you'll never get true science, just... noise.


"Look for reasons why you are wrong, not reasons why you are right."

I share your worldview. Makes intuitive sense to me. It's intellectually honest. And if I'm wrong about something and no one corrects me, I get kinda grumpy.

Maybe everyone else already knows, but I learned relatively late that it's called Popperian.

http://en.wikipedia.org/wiki/Karl_Popper#Philosophy_of_scien... http://en.wikipedia.org/wiki/Falsifiability


Falsifiability is one natural landing point, but it is also somewhat controversial. What I'm advocating isn't so much a philosophy as a state of mind, one hopefully less controversial than trying to declare a "definition of science". I think of it more like a mind hack you can perform on yourself. (So much discipline boils down to figuring out how your conscious brain can fool your subconscious brain.)


In practical terms, reproducing someone else's work most of the time boils down to redoing someone else's PhD thesis. Which is both not very interesting and doesn't help getting your own research done.


I agree that reproducing someone else's experiment won't help you get your PhD closer to completion (because you should be doing your own experiments and publishing no matter what, if you expect to land that lecturing position, i.e. publish or perish).

Reproducibility is one of the main principles of the scientific method (those papers shouldn't even be accepted if they don't contain enough information about how to reproduce the experiments they describe).

If an experiment is so complex that it can be compared to redoing someone else's thesis, then either the thesis is very simple or the experiment is so complex that it probably proves nothing.


Yes, I'm not arguing the foundations of the scientific method, just pointing out how it is in practice.

In every field you have the established base theory, the bleeding edge you work on, and a bunch of preliminary work in-between. Checking out references is important, but this is a recurrent process (as you need to check the references of those works too), and you only have so much time.

So what really seems to happen is only a few select works in any subfield achieve large citation index, and those stand a decent chance of being verified at some point, or at least they do have enough work trying to build up on their results that a systematic inconsistency would show up.


Yes, it's a good piece, but actually the part you should've quoted was the part on the mice experiments...


I have been in discussions about this with one of my friends working in academic materials research. It's amazing the amount of work done today by scientists at universities writing code without very basic software development tools.

I'm talking opening their code in notepad, 'versioning' files by sending around zip files with numbers manually added to the end of the file name, etc.

This doesn't even begin to scratch the surface of the 'reproducible results' problem. Often times, the software I've seen is 'rough', to be kind. Most times it's not even possible to get the software running (missing some really specific library or some changes to a dependency which haven't been distributed) or it's built for a super specific environment and makes huge assumptions on what can 'be assumed about the system.' This same software produces results which end up being published in journals.

If any of these places had money to spend, I think there could be a valuable business in teaching science types how to better manage their software. It's really unfortunate that outside of a few core libraries (numpy, etc.) the default method is for each researcher to rebuild the components they need.

I'm surprised that only 11% of results are reproducible. It seems lower than I'd expect. I agree we don't want to optimize for reproducibility, but obviously there is some problem here that needs to be addressed.


> It's amazing the amount of work done today by scientists at universities writing code without very basic software development tools.

I agree 100%. I recently quit my PhD, so I still know a lot of people on the front lines of science. One of these friends recently asked me to help them with a coding issue, so they gave me an ssh login to the group's server. I log in and start reading the source.

It was all Fortran, with comments throughout like "C A major bug was present in all versions of this program dated prior to 1993." What bug, and of what significance for past results? Unknowable. As far as I can tell from the comments, the software has been hacked on intermittently by various people of various skill since at least 1985 without ever using source control or even starting a basic CHANGELOG describing the program's evolution. The README is a copy/paste of some old emails about the project. There are no tests.

So even though computer modeling projects should, in theory, be highly reproducible... it often seems like researchers are not taking the necessary steps to know what state their codebase was in at the time certain results were obtained.
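
For anyone wondering what "the necessary steps" could even look like, here's a minimal sketch (in Python, just as an illustration) of stamping each result with the code revision and input-data hashes. The file paths are hypothetical placeholders, not anything from that Fortran project:

    # Minimal provenance stamp: record the exact code revision and input-data
    # hashes alongside each result, so you can later say which code produced it.
    import hashlib
    import json
    import subprocess
    from datetime import datetime, timezone

    def sha256_of(path):
        """Hash an input file so the exact data used can be identified later."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def provenance(input_paths):
        """Describe the code and data state behind a result."""
        commit = subprocess.run(["git", "rev-parse", "HEAD"],
                                capture_output=True, text=True,
                                check=True).stdout.strip()
        return {
            "git_commit": commit,
            "run_at": datetime.now(timezone.utc).isoformat(),
            "inputs": {p: sha256_of(p) for p in input_paths},
        }

    if __name__ == "__main__":
        # hypothetical input file name, purely for illustration
        stamp = provenance(["inputs/run_parameters.txt"])
        with open("provenance.json", "w") as f:
            json.dump(stamp, f, indent=2)

A few lines of boilerplate like that, kept under version control with the rest of the code, would have answered the "what bug, and of what significance for past results" question.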


This is an entirely different issue than code; code mostly does the same thing when you run it twice. There's no such guarantee in biology. A cancer cell line growing in one lab may behave differently than descendants of those cells in a different lab. This may be due to slight differences in the timings between feeding the cells and the experiments, stochastic responses built into the biology, slight variations between batches of input materials for the cells, mutations in the genomes as the cell line grows, or even mistaking one cell line for another.

Reproducibility of software is a truly trivial problem in comparison.


Also, sometimes, doing the experiment is extremely hard. I know a guy who only slightly jokingly claims he got his Ph.D. on one brain cell. He spent a couple of years building a setup to measure electrical activity of neurons, and 'had' one cell for half an hour or so (you stick an electrode in a cell, hope it doesn't die in the process, and then hope your subject animal remains perfectly subdued, and that there will not be external vibrations that make your electrode move, thus losing contact with the cell or killing it)

Reproducible? Many people could do it, if they made the effort, but how long it would take is anybody's guess.

Experiments like that require a lot of fingerspitzengefühl from those performing them. Worse, that doesn't readily translate between labs. For example, an experimental setup in a small lab might force an experimenter in a body posture that makes his hand vibrate less when doing the experiment. If he isn't aware of that advantage, he will not be able to repeat his experiment in a better lab (I also know guys who jokingly stated they got best results with a slight hangover; there might have been some truth in that)


Oh, I agree. Biological experiment reproducibility is an incredibly hard problem. You are probably right that it is 'trivial' by comparison in the same way that landing on mars is trivial to landing on Alpha Centauri.



Have you seen: http://matt.might.net/articles/crapl/

"Generally, academic software is stapled together on a tight deadline; an expert user has to coerce it into running; and it's not pretty code. Academic code is about "proof of concept." These rough edges make academics reluctant to release their software. But, that doesn't mean they shouldn't.

Most open source licenses (1) require source and modifications to be shared with binaries, and (2) absolve authors of legal liability.

An open source license for academics has additional needs: (1) it should require that source and modifications used to validate scientific claims be released with those claims; and (2) more importantly, it should absolve authors of shame, embarrassment and ridicule for ugly code."


I think that's what the folks at Software Carpentry [0] are trying to do. I went on one of their courses, and you're taught the basics of writing good software, version control and databases (SQLite). I've frequently recommended it to fellow scientists.

[0] http://software-carpentry.org/


This is great! Thanks for sharing.


Recent article on git and reproducibility in science: http://www.scfbm.org/content/8/1/7

It is badly needed.


That article says "Data are ideal for managing with Git."

I one time tried using git to manage my data. The problem is, I frequently have thousands of files and gigabytes of data. And git just does not handle that well.[1]

One time, I even tried building a git repo that just had the history of pdb snapshots. The PDB frequently has updates, and I have run into many cases where an analysis of a structure was done in a paper 3 years ago, but the structure has been updated and changed since then, making the paper make no sense until I thought to look at the history of changes to the structure. Unfortunately, git could not handle this at all when I tried it, taking days to construct the repo and then that repo was unbearably slow when I tried to use it.

Git would probably work well for storing the data used by most bench scientists, but for a computational chemist puking up gigabytes of data weekly on a single project, it is sadly horrible for handling the history of your data.

[1] http://osdir.com/ml/git/2009-05/msg00051.html


You might find git-annex useful:

http://git-annex.branchable.com/


As someone who, fresh out of high school, coded for a quite published astrophysicist at a major government research institution, I can confirm that I had no idea what I was doing.


After decades of publish-or-perish sweatshop science I'm sure the great shining archive of scientific truth is sort of like an inbox with no spam filter.


Such a powerful sentence.

Lately science looks like olympic games without anti-doping control...


I'm becoming pretty convinced that the human future depends on the disruption and obsolescence of bureaucracies.


I think you are on to something. When science became so big that it first needed, and eventually was run by, "managers", I bet it lost much of its effectiveness. But how do we change this? For starters, there is a lot of money involved for the institutions conducting this (apparently shoddy) research.


> When science became so big that it first needed, and eventually was run by, "managers", I bet it lost much of its effectiveness.

You can just do projects on your own, you know. There's nothing about science that absolutely requires an institution.

http://groups.google.com/group/diybio


I work in biomedical research, and this finding has been discussed quite broadly. Most researchers don't believe it.

Exactly reproducing novel findings usually requires a significant investment into the underlying procedures, which most places will not undertake. Reproduction instead usually occurs as part of an extension of the initial findings. The complexity of biology means that the first findings are frequently not reproduced exactly the same way as the original, but this does not detract from the "direction" of the initial findings.

For example, the first finding might be that protein A promotes tumor growth by modifying protein B. An extension of these findings might be Protein A sometimes modifies protein B, and when it does tumor growth is stimulated, but it mostly does not modify protein B, it instead modifies protein C, which suppresses tumor growth. In this case, were the first results replicated? Yes, and no.

This is how most biomedical research proceeds...


Really?

Meaning that there's another unaccounted-for variable (or variables) that controls the effect of A on B and C, meaning that we cannot conclude anything from the experiment.

Yes and no is not acceptable...


You're wrong. What you say - that we cannot conclude anything if not everything was taken into consideration - is undoubtedly true for the mathematical meaning of "conclude". Mathematicians, computer scientists and a few others, like theoretical physicists, have the luxury of using the law of excluded middle, which Sherlock Holmes also used. The famous detective states that if you eliminate everything that is impossible, whatever remains is true, even if it sounds improbable.

All the people who use this law do so because their work is about some kind of model, which can be wholly known. However, if you have a misfortune of working in the real world - like nearly everyone - then you can't apply this law. This means that you won't ever get a proof in a mathematical sense. You won't ever be completely certain - you can be convinced beyond reasonable doubt, but that's all.

So when we talk about how biologists are just grant hunters because someone couldn't reproduce their experiments we need to take this into account. I don't know, but if I had to guess I'd say that nobody ever expected these experiments to be 100% accurate, 100% reproducible or 100% true. I think they are treated as a data point, some input to think of, and not definite truth.

But I may be completely wrong here, of course.


There are 20,000 broad classes of variables (genes), and on average 12 variations of each of those variables (alternative splicing, post-translational modification, cellular localization, other variables we haven't yet discovered). This series of two experiments, likely two years of work, found associations between three proteins, which is a hook into a system of great complexity, a way to start looking for more.

It is not "reproducible" yet it is incredibly valuable.


Yes and no is acceptable. It's really a "Yes, but...". Is Newtonian physics correct? Yes and no. It's roughly correct at a macroscopic level, but at small scales quantum effects are important and at high speeds relativistic effects are important. "It's correct, but..."

Not every work of science needs to be exact fact before being published. Really, nothing in science is accepted as absolute fact. The scientific publishing process is a conversation between scientists to try to determine the truth.


It is exactly as you suggest, additional variables. Bear in mind that a single human cell has ~25,000 genes, and therefore at least that many potential degrees of freedom, even before you include external variables. It is very difficult to control for all of them.


I read a lot of medical research papers and I'd say about 1 in 10 is about cancer. They usually go something like this: We found this receptor on a cancer cell and it caused it to grow or shrink. We therefore propose making a drug to inhibit or act as an agonist for this receptor site, which also affects an enormous amount of completely unrelated stuff around the body, in order to fight cancer.

Truth is, they may have just gotten a funny little mutant of a cancer cell that happens to heavily express this one receptor as part of its unique mutated biology. For all we know a lot of cancer research has been about exploring the peculiarities of the genome of Henrietta Lacks.


Perhaps we are looking for the wrong measure of success. This is not inclined-plane ball rolling in fifth-period physics with Mr. Johannes. Reproducibility in a cutting-edge experiment is success. It means that what is being measured - even if it is the wrong thing - is within the control of the experimenter.

It is absurd, but practical, to publish experimental results with the implication that they are reproducible based on a peer review of the results. "Well they look reproducible to me," when uttered by a peer reviewer is the current standard. All we are seeing is that the a priori reasoning of experts is not a substitute for empirical investigation of scientific claims.

Let us not forget that a journal which finds no worthy articles this quarter has only two options, publish unworthy ones or publish no articles. I know how I react when a magazine subscription does not come.


I wonder if it would be possible to produce a journal that only released an issue when it has, say, 10 articles? No monthly/quarterly schedule, you could have an issue release next week, or next year.


As a digital document it is easy. Because printing requires substantial coordination, it is less practical. The subscription model common to the scientific journal industry makes it contractually problematic.

I wonder what portion of academic research is equivalent to content farming.


Most published scientific research papers are wrong, according to a new analysis.

http://www.newscientist.com/article/dn7915-most-scientific-p...

Ioannidis JPA (2005) Why Most Published Research Findings Are False. PLoS Med 2(8): e124. doi:10.1371/journal.pmed.0020124

http://dx.doi.org/10.1371/journal.pmed.0020124


Is this analysis a published scientific research?


Joking aside, the authors of these studies have highlighted the low statistical power of research in a number of fields (see also http://www.nature.com/nrn/journal/v14/n5/abs/nrn3475.html), which is a serious issue that often leads to over-interpretation. It's as if researchers willing to publish more are rushing hasty studies out the door.

It's a good thing that people can measure these things and ring the alarm


Depends. By and large, Ioannidis's paper is pointing out the basic consequences of the alpha and beta parameters in Neyman-Pearson hypothesis testing, so there's a very good argument that this paper is merely statistics, and statistics is generally considered to be mathematics, and mathematics is not science but its own thing.
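
To make that concrete, the core arithmetic is just a few lines. The parameter values below (significance level, power, prior odds that a tested hypothesis is real) are illustrative assumptions, not estimates for any particular field:

    # Post-study probability that a "positive" finding is true, following the
    # alpha / power / prior-odds arithmetic in Ioannidis (2005).
    # All parameter values here are illustrative assumptions.

    def positive_predictive_value(alpha, power, prior_odds):
        # PPV = power*R / (power*R + alpha), with R the prior odds that a
        # tested relationship is real.
        return (power * prior_odds) / (power * prior_odds + alpha)

    # e.g. exploratory biology: 1 real effect per 10 hypotheses tested,
    # alpha = 0.05, and an (optimistic) 80% power.
    print(positive_predictive_value(alpha=0.05, power=0.80, prior_odds=0.1))  # ~0.62
    # Drop power to 20% and it falls to ~0.29, i.e. most "positives" are false.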



This is a really insightful article that contains a number of points that qualify the result and present lessons for any one interested in data science.

A few points that stand out to me:

1) The 11% number comes not from a randomized sample of cancer papers but from attempts made by Amgen to reproduce results that might be useful in drug development. This means that they made a good effort to reproduce these results, but there is also a strong selection bias that needs to be acknowledged.

2) They point out that using survival time as the measurement complicates things. I've done a lot of machine learning and statistics on cancer and medical data and, in my experience, this cannot be overemphasized. There are loads of confounding factors that contribute to survival time. I expect that big breakthroughs will come as we develop rigorous ways of measuring the behavior of a tumor (does it metastasize? how does it feed itself?) and use those as the targets of our regression. (Currently these things are measured by human visual inspection, if at all.)

3) They point out that the studies that were repeatable were the ones that were careful about using blinded controls and eliminating investigator bias. This is basic stuff but easy to overlook. In terms applicable to a startup, a data scientist needs to be motivated to vet existing and proposed practices and identify flawed ones as much as, or more than, they are motivated to maximize gain.


With regards to 1), it would be ironic if this study was itself not statistically valid...


I feel obliged to point out that these other journals are merely copying the real pioneer in this field: http://jir.com.


Negative results vary from field to field. In math or physics, a conclusive negative can be very important in deciding what to study next or disproving established theory. In biology, a negative usually just means "I haven't found what I'm looking for yet".


And that is the exact problem with certain researchers. I just had to explain this to my undergrad whose project resulted in negative results (his first study). I congratulated him, as he discovered that the treatment he gave did NOT affect the genes he looked into. He didn't understand, so I had to explain that his study (with the correct controls) provided information that is useful to science. Negative data = data. Data = good :)


Wow, that's sad. Are we really so blind when it comes to cancer research?

I recall that oncology journals usually have a ludicrously high impact index (as high as 5 digits, IIRC); that means there are a lot of citations, which is an indicator that there is a lot of research going on. And with a lot of research going on, well, you can expect a lot of false positives. So, I'm wondering, could this be a case of cherry-picking or some kind of selection bias? It wouldn't be difficult to select a lot of bogus-sounding research and test it.


Medical research in general is dubious at best. See for example http://www.plosmedicine.org/article/info:doi/10.1371/journal...

There's even a startup trying to work around it: http://www.metamed.com/. They look for papers related for a given condition on request, trying to find ones that are actually well done and promising.


"And with a lot of research going on, well, you can expect a lot of false positives."

Why would you expect a lot of false positives? Aren't results supposed to be reproducible at least 95% of the time?


I was thinking about absolute numbers, so 5% of a lot can still be a lot.
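
A toy count makes the point; every number here is a made-up assumption, purely for illustration:

    # "5% of a lot is still a lot": counts of false vs. true positives when a
    # whole field tests many hypotheses. All numbers are made-up assumptions.
    studies = 100_000          # hypotheses tested across the field
    true_fraction = 0.10       # fraction of tested hypotheses that are real
    alpha, power = 0.05, 0.80

    false_positives = alpha * studies * (1 - true_fraction)   # 4,500
    true_positives = power * studies * true_fraction           # 8,000
    print(false_positives / (false_positives + true_positives))  # ~0.36

So even with a 5% false-positive rate per test, roughly a third of the "hits" in that toy field would be spurious.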


But who is going to pay to reproduce that research? What if that research took years to do?


Why do the research in the first place if the only outcome is a number, floating in space, disconnected from anything and unreproducible?

You're still sort of operating on the idea that the unreproducible papers have some sort of abstract value to them, and therefore we shouldn't slow the flow of them lest we ruin their value. But they don't. They're worthless. They're worse than worthless. They're of negative value. It would be far better to slow down and verify that what we think we know is actually true, because in the end that would actually be both faster and a more efficient use of resources. Basically, instead of learning worse than nothing (thinking we know something but actually being wrong), we'd learn something. That's a pretty decent upgrade.


You don't have to do the entire experiment, you can just do a proof of principle and then build on it. For example, a lot of studies in science have 1 big idea, and they show it works using a variety of different methods. Pick 1 method, show consistencies, and then build on it.

One of the serious problems in science is that people state what you said above and will jump in feet first on a huge multi thousand $ study, but never spend the time to validate the model in their own lab. So you waste that money initially, then go back to figure out what went wrong. $$$$ being wasted. (The lack of negative data publications is also a massive problem that contributes to this)

What I'd like to see (and am currently working on) is raw data from labs. I do xyz experiment, publish it, and release all the raw images, excel files, raw data outputs etc. Now other scientists can go into those data, ask their questions and try to reproduce it or at least provide a different perspective. This is what I'm working on at http://omnisci.org and I intend to bring it to the scientific community. We need transparency, because this "behind the door" shit (peer review, grant reviews, only positive data etc) isn't working.


An article claiming that only 11% of findings could be reproduced doesn't cite its sources, thus rendering the findings of the article unreproducible. Nice work!


Nature wrote an editorial justifying the situation; you can read it here: http://www.nature.com/nature/journal/v485/n7396/full/485041e...


> "those authors required them to sign an agreement that they would not disclose their findings about specific papers"

I don't know why this doesn't cause a huge stir in the scientific community. Seems like everyone is fine with people sweeping their negative data under the rug. Shameful is a very mild word for that.


How were those authors able to enforce this request? To be a proper paper it should not be necessary to contact the author in order to reproduce it.

If you had to ask the author for more details then their paper was incomplete - perhaps that's why it could not be reproduced.


If you want to maximize your chances of reproducing the results, you will use identical apparatus and methods to what the original study used. That is often not possible to do just from reading the paper.


If the identical apparatus is essential to reproducing the result, then the description of that apparatus is part of the experiment. Omitting that means it's not a proper paper.


If only it were so. Often important parts of the protocol exist only in handwritten entries in a lab notebook. Because very few people are reproducing experiments, it is lots of extra work, and journals have limited space, all the required details are rarely included. This is a big part of the problem. The typical workaround is to contact the original researcher to fill in the missing details.


Well then this article would be testing how well the papers were written, not whether or not it was possible to reproduce the results.


Agreed, but this study was specifically designed to highlight the problem not solve it. And the results would be useless, or non-existent, if they restricted themselves to scientists that would co-operate fully with the process and the possible "shaming" afterward.


I think it also highlights the underlying mentality that prestige matters more than integrity and "if everyone does it then it's fine".

If tax-funded scientists can't uphold high standards for themselves, the public is entitled to demand that they do so.


You might be being a little harsh. There is a huge difference between "hey, I'm having a problem reproducing your results, want to help me either reproduce or disprove your results?", which legitimate scientists will and do participate in, and "I'm doing a study on unreproducible results and want to publish your paper and name in the list afterwards".

Notice they say they got help without the shaming part so the scientists were willing to participate, just not get listed.

I personally know scientists who have helped out with disproving their own findings, it's not uncommon.


Exactly. This should be the top comment. Are we just supposed to assume their work was true based on trust? Perhaps this 'study' was actually just a fleecing meant to show that we take too many studies to be true based on simple trust.


It should not be the top comment. The original article is an explanation of WHY this was done. If a comment is posted explaining why the study was a bad idea or should not have been published despite this explanation THAT should be the top comment.


It seems like a big win for science could be had by centralizing some aspects of research within academic and research institutions. There should be well-funded software czars and statistics czars at each university that facilitate the efforts of the individual researchers to try to reduce the amount of shoddy science that gets published. Researchers are really terrible at this, and presumably slower than dedicated people even who are working on many different projects, so the division of labor should make the whole apparatus vastly more efficient, able to write and win more grants, etc. Of course, many researchers would fight against this because if you can't publish shoddy research, you can't publish as much research, but I would think there should be some way to get there from here.


This is a great issue to highlight. However, I'm unconvinced we need to optimize around creating research that is reproducible, rather than around making the process of science generate the best possible results.

Both the scientific and lay public should read this (appropriately) as: one study does not prove a result, there is an 89% chance that a single research paper is simply wrong.

This mentality seems to be the bigger gap, although generating high quality research is a big part of the equation (mostly I think for the wasting of resources to validate, process, and reproduce claims).


Cold fusion was reproducible. What are you suggesting we do here?


This editorial commentary and the article on which it is based are part of an ongoing effort to improve the quality of scientific publication in a number of disciplines. The Retraction Watch group blog

http://retractionwatch.wordpress.com/

by two experienced science journalists picks up many--but not all--of the cases of peer-reviewed research papers being retracted later from science journals.

Psychology as a discipline has been especially stung by papers that cannot be reproduced and indeed in many cases have simply been made up.

http://www.nytimes.com/2013/04/28/magazine/diederik-stapels-...

That has prompted statistically astute psychologists such as Jelte Wicherts

http://wicherts.socsci.uva.nl/

and Uri Simonsohn

http://opim.wharton.upenn.edu/~uws/

to call for better general research standards that can be practiced as checklists by researchers and journal editors so that errors are prevented.

Jelte Wicherts, writing in Frontiers in Computational Neuroscience (an open-access journal), provides a set of general suggestions

Jelte M. Wicherts, Rogier A. Kievit, Marjan Bakker and Denny Borsboom. Letting the daylight in: reviewing the reviewers and other ways to maximize transparency in science. Front. Comput. Neurosci., 03 April 2012 doi: 10.3389/fncom.2012.00020

http://www.frontiersin.org/Computational_Neuroscience/10.338...

on how to make the peer-review process in scientific publishing more reliable. Wicherts does a lot of research on this issue to try to reduce the number of dubious publications in his main discipline, the psychology of human intelligence.

"With the emergence of online publishing, opportunities to maximize transparency of scientific research have grown considerably. However, these possibilities are still only marginally used. We argue for the implementation of (1) peer-reviewed peer review, (2) transparent editorial hierarchies, and (3) online data publication. First, peer-reviewed peer review entails a community-wide review system in which reviews are published online and rated by peers. This ensures accountability of reviewers, thereby increasing academic quality of reviews. Second, reviewers who write many highly regarded reviews may move to higher editorial positions. Third, online publication of data ensures the possibility of independent verification of inferential claims in published papers. This counters statistical errors and overly positive reporting of statistical results. We illustrate the benefits of these strategies by discussing an example in which the classical publication system has gone awry, namely controversial IQ research. We argue that this case would have likely been avoided using more transparent publication practices. We argue that the proposed system leads to better reviews, meritocratic editorial hierarchies, and a higher degree of replicability of statistical analyses."

Uri Simonsohn provides an abstract (which links to a full, free download of a funny, thought-provoking paper)

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2160588

with a "twenty-one word solution" to some of the practices most likely to make psychology research papers unreliable. He has a whole site devoted to avoiding "p-hacking,"

http://www.p-curve.com/

an all too common practice in science that can be detected by statistical tests. He also has a paper posted just a few days ago

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2259879

on evaluating replication results (the issue discussed in the commentary submitted to open this thread) with more specific tips on that issue.

"Abstract: "When does a replication attempt fail? The most common standard is: when it obtains p>.05. I begin here by evaluating this standard in the context of three published replication attempts, involving investigations of the embodiment of morality, the endowment effect, and weather effects on life satisfaction, concluding the standard has unacceptable problems. I then describe similarly unacceptable problems associated with standards that rely on effect-size comparisons between original and replication results. Finally, I propose a new standard: Replication attempts fail when their results indicate that the effect, if it exists at all, is too small to have been detected by the original study. This new standard (1) circumvents the problems associated with existing standards, (2) arrives at intuitively compelling interpretations of existing replication results, and (3) suggests a simple sample size requirement for replication attempts: 2.5 times the original sample."

The writers of scientific papers have a responsibility to do better. And the readers of scientific papers that haven't been replicated (or, worse, press releases about findings that haven't even been published yet) also have a responsibility not to be too credulous. That's why my all-time favorite link to share in comments on HN is the essay "Warning Signs in Experimental Design and Interpretation" by Peter Norvig, LISP hacker and director of research at Google, on how to interpret scientific research.

http://norvig.com/experiment-design.html

Check each submission to Hacker News you read for how many of the important issues in interpreting research are NOT discussed in the submission.


I would like to add The Reproducibility Project and Reproducibility Initiative to your list. They are seeking to specifically address this issue.

http://www.openscienceframework.org/project/EZcUj/wiki/home https://www.scienceexchange.com/reproducibility


The pressure on academics is indeed an issue. They have to teach, supervise PhD students, do research, and publish (not to mention handle internal administrative affairs).

On a side note: has anyone found a good alternative to Mendeley? I heard http://bohr.launchrock.com/ is working on something cool.


Have you considered Papers? http://papersapp.com/


Have you looked at http://www.zotero.org ? It covers most of the same ground as Mendeley, the browser integration is quite good, and the standalone app is as good as or better than Mendeley's.


We are attempting to solve this by identifying and rewarding reproducible research (www.reproducibilityinitiative.org). However, it has been incredibly difficult to get funding to conduct the validation studies, due to the obsession with funding "novel" results rather than the replication studies required to identify reproducible research. Funders need to step up to solve this problem.


This is one of the reasons I went into math and software. In math there is only black and white, provably true and patently wrong. Still, part of the reason this is so is that to show any results everything must be laid on the table. CS and Math papers are therefore open by design and the world has benefited. I hope the rest of science will move in this direction.


I'm guessing you have not read Proofs and Refutations by Imre Lakatos. It is worthwhile.

Also, if you read on the history of mathematics, the various back-and-forth foundational issues raised by infinitesimals, Fourier series, proof by contradiction and the like are very interesting. Sure, it is possible for a modern mathematician to look at all that and say, They just didn't know how to do it, HERE you go. But over decades and centuries people were somehow successfully continuing to do math, even though they knew that something serious was wrong in their understanding.


At least those historical mathematicians acknowledged their gaps of knowledge. Cancer research is a far cry different.


You clearly are unfamiliar with the history.

If you can find a copy, George Berkeley's The Analyst was a cogent criticism of the foundations of calculus in his day. (He didn't have answers, but he identified real problems.) A variety of books were written by mathematicians in response. Uniformly these were much lower quality, and consisted of defenses of the foundations of mathematics, rather than acknowledgements that there were real issues. His criticisms were not taken seriously until that mathematical framework completely fell apart when Fourier series constructed "obviously impossible" things.

Today we are used to a very general notion of a "function". But historically a square wave simply wasn't a function. (Because if it was, how did you know how it interacted with infinitesimals?) And getting one out of the sum of a bunch of well-behaved sines, with some simple integration, was a huge shock.


Yikes! Thanks for setting me right.

I should have said some historical mathematicians then.


There are 'proofs' that contain errors, just as there are science papers that reach incorrect conclusions.

Also, it's not necessarily easier to verify a proof than it is to repeat an experiment (or perform additional supporting experiments), especially for complex proofs that very few people know enough to completely understand (e.g. of Fermat's last theorem, or the Poincaré conjecture).


CS has a similar problem, especially in AI fields. Researchers often claim accuracy rates of more than 95%, but in reality the figure is much lower (in fact, below 10% for most of the claims).


Yes, that's why I no longer consider pursuing a PhD in CS. It's a waste of life to try to reproduce non-reproducible papers that are full of buzzwords.

Hopefully, more journals and conferences will follow the Open Access and Reproducible Research model like IPOL [1].

> Each article contains a text describing an algorithm and source code, with an online demonstration facility and an archive of online experiments. The text and source code are peer-reviewed and the demonstration is controlled.

[1]http://www.ipol.im/


Having worked in computer vision and robotics, I can attest to this.


Yes. Of the fields whose papers I've implemented algorithms from, computer vision seems particularly averse to discussing the (often critical) downsides of its algorithms. Often it's something like: "Camera calibration isn't exactly perfect? Well, this scene reconstruction technique simply won't work."


Not to be too picky (am a mathematician too) but you ought to read about the history of the theory of "limit cycles" of planar vector fields and Dulac's "Theorem". Quite a lesson.

Edit: sorry, could not help writing this later (was on my iPhone before). The thing is that Dulac 'solved' an important part of one of Hilbert's problems. Was a 'proof' for about 60 years.


Pick a paper on NLP. Measures of document relevance and summary quality are not black and white, especially when you account for the ways the authors munged their test sets and scores.


> In math there is only black and white, provably true and patently wrong.

You'd probably be interested in Godel's incompleteness theorem.

Also, exact reproduction of other people's experimental work is highly unusual in CS, despite the theoretical possibility. Usually the materials and methods sections of papers in hard sciences like physics, chemistry, and biology are much more detailed, to the point of being recipes.

Finally, CS is not really a science. It lies somewhere between math and engineering, which are also not sciences.


If that were an actual possibility in other fields, everyone would be delighted. Sadly, biology (and human minds and societies as well) is much too complicated for this to work.

This is an actually challenging problem. I don’t think running away from it and resorting to cheap tribalism is a great idea.


This is one of the reasons I am happy to see what is going on with the Center for Open Science [0]. The goal of the project is to open up the entire research process, not just create open access to published results. They aim to make open tools supporting the entire research process as well, through the Open Science Framework [1].

One of the benefits of this approach is to provide peer review at each step of the research process. There is also an emphasis on encouraging validation of prior results, rather than having everyone focus on creating new research. This is the focus of the Reproducibility Project [2].

The Center is just getting started, and I sure hope it takes off.

[0] - http://centerforopenscience.org/

[1] - http://openscienceframework.org/

[2] - http://openscienceframework.org/project/EZcUj/wiki/home


This can be explained without bringing in accusations of fraud or incompetence [1].

Basically, if you have a bazillion experiments running and only publish the results which are significant, you still have a huge predictability problem, because most of those statistically significant results will be due to chance alone.

For example, assume that only 1% of cancer experiments produce something valid and interesting, and that the standard for statistical significance is the usual 95%, i.e. a 1-in-20 false-positive rate. If 100,000 hypotheses are tested, you'll get 100,000 * 1% = 1,000 valid results and roughly 100,000 * 5% = 5,000 results from chance alone. In this example, only about one in six of the published results (1,000 / 6,000 ≈ 17%) will be reproducible.

[1] http://sciencehouse.wordpress.com/2009/05/08/why-most-publis...
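A minimal sketch of that base-rate arithmetic in Python; the 1% prior and 5% false-positive rate are the illustrative assumptions from the example above, not measured values:

    # Base-rate sketch: what fraction of "significant" results are real?
    n_experiments = 100_000
    prior_true = 0.01   # assumed fraction of hypotheses that are actually valid
    alpha = 0.05        # false-positive rate at the p < 0.05 threshold

    true_positives = n_experiments * prior_true   # 1,000 (assumes perfect power)
    false_positives = n_experiments * alpha       # ~5,000, as in the example above
    share_real = true_positives / (true_positives + false_positives)

    print(f"Share of significant results that are real: {share_real:.0%}")  # ~17%, about 1 in 6

(Strictly, the false-positive count should be (1 - prior_true) * alpha * n_experiments ≈ 4,950, which barely changes the answer.)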


This basic fact is understood by scientists and children alike. It is still either incompetence (a lack of middle-school statistics) or fraud (ignoring the above fact).


The most interesting thing about all this is that the researchers attempting the replications were required by the original authors to sign non-disclosure agreements and to agree not to identify the papers with non-reproducible results. That says a lot about the confidence authors have in their papers, doesn't it?

Sorry but this isn't science anymore.


It's good old capitalism making money from "treating" sickness. The longer the better. A lifetime of "treatment" means a lifetime of profits.

Who cares about the cure.


Pediatric cancer gets only 4% of federal funding and less than 1% of private drug-company research. And at the American Cancer Society, less than 5 cents on the dollar goes to research.

Pediatric cancer is the leading cause of death for children 15 and under.

So to see this funding for cancer get thrown away by fraud is extremely upsetting.


Those are some pretty misleading statistics.

First off, pediatric cancer is NOT the leading cause of death for those 15 and under[1]. It's up there, but "unintentional injury" takes twice as many lives.

Second of all, not many children die of cancer (about 1,200 per year), so even if something is the leading cause of death, it doesn't mean it happens very often. Cancer kills 10 times as many people between the ages of 35 and 44. If you add up all adult cancers, it's probably 20-30x the number of deaths.

Also, just because money goes to adult cancer doesn't mean it can't help. Most drugs are developed for adults first and then tested in children. It's not like people are ignoring pediatric cancer.

Before you get so upset you should really sit down and think these things through.

[1]http://www.cdc.gov/injury/wisqars/pdf/10LCID_All_Deaths_By_A...


"Most drug are developed for adults first and then tested in children."

It was also be unethical to develop drugs for children without first testing them on adults, since children can't give informed consent.


Having read through these comments for the last several minutes, I'm left with the overwhelming conclusion that programmers, on the whole, have about as much to contribute to a discussion on the biological sciences as they do one on economics.


There's an awful lot of money in cancer (research funding/patents), an awful lot of diet/lifestyle handwaving in cancer research conclusions (that aren't about treatments), and very little progress in making cancer more survivable.

Just those facts would lead me to assume that most of the published research is bad. The great part is that 11% of the papers are reproducible, so the science isn't at a standstill.


That isn't limited to cancer. Western medicine is driven by a lot of bad mental and social models. Doctors only make money if you are sick. That strikes me as a seriously flawed model right there.


Cancer is one of those sicknesses where you either cure it or you die.

Problem is, for-profit companies seek "treatment" to maximize profit; they don't seek the cure. Hence no progress on cancer.


There are many reasons so many papers are not reproducible.

1.) The prospective data collection methods used in a lot of studies are deeply flawed and worse aren't documented.

2.) The retrospective data used in many studies is poorly validated and the quality is rarely a concern for most universities.

3.) The people who publish these papers are very intelligent, most far more intelligent than I am, but you'd be surprised how many of them don't have a very good handle on basic statistics. Not because they aren't smart enough to learn but mostly because they have no interest in it.

4.) There is a lot of pressure on researchers to publish papers even if they aren't ready, and as a result mistakes are made. There really needs to be less emphasis on the quantity of papers published and more emphasis on quality.

For anyone who might be interested in such things: if you ever find yourself locked in a dungeon with nothing but a computer loaded with every journal article ever published, you can find articles involving similar cohorts from the exact same universities that reach different results.


The problem is not that research is not reproducible: researchers are human and full of biases. The problem is that too much emphasis is put on novelty versus reproducibility. It took a long time for software engineers to recognize the value of testing/validating versus creating new code. In the same way, the research world needs to put new emphasis on reproducing and validating results.


The problem is that too much emphasis is put on prestige and reputation rather than on the actual impact of the work. Coming from a physics background, I always felt that life scientists tend to be overly audacious in their claims; am I alone in this?


Is it only in life sciences? Remember the cold fusion and the faster than light particle fiascos?


So maybe we need rename Life Sciences to Life Arts?


This has been going around along with the "chemotherapy doesn't work" thread.

The problem with people like the author is they think all cancer is the same. They try to apply information about Hodgkin's lymphoma to non-Hodgkin's. They think "breast cancer" is a disease, not a symptom. There are over 40 different causes of breast cancer, and the treatments that work are primarily based on the cause, not the visible symptom.

This makes studies hard. Most people never learn the cause of their cancer. We are not always fortunate enough to have a single known cause that says "yes, you worked in a nuclear waste plant for 6 years" and know that was the cause.

Chemo, for example, increases your 5-year survival chances in Hodgkin's lymphoma by 45%, but for smoking-related lung cancer the difference is less than 2%.

Nature.com rarely produces articles that are informed. They push an agenda of holistic medicine at the expense of scientific research. I am all for non-traditional medicine, but I don't discount the advances from university and clinical research.


There's nuance here that's going to be totally lost on the general public as this becomes an anti-science talking point.


Does reproducible in this context mean:

The paper did not document its steps sufficiently, so the replicators were unable to reproduce the experiment.

-- or --

The paper relied on circumstantial evidence that was not well-understood enough to be reproduced.

-- or --

The replicators were able to perform the experiment as documented in the paper, but were unable to reproduce the results.

or something else?


Perhaps part of this is statistical error and selection bias. Most studies are done at the 95% confidence level, but only studies that reject the null hypothesis get published. If only 20% of all experiments reject the null hypothesis, and false positives occur in roughly 5% of all experiments, then we should expect about 5% / 20% = 25% of the published rejections to be false positives. Journals may also be biased toward more sensationalistic results, which are more likely to be statistical flukes.

My guess is that if you included all test results, whether or not they reject the null hypothesis and whether or not they are published, the reproducibility rate would be closer to 90-95%.


> My guess is that if you included all test results, whether or not they reject the null hypothesis and whether or not they are published, the reproducibility rate would be closer to 90-95%.

No; you need to bring in statistical power, not just assume an alpha of 0.05 and a 20% base rate of real effects. (Consider the recent neuroscience paper estimating that experiments in the field average a power of around 0.3...)
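A quick sketch of how power changes the arithmetic, taking the alpha and base rate above plus the ~0.3 power figure as illustrative assumptions:

    # Positive predictive value (PPV): the fraction of "significant"
    # results that reflect a real effect, once power is accounted for.
    def ppv(prior, alpha=0.05, power=0.3):
        true_pos = prior * power          # real effects that reach significance
        false_pos = (1 - prior) * alpha   # null effects that reach significance anyway
        return true_pos / (true_pos + false_pos)

    print(f"PPV at 20% base rate: {ppv(0.20):.0%}")   # ~60%
    print(f"PPV at  5% base rate: {ppv(0.05):.0%}")   # ~24%

With power that low, a large share of published positives are false even before publication bias, and a same-sized replication will also miss most of the true effects.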


Cancer is such a weird beast that I'm actually not as alarmed by this as it might seem reasonable to be.

Only in the last year or so have people realized how heterogeneous cancer is. A patient biopsied 10 times from different parts of the tumor turns out to have 10 different (but related) cancers. While major findings like this about the underlying nature of cancer are still emerging, it is not surprising that studies are hard to reproduce; researchers don't even know which parameters need to be controlled yet to make them reproducible.


I founded a company to fix precisely this problem: every dataset, every analysis, every computation fully transparent, reproducible, and accessible. You can even embed it in your blog. For instance, check out https://igor.sbgenomics.com/lab/public/pipelines/ ; if someone uses one of those pipelines and shares their task, there is full transparency plus instant replication.


The tradition in science publishing has been to leave out the nitty-gritty details. Most specialists will know (or quickly deduce) most available methods (unless the results are a radical improvement). As for the details, that's what conferences and telephones and joining the department and gossip are for.


Scientific publications have a long history of papers that contradict each other. In medical research, findings are often based on statistical results from limited samples. From that perspective, it would be a big surprise if they were more accurate than public polls.


After hearing similar things about psychology papers, this is rather disconcerting. This is why I am a climate change skeptic: I don't know whether warming is happening due to CO2 or not, but I am confident that science can't be confident when it can't do controlled experiments. You can't control for any variable when it comes to the climate, let alone all the reasonable ones. We have far more ability to control variables when researching cancer than when studying something as massive and complex as the Earth's climate, and it turns out our confidence in cancer research may kind of suck.

Anyone interested in the subject should check out Feynman's discussion of the psychological effect on scientific research. Millikan used bad assumptions in his oil drop experiment to determine the charge of the electron, but nobody would publish results that differed too much from his. http://en.m.wikipedia.org/wiki/Oil_drop_experiment


The probability that CO2 causes global warming isn't 100%, but it isn't very low either. It is the risk we are talking about: the risk of doing nothing and letting the man-made greenhouse effect (if there is one) turn Earth into an irreversibly disastrous environment. Most CO2-emitting energy sources are not sustainable anyway, and many of them emit other, proven pollutants as well. There is really nothing lost in going to green energy, other than the cost of keeping some CO2-emitting capacity in reserve.


Without qualification, I'm afraid what you're saying is parlously close to mere verbiage. Even the most ardent climate sceptic does not object to solar, wind, or water energy under certain conditions. What is "going green", then? In the UK it currently means, for instance, paying wind farms a million pounds sterling a day to produce no energy at all. Meanwhile poor people die because they can't afford to keep themselves warm, thanks to horrendous energy bills pumped up by huge subsidies to the likes of windmill owners and owners of land hosting windmills.


"Going Green" does not equal to switch to renewable energy at the cost of poor people's lives for gods sake. It simply means that recognizing the global warming and have a plan to switch to renewable energy, for example, take some of the hundreds of billions of dollars of the oil companies' profits to invest on renewable energy.


Proposed cap-and-tax carbon regimes disagree.


First, thanks for trying to add to the discussion, I appreciate it. I disagree on two points:

1. I make no judgement about whether or not CO2 causes global warming. It may cause it; it may cause it but to a lesser (or greater) degree than sun cycles, cosmic radiation (and its effect on cloud formation), or natural variation in the climate. I'm just asserting that my confidence in knowing one way or the other, when you can't create controlled experiments, is low. Ultimately, proving causation rather than mere correlation is something I believe is near impossible for something as large and complicated as the climate, especially when the data being analysed for correlation relies on hundreds of thousands of years of extrapolated proxy data (tree rings, ice cores, sediment, etc.).

2. (This is tangential to the discussion of reproducibility in science.) I think there is something lost in going to green energy: cost and power density. The unfortunate fact is that fossil fuels are incredibly cheap compared to green energy and are more easily transportable. Carbon regimes threaten the ability of the third world to build itself out of poverty and threaten the global economy in general. I'm not saying we shouldn't invest in new forms of energy (especially wind and nuclear); in fact, I think we certainly should. But I don't want to hamstring economic progress over some calamitous event that I have low confidence will actually happen.


You took an almost entirely unrelated story and made it about your political hobbyhorse. Please keep politics off HN.


The article was not about cancer. The whole point of the discussion is reproducibility in scientific research. I was merely applying the same discussion and concept to a different branch of science. HN is a place for nerds to discuss interesting things, things like science and its applications. As far as I'm concerned, that is what I did.


And you just took an entire field off the table for discussion on the grounds that it looks like a "political hobbyhorse." Good job.


I realize that by replying I'm probably encouraging you to continue posting this nonsense. I agree with Joachim below; this is barely related to the article.


A scientific writer should state this result as 6 of 53, not 11%.


So ironic that the paper about these findings is not reproducible itself because the initial data set is not available for legal reasons.


It would have been 13% of papers reproducible if this blasted paper hadn't ruined things and included itself


Why not just say 6 papers!


This seems like an awesome example of Sturgeon's Law: 90% of everything is crap.


What the hell? That's 5.83 papers. How do you reproduce .83 of something?


Um, 6 papers out of 53, significant digits give you 11%. Not that difficult.


I guess saying "11% of 53 papers" was cooler than saying "6 out of 53". Meh.


Yeah, my guess is they used that form to imply 11% of all cancer research is unreproducible. But in ironic fashion, they can't show their own data and thus we can make no broad claims about it.


Also, this is why evidence-based medicine is flawed. What we want is science-based medicine.


NaturalNews brought this back up recently.



