My own rather short academic career involved doing lab work with three different PI-led groups. One PI was actually excellent, and I really had no idea how good I had it. I caught the other two engaging in deliberately fraudulent practices. For example, data they'd collect from experiments would be thrown out selectively so that they could publish better curve-fits. Another trick was fabricating data with highly obscure methods that other groups would be unlikely to replicate. They'd also apply pressure to graduate students to falsify data in order to get results that agreed with their previously published work.
The main difference between the excellent PI and the two fraudsters was that the former insisted on everyone in her lab keeping highly accurate and detailed daily lab notebooks, while the other two had incredibly poor lab notebook discipline (and often didn't even keep records!). She actually caught one of her grad students fudging data via this method, before it went to publication. Another requirement was that samples had to be blindly randomized before we analyzed them, so that nobody could manipulate the analytical process to get their desired result.
If you're thinking about going into academia, that's the kind of thing to look out for when visiting prospective PIs. Shoddy record keeping is a huge red flag. Inability to replicate results, and in particular no desire to replicate results, is another warning sign. And yes, a fair number of PIs have made careers out of publishing fraudulent results and never get caught, and they infest the academic system.
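For anyone who hasn't seen blind randomization in practice, here is a minimal sketch of the idea mentioned above. The function name `blind_samples`, the `S001`-style codes, and the sample labels are all invented for illustration and are not the procedure used in the lab described here. The assumption is that one person generates the key and keeps it away from whoever runs the analysis.

```python
import random

def blind_samples(sample_ids, seed=None):
    """Give each sample an opaque code; return (run_order, key).

    run_order: the codes the analyst works through (no group information).
    key: code -> real sample id, held back until the analysis is finished.
    """
    rng = random.Random(seed)
    codes = [f"S{i:03d}" for i in range(1, len(sample_ids) + 1)]
    rng.shuffle(codes)                   # random pairing of codes to samples
    key = dict(zip(codes, sample_ids))   # kept by someone other than the analyst
    run_order = sorted(key)              # analyst sees only S001, S002, ...
    return run_order, key

if __name__ == "__main__":
    order, key = blind_samples(["ctrl-1", "ctrl-2", "treat-1", "treat-2"])
    print(order)   # what the analyst gets
    # key is only opened after all measurements are recorded
```

The point of the design is that the person doing the measurement can't tell control from treatment, so they can't nudge the analysis toward the result they want.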
I would say that this applies even more so outside of academia. At early stages of development, a research group's or company's product is by necessity a report or a presentation rather than a physical plant's or process's real, quantifiable performance. No malicious intent is required; it's just all too easy to fool yourself or cherry-pick data to support desired conclusions when the recordkeeping is poor.
In my hard-tech experiment-heavy start-up there's no way we could have made any actual technical progress without setting up a solid data preservation and analysis framework first. For every experimental run, all the original sensor data are collected and immediately uploaded along with any photos, videos, and operator comments to a uniquely-tagged Confluence page. Results and data from any further data or product analysis are linked to this original page.
As an anecdotal example, we recently caught swapped dataset labels in results from analysis performed on our physical samples by a third-party lab. We were able to do this easily just because we could refer back to every other piece of information regarding these samples, including the conditions in which they were generated months prior to this analysis. As soon as all the data were on display at once, the discrepancies were obvious.
[PLUG]
Some of what you mention are "negative results" that are quite prevalent and a necessary part of any research. However, the expected mold at publishing venues is such that they are not considered worthwhile.
My colleagues and I are trying to address this by creating a platform to discuss and publish such "bad" or "negative" results. More info here:
My favorite article on this topic is "Escaping science's paradox" by Stuart Buck[1]. I'm particularly interested in the idea (at least within the United States) of "red teaming" science. This would involve having an independent agency fund attempts to replicate (and to find/publish flaws in) NSF- or NIH-funded projects, and publish the results. Ideally, the history of replication for authors' papers could then be part of the criteria for receiving funding for more novel research in the future.
Obviously, there's a few fields where this might not work (you can't just create a second Large Hadron Collider for validation), but in areas from sociology to organic chemistry to environmental science, I think there's a lot of promise in that method for helping to re-align incentives around producing solid, replicable research.
I took a science journalism class in college where our instructor had us read a paper and then write the news story that explained what was interesting about the result.
"You all got it wrong," he said, "the news is not that Amy Wagers could not make things work with mouse stem cells the way this prior paper said this one time. The news is that Wagers-ize is becoming a verb which means 'to disprove an amazing result after attempting to replicate it'. The lab has Wagers-ized another pluripotent stem cell result. The news is about how often this happens and what it means for this kind of science."
One thing I'd like to see is a requirement that for all government-funded research, a certain percentage of that funding, say 30%, must go toward replicating other publicly funded research that has been replicated by fewer than two independent and non-affiliated labs. Any original research couldn't be published until at least two independent and non-affiliated labs replicate it based on the submitted paper and report results that can then be included with the original research. I'd like to see this across all of academia, but I imagine there are enough challenges with enforcing this in a productive manner already that doing it across all research becomes both impractical and difficult to protect from abuse. But at least with public funds, it would be nice to put in some checks to reduce the amount of fraudulent or sloppy research that taxpayers pay for.
I should point out that the notion of "replication" can often be way more difficult and nuanced than people expect. For one, what is the scope of the replication? Would it be simply to re-run the analysis on the data and make sure the math checks out? Or would it be to re-collect the data according to the methods described by the original researchers?
The former is pretty easy, but only catches errors in the analysis phase (i.e., the data itself could be flawed). The latter is very comprehensive, but you essentially have to double up the effort on re-doing the study---which may not always be possible if you're studying a moving target (e.g., how the original SARS-CoV-2 variant spread through the initial set of hosts).
Here's an even easier set of requirements to simplify the first case:
- Require all research to publish their source code.
- Require all research to publish their raw data minus "PII".
* Note: I use "PII" here with the intention of it taking the most liberal meaning possible, where privacy trumps transparency absolutely and where de-anonymization is impossible. This would rule out a lot of data, and personally I think we could take a more balanced approach, but even this minimalist approach would be a vast improvement on the current situation.
"Not all" is a big understatement... I would estimate that less than 0.00001% of published research does this. Every time I talk about this to someone (colleagues in adjacent fields, PIs...), they seem to give zero pucks. It's really mind-boggling.
> re-run the analysis on the data and make sure the math checks out?
That isn't a replication in any meaningful sense. But a replication can certainly take many forms. An exact replication is one, another could be to do a conceptual replication, so studying the same effect but with a different design, or combining these with a new analysis pooling the data from both the study and the new study with (possibly) improved statistical analysis.
Be aware that despite how much focus replicability gets, it's only one of many things that go wrong with research papers. Even if you somehow waved a magic wand and fixed replicability perfectly tomorrow, entire academic fields would still be worthless and misleading.
How can replicable research go wrong? Here's just a fraction of the things I've seen reading papers:
1. Logic errors. So many logic errors. Replicating something that doesn't make sense leaves you with two things that don't make sense: a waste of time and money.
2. Tiny effect sizes. Often an effect will "replicate" but with a smaller effect than the one claimed; is this a successful replication or not?
3. Intellectual fraud. Often this works by taking a normal English term and then at the start of your paper giving it an incorrect definition. Again this will replicate just fine but the result is still misinformation.
4. Incoherent concepts. What exactly does R0 mean in epidemiology and precisely how is it determined? You can replicate the calculations that are used but you won't be calculating what you think you are.
5. A lot of research isn't experimental, it's purely observational. You can't go back and re-observe the things being studied, only re-analyze the data they originally collected. Does this count?
6. Incredibly obvious findings. Wealthy parents have more successful children, etc. It'll replicate all right but so what? Why are taxpayers being made to fund this stuff?
7. Fraudulent practices that are nonetheless normalized within a field. The article complains about scientists Photoshopping western blots (a type of artifact produced in biology experiments). That's because editing your data in ways that make it fit your theory is universally understood to be fraud ... except in climatology, where scientists have developed a habit of constantly rewriting the databases that contain historical temperature records. And by "historical" we mean "last year" here, not 1000 years ago. These edits always make global warming more pronounced, and sometimes actually create warming trends where previously there were none (e.g. [1]). Needless to say climatologists don't consider this fraud. It means if you're trying to replicate a claim from climatology, even an apparently factual claim about a certain fixed year, you may run into the problem that it was "true" at the time it was made and may even have been replicated, but is now "false" because the data has been edited since.
Epidemiology has a somewhat similar problem - they don't consider deterministic models to be important, i.e. it may be impossible to get the same numbers out of a model as a paper presents, even if you give it identical inputs, due to race conditions/memory corruption bugs in the code. They do not consider this a problem and will claim it doesn't matter because the model uses a PRNG somewhere, or that they "replicated" the model outputs because they got numbers only 25% different.
What does it even mean to say a claim does or does not replicate, in fields like these?
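On the determinism point above: using a PRNG does not by itself prevent bit-for-bit reproducible output. A minimal sketch, assuming a single-threaded toy model with its own seeded generator. The function `simulate_outbreak` and every parameter in it are invented for illustration and are not taken from any real epidemiological code.

```python
import random

def simulate_outbreak(seed, days=20, initial=5, contacts=3, p_transmit=0.4):
    """Toy stochastic outbreak: each case makes `contacts` contacts per day,
    and each contact becomes a new case with probability `p_transmit`."""
    rng = random.Random(seed)   # private generator, no shared global state
    infected = initial
    history = [infected]
    for _ in range(days):
        infected = sum(
            1 for _ in range(infected * contacts) if rng.random() < p_transmit
        )
        history.append(infected)
    return history

if __name__ == "__main__":
    run_a = simulate_outbreak(seed=42)
    run_b = simulate_outbreak(seed=42)
    print(run_a == run_b)   # same seed, same inputs: identical trajectories
```

If a model can't pass this kind of check, the randomness isn't the cause; something else (threads racing, uninitialized memory, unordered iteration) is changing the output. Publishing the seed alongside the results is what lets someone else get the same numbers.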
All this takes place in an environment of near total institutional indifference. Paper replicates? Great. Nobody cares, because they all assumed it would. Paper doesn't replicate, or has methodological errors making replication pointless? Nobody cares about that either.
Your proposal suggests blocking publication until replication is done by independent labs. That won't work, because even if you found some way to actually enforce that (not all grants come from the government!), you'll just end up with lots of papers that can be replicated but are still nonsensical for other reasons.
If we study something that seems obvious and it's confirmed and replicated, now we actually know what everyone "knew." If common knowledge turns out to be wrong, we strike a false belief and add a true(er) one—even better.
And of course, you would be free to fund such studies with your own money, but so much science is taxpayer funded or subsidized in various ways that the ROI has to be treated as important.
"That's academic" is already a mild insult meaning useless or irrelevant, but that perception never had any impact on academia so far. The risk for the academy is that negative feelings grow, and then people start wondering why they're paying for so many studies where they're either shoddy or obvious. The justification for public funding is really only studies that:
a. Yield non obvious conclusions.
b. Correctly.
c. And which wouldn't have been funded by industry.
It's possible that the set of such studies is small, and in some fields there are probably zero such studies (e.g. my bête noire, twitter bot research).
Today the risk to academia is low because the political elites in western countries have all self-selected through credentialism and university based social networking, more or less. But all it takes is a populist candidate to get a big enough base and universities may find themselves in the firing lines, with little in the way of defense. It'd be better to prune the obvious studies now.
> Even if you somehow waved a magic wand and fixed replicability perfectly tomorrow, entire academic fields would still be worthless and misleading.
Even ignoring how, frankly, paternalistic and condescending this sentence comes off as, let's take the rest of your comment at face value.
First, I can agree with you that the parent comment's idea about not allowing publication without replication is a non-starter; it would basically be impossible to implement (in the U.S., you'd immediately run into 1st Amendment issues), and on a practical level, that doesn't match almost any realistic scenario of how research either is or could be conducted. Anyone that can get published should be welcome to do so.
However, I think that the idea of funding a subset of replication studies would have great value, and your list of "issues" is largely knocking down strawmen. Let's look at this from the point of view of the funding agencies, which annually pour billions of dollars into research.
1. Logic errors would be great! A replication report stating "the conclusions of this research are deeply invalid due to x,y, and z logic errors here, here, and here" would make the agency less likely to fund authors that make those errors. If the errors don't invalidate the conclusions, they're still worth noting and can be taken into account.
2. Effect sizes varying in replicated studies is still useful information. Not all "does/doesn't replicate" questions need a binary yes/no answer.
3. Similarly to #1, catching and publishing instances where authors are making invalid claims (even if not doing invalid research) seems like a good thing. "Hey, these researchers have a tendency to claim X when really they're talking about Y. What the heck?" is great information for a funding agency to know.
4. See #1 and #3. Generally, the authors of the replication studies are also going to be intelligent; if they're being paid to check for logical fallacies and incoherent concepts, they're likely to catch them (one can imagine a first-year required grad course that lays out these ideas and case studies of how to catch them).
5. I've been thinking about situations where this still wouldn't produce useful meta information, and I'm having a hard time coming up with them. Imagine a fish survey, where divers are doing fish counts along transects. You can't go re-observe those fish, but a) research groups might want to change their methods to include using cameras instead of marking dive slates, or b) if you are able to re-run their transects due to good record-keeping (yay, replication!) but don't see the same fish, the funding agency now knows that a) the original research group has solid experimental methods and b) maybe they should keep funding groups to run that transect because of the variation, or not fund groups using a single transect's data as the basis for forming conclusions. Happy to consider non-outlier (i.e. building a second LHC) examples where this breaks down.
6. Yes, maybe don't fund groups to try replicating these findings. Or maybe do occasionally anyway--if the research groups are doing good work, it should be cheaper to conduct replication-based research than doing a novel one anyway.
7. This is a good argument for replicable research. A huge number of research proposals go unfunded, which is to say that many researchers are competing for funding. In this paradigm, saying "hey, my group found and thoroughly documented a huge, glaring issue in the methods used in these studies due to a bad-faith corruption of historical data" becomes a competitive advantage, because agencies would be looking to pay groups for valid claims. Increasing the number of replication studies adds a greater adversarial capacity to a research system, which helps to catch/prevent this kind of fraud in the first place.
This is already a long comment, but I can make similar points about almost all of your claims. In summary: you seem to be making the assumption that replication studies are basically done by robots blindly following an instruction manual of someone else's research. Especially given the funding incentives mentioned above, I don't think that would be true; rather, many errors would be caught (or prevented ahead of time) and lead to better, more solid research being done in the first place.
To be clear, I'm not against funding replication studies. That's a great thing that should be done. The risk is that if it were done people would think - great! There was a problem but science is fixed now. That wouldn't be true. The clear majority of papers I've read that were bad/unusable in some way in the past 10 years wouldn't have been helped by funding replication studies.
> (1) (3) (4) you seem to be making the assumption that replication studies are basically done by robots blindly following an instruction manual of someone else's research.
I guess we're using the word replication differently. You seem to be using it to mean a general re-review and re-analysis of everything - basically a funded more aggressive peer review followed by an actual re-running of the study sometimes, if it makes sense, whereas I'm using it to just mean re-running the study exactly as originally described to check the results are the same.
I think in science the term replication normally just means re-running the study exactly as described. It doesn't mean arguing with the authors over their definitions or the logical basis of the study itself. I've seen a bunch of cases where a replication fails and the original study team basically rejects the whole exercise by saying "They didn't follow our instructions so of course they got different results". And I mean, that's kind of fair, right? If you say "I did X and saw Y", and then someone else says it's not true but they didn't actually do X, surely the scientists have every right to be annoyed and reject the exercise? So good replications are exactly what they sound like - a replication of the original process. They aren't generalized adversarial funded peer reviews.
And that's why it's important to highlight that replication is only one aspect. If Congress or whomever goes and earmarks money for replication, then someone with money will eventually ask you to replicate a study with a flawed design. What do you do? You could:
a. Say no: the study is flawed. Replication would be useless because you'd just be repeating a nonsensical procedure. You get no money.
b. Take the money and re-run it. The study conclusions are still wrong but now you got a paper published, and the original is successfully replicated so journalists/professors will use that as a stamp of approval. You may even want the study to replicate because it's professionally or ideologically useful.
c. Try to 'fix' the original design and do an improved study. This just starts the process over from scratch, it doesn't yield evidence.
It'll be (b) or (c) of course, every time. That's just how the incentives are set up and besides, doing another study can be done without involving the original authors. The moment you go down route (a) you're going to get pushback. They won't agree their definitions are illogical. So, it'll devolve into an exchange of letters that nobody ever sees or cares about, and granting agencies won't know how to value it.
> 2. Effect sizes varying in replicated studies is still useful information. Not all "does/doesn't replicate" questions need a binary yes/no answer.
But the bureaucracy needs actionable outcomes, otherwise there's no point. If someone can point at a replication and say "ah yes, we claimed our educational intervention would boost grades by 2x in 15 year olds and governments spent money on that basis, but a replicated 0.1% improvement nonetheless proves us right" then nobody outside the system will consider this a valid replication. There must be actual outcomes from a failure to replicate meaning you'd have to draw the line at how much delta is allowable from the original numbers, but nobody is even having that discussion let alone doing anything about it.
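To make "how much delta is allowable" concrete, here is a minimal sketch of what such a rule could look like. The function `replicates` and the 50% default tolerance are purely assumptions for illustration; nobody has agreed on a threshold like this, which is the whole complaint.

```python
def replicates(original_effect, replicated_effect, max_relative_delta=0.5):
    """Return True if the replicated effect has the same sign as the original
    and differs from it by at most `max_relative_delta` (as a fraction of the
    original effect size)."""
    if original_effect == 0:
        raise ValueError("original effect must be non-zero")
    same_direction = original_effect * replicated_effect > 0
    relative_delta = abs(replicated_effect - original_effect) / abs(original_effect)
    return same_direction and relative_delta <= max_relative_delta

if __name__ == "__main__":
    # Claimed: grades boosted 2x, i.e. a 100% improvement. Replicated: 0.1%.
    print(replicates(100.0, 0.1))    # False: right direction, far too small
    print(replicates(100.0, 80.0))   # True: within the assumed 50% tolerance
```

A real policy would also have to account for the uncertainty in both estimates, not just the point values, but even a crude rule like this gives a bureaucracy something to act on.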
For (5), it was more of a question than anything else. Replication using originally collected data might be useful sometimes but the question is whether the general public would consider this to be a genuine cross-check. My guess is no. Science has no formal mechanisms to detect made up or fraudulent data sets, which is a real problem, so if you allow re-analysis of originally collected data you'll get a steady stream of situations where a study is announced, some skeptics say "uhhh that sounds wrong", the media/academic institutions beat up on them claiming it's peer reviewed and replicated which is gold standard so you've got to believe, and then it turns out the whole original study + replications were all based on fraud. What could be more damaging than that? A replication is a type of audit. These have to take into account the possibility of people playing games for profit.
> I guess we're using the word replication differently. You seem to be using it to mean a general re-review and re-analysis of everything - basically a funded more aggressive peer review followed by an actual re-running of the study sometimes, if it makes sense, whereas I'm using it to just mean re-running the study exactly as originally described to check the results are the same.
Yes, I think that this is the crux of the matter here, and I agree that merely funding direct, vanilla replication is not the optimal solution here and would not solve many of the problems we're facing. I'm really talking about the idea from the thread I linked in my original comment--the idea of "red teams" for science. There's probably a number of ways to accomplish this, and direct replication could play a role, but I imagine that a funding agency tasked with funding red teams for other government-funded research would lean towards a more diverse and holistic toolkit. You'd certainly need to have some mechanisms for "watching the watchers," but again, very few researchers I've met are stupid, many of them actually care pretty deeply about ethics and scientific integrity, and would probably jump at the chance for a steady paycheck while helping to advance that.
> Science has no formal mechanisms to detect made up or fraudulent data sets, which is a real problem [...]
This isn't strictly true (I'd like to encourage people not to use "Science" as a proper noun; I think it's actually a bit unhelpful). I think a common approach to performing research using the scientific method often (almost always?) involves using observation across multiple instances, then using statistics to draw generalized conclusions based on the group of observations. And statistics does have formal methods for identifying potentially fraudulent datasets, some of which Dr. Hilgard does a great job of explaining, using an actual case of research fraud he found, in this article[1].
Ah yes, Hilgard's article is great. I link to an article that links to it elsewhere in this thread. SPRITE and such are very clever but such ammo is very limited. As Hilgard himself points out, the first thing institutions do when faced with a claim of fraudulent data is immediately tell the accused everything. Then there's either no investigation at all or a useless one. All this approach does is teach fraudsters how to avoid detection, and let's face it, the moment there are any consequences, fraudulent scientists will just start running SPRITE and GRIM on their own papers before publication. Fighting fraud requires massive consequences for fraudsters, otherwise if they fail they'll just keep trying again until they learn to evade detection.
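For readers who haven't met GRIM: the idea is that if the underlying data are integers (Likert responses, counts), the mean of n values must equal some integer total divided by n, so many reported means are arithmetically impossible. A minimal GRIM-style sketch follows. The function name `grim_consistent` is made up, it assumes integer-valued data and a mean reported to a fixed number of decimals, and it ignores the ambiguity in how exact halves get rounded. The check is only informative when n is small relative to the reported precision (below 100 for two decimals).

```python
import math

def grim_consistent(reported_mean, n, decimals=2):
    """Can `reported_mean` be the rounded mean of n integer values?"""
    target = f"{reported_mean:.{decimals}f}"
    approx_total = reported_mean * n
    # The true integer total must be one of the neighbours of mean * n.
    for total in (math.floor(approx_total), math.ceil(approx_total)):
        if f"{total / n:.{decimals}f}" == target:
            return True
    return False

if __name__ == "__main__":
    print(grim_consistent(5.18, 28))   # True: 145 / 28 rounds to 5.18
    print(grim_consistent(5.19, 28))   # False: no integer total gives 5.19
```

As the comment above says, the obvious weakness is that anyone fabricating data can run the same check on their own numbers before submitting.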
Full blown red teams would be great, but seem very hard to set up. I've actually talked to British MPs about this and made some concrete proposals along these lines, but there are several blockers:
1. General apathy / lack of political will. Some of the more savvy MPs know their scientific advisors weren't reliable in COVID and understand the underlying issues, but the vast majority do not and would rather not think about it.
2. The moment you say let's set up a red team, you face the question of how to find high integrity well trained people that have exactly the same idea of what good science is as you do. Particularly problematic: do you have field-specific red teams? If so what stops Ferguson-style problems where the entire field agrees that it's unreasonable to even expect mathematical models to be replicable? You really need outsiders with the highest standards, who won't accept justifications like "We rewrote our dataset with a model because that's just how we do things in our field and who are you to argue with us?".
I think there's a meme about how physicists are always telling other fields they're doing it wrong. Maybe red teams need to be made of physicists :)
3. What exactly is Science, proper noun intended? To red team you need to have a very clear definition of what the scientific method actually is. Who should come up with this? I ended up suggesting a committee of MPs should do it even though that sounds wrong, because ultimately they're the ones authorizing the flow of money, and they're the ultimate outsiders. The moment you start picking insiders it's going to turn into a huge fight (who are these physics nerds to tell us we can't do experiments with 30 undergrads? etc)
Being on the science red team could also be really cool and fun. Since the goal is to explore the type of error or lie that gets through reliably, put new scientists on a team with some old greybeard, let's pass along that hard earned "how to screw up cleverly" experience.
>Being on the science red team could also be really cool and fun.
I think it depends on what you're investigating, and how much is at stake. I doubt it would be much fun to be put on a corporate hit list.
>The court was told that James Fries, professor of medicine at Stanford University, wrote to the then Merck head Ray Gilmartin in October 2000 to complain about the treatment of some of his researchers who had criticised the drug.
>"Even worse were allegations of Merck damage control by intimidation," he wrote, ... "This has happened to at least eight (clinical) investigators ... I suppose I was mildly threatened myself but I never have spoken or written on these issues."
Talk to people who have actually done it. Not one will tell you it's cool or fun. Here's how science red teaming actually goes:
1. You download a paper and read it. It's got major, obvious problems that look suspiciously like they might be deliberate.
2. You report the problems to the authors. They never reply.
3. You report the problems to the journals. They never reply.
4. You report the problems to the university where those people work. They never reply.
5. Months have passed, you're tired of this and besides by now the same team has published 3 more papers all of which are also flawed. So you start hunting around for people who will reply, and eventually you find some people who run websites where bad science is discussed. They do reply and even publish an article you wrote about what is going wrong in science, but it's the wrong sort of site so nobody who can do anything about the problem is reading it.
6. In addition if you red-teamed the wrong field, you get booted off Twitter for "spreading misinformation" and the press describe you as a right wing science denier. Nobody has ever asked you what your politics are and you're not denying science, you're denying pseudo-science in an effort to make actual science better, but none of that matters.
7. You realize that this is a pointless endeavour. The people you hoped would welcome your "red teaming" are actually committed to defending the institutions regardless of merit, and the people who actually do welcome it are all ideologically persona non grata in the academic world - even inviting them to give a talk risks your cancellation. The End.
An essay that explores this problem from the perspective of psychology reform can be found here:
One problem is that the amount of scientific output is increasing at an increasing rate.
This means that the vast, vast majority of works will never be considered for replication - even with a dedicated replication institute. So for most applicants, the number of replicated results will be 0.
it's the same as everything. there should be more and easier money for the less rewarding task of verification/replication. some people actually enjoy this sort of work just as much as some people enjoy being on the bleeding edge... but there are probably less of them.
where it would get complicated is also the same as everything. when the verification effort neither supports nor refutes the original one. many would argue that it means it wasn't done right, but lots of things aren't done right in life.
then there can be the triple replication revolution! so it goes...
This isn't intended as snarky, but I don't understand what "it's the same as everything" is supposed to mean. What is "it"? What is "everything"? Why are they the same?
I'd also argue that your reduction of this problem sort of misses the point. One of the big problems with the way that studies are done is not that replication efforts aren't conclusive (it's very difficult to prove something doesn't exist), it's that a) non-replicable studies are generally considered as valuable as replicable ones, and b) as a result, it's extremely difficult to replicate many studies to begin with, because there's no incentive to take the time to make it possible. Even if the end result of a replication paper is "we couldn't produce the same results", the people working on it can say "this author's experiments were exceedingly difficult to even try to reproduce," or conversely "we didn't get the same results, but their data collection methods and analysis code were well-documented and accessible." That has a lot of value!
If you tried doing triple replication for every paper, I agree that it maybe wouldn't be the best use of resources. But the current state of affairs is so bad that a well-organized drive to create single-attempt replication on a fraction of publicly-funded projects has the potential to be a significant driver of change.
"the same as everything" is an observation that often times verification/correctness/accuracy efforts are tossed aside in favor of new development and this is a truism across many fields. in science you see this as funding being committed to shiny new nature and science cover stories, with replication being left as an afterthought. in software you see this as heavy commitments to new features that drive revenue, with security/compliance/architecture and qa remaining underfunded and less respected. (until, of course, the problems that result from underfunding them make themselves apparent).
Funding replication is a great idea, but cannot solve this. It would require roughly as much funding as now goes to science merely to replicate the results produced now. That still leaves a rather hefty backlog. Moreover, the pace with which scientific output doubles is increasing. Off the top of my head, it would be below a decade nowadays.
Even if 90% of those would not need replication (new algorithms that work), then merely keeping pace would basically require 1 in 10 institutions to devote itself fully to replication studies. Even then we'd need more capacity to look at previous results - that 10% is fully needed to investigate new results.
Note that this is optimistic: I'd expect the percentage of publications where a reproducibility study makes sense to be above 50%.
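To put rough numbers on that, here is a back-of-the-envelope sketch. The function names are made up for illustration, and two things are assumptions rather than facts: that one replication costs about as much as the original study, and that annual output has grown exponentially with a fixed doubling time (the 10 years below is just a placeholder).

```python
import math


def replication_effort(fraction_needing_replication: float,
                       cost_ratio: float = 1.0) -> float:
    """Replication work as a fraction of current research effort.

    cost_ratio is an ASSUMED cost of one replication relative to the
    original study (1.0 means "costs the same").
    """
    return fraction_needing_replication * cost_ratio


def backlog_in_years_of_current_output(doubling_time_years: float) -> float:
    """Size of the accumulated literature, in years of today's output.

    ASSUMES annual output has always grown exponentially with the given
    doubling time T, so the integral of 2**(t/T) over t <= 0 is T / ln(2).
    """
    return doubling_time_years / math.log(2)


if __name__ == "__main__":
    # 10% matches the "1 in 10 institutions" case, 50% the pessimistic one,
    # 100% the "as much funding as now goes to science" case.
    for share in (0.10, 0.50, 1.00):
        print(f"{share:.0%} need replication -> "
              f"{replication_effort(share):.0%} extra effort")
    years = backlog_in_years_of_current_output(10)  # placeholder doubling time
    print(f"backlog: about {years:.1f} years of current output")
```

Under those assumptions the backlog alone is worth more than a decade of today's output, which is why keeping pace and catching up are separate problems.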
Worked on a NASA program once, about measuring Earth Science data. We built a database application to gather suggested requirements from members of the earth science community. One such member from our own team helped develop specs for our system.
After we built it, she wanted to measure its utility and usability. She watched as users navigated and entered data into the system. She also asked me and the members of my team who developed the software to use it and be measured. I and one other developer (2 of 3 members) explained why we implemented each feature as we were utilizing the system. The "scientist" measuring us all promptly published as a conclusion in her paper "The usability of the system was better for inexperienced users than it was for experienced users. The experienced users took nearly 50% longer to navigate and enter similar requirements".
She basically made up an "interesting" conclusion by omitting characterization of our testing session, where we explained how we implemented her requirements.
It's crazy to me that academic fraud isn't a more pressing concern to society in general and academia in particular. The scientific process as currently implemented is broken across every single discipline. Even subjects like CS that should in theory be trivially reproducible, are rarely so. The reproduction crisis is still going on, but only nerds like us care.
I think there's two problems impeding our ability to focus better on this:
1) for many people, the idea that science has widespread fraud is just hard to accept; in this respect it is similar to the difficulties that many religious communities have in accepting that their clergy could have a corruption problem
2) the solutions require thinking about problems like p-hacking, incentives, selection effects, and other non-trivial concepts that are tough for the average person to wrap their heads around.
I have. According to [1], "1 in 4 cancer research papers contains faked data". As the article argues, the standards are perhaps unreasonably strict, but even by more favorable criteria, 1/8 of the papers contained faked data. Interestingly, [2] using the same approach, found fraud in 12.4% of papers in the International Journal of Oncology. More broadly, [2] found fraud in about 4% of the papers studied (782 of 20,621). I'd say that's pretty widespread, but you further have to consider that these papers focused narrowly on a very specific type of fraud that is easy to detect (image duplication), so we would expect the true number of fraudulent papers to be much higher.
That's a fair point, although fraud per se is only a small part of all the problems. There's other forms of corruption than fraud, and a lot of it falls into this zone of plausible deniability rather than outright fraud. Also, I think the problems tend to find most weight with higher concentration of power, so what matters isn't as much "how widespread is corruption?" but rather "how is corruption distributed among power structures in academics and what is rewarded?"
I've often thought religious corruption is a good analogy, in that many of the societal dynamics are very similar. As I'm writing this the parallels are interesting to think about relative to US politics.
It is a good analogy. For most lay people, science is a religion. They lack the expertise to understand the theory, but they unquestioningly accept the explanations and interpretations of the so-called experts.
Most people don't understand astronomy and physics well enough to prove to themselves that the earth orbits the sun and not vice-versa. Yet they believe it does, with certainty, because they have been taught that it is true.
There are many causes of the lack of concern, but I think at the heart of the problem, at least in the US, is that science has become politicized such that attempts at reform are mischaracterized for political gain. There's also a bit of ignorance in the general public, but that's only part of it.
For example, if some on the right suggest some difficult but needed reforms, it tends to be spun as an attack on science. Or complaints that trivial projects are being overfunded get misinterpreted by the right and they try to make an example of the wrong studies for the wrong reasons.
The pandemic was a good example of this in my mind, in that I think there were some serious systematic problems in academics and healthcare that were laid brutally bare, and many people suffered or died as a result. But then the whole thing got misidentified and sucked into the political vortex and all you end up with are hearings about how to rehabilitate the CDC, as if that is the problem and not a symptom of even bigger problems.
I still think there are ways for things to change, but the most likely of them involve unnecessary suffering and chaos.
I can give a different perspective. It is not because of politicization IMO - at least not in the hard sciences. The problem comes from way up, from Congress because the immediate impact of science is not obvious. Especially, for basic sciences the impact takes decades to be really felt.
But then how do you do promotions? How do you judge output? Worse still, how does the US Congress justify spending taxpayer dollars? Rather than acknowledging that any short term measurement of the quality of science is a fool's errand, we have doubled down on meaningless metrics like impact factors and h-indices. And this is what we have as a result.
If you're any good at your chosen specialty you get a "feel" for the bullshit. I know this doesn't help the public. My experience is in medical research, crystallography, and computer science. Here's an example for detecting "bullshit" in cardiology: call up the MD PI from the published paper and ask to review anonymized charts from patients targeted with the procedure. Are there any? Then, the research is probably good; are there none? It's probably because it'd kill the patient. Similarly, in Programming Language Theory: we'd just ask which popular compilers added the pass. Is it on in -O3 in LLVM? Serious fucking result; is it in some dodgy branch in GHC? Not useful.
It's not really broken because fraud gets discovered eventually.
When I was working on my PhD, different labs were known to have different rigor. If a new discovery was made, skepticism was the norm. Until reproduced and proven, a new discovery was suspect.
I mean it's clear that there is benefit to stamping out fraud just because it's inefficient, but in the end, the scientific method filters it out.
One of those papers is about counting the scale and scope of online political advertising during the 2020 election. How does one reproduce that study? The 2020 election is long past, and that data isn't archived anywhere other than what the researchers have already collected. This is a pretty simple empirical data collection task, but you can't just re-measure that today because that study is about a moving target.
I worked on this problem 25 years ago. It hasn't gone away.
In general, performance comparisons are hard to reproduce. For instance, when benchmarking network protocols, often a tiny change in configuration makes a big change in the results. You might change the size of a buffer from 150 packets to 151 packets and see performance double (or halve).
Instead of making measurements with some arbitrary choices for parameters, you can take lots of measurements with parameters randomly varied to show a distribution of measurements. It's hard work to track down all the possible parameters and decide on a reasonable range for them, so it's rarely done. I found many 10x variances in network protocol performance (like how fairly competing TCP streams can share bandwidth).
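A minimal sketch of what that kind of randomized sweep could look like. Everything here is hypothetical: `toy_throughput` is a stand-in for a real benchmark run (it does not model any actual protocol, it just has a deliberate cliff), and the parameter names and ranges are made up.

```python
import random
import statistics
from typing import Callable, Dict, List, Tuple


def toy_throughput(buffer_packets: int, rtt_ms: float) -> float:
    """Stand-in for one benchmark run; NOT a model of a real protocol.

    It deliberately has a cliff: odd buffer sizes halve the throughput,
    so a single hand-picked configuration would be misleading.
    """
    base = 1000.0 / rtt_ms
    return base if buffer_packets % 2 == 0 else base / 2.0


def randomized_sweep(benchmark: Callable[[int, float], float],
                     buffer_range: Tuple[int, int],
                     rtt_range: Tuple[float, float],
                     trials: int = 200,
                     seed: int = 0) -> List[float]:
    """Run the benchmark with parameters drawn at random from the ranges."""
    rng = random.Random(seed)  # fixed seed so the sweep itself is repeatable
    results = []
    for _ in range(trials):
        buffer_packets = rng.randint(*buffer_range)
        rtt_ms = rng.uniform(*rtt_range)
        results.append(benchmark(buffer_packets, rtt_ms))
    return results


def summarize(results: List[float]) -> Dict[str, float]:
    """Report the distribution rather than a single number."""
    ordered = sorted(results)
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "max": ordered[-1],
        "max_over_min": ordered[-1] / ordered[0],
    }


if __name__ == "__main__":
    runs = randomized_sweep(toy_throughput, (100, 200), (10.0, 100.0))
    for name, value in summarize(runs).items():
        print(f"{name}: {value:.2f}")
```

The number to look at is `max_over_min`: if it is large, any single-configuration result from the same setup says very little.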
The big idea was to show that by randomizing some decisions in the protocol (like discarding packets with some probability as the buffer gets full) you can make the performance less sensitive to small changes. ie, more reproducible. Less sensitivity is especially good when you care about the worst-case performance rather than average. It can also make tuning a protocol much easier, since you aren't constantly being fooled by unstable performance.
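For anyone who hasn't seen probabilistic discard, here is a toy version. It is a sketch in the spirit of Random Early Detection style queue management, not the scheme from the work described above: the thresholds and `max_p` are made-up parameters, and it uses the instantaneous queue length where a real implementation would typically use a smoothed average.

```python
import random


def drop_probability(queue_len: int, min_th: int, max_th: int,
                     max_p: float = 0.1) -> float:
    """Drop probability that ramps up as the buffer fills.

    Below min_th nothing is dropped, at or above max_th everything is,
    and in between the probability rises linearly from 0 to max_p.
    All thresholds are illustrative assumptions.
    """
    if queue_len < min_th:
        return 0.0
    if queue_len >= max_th:
        return 1.0
    return max_p * (queue_len - min_th) / (max_th - min_th)


def should_drop(queue_len: int, min_th: int, max_th: int,
                rng: random.Random, max_p: float = 0.1) -> bool:
    """Randomized decision for a single arriving packet."""
    return rng.random() < drop_probability(queue_len, min_th, max_th, max_p)


if __name__ == "__main__":
    rng = random.Random(0)
    for qlen in (40, 100, 149, 150):
        p = drop_probability(qlen, min_th=50, max_th=150)
        print(f"queue={qlen:3d}  p(drop)={p:.3f}  "
              f"dropped={should_drop(qlen, 50, 150, rng)}")
```

The point of the ramp is that no single queue length acts as a hard cliff for early drops, so nudging a parameter by one packet changes the drop rate only slightly.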
Performance sensitivity analysis is hard work, so most papers are just like "we ran our new thing 3 times and got similar numbers so there you go."
I always thought it would be a good idea to start a journal that has a lab submit their methods and intent of study for peer review and approval / denial PRIOR to performing the work. Then, if approved, and as long as they adhere to the approved methods, they get published regardless of outcome. That would really encourage the publishing of negative results and eliminate a lot of the incentive to fudge the numbers on the data. It would probably overly reward pre-existing clout, but frankly that's a problem ANYWAY.
This is done with clinical trials (or at least it's recommended). Many researchers register their study at https://clinicaltrials.gov/ before data collection starts. I'm not sure if something similar exists for lab based research.
As long as there are quantitative criteria on which jobs and promotions depend, there will be people gaming the system.
One solution is to couple these quantitative criteria with independent committees that assess people beyond the metrics, but that requires a lot of human effort and is not scalable.
Assessing people in ways that don’t scale seems to be the way to avoid this gaming trap in academia.
I'd argue that it's not just that the metrics don't scale but the problem is that we are trying to find quantitative metrics for something that can't be easily quantified. The worst outcome is not even the forgeries and fakes as in this article, but more that even the vast majority of ethical academics are being pushed into a direction that is detrimental to longterm scientific progress, in particular short term outcomes instead of longterm progress.
> even the vast majority of ethical academics are being pushed into a direction that is detrimental to longterm scientific progress, in particular short term outcomes instead of longterm progress
I agree. The egregious fraud is just the high-sigma wing. It's a symptom, but the real problem is how it affects the majority.
It’s why one needs both. You need both undeniable productivity by quantitative metrics, as well as glowing reviews from independent panels that are not influenced by favoritism (almost like an audit).
I was thinking it's much like Google trying to deal with SEO. Most people optimize for high google ranking, not quality content.
Google periodically changes the evaluation, in theory to reward good content and penalize bad, but people still try to game the system, rather than improving content.
Could it be there's just too much science being done for much of it to be any use? And that this oversupply causes these schemes, as a side-effect? If so, selling authorship is merely a symptom of the worthlessness of most modern science.
For much of human history, science was something you did in your spare time - or, if you were exceptional, you might have a patron. Then nation states discovered the value of technology and science, and wanted more, and so have created science factories. But, perhaps unsurprisingly, the rate of science production cannot really be improved in this way, and yet the economics of science demand that it does. This disconnect between reality and expectation is the root of this problem, and many others.
Something I was wondering is if faking results is so common, then surely these things they are researching must never be used in any application right? If they were, it would quickly be found that it does not actually work...
This is exactly how it works in practice. Anyone who works at the bench learns quickly to spot the frauds and fakes and avoids them. That's the "replication" everyone talks about, no special agency to waste funds on boring stuff needed.
> If they were, it would quickly be found that it does not actually work...
Unfortunately some of the effect sizes are so small that it's hard to tell what's working or not. The results of papers on body building, for instance, are definitely put into practice by some people. If the claim of the paper is that eating pumpkin [EDIT] decreases muscle recovery time by 5%, how is an individual who starts eating pumpkin supposed to notice that he's not getting any particular benefit from following its advice? Particularly if he's also following random bits of advice from a dozen other papers, half of which are valid and half of which are not?
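To give a feel for how invisible a 5% effect can be, here is a rough sample-size sketch using the textbook normal-approximation formula for comparing two means. The 20% run-to-run noise figure is purely an assumption for illustration; I have no real data on recovery-time variability.

```python
import math
from statistics import NormalDist


def samples_per_group(effect: float, noise_sd: float,
                      alpha: float = 0.05, power: float = 0.80) -> int:
    """Measurements needed per condition to detect a difference in means.

    Uses the standard two-sided, two-sample normal approximation:
        n = 2 * ((z_(1 - alpha/2) + z_power) * sd / effect) ** 2
    effect and noise_sd must be in the same units (here: fractions of the
    baseline recovery time).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_power = z.inv_cdf(power)
    n = 2.0 * ((z_alpha + z_power) * noise_sd / effect) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # ASSUMPTION: recovery time varies by about 20% from one measurement
    # to the next; the claimed effect is a 5% reduction.
    n = samples_per_group(effect=0.05, noise_sd=0.20)
    print(f"about {n} measurements with and {n} without")
```

With those made-up numbers you would need a couple of hundred measurements in each condition, which no individual following a dozen bits of advice at once is ever going to collect.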
One problem I've observed is that people applying things often cargo-cult "proven" things from the scientific literature that aren't actually proven. It's easier to say that you're following "best practices" than it is to check that what you're doing works, unfortunately.
Or a government printing money to hire private contractors, completely disregarding its ability to do anything on its own.
To some extent, this is the curse of being the creator of the global reserve currency. The US government can, in theory, print as much money as it wants and pay off whoever it wants to do whatever it wants. This also extends to the academic and financial (VC) sectors, because a lot of that liquidity comes directly from the Government/Fed.
Unfortunately this leads to a culture of corruption (who gets the grants/contracts/funding?) and widespread fraud. This causes the ROI of money printing to go down, the money printer accelerates and we get inflation too.
This is in fact incredibly wrong. At least in my field, there is so much more data than there are qualified experts to analyze it. For one reason, academia pays so much less than the private sphere that post docs are leaving.
Maybe it's a deeper problem related to western liberal notions that anyone can do anything if they just "set their mind to it". We have a glut of "professionals" across industries and institutions who don't really have any business being there, but the machine requires that they appear to be useful, and so mechanisms emerge to satisfy this constraint. A consequence in science is a long list of poor quality junk publications, and few people are willing to acknowledge the nakedness of the emperor, not only for fear of losing their positions, but because doing so may betray their own redundancy.
I think there are valid reasons why people don't trust science. It's not that we think scientists are malicious, it's just that they can be lazy and incompetent like people in other careers. In addition, while incentives are clear in industry, they are either not clear or completely broken in academia, so the quality of the work depends too much on individual integrity.
I was a chemistry researcher working on renewables, and during my master's thesis 9 months were spent validating fake results (from a publication of a scientist who worked in our group moreover).
This cannot be allowed. If institutions allow scientists to publish fake papers, then science itself will fail. I hope you raised hell about this and got the researcher fired.