
One thing I'd like to see is a requirement that for all government-funded research, a certain percentage of that funding, say 30%, must go toward replicating other publicly funded research that fewer than two independent, non-affiliated labs have replicated. Original research couldn't be published until at least two independent, non-affiliated labs had replicated it from the submitted paper and reported results that could then be included alongside the original paper. I'd like to see this across all of academia, but I imagine there are enough challenges with enforcing it productively that doing so across all research becomes both impractical and hard to protect from abuse. But at least with public funds, it would be nice to put in some checks to reduce the amount of fraudulent or sloppy research that taxpayers pay for.



I should point out that the notion of "replication" can often be way more difficult and nuanced than people expect. For one, what is the scope of the replication? Would it be simply to re-run the analysis on the data and make sure the math checks out? Or would it be to re-collect the data according to the methods described by the original researchers?

The former is pretty easy, but only catches errors in the analysis phase (i.e., the data itself could be flawed). The latter is very comprehensive, but you essentially have to double up the effort on re-doing the study---which may not always be possible if you're studying a moving target (e.g., how the original SARS-CoV-2 variant spread through the initial set of hosts).


Here's an even simpler set of requirements to address the first case:

- Require all research to publish their source code.

- Require all research to publish their raw data minus "PII".

* Note: I use "PII" here with the intention of it taking the most liberal meaning possible, where privacy trumps transparency absolutely and where de-anonymization is impossible. This would rule out a lot of data, and personally I think we could take a more balanced approach, but even this minimalist approach would be a vast improvement on the current situation.


When I learned at university that not all published research, especially government-funded research, already does this, I was dumbfounded.


"Not all" is a big understatement... I would estimate that less than 0.00001% of published research does this. Every time I talk about this to someone (colleagues in adjacent fields, PIs...), they seem to give zero pucks. It's really mind-boggling.


> re-run the analysis on the data and make sure the math checks out?

That isn't a replication in any meaningful sense. But a replication can certainly take many forms. An exact replication is one; another is a conceptual replication, which studies the same effect but with a different design; or you can combine these with a new analysis that pools the data from the original study and the new one, with (possibly) improved statistical analysis.


Be aware that despite how much focus replicability gets, it's only one of many things that goes wrong with research papers. Even if you somehow waved a magic wand and fixed replicability perfectly tomorrow, entire academic fields would still be worthless and misleading.

How can replicable research go wrong? Here's just a fraction of the things I've seen reading papers:

1. Logic errors. So many logic errors. Replicating something that doesn't make sense just leaves you with two things that don't make sense, plus a waste of time and money.

2. Tiny effect sizes. Often an effect will "replicate" but with a smaller effect than the one claimed; is this a successful replication or not?

3. Intellectual fraud. Often this works by taking a normal English term and then at the start of your paper giving it an incorrect definition. Again this will replicate just fine but the result is still misinformation.

4. Incoherent concepts. What exactly does R0 mean in epidemiology and precisely how is it determined? You can replicate the calculations that are used but you won't be calculating what you think you are.

5. A lot of research isn't experimental, it's purely observational. You can't go back and re-observe the things being studied, only re-analyze the data they originally collected. Does this count?

6. Incredibly obvious findings. Wealthy parents have more successful children, etc. It'll replicate all right but so what? Why are taxpayers being made to fund this stuff?

7. Fraudulent practices that are nonetheless normalized within a field. The article complains about scientists Photoshopping western blots (a type of artifact produced in biology experiments). That's because editing your data in ways that make it fit your theory is universally understood to be fraud ... except in climatology, where scientists have developed a habit of constantly rewriting the databases that contain historical temperature records. And by "historical" we mean "last year" here, not 1000 years ago. These edits always make global warming more pronounced, and sometimes actually create warming trends where previously there were none (e.g. [1]). Needless to say climatologists don't consider this fraud. It means if you're trying to replicate a claim from climatology, even an apparently factual claim about a certain fixed year, you may run into the problem that it was "true" at the time it was made and may even have been replicated, but is now "false" because the data has been edited since.

Epidemiology has a somewhat similar problem - they don't consider deterministic models to be important, i.e. it may be impossible to get the same numbers out of a model as a paper presents, even if you give it identical inputs, due to race conditions/memory corruption bugs in the code. They do not consider this a problem and will claim it doesn't matter because the model uses a PRNG somewhere, or that they "replicated" the model outputs because they got numbers only 25% different.
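To make the determinism point concrete, here's a minimal sketch (a made-up toy model in Python, not any real epidemiology package): if every source of randomness flows from one explicitly seeded PRNG and there's no unsynchronized parallelism, identical inputs produce bit-identical outputs on every run, which is what lets a replicator check a paper's numbers exactly.

    import random

    def toy_outbreak(r0: float, days: int, seed: int) -> int:
        """Hypothetical toy model: cumulative case count after `days` days."""
        rng = random.Random(seed)            # one explicit, seeded PRNG
        per_day_p = min(r0 / days, 1.0)      # crude per-day transmission probability
        infected, total = 1, 1
        for _ in range(days):
            new = sum(1 for _ in range(infected) if rng.random() < per_day_p)
            infected = max(new, 1)
            total += new
        return total

    # Same inputs + same seed -> identical output, run after run.
    assert toy_outbreak(2.5, days=30, seed=42) == toy_outbreak(2.5, days=30, seed=42)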

What does it even mean to say a claim does or does not replicate, in fields like these?

All this takes place in an environment of near total institutional indifference. Paper replicates? Great. Nobody cares, because they all assumed it would. Paper doesn't replicate, or has methodological errors making replication pointless? Nobody cares about that either.

Your proposal suggests blocking publication until replication is done by independent labs. That won't work, because even if you found some way to actually enforce that (not all grants come from the government!), you'll just end up with lots of papers that can be replicated but are still nonsensical for other reasons.

[1] https://nature.com/articles/nature.2015.17700


If we study something that seems obvious and it's confirmed and replicated, now we actually know what everyone "knew." If common knowledge turns out to be wrong, we strike a false belief and add a true(er) one—even better.


And of course, you would be free to fund such studies with your own money, but so much science is taxpayer funded or subsidized in various ways that the ROI has to be treated as important.

"That's academic" is already a mild insult meaning useless or irrelevant, but that perception never had any impact on academia so far. The risk for the academy is that negative feelings grow, and then people start wondering why they're paying for so many studies where they're either shoddy or obvious. The justification for public funding is really only studies that:

a. Yield non-obvious conclusions.

b. Correctly.

c. And which wouldn't have been funded by industry.

It's possible that the set of such studies is small, and in some fields there are probably zero such studies (e.g. my bête noire, twitter bot research).

Today the risk to academia is low because the political elites in western countries have all self-selected through credentialism and university-based social networking, more or less. But all it takes is one populist candidate building a big enough base, and universities may find themselves in the firing line with little in the way of defense. It'd be better to prune the obvious studies now.


> Even if you somehow waved a magic wand and fixed replicability perfectly tomorrow, entire academic fields would still be worthless and misleading.

Even ignoring how, frankly, paternalistic and condescending this sentence comes off, let's take the rest of your comment at face value.

First, I can agree with you that the parent comment's idea of not allowing publication without replication is a non-starter; it would basically be impossible to implement (in the U.S., you'd immediately run into 1st Amendment issues), and on a practical level it doesn't match any realistic scenario of how research is or could be conducted. Anyone who can get published should be welcome to do so.

However, I think that the idea of funding a subset of replication studies would have great value, and your list of "issues" is largely knocking down strawmen. Let's look at this from the point of view of a funding agency, which annually pours billions of dollars into research.

1. Logic errors would be great! A replication report stating "the conclusions of this research are deeply invalid due to x,y, and z logic errors here, here, and here" would make the agency less likely to fund authors that make those errors. If the errors don't invalidate the conclusions, they're still worth noting and can be taken into account.

2. Effect sizes varying in replicated studies is still useful information. Not all "does/doesn't replicate" questions need a binary yes/no answer.

3. Similarly to #1, catching and publishing instances where authors are making invalid claims (even if not doing invalid research) seems like a good thing. "Hey, these researchers have a tendency to claim X when really they're talking about Y. What the heck?" is great information for a funding agency to have.

4. See #1 and #3. Generally, the authors of replication studies are also going to be intelligent; if they're being paid to check for logical fallacies and incoherent concepts, they'll catch a lot of them (one can imagine a required first-year grad course that lays out these ideas, with case studies on how to catch them).

5. I've been trying to think of situations where this still wouldn't produce useful meta-information, and I'm having a hard time coming up with any. Imagine a fish survey where divers count fish along transects. You can't go re-observe those fish, but a) research groups might change their methods, say by using cameras instead of marking dive slates, and b) if someone can re-run the transects thanks to good record-keeping (yay, replication!) but doesn't see the same fish, the funding agency now knows both that the original group's experimental methods were solid, and that it should either keep funding groups to run that transect because of the variation, or stop funding groups that base conclusions on a single transect's data. Happy to consider examples where this breaks down, outliers like building a second LHC aside.

6. Yes, maybe don't fund groups to try replicating these findings. Or maybe do occasionally anyway--if the research groups are doing good work, it should be cheaper to conduct replication-based research than doing a novel one anyway.

7. This is a good argument for replicable research. A huge number of research proposals go unfunded, which is to say that many researchers are effectively competing for funding. In that paradigm, being able to say "hey, my group found and thoroughly documented a huge, glaring issue in the methods used in these studies due to a bad-faith corruption of historical data" becomes valuable, because agencies would be looking to pay groups for valid claims. Increasing the number of replication studies adds a greater adversarial capacity to the research system, which helps to catch or prevent this kind of fraud in the first place.

This is already a long comment, but I can make similar points about almost all of your claims. In summary: you seem to be making the assumption that replication studies are basically done by robots blindly following an instruction manual of someone else's research. Especially given the funding incentives mentioned above, I don't think that would be true; rather, many errors would be caught (or prevented ahead of time) and lead to better, more solid research being done in the first place.


To be clear, I'm not against funding replication studies. That's a great thing that should be done. The risk is that if it were done people would think - great! There was a problem but science is fixed now. That wouldn't be true. The clear majority of papers I've read that were bad/unusable in some way in the past 10 years wouldn't have been helped by funding replication studies.

> (1) (3) (4) you seem to be making the assumption that replication studies are basically done by robots blindly following an instruction manual of someone else's research.

I guess we're using the word replication differently. You seem to be using it to mean a general re-review and re-analysis of everything - basically a funded more aggressive peer review followed by an actual re-running of the study sometimes, if it makes sense, whereas I'm using it to just mean re-running the study exactly as originally described to check the results are the same.

I think in science the term replication normally just means re-running the study exactly as described. It doesn't mean arguing with the authors over their definitions or the logical basis of the study itself. I've seen a bunch of cases where a replication fails and the original study team basically rejects the whole exercise by saying "They didn't follow our instructions so of course they got different results". And I mean, that's kind of fair, right? If you say "I did X and saw Y", and then someone else says it's not true but they didn't actually do X, surely the scientists have every right to be annoyed and reject the exercise? So good replications are exactly what they sound like - a replication of the original process. They aren't generalized adversarial funded peer reviews.

And that's why it's important to highlight that replication is only one aspect. If Congress or whomever goes and earmarks money for replication, then someone with money will eventually ask you to replicate a study with a flawed design. What do you do? You could:

a. Say no: the study is flawed. Replication would be useless because you'd just be repeating a nonsensical procedure. You get no money.

b. Take the money and re-run it. The study conclusions are still wrong but now you got a paper published, and the original is successfully replicated so journalists/professors will use that as a stamp of approval. You may even want the study to replicate because it's professionally or ideologically useful.

c. Try to 'fix' the original design and do an improved study. This just starts the process over from scratch, it doesn't yield evidence.

It'll be (b) or (c) of course, every time. That's just how the incentives are set up, and besides, doing another study doesn't require involving the original authors. The moment you go down route (a) you're going to get pushback. They won't agree their definitions are illogical. So it'll devolve into an exchange of letters that nobody ever sees or cares about, and granting agencies won't know how to value it.

> 2. Effect sizes varying in replicated studies is still useful information. Not all "does/doesn't replicate" questions need a binary yes/no answer.

But the bureaucracy needs actionable outcomes, otherwise there's no point. If someone can point at a replication and say "ah yes, we claimed our educational intervention would boost grades by 2x in 15-year-olds and governments spent money on that basis, but a replicated 0.1% improvement nonetheless proves us right", then nobody outside the system will consider this a valid replication. There must be actual consequences from a failure to replicate, meaning you'd have to draw the line at how much delta is allowable from the original numbers, but nobody is even having that discussion, let alone doing anything about it.
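To make "draw the line" concrete, here's a minimal sketch of what a pre-registered, actionable criterion could look like (the function, the 50% threshold, and the categories are made-up illustrations, not anyone's actual policy):

    def replication_verdict(original_effect: float, replicated_effect: float,
                            min_fraction: float = 0.5) -> str:
        """Classify a replication by how much of the original effect survives."""
        if original_effect == 0:
            return "original claimed no effect; compare absolute effect sizes instead"
        ratio = replicated_effect / original_effect
        if ratio >= min_fraction:
            return f"replicates ({ratio:.0%} of the original effect)"
        if ratio > 0:
            return f"shrinks to {ratio:.1%} of the original -> counts as a failure"
        return "effect vanishes or reverses -> counts as a failure"

    # The example above: a claimed 2x (+100%) grade boost vs a replicated +0.1%.
    print(replication_verdict(100.0, 0.1))   # shrinks to 0.1% of the original -> counts as a failure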

For (5), it was more of a question than anything else. Replication using originally collected data might be useful sometimes but the question is whether the general public would consider this to be a genuine cross-check. My guess is no. Science has no formal mechanisms to detect made up or fraudulent data sets, which is a real problem, so if you allow re-analysis of originally collected data you'll get a steady stream of situations where a study is announced, some skeptics say "uhhh that sounds wrong", the media/academic institutions beat up on them claiming it's peer reviewed and replicated which is gold standard so you've got to believe, and then it turns out the whole original study + replications were all based on fraud. What could be more damaging than that? A replication is a type of audit. These have to take into account the possibility of people playing games for profit.


> I guess we're using the word replication differently. You seem to be using it to mean a general re-review and re-analysis of everything - basically a funded more aggressive peer review followed by an actual re-running of the study sometimes, if it makes sense, whereas I'm using it to just mean re-running the study exactly as originally described to check the results are the same.

Yes, I think that this is the crux of the matter here, and I agree that merely funding direct, vanilla replication is not the optimal solution and would not solve many of the problems we're facing. I'm really talking about the idea from the thread I linked in my original comment--the idea of "red teams" for science. There are probably a number of ways to accomplish this, and direct replication could play a role, but I imagine that a funding agency tasked with funding red teams for other government-funded research would lean towards a more diverse and holistic toolkit. You'd certainly need some mechanisms for "watching the watchers", but again, very few researchers I've met are stupid; many of them actually care pretty deeply about ethics and scientific integrity, and would probably jump at the chance for a steady paycheck while helping to advance that.

> Science has no formal mechanisms to detect made up or fraudulent data sets, which is a real problem [...]

This isn't strictly true (and I'd like to encourage people not to use "Science" as a proper noun; I think it's actually a bit unhelpful). I think a common approach to performing research with the scientific method often (almost always?) involves making observations across multiple instances, then using statistics to draw generalized conclusions from that group of observations. And statistics does have formal methods for identifying potentially fraudulent datasets, some of which Dr. Hilgard does a great job of explaining, in the context of an actual case of research fraud he found, in this article [1].

[1] https://crystalprisonzone.blogspot.com/2021/01/i-tried-to-re...
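For a flavor of what those checks look like, here's a minimal sketch of the GRIM test (the function name and rounding tolerance are my own choices): it flags reported means of integer-valued data, such as Likert responses, that are arithmetically impossible for the stated sample size.

    import math

    def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
        """True if some sum of n integers could round to the reported mean."""
        # The true total must be an integer near reported_mean * n; check both neighbours.
        candidates = {math.floor(reported_mean * n), math.ceil(reported_mean * n)}
        return any(round(t / n, decimals) == round(reported_mean, decimals) for t in candidates)

    # A mean of 3.47 from n = 18 integer responses is impossible (no integer sum
    # divided by 18 rounds to 3.47), so it gets flagged; 3.50 is fine (63 / 18 = 3.50).
    print(grim_consistent(3.47, 18))   # False
    print(grim_consistent(3.50, 18))   # True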


Ah yes, Hilgard's article is great. I link to an article that links to it elsewhere in this thread. SPRITE and such are very clever but such ammo is very limited. As Hilgard himself points out, the first thing institutions do when faced with a claim of fraudulent data is immediately tell the accused everything. Then there's either no investigation at all or a useless one. All this approach does is teach fraudsters how to avoid detection, and let's face it, the moment there are any consequences, fraudulent scientists will just start running SPRITE and GRIM on their own papers before publication. Fighting fraud requires massive consequences for fraudsters, otherwise if they fail they'll just keep trying again until they learn to evade detection.

Full blown red teams would be great, but seem very hard to set up. I've actually talked to British MPs about this and made some concrete proposals along these lines, but there are several blockers:

1. General apathy / lack of political will. Some of the more savvy MPs know their scientific advisors weren't reliable in COVID and understand the underlying issues, but the vast majority do not and would rather not think about it.

2. The moment you say let's set up a red team, you face the question of how to find high-integrity, well-trained people who have exactly the same idea of what good science is as you do. Particularly problematic: do you have field-specific red teams? If so, what stops Ferguson-style problems where the entire field agrees that it's unreasonable to even expect mathematical models to be replicable? You really need outsiders with the highest standards, who won't accept justifications like "We rewrote our dataset with a model because that's just how we do things in our field and who are you to argue with us?".

I think there's a meme about how physicists are always telling other fields they're doing it wrong. Maybe red teams need to be made of physicists :)

3. What exactly is Science, proper noun intended? To red team you need to have a very clear definition of what the scientific method actually is. Who should come up with this? I ended up suggesting a committee of MPs should do it even though that sounds wrong, because ultimately they're the ones authorizing the flow of money, and they're the ultimate outsiders. The moment you start picking insiders it's going to turn into a huge fight (who are these physics nerds to tell us we can't do experiments with 30 undergrads? etc)



