Rebuilding after the replication crisis (asteriskmag.com)
125 points by ignored on Nov 24, 2022 | 73 comments



>One of my formative experiences as a PhD student, in 2011, was submitting a replication study to the Journal of Personality and Social Psychology, only to be told that the journal did not publish replications under any circumstances (you might be thinking, “WTF?” — and we were too).

A while back, I asked a psychology professor why replication studies were frowned upon. She said something like "studies need to be unique and never been done before" and "there's no money in replicating someone else's work". That's why replication studies are frowned upon.

If that's true, then I'm guessing we'll continue to get more Amy Cuddys popping out of the social sciences.


Thanks for mentioning Amy Cuddy. I remembered her TED talk but hadn’t heard about her since. In the meantime, however, I’ve been exposed to the academic world of social psychology, and I’m not surprised by the Amy Cuddy phenomenon. I’ve seen a fair share of professors using a social psychology course to simply preach what could be called progressive ideology. The funny thing is I entered the field with progressive views myself, but I’m growing more and more confused seeing the bad faith on display.


It probably takes a different type of academic to do replication studies. It's a specific kind of detective work: seeing if you can expose someone else's mistakes, which are sometimes even deliberate.

We need more people of this type.

EDIT: In addition to this, perhaps we should encourage new students to do a replication study as part of their education.


It takes the sort of person who doesn't care about collecting a bunch of enemies, possibly well-connected ones.


Also a person that doesn't care about ever being published in "approved" journals, magazines, etc.


IMO, this would actually be a great target for public funding: a large international research organization with the sole purpose of replicating, validating and criticising existing research.

As you correctly point out, the skill set is slightly different from doing novel research, and the attitude is different as well, so it would make sense to pool talent. Since there is no money to be made directly from it, governments should step in. And to avoid the problem of journals not wanting to publish replications, the institute would have to be well funded enough to self-publish with prestige.


What does money matter once a study is already done? Finding the right incentive balance between novelty and rigor isn't trivial, I get the debate surrounding grant funding. But there should be 0 debate from the perspective of what is publishable, given that it's already done.

My guess is part of the resistance is fear from established PIs that their work won't actually replicate. Even if a minority of profs, I suspect some of the most famous and powerful ones have cut corners or equivocated at one point or another, to get to where they are. It's easy for many other PIs to then get swept up in the "opinion of the field".


Journals want highly-cited papers because that boosts their impact factor, makes them more popular and therefore makes them, yes, money.

A paper claiming a new result generally gets more citations than a paper saying "we replicated the study of Doe et al. and reproduced the results", even if the latter is equally or more useful from a scientific standpoint.


And it's much more deeply rooted in human nature than merely the design of peer review or the quantitative metrics used for evaluating scientists.

Indeed, it is not just about money either. Intangible prestige and status among the scientific expert community is just as much, or even more, coveted. Do people read your papers and talk about them at dinner parties? Do you get invited to give talks at prestigious institutions? Do a lot of interesting and similarly active people turn out to your talks? Do people with good connections and resources want to collaborate with you on exciting ideas?

And it turns out that what people including scientists actually care about is novel, bold, visionary ideas, not drone-like repetitive meticulous detail-oriented work following the footsteps of some other group. People want something new, something cool, something flashy, something sexy, something surprising. Not just the media! Scientists themselves, too!


Most scientists only want something flashy within the confines of the "cookbook" they've already learned on how to do studies. NIH grant applications are formulaic as hell.

Moreover, many labs organize around a methodology like fMRI rather than around any particular type of question. Imagine planning a multi-decade future career around a single tool that exists today and then pretending you truly care about novelty.

You're spot on that people care deeply about academic status. But what is valued in gaining that status has become deeply broken. The fastest way to status is to put out multiple overhyped individual publications that meet the minimum viable novelty threshold for inclusion in a good journal. Half of the battle is a marketing game.

Increased rigor is a time and money commitment that many aren't willing to make, but true novelty is a much bigger risk and a good way to kill a career for anyone not yet tenured.


My larger point is that this is a general human problem, not a science-specific one. Voters listen to the flashy corrupt demagogue politician who speaks to emotions, not the boring one who speaks in nuance and works transparently. In dating, people complain about the other gender being shallow and overlooking deeper values. On TV, people watch garbage reality shows, so those make the most money. On YouTube, people click thumbnails with obnoxious facial expressions, so those win out. In the movies, the safe bet is to churn out films from the same franchises as before (not unlike the scientist who barely changes things between papers).

It's not gonna change, one has to learn to accept it and to adapt to it.


I think this is less clearly true than the funding-hype imbalance, though. I've seen PIs complain that a meta-analysis took citations away from their paper that was included in said meta-analysis. A replication study that is well written, particularly if it synthesizes replication results of multiple studies, could overtake the original work in citations over time IMO. Doubly so if published in a journal that is already fairly reputable.


What is the significance of Amy Cuddy to this discussion?


Per her Wikipedia [0], she authored papers about the debunked power posing.

> The theory is often cited as an example of the replication crisis in psychology, in which initially seductive theories cannot be replicated in follow-up experiments.

[0]: https://en.m.wikipedia.org/wiki/Amy_Cuddy


And is still making money off of motivational speaking based on it.


This might explain Amy Cuddy better: https://en.wikipedia.org/wiki/Power_posing


With so many of these stories, is it any wonder that people now refer to this as Science™?


Not only the social sciences. Except for the very visible areas of ML, the same happens in actual science and engineering...

Example: the thousands of fraudulent XRD spectra of made-up compounds.


Even in ML, it's common knowledge that the long tail of papers demonstrates brittle effects that don't really replicate or generalize. They often use incomparable evaluations, fiddle with hyperparameters to fit the test data, use various evaluation tricks (Goodhart's Law) to improve the metrics, sometimes don't cite better prior work, etc. etc. Industry people definitely know not to just take a random ML paper and believe that it has any use for applications.

This isn't to say there are no good works, but in a field that produces >10,000 papers per year, the bulk of it can't be all that great, but academics have to keep their jobs, PhD students have to graduate etc. So everyone keeps pretending.
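The "fiddle with hyperparameters to fit the test data" point can be shown with a toy simulation (an illustration only, not drawn from any paper). The features are pure noise and the labels are coin flips, so no model can genuinely beat 50%. Picking the best of many random linear classifiers by their score on the test set still yields an impressive-looking number that vanishes on fresh data. The sizes (200 test points, 10 features, 500 candidate "settings") are arbitrary assumptions.

```python
import random


def make_data(rng: random.Random, n: int, d: int):
    """Pure-noise features and coin-flip labels: there is nothing to learn."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [rng.randint(0, 1) for _ in range(n)]
    return X, y


def accuracy(w, X, y) -> float:
    """Accuracy of the linear rule 'predict 1 if w . x > 0, else 0'."""
    correct = 0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi))
        correct += int((score > 0) == (yi == 1))
    return correct / len(y)


rng = random.Random(42)
d = 10
X_test, y_test = make_data(rng, 200, d)     # the benchmark everyone reports on
X_fresh, y_fresh = make_data(rng, 5000, d)  # new data from the same (noise) source

# 500 hypothetical "hyperparameter settings", each just a random linear classifier.
candidates = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(500)]

# The Goodhart step: keep whichever setting scores best on the test set itself.
best = max(candidates, key=lambda w: accuracy(w, X_test, y_test))

test_acc = accuracy(best, X_test, y_test)
fresh_acc = accuracy(best, X_fresh, y_fresh)
print(f"reported test accuracy: {test_acc:.3f}")   # noticeably above 0.5
print(f"accuracy on fresh data: {fresh_acc:.3f}")  # back near 0.5
```

A final test set that is looked at only once, after all tuning is done, is the usual guard against this.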


Those papers are not "very visible" ML (NeurIPS and co.), but domain-specific "applications", and there are tons of them (as I am acutely aware).


What I wrote also applies to most papers at top tier conferences, like Neurips and CVPR. There are thousands of papers published per year even just in those top conferences. What gets picked up by the media or even just reaches the HN crowd is just a small tip of the iceberg.


> Example: the thousands of fraudulent XRD spectra of made-up compounds.

Interesting, didn't hear about this case before. Can you provide a link?



Useful background, if you aren't familiar:

https://en.wikipedia.org/wiki/Replication_crisis

Noteworthy is that the crisis is a huge deal in Psychology - a field which "real" scientists were sneering at a century or more ago.

Personal anecdote:

I once knew a guy who majored in Psychology at a pretty prestigious U.S. research university, back in the mid-1980s. He said that the Psych Dept. there did a big survey of Psych undergrads, asking what they thought of the subject. The most significant finding? That the Psych majors thought the first 2 years of Psych classes were real facts about the real world. But after that, they thought that it was all bullsh*t, and learning how to create and spew bullsh*t yourself. (The guy went on to law school, and was quite successful. Which could be interpreted in interesting ways.)


That sneering misses the mark, IMO.

Other fields definitely have similar problems: Amgen found that they could only replicate about half of the cancer biology papers they tested.

The "replication crisis" showed up in psychology first for a few reasons. First, psychology probably runs the most replications of any field. The experiments themselves are easy to reproduce: all you need to redo a survey project is a Qualtrics account, or a photocopier. They can often be done fairly quickly, making replications a good "warm-up" project for a new grad student. The field also seems to value replications, at least a little.

Second, psychology is actually quite hard! Some failed replications surely reflect sloppy or sketchy research practices, but many of them probably reflect uncontrolled variables/variance. A physicist can produce an endless stream of photons, all of which are exact copies and totally independent of each other. Psychologists have nothing like that. People's reactions are shaped by their own idiosyncratic biases, their past history, and even what they think the experimenter wants to see. It's often not clear what variables actually matter, and in that sense, "failed" replications are sometimes more interesting than successes.


Not going to comment on psychology directly, because I don't know enough about it, but there are some fields where there really is no replication crisis.

In structural engineering we literally tore metals apart to find their physical properties. Concrete was mixed. Giant crushing machines were used. Etc. Sometimes I think that half the reason why some fields don't have a replication crisis is because literal lives are on the line if you get it wrong, so people actually do the work to make the testing gear and align the incentives that need to be aligned.


>Sometimes I think that half the reason why some fields don't have a replication crisis is because literal lives are on the line if you get it wrong, so people actually do the work to make the testing gear and align the incentives that need to be aligned.

You don't think lives are on the line with psychology? The replication crisis there should be terrifying but because of the extra degree of separation between theory and result, it is, for some reason, just hand-waved over.

You are right that if people died because of a structural engineering mistake due to incorrect information people would be sued and possibly sent to prison. In mental health, however, people can die in droves and nothing! Or public policy that destroys the lives of millions can be made on bad research and just shrugged off later.


Yes, I think your way of phrasing it is more correct. Lives are on the line with psychology, but the feedback loop is usually broken.


Yeah, yet this boundary traces a fuzzy and surprising route. Lives are on the line in microbiology, yet Bik uncovered thousands of suspect image errors there (https://en.wikipedia.org/wiki/Elisabeth_Bik). Semiconductor physics is relatively tangible and close to production, yet Hsu & Loo and others found that Schön had faked his award- and attention-grabbing recipes (https://en.wikipedia.org/wiki/Sch%C3%B6n_scandal).


Right--but that's half the problem!

You can't tear people apart to figure out why they're depressed. Not only will no one volunteer to be torn apart, but it won't tell you much anyway: depression (probably) isn't a property of a single protein or cell. In a sense, this is actually my day job: I work with animals where we can dig down to the level of a single neuron or sometimes even a single molecule... but it's still different enough that you need to go back to intact, living people to see if your proposed treatment works.


There's definitely no replication crisis in jet turbine design research also.


ahem https://retractionwatch.com/?s=turbine

A lot of these appear to be cases of flat-out fraud or dubious research practices. These all get muddled together under the heading of "replication crisis" but I do think they're a bit of a different beast.


You linked to something about wind turbines.

Did you link to the wrong article?


Psychology based on statistical research is hardly a real science. It has no feedback from reality. It has nothing whose success in the real world depends on the accuracy of the research.

Clinical psychology is still useful in my opinion. It still helps people. There's a real feedback loop where understanding can change outcomes.

I dislike the falsifiability approach (if it's falsifiable, it's science) and the peer review attitude (the peer review process and scientific consensus are what define science).

My approach is that you need to close a loop. You need to do something useful whose success depended on the truthfulness of the research, and only to the extent of this dependence was anything proved.


In fundamental research you don't typically know the potential applications where it will be useful. In fact I dislike the current overfocus on applicability, your research can't get media-boosted unless you somehow say it's a step in curing cancer or solving climate change. Similarly, pure math is valuable even without foreseeable uses (if pressed, they will say cryptography etc. but they shouldn't have to).

Curiosity to understand how the world works is enough as motivation in science. Applications can also provide ideas but then you can also diverge from them and follow directions that look intrinsically interesting or perplexing.

What you note does not address the core problem, namely that academia selects for certain non-diverse personality traits, like very high conscientiousness, wanting to please authority, tolerance of monotony, etc., which all lead to it becoming inside baseball, with people working to scratch each other's backs.

You want to force them to turn away from their navel gazing by orienting them to concrete useful applications but what you really should incentivize is research whose main characteristic isn't that it's useful or all the current trendy hivemind consensus thinks it's what you're supposed to do, but the pursuit of understanding, the desire to see clearly and get closer to truth. And this doesn't just depend on what you incentivize with metrics but also what personalities are allowed in in the first place.


It's not the real-world applicability that I care about, it's about verifying your understanding beyond just passive observation. It's very easy to give the wrong story about something true, but much harder to use a wrong story to build something.

I hardly care if nobody will gain anything from what you built - I only care that in building it, you proved your understanding. It doesn't even have to be building something useful. Even mathematical proofs count. It wasn't enough to hold a nice story in your head about the behavior of mathematical objects to get a proof - you had to use that understanding to write the mathematical proof. You did something that would fail if your understanding was incorrect - and everyone can objectively judge your success.


> Clinical psychology is still useful in my opinion. It still helps people. There's a real feedback loop where understanding can change outcomes.

One thing psychological research has taught us about clinical psychology: the therapeutic relationship is the only thing that reliably predicts whether the client is helped. Not the school of thought, not training, but whether you vibe with the therapist.


That's not fair--or true.

Work on perception and memory has held up astonishingly well. Behavioral experiments from the 1800s and 1900s basically nailed down the properties of the retina, a hundred years before we had the techniques to measure it physiologically.

It's also integrated in a lot of "real world" applications: a lot of work has gone into building psychoacoustic models for audio codecs, color spaces for image reproduction, etc. Findings about attention and eye movements influence UX, and all sorts of products exploit behavioral biases (often for nefarious ends).
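One small, concrete instance of the color-space point: the standard way to compute the relative luminance (perceived brightness) of an sRGB color weights green far more heavily than blue. Those Rec. 709 / sRGB coefficients ultimately derive from the CIE 1931 colorimetric system, which was built on color-matching experiments with human observers. The sketch below applies the sRGB decoding curve before weighting; the function names are illustrative, not from any particular library.

```python
def srgb_to_linear(c8: int) -> float:
    """Undo the sRGB transfer curve for one 8-bit channel (0-255)."""
    c = c8 / 255.0
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(r: int, g: int, b: int) -> float:
    """Relative luminance Y of an sRGB color, on a 0-1 scale.

    The weights are the Rec. 709 / sRGB luminance coefficients, which trace
    back to the CIE 1931 standard observer (human color-matching data).
    """
    return (0.2126 * srgb_to_linear(r)
            + 0.7152 * srgb_to_linear(g)
            + 0.0722 * srgb_to_linear(b))


for name, rgb in [("red", (255, 0, 0)), ("green", (0, 255, 0)), ("blue", (0, 0, 255))]:
    print(f"{name:5s} Y = {relative_luminance(*rgb):.4f}")
```

Full-intensity green comes out roughly ten times brighter than full-intensity blue, matching how the eye responds rather than the raw channel values.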


IMO, this is why game designers, salespeople, actors, and anthropologists (to some extent) are much closer to understanding human behavior than social psychologists.


In my view, clinical psychology, like medical practice, is likely to be an art and not a science, for a long time. The reason is we can't wait for the science to come up to speed. People are suffering, and need relief, right now.

My view is that replication is just one safeguard within science, but is not the only thing needed. Confirming or refuting isolated factoids doesn't tell us much more than the factoids themselves.

I think a science needs to develop towards theories that connect those factoids into a framework. Psychology has not reached that stage yet. Compare to physics or chemistry. Some commentary has suggested that the replication crisis extends to those fields too. My graduate research project in physics was never replicated. But physics has a framework of theory that remains strong by connecting studies to one another, so that if one or more isolated studies fail on replication, the whole framework remains strong.

In some sense, "useful" could include merely "useful to science."


I don't think this could work out in medical practice or psychology. There are simply too many variables that influence the outcome in those disciplines, and too many unknowns about how those things work in detail, that can't be expressed in a formula. The brain remains a mystery.


Strongly agree. The main reason we got so much out of physics and chemistry has little to do with the statistical methods and peer review they employed, and much more with how obviously some things were repeatable when found. You need basically 100% reproduction rates to be able to do something truly useful, and that's a bar you just can't hit in some fields.


This article is worth a read if you want a good overview of where we are in the uphill fight against the replication crisis. If you only have time to read the article or the comments here on HN, skip the comments this time.


We often hear these days from science commentators about whether physics should receive funding for the next big accelerator, high-energy particle physics detector, or gravitational-wave interferometer. The most frequently made argument is the lack of interesting findings during the last decades, and whether that money would be better spent in other fields that have a higher return on investment, given that they are more prolific in their discoveries.

Guess what: it's very easy to find things when 1 in 20 of your papers can produce them by random chance (p < 0.05). And that's when you're not outright committing scientific fraud or p-hacking.

Perhaps the real lesson to learn from all of this is that we should defund every field that does not adhere to strict replicability controls. When funding follows findings, you have set up the perverse incentive to find something, even when there is nothing, in order to get funding.
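The "1 in 20" figure is the false-positive rate at a significance threshold of α = 0.05 when the true effect is exactly zero, and it climbs quickly once a study measures several outcomes and reports whichever one crosses the threshold. A minimal simulation, assuming normally distributed data with known variance (so a plain two-sided z-test applies) and, for the p-hacked case, five independent outcomes per study; all sizes are arbitrary.

```python
import math
import random


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def null_experiment(rng: random.Random, n: int) -> float:
    """One experiment where the true effect is zero.

    Draws n observations from N(0, 1) and z-tests whether the mean is 0,
    using the known standard deviation of 1. Returns the p-value.
    """
    mean = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
    z = mean * math.sqrt(n)  # standard error of the mean is 1 / sqrt(n)
    return two_sided_p(z)


def false_positive_rate(trials: int, n: int, outcomes: int,
                        alpha: float = 0.05, seed: int = 0) -> float:
    """Fraction of null studies reporting at least one 'significant' result.

    outcomes > 1 models a researcher who measures several independent
    outcomes and reports whichever one crosses the threshold.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if any(null_experiment(rng, n) < alpha for _ in range(outcomes)):
            hits += 1
    return hits / trials


single = false_positive_rate(trials=20_000, n=10, outcomes=1)
hacked = false_positive_rate(trials=20_000, n=10, outcomes=5)
print(f"one pre-registered outcome: {single:.3f}")  # close to 0.05
print(f"best of five outcomes:      {hacked:.3f}")  # close to 1 - 0.95**5
```

With five independent outcomes the expected rate is 1 − 0.95⁵, about 23%, even though every underlying effect is zero.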


Finding new particles and studying astronomical events has a larger nugget of similarity with the psychology modus operandi than you may think. They’re studying rare or unique objects with extremely finicky measurement tools; enormous amounts of noise are statistically treated to extract a signal. Victor Ninov claimed to have discovered new elements, causing years of disappointing failures to replicate (https://en.wikipedia.org/wiki/Victor_Ninov). Even if the actual scientists learn their lesson, with some regularity popular science media announce that “the Standard Model just got overturned” or that this or that new quark was discovered.


That depends on what you mean by replicability. Climatologists can't exactly conduct controlled experiments, but I think it's still worth putting some public funding into that field.


Planck said, "A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die and a new generation grows up that is familiar with it ..."

I wish I could be more hopeful, but it seems like a large portion of researchers in fields like psychology are too worried about their prior, poor quality research to embrace change.


Admittedly that quote is a bit of a trigger for me, since I'm an older scientist. Also, "embracing change" is a business dog whistle that is often applied when the change is onerous to the people on whom it's being imposed. Both are technically ad hominem.

A bigger problem may be that the entire body of published knowledge, and even the choice of categories that are the focus of study (such as "personality" and "intelligence") are too numerous and prevalent throughout our entire culture to readily abandon.

If half of your knowledge base is bunk, the right thing to do might be to erase all of it and start over, but that's virtually impossible.


Science is an approach to epistemology.

My conjecture is that all truths must be experienced. This aligns with the notion of “nullius in verba”, the original motto of the Royal Society, arguably the birthplace of modern science.

Science takes place in a laboratory. Whether or not ink on a page is true depends on nothing other than replicating the methods for oneself.

That the current environment is for printing ink on paper and calling it a day tells me that we’ve moved on from science as an epistemological solution to the notion of truth and regressed to an era of truth emanating from privileged authorities.


Surely this is also a consequence of knowledge expansion? Up to a certain year some people could credibly be said to know everything scientific. There just wasn't much of it. I couldn't imagine verifying with my own eyes even a small subset of one field today. I doubt there's one person alive who's seen 5% of, say, all metallurgy with their own eyes.

How would we ever progress without a change in method from observation to belief?


My point is that we’re probably no longer in what future historians will consider the scientific era. “Believe in science” would be an incoherent statement to Newton.


>“Believe in science” would be an incoherent statement to Newton.

It wouldn't, because Newton considered himself primarily an alchemist and philosopher, and viewed all of his "scientific" endeavors as a means to the end of understanding God through understanding Creation.


Really, the guy who was president of a club whose motto is “take no one’s word for it” would be fine with just like, taking someone’s word for it?


If that "someone" was the Bible, yes. Apart from his occult work (which like most Western magic was done in a heretical Christian context) he published numerous works on Biblical literalism and prophecy. That Newton considered his work primarily an expression of his religious views isn't exactly a secret.

You made the mistake of reaching too far back in time, searching for an example of someone who should have been aghast at the premise of applying faith to science, and found someone for whom they were one and the same. Although he did reject a lot of the orthodox views of the church, he certainly didn't reject religion outright for its lack of verifiability.


You’ve taken the discussion into a discussion about religion and not just about “natural philosophy”, which is fine, but not what anyone is talking about with “believe in science”.

If we want to talk about religion and taking no one’s word for it you’ve already started us in this direction by pointing out his rejection of church orthodoxy, which is basically the extreme end of Protestant practice.

If you believe that God wrote the Bible and that the church is made up of corrupt men and that only your personal understanding of the word of God is the path towards the True, well, you’re completely at odds with the epistemology of the Catholic Church with regards to religion and very much primed to take the same approach of a personal relationship with God and his word to a personal relationship with natural philosophy.


How could any replication study sample from the same population of people, given that there are so many cultures with different ways of living, eating, spending time, etc.? All of these cultures also constantly change. Universal representativeness is unattainable, so any replication will have a number of unknown sample variables that have changed. This is still not highlighted enough in most papers I read.


If the result isn't stable across samples how valid is the result?


If the study otherwise follows best practices, results about differences between well-defined subpopulations, and the reasons behind them, are still worthwhile to investigate and report.


If the relevant subpopulations are well-defined, then it should be possible to sample from those subpopulations and reproduce the results.


But then, for many studies about (e.g.) eating, behavior, psychology, etc.: even for well-defined populations (let's say, "religious male adults in Texas") there would be so many incidental variables that I would have strong doubts that results are easily reproducible. On the other hand, studies with "well-defined" but very narrow subpopulations (e.g. "religious male adults in Texas who work in printing offices and eat granola every morning") are not easily transferable, which makes it difficult to generalize results (thus, results are less significant).


Even producing results that reliably replicate across "American psychology college students age 18-21" would be a start.


That's the whole point of sampling. If you get a big enough sample, and you have a reasonably fair way of getting it, then all those things wash out as individual variation.
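A toy illustration of that claim, using a hypothetical population where each person's score is drawn from a standard normal distribution: the spread of the sample mean shrinks roughly like 1/√n as the sample grows. The last step shows the limit of the argument: sampling washes out variation that is random with respect to recruitment, but if everyone comes from a subgroup whose mean differs (0.5 here, an arbitrary choice), a bigger sample just pins down that subgroup's mean more precisely.

```python
import math
import random
import statistics


def sample_means(rng: random.Random, n: int, reps: int,
                 mean: float = 0.0, sd: float = 1.0) -> list:
    """Means of `reps` independent samples of size n from N(mean, sd)."""
    return [statistics.fmean(rng.gauss(mean, sd) for _ in range(n))
            for _ in range(reps)]


rng = random.Random(7)

# Unbiased sampling from the whole population (true mean 0).
spread = {}
for n in (10, 100, 1000):
    means = sample_means(rng, n, reps=500)
    spread[n] = statistics.stdev(means)
    print(f"n={n:5d}  spread of sample means={spread[n]:.3f}  "
          f"theory={1 / math.sqrt(n):.3f}")

# Biased sampling frame: every participant comes from a subgroup with mean 0.5.
biased = sample_means(rng, 1000, reps=500, mean=0.5)
biased_mean = statistics.fmean(biased)
print(f"biased frame, n=1000: average estimate={biased_mean:.3f} "
      f"(true population mean 0)")
```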


I agree. But it is very rare that I see a study that samples people living in more than one country, not to speak of sampling of people from all countries in the world. My point is this: If a study samples only people from the US, it should be highlighted very early (like in the first sentence) that it is only valid(ated) in the US and cannot be applied elsewhere, without testing.


The article links to "Risk of Bias Tools" (https://www.riskofbias.info/), if anyone is interested.


Slowly, all articles published in this magazine are making their way to the front page.


[flagged]


got dumped by a psychologist?


Dude. Psychologists still study Froyd.

That's like doctors studying phrenology or physicists studying phlogiston. Outside of a history lesson in what not to do, it's not really useful.


At least in my University, in 2010, they were very adamant to let us know in our first month there that Freud was quite wrong and that this is not what modern Psychology is like.

If you think Freud is well-accepted you might be getting your views on Psychologists based on TV shows, or at minimum are vastly overgeneralizing.


Physicists still study aether, but that doesn't mean it's taken as fact. Understanding where ideas and notation came from and how a field has grown to its current state is important.


Neither psychologists nor physicists study Freud or aether.


I am a practicing physicist and I took a whole course on the history and development of physics and the natural sciences in general. It was useful.

For example, the general public, and even scientists and engineers, still talk about "heat flow" as though temperature itself were a fluid being transferred between objects. This is incorrect physically, but there are nevertheless clear mathematical analogies between how objects in contact with each other reach equilibrium and other physical systems, like two containers filled with water to different levels and connected with a pipe at the base. The reason for this is entirely historical, and if one is not mindful of that, the terminology can be very misleading.
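To make the two-container analogy concrete, here is a sketch under simple assumptions: linear heat conduction (conductance $G$) between two bodies with constant heat capacities $C_1, C_2$, and laminar flow through the pipe so the flow rate is proportional to the level difference (resistance $R$, tank cross-sections $A_1, A_2$). These symbols are introduced here for illustration only.

```latex
\begin{aligned}
C_1 \frac{dT_1}{dt} &= -G\,(T_1 - T_2), &\qquad C_2 \frac{dT_2}{dt} &= G\,(T_1 - T_2), \\
A_1 \frac{dh_1}{dt} &= -\frac{h_1 - h_2}{R}, &\qquad A_2 \frac{dh_2}{dt} &= \frac{h_1 - h_2}{R}, \\[4pt]
\Delta T(t) &= \Delta T(0)\, e^{-t/\tau_T}, &\qquad \tau_T &= \frac{C_1 C_2}{G\,(C_1 + C_2)}, \\
\Delta h(t) &= \Delta h(0)\, e^{-t/\tau_h}, &\qquad \tau_h &= \frac{R\, A_1 A_2}{A_1 + A_2}.
\end{aligned}
```

The equations are identical under $T \leftrightarrow h$, $C \leftrightarrow A$, $G \leftrightarrow 1/R$. Temperature plays the role of the water level, not of the water: the conserved quantities are $C_1 T_1 + C_2 T_2$ (energy) and $A_1 h_1 + A_2 h_2$ (volume).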


Freud*


Replication, not reptilian. The title makes more sense now.



