I will never forget the day a postdoc in my lab told me not to continue wasting time trying (and failing) to reproduce [Top Institution]’s “Best Paper Award” results from the year prior. He had been there when the work was done and said they manipulated the dataset until they got the numbers they wanted. The primary author is now a hot shot professor.
My whole perception of academia and peer review changed that day.
Edit to elaborate: like many of our institutions, peer review is an effective system in many ways but was designed assuming good faith. Reviewers accept the authors' results on faith and largely just check that no obvious angles were missed and that the work is important enough to flag for the whole community to read. Since there's no actual verification of results, the system is vulnerable to attack by dishonesty.
When I was in school pre-university, this type of "crap, we can't get what we wanted to happen, so let's just fiddle around with it until it seems about right" was very common. I was convinced this was how children learned, so that as adults they wouldn't have to do things that way.
When I got into university and started alternating studying and work, I realised just how incredibly clueless even adults are. The "let's just try something and hope nothing bad happens" attitude permeates everything.
It's really a miracle that civilisation works as well as it does.
The upshot is that if something seems stupid, it probably is and can be improved.
In the lyceum where I studied, there was one physics lab where the book that accompanied the lab was deliberately wrong. We were told to perform an experiment that "should" support a certain conclusion, but actually neither the "correct" conclusion nor its opposite could be supported, because the flawed setup measured something slightly different. A lot of students (in some groups, all students) fell into this trap and submitted paperwork with the "correct" conclusion according to the book.
A CS-specific analogy might be to give the students a compiler with a deliberate bug in it, such that the students' code is mis-compiled. The standard of evidence needed to believe that the compiler is buggy is much higher than the standard needed to believe that one's own code is buggy.
A lab exercise like that could really just be selecting for chutzpah (feeling charitable) or arrogance (less charitable).
Well, that's more evil than my lab. A more direct CS equivalent would be an algorithm description in the booklet with one step that is subtly wrong (e.g. proven using some well-hidden circular reasoning) and cannot be corrected. The expectation would be that a good student finds the mistake instead of submitting an implementation of the flawed algorithm, or, to match my case even better, proves that the supposed output cannot be obtained from the inputs at all.
I had an appendectomy just before the final first-year Modern Physics lab and had to come back in to do a make-up lab. Sure enough it was the slightly-messed-up lab where the results should in theory look exponential but come out linear. I, naturally, drew an exponential curve through the points. Lab instructor decided to grade it right there before I left and tore a strip off me.
Very valuable lesson, although it sure did suck at the time.
I've come to think things work as well as they do largely because a whole lot of what people do has no effect either way. I see so much stupidity where the only saving grace is that it is directed into pointless efforts that won't be allowed to do any real damage.
When you start talking about millions of people, the damage gets subtle.
Robocall scams are very high on the profit:human-misery scale, but they're hardly going to end civilization. Pollution, corruption, theft, etc. all make things worse, but we never see the better world without such things, so it all feels very abstract. Of course you need to lock your doors, etc.; that's just the way things are.
A bio professor of mine said something that stuck with me: “life doesn’t work perfectly, it just works.”
It has to work well enough to… work… and reproduce. That’s it. It’s not “survival of the fittest.” It’s “survival of a randomized subset of the fit.”
There’s even a set of thermodynamic arguments to the effect that systems are unlikely to exceed such minimum requirements for a given threshold. For example, if we are visited by interstellar travelers they are likely to be the absolute dumbest and most dysfunctional possible examples of beings capable of interstellar travel since anything more is a less likely thermodynamic state.
So much for Star Trek toga wearing utopian aliens.
> if we are visited by interstellar travelers they are likely to be the absolute dumbest and most dysfunctional possible examples of beings capable of interstellar travel
Otoh, they would be aware of that, and they might have spent some time improving how genes (or whatever they have) and evolutionary selection work for them, so that, say, their species over time becomes brighter and brighter than what's actually needed. If they wanted to do that.
How do you improve your genes? Removing obvious deleterious disease mutations is easy, but as soon as you try to “go where there are no roads” you hit the same combinatorial challenge as evolution.
Also more intelligence does not equal better ideas. The world is full of crazy or amoral people with apparently very high IQs. Your average flat Earther probably has an above average IQ.
Improvement is a war against entropy and n^n^n^… combinatorics any way you slice it.
Slowly across hundreds and thousands of generations.
By adding evolutionary pressure for what they want -- it'd be up to those space-traveling aliens to decide -- they can change their species, generations into the future.
> Improvement is a war against entropy ...
Reasoning that way, humans would not have gotten brighter than chimpanzees. There's been evolutionary pressure for humans to get brighter, and it would be possible for you (I mean humans), or the space travelers, to add artificial evolutionary pressure.
Anyway, never mind all this; maybe talking about space travelers and humans and their genes isn't the best way to spend the day. Have a nice day btw.
The problem is the incentives. To do well, you must publish. To publish, you must have a good story, and ‘we tried this and it didn’t work’ is not one.
So after a certain amount of time spent, you are left with a choice: 'massage' the data to get some results, or don't, and get left behind by those who do or who were luckier in their research.
"We tried this and it didn't work, and here's why we think it didn't" should be among the bests stories to publish. Looking back I learned more from stuff that didn't work, or rather figuring out why it didn't, than from success.
That can end up being just as time-consuming as doing the research to begin with. Often there is no time and no money to go back and do that. If your 'budget' is 6 months, you're going to spend 6 months trying to get your experiment to work. You're not going to 'give up' after 4 months and spend 2 months putting together a "why we failed" paper.
However, the advantage is, if it is published, it can decrease the likelihood of multiple other attempts to try the same “unique” (and wrong) approach.
"We chose this area because we believed it should be archeologically interesting based on XYZ. However, we searched through ABC methods and found nothing there" would be valuable for the future. Maybe XYZ isn't as good as we though, maybe ABC couldn't find it. Maybe now some other sod in the future won't try that location.
Not as valuable as a discovery, but very far off zero value. Yet the reward in academia would be near-zero.
Distracting from the main point, "let's just try something and hope nothing bad happens" (trial and error) is precisely the reason civilization made it this far :)
I'm just done with a 3-hour reading session of an evolutionary psychology book by one of the leading scientists in the field. The book is extremely competently written, and is awash with statistics on almost every page, "70% of men this; 30% of women that ... on and on". And much to my solace, the scientist was super-careful to distinguish studies that were replicable, and those that were not.
Still, reading your comment makes me despair. It plants a nagging doubt in my mind: "how many of these zillion studies cited are actually replicable?" This doubt remains despite knowing that the scientist is one of the leading experts in the field, and very down-to-earth.
What are the solutions here? A big incentive-shift to reward replication more? Public shaming of misleading studies? Influential conferences giving more air-time for talks about "studies that did not replicate"? I know some of these happen at a smaller-scale[1], but I wonder about the "scaling" aspect (to use a very HN-esque term).
PS: Since I read Behave by Sapolsky — where he says "your prefrontal cortex [which plays a critical role in cognition, emotional regulation, and control of impulsive behavior] doesn't come online until you are 24" — I tend to take all studies done on university campuses with students younger than 24 with a good spoonful of salt. ;-)
Evo psych is questionable to me for more basic reasons. It seems full of untestable just-so stories that explain apparent biases which are themselves hard to pin down, or hard to prove are a result of nature rather than nurture.
It’s probably not all bullshit but I would bet a double digit percentage of it is.
I'm conscious that this is a flame-bait topic. That said, no, dismissing the whole field as "questionable" is callous. Yes, there are many open questions, loaded landmines, and ethical concerns in evolutionary psychology research. But there's also copious evidence in its favour. (Reference: David Buss et al.)
Many people might spare themselves at least some misery by educating themselves about evolutionary psychology, including the landmines and open questions.
Psych is questionable for basic reasons: it is a humanities science. Its purpose is not to figure out the world but to change it (to figure out how to end poverty, for example).
Therefore it is not well suited to figure out the world.
You should treat all of it with extreme helpings of salt.
Can't this be applied to wide swaths of hard sciences as well? Lots of scientific work overlaps heavily with engineering, which is all about changing the world.
Also, I don't think ending poverty is a major stated goal of psychology research...
> “how many of these zillion studies cited that are actually replicable?” This doubt remains despite knowing that the scientist is one of the leading experts in the field, and very down-to-earth.
I think the problem is much bigger than a simple binary of replicable or not. It's extremely easy to find papers by "leading experts" that have valid data with replicable results where the conclusions have been generalized beyond the experiments. The media does this more or less by default when reporting on scientific results, but researchers do it themselves to a huge degree, using very specific conditions and results to jump to a wider conclusion that is not actually supported by the results.
A high-profile example of this is the "Dunning-Kruger" effect; the data in the paper did not show what the flowery narrative in the paper claimed to show, but there's no reason to think they falsified the results. Some researchers have reproduced the results, as long as the conditions were very similar. Other researchers have tried to reproduce the results under different conditions that should have worked according to the paper's narrative and conclusions, but found that they could not, because there were specific factors in the original experiment that were not discussed in the original paper's conclusions -- in other words, Dunning and Kruger overstated what they measured, such that the conclusion was not true. They both enjoyed successful academic careers and some degree of academic fame as a result of this paper that is technically reproducible but not generally true.
To make matters worse, the public has generally misinterpreted and misunderstood even the incorrect conclusions the authors stated, and turned it into something else. Almost never in discussions where the DK effect is invoked do people talk about the context or methodology of the experiments, or the people who participated in them.
This human tendency to tell a story and lose the context and details and specificity of the original evidence, the tendency to declare that one piece of evidence means there is a general truth, that is scarier to me than whether papers are replicable or not, because it casts doubt on all the replicable papers too.
> The book is extremely competently written, and is awash with statistics on almost every page, "70% of men this; 30% of women that ... on and on". And much to my solace, the scientist was super-careful to distinguish studies that were replicable, and those that were not.
One approach that can be adopted on a personal level is simply changing the way one thinks. For example, switch from a binary (true/false) method of epistemology to trinary (true/false/unknown), defaulting to unknown, and consciously insist on a high level of certainty to reclassify an idea.
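To make the idea concrete, here is a minimal sketch in Python of what such a trinary, default-to-unknown discipline might look like. The `Truth` and `Claim` names and the 0.95 threshold are purely illustrative, not from any real library:

    from enum import Enum

    class Truth(Enum):
        """Three-valued epistemic status; the default is UNKNOWN."""
        TRUE = "true"
        FALSE = "false"
        UNKNOWN = "unknown"

    class Claim:
        # Hypothetical illustration: a claim starts as UNKNOWN and is only
        # reclassified once the evidence clears a deliberately high bar.
        def __init__(self, statement, threshold=0.95):
            self.statement = statement
            self.threshold = threshold
            self.status = Truth.UNKNOWN

        def update(self, confidence, supports):
            """Reclassify only if confidence exceeds the threshold."""
            if confidence >= self.threshold:
                self.status = Truth.TRUE if supports else Truth.FALSE

    claim = Claim("Effect X replicates in the general population")
    claim.update(confidence=0.6, supports=True)
    print(claim.status)  # Truth.UNKNOWN -- 0.6 doesn't clear the bar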
There's obviously more complexity than this, but I believe that if even a relatively small percentage of the population started thinking like this (particularly, influential people) it could make a very big difference.
Unfortunately, this seems to be extremely counter to human nature and desires - people seem compelled to form conclusions, even when it is not necessary ("Do people have ideas, or do ideas have people?").
Yes, but you have to convince your readers that you did a more careful and meticulous job than 'Top Institution's Best Paper Award' did. After all, a failure to replicate only means that one of you is wrong, but it doesn't give any hint as to who.
> a postdoc in my lab told me not to continue wasting time trying (and failing) to reproduce [Top Institution]’s “Best Paper Award” results from the year prior. He had been there when the work was done and said they manipulated the dataset until they got the numbers they wanted.
Isn't that the moment where you try even harder to falsify the claims in that paper? You already know that you'll succeed, so the effort wouldn't be wasted.
The problem with experimental results is that they are difficult to replicate. In software you can "git clone x.git && cd x && make" and replicate the correct or incorrect results. In hardware, it's more difficult.
The main problem is that even if you reproduce their experiment, they can claim that you did some step wrong: perhaps you mixed it too fast or too slow, or the temperature was not correctly controlled, or one of your reagents had a contamination that destroyed the effect, or they magically realize that some unstated property of their reagent is important.
It's very difficult to publish papers with negative results. So there is a high chance it will not count in your total number of publications. Also, expect a low number of citations, so it's not useful for other metrics like citation count or h-index.
For the same reason, you will not see publications of exact replications. A good paper X will be followed by almost-replications by other teams, like "we changed this and got X with a 10% improvement" or "we mixed the methods of X and Y and unsurprisingly^W got X+Y". This is somewhat good because it shows that the initial result is robust enough to survive small modifications.
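For readers unfamiliar with the h metric mentioned above: a researcher's h-index is the largest h such that h of their papers have at least h citations each. A minimal sketch, purely for illustration:

    def h_index(citations):
        """Largest h such that h papers have at least h citations each."""
        counts = sorted(citations, reverse=True)
        h = 0
        for i, c in enumerate(counts, start=1):
            if c >= i:
                h = i
            else:
                break
        return h

    print(h_index([10, 8, 5, 4, 3]))  # 4: four papers with >= 4 citations
    print(h_index([25, 8, 5, 3, 3]))  # 3

Note how the metric rewards a steady stream of moderately cited papers; a single heavily cited negative result barely moves it, which is part of the incentive problem described above.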
Peer review cannot protect against fraud, even in principle, and it was never intended to. And this is OK. Usually, if a result is very important and forged, other groups try to replicate it and fail; after some time the original dataset (which needs to be kept for 10 years, I think) will be requested, and then things go down from there.
Not assuming good faith in peer review would make academia more interesting; probably the only way would be for the peer reviewer to go to the lab and be shown live measurements. Then check the equipment...
I wonder if it's a better system to just hire smart professors and give them tenure immediately. The lazy ones in it just for the status won't do any work, but the good ones will. Sure, there will be dead weight that gets salaries for life, but I feel like that's a lesser problem than incentivizing bad research.
The problem isn't just the scientists, it goes all the way up. Let's say we implement your system. Who decides how many 'smart professors' the Type Theory group gets to hire? What if the Type Theory and Machine Learning departments both want to hire a new 'smart professor' but the Computer Science department only has money to hire one more person?
One reasonable approach might be to look at which group has produced the 'best' research over the past few years. But how do you judge that in a way that seems fair? Once you have a criterion to judge by, people will start to game that criterion.
Or, taking a step up: the university needs to save money. How do you judge whether the Chemistry department or the Computer Science department should have its funding cut?
No matter how you slice it at some point you're going to need a way for someone to judge which of two departments is producing the 'best' research and thus deserves more money, and that will incentivize people to game that metric.
There is no shortage of resources to provide for every person who wants to devote their life to discovering something valuable for all of humankind.
We aren't short on food, shelter, clothes, tech, etc - those are all solved problems.
The problem that isn't solved is stupid people sitting in charge of decisions they don't have the brain make-up to comprehend or manage, pretending they know what they're doing, holding people far superior to them hostage.
Smart isn't the biggest criterion for success as a professor. The PhD is a good filter because it trains and tests research aptitude, work ethic, ability to collaborate, ability to focus on a single problem for a long period of time, and other traits.
One problem is that PhD degrees are too costly for those who don't get academic or industrial success from them. But as long as talented people are willing to try to become professors, I don't see the system changing.
Who is to judge the merit of their talent? Shouldn't their results speak for themselves? And pray tell, what are the results of academia in the age of the digital revolution, where there is no obligation to complete a university education with knowledge of its mathematical and scientific foundations?
I think many more are drawn to professorship for a sense of status, i.e. prestige. It shows in their overwhelming mediocrity, e.g. the failure of economics to progress to a biologically scientific paradigm.
Which is exactly how it still works in many places. You have to be co-opted and have the vote of your peers. This doesn't do anything to ensure those elected are able. It ensures they are politically desirable.
In the past people who did science could do so with less personally on the line. In the early days you had men of letters like Cavendish who didn't really need to care if you liked what he wrote, he'd be fine without any grants. That obviously doesn't work for everyone, but then the tenure system developed for a similar reason: you have to be able to follow an unproductive path sometimes without starving. And that can mean unproductive in that you don't find anything or in that your peers don't rate your work. There'd be a gap between being a young researcher and tenured, sure.
Nowadays there's an army of precariously employed phds and postdocs. Publish or perish is a trope. People get really quite old while still being juniors in some sense, and during that time everyone is thinking "I have to not jeopardise my career".
When you have a system where all the agents are under huge pressure, they adapt in certain ways: take safer bets, write more papers from each experiment, cooperate with others for mutual gain, congregate around previous winners, generally more risk reducing behaviour.
Perhaps the thing to do is make a hard barrier: everyone who wants to be a researcher needs to get tenure after undergrad, or not at all. (Or after masters or whatever, I wouldn't know.) Those people then get a grant for life. It will be hard to get one of these, but it will be clear if you have to give up. Lab assistants and other untenured staff know what they are negotiating for. Tenured young people can start a family and not have the rug pulled out when they write something interesting.
I agree with your diagnosis of the problem, but don't think your solution is a good way forward - immediately after undergrad is way too early to be evaluating research potential and would just shift the hyper competitiveness earlier.
A better solution would be to stop overproducing PhDs. We could reduce funding for PhD students and re-direct that towards more postdoctoral positions - perhaps even make research scientist a viable career choice?
Overproducing PhDs seems to be a necessary aspect of how research is conducted in the current university. Most serious lines of work are pursued by a PhD student or Postdoc and advised by a Professor. They need a critical mass of PhD students which is definitely a much larger number than 1 per professorship. This is especially true in fields where industry jobs aren't readily available.
I think that's a huge part of the problem though - we've made it so the only way we can get research done is by training a new researcher - even though there's already plenty of trained researchers who are struggling to find a decent job.
I'm suggesting that we re-direct some of the funding for training PhD students into funding for postdoctoral positions (via either fellowships or research grants). Professors would still get their research team, but rather than consisting mostly of untrained PhD students, they'd have a smaller, but more effective team of trained researchers.
Isn't that the case simply because professors are expected to be highly productive, to the extent where it is not possible to meet the bar without offloading the work to students and switching to a full-time manager?
> I agree with your diagnosis of the problem, but don't think your solution is a good way forward - immediately after undergrad is way too early to be evaluating research potential and would just shift the hyper competitiveness earlier.
Immediately after undergrad is how it used to work in the golden days of science, more or less.
If the competitiveness is the problem maybe tenure should be a lottery that you enter once at a fixed stage, preferably before you're expected to start publishing in journals.
The system that produces PhDs isn't that bad. It is a good way to create a research portfolio useful for employment in the private sector. We need to pay less attention to the title, though; it is not a distinguishing achievement for life.
The act of producing a doctoral dissertation usually leaves something of a mark on one's outlook, skills, etc. I claim it is a _distinguishable_ achievement for life.
Yet the principle of pursuing knowledge is not for pecuniary interests. So your judgment demonstrates the temporal shift of the Western University towards rubber stamping people’s vocational aptitude. This leads to corruption, of course.
This is one of the many reasons I like Universal Basic Income. Having UBI would let researchers take risks and have something to fall back on if needed, and could reduce some of the pressure.
I don't think UBI works well here because in most fields the level of success that the precarious group experiences in industry is substantially higher than a guaranteed minimum. A lot of people have identity aspects tied to their university affiliation and don't want to stop working with the university in part for that reason.
No matter what level we put UBI at, it will almost certainly be less than a third of what a researcher salary would be. Also it's not just about the money. Losing your job means losing access to a lab, access to data, access to grant money and basically everything you need to actually do research.
The solution is to publish data first, not "papers", and assign it a replication score: how many times it was verified by independent research.
The paper can follow with the explanation, but citations will no longer be important; what will matter is the contribution to the replication score (this will also work as an incentive to confirm others' results).
If someone gets a contradicting result, the replication score of the entire ring can be nullified, or, in case of intentional manipulation of data, negated.
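A rough sketch of how such scoring might work; this is my own illustration of the proposal (names and rules invented here), not an existing system:

    from dataclasses import dataclass

    @dataclass
    class Dataset:
        """Data published first, scored by independent verifications."""
        claim: str
        confirmations: int = 0
        contradicted: bool = False  # an independent team got the opposite result
        manipulated: bool = False   # intentional manipulation was shown

        @property
        def replication_score(self) -> int:
            if self.manipulated:
                return -self.confirmations  # negated
            if self.contradicted:
                return 0                    # nullified
            return self.confirmations

    d = Dataset(claim="Compound Y inhibits enzyme Z", confirmations=3)
    print(d.replication_score)  # 3
    d.manipulated = True
    print(d.replication_score)  # -3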
But you have the same basic problem as now: you'd need some sort of science police to control it, which goes against the scientific process. Essentially it's a problem of establishing trust in an untrusted system. Putting it that way actually makes it sound like a blockchain problem. Maybe there could be some incentive system to replicate work based on smart contracts, but I don't know how you could ensure that the replicating parties are independent.
Scientific progress today depends heavily on the financial support of society, so as a whole it cannot be completely decentralized and independent. People want to know how their money is spent and want guarantees that science will not create something awful. This means that policing of science is an inevitable and important part of the system. It is not a question of whether we need science "police"; it is a question of what it should look like. Today it is decentralized: someone maintains the list of venues publication in which counts toward citation indices, there are ethical committees and scientific boards, lawmakers regularly tell us what can be done and what should not, etc. How this would change with a new system of incentives in place, we can only imagine: it could be a good or a bad thing, but as long as the system remains democratic, all problems should be easy to fix.
A key function of scientific publication is to inform other researchers in the field about potentially interesting things as quickly as reasonable. Getting "two more confirmations from separate teams" is a very high bar, as it's not just a matter of asking a source; it's asking someone else to do all the same work again. Not only do we not require it before publication, we don't expect it to happen for the vast majority of publications, ever. Important studies get replicated, but most never get repeated. A partial explanation of the original article's observation is the (very many!) papers that don't have many citations and don't fail to replicate because nobody cared enough to put in the work to try.
If publication required two more confirmations from separate teams, that would mean (a) doing the work in triplicate, so you get three times fewer results for the same effort; (b) the process would take at least twice as long: I spend a year doing the experiment, then someone else starts and spends a year doing the same experiment, and only then does it get published; (c) there's a funding issue: I have somehow got funding to spend many months of multiple people's time on this, but who's paying the other independent teams to do the same?; (d) it's not a given that there are two other teams capable of doing the exact same research, e.g. if you want to publish a study on the results of an innovative surgical procedure, it's plausible that there aren't (yet!) any other surgeons worldwide who are ready to perform that operation; that will come some time after the publication; (e) much science really can't get a separate confirmation at all: we have only one Large Hadron Collider, you can't re-do archeological digs, event-specific on-site sociological data gathering can't be repeated, etc., so you have to take the data at face value.
What you describe is absolutely right; it is important to have this kind of communication. If publications were only a means to communicate, that would serve the purpose and wouldn't be a problem. The problem is that they are considered to have a second purpose: to create scientific reputation, based on which society allocates funds and prioritizes research. The original article illustrates how wrong this approach can be, substituting good storytelling for the ability to produce scientific facts.
Having a scientific blockchain looks like a decent idea... but of course it will not suffice and will be gamed. The real causes of the mess are the complexity of the world compared with our minds and tools, and the lack of epistemological understanding as a society, institutions, and culture. Science can't be more than a nice and useful collection of heuristics. Otherwise it's just the religion of Scientism lurking around and pretending to read God's mind. Metarationality concepts could offer an exit from the inevitable mess.
Ok, I got 12, 18, 45. Does anyone want to verify my results? If so, I'll write up a paper describing what they mean...
Hopefully it is clear that that data is useless without some written text explaining what it means. Given that for hundreds of years the accepted way of presenting that explanatory text has been writing papers, I don't see any reason to abandon it. Tweaking our strategies for replication (after a description of the experiment has been published!) and reputation doesn't seem to contradict that.
I'm not sure prosecuting academics is a particularly obvious solution: you'd need to prove malicious intent (rather than ignorance), which is always difficult.
For me a better solution would be to properly incentivise replication work and solid scientific principles. If repeating an experiment and getting a contradictory result carried the same kudos as running the original experiment, then I think we'd be in a healthier place. Similarly for doing the 'scientific grind work' of working out which mistakes in experimental practice can affect results and, ultimately, our understanding of the universe around us.
I think an analogy with software development works pretty well: often the incentives point towards adding new features above all else. Rarely is sitting down and grinding through the litany of small bugs prioritised, but as any dev will tell you, doing that grind work is just as important; otherwise you'll run into a wall of technical debt and the whole thing will come tumbling down.
Open source and Free Software are (despite it being a cliche for programmers to over-apply them) a good model to compare with.
You have big companies making billions with the work of relatively poorly paid nerds. But as soon as you make it possible for the nerds to claim all the profits of their work, you get a whole class of people whose job is to insert themselves as middlemen and ruin it for everyone, both customers and developers.
So basically the aim is to limit the degree to which you can privately profit from science, and expand the amount of science you can easily build on. You still get enough incentives for progress, the benefits accrue to society as a whole, and competition and change is enabled without powerful gatekeepers controlling too much in their own interests.
One perspective is that, “knowledge generation wise,” the current system really does work from a long term perspective. Evolutionary pressure keeps the good work alive while bad work dies. Like that [Top Institution] paper: if nobody else could reproduce it, then the ideas within it die because nobody can extend the work.
But that comes at the heavy short term cost of good researchers getting duped into wasting time and bad researchers seeing incentives in lying. Which will make academia less attractive to the kind of people that ought to be there, dragging down the whole community.
Due to career and other reasons, there is a publish or perish crisis today.
Maybe we can do better by accepting that not everyone can publish groundbreaking results, and that's okay.
There are lots of incompetent people in academia who later move up to senior positions and decide your promotions by citation counts and how many papers you published. I have no realistic ideas for how to counter this.
We need to create a new social institution of Anti-Science, which would run on the opposite incentives, rewarded according to the number of articles refuted. No tenures, no long-term contracts: if an anti-scientist wished to have an income, they would need to refute scientific articles.
Create a platform for holding scientific debates between scientists and anti-scientists, so that a scientist has the ability to defend his or her research.
There would be no need to do anything special to prosecute, because science is very competitive, and available refutations would inevitably be used to stop the career progression of the authors of refuted articles.
This seems like a pragmatic and workable idea. We could even have the same type of thing for journalism and "facts" in general, it would be a step up from the current tribal meme/propaganda war approach we rely upon.
Data and code archives, along with better methods training.
Data manipulation generally doesn't happen by changing values in a data frame. It's done by running and rerunning similar models with slightly different specifications to get a P value under .05, or by applying various "manipulations" to variables or the models themselves for the same effect. It's much easier to identify this when you have the code that was used to recreate whatever was eventually published.
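A quick simulation of why having the code matters: even with pure noise, rerunning enough slightly different specifications will produce p < .05 by chance alone. A minimal sketch, assuming numpy and scipy are available:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n, n_specs = 100, 40

    # Pure noise: the outcome has no real relationship to any predictor.
    y = rng.normal(size=n)
    predictors = rng.normal(size=(n_specs, n))

    # "Running and rerunning similar models with slightly different
    # specifications": here, simply trying one predictor after another.
    p_values = [stats.pearsonr(x, y)[1] for x in predictors]
    print(f"best p-value across {n_specs} specifications: {min(p_values):.4f}")
    print(f"chance of at least one p < .05 by luck: {1 - 0.95**n_specs:.2f}")

With 40 specifications, the chance of at least one spuriously "significant" result is about 87%, which is exactly why the full modeling code, not just the final model, needs to be archived.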
Sure, but often there are perfectly valid reasons to change your methodology halfway through a project, when you know a lot more about the thing you are trying to do than you did before you started.
I don't think prosecution is the right tool, but if we were going down that road, covering only material misrepresentations would fit with the anti-fraud standard for companies. Just drawing dumb, unpopular, or 'biased' conclusions shouldn't be a crime, but data tampering would fall into scope. Still not a great idea, as it would add a chilling effect, lawyer friction, and expenses, and still be hard to enforce for little direct gain.
I personally favor requirements that call for bundling raw datasets with the "papers". Data storage and transmission are very cheap now, so there isn't a need to restrict ourselves to just text. We should be able to check all of the thrown-out "outliers" in the datasets. One aim should be to make the tricks for massaging data nonviable. Even if your first dataset was full of embarrassing screw-ups, from doing it hungover and mixing up the step order, it could be helpful to have a collection of "known errors" to analyze. Optimistically, it could also uncover phenomena scientists wrote off as their own mistakes, like the cosmic background radiation being taken as just noise and not really there.
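As a sketch of the kind of check that bundled raw data enables, here's a hypothetical example (dataset, column names, and values all invented) comparing the headline statistic with and without the flagged exclusions:

    import pandas as pd
    from scipy import stats

    # Hypothetical raw dataset where excluded "outliers" are flagged
    # rather than silently deleted.
    df = pd.DataFrame({
        "group":    ["treatment"] * 6 + ["control"] * 6,
        "value":    [5.1, 4.9, 5.4, 5.0, 9.8, 5.2,
                     4.8, 5.0, 4.7, 5.1, 4.9, 5.0],
        "excluded": [False, False, False, False, True, False] + [False] * 6,
    })

    def p_value(d):
        a = d.loc[d["group"] == "treatment", "value"]
        b = d.loc[d["group"] == "control", "value"]
        return stats.ttest_ind(a, b)[1]

    print("p with exclusions applied:", p_value(df[~df["excluded"]]))
    print("p on the full raw data:   ", p_value(df))
    # A large gap between these two numbers is exactly the kind of
    # thing reviewers should be able to see and question.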
Paper reviewing is already a problem but adding some transparency should help.
Leveraging prestigious papers to win grant proposals is where the real payoff is. Citations aren't what gets you a job or tenure at an R1 research school; it's the grants that the high-impact papers help you win.
You don't have to convict people of full-on fraud. If you are caught using an obvious mistake in your favor or using a weak statistical approach, the punishment can be that you are only allowed to apply for grants with a supervisor/co-PI/etc. whose role is to prevent you from following that "dumb" process in the future.
We could use public funding to do the work OP tried to do.
Something like a well funded ten year campaign to do peer review, retrying experiments and publishing papers on why results are wrong.
I have a co-worker who had a job that involved publishing research papers. Based on his horror stories, it seems the most effective course of action is to attack the credibility of those who fudge results.
The single biggest impediment to "fixing this" is that you haven't identified what "this" is or in what manner it is broken.
There will always be cases of fraud if someone digs deeply enough into large institutions. That doesn't by itself indicate that there is a problem.
Launching into changing complex systems like the research community based on a couple of anecdotes and just-so stories is a great way to not actually achieve anything meaningful. There needs to be a very thorough, emotionally and technically correct enumeration of what the actual problem(s) are.
A couple of anecdotes is a very disingenuous way to frame the replication crisis. Heavily cited fraudulent research impacts public policy, medicine, and technology development. This means it's everyone's business.
The problem you're describing there is a public policy one, not something to do with the scientific community. Public policy should be implemented with a trial at the start and a "check for effectiveness" step at the end because there is no way to guarantee the research it is being based on is accurate. Statistically, we expect a big chunk of research to be wrong no matter what level of integrity the scientists have.
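The arithmetic behind that expectation is worth spelling out. This is the standard false-discovery calculation, with illustrative numbers I've chosen rather than measured ones:

    # Even perfectly honest science publishes some wrong positive results.
    prior = 0.10   # fraction of tested hypotheses that are actually true
    power = 0.80   # chance a true effect is detected
    alpha = 0.05   # chance a null effect "passes" the significance test

    true_pos = power * prior          # 0.08
    false_pos = alpha * (1 - prior)   # 0.045
    wrong_share = false_pos / (true_pos + false_pos)
    print(f"{wrong_share:.0%} of positive findings are wrong")  # ~36%

With these (plausible but assumed) inputs, over a third of published positive findings are false even with zero misconduct, which is why the "check for effectiveness" step matters.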
"Statistically, we expect a big chunk of research to be wrong no matter what level of integrity the scientists have" - that's the actual problem under discussion here.
Research is heavily funded because people believe it's something more than a random claim-making machine. You say governments should assume research is wrong and then try to replicate any claim before acting on it. But you end up in a catch-22: if the research community is constantly producing wrong claims, there's no reason to believe your replication attempt is correct, as it will presumably be done by researchers or people closely aligned with them.
Additionally inability to replicate is only one of many possible problems with a paper. Many badly designed studies that cannot tell you anything will easily replicate. A lot of papers are of the form "Wet pavements cause umbrella usage". That'll replicate every single time, but it's not telling you anything useful about the world. Merely trying to fix things with lots of replication studies thus won't really solve the problem.
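This is easy to demonstrate with a small simulation (my own illustration): a correlation driven by a hidden common cause replicates in every fresh sample without being causal.

    import numpy as np

    rng = np.random.default_rng(42)

    def one_study(n=500):
        # Rain is the hidden common cause of both observed variables.
        rain = rng.random(n) < 0.3
        wet_pavement = rain | (rng.random(n) < 0.05)  # plus sprinklers, etc.
        umbrellas = rain | (rng.random(n) < 0.05)     # plus sun umbrellas, etc.
        return np.corrcoef(wet_pavement, umbrellas)[0, 1]

    # Every "replication" finds the same strong association,
    # yet drying the pavement would not reduce umbrella usage.
    print([round(one_study(), 2) for _ in range(5)])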
Research is far better than a random claim making machine even if some of it has errors that have caused the replication crisis. It's easy to overstate the level of the problem even though it's fairly severe at this point.
"Wet pavements cause umbrella usage" is something where I'd want to see your specific examples because it's easy to get a correlational study of that nature but very hard to design a causal one. The correlational studies are usually accurate and often useful for other research.
I would argue the whole framing of the "replication crisis" is itself an example of the problem of "overselling" research results. Yes, there is a problem with some research in some areas of science not being replicable. However, the vast majority of research in many fields does not have this problem. Framing this as a "crisis" overstates the problem and gives the impression that the majority of research can't be replicated.
By waiting until scientists address this? Note that the 'replication crisis' is something that originated inside science itself, so, despite there being problems, science has not lost its self-correcting abilities. Scientists themselves can do something by insisting on reliable and correct methods and pointing it out wherever such methods are not in use. It is also not as if there are no gains in doing this: Brian Nosek became rather famous.
The replication crisis is not being addressed. It's being discussed occasionally within the academy, but a cynic might wonder if that's because writing about the prevalence of bad papers is a way to write an interesting paper (and who is checking if papers about replication themselves replicate?). It's been discussed far longer and more extensively by the general public but those discussions aren't taken seriously by the establishment, being as they are often phrased in street terms like "you can find an expert to tell you anything" or "according to scientists everything causes cancer so what do they know?". And of course the higher quality criticism gets blown off as mere "skepticism" or "conspiracy theories" and anyone who tries to research that is labelled as toxic.
So a lot of people only notice this in the rare cases when someone within the academy decides to write about it. This can make it seem like science is self correcting, but it appears in reality it's not. When measured quantitatively there is no real improvement over time. Alvaro de Menard has written extensively on this topic and presented data on the evolution of P values over the last decade:
Additionally as he observes at the end of his essay, the problems are due to bad incentives, so the only true changes can come from changes to incentives. However those incentives are set by the government. Individual scientists cannot themselves change the incentives. The granting agencies are entirely oblivious to the problems and the scale of their ambition is in no way equal to the scale of their problem:
"If you look at the NSF's 2019 Performance Highlights, you'll find items such as "Foster a culture of inclusion through change management efforts" (Status: "Achieved") and "Inform applicants whether their proposals have been declined or recommended for funding in a timely manner" (Status: "Not Achieved") .... We're talking about an organization with an 8 billion dollar budget that is responsible for a huge part of social science funding, and they can't manage to inform people that their grant was declined! These are the people we must depend on to fix everything."
Scientists with a proven track record should have life-long funding of their laboratories, no questions asked, so they can act as they want without fear of social repercussions. Of course some money will be wasted, and the question of determining whether a track record is proven remains open, but I think that's the only way for things to work (except when the scientist has enough money to fund his own work).
I think this would be a positive step, but to play devil's advocate, what happens when this superstar scientist retires? If I'm a researcher in his lab, does my job just disappear? If so, I'm still going to feel pressure to exaggerate the impact of my research.
I've been spending a lot of time on 'bad science' as a topic lately (check my comment history or blog for some examples). I think what you're proposing is the opposite of what's required.
Firstly, the problem here is not an epidemic of scientists who feel too financially insecure to do good work. Many of the worst papers are being written by people with decades-long careers and who lead large labs. Their funding is very secure. They are doing bad work anyway for other reasons, sometimes political or ideological, more often because doing bad work results in attention, praise and power. Or sometimes because they don't know how to explain their chosen question, but don't want to admit that scientifically they failed and don't know where to go next.
Secondly, as you already realized your proposal relies on identifying which scientists have a proven track record, but the whole problem is that science is flooded with fraudulent/garbage claims which are highly cited ("proven") and which were written by large teams of supposedly respectable scientists at supposedly respectable institutions. Any metric you can invent to decide who or what has a proven track record is going to be circular in this regard. To Rumsfeld the problem, we are surrounded by "unknown knowns". You say this is an open question but to me that's a fatal flaw.
So the problem is actually the inverse. You say at the end that scientists who can fund their own work are an exception. Obviously in most cases scientists don't need to do this; they can also be funded by companies. Most computer science research works this way. Better CPUs and hardware are developed almost entirely by companies. AI research has been driven by corporate scientists, and so on. In contrast, academic funding comes primarily from government agencies that distribute money according to the desires of academics. This means a tiny number of people control large sums of money, and they are accountable to nobody except themselves. There are no systems or controls on academic behavior except peer review, which is largely useless because the peers are doing the same bad things as everyone else.
Viewed from an economic perspective academia is a planned reputation economy. The state is the source of all resource allocation decisions (academics being effectively state employees in most fields). There's also a deeply embedded Marxist worldview: universities have no working mechanisms to detect fraud, because of an implicit assumption that deep down when market forces are gone everyone is automatically honest and good. The hierarchy is stagnant; the same institutions remain at the top for centuries. A good reputation lets them select the people with the reputation for being smart (e.g. by school grade), so that reputation accrues to the institutions, which lets them keep selecting intake by reputation and so on. Supposedly Oxford and Cambridge are the best UK universities, they always have been, and they always will be. In a competitive, free market economy they would face competition and other institutions would seek to figure out what their secret is and copy it, like how so many companies try to copy the Toyota Way. In science this doesn't happen because there's nothing to copy: these institutions aren't actually different.
This implies a simple solution, just privatize it all. It would be wrenching, just like it was when the USSR transitioned to a market economy, just like it was when China (sort of) did the same. But one thing the 20th century teaches us is that you can't really fix the problems of a planned economy by tinkering with small reforms at the edges. The Soviets weren't able to fix their culture with glasnost and perestroika. They eventually had to give up on the whole thing. Replacing the current reputation economy with a real economy, with all the mechanisms that economic system has evolved (markets, prices, regulators, court cases, fraud laws etc), seems like a more direct and obvious approach to making things better, even if it may sound extreme.
Oh hey, Mike Hearn! I've long been a fan of yours in Bitcoin. It's good to see you're interested in 'bad science' lately as well -- this is a topic I've also been working on for the last N years along with Bitcoin. I hope we get to interact more in the future. :)
My envisioned solution is similar to yours, here. But rather than "privatize science", which I think most people will interpret as "move to industrial research", my rallying cry is a little more like "hey scientists, stop depending on public funding, let's find creative ways to get the science done."
I also like to point out that money is often not the missing factor as much as community. This has always been true. Mendel discovered genetics by experimenting on pea plants in his garden at his monastery. It cost him very little to do it, and he only stopped the research when his community told him to stop wasting time on peas and get back to the important accounting work that impacted the church's politics at the time.
You might think that maybe science was cheap in the past, but that today you need lots of money for lab equipment, etc. However, science always has a cutting edge of cheaply evaluable questions. We recently hosted a DIY Synthetic Biologist (currently on the homepage of https://invisible.college) who showed the actual costs of his work, and his laboratory equipment was far, far cheaper than the "cost" of his time. We can get far more science done with "amateur scientists" (remember that "amateur" comes from the Latin for "to love"; an amateur scientist is one doing science for love) by creating a scientific community outside the institutions for interested parties to work together, pool their brainpower and resources, and come up with great novel work.
And if anyone else agrees with me on this, please let me know so we can join forces. I'm toomim@gmail.com, and am doing work on invisible.college.
Hello! Absolutely, drop me an email any time you like.
I absolutely agree that a lot of science can be done very cheaply. Some of the most impactful papers were done by people who weren't in an institutional framework, even in the modern era (Satoshi being an obvious example). Additionally it seems most of the really problematic fields are ones where the budget gets dispersed over large number of people writing very cheap low budget papers, hence millions of social science papers with tiny sample sizes.
I'm a big supporter of industrial research though. Many great papers come out of industrial labs. Modern computing is practically defined by such research. The big advances all seem to come from big corporate labs (Xerox PARC, Bell Labs, Google, DeepMind, IBM, Sun, Microsoft, etc). The research is powerful because it's funded by people who expect some sort of meaningful results and supervise the work to ensure it doesn't go completely off the rails. Academic institutions have developed this totally hands off attitude that makes research more or less unaccountable to any standard beyond "will it get published", which in turn can be rephrased as "are the claims interesting".
> The big advances all seem to come from big corporate labs
That's an interesting claim, and I'd encourage you to find some statistics to verify this hypothesis, because in my experience, that doesn't ring true.
From my subjective perspective, it seems that academic and industrial research labs innovate at roughly the same rate per-capita. I was a PhD student when Microsoft was dominant, hiring the best faculty from all top-4 CS schools (CMU, Berkeley, MIT, Stanford), and they certainly produced a lot of papers, and did seem to dominate conferences, but the actual innovation in computing came from Apple and startups, which did not have "research labs". Microsoft, including its giant industrial research lab, certainly was not the driver of innovation in computing!
And here are some numbers to back that up: Microsoft's R&D budget in 2011 was 10x the budget of the entire NSF -- for all sciences. Yet, Microsoft was clearly not producing more than 10x the scientific output of all NSF-funded academic science.
So it would help to have some statistics for the claim that industrial research innovates more than academic research. They certainly pay more, and often hire more people, but per-capita they don't seem any more productive or healthier than academics.
Ah, right. That gets us into the definitions of innovation and research.
Apple does very little research, in the conventional scientific sense we're discussing here, I think that's pretty uncontroversial. They produce few if any papers. They are (or were, under Jobs) very good at coming up with new ideas that strongly appeal to the buyer and which got them a reputation for innovation, but which probably wouldn't be considered clever enough to be research papers. At least not top tier papers.
For example, Exposé is a widely imitated feature and was considered very innovative at the time, but it wouldn't be seen as serious computer science. The iPhone is/was widely considered innovative but had basically no new research tech in it, given that capacitive touch screens weren't developed by Apple. It was just a really nicely implemented mobile computer. Actually, the innovations in the iPhone are nearly all packagings of tech developed by third-party firms that Apple then buys or buys exclusivity rights to. At least, that's true in my view.
Microsoft's R&D budget I think is also a victim of definitions. Software firms normally report all product development as R&D, right? I think these days they may even report datacenter builds as R&D. We can see this on Microsoft's investor website:
"In addition to our main research and development operations, we also operate Microsoft Research. Microsoft Research is one of the world's largest computer science research organizations"
i.e. the kind of university type "scientific" research we're discussing here is only a sideshow in Microsoft's R&D budget.
You're right to call me out though; I don't have any stats to prove that industrial research does more than academic research. It's not a statistical argument to begin with, just my own perception ("all seem to"). I read a lot of CS papers, and the best ones have corporate email addresses at the top; the second best, a mix of corporate and university addresses; the third best, only university addresses. If you asked the man on the street to name the biggest innovations in computing in the past 20 years, they'd probably say things like, uh, smartphones, YouTube, AI, blockchain, etc. All things that have little connection to universities, with AI being the closest, but it was Google that revived that whole field and has been pushing it forward ever since. Neural nets weren't receiving much investment from the academic community before that.
Anyway, that's CS. CS really isn't the problem here. The pseudo-science is elsewhere.
I did peer review for a number of scientific papers that included code. Almost every time, I was the only reviewer who even looked at the code.
In most cases, peer reviewers will just assume that when authors claim the "code is available", a) it is reproducible and b) it is actually there.
One paper, for example, claims the code is available on GitHub, but the GitHub version (https://github.com/jameswweis/delphi) contains the actual model only as a pickle file, and contains no data or featurization.
Re-running is definitely too much work for most scientific papers, at least in ML and the computational sciences where experiments might take thousands of core-hours or GPU-hours, but that's usually not necessary. Besides, just running the code can spot really bad problems (it doesn't work) but easily miss subtle ones (it works, but only for very specific cases).
I think it's more important for reviewers to read the source, the same way one would read an experimental protocol and supplementary information, mainly checking for discrepancies between what the paper claims is happening and what is actually being done. In the above example, a reviewer reading the code would have spotted that the model isn't there at all, even though it runs fine.
Providing source code is a good thing, but a lot of people confuse re-running experiments with replicating them. If you take the authors' source code and re-run it, then any bugs are going to invalidate your results too. The only way to actually have confidence in the paper's results is to rewrite the software from scratch.
In fact, I'd go further and ask what kinds of errors could possibly be caught by running the same software that the authors did. Any accidental bugs will remain, and any malicious tampering with the experiment data is exceedingly unlikely to be caught even with a careful audit of the code.
That isn't possible if you're using commercially licensed source from other people, drivers for scientific instruments, lacking copyright assignment for some of it, etc. Same reason many commercial projects can't be open sourced even if the company wanted to.
Of course I was simplifying... but it seems obvious to me that enforcing automatic reproducibility in peer-reviewed publications can only be a good thing in the long run.
My personal opinion is this problem fixes itself over time.
When I was in graduate school, papers from one lab at Harvard were known to be "best case scenario". Other labs had a rock-solid reputation: if they said you could do X with their procedure, you could bet on it.
So basically we treated every claim as potential BS unless it came from a reputable lab or we or others had replicated it.
One approach is to include a replication package with the paper, including the dataset... This should be regarded as standard practice today, as sharing something has never been easier. However, adding a replication package is still done by only a minority of researchers...
I can understand why journals don’t publish studies which don’t find anything. But they really should publish studies that are unable to replicate previous findings. If the original finding was a big deal, its potential nullification should be equally noteworthy.
While I would have agreed with that when I was younger, I have learned there are a lot of possible reasons why PhD students (the people who actually run the studies) fail to replicate things (and I am talking about fundamentally solid engineering).
This was exactly my experience, and I remember the paper that finally convinced me. It turned out the author had intentionally omitted a key step that made it impossible to reproduce the results, and only extremely careful reading and some clever guessing found the right step.
There are several levels of peer review. I've definitely been a reviewer on papers where the reviewers requested everything required and reproduced the experiment. That's extremely rare.
Their username is publicly linked to their real-life identity. Revealing the name and institution has a reasonable chance of provoking a potentially messy dispute in real life. Maybe eob has justice on their side, but picking fights has a lot of downsides, especially if your evidence is secondhand.
From what I have read, peer review was a system that worked when academia and the scientific world were much smaller and much more like "a small town." It seems to me like growth has caused sheer numbers to make that system game-able and no longer reliable in the way it once was.
May I ask what field of knowledge the manipulated paper was from? Your page lists CS/NLP, so the field may also be linguistics or neurology (linguistics would be easier for me to swallow). https://scholar.google.com/citations?user=FMScFbwAAAAJ&hl=en
Some wider questions would be: Are there similar problems in Mathematics/physics versus the life sciences/other social sciences? Are there the same kind of problems across different fields of study?
Also, I wonder if replication issues would be less severe if there were a requirement to publish the software and raw data that any study is based on as open source / open data. It is possible that a change in this direction would make it more difficult to manipulate the results (after all, it's the public who paid for the research, in most cases).
I worked at a prestigious physics lab working for the top researcher in a field. It absolutely happens there and probably everywhere.
The only way to fix replication issues is to give financial and career incentives for doing replication work. Right now there are few carrots and many sticks.
Frankly, sir, it is the reason you wish your anecdote to remain anonymous that such perfidy survives. If these traitors to human reason and the public’s faith in their interests serving the general welfare - after all who is the one feeding them? - became more public, perhaps there would be less fraudulence? But I suppose you have too much to lose? If so, why do you surround yourself in the company of bad men?
The issue is that the authors of bad papers still participate in the peer-review process. If they are the only expert reviewers and you do not pay proper respect to their work, they will squash your submission. To avoid this, papers can propagate mistakes for a long time.
Personally, I'm always very careful to cite and praise work by "competing" researchers even when that work has well-known errors, because I know that those researchers will review my paper and if there aren't other experts on the review committee the paper won't make it. I wish I didn't have to, but my supervisor wants to get tenured and I want to finish grad school, and for that we need to publish papers.
Lots of science is completely inaccessible for non-experts as a result of this sort of politics. There is no guarantee that the work you hear praised/cited in papers is actually any good; it may have been inserted just to appease someone.
I thought that this was something specific to my field, but apparently not. Leaves me very jaded about the scientific community.
What is it that makes you have a nice career in research? Is it a robust pile of publishing or is it a star finding? Can you get far on just pure volume?
I want to answer the question "if I were a researcher and were willing to cheat to get ahead, what should be the objective of my cheating?"
I suppose it depends on how you define "nice". If you cheat, at some point people will catch on, even if you don't face any real consequences. So if you want prestige within your community, cheating isn't the way to go.
If you want to look impressive to non-experts and get lots of grant money/opportunities, I'd go for lots of straightforward publications in top-tier venues. Star findings will come under greater scrutiny.
Not outright cheating, but cooking results to seem better/surprising and publishing lots of those shitty papers is the optimal way to build a career in many fields. In medicine, for example.
For grants and tenure, 100 tiny increments over 10 years are much better for your research career than 1 major paper in 5 years that is better than all of them put together.
If you want to write a pop book, go on TV, and sell classes, you need one interesting bit of pseudoscience and a dozen followup papers using the same bad methodology.
This sounds inseparable from the replication crisis. The incentives are clearly broken: they are not structured in a manner that achieves the goal of research, which is to expand the scope and quality of human knowledge. To solve the crisis, we must change the incentives.
Does anyone have ideas on how that may be achieved - what a correct incentive structure for research might look like?
Ex-biochemist here, turned political technologist (who's spent a few years engaged in electoral reform and governance conversations).
> the goal of research, which is to expand the scope and quality of human knowledge.
But are we so certain this is ever what drove science? Before we dive into twiddling knobs with a presumption of understanding some foundational motivation, it's worth asking. Sometimes the stories we tell are not the stories that drive the underlying machinery.
For example, we have a lot of wishy-washy "folk theories" of how democracy works, but actual political scientists know that most of the mechanisms people "think" drive democracy are just a bullshit story. According to some, it's even possible that the function of these common-belief fabrications is that their falsely simple narrative stabilizes democracy itself in the mind of the everyman, due to the trustworthiness of seemingly simple things. So it's an important falsehood to have in the meme pool. But the real forces that make democracy work are either (a) quite complex and obscure, or (b) as yet inconclusive. [1]
I wonder if science has some similar vibes: folk theory vs. what actually drives it. Maybe the folk theory is "expand human knowledge", but the true machinery is and always has been a complex concoction of human ego, corruption, and the fancies of the wealthy, topped with an icing of natural human curiosity.
> I wonder if science has some similar vibes: folk theory vs. what actually drives it. Maybe the folk theory is "expand human knowledge", but the true machinery is and always has been a complex concoction of human ego, corruption, and the fancies of the wealthy, topped with an icing of natural human curiosity.
The Structure of Scientific Revolutions by Thomas Kuhn is an excellent read on this topic - dense but considered one of the most important works in the philosophy of science. It popularized Planck's Principle paraphrased as "Science progresses one funeral at a time." As you note, the true machinery is a very complicated mix of human factors and actual science.
Modern real science is driven by engineering, which is driven by industry, which is driven by profit and nature. If you are reading a paper that isn't driven by that chain of incentives, then the bullshit probability shoots way up. If someone somewhere isn't reading your paper to make a widget that is sold to someone to do something useful, then you can say whatever you want.
I've thought about it a lot and I don't think it can be achieved.
The trouble is that for the evaluators (all the institutions that can be sources of an incentive structure) it's impossible to distinguish an unpublished 90%-ready Nobel prize from unpublished 90%-ready bullshit. So if you've been working for 4 years on minor, incremental work and published a bunch of papers it's clear that you've done something useful, not extraordinary, but not bad; but if you've been working on a breakthrough and haven't achieved it, then there's simply no data to judge. Are you one step from major success? Or is that one step impossible and will never be achieved? Perhaps all of it is a dead end? Perhaps you're just slacking off on a direction that you know is a dead end, but it's the one thing you can do which brings you some money, so meh? Perhaps you're just crazy and it was definitely a worthless dead end? Perhaps everyone in the field thought that you're just crazy and this direction is worthless but they're actually wrong?
Peter Higgs is a relevant case - IIRC he said in one interview that for quite some time "they" didn't know what to do with him, as he wasn't producing much, and the things he had done earlier were either useless or Nobel-prize-worthy, but which was impossible to tell until many years after the fact. How can an objective incentive structure take that into account? It's a minefield.
IMHO any effective solution has to scale back on accountability and measurability, and to some extent just give funding to some people/teams with great potential and see what they do - with the expectation that it's OK if it doesn't work out, since otherwise they're forced to pick only safe topics that are certain to succeed and also certain not to achieve a breakthrough. I believe the European Research Foundation had a grant policy with similar principles, and I think that DARPA, at least originally, was like that.
But there's a strong pressure in entirely the opposite direction from the key stakeholders holding the (usually government) purses: their interests lean towards avoiding bad PR from any project with seemingly wasted money, and that results in a push towards these broken incentive structures and mediocrity.
I would go a step further and say that the value of specific scientific discoveries (even when no bullshit is involved) often cannot be evaluated until decades later.
Moreover, I would argue that trying to measure scientific value is in fact an effort to try to quantify something unquantifiable.
At the same time, academics are increasingly evaluated by metrics meant to show value for money. This has led to some schizophrenic incentive structures. Most professor-level academics spend probably around 30% of their time writing grants, evaluating grants, and reporting on grants. Moreover, the evaluation criteria often demand that work be innovative, "high risk/high reward" and "breakthrough science", but at the same time feasible (and often you should show preliminary work), which I would argue is a contradiction.
This naturally leads to academics overselling their results. Even more so because you are also supposed to show impact.
The main reason for all this, IMO, is reduced funding for academic research, particularly considering the number of academics that are around. Everyone is competing for a small pot, which makes those who play to the (broken) incentives the most successful.
Well, perhaps we can learn from how the startup ecosystem works?
For commercial ventures, you also have the same issue of incremental progress vs big breakthroughs that don't look like much until they are ready.
As far as I can tell, in the startup ecosystem the whole thing works by different investors (various angels and VCs and public markets etc), all having their own process to (attempt to) solve this tension.
There's beauty in competition. And no taxpayer money is wasted here. (Yes, there are government grants for startups in many parts of the world, but that's a different issue from angels evaluating would-be companies.)
Startups are at an entirely different phase and have something research does not: feedback via market success.
The USSR already demonstrated what happens when you try to run a process that depends on price signals without them: their dead-end economic theory tried to calculate a globally fair price instead.
"You get what you measure" applies here. Now, if we had some Objective Useful Research Quality Score, it could replace the price signals. But then we wouldn't have the problem in the first place - we'd just promote based on OURQS.
Startups have misaligned incentives in a monopoly-ruled world, though? "Build a thousand messenger variations to get acquired by Facebook" comes to mind. So economic thinking might be harmful here?
You don't make a nice career in a vacuum. With very few exceptions, you don't get star findings in a social desert. You get star findings by being liked by influential supervisors who are liked by even more influential supervisors.
>Lots of science is completely inaccessible for non-experts as a result of this sort of politics
As a non-expert, this is not the type of inaccessibility that is relevant to my interests.
"Unfortunately, alumni do not have access to our online journal subscriptions and databases because of licensing restrictions. We usually advise alumni to request items through interlibrary loan at their home institution/public library. In addition, under normal circumstances, you would be able to come in to the library and access the article."
This may not be technically completely inaccessible. But it is a significant "chilling effect" for someone who wants to read on a subject.
Some journals allow you to specify reviewers to exclude. True, there is no guarantee about published work being good, but that is likely more about the fact that it takes time to sort out the truth than about nefarious cabals of bad scientists.
I think the inaccessibility is for different reasons, most of which revolve around the use of jargon.
In my experience, the situation is not so bad. It is obvious who the good scientists are, and you can almost always be sure that if they wrote it, it's good.
In many journals it's an abuse of process to exclude reviewers you don't like. Most of the time this mechanism is supposed to be used to declare conflicts of interest based on relationships you have in the field.
Why do people need to publish? The whole point of publishing was content discovery. Now that you can just push it to a preprint or to your blog what’s the point? I’ve written papers that weren’t published but still got cited.
I need money to do research, available grants require achieving specific measurable results during the grant (mostly publications fitting specific criteria e.g. "journal that's rated above 50% of average citation rating in your subfield" or "peer reviewed publication that's indexed in SCOPUS or WebOfScience", definitely not a preprint or blog), and getting one is also conditional on earlier publications like that.
In essence, the evaluators (non-scientific organizations who fund scientific organizations) need some metric to compare and distinguish decent research from weak, one that's (a) comparable across fields of science; (b) verifiable by people outside that field (so you can compare across subfields); (c) not trivially changeable by the funded institutions themselves; (d) describable in an objective manner so that you can write up the exact criteria/metrics in a legal act or contract. There are NO reasonable metrics that fit these criteria; international peer-reviewed-publications fitting certain criteria are bad but perhaps least bad from the (even worse) alternatives like direct evaluation by government committees.
At some point, there's not going to be enough budget for both the football coach and the Latin philology professor. We should hire another three layers of housing diversity deans just to be safe.
What’s crazy to me is nothing should stop an intelligent person from submitting papers, doing research, etc. even outside the confines of academia and having a PhD. But in practice you will never get anywhere without such things because of the politics involved and the incestuous relationship between the journals and their monetarily-uncompensated yet prestige-hungry army of researchers enthralled to the existing system.
If you add 'self funded' to this hypothetical person, then it would not matter if they play any games. Getting published is really not that hard if your work is good. And if it is good it will get noticed (hopefully during the hypothetical person's lifetime). Conferences have less of these games in my experience and would help.
Also, I know of no researchers personally who are enthralled by the existing system.
I think one of the most famous examples is that of Gosset, who published his work on statistical significance under the pen name "Student".[0] I wish I could give you a more recent example, but I don't pay much attention to authors' degrees unless a paper is suspicious and from a journal I am unfamiliar with.
If I am reading between the lines correctly, you are implying there are few undergrads publishing in high caliber journals because of gatekeeping. As a reviewer, I often don't even know the authors' names, let alone their degrees and affiliations. It is theoretically possible that editors would desk reject undergrads' papers, but: a) I personally don't think a PhD is required to do quality research, especially in CS, and I know I am not the only person thinking that; b) In some fields like psychology and, perhaps, physics many junior PhD students only have BS degrees, which doesn't stop them from publishing.
I think that single-authored research papers by people without a PhD are relatively uncommon because getting a PhD is a very popular way of leveling up to the required expertise threshold, and getting research funding without one is very difficult. I don't suspect folks without a PhD are systematically discriminated against by editors and reviewers, but, of course, I can't guarantee that this is universally true across all research communities.
I believe that "good" research, i.e. that which would be referenced by other "good" researchers, useful in obtaining government grants, reported in the press, and so on, is indeed gatekept. Some subjects such as mathematics and computer science have made much progress via preprints, and anyone can publish anonymously and make a mark. But the majority of subjects are closed to all but those already connected, especially soft sciences like sociology, psychology, and economics.
I think the entire academic enterprise needs to be burnt down and rebuilt. It’s rotten to the core and the people who are providing the most value - the scholars - are simultaneously underpaid and beholden to a deranged publishing process that is a rat race that accomplishes little and hurts society. Not just in our checkbook but also in the wasted talent.
The status quo isn't perfect, but I think you are severely exaggerating how bad things are. The fact that nearly all scientific publishing is done by people who are paid to do research (grad students, research scientists, professors, etc.) isn't evidence of gatekeeping. It just means that most people aren't able/willing to work for free.
It also isn't any sort of conspiracy that government grants are given out to people with a proven history of doing good research, as evaluated by their peers.
I personally, holding only a BS, along with an (at the time) high school senior, published a paper in a top-6 NLP conference. I had no help or assistance from any PhD or institution.
Maybe not quite as prestigious as Nature, but NLP is pretty huge, and the conference I got into has an average h-index of, I think, 60+.
I know a person who got published in high school. They did so by working closely with multiple professors on various projects. You don't have to do a PhD to do this especially if you're a talented and motivated youngster.
I have personally recommended for publication papers written by people who do not have a master's degree. In most cases I did not know that at the time of review, but it did not occur to me to care about it when I did.
I can name several high school students who conducted studies and led first author papers to leading HCI venues. They were supervised by academics though. Would that suffice?
This has nothing to do with gatekeeping. I agree that the current publication and incentive system is broken, but that's completely unrelated to the question of whether outsiders are being published. The reason you see very little work from outsiders is that research is difficult. It typically requires years of full-time dedicated work; you can't just do it on the side. Moreover, you need to first study and understand the field to identify the gaps. If you try to identify gaps on your own, you are highly likely to go off in a direction that is completely irrelevant.
BTW, I can tell you that the vast majority of researchers are not "enthralled" by the system but highly critical of it. They simply don't have a choice but to work within it.
I think this is a bit naive. One thing that stops a smart person doing research without a PhD is that it takes a long time to learn enough to be at the scientific frontier where new research can be done. About a PhD’s length of time, in fact. So, many people without a PhD who try to do scientific research are cranks. I don’t say all.
Some quality journals and conferences have double blind reviews now. So the work is reviewed without knowing who the work belongs to. It's not so much the politics of the system as the skills required to write a research paper being hard to learn outside of a PhD. You need to understand how to identify a line of work in a very narrow field so that you can cite prior work and demonstrate a proper understanding of how your work compares and contrasts to other closely related work. That's an important part of demonstrating your work is novel and it's hard to do (especially for the first time) without expert guidance. Most students trying this for the first time cite far too broadly (getting work that's somewhat related but not related to the core of their ideas) and miss important closely related work.
Sure, but I'm dreaming of a whole parallel reformed "New-niversity" system that replaces outdated and wasteful practices with systems that are more productive.
It will probably have to be started by some civic minded billionaires. I don't think the established system can reform itself.
You're like that Chinese sports boss who was arrested for corruption and complained that it would be impossible to do his job without participating in bribery. Just because you stand to personally gain from your corrupt practices doesn't excuse them. If anything, it makes them morally worse!
I don't tell lies about bad papers, only give some perfunctory praise so that reviewers don't have ammunition to kill my submission. E.g., if a paper makes a false contribution X and a true contribution Y, I only mention Y. If I were to say "so-and-so claimed X but actually that's false", I would have to prove it, and unless it's a big enough issue to warrant its own paper, I don't want to prove it. Anyway, without the raw data, source code, etc. for the experiments, there is no way for me to prove that X is false (I'm not a mathematician). The reviewers would then ask why I believe X is not true when peer review accepted it. Suddenly all of my contributions are out the window, and all anybody cares about is X.
The situation is even worse when the paper claiming X underwent artifact review, where reviewers actually DID look at the raw data and source code but simply lacked the attention or expertise to recognize errors.
I don’t really buy the comparison entirely. Presumably the sports boss is doing something patently illegal, and obviously there are alternative career paths. OP is working in academia, which is socially acceptable, and feels that this is what is normal in their academic field, necessary for their goals, and isn’t actively harmful.
I wouldn’t necessarily condone the behavior, but what would you do in the situation? To always whistleblow whenever something doesn’t feel right and risk the politics? To quit working in the field if your concerns aren’t heard? To never cite papers that have absolutely any errors? I think it’s a tough situation and not productive to say OP isn’t behaving morally.
More specifically, this paper is focused on the social sciences. That's not to say the problem isn't present in the basic sciences as well.
But one other thing to note here is that these headlines about a "replication crisis" seems to imply that this is a new phenomenon. Let's not forget the history of the electron charge. As Feynman said:
"We have learned a lot from experience about how to handle some of the ways we fool ourselves. One example: Millikan measured the charge on an electron by an experiment with falling oil drops, and got an answer which we now know not to be quite right. It's a little bit off because he had the incorrect value for the viscosity of air. It's interesting to look at the history of measurements of the charge of an electron, after Millikan. If you plot them as a function of time, you find that one is a little bit bigger than Millikan's, and the next one's a little bit bigger than that, and the next one's a little bit bigger than that, until finally they settle down to a number which is higher.
Why didn't they discover the new number was higher right away? It's a thing that scientists are ashamed of—this history—because it's apparent that people did things like this: When they got a number that was too high above Millikan's, they thought something must be wrong—and they would look for and find a reason why something might be wrong. When they got a number close to Millikan's value they didn't look so hard. And so they eliminated the numbers that were too far off, and did other things like that ..."
Something that I think the physical sciences benefit from is the ability to look at a problem from more than one angle. For instance, the stuff that we think is the most important, such as the most general laws, is supported by many different kinds of measurements, plus the parallel investigations of theoreticians. A few scattered experiments could bite the dust, like unplugging one node in a mesh network, and it could either be ignored or repaired.
The social sciences face the problem of not having so many different possible angles, such as quantitative theories or even a clear idea of what is being tested. Much of the research is engaged in the collection of isolated factoids. Hopefully something like a quantitative theory will emerge, that allows these results to be connected together like a mesh network, but no new science gets there right away.
The other thing is, to be fair, social sciences have to deal with noisy data, and with ethics. There were things I could do to atoms in my experiments, such as deprive them of air and smash them to bits, that would not pass ethical review if performed on humans. ;-)
Your example of looking at a problem from more than one angle made me think of the problem of finding the Hubble constant that describes the rate of expansion of the universe. There are two recent methods which have different estimates for this rate of expansion.
Indeed, one of the things that's possible in physics is to nail down the experimental results to the point where, if two results disagree, you know that they really disagree, and that it's not just a statistical fluke. Then it gets interesting.
In physics, when a result raises more questions than it answers, we call it "job security." ;-)
> More specifically, this paper is focused on the social sciences.
No, it isn't. It looked at a few different fields and found that the problem was actually worse for general-science papers published in Nature/Science, where non-reproducible papers were cited 300 times more often than reproducible ones.
I think you might be mistaken. The study of Nature/Science papers was "Evaluating replicability of social science experiments published in Nature and Science between 2010 and 2015"
Feynman's example is of people being more critical about certain issues. A better example is the case of "radiation" that could only be seen in a dark room out of the corner of your eye, which turned out to be a human visual artifact and wishful thinking.
It's interesting that according to the Wikipedia article it's not entirely certain whether the radiation is producing actual light or just the sensation of light.
I worked in the academic world for two years. What I saw was that lots of people are under a constant pressure to publish, and quantity is often put above quality.
I've seen papers without any value or reason to exist being brute-forced through review just so that some useless junk data wouldn't go to waste, all to add a line to someone's CV.
This is without mentioning that some universities are packed with totally incompetent people who only got to advance their careers by always finding a way to piggyback on someone else's paper.
The worst thing I've seen is that reviewing papers is also often offloaded to newly graduated fellows, who are often instructed to be lenient when reviewing papers coming from "friendly universities".
The level of most papers I have had the disgrace to read is so bad it made me want to quit that world as soon as I could.
I came to the conclusion that the whole system is basically a complex game of politics and strategy, fed by a loop in which bad research gets published in mediocre outlets, which get a financial return by publishing it. This bad published research is then used to justify further money being spent on low-quality, rubbish work, and the cycle continues.
Sometimes you get to review papers that are so comically bad and low effort they almost feel insulting on a personal level.
For instance, I had to reject multiple papers not only due to their complete lack of content, but also because their English was so horrendous they were basically unintelligible.
Quality is definitely valued above quantity in academia in almost all disciplines. The issue is that citation count is used as a proxy for quality, and it's a poor one in many respects.
Until this is fixed, people need to stop saying "listen to The Science", in an attempt to convince others of a given viewpoint. Skeptical people already distrust our modern scientific institutions; not completely obviously, but definitely when they're cited as a cudgel. Continued articles like this should make everyone stop and wonder just how firm the supposed facts are, behind their own favoured opinions. We need a little more humility about which scientific facts are truly beyond reproach.
We also need to listen to the science on things that are clearly established. The replication crisis is not something that affects almost anything in public debate. Evolution is well established science. Large parts of Climate Change are well established science. Etc.
Yea, evolution and climate change are pretty solid. But the projected models do no favor to climate science. Those are bogus. They've mostly been wrong in the past and will be mostly wrong again in the future. There are way too many variables and unknowns, and those accumulate the further out the model reaches.
And your evidence for that is what exactly? Since this replication problem is already known to appear in multiple disciplines, it's quite likely that the same misconduct is happening in other areas too. I think you're being a little too quick to hope that it doesn't affect those areas where you have a vested interest.
There is a small but non-zero chance that any given paper on climate change is wrong or fraudulent, but it is absurd to go from that to claiming that we should disregard literally all scientific research that has ever been done. At a certain point you have to accept basic truths, like that 2+2=4, or that the earth revolves around the sun, or that the composition of the atmosphere impacts our climate.
"Believe science" is incredibly destructive to the entire field. It is quite literally turning science into a religion. Replacing the scientific method with "just believe what we say, how dare you question the orthodoxy". We're back to church and priests in all but name.
In the main, people don't literally mean it that way; they're expressing belief in the validity of the scientific method. But the more they explain and justify the scientific method, the more time-consuming it is. When dealing with stupid people or obdurate trolls, the sincere adherent of the scientific method can be tricked into wasting a great deal of time by being peppered with questions that are foolish or posed in bad faith.
You're free to treat it as unquestionable, but the fact remains there is ample evidence of our scientific process being deeply broken with a really bad incentive structure if nothing else.
If you think there is no incorrect science at all in regards to evolution and climate change you're no better than the zealots of any religion.
Perhaps the government should have a team of people who randomly try to replicate science papers that are funded by the government.
The government can then reduce funding to institutions that have too high a percentage of research that failed to be replicated.
From that point the situation should resolve itself as institutions wouldn’t want to lose funding - so they’d either have an internal group replicate before publishing or coordinate with other institutions pre-publish.
This sounds like doubling down on the approach that was causing the problems.
The desire to control researchers and incentivize them to compete against each other to justify their salaries is understandable, but it has been blown so far out of proportion lately that it's doing active harm. Most researchers start their careers pretty self-motivated to do good research.
Installing another system to double-check every contribution will just increase the pressure to game the system in addition to doing research. And replicating a paper may sometimes cost as much as the original research, and it's not clear when to stop trying. How much collaboration with the original authors are you supposed to do, if you fail to replicate? If you are making decisions about their career, you will need some system to ensure it's not arbitrary, etc.
While I agree that "most" researchers start out with good intentions, I'm afraid I've directly and indirectly witnessed so many examples of fraud, data manipulation, wilful misrepresentation and outright incompetence, that I think we need some proper checks and balances put in place.
When people deliberately fake lab data to further their career, and that fake data is used to perform clinical trials on actual people, that's not just fraudulent, it's morally destitute. Yet this has happened.
People deliberately use improper statistics all the time to make their data "significant". It's outright fraud.
I've seen people doing sloppy work in the lab, and when questioning them, was told "no one cares so long as it's publishable". Coming from industry, where quality, accuracy and precision are paramount, I found the attitude shocking and repugnant. People should take pride and care in their work. If they can't do that, they shouldn't be working in the field.
PIs don't care so long as things are publishable. They live in wilful ignorance. Unless they are forced to investigate, it's easiest not to ask any questions and get unpleasant answers back. Many of them would be shocked if they saw the quality of work done by their underlings, but they live in an office and rarely get directly involved.
I've since gone back to industry. Academia is fundamentally broken.
When you say "double-checking" won't solve anything, I'd like to propose a different way of thinking about this:
* lab notebooks are supposed to be kept as a permanent record, checked and signed off. This rarely happens. It should be the responsibility of a manager to check and sign off every page, and question any changes or discrepancies.
* lab work needs independent validation, and lab workers should be able to prove their competence to perform tasks accurately and reproducibly; in industry labs do things like sending samples to reference labs, and receiving unknown samples to test, and these are used to calculate any deviation from the real value both between the reference lab and others in the same industry. They get ranked based upon their real-world performance.
* random external audits to check everything, record keeping, facilities, materials, data, working practices, with penalties for noncompliance.
Now, academic research is not the same as industry, but my point is that what's largely missing is oversight. By and large, there isn't any. Putting it in place would fix most of these problems, because most of them exist only because they are permitted to flourish in the absence of oversight. That's a failure of management in academia, globally. PIs aren't good managers. PIs see management in terms of academic prestige and expanding their research-group empires, but they are incompetent at it. They have zero training and little desire to do it, and it could be made a separate position in a department. Stop PIs managing, let them focus on science, and have a professional do it. And make compliance with oversight and work quality part of staff performance metrics, above publication quantity.
This is what industry does, though - at least in the less theoretical fields. If you actually want to make something that works, then you need to base your science on provable fact: produce oil, build a cool structure, generate electricity. It can be based on amazing and complex science, but it has to work.
The conclusion is that the science that gets done needs to be provable, but that means practical. Which is unfortunate - because what about all the science that may be, or one day may be, practical?
The trouble is, industrial researchers usually don't publish negative results or failures to reproduce. So it takes a long time to correct the published scientific record even if privately some people know it's wrong.
This is like the xkcd test for weird science: Is some big boring company making billions with it? If so (quantum physics) then it’s legit. If not (healing crystals, orgone energy, essential oils...) it probably doesn’t work.
>Is some big boring company making billions with it? If so (quantum physics) then it’s legit. If not (healing crystals, orgone energy, essential oils...) it probably doesn’t work.
Research is non-linear, and criteria-based evaluation lacks perspective. You might throw the baby out with the bathwater. The advancement of science follows a deceptive path. Remember how the inventor of the mRNA method was shunned at her university just a few years ago? Because of things like that, millions might die, but we can't tell beforehand which scientist is a visionary and which is a crackpot. If you close funding to seemingly useless research, you might cut off the next breakthrough.
I don't really see why "being shunned" or "being a visionary" has anything to do with this, to be honest. If you set up a simple rule, "the results have to be reproducible", then surely it shouldn't matter whether the theory is considered "crackpot" or "brilliant"?
Reproducibility is 10x more expensive: you have to make sure you can replicate all the conditions exactly, and there are no thanks at the end for all that effort. The incentive is to publish instead of finding the truth.
I don't really like the idea of 'replication police', I think it would increase pressure on researchers who are doing their job of pushing the boundaries of science.
However, I think there is potential in taking the 'funded by the government' idea in a different direction. Having a publication house that was considered a public service, with scientists (and others) employed by the government and working to review and publish research without commercial pressures could be a way to redirect the incentives in science.
Of course this would be expensive and probably difficult to justify politically, but a country/bloc that succeeded in such long term support for science might end up with a very healthy scientific sector.
- You would need some sort of barrier preventing movement of researchers between these audit teams and the institutions they are supposed to audit otherwise there would be a perverse incentive for a researcher to provide favorable treatment to certain institutions in exchange for a guaranteed position at said institutions later on. You could have an internal audit team audit the audit team, but you quickly run into an infinitely recursive structure and we'd have to question whether there would even be sufficient resources to support anything more than the initial team to begin with.
- From my admittedly limited experience as an economics research assistant in undergrad, I understood replication studies to be considered low-value projects that are barely worth listing on a CV for a tenure-track academic. That in conjunction with the aforementioned movement barrier would make such an auditing researcher position a career dead-end, which would then raise the question of which researchers would be willing to take on this role (though to be fair there would still be someone given the insane ratio of candidates in academia to available positions). The uncomfortable truth is that most researchers would likely jump at other opportunities if they are able to and this position would be a last resort for those who aren't able to land a gig elsewhere. I wouldn't doubt the ability of this pool of candidates to still perform quality work, but if some of them have an axe to grind (e.g. denied tenure, criticized in a peer review) that is another source of bias to be wary of as they are effectively being granted the leverage to cut off the lifeline for their rivals.
- You could implement a sort of academic jury duty to randomly select the members of this team to address the issues in the last point, which might be an interesting structure to consider further. I could still see conflict-of-interest issues being present especially if the panel members are actively involved in the field of research (and from what I've seen of academia, it's a bunch of high-intellect individuals playing by high school social rules lol) but it would at least address the incentive issue of self-selection. Perhaps some sort of election structure like this (https://en.wikipedia.org/wiki/Doge_of_Venice#:~:text=Thirty%....) could be used to filter out conflict of interest, but it would make selecting the panel a much more involved and time-consuming process.
The "Jury Duty" could easily be implemented in the existing grant structure - condition some new research grant on also doing an audit of some previous grant in your field (and fund it as part of the grant).
Depending how big the stick is and how it's implemented, this might push people away from novel exploratory research that has a lower chance of replicating despite best efforts.
Pulling up the actual paper, there is an added part the article doesn't mention.
> Prediction markets, in which experts in the field bet on the replication results before the replication studies, showed that experts could predict well which findings would replicate (11).
So the paper itself notes this isn't completely innocent: given different incentives, most reviewers can identify a suspicious study, but under current incentives letting it through due to its novelty somehow seems warranted.
This is almost a tautology. Unlikely/unexpected findings are more noteworthy, so they're more likely to be both cited and false, perhaps based on small sample sizes or p-hacking.
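A quick base-rate sketch (illustrative numbers, not from the paper) shows why: with conventional power 0.8 and alpha 0.05, the share of "significant" results that are actually true collapses as the prior plausibility of the hypothesis drops.

    # Illustrative only: probability a significant finding is true,
    # as a function of the prior plausibility of the hypothesis.
    def prob_true_given_significant(prior, power=0.8, alpha=0.05):
        true_pos = prior * power           # true effects detected
        false_pos = (1 - prior) * alpha    # null effects passing anyway
        return true_pos / (true_pos + false_pos)

    print(prob_true_given_significant(0.5))   # ~0.94 for a plausible claim
    print(prob_true_given_significant(0.05))  # ~0.46 for a surprising one

So the very surprisingness that earns citations also halves the odds that the result is real.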
People love this stuff. Malcolm Gladwell has made a career of it: half of the stuff he writes about is disproven before he publishes. It's very interesting that facial microexpression analysis can predict relationship outcomes with 90% certainty. Except it's just an overfit model, it can't, and he's no longer my favorite author. [0]
Similarly, Thomas Erikson's "Surrounded by Idiots" also lacks validation. [1]
Both authors have been making top 10 lists for years, and Audible's top selling list just reminded me of them.
Similarly, shocking publications in Nature or Science are to be viewed with skepticism.
I don't know what I can read anymore. It's the same with politics. The truth is morally ambiguous, time consuming, complicated, and doesn't sell. I feel powerless against market forces.
One of my pet peeves is when the local NPR station advocates some position or policy based on a recent small study (usually by/at some state school). Sometimes they'll couch it by saying it's not peer reviewed, it's preliminary, or something, but it's too late: they've already planted the seed and had their talking point, all with a study to back up their position, and listeners just go along with it.
Startup idea: run hundreds of "preliminary studies" with small sample sizes, then sell any of the p-hacked results to marketing or news groups. Want a story about how milk boosts intelligence? Want a story about how a standing desk leads to promotions? We have it all.
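For what it's worth, the product practically builds itself. A toy simulation (assuming numpy and scipy are available) of hundreds of small two-group studies where there is no effect at all:

    # Under a true null, ~5% of studies still come out "significant"
    # at p < 0.05 by chance alone; with small samples, that's plenty
    # of sellable headlines.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_studies, n_per_group, hits = 200, 15, 0
    for _ in range(n_studies):
        a = rng.normal(size=n_per_group)  # "milk drinkers"
        b = rng.normal(size=n_per_group)  # "control" -- same distribution!
        if stats.ttest_ind(a, b).pvalue < 0.05:
            hits += 1
    print(f"{hits}/{n_studies} spurious 'findings' ready to sell")  # ~10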
That market segment is already completely satisfied with existing options. The "hundreds of studies" are done for free by academics without meta-epistemic-scruples, and the selling to marketing groups is done for free by marketers without epistemic scruples.
Terrible start-up idea that has a better revenue model. Sell the experiment design to politically motivated organizations that want certain outcomes in the public sphere. Hack your way to results. Give to media for free.
Or they develop the critical thinking skills to realize that science is 95% failing and 5% "we don't reject the alternative". It is not absolute, and certainly the scientists themselves are deeply flawed being that they're human.
But here's the thing, people don't have time for this. They have work, bills, home and car maintenance, groceries, kids, friends, a slew of media to consume, recreation on top of it - and they're all dying. So it doesn't matter who says what, they're going to pick the dilute politicized version of the results that their team supports and run with it regardless of what the nigh-unreadable highly specialized papers say. Orwell said it well "I believe that this instinct to perpetuate useless work is, at bottom, simply fear of the mob. The mob (the thought runs) are such low animals that they would be dangerous if they had leisure; it is safer to keep them too busy to think."
And those who do elect the burden of extracurricular mental activity aren't given much in the way of options in any case. What are they to do, disseminate the material to their friends, co-workers, children - quite probably the very same population as mentioned above, weighted with the ceaseless demands of reality? To what end? Chinese whispers? It's better to have them say, "I don't know, I'm not convinced either way." A construal which is developed from adequately exercised critical skills. But that's another discussion about perverse social conditioning no doubt evolved from the deployment of poorly understood technique compounded by its acceptance as custom in education - I'm speaking of course about grading and student assessment. Nobody wants to be stupid at the very least, and professing one's ignorance is construed as an admission of guilt.
That’s why this whole journal peer review thing is bullshit. There’s a better solution: read pre prints and let people rank/discuss them as organic peer review.
“We did it Reddit” stands starkly counter to your recommendation.
I don’t know what the answer is, and I’ve been worried for a while that we are putting blind faith in “science” which just lines up with our preferred worldview. Maybe the answer is simply to use science, however it is performed, to inform, not guide, policy, and always keep in mind that what science believes has a non-zero chance of being politically-driven itself.
This is interesting. My first reaction is that the upvotes there are not a measure of quality because, I assume, they are based mainly on titles and abstracts. But I will try to use that site a bit more, so thanks for sharing.
"You might hypothesize that the citations of non-replicating papers are negative, but negative citations are extremely rare.5 One study puts the rate at 2.4%. Astonishingly, even after retraction the vast majority of citations are positive, and those positive citations continue for decades after retraction.6"
They state the cause of those citations to be that the "findings" in the papers "are interesting".
Is this really the case? And is this actually a "new" phenomenon?
It seems like it could be a disguised version of the Availability Cascade. [1] In other words, when we encounter a simple-to-understand explanation of something complex, the explanation ends up catching on.
Then, because the explanation is simple, its popularity snowballs. The idea cascades like a waterfall throughout the public. Soon it becomes common sense—not because of sense, but because of common.
I don’t mean to pick on one field in particular, but last year I made the throwaway comment to a FB post “arXiv is the new businesswire”.
The number of academic “big shots” (friends of the poster, not of me) who “liked” the comment was a bit alarming.
There’s too much incentive for fudging things (depending on your field, either grants or company funding).
The degree of fraud in Chinese journals is high and well discussed (as it should be). But apart from a small amount of hand-wringing over “the replication crisis” there is no similar condemnation of the work in the rest of the world.
There was a comment I read somewhere (not sure where exactly) stating that the modern peer-review process would never let someone like Einstein, who was a patent clerk, get the limelight.
The paper implies that less reproducible papers have a greater influence on science because they are more highly cited. But an alternate explanation suggests the opposite -- less reproducible papers are more highly cited because people are publishing papers pointing out the results are false.
It is also quite telling that the biggest differences in citation counts are for papers published in Nature and Science, while in discipline-specific journals (Figs. 1B,C) the effect is very modest. Practicing scientists know that Science and Nature publish the least reproducible results, in part because they like "sexy" (surprising, less likely to be correct) science, and in part because they provide almost no detail on how experiments were performed (no Materials and Methods).
The implication of the paper is that less reproducible science has more impact than reproducible science. But we know this is wrong - reproducible results persist, while incorrect results do not (we haven't heard much more about the organisms that supposedly use arsenic rather than phosphorus in their DNA -- https://science.sciencemag.org/content/332/6034/1163 ).
When I started a PhD in a biomedical field at a top institution, we were told that our results are not that important; what's important is the ability to "sell them". This focus on presentation over content permeated the whole establishment. I remember sitting with a professor trying to come up with plausible buzzwords for a multi-million grant application.
The phenomenon described in the articles sounds like a natural consequence of this attitude.
I'm not really surprised about the results related to Nature (and Science to a lesser degree). I have seen it multiple times that Nature editors (who are not experts in the field) have overruled reviewer recommendations to reject, because the results were exciting.
The incentives for Nature are not to produce great science but to sell journals, and that requires them to give the impression of being at the forefront of "scientific discovery". I've in fact been told by an editor, "our job is to make money, not great science".
The irony is that their incentives also make them very risk-averse, so they will not publish results that don't have a buzz around them. I know of several papers that created new "fields" and were rejected by the editors. The same incentive means highly cited authors have an easier time getting published in Nature.
I should say that this is typically much better in the journals run by expert editors, published by the technical societies like e.g. IEEE.
Yeah - I'd be surprised to learn that the average paper in their study had 153 citations regardless of other factors. Perhaps they only looked at highly cited papers, but that induces its own issues.
> “Given this prediction, we ask ‘why are non-replicable papers accepted for publication in the first place?’”
Obviously, because journals don't attempt replication, nor should they.
A study will only have replication by another researcher/team after it's been published. Or not, in which case that's publishable as well.
This is how science works and is supposed to work.
The problem should rather be: why are journals accepting papers with citations to debunked papers that don't also cite the debunking papers?
I've had plenty of friends have a paper get sent back for revisions because it was missing a newer citation. This should be a major responsibility of reviewers. So why aren't peer reviewers staying on top of the literature?
There is a really good reason for this: everyone wants the shock value of the easy publication, but no one wants to do the work to validate the original study, because that carries the risk of being wrong.
Almost certainly this has to do with popular or cherished narratives about reality increasingly diverging from reality. I have seen it here on HN over the past several years. HN used to be a place of thoughtful discussion. Now it is becoming increasingly politicised and redditified.
To me it seems all our social structures are decaying. Even look at the language and how everything is faux hyperbole these days. People are forgetting how to think, how to write, how to reason all for the sake of the woke cultural revolution. The political correctness of thought is valued over the ontological correctness.
Have some portion of academic funding distributed by assigning a pool of money to grantors and having them bet portions of that money on papers they consider impactful AND likely to replicate. Allow the authors to use the money on any research they want.
This lets authors get some research money without writing costly and wasteful grant proposals. It takes advantage of the fact that experts in the field can generally tell which studies are likely to replicate (I'm assuming granting agencies can find experts in the field to do this).
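As a toy sketch of the mechanism (all names and numbers invented, nothing reflecting a real program): each grantor splits a fixed pool across the papers they expect to replicate, and the stakes become the authors' research budgets.

    # Hypothetical allocation round.
    pool = {"grantor_a": 100_000, "grantor_b": 100_000}
    bets = {  # fraction of each grantor's pool staked per paper
        "grantor_a": {"paper_1": 0.7, "paper_2": 0.3},
        "grantor_b": {"paper_1": 0.2, "paper_2": 0.8},
    }
    budgets = {}
    for grantor, alloc in bets.items():
        for paper, frac in alloc.items():
            budgets[paper] = budgets.get(paper, 0) + frac * pool[grantor]
    print(budgets)  # {'paper_1': 90000.0, 'paper_2': 110000.0}

Grantors whose picks later replicate could be given a larger pool in the next round, which is where the incentive to bet honestly would come from.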
> Littlewood's law states that a person can expect to experience events with odds of one in a million (defined by the law as a "miracle") at the rate of about one per month. - wiki
> At a global scale, anything that can happen will happen a small but nonzero times: this has been epitomized as “Littlewood’s Law: in the course of any normal person’s life, miracles happen at a rate of roughly one per month.” This must now be extended to a global scale for a hyper-networked global media covering anomalies from 8 billion people—all coincidences, hoaxes, mental illnesses, psychological oddities, extremes of continuums, mistakes, misunderstandings, terrorism, unexplained phenomena etc. Hence, there will be enough ‘miracles’ that all media coverage of events can potentially be composed of nothing but extreme outliers, even though it would seem like an ‘extraordinary’ claim to say that all media-reported events may be flukes.
For this to be true, a person has to experience a huge number of events. For any meaningful definition of "event", that's false. So clearly this law requires an event to be routine in some extreme way in order for enough of them to occur that these outliers are hit. Maybe my miracle for this month is that I take a breath that is one-in-a-million in how unusually large it is.
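For what it's worth, the bookkeeping behind the law is explicit about this: it assumes roughly one perceptible "event" per second during about eight alert hours a day, which is exactly the "routine" definition being objected to here.

    # Littlewood's assumed event rate (one per second while alert):
    events_per_day = 60 * 60 * 8       # 28,800 "events" per day
    print(1_000_000 / events_per_day)  # ~34.7 days to a million events,
                                       # i.e. one "miracle" per month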
It is about media: the internet and other media are gathering all the outliers and reporting on them. It is not that a person experiences more events now than before the internet.
Keep in mind that highly cited papers are usually first to their field on a particular subject, and it makes sense that they will usually be less "accurate", simply because they don't have the luxury of building off a direct line of prior research.
The first Tesla Model S isn't anywhere as good as the current version, but it was still groundbreaking. Academic publications work the same way.
I think the issue with this analogy is: the new and improved Tesla Model S is hugely popular and profitable, while the academic papers that later revisit and improve upon those initial (high impact but less accurate) findings receive little attention or grant support, and consequently the "inaccuracies" persist.
This seems really obvious. People demand citations for bold claims, precisely because those claims are hard to believe.
The problem is that some fields don’t do replication studies. In such fields, highly cited papers with dubious findings are never debunked, because economists, to pick on one specific group, don’t do replication studies.
1/ For the best journals: a non-replication betting market for peer-reviewed and published papers. Grants go to replicating the papers that rank highest in the betting pool.
2/ A citation index where citing and publishing a paper that does not replicate lowers your score (a rough sketch below).
(This should be first used in machine learning research)
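One possible form the score in 2/ could take (the weighting is my own invention; nothing above specifies one):

    # Replication-aware citation score: citing replicated work counts for
    # you, citing non-replicating work counts against you, untested work
    # counts for nothing either way.
    def replication_weighted_score(citations, replication_status, penalty=2.0):
        """citations: list of cited paper ids;
        replication_status: {paper_id: True / False / None (untested)}."""
        score = 0.0
        for paper in citations:
            status = replication_status.get(paper)
            if status is True:
                score += 1.0
            elif status is False:
                score -= penalty
        return score

    print(replication_weighted_score(
        ["a", "b", "c"],
        {"a": True, "b": False, "c": None},
    ))  # 1.0 - 2.0 + 0.0 = -1.0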
This is psychology. Psychology has lots of problems, and those are not going away anytime soon. I think deeper biological monitoring technology, like fMRI, will pave the way to more robust studies; other than that, I don't have much hope.
fMRI has been at the avant-garde of p-hacking since its inception. I'm the co-author of several fMRI papers, but I wasn't responsible for the actual analysis. That was done by a PhD student under supervision by the head of the institute, who is fairly influential in his field. The first experiment didn't yield "significant" results, not even after a few months of trying all the possible ways to manipulate the data, but the values were close to acceptable. Since there was no budget to run another experiment, it was decided to add 8 more subjects. And lo and behold, after a few more months of data wrangling, there were significant blobs, and the article was published.
To add insult to injury: when subsequent experiments didn't give any interesting results and the PhD student didn't have enough material for graduation, they decided to split the data they already had by gene variants, because technology had just reached the point where that kind of analysis could be done by a sufficiently rich lab. And there was a new result! The original blob was now seen in two areas instead of one, depending on the variant. Highly publishable. That this meant the original finding was now invalidated wasn't even considered.
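Incidentally, the "add more subjects and test again" move is known as optional stopping, and it's easy to simulate how it inflates the false-positive rate even when there is no effect at all (a minimal sketch; the sample sizes are made up):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    trials, false_positives = 10_000, 0
    for _ in range(trials):
        x = rng.normal(size=20)                      # first batch: pure noise
        if stats.ttest_1samp(x, 0).pvalue < 0.05:
            false_positives += 1
            continue
        x = np.concatenate([x, rng.normal(size=8)])  # "add 8 more subjects"
        if stats.ttest_1samp(x, 0).pvalue < 0.05:
            false_positives += 1
    print(false_positives / trials)  # comes out above the nominal 0.05

Each extra peek at the data buys another chance at a spurious "significant" blob, which is exactly what happened here.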
Agreed, the sausage-making in science is ugly, but my point is that if the technology is more accessible, there will be some reliable data points to go by, rather than someone kludging together an experimental premise, making profound pronouncements, and then having those findings parroted by everyone.
Take, for example, a study done around 2005 that said "willpower is an exhaustible resource". They gave some participants problems to solve; cookies were baking in the back so the participants could smell them. Later, half of the participants were given cucumbers and the other half cookies, and the researchers observed that the people given cookies persisted longer at solving the problems. This supposedly proved that "willpower is an exhaustible resource".
Is willpower exhaustible? What if the participants, just as they were giving up, were offered an incentive of 500 USD to continue working on the same problem? Would they work much longer than the other group? Why? If willpower is an exhaustible resource, how can you draw from something that no longer exists? Would you then conclude that willpower can be depleted in humans, but that humans have the ability to synthesize it from currency? Of course that's ridiculous.
Willpower and expended effort are context-dependent, and each person can set their own threshold for when it's appropriate to give up; it's partly a conscious decision. A few years later, the authors of this same study walked back their claim that "willpower is an exhaustible resource" (which does not stop people from touting it as reasoning for whatever point they are trying to make).
My point is that bio-monitoring technology, probably combined with VR, will at least give us some verifiable data points.
Psychology has been using "bio-monitoring" since Helmholtz, although they think of it as involuntary or unconscious (modification of) response, e.g. reaction time, pupil dilation, heart rate. It all has the same problem: the distance between the hypothesis, the experimental manipulation, and the measurement is large and badly understood. Differences between conditions are then ascribed to noise, which enables "creative" interpretations of the results.
As an example: in the field in which I worked, it was commonly assumed that the processing mechanism had to interpret a part of the stimulus as belonging to one thing or the other. Then they would manipulate the context to see how they could influence that. There were many results and experimental refinements, leading to an endless stream of contradictory articles, all claiming victory for their theory. But there are models in which this assumption doesn't make sense. Instead, it could belong to both, and the ambiguity would be resolved later. That makes the whole research line I described nonsense. The data is valid (if well documented), but the analyses and conclusions are not.
A genetics statistician explained to me that all early genetic research (and that means up until at least 2000) is highly unreliable, because it used Neyman/Fisher-style significance testing with lenient p-value thresholds, like 0.01. The lack of understanding was (and probably still is) so great that 0.01 was simply not enough to exclude chance results. Nearly all those papers are irreproducible. Nowadays 5 sigma, a p-value below 0.000001, is considered safe, although he himself was convinced that Bayesian statistics is the only way to go.
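To see why a 0.01 threshold is hopeless in genetics specifically, a back-of-the-envelope calculation (my illustrative numbers, not the statistician's): a genome-wide scan tests on the order of a million variants, so even with zero real effects a 0.01 cutoff hands you thousands of "hits":

    from scipy.stats import norm

    n_tests = 1_000_000                 # rough order of magnitude for a genome-wide scan
    alpha_lenient = 0.01
    print(n_tests * alpha_lenient)      # 10,000 expected false positives

    alpha_5sigma = 2 * norm.sf(5)       # two-sided p-value at 5 sigma
    print(alpha_5sigma)                 # ~5.7e-07, below 0.000001
    print(n_tests * alpha_5sigma)       # ~0.57 expected false positives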
The understanding of the subject in psychology is considerably worse than in biology. No amount of bio-monitoring is going to overcome that.
I know; it would have been fine if it were just another discipline of science that happened to be hard to study.
The amount of interest from regular people in the subject, and the demand for research, is so great that people grasp at twigs. Most people want to improve themselves, to think and do better, and so take at least a fleeting interest in psychology, as evidenced by the rise of Jordan Peterson.
I recently found Dr. Andrew Huberman to be a voice of reason (check out his YouTube podcast; it's very information-dense and pretty good). He approaches things from a neurobiology perspective. Rather than having to work out a deeper understanding yourself, I think a discussion grounded in chemicals, hormones, and neurotransmitters at least keeps your feet on real things, from which you can explore further on your own. It also eliminates a lot of the cruft from the discussion.
This is a special case of a more general problem. Inferior technologies very often win in our industry because they are better marketed. Marketing is a skill completely distinct from, and unrelated to, doing.
Almost by definition, a paper that is wrong is going to be cited a whole bunch as people disprove it, though.
The real issue is that citation count is a terrible metric and always has been. (When I was doing research, the magic find was the paper with a low, but nonzero, citation count that went into a huge amount of detail on exactly what the authors did.)
My most cited paper has the head of an institution down as an author. Most, but far from all, of the citations are from students there. A previous paper introduced the concept and tools to apply it, while the cited paper just adds an example application and a host of authors.
Oddly, I think this kinda is, especially the 150x difference they're claiming.
Replication failures disproportionately hit high-profile journals. The journals' editorial processes favor "surprising", counter-intuitive claims: life based on arsenic, stem cells created by treating skin cells with a mild acid, cold fusion, etc. The prior probability these are true is lower (hence the surprise). At the same time, this also makes it more likely that someone will attempt to replicate the results--and that there will be interest in a negative paper if they can't.
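To put rough numbers on the "lower prior probability" point, here is the standard positive-predictive-value calculation, with toy numbers of my choosing:

    def ppv(prior, power=0.8, alpha=0.05):
        # P(claim true | significant result), by Bayes' rule
        return (prior * power) / (prior * power + (1 - prior) * alpha)

    print(ppv(0.50))   # mundane claim:    ~0.94
    print(ppv(0.01))   # surprising claim: ~0.14

Same journal, same p-value threshold, but the surprising result is far more likely to be a fluke, simply because it started from a lower prior.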
This may be an intrinsic property of the scientific community: the more novel, the more interest, the more popularity... but also the more likely to be untrue.
In Robert Heinlein's novel Starship Troopers there is a scene or two where the human forces happen to have psychics on their team, and it's largely unexplored in the novel. I always thought of it as Heinlein projecting physics forward to "in the future we'll have spaceships and robot suits" and projecting psychology forward to "psychics and remote sensors".
Maybe that was reasonable given how science was advancing at the time. If so, what powers and advances have poor research systems denied us?
> what powers and advances have poor research systems denied us
Alternatively, it may be that all the low-hanging fruit in the known orchards has been picked. There are still workers reaching for the remaining higher, rarer fruit, but only a few are needed for what fruit remains. Instead, there is a glut of workers. A few are up to the task. The rest pick up leaves and make convoluted arguments about their potential value. "These leaves are edible!"
Maybe we can find new orchards. Or maybe the unpicked orchards are too far away to ever reach. Or maybe they don't exist at all.
This is the big problem with the 'marketplace of ideas' and the notion that better education and dialog will fix everything. In a small marketplace, like a farmer's market, that's true. Also, markets based on a fungible commodity work OK as long as there's standardization of the product and conditions of perfect competition or something close to them obtain.
But in most marketplaces advertising is a dominant force, and advertising power is not a function of product quality; you can market a bad product successfully. There are standards of truth in advertising, but they're loose and poorly enforced, and the burden of evaluation falls on the consumer. Additionally, tribalist behavior builds up around products to a certain extent, as buyers of product X who dislike hearing that it's good or crap push back on such claims, for varying reasons. Where these differences are purely aesthetic, that doesn't matter much, but where they're functional, an objectively better product can lose out against one from a dishonest competitor or one with an irrationally loyal following. A problem for both producers and consumers is that it's more expensive to refute a false claim than to make it, so bad actors are incentivized to lie and lock in the advantage; it's arguably cheaper to apologize if caught out than to forgo the profitable behavior, as exemplified in the aphorism 'it's easier to ask forgiveness than permission.'
We see the results in problems like the OP's, and also in debates about public health, vaccine safety, climate change, and many other political issues. It's profitable to lie, a significant number of people have no problem doing it, the techniques for doing so have been repeatedly refined and weaponized, and those who rely on or simply prefer truth end up at a significant economic disadvantage. Don't make the mistake of thinking this is a problem confined to the niche world of academic journals.
Are papers ever cited if the results are disproven? Would papers have cited the notorious Wakefield paper on vaccines and autism if they were writing about how their results do not match that paper? Does that count as a cite?
Yes, citations aren't endorsements. Some citations (rarely) are negative/explicitly critical, but this kind of difference isn't tracked. A citation is a citation.
References are often ambiguous. A mention like “Although some have argued X[1]” could be skeptical, but not explicit, about Reference #1’s quality. It could mean that there’s convincing data on both sides, or it could mean that Ref #1 is hot garbage (but the author doesn’t want a fight).
In some cases, there might be legitimately useful information in a retracted paper. If a paper describes an experiment but then shows faked results, I’m not sure it’s wrong to cite it if you use a similar setup; that is where the idea came from, after all.
Most critically, there’s an unpredictable and often long lag between reading a paper, citing it in one’s own work, and that work being published. I’ve had things sit at a journal for a year before publication, and it never occurred to me to “re-verify” the citations I included; indeed, I’ve never heard of anyone doing that.