Failures of replication in psychology

The replication crisis in science is often discussed, and it is often used to suggest that science is not trustworthy and may be just as flawed as religion. Many of the most important results in psychology and other areas have not been replicated. But these criticisms ignore the large number of hard-science findings that have been replicated. As far as I know, DNA is still a double helix, Jupiter is still bigger than Earth, benzene still has six carbon atoms, and the continents still move on tectonic plates. Of course, nobody has tallied up the percentage of results in all fields that have been reproduced. Nevertheless, replication failures are not only concerning but inevitable, because science is an ongoing process. They also allow us to add or subtract credibility from a hypothesis.

A list of failures in replication serves to remind us that science can be flawed, and that it is an ongoing enterprise that is open to revision. There is no proof in science; the concept of proof belongs only to mathematics. In science, the accumulation of evidence gives us more or less confidence in a hypothesis. Remember, however, that well-established scientific facts are extremely unlikely to be rewritten and can be considered proved by any reasonable layperson's idea of proof. Normal water molecules have two hydrogen atoms and one oxygen atom. The normal form of DNA is a double helix. The speed of light in vacuum is 299,792,458 meters per second (roughly 186,000 miles per second).

The list of reversals below is only for psychology. It is 18 months old. It is sourced from the site argmin gravity and was created by Gavin, an AI PhD candidate at Bristol.

Gavin gives the following caveats. The most important is that a failure to replicate does not necessarily mean that the original result was incorrect or that someone cheated. Psychological studies can use different samples from different locations. The statistical power of a test to detect an effect depends on the sample size. Different statistical tests can give differing results.
There could also be confirmation bias when accepting a result. If you apply the 5% significance level, approximately 1 in 20 tests of a true null hypothesis will produce a false positive.

    Medical reversals are when an existing treatment is found to be ineffective or even harmful. In recent years, psychology has been experiencing a lot of reversals. Only 40-65% of its original social results replicated, even in the weakest sense of finding a significant result in the same direction. Even where replication succeeded, the average effect found was only half of the originally reported effect. These errors are much less costly than medical mistakes, but they are still pollution, so here's the cleanup.

    Psychology isn't the only discipline with irreplicable results: economics, medicine, and cancer biology all have similar problems. Psychology is not to be dismissed. We know a lot about these problems because psychologists uncovered them. Subfields of psychology differ in replication rate and effect size shrinkage. Psychology reversals are prominent because it is a very open field in terms of code and data sharing. An unscientific field would not have been able to catch its own bullshit.

    These are empirical findings about empirical findings, and they can all be reversed in turn. Nor is it that we know these claims are false: failed replications (or proofs of fraud) often just challenge the evidence for one hypothesis rather than affirming the opposing hypothesis. I have tried to avoid labelling replications as successful or unsuccessful and instead report the best-guess effect size, rather than playing the Yes/No science game. These figures are correct as of March 2020. I will make some effort to keep them current, but not too much.

Gavin's site also links to code for converting between Cohen's d and Hedges' g.

Gavin has discussed 13 different branches of psychology. I will mention only one example from each. These are experiments that are well known or that have failed to replicate.
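For readers unfamiliar with the effect-size measures mentioned above, here is a minimal sketch of how Cohen's d and Hedges' g relate. This is not Gavin's code. It assumes two independent groups summarized by their means, standard deviations, and sample sizes, it uses the pooled standard deviation for d, and it uses the common approximate small-sample correction for g. The function names and the example numbers are mine and purely illustrative.

```python
import math


def cohens_d(mean1, mean2, sd1, sd2, n1, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


def small_sample_correction(n1, n2):
    """Approximate correction factor J = 1 - 3 / (4 * df - 1), df = n1 + n2 - 2."""
    return 1 - 3 / (4 * (n1 + n2) - 9)


def hedges_g(d, n1, n2):
    """Convert Cohen's d to Hedges' g by shrinking it slightly."""
    return d * small_sample_correction(n1, n2)


def d_from_g(g, n1, n2):
    """Convert Hedges' g back to Cohen's d by undoing the correction."""
    return g / small_sample_correction(n1, n2)


# Made-up example: two groups of 17, means 10 and 8, both with SD 2.
# The pooled SD is 2, so d is (10 - 8) / 2 = 1.0.
d = cohens_d(10, 8, 2, 2, 17, 17)
g = hedges_g(d, 17, 17)
print(f"d = {d:.3f}, g = {g:.3f}")
```

With 17 participants per group the correction shrinks d by roughly 2%. For large samples d and g are nearly identical, so the distinction matters mostly for the small studies that dominate this list.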
You can see statistics from both the original papers and the attempts to reproduce them on the website, along with citations to many other papers. Gavin's words are indented.

Social psychology

    There is no good evidence of anything from the Stanford prison experiment. It wasn't an experiment: there were demand characteristics and scripting of the abuse, constant experimenter intervention, and faked reactions from participants. Zimbardo admits that they started with no specific hypotheses.

Positive psychology

    There is no evidence to support facial feedback (the idea that smiling can improve mood, while pouting can make you feel bad).

Cognitive psychology

    Some readings of the Dunning-Kruger effect are questionable.

Developmental psychology

    Questionable: expertise is attained after 10,000 hours of practice (Gladwell). The claim was disowned by its supposed proponents.

Personality psychology

    Hans Eysenck's work should be treated with suspicion, especially 26 unreliable papers (including one that says reading prevents cancer).

Behavioural science

    The effect of nudges (clever design of defaults) may be exaggerated. A large review found that the average effect was six times smaller than claimed. (This is not to say that there are no big effects.)

Marketing

    Brian Wansink was guilty of gross malpractice. Fatal errors were found in 50 of his lab's papers. These include flashy results claiming that increased portion sizes dramatically reduce satiety.

Neuroscience

    Readiness potentials appear to be causal, not diagnostic, so Libet's studies don't show what they claim to. We still don't have free will (random circuit noise can tip a decision when the evidence is weak), but that is a different story.

I've read the references on the failure to reproduce Libet, and they do not seem to show that conscious will is involved in making decisions.
They show that neural inputs, whether random or nonrandom, influence decisions, and that brain activity can predict behavior before the subject is aware of having made a decision. I don't disagree with that. If free will is to mean anything, it must involve an action being caused by a conscious decision. This is especially true for dualists. The Libet experiment and many others have shown a real decoupling between the brain activity that predicts an action and the consciousness of performing that action. This is an indisputable disproof of dualistic free will, but not of compatibilist free will, nearly all of whose adherents accept physical determinism and reject dualism.

Psychiatry

    There is at most weak evidence that psychiatric hospitals in the 1970s could not detect normal patients in the absence of deception.

Parapsychology

    There is no good evidence to support precognition, in the form of undergraduates improving their memory test performance by studying after taking the test. This one is great because Bem's statistical techniques were perfect in the sense that they were exactly what everyone else was using. Bem is Patient Zero of the replication crisis and has been a tremendous help to us all. (Heavy reliance on a flat/frequentist prior; evidence of optional stopping; forking-paths analysis.)

Evolutionary psychology

    There is no evidence to support the menstrual-cycle version of the dual-mating-strategy hypothesis. This hypothesis states that heterosexual women prefer uncommitted relationships with more masculine men during the high-fertility phase of their menstrual cycle, but favor long-term relationships during the other phases. Studies usually cover only one cycle and are often very small (median n=34). However, the funnel plot looks acceptable.

Psychophysiology

    There is very little evidence that sympathetic nervous system activity predicts political ideology.
    In particular, subjects' skin reactions to disturbing or disgusting visual prompts are a noisy and questionable measurement.

Behavioural genetics

    Be suspicious of candidate gene findings (post-hoc data mining showing large, >1% contributions from a single allele). None of 18 candidate genes for depression replicated. In psychiatry, 73% of candidates failed to replicate. One journal will no longer publish them without multiple replications. A massive GWAS (n=1,000,000) found no evidence of enrichment in genes that were previously thought to be related to risk tolerance.

h/t Luana