“Uh, hypothetical situation: you see a paper published that is based on a premise which is clearly flawed, proven by existing literature.” So began an exasperated Twitter thread by Andrew Althouse, a statistician at the University of Pittsburgh, in which he debated whether a study using what he calls a “nonsense statistic” should be addressed by letters to the editor or swiftly retracted.
The thread was the latest development in an ongoing disagreement over research in surgery. In one corner, a group of Harvard researchers claim they’re improving how surgeons interpret underpowered or negative studies. In the other corner, statisticians suggest the authors are making things worse by repeatedly misusing a statistical technique called post-hoc power. The authors are giving weak surgical studies an unwarranted pass, according to critics.
Senior author David Chang feels “online trolls” are presenting one-sided arguments, which do not justify retraction. “If we had completely fabricated our data, that would be the only justifiable reason for retracting the study,” Chang told Retraction Watch. “There is no reason to demand retraction based on differences of opinion.”
The study Althouse is referencing, “Is the Power Threshold of 0.8 Applicable to Surgical Science?-Empowering the Underpowered Study,” was recently published in the Journal of Surgical Research by a group of Massachusetts General Hospital investigators, led by Yanik Bababekov.
By looking at the post-hoc (also called “observed”) power of negative studies, the article suggests surgical studies aren’t living up to the widely accepted goal in biomedical research of achieving 80% statistical power. Statistical power measures a study’s ability to detect a treatment effect when one really exists. Studies with low statistical power, like those with small samples, are more likely to miss real effects (false negatives), and the significant results they do report are more likely to be false positives.
The authors conclude that “the surgical community should reconsider the power standard as it applies to surgery.” If these conclusions were acted upon, it might mean adopting new surgical techniques based on weaker evidence from smaller studies.
These conclusions are being questioned because post-hoc power is considered unreliable and misleading, according to statisticians. “The problem is that a post-hoc power calculation is just a transformation of the p-value,” Althouse told Retraction Watch, referring to the much-maligned statistic. “Observed power has nothing to do with the study’s actual designed power to detect a meaningful difference [from treatment].”
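Althouse’s point can be illustrated with a short calculation. The sketch below is not the method used in the paper; it assumes a simple two-sided z-test, and the function name `observed_power` is hypothetical. Under that assumption, the observed power is computed from the p-value and the significance threshold alone, with no other information about the study.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def observed_power(p_value: float, alpha: float = 0.05) -> float:
    """Post-hoc power of a two-sided z-test, computed from the p-value alone.

    Assumption: the observed effect is treated as if it were the true effect,
    which is what a post-hoc power calculation does.
    """
    # Absolute z-statistic implied by the two-sided p-value.
    z_observed = STD_NORMAL.inv_cdf(1 - p_value / 2)
    # Critical value for the chosen significance level.
    z_critical = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Probability that a repeat study lands beyond either critical value
    # if the true standardized effect equals the observed one.
    upper_tail = 1 - STD_NORMAL.cdf(z_critical - z_observed)
    lower_tail = STD_NORMAL.cdf(-z_critical - z_observed)
    return upper_tail + lower_tail


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.20, 0.50):
        print(f"p = {p:.2f} -> observed power = {observed_power(p):.2f}")
```

In this setup a result sitting exactly at p = 0.05 has an observed power of about 0.50, and p = 0.20 gives about 0.25. Any larger p-value gives a smaller observed power, so the number adds nothing that the p-value did not already say.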
Widely used research guidelines agree. The CONSORT guidelines, for example, state “there is little merit in a post hoc calculation of statistical power using the results of a trial.” Instead, experts recommend calculating the study’s power before it is performed, using the smallest effect that would warrant adoption of a treatment.
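For contrast, a power calculation done before the study might look like the sketch below. It uses the standard normal-approximation formula for comparing two group means, which is an assumption made here for illustration rather than anything specified by CONSORT or the paper. The function name `sample_size_per_group` and the example effect size of 0.5 standard deviations are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def sample_size_per_group(effect_size: float, alpha: float = 0.05,
                          power: float = 0.80) -> int:
    """Patients needed per arm to detect a standardized mean difference.

    Assumptions: two equal-sized arms, a two-sided test, and the normal
    approximation n = 2 * (z_{1-alpha/2} + z_{power})^2 / effect_size^2.
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    n = 2 * (z_alpha + z_power) ** 2 / effect_size ** 2
    return math.ceil(n)  # round up to whole patients


if __name__ == "__main__":
    # Hypothetical minimal clinically important difference: 0.5 SD.
    print(sample_size_per_group(0.5))   # 63 per group
    print(sample_size_per_group(0.25))  # halving the effect roughly quadruples n
```

The key input is the minimal effect worth detecting, chosen in advance. Nothing in the calculation depends on the data the study eventually produces.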
Another concern statisticians raised is that the article only examined negative studies, which presents a biased view of the surgical literature. Althouse suggested that the authors practically guaranteed surgical studies would appear underpowered by only looking at negative results. This biased finding might mislead readers into thinking most surgical studies are too small to be meaningful.
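A small simulation shows why looking only at negative results all but guarantees this outcome. The sketch below is illustrative only: it assumes every simulated study is a two-sided z-test designed with a true power of 80%, and the function names are hypothetical. It is not a reanalysis of the surgical literature.

```python
import random
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()
ALPHA = 0.05
Z_CRITICAL = STD_NORMAL.inv_cdf(1 - ALPHA / 2)


def observed_power_from_z(z: float) -> float:
    """Post-hoc power of a two-sided z-test, treating the observed z as the truth."""
    z_abs = abs(z)
    return (1 - STD_NORMAL.cdf(Z_CRITICAL - z_abs)) + STD_NORMAL.cdf(-Z_CRITICAL - z_abs)


def simulate_negative_studies(true_power: float = 0.80,
                              n_studies: int = 10_000,
                              seed: int = 0) -> dict:
    """Simulate well-designed studies, then keep only the non-significant ones."""
    rng = random.Random(seed)
    # Expected z-statistic that yields (approximately) the requested true power.
    expected_z = Z_CRITICAL + STD_NORMAL.inv_cdf(true_power)
    z_values = [rng.gauss(expected_z, 1.0) for _ in range(n_studies)]
    negative = [z for z in z_values if abs(z) < Z_CRITICAL]  # p > 0.05
    powers = [observed_power_from_z(z) for z in negative]
    return {
        "share_negative": len(negative) / n_studies,
        "mean_observed_power": mean(powers),
        "max_observed_power": max(powers),
    }


if __name__ == "__main__":
    for name, value in simulate_negative_studies().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions roughly one study in five comes out negative, and every one of those negative studies has an observed power of about 50% or less, even though each was designed with 80% power. The low observed power reflects the decision to select negative results, not the quality of the studies.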
Chang believes statisticians do not appreciate the practical context in which his paper was written. Surgeons, he emphasizes, misinterpret underpowered “negative” studies as evidence that two techniques are equally safe and effective. Surgeons “write these papers claiming A is as safe as B based on nothing other than p>0.05,” Chang says.
To discourage these misleading interpretations, Chang sought to use post-hoc power “as a damage control measure to inject some caution” into how doctors interpret negative studies, although he recognizes that larger changes are needed in how studies are conducted and reported.
In fact, statisticians critiquing the research have been sympathetic to this problem. Andrew Gelman, a statistician at Columbia University, wrote that the authors are “completely right” in bringing attention to the misinterpretation of negative studies. Althouse says that he agrees “they have identified a real problem,” but he feels their proposal is “not actually a solution.”
Scott LeMaire, editor of the Journal of Surgical Research, told us that the journal is “actively evaluating the comments that we have received about this paper.”
Not the first time
Last year, the same research group published a perspective in the prestigious Annals of Surgery also calling for surgical studies to include a post-hoc power calculation. An extended back-and-forth in the journal’s pages ensued (here, here, here, and here) between statisticians (including Althouse) and the authors. Bababekov and Chang appeared undeterred, replying that they “respectfully disagree that it is wrong to report post hoc power.”
Althouse, whose efforts previously led to the retraction of a cardiology study, is not alone in his concerns. Dozens of comments have been posted to PubPeer about the latest paper, calling it “completely flawed” and suggesting it “should be urgently retracted.”
“Papers that have had corrections or rebuttals issued often continue to be cited as though the rebuttals simply don’t exist,” Althouse told us. “If this paper remains published, the ripple effect that I fear is that people will still believe that this idea of post-hoc power has legitimacy.”
Gelman, who penned a letter opposing the Annals of Surgery article, wrote on PubPeer that “it is irresponsible for [the authors] to have written this new paper given that various people have already pointed out their error in print.”