Social media networks and news agencies often use fact-checkers to distinguish the true from the fake in the face of serious misinformation concerns. However, fact-checkers are unable to assess all of the stories that circulate online.
Researchers at MIT have just released a study suggesting a different approach: crowdsourced accuracy judgments from ordinary readers can be nearly as reliable as those of professional fact-checkers.
Jennifer Allen, a doctoral student at the MIT Sloan School of Management and co-author of a paper detailing the findings, says one problem with fact-checking is that there is simply too much content for professionals to cover.
The study, which examined over 200 news stories that Facebook had flagged for further scrutiny, may have found a solution: deploying relatively small, politically balanced groups of lay readers to assess the headlines and lead sentences of news stories.
Allen says, "We found it encouraging." Allen says that the average rating for a group of 10 to 15 people was as good as the fact-checkers' judgements. This is because fact-checkers are correlated with one another. This is because the raters were ordinary people who didn't have any fact-checking training and simply read headlines without doing any research.
Crowdsourcing could therefore be deployed widely, and cheaply: the study estimates the cost of having readers evaluate news at roughly $0.90 per story.
David Rand, a professor at MIT Sloan and senior co-author of the study, says there is no single thing that solves the problem of false news online. "But we're trying to add promising approaches to the anti-misinformation toolkit."
The paper, "Scaling up fact-checking using the wisdom of crowds," is published today in Science Advances. The co-authors are Allen; Antonio A. Arechar, a researcher at the MIT Human Cooperation Lab; Gordon Pennycook, an assistant professor of behavioral science at the University of Regina's Hill/Levene Schools of Business; and Rand, who is the Erwin H. Schell Professor, a professor of brain and cognitive sciences at MIT, and director of MIT's Applied Cooperation Lab.
Critical mass of readers
The researchers used 207 news articles that an internal Facebook algorithm had identified as needing fact-checking, either because there was reason to believe they were problematic, because they were being widely shared, or because they concerned important topics such as health. The experiment enlisted 1,128 U.S. residents through Amazon's Mechanical Turk platform.
Participants were shown the headline and lead sentence of 20 news stories and asked seven questions about each (for instance, how "accurate" and how "trustworthy" the story was), with the answers combined into a single accuracy score per story.
Separately, three professional fact-checkers were given all 207 stories and asked to evaluate them after researching them. Consistent with other studies, the fact-checkers' ratings were highly correlated with one another, though not perfectly: in about 49 percent of cases, all three fact-checkers agreed on a story's verdict; in 42 percent of cases, two of the three agreed; and in 9 percent of cases, all three gave different ratings.
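As an illustration of how that agreement breakdown is computed, here is a minimal Python sketch that classifies each story by how many of three raters agree. The verdict labels and sample data are invented for demonstration; this is not the study's code.

```python
# Minimal sketch (hypothetical data): classify each story by the level of
# agreement among three fact-checkers' categorical verdicts.
from collections import Counter

def agreement_pattern(verdicts):
    """Return 'all_agree', 'two_agree', or 'none_agree' for three verdicts."""
    largest_bloc = Counter(verdicts).most_common(1)[0][1]
    if largest_bloc == 3:
        return "all_agree"
    if largest_bloc == 2:
        return "two_agree"
    return "none_agree"

# Invented verdicts for a handful of stories.
stories = [
    ("true", "true", "true"),
    ("false", "false", "misleading"),
    ("true", "misleading", "false"),
    ("false", "false", "false"),
]

breakdown = Counter(agreement_pattern(v) for v in stories)
for pattern in ("all_agree", "two_agree", "none_agree"):
    print(f"{pattern}: {breakdown[pattern] / len(stories):.0%}")
```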
Intriguingly, when the regular readers were sorted into groups with equal numbers of Republicans and Democrats, their average ratings were highly correlated with the professional fact-checkers' ratings. And with at least a double-digit number of readers involved, the crowd's ratings were as strongly correlated with the fact-checkers' as the fact-checkers' ratings were with one another.
Allen notes that these readers had no training in fact-checking and read only the headlines and lead sentences, yet were still able to match the fact-checkers' performance.
While it may seem surprising at first that a crowd of 12 to 20 readers could match the performance of professional fact-checkers, this is an example of the classic phenomenon known as the wisdom of crowds. Across a wide variety of applications, groups of laypeople have been shown to match or exceed the judgments of experts. The current study shows this can occur even in a domain as polarizing as misinformation identification.
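To make the aggregation concrete, below is a minimal Python sketch of the wisdom-of-crowds effect: individual ratings are noisy, but the mean of a small crowd per story tracks careful expert ratings closely. The data, noise levels, and rating scale are invented assumptions, not the study's data or analysis code.

```python
# Minimal sketch (synthetic data): averaging noisy lay ratings per story
# recovers a signal that correlates strongly with lower-noise expert ratings.
import numpy as np

rng = np.random.default_rng(0)
n_stories, group_size = 207, 14            # roughly 10-15 raters per story

# Hypothetical per-story accuracy on a 1-7 scale.
truth = rng.uniform(1, 7, n_stories)

# Fact-checkers: careful, low-noise ratings of each story.
fact_checker_mean = truth + rng.normal(0, 0.5, n_stories)

# Lay readers: much noisier individual ratings (headline-only judgments).
readers = truth[:, None] + rng.normal(0, 2.0, (n_stories, group_size))

# Averaging across the balanced group cancels much of the individual noise.
crowd_mean = readers.mean(axis=1)

r = np.corrcoef(crowd_mean, fact_checker_mean)[0, 1]
print(f"crowd mean vs. fact-checkers: r = {r:.2f}")
```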
Participants also took a political knowledge test and a test of their tendency to think analytically. Overall, the ratings of people who were better informed about civic issues and who thought more analytically were more closely aligned with the fact-checkers' ratings.
Rand says that people who engaged in more reasoning and were better informed agreed with the fact-checkers more, and that this held regardless of whether they were Republicans or Democrats.
Participation mechanisms
The researchers say the findings could be applied in several ways, noting that social media giants are actively trying to make crowdsourcing work. Facebook has a program called Community Review, in which laypeople are hired to assess news content; Twitter's Birdwatch project solicits readers' input about the veracity of tweets. The wisdom of the crowd can be used either to apply public-facing labels to content or to inform the ranking algorithms that decide what content people are shown.
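As a purely hypothetical illustration of that second use, the sketch below folds a crowd-derived accuracy score into a simple ranking function. The field names, weighting scheme, and numbers are invented assumptions, not any platform's actual algorithm.

```python
# Hypothetical sketch: down-rank posts the crowd judges inaccurate by
# blending a crowd accuracy score (0..1) into a base engagement score.
from dataclasses import dataclass

@dataclass
class Post:
    title: str
    engagement: float      # platform's base relevance/engagement score
    crowd_accuracy: float  # mean crowd rating, rescaled to 0..1

def ranked_score(post: Post, accuracy_weight: float = 0.5) -> float:
    # Interpolate between pure engagement and engagement scaled by accuracy.
    return post.engagement * ((1 - accuracy_weight) + accuracy_weight * post.crowd_accuracy)

posts = [
    Post("viral but dubious", engagement=0.9, crowd_accuracy=0.2),
    Post("modest but accurate", engagement=0.6, crowd_accuracy=0.95),
]
for p in sorted(posts, key=ranked_score, reverse=True):
    print(f"{p.title}: {ranked_score(p):.2f}")
```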
Still, the authors note, any organization using crowdsourcing needs to find a good mechanism for reader participation. If participation is open to anyone, the process could be unfairly influenced or distorted by partisans.
Allen points out that the researchers have not yet tested this approach in an environment where anyone can opt in. "Platforms shouldn't necessarily expect that other crowdsourcing strategies would produce the same positive results," she cautions.
And Rand suggests that, for crowdsourcing to work, news organizations and social media platforms would need to find ways of getting a large enough number of people to actively evaluate news items.
Rand states that "most people don't care much about politics and care enough try to influence them." Rand says that the problem is that if people can rate content, the only ones doing it will be those who are trying to manipulate the system. To me, the bigger problem than being overwhelmed by zealots, is that nobody would do it. This is a classic problem of public goods: While society benefits from misinformation being identified, why should people bother rating it?
More information: Jennifer Allen et al, Scaling up fact-checking using the wisdom of crowds, Science Advances (2021). DOI: 10.1126/sciadv.abf4393