It starts with one and spreads from there. It grows exponentially, jumping borders and burning through communities. One becomes ten, and ten become thousands. If one of the people it reaches happens to be a superspreader, then the numbers really skyrocket. The way information spreads online is so much like the way pathogens spread through populations, it’s called “going viral.”
Misinformation can be defined as “a claim of fact that is currently false due to lack of scientific evidence.” Throughout the coronavirus pandemic, the global spread of misinformation has handily kept pace with the global spread of the virus. But swarming misinformation isn’t unique to the pandemic; it’s practically inevitable any time a lot of people are talking about the same thing, especially if some elements are difficult to pin down.
“It usually starts with a fairly obscure source,” explains information scientist Ashlynn Daughton, who led a team at Los Alamos in the study of pandemic-related misinformation. “Then it gets picked up by others and spreads. If a person of prominence or a person with the appearance of credibility picks it up, once that person’s tweets or clips are attached to it, they have basically vouched for it, and no one bothers to go back and vet the original source.”
Daughton and other scientists at Los Alamos were studying online information patterns well before the pandemic. From the spread of seasonal influenza within the United States, to political instability abroad, they have been learning how publicly available data from social media can be incorporated into powerful forecast models.
“People go through a sort of sensemaking process, and they talk or tweet about what’s going on,” says Daughton. “As COVID-19 was emerging, we wanted to see what we could do with misinformation from Twitter.”
“We realized immediately that it’s a problem our skill set is uniquely able to address,” adds Geoffrey Fairchild, a Los Alamos data scientist who worked on Daughton’s team. “We had been considering misinformation for a while, then COVID-19 came along, and it was the perfect opportunity to apply what we had all been thinking about.”
Disconnect the dots
One specific form of misinformation that Daughton and her team set out to examine is the conspiracy theory. A conspiracy theory posits that a particular group of people is secretly working together, illegally or immorally, to achieve a goal that benefits the group and harms the public—all to explain events that could be explained otherwise.
A confluence of circumstances made the first months of the pandemic a breeding ground for conspiracy theories: Topics such as vaccine efficacy and perceived threats to personal freedoms were already controversial to some; people felt powerless during lockdowns and were isolated from one another, relying heavily on social media for news as well as for personal connection; and news about the virus focused on highly specialized subjects, such as epidemiology, immunology, and RNA technology, that can be difficult for non-experts to understand.
Infectious disease conspiracy theories are not new. In 1832, British doctors were accused of faking cholera outbreaks so they could steal patients and sell their body parts, and in 1889, an influenza pandemic was blamed on the new-at-the-time technology of electric lighting.
The Los Alamos team used tweets collected by NewsGuard, an independent information watchdog that tracks online misinformation, including conspiracy theories. Some examples of conspiracy theories that NewsGuard identified surrounding COVID-19 are: that the then-new 5G wireless technology was causing disease symptoms, that Microsoft co-founder Bill Gates, along with his wife and charitable foundation, was somehow involved, and that the eventual vaccine would be harmful in some way. The details varied, but certain keywords popped up again and again.
The team began with a collection of anonymized individual tweets that contained keywords associated with certain COVID-19 conspiracy theories identified by NewsGuard. For a tweet to be classified as related to a conspiracy theory, two researchers had to independently deem it so. More than 7000 tweets were manually classified in this way, creating a data set that was then used to train a machine learning (ML) model capable of churning through orders of magnitude more data. Once trained on the labeled tweets, the model was fed 120 million more tweets, 1.8 million of which it determined to be related to a COVID-19 conspiracy theory. Using this filtered data set, the team then characterized the linguistic features of COVID-19 conspiracy theories to see how they evolve over time.
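The label-then-classify workflow can be sketched with a toy text classifier. Everything below—the simple Naive Bayes model, the tiny hand-labeled "training set," and the example tweets—is an illustrative stand-in, not the team's actual model or data:

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase a tweet and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

class NaiveBayes:
    """Tiny bag-of-words Naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # number of tweets per label
        self.word_counts = defaultdict(Counter)  # word tallies per label
        self.vocab = set()

    def train(self, labeled_tweets):
        for text, label in labeled_tweets:
            self.class_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy stand-in for the ~7000 hand-labeled tweets.
training = [
    ("5g towers are spreading the virus wake up", "conspiracy"),
    ("bill gates wants to microchip everyone via the vaccine", "conspiracy"),
    ("they are hiding the truth about the vaccine", "conspiracy"),
    ("wash your hands and stay home to slow the spread", "other"),
    ("new study estimates the incubation period of the virus", "other"),
    ("local hospital expands covid testing capacity", "other"),
]

clf = NaiveBayes()
clf.train(training)
print(clf.predict("the vaccine is a plot to microchip us"))  # conspiracy
print(clf.predict("please stay home and wash your hands"))   # other
```

Once trained on a small labeled set, a classifier like this can be run over an arbitrarily large stream of unlabeled tweets, which is what makes the manual-labeling investment pay off at the scale of 120 million tweets.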
“We showed that misinformation tweets show more negative sentiment than non-misinformation tweets,” explains team member Dax Gerts. “We also saw the theories evolve over time, incorporating details from previously unrelated conspiracy theories as well as from real-world events.”
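The sentiment comparison can be illustrated with a minimal lexicon-based scorer. The word scores and example tweets below are invented for illustration; a real analysis would use a validated sentiment tool rather than a hand-made list:

```python
# Toy sentiment lexicon: negative words score below zero, positive above.
# This hand-made list is an illustrative assumption, not a research tool.
LEXICON = {
    "hoax": -2, "lie": -2, "dangerous": -2, "fake": -1, "scared": -1,
    "safe": 1, "effective": 2, "hope": 2, "recover": 1, "protect": 1,
}

def sentiment(tweet):
    """Sum lexicon scores over the tweet's words (unknown words score 0)."""
    return sum(LEXICON.get(w, 0) for w in tweet.lower().split())

def mean_sentiment(tweets):
    return sum(sentiment(t) for t in tweets) / len(tweets)

misinfo = [
    "the vaccine is a dangerous hoax",
    "they lie about everything fake numbers",
]
other = [
    "masks protect you and vaccines are effective",
    "there is hope most patients recover",
]

print(mean_sentiment(misinfo))  # negative on average
print(mean_sentiment(other))    # positive on average
```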
For example, in December of 2019, before most people knew about the novel coronavirus outbreak that would soon become the COVID-19 pandemic, a research team with funding from the Bill and Melinda Gates Foundation published its development of a technology that keeps a vaccination record on a patient’s skin via an ink-like injection readable by a smartphone. A few months later, in March of 2020, perhaps in oblique reference to this technology, Bill Gates publicly suggested that records of COVID-19 tests and vaccinations might be made digital so that people wouldn’t have to carry paper proof. This casual mention by a prominent person, in combination with incorrect assumptions about the technology—which has no capacity to track people’s movements—quickly morphed into various conspiracy theories that, for one reason or another, Gates was planning to microchip the masses.
The team compared original tweets to retweets and found that original tweets present false information more often than they present evidence-based information; evidence-based information, however, is retweeted more often than false information.
There was also a surprise finding: Tweets refuting false information, or otherwise tempering it, can instead promote it. Attempts at counter-narration, regardless of sentiment or content, caused Twitter’s algorithms to upregulate all related content, including the misinformation itself, leading to more views of both.
“My hope is that the public health experts will see this and try to craft their messages in a more general way,” says team member and epidemiologist Courtney Shelley. “They shouldn’t use the same terminology as the misinformation does. I think that would make for more effective outreach.”
It’s hard to change people’s minds—their first impression is usually the one that endures. For that reason, the Los Alamos team’s goal is to build a tool that can help public health officials get out ahead of health-related misinformation, instead of trying to chase it down.
“It’s still a challenge to figure out how to proactively deal with public health misinformation,” says Gerts. “Early messaging should try to correct misinformation before it takes off—it’s harder to combat once it’s widespread. Later messaging should try to target the new elements of conspiracy theories as they evolve.”
But even with on-point messaging from health authorities, the public must take the messaging seriously for it to be effective. Therefore, Daughton and her team also looked at public adherence to behavioral messaging about COVID-19 transmission—whether or not people were actually taking the right actions to avoid infection.
Where the rubber meets the road
People frequently post on social media about health-related behaviors like what they eat or how they exercise. But just because someone tweets about a salad doesn’t mean they eat it. Human behaviors that can alter infectious disease transmission include staying home, socially distancing when you must go out, and washing your hands when you return. People tweeted a lot about these things in the early stages of the COVID-19 pandemic. It’s hard to know, though, how much adherence was actually going on. Is there a way to tell whether people are really doing what they claim in their tweets to be doing?
To explore this question the Los Alamos team again started with thousands of anonymized tweets and manually categorized them to create training sets for ML models. The tweets were labeled according to specific health-related behaviors like social distancing, as well as personal impacts like economic hardship. The ML models, one for each behavior or impact category, were then used to classify 228 million tweets from January to July of 2020, and the results were mixed.
The ML classifier for identifying tweets about monitoring disease symptoms, for example, performed poorly, with a success rate no better than chance. But the classifiers for identifying tweets about social distancing and sheltering-in-place were much stronger, allowing the team to do further analyses of these tweets.
The scientists used publicly available anonymous mobility data (gleaned from smartphone geolocation services) to see if they could correlate people’s mobility—or, more precisely, people’s smartphones’ mobility—to tweets about social distancing and sheltering-in-place. Clear temporal and spatial patterns were apparent. Generally, as the number of tweets about staying home increased, the average mobility for the same region and period decreased, indicating agreement between conversations on Twitter and real-world behavior. The comparison was done on a state-by-state basis, with the correlation being stronger in some states than in others. In most states, however, the peak in Twitter chatter about social distancing and the minimum in average mobility occurred within 20 days of each other.
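At its core, this kind of comparison is a lagged correlation analysis: slide one daily series against the other and look for the offset where they move most strongly in opposite directions. A minimal sketch with synthetic data (the real analysis used state-level tweet counts and actual mobility records):

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 for degenerate (constant) series."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def best_lag(tweets, mobility, max_lag=20):
    """Return the day offset (within +/- max_lag) at which tweet volume
    and mobility are most strongly anti-correlated, and that correlation."""
    best_k, best_r = 0, 0.0
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:  # pair tweets[i] with mobility[i + k]
            t, m = tweets[:len(tweets) - k], mobility[k:]
        else:       # mobility leads tweets
            t, m = tweets[-k:], mobility[:k]
        if len(t) < 3:
            continue  # too little overlap to correlate
        r = pearson(t, m)
        if r < best_r:  # looking for the most negative correlation
            best_k, best_r = k, r
    return best_k, best_r

# Synthetic daily series: tweet counts about staying home, and a mobility
# index that drops three days after the chatter rises.
tweets = [0, 1, 2, 5, 12, 25, 40, 38, 30, 22, 15, 10, 8, 6, 5, 4]
mobility = [100, 100, 100] + [100 - t for t in tweets[:-3]]

lag, r = best_lag(tweets, mobility, max_lag=7)
print(lag, round(r, 2))  # 3 -1.0
```

On this contrived data the strongest anti-correlation appears at a three-day offset; on real data the offsets were up to 20 days and the correlations far from perfect, varying state by state.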
In addition to correlating tweet quantity with mobility, the team was also able to correlate tweet upticks and downticks with a variety of major pandemic-related events. Immediately after the first COVID-19 death in the United States, the number of tweets about social distancing increased by more than 50 percent. A further increase occurred about a month later when 95 percent of the country was put under some form of lockdown order.
The Los Alamos team hopes this work will inform public-health communication strategies for the present as well as the future. Real-time social media conversations, as well as other sources of behavioral data, can reveal a lot about people—how they’re feeling, what they’re thinking, and what they’re doing. A better understanding of people’s awareness of and compliance with suggested behaviors can improve decision-makers’ overall knowledge and policy making in the crucial early stages of an epidemic or other emergency.
But, that’s uncredible!
The ease with which online misinformation spreads is due in part to the equal footing afforded by the online environment, regardless of credibility. In a sort of “signal versus noise” paradigm, information from credible online sources can be overwhelmed by that from uncredible ones, complicating public outreach messaging. Presently the Los Alamos team members are looking at how uncredible information travels online, compared to information from more credible sources. The journalists at NewsGuard assign credibility scores to online news sources, and the Los Alamos scientists are comparing news sources with low credibility scores to those with high ones. Specifically, they are looking at which sources get published and tweeted about the most, which topics those sources are talking about, and which topics become the most popular.
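The comparison the team is doing can be sketched as a simple tally of topics split by source credibility. The records, topics, and the 60-point cutoff below are illustrative assumptions patterned loosely after NewsGuard’s 0-100 ratings, not the team’s actual data:

```python
from collections import Counter

# Hypothetical tweet records: (credibility score of the cited source, topic).
# Scores imitate NewsGuard-style 0-100 ratings; the cutoff and topics are
# illustrative assumptions only.
records = [
    (15, "5g"), (10, "5g"), (20, "microchips"), (25, "5g"),
    (85, "vaccines"), (90, "masks"), (75, "vaccines"), (95, "testing"),
    (30, "microchips"), (80, "vaccines"),
]

CREDIBLE = 60  # assumed threshold separating low- and high-credibility sources

def topics_by_credibility(records):
    """Tally which topics dominate low- versus high-credibility sources."""
    low, high = Counter(), Counter()
    for score, topic in records:
        (high if score >= CREDIBLE else low)[topic] += 1
    return low, high

low, high = topics_by_credibility(records)
print(low.most_common(1))   # [('5g', 3)]
print(high.most_common(1))  # [('vaccines', 3)]
```

Aggregations like this make the "signal versus noise" framing concrete: if low-credibility sources dominate the most popular topics, credible messaging on those same topics is being drowned out.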
“It’s frustrating to see how effective misinformation can be,” says Fairchild. “As data scientists, we look at it and wonder how many people are basing important decisions on misinformation. It’s frustrating, but at the same time, it drives us to do this work—to see the scale of the problem and develop tools that might be able to help.”
The Los Alamos team’s work is not limited to COVID-19, nor even to public health. The scientists are working to characterize misinformation in general, which is an important first step in counteracting its spread. Other online-information work the team members are doing includes identifying and tracking online hate speech, looking for population-level behavior changes over time, and finding trends in how humans move through space and time.
During the COVID-19 pandemic, and indeed in any emerging infectious disease scenario, the scientific understanding evolves quite quickly. So public messaging based on that understanding—about infection risks, transmission pathways, and mitigation strategies—also evolves.
“There’s sometimes this sense in the general public,” Fairchild muses, “that when scientists change their messaging, it means that the information they’re providing is somehow untrue or not trustworthy. But really, updating information is the whole point of the scientific endeavor.”