Computing for a cure

By J. Weston Phippen, December 1, 2020

COVID-19 poses an unprecedented threat to national security. But scientists at Los Alamos National Laboratory are fighting back by harnessing the power of computers.

The first confirmed case of COVID-19 in the United States was discovered in Seattle, Washington, on January 21, 2020. By the time the World Health Organization declared the coronavirus-induced illness a global pandemic 10 days later, Wuhan, China, where COVID-19 originated, was already on full lockdown. In February, the virus spread to the United Kingdom and parts of Africa and Europe. By March, global cases reached 86,000.

COVID-19 reached New Mexico, home to Los Alamos National Laboratory, on March 11. But some six weeks before that, Lab scientists had already set aside their usual research to help tackle the worst virus outbreak in the world since the Spanish flu of 1918. By early summer, the Lab was hard at work on more than 40 projects focused on COVID-19.

“With our history in epidemiological modeling and research, it was a natural pivot for us,” says associate Laboratory director J. Patrick Fitch, who leads the Lab’s coronavirus response. “From an impact point of view, the modeling we’re doing with computers has changed our country’s understanding of this pandemic.”

Scientists at the Lab are harnessing computing power to help return life to normal. Some have used models to simulate how the virus spreads around the globe and how social media misinformation helps it spread. Others have created virtual communities, then infected their virtual communities with SARS-CoV-2, the coronavirus that causes the COVID-19 illness. And as the world races to develop a vaccine, Lab supercomputers are helping to discern if a vaccine will really spell the end of the virus, or if SARS-CoV-2 is here for good.

Forecasting the flu—and COVID-19

A virus’ indiscriminate attack on a country’s people can disable its economy and have national security implications. So, as part of its mission to study national security threats, Los Alamos has invested significant time into understanding viruses, including dengue fever, Zika, malaria, HIV, and even the flu.

Since the fall of 2013, the Centers for Disease Control and Prevention (CDC) has hosted an annual flu forecasting competition, the FluSight challenge, which asks researchers to forecast—to predict the timing, peak, and short-term intensity of—the unfolding flu season. The winner of the challenge for 2018–2019 was Dave Osthus, a statistician in the Lab’s Computer, Computational, and Statistical Sciences Division. The team also included Kelly Moran, Reid Priedhorsky, Ashlynn Daughton, Sara Del Valle, and Jim Gattiker. Osthus submitted two models; they won first and second place. The first-place model, named Dante, most accurately predicted the 2018–2019 flu season at the national, regional, and state levels.

Flu models generally collect a large variety of data inputs—weather, the mobility of a state’s population, a state’s vaccination rate, even internet keyword searches for flu symptoms. But as inputs increase, the forecasts can actually become less predictable. So, for Dante, Osthus and his team did away with as many inputs as possible. Instead, they focused almost entirely on the confirmed flu cases that are reported each week by hospitals and then passed to the CDC.

“When the flu season starts, we observe the first week of flu activity data,” Osthus explains. “That data point constrains what our predictions look like. As the flu season unfolds, each week we get another data point.” Dante makes a new prediction each week for the duration of flu season, with a more accurate forecast each time new data is incorporated. “Submitting each week of the season,” he explains, “allows forecasters to update their forecasts in light of current data—similar to how, for instance, hurricane forecasts are updated as the hurricane is unfolding.”

Compared with other, more complicated forecasting models, Osthus jokes, “if I were listening to me describe our system, even I wouldn’t trust it. But thankfully we have the CDC prize to prove that it’s one of the best in the country.”

This year, for the 2019–2020 season, Osthus again submitted two models, including an updated Dante that includes “internet-based nowcasting,” which “develops and uses a model that maps Google search traffic for flu-related terms onto official flu activity data.” But before the competition could conclude, the coronavirus pandemic hit. The FluSight challenge was abruptly ended as, Osthus says, “the whole flu forecasting community turned its attention to COVID-19 modeling.” Now, Los Alamos’ winning flu-modeling team is hard at work modeling the novel coronavirus.

The flu and COVID-19 are similar in that they’re both spread through microscopic droplets of water in a person’s breath. They’re also similar in that an infected person can spread the virus to others before showing symptoms. So pulling Dante into the COVID-19 effort was natural, but the modeling required for COVID-19 is different in some ways from that used to model the flu. Dante was very good at using the past to predict the future; the model used “20 years of historical flu data to make forecasts for the new flu season,” Osthus says. But “there is no historical COVID-19 data,” so the new model is built “basically from the ground up,” using a great deal of knowledge gained from years of flu forecasting.

At first, the team thought it could compensate for this lack of past data by increasing inputs. They accounted for rate of transmission, the time it took for a person to show symptoms, and how many people an infected person might come into contact with beforehand—about a dozen inputs in all. But as the team members wrestled with all the variables, they slowly cut the number of inputs, finally reverting to a much simpler method.

“The model that’s now running on our site is the seventh iteration,” Osthus says. It’s much like the original Dante. Each week, the COVID-19 version of Dante analyzes data published by Johns Hopkins University and, like the flu version, keeps its inputs to a minimum: confirmed infections, confirmed deaths, and the population size of the forecasted area. The model then revises its prediction with each newly reported dataset, growing more accurate each week. But whereas the original Dante could run on a laptop, the COVID-19 version requires the Lab’s Darwin supercomputer because Dante is crunching data not only for every U.S. state but also for 250 countries around the world.
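The idea of revising a forecast each time a new weekly count arrives can be illustrated with a minimal sketch. This is not Dante, whose internals the article does not describe. It assumes a toy model in which the weekly log growth rate of confirmed cases is roughly constant with Gaussian noise, and it updates a normal belief about that rate with each new data point. The function names, the noise setting `obs_var`, and the case counts are all hypothetical.

```python
import math

def update_growth(mean, var, observed_rate, obs_var):
    """One conjugate normal update of the belief about the weekly log growth rate."""
    gain = var / (var + obs_var)            # weight given to the new observation
    new_mean = mean + gain * (observed_rate - mean)
    new_var = (1.0 - gain) * var            # uncertainty shrinks with every data point
    return new_mean, new_var

def run_forecasts(weekly_cases, prior_mean=0.0, prior_var=1.0, obs_var=0.05):
    """Return (week, next-week forecast, rate variance) after each new weekly count."""
    mean, var = prior_mean, prior_var
    history = []
    for week in range(1, len(weekly_cases)):
        observed_rate = math.log(weekly_cases[week] / weekly_cases[week - 1])
        mean, var = update_growth(mean, var, observed_rate, obs_var)
        forecast = weekly_cases[week] * math.exp(mean)
        history.append((week, forecast, var))
    return history

# Made-up weekly confirmed-case counts, for illustration only.
cases = [100, 130, 170, 215, 280, 360]
history = run_forecasts(cases)
for week, forecast, var in history:
    print(f"week {week}: next-week forecast {forecast:.0f}, rate variance {var:.4f}")
```

In this sketch the variance of the estimated growth rate falls with every added week, which is the toy counterpart of a forecast tightening as the season unfolds.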

“As of now,” Osthus says, “our system is one of only a handful of models being used by the CDC that has consistently outperformed the baseline models.”

Simulating a sickness

Unlike the flu, the spread of COVID-19 hinges on many factors that scientists are only beginning to understand. Human behavior, including opting to wear masks, social distance, and quarantine—or not—has presented new situations that past models never needed to consider.

Accounting for people’s actions during the coronavirus pandemic requires a much different type of model, one that the Lab has already developed.

Over the past 15 years, Timothy Germann, a computational scientist, has helped develop EpiCast, one of the most accurate agent-based virus models in the country. An agent-based model can simulate the actions and interactions of individual or collective agents to see how those actions and interactions affect an entire system. The way EpiCast works is kind of like The Sims, the life-simulation video game. EpiCast can create a virtual city populated with virtual citizens who can be assigned ages, incomes, children who attend school, and jobs that take them through the community—all of it based on information pulled from sources such as the U.S. Census Bureau’s decennial census.
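For readers who want to see the agent-based idea in miniature, here is a toy sketch. It is not EpiCast and uses no census data: it assumes 500 identical agents who meet a few randomly chosen others each day, and every parameter value (`contacts_per_day`, `transmission_prob`, `infectious_days`) is a hypothetical placeholder.

```python
import random

def simulate(num_agents=500, num_days=60, contacts_per_day=8,
             transmission_prob=0.05, infectious_days=5, seed=1):
    """Toy agent-based epidemic: each agent is S (susceptible), I (infectious), or R (recovered)."""
    rng = random.Random(seed)
    state = ["S"] * num_agents
    days_left = [0] * num_agents
    for patient_zero in rng.sample(range(num_agents), 5):
        state[patient_zero] = "I"
        days_left[patient_zero] = infectious_days
    daily_infectious = []
    for _ in range(num_days):
        newly_infected = set()
        for agent in range(num_agents):
            if state[agent] != "I":
                continue
            # Each infectious agent meets a few randomly chosen others today.
            for other in rng.choices(range(num_agents), k=contacts_per_day):
                if state[other] == "S" and rng.random() < transmission_prob:
                    newly_infected.add(other)
        # Infectious agents count down to recovery.
        for agent in range(num_agents):
            if state[agent] == "I":
                days_left[agent] -= 1
                if days_left[agent] == 0:
                    state[agent] = "R"
        # Today's new infections become infectious from tomorrow.
        for agent in newly_infected:
            state[agent] = "I"
            days_left[agent] = infectious_days
        daily_infectious.append(state.count("I"))
    return state, daily_infectious

final_state, curve = simulate()
print("ever infected:", sum(s != "S" for s in final_state))
print("peak infectious on one day:", max(curve))
```

Lowering `contacts_per_day` or `transmission_prob` is a crude stand-in for distancing or masks. A production model would replace the random mixing with households, schools, and workplaces.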

EpiCast can be used to model a pandemic spreading through areas as diverse as New York City or the state of Georgia. And although EpiCast was originally built to study smallpox, and later the avian flu, it’s flexible enough to allow researchers to swap in the characteristics of almost any virus. As the coronavirus pandemic reached New Mexico in March, Germann, along with mathematical and computational epidemiologist Sara Del Valle and applied mathematician Carrie Manore, put this powerful modeling tool to work in their home state.

Each week, Del Valle and Manore met with the New Mexico Department of Health to discuss how COVID-19 might spread through different counties and what the best policies might be to prevent infections. The health department then consulted with the governor, Michelle Lujan Grisham, whose actions would impact the lives of the state’s 2.1 million residents.

Lujan Grisham, formerly New Mexico’s secretary of health, declared one of the first statewide health emergencies in the nation. New Mexico hospitals offered free COVID-19 testing. Businesses deemed nonessential were closed and slowly allowed to reopen once the infection curve had mellowed. By late summer, New Mexico, a state with one of the lowest hospital and physician rates per capita in the country, had fared much better during the pandemic than neighboring states.

By August, the debate about whether to open schools was in full swing. “We were able to put our model to use analyzing a few different options for New Mexico,” says Del Valle, explaining that she, Manore, and Germann used EpiCast to test three scenarios (students attend in-person school five days a week, students attend virtual school from home, and students do a combination of in-person and virtual school). Based in part on the results of EpiCast simulations and other Lab modeling efforts, New Mexico’s decision has been a hybrid of these options. Counties with less than a 5 percent COVID-19 positivity rate opened schools to partial capacity, with students doing a combination of in-person and virtual learning and one day reserved for thorough cleaning of schools.

Later, Del Valle and Germann ran similar simulations for the nation. “At that point, we were not only working with the state, we were also informing the CDC,” Del Valle says.

Studying social media

The tough part about models, even one as advanced as EpiCast, is that human behavior is hard to predict. Humans don’t always act rationally or lawfully. For example, despite municipal or state laws requiring masks, some people refuse to wear them. This could simply be because these people find masks inconvenient or uncomfortable. But Lab information scientist Ashlynn Daughton also wondered if this behavior could be traced to a larger influence.

Daughton’s past work used an algorithm to study public social media posts to understand how seriously people considered the Zika virus to be a threat. To do this, the algorithm looked for signs on social media that indicated people had canceled travel plans to areas with Zika. Daughton hoped to do something similar with COVID-19.

She began by building an algorithm that helps her study public social media posts on Twitter and Reddit. The algorithm looks for posts that express feelings for or against safe behavioral practices, such as adhering to social distancing, washing hands, or wearing masks.

In all, the algorithm looks for 30 virus-related behaviors posted on social media. The algorithm uses the words in a post to identify what behavior is being discussed, and it recognizes if the post is a personal observation (“I am wearing a mask”) or a general observation (“here is an article about mask wearing”). The observation is then analyzed to decide if it reflects safe or risky practices, and its geolocation is cross-referenced with COVID-19 outbreaks.
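The first two steps, spotting which behavior a post mentions and whether it is a personal or a general observation, can be sketched with simple keyword rules. This is an illustration only. The article does not say how Daughton's algorithm works internally, and the keyword lists, the first-person pattern, and the function name `classify_post` are hypothetical. The later steps (safe versus risky, geolocation) are left out.

```python
import re

# Hypothetical keyword lists; a real system would likely learn these from labeled posts.
BEHAVIOR_KEYWORDS = {
    "mask wearing": ["mask"],
    "social distancing": ["social distanc", "six feet"],
    "hand washing": ["wash my hands", "washing hands", "hand sanitizer", "handwashing"],
}
# First-person words used as a rough signal that the post is a personal observation.
FIRST_PERSON = re.compile(r"\b(i|my|we|our)\b")

def classify_post(text):
    """Return (behavior, 'personal' or 'general'), or None if no behavior is mentioned."""
    lowered = text.lower()
    for behavior, keywords in BEHAVIOR_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            kind = "personal" if FIRST_PERSON.search(lowered) else "general"
            return behavior, kind
    return None

for post in ["I am wearing a mask",
             "Here is an article about mask wearing",
             "Nice weather today"]:
    print(post, "->", classify_post(post))
```

Keyword rules like these miss sarcasm, negation, and slang, which is one reason a system covering 30 behaviors would need something more robust than this sketch.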

“In the long term, it will be interesting to see whether living in a place where online misinformation about COVID-19 is widespread makes you more likely to buy into that misinformation and put yourself at risk,” Daughton explains.

If it can be proven that there is a correlation between outbreaks and social media influence, the data could also feed into larger modeling systems, such as EpiCast, to create even more-accurate simulations of how the virus can spread.

Looking into the lungs

Similar to the way EpiCast creates a digital representation of a community, mathematical biologist Ruy Ribeiro and his team have created a model to understand how SARS-CoV-2 interacts with the human respiratory and immune systems. They hope their work will enable physicians to develop better treatments to save lives.

One of the main interests of Ribeiro’s team is how SARS-CoV-2 migrates from the upper respiratory tract (the nose and mouth) to the lower one (the lungs). A virus like the flu tends to linger in the upper tract, but SARS-CoV-2 can move fairly quickly to the lower tract, where it can do serious and sometimes fatal damage. Typically, the virus infects cells to make more (tens of millions of) copies of itself, and the body uses its immune system to fight back. This immune system response usually starts with the production of proteins that protect cells against infection and prevent the virus from replicating. Often, specialized B and T defense cells join the fight. But in some COVID-19-positive patients, the communication between the first responder cells and the reinforcement T and B cells becomes garbled.

With its model, Ribeiro’s team analyzes viral replication and the virus’ interaction with the body’s defenses. The team found that people are most infectious when their viral load (the amount of virus present in a person) is above 10,000 copies of the virus per test swab.

From there, Ribeiro’s team can introduce different treatments into the model to see if any of the treatments help the body’s immune system fight the virus. Remdesivir, for example, is an antiviral medication that slows virus reproduction and has been shown to shorten patient recovery time. Ribeiro’s model showed that remdesivir can in fact help patients with mild symptoms recover more quickly, but only if the treatment is started very early in the process, almost on day one of infection. Beyond day three, they found little-to-no benefit for treating patients with remdesivir.
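Why the timing of an antiviral matters can be seen in a toy version of a standard textbook model of within-host infection, the target-cell-limited model. Whether the team's model takes this form is not stated in the article, so treat this as a sketch under assumptions. Uninfected cells become infected at rate `beta`, infected cells die at rate `delta`, each produces virus at rate `p`, free virus is cleared at rate `c`, and a drug with the given `efficacy` cuts virus production once treatment starts. All parameter values and treatment days are illustrative, not fitted to patient data.

```python
def peak_viral_load(treatment_start_day, efficacy=0.9, days=15, steps_per_day=1000):
    """Euler integration of the target-cell-limited model; returns the peak viral load."""
    beta, delta, p, c = 1e-7, 1.0, 100.0, 10.0    # illustrative rates per day, not fitted
    target, infected, virus = 1e7, 0.0, 1.0       # uninfected cells, infected cells, free virus
    dt = 1.0 / steps_per_day
    peak = virus
    for step in range(days * steps_per_day):
        t = step * dt
        # The antiviral only reduces virus production after treatment begins.
        eps = efficacy if t >= treatment_start_day else 0.0
        new_infections = beta * target * virus
        d_target = -new_infections
        d_infected = new_infections - delta * infected
        d_virus = (1.0 - eps) * p * infected - c * virus
        target += d_target * dt
        infected += d_infected * dt
        virus += d_virus * dt
        peak = max(peak, virus)
    return peak

untreated = peak_viral_load(float("inf"))   # treatment never starts
early = peak_viral_load(1.0)                # treatment from day 1
late = peak_viral_load(8.0)                 # treatment from day 8
print(f"untreated peak: {untreated:.3g}")
print(f"early-treatment peak: {early:.3g}")
print(f"late-treatment peak: {late:.3g}")
```

In this toy setting, treatment that begins while the virus is still growing holds the peak far below the untreated case, while treatment that begins after the viral load has already peaked cannot lower the peak at all. The specific days here come from the made-up parameters, not from the team's findings.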

Another question the team wanted to understand is how SARS-CoV-2 mutates once it’s inside the body. Using a handful of available studies in which infected patients were repeatedly tested for up to 20 days, Ribeiro and his team are running thousands of simulations that show how the virus can replicate itself and how rapidly it can mutate.

“This is like the holy grail of the research being done now,” Ribeiro says, “because it will tell us not only how quickly the virus is evolving in the human body, it will also tell us how it’s evolving.”

Understanding this progression is of the utmost importance as a vaccine against the virus is developed. If SARS-CoV-2 mutates rapidly enough, it might evolve beyond the vaccines currently being developed, rendering them ineffective. Creating a vaccine is like aiming at a moving target: you can’t just shoot where it is, you have to aim where it will be.

Variations of the virus

Theoretical biologist Bette Korber—a Laboratory Fellow renowned for her HIV work—is also studying how SARS-CoV-2 mutates. In July, she and 20 co-authors published a paper in the journal Cell that stated a variation of the virus, called D614G, has become the most prevalent form of the virus globally.

“The D614G variant first came to our attention in early April, as we had observed a strikingly repetitive pattern,” Korber says. “All over the world, even when local epidemics had many cases of the original form circulating, soon after the D614G variant was introduced into a region it became the prevalent form.”

Geographic information attached to samples in GISAID, a global science initiative that provides open access to genomic data of viruses, enabled the team to track the shift to D614G, which occurred at every geographic level: country, subcountry, county, and city.

“It is possible to track SARS-CoV-2 evolution globally because researchers worldwide are rapidly making their viral sequence data available through the GISAID viral sequence database,” Korber says. GISAID was originally established to encourage collaboration among influenza researchers, but early in the coronavirus pandemic, the consortium established a SARS-CoV-2 database, which soon became the de facto standard for sharing outbreak sequences among researchers worldwide. Currently, tens of thousands of sequences are available through this project, and this enabled Korber and colleagues to identify the emergence of the D614G variant.

The SARS-CoV-2 virus has a low mutation rate overall (much lower than the viruses that cause influenza and HIV-AIDS). The D614G variant appears as part of a set of four linked mutations that appear to have arisen once and then moved together around the world as a consistent set of variations.

“These findings suggest that the newer form of the virus may be even more readily transmitted than the original form,” Korber says. “Whether or not that conclusion is ultimately confirmed, it highlights the value of what were already good ideas: to wear masks and to maintain social distancing.”

Mapping the spread

Computational biologist Thomas Leitner is trying to understand how SARS-CoV-2 has evolved on a global scale as it has spread across cities, regions, and countries. “At the moment, SARS-CoV-2 has spread through communities quickly,” Leitner says. “But viruses like this tend to evolve faster when they have time to settle for longer periods in a host community, so we’re just beginning to understand the nature of how it’s changing.”

To answer that question, Leitner is developing a genetic tree for COVID-19, not unlike the family trees schoolchildren fill out. Leitner’s tree traces the most current genetic variants of the virus back to their beginning in China, where the virus is believed to have originated in horseshoe bats.

Every week, Leitner downloads the latest information released by GISAID, where there are currently more than 49,000 logged genetic sequences of the novel coronavirus. Slowly, Leitner is building topologies—related configurations—that reveal how the virus has spread across the world in human populations.

“There are fewer atoms in the universe than the number of topologies I can create with 49,000 genetic variations of this virus,” Leitner says. “I’ve definitely got my work cut out for me.”
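Leitner's comparison can be checked with a standard counting result: the number of distinct unrooted, fully branching trees that connect n labeled sequences is (2n - 5)!!, the product of the odd numbers from 3 up to 2n - 5. The sketch below assumes that this is the kind of topology he means and uses the commonly quoted rough figure of 10^80 atoms in the observable universe. Both are assumptions, as are the function names.

```python
import math

ATOMS_LOG10 = 80  # rough, commonly quoted order of magnitude; an assumption

def unrooted_topologies(num_sequences):
    """Exact count of unrooted binary trees on labeled tips: (2n - 5)!! for n >= 3."""
    count = 1
    for odd in range(3, 2 * num_sequences - 4, 2):
        count *= odd
    return count

def log10_unrooted_topologies(num_sequences):
    """Base-10 logarithm of the same count, usable when the count itself is enormous."""
    return sum(math.log10(odd) for odd in range(3, 2 * num_sequences - 4, 2))

# Smallest number of sequences whose tree count already exceeds 10**80.
smallest_n = 3
while unrooted_topologies(smallest_n) <= 10 ** ATOMS_LOG10:
    smallest_n += 1

print("trees for 5 sequences:", unrooted_topologies(5))
print("sequences needed to pass 10^80 trees:", smallest_n)
print("log10 of tree count for 49,000 sequences:",
      round(log10_unrooted_topologies(49_000)))
```

Under these assumptions the count passes the atom estimate with well under a hundred sequences, so with 49,000 an exhaustive search of every tree is out of the question.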

Once Leitner models the SARS-CoV-2 genetic tree, he’ll be able to understand how the virus evolves as it’s passed around the world. Leitner’s work could also prove crucial to creating a vaccine because after a prolonged time the virus’ genetic code may evolve to look very different in one region of the globe than it does in another. If, for example, SARS-CoV-2 in America differs dramatically from the variant in Brazil, this might require specialized vaccines for each region.

“The ultimate goal for this work is to then couple it with the other research being done that looks at how SARS-CoV-2 evolves inside the body,” Leitner says. When that time comes, the Laboratory’s computing power will be instrumental in linking various research together.

“Over the decades, we’ve built a robust computing capability,” Fitch says, “and it is perfectly suited for fighting the COVID-19 outbreak.”

Viable ventilators

When a ventilator shortage worried the nation, the Lab stepped up with a couple of inventive solutions.

In February, a surge of COVID-19 patients at hospitals across the country led to the realization that the United States might not have enough ventilators—machines that help people breathe—to keep severely ill patients alive. This problem prompted some innovative responses from the Laboratory.

Two groups of engineers wondered if they could build their own ventilators using locally sourced supplies. One group purchased parts from hardware and car supply stores in Los Alamos, took them back to someone’s garage, and fashioned them into a ventilator. Another team partnered with health experts from Presbyterian Health Services to build a ventilator using plumbing from a hardware store. As the projects progressed, the engineers made improvements as they assessed how well the ventilators worked on simulated lungs.

Typical ventilators deliver to patients regularly timed spurts of oxygen, which can be unnatural for someone whose breathing is labored or erratic. So, Lab engineers modified one of their ventilators to deliver oxygen in response to a patient’s natural pace of breath. Then they added pulsed aerosols into the oxygen delivery system in hopes that the aerosols would break down mucus in the lungs of infected patients.

“What we’re doing is essentially transforming a machine meant to keep people alive into a potential treatment,” says associate Laboratory director J. Patrick Fitch, who’s leading the Lab’s coronavirus response. The new ventilator is now a joint project with Idaho National Laboratory.

Predicting the need for ventilators

Data analyst Paolo Patelli and computational scientist Nidhi Parikh viewed the ventilator shortage as a supply and demand problem—one for which they had a solution.

“Our idea was to create a program that policymakers could use that took stock of how much equipment they had and what equipment they would need, based on the virus’ spread,” Parikh says.

The program they developed accounts for a state’s supply of ventilators, as well as the supply of neighboring states. It uses a model that predicts the future infection rate of that state, and then estimates how many infected people will potentially end up in a hospital in need of a ventilator. If there aren’t enough ventilators on hand during the projected COVID-19 outbreak in one location, the program can direct policymakers to regions of the country with a surplus.
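The matching step, pointing a region with a projected shortfall toward regions with a surplus, can be sketched as a simple greedy allocation. This is not the program Patelli and Parikh built. It assumes the projected need per region has already come from a forecasting model, and the region names, numbers, and function name `plan_transfers` are hypothetical.

```python
def plan_transfers(supply, projected_need):
    """Greedy matching of regional shortfalls to regional surpluses.

    supply and projected_need map region name -> ventilator count.
    Returns (list of (from_region, to_region, units), dict of unmet need).
    """
    balance = {region: supply[region] - projected_need.get(region, 0) for region in supply}
    # Regions with the largest surplus are asked first.
    donors = sorted((r for r in balance if balance[r] > 0), key=lambda r: -balance[r])
    # Regions with the largest shortfall are served first.
    short_regions = sorted((r for r in balance if balance[r] < 0), key=lambda r: balance[r])
    transfers, unmet = [], {}
    for region in short_regions:
        short = -balance[region]
        for donor in donors:
            if short == 0:
                break
            units = min(short, balance[donor])
            if units > 0:
                transfers.append((donor, region, units))
                balance[donor] -= units
                short -= units
        if short > 0:
            unmet[region] = short
    return transfers, unmet

# Made-up inventory and projected need, for illustration only.
supply = {"State A": 100, "State B": 40, "State C": 60}
need = {"State A": 60, "State B": 70, "State C": 80}
transfers, unmet = plan_transfers(supply, need)
print(transfers)
print(unmet)
```

A fuller version would weigh distance between regions and the timing of each projected peak. The greedy rule here only balances the counts.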

“Since then,” Parikh says, “we’ve expanded the program to include personal protective equipment, medications, and almost anything a hospital needs to help COVID-19 patients.”