“We thought we’d know the whole story once we sequenced the human genome,” says Los Alamos molecular biologist Christina Steadman. Scientists in the 1990s were hopeful that by cataloging genes they would uncover the secret to life without disease, or even aging. This desire fueled the famed Human Genome Project, an international effort to decipher the entire sequence of human DNA. Today, scientists understand that it isn’t enough to know the sequence: how and when genes are switched on is paramount. And to make it even more complicated, everything from a nurturing learning experience to a season of famine can toy with our genes' on- and off-switches.
Genes inherited from our parents are just the blueprint of what is possible. Gene expression is the result of turning genes on or off, which determines how the blueprint is used. But research shows that the twists and turns of life, such as exposure to environmental toxins, viral infections, and even behavior or trauma, have a strong influence on gene expression through a variety of mechanisms. The study of this is called epigenetics.
“Our bodies have certain agnostic, regulatory tools they use to respond to our environment. For instance, what we eat influences the expression of our metabolism genes. If we eat healthy amounts of greens and other vegetables, this causes certain positive epigenetic changes to metabolism genes,” Steadman says. “Conversely, not having enough to eat, or eating not-so-healthy food causes other epigenetic changes to those same genes, but instead results in potentially harmful effects.” To thoroughly understand epigenetics, scientists need more data linking specific environmental causes to specific epigenetic changes, plus data linking those cause-and-effect sets to specific functional impacts. What they’re finding is that it all comes down to the physical shape of the DNA bundle inside our cells.
To investigate the interplay between sequence, structure, function, and the environment, Steadman, who specializes in epigenetics, is part of a multidisciplinary team of Los Alamos scientists working to visualize the genome in four dimensions: the usual three, plus time. The 4D genome team, led by structural biologist Karissa Sanbonmatsu and microbiologist Shawn Starkenburg, seeks to understand how external factors in life’s journey impact both gene expression and the physical shape of our genome. The project melds cell biology, DNA sequencing, high performance computing, and state-of-the-art microscopy in a dynamic feat of data fusion.
Form and function
For hundreds of years, scientists were only able to study organisms through observing their appearance and behavior. Then, in the 1800s, Austrian friar Gregor Mendel studied changes in flower color and seed shape over several generations of crossbred pea plants, leading him to suggest that genetic material exists as units, now called genes. In the 1940s, British embryologist Conrad Waddington proposed that nurture was influencing nature on a genetic level. Waddington explored how environmental changes impacted the growth and development of fruit fly embryos—and he coined the term “epigenetics” to describe the influence of external factors that act “upon” or operate “above” genetics. But still, the actual mechanism of how the environment could impact gene expression remained hidden.
It was not until decades later, with the inventions of DNA sequencing and highly advanced molecular biology, that scientists were able to truly begin investigating epigenetics. Today, it is widely understood that epigenetic changes occur when external factors in the environment cue specific enzymes in our bodies to physically attach small molecules—some as tiny as methyl groups (-CH3)—to our genes. The presence or absence of these molecular “marks” has an influence on gene expression, which translates into variation in function. Scientists have found evidence of extra methyl groups on the DNA of cancerous cells, and other epigenetic marks have been documented in trauma survivors and have also been inherited by their children.
The scientific community’s working hypothesis is that epigenetic molecules alter access to genes by subtly changing the shape of chromosomes. However, establishing these cause-and-effect relationships on a molecular level is exceedingly difficult because of how DNA is packaged in our cells. Most human cells contain a complete copy of a person’s DNA: their genome. Laid out end to end, one complete genome is about two meters long, but by coiling up, it fits inside a cell nucleus that is just 10 micrometers in diameter. The coiling up begins with the double-helix strand of DNA, coiled around specific proteins called histones, together creating subunits called nucleosomes, which twist into bundles called chromatin. Chromatin appears tangled like a bowl of spaghetti but is actually organized into 46 distinct chromosomes. It changes between states of being more or less condensed based on its activity. The familiar X-shape of chromosomes is only visible when chromosomes are highly condensed and paired up for cell division.
Although chromatin is tightly wound, during gene expression it unravels just a bit to grant DNA-copying enzymes access to a gene encoded in a particular stretch of DNA. This is the first step in gene expression; once the gene is copied, the copy is used elsewhere in the cell to direct other enzymes and various functions such as the synthesis of proteins, enzymes, and hormones. However, when epigenetic marks are present on DNA or histone proteins—having been added by enzymes in response to environmental factors—they can potentially impact the chromatin’s ability to unravel.
If an epigenetic mark prevents the chromatin from unwinding, genes won’t be properly expressed. Conversely, a gene can be over-expressed if an epigenetic mark keeps the chromatin open for extended periods of time. Either of these scenarios can cause a change in function to one or more genes, resulting in too much of a hormone or not enough of a protein. It is possible for epigenetic enzymes to remove the marks, reversing their impacts. However, some marks, especially those associated with cancer and pathogen exposure, are more complicated to remove and the resulting functional changes can be permanent. Overall, scientists believe there are hundreds of types of epigenetic marks, and an increasing body of research shows they can directly influence our health. By studying the placement of epigenetic marks and their impact on gene access, Los Alamos scientists hope to learn more about how our environment impacts our genomes and to discover important clues for improving human health.
“We have a wide range of resources at Los Alamos, so we are able to look at multi-scale levels of detail to create a low-resolution, big picture of how epigenetic changes impact chromosome shape,” says Sanbonmatsu. She explains that the team’s goal is to evaluate the chromatin from cells that have been epigenetically modified by analyzing sequence data and microscopy images and creating computer simulations of entire chromosomes. To accelerate the scientific community’s understanding of epigenetics, the team is also creating a web-based genome browser to help other researchers visualize their own data.
Multi-scale visualization will allow scientists to figuratively zoom in by accessing various types of information, such as where on the chromatin specific marks are bound, which genes are nearby, and how the shape of the twisted chromatin impacts access to those genes. There is no single tool that can enable such a viewpoint, but by combining a number of different technologies, the team is beginning to see things more clearly.
As Steadman and her colleagues considered how to gather experimental data for their project, the omnipresence of the COVID-19 pandemic was hard to ignore. Many epigenetic studies have looked at cancerous tumors or intergenerational trauma, but far fewer have investigated viral infection. Yet viral infection occurs over a much shorter time span (days or weeks rather than years or generations), which would make it easier for the Los Alamos team to document changes. Furthermore, as COVID-19 showed us, viral infections can lead to long-term side effects that might have epigenetic causes.
The team could not work directly with the virus that causes COVID-19, so instead it used a similar type of coronavirus as a model system. The scientists conducted infection experiments with two kinds of cells: neonatal lung cells and adult lung cells. They infected both types of cells with a coronavirus strain called 229E so the team could investigate how the virus affected the cells’ genomes.
Normally, when a cold virus enters a human cell, the cell turns on genes to make a protein that forms a complex with fragments of virus proteins. This complex sticks out from the cell surface, like a flag, to alert the immune system that the cell is infected. “We suspected that the virus might interfere with this immune response, and also we wondered if epigenetics is involved,” says team member and Los Alamos microbiologist Sofiya Micheva-Viteva.
The team set out to evaluate how the lung cells’ chromatin changed shape in response to coronavirus infection. If the chromatin remained tightly wound, it might indicate that the cell was not transcribing genes to alert the central immune system to the viral infection. On the other hand, if the chromatin unwound, gene expression could be happening, which would suggest normal gene activation. The team harvested cells at various time points during the infection (6 hours, 24 hours, and 48 hours) and then sequenced the DNA from each time point.
Because epigenetic marks are too small to see, the team used a combination of three types of sequencing techniques to reveal epigenetic mark locations and chromatin shape. One, the Hi-C technique, indicates which stretches of DNA are close enough to interact chemically, which helps inform 3D models of the chromosomes. Another method, called ATAC-seq, sequences all the DNA that is not tightly wound around nucleosomes, which correlates with how open the chromatin is. Finally, ChIP-seq is a technique that uses antibodies that bind to specific kinds of epigenetic marks, therefore identifying which marks are present and where they are bound to the DNA. Using the sequence data like waypoints on a map, the scientists are able to discover which genes are interacting because they are close together and how the 3D twists and folds of the chromatin contribute to this closeness.
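As a rough illustration of how two of these data types might be combined, the sketch below checks which ChIP-seq mark locations fall inside regions that ATAC-seq reports as open. It is not the team's pipeline: the function names, the half-open interval convention, and every coordinate are assumptions made up for this example.

```python
from typing import List, Tuple

# (chromosome, start, end) in bases; start inclusive, end exclusive (an assumption)
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two intervals are on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def marks_in_open_chromatin(marks: List[Interval],
                            open_regions: List[Interval]) -> List[Interval]:
    """Return the marks that overlap any open region (simple pairwise scan)."""
    return [m for m in marks if any(overlaps(m, r) for r in open_regions)]


# Toy coordinates, invented for illustration only.
atac_open = [("chr1", 1_000, 2_000), ("chr1", 5_000, 6_000)]
chip_marks = [("chr1", 1_500, 1_700), ("chr1", 3_000, 3_200), ("chr2", 1_500, 1_700)]

open_marks = marks_in_open_chromatin(chip_marks, atac_open)
print(open_marks)
```

The pairwise scan keeps the idea visible; with genome-scale data, sorted intervals or an interval tree would be the more practical choice.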
“Our approach is to combine these individual techniques for an unprecedented view of the genome with all of its epigenetic marks and 3D structures,” says Steadman. “To do this, our outstanding postdocs, Vrinda Venu and Cullen Roth, have been adapting protocols and developing analysis tools.” By revealing the epigenetic marks and open sections, the team can infer how epigenetics may be responsible for open areas or why some genes are close together. Further, by comparing data from infected and uninfected cells, as well as reference genomes, the team is learning how the 229E coronavirus may have impacted the shape of the host’s genome.
Into the next dimension
DNA sequencing gave the Los Alamos team the first dimension of chromatin structure in virus-infected cells, but gaining the next level—creating a 3D chromosome model—required an important computational leap. Sanbonmatsu has spent her career using complex computer algorithms to model various molecules. Before the 4D genome project began, she had already begun analyzing chromatin shape and modeling entire chromosomes. Sanbonmatsu and her collaborators, Anna Lappala and Jeannie Lee at Harvard University, predicted the 3D structure of the X chromosome using a novel process they called “4D-HiC,” which uses sequence and Hi-C information about where the DNA crosses and interacts with itself.
The team applied the 4D-HiC computational technique to data collected from their coronavirus-infected cells. Using Hi-C data, team member Ankush Singhal made a plot of all the constraints: the places where the DNA is close enough to interact with itself. This plot helped Singhal create 3D models of some of the chromosomes from the lung cells, and the process was repeated for sequence data taken at each time point—6 hours into the infection, 24 hours in, etc.—so that the team could compare how the chromosome shape changed over time.
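One way to picture the constraint plot is as a table of which stretches of DNA were found close together. The sketch below is a simplified stand-in, not the 4D-HiC method itself: it groups Hi-C read pairs from a single chromosome into fixed-size bins, counts how often each pair of bins is linked, and keeps the pairs that pass two thresholds. The bin size borrows the 200,000-base figure used for the team's model; the thresholds, function names, and positions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

BIN_SIZE = 200_000  # bases per bin


def bin_contacts(read_pairs: Iterable[Tuple[int, int]],
                 bin_size: int = BIN_SIZE) -> Dict[Tuple[int, int], int]:
    """Count how many read pairs link each pair of bins, ignoring order."""
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for pos_a, pos_b in read_pairs:
        i, j = sorted((pos_a // bin_size, pos_b // bin_size))
        counts[(i, j)] += 1
    return dict(counts)


def constraints(counts: Dict[Tuple[int, int], int],
                min_count: int = 2,
                min_separation: int = 2) -> List[Tuple[int, int]]:
    """Keep bin pairs seen at least min_count times and at least
    min_separation bins apart along the sequence (hypothetical cutoffs)."""
    return sorted(pair for pair, n in counts.items()
                  if n >= min_count and pair[1] - pair[0] >= min_separation)


# Toy read-pair positions on one chromosome, invented for illustration.
pairs = [
    (100_000, 1_050_000),  # bins 0 and 5
    (150_000, 1_100_000),  # bins 0 and 5 again
    (450_000, 500_000),    # both ends in bin 2
    (450_000, 650_000),    # neighbouring bins 2 and 3
    (450_000, 700_000),    # neighbouring bins 2 and 3 again
    (900_000, 2_100_000),  # bins 4 and 10, seen only once
]

contact_counts = bin_contacts(pairs)
print(constraints(contact_counts))
```

Repeating the same steps on data from each time point would give one constraint list per time point, which is the kind of input a 3D model could then be built from and compared across the infection.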
Although the sequence data detail every single nucleic acid base, which could be as many as a few hundred million for just one chromosome, the team’s 3D model could not be made at the same resolution. In fact, the amount of computing power necessary to model every single base would far exceed the world’s fastest supercomputers. To solve this problem, the team made a more coarse-grained model that would still provide enough essential information to understand if viral infections can trigger epigenetic changes. They chose to represent the chromatin as a spring (DNA) with beads (nucleosomes)—each bead representing about 200,000 bases and more than 1000 histone proteins.
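The arithmetic behind this coarse-graining can be sketched in a few lines. The 200,000 bases per bead comes from the description above; the other numbers are assumptions for illustration: roughly 200 bases of DNA per nucleosome (wrapped DNA plus linker), eight histones per nucleosome, and a round figure of about 248 million bases for human chromosome 1. The harmonic spring term is a generic bead-and-spring form and is not taken from the team's code.

```python
import math
from typing import Sequence, Tuple

BASES_PER_BEAD = 200_000      # from the article
BASES_PER_NUCLEOSOME = 200    # assumption: wrapped DNA plus linker
HISTONES_PER_NUCLEOSOME = 8   # histone octamer


def beads_needed(chromosome_length: int,
                 bases_per_bead: int = BASES_PER_BEAD) -> int:
    """Number of beads needed to cover a chromosome, rounding up."""
    return -(-chromosome_length // bases_per_bead)


def histones_per_bead(bases_per_bead: int = BASES_PER_BEAD) -> int:
    """Rough count of histone proteins represented by one bead."""
    return (bases_per_bead // BASES_PER_NUCLEOSOME) * HISTONES_PER_NUCLEOSOME


def spring_energy(positions: Sequence[Tuple[float, float, float]],
                  rest_length: float = 1.0,
                  stiffness: float = 1.0) -> float:
    """Harmonic energy of the springs joining consecutive beads."""
    energy = 0.0
    for a, b in zip(positions, positions[1:]):
        stretch = math.dist(a, b) - rest_length
        energy += 0.5 * stiffness * stretch ** 2
    return energy


CHR1_LENGTH = 248_000_000  # approximate, for illustration

print(beads_needed(CHR1_LENGTH))   # 1240
print(histones_per_bead())         # 8000
print(spring_energy([(0, 0, 0), (1, 0, 0), (3, 0, 0)]))  # 0.5
```

Under these assumptions, a chromosome of a few hundred million bases shrinks to roughly a thousand beads, which is why the coarse-grained model is tractable where a base-by-base model is not.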
“There are no similar 3D models out there at this resolution,” says Sanbonmatsu. “With our approach, we can visualize the 3D architecture of each chromosome and superpose them with information about epigenetic marks and the openness of chromatin to get an integrated picture. This gives us insight about the interplay between spatial genome architecture, epigenetic marks, and gene-to-gene interactions.”
Advanced sequencing and fancy computer models aside, there is nothing quite like a photo. Team member John Watt, a materials chemist and electron microscopist, helped validate the models by providing an image of the real thing.
Cryogenic electron microscopy (cryo-EM) enables scientists to take microscopic images of organic materials. Regular electron microscopes are run in a vacuum that can damage biological materials; cryo-EM avoids this problem by freezing the samples in water to preserve their structure. The samples are rapidly frozen into thin films of amorphous ice, about 100–200 nanometers thick. Once frozen, scientists can’t control the orientation of the sample, so they capture hundreds of images at various angles and combine them to get a model of the sample in 3D. Watt and his colleagues used this process to create 3D images of the chromatin extracted from coronavirus-infected cells and compared the images with the 4D-HiC computer models.
“There are only a few examples of cryo-EM being used to image chromatin, and even fewer concerning viral infection, so we had to figure out the process for ourselves,” says Watt. “Chromatin is a relatively big, globular, highly charged molecule so we had to tweak our processes, especially the freezing protocol, so that we could get the chromatin nicely dispersed in the ice.”
The cryo-EM images appear to support the chromatin models. Combining these images with other data, the 4D team could now confidently conclude that early in the course of viral infection, around 6 hours, the host chromatin became more compact, and around 24 hours into the infection, the chromatin relaxed. “This means that the virus may have been shutting down gene expression in the cell, essentially silencing the cell’s ability to respond,” says Micheva-Viteva. “But it can’t shut everything down permanently, because a virus needs the cell’s machinery to make new virus copies.” Another explanation for the results could be that the lung cell itself is shutting down gene expression in an effort to stop the virus from multiplying. The team is working to identify exactly which genes are impacted to try to determine which of these explanations is most likely.
The results of the virus-infection experiments were exciting not just because of what the team learned about chromosome shape—they were also an important validation of the team’s approach to studying epigenetics. By combining sequence data, 4D-HiC modeling, and cryo-EM imaging, the team demonstrated a pathway that can now be used to evaluate other epigenetics questions.
Epigenetics for everyone
With each additional experiment and data set, the collective knowledge base about epigenetics grows, and scientists gain more evidence linking epigenetic causes to changes in cellular function. Furthermore, as more data are collected, predictive modeling may be able to help speed up the investigative process. To help accelerate scientific discovery, the 4D team developed a genome browser that uses data fusion and visualization to make epigenetic analysis available to everyone.
“Data fusion is what Google Maps does,” says Los Alamos computational scientist David Rogers. “It blends satellite imagery, simple maps, traffic data, and so on into an easy-to-use interface. This is the core of what we’re doing with multi-scale data on epigenetics.”
The 4D genome browser uses sequence datasets as input and executes computer algorithms to create 3D molecular simulations for users. Rogers designed the browser to run in a Docker container, which works much like a lightweight virtual machine and runs the molecular dynamics code for all users through the web architecture. This adaptation means that users do not have to purchase or install specialized modeling software to make 3D models using their own epigenetic data.
“Epigenetics is dynamic, so we wanted to create something more than a database. Our genome browser allows scientists worldwide to look at their own data in a whole new way,” says Starkenburg.
The Los Alamos team’s comprehensive approach for studying epigenetics, combined with their 4D genome browser, is an important step toward the future of epigenetics research. With these techniques and tools, scientists can accelerate studies that link epigenetic marks to their impacts, and they can also begin to investigate drugs or behaviors that might reverse epigenetic changes. Just as the Human Genome Project sought to catalog all the genes in the body, cataloging epigenetic changes and their impacts on chromosomes could give scientists a whole new level of awareness about how the environment affects our genes.
“Epigenetics is profoundly important in rendering who we are. If we want to understand organisms—their behavior, their responses to changing environments, why they are what they are—then we need to look at their epigenetics, not just genetics,” says Steadman.
By using the unique combination of tools available at Los Alamos, Steadman and the 4D genome team are making it possible to understand the whole story of how nature and nurture are endlessly intertwined.