Thirty years ago, the computing industry was really taking off. Supercomputers had gone massively parallel, running thousands of processors at once instead of just one, completing a hundred billion operations per second. At the same time, and in part because of these advances, computing at Los Alamos took on new significance: in 1992, the U.S. ceased full-scale nuclear testing, adopting instead a combination of component testing and supercomputer simulations to ensure that the nation’s nuclear weapons stockpile is safe, secure, and reliable.
To liken a nuclear weapon to a car: the U.S. no longer has to start its cars to know they will run. Instead, data from previous starts, and from meticulous testing of every tire, spark plug, circuit, and brake pad, go into a supercomputer, which then models all the parts and materials, as well as their interactions, across every instant of the startup process. Not only does this simulation-heavy approach to stockpile stewardship rely on the latest advances in supercomputing, but it actually helps guide those advances, pushing the technology forward and driving innovation.
The Laboratory’s mission is to solve national security problems, which include, but are not limited to, stockpile stewardship. This year, two new supercomputers that will serve that mission are coming to Los Alamos. One, called Crossroads, will be a specialist devoted primarily to classified weapons work, and the other, called Venado, will be a generalist intended for both classified and unclassified national security work, weapons-related and otherwise. Each one will fill its own large room in the Lab’s supercomputing facility, and despite appearances—row upon row of gently humming, shiny new processors—they are both unlike anything that has come before.
Purpose-built
From the moment the Laboratory was founded in 1943, its scientists have needed new ways of making complicated calculations fast. Despite 80 years of constant progress in world-class high-performance computing (HPC), that need has never been fully sated: as the technology gets more sophisticated and the calculations more complex, scientists are still working to understand the details of weapons performance.
The state of the art for HPC has progressed from one-dimensional single-physics models to three-dimensional multiphysics models. (Multiphysics refers to the coupling, within a computer model, of otherwise distinct physical fields, like fluid flow, radiation transport, and nuclear reactions.) But computing demands do not scale well with spatial resolution, so a ten-fold increase in resolution across three dimensions can require a thousand-fold increase in computing capability. Consequently, as the models have grown, so have the supercomputers that run them. Modern simulations at Los Alamos regularly occupy half a petabyte of memory (that’s half a million billion bytes).
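To make that scaling concrete, here is a minimal back-of-the-envelope sketch; the grid sizes and the bytes-per-cell figure are illustrative assumptions, not parameters of any actual Los Alamos code.

```python
# Back-of-the-envelope sketch of how memory demand grows with resolution in a
# cubic 3D simulation grid. All numbers here are illustrative assumptions.

def grid_memory_bytes(cells_per_dim, bytes_per_cell=1_000):
    """Memory for a 3D grid: cells_per_dim^3 cells, each holding an assumed
    1,000 bytes of state (materials, velocities, temperatures, and so on)."""
    return cells_per_dim ** 3 * bytes_per_cell

coarse = grid_memory_bytes(1_000)    # 1,000 cells along each dimension
fine = grid_memory_bytes(10_000)     # ten times finer in every dimension

print(f"coarse grid: {coarse / 1e12:.0f} TB")   # ~1 TB
print(f"fine grid:   {fine / 1e15:.0f} PB")     # ~1 PB
print(f"increase:    {fine // coarse}x")        # 1000x
```

The thousand-fold jump comes entirely from cubing the ten-fold refinement; add finer time steps and more coupled physics, and the demand grows faster still.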
Memory size is only part of the package. Computing capability comes from a few different variables: memory size, as measured by bytes, usually in billion-byte units called gigabytes; processing speed, as measured by floating point operations per second (flops); and memory accessibility, as measured by bytes accessed per second. The overall computing capability is limited by whichever of these variables maxes out first. For supercomputers tackling stockpile stewardship, that’s usually memory accessibility.
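A rough way to picture that limit is to estimate a run’s time from each resource separately and take the worst. The sketch below does exactly that; every machine and workload number in it is a hypothetical stand-in, not a figure for any real Los Alamos system.

```python
# Minimal sketch of the "whichever variable maxes out first" idea.
# All machine and workload numbers are hypothetical.

def time_to_solution(flops_needed, bytes_moved, flop_rate, mem_bandwidth):
    """A run can go no faster than its slowest resource allows."""
    compute_time = flops_needed / flop_rate    # limited by processing speed
    memory_time = bytes_moved / mem_bandwidth  # limited by memory accessibility
    limiter = "memory access" if memory_time >= compute_time else "flops"
    return max(compute_time, memory_time), limiter

# A made-up workload: 10^18 floating point operations that shuttle 10^17 bytes
# in and out of memory, on a machine with plenty of flops but modest bandwidth.
seconds, limiter = time_to_solution(
    flops_needed=1e18, bytes_moved=1e17, flop_rate=1e15, mem_bandwidth=1e13)
print(f"{seconds:,.0f} seconds, limited by {limiter}")  # 10,000 seconds, memory access
```

In this toy case, buying more flops would not shorten the run at all; only faster memory access would.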
“Given the hoopla in the press about the ‘fastest computer in the world,’ one might think we should buy computers with the most flops,” explains Gary Grider, leader of the Lab’s HPC Division. “Every class of problem requires a different balance of flops, memory size, and memory access. For the problems we are working on, the time it takes to get a result is mainly determined by memory size and memory access, not flops.”
Modern high-resolution simulations are built to include multiple kinds of physics and multiple kinds of materials across multiple distance and time scales. The calculations get so complex and so large that it takes an unusual approach to fit them into a supercomputer’s memory. Part of this unusual approach is the use of irregular data representation. Regular representation means that the data references occur in a series: 1, 2, 3, etc., all the way to, say, 100. Irregular representation means that the references are all over the place: 5, 19, 77, 35, 8, 82, 55. The hunt-and-peck retrieval process takes longer for irregular references than regular references, but, because of how complex the calculations are, it’s how they must be organized.
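A toy contrast, using hypothetical NumPy arrays far smaller than anything in a real weapons code, shows the difference: both passes below touch exactly the same values and produce the same sum, but the scattered ordering forces the hunt-and-peck retrieval described above.

```python
# Toy contrast between regular (sequential) and irregular (scattered) memory
# references. The array size is arbitrary and far smaller than a real simulation.
import time
import numpy as np

rng = np.random.default_rng(0)
data = rng.random(50_000_000)               # roughly 400 MB of values

regular_idx = np.arange(data.size)          # references in order: 0, 1, 2, ...
irregular_idx = rng.permutation(data.size)  # the same references, scattered

def timed_sum(indices):
    """Gather the data in the given order, sum it, and report wall time."""
    start = time.perf_counter()
    total = data[indices].sum()
    return total, time.perf_counter() - start

_, t_regular = timed_sum(regular_idx)
_, t_irregular = timed_sum(irregular_idx)
print(f"regular: {t_regular:.3f} s   irregular: {t_irregular:.3f} s")
```

On typical hardware the scattered pass tends to take several times longer, even though it performs exactly the same arithmetic.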
Recent trends in the commercial computing industry have skewed large-scale computing architectures toward regular references. But these trends have been driven by the needs of less complex science and engineering problems, and by wildly popular machine learning (ML) and artificial intelligence (AI) applications that perform optimally when memory references are regular. So, what does this drift of the industry toward architectures best suited for regular representation mean for applications that need irregular representation?
“The commercial computing industry has been moving in a direction that is less than ideal for efficiently solving these extremely complex problems,” says Grider. “Large scale computing requires large scale memory, networking, storage, power and cooling infrastructure, and a host of other things. The complex applications we’re interested in use irregular memory access, so we can’t leverage industry trends as well as we would like.”
Scientific institutions like Los Alamos used to be the main customers for computers, so the machines were built with those institutions’ needs at the forefront. Despite the mainstream drifting in a different direction, Los Alamos still needs purpose-built supercomputers, with codesigned custom hardware and software. And the two new systems arriving this year, Crossroads and Venado, are just that.
Crossroads
In the mid-2010s, the Advanced Simulation and Computing Program within the National Nuclear Security Administration launched the Advanced Technology Systems (ATS) computing platform to support stockpile stewardship. The plan was for a new purpose-built supercomputer to be delivered every few years, alternating between Los Alamos and Lawrence Livermore national laboratories. The ATS machines were to be large, but more importantly, they would have memory size and memory accessibility geared toward the very hard computational problems of stockpile stewardship. The first ATS system, named Trinity after the first nuclear test, came to Los Alamos in 2015. The next one, named Sierra, arrived at Livermore in 2018. This year, the third ATS system, Crossroads—also named for a nuclear test—is coming to Los Alamos.
Before the ATS program, simulations had to be scaled down to fit on a single HPC system. The goal for Trinity was for it to be the first machine with large enough memory to run extremely large problems at full scale. Trinity’s massive memory and advanced data management have indeed enabled successful simulations as large as one million gigabytes, but it takes months to complete a single problem, which is frustratingly slow for scientists. The long-term goal for the ATS program is for whole problems to be run at full scale and complete in a matter of days.
Crossroads will get significantly closer to this goal. The focus this time is to be faster at moving data in and out of memory, so Crossroads will have a memory interface that involves more physical connections between the memory and the processor. This high-bandwidth memory configuration is up to eight times faster than normal memory, so problems that take months for Trinity to complete will be solved by Crossroads in just weeks.
“Crossroads is emblematic of the future,” says Irene Qualters, associate laboratory director for simulation and computation. “By codesigning the architecture and introducing high-bandwidth memory, Crossroads will have five-to-ten times better performance than Trinity. And we expect the next one will be better still.”
The high-bandwidth memory of Crossroads addresses memory bandwidth broadly, improving both regular and irregular memory access. But even with high-bandwidth memory, irregular retrieval is still slower than regular retrieval. Therefore, irregular memory access specifically will be one of the focus areas of the next Los Alamos ATS supercomputer. That system, ATS5 (after ATS4 goes to Livermore), is currently unnamed and scheduled for 2027. It will aim to bring the time to solution down even more, perhaps to the point of completing whole problems in just days.
Los Alamos and stockpile stewardship aren’t alone in facing huge memory demands and irregular reference retrieval. Other areas, like seismic science and graph analytics, also require large memory and use highly irregular memory references. So, solving these problems for stockpile stewardship will also benefit other fields important to national security.
Venado
The other supercomputer arriving at Los Alamos this year is Venado, named after a New Mexico mountain peak. This system, more mainstream in structure and not as specialized as Crossroads in terms of mission, was built with a different purpose in mind and will occupy a different, broader niche at the Lab. “This is an institutional investment to support HPC at Los Alamos,” explains Jim Lujan, HPC program director. “It will support crucial science across the whole Lab, like climatology and virology, and it will help us explore artificial-intelligence-based approaches to stockpile stewardship.”
Venado will use a new, very fast processor that combines a central processing unit specialized for scientific computing, called Grace, with a world-class general-purpose graphics processing unit, called Hopper. Dubbed a “superchip,” the Grace-Hopper processor will be especially adept at ML/AI applications, which are being used and developed across the Laboratory for a wide variety of purposes. (The names Grace and Hopper honor the U.S. Navy veteran and computer science pioneer Grace Hopper.) ML/AI applications are also very much at the forefront of mainstream computing, and Venado will be capable of up to ten exaflops—that’s ten billion billion flops—of ML/AI computing power.
“We are already using ML and AI to reduce the time to solution and improve the fidelity of models,” says Qualters. “Venado will really help us expand and develop our proficiency in ML and AI in support of the Lab’s mission.”
But that’s not all. Venado will also facilitate HPC codesign as a research discipline. Part of the machine will be dedicated to studying how best to build the next generation of supercomputers based on the kinds of work they will do. Specifically, Grider and his colleagues will use it to explore what is needed for the next ATS machine, the one that will replace Crossroads, and for other future HPC systems: in effect, using today’s computers to improve tomorrow’s.
“We’re pushing the frontiers of HPC and trying to advance what these tools can do,” says Lujan. “Venado is more than an upgrade, it’s the first of its kind.”
Finally, because it will serve and help connect the broad HPC community across the Laboratory, a certain cross-pollination benefit is expected. Some Los Alamos researchers have work that falls squarely under stockpile stewardship, some have work that falls outside of stockpile stewardship, and some have work that does both. Aimee Hungerford, Laboratory deputy program director for the Advanced Simulation and Computing Program, explains. “For people who straddle that line, both parts can now be in one place. At the meetings in which we coordinate operation of these machines, we’ll have people from both sides, as well as those who have always straddled that line. It will provide visibility and opportunities for collaboration.”
Fitting in
Beyond innovation in the HPC systems themselves, the crosstalk between the computers and the algorithms they run is another area of vibrant growth. The development of new computing methods is moving in parallel with the development of the machines. “It hardly ever happens in computing that you can move to a new system and see huge gains without changing the codes,” says Grider. “But the switch from Trinity to Crossroads will do just that.”
Whereas a laptop has a fan to cool its processor, a supercomputer needs a whole building’s worth of infrastructure to keep it cool (and large crews of workers to install and maintain that infrastructure). A recent move to greywater for cooling is an innovative and eco-friendly feather in the Lab’s HPC cap. Rather than using chilled freshwater to cool Trinity, the Lab has been experimenting with the use of room-temperature greywater. It has worked so well, conserving both electricity and potable water while keeping the processors cool, that Crossroads and Venado are being installed with similar systems.
National security—nuclear and otherwise—relies on the most advanced computers in the world. Crossroads and Venado are steps on a path toward computers that are maximally efficient for the specialized work done at Los Alamos. They will join a family of other specialized computing systems, all working to provide one-of-a-kind solutions to the nation’s hardest one-of-a-kind problems.