High performance computers are now generating data faster than they can store them. “We can produce exabytes of data per second now,” explains Roxana Bujack, a Los Alamos computer scientist. “But we can’t store it all to analyze later, like we used to. There’s just too much.”
Bujack is part of the Exascale Computing Project, the National Strategic Computing Initiative’s push to develop computing systems capable of a billion-billion calculations per second. One of the hurdles the Exascale Computing Project faces is the mismatch between data generation and data storage—known as the input/output bottleneck, or the I/O problem.
One method to tackle the I/O problem is in-situ processing. The idea is for the computer to build the final artifact—whether that is a statistical analysis, a visualization, a trend in a climate simulation, a characterization of particle interactions, or some other product—while the computation is running rather than after. The final artifact is then saved, but the data that went into it are discarded. This approach eliminates the need to store the raw data, which frees up space, and it also saves time, because moving data in and out of storage takes far longer than producing the artifact does.
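The in-situ pattern described above can be sketched in a few lines. This is a minimal illustration, not any real framework's API; the function names and the toy "simulation" are invented for the example:

```python
# Hedged sketch of in-situ processing: render a small artifact at each
# simulation step and keep only the artifact, never the raw state.
# All names here are illustrative, not from ParaView or any real tool.

def simulate_step(state):
    # stand-in for one timestep of a real simulation
    return [x * 0.99 for x in state]

def render_artifact(state):
    # stand-in for an in-situ product, e.g. a summary statistic or image
    return {"min": min(state), "max": max(state), "mean": sum(state) / len(state)}

state = [float(i) for i in range(1000)]
artifacts = []
for step in range(10):
    state = simulate_step(state)
    artifacts.append(render_artifact(state))  # keep the small product
    # the raw `state` from earlier steps is never written to disk

print(len(artifacts))  # 10 small artifacts instead of 10 raw snapshots
```

The key point is in the loop body: only the compact artifact is retained, so storage grows with the number of products, not with the size of the simulation state.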
An image of a simulation takes up about one megabyte (10⁶ bytes), so an exascale computer could store a billion images every second and still take less storage space than the raw data would. The images can still be retrieved as needed. It’s as if the data were there, but they are not.
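The arithmetic behind that claim is easy to check. The figures below come straight from the text (one megabyte per image, exabyte-scale raw output); the exact raw rate is an assumption for illustration:

```python
# Back-of-envelope check of the storage comparison in the text.
# The raw-data rate is an illustrative assumption (one exabyte per second).
IMAGE_BYTES = 10**6             # about one megabyte per rendered image
RAW_BYTES_PER_SECOND = 10**18   # assumed: one exabyte of raw data per second

images_per_second = 10**9       # "a billion images every second"
image_bytes_per_second = images_per_second * IMAGE_BYTES

print(image_bytes_per_second)   # 10**15 bytes, i.e. one petabyte per second
print(RAW_BYTES_PER_SECOND // image_bytes_per_second)  # raw data is 1000x larger
```

Under these assumptions, even a billion images per second occupies a thousandth of the space the raw data would.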
As a demonstration, Bujack used the Summit supercomputer at Oak Ridge National Laboratory—the fastest in the world at the time—to create this visualization of a simulated contained explosion. She produced the images with ParaView, visualization software developed at Los Alamos for large scientific datasets. As a simulation runs, ParaView uses in-situ analysis to take and store pictures from all perspectives and all times. Another software tool developed at Los Alamos, called Cinema, then allows Bujack to interact with the images—zooming, shifting, rotating, browsing through time—by loading them from the collection of stored images, instead of generating a new image from stored data.
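The Cinema-style interaction described above amounts to a lookup: a requested view is snapped to the nearest pre-rendered image rather than rendered fresh from raw data. The sketch below illustrates that idea with an invented key scheme (timestep, azimuth, elevation); the file names and parameter grid are hypothetical, not Cinema's actual database format:

```python
# Hedged sketch of the Cinema-style idea: interaction becomes a lookup
# into a store of pre-rendered images keyed by camera angle and time.
# The key scheme and file names are invented for illustration.

# pretend image store: (timestep, azimuth_deg, elevation_deg) -> image file
image_db = {
    (t, phi, theta): f"img_t{t}_p{phi}_e{theta}.png"
    for t in range(0, 100, 10)
    for phi in range(0, 360, 45)
    for theta in (-45, 0, 45)
}

def nearest(value, choices):
    # snap a requested view parameter to the closest pre-rendered one
    return min(choices, key=lambda c: abs(c - value))

def fetch_view(t, phi, theta):
    # no rendering happens here: just pick the closest stored image
    key = (nearest(t, range(0, 100, 10)),
           nearest(phi % 360, range(0, 360, 45)),
           nearest(theta, (-45, 0, 45)))
    return image_db[key]

print(fetch_view(42, 100, 30))  # img_t40_p90_e45.png
```

Because every response is a dictionary lookup over already-rendered images, browsing stays fast no matter how expensive the original simulation was.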
“In-situ processing is a huge deal for exascale,” concludes Bujack. “No matter how detailed the simulation gets, the storage needs for images will remain constant. This is true scalability and it’s a big win for us.”