High performance computers are now generating data faster than they can store them. “We can produce exabytes of data per second now,” explains Roxana Bujack, a Los Alamos computer scientist. “But we can’t store it all to analyze later, like we used to. There’s just too much.”
Bujack is part of the Exascale Computing Project, the National Strategic Computing Initiative’s push to develop computing systems capable of a billion-billion calculations per second. One of the hurdles the Exascale Computing Project faces is the mismatch between data generation and data storage—known as the input/output bottleneck, or the I/O problem.
One method to tackle the I/O problem is in-situ processing. The idea is for the computer to build the final artifact—whether that is a statistical analysis, a visualization, a trend in a climate simulation, a characterization of particle interactions, or some other product—while the computation is running rather than after. The final artifact is then saved, but the data that went into it are discarded. This approach eliminates the need to store the raw data, which frees up space, and it also saves time, because moving data in and out of storage takes far longer than producing the artifact does.
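The in-situ pattern described above can be sketched in a few lines. This is a minimal illustration, not any real framework's API; the function names and the toy "simulation" are invented for the example:

```python
# Hedged sketch of in-situ processing: render a small artifact at each
# simulation step and keep only the artifact, never the raw state.
# All names here are illustrative, not from ParaView or any real tool.

def simulate_step(state):
    # stand-in for one timestep of a real simulation
    return [x * 0.99 for x in state]

def render_artifact(state):
    # stand-in for an in-situ product, e.g. a summary statistic or image
    return {"min": min(state), "max": max(state), "mean": sum(state) / len(state)}

state = [float(i) for i in range(1000)]
artifacts = []
for step in range(10):
    state = simulate_step(state)
    artifacts.append(render_artifact(state))  # keep the small product
    # the raw `state` from earlier steps is never written to disk

print(len(artifacts))  # 10 small artifacts instead of 10 raw snapshots
```

The key point is in the loop body: only the compact artifact is retained, so storage grows with the number of products, not with the size of the simulation state.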
An image of a simulation takes up about one megabyte (10⁶ bytes), so an exascale computer could store a billion images every second and still take less storage space than the raw data would. The images can still be retrieved as needed. It’s as if the data were there, but they are not.
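The arithmetic behind that claim is easy to check. The figures below come straight from the text (one megabyte per image, exabyte-scale raw output); the exact raw rate is an assumption for illustration:

```python
# Back-of-envelope check of the storage comparison in the text.
# The raw-data rate is an illustrative assumption (one exabyte per second).
IMAGE_BYTES = 10**6             # about one megabyte per rendered image
RAW_BYTES_PER_SECOND = 10**18   # assumed: one exabyte of raw data per second

images_per_second = 10**9       # "a billion images every second"
image_bytes_per_second = images_per_second * IMAGE_BYTES

print(image_bytes_per_second)   # 10**15 bytes, i.e. one petabyte per second
print(RAW_BYTES_PER_SECOND // image_bytes_per_second)  # raw data is 1000x larger
```

Under these assumptions, even a billion images per second occupies a thousandth of the space the raw data would.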
As a demonstration, Bujack used the Summit supercomputer at Oak Ridge National Laboratory—the fastest in the world at the time—to create this visualization of a simulated contained explosion. She produced the images with ParaView, visualization software developed at Los Alamos for large scientific datasets. As a simulation runs, ParaView uses in-situ analysis to take and store pictures from all perspectives and all times. Another software tool developed at Los Alamos, called Cinema, then allows Bujack to interact with the images—zooming, shifting, rotating, browsing through time—by loading them from the collection of stored images, instead of generating a new image from stored data.
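The Cinema-style interaction described above amounts to a lookup: a requested view is snapped to the nearest pre-rendered image rather than rendered fresh from raw data. The sketch below illustrates that idea with an invented key scheme (timestep, azimuth, elevation); the file names and parameter grid are hypothetical, not Cinema's actual database format:

```python
# Hedged sketch of the Cinema-style idea: interaction becomes a lookup
# into a store of pre-rendered images keyed by camera angle and time.
# The key scheme and file names are invented for illustration.

# pretend image store: (timestep, azimuth_deg, elevation_deg) -> image file
image_db = {
    (t, phi, theta): f"img_t{t}_p{phi}_e{theta}.png"
    for t in range(0, 100, 10)
    for phi in range(0, 360, 45)
    for theta in (-45, 0, 45)
}

def nearest(value, choices):
    # snap a requested view parameter to the closest pre-rendered one
    return min(choices, key=lambda c: abs(c - value))

def fetch_view(t, phi, theta):
    # no rendering happens here: just pick the closest stored image
    key = (nearest(t, range(0, 100, 10)),
           nearest(phi % 360, range(0, 360, 45)),
           nearest(theta, (-45, 0, 45)))
    return image_db[key]

print(fetch_view(42, 100, 30))  # img_t40_p90_e45.png
```

Because every response is a dictionary lookup over already-rendered images, browsing stays fast no matter how expensive the original simulation was.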
“In-situ processing is a huge deal for exascale,” concludes Bujack. “No matter how detailed the simulation gets, the storage needs for images will remain constant. This is true scalability and it’s a big win for us.”