Powerful storage for big data

A radically new approach to storage acceleration aids data manipulation for research and discovery.

By Brian Keenan | July 25, 2022

Jason Lee, a research scientist in the Lab’s High Performance Computing division, works on ABOF hardware. Credit: Los Alamos National Laboratory

Data is a vital part of solving complicated scientific questions in fields such as climate, genomics, and nuclear physics. However, an abundance of data is often only as good as the ability to efficiently store, access, and manipulate that data. To facilitate discovery with big data problems, researchers at Los Alamos National Laboratory, in collaboration with industry partners, have developed the Accelerated Box of Flash, or ABOF, an open storage system acceleration architecture for scientific data analysis, which can deliver 10 to 30 times the performance of current systems.

“Scientific data and the data-driven scientific discovery techniques used to analyze that data are both growing rapidly,” says Dominic Manno, a researcher with the Lab’s High Performance Computing division. “Performing complex analysis to enable scientific discovery requires huge advances in the performance and efficiency of scientific data storage systems.”

Scalable computing systems use data processing units (DPUs) to speed up intensive functions between central processing units (CPUs) and storage devices. However, scientists have struggled to use DPUs within the production-quality storage systems that support complex high-performance computing simulation and data analysis. ABOF solves that problem with a hardware and software storage system co-design that is programmable and attached to the network. This makes it simpler to use DPUs to move intensive operations away from the storage server CPUs. No major storage system software modifications or application changes are required. The result is faster and more efficient data manipulation that reduces time, cost, and energy use.

An example of a project that might benefit from storage system acceleration is the Energy Exascale Earth System Model (E3SM), an Earth system modeling, simulation, and prediction project. Currently, the model runs and is then analyzed, a serial process. ABOF, however, could provide the architecture and the speeds to conduct analysis and modeling in parallel.

“For this kind of project, the value in a technology like ABOF would be in model analysis,” says Luke Van Roekel, a scientist with the Lab’s Theoretical division and co-lead of E3SM’s Water Cycle science campaign. “The data could be shipped off to ABOF, which can do analysis while the model moves to its next tasks on the main nodes. Analysis on E3SM’s large data volumes is where ABOF would potentially really shine.”
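
As a rough illustration of the difference between running analysis after the model and running it alongside the model, the Python sketch below contrasts the two workflows. Everything in it is hypothetical: `simulate_step` and `analyze` are placeholder functions, and a local thread pool merely stands in for an accelerated storage system. It does not use any ABOF or E3SM interface and is not how either system is implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate_step(step: int) -> list[int]:
    """Hypothetical stand-in for one model time step; returns its output data."""
    return [step * 10 + i for i in range(5)]


def analyze(data: list[int]) -> float:
    """Hypothetical stand-in for analysis handed off to the storage side."""
    return sum(data) / len(data)


def run_serial(steps: int) -> list[float]:
    """Run every model step first, then analyze all outputs: a serial process."""
    outputs = [simulate_step(s) for s in range(steps)]
    return [analyze(o) for o in outputs]


def run_overlapped(steps: int) -> list[float]:
    """Hand each step's output to a worker and move straight on to the next step."""
    # Assumption: a single worker thread plays the role of the offload target.
    with ThreadPoolExecutor(max_workers=1) as offload:
        futures = []
        for s in range(steps):
            data = simulate_step(s)
            # Analysis of this step is queued while the loop continues.
            futures.append(offload.submit(analyze, data))
        # Collect results in step order once the model loop has finished.
        return [f.result() for f in futures]
```

Both functions return the same analysis results. The only difference is that the overlapped version does not wait for the analysis of one step before starting the next, which is the pattern Van Roekel describes.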