Automating at exascale

A new parallel programming system boosts supercomputing performance and efficiency.

By Katharine Coggeshall | December 1, 2020

Computer scientists Pat McCormick (left) and Galen Shipman inspect a task graph for a supercomputing application. Task graphs indicate which tasks are dependent on others and which tasks can run simultaneously. Los Alamos National Laboratory

Supercomputing is on the verge of a breakthrough—computing at the exascale, where a billion billion calculations per second will be the norm. That’s 10 times faster than current supercomputing speeds. Before that milestone can be met, supercomputing applications—essentially souped-up versions of phone apps—need to become much more efficient.

Whereas your phone apps keep you posting and swiping, supercomputing applications help simulate molecular interactions, fluid dynamics, and other physical phenomena. There’s a profound difference in the level of complexity, but surprisingly little difference in the applications’ inner workings. All applications, whether for a phone or a supercomputer, rely on coded commands (or tasks) that instruct the computer how to run the application. Applications can have millions of tasks, and those tasks can be executed on tens of thousands of different processors (the physical locations for data crunching). Designating the “what” and “where” for all of those tasks is currently hampering computing speed and efficiency.
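One way to picture the “what” and “where” is a task graph like the one McCormick and Shipman are examining in the photo above. The short Python sketch below is purely illustrative rather than anything drawn from Legion itself, and the task names are invented; it shows how tasks that declare their dependencies can be grouped into waves, where every task in a wave is free to run at the same time.

    # Illustrative sketch only (not Legion code): each task lists the tasks
    # whose results it needs before it can start.
    dependencies = {
        "load_mesh":      [],
        "load_materials": [],
        "assemble":       ["load_mesh", "load_materials"],
        "solve":          ["assemble"],
        "visualize":      ["solve"],
        "checkpoint":     ["solve"],
    }

    def parallel_waves(deps):
        """Group tasks into waves; every task in a wave can run simultaneously."""
        done, waves = set(), []
        while len(done) < len(deps):
            # A task is ready once all of its dependencies have finished.
            ready = [t for t, needs in deps.items()
                     if t not in done and all(n in done for n in needs)]
            done.update(ready)
            waves.append(ready)
        return waves

    for i, wave in enumerate(parallel_waves(dependencies)):
        print(f"wave {i}: {wave}")
    # wave 0: ['load_mesh', 'load_materials']  (independent, run in parallel)
    # wave 1: ['assemble']
    # wave 2: ['solve']
    # wave 3: ['visualize', 'checkpoint']      (independent, run in parallel)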

Right now, these applications depend on their human developers for scheduling tasks and moving the corresponding data. This hands-on coding isn’t realistic at the exascale—where the aim is to more accurately simulate real-world mechanisms and behaviors, which are inherently complex. In the most complex cases, there are too many choices to be evaluated and implemented by hand. Therefore, the ability to harness the full power of modern supercomputers depends on replacing this hands-on coding with something more efficient, something automated.

“We wanted to improve the process of finding ways to keep a computer system as busy as possible and assist in scheduling application tasks and data movement,” says Pat McCormick, a computer scientist at Los Alamos National Laboratory.

So, Los Alamos scientists, along with colleagues at five other institutions (NVIDIA, the University of California–Davis, Stanford University, SLAC National Accelerator Laboratory, and Sandia National Laboratories), created a parallel programming system to do just that. The system, called Legion, sifts through an application to determine which tasks can run in parallel, or simultaneously, to save time and boost computing efficiency. Legion can also determine which processor should run each task in order to make the best use of computing resources.
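The second half of that job, choosing a processor for every task, can be pictured with a simple greedy rule: always hand the next, biggest piece of work to the processor with the least work so far. The Python sketch below is again only an illustration, with made-up task names and costs; Legion’s real decisions also account for data movement, which this toy example ignores.

    import heapq

    # Illustrative sketch only (not Legion's mapper): spread independent tasks
    # across processors so no processor is overloaded while others sit idle.
    task_costs = {"chemistry": 8, "hydro": 6, "radiation": 5, "io": 2, "diagnostics": 1}
    num_processors = 3

    # Min-heap of (work assigned so far, processor id); the biggest remaining
    # task always goes to the least-loaded processor.
    load = [(0, p) for p in range(num_processors)]
    heapq.heapify(load)
    assignment = {}

    for task, cost in sorted(task_costs.items(), key=lambda kv: -kv[1]):
        busy, proc = heapq.heappop(load)
        assignment[task] = proc
        heapq.heappush(load, (busy + cost, proc))

    print(assignment)
    # {'chemistry': 0, 'hydro': 1, 'radiation': 2, 'io': 2, 'diagnostics': 1}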

This is harder than it sounds because not all applications and processors speak the same language. Application developers have had to become experts in switching syntax, jumping between C++, Python, and Fortran codes as needed. This was an obvious area for improvement, so the Legion creators came up with a universal language instead. They named it Regent.

Now, application developers can write a single set of instructions in Regent, and Legion will automatically decide how best to run the application and move the data. Legion has been shown to improve efficiency by as much as 10-fold.
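To make the “write it once” idea concrete, the hypothetical Python sketch below (not Regent syntax) revisits the earlier task-graph example: the developer writes each task a single time and states what it depends on, and a small stand-in “runtime” works out the order, running independent tasks at the same time. In Regent, that kind of dependency information is expressed through the data each task reads and writes, and Legion takes it from there.

    from concurrent.futures import ThreadPoolExecutor

    # Illustrative sketch only: each task is written once, with its dependencies
    # declared alongside it; a stand-in "runtime" runs ready tasks concurrently.
    def load_mesh():      print("loading mesh")
    def load_materials(): print("loading materials")
    def assemble():       print("assembling the system")
    def solve():          print("solving")

    tasks = {
        load_mesh:      [],
        load_materials: [],
        assemble:       [load_mesh, load_materials],
        solve:          [assemble],
    }

    done = set()
    with ThreadPoolExecutor() as pool:
        while len(done) < len(tasks):
            ready = [t for t, needs in tasks.items()
                     if t not in done and all(n in done for n in needs)]
            # Tasks with no unmet dependencies overlap; the rest wait their turn.
            for future in [pool.submit(t) for t in ready]:
                future.result()
            done.update(ready)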

“We’ve demonstrated Legion on top supercomputers, such as Trinity at Los Alamos and Sierra at Lawrence Livermore National Laboratory, for physics simulations that support the stockpile stewardship program,” says Galen Shipman, a Legion co-developer at Los Alamos, “and we will demonstrate it on Crossroads, the new supercomputer coming to Los Alamos, when it’s up and running.”

Legion is already a foundational part of the Department of Energy’s Exascale Computing Project, and researchers from academia, other national laboratories, and industry have started using Legion to boost the performance and scalability of their supercomputing applications. As open-source software, Legion is available to anyone, allowing wide swaths of scientists to study systems that would otherwise be impractical or impossible to investigate in the real world.