Recognizing the challenges of running sophisticated applications, including complex simulations, data analytics, artificial intelligence and heterogeneous workflows, at scale in hybrid computing environments, multiple institutions are forming an open-source community to develop and support a framework for better systems management.
The new Open, Composable, Heterogeneous, Adaptable, Management Infrastructure (OCHAMI) will support new and existing applications and workflows. OCHAMI embraces modern systems-management methods while building in flexibility for sites to develop and deploy their preferred tools to enable plugins, micro-services and multi-tenancy solutions.
"Los Alamos is pleased to be involved in this important community and eager to have other institutions join this inclusive effort," said Gary Grider, High Performance Computing division leader at Los Alamos National Laboratory. "We see the need to move from today's method of installing, configuring and then resisting changes to an ongoing approach of flexible and change-friendly management of heterogeneous computing environments."
OCHAMI is being launched as an open community effort. Founding members include Los Alamos National Laboratory, the National Energy Research Scientific Computing (NERSC) Center at Lawrence Berkeley National Laboratory, the Swiss National Supercomputing Center (CSCS), Hewlett Packard Enterprise (HPE) and the University of Bristol. OCHAMI follows a public governance model similar to that of the Cloud Native Computing Foundation. Its goal is to create a community in which all current and future members collaborate on the framework's architectural direction, individual HPC sites can address their specific challenges and the resulting solutions are shared with the community.
Established scalable computing sites each have their own preferences for systems management tools, most selected in an era when resisting change was the norm. Relatively few sites adopted modern concepts like micro-services, heterogeneous orchestration, self-discovery and healing, adaptability and composability. The OCHAMI community is encouraging participants to embrace a new philosophy for systems management that promotes innovation, increases the ability to recover from mistakes and leverages existing system management priorities as part of a larger-scale managed environment. The effort will leverage today's build-and-configure tools and extend to active management of operation and change.
"The NERSC is excited to partner with other institutions in OCHAMI," said Sudip Dosanjh, director of the NERSC at Lawrence Berkeley National Laboratory. "Software challenges are increasing in HPC as users require more tools and systems become more complex. We believe that open-source efforts like OCHAMI will broadly benefit the community and our users."
"Open standards are imperative to driving innovation in supercomputing," said Trish Damkroger, senior vice president and chief product officer, HPC, AI & Labs at HPE. "The HPC industry needs a modern, cloud-native data center infrastructure management interface with a simplified user experience that supports legacy tools but is flexible enough to adapt to modern and future features. We look forward to collaborating with the members of OCHAMI to achieve our mutual goal of enabling operational excellence to advance the HPC community."
"The Swiss National Supercomputer Center is adopting a cloud-native architecture for its new supercomputing infrastructure dubbed Alps, which is based on the OCHAMI software stack," said Thomas Schulthess, director at the CSCS. "We are enthusiastic about developing this technology in an open community that will enable all HPC sites to adopt such architectures and DevOps practices."
"The University of Bristol's research and teaching excellence mission will directly benefit from OCHAMI, as it draws upon existing open-source efforts for introducing contemporary DevSecOps approaches for supercomputing and thus interoperability to the wider digital research infrastructure," said Sadaf Alam, director of Advanced Computing Strategy at the University of Bristol. "Specifically, OCHAMI will significantly benefit the upcoming Isambard supercomputing digital research infrastructure and the development of associated curriculum, training programs and graduate-level projects."
OCHAMI seeks to offer fresh thinking and new tools to serve today's application demands while enabling the scalable composable infrastructure for the future.
To learn more, contact ochami-info@lanl.gov.
LA-UR-23-32872