If you’ve seen the 2016 Academy Award–nominated film Hidden Figures or read the book of the same name, you know that human computers were employed by NASA in the 1960s. These computers, usually African American women, performed mathematical calculations to help the United States stay competitive in the Space Race.
But nearly two decades before the Space Race, human computers were helping to win another race—the race to create the world’s first atomic bomb. This race took place in various capacities all across the United States, but much of the action occurred at a mesa-top laboratory in Los Alamos, New Mexico. This laboratory, known as Project Y, was established in 1943 with the singular goal of creating an atomic bomb to help end World War II.
At the time, scientists had a theoretical understanding of such a bomb, but to make it a reality, they needed to better understand how its components might work and how certain materials might behave. Uranium and plutonium, two actinide metals that are key to a nuclear explosion, weren’t well-understood at the time. Having been discovered only in 1940, plutonium was particularly mysterious.
Scientists could hypothesize what these materials and components might do at extreme scales and speeds, and these hypotheses could be written out as elaborate mathematical calculations—so elaborate, in fact, that the math took months to do by hand.
The wives of scientists, members of the Women’s Army Corps, and local civilians were recruited to perform these calculations on mechanical calculators from two California-based companies, Marchant and Friden, and the New Jersey–based Monroe Calculating Machine Company. These calculators were similar in shape, size, and operation to old typewriters; the women could add, subtract, multiply, and divide by punching numbers on a keypad. These women—along with some of the theoretical physicists who used the machines—were the first “computers” at Project Y.
Physicist Dana Mitchell was in charge of equipment procurement at Project Y. He’d previously worked at Columbia University, and from that experience, he was aware that New York–based International Business Machines Corporation (IBM) made accounting machines. These machines used paper cards in which holes had been punched to represent data. The punched cards were then fed into the accounting machines (and later, into early electronic computers), which could perform calculations faster than human computers.
Mitchell persuaded Project Y’s governing board that the IBM machines would be a good addition to the computing group. Upon arrival, the punched-card machines were modified to perform implosion simulations. (Implosion—when high explosives compress the plutonium core of a bomb to create a nuclear explosion—was scientists’ hope for an atomic weapon.)
The first implosion calculation took three months to complete, but theoretical physicist Richard Feynman’s refinement of the process reduced the calculating time to less than three weeks.
On July 16, 1945, the Trinity test—the detonation of the Gadget, the world’s first implosion device—corroborated the results of the massive calculation effort and demonstrated the utility of the computational models the effort employed. Just 24 days later, a second implosion device—the Fat Man bomb—was dropped over Nagasaki, Japan, contributing to the end of World War II.
To continue its weapons work after the war, Los Alamos Scientific Laboratory (previously Project Y) needed more-powerful computational tools, particularly as scientists began to develop hydrogen (thermonuclear) weapons.
As the Laboratory began to search for these tools, John von Neumann, a Hungarian-born mathematician who consulted at Los Alamos, recalled the ENIAC, a machine built between 1943 and 1945 at the Moore School of Electrical Engineering at the University of Pennsylvania.
“The Eniac [sic], known more formally as ‘the electronic numerical integrator and computer,’ has not a single moving part,” according to a 1946 New York Times article. “Nothing inside its 18,000 vacuum tubes and several miles of wiring moves except the tiniest elements of matter—electrons. There are, however, mechanical devices associated with it which translate or ‘interpret’ the mathematical language of man to terms understood by the Eniac, and vice versa. … [I]ts inventors say it computes a mathematical problem 1,000 times faster than it has ever been done before.”
The ENIAC occupied more than 1,000 square feet and weighed 30 tons. Even though programming the machine was tedious—wires and switches had to be rearranged for each calculation—the machine is believed to have done more calculations in 10 years than all of humanity had done until that time, according to the Computer History Museum in Mountain View, California.
In early 1945, von Neumann brought word of the ENIAC to Los Alamos, and that summer, he began working alongside physicists Stanislaw Ulam and Nicholas Metropolis to formulate a calculation for the ENIAC to test the viability of a thermonuclear bomb. The results of the test calculation, which was programmed at Los Alamos by Stanley Frankel but performed in Pennsylvania, were largely inconclusive but served as the first practical use of a general-purpose electronic digital computer.
Monte Carlo method
In 1946, Ulam was recovering from an illness and playing cards when he wondered: “What are the chances that a Canfield solitaire [which is very difficult to win] laid out with 52 cards will come out successfully?”
As Ulam tried to calculate the odds, he wondered if a more practical approach might be to lay out the cards 100 times and simply observe and count the number of successful plays. “This was already possible to envisage with the beginning of the new era of fast computers, and I immediately thought of problems of neutron diffusion and other questions of mathematical physics, and more generally how to change processes described by certain differential equations into an equivalent form interpretable as a succession of random operations.”
This idea—essentially using randomness to solve problems that might be deterministic (not random) in principle— became known as the Monte Carlo method. In 1946, Ulam described the idea to von Neumann, and they began to plan actual calculations. The ENIAC ran the first Monte Carlo calculation in 1947. By the 1950s, Monte Carlo methods were being applied to the hydrogen bomb effort.
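In modern terms, Ulam’s insight is easy to sketch: estimate a quantity by running many random trials and counting outcomes. The classic textbook illustration below estimates π by random sampling; the function name, seed, and trial count are illustrative choices, not details from the original Los Alamos calculations.

```python
import random

def estimate_pi(trials: int) -> float:
    """Estimate pi by sampling random points in the unit square and
    counting how many land inside the quarter circle of radius 1."""
    hits = 0
    for _ in range(trials):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            hits += 1
    # The hit fraction approximates (pi/4), the quarter circle's area.
    return 4.0 * hits / trials

random.seed(42)  # fixed seed so the run is repeatable
print(estimate_pi(100_000))  # converges toward 3.14159... as trials grow
```

The same pattern—replace an intractable exact calculation with repeated random sampling—is what made the method so well suited to neutron-diffusion problems on early computers.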
“Statistical sampling had been known for some time, but without computers the process of making the calculations was so laborious that the method was seldom used unless the need was compelling,” explained physicist Herbert Anderson in the fall 1986 issue of Los Alamos Science. “The computer made the approach extremely useful for many physics problems.” In fact, the Lab maintains an entire group of Monte Carlo scientists today.
With each calculation, the ENIAC demonstrated the feasibility of quickly and accurately translating real-world phenomena into computable problems. But the ENIAC had three major problems: it had to be rewired for each new problem, its memory was limited, and it wasn’t located at Los Alamos.
So, scientists were thrilled when the MANIAC (Mathematical Analyzer, Numerical Integrator, and Computer) was built at Los Alamos from 1949 to 1952. The MANIAC was an early example of what came to be known as “von Neumann architecture”—instead of the computer being programmed using wires and switches, it was programmed using the same media that contained the input data, such as punched cards or paper tape (a long strip of paper in which holes are punched to represent data). This process greatly reduced both the amount of time required to program the computer and the amount of human intervention needed during a computation.
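The stored-program idea can be illustrated with a toy machine in which instructions and data occupy the same memory—the essence of the von Neumann design. The opcodes and memory layout below are invented for illustration only; real machines of the era read programs from punched cards or paper tape, not Python lists.

```python
# A toy stored-program machine: instructions and data share one memory.
memory = [
    ("LOAD", 8),    # acc = memory[8]
    ("ADD", 9),     # acc = acc + memory[9]
    ("STORE", 10),  # memory[10] = acc
    ("HALT", None),
    None, None, None, None,  # unused cells
    5,              # data at address 8
    7,              # data at address 9
    0,              # address 10 will hold the result
]

acc, pc = 0, 0  # accumulator and program counter
while True:
    op, arg = memory[pc]  # fetch the next instruction from memory
    pc += 1
    if op == "LOAD":
        acc = memory[arg]
    elif op == "ADD":
        acc += memory[arg]
    elif op == "STORE":
        memory[arg] = acc
    elif op == "HALT":
        break

print(memory[10])  # prints 12 (5 + 7)
```

Because the program lives in memory like any other data, changing the computation means changing memory contents—not rewiring the machine.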
The timing could not have been better. On August 29, 1949, the Soviet Union detonated its first atomic bomb. Many believed a Soviet thermonuclear weapon was imminent and that the United States should be ready with one of its own. “It is part of my responsibility as commander in chief of the Armed Forces to see to it that our country is able to defend itself against any possible aggressor,” President Harry Truman said in January 1950. “Accordingly, I have directed the Atomic Energy Commission [which oversaw Los Alamos Scientific Laboratory] to continue its work on all forms of atomic weapons, including the so-called hydrogen or superbomb.”
Computers such as the MANIAC were deemed essential tools in the development of these bombs. But although the MANIAC began to displace the IBM accounting machines and human computers at Los Alamos, it did not replace either one of them immediately. Human computing was a well-established form of computation, while electronic digital computing was still enormously expensive and technically challenging.
But using this hybrid system of electronic and human computers, Los Alamos inched its way closer to developing a thermonuclear weapon, and on May 8, 1951, Laboratory scientists demonstrated thermonuclear fusion with the George test at Eniwetok Atoll in the Pacific Ocean. With a 225-kiloton yield (the equivalent of 225,000 tons of TNT), George was the largest nuclear explosion up to that time.
The IBM era
In the early 1950s, IBM developed its first commercial computer, the 701 Electronic Data Processing Machine. Also called the Defense Calculator, the 701 was a powerful digital computer based on von Neumann’s architecture. In 1953, Los Alamos leased the first commercially available 701, beginning a long relationship between IBM and Los Alamos and a tradition of collaboration between the Lab and commercial vendors that has continued, in various forms, to the present.
The Los Alamos–built MANIAC edged out the 701 in performance, but IBM’s successor to the 701, the 704, proved to be more capable and reliable than the MANIAC’s successor, the MANIAC II. While the MANIAC II would remain in service for 20 years, it was the last of the Los Alamos–built computer systems. The cost of developing and producing digital computers meant that by the mid-1950s, purchasing or leasing a computer offered greater value than building one.
Despite opting to exclusively purchase computers following the MANIAC II, Los Alamos did not remain outside the computer development process and worked with a multitude of vendors and government and academic institutions to influence, fund, and promote developments in scientific computing.
The first major collaboration involved the IBM 7030, also known as Stretch because it was intended to “stretch” IBM’s computing technology—IBM pledged the computer would have 100 times the power of the 704. The new machine, however, fell short of this lofty projection, offering closer to 35 times the power of the 704. Even so, Stretch—arguably the world’s first supercomputer—was by far the most powerful computer in the world when it was delivered to Los Alamos in 1961.
Through the mid-1960s, Los Alamos remained an “IBM shop,” making extensive use of high-end IBM computers and equipment for both weapons and administrative work. During this period, “batch processing” dominated computer use at Los Alamos. Computer users typically brought their pre-coded data and programs, in the form of punched cards or paper tape, to professional operators, who scheduled time for the user’s code to run on the computer. Shorter jobs ran during the day, while longer jobs ran at night and on the weekends.
Control Data Corporation era
The Partial Test-Ban Treaty of 1963 prohibited nuclear tests in the atmosphere and under water, which forced nuclear testing to be conducted underground. The added cost of underground testing increased the reliance on computer simulation. By first using a computer to simulate a nuclear test (largely using data from previous nuclear tests), scientists hoped to avoid expensive test failures.
As reliance on computers increased, the Lab hired more and more people to operate the machines around the clock. But it soon became obvious that what Los Alamos needed was a better computer, one that was at least as powerful as Stretch, with plenty of memory and storage to cope with the demands that increasing numbers of users and the growing complexity of weapons codes were placing on the computers.
After reviewing proposals from several manufacturers, a selection committee narrowly voted to remain with IBM. IBM’s proposal, a series of larger computers that would be traded repeatedly as newer models became available, did not offer a significant jump in computing power at Los Alamos but promised to keep step with the projected computing demands.
The other serious contender in the search was the relatively new Control Data Corporation (CDC) with its CDC 6600 supercomputer. The CDC computer, the most powerful in the world in 1965, offered four to six times the performance of Stretch. But it was rejected because it was extraordinarily difficult to program and could not recover automatically from a programming error. In addition, CDC did not, at the time, offer the support, software, or range of disk storage that IBM provided. IBM was the safer—and cheaper—option.
In late 1965, however, IBM backpedaled on its agreement, and Los Alamos signed a contract with CDC, whose 6000-series of computers had by that time been improved in performance and reliability. The first of several 6600s arrived at the Lab in August 1966. What began as the second choice in a botched agreement with IBM developed into a new era of computing at Los Alamos, and an even longer relationship with Seymour Cray, the CDC designer behind the 6600.
The CDC machines offered considerable leaps of performance with each generation. The successor to the 6600, the CDC 7600, went on the market in 1969 and offered approximately 10 times the performance of the 6600. Los Alamos ultimately purchased four 7600s, which formed the bulk of the Lab’s production capacity for nearly a decade.
The 7600s allowed for time-sharing—multiple users accessing a single computer’s resources simultaneously. Although some of the machine’s processing power went to swapping among simultaneous users rather than running their code, time-sharing meant that users no longer had to wait for machine access.
In 1972, Seymour Cray left Control Data to form his own company, Cray Research, Inc. Cray’s first computer, the Cray-1, was completed in 1976 and went to Los Alamos for a six-month evaluation at no cost to the Lab. Despite the machine’s extraordinary speed, it lacked error-correcting memory—the ability to detect changes in data due to mechanical or environmental problems and then correct the data back to its original state. Los Alamos returned it at the end of the evaluation period.
The evaluation at Los Alamos provided Cray Research more than technical assistance. The fact that Los Alamos, with its long history of computing expertise, was willing, even eager, to evaluate and use the company’s first computer was important for the image of the new company.
Cray Research modified subsequent Cray-1 computers to incorporate error-correcting memory, and five of them went to Los Alamos. The Cray-1 was the first commercially successful vector machine, meaning that the computer’s processor (essentially its brain) could execute a single instruction on multiple pieces of data simultaneously. Vectoring stood in contrast with scalar processing, in which a computer could execute only a single instruction on a single piece of data at a time.
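The vector-versus-scalar distinction can be sketched in Python, with NumPy’s whole-array operations standing in for the hardware vector units a Cray-1 provided. The array contents and sizes are arbitrary, chosen only for illustration.

```python
import numpy as np

n = 100_000
a = np.arange(n, dtype=np.float64)
b = np.arange(n, dtype=np.float64)

# Scalar style: one instruction applied to one pair of operands at a time.
c_scalar = np.empty_like(a)
for i in range(n):
    c_scalar[i] = a[i] + b[i]

# Vector style: a single add expressed over entire arrays at once, which
# NumPy compiles down to tight loops that modern CPUs run on SIMD units.
c_vector = a + b

# Both produce identical results; the vector form is far faster.
assert np.array_equal(c_scalar, c_vector)
```

On a Cray-1 the same economy applied at the instruction level: one vector instruction replaced a loop of scalar instructions, keeping the arithmetic pipelines continuously fed.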
The enormously successful Cray-1 was followed by a series of machines, including multiprocessor computers. The 1982 Cray X-MP possessed two central processors, while its successor, the Y-MP, featured up to eight. These machines formed the bulk of the Lab’s computing capacity into the early 1990s.
As the 1980s drew to a close, Los Alamos continued to drive the evolution of computing. The Lab worked with Thinking Machines Corporation to develop the massively parallel Connection Machine series, which focused on quantity over quality: using thousands of microprocessors (not more powerful ones) to perform numerous calculations simultaneously. This took Lab computing into the gigaflop zone (1 billion floating-point operations, or calculations, per second) by 1990.
But then the Cold War came to an abrupt end in 1991, and nuclear weapons testing stopped in 1992. A science-based stockpile stewardship program (SSP, see p. 54) was implemented to ensure the continued safety, security, and effectiveness of the nation’s nuclear deterrent. SSP would use data from past nuclear tests and data from current small-scale (nonnuclear) experiments to make three-dimensional (3D) computer simulations that would gauge the health of America’s nuclear weapons.
Now a crucial part of how Los Alamos fulfills its mission, computer simulations allow Laboratory scientists to virtually detonate weapons systems and monitor what is happening inside the nation’s aging deterrent—most nuclear weapons in the U.S. stockpile were produced during the 1970s and 1980s and were not designed or intended to last indefinitely.
Baseline simulations of historical nuclear tests are used to compare against real historical test data to verify the correctness of the simulation tools. Then these simulation tools, including applications, codes, and more, are used to explore weapons virtually in circumstances different from the original test to determine unseen aspects of the weapon, such as the effects of aging.
“The codes model every piece of physics involved in a weapon—these codes are very complicated and unique compared with most science codes,” says Bill Archer, Lab scientist and former program director for Advanced Simulation and Computing.
The extensive use of simulations made it necessary to rapidly develop supercomputers powerful enough to replace real-world nuclear testing with virtual testing. Increasing computer speed was important, but having 3D simulations with high resolution and accuracy was even more important. To achieve high-fidelity 3D simulations, computing would need to make incredible technological leaps: gigaflops to teraflops (trillions of calculations per second, which happened in 1999), teraflops to petaflops (quadrillions of calculations per second, which happened in 2008), and petaflops to exaflops (quintillions of calculations per second, coming soon).
Accelerated Strategic Computing Initiative
As Department of Energy (DOE) laboratories— including Los Alamos, Lawrence Livermore, and Sandia—pivoted to stockpile stewardship, they had to rely more heavily on computer-based simulations to verify the health of America’s nuclear weapons. The DOE’s Accelerated Strategic Computing Initiative (ASCI, now ASC) began in 1995 as a joint effort among the laboratories to provide the computational and simulation capabilities needed for stockpile stewardship.
ASCI was intended to promote industry collaboration and meet progressive performance goals. The initiative emphasized hardware and software solutions that could leverage existing commodity products, such as cluster computers—collections of small computers linked by a network to operate as a single, large computer.
In 1998, Los Alamos collaborated with Silicon Graphics to install the ASCI Blue Mountain cluster—the first large-scale supercomputer to emerge from this effort. “A considerable challenge in the deployment of the ASCI Blue Mountain system is connecting the 48 individual machines into an integrated parallel compute engine,” states a Laboratory leaflet from 1998. But once installed, “in its full configuration, the Blue Mountain system is one of the most powerful computers installed on-site in the world.”
By the early 2000s, the cluster was the dominant type of supercomputer. But Los Alamos computing planners realized that ever-larger clusters would eventually become unsustainable, needing too much electricity and too much cooling to be affordable and reach exaflop-level performance. In concept, “hybrid” clusters—using more than one type of processing chip—offered the performance and efficiency the supercomputing field needed, but only Los Alamos and its co-developer, IBM, were willing to put the radical new hybrid approach to the test.
In 2008, Los Alamos and IBM co-designed the Roadrunner supercomputer and proved that enhancing performance did not mean sacrificing energy efficiency. Based on what was considered a radical approach at the time, Roadrunner was the first large hybrid supercomputer, meaning it contained multiple kinds of processing chips rather than just one type of microprocessor. With careful programming, Roadrunner used the chip best suited for a task: either its conventional AMD microprocessors—like those in a desktop computer—or its energy-efficient Cell accelerator chips from IBM—similar to the Cell chip found in the Sony PlayStation 3. This hybrid approach (rather than a “one-chip-fits-all” approach) made Roadrunner the fastest computer in the world and extremely efficient, using only one-third the power of equivalent, nonhybrid supercomputers. Ever since Roadrunner pioneered the concept, hybrid supercomputers have become the norm.
On May 26, 2008, Roadrunner became the first supercomputer to exceed a sustained speed of 1 petaflop/s—a million billion calculations per second. How fast is a petaflop? Imagine that a normal desktop computer was the fastest human sprinter alive, running at about 28 miles per hour. At full speed (and without stops), that sprinter could run from Los Alamos, New Mexico, to New York City in 72 hours. Roadrunner, a 1-petaflop supercomputer, would make the same journey in only 25 seconds. A 1-exaflop supercomputer—1,000 times faster than Roadrunner—would make the trip in a fraction of a second.
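The analogy’s arithmetic can be checked with a quick back-of-the-envelope calculation. The road distance and the desktop’s assumed speed of roughly 100 gigaflops are assumptions chosen here to match the article’s figures, not numbers stated in the article.

```python
MILES_TO_NYC = 2000   # rough Los Alamos -> New York City distance (assumption)
SPRINT_MPH = 28       # the article's top human sprint speed

desktop_hours = MILES_TO_NYC / SPRINT_MPH      # about 71, i.e. the "72 hours"
desktop_seconds = desktop_hours * 3600

DESKTOP_FLOPS = 1.0e11     # ~100 gigaflops, a plausible desktop (assumption)
ROADRUNNER_FLOPS = 1.0e15  # 1 petaflop/s

speedup = ROADRUNNER_FLOPS / DESKTOP_FLOPS     # a factor of 10,000
roadrunner_seconds = desktop_seconds / speedup  # about 26, i.e. the "25 seconds"
print(round(desktop_hours), round(roadrunner_seconds))
```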
Trinity and Crossroads
In 2002, the 300,000-square-foot Strategic Computing Complex was built to house the Lab’s ever-expanding fleet of high-performance computers. The floor of the supercomputing room is 43,500 square feet, nearly the size of a football field. The largest computer there, named Trinity after the 1945 Trinity nuclear test, enables (among other things) large-scale data analysis and visualization in two of the building’s special facilities: the Powerwall Theater and the Cave Automatic Virtual Environment (CAVE), an immersive virtual reality environment powered by computers.
In the CAVE, users wearing special glasses can interact with virtual 3D environments of everything from nuclear detonations to the birth of galaxies. Weapons scientists must produce high-resolution simulations of real events, and interacting with visualizations helps scientists test their hypotheses and their solutions to problems.
The tri-lab computing community of Los Alamos, Lawrence Livermore, and Sandia national laboratories shares Trinity, an Advanced Simulation and Computing Advanced Technology System (ATS) supercomputer, for its largest-scale weapons-computing needs. Los Alamos uses Trinity primarily to study weapons performance, Sandia to study weapons engineering, and Livermore to quantify uncertainty (to study the likelihood of a certain outcome if some aspects of the problem are not known).
Trinity, however, is approaching the end of its useful lifetime. Even though it’s only five years old, soon parts will no longer be available because they’re already considered dated—that’s how fast the computing world moves.
In 2022, the Lab will acquire a new computer, Crossroads, named for Operation Crossroads, the 1946 series of nuclear tests in the Marshall Islands. Hewlett Packard Enterprise (HPE) was awarded the $105-million contract to deliver Crossroads to Los Alamos.
“We can only purchase and build these big world-class computers one at a time,” says Jim Lujan, a program manager in the High Performance Computing division. “The three labs share NNSA [National Nuclear Security Administration] codes and computing time on these unique resources, and it’s typically two and a half years before an increase in capability is needed, requiring the next system.” The location for the ATS-class supercomputers alternates between Los Alamos and Livermore.
Crossroads’ design is focused on efficiency in performance, workflow, and porting. Performance efficiency means that more usable computing power is available to the applications than on previous systems. Workflow efficiency aims to decrease the total time-to-solution of the entire problem, including all steps like data input, computing, data output, and analysis. Porting efficiency refers to the ease with which existing computing codes can be enhanced to take advantage of the new capabilities of the Crossroads system.
“This machine will advance our ability to study the most complex physical systems for science and national security,” says Jason Pruet, Los Alamos’ program director for the Advanced Simulation and Computing (ASC) program. “We look forward to its arrival and deployment.”
The Los Alamos tradition of adapting and pushing the boundaries of what’s possible in high-performance computing, which began with human computers, now continues with the use of multi-petaflop clusters and the exploration of quantum computing (see p. 34).
While the technologies have changed, the talent for creating and innovating where few, if any, have trod before has remained consistent. Although the future of computing is difficult to predict, Lab history suggests that Los Alamos will be a driving force in the decades to come, helping to turn big ideas into the commonplace.
Author Nicholas Lewis has compiled a history of computing at the Laboratory. To learn more about his work, email HPChistory@lanl.gov.