The Supercomputer Era: The Fastest Machines on Earth

Summary

This article traces the history of supercomputing from Seymour Cray’s solitary genius in the 1960s through the vector processor era, the transition to massively parallel commodity clusters, and the race to exascale computing. It is a story about what happens at the frontier of computational ambition — where the problems are too large for any ordinary machine, the hardware must be invented alongside the software, and the applications range from predicting the weather to simulating nuclear explosions.

Seymour Cray and the First Supercomputers

Seymour Cray was an electrical engineer from Chippewa Falls, Wisconsin, who combined extraordinary technical intuition with an almost monastic focus. He worked in isolation, literally and figuratively: he built a lab under his house in Wisconsin, away from the corporate offices of the companies he worked for, and was known to say that the best way to design a computer was to hire a few good engineers and leave them alone.

At Control Data Corporation (CDC), Cray designed the CDC 6600 (1964) — widely recognized as the first true supercomputer. It achieved 3 megaFLOPS (3 million floating-point operations per second) at a time when the most powerful IBM mainframe managed 500 kiloFLOPS. It did this through a novel architecture: ten peripheral processors handled input/output while a central processor focused exclusively on computation, and the central processor itself executed instructions in parallel using multiple functional units.

IBM’s reaction to the 6600’s announcement has become industry legend. In an internal memo, IBM chairman Thomas Watson Jr. noted that the laboratory that developed the machine employed only 34 people, “including the janitor,” and asked how IBM, with its vast development organization, had lost its lead. Cray observed, with characteristic dryness, that Mr. Watson seemed to have answered his own question.

Cray left CDC in 1972 to found Cray Research. The Cray-1 (1976) was his masterpiece: an $8 million machine that delivered 160 megaFLOPS, weighed 5.5 tons, and took the form of a C-shaped tower ringed by a padded bench, which concealed the power supplies and cooling plumbing. Its defining innovation was the vector register: each of its eight vector registers held 64 numbers, so a single instruction could apply the same arithmetic operation to all 64, streaming them through a pipelined functional unit and accelerating the loops that dominated scientific computation. Los Alamos National Laboratory took delivery of the first unit for nuclear weapons simulation. The Cray-1 became the standard by which scientific computers were measured.

Vector Processing vs. Modern Parallelism

A vector processor executes one instruction on many data elements simultaneously — “add these 64 numbers to those 64 numbers” in a single operation. This is powerful for the regular, predictable loops of scientific computation (weather simulation, fluid dynamics, matrix operations) and less useful for irregular, data-dependent code. Modern parallelism, by contrast, runs many independent threads of execution simultaneously on separate cores. Today’s supercomputers use both: many-core CPUs and GPUs for thread-level parallelism, combined with SIMD instructions for data-level parallelism within each core.
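The two kinds of parallelism described above can be modelled in a few lines of Python. This is a minimal sketch, not a performance demonstration: the names `vector_add`, `strip_mined_add`, and `threaded_add` are hypothetical, the 64-element strip length is taken from the Cray-1 figure quoted earlier, and Python threads only imitate the structure of multi-core execution (the interpreter does not actually run these additions on several cores at once).

```python
from concurrent.futures import ThreadPoolExecutor

VECTOR_LENGTH = 64  # elements per vector register, as on the Cray-1


def vector_add(a, b):
    """Model ONE vector instruction: add up to VECTOR_LENGTH element pairs."""
    assert len(a) == len(b) <= VECTOR_LENGTH
    return [x + y for x, y in zip(a, b)]


def strip_mined_add(a, b):
    """Data-level parallelism: cut a long loop into vector-sized strips."""
    result = []
    for start in range(0, len(a), VECTOR_LENGTH):
        stop = start + VECTOR_LENGTH
        result.extend(vector_add(a[start:stop], b[start:stop]))
    return result


def threaded_add(a, b, workers=4):
    """Thread-level parallelism: each worker owns one contiguous block."""
    block = max(1, -(-len(a) // workers))  # ceiling division, at least 1
    starts = range(0, len(a), block)

    def work(start):
        # Inside its block, each worker still uses the vector-style loop.
        return strip_mined_add(a[start:start + block], b[start:start + block])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(work, starts))
    return [x for part in parts for x in part]
```

In this model, adding two 200-element arrays with `strip_mined_add` issues four vector operations (three full strips of 64 and one of 8) instead of 200 scalar additions. `threaded_add` layers the second kind of parallelism on top by giving each worker its own block, which mirrors how a modern node combines cores with SIMD units.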

The Cold War and Scientific Computation

Supercomputers were not merely academic tools. The primary U.S. government customer for the most powerful machines was the nuclear weapons complex, specifically the national laboratories at Los Alamos, Lawrence Livermore, and Sandia. The Limited Test Ban Treaty (1963) prohibited atmospheric nuclear tests, and the United States conducted its last underground test in 1992, ahead of signing the Comprehensive Test Ban Treaty in 1996. From then on, simulation became the principal way to verify that the weapons in the stockpile still worked as designed. The required simulation accuracy drove supercomputer procurement for decades.

The Cold War framing extended to export controls: the U.S. government restricted the export of supercomputers to Soviet-bloc countries and, later, China. When Japan’s NEC and Fujitsu built competing machines in the 1980s, the U.S. imposed tariffs and export restrictions that created significant diplomatic friction.

The civilian scientific applications were equally consequential. Weather forecasting, which depends on global atmospheric simulation, is useful only if the model runs faster than real time, and each generation of supercomputers allowed finer and more detailed models to clear that threshold. The European Centre for Medium-Range Weather Forecasts (ECMWF) and the U.S. National Weather Service became major supercomputer customers, and the improvement in forecast accuracy through the 1980s and 1990s is attributable in large part to increases in computing power.

The Top500 List and the Benchmark Race

In 1993, Jack Dongarra at the University of Tennessee and Hans Meuer at the University of Mannheim began publishing the Top500 list — a twice-yearly ranking of the 500 most powerful supercomputers in the world, measured by performance on the Linpack benchmark: solving a dense system of linear equations.
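To make the benchmark concrete, the sketch below solves a random dense system by Gaussian elimination with partial pivoting and converts the elapsed time into a FLOPS figure. It is an illustration only, under stated assumptions: the function names are hypothetical, the operation count of roughly 2/3·n³ + 2·n² is the conventional estimate for this kind of solve, and pure Python is many orders of magnitude slower than the tuned libraries used for real Top500 submissions.

```python
import random
import time


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] for row in a]  # work on copies so the inputs are untouched
    b = b[:]
    for k in range(n):
        # Partial pivoting: bring the largest entry in column k to the diagonal.
        pivot = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[pivot] = a[pivot], a[k]
        b[k], b[pivot] = b[pivot], b[k]
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def linpack_style_rate(n, seed=0):
    """Return (estimated FLOPS, max residual) for one random n-by-n solve."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero
    operations = (2.0 / 3.0) * n**3 + 2.0 * n**2  # conventional estimate
    residual = max(
        abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n)
    )
    return operations / elapsed, residual
```

Calling `linpack_style_rate(200)` on a laptop gives a rate far below even the CDC 6600 era's successors in relative terms of modern hardware capability, which is the point of the exercise: the benchmark rewards both fast arithmetic and an implementation that keeps the arithmetic units busy. The residual check matters because a benchmark run only counts if the computed solution is actually correct.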

Linpack performance is measured in FLOPS (floating-point operations per second). The trajectory of the Top500 list illustrates Moore’s Law applied to the largest machines:

Year	Top machine	Performance
1993	CM-5 (Thinking Machines)	60 GigaFLOPS
1997	ASCI Red (Intel/Sandia)	1.3 TeraFLOPS
2008	Roadrunner (IBM/Los Alamos)	1.1 PetaFLOPS
2018	Summit (IBM/Oak Ridge)	122.3 PetaFLOPS
2022	Frontier (HPE/Oak Ridge)	1.1 ExaFLOPS
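
The growth rate implied by the table can be checked with a short calculation. The sketch below uses only the 1993, 2008, and 2022 rows and a hypothetical helper, `doubling_time_years`, which assumes smooth exponential growth between two data points.

```python
import math


def doubling_time_years(year0, flops0, year1, flops1):
    """Years per doubling, assuming steady exponential growth between points."""
    return (year1 - year0) / math.log2(flops1 / flops0)


# (year, Linpack performance in FLOPS) taken from the table above
CM5 = (1993, 60e9)
ROADRUNNER = (2008, 1.1e15)
FRONTIER = (2022, 1.1e18)

overall = doubling_time_years(*CM5, *FRONTIER)
early = doubling_time_years(*CM5, *ROADRUNNER)
late = doubling_time_years(*ROADRUNNER, *FRONTIER)

print(f"1993-2022: one doubling every {overall:.2f} years")
print(f"1993-2008: one doubling every {early:.2f} years")
print(f"2008-2022: one doubling every {late:.2f} years")
```

Over the full span the top machine doubled in performance about every 1.2 years, faster than the two-year doubling usually quoted for Moore’s Law, because the largest machines grew in processor count as well as in per-chip capability. Splitting the span shows the pace easing from roughly 1.1 years per doubling before 2008 to roughly 1.4 years after it.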

Frontier, installed at Oak Ridge National Laboratory in 2022, was the first machine on the list to exceed one exaFLOPS: $10^{18}$ floating-point operations per second. It comprises 74 HPE Cray EX cabinets and 9,408 nodes, each pairing one AMD EPYC CPU with four AMD Instinct MI250X GPU accelerators, and draws about 21 megawatts of power, roughly the electricity consumption of a small city.
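
A few derived figures help put these numbers in proportion. The arithmetic below uses only values quoted in this article (the node count, GPUs per node, 1.1 exaFLOPS, 21 megawatts, and the Cray-1 figure of 160 megaFLOPS); the variable names are illustrative, and the results are rounded estimates rather than official specifications.

```python
NODES = 9_408
GPUS_PER_NODE = 4
LINPACK_FLOPS = 1.1e18  # 1.1 exaFLOPS, from the table above
POWER_WATTS = 21e6      # 21 megawatts
CRAY1_FLOPS = 160e6     # 160 megaFLOPS

total_gpus = NODES * GPUS_PER_NODE
flops_per_node = LINPACK_FLOPS / NODES
flops_per_watt = LINPACK_FLOPS / POWER_WATTS
cray1_equivalents = LINPACK_FLOPS / CRAY1_FLOPS

print(f"GPU accelerators:  {total_gpus:,}")
print(f"Per node:          {flops_per_node / 1e12:.0f} teraFLOPS")
print(f"Energy efficiency: {flops_per_watt / 1e9:.0f} gigaFLOPS per watt")
print(f"Cray-1 equivalent: {cray1_equivalents:.3g} machines")
```

The machine contains 37,632 GPU accelerators, each node delivers on the order of 117 teraFLOPS on Linpack, and the system as a whole achieves roughly 52 gigaFLOPS per watt. Matching its Linpack throughput would take nearly seven billion Cray-1s.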

Dead End: The Massively Parallel Processor and the Commodity Cluster

Through the late 1980s and early 1990s, a competing vision to Cray’s vector processors emerged: Massively Parallel Processing (MPP) — connecting hundreds or thousands of ordinary processors with a fast network, each running its own program on its own memory.

Companies like Thinking Machines (CM-2, up to 65,536 simple one-bit processors, 1987) and nCUBE built elegant MPP machines. The CM-2 in particular attracted enormous academic attention. But programming MPP machines was hard: the programmer had to explicitly manage which processor held which data and how processors communicated. And by the mid-1990s, the hardware advantage of specialized interconnects was evaporating as commodity Ethernet and later InfiniBand approached similar bandwidth.
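
The programming burden described above, deciding which processor holds which data and spelling out every exchange, can be sketched with a toy distributed sum. This is a simulation under stated assumptions: threads stand in for processors, `queue.Queue` objects stand in for network links, and the names `node_program` and `distributed_sum` are hypothetical. Production codes on real machines typically use a message-passing library such as MPI instead.

```python
import queue
import threading


def node_program(rank, local_data, inboxes, results):
    """The program every simulated processor runs on its own private data."""
    partial = sum(local_data)  # purely local work, no communication
    if rank == 0:
        total = partial
        for _ in range(len(inboxes) - 1):
            _sender, value = inboxes[0].get()  # blocking receive
            total += value
        results["total"] = total
    else:
        inboxes[0].put((rank, partial))  # explicit send to rank 0


def distributed_sum(data, nodes=4):
    """Partition data across nodes, then combine partial sums on rank 0."""
    chunk = -(-len(data) // nodes)  # ceiling division
    # The programmer, not the hardware, decides who owns which slice.
    slices = [data[r * chunk:(r + 1) * chunk] for r in range(nodes)]
    inboxes = [queue.Queue() for _ in range(nodes)]
    results = {}
    threads = [
        threading.Thread(target=node_program,
                         args=(r, slices[r], inboxes, results))
        for r in range(nodes)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results["total"]
```

Even for a task as simple as adding numbers, the code has to state the partitioning, the sender, the receiver, and the number of messages to expect. A mismatch in any of these, such as rank 0 waiting for a message that is never sent, produces a hang rather than an error, which hints at why these machines were considered hard to program.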

The Beowulf Moment

In 1994, Thomas Sterling and Don Becker at NASA built Beowulf: a cluster of 16 commodity Intel 486 PCs connected by Ethernet, running Linux. It cost approximately $50,000 and delivered performance comparable to a $250,000 workstation cluster. Beowulf demonstrated that supercomputer-class performance could be assembled from commodity parts with open-source software. Thinking Machines filed for bankruptcy in 1994. The era of purpose-built supercomputer hardware, outside of the most extreme applications, effectively ended. Today, nearly every machine on the Top500 list is a cluster of largely commodity CPUs and GPUs connected by a fast network. Cray Research was acquired by Silicon Graphics in 1996; after further changes of ownership the Cray name passed to HPE in 2019, where it survives primarily as a system-integration brand rather than as an independent hardware innovator.

From Weather to Proteins: The Applications That Justify the Cost

The case for exascale computing is made by problems that are not metaphors for progress but direct, measurable challenges:

Climate modeling at sufficient resolution to capture regional effects requires simulating the ocean, atmosphere, land surface, and ice sheets simultaneously at scales of kilometers. Current climate models run at 25–100 km resolution; projecting regional rainfall, drought, and flooding with policy-relevant precision requires 1 km or finer, a computational demand that current machines can barely approximate.
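
A rough scaling estimate shows why the jump to 1 km is so demanding. The sketch below rests on a common rule of thumb that is an assumption here, not a claim from any specific model: refining the horizontal grid by a factor r multiplies the number of grid columns by r² and, because the time step must shrink in proportion to the grid spacing for numerical stability, the number of time steps by r. The function name `relative_cost` is hypothetical.

```python
def relative_cost(coarse_km, fine_km, refine_vertical=False):
    """Estimated compute multiplier for refining a grid (rule of thumb)."""
    r = coarse_km / fine_km
    horizontal = r ** 2   # more grid columns in both horizontal directions
    timesteps = r         # shorter time step to stay numerically stable
    vertical = r if refine_vertical else 1.0  # optional extra vertical levels
    return horizontal * timesteps * vertical


for coarse in (100, 25):
    factor = relative_cost(coarse, 1)
    print(f"{coarse} km -> 1 km: about {factor:,.0f} times the computation")
```

Under these assumptions, moving from 25 km to 1 km costs about 15,625 times as much computation for the same simulated period, and moving from 100 km costs about a million times as much. That gap is comparable to the thousandfold step from petascale to exascale machines, which is why kilometer-scale climate simulation is often cited as a motivating workload.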

Protein folding, the problem of predicting the three-dimensional structure of a protein from its amino acid sequence, was one of biology’s defining unsolved problems for fifty years. DeepMind’s AlphaFold2 (2020) achieved near-experimental accuracy for many proteins using deep learning, trained on a cluster of TPU accelerators over a period of weeks. Understanding why AlphaFold works, and extending it to drug design and protein engineering, requires simulation at scales beyond what AlphaFold alone provides.

Nuclear stockpile stewardship — maintaining confidence in aging weapons designs without physical testing — drives the U.S. Advanced Simulation and Computing program, which has been the primary funder of the most powerful U.S. supercomputers for thirty years.

For the parallel processors that now populate supercomputer nodes, see The GPU Revolution. For the transistors those processors are built from, see The Semiconductor Race.