DeepMind and AlphaGo: The Match That Changed AI

Summary

In March 2016, an AI system built by a London research lab defeated the world’s strongest Go player four games to one — a moment many AI researchers had predicted was at least a decade away. The match between AlphaGo and Lee Sedol was not just a benchmark milestone; it was a demonstration that deep reinforcement learning could master domains previously thought to require distinctly human intuition. The company behind it, DeepMind, had been built on the conviction that intelligence itself was a problem that could be solved, and the AlphaGo victory was its first proof at the highest level.

Three Friends and a Theory of Intelligence

DeepMind Technologies was founded in London in 2010 by Demis Hassabis, Shane Legg, and Mustafa Suleyman. Hassabis and Suleyman had been friends since childhood (through family); Hassabis met Legg later, when the two were postdocs at UCL’s Gatsby Computational Neuroscience Unit.

Hassabis was the central figure: a chess prodigy who had reached a rating equivalent to master level at thirteen, a game developer who had co-designed the acclaimed simulation game Theme Park at Bullfrog Productions as a teenager and co-founded Elixir Studios, and a neuroscientist with a PhD from University College London. His conviction was that the brain was not magic; it was a physical system running an algorithm, and that algorithm could be understood, reverse-engineered, and ultimately re-implemented in silicon. This was not a casual belief; it was the animating theory of his adult life.

Legg had written his PhD dissertation on machine super-intelligence at IDSIA (the Dalle Molle Institute for Artificial Intelligence Research) in Lugano, Switzerland, under Marcus Hutter; Suleyman had co-founded a mental health charity. Together they attracted early investment from Peter Thiel’s Founders Fund, Scott Banister, and Jaan Tallinn (co-creator of Skype). The lab operated initially in a converted townhouse in Bloomsbury, attracting some of the world’s best reinforcement learning and neural network researchers.

The lab’s early work was deliberately fundamental: understanding how neural networks could learn to play Atari games from raw pixel input, with no domain knowledge beyond the set of available actions and the score.

The Atari Paper: Proof That Deep RL Worked

In February 2015, DeepMind published “Human-level control through deep reinforcement learning” in Nature, the DQN (Deep Q-Network) paper. A single neural network learned to play 49 Atari 2600 games directly from the raw pixel display and the game score, with no other information. The same network, with the same architecture and hyperparameters, performed at a level comparable to or better than a professional human games tester on more than half the games. The set spanned titles as varied as Breakout, Space Invaders, Pong, and Seaquest.

The paper was a watershed. Previous demonstrations of game-playing AI had required hand-crafted features, domain-specific knowledge, and separate systems for each game. DQN used one architecture, learned from scratch, and reached or surpassed human performance on many games through pure trial and error. The combination of deep convolutional neural networks (to perceive the pixel input) with Q-learning (to choose actions that maximize long-term reward) produced an agent that could generalize across fundamentally different visual environments.
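The Q-learning half of that combination can be illustrated with a minimal tabular sketch. This is not DeepMind's implementation: DQN replaced the lookup table with a convolutional network and added further machinery for stable training. The function name `q_update`, the state labels, and the values of `alpha` and `gamma` are all hypothetical choices for illustration.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.99, done=False):
    """Apply one tabular Q-learning update in place and return the new value.

    q maps (state, action) pairs to estimated long-term reward.
    alpha is the learning rate and gamma the discount factor (both assumed).
    """
    # Value of the best action available from the next state (0 if the episode ended).
    best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
    # Target: immediate reward plus discounted future value.
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Toy usage with made-up states: a terminal reward of 1.0 is learned first,
# then propagates one step backward on the next update.
q = defaultdict(float)
actions = ["left", "right"]
q_update(q, "s0", "right", reward=1.0, next_state="s1", actions=actions, done=True)
q_update(q, "start", "right", reward=0.0, next_state="s0", actions=actions)
```

The second call shows why the score alone can be enough of a training signal: value learned at the point where reward arrives flows back to the earlier states that led there.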

The Nature publication was unusual: fundamental machine learning results typically appeared in conference proceedings, not in one of the world’s most prestigious science journals. It signaled both DeepMind’s own ambitions and, about a year after Google had paid approximately $500 million to acquire the lab in January 2014, the resources to match them.

Why Go Was the Last Frontier

Chess had fallen to computers in 1997, when IBM’s Deep Blue defeated Garry Kasparov. Checkers had been solved completely in 2007. But Go — the ancient Chinese board game, older than chess, played on a 19×19 grid — remained a human domain well into the 2010s.

The reasons were mathematical. A standard 19×19 Go board has approximately 10^170 legal positions, far more than the roughly 10^80 atoms in the observable universe. Chess’s search tree, while large, could be navigated by brute-force search combined with carefully tuned evaluation functions; the best chess programs were essentially very fast lookahead systems. Go resisted this approach because:

The branching factor (legal moves at each step) averages around 250, compared to ~35 in chess — brute-force search hits a wall within a few moves.
Positional evaluation in Go is extraordinarily difficult. A chess program can assess “who’s ahead” fairly reliably by counting material and evaluating king safety; in Go, the value of a position depends on subtle strategic factors that even professional players find difficult to articulate.
Go required what players and coaches called intuition — pattern recognition operating at a level that eluded algorithmic description.
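The effect of the branching factor can be made concrete with a rough calculation. The sketch below estimates game-tree size as (branching factor) raised to (game length). The branching factors come from the text above; the game lengths of 80 plies for chess and 150 for Go are commonly quoted ballpark figures and are assumptions here. Game-tree size counts move sequences, so it is a different and much larger quantity than the number of board positions.

```python
import math


def log10_tree_size(branching_factor, depth):
    """Return log10 of branching_factor ** depth, a crude game-tree size estimate."""
    return depth * math.log10(branching_factor)


# Full-game estimates (depths are assumed typical game lengths).
chess_log10 = log10_tree_size(35, 80)    # a bit over 10^123
go_log10 = log10_tree_size(250, 150)     # a bit under 10^360

# Even a shallow four-move lookahead diverges sharply.
chess_nodes_4 = 35 ** 4    # 1,500,625
go_nodes_4 = 250 ** 4      # 3,906,250,000
```

At a depth of only four moves, Go already requires examining roughly 2,600 times as many sequences as chess, which is the wall that brute-force search runs into.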

The conventional wisdom in the AI community was that strong Go programs were 10–20 years away. DeepMind collapsed that estimate to one.

AlphaGo vs. Fan Hui: The Quiet First Victory

In October 2015, AlphaGo played a match against Fan Hui — the three-time European Go champion — at DeepMind’s London office. The match was held in secret; only a small group of observers was present. AlphaGo won 5–0.

Fan Hui was rated approximately 600 Elo points below the world’s top players, and the victory was therefore not conclusive proof that AlphaGo had achieved top-human or superhuman performance. But it was the first time any computer program had defeated a professional Go player in a formal match without handicap. Fan Hui wept after one of the games.
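To give a sense of what a 600-point gap means, the standard Elo expected-score formula can be applied to it. This is a sketch using the textbook formula with its conventional scale of 400; the function name is hypothetical, and Go rating systems in practice may differ in detail.

```python
def elo_expected_score(rating_gap):
    """Expected score per game for a player rated rating_gap points BELOW the opponent.

    Uses the standard Elo logistic formula with a scale of 400.
    """
    return 1.0 / (1.0 + 10 ** (rating_gap / 400.0))


# Per-game expectation for the weaker player at a 600-point deficit.
underdog_expectation = elo_expected_score(600)   # roughly 0.03
```

Under this model the lower-rated player would be expected to score about 3% per game against a top player, which is why a 5–0 result against Fan Hui left the question of top-level strength open.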

The result was published in Nature in January 2016, simultaneous with the announcement of a match against Lee Sedol — the dominant player in world Go, the winner of 18 international championships, widely described as the greatest Go player of the past decade.

Seoul, March 2016: The Match

The five-game match between AlphaGo and Lee Sedol took place in Seoul, South Korea, from March 9–15, 2016. It was broadcast live online and watched by an estimated 200 million-plus viewers worldwide, with an especially large audience in China. Google DeepMind offered a $1 million prize for the match; Lee expected to win.

Lee had studied AlphaGo’s games against Fan Hui. He said publicly that AlphaGo showed “some weaknesses” and that he would win by at least 4–1, perhaps 5–0. He had been playing Go since age five. He was considered virtually unbeatable.

Game 1: AlphaGo won. Lee appeared puzzled rather than alarmed — he had made errors, he said, and AlphaGo had played well.

Game 2, Move 37: This was the moment the match became historic.

On the 37th move of the second game, AlphaGo played a move — a 5th-line shoulder hit far from the active area of play — that no human Go player would have played. The commentators, both professional Go players, initially assumed the program had made an error. They said so on the live broadcast. Lee Sedol left the table for fifteen minutes, taking an extended pause that the match rules permitted. He later said he needed time to recalibrate his entire understanding of the position. AlphaGo won Game 2.

The program estimated the probability of a human professional playing Move 37 at 1 in 10,000. It played the move because its self-play training had discovered its value empirically — the move appeared in games where AlphaGo won, and so it was assigned positive value, regardless of whether any human had played it or whether any human tradition endorsed it. AlphaGo had developed strategic ideas with no human origin.

Game 3: AlphaGo won. Lee Sedol was now facing a shutout.

Game 4, Move 78 — “The Divine Move”: In what became the most celebrated single move of the match, Lee Sedol played a move on turn 78 — a wedge between two AlphaGo stones — that AlphaGo had assigned less than a 1-in-10,000 probability. The program’s evaluation of its own position collapsed almost immediately; it began playing what commentators recognized as confused, inconsistent moves. Lee Sedol won Game 4. He wept at the board. He pumped his fist. The match audience, many of whom had been devastated by the first three games, broke into applause.

After the game, the move came to be known as a “divine move”, a rendering of the Korean expression sin-ui han su (“a move of God”): a move so unexpected and brilliant that it transcends normal calculation.

Game 5: AlphaGo won. Final score: AlphaGo 4, Lee Sedol 1.

What AlphaGo Was Doing

AlphaGo combined two neural networks, a policy network (which moves to consider) and a value network (how good the current position is), with Monte Carlo Tree Search. The policy network was trained first on a database of expert human games (supervised learning) and then improved through self-play (reinforcement learning); the value network was then trained to predict the winner from positions generated by that self-play. The policy network narrowed the search from roughly 250 possible moves to a handful worth examining; the value network let the search judge a position without playing every line out to the end. The combination made the search tractable.
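How a policy prior steers tree search can be sketched with a simplified selection rule in the style described in the AlphaGo paper: each candidate move is scored by its average value so far plus an exploration bonus proportional to the policy network's prior and shrinking with visit count. Everything named here (`Edge`, `select_move`, `backup`, the constant `c_puct=1.5`, and the toy priors) is a hypothetical illustration, not DeepMind's code, and the real system also included rollouts and large-scale parallel search.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    prior: float               # P(s, a): policy network's probability for this move
    visits: int = 0            # N(s, a): how often search has tried this move
    total_value: float = 0.0   # W(s, a): sum of evaluations backed up through it

    @property
    def mean_value(self):
        """Q(s, a): average evaluation, 0 for an unvisited move."""
        return self.total_value / self.visits if self.visits else 0.0


def select_move(edges, c_puct=1.5):
    """Pick the move maximizing Q + U, where U favors high-prior, rarely visited moves."""
    # Assumption: clamp to 1 so that priors break ties before any visit is recorded.
    parent_visits = max(1, sum(e.visits for e in edges.values()))

    def score(edge):
        bonus = c_puct * edge.prior * math.sqrt(parent_visits) / (1 + edge.visits)
        return edge.mean_value + bonus

    return max(edges, key=lambda move: score(edges[move]))


def backup(edge, value):
    """Record one evaluation (e.g. from a value network) for the chosen move."""
    edge.visits += 1
    edge.total_value += value


# Toy usage: three candidate moves with made-up priors.
edges = {"A": Edge(prior=0.6), "B": Edge(prior=0.3), "C": Edge(prior=0.1)}
first = select_move(edges)     # highest prior is tried first
backup(edges[first], -1.0)     # suppose the evaluation says this line loses
second = select_move(edges)    # search shifts to the next most promising move
```

The prior concentrates effort on a few plausible moves, while the backed-up evaluations let the search abandon a move the prior liked once it turns out badly.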

AlphaGo Zero: Teaching Itself From Nothing

In October 2017, DeepMind published the paper for AlphaGo Zero — a version trained exclusively through self-play, with no access to human games, starting from nothing but the rules.

AlphaGo Zero surpassed AlphaGo Lee — the version that beat Lee Sedol — after just 3 days of training, winning 100 games to 0. After 21 days it reached the level of AlphaGo Master, the version that had beaten the world’s current top players 60–0 in online matches. After 40 days it exceeded all previous versions.

The significance was philosophical as well as technical. The original AlphaGo had been built partly on human knowledge: the policy network was initialized on records of expert human games. AlphaGo Zero had no such initialization. Its strategic knowledge was discovered entirely from self-play. Yet it independently rediscovered many of the canonical patterns of human Go tradition, such as joseki (corner patterns) and fuseki (opening frameworks), along with several strategies that human players found unfamiliar and had never articulated.

AlphaZero and AlphaFold

In December 2017, DeepMind released a preprint describing AlphaZero, a generalization of the AlphaGo Zero approach to chess, shogi, and Go. AlphaZero trained from scratch on each game separately and surpassed the strongest specialized programs: in chess it overtook Stockfish (then one of the world’s strongest chess engines) after about four hours of self-play training. The games AlphaZero played against Stockfish were noted for their style: attacking, sacrificial, human-like in aesthetic character, unlike the defensive, computer-optimal play that had dominated top computer chess.

The broader message was that the core algorithmic insight — deep neural networks trained by self-play reinforcement learning — was not Go-specific. It was a general method for any well-defined problem with a clear reward signal.

AlphaFold (2020) was DeepMind’s most consequential scientific contribution. The protein-folding problem — predicting the three-dimensional structure of a protein from its amino acid sequence — had been the central unsolved problem in structural biology for fifty years. In November 2020, at the CASP14 competition (Critical Assessment of Protein Structure Prediction), AlphaFold2 achieved accuracy levels indistinguishable from experimental measurements for most proteins. In July 2022, DeepMind released structures for over 200 million proteins — essentially every known protein — freely available to the scientific community.

In October 2024, Demis Hassabis and John Jumper (AlphaFold’s lead researcher) were awarded the Nobel Prize in Chemistry for protein structure prediction, sharing the prize with David Baker, who was honored for computational protein design. The Nobel committee credited AlphaFold with solving a 50-year-old problem in biology.

Lee Sedol retired from professional Go in November 2019, saying: “With the debut of AI in Go games, I’ve realized that I’m not at the top even if I become the number one through frantic efforts. There is an entity that cannot be defeated.” He remains the only human ever to have beaten AlphaGo in a formal match.

For the broader context of AI development, see Geoffrey Hinton and Deep Learning and The Rise of Artificial Intelligence. For DeepMind’s parent company, see Google: From Dorm Room to Digital Infrastructure. For the reinforcement learning theory underpinning AlphaGo’s self-play, see Reinforcement Learning.