AlphaFold
Summary
For fifty years, “how does a protein fold?” was biology’s most famous unsolved problem: a chain of amino acids spontaneously crumples into a precise 3-D shape that determines everything it does, and no one could reliably predict that shape from the sequence. Determining one structure in the lab could take a PhD student years. Then DeepMind’s AlphaFold entered a biennial blind competition and, with AlphaFold2 in 2020, predicted structures so accurately that organizers declared the protein-folding problem essentially solved. DeepMind then computed structures for nearly all 200 million proteins known to science and released them free. In 2024 Demis Hassabis and John Jumper shared the Nobel Prize in Chemistry — one of the first Nobels for a deep-learning system, and the clearest proof yet that AI can deliver not just chatbots but fundamental scientific discovery.
The Problem: Anfinsen’s 50-Year Riddle
A protein is a linear chain of amino acids, but it only works once folded into a specific three-dimensional shape: the shape is the function. In 1972 Christian Anfinsen won a Nobel Prize for showing that a protein’s sequence alone determines its folded structure; all the information is there in the chain. That implied prediction should be possible in principle. In practice it was a nightmare. The number of possible conformations is astronomically large, far too many for a chain to try one by one, yet real proteins typically fold in milliseconds to seconds (Levinthal’s paradox).
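The scale behind Levinthal’s paradox can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: the three states per residue and the sampling rate of 10^13 conformations per second are textbook-style assumptions, not measured values, and the function name is hypothetical.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def levinthal_estimate(n_residues, states_per_residue=3, samples_per_second=1e13):
    """Estimate the cost of brute-force conformational search.

    Assumes (purely for illustration) that each residue independently
    adopts one of `states_per_residue` conformations and that the chain
    can try `samples_per_second` conformations every second.
    Returns (number_of_conformations, years_to_try_them_all).
    """
    conformations = states_per_residue ** n_residues
    years = conformations / samples_per_second / SECONDS_PER_YEAR
    return conformations, years

conformations, years = levinthal_estimate(100)
print(f"conformations for a 100-residue chain: {conformations:.1e}")
print(f"years needed to try them all:          {years:.1e}")
```

Under these assumptions a modest 100-residue chain has on the order of 10^47 conformations and would need roughly 10^27 years to enumerate them, far longer than the age of the universe (about 1.4 × 10^10 years). Real proteins therefore cannot be folding by random search.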
For decades, structures were found experimentally (X-ray crystallography, NMR, later cryo-electron microscopy), each one a slow and expensive undertaking that could take years. By 2020 the Protein Data Bank held ~170,000 structures, while sequence databases listed hundreds of millions of proteins with no known shape. That gap was a central bottleneck of molecular biology and drug discovery.
CASP: The Blind Test
Since 1994, the Critical Assessment of Structure Prediction (CASP) has run every two years as the field’s honest scoreboard: organizers hand teams amino-acid sequences whose structures have been solved experimentally but not yet published, and teams predict blind. Scores are measured in GDT (Global Distance Test, 0–100); a score around 90 is considered comparable to experiment. For years the best methods plateaued in the 30s–40s for hard targets.
- CASP13 (2018): DeepMind’s first AlphaFold entry won decisively, a clear jump over rivals — proof that deep learning could attack the problem, though not yet at experimental accuracy.
- CASP14 (2020): AlphaFold2, substantially rearchitected after John Jumper took technical leadership, scored a median GDT around 90 across targets. The improvement was so large that CASP co-founder John Moult said the problem could be considered “solved in some sense.” It was a discontinuity, not an increment.
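The GDT scores quoted above can be illustrated with a simplified calculation. GDT_TS is commonly defined as the average, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of residues whose predicted C-alpha atom lies within that cutoff of the experimental position. The sketch below assumes the two structures are already superimposed; the official CASP evaluation also searches over superpositions to maximize each percentage, which is omitted here. The function name and toy coordinates are hypothetical.

```python
import math

def gdt_ts(predicted, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for two pre-superimposed C-alpha traces.

    `predicted` and `reference` are equal-length lists of (x, y, z)
    coordinates in angstroms. Returns a score from 0 to 100.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]
    # Fraction of residues within each cutoff, then averaged over cutoffs.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(fractions)

# Toy example: four residues displaced by 0.5, 1.5, 3.0 and 10.0 angstroms.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]
score = gdt_ts(predicted, reference)
print(score)  # 56.25
```

In the toy case, 25%, 50%, 75% and 75% of residues fall within the four cutoffs, which averages to 56.25. A score near 90 requires most residues to sit within about 1 to 2 Å of their experimental positions.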
How AlphaFold2 Works
AlphaFold2’s accuracy came from a custom architecture, not an off-the-shelf network. It begins with a multiple sequence alignment (MSA) — gathering evolutionarily related sequences across species, because amino acids that mutate together tend to be physically close in the folded structure (co-evolution carries 3-D information). Its central module, the Evoformer, uses attention to pass information back and forth between the sequence-alignment representation and a residue-pair representation, iteratively reasoning about geometry. A final structure module outputs explicit 3-D atomic coordinates and, crucially, a per-residue confidence score (pLDDT) telling biologists which parts to trust.
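The co-evolution signal mentioned above can be illustrated with a classical statistic: the mutual information between two columns of an alignment. This is only a sketch of why an MSA carries 3-D information. It is not how the Evoformer works (AlphaFold2 learns from the alignment through attention rather than computing this statistic), and the toy alignment and function names are invented for illustration.

```python
import math
from collections import Counter

def column(msa, index):
    """Return the residues found at one alignment position."""
    return [sequence[index] for sequence in msa]

def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    High values mean the two positions tend to mutate together, which
    is the classic hint that they are in contact in the folded protein.
    """
    n = len(msa)
    counts_i = Counter(column(msa, i))
    counts_j = Counter(column(msa, j))
    counts_ij = Counter(zip(column(msa, i), column(msa, j)))
    total = 0.0
    for (a, b), count in counts_ij.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        total += p_ab * math.log2(p_ab / (p_a * p_b))
    return total

# Hypothetical toy alignment: positions 1 and 3 always change together
# (K pairs with E, R pairs with D); position 2 varies independently.
toy_msa = ["AKLE", "AKLE", "ARLD", "ARLD", "AKIE", "ARID"]

mi_coupled = mutual_information(toy_msa, 1, 3)
mi_independent = mutual_information(toy_msa, 1, 2)
print(f"columns 1 and 3: {mi_coupled:.2f} bits")
print(f"columns 1 and 2: {mi_independent:.2f} bits")
```

In the toy alignment, the coupled pair scores one full bit while the independent pair scores essentially zero. Real alignments are far noisier, which is part of why a learned model outperforms simple statistics like this one.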
The Database: 200 Million Structures, Free
DeepMind’s most consequential decision was not technical but distributive. In July 2021 it published the AlphaFold2 method and source code openly and, with the European Bioinformatics Institute (EMBL-EBI), launched the AlphaFold Protein Structure Database. It initially covered the human proteome and a few model organisms, and by 2022 had expanded to predicted structures for virtually all ~200 million proteins catalogued in UniProt. Free, no login, no fee. Within a few years it had been used by over two million researchers in 190+ countries, accelerating work on antibiotic resistance, neglected tropical diseases, and plastic-eating enzymes: research that simply could not have waited years per structure.
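For readers who download a structure from the database, the per-residue confidence (pLDDT) is stored in the B-factor field of PDB-format files, and the database colors models using four bands (above 90, 70 to 90, 50 to 70, below 50). The sketch below reads those values using the fixed-column PDB layout. The records are synthetic ones generated by a hypothetical helper, not real database entries, and all function names are illustrative.

```python
def plddt_band(score):
    """Map a pLDDT score (0-100) to the database's confidence bands."""
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"

def read_plddt(pdb_text):
    """Return {residue_number: pLDDT} from the C-alpha ATOM records.

    Uses PDB fixed columns: atom name 13-16, residue number 23-26,
    B-factor (here, pLDDT) 61-66.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores

def toy_ca_record(residue_number, plddt):
    """Build a synthetic C-alpha ATOM record (illustration only)."""
    return (
        f"ATOM  {residue_number:5d}  CA  ALA A{residue_number:4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}           C"
    )

toy_pdb = "\n".join(
    toy_ca_record(number, plddt)
    for number, plddt in [(1, 96.5), (2, 82.1), (3, 43.0)]
)
scores = read_plddt(toy_pdb)
for residue_number, plddt in scores.items():
    print(residue_number, plddt, plddt_band(plddt))
```

Regions in the lowest band are often flexible or disordered, so a practical habit is to mask or down-weight those residues before drawing conclusions from a predicted model.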
AlphaFold3 and the Licensing Fight
In May 2024, AlphaFold3 broadened the scope from single proteins to molecular complexes — proteins bound to DNA, RNA, small-molecule drugs (ligands), and ions — the interactions that actually matter for designing medicines. It replaced the structure module with a diffusion-based generator (the same family of models behind AI image generation), reasoning directly over atom coordinates.
But AlphaFold3 launched without open code, accessible only through DeepMind’s web-based AlphaFold Server under non-commercial terms. Scientists protested in an open letter to Nature, which had published the paper, that an unreproducible method undercut peer review and open science. Under pressure, DeepMind released the AlphaFold3 model code and weights for academic use in November 2024, about six months later. The episode was a revealing collision between DeepMind-the-research-lab and DeepMind-as-part-of-Google’s-commercial-AI strategy.
The Nobel Prize, 2024
The 2024 Nobel Prize in Chemistry went half to David Baker (University of Washington) for computational protein design — building new proteins from scratch (Rosetta, RFdiffusion) — and half jointly to Demis Hassabis and John Jumper of Google DeepMind for protein structure prediction with AlphaFold2. It was one of the earliest Nobel Prizes recognizing a deep-learning system for a scientific result, arriving the same week Geoffrey Hinton and John Hopfield took the Physics prize for neural networks — a double signal that machine learning had moved to the center of science. For Hassabis it vindicated a career-long thesis: that the same techniques that mastered Go could be turned on nature’s hardest problems.
Dead End / Caution: What AlphaFold Does Not Solve
The “protein folding problem solved” headlines oversold a real but bounded achievement. AlphaFold predicts a protein’s most likely static folded shape; it does not simulate the folding process, nor fully capture how proteins flex, change conformation, or behave inside the crowded chemistry of a living cell. It is weaker on intrinsically disordered regions, on the effect of single mutations, and on novel folds with no evolutionary cousins to align against. It predicts structure, not function or dynamics. The breakthrough was enormous — and it relocated the frontier rather than dissolving it.
Fun Fact: It Inherited the Go Team’s Playbook
AlphaFold is not a detour from games to biology — it is the same lineage. DeepMind was founded on the belief that mastering games like Go would forge general problem-solving techniques transferable to science, and AlphaFold2’s lead, John Jumper, brought physics-based protein-simulation expertise into a group steeped in the AlphaGo/AlphaZero approach to searching vast structured spaces. The system that solved a 50-year biology problem was built by people who had spent the previous years teaching machines to play board games — and that was the plan all along.
📚 Sources
- Nobel Prize in Chemistry 2024 — Press release (NobelPrize.org)
- Nobel Prize in Chemistry 2024 — Popular information
- Wikipedia: AlphaFold
- Jumper et al., “Highly accurate protein structure prediction with AlphaFold,” Nature (2021)
- AlphaFold Protein Structure Database (DeepMind / EMBL-EBI)
- Chemistry World: How AI protein structure prediction and design won the Nobel prize
- CASP — Critical Assessment of Structure Prediction
- Lasker Foundation: AlphaFold — a technology for predicting protein structures