Judea Pearl and the Causal Revolution

Summary

Judea Pearl rescued probability for artificial intelligence — and then declared probability insufficient and rescued causality too. In the 1980s, when AI handled uncertainty with ad-hoc scoring rules, Pearl’s Bayesian networks showed how full probabilistic reasoning could be made computationally tractable by exploiting the structure of what depends on what; his 1988 book turned probabilistic AI from heresy into orthodoxy and laid groundwork that machine learning still stands on. He then spent the next decades building something statistics had refused to formalize for a century: a mathematics of cause and effect — causal diagrams, the do-calculus, counterfactuals — that transformed epidemiology, economics, and social science, and earned him the 2011 Turing Award. Late in life he became deep learning’s most distinguished critic, insisting that systems which only fit curves to data remain on the bottom rung of the “ladder of causation,” unable to answer why.

From Tel Aviv to the Engine Room of AI

Judea Pearl (born September 4, 1936 in Tel Aviv) studied electrical engineering at the Technion, emigrated to the United States in 1960, and took graduate degrees in engineering and physics (Ph.D., Polytechnic Institute of Brooklyn, 1965). He spent the 1960s as a hardware researcher at RCA’s David Sarnoff Research Center working on computer memory, superconducting memory included, until advances in semiconductor memory made that line of work obsolete. In 1970 he joined UCLA, where he has remained ever since, and turned to the question that would define his career: how can a machine reason under uncertainty? His first book, Heuristics (1984), analyzed the search strategies of early AI; his second would upend the field.

Bayesian Networks: Making Probability Tractable

AI in the early 1980s had a dirty secret: its systems faced uncertain evidence everywhere — symptoms, sensor readings, conflicting rules — but mainstream AI had rejected probability theory as both computationally hopeless (a joint distribution over n variables needs exponentially many numbers) and epistemologically wrong-headed. Expert systems like MYCIN papered over the gap with invented “certainty factors” (see the Dead End below).

Pearl’s answer, developed in papers from 1982 onward and named in 1985, was the Bayesian network: represent variables as nodes in a directed acyclic graph whose arrows encode direct dependence, and the joint distribution factorizes along the graph into one conditional distribution per variable given its parents. What depends on what, the kind of knowledge experts find natural to state, becomes the very structure that makes computation feasible. His belief propagation algorithm let evidence flow through such networks by local message passing between neighboring nodes, exactly and efficiently when the graph is tree-structured, updating beliefs everywhere from a new observation anywhere.
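The factorization idea can be sketched in a few lines of Python, under loudly stated assumptions: the three-variable network (Burglary -> Alarm <- Earthquake), the names P_B, P_E and P_A_GIVEN, and every probability below are hypothetical illustrations, not taken from Pearl's papers. The sketch answers a query by brute-force enumeration over the factorized joint rather than by belief propagation, so it shows what the graph saves in storage, not the message-passing algorithm itself.

```python
from itertools import product

# Hypothetical network: Burglary -> Alarm <- Earthquake.
# All numbers are made up for illustration.
P_B = {True: 0.01, False: 0.99}            # P(Burglary)
P_E = {True: 0.02, False: 0.98}            # P(Earthquake)
P_A_GIVEN = {                              # P(Alarm=True | Burglary, Earthquake)
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}


def joint(b, e, a):
    """Joint probability as the product of each node's local table."""
    p_alarm = P_A_GIVEN[(b, e)]
    return P_B[b] * P_E[e] * (p_alarm if a else 1.0 - p_alarm)


def posterior_burglary(alarm=True):
    """P(Burglary=True | Alarm=alarm), summing out Earthquake by enumeration."""
    num = sum(joint(True, e, alarm) for e in (True, False))
    den = sum(joint(b, e, alarm) for b, e in product((True, False), repeat=2))
    return num / den


def full_table_size(n):
    """Free parameters in an unstructured joint over n binary variables."""
    return 2 ** n - 1


def network_size(parent_counts):
    """Free parameters in a binary network, given each node's parent count."""
    return sum(2 ** k for k in parent_counts)


if __name__ == "__main__":
    print(round(posterior_burglary(True), 3))
    print(full_table_size(20), network_size([2] * 20))
```

Under these made-up numbers, hearing the alarm raises belief in a burglary from 1% to roughly 58%. Twenty binary variables with two parents each need 80 numbers in network form, against 1,048,575 for the unstructured table.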

His 1988 book, Probabilistic Reasoning in Intelligent Systems, consolidated the framework and became one of the most cited works in computer science. Within a decade, probabilistic methods had gone from fringe to foundation: Bayesian networks ran medical diagnosis systems and the troubleshooting wizards Microsoft shipped in Windows and Office; their mathematical descendants (graphical models, message passing) pervade modern machine learning, error-correcting decoders, and the probabilistic worldview that the deep learning era inherited even as its methods diverged.

The Causal Turn

Having legitimized probability in AI, Pearl concluded it wasn’t enough. Probability describes what we see; it cannot by itself distinguish “the barometer falls before storms” from “the barometer causes storms.” Statistics had enforced that silence for a century — “correlation is not causation” served as both warning and prohibition, with randomized experiments the only sanctioned escape.

Starting around 1990, Pearl built the prohibited mathematics. Causal diagrams make assumptions about mechanism explicit; the do-operator distinguishes observing an event from intervening to force it, P(recovery | treatment) versus P(recovery | do(treatment)), and his do-calculus (1995) supplies three inference rules, later proved complete, for deciding when the effect of an intervention can be computed from purely observational data plus the assumptions encoded in the diagram, no experiment required. Structural causal models extend the framework to counterfactuals: what would have happened to this patient, had she not been treated? His treatise Causality (2000) assembled the edifice.
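The gap between seeing and doing can be made concrete with a small exact calculation. The sketch below is an illustration under assumptions, not Pearl's own example: it posits a hypothetical diagram in which illness severity Z influences both treatment X and recovery Y, and all probabilities are invented. The interventional quantity is computed with the back-door adjustment formula, one special case of what the do-calculus licenses when Z blocks every confounding path.

```python
# Hypothetical causal diagram: Z -> X, Z -> Y, X -> Y.
# Z = 1 means severe illness; X = 1 means treated; Y = 1 means recovered.
# All probabilities are invented for illustration.
P_Z = {0: 0.5, 1: 0.5}
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}            # severe cases get treated more often
P_Y1_GIVEN_XZ = {                          # keyed by (x, z)
    (0, 0): 0.8, (1, 0): 0.9,              # mild: treatment adds 0.1
    (0, 1): 0.3, (1, 1): 0.4,              # severe: treatment adds 0.1
}


def p_x_given_z(x, z):
    """P(X=x | Z=z) from the table of P(X=1 | Z=z)."""
    p_treated = P_X1_GIVEN_Z[z]
    return p_treated if x == 1 else 1.0 - p_treated


def observe(x):
    """P(Y=1 | X=x): what we see among patients who happen to have X=x."""
    num = sum(P_Y1_GIVEN_XZ[(x, z)] * p_x_given_z(x, z) * P_Z[z] for z in P_Z)
    den = sum(p_x_given_z(x, z) * P_Z[z] for z in P_Z)
    return num / den


def intervene(x):
    """P(Y=1 | do(X=x)) by back-door adjustment: average over P(Z), not P(Z | X)."""
    return sum(P_Y1_GIVEN_XZ[(x, z)] * P_Z[z] for z in P_Z)


if __name__ == "__main__":
    print("seeing:", observe(1), observe(0))
    print("doing: ", intervene(1), intervene(0))
```

With these assumed numbers the treated group recovers less often when merely observed (0.50 versus 0.70), because sicker patients are treated more often, while forcing treatment on everyone raises recovery (0.65 versus 0.55). The adjustment is only valid if the assumed diagram is right, which is exactly the kind of assumption a causal diagram puts on the table.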

The revolution landed hardest outside computer science. Epidemiology adopted causal diagrams as standard methodology for reasoning about confounding; economics, political science, and public health followed. Questions that had been matters of taste — which variables must be controlled for, which conclusions survive observational data — became calculations. Pearl’s The Book of Why (2018, with Dana Mackenzie) carried the ladder of causation to a general audience: rung one, association (seeing); rung two, intervention (doing); rung three, counterfactuals (imagining). For “fundamental contributions to artificial intelligence through the development of a calculus for probabilistic and causal reasoning,” Pearl received the 2011 Turing Award.
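The third rung can be illustrated with the three-step recipe usually associated with structural causal models: abduction (infer the individual's unobserved background factors from what happened), action (change the treatment), and prediction (recompute the outcome). The sketch below is a toy under assumptions: the class name LinearSCM, its single linear equation, and the numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearSCM:
    """Hypothetical one-equation model: outcome = effect * treatment + noise."""

    effect: float

    def outcome(self, treatment, noise):
        """The structural equation itself."""
        return self.effect * treatment + noise

    def abduct_noise(self, treatment, outcome):
        """Abduction: recover this individual's noise term from the observation."""
        return outcome - self.effect * treatment

    def counterfactual(self, observed_treatment, observed_outcome, new_treatment):
        """Action and prediction: keep the abducted noise, swap the treatment."""
        noise = self.abduct_noise(observed_treatment, observed_outcome)
        return self.outcome(new_treatment, noise)


if __name__ == "__main__":
    model = LinearSCM(effect=2.0)
    # A patient was treated (1) and scored 5.0; what if she had not been treated?
    print(model.counterfactual(1, 5.0, 0))
```

The answer is specific to the one patient whose noise term was recovered, which is what separates a counterfactual from a population-level intervention.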

The Critic of Curve Fitting

Pearl spent the deep-learning boom as its friendliest heretic. Today’s neural networks, he argues — however spectacular at generation and prediction — operate on rung one of the ladder: they fit exquisitely flexible curves to observational data, and no amount of data alone can lift a learner to rungs two and three. A system that cannot represent intervention and counterfactuals cannot understand why, cannot transfer reliably to changed conditions, and cannot explain itself. “All the impressive achievements of deep learning amount to just curve fitting,” he told an interviewer in 2018 — not a dismissal, he insisted, but a map of what remains: machines with causal models of their world. Whether the path to that goal runs through his calculus or through scale remains one of AI’s live disputes.

Daniel

In 2002, Pearl’s son Daniel Pearl, the South Asia bureau chief of the Wall Street Journal, was kidnapped and murdered by al-Qaeda-linked terrorists in Karachi, Pakistan. Judea and Ruth Pearl founded the Daniel Pearl Foundation, promoting cross-cultural understanding through journalism and music, and together edited I Am Jewish (2004), a collection of personal reflections inspired by Daniel’s last words. He has written that his scientific work on cause and effect and his public work against hatred became, after 2002, two halves of the same life.

⚠️ Dead End: Certainty Factors and Ad-Hoc Uncertainty

Before Bayesian networks, AI’s flagship systems managed uncertainty with invented arithmetic. MYCIN’s certainty factors attached numbers between −1 and +1 to rules and combined them with formulas chosen for convenience; other systems used scores, weights, or early fuzzy hybrids. The schemes worked in narrow demos and collapsed under analysis: researchers showed the combination rules harbored hidden independence assumptions that were routinely violated, so chaining rules could yield confidently wrong conclusions, and the numbers experts supplied had no coherent interpretation at all. The approach was abandoned as probabilistic methods matured — a textbook case of a field accepting mathematical incoherence because the coherent alternative looked intractable, until Pearl showed it wasn’t. The certainty-factor episode survives mainly as the cautionary opening chapter in accounts of why Bayesian networks won.
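The hidden independence assumption is easy to see in code. The sketch below uses the parallel-combination rule as it is commonly given in textbook accounts of MYCIN and its successor EMYCIN; the source text does not state the formula, so treat it, and the function name combine_cf, as assumptions made for illustration.

```python
def combine_cf(cf1, cf2):
    """Combine two certainty factors in [-1, 1] that bear on the same hypothesis.

    Rule as commonly presented for MYCIN/EMYCIN (an assumption here).
    """
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 * (1 - cf1)       # two confirming rules reinforce
    if cf1 <= 0 and cf2 <= 0:
        return cf1 + cf2 * (1 + cf1)       # two disconfirming rules reinforce
    denom = 1 - min(abs(cf1), abs(cf2))    # mixed signs partly cancel
    if denom == 0:
        raise ValueError("combination is undefined for +1 against -1")
    return (cf1 + cf2) / denom


if __name__ == "__main__":
    # Two rules fire with CF 0.6 each, but both were triggered by the
    # same underlying symptom, so no new evidence has actually arrived.
    once = 0.6
    twice = combine_cf(once, 0.6)
    thrice = combine_cf(twice, 0.6)
    print(once, twice, thrice)
```

The rule has no way of knowing that the two 0.6 values came from the same symptom, so confidence climbs from 0.6 to about 0.84 and then about 0.94 on evidence that was counted three times. A Bayesian network avoids this by making the shared cause an explicit node.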

Fun Fact: The Other Pearl Effect

Pearl has a scientific legacy that predates AI entirely: as a young electrical engineer at RCA in 1964, he analyzed the behavior of magnetic vortices in thin superconducting films. Those vortices are known in condensed-matter physics to this day as Pearl vortices, with their characteristic “Pearl length.” He is plausibly the only person commemorated both in superconductivity textbooks and on the Turing Award roll.