Newell and Simon: The Thinking Machines

Summary

In the summer of 1956, Allen Newell and Herbert Simon demonstrated a program that could prove theorems in mathematical logic, not by exhaustive search but by using heuristics modeled on how humans actually think through problems. It was the first AI program to do something that had previously required human intelligence, and it launched one of the most ambitious research programs in the history of science: the hypothesis that human thinking is a form of computation and can therefore be modeled as a computer program. Newell and Simon spent the next three decades trying to prove this hypothesis, and in 1975 they jointly received the Turing Award for the attempt.

Two Lives on a Collision Course

Allen Newell was born in San Francisco on March 19, 1927, the son of a radiologist. He grew up in comfortable circumstances, studied physics at Stanford, and in 1950 took a research position at the RAND Corporation in Santa Monica, the think tank that the Air Force had established to bring scientific rigor to military planning. RAND in the early 1950s was one of the most intellectually intense environments in America, filled with game theorists, economists, mathematicians, and psychologists trying to apply the methods of operations research to national strategy. Newell was assigned to work on air defense, specifically on how to coordinate the hundreds of operators who monitored radar displays and tracked aircraft. He became convinced that understanding how humans processed information was central to designing effective systems.

Herbert Alexander Simon was born in Milwaukee on June 15, 1916. His background was utterly different: he studied political science and economics at the University of Chicago, spent years studying municipal government and administrative behavior, and in 1947 published a book, Administrative Behavior, that reframed organizational decision-making as an information-processing problem. The book argued that real organizations did not optimize — as classical economic theory assumed — but made decisions under constraints of limited information, limited cognitive capacity, and limited time. This insight would eventually win Simon the Nobel Prize in Economics in 1978.

Simon joined Carnegie Institute of Technology (later Carnegie Mellon University) in 1949 as a professor in the Graduate School of Industrial Administration. He came to RAND as a consultant, and in 1952 Newell was assigned to work with him. They discovered, almost immediately, that they were thinking about the same questions from complementary directions. Newell was an engineer who wanted to understand how humans worked so he could build better systems. Simon was a social scientist who wanted a computational theory of human cognition. They would spend forty years trying to build both simultaneously.

The Logic Theorist: Program as Proof of Concept

Through 1955 and into 1956, Newell and Simon, working with programmer J.C. Shaw, built the Logic Theorist, a program that could prove theorems in the propositional calculus of Principia Mathematica, the three-volume work (published between 1910 and 1913) in which Bertrand Russell and Alfred North Whitehead had attempted to derive all of mathematics from logical axioms. Principia was both a monument of intellectual ambition and, to most mathematicians, an unreadable slog. The Logic Theorist would try to prove the first fifty-two theorems of its second chapter.

The approach was not brute force. Brute force, meaning trying all possible proof sequences, was possible in principle but computationally infeasible in practice; the search space was astronomically large. Instead, Newell and Simon used heuristics: rules of thumb derived from observing how mathematicians actually solved problems. The program would compare the theorem to be proved against known theorems, looking for structural similarities that might suggest useful substitutions. It would work backward from the goal, identifying what intermediate results would be needed. It would use analogy, finding a proof of a related theorem and adapting it.

The Logic Theorist proved 38 of the 52 theorems in the second chapter of Principia Mathematica. For one theorem, it found a proof shorter and more elegant than Russell and Whitehead’s original — so elegant that Simon, delighted, tried to publish it in the Journal of Symbolic Logic as a joint paper by Newell, Simon, and the Logic Theorist. The editor rejected the submission, reportedly questioning whether a computer program could be a co-author. The proof itself was not published until years later.

At the Dartmouth Conference in the summer of 1956, Newell and Simon demonstrated the Logic Theorist alongside John McCarthy’s nascent AI programs and other early systems. The demonstration was the most concrete evidence anyone had yet produced that machines could do something that looked like reasoning — not retrieving stored facts, not computing numerically, but generating novel logical arguments.

Information Processing Language

To build the Logic Theorist, Newell and Simon had needed a programming language capable of manipulating symbolic structures (lists, trees, patterns, expressions) rather than numbers. No such language existed in 1955. FORTRAN, still in development at the time and not released until 1957, was designed for numerical computation and would have been unsuited to the task in any case.

They built Information Processing Language (IPL), which predated LISP by about two years. IPL was effectively assembly language for a hypothetical list-processing machine: low-level, difficult to use, but capable of the symbolic operations that AI programs required. It established the pattern that John McCarthy would refine more elegantly in LISP: AI programming needed list-based data structures, recursive processing, and symbolic rather than numeric operations.
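To make "list-based data structures, recursive processing, and symbolic rather than numeric operations" concrete, here is a minimal sketch in Python rather than IPL. It is an illustration only: the nested-tuple representation and the function name substitute are assumptions chosen for this example, not a reconstruction of IPL or of the Logic Theorist's code. It shows a logical expression stored as a tree and a recursive substitution of one sub-expression for a variable, the kind of operation a theorem prover performs constantly.

```python
def substitute(expr, bindings):
    """Return a copy of expr with variables replaced according to bindings.

    An expression is either a string (a variable such as "p") or a tuple
    whose first element is an operator name and whose remaining elements
    are sub-expressions.
    """
    if isinstance(expr, str):
        # A variable: replace it if a binding exists, otherwise keep it.
        return bindings.get(expr, expr)
    op, *args = expr
    # A compound expression: keep the operator, recurse into each operand.
    return (op, *(substitute(arg, bindings) for arg in args))


# "q implies (p or q)", written as a nested structure (illustrative formula).
formula = ("implies", "q", ("or", "p", "q"))

# Replace the variable p with the expression "not p".
result = substitute(formula, {"p": ("not", "p")})
print(result)
```

Nothing in the function does arithmetic; the work is entirely in walking and rebuilding a tree of symbols, which is what numerically oriented languages of the period made awkward.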

IPL was not widely adopted outside Newell and Simon’s own group, but its conceptual contribution was real: it demonstrated the need for a new class of programming language, and it influenced McCarthy’s design of LISP directly. For the LISP story, see John McCarthy and LISP.

The General Problem Solver

In 1957, Newell and Simon began their most ambitious project: a program that would not be specialized for theorem proving or any other specific domain, but would apply a single general reasoning strategy to any problem that could be represented in its formalism. They called it the General Problem Solver (GPS).

GPS’s method was means-ends analysis — a reasoning strategy that Simon and Newell had identified by watching human subjects think aloud while solving problems (a technique they called protocol analysis). Means-ends analysis worked as follows: compare the current state of the problem to the desired goal state; identify the most significant difference between them; find an operator — an action — that would reduce that difference; apply the operator; repeat until the current state matches the goal. If an operator’s preconditions were not met, create a subgoal to meet them first, and apply means-ends analysis recursively.
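The loop just described can be sketched in a few dozen lines of Python. This is an illustrative reconstruction under simplifying assumptions, not the original program: the Operator class, the function names, and the tea-making domain are all hypothetical, states are plain sets of true conditions, and differences are taken in the order given rather than ranked by significance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    name: str
    preconds: frozenset  # conditions that must hold before the operator applies
    adds: frozenset      # conditions the operator makes true
    deletes: frozenset   # conditions the operator makes false


def achieve_all(state, goals, ops, stack=()):
    """Achieve every goal in turn; return (final_state, plan) or None."""
    plan = []
    for goal in goals:
        result = achieve(state, goal, ops, stack)
        if result is None:
            return None
        state, steps = result
        plan += steps
    # A later operator may have undone an earlier goal, so re-check them all.
    if not all(goal in state for goal in goals):
        return None
    return state, plan


def achieve(state, goal, ops, stack):
    """Remove one difference between the state and the goal."""
    if goal in state:
        return state, []   # no difference to reduce
    if goal in stack:
        return None        # already working on this goal: avoid a cycle
    for op in ops:
        if goal not in op.adds:
            continue       # operator does not reduce this difference
        # Subgoal: establish the operator's preconditions first (recursion).
        result = achieve_all(state, sorted(op.preconds), ops, stack + (goal,))
        if result is None:
            continue
        new_state, steps = result
        new_state = (new_state - op.deletes) | op.adds
        return new_state, steps + [op.name]
    return None


# Hypothetical toy domain: making a cup of tea.
OPS = [
    Operator("fill kettle", frozenset({"have kettle"}),
             frozenset({"kettle full"}), frozenset()),
    Operator("boil water", frozenset({"kettle full"}),
             frozenset({"water hot"}), frozenset()),
    Operator("brew tea", frozenset({"water hot", "have teabag"}),
             frozenset({"tea ready"}), frozenset({"have teabag"})),
]

start = frozenset({"have kettle", "have teabag"})
final_state, plan = achieve_all(start, ["tea ready"], OPS)
print(plan)
```

The solver itself knows nothing about tea. Swapping in a different operator list changes the kind of problem it handles, which is the sense in which the method is domain-independent.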

The method was domain-independent in principle. The domain-specific knowledge lived entirely in the representation of states, the specification of operators, and the definition of difference. Change those, and GPS would solve a different kind of problem. Newell and Simon tested GPS on symbolic logic, chess endgames, the Tower of Hanoi, and various other puzzle problems. It worked. It was slow, and it scaled poorly to difficult problems, but it worked.

Info

GPS was published as a running program in 1957 and described in a series of technical reports and papers through the early 1960s. The research program it represented — building general-purpose AI through explicit problem-solving methods — was enormously influential on subsequent work in AI planning, including STRIPS (1971), the planning language used in robotics and automated planning systems for decades. Means-ends analysis, as a concept, became standard vocabulary in AI and cognitive science.

Bounded Rationality and the Nobel Prize

While Newell pursued the computational aspects of their collaboration, Simon’s parallel career as an economist and organizational theorist continued to develop. His central insight — that real decision-makers did not behave like the perfectly rational agents of classical economics — crystallized into a full theory of bounded rationality.

Real people, Simon argued, operated under three fundamental constraints: limited information about the world, limited cognitive capacity to process information, and limited time to make decisions. Under these constraints, optimal decision-making was impossible. Real people instead satisficed — they searched through available options until they found one that was “good enough” by some aspiration level, and then stopped searching. The word “satisfice” combined “satisfy” and “suffice”: good enough, not best.
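The difference between satisficing and optimizing can be shown with a small sketch. The function names, the list of offers, and the aspiration level of 70 are hypothetical values chosen for illustration; Simon's own treatment was considerably more general.

```python
def satisfice(options, score, aspiration):
    """Return the first option that is good enough, and how many were examined."""
    examined = 0
    for option in options:
        examined += 1
        if score(option) >= aspiration:
            return option, examined   # stop searching as soon as one suffices
    return None, examined             # nothing met the aspiration level


def optimize(options, score):
    """Return the best option; this requires examining every one."""
    options = list(options)
    return max(options, key=score), len(options)


# Hypothetical offers arriving one at a time, scored by their face value.
offers = [52, 61, 75, 90, 68, 83]

satisficed = satisfice(offers, score=lambda x: x, aspiration=70)
optimal = optimize(offers, score=lambda x: x)
print(satisficed, optimal)
```

The satisficer accepts 75 after looking at three offers; the optimizer finds 90 only by looking at all six. When options are costly to generate or evaluate, that saving is the whole point.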

Bounded rationality was a direct challenge to the rational actor model that underlay most of twentieth-century economics, and it required decades to gain acceptance. Simon received the Nobel Prize in Economic Sciences in 1978, with the citation specifically noting his work on decision-making in organizations and his development of bounded rationality as a theoretical concept. He remains one of very few people to win the Nobel Prize while also winning the Turing Award.

The connection between bounded rationality and the General Problem Solver was direct: GPS’s means-ends analysis was satisficing. The program stopped searching when it found a proof or a solution, not the best possible proof or solution. Simon believed that understanding the limits of human cognition was not a limitation to apologize for but a design principle — both for building AI systems and for designing organizations.

The Physical Symbol System Hypothesis

In their Turing Award lecture, published in 1976, Newell and Simon gave the most important statement of the theoretical principles underlying their entire research program. The central claim was the Physical Symbol System Hypothesis:

“A physical symbol system has the necessary and sufficient means for general intelligent action.”

A physical symbol system is any device — biological or artificial — that reads and writes symbols, stores them, copies them, and applies formal rules to operate on them. Digital computers are physical symbol systems. Brains, Newell and Simon argued, are also physical symbol systems, operating on neural representations that function as symbols. The hypothesis had two parts.

The necessity claim: any system that exhibits general intelligent behavior must, at some level of description, be a physical symbol system. Intelligence requires representing the world in symbols, manipulating those symbols by formal rules, and using the results to guide action. A system that did not do this could not exhibit general, flexible intelligence.

The sufficiency claim: any physical symbol system, given sufficient memory and speed, can exhibit general intelligent behavior. Intelligence is not magic; it does not require anything beyond symbol manipulation.

The hypothesis was a manifesto for the computational theory of mind: the claim that thinking is information processing, that cognition can be understood as computation, and that therefore artificial general intelligence is achievable in principle. It was also, for critics, the primary target. John Searle’s Chinese Room argument (1980) was a direct attack on the sufficiency claim: a system can manipulate symbols according to formal rules without understanding anything, Searle argued, and therefore symbol manipulation alone cannot constitute genuine intelligence. The debate continues.

The Prediction and the Long Road to Deep Blue

In 1957, Simon made a prediction that became one of the most cited examples of AI overconfidence: “Within ten years, a computer will be the world’s chess champion.” He was off by three decades. Deep Blue, IBM’s specialized chess computer, defeated world champion Garry Kasparov in 1997, forty years after Simon’s prediction rather than ten.

The chess prediction illustrated something important about the difficulty of predicting progress in AI. The goal was achieved — a computer did become the world chess champion — but by methods very different from those Newell and Simon were pursuing in 1957. Deep Blue’s success came not from general problem-solving methods modeled on human reasoning but from specialized hardware, massive databases of opening theory and endgame tablebases, and brute-force search of hundreds of millions of positions per second. It was, in the vocabulary Newell and Simon would have used, a search machine, not a reasoning machine.

Info

The arc from the Logic Theorist to Deep Blue illustrates a recurring pattern in AI: a task that seems to require intelligence turns out, when solved by a machine, to have been solved in ways that bypass the kinds of intelligence humans use. Kasparov himself, reflecting on his loss to Deep Blue, observed that the machine did not play like a person at all — it did not get nervous, did not press advantages for psychological reasons, did not make strategically meaningful sacrifices. It played a kind of chess that was objectively strong but humanly alien.

Carnegie Mellon and the Legacy

Both Newell and Simon spent most of their careers at Carnegie Mellon, which became, partly through their influence, one of the world’s premier computer science research institutions. The cognitive simulation tradition they built there (protocol analysis, production systems, computational models of memory and learning) generated decades of work in cognitive psychology, AI, and human-computer interaction.

Newell’s final major project was SOAR (originally an acronym for State, Operator And Result), a cognitive architecture, that is, a general computational model of human cognition, developed with John Laird and Paul Rosenbloom in the 1980s. SOAR attempted to implement a unified theory of cognition: all human cognitive functions, from simple reaction to complex problem solving, arising from a single underlying mechanism. It remains an active research system and has been used in both AI applications and cognitive psychology experiments.

Allen Newell was diagnosed with prostate cancer in 1991. He continued working until near the end, delivering a keynote at the 1991 ACM SIGART conference while seriously ill. He died in Pittsburgh on July 19, 1992. He was sixty-five.

Herbert Simon continued teaching and writing at Carnegie Mellon until very close to his death. He produced influential work in economics, psychology, political science, philosophy of science, and computer science, a range of output that few other figures in twentieth-century intellectual life matched. He died in Pittsburgh on February 9, 2001, at eighty-four.

Their Turing Award citation, from 1975, credited them with “basic contributions to artificial intelligence, the psychology of human cognition, and list processing,” an unusual citation in acknowledging contributions across the boundary between computer science and psychology. The boundary itself was something Newell and Simon had, more than anyone, helped to dissolve.

Dead End: The General Problem Solver’s Limitations

The GPS research program ultimately ran into limits that no amount of refinement could resolve. GPS worked reliably on toy problems: logical transformations, puzzles with small state spaces, carefully defined domain tasks. It failed to scale to real problems.

Warning

The fundamental obstacle was combinatorial explosion. In real problem domains — planning a complex engineering project, diagnosing a medical condition, understanding natural language — the number of states that a means-ends search needed to consider grew exponentially with problem size. No heuristic strategy could reduce this explosion to manageable proportions without enormous amounts of domain-specific knowledge. And when that knowledge was added, the result was not a general problem solver but a specialized expert system — capable within its domain, useless outside it. Expert systems of the 1970s and 1980s were essentially GPS with enough specific knowledge to solve specific problems. They were commercially valuable in narrow applications and collectively fragile: a medical expert system that was brilliant at diagnosing liver conditions had no resources at all for any problem it had not been explicitly programmed to handle. The brittleness of knowledge-based AI — its total inability to generalize beyond programmed knowledge — ultimately drove the field toward statistical and neural approaches that Newell and Simon had regarded with skepticism. Whether those approaches constitute the kind of intelligence the Physical Symbol System Hypothesis described, or represent something categorically different, is one of the central open questions of contemporary AI research.
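A rough calculation shows how quickly the explosion sets in. Assume, purely for illustration, that every state offers the same number of applicable operators (the branching factor) and that a solution needs a fixed number of steps; the figures below are hypothetical and do not come from any GPS experiment.

```python
def sequences_at_depth(branching, depth):
    """Count distinct operator sequences of exactly `depth` steps,
    assuming every state offers `branching` applicable operators."""
    return branching ** depth


# A small puzzle: 3 legal moves per state, solutions 10 moves long.
toy = sequences_at_depth(3, 10)

# A richer domain: 30 possible actions per state, plans 10 steps long.
real = sequences_at_depth(30, 10)

print(toy, real, real // toy)
```

Multiplying the branching factor by ten multiplies the number of ten-step sequences by ten billion. A heuristic that prunes even 99 percent of the choices at every step still leaves an enormous space in the richer domain, which is why domain-specific knowledge became unavoidable.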

The AI tradition that Newell and Simon founded and the subsequent developments that transformed the field are traced in The Rise of Artificial Intelligence.