Jim Gray and the Transaction
Summary
Every time money moves between bank accounts and either both balances change or neither does, Jim Gray’s central idea is at work. At IBM’s System R project in the 1970s he worked out the theory and machinery of the transaction (locking, logging, recovery, two-phase commit): the guarantees later canonized as ACID, which let databases promise an all-or-nothing world atop crash-prone machines. He spent the next decades making the idea industrial at Tandem and DEC, popularizing the vocabulary of reliability (“Heisenbug”) and codifying the economics of storage (the five-minute rule). At Microsoft he co-invented the data cube of analytics, put terabytes of satellite imagery and a digital survey of the sky online, and evangelized data-intensive science as a “fourth paradigm” of discovery. He won the 1998 Turing Award. In January 2007 the man who taught computers never to lose data sailed out of San Francisco Bay alone and vanished without a trace, despite an unprecedented crowdsourced search mounted by an industry that loved him.
Berkeley’s First
James Nicholas Gray (born January 12, 1944, in San Francisco) earned his B.S. in mathematics and engineering at UC Berkeley in 1966. He stayed on to receive, in 1969, the first Ph.D. granted by Berkeley’s newly created computer science department, working under Michael Harrison on programming language theory. After a stint at Bell Labs he joined IBM Research, and in 1972 he arrived at the San Jose lab just as it was about to test whether Edgar Codd’s relational model could be made real.
System R and the Transaction
System R (1974–1979) was one of the first full relational database engines and the birthplace of SQL. Gray’s domain was the part with no margin for error: what happens when hundreds of users update the same data at once and the machine can crash at any instant. With Raymond Lorie, Gianfranco Putzolu, and Irving Traiger, he developed two things. The first was granular locking, in which locks can be taken on records, pages, or whole tables, with “intention” locks coordinating between levels. The second was the set of degrees of consistency that became the basis of the isolation levels in the SQL standard. His work on write-ahead logging and recovery, and on two-phase commit for transactions spanning machines (see Distributed Systems), completed the toolkit.
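Granular locking is easiest to see through the lock-compatibility matrix commonly presented for it. IS and IX are the intention modes, and SIX combines a shared lock on a subtree with the intent to update parts of it. The Python sketch below is a minimal illustration, not System R’s implementation. The table/page/record hierarchy and the names `lock_plan` and `can_grant` are hypothetical. The key rule: before locking a record, a transaction takes the matching intention mode on every ancestor. That way a conflicting whole-table lock is detected at the table node without inspecting individual records.

```python
from enum import Enum


class Mode(Enum):
    IS = "IS"    # intends to take S locks on finer-grained nodes below
    IX = "IX"    # intends to take X locks on finer-grained nodes below
    S = "S"      # shared: read the whole subtree
    SIX = "SIX"  # read the whole subtree, and intends to X-lock parts of it
    X = "X"      # exclusive: write the whole subtree


T, F = True, False
_ORDER = [Mode.IS, Mode.IX, Mode.S, Mode.SIX, Mode.X]
_MATRIX = [  # rows: mode held by another transaction; columns: mode requested
    # IS IX  S  SIX X
    [T,  T,  T,  T,  F],  # IS
    [T,  T,  F,  F,  F],  # IX
    [T,  F,  T,  F,  F],  # S
    [T,  F,  F,  F,  F],  # SIX
    [F,  F,  F,  F,  F],  # X
]
COMPATIBLE = {held: dict(zip(_ORDER, row)) for held, row in zip(_ORDER, _MATRIX)}


def lock_plan(path, leaf_mode):
    """Locks to request, root first, before touching path[-1] in leaf_mode.

    Readers (S, IS) need IS on every ancestor; writers (X, IX, SIX) need IX.
    """
    intention = Mode.IS if leaf_mode in (Mode.S, Mode.IS) else Mode.IX
    return [(node, intention) for node in path[:-1]] + [(path[-1], leaf_mode)]


def can_grant(held_by_others, requested):
    """Grant only if compatible with every mode other transactions hold on the node."""
    return all(COMPATIBLE[held][requested] for held in held_by_others)


# Usage with hypothetical names: transaction A updates one record,
# so it first takes IX on the table and the page.
plan_a = lock_plan(["accounts", "page7", "rec42"], Mode.X)
table_locks = [mode for node, mode in plan_a if node == "accounts"]
# Transaction B wants S on the whole table for a scan: it must wait.
print(can_grant(table_locks, Mode.S))   # False
# Transaction C only wants to read some other record: IS on the table is fine.
print(can_grant(table_locks, Mode.IS))  # True
```

The payoff is that a table scan and a single-record update discover their conflict with one check at the top of the hierarchy. Two transactions touching different records both hold compatible intention locks and proceed in parallel.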
Gray’s 1981 paper “The Transaction Concept: Virtues and Limitations” distilled the worldview: bundle work into units that are atomic, consistent, and durable, and recovery becomes a solved problem rather than a heroic effort repeated in every application. Theo Härder and Andreas Reuter coined the acronym ACID in 1983. It names those three properties plus isolation, which Gray’s locking work had addressed. His thousand-page book with Reuter, Transaction Processing: Concepts and Techniques (1992), remains the field’s bible. The transaction abstraction proved as portable as it was deep: it now underlies airline reservations, stock exchanges, e-commerce checkouts, and the journaling in your laptop’s file system.
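The all-or-nothing guarantee can be illustrated with an undo log that obeys the write-ahead rule. Before a value is overwritten, its old value is recorded in the log, so an interrupted transaction can be rolled back. The sketch below is a toy in-memory illustration under stated assumptions: a Python dict stands in for the database and a list for the log, and the `Transaction` class and `transfer` function are hypothetical. It shows only atomicity. Real systems also force log records to stable storage to obtain durability, which this example omits.

```python
_MISSING = object()  # marks keys that did not exist before the write


class Transaction:
    """Toy transaction over a dict, using an in-memory undo log (hypothetical API)."""

    def __init__(self, db):
        self.db = db
        self.undo_log = []  # (key, old_value) pairs

    def write(self, key, value):
        # Write-ahead rule: record the old value BEFORE changing the data.
        self.undo_log.append((key, self.db.get(key, _MISSING)))
        self.db[key] = value

    def commit(self):
        # A real system would force a commit record to stable storage here;
        # once that record is durable, the undo information is no longer needed.
        self.undo_log.clear()

    def abort(self):
        # Undo newest-first so each key ends at its pre-transaction value.
        for key, old in reversed(self.undo_log):
            if old is _MISSING:
                self.db.pop(key, None)
            else:
                self.db[key] = old
        self.undo_log.clear()


def transfer(db, src, dst, amount):
    """Move amount from src to dst: both balances change or neither does."""
    txn = Transaction(db)
    try:
        txn.write(src, db[src] - amount)
        if db[src] < 0:
            raise ValueError("insufficient funds")
        txn.write(dst, db[dst] + amount)
        txn.commit()
    except Exception:
        txn.abort()
        raise


accounts = {"alice": 100, "bob": 50}
transfer(accounts, "alice", "bob", 30)
try:
    transfer(accounts, "alice", "bob", 500)  # fails after debiting alice
except ValueError:
    pass
print(accounts)  # {'alice': 70, 'bob': 80}: the partial debit was rolled back
```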
Tandem: Making Failure Boring
At Tandem Computers (1980–1990), builder of the fault-tolerant NonStop systems that ran exchanges and ATM networks, Gray studied failure empirically. His 1985 report “Why Do Computers Stop and What Can Be Done About It?” analyzed real outage data and popularized a now-standard taxonomy: the Bohrbug, solid and reproducible, versus the Heisenbug, which evaporates when you attach a debugger. The report also showed why transactional retry plus process pairs turns flaky software into reliable service. Most software faults seen in production are transient Heisenbugs, so rolling back the failed transaction and rerunning it on a backup process usually succeeds. With Putzolu he formulated the five-minute rule (1987): a page referenced about once every five minutes, or more often, is cheaper to keep in RAM than to re-read from disk each time. The rule is a clean economic law that the field has recomputed periodically ever since, and it still guides cache and storage hierarchy design.
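The five-minute rule is a break-even calculation. Keeping a page in RAM costs that page’s share of memory. Fetching it on demand instead costs a share of a disk’s limited random accesses per second. The two costs are equal at a reference interval T = (pages per MB of RAM ÷ accesses per second per disk) × (price per disk ÷ price per MB of RAM). This is the form in which the rule is usually restated. The Python sketch below computes T. The function names are hypothetical, and the input prices are round figures chosen for the arithmetic, not data from the 1987 paper.

```python
def break_even_seconds(pages_per_mb_ram, accesses_per_sec_per_disk,
                       price_per_disk, price_per_mb_ram):
    """Reference interval T (seconds) at which caching a page costs the same
    as re-reading it from disk on every reference.

    Keeping one page in RAM costs price_per_mb_ram / pages_per_mb_ram.
    A page read every T seconds uses 1/T accesses/sec of a disk whose
    capacity costs price_per_disk / accesses_per_sec_per_disk per access/sec.
    Setting the two costs equal and solving for T gives the result below.
    """
    ram_cost_per_page = price_per_mb_ram / pages_per_mb_ram
    disk_cost_per_access_per_sec = price_per_disk / accesses_per_sec_per_disk
    return disk_cost_per_access_per_sec / ram_cost_per_page


def keep_in_ram(reference_interval_s, break_even_s):
    """Pages referenced at least once per break-even interval belong in RAM."""
    return reference_interval_s <= break_even_s


# Hypothetical inputs: 8 KB pages (128 per MB), a disk doing 64 random
# accesses/sec for $2000, and RAM at $15 per MB.
t = break_even_seconds(128, 64, 2000, 15)
print(f"{t:.0f} s = {t / 60:.1f} min")  # 267 s = 4.4 min
```

Because T scales with the ratio of disk price to RAM price, the rule has to be recomputed whenever hardware prices shift. That is why the field keeps revisiting it.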
In the same years he quietly created the discipline of database benchmarking: a 1985 Datamation paper, “A Measure of Transaction Processing Power,” defined the debit-credit workload and was published under the byline “Anon et al.” Its workload and metrics evolved into the TPC benchmarks on which the database industry has measured itself since the late 1980s (see The Database Wars).
Microsoft: Putting the Sky in a Database
After a period at DEC, Gray joined Microsoft Research in 1995 as a Technical Fellow. He ran a small San Francisco lab with a simple thesis: the interesting future of databases was scientific data, arriving in volumes no one could analyze by hand. The 1996 data cube paper he co-authored gave online analytical processing its core operator. TerraServer (1998) put terabytes of aerial and satellite imagery on the public web. At the time it was among the largest databases ever exposed to the internet, and it was a precursor of modern map services. With astronomer Alex Szalay he built SkyServer, putting the Sloan Digital Sky Survey online and turning astronomy into a database discipline where discoveries are made by query.
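The CUBE operator generalizes GROUP BY. Instead of aggregating over one fixed set of grouping columns, it aggregates over every subset of them, and the data cube paper marks each rolled-up dimension with a special ALL value. The Python sketch below computes a small cube in memory to show the idea. It is not how a database engine evaluates CUBE. The sales rows and the `cube` function are hypothetical, and the string "ALL" stands in for the special value.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # stand-in for the special ALL value


def cube(rows, dims, measure):
    """SUM(measure) grouped by every subset of dims (2 ** len(dims) groupings)."""
    totals = defaultdict(int)
    for r in range(len(dims) + 1):
        for kept in combinations(dims, r):
            for row in rows:
                # Dimensions not in this grouping are rolled up to ALL.
                key = tuple(row[d] if d in kept else ALL for d in dims)
                totals[key] += row[measure]
    return dict(totals)


sales = [  # hypothetical data
    {"model": "Chevy", "year": 1994, "units": 50},
    {"model": "Chevy", "year": 1995, "units": 85},
    {"model": "Ford",  "year": 1994, "units": 60},
]

result = cube(sales, ["model", "year"], "units")
print(result[("Chevy", ALL)])  # 135: Chevy, all years
print(result[(ALL, 1994)])     # 110: all models, 1994
print(result[(ALL, ALL)])      # 195: grand total
```

One result holds the plain group-by rows, every subtotal, and the grand total. This is the cross-tab-with-margins shape that analysts previously had to assemble from several separate queries.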
Gray argued this was a general transformation: after empirical, theoretical, and computational science, data-intensive discovery was a “fourth paradigm” (see The Big Data Revolution); Microsoft’s 2009 essay collection of that name was assembled in his honor. He received the 1998 Turing Award “for seminal contributions to database and transaction processing research and technical leadership in system implementation.”
Lost at Sea
On the morning of January 28, 2007, Gray — an experienced sailor — set out alone from San Francisco aboard his 40-foot sloop Tenacious to scatter his mother’s ashes at the Farallon Islands, 27 miles offshore. The weather was clear and calm. He never returned, and no distress call was ever received. The Coast Guard searched 132,000 square miles over four days and found nothing. Then his friends — who included some of the most resourceful engineers alive — mounted something unprecedented: DigitalGlobe re-tasked a satellite over the search area, and tens of thousands of volunteers scanned the imagery, tile by tile, through Amazon Mechanical Turk; oceanographers modeled currents, and private planes flew the candidate coordinates. Not a plank, cushion, or life vest was ever found. In May 2012 a court granted his wife’s petition and declared him legally dead as of January 28, 2012 — five years to the day after he sailed. The disappearance remains unexplained — a wholly analog mystery at the end of a life spent eliminating data loss, and the crowdsourced search itself became a landmark (and a sobering benchmark) for the citizen-science methods Gray had championed.
Fun Fact: Anon et al.
Gray circulated his 1985 benchmarking paper anonymously — the published byline reads “Anon et al.” — because he had assembled it with input from two dozen colleagues across rival companies, and because no vendor’s employee could safely put a name on a paper defining how all vendors’ products would be measured. It worked: the paper carried no corporate taint, everyone adopted its workload, and one of the database industry’s most consequential papers is officially by nobody.
📚 Sources
- Wikipedia: Jim Gray (computer scientist)
- ACM A.M. Turing Award: Jim Gray (1998) — citation and profile
- Gray: The Transaction Concept — Virtues and Limitations (VLDB 1981, PDF)
- Gray & Putzolu: The 5 Minute Rule (SIGMOD 1987)
- Anon et al.: A Measure of Transaction Processing Power (Datamation 1985, PDF)
- Gray et al.: Data Cube — A Relational Aggregation Operator (1996/1997)
- NYT: Jim Gray declared dead in absentia (2012)
- The Fourth Paradigm: Data-Intensive Scientific Discovery (Microsoft Research, 2009)