Secure by Design: A History of Software Security Engineering

Summary

Most histories of computer security are told from the attacker’s side: the worms, the break-ins, the zero-day market. But there is a parallel history of the defenders: not the people responding to incidents, but the people trying to build software that does not break in the first place. This is the history of security as an engineering discipline: the reference monitor, the design principles of Saltzer and Schroeder, the buffer overflow and the decades-long arms race to neutralize it, Bill Gates halting Windows development to build security into the process itself, and the slow, contested journey toward treating memory safety as a matter of national policy. It is the story of how the industry tried to move from “penetrate and patch” to “secure by design,” and how hard that turned out to be.

The Reference Monitor: Security as a Design Goal

The idea that security should be designed into a system, rather than added afterward, is nearly as old as multi-user computing. In October 1972, James P. Anderson produced the Computer Security Technology Planning Study for the U.S. Air Force — the Anderson Report. Faced with the problem of letting users of different clearance levels share one machine, Anderson articulated the concept of a reference monitor: an abstract mechanism that mediates every access by a subject (a user or process) to an object (a file or resource), enforcing the security policy. To be trustworthy, the reference monitor had to be tamper-proof, always invoked, and small enough to be verified by exhaustive analysis.
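The reference-monitor idea can be illustrated with a minimal sketch. Everything here is hypothetical and chosen only for illustration: the class names, the numeric clearance levels, and the single `read` operation are assumptions, not part of the Anderson Report. The sketch shows the three properties in miniature: one small function mediates every access, unknown objects are denied by default, and each decision is logged.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Subject:
    name: str
    clearance: int  # hypothetical numeric clearance; higher means more trusted


@dataclass
class ReferenceMonitor:
    """Toy reference monitor: callers can reach objects only through read()."""
    # object name -> (classification level, contents); illustrative layout
    _objects: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def add_object(self, name, level, contents):
        self._objects[name] = (level, contents)

    def read(self, subject, object_name):
        entry = self._objects.get(object_name)
        # Deny by default: a missing object or insufficient clearance both fail.
        allowed = entry is not None and subject.clearance >= entry[0]
        # Record every decision, whether granted or refused.
        self.audit_log.append((subject.name, object_name, allowed))
        if not allowed:
            raise PermissionError(f"{subject.name} may not read {object_name}")
        return entry[1]
```

A real reference monitor must also be tamper-proof, which a Python object in the caller’s own process is not; the sketch only shows the mediation logic.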

The intellectual capstone came in 1975, when Jerome Saltzer and Michael Schroeder published The Protection of Information in Computer Systems, drawing on their work on MIT’s Multics operating system. Their paper distilled eight design principles that remain the canon of secure design half a century later:

The Saltzer–Schroeder Principles (1975)

Economy of mechanism — keep the design as simple and small as possible.
Fail-safe defaults — base access decisions on permission, not exclusion; deny by default.
Complete mediation — check every access to every object for authority.
Open design — security must not depend on attackers’ ignorance of the mechanism.
Separation of privilege — require more than one condition to grant access where practical.
Least privilege — every program and user operates with the minimum privileges needed.
Least common mechanism — minimize mechanisms shared between users.
Psychological acceptability — make the protection easy enough to use that people actually use it.

These principles were prescriptive, not descriptive. They told engineers how to build, decades before most of the industry was listening. “Least privilege” and “fail-safe defaults” are now repeated in every security curriculum; “open design” is the formal rejection of security through obscurity that later underpinned the disclosure debate and modern public-key cryptography.

The Orange Book and the Certification Era

If the 1970s produced the principles, the 1980s tried to turn them into a bureaucracy. In 1983 the U.S. Department of Defense, through its Computer Security Center (renamed the National Computer Security Center in 1985, an arm of the NSA), issued the Trusted Computer System Evaluation Criteria (TCSEC), universally known as the Orange Book for the color of its cover and the centerpiece of the so-called Rainbow Series. The Orange Book graded systems on a hierarchy of assurance levels, from D (minimal protection) up through C1, C2, B1, B2, B3, to A1 (verified design). It operationalized the reference monitor as the Trusted Computing Base: the portion of a system whose correctness is sufficient to enforce the security policy.

The Orange Book mattered because, for the first time, security was something a system could be formally evaluated and certified against, with government procurement dollars attached. Its successor, the international Common Criteria, was standardized as ISO/IEC 15408 in 1999 and introduced Evaluation Assurance Levels (EAL1–EAL7) still used in certifying smartcards, firewalls, and operating systems today.

Reflections on Trusting Trust

In his Turing Award lecture, published in 1984 under the title Reflections on Trusting Trust, Ken Thompson, co-creator of Unix, set a permanent outer boundary on what software security engineering can promise. Thompson described how he could modify a compiler so that it would (a) insert a backdoor whenever it compiled the login program, and (b) insert both malicious behaviors whenever it compiled a fresh copy of itself. He could then remove the evidence from the compiler’s own source, because the compromised binary would keep reinserting it. The backdoor would propagate through generations of compilers while being invisible in every line of source code anyone could inspect.

His conclusion: “You can’t trust code that you did not totally create yourself.” No amount of source review proves the absence of a backdoor if you did not also build the tools that built it. The trusting-trust attack reframed security as a question not just of code, but of the entire supply chain that produces code — a point the industry would rediscover, painfully, nearly forty years later.

Smashing the Stack: The Buffer Overflow Era

The most consequential class of vulnerability in software history is the buffer overflow, and it was the engine that turned security from a niche concern into a daily emergency. The mechanism is old (the Morris Worm of 1988 spread in part by overflowing a buffer in the Unix fingerd daemon), but it became democratized knowledge on November 8, 1996, when Aleph One (Elias Levy) published Smashing the Stack for Fun and Profit in Phrack magazine, issue 49. The article was the first widely read, step-by-step public tutorial on how writing past the end of a stack buffer in a C program could overwrite the return address and hand control of execution to an attacker. It became one of the most cited underground security papers ever written, and for a decade afterward buffer overflows dominated the exploitation landscape.

The defensive response was a layered arms race that defined applied security engineering for twenty years:

Stack canaries (1998). Crispin Cowan and colleagues presented StackGuard at the 7th USENIX Security Symposium, placing a known “canary” value before the return address; if a buffer overflow clobbered the canary, the program detected the corruption and entered a fail-safe halt instead of executing the attacker’s payload.
Non-executable memory (DEP/NX). Marking the stack and heap non-executable — shipped broadly in Windows XP Service Pack 2 (2004) using the processor’s NX bit — meant injected code could not simply be run where it landed.
Address Space Layout Randomization (ASLR). Pioneered by the PaX project in 2001 and adopted in Windows Vista (2007) and modern Linux, ASLR randomizes the memory addresses of code and data so attackers cannot reliably predict where to jump.

Each defense raised the cost of exploitation without ever fully closing the door — attackers answered canaries and DEP with return-oriented programming, and ASLR with information leaks. The buffer overflow was contained, not cured, which is precisely why the industry eventually turned to eliminating the underlying language hazard altogether.
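The stack-canary defense described above can be sketched as a simulation. This is not how StackGuard is implemented (it is a compiler extension that emits machine code); the frame layout, the sizes, and the function names below are assumptions made for illustration. A bytearray stands in for one stack frame, an unchecked copy stands in for a call like strcpy, and the canary is verified before the "return".

```python
import secrets

BUF_SIZE = 8      # bytes reserved for the local buffer (illustrative)
CANARY_SIZE = 4   # bytes reserved for the canary (illustrative)
RET_SIZE = 4      # bytes reserved for the saved return address (illustrative)


class StackSmashDetected(Exception):
    """Raised instead of 'returning' through a corrupted frame."""


def call_with_canary(user_input, return_address=b"\x10\x20\x30\x40", canary=None):
    # A fresh random canary per call; the parameter exists only so that
    # tests can pin the value.
    if canary is None:
        canary = secrets.token_bytes(CANARY_SIZE)
    # Simulated frame layout: [buffer][canary][return address]
    frame = bytearray(BUF_SIZE) + bytearray(canary) + bytearray(return_address)
    # Unchecked copy: nothing stops the input from running past the buffer.
    frame[:len(user_input)] = user_input
    # Epilogue check: any write that reached the return address had to
    # cross the canary first.
    if bytes(frame[BUF_SIZE:BUF_SIZE + CANARY_SIZE]) != canary:
        raise StackSmashDetected("canary overwritten; halting")
    ret_start = BUF_SIZE + CANARY_SIZE
    return bytes(frame[ret_start:ret_start + RET_SIZE])
```

Input that fits in the buffer returns the original address, while a 16-byte input overwrites the canary and triggers the halt. An attacker who can learn the canary value can write it back in place, which is one reason this defense raised costs without closing the door.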

The Trustworthy Computing Memo

By the turn of the millennium, Microsoft Windows was the most attacked software on earth, and a series of self-replicating worms had turned its security reputation into a business liability with government and enterprise customers. On January 15, 2002, Bill Gates sent a company-wide memo titled Trustworthy Computing, building on a white paper by CTO Craig Mundie. Its central instruction reordered Microsoft’s priorities: when forced to choose between adding a feature and securing the product, security must win. Gates set the bar at making software “as available, reliable and secure as standard services such as electricity, water services, and telephony.”

The memo was not merely rhetoric. Microsoft halted development of Windows Server 2003 for roughly two months while thousands of engineers were retrained and made to audit their own code for security flaws — an event known internally as the security “standdown” or “security push.” This was the founding moment of the Security Development Lifecycle (SDL): a mandatory process that wove security into every phase of development — training, requirements, design, threat modeling, implementation with banned unsafe functions, security testing, a final security review, and a response plan.

Threat Modeling and STRIDE

A core SDL practice is threat modeling — systematically enumerating how a design could be attacked before a line of code is written. Microsoft’s STRIDE framework, created by Loren Kohnfelder and Praerit Garg in 1999, classifies threats into six categories: Spoofing, Tampering, Repudiation, Information disclosure, Denial of service, and Elevation of privilege. STRIDE gave engineers a repeatable checklist for asking “what could go wrong here?” and became one of the most widely adopted threat-modeling methods in the industry.
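As a rough illustration of STRIDE as a repeatable checklist, the sketch below crosses each element of a design with the six categories and produces one prompt per pair. The function name and the example elements are hypothetical, and applying every category to every element is a simplification rather than Microsoft’s documented procedure.

```python
from enum import Enum


class Stride(Enum):
    SPOOFING = "Spoofing"
    TAMPERING = "Tampering"
    REPUDIATION = "Repudiation"
    INFORMATION_DISCLOSURE = "Information disclosure"
    DENIAL_OF_SERVICE = "Denial of service"
    ELEVATION_OF_PRIVILEGE = "Elevation of privilege"


def threat_checklist(elements):
    """Yield one (element, category, question) prompt per element and category."""
    for element in elements:
        for category in Stride:
            question = f"How could {category.value.lower()} affect {element}?"
            yield element, category, question


# Example: two hypothetical design elements produce twelve prompts.
for element, category, question in threat_checklist(["login form", "session store"]):
    print(f"[{category.value}] {question}")
```

The value of the checklist lies in forcing each question to be asked during design review; the answers still have to come from people who understand the system.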

In 2006 Michael Howard and Steve Lipner — coincidentally, Lipner had been an Orange Book evaluator decades earlier — published The Security Development Lifecycle (Microsoft Press), making the process public. Other vendors adopted their own variants, and the SDL became the template for what is now called “shift-left” security: catching defects in design and development rather than in production. Gates’s memo is widely regarded as the moment the largest software company in the world committed, institutionally, to secure-by-design engineering.

From Process to Industry Practice

The SDL did not stay inside Microsoft. The same period produced the Open Web Application Security Project (OWASP), founded in 2001, whose OWASP Top Ten (first released in 2003) gave the entire web-development world a shared, accessible vocabulary for the most critical application risks — injection, broken authentication, cross-site scripting, and the rest. Where the Orange Book had spoken to defense contractors, OWASP spoke to every web developer.

Over the following two decades, secure-by-design practices fused with automation and the rise of continuous delivery to become DevSecOps: static and dynamic analysis, dependency scanning, fuzzing, and secret detection wired directly into the build pipeline so that security checks run on every commit. The aspiration of Saltzer and Schroeder — that protection be cheap enough and routine enough that engineers actually apply it — became, for the first time, operationally plausible.

Memory Safety as Policy

The defenses against buffer overflows treated the symptom. The root cause was the language: C and C++ trust the programmer to manage memory correctly, and humans, at scale, do not. Microsoft’s own analysis found that roughly 70% of the security vulnerabilities it patched were memory-safety bugs; Google reported comparable or higher proportions in Chromium and Android. The same root cause links the Morris Worm, the Slammer worm, and Heartbleed, the 2014 OpenSSL flaw (a read past the end of a buffer rather than a write) that exposed server memory across much of the internet and prompted the industry to start funding critical open-source infrastructure.
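The root cause can be made concrete with a small simulation of a Heartbleed-style over-read. This is a sketch under stated assumptions, not OpenSSL’s code: a single bytearray stands in for a flat process memory, and the names and contents are invented. The unchecked handler trusts a length supplied by the caller, as an unchecked C memcpy would; the checked handler validates it against the real payload size, which is the check a memory-safe language performs automatically at object boundaries.

```python
# Hypothetical flat memory: a 4-byte request payload followed by unrelated secrets.
MEMORY = bytearray(b"ping" + b"SECRET-KEY-MATERIAL")
PAYLOAD_LEN = 4  # the request payload really occupies MEMORY[0:4]


def echo_unchecked(claimed_len):
    # Trusts the caller-supplied length and copies whatever lies in range.
    return bytes(MEMORY[:claimed_len])


def echo_checked(claimed_len):
    # Refuses any length that does not fit inside the actual payload.
    if not 0 <= claimed_len <= PAYLOAD_LEN:
        raise ValueError("claimed length exceeds payload")
    return bytes(MEMORY[:claimed_len])


print(echo_unchecked(4))   # the honest request echoes only the payload
print(echo_unchecked(10))  # a lying request leaks adjacent bytes
```

In this simulation the leak stops at the end of the bytearray because Python itself bounds the slice; in C nothing bounds the copy except the programmer.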

The engineering answer was the memory-safe language. Rust, which reached its stable 1.0 release in 2015, demonstrated that a systems language could enforce memory safety largely at compile time, without a garbage collector and with little runtime overhead, through its ownership and borrow-checking model. As teams rewrote components in memory-safe languages, the data was striking: Google reported that the proportion of memory-safety vulnerabilities in Android fell sharply as new code shifted to Rust.

What had been an engineering preference became government policy. After the SolarWinds supply-chain compromise of 2020, a real-world echo of Thompson’s trusting-trust problem, U.S. Executive Order 14028 (May 2021) directed NIST to issue guidance on secure software development, which it did through its Secure Software Development Framework (SSDF). Then, on February 26, 2024, the White House Office of the National Cyber Director published Back to the Building Blocks: A Path Toward Secure and Measurable Software, explicitly calling on the technology industry to adopt memory-safe programming languages such as Rust, Go, C#, Java, Python, and Swift to eliminate an entire class of vulnerability. Nearly fifty years after Saltzer and Schroeder, “secure by design” had become a stated objective of national cybersecurity strategy.

Dead End: The Orange Book’s Certification Model

For all its historical importance, the Orange Book’s approach to security was a dead end — and instructively so. Its model assumed that security was a static property a finished system could be evaluated against, by an external authority, over a long and expensive process. In practice, an evaluation could take years; by the time a system earned its rating, it was often a version or two out of date, and the rating said nothing about the patched, networked, configurable system that customers actually ran. The criteria were rooted in the military’s central concern — preventing unauthorized disclosure across clearance levels (multilevel security) — and translated poorly to the commercial world’s concerns of integrity, availability, and rapidly shifting threats.

Worst of all, the heavyweight certification model was fundamentally mismatched to the pace of software. Steve Lipner, who helped run Orange Book evaluations before later co-authoring the SDL, has written candidly about its decline: rigorous, formal, point-in-time certification simply could not keep up with software shipped and patched on a continuous cycle. The industry’s center of gravity moved away from certifying a finished artifact toward securing the process that produces it — from the Orange Book to the SDL, from a stamp of approval to a continuous discipline. The reference monitor and the Saltzer–Schroeder principles survived because they were ideas about how to build; the certification bureaucracy built around them did not, because security, it turned out, is not a milestone you reach but a property you have to keep re-earning with every release.