The Database Wars: Oracle, Open Source, and the Cloud

Summary

The database industry was shaped by one of technology’s most dramatic market failures: IBM invented the relational database, published the research, took nine years to turn its prototype into a commercial product, and watched a competitor use IBM’s own papers to build a company worth $200 billion. What followed was fifty years of commercial warfare: Oracle’s rise to dominance, the open-source insurgency of MySQL and PostgreSQL, Oracle’s acquisition of its own competition, Amazon’s cloud databases quietly hollowing out Oracle’s most profitable business, and a data warehouse startup called Snowflake staging what was then the largest software IPO in history. The database market is the clearest case study in how technical superiority, institutional timidity, aggressive licensing, and platform shifts interact to move billions of dollars between companies.

Oracle’s Founding and the IBM Hesitation

The story is told in detail in The Database Revolution: Edgar Codd published the relational model in 1970, IBM built a working prototype (System R) by 1974, published the results openly — and then did not ship a commercial product until DB2 in 1983. The nine-year gap between proof-of-concept and product was not engineering; it was institutional. IBM’s IMS hierarchical database was a major revenue source, and the company moved carefully to avoid cannibalizing it.

Larry Ellison read the IBM research papers in 1977 and shipped Oracle Version 2 in 1979 — four years before DB2. The head start compounded. By the time IBM entered the relational market commercially, Oracle had enterprise customers, sales relationships, and a product that ran on the Unix workstations that were displacing IBM’s proprietary minicomputers. Oracle won the market IBM invented.

Through the 1980s, the relational database market was genuinely competitive. Sybase (founded 1984) was technically strong, and its engine became the basis of the SQL Server product it shared with Microsoft. Informix (1980) competed aggressively on performance. Ingres, the academic Berkeley project commercialized in 1980, had a loyal following. IBM DB2 was the safe choice for shops already standardized on IBM hardware.

Oracle won not because its technology was categorically superior but because of its sales force. Ellison built an aggressive, quota-driven sales organization that was willing to promise features the product did not yet have, close deals at discounts that looked generous in the short term but locked customers in for decades, and pursue every competitive opportunity with a speed IBM’s more deliberate organization could not match. Oracle’s quarterly earnings calls became famous for their predictability: Oracle reliably met guidance by accelerating deals at quarter-end, often by offering large discounts for multi-year prepayments that would not recur.

The Microsoft SQL Server Split

In 1988, Sybase and Microsoft jointly announced SQL Server, a relational database for the OS/2 operating system. The collaboration made sense: Sybase had a database, Microsoft had an operating system, and the resulting product was built on Sybase’s code.

The partnership collapsed as Microsoft’s ambitions expanded. When Microsoft shifted focus from OS/2 to Windows NT in the early 1990s, Sybase and Microsoft diverged: Sybase continued developing its product for Unix and high-end servers; Microsoft focused SQL Server on Windows. The products shared a common ancestor but became increasingly different.

Microsoft SQL Server grew with Windows Server into the dominant database for Windows-centric organizations: mid-market companies that ran Microsoft everything. Oracle dominated the high end: large enterprises, financial institutions, telecoms, and government agencies that required maximum reliability and were willing to pay for it. The two companies carved up the market by platform and by customer size, competing at the edges but rarely displacing each other’s core customers.

IBM DB2 remained a significant force in mainframe-attached environments, where the cost and complexity of migrating off DB2 was prohibitive. The mainframe database market was not a competitive war; it was a protected monopoly maintained by switching costs.

Michael Stonebraker and the Academic Alternative

While the commercial database wars played out in sales offices, one of the field’s most consequential researchers was building alternatives in Berkeley.

Michael Stonebraker had led the INGRES project at UC Berkeley in the early 1970s — the academic relational database that preceded Oracle’s commercial work. He had been frustrated by Ingres’s commercialization (the company took the code and locked it into a proprietary product) and returned to Berkeley to start again.

POSTGRES (1986) was Stonebraker’s second system — designed to address the limitations of pure relational databases by adding support for complex data types, rules, and user-defined functions. It was, in retrospect, a decade ahead of the commercial products. The system was distributed freely and acquired a small but dedicated user base.

After Stonebraker moved on from the project in the early 1990s, a group of Berkeley students and volunteers continued developing POSTGRES. They replaced its query language with standard SQL, renamed the system PostgreSQL (1996), and released it under a permissive BSD-style license: genuinely free for any use, commercial or otherwise.

PostgreSQL grew slowly through the late 1990s, faster through the 2000s, and explosively through the 2010s as the open-source database ecosystem matured and managed cloud services took most of the work out of hosting a database. By the early 2020s, PostgreSQL was the most admired database in developer surveys and was chosen for new projects more often than Oracle, SQL Server, or MySQL. Stonebraker received the ACM Turing Award in 2014 for his foundational contributions to database systems.

Stonebraker’s Subsequent Ventures

Stonebraker did not stop at POSTGRES. He went on to found or co-found a series of specialized database companies, each addressing a specific limitation of general-purpose relational databases: StreamBase (streaming data), Vertica (columnar analytics), SciDB (array databases for scientific computing), and VoltDB (in-memory OLTP). He became the database field’s most prolific commercial researcher, treating each company as an experiment in applying new architectural ideas to real workloads. His philosophy: the one-size-fits-all relational database was an approximation that would be replaced, workload by workload, by specialized systems designed for specific access patterns.

MySQL and the LAMP Stack Revolution

If PostgreSQL represented the academic alternative to Oracle, MySQL represented the populist one.

MySQL was created by Finnish developer Michael “Monty” Widenius and released in 1995. It was deliberately simple: faster than PostgreSQL for common web workloads, easier to administer, and, initially, lacking many features that database purists considered essential (transactions, foreign key enforcement, stored procedures). It was also free.

The combination of Linux, Apache, MySQL, and PHP — the LAMP stack — became the default infrastructure for web development in the late 1990s and 2000s. Building a website meant spinning up a Linux server, installing Apache, creating MySQL tables, and writing PHP. The stack was free, well-documented, and widely understood. MySQL ran Yahoo, Wikipedia, Facebook in its early years, YouTube, and Twitter.

MySQL’s Missing Features

MySQL’s original storage engine (MyISAM) did not support transactions: each individual statement would complete or fail, but there was no guarantee of atomicity across multiple statements. For a web application updating a user’s profile, this was acceptable. For a financial application moving money between accounts, it was dangerous. MySQL added transactional support through the InnoDB storage engine (integrated in 2001, made the default in MySQL 5.5 in 2010), but years of MyISAM use left a legacy of applications written without transaction discipline. Many web applications contained subtle data integrity bugs that SQL developers from the Oracle world found alarming.
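The atomicity gap described above can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for a transactional engine (it is not MySQL, MyISAM, or InnoDB), and the accounts table, the transfer function, and the balances helper are hypothetical names chosen for the example. The point is only that a failed transfer leaves both balances untouched.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one atomic unit."""
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if an exception escapes it.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, src),
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            # The debit above is undone by the rollback.
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, dst),
        )

def balances(conn):
    """Return {account id: balance} for every row."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))

# Hypothetical schema and data, held in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()

transfer(conn, "alice", "bob", 30)       # succeeds
try:
    transfer(conn, "alice", "bob", 500)  # fails and is rolled back
except ValueError:
    pass

print(balances(conn))
```

Without the surrounding transaction, the failed transfer would have left the debit applied and the credit missing, which is the kind of integrity bug the paragraph describes.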

Sun Microsystems acquired MySQL AB in 2008 for $1 billion — a price that seemed extraordinary for a free software company and that reflected Sun’s desperation as its hardware business collapsed. The rationale was that MySQL’s large install base would drive demand for Sun’s hardware and services.

Oracle acquired Sun in 2010, inheriting MySQL along with Java. The acquisition created an immediate conflict of interest: Oracle now owned both the dominant commercial database and its most significant open-source competitor. The MySQL community, alarmed by Ellison’s reputation for aggressive licensing, feared Oracle would either neglect or deliberately hobble MySQL.

Monty Widenius responded by forking MySQL before the acquisition closed, creating MariaDB — a drop-in compatible replacement designed to remain genuinely open source. The fork was a pre-emptive act of preservation. MariaDB became the default MySQL replacement in most Linux distributions.

Oracle’s actual stewardship of MySQL was more benign than feared — the company continued investing in MySQL development and kept it free — but trust had been damaged. MariaDB, PostgreSQL, and MySQL now occupied overlapping positions in the open-source database market, competing with each other and with Oracle’s commercial products simultaneously.

Oracle’s Acquisition Strategy

Oracle’s response to competitive pressure was, consistently, acquisition.

Siebel Systems (2005, $5.85 billion): the leading CRM software company, which ran on Oracle databases. Acquiring the application locked in the database revenue.

PeopleSoft (2005, $10.3 billion): a hostile acquisition — Oracle made an unsolicited bid in 2003, PeopleSoft’s management resisted for eighteen months, and Oracle ultimately prevailed. The acquisition was accompanied by Oracle’s declaration that it would eventually discontinue PeopleSoft’s products and migrate customers to Oracle’s own application suite. Thousands of PeopleSoft customers faced a forced migration they had not chosen.

BEA Systems (2008, $8.5 billion): the leading Java application server company. Again, the pattern: acquire the software that runs on Oracle databases, lock in the customer relationship at multiple layers.

Sun Microsystems (2010, $7.4 billion): with MySQL, Java, and Solaris. Each piece served the same logic — extend Oracle’s control of the enterprise infrastructure stack.

The PeopleSoft acquisition became a landmark antitrust case. The U.S. Department of Justice challenged the deal, arguing it would harm competition in the enterprise applications market. Oracle won the case in 2004 when a judge ruled that the market was not as concentrated as the DOJ alleged. The ruling turned on market definition: the court was not persuaded by the narrow market the DOJ had drawn. Critics argued that the practical outcome (Oracle eventually discontinuing PeopleSoft’s product lines and forcing customers to migrate) vindicated the DOJ’s concerns even if the legal theory was rejected.

Oracle’s Licensing Model: The Compliance Trap

Oracle’s commercial success rested not just on the quality of its database but on the complexity of its licensing and the vigor of its audit program.

Oracle database licensing was, and remains, extraordinarily complicated. Licenses were priced per processor core, but the definition of a “processor core” changed as chip architectures evolved — requiring customers to track not just how many servers they ran Oracle on but what type of processors those servers used and what Oracle’s current “Core Factor” was for that processor family. Licensing for virtual machines in cloud environments required detailed tracking of which physical processors the VMs might run on, even if the VM was not consistently assigned to those processors.
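A rough sketch of the arithmetic behind per-core licensing follows. The factor values, the processor family names, and the rule of rounding up per family are all assumptions made for illustration; they are not Oracle's published table or contract terms.

```python
import math

# Hypothetical core factors, for illustration only.
CORE_FACTORS = {"family_a": 0.5, "family_b": 1.0}

def licenses_required(servers):
    """Estimate licenses for a fleet of (processor_family, core_count) servers.

    Assumed rule: total the cores per processor family, multiply by that
    family's factor, and round each family's result up to a whole license.
    """
    cores_by_family = {}
    for family, cores in servers:
        cores_by_family[family] = cores_by_family.get(family, 0) + cores
    return sum(
        math.ceil(cores * CORE_FACTORS[family])
        for family, cores in cores_by_family.items()
    )

# 21 cores at 0.5 rounds up to 11 licenses; 8 cores at 1.0 is 8 licenses.
fleet = [("family_a", 16), ("family_a", 5), ("family_b", 8)]
print(licenses_required(fleet))  # 19
```

Even in this toy form, the count depends on which hardware each server uses and on a factor table that can change, which is why tracking exposure across a large estate was hard.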

Oracle’s audit program — formally called the “License Management Services” team — reviewed customers’ compliance with their license agreements. Audits frequently found that customers were using more Oracle software than they had licensed, generating settlement demands. The audit process became a profit center: Oracle’s sales force would offer to resolve audit findings through expanded license purchases, often bundled with multi-year support contracts at high margins.

The complexity created a class of specialized consultant — the Oracle licensing specialist — whose sole function was to help companies understand what they had purchased and whether they were in compliance. Enterprise customers routinely maintained standing retainers with these consultants. The licensing regime served Oracle’s interests by making it very difficult for customers to understand their actual exposure and very costly to exit the platform.

Amazon and the Cloud Migration

Oracle’s dominant market position began eroding in the early 2010s, not through competition from another database company but through a platform shift.

Amazon Web Services had been building cloud infrastructure since 2006. In 2009, AWS launched Amazon RDS (Relational Database Service), a managed service that ran MySQL at launch and later added Oracle and PostgreSQL, with AWS handling backups, failover, patching, and scaling. For organizations that had previously employed full-time database administrators to manage Oracle installations, RDS offered an alternative: pay AWS per hour and eliminate most of the administrative overhead.

In 2014, AWS announced Amazon Aurora, a database designed from scratch for cloud infrastructure, compatible with MySQL and later with PostgreSQL. Aurora stored data across multiple availability zones with automatic replication, provided MySQL and PostgreSQL interfaces (allowing most application code to run unchanged), and cost a fraction of Oracle’s licensing fees. For new applications, the case for choosing Oracle over Aurora came down mainly to inertia and dependence on Oracle-specific features.

Amazon CEO Jeff Bezos made Oracle migration a company priority. Amazon itself ran its consumer website on Oracle for years; migrating off Oracle became an internal initiative in 2018. By 2020, Amazon had migrated 7,500 Oracle databases, eliminating $1 billion in annual Oracle licensing costs. The migration became a case study AWS referenced in sales conversations with other Oracle customers.

Oracle’s Structural Dilemma

Oracle’s licensing model was designed for a world of on-premise data centers where physical servers were purchased and managed by enterprise IT departments. Cloud computing destroyed that model’s foundations. In the cloud, customers could spin up database instances on demand and pay by the hour; Oracle’s per-processor licensing did not map cleanly onto this model. Oracle’s response — Oracle Cloud (2014) and Oracle Autonomous Database (2018) — came late and struggled to overcome the trust deficit created by years of aggressive licensing. By 2024, AWS, Microsoft Azure, and Google Cloud collectively hosted more database workloads than Oracle’s on-premise products.

Snowflake and the Data Warehouse Disruption

While the OLTP (transactional) database market was shifting to the cloud, a parallel disruption was happening in the analytical database market — the systems used for business intelligence, reporting, and data analysis.

The traditional analytical database was the data warehouse: a separate database optimized for large-scale reads and aggregations, loaded nightly from operational systems. Teradata (founded 1979) dominated this market through the 2000s, running the analytical systems of large retailers, financial institutions, and telecoms on expensive proprietary hardware. IBM Netezza and Oracle Exadata competed at the high end.

Amazon Redshift (2012) brought data warehousing to the cloud on a columnar, MPP (massively parallel processing) architecture at a fraction of Teradata’s cost. It democratized analytics for organizations that could not afford dedicated Teradata hardware.

Snowflake (founded 2012, product launched 2014) went further. Its insight was that storage and compute should be separated — you should be able to scale the number of servers doing analysis independently of how much data you were storing, pay only for the compute you used, and allow multiple independent compute clusters to query the same data simultaneously. This architecture was impossible on traditional hardware; it worked naturally in the cloud.

Snowflake’s September 2020 IPO raised $3.4 billion at a valuation of $33 billion — the largest software IPO in history to that point. Both Berkshire Hathaway and Salesforce invested simultaneously with the IPO, an unusual endorsement. The company had gone from $96 million to $265 million in annual revenue in one year, growing at a rate that traditional enterprise software companies took a decade to achieve.

The Separation of Storage and Compute

Traditional databases tightly coupled storage and compute: the same servers that stored data also processed queries against it. Scaling a query required adding more servers that also stored more data. Snowflake’s cloud architecture separated these: data lived in S3 (cheap, durable object storage); compute was provided by “virtual warehouses” that could be started and stopped in seconds. A customer could run twenty simultaneous query workloads against the same data without any contention, scaling each workload independently. This architecture was not achievable before cloud object storage existed.
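The separation described above can be modeled in a few lines. This is a toy sketch, not Snowflake's implementation: ObjectStore stands in for shared object storage, VirtualWarehouse for an independently sized compute cluster, and the class names, keys, and data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

class ObjectStore:
    """Stand-in for durable shared storage: immutable data files by key."""

    def __init__(self):
        self._files = {}

    def put(self, key, rows):
        self._files[key] = tuple(rows)

    def get(self, key):
        return self._files[key]

    def keys(self):
        return sorted(self._files)

class VirtualWarehouse:
    """Stateless compute: holds no data, only a reference to the store."""

    def __init__(self, store, workers):
        self.store = store
        self.workers = workers  # compute size, independent of data size

    def total(self, column):
        def scan(key):
            return sum(row[column] for row in self.store.get(key))

        # Each worker scans whole files; results are combined at the end.
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return sum(pool.map(scan, self.store.keys()))

store = ObjectStore()
store.put("sales/part-0", [{"amount": 10}, {"amount": 20}])
store.put("sales/part-1", [{"amount": 5}])

# Two differently sized warehouses query the same stored data.
reporting = VirtualWarehouse(store, workers=1)
data_science = VirtualWarehouse(store, workers=4)
print(reporting.total("amount"), data_science.total("amount"))
```

Adding a third warehouse or resizing one changes nothing about the stored files, and adding files requires no change to any warehouse; that independence is the design choice the paragraph describes.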

NewSQL: Distributed Transactions at Scale

The NoSQL movement of the late 2000s had proposed a trade-off: give up ACID transactions, gain horizontal scale. For many workloads this was acceptable. For financial applications, healthcare records, and any system where data integrity was non-negotiable, it was not.

Google Spanner (2012) challenged the premise that you had to choose. Spanner was a globally distributed relational database that provided ACID transactions across geographically separated data centers. It used TrueTime — GPS and atomic clock synchronization across Google’s data centers — to assign globally consistent timestamps to transactions, enabling linearizable consistency across the globe.
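The role of clock uncertainty can be sketched with a simulated clock. The class below is a hypothetical stand-in that wraps the local monotonic clock in a fixed error bound (epsilon); it is not Google's TrueTime API. The commit function follows the "commit wait" idea from the Spanner paper: choose a timestamp at the upper edge of the uncertainty interval, then wait until every clock must agree that the timestamp has passed.

```python
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class TTInterval:
    earliest: float  # true time is assumed to be no earlier than this
    latest: float    # and no later than this

class SimulatedTrueTime:
    """Hypothetical clock: local time plus or minus a fixed uncertainty."""

    def __init__(self, epsilon):
        self.epsilon = epsilon

    def now(self):
        t = time.monotonic()
        return TTInterval(t - self.epsilon, t + self.epsilon)

def commit(tt):
    """Assign a commit timestamp, then wait out the clock uncertainty."""
    ts = tt.now().latest
    # Commit wait: block until the timestamp is definitely in the past,
    # so any transaction that starts afterwards gets a larger timestamp.
    while tt.now().earliest <= ts:
        time.sleep(tt.epsilon / 10)
    return ts

tt = SimulatedTrueTime(epsilon=0.005)  # 5 ms uncertainty, an assumed value
start = time.monotonic()
ts = commit(tt)
waited = time.monotonic() - start
print(f"commit wait: {waited * 1000:.1f} ms")
```

The wait is roughly twice the uncertainty bound, which is why tight clock synchronization matters: the smaller epsilon is, the less latency each commit pays for a globally consistent ordering.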

Google published the Spanner paper in 2012; it became the blueprint for a new category. CockroachDB (2015) and TiDB (2016, PingCAP) built open-source systems inspired by Spanner, replicating data with the Raft consensus protocol and ordering transactions with software-based timestamps rather than GPS and atomic clocks. Both found commercial success in applications requiring global distribution with strong consistency: fintech, e-commerce, and SaaS platforms.

The NewSQL systems demonstrated that the CAP theorem, while technically correct, did not preclude building practical systems with strong consistency at large scale: you had to be careful about latency and partition handling, but ACID semantics and horizontal scale were not mutually exclusive.

The State of Play

The database market of the 2020s is fragmented in ways that would have been unrecognizable in 2000, when Oracle’s dominance seemed permanent.

Oracle remains the largest database company by revenue and continues to dominate the legacy enterprise database market: the SAP deployments and the financial systems that cannot be migrated without enormous risk. Its support and licensing revenue from this installed base is extraordinarily profitable. But new workloads are not going to Oracle; they are going to PostgreSQL, Aurora, Snowflake, or one of dozens of specialized databases designed for specific patterns.

PostgreSQL’s position is remarkable. A volunteer-maintained open-source database, with no company behind it, has become the default choice for new relational workloads — trusted by more developers than any commercial alternative. Its extension ecosystem (PostGIS for geographic data, TimescaleDB for time series, pgvector for machine learning embeddings) has expanded its reach into domains that would have required specialized databases a decade ago.

The database wars are not over; they have become more complex. Every new data pattern — streaming, vector search, graph, time series — has spawned specialized databases. The question of which specialized database to use for which workload has replaced the simpler question of which relational database to choose.

For the theoretical foundations of the relational model and SQL, see The Database Revolution. For Oracle’s founder, see Larry Ellison and Oracle. For the cloud infrastructure that displaced Oracle’s data centers, see Jeff Bezos and Amazon.
