The LLM Race (2022–)

Summary

On November 30, 2022, OpenAI launched ChatGPT. An estimated one hundred million people were using it within two months, at the time the fastest consumer product adoption on record. The release triggered a competitive cascade: Google rushed Bard to market despite internal caution, Microsoft integrated GPT-4 into Bing and Office, Meta released its LLaMA weights, and a wave of startups (Anthropic, Mistral, Cohere, Inflection, and dozens of others) competed in the large language model market. By 2025, every major technology company had a frontier AI program, governments were passing AI regulations, and the economics of the race, which required billions of dollars in GPU clusters before a product could ship, were concentrating the field into a small number of well-capitalized players. The LLM race was at once the most consequential technological transition since the smartphone and a competition whose outcome remained open.

The Foundation: GPT-3 and the Scale Hypothesis

OpenAI’s GPT-3, released in June 2020 with 175 billion parameters, demonstrated the “scale hypothesis”: that simply making language models larger, training them on more data, and giving them more compute would produce qualitatively better capabilities, not just quantitative improvements. GPT-3 could write coherent essays, generate code, answer questions, translate languages, and engage in multi-turn conversations — without any task-specific training. It was a single general-purpose model.

The reaction within the AI community was divided. Some researchers saw GPT-3 as evidence that scaling alone was insufficient: the model made factual errors, hallucinated, and was inconsistent. Others saw it as a preview of transformative capabilities that would emerge as scaling continued. The scaling laws paper from OpenAI (Kaplan et al., 2020) provided empirical support for the second view: model loss fell predictably, following power laws in model size, dataset size, and compute budget, which suggested that continued investment would produce predictable improvements.
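The power-law relationship can be made concrete with a small sketch. The functional form L(N) = (N_c / N) ** alpha_N follows the scaling laws paper; the constants below are approximate values for non-embedding parameter count and should be treated as assumptions, and the function names are hypothetical.

```python
# Illustrative power-law loss curve: L(N) = (N_c / N) ** alpha_N.
# N_C and ALPHA_N are approximate values associated with Kaplan et al. (2020);
# they are assumptions here, not figures taken from this article.

N_C = 8.8e13     # assumed scale constant (non-embedding parameters)
ALPHA_N = 0.076  # assumed power-law exponent for model size


def predicted_loss(n_params: float, n_c: float = N_C, alpha: float = ALPHA_N) -> float:
    """Predicted loss for a model with n_params parameters under the power law."""
    if n_params <= 0:
        raise ValueError("n_params must be positive")
    return (n_c / n_params) ** alpha


def loss_ratio(scale_factor: float, alpha: float = ALPHA_N) -> float:
    """Factor by which predicted loss changes when model size is multiplied by scale_factor."""
    if scale_factor <= 0:
        raise ValueError("scale_factor must be positive")
    return scale_factor ** (-alpha)


if __name__ == "__main__":
    for n in (1.5e9, 1.75e10, 1.75e11):
        print(f"{n:.2e} parameters -> predicted loss {predicted_loss(n):.3f}")
    print(f"10x more parameters multiplies loss by {loss_ratio(10):.3f}")
```

Under these assumed constants, each tenfold increase in parameters lowers the predicted loss by roughly 16 percent. The small exponent is the point: improvements are steady and predictable, but each step costs an order of magnitude more scale.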

GPT-3’s public API launch was intentionally restricted — OpenAI believed the model was too capable and too prone to misuse for open release. The restricted access created scarcity and mystique; access to GPT-3 became a competitive advantage for AI startups building on top of it.

ChatGPT: The Interface That Changed Everything

The missing element was usability. GPT-3’s API required prompt engineering, the skill of phrasing inputs to get useful outputs, which most users did not have. The technical capability impressed AI researchers, but it was not accessible to ordinary people.

ChatGPT solved the interface problem. Launched on November 30, 2022, it wrapped GPT-3.5 (a fine-tuned successor to GPT-3) in a simple chat interface and used Reinforcement Learning from Human Feedback (RLHF) to make the model follow instructions more reliably, decline harmful requests, and produce responses better matched to user intent. The result was a model that an ordinary person could have a useful conversation with.
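One ingredient of RLHF is a reward model trained on human preference comparisons. A common formulation, sketched here as an assumption about the general technique rather than a description of OpenAI's actual training code, scores two candidate responses and penalizes the model when the response humans preferred receives the lower score. The function name and the example scores are hypothetical.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss: -log(sigmoid(reward_chosen - reward_rejected)).

    The loss is small when the human-preferred response scores higher
    and large when the ordering is reversed.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow for large |m|.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    # Hypothetical reward scores for two candidate answers to the same prompt.
    print(preference_loss(2.0, 0.0))  # preferred answer scored higher: low loss
    print(preference_loss(0.0, 2.0))  # preferred answer scored lower: high loss
```

A reward model trained this way can then score new outputs, and the language model is optimized to produce responses that score well.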

The adoption was unlike anything in consumer internet history. ChatGPT reached one million users in five days and an estimated one hundred million in two months; by a widely cited comparison, TikTok took about nine months and Instagram about two and a half years to reach the same mark. ChatGPT demonstrated that there was a mass market for conversational AI that could write, summarize, explain, code, brainstorm, and answer questions, at a quality level that was not always correct but was often more useful than a quick Google search.

The business implications were immediate and alarming to competitors. Google’s Larry Page and Sergey Brin were reportedly called back from semi-retirement to assess the threat. The company’s advertising business — built on the assumption that users wanted to query a search engine and receive ten blue links — was suddenly vulnerable to a model that could answer questions directly.

Google’s Rushed Response: Bard

Google’s response demonstrated both the company’s technical resources and the organizational challenges of competing when the stakes were existential to a profitable business.

Google announced Bard on February 6, 2023, and opened limited public access on March 21, about six weeks later, in a rapid deployment driven by competitive pressure rather than product readiness. Promotional material released with the February announcement contained a visible factual error: Bard incorrectly stated that the James Webb Space Telescope had taken “the first pictures of a planet outside our own solar system.” (It had not; the first direct image of an exoplanet was captured in 2004 by the ground-based Very Large Telescope, years before JWST launched.) The error was noticed almost immediately by astronomers on social media. Shares of Alphabet, Google’s parent company, fell approximately 8% (about $100 billion in market capitalization) on February 8.

The deeper problem: Google had developed LaMDA (Language Model for Dialogue Applications) and subsequently PaLM internally, but had been cautious about releasing them publicly because of “reputational and legal risks.” The organization was “red-teaming” its own AI products out of caution about harms at exactly the moment when a competitor was shipping boldly. The gap between capability and deployment reflected a genuine tension: Google had more to lose from AI systems that produced harmful content or eroded user trust, and was more cautious as a result.

Bard was moved onto Google’s next-generation Gemini model family in December 2023 and was renamed Gemini in February 2024. The models were available in multiple sizes (Ultra, Pro, Nano). Gemini 1.5, announced in February 2024, offered a context window of up to 1 million tokens, a genuine technical advance that enabled analysis of long documents and codebases in a single prompt.

The “AI Emergency”

Google CEO Sundar Pichai reportedly declared a “code red” internally after ChatGPT’s launch. The company reorganized its AI research teams, merging Google Brain and DeepMind into Google DeepMind in April 2023, accelerated product development timelines, and integrated Gemini across Gmail, Docs, Search, and other products. The speed of the response was unusual for a company of Google’s size and risk profile.

Anthropic: Safety-Focused Competition

Anthropic was founded in 2021 by Dario Amodei and Daniela Amodei, former OpenAI executives, along with several other OpenAI alumni. The company’s founding thesis was that the most important work in AI was making AI systems safe and beneficial as they became more powerful — and that the best way to do that was to build frontier models while focusing on alignment and safety research.

Anthropic’s Claude series (Claude 1, 2, 3) competed directly with OpenAI’s GPT series. Anthropic’s stated differentiation was its emphasis on Constitutional AI (CAI), a technique for training models to be helpful, harmless, and honest in which the model critiques and revises its own outputs against a written set of principles, rather than relying exclusively on human raters. Claude models were often described as producing more thoughtful, nuanced, and careful responses than GPT models, particularly on ethically complex topics, sometimes at the cost of following instructions less readily.

Anthropic’s funding structure was unusual: it had reportedly raised about $7.3 billion by early 2024, with major investors including Google (an initial stake of about $300M, followed by a commitment of up to $2B), Amazon ($4B, later increased to $8B), and Spark Capital. The investment from cloud providers reflected strategic interest in AI infrastructure as much as financial returns.

Meta’s Open-Source Bet: LLaMA

Meta AI released LLaMA (Large Language Model Meta AI) in February 2023, initially sharing the weights with researchers through a restricted access process. Within about a week, the weights had leaked via 4chan and were publicly available. Meta subsequently released Llama 2 (July 2023) under a custom license that permitted commercial use by all but the very largest platforms (those with more than 700 million monthly active users needed a separate license), and Llama 3 (April 2024) under similar terms.

The release of open-weight models was strategically significant. Commercial models (ChatGPT, Claude, Gemini) required API access with per-token pricing and terms of service restrictions. Open-weight models could be downloaded, run locally, fine-tuned on custom data, and deployed without per-token fees, subject only to the model license and the cost of the hardware to run them. For enterprises with sensitive data, medical or legal applications, or jurisdictions with data sovereignty requirements, open-weight models were often the only viable option.

The LLaMA releases catalyzed an ecosystem of fine-tuned variants, open-source tooling (Ollama, llama.cpp for CPU inference, LangChain), and applications built on open-weight foundations. Mistral AI (Paris-based, founded 2023) released competitive open-weight models at smaller sizes that challenged the performance-per-parameter efficiency of Meta’s models.

The strategic logic for Meta: its business does not depend on LLM API revenue (unlike OpenAI, Anthropic, or Google Cloud). Commoditizing LLMs by releasing high-quality open models benefits Meta by driving down the cost of intelligence, while competitors trying to build API businesses see their pricing power eroded.

The Infrastructure Competition: GPUs and Cloud

The LLM race was also a race for computational infrastructure. Training frontier models required tens of thousands of NVIDIA H100 GPUs, each costing approximately $30,000, networked in high-speed clusters consuming megawatts of power. Microsoft reportedly committed more than $10 billion to OpenAI, much of it as access to Azure infrastructure. The hyperscalers (Google, Amazon, Microsoft) were simultaneously AI competitors and AI infrastructure providers.
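The scale of these figures is easier to see with back-of-the-envelope arithmetic. In the sketch below, only the roughly $30,000 unit price comes from the text; the cluster size of 20,000 GPUs and the 700-watt per-GPU power draw are assumptions chosen for illustration, and the function names are hypothetical. Networking, cooling, buildings, and staff are ignored.

```python
# Rough hardware cost and power draw for a GPU training cluster.
# Assumptions: the cluster size and per-GPU wattage are illustrative values,
# not figures from the article. Only the unit price is taken from the text.

H100_UNIT_PRICE_USD = 30_000  # approximate price cited in the text
ASSUMED_WATTS_PER_GPU = 700   # assumption: rough power draw of one accelerator


def cluster_hardware_cost(num_gpus: int, unit_price: float = H100_UNIT_PRICE_USD) -> float:
    """GPU purchase cost only, in US dollars."""
    return num_gpus * unit_price


def cluster_gpu_power_mw(num_gpus: int, watts_per_gpu: float = ASSUMED_WATTS_PER_GPU) -> float:
    """Power drawn by the GPUs alone, in megawatts."""
    return num_gpus * watts_per_gpu / 1_000_000


if __name__ == "__main__":
    gpus = 20_000  # hypothetical cluster size
    print(f"GPU cost: ${cluster_hardware_cost(gpus):,.0f}")
    print(f"GPU power: {cluster_gpu_power_mw(gpus):.1f} MW")
```

Under these assumptions, a 20,000-GPU cluster costs about $600 million in accelerators alone and draws about 14 megawatts before any supporting infrastructure, which is why only heavily capitalized players could compete at the frontier.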

NVIDIA emerged as the primary beneficiary of the LLM race. The company’s market capitalization rose from approximately $350 billion in early 2023 to over $3 trillion by mid-2024, one of the fastest appreciations in value of any US public company. Jensen Huang’s long-term bet on GPGPU computing and the CUDA ecosystem had created a near-monopoly on the hardware required for AI training.

Competition for GPU supply drove AI startups to extraordinary fundraising rounds: Inflection AI raised $1.3 billion in 2023, and xAI (Elon Musk’s AI company) raised $6 billion in a single 2024 round; AI companies collectively raised over $50 billion in 2024. The financing environment was explicitly driven by fear of being left behind if compute became the binding constraint.

The LLM Landscape as of 2025

By mid-2025, the LLM market had undergone significant consolidation and differentiation. OpenAI remained the market leader in consumer usage and enterprise API access, with GPT-4 and its successors powering Microsoft’s Copilot products across Windows, Office, and GitHub. Anthropic’s Claude held a strong position in enterprise use cases requiring long context and nuanced reasoning. Google’s Gemini was deeply integrated into the Google Workspace ecosystem. Meta’s open-weight Llama models were widely used for self-hosting and fine-tuning.

The competitive dynamics had shifted from pure capability (who has the smartest model) to ecosystem integration (whose model is embedded in the products users already use), cost efficiency (who can deliver capability per dollar most efficiently), and specialization (vertical models for legal, medical, coding, and other domains that outperformed general-purpose models on specific tasks).

The question that remained open: whether the race would produce genuine AI systems that understood and reasoned about the world, or whether scaling transformer architectures would reach a capability ceiling before human-level reasoning was achieved. The answer would determine whether the LLM race was a chapter in AI history or its climax.