The History of Spam

Summary

On May 3, 1978, Gary Thuerk, a marketing manager at Digital Equipment Corporation, sent an unsolicited promotional message about a new computer product to 393 ARPANET users. The response was furious: the ARPANET was a government research network with norms against commercial use, and Thuerk had violated them. He was reprimanded. His email is remembered as the first spam message. It took another seventeen years, and the commercialization of the internet, for spam to grow from isolated incidents into an existential threat to email itself. By 2004, spam constituted 70–80% of all email traffic worldwide. A decade of arms races between spammers and filters, botnet takedowns, criminal prosecutions, and evolving AI-based detection has reduced spam’s share but never eliminated it. The history of spam is the history of the abuse of open communication systems — and the costs of keeping them open.

ARPANET and the First Spam

Gary Thuerk’s May 1978 message is the canonical origin story, but the cultural phenomenon of spam — unsolicited mass communication — predates the internet.

Usenet, the distributed bulletin board system launched in 1980, experienced spam long before email spam became dominant. In 1994, Arizona immigration lawyers Canter and Siegel used an automated script to post advertisements for their green card lottery services to approximately 6,000 Usenet newsgroups, every group they could reach. The message was inappropriate, unwanted, and repeated: a textbook definition of spam. The incident attracted national press coverage; Canter and Siegel were unapologetic and went on to write a book, How to Make a Fortune on the Information Superhighway, explaining how to repeat their strategy. The episode is widely regarded as the event that transformed “spam” from an isolated annoyance into a recognized problem requiring a systematic response.

The term “spam” itself derives from a Monty Python sketch in which the word “Spam” (the Hormel canned meat product) is repeated increasingly loudly by Vikings until it drowns out all other conversation — a metaphor for unwanted messages that overwhelm legitimate content.

Email Spam Explodes

The commercialization of the internet in the mid-1990s transformed spam from a norm-violation into an industry. Email addresses became commercially valuable; any business with a list of addresses could market to them at near-zero marginal cost. The economics were brutally favorable to spammers: sending ten million emails cost essentially nothing, and if even one in ten thousand recipients made a purchase, the campaign was profitable.
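The break-even arithmetic behind this is short enough to write out. The sketch below uses hypothetical figures (the $20 profit per sale and the $0.00001 per-message sending cost are assumptions, not historical data) to show why a one-in-ten-thousand response rate left a wide margin.

```python
def campaign_profit(messages_sent: int, response_rate: float,
                    profit_per_sale: float, cost_per_message: float) -> float:
    """Expected profit of a bulk campaign: sales revenue minus total sending cost."""
    expected_sales = messages_sent * response_rate
    return expected_sales * profit_per_sale - messages_sent * cost_per_message


def break_even_response_rate(profit_per_sale: float, cost_per_message: float) -> float:
    """Response rate at which expected revenue exactly covers the sending cost."""
    return cost_per_message / profit_per_sale


# Hypothetical inputs: 10 million messages, 1 sale per 10,000 recipients,
# an assumed $20 profit per sale and an assumed $0.00001 cost per message.
profit = campaign_profit(10_000_000, 1 / 10_000, 20.0, 0.00001)
print(round(profit, 2))  # 1,000 sales * $20 - $100 sending cost = 19900.0
print(f"{break_even_response_rate(20.0, 0.00001):.1e}")  # 5.0e-07: one sale per two million
```

Under these assumptions the campaign would still break even at a response rate fifty times lower than one in ten thousand, which is the asymmetry that made mass email so attractive.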

Sanford Wallace — “Spam King” — built one of the first commercial spam operations in 1995, sending promotional email for small businesses through his company Cyberpromo. Wallace sold email marketing services, claiming his lists were “opt-in” (recipients had consented). The claim was false; the lists were purchased or scraped. Internet service providers began blocking his IP addresses; Wallace responded by constantly changing addresses, then by using open mail relays — mail servers misconfigured to forward email from anyone.

Open mail relays were the spam industry’s enabler through the late 1990s. A mail server reachable from the internet could, if misconfigured, accept mail from any sender and forward it to any recipient. Spammers routed their campaigns through thousands of such relays, obscuring the true origin and distributing the sending load. The security community responded with blocklists: the Mail Abuse Prevention System (MAPS) Real-time Blackhole List (RBL), created in 1997, allowed mail administrators to reject connections from listed IP addresses, and lists of known open relays were compiled and used in the same way. The arms race between relay blocklisting and relay discovery was the defining technical battle of late-1990s anti-spam work.
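Both sides of this battle can be sketched in a few lines of Python. The first function shows the relay decision a correctly configured server makes and an open relay skips: mail for local domains is accepted from anyone, but forwarding to outside domains is allowed only for trusted clients. The second builds the DNS name a server would query to check a connecting IPv4 address against a DNS-based blocklist, using the reversed-octet convention later documented in RFC 5782 (if the name resolves, the address is listed). The local domain, trusted network, and blocklist zone are hypothetical placeholders, and the actual DNS lookup is left out so the example runs offline.

```python
import ipaddress

# Hypothetical configuration; a real server reads these from its config files.
LOCAL_DOMAINS = {"example.com"}
TRUSTED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_trusted_client(client_ip: str) -> bool:
    """True if the connecting host is on a network this server serves."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in TRUSTED_NETWORKS)


def accepts_recipient(client_ip: str, rcpt_domain: str, open_relay: bool = False) -> bool:
    """Closed relay: accept mail for local domains from anyone, but forward to
    outside domains only for trusted clients. An open relay skips that check."""
    if rcpt_domain.lower() in LOCAL_DOMAINS:
        return True
    return open_relay or is_trusted_client(client_ip)


def dnsbl_query_name(client_ip: str, zone: str) -> str:
    """Name to look up for an IPv4 address: octets reversed, then the list's zone."""
    octets = str(ipaddress.IPv4Address(client_ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


# An outside host asking to relay to an outside domain.
print(accepts_recipient("203.0.113.7", "victim.example.net"))                   # False
print(accepts_recipient("203.0.113.7", "victim.example.net", open_relay=True))  # True
print(dnsbl_query_name("203.0.113.7", "dnsbl.example.org"))  # 7.113.0.203.dnsbl.example.org
```

Reversing the octets puts the most specific part of the address first, matching the way DNS names are ordered, so a blocklist operator can publish listings as ordinary DNS records.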

Botnets and Industrial-Scale Spam

By the mid-2000s, major spam operations had shifted from open relay abuse to botnets: networks of compromised home computers controlled by remote attackers. An infected PC running Windows (and nearly all home PCs ran Windows) could be enrolled in a spam botnet without its owner’s knowledge. The botnet’s controller, the botmaster, could dispatch millions of spam messages from a large, geographically distributed, and constantly changing pool of IP addresses, which made IP blocklisting far less effective.

The Storm Worm botnet (2007) infected approximately one million computers and was used to send spam advertising pharmaceutical products, stock pump-and-dump schemes, and malware. McColo Corporation, a California ISP, was discovered in 2008 to be hosting command-and-control servers for multiple botnets including Rustock, Srizbi, and Cutwail, which together were responsible for roughly 75% of all spam at the time. When McColo’s upstream providers disconnected it in November 2008, global spam volume dropped by approximately 65% within 24 hours — demonstrating both how concentrated the botnet infrastructure was and how effective infrastructure-level action could be.

Rustock was one of the most technically sophisticated botnets of its era. It used rootkit techniques to hide from security software, encrypted its communications with its command-and-control servers, and implemented anti-analysis tricks to defeat security researchers’ reverse engineering. At its peak, Rustock was estimated to send 30 billion spam messages daily. Microsoft’s Digital Crimes Unit, working with US federal law enforcement and international partners, seized Rustock’s command-and-control servers in March 2011; global spam volume fell immediately.

The Content of Spam

Spam content evolved with the sophistication of spam filters:

Early spam (1995–2000) was straightforward marketing: product advertisements, get-rich-quick schemes, pornography. The messages looked like ordinary marketing email and were easy to filter by keyword.

Pharmaceutical spam became dominant by 2003. Ads for erectile dysfunction drugs (Viagra, Cialis, and counterfeits) constituted a substantial fraction of global spam for years. The Canadian Pharmacy spam network, which operated through multiple front organizations and spam botnets, sold counterfeit pharmaceuticals and was estimated to generate hundreds of millions of dollars annually before major enforcement actions in 2010.

Stock pump-and-dump spam promoted penny stocks, artificially inflating prices so the senders (who owned the stocks) could sell before the price collapsed. Academic research found that stock spam campaigns produced statistically significant price increases and demonstrated measurable returns to the senders.

Image spam (approximately 2006–2008) defeated text-based filters by embedding the spam message in an attached image. The email body contained only HTML and an image tag; the message text existed only inside an image file (commonly GIF or JPEG) that text-based filters could not read. Countering it required computer vision techniques such as optical character recognition (OCR).
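The structural signature of image spam, an HTML body with an image but almost no readable text, is easy to detect even when the content is not. The heuristic below is a hypothetical sketch built on Python's standard html.parser module; the 20-character threshold is an arbitrary assumption. It flags the pattern without reading the image, so recovering the actual message still required computer vision.

```python
from html.parser import HTMLParser


class BodyStats(HTMLParser):
    """Counts <img> tags and visible text characters in an HTML email body."""

    def __init__(self) -> None:
        super().__init__()
        self.images = 0
        self.text_chars = 0

    def handle_starttag(self, tag, attrs):
        # Self-closing <img/> tags also arrive here via handle_startendtag.
        if tag == "img":
            self.images += 1

    def handle_data(self, data):
        self.text_chars += len(data.strip())


def looks_like_image_spam(html_body: str, max_text_chars: int = 20) -> bool:
    """Hypothetical heuristic: at least one image but almost no readable text."""
    stats = BodyStats()
    stats.feed(html_body)
    stats.close()
    return stats.images > 0 and stats.text_chars <= max_text_chars


print(looks_like_image_spam('<html><body><img src="cid:offer.gif"></body></html>'))  # True
print(looks_like_image_spam("<p>Minutes from Tuesday's meeting are attached.</p>"))  # False
```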

Advance-fee fraud, the 419 scam (named after the section of the Nigerian Criminal Code it violated), invited recipients to help transfer large sums of money in exchange for a share, then extracted advance fees from them. Individual 419 emails were easy to recognize, but the scams exploited the fact that a small fraction of recipients would respond, and a fraction of those would send money. Security economics research argued that the scams were profitable precisely because they were obviously absurd: the implausible story deterred everyone except the most credulous, so scammers did not waste effort on respondents who would later become skeptical.
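The self-selection argument is easier to see with numbers. In this sketch every reply costs the scammer time, modelled as a fixed cost per responder, but only some responders eventually pay. All figures are hypothetical, chosen only to illustrate the argument.

```python
def scam_profit(responders: int, eventual_payers: int,
                payout: int, cost_per_responder: int) -> int:
    """Profit when every responder costs the scammer effort, but only some pay."""
    return eventual_payers * payout - responders * cost_per_responder


# Hypothetical scenarios: a believable pitch versus an obviously absurd one.
plausible = scam_profit(responders=1_000, eventual_payers=10, payout=2_000, cost_per_responder=50)
absurd = scam_profit(responders=50, eventual_payers=8, payout=2_000, cost_per_responder=50)
print(plausible)  # -30000: many replies, most drop out before paying
print(absurd)     # 13500: fewer replies, but nearly all are real prospects
```

Under these assumptions the absurd message loses a couple of potential victims but sheds most of the costly replies, so it is the profitable one.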

Spear phishing, targeted spam impersonating trusted entities, evolved into the dominant credential theft vector for organized crime and nation-state attackers. Unlike bulk spam, spear phishing was personalized: the sender knew the target’s name, organization, and often specific details about their role, and crafted messages that appeared to come from colleagues, IT departments, or known services. Spear phishing against corporate employees, often carrying malware payloads, was the initial access method for many of the most significant data breaches of the 2010s.

The Filter Wars

Anti-spam technology evolved through several generations:

Rule-based filters applied hand-written rules: block messages containing “Viagra”, “mortgage refinancing”, “you have won”. Rule maintenance was constant as spammers changed terminology; rules also produced false positives (legitimate messages about pharmaceuticals were blocked). SpamAssassin (2001), the dominant open-source filter, combined dozens of rule-based checks with weighted scoring.
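The weighted-scoring idea can be sketched in Python. The rule names, patterns, and weights below are hypothetical and far simpler than SpamAssassin's real rule set, which is written in its own configuration syntax; only the 5.0 threshold mirrors SpamAssassin's default required_score.

```python
import re

# Hypothetical rules: (name, pattern, weight). Real rule sets hold hundreds of tuned rules.
RULES = [
    ("MENTIONS_VIAGRA", re.compile(r"\bviagra\b", re.IGNORECASE), 2.5),
    ("MORTGAGE_REFI", re.compile(r"mortgage refinanc", re.IGNORECASE), 1.5),
    ("YOU_HAVE_WON", re.compile(r"\byou have won\b", re.IGNORECASE), 2.0),
    ("ALL_CAPS_SUBJECT", re.compile(r"^Subject: [^a-z\n]{10,}$", re.MULTILINE), 1.0),
]

THRESHOLD = 5.0  # matches SpamAssassin's default required_score


def score_message(message: str) -> tuple[float, list[str]]:
    """Sum the weights of every matching rule; return the total and the rule names."""
    hits = [(name, weight) for name, pattern, weight in RULES if pattern.search(message)]
    return sum(w for _, w in hits), [name for name, _ in hits]


def is_spam(message: str) -> bool:
    total, _ = score_message(message)
    return total >= THRESHOLD


print(score_message("Subject: YOU HAVE WON A PRIZE\n\nYou have won! Cheap Viagra inside."))
print(is_spam("Subject: Pharmacology seminar\n\nThis week's talk covers the history of Viagra."))
```

Because no single rule is decisive here, a legitimate message that merely mentions a drug name stays below the threshold, one way weighted scoring can reduce the false positives of single-keyword blocking.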

Bayesian filtering, popularized by Paul Graham’s influential 2002 essay “A Plan for Spam,” used statistical analysis of word frequencies in known spam and legitimate mail to classify new messages. A Bayesian filter trained on a user’s own email might learn that “Viagra” and its deliberate misspellings (such as “V1agra”) were strong spam indicators while “meeting” indicated legitimate mail. Bayesian filters were far more adaptive than rule-based systems and temporarily gave recipients a significant advantage.
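A minimal sketch of the approach, simplified from the essay: each token gets a spam probability estimated from how often it appears in spam versus legitimate mail, clamped to [0.01, 0.99]; unseen tokens get 0.4; and the 15 tokens furthest from 0.5 are combined as the product of their probabilities divided by that product plus the product of their complements. Graham also doubled legitimate-mail counts, ignored rare tokens, and counted occurrences rather than messages; those details are omitted or changed here, and the tokenizer and toy training data are assumptions.

```python
import re
from collections import Counter
from math import prod


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9$'-]+", text.lower())


def train(spam_msgs: list[str], ham_msgs: list[str]) -> dict[str, float]:
    """Per-token spam probability from message counts, clamped to [0.01, 0.99]."""
    spam_counts = Counter(t for m in spam_msgs for t in set(tokenize(m)))
    ham_counts = Counter(t for m in ham_msgs for t in set(tokenize(m)))
    probs = {}
    for token in spam_counts.keys() | ham_counts.keys():
        s = spam_counts[token] / len(spam_msgs)  # fraction of spam containing token
        h = ham_counts[token] / len(ham_msgs)    # fraction of ham containing token
        probs[token] = min(0.99, max(0.01, s / (s + h)))
    return probs


def spam_probability(text: str, probs: dict[str, float],
                     unknown: float = 0.4, top_n: int = 15) -> float:
    """Combine the most 'interesting' tokens (furthest from 0.5)."""
    ps = [probs.get(t, unknown) for t in set(tokenize(text))]
    ps = sorted(ps, key=lambda p: abs(p - 0.5), reverse=True)[:top_n]
    spam_side = prod(ps)
    ham_side = prod(1 - p for p in ps)
    return spam_side / (spam_side + ham_side)


spam = ["cheap viagra now", "buy viagra online now", "you have won cash now"]
ham = ["meeting moved to tuesday", "notes from the meeting", "lunch on tuesday?"]
probs = train(spam, ham)
print(spam_probability("viagra sale now", probs) > 0.9)     # True
print(spam_probability("meeting on tuesday", probs) > 0.9)  # False
```

Graham classified a message as spam when the combined probability exceeded 0.9, which is the threshold used in the usage lines.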

Spammers responded with dictionary attacks and word salad — appending random legitimate words to spam messages to confuse statistical filters. A spam message advertising pharmaceuticals might include several paragraphs of randomly selected words from a dictionary, making the word frequency distribution appear more like legitimate email.
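The dilution effect is visible with fixed, hypothetical token probabilities and a Graham-style product combination (repeated here so the block runs on its own). Two strongly spammy tokens give a near-certain verdict; appending five words the filter associates with legitimate mail drags the combined score well below an assumed 0.9 spam threshold.

```python
from math import prod


def combine(ps: list[float]) -> float:
    """Graham-style combination of per-token spam probabilities."""
    spam_side = prod(ps)
    ham_side = prod(1 - p for p in ps)
    return spam_side / (spam_side + ham_side)


# Hypothetical per-token probabilities a trained filter might hold.
pitch = [0.99, 0.95]  # e.g. a drug name and "pharmacy"
salad = [0.2] * 5     # ordinary words more common in legitimate mail

print(round(combine(pitch), 4))          # 0.9995
print(round(combine(pitch + salad), 4))  # 0.6475
```

Selecting only the most extreme tokens, as Graham did, blunts salad made of neutral words, but words with genuinely legitimate-looking probabilities still qualify for selection and still pull the score down.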

Machine learning filters began replacing hand-coded rules and simple Bayesian methods in the 2010s. Google’s email infrastructure used large-scale neural network classifiers trained on billions of messages. By 2019, Google reported that Gmail’s filters caught 99.9% of spam with very low false-positive rates — a remarkable engineering achievement, though the remaining 0.1% was still millions of messages daily.

The Legal Dimension

The CAN-SPAM Act (2003) was the US federal anti-spam law, requiring that commercial email include opt-out mechanisms, accurate header information, and sender identification. It did not require opt-in consent and explicitly preempted stronger state laws. Critics argued that it legitimized spam by establishing a regulatory floor, and “CAN-SPAM” was sometimes read as “You Can Spam.” The law was used successfully against commercial email marketers who violated its requirements but had limited effect on criminal spammers operating outside US jurisdiction.

European Union regulations were stricter. The ePrivacy Directive (2002) required opt-in consent for commercial email to individuals. GDPR (2018) created substantial enforcement mechanisms for data protection violations including unauthorized commercial email.

Criminal prosecutions of major spammers did occur. Alan Ralsky, a major US spam operator, was sentenced to 51 months in federal prison in 2009 for fraud and money laundering related to stock pump-and-dump campaigns. Robert Soloway, another spammer dubbed the “Spam King,” received 47 months. But the most significant spam operations were run from Russia and Eastern Europe, where extradition was difficult and local authorities were often uninterested in prosecution.

Spam Beyond Email

Spam migrated to every communication channel that achieved scale:

SMS spam grew as mobile phone penetration increased. Bulk SMS messages advertised services, phished for banking credentials, and distributed malware through malicious links. SMS lacked email’s filtering infrastructure; mobile carriers built filter systems, but SMS spam remained more likely than email spam to reach recipients.

Social media spam exploited the trust networks of social platforms. Spam accounts on Twitter and Facebook spread links to malware, counterfeit goods, and phishing sites through automated posting and follower networks. Platform companies built sophisticated classification systems, but spam remained a persistent problem — estimated to constitute 5–15% of Twitter accounts at various points.

Comment spam on blogs, forums, and product review systems was designed primarily to create backlinks for SEO manipulation. Automated systems posted millions of generic comments containing links; the target was Google’s ranking algorithm, not human readers. Google’s algorithm updates targeting link spam, together with CAPTCHAs (challenge-response tests that block automated posting), reduced comment spam substantially.

Voice spam (robocalls) became significant in the 2010s as VoIP technology made bulk calling economically viable. US consumers received approximately 50 billion robocalls in 2018. Technical countermeasures — STIR/SHAKEN call authentication standards, carrier-level filtering, spam call labeling — were deployed from 2019 onward with partial effectiveness.

📚 Sources

History of email spam — Wikipedia — the archived original; possibly the first spam email
Paul Graham, A Plan for Spam (2002) — the essay that popularized Bayesian filtering for spam
CAN-SPAM Act (2003) — FTC guidance on the federal anti-spam law
McColo — Wikipedia — Brian Krebs’s reporting on the ISP disconnect and spam volume drop
The Spamhaus Project — operational history and statistical tracking of spam sources
Levine and Hoffman, Fighting Spam for Dummies (2004) — accessible overview of the technical and legal landscape