The Linker and Loader Story: The Invisible Tool That Assembles Programs

Summary

Every compiled program passes through a linker before it can run — a tool so unglamorous that most programmers never think about it until it produces a cryptic error. The linker’s job is to take the separate compiled object files that constitute a program, resolve the references between them, and assemble them into an executable. But the linker is also the keeper of one of computing’s most consequential decisions: how do separate pieces of code share libraries? The evolution from static linking through shared libraries, dynamic loading, and position-independent code reflects forty years of tradeoffs between deployment simplicity, memory efficiency, and security. Understanding the linker means understanding how programs actually exist — not as source code, not as abstractions, but as arrangements of machine code in memory.

What the Linker Does

A compiler translates source code into machine code. But modern programs are not written as single files; they are assembled from many translation units: separate source files, each compiled independently. Each translation unit produces an object file: machine code for the functions defined in that file, with unresolved references to symbols (functions, global variables) defined elsewhere.

The linker’s job is to resolve these references. It takes a collection of object files, finds the definition of every referenced symbol in some object file (or library), and replaces each unresolved reference with the actual memory address of the definition. The result is an executable — all addresses resolved, all references satisfied, ready to load into memory and run.
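The two-pass idea behind symbol resolution can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any real linker is implemented: the dictionary layout, the `link` function, and the file names `main.o` and `util.o` are all hypothetical, and real object formats carry far more information.

```python
# Toy model of symbol resolution. Real object files (ELF, Mach-O, COFF)
# are far richer; every name and layout here is hypothetical.

def link(objects, base=0x1000):
    """Lay objects out back to back, then resolve every reference.

    Each object is a dict with:
      "name": label used in error messages
      "size": bytes of code the object contributes
      "defs": {symbol: offset within this object}
      "refs": [symbols this object uses but may not define]
    """
    symbol_table = {}
    address = base
    # Pass 1: give each object an address and record what it defines.
    for obj in objects:
        for symbol, offset in obj["defs"].items():
            if symbol in symbol_table:
                raise ValueError(f"duplicate symbol: {symbol}")
            symbol_table[symbol] = address + offset
        address += obj["size"]
    # Pass 2: every reference must now map to a known address.
    resolved = {}
    for obj in objects:
        for symbol in obj["refs"]:
            if symbol not in symbol_table:
                raise ValueError(
                    f"undefined reference to '{symbol}' in {obj['name']}")
            resolved[(obj["name"], symbol)] = symbol_table[symbol]
    return symbol_table, resolved


main_o = {"name": "main.o", "size": 0x40, "defs": {"main": 0x0}, "refs": ["helper"]}
util_o = {"name": "util.o", "size": 0x20, "defs": {"helper": 0x8}, "refs": []}

symbols, resolved = link([main_o, util_o])
print({name: hex(addr) for name, addr in symbols.items()})
```

Linking `main.o` on its own raises the toy equivalent of the familiar "undefined reference" error, because nothing defines `helper`.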

Before linkers existed, programmers had to manually calculate and specify the memory address of every subroutine and variable. A program change that moved a subroutine’s position in memory required updating every instruction that referenced it. The assembler — the predecessor to the compiler — could assign symbolic names to addresses, but it could only work within a single program file. The linker extended this capability across files, making modular programming practical.

The Origins: Loader and Linking Loader

The earliest programs on stored-program computers were loaded at absolute addresses: the programmer specified exactly which memory address each instruction should occupy. The first improvement was the relocating loader, which could load a program at any convenient memory address by adjusting all absolute addresses relative to the load address.
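A relocating loader can be pictured as a small adjustment pass. The sketch below is only illustrative: it assumes a program assembled as if it started at address 0, plus a hypothetical `relocations` list naming the words that hold addresses. Historical loaders used their own card and tape formats.

```python
# Toy relocating loader. The "program" is a list of words assembled as if
# it started at address 0; `relocations` lists the indices of words that
# hold absolute addresses. Both structures are hypothetical.

def load(image, relocations, load_address):
    """Return a copy of `image` adjusted to run at `load_address`."""
    loaded = list(image)
    for index in relocations:
        loaded[index] += load_address  # rebase each address-bearing word
    return loaded


# Word 1 holds the address of word 3 (a jump target); word 2 is plain data
# and must not be touched.
image = [0xA0, 0x0003, 0x002A, 0xFF]
relocations = [1]

at_0x2000 = load(image, relocations, 0x2000)
at_0x5000 = load(image, relocations, 0x5000)
print([hex(word) for word in at_0x2000])
```

The same image loads correctly at either address because only the word flagged as an address is adjusted.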

Linking loaders (early 1950s) extended this by accepting multiple separately assembled modules and resolving cross-module references at load time. IBM’s mainframe operating systems of the 1960s used linking loaders extensively; the IBM OS/360 linker (which IBM called the “linkage editor”) was a sophisticated tool that produced a single executable module from multiple object files, performed name resolution, and created a load module that could be subsequently loaded and executed rapidly.

The distinction between the linker and the loader reflected different performance concerns. Linking took time — it involved processing many object files, resolving names, and writing a new file. Loading was time-critical — it needed to happen quickly each time a program ran. Separating the two allowed linking to happen once (at build time) while loading remained fast.

Static Libraries and the Archive Format

Collections of commonly used functions — mathematical routines, string operations, I/O functions — were organized into libraries: archives of object files that the linker could search when resolving symbols. The Unix ar archive format (1971) and its descendants remained in use fifty years later as the standard format for static libraries (.a files on Unix/Linux, .lib on Windows).

Static linking incorporated library code directly into the executable. The advantages were simplicity and self-containment: a statically linked executable contained everything it needed and could be deployed as a single file. The disadvantages were size and redundancy: every executable that used the C standard library carried its own copy of printf, malloc, and whatever other library functions it called. A system with a hundred programs all statically linked against the C library kept a hundred copies of the same code on disk, and loaded a separate copy into memory for each running program.
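When a linker searches a static library, it generally extracts only the archive members that define symbols still undefined, and each extracted member can introduce new undefined symbols of its own. The following is a rough sketch of that selection loop. The `select_members` function and the three-member `libc_a` archive are hypothetical, and real linkers differ in how and when they rescan archives.

```python
# Toy archive search. `archive` maps a member name to the symbols it
# defines and the symbols it references; the contents are made up.

def select_members(undefined, archive):
    """Pick the archive members needed to satisfy `undefined` symbols."""
    undefined = set(undefined)
    defined = set()
    chosen = []
    progress = True
    while progress:                      # repeat until a scan adds nothing
        progress = False
        for name, member in archive.items():
            if name in chosen:
                continue
            if member["defs"] & undefined:
                chosen.append(name)
                defined |= member["defs"]
                undefined |= member["refs"]   # new needs from this member
                undefined -= defined
                progress = True
    return chosen, undefined


libc_a = {
    "printf.o": {"defs": {"printf"}, "refs": {"write"}},
    "write.o": {"defs": {"write"}, "refs": set()},
    "qsort.o": {"defs": {"qsort"}, "refs": set()},
}

chosen, missing = select_members({"printf"}, libc_a)
print(chosen, missing)
```

In this toy archive a program that calls only `printf` pulls in `printf.o` and `write.o` but not `qsort.o`, so the copy embedded in each executable is the subset it uses rather than the whole library.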

More problematically, a security vulnerability in a widely used library required relinking every program that used it. If printf had a buffer overflow, every statically linked executable on the system was vulnerable until rebuilt.

Shared Libraries: One Copy for Everyone

The solution was shared libraries (also called dynamic link libraries on Windows, .dll): a single copy of library code loaded into memory once and shared by all running programs. When libc.so was loaded into memory, every process using the C standard library mapped the same physical pages of library code into its own address space, while writable data stayed private to each process. The library was stored on disk once, and its code occupied physical memory once.

The technical challenge was addressing. When library code was compiled, it did not know where in memory it would be loaded. The addresses of its own functions and data would differ each time the library was loaded, depending on what else was already in memory. Two solutions addressed this:

Position-Independent Code (PIC): compiler techniques that generated machine code using relative addressing, with addresses specified as offsets from the current instruction pointer rather than as absolute values. Position-independent code could be loaded anywhere in memory and execute correctly without modification. The Global Offset Table (GOT) and Procedure Linkage Table (PLT) were tables laid out by the static linker and filled in by the dynamic linker at runtime with the actual addresses of library functions and global variables.

Load-time relocation: the dynamic linker modified the library’s code at load time, patching absolute addresses to reflect the actual load address. This was simpler than PIC but meant the library pages could not be shared between processes (since each process’s copy was patched differently). Load-time relocation was used in older Windows DLLs; position-independent shared objects (.so files) on Unix/Linux used PIC.
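The difference between the two approaches can be modeled roughly as follows. This is a deliberately simplified sketch: tuples stand in for read-only code pages, a one-slot list stands in for the GOT, and every name, offset, and base address is hypothetical.

```python
# Toy contrast between the two approaches. Tuples stand in for read-only
# code pages; every name, offset, and address below is hypothetical.

HELPER_OFFSET = 0x30              # helper() sits 0x30 bytes into the library
PIC_CODE = ("call_via_got", 0)    # "call whatever address GOT slot 0 holds"


def load_pic(base):
    """PIC: the code is used as-is; only a per-process GOT is filled in."""
    got = [base + HELPER_OFFSET]
    return PIC_CODE, got


def load_with_relocation(base):
    """Load-time relocation: each process gets its own patched code copy."""
    return ("call_absolute", base + HELPER_OFFSET)


def call_target(code, got=None):
    """Return the address a toy call instruction would jump to."""
    kind, operand = code
    return got[operand] if kind == "call_via_got" else operand


# Two processes map the library at different base addresses.
code_a, got_a = load_pic(0x7000_0000)
code_b, got_b = load_pic(0x7F00_0000)
patched_a = load_with_relocation(0x7000_0000)
patched_b = load_with_relocation(0x7F00_0000)

print(code_a is code_b)         # the PIC code is one object for both processes
print(patched_a == patched_b)   # the patched copies differ between processes
```

Both approaches reach the right address in each process. What differs is where the per-process information lives: in a small private table for PIC, or inside the code itself for load-time relocation, which is why the patched code cannot be shared.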

`dlopen` and Runtime Dynamic Loading

Static linking resolved symbols at build time, and early shared library linking resolved them at program startup. Either way, the program was linked against specific libraries, any shared ones were loaded when the program started, and the set of libraries stayed fixed for the life of the process.

The Unix dlopen() interface (introduced in SunOS 4.0, 1988, and later standardized in POSIX) allowed programs to load libraries at runtime, after the program had already started. A program could open a library file, look up function symbols by name, and call them — without having been linked against that library at build time.
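The open, look up, call sequence can be tried from Python, whose ctypes module is built on dlopen() and dlsym() on POSIX systems. The sketch below assumes a Unix-like system where the C math library can be located. The choice of `cos` is purely illustrative.

```python
import ctypes
import ctypes.util

# Locate the C math library by its short name. The result is platform
# dependent (for example "libm.so.6" on many Linux systems) and can be
# None if the lookup fails, in which case CDLL(None) opens the symbols
# already loaded into the running process instead.
libm_name = ctypes.util.find_library("m")
libm = ctypes.CDLL(libm_name)      # opens the library at runtime

cos = libm.cos                     # looks the symbol up by name
cos.argtypes = [ctypes.c_double]   # declare the C signature: double cos(double)
cos.restype = ctypes.c_double

result = cos(0.0)
print(result)
```

Nothing in the Python interpreter was linked against a function chosen by this script at build time. The library name and the symbol name are ordinary strings resolved while the program runs, which is the property plugin systems depend on.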

This enabled plugin architectures: programs that could be extended by loading code that did not exist when the program was originally written. Native browser plugins, application plugin frameworks, and the native extension modules of interpreted languages all rely on dynamic loading. The Apache web server's module system (mod_php, mod_ssl) used dlopen()-style dynamic loading to load language handlers and protocol modules without recompiling Apache itself.

The Windows Equivalent

Windows implemented equivalent functionality through LoadLibrary(), GetProcAddress(), and FreeLibrary() — mirroring dlopen/dlsym/dlclose. Windows DLL Hell — the name given to the version conflict problems that plagued Windows installations through the 1990s — arose from shared libraries: multiple programs depending on different versions of the same DLL, installed to the same system directory, overwriting each other. Microsoft’s solution was the Windows Side-by-Side (WinSxS) system that stored multiple versions of shared components simultaneously, and later the packaging model that encouraged applications to ship their own library copies. Both solutions were responses to the fundamental tension in shared library design: sharing saves space but creates version dependency.

Address Space Layout Randomization and the Security Implications

Shared libraries had an unexpected security implication. Many exploits — particularly return-to-libc attacks — worked by overwriting a function’s return address with the address of a useful function in a shared library (like system() in libc). This required knowing the fixed address where the library was loaded.

Address Space Layout Randomization (ASLR), introduced in Linux 2.6.12 (2005) and Windows Vista (2007), randomized the load addresses of libraries, stack, and heap at program startup. An attacker who did not know where system() was loaded could not reliably exploit a buffer overflow to call it.
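The effect on an attacker who hard-codes an address can be illustrated with a small simulation. Everything here is an assumption made for the sketch: the offset of system(), the fixed base address, and the 16 bits of page-aligned randomness do not describe any particular operating system.

```python
import random

SYSTEM_OFFSET = 0x4F440      # hypothetical offset of system() in the library
FIXED_BASE = 0x4000_0000     # hypothetical load address without ASLR
PAGE = 0x1000


def library_base(aslr, rng):
    """Pick where the library lands for one program run."""
    if not aslr:
        return FIXED_BASE
    # Hypothetical: 16 bits of page-aligned randomness per run.
    return FIXED_BASE + rng.randrange(1 << 16) * PAGE


def exploit_succeeds(guessed_address, aslr, rng):
    """A hard-coded return address works only if it matches exactly."""
    return guessed_address == library_base(aslr, rng) + SYSTEM_OFFSET


rng = random.Random(0)
guess = FIXED_BASE + SYSTEM_OFFSET   # the address the attacker hard-coded

hits_without = sum(exploit_succeeds(guess, False, rng) for _ in range(1000))
hits_with = sum(exploit_succeeds(guess, True, rng) for _ in range(1000))
print(hits_without, hits_with)
```

Without randomization the guess is right on every run. With it, each run succeeds with probability 1 in 65,536 under these assumed parameters, so the exploit stops being reliable.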

ASLR required that shared libraries be loadable at any address, which position-independent code on Unix/Linux already allowed. Its adoption transformed PIC from a memory-sharing optimization into a security requirement. Many modern toolchains generate position-independent executables (PIE) by default, applying ASLR to the executable itself as well as its libraries.

Dead End: DLL Hell and Static Reincarnation

The deployment complexity of shared library versioning drove a partial reversal in the 2010s. Containerization (Docker, 2013) packaged applications with all their library dependencies in isolated containers, effectively statically linking at the distribution level: a container image contained a complete filesystem including all libraries at specific versions, isolated from the host system’s libraries. This solved DLL Hell by returning to the deployment simplicity of static linking — one bundle per application — while maintaining the memory efficiency of shared libraries within each container’s namespace.

Go (2009) made this tradeoff explicit: Go programs were statically linked by default, producing single-binary executables that typically had no external library dependencies. The increased binary size was acceptable to developers who valued deployment simplicity. For cloud-native applications deployed in containers, the Go approach aligned perfectly with the container model: build a single binary, package it in a container, deploy it anywhere.

The linker’s story did not end with containerization. WebAssembly (2017) introduced a new linkage model for sandboxed execution in browsers and beyond; Rust’s link-time optimization and cross-language linking capabilities pushed linker sophistication further. The invisible tool that assembles programs kept evolving.