File Systems: From FAT to ext to ZFS

Summary

A storage device is, physically, just a vast array of numbered blocks that can hold bytes. The file system is the illusion layered on top — the idea that those bytes are organized into named files, arranged in folders, with creation dates and permissions, that survive a power loss and can be found again. It is one of computing’s most successful abstractions, so successful that most people never think about it until it fails. Behind the familiar folder icon lies fifty years of engineering against a brutal adversary: the certainty that the power will eventually cut out mid-write, that disks will silently corrupt data, and that the file system must somehow stay consistent anyway.

What a File System Does

A raw storage device — a hard disk, an SSD, a USB stick — presents itself to the computer as a long sequence of fixed-size blocks (or sectors), each identified by a number. That is all the hardware offers: read block N, write block N. Everything else is software.
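The interface described above is small enough to sketch in a few lines. The following is a toy in-memory stand-in, not a real driver; the class name BlockDevice, the method names, and the 512-byte block size are assumptions chosen for illustration.

```python
class BlockDevice:
    """Toy in-memory stand-in for a disk: fixed-size blocks addressed by number."""

    def __init__(self, num_blocks: int, block_size: int = 512) -> None:
        self.block_size = block_size
        self._blocks = [bytes(block_size) for _ in range(num_blocks)]

    def read_block(self, n: int) -> bytes:
        return self._blocks[n]

    def write_block(self, n: int, data: bytes) -> None:
        # The device deals only in whole blocks; it has no notion of files.
        if len(data) != self.block_size:
            raise ValueError("a block device only accepts whole blocks")
        self._blocks[n] = bytes(data)


disk = BlockDevice(num_blocks=8)
disk.write_block(3, b"hello".ljust(512, b"\x00"))
print(disk.read_block(3)[:5])   # first five bytes of block 3
```

Everything a file system offers (names, directories, metadata, allocation) has to be built out of calls like these two.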

A file system is the layer that turns this flat array of blocks into the structured world users and programs expect:

Files — named sequences of bytes of arbitrary length, abstracting away which physical blocks actually hold the data.
Directories (folders) — a hierarchical namespace for organizing files, so a file has a path like /home/user/report.txt rather than a block number.
Metadata — for each file, bookkeeping the data itself does not contain: size, owner, permissions, and timestamps for creation, modification, and access.
Allocation and free-space management — deciding which blocks hold which file’s data, tracking which blocks are free, and reclaiming them when files are deleted.

The file system’s job is to maintain all of this reliably on a device that knows nothing about files — and to keep it consistent across crashes, which is where the real difficulty lies.

The Core Mechanism: Inodes and Allocation

Most Unix-derived file systems organize a file’s existence around an inode (index node) — a metadata record holding everything about a file except its name and contents: its size, owner, permissions, timestamps, and crucially the list of which disk blocks hold its data. The directory, separately, maps human-readable names to inode numbers. This separation is elegant: it allows one file to have multiple names (hard links), since several directory entries can point to the same inode.
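The split between names and inodes can be illustrated with two dictionaries. This is a simplified sketch, not the layout of any particular file system: the field names, the helper functions create, link, and unlink, and the inode number 42 are all hypothetical. The link_count field is modeled on the link count that Unix inodes keep.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    """Everything about a file except its name and contents."""
    size: int = 0
    owner: str = "root"
    mode: int = 0o644
    blocks: list[int] = field(default_factory=list)  # block numbers holding the data
    link_count: int = 0                               # how many names point here


inodes: dict[int, Inode] = {}      # inode number -> inode
directory: dict[str, int] = {}     # name -> inode number


def create(name: str, inode_no: int) -> None:
    inodes[inode_no] = Inode(link_count=1)
    directory[name] = inode_no


def link(existing: str, new_name: str) -> None:
    """Hard link: a second directory entry for the same inode."""
    inode_no = directory[existing]
    directory[new_name] = inode_no
    inodes[inode_no].link_count += 1


def unlink(name: str) -> None:
    """Remove a name; the inode is freed only when no names remain."""
    inode_no = directory.pop(name)
    inodes[inode_no].link_count -= 1
    if inodes[inode_no].link_count == 0:
        del inodes[inode_no]


create("report.txt", 42)
link("report.txt", "report-copy.txt")
unlink("report.txt")
print(directory, inodes[42].link_count)
```

Deleting the original name leaves the file fully intact under its second name, because the name was never part of the inode in the first place.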

The central design problem is how to find a file’s blocks. A small file’s blocks can be listed directly in the inode. But a large file may span millions of blocks, too many to list inline. The classic Unix solution is indirect blocks: the inode points to blocks that themselves point to data blocks, with double- and triple-indirect levels for ever-larger files — a tree of pointers. The alternative, used by many modern file systems, is extents: rather than listing every block, record contiguous ranges (“blocks 1000 through 5000”), which is far more compact and efficient for the large, contiguous files common today.
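Two short calculations make the trade-off concrete. The parameters below (4 KiB blocks, 4-byte pointers, 12 direct pointers) are assumed, ext2-like values and do not come from the text; the extent list and the function physical_block are likewise hypothetical.

```python
from bisect import bisect_right

# Assumed, ext2-like parameters (illustrative only).
BLOCK_SIZE = 4096          # bytes per block
POINTER_SIZE = 4           # bytes per block pointer
DIRECT_POINTERS = 12       # pointers stored directly in the inode

ptrs_per_block = BLOCK_SIZE // POINTER_SIZE   # pointers that fit in one block
addressable_blocks = (
    DIRECT_POINTERS
    + ptrs_per_block            # single indirect
    + ptrs_per_block ** 2       # double indirect
    + ptrs_per_block ** 3       # triple indirect
)
print(addressable_blocks * BLOCK_SIZE / 2**40, "TiB addressable by the pointer tree")

# An extent maps a run of logical file blocks onto a run of physical blocks:
# (first_logical_block, first_physical_block, length)
extents = [(0, 1000, 4001), (4001, 9000, 500)]   # "blocks 1000 through 5000", then 500 more


def physical_block(logical: int) -> int:
    """Translate a file-relative block number to a disk block number."""
    starts = [e[0] for e in extents]
    i = bisect_right(starts, logical) - 1
    if i < 0:
        raise IndexError("block is before the start of the file")
    first_logical, first_physical, length = extents[i]
    if logical >= first_logical + length:
        raise IndexError("block is past the end of the file")
    return first_physical + (logical - first_logical)


print(physical_block(0), physical_block(4000), physical_block(4001))
```

Under these assumptions the pointer tree reaches roughly 4 TiB, but a fully populated tree needs one pointer per block. The extent version describes the same 4,501 blocks with two small records.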

How free space is tracked and how files are laid out together determine a file system’s vulnerability to fragmentation: the scattering of a file’s blocks across the disk, which on spinning disks meant slow, seek-heavy reads. Defragmentation was a familiar ritual of the DOS and early Windows era; better allocation strategies and the rise of SSDs (which have no seek penalty) largely retired it.

FAT: The Lowest Common Denominator

The File Allocation Table (FAT) file system, originating with Microsoft in the late 1970s and carried through DOS and early Windows, is arguably the most widely deployed file system in history, not because it is good but because it is simple. FAT tracks file blocks (which it calls clusters) with a single table stored near the start of the disk: a linked list in which each entry records which block follows which.
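Reading a file under this scheme means walking the table one entry at a time. The sketch below is a simplification: the table contents are made up, and END_OF_CHAIN is an illustrative marker (real FAT variants reserve specific values for end-of-chain and free entries). A file's directory entry supplies the first cluster number; the table supplies the rest.

```python
END_OF_CHAIN = -1  # illustrative end marker; real FAT variants use reserved values

# fat[n] holds the number of the cluster that follows cluster n in its file.
fat = {2: 5, 5: 6, 6: 9, 9: END_OF_CHAIN,    # one file scattered over 2, 5, 6, 9
       3: 4, 4: END_OF_CHAIN}                # another file in 3, 4


def cluster_chain(first_cluster: int) -> list[int]:
    """Follow the linked list from a file's first cluster to its end."""
    chain = []
    cluster = first_cluster
    while cluster != END_OF_CHAIN:
        chain.append(cluster)
        cluster = fat[cluster]
    return chain


print(cluster_chain(2))   # [2, 5, 6, 9]
```

The simplicity is visible here, and so is the cost: reaching the last block of a file requires visiting every entry before it.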

FAT’s evolution through FAT12, FAT16, FAT32, and later exFAT was a running battle against its own size limits. FAT32’s notorious 4 GB maximum file size, a consequence of storing each file’s size in a 32-bit field, frustrated a generation trying to copy large videos. FAT has no permissions, no journaling, and poor reliability. Yet its very simplicity made it the universal format: nearly every operating system, camera, and embedded device can read FAT, which is why USB flash drives and SD cards still ship formatted with FAT or exFAT to this day. FAT survives as computing’s lowest common denominator, the format everything agrees on precisely because it asks so little.

The Crash Problem and the Journaling Revolution

The defining challenge of file-system design is crash consistency. Writing a file is not one operation but several — update the data blocks, update the inode, update the free-space map, update the directory. If the power fails between these steps, the file system is left in an inconsistent state: a file whose inode says it owns blocks that the free map also lists as free, or a directory entry pointing at an inode that was never written.

The early answer was the fsck (file system check) utility, which scanned the entire disk on boot after a crash to find and repair inconsistencies. As disks grew, this became untenable — a full fsck of a large disk could take hours, an eternity for a server that needed to come back online.
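The kind of cross-checking such a tool performs can be sketched against the two inconsistencies described earlier. This is a toy model, not how any real fsck is implemented: the function check and the data structures (inodes as a map from inode number to block list, a set of free blocks, a name-to-inode directory) are assumptions for illustration.

```python
def check(inodes: dict[int, list[int]], free_blocks: set[int],
          directory: dict[str, int]) -> list[str]:
    """Report the inconsistencies a crash can leave behind."""
    problems = []
    owner_of: dict[int, int] = {}
    for inode_no, blocks in sorted(inodes.items()):
        for block in blocks:
            if block in free_blocks:
                problems.append(f"block {block} is used by inode {inode_no} but marked free")
            if block in owner_of:
                problems.append(
                    f"block {block} is claimed by inodes {owner_of[block]} and {inode_no}")
            owner_of[block] = inode_no
    for name, inode_no in sorted(directory.items()):
        if inode_no not in inodes:
            problems.append(f"entry {name!r} points at missing inode {inode_no}")
    return problems


# State left by a hypothetical crash midway through creating two files.
inodes = {7: [100, 101], 8: [102]}
free_blocks = {101, 200, 201}          # block 101 was never removed from the free map
directory = {"a.txt": 7, "b.txt": 9}   # inode 9 was never written

for problem in check(inodes, free_blocks, directory):
    print(problem)
```

Even in this toy form the check has to visit every inode and every directory entry, which is why the running time grows with the size of the disk rather than with the amount of work interrupted by the crash.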

The breakthrough was journaling. Before making changes, the file system first writes a description of what it is about to do to a dedicated log (the journal). If a crash interrupts the operation, recovery is fast: on reboot, the system simply replays or discards the journal entries, restoring consistency in seconds instead of scanning the whole disk. Journaling, borrowed conceptually from database transaction logs (see The Database Revolution), became standard:

ext3 (2001) added journaling to Linux’s ext2, and ext4 (2008) added extents and other modern features; ext4 remains the default for much of the Linux world.
NTFS (1993), Windows’ journaling file system, brought permissions, reliability, and (in later versions) encryption that FAT never had.
HFS+ gained journaling in the early 2000s, bringing it to the Mac; Apple’s later APFS moved on to copy-on-write.
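The journaling protocol described above (log the intent, mark it committed, then do the real writes) can be sketched as follows. This is a minimal model under loud assumptions: the class JournaledStore and its method names are hypothetical, the "disk" is a dictionary, and the journal is simply assumed to survive a crash. Real journals also deal with write ordering and partially written log records, which this sketch ignores.

```python
class JournaledStore:
    """Toy key/value 'disk' with a write-ahead journal."""

    def __init__(self) -> None:
        self.disk: dict[str, str] = {}     # the main on-disk structures
        self.journal: list[dict] = []      # the log; assumed to survive a crash

    def begin(self, updates: dict[str, str]) -> dict:
        """Step 1: describe the intended changes in the journal."""
        txn = {"updates": dict(updates), "committed": False}
        self.journal.append(txn)
        return txn

    def commit(self, txn: dict) -> None:
        """Step 2: mark the description complete (the commit record)."""
        txn["committed"] = True

    def apply(self, txn: dict) -> None:
        """Step 3: perform the real writes, then retire the journal entry."""
        self.disk.update(txn["updates"])
        self.journal.remove(txn)

    def recover(self) -> None:
        """After a crash: replay committed entries, discard incomplete ones."""
        for txn in list(self.journal):
            if txn["committed"]:
                self.disk.update(txn["updates"])
            self.journal.remove(txn)


store = JournaledStore()

# Crash after commit but before the real writes: recovery replays the entry.
txn = store.begin({"inode 7": "owns block 100", "free map": "block 100 in use"})
store.commit(txn)
store.recover()
print(store.disk)

# Crash before commit: recovery discards the half-described operation.
store.begin({"inode 8": "owns block 101"})
store.recover()
print(store.disk)   # unchanged: inode 8 was never created
```

Recovery only reads the journal, never the whole disk, so its cost depends on how much work was in flight rather than on the size of the volume. Replaying an entry twice is harmless here because each update simply sets a value.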

Journaling was the moment file systems became genuinely trustworthy on consumer hardware — the quiet reason your computer no longer spends ten minutes “checking the disk” after an unclean shutdown.

The Modern Frontier: Copy-on-Write and Data Integrity

The current generation of file systems — ZFS (Sun Microsystems, 2005) and Btrfs (Linux) — went further, built around copy-on-write (CoW). Instead of overwriting data in place, CoW writes modified data to new blocks and only then atomically updates the pointers to reference them. The old data remains intact until the switch is complete, so a crash can never leave a half-written state — there is always a consistent version to fall back to. CoW makes journaling largely unnecessary and enables features that older designs could not offer cheaply:

Snapshots — because old blocks are preserved, the file system can keep instant, near-free point-in-time copies of its entire state, allowing rollback after a bad update or accidental deletion.
End-to-end checksums — ZFS’s signature contribution. It checksums every block and verifies it on every read, detecting silent data corruption (“bit rot”) that the disk hardware itself fails to catch. Combined with redundancy, it can automatically repair corrupted data from a good copy. ZFS treated the disk as an adversary that will corrupt your data, and built integrity checking into the file system itself — a philosophical shift in what a file system is responsible for.
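The copy-on-write update and the snapshots it makes cheap can be sketched with a pointer table that is replaced rather than modified. This is an illustrative toy, not the on-disk design of ZFS or Btrfs: the class CowStore, its methods, and the flat name-to-block table are all assumptions (real systems use trees of blocks and reclaim blocks no snapshot references).

```python
from typing import Optional


class CowStore:
    """Toy copy-on-write store: blocks are never overwritten in place."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}      # block number -> contents (write-once)
        self.next_block = 0
        self.root: dict[str, int] = {}          # live pointer table: name -> block number
        self.snapshots: dict[str, dict[str, int]] = {}

    def write(self, name: str, data: bytes) -> None:
        # 1. Write the new version to a fresh block; the old block is untouched.
        new_block = self.next_block
        self.next_block += 1
        self.blocks[new_block] = data
        # 2. Build a new pointer table and switch to it in a single step.
        #    A crash before this line leaves the old, consistent root in place.
        self.root = {**self.root, name: new_block}

    def read(self, name: str, snapshot: Optional[str] = None) -> bytes:
        table = self.root if snapshot is None else self.snapshots[snapshot]
        return self.blocks[table[name]]

    def snapshot(self, label: str) -> None:
        # A snapshot keeps the current pointer table; no data is copied.
        self.snapshots[label] = self.root


fs = CowStore()
fs.write("config", b"v1")
fs.snapshot("before-update")
fs.write("config", b"v2 (broken)")
print(fs.read("config"))                            # current version
print(fs.read("config", snapshot="before-update"))  # still the old version
fs.root = fs.snapshots["before-update"]             # rollback is a pointer switch
```

Because write never mutates an existing table or block, the snapshot costs one reference, and rolling back is the same single pointer switch that every ordinary write performs.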

These systems blurred the old line between the file system and the volume manager, absorbing functions like software RAID and logical volume management into the file system proper.
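Having redundancy inside the file system is what makes the automatic repair of corrupted blocks possible. A minimal sketch of the idea, assuming a two-way mirror: the class MirroredStore is hypothetical, SHA-256 is a stand-in for whatever checksum a real system uses, and the checksums are kept apart from the data they cover so that a damaged block cannot vouch for itself.

```python
import hashlib


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class MirroredStore:
    """Toy two-way mirror whose checksums live apart from the data they cover."""

    def __init__(self) -> None:
        self.mirrors: list[dict[int, bytes]] = [{}, {}]   # two "disks"
        self.checksums: dict[int, str] = {}               # stored separately from the data

    def write(self, block_no: int, data: bytes) -> None:
        for mirror in self.mirrors:
            mirror[block_no] = data
        self.checksums[block_no] = checksum(data)

    def read(self, block_no: int) -> bytes:
        expected = self.checksums[block_no]
        for mirror in self.mirrors:
            data = mirror[block_no]
            if checksum(data) == expected:
                # Repair any copy that no longer matches the good one.
                for other in self.mirrors:
                    if other[block_no] != data:
                        other[block_no] = data
                return data
        raise IOError(f"block {block_no}: every copy failed its checksum")


store = MirroredStore()
store.write(0, b"important data")
store.mirrors[0][0] = b"important dat\x00"    # simulate silent corruption on disk 0
print(store.read(0))                           # served from the good copy
print(store.mirrors[0][0])                     # disk 0 has been repaired
```

A separate RAID layer beneath the file system could return either copy without knowing which one is correct; here the layer that holds the checksum is also the layer that chooses the copy.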

Beyond the Local Disk

File systems have stretched far beyond a single machine’s disk. Network file systems — Sun’s NFS (1984) and Windows’ SMB/CIFS — let machines access files on remote servers as if they were local, an early and influential piece of distributed-systems engineering (see Distributed Systems). At hyperscale, distributed file systems like the Google File System (GFS) and Hadoop’s HDFS spread enormous datasets across thousands of machines, underpinning the Big Data revolution. And flash storage (SSDs) forced its own rethink: flash cannot overwrite in place and wears out after a limited number of writes per cell, demanding wear-leveling and TRIM support, and inspiring flash-specific file systems like F2FS. The abstraction of “a file” stayed constant while the machinery beneath it was rebuilt for entirely new physics.

Dead End: The One True File System

A recurring ambition was the universal file system: one design to serve every purpose, every operating system, every device. It never arrived, and the reasons are instructive. The most ambitious attempt was ZFS itself, which combined volume management, integrity, snapshots, and scalability into one technically dazzling system widely regarded as ahead of its time. Yet ZFS never became the universal Linux file system, and not for technical reasons: Sun released it under the CDDL, a license widely considered incompatible with the Linux kernel’s GPL (see The Open Source Revolution). The licensing clash, not engineering, kept what many regarded as the best file system of its era out of the kernel where it would have done the most good, and left Linux to develop Btrfs as a homegrown alternative that took over a decade to stabilize. Meanwhile FAT, technically primitive, kept winning the universality crown precisely because it was simple and already supported everywhere. The lesson recurs across this encyclopedia: the “best” technology rarely wins on merit alone, and in file systems, where data must outlive hardware and cross between rival operating systems, licensing, compatibility, and institutional inertia have shaped outcomes as decisively as any clever allocation algorithm. There is no one true file system, and there probably never will be.