
Yann LeCun and Convolutional Networks

Summary

Yann LeCun designed the first practical convolutional neural network in 1989. He developed it at Bell Labs into a system for reading handwritten checks that, at its peak, processed over ten percent of all checks written in the United States. He then watched the approach be dismissed for a decade before it became the foundation of computer vision and deep learning. In 2013 he joined Facebook to found its AI research lab, later becoming the company's Chief AI Scientist, and built one of the world's premier AI research organizations. Since ChatGPT's arrival he has argued in public, loudly, technically, and sometimes combatively, that large language models are not the path to human-level intelligence. He may be wrong. He may also be the only person with sufficient credentials to make the argument serious.

A French Engineer in New Jersey

Yann LeCun was born on July 8, 1960, in the Paris suburb of Soisy-sous-Montmorency. He studied electrical engineering at ESIEE Paris and completed his PhD at the Université Pierre et Marie Curie (Paris VI) in 1987, working on learning algorithms in neural networks. His dissertation research came to the attention of Geoffrey Hinton, then at the University of Toronto, who had recently published the backpropagation paper with Rumelhart and Williams.

LeCun spent a postdoctoral year with Hinton at Toronto — a formative collaboration that connected him to the small, embattled community working on neural networks at a time when the field had largely moved on to other methods. He then joined Bell Labs in New Jersey in 1988, entering perhaps the most storied industrial research institution in American science. Bell Labs was where Claude Shannon had developed information theory, where Ken Thompson and Dennis Ritchie had built Unix, where the transistor had been invented. It had long time horizons, serious resources, and problems that needed solving.

One of those problems was handwriting recognition. AT&T and the banking industry wanted to read the amounts on personal checks automatically. The digits were handwritten by millions of different people with millions of different styles. No one could write rules that covered all the variation. The problem was, in essence, a machine learning problem — and LeCun arrived with exactly the approach to solve it.

LeNet: The Spatial Structure of Images

The insight that made LeNet possible was about structure. A standard fully connected neural network, applied to an image, would connect every input pixel to every neuron in the first layer. This was computationally extravagant and, more importantly, threw away everything that images had in common: nearby pixels were more related than distant ones, the same feature could appear anywhere in the image, and the task of recognizing an object should not depend on exactly where in the frame it appeared.
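To put a number on "computationally extravagant", here is a back-of-the-envelope comparison. It assumes the input and filter sizes of the first layer of LeNet-5, the later architecture described below: a 32×32 image mapped to six 28×28 feature maps with 5×5 filters. The function names are illustrative. The fully connected figure describes a hypothetical layer that produces the same number of outputs.

```python
# Assumed sizes, matching the first layer of LeNet-5: a 32x32 grayscale input
# mapped to 6 feature maps of 28x28 using 5x5 filters.
IN_H = IN_W = 32
K = 5
MAPS = 6
OUT_H, OUT_W = IN_H - K + 1, IN_W - K + 1  # "valid" convolution: 28x28

def fully_connected_params() -> int:
    """Every input pixel connects to every output unit, plus one bias per unit."""
    n_in = IN_H * IN_W
    n_out = MAPS * OUT_H * OUT_W
    return n_in * n_out + n_out

def convolutional_params() -> int:
    """Each feature map owns one shared KxK filter plus one shared bias."""
    return MAPS * (K * K + 1)

def convolutional_connections() -> int:
    """Connections still exist at every output position; only the weights are shared."""
    return MAPS * OUT_H * OUT_W * (K * K + 1)

print(fully_connected_params())     # 4821600
print(convolutional_params())       # 156
print(convolutional_connections())  # 122304
```

The convolutional counts match the 156 trainable parameters and 122,304 connections reported for LeNet-5's first layer. The fully connected alternative needs roughly 30,000 times as many parameters.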

LeCun’s solution was to encode these priors directly in the architecture:

Local connectivity: each neuron connected only to a small spatial region of the input — a receptive field — not to the whole image.

Weight sharing: all neurons in a given feature map applied the same filter at different positions, so the filter effectively scanned across the image. If the filter detected horizontal edges, it detected them everywhere, using the same weights. This reduced the number of parameters enormously and made the feature maps translation-equivariant: shifting the input shifts the feature map by the same amount.

Pooling layers: spatial subsampling after each convolutional layer reduced the spatial dimensions of the representation, making the system tolerant to small shifts and distortions while preserving which features were present.
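The three ideas fit in a few lines of plain Python. This is a minimal sketch, not LeNet's code. `conv2d` slides one shared 2×2 kernel over local patches. `max_pool` uses max pooling, the variant common in later networks; LeNet-5's subsampling layers instead averaged each window and applied a trainable coefficient and bias. The image and the edge-detecting kernel are hypothetical.

```python
from typing import List

Grid = List[List[float]]

def conv2d(image: Grid, kernel: Grid) -> Grid:
    """Valid 2-D cross-correlation with one shared kernel.

    Each output depends only on a local patch (local connectivity), and every
    position uses the same kernel weights (weight sharing).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [sum(kernel[a][b] * image[i + a][j + b] for a in range(kh) for b in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]

def max_pool(fmap: Grid, size: int = 2) -> Grid:
    """Non-overlapping max pooling; rows/columns that don't fill a window are dropped."""
    return [
        [max(fmap[i + a][j + b] for a in range(size) for b in range(size))
         for j in range(0, len(fmap[0]) - size + 1, size)]
        for i in range(0, len(fmap) - size + 1, size)
    ]

def shift_right(image: Grid, k: int) -> Grid:
    """Shift every row right by k pixels, padding with zeros on the left."""
    return [[0.0] * k + row[: len(row) - k] for row in image]

# Hypothetical 2x2 kernel that responds to dark-to-bright transitions (left to right).
edge = [[-1.0, 1.0], [-1.0, 1.0]]

# Hypothetical 6x6 image: dark first column, bright elsewhere.
img = [[0.0, 1.0, 1.0, 1.0, 1.0, 1.0] for _ in range(6)]

fmap = conv2d(img, edge)                        # each row: [2.0, 0.0, 0.0, 0.0, 0.0]
fmap_shifted = conv2d(shift_right(img, 1), edge)

print(fmap_shifted == shift_right(fmap, 1))     # True: the response moves with the input
print(max_pool(fmap) == max_pool(fmap_shifted)) # True: pooling absorbs the small shift
```

Shifting the edge by one pixel shifts the feature map by one position, which is the equivariance that weight sharing provides. Both responses fall inside the same 2×2 pooling window, so the pooled outputs are identical. A shift that crossed a window boundary would still change the pooled output. That is why pooling gives tolerance to small shifts rather than full invariance.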

In 1989, LeCun published “Backpropagation Applied to Handwritten Zip Code Recognition,” the paper describing the first practical convolutional neural network, trained to read handwritten zip code digits from U.S. mail. The architecture was trained end-to-end with backpropagation. This was the same gradient-based learning algorithm that Hinton had championed, now applied to a structured architecture that used prior knowledge about images to constrain the learning problem.
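End-to-end training means the filter weights themselves are learned by gradient descent. Weight sharing has a direct consequence here: a shared weight's gradient sums the contributions from every position where the filter was applied. The 1-D sketch below shows that rule and recovers the filter starting from zero. It uses a hypothetical input and a target produced by a known [-1, 1] filter. It illustrates the principle only and is not the 1989 network.

```python
from typing import List

def conv1d(x: List[float], w: List[float]) -> List[float]:
    """Valid 1-D convolution (cross-correlation) with one shared kernel w."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]

def loss(x: List[float], w: List[float], target: List[float]) -> float:
    """Half the squared error between the feature map and a target map."""
    y = conv1d(x, w)
    return 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, target))

def grad(x: List[float], w: List[float], target: List[float]) -> List[float]:
    """Gradient of `loss` with respect to each shared weight.

    w[j] is reused at every output position i, so its gradient sums the
    contributions from all positions: sum_i (y_i - t_i) * x[i + j].
    """
    y = conv1d(x, w)
    err = [yi - ti for yi, ti in zip(y, target)]
    return [sum(err[i] * x[i + j] for i in range(len(err))) for j in range(len(w))]

# Hypothetical data: targets produced by a known difference filter [-1, 1].
x = [0.0, 1.0, 3.0, 2.0, 5.0, 4.0]
target = conv1d(x, [-1.0, 1.0])

w = [0.0, 0.0]            # start from zero and learn the filter by gradient descent
learning_rate = 0.01
for _ in range(500):
    g = grad(x, w, target)
    w = [wj - learning_rate * gj for wj, gj in zip(w, g)]

print([round(wj, 6) for wj in w])  # [-1.0, 1.0]
```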

The results were good enough that AT&T and Bell Labs invested in turning the research into a deployed product. Over the following years, LeCun developed LeNet-5, an evolved architecture with multiple convolutional and pooling layers followed by fully connected layers for classification. By the mid-1990s, the system was in production.
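The LeNet-5 layer sequence can be traced by following only the shapes. The sizes below are those given in the 1998 paper: a 32×32 input, six 5×5 convolutional maps (C1), 2×2 subsampling (S2), sixteen maps (C3), subsampling (S4), 120 maps (C5), 84 fully connected units (F6), and 10 outputs. The `LENET5` table and `trace_shapes` helper are illustrative names. The sketch ignores details that do not affect the shapes, such as C3's partial connection table and the radial basis units of the output layer.

```python
from typing import List, Optional, Tuple

# (layer name, kind, kernel or window size, output channels/units), following the
# layer sizes reported for LeNet-5. Names and table layout are illustrative.
LENET5: List[Tuple[str, str, Optional[int], int]] = [
    ("C1", "conv", 5, 6),
    ("S2", "pool", 2, 6),
    ("C3", "conv", 5, 16),
    ("S4", "pool", 2, 16),
    ("C5", "conv", 5, 120),
    ("F6", "fc", None, 84),
    ("OUTPUT", "fc", None, 10),
]

def trace_shapes(input_size: int = 32) -> List[Tuple[str, int, int, int]]:
    """Return (name, channels, height, width) after each layer for a square input."""
    side = input_size
    shapes = []
    for name, kind, size, channels in LENET5:
        if kind == "conv":
            side = side - size + 1   # valid convolution, stride 1
        elif kind == "pool":
            side = side // size      # non-overlapping subsampling
        else:
            side = 1                 # fully connected layer: a flat vector of units
        shapes.append((name, channels, side, side))
    return shapes

for name, channels, height, width in trace_shapes():
    print(f"{name:>6}: {channels} x {height} x {width}")
```

A valid 5×5 convolution shrinks each side by 4, and each subsampling layer halves it. By C5 the 5×5 maps have shrunk to a single position, so the convolutional stack feeds the classifier directly.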

Info

At its peak deployment, LeNet-5 was reading the handwritten amounts on checks that represented between ten and twenty percent of all personal checks written in the United States. This was the first large-scale commercial deployment of a deep neural network, and it worked. The system ran on specialized hardware, a custom VLSI chip, because general-purpose computers of the mid-1990s could not process the required volume of checks at acceptable speed. LeCun's architecture was right nearly two decades before the hardware infrastructure existed to scale it.

The Winter: Support Vector Machines and the Long Wait

Despite its commercial success at Bell Labs, the convolutional network approach did not take over the field. Support vector machines (SVMs), in the soft-margin form published by Corinna Cortes and Vladimir Vapnik at Bell Labs in 1995, offered competitive performance on many tasks with stronger theoretical guarantees and lower computational requirements. The ML mainstream found SVMs more attractive. Training an SVM was a convex optimization problem, so it could not get stuck in a poor local minimum, and SVMs came with generalization bounds and connections to statistical learning theory that neural networks lacked.
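The theoretical appeal shows up in the standard soft-margin SVM training problem. It is stated here in the usual textbook form, with labels y_i ∈ {−1, +1}, slack variables ξ_i, and a trade-off constant C > 0:

```latex
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;\; \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, n.
```

The objective is a convex quadratic and the constraints are linear, so any local minimum is a global one. The margin term, governed by ‖w‖, connects directly to margin-based generalization bounds. A neural network's nonconvex training loss offers no such guarantee.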

Through the late 1990s and early 2000s, the prevailing view was that neural networks had been superseded. They worked on specific problems like digit recognition, the argument went, but did not generalize beyond them. The features they learned were opaque. The training was temperamental. LeCun published his definitive LeNet-5 paper, “Gradient-Based Learning Applied to Document Recognition,” in 1998. It gave a comprehensive account of the architecture and its applications and remains one of the most cited papers in computer science. Then, for most of the subsequent decade, the broader field moved on.

LeCun moved to NYU in 2003 and established a research group focused on visual perception and deep learning at a moment when neither was fashionable. He, Hinton, and Yoshua Bengio formed the core of the small community that kept neural network research alive through the lean years. They called themselves, somewhat ironically, the “deep learning conspiracy”: a small group convinced they were right and willing to wait for the field to catch up.

Warning

The decade between LeNet-5 (1998) and the ImageNet era (2012) is a case study in how good ideas can lose institutional momentum. LeCun’s convolutional architecture worked. It was deployed commercially. The principles were sound. But it could not yet beat SVMs on the benchmark problems that determined funding and career trajectories in the ML community of the 2000s. The architecture waited for GPUs, for large labeled datasets, and for the organizational will to train large networks — and when all three arrived, the architecture that had been “superseded” turned out to have been right all along.

The ImageNet Vindication

The dataset and competition that precipitated this moment, built by Fei-Fei Li over five years, are covered in ImageNet and the Deep Learning Revolution.

In 2012, Hinton's group at Toronto (Alex Krizhevsky, Ilya Sutskever, and Hinton) entered the ImageNet Large Scale Visual Recognition Challenge with AlexNet. AlexNet was a deep convolutional neural network built squarely on the principles LeCun had developed at Bell Labs. It achieved a top-5 error of 15.3%, beating the second-place system (26.2%) by more than ten percentage points. The team scaled up the design Bell Labs had used to read checks in the mid-1990s, refined it with ReLU activations and dropout, and trained it on GPUs with far more data. In that form it went on to dominate visual recognition benchmarks.
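Top-5 error is the metric behind these figures. It counts an image as misclassified only if the true label is missing from the model's five highest-scoring guesses; ILSVRC used 1,000 classes. Below is a minimal sketch with hypothetical scores over eight classes. `top5_error` is an illustrative name, not a library function.

```python
from typing import Sequence

def top5_error(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of examples whose true label is NOT among the 5 highest-scoring classes."""
    misses = 0
    for row, label in zip(scores, labels):
        top5 = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:5]
        if label not in top5:
            misses += 1
    return misses / len(labels)

# Hypothetical scores over 8 classes for 3 images.
scores = [
    [0.1, 0.9, 0.3, 0.2, 0.0, 0.05, 0.4, 0.6],  # true class 1 is ranked 1st -> hit
    [0.8, 0.1, 0.7, 0.6, 0.5, 0.3, 0.2, 0.4],   # true class 6 is ranked 7th -> miss
    [0.2, 0.3, 0.1, 0.9, 0.8, 0.7, 0.6, 0.5],   # true class 7 is ranked 5th -> hit
]
labels = [1, 6, 7]
print(top5_error(scores, labels))  # 0.3333333333333333
```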

LeCun’s reaction to AlexNet was not “I told you so” — though he had, in some sense, told them so. His reaction was to get back to work. The vindication was real but also incomplete: the field now knew that convolutional networks worked at scale. The question was what to do with them.

Facebook AI Research: Building an Academic Lab Inside a Company

In December 2013, Mark Zuckerberg hired LeCun to found and lead Facebook AI Research (FAIR). The timing was driven partly by competitive anxiety. Tech companies were beginning to understand that AI research capacity would be strategically decisive, and Google would acquire DeepMind only weeks later, in January 2014. It was also driven by LeCun's specific reputation as the person who had built the architecture now transforming the field.

LeCun was given substantial autonomy. He maintained his NYU professorship and ran FAIR as an academic laboratory within a commercial organization: publishing research openly, attending academic conferences, taking graduate students, pursuing long-term fundamental questions alongside applied work. The model was similar to how Bell Labs had operated at its best — research embedded in industry, oriented toward fundamental questions, publishing results rather than hoarding them as trade secrets.

Under LeCun's leadership, FAIR produced foundational contributions in computer vision, natural language processing, generative modeling, and self-supervised learning. Among its most consequential outputs was PyTorch, the deep learning framework developed by a FAIR team led by Soumith Chintala. PyTorch's design made it significantly more comfortable to use than Google's TensorFlow, which at the time required users to build a static graph before running it. That design rested on dynamic computation graphs, Python-native debugging, and an imperative programming style. PyTorch became the dominant framework in academic AI research and, increasingly, in production deployment.
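A dynamic computation graph is one recorded as ordinary Python code executes, so loops and branches shape it at run time and it can be inspected with a normal debugger. The toy scalar autograd below is a minimal sketch of that define-by-run idea. The `Value` class is hypothetical and is not PyTorch's API. In PyTorch, tensors created with `requires_grad=True` and the `.backward()` method play the analogous roles.

```python
from typing import Callable, List, Tuple

class Value:
    """A scalar that records how it was computed, so gradients can flow back.

    The graph is built while ordinary Python code runs ("define-by-run").
    Toy sketch for illustration only.
    """

    def __init__(self, data: float, parents: Tuple["Value", ...] = (),
                 backward: Callable[[], None] = lambda: None):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = backward

    def __add__(self, other: "Value") -> "Value":
        out = Value(self.data + other.data, (self, other))
        def backward() -> None:
            self.grad += out.grad
            other.grad += out.grad
        out._backward = backward
        return out

    def __mul__(self, other: "Value") -> "Value":
        out = Value(self.data * other.data, (self, other))
        def backward() -> None:
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = backward
        return out

    def backward(self) -> None:
        # Topologically order the graph that was recorded during execution,
        # then apply the chain rule from the output back to the inputs.
        order: List["Value"] = []
        seen = set()
        def visit(v: "Value") -> None:
            if id(v) not in seen:
                seen.add(id(v))
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

def f(x: Value, n_terms: int) -> Value:
    # Ordinary Python control flow decides the graph's shape at run time.
    total = Value(0.0)
    for _ in range(n_terms):
        total = total + x * x
    return total

x = Value(3.0)
y = f(x, n_terms=4)    # y = 4 * x^2
y.backward()
print(y.data, x.grad)  # 36.0 24.0
```

Calling `f` with a different `n_terms` builds a different graph on the next call. No separate graph-definition step is needed, which is the property the paragraph above credits for PyTorch's usability.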

The LLM Debate: An Insider’s Critique

When ChatGPT launched in November 2022 and demonstrated a quality of language generation that surprised even researchers in the field, the AI community’s predominant reaction was that large language models (LLMs) represented a genuine path toward general intelligence — perhaps the path. LeCun disagreed, and said so, in terms that were technical, repeated, and sometimes heated.

His argument was not that LLMs were unimpressive. It was that they were impressive in ways that were not the relevant ways. The specific points:

LLMs lack world models. Human intelligence is grounded in physical experience — in navigating space, manipulating objects, predicting the consequences of actions. LLMs train on text, which is a compressed, filtered, second-order representation of the world. No amount of text can substitute for the grounded sensorimotor experience that infant cognition develops over years. A system that has processed all of Wikipedia but has never pushed a button does not understand buttons in the way that matters for general intelligence.

LLMs do not plan. Generating a response token by token is not reasoning; it is autoregressive pattern completion. When an LLM produces a correct answer to a multi-step problem, it is not planning a solution path — it is sampling from a distribution over plausible continuations. These can coincide, but they are not the same process, and the distinction matters for reliability, generalizability, and robustness.
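The generation loop in this argument is simple to write down. In the sketch below, a hypothetical bigram table stands in for an LLM's next-token distribution; a real LLM conditions on the whole prefix using a neural network. Each step samples one token and appends it. Nothing in the loop represents a goal or looks ahead to check where the sentence is going.

```python
import random
from typing import Dict, List

# Hypothetical toy "language model": next-token probabilities conditioned only on
# the previous token. Real LLMs condition on the full prefix, but the loop has the same shape.
BIGRAM: Dict[str, Dict[str, float]] = {
    "<s>": {"the": 0.7, "a": 0.3},
    "the": {"cat": 0.5, "dog": 0.5},
    "a":   {"cat": 0.4, "dog": 0.6},
    "cat": {"sat": 0.6, "ran": 0.4},
    "dog": {"sat": 0.3, "ran": 0.7},
    "sat": {"</s>": 1.0},
    "ran": {"</s>": 1.0},
}

def generate(rng: random.Random, max_tokens: int = 10) -> List[str]:
    """Autoregressive generation: sample one token, append it, condition on it, repeat."""
    tokens = ["<s>"]
    for _ in range(max_tokens):
        dist = BIGRAM[tokens[-1]]
        nxt = rng.choices(list(dist), weights=list(dist.values()), k=1)[0]
        if nxt == "</s>":
            break
        tokens.append(nxt)
    return tokens[1:]

rng = random.Random(0)
for _ in range(3):
    print(" ".join(generate(rng)))
```

Any appearance of a plan in the output comes from the statistics of the distribution, not from a search over solution paths. That is the distinction the argument turns on.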

The right architecture is different. LeCun has proposed the Joint Embedding Predictive Architecture (JEPA) as a framework for systems that learn by building internal models of the world at multiple levels of abstraction, predicting not pixels or tokens but abstract representations. This approach, he argues, is more consistent with how biological systems learn and is more likely to produce the world models that general intelligence requires.
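A deliberately tiny example shows the core difference between predicting pixels and predicting representations. Here `encode` and `predict` are hypothetical stand-ins for the learned encoder and predictor of a JEPA-style system. The encoder keeps coarse structure and discards fine detail, and the target frame adds small detail that the context cannot reveal. This illustrates the loss comparison only, not LeCun's architecture.

```python
from typing import List

Vec = List[float]

def encode(frame: Vec) -> Vec:
    """Hypothetical encoder: keeps coarse structure (means of adjacent pairs)
    and discards the fine detail within each pair."""
    return [(frame[i] + frame[i + 1]) / 2 for i in range(0, len(frame), 2)]

def predict(context_embedding: Vec) -> Vec:
    """Hypothetical predictor: assumes the coarse state of the scene persists."""
    return list(context_embedding)

def sq_dist(a: Vec, b: Vec) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

context = [1.0, 1.0, 4.0, 4.0, 2.0, 2.0]
# Target frame: the same coarse scene plus unpredictable fine detail (+e, -e within each pair).
target = [1.3, 0.7, 4.2, 3.8, 2.1, 1.9]

# Pixel-space objective: the naive prediction copies the context's pixels.
pixel_loss = sq_dist(context, target)
# JEPA-style objective: predict the target's embedding from the context's embedding.
latent_loss = sq_dist(predict(encode(context)), encode(target))

print(round(pixel_loss, 6), round(latent_loss, 6))
```

Suppose the sign of that detail is equally likely either way. Then, in expectation, copying the coarse values is the best a deterministic pixel predictor can do, yet it still pays a loss of 0.28. In embedding space the unpredictable detail costs nothing. The sketch also hides a real difficulty: a constant encoder would make the latent loss zero for every input. Actual JEPA training therefore needs a mechanism against this collapse; I-JEPA, for example, computes its targets with an exponential-moving-average copy of the encoder.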

Warning

LeCun’s critique of LLMs is unusual precisely because of its source. He is not a symbolic AI advocate or an LLM skeptic by disposition. He is the engineer of convolutional networks, one of the foundational tools of the deep learning era, and one of three recipients of the 2018 Turing Award for deep learning. His critique comes from inside the paradigm: not that deep learning is wrong, but that the specific implementation — autoregressive LLMs trained on text — is missing something essential. Whether this is correct is, as of 2026, a genuinely open empirical question.

His public presence on these questions — primarily on social media, in technical preprints, and in interviews — is characteristically combative. He has engaged directly with researchers at OpenAI, Google DeepMind, and elsewhere who disagreed, and has not softened his positions when they generated friction. The style reflects both genuine conviction and a willingness, sharpened by decades of being in the minority, to hold a position regardless of whether the field agrees.

Turing Award and the Godfathers Narrative

In 2018, LeCun, Hinton, and Yoshua Bengio (see Yoshua Bengio and the Montreal School) shared the ACM Turing Award for “conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing.” The award was widely described as recognizing the “Godfathers of Deep Learning” — a label that captured the genuine intellectual debt the field owed to three researchers who had maintained their research program through years of dismissal and returned to find that they had been right.

The three were not, and are not, always aligned. LeCun holds that today's AI systems remain far from human-level intelligence and is correspondingly optimistic that their risks are manageable, which places him in tension with Hinton's fears and Bengio's safety advocacy. The Turing Award ceremony in 2019 was a brief moment of shared recognition. The field then moved rapidly into territory where their disagreements became substantive and public.

LeCun’s position in the post-ChatGPT discourse — credentialed enough to be taken seriously, contrarian enough to resist the consensus — gives him a particular role: the voice that says “not so fast” from inside the room where the decisions are being made. Whether he is the last voice of caution or the last voice of the previous era is not yet clear. The history of the field suggests both possibilities should be taken seriously.

