Deep Learning
A subset of machine learning that uses neural networks with many layers to learn increasingly abstract representations of data, powering breakthroughs in language, vision, and generation.
Why it matters
Deep learning is what made modern AI possible. Virtually every large language model, image generator, and speech recognition system runs on deep neural networks. Understanding the concept explains why these systems need so much data and compute.
Why "deep"
The "deep" in deep learning refers to the number of layers in a neural network. A shallow network might have two or three layers. A deep network has dozens, hundreds, or even thousands. Each layer transforms the data into a slightly more abstract representation. Early layers in an image model detect edges. Middle layers detect shapes. Later layers detect objects.
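The layer-by-layer transformation described above can be sketched in a few lines of NumPy. This is a toy illustration, not a trained model: the weights are random, and the layer widths are arbitrary choices for the example. The point is the structure, where each layer maps the previous representation into a new one.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w, b):
    # One fully connected layer: a linear transform followed by a ReLU nonlinearity.
    return np.maximum(0.0, x @ w + b)

# A toy "deep" network: each layer re-represents the data.
x = rng.normal(size=(4, 8))    # 4 inputs, each with 8 raw features
sizes = [8, 16, 16, 4]         # layer widths (arbitrary, for illustration)
for n_in, n_out in zip(sizes, sizes[1:]):
    w = rng.normal(scale=0.1, size=(n_in, n_out))
    b = np.zeros(n_out)
    x = layer(x, w, b)

print(x.shape)  # (4, 4): each input row is now a 4-dimensional representation
```

In a real network, training adjusts the weights so that these successive representations become useful, which is where the edges-to-shapes-to-objects hierarchy emerges.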
Why it works
Deep networks can learn hierarchical features automatically from raw data, without anyone hand-engineering them. Given enough data and compute, they discover representations that are remarkably effective at tasks from translation to protein folding.
Key architectures
- Transformers — the architecture behind GPT, Claude, and most modern language models. Uses attention mechanisms to process sequences in parallel.
- Convolutional Neural Networks (CNNs) — specialized for grid-like data such as images. Still widely used in computer vision.
- Recurrent Neural Networks (RNNs) — process sequences one step at a time. Largely replaced by transformers for language tasks.
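The attention mechanism at the heart of transformers can be sketched with plain NumPy. This is a minimal scaled dot-product attention, stripped of the multiple heads, masking, and learned projections a real transformer uses; the sequence length and dimensions here are arbitrary example values.

```python
import numpy as np

def attention(q, k, v):
    # Scaled dot-product attention: every position attends to every
    # other position at once, which is what enables parallel processing.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over positions
    return weights @ v

rng = np.random.default_rng(1)
seq_len, d = 5, 8
x = rng.normal(size=(seq_len, d))
out = attention(x, x, x)  # self-attention: queries, keys, values from the same sequence
print(out.shape)  # (5, 8): one updated vector per position
```

Contrast this with an RNN, which must process the five positions one after another; attention computes all pairwise interactions in a single matrix multiplication, which is why transformers scale so well on parallel hardware.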