Models & Platforms
Context Window
The maximum amount of text (measured in tokens) that a language model can consider at once — including both the input prompt and the generated output.
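A rough sketch of the budgeting this definition implies. The 4-characters-per-token ratio is a common approximation for English text, not an exact tokenizer, and the 200K default is just one model's window:

```python
def fits_in_context(prompt: str, max_output_tokens: int,
                    context_window: int = 200_000) -> bool:
    """Check whether a prompt plus its reserved output budget fits
    in the model's context window.

    Uses the rough ~4 characters per token heuristic for English
    text; a real application should count with the model's own
    tokenizer instead.
    """
    estimated_prompt_tokens = len(prompt) // 4
    # Input and output share the same window, so both count.
    return estimated_prompt_tokens + max_output_tokens <= context_window

print(fits_in_context("word " * 1000, max_output_tokens=4096))  # True
```

The key point the sketch captures: the output budget is carved out of the same window as the input, so a longer prompt leaves less room for the response.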
Why it matters
The context window is the fundamental constraint of every LLM application. It determines how much information the model can work with, what your architecture looks like, and how much each request costs.
Size matters, but not how you think
Context windows have grown dramatically: from 4K tokens (early GPT-3.5) to 200K (Claude) to 1M+ (Gemini). But bigger isn't always better. Models can struggle with information buried in the middle of very long contexts (the "lost in the middle" problem), and costs scale linearly with input length.
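Since input tokens are billed per token, that linear cost scaling is easy to model. A minimal sketch with hypothetical prices (the per-million-token rates below are illustrative placeholders, not any provider's actual pricing):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_mtok: float = 3.00,
                 output_price_per_mtok: float = 15.00) -> float:
    """Estimate per-request cost in dollars.

    Prices are hypothetical placeholders; check your provider's
    current rate card before relying on the numbers.
    """
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000

# Same 1K-token answer, 20x the context: the input share of the
# bill grows 20x with it.
print(request_cost(10_000, 1_000))   # 0.045
print(request_cost(200_000, 1_000))  # 0.615
```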
Practical implications
- RAG vs. long context — for very large document sets, RAG is still more cost-effective than stuffing everything into the context. For single documents or small corpora, long context is simpler.
- Prompt caching — services like Anthropic's prompt caching let you pay once for a long system prompt and reuse it across requests.
- Context engineering — the art of deciding what goes into the context window, and in what order. This often matters more than raw context size.
Related Terms
- Large Language Model (LLM) — A neural network trained on massive text corpora that can understand and generate human language, typically with billions of parameters.
- Retrieval-Augmented Generation (RAG) — A technique that grounds a language model's output in external data by retrieving relevant documents before generating a response.
- Token — The basic unit of text that a language model processes — typically a word, subword, or punctuation mark, roughly equivalent to 3/4 of an English word.
On the AI Radar
- Prompt Caching — Stores and reuses previously computed key-value tensors from attention layers to avoid redundant computation on repeated prompt prefixes, delivering up to 90% cost reduction and 85% latency reduction. All three major providers now offer it with different implementations.
- Context Engineering — The practice of assembling optimal context for LLM applications — going beyond retrieval to systematic management of instructions, tools, state, and knowledge across the full prompt lifecycle.