Models & Platforms
Context Window
The maximum amount of text (measured in tokens) that a language model can consider at once — including both the input prompt and the generated output.
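A rough sketch of the budgeting this definition implies. The 4-characters-per-token ratio is a common approximation for English text, not an exact tokenizer, and the 200K default is just one model's window:

```python
def fits_in_context(prompt: str, max_output_tokens: int,
                    context_window: int = 200_000) -> bool:
    """Check whether a prompt plus its reserved output budget fits
    in the model's context window.

    Uses the rough ~4 characters per token heuristic for English
    text; a real application should count with the model's own
    tokenizer instead.
    """
    estimated_prompt_tokens = len(prompt) // 4
    # Input and output share the same window, so both count.
    return estimated_prompt_tokens + max_output_tokens <= context_window

print(fits_in_context("word " * 1000, max_output_tokens=4096))  # True
```

The key point the sketch captures: the output budget is carved out of the same window as the input, so a longer prompt leaves less room for the response.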
Why it matters
The context window is the fundamental constraint of every LLM application. It determines how much information the model can work with, what your architecture looks like, and how much each request costs.
Size matters, but not how you think
Context windows have grown dramatically: from 4K tokens (early GPT-3.5) to 200K (Claude) to 1M+ (Gemini). But bigger isn't always better. Models can struggle with information buried in the middle of very long contexts (the "lost in the middle" problem), and costs scale linearly with input length.
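Since input tokens are billed per token, that linear cost scaling is easy to model. A minimal sketch with hypothetical prices (the per-million-token rates below are illustrative placeholders, not any provider's actual pricing):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_mtok: float = 3.00,
                 output_price_per_mtok: float = 15.00) -> float:
    """Estimate per-request cost in dollars.

    Prices are hypothetical placeholders; check your provider's
    current rate card before relying on the numbers.
    """
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000

# Same 1K-token answer, 20x the context: the input share of the
# bill grows 20x with it.
print(request_cost(10_000, 1_000))   # 0.045
print(request_cost(200_000, 1_000))  # 0.615
```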
Practical implications
- RAG vs. long context — for very large document sets, RAG is still more cost-effective than stuffing everything into the context. For single documents or small corpora, long context is simpler.
- Prompt caching — services like Anthropic's prompt caching let you pay once for a long system prompt and reuse it across requests.
- Context engineering — the art of deciding what goes into the context window, and in what order. This often matters more than raw context size.
Related Terms
- Large Language Model (LLM) — A neural network trained on massive text corpora that can understand and generate human language, typically with billions of parameters.
- Retrieval-Augmented Generation (RAG) — A technique that grounds a language model's output in external data by retrieving relevant documents before generating a response.
- Token — The basic unit of text that a language model processes — typically a word, subword, or punctuation mark, roughly equivalent to 3/4 of an English word.
On the AI Radar
- Prompt Caching — Stores and reuses previously computed key-value tensors from attention layers to avoid redundant computation on repeated prompt prefixes, delivering up to 90% cost reduction and 85% latency reduction. All three major providers now offer it with different implementations.
- Context Engineering — The practice of assembling optimal context for LLM applications — going beyond retrieval to systematic management of instructions, tools, state, and knowledge across the full prompt lifecycle.