Reasoning Models
LLMs trained with reinforcement learning to "think before they answer" by generating internal chains of reasoning — producing more accurate results on complex tasks like maths, coding, and multi-step logic at the cost of higher latency and token usage.
Why it matters
Reasoning models represent a new scaling paradigm — instead of just making models bigger, inference-time compute lets models "think harder" on demand. The key is knowing when the extra cost is worth it: complex analysis and coding benefit enormously, but simple queries don't.
How They Differ from Standard LLMs
Traditional LLMs generate answers token by token in a single forward pass — they essentially blurt out the first plausible continuation. Reasoning models add an explicit "thinking" phase before producing a final answer. During this phase, the model generates internal chain-of-thought tokens that explore the problem, consider edge cases, and self-correct mistakes.
The key difference isn't architectural but lies in how the models are trained. Reasoning models use reinforcement learning (typically variants of GRPO or PPO) to reward outputs that arrive at correct answers through valid reasoning steps. This teaches the model to allocate more computation to harder problems, a form of inference-time scaling that trades speed for accuracy.
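The central trick in GRPO is scoring each sampled response relative to its group rather than training a separate value network. A minimal sketch of that group-relative advantage computation, using a toy binary reward (correct final answer or not); the function name and reward scheme are illustrative assumptions, not any lab's actual training code:

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards):
    """GRPO-style advantages: each response is scored relative to
    the mean reward of its sampled group, normalised by the group's
    standard deviation, so no learned value baseline is needed."""
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mu) / sigma for r in rewards]

# Toy example: four sampled answers to one maths problem,
# rewarded 1.0 if the final answer is correct, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)  # → [1.0, -1.0, -1.0, 1.0]
```

Responses with positive advantage have their reasoning tokens reinforced; those with negative advantage are discouraged, which is what nudges the model toward spending tokens on steps that actually lead to correct answers.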
Key Examples
- OpenAI o1 / o3 / o4-mini — The series that popularised the paradigm. o1 established the approach (its raw chain of thought is hidden, with only summaries surfaced to users); o3 and o4-mini refined cost-performance trade-offs with configurable reasoning-effort levels.
- DeepSeek R1 — An open-weights reasoning model that demonstrated competitive reasoning capabilities don't require frontier-lab budgets. Its openly released weights made it a catalyst for community research into reasoning techniques.
- Claude with extended thinking — Anthropic's approach bakes reasoning into the Claude model family with adjustable effort levels, letting developers control how much thinking budget to allocate per request rather than choosing a separate model entirely.
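Per-request thinking budgets are typically expressed as a field on the API request. A sketch of what such a payload looks like, modelled on the shape of Anthropic's documented extended-thinking request; the model id is a placeholder and the exact field names may vary by API version, so treat this as illustrative rather than canonical:

```python
import json

def build_request(prompt: str, thinking_budget: int) -> dict:
    # Payload sketch modelled on an extended-thinking request.
    # Field names follow Anthropic's documented shape but should be
    # checked against current API docs before use.
    return {
        "model": "claude-sonnet-4-5",  # placeholder model id
        "max_tokens": 16_000,
        "thinking": {
            "type": "enabled",
            "budget_tokens": thinking_budget,  # cap on reasoning tokens
        },
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_request("Prove that sqrt(2) is irrational.", 8_000)
print(json.dumps(payload, indent=2))
```

The point of the budget parameter is cost control: the same model can think briefly on easy requests and at length on hard ones, without switching models.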
Trade-offs and When to Use Them
Reasoning models are not a universal upgrade. They are slower (often 10-60 seconds for complex queries) and consume significantly more tokens — both of which translate directly to higher cost. For straightforward tasks like summarisation, translation, or simple Q&A, a standard model is faster, cheaper, and often just as accurate.
Research consistently shows diminishing and sometimes negative returns on low-complexity tasks. A reasoning model asked to draft a friendly email may overthink it, producing stilted or over-qualified output. The real gains show up on multi-step logic, competitive programming, advanced maths, and tasks requiring the model to catch its own mistakes before committing to an answer.
The practical takeaway: route complex analytical work to reasoning models and keep simple tasks on standard models. Most production systems benefit from a mix of both rather than defaulting to the most capable option for everything.
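The routing idea above can be sketched with a crude keyword heuristic; the model names are hypothetical, and production systems would more likely use a small classifier model or confidence signal than a keyword list:

```python
REASONING_MODEL = "reasoning-large"  # hypothetical model ids
STANDARD_MODEL = "standard-fast"

# Crude complexity markers; a real router would use a learned
# classifier, but this illustrates the dispatch pattern.
COMPLEX_MARKERS = ("prove", "debug", "derive", "optimise",
                   "step by step", "analyse")

def route(query: str) -> str:
    """Send likely-complex queries to the reasoning model,
    everything else to the cheaper standard model."""
    q = query.lower()
    if any(marker in q for marker in COMPLEX_MARKERS):
        return REASONING_MODEL
    return STANDARD_MODEL

print(route("Translate 'hello' into French"))            # → standard-fast
print(route("Prove this algorithm runs in O(n log n)"))  # → reasoning-large
```

Even a heuristic this simple captures the economic point: most traffic is cheap, and only the queries that genuinely benefit pay the latency and token premium.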