Chain-of-Thought Reasoning

A prompting technique that improves LLM accuracy on complex tasks by guiding the model to show its reasoning step-by-step before arriving at a final answer.

How It Works

Chain-of-thought prompting asks the model to produce intermediate reasoning steps rather than jumping straight to a conclusion. Instead of asking "What is 47 times 83?" and getting a potentially wrong snap answer, you prompt the model to break it down: multiply the ones, carry, multiply the tens, add the partial products. This mirrors how humans handle complex problems — decomposing them into manageable sub-steps where each builds on the last. The technique works because LLMs are autoregressive: each generated token conditions on all previous tokens. By generating reasoning tokens first, the model effectively gives itself a scratchpad, making the correct final answer more likely to follow from its own intermediate work.
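A minimal sketch of the zero-shot version of this idea — the cue phrase and the "Answer:" marker are illustrative conventions, not a fixed API, and any real LLM client would consume the resulting string:

```python
def make_cot_prompt(question: str) -> str:
    """Wrap a question so reasoning tokens are generated first.

    Because decoding is autoregressive, the reasoning the model writes
    becomes context that the final answer is conditioned on.
    """
    return (
        f"{question}\n"
        "Let's think step by step, then give the final answer "
        "on a line starting with 'Answer:'."
    )

prompt = make_cot_prompt("What is 47 times 83?")
# A model following the prompt might produce a scratchpad like:
#   47 * 80 = 3760
#   47 * 3  = 141
#   3760 + 141 = 3901
#   Answer: 3901
print(prompt)
```

The trailing instruction about an "Answer:" line is a common convenience for parsing the final answer out of the reasoning text.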

Key Variants

Zero-shot CoT is the simplest form — just append "Let's think step by step" to your prompt. Surprisingly effective for a technique that requires zero examples. Few-shot CoT provides worked examples in the prompt, showing the model the reasoning format you expect. This gives more control over the style and depth of reasoning. Self-consistency takes it further by sampling multiple reasoning paths at higher temperature and taking a majority vote on the final answer — different reasoning chains might make different errors, but the correct answer tends to appear most frequently. Tree-of-thought extends the linear chain into branching exploration, letting the model consider and evaluate multiple approaches before committing.
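Self-consistency reduces to a majority vote over sampled answers. A minimal sketch, assuming a `sample_fn` callable that stands in for a higher-temperature LLM call returning each chain's parsed final answer (stubbed here with canned outputs):

```python
from collections import Counter

def self_consistency(sample_fn, prompt: str, n: int = 5):
    """Sample n reasoning chains and majority-vote on the final answer.

    sample_fn(prompt) is a placeholder for one sampled LLM completion,
    already reduced to its final answer string.
    """
    answers = [sample_fn(prompt) for _ in range(n)]
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / n  # answer plus its vote share

# Stubbed sampler: three chains reach 3901, two make different errors.
fake_answers = iter(["3901", "3901", "3801", "3901", "3911"])
answer, share = self_consistency(
    lambda p: next(fake_answers), "What is 47 times 83?", n=5
)
```

The stub illustrates why the vote works: independent chains tend to make *different* mistakes, so wrong answers split their votes while the correct one accumulates them.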

When to Use It

CoT shines on tasks that require genuine multi-step reasoning: math word problems, logical deductions, code debugging, multi-hop question answering. The accuracy gains on these tasks can be dramatic — turning a 40% success rate into 80%+. But it's not always the right tool. Simple factual lookups ("What is the capital of France?") don't benefit from step-by-step reasoning and just waste tokens. Classification tasks with straightforward criteria rarely need CoT either. The trade-off is always latency and cost: more reasoning tokens mean longer responses and higher bills. Use CoT when the accuracy gain justifies the overhead.