Temperature
A parameter that controls how random or deterministic an LLM's output is — lower values produce more predictable, focused responses while higher values increase creativity and variation.
How It Works
Under the hood, an LLM generates a probability distribution over all possible next tokens at each step. Temperature modifies this distribution before sampling. The raw model outputs (logits) are divided by the temperature value before being passed through the softmax function. A temperature of 1.0 uses the distribution as-is. Values below 1.0 sharpen the distribution — the most likely tokens become even more likely, making the model's choices more predictable. Values above 1.0 flatten the distribution, giving lower-probability tokens a better chance of being selected. At temperature 0 (or near-zero), the model becomes effectively deterministic, always picking the highest-probability token — this is sometimes called greedy decoding.
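The scaling step above can be sketched in a few lines. This is a toy illustration, not a real model's decoding loop — the logits are made up, and the function name is an assumption:

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by temperature before softmax: T = 1.0 leaves the
    # distribution unchanged, T < 1 sharpens it, T > 1 flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy next-token logits for a three-token vocabulary.
logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running this shows the effect directly: at T = 0.5 the top token's probability grows well beyond its T = 1.0 value, while at T = 2.0 the three probabilities move closer together.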
Practical Guidance
The right temperature depends entirely on the task. Temperature 0 is the go-to for factual question answering, code generation, structured data extraction, and anything where consistency and correctness matter more than variety. In principle the same input produces the same output every time, though some providers show slight nondeterminism even at temperature 0. Temperature 0.3-0.7 hits the sweet spot for most conversational and writing tasks — enough variation to feel natural, enough focus to stay coherent. Temperature 1.0+ is for brainstorming, creative writing, and exploration where you actively want surprising outputs. Above 1.5 or so, outputs tend to degrade into incoherence as the model starts picking genuinely unlikely tokens.
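The temperature-0 case can be made concrete with a small sampler sketch: at (near-)zero temperature it falls back to greedy decoding, otherwise it samples from the temperature-scaled distribution. The logits and threshold are illustrative assumptions, not any provider's actual implementation:

```python
import math
import random

def sample_token(logits, temperature, rng):
    # Near-zero temperature degenerates to greedy decoding:
    # always pick the highest-probability (argmax) token.
    if temperature < 1e-6:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Otherwise, sample from the temperature-scaled distribution.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.1]  # toy next-token logits
print(sample_token(logits, 0.0, random.Random()))   # always token 0
print(sample_token(logits, 1.5, random.Random(7)))  # may pick any token
```

The same seed and temperature reproduce the same choice, which is what makes low-temperature settings attractive for tests and pipelines.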
Related Parameters
Temperature isn't the only sampling control available. Top-p (nucleus sampling) takes a different approach — instead of modifying all probabilities uniformly, it considers only the smallest set of tokens whose cumulative probability exceeds a threshold p. Top-k is simpler: consider only the k most likely tokens. The key distinction is that temperature controls global randomness across the entire distribution, while top-p and top-k filter which tokens are even considered. In practice, many API providers let you combine these — for example, temperature 0.7 with top-p 0.9 gives controlled creativity with a safety net against truly bizarre outputs. Some providers recommend using either temperature or top-p but not both, as they can interact unpredictably.
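The filtering behavior of top-k and top-p can be sketched as two small functions that zero out excluded tokens and renormalize. The probabilities below are a toy example; real implementations work on full vocabularies of tens of thousands of tokens:

```python
def top_k_filter(probs, k):
    # Keep only the k most likely tokens, then renormalize.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    keep = set(order[:k])
    filtered = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(filtered)
    return [p / total for p in filtered]

def top_p_filter(probs, p):
    # Keep the smallest set of tokens whose cumulative probability
    # reaches p (nucleus sampling), then renormalize.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    keep, cum = set(), 0.0
    for i in order:
        keep.add(i)
        cum += probs[i]
        if cum >= p:
            break
    filtered = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    total = sum(filtered)
    return [q / total for q in filtered]

probs = [0.5, 0.3, 0.15, 0.05]  # toy distribution, already sorted
print(top_k_filter(probs, 2))   # only the top two tokens survive
print(top_p_filter(probs, 0.9)) # smallest set covering >= 90%
```

Note the difference from temperature: nothing here reshapes the surviving probabilities relative to each other — the filters only decide which tokens remain candidates at all.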