Token
The basic unit of text that a language model processes — typically a word, subword, or punctuation mark, roughly equivalent to 3/4 of an English word.
Why it matters
Tokens are the currency of AI. They determine what fits in a context window, how much an API call costs, and how fast a model can respond. Understanding tokenization helps you optimize for all three.
Tokenization
Before a model can process text, the text must be split into tokens by a tokenizer. Modern LLMs use subword tokenization (e.g., byte-pair encoding (BPE) or SentencePiece), which maps common words to single tokens and splits rare words into multiple pieces. The word "embedding" might be one token, while "defenestration" might be split into "defen" + "estration."
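The splitting idea can be illustrated with a toy greedy longest-match tokenizer. This is a simplified sketch of how subword vocabularies get applied, not a real BPE implementation, and the vocabulary below is hypothetical:

```python
# Hypothetical subword vocabulary: common words are whole entries,
# rare words only exist as pieces.
VOCAB = {"embedding", "defen", "estration", "token", "ization"}

def tokenize(word, vocab=VOCAB):
    """Split a word into the longest matching vocabulary pieces, left to right."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, shrinking until a match.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary entry covers {word[i]!r}")
    return tokens

print(tokenize("embedding"))       # ['embedding'] — one token
print(tokenize("defenestration"))  # ['defen', 'estration'] — two pieces
```

Real tokenizers learn their vocabularies from data and operate on bytes, but the lookup-and-split behavior is the same in spirit.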
Why tokens matter
- Context window — models have a maximum token limit (e.g., 200K tokens for Claude). This is the total budget for input + output.
- Pricing — API costs are calculated per token (input and output priced separately).
- Speed — generation speed is measured in tokens per second. Generating output is slower than processing input, because output tokens are produced one at a time.
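The pricing point above lends itself to a back-of-envelope calculator. The per-million-token prices here are placeholder assumptions, not any provider's actual rates:

```python
# Assumed prices in USD per 1M tokens — substitute your provider's real rates.
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one API call, pricing input and output separately."""
    return (input_tokens / 1_000_000 * INPUT_PRICE_PER_M
            + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M)

# A 10K-token prompt with a 1K-token reply:
print(f"${estimate_cost(10_000, 1_000):.3f}")  # $0.045
```

Note that output tokens usually cost several times more than input tokens, so long replies dominate the bill even when prompts are large.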
Rules of thumb
For English text: 1 token ≈ 4 characters ≈ 0.75 words. A page of text is roughly 500–800 tokens. Code tends to use more tokens per logical unit than prose.
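The rule of thumb above can be wrapped in a small estimator. This is only a heuristic for English prose; for an exact count you would run the model's actual tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English prose: 1 token ≈ 4 characters.
    A heuristic only — code and non-English text typically run higher."""
    return max(1, round(len(text) / 4))

# ~2,500 characters (about a page of prose) lands near the 500–800 token range:
print(estimate_tokens("x" * 2_500))  # 625
```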