Reinforcement Learning

A machine learning approach where an agent learns by taking actions in an environment and receiving rewards or penalties, gradually discovering which strategies produce the best outcomes.

Why it matters

Reinforcement learning is the technique behind RLHF, which is how language models like ChatGPT and Claude are fine-tuned to be helpful and safe. It is also how AI learns to play games, control robots, and optimize complex systems.

How it works

An RL agent interacts with an environment by taking actions. After each action, it receives a reward signal — positive for good outcomes, negative for bad ones. Over many episodes of trial and error, the agent learns a policy: a strategy for choosing actions that maximize cumulative reward.
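That interaction loop can be sketched in a few lines of Python. The corridor environment, its reward values, and the random policy below are illustrative assumptions, not part of any particular RL library:

```python
import random

# Hypothetical toy environment: the agent walks a number line from 0 to 4,
# earning +1 for reaching position 4 and -0.1 per step otherwise.
class CorridorEnv:
    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):  # action: 0 = left, 1 = right
        move = 1 if action == 1 else -1
        self.state = max(0, min(self.length - 1, self.state + move))
        done = self.state == self.length - 1
        reward = 1.0 if done else -0.1  # the reward signal: good vs. bad outcomes
        return self.state, reward, done

env = CorridorEnv()
for episode in range(3):  # a few episodes of trial and error
    state, total, done = env.reset(), 0.0, False
    while not done:
        action = random.choice([0, 1])  # a random policy, before any learning
        state, reward, done = env.step(action)
        total += reward
    print(f"episode {episode}: cumulative reward = {total:.1f}")
```

A real agent would replace `random.choice` with a learned policy that raises the cumulative reward over time.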

Key concepts

  • Agent — the learner that takes actions.
  • Environment — everything the agent interacts with.
  • Reward — the feedback signal that tells the agent how well it is doing.
  • Policy — the learned strategy mapping states to actions.
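These four pieces come together in even the simplest RL algorithm. Below is a minimal tabular Q-learning sketch; the 5-state corridor environment and the hyperparameters are assumptions chosen for illustration, not values from the text:

```python
import random

# Illustrative environment: a 5-state corridor; start at state 0, reward +1 at state 4.
N_STATES, GOAL = 5, 4
alpha, gamma, epsilon = 0.5, 0.9, 0.1  # assumed learning rate, discount, exploration rate

# Q-table: the agent's estimate of cumulative reward for each (state, action) pair.
Q = [[0.0, 0.0] for _ in range(N_STATES)]  # actions: 0 = left, 1 = right

def step(state, action):
    nxt = max(0, min(GOAL, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

for _ in range(500):  # episodes of trial and error
    state, done = 0, False
    while not done:
        # epsilon-greedy: mostly exploit the current estimates, sometimes explore
        if random.random() < epsilon:
            action = random.randrange(2)
        else:
            action = max((0, 1), key=lambda a: Q[state][a])
        nxt, reward, done = step(state, action)
        # Q-learning update: nudge the estimate toward reward + discounted future value
        Q[state][action] += alpha * (reward + gamma * max(Q[nxt]) - Q[state][action])
        state = nxt

# The learned policy maps each state to its highest-value action (1 = move right).
policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(N_STATES)]
print(policy)
```

After training, the greedy policy moves right in every non-terminal state, which is the reward-maximizing strategy in this environment.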

RLHF: the language model connection

Reinforcement Learning from Human Feedback (RLHF) applies this framework to language models. Human raters rank model outputs by quality. A reward model learns to predict those preferences. Then the language model is fine-tuned via RL to generate responses the reward model scores highly. This is how models learn to be helpful, harmless, and honest.
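The reward-model step can be made concrete. RLHF implementations commonly train the reward model with a Bradley-Terry preference loss, which pushes the preferred response's score above the rejected one's. A minimal sketch of that loss, with invented scores standing in for a real model's outputs:

```python
import math

# Hypothetical scalar scores a reward model assigns to two responses,
# where human raters preferred the first (chosen) over the second (rejected).
def preference_loss(r_chosen, r_rejected):
    # Bradley-Terry: P(chosen beats rejected) = sigmoid(r_chosen - r_rejected).
    # Minimizing the negative log of that probability pushes the scores apart.
    prob_chosen = 1.0 / (1.0 + math.exp(-(r_chosen - r_rejected)))
    return -math.log(prob_chosen)

# A reward model that already agrees with the raters incurs a small loss...
print(preference_loss(2.0, -1.0))
# ...and a large loss when it scores the rejected response higher.
print(preference_loss(-1.0, 2.0))
```

In the final RL stage, the fine-tuned language model is optimized (typically with a policy-gradient method such as PPO) to produce responses this reward model scores highly.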