Evaluation & Safety

Guardrails

Programmatic constraints placed around AI model inputs and outputs to prevent harmful, off-topic, or policy-violating behavior.

Why it matters

Guardrails are how you ship AI to production without anxiety. They are the safety net between a capable-but-unpredictable model and real users.

Input vs. output guardrails

Input guardrails screen user messages before they reach the model — blocking prompt injection attempts, filtering PII, and enforcing topic boundaries. Output guardrails validate model responses before they reach the user — checking for toxicity, factual consistency, format compliance, and policy adherence.
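The split can be sketched as a simple wrapper around the model call. This is a minimal, hypothetical example — the function names (`check_input`, `check_output`, `run_with_guardrails`), the PII patterns, and the blocklist are illustrative, not from any specific framework:

```python
import re

# Input-side screens: reject messages containing PII-like patterns (illustrative).
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US SSN-like number
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email address
]

# Output-side screen: a toy keyword blocklist (illustrative).
BLOCKED_OUTPUT_TERMS = {"internal-only", "confidential"}

def check_input(message: str) -> tuple[bool, str]:
    """Screen a user message before it reaches the model."""
    for pattern in PII_PATTERNS:
        if pattern.search(message):
            return False, "input rejected: possible PII detected"
    return True, "ok"

def check_output(response: str) -> tuple[bool, str]:
    """Validate a model response before it reaches the user."""
    lowered = response.lower()
    for term in BLOCKED_OUTPUT_TERMS:
        if term in lowered:
            return False, "output rejected: blocked term"
    return True, "ok"

def run_with_guardrails(message: str, model_call) -> str:
    """Wrap a model call with input and output checks."""
    ok, reason = check_input(message)
    if not ok:
        return f"[guardrail] {reason}"
    response = model_call(message)
    ok, reason = check_output(response)
    if not ok:
        return "[guardrail] response withheld by output check"
    return response

# Usage with a stand-in for the real model call:
print(run_with_guardrails("My SSN is 123-45-6789", lambda m: "hello"))
# → [guardrail] input rejected: possible PII detected
```

In production the regexes would be replaced or supplemented by the classifier- or LLM-based checks described below, but the wrapper shape stays the same.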

Implementation patterns

  • Rule-based — regex, keyword blocklists, format validators. Fast and predictable.
  • Classifier-based — lightweight ML models that detect toxicity, PII, or off-topic content.
  • LLM-as-judge — use a second LLM call to evaluate whether the output meets quality criteria.
  • Constitutional AI — self-critique loops where the model checks its own output against principles.
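The LLM-as-judge pattern from the list above can be sketched as follows. This is a hedged illustration: `call_llm` is a stand-in for whatever client your stack uses, and the judge prompt and JSON schema are assumptions, not a standard:

```python
import json

# Hypothetical judge prompt: asks a second model for a pass/fail verdict as JSON.
JUDGE_PROMPT = """You are a strict evaluator. Return JSON of the form
{{"pass": true, "reason": "..."}} judging whether the response below is
on-topic, non-toxic, and free of unsupported claims.

Response to evaluate:
{response}"""

def judge(response: str, call_llm) -> bool:
    """Return True if the judge model approves the response.

    `call_llm` is any callable taking a prompt string and returning the
    model's text — a stand-in for a real LLM client.
    """
    verdict_text = call_llm(JUDGE_PROMPT.format(response=response))
    try:
        verdict = json.loads(verdict_text)
    except json.JSONDecodeError:
        return False  # fail closed if the judge output is malformed
    return bool(verdict.get("pass", False))

# Usage with a stub standing in for a real judge model:
stub = lambda prompt: '{"pass": true, "reason": "on-topic and safe"}'
assert judge("Paris is the capital of France.", stub) is True
```

Failing closed on malformed judge output is a deliberate choice here: a guardrail that cannot parse its own verdict should block rather than pass the response through.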

Frameworks

Guardrails AI and NeMo Guardrails (NVIDIA) are the main open-source frameworks; model providers such as Anthropic also ship built-in safety layers on their side of the API. Many teams additionally build custom guardrail pipelines tailored to their domain.