Evaluation & Safety

Red Teaming

Systematic adversarial testing of AI systems to identify vulnerabilities, failure modes, and unintended behaviours before deployment — adapted from cybersecurity to probe AI-specific weaknesses like prompt injection and jailbreaks.

Types of Testing

Manual red teaming remains essential for nuanced edge cases that require human creativity — skilled testers craft scenarios that automated tools would never think to try. Automated red teaming scales the effort by using AI models to generate adversarial inputs systematically, covering thousands of attack vectors that manual testing can't reach in reasonable time. Domain-specific testing focuses on the particular risks of a given application — a medical AI needs probes for dangerous health advice, while a financial system needs tests for discriminatory lending suggestions.

AI-Specific Attack Surfaces

Traditional security testing covers things like SQL injection and buffer overflows. AI red teaming adds an entirely new category of vulnerabilities. Prompt injection manipulates the model by embedding hidden instructions in user input or retrieved content. Jailbreaking uses social engineering techniques against the model itself — role-playing scenarios, encoding tricks, or multi-turn persuasion to bypass safety training. Data poisoning targets the training pipeline rather than the deployed model. These attack vectors don't map cleanly onto traditional security frameworks, which is why AI-specific red teaming practices have emerged as a distinct discipline.

Why It Matters Before Deployment

Red teaming has shifted from "nice to have" to regulatory expectation. NIST's AI Risk Management Framework, the OWASP Top 10 for LLM Applications, and the EU AI Act all reference adversarial testing as a key component of responsible AI deployment. The shift is from reactive — waiting for users to find problems — to proactive, where teams deliberately try to break their own systems. Organisations that skip this step are essentially running their red teaming exercise in production, with real users as the unwitting testers. The cost of discovering a jailbreak post-launch is orders of magnitude higher than finding it in a controlled testing environment.