Jailbreak Taxonomy
From Stage 1 · Foundation
A jailbreak is an input designed to cause an AI model to generate content or take actions that its safety training was intended to prevent — and understanding the major attack categories helps you evaluate attack sophistication and defense
You'll learn how this concept slots into the production systems you'll be expected to design in interviews — the same pipelines used at Anthropic, OpenAI, Cohere, and Google DeepMind.
Includes a worked example with annotated code, the common mistakes interviewers probe for, and a glossary of the terms you'll need to use fluently.
🔒Unlock with Pro — continue your learning streak
Pro unlocks the locked concept pages and problems in every stage, plus all of Stage 5 and the completion certificate.
Upgrade to ProAlready a Pro member? Sign in