AI systems have security vulnerabilities that emerge from how they learn and process information — not from bugs in code — making them resistant to traditional security defenses while vulnerable to entirely new attack classes.
Why this appears in interviews
This is the conceptual foundation of AI security. Without understanding why AI systems are fundamentally different from a security perspective, everything else is just memorizing attack names without understanding the logic.
The mental model — rules vs. learning
Traditional software does exactly what the code says. If the code says "reject input containing DROP TABLE," no SQL injection using those exact words will succeed. Security is about defining and enforcing rules.
AI systems learn behavior from data. They generalize patterns. You cannot fully enumerate what an AI system will do. Inputs that look normal can cause abnormal behavior.
Five properties that make AI systems uniquely vulnerable
1. Non-determinism. The same input may produce different outputs. Security testing is statistical, not conclusive. You cannot prove a vulnerability does not exist by testing once.
2. Emergent behavior. Large models exhibit capabilities not explicitly trained — including the ability to follow complex multi-step instructions embedded in innocent-seeming text.
3. The instruction-following paradox. The more capable an LLM is at following instructions, the more capable it is at following malicious instructions.
4. Training data memorization. Models may reveal private information present in training data even when not explicitly asked.
5. Supply chain vulnerabilities. AI systems often incorporate pre-trained models from third parties. A poisoned open-source model on Hugging Face is a supply chain attack.
What traditional security misses
A traditional penetration tester will test for SQL injection, XSS, API authentication, and misconfigured cloud services. They will not test for: prompt injectionPrompt injectionAttack where malicious input overrides an AI system's instructions, causing unintended behaviour., training data extraction, jailbreaks that ignore safety instructions, or retrieval corpus poisoning.
Common interview mistakes
Mistake 1: Treating non-determinism as only a reliability problem. Non-determinism has security implications — a vulnerability may not trigger every time, making it harder to detect.
Mistake 2: Thinking safety and security are the same thing. Safety: preventing unintended harm. Security: preventing intentional harm by adversaries.
Mistake 3: Believing RAGRAGRetrieval-Augmented Generation — gives LLMs access to external knowledge by retrieving relevant documents before generating a response.Learn more → eliminates training data risks. RAGRAGRetrieval-Augmented Generation — gives LLMs access to external knowledge by retrieving relevant documents before generating a response.Learn more → introduces new attack surfaces (retrieval poisoning). It changes the attack surface; it does not eliminate it.
Key vocabulary
- Non-determinism — The property that the same input may produce different outputs, making exhaustive security testing impossible.
- Emergent capability — A model behavior not explicitly trained and not predictable from the training procedure.
- Training data memorization — The tendency of large models to reproduce verbatim segments of their training data.
- Supply chain attack — An attack that compromises a system by targeting its dependencies rather than the system itself.