Anthropic AI engineer interview questions (2026)
23 real interview questions reported by engineers who interviewed at Anthropic, spanning AI Engineering, Software Engineering, ML Engineering, and AI Security Engineering. Every question is scored against a golden answer on the things Anthropic actually grades (architecture, token efficiency, security, and correctness), not just whether your code runs.
Anthropic AI Engineering questions
- Evaluation Metric Misalignment: Faithfulness Scores Mask Semantic Errors
Your automated evaluation pipeline is reporting 95% faithfulness scores but manual review shows the answers are frequently wrong. The evaluation is producing false positives. Find
- RAG Chatbot Ignoring Retrieved Documents, Using Training Data Instead
A customer-facing RAG chatbot is giving answers based on the model's training data instead of the retrieved documents. Users keep getting outdated information. The retrieval step w
- Model Context Protocol standardizes AI tool integration architecture
What is Model Context Protocol (MCP), and how does it standardize tool integration?
- Building Customer Support Agent Evaluation Harness: Metrics Design and Contamination Prevention
Walk through how you would build an evaluation harness for a customer-support agent. What metrics do you track, where do labels come from, and how do you guard against eval-set con
- Infinite Loop Detection: Root Cause Analysis and Design Prevention
Your agent loops infinitely on certain user inputs. Walk through your debugging approach and how you would change the design to prevent it.
- Domain Adaptation Strategies for Generative AI Models
How would you adapt a generative AI model to perform well in a specific domain?
- Iterative refinement balancing clarity, specificity, and desired output format
What's your approach to prompt engineering?
- Control generative AI model creativity through temperature and sampling parameters
How do you adjust the creativity of generative AI models?
- Adaptive agent architecture with dynamic task learning capabilities
Design an agentic AI system that can autonomously adapt to new tasks.
- Mitigating Hallucinations In Production Generative AI Systems
How would you handle hallucinations in a generative AI model deployed to users?
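For the question above on adjusting creativity through temperature and sampling parameters, it can help to see what temperature does numerically. The following is a minimal sketch, not a reference answer: the function name `softmax_with_temperature` and the example logits are illustrative assumptions. It divides the logits by the temperature before applying softmax, so a low temperature concentrates probability on the top token and a high temperature spreads it out.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw logits into probabilities, scaled by a sampling temperature.

    Hypothetical helper for illustration; real inference stacks apply the
    same idea inside their sampling code.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative logits for three candidate tokens (made-up values).
example_logits = [2.0, 1.0, 0.0]
focused = softmax_with_temperature(example_logits, temperature=0.5)
diverse = softmax_with_temperature(example_logits, temperature=2.0)
```

In this sketch, `focused` puts more mass on the first token than `diverse` does, which is the "less creative versus more creative" trade-off the question is probing.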
Anthropic Software Engineering questions
- Conversation History Compression for Long-Running Agents
Implement a top-level `solve(payload)` that compresses old conversation history into a single summary message while preserving the system message and the most recent turns. ## Th
- Resilient LLM API calls with exponential backoff retry logic
LLM APIs are unreliable in production — they return rate limit errors (429) and server errors (500/503) that require retries. Implement a function `retry_llm_call` that: (1) Take
- Prompt Injection Vulnerability in Content Filter Implementation
Your customer support chatbot has a content filter that blocks harmful responses. Users have found a way to bypass it by including instructions in their messages. Find the vulnerab
- AI Agent Prompt Injection Attack Via Support Ticket Content
A customer service agent that reads support tickets and drafts responses has been behaving strangely in production — sending unauthorized responses, accessing data outside its scop
- RAG evaluation metrics masking real-world answer quality failures
A production RAG system has faithfulness 0.91 and context relevance 0.89 — both look great on the dashboard. But users keep filing support tickets saying answers feel wrong or besi
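The retry question above names a function `retry_llm_call` and the status codes 429, 500 and 503, but the full requirements are cut off on this page. The following is a hedged sketch of one common shape for such a function, not the graded answer: the `LLMAPIError` class, the parameter names, the default delays and the jitter strategy are all assumptions made for illustration.

```python
import random
import time


class LLMAPIError(Exception):
    """Hypothetical error type that carries an HTTP status code."""

    def __init__(self, status_code, message=""):
        super().__init__(message or f"HTTP {status_code}")
        self.status_code = status_code


# Status codes the question describes as needing retries.
RETRYABLE_STATUS = {429, 500, 503}


def retry_llm_call(fn, max_retries=3, base_delay=1.0, max_delay=30.0,
                   sleep=time.sleep, rng=random.random):
    """Call fn(); on a retryable error, back off exponentially and retry.

    The signature is an assumption. `sleep` and `rng` are injectable so the
    backoff schedule can be tested without actually waiting.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except LLMAPIError as err:
            out_of_retries = attempt == max_retries
            if err.status_code not in RETRYABLE_STATUS or out_of_retries:
                raise
            # Delay doubles each attempt, is capped at max_delay, and is
            # scaled by a random factor in [0, 1) to spread out retries.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay * rng())
```

Randomizing the delay is a design choice rather than a requirement stated here; it is commonly used so that many clients hitting the same rate limit do not all retry at the same instant.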
Anthropic ML Engineering questions
- Distributed Search System Architecture with LLM Inference at Scale
Design a distributed search system capable of handling 1 billion documents and 1 million QPS, while managing LLM inference for over 10,000 requests per second. Walk through your ar
- End-to-end LLM query batching system design and optimization
Design an end-to-end batching system for LLM queries.
- GPU Inference Batching System Design with Synchronous User Requests
Design an inference batching system for a single GPU that can handle up to 100 inputs per batch while users wait synchronously, maximizing utilization under compute constraints.
- Scalable Token Generation Service Architecture for High-Throughput LLM
Design a scalable system for a token-generation service used by an LLM that needs to handle up to 100,000 requests per second.
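For the single-GPU batching question above (up to 100 inputs per batch, users waiting synchronously), one policy worth being able to sketch is "flush when the batch is full or when the oldest request has waited too long". The snippet below is a simplified, offline simulation of that policy under stated assumptions: the function name `plan_batches`, the `max_wait` value and the `(arrival_time, request_id)` input format are hypothetical, and a real server would run this logic against a live queue instead of a sorted list.

```python
def plan_batches(arrivals, max_batch_size=100, max_wait=0.01):
    """Group (arrival_time_seconds, request_id) pairs into batches.

    A batch is closed when it reaches max_batch_size, or when a request
    arrives after the deadline set by the first request in the batch.
    This is an illustrative simulation, not a production scheduler.
    """
    batches = []
    current = []
    deadline = None
    for arrival_time, request_id in sorted(arrivals):
        # The oldest waiting request has exceeded max_wait: flush first.
        if current and arrival_time > deadline:
            batches.append(current)
            current = []
        if not current:
            deadline = arrival_time + max_wait
        current.append(request_id)
        # The batch is full: flush immediately to keep the GPU busy.
        if len(current) == max_batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches


# Made-up arrival times: two requests close together, one much later.
example = [(0.000, "a"), (0.002, "b"), (0.020, "c")]
planned = plan_batches(example, max_batch_size=100, max_wait=0.01)
```

The two knobs pull in opposite directions: a larger `max_wait` tends to fill batches and raise utilization, while a smaller one bounds the extra latency each synchronous user pays.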
Anthropic AI Security Engineering questions
- GenAI Safety: Building Trust Through Guardrails and Evaluation
How do you approach GenAI safety in consumer products?
- Overconfident Model Hallucinations in High-Risk Conversational AI Contexts
Imagine you're part of a team deploying a conversational AI model that can reason across sensitive topics. During internal testing, you discovered that the model usually gives over
- Generative AI safety: detecting and mitigating harmful content generation
What are the challenges in ensuring generative AI doesn't produce harmful or unsafe content, and how would you address them?
- Design a prompt injection defence architecture
You are building a customer-facing LLM application for a bank. Users can ask questions about their accounts. A security audit identified prompt injection as a critical risk — malic
Prepare for your Anthropic interview
Velocode turns reported Anthropic interview questions into scored practice. Free accounts get the full community problem set and one Anthropic-tagged question per domain; Pro unlocks every company-verified question, the interview simulator, and your domain readiness radar.