
Anthropic AI engineer interview questions (2026)

23 real interview questions reported by engineers who interviewed at Anthropic, spanning AI Engineering, Software Engineering, ML Engineering, and AI Security Engineering. Every question is scored against a golden answer on the things Anthropic actually grades (architecture, token efficiency, security, and correctness), not just whether your code runs.


Anthropic AI Engineering questions

Evaluation Metric Misalignment: Faithfulness Scores Mask Semantic Errors
Code comprehension (medium)
Your automated evaluation pipeline is reporting 95% faithfulness scores but manual review shows the answers are frequently wrong. The evaluation is producing false positives. Find
RAG Chatbot Ignoring Retrieved Documents, Using Training Data Instead
Code comprehension (easy)
A customer-facing RAG chatbot is giving answers based on the model's training data instead of the retrieved documents. Users keep getting outdated information. The retrieval step w
Model Context Protocol standardizes AI tool integration architecture
Theory (medium)
What is Model Context Protocol (MCP), and how does it standardize tool integration?
Building Customer Support Agent Evaluation Harness: Metrics Design and Contamination Prevention
Theory (medium)
Walk through how you would build an evaluation harness for a customer-support agent. What metrics do you track, where do labels come from, and how do you guard against eval-set con
Infinite Loop Detection: Root Cause Analysis and Design Prevention
Theory (medium)
Your agent loops infinitely on certain user inputs. Walk through your debugging approach and how you would change the design to prevent it.
Domain Adaptation Strategies for Generative AI Models
System design (medium)
How would you adapt a generative AI model to perform well in a specific domain?
Iterative refinement balancing clarity specificity and desired output format
Theory (easy)
What's your approach to prompt engineering?
Control generative AI model creativity through temperature and sampling parameters
Theory (easy)
How do you adjust the creativity of generative AI models?
Adaptive agent architecture with dynamic task learning capabilities
System design (hard)
Design an agentic AI system that can autonomously adapt to new tasks.
Mitigating Hallucinations In Production Generative AI Systems
System design (medium)
How would you handle hallucinations in a generative AI model deployed to users?
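
For the question above on adjusting model creativity, the usual starting point is temperature: logits are divided by a temperature before the softmax, so values below 1 sharpen the distribution and values above 1 flatten it. The sketch below is illustrative only. The function names `softmax_with_temperature` and `sample_token` are hypothetical and do not come from any particular model API.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=1.0, rng=random):
    """Draw one token index from the temperature-scaled distribution.

    rng is an assumption added for testability: any object exposing
    choices(), such as the random module or a random.Random instance.
    """
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With the same logits, a lower temperature concentrates more probability on the top-scoring token, which is why low settings read as more deterministic and high settings as more varied.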

Anthropic Software Engineering questions

Conversation History Compression for Long-Running Agents
Coding (hard)
Implement a top-level `solve(payload)` that compresses old conversation history into a single summary message while preserving the system message and the most recent turns. ## Th
Resilient LLM API calls with exponential backoff retry logic
Coding (easy)
LLM APIs are unreliable in production — they return rate limit errors (429) and server errors (500/503) that require retries. Implement a function `retry_llm_call` that: (1) Take
Prompt Injection Vulnerability in Content Filter Implementation
Code comprehension (medium)
Your customer support chatbot has a content filter that blocks harmful responses. Users have found a way to bypass it by including instructions in their messages. Find the vulnerab
AI Agent Prompt Injection Attack Via Support Ticket Content
Code comprehension (hard)
A customer service agent that reads support tickets and drafts responses has been behaving strangely in production — sending unauthorized responses, accessing data outside its scop
RAG evaluation metrics masking real-world answer quality failures
Code comprehension (medium)
A production RAG system has faithfulness 0.91 and context relevance 0.89 — both look great on the dashboard. But users keep filing support tickets saying answers feel wrong or besi
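
The retry question above names the function `retry_llm_call` and the retryable statuses (429, 500, 503), but its full specification is cut off in this listing. The sketch below is therefore one possible shape, not the expected answer. The signature, the `LLMAPIError` exception, the default delays, and the jitter fraction are all assumptions made for illustration.

```python
import random
import time

# Statuses the question describes as requiring retries.
RETRYABLE_STATUS = {429, 500, 503}


class LLMAPIError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status_code, message=""):
        super().__init__(message or f"HTTP {status_code}")
        self.status_code = status_code


def retry_llm_call(call, max_retries=3, base_delay=1.0, max_delay=30.0,
                   sleep=time.sleep):
    """Run call(); retry retryable failures with exponential backoff.

    The delay doubles on each attempt (base_delay, 2x, 4x, ...), is capped
    at max_delay, and gets up to 10% random jitter added on top.
    """
    for attempt in range(max_retries + 1):
        try:
            return call()
        except LLMAPIError as err:
            out_of_attempts = attempt == max_retries
            if err.status_code not in RETRYABLE_STATUS or out_of_attempts:
                raise  # non-retryable status, or retries exhausted
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

Passing `sleep` as a parameter is a design choice for testing: a test can record the delays instead of actually waiting.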

Anthropic ML Engineering questions

Distributed Search System Architecture with LLM Inference at Scale
System design (hard)
Design a distributed search system capable of handling 1 billion documents and 1 million QPS, while managing LLM inference for over 10,000 requests per second. Walk through your ar
End-to-end LLM query batching system design and optimization
System design (hard)
Design an end-to-end batching system for LLM queries.
GPU Inference Batching System Design with Synchronous User Requests
System design (hard)
Design an inference batching system for a single GPU that can handle up to 100 inputs per batch while users wait synchronously, maximizing utilization under compute constraints.
Scalable Token Generation Service Architecture for High-Throughput LLM
System design (hard)
Design a scalable system for a token-generation service used by an LLM that needs to handle up to 100,000 requests per second.
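
For the single-GPU batching question above, a common core idea is to dispatch a batch when it is full or when its oldest request has waited too long, whichever comes first. The sketch below simulates that policy over a list of arrival times. The 100-input cap comes from the question; the `max_wait` deadline and the name `plan_batches` are assumptions added for illustration, and a real system would also need queues, timers, and GPU execution.

```python
def plan_batches(arrival_times, max_batch_size=100, max_wait=0.05):
    """Group sorted arrival times (in seconds) into dispatch batches.

    A batch closes when it reaches max_batch_size, or when the next
    arrival comes more than max_wait seconds after the batch's first
    request (a simplification of a timer-driven deadline).
    """
    batches = []
    current = []
    for t in arrival_times:
        # Close the open batch if its deadline passed before this arrival.
        if current and t - current[0] > max_wait:
            batches.append(current)
            current = []
        current.append(t)
        # Close the batch as soon as it is full.
        if len(current) == max_batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches
```

The trade-off the parameters expose is latency against utilization: a longer `max_wait` fills batches more completely but makes synchronous users wait longer.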

Anthropic AI Security Engineering questions

GenAI Safety: Building Trust Through Guardrails and Evaluation
System design (medium)
How do you approach GenAI safety in consumer products?
Overconfident Model Hallucinations in High-Risk Conversational AI Contexts
System design (hard)
Imagine you're part of a team deploying a conversational AI model that can reason across sensitive topics. During internal testing, you discovered that the model usually gives over
Generative AI safety: detecting and mitigating harmful content generation
Theory (medium)
What are the challenges in ensuring generative AI doesn't produce harmful or unsafe content, and how would you address them?
Design a prompt injection defence architecture
System design (medium)
You are building a customer-facing LLM application for a bank. Users can ask questions about their accounts. A security audit identified prompt injection as a critical risk — malic

Prepare for your Anthropic interview

Velocode turns reported Anthropic interview questions into scored practice. Free accounts get the full community problem set and one Anthropic-tagged question per domain; Pro unlocks every company-verified question, the interview simulator, and your domain readiness radar.
