
OpenAI AI engineer interview questions (2026)

19 real interview questions reported by engineers who interviewed at OpenAI, spanning AI Engineering, AI Security Engineering, ML Engineering, and Software Engineering. Every question is scored against a golden answer on the things OpenAI actually grades (architecture, token efficiency, security, and correctness), not just whether your code runs.


OpenAI AI Engineering questions

Multi-tenant RAG: Secure Vector Index Isolation for Enterprise
System design · hard
Design a multi-tenant RAG system that serves 50 enterprise customers from a shared vector index without leaking documents across tenants. Cover index layout, query-time filtering,
Concurrent dictionary writes cause data loss without synchronization locks
Code comprehension · hard
A multi-agent research system has agents that write findings to a shared results dictionary. Occasionally results from one agent overwrite results from another and some findings ar
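The question's own code is not shown here, so the following is only a minimal sketch of the failure mode named in the title. It assumes the agents run as threads in one process and that each write is a read-modify-write on a shared dictionary. The names `ResultsStore`, `add_finding`, and `run_agents` are hypothetical.

```python
import threading


class ResultsStore:
    """Hypothetical shared store; every mutation happens under one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._results = {}

    def add_finding(self, topic, finding):
        # Read-modify-write: without the lock, two agents can both read the
        # same old list and the second assignment silently drops a finding.
        with self._lock:
            current = self._results.get(topic, [])
            self._results[topic] = current + [finding]

    def snapshot(self):
        # Copy under the lock so readers never observe a half-applied update.
        with self._lock:
            return {topic: list(items) for topic, items in self._results.items()}


def run_agents(store, n_agents=8, findings_per_agent=500):
    """Start n_agents threads that all write to the same topic, then join."""

    def agent(agent_id):
        for i in range(findings_per_agent):
            store.add_finding("shared-topic", (agent_id, i))

    threads = [threading.Thread(target=agent, args=(a,)) for a in range(n_agents)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

A single coarse lock is the simplest choice for a sketch; per-key locks or a queue feeding one writer are alternatives when contention matters.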
Vector Embedding Normalization Missing Causes Low Similarity Scores
Code comprehension · easy
A vector search system returns completely irrelevant results even though the documents were indexed correctly. The similarity scores are suspiciously low for all queries. Find the
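As a hedged illustration of the issue the title points at, the sketch below assumes the index scores candidates with a raw dot product, which only equals cosine similarity when both vectors have unit length. The helper names `l2_normalize`, `dot`, and `cosine_similarity` are assumptions for this example and are not taken from the original question.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def dot(a, b):
    """Plain dot product; raises if the dimensions differ."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    # Normalizing both sides first makes the dot product a true cosine score,
    # independent of the magnitude of either embedding.
    return dot(l2_normalize(a), l2_normalize(b))
```

In practice the normalization would typically be applied once at indexing time and once per query, rather than on every comparison.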
Improve Low Embedding Accuracy Through Fine-tuning and Domain Adaptation
Theory · medium
Suppose you are working with an OpenAI embedding model and benchmarking shows low accuracy. How would you improve the accuracy of the embedding-based search?
Streaming LLM tool calls: partial parsing, client buffering, progressive UI render
Theory · medium
You need to ship a streaming chat UI backed by an LLM with tool use. Walk through how you handle partial tool calls in the stream and how the client renders them.
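One way to reason about partial tool calls is to buffer argument fragments per call and parse them only once the stream has finished. The sketch below assumes a hypothetical fragment shape, `{"index": int, "name": str, "arguments": str}` with `name` and `arguments` optional, which is not any specific provider's wire format. `ToolCallAccumulator` is likewise an illustrative name.

```python
import json


class ToolCallAccumulator:
    """Buffers streamed tool-call fragments, keyed by call index."""

    def __init__(self):
        self._calls = {}

    def feed(self, delta):
        # `delta` is a hypothetical fragment; see the assumed shape above.
        call = self._calls.setdefault(delta["index"], {"name": "", "arguments": ""})
        if delta.get("name"):
            call["name"] = delta["name"]
        call["arguments"] += delta.get("arguments", "")

    def pending_view(self):
        # What a client could render while arguments are still arriving:
        # the tool name plus the raw, possibly incomplete, argument text.
        return [
            {"name": c["name"], "raw_arguments": c["arguments"]}
            for _, c in sorted(self._calls.items())
        ]

    def finalize(self):
        # Parse only after the stream signals completion; a partial JSON
        # string is not valid JSON and json.loads would raise on it.
        return [
            {"name": c["name"], "arguments": json.loads(c["arguments"] or "{}")}
            for _, c in sorted(self._calls.items())
        ]
```

The client can show a "calling tool" placeholder from `pending_view()` and switch to the structured result once `finalize()` succeeds.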
ChatGPT Processing Pipeline: Request To Response Generation
Theory · medium
Inside ChatGPT: What Happens After You Hit Enter?
ChatGPT Product Strategy and Metrics Design
Theory · medium
Why do you like ChatGPT? Who are its users, what metrics would you use to track its success, and how would you improve it?
Detect and Prevent Model Overfitting and Underfitting Issues
Theory · easy
What are overfitting and underfitting? Which models are most likely to experience each, and why?
Enterprise RAG Pipeline: Scalable Search with Sub-200ms Latency Guarantees
System design · hard
You're building a RAG pipeline for enterprise document search. The corpus is 10 million documents updated daily. Design for sub-200ms p99 latency with cited sources.
Implement a RAG pipeline with citation tracking
Coding · medium
You are building a RAG pipeline for a legal research platform. The corpus is 2 million case law documents updated weekly. Users ask complex multi-part questions. Requirements: sub-

OpenAI Software Engineering questions

Token Budget Manager for Multi-turn LLM Applications
Coding · easy
Implement a top-level `solve(payload)` that estimates the token cost of a chat history. ## The solve contract `payload` is a dict: `{"messages": [{"role": str, "content": str}],
Semantic cache using embeddings for similarity matching
Coding · medium
Implement a top-level `solve(payload)` for a semantic cache: return a stored response when a new query's embedding is similar enough (by cosine similarity) to a stored query's emb
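The full `solve(payload)` contract is cut off in this preview, so the sketch below does not attempt to implement it. It only illustrates the underlying idea of a cosine-similarity lookup with a threshold. The class `SemanticCache`, its methods, and the default threshold of 0.9 are all assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Hypothetical in-memory cache keyed by query embedding."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # assumed cutoff, tune per application
        self._entries = []  # list of (embedding, response) pairs

    def put(self, embedding, response):
        self._entries.append((list(embedding), response))

    def get(self, embedding):
        # Linear scan for the most similar stored query; return its response
        # only when the best score clears the threshold, else report a miss.
        best_score, best_response = -1.0, None
        for stored, response in self._entries:
            score = cosine(embedding, stored)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None
```

A linear scan is fine for a small cache; a larger one would usually sit behind an approximate nearest-neighbor index.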
Token Counter Mismatch Between Dashboard Estimates And Actual LLM Costs
Code comprehension · easy
Your team's cost monitoring dashboard is showing LLM spend 40% lower than the actual Stripe invoices. The token counter used for cost estimation is producing wrong numbers. Find th
Agent Function Calling JSON Generation Produces Malformed Output Intermittently
Code comprehension · medium
Your agent uses function calling to extract structured data from user messages. Occasionally the downstream system crashes because it receives malformed JSON. The bug is intermitte
Agent hallucinating confident incorrect data from tool composition failures
Code comprehension · hard
A financial data agent calls three tools in sequence to generate investment summaries. In production it intermittently returns confident summaries with completely wrong numbers. Th
Embedding Model Version Mismatch Between Development Production Environments
Code comprehension · easy
A semantic search feature worked fine in development but crashes in production with a dimension mismatch error. A new engineer recently upgraded the embedding model to "improve qua

OpenAI AI Security Engineering questions

Red-team harness for automated prompt-injection bypass discovery
Theory · hard
Walk through how you would build a red-team harness that automatically discovers prompt-injection bypasses against a production agent.
Secure GPT-4o Agent Against Indirect MCP Prompt Injection Attacks
System design · hard
Secure a GPT-4o agent with MCP tool access against indirect prompt injection. Your team is deploying a GPT-4o agent using the Model Context Protocol (MCP) with access to: terminal

OpenAI ML Engineering questions

Handling Multicollinearity: Select Features From Highly Correlated Sets
Theory · medium
How do you select input features for a model when some features are highly correlated with each other?
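One common approach, sketched here under assumptions, is greedy correlation pruning: rank features by absolute Pearson correlation with the target, then keep a feature only if it is not too correlated with any feature already kept. The function names and the 0.9 threshold are illustrative choices; alternatives such as variance inflation factors, regularization, or PCA are equally valid answers.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def prune_correlated(features, target, threshold=0.9):
    """features maps name -> list of values; returns the names to keep."""
    # Strongest predictors first, so the survivor of each correlated group
    # is the member most related to the target.
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    kept = []
    for name in ranked:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

For example, if features `a` and `b` are near-duplicates and `a` tracks the target more closely, `b` is dropped while a weakly correlated feature `c` is retained.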

Prepare for your OpenAI interview

Velocode turns reported OpenAI interview questions into scored practice. Free accounts get the full community problem set and one OpenAI-tagged question per domain; Pro unlocks every company-verified question, the interview simulator, and your domain readiness radar.
