AI Engineering Interview Questions — Practice Library

41 problems
Problem
Tags
Type
Users with session count growth for three consecutive months
Reported at Airbnb · Data Engineering · Medium
Coding
Streaming Clickstream Lakehouse: Storage, Compaction, Queries, Schema Evolution
Reported at Databricks · Data Engineering · Hard
System Design
Employee hierarchy with full management chain as array
Data Engineering · Medium
Coding
Re-partition large events table for cross-quarter joins efficiently
Reported at Snowflake · Data Engineering · Hard
Theory
Snowflake Micro-Partitions: Design, Clustering Keys, Performance Trade-offs
Reported at Snowflake · Data Engineering · Medium
Theory
Lakehouse Architecture for 50TB of JSON Events
Data Engineering · Easy
System Design
User lifetime net revenue and distinct active months count
Reported at Uber · Data Engineering · Medium
Coding
Delta Lake Time Travel: Querying Historical Table Versions
Reported at Databricks · Data Engineering · Hard
Theory
95th percentile trailing value across 1000 orders per row
Data Engineering · Medium
Coding
Ride-level fact table with trip completion metrics and surge multipliers
Reported at Uber · Data Engineering · Medium
System Design
Seven Day Rolling Average Daily Active Users Including Zero Days
Reported at Stripe · Data Engineering · Medium
Coding
Data skew in partition causing task straggler bottleneck
Data Engineering · Medium
Theory
Feature drift detection pipeline with automated monitoring and alerting system
Reported at Meta · Data Engineering · Medium
System Design
Schema evolution resilience: versioning, compatibility layers, registry patterns
Reported at Google · Data Engineering · Hard
System Design
Choosing Streaming or Batch for Two Data Products
Data Engineering · Medium
System Design
User purchase median interval calculation with window functions
Reported at Netflix · Data Engineering · Medium
Coding
Spark Join OOM: Check broadcast threshold and shuffle partition configuration
Reported at Databricks · Data Engineering · Hard
Theory
Data Quality — Diagnosing Three Trust-Eroding Incidents
Data Engineering · Medium
Theory
Kafka to S3 Iceberg pipeline with exactly-once semantics guarantees
Reported at Netflix · Data Engineering · Hard
System Design
dbt Model Performance Degradation: Diagnosis and Optimization Strategy
Reported at Stripe · Data Engineering · Medium
Theory
CDC — Designing a Debezium Pipeline and Handling Schema Changes
Data Engineering · Hard
Theory
CDC Log-Based Versus Trigger-Based: Implementation Trade-offs
Reported at LinkedIn · Data Engineering · Easy
Theory
Product Daily Revenue Rolling 30-Day Percentile Rank Calculation
Reported at Amazon · Data Engineering · Medium
Coding
Advanced dbt — Diagnosing Three Production Incidents
Data Engineering · Medium
Theory