Medium · ML Engineering · System Design

Batch vs Real-Time Inference — Two Serving Architectures

You are designing ML serving architecture for two features:

(A) A "people you may know" recommendation shown when a user opens the LinkedIn home feed. The recommendations are pre-computed for all users.

(B) A fraud detection model that must approve or reject a credit card transaction in real time before the merchant gets a response.
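To make the two request shapes concrete, here is a minimal sketch of how each feature might look at serving time. It is not a reference solution. Every name in it (`PRECOMPUTED_RECS`, `run_batch_job`, `serve_recommendations`, `serve_fraud_decision`, the `threshold` value) is hypothetical, and the dictionary stands in for whatever key-value store a real system would use.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical stand-in for a key-value store holding precomputed results.
PRECOMPUTED_RECS: Dict[str, List[str]] = {}


def run_batch_job(user_ids: Iterable[str],
                  score_candidates: Callable[[str], List[str]]) -> None:
    """Feature (A): score every user offline and store the results."""
    for user_id in user_ids:
        PRECOMPUTED_RECS[user_id] = score_candidates(user_id)


def serve_recommendations(user_id: str) -> List[str]:
    """Feature (A) at request time: a lookup only, no model call."""
    # Users missing from the last batch run get an empty list here (assumption).
    return PRECOMPUTED_RECS.get(user_id, [])


def serve_fraud_decision(transaction: dict,
                         model: Callable[[dict], float],
                         threshold: float = 0.5) -> str:
    """Feature (B) at request time: the model runs inside the request."""
    score = model(transaction)  # synchronous call on the critical path
    return "reject" if score >= threshold else "approve"
```

In this sketch the model call for (A) happens before any user request arrives, while for (B) it sits on the critical path of the transaction, which is the distinction the questions below ask you to reason about.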

For each:

1. Choose batch or real-time inference and justify your choice.
2. Specify the latency SLA you would target (p50, p99) and explain what that SLA implies for the architecture.
3. Describe the architecture end-to-end: where features come from, where the model runs, where predictions are stored or returned.
4. Identify the single most important failure mode and how the architecture handles it.
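
For readers less familiar with the p50 and p99 notation: these are the 50th and 99th percentiles of observed request latency. The sketch below shows one way to compute them from a list of latency samples and check them against a target. It uses the nearest-rank definition of a percentile (other definitions interpolate between samples), and the helper names `percentile` and `meets_sla` are hypothetical.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_sla(latencies_ms: Sequence[float],
              p50_target_ms: float,
              p99_target_ms: float) -> bool:
    """True if both the median and the tail latency are within target."""
    return (percentile(latencies_ms, 50) <= p50_target_ms
            and percentile(latencies_ms, 99) <= p99_target_ms)
```

The p99 target is usually the harder constraint, because it is set by the slowest 1% of requests rather than the typical one.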
