You are designing the ML serving architecture for two features:
(A) A "people you may know" recommendation shown when a user opens the LinkedIn home feed. The recommendations are pre-computed for all users.
(B) A fraud detection model that must approve or reject a credit card transaction in real time before the merchant gets a response.
For each feature:
- Choose batch or real-time inference and justify your choice.
- Specify the latency SLA you would target (p50, p99) and explain what that SLA implies for the architecture.
- Describe the architecture end-to-end: where features come from, where the model runs, where predictions are stored or returned.
- Identify the single most important failure mode and how the architecture handles it.
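As a reminder of what a p50/p99 target means in practice, here is a minimal sketch that computes both percentiles from a list of request latencies and checks them against a target. It assumes the nearest-rank percentile definition; the function names (`percentile`, `meets_sla`) and the sample latencies are hypothetical and purely illustrative, not measurements from either system.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_sla(samples, p50_target_ms, p99_target_ms):
    """Return True only if both the median and the tail are within target."""
    return (percentile(samples, 50) <= p50_target_ms
            and percentile(samples, 99) <= p99_target_ms)


# Made-up latencies in milliseconds: nine fast requests and one slow outlier.
latencies_ms = [12, 15, 11, 14, 13, 16, 12, 15, 14, 250]

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
print(f"p50={p50} ms, p99={p99} ms")
print("meets 20/100 ms SLA:", meets_sla(latencies_ms, 20, 100))
```

In this toy data the median looks healthy while the single slow request dominates p99, which is why the two numbers are specified separately and why the p99 target tends to drive architectural choices.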