The ML lifecycle is the complete sequence of steps from defining a problem to maintaining a model in production — and most ML engineering work happens in the latter half, after the model is trained.
Why this appears in interviews
Interviewers test whether you think end-to-end or only about one phase. The lifecycle framework gives you a structured way to answer any "design an ML system" question.
The mental model — an iceberg
Training and tuning are the tip above the water. Data pipelines, feature engineering, serving infrastructure, monitoring, and retraining make up the larger mass below. Most ML engineering work happens below the surface.
The six phases
Phase 1: Problem definition. What are you predicting? What does a wrong prediction cost? This phase is often skipped, and skipping it causes many ML project failures.
Phase 2: Data collection and preparation. Where does training data come from? This phase typically consumes 60-80% of project time.
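Much of this phase is unglamorous validation. A minimal sketch of pre-modeling sanity checks in pandas, assuming a DataFrame `df` with a label column; the specific checks are illustrative, not exhaustive:

```python
import pandas as pd

def basic_data_checks(df: pd.DataFrame, label_col: str) -> None:
    """Cheap sanity checks worth running before any modeling."""
    # Fraction of missing values per column, worst offenders first:
    # decide whether to impute, drop, or investigate upstream.
    print(df.isna().mean().sort_values(ascending=False).head())
    # Exact duplicate rows often point to a broken ingestion job.
    print("duplicate rows:", df.duplicated().sum())
    # Label distribution: severe imbalance changes metric and model choices.
    print(df[label_col].value_counts(normalize=True))
```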
Phase 3: Feature engineering. Transform raw data into model inputs. Avoid data leakage (future information in training).
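Leakage most often creeps in when preprocessing statistics are computed over the full dataset before splitting. A minimal scikit-learn sketch (synthetic data stands in for a real dataset): split first, then let a Pipeline fit every transform on the training fold only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, random_state=42)  # stand-in data

# Split FIRST. Any statistic computed over the full dataset (scaler means,
# target encodings, imputation values) leaks test information into training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The Pipeline fits the scaler on X_train only, then reuses those
# statistics when scoring on X_test.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```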
Phase 4: Model training and evaluation. Choose an architecture, train, evaluate on held-out data, and compare against a baseline. This is the phase tutorials cover, yet it takes the least of an ML engineer's time in mature organizations.
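One concrete way to keep the baseline honest: scikit-learn's DummyClassifier, scored on the same held-out split as the candidate model. The dataset and model choice below are illustrative stand-ins.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import GradientBoostingClassifier

# Imbalanced stand-in data: roughly 90% of rows belong to one class.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Baseline: always predict the majority class. On a 90/10 split this alone
# scores around 0.90 accuracy, the floor a real model must clear.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
model = GradientBoostingClassifier().fit(X_train, y_train)

print("baseline accuracy:", baseline.score(X_test, y_test))
print("model accuracy:", model.score(X_test, y_test))
```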
Phase 5: Deployment. Containerize, build serving endpoint, integrate with application, test in staging.
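A minimal serving sketch, here using FastAPI as one common choice; the model artifact path, route name, and request schema are assumptions for illustration. A production endpoint would add input validation, authentication, and request logging.

```python
# Run with: uvicorn serve:app  (assuming this file is saved as serve.py)
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")  # hypothetical artifact from Phase 4

class PredictRequest(BaseModel):
    features: list[float]

@app.post("/predict")
def predict(req: PredictRequest):
    # sklearn-style models expect a 2D array: one row per prediction.
    pred = model.predict([req.features])[0]
    return {"prediction": int(pred)}
```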
Phase 6: Monitoring and maintenance. Track prediction distribution, feature distribution, business metrics. This phase never ends.
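A hedged sketch of one feature-drift check: a two-sample Kolmogorov-Smirnov test comparing a training-time snapshot against a recent production window. The window size and p-value threshold are illustrative, not prescriptive.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, size=5000)  # snapshot saved at training time
live_feature = rng.normal(0.3, 1.0, size=1000)   # recent production window (shifted)

# KS test: a small p-value means the two samples likely come from
# different distributions, i.e. the feature has drifted.
stat, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.01:
    print(f"drift detected (KS statistic={stat:.3f}); consider retraining")
```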
The retraining loop
New data arrives → pipeline processes it → monitoring detects degradation OR a trigger fires →
new model trained → evaluated against the current production model →
if the new model wins: promote → monitor → repeat forever
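In code, the loop reduces to a promotion gate. Everything below (`train_fn`, `eval_fn`, the promotion margin) is a hypothetical stand-in; the shape is what matters: the challenger replaces the champion only when it wins on the same evaluation set.

```python
def retraining_step(champion, train_fn, eval_fn, new_data, eval_data):
    """One pass through the retraining loop; returns the model to serve next."""
    challenger = train_fn(new_data)                 # new model trained on fresh data
    champion_score = eval_fn(champion, eval_data)   # same held-out evaluation for both
    challenger_score = eval_fn(challenger, eval_data)
    # Promote only on a clear win; the small margin guards against
    # promoting on evaluation noise.
    if challenger_score > champion_score + 0.005:
        return challenger  # promoted: becomes the monitored production model
    return champion        # keep serving the current model
```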
Common interview mistakes
Mistake 1: Jumping to model selection. Problem definition, data assessment, and baseline establishment come first.
Mistake 2: Not thinking about retraining. A complete answer always includes a monitoring and retraining strategy.
Mistake 3: Ignoring the baseline. If your model does not beat the simplest possible approach, you do not have a model worth deploying.
Key vocabulary
- Data leakage — When future information accidentally influences model training, causing artificially high evaluation metrics.
- Label — The target variable you are predicting. Defining labels correctly is one of the most important steps.
- Baseline — The simplest possible model that defines the floor of acceptable performance.
- Retraining trigger — The condition that initiates a new training run (scheduled, performance-based, or data-volume-based).
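The three trigger types combine naturally with OR logic. A sketch with illustrative thresholds; every number and field name here is an assumption:

```python
from datetime import datetime, timedelta

def should_retrain(last_trained: datetime, live_metric: float,
                   baseline_metric: float, new_rows: int) -> bool:
    # Scheduled: retrain at least every 30 days regardless of performance.
    scheduled = datetime.utcnow() - last_trained > timedelta(days=30)
    # Performance-based: live metric has dropped more than 5% below baseline.
    performance = live_metric < 0.95 * baseline_metric
    # Data-volume-based: enough new labeled rows have accumulated.
    data_volume = new_rows > 100_000
    return scheduled or performance or data_volume
```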