Concept · ~8 min read

ML Lifecycle

The ML lifecycle is the complete sequence of steps from defining a problem to maintaining a model in production — and most ML engineering work happens in the latter half, after the model is trained.

Why this appears in interviews

Interviewers test whether you think end-to-end or only about one phase. The lifecycle framework gives you a structured way to answer any "design an ML system" question.

The mental model — an iceberg

Training and tuning — the tip above water. Data pipelines, feature engineering, serving infrastructure, monitoring, and retraining — the larger mass below. Most ML engineering work is below the surface.

The six phases

Phase 1: Problem definition. What are you predicting? What does a wrong prediction cost? This phase is often skipped, and skipping it causes most ML project failures.

Phase 2: Data collection and preparation. Where does training data come from? This phase typically consumes 60-80% of project time.

Phase 3: Feature engineering. Transform raw data into model inputs. Avoid data leakage (future information in training).
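A minimal sketch of the leakage pitfall, using made-up numeric data: if feature statistics (like a scaler's mean) are computed over the full dataset before splitting, information from the test set leaks into training. Computing them on the training split alone avoids it.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=10.0, scale=2.0, size=100)

# Split first, THEN derive feature statistics from the training split only.
train, test = data[:80], data[80:]

# Leaky: this mean is influenced by test rows, so evaluation
# metrics will look artificially good.
leaky_mean = data.mean()

# Correct: statistics come from the training split, then are
# reused unchanged on the test split.
train_mean, train_std = train.mean(), train.std()
train_scaled = (train - train_mean) / train_std
test_scaled = (test - train_mean) / train_std  # reuse training statistics
```

The same rule applies to any fitted transformation (imputation, encoding, scaling): fit on the training split, apply everywhere else.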

Phase 4: Model training and evaluation. Choose architecture, train, evaluate on held-out data, compare against a baseline. This is the phase tutorials cover, yet it takes the least of an ML engineer's time in mature organizations.
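The baseline comparison above can be sketched with a majority-class baseline on toy labels (the labels and predictions here are made up for illustration):

```python
from collections import Counter

# Hypothetical held-out labels and model predictions (assumed data).
y_true = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
y_model = [1, 0, 1, 0, 0, 1, 1, 1, 1, 1]

# Baseline: always predict the most common class.
majority = Counter(y_true).most_common(1)[0][0]
y_base = [majority] * len(y_true)

def accuracy(truth, pred):
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)

print(accuracy(y_true, y_base))   # 0.7 — the floor the model must beat
print(accuracy(y_true, y_model))  # 0.8
```

If the gap between those two numbers is small, the extra complexity of the model may not be worth deploying.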

Phase 5: Deployment. Containerize, build serving endpoint, integrate with application, test in staging.

Phase 6: Monitoring and maintenance. Track prediction distribution, feature distribution, business metrics. This phase never ends.
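One crude but common way to track a feature distribution, sketched here with invented numbers: compare the live window's mean against a reference window and alert when the shift exceeds a few standard errors. (The function name and threshold are illustrative, not a standard API.)

```python
import statistics

def mean_shift_alert(reference, live, threshold=3.0):
    """Flag when the live feature mean drifts more than `threshold`
    standard errors from the reference mean (a crude drift check)."""
    ref_mean = statistics.mean(reference)
    ref_std = statistics.stdev(reference)
    stderr = ref_std / len(live) ** 0.5
    return abs(statistics.mean(live) - ref_mean) > threshold * stderr

reference = [0.5, 0.6, 0.55, 0.52, 0.58, 0.61, 0.57, 0.54]
stable = [0.56, 0.55, 0.58, 0.53]     # no alert
shifted = [0.9, 0.95, 0.92, 0.88]     # alert fires
```

Production systems typically use distribution-level tests (e.g. population stability index or KS tests) rather than a single mean, but the shape of the check is the same: reference window, live window, threshold, alert.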

The retraining loop

New data arrives → pipeline processes it → monitoring detects degradation
   OR a trigger fires → new model trained → evaluated against the current
   production model → if the new model wins: promote → monitor → repeat forever
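One pass through the loop above can be sketched as a champion/challenger check. Every name here (`retraining_cycle`, `train_fn`, `eval_fn`) is hypothetical, a shape for the logic rather than any real framework's API:

```python
def retraining_cycle(champion, new_data, degraded, trigger_fired,
                     eval_fn, train_fn):
    """Retrain when monitoring detects degradation or a trigger fires;
    promote the challenger only if it beats the production model."""
    if not (degraded or trigger_fired):
        return champion  # nothing to do this cycle
    challenger = train_fn(new_data)
    if eval_fn(challenger) > eval_fn(champion):
        return challenger  # promote the new model
    return champion  # keep the current production model

# Toy stand-ins (all assumed for illustration):
train_fn = lambda data: {"name": "challenger", "score": sum(data)}
eval_fn = lambda model: model["score"]
champion = {"name": "champion", "score": 5}

winner = retraining_cycle(champion, [1, 2, 3], degraded=True,
                          trigger_fired=False,
                          eval_fn=eval_fn, train_fn=train_fn)
print(winner["name"])  # challenger (score 6 beats 5)
```

The important property is that promotion is gated on the comparison: a retrained model is never assumed to be better just because it is newer.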

Common interview mistakes

Mistake 1: Jumping to model selection. Problem definition, data assessment, and baseline establishment come first.

Mistake 2: Not thinking about retraining. A complete answer always includes a monitoring and retraining strategy.

Mistake 3: Ignoring the baseline. If your model does not beat the simplest possible approach, you do not have a model worth deploying.

Key vocabulary

  • Data leakage — When future information accidentally influences model training, causing artificially high evaluation metrics.
  • Label — The target variable you are predicting. Defining labels correctly is one of the most important steps.
  • Baseline — The simplest possible model that defines the floor of acceptable performance.
  • Retraining trigger — The condition that initiates a new training run (scheduled, performance-based, or data-volume-based).