Concept · ~8 min read

MLOps: What It Is

MLOps is the practice of applying software engineering and DevOps principles (automation, CI/CD, monitoring, and reproducibility) to the full lifecycle of machine learning systems in production, closing the gap between model development and reliable model operation.

Why this appears in interviews

MLOps is both a philosophy and a set of practices, and the term is frequently misused. Interviewers ask about it to see whether you understand the operational challenges of production ML, not just the tooling, and whether you can articulate why the ML development process breaks down without it.

The mental model — why ML is harder than software to operate

The model has three inputs, not one. A software system's behaviour is determined by its code. An ML model's behaviour is determined by its code, its training data, and its hyperparameters. Change any one of these and the model changes — even if you did not touch the deployment.
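One way to make the "three inputs" point concrete is to treat a model's identity as a fingerprint over all three. The sketch below is an illustration under assumptions: the helper name `model_fingerprint`, the version strings, and the choice of SHA-256 over a JSON encoding are hypothetical, not a standard. It shows that changing only the data or only the hyperparameters produces a different model identity even though the code is untouched.

```python
import hashlib
import json


def model_fingerprint(code_version: str, data_version: str, hyperparams: dict) -> str:
    """Hypothetical helper: derive a short model ID from all three inputs.

    code_version: e.g. a git commit hash
    data_version: e.g. a dataset snapshot ID or content hash
    hyperparams:  training hyperparameters (must be JSON-serialisable)
    """
    payload = json.dumps(
        {"code": code_version, "data": data_version, "hyperparams": hyperparams},
        sort_keys=True,  # key order must not change the fingerprint
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


base = model_fingerprint("a1b2c3d", "sales-2024-05", {"lr": 0.01, "depth": 6})
new_data = model_fingerprint("a1b2c3d", "sales-2024-06", {"lr": 0.01, "depth": 6})
new_hparams = model_fingerprint("a1b2c3d", "sales-2024-05", {"lr": 0.02, "depth": 6})

print(base != new_data)     # True: same code, different data -> different model
print(base != new_hparams)  # True: same code and data, different hyperparameters
```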

"Testing" a model is probabilistic, not deterministic. You cannot exhaustively test an ML model the way you can test conventional software; instead, you sample from a distribution of inputs and measure aggregate performance. As a result, a bug that affects only a small slice of inputs can hide inside a good aggregate score, whereas a deterministic unit test targeting that case would catch it.
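A small, hypothetical illustration of how an aggregate metric can hide a bug: the evaluation records, slice names, and the 0.9 threshold mentioned in the comments are invented for the example. The model looks fine overall yet is completely wrong on a rare slice, which is why one common mitigation is to report metrics per slice as well as in aggregate.

```python
from collections import defaultdict


def accuracy(pairs):
    """Fraction of (label, prediction) pairs that match."""
    return sum(y == p for y, p in pairs) / len(pairs)


def slice_accuracy(records):
    """Accuracy per slice; each record is (slice_name, label, prediction)."""
    groups = defaultdict(list)
    for slice_name, y, p in records:
        groups[slice_name].append((y, p))
    return {name: accuracy(pairs) for name, pairs in groups.items()}


# Hypothetical evaluation sample: 95 common-case rows the model gets right,
# 5 rare-case rows where a bug makes it always wrong.
records = [("common", 1, 1)] * 95 + [("rare", 1, 0)] * 5

overall = accuracy([(y, p) for _, y, p in records])
per_slice = slice_accuracy(records)

print(overall)    # 0.95 -> would pass a 0.9 aggregate threshold
print(per_slice)  # {'common': 1.0, 'rare': 0.0} -> the bug is only visible per slice
```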

Production is part of the system. In traditional software, production is where you deploy your finished product. In ML, production is where you collect the data that trains your next model. The boundary between development and production is blurry.

The maturity model — three levels

Level 0 — Manual ML: Data scientists train models in notebooks and export the weights, and a DevOps engineer deploys them manually. There is no automated retraining and no monitoring. This is where many organisations start, and a lot of industry ML still operates at this level.

Level 1 — ML Pipeline Automation: The training pipeline is automated — new data triggers a training run, the trained model is automatically evaluated, and if it passes quality gates it is promoted to staging. Human approval is still required for production deployment.
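The Level 1 quality gate can be sketched as a pure function over evaluation results. Everything here is an assumption for illustration: the `EvalResult` fields, the AUC floor of 0.80, the 50 ms latency budget, and the "no regression against production" rule are hypothetical choices, not a prescribed gate. Note that a passing model only reaches staging; promotion to production remains a human decision at this level.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    model_id: str
    auc: float         # primary offline metric (assumed)
    latency_ms: float  # p95 inference latency on a benchmark (assumed)


def passes_quality_gate(candidate: EvalResult, production: EvalResult,
                        min_auc: float = 0.80, max_latency_ms: float = 50.0,
                        min_improvement: float = 0.0) -> bool:
    """Hypothetical gate: absolute floor, latency budget, no regression vs production."""
    return (
        candidate.auc >= min_auc
        and candidate.latency_ms <= max_latency_ms
        and candidate.auc - production.auc >= min_improvement
    )


def next_stage(candidate: EvalResult, production: EvalResult) -> str:
    # At Level 1 a passing model goes to staging; production still needs human approval.
    return "staging" if passes_quality_gate(candidate, production) else "rejected"


prod = EvalResult("m-041", auc=0.86, latency_ms=30.0)
print(next_stage(EvalResult("m-042", auc=0.87, latency_ms=32.0), prod))  # staging
print(next_stage(EvalResult("m-043", auc=0.84, latency_ms=28.0), prod))  # rejected (regression)
```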

Level 2 — CI/CD for ML: Training, evaluation, and deployment are all automated in a CI/CD pipeline. Model performance monitoring triggers automated retraining. New model versions are deployed using canary releases with automated rollback. Engineers are alerted when intervention is needed but the system manages itself day-to-day.
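The canary-and-rollback loop reduces to two small decisions: which model serves a given request, and whether the canary should be promoted or rolled back. This is a simplified illustration with hypothetical function names and thresholds (5% canary traffic, a 10% relative error-rate tolerance, a 500-request minimum). Production systems typically perform the traffic split in the serving layer and compare several metrics with statistical tests rather than a single fixed ratio.

```python
def route(request_id: int, canary_percent: int) -> str:
    """Send a fixed, deterministic share of traffic to the canary model."""
    return "canary" if request_id % 100 < canary_percent else "stable"


def canary_decision(stable_errors: int, stable_total: int,
                    canary_errors: int, canary_total: int,
                    max_relative_increase: float = 0.10,
                    min_canary_requests: int = 500) -> str:
    """Hypothetical rollback rule: roll back if the canary's error rate exceeds
    the stable model's by more than max_relative_increase."""
    if canary_total < min_canary_requests:
        return "wait"  # not enough canary traffic to judge yet
    stable_rate = stable_errors / stable_total
    canary_rate = canary_errors / canary_total
    if canary_rate > stable_rate * (1 + max_relative_increase):
        return "rollback"
    return "promote"


routes = [route(i, canary_percent=5) for i in range(1000)]
print(routes.count("canary"))               # 50
print(canary_decision(200, 9500, 12, 500))  # rollback: 2.4% vs ~2.1% errors
print(canary_decision(200, 9500, 10, 500))  # promote: 2.0% errors
print(canary_decision(200, 9500, 1, 100))   # wait
```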

What MLOps actually looks like in practice

  • A model registry: versioned storage of trained models with metadata, enabling instant rollback to any previous model version
  • Automated training pipelines: a data change triggers preprocessing, training, evaluation, and promotion
  • Feature stores: a shared system that computes features consistently between training and serving (prevents training-serving skew)
  • Monitoring and alerting: automated checks on data drift, prediction distribution shifts, system performance, and business metrics
  • Reproducibility: every model in the registry can be re-trained from scratch by re-running the training pipeline with the logged data version, code version, and hyperparameters
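The registry and reproducibility points can be made concrete with a deliberately tiny, in-memory sketch. The class and method names are hypothetical and do not mirror MLflow's or SageMaker's APIs, and the `s3://` paths and snapshot IDs are invented. The point is that each version stores its artifact location next to the code version, data version, and hyperparameters needed to retrain it, and that rollback just moves the production pointer back to an earlier version.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelVersion:
    version: int
    artifact_uri: str  # where the trained weights are stored (hypothetical URI)
    code_version: str  # lineage needed to retrain this exact model...
    data_version: str
    hyperparams: dict
    metrics: dict = field(default_factory=dict)


class ModelRegistry:
    """Hypothetical in-memory registry: versioned models plus a production pointer."""

    def __init__(self):
        self._versions = {}      # version number -> ModelVersion
        self._production = None  # version currently serving, if any
        self._history = []       # earlier production versions, newest last

    def register(self, artifact_uri, code_version, data_version, hyperparams, metrics=None):
        version = len(self._versions) + 1
        self._versions[version] = ModelVersion(
            version, artifact_uri, code_version, data_version, hyperparams, metrics or {}
        )
        return version

    def promote(self, version):
        if version not in self._versions:
            raise KeyError(f"unknown model version {version}")
        if self._production is not None:
            self._history.append(self._production)
        self._production = version

    def rollback(self):
        # Rollback only moves the pointer; the older artifact is still stored.
        if not self._history:
            raise RuntimeError("no earlier production version to roll back to")
        self._production = self._history.pop()
        return self._production

    def production(self) -> Optional[ModelVersion]:
        if self._production is None:
            return None
        return self._versions[self._production]


registry = ModelRegistry()
v1 = registry.register("s3://models/churn/1", "a1b2c3d", "snap-2024-05", {"lr": 0.01}, {"auc": 0.85})
v2 = registry.register("s3://models/churn/2", "a1b2c3d", "snap-2024-06", {"lr": 0.01}, {"auc": 0.87})
registry.promote(v1)
registry.promote(v2)
print(registry.production().version)       # 2
registry.rollback()
print(registry.production().version)       # 1
print(registry.production().data_version)  # snap-2024-05
```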

The MLOps tool landscape

Experiment tracking: MLflow, Weights & Biases — track training runs, hyperparameters, metrics

Pipeline orchestration: Kubeflow Pipelines, Metaflow, Vertex AI Pipelines — automate the training workflow
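At its core, an orchestrator runs pipeline steps in dependency order. The sketch below uses Python's standard-library `graphlib` (Python 3.9+) rather than any of the tools named above. The step names, toy data, and the 1.0 error threshold are hypothetical. Real orchestrators add scheduling, retries, caching, containerised steps, and artifact tracking on top of this.

```python
from graphlib import TopologicalSorter

# Hypothetical training pipeline: step name -> set of steps it depends on.
PIPELINE = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"train"},
    "promote": {"evaluate"},
}


def preprocess(ctx):
    # Drop missing values from the raw input.
    ctx["rows"] = [x for x in ctx["raw"] if x is not None]


def train(ctx):
    # Toy "model": always predict the mean of the training rows.
    ctx["model"] = sum(ctx["rows"]) / len(ctx["rows"])


def evaluate(ctx):
    # Toy metric: mean absolute error (a real pipeline would use held-out data).
    ctx["mae"] = sum(abs(x - ctx["model"]) for x in ctx["rows"]) / len(ctx["rows"])


def promote(ctx):
    # Hypothetical quality gate on the evaluation metric.
    ctx["promoted"] = ctx["mae"] < 1.0


STEPS = {"preprocess": preprocess, "train": train, "evaluate": evaluate, "promote": promote}


def run_pipeline(steps, dag, raw):
    """Execute every step after its dependencies, sharing one context dict."""
    ctx = {"raw": raw, "executed": []}
    for name in TopologicalSorter(dag).static_order():
        steps[name](ctx)
        ctx["executed"].append(name)
    return ctx


ctx = run_pipeline(STEPS, PIPELINE, raw=[3.0, None, 5.0, 4.0])
print(ctx["executed"])  # ['preprocess', 'train', 'evaluate', 'promote']
print(ctx["promoted"])  # True
```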

Model registry: MLflow Model Registry, SageMaker Model Registry (version models and track their promotion toward deployment)

Feature stores: Feast, Tecton, Vertex AI Feature Store — consistent feature computation
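The core idea behind a feature store, avoiding training-serving skew, can be shown without any particular product: define the feature logic once and call it from both the offline training job and the online serving path. This is a simplified stand-in, not Feast's or Tecton's API, and the feature names and records are hypothetical. A real feature store also handles storage, point-in-time-correct joins for building training sets, and low-latency online lookup.

```python
import math
from datetime import datetime


def compute_features(raw: dict) -> dict:
    """Single source of truth for feature logic, used by BOTH the training job
    and the online service (hypothetical feature names)."""
    ts = datetime.fromisoformat(raw["timestamp"])
    return {
        "log_amount": math.log1p(raw["amount"]),
        "hour_of_day": ts.hour,
        "is_weekend": ts.weekday() >= 5,  # Saturday=5, Sunday=6
    }


# Offline: build training rows from historical records.
history = [
    {"amount": 120.0, "timestamp": "2024-06-01T14:30:00"},
    {"amount": 9.5, "timestamp": "2024-06-03T02:10:00"},
]
training_rows = [compute_features(r) for r in history]

# Online: the serving path calls the same function on a live request.
live_request = {"amount": 120.0, "timestamp": "2024-06-01T14:30:00"}
serving_row = compute_features(live_request)

print(serving_row == training_rows[0])  # True: identical inputs give identical features
```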

Monitoring: Evidently, Arize AI, WhyLabs — track data and model quality in production
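As an example of what a data-drift check computes, here is a minimal Population Stability Index (PSI) sketch comparing a reference sample (for instance, training-time feature values) with a production sample. PSI is only one of several drift statistics (the Kolmogorov-Smirnov test is another), and this is not necessarily how the tools above implement it. The equal-width binning, the `eps` floor, and the sample data are assumptions. A frequently quoted rule of thumb treats PSI above roughly 0.25 as a significant shift; treat that cutoff as a convention, not a standard.

```python
import math


def psi(expected, actual, bins=10, eps=1e-4):
    """Population Stability Index between a reference sample and a production
    sample, using equal-width bins over the reference range."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp so out-of-range production values land in the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # The eps floor avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


reference = [i / 100 for i in range(1000)]          # training-time values, 0.00 to 9.99
unshifted = [i / 100 + 0.005 for i in range(1000)]  # production values, essentially unchanged
shifted = [i / 100 + 3.0 for i in range(1000)]      # production values shifted by +3.0

print(psi(reference, unshifted) < 0.1)  # True: no meaningful drift
print(psi(reference, shifted) > 0.25)   # True: drift above the rule-of-thumb cutoff
```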

Common interview mistakes

Mistake 1: Listing tools without explaining the problem they solve. Always ground your answer in the underlying operational challenge.

Mistake 2: Treating MLOps as only about deployment. MLOps covers the entire lifecycle, from data collection through training, deployment, monitoring, and retraining.

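Mistake 3: Not knowing what a model registry is. The model registry is arguably the most fundamental MLOps concept, because rollback, auditing, and controlled promotion to production all depend on it.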

Key vocabulary

Model registry — A versioned store of trained models with metadata, enabling rollback, auditing, and deployment management.

Reproducibility — The ability to recreate any past model exactly by re-running the training pipeline with the same inputs.

Training pipeline — An automated sequence of steps (data preprocessing, model training, evaluation, promotion) triggered by new data or a schedule.

ML maturity level — A framework (Level 0-2) describing how automated and robust an organisation's ML operations are.
