# Introduction to MLOps: DevOps for AI

*Tags: ai, machine-learning, mlops, devops*
A model that works in a notebook is a prototype. A model that works in production is a product. The gap between them is MLOps.
## The ML Lifecycle

```
Data → Prepare → Train → Evaluate → Deploy → Monitor → Repeat
  ↑                                             │
  └─────────────────────────────────────────────┘
```

Every step needs automation, versioning, and monitoring.
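The loop above can be sketched as a minimal orchestration driver. The stage functions here are illustrative stand-ins, not a real training routine:

```python
# Minimal sketch of the ML lifecycle as an automated loop.
# Each stage is a plain function; a real pipeline swaps these
# for containers or orchestrator tasks.

def prepare(raw):
    # Clean the data (illustrative: drop missing records)
    return [r for r in raw if r is not None]

def train(data):
    # Stand-in "model": the mean of the training data
    return sum(data) / len(data)

def evaluate(model, test):
    # Stand-in metric: mean absolute error against a test set
    mae = sum(abs(model - t) for t in test) / len(test)
    return {"mae": mae}

def run_pipeline(raw, test):
    data = prepare(raw)
    model = train(data)
    metrics = evaluate(model, test)
    return model, metrics

model, metrics = run_pipeline([1, 2, None, 3], [2, 2])
```

Real orchestrators (Kubeflow, Airflow) run the same shape of graph, but with each stage versioned, containerized, and monitored.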
## Why MLOps is Different
### Code vs Model Versioning

Traditional:

```bash
git commit -m "Updated payment logic"
```

ML:

- Model version
- Training data version
- Feature engineering code
- Hyperparameters
- Training environment

All must be reproducible.
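One way to make "all of it" concrete is to fingerprint every input to a training run, so any model can be traced back to exactly what produced it. A minimal sketch (fields and hashing scheme are illustrative):

```python
# Sketch: hash every input to a training run into a single
# reproducibility fingerprint. Fields are illustrative.
import hashlib
import json

def fingerprint(data_bytes: bytes, feature_code: str,
                hyperparams: dict, environment: str) -> str:
    record = {
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
        "feature_code_sha256": hashlib.sha256(feature_code.encode()).hexdigest(),
        "hyperparams": hyperparams,
        "environment": environment,
    }
    # Canonical JSON (sorted keys) so the same inputs always hash the same
    blob = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

run_id = fingerprint(
    b"age,amount\n34,120\n",
    "def featurize(x): ...",
    {"learning_rate": 0.01, "epochs": 100},
    "python3.11-sklearn1.4",
)
```

Change any one input and the fingerprint changes, which is exactly the property reproducibility needs.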
### Testing is Harder

Traditional: unit tests, integration tests.

ML adds:

- Data quality tests
- Model performance tests
- Bias/fairness tests
- A/B tests in production
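A data quality test looks much like a unit test, but runs against incoming records instead of code. A minimal sketch with an illustrative schema and thresholds:

```python
# Sketch of a data quality gate: validate records before training.
# Field names and plausibility ranges are illustrative.
def check_data_quality(rows):
    errors = []
    for i, row in enumerate(rows):
        if row.get("amount") is None:
            errors.append(f"row {i}: missing amount")
        elif row["amount"] < 0:
            errors.append(f"row {i}: negative amount")
        if row.get("age") is not None and not (0 <= row["age"] <= 120):
            errors.append(f"row {i}: implausible age")
    return errors

good = [{"amount": 10.0, "age": 34}]
bad = [{"amount": -5.0, "age": 200}]
```

In practice this runs as a pipeline stage that fails the run (rather than silently training on bad data) when errors are found.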
### Monitoring is Different

Traditional: uptime, latency, errors.

ML adds model-specific signals:

- Prediction distribution (prediction drift)
- Feature distribution (data drift)
- Model performance (degradation)
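One common drift signal is the Population Stability Index (PSI), which compares the binned distribution of a feature in production against the training reference. A self-contained sketch (bin count and the usual 0.2 alert threshold are conventions, not fixed rules):

```python
# Sketch: Population Stability Index (PSI) as a simple drift signal
# between a reference (training) sample and a current (production) one.
import math

def psi(ref, cur, bins=10):
    lo, hi = min(ref), max(ref)
    width = (hi - lo) / bins or 1.0

    def proportions(xs):
        counts = [0] * bins
        for x in xs:
            idx = min(max(int((x - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Small epsilon avoids log(0) for empty bins
        return [(c + 1e-6) / (len(xs) + 1e-6 * bins) for c in counts]

    p, q = proportions(ref), proportions(cur)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

# Identical distributions -> PSI near 0; a shifted one -> PSI grows.
same = psi([1, 2, 3, 4, 5] * 20, [1, 2, 3, 4, 5] * 20)
shifted = psi([1, 2, 3, 4, 5] * 20, [4, 5, 6, 7, 8] * 20)
```

A common rule of thumb treats PSI above roughly 0.2 as drift worth investigating; monitoring tools like Evidently compute richer versions of the same idea.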
## Core MLOps Components
### Data Versioning

```bash
# DVC (Data Version Control)
dvc init
dvc add data/training.csv
git add data/training.csv.dvc
git commit -m "Add training data v1"

# Track data alongside code: push the data itself to remote storage
dvc push
```
### Experiment Tracking

```python
import mlflow

mlflow.set_experiment("fraud_detection")

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("epochs", 100)

    model = train(...)

    mlflow.log_metric("accuracy", 0.95)
    mlflow.log_metric("f1_score", 0.87)
    mlflow.sklearn.log_model(model, "model")
```
### Feature Stores

A central repository for features, so training and serving use the same definitions:

```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Get point-in-time-correct features for training
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["user_features:age", "user_features:purchase_count"],
).to_df()

# Get fresh features for real-time inference
online_features = store.get_online_features(
    features=["user_features:age", "user_features:purchase_count"],
    entity_rows=[{"user_id": 123}],
).to_dict()
```
### Model Registry

```python
import mlflow

# Register the model from a completed run
mlflow.register_model(
    "runs:/abc123/model",
    "fraud_detector",
)

# Promote a version through lifecycle stages
client = mlflow.tracking.MlflowClient()
client.transition_model_version_stage(
    name="fraud_detector",
    version=3,
    stage="Production",
)
```
## The Pipeline

```yaml
# Example: Kubeflow-style pipeline (illustrative, not literal syntax)
stages:
  - name: data-prep
    container: preprocess:v1
    inputs: [raw_data]
    outputs: [cleaned_data]
  - name: train
    container: train:v1
    inputs: [cleaned_data, params]
    outputs: [model]
  - name: evaluate
    container: evaluate:v1
    inputs: [model, test_data]
    outputs: [metrics]
  - name: deploy
    container: deploy:v1
    inputs: [model]
    condition: metrics.accuracy > 0.90
```
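The `condition` on the deploy stage is a quality gate: the model ships only if evaluation metrics clear their thresholds. The gate logic itself is simple, as this sketch shows (metric names and floors are illustrative):

```python
# Sketch of a deployment gate: promote only when every metric
# clears its configured floor; report what blocked it otherwise.
def should_deploy(metrics, thresholds):
    failures = {
        name: metrics.get(name)
        for name, floor in thresholds.items()
        if metrics.get(name, float("-inf")) < floor
    }
    return len(failures) == 0, failures

ok, why = should_deploy(
    {"accuracy": 0.93, "f1": 0.88},
    {"accuracy": 0.90, "f1": 0.85},
)
blocked, why_not = should_deploy(
    {"accuracy": 0.87, "f1": 0.88},
    {"accuracy": 0.90, "f1": 0.85},
)
```

Returning the failing metrics, not just a boolean, matters in practice: the pipeline log should say *why* a deployment was blocked.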
## Monitoring in Production

### Data Drift

```python
from evidently.metric_preset import DataDriftPreset
from evidently.report import Report

report = Report(metrics=[DataDriftPreset()])
report.run(
    reference_data=training_data,
    current_data=production_data,
)

# Alert if drift was detected across the dataset
if report.as_dict()["metrics"][0]["result"]["dataset_drift"]:
    alert("Data drift detected!")
```
### Model Performance

Ground truth usually arrives later than predictions, so log both and evaluate on a schedule:

```python
# Log predictions now; compare against ground truth later
def predict_and_log(features, model):
    prediction = model.predict(features)
    mlflow.log_metric("prediction_count", 1)
    store_for_ground_truth_comparison(features, prediction)
    return prediction

# Scheduled job: evaluate once ground truth is available
def evaluate_recent_predictions():
    predictions = get_recent_predictions()
    ground_truth = get_ground_truth(predictions)
    accuracy = calculate_accuracy(predictions, ground_truth)
    if accuracy < threshold:
        trigger_retraining()
```
## Tool Landscape

### End-to-End Platforms

- **Vertex AI** (Google): full managed MLOps
- **SageMaker** (AWS): training to deployment
- **Azure ML**: Microsoft ecosystem
### Open Source Stack
| Component | Tools |
|---|---|
| Data versioning | DVC, LakeFS |
| Experiment tracking | MLflow, Weights & Biases |
| Feature store | Feast, Tecton |
| Pipeline orchestration | Kubeflow, Airflow |
| Model serving | Seldon, KServe, MLflow |
| Monitoring | Evidently, Arize, WhyLabs |
## Getting Started

### Level 0: Manual

- Training in notebooks
- Manual deployment
- No versioning

### Level 1: ML Pipeline

- Automated training pipeline
- Model versioning
- Basic monitoring

### Level 2: CI/CD for ML

- Automated retraining on new data
- Automated deployment with gates
- A/B testing

### Level 3: Full Automation

- Continuous training
- Continuous monitoring
- Automatic drift detection and retraining
## Best Practices

- **Version everything**: data, code, models, configs
- **Reproducible training**: same inputs → same model
- **Test data quality**: garbage in, garbage out
- **Monitor relentlessly**: models degrade silently
- **Automate carefully**: understand before automating
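"Same inputs → same model" starts with pinning every source of randomness. Real frameworks have their own seeds to set (NumPy, PyTorch, CUDA determinism flags); this toy sketch shows the principle with just the standard library:

```python
# Sketch: seeded, isolated RNG makes a toy "training" run repeatable.
import random

def train_toy_model(data, seed=42):
    rng = random.Random(seed)  # isolated RNG, not the global one
    # Random init, as real training would have
    weights = [rng.gauss(0, 1) for _ in data]
    # Stand-in "training": nudge weights toward the data
    return [w + 0.1 * x for w, x in zip(weights, data)]

a = train_toy_model([1.0, 2.0, 3.0])
b = train_toy_model([1.0, 2.0, 3.0])  # identical, by construction
```

Using `random.Random(seed)` rather than seeding the global RNG keeps the run reproducible even when other code also draws random numbers.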
## Final Thoughts
MLOps bridges the gap between data science and production. Without it, models rot in notebooks.
Start simple: version data, track experiments, monitor predictions. Add automation as you mature.
The goal is reliable, maintainable ML systems. Not just working models.
Models don’t ship themselves. MLOps does.