🚀 Module 10

MLOps & Model Deployment

⏱ 14 hours · Advanced · 6 topics

🎯 By the end: save and version models, track experiments with MLflow, serve predictions through a FastAPI endpoint, package everything with Docker, and monitor a live model for data drift and decay.

A model in a notebook helps no one. MLOps is the engineering discipline that gets models into production and keeps them working — the difference between a data scientist who produces interesting charts and one who ships systems the business depends on. This module walks the whole last mile: persist and version a trained model, track your experiments so results are reproducible, wrap the model in a web API, containerise it with Docker so it runs anywhere, and monitor it in production because — unlike ordinary software — models silently rot as the world changes. This is the skill set that separates senior practitioners from beginners.

1. Why MLOps? From notebook to production

Training is often only a small slice of a real ML project; a common rule of thumb puts it at around 10%. The rest is everything around it: data pipelines, versioning, serving, monitoring and retraining. MLOps brings software-engineering rigour (and DevOps ideas) to machine learning.

MLOps is a loop: monitoring in production feeds new data and triggers retraining.

Notebook ML	Production ML (MLOps)
runs once, by you	runs continuously, automatically
data sits in a CSV	data flows from live pipelines
“it works on my machine”	reproducible anywhere (Docker)
accuracy in a cell	monitored metrics + alerts
forgotten after the demo	versioned, retrained, maintained

Models decay even when the code does not. Ordinary software keeps working until you change it. A model degrades on its own as the world drifts away from its training data — which is why monitoring and retraining are core to MLOps, not optional extras.

Key points

Training is ~10% of a project; data, serving, monitoring and retraining are the rest.
MLOps applies software/DevOps rigour: versioning, reproducibility, automation, monitoring.
Unlike normal software, models silently decay as data drifts — maintenance is built-in.

2. Saving & versioning models

Step one of deployment: persist the trained model so you can load it elsewhere without retraining. For scikit-learn, joblib is the standard.

Save and load

import joblib

# Save the entire fitted pipeline (preprocessing + model together)
joblib.dump(pipeline, 'model.joblib')

# Later, in a totally different process:
loaded = joblib.load('model.joblib')
print(loaded.predict(X_new[:3]))

▶ Output

[1 0 1]

Save the whole pipeline, not just the model. If you persist only the estimator, you must perfectly recreate every preprocessing step (the same scaler, the same encoder, fit on the same data) at serving time. Saving the full Pipeline from Module 5 guarantees train-time and serve-time transforms match exactly.
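A minimal sketch of that guarantee, assuming scikit-learn and joblib are installed. The synthetic data from `make_classification` stands in for a real training set, and the temporary file path is illustrative. It fits a scaler and a classifier as one `Pipeline`, saves it, reloads it, and checks that the reloaded object returns the same probabilities when given raw, unscaled input.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for real training data
X, y = make_classification(n_samples=300, n_features=4, random_state=42)
X_train, y_train, X_new = X[:250], y[:250], X[250:]

# Scaler and classifier are saved together, so the scaler's fitted mean/std travel with the model
pipeline = Pipeline([
    ('scale', StandardScaler()),
    ('clf', LogisticRegression()),
]).fit(X_train, y_train)

path = os.path.join(tempfile.mkdtemp(), 'model.joblib')   # illustrative location
joblib.dump(pipeline, path)

# "Serving side": raw features go in, the loaded pipeline applies the same scaling first
loaded = joblib.load(path)
same = np.array_equal(pipeline.predict_proba(X_new), loaded.predict_proba(X_new))
print('Identical predictions after reload:', same)
```

If you had saved only `pipeline.named_steps['clf']`, you would have to refit or separately persist the scaler and feed it exactly the same training data to get the same result.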

Version everything that makes a model

Code — Git (you already do this).
Data — DVC or dataset hashes, so you know what it trained on.
Model artefacts — a model registry with versions and stages (Staging → Production).
Environment — pinned requirements.txt so dependencies match.

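A model is reproducible only if all four are pinned: the question “which data, which code and which library versions produced this exact model?” must have a precise answer. Pinning the environment also matters for loading, because scikit-learn only supports unpickling a model under the same library version that saved it. And beware: joblib files are pickles, and loading a pickle can execute arbitrary code, so only load artefacts from sources you trust.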
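One lightweight way to record the data, artefact and environment answers, sketched under assumptions: the file names are hypothetical, the toy dataset stands in for real training data, and a real setup would also store the Git commit of the training code. It hashes the training file and the saved model with SHA-256, writes library versions next to the model, and re-checks the model hash before calling `joblib.load`.

```python
import hashlib
import json
import os
import platform
import tempfile

import joblib
import numpy as np
import sklearn
from sklearn.linear_model import LogisticRegression


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


out_dir = tempfile.mkdtemp()
data_path = os.path.join(out_dir, 'train.csv')              # hypothetical dataset file
model_path = os.path.join(out_dir, 'model.joblib')
meta_path = os.path.join(out_dir, 'model_metadata.json')    # hypothetical metadata file

# Toy training data, written to disk the way a real dataset would be
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] > 0).astype(int)
np.savetxt(data_path, np.column_stack([X, y]), delimiter=',')

model = LogisticRegression().fit(X, y)
joblib.dump(model, model_path)

# Record exactly what produced this artefact
metadata = {
    'data_sha256': sha256_of_file(data_path),
    'model_sha256': sha256_of_file(model_path),
    'python': platform.python_version(),
    'scikit-learn': sklearn.__version__,
    'numpy': np.__version__,
    'joblib': joblib.__version__,
}
with open(meta_path, 'w') as f:
    json.dump(metadata, f, indent=2)

# Later, before loading: refuse an artefact whose hash no longer matches the record
if sha256_of_file(model_path) != metadata['model_sha256']:
    raise RuntimeError('model.joblib does not match its recorded hash; refusing to load')
loaded = joblib.load(model_path)
print(json.dumps(metadata, indent=2))
```

The hash check catches a swapped or corrupted file, but it is only as trustworthy as the metadata file itself, so store that record somewhere you control.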

Key points

Use joblib.dump/load to persist and restore scikit-learn models.
Always save the entire Pipeline so serving uses the exact same preprocessing as training.
Reproducibility needs four things versioned: code, data, model artefact, and environment.

3. Experiment tracking with MLflow

Real projects train dozens of models with different features and settings. MLflow logs each run's parameters, metrics and artefacts so you can compare them and reproduce the winner — no more “which notebook had the 0.94 model?”

Track a run

import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier

with mlflow.start_run(run_name='rf-200-trees'):
    params = {'n_estimators': 200, 'max_depth': 10}
    model = RandomForestClassifier(**params, random_state=42).fit(X_tr, y_tr)
    acc = model.score(X_te, y_te)

    mlflow.log_params(params)
    mlflow.log_metric('accuracy', acc)
    mlflow.sklearn.log_model(model, 'model')
    print('Logged accuracy:', round(acc, 3))

▶ Output

Logged accuracy: 0.958

Run mlflow ui and open http://localhost:5000 to browse every run side by side — sort by metric, inspect parameters, and download any model.

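Tracking turns guesswork into science. When you can compare 50 runs by metric and reproduce any of them on demand, model development stops being a memory game. MLflow also offers a model registry: you register the model from the winning run as a numbered version, then mark which version is in production. Older MLflow releases did this with stages (Staging → Production); newer releases deprecate stages in favour of aliases such as champion. Either way, the registry is the bridge from experiment to deployment.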

Key points

MLflow logs parameters, metrics and model artefacts for every training run.
mlflow ui lets you compare runs and reproduce the best one.
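The model registry versions the models you register and marks which version serves production (stages such as Staging → Production in older MLflow releases, aliases in newer ones).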

4. Serving a model with FastAPI

To make predictions available to apps, wrap the model in a web API. FastAPI is the modern Python choice: fast, with automatic validation and interactive docs.

A client POSTs features as JSON; the service runs the loaded model and returns a prediction.

# app.py
from fastapi import FastAPI
from pydantic import BaseModel
import joblib

app = FastAPI(title='Churn Predictor')
model = joblib.load('model.joblib')      # loaded once at startup

class Customer(BaseModel):               # auto-validated request schema
    features: list[float]

@app.post('/predict')
def predict(item: Customer):
    proba = model.predict_proba([item.features])[0, 1]
    return {'churn_probability': round(float(proba), 4)}

# Run it (--reload restarts the server on code changes; use it in development only), then call it
uvicorn app:app --reload

curl -X POST http://localhost:8000/predict \
  -H 'Content-Type: application/json' \
  -d '{"features": [0.2, 45.0, 1.0, 3.0]}'

▶ Response

{"churn_probability": 0.8123}

Load the model once, at startup. Loading it inside the request handler would reload from disk on every call — slow and wasteful. FastAPI also auto-generates interactive docs at /docs, so others can try your endpoint in the browser.

Key points

FastAPI wraps a model in a web endpoint with automatic request validation (Pydantic).
Load the model once at startup, not per request, for speed.
FastAPI auto-generates interactive docs at /docs for easy testing.

5. Packaging with Docker

“It works on my machine” is not deployment. Docker packages your code, dependencies and runtime into a single image that runs identically on any machine — your laptop, a colleague's, or a cloud server.

A Dockerfile for the API

FROM python:3.11-slim

WORKDIR /app

# Install dependencies first (better layer caching)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the app and the trained model
COPY app.py model.joblib ./

EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]

# Build the image, then run a container
docker build -t churn-api .
docker run -p 8000:8000 churn-api

▶ Output

INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

Why containers won. The image bundles the exact Python version, every library, your code and the model. Whoever runs docker run gets the identical environment — eliminating the dependency mismatches that break deployments. From here, the container deploys to Kubernetes, AWS, GCP or any cloud the same way.

Keep images lean. Start from a -slim base, copy requirements.txt before your code (so dependency layers cache), and use a .dockerignore. Smaller images build faster, deploy faster and have a smaller attack surface.
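A starting `.dockerignore` for a project laid out like this module's, sketched under assumptions: the directory names (`data`, `mlruns`, `.venv`) are hypothetical and should match your repository. Patterns are matched from the root of the build context, so `**/` is needed to catch files in subdirectories. It must not exclude `app.py`, `model.joblib` or `requirements.txt`, which the Dockerfile copies.

```
# Keep the build context small and free of secrets
.git
.venv
.env
**/__pycache__
**/*.pyc
**/*.ipynb
**/.ipynb_checkpoints
mlruns
data
```

Even though the Dockerfile copies only a few files, Docker sends the whole build context to the builder, so excluding large data folders and experiment logs speeds up every build.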

Key points

Docker packages code + dependencies + runtime into one portable image.
A Dockerfile defines the build; docker build then docker run launches the container.
Containers run identically everywhere and deploy cleanly to any cloud or Kubernetes.

6. Monitoring, drift & retraining

A deployed model is not done — it is on probation. The world changes, and the data it sees in production drifts away from its training data, quietly eroding accuracy. Monitoring catches this before users do.

Two kinds of drift

Data drift: the input distribution shifts (e.g. new customer demographics).
Concept drift: the relationship between inputs and target changes (e.g. behaviour after a price change or a pandemic).
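The difference shows up clearly in a toy simulation. This sketch assumes NumPy, SciPy and scikit-learn; the two-feature data and the labelling rules are invented for illustration. A classifier learns the rule “label is 1 when feature 0 is positive”. Data drift then shifts feature 0 while keeping that rule; concept drift keeps the training input distribution but makes the label follow feature 1 instead.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Training world: the label is 1 exactly when feature 0 is positive
X_train = rng.normal(size=(2000, 2))
y_train = (X_train[:, 0] > 0).astype(int)
model = LogisticRegression().fit(X_train, y_train)

# Data drift: feature 0 shifts by +2, but the input-output rule is unchanged
X_data = rng.normal(size=(2000, 2)) + np.array([2.0, 0.0])
y_data = (X_data[:, 0] > 0).astype(int)

# Concept drift: inputs come from the training distribution, but the label now follows feature 1
X_concept = rng.normal(size=(2000, 2))
y_concept = (X_concept[:, 1] > 0).astype(int)

results = {}
for name, X, y in [('data drift', X_data, y_data), ('concept drift', X_concept, y_concept)]:
    p_value = stats.ks_2samp(X_train[:, 0], X[:, 0]).pvalue   # input-only check on feature 0
    accuracy = model.score(X, y)                               # requires true labels
    results[name] = (p_value, accuracy)
    print(f'{name:13s}  KS p-value (feature 0): {p_value:.4f}  accuracy: {accuracy:.3f}')
```

In this toy case the input test fires on data drift even though accuracy holds up, while concept drift leaves the inputs looking normal and only the accuracy reveals it. That is why monitoring needs true outcomes as well as input statistics.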

Performance drifts down over time; monitoring triggers a retrain that restores it.

Detect drift in code

from scipy import stats

# Compare a feature's training vs recent-production distribution
stat, p = stats.ks_2samp(train_feature, live_feature)
print(f'KS statistic: {stat:.3f}, p-value: {p:.4f}')

if p < 0.05:
    print('Drift detected -- flag for review / retraining')

▶ Output

KS statistic: 0.214, p-value: 0.0008
Drift detected -- flag for review / retraining
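Real monitoring checks many features at once. A sketch assuming NumPy and SciPy, with hypothetical feature names and thresholds: it runs the same KS test per feature and flags drift only when the p-value is below `alpha` and the KS statistic is at least `min_stat`. The second condition matters because with thousands of production rows even a trivially small shift yields a tiny p-value; the statistic measures how large the shift actually is.

```python
import numpy as np
from scipy import stats


def drift_report(train: dict, live: dict, alpha: float = 0.05, min_stat: float = 0.1) -> dict:
    """KS-test every feature; flag drift only if significant AND large enough to matter."""
    report = {}
    for name, train_values in train.items():
        result = stats.ks_2samp(train_values, live[name])
        report[name] = {
            'ks': round(float(result.statistic), 3),
            'p': float(result.pvalue),
            'drift': bool(result.pvalue < alpha and result.statistic >= min_stat),
        }
    return report


# Hypothetical features: age is stable, income has shifted upwards in production
rng = np.random.default_rng(7)
train = {'age': rng.normal(40, 10, 5000), 'income': rng.normal(50, 15, 5000)}
live = {'age': rng.normal(40, 10, 5000), 'income': rng.normal(60, 15, 5000)}

report = drift_report(train, live)
for feature, r in report.items():
    status = 'DRIFT -- flag for review' if r['drift'] else 'ok'
    print(f"{feature:7s} KS={r['ks']:.3f} p={r['p']:.2g} {status}")
```

A threshold such as `min_stat=0.1` is a judgement call; tune it per feature against how much shift your model actually tolerates.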

Close the loop. A mature system logs predictions and (eventually) true outcomes, tracks live metrics on a dashboard, alerts on drift or accuracy drops, and retrains automatically on fresh data. That feedback loop — not the model itself — is what keeps an ML product valuable for years. You have now seen the entire lifecycle, end to end.

Key points

Data drift = inputs shift; concept drift = the input-output relationship changes.
Detect drift statistically (e.g. KS test) and monitor live accuracy on a dashboard.
Mature MLOps closes the loop: log, monitor, alert, and retrain automatically on fresh data.

★ Hands-on Project — Deploy a Model as a Container

Take a trained model all the way to a running, containerised API — the deliverable that proves you can ship.

Train a model from an earlier module inside a scikit-learn Pipeline and log the run with MLflow (params + metrics).
Persist the full pipeline with joblib.dump and pin your dependencies in requirements.txt.
Write a FastAPI app.py that loads the model at startup and exposes a POST /predict endpoint with a Pydantic request schema.
Run it with uvicorn and test it with curl and the interactive /docs page.
Write a Dockerfile (slim base, cached deps), then docker build and docker run the API.
Confirm the containerised endpoint returns the same predictions as your local run.
Add a simple drift check: a script that compares a feature's training vs new distribution with a KS test and prints a warning.
Write a short README (how to build, run, call) and a note on how you'd monitor and retrain it, then commit to your portfolio.
