An introduction to MLOps with MLflow

Romain Avouac (Insee), Thomas Faria (Insee), Tom Seimandi (Insee)

Introduction

Who are we?

  • Data scientists at Insee
    • methodological and IT innovation teams
    • support data science projects
  • Contact us

Context

  • Difficulty of transitioning from experiments to production-grade machine learning systems

  • Leverage best practices from software engineering

    • Improve reproducibility of analysis
    • Deploy applications in a scalable way
    • Monitor running applications

The DevOps approach

  • Unify development (dev) and system administration (ops)
    • shorten development time
    • maintain software quality

The MLOps approach

  • Integrate the specificities of machine learning projects
    • Experimentation
    • Continuous improvement

MLOps: principles

  • Reproducibility

  • Versioning

  • Automation

  • Monitoring

  • Collaboration

Why MLflow?

  • Multiple frameworks implement the MLOps principles

  • Pros of MLflow

    • Open-source
    • Covers the whole ML lifecycle
    • Agnostic to the ML library used
    • We have experience with it

Training platform: the SSP Cloud

  • An open innovation production-like environment
    • Kubernetes cluster
    • S3-compatible object storage
    • Large computational resources (including GPUs)
  • Based on the Onyxia project

Outline

1️⃣ Introduction to MLflow

2️⃣ A Practical Example: NACE Code Prediction for French companies

3️⃣ Deploying a ML model as an API

4️⃣ Maintenance of a model in production

5️⃣ Distributing the hyperparameter optimization

Application 0

Preparation of the working environment

  1. Create an account on the SSP Cloud using your professional email address
  2. Launch an MLflow service by clicking this URL
  3. Launch a VSCode-python service by clicking this URL
  4. Open the VSCode-python service and input the service password
  5. You’re all set!

Preparation of the working environment

  1. It is assumed that you have a GitHub account and have already created a token. Fork the training repository by clicking here.

  2. Create an account on the SSP Cloud using your professional email address

  3. Launch an MLflow service by clicking this URL

  4. Launch a VSCode-python service by clicking this URL

  5. Open the VSCode-python service and input the service password

  6. In VSCode, open a terminal and clone your forked repository (modify the first two lines):

    GIT_REPO=formation-mlops
    GIT_USERNAME=InseeFrLab
    
    git clone https://github.com/$GIT_USERNAME/$GIT_REPO.git
    cd $GIT_REPO
  7. Install the necessary packages for the training (with uv):

    uv sync
    uv run python -m nltk.downloader stopwords


  8. You’re all set!

1️⃣ Introduction to MLflow

Tracking server

  • “An API and UI for logging parameters, code versions, metrics, and artifacts”
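For instance, logging a run takes only a few calls (a minimal sketch; the experiment name, parameters and metrics below are illustrative):

import mlflow

# On the SSP Cloud the tracking server address is usually provided
# through the MLFLOW_TRACKING_URI environment variable
mlflow.set_experiment("mlflow-introduction")

with mlflow.start_run():
    # Log hyperparameters and metrics of the run
    mlflow.log_param("dim", 100)
    mlflow.log_metric("accuracy", 0.87)

    # Any file can also be logged as an artifact
    with open("notes.txt", "w") as f:
        f.write("free-form notes about the run")
    mlflow.log_artifact("notes.txt")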

Projects

  • “A standard format for packaging reusable data science code”

Models

  • “A convention for packaging machine learning models in multiple flavors”

Model registry

  • “A centralized model store, set of APIs, and UI, to collaboratively manage the full lifecycle of an MLflow Model”
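As an illustration, a model logged during a run can be registered in the model registry with a single call (a minimal sketch; the model name is illustrative):

import mlflow
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier().fit(X, y)

with mlflow.start_run() as run:
    # Log the trained model as an artifact of the run
    mlflow.sklearn.log_model(clf, "model")

# Register the logged model under a name in the model registry
mlflow.register_model(model_uri=f"runs:/{run.info.run_id}/model", name="my-classifier")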

Application 1

Introduction to MLflow concepts

  1. In VSCode, open the notebook located at formation-mlops/notebooks/mlflow-introduction.ipynb
  2. Execute the notebook cell by cell.
  3. If you are finished early, explore the MLflow UI and try to build your own experiments from the example code provided in the notebook. For example, try to add other hyperparameters in the grid search process.
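If you want a starting point for that last exercise, the sketch below shows one way to log a scikit-learn grid search to MLflow (the hyperparameter grid is illustrative; the notebook's own code remains the reference):

import mlflow
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}
grid = GridSearchCV(RandomForestClassifier(), param_grid, cv=3)

with mlflow.start_run():
    grid.fit(X, y)
    # Log the best hyperparameters, the associated score and the best model
    mlflow.log_params(grid.best_params_)
    mlflow.log_metric("best_cv_accuracy", grid.best_score_)
    mlflow.sklearn.log_model(grid.best_estimator_, "model")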

Summary

  • MLflow simplifies the tracking of model training
    • Keeps record of experiments and their outputs
    • Simple integration with main ML frameworks
  • Limitations
    • How to use custom frameworks (non-natively integrated)?
    • How to move from experimentation to production?

2️⃣ A Practical Example

Context

  • NACE

    • European standard classification of productive economic activities
    • Hierarchical structure with 4 levels and 615 codes
  • At Insee, previously handled by an outdated rule-based algorithm

  • A problem common to many National Statistical Institutes

FastText model

  • “Bag of n-grams” model: embeddings for words, but also for n-grams of words and characters

  • Very simple and fast model

OVA: One vs. All
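To fix ideas, training such a model with the fasttext library looks roughly like the sketch below (file path and hyperparameter values are illustrative; loss="ova" corresponds to the one-vs-all setting):

import fasttext

# train.txt contains one observation per line, in the fastText format:
# __label__<nace_code> <preprocessed activity description>
model = fasttext.train_supervised(
    input="train.txt",
    dim=100,          # dimension of the embeddings
    wordNgrams=2,     # use word bigrams in addition to unigrams
    minn=3,           # smallest character n-gram
    maxn=6,           # largest character n-gram
    loss="ova",       # one-vs-all loss
)

# Predict the most probable NACE code for a new description
print(model.predict("boulanger patissier", k=1))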

Data used

  • A simple use case with only 2 variables:
    • Textual description of the activity – text
    • True NACE code labeled by the rule-based engine – nace (732 modalities)
  • Standard preprocessing:
    • lowercasing
    • punctuation removal
    • number removal
    • stopwords removal
    • stemming

MLflow with a non-standard framework

  • Easy to use with a variety of machine learning frameworks (scikit-learn, Keras, PyTorch…)
mlflow.sklearn.log_model(pipe_rf, "model")

model = mlflow.pyfunc.load_model(model_uri=f"models:/{model_name}/{version}")
y_train_pred = model.predict(X_train)
  • What if we require greater flexibility, e.g. to use a custom framework?
  • Possibility to track, register and deliver your own model

MLflow with a non-standard framework

  • There are 2 main differences when using your own framework:
    • logging of parameters, metrics and artifacts
    • wrapping of your custom model so that MLflow can serve it
# Define a custom model
class MyModel(mlflow.pyfunc.PythonModel):

    def load_context(self, context):
        # Load the underlying model from the logged artifacts
        # (e.g. with fasttext.load_model() for a fastText model)
        self.my_model = my_framework.load_model(context.artifacts["my_model"])

    def predict(self, context, model_input):
        # Delegate predictions to the wrapped model
        return self.my_model.predict(model_input)
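The corresponding logging step might then look like the sketch below (artifact names and hyperparameter values are illustrative; the actual code is in src/train.py):

import mlflow

with mlflow.start_run():
    # Log training parameters and evaluation metrics "by hand"
    mlflow.log_params({"dim": 100, "epoch": 25})
    mlflow.log_metric("accuracy", 0.83)

    # Log the wrapped model: MLflow stores the serialized model file
    # together with the Python wrapper that knows how to reload it
    mlflow.pyfunc.log_model(
        artifact_path="model",
        python_model=MyModel(),
        artifacts={"my_model": "model.bin"},
    )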

From experiment towards production

  • Notebooks are not suitable for building production-grade ML systems:
    • Limited potential for automation of ML pipelines.
    • Lack of clear and reproducible workflows.
    • Hinders collaboration and versioning among team members.
    • Insufficient modularity for managing complex ML components.

Application 2

Part 1: Using a custom model

  1. All scripts related to our custom model are stored in the src folder. Check them out. In particular, the train.py script is responsible for training the model. What are the main differences compared to application 1?
  2. Why can we say that the MLflow model integrates preprocessing?

Application 2

Part 2 : From notebooks to a package-like project

  1. The train.py script is also responsible for logging experiments in MLflow. Note how the parameters of each experiment are passed to the training function when the script is called.

  2. To make the model training procedure more reproducible, MLflow provides the mlflow run command. The MLproject file specifies the command and parameters that will be passed to it. Inspect this file.

  3. Run a model training using MLflow. To do this, open a terminal (Terminal -> New Terminal) and execute the following command:

    export MLFLOW_EXPERIMENT_NAME="nace-prediction"
    mlflow run ~/work/formation-mlops/ --env-manager=local \
        -P remote_server_uri=$MLFLOW_TRACKING_URI \
        -P experiment_name=$MLFLOW_EXPERIMENT_NAME
  4. In the MLflow interface, examine the results of your previous run:

    • Experiments -> nace-prediction -> <run_name>
  5. You trained the model with certain default parameters. In the MLproject file, check the available parameters. Retrain a model with different parameters (e.g., dim = 25).

Click to see the command
mlflow run ~/work/formation-mlops/ --env-manager=local \
    -P remote_server_uri=$MLFLOW_TRACKING_URI \
    -P experiment_name=$MLFLOW_EXPERIMENT_NAME \
    -P dim=25
  6. In MLflow, compare the 2 models by plotting the accuracy against one parameter you have changed (i.e. dim)
    • Select the 2 runs -> Compare -> Scatter Plot -> Select your X and Y axis
  7. Save the model with the best accuracy as fasttext to make it easily queryable from Python.

Application 2

Part 3: Querying the locally trained model

  1. Create a script predict_mlflow.py in the src folder of the project. This script should:
    1. Load version 1 of the fasttext model
    2. Use the model to predict NACE codes for a given list of activity descriptions (e.g., ["vendeur d'huitres", "boulanger"]).

💡 Don’t forget to read the documentation of the predict() function from the custom class (src/fasttext_wrapper.py) to understand the expected input format!

Click to see the script content
predict_mlflow.py
import mlflow

model_name = "fasttext"
version = 1

model = mlflow.pyfunc.load_model(
    model_uri=f"models:/{model_name}/{version}"
)

list_libs = ["vendeur d'huitres", "boulanger"]

results = model.predict(list_libs, params={"k": 1})
print(results)
  2. Run your predict_mlflow.py script.
Click to see the command
python formation-mlops/src/predict_mlflow.py
  3. Ensure that the following two descriptions give the same main prediction: "COIFFEUR" and "coiffeur, & 98789".
  4. Change the value of the k parameter and try to understand how the output structure has changed accordingly.

Summary

  • MLflow is versatile
    • Use of custom frameworks (with an “interface” class)
    • Industrialization of training (MLproject file)
    • Simple querying of trained and stored models
  • Limitation: the trained model is not accessible to end users
    • Simplified querying… but in a format not suitable for all users
    • The model is not deployed

3️⃣ Serving a ML model to users

Essential questions

  • Once a ML model has been developed, it must be deployed to serve its end users
    • Which production infrastructure?
    • Who are the end users?
    • Batch serving vs. online serving

Envisioned configuration

  • The model might serve various applications
    • Make the model accessible via an API
  • Online serving
    • Client applications send a request to the API and get a fast response
  • Production infrastructure : Kubernetes cluster

Exposing a model through an API

Why expose a model via a REST API?

  • Simplicity: single entry point that hides the underlying complexity of the model

  • Standardization: HTTP requests -> agnostic to the programming language used

  • Scalability: adapts to the load of concurrent requests

  • Modularity: separation of model management and its availability
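As a rough illustration, a minimal FastAPI application wrapping an MLflow model could look like the sketch below (endpoint and parameter names are illustrative; the API actually used in the next application lives in the app folder):

import mlflow
from fastapi import FastAPI

app = FastAPI(title="NACE prediction API")

# Load the registered model once, when the API starts
model = mlflow.pyfunc.load_model(model_uri="models:/fasttext/1")


@app.get("/predict")
def predict(description: str, k: int = 1):
    # Return the top-k predicted NACE codes for a single description
    return model.predict([description], params={"k": k})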

Exposing a model through an API

Run the API in a container

  • Container: self-contained and isolated environment that encapsulates the model, its dependencies and the API code

  • Advantages:

    • Portability
    • Scalability to efficiently distribute the model
  • Technical prerequisites for deploying on Kubernetes

Deploying an API on Kubernetes

Application 3

Part 1: Exposing a ML model locally as an API

  1. We built a very simple REST API using FastAPI. All underlying files are in the app folder. Check them out.
  2. Deploy the API locally by running the following commands in a terminal:
export MLFLOW_MODEL_NAME="fasttext"
export MLFLOW_MODEL_VERSION=1
uvicorn app.main:app --root-path /proxy/8000
  3. Open the API page using the button provided by VSCode.
  4. Display your API documentation by adding /docs to your URL.
  5. Test your API!

Application 3

Part 2: Manually deploying a machine learning model as an API

  1. Open the Dockerfile to see how the image is built. The image is automatically rebuilt and published via GitHub Actions; if interested, have a look at .github/workflows/build_image.yml. For this training, we will all use this same image.
  2. Open the file kubernetes/deployment.yml and modify the highlighted lines accordingly:
deployment.yml
containers:
  - name: api
    image: inseefrlab/formation-mlops-api:main
    imagePullPolicy: Always
    env:
      - name: MLFLOW_TRACKING_URI
        value: https://user-<namespace>-<pod_id>.user.lab.sspcloud.fr
      - name: MLFLOW_MODEL_NAME
        value: fasttext
      - name: MLFLOW_MODEL_VERSION
        value: "1"
  3. Open the file kubernetes/ingress.yml and modify (in two places) the URL of the API endpoint to be of the form <your_firstname>-<your_lastname>-api.lab.sspcloud.fr
  4. Apply the three Kubernetes manifests contained in the kubernetes/ folder in a terminal to deploy the API
kubectl apply -f formation-mlops/kubernetes/
  5. Reach your API using the URL defined in your ingress.yml file
  6. Re-train a new model and deploy this new model in your API
Click to see the steps
  1. Train a model
  2. Register the model in MLflow
  3. Adjust your MLFLOW_MODEL_NAME or MLFLOW_MODEL_VERSION (if you didn’t modify the model name) environment variable in the deployment.yml file
  4. Apply the new Kubernetes manifests to update the API
kubectl apply -f formation-mlops/kubernetes/
  5. Refresh your API, and verify on the home page that it is now based on the new version of the model

Application 3

Part 3: Continuous deployment of a ML model as an API

⚠️ The previous applications must have been created with the Git option to be able to follow this one.

Previously, you deployed your model manually. Thanks to ArgoCD, it is possible to deploy a model continuously. This means that every modification of a file in the kubernetes/ folder will automatically trigger redeployment, synchronized with your GitHub repository. To convince yourself, follow the steps below:

  1. Delete the manual deployment of the previous application to prevent Kubernetes resources from overlapping:
kubectl delete -f formation-mlops/kubernetes/
  2. Launch an ArgoCD service by clicking on this URL. Open the service, enter the username (admin), and the service’s password.
  3. Commit the changes made and push them to your GitHub repository.
  4. Open the template argocd/template-argocd.yml and modify the highlighted lines:
template-argocd.yml
spec:
  project: default
  source:
    repoURL: https://github.com/<your-github-id>/formation-mlops.git
    targetRevision: HEAD
    path: kubernetes
  destination:
    server: https://kubernetes.default.svc
    namespace: <your-namespace>
  5. In ArgoCD, click on New App and then Edit as a YAML. Copy and paste the content of argocd/template-argocd.yml, and click on Create.
  6. Reach your API using the URL defined in your ingress.yml file
  7. Display the documentation of your API by adding /docs to your URL
  8. Try your API out!
  9. Re-train a new model and automatically deploy this new model in your API
Click to see the steps
  1. Train a model
  2. Register the model in MLflow
  3. Adjust your MLFLOW_MODEL_NAME or MLFLOW_MODEL_VERSION (if you didn’t modify the model name) environment variable in the deployment.yml file
  4. Commit these changes and push them to your GitHub repository.
  5. Wait for 5 minutes for ArgoCD to automatically synchronize the changes from your GitHub repository, or force synchronization. Refresh your API and check on the homepage that it is now based on the new version of the model.

Application 3

Part 4: Querying your deployed model

  1. Create a file predict_api.py. This script should:
    • Read the parquet file available at the following address:
    https://minio.lab.sspcloud.fr/projet-formation/diffusion/mlops/data/data_to_classify.parquet
    • Make queries to your API for each label present in the parquet file.
    • Display the prediction results.
Click to see the script content
predict_api.py
import pandas as pd
import requests


# Function to make a request to the API
def make_prediction(api_url: str, description: str):
    params = {"description": description, "nb_echoes_max": 2}
    response = requests.get(api_url, params=params)
    return response.json()


# Data URL
data_path = "https://minio.lab.sspcloud.fr/projet-formation/diffusion/mlops/data/data_to_classify.parquet"

# Load the Parquet file into a pandas DataFrame
df = pd.read_parquet(data_path)

# API URL
api_url = "https://<your_firstname>-<your_lastname>-api.lab.sspcloud.fr/predict"

# Make the requests
responses = df["text"].apply(lambda x: make_prediction(api_url, x))

# Display the DataFrame with prediction results
print(pd.merge(df, pd.json_normalize(responses),
               left_index=True,
               right_index=True))
  2. Run your predict_api.py script.
Click to see the command
python formation-mlops/src/predict_api.py
  3. In ArgoCD, open your application and click on your pod that should start with "codification-api-...". Observe the logs.

  4. What information do you have? Is it sufficient?

Important

We performed a series of GET requests here as we have a single entry point to our API. To perform batch queries, it is preferable to use POST requests.
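For instance, with a hypothetical /predict-batch POST endpoint (not part of the API deployed here), a batch query could look like this sketch:

import requests

# Hypothetical batch endpoint: send all descriptions in a single POST request
api_url = "https://<your_firstname>-<your_lastname>-api.lab.sspcloud.fr/predict-batch"
payload = {"descriptions": ["vendeur d'huitres", "boulanger"], "nb_echoes_max": 2}

response = requests.post(api_url, json=payload)
print(response.json())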

Summary

4️⃣ Machine learning in production

Lifecycle of a ML model in production

The challenge of responsibility

  • The lifecycle of a ML model is complex
  • Several stakeholders involved:
    • Data scientists
    • IT/DevOps
    • Business teams
  • Different expertise and vocabulary between these stakeholders

➡️ Communication between teams is essential to monitor the model in production

Why monitor a model in production?

  • Detect biased data: misalignment between production and training data
  • Anticipate model instability: ensure stable model performance over time
  • Continuously improve the model: regular retraining

⚠️ The term monitoring of an application/model has different definitions depending on the team.

Monitoring according to the IT specialist

  • Monitoring an application is part of the DevOps approach
  • Technical control of the model:
    • Latency
    • Memory
    • Disk usage

Monitoring according to the data scientist

  • Monitoring a ML model is part of the MLOps approach
  • Methodological control of the model
  • Real-time performance monitoring of the model is often impossible, so proxies are used:
    • Data drift: the input data distribution changes over time
    • Concept drift: the modeled relationship changes over time
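As an example of such a proxy, a very simple data drift check compares the distribution of a variable (here, illustratively, the model's confidence scores) between a reference sample and recent production data, e.g. with a two-sample Kolmogorov-Smirnov test (a sketch with simulated data; the threshold is arbitrary):

import numpy as np
from scipy.stats import ks_2samp

# Simulated confidence scores: reference (training time) vs. production
reference_scores = np.random.beta(8, 2, size=5_000)
production_scores = np.random.beta(5, 3, size=1_000)

statistic, p_value = ks_2samp(reference_scores, production_scores)
if p_value < 0.01:
    print(f"Possible data drift detected (KS statistic = {statistic:.3f})")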

How to monitor a model in production?

  • Integration of logs in the API
  • Collection and formatting of logs
  • Monitoring of ML metrics
  • Implementation of an alert system

Application 4

Part 1: Logging business metrics

  1. Using the logging package, add logs to your API. For each request, display the label to be coded as well as the responses returned by your API. To do this, modify the app/main.py file.
Click to see the steps to complete
  1. Import the logging package:
main.py
import logging
  2. Set up your logging configuration before defining your first entry point:
main.py
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    handlers=[
        logging.FileHandler("log_file.log"),
        logging.StreamHandler(),
    ],
)
  3. Add the label and the API response to your logs:
main.py
# Logging
logging.info(f"{{'Query': {description}, 'Response': {predictions[0]}}}")
  2. Commit your changes and push them to your remote repository.

  3. Whenever you make a change to your API, it needs to be redeployed for the changes to take effect. In theory, it would be necessary to rebuild a new image for our API containing the latest adjustments. To simplify, we have already built two images, with and without logs in the API. Until now you have used the image without logs; redeploy your API using the image with logs, tagged as logs.

Click to see the steps to complete
  1. In the kubernetes/deployment.yml file, replace the no-logs tag with the logs tag:
deployment.yml
template:
  metadata:
    labels:
      app: codification-api
  spec:
    containers:
      - name: api
        image: inseefrlab/formation-mlops:logs
        imagePullPolicy: Always
  2. Commit your changes and push them to your remote repository.

  3. Wait 5 minutes for ArgoCD to automatically synchronize the changes from your GitHub repository or force synchronization.

  4. Run your predict_api.py script.
Click to see the command
python formation-mlops/src/predict_api.py
  5. In ArgoCD, open your application and click on your pod that should start with "codification-api-...". Observe the logs.

Model observability through a dashboard

  • API logs now contain business information
  • For processing/storage of logs: ETL pipeline
  • To analyze the behavior of the coding engine: creation of a dashboard
  • Multiple solutions for the dashboard: Grafana, Quarto Dashboards, Apache Superset, …

An example stack

  • ETL in the form of a cron job that parses logs and stores them in .parquet format
  • Using DuckDB to query the .parquet files
  • … and create the components of a Quarto Dashboard
  • The dashboard is a static site to be updated daily, for example
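A sketch of the parsing step of such an ETL is shown below (the log format matches the logging configuration added in Application 4, and the file paths are illustrative):

import re

import pandas as pd

# One log line looks like:
# 2024-01-01 12:00:00,123 - INFO - {'Query': boulanger, 'Response': ...}
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>[\d-]+ [\d:,]+) - INFO - "
    r"\{'Query': (?P<text>.*), 'Response': (?P<response>.*)\}$"
)

rows = []
with open("log_file.log", encoding="utf-8") as f:
    for line in f:
        match = LOG_PATTERN.match(line.strip())
        if match:
            rows.append(match.groupdict())

# Store the parsed logs as a .parquet file, ready to be queried with DuckDB
pd.DataFrame(rows).to_parquet("logs.parquet", index=False)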


Application 4

Part 2: Creating a monitoring dashboard

  1. We will use Quarto Dashboards. Open the dashboard/index.qmd file and inspect the code. To retrieve the data needed to create the dashboard, we use a serverless DBMS: DuckDB. DuckDB allows us to run SQL queries on a .parquet file containing parsed logs. This file contains one row per prediction, with the variables timestamp, text, prediction_1, proba_1, prediction_2, and proba_2.

  2. To visualize the dashboard, enter the following commands in a Terminal from the project root and click on the generated link.

    cd dashboard
    quarto preview index.qmd
  3. Currently, the percentage of predictions with a probability greater than 0.8 does not correspond to reality. Modify the SQL query that computes the pct_predictions variable so that it displays the correct value.

Click to see the answer
pct_predictions = duckdb.sql(
    """
    SELECT 100 * COUNT(CASE WHEN proba_1 > 0.8 THEN 1 END) / COUNT(*)
    FROM data;
    """
).fetchall()[0][0]
  4. The two charts at the bottom of the dashboard are also incorrect. Modify the SQL query that computes the daily_stats variable so that it displays the correct charts.
Click to see the answer
daily_stats = duckdb.sql(
    """
    SELECT
        CAST(timestamp AS DATE) AS date,
        COUNT(*) AS n_liasses,
        (
            COUNT(
                CASE WHEN data.proba_1 > 0.8 THEN 1 END
            ) * 100.0 / COUNT(*)
        ) AS pct_high_proba
    FROM data
    GROUP BY CAST(timestamp AS DATE);
    """
).to_df()
  5. Notice the changes made to the dashboard.

Summary

5️⃣ Distributing the hyperparameter optimization

Parallel training

  • With our setup, we can train models one by one and log all relevant information to the MLflow tracking server
  • What if we want to train multiple models at once, for example to optimize hyperparameters?

Workflow automation

  • General principles:
    • Define workflows where each step in the workflow is a container (reproducibility)
    • Model multi-step workflows as a sequence of tasks or as a directed acyclic graph
    • This makes it easy to run compute-intensive machine learning or data processing jobs in parallel

Argo workflows

  • A popular workflow engine for orchestrating parallel jobs on Kubernetes
    • open-source
    • container-native
    • available on the SSP Cloud

Hello World

apiVersion: argoproj.io/v1alpha1
kind: Workflow                  # new type of k8s spec
metadata:
  generateName: hello-world-    # name of the workflow spec
spec:
  entrypoint: whalesay          # invoke the whalesay template
  templates:
    - name: whalesay            # name of the template
      container:
        image: docker/whalesay
        command: [ cowsay ]
        args: [ "hello world" ]

What is going on?

Parameters

  • Templates can take input parameters
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-parameters-
spec:
  entrypoint: whalesay
  arguments:
    parameters:
    - name: message
      value: hello world

  templates:
  - name: whalesay
    inputs:
      parameters:
      - name: message       # parameter declaration
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["{{inputs.parameters.message}}"]

Multi-step workflows

  • Multi-step workflows can be specified (steps or dag)
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: steps-
spec:
  entrypoint: hello-hello-hello

  # This spec contains two templates: hello-hello-hello and whalesay
  templates:
  - name: hello-hello-hello
    # Instead of just running a container
    # This template has a sequence of steps
    steps:
    - - name: hello1            # hello1 is run before the following steps
        template: whalesay
    - - name: hello2a           # double dash => run after previous step
        template: whalesay
      - name: hello2b           # single dash => run in parallel with previous step
        template: whalesay
  - name: whalesay              # name of the template
    container:
      image: docker/whalesay
      command: [ cowsay ]
      args: [ "hello world" ]

What is going on?

Further applications

  • Workflow to test registered models, or models pushed to staging / production
  • Workflows can be triggered automatically (via Argo Events for example)
  • Continuous training workflows
  • Distributed machine learning pipelines in general (data downloading, processing, etc.)


Notes

  • Python SDK for Argo Workflows
  • Kubeflow pipelines
  • Couler: unified interface for constructing and managing workflows on different workflow engines
  • Other Python-native orchestration tools: Apache Airflow, Metaflow, Prefect

Application 5

Part 1: Introduction to Argo Workflows

  1. Launch an Argo Workflows service by clicking this URL. Open the service and input the service password (either automatically copied or available in the README of the service)
  2. In VSCode, create a file hello_world.yaml at the root of the project with the following content:
hello_world.yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-
  labels:
    workflows.argoproj.io/archive-strategy: "false"
  annotations:
    workflows.argoproj.io/description: |
      This is a simple hello world example.
      You can also run it in Python: https://couler-proj.github.io/couler/examples/#hello-world
spec:
  entrypoint: whalesay
  templates:
  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [cowsay]
      args: ["hello world"]
  3. Submit the Hello world workflow via a terminal in VSCode:
argo submit formation-mlops/hello_world.yaml
  4. Open the UI of Argo Workflows. Find the logs of the workflow you just launched. You should see the Docker logo.

Application 5

Part 2: Distributing the hyperparameter optimization

  1. Take a look at the argo_workflows/workflow.yml file. What do you expect will happen when we submit this workflow?
  2. Modify the highlighted line in the same manner as in application 3.
workflow.yml
parameters:
    # The MLflow tracking server is responsible for logging the hyperparameters and model metrics.
    - name: mlflow-tracking-uri
      value: https://user-<namespace>-<pod_id>.user.lab.sspcloud.fr
    - name: mlflow-experiment-name
      value: nace-prediction
  3. Submit the workflow and look at the jobs completing live in the UI.
Click to see the command
argo submit formation-mlops/argo_workflows/workflow.yml
  4. Once all jobs are completed, visualize the logs of the whole workflow.
  5. Finally, open the MLflow UI to check what has been done.

Conclusion

The opportunity for more continuous organizations

Required transformations

  • Transformations at different levels
    • Technical tools
    • Methodological
    • Organizational
  • Strategy: incremental change
    • Training
    • Application to pilot projects