Romain Avouac (Insee), Thomas Faria (Insee), Tom Seimandi (Insee)
Difficulty of transitioning from experiments to production-grade machine learning systems
Leverage best practices from software engineering


Reproducibility
Versioning
Automation
Monitoring
Collaboration
Multiple frameworks implement the MLOps principles
Pros of MLflow
1️⃣ Introduction to MLflow
2️⃣ A Practical Example: NACE Code Prediction for French companies
3️⃣ Deploying a ML model as an API
4️⃣ Distributing the hyperparameter optimization
5️⃣ Maintenance of a model in production
Preparation of the working environment
It is assumed that you have a GitHub account and have already created a token. Fork the training repository by clicking here.
Create an account on the SSP Cloud using your professional mail address
Launch a MLflow service by clicking this URL
Launch a Jupyter-python service by clicking this URL
Open the Jupyter-python service and input the service password
In Jupyter, open a terminal and clone your forked repository (modify the first two lines):
Install the necessary packages for the training:
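These two steps typically amount to something like the following (the username placeholder and the requirements file name are assumptions, not the actual stripped commands):

```shell
git clone https://github.com/<your_github_username>/formation-mlops.git
cd formation-mlops
pip install -r requirements.txt
```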
You’re all set!




Introduction to MLflow concepts
In JupyterLab, open the notebook located at formation-mlops/notebooks/mlflow-introduction.ipynb and run it. Then, explore the MLflow UI and try to build your own experiments from the example code provided in the notebook.

NACE: the statistical classification of economic activities in the European Community
At Insee, previously handled by an outdated rule-based algorithm
A problem common to many National Statistical Institutes
“Bag of n-grams” model: embeddings for words, but also for n-grams of words and characters
Very simple and fast model
OVA: One vs. All
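The “bag of n-grams” idea can be sketched in a few lines: fastText derives features not only from whole words but also from word n-grams and character n-grams (with boundary markers). A minimal illustration — the function names are ours, not the training code:

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word, with < and > boundary markers as in fastText."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1) for i in range(len(w) - n + 1)]

def word_ngrams(tokens, n=2):
    """Word n-grams (here: bigrams) of a tokenized activity description."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "vendeur de fruits".split()
print(word_ngrams(tokens))         # ['vendeur de', 'de fruits']
print(char_ngrams("vendeur")[:3])  # ['<ve', 'ven', 'end']
```

Character n-grams are what make the model robust to typos and rare words: a misspelled description still shares most of its character n-grams with the correct spelling.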
Part 1 : From notebooks to a package-like project
Launch a VSCode service by clicking this URL. Open the service and input the service password. In VSCode, open a terminal (Terminal -> New Terminal) and redo steps 6 and 7 of application 0 (clone and package installation).
All scripts related to our custom model are stored in the src folder. Check them out. Have a look at the MLproject file as well.
Run a training of the model using MLflow. To do so, open a terminal (Terminal -> New Terminal) and run the following command:
In the UI of MLflow, look at the results of your previous run:
Experiments -> nace-prediction -> <run_name>

You have trained the model with some default parameters. In MLproject, check the available parameters. Re-train a model with different parameters (e.g. dim = 25).
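The retraining command presumably takes a form like the one below, assuming the MLproject file exposes dim as a parameter (the project path and the --env-manager flag are assumptions):

```shell
mlflow run <path-to-project> --env-manager=local -P dim=25
```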
In MLflow, compare the 2 models by plotting the accuracy against the parameter you have changed (i.e. dim):
Select the 2 runs -> Compare -> Scatter Plot -> Select your X and Y axis

Part 2 : Distributing and querying a custom model
Read the src/train.py file carefully. What are the main differences with application 1?
Why can we say that the MLflow model onboards the preprocessing?
In MLflow, register your last model as fasttext to make it easily queryable from the Python API.
Create a script predict_mlflow.py in the src folder of the project. This script should:
load the registered fasttext model
query the model with a list of activity descriptions (e.g. ["vendeur d'huitres", "boulanger"]).
💡 Don’t forget to read the documentation of the predict() function of the custom class (src/fasttext_wrapper.py) to understand the expected format for the inputs!
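A sketch of what predict_mlflow.py could look like, assuming the model was registered under the name fasttext with version 1; this only runs inside the training environment, against a reachable MLflow tracking server:

```python
import mlflow

# Load version 1 of the model registered as "fasttext" in the MLflow model registry
model = mlflow.pyfunc.load_model("models:/fasttext/1")

# Query it with a list of activity descriptions
# (see src/fasttext_wrapper.py for the exact expected input format)
predictions = model.predict(["vendeur d'huitres", "boulanger"])
print(predictions)
```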
Run your predict_mlflow.py script.
Make sure that the two following descriptions give the same main prediction: "COIFFEUR" and "coiffeur, & 98789".
Change the value of the parameter k and try to understand how the structure of the output changed as a result.

Production infrastructure : Kubernetes cluster
The model might serve various applications
Online serving
Container: self-contained and isolated environment that encapsulates the model, its dependencies and the API code
Containers provide high portability and scalability for distributing the model efficiently.
The Dockerfile is used to configure and build the Docker container.

Kubernetes manifests:
deployment.yaml : defines how the API should run (container image, resources, and environment variables)
service.yaml : establishes a stable internal network endpoint for the API.
ingress.yaml : provides an entry point for external clients to access the API.

Deploying manually a machine-learning model as an API
The scripts of the API are stored in the app folder. Check them.
Open the Dockerfile to see how the image is built. The image is automatically rebuilt and published via GitHub Actions; if interested, have a look at .github/workflows/build_image.yml.
Open the file kubernetes/deployment.yml and modify the highlighted lines accordingly:
deployment.yml
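The highlighted part of deployment.yml most likely concerns the container image and the model-serving environment variables; a hedged sketch, in which the names, image, and values are placeholders rather than the actual training file:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mlflow-model-api
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mlflow-model-api
  template:
    metadata:
      labels:
        app: mlflow-model-api
    spec:
      containers:
        - name: api
          image: <api-image>  # placeholder: the image built by the GitHub Actions workflow
          ports:
            - containerPort: 8000
          env:
            - name: MLFLOW_TRACKING_URI
              value: https://<your-mlflow-instance>
            - name: MLFLOW_MODEL_NAME
              value: fasttext
            - name: MLFLOW_MODEL_VERSION
              value: "1"
```

Pointing the API at a new model then only requires editing MLFLOW_MODEL_NAME or MLFLOW_MODEL_VERSION and re-applying the manifest.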
Open the file kubernetes/ingress.yml and modify (two times) the URL of the API endpoint to be of the form <your_firstname>-<your_lastname>-api.lab.sspcloud.fr
Apply the Kubernetes contracts contained in the kubernetes/ folder in a terminal to deploy the API: kubectl apply -f kubernetes/
Reach your API using the URL defined in your ingress.yml file
Display the documentation of your API by adding /docs to your URL
To deploy a new model version, adjust the MLFLOW_MODEL_NAME or MLFLOW_MODEL_VERSION (if you didn’t modify the model name) environment variable in the deployment.yml file
Re-apply the Kubernetes contracts to update the API

Continuous deployment of a machine-learning model as an API
⚠️ The previous applications must have been created with the Git option to be able to follow this one.
Previously, you deployed your model manually. Thanks to ArgoCD, it is possible to deploy a model continuously. This means that every modification of a file in the kubernetes/ folder will automatically trigger redeployment, synchronized with your GitHub repository. To convince yourself, follow the steps below:
Launch an ArgoCD service by clicking on this URL. Open the service, enter the username (admin), and the service’s password.
Open the file argocd/template-argocd.yml and modify the highlighted lines:
template-argocd.yml
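A standard ArgoCD Application manifest has roughly the shape below; the field values here are placeholders, not the actual content of template-argocd.yml:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: formation-mlops
spec:
  project: default
  source:
    repoURL: https://github.com/<your_github_username>/formation-mlops.git
    targetRevision: HEAD
    path: kubernetes  # ArgoCD watches the Kubernetes manifests in this folder
  destination:
    server: https://kubernetes.default.svc
    namespace: <your_namespace>
  syncPolicy:
    automated: {}  # redeploy automatically on every push
```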
In ArgoCD, click New App and then Edit as a YAML. Copy and paste the content of argocd/template-argocd.yml, and click on Create.
Reach your API using the URL defined in your ingress.yml file
Display the documentation of your API by adding /docs to your URL
To deploy a new model version, adjust the MLFLOW_MODEL_NAME or MLFLOW_MODEL_VERSION (if you didn’t modify the model name) environment variable in the deployment.yml file, then commit and push the change
Wait for ArgoCD to automatically synchronize the changes from your GitHub repository, or force synchronization. Refresh your API and check on the homepage that it is now based on the new version of the model.

Kubernetes

apiVersion: argoproj.io/v1alpha1
kind: Workflow # new type of k8s spec
metadata:
  generateName: hello-world- # name of the workflow spec
spec:
  entrypoint: whalesay # invoke the whalesay template
  templates:
    - name: whalesay # name of the template
      container:
        image: docker/whalesay
        command: [ cowsay ]
        args: [ "hello world" ]


apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-parameters-
spec:
  entrypoint: whalesay
  arguments:
    parameters:
      - name: message
        value: hello world
  templates:
    - name: whalesay
      inputs:
        parameters:
          - name: message # parameter declaration
      container:
        image: docker/whalesay
        command: [cowsay]
        args: ["{{inputs.parameters.message}}"]

A workflow can chain several templates (steps or dag):

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: steps-
spec:
  entrypoint: hello-hello-hello
  # This spec contains two templates: hello-hello-hello and whalesay
  templates:
    - name: hello-hello-hello
      # Instead of just running a container,
      # this template has a sequence of steps
      steps:
        - - name: hello1 # hello1 is run before the following steps
            template: whalesay
        - - name: hello2a # double dash => run after previous step
            template: whalesay
          - name: hello2b # single dash => run in parallel with previous step
            template: whalesay
    - name: whalesay # name of the template
      container:
        image: docker/whalesay
        command: [ cowsay ]
        args: [ "hello world" ]





Part 1 : introduction to Argo Workflows
Launch an Argo Workflows service by clicking this URL. Open the service and input the service password (either automatically copied or available in the README of the service).
In VSCode, create a file hello_world.yaml at the root of the project with the following content:
hello_world.yml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-
  labels:
    workflows.argoproj.io/archive-strategy: "false"
  annotations:
    workflows.argoproj.io/description: |
      This is a simple hello world example.
      You can also run it in Python: https://couler-proj.github.io/couler/examples/#hello-world
spec:
  entrypoint: whalesay
  templates:
    - name: whalesay
      container:
        image: docker/whalesay:latest
        command: [cowsay]
        args: ["hello world"]

Submit the Hello world workflow via a terminal in VSCode: argo submit hello_world.yaml
Open the UI of Argo Workflows. Find the logs of the workflow you just launched. You should see the Docker logo.

Part 2 : distributing the hyperparameters optimization
Open the argo_workflows/workflow.yml file. What do you expect will happen when we submit this workflow?
workflow.yml
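Distributing a hyperparameter search with Argo Workflows typically relies on withItems, which fans one template out over a list of values, running one job per value in parallel. A hedged sketch — the image, parameter names, and values are illustrative, not the actual workflow.yml:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: parallel-training-
spec:
  entrypoint: main
  templates:
    - name: main
      steps:
        - - name: train-model
            template: run-training
            arguments:
              parameters:
                - name: dim
                  value: "{{item}}"
            # One training job is launched per value of dim, in parallel
            withItems: [25, 50, 100]
    - name: run-training
      inputs:
        parameters:
          - name: dim
      container:
        image: python:3.10  # placeholder: the training image is project-specific
        command: [sh, -c]
        args: ["mlflow run <path-to-project> -P dim={{inputs.parameters.dim}}"]
```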
Submit the workflow and open the MLflow UI to check what has been done.

Logging
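Monitoring a model in production starts with logging each prediction request served by the API; a minimal sketch using Python's standard logging module (the function and values are illustrative, not the training API):

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("api")

def predict(description: str) -> str:
    """Illustrative prediction endpoint: logs every query and its answer."""
    prediction = "96.02A"  # placeholder for the actual model call
    logger.info("prediction requested: input=%r -> output=%s", description, prediction)
    return prediction

predict("coiffeur")
```

Collected over time, such logs allow drift in the distribution of inputs and predictions to be detected and the model to be retrained when needed.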
An introduction to MLOps with MLflow