Automatic coding of companies’ main activity

Develop a machine learning algorithm to automate the classification of companies’ main activities and put it into production
Python
automatic coding
fasttext
package
in production
MLFlow
Author

Nicolas

Published

1 January 2022

Project summary

Project details: The main activity code (APE) of companies in the Sirene register was previously assigned from free-text activity descriptions using 6 deterministic coding environments that relied on a very large number of decision rules. The aim of the experiment is to test how well statistical learning models predict the APE category, as part of the overhaul of the Sirene register and the introduction of a one-stop shop.
Stakeholders: Insee
Project results: The model developed performs on a par with the previous coding systems while automating the coding, and it also offers decision support. The model has been put into production, applying MLOps principles where possible.
Presentations and written materials relating to the project can be accessed at this site.
Project code: The code repositories are accessible here. They include:
- Code for annotating data using Label Studio;
- Code for a coding web API deployed on the SSP Cloud;
- Code implementing a visualisation dashboard for monitoring the activity of a coding model that runs in production and is accessible via a web API;
- Code for training APE classification models.
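
To give an idea of what training a text classifier on activity descriptions involves, here is a minimal sketch of how labelled descriptions could be prepared in the input format that fastText's supervised mode expects (one example per line, with the label prefixed by `__label__`). This is not the project's actual preprocessing: the function names `normalise` and `to_fasttext_line`, the cleaning steps, and the two sample descriptions and APE codes are assumptions chosen for illustration.

```python
import re
import unicodedata

# fastText's default prefix for labels in supervised training files.
LABEL_PREFIX = "__label__"


def normalise(text: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace.

    Hypothetical cleaning steps; the project's real preprocessing may differ.
    """
    text = unicodedata.normalize("NFKD", text.lower())
    # Drop the combining marks left over after decomposition (accents).
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Replace anything that is not a letter, digit or space with a space.
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def to_fasttext_line(ape_code: str, description: str) -> str:
    """Build one training line: '__label__<code> <cleaned description>'."""
    return f"{LABEL_PREFIX}{ape_code} {normalise(description)}"


# Illustrative (made-up) examples of free-text descriptions with an APE code.
examples = [
    ("1071C", "Boulangerie-pâtisserie artisanale"),
    ("6201Z", "Développement de logiciels, conseil"),
]

lines = [to_fasttext_line(code, text) for code, text in examples]
for line in lines:
    print(line)

# The lines would then be written to a text file and passed to
# fasttext.train_supervised(input=<path to that file>) to fit a model.
```

Keeping the cleaning step in a single function makes it easier to apply exactly the same transformation at training time and when the model is served through an API, which matters for a model running in production.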

Project documents