Detecting and processing outliers or missing values, application to the Déclaration Sociale Nominative (Social Nominative Declarations)

Use of machine learning methods to detect and process outliers or missing values, application to the Social Nominative Declarations (Déclaration Sociale Nominative)
administrative data
Insee
machine learning
in production
data editing
Published

1 January 2018

Project summary

Use of machine learning methods to detect and process outliers or missing values, application to the Social Nominative Declarations (Déclaration Sociale Nominative)
Project details

As part of the modernisation of INSEE’s internal processes following the entry into production of the Déclaration Sociale Nominative (DSN), this project aims to rethink anomaly detection and salary adjustment. Handling the new monthly DSN data prompted tests of machine learning methods that automatically detect anomalies in the triplet of variables (gross salary, net salary and number of hours).
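
To make the idea of unsupervised detection on the (gross salary, net salary, number of hours) triplet concrete, here is a minimal sketch. It is not one of the three methods tested in the project, which this page does not name. It uses a k-nearest-neighbour distance score as a stand-in. The derived features (log hourly gross wage, log net/gross ratio), the threshold rule (a multiple of the median score), the function names and the sample records are all assumptions made for illustration, and the code assumes strictly positive values.

```python
import math
from statistics import median

def log_features(record):
    """Map (gross, net, hours) to two log-scale features.

    Assumed features: log hourly gross wage and log net/gross ratio.
    All three inputs must be strictly positive.
    """
    gross, net, hours = record
    return (math.log(gross / hours), math.log(net / gross))

def knn_scores(points, k=2):
    """Distance from each point to its k-th nearest neighbour.

    A larger score means the point is more isolated from the rest.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

def flag_anomalies(records, k=2, factor=10.0):
    """Return indices whose score exceeds factor * median score (assumed rule)."""
    points = [log_features(r) for r in records]
    scores = knn_scores(points, k)
    threshold = factor * median(scores)
    return [i for i, s in enumerate(scores) if s > threshold]

# Made-up monthly records: (gross salary, net salary, number of hours).
records = [
    (2000.0, 1560.0, 151.67),
    (2100.0, 1640.0, 151.67),
    (1800.0, 1400.0, 140.0),
    (2500.0, 1950.0, 151.67),
    (2300.0, 1794.0, 151.67),
    (1600.0, 1250.0, 120.0),
    (5324.0, 4150.0, 1.0),  # implies 5,324 per hour: presumed error in hours
]

flagged = flag_anomalies(records)
print(flagged)
```

On this toy sample only the last record is flagged. A flag of this kind marks a presumed error only: a very high hourly wage can also be genuine, so flagged records still need review or a separate adjustment step.
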
Participants

Insee
Project results

The work compared the characteristics of the anomalies identified by three machine learning methods. The three algorithms largely detect different anomalies, because each defines and identifies presumed errors in its own way. Combining several error detection algorithms would therefore cover a wider spectrum of potential errors.
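
The comparison described above can be sketched with plain set operations: measure how much the sets of flagged records overlap pairwise, then take their union to see what a combined approach would cover. The detector names and flagged record ids below are invented for illustration and are not project results.

```python
from itertools import combinations

def jaccard(a, b):
    """Share of records flagged by both detectors among those flagged by either."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def compare_detectors(flags):
    """Return pairwise Jaccard overlaps and the union of all flagged ids.

    `flags` maps a detector name to the set of record ids it flagged.
    """
    overlaps = {
        (m1, m2): jaccard(flags[m1], flags[m2])
        for m1, m2 in combinations(sorted(flags), 2)
    }
    combined = set().union(*flags.values())
    return overlaps, combined

# Made-up record ids flagged by three hypothetical detectors.
flags = {
    "detector_a": {1, 2, 3, 4},
    "detector_b": {4, 5, 6},
    "detector_c": {7, 8, 1},
}

overlaps, combined = compare_detectors(flags)
for pair, score in overlaps.items():
    print(pair, round(score, 3))
print("flagged by at least one detector:", len(combined))
```

Low pairwise overlap together with a union much larger than any single set is the pattern that would support using the detectors in combination.
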
Project products and documentation

- "5,324 per hour: outlier or footballer? Unsupervised learning methods for anomaly detection: application to the case of the Nominative Social Declaration" (mixed French and English), Statistical Methodology Days 2018 (Journées de la méthodologie statistique 2018)
- A more detailed abstract is available here (French and English)