Comparison of matching methods and the contribution of machine learning

This project tests and compares different matching methods in order to draw up recommendations for the work needed to build directories, particularly as part of the RESIL multiannual programme.
matching
administrative data
Resil
Author

Nicolas

Published

1 January 2021

Project summary

Project details

The Resil programme aims to build a sustainable and scalable system of directories of individuals, households and residential premises, updated from a variety of administrative sources. This requires combining several data sources that share no common direct identifier.
The aim of the experiment is to test and compare different matching methods in order to draw up recommendations for the work needed to build the directories. These recommendations will be based on performance criteria (matching quality) as well as operational considerations (ease of deployment, computation time, etc.). In particular, the aim is to assess the contribution and constraints of probabilistic methods and machine learning in matching tasks. This work will be accompanied by a review of how data are normalised before matching and how matching results are evaluated.
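To make the comparison concrete, here is a minimal Python sketch of the kind of experiment described: normalising fields, then linking two toy sources with a deterministic rule and with a probabilistic score in the style of Fellegi and Sunter, and measuring matching quality as precision and recall. It is not the project's code. The records, the field names, the m- and u-probabilities and the decision threshold are all hypothetical values chosen for illustration.

```python
import math
import unicodedata

FIELDS = ("first_name", "last_name", "birth_year")

# Hypothetical toy records; the true links are a1-b1, a2-b2 and a3-b3.
SOURCE_A = {
    "a1": {"first_name": "Élodie", "last_name": "Martin", "birth_year": "1984"},
    "a2": {"first_name": "Jean", "last_name": "Dupont", "birth_year": "1970"},
    "a3": {"first_name": "Marie", "last_name": "Le Gall", "birth_year": "1992"},
}
SOURCE_B = {
    "b1": {"first_name": "ELODIE", "last_name": "MARTIN", "birth_year": "1984"},
    "b2": {"first_name": "Jean", "last_name": "Dupond", "birth_year": "1970"},
    "b3": {"first_name": "Marie", "last_name": "LEGALL", "birth_year": "1992"},
}
TRUTH = {("a1", "b1"), ("a2", "b2"), ("a3", "b3")}

# Assumed probabilities that a field agrees for a true match (m) and for a
# non-match (u). In practice they would be estimated, not fixed by hand.
M_PROB = {"first_name": 0.90, "last_name": 0.95, "birth_year": 0.98}
U_PROB = {"first_name": 0.05, "last_name": 0.01, "birth_year": 0.02}
THRESHOLD = 4.0  # assumed decision threshold on the total match weight


def normalise(value: str) -> str:
    """Strip accents, uppercase, and keep only letters and digits."""
    decomposed = unicodedata.normalize("NFKD", value)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in no_accents.upper() if c.isalnum())


def deterministic_match(a: dict, b: dict) -> bool:
    """Link two records only if every normalised field agrees exactly."""
    return all(normalise(a[f]) == normalise(b[f]) for f in FIELDS)


def probabilistic_score(a: dict, b: dict) -> float:
    """Sum of log2 agreement or disagreement weights over all fields."""
    score = 0.0
    for f in FIELDS:
        if normalise(a[f]) == normalise(b[f]):
            score += math.log2(M_PROB[f] / U_PROB[f])
        else:
            score += math.log2((1 - M_PROB[f]) / (1 - U_PROB[f]))
    return score


def link(source_a: dict, source_b: dict, is_match) -> set:
    """Compare every pair of records and keep those accepted by is_match."""
    return {
        (id_a, id_b)
        for id_a, rec_a in source_a.items()
        for id_b, rec_b in source_b.items()
        if is_match(rec_a, rec_b)
    }


def precision_recall(predicted: set, truth: set) -> tuple:
    """Matching quality: share of correct links, share of true links found."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


det_pairs = link(SOURCE_A, SOURCE_B, deterministic_match)
prob_pairs = link(
    SOURCE_A, SOURCE_B, lambda a, b: probabilistic_score(a, b) >= THRESHOLD
)

print("deterministic:", sorted(det_pairs), precision_recall(det_pairs, TRUTH))
print("probabilistic:", sorted(prob_pairs), precision_recall(prob_pairs, TRUTH))
```

On this toy data, normalisation lets both methods recover the accented and differently spaced names, while only the probabilistic score tolerates the Dupont/Dupond spelling difference. Comparing every pair of records is workable only at this scale. On real administrative sources, a blocking step would normally restrict the candidate pairs, which bears directly on the computation-time criterion mentioned above.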
Participants

Insee
Project results

Results presented at the Statistical Methodology Days 2022:
- Methodology for matching individual data
- Probabilistic or deterministic, matching methods put to the test by the RéSIL programme
- Impact of data cleaning on the quality of a match

Working document written:
- Matching: aims, practices and quality issues, July 2024

In production:
- ??
Project code