Comparison of matching methods and the contribution of machine learning

This project tests and compares different matching methods in order to draw up recommendations for building directories, particularly as part of the RESIL multiannual programme.
matching
administrative data
in production
Author

Nicolas

Published

1 January 2021

Project summary

Project details

The RESIL programme aims to build a sustainable and scalable system of directories of individuals, households and residential premises, updated from a variety of administrative sources. This requires combining several data sources that share no common direct identifier.
The aim of the experiment is to test and compare different matching methods in order to draw up recommendations for building the directories. The recommendations will be based on performance criteria (matching quality) as well as operational considerations (ease of deployment, computation time, etc.). In particular, the experiment assesses the benefits and constraints of probabilistic and machine-learning methods in matching tasks. The work will also examine upstream data normalisation and the evaluation of matching results.

Stakeholders

Insee

Project products and documentation

- Methodology for matching individual data, 2022 Statistical Methodology Days (Journées de méthodologie statistique 2022)
- Probabilistic or deterministic, matching methods put to the test by the RéSIL programme, 2022 Statistical Methodology Days (Journées de méthodologie statistique 2022)
- Impact of data cleaning on the quality of a match, 2022 Statistical Methodology Days (Journées de méthodologie statistique 2022)
- Matching: aims, practices and quality issues (in French), Working document, Insee, July 2024
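
To make the comparison described in the project details more concrete, the sketch below contrasts a deterministic rule (exact agreement on every field) with one common probabilistic formulation (summed agreement weights in the style of Fellegi and Sunter), with and without prior normalisation, and scores each result by precision and recall. It is only an illustration, not the method used in the project: the records, the m/u probabilities, the threshold and every function name are hypothetical.

```python
import math
import unicodedata


def normalise(value: str) -> str:
    """Strip accents, uppercase, and keep only letters and digits."""
    decomposed = unicodedata.normalize("NFKD", value)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in no_accents.upper() if c.isalnum())


# Hypothetical toy records: (id, surname, first name, birth year).
SOURCE_A = [
    ("a1", "Lefèvre", "Anne-Marie", "1980"),
    ("a2", "Durand", "Paul", "1975"),
    ("a3", "Martin", "Chloé", "1990"),
]
SOURCE_B = [
    ("b1", "LEFEVRE", "Anne Marie", "1980"),
    ("b2", "Durand", "Paul", "1957"),  # same person, digits swapped
    ("b3", "Bernard", "Chloe", "1991"),  # a different person
]
TRUE_PAIRS = {("a1", "b1"), ("a2", "b2")}  # assumed ground truth

# Assumed (m, u) per field: P(agree | match), P(agree | non-match).
M_U = [(0.95, 0.01), (0.90, 0.05), (0.95, 0.02)]
THRESHOLD = 5.0  # assumed decision threshold on the total weight


def deterministic_match(rec_a, rec_b, clean=True):
    """Match only if every field agrees exactly."""
    fields_a, fields_b = rec_a[1:], rec_b[1:]
    if clean:
        fields_a = tuple(normalise(f) for f in fields_a)
        fields_b = tuple(normalise(f) for f in fields_b)
    return fields_a == fields_b


def match_weight(rec_a, rec_b):
    """Sum of log2 agreement/disagreement weights over normalised fields."""
    weight = 0.0
    for value_a, value_b, (m, u) in zip(rec_a[1:], rec_b[1:], M_U):
        if normalise(value_a) == normalise(value_b):
            weight += math.log2(m / u)
        else:
            weight += math.log2((1 - m) / (1 - u))
    return weight


def link(decision):
    """Apply a pairwise decision rule to every candidate pair."""
    return {(a[0], b[0]) for a in SOURCE_A for b in SOURCE_B if decision(a, b)}


def evaluate(predicted):
    """Return (precision, recall) against the assumed ground truth."""
    true_positives = len(predicted & TRUE_PAIRS)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(TRUE_PAIRS)
    return precision, recall


deterministic_raw = link(lambda a, b: deterministic_match(a, b, clean=False))
deterministic = link(deterministic_match)
probabilistic = link(lambda a, b: match_weight(a, b) >= THRESHOLD)

for name, pairs in [
    ("deterministic, raw", deterministic_raw),
    ("deterministic, normalised", deterministic),
    ("probabilistic, normalised", probabilistic),
]:
    print(name, sorted(pairs), evaluate(pairs))
```

On this toy data, the exact rule finds nothing on raw values, recovers one true pair once accents, case and punctuation are normalised, and still misses the pair with the mistyped birth year. The weighted score tolerates that single disagreement. Real sources would also need blocking to limit candidate pairs and m/u values estimated from the data rather than fixed by hand.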