Project summary
| Comparison of matching methods and the contribution of machine learning | |
|---|---|
| Project details | The RéSIL programme aims to build a sustainable and scalable system of directories of individuals, households and residential premises, updated from a variety of administrative sources. Building it requires combining several data sources that share no common direct identifier. The experiment tests and compares different matching methods in order to draw up recommendations for the work needed to build the directories. These recommendations will be based on performance criteria (matching quality) as well as operational considerations (ease of deployment, computation time, etc.). In particular, the aim is to assess the contribution and constraints of probabilistic methods and machine learning in matching tasks. The work will also examine prior data normalisation and the evaluation of matching results. |
| Participants | Insee |
| Project results | Results presented at the Statistical Methodology Days 2022: "Methodology for matching individual data"; "Probabilistic or deterministic, matching methods put to the test by the RéSIL programme"; "Impact of data cleaning on the quality of a match". Working document: "Matching: aims, practices and quality issues", July 2024. In production: ?? |
| Project code | |