Curiexplore, the platform for comparing national education and research policies
Interactive visualisation of the teaching environment and research environment in different countries.
1 January 2020
| Classification of checkout data using machine learning | |
|---|---|
| Project details | INSEE has used scanner data to calculate the consumer price index (CPI) since 2020. For each barcode, day, and point of sale, scanner data gives the quantities sold together with the turnover and/or the price at which the product was sold. Using this data, however, requires knowing which product lies behind each barcode. The CPI currently relies on a barcode repository, purchased from a service provider, which supplies very detailed and structured information on product characteristics. This repository must be paid for and does not cover all products. The experiment aims to identify the text-processing steps to apply to product labels, and the classification or other methods that would allow labels to be coded automatically, without going through the repository, both in the COICOP nomenclature for the CPI and in the groupings used for Emagsa. The latter is part of the Nosica project, which aims to integrate scanner data into the production of short-term activity indicators. The experiment also tests the performance of these methods on test data sets. |
| Stakeholders | INSEE |
| Project results | Scanner data is now used in production to calculate inflation and business activity indicators. |
| Project products and documentation | - Using Scanner Data to Calculate the Consumer Price Index, Courrier des statistiques No. 3, INSEE, December 2019 - Scanner data and quality adjustment, INSEE working paper No. F1704, August 2017 |
| Project code | - https://github.com/InseeFrLab/predicat : API for classifying checkout labels - https://github.com/InseeFrLab/product-labelling : Application for labelling scanner data |
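
To make the idea of coding labels automatically more concrete, here is a minimal sketch in Python. It is not the method implemented in the repositories listed above: it assumes a simple multinomial naive Bayes model over character trigrams, and the training labels, the names `normalize`, `char_ngrams` and `NaiveBayesLabelClassifier`, and the label-to-COICOP assignments are all hypothetical and chosen purely for illustration.

```python
import math
import re
import unicodedata
from collections import Counter, defaultdict


def normalize(label: str) -> str:
    """Lowercase, strip accents, and collapse non-alphanumeric runs to spaces."""
    text = unicodedata.normalize("NFKD", label.lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text).strip()


def char_ngrams(text: str, n: int = 3) -> list:
    """Character n-grams, which tolerate the abbreviations common in till labels."""
    padded = f" {text} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesLabelClassifier:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, labels, codes):
        self.class_counts = Counter(codes)
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()
        for label, code in zip(labels, codes):
            grams = char_ngrams(normalize(label))
            self.feature_counts[code].update(grams)
            self.vocabulary.update(grams)
        return self

    def predict(self, label):
        grams = char_ngrams(normalize(label))
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_code, best_score = None, -math.inf
        for code, count in self.class_counts.items():
            counts = self.feature_counts[code]
            denom = sum(counts.values()) + vocab_size
            # Log prior plus smoothed log likelihood of each n-gram.
            score = math.log(count / total)
            score += sum(math.log((counts[g] + 1) / denom) for g in grams)
            if score > best_score:
                best_code, best_score = code, score
        return best_code


# Hypothetical training set: made-up labels with illustrative COICOP classes.
TRAIN = [
    ("LAIT DEMI-ECREME 1L", "01.1.4"),
    ("YAOURT NATURE X4", "01.1.4"),
    ("FROMAGE EMMENTAL RAPE 200G", "01.1.4"),
    ("BAGUETTE TRADITION", "01.1.1"),
    ("PAIN DE MIE COMPLET", "01.1.1"),
    ("PATES SPAGHETTI 500G", "01.1.1"),
    ("CAFE MOULU 250G", "01.2.1"),
    ("THE VERT MENTHE X25", "01.2.1"),
]

model = NaiveBayesLabelClassifier().fit(*zip(*TRAIN))

for unseen in ["LAIT ENTIER 1L", "PAIN COMPLET TRANCHE", "CAFE GRAINS 1KG"]:
    print(unseen, "->", model.predict(unseen))
```

Character n-grams are used here rather than whole words because checkout labels tend to be short and heavily abbreviated. A production system would need far more training data and a proper evaluation on held-out test sets, as the project description indicates.
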
Online data collection (web scraping) is not used only to produce inflation figures. It is also used in other areas and by other entities within the public statistics service besides INSEE. Since 2020, INSEE has also been using scanner data in the calculation of the CPI, as described in the article Using Scanner Data to Calculate the Consumer Price Index (Courrier des statistiques No. 3, December 2019).