Classification of checkout data using machine learning

Using machine learning to classify cash register data in the COICOP nomenclature to calculate the CPI
Python
automatic coding
POS data
COICOP
CPI
Status
in production
Published

1 January 2020

Project summary

Project details INSEE has used cash register data to calculate the CPI since 2010 (see the working paper on the subject). For each barcode, each day and each point of sale, this data gives the quantities sold together with the turnover and/or the price at which the product was sold. To use the data, however, one needs to know which product lies behind each barcode. Currently, the CPI relies on a barcode repository, purchased from a service provider, which gives very detailed and structured information on product characteristics. This repository is paid for and does not cover all products. The aim of the experiment is to identify the text-processing steps to apply to the product labels, and the classification or other methods that would allow the labels to be coded automatically, without going through the repository, both in the COICOP nomenclature for the CPI and in the groupings used for Emagsa as part of the Nosica project, which aims to integrate cash register data into the production of short-term activity indicators. The experiment also evaluates the performance of these methods on test data sets.
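As an illustration of the label-processing and classification steps described above, here is a minimal Python sketch using only the standard library. It is not the method used by Insee: the normalisation rules, the character-trigram naive Bayes classifier, the example labels and the COICOP-style codes attached to them are all assumptions chosen for illustration.

```python
import math
import re
import unicodedata
from collections import Counter, defaultdict


def normalize(label: str) -> str:
    """Lowercase, strip accents, and replace digits and punctuation with spaces."""
    text = unicodedata.normalize("NFKD", label.lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z ]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def char_ngrams(text: str, n: int = 3) -> list:
    """Character n-grams, which tolerate the abbreviations common in till labels."""
    padded = f" {text} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesLabelClassifier:
    """Multinomial naive Bayes over character trigrams with add-one smoothing."""

    def fit(self, labels, codes):
        self.class_counts = Counter(codes)
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()
        for label, code in zip(labels, codes):
            grams = char_ngrams(normalize(label))
            self.feature_counts[code].update(grams)
            self.vocabulary.update(grams)
        return self

    def predict(self, label):
        grams = char_ngrams(normalize(label))
        total = sum(self.class_counts.values())
        best_code, best_score = None, -math.inf
        for code, count in self.class_counts.items():
            counts = self.feature_counts[code]
            denom = sum(counts.values()) + len(self.vocabulary)
            # Log prior plus smoothed log likelihood of each trigram.
            score = math.log(count / total)
            score += sum(math.log((counts[g] + 1) / denom) for g in grams)
            if score > best_score:
                best_code, best_score = code, score
        return best_code


# Hypothetical training labels with illustrative COICOP-style codes.
training = [
    ("PAIN DE MIE COMPLET 500G", "01.1.1"),
    ("BAGUETTE TRADITION", "01.1.1"),
    ("LAIT DEMI ECREME 1L", "01.1.4"),
    ("YAOURT NATURE X4", "01.1.4"),
    ("EAU MINERALE 1.5L", "01.2.2"),
    ("JUS ORANGE 1L", "01.2.2"),
]
classifier = NaiveBayesLabelClassifier().fit(
    [label for label, _ in training],
    [code for _, code in training],
)

print(classifier.predict("LAIT ENTIER 1L"))
print(classifier.predict("EAU DE SOURCE 1.5L"))
```

Character n-grams are used here because till labels are short and heavily abbreviated, so whole-word features tend to be sparse. A realistic setup would need far more labelled examples and a held-out test set to measure performance, as the experiment describes.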
Participants Insee
Project results Cash register data is now used in production to calculate inflation and business activity indicators.
summer 2020 update
progress report winter 2020-2021
Project code - https://github.com/InseeFrLab/predicat : API for classifying checkout labels
- https://github.com/InseeFrLab/product-labelling : Application for labelling cash register data