Jocas, webscraping online job offers

The Jocas (Job offers collection and analysis system) project enables DARES (the Ministerial Statistical Office for Labour) to automatically collect online job offers in order to compile statistics on the labour market.
webscraping
in production
automatic coding
DARES
Published

1 January 2022

Project summary

Online job vacancies, a new source of labour market data
Project details
In just a few years, the Internet has become a new source of information on the job market. According to the DARES Job Offer and Recruitment survey (Ofer), 95% of job advertisements were published on the Internet in 2016, compared with 53% in 2005. With this in mind, DARES decided to collect the job offers published on around fifteen websites and to build a new database from them: Jocas (Job offers collection and analysis system). Three kinds of tools are used to build this database: webscraping, automatic text classification algorithms and de-duplication.
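To make the de-duplication step concrete, here is a minimal sketch of how the same offer collected from two websites might be detected. It is not the method used by DARES, which this page does not describe: the field names (`site`, `title`, `description`, `location`), the 0.9 similarity threshold and the sample offers are all assumptions made for illustration.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text):
    """Strip accents, lowercase, and replace punctuation runs with one space."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def is_duplicate(a, b, threshold=0.9):
    """Treat two offers as duplicates if they share a location and their
    normalized title + description are at least `threshold` similar."""
    if normalize(a["location"]) != normalize(b["location"]):
        return False
    key_a = normalize(a["title"] + " " + a["description"])
    key_b = normalize(b["title"] + " " + b["description"])
    return SequenceMatcher(None, key_a, key_b).ratio() >= threshold


def deduplicate(offers, threshold=0.9):
    """Keep the first occurrence of each offer (quadratic, fine for a sketch)."""
    kept = []
    for offer in offers:
        if not any(is_duplicate(offer, k, threshold) for k in kept):
            kept.append(offer)
    return kept


# Hypothetical offers, invented for this example.
offers = [
    {"site": "site_a", "title": "Développeur Python",
     "description": "CDI à Lyon, 3 ans d'expérience.", "location": "Lyon"},
    {"site": "site_b", "title": "Developpeur python",
     "description": "CDI à Lyon - 3 ans d'expérience", "location": "Lyon"},
    {"site": "site_a", "title": "Infirmier",
     "description": "Poste de nuit en clinique.", "location": "Paris"},
]
unique = deduplicate(offers)
```

The pairwise comparison is only practical on small batches; a database fed by around fifteen websites would more plausibly group offers by a cheap key first (location, posting date) and compare within groups.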
For 2019, the Jocas database can be compared with the usual sources of public statistics on job vacancies. These include administrative sources, such as vacancies advertised by Pôle emploi and the déclarations préalables à l’embauche (DPAE) from Urssaf, and survey data, such as Pôle emploi’s “Besoins en main-d’œuvre” (BMO), INSEE’s “Emploi” survey, and DARES’ “Activité et conditions d’emploi de la main-d’œuvre” (Acemo) survey. The results show that Jocas covers occupations unevenly. Occupations with a high proportion of managerial staff, or that recruit heavily online, tend to be over-represented. Conversely, those with a high proportion of multiple recruitments, or that rely on informal recruitment channels, tend to be under-represented.
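One simple way to express over- and under-representation is to divide each occupation's share of Jocas offers by its share in a reference source. The sketch below illustrates that idea only; the page does not say which indicator DARES used, and the occupation labels and counts are invented for the example.

```python
def representation_ratios(jocas_counts, reference_counts):
    """Return, per occupation, (share in Jocas) / (share in the reference).

    A ratio above 1 suggests over-representation in Jocas, below 1
    under-representation. Occupations absent from the reference are skipped.
    """
    jocas_total = sum(jocas_counts.values())
    reference_total = sum(reference_counts.values())
    if jocas_total == 0 or reference_total == 0:
        raise ValueError("both sources must contain at least one offer")

    ratios = {}
    for occupation, reference_n in reference_counts.items():
        if reference_n == 0:
            continue  # no reference share to compare against
        jocas_share = jocas_counts.get(occupation, 0) / jocas_total
        reference_share = reference_n / reference_total
        ratios[occupation] = jocas_share / reference_share
    return ratios


# Invented counts, for illustration only (not Jocas or survey figures).
jocas_counts = {"managers": 300, "construction workers": 100, "clerks": 600}
reference_counts = {"managers": 150, "construction workers": 250, "clerks": 600}

ratios = representation_ratios(jocas_counts, reference_counts)
for occupation, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{occupation}: {ratio:.2f}")
```

With these toy figures, managers come out above 1 and construction workers below 1, which mirrors the direction of the bias described above without reproducing any published result.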
Players
DARES
Project results
Online vacancy data have been used to calculate tensions on the labour market. They were also used to produce the table monitoring the labour market situation in 2020-2021, during the Covid-19 crisis. Jocas data are freely accessible to students, researchers and civil servants. Access may also be granted for statistical and non-commercial use, on request to DARES. The database is accessible on INSEE’s SSPCloud platform by following the path ‘projet-jocas-prod/diffusion/JOCAS’.
Project products and documentation
- Description on the DARES website
- Working document
- Hackathon in March 2023 on the de-duplication of job offers
- Project news
- Training to use the JOCAS database
Project code
- Code repository available on GitHub: https://github.com/OnlineJobVacanciesESSnetBigData/JobTitleProcessing_FR
