
Code published for comparing online job vacancy data with official employment statistics

Online job vacancies (OJV) are an innovative data source for analysing developments in the labour market. However, to draw relevant insights about the labour market as a whole from any subset of the data, one needs to know how that OJV data is distributed. A recently published GitHub repository contains code that helps with such an analysis.


We believe that online job vacancy data can be very useful for gaining insights into developments in the labour market across occupations and economic sectors. This is why the Bertelsmann Stiftung’s project Kompetenzen für die Arbeit von morgen (competences for the work of tomorrow) has acquired a large database of online job vacancies in Germany, called DOSTA (Datenbank von Online Stellenanzeigen).

To better understand how the data contained in DOSTA is distributed across occupations and economic sectors, we have created a data integration process that compares distributions of online job vacancies with statistical data from the Bundesagentur für Arbeit (the German Federal Employment Agency), classified according to the Klassifikation der Berufe (KldB 2010, the classification of occupations) and the Klassifikation der Wirtschaftszweige (WZ 2008, the classification of economic activities). In a nutshell, the process cleans and integrates different files containing data on employees (Beschäftigte), initiated employment relationships (begonnene Beschäftigungsverhältnisse) and online job vacancies, and unifies them into a single file for each classification type (WZ 2008 and KldB 2010) and region (Germany as a whole and each federal state).
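To make the idea of comparing distributions concrete, here is a minimal sketch in Python. It is not taken from the repository: the function names, the two-digit codes and all counts are made up for illustration, and the repository's scripts may be organised quite differently. The sketch converts counts per classification code into shares and places the OJV share next to the official share.

```python
def shares(counts):
    """Convert absolute counts per classification code into shares summing to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must contain at least one observation")
    return {code: n / total for code, n in counts.items()}


def compare_distributions(ojv_counts, official_counts):
    """Join two count tables on the classification code and compare their shares."""
    ojv_share = shares(ojv_counts)
    official_share = shares(official_counts)
    rows = []
    # Use the union of codes so that categories missing on one side still appear.
    for code in sorted(set(ojv_share) | set(official_share)):
        ojv = ojv_share.get(code, 0.0)
        official = official_share.get(code, 0.0)
        rows.append({
            "code": code,
            "ojv_share": ojv,
            "official_share": official,
            "difference": ojv - official,  # > 0: category over-represented in OJV data
        })
    return rows


# Hypothetical counts keyed by illustrative two-digit occupation codes.
ojv_counts = {"43": 300, "81": 100, "51": 100}
employee_counts = {"43": 200, "81": 500, "51": 300}

comparison = compare_distributions(ojv_counts, employee_counts)
for row in comparison:
    print(row)
```

A positive difference suggests that a category is over-represented in the vacancy data relative to the official statistic, and a negative one that it is under-represented. The same comparison could be repeated per region and per classification type.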

Since this process can be of interest to anyone working with online job vacancy data, we have published the data processing scripts in a public GitHub repository. The code is aimed at users who wish to gain a better understanding of how their data is distributed across Germany. It is written in Python, so a basic understanding of the programming language is necessary to run and adapt the scripts. In addition, the online job vacancy data must already be mapped to the Klassifikation der Berufe 2010 or the Klassifikation der Wirtschaftszweige 2008. The code does not provide an automated process for classifying OJVs and mapping them to these classifications.
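Before running a comparison, it can be useful to check how many vacancies actually carry a usable classification code. The following sketch is an assumption-laden illustration, not part of the repository: it assumes each vacancy is a dictionary with a hypothetical field `kldb_code`, and that a fully mapped KldB 2010 code consists of exactly five digits (the most detailed level of that classification).

```python
import re

# Assumption: a fully mapped KldB 2010 code is a five-digit string.
KLDB_5_DIGIT = re.compile(r"\d{5}")


def split_by_mapping(records, code_field="kldb_code"):
    """Split vacancy records into those with and without a five-digit code.

    `code_field` is a hypothetical field name chosen for this sketch.
    """
    mapped, unmapped = [], []
    for record in records:
        code = str(record.get(code_field) or "").strip()
        if KLDB_5_DIGIT.fullmatch(code):
            mapped.append(record)
        else:
            unmapped.append(record)
    return mapped, unmapped


# Made-up example records; the codes are placeholders, not real vacancy data.
vacancies = [
    {"id": 1, "kldb_code": "43102"},
    {"id": 2, "kldb_code": "4310"},      # too short: treated as unmapped
    {"id": 3, "kldb_code": None},        # missing: treated as unmapped
    {"id": 4, "kldb_code": " 81302 "},   # surrounding whitespace is tolerated
]

mapped, unmapped = split_by_mapping(vacancies)
print(f"{len(mapped)} mapped, {len(unmapped)} unmapped")
```

Records that end up in `unmapped` would need to be classified by some other means before they can enter the comparison.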

More information can be found on the repository’s page. Any feedback on the code, including adaptations and possible improvements, is very welcome.