Published on Let's Talk Development

Where are all the jobs? A machine learning approach for high resolution urban employment prediction

Dominic Chavez / World Bank

The data we need in cities  

Detailed data on the spatial distribution of jobs is crucial. It enables urban planners and developers to identify economic hubs within a city and take targeted measures to improve their productivity, connectivity, and resilience. For example, investing in infrastructure upgrades or flood protection systems, enhancing commuting options, and adapting urban planning decisions can support firms and yield city-wide benefits for the lives and livelihoods of workers and their communities.

And yet, few cities have high-resolution data on jobs, especially in developing countries. In practice, business registries, employment censuses, and travel surveys are the most common sources for mapping the density and spatial distribution of jobs within a city. But such data are rarely available, and when they do exist, they tend to be incomplete, unreliable, or outdated. As a result, urban planning and investment decisions often rely on patchy, outdated, or anecdotal information that cannot efficiently target employment centers.

Recent initiatives have successfully leveraged mobile phone-derived data to document “meaningful” locations, including jobs. This is a breakthrough, especially given the increasingly ubiquitous use of mobile phones. And yet, despite fast progress, accessing and processing mobile phone data remains a difficult, lengthy, and often costly process. Quick-to-deploy, cheap, and robust alternatives are sorely needed.

A new scalable approach for predicting the spatial distribution of jobs 

In a new study and its companion demonstration note, supported by the Global Facility for Disaster Reduction and Recovery, we develop a machine learning algorithm that relies on widely available public data to predict the spatial distribution of employment in urban areas at high resolution. We train and test the algorithm on 14 cities in Sub-Saharan Africa and Latin America for which survey-based observed employment data were available. These cities range from Abidjan, Ivory Coast to Nairobi, Kenya in Sub-Saharan Africa, and from Belo Horizonte, Brazil to Mexico City, Mexico in Latin America. When comparing our predicted employment maps against employment observations, we find robust performance of the prediction algorithm, with R2, the goodness-of-fit measure, averaging 0.63 and reaching values of up to 0.8. In cities where existing employment maps are coarse, the algorithm may in fact offer more granular insights (Table 1).

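Region               City             R2
Sub-Saharan Africa   Abidjan          0.70
Sub-Saharan Africa   Dakar            0.70
Sub-Saharan Africa   Dar Es Salaam    0.71
Sub-Saharan Africa   Douala           0.65
Sub-Saharan Africa   Harare           0.30
Sub-Saharan Africa   Kampala          0.54
Sub-Saharan Africa   Kigali           0.81
Sub-Saharan Africa   Kinshasa         0.55
Sub-Saharan Africa   Nairobi          0.77
Latin America        Belo Horizonte   0.77
Latin America        Bogota           0.52
Latin America        Buenos Aires     0.78
Latin America        Lima             0.42
Latin America        Mexico City      0.58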

Table 1: Performance (R2) of the machine learning algorithm when comparing observed and predicted spatial distribution of jobs in 14 cities across Sub-Saharan Africa and Latin America 
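As a quick sanity check on the summary figures quoted above, the short Python sketch below recomputes the average and the extremes from the Table 1 values. The pairing of cities and scores assumes the values are listed in the same order as the cities; the variable names are ours, not the study's.

```python
from statistics import mean

# R2 values transcribed from Table 1 (city -> R2).
# Assumption: values appear in the same order as the city names.
r2_by_city = {
    "Abidjan": 0.70, "Dakar": 0.70, "Dar Es Salaam": 0.71, "Douala": 0.65,
    "Harare": 0.30, "Kampala": 0.54, "Kigali": 0.81, "Kinshasa": 0.55,
    "Nairobi": 0.77, "Belo Horizonte": 0.77, "Bogota": 0.52,
    "Buenos Aires": 0.78, "Lima": 0.42, "Mexico City": 0.58,
}

average_r2 = mean(r2_by_city.values())
best_city = max(r2_by_city, key=r2_by_city.get)
worst_city = min(r2_by_city, key=r2_by_city.get)

print(f"cities: {len(r2_by_city)}")
print(f"average R2: {average_r2:.2f}")
print(f"highest: {best_city} ({r2_by_city[best_city]:.2f})")
print(f"lowest: {worst_city} ({r2_by_city[worst_city]:.2f})")
```

The average rounds to 0.63, in line with the figure reported in the text, while the spread between the best and worst city shows that performance varies considerably from one city to another.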

We demonstrate that the approach is scalable, quick, and low-cost, and that it can yield high-resolution job density maps, which can be used in the absence of alternative official data and offer highly detailed insights into the spatial structure of urban economies (Figure 1).

Figure 1: Observed and predicted employment density in urban Buenos Aires, Argentina (R2=0.78). Employment data source: LOGIT (2012), based on 2004/5 Censo Nacional Económico/The Argentinian National Institute of Statistics and Census (INDEC) and 2011 Encuesta Permanente de Hogares/INDEC.

Note: R2, or goodness of fit measure, indicates the share of variation in the observed employment density that can be explained by the algorithm.
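For readers who want to see what this measure looks like in practice, here is a minimal Python sketch of the textbook coefficient of determination, R2 = 1 - SS_res / SS_tot. We do not claim this is exactly how the study computes it (the post does not say, and a squared correlation is another common definition); the function name and the sample numbers are purely illustrative.

```python
def r_squared(observed, predicted):
    """Share of the variation in `observed` explained by `predicted`."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: what the predictions fail to explain.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: variation of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have no variation")
    return 1 - ss_res / ss_tot


# Hypothetical jobs-per-cell values, made up for illustration only.
observed = [120.0, 80.0, 40.0, 10.0]
predicted = [110.0, 90.0, 30.0, 20.0]
print(round(r_squared(observed, predicted), 3))
```

A perfect prediction gives 1, while always predicting the average observed density gives 0.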

The idea behind the method is simple: locations within a city that have higher concentrations of amenities (such as restaurants, but also ATMs and schools), road intersections, or public transport stops, or that display more intense nighttime lights, among other features, are more likely to be hubs of economic activity and employment. Conversely, terrain roughness, water bodies, and vegetation indices are likely to be negatively correlated with the presence of jobs. We implement this notion through a machine learning algorithm that takes into account an array of data sources extracted from OpenStreetMap and Google Earth Engine.
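To make the intuition concrete, the toy Python sketch below predicts jobs in a grid cell from a handful of such features. It is a stand-in, not the study's algorithm: the post does not name the model, so we use a simple k-nearest-neighbours average, and every feature name and number is synthetic and hypothetical. In a real application the inputs would come from OpenStreetMap and Google Earth Engine.

```python
from math import dist

# Hypothetical feature names, chosen to mirror the examples in the text.
FEATURES = ("amenities", "intersections", "transit_stops",
            "night_lights", "vegetation")

# Synthetic training cells: (features, observed jobs). All values are made up.
train = [
    ({"amenities": 40, "intersections": 25, "transit_stops": 6,
      "night_lights": 60.0, "vegetation": 0.10}, 900.0),
    ({"amenities": 30, "intersections": 20, "transit_stops": 4,
      "night_lights": 50.0, "vegetation": 0.15}, 700.0),
    ({"amenities": 10, "intersections": 10, "transit_stops": 2,
      "night_lights": 25.0, "vegetation": 0.35}, 200.0),
    ({"amenities": 3, "intersections": 4, "transit_stops": 0,
      "night_lights": 8.0, "vegetation": 0.60}, 40.0),
    ({"amenities": 1, "intersections": 2, "transit_stops": 0,
      "night_lights": 3.0, "vegetation": 0.80}, 10.0),
]


def fit_scaler(rows):
    """Record per-feature minimum and maximum for min-max scaling."""
    lo = {f: min(r[f] for r in rows) for f in FEATURES}
    hi = {f: max(r[f] for r in rows) for f in FEATURES}
    return lo, hi


def to_vector(row, lo, hi):
    """Scale each feature using the training range so none dominates."""
    return [(row[f] - lo[f]) / (hi[f] - lo[f]) if hi[f] > lo[f] else 0.0
            for f in FEATURES]


def predict_jobs(cell, train, k=2):
    """Average the observed jobs of the k most similar training cells."""
    lo, hi = fit_scaler([features for features, _ in train])
    target = to_vector(cell, lo, hi)
    nearest = sorted(
        train, key=lambda item: dist(target, to_vector(item[0], lo, hi))
    )[:k]
    return sum(jobs for _, jobs in nearest) / k


busy_cell = {"amenities": 35, "intersections": 22, "transit_stops": 5,
             "night_lights": 55.0, "vegetation": 0.12}
green_cell = {"amenities": 2, "intersections": 3, "transit_stops": 0,
              "night_lights": 5.0, "vegetation": 0.70}

print(predict_jobs(busy_cell, train))
print(predict_jobs(green_cell, train))
```

The cell with many amenities, dense intersections, and bright nighttime lights is matched to the busiest training cells, while the leafy, dimly lit cell is matched to the quietest ones, which is the pattern the method relies on.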

Opportunities for targeted policies and investments

High-resolution employment maps open a host of possibilities for operational and analytical applications. For instance, such maps allow for more systematic employment accessibility analyses in cities, with the goal of assessing and improving the effectiveness of urban transport investments and land use interventions. This means that urban planning and investment decisions can specifically target weak points and bottlenecks that hold back urban prosperity and resilience.

Employment data could also help increase the development and application of quantitative spatial economic models, which can be used to identify and prioritize effective economic policies. And high-resolution employment data could constitute a key piece of the puzzle in understanding agglomeration forces in developing country cities and their link with urban spatial layouts.  

From 14 cities to a thousand: Scaling up the coverage

With the validation exercise giving us confidence in the approach’s ability to approximate the distribution of jobs in urban areas, we are currently scaling up. While we started by comparing our results with observations in 14 cities, we are now focusing on building a library of employment prediction maps for a thousand cities in developing countries (see examples in Figure 2). This library will be made publicly available, following basic validation steps.

Figure 2: Employment predictions across various cities 


Authors

Paolo Avner

Senior Economist, Global Facility for Disaster Reduction and Recovery (GFDRR), World Bank

Samira Barzin

Researcher at the Oxford Martin School at the University of Oxford, UK

Neave O'Clery

Associate Professor and Acting Director of Research at the Centre for Advanced Spatial Analysis (CASA) at University College London
