Data from Earth observation – also called remote sensing data or satellite data – has rapidly become widely available, substantially easier to use, and increasingly common in development economics research. Indeed, 14% of the job market papers in economics that have been blogged about in the annual guest posts here since the 2019/20 job market have used Earth observation data as an outcome or to measure a key treatment variable. Nearly a quarter of those blogged about for the 2022/23 job market use Earth observation data.
Motivated by this increasing prevalence of Earth observation data in development work and the enormous potential that it offers, this will be the first in an occasional series of posts on Development Impact about its use. I’ll focus on developments in the practical application of Earth observation data in research, with specific attention to two things: new tools that allow those without significant training to use new remotely sensed products, and some unusual measurement error challenges that using Earth observation data can present.
What is Earth Observation?
First though, I will provide a brief overview of what Earth observation data is, why it is becoming increasingly common, and how it can be useful. The EU Scientific Commission offers the following definition: Earth observation is the gathering of information about planet Earth’s physical, chemical and biological systems via remote sensing technologies, usually involving satellites carrying imaging devices.
That encompasses a wide range of data, from the well-known measures of nightlights that are common in development work, to the less common but still widely used satellite measures of pollution, to more cutting-edge image-based applications like counting cars.
How can researchers use Earth observation data?
For decades, data from satellites has been available to researchers who knew where to look and had the technical background to use it. But recent advances in both the availability and the accessibility of imagery have substantially lowered the barriers to entry. One of the most substantial is the development of Google Earth Engine, a cloud-based platform that allows remote processing of petabytes of remotely sensed data. The steep decline in the cost of imagery – high-resolution images of anywhere in the world are now available for less than $200 – combined with the deployment of machine learning algorithms has rapidly increased the usability of satellite imagery in research.
The applications of Earth observation data to development research are myriad. In my own work I have used it to measure fires and crop management regimes. More creative uses include measuring the impacts of access to electrical grids on livelihoods in areas without survey data and predicting compliance in an RCT on crop-burning. The central promise of Earth observation is the ability to measure development-relevant outcomes or treatments in areas where ground-based (i.e. survey or administrative) data do not exist.
MOSAIKS and data access
Despite the advances in data accessibility and processing, significant hurdles to widespread adoption remain. While raw imagery has never been more accessible, turning that imagery into a measure of a development-relevant outcome can still be computationally challenging. A new(ish) paper seeks to make this process easy for non-experts by centralizing the core task of predicting outcomes from raw imagery.
To understand how the new proposed process works, consider the following simplified workflow for creating a new remote sensing product:
1. Collect satellite imagery. Generally this means assembling a database of unlabeled images over your study area and time period of interest from one of the various satellite imagery sources (e.g. Landsat, Sentinel, etc.).
2. Label a set of images based on your study question. For example, if you want to count cars in the images you would select a number of images and hand label all of the cars in each image.
3. Train a machine learning algorithm. Using the hand-labelled images you’d then train an ML algorithm to detect and count cars in all the unlabeled images.
4. Run the algorithm on the full set of images. With an appropriately trained algorithm you could then predict car counts in all your remaining images to create a dataset of cars over your spatial and temporal area of interest.
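To make the four steps concrete, here is a minimal Python sketch on synthetic data. Everything in it is an illustrative assumption rather than a real pipeline: the "tiles" are random arrays of 64 pixels, a "car" is a single bright pixel, the hand labels are simply the simulated truth for a subset of tiles, and an ordinary least squares fit on raw pixels stands in for the machine learning algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: "collect" imagery. Synthetic 8x8 tiles, flattened to 64 pixels each,
# stand in for real satellite images.
n_images, n_pixels = 2000, 64
true_counts = rng.integers(0, 6, size=n_images)            # 0 to 5 "cars" per tile
images = rng.normal(0.0, 0.02, size=(n_images, n_pixels))  # background noise
for i, k in enumerate(true_counts):
    car_pixels = rng.permutation(n_pixels)[:k]  # k distinct pixel locations
    images[i, car_pixels] += 1.0                # each "car" is one bright pixel

# Step 2: hand-label a subset. Here the "labels" are the simulated truth.
labeled = rng.choice(n_images, size=400, replace=False)
labels = true_counts[labeled]


def add_intercept(x):
    """Prepend a column of ones so the linear model has an intercept."""
    return np.column_stack([np.ones(len(x)), x])


# Step 3: train a model on the labeled tiles (least squares as a stand-in).
coef, *_ = np.linalg.lstsq(add_intercept(images[labeled]), labels, rcond=None)

# Step 4: run the trained model on the full set of tiles.
predicted_counts = add_intercept(images) @ coef

# Check accuracy on the tiles that were never labeled.
unlabeled = np.setdiff1d(np.arange(n_images), labeled)
errors = true_counts[unlabeled] - predicted_counts[unlabeled]
r2 = 1.0 - np.mean(errors**2) / np.var(true_counts[unlabeled])
print(f"Out-of-sample R^2: {r2:.3f}")
```

With real imagery, steps 2 and 3 are where most of the labor and computing time would go, and both would have to be redone for each new outcome.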
The workflow is straightforward, but the implementation can be so computationally costly as to be prohibitive for anyone without access to a supercomputer. Critically, all four steps must be repeated any time a researcher wants to create a new product. This is where the new approach, called MOSAIKS, comes in. By condensing a comprehensive set of raw imagery into structured tabular data, it substantially reduces the computational cost of creating any subsequent data product from that imagery.
Essentially, the process adds a second step that uses a machine learning approach to convert a very large set of satellite imagery into tabular data. This set of imagery is not constrained to one particular application but is intended to cover, for example, all of Africa for ten years. Critically, this step only needs to be done one time.
Once the data has been converted, the tabular data can be used instead of the raw images to conduct different outcome analyses for any question that needs data in the spatial and temporal span of the original imagery. For example, two researchers, one interested in counting cars and the other in predicting population density, could both use the tabular data to train their prediction algorithms without having to turn to the raw imagery.
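The idea can be sketched in Python as follows. This is a toy illustration under my own assumptions, not the authors' code or API: the imagery is synthetic, the featurization (random 3x3 filters, a ReLU, and average pooling) is a simple stand-in for the one-time conversion step, and `featurize`, `ridge_fit_predict`, and the two outcomes are hypothetical names. The point is only that the feature table is built once and then reused by two different prediction tasks, each of which needs nothing more than a cheap ridge regression.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)

# Synthetic 8x8 tiles with two hypothetical latent properties per tile:
# a "car" count (isolated bright pixels) and a "density" index (background level).
n, size = 2000, 8
car_count = rng.integers(0, 6, size=n)
density = rng.uniform(0.0, 1.0, size=n)
flat = 0.2 * density[:, None] + rng.normal(0.0, 0.02, size=(n, size * size))
for i, k in enumerate(car_count):
    flat[i, rng.permutation(size * size)[:k]] += 1.0
images = flat.reshape(n, size, size)


def featurize(imgs, n_features=64, patch=3, seed=1):
    """One-time step: random filters -> ReLU -> average pool, one row per image."""
    frng = np.random.default_rng(seed)
    filters = frng.normal(size=(n_features, patch, patch))
    bias = frng.uniform(-1.0, 1.0, size=n_features)
    # All 3x3 windows of every image: shape (n, 6, 6, 3, 3).
    windows = sliding_window_view(imgs, (patch, patch), axis=(1, 2))
    responses = np.einsum("nijab,kab->nijk", windows, filters) + bias
    return np.maximum(responses, 0.0).mean(axis=(1, 2))


def ridge_fit_predict(x_train, y_train, x_all, lam=1e-3):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = x_train.mean(axis=0), y_train.mean()
    xc = x_train - x_mean
    gram = xc.T @ xc + lam * np.eye(xc.shape[1])
    beta = np.linalg.solve(gram, xc.T @ (y_train - y_mean))
    return (x_all - x_mean) @ beta + y_mean


def r_squared(y, y_hat):
    return 1.0 - np.mean((y - y_hat) ** 2) / np.var(y)


# The expensive conversion happens once, for all tiles.
features = featurize(images)

# Two researchers reuse the same table for different outcomes.
train = np.arange(n) < 1500
pred_cars = ridge_fit_predict(features[train], car_count[train], features)
pred_density = ridge_fit_predict(features[train], density[train], features)

r2_cars = r_squared(car_count[~train], pred_cars[~train])
r2_density = r_squared(density[~train], pred_density[~train])
print(f"Held-out R^2, cars: {r2_cars:.2f}; density: {r2_density:.2f}")
```

Neither regression touches `images` directly. Once `features` exists, adding a third outcome costs one more call to `ridge_fit_predict`.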
The authors find that reducing the computational cost of the process in this way does not come at the cost of significant losses in accuracy. While the approach is more accurate for some of their test cases than others, the areas in which it tends to perform poorly are the same ones in which Earth observation data tends to be less useful in general (e.g. it does better at predicting forest cover than incomes).
The authors have made their tools available for beta testing by researchers via an API and have created a tutorial on how to apply the tool. It is an approach that will likely prove useful to a wide variety of development researchers and practitioners who work on questions for which there are no off-the-shelf remote sensing products available.
Having introduced both Earth observation and some tools that make it more feasible for those without a supercomputer and a computer science background, I will use subsequent posts in this occasional series to consider some of the measurement challenges that come from using remote sensing data.