Published on Data Blog

Predicting food crisis outbreaks using natural language processing of news streams

Village produce market. Nigeria. Photo: Curt Carnemark / World Bank

Food insecurity continues to threaten the lives of hundreds of millions of people around the world today. According to the Food and Agriculture Organization of the United Nations, the number of undernourished people increased from 624 million in 2014 to 688 million in 2019. The situation has sharply deteriorated since, due to the interlinked shocks of the COVID-19 pandemic, climate change, and conflicts, with between 702 and 828 million people worldwide having faced hunger in 2021. Severe food insecurity increased both globally and in every region in 2021 and is now a top priority of the international community.

It is estimated that about $75 billion has been spent on global food security assistance between 2014 and 2018. Considerable evidence shows that quickly responding to emerging risks of food insecurity saves lives, preserves the livelihood of the most vulnerable, and lowers humanitarian costs, which leads aid agencies to resort to early warning systems to decide when and where to deploy emergency relief. While risk factors are well established, ranging from climate change to conflicts, pests, and migration, delayed or infrequent measurements of these factors at the local level typically impede early warning systems’ ability to promptly anticipate food crises. Furthermore, food-insecure countries often lack the capacity to systematically measure risk factors, generating data gaps.

Against this backdrop, the past decade has seen an explosion in the availability of vast repositories of digital data, from satellite imagery to mobile phone call detail records, which are increasingly being exploited to address development challenges. Encouraged by these approaches, I collaborated with Ananth Balashankar and Lakshmi Subramanian from New York University to develop a new method that leverages recent advances in deep learning and natural language processing to extract mentions of risk factors of food insecurity from the text of more than 11 million news articles about food-insecure countries published between 1980 and 2020 (Fig. 1). We discovered nearly 170 relevant keywords and phrases, such as “conflict”, “pests”, “drought”, “floods”, “raising food prices”, or “migration”, and we constructed risk indicators of food insecurity from the occurrences of these mentions in news articles over time and across districts.
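
To make the idea of a news-based risk indicator concrete, here is a minimal sketch in Python. It is not the method used in the study, which relies on deep learning to discover and detect the relevant phrases; plain substring matching stands in for that step. The function name `news_indicators`, the article fields (`district`, `date`, `text`), and the short keyword list are all assumptions made for illustration, and the sample articles are invented.

```python
from collections import Counter, defaultdict

# Assumption: a handful of the phrases quoted above, not the study's full list.
KEYWORDS = ["conflict", "pests", "drought", "floods", "migration"]


def news_indicators(articles, keywords=KEYWORDS):
    """Share of articles mentioning each keyword, per (district, month).

    Each article is a dict with hypothetical fields:
    "district" (str), "date" ("YYYY-MM-DD") and "text" (str).
    """
    totals = Counter()               # articles per (district, month)
    mentions = defaultdict(Counter)  # keyword hits per (district, month)
    for article in articles:
        key = (article["district"], article["date"][:7])  # keep "YYYY-MM"
        totals[key] += 1
        text = article["text"].lower()
        for keyword in keywords:
            # Naive substring match; a stand-in for the learned text features.
            if keyword in text:
                mentions[key][keyword] += 1
    return {
        key: {kw: mentions[key][kw] / count for kw in keywords}
        for key, count in totals.items()
    }


# Invented sample articles, for illustration only.
sample = [
    {"district": "A", "date": "2016-09-03", "text": "Pests destroy maize fields."},
    {"district": "A", "date": "2016-09-17", "text": "Drought and pests hit farmers."},
    {"district": "A", "date": "2016-10-02", "text": "Markets reopen after the rains."},
]
indicators = news_indicators(sample)
print(indicators[("A", "2016-09")])
```

Dividing by the number of articles per district and month keeps the indicator comparable between places and periods with very different volumes of news coverage.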

Figure 1. Extracted mentions of risk factors of food insecurity from the text of more than 11 million news articles about food-insecure countries published between 1980 and 2020
Figure 1: A new machine learning model discovered 167 key words and phrases predictive of food insecurity from a corpus of 11 million news articles about food-insecure countries published between 1980 and 2020. Each box contains an example of a sentence in which the model detected a relevant text feature (highlighted in color). The 167 text features are grouped into 12 categories of food insecurity risk factors indicated in the legend and mapped into a network. A node’s size is proportional to the text feature’s frequency in news articles, and an edge’s width encodes the semantic proximity between nodes. Over the period 2009-2020 and across 37 food-insecure countries in Africa, Latin America, and South Asia, news coverage yielded substantially more accurate predictions of food crisis outbreaks at the district level than traditional measurements that did not include news story text. Credit: Samuel Fraiberger and Alice Grishchenko.

We then developed a machine learning model incorporating the news indicators to generate monthly district-level predictions of the Integrated Food Security Phase Classification (IPC), a widely scrutinized indicator of food security. Our findings, recently published in Science Advances, indicate that our predictions are considerably more accurate up to twelve months ahead of time than both expert assessments and existing forecasts that do not include news story text. Over the period 2009-2020 and across 37 food-insecure countries in Africa, Latin America, and South Asia, incorporating news indicators into existing forecasts of food crises three months ahead would have reduced prediction errors by more than 40%. We also found that while traditional risk indicators, such as precipitation levels, conflict severity, or food price indices, are costly and time-consuming to collect, news indicators could serve as a cost-effective substitute. Finally, in contrast to existing early warning systems, the articles that we collected are published daily, which allows us to generate high-frequency forecasts of food insecurity.
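
For readers unfamiliar with how a "reduction in prediction errors" is computed, the sketch below shows one common way to express it. It assumes mean absolute error as the metric, which this post does not specify, and the function names and the tiny series of IPC phases are invented for illustration; they are not results from the study.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between observed and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_error_reduction(actual, baseline, augmented):
    """Fraction by which the augmented forecast lowers the baseline's error.

    Assumes the baseline error is non-zero.
    """
    base_error = mean_absolute_error(actual, baseline)
    new_error = mean_absolute_error(actual, augmented)
    return (base_error - new_error) / base_error


# Invented IPC phases for four district-months, for illustration only.
observed = [2, 3, 3, 2]
baseline_forecast = [2, 2, 2, 2]   # forecast without news indicators
augmented_forecast = [2, 3, 2, 2]  # forecast with news indicators

print(relative_error_reduction(observed, baseline_forecast, augmented_forecast))  # 0.5
```

In this toy case the baseline misses two of the four phases by one step and the augmented forecast misses one, so the error is halved.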

Why are news-based predictions so accurate? This innovative approach harnesses the fact that risk factors triggering a food crisis are often mentioned in on-the-ground news reports prior to being recorded in traditional risk indicators, which can be incomplete, delayed, or outdated.

To understand the mechanics of what is happening under the hood, we zoom in on a crisis episode during which news factors would have helped to anticipate the deterioration of the situation. In early 2016, the fall armyworm, a lepidopteran pest native to the Americas, started spreading across 20 countries in Africa, decimating large quantities of crops. By September, news from Yambio county in South Sudan mentioning pest-related terms had peaked, five months ahead of the IPC rising from a “stressed” to a “crisis” phase. Our news-based model was therefore capable of correctly predicting the upcoming crisis outbreak three months ahead, providing additional time for action.
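
The lead time in an episode like this one can be measured with a few lines of Python. The sketch below is only an illustration: the function names are hypothetical, the monthly counts and phases are invented to mirror the timing described above, and the code assumes the standard IPC scale in which phase 2 is “stressed” and phase 3 is “crisis”.

```python
def month_index(year_month):
    """Convert "YYYY-MM" into a running count of months."""
    year, month = year_month.split("-")
    return int(year) * 12 + int(month) - 1


def lead_time_months(mentions, ipc_phase, crisis_phase=3):
    """Months from the peak in news mentions to the first crisis-level month.

    mentions:  {"YYYY-MM": number of articles mentioning the risk factor}
    ipc_phase: {"YYYY-MM": IPC phase recorded for that month}
    Returns None if the IPC never reaches crisis_phase.
    """
    peak_month = max(mentions, key=mentions.get)
    # "YYYY-MM" strings sort chronologically, so sorted() orders by date.
    crisis_months = [m for m, phase in sorted(ipc_phase.items()) if phase >= crisis_phase]
    if not crisis_months:
        return None
    return month_index(crisis_months[0]) - month_index(peak_month)


# Invented values that mirror the timing of the episode, not real data.
pest_mentions = {"2016-07": 2, "2016-08": 5, "2016-09": 9, "2016-10": 6}
ipc = {"2016-10": 2, "2017-02": 3, "2017-06": 3}

print(lead_time_months(pest_mentions, ipc))  # 5
```

A positive value means the news signal peaked before the IPC reached the crisis phase; the same function applied to a slower indicator would return a smaller lead.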

By contrast, the vegetation index typically used in traditional models to quantify the impact of pest infestations on vegetation greenness only dropped one month before the crisis phase started. This measurement delay did not give traditional models enough lead time to provide actionable insights ahead of the upcoming crisis. Damage to crops caused by the pest infestation was only reflected in vegetation greenness once the food security of neighboring populations had begun to deteriorate, underscoring the importance of measuring anticipatory signals from the news. This example also illustrates that while machine learning models are often too complex to be interpretable by humans, our approach makes it possible to explicitly interpret the predictions of food crisis outbreaks by tracing them back to variations in news mentions of the underlying causes of an upcoming outbreak.

Although the drivers of food insecurity are well-known, early warning systems relying on high-frequency measurements of these factors are still lacking. Our machine learning model drastically improves the prediction of food crisis outbreaks using real-time news streams, and the model’s predictions are simple to interpret and explain to policymakers. The code and data to reproduce this study are publicly available on GitHub. We are also planning to release an automated data system in which regular updates of our news indicators and model predictions will be publicly available. We therefore encourage development practitioners and policymakers to use the predictions of our model to prioritize the allocation of emergency food assistance across vulnerable regions in a targeted way. The adoption of our data-driven assessment of food insecurity could enable more effective crisis preparedness, an earlier, faster, and more targeted response when a crisis hits, and a reduction in human suffering.

Early warnings cannot address all of the sources of delay in emergency responses; however, they can mitigate these delays by increasing the cost of inaction for governments and the international community. Beyond the context of food insecurity, measuring anticipatory signals of fragility risks from news streams could have profound implications for how aid gets allocated and open previously unexplored avenues for machine learning to improve decision-making in data-scarce environments.


Samuel Paul Fraiberger

Data Scientist, Development Economics Vice Presidency
