The term "big data" is much in the news lately, touted by some as the next silver bullet that may hold answers to myriad questions about natural and human dynamics, and dismissed by others as hype. We are only beginning to discover the value that lies in the vast quantities of information we are now able to generate, store, and analyze. But how can we begin to extract that value? More importantly, how can we begin to apply it to improving the human condition by promoting development and reducing poverty?
That is precisely the question that motivated the World Bank Group and Second Muse to collaborate on the recently released report Big Data in Action for Development. Interviews with big data practitioners around the world and an extensive review of literature on the topic led us to some surprising answers.
Good questions help define scope of analysis, identify key behaviors
It is a common assumption that in order to engage effectively with big data, you have to start with the data itself and let it "speak". It turns out that most practitioners disagree. We heard time and again from experts in the field that any work with big data must begin with questions. Rather than being led by whatever dataset is available, practitioners who start with questions can define the setting and scope of their analysis and identify the behaviors or conditions in the world that interest them. Questions help practitioners determine why they are seeking data and identify the media generating the data relevant to their purpose and scope.
In Big Data in Action for Development, we note that the purposes of most big data projects fall into three related categories: awareness, understanding, and forecasting. To share a few examples:
- Real-time information on the extent of the damage caused by Typhoon Haiyan in the Philippines provided insight into where best to direct response efforts, while access to data raised awareness of the extent of mobile money transfers in Kenya and informed changes in banking policy in that country.
- Mexico's pilot project tracking population movements in response to the spread of epidemic disease deepened understanding of those dynamics, informing the need for policy levers that could reduce infection rates.
- Assessing sentiments of "confusion" in conversations about employment in online forums in Ireland forecasted unemployment increases three months earlier than official statistics.
These categories can give shape to the formulation of questions. If you're interested in the changing price of wheat in a given country, big data may be used to answer one of the following questions:
- How much are farmers currently receiving for the wheat they are selling? (Awareness)
- What is driving changes in wheat purchase prices? (Understanding)
- What will wheat purchase prices be next month? (Forecasting)
A well-articulated series of questions and a clear purpose help inform the selection of relevant data media. Media that provide effective sources of big data include satellites, mobile phones, social media, internet text, internet search queries, and financial transactions, among others.
As the examples below illustrate, cross-referencing the primary medium with the primary purpose shows that big data projects can take on a great variety of configurations depending on the context. Carefully combining datasets from various sources to create "mashups" can reveal further insights.
| Awareness | Understanding | Forecasting |
|---|---|---|
| Using Mobile Data (Call Detail Records) to detect impacts from "microviolence" (skirmishes and improvised explosive devices (IEDs)) in Afghanistan (Blumenstock, 2014) | Using Financial Data to increase understanding of individual customer preferences (IBM Global Business Services, 2013) | Using Satellite Data to analyze the spectral signature of water and forecast the occurrence of mosquito outbreaks (Smolan & Erwitt, 2012) |
| Using Social Media to identify the locations of ceasefire violations in Syria (Robertson & Olson, 2013) | Using Internet Text Data and text analytics to understand cultural differences in the Middle East | Using Internet Search Queries to forecast home sales dynamics (Lohr, 2012) |
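To make the idea of a "mashup" concrete, here is a minimal sketch in Python that returns to the wheat price example above. It is only an illustration under stated assumptions and is not taken from the report: the two datasets, their field names (`district`, `month`, `wheat_price`, `rainfall_index`), the function name `build_mashup`, and every figure are hypothetical. It averages price reports collected by mobile phone for each district and month, then pairs each average with a satellite-derived rainfall reading for the same district and month.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: every name and number here is invented for illustration.
mobile_price_reports = [
    {"district": "A", "month": "2014-01", "wheat_price": 210.0},
    {"district": "A", "month": "2014-01", "wheat_price": 190.0},
    {"district": "B", "month": "2014-01", "wheat_price": 250.0},
    {"district": "C", "month": "2014-01", "wheat_price": 230.0},
]
satellite_rainfall = [
    {"district": "A", "month": "2014-01", "rainfall_index": 0.8},
    {"district": "B", "month": "2014-01", "rainfall_index": 0.3},
]


def build_mashup(price_reports, rainfall_records):
    """Join the two sources on (district, month), averaging price reports."""
    # Group the individual price reports by district and month.
    prices_by_key = defaultdict(list)
    for report in price_reports:
        key = (report["district"], report["month"])
        prices_by_key[key].append(report["wheat_price"])

    # Index the rainfall readings by the same key.
    rainfall_by_key = {
        (record["district"], record["month"]): record["rainfall_index"]
        for record in rainfall_records
    }

    # Keep only the keys that appear in both sources.
    mashup = []
    for key in sorted(prices_by_key):
        if key in rainfall_by_key:
            district, month = key
            mashup.append({
                "district": district,
                "month": month,
                "avg_wheat_price": mean(prices_by_key[key]),
                "rainfall_index": rainfall_by_key[key],
            })
    return mashup


mashup = build_mashup(mobile_price_reports, satellite_rainfall)
for row in mashup:
    print(row)
```

In this sketch, district C is dropped because it has no matching rainfall reading. Whether to drop, flag, or estimate such gaps is exactly the kind of decision that the guiding question should settle before any datasets are combined.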
As we deepen our ability to gather insights from big data and put those insights into action, organizations working in international development can make more efficient use of their resources if they start out by posing the right questions and leveraging relevant data sources.
Very informative post. Thanks for sharing! The amount of data in the world is exploding, and a large portion of it comes from interactions over mobile devices used by people in the developing world, people whose needs and habits have been poorly understood until now. Researchers and policymakers are beginning to realize the potential for channeling these torrents of data into actionable information that can be used to identify needs and provide services for the benefit of low-income populations.
With so much data now available, it is natural to think that the best practice is to rely on data-driven insights alone, but that is not true, or at least not anymore. Generally, any big data work has to start with reasonable questions about what effect we expect from a specific variable. For example, whether a company is analyzing the cause of a campaign's high costs or a government is trying to address the problems of low-income populations, the question to ask is always what causes could theoretically generate these problems, and then, with the aid of analytics, whether our assumptions are true or false. Only then does the true analysis start, enriching our assumptions with data-driven insights.