Published on Data Blog

Will your project fail without a data scientist?


Data scientist may be the sexiest job of the current century, and everybody in the world may be crying hoarse over the growing shortage of data scientists, but if you are leading an international development project or an international development agency, chances are you don’t have a data scientist on your team and you likely aren’t looking for one. That’s a problem.

Yes, a data revolution is remaking the world, but the development community has been slow to embrace the potential of new data sources and techniques. That’s not to say the development community doesn’t understand that potential. In fact, the report of the High-Level Panel of Eminent Persons on the Post-2015 Development Agenda calls for a ‘data revolution’ to ‘strengthen data and statistics for accountability and decision-making purposes’. The reality, however, is that it is still not the norm for most operations on the ground to build emerging data techniques and sources into their DNA. It’s almost as if data scientists and development specialists live in two different worlds.

This matters because development practitioners, exceptions aside, may be missing significant opportunities to develop and deliver their projects faster, more effectively, and perhaps even less expensively. A few examples may help illustrate the point.

Are you missing the data opportunity?

Designing smarter projects
Good baseline data is at the heart of successful project design, and most development projects do a good job of assembling traditional data, mostly from the government and recognized external sources. Quite often, however, this data is incomplete or out of date, and projects either commission new studies (which take time) or make decisions that acknowledge the limitations of the available data.

So if you are working to upgrade the transportation infrastructure of a city, you might make do with traffic studies that are slightly old or that don’t account for all modes of transport. Wouldn’t it be nice, however, if you could access or create a real-time view of transport in the city (as in this example from Moscow) and use it as your baseline? Something that tells you how many people are underground or above ground, how people are traveling, their commute patterns, their mode of transport, how disruptions and shocks affect the system, and more. This is the kind of data that planners used in Nairobi; there are more examples from Mexico here. And here’s a project in Morocco collecting citizen sentiment data to help design an e-participation platform.

Monitoring projects more effectively
There’s nothing quite like being in the field, but it’s often impractical (and expensive) to monitor large-scale implementations in person. The traditional approach relies on official reports from the field (‘the paperwork is always perfect,’ a project manager told us!) or on spot checks at a sample of implementation locations (or perhaps on local citizens you engage, train, and equip to do it on your behalf). So if you are installing hand-pumps at thousands of locations in a country or province, you put your faith in a combination of paperwork from contractors and visits to a few locations to make sure that the hand-pumps have actually been installed and work (plus perhaps regular updates from local citizens, though we know that can be difficult to scale, replicate, and sustain). What if you could talk to the hand-pump directly, and it told you where it was, how often it was used, whether it worked, and more? Here’s an example from Rwanda of how to do it through a combination of sensors and cell phone technology. See other examples of data-driven monitoring at Global Forest Watch and Bagega (Nigeria).

Using data to monitor hand-pumps in Rwanda

Tracking results better
Ultimately, nothing matters but results on the ground, and unfortunately, as at least some in the development community concede, it has typically been hard to measure development outcomes conclusively. New data techniques aren’t a magic bullet here, but they make it possible to measure macro-trends, even if roughly, faster and more locally. So if you are frustrated that the poverty data for the location you are working in is available only at the national level or hasn’t been updated for a few years, you could try ‘lean data collection’ techniques like the one Ziqitza/Acumen is testing in India, which seeks to gather local poverty rate data using call center infrastructure dedicated to providing social services. You could also explore the feasibility of using cell phone usage data to ‘predict’ socio-economic levels.

The missing data link in international development organizations

The examples above are just some of the ways new data sources and emerging data techniques can help reshape how development projects are organized and how they deliver and measure results. There are many others; see the work of Global Pulse, for example, or the Guardian’s data blog. We are at a curious juncture: few question the value proposition of emerging data techniques (though some genuine concerns remain), but very few teams yet incorporate data scientists in any systematic fashion (except in ‘innovation’ pilots). Part of the reason is that there is still an underdeveloped understanding of how to use data scientists on the ground; the other is that most international development organizations (and the governments they work with) weren’t born digital and haven’t yet made the transition to being data-driven organizations.

Ready to hire a data scientist?
If you do buy into the notion that your project or organization might need a data scientist, what’s the best way to use one? International development organizations have a fairly good understanding of the roles domain/sector experts play and how and when legal, environmental, and financial management experts can help, but data scientists? Not so much yet. Here are a few ideas -

  1. Ask your data scientist to broaden your sources of data - your sector experts and economists should lead the work with mainstream data sources, but can they provide all the answers you might need? If not, ask your data scientist to examine the availability of alternative data sources that may provide them. Ask whether the answer may lie in social media (here’s an example from India’s first social media election), satellite images (as were used to study poverty at our last data dive), videos, photographs, drones (used to protect rhinoceroses from poachers in South Africa), sensors (say, for crop monitoring, as in India), and more
  2. Ask your data scientist to set up mechanisms to tap into or collect data from new sources - new data sources aren’t very useful if you can’t find a way to access or collect them. Ask your data scientist to scrape websites (along the lines of this price scraping project), partner with private sector firms to access proprietary data, connect to data sources through APIs, build mobile data collection toolkits (see examples from developing countries here), aggregate crowdsourced data, compile citizen voices, establish a data infrastructure, and more
  3. Ask your data scientist to make sense of the data - your data is only as good as the inferences you are able to draw from it. Ask your data scientist to visualize the data (a picture is worth at least a thousand words, as this example shows), spot patterns and trends, report on real-time events (here are a few examples of real-time data about financial inclusion), recommend decisions and actions, measure and report progress, and more
  4. Help engage with data collaborators and providers - however good your data scientist may be, he or she likely won’t have access to all the data you need or possess all the necessary skills in a field that is developing very rapidly. Work with your data scientist to engage with the community (it’s growing faster than you might think and many data scientists have a strong social good streak), cultivate relationships with professional services providers, and connect with the latest research (there’s so much more to learn all the time)

Using data to tell compelling stories.

And as with everything else, your data scientist is going to be part of a larger team so the usual rules for working well within teams apply here as well. Two worth pointing out -

  1. The domain/sector expert rules - the data scientist can help, but not replace, the domain expert. Knowing how to work with data doesn’t replace expertise in a field (even though it sometimes helps to bring a fresh pair of eyes to a problem). And of course the domain/sector practitioner is a data expert too (but frequently this is expertise with traditional data sources)
  2. It’s ultimately about the problem you are trying to solve - your project isn’t about data; it’s about the answer to a specific, answerable, measurable question. It’s no good if you haven’t defined your problem as sharply as possible (and can’t figure out what to measure as the answer)

Building a data-driven international development organization
It isn’t enough for you to hire a data scientist. In fact, plenty of you are already doing so (and the examples above clearly show that data for development is already a reality). What’s missing are international development organizations steeped in the modern culture of data, where data influences all decisions and actions, where data must support insight, and where people turn to data first for answers.

So how do you take international development organizations that weren’t born digital and make them data-driven? How do you bring the right skills into such organizations? How do you organize and integrate these skills within existing structures? Who ‘owns’ such practices? How do operational processes and expectations change, adjust, and adapt to data scientists? What results should you expect from the practice? How did the private sector do it, and what have the outcomes been? Which experts can we turn to? These and similar questions are the main topics for conversation at two open events we are holding at the World Bank in Washington, DC on May 29 and June 7. Join us in person or remotely (registration and other details will follow shortly). The events follow last year’s data dive, which we organized with the help of UNDP, QCRI, UNDB, and DataKind. We look forward to meeting you for the first time or seeing you again.

Report from the Data Dive 2013
Video from the Data Dive 2013


DC Big Data Exploration

World Bank Group Finances is the online access point for IBRD, IDA, and IFC open financial data. The website features datasets that cover loans, contracts, trust funds, investments, and financial statements. A related mobile app, which allows you to “talk” to us more easily about operational and financial data in nine languages, is available for download for Android and iOS smartphone and tablet users at the Google Store and the iTunes Store, respectively. Follow us on Twitter to join and remain engaged in the conversation about the Bank’s open financial data.


Prasanna Lal Das

Lead Knowledge Management Officer, Trade & Competitiveness

Julia Bezgacheva

Data Scientist and Open Data Specialist
