Open Data Is Not Enough

Since its inception, the World Bank’s Open Data initiative has generated considerable excitement and discussion on the possibilities that it holds for democratizing development economics as well as for democratizing the way that development itself is conducted around the world.  Robert Zoellick, in a speech given last year at Georgetown University, expounded on the many benefits resulting directly from open data.  Offering the example of a health care worker in a village, he spoke of her newfound ability to “see which schools have feeding programs . . . access 20 years of data on infant mortality for her country . . . and mobilize the community to demand better or more targeted health programs.”  Beyond this, Zoellick argued that open data means open research, resulting in “more hands and minds to confront theory with evidence on major policy issues.”

The New York Times featured the Bank’s Open Data initiative in an article published earlier this month, in which it referred to the released data as “highly valuable”, saying that “whatever its accuracy or biases, this data essentially defines the economic reality of billions of people and is used in making policies and decisions that have an enormous impact on their lives.”  The far-reaching policymaking consequences of the data are undeniable, but the New York Times touches upon a crucial question that has been overshadowed by the current push for transparency: what about quality? 
 
Without placing equal emphasis on collecting data that is timely, consistent, and of high quality, few benefits can be reaped from the release of data to the public.  A treasure trove of data that is rife with bias and plagued by inaccuracies is of little use to any researcher, statistician or village health care worker, regardless of whether they operate within or outside of the Bank.  Indeed, inaccuracies and biases in data can result in significant harm, inasmuch as the data is used to inform the policies of developing countries. 

Fortunately, the 7,000+ datasets that have been released to the public under the Open Data initiative represent some of the highest quality data currently available in a number of sectors.  In some sectors, however, data quality lags severely behind.  For example, despite the importance of the agricultural sector in reducing poverty and food insecurity throughout the developing world, serious weaknesses in agricultural statistics persist.  According to the 2008 findings of the FAO’s Agricultural Bulletin Board on Data Collection, Dissemination and Quality of Statistics, only two of the forty-four countries in Sub-Saharan Africa are considered to have high standards in data collection, while standards in twenty-one countries remain low.  As a result, the quality of the agricultural statistics collected in many countries is questionable, rendering the data ineffectual in guiding policy decisions aimed at benefitting the poor.

It is therefore crucial that the current mandate for open data go hand in hand with an equally strong mandate for better data – data collected based on sound survey and sample design, free from bias or error, and disseminated in a timely fashion.  Without improvements in data collection methodology, the data that constitutes the bulk of the development community’s knowledge about the realities of life in many of the world’s poorest countries will continue to suffer from inaccuracy and error.  Furthermore, many sectors that play major roles in the livelihoods of the extreme poor but on which little data is available – livestock, fisheries, and forestry, to name a few – will continue to be overlooked by both the international community and national governments with regard to future policy and flows of aid.

There are a number of ongoing efforts to improve data quality and coverage in the developing world.  One such effort is the Global Strategy to Improve Agriculture and Rural Statistics, a multi-institution initiative endorsed at the 41st Session of the United Nations Statistical Commission.  The Strategy assigns a pivotal role to methodological research in order to improve the quality and policy relevance of the available information specific to the agricultural sector.  In accordance with the main tenets of the Global Strategy, the Living Standards Measurement Study – Integrated Surveys on Agriculture (LSMS-ISA) project is collaborating with the Ministries of Agriculture and National Statistics Offices of its partner countries in Sub-Saharan Africa to design and implement systems of multi-topic, nationally representative panel household surveys with a strong focus on agriculture.  The project is implemented by the World Bank’s Living Standards Measurement Study team, which has been championing the cause of better data since its creation in 1980 under World Bank President Robert McNamara, as a response to his urgent call for better, more timely micro-data.  The LSMS-ISA project represents the team’s latest effort to improve data quality, motivated by the widespread problems faced by agricultural statistics in much of the developing world.  In addition to increasing the availability of data in Sub-Saharan Africa, the project is conducting methodological work to enhance the quality of survey data in several areas, including crop productivity, livestock and climate change.  The data generated by the project is intended to shed light on the links between agriculture and poverty reduction in the region, as well as to foster innovation and efficiency in statistical research in the sector.

The clarion call for open data heralds an era where people around the world will gain the ability to more knowledgeably campaign for their rights and to make their voices heard.  As governments increasingly respond to the need for transparency and move to freely share their data with citizens, individuals will be empowered to take an active part in their country’s development and in the improvement of their livelihoods.  However, as we move forward into a world where statistics play an increasingly powerful role in setting priorities and determining the direction of future policy efforts, it is essential to recognize that we cannot realize a more open and inclusive model for citizen-centric development without identifying the areas in which existing data is insufficient or problematic and working to bridge those gaps. 

We must recognize that open data is not enough.

Authors

Raka Banerjee

Project Coordinator, Development Data Group, World Bank

Anonymous
July 25, 2011

Maybe to someone involved in research Open Data isn’t enough, but on the ground the policy is severely damaging our relationships with clients. Many of them aren’t comfortable with the fact that anything they say or write can become public.

Tim Davies
July 25, 2011

It's clear that the quality of data supplied is a big issue in making open data effective.

But we also need to recognise that improving the use of data in policy making is not just about improving the quality of data supplied, and to recognise that collecting data is in itself a developmental intervention with many different impacts*.

Going beyond "open data" as simply releasing the data we already have, with a goal of more open and inclusive** development might involve:

Taking advantage of new digital approaches to change the way we collect data - empowering communities to collect the data that matters to them, as well as the data that matters for our global conversations (open governance data, as well as open government data)
Ensuring we spread the skills to access, work with and interpret data (and not just through developer intermediaries) widely amongst policy makers and local communities - and equip local communities to speak up when the data doesn't match their experiences
Developing better methods to combine big data, small data, and information (stories/content/analysis) - taking advantage of wider open information principles to ensure that when we find statistics in the big data, we can check their reliability, and check our interpretations of them by looking at small, local datasets, and looking at stories and narratives from citizen groups, to improve our policy making

It would be really interesting to see how learning from open data and other open knowledge initiatives so far can help us think about ways to improve data quality that both contribute to wider skills for working with data, and that mitigate the risks that can emerge from increased central data demands.

----
*Scott's 'Seeing like a state' (http://amzn.to/o0nl8e) highlights through a number of cases both how data collection and the categories that any centralised data collection imposes can impact upon how communities understand and organise themselves, often with unintended policy consequences; and how, unless data collection also takes into account local knowledge, customs and practice, policy made with large datasets can have problematic (even catastrophic) effects.

**The use of 'inclusive' as a separate term from 'open' is really important here. A lot of open data initiatives are implicitly 'meritocratic' (as in, 'the best argument' or 'the best use' from the data should win out), but often this meritocratic framing can come without considering whether people started from a position of equal opportunities, and so goals of inclusivity may not naturally flow from openness.

A Walji
July 25, 2011

I'm very pleased that a genuine conversation has started about the merits of Open Data and the possibilities of Open Development (a more inclusive model), including potential risks and pitfalls.

Clearly data quality matters, as well as who collects the data and whose reality the data represents. As much as multi-country, multi-year Rolls Royce survey tools are important, there is also a role for quicker citizen-driven surveys that give us a pulse on what's happening in real time. Tools like EpiSurveyor (in the health sector) and TextEagle in the private sector have shown us that sometimes a Timex tells the time as well as a Rolex. Listening to people's realities (be they on agricultural yields, teacher attendance, or the presence of medicines in clinics) can generate data much more frequently, cheaply, and reliably than before, and these tools rely on the experience of users (not just external data collectors).

It's the combination of citizen data and government data coming together that presents opportunities for new kinds of analysis, policy debate, and decision-making. Technology makes it easier to "listen" to our users and focus not just on governments as our clients but citizens as end-users. Unless we know their economic and social realities are improving, satisfying our clients is simply not enough.

Doug Hidden
July 25, 2011

Data quality is a moving target. Concerns about data quality often delay publication, to the point where perfect data is often too late for impact. That's why crowdsourcing quality through rapid publication is often better. It can also help data providers to manage quality risks and focus on what is material for information users.

AMIT SENGUPTA
July 26, 2011

For citizen-centric development, it is doubtful how macrodata could be more useful in eliciting microdata. I do not know whether the World Bank has a mechanism to handle such requests promptly, or how its staff and officers would react, because it would be additional work for each division to provide the required information, which, apart from the financial involvement, would be a gigantic task too.
The core issue will be how to call into play the wisdom of the human species. An experiment is going on in India to make Public Authority information available to the common man, through the promulgation of the Right to Information Act, 2005, on payment of a small fee of 0.22 USD. The bureaucracy formed a caucus among themselves, and whenever the common man asks for information which is sensitive and affects the working of the concerned department, there is a common reply from the first tier of Public Information Officers: "the required information is not available". Although there are procedural ways to make the information available, it would take at least 1 to 1 1/2 years, which means an individual must have perpetual patience to extract information from the bureaucracy.
Let the World Bank not suffer from this syndrome.

Jia H. Jung
July 29, 2011

Unguarded acceptance of the Open Data Initiative would be yet another outgrowth of how all of society has begun sourcing information in general.

Steeped in a quasi-infinity of information, half truths, opinions, and outright falsities, we laud the democracy of information sharing afforded by technology and instantaneous transmission.

This would be fine if individuals employed their own filters, but many do not. People of all fields and levels of education take information for granted and place accountability upon sources and systems rather than upon themselves for the decisions they make and for the new information that they propagate.

Search engines and databases are corporations themselves, generating revenue and approaching a monopoly on information (Google springs to mind) by prioritizing results in an order of importance as determined by algorithms which are in turn determined by secret invasions into consumers' lives to predict their market behavior...something which should have nothing to do with the facts that are made available to them if these facts are objectively accurate.

Just so, open data opens doors to manipulation of raw information into producing conclusions favored by the right people. This is an outcome that might trump imprecision in its risk to policy making.

The difference between the past and the present is not technology itself nor the ease of information sharing that it affords. The difference is how humans presuppose the quality of information, trusting blindly in systems in a way that they never would have before...further legitimizing misuse as being more democratic and therefore closer to truth.

We've reached a time in which open information affects decisions at a greater rate than rational human evaluations (as opposed to search engine algorithms or political influences) affect the information that makes the decisions.

Of course, I am not a statistician who could attempt to quantify this last statement, but even the observation causes me to demand quality above all things.

If technology can make access to information faster and broader, it can do the same to expedite studies that employ some necessary skepticism of their data. If technology cannot do this, we must make it so.

Ms. Banerjee's post is a sound reminder that in order to dispose of the chaff that accompanies democracy of information, one above all needs to know what one is looking for. And all pleasant surprises along the way will have substance in virtue of preceding knowledge.

Raka Banerjee
August 04, 2011

Many of the comments on this post raise important issues concerning open data and data collection processes. Tim Davies discusses some of the various steps necessary to improve the collection of data in order to empower communities, which is perhaps the most crucial component of achieving truly open, inclusive development, as Aleem Walji called for in his earlier post. Tim’s reference to Scott’s “Seeing Like A State” also highlights the dangerous consequences that can result not just from policies based on inaccurate and/or biased data, as I mentioned in my post, but also from the data collection process itself and its impact upon the enumerated communities. It serves as a reminder that the need for quality extends throughout the entire data collection process, and involves paying attention to local needs, customs and knowledge at every step of the process.

I also agree with Jia Jung’s comment on the unfortunate tendency of information users to increasingly place accountability for information quality entirely upon the generative sources and systems. Indeed, one of the greatest values of open knowledge initiatives such as the Bank’s Open Data Initiative is to empower users to review, replicate, and challenge existing data and policies, which requires claiming accountability as well as power with regard to information production.

However, I disagree that crowdsourcing quality through rapid publication is necessarily the way forward. It is first and foremost incumbent upon those involved in the collection of data to ensure that the information that they release is as accurate and free from bias as possible. It is not reasonable to outsource the accountability for doing one’s job properly to a population that is unable to effectively monitor the quality of the data collection process – nor is it always easy for users to identify overarching problems in data quality ex-post. That said, the issue of timeliness is crucial to informing relevant policy, and should not be overlooked. New technologies such as Computer Assisted Personal Interviewing (CAPI) software packages or field practices such as decentralized data entry can greatly reduce the amount of time necessary to produce clean, error-free data, with no attendant negative implications for quality.

Raka

Gérard Chenais
July 22, 2012

Quality of data is important, and more and more pressure is put on statisticians to 'assure' the quality of their outputs; but these outputs are mainly inputs to analysis processes, so quality of analysis is even more important than the quality of data; analysis would be useless without decisions for actions, as data would be useless without analysis, and finally the quality of the actions implemented is all that matters.

So opening data is necessary, but what is under way at the World Bank for assuring quality of analysis, quality of advice, quality of decisions, quality of action?