Since its inception, the World Bank’s Open Data initiative has generated considerable excitement and discussion about its potential to democratize development economics, as well as the way development itself is conducted around the world. In a speech at Georgetown University last year, World Bank President Robert Zoellick described the many benefits that flow directly from open data. Offering the example of a health care worker in a village, he spoke of her newfound ability to “see which schools have feeding programs . . . access 20 years of data on infant mortality for her country . . . and mobilize the community to demand better or more targeted health programs.” Beyond this, Zoellick argued that open data means open research, resulting in “more hands and minds to confront theory with evidence on major policy issues.”
The New York Times featured the Bank’s Open Data initiative in an article published earlier this month, calling the released data “highly valuable” and noting that “whatever its accuracy or biases, this data essentially defines the economic reality of billions of people and is used in making policies and decisions that have an enormous impact on their lives.” The far-reaching consequences of this data for policymaking are undeniable, but the New York Times touches on a crucial question that has been overshadowed by the current push for transparency: what about quality?
Unless equal emphasis is placed on collecting data that is timely, consistent, and of high quality, releasing data to the public will yield few benefits. A treasure trove of data that is rife with bias and plagued by inaccuracies is of little use to any researcher, statistician or village health care worker, whether they work inside or outside the Bank. Indeed, because such data is used to inform the policies of developing countries, its inaccuracies and biases can do significant harm.
Fortunately, the more than 7,000 datasets released to the public under the Open Data initiative include some of the highest-quality data currently available in a number of sectors. In other sectors, however, data quality lags far behind. For example, despite the importance of agriculture in reducing poverty and food insecurity throughout the developing world, serious weaknesses in agricultural statistics persist. According to the 2008 findings of the FAO’s Agricultural Bulletin Board on Data Collection, Dissemination and Quality of Statistics, only two of the forty-four countries in Sub-Saharan Africa are considered to have high standards in data collection, while standards in twenty-one countries remain low. As a result, the quality of the agricultural statistics collected in many countries is questionable, leaving the data of limited use for guiding policy decisions aimed at benefiting the poor.
It is therefore crucial that the current mandate for open data go hand in hand with an equally strong mandate for better data: data collected using sound survey and sample design, free from bias and error, and disseminated in a timely fashion. Without improvements in data collection methodology, the data that makes up the bulk of the development community’s knowledge about life in many of the world’s poorest countries will continue to suffer from inaccuracy and error. Furthermore, sectors that play major roles in the livelihoods of the extreme poor but for which little data is available (livestock, fisheries, and forestry, to name a few) will continue to be overlooked by both the international community and national governments when it comes to future policy and aid flows.
There are a number of ongoing efforts to improve data quality and coverage in the developing world. One is the Global Strategy to Improve Agricultural and Rural Statistics, a multi-institution initiative endorsed at the 41st Session of the United Nations Statistical Commission. The Strategy assigns a pivotal role to methodological research as a means of improving the quality and policy relevance of agricultural statistics. In line with the main tenets of the Global Strategy, the Living Standards Measurement Study – Integrated Surveys on Agriculture (LSMS-ISA) project is collaborating with the Ministries of Agriculture and National Statistics Offices of its partner countries in Sub-Saharan Africa to design and implement systems of multi-topic, nationally representative panel household surveys with a strong focus on agriculture. The project is implemented by the World Bank’s Living Standards Measurement Study team, which has championed the cause of better data since it was created in 1980 under World Bank President Robert McNamara in response to his urgent call for better, more timely micro-data. The LSMS-ISA project is the team’s latest effort to improve data quality, motivated by the widespread problems facing agricultural statistics in much of the developing world. In addition to increasing the availability of data in Sub-Saharan Africa, the project is conducting methodological work to improve the quality of survey data in several areas, including crop productivity, livestock and climate change. The data generated by the project is intended to shed light on the links between agriculture and poverty reduction in the region, as well as to foster innovation and efficiency in statistical research in the sector.
The clarion call for open data heralds an era in which people around the world will be able to campaign for their rights more knowledgeably and make their voices heard. As governments increasingly respond to the need for transparency and move to share their data freely with citizens, individuals will be empowered to take an active part in their country’s development and in improving their livelihoods. However, as statistics play an increasingly powerful role in setting priorities and determining the direction of future policy, it is essential to recognize that we cannot realize a more open and inclusive model of citizen-centric development without identifying where existing data is insufficient or problematic and working to bridge those gaps.
We must recognize that open data is not enough.


Comments
Data collection needs to be inclusive too
- Taking advantage of new digital approaches to change the way we collect data - empowering communities to collect the data that matters to them, as well as the data that matters for our global conversations (open governance data, as well as open government data)
- Ensuring we spread the skills to access, work with and interpret data widely among policymakers and local communities (and not just through developer intermediaries) - and equipping local communities to speak up when the data doesn't match their experiences
- Developing better methods to combine big data, small data, and information (stories/content/analysis) - taking advantage of wider open information principles so that when we find statistics in the big data, we can check their reliability, and check our interpretations of them against small, local datasets and against stories and narratives from citizen groups, to improve our policy making
It would be really interesting to see how lessons from open data and other open knowledge initiatives so far can help us think about ways to improve data quality that both build wider skills for working with data and mitigate the risks that can emerge from increased central data demands.
----
*Scott's 'Seeing Like a State' (http://amzn.to/o0nl8e) highlights through a number of cases both how data collection, and the categories that any centralised data collection imposes, can affect how communities understand and organise themselves, often with unintended policy consequences; and how, unless data collection also takes into account local knowledge, customs and practice, policy made with large datasets can have problematic (even catastrophic) effects.
**The use of 'inclusive' as a separate term from 'open' is really important here. A lot of open data initiatives are implicitly 'meritocratic' (as in, 'the best argument' or 'the best use' of the data should win out), but this meritocratic framing often comes without considering whether people started from a position of equal opportunity, and so goals of inclusivity may not naturally flow from openness.