Open Data at the World Bank: 2 years old today

Today is the second birthday of the Bank's Open Data Initiative, announced by the Bank's President, Robert Zoellick, on April 20, 2010:


"It's important to make the data and knowledge of the World Bank available to everyone. Statistics tell the story of people in developing and emerging countries and can play an important part in helping to overcome poverty."


To mark the occasion, we've created a new blog specifically aimed at discussing data and open data issues related to development. And I'm going to use the opportunity of this first post to briefly look back and recap what's been achieved, and—more importantly—to outline some of the plans we have for moving ahead.

When we started out, we focused on making existing public datasets freely available and as accessible as possible, building on the new Access to Information Policy. To make sure our data was usable, we "ate our own dog food;" that is, we developed our own website using the same data and tools (an application programming interface, or API) that we provide to the public. And we developed an "open" Terms of Use for World Bank datasets, together with a single catalog listing of all our open data resources.
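As a rough illustration of what working with the API can look like, here is a minimal Python sketch that builds a request URL for one indicator and reads the observations out of the JSON reply. The endpoint URL, the example indicator code (SP.POP.TOTL), the assumed reply shape, and all function names are assumptions made for illustration and are not taken from this post; please check the API documentation for current details.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base endpoint; the live API may use a different version or host.
BASE_URL = "https://api.worldbank.org/v2"


def indicator_url(country, indicator, start_year, end_year):
    """Build a request URL for one indicator, one country, and a year range."""
    query = urlencode(
        {"format": "json", "date": f"{start_year}:{end_year}"},
        safe=":",  # keep the year-range colon readable in the URL
    )
    return f"{BASE_URL}/country/{country}/indicator/{indicator}?{query}"


def parse_observations(payload):
    """Return (date, value) pairs, oldest first, skipping missing values.

    Assumes the decoded reply is a two-item list: paging metadata
    followed by a list of records with "date" and "value" keys.
    """
    if not isinstance(payload, list) or len(payload) < 2 or not payload[1]:
        return []
    rows = [(row["date"], row["value"]) for row in payload[1]
            if row.get("value") is not None]
    return sorted(rows)


def fetch_observations(country, indicator, start_year, end_year):
    """Download and parse one series (hypothetical helper; needs network access)."""
    with urlopen(indicator_url(country, indicator, start_year, end_year)) as reply:
        return parse_observations(json.load(reply))
```

Calling `fetch_observations("BR", "SP.POP.TOTL", 2009, 2010)` would, under these assumptions, return a short list of year and value pairs that can be charted or loaded into a spreadsheet.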

Since then, we've been adding datasets to the catalog at a rapid rate. The number of development indicators has grown from around 2,000 in the initial release to over 8,000 today. We've also added a new interface and API for accessing data on the Bank's projects and operations, with project activities geo-coded and shown on maps; datasets on the financial aspects of the Bank's business (we're a Bank, after all); and a library of raw data from household and other surveys, an enormously popular resource for development researchers. This week we've updated the World Development Indicators database, the open data source behind the open data website, with the 2012 edition.

Being more open has helped us provide World Bank data in multiple ways: we've published to the International Aid Transparency Initiative's registry, we've mashed up World Bank projects with indicators of development progress, and we've launched interactive applications for analyzing key datasets on topics such as poverty, climate change, jobs, financial inclusion, and aid flows. We've developed free applications for mobile devices on all major platforms. And, earlier this month, the Bank launched a new Open Knowledge Repository with over 2,100 research and knowledge reports available under a Creative Commons (CC-BY) license, with more reports being added every week.

That's an impressive list. But we've also learnt that others can often do things better than we can. Software developers have been quick to build applications using the datasets. We had overwhelming responses to the Apps for Development competition we ran last year and to the Apps for Climate competition, which closed last month. To solve development challenges, you need to understand them; statisticians and designers have developed impressive, innovative visualizations, and story-tellers like Hans Rosling have brought the numbers to life to help explain and illuminate key development policy issues.

So what's next? How can we manage the growth of online data properties? How do we make sure that the data we provide, which often describe complex phenomena, can be interpreted and used correctly? How can open data have an impact on our mission to reduce poverty?

In the last few months, we've been thinking about these and other questions, and we've concluded that some of the priorities and principles are:

To build better links between open data resources. Integrate open data resources, but be careful not to constrain growth—the principle should be "small pieces loosely joined," using open metadata standards. An improved data catalog, bringing datasets from different data repositories together, is a key priority. We've made a start this week, with a new design for the data website country pages, bringing together resources from several data repositories.

To provide better metadata. Data is not enough: good-quality, accessible context and documentation (metadata) are often just as important. Part of this should be about World Bank data specialists providing documentation in searchable, accessible, and usable ways; but it's also important to provide online community mechanisms for sharing the knowledge of other data producers and users.

To promote interaction and use. We plan to continue to improve the tools available for accessing and visualizing our datasets. As an example, we've recently upgraded our DataBank application, so that you can save and share reports, charts, and maps. We aim to develop and share new and innovative ways to help you explore data.

To be more local. Data are often most powerful when used to address pressing local concerns. Meeting this demand means finding ways to provide more datasets at sub-national or transactional levels, with higher-frequency updates and with improved support for local languages. We're working on "curating" key development indicators at sub-national levels, and we're working with countries to support their efforts to publish datasets openly.

To listen more, and to support the global community. We think it should be easier to ask questions, receive answers, and provide feedback. We want to be responsive to what we hear and to support the global open data community. We're taking some actions: we're planning to support online discussions about the datasets we publish, through blogs and community forums, and in partnership with the US Government we're hosting the next International Open Government Data Conference here at World Bank headquarters, on July 10-12, 2012.

We're starting to implement some of these plans now, but we're very eager to hear from you. What do you think of these ideas? What are we still missing? I'm looking forward to your thoughts.

