Published on Data Blog

5 Reasons to Check out the World Bank’s new Data Catalog

Please help us out by completing this short user survey on the new data catalog.

Data is the key ingredient for evidence-based policy making. A growing family of artificial intelligence techniques is transforming how we use data for development. But for these and more traditional techniques to succeed, they need a foundation in good data: high-quality data that is well managed and appropriately stored, accessed, shared and reused.

The World Bank’s new data catalog transforms the way we manage data. It provides access to over 3,000 datasets and 14,000 indicators and includes microdata, time series statistics, and geospatial data.

Open data is at the heart of our strategy

Since its launch in 2010, the World Bank’s Open Data Initiative has provided free, open access to the Bank’s development data. We’ve continuously updated our data dissemination and visualization tools, and we’ve supported countries in launching their own open data initiatives.

We’re strong advocates for open data, but we also recognize that some data, often by virtue of how it was acquired or the subjects it covers, may carry limitations on how it can be used. Rather than leaving such data unpublished, the new data catalog makes many of these datasets available and documents any restrictions on their use. The new catalog is an extension of the open data catalog and relies heavily on work previously done by the microdata library.

Five reasons to use the new data catalog

The catalog provides a single entry point to all Bank datasets, each tagged with a consistent license, essential metadata and other features that make data easy to find. While we have introduced many features, here are my five favorites:

  1. Search
For the first time, you can search the Bank’s survey, time series and geospatial data across all regions and topics from one place. Even better, you can search inside datasets, down to the names of indicators and variables. This kind of “deep search” is great for discovering data you may not even have known existed.
  2. Geospatial catalog
For the first time (I know, I said it again), we are releasing geospatial datasets covering topics such as land cover, roads, and energy. This wouldn’t have been possible without some serious heavy lifting by our Geospatial Operations and Support Team (GOST).
  3. Metadata
The only thing more important than data is metadata. Who made this dataset? How was it produced or acquired? When was it last updated? Who’s allowed to use it? What have people already done with it? Are the algorithms shared that would allow the construction of a dataset or indicator to be reproduced? The catalog tags every dataset with some basic metadata that is consistent across all datasets, but the amount and nature of the additional metadata varies with the type of dataset.
  4. Data licenses - essential metadata that is often ignored
Each dataset is clearly tagged with a data license that describes its terms of use, which makes responsible reuse much easier. We are happy to announce that the Bank’s open data terms of use will use the Creative Commons Attribution 4.0 license (CC-BY 4.0) for all our open datasets. That means anybody is free to share and use our open datasets, as long as they credit them appropriately. Creative Commons is one of the world’s most widely used and understood licensing schemes, so more people can use and re-use our data with confidence that they’re allowed to do so.
  5. Tracking data use - citations & visualizations
Data producers often don’t get enough credit for their data work, partly because it’s hard to track where data have been cited, used and re-used. Understanding where and how the data are used helps us understand the impact of a dataset and direct investments to priority data areas. It also incentivizes data providers to release their data more proactively, much as they would research papers, and gives them more accountability. We are also tracking visualizations published by our teams using our data, as a means to better understand how our data is being used in various articles and blogs.

How are you using the new catalog? Are there any features you particularly like or would like to see? You can get in touch on Twitter (@worldbankdata) or by email (data@worldbank.org), and you can take our data catalog survey.

Authors

Malarvizhi Veerappan

Senior Data Scientist, Development Data Group, World Bank
