Published on Data Blog

Introducing WBGAPI: A new python package for accessing World Bank data


Version 1.0.4 of the wbgapi Python package is now available. The package has been on the Python Package Index for almost a year, and the latest version adds several features that make exploring and searching databases easier and more interactive.

Python packages for World Bank data have been around for a while, but WBGAPI is relatively new. I wrote it to take advantage of API improvements that have also existed for some time but were difficult to understand or use, and were not well supported in other packages. I also wanted to include better pandas support and, in general, make it easier to retrieve data without a lot of extra code.

The README file provides an overview, and the package itself provides extensive documentation through Python's help() function. To get you started, here is a quick overview of 5 features that make WBGAPI unique.

1. WBGAPI makes databases easier to understand and use

The World Bank API sometimes gives the illusion that all indicators reside in one big database. For example:

Image

Actually, the API consists of over 63 databases for a total (as of this writing) of 17,517 indicators. If you request an indicator such as population (SP.POP.TOTL) that is part of the World Development Indicators (WDI), the API returns the data from the WDI. But if you request one of the natural capital indicators (say NW.NCA.FORE.TO for forests) that is not in the WDI, the indicator comes from another database such as Wealth Accounts. And if you request a bunch of different indicators, it's possible they will come from different databases unless you explicitly specify which database you want for each indicator. To make it even more complicated, different databases often contain different countries and time periods, and are updated on different schedules. All of these details are available in the API, if you know how to find them.

WBGAPI has a different implementation that provides greater clarity and consistency about which database your data is coming from and what that database includes. By default, data requests are made against the WDI; it's not possible to inadvertently get data from an unspecified dataset.

WBGAPI provides an easy way to list all available databases:

(Several sample outputs below have been condensed or truncated for length)

Image
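
In place of the screenshot, here is a minimal sketch, assuming wbgapi is installed and the API is reachable. In an interactive session, wb.source.info() displays the table of databases. The helper name database_names is hypothetical; it collects the same records from wb.source.list() for use in code.

```python
def database_names():
    """Return {database id: database name} for every database in the API."""
    # Imported inside the function so the sketch can be defined even
    # where wbgapi is not installed yet (pip install wbgapi).
    import wbgapi as wb

    # wb.source.list() yields one dictionary per database.
    return {row["id"]: row["name"] for row in wb.source.list()}


# Interactive use (requires network access):
#   >>> import wbgapi as wb
#   >>> wb.source.info()      # readable table of databases
#   >>> database_names()      # the same IDs and names as a dictionary
```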

2. Easier search and discovery

WBGAPI includes info() functions like the one shown above for exploring the indicators, countries and other elements of the API. These are optimized for both interactive mode and Jupyter notebooks, but you can access the same information programmatically as iterable objects. For example:

Image
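
As a rough illustration of the programmatic route, the sketch below iterates over wb.series.list() and keeps each indicator's ID and name. The helper name find_series and the sample keyword are hypothetical; the q and db arguments are used as I understand them from the package documentation, so check help(wb.series.list) before relying on them.

```python
def find_series(keyword, db=None):
    """Return (id, name) pairs for indicators matching `keyword`."""
    import wbgapi as wb  # third-party: pip install wbgapi

    # db=None leaves the package default in place, which this post
    # describes as the WDI.
    return [(row["id"], row["value"])
            for row in wb.series.list(q=keyword, db=db)]


# Example (requires network access):
#   >>> for series_id, name in find_series("population"):
#   ...     print(series_id, name)
```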

You can also search metadata at the database level:

Image

Or you can access metadata for a series or country/economy:

Image
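
The metadata search and the two metadata lookups described in this section might be combined as follows. This is a sketch under assumptions: metadata_overview is a hypothetical helper, and the example arguments are illustrative choices rather than the ones in the screenshots.

```python
def metadata_overview(phrase, series_id, economy_id):
    """Collect a metadata search plus series and economy metadata."""
    import wbgapi as wb  # third-party: pip install wbgapi

    return {
        # Search across the metadata of the current database
        "search": wb.search(phrase),
        # Metadata for one indicator, e.g. "SP.POP.TOTL"
        "series": wb.series.metadata.get(series_id),
        # Metadata for one country/economy, e.g. "COL"
        "economy": wb.economy.metadata.get(economy_id),
    }


# Example (requires network access):
#   >>> result = metadata_overview("fossil fuels", "SP.POP.TOTL", "COL")
#   >>> result["series"]
```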

3. Simple but powerful data queries

WBGAPI lets you request a single indicator or any number of indicators from a database; the same is true for countries, aggregates and time periods. For example, you can request 2 indicators for 40 countries, or 40 indicators for 2 countries (or all countries) in a single request. If your request exceeds the API limit in some way, WBGAPI will chunk it into multiple API calls.
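
To make the idea of chunking concrete, here is a generic sketch of splitting a long list of identifiers into batches. It is not WBGAPI's actual implementation, and the batch size shown is a made-up number rather than a real API limit.

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    items = list(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Seven hypothetical indicator IDs split into batches of three
# (the limit of 3 is illustrative only).
ids = ["IND.%d" % n for n in range(7)]
batches = list(chunked(ids, 3))
```

Each batch would then become one API call, with the results concatenated before they are returned to the caller.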

Here are just a few examples building on pandas data frames (pandas is optional but makes WBGAPI much more powerful). Note that lengthy outputs have been abbreviated for simplicity’s sake:



Image


Image
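
Since the screenshots are not reproduced here, the sketch below shows roughly what a multi-indicator, multi-country request looks like with wb.data.DataFrame. The helper name indicators_frame and the example indicators, countries and years are illustrative assumptions, not the queries from the original images.

```python
def indicators_frame(series, economies, years):
    """Return a pandas DataFrame of `series` for `economies` over `years`."""
    import wbgapi as wb  # third-party; pandas must be installed too

    # series and economies may each be a single ID or a list of IDs;
    # time accepts a range of years.
    return wb.data.DataFrame(series, economies, time=years)


# Example (requires network access):
#   >>> indicators_frame(["SP.POP.TOTL", "NY.GDP.PCAP.CD"],
#   ...                  ["USA", "CAN", "MEX"],
#   ...                  range(2015, 2020))
```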


And since WBGAPI supports pandas, it's easy to use the built-in graph functions or whatever graph package you prefer (ggplot, seaborn, etc.):

Image

Image
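
A minimal plotting sketch, assuming pandas and matplotlib are installed alongside wbgapi. The helper name plot_population and the example countries are hypothetical. The frame is transposed so that years run along the x axis and each economy becomes one line.

```python
def plot_population(economies, years):
    """Plot total population over time, one line per economy."""
    import wbgapi as wb  # third-party; plotting also needs matplotlib

    # numericTimeKeys=True labels the year columns 2000, 2001, ...
    # instead of YR2000, YR2001, ...
    df = wb.data.DataFrame("SP.POP.TOTL", economies, time=years,
                           numericTimeKeys=True)

    # Rows are economies and columns are years, so transpose before plotting.
    return df.T.plot(title="Population, total")


# Example (requires network access):
#   >>> plot_population(["BRA", "ARG", "COL"], range(2000, 2020))
```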

4. Custom dimensions

Most databases in the API have 3 dimensions: series, country (or economy) and time. But an increasing number of databases have additional dimensions. For instance, WDI Archives includes a version dimension, and the ICP database adds Classification. WBGAPI allows you to view all dimensions of a database:

Image

and query them:

Image
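
As a sketch of both steps, the helpers below list a database's dimensions with wb.source.concepts() and then pass a custom dimension as a keyword argument to wb.data.DataFrame. The helper names are hypothetical, and the database ID 57 in the usage comment is my assumption for WDI Archives; confirm it with wb.source.info() before using it.

```python
def dimension_names(db):
    """Return the dimension names (concepts) defined for database `db`."""
    import wbgapi as wb  # third-party: pip install wbgapi

    # concepts() returns a dictionary keyed by dimension name.
    return list(wb.source.concepts(db=db))


def fetch_version(series, economy, version, db):
    """Fetch `series` for `economy` from one archived `version` of `db`."""
    import wbgapi as wb

    # Custom dimensions such as version are passed as keyword arguments.
    return wb.data.DataFrame(series, economy, db=db, version=version)


# Example (requires network access; 57 is an assumed database ID):
#   >>> dimension_names(57)
#   >>> fetch_version("SP.POP.TOTL", "BRA", 201904, db=57)
```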

Custom dimensions accept single or multiple values, just like any other dimension. In this example, WDI version numbers are in YYYYMM format. This lets you request data from, say, the April release of the WDI over several years using a Python range that increments in steps of 100:

Image

Image
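
The arithmetic behind that range is easy to check in plain Python. In the sketch below, april_versions is a hypothetical helper, and the years are illustrative. Whether an April release actually exists for every year is something to verify against the database itself.

```python
def april_versions(first_year, last_year):
    """Return YYYYMM codes for April of each year, both ends inclusive."""
    # Adding 100 to a YYYYMM code advances it by exactly one year.
    return list(range(first_year * 100 + 4, last_year * 100 + 5, 100))


versions = april_versions(2015, 2019)
# versions == [201504, 201604, 201704, 201804, 201904]

# The codes can then be passed as the version dimension, for example
# (requires network access; 57 is an assumed ID for WDI Archives):
#   >>> import wbgapi as wb
#   >>> wb.data.DataFrame("NY.GDP.PCAP.CD", "BRA", db=57, version=versions)
```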

5. Resolving country codes

One common and onerous task in development data is resolving country codes across different systems and as official names change. WBGAPI includes a beta version of a country name lookup utility that can usually guess the correct code for names of countries in the UN system as well as those used by other international organizations and donors. The output is optimized for human readability, but is based on a Python dictionary, so it can be used programmatically as well.

Image
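
A minimal sketch of programmatic use, assuming wb.economy.coder() returns a dictionary-like mapping of names to codes when it is given a list of names, with None for names it cannot resolve. The helper name resolve_codes and the sample names are hypothetical.

```python
def resolve_codes(names):
    """Return {name: ISO3 code or None} for a list of country names."""
    import wbgapi as wb  # third-party: pip install wbgapi

    # Copy the result into a plain dictionary for downstream code.
    return dict(wb.economy.coder(names))


# Example (requires network access):
#   >>> resolve_codes(["Argentina", "Swaziland", "South Korea"])
```

Because the utility is in beta, it is worth reviewing any names that come back as None before joining datasets on the resolved codes.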

Getting Started

To get started, install WBGAPI using pip:

pip install wbgapi

Then review the README file to get a sense of typical uses.

For more examples, see the cookbook on GitHub.



Authors

Tim Herzog

Senior Data Scientist
