Published on Data Blog

Accessing World Bank Open Data in Stata

Stata is a statistical computing package widely used in the business and academic worlds. We use it at the World Bank and it’s great to see a new version of the wbopendata module that gives Stata users direct access to much of the data on data.worldbank.org.

Academic institutions and hundreds of users are already taking advantage of it – why not give it a try?

Why use the wbopendata module to access data?

It’s important to have convenient access to the best data available. The wbopendata module connects to the World Bank Open Data API and provides direct access to the latest version of the Bank’s data through the Stata interface – there’s no unnecessary downloading or management of data needed.

What's new in this version of wbopendata?

The new version of wbopendata lets you:

  • Access about 1,000 new indicators, bringing the total up from 4,200 to 5,300 time series.
  • Access the metadata of the downloaded series, including indicator definitions, the organization and/or agency responsible for collecting each series, and links to available supporting information.
  • Easily link the indicators downloaded to maps from within Stata.
  • Access data in three Stata-supported languages: English, Spanish or French.

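The wbopendata module lets you connect to information from over 256 countries and regions, going back to 1960. The accessible datasets include the World Development Indicators (WDI), which several of the download options below draw on.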

How do you use the wbopendata module?

Following the instructions on the wbopendata homepage, install the module by typing the command:

ssc install wbopendata

Once installed, the wbopendata module offers four possible download options:

  • Country – all indicators for all selected years for a selected country (WDI dataset)
  • Topic – all indicators for a selected topic, for selected years and all countries (WDI dataset)
  • Indicator – a selected indicator for selected years for all countries (any dataset)
  • Indicator and Country – a selected indicator for selected years for a selected country (any dataset)
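
As a rough sketch, the four options map onto command-line calls along these lines. The country codes and the topic code are illustrative assumptions, not taken from the post; check the module's help file for the codes available in your installed version.

```stata
* Sketch of the four download modes; all codes below are illustrative picks.
* Each call uses clear, so it replaces whatever data is in memory.

* 1. Country: all WDI indicators for one country (bra = Brazil, assumed example)
wbopendata, country(bra) clear

* 2. Topic: all WDI indicators under one topic (topic code 1 is an assumption)
wbopendata, topics(1) clear

* 3. Indicator: one indicator for all countries, reshaped to long format
wbopendata, indicator(it.cel.sets.p2) long clear

* 4. Indicator and country: one indicator for a few selected countries
wbopendata, country(bra;arg;chl) indicator(it.cel.sets.p2) long clear
```

Typing help wbopendata after installation lists the full set of options.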

Once installed, to open up the module’s graphical panel, type the following:

db wbopendata

[Image: Stata wbopendata window]

The latest version of wbopendata can also display the metadata for a given indicator. Here’s the metadata shown for the indicator “Mobile Cellular Subscriptions per 100 people” (indicator code it.cel.sets.p2) when you type:

wbopendata, language(en - English) indicator(it.cel.sets.p2) long clear latest

[Image: Stata wbopendata window]

Making maps with wbopendata in Stata

You can easily take the data from this kind of indicator and generate maps with Stata:

* Requires the user-written spmap command (ssc install spmap)
tempfile tmp
* Download the latest available value of the indicator for each country
wbopendata, language(en - English) indicator(it.cel.sets.p2) long clear latest
sort countrycode
save `tmp', replace
* Load the world map attribute data and merge in the indicator
sysuse world-d, clear
merge countrycode using `tmp'
* Average year of the latest observations, reported in the map note
sum year
local avg = string(`r(mean)',"%16.1f")
spmap it_cel_sets_p2 using "world-c.dta", id(_id)                          ///
    clnumber(20) fcolor(Reds2) ocolor(none ..)                             ///
    title("Mobile cellular subscriptions (per 100 people)", size(*1.2))    ///
    legstyle(3) legend(ring(1) position(3))                                ///
    note("Source: World Development Indicators (latest available year: `avg') using" ///
         "Azevedo, J.P. (2011) wbopendata: Stata module to access World Bank databases," ///
         "Statistical Software Components S457234 Boston College Department of Economics.", size(*.7))

The code above should generate a map like this one:

[Image: map of mobile cellular subscriptions per 100 people]

Reproducibility of analysis and keeping track of dataset vintages

One important advantage of wbopendata is that it facilitates the reproducibility of any analysis using WDI data in Stata. It is much easier for analysts to document how particular results were obtained, since the syntax used to conduct the analysis can embed both the names of the series used and the code to download them. This supports more open and transparent knowledge generation: others can reproduce the analysis, previous analyses are easier to update once newer data become available, and the principles of literate programming introduced by Donald Knuth can be taken a step further, since even the data used in the analysis can be embedded in the code and updated in real time.

One important implication for users of wbopendata is the need to carefully document the vintage of the dataset being used, best captured through the extract date of the series. It is important to remember that wbopendata taps into a live dataset that is updated at least twice a year, so the underlying data accessed through the API will keep changing.
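
One possible way to record the vintage is to stamp the extract date into the dataset and its file name at download time. The sketch below uses only built-in Stata commands alongside wbopendata; the label text, note wording and file name are illustrative choices, not a convention of the module.

```stata
* Download the series (same indicator as in the earlier examples)
wbopendata, indicator(it.cel.sets.p2) long clear

* Build a YYYYMMDD stamp from today's date
local today : display %tdCCYYNNDD date(c(current_date), "DMY")

* Record the extract date inside the dataset itself
label data "it.cel.sets.p2 via wbopendata, extracted `today'"
note: Downloaded with wbopendata on `today'; indicator it.cel.sets.p2.

* Save a dated copy so each extract date keeps its own file
save "it_cel_sets_p2_`today'.dta", replace
```

Running the analysis against the dated file, instead of a fresh download, keeps results stable even after the live data are revised.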

Making World Bank Data easier to access for specialist users

So, if you’re a Stata user, download the updated wbopendata module and start using the latest World Bank Open Data. If you use the open source R statistical software, there’s a similar module available.

The wbopendata module was developed and is maintained by Joao Pedro Azevedo from the Poverty, Gender and Equity Unit from the Poverty Reduction and Economic Management Team (LCSPP) in the Latin America and Caribbean Region of the World Bank.

 
