How we do Open Data: #1 - choosing development indicators


A recent question from Lorenz Noe caught our eye - how do we choose which indicators to publish in World Development Indicators (WDI), a major part of our Open Data Initiative? It’s a good question, so I thought I’d write a post about that - and we’ll also post something similar in the data help desk.

1. There’s no perfect indicator

There are sometimes gaps in the data


Like many things in life, selecting indicators for the WDI is not an exact science. The intention is to provide good coverage of key development issues, but many of the countries that we work with do not have the quantity - or quality - of data that exists in countries like the United States, for example.

Take a look at the Federal Reserve Economic Data (FRED for short); that database alone includes 123,662 economic time-series about the US, including indicators for sub-national areas like states (I’ll tell you at the end of this post how many indicators we include in the current 2014 edition of the WDI...). It’s usually not possible to find that level of coverage for indicators in many low and middle income countries.

So, while we follow a set of basic principles, it’s relatively rare that we’re able to find and publish the perfect indicator: the one that’s most relevant for measuring a particular development issue, that’s available for every country in the world, for every year, with very high levels of accuracy.

More often, indicators have one or more limitations. We try to describe some of these in our metadata and in the sections of the WDI tables called “about the data”. You’ll find this metadata available in the book, in the on-line tables, and in the databank application - it includes the indicator definition, the source, periodicity, method of aggregation used, statistical concepts and methodology, relevance for development, and limitations and exceptions. The idea behind providing these notes is to help data users decide whether any specific indicator is fit for their purpose.

Judgement is required on our part to select an indicator to publish in the WDI; and judgement is required on your part to decide whether it’s useful. 

2. Is the indicator relevant?

Spurious Correlation

 

This might be one of the most difficult principles to implement - but we are lucky enough to be able to draw on the experience and expertise of the development professionals of the World Bank Group. We work closely with these specialists to identify the best indicators for each of the topics covered in the WDI: World View, People, the Environment, the Economy, States and Markets, and Global Links. And we also consult with the international statistical community, through agencies like UNESCO’s Institute for Statistics, and through working groups like the Inter-Agency and Expert Group on MDG Indicators - the last group has some very useful advice for choosing indicators for the Post-2015 development framework.

And we’re always on the lookout for new ideas and indicators we can improve, through new research that might produce new datasets, discussions on social media (and, yes, blog posts), and other communication channels. It’s also worth noting here that the WDI is not the only source of development indicators - we also have more specialised data sets available in the Open Data Catalog for users looking for additional and more detailed indicators on specific topics - e.g., Gender, Education, Health, etc.

3. Is it Open Data?


Since the World Bank launched its Open Data initiative in April 2010, we obviously need to make sure that any of the data we publish in the WDI can be distributed freely according to the Terms of Use for datasets. We obtain many indicator datasets and time-series from partner agencies, including many specialized agencies of the United Nations; through World Bank country teams, who use data from publicly available sources such as national statistical agencies; or through the work of World Bank staff, such as poverty incidence estimates from PovcalNet.

Clearly, if any agency is unable to provide their data because of the data license that we use, we can’t select those indicators for the WDI. For example, the International Telecommunication Union (ITU) currently provides most of its indicators through a subscription service, with a subset available free of charge. We are only able to include the “free” indicators in the WDI - which in this case are some of the most relevant.

4. Does the indicator have good coverage - over time, and across the world?

Population density

It might seem obvious, but we try to include indicators in the WDI for which estimates are available for most countries in the world - or, at least, for countries which are “clients” of the World Bank. So indicators that might be very relevant but which are available for only a few countries might not be chosen for inclusion. And indicators which are only available at a single point in time - or for very few years - might also be difficult to include.

There are some exceptions to that: for example, we currently include the purchasing power parities produced from the 2005 round of the International Comparison Program for the benchmark year of 2005 only (we also, of course, include the data from the more recent 2011 round). Good coverage is important to be able to provide a complete picture of development, but it’s also important so that stuff can be added up - so that totals or averages for regional or country groupings can be produced.
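
To make the link between coverage and "adding up" concrete, here is a minimal sketch in Python. It is not how the WDI aggregates are actually produced: the economy codes, the values, the function names and the `min_coverage` threshold are all made up for illustration.

```python
from typing import Dict, Optional

# Hypothetical toy data: values of one indicator by economy code and year.
# None marks a missing estimate. Codes and numbers are invented.
Observations = Dict[str, Dict[int, Optional[float]]]

sample: Observations = {
    "AAA": {2010: 10.0, 2011: 12.0},
    "BBB": {2010: 5.0, 2011: None},
    "CCC": {2010: None, 2011: None},
}


def country_coverage(obs: Observations, year: int) -> float:
    """Share of economies that have an estimate for the given year."""
    if not obs:
        return 0.0
    available = sum(1 for series in obs.values() if series.get(year) is not None)
    return available / len(obs)


def group_total(obs: Observations, year: int, min_coverage: float = 0.5) -> Optional[float]:
    """Sum the available values for a year.

    Returns None when fewer than `min_coverage` of the economies report,
    because a total built from too few members would be misleading.
    The default threshold is an illustrative assumption, not a WDI rule.
    """
    if country_coverage(obs, year) < min_coverage:
        return None
    values = [series.get(year) for series in obs.values()]
    return sum(v for v in values if v is not None)


print(country_coverage(sample, 2010))  # two of three economies report
print(group_total(sample, 2010))       # enough coverage, so a total is produced
print(group_total(sample, 2011))       # too little coverage, so no total
```

In this toy example the 2011 total is withheld because only one of the three economies has an estimate, which is the situation that makes sparsely covered indicators hard to include.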

 

5. Are different years and different countries comparable?

Compare the GDP of selected countries over time

It’s also important to try and select indicators which are comparable - over time, and between countries. One of the strengths of the WDI is the ability to compare the values or the rates of change in different indicators between countries, or to compare a country to the average or total for a region or a country grouping. And it’s only possible to produce aggregates with indicators that are compiled for each country on a comparable basis.

For example, estimates of Gross Domestic Product - the single most accessed indicator published in the WDI dataset - are compiled by countries according to the System of National Accounts, a framework for compiling comparable economic statistics agreed by the United Nations Statistical Commission. Another good example is the Under-5 Mortality Rate: in this case, there are often many estimates of the same indicator, and the Inter-Agency Group on Child Mortality assesses all sources and makes a judgement on their reliability and comparability, to produce a single series comparable between countries.

There are exceptions: for example, WDI includes series on national poverty rates. Since each country uses its own methodology for calculating these rates, the data are not comparable and totals and averages can’t be derived - and, moreover, values should not be compared between countries. But despite this, our judgement is that these series are important enough for development to include in the WDI.

6. Is the indicator produced by a good, reliable source, with regular updates?

Selected WDI Partners

The international compilation of indicators requires well-managed and sustainable data collection and compilation methodologies if the series are to be maintained over time. And that’s why we prefer indicators produced by established agencies, institutions and companies. You can find a complete list of partners in the WDI book front matter.  Sometimes, interesting and relevant new indicators are produced through innovative research or other activities, but we have to review such indicators to ensure that their compilation will continue using sound processes. Sometimes we include new indicators on a pilot basis, but we might remove these indicators should their production not continue.

7. Not SMART? Maybe you’re RACCCCCTAAPE.


You might have heard this acronym: SMART. You’ll see it used in management textbooks to help define good objectives and good indicators: Specific, Measurable, Achievable, Realistic, Time-bound - or some variation on that theme. Those ideas may not be so relevant for choosing indicators for the WDI, since we’re not setting objectives.

But the principles do help: if you’re using an indicator for measuring progress, for example, part of being specific is knowing whether you can easily interpret any change. If it goes up (or down), can you say that something has got better or worse? More useful than being SMART for the WDI are the general principles for understanding the meaning of quality as it relates to statistical data.

There are several documents about that; one useful checklist was produced by the Netherlands’ Central Bureau of Statistics and is here. A summary: Relevance, Accuracy, Coherence, Clarity, Comparability, Completeness, Confidentiality, Timeliness, Accuracy, Accessibility, Plausibility, Extent of detail. As far as we can, we try to look at all these aspects of statistical quality in the WDI.

Do you have any suggestions for any indicators we might include in WDI 2015?

 ---

Indicators mentioned in this post:

Poverty headcount ratio at $1.25 a day (PPP) (% of population)

Poverty headcount ratio at national poverty line (% of population)

Gross domestic product (current US$)

Mortality rate, under-5 (per 1,000 live births)



Oh, and how many indicators are in WDI 2014? The download file for the full database contains 336,168 rows of data - each row is a specific time-series indicator (there are 1,334 of those) for a specific country, economy, or grouping of countries or economies (and there are 252 of those).
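
The row count above is simply the number of indicators multiplied by the number of economies and groupings. A quick check in Python, using only the figures quoted in this post (the variable names are ours):

```python
# Figures quoted in the post for WDI 2014.
indicators = 1334   # distinct time-series indicators
economies = 252     # countries, economies, and groupings

# One row per indicator per economy or grouping.
rows = indicators * economies
print(f"{rows:,}")  # prints 336,168
```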

Authors

Neil Fantom

Manager, Development Data Group, World Bank

Sarven Capadisli
June 02, 2014

Some feedback:
* Include attribute values for the observations so that the measurements can be qualified. (No, including them as part of the indicator text is not good enough)
* Establish what the World Bank Group considers to be a country consistently across all of its departments.
* Differentiate between countries and regions in the API i.e., do not pile up all "areas". Otherwise, just rename the country to something more suitable e.g., "area"?
* Provide the (meta)data in SDMX-ML format accessible via a REST API. ("Clicking" one indicator at a time through the Web interface is not good enough)
* Synchronize the data representations across formats i.e., XML, CSV, SDMX-ML.
* Provide machine-friendly metadata concerning provenance.
* Datasets should not be just "dropped". Existing users of the datasets should have a way to track down datasets that are "no longer updated".
Looking at the WB's APIs or data dumps for the past 2+ years, there is hardly any notable improvement on any of the points mentioned above (which were previously discussed in the world-bank-api mailing list numerous times). What sort of a conversation is the WB engaged in with the community to improve any of this? Is it documented?

Katia Diaz
May 21, 2014

Excellent post. The World Bank’s World Development Indicators give us a wide range of indicators that we can look at for a specific country when analyzing a particular investment project - for example, to evaluate, according to the lines of investment, the value of the indicators that should be related to the project’s expected outcome.