Published on Data Blog

Demystifying ICP purchasing power parity calculations using Python: Regional results

April 07, 2021

New results from the ICP including purchasing power parities (PPP), price level indexes and PPP-based expenditures for reference year 2017 are now available at icp.worldbank.org. This blog series, edited by Edie Purdie, covers all aspects of the ICP and explores the use made of these data by researchers, policymakers, economists, data scientists and others. We encourage users to share their data applications and findings in this blog series via icp@worldbank.org.

Every few years, results from the International Comparison Program (ICP) make headlines around the world as they shed new light on the size and structure of the global economy. Underlying these results are purchasing power parities (PPPs) for countries across the globe. PPPs are exchange rates or conversion factors that enable us to convert one currency into the other while also—and this part is key—adjusting for the fact that prices of the same goods and services may differ between countries.

PPPs are essential tools in any analysis that requires adjusting for differences in price levels across countries.
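As a minimal illustration of the idea, the snippet below works through the arithmetic for a single item. All figures are hypothetical, and a real PPP averages many such price relatives rather than relying on one item. The price level index shown is the ratio of the PPP to the market exchange rate, scaled to 100.

```python
# Hypothetical figures, for illustration only
price_a = 150.0      # price of an item in country A, in A's currency
price_b = 10.0       # price of the same item in country B, in B's currency
market_rate = 20.0   # hypothetical market exchange rate, A's currency per unit of B's

# Price relative: units of A's currency needed to buy what one unit of B's currency buys
ppp_a_per_b = price_a / price_b

# Price level index of A with B = 100: the PPP relative to the market exchange rate
pli = 100 * ppp_a_per_b / market_rate

print(ppp_a_per_b, pli)
```

Under these assumed numbers the price level index is below 100, meaning the item is cheaper in country A than in country B once prices are converted at the market exchange rate.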

In this blog we describe the calculation steps and the main formulas needed to estimate ICP PPPs using a worked-out coded example. To help with the task, we use mock average price data and Python, a free, open-source programming language with an ever-growing community of users around the world.

Details of ICP PPP calculations have been publicly available for years. The methodology section of the ICP website and Chapter 5 of the ICP 2017 report Purchasing Power Parities and the Size of World Economies provide a general account of the calculation of ICP PPPs. The ICP PPP eLearning course also provides an overview of the calculation process. Furthermore, Deaton and Heston (2010) provides details on the choice of formulas used in the calculations and their properties as they apply to ICP PPPs. However, ICP PPP calculations are not the most intuitive concept, and providing another way to access this information will be a helpful refresher to some, and a useful initiation to those who are programming-adept but newcomers to ICP PPPs.

Example of estimating PPPs using Python

What follows are code excerpts from a Jupyter notebook, organized sequentially to show how ICP PPPs are estimated. The full notebook with the entire code set in an executable online environment is available here (no installations are required, but it may take some time to load the first few times).

Input Data

We start by loading the input dataset containing mock average price data and other relevant country-level information. A review of the price and expenditure data required to estimate PPPs is provided in the World Bank Data Blog post "How does the ICP measure price levels across the world?"

Load required Python libraries

## Load libraries
import pandas as pd
import numpy as np
import statsmodels.api as sm

Load and display input price data

# Load price data
data="price_data.csv"
prices=pd.read_csv(data)
prices # Show full dataset

This mock dataset contains four countries ('country') and three basic headings ('bh'): garment, rice, and pork. 'Basic headings' in the ICP literature refer to detailed expenditure categories containing similar item varieties; for example, the 'Rice' basic heading contains several rice varieties. The basic heading is also the lowest level of aggregation at which PPPs are calculated. The different item varieties in each basic heading are noted under the 'item' column; for example, within 'garment' there are three item varieties, identified as 'garment 1', 'garment 2', and 'garment 3'. Finally, each item has an average price in the country's local currency unit ('price') and an importance indicator ('imp') recording the item's relative importance in the country's consumption within its basic heading. Following the guidelines provided by the ICP Technical Advisory Group, countries assign a weight of '3' to items identified as 'important' within a given basic heading and a weight of '1' to items deemed unimportant.

In practice, the full ICP classification consists of 155 basic headings, and the number of items varies from one basic heading to another. Also, not all countries are able to report prices for all items. These two realities are reflected in the example: some basic headings contain more items than others, and prices for some items are missing in some countries.

Basic heading PPPs

PPPs are first estimated at the basic heading level, giving one PPP per basic heading for each country.

The estimation procedure involves averaging price relatives for individual items from different countries within each basic heading to obtain basic heading-level PPPs. This is done via a regression method known as the weighted country product dummy (CPD-W).

The CPD-W is carried out within each basic heading by regressing the logarithm of the observed country item prices on item dummies (one for each item) and country dummies (one for each country other than the numeraire). The CPD-W method also incorporates the country reported item-level importance indicators discussed earlier with the idea of "down-weighting" unrepresentative items during the calculation.
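The regression described above can be sketched for a single basic heading using only NumPy. This is an illustrative sketch, not the notebook's code: the countries, items, prices and importance weights are hypothetical, and the prices are constructed without noise (each price equals a country PPP times an item price level) so that the regression recovers the assumed PPPs of 10 and 0.5. Weighted least squares is implemented here by scaling each row by the square root of its weight.

```python
import numpy as np

# Hypothetical observations for one basic heading:
# (country, item, average price in local currency, importance weight)
obs = [
    ("A", "item1", 2.0, 3),
    ("A", "item2", 4.0, 1),
    ("A", "item3", 8.0, 3),
    ("B", "item1", 20.0, 1),
    ("B", "item2", 40.0, 3),   # country B did not price item3
    ("C", "item1", 1.0, 3),
    ("C", "item3", 4.0, 1),    # country C did not price item2
]
numeraire = "A"

# One dummy per non-numeraire country, one dummy per item
countries = [c for c in sorted({o[0] for o in obs}) if c != numeraire]
items = sorted({o[1] for o in obs})

X = np.zeros((len(obs), len(countries) + len(items)))
for r, (country, item, _, _) in enumerate(obs):
    if country != numeraire:
        X[r, countries.index(country)] = 1.0
    X[r, len(countries) + items.index(item)] = 1.0

y = np.log([o[2] for o in obs])                  # log prices
w = np.array([o[3] for o in obs], dtype=float)   # importance weights

# Weighted least squares: ordinary least squares on rows scaled by sqrt(weight)
sw = np.sqrt(w)
beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

# Exponentiated country coefficients are the basic heading PPPs
ppp = {numeraire: 1.0}
ppp.update({c: float(np.exp(beta[k])) for k, c in enumerate(countries)})
print(ppp)
```

With real price data the fit is not exact, and the importance weights then determine how strongly each item's price pulls on the estimated PPPs.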

Select the base or numeraire currency

numeraire = 'country2'

The numeraire is the currency against which all the estimated PPP values are expressed. In the case of the global PPP results, the numeraire is the US dollar. Here we select the currency of 'country2' as the numeraire, which makes 'country2' the base or reference country.
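The regression loop in the next step selects its regressors by looking for columns prefixed 'c_' (country dummies) and 'i_' (item dummies), but the excerpt does not show how those columns are created. One possible way to build them, assuming pandas' get_dummies and a few hypothetical rows in the same layout as the mock dataset, is sketched below. The numeraire's dummy is dropped so that its PPP is fixed at 1.

```python
import pandas as pd

numeraire = "country2"

# Hypothetical rows laid out like the mock price dataset
prices = pd.DataFrame({
    "country": ["country1", "country2", "country3", "country1", "country2"],
    "bh": ["rice"] * 5,
    "item": ["rice 1", "rice 1", "rice 1", "rice 2", "rice 2"],
    "price": [30.0, 2.0, 21.0, 45.0, 3.5],
    "imp": [3, 3, 1, 1, 3],
})

# One 0/1 column per country and per item
country_dummies = pd.get_dummies(prices["country"], prefix="c", dtype=float)
item_dummies = pd.get_dummies(prices["item"], prefix="i", dtype=float)

# Drop the numeraire's dummy: it becomes the reference category
country_dummies = country_dummies.drop(columns="c_" + numeraire)

prices_d = pd.concat([prices, country_dummies, item_dummies], axis=1)

# The same selection rule the regression loop uses
dummy_cols = [c for c in prices_d.columns if c.startswith(("c_", "i_"))]
print(dummy_cols)
```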

Run the CPD-W on each basic heading and store results

# Lists that collect the results for each basic heading
l_coef = []
l_bh = []

for bh in prices.bh.unique():
    tempdf = prices[prices.bh == bh]

    # Regressors: country ('c_') and item ('i_') dummy columns
    X = tempdf.loc[:, [x for x in tempdf.columns if x.startswith(('c_', 'i_'))]]
    y = np.log(tempdf['price'])

    wts = tempdf['imp']

    wts_cpd = sm.WLS(y, X, weights=wts)
    res = wts_cpd.fit()
    res_eparams = np.exp(res.params)

    print("\n", "Basic Heading:", bh, "\n")
    print('Exponentiated Parameters:', "\n", res_eparams)

    l_coef.append(res_eparams)
    l_bh.append(bh)

coef = np.array(l_coef, dtype=float)

coef = np.round(coef, 4)  # round to 4 decimals
cols = list(X)  # store column heads of X as a list

coef[coef == 1] = np.nan  # replace PPPs that were exp(0)=1 with np.nan

Display the estimated basic heading PPPs

Basic Heading PPPs
bh	country2	country1	country3	country4
garment	1.0	9.7435	20.3606	0.0947
pork	1.0	13.8749	18.9851	0.0917
rice	1.0	14.0847	10.5113	0.0672
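Because the PPPs within a basic heading all come from one regression, they can be re-expressed against a different base country by simple division. The short sketch below rebases the rice PPPs from the table above to 'country1'. The choice of 'country1' as the new base is arbitrary and made only for illustration.

```python
countries = ["country2", "country1", "country3", "country4"]

# Rice basic heading PPPs from the table above (country2 = 1)
rice = [1.0, 14.0847, 10.5113, 0.0672]

new_base = "country1"  # arbitrary choice, for illustration
b = countries.index(new_base)

# Divide every PPP by the new base country's PPP
rice_rebased = {c: p / rice[b] for c, p in zip(countries, rice)}
print(rice_rebased)
```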

Above-basic heading PPPs

Next, PPPs estimated at the basic heading level are aggregated using national accounts expenditure values in local currency units for each country as weights.

The aggregation method involves constructing bilateral PPPs for each pair of countries, using basic heading-level national accounts expenditure values as weights from each country in turn. First, a Laspeyres-type bilateral PPP is calculated between each pair of countries and then a Paasche-type bilateral PPP. The geometric mean of the Laspeyres- and Paasche-type bilateral PPPs gives us the Fisher-type bilateral PPP between each pair of countries in the dataset.
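The three bilateral indexes can be sketched as follows. The basic heading PPPs are taken from the table above, but the expenditure values are hypothetical, so the output will not reproduce the matrices shown in this post. The sketch assumes the usual formulation: the Laspeyres-type PPP is an arithmetic mean of basic heading PPP ratios weighted by the base country's expenditure shares, and the Paasche-type PPP is a harmonic mean weighted by the other country's shares.

```python
import math

countries = ["country2", "country1", "country3", "country4"]

# Basic heading PPPs from the table above, in the column order of `countries`
bh_ppp = {
    "garment": [1.0, 9.7435, 20.3606, 0.0947],
    "pork":    [1.0, 13.8749, 18.9851, 0.0917],
    "rice":    [1.0, 14.0847, 10.5113, 0.0672],
}

# Hypothetical expenditures in local currency units (not from the post)
bh_exp = {
    "garment": [100.0, 1500.0, 1200.0, 12.0],
    "pork":    [80.0, 900.0, 2500.0, 6.0],
    "rice":    [50.0, 1100.0, 400.0, 5.0],
}

def shares(k):
    """Expenditure shares of country k across basic headings."""
    total = sum(bh_exp[b][k] for b in bh_exp)
    return {b: bh_exp[b][k] / total for b in bh_exp}

def laspeyres(j, k):
    """PPP of country j relative to base k, weighted by k's shares."""
    w = shares(k)
    return sum(w[b] * bh_ppp[b][j] / bh_ppp[b][k] for b in bh_ppp)

def paasche(j, k):
    """PPP of country j relative to base k, weighted by j's shares."""
    w = shares(j)
    return 1.0 / sum(w[b] * bh_ppp[b][k] / bh_ppp[b][j] for b in bh_ppp)

def fisher(j, k):
    """Geometric mean of the Laspeyres- and Paasche-type PPPs."""
    return math.sqrt(laspeyres(j, k) * paasche(j, k))

n = len(countries)
fisher_matrix = [[fisher(j, k) for k in range(n)] for j in range(n)]
for name, row in zip(countries, fisher_matrix):
    print(name, [round(v, 3) for v in row])
```

In this formulation the Paasche-type PPP of one country relative to another is the reciprocal of the Laspeyres-type PPP in the opposite direction, a pattern that is also visible in the matrices displayed in this post.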

Load the basic heading level expenditure values

# Load basic heading expenditure values
# Should contain bh and countries with prefix c
code = "bhdata_exp.csv"
df_bh = pd.read_csv(code, index_col="icp_bh")

# Sort cols with numeraire as col1
def sorting(first_col, df):
    columns = df.columns.tolist()
    columns.remove(first_col)
    columns.insert(0, first_col)
    return df.reindex(columns, axis=1)

# c_numeraire is not defined in this excerpt; it is assumed to hold the
# numeraire's column name in the expenditure file
df_bhexp = sorting(c_numeraire, df_bh)

# Sort rows alphabetically
df_bhexp = df_bhexp.sort_values('icp_bh')

Calculate and display bilateral PPPs (Laspeyres-, Paasche-, and Fisher-type)

Laspeyres-type bilateral PPPs:
	country2	country1	country3	country4
country2	1.000	0.072	0.056	12.639
country1	12.467	1.000	0.640	172.687
country3	14.229	1.294	1.000	185.295
country4	0.078	0.006	0.005	1.000

Paasche-type bilateral PPPs:
	country2	country1	country3	country4
country2	1.000	0.080	0.070	12.889
country1	13.857	1.000	0.773	156.306
country3	17.899	1.563	1.000	203.452
country4	0.079	0.006	0.005	1.000

Fisher-type bilateral PPPs:
	country2	country1	country3	country4
country2	1.000	0.076	0.063	12.763
country1	13.143	1.000	0.703	164.292
country3	15.959	1.422	1.000	194.162
country4	0.078	0.006	0.005	1.000

As a last step, the Gini-Éltető-Köves-Szulc (GEKS) method is applied to the matrix of Fisher-type bilateral PPPs. GEKS PPPs are calculated between each country relative to the numeraire or base country. To this end, the first step is to divide each country row of the Fisher-type bilateral PPP matrix by the row of the numeraire country. Each row will then contain two direct PPPs (each country to itself and directly to the numeraire country) and n−2 indirect PPPs (each country to the numeraire country via each of the other third countries), where n equals the total number of countries in the dataset. Finally, the GEKS PPP for each country relative to the numeraire is given by the geometric mean of the direct and indirect PPPs in each respective country row.

GEKS PPPs are considered 'multilateral' because the GEKS procedure uses both direct and indirect PPPs and thus takes into account the relative prices between all the countries as a group. The GEKS method is needed to make the Fisher-type bilateral PPPs transitive and base country-invariant. Transitivity means that the PPP between any two countries should be the same whether it is computed directly or indirectly through a third country. Base country-invariant means that the PPPs between any two countries should be the same regardless of the choice of base or numeraire country.
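The GEKS step and the transitivity property can be checked with a few lines of plain Python. The sketch below uses the Fisher-type matrix displayed above, where each entry is the PPP of the row country relative to the column country. Because those inputs are rounded to three decimals, the results should land close to, but not exactly on, the GEKS PPPs reported in this post. The geomean helper is written for this sketch and is not the notebook's own function.

```python
import math

countries = ["country2", "country1", "country3", "country4"]

# Fisher-type bilateral PPPs as displayed above (row relative to column)
fisher = [
    [1.000, 0.076, 0.063, 12.763],
    [13.143, 1.000, 0.703, 164.292],
    [15.959, 1.422, 1.000, 194.162],
    [0.078, 0.006, 0.005, 1.000],
]

def geomean(values):
    """Geometric mean of positive numbers (helper for this sketch)."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))

n = len(countries)

# GEKS PPP of country j relative to k: geometric mean, over every link
# country l, of the PPP of j relative to k computed via l
geks = [
    [geomean(fisher[j][l] / fisher[k][l] for l in range(n)) for k in range(n)]
    for j in range(n)
]

base = countries.index("country2")
geks_ppp = {c: geks[j][base] for j, c in enumerate(countries)}
print({c: round(v, 3) for c, v in geks_ppp.items()})

# Transitivity: the direct PPP equals the indirect PPP via a third country
j, k, m = 1, 2, 3
direct = geks[j][k]
indirect = geks[j][m] * geks[m][k]
print(direct, indirect)
```

The same check applied to the Fisher-type matrix itself would generally fail, which is why the GEKS step is needed.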

Calculate and display GEKS PPPs

# Calculate GEKS multilateral PPPs
# Requires the earlier nangmean function

geks = np.zeros(shape)  # zero 'country x country' matrix
nrow = len(geks)  # gets the number of rows
ncol = len(geks[0])

for row in range(nrow):
    for col in range(ncol):
        geks[row][col] = nangmean(fi[row] / fi[col])

# as we need a vector of PPPs, not a matrix
geks_vec = np.zeros(shape=(1, len(df_bhexp.columns)))

j = len(geks_vec[0])

for col in range(j):  # one PPP per country, or col of bhexp df
    # geomean over each row, w/ each col rebased to country in col1
    geks_vec[:, col] = nangmean(geks[col, 0] / geks[0, 0])

geks_ppp = np.array(geks_vec)

GEKS Multilateral PPPs:
country2	country1	country3	country4
1.0	12.568	16.405	0.08
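The GEKS code above relies on a nangmean helper that is defined earlier in the full notebook and is not shown in this post. A plausible minimal version, assuming the helper is a geometric mean that ignores missing values, might look like the sketch below. The actual notebook implementation may differ.

```python
import numpy as np

def nangmean(values):
    """Geometric mean that skips NaN entries (assumed behavior of the helper)."""
    arr = np.asarray(values, dtype=float)
    # Average the logs while ignoring NaNs, then transform back
    return float(np.exp(np.nanmean(np.log(arr))))

example_complete = nangmean([1.0, 4.0])
example_missing = nangmean([2.0, np.nan, 8.0])
print(example_complete, example_missing)
```

Skipping missing values matters when a bilateral PPP is unavailable for some pair of countries, since a single NaN would otherwise propagate through the whole average.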

In the above example we showcased the main steps to calculate PPPs. Information about the overall ICP methodology is provided on the ICP website.

Authors

William Vigil-Oliver

International Comparison Program (ICP), The World Bank

Shriya Chauhan

Consultant
