Published on Data Blog

How we mass-produced reproducible Human Capital Project country briefs

February 05, 2020


“It is supposed to be automatic, but actually you have to push this button.”

John Brunner, 1968


At the World Bank and other international institutions, demand for country-specific briefs on newly released global and regional data is often high. Country briefs play a key role in facilitating policy dialogue. This was the case during the launch of the Human Capital Index (HCI) at the October 2018 Annual Meetings in Bali, Indonesia. A combination of print and digital resources helped familiarize technical and non-technical audiences with the index: the World Development Report 2019 (World Bank 2019), the Human Capital Project Booklet (World Bank 2018), a website, and an explainer video. A key component of these resources was the set of two-page country briefs prepared for all 157 countries for which the HCI was calculated.

The feedback on the usefulness of these two-pagers, together with demand from regional and sectoral Vice Presidents for easily accessible information, led to the mass production of three additional sets of country profiles in the lead-up to the 2019 Annual Meetings: i) 50 country profiles on the socio-economic disaggregation of the HCI; ii) 185 country profiles with illustrative indicators relevant to human capital development; and iii) 48 scorecards assessing commitment to the human capital agenda in the Africa region. How did we go about mass-producing these products?

The technical challenge was to produce sets of 48 to 185 country briefs in an automated manner. Each brief contains a large amount of country-specific information (more than 100 variables in the case of the scorecards), and its text varies with the data of the corresponding country. Even more challenging, the workflow had to accommodate constant updates to the input data for some or all countries. Producing the briefs manually, one by one, was not an option. Such an approach is inefficient and error-prone, especially once future updates are taken into account.
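To illustrate what "text that varies with the data" can mean in practice, here is a minimal sketch in Python. It is not the authors' code, which used Stata, R, and R Markdown. The function name `describe_hci` and its arguments are hypothetical. It assumes each country's HCI (a value between 0 and 1) and a regional average are already available, and it picks the wording of a comparison sentence based on those numbers.

```python
def describe_hci(country: str, hci: float, region: str, regional_avg: float) -> str:
    """Compose one data-dependent sentence for a country brief (hypothetical helper)."""
    # Convert the 0-1 index into a whole-number percentage for the text
    pct = round(hci * 100)
    # Choose the comparison wording from the data rather than hard-coding it
    if abs(hci - regional_avg) < 0.005:
        comparison = f"about the same as the {region} average"
    elif hci > regional_avg:
        comparison = f"higher than the {region} average"
    else:
        comparison = f"lower than the {region} average"
    return (
        f"A child born in {country} today will be {pct} percent as productive "
        f"when she grows up as she could be if she enjoyed complete education "
        f"and full health. This is {comparison}."
    )
```

With more than 100 variables per brief, many small rules like this one can feed a single template. That is why the text cannot simply be copied from one country's brief to the next.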

We needed a system that would let us produce any or all of the country briefs at once, painlessly and automatically. The system also had to be flexible enough for the three authors to write code in Stata, R, and LaTeX simultaneously. Given the iterative nature of the project, it was also critical that the system track changes, so that we could go back in time and work from earlier versions. Fortunately, new technologies came to our aid.
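The requirement to rebuild "any or all" briefs in one step can be sketched as a single driver that fills one template per country and accepts an optional subset. This is a minimal illustration under stated assumptions. The template, the field names (`name`, `hci`, `eys`), the function `render_briefs`, and the `SAMPLE_DATA` values are all hypothetical placeholders, not real HCI figures or the authors' setup.

```python
from string import Template

# Hypothetical brief template; the real briefs were R Markdown/LaTeX documents
BRIEF_TEMPLATE = Template(
    "Human Capital Index brief: $country\n"
    "HCI: $hci\n"
    "Expected years of school: $eys\n"
)

# Hypothetical placeholder data keyed by country code (not real HCI values)
SAMPLE_DATA = {
    "AAA": {"name": "Country A", "hci": 0.55, "eys": 11.2},
    "BBB": {"name": "Country B", "hci": 0.40, "eys": 8.5},
}


def render_briefs(data, countries=None):
    """Render one brief per country; pass `countries` to rebuild only a subset."""
    selected = list(data) if countries is None else list(countries)
    briefs = {}
    for code in selected:
        row = data[code]  # a KeyError flags a country missing from the input data
        briefs[code] = BRIEF_TEMPLATE.substitute(
            country=row["name"],
            hci=f"{row['hci']:.2f}",
            eys=f"{row['eys']:.1f}",
        )
    return briefs


if __name__ == "__main__":
    # Rebuild everything, or only the countries whose input data changed
    for code, text in render_briefs(SAMPLE_DATA, ["BBB"]).items():
        print(text)
```

Because the data and the template are kept separate, an update to the input data for a few countries only requires re-running the driver for those countries.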

We relied on the concept of Reproducible Research (Gandrud 2015; Stodden, Leisch, and Peng 2014) and started by creating a GitHub repository to track, contribute, and share all the files while building an understandable history of the project. Each of us then worked on our assigned tasks: compiling indicators from multiple public sources, composing dynamic text that varies with the data, creating charts, and integrating all the inputs in R Markdown. The technology enabled seamless collaboration among the authors and made last-minute changes possible with relatively little effort. Both the process and the end output were a success (see it for yourself here, here, and here).
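One common way to knit many briefs from a single R Markdown file is to declare a parameter in the file's YAML header and call `rmarkdown::render()` once per country with a different `params` value. This is essentially what clicking the "knit" button does for a single document. The Python sketch below only builds those `Rscript` calls and runs them when `run=True`. It assumes a hypothetical file `brief.Rmd` that declares a `country` parameter. The helper names and output file names are also hypothetical, and we do not claim the repositories are organized this way. Country codes are assumed to be plain ISO-style codes without quote characters.

```python
import subprocess


def knit_command(rmd, country, out_dir="output"):
    """Build an Rscript call that knits one parameterized R Markdown brief."""
    out_file = f"brief_{country}.pdf"
    # rmarkdown::render() accepts params, output_file, and output_dir arguments
    r_expr = (
        f'rmarkdown::render("{rmd}", '
        f'params = list(country = "{country}"), '
        f'output_file = "{out_file}", output_dir = "{out_dir}")'
    )
    return ["Rscript", "-e", r_expr]


def knit_all(rmd, countries, run=False):
    """Build (and optionally execute) one knit command per country."""
    commands = [knit_command(rmd, code) for code in countries]
    if run:
        for cmd in commands:
            # check=True stops the batch at the first brief that fails to knit
            subprocess.run(cmd, check=True)
    return commands
```

Keeping `run=False` as the default makes it easy to inspect the generated commands before spending time on a full rebuild.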

To promote the World Bank's Open Data, Open Code, and Open Knowledge philosophies, all the Stata do-files, R scripts, R Markdown files, charts, and raw data for the first two sets of country briefs are publicly available in the GitHub repositories worldbank/HCP_AM19_Illustrative_indicators and worldbank/SES_HCI_AM19. We hope other projects inside and outside the World Bank will use these repositories as an example of reproducible documentation, mass production of files, and collaboration among people with different skills.

Fifty-one years after John Brunner published Stand on Zanzibar, we could say: “The production of the HCI country briefs is supposed to be automatic, but actually you have to click the ‘knit’ button.”


Authors

R. Andres Castaneda Aguilar

Zelalem Yilma Debebe

Martín E. De Simone
