Published on Let's Talk Development

Quality of computer code and reproducibility of economic research

Authors argue that incremental improvements in code quality could quickly bring long-lasting benefits. Photo by Desola Lanre-Ologun on Unsplash

A recipe, regardless of how detailed, is insufficient to make even a simple dish. Culinary schools flourish because they fill the information gap between reading a recipe and making a good meal. A scientific publication is like a recipe: the computer code that accompanies it is the auxiliary information required to implement or replicate the publication's findings, and this information is difficult or impossible to convey in academic journals.

 

"Programs must be written for people to read, and only incidentally for machines to execute."

—Abelson, Sussman, and Sussman, Structure and Interpretation of Computer Programs (1985)

 

Empirical economic research relies heavily on data analysis. The algorithm for constructing a poverty line in a particular country and year consists of hundreds of steps. How should income be imputed for households owning their homes versus those renting their dwellings? How should the calculations account for the value of home-produced products or services provided for free by household members (such as childcare)? How should poverty lines adjust for differences in access to public services in urban and rural areas? The main steps of such procedures can be described in a publication, but computer code is needed to capture the details critical for implementing and replicating them in different environments.
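As a purely illustrative sketch of one such step, consider adding imputed rent to owner-occupiers' incomes so that owners and renters are compared on the same welfare measure. The variable names, values, and imputation rule below are hypothetical, not an actual poverty-measurement methodology:

mata
income    = (100 \ 80 \ 120)    // reported cash income per household
owns_home = (1 \ 0 \ 1)         // 1 = owner-occupier, 0 = renter
rent_hat  = (30 \ 25 \ 40)      // predicted market rent of each dwelling

// add imputed rent to owners' incomes so that owners and renters
// are ranked on a comparable welfare measure
welfare   = income + owns_home :* rent_hat
end

Each of the hundreds of such choices is easy to state in prose but is pinned down precisely only in the code.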

The editorial boards of the leading economic journals focus primarily on reproducibility. Contributors are supposed to provide all of the components (code, data, documentation) necessary for other researchers to “duplicate the results of a … study using the same materials and procedures as were used by the original investigator.” They are not required to provide code that is readable and understandable, however—a surprising lapse given the numerous times coding errors have led to the publication of incorrect conclusions (see, for example, Levitt 1997, Hoxby 2000, and Reinhart and Rogoff 2010).

The code accompanying an economic publication should be written in a way that allows readers to understand the paper’s logic and main assumptions; check its correctness, efficiency, and portability; and modify the code.

Consider the following two Stata (Mata) implementations of the algorithm for calculating the Gini coefficient, one of the most popular measures used in poverty assessments.

g = 1 - quadcross(sort(X, 1), ((rows(X)::1):*2:-1)) / quadcolsum(X) / rows(X)                        (1)


// Gini index using formula: G = (N + 1) / N - 2 / (N^2 * mean(X)) * sum(P_i * X_i)                  (2)
// N: population size, X_i: income of person i
// P_i: rank of person i: the richest gets rank 1 and the poorest rank N

N = rows(X)                        // determine the total sample size
sorted_X = sort(X, 1)              // sort observations by income, poorest first

// for a sorted income vector, the income ranks P are (N \ N - 1 \ ... \ 1)
P = (N::1)

sum_PX = quadcross(sorted_X, P)    // rank-weighted sum of incomes
mean_X = quadcolsum(X) / N         // mean income

g = (N + 1) / N - 2 / (N^2) * sum_PX / mean_X

 

The first is “overoptimized” code that most reviewers would find impenetrable. Making such code available adds little to research reproducibility, as only a few people can comprehend and modify it.

The second implementation writes the algorithm in a way that most readers can use and improve upon. Its structure mirrors the formula used for the calculations and allows even a person unfamiliar with Stata (Mata) to follow the program logic. It explicitly states the main steps used to calculate the Gini coefficient, allowing researchers to investigate, for example, the numerical properties of the algorithm.
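As a quick worked check, both implementations can be run on a toy income vector. For X = (1 \ 2 \ 3 \ 4), the rank-weighted sum in (2) is 4·1 + 3·2 + 2·3 + 1·4 = 20, the mean income is 2.5, and both versions return a Gini coefficient of 0.25:

mata
X = (1 \ 2 \ 3 \ 4)    // toy income vector

// implementation (1)
g1 = 1 - quadcross(sort(X, 1), ((rows(X)::1):*2:-1)) / quadcolsum(X) / rows(X)

// implementation (2)
N = rows(X)
sorted_X = sort(X, 1)
P = (N::1)
sum_PX = quadcross(sorted_X, P)    // 4*1 + 3*2 + 2*3 + 1*4 = 20
mean_X = quadcolsum(X) / N         // 2.5
g2 = (N + 1) / N - 2 / (N^2) * sum_PX / mean_X

g1, g2    // both display .25
end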

The principle of “programming for others” is rarely followed in applied economic analysis. Both supply and demand factors determine the quality of the code that gets accepted. On the supply side, much analytical work is produced under tight deadlines; researchers often lack the time to improve the readability of their code, an effort that can increase the time spent on a project by more than 30 percent. Most analytical work is performed by junior staff or consultants who either lack the skills to produce reusable code or are not motivated to invest their time in “beautifying” their programs for future users. For their part, senior researchers and management do not seem to value code quality, viewing it as a low-level input, and are therefore unwilling to invest resources or delay a project to refactor code.

These short-term considerations can be costly for organizations in the long run. Prioritizing the rapid publication of results over the quality of computer code, or taking shortcuts in writing code to achieve goals more quickly, generates “technical debt”: the cost of additional rework caused by choosing an easy (limited) solution now instead of a better approach that would take longer. According to the National Institute of Standards and Technology, up to 80 percent of software development costs are spent identifying and correcting code errors. Technical debt also brings legal, reputational, opportunity, talent, and other costs. These costs are highly heterogeneous: a script that produces a simple chart and is not intended to be reused bears lower costs than a large project with an extended lifespan. Experience teaches, however, that code intended to be used only once often ends up being reused.

The software industry offers a range of low-cost solutions that research institutions could adopt to improve code quality, thereby strengthening the reproducibility and replicability of empirical research. They include guidelines for writing code that can be understood and reused by authors, co-authors, and collaborators. External code reviews could improve the quality of the code and significantly reduce the number of errors by enforcing simple rules on naming conventions, code formatting, and commenting; such reviews require minimal programming skill from reviewers and can be performed quickly by junior staff. Centralized archives and repositories would help researchers maintain code and data and make it easier to discover and reuse existing code.
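To illustrate one such rule (with a hypothetical variable name), descriptive naming costs a single edit but changes how much a reader has to reverse-engineer:

mata
// before: the reader must work out what "a" holds
a = quadcolsum(X) / rows(X)

// after: the same computation, now self-documenting
mean_income = quadcolsum(X) / rows(X)
end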

Research institutions should budget time and resources for activities that enhance the reproducibility and transparency of their analytical projects. They should also provide technical support to researchers and analysts in reviewing and auditing the quality of their code. Even incremental improvements in code quality could quickly bring long-lasting benefits.


Authors

Michael M. Lokshin

Lead Economist, Office of the Regional Chief Economist, Europe and Central Asia
