The new World Development symposium on experiments in development: Has anyone got anything new to say about RCTs or are we just re-hashing the same-old arguments over again?

Following the announcement of the 2019 Nobel Prize in Economics, Arun Agrawal, the editor of World Development, decided to put together a special issue on perspectives on the experimental approach to development and poverty alleviation. This was done incredibly quickly, with authors submitting an abstract first, being accepted largely on this basis, and then being asked to write about 1000-1500 words within three weeks. This World Development symposium is now up online, and it is summarized by Arun in this twitter thread and by the editors in this introduction piece.

Given the light refereeing and fast turnaround time, as well as the huge amount of existing literature and debate around RCTs, I wondered whether anyone had anything new to say – or whether we would just see a re-hashing of existing arguments and a set of short summaries of longer papers. Many of the contributions indeed summarize arguments the authors or others have already made, with the usual complaints about the types of questions addressed, about external validity, many complaints about the words “gold standard” and “plumber”, arguments against strawmen (“we can only learn from RCTs”), etc. Others provide summaries of the use of RCTs in one area of the literature. Nevertheless, I found a number of papers did have something new to say (and I also realize there are many audiences for these pieces, and that papers that I found less new may be the first time others are seeing a particular argument). The contributions that stood out most to me presented me with new facts or formulations of ideas, or gave good concrete examples of broader phenomena:

Why did the early wave of RCTs die out in developing countries? Luciana de Souza Leão and Gil Eyal note that there was a first wave of family planning, public health and education experiments in developing countries that began in the 1960s and ended by the early 1980s. They compare a sample of 60 such experiments (the appendix gives a big list) to those of the more recent RCT-wave, and argue that the earlier ones tended to be of large-scale policies with long time horizons, and “the longer the study took, the less likely it was to conform to a “true” RCT design, as political resistance and the likelihood of substitution bias increased over time”. They argue that the key advantage of the new wave of RCTs was less political resistance, because both the nature of the partners (global NGOs rather than governments) and the nature of the interventions (the assistance many recent interventions bring is “relatively trivial”) changed. Another reason the early experiments may have received less interest from economists was who ran them: “First-wave researchers were a motley crew of sociologists, psychologists, public health and education experts, typically employed as consultants or staff at the Population Council, the Rockefeller and Ford Foundations, SSRC and USAID.” I hadn’t realized quite how many of these early experiments there were, and would love to read and learn more about them.

Contributions of local researchers and field staff, and challenges they face: Lennart Kaplan, Jana Kuhnt, and Janina I. Steinert discuss some of the issues around safety concerns, working conditions, and emotional burden faced by field staff, as well as their frustrations about lack of formal recognition for this work when it comes to research publications. They note that many young people are given large degrees of responsibility which may overwhelm them at times: “we did have extreme, extreme amounts of responsibility for not having a lot of experience. [A]nd then there were car accidents, people had all kinds of health issues, and have seen things they did not want to see.” On a more positive note, Joana Naritomi, Sandra Sequeira, Jonathan Weigel and Diana Weinhold document that RCTs do lead to more collaborations, as “papers involving RCTs were about 3.6 percentage points more likely to include co-authors from developing countries for the 2000–2019 period” (compared to only 6.2% of non-RCT development papers), and are also more likely to involve a co-author from an academic department outside of economics.

A call to measure environmental impacts of poverty programs and poverty impacts of environmental programs: Francisco Alpízar and Paul J. Ferraro discuss the lack of measurement of environmental impacts in programs designed to reduce poverty, the lack of RCTs of environmental programs, and several hypotheses for why this is the case. One idea that stood out was that “the disciplines that focus on challenges like species extinction and climate change are often framed as “crisis disciplines” for which the guiding ethical precept is “do something and do it now” and for which the solutions are often framed as unequivocally beneficial. A logical implication of these ethical perspectives is that RCTs are not needed in environmental contexts. We, of course, disagree with both the perspectives and their implication.”

A key advantage of experiments is directly testing what is possible through human intervention. Cyrus Samii (ungated) notes that while much of the discussion of RCTs deals with their role in reducing selection bias, an underappreciated benefit is in “debates about what policy makers should do and how they should do it” by “clarifying what is materially possible through human intervention…. From an experiment, in principle, you should have before you a recipe for creating an effect. Context-dependence means that replicability may sometimes be elusive in practice. One could measure scientific progress in terms of the ability to fashion recipes (including the contextual conditions) for giving rise to predictable effects… Observational studies are often deficient in this regard, because they do not control where and when we get the variation in treatments of interest”.

Sometimes the value of experiments is precisely when they are devoid of economic theory: Paul Glewwe has a useful categorization of education policies into demand-based policies, inputs, pedagogy, and governance – and notes that economic theory has much to say about demand-based policies (e.g. responses to prices/scholarships) and about governance (e.g. how schools are managed and teachers rewarded), but not much to say on the other two topics – and that “RCTs that have little or no basis in economic theory, such as RCTs that focus on different pedagogical practices, can offer much useful information on education policy even if they are not grounded in economic theory” – giving the example of experiments on teacher training programs. He cautions that the downside is that, without theory, the external validity of such findings is harder to assess, and replication in many settings becomes necessary.

Finally, in my own contribution, I ask: if it needs a power calculation, does it matter for poverty reduction? I’ll let you decide whether this is a re-hash of things you’ve already heard me say, but I tried to say something new. I take on the argument that randomized experiments played virtually no role in China’s growth, and that policymakers there could learn by “seeing whether something worked”, by asking which types of questions can be answered through simple before-after observation and which need large samples and careful counterfactuals. I note that in cases where the treatment effect is massive (e.g. migrating to a richer country, stabilizing hyperinflation), the bias from assuming that not much would have changed otherwise is second-order and won’t affect your decision-making. A second case is when life is otherwise fairly stable: if you use the same machine to make a similar number of garments each day, then change technology and see that you produce a lot more, you don’t need an RCT to tell you it worked. But then I note that research has found that both country growth rates and individual incomes are highly volatile, and most policies don’t lead to 100%+ gains – so it is really hard to just see whether something worked, and it is only through large samples and collective learning with careful counterfactuals that we can learn what no individual can learn simply by doing.
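To make the power-calculation point concrete, here is a minimal sketch using the textbook normal-approximation formula for a two-arm comparison of means with equal variances. It is not the calculation from the paper: the function name `sample_size_per_arm`, the 5% significance level, the 80% power, and the two effect sizes (in standard-deviation units) are all illustrative assumptions.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(effect_sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate units needed per arm to detect a difference in means.

    effect_sd is the treatment effect divided by the outcome's standard
    deviation (an assumed, illustrative quantity). Uses the standard formula
    n = 2 * (z_{1 - alpha/2} + z_{power})^2 / effect_sd^2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_sd ** 2)


# A massive effect relative to the noise (assumed 2 SD) versus a modest one (assumed 0.1 SD)
massive = sample_size_per_arm(2.0)
modest = sample_size_per_arm(0.1)
print(f"massive effect: {massive} per arm; modest effect: {modest} per arm")
```

Under these assumptions, an effect twice the size of the outcome's standard deviation needs only about 4 units per arm, which is the sense in which you can simply see that it worked. An effect of one tenth of a standard deviation needs about 1,570 per arm, well beyond what any individual could learn from their own experience.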

Readers, which papers did you find most interesting/novel/thought-provoking? Let me know in the comments.

Authors

David McKenzie

Lead Economist, Development Research Group, World Bank

Join the Conversation

Marat
January 09, 2020

I guess that for cases like China, risk management comes first, as policy makers have de facto been running experiments without a full understanding of the causal chains. This is an area where a lot could still be done. In development circles, risk management too often remains a paper risk matrix without true changes to the programme approach.
Your description of China's growth is somewhat reminiscent of the development of medicine (I wonder if that would be a good article to write). Paracetamol was adopted without a full understanding of its molecular mechanisms. If paracetamol were developed today, it wouldn't be possible to sell it in Europe, as it would run into regulations. Digitalis is another medicine with a somewhat similar story.