What works to get children into school? How can we improve their learning outcomes?
Glewwe and Muralidharan (2015) have a review of educational research in developing countries that seeks to answer just these questions…
WAIT A MINUTE! Don’t we have a whole bunch of these reviews already? Say, for example, Asim et al. (2015), Conn (2014), Glewwe et al. (2014), Kremer et al. (2013), Krishnaratne et al. (2013), Masino & Niño-Zarazúa (2015), McEwan (2015), and Murnane and Ganimian (2014). Didn’t I just review a bunch of these, in Evans & Popova (2015)? Yes and yes.
Even though the education synthesis cocktail party keeps getting more crowded, this new effort has much to offer. The authors provide an analysis of how low- and middle-income countries are doing on enrollment (better than now-rich countries when they were poor or middle-income) and on learning outcomes (predictably well given income levels). They give an insightful narrative discussion of 118 “high quality” studies (RCTs, RDDs, and DDs are in; simple regression and PSM are out) and draw some key lessons for how we can learn from this evidence. As they say, their goal is “to provide a framework for understanding what the results mean in addition to summarizing what they are.”
Here are some of my take-aways. Some, such as the interpretation of zero effects, have implications well beyond education.
There is a lot of evidence, but for many interventions, it’s wide rather than deep.
G&M walk through the evidence in a whole range of areas, on both the demand and the supply side. The analysis is refreshingly disaggregated, discussing scholarship programs, remedial instruction, and reading-intensive pedagogy rather than lumping interventions into large, meaningless categories. (Popova and I recommend exactly this in our review of reviews.)
At the same time, what this disaggregated analysis demonstrates is that the ongoing rise in education research is widening our knowledge base at least as much as it is deepening it. For example, even with 118 studies, there are just two studies on providing information on the returns to education, one on career counseling (from 2013), and one on school counseling (from 2014).
So on the one hand, the ongoing increase in evidence means you're more likely to be able to find at least one study on many kinds of interventions. On the other hand, you still can't find lots of studies on any one intervention unless you're looking at conditional cash transfers. The pattern is similar across the wide range of interventions examined.
[Figure adapted from Tables 4 and 5 of Glewwe & Muralidharan (2015)]
This lends itself to the conclusion that, rather than simply using the evidence to say “this intervention works” or “this intervention doesn’t,” we should use our empirical studies – together with theory – to understand just what the binding constraints to better education are, as well as how to relax them. As the authors put it: “Many more insights can be gained by collecting data on intermediate processes and inputs to better understand the factors that explain the observed…program impact.”
What does a zero effect really mean?
As an example of the above, they have a useful discussion of how we can learn from interventions with zero effects. (In this discussion, they don’t distinguish between a zero effect – what I might call a “true zero” – and a non-significant effect, which could have a large magnitude but an even larger confidence interval; the small numerical sketch after the list below illustrates the difference.) As they point out, a zero effect could mean one of five things:
- The intervention doesn’t work. (The easiest conclusion, but often the wrong one.)
- The intervention was implemented poorly. Textbooks in Sierra Leone never got distributed to students (Sabarwal et al. 2014).
- The intervention led to substitution away from program inputs by other actors. School grants in India lost their impact in the second year when households lowered their education spending to compensate (Das et al. 2013).
- The intervention works for some students, but it doesn’t alleviate a binding constraint for the average student. English language textbooks in rural Kenya only benefitted the top students, who were the only ones who could read them (Glewwe et al. 2009).
- The intervention will only work with complementary interventions. School grants in Tanzania only worked when complemented with teacher performance pay (Mbiti et al. 2014).
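On the true-zero versus non-significant distinction, a small worked example may help; the numbers below are invented purely for illustration and come from neither the paper nor any particular study.

```latex
% Invented numbers, for illustration only.
% A noisy, "non-significant" estimate:
\[
\hat{\beta} = 0.20 \text{ SD}, \qquad \widehat{SE} = 0.12, \qquad
\text{95\% CI} = 0.20 \pm 1.96 \times 0.12 \approx [-0.04,\ 0.44].
\]
% The interval includes zero, so the effect is "not significant" -- but the data
% are also consistent with a 0.4 SD effect, which would be large in education.
% Contrast a precisely estimated "true zero":
\[
\hat{\beta} = 0.01 \text{ SD}, \qquad \widehat{SE} = 0.02, \qquad
\text{95\% CI} \approx [-0.03,\ 0.05].
\]
```

Only the second of these is real evidence that the intervention did nothing; the first is mostly evidence that the study was underpowered.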
Are you measuring what you think you’re measuring? The production function parameter versus the policy parameter.
In their theoretical framework, G&M make a key distinction between

- the production function parameter: the impact of some intervention on learning, holding all other inputs fixed (i.e., the impact of a school grant if households do not update their own spending), and
- the policy parameter: the impact of the intervention on learning, allowing other inputs to adjust.
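A stylized way to write this down (the notation here is mine, not necessarily the paper’s): let test scores T depend on school inputs S and household inputs H, where households choose H partly in response to S.

```latex
% Stylized sketch; notation is mine, not necessarily G&M's.
\[
T = f\big(S,\ H(S)\big)
\]
% Production function parameter: the effect of S with household inputs held fixed.
\[
\frac{\partial f}{\partial S}
\]
% Policy parameter: the total effect of S once households re-optimize H.
\[
\frac{dT}{dS} \;=\; \frac{\partial f}{\partial S}
  \;+\; \frac{\partial f}{\partial H}\cdot\frac{dH}{dS}
\]
% If household inputs also raise learning and households treat program inputs
% and their own spending as substitutes (dH/dS < 0), the policy parameter is
% smaller than the production function parameter -- as with the school grants
% in India.
```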
They argue that, over time, empirical estimates shift from capturing the production function parameter toward capturing the policy parameter, as other actors adjust their behavior. Again, the school grants in India (Das et al. 2013) tell this story nicely: at the end of the first year, the grants had an impact (the production function parameter). By the end of the second year, households had adjusted their own spending and the effect disappeared (the policy parameter). This reminds me of the finding in McEwan (2015) that, across 70 RCTs seeking to improve instruction, outcomes were measured after an average of just 13 months of treatment. So many of our evaluations may not really be getting at the policy parameter.
The recommendation here is to do more high-quality evaluations using administrative data sets (for example, with regression discontinuity designs), which permit longer-term follow-up more easily. But this remains a real challenge in many low-income countries with serious administrative data limitations.
A few last miscellaneous notes
I recommend reading the whole paper.
- There is a fair amount of new evidence. On quick review, I found 20+ additional studies not included in our list of 320+ studies from 8 previous reviews.
- They exclude “academic working papers written before 2010 that had not yet been published by the end of 2014,” as they are “likely to have some methodological flaws that have resulted in their not being published in peer-reviewed journals.” In some cases, the “methodological flaw” may be lazy or overcommitted authors, so #GetYourManuscriptOut.
- You can also read Lee Crawfurd’s take on the G&M paper.