Published on Development Impact

Scaling Up Effective Programs – Kenya and Liberia Edition

Over the last decade, both Kenya and Liberia have sought to scale up successful pilot programs that help children to learn to read. Even as more and more impact evaluations are of programs at scale, pilots still constitute a significant portion of what we test. That’s with good reason: Governments wisely seek to pilot and test programs before expending valuable resources in implementing a program across the country. Last year, I wrote about how the Indian organization Pratham worked with J-PAL to test effective programs to improve reading iteratively, varying different parameters in terms of who was implementing (government teachers versus volunteers) and when (in-school versus during the holidays).

I recently read a paper that adds additional insight to the process of scaling up an effective intervention: “Designing for Scale: Reflections on Rolling Out Reading Improvement in Kenya and Liberia,” by Gove, Korda Poole, and Piper.

The Problem. As motivation, both countries have low performance in early reading. In Liberia, just 16 percent of third graders could read even 40 words per minute. (The number needed to comprehend what you’re reading varies by language, but in English – as an example – some people will use between 60 and 90 words per minute as a benchmark.) In Kenya, fewer than half of third graders can complete second-grade work in Math, Kiswahili, or English.

The Pilot Solution. In Liberia, the “EGRA Plus” program was tested in a randomized controlled trial including 180 schools, with 60 in each of two treatments and 60 serving as controls. A “light” treatment taught teachers to do classroom-based assessment and shared student results with parents and the community. Think of this as an accountability intervention. A “full” treatment additionally included a set of scripted lesson plans and other instructional materials as well as coaching over the course of two semesters. The full treatment had significant impacts on all 7 tested reading tasks (e.g., “letter-naming fluency” and “reading comprehension”), with a median impact of 0.78 standard deviations. The light treatment significantly impacted just 2 of the tested tasks, with a median measured impact of 0.02. So, accountability isn’t enough. But the effects of the full treatment are large: “Relative to control schools, students’ scores in full-treatment schools improved at a rate 4.1 times faster for oral reading fluency of connected text and 4.0 times faster for reading comprehension.” (You can read all about the Liberia pilot RCT here.)
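For readers less used to effect sizes reported in standard deviations, here is a minimal sketch of how a per-task effect and a median across tasks could be computed. It assumes the simplest possible approach: the difference in mean scores divided by the control group's standard deviation (sometimes called Glass's delta). The evaluations discussed here may well have used a different estimator, the function name and task labels are hypothetical, and the scores are invented for illustration, not taken from either study.

```python
from statistics import mean, median, stdev


def standardized_effect(treatment_scores, control_scores):
    """Difference in group means, in units of the control group's standard deviation."""
    return (mean(treatment_scores) - mean(control_scores)) / stdev(control_scores)


# Hypothetical scores, invented for illustration only (not data from the studies).
# Each task maps to (treatment scores, control scores).
tasks = {
    "letter_naming_fluency": ([25, 30, 35], [10, 20, 30]),
    "oral_reading_fluency": ([4, 6, 8], [0, 4, 8]),
    "reading_comprehension": ([3, 4, 5], [2, 4, 6]),
}

# One standardized effect per task, then the median across tasks.
effects = {task: standardized_effect(t, c) for task, (t, c) in tasks.items()}
median_effect = median(effects.values())

for task, effect in effects.items():
    print(f"{task}: {effect:.2f} SD")
print(f"median across tasks: {median_effect:.2f} SD")
```

Reporting the median rather than the mean keeps one unusually large or small task from dominating the summary, which matters when there are only a handful of tasks.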

In Kenya, the Primary Math and Reading (PRIMR) initiative had a few components: (1) student textbooks in English and Kiswahili focused on specific reading skills (rather than traditional, content-based books), (2) accompanying teachers’ guides that included already developed lesson plans, (3) 7 days of teacher training per year, focused on practice (not theory) – after each mini-lesson, teachers received a “mastery card,” and (4) visits from what are now called Curriculum Support Officers, who provided coaching to teachers. The program showed significant, sizeable impacts on all three tested reading tasks in both English and Kiswahili, with a median effect size of 0.33. (You can read more about the Kenya evaluation here.) Researchers also used those data to examine the relationship between the number of teachers per coach (fewer teachers per coach has a measurable impact on some outcomes), the value of adding e-readers and tablets at the school level (not so much), and the specific impact on the poor (positive).

Going to Scale. In Liberia, the success of the full treatment led to the design of a much larger-scale program. It was similar to the full treatment but with coaches notably assigned to 12 schools instead of 4 for cost reasons. The scaled intervention reached 1,200 schools in two phases beginning in 2011, with a cluster randomized model that allowed evaluation. Initial results were significantly smaller than in the pilot, with significant gains in just 2 of the 7 reading tasks, and a median effect across all tasks of 0.35. Still, these aren’t trivial. The Ebola crisis significantly disrupted the intervention thereafter. (You can read about that evaluation here.)

In Kenya, the scale-up – called Tusome – was relatively straightforward because the Curriculum Support Officers were already in place. Analysis from the pilot informed the ratio of coaches to teachers, the choice of ICT, and revisions to the student books and teacher guides. This scale-up is for “all public primary schools in the country,” so credible evaluation is tougher. That said, take-up of materials by students and methods by teachers seems high, and analysis to be released later this year will tell whether – at least – learning outcomes are moving in the right direction.


  1. Test as many elements as possible in the pilot. In Kenya, the pilot clearly influenced the scale-up, and much of that was because the researchers tested many elements – ratio of coaches to teachers, level of ICT penetration, quality of the materials – within the pilot. Not everything was varied experimentally. But you get the sense that every ounce of that data was leveraged for all it was worth to provide feedback into the scaled design.
  2. Use government systems as much as possible. In Liberia, the scale-up involved hiring lots of new coaches who weren’t already in the system. In Kenya, the program worked with existing “coaches.” As an important corollary, one of the authors, Ben Piper, told me, “Ensure that the activities that we expect government officers to do are already in their job descriptions.”
  3. Use costs in your analysis. In much of the analysis of the pilots, the authors are not looking just at how to improve outcomes, but at how to do it at reasonable cost. The ratio of coaches to teachers in Kenya is a good example of that. Assigning each coach fewer than 15 teachers seems to improve outcomes, but the gains are small relative to the costs. This is crucial for actual policy influence. Again, from Piper: “Utilize incentives that are not more than the government structures already have available, rather than altering behaviors using financial resources that aren’t sustainable.”

In some ways, this analysis of course feels incomplete. We still don’t have results at scale from Kenya, and the scaled program in Liberia was interrupted by the Ebola crisis. We’ll continue to learn more from these experiences. At the same time, the incompleteness is in itself a reminder to not let the perfect be the enemy of the good. We draw on a range of evidence, experimental and non-experimental, quantitative and qualitative. That isn’t an excuse to settle for low-quality, non-credible evidence; we should always be drawing on the best possible evidence and nudging our partners to create more credible evidence. At the same time, we want to use the best evidence available as carefully as possible to inform policy choices.

Many thanks to Amber Gove and Benjamin Piper who provided additional insights.


David Evans

Senior Fellow, Center for Global Development
