No one said it’s easy to run a randomized experiment! As Berk and Marcus have pointed out in previous posts, randomization can go wrong (see here), treatment assignment can get mixed up (see here), and management turnover at the implementing institution can stop an impact evaluation dead in its tracks (see here). In a recent World Bank BBL titled “Integrating Impact Evaluations into Youth Employment Programs: Challenges and Approaches,” the International Youth Foundation (IYF) gave a brave presentation on the challenges and failures it has experienced in trying to implement youth employment impact evaluations (IEs) in Africa, Latin America, and the Middle East. Since published IEs tend to be those that succeeded, it is rare to hear about the ones that ran into trouble or failed, so the presentation was very interesting and its lessons are worth sharing.
In Kenya, IYF is in the middle of a two-year youth employment program targeting young women from informal settlements around Nairobi for ICT training and job placement. While the IE has been successful in following through from baseline to endline and the results have been positive, here are a few examples of obstacles IYF is currently dealing with:
· Limited training facilities of implementing partner: The intervention involves two full months of training, but the local partner understandably doesn’t have the computers and space to train the full treatment group of 700 young women in one go. As a solution, IYF divided the sample into multiple cohorts to maximize the partner’s capacity. IYF also directed efforts at expanding local capacity by instituting morning and afternoon shifts, which halved the time necessary to complete the training.
· Take-up: Take-up of the employment program was lower than expected, primarily because of the time commitment required of the young women and the cost of transportation. Lower take-up means lower power (see David’s piece on how to do power calculations here), so in order to maintain enough statistical power, IYF provided “transportation stipends for the highest need treatment participants” to boost participation rates.
· Attrition: IEs require following up with the same people who were surveyed at baseline, which isn’t always easy. In this IE, attrition rates were fairly high across treatment groups (20-30%) and particularly high for the control group (30-40%) over the 10-month program. If attrition is purely random, the evaluation simply loses statistical power. However, if attrition is in any way systematic, then it’s no longer clear that the control group is a reasonable counterfactual for the treatment group. IYF’s solution: paying a participation stipend to the respondents in the control group, providing phone cards, and offering an “alumni packet” of services (e.g. computer use, job placement) after the endline data collection.
· Enumerator security: While collecting data in one of the informal settlements, a group of enumerators was attacked. Now, IYF makes sure all enumerators travel with police escorts in unmarked vehicles.
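To make the take-up point above concrete, here is a rough sketch of how incomplete take-up inflates the sample an evaluation needs. It is not IYF’s calculation. It assumes a simple two-arm comparison of means with equal arm sizes, an effect measured in standard deviations of the outcome, no take-up in the control group, and no effect on those who don’t take up the program; the function name and the example numbers are illustrative only.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(effect_sd, take_up=1.0, alpha=0.05, power=0.80):
    """Approximate sample size per arm for a two-sided test of means.

    effect_sd: effect on those who actually take up the program,
               in standard deviations of the outcome (assumed value).
    take_up:   share of the treatment group that participates.
    """
    if not 0 < take_up <= 1:
        raise ValueError("take_up must be in (0, 1]")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = z.inv_cdf(power)          # quantile for the desired power
    # The intention-to-treat effect is diluted by the take-up rate.
    itt_effect = effect_sd * take_up
    return ceil(2 * (z_alpha + z_power) ** 2 / itt_effect ** 2)


# Hypothetical numbers: a 0.2 SD effect with full vs. 50% take-up.
full = n_per_arm(0.2)
half = n_per_arm(0.2, take_up=0.5)
print(full, half, round(half / full, 2))
```

Because the required sample scales with one over the square of the take-up rate, halving take-up roughly quadruples the sample needed for the same power, which is why spending on stipends to raise participation can be cheaper than enlarging the sample.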
Under the entra21 youth employment program in Latin America and the Caribbean, IYF supported nearly a dozen IEs. In Argentina, IYF helped to design and fund an IE that had problems with attrition in the treatment group: fewer than half of the 220 people in the treatment group completed the youth employment project.
Elsewhere in Latin America, IYF has had a mix of successes and failures. In Peru, IYF successfully conducted a matched-pair evaluation, which revealed a positive impact on employment and job quality. In Colombia, IYF ran two randomized trials: one showed no impact and the other was discarded due to design errors. In Chile, Ecuador, and Brazil, IYF abandoned plans for an IE because it could not identify a counterfactual, because the partner lacked capacity, or both.
In Jordan, IYF attempted to design a quasi-IE for a youth employment program but ultimately decided it wasn’t feasible. In this case, the sample size wouldn’t have provided enough power, it wasn’t clear whether the capacity to implement an IE even existed, and it was believed that randomizing the treatment might cause negative repercussions in the at-risk target community.
These youth employment IEs bring up several good questions. How do we balance adhering to IE protocol with running an innovative, experimental program? Why is take-up low, and should we run IEs on programs like this that don’t seem to be in high demand among the beneficiaries? Is it OK to give incentives to particular participants to ensure high take-up, or to members of the control group to reduce attrition? In conflict-ridden areas, how can we ensure safety for all parties involved? And what do the program management and organizational structures of the local partners look like: are they conducive to implementing an IE successfully?
More generally, they raise the issue of how best to share experiences from evaluations that don’t work out. Trial registries provide one way of at least ensuring that more trials get recorded, something discussed previously on Development Impact. However, perhaps at most half of the above would have made it to the registry stage. In many cases there is some desire to do an impact evaluation, or some exploration of one, but one of these issues comes up before it even gets to the stage of baseline and randomization to treatment. Thinking about a way to record and learn from the ones that get away seems an interesting area for future work.
Matthew Groh is an Extended-Term Consultant in the Finance & Private Sector Development Unit of the Research Group, working with David McKenzie on evaluations in the Middle East.