Are impact evaluations useful for justice reforms in developing countries?

I have been somewhat skeptical about the application of impact evaluations to justice reform activities but I’m coming around to their utility for a limited – yet important – set of questions. The basic method behind impact evaluations – establishing a counterfactual in order to attribute net impact – is fairly new to justice so I thought I’d set out some ideas that might be worth considering in developing this nascent field.

Some of the general concerns with impact evaluations are relevant when thinking about justice activities:

  1. The need for a counterfactual (or comparison group) means impact evaluations are poorly suited to answering some of the big policy or structural questions. For example, it’s hard to test one constitutional setup for a supreme court against another.
  2. Impact evaluations can be costly and complex, especially in low capacity contexts, requiring careful data collection and control over the implementation of the intervention(s). Many justice institutions lack a history of strong data collection and the use of aggregate data for decision making.
  3. The setup of a counterfactual often involves denying a policy, service or activity to one group and is one reason the issue of ethics arises in most impact evaluations. Given the preponderance of lawyers involved in justice projects, the question of ethics can take on an especially central role. There are, however, some good counter-arguments. If legal services are as crucial as lawyers think they are, then it could be said to be unethical to offer them without some confidence they work. But perhaps the most resonant counter-argument for those who run legal institutions is one of scarcity. Funding for legal aid programs may be limited, or a police training facility may be too small to train all police at once. One way to deal with this scarcity is to select units for ‘treatment’ randomly, giving each unit an equal chance of being selected. This uses the natural circumstances of scarcity to test impact.
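The scarcity-based randomization described above takes only a few lines to implement. Here is a minimal sketch (the unit names and slot count are hypothetical), assuming there are more eligible units than the program can serve:

```python
import random

def assign_treatment(units, n_slots, seed=None):
    """Randomly pick which eligible units receive the scarce service.

    units:   identifiers of eligible units (e.g. legal-aid applicants)
    n_slots: number of places the program can actually serve
    Returns (treated, comparison): the lucky draw and everyone else.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible and auditable
    shuffled = list(units)
    rng.shuffle(shuffled)
    return shuffled[:n_slots], shuffled[n_slots:]

# 100 applicants but funding for only 40: every applicant gets an equal chance.
applicants = [f"applicant-{i:03d}" for i in range(1, 101)]
treated, comparison = assign_treatment(applicants, n_slots=40, seed=2024)
print(len(treated), len(comparison))  # 40 60
```

Because the seed is recorded, the lottery can be re-run later to show that selection really was random rather than discretionary.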

The few justice impact evaluations conducted have largely focused on the impact of legal services – legal aid, paralegals, alternative dispute resolution – on outcomes for clients, and have yielded some useful insights. In Liberia, ADR training had a sizeable impact on the resolution of long-standing land disputes (as well as some unsavory side-effects), and the provision of paralegal services had some positive economic impacts on clients. In the United States, the offer of assistance from a university legal aid clinic had little effect, suggesting who you offer a legal service to is just as important as how you deliver it. One of the limited forays into testing operational alternatives within justice institutions was a UK experiment with text message reminders for the payment of court fines. These increased payment rates by a third – and significantly reduced collection costs.

Given the concerns with impact evaluations, as well as their application so far, what might be some areas to focus on and principles to guide us?

Whilst jurisdiction- and institution-relevant questions should drive any evaluation, the focus of much development assistance in justice reform suggests a few areas to consider:
  1. The internal dysfunction of justice institutions is felt keenly by citizens, but almost no experimental evidence exists to understand the impact of different alternatives. The internal structure of agencies means that they are often suitable for impact evaluations – for example, one half of courts could adopt in-court stenographers to record proceedings while the other half tries electronic recording equipment, and accuracy, speed and costs could be compared. A Bank-supported effort to alert judges in Senegal about seriously delayed cases is one example in this area that is just getting off the ground.
  2. Many justice personnel operate with considerable autonomy, so how to incentivize performance through different management techniques is a real issue. The mixture of supervision, sanctions and rewards (both financial and non-financial) for justice agents – be they police, prosecutors or judges – is an area ripe for interrogation, especially when intrinsic motivation can be a crucial aspect for many personnel. Striking a balance between increasing efficiency and maintaining quality is critical.
  3. Much money is spent on training justice officials, yet impact measurement is limited. One example of how a more sophisticated evaluation of impact might work is a police training program in which officers are randomly split into three groups: the first receives intensive week-long residential courses; another, periodic lunchtime seminars; and a third, peer-to-peer learning. Impact could be measured according to metrics of interest (knowledge of procedures; number of cases processed; ratings by the public), the most effective method scaled up and the least effective dropped.
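A three-arm training comparison of this kind can be sketched in a few lines. This is an illustrative outline only – the arm names, group sizes and scores below are placeholders for whatever an actual program would use:

```python
import random
from statistics import mean

ARMS = ["residential_course", "lunchtime_seminar", "peer_to_peer"]

def assign_arms(officers, seed=0):
    """Randomly split officers into the three training arms (near-equal sizes)."""
    rng = random.Random(seed)
    shuffled = list(officers)
    rng.shuffle(shuffled)
    # Deal the shuffled list round-robin into the arms.
    return {arm: shuffled[i::len(ARMS)] for i, arm in enumerate(ARMS)}

def rank_arms(scores_by_arm):
    """Rank arms by mean outcome score (e.g. procedure-knowledge test results),
    best first -- scale up the winner, drop the loser."""
    return sorted(scores_by_arm, key=lambda arm: mean(scores_by_arm[arm]), reverse=True)

groups = assign_arms([f"officer-{i}" for i in range(30)])
scores = {"residential_course": [72, 80, 75],
          "lunchtime_seminar": [65, 60, 70],
          "peer_to_peer": [78, 82, 85]}
print(rank_arms(scores))  # ['peer_to_peer', 'residential_course', 'lunchtime_seminar']
```

In practice one would of course want larger samples and a proper statistical test of the differences in means, not just a ranking – but the core design is no more complicated than this.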

Other fields have developed detailed protocols for conducting impact evaluations which justice reformers should benefit from and build on. In addition to those, a few simple principles to keep in mind are:

  1. Test ‘live’ operational alternatives that can actually continue after the evaluation ends. Finding out via an impact evaluation that financial awards improve the integrity of corrections officers offers limited benefit if there’s not enough money to continue the program. It might be better to test two (or more) operational alternatives currently subject to debate within the justice system and find out which one works best.
  2. Measure broader socio-economic impacts. Lawyers tend to focus strongly on normative and constitutional rights yet public funding for legal systems is subject to competing claims from other public services. Does an alternative dispute resolution program divert cases from the courts and thus reduce the burden on state resources? Does a legal service that increases women’s chances of receiving alimony increase investment in their child’s education? Whilst our immediate counterparts are often legal agencies, results that are compelling for Ministries of Finance (and others) are good to keep in mind.
  3. Use impact evaluations as an instrument to improve administrative data collection. Your average impact evaluation specialist will likely want to design bespoke population surveys (and these might be necessary in some circumstances), but such surveys are costly, time consuming and not easily replicated by justice agencies. Building up administrative data systems will allow you to try different reforms over time and continually assess whether they are working.

Impact evaluations can’t answer all the important questions in justice reform, and we certainly shouldn’t pursue only those reforms amenable to measurement by such methods. Impact evaluations can increase confidence in making causal claims, but even the most rigorous impact evaluation will not provide "100% confidence", and cannot guarantee that an intervention that ‘worked’ in one situation will work again in another. Theory, ethnography and behavioral experiments in ‘labs’ are also critical sources of ‘evidence’ in guiding justice reform – helping us understand not only if something works but how. That said, there are areas of reform, currently subject to considerable spending and limited empirical reflection on effectiveness, for which impact evaluations can be useful.


Submitted by Jim Greiner on

Hi, Nick, great post! I see little with which I disagree. Just a few highlights. First, you hint at two key ethical principles that might justify RCT-based evaluation: equipoise (not knowing whether an intervention has certain effects) and scarcity (not being able to provide an intervention in all eligible cases). Ordinarily, either should be enough by itself to justify randomization. Frequently, objections to RCTs by those of us in the justice fields are based on what I call the “Deity Complex”: we in the justice fields already know everything about how the universe works, and the universe would be better off if we were given all the resources we needed to do our jobs completely; the only problem is that no one is as smart as we are. Alas, I do not subscribe to this view, and once one acknowledges that we are not Deities, there’s an awful lot we don’t know. Second, justice interventions often have multiple goals, and have even more numerous effects (sometimes undesirable effects). We need to measure all such effects that can be measured, and to account fully for effects that cannot be measured. Third, it is pointless to do an RCT without a simultaneous and serious investigation into the institutional, sociological, and inter-personal setting. No thinking investigator in any field simply slams units into treatment and control groups, compares means, and declares victory. If that’s all that is planned in an RCT, the RCT should be abandoned.

If we keep these three principles, along with the others that you highlight here, firmly in mind, my view is that RCTs have a great deal to offer in justice, access to justice, adjudicatory administration, crime prevention, and other areas.

Nick, thanks for this very good post that touches on a wide array of complex issues pertaining to justice-oriented evaluation. Some quick thoughts:

1. As you may know, the international NGO Namati is preparing a DFID-funded study on the impact of legal empowerment. The study will overlap with justice-oriented concerns, though it will be broader in some respects (e.g., more toward what the Bank might categorize as governance matters and socioeconomic impact) and narrower in others (e.g., less focused on judicial and police reform). The most recent draft is from last year, but I'd expect that the study would be finished soon. Those interested in the study might best contact your former Bank colleague, Abigail Moy, at [email protected]

2. I prepared a paper, "Legal Empowerment Impact: An Initial Guide to Issues, Methods and Impact," for Namati and the Open Society Justice Initiative. It can be found at Namati's website, though apparently one must become a member of the Namati network to download it. I have not gotten around to posting it at my own website yet, unfortunately. In any event, the guide is a rather basic tool, intended much more for those NGOs and other legal empowerment practitioners that might be less sophisticated in evaluation techniques than for evaluation professionals and others familiar with the methodologies you discuss. But I wanted to mention it all the same.

3. The five-year, DFID-funded Community Legal Services Program in Bangladesh may be undertaking some innovative research along the lines of what you sketch, especially given that with a budget of over $30 million it may well be the largest grant-making (mainly to NGOs) legal services program in the world. As a part-time adviser to the program I am up on some research plans afoot, but the most current information about the program and its research agenda can be obtained from Team Leader Hector Soliman, at [email protected]

4. In a more substantive vein, I know that the Bank's Justice for the Poor program is generating some data along the lines of what you discuss in your post, though largely not in the context of impact evaluations. But I would suggest that J4P map out an even more robust, impact-oriented research agenda along those lines.

5. I want to second Jim Greiner's excellent comment: "Third, it is pointless to do an RCT without a simultaneous and serious investigation into the institutional, sociological, and inter-personal setting." Building on that, I'd suggest that to complement the more quantitative investigations you sketch, it would be useful and important to also commission qualitative case studies of how community-level and policy changes have been brought about (or frustrated) by development initiatives. Such studies would seem necessary to begin to scratch the surface of the political economy analysis necessary to understand what work is worth supporting and where. Somewhat along those lines, next month the Open Society Justice Initiative is launching a book I edited, "Legal Empowerment: The How and Why," which, among other (hopefully useful) articles, includes a chapter that takes something of an anthropological perspective on the politics of the work of two paralegals in Indonesia helping a community combat illegal, corrupt manipulation of land resource laws in one area.

6. As for state justice institutions, we need to go beyond limited inquiries regarding the knowledge-oriented impact of training to see whether projects focusing on the judiciary, police and other such institutions actually change behavior and yield other important impacts. To pick just one of many examples, one would think and hope that properly trained judges and police would treat women better in cases of domestic violence than those who have not received such training, and that to some extent such changes could be studied by examining relevant records and, in the qualitative vein, by observational studies. I realize that many projects that engage with such institutions are geared more toward other issues, such as expediting case processing. So my suggestion here is geared toward the large, related issue of what such projects aim to accomplish to begin with.

7. Speaking about large, related issues: After 20 years of investing billions of dollars in judicial reform efforts, what proof is there that World Bank and other development agency funding has done much good? Is the rule of law any stronger in, say, Russia or Egypt today, in the wake of such investments? I'd suggest that the smattering of studies you cite and the forthcoming Namati study point more in the direction of grassroots work and of helping progressive forces within societies push for legal, policy and institutional reform, rather than the main direction the Bank has been taking. The point here is not that judicial reforms are not worthwhile, in principle; rather, it is that in practice our ability to help bring them about in most countries is extremely limited. To intentionally twist the words of that great development thinker, Donald Rumsfeld, at some point the absence of evidence really does indicate evidence of absence (of impact).

Thanks for this very useful post on guiding principles and challenges to keep in mind when conducting impact evaluations concerning justice reform. For those researchers and practitioners interested in the intersection between justice work and data collection via household surveys, I wanted to point out a helpful guide written by my colleague, which offers guidance on the design and implementation of surveys that address justice issues (in addition to suggested questions, topics, and ideas for moving the discussion forward). The guide is available here: I expect that the suggestions offered would be useful to survey researchers as well as justice practitioners in the creation of new household survey data in some of the research areas suggested in this post.

Submitted by Sahawal on

Hi everyone,
Great post and interesting question from many points of view indeed. Actually, I performed an impact evaluation of judicial reforms in Benin using a citizens' satisfaction index for justice services and propensity score matching. The idea is that improvements induced by reforms should increase user satisfaction, so a user's index is expected to be higher than that of his counterfactual. I also analyzed inequality in the distribution of the satisfaction index.
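For readers unfamiliar with the matching step described here, a stripped-down sketch of nearest-neighbour propensity score matching follows. It assumes propensity scores (each unit's estimated probability of exposure to the reform) have already been estimated, e.g. via logistic regression; all identifiers and numbers below are made up for illustration:

```python
def match_nearest(treated, controls):
    """Match each treated unit to the control with the closest propensity
    score (with replacement). Inputs are lists of (unit_id, score) pairs."""
    return [(t_id, min(controls, key=lambda c: abs(c[1] - t_score))[0])
            for t_id, t_score in treated]

def att(pairs, satisfaction):
    """Average treatment effect on the treated: mean difference in the
    satisfaction index between treated units and their matched controls."""
    diffs = [satisfaction[t] - satisfaction[c] for t, c in pairs]
    return sum(diffs) / len(diffs)

treated = [("t1", 0.80), ("t2", 0.55)]
controls = [("c1", 0.30), ("c2", 0.78), ("c3", 0.52)]
satisfaction = {"t1": 7.5, "t2": 6.0, "c1": 4.0, "c2": 6.5, "c3": 5.5}
pairs = match_nearest(treated, controls)
print(pairs)  # [('t1', 'c2'), ('t2', 'c3')]
print(att(pairs, satisfaction))  # (7.5-6.5 + 6.0-5.5)/2 = 0.75
```

A real study would add common-support checks and covariate balance diagnostics, but the core logic is just this pairing of each treated unit with its most comparable non-exposed counterpart.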

Submitted by Anonymous on

Hi Sahawal, thanks for your post. I would love to hear more about the evaluation you conducted. If you could link to any information here that would be great, or otherwise please be in touch at nmenzies[@] Thanks
