I have been somewhat skeptical about the application of impact evaluations to justice reform activities but I’m coming around to their utility for a limited – yet important – set of questions. The basic method behind impact evaluations – establishing a counterfactual in order to attribute net impact – is fairly new to justice so I thought I’d set out some ideas that might be worth considering in developing this nascent field.
- The need for a counterfactual (or comparison group) means impact evaluations can be pretty ineffective in answering some of the big policy or structural questions. For example, it’s hard to test out one constitutional set up for a supreme court against another.
- Impact evaluations can be costly and complex, especially in low capacity contexts, requiring careful data collection and control over the implementation of the intervention(s). Many justice institutions lack a history of strong data collection and the use of aggregate data for decision making.
- The setup of a counterfactual often involves denying a policy, service or activity to one group and is one reason the issue of ethics arises in most impact evaluations. Given the preponderance of lawyers involved in justice projects, the question of ethics can take on an especially central role. There are, however, some good counter-arguments. If legal services are as crucial as lawyers think they are, then it could be said to be unethical to offer them without some confidence they work. But perhaps the most resonant counter-argument for those who run legal institutions is one of scarcity. Funding for legal aid programs may be limited, or a police training facility may be too small to train all police at once. One way to deal with this scarcity is to select units for ‘treatment’ randomly, giving each unit an equal chance of being selected. This uses the natural circumstances of scarcity to test impact.
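The lottery idea in the last point can be sketched in a few lines of Python. This is a minimal illustration, not a full evaluation design: the function name `lottery_assign`, the station labels and the number of available slots are all hypothetical, and a real design would usually also stratify by region or unit size.

```python
import random


def lottery_assign(units, n_slots, seed=0):
    """Randomly pick n_slots units for 'treatment'; the rest form the comparison group.

    Every unit has the same chance of selection. A fixed seed makes the
    draw reproducible and auditable after the fact.
    """
    units = list(units)
    if not 0 <= n_slots <= len(units):
        raise ValueError("n_slots must be between 0 and the number of units")
    rng = random.Random(seed)
    treated = set(rng.sample(units, n_slots))
    treatment = [u for u in units if u in treated]
    comparison = [u for u in units if u not in treated]
    return treatment, comparison


# Hypothetical example: 20 police stations, training places for only 8.
stations = [f"station_{i:02d}" for i in range(1, 21)]
treatment, comparison = lottery_assign(stations, n_slots=8, seed=2024)
print(len(treatment), "stations trained now;", len(comparison), "serve as the comparison group")
```

Publishing the seed and the list of eligible units before the draw is one simple way to show counterparts that the selection was not manipulated.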
The few justice impact evaluations conducted have largely focused on the impact of legal services – legal aid, paralegals, alternative dispute resolution – on outcomes for clients, and have yielded some useful insights. In Liberia, ADR training had a sizeable impact on the resolution of long-standing land disputes (as well as some unsavory side-effects) and the provision of paralegal services had some positive economic impacts on clients. In the United States the offer of assistance from a university legal aid clinic had little effect, suggesting that who you offer a legal service to is just as important as how you deliver it. One of the limited forays into testing operational alternatives within justice institutions was a UK experiment with text message reminders for the payment of court fines. These increased payment rates by a third – and significantly reduced collection costs.
Given the concerns with impact evaluations, as well as their application so far, what might be some areas to focus on and principles to guide us? Whilst jurisdiction- and institution-relevant questions should drive any evaluation, the focus of much development assistance in justice reform suggests a few areas to consider:
- The internal dysfunction of justice institutions is felt keenly by citizens, but almost no experimental evidence exists to understand the impact of different alternatives. The internal structure of agencies means that they are often suitable for impact evaluations – for example, one half of courts could adopt in-court stenographers to record proceedings while the other half tries electronic recording equipment, and accuracy, speed and costs could be compared. A Bank-supported effort to alert judges in Senegal about seriously delayed cases is one example in this area that is just getting off the ground.
- Many justice personnel operate with considerable autonomy, so how to incentivize performance through different management techniques is a real issue. The mixture of supervision, sanctions and rewards (both financial and non-financial) for justice agents – be they police, prosecutors or judges – is an area ripe for interrogation, especially since intrinsic motivation can be crucial for many personnel. Increasing efficiency without sacrificing quality is the critical balance to strike.
- Much money is spent on training justice officials yet impact measurement is limited. One example of how a more sophisticated evaluation might work is a police training program in which officers are randomly split into three groups: the first receives intensive week-long residential courses; the second, periodic lunchtime seminars; and the third, peer-to-peer learning. Impact could be measured according to metrics of interest (knowledge of procedures; number of cases processed; ratings by the public), with the most effective method scaled up and the least effective dropped.
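The three-group training design in the last point could be set up roughly as follows. This is a hedged sketch: the arm labels, officer identifiers and the helper names `assign_arms` and `mean_outcome_by_arm` are invented for illustration, and the outcome scores would have to come from real administrative records or surveys.

```python
import random
from collections import Counter
from statistics import mean

# Hypothetical labels for the three training methods described above.
ARMS = ("residential_course", "lunchtime_seminars", "peer_learning")


def assign_arms(officer_ids, arms=ARMS, seed=0):
    """Shuffle officers, then deal them out in turn so arm sizes differ by at most one."""
    rng = random.Random(seed)
    shuffled = list(officer_ids)
    rng.shuffle(shuffled)
    return {officer: arms[i % len(arms)] for i, officer in enumerate(shuffled)}


def mean_outcome_by_arm(assignment, outcomes):
    """Average an outcome metric (e.g. a procedures test score) within each arm.

    Officers with no recorded outcome are skipped.
    """
    by_arm = {}
    for officer, arm in assignment.items():
        if officer in outcomes:
            by_arm.setdefault(arm, []).append(outcomes[officer])
    return {arm: mean(scores) for arm, scores in by_arm.items()}


# Hypothetical roster of 90 officers.
officers = [f"officer_{i:03d}" for i in range(90)]
assignment = assign_arms(officers, seed=7)
print(Counter(assignment.values()))  # group sizes per arm
```

Comparing group averages is only a first look. Deciding which method to scale up would also call for a check that the differences are larger than chance variation would produce.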
Other fields have developed detailed protocols for conducting impact evaluations which justice reformers should benefit from and build on. In addition to those, a few simple principles to keep in mind are:
- Test ‘live’ operational alternatives that can actually continue after the evaluation ends. Finding out via an impact evaluation that financial awards improve the integrity of corrections officers offers limited benefit if there’s not enough money to continue the program. It might be better to test two (or more) operational alternatives currently subject to debate within the justice system and find out which one works best.
- Measure broader socio-economic impacts. Lawyers tend to focus strongly on normative and constitutional rights yet public funding for legal systems is subject to competing claims from other public services. Does an alternative dispute resolution program divert cases from the courts and thus reduce the burden on state resources? Does a legal service that increases women’s chances of receiving alimony increase investment in their child’s education? Whilst our immediate counterparts are often legal agencies, results that are compelling for Ministries of Finance (and others) are good to keep in mind.
- Use impact evaluations as an instrument to improve administrative data collection. Your average impact evaluation specialist will likely want to design a bespoke population survey. Such surveys might be necessary in some circumstances, but they are costly, time-consuming and not easily replicable by justice agencies. Building up administrative data systems will allow you to try different reforms over time and continually assess whether they are working.
Impact evaluations can’t answer all the important questions in justice reform and we certainly shouldn’t pursue only those reforms amenable to measurement by such methods. Impact evaluations can increase confidence in making causal claims, but even the most rigorous impact evaluation will not provide "100% confidence", and cannot guarantee that an intervention that ‘worked’ in one situation will work again in a different one. Theory, ethnography and behavioral experiments in ‘labs’ are also critical sources of ‘evidence’ in guiding justice reform – helping us understand not only if something works but how. That said, there are areas of reform, currently subject to considerable spending and limited empirical reflection on effectiveness, for which impact evaluations can be useful.