In this blog, we advocate the importance of in-depth reporting on implementation processes, evaluation processes, and relevant contextual details of interventions and linked evaluations. This will facilitate research transparency, as well as assessments of both learning and the potential for generalizability beyond the original study setting (learning lessons from ‘there’ for ‘here,’ but not necessarily promoting the strict and exact duplication of a program from one setting to another, in line with an understanding of external validity that is appropriate for the social sciences in development).
We start with a hypothetical scenario of an intervention and associated evaluation, based on too-frequent experiences in the impact evaluation space. We hope that it doesn’t sound familiar to those of you who have been involved in evaluation or have tried to make sense of evaluation results -- but suspect that it will.
A research team, connected to a larger research and evaluation organization, ran a study on an intervention. For reasons of statistical and political significance, they deemed it sufficiently successful to be worth scaling up, at least in a specific new setting.
The intervention sought to overcome the following problem, which has both supply-side and demand-side dimensions. People in malarious areas may procure a bednet (whether for free or for a positive price), but they do not always follow through with maintenance (re-treatment or replacement).
For supply, the private sector only sporadically offers retreatment and replacement, and it is expensive, while the public sector does not always have supplies available. The intervention, therefore, concentrates provision of this service at a specific time and place through temporary service centers.
For demand, people with nets often don’t understand the need for retreatment and, even if they do, continually put it off. The intervention, therefore, included a non-monetary incentive for which there is local demand (in this case, soap), to be picked up at the time of net retreatment.
This is, altogether, a relatively straightforward (or ‘technocratic’ in Woolcock’s classification) but potentially powerful intervention that can improve the private and public good. As such, researchers in the research organization would like to try this intervention (with associated impact evaluation) in other locations, in which they suspect net retreatment and replacement faces a similar set of challenges.
However, when the research team in charge of the external replication looked back at the original reports from this experiment, they discovered relatively little information about how the intervention was designed and implemented. The publication is mum on the process itself and any lessons learned, including challenges faced and whether the researchers would have done anything differently with the advantage of 20/20 hindsight. Moreover, there aren’t many internal notes that lay out the operations of the intervention or the evaluation. What exists as tacit knowledge remains reserved for elite seminar discussions or cathartic gossip over beers.
This hypothetical raises two key problems: (1) research transparency and (2) the potential for learning and assessing generalizability.
Research Transparency: From Implementation to Data Collection
While the current focus of research transparency movements (across and within disciplines) is clear about making data and code available for internal/statistical replication, there is a critical piece missing about process. How was the evaluation run? How was the intervention run? What challenged and enabled the success of an intervention in a particular setting? From the hypothetical scenario, this includes questions such as:
- Who was supposed to organize and run the service centers and who actually did so?
- Did the evaluation put in place any kind of monitoring that would be unlikely to be present if the implementers were acting alone? Should this properly be considered part of the intervention?
- How was the procurement of soap supply managed and were there any relevant challenges?
- How was soap determined to be a good incentive in the first place?
The research team should ideally be able to refer to a report or working paper, or at least a set of internal notes, to guide them. But a lack of documentation means that neither evidence users nor even those within the research organization know the answers to these questions. This isn’t just an issue of operations and redundant work, but one of research transparency and ethics: to understand what an intervention actually included, and what is required in order for it to be successful.
Understanding the intricacies of both the implementation and study setting should include systematic documentation of relevant factors (ideally informed by a theory of change), as well as ensuring that both quantitative and qualitative ‘process’ data are collected with the same rigor as ‘evaluation’ (baseline/endline) data. Going beyond a bare-bones theory of change (and including theoretical mechanisms, implementation processes, and contextual interactions) requires extra work. Responsibility for this should, admittedly, fall to both researchers and donors/commissioners -- to ensure that study teams have the necessary financial resources, time, and research capacity to effectively and systematically collect and process this information.
Reporting for learning and generalizing
The ‘active ingredients’ of programs (if not whole programs) tested in one setting can also be tried in other settings (other times, geographies, scales, etc). Indeed, some may say this is a key goal of policy-relevant evaluations. Such trials may be done on a one-off basis or as part of a more systematic approach to external/field replications to learn whether some interventions are indeed effective in a variety of settings.
Neither can be done well if details about how the active ingredients were implemented and measured are not reported. This, in turn, is quite difficult without tools to measure and document processes and decisions made along the way. But this needs to be sorted out, because reporting on implementation and evaluation experiences and challenges is central to a learning agenda in the social sciences and in programmatic and policy work.
This argument is not new. Lincoln and Guba (1986) call for a "narrative developed about the [setting so that] judgments about the degree of fit or similarity may be made by others who wish to apply all or part of the findings elsewhere". It seems that similar concerns motivated Woolcock to close his paper on external validity with a call for more case study work. Thickness and richness of description, rather than thinness, helps users of evidence learn and make assessments and adjustments in light of their own setting. This description, guided by a good theory of change, can directly address some key challenges to external validity, such as site selection and partner selection biases.
Failure to report intervention details that track along a detailed theory of change could be detrimental, leading to ill-advised implementation and/or a locally inappropriate intervention. In the case of the bed-net intervention, failure to report on challenges (e.g., the need for extra community buy-in, developing efficient supply chains, clarity about what government workers could not handle alone), as well as enabling factors (e.g., community-level awareness of proper bed-net usage, the operational strength and local reputation of the implementing partner), could mean that the research team conducts a study (and intervention) that, at the least, inefficiently uses both research and implementation resources, and at the worst, has negative unintended consequences. Ultimately, the conversations that combine research transparency and policy recommendations should prioritize high-quality, systematic, and readily available data from all parts of the impact evaluation cycle.
Our current efforts to be ‘rigorous’ while conducting evaluations are insufficient. We must also be rigorous in our efforts to document evaluation and implementation processes, to report on these, and to critically and openly reflect on how we might approach the same problem differently in the future. This level of transparency, though initially daunting, will only improve the potential for better-informed and better-implemented policies, aiding us in transferring lessons from ‘here’ to ‘there’.