
Jed Friedman's blog

A proposed taxonomy of behavioral responses to evaluation


My summary of recent attempts to quantify the Hawthorne effect a few weeks back led to some useful exchanges with colleagues and commenters who pointed me to further work I hadn’t yet read. It turns out that, historically, there has been a great deal of inconsistent use of the term “Hawthorne effect”. The term has referred not only to (a) behavioral responses to a subject’s knowledge of being observed – the definition we tend to use in impact evaluation – but also to (b) behavioral responses to simple participation in a study, or even (c) a subject’s wish to alter behavior in order to please the experimenter. Of course all these definitions are loosely related, but it is important to be conceptually clear in our use of the term since there are several distinct inferential challenges to impact evaluation arising from the messy nature of behavioral responses to research. The Hawthorne effect is only one of these possible challenges. Let me lay out a classification of different behavioral responses that, if and when they occur, may threaten the validity of any evaluation (with a strong emphasis on may).

Quantifying the Hawthorne Effect

This post is co-authored with Brinda Gokul
 
Many who work on impact evaluation are familiar with the concept of the Hawthorne effect and the risk it poses to accurate inference of causal impact. But if the concept is new to you, let’s quickly review the definition and history of the Hawthorne effect:
 

Involving local non-state capacity to improve service delivery: it can be more difficult than it appears


When state institutions find it a challenge to deliver services in under-resourced areas, it’s common for policy makers to consider leveraging existing local non-state capacity to help. This involvement of NGOs or CBOs is meant to supplement the state as service provider, but a recent paper by Ashis Das, Eeshani Kandpal, and me demonstrates possible pitfalls with this approach. Just as the implementation capacity of governments is a key determinant of government program performance, NGO capacity is a key determinant of NGO performance, and under-resourced areas are likely to contain under-resourced local organizations. We find this to be the case in our study context of malaria control in endemic regions of India. Besides highlighting this challenge, our results also highlight the difficulties that small-scale evaluations, especially those implemented by non-state actors, present for the generalizability of findings. Implementation capacity can be a key confounder of generalizability, yet it is not often measured or even discussed. The current practice of impact evaluation needs to think harder about measures that capture implementation capacity in order to generalize IE results to other contexts.

External validity as seen from other quantitative social sciences - and the gaps in our practice

For impact evaluation to inform policy, we need to understand how the intervention will work in the intended population once implemented. However, impact evaluations are not always conducted in a sample representative of the intended population, and sometimes they are not conducted under the implementation conditions that would exist at scale-up.

Towards a more systematic approach to external validity: understanding site-selection bias

The impact evaluation of a new policy or program aims to inform the decision on wider adoption and even, perhaps, national scale-up. Yet often the practice of IE involves a study in one localized area – a sample “site” in the terminology of a newly revised working paper by Hunt Allcott. This working paper leverages a unique program roll-out in the U.S. to explore the challenges and pitfalls that arise when generalizing IE results from a handful of sites to a larger context. And the leap from impact estimates obtained in one site to conclusions about the larger world is not always straightforward.

The often (unspoken) assumptions behind the difference-in-difference estimator in practice

This post is co-written with Ricardo Mora and Iliana Reggio
 
The difference-in-difference (DID) evaluation method should be very familiar to our readers – a method that infers program impact by comparing the pre- to post-intervention change in the outcome of interest for the treated group relative to a comparison group. The key assumption here is what is known as the “Parallel Paths” assumption, which posits that the average change in the comparison group represents the counterfactual change in the treatment group had there been no treatment. It is a popular method in part because the data requirements are not particularly onerous – it requires data from only two points in time – and the results are robust to any possible confounder as long as it doesn’t violate the Parallel Paths assumption. When data on several pre-treatment periods exist, researchers like to check the Parallel Paths assumption by testing for differences in the pre-treatment trends of the treatment and comparison groups. Equality of pre-treatment trends may lend confidence, but it cannot directly test the identifying assumption, which is by construction untestable. Researchers also tend to model the “natural dynamics” of the outcome variable explicitly, by including flexible time dummies for the control group and a parametric time trend differential between the control and the treated in the estimating specification.
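For readers newer to the method, a minimal two-period sketch may help (the notation here is ours, added purely for illustration). With a treatment-group indicator $D_i$ and a post-period indicator $Post_t$, the DID impact estimate is the coefficient $\tau$ in

$$Y_{it} = \alpha + \beta\, D_i + \gamma\, Post_t + \tau\,(D_i \times Post_t) + \varepsilon_{it},$$

and Parallel Paths amounts to assuming that, absent treatment, the two groups would on average have changed by the same amount:

$$E[\,Y_{i1}(0) - Y_{i0}(0) \mid D_i = 1\,] = E[\,Y_{i1}(0) - Y_{i0}(0) \mid D_i = 0\,].$$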
 
Typically, the applied researcher’s practice of DID ends at this point. Yet a very recent working paper by Ricardo Mora and Iliana Reggio (two co-authors of this post) points out that DID-as-commonly-practiced implicitly involves other assumptions instead of Parallel Paths, assumptions perhaps unknown to the researcher, which may influence the estimate of the treatment effect. These assumptions concern the dynamics of the outcome of interest, both before and after the introduction of treatment, and the implications of the particular dynamic specification for the Parallel Paths assumption.
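To see how a dynamic specification can quietly change the identifying assumption, consider one purely illustrative case (ours, not taken from the paper): adding a lagged outcome to the right-hand side,

$$Y_{it} = \alpha_t + \beta\, D_i + \tau\,(D_i \times Post_t) + \rho\, Y_{i,t-1} + \varepsilon_{it},$$

means the counterfactual is no longer “the comparison group’s average change” but “the comparison group’s average change conditional on last period’s outcome” – a different, and generally non-nested, assumption. Making such implicit choices explicit is the kind of exercise the working paper undertakes.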

Policy learning with impact evaluation and the “science of delivery”

The “science of delivery”, a relatively new term among development practitioners, refers to the focused study of the processes, contexts, and general determinants of the delivery of public services and goods. Or to paraphrase my colleague Adam Wagstaff, the term represents a broadening of inquiry towards an understanding of the “how to deliver” and not simply a focus on the “what to deliver”.
 
