I want to thank Catherine, David and some anonymous readers for their responses to last week’s post on who pays for evaluations. Their thoughtful responses led me to think more about objectivity and engagement with project teams, so here it goes:
The way I see it, there are three and a half main reasons to do an impact evaluation: 1) it’s a really expensive project and rigorous evidence is going to strengthen the justification, 2) it’s a new setting for an intervention, 3) it’s a new intervention, and 3.5) it’s a new intervention (or an old intervention carried out in a way) that sheds some fundamental light on a question of theory or behavior.
So, depending on your job and what makes you happy, I suspect at least one would apply – I’ve definitely done evaluations for more than one of these reasons. As I approach the project team, though, my big initial question is: do I think there is a chance this could work? I’ve never, yet, undertaken an evaluation where I thought the intervention was hopeless. (This is not to say these types of evaluation shouldn’t be done – in fact I hope to be able to blog soon about a really interesting paper on an intervention that does the exact opposite of what is intended).
But for most of them, I think or hope it’s going to work. And to me, this is the first big threat to objectivity – I don’t want the results to be bad. This very often comes with a belief in the project team – if it’s going to fly, having a plausible pilot and crew helps (sorry, I can’t help myself – I’m at the airport). And this is a further threat to my objectivity – some of the implementers I work with end up being my heroes (yes mom, your son looks up to bureaucrats, as well as a guy who threatened to break his legs). And the ones I really admire are those who want to learn – who approach things with an attitude of “I think this might work, but if not, I want to know why and what else I can do.”
And this is where the tide turns, because these folks, with whom I am less likely to be objective, are the very ones who demand it of me.
But many evaluations that I do don’t quite reach this happy scientific nirvana. Most of the implementers do something because their prior is “it works” and, given the amount of time they are going to put in, it’s a fairly strong prior (the agnosticism, I find, is more vested in the managers – like one who told me (post-implementation) – if you show any effects of this program, then I’ll know you are lying). And with these priors, and my priors, is where I find I have to be extra careful. More on mechanisms for this in a minute.
I think the best, most informative evaluations I’ve done (or am doing) are those where I’ve engaged early and deeply with the project team. This interaction is key to understanding what the project is doing (as opposed to what was in the description set down on paper), what the potential effects might be and how we might measure them. And then, when there are results, discussing them with the team is key – they often know better than I do what they mean and how to unpack them: which heterogeneity might be relevant (in implementation and in effects), what the timing of different outcomes is likely to be, and the like. Bottom line: this engagement is critical.
But this engagement usually deepens my respect for the folks implementing the program. And this makes a negative or zero result harder to live with. As David suggests, we can put in place external mechanisms to push us to objectivity – if I had to register each evaluation at inception in a place other people could see, I would feel more obligated to report everything, whatever the outcomes. Things like requiring the public release of data (as the World Bank Research Group does) can also help. And, as David would agree, there is also the internal motivation. If I really respect these guys, I owe them the truth as best I can measure it. Now, should these be published? Ideally, yes – but with the increasing emphasis of funders on “results” (that is, tangible things they can take to their constituents, an emphasis that leaves less room for aggregating the lessons of failures) and the lack of top-notch publishing outlets for all but the most counter-intuitive negative/zero results, the reality is messier. But again, some kind of registry, with space for the ultimate results, could help other programs learn from what doesn’t work instead of just what works.
Look, I started writing this post in the third person – my ethics training coming out with “ought” and “one” – but these are fundamentally personal judgments, with a bunch of ways to get the right balance between objectivity and engagement to produce results with integrity. Thoughts?