
An incrementalist view of Impact Evaluation and knowledge

Jed Friedman


Some years back I received rather bemusing comments from an anonymous referee regarding a journal submission. He or she wrote: “This study doesn’t constitute an economics paper but is simply one large calculation”. Suffice it to say, I didn’t agree. However, the implied sentiment – that only work containing behavioral models grounded in economic theory, embodied in at least one line of offset mathematical text, constitutes real economic work – is a common one. I was reminded of this sentiment as I read through a recent article in the American Economic Review, one of our flagship journals. The article in question presents an environmental accounting framework for air pollution impacts in the United States and, yes, it is one very large calculation.

It is also a meaningful very large calculation insofar as it attempts to develop a framework for integrating the negative externalities of air pollution from economic activities into the national economic accounts. Of course, any attempt to assess the value (or cost) of non-market services (or dis-services) depends on several key assumptions, and the authors, Muller, Mendelsohn, and Nordhaus, are explicit in listing them. For one, they limit the considered pollutants to six of the most prevalent (such as sulfur dioxide and particulate matter) and are forced to ignore others due to lack of data (one important pollutant, carbon dioxide, is not featured because source-specific emissions data are unavailable). The authors translate the flow of these pollutants into selected dis-services such as ill-health, mortality, and reduced crop yields. By necessity, they consider only the dis-services that have previously been measured to a sufficient degree, and they then monetize these dis-services.

While perhaps not entirely revelatory, the findings are very interesting. For example:

- The preferred estimate for Gross External Damages from air pollution in 2002 is US$184 billion. Taken at face value, the 2002 GDP estimate would need to be adjusted by this amount in order to account for air pollution externalities.

- The external damages from certain industries, such as stone quarrying and marinas, exceed their value added in the national accounts. The largest contributor of gross damages is coal-fired electric generation.

- The gross damage per kilowatt hour of coal-generated electricity (2.8 cents/kWh) is more than three times that of natural-gas-generated electricity (0.9 cents/kWh) – and this estimate doesn’t yet take into account externalities from CO2 emissions, which per unit of energy are much higher for coal.
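The arithmetic behind these headline figures is simple to sketch. In the snippet below, the per-kWh damage figures (2.8 and 0.9 cents) come from the article; the generation quantities and function names are hypothetical, chosen purely for illustration of the accounting step, not taken from the authors' data.

```python
# Sketch of the gross-external-damages arithmetic described above.
# Damage rates per kWh are quoted from the article; generation
# quantities below are made-up illustrative values.

def gross_damages_usd(generation_kwh, damage_cents_per_kwh):
    """Total external damages in dollars for one generation source."""
    return generation_kwh * damage_cents_per_kwh / 100.0

coal_damage_per_kwh = 2.8   # cents/kWh (from the article)
gas_damage_per_kwh = 0.9    # cents/kWh (from the article)

# Hypothetical annual generation totals, for illustration only:
coal_generation = 2.0e12    # kWh
gas_generation = 0.7e12     # kWh

total = (gross_damages_usd(coal_generation, coal_damage_per_kwh)
         + gross_damages_usd(gas_generation, gas_damage_per_kwh))

print(f"Coal/gas damage ratio per kWh: "
      f"{coal_damage_per_kwh / gas_damage_per_kwh:.1f}")
print(f"Illustrative total damages: ${total / 1e9:.1f} billion")
```

Summing such per-source damages over every polluting activity in the economy is, in essence, the “one very large calculation” that produces the aggregate figures above.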

One key step in this massive exercise is the ability to link pollution levels to detrimental effects, especially health-related effects such as premature mortality. This linkage is based on a summary of peer-reviewed dose-response relationships – these studies are typically descriptive “impact-evaluation-type” work that systematically varies the level of treatment, either deliberately or through a natural experiment.
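To make the linkage concrete, here is a minimal sketch of how a dose-response estimate feeds into a monetized damage figure: a linear concentration-response function maps a pollution increment to excess deaths, which are then valued. Every number and parameter name below is hypothetical, not drawn from the article or the underlying epidemiological literature.

```python
# Hypothetical sketch of the dose-response-to-damages step:
# a linear concentration-response (C-R) function converts a pollution
# increment into excess deaths, which are then monetized.
# All parameter values are illustrative inventions.

def excess_deaths(delta_concentration, population, baseline_mortality, beta):
    """Excess annual deaths from a pollution increment (linear C-R form)."""
    return population * baseline_mortality * beta * delta_concentration

def monetized_damage(deaths, value_of_statistical_life):
    """Dollar value of the estimated excess mortality."""
    return deaths * value_of_statistical_life

pop = 1_000_000   # exposed population (hypothetical)
base_mort = 0.008 # baseline annual mortality rate (hypothetical)
beta = 0.006      # proportional mortality increase per ug/m3 (hypothetical)
delta_c = 5.0     # pollution increment in ug/m3 (hypothetical)
vsl = 6.0e6       # value of a statistical life in USD (hypothetical)

deaths = excess_deaths(delta_c, pop, base_mort, beta)
damage = monetized_damage(deaths, vsl)
print(f"Excess deaths: {deaths:.0f}; "
      f"monetized damage: ${damage / 1e6:.0f} million")
```

The coefficient `beta` is exactly the kind of parameter that descriptive dose-response studies estimate; without such measurement, the rest of the accounting framework has nothing to build on.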

Dose-response studies generally do not test an explicit theoretical framework, but their usefulness is clear in the example above. I walk through this example because I believe it helps clarify (what I find to be) the increasingly sterile debate about whether impact evaluation, as practiced today, forces method to drive the research question and is deliberately light on theory. I’m sure every reader is familiar with this discussion’s concern that practitioner affection for random assignment results in the choice of a-theoretic, and hence impoverished, research questions.

A recent edition of the Journal of Economic Perspectives explores this debate through a series of thoughtful papers. In one, Card, DellaVigna, and Malmendier review recently published field experiments in top journals and propose a four-class taxonomy of experimental studies:

- Descriptive studies that lack any formally specified model;

- Single-model studies that lay out a formal economic model and test its qualitative implications;

- Competing-models studies that specify alternative models with at least one contrasting implication and test between them;

- Parameter-estimation studies that specify a complete data-generating process for the observed data in order to obtain estimates of structural parameters.

Card and colleagues explicitly state that they present a descriptive taxonomy. However, I am sure many of my colleagues naturally interpret this taxonomy as a hierarchy, with descriptive studies at the bottom and competing-models or parameter-estimation studies at the top.

The problem with this hierarchy, however, is that useful work is not always concerned with distinguishing between contrasting theoretical models. Indeed, useful work may simply involve the accurate measurement of a dose-response relationship.

Now, by saying this, I don’t deny the need for good empirical work to be informed by theory, either implicitly or explicitly (although the theory that informs a useful empirical study need not be economic theory per se, but theory from any social or natural science). In the same vein, good theory cannot ignore the lessons of empirics. Every piece of research is a negotiation between theory and data, with the tipping of the scales determined by the nature of the question and the inclination of the research team.

In fact, as highlighted by Card and colleagues, there tends to be a progression in experimental empirical work on any particular topic, as certain empirical facts are first suggested by descriptive studies and then incorporated into explicit model-based tests. For example, the experimental work on the determinants of charitable giving began with descriptive studies and has more recently evolved into formal tests between contrasting models of giving.

Knowledge gain is incremental. As Angrist and Pischke wrote in an earlier article in the Journal of Economic Perspectives: “the process of accumulating empirical evidence is rarely sexy in the unfolding, but accumulation is the necessary road along which results become more general”. While we should try to be creative and efficient in our approach – and without a doubt, in some settings a series of relatively small-scale mechanism experiments, in the vein of Ludwig, Kling, and Mullainathan, would be more sensible than a single large policy evaluation – each impact evaluation study, be it descriptive or model-based, holds the possibility of expanding the bounds of human knowledge in some modest direction.