Sorting through heterogeneity of impact to enhance policy learning

Jed Friedman

The demand and expectation for concrete policy learning from impact evaluation (IE) are high. Quite often we want to know more than the answer to the basic question that IE addresses: "what is the impact of intervention X on outcome Y in setting Z?" We also want to know the why and the how behind the observed impacts. But these why and how questions, often not explicitly incorporated into the IE design for various reasons, can be particularly challenging to answer. Sometimes we turn to mixed methods to help us understand the causal channels behind an observed impact. Other times we explore the heterogeneity of impact and look for discernible patterns with respect to important covariates.

This exploration of the correlates of heterogeneity can be hit or miss. At times there are no easily understandable patterns in the results; at other times there are such patterns but we cannot find a suitable explanation for why they arise. Nevertheless, the systematic exploration of impact heterogeneity can yield dividends. I recently read an excellent example of this approach: an evaluation of charter school effectiveness in Boston. (Charter schools are independently managed schools that receive public financing.) The authors, Angrist, Pathak, and Walters, first document the magnitude of impact heterogeneity in a relatively large sample of charter schools in Massachusetts. They then present a framework for interpreting this heterogeneity using both student- and school-level information.

Angrist and co-authors find that charter schools significantly increase test scores, at least in an urban setting. This impact is estimated from the randomization inherent in the lottery system that either assigns a student a place in an oversubscribed charter school or does not. The lottery-based estimates suggest that a year spent in a charter school raises an urban middle school student's score on the standardized English exam by 0.15 sd and on the math exam by 0.32 sd (the gains are even larger at the high school level). In contrast, the non-urban charter schools fail to raise test scores and may even lower them for middle school students. This lack of improvement in non-urban charter schools is consistent with the broader literature and stands as a clear dimension of heterogeneity that can be explored further.
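For readers who want the mechanics, the lottery logic can be sketched with simulated data: the randomized offer serves as an instrument for attendance, and the Wald/2SLS estimate divides the intent-to-treat effect on scores by the effect on attendance. Everything below (sample size, compliance rate, noise level) is invented for illustration; only the 0.32 sd math effect echoes the paper's estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical setup: a randomized lottery offer shifts charter attendance,
# and attendance raises math scores by a true effect of 0.32 sd.
offer = rng.integers(0, 2, n)        # randomized lottery offer
takeup = rng.random(n) < 0.7         # assumed: 70% of offered students enroll
attend = offer * takeup              # no always-takers in this sketch
score = 0.32 * attend + 0.5 * rng.standard_normal(n)

# Wald / 2SLS estimate: ITT effect on scores divided by ITT effect on attendance.
itt_score = score[offer == 1].mean() - score[offer == 0].mean()
itt_attend = attend[offer == 1].mean() - attend[offer == 0].mean()
late = itt_score / itt_attend
print(round(late, 2))
```

The ITT effect on scores alone would understate the effect of attending, since only a fraction of lottery winners enroll; dividing by the first stage recovers the effect for compliers.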

To do so, the authors first investigate the role of student characteristics through a straightforward application of the Oaxaca-Blinder decomposition. This decomposition allocates the difference in the urban/non-urban impact estimates into a component due to differences in the demographics of charter students and a component due to differences in the effectiveness of urban charter schools conditional on demographics.
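A stylized sketch of the decomposition may help (all numbers here are invented, not the paper's): regress per-student effects on a demographic covariate separately for urban and non-urban schools, then split the mean gap into a part explained by covariate differences (evaluated here at the urban coefficients) and an unexplained part reflecting differential effectiveness.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

def ols(X, y):
    # least-squares coefficients, with an intercept column prepended
    X1 = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0]

# Hypothetical data: per-student effect estimates (y) and a demographic
# covariate x (e.g. an indicator of disadvantage) for urban (A) and
# non-urban (B) charter students.
xA = rng.random(n)          # urban charters serve more disadvantaged students
xB = rng.random(n) * 0.5
yA = 0.05 + 0.40 * xA + 0.05 * rng.standard_normal(n)
yB = 0.02 + 0.10 * xB + 0.05 * rng.standard_normal(n)

bA, bB = ols(xA[:, None], yA), ols(xB[:, None], yB)
XA = np.array([1.0, xA.mean()])
XB = np.array([1.0, xB.mean()])

gap = yA.mean() - yB.mean()
explained = (XA - XB) @ bA       # due to different student demographics
unexplained = XB @ (bA - bB)     # due to different effectiveness
print(round(gap, 3), round(explained, 3), round(unexplained, 3))
```

The two components sum to the gap by construction; which coefficient vector is used to weight the covariate differences is a standard choice in Oaxaca-Blinder applications.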

The exercise suggests that roughly half of the relative test score gain in urban schools is due to the demographic characteristics of the students – it turns out that urban charter schools are especially effective for poor and minority students, and that these schools indeed serve more of these students. This leaves the other half of the relative gain likely due to differential effectiveness at the school level. But which school-level characteristics are most associated with effective schools?

To answer this question the authors combine their data with information on non-charter (i.e. regular public) schools and conduct an observational analysis that matches students in charter schools with those in public schools. This allows a fairly flexible regression framework to explore differences in effectiveness with respect to student, peer, and school characteristics.
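The matching step might look something like this in miniature. This is purely illustrative data and a crude nearest-neighbor stand-in for whatever matching procedure the authors actually use; the 0.15 sd "charter effect" is assumed for the simulation.

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented covariates (think baseline score plus demographics) for charter
# and public-school students; outcomes depend on the first covariate, and
# charter attendance adds a hypothetical 0.15 sd.
charter_x = rng.standard_normal((200, 3))
public_x = rng.standard_normal((2000, 3))
y_charter = 0.15 + 0.2 * charter_x[:, 0] + 0.1 * rng.standard_normal(200)
y_public = 0.2 * public_x[:, 0] + 0.1 * rng.standard_normal(2000)

# Match each charter student to the nearest public student in covariate space,
# then average the matched outcome differences.
dists = np.linalg.norm(charter_x[:, None, :] - public_x[None, :, :], axis=2)
nearest = dists.argmin(axis=1)
att = (y_charter - y_public[nearest]).mean()
print(round(att, 2))
```

Unlike the lottery estimates, this design leans on the matched covariates capturing everything that drives both charter attendance and outcomes, which is why the observational results serve as a complement rather than a substitute.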

Other work in the area suggests that teacher feedback, a focus on tutoring, increased instruction time, and high expectations significantly predict effectiveness in charter schools. It turns out that one model of charter schools in the Boston area also contains these features – the “No Excuses” model explicitly adopted by approximately two-thirds of the urban charter schools in Boston. The “No Excuses” model includes a strict disciplinary environment, an emphasis on student behavior and comportment, and extended instruction time (with a renewed focus on reading and math skills).

Angrist and co-authors find that controlling for standard inputs in the education production function, such as instruction time and per-pupil expenditures, does little to explain the charter school treatment effect. However, it turns out that a No Excuses dummy fully accounts for the differential treatment effect between urban and non-urban charters. In other words, much of the differential effectiveness of urban charter schools appears to be due to features associated with the No Excuses model.
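As a toy version of that regression (the coefficients, sample, and data-generating process are all made up), suppose school-level effects are driven entirely by No Excuses practices, which are concentrated among urban charters; then the urban coefficient shrinks toward zero once the dummy is added:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400  # hypothetical school-level effect estimates

urban = rng.integers(0, 2, n)
# Assumed: No Excuses practices appear only among urban charters (65% of them)
# and are what actually drives the score gains in this simulation.
no_excuses = (rng.random(n) < 0.65 * urban).astype(float)
effect = 0.02 + 0.30 * no_excuses + 0.03 * rng.standard_normal(n)

def coefs(X, y):
    # OLS with an intercept; returns [intercept, slopes...]
    X1 = np.column_stack([np.ones(len(y))] + X)
    return np.linalg.lstsq(X1, y, rcond=None)[0]

b_urban_only = coefs([urban], effect)             # urban gap, no controls
b_with_dummy = coefs([urban, no_excuses], effect) # urban gap given No Excuses
print(round(b_urban_only[1], 2), round(b_with_dummy[1], 2))
```

When the dummy absorbs the urban differential like this, the regression cannot say *why* No Excuses works, only that the urban advantage and the No Excuses practices are observationally entangled – which is exactly the limit the authors run into.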

Why is the “No Excuses” model apparently so effective? It is difficult for the authors to unpack this further – these are the data-constrained limits of this heterogeneity analysis – although they note that certain behavioral measures linked to comportment and discipline (such as suspension and truancy) are significantly affected by charter status in urban areas. As the No Excuses model fails to reach comparable achievement gains in non-urban areas, this suggests there may be important interactions between the No Excuses model and population (or school) characteristics.

So this exploratory work, at the end of the day, only suggests possible causal channels (while ruling out others). But it's a critical step in learning that will hopefully point the way to further theoretical modeling and, perhaps, the design of targeted mechanism experiments.