Tools of the Trade

Endogenous stratification: the surprisingly easy way to bias your heterogeneous treatment effect results and what you should do instead

By David McKenzie

A common question of interest in evaluations is “which groups does the treatment work for best?” A standard way to address this is to look at heterogeneity in treatment effects with respect to baseline characteristics. However, there are often many such possible baseline characteristics to look at, and really the heterogeneity of interest may be with respect to outcomes in the absence of treatment. Consider two examples:
A: A vocational training program for the unemployed: we might want to know whether the treatment does more for those who were likely to stay unemployed in the absence of an intervention than for those who would likely have found a job anyway.
B: Smaller class sizes: we might want to know whether the treatment does more for students whose test scores would have been low in the absence of smaller classes than for students who were likely to get high test scores anyway.

Why is Difference-in-Difference Estimation Still so Popular in Experimental Analysis?

By Berk Ozler
David McKenzie pops up behind many of the empirical questions that come up in my research projects, and it still surprises me every time, despite his prolific output. The last time it happened was a teachable moment for me, so I thought I’d share it in a short post that fits nicely under our “Tools of the Trade” tag.

Tools of the Trade: a joint test of orthogonality when testing for balance

By David McKenzie
This is a very simple (and for once short) post, but since I have been asked this question quite a few times by people who are new to doing experiments, I figured it would be worth posting. It is also useful for non-experimental comparisons of a treatment and a control group.

Curves in all the wrong places: Gelman and Imbens on why not to use higher-order polynomials in RD

By David McKenzie
A good regression-discontinuity can be a beautiful thing, as Dave Evans illustrates in a previous post. The typical RD consists of controlling for a smooth function of the forcing variable (i.e. the score that has a cut-off where people on one side of the cut-off get the treatment, and those on the other side do not), and then looking for a discontinuity in the outcome of interest at this cut-off. A key practical problem is then how exactly to control for the forcing variable.
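As a rough illustration of the setup described above, here is a minimal sketch that fits a separate straight line on each side of the cut-off, using only observations within a bandwidth, and reads off the gap between the two fitted lines at the cut-off. The function names, the rule that units at or above the cut-off are treated, and the toy data are all assumptions made for the example; this is one simple way to control for the forcing variable, not necessarily the approach the post itself recommends.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def rd_estimate(scores, outcomes, cutoff, bandwidth):
    """Jump in the fitted outcome at the cutoff.

    Assumes (hypothetically) that units with score >= cutoff are treated.
    Scores are re-centred at the cutoff so each intercept is the fitted
    outcome exactly at the cutoff.
    """
    pairs = list(zip(scores, outcomes))
    left = [(s - cutoff, y) for s, y in pairs if cutoff - bandwidth <= s < cutoff]
    right = [(s - cutoff, y) for s, y in pairs if cutoff <= s <= cutoff + bandwidth]
    a_left, _ = fit_line(*zip(*left))
    a_right, _ = fit_line(*zip(*right))
    return a_right - a_left


# Toy, noise-free data: outcome rises smoothly with the score and jumps by 3
# for units at or above a cut-off of 50.
scores = list(range(100))
outcomes = [1 + 0.02 * s + (3 if s >= 50 else 0) for s in scores]
estimate = rd_estimate(scores, outcomes, cutoff=50, bandwidth=20)
```

With real data the estimate depends on the bandwidth, so it is worth checking how much the result moves as the bandwidth changes.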

Tools of the trade: recent tests of matching estimators through the evaluation of job-training programs

By Jed Friedman
Of all the impact evaluation methods, the one that consistently (and justifiably) comes last in the methods courses we teach is matching. We de-emphasize this method because it requires the strongest assumptions to yield a valid estimate of causal impact. The most important of these is unconfoundedness, namely that selection into treatment can be fully captured as a function of observable covariates in the data.

Tools of the trade: when to use those sample weights

By Jed Friedman

In numerous discussions with colleagues I am struck by the varied views and confusion around whether to use sample weights in regression analysis (a confusion that I share at times). A recent working paper by Gary Solon, Steven Haider, and Jeffrey Wooldridge aims at the heart of this topic. It is short and comprehensive, and I recommend it to all practitioners confronted by this question.

“Oops! Did I just ruin this impact evaluation?” Top 5 of mistakes and how the new Impact Evaluation Toolkit can help.

By Christel Vermeersch

On October 3rd, I sent out a survey asking people what was the biggest, most embarrassing, dramatic, funny, or other oops mistake they made in an impact evaluation. Within a few hours, a former manager came into my office to warn me: “Christel, I tried this 10 years ago, and I got exactly two responses.” 

Tools of the Trade: Intra-cluster correlations

By David McKenzie

In clustered randomized experiments, random assignment occurs at the group level, with multiple units observed within each group. For example, education interventions might be assigned at the school level, with outcomes measured at the student level, or microfinance interventions might be assigned at the savings group level, with outcomes measured for individual clients.
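To make the idea concrete, here is a minimal sketch of how an intra-cluster correlation could be estimated from outcome data grouped by cluster, using the standard one-way ANOVA estimator, along with the usual design effect formula 1 + (m - 1) * rho. It assumes equal cluster sizes to keep the arithmetic simple; the function names and the toy scores are made up for the example.

```python
from statistics import mean


def anova_icc(clusters):
    """One-way ANOVA estimate of the intra-cluster correlation.

    `clusters` is a list of lists of outcomes, one inner list per cluster.
    This sketch assumes every cluster has the same size m.
    """
    k = len(clusters)
    m = len(clusters[0])
    if any(len(c) != m for c in clusters):
        raise ValueError("this sketch assumes equal cluster sizes")
    grand_mean = mean([y for c in clusters for y in c])
    cluster_means = [mean(c) for c in clusters]
    # Between-cluster and within-cluster mean squares.
    msb = m * sum((cm - grand_mean) ** 2 for cm in cluster_means) / (k - 1)
    msw = sum(
        (y - cm) ** 2 for c, cm in zip(clusters, cluster_means) for y in c
    ) / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)


def design_effect(cluster_size, icc):
    """Variance inflation relative to individual-level randomization."""
    return 1 + (cluster_size - 1) * icc


# Toy example: three schools with three students each.
test_scores = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
rho = anova_icc(test_scores)
deff = design_effect(cluster_size=3, icc=rho)
```

The higher the correlation of outcomes within a group, the less independent information each extra unit in that group adds, which is what the design effect summarizes.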

Tools of the Trade: A quick adjustment for multiple hypothesis testing

By David McKenzie

As our impact evaluations broaden to consider more and more possible outcomes of economic interventions (an extreme example being the 334 unique outcome variables considered by Casey et al. in their CDD evaluation) and increasingly investigate the channels of impact through subgroup heterogeneity analysis, the issue of multiple hypothesis testing is becoming more prominent.
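For readers who want a feel for what such an adjustment does, here is a minimal sketch of two widely used corrections applied to a list of p-values: Bonferroni, which controls the family-wise error rate, and Benjamini-Hochberg, which controls the false discovery rate. These are chosen for illustration only and are not necessarily the adjustment the post goes on to describe; the function names and the example p-values are hypothetical.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from four outcome regressions.
raw = [0.01, 0.04, 0.03, 0.20]
fwer_adjusted = bonferroni(raw)
fdr_adjusted = benjamini_hochberg(raw)
```

An outcome is then judged significant if its adjusted value falls below the chosen threshold; the Bonferroni values are never smaller than the Benjamini-Hochberg ones, which reflects the stricter error rate being controlled.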