Another reason to prefer Ancova: dealing with changes in measurement between baseline and follow-up


A few months ago, Berk blogged about my paper on the case for more T, and in particular on the point that Ancova estimation can deliver a lot more power than difference-in-differences when outcomes are not strongly autocorrelated. I continue to get a number of questions about this paper, and some of them have recently led me to emphasize another potential benefit of Ancova that I don't discuss in the paper: it can be a useful way of dealing with changes in measurement between baseline and follow-up.
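The power point can be illustrated with a small simulation. The sketch below uses made-up numbers (n = 200, autocorrelation rho = 0.3, treatment effect 0.5, all hypothetical) and plain least squares via numpy, and compares the spread of the difference-in-differences estimate with the spread of the Ancova estimate across simulated experiments:

```python
import numpy as np

# Minimal simulation (hypothetical parameters): with modest autocorrelation,
# the Ancova treatment-effect estimate varies less across experiments than
# the difference-in-differences estimate.
rng = np.random.default_rng(0)
n, reps, rho, tau = 200, 500, 0.3, 0.5

did_est, anc_est = [], []
for _ in range(reps):
    treat = rng.integers(0, 2, n)
    y0 = rng.normal(0, 1, n)  # baseline outcome
    y1 = rho * y0 + np.sqrt(1 - rho**2) * rng.normal(0, 1, n) + tau * treat

    # Difference-in-differences: regress the change (Y1 - Y0) on treatment
    X = np.column_stack([np.ones(n), treat])
    did_est.append(np.linalg.lstsq(X, y1 - y0, rcond=None)[0][1])

    # Ancova: regress the follow-up outcome on treatment and the baseline
    Xa = np.column_stack([np.ones(n), treat, y0])
    anc_est.append(np.linalg.lstsq(Xa, y1, rcond=None)[0][1])

sd_did, sd_anc = np.std(did_est), np.std(anc_est)
print(f"DiD sd: {sd_did:.3f}, Ancova sd: {sd_anc:.3f}")
```

Both estimators are unbiased here; the difference is entirely in the standard errors, and the gap grows as rho falls toward zero.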

Let me discuss three different types of changes in measurement, and how Ancova deals with each of them better than difference-in-differences.

1. Changes in who the outcome gets measured for between baseline and follow-up, because the baseline data are missing for some observations. For example, in your baseline survey perhaps some firms didn't report profits, or perhaps you only had enough funding to test half of the kids.
With Ancova, you just dummy out the baseline data for these observations: create a dummy variable missingbaseline, set Y0 to zero wherever the baseline is missing, and then run Y = a + b*Treat + c*Y0 + d*missingbaseline + any other controls.
With difference-in-differences you would typically throw away these observations, since you can’t take a difference if you don’t have the baseline.
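As a sketch of the dummying-out step, using simulated data with hypothetical parameters (a true treatment effect of 1.0 and roughly 20% of baselines missing):

```python
import numpy as np

# Simulated illustration of dummying out a missing baseline rather than
# dropping those observations (all parameter values are hypothetical).
rng = np.random.default_rng(1)
n = 500
treat = rng.integers(0, 2, n)
y0 = rng.normal(0, 1, n)
observed_baseline = rng.random(n) > 0.2  # ~20% did not report at baseline
y1 = 1.0 * treat + 0.5 * y0 + rng.normal(0, 1, n)

# Set Y0 to zero where the baseline is missing, and flag it with a dummy
missingbaseline = (~observed_baseline).astype(float)
y0_filled = np.where(observed_baseline, y0, 0.0)

# Y = a + b*Treat + c*Y0 + d*missingbaseline
X = np.column_stack([np.ones(n), treat, y0_filled, missingbaseline])
a, b, c, d = np.linalg.lstsq(X, y1, rcond=None)[0]
print(f"treatment effect b = {b:.2f}")  # close to the true effect of 1.0
```

All observations contribute to the estimate of b, whereas the difference-in-differences regression would have to drop the 20% without a baseline.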
2. Changes in the recall period for measurement. This has happened in several projects recently. For example, the baseline asked about monthly profits, but because of high non-response for monthly recall (see change 1 above), the follow-up switched to a weekly recall. In another example, we asked at baseline whether the firm had done several innovative activities in the last year, but since the first follow-up was only 6 months after treatment, asked about innovative activities in the last 6 months at follow-up.
With Ancova, this creates no problems. We can run a regression like:
Weekly profits = a + b*Treat + c*Baseline monthly profits + other controls
Here c controls for baseline monthly profits to the extent that they are useful in explaining weekly profits at follow-up, but it is still very clear that we are estimating the treatment effect on weekly profits at follow-up.
In contrast, with difference-in-differences, you might try to convert the monthly profits to a weekly figure and then take the difference, or make other changes. But then you are estimating the treatment effect on the difference between weekly profits now and some transform of monthly profits before, which is a harder outcome to explain.
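The rescaling happens automatically through c. The sketch below simulates this with hypothetical numbers (monthly profits roughly 4.3 times an underlying weekly level, plus recall noise, and a true weekly treatment effect of 2.0):

```python
import numpy as np

# Simulated illustration (hypothetical numbers) of a recall-period change:
# the baseline is monthly profits, the follow-up outcome is weekly profits,
# and the coefficient c absorbs the scale difference between the two.
rng = np.random.default_rng(2)
n = 500
treat = rng.integers(0, 2, n)
latent = rng.normal(10, 3, n)                         # underlying weekly profitability
monthly0 = 4.3 * latent + rng.normal(0, 5, n)         # baseline: monthly recall
weekly1 = latent + 2.0 * treat + rng.normal(0, 3, n)  # follow-up: weekly recall

# Weekly profits = a + b*Treat + c*Baseline monthly profits
X = np.column_stack([np.ones(n), treat, monthly0])
a, b, c = np.linalg.lstsq(X, weekly1, rcond=None)[0]
# b is the treatment effect on weekly profits; no hand conversion of the
# monthly baseline to a weekly figure is needed.
```

Nothing in the interpretation of b depends on getting a monthly-to-weekly conversion factor right; c simply takes whatever value makes the monthly baseline most useful as a predictor.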

3. Changes in how an outcome is measured, especially when forming indices. This covers a range of measurement changes. You might reword a question after feedback that some respondents found it confusing at baseline. Or your outcome might be an index of many questions, and you change which precise questions are asked at follow-up versus baseline (e.g. you might not use the exact same test questions both times, or the exact same subset of questions intended to measure some personality trait). Or you might have respondents play a different game or activity to measure some behavior, to avoid issues with them learning from their baseline attempt.
This issue is then dealt with in the Ancova in the same way as the change in the recall period: you control for whatever measure you have at baseline, and the Ancova decides how much weight to give it according to how useful it is in predicting the follow-up outcome. In contrast, with difference-in-differences it becomes unclear what exactly the difference is measuring.
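The down-weighting is automatic. The sketch below (simulated data, hypothetical noise levels) fits the same Ancova twice: once with a baseline index that closely resembles the follow-up measure, and once with a heavily revised, weakly comparable one:

```python
import numpy as np

# Simulated illustration (hypothetical parameters) of how the Ancova
# down-weights a baseline measure that has become less comparable: the less
# predictive the baseline index, the smaller its estimated coefficient.
rng = np.random.default_rng(3)
n = 2000
ability = rng.normal(0, 1, n)
treat = rng.integers(0, 2, n)
y1 = ability + 0.5 * treat + rng.normal(0, 1, n)  # follow-up index

def baseline_coef(noise_sd):
    """Coefficient on a baseline index measured with the given noise level."""
    y0 = ability + rng.normal(0, noise_sd, n)
    X = np.column_stack([np.ones(n), treat, y0])
    return np.linalg.lstsq(X, y1, rcond=None)[0][2]

c_similar = baseline_coef(0.2)  # baseline index close to the follow-up measure
c_revised = baseline_coef(2.0)  # heavily revised, weakly comparable index
# The regression gives the less informative baseline a much smaller weight.
```

At the extreme, a baseline index with no predictive power gets a coefficient near zero, so a badly changed measure does little harm; a fixed coefficient of 1, which is what differencing imposes, offers no such protection.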

There is often a concern about changing how you measure an outcome from one survey round to another. This gets beaten into us when we study poverty measurement, where the worry is that changes in measured poverty from one period to the next may arise from changes in the measurement method rather than actual changes. But in an RCT world, where your interest is in treatment effects, I think using Ancova gives you more latitude to make improvements or changes to your outcome measure between baseline and follow-up as you learn more. (Note that the same does not apply when you are doing multiple follow-ups to estimate a pooled treatment effect or impact trajectories, which is another part of the same paper – there you do want to keep the outcome measure consistent across follow-up rounds, even if you have modified it from the baseline.)


David McKenzie

Lead Economist, Development Research Group, World Bank

Jason Kerwin
June 24, 2015

You can also show that controlling for baseline values of the outcome minimizes the bias of estimates relative to simple differences in endline means or a diff-in-diff. See Appendix IV here, starting on page 18 (of the current version):

Bruce Wydick
June 22, 2015

This insight in David's JDE paper about the efficiency of ANCOVA relative to diff-in-diff estimators is a critical one for the many of us who often carry out RCTs with baselines and moderate sample sizes. Sometimes when one hears claims about the added efficiency of a new type of estimation, it is tempting to think it will make little noticeable difference, even if we should employ it anyway because of the demonstrated greater efficiency. However, this is not one of those situations. If anyone is craving an example of the greater efficiency of ANCOVA, browse the standard errors of our estimates in our TOMS Shoes child impact paper (URL below), where diff-in-diff and ANCOVA standard errors are presented side by side. Standard errors using ANCOVA are nearly always lower, often shrinking to half the size of diff-in-diff or less. These added practical advantages only tip the scale even more favorably toward use of ANCOVA.