I received an interesting question last week from Andrea Guariso that I think speaks to a more general issue I have faced in several evaluations, so I thought I would share his question and my responses, and see whether others have experience or advice to share on this type of problem.
Andrea is working with two colleagues (Tara Mitchell and Carol Newman) at Trinity College Dublin, and three researchers (Marcus Holmlund, Chloe Fernandez, and Serge Adjognon) in the DIME group at the World Bank on an evaluation of an entrepreneurship support program for refugees and host populations in Niger. The AEA registry entry provides more details on the trial. The program will give treated individuals a cash grant and some business training, and the team wishes to measure both economic outcomes and whether the program affects social outcomes like social networks and social interactions.
The problem: not being able to survey everyone before they know whether they are in the program
Here is how Andrea described the set-up and main issue:
“The design is such that within each treatment village there will be an assessment (made by two village committees) to identify the eligible households in the village. Then a lottery will determine who will benefit from the program among the eligible people. From the pilot that is currently ongoing we are seeing that roughly 20% of the eligible individuals are going to receive the program.
The big question we are currently struggling with is how and when to run our baseline survey. Ideally, we would like to survey individuals after eligibility is determined, but before the lottery, to avoid that the outcome of the lottery itself could influence their answers (which is unclear whether it should be the case, but perhaps knowing for certainty about participation in the program might influence some decisions, or there could also be some psychological influence playing a role). But we do not have the resources to survey everyone and with just 20% of eligible households being selected we would have power issues if we were to simply survey a representative sample of eligible households”.
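To see why a 20% selection rate hurts power, here is a back-of-the-envelope sketch. It assumes a simple difference in means between treated and control households with the same outcome variance in both groups, so that for a fixed number of interviews n the standard error is proportional to 1 / sqrt(n * P * (1 - P)), where P is the share of surveyed households that are treated. The function names are illustrative only.

```python
import math

def mde_scale(treated_share: float) -> float:
    """Relative minimum detectable effect (MDE) for a given treated share.

    Assumes a difference in means with equal variance in both arms, so for a
    fixed sample size the MDE scales with 1 / sqrt(P * (1 - P)).
    """
    return 1.0 / math.sqrt(treated_share * (1.0 - treated_share))

def sample_inflation(treated_share: float, reference_share: float = 0.5) -> float:
    """How many times larger the sample must be to match the MDE at reference_share."""
    return (mde_scale(treated_share) / mde_scale(reference_share)) ** 2

# A representative sample of eligible households inherits the lottery's 20% treated share.
print(round(mde_scale(0.2) / mde_scale(0.5), 3))  # 1.25: MDE is 25% larger than with a 50/50 split
print(round(sample_inflation(0.2), 4))            # 1.5625: about 56% more interviews for the same MDE
```

A representative pre-lottery sample cannot change P; only sampling on treatment status after the lottery can, which is what motivates a booster sample.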
The team were considering a two-stage approach: first survey a random sample of households after eligibility is determined but before the lottery (which, with only 20% selected, will mostly consist of future control households), then survey a booster sample of households after the lottery (oversampling the treatment group), and use the two waves to test whether responses differ with knowledge of program status. But they wanted to see how others had dealt with these issues.
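The two-stage design can be sketched as a small simulation. Everything here is a hypothetical illustration, not the team's actual plan: the sample sizes, the seeds, and the assumed 0.3 SD reporting shift for households interviewed after learning their status. The key point it shows is that the lottery makes treated households interviewed before and after the lottery two random samples of the same group, so comparing them estimates how knowing one's status changes answers (mixed with any change over the weeks between waves).

```python
import random
from statistics import mean

def two_stage_design(n_eligible, treat_share, n_wave1, n_booster, seed=0):
    """Simulate a two-stage baseline (all sizes are hypothetical).

    Stage 1: before the lottery, interview a simple random sample of eligible households.
    Lottery: assign treat_share of all eligible households to treatment.
    Stage 2: after the lottery, interview a booster sample of treated households
    that were not already interviewed in stage 1.
    """
    rng = random.Random(seed)
    eligible = list(range(n_eligible))
    wave1 = set(rng.sample(eligible, n_wave1))          # status unknown when interviewed
    treated = set(rng.sample(eligible, round(treat_share * n_eligible)))
    booster_pool = sorted(treated - wave1)
    booster = set(rng.sample(booster_pool, n_booster))  # status known when interviewed
    return wave1, treated, booster

def knowledge_effect(reported, wave1, treated, booster):
    """Mean answer of treated households interviewed after the lottery minus that
    of treated households interviewed before it."""
    before = [reported[i] for i in wave1 & treated]
    after = [reported[i] for i in booster]
    return mean(after) - mean(before)

# Hypothetical example: 5,000 eligible, 20% selected, 1,000 pre-lottery interviews,
# 600 booster interviews, and an assumed reporting shift once status is known.
wave1, treated, booster = two_stage_design(5000, 0.2, 1000, 600, seed=1)
rng = random.Random(2)
true_answer = [rng.gauss(0, 1) for _ in range(5000)]
reporting_shift = 0.3
reported = {i: true_answer[i] for i in wave1}
reported.update({i: true_answer[i] + reporting_shift for i in booster})

print(len(wave1 & treated), len(wave1 - treated), len(booster))
print(round(knowledge_effect(reported, wave1, treated, booster), 3))
```

Control households are only interviewed before the lottery, so this check speaks to the treated group only. Pooled descriptive statistics across both waves would also need weights, since treated households are deliberately oversampled.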
Potential solutions
This issue of trying to interview everyone after selection but before status is known or implementation starts is a reasonably common one. Here are some possibilities I suggested.
Approach one: skip the full baseline and just use data from the program application form. Often you require people to apply to a program by filling out a quite basic application form. This normally contains some basic demographics and the questions used to determine eligibility, but it may be possible to add another page or two to collect a little more data (and, importantly, to ask for as much contact information as possible to help in re-interviewing them). This was the approach I used in the Nigeria business plan competition I worked on. There almost 24,000 people applied, and almost 6,000 got through to a second phase where they submitted business plans. All of this was done online, and it would have been very expensive to interview them in person. At this second stage, I added a short baseline datasheet in which applicants filled in a little more background information themselves online. This approach may be particularly useful for evaluations in which the key outcome of interest is likely to be similar for everyone to begin with: for example, employment status (not working) and income (zero) might be pretty much the same for a sample of unemployed young job-seekers, or business start-up status (not started) and profits (zero) might be pretty much the same for a program designed to help people set up new businesses. As a result, the baseline won't be very informative for predicting future outcomes. The first follow-up survey can then be used to collect more time-invariant characteristics of interest, and perhaps some retrospective variables.
Approach two: do the baseline once you know treatment status (and perhaps even as the intervention is starting), but focus on variables that are time-invariant, retrospective over a longer period, or slow moving. I've used this approach in several studies in which random selection occurs in batches on a rolling basis, with interventions starting very quickly thereafter. For example, in this evaluation of a vocational training program for the unemployed in Turkey, individuals applied to different courses, and we would then randomize if the courses were oversubscribed. We ended up having 130 different courses throughout the country, and once the application deadline was reached, random selection occurred immediately and courses started quickly thereafter. As a result, only one-third of the sample could be interviewed before the courses started, and the remainder were interviewed during the first weeks of the course. Our baseline balance table therefore focuses on demographics and employment history (e.g. ever employed), rather than on current employment status, which could change within a few weeks once people know whether they have a place on the course. Similarly, in this non-experimental evaluation of a seasonal worker program, workers were recruited in small batches and then migrated quite quickly after selection. Our surveys therefore asked remaining family members about demographics, work status in the previous year, and slow-moving variables like housing infrastructure that we thought would not change quickly.
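A balance table restricted to variables that cannot react to learning one's status might be computed as below. The normalized difference (difference in means divided by the pooled standard deviation) is one common scale-free summary; it is an assumption here, since the post does not say which statistic its tables use, and the variable names and values are made up purely to show the call.

```python
import math
from statistics import mean, variance

def normalized_difference(treat, control):
    """(mean_T - mean_C) / sqrt((var_T + var_C) / 2), using sample variances."""
    pooled_sd = math.sqrt((variance(treat) + variance(control)) / 2)
    return (mean(treat) - mean(control)) / pooled_sd

def balance_table(data_t, data_c, variables):
    """One row per variable: (treated mean, control mean, normalized difference)."""
    return {
        v: (mean(data_t[v]), mean(data_c[v]), normalized_difference(data_t[v], data_c[v]))
        for v in variables
    }

# Made-up values. Only time-invariant or retrospective variables go in the table,
# not current employment, which may respond to knowing one's treatment status.
treated_data = {"years_schooling": [6, 8, 10, 12], "ever_employed": [1, 0, 1, 1]}
control_data = {"years_schooling": [5, 9, 11, 11], "ever_employed": [0, 1, 0, 1]}
table = balance_table(treated_data, control_data, ["years_schooling", "ever_employed"])
for var, row in table.items():
    print(var, [round(x, 2) for x in row])
```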
Approach three: non-public lottery and delayed notification. Another possibility that I thought might work in the Niger case is to do the lottery privately or semi-publicly (e.g. with a government official or village leader as a witness), and then delay telling people the results for a week or so, conducting the baseline survey in between. This approach could be helpful if it is really important to get baseline information on something you think could change with knowledge of treatment status.
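If the draw is done by computer rather than physically (an assumption, not something the post specifies), a recorded seed lets the witness verify it later while households are told only after the baseline. The sketch below draws winners within each village, matching the within-village lottery described above; the village names, household IDs, and seed are hypothetical.

```python
import random

def village_lottery(eligible_by_village, treat_share, seed):
    """Draw program winners separately within each village from a recorded seed.

    Villages and household IDs are sorted first, so the same seed and the same
    eligibility lists reproduce the same draw (on the same Python version). A
    witness can note the seed at the draw and re-check the results once
    households have been told their status after the baseline survey.
    """
    rng = random.Random(seed)
    winners = {}
    for village in sorted(eligible_by_village):
        households = sorted(eligible_by_village[village])
        n_winners = round(treat_share * len(households))
        winners[village] = sorted(rng.sample(households, n_winners))
    return winners

# Hypothetical villages and household IDs.
eligible = {
    "village_A": [f"A{i:02d}" for i in range(1, 11)],
    "village_B": [f"B{i:02d}" for i in range(1, 16)],
}
draw = village_lottery(eligible, treat_share=0.2, seed=20240501)
print(draw)  # kept confidential until the baseline survey in each village is finished
```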
Approach four: disassociate the survey from the program. We might be concerned about two issues in collecting data after treatment status is known, or after training has just begun. The first is a reporting effect or experimenter demand effect, where people change how they report to you because of the outcome of randomization. E.g. if I am in the control group I might claim to be poorer than I am in the hope of being moved to treatment, or if treated I might want to make you happy by telling you what I think you want to hear. Disassociating the survey from the program (by having it done by a separate organization, and framing it as being for a different purpose) can help here. Alternatively, one could bound the size of these effects as discussed in this post. The second concern is genuine short-term changes: e.g. if I am not selected for the training program I might start working, whereas if I am selected I might stop working and wait for training to start. Or I might genuinely become happier from being chosen for the program, even if it hasn't started yet. In that case it will not be enough to disassociate the survey from the program, and you either need to focus again on variables that don't change so quickly, or use an alternative approach.
The best approach depends on why you need a baseline at all
We’ve previously blogged about several related issues here (see Alaka Holla’s post on whether we over-invest in baselines; and my posts on what to do when everyone has the same baseline value, or when measurement of the outcome changes between baseline and follow-up), and Jeffery McManus has a nice post on when to collect baseline data on the IDInsight blog. But assuming you are in a situation where you at least know who is in the experimental sample and have some very basic information on them, then we can consider several key reasons for the baseline that matter here:
· Improving power by controlling for baseline covariates: this reason is most compelling when your baseline variables (including the baseline of the main outcome) are strongly predictive of the future outcome of interest (see here). As noted in approach one, it is less likely to be a key factor in employment generation or business start-up programs. However, it could matter for social networks, since we would expect the contacts you have today to be reasonably predictive of who you will talk with a year from now. (Note I’m abstracting here from the additional power gains possible from stratifying on baseline covariates, since this isn’t possible in the Niger case).
· Collecting enough baseline controls for a balance table and to test and correct for possible attrition: for the balance table, it may be enough to use the administrative data and some basic information collected in my approach one or two above. See here for discussion on use of such a table. But if you are concerned that attrition may be high, having more variables that are closely related to the outcome of interest can be helpful for convincing readers that attrition is not causing bias, and for possibly reweighting estimates.
· Heterogeneity analysis: I'm usually struggling enough for power in looking at main effects that I try to be pretty parsimonious in heterogeneity analysis, focusing on relatively simple-to-collect variables like gender, education level, or firm sector. But as people increasingly turn to data-hungry machine-learning approaches to heterogeneity, they may want to collect more baseline variables for this purpose. In the Niger case, the team is interested in heterogeneity by how much people already interact with others, and by some measure of baseline psychological wellbeing. For the former, you could ask about interactions with others last month, or before people applied for the program, to get a pre-program measure that is perhaps acceptable; the bigger concern is that simply learning treatment status might have at least short-term effects on psychological wellbeing.
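The power argument in the first point can be made concrete with the standard ANCOVA approximation: controlling for a baseline measure whose correlation with the follow-up outcome is rho shrinks the residual variance by a factor of (1 - rho^2). The sketch below assumes individual randomization, one baseline and one follow-up round, equal variances, and a normal approximation, and it ignores clustering and stratification; the sample size of 1,000 is hypothetical. Under these assumptions rho = 0.3 lowers the MDE by less than 5% (sqrt(0.91) is about 0.954), while rho = 0.8 lowers it by 40%.

```python
import math
from statistics import NormalDist

def mde(n, treat_share, sigma=1.0, rho=0.0, alpha=0.05, power=0.8):
    """Minimum detectable effect (in units of sigma) for an individually randomized
    comparison of means, optionally controlling for a baseline measure whose
    correlation with the follow-up outcome is rho (ANCOVA). Controlling for the
    baseline shrinks the residual variance by a factor (1 - rho**2)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z * sigma * math.sqrt((1 - rho ** 2) / (n * treat_share * (1 - treat_share)))

# Hypothetical sample of 1,000 with 20% treated. A low rho (e.g. employment among
# unemployed job seekers) buys little; a high rho (e.g. social contacts) buys a lot.
for rho in (0.0, 0.3, 0.8):
    print(f"rho={rho}: MDE = {mde(1000, 0.2, rho=rho):.3f} SD")
```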
Each of the approaches discussed above involves different trade-offs in terms of logistics, the types of data one can collect, and what it allows you to do. But starting off by being very clear about why you want a baseline and which variables you care about most can help in making these trade-offs.
Readers – any other creative approaches you have used, or suggestions for alternatives that could work in these settings?