I am currently involved in efforts to plan two impact evaluations of large government programs. With my collaborators and government counterparts, I have been weighing the feasibility and pros/cons of conducting baseline surveys given our budget and the timing of the interventions (whether we can even get field work done before the programs start). Of course, the topic of whether to do a baseline is not new to this blog – see Alaka’s post here and David’s post here – among many other online conversations about this.
The recent paper by Mark Treurniet, just out in EDCC, caught my eye for less conventional reasons than those related to budget and survey feasibility. Treurniet’s study finds that participating in the baseline significantly affected take-up of an agricultural technology. As discussed in the paper, this work relates to the literature on “panel conditioning” as well as to previous work showing that surveys can influence the subsequent use of water treatment products, the take-up of medical insurance, and borrowing behavior.
The setting is the Eastern region of Kenya, in areas with high levels of aflatoxin contamination (a fungal toxin that can grow in maize and groundnuts). While effective technologies that reduce aflatoxin contamination exist, their take-up has been low. The technology studied here is Aflasafe, a biocontrol product recently introduced in Kenya. In the baseline, most farmers in this study know about aflatoxin, but only 10% report knowing about Aflasafe and very few (2%) had ever used it. Due to budget realities, not all farmers could be surveyed in the baseline, so the baseline was randomized (see the paper for many nuances about the survey assignment process). As not all farmers assigned to the baseline were actually surveyed, baseline survey completion is instrumented by primary and replacement status in the assignment.
Based on the 2SLS estimates, farmers surveyed in the baseline (September–October) had significantly higher take-up of Aflasafe (which was sold between November and December), as measured by administrative sales data. The higher take-up appears on both the extensive margin (adoption increased by 9 percentage points, against a level of between 12% and 30% for non-surveyed farmers) and the intensive margin (by 0.36 kg, against a level of between 0.3 and 0.9 kg for non-surveyed farmers). The relevant comparison level is not straightforward, since there is no single ideal control group of non-surveyed farmers. The paper explains this in more detail, which I will omit here, except to say that there is a nice discussion in a couple of places about the use of replacement farmers for inclusion in the baseline survey (when the primary selected farmer is not surveyed), including the potential bias in estimating treatment effects when using randomized waiting lists for treatment assignment (see de Chaisemartin and Behaghel 2020, working paper ungated here).
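To make the identification strategy concrete, here is a minimal sketch of instrumenting survey completion with random survey assignment, in the spirit of the paper’s 2SLS. The data and all numbers are simulated and purely illustrative (not the paper’s); the point is only that when compliance with survey assignment correlates with an unobserved trait that also drives adoption, naive OLS on survey completion is biased, while the assignment instrument recovers the survey effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical simulation (all numbers illustrative, not the paper's data):
# z = random assignment to the baseline sample (the instrument),
# s = actually completing the baseline survey (endogenous: compliance
#     depends on an unobserved farmer trait u that also affects adoption),
# y = Aflasafe purchase as it would appear in administrative sales data.
u = rng.normal(0, 1, n)                  # unobserved confounder
z = rng.binomial(1, 0.5, n).astype(float)
s = z * (u > -0.8)                       # only ~79% of assigned get surveyed
y = 0.12 + 0.09 * s + 0.05 * u + rng.normal(0, 0.1, n)

def two_sls(y, s, z):
    """2SLS with one instrument: regress s on z, then y on fitted s."""
    Z = np.column_stack([np.ones_like(z), z])
    s_hat = Z @ np.linalg.lstsq(Z, s, rcond=None)[0]   # first stage
    X = np.column_stack([np.ones_like(s_hat), s_hat])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]     # slope on fitted s

est = two_sls(y, s, z)  # close to the true 0.09; OLS of y on s is biased up
```

This is the textbook two-stage construction; the paper’s actual specification (with primary vs. replacement status as instruments and its particular controls) is richer than this sketch.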
These effects seem sizable (to me anyway) even compared to the wide ranges for non-surveyed farmers, especially considering the seemingly modest baseline questionnaire. It consisted of a maximum of 158 questions (depending on skip patterns), of which 9 related to the take-up outcome of interest: 4 questions on knowledge of aflatoxin, 2 on experience with aflatoxin prevention measures, and 1 asking whether the respondent had heard of Aflasafe. For the 10% who reported they had, the survey asked about past use and planned use this season.
The paper argues that this effect is not explained by the survey prompting a change in farming production decisions (which, in turn, could affect Aflasafe adoption); inorganic fertilizer use is too weakly correlated with Aflasafe take-up to explain impacts of the size above. So what else might it be? Information provision is ruled out, since the information provided via the baseline survey was quite minimal (and the survey did not seem to prompt an increase in participation in project trainings). Question-behavior effects are not the source either, since the baseline did not ask about predictions or intentions with regard to aflatoxin.
Three other mechanisms are proposed. One is experimenter demand effects, where the farmer takes the survey as a signal that aflatoxin prevention is needed and so responds with greater take-up. Second is Hawthorne effects, where being surveyed made farmers more aware of their choices and behaviors (the researchers are watching you!). Third is the bandwidth mechanism, where the subset of survey questions (the at least seven of them) drew farmers’ attention to aflatoxin, its consequences, and the idea of prevention measures. Ultimately, the paper is not able to distinguish between these three.
And a fourth mechanism lies somewhere between or adjacent to these others (and which David pointed out to me): the trust signal. Here, farmers might take the survey as a signal that addresses some of the uncertainty they have about a product or treatment. This might especially be the case where the survey team is associated with, or is itself, the treatment implementer, and has repeated interaction with the farmers. Put another way, this is like the “NGO reputation effect” (earlier ungated version).
But do these baseline effects matter with regard to the motivation for having a baseline? The point of the baseline in this case was to inform a study of a “market linkage” treatment randomized across villages, where treated villages received a modest market premium for aflatoxin-safe maize sold through the project. The main findings of the market linkage treatment were that the intervention did not increase take-up rates but did increase the quantity of Aflasafe purchased (Hoffmann, Kariuki, Pieters, and Treurniet 2022). However, these treatment impacts seem to be, to some extent, underestimated due to the baseline survey effects. This would be consistent with the proposition that “the baseline survey increased the farmers’ valuation for safe maize for home consumption, then the additional motivation from the market linkage [i.e. the treatment] might simply have been less important.” (So maybe it’s a good thing they did not have more funds for a larger baseline!)
The paper concludes that “experiments that rely exclusively on samples that are surveyed before outcomes are measured are likely to provide adoption estimates that are higher than they would be in non-surveyed populations.” This seems a bit bold and too broad, but the point that “estimates of treatment effect for surveyed samples may not be valid for external populations” seems more appropriate. As well as the point that biases “may especially arise in situations where available bandwidth is an important driver of technology adoption [or whatever take-up action is at play] and where financial costs and benefits play a smaller role.” And circling back to my own work, this has me thinking hard about what information we would want to collect if we do a baseline, so as to limit these risks.