I often have that urge to add one more question to household surveys I am working on. My collaborators are usually sitting there, telling me the survey is already too long. And, as a new paper by Kate Ambler, Sylvan Herskowitz and Mywish Maredia shows, this additional question has a serious cost. And it’s a cost that’s going to mean different things for different people in the household.
Ambler and co. are examining reports of productive activities in Northern Ghana. Bottom line: they find that the respondent reports less about people who come later in the survey – reported activities fall by 2.2% for each place further back in the household queue. And it’s significantly worse for women and youth.
The setting is a household survey in Northern Ghana. Ambler and co. take the household roster, which lists each household member, and then randomize the order in which those folks show up when it is time to ask about productive activities.
Now, the productive activities follow a branching question structure. First the enumerator asks about the primary activity of each household member over the last 12 months. If there is one, this leads to a set of questions on the type of work, hours and earnings. This is followed by questions on secondary activities, with details again, if there is an activity. And then the enumerator asks about the primary activity over the last seven days, with more questions if it is different from the earlier one. This is then repeated for all household members.
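To make that branching concrete, here is a rough sketch of what the module’s logic looks like in code – purely illustrative, with made-up question wording, keys, and an `ask` placeholder standing in for the CAPI prompt, not anything from the actual instrument:

```python
# Illustrative sketch of the branching activity module described above.
# Question wording and keys are invented; `ask` stands in for whatever
# prompts the respondent and records an answer.

def activity_module(member, ask):
    record = {"member": member}

    # Primary activity over the last 12 months; details only if one is reported.
    if ask(f"Did {member} have a primary activity in the last 12 months?"):
        record["primary_12m"] = {
            "type": ask("What type of work?"),
            "hours": ask("How many hours?"),
            "earnings": ask("What were the earnings?"),
        }

    # Secondary activity, with the same follow-ups if one is reported.
    if ask(f"Did {member} have a secondary activity?"):
        record["secondary_12m"] = {
            "type": ask("What type of work?"),
            "hours": ask("How many hours?"),
            "earnings": ask("What were the earnings?"),
        }

    # Primary activity over the last 7 days; more questions only if it differs.
    if ask(f"Was {member}'s main activity in the last 7 days different?"):
        record["primary_7d"] = {"type": ask("What type of work?")}

    return record


def run_survey(roster, ask):
    # Repeated for every household member, which is where the fatigue builds.
    return [activity_module(member, ask) for member in roster]
```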
So, respondents quickly figure out that if they say person X is working, this will trigger a host of other questions. The incentives, if you want this survey to be over, are pretty clear.
Ambler and co. restrict their analyses to answers about people other than the respondent. (Yes, the approach here is to have one person respond for the whole household, with an option to confer if they’d like – more on this later). And they use household fixed effects, so they drop households with fewer than two non-respondent members over the age of 14.
What do they find? First, let’s start with the household roster. Women are listed later (their position is 0.3 greater than men’s on average). Part of this is likely due to how we frequently structure household rosters – start with the head, then the spouse, then other adults and, finally, kids (although Ambler and co. note that enumerators don’t always comply with this ordering). So, given that fatigue/impatience might accumulate as the respondent goes on, the female and kid disadvantages are built in.
Ambler and co. then run a regression on the number of work activities reported as a function of an individual’s place in the randomized version of the household roster. Individuals lose 0.017 activities (2.2%) for each position they drop in the ordering. To put this in perspective, the median position in these households is 3, which translates to losing 6.6 percent.
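For readers who like to see the specification spelled out, here is a minimal sketch of how one might run that kind of household-fixed-effects regression in Python – assuming a hypothetical data frame with one row per non-respondent member and columns `n_activities`, `position`, and `household_id`; this is not the authors’ actual code.

```python
# Minimal sketch of a regression of reported activities on randomized roster
# position with household fixed effects. File and column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("activities.csv")  # one row per non-respondent member

# C(household_id) absorbs household fixed effects, so the coefficient on
# `position` is identified from the randomized ordering within households.
fit = smf.ols("n_activities ~ position + C(household_id)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["household_id"]}
)
print(fit.params["position"])  # the paper's estimate is about -0.017
```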
There is also a bigger hit for secondary activities. While I jumped right to the conclusion "that’s why they are secondary", it turns out that when Ambler and co. check the position penalty for hours rather than activities, there is no difference across primary versus secondary activities. So maybe not.
Ambler and co. then look at heterogeneity. To start with, effects are significantly larger for women than men: a loss of 3.1 percent versus 1.4 percent (using the sex-specific means). And the youth also lose out, dropping 3.1 to 3.4 percent for each place in the queue. So, these folks are doubly penalized: they are very often later in the list, and the underreporting is greater for them.
Does this vary with household size? It turns out the per-position effect is pretty similar. Of course, this still means that in aggregate larger households lose more reporting of their total efforts.
Maybe the enumerators are just tired? Ambler and co. assure us that the results are pretty much the same for surveys done later in the day, later during the survey period and (the clincher?) for enumerators who were rated worse in training.
OK, so what’s a body to do? A first option is to rely less on proxy responses. This is indeed the approach recommended in the original LSMS guide (see page 234 of the book). This will definitely increase costs. But you do get greater reporting. Ambler and co. find that self-reporting is associated with a significant 0.085 (11.5%) bump in the number of activities reported.
Ambler and co. also point out that you should consider your research question. For some (e.g. around inequality), the best approach would be to randomize the order in which you ask about members of the household (much easier if you are using a tablet survey). For other research questions (e.g. overall poverty), you may want to start with the largest contribution to household income.
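If your survey software doesn’t have built-in randomization, shuffling the order per household is nearly a one-liner. Here’s a toy sketch – real CAPI platforms generally have their own randomization features, and the names below are made up:

```python
# Toy sketch of per-household randomization of the roster order used by the
# activity module. Seeding on the household ID keeps the order stable if the
# interview is reopened. Names are illustrative only.
import random

def module_order(roster, household_id):
    rng = random.Random(household_id)
    order = list(roster)
    rng.shuffle(order)
    return order

print(module_order(["head", "spouse", "daughter", "son"], household_id=1042))
```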
Another approach could be to get rid of the branching structure of these questions. So, go through and list all of the primary activities for everyone in the household. Then secondary. And then, and only then, ask all of the earnings, hours, and other questions. This may make it harder for both the enumerator and the respondent to follow the questionnaire, but it could help. Indeed, it would be interesting to see (experimentally) how this works relative to the current approach or a randomized household roster.
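To see how that might look, here is a rough two-pass version of the same module – again purely illustrative, reusing the made-up `ask` convention from the sketch above:

```python
# Illustrative two-pass flow: list everyone's activities first, then circle
# back for hours and earnings. Question wording and keys are invented.

def two_pass_module(roster, ask):
    # Pass 1: a quick census of activities, so a "yes" for someone does not
    # immediately trigger a long block of follow-up questions.
    activities = {}
    for member in roster:
        activities[member] = {
            "primary": ask(f"What was {member}'s primary activity last year?"),
            "secondary": ask(f"Did {member} have a secondary activity? Which?"),
        }

    # Pass 2: only now collect the details for each activity that was listed.
    details = {}
    for member, acts in activities.items():
        for label, activity in acts.items():
            if activity:
                details[(member, label)] = {
                    "hours": ask(f"How many hours on {activity}?"),
                    "earnings": ask(f"Earnings from {activity}?"),
                }
    return activities, details
```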
Finally, we could just ask fewer questions. Indeed, this is always the trade-off: people’s valuable time versus the knowledge gained. And, as Ambler and co. point out, if we aren’t asking about secondary activities (for example), we are missing important parts of diversification (let alone income). So c’mon, how about that added question?