Markus’s post yesterday is the first on what will be a recurring blog theme here: measurement. I’ll continue the trend today with a focus on one of the most fundamental welfare constructs in economics: consumption. Specifically, how might a development researcher accurately measure household consumption through surveys?
Accurate consumption measurement has been a long-standing challenge for applied work. While promising higher-tech measurement options will eventually become available (such as shop scanner data, smartphones, etc.), the standard bearer of consumption measurement in development is still the survey. Typically this survey will take one of two modes:
1. a consumption diary left at the household for a primary respondent to record consumption for all members (ideally this diary is filled out at the end of each day or even more frequently), or
2. a recall survey where, again, a primary respondent is asked to calculate/estimate total household consumption over a fixed time period in the recent past.
Within these two modes there are numerous variations in the degree of disaggregation/commodity specificity in the itemized list of consumption choices (some questionnaires list hundreds of highly disaggregated items, others list only 10-20 highly aggregate consumption categories), the specified respondent (the household head, the spouse of the head, etc.), and the recall period (two weeks, four weeks, one month, etc.).
The consumption questionnaire design has implications not only for the accuracy of measurement but also for the costs of the study, and a priori it is not clear which variant of questionnaire design optimizes accuracy conditional on a limited survey budget. With these questions in mind, Kathleen Beegle, Joachim De Weerdt, John Gibson, and I recently conducted a survey experiment in Tanzania.
We chose seven of the most common consumption questionnaire designs in use and randomly assigned them to households in order to compare the costs of implementation and relative accuracy. Our explicit focus in this experiment was on the measurement of food consumption, although we attempted to measure non-food items as well. To assess accuracy we also included an impractical but much more accurate “benchmark” eighth variant – a personal diary with frequent supervisory oversight.
This personal diary was given to each adult member of the household, and the consumption of all non-adult members was assigned to one and only one adult. One expectation for the benchmark variant was that, because it had multiple adult respondents, it would more comprehensively capture consumption outside the home, such as prepared meals and transport. This type of “private” consumption can lie beyond the purview of the sole household respondent in the other variants.
There are numerous findings from this field experiment discussed in depth in the paper. The table at the end of this post lists the measured mean and median annualized per capita consumption, as well as one measure – the Gini coefficient – of the distribution of consumption in the study population. Selected findings include:
- Condensed lists of consumption items save only a marginal amount of interview time at a relatively large cost in terms of accuracy.
- The cognitive demand of asking about consumption over a hypothetical “typical” period, rather than a concrete period in the immediate past, doubles interview length while also yielding less accurate measures.
- Frequent enumerator supervision of the household diary is roughly twice as expensive as infrequent supervision, with little gain in accuracy.
- Recall surveys tend to lose accuracy as the number of adults in the household increases, while household diaries lose accuracy in urban settings. In both cases the degree of “private” outside-the-household consumption is relatively large and perhaps problematic for the traditional variants.
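Since the table referenced above reports annualized per capita consumption and a Gini coefficient, a minimal sketch of how such summary statistics might be computed from 7-day recall totals may be useful. The sample figures and variable names here are purely illustrative, not data from the study.

```python
def gini(values):
    """Gini coefficient via the rank-weighted (mean absolute difference) formula:
    G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with x sorted ascending."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    rank_sum = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * rank_sum / (n * total) - (n + 1) / n

# Illustrative 7-day household food consumption totals (local currency)
# and household sizes -- hypothetical values for demonstration only.
weekly_food = [42.0, 15.5, 80.0, 23.0, 55.0]
hh_size = [4, 2, 6, 3, 5]

# Annualize the weekly totals (x 365/7) and express them per capita.
per_capita = [w * 365 / 7 / n for w, n in zip(weekly_food, hh_size)]

print(round(gini(per_capita), 3))
```

A perfectly equal distribution gives a Gini of 0, and concentrating all consumption in one household pushes it toward 1, which makes the function easy to sanity-check.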
We conclude that, with regard to recall modules, a long disaggregated list of items with a 7-day recall period (for food consumption) is the most accurate and cost-effective design, although not without certain drawbacks. In this study setting it is also more cost-effective to field a 7-day long-list recall design than a household diary, which is three times as expensive and less accurate on a variety of parameters.
Like the results of any single experiment, we wonder how applicable these conclusions are to different settings such as middle income countries where there may be more dietary diversity and consumption choices (as well as different degrees of information sharing in the household). We speculate a bit in the paper but of course this is no substitute for similar experiments conducted in these different environments.
For impact evaluations, the lesson, first and foremost, is to choose a consistent measure over time. Even if the researcher has inherited a sub-optimal design at baseline, it is often more important to maintain consistency than to switch midcourse to a more accurate design. Switching will induce unneeded noise in the measured outcomes.
More generally, field experiments that shed light on measurement issues can often be a low-cost extension to the primary data collection needs of impact evaluations. Hopefully there will be numerous “piggyback” field experiments in the future that will help us understand how to better measure the variables we care about.