This post is co-authored with Thomas Pave Sohnesen
Since 2011, we have struggled to reconcile the poverty trends from two complementary poverty monitoring sources in Malawi. From 2005 to 2009, the Welfare Monitoring Survey (WMS) was used to predict consumption and showed a solid decline in poverty. In contrast, the 2004/05 and 2010/11 rounds of the Integrated Household Survey (IHS), which measured consumption through recall-based modules, showed no decline.
Today’s blog post is about a household survey experiment, and the working paper based on it, that can at least partially explain why complementary monitoring tools may produce different results. The results are also relevant for other tools that rely on vastly different instruments to measure the same outcomes.
Why use proxy-based poverty measurement?
Collecting consumption data to measure poverty is argued to be complex and costly, while proxy-based poverty measurement is often marketed as a more cost-effective alternative. Recent years have seen a proliferation of applications of the method to improve the frequency (and comparability) of inter- and intra-annual poverty estimates, and to develop proxy means tests.
Testing an underlying assumption not tested before
When we were first faced with the diverging poverty trends in Malawi, our focus turned to the prediction model. We knew that the timing of the 3-month WMS fieldwork period had varied across the year (i.e., across seasons). However, scrutinizing the prediction model to address this issue did not resolve our conundrum. The model performed quite well, and when applied to the IHS 2010/11 data, a range of models confirmed the finding of stagnant poverty.
A competing hypothesis (not testable with the existing data) was that differences between the WMS and IHS questionnaire designs could explain the observed outcomes. The WMS relied on a 20-page questionnaire that was significantly lighter in the inter- and intra-module scope of data collection than its 66-page IHS counterpart. Our research question was therefore twofold: Can identical questions asked of the same population yield different answers in short vs. long questionnaires, because questionnaire design interacts differently with respondents' cognitive processes? If so, what would be the implications for proxy-based poverty measurement?
How did we approach the problem?
Our working paper is based on a survey experiment that was piggybacked onto the Malawi Integrated Household Panel Survey (IHPS) 2013. The IHPS 2013 attempted to visit each household twice, three months apart. The 204 enumeration areas (EAs) were randomly divided, as whole units, into two halves, known as Sample A and Sample B. Sample A was administered the (long) IHS questionnaire during Visit 1 and received an update of the household roster module in Visit 2. Conversely, Sample B received only the household roster module of the IHS questionnaire in Visit 1 and was administered the rest of the questionnaire in Visit 2. Our experiment was to administer an additional 2-page instrument (shortened IHS questionnaire modules with otherwise identical questions) right after the household roster (i.e., the first module) to a subsample of IHPS households, during the visit in which only the household roster would otherwise have been administered. To this end, 4 households in each EA were randomly selected for the experiment. The unique setup had a bonus: the experiment households also answered the identical questions as part of the long questionnaire, with 3 months between the two interviews.
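To make the design concrete, here is a minimal sketch of the assignment logic described above. It is an illustration under stated assumptions, not the actual IHPS sampling code: the function names, the input format (a dict mapping each EA id to a list of household ids) and the seed are all hypothetical.

```python
import random


def assign_design(ea_households, n_experiment=4, seed=2013):
    """Hypothetical sketch of the experiment assignment (not the IHPS code).

    ea_households: dict mapping an EA id to a list of household ids.
    Returns (sample_a, sample_b, experiment): two disjoint sets of EA ids and
    a dict mapping each EA to its randomly selected experiment households.
    """
    rng = random.Random(seed)
    eas = sorted(ea_households)
    rng.shuffle(eas)                 # randomize whole EAs, not households
    half = len(eas) // 2
    sample_a = set(eas[:half])       # long IHS questionnaire in Visit 1
    sample_b = set(eas[half:])       # long IHS questionnaire in Visit 2
    experiment = {}
    for ea in eas:
        households = ea_households[ea]
        k = min(n_experiment, len(households))
        experiment[ea] = rng.sample(households, k)  # e.g. 4 households per EA
    return sample_a, sample_b, experiment


def short_module_visit(ea, sample_a):
    """Visit in which an EA's experiment households get the short instrument.

    The short modules go into the roster-only visit: Visit 2 for Sample A,
    Visit 1 for Sample B.
    """
    return 2 if ea in sample_a else 1
```

For example, with 204 hypothetical EAs of 16 listed households each, `assign_design` returns two halves of 102 EAs and 4 experiment households per EA, and `short_module_visit` tells you in which visit each EA's experiment households receive the short instrument (the long questionnaire comes in the other visit).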
What did we find?
There are three key findings.
First, we find that observationally equivalent households answer the same questions differently when interviewed with the short versus the long questionnaire during the same time period. Even the same households answer the same questions differently when interviewed at different points in time, depending on the questionnaire. The analysis yields statistically significant differences in reporting across all topics and types of questions.
Second, the overwhelming majority of the proxies have lower averages among the households that were administered the long questionnaire. The effect is quite pronounced for binary poverty proxies related to the consumption of food and non-food items and to the experience of household shocks. The average binary response is 2.3 percentage points higher in the short questionnaire than in the long questionnaire. Relative to the long-questionnaire mean of 25.7 percent, this is equivalent to 8.9 percent higher reporting. The categorical variables, particularly those related to subjective welfare and housing, were also affected by the change in questionnaire design, although the pattern is less clear.
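The 8.9 percent figure is the percentage-point gap divided by the long-questionnaire mean. Writing $\bar{y}$ for the average binary response (our notation, not the paper's):

```latex
\frac{\bar{y}_{\text{short}} - \bar{y}_{\text{long}}}{\bar{y}_{\text{long}}}
  = \frac{2.3}{25.7} \approx 0.089
```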
Third, relying on prediction models (including the WMS model) based on the IHS 2010/11 data, we find that the differences in reporting are large enough to yield poverty predictions that differ significantly between the short and long questionnaires. The resulting difference in predicted poverty rates ranges from 3 to 7 percentage points, depending on the model specification. If we use only the proxies collected before the experiment modules were administered (demographics, education and location), we predict the same poverty rates in both samples.
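Why would small reporting shifts move predicted poverty? A minimal sketch, assuming one common form of proxy-based prediction (not necessarily the exact WMS specification): log consumption is modeled as a linear function of the proxies plus a normal error, and the poverty rate is the average predicted probability of falling below the poverty line. All coefficients, the poverty line and the household data below are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def predicted_headcount(proxies, beta, intercept, sigma, log_poverty_line):
    """Average predicted probability of being poor.

    Assumes ln(c_i) = intercept + x_i . beta + e_i with e_i ~ N(0, sigma^2),
    so P(poor_i) = Phi((ln z - intercept - x_i . beta) / sigma).
    """
    probs = []
    for x in proxies:
        xb = intercept + sum(b * v for b, v in zip(beta, x))
        probs.append(normal_cdf((log_poverty_line - xb) / sigma))
    return sum(probs) / len(probs)
```

With two hypothetical binary proxies that carry positive coefficients (say, beta = [0.3, 0.2]), a sample that reports "yes" more often, as households did in the short questionnaire, gets a lower predicted headcount from the very same model. Proxies with negative coefficients (for instance, some shock indicators could plausibly enter this way) push in the opposite direction, so the net effect depends on the model. Proxies collected before the experiment modules, if reported identically, leave the prediction unchanged by construction.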
What are the implications?
Unfortunately, our study does not provide conclusive evidence on why we observe these notable differences, or on which questionnaire yields the more accurate answers. Interview time does not, in itself, seem to be the driving factor, but beyond that the study was not designed to dig into the why.
The findings, however, emphasize the need for further methodological research on module and question placement effects in household surveys and on the associated cognitive and behavioral processes. Once you throw into this mix surveys that use different interview modes and different questionnaire designs for different segments of the sample, the work becomes very complicated very quickly. And since in most cases we would still not know the truth, it would be ideal to identify outcomes that can be objectively measured and compared with the corresponding measures obtained from different questionnaire designs and interview modes. The literature on these issues in developing countries is still in its infancy.
One suggestion for survey operations collecting data for proxy means tests or poverty scorecards is to pilot their instruments before roll-out, in parallel with the questionnaires from which they evolved. The data on the same poverty proxies from the different instruments could then be checked for differences.
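As a first pass at such a check, one could compare the share of "yes" answers to a binary proxy under each instrument with a two-proportion z-test. This is a simplified sketch, not the method used in our paper: it assumes independent simple random samples and ignores clustering (e.g., at the EA level) and survey weights, which a real analysis should account for. The function name and the counts in the example are hypothetical.

```python
import math


def two_proportion_ztest(yes_a, n_a, yes_b, n_b):
    """Compare a binary proxy's 'yes' share across two instruments.

    Returns (difference in shares, z statistic, two-sided p-value), using the
    pooled-variance normal approximation. Ignores clustering and weights.
    """
    p_a = yes_a / n_a
    p_b = yes_b / n_b
    pooled = (yes_a + yes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return p_a - p_b, z, p_value
```

For instance, `two_proportion_ztest(280, 1000, 257, 1000)` compares hypothetical pilot samples of 1,000 households each where 28.0 and 25.7 percent report "yes"; a gap of this size is not significant at that sample size, which is a reminder that pilots need to be powered to detect the kind of differences reported above.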
A broader point relates to direct consumption measurement in household surveys. In the case of Malawi, the mean duration of the IHPS 2013 modules on food and non-food consumption that we seek to proxy was 26 minutes. Thus, compared with a survey for proxy-based poverty measurement, collecting consumption data may not, in and of itself, be as costly as commonly perceived. Here, “perceived” is the operative word, as the costs of surveys with and without consumption modules are not rigorously documented. The Living Standards Measurement Study (LSMS) team is working to fill these gaps, drawing on its experience with the LSMS-Integrated Surveys on Agriculture (LSMS-ISA).