Yesterday the World Bank released its first report on the socioeconomic impacts of Ebola based on household data. The report provides a number of new insights into the crisis in Liberia, showing, for example, unexpected resilience in agriculture, and broader economic impacts in areas outside the main zones of infection than previously believed. As widely reported, prices for staple crops (such as rice) have jumped well above seasonal increases, but we also find an important income effect. And we find the highest prices in the remote southeast of the country, an area that has been relatively unaffected by the disease. The link to the full report can be found here.
But while the above are interesting and useful pieces of information, they were not easy to come by. There often seems to be an inverse relationship between the usefulness of data and the difficulty of collecting it. Emerging crises require timely information to target interventions, but these are also the situations that face the most severe capacity constraints and the longest delays. The recently released findings from the high-frequency cell phone survey in Liberia (and the coming results from its sister project in Sierra Leone) provide one example of leveraging the available information, along with making some unorthodox decisions, to get results quickly.
There are three main steps in any data collection project: select the sample, implement the survey, and analyze the results. In Liberia the team was fortunate enough to have a ready-made representative sampling frame. The Liberia Institute of Statistics and Geo-Information Services (LISGIS) was about halfway through administering a nationally representative multi-topic survey (HIES) when, in August, the situation became too unstable to continue. (This was both because of the risk of infection, and because the earlier decision to re-use Ministry of Health vehicles left over from the recent DHS proved exceptionally dangerous when entering rural areas in the early days of the outbreak.) The HIES collected re-contact information for all households with cell phones, approximately 45 percent of which were urban and the remainder rural. The households that reported phone numbers were unevenly distributed across the country, but fortunately there was at least partial coverage in all areas. These phone numbers served as our sampling frame. The other main option we considered, obtaining lists of active numbers directly from the telecoms, would have given us no way of directly comparing respondents and non-respondents, and would not have allowed us to link to existing data.
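For concreteness, here is a minimal sketch of how a frame like this can be assembled and its coverage checked. The file and column names (hies_households.csv, phone_number, county, urban) are illustrative assumptions, not the actual HIES variables:

```python
# Illustrative sketch: build a phone sampling frame from an interrupted
# baseline survey. File and column names are assumptions for this example.
import pandas as pd

hies = pd.read_csv("hies_households.csv")

# Keep only households that reported a cell phone number.
frame = hies[hies["phone_number"].notna()].copy()

# Coverage check: numbers were unevenly distributed, so confirm that every
# county contributes at least some numbers before treating this as a frame.
coverage = frame.groupby("county").agg(
    n_numbers=("phone_number", "size"),
    share_urban=("urban", "mean"),  # ~45% urban overall in the HIES frame
)
print(coverage.sort_values("n_numbers"))
```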
The implementation challenges were less straightforward. LISGIS had no experience setting up a call center or administering the computer-based surveys necessary to avoid data entry delays. None of the staff had been trained on call-back and verification protocols. And, with travel bans and the exodus of technical capacity from Monrovia, it was impossible to find the necessary support locally. So we chose the next most logical location… Nebraska. After considering options in Ghana, Nigeria, India, and the United States, we decided to contract the survey to the Gallup Organization’s Omaha and Lincoln call centers. They had extensive experience in the technical aspects, and the project team was convinced that the Liberia-Nebraska accent clash would be no more severe than with the other options.
And so on October 1, dialing began. By the completion of round 1 one week later, all 2,137 households with phone numbers had been called at least five times, and we had 638 complete interviews. This was lower than even our most pessimistic projections. Undaunted, we decided that it must be either an information or an incentive problem (or both), and promptly sent text messages to all eligible numbers (every number that was not disconnected and had not yet refused). We explained the purpose of the strange missed calls they were getting from the US, and told them they would receive 1 USD in phone credit for participating. From this process we learned that just over half of the text messages went through over the course of the week, indicating that many phones were switched off for the duration, a plausible finding if households were no longer paying to charge handsets or if diesel for generators was held up by travel restrictions. The second round netted 48 new respondents and 425 of the original 638.
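For readers keeping score, the implied rates are easy to work out; the quick sketch below simply restates the counts from the paragraph above:

```python
# Response accounting for the two rounds, using the counts reported above.
frame_size = 2137        # households with phone numbers, each dialed at least five times
round1_complete = 638    # completed interviews in round 1
round2_new = 48          # first-time respondents in round 2
round2_panel = 425       # round-1 respondents re-interviewed in round 2

print(f"Round 1 response rate:  {round1_complete / frame_size:.1%}")    # ~29.9%
print(f"Round 1 -> 2 retention: {round2_panel / round1_complete:.1%}")  # ~66.6%
print(f"Round 2 interviews:     {round2_new + round2_panel}")           # 473
```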
Out of options for increasing response, and increasingly conscious of how quickly information becomes stale in a crisis, we turned to weighting. An initial comparison of respondents and non-respondents showed relatively small differences in observable characteristics (apart from low response rates in rural areas), but it was still important to make the results as representative as possible. In the first step of the weight calculations, we applied a propensity score adjustment to increase the weight of under-represented groups. We then used a post-stratification adjustment to shift the proportions to those of the 2008 census. This process was repeated three times to produce two sets of cross-sectional weights and one set for the panel component. (Further details, including the logit regression results, are available in the technical appendix of the report.)
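For readers curious what those two steps look like in practice, here is a minimal sketch, assuming a household table with a 0/1 response flag and a few baseline covariates. The covariates (urban, hh_size) and the county-level post-stratification cells are stand-ins for illustration; the actual specification is the one in the technical appendix:

```python
# Sketch of a two-step weight construction: (1) an inverse response-propensity
# adjustment from a logit model, (2) post-stratification to census shares.
# Column names and covariates are illustrative assumptions.
import pandas as pd
import statsmodels.api as sm

def build_weights(frame: pd.DataFrame, census_shares: pd.Series) -> pd.Series:
    """frame: one row per household in the phone frame, with a 0/1
    'responded' flag; census_shares: each county's population share
    from the 2008 census (indexed by county, summing to 1)."""
    # Step 1: model the probability of responding from baseline observables,
    # then up-weight respondents who resemble non-respondents.
    X = sm.add_constant(frame[["urban", "hh_size"]].astype(float))
    fit = sm.Logit(frame["responded"], X).fit(disp=0)
    p_hat = pd.Series(fit.predict(X), index=frame.index)

    resp = frame[frame["responded"] == 1].copy()
    resp["w"] = 1.0 / p_hat[resp.index]  # inverse predicted response propensity

    # Step 2: rescale weights within each county so the weighted county
    # shares match the 2008 census proportions.
    total_w = resp["w"].sum()
    county_totals = resp.groupby("county")["w"].transform("sum")
    resp["w"] *= resp["county"].map(census_shares) * total_w / county_totals
    return resp["w"]
```

In the actual exercise this procedure was run three times, as noted above: once per round for the cross-sections, and once for the panel of households interviewed in both rounds.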
And from there we began to produce the report, with the caveat that the results were more informative than representative. Wherever possible, we verified our findings against auxiliary sources, such as the WFP VAM database and the findings of the recent Mercy Corps (and other) reports. And while it would be misleading to claim our results are precise to the second decimal place, we are confident that the trends are accurate reflections, and that, to date, this is the study of the socio-economic impacts of Ebola that comes closest to true probability sampling.