Data gaps and innovation
In times of emergency, traditional data sources such as census and survey data quickly become outdated. Call detail record (CDR) data are generated with every call and SMS, and include timestamps and location information. In the context of COVID-19 and other crises, the analysis of CDR data can inform policy. However, this requires an institutional framework, infrastructure, and analytical capacity to build a suitable data pipeline that allows policymakers to respond quickly to a fluid situation. In a recent working paper, we outline how such a data pipeline was built in The Gambia.
With the onset of COVID-19 in The Gambia, the government announced a national health emergency with restrictions on economic activity. It became clear that prolonged social distancing would bear a high cost for households and firms, and there was interest in creating an evidence base tracking the effects of these restrictions on mobility patterns. The World Bank, the Gambia Bureau of Statistics (GBoS), the Public Utilities Regulatory Authority (PURA), and The University of Tokyo had already set up a partnership to explore using CDR data to create an evidence base for policy design.
Building a CDR Data Pipeline
The partnership established the institutional groundwork to collect aggregated, anonymized data from mobile phone companies through the regulatory authority. With the interest spurred by the COVID-19 emergency, the team sought to put in place a data pipeline that allowed for rapid analysis while adhering to best practices in data confidentiality and security. This work included the following activities:
- Strengthening existing data collection protocols: As part of its mandate to monitor the quality of mobile network services, PURA had put in place a centralized repository of data. After securing the necessary approvals, the team worked with the system administrator to include additional indicators as part of this routine monitoring for use in the analysis. This minimized the reporting burden on mobile network operators (MNOs), facilitating compliance.
- Defining processes required for ensuring data protection and privacy: Data protection and privacy are of utmost importance for PURA and MNOs[1]; their operations strictly comply with national legislation. Data used for this initiative were anonymized in order to protect personally identifiable information. All data processing happened within the premises of PURA, and the team only had access to aggregated summary statistics. Open-source tools for data de-identification and analytics were provided, along with hands-on training, to the technicians of PURA and MNOs.
A system on the regulator’s premises
- Specifying system requirements: System requirements were specified based on the data size and computational capacity required for the analysis using a Hadoop platform.[2] To ensure an additional level of security, data collected for the analysis were firewalled and stored on a separate server on the premises, with remote access strictly limited to key researchers and system administrators.
- Procuring hardware and installing a temporary machine: Following the system requirements specification, hardware procurement was initiated. Given delays in procurement, the team installed a temporary machine on the regulator’s premises. It was a small mainframe capable of accommodating data provided by MNOs as well as performing non-computationally intensive analysis for data quality assessment, ensuring data remained on premises.
- Workshops and relationship building: Before the onset of COVID-19, the team had organized a series of workshops and training sessions, which helped all parties understand their roles and capacities. These exercises built trust and offered an opportunity to discuss lessons learned from other countries. They also helped sustain collaboration when all interaction had to shift to remote work due to COVID-19.
Putting the project to work for the COVID-19 response
The data pipeline draws on de-identified CDR data provided by MNOs, which are aggregated on the regulator’s premises before being made available to decision makers. The team employed methodologies developed for producing standardized indicators to analyze human mobility patterns during COVID-19.[3] Results demonstrated that the lockdown disproportionately affected urban areas by restricting economic activity, a finding that should inform relief and recovery efforts.
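To make the de-identification and aggregation steps more concrete, the following is a minimal sketch in Python. It is not the code used by the team or by the tools cited in this post. The record layout (subscriber ID, ISO 8601 timestamp, region), the function names, the secret key, and the suppression threshold of three subscribers are all assumptions made for illustration.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret held only by the data custodian (assumption).
SECRET_KEY = b"replace-with-a-secret-kept-on-premises"
# Hypothetical minimum cell size; smaller cells are suppressed (assumption).
MIN_COUNT = 3


def pseudonymize(subscriber_id: str, key: bytes = SECRET_KEY) -> str:
    """Replace a subscriber ID with a keyed hash (HMAC-SHA256)."""
    return hmac.new(key, subscriber_id.encode("utf-8"), hashlib.sha256).hexdigest()


def daily_active_subscribers(records, min_count: int = MIN_COUNT):
    """Count distinct pseudonymized subscribers per (day, region).

    `records` is an iterable of (subscriber_id, iso_timestamp, region) tuples.
    Cells with fewer than `min_count` subscribers are dropped so that only
    aggregated summary statistics leave the processing environment.
    """
    seen = defaultdict(set)
    for subscriber_id, timestamp, region in records:
        day = timestamp[:10]  # "YYYY-MM-DD" prefix of an ISO 8601 timestamp
        seen[(day, region)].add(pseudonymize(subscriber_id))
    return {cell: len(ids) for cell, ids in seen.items() if len(ids) >= min_count}


if __name__ == "__main__":
    # Synthetic records for illustration only; not real CDR data.
    sample = [
        ("sub-001", "2020-03-30T08:15:00", "region_A"),
        ("sub-001", "2020-03-30T17:40:00", "region_A"),
        ("sub-002", "2020-03-30T09:05:00", "region_A"),
        ("sub-003", "2020-03-30T11:30:00", "region_A"),
        ("sub-004", "2020-03-30T12:00:00", "region_B"),
    ]
    print(daily_active_subscribers(sample))
```

In this sketch, raw identifiers never appear in the output, and cells with very few subscribers are withheld. A daily series of such counts per region is one simple way to compare activity before and after restrictions took effect.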
Results from the analysis were presented to the Ministry of Finance and Ministry of Health. The team argued that findings from the use-case on COVID-19 could inform targeted testing initiatives, by concentrating efforts in areas of high mobility. When a full lockdown is not possible, this could also inform where social distancing policies should be enforced to reduce the risks of transmission.
Figure. Mobility Restrictions Disproportionally Affected Urban Areas
Scaling for Further Impacts
The paper demonstrates the potential of CDR data to inform decision making. This approach offers an opportunity to leapfrog existing constraints in developing countries’ data collection capacity by exploiting data that are available in real time, highly localized, and low cost. However, as this experience exemplifies, the use of CDR data requires investments in the institutional and organizational framework of national statistical systems, including the necessary IT infrastructure and technical capacity. Once in place, a CDR data pipeline can become an essential tool in government planning and disaster response.
[1] Data used for this project were de-identified and no individually identifiable information was included. Data at the individual level can be processed only on PURA’s premises.
[2] Hadoop is a set of open-source software for data-intensive, distributed applications, designed to handle problems involving massive amounts of data and computation.
[3] These tools were proposed by the World Bank COVID-19 Mobility Task Force, and the code for computing the indicators is maintained as open-source software. See also Flowminder COVID-19 Resources: https://covid19.flowminder.org/