Recently I attended the inaugural meeting of the Data for African Development Working Group put together by the Center for Global Development http://www.cgdev.org/ and the African Population & Health Research Center http://www.aphrc.org/ here in Nairobi. The group aims to improve data for policymaking on the continent and in particular to overcome “political economy” problems in data collection and dissemination.
In my view, the key data problem in Africa is data access. Typically, data from a household survey or census is collected, used to produce a single survey or report, and then sits nearly untouched for years. For administrative data like that collected by health and education ministries, the situation is typically even worse: great effort is expended to collect detailed school and health clinic data, and the data is never used for anything beyond producing a few aggregate summary statistics.
One reason that data is hidden away is that data producers are often embarrassed by the quality of the underlying data and unwilling to have someone sniffing around pointing out problems. A second reason is that “Data is power,” but not in a good way. Organizations keep a tight grip on their data because it is a thing of value. As long as they hold exclusive access, they have the possibility of receiving contracts for analyzing the data or outright selling the data.
One example is the 2005-06 Kenya Integrated Household Budget Survey (KIHBS), the country’s most recent multi-purpose consumption survey. This survey should be a keystone reference for understanding poverty, agriculture, employment and many other issues. Unfortunately, although in principle the data is available on request from the Kenya National Bureau of Statistics, in practice it has been made available to only a very small circle of researchers (including those at the World Bank) under the proviso that it not be shared more widely. As a result, there are just 157 citations in Google Scholar for the KIHBS since 2008, and many of those cite only simple published tabulations. In contrast, there have been nearly 26 times as many citations (4020) for the Kenya Demographic and Health Survey, for which the microdata is widely available.
Another example is the long-running Kenya rural panel survey conducted by the Tegemeo Institute. This data is unusually rich and could be used to explore a variety of crucial policy questions for the country. The data is not generally accessible to researchers not associated with Tegemeo http://www.tegemeo.org/ , and as a result has been greatly under-used. (Very recently, Tegemeo did announce that data collected more than 8 years ago will be made public, and the institute will consider requests for more recent data.)
A similar problem exists with researchers who generate new data. They often clutch their data because they want exclusive access for their own research, fear others pointing out problems in their analysis, and do not want to take the time to put their data into a publicly available form. To take one example, I asked the Millennium Villages Project to share the data used in its published evaluation work, for which the initial data was collected in 2005. I was told that the project would consider sharing some data once the entire research project was complete and all resulting work had been published, i.e., sometime after 2020. Such an extreme “closed data” attitude is hardly unique to the MVP.
You might think that data producers and researchers are within their rights to control access to their data. But with few exceptions, data collection in Africa (and elsewhere) has been paid for with public money, either by the country’s citizens paying taxes to their governments or by taxpayers abroad who fund organizations like the UN and World Bank that support data collection. It is those citizens who are the rightful owners of that data. As a broad principle, publicly funded data should be freely available to the public.
Of course, this principle should be subject to some conditions. Individual identifying information should be stripped from datasets, and adequate time should be allowed for the researcher or data producer to process and take a “first cut” at the data. A sensible rule might be that all publicly funded data should be available to the public on request within 2-3 years of collection.
There are already a number of laudable data access models, such as the Afrobarometer http://afrobarometer.org/ , the Demographic and Health Surveys http://www.measuredhs.com/ , and the IPUMS International census project https://international.ipums.org/international/ . Making these models the rule rather than the exception will require governments and organizations that fund data collection to do two things: 1) make Open Data access the norm for funding agreements, and 2) ensure that data dissemination is funded from the start.
There are a number of details and possible exceptions that would have to be considered in such policies. For example, as a colleague pointed out to me last week, it can be particularly difficult to ensure anonymity for respondents in qualitative data. This suggests that Open Data policies should differentiate between quantitative and qualitative surveys. Open Data policies don’t need to be absolute. But given how much data access is currently closed, we have a long way to go in the direction of Open Data to make sure that data is placed within reach of the citizens who ultimately pay for it.