
Getting better access to impact evaluation data


If the data and related metadata collected for impact evaluations were more readily discoverable, searchable, and available, the world would be a better place. Well, at least the research would be better. It would be easier to replicate studies and, in the process, to expand them: for example, by trying other outcome indicators, checking robustness, or looking for heterogeneous effects (e.g. by gender). There is also a wealth of things one could do with the metadata alone, such as examining how different wordings of survey questions generate different answers, or extracting parameters for power calculations. Last but not least, making these data available would enable a wide range of research beyond impact evaluation.

So this is arguably a good thing. But there isn't that much of it. Why? First, researchers need some return on their investment: we spend a lot of time developing the instrument, negotiating the entire setup of both the survey and the evaluation, finding the money, and so on. Second, making data available is a pain. You have to document variables you might not use. The format has to be reasonably user-friendly. Concealing confidential information takes careful attention and some creativity. And then people might email you with questions such as: "Why does the age of respondent 4319 go down across survey rounds?" Third, the economics profession as a whole offers no rewards or incentives for bearing this cost.

So it would seem that this is doomed. And it might be, but I am optimistic. Let's look at the arguments above a bit more. 1) Return on investment: yes, but if there is agreement that exclusive access does not go on forever, this becomes a time-bound option instead of a monopoly. 2) The pain: it will never be zero, but with global outsourcing (as with data entry, for example) these costs are rapidly declining. 3) The incentives: one big move on this front (which also sets some bounds on the return on investment) is that journals increasingly require you to make the data available when you submit a paper, and in some cases the data go on the web when you publish. One nice example is the American Economic Journals' data policy. Journal policy is key because it lines up availability with incentives, but it is geared primarily towards replication, not access for broader use.

A broader option is to make the whole dataset (and the attendant documentation) available. And this is starting to happen. Two examples are J-PAL's website and the World Bank's Impact Evaluation Microdata Catalog. Some coauthors and I were guinea pigs for the early work on the World Bank site, depositing three rounds of our earlier Kenya HIV work. The experience was fairly easy: there was a discussion of what had to be stripped for confidentiality, a bunch of questions on the documentation, and then the folks who maintain the site did the processing and put it up. This was a dataset where we had already done most of the papers we had planned. And, critically, the processing (organizing the data files, preparing the documentation, setting it all up) was done for us. So now the datasets are there and people can use them to see what the answers were, make comparisons, and the like (I encourage readers interested in exploring the World Bank site to check out the guide on how to use it; it wasn't obvious to me). What has made me particularly happy is the number of requests we have received for the full dataset. The way it works here is that you can ask for the full dataset, but you have to explain what you are going to do with it. This is great for me because requesters come up with topics I hadn't thought of using it for: apparently some are looking at the effects of shocks on children's anthropometric outcomes, and others at access to community savings groups.

The J-PAL website has 14 datasets and the Bank has 17, which is a start. So what I'd like to do is start a discussion on how we might grow these collections. Clearly a single centralized repository isn't the answer: in addition to the sites above, there are other sites with surveys from developing countries that could be of use (for example, ICPSR at Michigan and the UK Data Archive, to name two that cover both developing and developed countries). So what we need is a way to aggregate all of these: maybe something like a Travelocity of development surveys. It could find surveys and, if we get the metadata attached to the surveys right, it could even go inside them, finding variables, giving us the range of values, and so on. But what would you like to see? Do you know of other sites that put data and some form of documentation up in a way that is fairly easy to download (both for impact evaluations in particular and surveys more generally)? What other ideas are out there to make these data more available?
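To make the aggregator idea concrete, here is a minimal sketch of variable-level search across catalogs. Everything in it is hypothetical: the `Survey` and `Variable` record types and the `search_variables` function are assumptions about what a harmonized metadata schema might look like, not the structure any of the sites above actually use.

```python
from dataclasses import dataclass, field

@dataclass
class Variable:
    name: str          # variable name as it appears in the dataset
    label: str         # human-readable description from the documentation
    values: list       # observed range or codes, if the metadata records them

@dataclass
class Survey:
    catalog: str       # hosting catalog, e.g. "World Bank" or "J-PAL"
    title: str
    variables: list = field(default_factory=list)

def search_variables(surveys, keyword):
    """Return (catalog, survey title, variable) for every variable whose
    name or label mentions the keyword, across all aggregated catalogs."""
    keyword = keyword.lower()
    hits = []
    for s in surveys:
        for v in s.variables:
            if keyword in v.name.lower() or keyword in v.label.lower():
                hits.append((s.catalog, s.title, v))
    return hits
```

A researcher could then ask, say, `search_variables(surveys, "height")` and get back every anthropometric variable across all indexed surveys, along with its documented range, which is exactly the "go inside the survey" capability described above.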


Markus, The Social Science Research Network (SSRN) links up a lot of research information that can be accessed via different academic communities. Perhaps investigating that model might provide some ideas?

Agree with everything you've said here. One point I would add or emphasize is process learning. Effective diffusion of methodological learning often requires getting down to the detailed operational documents and results: you can rarely get the sort of practical details that matter from the published findings. I think this is why you get an experience-curve effect much more readily within organizations than between them: a lot of this stuff tends to stay proprietary (methods, labour efficiency in information collection, instrument presentation, etc.). In the peace & security space (where the data are squishier and more heterogeneous) it is a somewhat common practice to put together briefing books of the key documents used for evaluations and major decisions. There are third-party providers that do this, albeit mostly focused on national security in OECD countries. This material is RSS-able but not efficiently searchable in the sense you identify. I would nonetheless love to see whether third-party think tanks and researchers could take a similar approach for international organizations.

Submitted by Pete Forde on
Markus, Two leads for you to check out. First is our data publishing and collaboration site, which actually seeks to address the primary questions you raise. We believe that well-curated datasets that are accessible to non-technical users and place significant emphasis on discussion and the narrative behind the data change the big picture significantly. It'd be great to know if you have any practical suggestions for us. The second is the IDRC's Think Tank Initiative, based out of Toronto, Canada. I think there is cause for optimism that many of the hardest problems around working with data are starting to get solved. Pete

Submitted by Rebekka Grun on
I like the accessibility of DHS data: you just have to register and give a purpose. It's a similar procedure to the UK Data Archive, but with much more lenient access. I seem to recall that StatCan also used to be relatively uncomplicated, releasing the IALS data (International Adult Literacy Survey), but a quick check of their website suggests this may have changed. Best, Rebekka