Published on Development Impact

Learning from efforts to start experimentation in European cohesion policies


On April 30 I took part in a workshop organized by the World Bank with DG REGIO, the European Commission’s Directorate-General for Regional and Urban Policy. EU Cohesion Policy supports less developed regions of the EU with a lot of funding that goes to programs designed to enhance the competitiveness and growth of those regions – 392 billion euros are allocated for 2021-27. With all this funding comes a lot of EU bureaucracy and monitoring, with the director noting that there have been thousands of evaluations, making cohesion policy one of the most evaluated programs anywhere. But almost none of these evaluations involve comparison to rigorous counterfactuals – there are ex ante assessments of likely effects, monitoring and process evaluations checking that money is spent on the things the regions said they would spend it on, and some ex post assessments similar to those done by IEG at the World Bank. The goal of the workshop was to spur more experimental evaluations of these policies, and to share lessons from other efforts in Europe and in different parts of the world. I thought I’d share some of the examples and discussion that might be of broader interest to our readers. I should note that these reflect my own takeaways and reflections, and may well differ from those of the EC or World Bank teams organizing the event.

Lessons from Embedding Systems of Randomized Evaluations in France and Spain

There were very interesting presentations from Martin Hirsch, former High Commissioner for Active Solidarity against Poverty in France, and Mónica Martínez-Bravo, Secretary of State for Inclusion in Spain, on how these countries were able to build up systems of evaluation in particular areas.


Martin noted that 15 years ago or so it was incredibly hard to think of doing randomization with public policy in France, since the law mandated that all citizens should be treated equally by policies – so giving some people a program while other similar people got nothing, or got something different, would not be allowed (I might argue that giving people an equal chance of being selected could still be interpreted as treating people probabilistically equally). He noted how a confluence of the right people, including a group of French academics trained in experimental methodology, and changes in the policy environment led to a change in the law to explicitly allow piloting and experimentation, as well as the establishment of a fund for social policy experimentation – and since then 300 or so social experiments have been conducted, with a lot of the early work around issues of youth employment and social assistance. One nice point that came up was the big gap between scientific time and political time – and so the need to build a stock of experimental results so that when policy questions come, you are ready to offer solutions. Another nice example came from how evaluations helped prevent counterproductive policies from being implemented. A case in point was a suggestion to mandate the use of anonymized resumes, based on the idea that this would prevent discrimination. However, an RCT found that this actually worsened employment outcomes for minorities, perhaps because it prevented positive discrimination in the first screening stage by some companies aiming to increase diversity, while those that wanted to discriminate could still discriminate at the interview stage. See here for more details.


A second and newer example came from Mónica, who discussed a program of 32 RCTs implemented between 2021 and 2024 in the Inclusion Policy Lab in the Ministry of Inclusion, Social Security and Migration. This arose out of efforts to address the effects of the COVID-19 pandemic, when the poor in Spain were hit particularly hard and the Spanish government introduced a new minimum income policy to provide cash transfers to the poor. There was a sense that additional efforts on top of basic transfers were needed, and so the government set up a fund to systematically experiment and identify successful interventions through RCTs. This fund, equipped with 200 million euros, has supported the implementation of more than 30 experiments and reached more than 90,000 beneficiaries. Many of these programs involve personalized assistance – e.g., labor insertion programs for the homeless, or programs for the very poor that offer tailored assistance in helping them overcome barriers to even accessing the minimum income. In many cases, there is not a pure control group getting nothing, but rather a control group receiving the status quo benefits and a treatment group getting a more intense and innovative type of support. A big part of the effort of this fund was to reframe the discussion on social spending from this just being spending to it being “social investment” – and experimental evidence is important in being able to document whether the resources used really deliver benefits to society in terms of getting people into work and housing. Running this as a set of coordinated evaluations was seen as key to making this not a piecemeal approach, but something more central to government conversations around the program. See here and here for more details.


Challenges faced and lessons from new implementors

There was a lot of discussion around some of the challenges faced in implementing RCTs in the European context, as well as presentations by teams from four countries that had newly started implementing RCTs in the innovation policy space. Here are some of the takeaways and my reflections:

·       GDPR and Data Access Challenges: Despite many European countries having a large amount of administrative data, one of the challenges raised multiple times was uncertainty about how data privacy provisions in European law inhibit the sharing, linking, and accessing of data. Even agencies within the government faced a lot of difficulty linking data from different parts of their own agencies, let alone linking to other databases. One thing that greatly puzzled me here is how this squares with the famous Nordic cases, where it seems many papers have been able to link every dataset ever collected on Swedes and Norwegians to trace people for years. A big issue here seems to be the generally risk-averse attitude of governments and legal advisors: where there is a lot of uncertainty, and a lack of accounting for or documenting the costs of not providing data, it is just easier to say no – this is where having an organization like the European Commission provide clear guidelines on what can be done under European law would be helpful. We’ve also advised project teams to think proactively on this, by getting applicants to programs to commit to sharing data for evaluation purposes at the time they apply. (Post-script: in fact, project teams working with the World Bank note that this fear has been much less of an issue in practice for prospective evaluations of firm policies. Here, setting up NDAs and getting access to application and MIS data early has been helpful in giving real-time feedback to agencies on who is applying and whom they have yet to reach, and this in turn has built trust and an appreciation for the usefulness of this work. Additionally, a lot of firm-level data is publicly available in the EU, whereas this may be less the case when dealing with beneficiaries of employment and social assistance programs.)

·       Legal barriers exist but can be addressed: Legal barriers to and concerns about randomization were mentioned, but it was very interesting to note that in various cases these could be addressed as long as political will existed and the value of experimentation was clear.

·       Bureaucracy and the need to experiment on processes as well as programs: a point that came up multiple times was that a lot of European processes for having firms apply for programs, for scoring them, for procurement, for monitoring afterwards, etc. were all very time-consuming and burdensome. A great quote from one of the participants was that “even our simplification efforts involve a lot of complex bureaucracy and time”. I noted that, especially for smaller grant programs, there could be large cost and time savings if firms were just randomly selected for grants, and that once there has been some screening, a lot of reviewer scoring is pretty time-consuming and the resulting rankings arbitrary. I pointed to this blog on how several scientific research agencies in Europe and New Zealand were introducing partial randomization in their funding for this reason. But more generally, I think both the EC and World Bank should be randomly testing much more streamlined and less bureaucratic systems for procurement, reporting, etc.

·       Is an RCT even needed? One comment that came up that I have heard multiple times is “we have a program that has many beneficiaries, who come back to it again, so we know it works and don’t need to do an evaluation”. I particularly hear this in the private sector. I perhaps unfairly then pitched my new business idea of a casino which also offers horoscopes, homeopathy, and cigarettes. But more seriously, I think it is worth noting that there might be things people like that a government may still not want to support, that might not be cost-effective, and that might be hard for users to learn the costs and benefits from. But not everything needs an RCT to learn whether it is working, and some RCTs may not be powerful enough to make it worth investing in them. Here my little paper on whether you need a power calculation for poverty alleviation came to mind.

·       Many of the benefits of getting started on an RCT come before any of the results are known: several of the countries noted the value of thinking very carefully through the causal chain and doing market research/beneficiary discussions to ensure the intervention was meeting the desired needs, of the communication assistance they had received in boosting application rates, and of what they were learning from baseline data.
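The partial-randomization idea in the bullets above – screen applicants for basic eligibility, then allocate the limited grants by lottery among those who pass – can be sketched in a few lines. This is an illustrative sketch, not any agency’s actual procedure; the names (`award_grants`, `passes_screening`, etc.) are all hypothetical:

```python
import random

def award_grants(applicants, passes_screening, n_grants, seed=2024):
    """Partial randomization: screen first, then draw winners by lottery.

    Illustrative sketch only. A fixed seed makes the draw auditable
    and reproducible after the fact.
    """
    eligible = [a for a in applicants if passes_screening(a)]
    rng = random.Random(seed)
    winners = rng.sample(eligible, min(n_grants, len(eligible)))
    losers = [a for a in eligible if a not in winners]
    # The lottery itself creates the evaluation design: winners vs.
    # non-winners among equally eligible applicants is a clean comparison.
    return winners, losers
```

A side benefit, beyond saved reviewer time, is that the lottery losers form a ready-made comparison group for a later impact evaluation.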
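On the power-calculation point above: a standard back-of-the-envelope formula (the textbook normal approximation for comparing two means, not the method of the paper mentioned) shows why detecting small effects can require prohibitively large samples:

```python
import math
from statistics import NormalDist

def n_per_arm(effect_sd, power=0.80, alpha=0.05):
    """Approximate sample size per arm for a two-sided test of a
    difference in means, with the effect in standard-deviation units
    (Cohen's d). A rough check, not a full power analysis.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = z(power)            # about 0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_sd ** 2)

# A 0.1 SD effect needs about 1,570 units per arm; 0.5 SD needs only 63.
```

The quadratic term in the denominator is the whole story: halving the expected effect size quadruples the required sample, which is why some worthwhile-sounding RCTs are simply not worth running at the scale available.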

Framing, and RCTs as R&D, not as quality control

One recurring theme was the framing of RCTs and how organizations should think of them. For example, in discussions over the ethics of doing RCTs, if you view the status quo as governments spending lots of taxpayer money to experiment with untested programs without learning anything, the ethical debate looks a lot different than if you view the status quo as offering services people seem to like and an RCT as then denying some people something. But I think a more useful framing point was made by my colleague Leonardo Iacovone in his remarks. He noted that too often RCTs are seen as akin to quality control in firms, where they are viewed as being used for accountability purposes, to tell a government whether it has spent its money well or not. Instead, he wants us to think of RCTs as being like an R&D expenditure: an investment in the future, testing new innovations and generating new knowledge for future policy efforts. And just like R&D, this should not be a one-time investment, but a continued and ongoing process seen as integral to long-term policy success.

David McKenzie

Lead Economist, Development Research Group, World Bank
