Published on Development Impact

We randomized! But did we really, though?


Disclaimer: In this blog post, we will not address the question “So, should I re-randomize?” (see our conclusion for a few useful resources). We are staying out of that debate for fear of the gods of randomization…


Imagine you are running a Randomized Controlled Trial, and you have randomized an intervention across treatment and control groups. You find balance in the baseline values of your outcome variables across groups, so you are happy with the random assignment. Habemus randomization! You conclude that the assignment process was random and hand the assignment results over to your implementation team. Does this sound like a process you follow all the time? What could possibly be wrong with it?


A true story [Motivation]

The idea for this post came to us a couple of weeks ago, as we were performing a randomization for a multi-arm trial under multiple policy constraints. For each district, the project had a target number of individuals to be treated, combined across the two treatment arms. While the random assignment had to be done at the community level, the project targets were counted at the individual level (and multiple individuals in each community were identified as eligible for the program).

Our first step was to manually allocate the district quotas of treated individuals across arms, and randomly assign communities, stratified by district. The random assignment generated balance in a key variable between arms. 

We were still a bit skeptical, so as a second step we performed the random assignment 100 times to see how frequently the selected covariate was balanced among the 100 random draws. We discovered that we had gotten lucky with our first draw. [For more on what exactly led to this issue, see our answer to Jason's great question in the comments!]

This exercise allowed us to fix our randomization. We thought we would share this experience with our readers and give an overview of different methods for checking that your random assignment is truly random. For an in-depth discussion of whether balance tests should be required and when they are useful, we suggest reading David’s blog post.


Is some data consistent with a random assignment process? 

First, we consider the case where you are not the one running the random assignment; instead, you are presented with a dataset in which units are supposedly randomly assigned to groups. One example is when the randomization for an experiment is done by a government or an NGO. See this paper, which finds that a government-randomized farm input subsidy program in Malawi shows some degree of targeting.

In such cases, a natural question is whether the data are consistent with a process that randomly assigned units to groups.

To tackle this question there are many tools one can use: 

  1. The simplest test, in experiments with a treatment group and a control group, is to run a series of t-tests comparing the treatment and control means of different predetermined covariates, by regressing each covariate on the treatment indicator. DIME Analytics provides a convenient tool for doing this here.
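  2. Another check is a joint orthogonality test, as proposed by David: regress the treatment indicator on all the predetermined covariates and test whether their coefficients are jointly zero (e.g., with an F-test).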
  3. For cases where you have many groups, you can use resampling techniques analogous to those used in Carrell and West (2010). For each observed group, these tests draw many random samples of the same size as that group, compute the mean of a given covariate in each simulated group, and then compute an empirical p-value for the observed group, equal to the proportion of simulated groups whose mean is lower than the observed group's mean. If the observed groups were generated by a random process, the empirical p-values should be uniformly distributed (this can be tested using a Kolmogorov-Smirnov or chi-squared goodness-of-fit test). Example code for doing this can be found here.
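To make the first two checks concrete, here is a minimal Python sketch, assuming NumPy and SciPy and a numeric covariate matrix; the function names `balance_ttests` and `joint_orthogonality_test` are our own illustrative choices, not the DIME Analytics tool. It relies on a standard identity: with classical (homoskedastic) standard errors, the coefficient from regressing a covariate on a constant and a treatment dummy is the difference in means, and its t-test is the pooled two-sample t-test. In practice many researchers use robust standard errors instead.

```python
import numpy as np
from scipy import stats


def balance_ttests(covariates, treat):
    """Difference in means (treated minus control) and t-test p-value per covariate.

    With classical standard errors, regressing a covariate on a constant and a
    treatment dummy yields exactly this difference and this p-value.
    """
    treat = np.asarray(treat, dtype=bool)
    results = []
    for j in range(covariates.shape[1]):
        x = covariates[:, j]
        diff = x[treat].mean() - x[~treat].mean()
        pval = stats.ttest_ind(x[treat], x[~treat], equal_var=True).pvalue
        results.append((diff, pval))
    return results


def joint_orthogonality_test(covariates, treat):
    """Classical F-test that all slopes are zero when the treatment dummy is
    regressed on a constant plus all covariates. Returns (F, p-value)."""
    n, k = covariates.shape
    X = np.column_stack([np.ones(n), covariates])
    y = np.asarray(treat, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ssr = np.sum((y - X @ beta) ** 2)  # unrestricted residual sum of squares
    sst = np.sum((y - y.mean()) ** 2)  # restricted model: constant only
    f_stat = ((sst - ssr) / k) / (ssr / (n - k - 1))
    return f_stat, stats.f.sf(f_stat, k, n - k - 1)


# Toy data: 200 units, 3 covariates, exactly 100 treated by complete randomization.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
T = rng.permutation(np.repeat([0, 1], 100))
for diff, p in balance_ttests(X, T):
    print(f"diff = {diff:+.3f}, p = {p:.3f}")
print(f"joint F-test p-value = {joint_orthogonality_test(X, T)[1]:.3f}")
```

The joint test complements the covariate-by-covariate t-tests: with many covariates, a few individual p-values below 0.05 are expected by chance, whereas the F-test asks a single question about all of them at once.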

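The resampling test in the third item can be sketched in Python as follows. This is our own illustration, not the Carrell and West (2010) code. It assumes that simulated groups are drawn without replacement from the pooled sample of all units, that NumPy and SciPy are available, and that `resampling_group_pvalues` and its `n_sims` parameter are hypothetical names.

```python
import numpy as np
from scipy import stats


def resampling_group_pvalues(x, groups, n_sims=1000, seed=0):
    """Empirical p-value for each observed group.

    For a group of size n_g, draw n_sims random samples of n_g units (without
    replacement) from the pooled data, and record the share of simulated
    means that fall below the observed group mean.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    groups = np.asarray(groups)
    pvals = []
    for g in np.unique(groups):
        members = x[groups == g]
        sim_means = np.array([
            rng.choice(x, size=members.size, replace=False).mean()
            for _ in range(n_sims)
        ])
        pvals.append((sim_means < members.mean()).mean())
    return np.array(pvals)


# Toy data: 2,000 units assigned at random to 40 groups of 50.
rng = np.random.default_rng(1)
x = rng.normal(size=2000)
groups = rng.permutation(np.repeat(np.arange(40), 50))
pvals = resampling_group_pvalues(x, groups)
# Under random assignment the p-values should look roughly Uniform(0, 1);
# a small Kolmogorov-Smirnov p-value is a warning sign.
print(stats.kstest(pvals, "uniform").pvalue)
```

If units were instead sorted into groups by the covariate, the empirical p-values would pile up near 0 and 1 and the Kolmogorov-Smirnov test would reject uniformity.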

Habemus randomization?

We next consider the case where you are the experimenter randomly assigning units to treatment or control. 

In cases where the randomization is complex, it is prudent to check whether the treatment assignment mechanism generates balanced groups over many draws. As we mentioned above, one scary possibility is that a mechanism that does not generate balanced groups on average can still produce individual draws in which the groups appear balanced on observable characteristics, just as a non-random assignment rule may occasionally yield a balanced draw. So checking balance on a single realization is not a sufficient test!

To test whether a randomization procedure is working as intended, we turn to randomization inference methods. We use the procedure to draw a large number of possible treatment assignment vectors, then for each draw:


  • Conduct a t-test comparing the means of the treatment and control groups for one or more covariates by regressing each covariate on treatment, and save the resulting coefficients and p-values.
  • Save the treatment assignment vector.
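A minimal Python sketch of this simulation loop, assuming NumPy and SciPy. The stratified quota mechanism (`assign_stratified`), the three toy districts, and the quotas are hypothetical stand-ins for whatever procedure you actually use. The key requirement is that the function you simulate is the exact code that produces your real assignment. The saved coefficient is the difference in means, which equals the OLS coefficient on treatment.

```python
import numpy as np
from scipy import stats


def assign_stratified(strata, n_treat, rng):
    """Hypothetical assignment mechanism: treat exactly n_treat[s] randomly
    chosen units in each stratum s (e.g. communities within a district)."""
    treat = np.zeros(strata.size, dtype=int)
    for s, k in n_treat.items():
        idx = np.flatnonzero(strata == s)
        treat[rng.choice(idx, size=k, replace=False)] = 1
    return treat


def simulate_draws(assign, x, n_draws=2000, seed=0):
    """Run the assignment procedure n_draws times. For each draw, save the
    difference in means of covariate x, its t-test p-value, and the full
    assignment vector."""
    rng = np.random.default_rng(seed)
    coefs, pvals, draws = [], [], []
    for _ in range(n_draws):
        t = assign(rng)
        treated, control = x[t == 1], x[t == 0]
        coefs.append(treated.mean() - control.mean())
        pvals.append(stats.ttest_ind(treated, control).pvalue)
        draws.append(t)
    return np.array(coefs), np.array(pvals), np.vstack(draws)


# Toy setup: 3 districts of 20 communities each, with a quota per district.
strata = np.repeat([0, 1, 2], 20)
x = np.random.default_rng(42).normal(size=strata.size)
quotas = {0: 10, 1: 10, 2: 10}
coefs, pvals, draws = simulate_draws(
    lambda rng: assign_stratified(strata, quotas, rng), x)
print(coefs.mean(), pvals.mean(), draws.shape)
```

In this toy setup every district treats half of its communities. If quotas imply different treatment shares across strata, the simple difference in means need not average to zero even under a valid mechanism, so you would compare within strata instead (e.g., add stratum fixed effects to the regression).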


Once this is done, the checks one can conduct are:


  • Check whether the distribution of the saved p-values is uniform. If assignment is truly random, they should be (approximately) uniformly distributed between 0 and 1. This can be checked using the Kolmogorov-Smirnov or chi-squared tests mentioned above.
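  • Check whether the distribution of the saved coefficients has mean zero. If assignment is truly random, the treatment should on average have no effect on a predetermined covariate (if treatment probabilities differ across strata, make this comparison within strata, e.g., by including stratum fixed effects).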
  • Check if there exists a randomization unit that never gets assigned to treatment (or conversely one that never gets assigned to control). This can be done by checking whether the average of a given component of the treatment assignment vector is 0 or 1 across the many draws.
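The three checks can be bundled into one function. The sketch below is our own illustration in Python, assuming NumPy and SciPy. It uses a one-sample t-test for the mean-zero check, which is one reasonable choice among several. The flawed mechanism in the toy example, where one unit is always treated and another never is, is invented purely to show what the last check catches.

```python
import numpy as np
from scipy import stats


def check_assignment_mechanism(coefs, pvals, draws):
    """Run the three checks on many simulated draws of an assignment mechanism.

    coefs: (n_draws,) treatment coefficients on a predetermined covariate
    pvals: (n_draws,) the corresponding p-values
    draws: (n_draws, n_units) 0/1 treatment assignment vectors
    """
    share_treated = draws.mean(axis=0)  # how often each unit is treated
    return {
        # Kolmogorov-Smirnov test of the p-values against Uniform(0, 1)
        "ks_pvalue": stats.kstest(pvals, "uniform").pvalue,
        # one-sample t-test that the coefficients average to zero
        "mean_coef_pvalue": stats.ttest_1samp(coefs, 0.0).pvalue,
        # units never assigned to treatment, or never assigned to control
        "never_treated": np.flatnonzero(share_treated == 0),
        "always_treated": np.flatnonzero(share_treated == 1),
    }


# Toy example with a deliberately flawed (hypothetical) mechanism:
# unit 0 is always treated, unit 1 never is, and 24 of the other 48 units
# are treated at random in each draw.
rng = np.random.default_rng(0)
n_units, n_draws = 50, 1000
x = rng.normal(size=n_units)
coefs, pvals, draws = [], [], []
for _ in range(n_draws):
    t = np.zeros(n_units, dtype=int)
    t[0] = 1
    t[2 + rng.choice(n_units - 2, size=24, replace=False)] = 1
    coefs.append(x[t == 1].mean() - x[t == 0].mean())
    pvals.append(stats.ttest_ind(x[t == 1], x[t == 0]).pvalue)
    draws.append(t)

report = check_assignment_mechanism(np.array(coefs), np.array(pvals), np.vstack(draws))
print(report["never_treated"], report["always_treated"])  # prints "[1] [0]"
```

Here the unit-level check catches the flaw directly. Whether the uniformity and mean-zero checks also flag it depends on how different units 0 and 1 happen to be on the covariate, which is one reason to run all three.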


We provide some example code for running these checks here.



We started by discussing how to check whether the treatment and control groups are balanced for a given draw of treatment assignments. Next, we discussed how to check whether a treatment assignment mechanism on average generates balanced groups. Naturally, this raises the question of how one should proceed when one or both sets of checks fail.  

Consider the case described in the introduction. If the results suggest that a particular draw is balanced but that the treatment assignment mechanism does not create comparable groups on average, then one should find out why the mechanism is failing, address that issue, and then draw a new vector of treatment assignments.

What should one do if, instead, the treatment assignment mechanism generates comparable groups on average but the chosen draw shows some imbalance? Bruhn and McKenzie (2009) provide a good discussion of why testing the balance of a draw is problematic when we know that treatment was assigned at random.

However, if one is still concerned about balance in such cases, recent advances in re-randomization techniques allow one to choose a draw that enforces covariate balance and then take that choice into account when conducting inference. For an example of such a method being applied, see Beaman et al. (2020). For the theory behind such methods, see, for example, Li et al. (2018).
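To illustrate the idea of "taking the choice into account", here is a rough Python sketch, assuming NumPy and SciPy. It uses one common acceptance rule: keep a complete randomization only if the Mahalanobis distance between treated and control covariate means falls below a threshold. Inference then uses a randomization test whose reference distribution is built only from draws that the same rule would accept. The functions, threshold, and toy outcome are our own illustrative assumptions, not necessarily the procedure used in Beaman et al. (2020).

```python
import numpy as np
from scipy.stats import chi2


def mahalanobis_balance(X, t):
    """Mahalanobis distance between treated and control covariate means."""
    n1, n0 = t.sum(), (1 - t).sum()
    diff = X[t == 1].mean(axis=0) - X[t == 0].mean(axis=0)
    cov = np.cov(X, rowvar=False) * (1 / n1 + 1 / n0)
    return float(diff @ np.linalg.solve(cov, diff))


def rerandomize(X, n_treat, threshold, rng, max_tries=10_000):
    """Draw complete randomizations until one meets the balance criterion."""
    n = X.shape[0]
    for _ in range(max_tries):
        t = np.zeros(n, dtype=int)
        t[rng.choice(n, size=n_treat, replace=False)] = 1
        if mahalanobis_balance(X, t) < threshold:
            return t
    raise RuntimeError("no acceptable draw found; consider a looser threshold")


def rerandomization_pvalue(y, X, t_obs, n_treat, threshold, n_draws=1000, seed=0):
    """Randomization test of the sharp null of no effect. The reference
    distribution is built only from draws the same rule would accept."""
    rng = np.random.default_rng(seed)
    observed = y[t_obs == 1].mean() - y[t_obs == 0].mean()
    reference = np.empty(n_draws)
    for i in range(n_draws):
        t = rerandomize(X, n_treat, threshold, rng)
        reference[i] = y[t == 1].mean() - y[t == 0].mean()
    return float(np.mean(np.abs(reference) >= abs(observed)))


# Toy example: 100 units, 3 covariates, 50 treated.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
threshold = chi2.ppf(0.2, df=3)  # accepts roughly 20% of complete randomizations
t_obs = rerandomize(X, 50, threshold, rng)
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=100) + 0.3 * t_obs  # toy outcome
print(rerandomization_pvalue(y, X, t_obs, 50, threshold))
```

The important design choice is that the reference distribution reuses the same acceptance rule as the real assignment. Comparing the observed effect against unrestricted random draws instead would ignore the re-randomization step.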

Now, habemus randomization!


Dahyeon Jeong

Economist in the Development Impact Evaluation (DIME) department, World Bank

Florence Kondylis

Research Manager, Lead Economist, Development Impact Evaluation
