Syndicate content

Finally, a way to do easy randomization inference in Stata!

David McKenzie's picture

Randomization inference has been increasingly recommended as a way of analyzing data from randomized experiments, especially in samples with a small number of observations, with clustered randomization, or with high leverage (see for example Alwyn Young’s paper, and the books by Imbens and Rubin, and Gerber and Green). However, one of the barriers to widespread usage in development economics has been that, to date, no simple commands for implementing this in Stata have been available, requiring authors to program from scratch.

This has now changed with a new command ritest written by Simon Hess, a PhD student who I met just over a week ago at Goethe University in Frankfurt. This command is extremely simple to use, so I thought I would introduce it and share some tips after playing around with it a little. The Stata journal article is also now out.

How do I get this command?
Simply type findit ritest in Stata.
[edit: that will get the version from the Stata journal. However, to get the most recent version with a couple of bug fixes noted below, type

net describe ritest, from(

How does the command work?

  1. Using it when you have randomized at the individual level within strata
Here is an example from my Nigeria business plan competition . I’ve put the do file and data files used in this blog post here.  The full data from Nigeria are in my list of datasets.

One of the outcomes I consider is truncated profits in the second round follow-up, for firms that were existing at the time . My stata command for doing this in the paper was:

areg s_prof_trunc assigntreat, a(strata) robust

where s_prof_trunc was the profit outcome, and assigntreat my treatment assignment variable, and I am controlling for randomization strata. The p-value from this regression is 0.051.

The code for using this new command is then:
ritest assigntreat _b[assigntreat], reps(1000) seed(125) strata(strata): areg s_prof_trunc assigntreat, a(strata) robust

where here I am telling it what the treatment variable is (assigntreat), and what coefficient I want the p-value for, telling it to do randomization inference within the randomization strata, and do use 1000 replications or draws.  The output looks like this:

You can see the p-value is 0.047 here, very similar to what I got from the regression. This is individual-level randomization, with a sample size of 497, so we might expect randomization inference to give similar results to regression.
[edit: after the latest update to ritest which now enables replicability, I get 0.058 as the p-value using the seed of 125 in my code.]

2. Individual-level randomization, but with multiple time periods and clustered standard errors
Here I have re-shaped the Nigeria data so that I am estimating a pooled effect over the second and third rounds of follow-up data together. The p-value when I use regression to do this is 0.096. The command is then very similar to that above, except now telling it also to take into account the clustered nature when doing the permutations, by using the cluster option (with uid being my individual identifier):

ritest assigntreat _b[assigntreat], reps(1000) strata(strata) cluster(uid) seed(124): areg prof_trunc assigntreat time3 if group<=2 & existing==1, a(strata) robust cluster(uid)

The p-value I get is 0.100, which is again very similar to the regression value I estimate. In contrast, if I had not specified the cluster() option, it would give me the artificially low p-value of 0.06.

3. Clustered randomization with stratification
A third example I tried comes from ongoing work in Colombia on shortening supply chains. Here we have firms in 63 market blocks, which were formed into 31 strata (pairs and one triplet), with clustered randomization occurring at this market block level. An outcome of interest here is how many days a week firms shop at the central market. The p-value I get in the regression with clustered standard errors is 0.024. Randomization inference is meant to make more of a difference with clustered randomizations with relatively few clusters, so I was curious to see what difference it makes here:
ritest b_treat _b[b_treat], reps(5000) cluster(b_block) strata(b_pair) seed(546): areg dayscorab b_treat b_dayscorab miss_b_dayscorab round2 round3, cluster(b_block) a(b_pair)

So here randomization inference does make a big difference – increasing my p-value from 0.024 to 0.106. So if you are doing clustered randomization, randomization inference will be particularly important to use.

Some observations and caveats
This is fantastically easy to use for the simple randomization cases, and should lead to many more people using randomization inference, at least as a check. Here are some observations from my initial uses to keep in mind:
  1. Despite the ability to set the seed for randomization, it currently does NOT seem to give replicable responses. For example, in case 1 above, I ran the code with 1000 replications and the same seed 5 times in a row, and got different p-values each time: 0.047, 0.054, 0.039, 0.053, and 0.057. These are all close, but obviously it is not ideal that the same code will produce different p-values each time it is run.[edit: Simon has now fixed this in the latest version available on his github]
  2. This raises the second issue of how many replications you should use. The default is 100 replications. This is likely to be way too low in many settings. Alwyn Young uses 10,000 draws, but says he finds little difference once he goes beyond 2000 draws. Once you are doing enough draws, point 1 will become less of an issue.
  3. Time taken: in my simple case above, it takes my laptop about 35 seconds to run this with 1000 draws, and 3 minutes to do 5000 draws. However, this is after I have dropped all other variables from my dataset. When I had my full dataset, it took more than 2 minutes to do 1000 draws. So if you want this to run relatively fast, drop all the other data from your dataset apart from what is needed for your regression.
  4. The command doesn’t allow for xi to be used in the regression you are running – so if your regression had been using this, you should remove this from your command. Jason Kerwin notes that you can still just put e.g. i.groupvariable to add group variable fixed effects, without using xi.
  5. Jason Kerwin notes to me that if your strata variable is a string rather than numeric, then the code ignores your strata and will give incorrect standard errors. So make sure any strata variables are numeric. [edit: Simon has now fixed this in his latest version, available on his github]
  6. More complicated randomizations such as two-level clustered randomization
When the randomization becomes more complicated, you need to do a little more work. As an example, consider my Kenya business training paper, in which there is a double randomization: first markets are randomized into treatment or control markets within market strata, and then within the treated markets, firms are individually randomized to be assigned to training or not within individual strata. The command handles this by either calling on a program you write that shows how this randomization is done, or by using a file of valid permutations that you create. This is still on my to do list to play around with.

7. Multiple treatments:
The standard textbook explanation of randomization inference always seems to involve just a single binary treatment, so that a unit is either treatment or control. Then under the sharp null that all treatment effects are zero, the outcomes for each unit should be the same whether or not they are assigned to treatment or control, and so we can just estimate the treatment effect under all the possible permutations and see in how many of these permutations do we get a coefficient at least as large as the one in the actual data.
However, suppose now that we assign firms to control (T=0), traditional business training (T=1), or personal initiative training (T=2) as in this recent paper. Then we have at least three hypotheses we might want to test: no effect of traditional training, no effect of personal initiative training, and no effect of either training. It seems to me that we then need to use three separate sets of permutations to test these, and that you will need to program up each of these as a file of permutations for the program to then call on. For example, if I had 10 observations and my treatment assignment variable has values (0, 1, 0, 2, 1, 2, 0, 1, 2, 0) then to test no effect of traditional training, I would want to only permute the 0s and 1s with one another, without changing the assignments of those getting personal initiative training. I have yet to program this.

If anyone has examples they want to share of using this command to do randomization inference with multiple treatments or multi-level randomizations, please link to them in the comments.

For further reading:
Jason Kerwin also blogged last week about the difference between randomization inference and bootstrapping, and this command.

Alwyn Young has also developed a Stata command randcmd.ado available on his webpage, which I have yet to play around with.



Submitted by Berk Ozler on
Hi David,

When I rerun your code from Nigeria with the latest ritest from the address you provided, I get p=0.058 instead of 0.047, which is a decent difference. I wonder what update might be causing it...

Submitted by Berk Ozler on
I was running code as I was reading through and only now saw your comment about replicability...I can at least now replicate the p= 0.058 with repeated commands using the same seed. Changing the seed (to 550) changed the p=0.059 with 1000 reps, so that's encouraging. Also, your Colombia example replicated perfectly...


Submitted by Simon H. Heß on

Great to see this is solved now. Thanks for the bug hunt and the testing. Please keep me posted if anything else does not work as intended or if questions arrise.

Submitted by Gareth on

Hi David is is possible to use the permute command in Stata for RI of from more complex specifications? Thanks

Submitted by Simon H. Heß on

-ritest- and -permute- are similar. In fact, the code is based on -permute-. Unfortunately -permute- is very restricted in terms of how it permutes the variables of interest, which limits the hypotheses you can test with it. For example if your treatment assignment was clustered, the treatment itself continuous, or the assignment procedure in any other way non-standard (e.g. repeated rerandomizations, branching treatment arms), you will soon hit the limits of what permute can do.

If you use -ritest- with nothing but the strata()-option you can just as well take -permute-.

Submitted by Gareth Roberts on

Thanks Simon

Submitted by Rahul Mehrotra on

Hi David, thanks for this useful blog-post. One overall question which I don't think you have discussed before (apologies if I missed it): what do you think about how randomization inference should be operationalized, so to speak, in the randomized experiments literature? Should we start reporting two sets of p-values in results tables, as we did with robust and cluster(-bootstrapped) after that approach gained prominence a decade ago? And if there are significant differences then interpret result as not robust?

I think the norm is still evolving here. If sample sizes are small (either in terms of units or clusters for clustered trials), then just reporting the randomization inference p-values seems a reasonable approach. But I've also seen both sets reported (because they are testing different nulls).

Add new comment