Finally, a way to do easy randomization inference in Stata!


This page in:

Randomization inference has been increasingly recommended as a way of analyzing data from randomized experiments, especially in samples with a small number of observations, with clustered randomization, or with high leverage (see for example Alwyn Young’s paper, and the books by Imbens and Rubin, and Gerber and Green). However, one of the barriers to widespread usage in development economics has been that, to date, no simple commands for implementing this in Stata have been available, requiring authors to program from scratch.

This has now changed with a new command ritest written by Simon Hess, a PhD student who I met just over a week ago at Goethe University in Frankfurt. This command is extremely simple to use, so I thought I would introduce it and share some tips after playing around with it a little. The Stata journal article is also now out.

How do I get this command?
Simply type findit ritest in Stata.
[edit: that will get the version from the Stata journal. However, to get the most recent version with a couple of bug fixes noted below, type

net describe ritest, from(

How does the command work?

  1. Using it when you have randomized at the individual level within strata
Here is an example from my Nigeria business plan competition . I’ve put the do file and data files used in this blog post here.  The full data from Nigeria are in my list of datasets.

One of the outcomes I consider is truncated profits in the second round follow-up, for firms that were existing at the time . My stata command for doing this in the paper was:

areg s_prof_trunc assigntreat, a(strata) robust

where s_prof_trunc was the profit outcome, and assigntreat my treatment assignment variable, and I am controlling for randomization strata. The p-value from this regression is 0.051.

The code for using this new command is then:
ritest assigntreat _b[assigntreat], reps(1000) seed(125) strata(strata): areg s_prof_trunc assigntreat, a(strata) robust

where here I am telling it what the treatment variable is (assigntreat), and what coefficient I want the p-value for, telling it to do randomization inference within the randomization strata, and do use 1000 replications or draws.  The output looks like this:

You can see the p-value is 0.047 here, very similar to what I got from the regression. This is individual-level randomization, with a sample size of 497, so we might expect randomization inference to give similar results to regression.
[edit: after the latest update to ritest which now enables replicability, I get 0.058 as the p-value using the seed of 125 in my code.]

2. Individual-level randomization, but with multiple time periods and clustered standard errors
Here I have re-shaped the Nigeria data so that I am estimating a pooled effect over the second and third rounds of follow-up data together. The p-value when I use regression to do this is 0.096. The command is then very similar to that above, except now telling it also to take into account the clustered nature when doing the permutations, by using the cluster option (with uid being my individual identifier):

ritest assigntreat _b[assigntreat], reps(1000) strata(strata) cluster(uid) seed(124): areg prof_trunc assigntreat time3 if group

The p-value I get is 0.100, which is again very similar to the regression value I estimate. In contrast, if I had not specified the cluster() option, it would give me the artificially low p-value of 0.06.

3. Clustered randomization with stratification
A third example I tried comes from ongoing work in Colombia on shortening supply chains. Here we have firms in 63 market blocks, which were formed into 31 strata (pairs and one triplet), with clustered randomization occurring at this market block level. An outcome of interest here is how many days a week firms shop at the central market. The p-value I get in the regression with clustered standard errors is 0.024. Randomization inference is meant to make more of a difference with clustered randomizations with relatively few clusters, so I was curious to see what difference it makes here:
ritest b_treat _b[b_treat], reps(5000) cluster(b_block) strata(b_pair) seed(546): areg dayscorab b_treat b_dayscorab miss_b_dayscorab round2 round3, cluster(b_block) a(b_pair)

So here randomization inference does make a big difference – increasing my p-value from 0.024 to 0.106. So if you are doing clustered randomization, randomization inference will be particularly important to use.

Some observations and caveats
This is fantastically easy to use for the simple randomization cases, and should lead to many more people using randomization inference, at least as a check. Here are some observations from my initial uses to keep in mind:
  1. Despite the ability to set the seed for randomization, it currently does NOT seem to give replicable responses. For example, in case 1 above, I ran the code with 1000 replications and the same seed 5 times in a row, and got different p-values each time: 0.047, 0.054, 0.039, 0.053, and 0.057. These are all close, but obviously it is not ideal that the same code will produce different p-values each time it is run.[edit: Simon has now fixed this in the latest version available on his github]
  2. This raises the second issue of how many replications you should use. The default is 100 replications. This is likely to be way too low in many settings. Alwyn Young uses 10,000 draws, but says he finds little difference once he goes beyond 2000 draws. Once you are doing enough draws, point 1 will become less of an issue.
  3. Time taken: in my simple case above, it takes my laptop about 35 seconds to run this with 1000 draws, and 3 minutes to do 5000 draws. However, this is after I have dropped all other variables from my dataset. When I had my full dataset, it took more than 2 minutes to do 1000 draws. So if you want this to run relatively fast, drop all the other data from your dataset apart from what is needed for your regression.
  4. The command doesn’t allow for xi to be used in the regression you are running – so if your regression had been using this, you should remove this from your command. Jason Kerwin notes that you can still just put e.g. i.groupvariable to add group variable fixed effects, without using xi.
  5. Jason Kerwin notes to me that if your strata variable is a string rather than numeric, then the code ignores your strata and will give incorrect standard errors. So make sure any strata variables are numeric. [edit: Simon has now fixed this in his latest version, available on his github]
  6. More complicated randomizations such as two-level clustered randomization
When the randomization becomes more complicated, you need to do a little more work. As an example, consider my Kenya business training paper, in which there is a double randomization: first markets are randomized into treatment or control markets within market strata, and then within the treated markets, firms are individually randomized to be assigned to training or not within individual strata. The command handles this by either calling on a program you write that shows how this randomization is done, or by using a file of valid permutations that you create. This is still on my to do list to play around with.

7. Multiple treatments:
The standard textbook explanation of randomization inference always seems to involve just a single binary treatment, so that a unit is either treatment or control. Then under the sharp null that all treatment effects are zero, the outcomes for each unit should be the same whether or not they are assigned to treatment or control, and so we can just estimate the treatment effect under all the possible permutations and see in how many of these permutations do we get a coefficient at least as large as the one in the actual data.
However, suppose now that we assign firms to control (T=0), traditional business training (T=1), or personal initiative training (T=2) as in this recent paper. Then we have at least three hypotheses we might want to test: no effect of traditional training, no effect of personal initiative training, and no effect of either training. It seems to me that we then need to use three separate sets of permutations to test these, and that you will need to program up each of these as a file of permutations for the program to then call on. For example, if I had 10 observations and my treatment assignment variable has values (0, 1, 0, 2, 1, 2, 0, 1, 2, 0) then to test no effect of traditional training, I would want to only permute the 0s and 1s with one another, without changing the assignments of those getting personal initiative training. I have yet to program this.

If anyone has examples they want to share of using this command to do randomization inference with multiple treatments or multi-level randomizations, please link to them in the comments.

For further reading:
Jason Kerwin also blogged last week about the difference between randomization inference and bootstrapping, and this command.

Alwyn Young has also developed a Stata command randcmd.ado available on his webpage, which I have yet to play around with.



David McKenzie

Lead Economist, Development Research Group, World Bank

Berk Ozler
October 03, 2017

Hi David,
When I rerun your code from Nigeria with the latest ritest from the address you provided, I get p=0.058 instead of 0.047, which is a decent difference. I wonder what update might be causing it...


Berk Ozler
October 03, 2017

I was running code as I was reading through and only now saw your comment about replicability...I can at least now replicate the p= 0.058 with repeated commands using the same seed. Changing the seed (to 550) changed the p=0.059 with 1000 reps, so that's encouraging. Also, your Colombia example replicated perfectly...

Simon H. Heß
October 04, 2017

Great to see this is solved now. Thanks for the bug hunt and the testing. Please keep me posted if anything else does not work as intended or if questions arrise.

October 05, 2017

Hi David is is possible to use the permute command in Stata for RI of from more complex specifications? Thanks

Simon H. Heß
October 10, 2017

-ritest- and -permute- are similar. In fact, the code is based on -permute-. Unfortunately -permute- is very restricted in terms of how it permutes the variables of interest, which limits the hypotheses you can test with it. For example if your treatment assignment was clustered, the treatment itself continuous, or the assignment procedure in any other way non-standard (e.g. repeated rerandomizations, branching treatment arms), you will soon hit the limits of what permute can do.
If you use -ritest- with nothing but the strata()-option you can just as well take -permute-.

Gareth Roberts
October 11, 2017

Thanks Simon

Rahul Mehrotra
October 08, 2017

Hi David, thanks for this useful blog-post. One overall question which I don't think you have discussed before (apologies if I missed it): what do you think about how randomization inference should be operationalized, so to speak, in the randomized experiments literature? Should we start reporting two sets of p-values in results tables, as we did with robust and cluster(-bootstrapped) after that approach gained prominence a decade ago? And if there are significant differences then interpret result as not robust?

David McKenzie
October 09, 2017

I think the norm is still evolving here. If sample sizes are small (either in terms of units or clusters for clustered trials), then just reporting the randomization inference p-values seems a reasonable approach. But I've also seen both sets reported (because they are testing different nulls).

October 24, 2017

Hi David,
thanks for posting this. It is really helpful and super important!
One thing that currently keeps me busy is the randomization inference with a Difference in Difference estimator.
Idealy I would like to present the single difference estimator, the single difference estimator controlling for previous outcomes (ancova), and the DiD estimator. However, for the latter, of course the treatment group variable changes from 0 to 1 when being in the time during the treatment. This then leads to the error: "b_treat doesnt seem to be constant within clusters".
Any ideas?

Simon H. Heß
October 25, 2017

Dear Timo,
It depends a bit on how you specify your regression exactly and I have to guess here because I can't see your code. But your problem seems to be that you're instructing -ritest- to permute the treatment-period-interaction---instead of the treatment-assignment---which you shouldn't do. This interaction is of course not constant within clusters. If that's the case reformulating your Stata-statement following this example should solve your problem:
ritest treatment_village _b[c.treatment_village#c.period], cluster(village): reg y c.treatment_village#c.period c.treatment_village c.period

October 25, 2017

Hi Simon,
thank you so much for your reply.
I tried your code but it gives me the following error message:
"invalid matrix stripe;
error in expression: _b[c.treatment_village##c.period]"
So I tried to generate the treatment variable:
gen treatment=treatment_village*treatment_period
And run:
ritest treatment_village _b[treatment], cluster(village): xtreg y i.treatment i.period, fe cluster(village)
Resulting however in a p-value = 1 which seems not right at all.

Simon H. Heß
October 26, 2017

Dear Timo,
Generating the interaction manually first and then running ritest won't work, because ritest doesn't know it has to recreate the interatcion after each permutation of treatment_village.
You error is most likely due to the double ## you use, as _b[c.treatment_village##c.period] does not exist. The parameter you seem to be interested in is accessed via _b[c.treatment_village#c.period].

May 23, 2019

Hey Simon,
Thanks for putting this together as it is really helpful!
I have a similar question. I'm working with school data that is structured at the student x year level for ten years. I have a treatment that was introduced to a number of schools at different time points (ex: some got it in 2010, some got it in 2015), with the treatment variable as dichotomous (1 during treatment years, 0 during non-treatment years and schools that never received the treatment). I'm currently trying to run an ritest with clustering at the school level. The code looks something like:
ritest teatment _b [treatment], cluster (school): reg outcome treatment year $controls, robust cluster (school)
I keep getting the error that my treatment isn't constant within the clusters (which it shouldn't be), but am not sure how to resolve this issue. Any ideas about what could be going wrong?

May 23, 2019

Hi! I wondered if you ever solved this? I have a similar problem where I have aggregate state-level data from 2000-2012 & a policy that was implemented in some states in 2007 and others in 2009. I have a similar dichotomous variable - treatment - which is 1 during treatment years in treated states, and 0 otherwise). The code for my regression is:
xtreg outcome treatment i.year i.state, vce(robust)
For the ritest, I understand I need an interaction variable so that I'm evaluating _b[c.treatment#c.postperiod]. But my problem is, given the postperiod differs between states (sometimes it is 2007+ and sometimes 2009+) & we don't know what it would have been for the control countries, how do I model this? I think I need to program a two-stage randomization process (randomized to control/treatment & randomized to 2007/2009) - but I am struggling with writing this program. Specifically, I can't tie the second randomization (randomization of years in the postperiod variable) into the program properly so that this part of randomization is implemented.
Thanks for any advice!

December 04, 2017

This is an excellent resource and write-up. Thanks David and Jason for blogging, and thanks Simon for writing the code.
I have also been struggling with how to test equality of coefficients with two treatments, using randomization inference.
I think the p-value should be constructed by calculating the proportion of times where the *difference* in observed (re-randomized) treatment impacts is larger than the difference in the true treatment impacts.
For example, take a dataset where “b1_estimated” and “b2_estimated” are the stored coefficients of the two treatment dummies, after running the main regression thousands of times, each time randomly re-assigning clusters to treatment. The code to calculate the randomization inference p-value would be:
count if true_difference < abs(b2_estimated – b1_estimated)
gen p1 = `r(N)' / obs
where “true_difference” is a constant of the true difference in treatment impacts; and "obs" is the number of repetitions. With this code I get a very similar p-value to the simple t-test of equality of coefficients.
But I am not sure if this can be done with the ritest command. At first I thought I could take a short-cut by setting one of the treatment arms as omitted category and throw in a dummy for the control (e.g. gen T0 = treat == 0)
E.g. ritest assigntreat _b[T2], reps(1000) strata(strata) cluster(uid) seed(124): reg y T0 T2 $controls, cluster(uid)
But this does not take into account the statistical uncertainty in assignment of treatment to T1. With this method one calculates:
count if abs(true_difference) < abs(b2_estimated)
gen p2 = `r(N)' / obs
But this under-estimates the p-value. In my data, p2>p1. (Intuitively, the variance of the difference between two normally distributed random variables is higher than the variance of each of the random variable.)
Let me know if you think this is the correct approach.

Simon H. Heß
December 04, 2017

Hi Jacobus,
What the right approach is depends very much on your original randomization, about which you did not provide all details. I can however give you some general hints which might help you compute what you want.
(1) if the two treatments (and control) are mutually exclusive, i.e. they can be described by a single treatment variable taking on the values 0, 1, or 2, you might be able to run:
ritest treat (_b[1.treat]-_b[2.treat]), cluster(cid) strata(sid): reg y x i.treatment
to obtain what you are looking for.
(2) If your treatments are overlapping, it is only slightly more complex to let the re-randomization follow the original procedure. For this you can use the samplingprogram()-option. A user-defined rerandomization program is allowed to alter more than just a single treatment-indicator variable and can thus rerandomize your data in the way you want. You can than still specify "(_b[1.treat]-_b[2.treat])" as the statistic of interest. You can always double-check what ritest does using the saveresampling()-option.
Please let me know if I misunderstood your question or if you have any further questions.

April 06, 2018

Hi !
Thanks for the presentation of the package. I have a question on the usage of ritest with continuous variables.
I am using it in the context of panel data ; the scale of observation is municipality (N=30 000) and the time dimension is years (T=26).
I regress the average level of income by municipality on several weather conditions and I would like to check the robustness of the results by using Randomization Inference.
I therefore tried to estimate it with: ritest mean_temp _b[mean_temp], reps(1000) strata(municipality_id) cluster(region) : xtreg income mean_temperature i.year, fe cluster(region) and it returned:
mean_temp doesnt seem to be constant within clusters
I nevertheless want to allow for correlations at the regional level (or even higher) because two close municipalities cannot be considered as having independent weather conditions, ie: I would like ritest to assign to each commune in one cluster the same year of observation.
Is there a simple way of doing it using your package or should I create my own permutation programme?
Thanks a lot for your help,

Simon H. Heß
April 08, 2018

Hi Nicolas,
You used "ritest ..., ... cluster(municipality)", which lets ritest think that treatment (weather) has been assigned at that level. For two reasons this is not what you seem to want to be doing: (1) you argued for dependence of weather across municipalities, (2) your data has a panel structure, so you have multiple observations per municipality that have different [by assumption independent] weather conditions.
I assume you want to use resample the weather variables, ignoring that there might be inter-temporal dependence but leaving the spacial correlation structure as is. That could be achieved by shuffling the weather data across years. The easiest way to do so would be via your own permutation program. What the easiest approach for that would be depends a lot on your data structure and how exactly you want the resampling, but it could be something like this:
program randomyearmerge
 syntax, * //this "swallows" whatever ritest passes on to the program
 sort year municipality, stable //to ensure reproducibility
 //draw a random year
 by year: gen randomyear = round(1980+runiform()*20) if _n==1
 //use the same random year for all municipalities
 by year: replace randomyear = randomyear[1]
 merge m:1 year municipility using weatherdata.dta, nogen //load the weatherdata
. ritest mean_temp _b[mean_temp] , samplingprogram(randomyearmerge): xtreg ...
(obviously you'll have to adapt the code to your needs and I can give no guarantee that it does what you want it to do)

Simon Heß
October 16, 2018

In the past months I received several emails asking about how to implement the pairwise tests for experiments with more than two treatment groups (Point 7 above). I thus added a new option to -ritest- that makes this easy.
Using the option fixlevels(list of levels), one can restrict the re-randomization to hold all observations fixed that have the specified values for the treatment variable. In terms of the example David gave in this post, this could be simply done by specifying:
ritest treatment (_b[0.treatment]-_b[1.treatment]), cluster(clustervar) strata(stratavar) fixlevels(2): reg y i.treatment
The way the option is implemented internally is fairly straightforward, by creating a temporary stratum for each of the specified levels during the re-randomization (Jason Kerwins's idea). The new version of -ritest- can be obtained from my GitHub. Via Stata:
net describe ritest, from(
Please don't hesitate to contact me for bug reports or if you have questions.

November 06, 2020

David, thank you, so much for your blogs!! there are incredible useful. I have a question. I am analyzing an experiment and I want to get the pval for the intention to treat but also for the second stage, using the treatment as an instrument. Ritest pval are very similar to the OLS pval in the IIT estimates but completely different in the IV estimates. Does ritest work well with ivreg/second stage estimates? thanks!!

Simon Heß
November 08, 2020

Hi Patti,

I'm the author of ritest and David was so kind to notify me about your question. Technically, ritest works fine with ivreg etc. Meaning: it is able to compare sample realizations of a certain statistic against the distribution of this under permuted treatment. Econometrically however, it is not obvious that this provides a test for what you intend to test. At the risk of oversimplifying things, this is what happens: by resampling treatment in a normal OLS/reduced form/ITT, you're imposing the null that treatment does not matter, which is usually the null that you want to test. Contrarily, in the IV context, resampling the instrument imposes the null that the instrument is entirely irrelevant (and exogenous), which is usually not what you want test. So you don't really learn much from a rejection here. I.e., may not be asking/answering the same question as you would with the standard IV p-value.

I've seen people refer this paper and presentation by Paul Rosenbaum and Guido Imbens in the context of randomization and instrumental variables (,…), which seems to contain all building blocks of a simple, but more useful test. In spite of its simplicity and benefits, I have neither seen applied researchers use this regularly nor have I worked with this myself. I only recently started looking at this, but I find this interesting, so if anyone here is interested in working on this (or knows someone who is), please reach out!