
Finally, a way to do easy randomization inference in Stata!


Randomization inference has been increasingly recommended as a way of analyzing data from randomized experiments, especially in samples with a small number of observations, with clustered randomization, or with high leverage (see, for example, Alwyn Young's paper, and the books by Imbens and Rubin, and Gerber and Green). However, one of the barriers to widespread usage in development economics has been that, to date, no simple Stata commands for implementing it have been available, so authors have had to program it from scratch.

This has now changed with a new command, ritest, written by Simon Hess, a PhD student who I met just over a week ago at Goethe University in Frankfurt. This command is extremely simple to use, so I thought I would introduce it and share some tips after playing around with it a little. The Stata Journal article is also now out.

How do I get this command?
Simply type findit ritest in Stata.
[edit: that will get the version from the Stata Journal. However, to get the most recent version, with a couple of bug fixes noted below, type the following.]

net describe ritest, from(https://raw.githubusercontent.com/simonheb/ritest/master/)
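
If you then want to install that development version in one step, you can use Stata's net install (check Simon's README at that address for his preferred installation steps):

net install ritest, from(https://raw.githubusercontent.com/simonheb/ritest/master/) replace

where the replace option overwrites the Stata Journal version if you already have it installed.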

How does the command work?

1. Using it when you have randomized at the individual level within strata
Here is an example from my Nigeria business plan competition. I've put the do file and data files used in this blog post here. The full data from Nigeria are in my list of datasets.

One of the outcomes I consider is truncated profits in the second-round follow-up, for firms that were in existence at the time. My Stata command for doing this in the paper was:

areg s_prof_trunc assigntreat, a(strata) robust

where s_prof_trunc is the profit outcome, assigntreat is my treatment assignment variable, and I am controlling for randomization strata. The p-value from this regression is 0.051.

The code for using this new command is then:
ritest assigntreat _b[assigntreat], reps(1000) seed(125) strata(strata): areg s_prof_trunc assigntreat, a(strata) robust

where here I am telling it what the treatment variable is (assigntreat), which coefficient I want the p-value for (_b[assigntreat]), to do randomization inference within the randomization strata, and to use 1,000 replications or draws. The output looks like this:
[ritest output]
You can see the p-value is 0.047 here, very similar to what I got from the regression. This is individual-level randomization, with a sample size of 497, so we might expect randomization inference to give similar results to regression.
[edit: after the latest update to ritest which now enables replicability, I get 0.058 as the p-value using the seed of 125 in my code.]
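
To see what the command is doing under the hood, here is a minimal by-hand sketch of the same permutation test (a sketch only, assuming a fixed number of firms is treated within each stratum; note that only the coefficient matters for the permutation p-value, so the standard errors in the re-estimations play no role):

* by-hand randomization inference for example 1 (illustration only)
areg s_prof_trunc assigntreat, a(strata) robust
local b_obs = _b[assigntreat]              // coefficient in the actual data
set seed 125
local reps = 1000
local extreme = 0
egen nt = total(assigntreat), by(strata)   // treated count per stratum, fixed by design
forvalues r = 1/`reps' {
    quietly {
        capture drop u t_perm
        gen double u = runiform()
        sort strata u
        by strata: gen byte t_perm = (_n <= nt)   // re-draw which firms are treated, within strata
        areg s_prof_trunc t_perm, a(strata) robust
        if abs(_b[t_perm]) >= abs(`b_obs') local ++extreme
    }
}
display "permutation p-value = " `extreme'/`reps'

The p-value is simply the share of the 1,000 re-draws in which the placebo coefficient is at least as large in absolute value as the one in the actual data.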

2. Individual-level randomization, but with multiple time periods and clustered standard errors
Here I have reshaped the Nigeria data so that I am estimating a pooled effect over the second and third rounds of follow-up data together. The p-value when I use regression to do this is 0.096. The command is then very similar to that above, except that I now also tell it to take the clustering into account when doing the permutations, using the cluster option (with uid being my individual identifier), so that treatment is permuted across individuals rather than across individual observations:

ritest assigntreat _b[assigntreat], reps(1000) strata(strata) cluster(uid) seed(124): areg prof_trunc assigntreat time3 if group<=2 & existing==1, a(strata) robust cluster(uid)

The p-value I get is 0.100, which is again very similar to the regression value I estimate. In contrast, if I had not specified the cluster() option, it would give me the artificially low p-value of 0.06.
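
To see what the cluster() option changes, here is a sketch of a single permuted draw under clustering (assuming each individual, uid, belongs to a single stratum): treatment is re-drawn across individuals, and every round of data for an individual then inherits that one draw:

* one clustered permutation draw (sketch): permute at the uid level, not the row level
preserve
keep uid strata assigntreat
duplicates drop uid, force                 // one row per individual (the cluster)
egen nt = total(assigntreat), by(strata)   // treated count per stratum
gen double u = runiform()
sort strata u
by strata: gen byte t_perm = (_n <= nt)    // re-draw treatment across individuals
keep uid t_perm
tempfile perm
save `perm'
restore
merge m:1 uid using `perm', nogenerate     // both follow-up rounds for a firm share one draw
* then re-estimate the regression with t_perm in place of assigntreat and collect the coefficient

Shuffling observations independently instead would treat the two rounds for the same firm as two separate assignments, which is why the unclustered p-value comes out artificially low.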

3. Clustered randomization with stratification
A third example I tried comes from ongoing work in Colombia on shortening supply chains. Here we have firms in 63 market blocks, which were formed into 31 strata (pairs and one triplet), with clustered randomization occurring at this market block level. An outcome of interest here is how many days a week firms shop at the central market. The p-value I get in the regression with clustered standard errors is 0.024. Randomization inference is meant to make more of a difference with clustered randomizations with relatively few clusters, so I was curious to see what difference it makes here:
ritest b_treat _b[b_treat], reps(5000) cluster(b_block) strata(b_pair) seed(546): areg dayscorab b_treat b_dayscorab miss_b_dayscorab round2 round3, cluster(b_block) a(b_pair)
[ritest output]
So here randomization inference does make a big difference – increasing my p-value from 0.024 to 0.106. So if you are doing clustered randomization, randomization inference will be particularly important to use.

Some observations and caveats
This is fantastically easy to use for the simple randomization cases, and should lead to many more people using randomization inference, at least as a check. Here are some observations from my initial uses to keep in mind:
  1. Despite the ability to set the seed for randomization, it currently does NOT seem to give replicable results. For example, in case 1 above, I ran the code with 1,000 replications and the same seed 5 times in a row, and got different p-values each time: 0.047, 0.054, 0.039, 0.053, and 0.057. These are all close, but obviously it is not ideal that the same code will produce different p-values each time it is run. [edit: Simon has now fixed this in the latest version available on his github https://github.com/simonheb/ritest/blob/master/README.md]
  2. This raises the second issue of how many replications you should use. The default is 100 replications, which is likely to be way too low in many settings. Alwyn Young uses 10,000 draws, but says he finds little difference once he goes beyond 2,000 draws. Once you are doing enough draws, point 1 becomes less of an issue.
  3. Time taken: in my simple case above, it takes my laptop about 35 seconds to run this with 1,000 draws, and 3 minutes to do 5,000 draws. However, this is after I dropped all other variables from my dataset. When I used my full dataset, it took more than 2 minutes to do 1,000 draws. So if you want this to run relatively fast, drop all the data from your dataset apart from what is needed for your regression.
  4. The command doesn't allow xi to be used in the regression you are running, so if your regression had been using this, you should remove it from your command. Jason Kerwin notes that you can still just put, e.g., i.groupvariable to add group fixed effects without using xi.
  5. Jason Kerwin notes to me that if your strata variable is a string rather than numeric, then the code ignores your strata and will give incorrect p-values. So make sure any strata variables are numeric. [edit: Simon has now fixed this in his latest version, available on his github https://github.com/simonheb/ritest/blob/master/README.md]
  6. More complicated randomizations such as two-level clustered randomization
When the randomization becomes more complicated, you need to do a little more work. As an example, consider my Kenya business training paper, in which there is a double randomization: first, markets are randomized into treatment or control markets within market strata, and then, within the treated markets, firms are individually randomized to be assigned to training or not within individual strata. The command handles this by either calling on a program you write that shows how this randomization is done, or by using a file of valid permutations that you create. This is still on my to-do list to play around with, but a sketch of the basic mechanics follows.
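
To give a flavor of the first route, here is a hedged sketch of wiring up a sampling program. The program below merely re-draws a binary treatment within strata, which the built-in strata() option already handles; a two-level design would chain a market-stage and a firm-stage re-draw inside one such program. I am assuming ritest calls the program before each replication and expects it to overwrite the treatment variable in memory; check the ritest help file for the exact contract:

capture program drop mydraw
program define mydraw
    * re-draws assigntreat within strata, holding treated counts fixed
    syntax [anything] [, *]                // tolerate whatever ritest passes in
    quietly {
        tempvar u
        egen nt = total(assigntreat), by(strata)
        gen double `u' = runiform()
        sort strata `u'
        by strata: replace assigntreat = (_n <= nt)
        drop nt
    }
end

ritest assigntreat _b[assigntreat], reps(1000) seed(125) samplingprogram(mydraw): areg s_prof_trunc assigntreat, a(strata) robust

As Simon notes in the comments below, the saveresampling() option saves the permuted assignments, so you can verify that a program like this is re-randomizing the way you intend.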

7. Multiple treatments:
The standard textbook explanation of randomization inference always seems to involve just a single binary treatment, so that a unit is either treatment or control. Under the sharp null that all treatment effects are zero, the outcome for each unit should be the same whether it is assigned to treatment or control, so we can estimate the treatment effect under all the possible permutations and see in how many of these permutations we get a coefficient at least as large (in absolute value) as the one in the actual data.
However, suppose now that we assign firms to control (T=0), traditional business training (T=1), or personal initiative training (T=2), as in this recent paper. Then we have at least three hypotheses we might want to test: no effect of traditional training, no effect of personal initiative training, and no effect of either training. It seems to me that we then need three separate sets of permutations to test these, and that you will need to program up each of them, either as a file of permutations or as a sampling program for ritest to call on. For example, if I had 10 observations and my treatment assignment variable had values (0, 1, 0, 2, 1, 2, 0, 1, 2, 0), then to test no effect of traditional training I would want to permute only the 0s and 1s with one another, without changing the assignments of those getting personal initiative training. I have yet to program this, but a sketch of one approach follows.
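
Here is a hedged sketch of that restricted permutation via the samplingprogram() option, ignoring strata for clarity (a stratified version would re-deal within strata, as in the sketches above); the variable names treat and y are placeholders:

capture program drop shuffle01
program define shuffle01
    * shuffles treatment among the T=0 and T=1 observations only,
    * leaving every T=2 assignment untouched
    syntax [anything] [, *]
    quietly {
        tempvar u
        count if treat == 1
        local n1 = r(N)                     // number of T=1 slots to re-deal
        count if inlist(treat, 0, 1)
        local n01 = r(N)                    // size of the 0/1 subsample
        gen double `u' = runiform() if inlist(treat, 0, 1)
        sort `u'                            // 0/1 obs first, in random order; T=2 (missing u) last
        replace treat = 0 in 1/`n01'        // deal the same numbers of 0s and 1s
        replace treat = 1 in 1/`n1'         //   back out in the new random order
    }
end

* test of no effect of traditional training (T=1 versus T=0):
ritest treat _b[1.treat], reps(1000) seed(125) samplingprogram(shuffle01): reg y i.treat

The test of no effect of personal initiative training would use an analogous program permuting only the 0s and 2s, and the test of no effect of either training would permute all three labels together.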

If anyone has examples they want to share of using this command to do randomization inference with multiple treatments or multi-level randomizations, please link to them in the comments.

For further reading:
Jason Kerwin also blogged last week about the difference between randomization inference and bootstrapping, and this command.

Alwyn Young has also developed a Stata command randcmd.ado available on his webpage, which I have yet to play around with.


Comments

Submitted by Berk Ozler on
Hi David,

When I rerun your code from Nigeria with the latest ritest from the address you provided, I get p=0.058 instead of 0.047, which is a decent difference. I wonder what update might be causing it...

Submitted by Berk Ozler on
I was running code as I was reading through and only now saw your comment about replicability... I can at least now replicate the p = 0.058 with repeated commands using the same seed. Changing the seed (to 550) changed the p-value to 0.059 with 1,000 reps, so that's encouraging. Also, your Colombia example replicated perfectly...

Berk.

Submitted by Simon H. Heß on

Great to see this is solved now. Thanks for the bug hunt and the testing. Please keep me posted if anything else does not work as intended or if questions arise.

Submitted by Gareth on

Hi David, is it possible to use the permute command in Stata for RI from more complex specifications? Thanks

Submitted by Simon H. Heß on

-ritest- and -permute- are similar. In fact, the code is based on -permute-. Unfortunately, -permute- is very restricted in terms of how it permutes the variables of interest, which limits the hypotheses you can test with it. For example, if your treatment assignment was clustered, the treatment itself continuous, or the assignment procedure in any other way non-standard (e.g. repeated rerandomizations, branching treatment arms), you will soon hit the limits of what -permute- can do.

If you use -ritest- with nothing but the strata() option, you can just as well use -permute-.

Submitted by Gareth Roberts on

Thanks Simon

Submitted by Rahul Mehrotra on

Hi David, thanks for this useful blog post. One overall question which I don't think you have discussed before (apologies if I missed it): what do you think about how randomization inference should be operationalized, so to speak, in the randomized experiments literature? Should we start reporting two sets of p-values in results tables, as we did with robust and cluster(-bootstrapped) standard errors after that approach gained prominence a decade ago? And if there are significant differences, should we then interpret the result as not robust?

I think the norm is still evolving here. If sample sizes are small (either in terms of units or clusters for clustered trials), then just reporting the randomization inference p-values seems a reasonable approach. But I've also seen both sets reported (because they are testing different nulls).

Submitted by Timo on

Hi David,

Thanks for posting this. It is really helpful and super important!
One thing that currently keeps me busy is randomization inference with a difference-in-differences estimator.
Ideally I would like to present the single-difference estimator, the single-difference estimator controlling for previous outcomes (ANCOVA), and the DiD estimator. However, for the latter, the treatment variable of course switches from 0 to 1 for treated units in the period during treatment. This then leads to the error: "b_treat doesnt seem to be constant within clusters".
Any ideas?

Submitted by Simon H. Heß on

Dear Timo,

It depends a bit on how exactly you specify your regression, and I have to guess here because I can't see your code. But your problem seems to be that you're instructing -ritest- to permute the treatment-period interaction, instead of the treatment assignment, which you shouldn't do. This interaction is of course not constant within clusters. If that's the case, reformulating your Stata statement following this example should solve your problem:

ritest treatment_village _b[c.treatment_village#c.period], cluster(village): reg y c.treatment_village#c.period c.treatment_village c.period

Submitted by Timo on

Hi Simon,
thank you so much for your reply.
I tried your code but it gives me the following error message:

"invalid matrix stripe;
c.treatment_village##c.period
error in expression: _b[c.treatment_village##c.period]"

So I tried to generate the treatment variable:

gen treatment=treatment_village*treatment_period

And run:
ritest treatment_village _b[treatment], cluster(village): xtreg y i.treatment i.period, fe cluster(village)

This, however, results in a p-value of 1, which does not seem right at all.

Submitted by Simon H. Heß on

Dear Timo,

Generating the interaction manually first and then running ritest won't work, because ritest doesn't know it has to recreate the interaction after each permutation of treatment_village.

Your error is most likely due to the double ## you use, as _b[c.treatment_village##c.period] does not exist. The parameter you seem to be interested in is accessed via _b[c.treatment_village#c.period].

Best,

Simon

Submitted by Jacobus on

This is an excellent resource and write-up. Thanks David and Jason for blogging, and thanks Simon for writing the code.

I have also been struggling with how to test equality of coefficients with two treatments, using randomization inference.

I think the p-value should be constructed by calculating the proportion of times where the *difference* in observed (re-randomized) treatment impacts is larger than the difference in the true treatment impacts.

For example, take a dataset where “b1_estimated” and “b2_estimated” are the stored coefficients of the two treatment dummies, after running the main regression thousands of times, each time randomly re-assigning clusters to treatment. The code to calculate the randomization inference p-value would be:

count if true_difference < abs(b2_estimated - b1_estimated)
gen p1 = `r(N)' / obs

where “true_difference” is a constant equal to the true difference in treatment impacts, and "obs" is the number of repetitions. With this code I get a very similar p-value to the simple t-test of equality of coefficients.

But I am not sure if this can be done with the ritest command. At first I thought I could take a short-cut by setting one of the treatment arms as the omitted category and throwing in a dummy for the control (e.g. gen T0 = treat == 0).

E.g. ritest assigntreat _b[T2], reps(1000) strata(strata) cluster(uid) seed(124): reg y T0 T2 $controls, cluster(uid)

But this does not take into account the statistical uncertainty in assignment of treatment to T1. With this method one calculates:

count if abs(true_difference) < abs(b2_estimated)
gen p2 = `r(N)' / obs

But this under-estimates the p-value. In my data, p2>p1. (Intuitively, the variance of the difference between two normally distributed random variables is higher than the variance of each of the random variables.)

Let me know if you think this is the correct approach.

Submitted by Simon H. Heß on

Hi Jacobus,

What the right approach is depends very much on your original randomization, about which you did not provide all the details. I can, however, give you some general hints which might help you compute what you want.

(1) If the two treatments (and control) are mutually exclusive, i.e. they can be described by a single treatment variable taking on the values 0, 1, or 2, you might be able to run:

ritest treat (_b[1.treat]-_b[2.treat]), cluster(cid) strata(sid): reg y x i.treat

to obtain what you are looking for.

(2) If your treatments are overlapping, it is only slightly more complex to let the re-randomization follow the original procedure. For this you can use the samplingprogram() option. A user-defined rerandomization program is allowed to alter more than just a single treatment-indicator variable and can thus rerandomize your data in the way you want. You can then still specify "(_b[1.treat]-_b[2.treat])" as the statistic of interest. You can always double-check what ritest does using the saveresampling() option.

Please let me know if I misunderstood your question or if you have any further questions.
