This is a very simple (and for once short) post, but since I have been asked this question quite a few times by people who are new to doing experiments, I figured it would be worth posting. It is also useful for non-experimental comparisons of a treatment and a control group.

Most papers with an experiment have a Table 1 where they compare the characteristics of the treatment and control group and test for balance. (See my paper with Miriam Bruhn for discussion of why this often isn’t a sensible thing to do). Ok, but let’s assume you are in a situation where you want to do this. One approach people use is just to do a series of t-tests comparing the means of the treatment and control group variable by variable. Or they might do this with regressions of the form:

X = a + b*Treat + e

And test whether b=0.

They might do this for 20 variables, find 1 or 2 are significant at the 5% level, and then say "this is about what we expect by chance, so it seems randomization has succeeded in generating balance". But what if we find 3 or 4 differences out of 20 to be significant? Or what if none are individually significant, but the differences are all in the same direction?

An alternative, or complementary, approach is to test for joint orthogonality. To do this, take your set of X variables (X1, X2, …, X20) and run the following:

Treat = a + b1*X1 + b2*X2 + b3*X3 + … + b20*X20 + u

And then test the joint hypothesis b1 = b2 = … = b20 = 0.

This can be run as a linear regression, with an F-test; or as a probit, with a chi-squared test.

That’s it, very simple. I think people get confused because the treatment variable jumps from being on the right-hand side for the single variable tests to being on the left-hand side for the joint orthogonality test.

Now what if you have multiple treatment groups? You can then run a multinomial logit or your other preferred specification and test for joint orthogonality within this framework, but I’ve not seen this done very often – typically I see people just compare each treatment separately to the control.

## Comments

Hi David,

Hansen and Bowers have a nice paper where they compare the performance of this test with a joint permutation test they propose for testing balance in clustered randomized trials: http://www.jstor.org/stable/27645895

As a bonus, the paper has a really clear explanation of the issues involved in testing for balance in clustered trials.

Thanks Doug! I should mention that even mild clustering (assignment to some households of size 1 and some of size 2) led the likelihood-ratio-based balance test to produce surprisingly misleading results. It is worth checking the size versus nominal level of the LR tests if the samples are small, if covariates are binary with proportions far from 50%, or if assignment is by cluster.

Thanks, David. Super simple and helpful--makes complete sense and eliminates the "well let's just forget about those 4 significantly different covariates" comments in an experiment paper.