This is a very simple (and for once short) post, but since I have been asked this question quite a few times by people who are new to doing experiments, I figured it would be worth posting. It is also useful for non-experimental comparisons of a treatment and a control group.

Most papers with an experiment have a Table 1 where they compare the characteristics of the treatment and control groups and test for balance. (See my paper with Miriam Bruhn for discussion of why this often isn’t a sensible thing to do.) OK, but let’s assume you are in a situation where you want to do this. One approach people use is just to do a series of t-tests comparing the means of the treatment and control groups variable by variable. Or they might do this with regressions of the form:

X = a + b*Treat +e

And test whether b=0.
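
A minimal sketch of this single-variable test, in Python with simulated data. The covariate name `age`, the sample size, and the helper `balance_test` are illustrative assumptions, and the standard error is the classical homoskedastic one. With a 0/1 treatment dummy, the coefficient b is just the difference in means between the two groups, which the last lines check.

```python
import math
import random

def balance_test(x, treat):
    """OLS of x on a constant and a 0/1 treatment dummy.

    Returns (b, t_stat), where b is the coefficient on treat.
    Uses the classical (homoskedastic) standard error.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_t = sum(treat) / n
    sxt = sum((t - mean_t) * (xi - mean_x) for xi, t in zip(x, treat))
    stt = sum((t - mean_t) ** 2 for t in treat)
    b = sxt / stt
    a = mean_x - b * mean_t
    rss = sum((xi - a - b * t) ** 2 for xi, t in zip(x, treat))
    se = math.sqrt(rss / (n - 2) / stt)
    return b, b / se

# Simulated data: a hypothetical covariate "age", treatment assigned at random.
rng = random.Random(0)
treat = [rng.randint(0, 1) for _ in range(200)]
age = [rng.gauss(35, 10) for _ in range(200)]

b, t_stat = balance_test(age, treat)
n_treated = sum(treat)
mean_treated = sum(a for a, t in zip(age, treat) if t == 1) / n_treated
mean_control = sum(a for a, t in zip(age, treat) if t == 0) / (len(treat) - n_treated)
print(f"b = {b:.3f}, difference in means = {mean_treated - mean_control:.3f}, t = {t_stat:.2f}")
```

In practice you would run this once per baseline variable, which is what produces the 20 separate tests discussed next.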

They might do this for 20 variables, find 1 or 2 are significant at the 5% level, and then say “this is about what we expect by chance, so it seems randomization has succeeded in generating balance”. But what if we find 3 or 4 differences out of 20 to be significant? Or what if none are individually significant, but the differences are all in the same direction?
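
To put rough numbers on “what we expect by chance”, here is a back-of-the-envelope calculation in Python. It assumes the 20 tests are independent, which is a simplifying assumption that real baseline covariates usually violate, so treat the output as a guide and not an exact answer. The function name `prob_at_least` is made up for this sketch.

```python
from math import comb

def prob_at_least(k, n=20, alpha=0.05):
    """P(at least k of n independent tests reject at level alpha) when all nulls are true."""
    return sum(comb(n, j) * alpha**j * (1 - alpha) ** (n - j) for j in range(k, n + 1))

# Expected number of "significant" differences under the null.
expected_rejections = 20 * 0.05

for k in (1, 2, 3, 4):
    print(f"P(at least {k} of 20 significant) = {prob_at_least(k):.3f}")
```

Under independence, the expected number of rejections is 1, and the chance of seeing 3 or more is roughly 8%, and of 4 or more roughly 2%. So 3 or 4 significant differences are not impossible by chance, but they are unusual enough that counting stars is an unsatisfying way to decide.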

An alternative, or complementary, approach is to test for joint orthogonality. To do this, take your set of X variables (X1, X2, …, X20) and run the following:

Treat = a + b1*X1 + b2*X2 + b3*X3 + ….+b20*X20 +u

And then test the joint hypothesis b1=b2=b3=…=b20=0

This can be run as a linear regression, with an F-test; or as a probit, with a chi-squared test.
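
A sketch of the linear-regression version, using only the Python standard library and simulated data. The F statistic is the usual one for the joint hypothesis. One substitution to flag clearly: because the standard library has no F distribution, the p-value below comes from re-shuffling the treatment labels (a permutation p-value) and not from the F table, and that shortcut assumes simple individual-level randomization with a fixed number treated. The helper names, the four covariates, and the sample size are all assumptions for illustration.

```python
import random

def solve(a, b):
    """Solve the linear system a @ z = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * m
    for i in reversed(range(m)):
        z[i] = (aug[i][m] - sum(aug[i][j] * z[j] for j in range(i + 1, m))) / aug[i][i]
    return z

def joint_f_stat(treat, covariates):
    """F statistic for b1 = ... = bk = 0 in Treat = a + b1*X1 + ... + bk*Xk + u."""
    n, k = len(treat), len(covariates[0])
    rows = [[1.0] + list(x) for x in covariates]  # prepend the constant
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k + 1)] for i in range(k + 1)]
    xty = [sum(r[i] * t for r, t in zip(rows, treat)) for i in range(k + 1)]
    beta = solve(xtx, xty)
    rss = sum((t - sum(bj * rj for bj, rj in zip(beta, r))) ** 2 for r, t in zip(rows, treat))
    mean_t = sum(treat) / n
    tss = sum((t - mean_t) ** 2 for t in treat)
    return ((tss - rss) / k) / (rss / (n - k - 1))

def permutation_p_value(treat, covariates, draws=200, seed=1):
    """Share of re-shuffled assignments whose F is at least as large as the observed F."""
    rng = random.Random(seed)
    observed = joint_f_stat(treat, covariates)
    shuffled = list(treat)
    hits = 0
    for _ in range(draws):
        rng.shuffle(shuffled)
        if joint_f_stat(shuffled, covariates) >= observed:
            hits += 1
    return observed, hits / draws

# Simulated example: 4 hypothetical baseline covariates, half the sample treated at random.
rng = random.Random(0)
n = 120
covariates = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(n)]
treat = [1] * (n // 2) + [0] * (n // 2)
rng.shuffle(treat)

f_stat, p_value = permutation_p_value(treat, covariates)
print(f"F = {f_stat:.2f}, permutation p-value = {p_value:.2f}")
```

With a statistics package you would normally just run the regression and read off the reported F-test (or the chi-squared test after a probit); the hand-rolled version here is only meant to show what is being computed.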

That’s it, very simple. I think people get confused because the treatment variable jumps from being on the right-hand side for the single-variable tests to being on the left-hand side for the joint orthogonality test.

Now what if you have multiple treatment groups? You can then run a multinomial logit or your other preferred specification and test for joint orthogonality within this framework, but I’ve not seen this done very often – typically I see people just compare each treatment separately to the control.

## Comments

## Hi David,

Hi David,

Hansen and Bowers have a nice paper where they compare the performance of this test with a joint permutation test they propose for testing balance in clustered randomized trials. http://www.jstor.org/stable/27645895?seq=1#page_scan_tab_contents

As a bonus, the paper has a really clear explanation of the issues involved in testing for balance in clustered trials.

## Thanks Doug! I should mention

Thanks Doug! I should mention that even mild clustering (assignment to some households of size 1 and some of size 2) led the likelihood-ratio-based balance test to produce surprisingly misleading results. It is worth checking the size versus nominal level of the LR tests if the samples are small, if covariates are binary and the share of 1s is not close to 50%, or if assignment is by cluster.

## Thanks, David. Super simple

Thanks, David. Super simple and helpful--makes complete sense and eliminates the "well let's just forget about those 4 significantly different covariates" comments in an experiment paper.

## Long time reader, first time

Long time reader, first time poster.

I have a slightly off-the-wall question about using a joint test of orthogonality.

Say I’m looking at dating website profiles. I note 20 adjectives that are much more likely to be used on females’ profiles than males’ profiles. I note another 20 adjectives that are much more likely to be used on males’ profiles than females’ profiles.

I then roll out a design change across the site that I hypothesise will reduce the use of ‘gendered language’. I want to test whether it has done so. Imagine that we pushed the redesign to only half of our users, so this is a proper randomised A/B test.

Should I use a joint test of orthogonality?

Here’s how I would envision it working:

- Using the same list of words that the exploratory analysis has already found, we would make each word an indicator variable which takes the value of 1 if the word is used in a profile and 0 if it is not. We would then run a joint test of orthogonality:

- We take our set of 40 words (X1, X2, …, X40) and run the following regression:

- Female = a + b1*X1 + b2*X2 + b3*X3 + ….+b40*X40 +u

- We then test the joint hypothesis b1=b2=b3=…=b40=0 as a linear regression, with an F-test.

But how should I use the indicator variable for whether the user has been ‘treated’ or not? Interacted with each word-indicator variable?

Many thanks in advance,

Andrew

## Hi Andrew,

I don't think you want a joint orthogonality test here - you aren't trying to test if none of the adjectives are related to gender. Instead you are testing if your treatment reduces the use of gendered adjectives. So there would be two approaches I would take to doing this:

1. Just define a count of the number of male adjectives used on male profiles (call this M20), and a count of the number of female adjectives used on female profiles (call this F20), and then run regressions like:

M20 = a+b*Treat + e

F20 = a+b*Treat + e

This will show whether your treatment succeeds in getting males to use the male adjectives less, and females to use the female adjectives less. (I would run these regressions separately by gender, but you could also pool together males and females and just create a variable that is the number of gendered adjectives of your gender you use).

2. If you are particularly interested in whether the treatment reduces the use of particular adjectives, then you can run the 40 regressions of the form:

X1 = a + b*Treat + e

or X1 = a+b*Treat + c*Female + d*Treat*Female + e

and so on up to X40

and then use a multiple testing correction to account for the fact you are doing 40 different tests.
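
The reply above does not name a particular correction. As one common option, here is a sketch of Holm's step-down adjustment in Python; the choice of Holm, the function name `holm_adjust`, and the five example p-values are all assumptions made for illustration.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity: an adjusted p-value never falls below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted

# Hypothetical p-values from five of the word-level regressions.
raw = [0.001, 0.04, 0.03, 0.20, 0.012]
adjusted = holm_adjust(raw)
for p, q in zip(raw, adjusted):
    print(f"raw p = {p:.3f}, Holm-adjusted p = {q:.3f}")
```

With all 40 word-level regressions you would pass the 40 raw p-values in at once and compare the adjusted values to your chosen significance level.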
