Tools of the Trade: Intra-cluster correlations

David McKenzie

In clustered randomized experiments, random assignment occurs at the group level, with multiple units observed within each group. For example, education interventions might be assigned at the school level, with outcomes measured at the student level, or microfinance interventions might be assigned at the savings group level, with outcomes measured for individual clients.

A key parameter in these experiments is the intracluster correlation, which measures the proportion of the overall variance in the outcome that is explained by between-group variance, i.e. how much more alike units in the same group are than units in different groups. Consider, for example, a sample of 2000 individuals, divided into 100 groups of 20 each (e.g. 100 classes each of 20 students). When the intracluster correlation is 0, individuals within classes are no more similar than individuals in different classes, and it is as if you effectively assigned 2000 individuals to treatment or control. When the intracluster correlation is 1, everyone within a class acts the same, and so you effectively only have 100 independent observations. This graph (made in Optimal Design) shows how the power of a study to detect a treatment effect of 0.2 standard deviations (delta = 0.2) and of 0.5 standard deviations (delta = 0.5) varies as a function of the intracluster correlation. You see that for the smaller effect size, power falls dramatically as the intracluster correlation increases.
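To make the 2000-versus-100 intuition concrete, here is a minimal Python sketch (not Stata, and not what Optimal Design does internally). It assumes equal-sized clusters, a 50/50 split between treatment and control, a two-sided 5% test, and a normal approximation to power, so its numbers will only roughly track the graph. The function names are my own, chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation from clustering: 1 + (m - 1) * rho."""
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_total: int, cluster_size: float, icc: float) -> float:
    """Number of independent observations the clustered sample is worth."""
    return n_total / design_effect(cluster_size, icc)


def approx_power(delta: float, n_total: int, cluster_size: float,
                 icc: float, alpha: float = 0.05) -> float:
    """Approximate power to detect a standardized effect `delta`.

    Assumption: outcome has unit variance and half the sample is treated,
    so the standard error of the difference in means is sqrt(4 * deff / N).
    The (tiny) probability of rejecting in the wrong tail is ignored.
    """
    se = sqrt(4.0 * design_effect(cluster_size, icc) / n_total)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return NormalDist().cdf(delta / se - z_crit)


if __name__ == "__main__":
    # 100 groups of 20, as in the example in the text.
    for icc in (0.0, 0.05, 0.13, 0.5, 1.0):
        n_eff = effective_sample_size(2000, 20, icc)
        p_small = approx_power(0.2, 2000, 20, icc)
        p_large = approx_power(0.5, 2000, 20, icc)
        print(f"icc={icc:4.2f}  effective n={n_eff:7.1f}  "
              f"power(0.2)={p_small:.2f}  power(0.5)={p_large:.2f}")
```

At an intracluster correlation of 0 the effective sample size is 2000, and at 1 it is 100, matching the two extremes described above.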

As a result, when doing power calculations for cluster randomized trials, it is important to know what the likely intracluster correlation will be for your study. Most of my experiments have been randomized at the individual level. In the few that I have done at a group level, I haven’t had any baseline data available at the time of doing power calculations, so I have typically had to rely on estimates of this correlation from other studies. However, I am currently planning a financial literacy study in which we have the individual savings balances of microfinance group members, and so have the opportunity to actually calculate this for once. I realized I had forgotten how to do this in Stata, but luckily it is very simple: just use the loneway command. Here is an example, showing that my intracluster correlation is 0.13:

. loneway savings group

            One-way Analysis of Variance for savings: Savings

                                          Number of obs =      3535
                                              R-squared =    0.1796

    Source                SS         df        MS          F     Prob > F
    ---------------------------------------------------------------------
    Between group      77952412     194    401816.56     3.77     0.0000
    Within group      3.562e+08    3340    106635.14
    ---------------------------------------------------------------------
    Total             4.341e+08    3534    122839.21

         Intraclass       Asy.
         correlation      S.E.       [95% Conf. Interval]
         ------------------------------------------------
            0.13258     0.01713       0.09901     0.16616

         Estimated SD of group effect             127.6682
         Estimated SD within group                326.5504
         Est. reliability of a group mean          0.73462
              (evaluated at n=18.11)
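As a cross-check, the summary numbers at the bottom of this output can be recovered by hand from the ANOVA table. The sketch below is in Python rather than Stata and uses the standard one-way ANOVA estimator of the intraclass correlation; I am assuming that the "n=18.11" reported by loneway is the adjusted average group size that enters this formula, which is consistent with the output. Because the inputs are the rounded values printed above, results agree only to about four decimal places.

```python
from math import sqrt

# Values copied from the loneway output above (rounded as printed).
MS_BETWEEN = 401816.56   # between-group mean square
MS_WITHIN = 106635.14    # within-group mean square
N_ADJ = 18.11            # adjusted average group size ("evaluated at n=18.11")


def anova_icc(ms_between: float, ms_within: float, n_adj: float) -> float:
    """One-way ANOVA estimator: (MSB - MSW) / (MSB + (n - 1) * MSW)."""
    return (ms_between - ms_within) / (ms_between + (n_adj - 1.0) * ms_within)


def sd_group_effect(ms_between: float, ms_within: float, n_adj: float) -> float:
    """Square root of the between-group variance component."""
    return sqrt((ms_between - ms_within) / n_adj)


def sd_within_group(ms_within: float) -> float:
    """Square root of the within-group mean square."""
    return sqrt(ms_within)


def reliability_of_group_mean(ms_between: float, ms_within: float) -> float:
    """Share of the variance in a group mean that is true group effect."""
    return 1.0 - ms_within / ms_between


if __name__ == "__main__":
    print("ICC:                 ", round(anova_icc(MS_BETWEEN, MS_WITHIN, N_ADJ), 4))
    print("SD of group effect:  ", round(sd_group_effect(MS_BETWEEN, MS_WITHIN, N_ADJ), 2))
    print("SD within group:     ", round(sd_within_group(MS_WITHIN), 2))
    print("Reliability of mean: ", round(reliability_of_group_mean(MS_BETWEEN, MS_WITHIN), 4))
```

The ratio of the variance components gives the same answer: 127.67 squared divided by (127.67 squared plus 326.55 squared) is about 0.13.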

As a result, with 20 individuals per microfinance group, standard errors will be approximately 1.86 times as large as if I had individual randomization. This factor is the square root of 1 + (20 - 1) × 0.13 (see equation 11 on page 3922 of the Duflo et al. randomization toolkit for this formula). In our case this should still leave sufficient power to detect the effect sizes we are interested in.
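The 1.86 figure is easy to check. This short Python sketch (again, not Stata) computes the standard error inflation factor from the group size and the intracluster correlation, assuming equal-sized groups; the function name is illustrative.

```python
from math import sqrt


def se_inflation(cluster_size: float, icc: float) -> float:
    """Ratio of the clustered standard error to the standard error under
    individual randomization: sqrt(1 + (m - 1) * rho)."""
    return sqrt(1.0 + (cluster_size - 1.0) * icc)


if __name__ == "__main__":
    # 20 individuals per group, ICC rounded to 0.13 as in the text.
    print(round(se_inflation(20, 0.13), 2))
```

Using the unrounded estimate of 0.13258 instead of 0.13 moves the factor up only slightly, so the conclusion does not depend on the rounding.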

This should at least remind me of this command next time I forget it. Let us know if there are any other practical “how to do this in Stata?” questions you might have.