Submitted by Nathan Kallus on

Berk, thanks for a great blog post!

David, your comments are right on, and they turn out to apply to a very wide class of experimental designs called a priori balancing designs, i.e. those that try (if at all) to balance covariates ONLY before treatment and before randomization. I study optimality among these designs in another paper, Optimal A Priori Balance in the Design of Controlled Experiments, accessible at http://arxiv.org/abs/1312.0531. It turns out that among such designs (which include complete randomization, our OR paper, and many other designs, including all those mentioned by you and the blog post), if the multiple correlation (R^2) between outcomes and covariates is reduced, then all such designs take exactly the same hit to their post-treatment estimation variance. In the limit, if there is no association, then no matter what a priori balancing you do, nothing is better than complete randomization. So, first, you are right that this behavior applies widely -- I prove it theoretically and demonstrate it empirically. Second, if one were to reduce R^2, all methods would take the same hit, so the variance and power plots simply show how different methods address the part of the variance that covariates can reduce to begin with; the rest is a constant addition of noise across the board. Apart from this across-the-board noise, within the confines of what could ever be reduced by balancing covariates, our OR paper provides a method that accelerates the balancing from logarithmic rates (like 1 over a power of n, e.g. 1/sqrt(n)) to linear (aka exponential) rates (like 1 over an exponential in n, e.g. 1/2^n).
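To make the "same hit across the board" point concrete, here is a minimal sketch, not taken from either paper. It assumes a toy model y_i = f(x_i) + eps_i with zero treatment effect, noise eps_i = +/- sigma with equal probability, and a tiny sample so that every assignment and noise pattern can be enumerated exactly. The function names, the covariate values, and the choice of f are all hypothetical. Under these assumptions the variance of the difference in means should split into a design-dependent part plus a noise part of 4*sigma^2/n that is identical for both designs.

```python
from itertools import combinations, product

def complete_randomization(n):
    """All assignments u in {-1,+1}^n with exactly n/2 treated units."""
    designs = []
    for treated in combinations(range(n), n // 2):
        u = [-1] * n
        for i in treated:
            u[i] = 1
        designs.append(u)
    return designs

def pair_matching(x):
    """Pair adjacent units after sorting on x; flip one fair coin per pair."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    designs = []
    for signs in product([-1, 1], repeat=len(x) // 2):
        u = [0] * len(x)
        for k, s in enumerate(signs):
            u[order[2 * k]] = s
            u[order[2 * k + 1]] = -s
        designs.append(u)
    return designs

def estimator_variance(designs, x, f, sigma):
    """Exact variance of the difference in means, averaging uniformly over
    all assignments in the design and all +/- sigma noise patterns."""
    n = len(x)
    values = []
    for u in designs:
        for eps_signs in product([-1, 1], repeat=n):
            y = [f(xi) + sigma * e for xi, e in zip(x, eps_signs)]
            values.append(2.0 / n * sum(ui * yi for ui, yi in zip(u, y)))
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

# Hypothetical covariates and a hypothetical linear association.
x = [0.1, 0.4, 0.5, 0.9, 1.3, 1.6]
f = lambda t: 2.0 * t

for name, designs in [("complete", complete_randomization(len(x))),
                      ("paired", pair_matching(x))]:
    v0 = estimator_variance(designs, x, f, sigma=0.0)  # covariate part only
    v1 = estimator_variance(designs, x, f, sigma=1.0)  # covariate part + noise
    print(name, "balance part:", round(v0, 4), "noise part:", round(v1 - v0, 4))
```

The "balance part" differs between the two designs, while the "noise part" does not depend on the design at all. Raising sigma (lowering R^2) therefore shifts both variances by the same amount.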

Now, what is really interesting is that even if R^2 is very high, without any partial knowledge about the structure of the association between outcomes and covariates, and facing an adversarial nature that can act within the confines of what you don't know, complete randomization is the optimal design. Hence, there is no free lunch -- you can't hope to reduce variance better than complete randomization without knowing something about the problem. When you do know something, various designs come out as "optimal." For example, if you know the association is Lipschitz continuous, then pairwise matching is optimal! If you know the association is linear, then moment matching is optimal! Extending this, one can think about associations that live in function spaces that are common in machine learning for general consistent learning -- the so-called universal reproducing kernel Hilbert spaces (RKHS). This leads to new and powerful nonparametric designs that seem to work well in practice.
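As a rough illustration of why the known structure picks out the design (a sketch under simplifying assumptions, not the argument from the paper): with zero treatment effect and no noise, the error of the difference in means for a linear association is the slope times the imbalance in covariate means, and for a 1-Lipschitz association under a matched-pair assignment it is bounded by the average within-pair distance. All names and numbers below are hypothetical.

```python
import math

def diff_in_means(u, values):
    """Mean over units with u=+1 minus mean over units with u=-1 (equal group sizes)."""
    n = len(u)
    return 2.0 / n * sum(ui * v for ui, v in zip(u, values))

def adjacent_pairs(x):
    """Pair units that are neighbours after sorting on x (assumes an even number of units)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    return [(order[k], order[k + 1]) for k in range(0, len(x) - 1, 2)]

def assignment_from_pairs(pairs, n, signs):
    """Within each pair, one unit gets +1 and the other -1."""
    u = [0] * n
    for (a, b), s in zip(pairs, signs):
        u[a], u[b] = s, -s
    return u

def pair_distance(pairs, x):
    """(2/n) * sum of within-pair distances: an error bound for any 1-Lipschitz f."""
    n = 2 * len(pairs)
    return 2.0 / n * sum(abs(x[a] - x[b]) for a, b in pairs)

# Hypothetical covariates and one arbitrary set of per-pair coin flips.
x = [0.9, 0.1, 1.6, 0.4, 1.3, 0.5]
pairs = adjacent_pairs(x)
u = assignment_from_pairs(pairs, len(x), [1, -1, 1])

# sin is 1-Lipschitz, so its error cannot exceed the within-pair distance.
lipschitz_error = abs(diff_in_means(u, [math.sin(t) for t in x]))
print(lipschitz_error <= pair_distance(pairs, x))

# For a linear f the intercept cancels and the error is slope * mean imbalance.
linear_error = abs(diff_in_means(u, [3.0 * t + 7.0 for t in x]))
print(abs(linear_error - 3.0 * abs(diff_in_means(u, x))) < 1e-12)
```

So shrinking within-pair distances controls the worst case over Lipschitz functions, while shrinking the difference in covariate means controls it over linear functions.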

From my practical experience applying these methods to real data, however, I've found that matching first and second moments is almost always the best combination of efficient (when you have fewer moments you can match them better) and sufficient (matching first moments only can cause problems when things aren't exactly linear, and adding second moments seems to almost always solve this problem to a practically sufficient degree). Hence, in our short OR paper, this is the method we focus on and recommend for all practical experiments involving small samples.
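For a feel of what matching first and second moments means on a very small sample, here is a brute-force sketch for a single covariate. It is only an illustration: it enumerates every balanced split, which is feasible only for tiny n and is not the optimization machinery of the OR paper. The objective form, the weight `rho`, and the function names are assumptions made for this example.

```python
from itertools import combinations

def moment_gap(u, x, rho=0.5):
    """|gap in group means| + rho * |gap in group second moments| (hypothetical objective)."""
    n = len(x)
    first = 2.0 / n * sum(ui * xi for ui, xi in zip(u, x))
    second = 2.0 / n * sum(ui * xi * xi for ui, xi in zip(u, x))
    return abs(first) + rho * abs(second)

def best_moment_match(x, rho=0.5):
    """Try every split into two groups of n/2 and keep the smallest moment gap."""
    n = len(x)
    best_u, best_gap = None, float("inf")
    for treated in combinations(range(n), n // 2):
        u = [1 if i in treated else -1 for i in range(n)]
        gap = moment_gap(u, x, rho)
        if gap < best_gap:
            best_u, best_gap = u, gap
    return best_u, best_gap

# Hypothetical covariate values.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
u, gap = best_moment_match(x)
print("treated:", [xi for ui, xi in zip(u, x) if ui == 1], "gap:", gap)
```

For these eight values a perfect match exists: {1, 4, 6, 7} and {2, 3, 5, 8} share both their sum (18) and their sum of squares (102). In practice one would randomize which of the two groups receives treatment, and with several covariates cross-moments would enter the objective as well.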