Published on Development Impact

Design sandbox: Power calculations and optimal design for cost effectiveness (Part 2: The case of nonlinearities)


In our previous blog post, we presented a dashboard that implemented power calculations and provided optimal design for cash benchmarking experiments. One key component of these experiments is comparing the impacts of two programs, expressing these impacts per unit of cost. A closely related question is comparing, once again per unit of cost, larger and smaller versions of the same program. To give one example of this, we may want to compare the cost effectiveness of large and small temporary unconditional cash transfers at reducing poverty; we do so in a forthcoming paper!


In this Part 2 blog, we apply insights from our previous blog post to this closely related problem of testing for differences in impacts, per unit of cost, of large and small versions of a program. We refer to this as testing for “nonlinearities” because, when program impacts are linear in program size, the program has a constant impact per unit of cost regardless of its size. For instance, the claim that bigger programs are “better” at pushing households out of poverty is typically a claim that impacts are nonlinear with respect to size. More specifically, it is a claim that impacts per unit of cost are increasing in program size.


Nonlinearities dashboard

Researcher's problem

To start, the dashboard (presented in the figure below) allows the user to select the key parameters of the researcher’s problem described in our previous blog post. For concreteness, we focus on the example of unconditional cash transfers (UCT). To test for nonlinearities by comparing a small UCT to a large UCT, the dashboard focuses on estimating the following reduced form:

\[ Y_{i} = \beta_{0} + \beta_{1} \text{Small UCT}_{i} + \beta_{2} \text{Large UCT}_{i} + \epsilon_{i} \]




We assume the researcher is interested in the difference in impacts per unit of transfer between the large UCT and the small UCT, that is \( \frac{\beta_{2}}{\text{Large UCT size}} - \frac{\beta_{1}}{\text{Small UCT size}} \). This is the difference between the average effect of the Large UCT per unit of transfer and the average effect of the Small UCT per unit of transfer. We refer to this estimate as the “estimate of nonlinearities” because, when this difference is 0, the effects of the Large UCT and the Small UCT are linear in transfer size. The researcher’s objective is to minimize the variance of the estimate of nonlinearities.
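As a minimal sketch of how this quantity comes out of the reduced form, the code below fits it by OLS with NumPy's least-squares solver and then forms \( \frac{\hat{\beta}_{2}}{\text{Large UCT size}} - \frac{\hat{\beta}_{1}}{\text{Small UCT size}} \). Everything in the simulation is hypothetical: a Small UCT of size 1, a Large UCT of size 2, made-up impacts of 0.5 and 0.8 (so impacts per unit of transfer fall with size), standard normal errors, and equal-probability assignment. The function name `nonlinearity_estimate` is ours, not the dashboard's.

```python
import numpy as np

def nonlinearity_estimate(y, small, large, small_size, large_size):
    """Fit Y = b0 + b1*Small + b2*Large by OLS and return
    b2 / large_size - b1 / small_size (difference in per-unit effects)."""
    X = np.column_stack([np.ones_like(y), small, large])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    b1, b2 = coef[1], coef[2]
    return b2 / large_size - b1 / small_size

# Hypothetical simulated experiment (all parameter values are illustrative).
rng = np.random.default_rng(0)
n = 30_000
arm = rng.integers(0, 3, size=n)          # 0 = control, 1 = small, 2 = large
small = (arm == 1).astype(float)
large = (arm == 2).astype(float)
beta1, beta2 = 0.5, 0.8                   # hypothetical true impacts
y = 1.0 + beta1 * small + beta2 * large + rng.normal(size=n)

theta_hat = nonlinearity_estimate(y, small, large, small_size=1.0, large_size=2.0)
theta_true = beta2 / 2.0 - beta1 / 1.0    # 0.4 - 0.5 = -0.1
print(f"estimate {theta_hat:.3f} vs. true {theta_true:.3f}")
```

A negative value here means the Large UCT does less per unit of transfer than the Small UCT, which is what the hypothetical impacts were chosen to produce.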


In the dashboard, we also focus on the difference between the marginal impact of the large UCT and the average impact of the small UCT, that is \( \frac{\beta_{2} - \beta_{1}}{\text{Large UCT size} - \text{Small UCT size}} - \frac{\beta_{1}}{\text{Small UCT size}} \). Some algebra shows that this difference is simply a scaled version of our “estimate of nonlinearities”, with a scaling factor that depends only on the two transfer sizes. The optimal design that minimizes the variance of one will therefore also minimize the variance of the other.
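The algebra is short. Writing \( S \) for \( \text{Small UCT size} \) and \( L \) for \( \text{Large UCT size} \) (shorthand used only here):

\[ \begin{aligned} \frac{\beta_{2} - \beta_{1}}{L - S} - \frac{\beta_{1}}{S} &= \frac{S(\beta_{2} - \beta_{1}) - (L - S)\beta_{1}}{S(L - S)} \\ &= \frac{S\beta_{2} - L\beta_{1}}{S(L - S)} \\ &= \frac{L}{L - S}\left( \frac{\beta_{2}}{L} - \frac{\beta_{1}}{S} \right). \end{aligned} \]

Once the two sizes are fixed, \( \frac{L}{L - S} \) is a constant, so the variance of the marginal version is \( \left( \frac{L}{L - S} \right)^{2} \) times the variance of the estimate of nonlinearities. For example, when the Large UCT is twice the Small UCT the factor is 2.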


We then allow the researcher to select the size of the two cash transfers, that is, \( \text{Small UCT size} \) and \( \text{Large UCT size} \). In the presence of nonlinearities, cash transfers of different sizes will have different average effects per unit of transfer. Therefore, choosing cash transfer sizes for which the researcher anticipates very different impacts per unit of transfer will result in higher power for detecting nonlinearities.
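To illustrate why anticipated differences in per-unit impacts drive power, the sketch below assumes a purely hypothetical concave impact curve, \( f(x) = c\sqrt{x} \), so that impacts per unit of transfer fall with transfer size. It also assumes homoskedastic errors with unit variance, and shares proportional to the absolute weight each arm mean receives in the estimate, which is the variance-minimizing allocation under those assumptions. It reports the expected z-statistic \( |\theta| / \text{SE} \) for a hypothetical sample of 1,000 as the Large UCT grows. The curve, its scale \( c = 0.3 \), and the function names are all illustrative.

```python
import math

def impact(size: float, c: float = 0.3) -> float:
    # Hypothetical concave impact curve (illustrative only, not an estimate).
    return c * math.sqrt(size)

def expected_z(small_size: float, large_size: float, n: int, sigma: float = 1.0) -> float:
    # Difference in impacts per unit of transfer, Large minus Small.
    theta = impact(large_size) / large_size - impact(small_size) / small_size
    # Weights on the control, small, and large arm means in the estimate.
    w = (1 / small_size - 1 / large_size, -1 / small_size, 1 / large_size)
    # With shares proportional to |w_g|, n * Var = sigma^2 * (sum |w_g|)^2.
    se = sigma * sum(abs(x) for x in w) / math.sqrt(n)
    return abs(theta) / se

for large_size in (2.0, 4.0, 9.0):
    print(large_size, round(expected_z(1.0, large_size, n=1_000), 2))
```

Under these assumptions the sum of absolute weights is \( 2 / \text{Small UCT size} \) whatever the Large UCT size, so widening the gap raises power only through a larger anticipated difference \( |\theta| \).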


Conditional on the above choices, the dashboard then minimizes the variance of the estimate of nonlinearities, by choosing the optimal allocation of observations across the control group, the small UCT group, and the large UCT group. The dashboard then reports this optimal design and associated power calculations.
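A minimal sketch of this minimization follows. The assumptions are ours, not the dashboard's documented method: errors are homoskedastic with a common variance across arms, so OLS on the two dummies gives \( \hat{\beta}_{1} \) and \( \hat{\beta}_{2} \) as differences in arm means. The estimate of nonlinearities is then a weighted sum of the three arm means, and n times its variance is \( \sigma^{2} \sum_{g} w_{g}^{2} / p_{g} \) for shares \( p_{g} \). Minimizing that subject to the shares summing to one gives \( p_{g} \) proportional to \( |w_{g}| \), a standard Cauchy-Schwarz result, so this sketch uses a closed form rather than a numerical optimizer. The helper names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Design:
    control: float  # share of observations in the control group
    small: float    # share in the Small UCT arm
    large: float    # share in the Large UCT arm

def arm_weights(small_size: float, large_size: float) -> tuple[float, float, float]:
    """Weights on the (control, small, large) arm means in
    beta2/large_size - beta1/small_size, where beta1 and beta2 are
    differences in means relative to control."""
    if not 0 < small_size < large_size:
        raise ValueError("expected 0 < small_size < large_size")
    return (1 / small_size - 1 / large_size, -1 / small_size, 1 / large_size)

def n_times_variance(design: Design, small_size: float, large_size: float,
                     sigma2: float = 1.0) -> float:
    """n * Var(estimate of nonlinearities), assuming homoskedastic errors."""
    w0, ws, wl = arm_weights(small_size, large_size)
    return sigma2 * (w0**2 / design.control + ws**2 / design.small + wl**2 / design.large)

def optimal_design(small_size: float, large_size: float) -> Design:
    """Minimize sum(w_g^2 / p_g) subject to sum(p_g) = 1: p_g proportional to |w_g|."""
    abs_w = [abs(w) for w in arm_weights(small_size, large_size)]
    total = sum(abs_w)
    return Design(*(w / total for w in abs_w))

best = optimal_design(small_size=1.0, large_size=2.0)
print(best)  # Design(control=0.25, small=0.5, large=0.25)
print(n_times_variance(best, 1.0, 2.0), n_times_variance(Design(1/3, 1/3, 1/3), 1.0, 2.0))
```

The second line compares the minimized variance with an equal three-way split, which should always be at least as large.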


Optimal design and minimized variances

In the figure below, we present an example output of the dashboard, which includes the optimal design (the fraction of observations assigned to control, to the small UCT, and to the large UCT) and the variance of an estimate of a particular model of nonlinearities (which we don’t focus on in this blog post). To facilitate comparison across alternative parameter choices (which lead to alternative designs), we provide two panels in which the researcher can specify different choices of the key parameters (the size of the small UCT and the size of the large UCT).


The example below is for the case where the large UCT is twice as large as the small UCT. In this example, the optimal design assigns 1/4 of observations to the control group, 1/2 of observations to the small UCT, and 1/4 to the large UCT. This is because our estimate of nonlinearities can be interpreted as a difference-in-differences estimate in a 2x2 design, comparing (Large UCT - Small UCT) to (Small UCT - Control). The optimal design here assigns 1/4 of observations to each group (with Small UCT counted twice, as it appears in both differences), just as the optimal design in a 2x2 for estimating this interaction does. The challenges with power when testing for nonlinearities are therefore related to the challenges when testing for interaction effects in a 2x2, discussed recently on this blog.
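The difference-in-differences reading can be checked directly. Write \( \bar{Y}_{\text{Control}} \), \( \bar{Y}_{\text{Small}} \), and \( \bar{Y}_{\text{Large}} \) for the arm means (our notation). OLS on the two dummies gives \( \hat{\beta}_{1} = \bar{Y}_{\text{Small}} - \bar{Y}_{\text{Control}} \) and \( \hat{\beta}_{2} = \bar{Y}_{\text{Large}} - \bar{Y}_{\text{Control}} \), so with \( \text{Large UCT size} = 2 \times \text{Small UCT size} \):

\[ \frac{\hat{\beta}_{2}}{2 \times \text{Small UCT size}} - \frac{\hat{\beta}_{1}}{\text{Small UCT size}} = \frac{(\bar{Y}_{\text{Large}} - \bar{Y}_{\text{Small}}) - (\bar{Y}_{\text{Small}} - \bar{Y}_{\text{Control}})}{2 \times \text{Small UCT size}} \]

The three means enter with weights proportional to 1, -2, and 1. Assuming a common error variance across arms, the allocation that minimizes a variance of the form \( \sum_{g} w_{g}^{2} / p_{g} \) sets each share \( p_{g} \) proportional to \( |w_{g}| \), which gives 1/4, 1/2, and 1/4.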


However, we could also consider testing for nonlinearities using a larger Large UCT. The graph in the example above plots the optimal design as a function of the size of the Large UCT; other graphs allow the dashboard user to explore the effects of also varying the size of the Small UCT, and to explore how the variance of the estimate is affected. Note that the optimal allocation to the Large UCT shrinks as the size of the Large UCT increases. This is because the impact of the Large UCT per unit of transfer is more precisely estimated as the size of the cash transfer increases. We caution that this is an asymptotic result: with a very small sample, if the “optimal” number of observations assigned to the Large UCT arm is very small (e.g., fewer than 15), this asymptotic approximation will be meaningfully inaccurate.
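The shrinking Large UCT share and the small-sample caution can be illustrated under simplifying assumptions of our own: homoskedastic errors and difference-in-means estimates. Under those assumptions the variance-minimizing shares are proportional to the absolute weights each arm mean receives in \( \frac{\beta_{2}}{\text{Large UCT size}} - \frac{\beta_{1}}{\text{Small UCT size}} \). The sketch below sweeps the Large UCT size for a hypothetical sample of 500 and flags designs whose Large UCT arm falls below 15 observations, the illustrative threshold mentioned above. The function names and the sample size are assumptions.

```python
MIN_ARM_SIZE = 15  # illustrative threshold echoing the "fewer than 15" example

def optimal_shares(small_size: float, large_size: float) -> tuple[float, float, float]:
    """Variance-minimizing (control, small, large) shares: proportional to the
    absolute weight on each arm mean in beta2/large_size - beta1/small_size."""
    if not 0 < small_size < large_size:
        raise ValueError("expected 0 < small_size < large_size")
    w = (1 / small_size - 1 / large_size, 1 / small_size, 1 / large_size)
    total = sum(w)
    return tuple(x / total for x in w)

def sweep(small_size: float, large_sizes: list[float], n: int) -> list[dict]:
    """Optimal shares and implied Large UCT arm size for each candidate Large UCT size."""
    rows = []
    for large_size in large_sizes:
        p_control, p_small, p_large = optimal_shares(small_size, large_size)
        n_large = round(p_large * n)
        rows.append({
            "large_size": large_size,
            "shares": (round(p_control, 3), round(p_small, 3), round(p_large, 3)),
            "n_large": n_large,
            "too_small": n_large < MIN_ARM_SIZE,  # asymptotics likely unreliable
        })
    return rows

rows = sweep(small_size=1.0, large_sizes=[2.0, 4.0, 10.0, 40.0], n=500)
for row in rows:
    print(row)
```

Under these assumptions the Large UCT share works out to \( \text{Small UCT size} / (2 \times \text{Large UCT size}) \), which is why it falls steadily as the Large UCT grows.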




MDE and required sample sizes

Lastly, in the figure below, we present the power panel of the dashboard, which calculates minimum detectable effects and required sample sizes as a function of the selections in the first part of the dashboard. These calculations are standard transformations of the variances described above; our back-of-the-envelope calculations were again useful here! For nonlinearities, we focus on the minimum detectable effect for the marginal effect of the large cash transfer per unit of transfer, relative to the average effect of the small cash transfer per unit of transfer.
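One standard form of these transformations is \( \text{MDE} = (z_{1-\alpha/2} + z_{\text{power}}) \sqrt{V / n} \), where \( V \) is n times the variance of the estimate, and inverting it gives the required sample size. The sketch below uses Python's `statistics.NormalDist` for the normal quantiles. The 5% two-sided significance level and 80% power are conventional defaults we assume, not necessarily the dashboard's settings. The input \( V = 16 \) is hypothetical: it is what the marginal-versus-average contrast gives under homoskedastic unit-variance errors, a Small UCT of size 1, a Large UCT of size 2, and the 1/4, 1/2, 1/4 design.

```python
import math
from statistics import NormalDist

def mde(v: float, n: int, alpha: float = 0.05, power: float = 0.8) -> float:
    """Minimum detectable effect for a two-sided test, where v = n * Var(estimate)."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) * math.sqrt(v / n)

def required_n(v: float, target_effect: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Smallest sample size whose MDE is at most target_effect."""
    z = NormalDist().inv_cdf
    return math.ceil(v * ((z(1 - alpha / 2) + z(power)) / target_effect) ** 2)

v = 16.0  # hypothetical n * Var(estimate), for illustration only
print(round(mde(v, 1_000), 3))           # 0.354
print(required_n(v, target_effect=0.2))  # 3140
```

These outputs are close to the figures reported in this post, but we have not confirmed that the dashboard uses these exact inputs.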


We present one possible result below, continuing our example above, where the larger cash transfer is twice as large as the smaller cash transfer. Taking consumption as the outcome of interest, for a realistic choice of the error variance, the minimum detectable effect for the difference between the marginal effect of the Large UCT and the average effect of the Small UCT is 0.35 with 1,000 observations. Based on estimates from our highly anticipated forthcoming paper mentioned above, 0.2 is a typical difference between these effects; inputting this, we get a required sample size of 3,140. In general, testing for nonlinearities requires larger samples, as the magnitude of nonlinearities is often small compared to the magnitude of average effects.
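Because the MDE shrinks with \( 1/\sqrt{n} \), the required sample size for a new target effect can be backed out from any reported pair of MDE and sample size: \( n_{\text{required}} = n_{0} (\text{MDE}_{0} / \text{target})^{2} \). Plugging in the displayed 0.35 gives about 3,063 rather than 3,140. A less-rounded MDE of about 0.3544, which is our back-calculation and not a number from the dashboard, reproduces 3,140, so the gap is consistent with rounding of the displayed MDE. The helper name is hypothetical.

```python
import math

def scale_sample_size(n0: int, mde0: float, target_effect: float) -> int:
    """The MDE shrinks with 1/sqrt(n), so n_required = n0 * (mde0 / target_effect)**2."""
    return math.ceil(n0 * (mde0 / target_effect) ** 2)

print(scale_sample_size(1_000, 0.35, 0.2))    # rounded MDE as displayed -> 3063
print(scale_sample_size(1_000, 0.3544, 0.2))  # back-calculated, less-rounded MDE -> 3140
```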


Florence Kondylis

Research Manager, Lead Economist, Development Impact Evaluation

John Loeser

Economist, Development Impact Evaluation
