Published on Development Impact

How to do meta-analysis using a Bayesian Hierarchical Model and when does it make sense to do so? Guest post by Pauline Castaing and Jules Gazeaud

September 28, 2022


Results from impact evaluations continue to accumulate, but practical questions remain about how best to aggregate evidence and extract generalizable information. The Bayesian hierarchical model offers one approach. The method is not new: Don Rubin used it in his 1981 paper, in which he evaluated the impact of coaching in eight parallel randomized experiments conducted in US high schools. Recent years, however, have seen a handful of new applications to index insurance, micro-credit, performance pay, water treatment, and 20 different types of interventions.

The Bayesian Hierarchical Model

A key question in meta-analysis is how to model treatment effect heterogeneity. The full pooling (or fixed-effects) model assumes that every study estimates the same common effect, so that there is no heterogeneity in treatment effects. This assumption often seems implausible: studies are rooted in specific contexts, and effects are likely to vary with study characteristics (period, location, intervention, and sample). In contrast, the hierarchical (or random-effects) model allows treatment effects to vary across studies. The typical “Rubin” set-up specifies that each study estimates its own treatment effect, and that these study-specific effects are in turn drawn from a common distribution. A hierarchical model can be estimated using frequentist methods such as maximum-likelihood or empirical Bayes estimators, but Bayesian methods offer several advantages: for example, they limit overfitting, better quantify the uncertainty around parameters, and allow one to incorporate strong priors. Such models are particularly suited to aggregating evidence from similar experiments conducted in different contexts.
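For readers who prefer notation, the two models can be written as follows. The symbols are our own illustrative choice rather than those of the working paper: study k = 1, ..., K reports an estimated effect with standard error s_k, μ is the average effect, and τ is the between-study standard deviation.

```latex
\begin{aligned}
\text{Full pooling:}\quad & \hat\theta_k \sim \mathcal{N}(\mu,\, s_k^2), \qquad k = 1, \dots, K \\
\text{Hierarchical (Rubin):}\quad & \hat\theta_k \sim \mathcal{N}(\theta_k,\, s_k^2), \qquad \theta_k \sim \mathcal{N}(\mu,\, \tau^2) \\
\text{Integrating out } \theta_k:\quad & \hat\theta_k \sim \mathcal{N}(\mu,\, s_k^2 + \tau^2)
\end{aligned}
```

Full pooling is the special case τ = 0. Because each study's effective variance grows by τ², ignoring heterogeneity understates the uncertainty around μ. A Bayesian analysis of the hierarchical model adds priors on μ and τ and reports their joint posterior.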

In a new working paper, we use a Bayesian hierarchical model (BHM) to aggregate evidence from six experiments expanding farmers’ access to index insurance. Each study explores the effect of similar interventions in a different context (Bangladesh, Burkina Faso, Ghana, Kenya, India, and Mali). With the full pooling model, we find that farmers with access to index insurance cultivate more land and invest more in productive inputs (Figure 1). Estimates from the BHM confirm the potential of index insurance to foster farmers’ productive investments. However, they are also less precise than those from the full pooling model, and the plausible range of effects extends to values close to zero or negative. This is not surprising: the full pooling model tends to underestimate the uncertainty around point estimates because it ignores treatment effect heterogeneity. The BHM also yields more uncertainty around treatment effects than frequentist estimators of the same hierarchical model, such as restricted maximum likelihood (REML estimates in Figure 1).

Figure 1: The average effect of index insurance on farmers’ production decisions
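The point that full pooling understates uncertainty can be illustrated with a minimal Python sketch. This is not our estimation code. It compares the inverse-variance (fixed-effect) estimate with a DerSimonian-Laird random-effects estimate, a simple method-of-moments frequentist estimator that is neither REML nor a BHM. The function name and the six study estimates (in standard deviation units) are hypothetical.

```python
import math


def fixed_and_random_effects(estimates, std_errors):
    """Return (fe, fe_se, re, re_se, tau2) for study-level estimates (K >= 2)."""
    # Full pooling: inverse-variance weighted mean of the study estimates
    w = [1.0 / se**2 for se in std_errors]
    sw = sum(w)
    fe = sum(wi * y for wi, y in zip(w, estimates)) / sw
    fe_se = math.sqrt(1.0 / sw)

    # DerSimonian-Laird moment estimate of the between-study variance tau^2,
    # based on Cochran's Q statistic and truncated at zero
    q = sum(wi * (y - fe) ** 2 for wi, y in zip(w, estimates))
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)

    # Random effects: each study's sampling variance is inflated by tau^2
    w_re = [1.0 / (se**2 + tau2) for se in std_errors]
    sw_re = sum(w_re)
    re = sum(wi * y for wi, y in zip(w_re, estimates)) / sw_re
    re_se = math.sqrt(1.0 / sw_re)
    return fe, fe_se, re, re_se, tau2


# Hypothetical effects (SD units) and standard errors for six studies
est = [0.25, 0.05, 0.30, -0.02, 0.15, 0.10]
ses = [0.06, 0.05, 0.08, 0.07, 0.05, 0.09]
fe, fe_se, re, re_se, tau2 = fixed_and_random_effects(est, ses)
print(f"full pooling:   {fe:.3f} (SE {fe_se:.3f})")
print(f"random effects: {re:.3f} (SE {re_se:.3f}), tau^2 = {tau2:.4f}")
```

Because the random-effects weights add the estimated between-study variance to each study's sampling variance, the random-effects standard error can never be smaller than the fixed-effect one, and it is strictly larger whenever heterogeneity is detected.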

How to estimate a BHM?

BHMs can be estimated in R using the baggr package (short for "Bayesian aggregator"). This package was developed by Rachael Meager and Witold Więcek and greatly simplifies implementing Bayesian meta-analysis and making it tractable. It provides a suite of models that work with both published estimates and full data sets.

One needs to specify the following main arguments in baggr: (i) the model to use; (ii) the level of pooling; and (iii) the priors.

i. Model: the most popular models are the "rubin" model (outlined above), and its extension, the "mutau" (or joint) model. The "mutau" model incorporates information on the mean in the control group and can improve the precision of the estimates. In our case, we used the "rubin" model because our outcome variables are expressed in standard deviation units. However, when outcomes are defined using more tangible units, e.g. in US dollars, one can use the "mutau" model (as in Rachael Meager’s meta-analysis of micro-credit experiments).

ii. Pooling: this option specifies whether the model to be estimated is a fixed-effects (pooling = "full") or a random-effects (pooling = "partial") model. One can also specify pooling = "none" to recover the estimates from the individual studies.

iii. Priors: baggr can generate automatic priors based on the data (prior = NULL); these default priors are intended to be weakly informative. Researchers can also specify stronger priors to improve precision and to reflect knowledge from the literature (as in Meager’s paper).
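To make the three pooling options and the role of priors concrete outside baggr, here is a minimal Python sketch of the Rubin model. It assumes a flat prior on the average effect μ and a user-supplied prior on the between-study standard deviation τ. Instead of the MCMC sampling that baggr runs through Stan, it integrates over τ on a grid, following the standard textbook derivation for the eight-schools example. The function name rubin_posterior, the grid, the two priors on τ, and the data are all hypothetical.

```python
import math


def rubin_posterior(estimates, std_errors, tau_grid, tau_prior):
    """Grid approximation to the Rubin model with a flat prior on mu.

    tau_prior(tau) is an unnormalised prior density for the between-study SD.
    Returns (posterior mean of mu, posterior SD of mu, posterior means of the
    study-specific effects theta_k). tau_grid=[0.0] reproduces full pooling.
    """
    log_post, mu_hat, v_mu, thetas = [], [], [], []
    for tau in tau_grid:
        var = [se**2 + tau**2 for se in std_errors]
        prec = sum(1.0 / v for v in var)
        m = sum(y / v for y, v in zip(estimates, var)) / prec
        # log p(tau | y) up to a constant, with mu integrated out analytically
        lp = math.log(tau_prior(tau)) - 0.5 * math.log(prec)
        lp += sum(-0.5 * math.log(v) - (y - m) ** 2 / (2 * v)
                  for y, v in zip(estimates, var))
        log_post.append(lp)
        mu_hat.append(m)
        v_mu.append(1.0 / prec)
        # E[theta_k | tau, y]: keep a share tau^2 / (tau^2 + se^2) of y_k - m
        thetas.append([m + tau**2 / (tau**2 + se**2) * (y - m)
                       for y, se in zip(estimates, std_errors)])
    # Normalise p(tau | y) over the grid on the log scale for stability
    top = max(log_post)
    w = [math.exp(lp - top) for lp in log_post]
    total = sum(w)
    w = [wi / total for wi in w]
    # Laws of total expectation and variance over the tau grid
    mean = sum(wi * m for wi, m in zip(w, mu_hat))
    second = sum(wi * (v + m**2) for wi, v, m in zip(w, v_mu, mu_hat))
    sd = math.sqrt(max(second - mean**2, 0.0))
    theta = [sum(wi * th[k] for wi, th in zip(w, thetas))
             for k in range(len(estimates))]
    return mean, sd, theta


# Hypothetical effects (SD units) and standard errors for six studies
est = [0.25, 0.05, 0.30, -0.02, 0.15, 0.10]
ses = [0.06, 0.05, 0.08, 0.07, 0.05, 0.09]
grid = [i / 1000 for i in range(1001)]  # tau from 0 to 1 SD


def flat(t):
    return 1.0  # uniform prior on [0, 1]


def half_normal(t):
    return math.exp(-0.5 * (t / 0.1) ** 2)  # favours small tau


print("none:   ", est)
print("full:   ", rubin_posterior(est, ses, [0.0], flat))
print("partial:", rubin_posterior(est, ses, grid, flat))
print("partial:", rubin_posterior(est, ses, grid, half_normal))
```

Passing the single grid point 0.0 forces full pooling, a grid of values gives partial pooling, and the raw estimates correspond to pooling = "none". With only six studies, the posterior for μ is sensitive to the prior on τ, which is why informative priors can sharpen the estimates.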

For reference, our R code to aggregate effects on the use of fertilizer is:

One can then run pooling(fertilizer_baggr, metric = c("weights")) to see how the different estimates are weighted and which studies contribute the most to the estimated average effect.
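For intuition about what such weights capture, the sketch below holds the between-study standard deviation τ fixed. It uses hypothetical names and data and is not baggr's internal computation. With τ fixed, the pooled mean is a weighted average of the study estimates with weights proportional to 1/(s_k² + τ²). As τ grows, the weights become more equal and the most precise studies dominate less.

```python
def study_weights(std_errors, tau):
    """Share of each study in the pooled mean when tau is held fixed."""
    raw = [1.0 / (se**2 + tau**2) for se in std_errors]
    total = sum(raw)
    return [r / total for r in raw]


ses = [0.06, 0.05, 0.08, 0.07, 0.05, 0.09]  # hypothetical standard errors
for tau in (0.0, 0.1, 0.3):
    print(tau, [round(w, 3) for w in study_weights(ses, tau)])
```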

Things to watch out for:

- Is the original data from each study needed? BHMs and baggr can be applied using only the published estimates, but relying on the original data has some advantages: (i) it allows one to standardize outcomes and to analyze variables that are not always included in the original studies (either because of a lack of statistical power or because of researchers' own preferences); (ii) it allows incorporating individual covariates in the model and exploring the sources of treatment effect heterogeneity; (iii) when they are not reported in the original studies, it allows one to derive the mean and the standard errors of the outcome in the control group and thereby to improve precision by estimating a joint (“mutau”) model. Naturally, these advantages should be weighed against data availability constraints.

- Can evidence from observational studies be included? A crucial assumption of the BHM is that each study provides an unbiased estimate of the study-specific treatment effect. For this reason, most applications in economics so far have focused on aggregating evidence from experimental studies, although nothing in the method itself prevents the inclusion of observational data. We also note that new models to aggregate evidence from both observational studies (which may be more prone to bias in estimating causal effects) and experimental studies are under development (see, e.g., Gechter and Meager 2022).

- Which covariates should be used to understand heterogeneity in treatment effects? In theory, both individual and study-level covariates can be used in a BHM to understand treatment effect heterogeneity. They can help identify characteristics of the populations and of the interventions that are associated with larger and/or more heterogeneous effects. In practice, however, conditioning on study-level covariates is challenging when the meta-analysis includes only a few studies. In our study of index insurance interventions, some characteristics (e.g., the decision level for take-up and the timing of payments) are perfectly collinear across studies and therefore cannot be disentangled.

- Can it be used to assess external validity? BHMs estimate the degree of treatment effect heterogeneity across experiments run in different contexts, and as such provide an appealing tool to explore questions around external validity. A range of pooling metrics has been developed in the literature, perhaps the most prominent of which is the conventional pooling factor. This metric measures the share of the total variation in estimated treatment effects that stems from sampling variation (the higher the metric, the more suggestive it is of external validity). However, in our experience, estimates of this metric can be quite imprecise, especially with few studies or small samples. In our application using six studies on index insurance, we estimate an average pooling factor of 0.52, with a 95% posterior interval between 0.14 and 0.94. We also find that the predicted effects of index insurance in a new setting are imprecise (Figure 2), highlighting the need for more studies. The continued accumulation of evidence and the rapid transition towards more open norms around data sharing give grounds for optimism. However, given the typical size of evidence bases in economics, we expect that accurately answering questions around external validity will remain difficult with current metrics.

Figure 2. The predicted effect of index insurance in a new study
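The following Python sketch illustrates why the pooling factor and the predicted effect in a new study are hard to pin down with few studies. Names and data are hypothetical. It assumes one simple version of the metric, the average over studies of s_k²/(s_k² + τ²), which may differ in detail from baggr's pooling(), and evaluates it at several plausible values of τ. The predictive interval treats τ as known, whereas a full BHM would also average over its posterior uncertainty.

```python
import math


def pooling_factor(std_errors, tau):
    """Average of se_k^2 / (se_k^2 + tau^2) across studies: the share of the
    variation in study estimates due to sampling error, at a given tau."""
    return sum(se**2 / (se**2 + tau**2) for se in std_errors) / len(std_errors)


def predictive_interval(estimates, std_errors, tau, z=1.96):
    """Plug-in interval for theta_new ~ N(mu, tau^2) in a new study, treating
    tau as known and using a flat prior on mu."""
    w = [1.0 / (se**2 + tau**2) for se in std_errors]
    mu = sum(wi * y for wi, y in zip(w, estimates)) / sum(w)
    sd = math.sqrt(1.0 / sum(w) + tau**2)  # uncertainty in mu plus heterogeneity
    return mu - z * sd, mu + z * sd


# Hypothetical effects (SD units) and standard errors for six studies
est = [0.25, 0.05, 0.30, -0.02, 0.15, 0.10]
ses = [0.06, 0.05, 0.08, 0.07, 0.05, 0.09]
for tau in (0.02, 0.05, 0.10, 0.20):
    lo, hi = predictive_interval(est, ses, tau)
    print(f"tau={tau:.2f}  pooling={pooling_factor(ses, tau):.2f}  "
          f"new-study interval=({lo:.2f}, {hi:.2f})")
```

Six studies can rarely distinguish among values of τ like these. Across that range, this pooling factor moves across much of the unit interval and the predictive interval widens considerably, mirroring the imprecision described above.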

Jules Gazeaud is a Postdoctoral Fellow at J-PAL Middle East and North Africa.

Pauline Castaing is an ETC in the World Bank’s Development Data Group in the Data Production and Methods Unit.

Thanks to David McKenzie and Witold Więcek for helpful comments as we developed this blog post.


Authors

Development Impact Guest Blogger

Guest bloggers


Pauline Castaing

Economist, Living Standards Measurement Study (LSMS), World Bank

