Published on Development Impact

Issues to consider around using repeated cross-sections in a clustered experiment

This post is co-authored with Sarah Baird and Jennifer Seager. Both are economists in the Department of Global Health at George Washington University.

In a cluster-level RCT, assignment to treatment is randomized at the level of a community, school, clinic, or other cluster. The standard practice is to then conduct a baseline survey of units (households, students, patients, etc.) and then re-interview these same individuals in a follow-up survey. But in an experiment that Sarah and Jennifer are currently designing, repeated cross-sectional surveys of community members are instead being considered. Upon reflection, this is a situation that may arise in a broader variety of clustered experiments, so we thought it useful to discuss some of the design issues and considerations that arise.

Why might one use repeated cross-sections rather than a panel?

A first situation that comes to mind is when researchers are using data collected by others. For example, if governments randomly choose some individuals from a community to survey for a labor force survey or some patients to conduct patient interviews with, then the data available at baseline will be for a different set of individuals than those available at follow-up. Here it is not a researcher choice, but rather dictated by data availability.

The second situation is when it is researcher/implementor choice. Here there could be several reasons for considering a fresh sample at follow-up rather than re-interviewing the same baseline sample:

(a)    Using the baseline to understand which characteristics to target at follow-up:  in a microfinance experiment, Banerjee et al. did a random sample of households at baseline. However, they then realized that borrowing was a relatively rare event, so a random sample would end up having few households that actually borrowed (took up treatment) – so they did an entirely new census at follow-up and used it to select a follow-up sample of households with characteristics that they thought were more likely to borrow.

(b)    Concerns about a baseline survey changing future responses: this could come about in two ways. The first is attrition – maybe the survey is burdensome enough that people might be willing to do it once, but many would not want to do it a second time. A second concern would be one of Hawthorne or Social Desirability Bias effects – perhaps merely asking people something in a baseline survey could change their subsequent behavior (e.g. more likely to seek help after a survey on violence), or change how they report behavior in a future survey even if their actual behavior has not changed.

(c)     Costs: There are costs associated with tracking individuals over time, which may become prohibitively expensive in highly mobile populations, particularly when mobile phone access is low.

Considering repeated cross-sections when evaluating IPV interventions

So we now consider this in Jennifer and Sarah’s situation looking at intimate partner violence (IPV). First, a little background. Globally, close to a third of women aged 15-49 who have been in a relationship have been subjected to physical or sexual IPV in their lifetime. This not only has profound direct impacts on the woman herself, but also imposes much broader economic and social costs on society. Thus, it is critical to find scalable and cost-effective solutions to prevent violence against women and girls (VAWG). One initiative aiming to do just that is the ‘What Works to Prevent Violence – Impact at Scale’ Programme, funded by the UK’s Foreign, Commonwealth and Development Office (FCDO), which aims to scale up and evaluate promising interventions to prevent VAWG. The research consortium is led by the Global Women’s Institute at George Washington University and is tasked with identifying evidence-based approaches to preventing VAWG, evaluating their impact, and strengthening governments’ and other institutions’ ability to deliver scaled-up, innovative and effective programs to various populations.

Part of their mandate is a set of five initial RCTs (followed by more in future years) to evaluate promising interventions. While all evaluations require careful ethical consideration, violence research presents a very specific set of challenges given the sensitive nature of the topic and the potential to put already vulnerable women at further risk. To mitigate the risks there are established WHO protocols that include, for example, interviewing only one out of every five households, and only asking violence questions to one person in a household. With violence research, there are concerns over longitudinal data collection related to attenuation bias due to Hawthorne effects and social desirability, though recent research suggests these concerns may be overemphasized (Jewkes et al., 2020). There are also concerns that asking women about their experience of violence puts them at risk of retaliation by abusers, and that repeating interviews puts them at further risk. For these reasons, when looking at the impacts of an intervention on community-level IPV, a repeated cross-sectional design is being considered. This got us thinking about what the trade-offs are in doing this, what one should be considering when thinking about a panel vs. a repeated cross-sectional design, what this means for power, and, if you do a repeated cross-sectional design, what the purpose of baseline data collection is.

Factors to consider with repeated cross-sections and whether a baseline helps at all in this case

As noted above, the main reasons for considering repeated cross-sections instead of a panel are to help address potential concerns around respondent burden, a baseline changing future results, and concerns around costs of tracking respondents over time. But this then raises the question of whether it is worth even doing a baseline survey if it is going to be of different respondents, and how such data could be used.

We see a baseline of different individuals as having the following main uses:

·         Giving estimates of community level prevalence. These could be used for targeting of the program (e.g. if the baseline includes more communities than the program plans on having in the final sample, then one may want to screen out the communities with low initial IPV prevalence and focus on higher prevalence places). They could also be used for heterogeneity analysis (e.g. Do impacts vary in higher versus lower baseline prevalence communities?). In other settings the cluster-level variables of interest may be those available from administrative sources without needing to do a separate baseline survey (e.g. population characteristics from a census, or school characteristics from a Ministry of Education).

·         Giving community level baseline characteristics that could be used for stratifying the random assignment, for showing baseline balance, and as community-level controls in the follow-up regressions to boost power. How much power gain you get will depend on how closely correlated experience of IPV is within communities, which relates to the intra-cluster correlation (ICC). If there is a high ICC, then community is a big explainer of whether you experience violence, and being able to control for community level characteristics should help in predicting endline outcomes even for other women in the same communities. In contrast, if you are working with a group of relatively homogeneous communities, or if violence is more related to idiosyncratic or household-specific factors, having community characteristics won’t help much for power.
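To make the ICC point concrete, here is a minimal sketch in Python. It rests on a simple variance-components model that is our own simplifying assumption, not part of the actual study design: the outcome is standardized to variance one, a share `icc` of that variance is a community component, the community component persists between waves with correlation `cluster_autocorr`, and baseline and follow-up interview different people. Under those assumptions, controlling for the baseline community mean in a regression on community means scales the residual variance by roughly one minus the squared correlation between the two means. For a binary outcome such as IPV prevalence this is only an approximation, and all function and parameter names are hypothetical.

```python
def cross_section_correlation(icc, m_base, m_follow, cluster_autocorr=1.0):
    """Correlation between baseline and follow-up community means when the
    two waves interview different individuals (assumed illustrative model)."""
    var_base = icc + (1 - icc) / m_base        # variance of the baseline mean
    var_follow = icc + (1 - icc) / m_follow    # variance of the follow-up mean
    cov = cluster_autocorr * icc               # only the community part is shared
    return cov / (var_base * var_follow) ** 0.5


def variance_ratio(icc, m_base, m_follow, cluster_autocorr=1.0):
    """Residual variance with the baseline community mean as a control,
    relative to the variance without it (smaller means more power)."""
    r = cross_section_correlation(icc, m_base, m_follow, cluster_autocorr)
    return 1 - r ** 2


# Assumed example: 20 respondents per community in each wave.
for icc in (0.01, 0.05, 0.20):
    print(f"ICC={icc:.2f}: variance ratio = {variance_ratio(icc, 20, 20):.2f}")
```

With these assumed inputs the ratio falls from about 0.97 at an ICC of 0.01 to about 0.31 at an ICC of 0.20, which is the sense in which a baseline of different individuals helps mainly when the ICC is high.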

However, not re-surveying the same individuals has important trade-offs:

·        The possibility of the population changing between baseline and follow-up. Normally we would worry about this with respect to migration, but it captures concerns about any reason individuals might leave the available sample. We might be concerned that those experiencing a recent incidence of violence may be too ashamed to answer a follow-up survey, or they may have moved out of the community to flee abusive partners. The only individuals available for a new follow-up survey will be those that this has not happened to. With a panel survey we would at least know that they had attrited and potentially be able to track some of them, whereas with a repeated cross-section you will not be able to learn whether this has happened. But, on the other hand, a new sample allows you to explore those that move to the community, which could include key demographics of interest – newly married women (as they often move to the husband’s community in many settings), and younger women (as opposed to the cohort that has aged over time).

·         Losing the power gains from individual data: these gains depend on how autocorrelated the outcome of interest is – so here, whether having experienced violence in the past 12 months is a strong predictor of being at higher risk of experiencing it again in the next year (likely in the case of violence). If this is the case, then you lose a lot of statistical power by not tracking the same individuals. In contrast, if violence is more random, and any given woman in the community has a similar risk of experiencing violence next year, then the cross-section will be as good for power as the individual panel (unlikely in the case of violence).
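The role of individual-level autocorrelation can be illustrated with a small Python sketch. It assumes, purely for illustration, a standardized outcome made up of a community component (share `icc` of the variance, persisting across waves with correlation `cluster_autocorr`) and an individual component that persists with correlation `indiv_autocorr` for the same person. In a panel the individual component is shared between the baseline and follow-up community means; in a repeated cross-section it is not. The names and example values are hypothetical, and for a binary outcome the formula is only an approximation.

```python
def mean_correlation(icc, m, cluster_autocorr, indiv_autocorr, panel):
    """Correlation between baseline and follow-up community means with m
    respondents per community per wave (assumed illustrative model)."""
    # Individual persistence only carries over if the same people are re-interviewed.
    shared_individual = indiv_autocorr if panel else 0.0
    cov = cluster_autocorr * icc + shared_individual * (1 - icc) / m
    var = icc + (1 - icc) / m                  # variance of a community mean
    return cov / var


def relative_variance(icc, m, cluster_autocorr, indiv_autocorr, panel):
    """Residual variance after controlling for the baseline community mean,
    relative to no baseline control."""
    return 1 - mean_correlation(icc, m, cluster_autocorr, indiv_autocorr, panel) ** 2


# Assumed example values: ICC of 0.05, 20 respondents, community autocorrelation 0.8.
for rho_i in (0.0, 0.3, 0.6):
    panel = relative_variance(0.05, 20, 0.8, rho_i, panel=True)
    cross = relative_variance(0.05, 20, 0.8, rho_i, panel=False)
    print(f"individual autocorrelation {rho_i:.1f}: panel {panel:.2f}, cross-section {cross:.2f}")
```

When the individual autocorrelation is zero the two designs give the same relative variance under this model, and the gap in favor of the panel widens as the autocorrelation rises.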

Is hybrid the best of both worlds?

Given the above discussion, a third option is to do a rotating panel, where the follow-up survey includes tracking a subsample of the original respondents, as well as surveying a brand new sample. This would allow one to test and document how much attrition/migration out of the sample frame there is, to test for a combination of Hawthorne and Social Desirability Bias effects by seeing whether responses differ for those previously surveyed, as well as to measure inconsistencies in reporting over time (e.g., Loxton et al., 2019). Banerjee et al. (p.33) note “in retrospect it was a clear mistake not to attempt to systematically re-survey at least a fraction of the baseline sample, even though the baseline sampling frame was weak”.

While this seems sensible, it also involves some potential risks. For example, if the panel data component were to show that those experiencing the highest incidences of IPV actually leave the community, this would make the new cross-sectional component biased and difficult to use. Conversely, if there are no changes due to attrition, but we see that responses differ according to whether you have been surveyed before, we might then worry about using the panel component. So if the overall follow-up sample size is kept the same as the baseline, but split between the rotating panel and refresher samples, one could end up with a smaller sample that you trust for the follow-up analysis. If IPV is highly autocorrelated at the individual level, there will still be less power than with a complete panel, since the baseline individual level data will only be there for a subsample.

In the end, authors (and implementing partners) will then have to weigh up these different considerations, and simulate some power calculations for the ICC and autocorrelations they expect in their sample to help guide their decision.  Jennifer and Sarah are still working with the broader team to better understand their context and make a final decision for the design given the factors at play.
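As a rough illustration of what such a simulation might look like, here is a minimal Monte Carlo sketch in Python using NumPy. It is not the calculation the team is running: it assumes a standardized continuous outcome rather than a binary IPV indicator, equal-sized communities, an analysis on community means that controls for the baseline community mean, and a normal critical value of 1.96. The function name, parameters, and default values are all hypothetical placeholders to be replaced with values expected in a given context.

```python
import numpy as np


def simulate_power(n_clusters=60, m=20, icc=0.05, cluster_autocorr=0.8,
                   indiv_autocorr=0.5, effect=0.2, panel=True,
                   n_sims=500, seed=0):
    """Share of simulated trials rejecting a zero treatment effect at
    roughly the 5% level (n_clusters is assumed to be even)."""
    rng = np.random.default_rng(seed)
    treat = np.repeat([0.0, 1.0], n_clusters // 2)   # half control, half treated
    rejections = 0
    for _ in range(n_sims):
        # Community components, correlated across the two waves.
        c0 = rng.normal(0, np.sqrt(icc), n_clusters)
        c1 = cluster_autocorr * c0 + rng.normal(
            0, np.sqrt(icc * (1 - cluster_autocorr ** 2)), n_clusters)
        # Individual components at baseline.
        e0 = rng.normal(0, np.sqrt(1 - icc), (n_clusters, m))
        if panel:
            # Same people re-interviewed: individual component persists.
            e1 = indiv_autocorr * e0 + rng.normal(
                0, np.sqrt((1 - icc) * (1 - indiv_autocorr ** 2)), (n_clusters, m))
        else:
            # Fresh cross-section: new, independent individual draws.
            e1 = rng.normal(0, np.sqrt(1 - icc), (n_clusters, m))
        ybar0 = c0 + e0.mean(axis=1)
        ybar1 = c1 + e1.mean(axis=1) + effect * treat
        # Regress follow-up community means on treatment and baseline means.
        X = np.column_stack([np.ones(n_clusters), treat, ybar0])
        beta, *_ = np.linalg.lstsq(X, ybar1, rcond=None)
        resid = ybar1 - X @ beta
        sigma2 = resid @ resid / (n_clusters - X.shape[1])
        se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
        rejections += int(abs(beta[1] / se) > 1.96)
    return rejections / n_sims


# Compare the two designs under the same assumed parameters.
for design, is_panel in (("panel", True), ("repeated cross-section", False)):
    print(design, simulate_power(panel=is_panel, n_sims=200))
```

Varying `icc`, `cluster_autocorr`, and `indiv_autocorr` over the range one considers plausible gives a sense of how much power is at stake in the choice between designs; a hybrid design could be approximated by mixing the two branches for different subsamples.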

David McKenzie

Lead Economist, Development Research Group, World Bank
