An article titled “Synthetic control arms can save time and money in clinical trials” that I read last month discusses how drug trials can be made faster and cheaper by using data collected from real-world patients instead of recruiting a control group, hence the term “synthetic controls.” The proliferation of digital data in the health sector, such as “…health data generated during routine care, including electronic health records; administrative claims data; patient-generated data from fitness trackers or home medical equipment; disease registries; and historical clinical trial data” makes such designs increasingly feasible. Combined with the fact that large amounts of time and money are spent on clinical trials, the option is attractive to researchers, drug companies, and patients awaiting new treatments alike.
Given that “big data” is the buzzword in the social sciences these days and all of us are making more use of administrative (or other non-survey) data, reading the article made me wonder under what circumstances I might consider forgoing recruiting a control group for an intervention trial. Below are my initial thoughts, none of which have been thoroughly thought through, discussed with colleagues, or vetted. Comments and pointers to existing literature are welcome as usual…
First, what do we mean by a “synthetic control group” in this setting? It essentially means that I want a control group that is not randomly drawn, but rather “matched” to my intervention group. Say I am recruiting patients for a trial that is evaluating a new treatment. As people arrive at the clinic, instead of randomizing them into standard of care or the new treatment (C or T), I assign them all to the new treatment. Say that power calculations suggest that I need 128 individuals in each arm to detect the effect we are after and I reach that mark: now what?
The idea is that there is a pool of individuals out there with the same disease, eligible for the same treatment, for whom relevant (I am assuming, anonymized/de-identified) data already exist and are available to us (maybe with some effort, like manually going through patient charts). Now, all we have to do is comb that database to find 128 good matches to my treatment group to serve as the control group. [Note that if the data retrieval is costless, I can have a control group that is much larger, which would reduce the variance in that group and allow me to lower the sample size in T by a few people.]
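To make the matching step concrete, here is a minimal sketch of that “comb the database” idea: greedy 1:1 nearest-neighbor matching of a recruited treatment group against a large pool of existing records. Everything here is simulated and hypothetical (the two covariates, the pool size, the distributions); it only illustrates the mechanics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: two covariates (say, age and a baseline severity
# score) for the 128 recruited treatment patients and for a large pool
# of existing patient records.
treated = rng.normal(loc=[55.0, 10.0], scale=[8.0, 2.0], size=(128, 2))
pool = rng.normal(loc=[54.0, 10.5], scale=[10.0, 2.5], size=(5000, 2))

# Standardize so each covariate contributes comparably to the distance.
mu, sd = pool.mean(axis=0), pool.std(axis=0)
t_std, p_std = (treated - mu) / sd, (pool - mu) / sd

# Greedy 1:1 nearest-neighbor matching without replacement: each treated
# patient takes the closest still-available record from the pool.
available = np.ones(len(pool), dtype=bool)
matches = []
for t in t_std:
    d = np.linalg.norm(p_std - t, axis=1)
    d[~available] = np.inf
    j = int(np.argmin(d))
    available[j] = False
    matches.append(j)

controls = pool[matches]
print(len(matches), "controls selected from the records pool")
```

In practice one would match on many more covariates (or a propensity/prognostic score), enforce a caliper, and audit balance afterwards; the point is only that once the records database exists, pulling the 128 matches is computationally cheap.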
Now, you might ask: “Does anybody believe matching anymore?” My thought on this is that it depends: if we’re talking PSM diff-in-diff, then I might say no (while recognizing that there might have been no palatable alternative in a particular case). You can’t replace an RCT with that without a real trade-off. But what if we’re talking about high-frequency (say, weekly or monthly) data going back a few years on a large number of people? What if I can use a method that shows you that both the relevant covariates and the lagged primary outcome indicator track the treatment group perfectly over a long period of time? Not only can I show baseline balance, but also identical trends pre-baseline: would you be more convinced then? I possibly would, especially if the alternative is an RCT that may come with a significant delay in reaching time-sensitive findings and/or cost millions more.
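One way to operationalize that pre-baseline check, sketched below on simulated data (the panel, its dimensions, and the common trend are all hypothetical): compute month-by-month standardized mean differences between the treatment group and the matched controls over the entire pre-period, not just at baseline.

```python
import numpy as np

rng = np.random.default_rng(1)
months = 36  # three years of monthly pre-baseline data (hypothetical)

# Hypothetical monthly outcome panels: rows = patients, cols = months.
# Both groups share a common underlying trend by construction here;
# the question in practice is whether the data show this.
common_trend = np.cumsum(rng.normal(0, 0.2, size=months))
treat_panel = common_trend + rng.normal(0, 1.0, size=(128, months))
ctrl_panel = common_trend + rng.normal(0, 1.0, size=(128, months))

# Month-by-month standardized mean differences in the pre-period:
# if the matched controls track the treatment group, these stay small
# in every month, not just at baseline.
diff = treat_panel.mean(axis=0) - ctrl_panel.mean(axis=0)
pooled_sd = np.sqrt((treat_panel.var(axis=0) + ctrl_panel.var(axis=0)) / 2)
smd = diff / pooled_sd
print("max |SMD| across pre-baseline months:", float(np.abs(smd).max()))
```

A balance table at baseline is one number per covariate; this is 36 numbers per outcome, which is a much harder bar for a bad control group to clear.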
Readers keeping up with the literature on causal inference will notice that we seem to be talking about an intersection of diff-in-diff models and synthetic control models (à la Abadie et al.). The idea is that a marriage of these two literatures, i.e. treatment effects under ‘unconfoundedness’ and the synthetic control literature, may give us much more confidence when we want to consider this alternative. Fortunately for us, Athey et al. (2018) have brought us “Matrix Completion Methods for Causal Panel Data Models,” which discusses this very issue and proposes a new estimator that not only does well when we have large N and T, but also outperforms the alternatives when one of the two dimensions is much larger than the other.
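The synthetic control half of that marriage can be sketched in a few lines: choose nonnegative weights summing to one so that a weighted combination of donor-pool units reproduces the treated unit’s pre-period trajectory. The toy example below (simulated data, hypothetical dimensions) solves that constrained least-squares problem with projected gradient descent. It illustrates the Abadie-style weighting idea only; it is not the Athey et al. matrix completion estimator.

```python
import numpy as np

rng = np.random.default_rng(2)
T_pre, n_pool = 36, 50  # pre-period length, donor-pool size (hypothetical)

# Hypothetical pre-period outcomes: donor pool, and a treated unit that
# is (by construction) a sparse convex combination of donors plus noise.
Y_pool = rng.normal(size=(T_pre, n_pool))
true_w = np.zeros(n_pool)
true_w[:3] = [0.5, 0.3, 0.2]
y_treat = Y_pool @ true_w + rng.normal(0, 0.05, size=T_pre)

def project_simplex(v):
    """Euclidean projection onto the simplex {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - 1)[0][-1]
    return np.maximum(v - (css[rho] - 1) / (rho + 1), 0)

# Projected gradient descent for min ||y_treat - Y_pool @ w||^2
# subject to w lying on the simplex.
w = np.full(n_pool, 1.0 / n_pool)
step = 1.0 / (2 * np.linalg.norm(Y_pool, 2) ** 2)
for _ in range(2000):
    w = project_simplex(w - step * (Y_pool.T @ (Y_pool @ w - y_treat)))

rmse = float(np.sqrt(np.mean((Y_pool @ w - y_treat) ** 2)))
print("pre-period fit RMSE of the synthetic control:", rmse)
```

A good pre-period fit here is the panel-data analogue of the balance-plus-pre-trends evidence discussed above: the synthetic unit has to track the treated unit for the whole pre-period, not just match it on a baseline snapshot.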
What are some of my questions/worries?
- I would probably want my control group drawn from a pool of people who did not have the option to enroll in the trial. This is because I would worry about unobserved characteristics driving the differences in outcomes (“why did the controls not step forward and get recruited to participate in the trial?”). So, pick them from faraway areas, perhaps.
- As mentioned in the STAT news article linked above, the “standard of care” would have to be well-defined and stable. If people are undergoing a variety of different treatments, it will be harder to find good matches.
- A few years back, I attended a causal inference workshop and Abadie said something to the effect of “There is no guarantee that [synthetic controls] will work, but you are free to try.” This recognizes the heavy data requirements of this method, with data on each observation going back a long way and preferably at high frequency. What if I recruit a bunch of people for my trial but can find no good matches (synthetic controls) for them? Methods are needed to do ex ante simulations to show that this risk is minimal. [If regulatory bodies like the FDA are going to approve drugs on the basis of such methods, you cannot have an exercise where the balancing between T and C is only so-so, rather than what you might expect from, say, a block-stratified random assignment…]
- We thought about doing something like this in an upcoming trial with 180 health clinics, where we have monthly data on the outcomes of interest going back a good while. So, we could have randomly selected some clinics for our intervention, left the others out of the study (note, however, that the groups would still be randomly assigned – unlike the suggestion of finding real-world synthetic controls), and used publicly available administrative data to evaluate the outcomes. However, we wanted a little more data than was publicly available, and we wanted the data to be consistent across treatment arms. That immediately settled the debate in favor of recruiting all facilities and harmonizing data collection and quality across them, rather than saving some money and worrying about data and interpretation issues.
- We might also value the placebo treatment in our trial. For example, if you are switching from a pen-and-paper way of doing something (say, medical records) to a digital one (say, tablet-based) and your intervention comes with the tablet, you would like to be able to isolate the effect of the intervention from that of simply using the tablet, which requires giving everyone tablets and varying the software (placebo vs. the real intervention).
- I get that you may have all the data you need before the trial, but what if you need data from your control group after the trial and some of your synthetic controls are lost to follow-up (they no longer show up in your data source)? With anonymized data and no consent, you are not allowed to contact these people for follow-up, so you are at the mercy of these data systems. That requires a large amount of faith in data sources over whose collection you have no control.
- Related to this, you could use data entirely from people in the past, including people whose final outcomes have already been observed. But can people from the past be a good control group? I think you would have to assume stationarity of some sort – otherwise, they are not good counterfactuals for what would happen to your subjects in today’s trial in the absence of the new treatment.
- Finally, would registries like clinicaltrials.gov or AEA Social Science Registry or journal editors accept your trial as a proper trial without the randomization?
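The ex ante simulation mentioned among the worries above – checking, before recruiting anyone, how often a trial patient would have no acceptable match in the records pool – could be crudely sketched as follows. All parameters here (pool size, caliper, the mild shift between trial and pool populations) are hypothetical, and the data are simulated.

```python
import numpy as np

rng = np.random.default_rng(3)

def match_failure_rate(n_trial=128, n_pool=5000, caliper=0.25, n_sims=200):
    """Hypothetical ex ante simulation: across repeated draws, what share
    of trial patients has no pool record within `caliper` standardized
    distance on the matching covariates?"""
    failures = []
    for _ in range(n_sims):
        trial = rng.normal(size=(n_trial, 2))
        # Records pool drawn from a mildly shifted population.
        pool = rng.normal(loc=0.2, size=(n_pool, 2))
        # Distance from each trial patient to every pool record,
        # then the distance to each patient's nearest record.
        d = np.linalg.norm(trial[:, None, :] - pool[None, :, :], axis=2)
        failures.append(np.mean(d.min(axis=1) > caliper))
    return float(np.mean(failures))

rate = match_failure_rate()
print("share of trial patients without a within-caliper match:", rate)
```

If this rate is non-negligible under realistic assumptions about the pool, the design is in trouble before it starts; that is exactly the kind of evidence a regulator should want to see ex ante.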
Addendum [3/28/2019]: Susan Athey kindly referred me to three papers that might be useful for motivated readers (I might do a follow-up blog once I read them OR we might ask her to do a guest post...).
- A paper on the impact of the shutdown of Google News in Spain takes a similar approach of constructing a control group selected from a large set of users.
- In terms of machine learning methods for this, she'd now recommend combining their "synthetic difference in differences" approach with the matrix completion approach mentioned in my post above.
- Ensemble Methods for Causal Effects in Panel Data Settings (forthcoming in AEA P&P)