For World AIDS Day, there is a sign at the World Bank stating that taking ARVs reduces the rate of HIV transmission by 96%. Had this been last year, a sign somewhere may well have read “A cheap microbicidal gel that women can use up to 12 hours before sexual intercourse reduces HIV infection risk by more than half – when used consistently.” Sadly, it turns out, so much for that.
As the NYT and Nature reported last week, a separate trial called VOICE (Vaginal and Oral Interventions to Control the Epidemic), conducted at 15 trial sites in three African countries with a sample size of 5,000, concluded that the HIV incidence rate among the women using the gel was no better than in the placebo group. In contrast, last year’s promising trial, CAPRISA, was based at two sites in KwaZulu-Natal, South Africa, with a sample size of less than 1,000.
We will not know what happened until later in 2012: while the gel arm of the intervention has been discontinued, the study is continuing, and the data will not be unblinded until it ends. (In particular, Truvada, a combination pill, is still being studied – I guess it is promising that it did not get cancelled as well after the routine review of the data.) However, there are a few likely culprits.
The first that comes to mind is that tenofovir by itself is not effective in reducing HIV infection risk, and that the findings from CAPRISA were a fluke.
Second is the possibility that the earlier CAPRISA trial had external validity issues: we tend to worry less about generalizability when it comes to medical trials, but we probably should pay more attention to it. It is possible that the initially HIV-negative women enrolled in that trial look quite different from those in the VOICE trial that got cancelled. When we offer cash transfers to individuals in our studies, we don’t get much selection bias – not many people say no to cash with no strings attached. But when we try to enroll people to test a new drug, especially for STIs, that’s a different story: it’s hard to believe that the people who enrolled are a representative sample of all sexually active women in the target age group. For example, eligibility criteria in CAPRISA included using a non-barrier form of contraception (I don’t know if a similar criterion existed in VOICE). Actual HIV incidence was also lower than that extrapolated from prevalence studies in the study area.
Third, in the CAPRISA trial, the success rate was much higher for people who used the gel regularly, falling to just a 28% reduction in HIV incidence among people who used it less than 50% of the time.
Of course, this makes sense: you have to take a drug for it to work – the vitamin D pills sitting in my bathroom cabinet are doing me no good because I keep forgetting to take them every morning (however, the gummy bear version my partner smartly purchased is helping).
So, it is possible that the drug works, but people did not use it consistently or properly in the VOICE trial. As Stefano Bertozzi nicely put it in a recent talk, “biomedical interventions still need behavior change.” This explanation seems unlikely, however: even if people used the gel less than half the time (but significantly more than zero), the new trial, with its larger sample size, should have detected a decline. That did not happen – the incidence rates were identical in the placebo and gel trial arms. Unblinded data will hopefully help solve part of this puzzle. Alternatively, the drug simply is not effective alone and the earlier results were a statistical anomaly: either because the proposed biological pathway is not there, or because it only works for certain types of people and not everyone.
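To make the sample-size argument concrete, here is a rough back-of-the-envelope sketch of how the chance of detecting a diluted effect grows with the number of women per arm. It uses a standard normal approximation for comparing two proportions. The function name and every number in it (the 6% control-arm incidence, the arm sizes, the relative reductions) are hypothetical placeholders I made up for illustration, not figures from CAPRISA or VOICE.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control, relative_reduction, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    p_control: assumed cumulative incidence in the placebo arm (hypothetical)
    relative_reduction: assumed proportional drop in incidence in the gel arm
    n_per_arm: number of participants in each arm (equal arms assumed)
    """
    p_treat = p_control * (1.0 - relative_reduction)
    # Standard error of the difference between the two arm proportions.
    se = sqrt(
        p_control * (1.0 - p_control) / n_per_arm
        + p_treat * (1.0 - p_treat) / n_per_arm
    )
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Normal approximation; ignores the tiny chance of rejecting
    # the null in the wrong direction.
    return NormalDist().cdf(abs(p_control - p_treat) / se - z_crit)


if __name__ == "__main__":
    # Hypothetical scenarios only: a diluted effect at several arm sizes.
    for n in (450, 1000, 2500):
        print(n, round(power_two_proportions(0.06, 0.28, n), 2))
```

Plugging in a trial's actual arm sizes, follow-up incidence, and a plausible diluted effect would show how likely that trial was to pick up the effect if it were real.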
It is important to note, by the way, that FACTS 001, a Phase III trial testing the same regimen of tenofovir gel used in CAPRISA 004, plans to continue its study. FACTS 001 began enrolling participants in October and will involve approximately 2,200 women at up to nine sites in South Africa, with results expected in 2014. This suggests that there may be differences in the regimen (dosage, use, etc.) between VOICE and FACTS 001 that researchers will try to iron out. I wonder how much collaboration there was to coordinate these two studies so that we could learn as much as possible about the efficacy (and eventually effectiveness) of this medical intervention. Two studies on the same drug could likely benefit from such coordination ex ante, rather than from examining the differences between them ex post.
This brings me to the issue of trials in economics. As far as I can tell, the studies we run are akin to Phase II trials in medicine. They take place in a particular setting with a particular set of fixed parameters (that matter for the outcome of interest) and assess efficacy. We, like the biomedical community, get excited about positive Phase II trial results if the intervention is important: the first experimental evidence demonstrating that a certain poverty alleviation program works is very exciting. However, unlike the biomedical community, which runs Phase III trials immediately afterwards, we are much more laissez-faire and/or lackadaisical about following up our promising studies in other settings. I am not saying that this does not happen, but it happens slowly and haphazardly. This is partly because we are too quick to go from “we don’t have any good evidence on whether X works to improve Y” to “we already know that X doesn’t work to improve Y” after one study.
Again, I am not saying that follow-up studies to promising interventions don’t happen: follow-up studies in microfinance and some promising savings initiatives are ongoing. But we usually don’t have a coordinated effort to find out for whom and under what conditions a socioeconomic intervention works. After a successful efficacy trial, we need to launch studies in multiple sites that systematically keep some characteristics fixed while tweaking other important design elements. We can let a thousand flowers bloom (there are now a million cash transfer evaluations, for example), but our ability to synthesize the information from independently conceived studies would be much greater had some of these studies been designed as part of a coordinated effort to produce important knowledge for development. I have recently been finding out that raising funds for an ambitious multi-site follow-up study is not easy – nor is getting the necessary buy-in from all the different counterparts. In medicine, there are existing networks of sites that can be used for follow-up trials and, presumably, the private sector helps finance a lot of the costs of Phase III trials. We usually cannot expect either of these to be in place for structural interventions, hence much more effort by researchers and institutions is needed to make a large follow-up study happen.
Perhaps one idea is for donors to make a pot of money (and other resources) available to groups of researchers who have completed what are akin to Phase II trials, giving them incentives to collaborate (rather than compete or race to be the first) and to follow up on the efficacy results by conducting a larger, multi-country, longer-term evaluation of the promising intervention. USAID’s Development Innovation Ventures (started under the guidance of Michael Kremer) has hints of this, but the idea could be made much more explicit. Let’s give people incentives to follow up on their one-off studies in a systematic manner rather than moving on to the next shiny project.