A little while back, I blogged about a paper that traced the effects of having a gendered language through to the labor market outcomes of today. Today, I am writing about a much narrower version of this problem – and one near and dear to researcher hearts: grant applications. A fascinating new paper by Julian Kolev, Yuly Fuentes-Medel, and Fiona Murray looks at how we can still get gendered outcomes, even with a blind review process.
Let’s start with the data. Kolev and co. have access to data from the Grand Challenges Explorations program at the Bill and Melinda Gates Foundation: 6,794 applications from US-based researchers from 2008 to 2017, as well as the reviewers’ scores and the funding decisions. Now, the Gates Foundation uses a particular process that differs from many other funding options available to these life sciences researchers. First, reviews are blind – the reviewers don’t know whose grant application they are reading. Second, reviewers come from a range of fields, not always the narrow sub-field of the application. Third, all scoring is done independently – there is no conferring over divergent scores. And fourth, the scoring system is not a continuous measure: each reviewer can award one gold (per pool of about 100 applications) and five silvers. So it is a somewhat coarse measure. Kudos clearly go to the Gates Foundation for making these data accessible and putting all these details out there for researchers to use.
Kolev and co. start off with the simple fact that most of the proposals – 66 percent – are submitted by men. Beyond that, are women less likely to get funded in this blind review process? The answer is yes, by about 16 percent. But why? The process is blind, so the reviewers can’t know which applications come from female scientists.
Could it be the subject? Nope – this result holds controlling for topic-area fixed effects (within, of course, the broad category of life sciences). Maybe it’s the fact that most of the reviewers are men? Nope – the result holds when controlling for the gender of the reviewer, and, interestingly, the interaction of female reviewer and female applicant is positive (though the aggregate effect isn’t significant since there aren’t that many female reviewers – more on this below).
OK, maybe it’s experience or publications? Kolev and co. look at career length (measured from first publication) and indeed, the women in the sample are less senior than the men. They also have fewer publications (even controlling for career length), but not a lower share of top-journal publications (again controlling for career length). So Kolev and co. throw these variables into the regression for the probability of getting funded. And lo, publications do help you get funded (especially those top journals). And gender? Still significant and negative, even when controlling for all the measures of career length and publications. So it doesn’t appear to be publications or experience.
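To make the setup concrete, here is a minimal sketch (my reading of the description above, not the authors’ code) of this kind of funding regression. The column names are illustrative placeholders, not the variables actually used in the paper:

```python
# Sketch of a funding regression with topic-area fixed effects, reviewer gender
# (and its interaction with applicant gender), and career/publication controls.
import statsmodels.formula.api as smf

def funding_gap(applications):
    """applications: pandas DataFrame, one row per proposal, with illustrative
    columns: funded (0/1), female_applicant, female_reviewer, topic_area,
    career_years, n_publications, share_top_journal_pubs."""
    model = smf.ols(
        "funded ~ female_applicant * female_reviewer + C(topic_area)"
        " + career_years + n_publications + share_top_journal_pubs",
        data=applications,
    ).fit(cov_type="HC1")                     # heteroskedasticity-robust errors
    return model.params["female_applicant"]   # the gender gap after controls
```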
Maybe it’s persistence? It turns out repeat applications tend to score higher (this article is full of useful, data-driven tips on how to increase one’s odds of getting funded). Women do reapply less, but this appears to be driven by experience, and indeed, when Kolev and co. put reapplication into the regression, the gender effect stays negative and significant.
Maybe it’s how the proposals are written? Kolev and co. turn to textual analysis and look at the words used by men and women. These turn out to be highly correlated, but there are some differences. To get a handle on this, Kolev and co. single out “’narrow’ words (those which appear significantly more often in some topics than others) and ‘broad’ words (which appear at similar rates in all topic areas).” Female applicants tend to use more of these “narrow” words, while male applicants tend to use more of the “broad” ones. (It’s important to note that these are relative concepts of narrow and broad, driven by the proposals themselves, not some broader notion.)
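For readers who want the mechanics, here is a rough sketch of how such a narrow/broad split could be made; this is an illustration of the idea, not the paper’s exact procedure. It flags a word as “narrow” when its frequency varies across topic areas more than chance would suggest:

```python
# Sketch: split the vocabulary into "narrow" words (frequency varies significantly
# across topic areas) and "broad" words (similar rates everywhere).
from collections import Counter
from scipy.stats import chi2_contingency

def classify_words(proposals, alpha=0.001):
    """proposals: list of (topic_area, list_of_tokens) pairs.
    Returns (narrow_words, broad_words) as two sets."""
    topics = sorted({topic for topic, _ in proposals})
    counts = {t: Counter() for t in topics}            # word counts per topic
    for topic, tokens in proposals:
        counts[topic].update(tokens)
    totals = {t: sum(counts[t].values()) for t in topics}
    vocab = set().union(*(counts[t] for t in topics))

    narrow, broad = set(), set()
    for word in vocab:
        # 2 x n_topics table: occurrences of this word vs. all other words
        table = [[counts[t][word] for t in topics],
                 [totals[t] - counts[t][word] for t in topics]]
        _, p_value, _, _ = chi2_contingency(table)
        (narrow if p_value < alpha else broad).add(word)
    return narrow, broad
```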
Kolev and co. then take words associated with higher or lower scores and use these to predict scores, calibrating the word-score relationship on male proposals only. Both broad and narrow words seem to make a dent in the gender coefficient, but narrow words are the ones that reduce the gender effect to insignificance (and to less than half its baseline size). So a large chunk of the gender disparity in scores appears to be driven by word choice. Interestingly, including the word-choice variables (as well as a host of other grammatical controls) doesn’t budge the positive effect of female reviewers for female applicants – which suggests that female reviewers are forming their scores based on other factors.
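Purely as an illustration of the “calibrate on male proposals only” step (the ridge estimator and variable names here are my assumptions, not the paper’s choices): fit the word-to-score mapping on male-authored proposals, score every proposal with it, and let that predicted “word score” sit in the funding regression alongside the gender dummy.

```python
# Sketch: learn how word usage maps to reviewer scores using male-authored
# proposals only, then apply that mapping to all proposals.
from sklearn.linear_model import Ridge

def predicted_word_score(word_counts, scores, is_male):
    """word_counts: (n_proposals, n_words) array; scores: reviewer scores;
    is_male: boolean mask marking male-authored proposals."""
    model = Ridge(alpha=1.0)                           # regularised: many words, few proposals
    model.fit(word_counts[is_male], scores[is_male])   # calibration sample: men only
    return model.predict(word_counts)                  # predicted score for every proposal
```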
Kolev and co. go on to look at the impact of getting funding on future outcomes. They do this with a difference-in-differences among the proposals that received at least one positive review (silver or gold, as described above), comparing the funded with the non-funded proposals. It’s basically a crude regression discontinuity. Among this sample, funding on its own doesn’t do statistically significant wondrous things for publications – very little is significant. However, for women it makes a big difference, with the combined effect of being female and getting funded offsetting the negative female coefficient on outcomes such as top-journal articles and future large NIH funding (with the latter more than offset). As Kolev and co. put it, this funding seems to “level the playing field” for women.
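In regression terms, that comparison could be sketched along the following lines (again with hypothetical column names, and clustering by applicant as one plausible choice):

```python
# Sketch of the difference-in-differences: among positively reviewed proposals,
# compare pre/post outcomes of funded vs. unfunded applicants, and allow the
# funding effect to differ by applicant gender.
import statsmodels.formula.api as smf

def funding_effects(panel):
    """panel: applicant-period rows with illustrative columns: top_journal_pubs,
    post (after the funding decision), funded, female_applicant, applicant_id,
    restricted to proposals that received at least one silver or gold."""
    did = smf.ols(
        "top_journal_pubs ~ post * funded * female_applicant",
        data=panel,
    ).fit(cov_type="cluster", cov_kwds={"groups": panel["applicant_id"]})
    # post:funded is the funding effect for men; post:funded:female_applicant
    # is the additional effect for women.
    return did.params
```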
All in all, this is an interesting look behind the scenes at grant funding and how even a blind review process clearly still has some biases. It also raises a host of questions: what if more reviewers were women? What if this were a set of fields where more applicants – or even the majority – were women? What if there were a different scoring process? What if more funding programs made their data available so we could answer these questions?
(Full disclosure: the team I work with holds two grants from the Gates Foundation. Neither of them were in the life sciences. The writing of one was led by a woman. The other by a man. And next time we will use a higher share of verbs.)