There has been a lot of work recently on measuring women’s agency. Together with colleagues from a range of institutions that care a lot about gender and do a lot of surveys, we did a survey paper a couple of years back. More recently, Aletheia Donald and I did a post about some work showing how disagreement over decision making might be showing us other facets of power rather than just looking at individual responses.
An interesting new paper by Seema Jayachandran, Monica Biradavolu and Jan Cooper seeks to give us concrete guidance on how to measure agency more simply – especially when agency isn’t the main focus of the survey at hand. Jayachandran and co. take a very nice qual-quant approach to try and figure out what are the key questions you should be asking.
The key issue here is: what’s the gold standard against which these survey questions should be measured? Enter the qualitative work. Jayachandran and co. conduct 45-minute semi-structured interviews with a sub-sample of their respondents (more on the sample in a minute). These interviews cover five selected domains: women’s decision making around kids’ education, kids’ health and household expenditures, plus her own fertility and mobility. From these interviews they get a rich set of data that lets them pull out things like resistance in addition to the usual restrictions. (Cool methodological side note: during the qualitative work they hired a “distractor” who kept the rest of the household engaged in a discussion so interviews could be private.)
Jayachandran and co. and their research team then code these qualitative interviews to reduce the answers on domains to a 1-4 scale (fertility turns out to be more complicated and needed a two-stage process).
Now, in case you’re a diehard experimental economist and think this is all a bunch of malarkey, Jayachandran and co. also run a real-stakes game with their respondents to set up another potential standard against which to measure their survey questions. It turns out the game didn’t work well (I’ve had similar experiences – where games don’t correlate well with survey questions, but may line up with some behaviors – that’s a topic for a future post).
So putting aside the game and returning to the qualitative work, Jayachandran and co. now have a standard against which to measure our frequently used quantitative survey questions. To put together the candidate questions, they assemble a long list of questions that have been used to measure women’s agency by a number of reputable survey and research folks. They toss out questions that overlap and end up with 64 questions.
So, off to northern India to put this to the test. They sample married women with a child under 10 across 21 villages. 443 of them get the quantitative questions, with a random subset of 210 getting the qualitative questions as well.
Now, what to do with all of the data? Wait, did you say machine learning? Indeed, this is one of the approaches Jayachandran and co. use. They use LASSO stability selection, which takes repeated sub-samples of the survey data and looks at which questions do the best at predicting the results of the qualitative work. They limit the number of questions they want to end up with to 5 – since this is a reasonable add-on to a survey that isn’t focused on agency.
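For readers who want to see the mechanics, here is a minimal sketch of stability selection in Python. It is not the authors' code. The synthetic data (64 "questions", of which only the first five drive the score), the penalty `alpha`, the half-sample fraction, and the rule of keeping the five most frequently selected questions are all assumptions made for illustration, and the small coordinate-descent LASSO is a stand-in for whatever solver the paper used.

```python
import numpy as np


def lasso_cd(X, y, alpha, n_iter=50):
    """Plain coordinate-descent LASSO.

    Minimizes (1/2n)*||y - X b||^2 + alpha*||b||_1 for centered y and
    standardized X. Illustrative stand-in for a library solver.
    """
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.astype(float).copy()
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of column j with the partial residual
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            # Soft-thresholding update
            new = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def stability_selection(X, y, alpha=0.3, n_subsamples=30, frac=0.5, seed=0):
    """Share of random sub-samples in which LASSO keeps each question."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        Xs, ys = X[idx], y[idx]
        sd = Xs.std(axis=0)
        sd[sd == 0] = 1.0  # guard against constant columns
        Xs = (Xs - Xs.mean(axis=0)) / sd
        ys = ys - ys.mean()
        counts += lasso_cd(Xs, ys, alpha) != 0
    return counts / n_subsamples


def make_synthetic(n=210, p=64, n_true=5, seed=0):
    """Hypothetical data: only the first n_true questions matter."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))
    y = X[:, :n_true].sum(axis=1) + rng.normal(size=n)
    return X, y


X, y = make_synthetic()
freq = stability_selection(X, y)
top5 = np.argsort(-freq)[:5]  # assumed rule: keep the 5 most stable questions
print(sorted(top5.tolist()), np.round(freq[top5], 2))
```

The point of the repeated sub-sampling is that a question which only gets picked in a handful of draws is probably riding on noise, while one that survives nearly every draw is a safer bet for a short module.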
The second approach they take is backward sequential selection (aka how I get dressed in the time of COVID). This approach basically iterates on building an index from the survey question answers, regressing the qualitative score on that index, and at each step tossing out the question whose removal leads to the smallest loss of R-squared.
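The backward procedure is simple enough to sketch in a few lines of Python. Again, this is an illustration under stated assumptions rather than the authors' implementation: the index here is an equal-weight average of standardized answers, the data are synthetic (only the first five of 64 hypothetical questions matter), and the function names are mine.

```python
import numpy as np


def index_r2(Z, y, cols):
    """R-squared from regressing y on an equal-weight index of the columns.

    With a single regressor, R-squared is the squared correlation.
    """
    index = Z[:, cols].mean(axis=1)
    r = np.corrcoef(index, y)[0, 1]
    return r ** 2


def backward_selection(X, y, n_keep=5):
    """Drop one question at a time, always the one that costs the least R-squared."""
    sd = X.std(axis=0)
    Z = (X - X.mean(axis=0)) / np.where(sd == 0, 1.0, sd)
    kept = list(range(X.shape[1]))
    while len(kept) > n_keep:
        best_r2, drop = -1.0, None
        for c in kept:
            trial = [k for k in kept if k != c]
            r2 = index_r2(Z, y, trial)
            if r2 > best_r2:  # removing c leaves the highest R-squared
                best_r2, drop = r2, c
        kept.remove(drop)
    return kept, index_r2(Z, y, kept)


def make_questions(n=210, p=64, n_true=5, seed=0):
    """Hypothetical data: only the first n_true questions matter."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))
    y = X[:, :n_true].sum(axis=1) + rng.normal(size=n)
    return X, y


X, y = make_questions()
kept, r2 = backward_selection(X, y, n_keep=5)
print(sorted(kept), round(r2, 2))
```

One design caveat: an equal-weight index assumes every question is coded so that a higher answer means more agency. With real survey items you would need to orient them consistently before averaging.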
Interestingly, these two approaches converge on three of the top five questions (although they don’t rank them the same). These are:
· Is her opinion heard when an expensive item like a bicycle or cow is purchased for the household?
· Does she need permission from other household members to buy clothing for herself?
· Is she permitted to visit women in other neighborhoods to talk with them?
The LASSO stability approach also picks a question on whether she can buy things in the market without permission and another on who she has to consult on children’s health care. The backward sequential selection chooses instead whether she is allowed to go alone to meet her friends for any reason and who in the household decides to pay schools for a relative of hers.
Jayachandran and co. note that these are fairly specific. None of the more general questions (e.g. on a ladder, where do you see yourself…) show up in the top questions by either method. And they also provide a bit of a cost-benefit in terms of interview length. Using the top 5 questions explains about 27-29 percent of the variation in the qual measures. On the other hand, using all 64 of the questions that Jayachandran and co. tested (and which take 45 minutes to administer) explains 53 percent of the variation.
So this is helpful work – giving us some core questions that seem to capture a chunk of much more in-depth and richer qualitative measures. However, as Jayachandran and co. note, this is from one sample in northern India. And an obvious next step would be to take this to another context. And to that you can bring their neat new approach of MASI – machine learning and semi-structured interviews. A couple of other things to think about. First, this is a measure of stock of agency, not changes. It’s altogether another question to ask which are the most dynamic components of agency. And that brings us to using this in the context of an impact evaluation – not only would you want to think about what would change, but also in which domain(s). The quest continues.
Our group at IFPRI that has been working on measuring empowerment (Ruth Meinzen-Dick, Hazel Malapit, Greg Seymour, Jessica Heckert, and myself) was very excited about this paper (we are a mixed-methods group of economists, a sociologist, and a social demographer). We really like the idea of using the qualitative work as the “gold standard” against which individual items (questions) will be judged. The qualitative work indeed seems much more appropriate as a gold standard for local understandings of agency than the lab-in-the-field game. We thought that the machine learning approach was novel, and it gave us ideas for validating our own measures (pro-WEAI) as well, although we are using more standard psychometric validation approaches. What we are less optimistic about is using the 5-item scale in a context that’s very different from the Indian villages where the qual work and the survey were conducted.
One of the strengths of qualitative work is its ability to capture what people really think in their particular contexts. However, the findings aren’t necessarily transferable because gender norms and definitions of agency are so context specific. In our qualitative work for pro-WEAI, we found certain things that emerged in common across contexts, but many that differed, particularly in the extent to which taking decisions alone or in consultation with others was considered empowering. So, if the ML identifies three specific decisions as important, these may not necessarily translate to other contexts (I can see large household purchases and items in the market being transferable to other contexts, but not necessarily clothing for the woman, especially in societies with "separate purses"). Agency over physical mobility (whether she can visit women in her neighborhood without permission, and her children's health care) may not be relevant in contexts where women have more freedom of movement. For example, there was an interesting study with widows in Uganda (Naybor et al. 2016) that used GIS trackers to understand their mobility. The authors found that the widows took tons of short nearby trips (neighbors, nearby market/store, etc.). However, the widows didn't travel far or for long periods of time. So basically, the questions above would miss where these women experience limitations. In terms of comparing the qual work with the extensive set of questions, would the results not also depend on what this original set of questions contained?
All in all, while we like the approach of using qualitative work as a gold standard to compare question sets to, we are understandably cautious about the recommendation to use the 5 items in other studies outside of India. We’re sure that many researchers will use this as a chance to validate this question set against others; we hope that no one will take the particular 5 items as the final word, rather than replicating the process for their context. I guess I would say that there should be different “gold standards” for different contexts to reflect the context-specificity of gender norms and definitions of empowerment/agency. I would suggest stressing the importance of the approach, rather than necessarily the 5 items specifically, for measuring women’s empowerment.
Naybor, D., Poon, J. P. H., & Casas, I. (2016). Mobility Disadvantage and Livelihood Opportunities of Marginalized Widowed Women in Rural Uganda. Annals of the American Association of Geographers, 106(2), 404–412.