Published on Development Impact

List Experiments for Sensitive Questions – a Methods Bleg

Berk Özler

May 08, 2017

About a year ago, I wrote a blog post on issues surrounding data collection and measurement. In it, I talked about “list experiments” for sensitive questions, about which I was not sold at the time. However, now that I have a bunch of studies going to the field at different stages of data collection, many of which are about sensitive topics in adolescent female target populations, I am paying closer attention to them. In my reading and thinking about the topic and how to implement it in our surveys, I came up with a bunch of questions surrounding the optimal implementation of these methods. In addition, there is probably more to be learned on these methods to improve them further, opening up the possibility of experimenting with them when we can. Below are a bunch of things that I am thinking about and, as we still have some time before our data collection tools are finalized, you, our readers, have a chance to help shape them with your comments and feedback.

What is a sensitive question? In any setting, a sensitive question is one for which you think the likelihood of respondents being untruthful is significantly higher than for the rest of your questions. For various reasons, people are more reluctant to talk about some topics than others, even when the enumerator emphasizes their anonymity and privacy. However, in an RCT setting, I like to define a sensitive question as any key (primary) outcome for which the possibility of differential self-reporting bias is a concern. Think of schooling, for example, in an evaluation of a scholarship program. If you have the resources, you can get other independent estimates of school attendance (random audits, etc.), but sometimes you don’t have the money to do this. Methods such as list experiments can perhaps help then.

What is a list experiment? The basic list experiment goes something like this: you randomly split your sample into two groups, direct response and veiled response. In the direct response group, you ask your sensitive question directly, as you normally would when you don’t worry about self-reporting bias. In another section of the same questionnaire, you have a set of, say, four “innocuous statements.” The respondent is instructed to listen to these statements and respond with the total number of correct statements. Notice that the enumerator does not know the answers to the individual statements (more on caveats later). Thanks to random assignment, this group serves as the control group that gives us the average number of correct innocuous statements in our sample. In the veiled response group, the sensitive question is added to the list of innocuous statements as an additional item. The difference in the average number of correct statements between the veiled and the direct response groups is the estimated prevalence of the sensitive behavior in your sample. See this paper from a health messaging intervention in Uganda and this paper on antigay sentiment in the United States as recent examples using this method.
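The arithmetic behind that difference can be sketched in a few lines of Python. This is only an illustration: the function name `list_experiment_estimate` and the counts are made up, and the standard error is the usual unequal-variance formula for a difference in means, which is my assumption about how one would analyze it, not something taken from the papers mentioned above.

```python
from math import sqrt
from statistics import mean, variance


def list_experiment_estimate(direct_counts, veiled_counts):
    """Estimate prevalence as the difference in mean counts.

    direct_counts: totals from the list WITHOUT the sensitive item.
    veiled_counts: totals from the list WITH the sensitive item.
    Returns (estimate, standard_error).
    """
    estimate = mean(veiled_counts) - mean(direct_counts)
    # Standard error of a difference in means of two independent groups.
    se = sqrt(
        variance(veiled_counts) / len(veiled_counts)
        + variance(direct_counts) / len(direct_counts)
    )
    return estimate, se


# Made-up counts for illustration only.
direct = [2, 1, 3, 2, 2, 1, 3, 2]   # four innocuous statements
veiled = [3, 2, 3, 2, 3, 1, 4, 2]   # same four plus the sensitive one

prevalence, se = list_experiment_estimate(direct, veiled)
print(f"estimated prevalence: {prevalence:.2f} (SE {se:.2f})")
```

Note that the estimate is only defined at the group level: no individual respondent's answer to the sensitive item is ever recovered.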

The power tradeoff from experimenting within your experiment: Notice that this method requires you to split your sample. If you’re going to ignore the answers to your key outcome from the direct response group (as I am leaning towards doing), this leads to a straightforward loss of power from reduced sample size in your experiment. If you have the money and the energy, you could go to some individuals outside your study sample (who are identical to your study sample) and learn the average number of correct responses from them, after which you can subtract it from the responses of everyone in your study, all of whom would be in the veiled response group. But that can be costly, and unless you thought this through at baseline and have a group that you excluded just for this purpose, you can’t be sure that this average is what would have held in your sample. However, this is individual randomization, so you don’t need a lot of individuals allocated to the direct response group to get a good estimate of the number of correct statements. Instead of allocating 50% of your sample, allocate, say, 10-20%.
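To put rough numbers on this allocation choice, here is a back-of-the-envelope sketch. The total sample size and the standard deviations of the counts are invented for illustration, and `prevalence_se` is a hypothetical helper, so treat the output as a way to think about the tradeoff rather than as guidance for any particular study.

```python
from math import sqrt


def prevalence_se(n_total, share_direct, sd_direct=1.0, sd_veiled=1.1):
    """Standard error of the list-experiment prevalence estimate.

    share_direct is the fraction of respondents allocated to the
    direct response (innocuous-list-only) group. The standard
    deviations of the counts are assumed values, not estimates.
    """
    n_direct = round(n_total * share_direct)
    n_veiled = n_total - n_direct
    return sqrt(sd_direct**2 / n_direct + sd_veiled**2 / n_veiled)


for share in (0.1, 0.2, 0.5):
    se = prevalence_se(n_total=2000, share_direct=share)
    kept = round(2000 * (1 - share))
    print(f"direct share {share:.0%}: veiled n = {kept}, SE = {se:.3f}")
```

Under these assumptions, a smaller direct response share keeps more respondents in the veiled group for the main analysis, at the price of a noisier estimate of the average number of correct innocuous statements.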

Should I block the randomization by original treatment arms? I currently think that this is a good idea. It is possible that the answers to the innocuous questions you came up with are influenced by your original treatment, in which case you want more of a difference-in-differences than a single difference for the prevalence of the sensitive behavior by original treatment arm: for each individual, you could consider subtracting the average number of correct statements in the direct response group of the intervention arm that individual belongs to, rather than the average for everyone. However, this becomes more demanding from a power perspective. So, I’d favor blocking the random assignment by intervention status, but then still analyzing the data using the average in the entire direct response group, as long as there are no apparent large differences within this group by intervention arm.
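The two ways of subtracting the baseline can be compared side by side. The sketch below is illustrative only: the record layout `(arm, group, count)`, the function `prevalence_by_arm`, and the data are all assumptions of mine, constructed so that the original treatment shifts the answers to the innocuous statements.

```python
from statistics import mean


def prevalence_by_arm(records, pooled_baseline=True):
    """Prevalence estimate for each original treatment arm.

    records: list of (arm, group, count) tuples, where group is
    "direct" (innocuous list only) or "veiled" (list plus the
    sensitive item). With pooled_baseline=True the direct-group
    mean over the whole sample is subtracted; otherwise the
    direct-group mean within the same arm is used.
    """
    pooled = mean(c for _, g, c in records if g == "direct")
    estimates = {}
    for arm in sorted({a for a, _, _ in records}):
        veiled = mean(c for a, g, c in records if a == arm and g == "veiled")
        if pooled_baseline:
            baseline = pooled
        else:
            baseline = mean(
                c for a, g, c in records if a == arm and g == "direct"
            )
        estimates[arm] = veiled - baseline
    return estimates


# Made-up data in which treatment raises the innocuous counts.
records = [
    ("control", "direct", 2), ("control", "direct", 2),
    ("control", "veiled", 3), ("control", "veiled", 2),
    ("treatment", "direct", 3), ("treatment", "direct", 3),
    ("treatment", "veiled", 3), ("treatment", "veiled", 3),
]

print("pooled baseline:  ", prevalence_by_arm(records, pooled_baseline=True))
print("arm-specific base:", prevalence_by_arm(records, pooled_baseline=False))
```

In this contrived example the two approaches give different pictures by arm, which is the situation where checking the direct response group for differences by intervention arm matters.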

What about the innocuous questions? Notice that, in my example above, if you answer 0 or 5 as your number of correct statements in the veiled response group, the enumerator still knows your answer to the sensitive question (and every other one). Perhaps you don’t care about one tail because people in that tail are saying they did not bribe officials, etc. But what should we do about the other tail? One way to tackle this problem would be to come up with a statement that is obviously false. For example, in our study in Liberia, I could have a statement that says, “Barack Obama is the President of Liberia.” Then, no one (hopefully) would answer with a 5. However, the quick respondents will catch on to this and know that the enumerator will discern their answer to the sensitive question if they say “4.” This becomes academic, but it’s been suggested to me that one way to get around this problem is to find one statement that a large majority of individuals in your sample would say is correct, and then find another statement that is highly negatively correlated with it. So, if you answer “correct” to one, you’d almost always answer “false” to the other (“I live in an apartment,” and “I have a pit latrine.”).
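A small simulation can show how much a negatively correlated pair of statements reduces the share of respondents whose total gives them away. Everything here is assumed for illustration: four innocuous statements, each true with probability 0.8, a sensitive item with prevalence 0.3, and a pair in which the second statement is the opposite of the first 95% of the time. The function `exposed_share` is a hypothetical name.

```python
import random


def exposed_share(n, sensitive_rate, negatively_paired, seed=0):
    """Share of veiled-group respondents whose total is 0 or 5.

    A total of 0 or 5 reveals the answer to every statement,
    including the sensitive one.
    """
    rng = random.Random(seed)
    n_items = 5  # four innocuous statements plus the sensitive one
    exposed = 0
    for _ in range(n):
        first = rng.random() < 0.8
        if negatively_paired:
            # Second statement contradicts the first 95% of the time.
            second = (not first) if rng.random() < 0.95 else first
        else:
            second = rng.random() < 0.8
        others = [rng.random() < 0.8 for _ in range(2)]
        sensitive = rng.random() < sensitive_rate
        total = first + second + sum(others) + sensitive
        if total in (0, n_items):
            exposed += 1
    return exposed / n


independent = exposed_share(20000, 0.3, negatively_paired=False)
paired = exposed_share(20000, 0.3, negatively_paired=True)
print(f"independent statements: {independent:.3f} exposed")
print(f"with a negative pair:   {paired:.3f} exposed")
```

The closer the pair is to perfectly opposite, the closer the exposed share gets to zero, because the innocuous total can then almost never be 0 or 4.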

What experiment should I run? Currently, I am leaning towards experimenting with some unknowns. First, one method to deal with sensitive questions is CASI (Computer-Assisted Self-Interviewing) or its audio version, ACASI. However, their effectiveness has not been established when it comes to eliciting more truthful answers on, say, risky sexual activity. I would first introduce a CASI/ACASI group in addition to the direct response group. Second, as I mentioned in the previous post, the veiled response requires the respondents to listen and count to report the total number of correct statements. Some people I talked to worry about the noise that this introduces. So, one way of dealing with this would be to have the veiled response group use ACASI as well. That way, at least, they could take their time to count the correct number of statements on the screen before responding. Third, one could add a further improvement here to reduce noise: imagine that the list of questions is on one screen (in an ACASI setting where the enumerator cannot see the responses) and, as the respondent answers each question, the question disappears and a counter updates the number of correct statements. At the end, the screen has only a big number at the center. If you instructed the respondent before the administration of the section that the enumerator cannot know her answers due to this method of administration, she might be more truthful. But the question is whether she’d be more truthful than under a simple ACASI: both require her to record her answer to each individual question on the screen, even though the answer cannot be seen by the enumerator.

So, an experiment may include a direct response group, an ACASI direct response group, a traditional veiled response group, an ACASI veiled response group, and an ACASI veiled response group where the respondent is still answering the questions individually, but she knows that only the total is visible at the end (you could make the total disappear at the end as well, before the tablet is passed back to the enumerator). Lots of choices…
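For concreteness, here is one way the assignment to these five groups could be done, blocked by original treatment arm. This is a sketch under assumptions: the mode labels, the function `assign_modes`, and the equal split across modes are mine for illustration (a real design might well use unequal shares), and the respondent list is invented.

```python
import random
from collections import Counter

# Hypothetical labels for the five survey modes described above.
MODES = [
    "direct",
    "acasi_direct",
    "veiled",
    "acasi_veiled",
    "acasi_veiled_counter",
]


def assign_modes(respondents, seed=0):
    """Assign survey modes at random within each treatment arm.

    respondents: list of (respondent_id, treatment_arm) pairs.
    Within each arm, IDs are shuffled and then dealt out to the
    modes in turn, so mode counts differ by at most one per arm.
    """
    rng = random.Random(seed)
    by_arm = {}
    for respondent_id, arm in respondents:
        by_arm.setdefault(arm, []).append(respondent_id)
    assignment = {}
    for arm, ids in by_arm.items():
        rng.shuffle(ids)
        for i, respondent_id in enumerate(ids):
            assignment[respondent_id] = MODES[i % len(MODES)]
    return assignment


# Invented sample: 10 respondents in each of two arms.
respondents = [(i, "treatment" if i % 2 else "control") for i in range(20)]
assignment = assign_modes(respondents)
balance = Counter((arm, assignment[rid]) for rid, arm in respondents)
print(balance)
```

Shuffling within arm before dealing out the modes is what makes this a blocked assignment: every arm ends up with the same mix of modes by construction.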

Finally, some people have commented that making a big deal about the innocuous questions, telling the respondent that the enumerator cannot know their answers, and the contrast between the innocuous and sensitive questions may only draw more attention to the whole thing. A better method may be to simply bundle some questions that you would have in your survey anyway into the group of “innocuous questions” (so they’re not so innocuous) in the direct response group, and then add your sensitive one to these in the veiled response group without prompting the respondent about privacy, etc., at all. Most people will realize that their answer is hidden from the enumerator. This is a hypothesis that can be tested, as is the hypothesis that where these questions come in the survey may matter.

If you have comments on any of these ideas, or have more ideas of your own, please use the comment section to share. Thanks!

Authors

Berk Özler

Lead Economist, Development Research Group, World Bank
