
Being indirect sometimes gets closer to the truth: New work on indirect elicitation surveys

Jed Friedman

Often in impact evaluation (IE), and in social research more generally, the researcher wants to know respondent views or information regarded as highly sensitive and hence difficult to elicit directly through a survey. There are numerous examples of this sensitive information – sexual history, especially as it relates to risky or taboo practices, violence in the home, and political or religious views. In a previous blog I wrote about how new survey technologies may help elicit truthful responses. But non-truthful responses remain a common challenge. Fortunately, direct elicitation is not the only survey method available to researchers.

One alternative survey method – the list method – takes advantage of the fact that social research is largely concerned with drawing inferences about a population rather than assessing any one individual's view. The method also exploits the ability to randomize the questions asked of survey respondents in order to estimate the average incidence of a sensitive view or behavior in the studied population.

Also known as the item count technique, the list method presents respondents with a list of items and asks for the total number of items with which they agree. The population incidence of the sensitive item is identified through the difference in the mean total between respondents given the “control” version of the questionnaire, which does not include the sensitive item, and those given the “treatment” version, which does.

For example, Blair, Imai, and Lyall (BIL), working in the highly challenging research environment of conflict regions in Afghanistan, seek to measure population support for the international coalition of forces. Among control respondents, they present the following list:

Karzai Government, National Solidarity Program, Local Farmers

and ask only for the total number of groups or individuals that the respondent broadly supports (a total ranging from 0 to 3 here). The same instructions are given to the treatment group of respondents, but they are instead read the following list:

Karzai Government, National Solidarity Program, Local Farmers, Foreign Forces

By comparing the responses between treatment and control, the authors estimate a 16% support rate among the target population for the foreign forces.
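The estimator behind this comparison is simply the difference in mean item counts between the two randomized groups. A minimal sketch in Python, using simulated data with a hypothetical 16% support rate (not BIL's actual survey data), looks like this:

```python
# List (item count) method: the estimated incidence of the sensitive item is
# the difference in mean counts between treatment and control respondents.
# The data below are simulated for illustration only.
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Control respondents report how many of 3 non-sensitive items they support
control_counts = rng.binomial(3, 0.5, size=n)
# Treatment respondents see the same 3 items plus the sensitive item,
# which 16% of them support in this simulation
treat_counts = rng.binomial(3, 0.5, size=n) + rng.binomial(1, 0.16, size=n)

estimate = treat_counts.mean() - control_counts.mean()
se = np.sqrt(treat_counts.var(ddof=1) / n + control_counts.var(ddof=1) / n)
print(f"Estimated support for the sensitive item: {estimate:.1%} (SE {se:.1%})")
```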

In order to yield a valid estimate, this type of analysis requires that adding the sensitive item to the list does not alter the aggregate response to the control items. This often seems like an innocuous assumption, but it is very hard to verify. Another possible drawback of the technique: it is not entirely free of bias, since respondents recognize that they automatically reveal their view on the sensitive item if they agree with either all of the items or none of them. Hence respondents may avoid choosing the extremes. These “floor” and “ceiling” effects are unavoidable. Nevertheless the list method is often seen as an improvement in accuracy over earlier randomized-response efforts that attempt to get at sensitive private information.

Fortunately there is another indirect method, the endorsement method, which can sometimes be used to cross-validate the responses given by the list method. The endorsement method asks respondents to rate their support for an issue or policy, but randomizes across respondents whether the issue is described as supported by a sensitive social group or actor. Again, in the case of BIL, control respondents were asked something like:

A recent proposal calls for the sweeping reform of the Afghan prison system, including the upgrading of facilities… How do you feel about this proposal?

In contrast, the treatment group was read the following:

A recent proposal by the ISAF (Foreign Forces) calls for the sweeping reform of the Afghan prison system, including the upgrading of facilities… How do you feel about this proposal?

Typically in the endorsement method multiple questions regarding different proposals are read so that measurement does not rely on the view of a single proposed policy. BIL actually lists four policies that were widely discussed in the media at the time of the survey and estimates an overall support rate for the ISAF of 17.5%.
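In its rawest form the endorsement comparison is again across randomized arms: for each policy, compare the mean support rating between respondents who heard the endorsed wording and those who heard the neutral wording, then look across policies. BIL's actual estimator is model-based; the sketch below, with hypothetical column names, arm labels, and toy ratings, only shows that raw comparison.

```python
# Raw endorsement-method comparison: per-policy difference in mean ratings
# between the endorsed and neutral wordings. All data here are hypothetical;
# BIL's estimator is a full statistical model, not this simple contrast.
import pandas as pd

df = pd.DataFrame({
    "policy": ["prisons"] * 4 + ["elections"] * 4,
    "arm":    ["neutral", "neutral", "endorsed", "endorsed"] * 2,
    "rating": [4, 3, 2, 3, 5, 4, 3, 4],  # e.g. a 1-5 support scale
})

means = df.groupby(["policy", "arm"])["rating"].mean().unstack("arm")
effects = means["endorsed"] - means["neutral"]   # endorsement effect per policy
print(effects)
print("Average effect across policies:", effects.mean())
```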

The fact that the two methods yield almost identical support rates is reassuring, especially since we might expect indirect methods to be subject to a higher degree of measurement error (in part because responses are highly sensitive to implementation details). That is, these indirect methods may indeed have less bias than direct methods, but they may be more imprecise. BIL proposes to improve the precision of indirect elicitation by combining both methods in a joint analysis.

To enact this joint analysis we must assume that there is an unobserved, i.e. latent, level of support that drives responses to both types of question. (The close estimates of overall support from the two methods give us some confidence that this is a safe assumption.) BIL then adopts a specified parametric form to yield a maximum likelihood estimator that produces an overall approval rating and allows exploration of how approval varies with covariates of interest.
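To give a flavor of what such a joint likelihood looks like, here is a deliberately stylized toy version in Python. This is not BIL's model (theirs is richer, with covariates and a more flexible endorsement response); the simulated data, the parameter names, and the simple ±d endorsement shift are all assumptions made for illustration. The point is only that a single latent support probability p enters both parts of the likelihood, so it is estimated from both data sources at once.

```python
# Toy joint likelihood: one latent support probability p drives both the
# list-experiment counts and a simplified binary endorsement response.
# An illustrative sketch under made-up assumptions, NOT BIL's estimator.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import binom

rng = np.random.default_rng(0)
n, J = 1000, 3                                   # respondents per arm, control items
true_p, true_q, true_a, true_d = 0.18, 0.5, 0.6, 0.15

# Simulate four independent randomized arms sharing the same latent support rate
list_ctrl = rng.binomial(J, true_q, size=n)                      # control list
s1 = rng.binomial(1, true_p, size=n)
list_trt = rng.binomial(J, true_q, size=n) + s1                  # list plus sensitive item
end_neutral = rng.binomial(1, true_a, size=n)                    # neutral wording
s2 = rng.binomial(1, true_p, size=n)
end_endorsed = rng.binomial(1, true_a + true_d * (2 * s2 - 1))   # endorsed wording

def neg_log_lik(theta):
    p, q, a, d = theta
    ll = binom.logpmf(list_ctrl, J, q).sum()
    # Treatment list count = Binomial(J, q) control items + Bernoulli(p) sensitive item
    pmf_trt = (1 - p) * binom.pmf(list_trt, J, q) + p * binom.pmf(list_trt - 1, J, q)
    ll += np.log(pmf_trt).sum()
    ll += (end_neutral * np.log(a) + (1 - end_neutral) * np.log(1 - a)).sum()
    # Marginal approval under endorsement: supporters shift up by d, others down by d
    pe = np.clip(a + d * (2 * p - 1), 1e-9, 1 - 1e-9)
    ll += (end_endorsed * np.log(pe) + (1 - end_endorsed) * np.log(1 - pe)).sum()
    return -ll

res = minimize(neg_log_lik, x0=[0.3, 0.5, 0.5, 0.1], method="L-BFGS-B",
               bounds=[(0.01, 0.99), (0.01, 0.99), (0.01, 0.99), (-0.45, 0.45)])
print("Joint estimate of latent support p:", round(res.x[0], 3))
```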

In the joint estimate the overall support rate is 19% (and it is more precisely estimated than with either method alone). The estimated support also co-varies in what seem to be sensible and expected ways – for example, households that report victimization by the Taliban view ISAF more favorably, whereas households that report victimization by ISAF view it less favorably. The details are in the paper, along with some interesting discussion of social desirability bias among different sub-groups.

While a “magic survey truth machine” remains far in the realm of science fiction, I’d like to think the inventive methods discussed here take us a little further down the road of progress.

Comments

These are some smart methods but I would question the ethics of any technique that tricks people into giving a truthful answer they are not inclined to give. Are they being told "Your answers will be used to gauge support for foreign forces" or "Questions about the prison system will actually be used to gauge whether you support ISAF" when they give their consent to take part? One simpler route is to ask people what they think other people think (or how other people behave). Still doesn't guarantee truthfulness (and they might, of course, have wrong ideas about what others think) but can nevertheless be very useful in certain circumstances.

This raises an interesting question about the role of deception in survey measurement. I wrote about this before with respect to the use of mystery patients in health research (who deceive health practitioners as they pose as sick patients): http://blogs.worldbank.org/impactevaluations/sometimes-it-is-ethical-to-lie-to-your-study-subjects It's not clear to me that indirect elicitation is unethical (for example, the methods above cannot ascribe a particular view to any individual respondent). It may depend on the perceived social importance of the study, which of course is subjectively assessed by an IRB. I presume the researchers above did receive approval from at least one IRB. This is really fascinating to think about...

The ethical concern is that deception is used. The questions are not really about what they seem to be about. Of course survey data is often used in ways not anticipated by the survey designers. This is a bit of a grey area: there are ethical problems if the participants didn't consent for their responses to be used in that way. But in the cases you describe it seems more clear-cut: the researchers quite deliberately, as part of the design, deceive the respondents. That seems to me to require quite strong justification. As a further thought experiment - would you do this with participants who, for instance, could read English and had internet access? If they found out what was going on and for whatever reason it got widely publicised, it would give researchers a pretty bad name and make it hard to do similar research there in future. (I don't mean to denigrate this specific study without knowing the details but just to sound a warning about this idea in general. I did check the linked paper but ethics are not mentioned.) On another note I wonder if it makes sense to worry so much about truthfulness when it comes to attitudes (as opposed to more objectively describable events). It may not be safe to assume that each individual has a single true attitude towards something that they express when they are being honest and which guides their behaviour -- rather than shifting their view according to how the question is framed, what aspects are made salient, social context and so on.

My response, though, is that presumably at least one IRB considered this higher bar cleared and allowed the study. Of course we may have divergent opinions on whether the bar is met. But ethical review is a decentralized process. This makes me want to learn more about: 1. guidelines for the use of deception in field research (clearly it is permissible in some settings, but how developed are these guidelines, and do they need to be revisited?); 2. review is a decentralized process, and probably should be, but how does IRB accreditation work, and does it ensure sufficient standards of review? I have no answer to either question right now. I also like your second point on attitudes being affected by framing, social context, etc. In fact subjective data are prone to many of these potential biases... Deaton recently showed that in the US subjective welfare questions such as "life satisfaction" are more sensitive to day-of-the-week effects than they were to the 2008-9 financial crisis (essentially people are not satisfied with life on Mondays, but a good bit more so on Fridays, while aggregate movements in unemployment hardly cause a ripple). However researchers keep asking these questions because they are presumed to get at important information that can't be measured in other ways... so an informed researcher has to be aware of these pitfalls and look for validation in related measures, etc. Thanks for the discussion!

Submitted by dan
Thanks for this interesting post. I am wondering if you could explain something for me. I don't understand, in the above examples, why you would give the control group a different set of responses from the treatment group. Doesn't this make their responses a bit incomparable?

Good question: the "control" and "treatment" designations here refer to the type of questionnaire received by the respondent. It's a measurement experiment of a sort, but there is no intervention being tested. Instead the researchers divided the respondent sample into two groups and gave each group a slightly different question to try to indirectly measure attitudes. If we were evaluating a policy change or intervention, then I completely agree: we would not want to give the treatment and control samples different questions, at least in most cases!