In a recent blog post on stories, and following some themes from an earlier talk by Tyler Cowen, David Evans ends by suggesting: “Vivid and touching tales move us more than statistics. So let’s listen to some stories… then let’s look at some hard data and rigorous analysis before we make any big decisions.” Stories, in this sense, are potentially idiosyncratic and over-simplified and, therefore, may be misleading as well as moving. I acknowledge that this is a dangerous situation.
However, a couple of things about the above quote are frustrating, whether intentional or not.
- First, it equates ‘hard data’ with ‘statistics,’ as though qualitative (text/word) data cannot be hard (or, by implication, rigorously analysed). Qualitative work – even when producing ‘stories’ – should move beyond mere anecdote (or even journalistic inquiry).
- Second, it suggests that the main role of stories (words) is to dress up and humanize statistics – or, at best, to generate hypotheses for future research. This seems both unfair and out of step with increasing calls for mixed methods to take our understanding beyond ‘what works’ (average treatment effects) to ‘why’ (causal mechanisms) – with ‘why’ probably being fairly crucial to ‘decision-making’ (Paluck’s piece is worth checking out in this regard).
In this post, I first try to make the case that there are potentially important distinctions between anecdotes and stories/narratives that are too often overlooked when discussing qualitative data, focusing on representativeness and the counterfactual. Second, I suggest that just because many researchers do not collect or analyse qualitative data rigorously does not mean it cannot (or should not) be done this way. Third, I make a few remarks about numbers.
As a small soapbox and aside: in my opinion, even calls for mixed methods give unnecessary priority to quantitative data and statistical analysis when it comes to making causal claims. A randomized controlled trial – randomizing who gets a treatment and who remains in the comparison group – is a method of assigning treatment. It doesn’t *necessarily* imply what kind of data will be collected and analyzed within that framework.
Anecdotes, narratives and stories
As to the danger of stories, what Evans, Cowen, and others (partly) caution against is believing, using or being seduced by anecdotes – stories from a single point of view. Here I agree – development decisions (and legislative and policy decisions more generally) have too often been taken on the basis of a compelling anecdote. But not all stories are mere anecdotes, though this is what is implied when ‘hard data’ are equated with ‘statistics’ (an equation that becomes all the more odd when, say, the ‘rigorous’ bit of the analysis is referred to as the ‘quantitative narrative’).
Single stories from single points in time in single places – anecdotes – are indeed potentially dangerous and misleading. Anecdotes lack both representativeness and a counterfactual, both of which are important for making credible (causal) claims and both of which are feasible to achieve with qualitative work. As the phrase ‘quantitative narrative’ reveals, humans respond well to narratives; they help us make sense of things. The trick is to tell them from as many perspectives as possible, so as not to un-mess the messiness too far.
Representativeness: It is clear from the growing buzz about external validity that we need to be cautious about even the most objective and verifiable data analysed in the most rigorous and internally valid way, because the results simply may not apply elsewhere (e.g. here and here). Putting this concern aside for a moment, both qualitative and quantitative data can be collected so as to be as representative as possible of a particular time, place and circumstance. I say more about this below.
Counterfactuals: Cowen notes that many stories can be summed up as ‘a stranger came to town.’ True: to understand something causal about this (which is where anecdotes and tales following particular plot-lines can lead us astray), we would like to consider what would have happened if the stranger had not come to town and/or what happened in the town next door that the stranger by-passed. But those are still stories, and they can be collected in multiple places, at multiple time points. Instead of dismissing qualitative data or using it only as window-dressing, we can demand more of it so that it can tell a multi-faceted, multi-perspectival, representative story.
That rigor thing
Perhaps it seems that we have a clearer idea of how to be rigorous when collecting and analysing quantitative data. I don’t think this is necessarily true, but it does seem that many quant-focused researchers trying out mixed methods for the first time don’t even consider how to make the qualitative data more rigorous by applying criteria similar to those they would apply to the quantitative part. This strikes me as very odd. We need to start holding qualitative data collection and analysis to higher standards, not be tempted to scrap them just because some people do them poorly.
An excellent piece on this (though there are plenty of manuals on qualitative data collection and analysis) is by Lincoln and Guba. They suggest that ‘conventional’ rigor addresses internal validity (which they take as ‘truth value’), external validity, consistency/replicability and neutrality. (The extent to which quantitative research in the social sciences fulfils all these criteria is another debate for another time.) They highlight the concept of ‘trustworthiness’ – capturing credibility, transferability, dependability and confirmability – as a counterpart, for qualitative work, to rigor in the quantitative social sciences. It’s a paper worth reading.
Regardless of what types of data are being collected, representativeness is important for accommodating messiness and heterogeneity. If a research team stratifies along several characteristics to select its sample for quantitative data collection (or intends to look at specific heterogeneities/sub-groups in the analysis), it boggles my mind why those same criteria are not used to select participants for qualitative data collection. Why does representativeness so often get reduced to four focus groups among men and four among women? Equally puzzling, qualitative data are too often collected only in the ‘treated’ groups. Why does the counterfactual go out the window when we are discussing open-ended interview or textual data?
Similarly, qualitative work has a counterpart to statistical power and sample-size considerations: saturation. Generally, saturation is ‘reached’ when the researcher starts hearing the same answers over and over. A predetermined number of interviews or focus groups does not guarantee saturation, and research budgets and timetables that take qualitative work seriously should start to accommodate that reality. In addition, Lincoln and Guba suggest that length of engagement (with observations over time also enhancing representativeness) is critical to credibility. Because qualitative work places more emphasis on simultaneous and iterative data collection and analysis, that time can be used to follow up on leads and insights revealed over the study period.
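To make the saturation idea a bit more concrete, here is a minimal Python sketch, assuming interviews have already been coded into sets of themes. The function names, the example themes and the stopping rule (saturation is declared once `window` consecutive interviews add no new themes) are hypothetical illustrations, not a procedure from Lincoln and Guba; choosing the window remains a judgment call.

```python
from typing import Iterable, List, Optional, Set


def new_theme_counts(coded_interviews: Iterable[Set[str]]) -> List[int]:
    """For each interview, in order, count themes not seen in any earlier interview."""
    seen: Set[str] = set()
    counts: List[int] = []
    for themes in coded_interviews:
        fresh = themes - seen  # themes this interview adds for the first time
        counts.append(len(fresh))
        seen |= fresh
    return counts


def saturation_point(coded_interviews: Iterable[Set[str]], window: int = 3) -> Optional[int]:
    """Return how many interviews it took until `window` consecutive interviews
    added no new themes, or None if that never happened.
    (A hypothetical stopping rule, used here only for illustration.)"""
    run = 0  # length of the current streak of interviews with no new themes
    for i, n_new in enumerate(new_theme_counts(coded_interviews), start=1):
        run = run + 1 if n_new == 0 else 0
        if run >= window:
            return i
    return None


# Hypothetical coded interviews, in the order they were conducted.
interviews = [
    {"price", "distance"},
    {"price", "trust"},
    {"distance", "gender norms"},
    {"trust"},
    {"price"},
    {"distance", "trust"},
    {"price"},
]
print(new_theme_counts(interviews))
print(saturation_point(interviews))
```

With this toy data, the last new theme ("gender norms") appears in the third interview, so a window of three is satisfied after the sixth. Tracking new-theme counts as fieldwork proceeds is one way to show why a fixed number of interviews, set in advance, may be too few or more than needed.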
Also bizarre to me is that quant-focused researchers tend to spend much more time discussing the analysis of quantitative data than its collection and coding, yet put all the focus (of the limited attention-slice that qualitative work gets) on collecting qualitative data and none on how those data are analysed or will be used. Too often, the report tells me that a focus group discussion was done and, if convenient, points out that the findings corroborate or ‘explain’ the numeric findings. Huh? If I am given no idea of the range of answers given (let’s say the counterpart of a minimum and a maximum) or of how ‘common themes’ were determined, that thing that one person said in a focus group just becomes an anecdote with no real ‘place’ in the reporting of the results except as a useful aside.

One more thing on credibility, the counterpart of internal validity. Lincoln and Guba say that credibility *requires* using member-checks (stay tuned for a paper on this), which means sharing the results of the analysis back with those who provided the raw data so that interpretations can, at least in part, be co-constructed. This helps prevent off-the-mark speculation and situation analyses, and it also helps to break down the need to ‘represent’ people who ‘cannot represent themselves’, as Said quotes from Marx. I’ve said a few things about this sort of shared interpretation here, recognizing that respondents’ perceptions will reflect the stories they tell themselves. That said, as development researchers increasingly look at nudging behavior, the stories that (not-always-rational) actors tell themselves are potentially all the more important. We need to collect and present them well.
One key hurdle I see to enhancing the perceived rigor and non-anecdotal-ness of qualitative work is that it is hard to display the equivalent of descriptive statistics for textual/interview data. That doesn’t mean we shouldn’t try. In addition, it is more difficult and unwieldy to share (even ‘cleaned’) qualitative data than to share the quantitative equivalent, as increasingly happens to allow for replication. Still, if sharing would enhance the credibility of the multifaceted stories revealed by these data, it is worth pushing this frontier.
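As one modest attempt at such descriptive statistics, the Python sketch below tabulates, for each coded theme, how many respondents in each group mentioned it and what share of that group they represent. The input format (one `(group, themes)` pair per respondent), the function name `theme_table` and the example groups and themes are hypothetical; including the comparison group in the tabulation is what keeps the counterfactual in view.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple


def theme_table(
    responses: Iterable[Tuple[str, Set[str]]],
) -> Dict[str, Dict[str, Tuple[int, float]]]:
    """Cross-tabulate coded themes by respondent group.

    `responses` holds one (group, themes) pair per respondent; each theme
    counts at most once per respondent because themes are a set.
    Returns {theme: {group: (count, share_of_group)}}, listing every group,
    including groups in which nobody mentioned the theme.
    """
    group_sizes: Counter = Counter()
    mentions: Dict[str, Counter] = defaultdict(Counter)
    for group, themes in responses:
        group_sizes[group] += 1
        for theme in themes:
            mentions[theme][group] += 1
    # Counter returns 0 for missing keys, so absent groups get (0, 0.0).
    return {
        theme: {g: (by_group[g], by_group[g] / n) for g, n in group_sizes.items()}
        for theme, by_group in mentions.items()
    }


# Hypothetical coded responses from treated and comparison respondents.
responses = [
    ("treatment", {"price", "trust"}),
    ("treatment", {"price"}),
    ("comparison", {"distance"}),
    ("comparison", {"price", "distance"}),
]
for theme, row in sorted(theme_table(responses).items()):
    print(theme, row)
```

Reporting counts alongside shares, for every group sampled, gives readers some sense of the range of answers rather than a single well-chosen quotation.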
Numbers aren’t always clean
In terms of stories we tell ourselves, one is that data are no longer messy (and, often by implication, are clean, hard, ‘true’) because they fit in a spreadsheet. Everything that happened in the field, including surveyors’ concerns about certain questions or reports of outright lying, often seems to fade from view as soon as the data make it into a spreadsheet. If you ask a farmer how many chickens he has and he tells you a story about how he had 5, 2 got stolen yesterday, but his brother will give him 4 tomorrow, then regardless of what number the enumerator records, the messiness has been contained for the analyst but not in the reality of the farmer who is meant to be represented.
In general, if we want to talk about creating credible, causal narratives that can be believed and that can inform decision-making at one or more levels, we need to talk about (a) careful collection of all types of data and (b) getting better at rigorously analysing and incorporating qualitative data into the overall ‘narrative’ for triangulating towards the ‘hard’ truth, not equating qualitative data with anecdotes.
Photo courtesy of ProtoplasmaKid (Own work) via Wikimedia Commons
This post first appeared on Heather's blog, hlanthorn.com 