For those who don’t recall the plot of Disney’s 1973 classic “Robin Hood,” the eponymous hero – an anthropomorphic fox – competes in an archery competition to win a kiss from his ladylove Maid Marian (also a fox). We are going to take some liberties with the plot of that pivotal scene to discuss the findings of our new paper, “A Map of the Poor or a Poor Map?,” which explores best-practice guidelines for small area estimation (SAE) in the study of poverty.
Just as Robin Hood prioritized the poor in redistributing the Sheriff of Nottingham’s gold, national governments need to know how to target their limited resources to the neediest within their borders. One key tool in this effort is a highly disaggregated “poverty map,” which is developed by using a recent national household survey augmented with an auxiliary data source – typically a census. In essence, a model fitted to the household survey – which contains the well-being (income or consumption) measure used to calculate poverty – “borrows strength” from the much larger census dataset, which shares many variables with the survey but lacks the well-being measure, to predict poverty at a much finer geographic disaggregation than is possible with the survey alone.
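To make that workflow concrete, here is a minimal sketch of the survey-to-census prediction step. Everything named here is an assumption for illustration: `survey` and `census` are hypothetical pandas DataFrames sharing the covariates in `X_COLS`, `municipality` is a hypothetical area identifier, and the poverty line is made up. Real SAE methods also add simulated location effects and household residuals, which this sketch omits.

```python
# A minimal sketch of the survey-to-census prediction step, assuming
# hypothetical pandas DataFrames `survey` (with a consumption measure)
# and `census` (without one) that share the covariates in X_COLS.
import numpy as np
import statsmodels.api as sm

X_COLS = ["hh_size", "head_educ", "dwelling_rooms"]  # illustrative covariates
POVERTY_LINE = 1000.0                                # illustrative, local currency

# 1. Fit a welfare model on the survey, which observes consumption.
X_svy = sm.add_constant(survey[X_COLS])
model = sm.OLS(np.log(survey["consumption"]), X_svy).fit()

# 2. "Borrow strength": predict welfare for every census household.
X_cen = sm.add_constant(census[X_COLS])
census["pred_consumption"] = np.exp(model.predict(X_cen))

# 3. Aggregate predictions into a poverty headcount for each small area.
poverty_map = (
    (census["pred_consumption"] < POVERTY_LINE)
    .groupby(census["municipality"])
    .mean()
)
```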
In our Robin Hood analogy, using SAE is like giving Robin a high-powered scope for his bow; it allows him to cluster his arrows much more tightly – hopefully in the center of the target. And just as a target without painted bullseye rings would not be much use in an archery competition, we need a good estimate of the precision of our predicted values for our map to be of much use in policymaking. The problem is that our high-powered scope – our model – can introduce design bias into the estimators. Think of the x’s as Robin’s arrows and compare the two figures below. Figure 1 depicts arrows shot with no scope: the shots are unbiased (the arrows are centered on the middle of the target) but widely spread, producing a high Mean Squared Error (MSE), which captures both the spread of the arrows and how far their average lands from the center. Figure 2 depicts arrows shot with our fancy scope: they are very tightly clustered (low variance) but biased (centered away from the middle of the target).
Figure 1 | Arrows shot without a scope: unbiased but widely spread
Figure 2 | Arrows shot with the scope: tightly clustered but biased
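In statistical terms, the two figures illustrate the standard decomposition of mean squared error into variance plus squared bias:

$$\mathrm{MSE}(\hat{\theta}) \;=\; \mathbb{E}\big[(\hat{\theta}-\theta)^2\big] \;=\; \underbrace{\mathrm{Var}(\hat{\theta})}_{\text{spread of the arrows}} \;+\; \underbrace{\big(\mathbb{E}[\hat{\theta}]-\theta\big)^2}_{\text{squared distance from the bullseye}}$$

Figure 1’s archer has a zero bias term but a large variance term; Figure 2’s has a small variance term but a bias term that no amount of tight clustering can reduce.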
The objective of our paper is to use the Mexican intracensal survey – a large-scale household survey of millions of households conducted between censuses – as a census dataset to evaluate a number of different methods for doing small area estimation. You can think of the census as establishing an objective truth (painting the bullseye on the target), while the different methods are different scopes Robin Hood might employ to win his archery competition. We test for both the bias (how far the average arrow lands from the bullseye) and the variance (how tightly the arrows are clustered), as measured together by the MSE. We need to remember that in real life we do not have the benefit of the painted bullseye – the truth is unobserved – so we need to be relatively confident in the accuracy of our scope if we are going to win that foxy smooch. The worst-case scenario would actually be to cluster all your arrows in one part of the blank target thinking you hit the bullseye when you were really quite far off (i.e., present your unknowingly biased results with narrow confidence intervals, conveying misplaced confidence in the findings).
To evaluate the accuracy of our scopes, we run two types of simulations with the census: design-based and model-based. In the design-based simulation, we make Robin shoot arrows at the same bullseye 500 times using each of the scopes, and then compare how each scope did on average. The benefit of this approach is that it is a real-world test – actually shooting arrows at the target. The downside is a problem with external validity: how likely is it that the results from Robin shooting at this particular target 500 times would hold for any other target (like the all-important one at the archery competition)? What if there were something about Robin that just matched really well with one type of scope for this target (or characteristics of our Mexican data that lend themselves well to one approach)?
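A stylized sketch of that design-based loop is below. The helper names are hypothetical stand-ins: `census` holds the true welfare measure (so the bullseye can be painted), `estimate_poverty` stands in for whichever SAE method is being tested, and simple random sampling with `SAMPLE_SIZE` households stands in for a real survey design.

```python
# A stylized sketch of a design-based simulation: repeatedly draw a
# "survey" from the census, estimate, and compare against the truth.
import numpy as np

true_rates = compute_true_rates(census)      # hypothetical helper: the painted bullseye
estimates = []

for rep in range(500):                       # 500 repeated "shots"
    sample = census.sample(n=SAMPLE_SIZE, random_state=rep)   # draw one survey
    estimates.append(estimate_poverty(sample, census))        # one arrow per area

estimates = np.array(estimates)              # shape: (500, n_areas)
bias = estimates.mean(axis=0) - true_rates   # how far the average arrow lands
mse = ((estimates - true_rates) ** 2).mean(axis=0)  # spread and bias combined
```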
Model-based simulations work the opposite way. They look at multiple targets full of arrows shot by someone with a particular scope, and then try to figure out the entire universe of possible shooters and conditions that could have led to this outcome. What if the archer was short? Very strong? Shooting from a tree? Had a head cold? What if there was a strong crosswind? This analysis is useful because, though it is a bit more removed from real life, we can do a bit of meta-analysis of the findings: Scope 1 is very good when it is breezy, but you have to be tall to shoot accurately. Scope 2 is great if you are super strong but vertically challenged. Scope 3 is disastrous if you have to shoot from a tree, no matter who you are. And so on.
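In code, a model-based simulation generates whole populations from an assumed model, so the "conditions" can be varied deliberately. The sketch below is purely illustrative – the model, parameter values, and the idea of toggling skewed errors are assumptions for exposition, not the paper’s exact design.

```python
# A stylized sketch of a model-based simulation: generate populations
# from an assumed nested-error model and vary the error distribution.
import numpy as np

rng = np.random.default_rng(7)
N_AREAS, HH_PER_AREA = 50, 200

def simulate_population(error_sd, skewed=False):
    """Generate log welfare as covariate + area random effect + noise."""
    area = np.repeat(np.arange(N_AREAS), HH_PER_AREA)
    x = rng.normal(size=area.size)                    # household covariate
    u = rng.normal(0, 0.5, N_AREAS)[area]             # area random effect
    e = (rng.lognormal(0, error_sd, area.size) - np.exp(error_sd**2 / 2)
         if skewed else rng.normal(0, error_sd, area.size))
    return area, x, 5.0 + 1.0 * x + u + e             # log welfare

# Re-run the full sampling-and-estimation loop under each condition
# (normal vs. skewed errors) to see which "scope" degrades, and when.
for skewed in (False, True):
    area, x, log_welfare = simulate_population(error_sd=0.8, skewed=skewed)
    # ... draw samples, estimate, and tabulate bias/MSE as in the sketch above
```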
So, after all our arrow shooting was said and done, what did we find?
- For small area estimation under most of the methods considered, a critical assumption is that the errors are normally distributed, which typically requires a data transformation. Historically, most analysts have taken the natural log, but we find that log-shift and Box-Cox transformations actually improve both accuracy and efficiency (a code sketch follows this list). For Robin and his scope, he needs to make an assumption about the direction and magnitude of the wind – get it wrong and the best scope in the world still won’t strike true – so he wants the best method of adjusting his scope to the conditions.
- All methods tested deliver gains in MSE over direct estimates (shooting with a scope clusters Robin’s arrows more tightly), though certain methods – such as those using only area-level characteristics – suffer from considerable bias (some groups of scopes tend to strike systematically off the bullseye), and that bias can be worsened or countered by other deviations from model assumptions (you are in serious trouble with some scopes if it is rainy or windy on competition day). In real-world scenarios where we lack a census (or painted bullseye) for verification, not knowing whether the deviations from model assumptions are making things better or worse could lead us to unknowingly miss the target entirely.
- Well-being depends on household-level characteristics. This means that if you work only with aggregated covariates (such as those at the area level), ignoring household-specific characteristics, the most likely result is biased estimates of the model coefficients and, by extension, of the poverty predictions (a toy illustration follows this list). (Bear with me here, because this one is going to stretch the analogy to the breaking point…) If Robin has the choice when purchasing his scope, he should choose a “Household-Level Covariate” brand scope, because overall it is a more reliable option than the “Aggregate-Level Covariate” brand scope. The problem, though, is that “Household-Level Covariate” scopes are more expensive and sometimes just plain unavailable. In that case, if there are truly no alternatives, shooting with an “Aggregate-Level Covariate” scope is probably better than no scope at all; Robin just needs to be super conscious of the potential inaccuracy – bias – inherent in the design and compounded under certain conditions.
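On the first bullet, here is a small sketch of the three transformations using scipy. `welfare` is a hypothetical array of strictly positive household consumption values, and the shift rule is illustrative rather than the paper’s own.

```python
# A sketch of the three candidate transformations toward normality.
import numpy as np
from scipy import stats

log_y = np.log(welfare)                     # the traditional choice

shift = max(0.0, -welfare.min()) + 1.0      # illustrative shift rule
logshift_y = np.log(welfare + shift)        # log-shift transformation

boxcox_y, lam = stats.boxcox(welfare)       # Box-Cox chooses lambda by MLE
                                            # (requires strictly positive data)

# Compare how close each transformed distribution is to normal,
# e.g. via skewness: closer to 0 supports the normality assumption.
for name, y in [("log", log_y), ("log-shift", logshift_y), ("box-cox", boxcox_y)]:
    print(name, stats.skew(y))
```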
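And on the last bullet, a self-contained toy illustration of one channel of that bias: because poverty is a nonlinear (threshold) function of welfare, the poverty rate implied by an area’s average household is not the area’s average poverty rate. All numbers are made up for the example.

```python
# Toy illustration: aggregating away household variation distorts
# poverty predictions, because the poverty indicator is nonlinear.
import numpy as np

rng = np.random.default_rng(0)
log_welfare = rng.normal(loc=7.0, scale=0.6, size=100_000)   # one area's households
poverty_line = np.exp(6.8)

true_rate = (np.exp(log_welfare) < poverty_line).mean()        # household-level
naive_rate = float(np.exp(log_welfare.mean()) < poverty_line)  # from the area mean

print(f"household-level rate: {true_rate:.2f}")   # ~0.37
print(f"area-mean 'rate':     {naive_rate:.2f}")  # 0 or 1: the spread is thrown away
```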
Our paper goes into much more detail, and we encourage you to read it if you are interested in learning more. As for how things turned out for Robin: after all this rigorous testing, he selected a top-of-the-line “Household-Level Covariate” brand scope, fitted with lasso model selection and log-shift transformation upgrade kits, aced the archery competition, and won the hand of the fair Maid Marian – because what foxy lady could help but be swept off her feet by this level of rigor in empirical analysis?
And they lived happily ever after…