Published on Development Impact

Invisible sample selection: Why you should care about those who leave when you are interested in those left behind: Guest post by Andreas Steinmayr

This is the eighth in our series of posts by students on the job market this year.
A key problem in the literature on the economics of migration is how the emigration of an individual affects the welfare of households left behind (see Antman (2013) for a literature overview). The literature has worried a lot about the possibility that households that select into migration differ from those that do not. A whole range of IV approaches, along with a few migration lottery experiments, have tried to address this form of selection. However, the literature has worried less about (and been less successful in dealing with) a second form of selection, namely that some households do not leave any member behind. I call this invisible sample selection, since these all-move households are not observed at all in the standard origin-country household surveys used in most studies (nor in many other datasets). Failing to account for this problem leads to biased estimates, as explained below and shown in this graphical illustration.

My job market paper provides a novel way of dealing with this problem. I show that relatively weak assumptions about the behavior of household members and the selectivity of migrants allow one to bound the effect of migration on household members left behind. The following example illustrates the problem and my proposed solution.
Consider households with one adult and one child. Suppose adults participate in a visa lottery and migrate if they win and stay if they do not. Households then decide whether the child migrates. We can group households into four latent groups that describe all hypothetically possible reactions to randomly assigned adult migration (see Table 1a). These latent groups are analogous to the latent groups in the LATE framework (Angrist, Imbens, and Rubin 1996). Not all groups necessarily exist in reality. If we assume that children do not migrate alone (which may be a justified assumption in many settings), we can rule out the existence of always migrants and defiers.
Table 1b shows the correspondence between observed groups O(a,c), indexed by the migration status of the adult (a) and of the child (c), and latent strata. Most origin-country datasets include only households in which someone stays behind, and thus no households of group O(1,1). If children do not migrate alone, we should not observe any households of group O(0,1). An approach that ignores invisible sample selection would take the difference in outcomes between groups O(1,0) and O(0,0). Households in group O(1,0) are households that leave the child behind (never migrants). However, observed households without migrants (group O(0,0)) could be either never migrants or compliers that did not win the lottery. One can imagine many reasons why households that take the child with them differ from households that leave the child behind. The naïve approach thus compares never migrants under treatment to a mixture of never migrants and compliers under control, and therefore produces biased estimates.
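To get a feel for the size of this bias, here is a hedged simulation sketch. It assumes, purely for illustration, normally distributed outcomes, a 50% win probability, and never migrants and compliers that differ only in their mean baseline outcome; the function name naive_estimate and all parameter values are hypothetical, not taken from the paper. Winning compliers move as a whole household and never enter the data.

```python
import random
import statistics


def naive_estimate(n=100_000, share_never=0.5, mu_never=10.0, mu_complier=14.0,
                   effect=-1.0, sd=2.0, seed=1):
    """Naive O(1,0) minus O(0,0) comparison in a simulated visa lottery.

    Hypothetical data-generating process: never migrants leave the child behind
    and experience `effect` when the adult migrates; compliers take the child
    along when the adult wins, so those households vanish from origin data.
    """
    rng = random.Random(seed)
    left_behind_treated = []  # O(1,0): winners whose child stays (never migrants)
    no_migrant_control = []   # O(0,0): losers, never migrants and compliers mixed
    for _ in range(n):
        is_never = rng.random() < share_never
        wins = rng.random() < 0.5
        y0 = rng.gauss(mu_never if is_never else mu_complier, sd)
        if wins and is_never:
            left_behind_treated.append(y0 + effect)
        elif not wins:
            no_migrant_control.append(y0)
        # Winning compliers are all-move households: never observed.
    return statistics.fmean(left_behind_treated) - statistics.fmean(no_migrant_control)


print(round(naive_estimate(), 2))                  # roughly -3
print(round(naive_estimate(mu_complier=10.0), 2))  # roughly the true -1
```

With the default values the naive difference is close to -3 although the true effect on never migrants is -1; in this setup the gap equals (1 - share_never) times (mu_never - mu_complier). Setting mu_complier equal to mu_never removes the bias.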
Bounding the effect
For never migrants, we have identified the outcome under treatment but not under control. We can bound the mean outcome under control by trimming group O(0,0) from below or from above, i.e., by considering the two extreme scenarios in which all never migrants have lower outcomes than compliers or, alternatively, all have higher outcomes (Lee 2009). However, the trimming requires knowing the shares of compliers and never migrants, which cannot be estimated from origin-country data alone if all-move households are not included. I show how these shares can be calculated using data from both origin and destination countries.
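The trimming step can be sketched in Python as follows. This is a simplified illustration, not the paper's derivation of the shares: it assumes, hypothetically, that origin-country data count winner households that left someone behind, that destination-country data count winner households that moved as a whole, and that random assignment makes the never-migrant share the same among losers. The function names are hypothetical, and the sketch ignores ties and sampling uncertainty.

```python
import statistics


def never_migrant_share(winners_left_behind, winners_all_moved):
    """Share of never migrants among lottery winners.

    Hypothetical inputs: winners_left_behind counted in origin-country data,
    winners_all_moved counted in destination-country data. Under random
    assignment the same share applies to lottery losers.
    """
    return winners_left_behind / (winners_left_behind + winners_all_moved)


def lee_bounds(treated, control, share_never):
    """Lee (2009)-style trimming bounds on the effect for never migrants.

    treated: outcomes in O(1,0), all never migrants.
    control: outcomes in O(0,0), a mix of never migrants and compliers.
    """
    ys = sorted(control)
    k = round(share_never * len(ys))  # control units attributed to never migrants
    if not 0 < k <= len(ys):
        raise ValueError("share_never must leave at least one control unit")
    treated_mean = statistics.fmean(treated)
    low_mean = statistics.fmean(ys[:k])    # never migrants have the lowest outcomes
    high_mean = statistics.fmean(ys[-k:])  # never migrants have the highest outcomes
    return treated_mean - high_mean, treated_mean - low_mean


share = never_migrant_share(40, 60)
print(share, lee_bounds([5, 6, 7], list(range(1, 11)), share))
```

With a never-migrant share of 0.4, the call keeps the lowest and the highest four of the ten control outcomes and returns the bounds (-2.5, 3.5).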
Things become somewhat more complicated when we have an instrument for adult migration instead of random assignment. However, the logic of the approach is very similar to the case with random assignment. I define 16 latent strata that characterize all potential reactions of households to the assigned value of the instrument. I then make behavioral assumptions that rule out the existence of some latent groups. In my setting, such assumptions are, for example, that the instrument makes nobody less likely to migrate (no defiers among adults) or that the instrument affects migration of the child only through its effect on migration of the adult. I then show that in this scenario, too, the outcome is identified under treatment and can be bounded under control.
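One natural way to arrive at 16 strata, assumed here for illustration, is to combine the adult's and the child's potential migration under each value of a binary instrument: (D0, D1, C0, C1). The sketch below encodes the two example assumptions from the text in a form that is my own hypothetical formalization, plus the "children do not migrate alone" restriction from the lottery example as an extra illustrative assumption, and counts the strata that survive. The paper's exact set of restrictions may differ.

```python
from itertools import product

# A stratum is (D0, D1, C0, C1): adult (D) and child (C) potential migration
# when the instrument Z equals 0 or 1. Binary Z gives 2**4 = 16 strata.
ALL_STRATA = list(product((0, 1), repeat=4))


def no_adult_defiers(s):
    """The instrument makes no adult less likely to migrate."""
    d0, d1, _c0, _c1 = s
    return d0 <= d1


def child_only_through_adult(s):
    """If the adult's migration does not respond to Z, neither does the child's."""
    d0, d1, c0, c1 = s
    return d0 != d1 or c0 == c1


def child_not_alone(s):
    """Illustrative extra restriction: a child migrates only with the adult."""
    d0, d1, c0, c1 = s
    return c0 <= d0 and c1 <= d1


def surviving(strata, *assumptions):
    """Keep the strata consistent with every assumption."""
    return [s for s in strata if all(check(s) for check in assumptions)]


print(len(ALL_STRATA))
print(len(surviving(ALL_STRATA, no_adult_defiers, child_only_through_adult)))
print(len(surviving(ALL_STRATA, no_adult_defiers, child_only_through_adult,
                    child_not_alone)))
```

Under these encodings the three calls report 16, 8 and 5 strata. Writing each restriction as a separate predicate makes it easy to see how much each behavioral assumption narrows the set of latent groups.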
A surprising result of this analysis is that IV estimates are usually biased, even if, among households that react to the instrument, those that take the child with them do not differ systematically from those that leave the child behind. The absence of all-move households from the data makes the estimated share of compliers incorrect, which biases the IV estimates. My paper proposes a correction to the IV estimator for this scenario.
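A small numerical sketch shows how this can happen. It assumes a hypothetical population with only adult never-takers and adult compliers (no always-takers, no defiers), in which complier households that take the child along have the same mean outcome as those that leave the child behind. It computes the Wald estimator from population means of the households that remain visible in origin-country data. The function name and parameter values are illustrative, and the paper's correction is not reproduced here.

```python
def observed_wald(share_never_taker, share_complier, share_all_move,
                  mu_never_taker, mu_complier, effect):
    """Wald (IV) estimate computed only from households visible in origin data.

    Hypothetical population: adult never-takers and adult compliers (no
    always-takers, no defiers; shares sum to 1). When Z = 1, a fraction
    share_all_move of complier households takes the child along and vanishes.
    Complier households that take the child and those that leave it behind
    have the same mean outcome, mu_complier.
    """
    # Z = 0: no adult migrates, every household is observed.
    y_z0 = share_never_taker * mu_never_taker + share_complier * mu_complier
    d_z0 = 0.0
    # Z = 1: only compliers that leave the child behind remain in the data.
    stayers = share_complier * (1 - share_all_move)
    visible = share_never_taker + stayers
    y_z1 = (share_never_taker * mu_never_taker
            + stayers * (mu_complier + effect)) / visible
    d_z1 = stayers / visible
    first_stage = d_z1 - d_z0
    if first_stage == 0:
        raise ValueError("instrument has no visible first stage")
    return (y_z1 - y_z0) / first_stage


print(observed_wald(0.6, 0.4, 0.5, 10.0, 14.0, -1.0))  # about -3.4, true effect -1
print(observed_wald(0.6, 0.4, 0.0, 10.0, 14.0, -1.0))  # nobody vanishes: about -1
```

In the first call the visible first stage is 0.25, which matches neither the population share of adult compliers (0.4) nor the share of compliers who leave the child behind (0.2), and the estimate is about -3.4 against a true effect of -1. In this particular setup the bias also disappears when mu_never_taker equals mu_complier, because the changed composition of the Z = 1 sample then no longer shifts its mean.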
I illustrate the approach using data from two recent studies. First, I use data from a visa lottery for Tongans migrating to New Zealand and investigate the effects on household-level outcomes. In this setting, 53% of households disappear from the data when they win the visa lottery. Gibson, McKenzie, and Stillman (2011) address the problem of intra-household selection by removing from the control group and the group of non-compliers all households in which all members would have been eligible to join the principal migrant. While this approach ensures that the treatment and control groups contain no all-move households, it always removes the principal migrant's immediate family members, and thus does not allow identifying the effects of migration on the migrant's spouse and children. I show that informative bounds can be derived that do not require separating movers from stayers based on observable characteristics.
In a second application, I study the effect of migration on the educational attainment of children left behind in Mexico, based on the data used in McKenzie and Rapoport (2011). Comparing data from the 2000 Mexican and U.S. censuses suggests that roughly 82,000 Mexican-born children aged 12 to 15 lived in the United States and were therefore not included in the Mexican data. This number has most likely increased substantially over the past decade, so taking into account sample selection due to migrating children is of increasing importance. The bounds suggest a negative effect of adult migration on the school attendance of boys, between -0.19 and -0.14, and no significant effect on girls.
The proposed approach is especially suited to migration studies, since the theoretical and empirical literature on migrant selectivity provides a foundation for deriving credible assumptions. Several extensions of this approach are possible, especially when exogenous variation that addresses one endogeneity problem is available, as in the increasingly popular experimental approaches (McKenzie and Yang, 2010). However, invisible sample selection is a problem not only in the migration literature but also in other fields, particularly in studies of developing countries, where high fertility, mortality, and internal and external migration may rapidly change the population. Available observational or administrative data (e.g., census data) are often representative not of the population before a treatment or event (natural experiment) took place, but only of the population after it. I study a scenario in which units disappear from the population. But a treatment might also increase immigration or change fertility behavior or household formation, and therefore lead to more or different units being observed. My paper shows that although the problem is invisible in many cases, we should not close our eyes to it.
Andreas Steinmayr is a Visiting Postdoc at the University of Chicago.