International migration is the most effective action that people in developing countries can take to increase their incomes and well-being. Yet our ability to learn about the policies that enhance or inhibit the gains to migration is severely restricted by the poor state of migration data. One element of this is the lack of representative surveys of immigrants. Surveying immigrants is difficult since i) they are often rare elements of the population (e.g. China has the second largest number of immigrants in the U.S., at around 2 million, so a random sample of 1000 U.S. households would yield only about 7 households with Chinese migrants, and it gets much worse for other migrant groups) and ii) in some countries, an important share of the developing country migrant population is undocumented, so that they do not appear on community registers and may be reluctant to answer surveys.
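To make the rare-population arithmetic concrete, here is a minimal sketch. The 0.7% prevalence is simply backed out of the figures above (7 households per 1000 sampled); the target of 400 migrant households and the function names are hypothetical, chosen only for illustration.

```python
import math


def expected_hits(sample_size: int, prevalence: float) -> float:
    """Expected number of sampled households containing the rare group."""
    return sample_size * prevalence


def households_needed(target_hits: int, prevalence: float) -> int:
    """Households to sample so the expected number of hits reaches the target."""
    return math.ceil(target_hits / prevalence)


# Assumption: prevalence implied by the post's example (7 per 1000 households).
prevalence = 7 / 1000

print(expected_hits(1000, prevalence))
# Hypothetical target of 400 migrant households.
print(households_needed(400, prevalence))
```

Under these assumptions, a simple random sample would need to screen tens of thousands of households to reach a few hundred migrant households, which is why untargeted random sampling is rarely affordable for this purpose.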
A recently published paper by Cris Beauchemin and Amparo González-Ferrer summarizes their experiences using “origin-based snowballing” to construct a sample of Senegalese migrants in France, Italy and Spain. The basic idea of this approach is to first carry out a representative survey of households in the sending country (Senegal), oversampling households in areas with a high prevalence of emigration. These households are then asked for contact details of household members, close friends and relatives who are abroad, and a survey team follows up to interview these individuals in the destination countries.
In theory this approach offers several advantages: it deals with the needle-in-the-haystack problem of finding immigrants in the destination country; and migrants may be more willing to respond to the survey if the introduction has come through a trusted home source, which may be particularly important for illegal migrants. However, even if it goes well, it will tend to identify immigrants who are more connected to their home countries, since those who migrate with their whole households, or who don’t keep in contact with people in their home country, won’t have anyone to provide their contact information. It will also make in-person surveying expensive in the destination country if immigrants are spread across a whole range of different cities.
The experience of the MAFE-Senegal project suggests one should be even more cautious about using such a method, which in this case came close to complete failure. Although Senegalese households declared that they had 783 migrants in these European countries, the survey team managed to interview only 36 of them (about 5%). The individuals contacted in this way were no more likely to be undocumented, and much more likely to be strongly connected to Senegal, than individuals interviewed in France, Spain and Italy using other sampling methods. The authors discuss how the method suffered problems at each stage: first, households only supplied phone numbers for 1/3 of the migrants they said they had abroad, with richer households less likely to supply this information; second, only 78% of the numbers given were correct; and third, only 17% of the correct contact details resulted in interviews. Refusal rates were particularly high in Italy, which the authors attribute to concerns about restrictive measures against immigrants being implemented by the Berlusconi government at the time.
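The way losses compound across the three stages can be sketched as follows. The stage rates are the rounded figures quoted above, so the result is only approximate; the variable and function names are illustrative.

```python
def funnel(start: float, stages: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Apply each stage's success rate in turn; return the expected count left after each stage."""
    remaining = float(start)
    rows = []
    for name, rate in stages:
        remaining *= rate
        rows.append((name, remaining))
    return rows


declared_migrants = 783  # migrants declared by Senegalese households

# Rounded stage rates as quoted in the text.
stages = [
    ("phone number supplied", 1 / 3),
    ("number was correct", 0.78),
    ("interview completed", 0.17),
]

for name, count in funnel(declared_migrants, stages):
    print(f"{name}: about {count:.0f} migrants remain")
```

With these rounded rates the expected number of completed interviews comes out in the mid-30s, consistent with the 36 interviews reported. No single stage is catastrophic on its own; it is the product of the three that is so small.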
So in this case the method was not very useful for constructing a sample of immigrants, and was also ineffective for giving a large sample of linked transnational households. The authors note two other cases where similar methods have been applied: once with Dominican migrants, again with poor results; and once in the Mexican Family Life Survey (MexFLS), in which 91% of the migrants who left their origin households in Mexico between 2002 and 2005 were successfully tracked and interviewed in the U.S. The high success rate of the MexFLS likely reflects the fact that individuals had already been interviewed in Mexico first, and were recent migrants, but it is still amazing given the difficulties I have getting Mexicans in Mexico to answer my surveys!
So what are the alternatives? Johan Mistiaen and I have a paper where we tried three different methods of interviewing Japanese-Brazilians in Brazil: snowball sampling, intercept-point sampling, and full probabilistic sampling based on a door-to-door listing of over 22,000 dwellings to get a sample of 403 Nikkei households. The bad news was that, while using a wide variety of intercept points, together with weights that account for how often people go to different locations, helps move towards a representative sample, there were still substantial biases in these non-probability sampling methods. Depending on the reason the survey is being done (and thus how important full representativeness is), there currently seems to be no easy solution to finding the needles in the haystack, which leads to the painful process of rummaging through random samples of said haystack trying to find those elusive needles.
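One common way to build such weights is to weight each intercepted respondent by the inverse of how often they visit the sampled locations, since frequent visitors are more likely to be caught. The sketch below assumes that approach and uses made-up visit counts; it is not necessarily the weighting scheme used in the paper, and the function name is hypothetical.

```python
def intercept_weights(visits: list[dict[str, float]]) -> list[float]:
    """Inverse-visit-frequency weights, normalized to sum to 1.

    Each dict maps a sampled intercept location to the number of times the
    respondent reports visiting it in a reference period.
    """
    raw = []
    for respondent in visits:
        total_visits = sum(respondent.values())
        if total_visits <= 0:
            # Anyone intercepted at a location has visited it at least once.
            raise ValueError("each respondent needs at least one reported visit")
        raw.append(1.0 / total_visits)
    scale = sum(raw)
    return [w / scale for w in raw]


# Hypothetical respondents: the first visits the sampled locations four times
# a week in total, the second only once.
sample = [
    {"community center": 3, "grocery store": 1},
    {"grocery store": 1},
]
print(intercept_weights(sample))
```

The infrequent visitor receives the larger weight, standing in for the many similar people who were never at an intercept point when the survey team was there. Weights of this kind cannot correct for people who never visit any of the chosen locations, which is one reason such samples can remain biased.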