Published on Development Impact

How to collect data on social networks

With Guest Blogger Juni Singh.

Social interactions play an important role in determining individual behavior and outcomes, especially in developing countries where people rely on informal relations for information sharing, public good provision, and risk sharing. As such, one needs to think carefully about the underlying social network when examining the effects of various interventions. For instance, social connections can lead to spillovers from the treatment group to the control group. Or there may be spillovers within the treatment group that generate heterogeneous treatment effects depending on how connected individuals, households, or firms are. In some cases, social connectedness may itself be an outcome of interest (Anukriti et al 2022; Khandelwal and Singh 2024). Mapping social networks is a way to formalize and think through how these interactions and spillovers occur. Collecting data on social connectedness gives us access to statistics that help us better understand the channels of transmission.

What type of network data should I collect?

The first question to ask yourself is what kind of relational data is relevant for the question that you want to answer. Think for example about adoption of electric cookstoves – if it is typically a decision made at the household level, you might want to know how information flows across households. For family planning and reproductive health knowledge, on the other hand, asking women who they spend the most time with might be more relevant (e.g., Anukriti et al 2020).  The unit of measurement depends very much on the level of interaction that you are interested in. A word of caution: if you are mapping household networks, be careful as to who in the household is surveyed, as that can introduce bias in your data.

Once you have determined the unit of measurement, you need to decide what form the data will take. There are many types of relationships or interactions, e.g., kinship; actions (talking with, visiting the market with, borrowing from); and co-occurrence (studying or working together). You also need to decide whether you simply want binary data (connected vs. not connected) or continuous data (e.g., number of times talked during the last month).

What network statistics am I interested in?

There are two main components to constructing a network: nodes (think of individuals) and an edge list (think of connections between individuals). In the network diagram below (extracted from Iacobelli & Singh 2023), each circle is a woman in the network (a node), and a link is drawn between two nodes if either of them names the other as a friend. Depending on your research question, you might be interested in different network statistics. One of the most common is the number of ties or direct connections (the degree). In the network below, the person circled in blue has two connections. Centrality is another important concept, which captures the power or influence of a person in the network. The person circled in red is, for example, very “central” since she has many friends who themselves have many friends.

Besides individual-level statistics, you may also be interested in statistics at the level of the entire network. One such measure is clustering, which shows how closely knit a given network is.

Figure 1: Social network mapped in a Nepali village.
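
The statistics described in this section can be computed directly from a list of nominations. The sketch below uses only the Python standard library; the names and nominations are made up for illustration, and the centrality measure shown (eigenvector centrality, computed by simple power iteration) is just one of several possible choices, since the text above does not commit to a specific one.

```python
from collections import defaultdict

def build_network(nominations):
    """Undirected network: link two people if either names the other.
    People who name nobody and are named by nobody do not appear."""
    adj = defaultdict(set)
    for namer, named in nominations:
        if namer != named:
            adj[namer].add(named)
            adj[named].add(namer)
    return dict(adj)

def degree(adj):
    """Number of direct connections for each node."""
    return {node: len(nbrs) for node, nbrs in adj.items()}

def local_clustering(adj, node):
    """Share of a node's pairs of connections that are themselves connected."""
    nbrs = sorted(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    linked = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2 * linked / (k * (k - 1))

def average_clustering(adj):
    """Network-level clustering: mean of the local coefficients."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

def eigenvector_centrality(adj, iterations=200):
    """Power iteration on (adjacency + identity); scores are scaled so
    that the most central node has score 1."""
    score = {n: 1.0 for n in adj}
    for _ in range(iterations):
        new = {n: score[n] + sum(score[m] for m in adj[n]) for n in adj}
        top = max(new.values())
        score = {n: v / top for n, v in new.items()}
    return score

# Hypothetical nominations: (person asked, person she named as a friend)
nominations = [
    ("Asha", "Bina"), ("Bina", "Chitra"), ("Asha", "Chitra"),
    ("Chitra", "Devi"), ("Devi", "Ela"),
]
network = build_network(nominations)
centrality = eigenvector_centrality(network)
print(degree(network))
print(round(average_clustering(network), 3))
print(max(centrality, key=centrality.get))
```

In this toy network Chitra has the most direct connections (three), and two of her connections are also connected to each other, which is what the clustering measure picks up.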


What is the best way to map a social network?

Let’s start with the best-case scenario. Suppose you want to map the individual- or household-level network in a village. In an ideal world, with unlimited budget and time, you would first conduct a census of the village. You would then administer your social network module to everyone (or each household). Here too, you have choices. You could ask each respondent, one name at a time, whether they engage with every other person in your census. Or you could ask them to tell you the names of individuals that they are connected to (with or without an upper limit on how many names they can mention). Each of these approaches has pros and cons. Going through every name in the census can be time-consuming and can prime respondents. On the other hand, limiting the number of names can lead to connections being omitted, resulting in a top-coded network. The best practice is to let people name as many people as they like. This generally does not exceed six to seven names. Finally, you match each reported connection back to the census. In Figure 1 (from Iacobelli & Singh 2023), the authors surveyed every woman in the village and asked each woman to report all her connections in the village to construct the network. In Beaman et al. 2021, the authors created a household network by asking all households in the village who they consulted for agricultural decisions. This method gives you a complete picture of the network but is costly in terms of time and money.
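
To see how a cap on the number of names top-codes the network, here is a minimal sketch. It assumes that names have already been matched to census IDs; the IDs, the responses, and the `max_names` parameter are all hypothetical.

```python
def edge_list(responses, max_names=None):
    """Build an undirected edge list from name-elicitation responses.

    responses: dict mapping a respondent's census ID to the list of
               census IDs she named, in the order she named them.
    max_names: optional cap on how many names are recorded per respondent.
    """
    edges = set()
    for respondent, named in responses.items():
        kept = named if max_names is None else named[:max_names]
        for other in kept:
            if other != respondent:
                # Store each pair once, regardless of who named whom
                edges.add(tuple(sorted((respondent, other))))
    return sorted(edges)

# Hypothetical responses keyed by census ID
responses = {
    "h01": ["h02", "h03", "h04"],
    "h02": ["h01"],
    "h03": [],
    "h04": ["h05"],
    "h05": [],
}

uncapped = edge_list(responses)
capped = edge_list(responses, max_names=2)
omitted = sorted(set(uncapped) - set(capped))
print(len(uncapped), len(capped), omitted)
```

Here the cap drops the link between h01 and h04: h01 named h04 third, and h04 did not name h01 back, so the connection disappears from the capped network.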

How can I measure social networks with resource constraints?

Reduce the number of questions to the type of relation you are interested in. This could mean asking a single question that captures the network along one dimension, such as advice, for example “Who in the village do you go to for advice on reproductive health?” Depending on the question that you want to answer, you could use the number of ties as the outcome and not build the complete network. To build the complete network, you would still need to match the names collected in the survey to the census to create an edge list. This can be a cost-effective approach but still requires a lot of time to do the matching correctly. For example, say there are two women named Maya Tamang in the village; when someone names Maya as a friend, you need to link the nomination to the right Maya. And let us not get started on the issue of different spellings!
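
The matching step can be partly automated, with ambiguous cases flagged for manual follow-up. The sketch below uses `difflib.SequenceMatcher` from the Python standard library to tolerate spelling variants. The census entries, the 0.8 similarity threshold, and the function name are assumptions for illustration; in practice you would tune the threshold and add identifying information (such as household or neighbourhood) to resolve duplicates.

```python
from difflib import SequenceMatcher

def match_name(reported, census, threshold=0.8):
    """Match a reported name against a census.

    census: dict mapping census ID to the name recorded in the census.
    Returns (status, candidate_ids), where status is "matched",
    "ambiguous" (several plausible people), or "unmatched".
    """
    def similarity(a, b):
        return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

    candidates = sorted(
        cid for cid, name in census.items()
        if similarity(reported, name) >= threshold
    )
    if not candidates:
        return ("unmatched", [])
    if len(candidates) == 1:
        return ("matched", candidates)
    return ("ambiguous", candidates)

# Hypothetical census with two people sharing a name
census = {"p1": "Maya Tamang", "p2": "Maya Tamang", "p3": "Sita Gurung"}

print(match_name("Maya Tamang", census))   # two candidates: needs manual resolution
print(match_name("Seeta Gurung", census))  # spelling variant of a single person
print(match_name("Ram Thapa", census))     # nobody similar in the census
```

Returning the list of candidates rather than picking one keeps the decision with the field team, who can use the respondent's household, age, or location to settle which Maya was meant.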

Conduct partial snowball sampling to obtain a “complete” network for a subset of the entire network. In this method you would choose a fixed number of seeds randomly. You would elicit their connections, then elicit connections from their connections, and so on. You would then match the names to the census. However, if the overall network is segregated, the seed selection can have non-trivial consequences, because seeds drawn from one group may never lead you to the others.
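
The snowball procedure can be sketched as follows. The `true_network` dictionary stands in for what respondents would report when interviewed; it, the names in it, and the number of waves are hypothetical. The example is deliberately built with two groups that have no links between them, to show the segregation problem.

```python
import random

def snowball(network, seeds, n_waves):
    """Interview the seeds, then the people they name, and so on.

    network: dict mapping each person to the set of people she would name.
    seeds:   people interviewed in the first wave.
    n_waves: number of interview rounds.
    Returns (interviewed, edges).
    """
    interviewed = set()
    edges = set()
    frontier = set(seeds)
    for _ in range(n_waves):
        named = set()
        for person in sorted(frontier):
            interviewed.add(person)
            for friend in network[person]:
                edges.add(tuple(sorted((person, friend))))
                named.add(friend)
        # Next wave: people who were named but not yet interviewed
        frontier = named - interviewed
        if not frontier:
            break
    return interviewed, edges

# Hypothetical segregated village: {a, b, c} and {x, y} share no links
true_network = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b"},
    "x": {"y"}, "y": {"x"},
}

# With a seed in the first group, the second group is never reached
interviewed, edges = snowball(true_network, seeds=["a"], n_waves=3)
print(sorted(interviewed), sorted(edges))

# In the field, seeds would be drawn at random from the census
random_seeds = random.Random(0).sample(sorted(true_network), 2)
print(random_seeds)
```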

Randomly sample individuals and estimate the missing links. In Banerjee et al. (2013), the authors conduct a full census of 75 villages and administer the network survey to 46% of all households per village. They “correct” their results for measures of missing data. Chandrasekhar & Lewis (2016) outline a set of analytic corrections and a two-step estimation procedure using graphical reconstruction.
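
To build intuition for why sampled networks need correcting, consider the simplest case: you survey m of n nodes chosen uniformly at random and observe only links among the surveyed nodes. Each link of a surveyed node then survives with probability (m - 1)/(n - 1), so the average degree is understated by roughly that factor. The sketch below applies this simple rescaling. It is an illustration only, not the correction used in Banerjee et al. (2013) or the graphical reconstruction procedure of Chandrasekhar & Lewis (2016), and the function names are made up.

```python
def induced_mean_degree(network, sampled):
    """Average degree counting only links among sampled nodes."""
    sampled = set(sampled)
    degrees = [len(network[node] & sampled) for node in sampled]
    return sum(degrees) / len(degrees)

def rescaled_mean_degree(network, sampled, n_total):
    """Scale the sampled average degree up by (n - 1) / (m - 1)."""
    m = len(set(sampled))
    return induced_mean_degree(network, sampled) * (n_total - 1) / (m - 1)

# Hypothetical village of 10 households in which everyone is linked to everyone
nodes = range(10)
network = {i: {j for j in nodes if j != i} for i in nodes}
sampled = [0, 1, 2, 3, 4]

print(induced_mean_degree(network, sampled))                # understates the true value of 9
print(rescaled_mean_degree(network, sampled, n_total=10))   # recovers it in this special case
```

The rescaling is exact here only because every household has the same degree. In a real network it holds on average across random samples, and other statistics (clustering, centrality) are distorted in ways that a single scaling factor does not fix.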

Aggregated Relational Data (ARD) is another approach that can be used to recover parameters of a network at the level of the individual and the whole network (Breza et al. 2020). Instead of asking for the names of all individuals that a respondent is connected to, you only need to ask for the number of individuals with certain characteristics that the respondent is connected to. For example, “How many of the households that you know have twins?” can be asked of a random subset of the full sample. This method is up to 70% cheaper than a standard network elicitation. See this post from a few years back on this approach.
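
As a rough illustration of how ARD responses carry information about a respondent's number of connections, the sketch below applies the basic network scale-up idea: if a respondent knows a given share of the households with some trait, she is assumed to know about the same share of all households. This is a much simpler estimator than the model in Breza et al. (2020) and is shown only for intuition. The traits, counts, and population size are hypothetical, and the trait totals are assumed to be known from a census.

```python
def scale_up_degree(ard_counts, trait_totals, population):
    """Basic scale-up estimate of one respondent's degree.

    ard_counts:   dict trait -> number of households with that trait
                  the respondent says she knows.
    trait_totals: dict trait -> number of households in the village
                  with that trait (assumed known, e.g. from a census).
    population:   total number of households in the village.
    """
    known = sum(ard_counts[trait] for trait in trait_totals)
    total = sum(trait_totals.values())
    return population * known / total

# Hypothetical village of 200 households
trait_totals = {"has_twins": 10, "owns_motorbike": 40}
respondent = {"has_twins": 1, "owns_motorbike": 5}

estimate = scale_up_degree(respondent, trait_totals, population=200)
print(estimate)
```

The respondent knows 6 of the 50 households with these traits, or 12 percent, so the estimate is 12 percent of 200 households.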

Given the resources at your disposal, you should now be able to decide what works best for your context.

S Anukriti

Senior Economist, Development Research Group
