Notes from the Field: Using Big Data to unlock insights

Big Data Word Cloud

Google was the seventh search engine to be developed, although looking at this timeline it might even have been the fourteenth… But what did it do that the others didn’t? And why should it matter to development practitioners?

Before Google, search engines ranked pages mainly by how often a particular word appeared on them. Then came Google: Sergey and Larry instead developed an algorithm that looked at ‘backlinks’, i.e. it ranked pages based on how many other websites referenced them and on the trustworthiness of those other sites. What they essentially did was categorize data in a novel way. Today, in addition to being the foremost search engine, Google is one of the truest sources of data on human behavior and its inherent biases.
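For readers who like to see the idea in code, here is a minimal sketch of backlink-based ranking. It is a simplified, textbook-style iteration and not Google's actual algorithm; the function name `rank_pages`, the damping value of 0.85 and the four-page example web are all assumptions made for illustration.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Score pages by who links to them (simplified backlink ranking).

    `links` maps each page to the list of pages it links to.
    All names and parameter values here are illustrative assumptions.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on to the pages it references,
                # so a link from a high-scoring page counts for more.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores


# Hypothetical mini-web: three pages reference "c", and "c" references "a".
example_links = {"a": ["c"], "b": ["c"], "d": ["c"], "c": ["a"]}
example_scores = rank_pages(example_links)
```

In this toy example "c" ends up with the highest score because three pages reference it, and "a" outranks "b" because its single backlink comes from the highly ranked "c". That is the ‘trustworthiness’ part of the idea.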

‘How do wildfires start?’, ‘How much will the wall cost?’ -- most frequent Google searches, 2017
Did you know?
Data according to Google

It is not just about the rampant collection of data, but about what is collected, how it is analyzed and, oftentimes, how it is presented. For instance, data on airport arrivals, airline ticket sales, credit reporting enquiries and collateral registries can be fed into analyses of economic growth, job creation and estimates of the stock of migrant workers. Today’s ‘Big Data’ trend is supported by computing power that did not exist before. A single Google search today requires more computing power than it took to send Apollo 11 to the Moon.

Then there is the case of the racehorse and Triple Crown winner American Pharoah, whose supreme potential was initially overlooked until a Harvard grad with no professional horse experience recognized the horse’s latent talent. He did this by using an unusual ranking system: the size of the horse’s left ventricle and the correspondingly large size of his internal organs. It was a measure completely ignored by most horse breeders, at least in 2013.

A similar idea runs through Michael Lewis’ best seller ‘Moneyball: The Art of Winning an Unfair Game’, where baseball players were chosen not on individual prowess alone but to ensure that all the skills the team needed were covered, whether by one player or by four. This approach, referred to as ‘sabermetrics’, allowed America’s lowest-salaried Major League Baseball team to put together a 20-game winning streak. A team is more than just the sum of its parts.

Word cloud from the latest World Development Report, The Changing Nature of Work, illustrating the power of technology and data. The picture, which took less than a minute to create, shows the most commonly used words in the report, with the size of the font reflecting how frequently each term appears.
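The mechanics behind a word cloud are simple enough to sketch in a few lines. The snippet below is an illustration only, not the tool used to make the picture: it counts words, drops a small hand-picked stopword list, and scales font sizes linearly between a minimum and a maximum. The function name `font_sizes`, the stopword list and the size range are all assumptions.

```python
import re
from collections import Counter

# A tiny, hand-picked stopword list (an assumption; real tools use longer ones).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}


def font_sizes(text, top_n=5, min_size=10, max_size=48):
    """Map the most frequent words in `text` to font sizes.

    The most frequent word gets `max_size`, the least frequent of the
    top `top_n` gets `min_size`, and the rest are scaled linearly.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in STOPWORDS)
    top = counts.most_common(top_n)
    if not top:
        return {}

    highest = top[0][1]
    lowest = top[-1][1]
    spread = highest - lowest

    sizes = {}
    for word, count in top:
        # If every word is equally frequent, give them all the largest size.
        scale = (count - lowest) / spread if spread else 1.0
        sizes[word] = round(min_size + scale * (max_size - min_size))
    return sizes


# Hypothetical input text, chosen only to make the scaling easy to follow.
sample_sizes = font_sizes("data data data work work technology", top_n=3)
```

With this sample text, ‘data’ (three mentions) gets the largest font, ‘technology’ (one mention) the smallest, and ‘work’ lands halfway between.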

Conventional sources of information, including surveys, face many challenges in their presentation and interpretation. Often, we need to reassess what data we collect, how we collect it and, more importantly, the conclusions we draw from it. Big Data introduces diverse sources of data and allows a whole new way to study populations. Equity analysts in many countries rely on data such as electricity consumption and shipments, in addition to official statistics, to assess GDP growth.

Alternative data analysis, both quantitative and behavioral, can be used to dramatically bolster financial access around the world, especially if we use it to supplement our traditional sources of information. Small, unobtrusive experiments that vary the stimuli and potentially tabulate responses in real time may allow us to develop more targeted interventions in the work we do. The practice of using text as data is beginning to grow. Using tools such as word clouds, simple trend analysis and more complex artificial intelligence methods, we can look at patterns such as how language has changed over the years. For instance, the change in the language used to refer to women over the years is one measure of advances in gender equality. But that is a topic for a different blog.

Before we can probe into the analytics, there needs to be a data source to analyze. For example, the rich transaction data held by the wide agent banking network, combined with social data and the size of an individual’s network, could help us create improved methods for increasing the usage of inactive accounts. This is critical for financial inclusion goals that can support poverty reduction efforts. The important work that the data collaboratives working group is already doing in this space, creating public value by exchanging data with private enterprises, should be leveraged more thoroughly in our operations.

Several open data initiatives hold a lot of promise, and open data is critical to tracking progress on the SDG agenda. They range from the sharing of financial data under the UK Open Banking Initiative, to the use of satellite data to look at flooding patterns in Kerala, to the US government’s open data initiative and, of course, the WBG’s own open data, to name a few.

Data analytics is exciting for many. For some, however, there are legitimate concerns about data protection. With so much personal data being held, and sold, by corporations, ethics is a primary concern. Aside from ethics, we need to be attuned to the inherent biases that analysis can raise, such as offering women cheaper insurance as ‘safer drivers’, or social rankings and views on your trustworthiness based on shopping preferences. Moreover, the private and public sectors are increasingly turning to artificial intelligence and machine-learning algorithms fed on Big Data to automate decision-making processes, but can we really see into the ‘black box’ of these algorithms?

Big Data often offers the sheen of being impartial, but the algorithm may be masking intrinsic biases. The output is highly dependent on the input. In other words, at the end of a trail of Big Data is the algorithm it is fed into, and that algorithm, invariably designed by humans, is not without bias. The combination of Big Data and traditional research offers valuable lessons, but we need to be willing to admit that some of our previous assumptions might need to be altered.


Authors

Sharmista Appaya

Business Line Lead for Digital Data Infrastructure
