Last week, I had the honor of receiving one of the World Bank's FY15 Big Data Innovation Challenge awards for a proposal developed with a team of researchers from within and outside of the Bank. To give you a snapshot of the project, let me recount a familiar story which you may not have thought about for a while. On December 17th, 2010, a Tunisian fruit vendor named Mohammed Bouazizi took a can of gasoline and set himself on fire in front of the local governor's office. Bouazizi acted after local police confiscated his fruit cart and he was unable to obtain an audience with the local governor; his death sparked what we now know as the "Arab Spring." When citizens lack trust in institutions and have no other means of voicing discontent, they can embrace extreme forms of protest against institutions and governments, and those protests can escalate quickly.
Our Big Data Innovation Challenge Project "'We Feel Fine': Big Data Observations of Citizen Sentiment About State Institutions and Social Inclusion" asks three research questions related to governance, trust, and inclusive institutions: 1)
In this project, we aim to discover answers to these questions through Big Data Analytics. What does this mean exactly? It means that we will collect a huge volume of data: approximately 100 million tweets, or approximately 200 gigabytes of Twitter data. If each byte were one second of time, that would add up to roughly 56 million hours, or more than 6,000 years. It also means that we'll be receiving some of that data at high velocity: every day 58 million new tweets stream forth from a multitude of Twitter accounts. And we will compare this Twitter data with a variety of other data sources, such as news reports and indicators, to make inferences that will help us answer our research questions.
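For readers who want to check the scale comparison themselves, here is a minimal back-of-the-envelope sketch in Python. It rests on assumptions not stated in the project description: decimal gigabytes, one second per byte, and a 71-year lifetime. The function and constant names are hypothetical and chosen only for illustration.

```python
# Back-of-the-envelope conversion of a data volume into units of time.
# The conversion constants are illustrative assumptions, not project figures.

TWEETS = 100_000_000          # approximate corpus size quoted in the text
TOTAL_BYTES = 200 * 10**9     # 200 gigabytes, assuming decimal (SI) gigabytes
TWEETS_PER_DAY = 58_000_000   # daily tweet volume quoted in the text
SECONDS_PER_BYTE = 1          # assumption: one byte corresponds to one second
LIFETIME_YEARS = 71           # assumption: a rough human lifetime in years


def volume_as_time(total_bytes, seconds_per_byte=SECONDS_PER_BYTE,
                   lifetime_years=LIFETIME_YEARS):
    """Express a byte count as hours, years, and human lifetimes."""
    seconds = total_bytes * seconds_per_byte
    hours = seconds / 3600
    years = hours / (24 * 365.25)
    lifetimes = years / lifetime_years
    return {"hours": hours, "years": years, "lifetimes": lifetimes}


def bytes_per_tweet(total_bytes=TOTAL_BYTES, tweets=TWEETS):
    """Average storage per tweet, including whatever metadata is stored."""
    return total_bytes / tweets


def days_of_stream(tweets=TWEETS, tweets_per_day=TWEETS_PER_DAY):
    """How many days of the full tweet stream the corpus corresponds to."""
    return tweets / tweets_per_day


if __name__ == "__main__":
    result = volume_as_time(TOTAL_BYTES)
    print(f"Hours:           {result['hours']:,.0f}")
    print(f"Years:           {result['years']:,.0f}")
    print(f"Lifetimes:       {result['lifetimes']:,.0f}")
    print(f"Bytes per tweet: {bytes_per_tweet():,.0f}")
    print(f"Days of stream:  {days_of_stream():.2f}")
```

Changing SECONDS_PER_BYTE or LIFETIME_YEARS shows how strongly a comparison like this depends on the unit of time chosen, so any single figure is best read as an illustration of scale rather than a precise measurement.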
As a brief aside: the booklet distributed at the Big Data awards event included a word cloud in which the size of each word reflected how often participants named it as the biggest risk to the successful completion of their projects. The biggest concern was not too much data; the majority of participants cited "Data Availability." I think we have to ask ourselves why, amidst floods of data, the biggest concern people have is the lack of it. One reason is what we might call "data rot": data that have been left untended, uncontrolled, and uncared for and that, through technical obsolescence and sheer neglect, simply degrade out of existence. Of course, there are many other reasons, and exploring them is worth a research project in itself.
The volume, velocity, and variety of data, along with many other features, characterize Big Data Analytics. In the end, however, the data may be big and the methods computationally and technically sophisticated, but the most important characteristic of Big Data Analytics, and of our project, may be what they can reveal at scale about human behavior and, more importantly, about inclusiveness, human dignity, and saving lives, like the life of a single fruit vendor in Tunisia.