
Are we ready to embrace big private-sector data?

By Andrew Whitby



The use of big data to help understand the global economy continues to build momentum. Last week our sister institution, the International Monetary Fund, launched its own program in big data, with a slate of interesting speakers including Hal Varian (Google Chief Economist), Susan Athey (Professor at Stanford GSB and a former Microsoft Chief Economist) and DJ Patil (Chief Data Scientist of the United States).
 
The day's speakers grappled with the implications of big data for the Fund's bread-and-butter macroeconomic analysis--a topic of great interest to the World Bank Group too. Examples were presented in which big data is used to generate macroeconomic series that have traditionally been the preserve of national statistical offices (NSOs): for example, MIT's Billion Prices Project, which measures price inflation in a radically different way from traditional CPI statistics.
 

Big data is often private-sector data 

A recurring theme was the interaction between the private and public sector in the production of data. While many NSOs are currently investigating how big data can inform their work, the private sector is some way ahead. Moreover, many of the most interesting sources of data originate with the private sector. Varian, for example, presented an overview of Google Trends and Google Correlate, which are based on aggregated search query data (along the way responding to some common criticisms).
 
This raises an interesting problem for NSOs. For a long time governments had an effective monopoly on big data. (One might even argue that the original big data era got underway with the US census of 1890, for which the first large-scale data processing machine, the Hollerith punched-card tabulator, was invented.) This is no longer the case. Today, the best jobs data may lie with LinkedIn; Facebook could estimate international migration flows better than any national government; and Amazon probably has a better sense of minute-by-minute consumption spending than the BEA.
 

Private-sector data offers opportunities, but also risks

If NSOs don't use this data, they face disruption from private firms that do. But are governments ready to turn parts of their 'critical data infrastructure' over to private companies, which do not share the NSOs' mandate to provide accurate, timely, impartial snapshots of society? Lucrezia Reichlin of LBS noted that Google Trends data, for example, is highly processed, making it difficult to trust for more than demonstration applications. Athey and David Lazer of Northeastern University discussed the way in which small algorithmic tweaks at Facebook and Twitter, made for business reasons, can change the meaning of data derived from those sites.
 
The new world of big data forces us to confront these issues. As Patil noted in a closing 'fireside chat', many of our modern statistical products reflect engineering constraints from an earlier era, when computation and communication were vastly more expensive. We could, he noted, run a census every night, but should we? Indeed, experimental projects (including some within Innovation Labs) have shown that cellphone records, for example, can be used to measure population relatively accurately (albeit with many caveats).
 
Do we really understand the risks of these approaches? After all, the de facto official status granted to privately generated credit ratings data, and the conflicts of interest that resulted, has been cited as a contributory factor to the subprime crisis of 2008. As an audience member from the US Census Bureau pointed out, the census is constitutionally mandated in the US (and in at least 53 other countries). Basic statistics like this make a critical contribution to an informed and representative democracy. Can we really imagine a world in which they are based on private-sector data? And if not, what role will private-sector big data play in the official statistics of the future?
 
We'd love to hear your thoughts on this topic - use the comment box below!

Comments

Submitted by Erik van Ingen on

Collecting data from retailers is fundamental for the further development of statistics in the UN, World Bank, IMF, etc. A framework for big data anonymization is needed in order to overcome the reluctance of the private sector to share their data for the sake of global objectives.
