The epic battle of man against machine has been fought on many occasions. One of the most memorable encounters was the chess match between IBM’s Deep Blue and Garry Kasparov. In 1996, Deep Blue became the first computer to beat a reigning world chess champion in a game under tournament conditions (though the machine still lost the six-game match 2 to 4). A year later, at their “rematch”, the machine won on the overall score: 3.5 to 2.5.
It is surprising, however, that 18 years later we still have not figured out the ultimate winning strategy in chess. Any game with a finite number of combinations and full disclosure of information has ‘safe strategies’ and can, in principle, be ‘solved’ (as happened with checkers in 2007). From what we know today, the solution to chess would involve strategies whereby the white player wins or the black player forces a draw. Yet no human or supercomputer to date has managed to solve chess’s mathematical puzzle. How much more computing power do we need to succeed?
Some say that Kasparov managed to win several games against Deep Blue thanks to the advantage provided by his “intuition”. The machine, by contrast, calculated every possible theoretical option, including many that made little sense, and thus needlessly wasted a large share of its computing power. This is about to change. IBM has developed another supercomputer, called WATSON, that, the company says, will herald a new era of “cognitive computing”. WATSON has already challenged humans in a game that is particularly tricky for machines: Jeopardy! The game covers a wide variety of topics (including history and current events, science, the arts, popular culture, literature, and languages), and the questions are often framed in tricky ways, requiring the machine to absorb the nuances of human language. In 2011, WATSON won by a wide margin against the reigning champion live on TV.
Are we indeed about to enter “the third era of computing”, and if so, how different is it likely to be? In the first era of computing, machines were used to carry out massive but systematic tasks (e.g. tabulating the US census). In the second era, machines (including now-ubiquitous cellphones) were enhanced with “programs” and consumer-friendly applications like Windows and Word, which have become integral to our daily routines. But even in this second era, the computer remains a glorified calculator, processing a limited number of operations with certainty. The third era, by contrast, involves a departure from the world of clean and curated data and a deep dive into a much larger but also messier “big data” world of probability. The next era will have machines that can learn and adjust. They are taught, not programmed. Instead of telling you a limited number of things with certainty (mathematical calculations, spell checks), cognitive computing is entering new terrain, such as assisting in medical diagnosis or predicting crime, but it will often provide answers in probabilities.
Our approach to social and economic analysis needs to change by the sheer fact that big data is indeed “big”, on a staggering scale. By 2015, we should live in a world of 100 sextillion data points: that is 10²³, or 100,000,000,000,000,000,000,000. The problem with this data tsunami is that over 80 percent of it is “unstructured”: simply “out there”, not available in neat tables. Traditional computing methods are of little help in interpreting it. More than two-thirds of the big data wave is expected to come from sensors and social media (see figure).
Figure: From 0 to 100 sextillion in five years
Why are these technical developments important to watch, and why should a development institution care? Put simply, because better, faster, and cheaper data can have massive payoffs in terms of poverty reduction and development. Parents want to know if their kids are learning anything at school, patients want to know if their doctors are making the right diagnoses (note: 50 percent of diagnoses are wrong even in developed countries), farmers want to know at what price to sell their crop, truck drivers want to know if they can expect major delays, and everyone, especially economic policy makers, wants to know if inflation is rising.
But our methods and approaches are still those of the 20th century. We still live in a world where statistical offices produce monthly inflation and employment statistics using traditional “baskets of goods” and review the size of the economy and poverty levels once a decade (publishing the results with such delay that the numbers are already outdated by the time they come out). Even in the most advanced countries, delays are substantial (especially after a census), and economic numbers often need to be revised significantly afterwards.
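To make the traditional “basket of goods” method concrete, here is a minimal sketch of a Laspeyres-style price index in Python. The items, prices, and quantities are invented for illustration and do not reflect any statistical office’s actual basket or methodology:

```python
# Minimal sketch of a Laspeyres price index ("basket of goods").
# All item names, prices, and quantities below are hypothetical.

base_period = {          # item: (price, quantity) in the base period
    "bread": (1.00, 100),
    "fuel":  (2.50, 40),
    "rent":  (500.0, 1),
}
current_prices = {"bread": 1.10, "fuel": 2.75, "rent": 520.0}

def laspeyres_index(base, current):
    """Index = 100 * sum(p1 * q0) / sum(p0 * q0), with base quantities fixed."""
    cost_base = sum(p0 * q0 for p0, q0 in base.values())
    cost_now = sum(current[item] * q0 for item, (_p0, q0) in base.items())
    return 100.0 * cost_now / cost_base

print(round(laspeyres_index(base_period, current_prices), 1))  # → 105.7
```

With these made-up numbers, the fixed basket that cost 700 in the base period costs 740 today, so the index reads 105.7: roughly 5.7 percent inflation.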
This is anachronistic, because the technology is available to produce all of this data practically on the spot. Almost everyone in the world today has a cell phone, and three billion people are using the internet. This makes it possible to gather vital data faster and more accurately. But it takes smart systems to separate good data from bad and to make sure the data are properly weighted to represent the overall population or economy.
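The weighting problem can be illustrated with a minimal post-stratification sketch, assuming the population shares are known from a census. The groups, shares, and survey responses below are entirely hypothetical:

```python
# Minimal post-stratification sketch: re-weight a biased sample so each
# group counts according to its known share of the population.
# Group names, population shares, and sample values are hypothetical.

population_share = {"urban": 0.40, "rural": 0.60}       # known from a census
sample = [                                              # (group, response)
    ("urban", 12.0), ("urban", 10.0), ("urban", 11.0),  # urban over-sampled
    ("rural", 20.0),
]

def weighted_mean(sample, population_share):
    # Each respondent's weight = group's population share / group's sample share
    counts = {}
    for group, _value in sample:
        counts[group] = counts.get(group, 0) + 1
    n = len(sample)
    total = sum(value * population_share[g] / (counts[g] / n)
                for g, value in sample)
    return total / n

naive = sum(v for _, v in sample) / len(sample)      # 13.25, skewed urban
adjusted = weighted_mean(sample, population_share)   # 16.4, matches population
print(naive, adjusted)
```

Here, internet or phone data over-represent the urban group; the naive average (13.25) understates the true population figure, while the re-weighted estimate (16.4) matches the mean one would get if each group were sampled in proportion to its actual share.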
Until now, however, breakthroughs have mainly occurred in the commercial world, often in predicting what you may be interested in buying. No one has yet been able to mine these data at scale for development impact. For an economist or statistician, it would be wonderful to calculate inflation or economic activity on the spot and to project forward with greater accuracy and granularity. But we need to adopt new approaches, because what we know today is already outdated tomorrow: 90 percent of the world’s current data is less than two years old.
This is like the early days of oil. We knew the resource was there and recognized that it could be made useful, but it took massive investments to deploy the heavy technology needed to extract the oil and refine it into the gasoline you fill your car with. The same is true for the big data revolution. It is there. We know for a fact that it will change our lives, but we don’t really know how to make the best use of it. Once big data reaches all of us, humans will still be in charge, because even the smartest machine lacks the judgment required to make the data revolution meaningful to people. But machines can help humans make better judgments.