
#4 from 2013: Numbers Are Never Enough (especially when dealing with Big Data)

By Susan Moeller

Our Top Ten Blog Posts by readership in 2013
This post was originally published on January 8, 2013


The newest trend in Big Data is the personal touch. When both the New York Times and Fast Company run headlines trumpeting “Sure, Big Data Is Great. But So Is Intuition.” (the Times) and “Without Human Insight, Big Data Is Just A Bunch Of Numbers.” (Fast Company), you know that a major trend is afoot.

So what’s up?

The claims for what Big Data can do have been extraordinary; witness Andrew McAfee and Erik Brynjolfsson’s seminal October article in the Harvard Business Review, “Big Data: The Management Revolution,” which began with the showstopper: “‘You can’t manage what you don’t measure.’” After that statement, it’s hard not to feel that Big Data will provide the solution to everything. As the HBR article noted: “…the recent explosion of digital data is so important. Simply put, because of big data, managers can measure, and hence know, radically more about their businesses, and directly translate that knowledge into improved decision making and performance.”

HBR’s point is actually well taken: data can help managers, among others, better understand their businesses and all their component parts. And it is true that this fall brought compelling evidence that those who can make sense of a torrent of data have a crucial edge over the competition: consider Nate Silver’s poll tracking on his FiveThirtyEight blog and President Obama’s campaign staffers. News outlets from ComputerWorld to the Wall Street Journal, CNN, Time, and ProPublica all reported that the Obama team’s data crunching was in large measure responsible for the president winning a second term.

But it’s not true - or rather it’s misleading to say - that you can’t manage what you don’t measure, at least if you are thinking only of quantitative measures. And it’s misleading to think that all it takes is the right algorithm and analytics for the numbers to fall neatly into place, outlining the best path ahead - that if you “measure” you “hence know.”

No doubt, Big Data is the next big thing, but businesses, governments, NGOs, and international organizations can’t lose sight of what matters most about the data trend: finding someone - that “Nate Silver” in your field - to make sense of the numbers. That brings us back, then, to the Times’ comment about “intuition” and Fast Company’s note that “human insight” is needed.

The good news is that there are people out there training the next-generation Nate Silvers: students who can evaluate data better because they are asking political and policy questions of it. I’ve mentioned University of Maryland Prof. Ben Shneiderman’s Computer Science Information Visualization class in this column before. Once again his students are ahead of the curve in making sense of the wide range of data out there… and not just by applying their geek computer science skills. They are applying human insight to data - first as they consider how to sift and visualize it, and second as they translate their findings for all the non-geeks out there.

Take the case of one student team, Fan Yang and Sheng Zha, who evaluated a World Bank data set related to global Science and Technology Development. As Yang and Zha wrote: “Across the world, the science and technology development of nations and regions significantly varies. Usually it is difficult to infer useful information by only looking at the tremendous amount of raw data collected from several decades and hundreds of nations.” Following that observation, Yang and Zha decided to map the number of R&D researchers in each country and correlate it first with the number of students enrolled in tertiary education there and second with that country’s investment in research.

What does the data show? Well, for starters, it suggests that government investment in research is relatively limited globally - that in many countries research may be driven by universities rather than by governments. That insight by itself has implications for further research as well as for policy and funding.

The top map shows the correlation between the number of researchers and the tertiary enrollment rate from 1995 to 2010; the bottom map illustrates the relationship between the number of researchers and nations’ expenditures on research. The size of a country’s circle is the same in both maps and corresponds to its number of researchers. The “warmer” the color of the circle, the greater the country’s level of student enrollment in tertiary education or investment in research.
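An encoding like the one just described - circle size for one variable, a “warm” color ramp for another - can be sketched in a few lines of matplotlib. The country names and values below are invented placeholders for illustration, not the World Bank figures Yang and Zha actually used:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

# Hypothetical sample values -- not the students' actual data set.
countries = ["A", "B", "C", "D"]
researchers = [900, 4200, 1500, 7800]   # R&D researchers per million people
tertiary_rate = [25, 60, 40, 85]        # tertiary enrollment rate, percent

fig, ax = plt.subplots()
# Circle size tracks the number of researchers; "warmer" colors mark
# higher tertiary enrollment, mirroring the maps' encoding.
sc = ax.scatter(range(len(countries)), researchers,
                s=[r / 10 for r in researchers],
                c=tertiary_rate, cmap="YlOrRd")
ax.set_xticks(range(len(countries)))
ax.set_xticklabels(countries)
ax.set_ylabel("R&D researchers per million people")
fig.colorbar(sc, label="Tertiary enrollment rate (%)")
fig.savefig("researchers_vs_enrollment.png")
```

The students worked with off-the-shelf tools rather than hand-written code, but the principle is the same: the analyst chooses which variable gets size and which gets color, and that choice is itself an act of human insight.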


Shneiderman’s course points up two key reasons policymakers should care about marrying Big Data and “human insight”: 

  • First, to vividly and visually show data trends and patterns.  Not every data set profits from a Sherlock Holmes-style investigation, although tough questions should be asked of every data set.  Sometimes the dog does bark, and for relatively well-known reasons.  But even on those occasions there is still a need to confirm what’s suspected.  And beyond that, there is value in being able to show findings graphically, rather than just talk about them or represent them in a spreadsheet.  Shneiderman’s students who tracked U.S. cancer rates or traffic incidents around the Washington, D.C. region, for example, did not uncover hitherto unknown patterns in cancer or traffic, but the visualizations they created make for more compelling testimony about what is known than a narrative recap of the data would by itself.

Washington, DC region traffic incidents as visualized
in a pie chart and spatially mapped

  • Second, to find and show data trends and patterns that no one suspected (or at least acted upon sufficiently).  Unless you have a ton of data, unless you think to ask the right question of those numbers, and unless you have a way to show (not just tell) others what you find, you often don’t have a compelling way to influence policy.  Rongjian Lan and Yulu Wang, two more of Shneiderman’s students, looked at the U.S. Mine Accident Injuries Dataset, for example, and discovered that dramatically more fatalities and permanent injuries occurred among miners who had less experience in a specific job.  That correlation in turn raises a raft of questions - but it also suggests that more training, in addition to better attention to mine safety standards, might be a way to save lives and limbs.

Data must be evaluated by those who have the math skills, but also by those who understand the content of what is being evaluated.  Great visualizations of data, for example, can make something seem so self-evident that complicating factors are overlooked and additional questions are never asked.  That’s another reason why “human insight” is needed - at both the front end and the back end of working with Big Data.

In these two charts, the x-axis denotes mining accident dates and the y-axis the job experience of the victims.  Color differentiates the degree of injury: green represents minor injury, black fatality, and pink permanent injury.  The charts graphically show that the more serious injuries are sustained by those with less experience.
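A chart in the spirit of the one just described - accident date on the x-axis, job experience on the y-axis, color for degree of injury - can be sketched as follows. The records below are invented for illustration; the real Mine Accident Injuries Dataset has far more rows and fields:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
from datetime import date

# Invented sample records -- (accident date, years on the job, degree).
accidents = [
    (date(2008, 3, 1),  0.5, "fatality"),
    (date(2009, 7, 15), 1.0, "permanent injury"),
    (date(2010, 2, 9),  6.0, "minor injury"),
    (date(2011, 11, 3), 9.5, "minor injury"),
]
# Same color encoding the charts use: green minor, black fatal, pink permanent.
colors = {"minor injury": "green", "fatality": "black",
          "permanent injury": "pink"}

fig, ax = plt.subplots()
for when, years_on_job, degree in accidents:
    ax.scatter(when, years_on_job, color=colors[degree])
ax.set_xlabel("Accident date")
ax.set_ylabel("Job experience of victim (years)")
fig.autofmt_xdate()  # tilt date labels so they don't overlap
fig.savefig("mine_injuries.png")
```

Even in this toy version, the pattern the students found would be visible at a glance: the darkest points cluster at the bottom of the chart, where experience is lowest.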

The data and human skills it takes to make sense of Big Data have - perhaps to no one’s surprise - created a “work-force bottleneck,” as the NY Times calls it.  There simply aren’t enough people who have both the math skills and the personal touch.  In the United States alone, a 2012 report by the McKinsey Global Institute projected, 140,000 to 190,000 workers with “deep analytical” expertise are needed, along with another 1.5 million data-literate managers.

So if you’re looking to make an investment in Big Data, perhaps start there, with training.  And if you have questions about what the next generation needs, you could always ask Prof. Shneiderman.

---------------------------------

SIDENOTE:  Students in Prof. Ben Shneiderman’s course at the University of Maryland, College Park, use existing software tools, such as Spotfire and Tableau, to explore data sets relating to healthcare, poverty and life expectancy, unemployment, education, R&D, mine accidents, flight and car traffic data, and online communities.

For more on the data visualizations mentioned here, visit the course’s Wiki pages.  (Do not be dissuaded by the warning screen; it is safe to visit the site.)

Photo Credit: (jennY)
