Disease outbreaks are a latent but hard-hitting global concern, as the current anxiety around Ebola shows. Many efforts have sprung up to fight the spread of the disease, including one by the World Bank, which recently committed $400 million to the cause. The growing human cost of the disease is distressing enough, and the World Bank has estimated that if Ebola is not contained, it could impose an additional economic cost of $32.6 billion on the West Africa region.
To find out how open data can help, we recently spoke with Clark Freifeld, co-founder of HealthMap.org, a web-based tool that has been in the news for detecting the Ebola outbreak nine days before the World Health Organization officially announced it. Started in 2006 at Boston Children's Hospital, HealthMap uses public health, media, and open data sources to provide real-time information and alerts about disease outbreaks, which users can filter by a number of criteria, including location.
In 2006, co-founders Clark Freifeld (developer) and John Brownstein (epidemiologist) were working at Boston Children's Hospital. They wanted to capture value from what was then an untapped explosion of information on the web about disease outbreaks. The problem was that this information was poorly organized and not useful for real-time decision making or analysis.
HealthMap uses a web crawler that searches the web 24 hours a day, pulling in data from hundreds of thousands of relevant, publicly available sources, including news media, health groups, and government agencies. HealthMap then applies filtering and text-processing algorithms before making the data available, offering the public a global view of ongoing disease outbreak activity in 15 languages.
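To make the filter-and-tag step more concrete, here is a minimal sketch in Python. It is not HealthMap's actual code: the keyword lists `DISEASE_TERMS` and `PLACES`, the `Alert` record, and the `tag_article` function are all hypothetical names invented for illustration, and a real system would rely on far larger vocabularies and more sophisticated text processing.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword lists, for illustration only.
DISEASE_TERMS = {"ebola": "Ebola", "dengue": "Dengue", "cholera": "Cholera"}
PLACES = {"guinea": "Guinea", "liberia": "Liberia", "sierra leone": "Sierra Leone"}


@dataclass
class Alert:
    disease: str
    location: Optional[str]  # None when the headline names no known place
    headline: str


def find_term(text: str, vocabulary: dict) -> Optional[str]:
    """Return the label of the first vocabulary key found as a whole word or phrase."""
    lowered = text.lower()
    for key, label in vocabulary.items():
        if re.search(r"\b" + re.escape(key) + r"\b", lowered):
            return label
    return None


def tag_article(headline: str) -> Optional[Alert]:
    """Drop headlines that mention no known disease; tag the rest with a location if one is found."""
    disease = find_term(headline, DISEASE_TERMS)
    if disease is None:
        return None
    return Alert(disease, find_term(headline, PLACES), headline)
```

Calling `tag_article("Ebola cases confirmed in Guinea")` would yield an alert tagged with the disease and the country, while a headline that mentions no known disease term is discarded.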
While the health community was initially skeptical about relying on informal sources, the HealthMap approach has become an important part of public health surveillance. Government agencies, including the CDC, HHS, and USAID, now take a structured feed from HealthMap and integrate it with their own data feeds. The CDC has also collaborated with HealthMap on an interactive Dengue map, which combines the CDC's official risk maps with HealthMap's real-time outbreak data.
A good mix of public health officials and the general public regularly access the site, many of them trying to understand what is happening around them. But there are challenges. As with any approach that mines content in the form of free text, there is a lot of noise. Automated algorithms are built to account for this, because it is hard to get a true sense of what is happening from an unfiltered raw feed.
Beyond sheer volume, this type of intake poses granularity and linguistic challenges. News articles sometimes contain no location information, making it difficult to pinpoint where an outbreak is occurring. Searches for disease-related terms also return a lot of noise, typically turning up scientific findings, vaccination campaigns, and linguistic variations that have nothing to do with disease outbreaks. For example: Justin Bieber Fever.
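The "Justin Bieber Fever" problem can be illustrated with a small Python sketch of a noise filter. This is only an assumption about how such a filter might look, not a description of HealthMap's algorithms: the pattern `DISEASE_PATTERN`, the list `NOISE_PHRASES`, and the function `is_outbreak_candidate` are hypothetical, and a fixed phrase list like this one would miss most real-world noise.

```python
import re

# Hypothetical disease vocabulary, matched as whole words.
DISEASE_PATTERN = re.compile(r"\b(fever|ebola|dengue|cholera|measles)\b", re.IGNORECASE)

# Hypothetical phrases that mention a disease term without signalling an outbreak.
NOISE_PHRASES = [
    "bieber fever",          # figurative use of a disease word
    "vaccination campaign",  # public health activity, not an outbreak
    "study finds",           # scientific findings
]


def is_outbreak_candidate(text: str) -> bool:
    """Keep text that mentions a disease term and contains none of the known noise phrases."""
    if not DISEASE_PATTERN.search(text):
        return False
    lowered = text.lower()
    return not any(phrase in lowered for phrase in NOISE_PHRASES)
```

Under these assumptions, `is_outbreak_candidate("Justin Bieber Fever sweeps the city")` returns `False`, while a headline such as "Dengue fever cases rise in coastal district" passes through for further processing.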
Beyond refining its existing text-processing and filtering algorithms to reduce noise and cast a wider net, the team has several areas set for future experimentation and development. Mobile technology has enabled spin-off projects that explore direct reports from the field as an additional data source and a method of validation. Refining the algorithms to make better sense of social media data is another key area of exploration in the search for more valuable signals. There are also plans to increase the number of supported languages.
Do you know of other uses of open data? Tell us! We might add them to our list.
Also, join us on Fridays at 10:30 EST for Google Hangouts discussing specific uses of open data and the interesting people behind them. Check out our Open Data Use Hangouts Calendar for details.