Open data for economic growth continues to create buzz in all circles. We wrote about it ourselves on this blog site earlier in the year. You can barely utter the phrase without somebody mentioning the McKinsey report and the $3 trillion open data market. The Economist gave the subject credibility with its talk about a 'new goldmine.' Omidyar published a report a few months ago that made $13 trillion the new $3 trillion. The wonderful folks at New York University's GovLab launched the OpenData500 to much fanfare. The World Bank Group got into the act with this study. The Shakespeare report was among the first to bring attention to open data's many possibilities. Furthermore, governments worldwide now routinely seem to insert economic growth in their policy recommendations about open data – and the list is long and growing.
Geographic distribution of companies we surveyed. Here is the complete list.
We hope to publish a detailed report shortly but here meanwhile are a few of the regional findings in greater detail.
We count ourselves among the believers (even as we roll our eyes over some of the numbers!), but what strikes us most about these reports is their northern focus: the McKinsey report, for instance, is 'global' but concentrates almost exclusively on North America and Europe. The Shakespeare report was written for the U.K. government, and the Economist report fails to include any examples from the developing world. And the Omidyar report is broader and covers the G-20.
How are open data-driven companies doing in emerging economies?
Was there a class of entrepreneurs emerging to take advantage of the economic possibilities offered by open data, were investors keen to back such companies, were governments tuned to and responsive to the demands of such companies, and what were some of the key financing challenges and opportunities in emerging markets? As we began our work on the concept of an Open Fund, we partnered with Ennovent (India), MDIF (East Asia and Latin America) and Digital Data Divide (Africa) to conduct short market surveys to answer these questions, with a focus on trying to understand whether a financing gap truly existed in these markets. The studies were fairly quick (4-6 weeks) and reached only a small number of companies (193 in India, 70 in Latin America, 63 in South East Asia, and 41 in Africa - and not everybody responded) but the findings were fairly consistent.
Open data is still a very nascent concept in emerging markets, and there's only a small class of entrepreneurs/investors that is aware of the economic possibilities; there's a lot of work to do in the 'enabling environment'
- In many regions the distinction between open data, big data, and private sector generated/scraped/collected data was blurry at best among entrepreneurs and investors (some of our findings consequently are better indicators of data-driven rather than open data-driven businesses)
There's a small but growing number of open data-driven companies in all the markets we surveyed and these companies target a wide range of consumers/users and are active in multiple sectors
- A large percentage of identified companies operate in sectors with high social impact – health and wellness, environment, agriculture, transport. For instance, in India, after excluding business analytics companies, a third of data companies seeking financing are in healthcare and a fifth in food and agriculture, and some of them have the low-income population or the rural segment of India as an intended beneficiary segment. In Latin America, the number of companies in business services, research and analytics was closely followed by health, environment and agriculture. In Southeast Asia, business, consumer services, and transport came out in the lead.
- We found the highest number of companies in Latin America and Asia with the following countries leading the way - Mexico, Chile, and Brazil, with Colombia and Argentina closely behind in Latin America; and India, Indonesia, Philippines, and Malaysia in Asia
An actionable pipeline of data-driven companies exists in Latin America and in Asia
- We heard demand for different kinds of financing (equity, debt, working capital) but the majority of the need was for equity and quasi-equity in amounts ranging from $100,000 to $5 million USD, with averages of between $2 and $3 million USD depending on the region.
There's a significant financing gap in all the markets
- The investment sizes required, while they range up to several million dollars, are generally small. Analysis of more than 300 data companies in Latin America and Asia indicates a total estimated need for financing of more than $400 million
Venture capitalists generally don't recognize data as a separate sector and club data-driven companies with their standard information and communication technology (ICT) investments
- Interviews with founders suggest that moving beyond seed stage is particularly difficult for data-driven startups. While many companies are able to cobble together an initial seed round augmented by bootstrapping to get their idea off the ground, they face a great deal of difficulty when trying to raise a second, larger seed round or Series A investment.
- From the perspective of startups, investors favor banal e-commerce (e.g., according to Tech in Asia, out of the $645 million in technology investments made public across the region in 2013, 92% were related to fashion and online retail) or consumer service startups and ignore open data-focused startups even if they have a strong business model and solid key performance indicators. The space is ripe for a long-term investor with a generous risk appetite and multiple bottom line goals.
Poor data quality was the number one issue these companies reported.
- Companies reported significant waste and inefficiency in accessing/scraping/cleaning data.
The analysis below borrows heavily from the work done by the partners. We should of course mention that the findings are provisional and should not be considered authoritative (please see the section on methodology for more details). Please also note that the studies were loosely coordinated so they are somewhat inconsistent with each other. To help others dig through the data for further analysis or combine it with research they may have done separately, we have published some raw data from the studies here, listing the companies we surveyed on the Open Finances website.
Key findings in Latin America based on the work done by MDIF
In Latin America (LATAM), the companies MDIF surveyed are seeking a combined $58.9 million in investments with an average of $2.9 million per company and a median of $750,000. Of these companies, 75% are seeking equity investment. Based on the pipeline data and conversations with startups and investors, MDIF estimates that there are about $168 million in investment opportunities in LATAM data-driven companies.
There are, however, very few investors specifically focused on data-driven startups. Investors tend to view these companies through a broader technology lens and see data-driven startups as potentially risky, given the low quality of data and the high cost of expanding data operations from one country to a whole region. In this region, as in others, the open/big data continuum is murky at best and the terms are sometimes used interchangeably.
Nevertheless, LATAM has significant near-term prospects. The data-driven startup sector is growing quickly and the necessary structures—quality data (though the quality of government data isn't very high yet), financing systems and supporting institutions—are in place. Companies that use open data as a core part of their business however expressed frustration that they are not taken seriously by potential investors. On the whole, open data is viewed as a nice idea, but not as a potential growth area. A new fund to provide larger seed funding and A round investments in promising companies would encourage other investors to consider open data startups and precipitate larger investments.
Who we surveyed
In Latin America, the MDIF team reached out to 70 companies, of which 25 responded. MDIF was also able to speak with 3 investors. The study covered several countries in Latin America. The companies were active in a variety of sectors and had different business models.
Business and consumer service providers represent the largest proportion of companies surveyed, a finding reinforced by discussions with entrepreneurs and investors. Successful data-driven business service providers include companies like Scanntech, a Uruguayan company that provides small retailers with a simple register system and sells the data generated to market research firms, and Rock Content, a Brazilian company that helps brands and media outlets generate the content their audiences want using predictive analytics. Agriculture and health and wellness are sectors where companies are developing products and services with both high potential for growth and important impact on societies. Notable companies in these areas include Medicinia, a Brazilian startup with a platform to facilitate the sharing of data within hospitals to improve patient care and doctor performance, and Solapa, an Argentinian startup building a platform to help farmers analyze their crop strategy and yield by combining market data with sensory and GIS data.
Of the companies surveyed, 55% identified themselves as "early-stage" and 45% as "seed-stage." On average, the companies were 1.6 years old with a minimum of three months and a maximum of 6 years. Interviews indicated that while later stage companies do exist, the vast majority of data-driven companies are young. In terms of business model, the most common approach for companies in LATAM was developing a product (e.g. a piece of software or a unique analytics approach) that they can sell to governments, businesses, or consumers. Notably, none of the companies surveyed use advertisements as their primary business model.
The companies use a variety of data sources. Government data is especially problematic as startups from one country seek to expand their business to another. OPI, a government data-service provider developing new tools for governments to produce and analyze data, found that even within Mexico, data quality differs across government offices and there is little uniformity in how official bodies maintain their data. This means that startups have to spend a significant amount of money and human resources acquiring data, a process that only grows more complex as they expand into new markets.
The second most referenced type of data was consumer-generated data. Data within this category range from transaction data from the sale of a good or service to questionnaire data provided by users. Increasingly companies are developing application programming interfaces (APIs) to make their user-generated data available at a price to other companies.
Companies are becoming more sophisticated in how they merge unique data sources to build entirely new analytical tools. In the consumer finance sector, Mexico's Konfio and Colombia's Lenddo both combine survey data with social media data and market data to expand loan access to individuals with no credit history. In the Mexican health sector, Bien.io combines user generated data about activity and diet with government health risk data to develop individually tailored health improvement plans. Medii combines geo-spatial data with government data on pharmacy locations and prices to give consumers a tool to find the best deals on the medications they need at whatever hour of the day.
Overall, the results of the surveys and interviews indicate that while the data quality situation across the region is far from perfect, LATAM is clearly headed in the right direction. As quality improves, the time and cost of acquiring and processing data will decrease, further increasing opportunities for data-driven businesses across the region.
Financing requirements of the companies profiled
Based on their research, MDIF suggests that the funding cycle needed to take a startup from an idea to a multimillion dollar company does exist, but is weak across the board. Of the 25 startups surveyed, 17 (68%) reported previously receiving some sort of financing. Latin American venture firms and accelerator programs were the most commonly reported sources (36% and 28% respectively) while only 16% of companies reported receiving investment from international venture firms.
In terms of financing type, 75% of respondents reported seeking equity investments. While only 35% and 30% were interested in quasi-equity or debt mechanisms respectively, analysis from the interviews conducted indicates that the lack of interest was more likely a lack of awareness on the part of entrepreneurs. Of the 25 companies surveyed, 20 indicated that they were currently seeking financing. These companies are seeking a combined total of $58,946,000 in financing with a minimum of $1,000 and a maximum of $20 million. The average amount for LATAM startups was $2,947,000 and the median, which is less affected by the handful of companies seeking large B and C round investments, was $750,000. Overwhelmingly, the companies are interested in U.S. dollar financing. Only three companies reported interest in receiving financing in local currency and all three indicated that they would be able to receive financing in U.S. dollars.
Who are the investors in Latin America
LATAM investors tend to lump open or big data startups together with other non-data-focused technology companies instead of treating them as their own unique sector. The absence of explicitly data-focused investors means that data-driven companies, many of which are still working to solidify their business models and revenue strategy, have to compete with straightforward e-commerce sites that are already generating significant revenue or technology hardware companies with viable product sales.
One notable exception is Aurus, a $60 million venture investor in Chile that has identified open and big data companies as one of its fund focus areas. Aurus' managing partner Raimundo Cerda noted in an interview that while firms recognize the potential, the dearth of clear exit success stories makes even the LATAM region's most successful venture firms hesitant to invest the money and time in complex, data-driven companies. As a result, many potentially viable companies are unable to raise the Series A investments they need to grow.
Other regional investors with the potential to expand the data-driven sections of their portfolios include DGF Investimentos, a $300 million fund focusing on technology and e-commerce in Brazil; Kaszek, a $190 million fund based in Argentina and investing in technology and e-commerce companies throughout the region; RedPoint eVentures, a $130 million Brazilian fund focusing on e-commerce, mobile and cloud technologies globally; and Alta Ventures, a $70 million Mexican fund investing in technology companies.
While international venture capital firms were initially enthusiastic about LATAM, that enthusiasm faded in recent years. In one notable case, Silicon Valley firm Sequoia Capital opened and then quickly closed an office in Brazil, citing an "overwhelming e-commerce focus among entrepreneurs" and less than optimal potential for growth. Our discussions with founders and investors indicate that this sentiment is widespread. From the perspective of those in the data-driven startup scene, this situation is exacerbated by local venture firms that actively pursue seed-stage e-commerce companies instead of innovative (and also riskier) data-driven companies.
Some of the companies we reached out to in Latin America included
- Solapa: agribusiness management
- Junar: open data platform
- OPI: data-driven policy
- Scanntech: retail network services
Key findings in Southeast Asia based on the work done by MDIF
In Southeast Asia (SEA), the companies MDIF surveyed are seeking a combined $43.7 million in financing, with an average investment of $2.1 million and a median of $650,000. Of these companies, 86% are seeking equity investment. Based on the pipeline data and conversations with startups and investors, MDIF estimates that there are about $104 million in investment opportunities in South East Asian data-driven companies. As in Latin America, there are very few investors specifically focused on data-driven startups. Investors here, too, tend to view these companies through a broader technology lens and see data-driven startups as potentially risky, given the low quality of data and the high cost of expanding data operations from one country to a whole region. Unlike those in LATAM, data-driven startups in SEA face a range of issues including regional brain drain, low data quality, and limited supporting institutions.
Who we surveyed
In South East Asia, the MDIF team reached out to 63 companies, of which 27 responded. MDIF was also able to speak with 2 investors. Localization of existing products is an evident trend among startups that MDIF identified in South East Asia. Despite the large number of "me too" startups, a small number of companies are experimenting with innovative and authentic business ideas. Regional diversity in language, culture, and IT infrastructure makes expansion difficult for many companies. As a result, startups are often forced to limit their ambitions to their local markets. The most sophisticated and innovative data-driven startups are found in the Philippines and Indonesia. Singapore serves as a hub for entrepreneurs seeking better IT and business infrastructure, as well as greater access to regional investors and venture capital.
The companies represent a variety of sectors and business models. In terms of sector focus, a combined 41% of companies surveyed focus on business or consumer services. Within these sectors, e-commerce products and applications are the most common approaches. While a significant number of these products and services are local adaptations of ideas produced first in the U.S., there are a number of organic and innovative approaches among the companies we surveyed. Notable examples include GeoMash, a Malaysian company that provides logistics, shipping and infrastructure companies with a platform to utilize GIS and sensory data; and Kalibrr, a company that uses natural language processing to match skilled job seekers with companies seeking talent.
Transport represents the third most commonly cited sector for data-driven startups. Given the traffic congestion and difficulties with intercity transport across the region, this sector represents a strong area for potential growth and data-driven innovation. In addition, research identified a smaller number of high-quality companies working in high social impact fields like health and wellness and media. For example, Tipsdokter is an Indonesian company that is developing an online healthcare ecosystem to provide easy and affordable healthcare access, and Politweet is a Malaysian startup that studies Malaysians' political interest and voting behavior through social media analytics.
The companies use a variety of sources of data. Apart from exceptions such as Indonesia and the Philippines, government data is largely unavailable across the region. Most startups surveyed did not plan to use government data either because relevant data is not available or not in digital format. Even Indonesia and the Philippines are making only slow progress. The companies MDIF spoke with in these two countries complained about the reliability and validity of the data they are able to access. Urbanindo, an Indonesian startup that runs an online real estate marketplace, questioned the integrity of the crime rate data released by the government and Tipsdokter indicated that despite the Indonesian government's push for digitization, the vast majority of healthcare data are still stored on paper.
Consequently, only 15% of startups reported using government data as part of their work across the region. The dearth of quality data means that companies have to devote a great deal of resources to generating the data themselves, which in turn makes it difficult for them to scale. Blueship, a Thai startup, is developing an on-board diagnostics (OBD) device called Drivebot, which can transfer a vehicle's data to a mobile application, because the government does not have good traffic data. Jagad, an Indonesian startup that operates the online shuttle reservation website Travelcar, is building a database of all shuttle operators in the country because the government data is incomplete. And Tipsdokter said it prefers to collect healthcare data through its website, as that is easier and cheaper than digitizing government healthcare data stored on paper, although the data are valuable to the startup.
The most commonly used type of data was consumer-generated data followed by business-generated data. Data within these categories range from transaction records from the sale of a good or services, to user behavior information, to personal details submitted by the users or company. Geo-spatial data is another popular type of data used by almost 60% of surveyed companies due to the growing use of smartphones and geo-location applications in the region.
Financing requirements of the companies profiled
MDIF's surveys and interviews indicate that fundraising is a significant challenge for SEA startups generally and for data-driven companies specifically. Because of this difficulty, many companies prefer to move directly into the market and generate revenue instead of seeking financing to expand or further develop their ideas. Of the 26 companies that provided an answer, half reported previously receiving some form of financing. The most commonly cited sources of financing were regional accelerator programs followed by angel investors. Only 27% of companies reported receiving venture investments previously.
An overwhelming 86% of respondents reported seeking equity investment. Less than a quarter of respondents indicated an interest in quasi-equity, debt mechanisms or working capital. Based on qualitative interviews, it is clear that equity is the dominant financing type across the region. Even if the companies are aware of other financing types, few investors offer alternatives to traditional equity.
Conversations with startups suggest that attracting venture capital beyond seed funding presents a large challenge. While there have been more local and regional venture firms as well as regional arms of major international venture firms established in SEA in recent years, and startup investments in the region have been growing, the scene is still dominated by quick cash e-commerce startups. According to online technology website Tech in Asia, out of the $645 million in technology investments made public across the SEA region, 92% were related to fashion and online retail. These broader dynamics mean that there is little incentive for a talented regional entrepreneur to work on creative, data-driven projects or pursue ideas with strong potential for social impact.
The 21 companies that were currently seeking financing reported a combined total need of $43,703,000 with a minimum of $7,000 and a maximum of $20 million. The average amount for SEA startups was $2,185,150 and the median was $650,000. A large majority of the companies prefer financing in U.S. dollars with only four interested in local currency.
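For readers who want to re-derive the averages from the totals, here is a minimal Python sketch. It uses only the totals, company counts, and stated averages quoted in this post for LATAM and SEA; the dictionary layout and function names are illustrative assumptions, not part of MDIF's methodology.

```python
# Figures exactly as quoted in this post; nothing here comes from the raw survey data.
reported = {
    "LATAM": {"total": 58_946_000, "seeking": 20, "stated_average": 2_947_000},
    "SEA": {"total": 43_703_000, "seeking": 21, "stated_average": 2_185_150},
}


def mean_request(total, count):
    """Average financing request: combined total divided by number of companies."""
    return total / count


def check(region):
    """Return the recomputed average and its relative gap to the stated average."""
    row = reported[region]
    computed = mean_request(row["total"], row["seeking"])
    gap = abs(computed - row["stated_average"]) / row["stated_average"]
    return computed, gap


for region, row in reported.items():
    computed, gap = check(region)
    print(
        f"{region}: computed ${computed:,.0f}, "
        f"stated ${row['stated_average']:,}, gap {gap:.1%}"
    )
```

With the figures as quoted, the LATAM average agrees to within rounding. The stated SEA average of $2,185,150 corresponds to dividing the total by 20 companies, while dividing by the 21 companies mentioned gives roughly $2.08 million, closer to the $2.1 million quoted in the regional summary. We cannot tell from the post alone which count is intended, which is one more reason to treat these numbers as indicative.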
Who are the investors in South East Asia?
While MDIF did not find investors who were currently focused on data-driven companies in the region, there are general technology investors that have the skill and funding to move into the sector if their perspectives on data-driven companies changed. The most likely candidates for this shift are venture capital firms based in Japan or Singapore. Notable companies include Gree Ventures, a Japanese venture firm with a $20 million fund focused on internet companies in SEA; Gobi Partners, a $30 million fund headquartered in Singapore that invests in technology and digital media companies across Asia; and IMJ Fenox, a collaborative fund between Japanese investor IMJ Corporation and U.S. investor Fenox Venture Capital that focuses on Japanese and Southeast Asian companies.
Outside of the venture scene, the region has strong accelerators and incubator programs that help technology entrepreneurs get their ideas off the ground. The majority of the companies we spoke with got their start through an accelerator, and it is clear that the accelerator model is quickly expanding throughout the region. Programs like the Joyful Frog Digital Incubator, which provides Southeast Asian investors with $25,000 cash investment for equity and 100 days of mentorship, are very popular among startup founders. According to Allstars, a Malaysian accelerator, its program is actively interested in data-driven companies, but the supply of startups is limited and companies that use open data are even rarer.
Some of the companies we reached out to in South East Asia included
- GeoMash: map-centric applications and visualization
- UrBanIndo: real estate services
- SkyEye: UAV services
Key findings in India based on the work done by Ennovent
There seems to have been a melding of open and big data in the entrepreneur/investor consciousness in India and many interviewees used the terms interchangeably possibly because most 'open' datasets that entrepreneurs/Indian IT firms initially worked on were also 'big' (these included GIS data, health data, weather data, and population data). However, of the 193 companies that Ennovent identified, 136 do appear to use open data sources. Roughly a third of the companies require financing (mostly equity) and their financing needs exceed $200 million (Ennovent extrapolated the total open data financing need to be in the vicinity of $500 million - this is however a crude estimate). Interestingly, a third of the companies that require financing have never received financing before (which may be a red flag).
Who we surveyed
In India, Ennovent reached out to 193 companies (of whom 106 responded) and 59 investors. Almost half of the companies were based in South India. Some 'companies' were non-profits. They provided useful information about open data demand in India but weren't included in the analysis of funding sources and needs.
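Since not everybody responded, it can help to put the outreach numbers side by side. The sketch below is a rough illustration that computes response rates for the three regions where this post quotes both companies contacted and companies that responded; the variable and function names are hypothetical. Africa is left out because the post reports only how many companies shared financial information there.

```python
# (companies contacted, companies that responded), as quoted in this post.
outreach = {
    "Latin America": (70, 25),
    "South East Asia": (63, 27),
    "India": (193, 106),
}


def response_rate(contacted, responded):
    """Share of contacted companies that responded to the survey."""
    return responded / contacted


for region, (contacted, responded) in outreach.items():
    rate = response_rate(contacted, responded)
    print(f"{region}: {responded}/{contacted} responded ({rate:.0%})")
```

On these figures, response rates come out at roughly 36% in Latin America, 43% in South East Asia, and 55% in India, so the regional findings rest on fairly small samples.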
From the target market perspective, a large number (72 of 193) of companies aimed at selling to other businesses, but several (25) targeted the rural population, particularly the farming sector. Companies aimed at urban markets included many developing convenience-centric products and services.
Which companies need financing?
Sixty of the 193 companies surveyed needed financing. Most companies needed equity, but some also needed debt, grants, and working capital. Many of these companies have received financing in the past. Angel/venture capital/private equity investment has been the major source of funding for these companies but several also reported government and other grants.
The highest demand for funding seems to come from companies operating in spaces such as location intelligence, healthcare, food and agriculture, and urban planning.
Who is investing in this market?
Ennovent contacted 59 investors in India to gauge their interest in the data driven market. Here are some of the investors in the market who have an interest in open data-driven companies. None of the other investors that Ennovent contacted was focused on data.
Some of the companies we reached out to in India included
- Traffline: real-time traffic monitoring
- Stellaps: a dairy technology management company
- Railyatri: railway app
- NextDrop: water delivery service
Key findings in Africa and who we surveyed, based on the work done by Digital Data Divide
In Africa, Digital Data Divide (DDD) reached out to 41 companies in 11 countries. Of these companies, 34% were located in Kenya, 27% in South Africa, and 10% in Nigeria. Ghana and Tanzania accounted for 7% each. Cameroon, Egypt, Ethiopia, Uganda, Zambia, and Zimbabwe together accounted for the remaining 15%.
These companies were primarily active in agriculture, business, health, and telecommunications, though DDD did also hear from companies engaged in transportation, real estate, human resources, finance, information technology, environment, insurance, legal, online stores, and security and technology.
33 companies provided complete or partial financial information. Ghana and Morocco accounted for a third each of the companies willing to provide such information, Kenya for about a fifth, and a couple of companies from South Africa responded as well. The financing needs of the companies that are trying to raise resources added up to over $21,000,000.
Financing requirements of the companies profiled
The companies that require financing are at different stages of maturity.
Financing needs of the companies surveyed ranged from a few hundred thousand dollars to $5 million and totaled above $21,000,000.
Some of the companies we reached out to in Africa included
- DataScience: data analysis and research
- Caciopee: software services
- Last Mile for BoP: profit for purpose social business
Key findings in Russia
A local consultant working for the World Bank Group was able to gather information on 27 Russian companies that use open data. The number is likely much lower than the reality on the ground, where some companies that use open data tend not to publicize it (an opinion expressed by one of the market experts). Some do not have sufficient marketing capacity, and interestingly, the consultant did not observe any crossover between companies that use data commercially and those that use it for social purposes. In addition, the list is skewed towards Moscow- and, to a lesser extent, St. Petersburg-based companies (there might be more unexplored potential in the regions).
The information provided by the companies suggests that
- Almost 70% of them already use open data in one way or the other, even if their companies are not built on open data
- Financing requirements range from $100,000 to over $10 million, with most common types of financing being equity, quasi-equity and working capital. Companies project relatively quick break even points in 1-3 years
- Most companies are in seed and early stages, but many founders have already cut their teeth in other projects
- The most common types of business models are data enrichers and aggregators
In the list, 7 companies provide data mining, data analytics and other data solutions, 6 companies represent urban and transportation (including geospatial) sector, 3 companies provide services based on legal data, 2 companies specialize in app development, 2 companies analyze social media, 2 companies have procurement projects, and real estate, healthcare, science, and social services are represented by one company each.
Another common characteristic that emerged from conversations and meetings is that many projects have goals to expand internationally. Some entrepreneurs have already looked into foreign competitors and are confident they can differentiate themselves and grow beyond Russian borders. Given that a significant number of startups in emerging markets tend to be copycats or simply fail to expand internationally, data-driven businesses can be a welcome break from this trend.
Some of the companies we reached out to in Russia included
- Habidatum: data analysis and visualization
- Transparent World: remote sensing data
- NextGIS: geospatial solutions
A note on the methodology
The goal of these surveys was to gather first-cut information about whether the private sector in emerging countries has begun to produce open data-driven companies, and if yes, whether these companies face financing-related challenges. The surveys were designed to take the temperature of the market, rather than serve as comprehensive assessments.
The information in the study is based on
- Desk research to compile a list of companies and gather background information about them (including financial and investment information)
- Desk research to identify investors active in the markets
- Direct outreach to the companies to gather information about open data use, financing needs, and growth potential
- Direct outreach to investors to gather their perspective on the markets they operate in
- Engagement with the 'open data community' in the regions covered
We did not validate self-reported information about specific uses of open data and financing needs/revenue projections/business model viability. Readers should not assume that the companies we list are viable investment options and have been vetted in any way by us.
It is also important to clarify that many companies used the concept of 'open data' loosely. Some conflated it with publicly available data, others with big data. We tried to account for the confusion as much as possible but it is possible that we could not do so fully.
Finally, the findings of the study must be understood and used in the spirit of the project which was to conduct a rough analysis of the potential pipeline of open data driven businesses in emerging markets (rather than create an official knowledge product or report which this certainly is not). The study is not meant to be definitive or comprehensive and we are sharing our findings in the spirit of openness. The findings/numbers are indicative at best, and we welcome alternative, or even better, more comprehensive analysis based on additional, more rigorously gathered data. View the complete list of companies we surveyed here.
World Bank Group Finances is the online access point for IBRD, IDA, and IFC open financial data. The website features datasets that cover loans, contracts, trust funds, investments, and financial statements. A related mobile app, which allows you to "talk" to us more easily about operational and financial data in nine languages, is available for download for Android and iOS smartphone and tablet users at the Google Store and the iTunes Store, respectively. Follow us on Twitter to join and remain engaged in the conversation about the Bank's open financial data.