Published on Development Impact

Can predicting successful entrepreneurship go beyond “choose smart guys in their 30s”? Comparing machine learning and expert judge predictions

Business plan competitions have increasingly become one policy option used to identify and support high-growth potential businesses. For example, the World Bank has helped design and support these programs in a number of sub-Saharan African countries, including Côte d’Ivoire, Gabon, Guinea-Bissau, Kenya, Nigeria, Rwanda, Senegal, Somalia, South Sudan, Tanzania, and Uganda. These competitions often attract large numbers of applications, raising the question: how do you identify which business owners are most likely to succeed?

In a recent working paper, Dario Sansone and I compare three different approaches to answering this question, in the context of Nigeria’s YouWiN! program. Nigerians aged 18 to 40 could apply with either a new or existing business. The first year of this program attracted almost 24,000 applications, and the third year over 100,000 applications. After a preliminary screening and scoring, the top 6,000 were invited to a 4-day business plan training workshop, and then could submit business plans, with 1,200 winners chosen to receive an average of US$50,000 each. We use data from the first year of this program, together with follow-up surveys over three years, to determine how well different approaches would do in predicting which entrants will have the most successful businesses.

The standard approach: use expert judges
Business plan competitions typically rely on expert judges to score proposals, with higher scores given to those businesses judges view as having higher likelihoods of success. This was the case with YouWiN!, where 20 judges from the Enterprise Development Center and the Nigerian office of PriceWaterhouseCoopers spent about 30-45 minutes per proposal, scoring business plans out of 100 based on a framework that assigned scores on 10 different criteria, such as the management skills and background of the owner, the business idea and market, financial sustainability and viability, and job creation potential.

These judges did believe there were big differences amongst these firms in terms of potential: winning scores range from 30 to 91 out of 100, with a mean of 56 and standard deviation of 12. Many of the winners were chosen via lottery from amongst a semi-finalist pool (which I use for an impact evaluation of the program), so that the scores for the non-winners overlap, ranging from 2 to 73 out of 100, with a mean of 51.

Alternative approach 1: use economists as human experts
A first alternative approach is to use ad hoc models of economists to predict business success, where these models take a subset of variables that the literature has suggested correlate with business performance, such as gender, age, education, ability (as measured by Raven test and digit span), business sector, and household wealth. We use two such models – one from the working paper version of my impact evaluation, and one taken from Fafchamps and Woodruff’s analysis of a business plan competition in Ghana.

Alternative 2: use machine learning to predict which businesses will succeed
The application form and baseline data contain a large number of possible variables that could be used to predict success. An alternative to judges or human experts is to use machine learning methods to build a prediction model, starting with 393 possible predictors. We use three machine learning approaches: lasso, support vector machines (SVM), and boosting (which is similar to random forests). In each case we split the sample into three groups: a training sample (60% of the data) which is used to estimate the algorithm, a cross-validation sample (20% of the data) which is used for a grid-search to choose the optimal tuning parameters for each method (e.g. a lasso shrinkage parameter), and then a test sample (20% of the data) which is used to measure out-of-sample performance. We split the data five different times, so that in the end we have out-of-sample performance for the whole sample, and average results over these five folds.
These methods can be computationally intensive and time-consuming: the boosted regressions, for example, took approximately 25 hours per outcome to estimate.
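To make the sample-splitting concrete, here is a minimal sketch of one way to produce five 60/20/20 splits in which every observation lands in the test sample exactly once. The rotation scheme, the function name `rotating_splits`, and the sample size of 1,000 are assumptions for illustration only; the post does not describe exactly how the five splits were constructed.

```python
import random

def rotating_splits(n_obs, n_folds=5, seed=0):
    """Yield (train, validation, test) index lists, one triple per fold.

    Hypothetical scheme: shuffle once, cut into n_folds blocks, then rotate
    which block is the test sample and which is the validation sample.
    """
    indices = list(range(n_obs))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds roughly equal blocks.
    blocks = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        val_block = (k + 1) % n_folds
        test = blocks[k]                # 20%: out-of-sample evaluation
        validation = blocks[val_block]  # 20%: grid search over tuning parameters
        train = [i for j, b in enumerate(blocks)
                 if j not in (k, val_block) for i in b]  # remaining 60%
        yield train, validation, test

# Illustrative sample size, not the one used in the paper.
splits = list(rotating_splits(1000))
```

With this kind of rotation, the test-sample predictions from the five folds can be stacked to give an out-of-sample prediction for every observation.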

We examine the performance of these three methods in predicting business operation and survival, employment, profits, and sales three years after application. We do this separately for the winners (who received grants) and the non-winners.

1. Conditional on getting to the stage of submitting a business plan, the judges’ scores have almost no predictive power in determining which entrepreneurs will succeed.

The judges’ scores explain less than 2 percent of the variation in outcomes; are not significant predictors of any of our four outcomes for non-winners; and only help in predicting employment for the winners (with much of this coming from those with higher scores getting larger grants). This is seen clearly in Figure 1 below, which shows almost no relationship between the judges’ scores and outcomes for business plan competition winners (the paper also has a similar figure for non-winners).

Figure 1: Judges’ scores aren’t very helpful in predicting which winners will do better

2. Despite the large number of potential predictors available for machine learning, it outperforms neither the simple ad hoc models of economists nor even simpler single-predictor models when it comes to average performance.

We do find some simple characteristics are correlated with business success: males in their 30s who score highly on a Raven test are more likely to succeed. Just using the Raven test alone often does just as well as the models of economists, or as our machine learning models.

3. Business success is really hard to predict.

The out-of-sample performance of all our methods is very low. Typically the models explain less than 5 percent of the variation in business outcomes after three years.
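As a rough illustration of what "explaining less than 5 percent of the variation" means, the sketch below computes an out-of-sample R-squared from test-sample outcomes and predictions. The function name and the choice to benchmark against the test-sample mean are assumptions; the post does not state the exact definition used in the paper.

```python
def out_of_sample_r2(actual, predicted):
    """Share of test-sample outcome variation explained by the predictions.

    Assumed definition: 1 - (sum of squared prediction errors) /
    (sum of squared deviations of outcomes from their test-sample mean).
    """
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```

Under this definition, a model that always predicts the average outcome scores 0, a perfect model scores 1, and a model that predicts worse than the average can score below 0.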

4. Machine learning does not uncover many predictors that wouldn’t already be considered by human experts

The boosting models choose a lot of the same predictors that were chosen in the ad hoc models that didn’t use any model selection methods. Moreover, they contain few splits, suggesting that interactions aren’t that important. They do end up using a few variables we might not otherwise consider: how close to the deadline people ended up applying, the length of their response to why they wanted to start a business, and how many siblings and children they have.

5. The methods do a little better at predicting the very top firms, and small improvements here might make considerable difference to the return on investment.

We also look at whether the methods can identify which firms end up in the top 10% of employment or profits. We focus here on the recall rate: the proportion of the top tail of firms that are correctly predicted. This would be 10% with random selection, but some of the methods get to be about twice as accurate. One problem is that there is no consistent ranking of methods here, with different methods doing best on different outcomes. Nevertheless, if we were an investor aiming to pick the top 100 firms to invest in, and then got royalties based on their profits, we could potentially have 2-4 times the returns of random selection using the ad hoc models of economists. This would beat both the judges and machine learning approaches.
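The recall rate described above can be sketched as follows. This is a minimal illustration that assumes each method flags the same number of firms as are in the actual top tail (the top 10% by predicted outcome); the function name and that flagging rule are assumptions, not details taken from the paper.

```python
def top_tail_recall(actual, predicted, share=0.10):
    """Fraction of firms in the actual top tail that the predictions also
    place in the top tail, with both tails the same size (assumed rule)."""
    n = len(actual)
    k = max(1, round(n * share))  # number of firms in the top tail
    top_actual = set(sorted(range(n), key=lambda i: actual[i], reverse=True)[:k])
    top_predicted = set(sorted(range(n), key=lambda i: predicted[i], reverse=True)[:k])
    return len(top_actual & top_predicted) / k
```

With random predictions the expected recall is about equal to `share` (10% here), so a method with recall of around 20% is roughly twice as accurate as random selection.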

Caveats and Lessons
  • A first caveat is that this is based on trying to pick amongst individuals that had already massively positively self-selected relative to the Nigerian population as a whole – requiring people to apply online and submit a detailed business plan already screens out lots of individuals with low education levels and perhaps lower ability to grow businesses, as well as those who can’t get their act together to meet deadlines and to show up for the 4-day training. So this doesn’t mean that you can’t tell business potential apart in a general population, just that after the first stages of a business plan competition, it may be really hard to tell people further apart.
  • It may be much harder to predict among high-growth entrepreneurs (e.g. Hall and Woodward, or Nanda) than to tell apart subsistence businesses (e.g. Hussam et al; de Mel et al).
  • We often argue that random selection is a fair way of selecting participants for programs, as well as helping in impact evaluation. These results help provide support for this view in the context of high-growth entrepreneurs – and this is without even thinking about the costs involved in implementing a judging process.



David McKenzie

Lead Economist, Development Research Group, World Bank
