In late 2014, I launched a call for innovative ideas on SME growth. The call attracted multiple ideas, leading to 15 finalists pitching their ideas at the World Bank, and two of these ideas were each chosen to receive a prize of $100,000, on the condition that there be (separately funded) impact evaluations to learn from testing these ideas. One of the two ideas chosen was by a Colombian start-up called Agruppa. The problem they wanted to solve was the inefficient supply chain for small retailers selling fresh fruit and vegetables. The status quo is that the typical owner gets up daily at 4:30am and spends over two hours traveling to, around, and back from a massive centralized market (Corabastos, pictured above), resulting in large time and transportation costs for store owners. Their idea was to use mobile phone technology to create virtual buyer groups: aggregating daily orders from store owners, buying the produce in bulk from farmers, and then delivering it directly to stores.
In a new working paper, Leonardo Iacovone and I report on the results of a randomized experiment we conducted with Agruppa to test this idea. We summarize the results here, and then offer reflections from both us and Agruppa on the process of conducting an RCT with a start-up:
Experimental Design
Agruppa went door-to-door in poor neighborhoods of Bogota in Jan-Feb 2016 to map stores selling fruit and vegetables. Using larger streets as natural boundaries, these neighborhoods were divided into 63 market blocks, containing 1,620 firms. These blocks were then randomized into 32 treatment blocks and 31 control blocks (see Figure 1 below). Baseline surveys in these blocks collected information on the firms and their owners, and at the end, firms received a factual explanation of Agruppa, and were asked whether they would be interested in being a client should Agruppa launch in their block.
Figure 1: Random Assignment at the City-Block Level
Note: treatment blocks in gray, control blocks in green (Corabastos indicated by red diamond).
Our main analysis compares the 586 interested firms in treated blocks who get offered Agruppa to 536 interested firms in control blocks who are not offered Agruppa. We then measure spillover effects by comparing uninterested firms in treatment blocks to uninterested firms in control blocks.
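The block-level assignment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used in the study: the block IDs (1 to 63), the seed, and the function name `randomize_blocks` are hypothetical, and the sketch ignores the matching of blocks described below.

```python
import random

def randomize_blocks(block_ids, n_treatment, seed=0):
    """Assign whole blocks (clusters) to treatment or control.

    Every firm in a block shares the block's assignment, which is what
    makes this a cluster-randomized design.
    """
    rng = random.Random(seed)          # fixed seed so the draw is reproducible
    shuffled = list(block_ids)
    rng.shuffle(shuffled)
    treated = set(shuffled[:n_treatment])
    return {b: ("treatment" if b in treated else "control") for b in block_ids}

# Hypothetical block IDs 1..63; 32 treatment and 31 control blocks, as in the text.
assignment = randomize_blocks(range(1, 64), n_treatment=32, seed=2016)
counts = {arm: sum(1 for a in assignment.values() if a == arm)
          for arm in ("treatment", "control")}
print(counts)
```

Because assignment happens at the block level, any analysis of firm-level outcomes would normally need to account for the clustering, for example with standard errors clustered by block.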
Agruppa did not have the capacity to launch everywhere at once, so it planned to expand at the rate of one block a week. We therefore matched blocks, and each week would survey a treatment block and its matched control block. Agruppa would then go in the following week and start offering its service to interested firms. At the beginning, Agruppa was unable to offer all the products that shops would purchase at Corabastos, so it started by selling only five of the most frequently purchased products, focusing initially on the bulkier and heavier-rotation ones (potatoes, plantains, tomatoes, onions, and spring onions). These core products accounted for just over half of total sales in the retail stores at baseline. Agruppa gradually increased its product range over time, reaching 28 products during our survey period.
One innovation of this work was the use of very high-frequency follow-ups to track the immediate effects on business owners' travel times, purchases, and pricing. Working with IPA Colombia, we conducted seven rounds of follow-up surveys at 2, 4, 6, 10, 14, 26 and 52 weeks after Agruppa had launched in a block in order to measure impacts on stores and owners. A big challenge in this work was collecting data from these stores, which were in poor neighborhoods with relatively high levels of mistrust and crime – 56% of store owners viewed their neighborhood as unsafe or very unsafe and 10% had been robbed in the last six months. Indeed, our enumerators were robbed several times. This insecurity and the lack of formality in keeping financial accounts generated a significant reluctance to share profits and sales data, and so we had much higher response rates for questions about travel times and sale prices than for profits and total sales.
Results
· Initial interest was high, but fell off quickly: 52% of interested firms made at least one purchase from Agruppa within the first two weeks, and 66% made a purchase at least once in the first year. But only 24% were still using Agruppa after 6 months and only 16% after one year.
· Agruppa's service did reduce travel time and costs, just not as much as anticipated. Over the first six weeks after Agruppa was introduced, trips to Corabastos market fell by 0.4 days per week for those using Agruppa, and conditional on going to market, firm owners spent less time there. As a result, those using the service saved almost 2 hours per week in travel time, relative to a control mean of 12 hours. The saving was limited because firm owners continued to travel frequently to market to buy other products not sold by Agruppa.
· Work-life stress improved, particularly measures of having time for family and for all the activities they have planned.
· Retailers save 6 to 8 percent on purchase costs and pass some of this saving on to consumers in the form of slightly lower prices, but also increase mark-ups.
· However, overall sales and profits appear to have fallen, due to lower sales of non-Agruppa products. This finding comes with the caveat that many firms did not reveal their sales. It is consistent with retail firms being less likely to be carrying some of the fruits and vegetables that were not core Agruppa products.
In addition to directly learning about the effectiveness of this particular intervention, the high-frequency data we have on prices for these products provides rich information that enables us to better understand the nature of competition in these markets. We find that firm owners change their prices almost daily, and there is high, almost one-for-one, pass-through of market-level prices into prices charged. This would appear to suggest a very competitive environment. But on the other hand, we also find large cross-sectional price dispersion within the same blocks – even controlling for quality (which we measure using several appearance measures), an onion at a store at the 90th percentile for prices sells in the same neighborhood on the same day for 77 percent more than at a store at the 10th percentile, suggesting imperfect competition. The paper discusses how we reconcile these facts, and how they can help explain why Agruppa’s lower prices did not enable firms to gain more market share.
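To make the 90th-versus-10th percentile comparison concrete, here is a minimal Python sketch of how such a dispersion measure can be computed for one product, neighborhood, and day. The prices are made up for illustration, the helper names `percentile` and `p90_p10_gap` are hypothetical, and the sketch does not control for quality as the paper does.

```python
def percentile(values, q):
    """Percentile q (0-100) with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def p90_p10_gap(prices):
    """Proportional price gap between the 90th and 10th percentile stores."""
    return percentile(prices, 90) / percentile(prices, 10) - 1

# Made-up same-day onion prices (pesos per unit) across stores in one neighborhood.
onion_prices = [1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000]
gap = p90_p10_gap(onion_prices)
print(f"90th percentile price is {gap:.0%} above the 10th percentile price")
```

A gap near zero would be what one expects if stores were selling an identical good in a perfectly competitive market; a large gap on the same day in the same neighborhood is the pattern the text describes.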
Our final 12-month follow-up survey finished in its blocks in November 2017. Unfortunately, in January 2018 Agruppa ceased operations.
Experimenting with Start-ups
It is often the case that researchers effectively experiment with a start-up idea they have come up with and are implementing themselves. But this was the first time I had conducted an experiment with a start-up enterprise at such an early stage. The upsides of doing so were clear – the early data collected could be informative to Agruppa in learning about its customer base and their limited scale meant that they could not serve everyone from the start, so that a planned geographic expansion easily lent itself to a randomized roll-out design.
However, launching a randomized trial at such an early stage also raised challenges for both the social enterprise and research teams.
Verena Liedgens and Carolina Medina, the leaders of Agruppa, wrote (in Spanish) a series of blog posts sharing their lessons from this experience, and thoughts about what they got right and wrong in launching the business. One of their posts was titled “the hidden costs of measuring social impact at an early stage”. It is definitely interesting reading and a cause for reflection for us as researchers. In this post, they describe their initial excitement as “like winning the lottery”, with both the receipt of funds to support their expansion from the SME ideas competition, as well as the opportunity to have an evaluation that would be far beyond their own ability to finance. They note the initial funding was key for them, and that our preliminary results helped them in raising further funding from other social investors. However, they note that participating in an RCT involved several challenges for their evolution as a business that were costly.
1) It limited radical pivots: when they committed to the RCT, they believed they had their business model and customer value proposition sorted out. But early-stage companies often discover they need to change rapidly in response to testing and customer feedback. They felt that they could make only certain types of changes (like increasing the number of products offered), but not radically change the structure of their product offering without invalidating the impact study.
2) Block-randomization and logistics: because of the block randomization, the blocks where they offered their service were not all contiguous and optimized for delivery routes, but more difficult to navigate and less efficient (see Figure 1). This meant sending delivery trucks further, and the need to stay away from control block areas meant they couldn’t just tell transporters to sell everything to anyone.
3) Growth timing: the plan of introducing one block per week was perfectly aligned with growth projections at the beginning. However, ex post they feel this led them to expand faster than they should have done, offering a slightly less polished service to some customers as a result.
From our side, although Carolina and Verena were fantastic partners to work with, the early stage of the firm presented several challenges for measuring impacts. Starting to measure impacts while the product logistics are still being finalized, customer retention systems are not fully in place, and the business is not yet operating at scale meant that:
1) We are measuring the impact of an evolving intervention: this is the flipside of their point 1) – the more the intervention changes, the more the average treatment impact we measure becomes an average over different service offerings. As noted, they did not wildly pivot, and we can compare take-up and retention rates for firms that were in blocks treated later to those treated earlier, but in a more general setting, this is something to consider.
2) Lower than anticipated retention and usage reduces statistical power: the drop-off of customer usage means that we have more difficulty measuring impacts due to incomplete take-up, and because usage varies over time, the group of compliers for forming LATE estimates is different each period. A more mature operation is likely to have addressed some of the retention problem and be easier to evaluate.
3) Less bandwidth for integrating additional experimentation: While we worked very closely with Agruppa, their team was directly focused on organizing the logistics of delivery and expansion, and there wasn’t much time to step back and explore integrating additional research questions which may have been helpful to both the business and research, such as formally experimenting with alternative methods for retention or acquisition of new clients.
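Point 2) above can be illustrated with the standard Wald (instrumental-variables) estimator of the local average treatment effect, which divides the intention-to-treat difference by the difference in usage between arms. The sketch below is only a stylized illustration with made-up numbers; the function name `wald_late` is hypothetical, and the paper's actual estimation will differ (for example, by accounting for block-level clustering and covariates).

```python
from statistics import mean

def wald_late(outcomes_t, outcomes_c, used_t, used_c):
    """Wald estimator: intention-to-treat effect scaled by the first stage.

    outcomes_*: outcome per firm in the treatment / control arm
    used_*:     1 if the firm actually used the service, else 0
    """
    itt = mean(outcomes_t) - mean(outcomes_c)      # effect of being offered
    first_stage = mean(used_t) - mean(used_c)      # effect of the offer on usage
    if first_stage == 0:
        raise ValueError("no difference in usage between arms; LATE is undefined")
    return itt / first_stage

# Made-up weekly travel hours; half of the treated firms use the service.
hours_treated = [10, 11, 12, 11]
hours_control = [12, 12, 12, 12]
late = wald_late(hours_treated, hours_control,
                 used_t=[1, 1, 0, 0], used_c=[0, 0, 0, 0])
print(late)
```

Because the intention-to-treat difference is divided by the usage gap, any sampling noise in that difference is scaled up as usage falls, which is why low retention makes impacts on users harder to pin down. And since a different set of firms is using the service in each survey round, the estimate refers to a different group of compliers each time.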
My sense is that both sides came out of this evaluation very appreciative of the spirit of collaboration and feeling that they had learned a lot from the experience, but also feeling that the evaluation perhaps started too early in the launch cycle. Ideally, in future collaborations, proof-of-concept would first be delivered at scale in one location, and then a design similar to the one used here would be used to measure impacts when expanding to a second city or part of the city. The basic idea behind Agruppa – of reducing supply chain inefficiencies by aggregating orders over many small vendors – still seems promising, and other start-ups like Frubana for restaurants in Colombia and Twiga for small vendors in Kenya are pursuing similar ideas.