Numerous recent discussions on the future of development financing focus on the delivery of results and on how to mainstream results accounting in aid flows (for one review, see a paper by Nemat Shafik). This “results-based approach” to aid is gathering steam in many contexts. One example is the UK’s Department for International Development (DFID), which has adopted a results-based approach to allocating aid across countries and sectors. Another is Results Based Financing for Health (RBF), through which the World Bank, with financing from the Norwegian government and DFID, is piloting RBF schemes in the health sector in various developing countries.
A results focus is attractive to donors for many reasons. If pay-for-performance is the future, for instance, donors would no longer have to invest scarce resources in monitoring processes such as the procurement of inputs. However, these schemes give priority to another form of monitoring: verifying the incentivized or targeted results themselves.
Results can be measured in numerous ways, depending on the frequency and scale that the particular program or policy requires. In many settings, such as RBF schemes that reward health service delivery at the level of the individual clinic, the rewarded indicators are currently self-reported. Linking rewards to a self-reported indicator creates an obvious problem: the reporting unit is tempted to inflate its gains and misreport, and that temptation may come to drive behavior. Pay-for-performance schemes therefore need a verification mechanism.
The risk of a third-party audit is the most common way to induce truthful self-reporting, as taxpayers throughout the world are well aware. However, numerous operational questions about audits in a pay-for-performance context remain unresolved, including: What is the most effective form of audit? How often should audits take place? How severe should sanctions be? While theory provides some guidance, there is little empirical evidence, and only a few recent impact evaluations (in vastly different settings) address these questions.
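To make the frequency and severity questions concrete, here is a minimal sketch of the textbook risk-neutral calculation that underlies them. Everything in it is a hypothetical illustration rather than a feature of any particular program: a reporting unit earns `reward` per reported unit, each report is audited with probability `audit_prob`, and a detected over-report must be repaid along with a sanction of `penalty_rate` times the reward.

```python
# Hypothetical risk-neutral model of a reporting unit deciding whether to
# over-report a rewarded result. Parameter values are illustrative only.


def expected_gain_per_unit(reward: float, audit_prob: float,
                           penalty_rate: float) -> float:
    """Expected net gain from over-reporting one unit of a rewarded result.

    If not audited, the unit keeps `reward`. If audited, it repays `reward`
    and also pays a sanction of `penalty_rate * reward`.
    """
    return (1 - audit_prob) * reward - audit_prob * penalty_rate * reward


REWARD = 10.0       # hypothetical payment per reported unit
PENALTY_RATE = 1.0  # hypothetical sanction: one extra reward per unit over-reported

for audit_prob in (0.04, 0.25, 0.5, 1.0):
    gain = expected_gain_per_unit(REWARD, audit_prob, PENALTY_RATE)
    verdict = "pays" if gain > 0 else "does not pay"
    print(f"audit probability {audit_prob:.0%}: expected gain {gain:+.2f}"
          f" -> over-reporting {verdict}")
```

In this model over-reporting stops paying once `audit_prob * (1 + penalty_rate) >= 1`, so audit frequency and sanction severity act as substitutes: with a sanction equal to the reward, a 50% audit rate reaches break-even. Risk aversion, imperfect audits, or a limited ability to pay sanctions would all shift that threshold.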
A well-known experiment by Ben Olken found that raising the risk of a central government audit from 4% to 100% reduced missing expenditures in community-driven development road construction projects in Indonesia. Even with a guaranteed audit, however, missing expenditures remained substantial: the audit reduced them from 28% to 20% of the total grant. One pessimistic suggestion from this study is that the higher audit risk may have displaced corruption into channels the audits did not cover, such as increased hiring of family members. Another pessimistic finding is that community-based monitoring of the projects was far less successful than the central government audit at uncovering corruption. (In other settings, however, community involvement in monitoring has led to dramatic improvements in accountability.)
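Because the Olken figures mix levels and changes, a quick calculation helps separate them. The sketch below uses only the two numbers reported above (28% and 20% of the grant) and converts them into a percentage-point drop, a relative drop, and the share of the original leakage that remained.

```python
# Figures as reported above for Olken's Indonesian road projects:
# missing expenditures as a share of the total grant.
baseline_missing = 0.28   # audit risk 4%
audited_missing = 0.20    # audit risk 100%

# Absolute change, in percentage points of the grant.
pp_reduction = (baseline_missing - audited_missing) * 100
# Change relative to the baseline level of missing expenditures.
relative_reduction = (baseline_missing - audited_missing) / baseline_missing
# Fraction of the baseline leakage that persisted despite certain audit.
remaining_share = audited_missing / baseline_missing

print(f"Drop: {pp_reduction:.0f} percentage points")
print(f"Relative drop: {relative_reduction:.0%} of baseline leakage")
print(f"Leakage remaining: {remaining_share:.0%} of baseline")
```

Put this way, a guaranteed audit removed roughly 29% of the baseline leakage, and about 71% of it persisted.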
Rafael Di Tella and Ernesto Schargrodsky investigate an anti-corruption campaign in hospitals in Buenos Aires. This observational study uses a time series of hospital input prices spanning both the pre-campaign and campaign periods. They find that the announcement of intensified audits initially reduced procurement prices by 15%, when audit risk was at its highest. When audit risk receded to an intermediate level, procurement prices rose somewhat but remained 10% below pre-crackdown levels. The procurement officer’s wage also mattered: relatively low-paid procurement staff were more likely than their better-paid counterparts in other hospitals to exhibit pricing behavior consistent with kickbacks.
A recent paper by Henrik Kleven and four co-authors describes a multi-year audit experiment conducted with Danish taxpayers. For the Danish population at large, the audit risk is approximately 4%. However, in the base tax year, 40,000 individuals were randomly assigned either to an unannounced, thorough audit or to a guaranteed audit-free pass. In the following year, audit risk was again randomized for the same individuals, but this time it was announced to all of them in advance.
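The two-stage design described here (combined with the 0%, 50%, and 100% announced risks used in the second year) can be sketched as a simple randomization routine. This is a simplified, hypothetical illustration, not the study’s actual assignment procedure: the sample size, the seed, the even split in the base year, and the balanced allocation of announced risks within each base-year arm are all assumptions. Balancing the second stage within each base-year arm keeps the two treatments crossed, so the effect of an announcement can be compared between previously audited and unaudited taxpayers.

```python
import random
from collections import Counter

# Hypothetical two-stage design; arm labels and risk levels mirror the text,
# everything else (sizes, seed, balancing rule) is an illustrative assumption.
BASE_ARMS = ("audit", "no_audit")      # base year, unannounced
ANNOUNCED_RISKS = (0.0, 0.5, 1.0)      # following year, announced in advance


def assign(n_subjects: int, seed: int = 2024) -> list[tuple[int, str, float]]:
    """Return (subject_id, base_year_arm, announced_risk) for every subject."""
    rng = random.Random(seed)
    ids = list(range(n_subjects))
    rng.shuffle(ids)
    # Stage 1: the first half of the shuffled list is audited in the base year.
    half = n_subjects // 2
    base_arm = {i: (BASE_ARMS[0] if k < half else BASE_ARMS[1])
                for k, i in enumerate(ids)}
    # Stage 2: reshuffle each base-year arm and cycle through the announced
    # risks, so every arm receives an even spread of risk levels.
    rows = []
    for arm in BASE_ARMS:
        members = [i for i in ids if base_arm[i] == arm]
        rng.shuffle(members)
        for k, i in enumerate(members):
            rows.append((i, arm, ANNOUNCED_RISKS[k % len(ANNOUNCED_RISKS)]))
    return sorted(rows)


if __name__ == "__main__":
    cells = Counter((arm, risk) for _, arm, risk in assign(600))
    for (arm, risk), n in sorted(cells.items()):
        print(f"{arm:>8}  announced risk {risk:.0%}: {n} subjects")
```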
The base-year audits established that income subject to third-party reporting (the vast majority of income in Denmark) is reported extremely accurately by taxpayers. Self-reported income, taken across all its sources, was under-reported by an average of 37%. Experiencing an audit had a strong positive effect on reported income the following year, and this effect again operated through self-reported income. Presumably, going through an audit raises the perceived probability of future detection.
In the second year of the study, the announced audit risk was randomized to 0%, 50%, or 100%. As you might expect, the announcement affected income reporting behavior. Taxpayers facing a certain audit were significantly more likely to file amendments increasing their reported income (again mostly from self-reported sources). This was especially true for taxpayers who had not been audited in the first year, whose expectations of detection were presumably much lower before they received the announcement. Taxpayers who had been audited were already wary and had reported higher incomes in the first place.
(And, as simple theoretical models predict, the group facing a 50% audit risk also filed amendments increasing their self-reported income, but at roughly half the rate of the 100% group.)
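One simple way a model can produce this half-rate pattern is sketched below, under assumptions that are not taken from the study: each taxpayer amends when the expected sanction (announced audit probability times a hypothetical sanction `S`) exceeds a personal threshold, and those thresholds are spread evenly between 0 and `S`. Under these assumptions the share amending equals the announced audit probability, so a 50% risk yields half the response of a 100% risk.

```python
# Hypothetical threshold model of amendment decisions; S, N and the evenly
# spread thresholds are illustrative assumptions, not estimates.


def amendment_share(audit_prob: float, sanction: float,
                    thresholds: list[float]) -> float:
    """Fraction of taxpayers whose threshold lies below the expected sanction."""
    expected_sanction = audit_prob * sanction
    return sum(c < expected_sanction for c in thresholds) / len(thresholds)


S = 100.0   # hypothetical sanction if an under-report is detected
N = 1000    # hypothetical number of taxpayers
# Evenly spaced thresholds on [0, S) stand in for a uniform distribution.
thresholds = [S * k / N for k in range(N)]

for p in (0.0, 0.5, 1.0):
    share = amendment_share(p, S, thresholds)
    print(f"announced audit risk {p:.0%}: share amending {share:.2f}")
```

The linearity comes entirely from the evenly spread thresholds; a different distribution of thresholds would produce a different ratio between the 50% and 100% groups.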
The general literature on tax compliance suggests that, when penalties and audit probabilities are set at levels realistic for many OECD economies, the deterrent effect of audit risk should be small. Standard theoretical models therefore predict far more tax evasion than is observed in practice. Previous explanations for this puzzle include feelings of social solidarity. The authors of the Danish study argue instead that evasion is low because (disinterested) third-party income reporting is so widespread. As the authors write, “it’s not that taxpayers are unwilling to cheat, it’s that they are unable to cheat.”
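The Danish authors’ argument can be illustrated with the same kind of risk-neutral calculation that creates the puzzle. In the sketch below, the 4% audit risk is the Danish figure cited above; the 50% tax rate, the penalty equal to the evaded tax, and the 99% detection probability for third-party-reported income are hypothetical values chosen only for illustration.

```python
# Hypothetical risk-neutral evasion calculation, per unit of income hidden.
# Relative to honest reporting: if undetected, the taxpayer keeps tax_rate;
# if detected, the tax is paid anyway plus a penalty of penalty_rate * tax_rate.


def expected_payoff_from_evading(tax_rate: float, detect_prob: float,
                                 penalty_rate: float) -> float:
    return (1 - detect_prob) * tax_rate - detect_prob * penalty_rate * tax_rate


TAX_RATE = 0.5      # hypothetical marginal tax rate
PENALTY_RATE = 1.0  # hypothetical: penalty equal to the evaded tax

detection = {
    "self-reported income (audit risk ~4%)": 0.04,
    "third-party-reported income (assumed 99% detection)": 0.99,
}
for channel, p in detection.items():
    payoff = expected_payoff_from_evading(TAX_RATE, p, PENALTY_RATE)
    print(f"{channel}: expected payoff per unit hidden {payoff:+.3f}")
```

With these assumed values, hiding self-reported income has a large positive expected payoff, which is why standard models predict widespread evasion, while hiding third-party-reported income is a clear losing bet. That matches the pattern of accurately reported third-party income and substantially under-reported self-reported income.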
All three studies offer important lessons, but numerous questions remain for the design of pay-for-performance programs.
In the Kleven et al. study, there was a clear effect of audit experience on behavior the following year, but what about the longer-run impacts of audit experience? If an audit leads to a perceived higher risk of future detection, does this risk adjust downward over the years as the memory of the audit recedes?
The Kleven et al. study also points to the importance of third-party reporting in ensuring truthfulness. In the development context, third-party reporting, such as an independent survey, may be feasible in selected settings, but it is far from universally applicable today. Where an independent population-based survey of outcomes is a viable option, it may provide the same deterrence as a third-party report. Creative thinking (perhaps involving joint liability across reporting units) may also make it possible to institute or approximate third-party reporting in settings where it is currently infeasible.
In other settings, such as clinic-based health RBF, third-party surveys that measure health gains at the frequency and precision the program requires would be exorbitantly expensive. In these settings we appear to be back in the world of verification through audit, confronted with questions about the audit rates and sanction levels that would ensure honest reporting in a cost-effective manner.
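A rough way to frame the cost-effectiveness question is to ask, for each feasible sanction level, what audit rate would just remove the incentive to over-report and what that rate would cost. The sketch below assumes risk-neutral clinics, error-free audits, a detected over-report that forfeits the reward plus a penalty of `penalty_rate` times the reward, and a fixed cost per audit; the number of clinics, the reporting frequency, and the cost per audit are hypothetical.

```python
# Hypothetical sketch: the cheapest deterrent audit rate for a clinic-based
# pay-for-performance scheme, at several sanction levels.
# Assumptions (illustrative only): risk-neutral clinics, error-free audits,
# and a detected over-report forfeits the reward plus penalty_rate * reward.


def deterrent_audit_rate(penalty_rate: float) -> float:
    """Lowest audit rate at which over-reporting has zero expected gain."""
    # Expected gain per unit over-reported: reward * (1 - rate * (1 + penalty_rate)).
    return min(1.0, 1.0 / (1.0 + penalty_rate))


def annual_audit_cost(n_clinics: int, reports_per_year: int,
                      cost_per_audit: float, penalty_rate: float) -> float:
    """Expected yearly audit spending when reports are audited at the deterrent rate."""
    rate = deterrent_audit_rate(penalty_rate)
    return n_clinics * reports_per_year * rate * cost_per_audit


N_CLINICS = 200         # hypothetical program size
REPORTS_PER_YEAR = 4    # hypothetical: quarterly reports
COST_PER_AUDIT = 500.0  # hypothetical cost, arbitrary currency units

for penalty in (0.0, 1.0, 3.0, 9.0):
    rate = deterrent_audit_rate(penalty)
    cost = annual_audit_cost(N_CLINICS, REPORTS_PER_YEAR, COST_PER_AUDIT, penalty)
    print(f"penalty {penalty:.0f}x reward: audit rate {rate:.0%}, "
          f"expected annual audit cost {cost:,.0f}")
```

In this stylized setting, harsher sanctions let a program audit less often and spend less. In practice the trade-off is bounded: clinics may be unable to pay large penalties, audits miss some misreporting, and, as in the Olken study, heavier scrutiny of audited indicators can push misreporting into channels the audits do not cover.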
There is a great need for applied theoretical work in these areas and, yes, more field experiments.