Public sector reforms often attempt to mimic the “discipline” of the market in order to spur better performance among service providers. Examples include numerous variants of performance-based financing for health, where health providers are compensated monetarily for achieving specified health targets. At the heart of this approach is a standard view of economic agents who can be induced to modify their behavior through pecuniary incentives. Yet incentives do not have to take the form of money, and reforms that attempt to modify behavior by appealing to an agent’s concern with reputation have also found some success.
A quasi-experimental evaluation of a hospital quality ratings system instituted among hospitals in the U.S. state of Wisconsin systematically varied the public release of assessment reports. Low-scoring hospitals in the publicized group made efforts to improve performance in time for the next round of assessment. However, hospitals with the same low scores in the non-publicized group did not address their shortcomings. The motivation for redress among low performers in the publicized group did not appear to be directly pecuniary, as consumer demand did not act as a disciplining device: poor reports didn’t affect patient choice because patients could not process the released information, at least as reported by the hospital managers. What motivated reform effort was management concern with hospital reputation.
(Of course, an agent may care about reputation because of pride, or because reputation can ultimately affect earnings and careers, and it is difficult to separate these two channels. Regardless of the mechanism, the wide dissemination of rating scores is an integral feature of a “ratings” reform.)
A related natural experiment from the U.K. reached the same conclusion as the Wisconsin hospital study, but with a twist.
The U.K. National Health Service adopted a universal benchmark for ambulance organizations: success was defined as responding to at least 75% of life-threatening emergency calls within 8 minutes. However, the public release of the ratings was left to the discretion of the different U.K. regions: in England the ambulance ratings were made available to the public starting in 2001, while in Wales the same rating system was revealed only to the assessed health organizations for use as a managerial tool.
Publicizing the English low performers was explicitly meant to motivate behavior through “naming and shaming”, while high-performing organizations were celebrated. And shame avoidance again appeared to work: for English ambulances, the timely response rate increased significantly from 1999 to 2003, with the percentage of priority emergency calls serviced within 8 minutes rising from 55% to 75%, while the rate for Welsh ambulances remained flat at 55%.
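To make the comparison explicit, the figures quoted above can be arranged as a simple difference-in-differences calculation. This is only a back-of-the-envelope sketch, not the method used in the study: it assumes the Welsh rate of 55% applies to both 1999 and 2003, and the names `rates`, `change` and `diff_in_diff` are illustrative.

```python
# Percent of priority emergency calls serviced within 8 minutes,
# using the figures quoted in the text. Treating the flat Welsh
# rate as 55% in both years is an assumption of this sketch.
rates = {
    "England": {"1999": 55.0, "2003": 75.0},  # ratings made public
    "Wales": {"1999": 55.0, "2003": 55.0},    # ratings kept internal
}


def change(region: str) -> float:
    """Before/after change in percentage points for one region."""
    return rates[region]["2003"] - rates[region]["1999"]


def diff_in_diff(treated: str, comparison: str) -> float:
    """Change in the treated region minus change in the comparison region."""
    return change(treated) - change(comparison)


print(diff_in_diff("England", "Wales"))  # 20.0
```

Because the Welsh rate did not move, the whole 20 percentage point rise in England is attributed to publication in this crude comparison, under the usual assumption that the two regions would otherwise have followed similar trends.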
But all is not rosy with these relatively simple interventions of informational tracking and release. For one, any reform needs to ensure that the selected ratings measure what matters. For example, school ratings are often based on test performance, but test results are not a complete measure of education, and ratings systems risk teachers focusing their efforts on “teaching to the test” at the expense of a broader education.
In the U.K. case, the metric of prompt ambulance response to emergency calls is a salient (although not complete) measure of effective service: for example, the heart attack survival rate is very sensitive to prompt treatment. However, there were anecdotes that ambulance services relocated closer to urban areas in order to respond more promptly to the majority of calls. While this type of response to the ratings system does increase the number of patients promptly attended, it also raises the issue of equity and illustrates the perhaps unanticipated consequences of such a reform.
Another key challenge is “gaming”: agents’ manipulation of information reporting for their own benefit.
After public disclosure, the distribution of English ambulance response times exhibited a sharp discontinuity at exactly 8 minutes, whereas before 2001 there was no such discontinuity. This suggests the deliberate reclassification of response times to just below the 8-minute threshold when the true response time actually exceeded it. Subsequent analysis indicates that one third of the reported performance gain is due to gaming and not actual improvements in service. This is a serious challenge to any performance rating system, whether or not it is explicitly tied to monetary incentives.
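The logic of spotting such a discontinuity can be illustrated with simulated data. The sketch below is not the analysis from the study: the response times are invented, the uniform spread of true times and the one-minute "near miss" window are assumptions, and `simulate_times` and `bunching_ratio` are hypothetical helper names. It simply compares how many reports fall just below the 8-minute target with how many fall just above it.

```python
import random

THRESHOLD = 8.0  # minutes, the response-time target from the text


def simulate_times(n, gaming_share, seed=0):
    """Draw hypothetical response times and relabel a share of near misses."""
    rng = random.Random(seed)
    reported = []
    for _ in range(n):
        t = rng.uniform(2.0, 14.0)  # assumed spread of true response times
        # A near miss (up to one minute over target) may be recorded
        # as just under the target instead of at its true value.
        if THRESHOLD < t <= THRESHOLD + 1.0 and rng.random() < gaming_share:
            t = THRESHOLD - rng.uniform(0.01, 0.25)
        reported.append(t)
    return reported


def bunching_ratio(times, width=0.5):
    """Reports just below the threshold divided by reports just above it."""
    below = sum(1 for t in times if THRESHOLD - width <= t < THRESHOLD)
    above = sum(1 for t in times if THRESHOLD <= t < THRESHOLD + width)
    return below / above


honest = simulate_times(20000, gaming_share=0.0)
gamed = simulate_times(20000, gaming_share=0.5)
print(round(bunching_ratio(honest), 2), round(bunching_ratio(gamed), 2))
```

With honest reporting the two bins hold roughly the same number of calls, so the ratio sits near 1. Once near misses are relabeled, reports pile up just under the target and thin out just above it, and the ratio rises well above 1. A jump of this kind in real data is suggestive rather than conclusive, since genuine operational changes could also shift the distribution.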
How can the rating system be made more “game-proof”? A highly entertaining paper by Bevan and Hood mentions one approach used widely in a system that relied overwhelmingly on targets to galvanize behavior: the Soviet example of “hanging the admirals”, i.e. liquidating the managers who were caught gaming their targets. Certainly, extreme sanctions may give pause to potential gamers.
But, more realistically for the settings we work in today, a robust auditing system that introduces unannounced, and hence uncertain, audit activities may reduce gaming behavior. Bevan and Hood give the example of traffic cameras that record speeding cars. Drivers may know the location of the cameras but not whether any particular camera is operating or the precise speed that trips the camera into action. If these parameters were known, then drivers would be able to “game” the system and drive right up to the trip speed in the presence of the camera (and speed elsewhere). Introducing uncertainty around such features as the timing of audits or the targets and performance measures assessed in an audit is likely to be key in efforts to reduce gaming.
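The intuition can be put in stylized expected-value terms. The sketch below assumes a risk-neutral agent who games whenever the expected payoff is positive. The gain, the penalty and the audit schedule are made-up numbers, and the function names are hypothetical. It is a toy illustration of why uncertainty matters, not a model taken from Bevan and Hood.

```python
def games_when_schedule_known(audited: bool) -> bool:
    """With a known schedule, the agent games exactly when no audit is due."""
    return not audited


def games_when_schedule_unknown(gain: float, penalty: float,
                                audit_prob: float) -> bool:
    """With unannounced audits, the agent games only if it pays on average."""
    return gain - audit_prob * penalty > 0


# Hypothetical setting: one audit every four periods, a gain of 1 from
# gaming in a period, and a penalty of 10 if caught.
schedule = [True, False, False, False]
gain, penalty = 1.0, 10.0
audit_prob = sum(schedule) / len(schedule)  # 0.25

known = sum(games_when_schedule_known(a) for a in schedule)
unknown = sum(
    games_when_schedule_unknown(gain, penalty, audit_prob) for _ in schedule
)
print(known, unknown)  # 3 0
```

The audit effort is identical in both cases, at one period in four. When the timing is announced, the agent simply games in the three unaudited periods. When it is not, every period carries an expected penalty of 2.5 against a gain of 1, so gaming never pays in this example.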