I just got back from the annual meetings of the American Economic Association (AEAs) in Boston. It’s been a couple of years since I last went, and after usually going to just development conferences, it was interesting to see some of the work going on in other fields. Here are a few notes:
Present Bias after 20 years: I went to a great session on this topic, which was looking back on 20 years since David Laibson’s dissertation work on present bias. The session included Ted O’Donoghue and Matthew Rabin on “lessons learned and to be learned”, David Laibson on “why there isn’t more demand for commitment” and Charles Sprenger on “judging experimental evidence on dynamic inconsistency”. None of the papers appear to be online yet, so something to look forward to in due course. But a few key take-aways:
- The problems with measuring time inconsistency using money questions: it has been relatively standard to assess time inconsistency by asking someone to choose between an amount today and an amount in 1 month, and then between an amount in 5 months and an amount in 6 months, and to see if the implied discount rates differ. However, as O’Donoghue noted, present bias should operate on utility, not money. Individuals should therefore maximize wealth, and present bias should then determine how they allocate that wealth as consumption over time. So if they can borrow and save, they should arbitrage away any difference between market interest rates and the rates at which you offer them money today vs. in the future. Money discounting therefore has several problems: it requires either that individuals cannot borrow or lend, or that they do not think to do so; it depends on what the outside consumption choices are; and it suffers from the well-known confounds of transaction costs and payment reliability. So there appears to be a move away from these questions. But this raises two issues: 1) these questions do seem to provide reliable indicators of behavior in some settings, so we need to know which settings; and 2) there is no generally accepted alternative – people have been experimenting with non-monetary choice problems, where behavior over time or other outcomes can be used instead.
- Why is there not more demand for commitment? David Laibson presented work calibrating a model of choice over commitment technologies. He showed that there is a large range of parameters in which a sophisticated present-biased decision-maker will choose commitment products when these are costless. But as soon as you add either partial naiveté about one’s time preferences, or some relatively small costs of entering into commitment contracts, the demand for commitment products almost entirely disappears. This is consistent with the fact that few firms offer commitment contracts, and that the main examples we have are products introduced by researchers. But as Nava Ashraf, one of the discussants, noted, some of these products in developing countries have had very large impacts – suggesting more people should be demanding commitment devices.
- Heterogeneity in discount rates is very hard to disentangle from heterogeneity in other parameters
- Hard and soft commitments – one of the frontier issues is thinking about a continuum of commitment, and about a sweet spot where individuals can commit to some extent but still retain some flexibility to back out if they need to. Nava pointed to recent work by Karlan and Linden as an example, where ear-marking beat strong commitment.
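The arbitrage point in the first bullet above can be made concrete with a small sketch. The numbers below (a 1% monthly interest rate, a $100 vs. $102 trade-off) are my own illustrative assumptions, not figures from the session: an agent who can borrow and save at the market rate should simply pick whichever payment has the higher present value, so the same one-month gap produces the same choice whether it starts today or five months from now – no inconsistency shows up in money choices even for a present-biased consumer.

```python
def present_value(amount, months_away, r):
    """Discount a future payment back to today at monthly market rate r."""
    return amount / (1 + r) ** months_away

def money_choice(now_option, later_option, r):
    """A wealth-maximizing agent picks the payment with the higher present
    value. Each option is an (amount, months_away) pair."""
    pv_sooner = present_value(*now_option, r)
    pv_later = present_value(*later_option, r)
    return "sooner" if pv_sooner >= pv_later else "later"

r = 0.01  # assumed 1% monthly borrowing/saving rate

near = money_choice((100, 0), (102, 1), r)  # $100 today vs $102 in 1 month
far = money_choice((100, 5), (102, 6), r)   # the same gap, pushed 5 months out

print(near, far)
```

Because both present values in the far comparison are scaled by the same factor (1 + r)^5, the choices are identical at either horizon: the experiment recovers the market rate r, not any present-bias parameter over consumption.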
Should we be using Units of Standard Deviation to Compare Effect Sizes Across Studies? This came up in the two discussions I gave, as well as in thinking about my own presentation (new work on measuring business practices in small firms that I’ll blog about when a full paper is available). This should be particularly familiar to readers working on health and education - it is very common to hear that “intervention X led to a 0.2 S.D. increase in test scores”. But the more I think about it, the more I think this is not a good measure for comparing across studies, or for power calculations:
- Comparing across studies: I discussed Eva Vivalt’s paper on external validity (which she blogged about previously on our blog). One of her findings is that interventions run by NGOs/academics have larger effect sizes than those run by governments. But consider the following example, where both run the same intervention to try to improve test scores in India. The NGO works with a very homogeneous group (control mean score 50%, std dev of 5%) and increases test scores by 1 percentage point, which is a 0.2 S.D. improvement. The Government works with a much more diverse set of kids, with the same control mean (50%) but a std dev of 20%, and its program increases test scores by 2 percentage points. Despite this being twice as large as the NGO effect, when converted into units of S.D. it is only half the size (0.1 S.D.). That is, comparing effect sizes in terms of units of standard deviations artificially inflates the effectiveness of interventions done on more homogeneous groups, all else equal. But as I found when trying to compare the estimates in my study to those in other work, other ways of scaling magnitudes across studies raise concerns of their own.
- Power calculations: I discussed the GiveDirectly evaluation by Haushofer and Shapiro. They noted they had powered their study to detect a 0.2 S.D. impact. But this got me thinking about why we should care about S.D. units when thinking about impacts. In particular, consider the impact on business revenue, which is quite heterogeneous (the std dev is about twice the mean in the control group). A 0.2 S.D. increase is then approximately a 37% increase in business revenue. If they had screened the sample to make it more homogeneous, then 0.2 S.D. might be only a 20% or even a 10% revenue increase. It seems to me much more natural to think in terms of the return on investment, or the percentage or absolute increase we would like to see, than in units of S.D.
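The arithmetic behind the NGO/Government example above is just a division, which makes the mechanical inflation easy to see (the numbers are taken from the example above):

```python
def effect_in_sd_units(effect_pp, control_sd_pp):
    """Convert a percentage-point effect into units of the control-group S.D."""
    return effect_pp / control_sd_pp

# NGO: homogeneous group (control sd = 5 pp), raises scores by 1 pp
ngo = effect_in_sd_units(1, 5)    # -> 0.2 S.D.

# Government: diverse group (control sd = 20 pp), raises scores by 2 pp
gov = effect_in_sd_units(2, 20)   # -> 0.1 S.D.

# The Government effect is twice as large in percentage points,
# but half as large in S.D. units.
```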
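Conversely, the same S.D. target maps into very different economic magnitudes depending on sample heterogeneity. A minimal sketch with made-up round numbers (a control mean of 100 with a std dev of 200, versus a screened sample with a std dev of 50 - not the actual Haushofer and Shapiro statistics):

```python
def sd_effect_as_pct_of_mean(effect_sd, control_mean, control_sd):
    """Express an effect stated in S.D. units as a percent of the control mean."""
    return 100 * effect_sd * control_sd / control_mean

# Heterogeneous sample: sd is twice the mean, so 0.2 S.D. is a large change
hetero = sd_effect_as_pct_of_mean(0.2, control_mean=100, control_sd=200)  # 40.0%

# Screened, more homogeneous sample: the same 0.2 S.D. is a much smaller change
screened = sd_effect_as_pct_of_mean(0.2, control_mean=100, control_sd=50)  # 10.0%
```

Powering a study "to detect 0.2 S.D." therefore pins down very different revenue increases depending on who is in the sample, which is the argument for stating targets in percentage or absolute terms instead.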