"In summary, the similarities between follow-up studies with and without baseline randomization are becoming increasingly apparent as more randomized trials study the effects of sustained interventions over long periods in real world settings. What started as a randomized trial may effectively become an observational study that requires analyses that complement, but go beyond, intention-to-treat analyses. A key obstacle in the adoption of these complementary methods is a widespread reluctance to accept that overcoming the limitations of intention-to-treat analyses necessitates untestable assumptions. Embracing these more sophisticated analyses will require a new framework for both the design and conduct of randomized trials."
- Well-known blog skeptic Jishnu Das continues to blog at Future Development, arguing that higher wages will not lead to better quality or more effective teachers in many developing countries – summarizing evidence from several countries that i) doubling teacher wages had no impact on performance; ii) temporary teachers paid less than permanent teachers do just as well; and iii) observed teacher characteristics explain little of the differences in teacher effectiveness.
- Are we now all doomed never to find significance? In a paper in Nature Human Behaviour, a multi-disciplinary list of 72 authors (including economists Colin Camerer, Ernst Fehr, Guido Imbens, David Laibson, John List and Jon Zinman) argues for redefining the threshold for statistical significance for the discovery of new effects from 0.05 to 0.005. They suggest results with p-values between 0.005 and 0.05 now be described as “suggestive”. They claim that for a wide range of statistical tests, this would require an increase in sample size of around 70%, but would of course reduce the incidence of false positives. Playing around with power calculations, it seems that studies that are powered at 80% for an alpha of 0.05 have about 50% power for an alpha of 0.005. It implies using a 2.81 t-stat cutoff instead of 1.96 (for a two-sided test). Then of course if you want to further adjust for multiple hypothesis testing…
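For readers who want to play around with these power calculations themselves, here is a minimal sketch. It assumes a two-sided z-test with a normal approximation (the paper considers a wider range of tests), and the function names are my own for illustration. Under those assumptions it reproduces the figures above: a cutoff of about 2.81, power of about 50%, and a sample size increase of about 70%.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def critical_value(alpha):
    """Two-sided critical value of a z-test at significance level alpha."""
    return Z.inv_cdf(1 - alpha / 2)


def power(effect_in_se, alpha):
    """Approximate power of a two-sided z-test when the true effect equals
    effect_in_se standard errors (the negligible opposite tail is ignored)."""
    return 1 - Z.cdf(critical_value(alpha) - effect_in_se)


def sample_size_ratio(alpha_new, alpha_old=0.05, target_power=0.80):
    """Factor by which the sample size must grow to keep the same power
    when moving from alpha_old to alpha_new (assumption: the standard
    error shrinks with the square root of the sample size)."""
    z_power = Z.inv_cdf(target_power)
    old = critical_value(alpha_old) + z_power
    new = critical_value(alpha_new) + z_power
    return (new / old) ** 2


# Effect size, in standard-error units, that yields 80% power at alpha = 0.05
effect = critical_value(0.05) + Z.inv_cdf(0.80)

cutoff_new = critical_value(0.005)
power_new = power(effect, 0.005)
ratio = sample_size_ratio(0.005)

print(f"cutoff at alpha=0.005: {cutoff_new:.2f}")
print(f"power at alpha=0.005 for a study powered at 80%: {power_new:.0%}")
print(f"required sample size increase: {ratio - 1:.0%}")
```

Swapping in a different `target_power` or `alpha_new` shows how quickly the required sample grows as the threshold tightens.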
The rigorous evidence on vocational training programs is, at best, mixed. For example, Markus recently blogged about some work looking at long term impacts of job training in the Dominican Republic. In that paper, the authors find no impact on overall employment, but they do find a change in the quality of employment, with more folks having jobs with health insurance (for example).
- Martin Kanz summarizes his new paper on understanding the demand for status good consumption based on credit card experiments in Indonesia on Let’s Talk Development – including discussion of an intervention that temporarily boosts self-esteem, and showing that this lowers the demand for status goods.
- Nature news on how brain imaging technology is being used to measure how poverty affects brain development of infants in Bangladesh – differences in grey matter already seen at 2-3 months of age!
- Want to check out what’s going on across many fields in economics? The program and papers from the NBER Summer Institute are a great place to see what’s new.
- Sure, that intervention delivered great results in a well-managed pilot. But it doesn’t tell us anything about whether it would work at a larger scale.
- Does this result really surprise you? (With both positive results and null results, I often hear, Didn’t we already know that intuitively?)
A recent paper – “Cognitive science in the field: A preschool intervention durably enhances intuitive but not formal mathematics” – by Dillon et al., provides answers to both of these, as well as giving new insights into the design of effective early child education.
This is a follow-up to my earlier blog on list experiments for sensitive questions, which, thanks to our readers, generated many responses via the comments section and emails: more reading for me – yay! More recently, my colleague Julian Jamison, who is also interested in the topic, sent me three recent papers that I had not been aware of. This short post discusses those papers and serves as a coda to the earlier post…
Random response techniques (RRT) are used to provide more valid data than direct questioning (DQ) when it comes to sensitive questions, such as corruption, sexual behavior, etc. Using some randomization technique, such as dice, they introduce noise into the respondent’s answer, in the process concealing her answer to the sensitive question while still allowing the researcher to estimate an overall prevalence of the behavior in question. These are attractive in principle, but, in practice, as we have been trying to implement them in field work recently, one worries about implementation details and the cognitive burden on the respondents: in real life, it’s not clear that they provide an advantage to warrant use over and above DQ.
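To make the mechanics concrete, here is a minimal sketch of one common variant, the forced-response design. The specific rules are an assumption for illustration and are not taken from the papers discussed: the respondent privately rolls a die, answers “yes” on a 1, “no” on a 6, and truthfully otherwise. The share of “yes” answers then equals 1/6 + (4/6) × prevalence, which the researcher can invert to recover prevalence without learning any individual’s true answer. All function names are hypothetical, and the data are simulated.

```python
import random

# Assumed design: die shows 1 -> forced "yes", 6 -> forced "no", 2-5 -> truth
P_FORCED_YES = 1 / 6
P_FORCED_NO = 1 / 6
P_TRUTH = 1 - P_FORCED_YES - P_FORCED_NO


def simulate_responses(true_prevalence, n, rng):
    """Simulate n respondents answering under the forced-response rules."""
    responses = []
    for _ in range(n):
        roll = rng.randint(1, 6)
        if roll == 1:
            responses.append(True)       # forced "yes"
        elif roll == 6:
            responses.append(False)      # forced "no"
        else:
            responses.append(rng.random() < true_prevalence)  # truthful answer
    return responses


def estimate_prevalence(responses):
    """Invert P(yes) = P_FORCED_YES + P_TRUTH * prevalence."""
    share_yes = sum(responses) / len(responses)
    return (share_yes - P_FORCED_YES) / P_TRUTH


def standard_error(responses):
    """Standard error of the estimate; the added noise inflates it by 1/P_TRUTH."""
    n = len(responses)
    share_yes = sum(responses) / n
    return (share_yes * (1 - share_yes) / n) ** 0.5 / P_TRUTH


rng = random.Random(0)  # fixed seed so the simulation is repeatable
responses = simulate_responses(true_prevalence=0.20, n=20000, rng=rng)
estimate = estimate_prevalence(responses)
print(f"estimated prevalence: {estimate:.3f} (SE {standard_error(responses):.3f})")
```

The division by `P_TRUTH` in the standard error is the statistical price of the added privacy: for the same sample size, the estimate is noisier than under direct questioning, which is part of why the practical advantage over DQ is not guaranteed.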
The June 2017 issue of the Economic Journal has a paper entitled “Assignment procedure biases in randomized policy experiments” (ungated version). The abstract summarizes the claim of the paper:
“We analyse theoretically encouragement and resentful demoralisation in RCTs and show that these might be rooted in the same behavioural trait –people’s propensity to act reciprocally. When people are motivated by reciprocity, the choice of assignment procedure influences the RCTs’ findings. We show that even credible and explicit randomisation procedures do not guarantee an unbiased prediction of the impact of policy interventions; however, they minimise any bias relative to other less transparent assignment procedures.”
Of particular interest to our readers might be the conclusion “Finally, we have shown that the assignment procedure bias is minimised by public randomisation. If possible, public lotteries should be used to allocated subjects into the two groups”
Given this recommendation, I thought it worth discussing how they get to this conclusion, and whether I agree that public randomization will minimize such bias.
- In this week’s Science, Rema Hanna, Gabriel Kreindler, and Ben Olken look at what happened when Jakarta abruptly ended HOV rules – showing how traffic got worse for everyone. Nice example of using Google traffic data – MIT news has a summary and discussion of how the research took place: “The key thing we did is to start collecting traffic data immediately,” Hanna explains. “Within 48 hours of the policy announcement, we were regularly having our computers check Google Maps every 10 minutes to check current traffic speeds on several roads in Jakarta. ... By starting so quickly we were able to capture real-time traffic conditions while the HOV policy was still in effect. We then compared the changes in traffic before and after the policy change.” “All told, the impact of changing the HOV policy was highly significant. After the HOV policy was abandoned, the average speed of Jakarta’s rush hour traffic declined from about 17 to 12 miles per hour in the mornings, and from about 13 to 7 miles per hour in the evenings.”
- From NPR’s Goats and Soda: 4-year-old kids of Cameroonian subsistence farmers take the marshmallow test, as do German kids – who do you think did best?