As a change from my usual posts, I thought I’d note five small things I’ve learned recently, mostly to do with Stata, with the hope that they might help others, or at least jog my memory when I unlearn them again soon.
1.Stata’s random number generator has a limit on the seed that you can set of 2,147,483,647.
Why did I learn this? We were doing a live random assignment for an impact evaluation I am starting in Colombia. We had programmed up the code, and tested it several times, with it working fine. In our test code, we had set the seed for random number generation as the date “04112018”. Then when my collaborator went to run this live, it was decided to also add the time of the drawing at the end, so that the seed was set as “041120180304”. This generated an error, and prevented the code from running. Luckily we could quickly fix it, and the live draw proceeded ok. But lesson learned, 2^31-1 is a large number, but sometimes binds.
From the DIME Analytics Weekly newsletter (which I recommend subscribing to): applyCodebook – One of the biggest time-wasters for research assistants is typing "rename", "recode", "label var", and so on to get a dataset in shape. Even worse is reading through it all later and figuring out what's been done. Freshly released on the World Bank Stata GitHub thanks to the DIME Analytics team is applyCodebook, a utility that reads an .xlsx "codebook" file and applies all the renames, recodes, variable labels, and value labels you need in one go. It takes one line in Stata to use, and all the edits are reviewable variable-by-variable in Excel. If you haven't visited the GitHub repo before, don't forget to browse all the utilities on offer and feel free to fork and submit your own on the dev branch. Happy coding!
Is it possible to speed up a justice system? On the Let's Talk Development blog, Kondylis and Corthay document a reform in Senegal that gave judges tools to speed up decisions, to positive effect. The evaluation then led to further legal reform.
"Reviewing thousands of evaluation studies over the years has also given us a profound appreciation of how challenging it is to find interventions...that produce a real improvement in people’s lives." Over at Straight Talk on Evidence, the team highlights the challenge of finding impacts at scale, nodding to Rossi's iron law of evaluation ("The expected value of any net impact assessment of any large scale social program is zero") and the "stainless steel law of evaluation" ("the more technically rigorous the net impact assessment, the more likely are its results to be zero – or no effect"). They give evidence across fields – business, medicine, education, and training. They offer a proposed solution in another post, and Chris Blattman offers a critique in a Twitter thread.
Kate Cronin-Furman and Milli Lake discuss ethical issues in doing fieldwork in fragile and violent conflicts.
"What’s the latest research on the quality of governance?" Dan Rogger gives a quick round-up of research presented at a recent conference at Stanford University.
In public procurement, lower transaction costs aren't always better. Over at VoxDev, Ferenc Szucs writes about what procurement records in Hungary teach about open auctions versus discretion. In short, discretion means lower transaction costs, more corruption, higher prices, and inefficient allocation.
Justin Sandefur seeks to give a non-technical explanation of the recent discussion of longer term benefits of cash transfers in Kenya (1. Cash transfers cure poverty. 2. Side effects vary. 3. Symptoms may return when treatment stops.) This is at least partially in response to Berk Özler's dual posts, here and here. Özler adds some additional discussion in this Twitter thread.
Randomization inference has been increasingly recommended as a way of analyzing data from randomized experiments, especially in samples with a small number of observations, with clustered randomization, or with high leverage (see for example Alwyn Young’s paper, and the books by Imbens and Rubin, and Gerber and Green). However, one of the barriers to widespread usage in development economics has been that, to date, no simple commands for implementing this in Stata have been available, requiring authors to program from scratch.
This has now changed with a new command ritest written by Simon Hess, a PhD student who I met just over a week ago at Goethe University in Frankfurt. This command is extremely simple to use, so I thought I would introduce it and share some tips after playing around with it a little. The Stata journal article is also now out.
How do I get this command?
Simply type findit ritest in Stata.
[edit: that will get the version from the Stata journal. However, to get the most recent version with a couple of bug fixes noted below, type
net describe ritest, from(https://raw.githubusercontent.com/simonheb/ritest/master/)
- Marc Bellemare on the subject of my dissertation work – using repeated cross-sections
- From Next Billion, a summary of research showing how saving leads people to generate more income by working harder
- In the Guardian, how the World Bank is nudging health and hygiene in several projects…and the defense against whether this distracts from more structural issues “Why not make all programmes as effective as possible, even if it doesn’t turn a very poor country into a Scandinavian country overnight”
- Also from the Guardian, 10 sources of data for international development research
- randtreat – a new Stata command to do random assignment that can deal with uneven numbers of observations (more details here) – this builds on an old blog post I did on the issue, and great to see some of these practical issues getting made easier for everyone.
- synth_runner – the IDB’s Development that Works blog has a post about a new Stata command to help automate use of the synthetic control method.
A common question of interest in evaluations is “which groups does the treatment work for best?” A standard way to address this is to look at heterogeneity in treatment effects with respect to baseline characteristics. However, there are often many such possible baseline characteristics to look at, and really the heterogeneity of interest may be with respect to outcomes in the absence of treatment. Consider two examples:
A: A vocational training program for the unemployed: we might want to know if the treatment helps more those who were likely to stay unemployed in the absence of an intervention compared to those who would have been likely to find a job anyway.
B: Smaller class sizes: we might want to know if the treatment helps more those students whose test scores would have been low in the absence of smaller classes, compared to those students who were likely to get high test scores anyway.
- Stata commands