
Stata commands

Five small things I’ve learned recently

By David McKenzie

As a change from my usual posts, I thought I’d note five small things I’ve learned recently, mostly to do with Stata, with the hope that they might help others, or at least jog my memory when I unlearn them again soon.

1. Stata’s random number generator accepts seeds no larger than 2,147,483,647 (that is, 2^31-1).
Why did I learn this? We were doing a live random assignment for an impact evaluation I am starting in Colombia. We had programmed the code and tested it several times, and it worked fine. In our test code, we had set the seed for random number generation to the date “04112018”. When my collaborator went to run it live, it was decided to also append the time of the drawing, so that the seed became “041120180304”. At roughly 41 billion, that number is far above the limit, so Stata returned an error and the code would not run. Luckily we fixed it quickly, and the live draw proceeded smoothly. Lesson learned: 2^31-1 is a large number, but it sometimes binds.
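The arithmetic behind that error is easy to check before a live draw. The date-only seed 04112018 is 4,112,018, well under the cap, while 041120180304 is 41,120,180,304, about 19 times the cap. The sketch below is written in Python purely to illustrate a pre-flight check. It is not Stata code. The function name `check_seed` is hypothetical, and the lower bound of 0 is an assumption.

```python
# Upper limit on seeds for Stata's `set seed`, as described above: 2^31 - 1.
STATA_MAX_SEED = 2**31 - 1
# Assumed lower bound (not stated in the post).
STATA_MIN_SEED = 0


def check_seed(seed_text: str) -> int:
    """Hypothetical helper: parse a seed string such as a date and raise
    ValueError if it falls outside the assumed valid range."""
    seed = int(seed_text)  # leading zeros are dropped: "04112018" -> 4112018
    if not STATA_MIN_SEED <= seed <= STATA_MAX_SEED:
        raise ValueError(
            f"seed {seed:,} exceeds the maximum of {STATA_MAX_SEED:,}"
        )
    return seed


print(check_seed("04112018"))  # the date-only test seed passes
try:
    check_seed("041120180304")  # date plus time of the drawing
except ValueError as err:
    print(err)
```

Running a check like this on the exact string you plan to use live would have caught the problem before the drawing.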

Weekly links April 20: Swifter justice, swifter coding, better ethics, cash transfers, and more

By David Evans
  • From the DIME Analytics Weekly newsletter (which I recommend subscribing to): applyCodebook – One of the biggest time-wasters for research assistants is typing "rename", "recode", "label var", and so on to get a dataset in shape. Even worse is reading through it all later and figuring out what's been done. Freshly released on the World Bank Stata GitHub thanks to the DIME Analytics team is applyCodebook, a utility that reads an .xlsx "codebook" file and applies all the renames, recodes, variable labels, and value labels you need in one go. It takes one line in Stata to use, and all the edits are reviewable variable-by-variable in Excel. If you haven't visited the GitHub repo before, don't forget to browse all the utilities on offer and feel free to fork and submit your own on the dev branch. Happy coding! 

  • Is it possible to speed up a justice system? On the Let's Talk Development blog, Kondylis and Corthay document a reform in Senegal that gave judges tools to speed up decisions, to positive effect. The evaluation then led to further legal reform.  

  • "Reviewing thousands of evaluation studies over the years has also given us a profound appreciation of how challenging it is to find interventions...that produce a real improvement in people’s lives." Over at Straight Talk on Evidence, the team highlights the challenge of finding impacts at scale, nodding to Rossi's iron law of evaluation ("The expected value of any net impact assessment of any large scale social program is zero") and the "stainless steel law of evaluation" ("the more technically rigorous the net impact assessment, the more likely are its results to be zero – or no effect"). They give evidence across fields – business, medicine, education, and training. They offer a proposed solution in another post, and Chris Blattman offers a critique in a Twitter thread.  

  • Kate Cronin-Furman and Milli Lake discuss ethical issues in conducting fieldwork in fragile and violent settings.

  • "What’s the latest research on the quality of governance?" Dan Rogger gives a quick round-up of research presented at a recent conference at Stanford University.  

  • In public procurement, lower transaction costs aren't always better. Over at VoxDev, Ferenc Szucs writes about what procurement records in Hungary teach about open auctions versus discretion. In short, discretion means lower transaction costs, more corruption, higher prices, and inefficient allocation. 

  • Justin Sandefur seeks to give a non-technical explanation of the recent discussion of longer term benefits of cash transfers in Kenya (1. Cash transfers cure poverty. 2. Side effects vary. 3. Symptoms may return when treatment stops.) This is at least partially in response to Berk Özler's dual posts, here and here. Özler adds some additional discussion in this Twitter thread.  
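The applyCodebook item describes a codebook-driven approach to cleaning: put every rename, recode and label in one reviewable table, then apply them all in a single pass. The Python sketch below illustrates that idea only. It is not applyCodebook, which is a Stata utility that reads an .xlsx file. The codebook columns (`name`, `new_name`, `label`, `recode`) and the function `apply_codebook` are hypothetical, and plain dictionaries stand in for the spreadsheet and the dataset.

```python
# Hypothetical codebook rows standing in for rows of an .xlsx sheet.
# These column names are illustrative, not applyCodebook's actual format.
CODEBOOK = [
    {"name": "q1", "new_name": "age", "label": "Age in years", "recode": {}},
    {"name": "q2", "new_name": "female", "label": "Respondent is female",
     "recode": {"F": 1, "M": 0}},
]


def apply_codebook(records, codebook):
    """Rename and recode each variable listed in the codebook, and collect
    the variable labels keyed by the new variable names."""
    cleaned = []
    for record in records:
        out = dict(record)  # leave the input records untouched
        for row in codebook:
            value = out.pop(row["name"])  # drop the old name
            # Recode when a mapping exists; otherwise keep the value as-is.
            out[row["new_name"]] = row["recode"].get(value, value)
        cleaned.append(out)
    labels = {row["new_name"]: row["label"] for row in codebook}
    return cleaned, labels


data = [{"q1": 34, "q2": "F"}, {"q1": 51, "q2": "M"}]
cleaned, labels = apply_codebook(data, CODEBOOK)
print(cleaned)
print(labels)
```

Because every edit lives in the codebook rather than being scattered through a do-file, reviewing the cleaning means reading one table, row by row.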

IE analytics: introducing ietoolkit

By Luiza Andrade
Scientific advances are the result of a long, cumulative process of building knowledge and methodologies (or, as the cliché goes, “standing on the shoulders of giants”). One often overlooked but crucial part of this climb is a long tradition of standardization of everything from mathematical notation and scientific terminology to the formats of academic articles and references.

Finally, a way to do easy randomization inference in Stata!

By David McKenzie

Randomization inference has been increasingly recommended as a way of analyzing data from randomized experiments, especially in samples with a small number of observations, with clustered randomization, or with high leverage (see for example Alwyn Young’s paper, and the books by Imbens and Rubin, and Gerber and Green). However, one of the barriers to widespread usage in development economics has been that, to date, no simple commands for implementing this in Stata have been available, requiring authors to program from scratch.

This has now changed with a new command, ritest, written by Simon Hess, a PhD student whom I met just over a week ago at Goethe University in Frankfurt. The command is extremely simple to use, so I thought I would introduce it and share some tips after playing around with it a little. The Stata Journal article is also now out.
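For readers new to the idea, randomization inference asks how extreme the observed treatment effect is compared with the effects you would get from re-running the random assignment many times under the sharp null of no effect for any unit. The Python sketch below is not ritest and does not reproduce its options. It assumes complete randomization with a fixed number of treated units and uses a simple difference in means as the test statistic. The function names and the default values of `reps` and `seed` are hypothetical.

```python
import random
from statistics import mean


def diff_in_means(y, d):
    """Mean outcome of treated units (d == 1) minus mean of controls (d == 0)."""
    treated = [yi for yi, di in zip(y, d) if di == 1]
    control = [yi for yi, di in zip(y, d) if di == 0]
    return mean(treated) - mean(control)


def ri_pvalue(y, d, reps=5000, seed=12345):
    """Two-sided randomization-inference p-value under the sharp null.

    Each replication shuffles the treatment vector, which keeps the number
    of treated units fixed (complete randomization, an assumption here).
    Returns the observed statistic and the share of placebo assignments
    whose statistic is at least as large in absolute value.
    """
    rng = random.Random(seed)  # fixed seed so the p-value is reproducible
    observed = diff_in_means(y, d)
    assignment = list(d)  # copy so the caller's vector is not reordered
    extreme = 0
    for _ in range(reps):
        rng.shuffle(assignment)  # one placebo re-randomization
        if abs(diff_in_means(y, assignment)) >= abs(observed):
            extreme += 1
    return observed, extreme / reps


# Toy data: eight units, four treated.
y = [3.1, 2.4, 5.6, 4.8, 1.9, 6.2, 2.2, 5.0]
d = [0, 0, 1, 1, 0, 1, 0, 1]
effect, p = ri_pvalue(y, d)
print(f"effect = {effect:.2f}, RI p-value = {p:.3f}")
```

With clustered or stratified designs, the re-randomization step would have to mimic the actual assignment procedure, for example by permuting at the cluster level or within strata. Handling that bookkeeping for you is the main convenience a dedicated command offers.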

How do I get this command?
Simply type findit ritest in Stata.
[Edit: that will get the version from the Stata Journal. However, to get the most recent version, with a couple of bug fixes noted below, type

net describe ritest, from(

Weekly links March 25: nudges, helpful Stata commands, saving more and earning more, and more…

By David McKenzie

Endogenous stratification: the surprisingly easy way to bias your heterogeneous treatment effect results and what you should do instead

By David McKenzie

A common question of interest in evaluations is “which groups does the treatment work for best?” A standard way to address this is to look at heterogeneity in treatment effects with respect to baseline characteristics. However, there are often many such possible baseline characteristics to look at, and really the heterogeneity of interest may be with respect to outcomes in the absence of treatment. Consider two examples:
A: A vocational training program for the unemployed: we might want to know whether the treatment helps those who were likely to stay unemployed without the intervention more than those who would likely have found a job anyway.
B: Smaller class sizes: we might want to know whether the treatment helps students whose test scores would have been low without smaller classes more than students who were likely to get high test scores anyway.
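Both examples require a predicted outcome in the absence of treatment, which is usually fitted on the control group. The trap the title warns about comes from fitting that model on all controls and then predicting for those same controls. Each control unit's own outcome, noise included, pulls its prediction toward itself, so the grouping is partly driven by the outcome being analyzed. One remedy discussed in the literature on this problem is leave-one-out prediction. It is used below as an assumption, not as this post's prescription. The Python sketch fits on all controls for treated units and refits without each control unit when predicting for that unit. The single baseline covariate `x` and the function names are hypothetical simplifications.

```python
from statistics import mean


def ols_fit(x, y):
    """Intercept and slope of a one-covariate least-squares fit."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def predicted_untreated_outcome(x, y, d):
    """Predict each unit's outcome absent treatment from the control group.

    Treated units (d == 1) get predictions from a fit on all controls.
    Each control unit (d == 0) gets a leave-one-out prediction from a fit
    that excludes it, so its own outcome cannot affect its own prediction.
    Requires at least three controls with varying x.
    """
    controls = [i for i, di in enumerate(d) if di == 0]
    a, b = ols_fit([x[i] for i in controls], [y[i] for i in controls])
    preds = []
    for i, di in enumerate(d):
        if di == 1:
            preds.append(a + b * x[i])
        else:
            others = [j for j in controls if j != i]
            a_i, b_i = ols_fit([x[j] for j in others], [y[j] for j in others])
            preds.append(a_i + b_i * x[i])
    return preds


# Toy data: baseline covariate x, follow-up outcome y, treatment d.
x = [1, 2, 3, 4, 5, 6]
y = [2.0, 2.9, 4.1, 7.5, 9.0, 11.2]
d = [0, 0, 0, 1, 1, 0]
print(predicted_untreated_outcome(x, y, d))
```

The predictions can then be used to split the sample, for example at the median, and to compare treatment effects across the predicted-low and predicted-high groups.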

How to overcome the (almost insurmountable) task of tracking poverty trends without good consumption data?

By Hai-Anh H. Dang
Just imagine a scenario where your counterpart, the Minister of Economic Development in country X, is about to present the latest poverty trends to Congress. This is for a hearing on the country’s next 5-year (or 10-year) economic development plan. As a development practitioner, you are tasked with supporting the Minister with the technical analysis, despite a notorious challenge: the most recent round of household survey data is not comparable to earlier rounds because of various changes in survey design.

Generating Regression and Summary Statistics Tables in Stata: A checklist and code

By Matthew Groh
As a research assistant working for David, I’ve had to create many, many regression and summary statistics tables. Just the other day, I sent David a draft of some tables for a paper that we are working on. After re-reading the draft, I realized that I had forgotten to label dependent variables and add joint significance tests in a couple of regression tables. To avoid forgetting these details in the future, and hopefully to help other researchers, I thought I’d post a checklist for generating regression and summary statistics tables.