
January 2019

Successful Teachers, Successful Students: A New Approach Paper on Teachers

By David Evans

Teachers are crucial to the learning process. Every year, we get new evidence from a new country on how much value an effective teacher adds. This is one area where the evidence lines up with intuition: Even without a bunch of value-added measures, most of us would readily admit that without good teachers, we wouldn’t be where we are today.

We’ve both done some research on teachers – Tara with her work on managing the teacher workforce in India, Dave with his work on teacher professional development, and both of us contributing to the World Bank’s World Development Report 2018 on learning. Over the last several months, we reviewed the latest evidence on how to attract the best candidates into the teaching profession and then how to prepare, select, support, and motivate them. The result of that review is the new World Bank policy approach to teachers: Successful Teachers, Successful Students: Recruiting and Supporting Society’s Most Crucial Profession. We know this is a crowded field: There are lots of reports on how to help teachers be as effective as they can be. Our objective was to make the most recent evidence accessible, drawing on dozens of studies released in 2017 and 2018 as well as much of the accumulated work up to that point.

Sex, Lies, and Measurement: Do Indirect Response Survey Methods Work? (No…)

By Berk Ozler

Smart people, mainly with good reason, like to make statements like “Measure what is important, don’t make important what you can measure,” or “Measure what we treasure and not treasure what we measure.” It is rumored that even Einstein weighed in on this by saying: “Not everything that can be counted counts and not everything that counts can be counted.” A variant of this has also become a rallying cry among the “anti-randomistas,” who agitate against focusing research only on questions that can be answered experimentally.

However, I am confident that all researchers can generally agree that there is not much worse than the helpless feeling of not being able to vouch for the veracity of what you measured. We can deal with papers reporting null results, we can deal with messy or confusing stories, but what gives no satisfaction to anyone is to present some findings and then have to say: “This could all be wrong, because we’re not sure the respondents in our surveys are telling the truth.” This does not mean that research on sensitive topics does not get done, but, like the proverbial sausage, it is often necessary to block out where the data came from and how they were made.

Weekly links January 25: Doing SMS surveys, a Deaton classic re-released, upcoming conferences, coding tips, and more...

By David McKenzie
  • Recommendations for conducting SMS surveys from the Busara Center, which “sent a one-time mobile SMS survey to 3,489 Kenyans familiar with SMS surveys and to 6,279 not familiar. Each sample was randomized into one of 54 cross-cutting treatment combinations with variation across several dimensions: incentive amounts, pre-survey communication, survey lengths, and content variation.” The recommendations include keeping mobile surveys to 5 questions or providing higher incentives; randomizing questions and response options; and knowing that males and under-30s will be the most likely to respond. There are also some useful benchmarks on survey response rates (only 36% overall, and 55% for those who have participated in past studies vs. only 18% for a sample of newer respondents) and on how much incentives help (moving from 0 to 100 KES ($1) increases response by 8% in the new-respondent sample, but has no effect for past respondents).
  • Oxford’s CSAE has set up a new coder’s corner, where DPhil students will be posting weekly tips on coding that they have found useful.
  • VoxDev this week focuses on industrial policy – including Dani Rodrik starting the series off by giving an overview of where we currently stand in the literature: “the relevant question for industrial policy is not whether but how”
  • On Let’s Talk Development, Dave Evans notes that a 20-year re-issue of Angus Deaton’s famous “Analysis of Household Surveys” is now out (DOWNLOAD FOR FREE!), with a new preface in which he reflects on trends over the last two decades – “I would be even more skeptical. As I taught the material over the years, it became clear that many of the uses of instrumental variables and natural experiments that had seemed so compelling at first lost a good deal of their luster with time.” – “Twenty years later, I now find myself very much more skeptical about instruments in almost any situation”. I read this book cover-to-cover multiple times during my PhD and I highly recommend it.
  • Video of Chico Ferreira’s policy talk this week on Inequality as cholesterol: Attempting to quantify inequality of opportunity.
  • Conference calls for papers:
    • CEGA at Berkeley is holding a conference on lab experiments in developing countries, submissions due March 1.
    • Maryland is hosting the next BREAD conference. They invite submissions from interested researchers on any topic within the area of Development Economics. The deadline for submissions is February 18, 2019. Only full-length papers will be considered. Please send your paper to [email protected]
    • The World Bank’s ABCDE conference is on multilateralism/global public goods – submissions are due March 24.

Setting up your own firm for a firm experiment

By David McKenzie

The typical approach to examining how workers, consumers, or governments interact with a firm has been for researchers to find a willing firm owner and convince them to run experiments. Examples include Bandiera et al. working with a UK fruit-farmer to test different payment incentives for immigrant workers; Bloom et al. working with a Chinese travel agency to test the effect of letting workers work from home; and Adhvaryu et al. working with an Indian garment firm to measure impacts of soft-skills training for workers and of introducing LED-lighting. However, finding/persuading a firm to do the experiment that a researcher would like to do can be hard, with many of these existing samples coming about through a researcher having a former student or relative who runs one of these firms.

So what should you do if you lack a connection, or you want to do something that you cannot persuade a firm to do?

Recently, a number of researchers have taken a different approach: setting up and running a firm themselves in order to answer research questions. I thought I would give some examples of this work, and then discuss some of the issues that arise and things to think about when deciding whether to pursue this research strategy.

Weekly links January 18: an example of the problem of ex-post power calcs, new tools for measuring behavior change, plan your surveys better, and more...

By David McKenzie
  • The Science of Behavior Change Repository offers measures of stress, personality, self-regulation, time preferences, etc. – with instruments for both children and adults, and information on how long the questions take to administer and where they have been validated.
  • Andrew Gelman on post-hoc power calculations – “my problem is that their recommended calculations will give wrong answers because they are based on extremely noisy estimates of effect size... Suppose you have 200 patients: 100 treated and 100 control, and post-operative survival is 94 for the treated group and 90 for the controls. Then the raw estimated treatment effect is 0.04 with standard error sqrt(0.94*0.06/100 + 0.90*0.10/100) = 0.04. The estimate is just one s.e. away from zero, hence not statistically significant. And the crudely estimated post-hoc power, using the normal distribution, is approximately 16% (the probability of observing an estimate at least 2 standard errors away from zero, conditional on the true parameter value being 1 standard error away from zero). But that’s a noisy, noisy estimate! Consider that effect sizes consistent with these data could be anywhere from -0.04 to +0.12 (roughly), hence absolute effect sizes could be roughly between 0 and 3 standard errors away from zero, corresponding to power being somewhere between 5% (if the true population effect size happened to be zero) and 97.5% (if the true effect size were three standard errors from zero).” A quick numerical check of this calculation is sketched after this list.
  • The World Bank’s data blog uses metadata from hosting its Survey Solutions tool to ask how well people plan their surveys (and read the comments for good context in interpreting the data). Some key findings:
    • Surveys usually take longer than you think they will: 47% of users underestimated the amount of time they needed for the fieldwork – and after requesting more server time, many then have to re-request this extension
    • Spend more time piloting questionnaires before launching: 80% of users revise their surveys at least once when surveying has started, and “a surprisingly high proportion of novice users made 10 or more revisions of their questionnaires during the fieldwork”
    • Another factoid of interest: “An average nationally representative survey in developing countries costs about US$2M”
  • On the EDI Global blog, Nkolo, Mallet, and Terenzi draw on the experiences of EDI and the recent literature to discuss how to deal with surveys on sensitive topics.
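As a quick sanity check on the post-hoc power bullet above, the short Python sketch below (my own illustration, not Gelman’s code) simply re-does his arithmetic under the normal approximation he describes: the standard error of the difference in proportions, and the probability of an estimate at least two standard errors from zero when the true effect is about one standard error from zero.

```python
import math
from scipy.stats import norm

# Gelman's example: 100 treated vs. 100 control patients,
# post-operative survival of 94/100 vs. 90/100.
p_t, p_c, n = 0.94, 0.90, 100
effect = p_t - p_c                                          # 0.04
se = math.sqrt(p_t * (1 - p_t) / n + p_c * (1 - p_c) / n)   # ~0.04

# "Post-hoc power": treat the noisy point estimate as the true effect and
# compute P(|estimate| > 2 SE) given a true effect about 1 SE from zero.
z = effect / se                                             # ~1
power = norm.sf(2 - z) + norm.cdf(-2 - z)                   # ~0.16-0.17

print(f"effect = {effect:.2f}, se = {se:.3f}, post-hoc power ≈ {power:.2f}")
```

Running it reproduces the roughly 16% figure quoted in the post, which is exactly Gelman’s point: a “power” number computed from such a noisy effect-size estimate is itself extremely noisy.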

Education spending and student learning outcomes

By David Evans

How much does financing matter for education? The Education Commission argued that to achieve access and quality education “will require total spending on education to rise steadily from $1.2 trillion per year today to $3 trillion by 2030 (in constant prices) across all low- and middle-income countries.” At the same time, the World Bank’s World Development Report 2004 showed little correlation between spending and access to school, and the World Development Report 2018 (for which I was on the team) shows a similarly weak correlation between spending and learning outcomes. (Vegas and Coffin, using a different econometric specification, do find a correlation between spending and learning outcomes up to US$8,000 per student annually.)


Sources: The left-hand figure (spending vs. access to school) is from WDR 2004; the right-hand figure (spending vs. learning outcomes) is from WDR 2018.

And yet, correlation is not causation (or in this case, a lack of correlation is not necessarily a lack of causation)! Last month, Kirabo Jackson put out a review paper on this topic: Does School Spending Matter? The New Literature on an Old Question. This draws on a new wave of evidence from the United States’ experience, moving beyond correlations to efforts to measure the causal impact of spending changes. (Jackson and various co-authors have contributed significantly to this literature.) I’ll summarize his findings and then discuss what we might expect to be the same or different in low- or middle-income contexts.

When it comes to modern contraceptives, history should not make us silent: it should make us smarter.

By Berk Ozler

On January 2, 2019, the New York Times ran an Op-Ed piece by Drs. Dehlendorf and Holt, titled “The Dangerous Rise of the IUD as Poverty Cure.” It comes from two respected experts in the field, whose paper with Langer on quality contraceptive counseling I had, in pure coincidence, listed just days earlier as one of my favorite papers that I read in 2018. It is penned to warn the reader about the dangers of promoting long-acting reversible contraceptives (or LARCs, as the IUD and the implant are often termed) with a mind towards poverty reduction. Citing the shameful history of state-sponsored eugenics, which sadly took place both in the U.S. and elsewhere, they argue that “promoting them from a poverty-reduction perspective still targets the reproduction of certain women based on a problematic and simplistic understanding of the causes of societal ills.”

What started as an Op-Ed with an important and legitimate concern starts unraveling from there. A statement that no one I know believes, and that is unreferenced (in an otherwise very well-referenced Op-Ed) – “But there is a clear danger in suggesting that ending poverty on a societal level is as simple as inserting a device into an arm or uterus” – is followed by: “Providing contraception is critical because it is a core component of women’s health care, not because of an unfounded belief that it is a silver bullet for poverty.” In the process, the piece risks undermining its own laudable goal: promoting the right and ability of women – especially adolescents, minorities, and the disadvantaged – to make informed personal decisions about whether and when to have a child, to improve their own individual welfare first and foremost.

Weekly links January 11: it’s not the experiment, it’s the policy; using evidence; clustering re-visited; and more...

By David McKenzie
  • “Experiments are not unpopular, unpopular policies are unpopular” – Mislavsky et al. on whether people object to companies running experiments. “Additionally, participants found experiments with deception (e.g., one shipping speed was promised, another was actually delivered), unequal outcomes (e.g., some participants get $5 for attending the gym, others get $10), and lack of consent, to be acceptable, as long as all conditions were themselves acceptable.” One caveat to note: the results are based on asking MTurk subjects (and one sample of university workers) whether they thought it was ok for companies to do this.
  • Doing power calculations via simulations in Stata – the Stata blog provides an introduction on how to do this (a rough sketch of the same simulation idea, in Python, appears after this list).
  • Marc Bellemare has a post on how to use Pearl’s front-door criterion for identifying causal effects – he references this more comprehensive post by Alex Chino which provides some examples of its use in economics.
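The Stata blog post linked above naturally works in Stata; purely as an illustration of the same simulation logic, here is a minimal Python sketch (my own, with made-up design parameters) that estimates power by simulating a simple two-arm experiment many times and counting how often a difference-in-means test rejects the null of no effect.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulated_power(n_per_arm=200, effect=0.2, sd=1.0, reps=2000):
    """Share of simulated experiments in which a simple difference-in-means
    test rejects the null of no effect at the 5% level."""
    rejections = 0
    for _ in range(reps):
        control = rng.normal(0.0, sd, n_per_arm)
        treated = rng.normal(effect, sd, n_per_arm)
        diff = treated.mean() - control.mean()
        se = np.sqrt(treated.var(ddof=1) / n_per_arm +
                     control.var(ddof=1) / n_per_arm)
        if abs(diff / se) > 1.96:   # normal approximation to the t-test
            rejections += 1
    return rejections / reps

print(simulated_power())  # roughly 0.5 for this particular hypothetical design
```

The appeal of the simulation approach is that the data-generating process and estimator inside the loop can be made as realistic as you like (clustering, stratification, attrition, covariate adjustment), which is exactly when off-the-shelf power formulas stop being available.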

Changing gender attitudes, one teenager at a time

By Markus Goldstein
I’ve been trying to figure out how to get my kids to do more household chores. Luckily, help was forthcoming from a recent paper by Diva Dhar, Tarun Jain, and Seema Jayachandran. They take to Indian secondary schools an intervention designed to increase support for gender equality among adolescents. And yes, it does work, including getting boys to do more chores.

Attrition rates typically aren’t that different for the control group than for the treatment group – really? And why?

By David McKenzie

When I start discussing evaluations with government partners, and note the need for us to follow and survey over time a control group that did not get the program, one of the first questions I always get is “Won’t it be really hard to get them to respond?” I often answer with reference to a couple of case examples from my own work, but now have a new answer courtesy of a new paper on testing for attrition bias in experiments by Dalia Ghanem, Sarojini Hirshleifer, and Karen Ortiz-Becerra.

As part of the paper, they conduct a systematic review of field experiments with baseline data published in the top 5 economics journals plus the AEJ Applied, EJ, ReStat, and JDE over the years 2009 to 2015, covering 84 journal articles. They note that attrition is a common problem, with 43% of these experiments having attrition rates over 15% and 68% having attrition rates over 5%. The paper then discusses what the appropriate tests are for figuring out whether this is a problem. But I wanted to highlight this panel from Figure 1 in their paper, which plots the absolute value of the difference in attrition rates between treatment and control. They note “64% have a differential rate that is less than 2 percentage points, and only 10% have a differential attrition rate that is greater than 5 percentage points.” That is, attrition rates aren’t much different for the control group.
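For anyone who wants to compute the quantity in that figure for their own experiment, here is a minimal Python sketch (my own illustration on simulated data, not the authors’ code): the standard first-pass check of comparing attrition rates by arm and regressing an attrition indicator on treatment. The data and variable names are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical data: one row per baseline respondent, a treatment dummy,
# and an indicator for not being re-interviewed at follow-up (attrition).
rng = np.random.default_rng(1)
n = 1000
df = pd.DataFrame({"treatment": rng.integers(0, 2, n)})
df["attrited"] = rng.binomial(1, 0.10 + 0.01 * df["treatment"])

# Differential attrition rate: |attrition in treatment - attrition in control|
rates = df.groupby("treatment")["attrited"].mean()
print(f"control = {rates.loc[0]:.3f}, treatment = {rates.loc[1]:.3f}, "
      f"|difference| = {abs(rates.loc[1] - rates.loc[0]):.3f}")

# First-pass check: regress the attrition indicator on treatment status
# (robust standard errors) and look at the treatment coefficient.
res = sm.OLS(df["attrited"], sm.add_constant(df["treatment"])).fit(cov_type="HC1")
print(res.summary().tables[1])
```

In most of the experiments the authors review, the differential rate computed this way would be small; their broader contribution, as noted above, is working out what the appropriate tests are for deciding whether attrition is actually a problem.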