“Teachers don’t matter,” says Nobel Laureate: A new study in Science, and why economists would never publish it…



At a recent seminar someone joked that the effect size of any education intervention is always 0.1 standard deviations, regardless of what the intervention actually is. So a new study published last week in Science that reports a 2.5 standard deviation effect certainly deserves attention. Then there is the small matter of one of the authors (Carl Wieman) being a Nobel Laureate in Physics and a science advisor to President Obama.

I was intrigued, especially when I saw the Associated Press headline “It's not teacher, but method that matters” and this quote from Professor Wieman “This is clearly more effective learning. Everybody should be doing this. ... You're practising bad teaching if you are not doing this.” I therefore read the actual article in Science (3 pages [with 26 pages of supporting online material]) and have been struggling since with how to view this work.

So let me start by telling you what this intervention actually was:

Context: the second term of the first-year physics sequence taken by all undergraduate engineering students at the University of British Columbia, Canada. The class was taught in three sections, each by a different instructor. Two of the instructors agreed to take part in the intervention, so their classes were used.

Sample: 2 sections of the same class at UBC. Just over 250 students in each section.

Intervention: The two sections were taught the same way for 11 weeks. Then, in the 12th week, one section was taught as usual by an experienced faculty member with high student evaluations, while the other was taught by two of Wieman’s graduate students (the other two co-authors of the paper) using the latest in interactive instruction. This included pre-class reading assignments, pre-class reading quizzes, in-class clicker questions (clickers are handheld voting devices like the ones the studio audience uses on, e.g., Who Wants to Be a Millionaire?), student-student discussion, small-group active learning tasks, and targeted in-class instructor feedback. There was no formal lecturing: rather than students being passive recipients of PowerPoint slides and lectures, everything was interactive, with the instructor responding to clicker answers and to what was heard during the student exercises.

Timing: A 12-question quiz on what they had been taught during the week was administered at the end of this 12th week.

Results: Although students were not randomized into the two sections, and had had different instructors for the previous 11 weeks, the two groups looked similar on test scores and other characteristics before the 12th week. The authors then find that (i) attendance was up 20% in the interactive class; (ii) students were more engaged in class (as measured by third-party observers monitoring from the back rows of the lecture theatre); and (iii) on the 12-question test, students in the interactive class answered an average of 74 percent of the questions correctly, while those taught using the traditional method scored only 41 percent, a 2.5 standard deviation effect. The histogram in Figure 1 of the paper shows how stark the difference in performance between the two groups was.

Source: Figure 1 in Deslauriers et al. (2011), Science 332(862).
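To put the headline numbers next to the seminar joke: a standardized effect size is the difference in mean scores divided by a standard deviation. Taking the 74 and 41 percent averages and the 2.5 SD figure at face value, the implied standard deviation is roughly 13 percentage points, so the "typical" 0.1 SD effect would amount to only about 1.3 points on this quiz. This is a back-of-the-envelope sketch only; it assumes the effect size was computed with a single common standard deviation (an assumption made here for illustration, not a description of the paper's exact method), and the function names are hypothetical.

```python
def implied_sd(mean_treated: float, mean_control: float, effect_size_sd: float) -> float:
    """Back out the standard deviation implied by a standardized mean difference.

    Assumes (for illustration only) that effect = (mean_treated - mean_control) / sd.
    """
    return (mean_treated - mean_control) / effect_size_sd


def points_for_effect(effect_size_sd: float, sd: float) -> float:
    """Convert an effect size in SD units back into raw score points."""
    return effect_size_sd * sd


# Reported averages (percent of the 12 questions answered correctly) and effect size.
sd = implied_sd(74.0, 41.0, 2.5)       # roughly 13.2 percentage points
typical = points_for_effect(0.1, sd)   # the seminar joke's 0.1 SD, in points
print(f"implied SD: {sd:.1f} points; 0.1 SD is about {typical:.1f} points")
# prints: implied SD: 13.2 points; 0.1 SD is about 1.3 points
```

In other words, the reported gap is about 25 times the size of the effect the seminar joke treats as standard.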

My first reaction was: “They published this? It would never get into a top-quality economics journal.” Why?

·         I don’t think this even passes the criteria of my external validity rant, except at most as a proof of concept. Performance on 12 questions after one week of one course at one Canadian university seems to be pushing it.

·         The 12 questions were written by the grad students teaching the class (and approved by the instructor of the other class), but there are obvious concerns about teaching to the test.

·         One week is a really short time over which to look at effects; surely we want to see whether they persist.

·         The graduate students teaching the class certainly had an extra incentive to do especially well in this week of teaching.

·         The test was given in class, and 211 students attended class in the intervention section, compared with 171 in the control section. No adjustment is made for selection into taking the test. One could imagine, for example, that the most capable students get bored during regular lectures and so only start attending when something different is going on (or one could tell a story in the other direction; the point is that some bias seems likely).

·         The test was low stakes, counting for at most 3% of final grade.

·         Although the authors call this an “experiment”, which it is in the scientific sense that any intervention is an experiment to try something out, there is no randomization anywhere here, and the difference-in-differences comparison is done only implicitly.

But on the other hand, having sat through (and taught) a number of large lectures in my time, and having talked to academics who have started using the “clicker”, I find it plausible that the intervention could have large effects. The experience is certainly interesting, and sharing it is likely to motivate others to try it, hopefully in experimental trials with many more classes and subjects to learn whether and how this works best. This got me thinking about what we miss in economics by not having a formal outlet for sharing results like this. Medical journals publish case studies and proof-of-concept (or efficacy) trial results, and it seems Science does too: 3 pages, published quite fast (the paper was submitted on December 16, 2010 and appeared in print on May 13, 2011), reporting something with a massive effect size. It is hard to argue that we shouldn’t be sharing this information.

Of course, I worry about the media hype around such a story. The authors modestly title the paper “Improved learning in a large-enrollment physics class” (emphasis mine), but this becomes “It's not teacher, but method that matters” in the AP, and Professor Wieman’s public quotes are much stronger than anything in the article. On the other hand, the New York Times reported the results with reasonable criticism, noting quite a few of the study’s limitations. So some of the media is doing its job. Nonetheless, I feel uneasy about press releases for such a slight study: it may be worth sharing with other educators at this stage to encourage more study, but I’m not sure it meets the bar of being worth broadcasting to the world.

So I’m left feeling confused: on the one hand astonished that such a paper was published in a prestigious scientific journal, but on the other hand still wondering what we are missing by not publishing such encouraging early results. It does seem the sort of paper that might cause us to update our priors, even if it doesn’t convince us completely. I think that in economics we immediately turn to the “will it work?” questions, at times without sufficiently appreciating the “can it work?” evidence. I can think of several recent economic studies that I might classify similarly. A great example is the work by Pascaline Dupas and Jon Robinson on the impact of a savings account in Kenya: a paper done on a small budget while they were students, with really remarkable results that have received quite a bit of press, even if the data are not always precise and they don’t have enough power to fully nail everything down. It would be great if economists could accept and publish such results in a succinct form, as they are, and build on them going forward, rather than following the drawn-out approach to publishing in economics that focuses only on what a study can’t do.

What do you think? Is the economics profession missing insightful experiences from promising new interventions because we only consider the full packaged product? Or have the Science guys got it all wrong?



David McKenzie

Lead Economist, Development Research Group, World Bank

Mark Thorson
May 16, 2011

Surely you mean the paper was submitted on December 16, 2010. The publication cycle may be fast, but not so fast that they're publishing papers submitted in the future.

David McKenzie
May 16, 2011

Thanks Mark, I've corrected this in the post. Given how long some economics journals take, it certainly feels at times like scientific and medical journals are using time machines!

May 17, 2011

My first thought in reading about this was: "it could just be the Hawthorne effect". On the plus side, it seems likely that someone will run a better designed experiment of these methods soon.

May 17, 2011

I'm a big fan of Carl Wieman for turning his attention to teaching and learning and for his experimental method. However, here the experimental class had twice as many teachers (even if they were less experienced), and it is not clear which of the interventions had the positive effect or whether it was the combination. Still, I hope he continues to try these experiments and publish the results so they can move beyond pilot status.

Suhas D. Parandekar
May 20, 2011

Thank you for a very interesting post. I would like to share my first reactions to reading your piece (I could not access the SCIENCE article itself).

1. The study of complex systems using tools of discrete mathematics, computer simulations and scientific experiments has by now provided a substantial body of knowledge regarding the functioning of what John Holland termed "complex adaptive systems" or c.a.s. It seems to me fairly straightforward to show that a classroom session, as well as further hierarchical aggregations of such sessions, forms a complex adaptive system. The human brain is another c.a.s., and in a sense one can conceive of a classroom session as a temporary network of human brains.

2. Within a c.a.s., depending on the structure of the network and the protocols that govern the flow of information, matter and energy, two things often happen: combinations of negative and positive feedback loops and iterated self-similar replication produce dramatic nonlinear effects and massive reductions in entropy. In simple, perhaps non-mathematical terms, "tall oaks from little acorns grow."

3. Economists, like everybody else, would do well to understand that by altering network architecture and protocols (which, from your concise and clear description, is what I understand the reported experiment did), certain teaching methods can unleash huge nonlinear effects. This does not seem to be at all a surprising result.

4. I am not too concerned that there were only 12 questions, nor would 50 questions have convinced me any better. I would have liked to read the paper, but from your summary I do not think that the authors are attempting to make any statistical inference about a hypothetical universe from which a random sample was drawn. One may also ask: why the University of British Columbia, why just one university, and so on? Before asking any of those questions, it seems to me that the most important question is what the paper purports to do. To my mind, it tells a story about this experiment (which of course is not perfect) showing that at UBC, when such and such was done, the following results were obtained. The "such and such" was not that "the professor wore a yellow shirt" but something with a theoretical explanation. And I find those 2.5 standard deviation results immensely fascinating; thank you very much for sharing.

May 22, 2011

The teacher nevertheless matters (I think). Good teachers would have the capacity to vary their teaching techniques, try new methods and motivate students to participate interactively.

Economists could perhaps widen the channels for publishing work. Could economists submit work to science journals for publication? Just a thought.

Thanks for your interesting blog post. I hope your suggestions get taken up.

Yongmei Zhou
June 06, 2011

David, first of all, congratulations! You have made a splash in a pretty saturated blogosphere. I first came across the blog not via our intranet but through Chris Blattman's indefatigable blogging. Chris' recommendation increased my chance of clicking the link. Similarly, Carl Wieman's advocacy of interactive teaching probably got more people thinking about the merits of interactive learning, despite the obvious methodological weaknesses of the evaluation. Which brings me to a topic I often wonder about: what is the best way to influence policymakers to adopt new ideas? Are they more convinced by an idea that has been proven by a randomized evaluation, by personal observation of a compelling story, or by hearing it from a source they trust? It's interesting to see that Wieman combines the force of the mass media with the reputation of Science. I suspect that Science alone would have had limited impact on the education policy community or the public. Would a famous journalist reporting on this case and all the numbers (without Wieman's study) have made the same splash? I don't know. A long time ago, when Rachel Glennerster, Ben Olken and I were on the same panel discussing RCTs, I challenged J-PAL to do an RCT evaluating the relative efficacy of these channels of influence. I haven't seen anything yet.

I love high-standard evaluations, but I've also sinned many times by "broadcasting to (my small) world" a compelling idea in my role as a policy advisor. One such idea is the Activity-based Learning (ABL) model of classroom management. Though an NGO invented it two decades ago, Tamil Nadu was the first Indian state to adopt it at large scale, and a number of other Indian states have since adopted it. Now, if you have seen ABL in action and the familiar counterfactual, you cannot help talking about it and wondering why others don't do it. A rigorous evaluation would be great, though the lack of one didn't seem to deter a number of large states from embarking on this highly challenging reform. I hope the lack of a rigorous evaluation of ABL does not indicate a lack of responsiveness on the part of the research community.

Joshua Muskin
August 12, 2011

Thank you for this very thoughtful blog post. The sincere intellectual reflection, and even uncertainty, that you admit to is precisely what I perceive we miss in most scientific journal articles, whether in the physical or the social sciences (the latter of which I know much better). Here, and in the article on which you report, I hear you thinking out loud, intelligently, about issues for which there is, I posit, virtually never one correct answer. What we really need on the ground (in this instance, in the classroom) are concrete ideas that teachers and other educators might try out themselves. These ideas include strategies or approaches, but they also include the criteria, or standards, and the means of assessing them that educators (or anyone) can use to determine the effectiveness of what they are trying and to guide any revisions. So, yes, we do ourselves a disservice when we only publish empirically proven "answers," mainly because these answers will always be contextualized and therefore limited in their applicability, at least in the social sciences (of which economics is one, though it sometimes seems to pretend to be a physical science). Let's think out loud in public more often. Thank you for doing so here with such intellectual honesty.

Also, thank you for drawing our attention to the ridiculous (even dangerous) AP interpretation of the research into the title “It's not teacher, but method that matters.”


Joshua Muskin

August 19, 2011

Excellent points about the Wieman paper; however, Carl Wieman is not the only physicist studying physics education. I don't know whether you will appreciate a reading list, but you seem to have a proclivity for reading the literature, so I felt obliged to point you to more of it. Please don't take this as me trying to prove you wrong; I mean no offense by it.

A 6000-student study on interactive engagement techniques - Richard Hake

Ten years of research on peer instruction - Eric Mazur

Differences in professors' implementations of peer instruction - Chandra Turpen and Noah Finkelstein

Student expectations in introductory physics - Edward Redish

I hope you will find that physics education is a growing field, represented by far more than Carl Wieman alone.