
Looking for a shortcut to identifying great teachers? You may be out of luck.

Teachers are important. From Pakistan to Uganda to Ecuador to the United States, study after study shows that a good teacher can make a big difference in student learning. If we want more student learning, then “hire better teachers” or “make sure you retain the good teachers” seem like good bets. But identifying good teachers is challenging. The studies above measure the learning gains associated with being in a particular teacher’s class, but they don’t identify observable characteristics of good teachers, like “tall teachers are good teachers” or “brunette teachers are good teachers.” While those characteristics seem silly, some that seem like no-brainers – such as teacher education and experience – are not consistently correlated with student learning.

Many countries want to do a better job of identifying the best teachers (to retain and reward them) and the worst teachers (to help them and – in some cases – dismiss them). Some countries have begun to test teachers. The World Bank’s Service Delivery Indicators initiative has tested teachers’ pedagogical ability and basic math and reading ability in several countries, with largely dispiriting results.

A new study by Cruz-Aguayo, Ibarrarán, and Schady asks the question, "Do tests applied to teachers predict their effectiveness?" Short answer: In Ecuador, no.

How do we know? In Ecuador, one in three public school teachers work on a short-term contract. Tenured teachers have higher pay and more benefits. Each year, new tenured slots open and candidates apply for them. Candidates with the highest score on an evaluation get the tenured jobs. The evaluation has three elements, each receiving equal weight: a test, a demonstration class, and points awarded for experience, degrees, or in-service professional development.

The authors then look at schools with at least two contract teachers who participated in the evaluation. They compare children within the same school to see whether teacher performance on the evaluation predicts student learning. For all children, they control for child age and class size. In additional specifications, they control for parental education and household socioeconomic status; they only have that information for three-quarters of the sample. Adding these controls doesn’t affect the results.

Student learning is measured in six different ways. They use the Early Grade Reading Assessment (EGRA) at mid-year (#1) and at the end of the year (#2), the Early Grade Math Assessment (EGMA) at mid-year (#3) and at the end of the year (#4), and the change in EGRA (#5) and EGMA (#6) scores.

Because children aren’t randomly assigned to classes, they regress teacher evaluation results on student characteristics. That shows whether certain kinds of students (for example, rich students or boys) get assigned to the best teachers. There’s no evidence of selective sorting. So after all that, what’s the result? According to the authors, there is “no evidence that the test score or any of its components, the score on the demonstration class, points on the Méritos scale, or the aggregate score on the Concurso predict child achievement in language or math.” Across 84 coefficients [6 outcomes x (6 evaluation elements + 1 total score) x 2 specifications], not a single one is statistically significant.
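To make the within-school comparison concrete, here is a toy simulation in Python. The data are hypothetical and generated by me (this is not the authors’ code or data): evaluation scores are drawn independently of true teacher quality, so the within-school (fixed-effects) regression slope comes out near zero, mirroring the paper’s null result.

```python
import random
from collections import defaultdict

random.seed(0)

# Toy data: each school has 2 contract teachers; a teacher's evaluation
# score is drawn independently of her true (unobserved) quality.
rows = []  # (school_id, teacher_eval_score, student_test_score)
for school in range(200):
    school_shock = random.gauss(0, 1)          # shared school-level shock
    for _ in range(2):
        eval_score = random.gauss(0, 1)        # Concurso-style score
        quality = random.gauss(0, 1)           # uncorrelated with eval_score
        for _ in range(20):                    # 20 students per class
            score = school_shock + 0.2 * quality + random.gauss(0, 1)
            rows.append((school, eval_score, score))

# Within-school regression: demean x and y by school, then slope = Sxy / Sxx.
by_school = defaultdict(list)
for s, x, y in rows:
    by_school[s].append((x, y))

sxy = sxx = 0.0
for obs in by_school.values():
    mx = sum(x for x, _ in obs) / len(obs)
    my = sum(y for _, y in obs) / len(obs)
    for x, y in obs:
        sxy += (x - mx) * (y - my)
        sxx += (x - mx) ** 2

beta = sxy / sxx
print(round(beta, 3))   # near zero: evaluation scores don't predict learning here

# The paper's coefficient count: 6 outcomes x (6 elements + 1 total) x 2 specs
print(6 * (6 + 1) * 2)  # 84
```

Demeaning by school is what “comparing children within that school” amounts to: any school-level shock (the shared `school_shock` above) is differenced out, so the slope reflects only within-school differences between teachers.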

These are precise zeros: “In the specifications with the additional controls, we can rule out positive associations between the total Concurso score and child test scores larger than 0.02 standard deviations for language, and 0.03 standard deviations for math.”

How does this fit with other results on tests?
Hanushek and Rivkin brought together the evidence some years ago. Out of 9 estimates, 6 were statistically insignificant, 2 were positive, and 1 was negative. (As the figure below shows, most of THAT pie is an insignificant, unappetizing gray.) So this Ecuador data falls in line with much of the existing evidence.

That said, evidence from Peru compares students taught by the same teacher in two subjects to see whether the students perform better in the subject where the teacher herself tests better. They do, sometimes: “one standard deviation in subject-specific teacher achievement increases student achievement by about 9% of a standard deviation in math. Effects in reading are significantly smaller and mostly not significantly different from zero.”

Beyond tests, a study of the predictive power of demonstration lessons for subsequent teacher performance in Argentina – by Ganimian, Ho, and Alfonso – found that demonstration lessons had modest power to identify candidates who would go on to perform particularly poorly, but not so much for identifying future top performers.

Does this mean that teacher tests are worthless? No. 
When a test of teachers in Mozambique shows that zero percent of teachers have a minimum standard of language knowledge to teach their subject, or a test in Nigeria shows that less than 60 percent of teachers can subtract double digits – as found by Bold and others – an adverse impact on student learning seems likely. It would be interesting to know – in the Ecuador context studied by Cruz-Aguayo, Ibarrarán, and Schady – whether any teachers are below a minimum level of knowledge. It’s possible that all the contract teachers who are participating in the evaluation have some basic level of content knowledge. Without minimal content knowledge, teaching the content would seem impossible.

That said, these results from Ecuador are an important reminder that a teacher test is no magic bullet – and may be completely useless – for identifying great teaching candidates. Ultimately, there may be no shortcut for identifying great teachers: The best bet may be to hire as well as one can, but then rigorously evaluate the teachers’ value added to student learning over several years before granting permanent status.

Bonus materials
Here are open-access versions of three of the papers linked above that have been published in journals:


Submitted by Alonso Sanchez on


Great post. On this subject, I would say that Rockoff, Jacob, Kane, and Staiger's 2010 paper, "Can You Recognize an Effective Teacher When You Recruit One?", is a must-read. Their findings and implications, while broadly similar to what you describe above, allow for a more hopeful (certainly more so than in Ecuador) and nuanced set of policy responses. The link is below but I can't find a way to turn it into a hyperlink right now.

Another issue not discussed here is that even if the test's predictive power is poor, entry into the teaching service (or getting tenure within it) via an arguably more meritocratic approach (relevant tests vs. more discretionary methods) is likely to attract higher-ability, nontraditional candidates into teaching. Thus, over time, the teacher quality of the system may improve.

Thanks, Alonso. Rockoff et al. is indeed important to this discussion. But here is what they find: "We find modest and marginally significant relationships between student achievement and several nontraditional predictors of teacher effectiveness, including performance on the Haberman selection instrument and a test of math knowledge for teaching." Modest and marginally significant. And that's from one city (New York City). More hopeful? Perhaps. But taken together with the Hanushek results summarized above, I'd still manage expectations.

Your point that a more meritocratic approach may attract "better" candidates is certainly plausible and would be interesting to see explored. I wonder if the quality of candidates in Ecuador is changing over time.

Estrada's work in Mexico shows that teachers hired by a rule based on test performance delivered better value added than teachers hired by more discretionary means.

Submitted by Michael Frese on

There is a much-talked-about book on all meta-analyses on teachers: Hattie, J. A. (2009). Visible Learning: A Synthesis of Over 800 Meta-Analyses Relating to Achievement. Abingdon: Routledge.
The major point seems to be that it is teacher behavior that counts – all other points of intervention are of low to zero importance. So it all depends on how the behavior of teachers was measured in the classroom exercise. What do you think of that?
Michael Frese

Submitted by Helen Abadzi on

One reason for the lack of results is insufficient detail. And the missing details have to do with memory functions. "Great teachers" first and foremost have to retrieve curricular content automatically and effortlessly. They must read and write rapidly in order to process the necessary paperwork quickly and effortlessly. Only then can working memory have enough space to attend to students. If teaching demands mental effort, the teachers avoid it. So you may find more teacher effects if you collect data on reading speed (silent), arithmetic speed and accuracy, and working memory span.

To get student learning, the curricula must actually teach content with methods and at rates that will enable students to consolidate it. But curricular rates are set for the middle class, and the methods are completely empirical. Reading and math fads come and go, but textbooks are set accordingly. There is very little that teachers can do to mitigate bad curricula.

The case of Peru and EGRA illustrates that. Reading completely depends on perceptual learning, and that requires letter-by-letter presentation and a ton of practice to happen. Only then can we talk about comprehension. But textbooks teach whole words and give very little practice. Short of bringing new textbooks in, there is nothing teachers can do to mitigate that.

Many years of educational research by the Bank illustrate one finding: It's impossible to do education without cognitive science. What the donors do is akin to buying a car without any idea what the engine does or whether it will run at all.

