You Can’t Manage What You Don’t Measure



When it comes to measuring student learning outcomes, you often hear critics retort that “you can’t fatten a cow by weighing it all the time,” meaning that you cannot truly educate students by spending all your time preparing for tests and recording test scores. Of course not. But as the management guru Peter Drucker famously said, “If you can't measure it, you can't manage it.”

If you don’t measure, then how do you know how you are doing? How do you know if you are doing well? Or poorly? Without adequate information about learning outcomes, students, families, and teachers cannot properly decide what actions to take to improve learning outcomes. And improving cognitive skills is important for economic development.

In my experience visiting schools over the years, I have spoken to teachers who use assessment results to gauge their teaching and decide on allocating inputs. I have also been to schools where teachers have no test scores. How are they expected to make choices about what they do in the classroom without any information?

I suppose the criticism of measurement is directed at high-stakes testing and over-reliance on assessments.

But high-stakes testing is one way of using information to improve outcomes. There is some evidence from US states such as Florida and New York that school actors respond to accountability systems. Another high-stakes example has come to be known as “naming and shaming” school leaders. This has been used in Great Britain, where a policy change led to a unique natural experiment in England and Wales. Prior to devolution in 1999, the governance of schools (and hospitals) in England and Wales was similar. After devolution, the funding and organization continued to be similar, but the two governments adopted different policies in pursuit of common objectives. A study of these two “natural experiments” compared outcomes in the two countries before and after the policy changes. The governance model of “trust and altruism” resulted in worse reported performance in Wales as compared with England on what were each government's key objectives. “Naming and shaming” school leaders worked in England, as compared with Wales, resulting in improved examination performance.

In school systems where parents choose schools, information is vital for the decision. But it can also be used as an accountability measure and prod providers into improving outcomes. In the Netherlands, school quality scores not only improve school choice; they also lead to school improvement. Both average grades and the number of diplomas awarded increase after a school receives a negative score, and these responses cannot be attributed to gaming activities by the school. For schools that receive the most negative ranking, the short-term effects (one year after a change in the ranking of schools) of quality transparency on final exam grades equal 10 to 30 percent of a standard deviation.

But what about simply providing information on learning outcomes? Would that be enough to improve what is going on?

The use of information from international student assessments helped reform an education system. It wasn’t that there was no interest in learning; it was a case of lack of information. Over the past two decades, the Jordanian education system has made significant advances. Net enrollment in basic education increased from 89 percent in 2000 to 97 percent in 2012. Transition rates to secondary education increased from 63 to 79 percent during the same period. At the same time, Jordan made significant gains on international surveys of student achievement, with a particularly impressive gain of almost 30 points on the science portion of the Trends in International Mathematics and Science Study (TIMSS). Benchmarking the education system and constant feedback between researchers and policymakers contributed to this achievement.

Jordan was the first Arab country to participate in an international student assessment. This took place at the same time that the country launched its comprehensive system reform. The assessment results were alarming, as performance was extremely poor. As a follow-up, Jordan sped up its efforts to reform the education system. The curriculum was reviewed and revised, and new textbooks were developed. Teacher qualifications were reviewed and evaluated, and teacher upgrading through a university bridging program was implemented. Benchmarks for 13-year-olds’ achievement were established. Jordanian authorities developed a feedback loop between those researching the education system and those implementing change, through to teachers. Teachers were supported with guides and feedback. In fact, teacher confidence was one of the factors associated with improvements in learning outcomes.

But even information alone, even low-stakes testing, can lead to improvements. This was the case in Mexico prior to the introduction of national and universal student assessments: holding everything else constant, states with tests and accountability systems performed significantly better than states without them. Furthermore, such simple accountability measures are demonstrably cost-effective ways of improving outcomes. Even in Finland, where there is no high-stakes testing until the end of secondary school, assessments are used to improve learning and are “encouraging and supportive by nature.”

This is consistent with the international evidence: differences in educational institutions explain much of the large international differences in student performance on cognitive achievement tests.

Moreover, test-based accountability – be it high-stakes, low-stakes, or simply information – is cost-effective. "Even if accountability costs were 10 times as large as they are, they would still not amount to 1 percent of the cost of public education!" argues Caroline Hoxby in an influential paper. According to the Association of American Publishers, total revenues from the sales of tests, related teaching materials, and services amounted to $234 million in 2000. Hoxby calculates that these revenues amount to less than $5 a student. In relation to the overall average cost of educating a child, payments to all test makers represented just 0.07 percent (seven-hundredths of 1 percent) of the cost of basic education in the United States.
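Hoxby's back-of-the-envelope figures can be checked with rough year-2000 numbers. The enrollment and per-pupil spending values below are illustrative assumptions, not figures from the original post:

```python
# Rough reproduction of Hoxby's back-of-the-envelope calculation.
# Assumed figures (not from the blog post): roughly 47 million US
# public-school students and roughly $7,000 average per-pupil spending
# around the year 2000.
test_industry_revenue = 234_000_000  # AAP figure cited above, in dollars
students = 47_000_000                # assumed enrollment
per_pupil_spending = 7_000           # assumed average cost per student

cost_per_student = test_industry_revenue / students
share_of_spending = cost_per_student / per_pupil_spending

print(f"Testing cost per student: ${cost_per_student:.2f}")
print(f"Share of per-pupil spending: {share_of_spending:.2%}")
```

With these assumptions the cost comes to just under $5 per student and about 0.07 percent of per-pupil spending, matching the figures cited in the paragraph above.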

Globally, it has been shown that testing is among the least expensive innovations in education reform. In fact, in no country does testing cost more than 0.3 percent of the national education budget at the basic education level.

But measure what is important. That is, tests should inform teachers about how their students are progressing and this feedback should be timely and useful. In other words, avoid teaching to the “bad” test. Policymakers have a role to play, too, as Hoxby points out, as they should “encourage teaching the curriculum, but they should discourage teaching the test.”

So, test and be tested; manage what you measure; and let’s improve learning outcomes for all children.

Follow Harry Patrinos on Twitter @hpatrinos
Follow the World Bank Group Education team on Twitter @WBG_Education

Sigamoney
December 02, 2014

Testing has its value if all the circumstances in the various diverse contexts we find ourselves in are common. That is impossible, so we have to build in mechanisms that provide some sense of what we are dealing with, mainly in terms of socio-economic data. Other factors should be utilised to establish improvements; for example, critical thinking, impact on social challenges, and the school environment. Very often teachers teach to the test, and in some cases there is cheating to obtain the best results. Schools are also encouraged to sift learners when they are first admitted to school. I think the entire testing movement needs a rethink.

Harry Patrinos
December 05, 2014

Thank you Sigamoney. The testing movement may need a rethink – what would you say we need to do first? – but a lack of information to teachers, parents and students is holding us back. In addition to critical thinking and the like, we need to start with very basic information about minimal learning levels.

Paul Atherton
December 04, 2014

Thanks for the blog. I think you raise some interesting points, most notably about the need for learning data as 'management information'. My reflections on this would be that at a classroom level teachers do this informally all the time - but with a class of 100, it's difficult to consistently track this in a way that would be natural in classrooms in the UK/US. So there’s arguably a greater need for a systematic way of supporting formative testing, irrespective of the argument for summative testing. Trouble is, this is inherently reductionist when done through sample surveys due to a lack of tools to do this, and the feedback loop back into the classroom is weak as M&E is seen as external and donor led. But hard to argue with the basics of knowing if children can read when designing education interventions – which is the route we’ve gone down in the GEC, and have seen many adaptations in projects as a result.
So my challenge would be how can we quickly agree and develop the tools to do this? Placing the classroom at the heart of the metrics discussions would help, as would a clear call for action on getting the basics right – reading isn’t learning, but it’s really hard to learn if you can’t read and most (if not all) countries have a system predicated on reading after the early years. We know from PRIMR and other studies that a targeted focus can work, but how do we do this in the wider system? And then, how do we do this at higher levels?
An interesting discussion though and one to be encouraged.

Harry Patrinos
December 04, 2014

Thanks Paul. You are right to say that a discussion should be encouraged. We do need tools at the classroom level that are timely and make sense. But we have one: the early grade reading assessment. It’s a good start. Don’t you agree?

Paul Atherton
December 04, 2014

Yes - but it's complicated to administer (as we've learned in the GEC) and is rarely used as a classroom improvement tool. That said, the Liberia programme shows how good it can be when used like this. But still, issues remain on expanding this to 'standard' operating environments - so the day-to-day education experience outside a project.
On a separate note - people talk a lot about 'data' in these discussions, which puts people off. However, data and information are close relations - just requiring interpretation. So getting the data in a manner which can be used (so it can be interpreted by teachers and parents) is a key part of doing this successfully.

Osamha Obeidat
December 14, 2014

Thank you, Harry for this interesting blog. The question is do we have a consensus on what "learning outcomes" we are talking about?
In the case of Jordan, I think there is a wealth of data from the different tests/exams that was, and still is, not used appropriately to inform the policy-making process. As far as I know, the results of Jordanian students in the national or international studies/tests have not succeeded in generating a high-level policy dialogue or received the attention they deserve.

Harry Patrinos
December 15, 2014

Osama, thank you for the feedback. In terms of learning outcomes we need to focus on what makes the most sense for the different stakeholders. The use of learning data is key. In fact, Jordan is one of the countries that used results very well to improve learning outcomes between the late 1990s and mid-2000s. A lot of lessons can be learned from the Jordan experience, but countries need to continue to analyze and use results. Thank you,

Steven Klees
December 18, 2014

This Doesn’t Measure Up
Peter Drucker was a smart guy. Harry Patrinos is a smart guy. In this case, they both happen to be very wrong. You can manage things you don’t measure. You have to. Everyone does. Harry tries to make the opposite sound like both common sense and proven by research. It is neither. We all manage our households every day of our lives. Our time, our children. With outputs much more immeasurable than measurable. Businesses, especially large ones, have a lot of distance between what their employees do and the final products produced. As many management gurus will tell you, management is more art than science. Supervisors have to evaluate the performance of subordinates every day, when the impact of the subordinate’s work on final products is tenuous at best. All of these examples involve human judgment. Sometimes measuring some outcomes can improve some assessments, but many times it can misdirect attention to what may be measurable but not most important. I doubt that Harry’s performance is evaluated by his supervisors only in terms of what can be measured, certainly not in terms of the Bank’s aims of improving development. I doubt if President Kim of the World Bank is evaluating his extensive management reforms of the Bank exclusively, or even principally, in terms of measurable output. There is a whole literature in the field of public administration pointing out how wrong Harry and Peter are. That literature details the many circumstances in which focusing management on measurable output leads to distorted and inefficient decisions.
This is equally true of education. Neoliberal, market fundamentalists like Harry have tried to convince us for 30+ years that testing and measurement is a cheap solution to most of our educational problems. For 30+ years we have increased educational testing and measurement around the world and our educational problems remain, or have increased. No Child Left Behind raised measurement to a fetish in the U.S. but, as with similar attempts elsewhere around the world, all the resources go to building a better thermometer with little attention to the causes of the illness or resources to do something about it. Moreover, the testing and measurement fetish has distorted education towards simplistic measurements of language and math achievement, neglect of other subjects, rampant teacher dissatisfaction, and damage to our children.
Of course, Harry is correct that sometimes, in some instances, measurements can be useful. But they are far from necessary, and when useful, what is measured should usually only be a small piece of what is assessed. Before neoliberal dominance took hold, in the U.S. there was a strong movement towards portfolios of student work as the essential ingredients for assessment, much of which required qualitative judgment to assess, not relying on test scores or even grades necessarily. Colleges did and still do in some places accept portfolios in order to make admission decisions. Colleges like Reed and Antioch do not give grades, yet they manage very well, as do universities that offer admission to their graduates based on qualitative assessments. In vaunted Finland, classroom teachers control assessment; many use portfolio assessment and do not give grades or tests yet they turn out some of the best test-takers in the world.
The research Harry cites to prove that testing “works” would be embarrassing to most economists. The studies are almost all purely simple correlations between test score improvement and engaging in testing in Jordan, or ‘naming and shaming’ in England, or Florida accountability legislation, without many or any controls for the dozens of other variables that might have caused test scores to go up. There is nothing cost-effective about testing. It is another cheap reform brought to you by people who are unwilling to put in the resources needed to improve education. To argue how cost-effective even high-stakes testing is also neglects the significant psychological and material damage it does to so many children. Of course, testing has a place, but to argue ‘if you can’t measure it, you can’t manage it’ is belied by experience, research, and common sense.
Steven Klees
University of Maryland

Harry A. Patrinos
December 18, 2014

Steve, thank you for your comments. You say that we have “increased educational testing and measurement around the world” and yet “our educational problems remain, or have increased.” How does one know if things have gotten worse without some sort of measurement – be it a standardized test, a survey, or whatever? Thanks for saying that I am “correct that sometimes, in some instances, measurements can be useful.” I would say, however, that measurement is necessary – but not sufficient. You still need to take action to solve a problem. Harry

Gustavo Arcia
December 18, 2014

Can you manage things that you don’t measure? Yes, if you have no choice. Can you manage things better if you also measure? Absolutely yes. It is disingenuous to insinuate that not measuring is better than measuring. It is also disingenuous to suggest that measurement is the end-all of education management. Measuring learning outcomes has become a tool for accountability because it is clear that not measuring results has made education improvement very difficult. Professor Klees is correct in saying that true professional performance is not a simplistic ritual of looking at the numbers (although in many fields, such as sales, finance, and retail, it simply is). But as a parent I would like to know if my child is learning and if my child’s school is doing well relative to similar schools. I would like to ask any professor at the University of Maryland what percent of them bought their homes without looking at school rankings in test scores. Yes, I am sure they also looked at other factors, but test scores probably played a big role in their decision. Don’t believe me? Ask any realtor. Professor Klees is correct in pointing out that converting testing into a fetish is damaging the system. I completely agree. However, I think he is completely wrong in suggesting that not testing is better. In Finland testing is extremely flexible because the system runs on trust – the result of purposely selecting only the most talented individuals in society as public school teachers. In most countries, including the U.S., the most talented individuals go into other occupations, not public school teaching. As a result, we have to rely on accountability, instead of trust, to improve education quality (by the way, using Reed and Antioch as examples is wrong; they are elite colleges with a self-selected elite group of students. I would like to see if their approach would work in ordinary public schools).
For me, the blog post shows that measurement is a good idea: as a parent I want to know if my children are learning, and judging from the collective evidence, schools that are asked for accounts tend to make more of an effort than schools that do not render accounts. Is testing the only thing? Of course not. Can testing be a source of distortion? Yes. In that regard Professor Klees’s warnings are very useful. But to suggest that testing and measurements are bad for schools and students is not a solution. Tests and measurements are just tools; they can be helpful when used correctly. To reject them outright is shortsighted.