Finding a reasonable target for math and reading test scores
The United Nations’ Sustainable Development Goal (SDG) for education calls for learning for all. This includes ensuring that, by 2030, all students achieve relevant learning outcomes by the end of their primary and lower-secondary schooling. An important way to measure the attainment of this target will be to look at the percentage of children in each country achieving at the very least “minimum proficiency” on standardized math and reading tests.
Ideally, one would want all children in every country to demonstrate at least minimum levels of proficiency on designated math and reading tests by 2030. Having proficiency in these core areas is vital to their success in school and life.
However, given where many countries are starting from, is this attainable in the space of 15 years? What should progress look like? What would be a challenging but realistic target?
We can get some ideas by looking at existing trends for countries on international large-scale assessments of math and reading. These include the Trends in International Mathematics and Science Study (TIMSS), the Progress in International Reading Literacy Study (PIRLS), and the Program for International Student Assessment (PISA). These tests reflect the consensus of the participating countries about which reading and math learning outcomes are important for primary and secondary schooling.
TIMSS measures student achievement in math and science at the fourth grade (nine-year-olds) and eighth grade (13-year-olds) levels every four years. PIRLS measures reading achievement at the fourth grade level every five years. PISA measures the reading, math, and science proficiency of 15-year-olds on a three-year cycle.
All three assessments report countries’ results on scales with a midpoint of 500. Benchmarks corresponding to different levels of proficiency are also set on each scale. In the case of TIMSS and PIRLS, there are four benchmarks: Low, Intermediate, High, and Advanced. Students reaching the Intermediate benchmark (a score of 475 or above) are able to apply basic knowledge in a variety of situations, similar to the idea of “minimum proficiency”. In the case of PISA, the benchmarks are six Levels (1-6), with students performing at or above Level 2 (scores above 420) viewed as having achieved “minimum proficiency”.
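As a rough illustration of how these cut-offs work, the sketch below checks a scale score against the thresholds quoted above. The function name and the dictionary layout are hypothetical, and the PISA rule follows the wording here literally ("scores above 420"); the official Level 2 boundaries are defined separately for each subject, so treat this as an approximation.

```python
# Thresholds as quoted in the text; the names below are illustrative, not official.
# "inclusive" records whether a score exactly at the cut-off counts.
MINIMUM_PROFICIENCY = {
    "TIMSS": {"cutoff": 475, "inclusive": True},   # Intermediate benchmark: 475 or above
    "PIRLS": {"cutoff": 475, "inclusive": True},   # Intermediate benchmark: 475 or above
    "PISA": {"cutoff": 420, "inclusive": False},   # Level 2: "scores above 420" per the text
}


def meets_minimum_proficiency(assessment, score):
    """Return True if a scale score reaches the minimum-proficiency cut-off."""
    rule = MINIMUM_PROFICIENCY[assessment]
    if rule["inclusive"]:
        return score >= rule["cutoff"]
    return score > rule["cutoff"]


for assessment, score in [("TIMSS", 475), ("PIRLS", 460), ("PISA", 431)]:
    print(assessment, score, meets_minimum_proficiency(assessment, score))
```

Note that the reported percentages are based on individual students' scores, so a country's average score on its own does not determine what share of its students reach the benchmark.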
There are very large differences in the scores of the highest- and lowest-performing countries on each of these tests. For example, average scores on the 2011 PIRLS test ranged from a low of 310 for Morocco to a high of 571 for Hong Kong SAR, China – a vast difference of 261 points.
Lower-income countries are far more likely to score below the international mean and to have smaller percentages of students reaching minimum proficiency. Higher-income countries, particularly those in Asia, tend to score above the mean and to have the majority of their students reaching minimum proficiency. This suggests that countries have very different starting points for getting to the SDG goal and that lower-income countries have much further to go, at least judging by their performance on these international assessments.
Trends in average scores and in the percentages of students reaching minimum levels of proficiency on TIMSS (1995-2011), PIRLS (2001-2011), and PISA (2000-2012) reveal the following:

Countries are far more likely to see significant test score improvements over time at the primary level, particularly in math. For example, around three-quarters of the countries participating in TIMSS at the fourth grade level in 1995 and 2011 saw significant increases in their math scores over that time period. Around half of the countries that participated in PIRLS in 2001 and 2011 saw significant increases in their fourth grade reading scores during this period. In contrast, only one-third of the countries participating in TIMSS at the eighth grade level in 1995 and 2011 saw significant increases in their math scores over this 16-year period. In addition, only about one-third of the countries participating in PISA have seen significant increases in their reading (2000-2009) or math (2003-2012) trend scores.

Countries can make significant gains in their performance levels over time regardless of where they’re starting from. Several high-performing countries, such as Hong Kong SAR, China, Korea, and Singapore, continue to impress with steady gains on PIRLS, TIMSS, and PISA over time. However, several lower-performing countries also have shown impressive gains.
For example, Brazil’s score on the PISA math test increased by 35 points between 2003 and 2012, the largest increase for any country over that time period. Poland saw an increase of 28 points on the PISA math test over the same time period, moving the country from below to above the OECD average. These two countries also had the largest increases in the percentage of their students reaching minimum proficiency on the test (8.1 and 7.7 percentage points respectively). Since 41 points on the PISA scale is equivalent to one year of formal schooling, all of these gains are quite significant.
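To put those gains in perspective, here is a minimal sketch that converts the PISA math gains cited above into years of schooling, using the 41-point rule of thumb quoted in the text. The constant and function names are illustrative, and the conversion is only as precise as that rule of thumb.

```python
# Rule of thumb from the text: 41 PISA scale points ~ one year of formal schooling.
POINTS_PER_SCHOOL_YEAR = 41

# PISA math score gains between 2003 and 2012, as cited in the text.
pisa_math_gains = {"Brazil": 35, "Poland": 28}


def schooling_years(points):
    """Convert a PISA scale-point gain into approximate years of schooling."""
    return points / POINTS_PER_SCHOOL_YEAR


for country, gain in pisa_math_gains.items():
    print(f"{country}: {gain} points ~ {schooling_years(gain):.2f} years of schooling")
```

On this conversion, Brazil's gain works out to a little over four-fifths of a school year and Poland's to about two-thirds of one.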

While some countries achieve very large increases over time, most experience more modest changes. In fact, some countries see declines. For example, while Portugal’s score on TIMSS at the fourth grade increased by an impressive 90 points between 1995 and 2011, the typical increase in math scores over this time period was 35.5 points (an average of 2.2 scale points per year).
Portugal also showed the biggest increase – 43 points – in the percentage of students reaching minimum proficiency on the fourth grade TIMSS math test between 1995 and 2011. However, the typical increase for countries was 12.4 points, a gain of less than one percentage point per year.
While Peru had a sizeable increase of 43 points in its PISA reading scores between 2000 and 2009, the typical increase for countries was 4.2 points (roughly half a scale point per year). In addition, five countries saw a significant decline in their reading scores over this time period.
Chile showed the biggest increase – 17.6 points – in the percentage of students reaching minimum proficiency on the PISA reading test between 2000 and 2009. However, the typical increase for countries over this 9-year period was 2.3 points (less than one percentage point per year).
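The per-year figures quoted in this section come from dividing each typical change by the length of the period. The sketch below reproduces that arithmetic from the numbers in the text; the variable and function names are illustrative.

```python
# (description, typical total change, years in the period), all taken from the text.
typical_changes = [
    ("TIMSS grade 4 math score, 1995-2011", 35.5, 16),
    ("TIMSS grade 4 math, % at minimum proficiency, 1995-2011", 12.4, 16),
    ("PISA reading score, 2000-2009", 4.2, 9),
    ("PISA reading, % at minimum proficiency, 2000-2009", 2.3, 9),
]


def per_year(total_change, years):
    """Average change per year over the period (simple linear rate)."""
    return total_change / years


for description, change, years in typical_changes:
    print(f"{description}: {per_year(change, years):.2f} per year")
```

The results are roughly 2.2 scale points, 0.8 percentage points, 0.5 scale points, and 0.3 percentage points per year, in line with the figures given above. A simple average like this assumes steady change across the period, which individual countries may not follow.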
It’s important to point out that since many lower-income countries have only recently begun to participate in TIMSS, PIRLS, and PISA, there is not as much long-term trend data available for them as for higher-income countries. However, the data that are available show roughly similar trends in the magnitude and direction of changes in student performance.
It’s also important to note that countries will be able to choose among a variety of assessment tools – international, regional, or national in nature – for monitoring their progress towards minimum proficiency under the SDGs. Whichever tools they use, the lessons gleaned from long-term trends on TIMSS, PIRLS, and PISA should be useful for managing expectations, including what is possible at the high end.
The TIMSS, PIRLS, and PISA data suggest that improvements in the percentage of students reaching minimum proficiency on reading and math tests are very possible for most countries, but likely to be relatively modest in the short term, particularly if minimum proficiency is defined similarly to these three tests.
Of course, prior score trends are not destiny and only show what has been typical on these types of system-level monitoring exercises in the past, not necessarily what is possible in the future. Regardless, when you investigate countries like Brazil, Poland, or Russia that have shown large improvements on TIMSS, PIRLS, PISA, or similar assessments over time, it becomes crystal clear that test score improvements do not happen by chance, but are linked to deliberate efforts on the part of country policymakers and other stakeholders to improve education quality and opportunities for all students.
Meaningful improvements in test scores are possible in a variety of country contexts, but they take hard work and commitment. Unfortunately, there are no shortcuts to learning for all!
Join the Conversation
Great piece, Marguerite. I'm worried, though, about the continued reliance on just a few International Large Scale Assessments (as if those are the only ones that "count"). How can we better leverage national and regional efforts, whether results from Washington DC's PARCC assessments or results from ASER in Pakistan? Should the emphasis be on the USE of the assessments rather than the results themselves? How do we really define what is "comparable"?
I completely agree that we need to help manage expectations in partnership with governments. What are reasonable expectations for gains? Against which measures? What is the most equitable way to encourage measurement of progress? One key message we need to continue to share is that (for most countries) if you want to improve overall means you need to focus on moving results for the bottom quintile(s).
More thoughts here: http://bit.ly/SDG411blog
Thanks for sharing your thoughts on this important topic. One way to better leverage national and regional assessment efforts would be to create more systematic mechanisms for those involved in these assessments to engage with each other at the policy and technical levels in concert with global and international bodies. This would create opportunities for synergies, capacity building, and knowledge and data sharing. One possible mechanism for doing this under the SDGs will be the Global Alliance to Monitor Learning, overseen by the UNESCO Institute for Statistics (https://sdg.uis.unesco.org/2016/05/13/gettingdowntobusinesstheglob…). The use of the results from assessments is the justification for why we administer them in the first place, but the effectiveness of that use depends on how good the data are. Making use of inaccurate data will not necessarily improve student learning and may actually make things worse. Attention needs to be paid both to generating valid and reliable data on student learning and to making good use of that data. One without the other will not work.
Thanks, Marguerite, for your blog.
Can you clarify if you are asking (1) what level the bar should be at in terms of attainment, or (2) what proportion of children with "minimum proficiency" should be set as the target? @stuartpjohnson