The Sustainable Development Goal for education (SDG4) aims to ensure that by 2030 all boys and girls “complete free, equitable and quality primary and secondary education leading to relevant and effective learning outcomes”. The indicator for tracking progress towards this target requires countries to report on the “Proportion of children and young people: (a) in grades 2/3; (b) at the end of primary; and (c) at the end of lower secondary achieving at least a minimum proficiency level in (i) reading and (ii) mathematics, by sex.” The World Bank will soon launch a new learning target that is closely aligned with SDG4 but aims to specifically encourage progress in the foundational area of reading.
Both SDG4 and the new World Bank target bring much needed attention to the foundational skills that all children need in order to thrive in today’s global economy. Reporting on progress towards these targets, however, will require countries to “up their game” in terms of the information value of the data generated by their existing large-scale assessments of reading and mathematics. Countries will need to consider how to enhance the value of these data so that they can be used to monitor education reforms and to report progress to both local and global audiences.
Before proceeding, it is important to differentiate these large-scale assessments used for system-level monitoring from the high-stakes public examinations found in many countries. The former are used to report on aggregate levels of performance in an education system, while the latter are used to make decisions about individual students, such as whether they graduate from secondary school or gain admission to university. The issues discussed here pertain to large-scale assessments for system-level monitoring.
Figure 1 shows a continuum of large-scale assessment data reporting, moving from data with less informational value to data with more informational value. Table 1 offers a brief definition for each type of data and summarizes some of its advantages and disadvantages for countries to consider.
Figure 1. Continuum of large-scale assessment data reporting
Table 1. Definitions, advantages and disadvantages of different types of assessment data
Many World Bank client countries have room to improve the information value of their large-scale assessment data. Some have national assessments that report results only as an average percent correct. The percent-correct score is easy to calculate and understand and is often used for score reporting on classroom tests. However, a percent-correct score does not reveal how easy or difficult a test is. A student who achieves 57 percent on a very difficult mathematics test may know more than a student who scores 80 percent on a very easy mathematics test. To address this issue, many countries have started moving to scale score reporting for their large-scale assessments. The reported scale scores are obtained by statistically converting raw scores onto a common scale to account for differences in difficulty across different versions of a test. For example, a scale of 200 to 800 could be selected as the scale score range for a test with a possible range of raw scores from 0 to 100. On an easier test, a student needs to answer more questions correctly to get a scale score of 510; on a more difficult test, a student can get the same scale score by answering fewer questions correctly.
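The idea of converting raw scores onto a common scale can be sketched with a simple linear transformation. This is an illustration only: the slope and intercept values below are invented, and in practice such constants would come from a formal equating study, not from hand-picked numbers.

```python
# Illustrative sketch: mapping raw scores from two test forms of
# different difficulty onto a common 200-800 reporting scale.
# The slope/intercept constants are hypothetical.

def to_scale_score(raw, slope, intercept, lo=200, hi=800):
    """Convert a raw score to a scale score, clamped to the reporting range."""
    scaled = slope * raw + intercept
    return max(lo, min(hi, round(scaled)))

# Hypothetical equating constants: the harder form gets a steeper
# slope, so fewer correct answers yield the same scale score.
EASY_FORM = {"slope": 5.0, "intercept": 210.0}   # raw scores 0-100
HARD_FORM = {"slope": 6.5, "intercept": 215.0}   # raw scores 0-100

# A raw score of 60 on the easier form and about 45 on the harder
# form land near the same point on the common scale.
print(to_scale_score(60, **EASY_FORM))  # 510
print(to_scale_score(45, **HARD_FORM))  # 508
```

The clamping to 200–800 simply keeps reported scores inside the chosen reporting range; the key point is that the transformation differs by test form, so the same scale score reflects comparable performance regardless of which form a student took.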
Several countries also have, or are developing, performance levels to facilitate interpretation of the scale scores. An important step in this journey may be the creation of a common framework for learning and assessment in the form of national learning goals or curricula. This set of common goals helps inform the large-scale assessment framework and provides a basis for constructing and interpreting performance standards and levels. As part of its efforts to support countries in reporting on SDG4, the UNESCO Institute for Statistics’ (UIS) Global Alliance to Monitor Learning (GAML) has developed global content frameworks for reading and mathematics. These frameworks aim to describe core competencies, domains, and constructs for each subject area that can be used by countries to inform their national learning goals, curricula, and assessment frameworks.
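Performance levels attach substantive meaning to scale scores through cut scores. The sketch below uses invented level labels and cut scores purely for illustration; real cut scores are established through a formal standard-setting exercise, not chosen arbitrarily.

```python
# Hypothetical sketch: mapping scale scores (200-800) to performance
# levels via cut scores. Labels and thresholds are invented.

CUT_SCORES = [  # (minimum scale score, level label), highest first
    (700, "Advanced"),
    (550, "Proficient"),
    (400, "Minimum proficiency"),
]

def performance_level(scale_score):
    """Return the performance level for a given scale score."""
    for cut, label in CUT_SCORES:
        if scale_score >= cut:
            return label
    return "Below minimum proficiency"

print(performance_level(575))  # Proficient
```

Reporting the share of students at or above a “minimum proficiency” level, rather than an average score alone, is exactly the kind of statistic the SDG4 indicator asks countries to produce.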
As of now, very few countries can report trend scores. In other words, most countries are still unable to tell by looking at their national assessment data whether learning levels are improving or declining over time. One way to address this issue is by embedding a common set of questions across different cycles of their large-scale assessment exercise. For example, a country could create a set of ‘anchor’ questions for a Grade 3 reading test and include these questions every time the test is administered. Other questions on the test would change, but the common questions would remain. Information about test-taker performance on these common questions across different administrations of the assessment could then be used to link scores and report them on the same scale. Another approach (although less commonly used) is to have the same group of students take different versions of the test and use their responses to create links across the versions (a different version could then be used in each subsequent administration of the assessment). Either way, it is important to have technical advisory support, at least in the initial stages, to ensure that the linkages are done correctly.
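The anchor-question approach can be illustrated with a highly simplified linear link in the style of the mean-sigma method. The numbers below are invented, and the method is applied here to anchor-item percent-correct values only for simplicity; operational linking typically works with IRT item parameters and requires specialist technical support, as noted above.

```python
# Simplified illustration of common-item (anchor) linking: scores
# from a new assessment cycle are placed on the old cycle's scale
# using a linear link estimated from performance on shared anchor
# questions. All data values are hypothetical.

import statistics

# Hypothetical percent-correct on the SAME anchor questions in two cycles.
anchors_old = [0.62, 0.55, 0.71, 0.48, 0.66]
anchors_new = [0.58, 0.50, 0.69, 0.44, 0.60]

m_old, s_old = statistics.mean(anchors_old), statistics.stdev(anchors_old)
m_new, s_new = statistics.mean(anchors_new), statistics.stdev(anchors_new)

# Mean-sigma style link: match the mean and spread of anchor performance.
slope = s_old / s_new
intercept = m_old - slope * m_new

def link_to_old_scale(score_new):
    """Place a new-cycle score on the old cycle's scale."""
    return slope * score_new + intercept
```

By construction, the link maps the new cycle's mean anchor performance onto the old cycle's mean, so genuine changes in learning show up as score movement rather than being confounded with changes in test difficulty.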
The World Bank is working with UIS and other partners to create additional resources and tools for countries to help them increase the information value of their national assessments.
We will share more about this exciting work in the coming months, including on a new World Bank Learning Assessment Platform (LeAP) that will provide technical support for the design and implementation of learning assessments; financing for the implementation of learning assessments through World Bank projects; and knowledge products and capacity building tools on key assessment topics.