Sustainable Development Goal 4 (SDG 4) sets an ambitious agenda: to ensure inclusive and equitable quality education and promote lifelong learning opportunities for all. To fulfill this commitment and track progress, accurate measures of learning outcomes are essential. Thanks to the Global Alliance to Monitor Learning (GAML), there is a consensus on how minimum standards for learning map onto the scales of international, regional, and national learning assessments.
A handful of countries, such as China, Ethiopia, India, Pakistan, and Uganda, currently use their national assessments to report on the SDGs. Other countries, like Brazil, report on the SDGs using solely international or regional assessments (i.e., LLECE for early grade and primary, and PISA for lower secondary). Nevertheless, Brazil has a biennial, census-based national learning assessment used to monitor learning systematically through the Index of Basic Education Quality (IDEB), an index that factors in average test scores as a measure of education quality.
The coexistence of national and international assessments and different metrics to track an educational system's performance raises the three questions addressed in this blog: (1) How do learning outcomes compare across assessments? (2) How can national and international assessments be aligned? (3) How much does the choice of indicator matter?
1. Comparing the national and international pictures
Comparing two learning assessments is challenging even if they intend to measure the same content and student population. In psychometrics, this challenge is known as linking. It is like comparing two pictures of the same forest, taken days apart, using different lenses, speed, light, and film sensitivity (yes, some of us are old enough to remember film ISO). The picture's timing mirrors pupils being assessed at different grades or moments of the school year. Sampling in each assessment, akin to film sensitivity, can give a more or less precise image of learning and limit our ability to zoom into the picture. Lastly, each assessment has its own scale for measuring proficiency, just like a wide-angle lens captures a broad scope while a telephoto lens allows the photographer to home in on the details of a narrow area. All these factors combined can create contrasting compositions by highlighting distinct aspects of the same subject.
Let's take this analogy to the Brazilian case and compare learning outcomes in reading and mathematics using national and international assessments. To minimize the difference in the pictures' timing, we select the closest match of years and grades. At the end of primary, we compare the 2013 regional assessment of 6th graders and the national assessment of 5th graders. At lower secondary, we look at the PISA 2015 assessment of 15-year-olds and the 2015 national learning assessment (NLA) of 9th graders. The resolutions (or film sensitivity) differ significantly: LLECE and PISA are sample-based and evaluated a few thousand pupils each, while the NLA tested close to 2 million pupils in each grade. Lastly, we need minimum proficiency thresholds. For LLECE and PISA, we rely on the GAML consensus. For the NLA, we use the national advocacy threshold proposed by experts convened in 2006 by "Todos pela Educação".
How do these pictures of learning in Brazil compare? Figure 1 illustrates our findings. Each bar represents the share of students by level, with pupils below the minimum proficiency in orange and above it in blue (each shade corresponding to a proficiency level). The share of students below the minimum proficiency in the national assessment is consistently lower, possibly because: (i) pupils are tested at earlier grades in the national exam; (ii) the national aspirational level might be more ambitious than the international one; or (iii) the scales are not necessarily comparable.
Figure 1 – Share of students in each proficiency level by subject, in national and international assessments
2. Aligning scales of national and international assessments
One of the GAML process's biggest contributions was shifting the conversation from a common scale (which would have required all countries to implement the same assessment to produce a global number) to a Global Proficiency Framework, which describes competencies that students should master to be considered proficient at each grade. This means different assessments of sufficient quality can be used as long as they can be benchmarked against this international standard. Though less precise, this approach has the benefit of building on assessment systems that already exist and can increase country ownership of the results (another critical objective in the SDGs). GAML experts undertook the Herculean task of mapping the proficiency levels of several international (e.g., PISA, PIRLS) and regional (e.g., LLECE, SACMEQ, PASEC) learning assessments onto a continuum and identifying the minimum proficiency levels at early grade, end of primary, and lower secondary.
As mentioned before, some countries have chosen to report on the SDGs using their national learning assessments. This has the advantage of increasing national ownership of the measure and allowing countries to easily compare their own national aspirations with international standards. The mapping to a global scale can be a valuable complement for triangulation and for identifying countries to learn from and benchmark against.
However, for this approach to be valid, it is critical to go beyond the empirical comparison presented in this blog and use more suitable methods to map the scale of a national assessment to the Global Proficiency Framework. As discussed in the latest GAML meeting, which took place in October 2020, countries can do this through:
1. The inclusion of global items in the national learning assessments which can then be used to psychometrically link the two scales;
2. A UIS/IEA Rosetta Stone approach (paper), such as what the Laboratorio de UNESCO/UIS/IEA are piloting in Colombia and Guatemala to link the LLECE scale with the PIRLS scale by applying both assessments to the same group of students; or
3. A policy linking workshop where experts from the national and international assessment map the minimum passing scores on both assessments (policy linking update presented in the 7th GAML meeting).
3. How much does the choice of indicator matter?
Brazil has an impressive education monitoring system – besides assessing students' learning, it tracks individual schooling trajectories through a School Census. More importantly, there is ownership and accountability of education indicators, with the composite Index of Basic Education Quality (IDEB) being widely reported in the national media and actively used to inform policy. The IDEB of each cycle (primary, lower-secondary, and upper-secondary) is the product of the average reading and mathematics test scores for the final grade in the cycle and the average pass rate for the cycle.
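As a sketch of that structure, the index for a cycle can be written as the product of a mean test score (rescaled to a 0–10 range) and the average pass rate. The function name and numbers below are illustrative assumptions, not the official INEP computation:

```python
def ideb(mean_score_0_10, pass_rate):
    """Illustrative IDEB-style index: the product of a standardized
    mean test score (rescaled to 0-10) and the average pass rate (0-1).
    Not the official formula; a sketch of its product structure."""
    return mean_score_0_10 * pass_rate

# Hypothetical cycle: standardized score of 6.0 and a 90% pass rate
print(round(ideb(6.0, 0.9), 2))
```

The product structure means the same index value can come from high learning with weak progression or the reverse, which is one reason to examine its components separately.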
The SDG target 4.1 has a dual commitment with schooling and learning, similar in spirit to the IDEB. However, learning is monitored through the share of students achieving a minimum proficiency level rather than by average scores. Pupils' proficiency can be combined with out-of-school rates, so all children count, as done in the learning poverty indicator. This reflects a commitment to focus our efforts on improving learning for the kids that fail to reach these minimum standards of competencies.
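The combination of out-of-school rates with proficiency shares can be sketched as follows: out-of-school children count as learning-deprived, and in-school children count when they are below minimum proficiency. The function name and input values are illustrative:

```python
def learning_poverty(oos_rate, below_min_prof):
    """Illustrative learning-poverty-style measure: children out of
    school plus the in-school share below minimum proficiency.
    Both inputs are rates between 0 and 1."""
    return oos_rate + (1 - oos_rate) * below_min_prof

# Illustrative numbers: 5% out of school, 40% of enrolled pupils
# below the minimum proficiency level
print(round(learning_poverty(0.05, 0.40), 3))
```

Because every child enters the measure, schooling expansion and learning gains both move the indicator, unlike an average score computed only over test-takers.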
Currently, the IDEB is the leading metric for benchmarking municipal and state systems' performance within Brazil, but what are its blind spots? With recent initiatives to revise this indicator, it is vital to reflect on the country's aspirations for education and how they translate into measures.
The IDEB uses the average test score, which can hide many important aspects of the improvement of education quality. As Figure 2 illustrates, the mean score and the share of students below the national minimum proficiency benchmark do not move at the same speed. For example, the share of students who cannot read proficiently at the end of primary almost halved (from 72% to 39%) between 2007 and 2019, while mean scores increased by only 22%. You can interact with this dashboard to visualize other years, grades, and subjects. This happens because changes in proficiency levels are driven both by changes in the mean score and by changes in the learning distribution (see another blog that dives deeper into this point using PISA data).
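This divergence can be illustrated with a stylized simulation. Assuming (purely for illustration) normally distributed scores with a standard deviation of 50 points and a fixed threshold of 200, a modest shift in the mean cuts the below-threshold share much more sharply:

```python
from statistics import NormalDist

def share_below(mean, sd, threshold):
    """Share of students scoring below the threshold, assuming
    normally distributed proficiency scores (a stylized assumption)."""
    return NormalDist(mean, sd).cdf(threshold)

# Stylized scale: minimum proficiency threshold at 200, sd of 50
before = share_below(175, 50, 200)  # mean of 175 points
after = share_below(214, 50, 200)   # mean up ~22%, to 214 points

print(f"{before:.0%} -> {after:.0%}")  # prints "69% -> 39%"
```

A 22% rise in the mean drops the below-threshold share from about 69% to 39% in this sketch, and the drop would differ again if the spread of the distribution changed, which is exactly why the two indicators move at different speeds.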
Figure 2 – Evolution of mean scores and share of students below minimum proficiency
Learning performance can also appear to improve when the proportion of students participating in the learning assessments is reduced (sometimes intentionally) to exclude low-performing students. IDEB mitigates this type of selection (at least at the school level), which could artificially inflate average learning, as it is only reported for schools with a participation rate (the ratio of students assessed to students enrolled) above 80%. IDEB also considers the progression rate of students throughout the entire education cycle, discouraging the automatic promotion of children who are not learning while also discouraging grade retention to boost test scores.
Going forward, IDEB can be improved. Accountability would be higher if the participation rate entered the formula as a reduction factor, rather than as a criterion for whether IDEB is reported at all. IDEB could also more effectively capture efforts to increase equity and access by also considering completion rates and/or access rates, as a substantial proportion of dropouts happens between grades (or school years). All these choices could help strengthen the incentives generated by IDEB.
Measuring learning is a key part of evidence-based education policy, and the choice of measures matters. Educational systems that want to adopt a performance metric sensitive to learning inequalities should not rely on average scores. The focus on students at the bottom of the learning distribution is the first step toward a greater commitment to equity.
Let us hear your views!