Teachers should be at the center of policies focused on improving education systems. Without effective teachers promoting positive interactions with their students and carrying out meaningful educational activities in the classroom, it will be challenging to improve students' learning. However, policymakers and other stakeholders interested in supporting teachers have limited information to make the right decisions and allocate resources where they are needed the most.
Classroom observation tools that capture the quality of teaching practices help produce this information, which can then be used to provide feedback to teachers and education systems. However, for this information to be accurate, these tools must adhere to the highest technical standards of reliability and validity. One factor that can reduce the reliability and validity of classroom observation tools is bias in the scoring process.
In simple terms, bias is any source of error that reduces the precision of the information collected or distorts its interpretation. With classroom observation tools, bias may stem from the subjective criteria enumerators apply when scoring or from their biased perceptions of teachers' performance. Bias is a problem because it reduces the accuracy of the information produced by classroom observations and makes that information harder to interpret and use. Imagine that multiple enumerators are using the same classroom observation tool, but each places importance on different aspects of teaching practice. Some may be more inclined than others to assign higher scores because of their idiosyncratic beliefs. It would then be difficult to interpret teaching quality scores, or to tell how much they reflect the quality of teaching practices rather than the biases of enumerators.
Over the past few years, the World Bank has supported the development and continuous improvement of the Teach classroom observation tool, which is designed to capture the quality of teaching practices in the classroom. Since its development, the team behind Teach has been analyzing available worldwide data to report on the tool's technical properties (see 1, 2, and 3), in an effort to verify that Teach meets the technical standards of a high-quality classroom observation tool. For instance, Teach appears to measure criteria deemed essential for effective classroom practice. Moreover, higher teaching quality, as measured by Teach, has been linked to higher student achievement in language and mathematics.
As the use of Teach expands worldwide to measure and support the improvement of teaching practices, the team strives to ensure that its scores accurately reflect classroom practices. Taking advantage of Teach data from low- and middle-income countries, we used a set of analytical tools from the field of psychometrics to determine to what extent enumerator bias exists and how much it affects the scores produced by Teach. The findings of this study can be found in a recently published chapter, "A generalizability study of Teach, a classroom observation tool," in the Quantitative Psychology series. Data from four countries located in South Asia, Sub-Saharan Africa, East Asia and the Pacific, and Latin America and the Caribbean were gathered, harmonized, and analyzed to identify the extent to which raters, or enumerators, contributed to bias in Teach scores.
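For readers curious about what a generalizability analysis involves, the sketch below splits the variance in a table of scores into a teacher component, an enumerator component, and a residual. It is only an illustration and not the analysis from the chapter: it assumes a simplified design in which every enumerator scores every teacher exactly once, the function name `variance_shares` is hypothetical, and the data are simulated on a continuous scale rather than drawn from any Teach dataset.

```python
import numpy as np


def variance_shares(scores):
    """Estimate variance shares for a fully crossed teacher x enumerator table.

    scores: 2-D array, rows = teachers, columns = enumerators,
    one score per cell (an assumed, simplified design).
    """
    scores = np.asarray(scores, dtype=float)
    n_teachers, n_enumerators = scores.shape
    grand_mean = scores.mean()

    # Sums of squares for each facet of the two-way layout.
    ss_teacher = n_enumerators * np.sum((scores.mean(axis=1) - grand_mean) ** 2)
    ss_enumerator = n_teachers * np.sum((scores.mean(axis=0) - grand_mean) ** 2)
    ss_total = np.sum((scores - grand_mean) ** 2)
    ss_residual = ss_total - ss_teacher - ss_enumerator

    # Mean squares.
    ms_teacher = ss_teacher / (n_teachers - 1)
    ms_enumerator = ss_enumerator / (n_enumerators - 1)
    ms_residual = ss_residual / ((n_teachers - 1) * (n_enumerators - 1))

    # Variance components from expected mean squares; negatives are set to 0.
    var_residual = max(ms_residual, 0.0)
    var_teacher = max((ms_teacher - ms_residual) / n_enumerators, 0.0)
    var_enumerator = max((ms_enumerator - ms_residual) / n_teachers, 0.0)

    total = var_teacher + var_enumerator + var_residual
    if total == 0.0:
        raise ValueError("scores have no variance to decompose")
    return {
        "teacher": var_teacher / total,
        "enumerator": var_enumerator / total,
        "residual": var_residual / total,
    }


# Simulated (hypothetical) data: large teacher differences, small enumerator
# leniency differences, plus unexplained noise.
rng = np.random.default_rng(0)
teacher_effect = rng.normal(0.0, 1.0, size=(200, 1))
enumerator_effect = rng.normal(0.0, 0.2, size=(1, 20))
noise = rng.normal(0.0, 0.5, size=(200, 20))
simulated_scores = 3.0 + teacher_effect + enumerator_effect + noise

shares = variance_shares(simulated_scores)
print(shares)
```

In a table like this, a small enumerator share relative to the teacher share would indicate that score differences mostly reflect differences between teachers rather than differences in how strictly individual enumerators score. A real study would typically include further facets, such as the items of the tool, and would need to handle designs where not every enumerator observes every teacher.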
What are the results of this study? In summary, the scores produced by the Teach classroom observation tool mostly reflect the aspects of teaching quality measured by each of its items and the performance of the teacher, rather than enumerator bias. This is a positive finding, indicating that Teach produces accurate, valuable information for supporting teachers. However, our team recognizes that other unreported sources of bias could influence the scores produced by Teach. We therefore explored other possibilities, such as the school location or the subject taught by the teacher during the observation, and found no significant contribution of bias from them.
How can users of classroom observation tools like Teach ensure that enumerators are only a minor source of score bias? Making sure that enumerators go through proper training before they collect any data from classrooms has been shown to reduce bias. In the case of Teach, certified master trainers, training and administration protocols, and training guidance and resources are publicly available to ensure that enumerators produce precise scores. The Teach enumerator training uses locally sourced practice coding videos that match the variety of settings in which classroom data are collected. Additionally, clear examples of the teaching practices that enumerators are expected to observe and score can help them rate teaching performance with minimal bias. In the case of Teach, examples are provided for each of the behaviors to be scored by enumerators.
This study sheds light on possible bias in assessing teacher practices, an important topic often overlooked during the development of classroom observation tools. Enumerators represent a minimal source of bias in the Teach tool's scores, which supports the benefit of training enumerators appropriately with examples of representative teaching behaviors. Moving forward, our team seeks to expand the global understanding of the impact of teaching quality on student achievement. We hope that the data produced by our study will help bridge the gap between research and practice by improving the understanding of how the effective use of classroom observation tools can improve teaching and learning practices.
To learn more about the generalizability study on enumerator bias, please see the recently published article that presents this work. If you want a copy of the article or are interested in learning more about psychometrics and evaluation projects around the Teach set of classroom observation tools, please contact Ezequiel Molina, Adelle Pushparatnam, and Diego Luna Bazaldua.