“If you had to pull a single lever to eliminate the achievement gap, what would it be?” It was a hefty question posed to kick off a casual lunchtime discussion amongst a group of program directors leading the way in school reform. Even with all their experience supporting new school leaders and actively shaping the national reform agenda in education, the group fell silent. But it wasn’t the immensity of the question, or the many possible responses, that left me tongue-tied. Even if one were able, for the sake of argument, to identify a single intervention, how could a measure of its effectiveness possibly be reduced to changes in student test scores? And when would we decide that any differences in student outcomes had been eliminated?
Recent players in the field of school reform adamantly demand doing away with “business as usual.” Incompetent systems of educational administration, union bureaucracy, and school leadership complacent in their approach to opportunity gaps are prime targets for removal. In their place, business administration models, talent from non-education backgrounds, and leaders willing to disrupt system norms should be installed to revolutionize an otherwise broken institution.
Another common phrase used in this forum, “moving the needle in education reform,” implies that each of these corrective strategies should be gauged against some standard of improvement immune to political sway, public sentiment, and the obstinate status quo. The effectiveness of any business venture is weighed against a bottom line. For innovative approaches to education reform, that bottom line is student achievement. But what do business leaders, and other pioneering reform advocates, really know about the measurement of student knowledge?
For certain, educational measurement is not an easy topic to navigate. In general, it aims to measure students’ abilities and knowledge attainment in content areas such as reading, mathematics, and science. Integral to this is the study of psychometrics, which is primarily focused on the construction and validation of measurement instruments like surveys and tests. Psychometricians look closely not only at the development of instruments and procedures for psychological measurement, but also at the development and refinement of theoretical approaches to measuring individual knowledge, abilities, attitudes, and personality traits.
In other words, using testing to determine the effectiveness of a policy, practice, or reform relies upon careful, complex work in understanding what, exactly, is measured by tests and surveys and how differences in individuals’ cognitive states can be accurately and reliably assessed. We must then understand how to interpret the resulting data patterns. Can we definitively pinpoint a cause-and-effect relationship between changes in district-level human resource policies and student test scores, for example? Or between a school’s technology infrastructure and student performance on state-level language exams?
The intensive thought and work invested in standardized student testing have produced some of the most reliable indicators of acquired student knowledge. To take a recent example, in the search for reliable measures of student achievement, the Measures of Effective Teaching (MET) project identified the Balanced Assessment in Mathematics (BAM) as a strong complement to state examinations in K-12 math. Tests like BAM present a collection of cognitively challenging, higher-order thinking questions tied to curricular practices, measure student knowledge consistently across different student groups, and reflect recent changes in methodological approaches to math; it is not difficult to see why such measures of student achievement have become highly regarded metrics. It is imperative to recognize, however, that we, as advocates of education improvement, also assign value to these metrics, oftentimes beyond the scope of whatever information they are capable of conveying. In the case of BAM, despite its strength as an independent assessment, the final findings of the MET project suggested that multiple measures (that is, a combination of classroom observations, student surveys, and measures of student achievement) produce more consistent ratings of teacher effectiveness than student achievement measures alone.
Instead, the conversation around school reform seems caught in the habit of invoking the consequential language of “accountability,” where it is the responsibility of singular data points to “drive decision-making.” Student test scores have become high-stakes currency in the allocation of public funding, in attracting philanthropic investment, and in substantiating changes in school, district, state, and national policies.
Cloaked in buzz phrases such as “moving the needle” or “eliminating the achievement gap,” the inherent complexity of educational measurement is obscured. Suddenly, an organization finds itself trying to measure the effectiveness of every school-level program against test scores, even when improving a district’s transportation system is only indirectly related to student learning. Individual program managers know this challenge well; I’ve watched many wrestle to interpret year-to-year fluctuations in state-level exam scores in an attempt to evaluate the merits of a particular school or intervention. In these ways, charged promises of results, data, and student achievement have become dangerous determinants of policy, made without any knowledge of the history, limitations, and past mistakes of educational measurement. This is especially disturbing when powerful players guarantee blanket statements of “improved student outcomes.”
Surely not everyone is expected to become an expert in psychometrics. But education policy reformers do need some basic fluency in measurement issues. This could be achieved by the conscientious recruitment of key team members who are experts in measurement. In addition to interpreting test results and defining the appropriate boundaries of their application, such experts would contribute valuable insight into the technical intricacies of measurement and the practical and theoretical implications of using measurement techniques in schools.
Incorporating such knowledge might well temper the guarantees of some reformers and lead to more modest claims, but those claims and promises would be realistically grounded in what can reasonably be measured. Declarations of programmatic success and the identification of areas of need might then be derived from more than simplistic interpretations of student test scores.
If we are going to continue elevating the importance of measurement; if we are going to continue referring to student test scores as the barometer of educational success; if school reform is intent on “moving the needle,” then we have the responsibility of knowing what comprises the dial.
– Jennifer Ho