When we move to the Common Core, standardized tests will require students to write essays, solve complex multi-step problems, and explain their solutions.
This is a good thing: a positive move beyond bubbling in answers on multiple-choice tests.
But accurately scoring long essays (and students’ responses to complex problems) is hard. If humans score the essays, the raters can be biased or just plain get tired, so multiple raters and lots of training are needed to ensure that the scores are reliable. That costs serious money. If computers score the essays, it’s much cheaper. Recent research claims that computer software can score tests with reliability similar to that of human raters. The Hewlett Foundation sponsored an X Prize-style competition to build a better computer grader: entrants’ programs analyzed 17,500 essays that had already been graded by humans, and a study of the entries found that the best programs are “capable of producing scores similar to human scores.”
However, using computer software to score essays introduces a new problem.
Computers can’t really understand what they read, so the scoring algorithms have to rely on proxies for quality: finding and counting instances of some characteristic associated with good writing. For instance, Randy Bennett, an assessment expert, recently spoke at UCLA questioning the validity of computer-based scoring, in part by recounting how one of the finalists in the Hewlett competition developed a very competent algorithm based on counting the number of commas in a piece of writing.
In a way, this makes sense, since it’s likely that, on average, more sophisticated writers use more complex sentence structure and thus, more commas. But, as we all know, more commas ≠ better ‘riting. This points to a larger problem with computer-based scoring. The folks who are creating this software don’t generally open up the black box and tell everyone how the algorithms work. If they did, people could cheat just by adding more commas. But if they don’t, there’s no way to assess the validity of these scores: are they measuring what they are supposed to be measuring? On top of this, some critics who are more philosophically inclined complain that having computers score essays degrades the whole nature of the writing process, which should be an inherently interpersonal communicative act.
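To make the worry concrete, here’s a deliberately crude sketch, in Python, of what scoring on a surface proxy looks like, and how easily it can be gamed once the feature is known. The essays, the 1–6 scale, and the weights are all invented for the example; no real vendor’s algorithm is public, and none is this simple.

```python
# Toy illustration only: a "scorer" that predicts an essay grade from a
# single surface feature (comma count). The essays, the 1-6 scale, and
# the weights are made up for this example.

def comma_score(essay, base=1.0, per_comma=0.5, max_score=6.0):
    """Map the number of commas in an essay onto a 1-6 holistic score."""
    return min(max_score, base + per_comma * essay.count(","))

print(comma_score("I like school."))                        # 1.0 -- no commas, low score
print(comma_score("School, for me, is, ideally, a place "
                  "where we grow, learn, and question."))   # 4.0 -- more commas, higher score
print(comma_score("I, like, school, a, lot,,,,,,"))         # 6.0 -- gamed: nonsense, top score
```

The third line is the whole problem in miniature: once you know the feature, the score no longer measures writing at all.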
So, it seems like we’re stuck. We can’t afford enough human raters to assess every student in the complex way the Common Core (and common sense) tell us we should. But computers can’t really assess complex thinking, and relying on them too much threatens to derail the very complexity we’re trying to encourage.
Here’s what we’re likely to end up with: Common Core essays will be largely scored by machines, checked for accuracy by a few underemployed adults with college degrees, and then the results will be sent back with some vague feedback. The scores will be, on average, wholly predictable by zip code and socioeconomic status.
There’s a better way.
Rather than focusing on a centralized, high-stakes accountability system, we can create systems of interdependence so that our schools can hold one another accountable. We want students in Compton middle schools to be able to write as well as those in Manhattan Beach, right? (And solve complex math problems, conduct science experiments, etc.) Have the Compton essays sent to Manhattan Beach, the Manhattan Beach essays sent to Inglewood, and the Inglewood essays sent to West Hollywood. And then switch it up next semester.
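For the programmatically inclined, here’s a minimal sketch of one way the rotation could be organized. The specific shift rule (and the idea of writing it as code at all) is just an illustration of the swap, not a worked-out policy; the school names are the examples above.

```python
# Toy sketch of the swap: each semester, every school scores the essays
# of a different school in a rotated list, and the pairing changes the
# following semester.

schools = ["Compton", "Manhattan Beach", "Inglewood", "West Hollywood"]

def scoring_assignments(schools, semester):
    """Return {writing_school: scoring_school}, shifting the pairing each semester."""
    shift = ((semester - 1) % (len(schools) - 1)) + 1  # 1..n-1, so no school scores itself
    return {s: schools[(i + shift) % len(schools)] for i, s in enumerate(schools)}

for semester in (1, 2, 3):
    print(f"Semester {semester}:", scoring_assignments(schools, semester))
```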
Scoring and providing feedback on essays from a wider range of students could open teachers’ eyes to a broadened perspective on what they can and should expect from their own students. Randomly selected essays could still be double-scored by both machines and expert human raters, but teachers would now assume some ownership in the success of the Common Core assessments and standards. Such a system could also be the start of building deeper partnerships among schools, so that we could build an education system that works toward the success of all children, rather than placing our schools into artificial Hunger Games competitions pitting charter against magnet and district against district.
What’s more, if we move our assessments closer to our teachers and students, and involve them in the day-to-day work of defining what quality work looks like, then we can make our assessments part of our instruction, and move our instruction closer to the needs the tests identify. In other words, we can make sure that our assessments help students learn more.
So, what do you think? How can we make sure our new assessments help our students learn more? Respond in 200 words or less.