Between a Rock and a Black Box


When we move to the Common Core, standardized tests will require students to write essays, solve complex multi-step problems, and explain their solutions.

This is a good thing: a positive step beyond bubbling in answers on multiple-choice tests.

But accurately scoring long essays (and students’ responses to complex problems) is hard.  If humans score the essays, the raters can be biased or just plain get tired, so multiple raters and lots of training are needed to ensure that the scores are reliable.  This costs serious money.  If computers score the essays, it’s much cheaper.  Recent research claims that computer software can score tests with reliability similar to that of human raters.  The Hewlett Foundation sponsored an X-Prize-style competition to build a better computer grader.  Entrants analyzed 17,500 essays that had already been graded by humans, and a study of the entries found that the best programs are “capable of producing scores similar to human scores.”

However, using computer software to score essays introduces a new problem.

Computers can’t really understand what they read, so the scoring algorithms have to rely on an approximation to quality: finding and counting instances of some characteristic of good writing.  For instance, Randy Bennett, an assessment expert, recently spoke at UCLA questioning the validity of computer-based scoring, in part by recounting how one of the finalists in this Hewlett competition developed a very competent algorithm that was based on counting the number of commas in a piece of writing.

In a way, this makes sense, since it’s likely that, on average, more sophisticated writers use more complex sentence structure and thus, more commas.  But, as we all know, more commas ≠ better ‘riting.  This points to a larger problem with computer-based scoring.  The folks who are creating this software don’t generally open up the black box and tell everyone how the algorithms work.  If they did, people could cheat just by adding more commas.  But if they don’t, there’s no way to assess the validity of these scores: are they measuring what they are supposed to be measuring?  On top of this, some critics who are more philosophically inclined complain that having computers score essays degrades the whole nature of the writing process, which should be an inherently interpersonal communicative act.
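To make the comma problem concrete, here is a deliberately naive sketch of a surface-feature scorer. Nothing in it comes from any real grading engine; the features and weights are invented purely for illustration.

```python
# Toy illustration of a "black box" scorer: it never understands the essay,
# it only counts surface proxies for quality.  Weights are made up here;
# a real system would fit them to a corpus of human-scored essays.

def surface_features(essay: str) -> dict:
    """Extract crude surface proxies for writing quality."""
    words = essay.split()
    sentences = [s for s in essay.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    return {
        "commas": essay.count(","),
        "words": len(words),
        "avg_sentence_len": len(words) / max(len(sentences), 1),
    }

# Hypothetical weights, invented for this example.
WEIGHTS = {"commas": 0.5, "words": 0.01, "avg_sentence_len": 0.1}

def score(essay: str) -> float:
    feats = surface_features(essay)
    return sum(WEIGHTS[k] * v for k, v in feats.items())

plain = "The dog ran. The dog sat."
gamed = "The dog, however, ran. The dog, meanwhile, sat."
assert score(gamed) > score(plain)  # sprinkling in commas alone raises the score
```

The last line is the whole validity problem in miniature: once the features leak, the score can be gamed without the writing getting any better.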

So, it seems like we’re stuck.  We can’t afford enough human raters to assess every student in a complex way like the Common Core (and common sense) tell us we should.  But computers can’t really assess complex thinking, and using them too much threatens to derail the complexity we’re trying to encourage.

Here’s what we’re likely to end up with: Common Core essays will be largely scored by machines, checked for accuracy by a few underemployed adults with college degrees, and then the results will be sent back with some vague feedback.  The scores will be, on average, wholly predictable by zip code and socioeconomic status.

There’s a better way.

Rather than focusing on a centralized, high-stakes accountability system, we can create systems of interdependence so that our schools can hold one another accountable.  We want students in Compton middle schools to be able to write as well as those in Manhattan Beach, right?  (And solve complex math problems, conduct science experiments, etc.)  Have the Compton essays sent to Manhattan Beach, the Manhattan Beach essays sent to Inglewood, and the Inglewood essays sent to West Hollywood.  And then switch it up next semester.

Scoring and providing feedback on essays from a wider range of students could open teachers’ eyes to a broadened perspective on what they can and should expect from their own students.  Randomly selected essays could still be double-scored by both machines and expert human raters, but teachers would now assume some ownership in the success of the Common Core assessments and standards.  Such a system could also be the start of building deeper partnerships among schools, so that we could build an education system that works toward the success of all children, rather than placing our schools into artificial Hunger Games competitions pitting charter against magnet and district against district.

What’s more, if we move our assessments closer to our teachers and students, and involve them in the day-to-day creation of what is quality work, then we can make our assessments part of our instruction, and move our instruction closer to the needs identified through the tests.  In other words, we can make sure that our assessments help students learn more.

So, what do you think?  How can we make sure our new assessments help our students learn more?  Respond, 200 words or less.

– Kevin


4 thoughts on “Between a Rock and a Black Box”

  1. I am not sure where the inter-district essay scoring would go, but in addition to the possible benefits, I think having more interconnections between high-affluence and low-affluence districts in the same region would pay off in many ways.

  2. I think that increasing the scope for educators to see the work of other educators (and why not somehow include students in this too) might indeed be beneficial. I would add that we also need a system that recognizes the importance of a metalanguage to talk about the qualities of student products in ways that better get at their achievements and potentials. A history essay, for example, needs to mine the particular rhetorical resources that constitute the genres typically employed by historians. Knowing about these rhetorical resources and being able to name their particular lexico-grammatical formations as well as their meaning functions and potentials would more than enhance the design challenge of writing an essay. Having facility with such a metalanguage would be analogous to a jazz composer understanding the role of chord progressions – not a prescription for what to do but useful and generative resources to draw upon in the act of creation. I think all this is important to say because too many of our students do poorly with academic discourse partly because they are not sufficiently given access to the various repertoires of rhetorical resources that constitute academic writing and thinking. Of course, providing such access is tricky because it means showing how rhetorical resources are intimately connected to the values and ethos of an academic field’s practice (e.g., nominalized sentence structures found in scientific texts help to create an objectifying stance where human agency is downplayed and causal processes are more valued). This means teachers would need to be equipped with an understanding of how the lexico-grammatical aspect of language is a social-cultural phenomenon (i.e., not a set of formal language rules) and have a metalanguage to better talk about all this.
I can think of no better such metalanguage than the one developed by the Sydney School, a group of researchers inspired by the sociological theories of Bernstein, the linguistics of Halliday, and the pedagogical theories of Vygotsky.

  3. I’m not convinced we should “move to the Common Core,” as you put it, so much as move beyond it. Basically, I worry whether its reception (by teachers and the curriculum publishing industry) will make the necessary differences to educate for a multi-modal, multi-discourse and entrepreneurially demanding 21st century of culturally diverse people. It is especially lamentable that it has ignored rafts of internationally valued educational research in linguistics, discourse analysis and the sociology of language – research that would have provided important pedagogical principles around such things as developmental appropriateness, a functional understanding of grammar, language acquisition (when English isn’t your mother tongue), the rhetorical structures of academic language, and smart teacher scaffolding. In this regard, the CCSS is a pathetically sad missed opportunity to have been a truly world-class document.
