What’s Galileo got to do with how we teach math?


Turns out that Copernicus’ new theory (Earth revolves around Sun) had more trouble getting accepted than is commonly known.

According to Alan Chalmers, a philosopher of science and author of What Is This Thing Called Science?, Copernicus’ ideas faced opposition not only from religious orthodoxy but also from some pretty stubborn facts that the new theory could not explain.  For instance, if the Earth was really spinning, why didn’t rocks and mud spin off the Earth?  And why did blocks of lead fall straight down from the Tower of Pisa, instead of landing farther away (or closer) when the Earth spun beneath them as they fell?

These facts, and a host of others that contradicted the Copernican view, could not be adequately accounted for by scientists for dozens, even hundreds, of years.  Yet Galileo abandoned the Ptolemaic theory (Earth is center of Universe) in spite of them, while many others held fast to it.


Chalmers suggests that both the Ptolemaic and the Copernican views were almost hopelessly complex, convoluted and confronted with dozens of facts that they couldn’t easily explain, but that the main attraction of the Copernican theory lay in the “neat way it explained a number of features…which could be explained in the rival Ptolemaic theory only in an unattractive, artificial way.”

Neat…attractive.  One of the single largest leaps of scientific progress in history came about in part because of “evidence” that belongs more to the realm of poetry than science.

Is science no more different from poetry than a sonnet is from free verse?  And if the criteria are unclear for abandoning the Ptolemaic theory in favor of the Copernican, then how can we decide what evidence we will need in order to determine whether to abandon or defend any of our current theories?  What do we do when faced with such a situation, where the facts line up on both sides of the debate?

This conundrum seems to be even worse when we consider that this is the situation as it exists in the hard sciences, and our debates in the social sciences are likely to be even more difficult to conclusively resolve.  Can we ever make progress?  Can we ever really learn anything?

A possible solution comes from Karl Popper, generally regarded as one of the greatest philosophers of science.  Popper offers a neat three-step logic model for learning that he then expands to four steps and uses to explain everything from the behavior of plants to natural selection to the scientific method and the progress of scientific theories. In three steps, the common sense method of learning through trial and error goes Problem – Attempted Solutions – Elimination of unsuccessful attempts.

Here’s a quick example: a tree needs water – sends out roots in all directions – roots that don’t reach water wither and die, those that reach water grow longer and stronger.  Popper then elegantly uses this schema to explain Darwin’s revolutionary theory: change in environment causes danger for a species – lots of genetic mutations – most of these mutations kill the organism, but perhaps one results in a successful adaptation.

The scientific method improves on this innate procedure by adding circularity and conceptualizing attempted solutions as theories: Old Problem – formation of tentative theories – attempts at elimination (or we could say falsification) of these theories – New Problems.

So maybe this model gets us out of the problems Chalmers raised.  What do we do when we are faced with a situation where the facts line up on both sides of the debate?  We propose theories and see if we can falsify them.  In other words, we engage in research!  That’s what Galileo did.  He saw that some facts, and a very few thinkers, seemed to support this new Copernican theory.  He decided to open his mind, and test out these two competing theories with a series of experiments.  Galileo risked the wrath of the Inquisition, but his work eventually changed the thinking of an entire culture.

You might think that this solution would be encouraging to a budding researcher such as me, but I’m not sure I feel encouraged.  Instead, I find myself wondering if this solution even applies to educational research, or if, perhaps, we are stuck back in 1543, when ideological orthodoxy, rather than scientific evidence, decided what could be accepted as truth.

Let’s look at a current problem and see if we can find out.

Problem: kids in the U.S. aren’t learning mathematical problem-solving skills like they should be.

Tentative Theories:

  • Teachers ought to require students to think up their own solutions to problems and make connections across concepts.


  • Teachers ought to teach students to apply a certain procedure in a given situation.

Attempts at elimination: At least twenty years of research suggests that when teachers require creative solutions and making connections, students learn more.  The National Council of Teachers of Mathematics (NCTM) believes this.  The folks who wrote the Common Core standards believe this.

On the other hand, rank-and-file teachers generally focus on teaching students to apply a certain procedure in a given situation (Hiebert and Grouws, 2007).  Textbooks demonstrate one procedure and then provide lots of practice in applying that procedure.  Testing reinforces this status quo by pressuring teachers to cover so many standards that little time is left for anything else – one teacher recently told me she “does critical thinking” with her students only after the standardized tests in the spring.

As I read the research evidence, it appears to line up in favor of eliminating, or at least curtailing, the procedural approach to teaching. I don’t have the space to adequately present all the evidence right here, but let’s pretend I’m right.  Why don’t teachers follow what the research evidence recommends?

Even if you don’t agree with me on the evidence regarding this particular question of math instruction, the larger issue remains.  Every time a new curriculum is adopted, every time teachers are sent to a new training session, teachers face this question: should I change the way I’m teaching, or continue with what has gotten me this far?  How should they make these decisions?  On what expertise should they rely?  And why do they often not seem to rely on the evidence that is trusted by researchers?

There are at least two possibilities.  First, perhaps they don’t know of this evidence.  This seems plausible.  After all, the line of communication between researchers and teachers is more like a string between two paper cups than a high-speed internet cable.

Second, perhaps teachers don’t recognize the research consensus as authentic expertise.  This lack of trust could come from two sub-sources: some teachers, isolated in their classrooms, might not accept evidence from some ivory tower disconnected from their daily reality, and choose instead to place their faith in what they see working for them.  Others might be following the lead of textbooks, tests, and district pacing guides, implicitly assigning authority to these sources rather than to researchers.

What other possibilities am I missing?  What are your theories?

The line between science and dogma can be blurred.  Sometimes the facts don’t line up clearly on one side.  But sometimes they do, and we still don’t follow them.  The story of Copernicus’ great advance should remind us that progress is a tenuous prospect, often impeded by legitimate debates that are intertwined with personal biases and political intrigue.

Popper contends that “the application of…the critical method alone explains the extraordinarily rapid growth of the scientific form of knowledge.”  If the field of education is to come anywhere close to making the kind of progress that we see today in the hard sciences, we will all (researchers, teachers, administrators, and policy makers) need to come together to advance, defend, and critique our tentative theories, and to begin to form an open-minded scientific community of experts.

Keep following this site.  Contribute your thoughts.  Invite your friends.  We can build this community together.

– Kevin


Between a Rock and a Black Box


When we move to the Common Core, standardized tests will require students to write essays, solve complex multi-step problems, and explain their solutions.

This is a good thing.  A positive movement beyond bubbling on multiple choice tests.

But accurately scoring long essays (and students’ responses to complex problems) is hard.  If humans score the essays, the raters can be biased or just plain get tired, so multiple raters and lots of training are needed to ensure that the scores are reliable.  This costs serious money.  If computers score the essays, it’s much cheaper.  Recent research claims that computer software can score tests with reliability similar to that of human raters.  The Hewlett Foundation sponsored an X-Prize-type competition to build a better computer grader.  Entrants analyzed 17,500 essays that had already been graded by humans, and a study analyzing the entries found that the best programs are “capable of producing scores similar to human scores.”

However, using computer software to score essays introduces a new problem.

Computers can’t really understand what they read, so the scoring algorithms have to rely on a proxy for quality: finding and counting instances of some characteristic of good writing.  For instance, Randy Bennett, an assessment expert, recently questioned the validity of computer-based scoring in a talk at UCLA.  He recounted how one of the finalists in this Hewlett competition developed a very competent algorithm that was based on counting the number of commas in a piece of writing.

In a way, this makes sense, since it’s likely that, on average, more sophisticated writers use more complex sentence structure and thus, more commas.  But, as we all know, more commas ≠ better ‘riting.  This points to a larger problem with computer-based scoring.  The folks who are creating this software don’t generally open up the black box and tell everyone how the algorithms work.  If they did, people could cheat just by adding more commas.  But if they don’t, there’s no way to assess the validity of these scores: are they measuring what they are supposed to be measuring?  On top of this, some critics who are more philosophically inclined complain that having computers score essays degrades the whole nature of the writing process, which should be an inherently interpersonal communicative act.
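To make the comma problem concrete, here is a minimal Python sketch of a proxy-based scorer and how easily it can be gamed.  This is not the finalist’s algorithm, which isn’t described here; the function names, the commas-per-100-words feature, and the sample sentence are all made up for illustration.

```python
def commas_per_100_words(essay: str) -> float:
    """Hypothetical proxy feature: comma density, not writing quality."""
    words = essay.split()
    if not words:
        return 0.0
    return 100.0 * essay.count(",") / len(words)


def pad_with_commas(essay: str) -> str:
    """Game the proxy: add a comma after every word that has no punctuation."""
    padded = [w if w[-1] in ",.!?;:" else w + "," for w in essay.split()]
    return " ".join(padded)


honest = "The experiment failed because the sample was too small."
gamed = pad_with_commas(honest)

# The gamed essay is unreadable, yet the proxy rates it far higher.
print(commas_per_100_words(honest))
print(commas_per_100_words(gamed))
```

A real scoring engine presumably combines many such features, but the same worry applies to each one: once the feature is known, it can be inflated without improving the writing.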

So, it seems like we’re stuck.  We can’t afford enough human raters to assess every student in the complex way that the Common Core (and common sense) tell us we should.  But computers can’t really assess complex thinking, and using them too much threatens to derail the complexity we’re trying to encourage.

Here’s what we’re likely to end up with: Common Core essays will be largely scored by machines, checked for accuracy by a few underemployed adults with college degrees, and then the results will be sent back with some vague feedback.  The scores will be, on average, wholly predictable by zip code and socioeconomic status.

There’s a better way.

Rather than focusing on a centralized, high-stakes accountability system, we can create systems of interdependence so that our schools can hold one another accountable.  We want students in Compton middle schools to be able to write as well as those in Manhattan Beach, right?  (And solve complex math problems, conduct science experiments, etc.)  Have the Compton essays sent to Manhattan Beach, the Manhattan Beach essays sent to Inglewood, and the Inglewood essays sent to West Hollywood.  And then switch it up next semester.
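For anyone who wants to picture the rotation, here is a rough Python sketch.  It assumes the chain closes into a loop (West Hollywood’s essays go to Compton) and that “switching it up” means shifting the loop by one more school each semester; neither detail is spelled out above, and the function name is invented for the example.

```python
def scoring_assignments(schools: list[str], semester: int) -> dict[str, str]:
    """Map each school to the school that scores its essays this semester.

    Assumption: the schools form a closed loop, and the offset grows by one
    each semester, skipping 0 so that no school scores its own essays.
    """
    n = len(schools)
    if n < 2:
        raise ValueError("need at least two schools to swap essays")
    offset = 1 + semester % (n - 1)  # always between 1 and n - 1
    return {schools[i]: schools[(i + offset) % n] for i in range(n)}


schools = ["Compton", "Manhattan Beach", "Inglewood", "West Hollywood"]
for semester in range(2):
    print(semester, scoring_assignments(schools, semester))
```

In semester 0 this reproduces the chain described above; in later semesters each school’s essays land with a different partner, so no pair of schools stays locked together.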

Scoring and providing feedback on essays from a wider range of students could open teachers’ eyes to a broadened perspective on what they can and should expect from their own students.  Randomly selected essays could still be double-scored by both machines and expert human raters, but teachers would now assume some ownership in the success of the Common Core assessments and standards.  Such a system could also be the start of building deeper partnerships among schools, so that we could build an education system that works toward the success of all children, rather than placing our schools into artificial Hunger Games competitions pitting charter against magnet and district against district.

What’s more, if we move our assessments closer to our teachers and students, and involve them in the day-to-day creation of what is quality work, then we can make our assessments part of our instruction, and move our instruction closer to the needs identified through the tests.  In other words, we can make sure that our assessments help students learn more.

So, what do you think?  How can we make sure our new assessments help our students learn more?  Respond, 200 words or less.

– Kevin