Showdown at the CA Corral


First came posturing and threats: on Sept. 9, U.S. Secretary of Education Arne Duncan said that California’s plan to suspend most accountability testing for one year while transitioning to the new Common Core standards “is not something we could approve in good conscience.”

Then backtracking and broad compliments: “We want to be flexible, we want to be thoughtful,” Duncan said. “We don’t want to be stuck. There are lots of different things happening across the country. I don’t want to be too hard and fast on any one of these things because I have not gone through every detail, every permutation.”

“I give the governor tremendous credit,” Duncan said. “He’s worked really, really hard” in moving to new and rigorous learning goals. “He’s put real resources behind that.”

Meanwhile, pontificators find plenty of blame on both sides.

But while there is no clear resolution to the impasse between Secretary Duncan and Governor Brown, and no real rationale for insisting that California toe the line on outdated accountability measures, I think it’s high time that we begin thinking more broadly about how to develop accountability measures that might actually work.

One thing that seems to have mostly escaped notice is that LAUSD and six other districts recently obtained a waiver from NCLB sanctions for a plan that bases 40% of their accountability systems on non-cognitive outcomes.

Somehow, these districts will apparently be trying to figure out how to accurately measure these other outcomes.  Which ones?  Attendance, perhaps.  That should be simple.  But this also raises interesting questions about what we actually want our schools to accomplish.  What doesn’t get measured, doesn’t get done…  Should we try to measure how well our kids are doing in other outcomes that we actually care about, say empathy? Cooperation? Courage? Mental health?

There is interesting work being done on how to measure some of these outcomes: The Flourishing Children Project has developed indicators of constructs such as empathy, gratitude, altruism, and reliability.   And a group of private schools has developed an assessment to measure whether they are fulfilling their missions in terms of creative thinking, intellectual curiosity, collaboration, resilience, ethics, and time management. 

So, it appears quite possible for us to expand our view of accountability to include a much broader vision of what schools should be accomplishing.  And, as I read it, the evidence seems to suggest that our society’s overall need to improve these types of outcomes is at least as great as our need to improve academic outcomes.

As far as academics go, there’s been lots of pressure lately to improve, and we seem bent on acquiring more and more information.  The weight of the evidence suggests that schools are actually doing a slightly better job on these outcomes – national scores are generally up.  And that’s despite challenges that seem greater than ever – inequality is widening, child poverty is increasing, and one out of every nine African American children grows up with a parent incarcerated.  These are the kinds of challenges that you’d think would lead to lower test scores, yet scores are (slowly) inching up.

For non-cognitive outcomes, however, the picture doesn’t look so pretty, nor is it anywhere near filled in.  More people now die from suicide than from car accidents, obesity rates are skyrocketing, and a 2010 U-M study found that empathy among teens is much worse than it was in 1970.  Other trends seem more promising – tolerance for LGBT youth and adults has definitely grown.  But these are just tiny peep-holes into the state of our children’s mental and emotional health outcomes.

If we’re so serious about holding schools accountable that we can’t “in good conscience” go one year without high-stakes testing, then perhaps we ought to be serious about measuring a reasonable range of student outcomes.  And perhaps it’s high time we started some serious experiments to figure out which outcomes those ought to be, how accurately we can measure them, what goals we ought to aim for, and so on.

– Kevin

Measuring What Works


Educational research has existed for hundreds of years, with people from various backgrounds using a large variety of methods to try to answer a set of simple questions: “How do students learn? What contexts/curricula/teacher behaviors are associated with promoting student learning?”

While I have spent much of the last several years thinking about research on educational contexts and practices, two articles that I read this week brought a clear perspective to my thinking about the role of research in education. The first was published in the New York Times as part of a special issue, Learning What Works. This article describes the Institute of Education Sciences, an office within the federal Department of Education, and its use of the “gold standard” in research: the randomized clinical trial.

For a long, long time, almost all research in education was done using small-scale, often qualitative, and rarely rigorous research methods. Full disclosure: I am a full-blown quantitative researcher whose graduate work is currently sponsored by the Institute of Education Sciences, so I am not the least bit impartial in this discussion, but hear me out. I say that the previous research was not “rigorous” because in educational contexts, as in almost any other real-world situation, there are a vast number of influences on students’ learning and teachers’ practices, and it is very difficult to isolate the effect of a particular teacher training program or reading curriculum on the outcome of interest. What works in one environment may fail horribly in another.

This is why the randomized trial is trusted throughout so many fields of research. Random assignment is one of the few ways of actually isolating the effect of a treatment on a population. However, there are all kinds of ethical and logistical problems with random assignment in many educational contexts, which is partly why the adoption of this method in education has been so slow.

This point brings me to the second article of the week, a piece by Atul Gawande published in the New Yorker in July 2013. The article contains a long but fascinating discussion of why certain ideas and innovations take off and are implemented widely in a short period of time, while others languish and are not implemented for decades, or sometimes not at all. The author follows the trajectories of surgical anesthesia and antiseptics in medicine. Both were discovered in the nineteenth century, and while anesthesia was routinely used across hospitals in the U.S. and Britain within seven years, the use of carbolic acid and other cleansers for cleaning hands and wounds during surgery took decades to truly catch on. The article goes into a lot of detail about why certain ideas, including simple, lifesaving solutions to medical problems, are developed by scientists and researchers but don’t catch on with the general public, and I am not going to discuss most of it here (read the article! It’s worth your time). To summarize one of the main points, the ideas that often stall “attack problems that are big but, to most people, invisible; and making them work can be tedious, if not outright painful.”

It may be a stretch, but to me, that sounds an awful lot like the problem of how so much educational research is produced without strong research designs – designs that would allow us to truly decide whether a program, practice, or curriculum works and, if so, with whom. It has been known for decades, if not centuries, that randomized trials, when available, provide the most definitive results. Before I create a giant uproar, I will add the caveat that other methodologies can provide a great deal of detailed, nuanced information about students and about the characteristics of programs that affect their effectiveness. But to truly test whether something is working, randomized controlled trials are the best approach. Furthermore, when randomized trials are not ethically or logistically feasible, there are other rigorous approaches that can mimic the effect of randomization and lead to stronger causal inferences than a standard non-randomized treatment vs. control comparison.

Yet either of these approaches can be expensive, difficult, and, for those without training, nearly impossible. Part of the problem is that so many researchers in education do not receive training in rigorous methods and have never been exposed to the pitfalls of many common research practices. Another problem, though, mirrors the one described in the New Yorker article: convincing people who already “know” something is best practice that it is worth their time to test it. This issue is not just on the researcher side, but also among educators who trust programs or curricula that have never been tested or are not supported by rigorous evidence. As Joseph Merlino says in the NY Times article, “A lot of districts go by the herd mentality,” citing the example of a Singapore-based math program now in vogue that has never been rigorously compared with other programs and found to be better. “Personal anecdote trumps data.”
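To make the point concrete, here is a minimal sketch of why random assignment supports causal claims. The data and the 0.07 SD treatment effect are entirely made up for illustration: even when a strong confounder (prior achievement) drives most of the variation in outcomes, randomization balances it across groups, so a simple difference in means recovers the true effect.

```python
import random
import statistics

random.seed(0)

# Hypothetical illustration: students differ in prior achievement, a
# confounder that strongly influences outcomes. Random assignment balances
# it across groups, so a simple difference in means isolates the treatment.
true_effect = 0.07  # in student-level standard deviations (made-up value)

students = [random.gauss(0, 1) for _ in range(10_000)]  # prior achievement
random.shuffle(students)                                 # random assignment
treated, control = students[:5_000], students[5_000:]

# Outcome = prior achievement + noise (+ effect, if treated)
out_t = [s + true_effect + random.gauss(0, 0.5) for s in treated]
out_c = [s + random.gauss(0, 0.5) for s in control]

estimate = statistics.mean(out_t) - statistics.mean(out_c)
print(round(estimate, 3))  # close to the true 0.07, despite the confounder
```

Without the random assignment step – say, if higher-achieving students systematically enrolled in the treated classrooms – the same difference-in-means calculation would mix the treatment effect with the pre-existing gap, which is exactly the inference problem that non-randomized comparisons face.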

But how do we change practice to emphasize rigorous methods? The New Yorker article argues that the key is not big public-awareness campaigns but rather a one-on-one approach. As the author writes, “Simple ‘awareness’ isn’t going to solve anything. We need our sales force and our seven easy-to-remember messages. And in many places around the world the concerted, person-by-person effort of changing norms is under way.” So far, the Institute of Education Sciences (IES) has not done a very good job of getting the word out on why it is so important to rigorously evaluate the programs implemented in our schools. I will be interested to see over the next decade whether IES and like-minded educational researchers across the country are able to use the lessons learned from other fields to promote rigorous research methods in education, or whether these methods continue to be used by just a few isolated researchers in the ivory tower, with little effect on educational practice across the country.


– Megan

Applauding Teach For America – and Demanding More


We should all be applauding Teach For America (TFA) right now.  But, of course, we’re not.  And we should all be examining this organization carefully to hold it accountable to its stated vision.  But, of course, we’re not.

Instead, we’ve divided ourselves into two camps – TFA critics and defenders.  The defenders are applauding, and the critics are criticizing, and there is plenty of research holding TFA accountable (see below).  However, the critics seem to be willfully ignoring credible evidence showing that TFA has accomplished some very impressive results.  Do TFA’s critics honestly believe that our nation’s schools of education have nothing to learn from TFA?

And the defenders are, in my opinion, not looking honestly at whether the organization is actually doing everything possible to accomplish its stated vision: “One day, all children in this nation will have the opportunity to attain an excellent education.”  Do TFA’s defenders honestly believe that a revolving system of folks teaching for two years can create sustainable change for our hardest-hit schools and kids?

A recent Mathematica study (summary here) compared secondary math student achievement results for 5,790 students who were randomly assigned to 66 TFA math teachers or 70 comparison teachers.  The researchers identified a group of classrooms within a school that were matched in terms of subject matter, class conditions, and period.  Students in that school who enrolled in the same math course were then randomly assigned to a classroom taught by a TFA teacher or by a comparison teacher.

The students taught by TFA teachers outperformed those taught by the comparison teachers by an average of .07 standard deviations.  That’s statistically significant and meaningful, the randomized study design makes the results compelling, and the findings held up under many different sensitivity analyses.  This study adds to other evidence (and here, and here) that TFA teachers are as effective as or more effective than other teachers.
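For readers unfamiliar with effect sizes: a result reported “in standard deviations” is just the raw gap in mean scores divided by the spread of student scores. A minimal sketch with made-up numbers (not the Mathematica data):

```python
import statistics

# Hypothetical test scores for two classrooms (NOT the study's data).
tfa_scores = [72, 75, 68, 80, 77, 74, 71, 79]
comparison_scores = [70, 73, 67, 78, 75, 72, 69, 76]

# Standardized mean difference: raw gap divided by the student-level SD.
gap = statistics.mean(tfa_scores) - statistics.mean(comparison_scores)
student_sd = statistics.pstdev(tfa_scores + comparison_scores)
effect_size = gap / student_sd
print(round(effect_size, 2))
```

With these invented numbers, a 2-point gap works out to roughly half a standard deviation; the study’s .07 corresponds to a much smaller raw gap relative to how widely student scores vary.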

But, before everyone starts burning down schools of education or citing contradictory research findings, I think it’s worth considering what it would mean if these results were NOT true.  What if we had an organization that recruited the cream of the crop from our top colleges, and we found that those folks were less successful than our traditionally prepared teachers?  Well, TFA receives tens of thousands of applications every year from our nation’s top colleges.  Meanwhile, traditionally prepared teachers, on average, are “among the lowest achieving graduates of U.S. high schools” (Committee for Economic Development, 1985, in Ballou & Podgursky, 1995), with SAT scores near the bottom of all college graduates (Weaver, 1983, in Ballou & Podgursky, 1995).

If the TFA folks were less successful than traditionally prepared teachers, it would seem to suggest that our K-16 education system had less of an impact than the one or two years of teacher preparation in an education school.

This does not appear to be the case.  Instead, it seems that the best educated people actually turn out to be slightly better at educating the next generation (see Note below for one alternative hypothesis).  In other words, the things that parents and teachers have been telling kids for years – study hard, get good grades, go to a good college – actually seem to matter not just in helping a kid earn more money, but also in helping that kid become better at a really important job – teaching.

Thinking in these terms ought to restrain our anger toward schools of education.  Their relative ineffectiveness might be largely determined by the students they admit.  Of course, our judgment ought to be restrained by the humility of admitting that we actually have almost no data or evidence about the relative effectiveness of schools of education.  But our collective lack of restraint in the face of ignorance is a subject for another day.

With regard to this study, TFA’s defenders will likely trumpet the rigorously researched results, and the study helps them do that by “translating” the effect size into “about equal to 2½ months of schooling.”  That sure sounds like a big and important result, and a strong justification of the TFA approach.  However, a closer look at the study results for individual teachers shows a lot of variability in the effectiveness of TFA teachers (see Figure V.1 from the study: variability in effectiveness of TFA teachers).  This means that it’s very likely that an excellent TFA math teacher will be replaced in two years by a mediocre teacher, who will be replaced by an average teacher, and on and on.  This kind of turnover cannot logically lead to the dramatic school improvement to which TFA says it is committed.
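The “months of schooling” translation is just arithmetic on an assumed rate of annual achievement growth. A back-of-the-envelope sketch – the 0.25 SD-per-year growth figure is my assumption for illustration, not a number taken from the study text above, though it is in the ballpark the study’s own conversion implies:

```python
# Converting an effect size into "months of schooling".
effect_sd = 0.07           # from the Mathematica study
annual_growth_sd = 0.25    # ASSUMED typical yearly growth in secondary math
school_year_months = 9

months_equivalent = effect_sd / annual_growth_sd * school_year_months
print(round(months_equivalent, 1))  # ≈ 2.5, matching the study's translation
```

The same arithmetic shows why such translations deserve scrutiny: the answer is only as good as the assumed annual-growth figure, which varies by subject and grade level.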

Instead of descending into TFA vs. ed school debates, we ought to celebrate that TFA appears to have successfully cracked a really important nut – they’ve figured out a way to get our most ambitious, highest-achieving college graduates to go into teaching.  What’s more, those high-achievers appear to be able to help our most disadvantaged students improve their achievement (at least in secondary math).

The problem then becomes: how can we get those most successful teachers to remain in teaching?  Here’s one possibility: gradually increase the TFA time commitment from two years to five years.  But this is a tricky thing to accomplish politically.  If we’re going to make any progress on getting TFA teachers to remain in the classroom, it would seem important for us to agree that this is a worthy goal in the first place.

And if we’re not going to make any progress on this second goal, then I’m afraid that the .07 effect size difference is not going to be enough to help anywhere near “all children” achieve TFA’s vision in my lifetime.

Note: It’s also possible that the positive impact of TFA teachers is not driven by the recruitment and selection of higher-achieving people, but by the training that TFA teachers receive.  However, because the TFA organization itself is staffed heavily by TFA alumni, and because that organization has been relentlessly refining and improving its training methods for over two decades, the credit in the end goes back to the same group of people: perhaps the people that TFA recruited have managed to develop a better training system than our schools of education.


– Kevin