A note on causal relationships from a statistics standpoint


An increasingly hot and debated topic (see a judge’s recent decision to allow the LA Times to publish LAUSD teacher ratings: http://www.latimes.com/local/lanow/la-me-ln-teachers-ratings–20130801,0,165579.story) is the use of test scores as a metric for effective teaching. This week, we’ll cover some basics of inferring a causal relationship in any study and apply them to using test scores to evaluate teacher effectiveness.


Examining causality through randomization: an example

In most introductory statistics courses, a central message is that correlation does not imply causation (there is an xkcd comic about this: http://xkcd.com/552/).

Consider an example comparing the effectiveness of cancer drug A with cancer drug B. Suppose we assign drug A only to late-stage cancer patients and drug B only to early-stage patients, then follow the patients to see who goes into remission and who dies. Under this design, we never see how the late-stage patients would have done on drug B, or how the early-stage patients would have done on drug A. So even if drug A is very effective, whatever difference exists between the two groups might be due to the drug, but is more likely due to the difference in cancer stage. Either way, we cannot tell under this study design, and it would be rejected by every ethics review board as well as the FDA.

The standard method for establishing a causal relationship is random assignment: we assign patients randomly to receive drug A or drug B. Randomization means there is no systematic difference between the two groups (those who get drug A and those who get drug B) on any pre-treatment covariate, including cancer stage; any remaining differences are due to chance. Thus, after we randomize patients and observe the outcomes, we can attribute a difference in cancer survival between the groups to drug assignment rather than to pre-existing differences between patients. Note: the word “assignment” is intentional here, and we’ll discuss why in a later section.
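A small simulation can make the contrast between the two designs concrete. The sketch below is a hypothetical illustration, not data from any real trial: the remission probabilities (`BASE_REMISSION`), the true benefit of drug A (`TRUE_EFFECT_A`), and the 50/50 split of early- and late-stage patients are all made-up assumptions, and `simulate`, `by_stage`, and `randomized` are illustrative helpers. It estimates the drug effect as the difference in remission rates, once under stage-based assignment and once under random assignment.

```python
import random

# Hypothetical remission probabilities on drug B (illustrative assumptions, not real data)
BASE_REMISSION = {"early": 0.7, "late": 0.2}
TRUE_EFFECT_A = 0.1  # hypothetical: drug A raises remission probability by 0.1 vs. drug B


def simulate(n, assign, seed=0):
    """Return (remission rate on A) - (remission rate on B) for one simulated study.

    `assign(stage, rng)` decides which drug a patient receives.
    """
    rng = random.Random(seed)
    outcomes = {"A": [], "B": []}
    for _ in range(n):
        stage = "late" if rng.random() < 0.5 else "early"
        drug = assign(stage, rng)
        p = BASE_REMISSION[stage] + (TRUE_EFFECT_A if drug == "A" else 0.0)
        outcomes[drug].append(1 if rng.random() < p else 0)
    rate = {d: sum(v) / len(v) for d, v in outcomes.items()}
    return rate["A"] - rate["B"]


def by_stage(stage, rng):
    # Confounded design: late-stage patients get A, early-stage patients get B
    return "A" if stage == "late" else "B"


def randomized(stage, rng):
    # Coin flip, independent of cancer stage
    return "A" if rng.random() < 0.5 else "B"


print(f"true effect:       {TRUE_EFFECT_A:+.2f}")
print(f"assigned by stage: {simulate(100_000, by_stage):+.2f}")
print(f"randomized:        {simulate(100_000, randomized):+.2f}")
```

Under stage-based assignment the estimate lands near -0.40 (roughly the 0.3 remission rate of late-stage patients on A minus the 0.7 rate of early-stage patients on B), pointing the wrong way even though drug A is better in this setup. Under randomization it lands near the true +0.10.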

Randomization does not exist in teacher assignment

So now let’s say the outcome of interest is student test scores. The theory is that an effective teacher should improve a student’s performance on the test. The current “study design,” if you will, is closer to the first, non-randomized scenario of cancer drug assignment. Teachers are not assigned randomly to students. The probability of a student being assigned to a specific teacher depends not only on the state, district, and school, but also on that student’s characteristics. Parents who are heavily involved in their children’s education can try to have their children placed in a specific classroom. Special needs children are more likely to be placed in a classroom designed for special needs. Even past test performance can affect classroom assignment. These and other factors confound the teacher effect being measured, because the students assigned to different teachers may differ systematically.

Value-added models, which will be an ongoing topic of conversation, include students’ past test scores in the model (this assumes the previous year’s test scores are an accurate baseline, an assumption that may be challenged in later posts). Several models also include demographic information on the students, such as ethnicity, gender, and indicators of socioeconomic status (free-lunch eligibility), to adjust for some of the differences that exist between classrooms. However, even if these models capture many of the confounding relationships, it is still problematic to assume that whatever effect is left over after all these adjustments is the teacher’s causal effect. There is no counterfactual data: we do not observe what would have happened to the same students under a different teacher assignment. Nor do we observe compliance with the assigned treatment.
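The limits of adjustment can be sketched with a simulation. Everything here is a hypothetical assumption rather than a real value-added model: an unobserved `support` variable (standing in for factors like parental involvement) raises both last year’s and this year’s scores and makes placement in a hypothetical teacher T’s class more likely, while teacher T’s true effect is set to zero. The helper `ols` is a plain least-squares fit written with the standard library so the example is self-contained; actual value-added models are more elaborate and include additional covariates.

```python
import random

TRUE_TEACHER_EFFECT = 0.0  # hypothetical: teacher T has no effect at all


def ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations.

    Solves (X'X) b = X'y by Gauss-Jordan elimination; adequate for a few regressors.
    """
    X = [(1.0, *r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def simulate_vam(n=20_000, seed=1):
    """Return teacher-T coefficients (without, with) the unobserved confounder."""
    rng = random.Random(seed)
    observed, full, current = [], [], []
    for _ in range(n):
        support = rng.gauss(0, 1)                   # unobserved home support
        prior = 50 + 5 * support + rng.gauss(0, 5)  # last year's score
        # Students with more support are placed with teacher T more often
        in_t = 1 if rng.random() < (0.8 if support > 0 else 0.2) else 0
        # Support keeps raising scores this year, beyond what prior captures
        score = (10 + 0.8 * prior + 3 * support
                 + TRUE_TEACHER_EFFECT * in_t + rng.gauss(0, 5))
        observed.append((prior, in_t))        # what a value-added model can use
        full.append((prior, in_t, support))   # infeasible: support is never observed
        current.append(score)
    # Index 2 is the in_t coefficient in both fits (after intercept and prior)
    return ols(observed, current)[2], ols(full, current)[2]


est_vam, est_oracle = simulate_vam()
print(f"teacher T, adjusting for prior score only:  {est_vam:+.2f}")
print(f"teacher T, also adjusting for home support: {est_oracle:+.2f}")
```

In this setup the first estimate comes out clearly positive (around 1.6 points) even though teacher T has no effect, because last year’s score is only a noisy proxy for the unobserved support. Adding `support` to the regression brings the estimate back near zero, but that option does not exist with real data.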

Compliance with treatment

In the cancer drug example, suppose a patient assigned to drug A had difficulty taking the prescribed treatment at the exact intervals and missed several doses. We do not observe what would have happened to that patient with perfect adherence to all instructions. Thus claims about drug A’s effectiveness (or lack thereof) based on simple linear models with an indicator for drug assignment may be called into question: such a model estimates the effect of being assigned drug A, not the effect of taking it exactly as prescribed.

The same can be said when estimating a teacher effect on student scores. A model of the effect of teacher assignment will still mask student study habits, the home environment, how many days the student actually attended class, extenuating circumstances, and other factors that may affect the student’s “compliance” with the teacher’s curriculum.
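The consequence of noncompliance for an assignment-based comparison can be sketched with the drug example. The numbers are hypothetical assumptions: a baseline remission probability of 0.4, a 0.2 benefit from drug A when taken as prescribed, and 40% of patients assigned to A adhering so poorly that they get no benefit. Adherence is also assumed to be independent of prognosis, a simplification. Because assignment is randomized, comparing groups by assigned drug (an intention-to-treat comparison) estimates the effect of being assigned drug A, which here is diluted relative to the effect of full adherence.

```python
import random

BASE_REMISSION = 0.4      # hypothetical remission probability without drug A's benefit
EFFECT_IF_ADHERENT = 0.2  # hypothetical benefit of drug A when taken as prescribed
NONADHERENT_SHARE = 0.4   # hypothetical share of A-assigned patients who get no benefit


def assignment_effect(n=100_000, seed=2):
    """Difference in remission rates by *assigned* drug in a randomized trial."""
    rng = random.Random(seed)
    remitted = {"A": 0, "B": 0}
    counts = {"A": 0, "B": 0}
    for _ in range(n):
        drug = "A" if rng.random() < 0.5 else "B"  # randomized assignment
        p = BASE_REMISSION
        # Only adherent patients on A get the benefit; adherence is never recorded
        if drug == "A" and rng.random() >= NONADHERENT_SHARE:
            p += EFFECT_IF_ADHERENT
        counts[drug] += 1
        remitted[drug] += int(rng.random() < p)
    return remitted["A"] / counts["A"] - remitted["B"] / counts["B"]


itt = assignment_effect()
print(f"effect under full adherence: {EFFECT_IF_ADHERENT:+.2f}")
print(f"effect of assignment (ITT):  {itt:+.2f}")
```

The assignment comparison lands near +0.12 (0.6 × 0.2) rather than +0.20. It is a fair estimate of the effect of assigning drug A, but not of the drug’s effect when taken as directed; the teacher analogue is the gap between the effect of teacher assignment and the effect a teacher would have on a fully “compliant” student.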

The take-away

The intent of this article is simply to point out some issues with estimating a causal effect of teacher assignment from test score data. That is not to say that test score data cannot or should not be used to try to improve student learning across the country, only that caution is warranted when using such data to estimate the causal effect of a specific teacher.