The New Jersey Education Policy Forum recently released a policy brief on the use of test scores in teacher evaluation. In “Deconstructing Disinformation on Student Growth Percentiles & Teacher Evaluation in New Jersey,” Professors Bruce D. Baker of Rutgers University and Joseph Oluwole of Montclair State University explore the consequences of using student growth percentiles (SGPs) to rate teacher effectiveness.
As a service to New Jersey’s teachers and parents, we present this Q&A based on their policy brief. Readers are encouraged to visit njedpolicy.wordpress.com for the full text, including links to the research cited here.
Q: Are student test scores going to be used to evaluate teachers in New Jersey’s new teacher evaluation system, AchieveNJ?
A: Yes. Teachers of math and language arts in fourth through eighth grades will be evaluated, in part, on their students’ NJASK scores. According to regulations proposed by the New Jersey Department of Education (NJDOE), growth in test scores will comprise at least 30 percent of a teacher’s evaluation rubric rating.
Q: How will the NJDOE calculate a teacher’s “effectiveness” based on these test scores?
A: Rather than simply use the raw scores a teacher’s students receive, the NJDOE will use a method called student growth percentiles (SGPs) in a teacher’s evaluation. Each teacher’s rating will be based on the median SGP score for his or her class. In a class of 21 students, for example, the teacher’s evaluation will be based on the 11th-highest SGP score for that class.
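To make the median calculation concrete, here is a minimal sketch in Python. The class roster and SGP values below are invented for illustration; they are not drawn from any actual New Jersey data.

```python
# Hypothetical illustration: a teacher's rating is the median of the
# class's individual SGP scores (SGPs are percentile values, 1-99).
def median_sgp(sgps):
    """Return the median SGP for a class roster."""
    ranked = sorted(sgps)
    n = len(ranked)
    if n % 2 == 1:
        return ranked[n // 2]  # odd class size: the middle score
    return (ranked[n // 2 - 1] + ranked[n // 2]) / 2  # even: average

# A class of 21 students: the median is the 11th score when sorted.
class_sgps = [12, 25, 31, 35, 40, 44, 48, 50, 52, 55, 58,
              60, 63, 67, 70, 74, 78, 82, 85, 90, 95]
print(median_sgp(class_sgps))  # -> 58
```

Note that with an odd class size, a single student's score is the rating; every other student's SGP could change without moving the teacher's evaluation at all.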
Q: What is a student growth percentile (SGP)?
A: SGPs use a method called “quantile regression” to measure the relative change of a student’s performance compared to that of students who are their “academic peers.” SGPs estimate not how much the underlying scores changed, but how much the student moved within the mix of other students taking the same assessments.
A student’s “academic peers” are those students across the state who have a similar history of scores on previous administrations of the NJASK. Students within each peer group are then ranked based on their “growth” from the previous year. The ranking follows a “normal distribution,” commonly known as a “bell curve.”
It’s important to understand that the ranking is relative to all other students in the peer group. In other words, SGPs are not a measure of absolute growth; they are a comparison of how much growth a student exhibited on the test compared to his or her “peers.”
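The relative nature of the ranking can be sketched in a few lines of Python. The actual SGP methodology fits quantile regressions over students' full score histories; this simplified sketch only shows the core idea of ranking one student's current score within a hypothetical peer group with similar prior scores (all numbers invented).

```python
# Simplified sketch of the percentile-rank idea behind SGPs.
# The real method uses quantile regression over prior score histories;
# here we just rank a student's current score within an invented peer
# group whose members had similar prior-year scores.
def percentile_rank(score, peer_scores):
    """Percent of peers scoring at or below `score` (a relative measure)."""
    at_or_below = sum(1 for s in peer_scores if s <= score)
    return round(100 * at_or_below / len(peer_scores))

peer_group = [198, 203, 205, 210, 212, 215, 221, 226, 230, 240]
print(percentile_rank(212, peer_group))  # -> 50
```

Because the measure is purely relative, a student could post large absolute gains and still receive a low SGP if his or her peers gained even more — which is exactly the "relative, not absolute" point above.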
Q: I’ve heard a lot about the use of value-added models (VAM) in test-based teacher evaluation. Is an SGP the same as a VAM?
A: No. A VAM uses assessment data in the context of a model statisticians call a “regression analysis.” The objective is to estimate the extent to which a student having a specific teacher or attending a specific school influences that student’s “growth” in test scores. The most thorough VAMs attempt to account for the student’s prior year test scores, the classroom level mix of student peers, individual student background characteristics, and even school level characteristics.
Value-added models, while intended to estimate teacher effects on student achievement growth, largely fail to do so in any accurate or precise way. Specifically, value-added measures tend to be highly unstable from year to year, and have very wide error ranges when applied to individual teachers, making confident distinctions between “good” and “bad” teachers difficult if not impossible.
Q: So do SGPs avoid the problems VAMs have with error rates?
A: Absolutely not. As noted above, value-added models largely fail to estimate teacher effects on student achievement growth in any accurate or precise way — but student growth percentiles make no such attempt in the first place. They are merely descriptive measures; they do not even try to tease out a teacher’s effect on her students’ test performance.
Researchers agree that SGPs do not serve the function of finding the cause of student growth on test scores; even Damian Betebenner, the researcher cited most often by the NJDOE in defense of using SGPs, admits they are descriptive, and not causal, measures.
Q: But if SGPs can’t determine how a teacher affects a student’s test scores, why do the NJDOE and other states’ education departments want to use them in teacher evaluations?
A: Arguably, one reason for the increasing popularity of the SGP approach is controversy surrounding the use of VAMs in determining teacher effectiveness. There is a large and growing body of empirical research describing the problems with using VAMs; however, there has been far less research on using SGPs for determining teacher effectiveness. The reason for this vacuum is not that SGPs are simply immune to problems of VAMs, but that researchers have, until recently, chosen not to evaluate their validity for estimating teacher effectiveness because SGPs are not designed for this task.
Q: NJ Education Commissioner Christopher Cerf has said that an SGP-based system “fully takes into account the socio-economic status” of students. Is he correct?
A: Cerf’s statement is not supported by research on estimating teacher effects, which largely finds that differences in student, classroom and school level factors do relate to variations in both initial performance levels and in performance gains.
Further, teachers aren’t all assigned similar groups of students with an evenly distributed mix of kids who started at similar points. Since a teacher’s evaluation is based on the median SGP for his or her class, only similar mixes of students in each class would provide a fair comparison. This is, of course, a practical impossibility.
Q: How would differences in student characteristics affect their SGPs?
A: For both ELA and Math growth percentile measures, there are negative and statistically significant correlations at the school level with the percentage of students who qualify for free lunch, and with the percentage of students who are black or Hispanic.
Higher shares of low-income children and higher shares of minority children are each associated with lower average growth percentiles. This means that SGPs – which fail on their face to take into account student background characteristics – fail statistically to remove the bias associated with these measures.
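The kind of school-level correlation described above can be illustrated with a short Python sketch. The free-lunch percentages and median SGPs below are invented to show what a negative correlation looks like; they are not the actual New Jersey figures.

```python
# Hypothetical sketch of the school-level pattern: as the share of
# free-lunch students rises, the school's median SGP tends to fall.
# Data invented for illustration only, not drawn from NJ results.
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

free_lunch_pct = [5, 15, 30, 45, 60, 80]   # % of students, by school
median_sgps    = [62, 58, 52, 48, 44, 38]  # school median SGPs
print(pearson_r(free_lunch_pct, median_sgps))  # negative value
```

A statistically significant negative coefficient of this kind is what the brief means by "bias": the measure systematically penalizes schools serving more disadvantaged students, before any teacher has done anything.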
At a practical level, it is relatively easy to understand how and why student background characteristics affect not only their initial performance level but also their achievement growth. There are a multitude of things that go on outside of the few hours a day where the teacher has contact with a child that influence any given child’s “gains” over the year, and those things that go on outside of school vary widely by children’s economic status. Further, children with particular life experiences are more likely to be clustered with each other in schools and classrooms.
Q: The NJDOE claims that the Gates Foundation’s Measure of Effective Teaching (MET) project supports their use of SGPs; does it?
A: The Gates Foundation MET project results provide no basis for arguing that student growth percentile measures should have a substantial place in teacher evaluation. The Gates MET project never addressed SGPs; rather, the study looked at the use of value-added models (and the results they present cast serious doubt on the usefulness of even that method). Those who have compared the relative usefulness of growth percentiles and value-added metrics have found growth percentiles sorely lacking as a method for sorting out teacher influence on student gains.
Q: How will the NJDOE use SGPs to make decisions?
A: There are three ways the state plans to use SGPs: rating schools for interventions, making employment decisions about teachers, and evaluating teacher preparation institutions such as colleges and universities.
In all of these cases, the use of SGPs is inappropriate. SGPs are not designed to determine a teacher’s or a school’s effect on test scores; again, they are descriptive, not causal, measures. Further, the bias patterns found in SGPs provide a disincentive for teachers to teach in schools with large numbers of low-income students.
Q: What can be done to ameliorate the negative consequences of using SGPs for teacher evaluation?
A: Professors Baker and Oluwole suggest three courses of action going forward:
- An immediate moratorium on attaching any consequences – job action, tenure action or compensation – to these measures.
- A general rethinking – back to square one – on how to estimate school and teacher effects, with particular emphasis on better models from the field.
- A general rethinking/overhaul of how data may be used to inform thoughtful administrative decision making, rather than dictate decisions. Data including statistical estimates of school, program, intervention or teacher effects can be useful for guiding decision making in schools. But rigid decision frameworks, mandates and specific cut scores violate the most basic attributes of statistical measures.
As Baker and Oluwole conclude:
Perhaps most importantly, NJDOE must reposition itself as an entity providing thoughtful, rigorous technical support for assisting local public school districts in making informed decisions regarding programs and services. Mandating decision frameworks absent sound research support is unfair and sends the wrong message to educators who are in the daily trenches. At best, the state’s failure to understand the disconnect between existing research and current practices suggests a need for critical technical capacity. At worst, endorsing policy positions through a campaign of disinformation raises serious concerns.