By: Julia Sass Rubin, PhD.
PDF of Policy Brief: Rubin.ReportCards
Peter Drucker is right. “What gets measured, gets managed.” And, as the recent standardized test cheating scandals in Washington DC and Atlanta highlight, poorly designed measures, especially those attached to punishments and rewards, can shape behavior in very unproductive ways.
I was reminded of this while reading the new School Performance Reports, which the NJ Department of Education (NJ DOE) released last month.
The reports, intended to help parents, boards and superintendents “to find ways to address weaknesses in their districts,” compare publicly funded schools against a small group of their peers with similar free and reduced lunch, Limited English Proficiency, and special needs demographics.
The schools are evaluated in four categories created by the NJ DOE: Academic Achievement, College & Career Readiness, Student Growth and, for high schools only, Graduation and Post-Secondary Enrollment Rates.
Comparing schools to those with similar demographics is a good idea, as it acknowledges that students’ personal characteristics play a bigger role in determining their academic outcomes than anything that happens to them in school. And what could be bad about giving parents and educators more information?
Unfortunately, rather than providing useful data, the new reports undercut New Jersey’s excellent public schools. The reports also create incentives for districts to manage to the new standards through policies that produce higher rankings but may not meet students’ needs.
There are four types of problems with the new school reports: artificially created competition, poorly designed comparison groups, arbitrary category definitions, and inaccurate data.
Artificially Created Competition

Rather than providing parents and educators with data on how schools performed on various important metrics and allowing them to form their own judgments, the NJ DOE created an artificial list of winners and losers by ranking each school’s performance against a unique subset of thirty demographically comparable schools. This peer ranking information appears on the first page of the reports and will be the primary take-away for most readers. However, these forced rankings are not useful for evaluating or improving public schools.
For example, in high-performing districts, the academic performance of peer schools is often separated by a tiny fraction of a percentage point. Westfield High School saw 97.8% of its students pass the Language Arts High School Proficient Assessment (HSPA) versus 97.9% of Chatham High School students. This earned Westfield a peer rank of 45 versus 58 for Chatham. In fact, “14 of 31 schools in the Westfield High School peer group achieved a 98% passing on the Language Arts High School Proficiency Assessment (HSPA), and yet their percentile ranks ranged from 39 to 84, regardless of the identical passing score of 98%.” Unfortunately, this kind of information is not easily obtainable from the reports, which provide no data comparing a school’s performance to its peers except for the useless rankings.
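The spread of percentile ranks among identical scores is easy to reproduce. The sketch below is illustrative only, not the NJ DOE’s published methodology: the peer-group composition and the tie-breaking rule are assumptions. It shows how a percentile rank computed from ordinal position, with ties broken arbitrarily, scatters fourteen schools with the same 98% passing rate across a wide range of ranks.

```python
# Illustrative sketch, not the NJ DOE's actual method: when percentile
# ranks come from a school's position in the sorted order, tied scores
# get different ranks depending on an arbitrary tie-break.

def ordinal_percentile_ranks(scores):
    """Rank each score by its position in ascending sorted order
    (ties broken by input order), then convert position to a percentile."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    for position, i in enumerate(order):
        ranks[i] = 100.0 * position / (n - 1)
    return ranks

# A hypothetical peer group of 31 schools, 14 of them tied at a 98%
# passing rate (the remaining scores are made up for illustration).
scores = [98.0] * 14 + [97.9, 97.8, 97.5, 97.0, 96.5, 96.0, 95.5,
                        95.0, 94.0, 93.0, 92.0, 91.0, 90.0, 89.0,
                        88.0, 87.0, 86.0]
ranks = ordinal_percentile_ranks(scores)
tied = [round(r) for s, r in zip(scores, ranks) if s == 98.0]
print(min(tied), max(tied))  # → 57 100
```

Fourteen schools with the identical passing rate end up anywhere from the 57th to the 100th percentile, purely as an artifact of how ties are ordered, which is exactly the pattern the Westfield peer group exhibits.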
The peer rankings are also confusing because they do not reflect whether a school met the academic achievement targets established for it by the NJ DOE. For example, within the same peer group, Tenafly High School met 88% of its academic achievement targets, yet it received an 81% peer rank, while Westfield met 100% of its targets and received a peer rank of only 53%. How are parents supposed to make sense of this seemingly contradictory information?
Nor do the reports provide any way for parents to evaluate the four overarching categories against each other. For example, Princeton’s Community Park elementary school scored at the absolute top of its peer group for student growth – the change in individual students’ performance on the NJ Ask test from year to year. However, Community Park was rated in the bottom fifth of its peers on College and Career Readiness because a few students missed school for 15 or more days last year. A parent who manages to dig deeply enough into the report to figure all this out might well wonder why those absences are a problem when the students’ test scores are climbing.
Poorly Designed Comparison Groups
While NJ DOE’s overall objective of comparing schools with similar demographics is a very good one, some of the peer groups it created contain schools that are not comparable. For example, New Egypt High School, with 10% free and reduced lunch students, was compared to Tenafly High School, with 2.4% free and reduced lunch students.
The NJ DOE also compared traditional public schools against magnet and charter schools, which have selective enrollment or the ability to remove children for a variety of reasons. For example, Science Park High School, a selective magnet school that accepts students on the basis of test scores and an entrance exam, was compared to Trenton High School, which accepts all students who live in Trenton. Not surprisingly, Science Park High School fared very well in this extremely unfair comparison.
NJ DOE’s analysis also conflates all children receiving free and reduced lunch, treating every student from a family earning up to 185 percent of the poverty line as interchangeable. In other words, it lumps together students who are homeless with students whose families have an income of $50,000 a year.
Yet these differences in income are very significant when it comes to academic performance. For example, a child whose family makes $40,000 a year averages 100 points higher on the SAT than a child whose family makes $20,000 a year.
Comparing schools with different demographic compositions is particularly problematic given the previously discussed forced rankings of peer schools. How are parents to interpret rankings based on small differences in test scores if one school has twice as many homeless students as a peer institution to which it is being compared?
Arbitrary Category Definitions
The NJ DOE used somewhat arbitrary criteria to define the four categories, placing certain districts at a disadvantage for reasons that had nothing to do with the quality of their schools.
For example, elementary and middle schools were marked down in College and Career Readiness if more than six percent of their students were absent for more than ten percent of the year, regardless of the reason for those absences or those students’ overall academic performance. This lowers the scores of districts that have a chicken pox outbreak, or that have large immigrant populations whose families may return to their countries of origin for multiple weeks during the school year. It also hurts districts whose students may have religious reasons for extended school absences.
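In a standard 180-day school year, the ten-percent cutoff works out to roughly 18 days. The sketch below uses the six-percent and ten-percent thresholds described above, but the school size, absence counts, and outbreak numbers are hypothetical, chosen only to show how a single outbreak can flip a school’s standing on this metric.

```python
# The 6% / 10% thresholds come from the report criteria; the 180-day
# year, enrollment, and per-student absence counts are hypothetical.

SCHOOL_DAYS = 180
CHRONIC_CUTOFF = 0.10 * SCHOOL_DAYS   # more than 10% of the year -> 18 days
FLAG_SHARE = 0.06                     # flagged if more than 6% of students

def is_flagged(days_absent_per_student):
    """Return True if the share of chronically absent students
    exceeds the 6% threshold, regardless of why they were absent."""
    n = len(days_absent_per_student)
    chronic = sum(1 for d in days_absent_per_student if d > CHRONIC_CUTOFF)
    return chronic / n > FLAG_SHARE

# A 500-student school with 25 chronically absent students (5%): not flagged.
baseline = [20] * 25 + [5] * 475
print(is_flagged(baseline))   # → False

# A chicken pox outbreak pushes 10 more students past 18 days of absence:
# flagged, with no regard for those students' grades or test scores.
outbreak = [20] * 35 + [5] * 465
print(is_flagged(outbreak))   # → True
```

Nothing in the calculation distinguishes illness, travel, or religious observance from truancy, which is precisely the complaint raised above.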
None of these factors should penalize a school’s evaluation, particularly as the absences are outside the school’s control and not correlated with individual children’s academic performance. More worrisome, such measures encourage districts to “manage to the measure,” taking actions that raise their rankings but not their academic performance. South Brunswick, for example, is considering asking families traveling abroad for extended periods of time to withdraw their children from the school system rather than have them counted as absent.
At the high school level, NJ DOE evaluated schools on whether at least 35% of all students took one of ten specific AP tests in English, Math, Social Studies, or Science. AP exams in subjects outside these ten were not counted toward the 35% benchmark, so high schools that provided additional AP offerings in foreign languages, Art History, or Music Theory, for example, were penalized if their students chose those AP courses instead of the ten subjects specified by the NJ DOE. Is a student who takes an AP course in Art History really less college and career ready than one who takes an AP course in English?
Some districts use the International Baccalaureate program or give their students the opportunity to enroll in college courses while in high school, rather than offering AP classes. Yet both of these were ignored in the NJ DOE’s calculations, even though passing a college course while in high school is the best predictor of whether a student is college and career ready.
For the Post-Secondary Enrollment category, the NJ DOE looked at the percentage of students who were enrolled in a post-secondary institution 16 months after high school graduation. However, students attending post-secondary institutions outside the United States were excluded from this analysis. So students who attended the UK’s highly selective Oxford or Cambridge Universities were not counted by the NJ DOE as obtaining a post-secondary education. This marked down districts such as Princeton, where more than a dozen graduates each year may “go on to study at top-ranked universities abroad.”
Inaccurate Data

In addition to problems with the kind of data the NJ DOE used for its assessment, there are issues with the accuracy of that data. Commissioner Cerf admitted that the reports contained many mistakes, but refused to delay their release or to present the first year’s data as anything less than credible.
So why would the NJ DOE want to release confusing, inaccurate school reports that create useless competition, encourage gaming of the system rather than high quality decision making, drive a false narrative of failure, and “unfairly characterize many fine schools, dishearten students and teachers, and cause unnecessary alarm”?
Commissioner Cerf has been quite open about his objective of making “clear that there are a number of schools out there that perhaps are a little bit too satisfied with how they are doing.” In other words, knocking down New Jersey’s excellent public schools is exactly what the Commissioner intended these reports to accomplish.