
Yearly Archives: 2014

Research Note: On Student Growth & the Productivity of New Jersey Charter Schools

Bruce D. Baker, Rutgers University, Graduate School of Education

October 31, 2014

PDF: Research Note on Productive Efficiency

In June of 2014, I wrote a brief in which I evaluated New Jersey’s school growth percentile measures to determine whether factors outside the control of local schools or districts are significantly predictive of variation in those growth percentile measures.[1] I found that this was indeed the case. Specifically, I found:

Student Population Characteristics

  1. % free lunch is significantly, negatively associated with growth percentiles for both subjects and both years. That is, schools with higher shares of low income children have significantly lower growth percentiles;
  2. When controlling for low income concentrations, schools with higher shares of English language learners have higher growth percentiles on both tests in both years;
  3. Schools with larger shares of children already at or above proficiency tend to show greater gains on both tests in both years;

School Resources

  1. Schools with more competitive teacher salaries (at constant degree and experience) have higher growth percentiles on both tests in both years.
  2. Schools with more full time classroom teachers per pupil have higher growth percentiles on both tests in both years.


Charter Status

  1. Charter schools have neither higher nor lower growth percentiles than otherwise similar schools in the same county.


On the one hand, these findings raise some serious questions about the usefulness of the state’s growth percentile measures for characterizing school effectiveness. At the very least, if one wishes to compare the growth percentiles of one school to another, one should use a statistical approach that first corrects for those factors that are a) outside of the control of local school officials and b) substantively influence growth.

While I’ve been critical of the growth percentile data produced by the state, most notably for their failure to more completely address these issues, the growth percentile measures are certainly more useful than performance level measures, which are even more highly correlated with differences in demographics and other contextual variables.

Here I use the growth percentile measures as the outcomes of interest in a set of models wherein I attempt to estimate the relative efficiency of production of outcomes across New Jersey schools. Given the findings of my previous analyses, if I wish to compare school growth percentiles and make assertions about how well one school versus another achieves growth, I must account for several factors.

I must, for example, account for a) initial performance levels, b) demographic differences, and c) school size and grade range differences.

Here, my goal is slightly different from the previous analysis in terms of how I characterize resources. Here, the goal is to correct for the aggregate resource inputs to each school, on the assumption that schools (or their operators) might make tradeoffs between teacher compensation, compensation structures and class sizes to achieve greater efficiency in producing student achievement gains. Lacking comprehensive school site spending data in New Jersey, I take a second best (perhaps third or fourth) approach of using the summed certified staffing salaries per pupil as a proxy for total fiscal resource inputs. Otherwise the regressions are identical to those in the previous analysis.

Table 1 shows the regression model output.

Table 1



Again, we can see that these various factors explain from around 20% to nearly 40% of the variation across schools in growth percentiles.

We also see that aggregate school resources matter. Schools with higher certified staffing spending per pupil are also showing higher growth.

But we can also use these models to compare the relative performance of schools in the models. Specifically, we can evaluate the extent to which a school’s actual growth percentile is higher or lower than would be predicted, given the school’s population, resources and other characteristics. Because there are other unmeasured common pressures on schools in particular locations, including differences in the value of the dollar inputs from Newark to Camden or Atlantic City, I compare schools against others in the same county (rather than city in this case, because so many cities in New Jersey have such small numbers of schools). So, each school is compared against similar spending, similar student, and other similar characteristic schools in their county, but in a statewide model.
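The adjustment described above – predicting each school’s growth percentile from its characteristics plus county indicators, then treating the gap between actual and predicted growth as the performance measure – can be sketched in a few lines. This is a minimal illustration with synthetic data (numpy only; the variable names and coefficients are placeholders, not the actual model specification):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic school-level data (illustrative stand-ins only)
pct_free_lunch = rng.uniform(0, 1, n)
salary_pp = rng.normal(7000, 1000, n)      # certified salaries per pupil
county = rng.integers(0, 5, n)             # five counties
sgp = 50 - 10 * pct_free_lunch + 0.002 * salary_pp + rng.normal(0, 5, n)

# Design matrix: intercept, predictors, county dummies (one dropped
# to avoid collinearity) -- the county dummies make each school's
# prediction relative to others in its own county
county_dummies = (county[:, None] == np.arange(1, 5)).astype(float)
X = np.column_stack([np.ones(n), pct_free_lunch, salary_pp, county_dummies])

beta, *_ = np.linalg.lstsq(X, sgp, rcond=None)
residuals = sgp - X @ beta   # "growth against expectations"

# A school beating its prediction has a positive residual
above_expectations = residuals > 0
```

With an intercept in the model, these residuals average to zero by construction, so they rank schools against the statewide model while holding their county, students, and resources constant.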

Now let’s take a look at performance distributions of charter and district schools for 2013 and for 2012 on math and language arts growth. We know from my previous research note that the average difference in growth between charter and district schools was “0.” But averages are uninteresting and provide little policy guidance. What’s more interesting is evaluating the variation in charter, and for that matter district school growth, corrected for the various factors above.

Figure 2 shows the statewide 2013 performance profile – the relationship between corrected language arts and corrected math growth percentiles. The two are modestly related (.49). Schools in the upper right are high on both and in the lower left are low on both. Charters, like district schools, are scattered throughout the distribution.

Figure 2


Figure 3 looks pretty much like Figure 2, with charter and district schools scattered. In both cases, it is generally true, via modest correlation (.53), that schools that were high on one assessment tended to be higher on the other. I cannot be entirely confident whether these patterns reflect true quality differences in production of outcomes, or whether they simply represent outside factors not fully controlled for in the models.

Figure 3


Figure 4 looks specifically at schools in the city of Newark in 2013. Again, charter and district schools are scattered, with some district schools performing quite high on both LA and Math. Higher performing charters (on both), in terms of resource- and need-adjusted growth, include Discovery, Maria Varisco Rogers and Newark Educators Charter, while low performing charters include University Heights and Greater Newark.

Figure 4


TEAM Academy was average on Math and slightly above average on LA. Robert Treat was average on LA and slightly below average on Math. North Star was slightly above average on both.

Patterns are similar for 2012, with Discovery being the standout, and University Heights being in positive rather than negative position. North Star again showed better than average growth on both tests, but TEAM showed slightly below average growth adjusted for resources, students, enrollment size and grade range.

Figure 5


These final two graphs rank charter schools statewide by their performance on growth measures, given their resources, students, enrollment size and grade range. Figure 6 shows the 2013 ranking and Figure 7 shows the 2012 ranking. Both charts are sorted from lowest (average across both tests) to highest growth against expectations.

Figure 6 shows that Freedom Academy, Discovery Charter School and Camden’s Promise had the greatest achievement growth given their resources, students, enrollment size and grade range, while Union County TEAMS, Sussex County CS for TEC and East Orange CS had the lowest growth against expectations. In 2012, Discovery and Camden’s Promise also did very well.

But other more “talked about” charters fall within or closer to the average mix of schools. Specifically, large and long-running charter operators in Newark include TEAM Academy, whose performance is consistently around average (slightly above or slightly below). North Star Academy is consistently slightly to modestly above average, while Robert Treat Academy is more consistently below average on the student growth measures adjusted for resources, students, enrollment size and grade range.

Importantly, the distribution of charters around the mean is not different from the distribution of district schools around the statewide mean, as shown in Figures 2 and 3 above, and as estimated in my previous brief.[2]

Figure 6


Figure 7



Conclusions & Implications

Of course, the big question is what to make of all of this, if anything. Much has been debated in recent years about the average test scores and proficiency rates of these schools, and about how charter and district schools compare on those measures. That debate requires a cautious accounting for a variety of student background characteristics which substantively influence status measures of student performance.

Notably, the state growth percentile measures likewise require substantial adjustment for student characteristics. Even so, these measures should provide more insight into differences across schools, most notably whether the kids under their watch are achieving normatively better or worse growth on math and language arts assessments.

As I’ve opined on numerous occasions, the interesting question is not whether the charter sector on the whole or by location “outperforms” district schools, but rather, what’s going on behind the variation. Using analyses of this type, we should begin exploring in greater depth what’s going on in schools more consistently in the upper right and lower left hand corners of these distributions. Applying these methods and measures, we may find schools we hadn’t previously considered worthy of that closer look.


[1] https://njedpolicy.files.wordpress.com/2014/06/bbaker-sgps_and_otherstuff2.pdf

[2] https://njedpolicy.files.wordpress.com/2014/06/bbaker-sgps_and_otherstuff2.pdf


Raw model output: Productivity Output

Stata code for compiling (and rolling up) resource and demographic measures: Step 1-Staffing Files | Step 2-SRC Aggregation | Step 3-School Resource Aggregation

Research Note: On Teacher Effect vs. Other Stuff in New Jersey’s Growth Percentiles

Bruce D. Baker, Rutgers University, Graduate School of Education

June 2, 2014

PDF: BBaker.SGPs_and_OtherStuff

In this research note, I estimate a series of models to evaluate variation in New Jersey’s school median growth percentile measures. These measures of student growth are intended by the New Jersey Department of Education to serve as measures of both school and teacher effectiveness. That is, the effect that teachers and schools have on marginal changes in their median student’s test scores in language arts and math from one year to the next, all else equal. But all else isn’t equal, and that matters greatly!

Variations in student test score growth estimates, generated either by value-added models or growth percentile methods, contain three distinct parts:

  1. “Teacher” effect: Variations in changes in numbers of items answered correctly that may be fairly attributed to specific teaching approaches/ strategies/ pedagogy adopted or implemented by the child’s teacher over the course of the school year;
  2. “Other stuff” effect: Variations in changes in numbers of items answered correctly that may have been influenced by some non-random factor other than the teacher, including classroom peers, after school activities, health factors, available resources (class size, texts, technology, tutoring support), room temperature on testing days, other distractions, etc;
  3. Random noise: Variations in changes in numbers of items answered correctly that are largely random, based on poorly constructed/asked items, child error in responding to questions, etc.

In theory, these first two types of variations are predictable. I often use a version of Figure 1 below when presenting on this topic.

We can pick up variation in growth across classrooms, which is likely partly attributable to the teacher and partly attributable to other stuff unique to that classroom or school. The problem is that, since the classroom (or school) is the unit of comparison, we really can’t sort out which share is which.

Figure 1


We can try to sort out the variance by adding more background measures to our model, including student individual characteristics, student group characteristics, class sizes, etc., or by constructing more intricate analyses involving teachers who switch settings. But we can never really get to a point where we can be confident that we have correctly parsed that share of variance attributable to the teacher versus that share attributable to other stuff. And the most accurate, intricate analyses can rarely be applied to any significant number of teachers.

Thankfully, to make our lives easier, the New Jersey Department of Education has chosen not to try to parse the extent to which variation in teacher or school median growth percentiles is influenced by other stuff. They rely instead on two completely unfounded, thoroughly refuted claims:

  1. By accounting for prior student performance (measuring “growth” rather than level) they have fully accounted for all student background characteristics (refuted here[1]); and
  2. Thus, any uneven distribution of growth percentiles, for example, lower growth percentiles in higher poverty schools, is a true reflection of the distribution of teacher quality (refuted here[2]).

In previous analyses I have explored predictors of New Jersey growth percentiles at the school level, including the 2012 and 2013 school reports. Among other concerns, I have found that the year over year correlation (across schools) between growth percentiles is only slightly stronger than the correlation between growth percentiles and school poverty.[3] That is, NJ SGPs tend to be about as correlated with other stuff as they are with themselves year over year. One implication of this finding is that even the year-over-year consistency is merely consistently measuring the wrong effect year over year. That is, the effect of poverty.
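The comparison being made here reduces to two Pearson correlations over the same set of schools: SGPs against themselves a year later, and SGPs against poverty. A sketch under assumed (synthetic) data, where both years’ SGPs are partly driven by the same poverty variable:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300

poverty = rng.uniform(0, 1, n)   # % free lunch (synthetic)

# Both years' SGPs share a poverty component plus independent noise
# (coefficients are illustrative assumptions, not NJ estimates)
sgp_2012 = 60 - 20 * poverty + rng.normal(0, 8, n)
sgp_2013 = 60 - 20 * poverty + rng.normal(0, 8, n)

r_year_to_year = np.corrcoef(sgp_2012, sgp_2013)[0, 1]
r_with_poverty = np.corrcoef(sgp_2013, poverty)[0, 1]
```

Under data generated this way, the year-over-year correlation and the (negative) correlation with poverty come out at comparable magnitudes, which is the pattern the text describes for the actual NJ school reports.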

In the following models, I take advantage of a richer data set in which I have used a) school report card measures, b) school enrollment characteristics and c) detailed statewide staffing files and have combined those data sets into one, multi-year data set which includes outcome measures (SGPs and proficiency rates), enrollment characteristics (low income shares, ELL shares) and resource measures derived from the staffing files.

Following are what I would characterize as exploratory regression models, using 3 years of measures of student populations, resources and school features as predictors of 2012 and 2013 school median growth percentiles.

Resource measures include:

  • Competitiveness of wages: a measure of how much teachers’ actual wages differ from predicted wages for all teachers in the same labor market (metro area) in the same job code, with the same total experience and degree level (estimated via regression model). This measure indicates the wage premium (>1.0) or deficit (<1.0) associated with working in a given school or district, and is constant across all same job code teachers across schools within a district. It is created using teacher level data from the fall staffing reports from 2010 through 2012.
  • Total certified teaching staff per pupil (staffing intensity): created by summing the full time certified classroom teaching staff for each school and dividing by the total enrolled pupils, using teacher level data from the fall staffing reports from 2010 through 2012.
  • % Novice teachers with only a bachelor’s degree: the number of classroom teachers with fewer than 3 years of experience and only a bachelor’s degree, divided by the total number of classroom teachers, again using teacher level data from the fall staffing reports from 2010 through 2012.
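The wage competitiveness measure in the first bullet can be approximated as the ratio of a teacher’s actual salary to the salary predicted by a regression on experience, degree, and labor market. This is my own minimal reconstruction with synthetic teacher-level data, not the brief’s actual Stata specification:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# Synthetic teacher-level data (illustrative stand-ins for the staffing files)
experience = rng.integers(0, 30, n).astype(float)
masters = rng.integers(0, 2, n).astype(float)      # 1 = MA or higher
metro = rng.integers(0, 4, n)                      # labor market (metro area)
district_premium = rng.normal(1.0, 0.05, n)        # unobserved district effect
salary = 45000 * district_premium * (1 + 0.02 * experience + 0.10 * masters)

# Predict salary from experience, degree, and labor-market dummies
metro_dummies = (metro[:, None] == np.arange(1, 4)).astype(float)
X = np.column_stack([np.ones(n), experience, masters, metro_dummies])
beta, *_ = np.linalg.lstsq(X, salary, rcond=None)
predicted = X @ beta

# Index > 1.0: paid above the going rate for comparable peers in the
# same labor market; < 1.0: paid below it
wage_index = salary / predicted
```

In the actual measure this ratio would then be averaged up to the district level within job codes, so that it reflects the district’s wage premium rather than individual teachers’ residuals.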

I have pointed out previously that it would be inappropriate to consider a teacher or school to be failing, or successful for that matter, simply because of the children they happen to serve. Estimation bias with respect to student population characteristics is a huge validity concern regarding the intended uses of New Jersey’s growth percentile measures.

The potential influence of resource variations presents a comparable validity concern, though the implications vary by resource measure. If we find, for example that teachers receiving a more competitive wage are showing greater gains, we might assert that the wage differential offered by a given district is leading to a more effective teacher workforce. A logical policy implication would then be to provide resources to achieve wage premiums in schools and districts serving the neediest children, and otherwise lagging most on measures of student growth.

Of course, schools having more resources for use in one way – wages – may also have other advantages. If we find that overall staffing intensity is a significant predictor of student growth, it would be unfair to assert that the growth percentiles reflect teacher quality, since growth in some schools would be greater than in others because of more advantageous staffing ratios. Rather than firing the teachers in the schools producing low growth, the more logical policy response would be to provide those schools the additional resources to achieve similarly advantageous staffing ratios.

With these models, I also test assumptions about variations across schools within larger and smaller geographic areas – counties and cities. This geography question is important for a variety of reasons.

New Jersey is an intensely racially and socioeconomically segregated state. Most of that segregation occurs between municipalities far more so than within municipalities. That is, it is far more likely to encounter rich and poor neighboring school districts than rich and poor schools within districts. Yet education policy in New Jersey, like elsewhere, has taken a sharp turn toward reforms which merely reshuffle students and resources among schools (charter and district) within cities, pulling back significantly from efforts to target additional resources to high need settings.

Figure 2 shows that from the early 1990s through about 2005, New Jersey placed significant emphasis on targeting additional resources to higher poverty school districts. Since that time, New Jersey’s school funding progressiveness has backslid dramatically. And these are the very resources needed for districts – especially high need districts – to provide wage differentials to recruit and retain a high quality workforce, coupled with sufficient staffing ratios to meet their students’ needs.

Figure 2



Table 1 shows the estimates from the first set of regression models which identify predictors of cross school and district, within county variation in growth percentiles. The four separate models are of language arts and math growth percentiles (school level) from the 2012 and 2013 school report cards. These models show that:

Student Population Other Stuff

  1. % free lunch is significantly, negatively associated with growth percentiles for both subjects and both years. That is, schools with higher shares of low income children have significantly lower growth percentiles;
  2. When controlling for low income concentrations, schools with higher shares of English language learners have higher growth percentiles on both tests in both years;
  3. Schools with larger shares of children already at or above proficiency tend to show greater gains on both tests in both years;

School Resource Other Stuff

  1. Schools with more competitive teacher salaries (at constant degree and experience) have higher growth percentiles on both tests in both years.
  2. Schools with more full time classroom teachers per pupil have higher growth percentiles on both tests in both years.

Other Other Stuff

  1. Charter schools have neither higher nor lower growth percentiles than otherwise similar schools in the same county.


TABLE 1. Predicting within County, Cross School (cross district) Variation in New Jersey SGPs


*p<.05, **p<.10

TABLE 2. Predicting within City Cross School (primarily within district) Variation in New Jersey SGPs


*p<.05, **p<.10

Table 2 includes a fixed effect for city location. That is, Table 2 runs the same regressions as in Table 1, but compares schools only against others in the same city. In most cases, because of municipal/school district alignment in New Jersey, comparing within the same city means comparing within the same school district. But, using city as the unit of analysis permits comparisons of district schools with charter schools in the same city.

In Table 2 we see that student population characteristics remain the dominant predictor of growth percentile variation. That is, across schools within cities, student population characteristics significantly influence growth percentiles. But the influence of demography on destiny, shall we say (as measured by SGPs), is greater across cities than within them, an entirely unsurprising finding. Resource variations within cities show few significant effects. Notably, our wage index measure does not vary within districts but rather across them, and was replaced in these models by a measure of average teacher experience. Again, there was no significant difference in average growth between charters and other similar schools in the same city.

Preliminary Policy Implications

The following preliminary policy implications may be drawn from the preceding regressions.

Implication 1: Because student population characteristics are significantly associated with SGPs, the SGPs are measuring differences in students served rather than, or at the very least in addition to, differences in collective (school) teacher effectiveness. As such, it would simply be wrong to use these measures in any consequential way to characterize either teacher or school performance.

Implication 2: SGPs reveal positive effects of substantive differences in key resources, including staffing intensity and competitive wages. That is, resource availability matters and teachers in settings with access to more resources are collectively achieving greater student growth. SGPs cannot be fairly used to compare school or teacher effectiveness across schools and districts where resources vary.

These findings provide support for a renewed emphasis on progressive distribution of school funding – that is, for giving schools and districts serving higher concentrations of low income children, and currently showing lower growth, the opportunity to provide the wage premiums and staffing intensity required to offset these deficits.[4]

Implication 3: The presence of stronger relationships between student characteristics and SGPs across schools and districts within counties, versus across schools within individual cities, highlights the reality that between district (between city) segregation of students remains a more substantive equity concern than within city segregation of students across schools.

As such, policies which seek merely to reshuffle students across charter and district schools within cities and without attention to resources are unlikely to yield any substantive positive effect in the long run. In fact, given the influence of student sorting on the SGPs, sorting students within cities into poorer and less poor clusters will likely exacerbate within city achievement gaps.

Implication 4: The presence of significant resource effects across schools and districts within counties, but the lack of resource effects across schools within cities, reveals that between district disparities in resources, coupled with sorting of students and families, remain a significant concern, and a more substantive one than within district inequities. Again, this finding supports a renewed emphasis on targeting additional resources to districts serving the neediest children.

Implication 5: Charter schools do not vary substantively on measures of student growth from other schools in the same county or city when controlling for student characteristics and resources. As such, policies assuming that “chartering” in-and-of-itself (without regard for key resources) can improve outcomes are likely misguided. This is especially true where such policies do little more than reshuffle low- and lower-income minority students across schools within city boundaries.





[4]This finding also directly refutes the dubious assertion by NJDOE officials in their 2012 school funding report that the additional targeted funding was not only doing no good, but potentially causing harm and inducing inefficiency. http://schoolfinance101.wordpress.com/2012/12/18/twisted-truths-dubious-policies-comments-on-the-njdoecerf-school-funding-report/

Buyer Beware: One Newark and the Market For Lemons

Mark Weber, PhD student, Rutgers University, Graduate School of Education

PDF of Policy Brief: Weber_OneNewarkLemonsFINAL

The cost of dishonesty, therefore, lies not only in the amount by which the purchaser is cheated; the cost also must include the loss incurred from driving legitimate business out of existence.

– George A. Akerlof, The Market for “Lemons”: Quality Uncertainty and the Market Mechanism.

In his classic economics paper, Akerlof[1] addresses the problem of “asymmetrical information” in market systems. Using the used car market as an example, Akerlof shows that consumers who do not have good information about the quality of goods often get caught buying “lemons.” This not only hurts the individual consumer; it damages the market as a whole, as honest consumers and producers refuse to participate, concerned that false information keeps consumers from distinguishing a good car from a “lemon.”

The “school choice” movement is predicated on the idea that treating students and their families as “consumers” of education will introduce market forces into America’s school systems and improve the quality of education for all.[2]

But what if those families must make their choices armed only with incomplete or faulty data? How can a market operate successfully when consumers suffer from an asymmetry of information?

This brief looks at one example of asymmetrical information in a school choice system: Newark, NJ, whose schools were recently restructured under a plan entitled One Newark.

Newark’s schools have been under state control for nearly two decades; the elected school board only serves in an advisory capacity, making rapid, large-scale transformations much easier to facilitate. Under State Superintendent Cami Anderson, the district introduced One Newark, a plan that calls for students and their families to select a list of eight schools in order of preference for enrollment in the fall of 2014.

This author, in collaboration with Dr. Bruce D. Baker of Rutgers University and Dr. Joseph Oluwole of Montclair State University, has published several briefs analyzing One Newark’s consequences.[3] Among our findings:

  • The plan affects a disproportionate number of black and low-income students, whose schools are more likely to close, to be turned over to charter management organizations (CMOs), or to be “renewed.”
  • CMOs in Newark have little experience educating student populations demographically equivalent to those of the NPS schools; consequently, there is little evidence they will perform any better.
  • The statistical practices and models NPS has used to justify the classification of schools are fundamentally flawed.

This last point is of particular concern. The One Newark application[4] gives one of three ratings for each school a family may choose: “Great,” “On The Move,” or “Falling Behind.” While the district does offer its own profiles of each school, and the NJ Department of Education does offer individual school profiles, it is likely that the ratings on the application will have the most influence on families’ decisions.

If, however, these ratings suffer from the same defects we found in NPS’s previous attempts to classify schools – the lack of accounting for student characteristics, poor statistical practice, and using flawed or incomplete measures, among other problems – families may have a disadvantage when attempting to make an informed choice.

To ascertain whether the One Newark application ratings make sense, I used a statistical modeling technique embraced by NPS itself: linear regression. Only schools reporting Grade 8 test scores were included. The model here uses four covariates the district acknowledges affect test score outcomes: free lunch eligibility, Limited English Proficiency (LEP) status, special education status, and race.[5] The percentage of each student subpopulation for each school is included in this model, along with a covariate for gender, which has been shown to have an effect on test-based outcomes. This model is quite robust: over three-quarters of the difference in test-based outcomes can be statistically explained by these five student population characteristics.
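A regression of this general shape – school mean scores on five demographic shares, with R² reported as the share of variance explained – can be sketched as follows. The data below are synthetic and the coefficients are placeholders, not estimates from the Newark data:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60  # on the order of the number of schools reporting Grade 8 scores

# Synthetic school-level shares (illustrative only)
free_lunch = rng.uniform(0.5, 1.0, n)
lep = rng.uniform(0.0, 0.3, n)
sped = rng.uniform(0.05, 0.25, n)
black = rng.uniform(0.1, 1.0, n)
male = rng.uniform(0.45, 0.55, n)
scale_score = (220 - 30 * free_lunch - 20 * lep - 40 * sped
               - 10 * black - 5 * male + rng.normal(0, 3, n))

# OLS fit of scores on the five covariates
X = np.column_stack([np.ones(n), free_lunch, lep, sped, black, male])
beta, *_ = np.linalg.lstsq(X, scale_score, rcond=None)
predicted = X @ beta

# R^2: share of variance in outcomes explained by the covariates
ss_res = np.sum((scale_score - predicted) ** 2)
ss_tot = np.sum((scale_score - scale_score.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# Difference from prediction: the quantity plotted against the ratings
diff_from_prediction = scale_score - predicted
```

The `diff_from_prediction` values are what get compared, school by school, against the One Newark ratings in the figures that follow.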

The outcome used here is the one preferred by NPS: scale scores on the English Language Arts (ELA) section of the NJASK, New Jersey’s yearly statewide test. NPS averages this score across grade levels; however, as we have shown in our previous reports on One Newark, this is poor practice, as scale score means and distributions vary by grade.[6] I explore this problem more fully in the Appendix; for now, however, I accede to NPS and use their preferred measure, however flawed it may be.
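One common alternative to averaging raw scale scores across grades, given that grade-level means and spreads differ, is to standardize within grade first and then average the z-scores. A hedged sketch with synthetic district data (illustrating the general technique, not the method NPS used):

```python
import numpy as np

rng = np.random.default_rng(5)
n_schools = 10

# District-wide school means by grade (synthetic): each grade has its
# own center and spread, which is why averaging raw scores is problematic
grade_means = {3: 200.0, 4: 212.0, 5: 196.0}
grade_sds = {3: 10.0, 4: 15.0, 5: 11.0}
scores = {g: rng.normal(grade_means[g], grade_sds[g], n_schools)
          for g in grade_means}

# Standardize each grade against the district distribution, then
# average the z-scores to get one comparable index per school
z = {g: (scores[g] - scores[g].mean()) / scores[g].std() for g in scores}
school_index = np.mean([z[g] for g in z], axis=0)
```

Because each grade is centered before averaging, no single grade’s higher scale dominates the combined index the way it does in a raw cross-grade average.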

When all five covariates are included in this model, they create a prediction of how a school will perform (relative to the other schools in Newark). We can then compare the predicted performance of the school with its actual performance. While not all of the difference can or should be attributed to the effectiveness of the school, this technique does allow us to compare the school’s performance against prediction to how the district rated the school in the One Newark application.

Figure 1

Figure 1 shows the difference from prediction for Newark schools – both NPS and charters – and their rating under One Newark. Schools that are being closed or turned over to CMOs are included for comparison. This graph illustrates several important points:

  • While there are many schools labeled as “Falling Behind” that perform below prediction, there are several schools that perform above prediction. Miller St. and South 17th, in particular, perform especially well given their student population. Under what criteria does NPS find that these schools are “Falling Behind”?
  • Conversely, several schools that perform below prediction are rated as “On The Move” or “Great.”
  • Only one charter school in the One Newark application is rated “Falling Behind” (University Heights, which did not report Grade 8 scores and is, therefore, not included in this analysis). But two charters in the application perform below prediction (Greater Newark and Great Oaks), and all except North Star[7] perform below Miller and South 17th.
  • Two other charters that perform below prediction – Robert Treat and Maria Varisco-Rogers – are not included in the One Newark application; these schools opted not to participate in the universal enrollment process.

Certainly, no school should be judged solely on one (flawed) metric. The point here, however, is that even by NPS’s own questionable standards, the classification of schools under the One Newark rating system appears to be arbitrary and capricious.

To be fair, the One Newark application does state that the district did not use averaged scale scores as its sole measure of a school’s effectiveness (to my knowledge, however, NPS has never publicly released a white paper or other document that outlines its precise methodology for rating schools). The district has also used median Student Growth Percentiles (mSGPs) to create its ratings.

SGPs are ostensibly measures of growth that do not penalize schools whose students start at lower absolute levels but still demonstrate progress. Supposedly, SGPs account for differences in student population characteristics, which are correlated with test results. Former Education Commissioner Christopher Cerf[8] has stated: “You are looking at the progress students make and that fully takes into account socio-economic status… By focusing on the starting point, it equalizes for things like special education and poverty and so on.”

If this were true, we would expect to see little correlation between a school’s average scale score – its “starting point” – and its mSGP. Figure 2 plots these two measures in one graph.

Figure 2

As this scatterplot shows, there is a moderate but statistically significant correlation between a school’s growth and its “starting point.” The bias that student characteristics create in a school’s scale score therefore carries over, at least in part, into its mSGP. In other words: SGPs are influenced by student characteristics, but NPS does not account for that bias when using SGPs to create its ratings.[9]
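The check behind Figure 2 is a simple correlation. A minimal sketch with simulated data (all numbers invented for illustration): if SGPs truly “equalized” for student background, the correlation between starting point and growth would be near zero.

```python
import numpy as np

# Simulated school-level data in which growth (mSGP) partially
# reflects the starting point (average scale score). Parameters invented.
rng = np.random.default_rng(1)
scale_score = rng.normal(195, 12, 60)                       # "starting point"
msgp = 50 + 0.6 * (scale_score - 195) + rng.normal(0, 8, 60)

r = np.corrcoef(scale_score, msgp)[0, 1]
print(f"correlation between scale score and mSGP: r = {r:.2f}")
# Near-zero r would support the claim that SGPs equalize for background;
# a moderate positive r, as in Figure 2, indicates residual bias.
```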

If a school’s student population, then, affects its mSGP, how do student characteristics affect the One Newark ratings? Figure 3 shows the differences in student populations for all three classifications.

Schools that are “Falling Behind” have significantly larger proportions of black students than schools that are “On The Move” or “Great.” Those “Great” schools also have significantly fewer students in poverty (as measured by free lunch eligibility) than “Falling Behind” and “On The Move” schools. “Great” schools also serve fewer special education students, and a slightly smaller proportion of boys.

Figure 3


Arguably, the One Newark rating is less a measure of a school’s effectiveness than it is a measure of its student population. Families who choose “Great” schools are really choosing schools with fewer black, poor, and special-needs students.

There is a serious debate to be had as to whether a “choice” system of education is viable for a city like Newark. If, however, NPS has committed to One Newark, it should view its role as a “consumer advocate,” correcting the asymmetry in information and providing justifiable school ratings, rather than limiting the choices students and their families have.

Unfortunately, it appears that NPS is choosing not to be an impartial arbiter; by forcing the closure of NPS schools that, by at least some measures, outperform charters, the district is actively distorting the market forces it claims will improve education.

Under Akerlof’s theory, then, One Newark may not only leave more students stuck with lemons; it may actually drive more non-lemons out of the market.


Technical Appendix: Problems With Averaging Scale Scores


In its response to our first report on One Newark, NPS made the case that averaging scale scores across grade levels is a superior methodology to ours, which used Grade 8 proficiency rates. We acknowledged that scale scores are a limited but legitimate measure of test-based student performance; certainly no more limited than proficiency rates, and arguably just as valid and reliable.

In our response to NPS, however, we do argue that while scale scores are acceptable for this sort of analysis, averaging scale scores across grade levels creates a distortion that renders the scale scores less valid as school performance measures.

The problem with averaging scale scores across grades is that each grade level has a different mean scale score and a different distribution of scores around that mean. Table 1, originally presented in our response, shows the different mean scores for each grade level of Newark’s schools, both charter and NPS. The Grade 8 mean score differs from the Grade 4 mean score by over 16 points.

Table 1– Weighted Mean Scale Scores, NJASK LAL, 2013, Newark Only (Charter & NPS)

Test    Obs     Mean       Std. Dev.   Min     Max
LAL 8   3301    205.0583   11.06671    183.0   235.8
LAL 7   3154    193.2245   15.93290    170.5   227.6
LAL 6   3631    192.7007   11.03825    172.9   224.5
LAL 5   3255    189.9525   12.66214    166.1   217.0
LAL 4   3223    188.3744   14.46348    165.6   235.5
LAL 3   3680    194.5205   12.04550    173.9   235.7

Why does this matter? Consider two schools with exactly the same average scale scores in all grades; now imagine that they each scored exactly at the citywide mean in all grades. One school, however, has considerably more 8th graders than 4th graders. That school would have an advantage when compared to the other: its larger proportion of 8th graders would push up its overall average, because the mean score for 8th grade is higher than the mean score for 4th. Weighting the means by the number of students in each grade wouldn’t solve this problem; in fact, it creates the problem, because the “average” student in 8th grade gets a higher score than the “average” student in 4th. More weight is being put on the score that is arbitrarily higher.
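The distortion described above can be shown with a few lines of arithmetic, using the Grade 4 and Grade 8 citywide means from Table 1. Both hypothetical schools score exactly at the citywide mean in every grade they serve; only their grade mix differs.

```python
# Grade 4 and Grade 8 citywide mean scale scores from Table 1
mean_g4, mean_g8 = 188.3744, 205.0583

def weighted_avg(enrollments_and_scores):
    """Enrollment-weighted average scale score across grades."""
    total = sum(n for n, _ in enrollments_and_scores)
    return sum(n * s for n, s in enrollments_and_scores) / total

# Both schools score exactly at the citywide mean in each grade
school_a = [(100, mean_g4), (300, mean_g8)]   # top-heavy: more 8th graders
school_b = [(300, mean_g4), (100, mean_g8)]   # bottom-heavy: more 4th graders

print(weighted_avg(school_a))  # ~200.9: higher, despite identical performance
print(weighted_avg(school_b))  # ~192.5
```

Two schools that are performing identically relative to their grade-level peers end up more than eight points apart on the grade-averaged measure, purely because of enrollment composition.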

This problem is further compounded when running a linear regression. Because the dependent variable, grade-averaged mean ELA scale scores, is distorted by grade enrollment, the independent variables do not have a consistent relationship to the dependent variable from school to school. In effect, the rules change for every player.

A more defensible technique for averaging across grades is to run a linear regression for each grade, then calculate standardized residuals, which allow for comparisons across different mean scores. Those residuals are then averaged, weighted for student enrollment.
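The grade-by-grade procedure described above can be sketched as follows. The data and the two covariates are simulated for illustration; this is not the report’s actual model or data.

```python
import numpy as np

# Sketch: fit a separate regression within each grade, standardize the
# residuals (so they are comparable across grades with different means and
# spreads), then take an enrollment-weighted average per school.
rng = np.random.default_rng(2)

def standardized_residuals(X, y):
    X1 = np.column_stack([np.ones(len(y)), X])        # add intercept
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return (resid - resid.mean()) / resid.std()       # z-scores within grade

n_schools = 30
school_scores = {}   # school -> list of (enrollment, z) pairs, one per grade
for grade in (4, 8):
    X = rng.uniform(0, 1, (n_schools, 2))             # e.g., % free lunch, % special ed (assumed)
    y = 190 + grade + X @ np.array([-25.0, -10.0]) + rng.normal(0, 3, n_schools)
    z = standardized_residuals(X, y)
    enroll = rng.integers(40, 120, n_schools)
    for s in range(n_schools):
        school_scores.setdefault(s, []).append((enroll[s], z[s]))

# Enrollment-weighted average of standardized residuals per school
rating = {s: sum(n * v for n, v in rows) / sum(n for n, _ in rows)
          for s, rows in school_scores.items()}
print(sorted(rating, key=rating.get, reverse=True)[:5])  # top 5 schools
```

Because each grade’s residuals are standardized before averaging, a school gains no advantage from serving more students in a grade whose mean score happens to be higher.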

Figure 4 uses this methodology. Careful readers will notice that the relative position of many schools has shifted from Figure 1, significantly in some cases. Once again, however, there are “Great” schools that underperform relative to “Falling Behind” schools.

Even under this improved method, the classification of schools under One Newark remains arbitrary and capricious.



[1] Akerlof, G.A. (1970). The Market for “Lemons”: Quality Uncertainty and the Market Mechanism. Quarterly Journal of Economics, 84(3), 488-500. https://www.iei.liu.se/nek/730g83/artiklar/1.328833/AkerlofMarketforLemons.pdf

[2] For a classic example, see: Friedman, M. (1980) “What’s Wrong With Our Schools?” http://www.edchoice.org/the-friedmans/the-friedmans-on-school-choice/what-s-wrong-with-our-schools-.aspx

[3] – An Empirical Critique Of “One Newark”: https://njedpolicy.wordpress.com/2014/01/24/new-report-an-empirical-critique-of-one-newark/
– “One Newark’s” Racially Disparate Impact On Teachers:
– A Response to “Correcting the Facts about the One Newark Plan: A Strategic Approach To 100 Excellent Schools”: https://njedpolicy.wordpress.com/2014/03/24/a-response-to-correcting-the-facts-about-the-one-newark-plan-a-strategic-approach-to-100-excellent-schools/

[4] The paper application used for One Newark is no longer available at NPS’s website. Originally retrieved from: http://onewark.org/wp-content/uploads/2013/12/One-Newark-Enrolls-Paper-Application.pdf

[5] http://onewark.org/wp-content/uploads/2013/12/StrategicApproach.pdf

[6] See: A Response to “Correcting the Facts about the One Newark Plan: A Strategic Approach To 100 Excellent Schools,” p. 8.

[7] North Star, however, does engage in significant patterns of student cohort attrition that likely affect its student population and test scores. See: http://schoolfinance101.wordpress.com/2013/10/25/friday-story-time-deconstructing-the-cycle-of-reformy-awesomeness/



A Response to “Correcting the Facts about the One Newark Plan: A Strategic Approach To 100 Excellent Schools”

Full report here: Weber.Baker.OneNewarkResponsewithexecsum

Mark Weber & Bruce Baker


This brief is a response to the Newark Public Schools rebuttal of our analysis of the district’s schools restructuring plan, One Newark. In this response, we find:

  • The consequences of the One Newark plan are racially disparate, creating possible grounds for a legal challenge by both students’ families and staff. NPS, however, has not acknowledged this part of our analysis.
  • NPS uses scale scores from state tests, averaged across grade levels, in their rebuttal. We find these measures to be seriously flawed, and certainly no better than the measures we used in our initial report.
  • Even using these flawed measures, we still find the classifications of schools under One Newark to be arbitrary and capricious when accounting for student population characteristics.
  • Even when using scale scores, we find no evidence that the student population of Newark will do better under schools run by charter management organizations. Further, the patterns of student cohort attrition in some charter schools and other behaviors lead us to question the validity of One Newark’s charter takeover strategy.
  • The statistical models used by NPS in their rebuttal are fundamentally flawed: specifically, the author(s) did not account for collinearity within the NPS model, biasing the results towards NPS’s favored position.


On March 11, 2014, the Newark Public Schools (NPS) released a response to our policy brief of January 24, 2014: “An Empirical Critique of One Newark.”[1] Our brief examined the One Newark plan, a proposal by NPS to close, “renew,” or turn over to charter management organizations (CMOs) many of the district’s schools. Our brief reached the following conclusions:

  •  Measures of academic performance are not significant predictors of the classifications assigned to NPS schools by the district, when controlling for student population characteristics.
  • Schools assigned the consequential classifications have substantively and statistically significantly greater shares of low income and black students.
  • Further, facilities utilization is also not a predictor of assigned classifications, though utilization rates are somewhat lower for those schools slated for charter takeover.
  • Proposed charter takeovers cannot be justified on the assumption that charters will yield better outcomes with those same children, because the charters in question do not currently serve similar children. Rather, they serve less needy children, and when school aggregate performance measures are adjusted for the children they serve, they achieve no better current outcomes on average than the schools they are slated to take over.
  • Schools slated for charter takeover or closure specifically serve higher shares of black children than do schools facing no consequential classification. Schools classified under “renew” status serve higher shares of low‐income children.

In its response[2], NPS questions both our methodology and our data sources. We are pleased to engage NPS in a thoughtful dialogue about One Newark; however, their rebuttal unfortunately confirms many of our conclusions about the plan, and refuses to even acknowledge many of our critiques.

Rather than answer NPS’s criticisms point-by-point, we take this opportunity to focus on the larger issues NPS raises about our brief, addressing specific arguments within the body of this response. It is our intention here to further the dialogue about One Newark in the hopes that NPS will move toward a position of transparency and engagement with stakeholders, both in and out of Newark.


We are pleased that “An Empirical Critique of One Newark” has generated a response from the Newark Public Schools administration. We have watched over the last few months as the topic of the One Newark plan has generated strong reactions from stakeholders both in and out of Newark. Given the changes that One Newark will bring – changes that even NPS agrees are profound and far-reaching – a measured, careful analysis of the rationale and consequences of these changes is clearly necessary.

Our conclusions are informed by public data using standard statistical methods. We labor to make our results replicable and understandable: we believe it is a testament to our work that NPS was able to respond to “An Empirical Critique” without any questions as to why we reached the conclusions that we did, even if they disagreed with those conclusions.

We believe it is time for NPS to make a similar commitment to transparency in their own formulations of policy. Despite their protestations, we are still no closer to understanding how NPS classified particular schools than we were before. We still do not know NPS’s rationale for why three particular schools are being taken over by two particular CMOs. We still do not know why staff at particular schools face an employment consequence while staff at other schools do not. We don’t know why NPS proposes to divest particular facilities to particular parties.

Backwards-engineering a rationale for One Newark does not contribute to transparency. Using flawed measures like averaged scale scores does not increase stakeholders’ faith in NPS’s ability to justify its plan. Engaging in poor statistical practice does not lead to confidence in NPS’s judgments. And failing to fulfill legal obligations to release data in a timely manner does not encourage a candid exchange of views.

We agree that the educational outcomes of Newark’s students are not acceptable, and that change is needed in the lives of Newark’s deserving children. Whether that change can come solely, or even primarily, through the policies of a state-run school district is an open question. We heartily agree, however, that school policies certainly matter, and Newark should constantly strive to make its schools better, even in the face of seemingly insurmountable problems whose solutions lie outside the purview of the public schools.

But no change can come unless and until an open dialogue about education takes place in front of a well-informed public, where all stakeholders have access to the inner workings of the mechanisms that generate policies. If our briefs have compelled NPS to begin to engage in this dialogue, we will consider our time analyzing One Newark to have been well spent.

“One Newark’s” Racially Disparate Impact on Teachers

PDF of Policy Brief: Weber.Baker.Oluwole.Staffing.Report_3_10_2014_FINAL

As with our previous One Newark policy brief, this one is too long and complex to post in full as a blog. Below are the executive summary and conclusions and policy recommendations. We encourage you to read the full report at the link above.

Executive Summary

In December of 2013, State Superintendent Cami Anderson introduced a district-wide restructuring plan for the Newark Public Schools (NPS). In our last brief on “One Newark,” we analyzed the consequences for students; we found that, when controlling for student population characteristics, academic performance was not a significant predictor of the classifications assigned to schools by NPS. This results in consequences for schools and their students that are arbitrary and capricious; in addition, we found those consequences disproportionately affected black and low-income students. We also found little evidence that the interventions planned under One Newark – including takeovers of schools by charter management organizations – would lead to better student outcomes.

In this brief, we continue our examination of One Newark by analyzing its impact on NPS’s teaching staff. We find the following:

  • There is a historical context of racial discrimination against black teachers in the United States, and “choice” systems of education have previously been found to disproportionately affect the employment of these teachers. One Newark appears to continue this tradition.
  • There are significant differences in race, gender, and experience in the characteristics of NPS staff and the staff of Newark’s charter schools.
  • NPS’s black teachers are far more likely to teach black students; consequently, these black teachers are more likely to face an employment consequence as black students are more likely to attend schools sanctioned under One Newark.
  • Black and Hispanic teachers are more likely to teach at schools targeted by NJDOE for interventions – the “tougher” school assignments.
  • The schools NPS’s black and Hispanic teachers are assigned to lag behind white teachers’ schools in proficiency measures on average; however, these schools show more comparable results in “growth,” the state’s preferred measure for school and teacher accountability.
  • Because the demographics of teachers in Newark’s charter sector differ from NPS teacher demographics, turning over schools to charter management organizations may result in an overall Newark teacher corps that is more white and less experienced.

These findings are a cause for concern: to the extent that the One Newark plan disproportionately affects teachers of one race versus another, the plan may be vulnerable to legal challenge under civil rights laws.

Conclusions and Policy Implications

In our previous brief, we found that the One Newark plan imposed consequences on schools and their students that were arbitrary and capricious. We found little evidence to support the claim of NPS that One Newark would improve student outcomes, and we found that the students who would see their schools closed, turned over to CMOs, or “renewed” were more likely to be black and/or suffering from economic disadvantage.

In this brief, we turn our attention to the effects of One Newark on NPS staff. We find patterns of racial bias in the consequences to staff similar to those we found in the consequences to students, largely because the racial profiles of students and staff within the NPS schools are correlated. In other words: Newark’s black teachers tend to teach the district’s black students; therefore, because One Newark disproportionately affects those black students, black teachers are more likely to face an employment consequence.

NPS’s black teachers are also more likely to have positions in the schools that are designated by the state as needing interventions – the more challenging school assignments. The schools of NPS black teachers consequently lag in proficiency rates, but not in student growth. We do not know the dynamics that lead to more black teachers being assigned to these schools; qualitative research on this question is likely needed to understand this phenomenon.

One Newark will turn management of more NPS schools over to charter management organizations. In our previous brief, we questioned the logic of this strategy, as these CMOs currently run schools that do not teach students with similar characteristics to NPS’s neighborhood schools. Evidence suggests these charters would not achieve any better outcomes with this different student population.

This brief adds a new consideration to the shift from traditional public schools to charters: if the CMOs maintain their current teaching corps’ profile in an expansion, Newark’s teachers are likely to become more white and less experienced overall. Given the importance of teacher experience, particularly in the first few years of work, Newark’s students would likely face a decline in teacher quality as more students enroll in charters.

The potential change in the racial composition of the Newark teaching corps under One Newark – to a staff that has a smaller proportion of teachers of color – would occur within a historical context of established patterns of discrimination against black teachers. “Choice” plans in education have previously been found to disproportionately impact the employment of black teachers; One Newark continues in this tradition. NPS may be vulnerable to a disparate impact legal challenge on the grounds that black teachers will disproportionately face employment consequences under a plan that arbitrarily targets their schools.

The Opportunity Costs of Teacher Evaluation: A Labor and Equity Analysis of the TEACHNJ Legislation

Policy Brief: DougLarkin&JosephOluwole-OpportunityCostPolicyBrief

Dr. Douglas Larkin
Dept. of Secondary & Special Education
Montclair State University
Dr. Joseph O. Oluwole,
Dept. of Counseling and Educational
Leadership, Montclair State University

Executive Summary

In 2012, the New Jersey State Legislature passed and the Governor signed into law the Teacher Effectiveness and Accountability for the Children of New Jersey (TEACHNJ) Act. This brief examines the following questions about the impact of this law:

  • What is the effect of intensifying the teacher evaluation process on the time necessary for administrators to conduct observations in accordance with the new teacher evaluation regulations in New Jersey?
  • In what ways do the demands of the new teacher evaluation system impact various types of school districts, and does this impact ameliorate or magnify existing inequities?

We find the following:

On average, the minimum amount of time dedicated solely to classroom observations will increase by over 35%. It is likely that the other time requirements for compliance with the new evaluation system, such as pre- and post-conferences, observation write-ups, and scheduling, will increase correspondingly.

The new evaluation system is highly sensitive to existing faculty-to-administrator ratios, and a tremendous range of these ratios exists in New Jersey school districts across all operating types, sizes, and District Factor Groups. There is clear evidence that a greater burden is placed on districts with high faculty-to-administrator ratios by the TEACHNJ observation regulations. There is a weak correlation between per-pupil expenditures and faculty-to-administrator ratios.

The change in administrative workload will increase more in districts with a greater proportion of tenured teachers because of the additional time required for observations of this group under the new law.

The increased burden the TEACHNJ Act imposes on administrators’ time in some districts may compromise their ability to thoroughly and properly evaluate their teachers. In districts where there are not adequate resources to ensure administrators have enough time to conduct evaluations, there is an increased likelihood of substantive due process concerns in personnel decisions such as the denial or termination of tenure.

New Report: An Empirical Critique of “One Newark”

Our new report is too long to post in its entirety in blog form.

The report can be downloaded here: Weber.Baker_OneNewark_Jan24_2014

Below is the executive summary of the report:

Executive Summary

On December 18, 2013, State Superintendent Cami Anderson announced a wide-scale restructuring of the Newark Public Schools. This brief examines the following questions about One Newark:

  • Has NPS identified the schools that are the least effective in the system? Or has the district instead identified schools that serve more at-risk students, which would explain their lower performance on state tests?
  • Do the interventions planned under One Newark – forcing staff to reapply for jobs, turning over schools to charter operators, closure – make sense, given state performance data on NPS schools and Newark’s charter schools?
  • Is underutilization a justification for closing and divesting NPS school properties?
  • Are the One Newark sanctions, which may abrogate the rights of students, parents, and staff, applied without racial or socio-economic status bias?

We find the following:

  • Measures of academic performance are not significant predictors of the classifications assigned to NPS schools by the district, when controlling for student population characteristics.
  • Schools assigned the consequential classifications have substantively and statistically significantly greater shares of low income and black students.
  • Further, facilities utilization is also not a predictor of assigned classifications, though utilization rates are somewhat lower for those schools slated for charter takeover.
  • Proposed charter takeovers cannot be justified on the assumption that charters will yield better outcomes with those same children, because the charters in question do not currently serve similar children. Rather, they serve less needy children, and when school aggregate performance measures are adjusted for the children they serve, they achieve no better current outcomes on average than the schools they are slated to take over.
  • Schools slated for charter takeover or closure specifically serve higher shares of black children than do schools facing no consequential classification. Schools classified under “renew” status serve higher shares of low-income children.

These findings raise serious concerns at two levels. First, they raise questions about the district’s own purported methodology for classifying schools. Our analyses suggest the district’s classifications are arbitrary and capricious, yielding racially and economically disparate effects. Second, the choice, based on arbitrary and capricious classification, to subject disproportionate shares of low-income and minority children to substantial disruption to their schooling, shifting many to schools under private governance, may substantially alter the rights of these children, their parents, and local taxpayers.



One Newark is a program that appears to place sanctions on schools – including closure, charter takeover, and “renewal” – on the basis of student test outcomes, without regard for student background. The schools under sanction may have lower proficiency rates, but they also serve more challenging student populations: students in economic disadvantage, students with special educational needs, and students who are Limited English Proficient.

There is a statistically significant difference in the student populations of schools that face One Newark sanctions and those that do not. “Renew” schools serve more free lunch-eligible students, which undoubtedly affects their proficiency rates. Schools slated for charter takeover and closure serve larger proportions of students who are black; those students and their families may have their rights abrogated if they choose to stay at a school that will now be run by a private entity.[1]

There is a clear correlation between student characteristics and proficiency rates on state tests. When we control for student characteristics, we find that many of the schools slated for sanction under One Newark actually have higher proficiency rates than we would predict. Further, the Newark charter schools that may take over those NPS schools perform worse than prediction.

There is, therefore, no empirical justification for assuming that charter takeovers will work when, after adjusting for student populations, schools to be taken over actually outperform the charters assigned to take them over. Further, these charters have no track record of actually serving populations like those attending the schools identified for takeover.

Our analysis calls into question NPS’s methodology for classifying schools under One Newark. Without statistical justification that takes into account student characteristics, the school classifications appear to be arbitrary and capricious.

Further, our analyses herein find that the assumption that charter takeover can solve the ills of certain district schools is specious at best. The charters in question, including TEAM Academy, have never served populations like those in schools slated for takeover and have not produced superior current outcome levels relative to the populations they actually serve.

Finally, as with other similar proposals sweeping the nation arguing to shift larger and larger shares of low income and minority children into schools under private and quasi-private governance, we have significant concerns regarding the protections of the rights of the children and taxpayers in these communities.

[1] Green, P.C., Baker, B.D., & Oluwole, J. (in press). Having It Both Ways: How Charter Schools Try to Obtain Funding of Public Schools and the Autonomy of Private Schools. Emory Law Journal.