California’s Flawed Methodology for Identifying the Lowest Performing Schools

Or, there’s no escaping the long arm of the law of large numbers

In 2000, the Gates Foundation announced it had a new plan to improve high school performance and they were going to pay school districts to adopt this new “reform.”  After reviewing top performing schools around the nation for characteristics that are correlated with high achievement, the Gates people saw something that caught their collective eye—small schools are overrepresented among the top schools they surveyed.  In other words, the percentage of small schools in the population of high performing schools was higher than the percentage of small schools among all schools.  All things being equal, we might expect the percentages to be about the same.

It was only a small—if not entirely logical—leap to the conclusion that smallness contributes to greatness.  Armed with this new insight, the Gates Foundation launched the Small School Initiative and over the next eight years spent approximately $2 billion convincing school districts around the country to break up large, comprehensive high schools into smaller schools.  Other foundations followed the Gates lead and added their own resources to the effort. 

So, for eight years, school districts in cities like Los Angeles, Chicago, New York, and others closed or broke up schools into smaller schools within schools, transferred students and staff, and otherwise disrupted educational programs in pursuit of this new “research based” pathway to school improvement.  A total of 2,602 small schools in 48 states and the District of Columbia were created.

By 2008 it was all over. Bill Gates announced the end of the program with the statement that it had not achieved the results hoped for.  Many sceptics of the small school movement were not surprised and offered various explanations for its failure, such as the idea that small schools have a narrower curriculum and fewer advanced study opportunities.  Bill Gates himself blamed the failure on implementation issues, saying that he did not foresee or sufficiently appreciate the logistical difficulties and disruptive consequences of breaking up large schools into smaller ones.

Whatever the reason, the idea that small schools produce better student outcomes was not drawn from thin air.  After all, the Gates Foundation did its research and found that small schools actually are overrepresented among the highest achieving schools.  True enough.  But here’s the thing:  if the Gates Foundation had bothered to look, they would have found that small schools are also overrepresented among the lowest performing schools.  The reason is purely statistical.  Small schools are overrepresented at both ends of the distribution because their smallness makes them statistically more variable.  This is the Law of Large Numbers.  And it has a long arm, extending to the sample of schools reviewed by Gates and—as I will show—to the 5% of the lowest performing schools identified by California for accountability purposes.

Here’s how it works. If you flip a coin, it has a 50% chance of coming up heads.  The more times you flip it, the more likely it is that half—or nearly half—of the results will be heads.  If you flip a coin only 10 times, it may come up heads 5 times, but relatively large deviations from the expected result of 5 heads would not be surprising with so few coin tosses.  (I use the term “expected result” to refer to a 50-50 result of 5 heads and 5 tails.  In reality, the chances of getting a 50-50 result with so few flips are less than 50-50 and not really to be expected.)

I just flipped a coin ten times, and it came up heads 6 times. Then I flipped it for 9 more sets of 10 flips each, and here is the number of heads that came up each time: 6, 4, 5, 7, 5, 2, 4, 6, 3. There is quite a bit of variance around the expected result of 5. But it’s not surprising. We all know intuitively that anything can happen with such a small sample size. Now, if we combine the 10 sets of 10 flips into one set of 100 flips, then we get 48 heads, which is proportionately closer to the expected result of 50 than I got in 8 of the 10 sets of flips. This illustrates the Law of Large Numbers: the average of the results obtained from a large number of trials should be close to the expected value and will tend to become closer as more trials are performed. (Just for fun, I used a computer to simulate 1,000 flips and got heads 512 times. Proportionately, that is a little more than half the deviation from the expected result that 48 out of 100 represents.) We should not only accept that with a small number of trials the result could vary substantially from the mean, we should also expect it. While a result of 20 heads out of 100 flips is extremely unlikely, a result of 2 heads out of 10 flips is not.
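For readers who want to check the arithmetic, here is a minimal Python sketch. It assumes a fair coin and independent flips; the helper name binom_pmf is my own label, not something from the sources cited here. It re-adds the ten sets reported above and computes the exact binomial probabilities behind the claims about 5-of-10, 2-of-10, and 20-of-100 results.

```python
from math import comb

def binom_pmf(k, n, p=0.5):
    """Probability of exactly k heads in n flips of a coin with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Heads per set of 10 flips, as reported in the text (first set, then the 9 others).
sets = [6, 6, 4, 5, 7, 5, 2, 4, 6, 3]
total_heads = sum(sets)  # 48 heads in 100 flips

# Proportional distance from 50% for the pooled flips and for each set of 10.
pooled_gap = abs(total_heads / 100 - 0.5)
sets_farther_off = sum(1 for h in sets if abs(h / 10 - 0.5) > pooled_gap)  # 8 sets

print(f"pooled: {total_heads}/100, sets farther from 50%: {sets_farther_off} of 10")
print(f"P(exactly 5 heads in 10)   = {binom_pmf(5, 10):.3f}")   # 252/1024, about 0.246
print(f"P(exactly 2 heads in 10)   = {binom_pmf(2, 10):.3f}")   # 45/1024, about 0.044
print(f"P(exactly 20 heads in 100) = {binom_pmf(20, 100):.1e}")
```

The same 20% share of heads that turns up in roughly 1 of every 23 ten-flip sets is, for practical purposes, never seen in 100 flips.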

(The obverse of the Law of Large Numbers would be the Law of Small Numbers, which is that the smaller the number of trials—or observations—the larger the variance from the expected result.  However, I’m avoiding that terminology to prevent confusion with the use of the term by Amos Tversky and Daniel Kahneman in their article, “Belief in the Law of Small Numbers” [Psychological Bulletin, 1971, Vol. 76, No. 2] in which they define the Law of Small Numbers to mean the belief (erroneous) that the Law of Large Numbers applies to small numbers as well.  I.e., that equally valid conclusions can be drawn from small samples as from large samples.)

So, returning to the small school initiative, we can see that small schools are overrepresented among both high performing and low performing schools because small school performance is statistically more variable. The difference that smallness can make can be quite dramatic, as illustrated by Howard Wainer and Harris Zwerling in “Evidence that Smaller Schools Do Not Improve Student Achievement” (Phi Delta Kappan, December 1, 2006). Doing a county-by-county analysis of kidney cancer death rates among men, they found that rural counties had the lowest rates. They also found that rural counties had the highest rates. The common denominator? Small populations. As they put it, “A county with, say, 100 inhabitants that has no cancer deaths would be in the lowest category [that is, among the counties with the lowest incidence of cancer deaths]. But if it has one cancer death it would be among the highest.” In a county with a small population, the difference between zero deaths and one is the difference between ranking among the counties with the lowest death rates and ranking among those with the highest.
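The effect is easy to reproduce in a simulation. The sketch below is an illustration under made-up assumptions, not a model of any real data: 200 small schools of 50 students and 200 large schools of 500 students, with every student in every school given the same 50% chance of scoring proficient. School size is the only difference, yet the small schools crowd into both the top and the bottom 5% of the ranking.

```python
import random

def simulate_rates(n_schools, enrollment, p, rng):
    """Return one proficiency rate per school; every student has the same chance p."""
    rates = []
    for _ in range(n_schools):
        proficient = sum(1 for _ in range(enrollment) if rng.random() < p)
        rates.append(proficient / enrollment)
    return rates

rng = random.Random(2024)   # fixed seed so the run is repeatable
P_PROFICIENT = 0.5          # assumed: identical for every student in every school

small = [("small", r) for r in simulate_rates(200, 50, P_PROFICIENT, rng)]
large = [("large", r) for r in simulate_rates(200, 500, P_PROFICIENT, rng)]

# Rank all 400 schools by proficiency rate and cut off the lowest and highest 5%.
schools = sorted(small + large, key=lambda s: s[1])
cutoff = len(schools) * 5 // 100   # 20 schools
bottom, top = schools[:cutoff], schools[-cutoff:]

def share_small(group):
    """Fraction of the schools in the group that are small."""
    return sum(1 for size, _ in group if size == "small") / len(group)

print(f"small schools overall:       {share_small(schools):.0%}")
print(f"small schools in top 5%:     {share_small(top):.0%}")
print(f"small schools in bottom 5%:  {share_small(bottom):.0%}")
```

Small schools make up half of the simulated population but nearly all of both extremes, even though no school is better or worse than any other by construction.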

The Law of Large Numbers applies to all situations in which statistical measures are used to sort, rank, or classify any type of entity, including schools.  We have already seen that from the example of the small school initiative.  It’s also evident in the method by which California identifies the lowest performing 5% of schools for state and federal accountability purposes.  

To identify schools for Comprehensive Support and Improvement (CSI), California uses the California Dashboard, which incorporates data on suspensions, expulsions, absenteeism, graduation rates (for high schools), English learner reclassification (pending), and academic indicators drawn from the Smarter Balanced assessments. The dashboard uses five color-coded performance levels for each indicator.  Each indicator incorporates current year status plus the change (positive or negative) from the prior year.  The lowest performance is red.  A school is identified for CSI primarily on the basis of the number of red cells on its Dashboard.  

In a small school, the status and/or change from the prior year of just a few students can determine whether or not that school is in the red category for an indicator. This is a significant issue, because California has 2,158 schools with an enrollment of fewer than 200 students, 1,512 of which enroll fewer than 100. That’s a lot of schools that are subject to the Law of Large Numbers. The California Department of Education (CDE) has recognized that small schools (which it defines as schools with an enrollment of fewer than 150) are overidentified in both the highest (blue) and lowest (red) categories, at least with respect to data reflecting the change from the prior year. To adjust for this, the CDE developed an alternative methodology for small schools called the “Safety Net,” which is intended to limit the large swings in change data that can occur in small schools. In 2017 the Safety Net methodology was applied to the graduation rate and suspension rate indicators, and in 2018 it was also applied to a third indicator: chronic absenteeism.

Even with this adjustment, however, small schools are still overrepresented among the lowest-achieving 5%. The average enrollment in California is 594 students per school. The average enrollment in California schools that have been identified for CSI is 410 students. That alone tells us that small schools are overrepresented among CSI schools. But there’s more evidence. Here are the percentages of all California schools and of CSI schools with enrollments of fewer than 100 and fewer than 200 students:

                 Enrollment < 100    Enrollment < 200
All Schools           14.3%               20.6%
CSI Schools           24.2%               42.2%

The percentage of small schools identified for CSI is substantially larger than the percentage of all schools that are small.  In particular, the proportion of schools with an enrollment of less than 200 students that have been identified for CSI is more than double the proportion of those schools among all schools.  There’s no escaping the long arm of the Law of Large Numbers.  
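The comparison can be made explicit with a little arithmetic. The sketch below uses the four percentages from the table above. It also applies a simplifying assumption of my own: that exactly 5% of all schools are identified for CSI and that both rows of the table describe the same population of schools. Under that assumption, Bayes' rule converts the table into an implied identification rate for each size group; those rates are rough illustrations, not CDE figures.

```python
# Shares taken from the table above.
share_all = {"under_100": 0.143, "under_200": 0.206}   # share of all schools
share_csi = {"under_100": 0.242, "under_200": 0.422}   # share of CSI schools

CSI_RATE = 0.05  # assumption: 5% of all schools are identified for CSI

ratios = {}
implied_rates = {}
for band in share_all:
    # How many times more common the size band is among CSI schools.
    ratios[band] = share_csi[band] / share_all[band]
    # P(CSI | band) = P(band | CSI) * P(CSI) / P(band)
    implied_rates[band] = share_csi[band] * CSI_RATE / share_all[band]
    print(f"{band}: {ratios[band]:.2f}x overrepresented, "
          f"implied CSI rate {implied_rates[band]:.1%}")

# The same calculation for schools with 200 or more students.
rate_200_plus = (1 - share_csi["under_200"]) * CSI_RATE / (1 - share_all["under_200"])
print(f"200 or more: implied CSI rate {rate_200_plus:.1%}")
```

If the 5% assumption holds, a school with fewer than 200 students is identified at roughly 10%, about double the overall rate, while a school with 200 or more students is identified at under 4%.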

The point is not that small schools are being unfairly picked on (though that’s not an unreasonable conclusion), but that the overidentification of small schools leads to an underidentification of the larger schools, which is where most of our struggling students are enrolled. Our sole purpose in identifying schools for CSI is to strengthen the ability of those schools to improve student outcomes. In other words, school improvement is not an end in itself, but a means to the end of improved student performance. With this in mind, our ultimate objective should be to maximize the number of struggling students whose schools receive additional assistance. By overidentifying small schools and underidentifying larger schools, we fall short of this objective.

I understand that federal law requires states to identify the lowest performing 5% of schools. However, federal law also gives states the authority to determine how those schools will be identified. It does not mandate California’s current methodology. California’s attempt to ameliorate the effect of smallness in the identification of low performing schools does not seem to be working. Other approaches, such as averaging data from the prior 2 or 3 years, also seem to have limited effect (although a simulation using California data could prove me wrong). And these relatively minor adjustments will do little to help the large number of struggling students in the bigger schools. A possible solution is to use a hybrid system, in which schools would be identified partly on the basis of the percentage of struggling students and partly on the basis of the number of struggling students they enroll. This need not change the number of schools that get identified, but it would increase the number of struggling students whose schools receive assistance.
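To make the hybrid idea concrete, here is one possible sketch. Everything in it is hypothetical: the six schools, their enrollments and counts of struggling students, the equal weighting, and the choice to blend two rank orders. It is not California's method or a worked-out proposal, only an illustration that the same number of schools can be identified while reaching far more struggling students.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class School:
    name: str
    enrollment: int
    struggling: int   # hypothetical count of struggling students

    @property
    def pct_struggling(self):
        return self.struggling / self.enrollment

def ranks(schools, key):
    """Map school name to rank, where 1 is the highest value of key."""
    ordered = sorted(schools, key=key, reverse=True)
    return {s.name: i + 1 for i, s in enumerate(ordered)}

def identify(schools, n, weight_pct=0.5):
    """Pick n schools by blending percentage rank and head-count rank.

    weight_pct=1.0 reproduces a percentage-only ranking.
    """
    pct_rank = ranks(schools, lambda s: s.pct_struggling)
    count_rank = ranks(schools, lambda s: s.struggling)

    def score(s):
        return weight_pct * pct_rank[s.name] + (1 - weight_pct) * count_rank[s.name]

    return sorted(schools, key=score)[:n]

# Invented data for illustration only.
schools = [
    School("A", 80, 48), School("B", 120, 66), School("C", 900, 450),
    School("D", 1500, 720), School("E", 600, 180), School("F", 150, 30),
]

pct_only = identify(schools, n=2, weight_pct=1.0)
hybrid = identify(schools, n=2, weight_pct=0.5)

for label, chosen in (("percentage only", pct_only), ("hybrid", hybrid)):
    names = ", ".join(s.name for s in chosen)
    reached = sum(s.struggling for s in chosen)
    print(f"{label}: schools {names}, struggling students reached: {reached}")
```

In this toy data the percentage-only ranking picks the two smallest high-percentage schools and reaches 114 struggling students, while the hybrid ranking picks two large schools and reaches 1,170. How much weight to give each factor is a policy choice that would need to be tested against real California data.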