What’s the Standard for the Standards?

How should we interpret the Common Core State Standards cut scores?

In 2015, prior to the release of test scores from the new Smarter Balanced (SBAC) assessments, then-California Superintendent of Public Instruction Tom Torlakson tried to prepare the public by warning that scores were likely to be lower than scores on the former assessments, because the new standards were higher.  He was right on both counts—the standards are higher, and the scores were lower.  In a baffling case of standing logic on its head, somebody, somewhere, decided that if too few students were meeting the former standards, then the best solution, by golly, was to raise the standards even higher. 

Despite the Superintendent’s warning, the media dutifully reported the scores without seriously questioning if—just maybe—the standards had been raised too much.  The coverage by EdSource was typical, with a headline reading, “Most California students below standards on Common Core-aligned tests.” Six paragraphs into the story EdSource acknowledged Superintendent Torlakson’s statement that the standards are more rigorous than previous tests, but then cast doubt on that claim by stating, “Test results from 2003, the baseline year for students taking the STAR tests under the 1997 California academic standards, don’t appear to support Torlakson’s argument that the current tests are harder, however.  More students met or exceeded the English language arts test this year than were proficient or advanced in 2003:  44 percent vs. 35 percent (emphasis added).”  The potentially good news that students actually got smarter was, instead, presented as evidence that the tests got easier!  

Similarly, the Sacramento Bee editorialized, “Only 33 percent of California students met or exceeded the new math standards, and only 44 percent met or exceeded the standards in English. Yikes.”  To be sure, the Bee acknowledged that “These numbers are a baseline, and a lot of those ‘below standard’ scores are probably closer than they seem to the goal.”  But still, even while admitting that the Common Core is a “major upgrade” from previous standards, the purpose, validity, and meaning of that upgrade went unquestioned.

That lack of curiosity about what the standards actually are, who sets them, and how, continues to this day. I’ve never understood why the standards themselves receive such little—if any—scrutiny from the public, the media, and policy makers.  It’s as if the standards have been handed down as some sort of received wisdom and accepted as an article of faith, beyond the reach of reasonable inquiry.  

These thoughts came to my mind as I leafed through Cal Facts 2018, recently released by California’s Legislative Analyst’s Office (LAO), and found a graph with the heading, “fewer than half of California’s K-12 students meet state standards.”  It shows that, in Grades 4, 6, 8, and 11, fewer than half (in some cases far fewer than half) of students met state standards in reading and math in the Spring 2018 SBAC assessment.  The lone exception was Grade 11 reading, in which slightly more than half met the standards.

Now, among the first questions that should come to mind (but rarely, if ever, do) are:  what are the standards, who sets them, and how?  After all, on the face of it, there are two equally plausible ways to interpret this graph.  Either California’s students (and by extension our schools) are failing miserably, or the standards themselves are unreasonably high.  The second interpretation should be given at least as much consideration as the first.  Instead, the first interpretation gets all of the headlines.

California uses four performance levels to describe student performance on the SBAC assessments:

  • Level 1:  Standard Not Met
  • Level 2:  Standard Nearly Met
  • Level 3:  Standard Met
  • Level 4:  Standard Exceeded

Individual student scaled scores are used to determine which performance level each student falls within. For example, in third grade math, a student whose scaled score is between 2436 and 2500 would be in Level 3.  A score that divides one level from the next is a cut score.  California uses the numbered levels to get away from the terms below basic, basic, proficient, and advanced that were associated with the former assessments, but the idea is the same.  In fact, in some other states that use the SBAC assessments, Level 3 is still referred to as “proficient.”
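To make the mechanics concrete, here is a minimal sketch of how a scaled score maps to a performance level. Only the Level 3 range for grade 3 math (2436–2500) is given above; the other cut points in this sketch are illustrative placeholders, not official SBAC values.

```python
# Sketch of cut-score logic: each performance level is defined by a lower-bound
# scaled score. Only the 2436 cut (Level 3, grade 3 math) comes from the text;
# the 2381 and 2501 values are illustrative assumptions.
GRADE3_MATH_CUTS = [2381, 2436, 2501]  # assumed lower bounds of Levels 2, 3, 4

def performance_level(scaled_score, cuts=GRADE3_MATH_CUTS):
    """Return performance level 1-4 for a scaled score, given cut scores."""
    level = 1
    for cut in cuts:
        if scaled_score >= cut:
            level += 1
    return level

# A one-point difference at a cut score changes the reported level:
# performance_level(2435) -> 2, performance_level(2436) -> 3
```

The point of the sketch is that the entire classification rides on where those cut points are placed, which is exactly the decision that receives so little public scrutiny.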

State standards are commonly thought of as grade level expectations—what students are expected to know and do at each grade level. Originally, grade level expectations reflected average student performance on norm-referenced tests; that average subsequently became the expected level of performance for all students, so that all students were expected to achieve at or above average. While it’s good practice to have high expectations for each individual student, it makes no sense to expect all students within a school, district, or state to achieve above average.

Most states, including California, have adopted the Common Core State Standards (CCSS), which were developed by a consortium of states through the National Governors Association (NGA) and the Council of Chief State School Officers (CCSSO).  Among other criteria, the standards at each grade level were developed to be “rigorous,” which means, according to the SBAC consortium, they include “high-level cognitive demands by asking students to demonstrate deep conceptual understanding through the application of content knowledge and skills to new situations.”

The Smarter Balanced (SBAC) assessments and performance expectations are tied to those standards.  So, to say that 48% of 4th graders meet state standards in reading and language arts is to say that 48% of students have mastered a rigorous course of study and demonstrate a deep conceptual understanding of content knowledge and skills.  We used to refer to them as “A” students.  Now they’re just Level 3:  Standard Met, or, in the prior parlance, “proficient.”

The setting of standards and the setting of cut scores to distinguish between achievement levels is an art more than a science, and it is done away from the public eye.  The California Alliance of Researchers for Equity in Education (CARE-ED), a collaboration of more than 100 California education researchers, argues that the SBAC (along with PARCC, the other CCSS-aligned assessment) lacks “basic principles of sound science, such as construct validity, research-based cut scores, computer adaptability, inter-rater reliability, and most basic of all, independent verification of validity” (http://care-ed.org).  CARE-ED also reports that, “when asked for documentation of the validity of the CA tests, the CA Department of Education failed to make such documentation public.”   (By the way, the SBAC consortium invited non-educators, like members of the general public and the business community who had no pedagogical background at all, to participate in standard setting!)

What the standards lack in scientific rigor, they make up for in subjectivity.  Standards assessed on the SBAC are expressed in the form of “claims,” which are summary statements about the knowledge and skills students are expected to demonstrate on the assessment related to a particular aspect of the standards.  Here, for example, are the claims for English/language arts for grades 3 through 8:

  • Overall claim:  “Students can demonstrate progress toward college and career readiness in English language arts and literacy.” 
  • Reading:  “Students can read closely and analytically to comprehend a range of increasingly complex literary and informational texts.”
  • Writing:  “Students can produce effective and well-grounded writing for a range of purposes and audiences.” 
  • Speaking and Listening:  “Students can employ effective speaking and listening skills for a range of purposes and audiences.” 
  • Research/inquiry:  “Students can engage in research and inquiry to investigate topics, and to analyze, integrate, and present information.” 

All of these things are important, to be sure, but the question of how these objectives get translated into measurable student outcomes at each grade level is not easy (or perhaps even possible) to answer. Let’s look at just one claim at the 3rd grade level: Students can read closely and analytically to comprehend a range of increasingly complex literary and informational texts.  What does it mean to read “closely” and “analytically”?  How close is close enough?  How is that measured?  What degree of close and analytical reading distinguishes between Levels 1, 2, 3, and 4? What level of complexity should a 3rd grader comprehend in order to be on a college- and career-ready track?   Where is the evidence to support any specified level?  And is it even possible to define in measurable terms “a range of increasingly complex literary and informational texts”?  Who can even say what a “range” is, or how broad the abilities it encompasses should be?

Before we really know how to interpret a finding like “fewer than half of California’s K-12 students meet state standards,” we need to know what level of knowledge and performance these standards denote, and we don’t.  And we need to be public and explicit about whether a standard represents a minimum level of achievement reasonably expected of all students or an aspirational level. By its own admission, SBAC has chosen to set the bar at a “rigorous” level.  That’s fine, but SBAC and the CDE and SBE should be open with the public about the level of rigor that the cut scores represent.

This is not an argument for watering down standards or dumbing down expectations, but it is an argument for being open and explicit about what level of knowledge and understanding we want standards to denote. Currently, the public sees them as minimums, while they are really designed to be aspirational.  This disconnect leads to a misinterpretation of test results.

That the standards are nebulous is beyond doubt. And because they are nebulous, achievement of them cannot be measured with any precision.  Yet results are presented as if the difference between a scaled score of 2435 and 2436 in grade 3 math determines whether a student is performing at Level 2 (Basic) or Level 3 (Proficient).

But the problems of measurement, as serious as they are, should take a back seat to the fundamental question of whether the cut scores (however they are measured) have been reasonably set. As James Harvey, Executive Director of the National Superintendents Roundtable has said, “No matter how well-meaning advocates are, when they push for school improvement on the grounds of highly questionable assessment benchmarks, they aren’t strengthening schools and building a better America. By painting a false picture of student achievement, they compromise public confidence in the nation’s schools, in the process undermining education and weakening the United States” (Educational Leadership, February 2018).  This sentiment is echoed by Gary Orfield of the Civil Rights Project at UCLA: “Setting absurd standards and then announcing massive failures has undermined public support for public schools….We are dismantling public school systems whose problems are basically the problems of racial and economic polarization, segregation and economic disinvestment.”

Maybe the standards are absurd or maybe they aren’t. My argument is that we don’t know. And until we know, we cannot make informed judgments about the performance of our schools and students. We have no basis for meaningfully interpreting a finding that fewer than half of California’s students meet “standards” without knowing how demanding those standards are.

I understand that the purpose of Cal Facts 2018 is to provide quick, easily digestible facts about a broad range of issues confronting California.  But rather than being digestible, this one gave me heartburn. Sometimes presenting a fact without the necessary context can be more harmful than helpful.

The U.S. News “Best High Schools” ranking for charter schools continues to fall into the survivorship bias trap

This is a quick update to my post of April 25.  Just yesterday, U.S. News released its newest list of the best high schools in the United States.  The list includes a separate ranking for charter schools, which are ranked on the basis of several indicators.

The indicator with the highest weight (30%) is “College Readiness,” which is the proportion of 12th graders who took and passed at least one AP or IB exam.  This indicator applies only to those students who enrolled in 9th grade or later AND were retained through 12th grade.  These are the “survivors.”  All students who left the school prior to the administration of these exams in 12th grade–and there were many–are excluded from the calculation.  You won’t find a purer example of survivorship bias.

Charter schools are also ranked on “Math and Reading Proficiency” and “Math and Reading Performance,” which are both weighted at 20%.  Proficiency reflects aggregated scores on state math and reading assessments that students may be required to pass for graduation.  This does not apply in California, because state graduation exams are not required.  It is not known how the 20% weight that would have been applied to this criterion is redistributed among the others.

Math and Reading Performance is how scores on state assessments compare to U.S. News’ expectations, given the student demographics of the school.  (They apparently use some kind of “value added” calculation, but I don’t have the space to get into that here.) In California, these assessments are administered late in 11th grade, so they would measure only those students who did not transfer out before the time of administration.  More survivorship bias.

The graduation rate is weighted at 10% and is defined as the proportion of entering 9th graders who graduated four years later in 2018.  This is a curious way of putting it, because it implies that U.S. News tracks ALL 9th graders who enroll in a charter school, including those who leave the school prior to 12th grade.  They don’t.  That would require them to track outcomes for every student who transfers out of a charter school.  Actually, their numbers reflect the graduation rate for only the survivors.

The four-year cohort graduation rate for all California public schools in 2018 was 83%, and all of the graduation rates of the charter schools ranked by U.S. News exceed that rate.  At the same time, the statewide retention rate for the same cohort was 99%, and NONE of the ranked charter schools exceeded that rate.  The school with the highest graduation rate, at 100%, is Stockton Unified Early College Academy, which also has the third lowest retention rate:  77%.  Overall, the retention rates ranged from 58% (Stockton Collegiate International Academy) to 98% (Leadership Public Schools in Hayward), but only four exceeded 90% while six were below 80%. The charter school that was ranked #1 in California and #12 nationally—Preuss School UCSD—had a retention rate of only 82% and a graduation rate of 95%.

How the University of Arkansas Measures Charter School Effectiveness and Return on Investment

(You’ve got to read it to believe it)

One problem with easy access to lots of data and a computer is that—in the wrong hands—it can result in some pretty ridiculous “research.”  Especially when the research is more for the purpose of promoting an agenda than advancing knowledge.  Case in point: A Good Investment: The Updated Productivity of Public Charter Schools in Eight U.S. Cities, recently released by the Walton-funded School Choice Demonstration Project at the University of Arkansas.  The authors are Corey DeAngelis, an education policy analyst at the Cato Institute Center for Educational Freedom; Patrick Wolf, a professor of education policy and 21st Century Endowed Chair in School Choice at the University of Arkansas (endowed by Walton); Larry Maloney, president of Aspire Consulting (information technology); and Jay May, a senior consultant for EduAnalytics, LLC (they provide data-based solutions to what ails K-12 education).  The eight cities reviewed are Atlanta, Boston, Denver, Houston, Indianapolis, New York City, San Antonio, and Washington, D.C.

This paper has been reported in newspapers around the country like the New York Post and the Washington Examiner.  The tone of the reporting is exemplified by the headline in the Anchorage Daily Planet: “Case Closed:  Charter Schools Deliver More Education.”  As if, in social science research, the case is ever closed. Never mind that the report examines only two subject areas (reading and math) in one grade (8th) and uses highly suspect methodology at that.

The report compares charter schools in each city’s metropolitan area (not just the city districts) with the traditional public schools (TPS) in the same area on the basis of two factors:  cost-effectiveness and return on investment (ROI).  The authors conclude that, “On average, for the students in our cities, public charter schools are 40 percent more cost-effective and produce a 53 percent larger ROI than TPS.”  This is a pretty startling finding, but does it hold up under even casual scrutiny?  The answer is “no” according to Peter Greene, the blogger at Curmudgucation.org.  He has posted an excellent critique of this report, but there are some problems he doesn’t get into that further discredit it.

To begin with, the estimates of cost-effectiveness and ROI both depend on the definition of cost-effectiveness the authors use, which is simply the average standardized test scores divided by average revenue per student.
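As a sketch of that definition (with invented figures, not the report’s data), the ratio rewards a school for either scoring higher or reporting less revenue:

```python
# Cost-effectiveness as the report defines it: average test score divided by
# average revenue per student. All numbers here are invented for illustration.
def cost_effectiveness(avg_score, revenue_per_student):
    """Test-score points per $1,000 of per-student revenue."""
    return avg_score / (revenue_per_student / 1000)

charter = cost_effectiveness(avg_score=263, revenue_per_student=12000)  # ~21.9
tps = cost_effectiveness(avg_score=265, revenue_per_student=17000)      # ~15.6
# A 2-point score gap plus a large revenue gap yields a roughly 40% "advantage";
# the ratio is driven almost entirely by the denominator.
```

Notice that in this invented example the TPS actually scores higher, yet comes out far less “cost-effective,” which is why the validity of both the numerator and the denominator matters so much.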

The numerator and denominator in this formula are both beset with problems that relegate the resulting estimates of cost-effectiveness and ROI to the category of junk science.  First, the numerator consists of only 8th grade reading and math NAEP scores.  Peter Greene rightly questions whether standardized test scores in only two academic areas in only one grade really tell us everything we need to know about school quality for an entire school or district.   But let’s play devil’s advocate and say they do.  Even with this stipulation, the scores still have to provide a valid point of comparison between the two segments.  In other words, either the demographics of the test-taking populations in the two segments must be identical, or the test scores must be adjusted to account for the differences.  The authors of this report do not adjust the scores for demographic differences, and they apparently believe that the two populations are identical for comparison purposes.  I say “apparently,” because they don’t raise the issue and make no effort to convince the reader that they are.  There are solid reasons for assuming they aren’t.

First, to be demographically identical, the student subgroups within each population must be proportionately the same.  We know, however, that charter schools typically enroll fewer students with disabilities (SD) and English learners (EL), which are historically low scoring subgroups.  Enrolling proportionately fewer of these students would result in higher scores for the charter schools as compared to TPS.  The authors do acknowledge that charter schools enroll fewer EL and SD students, but say “those enrollment gaps failed to explain the revenue differences between the public school sectors in every city except Boston (emphasis added).”  Huh?  What about test score differences?  No mention of that.

But even if they did enroll the same proportion of SD and EL students, this still is no guarantee that these students in the charter schools actually took the NAEP.  Here’s what NAEP itself has to say on the subject:

Some SD and ELL students may be able to participate in NAEP, with or without accommodations. Others are excluded from NAEP assessments because they cannot participate with allowable accommodations. The percentage of SD and ELL students who are excluded from the NAEP assessments varies both from one jurisdiction to another and within a jurisdiction over time (emphasis added).  

Charter schools are jurisdictions that are independent of the districts in which they are located. They can make their own decisions about testing accommodations (if any) and which SD and EL students they exclude from the test.  It’s a stretch to believe that the decisions they make are identical to the decisions their host districts make.  And because charter schools like to point to standardized test scores as evidence of their superiority, they have an incentive to exclude low-scoring students from the NAEP.  For these reasons, we cannot accept as an article of faith—as the authors would seem to have us do—that the demographics of either the school enrollment or the test-taking populations are identical, or even close enough to make a valid comparison, in each of the eight cities.

The denominator used in this study—total revenue per student—has equally serious problems.  First of all, it uses total revenue, not just revenue applied to 8th grade reading and math instruction.  So, it compares the total revenue of a charter school containing an 8th grade (usually a middle school, but sometimes a K-8 elementary school) with that of city districts, which are K-12.  Since grade 9-12 secondary schools receive a higher level of per student funding than the lower grades, this alone inflates the TPS denominator relative to the charter schools.

But there’s more.  The revenue for the TPS includes funding for preschool and adult education!  The authors do this with a straight face and make no effort to explain how preschool and adult education revenue can possibly be related to 8th grade test scores.  Instead, they literally state that, with this methodology, “We are able to connect funding to student outcomes,” completely ignoring the disconnect between the funding they include and the outcomes they purport to measure.

Even within a 6-8 or 7-8 middle school, different schools will spend different amounts on 8th grade math and reading, depending on different needs and priorities.  The only valid denominator for the authors’ purposes is revenue actually spent on 8th grade math and reading instruction.  Instead of attempting to do this, the authors go to the opposite extreme of including funding for everything, including the kitchen sink.

I don’t know what’s worse: what this report says about what passes for educational research these days or about the gullibility of too many newspapers and their education journalists. 

True Facts; False Narrative

The Effect of Survivorship Bias on the Calculation of Charter School Graduation Rates

Charter schools market themselves as being superior to traditional public schools largely on the basis of student performance indicators such as standardized test scores and high school graduation rates.  With respect to graduation rates, they usually offer a simple comparison of their rates with something like the statewide average rate for all public schools.  You can see why.  Charter school graduation rates are nearly always higher than the rates for other schools. According to the California Department of Education (CDE), the statewide graduation rate for all public schools in 2016-17 was 82.7 percent.  Nearly all charter schools I reviewed exceeded that rate.

However, graduation rates don’t tell the whole story, as I learned from an examination of charter schools that promote themselves as being “college prep” and/or have been identified as being high performing, in large part on the basis of their high graduation rates.  As it turns out, the calculation of charter school graduation rates suffers from “survivorship bias,” which makes them unreliable indicators of school performance.

In his fascinating book, How Not to Be Wrong:  The Power of Mathematical Thinking, Jordan Ellenberg provides an illustration of survivorship bias at work.  He recounts the story of Abraham Wald, a mathematician at the Statistical Research Group (basically a military think tank) during World War II.  At that time, the U. S. military was confronted with the problem of how much protective armor to put on fighter planes and what parts of the plane needed the most protection.  More armor made the plane less vulnerable to enemy fire, but also made the plane heavier, less fuel efficient, and less maneuverable.  It increased defensive capability at the expense of offensive capability.  On the other hand, less armor allowed a plane to be lighter and more maneuverable—thus increasing offensive capability—but made it more vulnerable to enemy fire. The challenge was to find the amount of protective armor that provided the optimal balance between a plane’s offensive and defensive capabilities and determine where on the plane the armor should be placed.  This problem was assigned to Abraham Wald to solve.  

The military provided Wald with data showing where planes had sustained enemy fire.  When planes returned from combat, they were covered with bullet holes, which, on average, were distributed on planes as follows:

               Section of the Plane          Bullet Holes per Square Inch
               Engine                        1.11
               Fuselage                      1.73
               Fuel system                   1.55
               Rest of the plane             1.80

Using these data, the military asked Wald to figure out the optimal distribution of armor, assuming that the least amount of armor would cover the part of the plane that appears to sustain the least damage—the engine. Wald came back with the opposite conclusion—the engine should receive the most protection.  This is because the data showing the distribution of bullet holes were based on planes that returned to base and did not include the planes that were shot down and did not make it back.  Assuming that all parts of the plane were equally likely to be struck by bullets, Wald wondered why the bullet holes on the returning planes were unevenly distributed.  His realization was that the returning planes were not representative of all of the planes that left the field that day, because they did not include the planes that did not return.  Those planes must have had bullet holes, too, but where were they?  As Ellenberg puts it:

Wald’s insight was to simply ask:  where are the missing holes?  The ones that would have been all over the engine casing, if the damage had been spread equally all over the plane?  Wald was pretty sure he knew.  The missing holes were on the missing planes.  The reason planes were coming back with fewer hits to the engine is that planes that got hit in the engine weren’t coming back.

Wald’s conclusion was that the armor should not go where the holes were on the returning planes, but where they were not.  Basing an analysis on only the planes that returned to the base and not all of the planes that left the base is an example of survivorship bias.  

The calculation of charter school graduation rates provides another example.  The 12th graders in a charter school are the “survivors” from the total population of 9th graders who were enrolled in the school four years before.  The greater the gap between 12th grade enrollment and 9th grade enrollment four years before, the greater the effect of survivorship bias, and the less valid it is to use 12th grade performance as an indicator of a school’s effectiveness.  And yet charter schools do exactly that.

Data from the California Department of Education (CDE) show that the number of students that charter schools graduate from 12th grade is far below the number of 9th graders the same schools enrolled four years earlier.  While normal student mobility may account for some of this, the data show that charter high schools systematically fail to retain and graduate as many 9th graders as traditional high schools.  The resulting survivorship bias renders charter high school graduation rates all but meaningless as a measure of school quality.

How Graduation Rates are Computed

I use the adjusted 4-year cohort graduation rates for 2016-17 provided by the CDE.  This rate is the percentage of 9th graders who graduate at the end of 12th grade four years later.  To account for student mobility, the 9th grade cohort is adjusted for students who transfer in and out during the four-year period.   So, to calculate the graduation rate for an individual school, the number of students who graduate at the end of 12th grade is divided by the number of students who were enrolled in the school four years earlier (in 9th grade), plus students who transferred in, minus students who transferred out at any time during that four-year period.  With this methodology, students who drop out of the charter school are retained in the denominator.  Dropouts cause the graduation rate to be less than 100%. But a student who transfers out of the charter school and then drops out of a different school prior to graduation has no effect on the charter school’s graduation rate, even if they don’t transfer out until grade 12.  Those students count against the graduation rate of the school that they transfer to.
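The adjusted-cohort mechanics can be sketched with invented numbers; note how transfers out shrink the denominator while dropouts stay in it:

```python
# Adjusted four-year cohort graduation rate, as described above.
# All counts are invented for illustration.
entering_ninth = 100
transferred_in = 5     # added to the cohort
transferred_out = 30   # removed from the cohort entirely
dropouts = 2           # never graduate, but stay in the denominator
graduates = 73

cohort = entering_ninth + transferred_in - transferred_out  # 75
grad_rate = graduates / cohort                              # 73/75, about 97%
```

In this sketch the school loses 30 of its 100 entering students and still reports a graduation rate near 100 percent, because the departed students simply vanish from the calculation.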

High School Retention Rates

To get a measure of possible survivorship bias, I look at the retention rate for this same cohort, which is calculated by dividing the number of students enrolled in 12th grade in a school by the number of students enrolled in 9th grade in the same school four years earlier.  In this case, “retention rate” does not denote the retention of the same students during the four-year period; rather, it denotes the retention of the same number of students.  According to data provided by the CDE, the statewide retention rate for all public schools in 2016-17 was 98.5 percent.  This means that the number of students enrolled in all public schools in 12th grade in 2016-17 was almost equal to the number of students who were enrolled in 9th grade four years earlier. The loss of 1.5 percent of 9th grade enrollment is due mostly to students who drop out.  Students who transfer from one public school to another public school (including charter schools) do not have an impact on the calculation of the statewide retention rate.
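This headcount ratio can be sketched as follows (the enrollments are invented for illustration; only the 98.5 percent statewide figure comes from the text):

```python
# Retention rate as defined above: 12th grade headcount divided by the
# 9th grade headcount four years earlier. Enrollment numbers are invented.
ninth_grade_2013_14 = 100    # 9th grade enrollment, 2013-14
twelfth_grade_2016_17 = 70   # 12th grade enrollment, 2016-17

retention_rate = twelfth_grade_2016_17 / ninth_grade_2013_14  # 0.70
# Against a statewide figure of 98.5 percent, a school at 70 percent has
# shed far more of its entering class than public schools overall.
```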

The Charter Schools in this Analysis

Rather than use a randomly selected sample of charter schools, I focus on charter schools that specifically promote themselves as being college preparatory and/or that have been identified as high performing schools, at least partly on the basis of their graduation rates.  These are schools that point to their graduation rates as evidence of their superiority.  Among the schools in this study, three are KIPP schools, and seven are Aspire schools.  Most of these schools have “college preparatory” or “university preparatory” in their names, signifying their focus on preparing their students for graduation and college admission.

According to its website, KIPP offers “an excellent college-prep education” that provides “personalize[d] learning based on a student’s needs, skills, and interests.”  Similarly, the Aspire website states that, “our purpose is to prepare our…students for success in college, career, and life” and that Aspire schools have a “clear focus on College for Certain.”

I also looked at four Los Angeles Partnership Schools high schools.  The Partnership is a collaboration between the Los Angeles Unified School District, the City of Los Angeles, and other public and private partners to “transform schools and revolutionize school systems.”  It claims to have raised graduation rates since 2008 from 36 percent to 81 percent.  

Finally, I include charter high schools that have been identified by either US News & World Report or GreatSchools.org as being among the best public schools in the nation based, in large part, on their graduation rates.

Graduation Rates

Table 1 shows the graduation rates of the selected charter schools.  It shows that all but five of the schools have a graduation rate that exceeds the statewide rate for all schools of 82.7 percent.  And 17 of the 29 schools have graduation rates of 90 percent or higher, including two at 100 percent and one at 99.3 percent.  These are, indeed, impressive figures and—by themselves—suggest a high level of performance.  All things being equal, one would expect charter school graduation rates to mirror the statewide average, with about half of the schools exceeding the average and half falling below it.  The fact that 82 percent of these charter schools exceed the statewide average graduation rate is a strong indication that all things are not equal, and that there is a systematic reason for this.  Charter school advocates have taken this “systematic reason” to be a superior education provided by charter schools.  Under this line of reasoning, a charter school education is an intervention that, when applied to a population of students that resembles students statewide, produces superior results.

But we cannot look at graduation rates out of context, and when we look at the broader picture, we are forced to question whether charter school 12th graders do, in fact, resemble their traditional school counterparts.  Specifically, we must take a deeper look at the four-year cohort from which these graduation rates are computed.  Table 1 shows that, on a statewide basis, the cohort graduating in 2016-17 had a retention rate of 98.5 percent.  In other words, the 12th grade enrollment for all California public schools in 2016-17 was 98.5 percent of the 9th grade enrollment four years earlier, in 2013-14.  By contrast, the retention rate for the charter schools in this analysis ranges from a low of 56.5 percent to a high of 97.1 percent.  None exceed the statewide average.  Assuming a normal distribution around the average of 98.5 percent, about half, or 14, of the schools in this analysis would be expected to be higher, and the other half lower.  Instead, all of the charter schools are below the statewide average, and 24 of the 28 are far below it, at less than 90 percent.  This finding is too consistent across schools to be random.  

The charter schools in this analysis have been selected because of their presumed superiority and their focus on college and university preparation.  They claim to provide instruction that is student centered and equitable.  Aspire Public Schools, for example, claims to be “committed to providing equitable opportunities for our students, families, and teammates.  We use an equity lens to examine our policies, practices, and systems at Aspire to strive for all groups to increase access and benefit from our work” [emphasis added].  Similarly, KIPP schools “blend small-group instruction and technology in creative ways to personalize learning and keep children encouraged, engaged, and continuously learning” [emphasis added].  The Stockton Collegiate International Secondary School “provides a multi-cultural, student-centered environment.”  One would expect that schools with these welcoming, student-centered environments designed to serve all students and keep them engaged would be better at retaining them—at least as good as the statewide average—especially from among a population of students that affirmatively chose to attend those schools in the first place.  Instead, these schools show a remarkable failure to retain students.

Under the methodology for computing the four-year cohort graduation rate, students who transfer away from a charter school for any reason are excluded from the denominator.  Accordingly, when we combine the fact that charter school retention rates are consistently below the state average with the fact that charter school graduation rates are consistently above the state average, we are forced to conclude that something systematic is at work.  Specifically, the charter school 9th graders who do not persist in their schools through 12th grade must be among those least likely to graduate.  Attrition among low performers would naturally increase a school’s graduation rate as well as improve scores on standardized tests.  Charter schools may succeed not necessarily by providing a superior education, but by ridding themselves of lower performing students.
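A quick back-of-the-envelope sketch shows how powerful this denominator effect can be.  The numbers below are hypothetical, not drawn from any actual school:

```python
# All figures below are hypothetical, for illustration only.
ninth_graders = 100        # entering 9th grade cohort
likely_grads = 80          # students on track to graduate
transfers_out = 15         # low performers who leave before 12th grade
                           # (assume none of them would have graduated)

# Graduation rate if every entering student stayed in the cohort:
rate_full_cohort = likely_grads / ninth_graders          # 80.0%

# Rate under the cohort methodology, which removes transfers
# from the denominator:
remaining_cohort = ninth_graders - transfers_out         # 85 students
rate_after_attrition = likely_grads / remaining_cohort   # about 94.1%

print(f"{rate_full_cohort:.1%} -> {rate_after_attrition:.1%}")
```

The school’s instruction hasn’t changed at all in this sketch; losing 15 of its weakest students raises its reported graduation rate by about 14 percentage points.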

California law prohibits charter schools from “limiting enrollment access” of “academically low-achieving pupils.” However, nothing in the law requires charter schools to retain low-achieving pupils after they have been admitted and enrolled, and charter schools are not prohibited from expelling students for academic reasons.  They may be expelled, counseled out, or simply discouraged from persisting at any time. Whatever the reason, or however accomplished, the pairing of a consistently higher-than-average graduation rate with a consistently lower-than-average retention rate leads to the inescapable conclusion that it is the low-performing students that charter schools fail to retain.  Again, this finding is too consistent across schools to be random and strongly suggests that the attrition of low performing students from charter schools is part of the “magic sauce” that makes them appear to be better than traditional schools.

Policy Implications

Policy makers often look at performance indicators such as standardized test scores and graduation rates in isolation and conclude that charter schools provide a superior education.  This leads to the further conclusion that the creation of more charter schools should be encouraged as a way to improve opportunities for more students.  In fact, this would be consistent with the intent of the Legislature, as expressed in the California Education Code, that charter schools “Increase learning opportunities for all pupils…” However, the legislative intent goes on to specify that the increased learning opportunities should be “with special emphasis on expanded learning experiences for pupils who are identified as academically low achieving [emphasis added].”   The evidence presented by this analysis suggests that charter schools, at least at the high school level, actually do the opposite:  they weed out low performing students.  

Because of this culling process, a high graduation rate is not an indication that a charter school does a superior job of serving all of the students it initially enrolls.  One could reasonably question whether a high graduation rate is a measure of the value that the school brings to the students or of the value that the remaining students bring to the school.  

The first lesson for policy makers is to recognize the effect of survivorship bias and look beyond performance indicators that fail to account for all of the students that charter schools enroll, in order to better understand why a particular school or program achieves exceptionally good (or poor) results.  Performance indicators may measure results, but they do not explain how results are achieved.  If good results are achieved by weeding out low performing students, it would be poor public policy to promote the expansion of those practices.

The second implication is the need to get a better understanding of the reasons for the high charter school attrition rates. Without surveying students who transfer out of charter schools, this is not easy to do.  To collect data on charter school students who may have been expelled for academic reasons, the Legislature could fund the dropout report required by Section 48070.6 of the Education Code.  Although the first report was due August 1, 2011, it has never been produced due to lack of funding.  The law also requires that, when data are available, the report include “behavioral data by school and school district, including suspensions and expulsions.”  The Legislature could specify that the suspension and expulsion data include the reasons for and outcomes of those actions.

California law also requires that “If a pupil is expelled or leaves the charter school without graduating or completing the school year for any reason, the charter school shall notify the superintendent of the school district of the pupil’s last known address within 30 days, and shall, upon request, provide that school district with a copy of the cumulative record of the pupil, including report cards or a transcript of grades, and health information.” This requirement could be expanded to report also to the school’s chartering authority, if it is different from the district of residence, and to include in the report the reason(s) for expulsions or transfers out.  In addition, the report should also require verification that the student has actually enrolled in another school.  This would prevent a charter school from erroneously reporting a dropout as a transfer.  The Legislature could make these reports a condition of reauthorization and allow authorizing agencies to take this information into account when considering reauthorization.

Table 1.  Graduation and Retention Rates of Selected Charter Schools

                                            Graduation  Diff. from  Retention   Diff. from
School                                      Rate        State       Rate        State
San Francisco College Prep                  No data     n/a         69.4%       -29.4%
King Collegiate High School*                93.1%       10.4%       78.6%       -20.2%
San Jose Collegiate*                        88.0%       5.3%        77.7%       -21.1%
Golden State College Prep                   90.0%       7.3%        86.4%       -12.4%
Lionel Wilson College Prep                  81.5%       -1.2%       77.4%       -21.4%
E. Palo Alto Phoenix Academy                69.4%       -13.3%      69.8%       -29.0%
Benjamin Holt College Prep                  100%        17.3%       81.2%       -17.6%
Langston Hughes College Academy             93.9%       11.2%       74.3%       -24.5%
Vanguard College Prep Academy               85.0%       2.3%        63.9%       -34.9%
Ollin Univ. Prep Academy                    92.5%       9.8%        91.3%       -7.5%
Stockton Collegiate International Academy   No data     n/a         56.5%       -42.7%
Univ. Prep. Academy Ctr.                    93.0%       10.3%       68.8%       -30.0%
Preuss School UCSD                          95.3%       12.6%       87.4%       -11.4%
Animo Leadership High                       93.8%       11.1%       80.6%       -18.2%
Summit Prep. Charter High                   94.4%       12.0%       64.1%       -34.8%
Leadership Public School, Richmond          94.1%       11.4%       94.3%       -4.5%
Oakland Charter High                        No data     n/a         97.1%       -1.7%
New West Charter, LA                        91.2%       8.5%        84.1%       -14.7%
Bright Star, LA                             89.9%       7.2%        73.9%       -24.9%
High Tech High, San Diego                   99.3%       16.6%       87.3%       -11.5%
University High, Fresno                     97.3%       14.6%       86.2%       -12.6%
Stockton Early College Academy              100%        17.3%       77.1%       -21.7%
University Prep Academy, San Jose           93.0%       10.3%       96.9%       -1.9%
Gateway HS, San Francisco                   94.3%       11.6%       85.3%       -13.5%
David Starr Jordan HS                       66.0%       -16.7%      87.9%       -10.9%
Felicitas & Gonzalo Mendez HS               90.0%       7.3%        75.9%       -22.9%
Santee Education Complex                    81.3%       -1.4%       62.9%       -35.9%
Theodore Roosevelt HS                       78.4%       -4.3%       82.6%       -16.2%

*Also on the US News Top Schools list

Summit Learning Poses Significant Privacy Concerns

Today’s New York Times ran an article on its front page about Summit Learning, an online learning platform funded by Mark Zuckerberg and developed by Facebook engineers.  For those who don’t have a subscription to the Times, the article is summarized by Diane Ravitch in her blog today.  Summit claims to be used in 380 schools in 38 states and the District of Columbia but, as the Times article describes, it’s being met with growing resistance from students, parents, and teachers alike.

I don’t need to repeat their concerns here.  But what caught my eye is the fact that the platform is provided for free.  Facebook is free, too, and yet it’s worth around $70 billion.

Just as Mark Zuckerberg has found a way to monetize his free Facebook platform by harvesting user data, you can bet he has found a way to monetize Summit Learning.  How?  Could it be data harvesting?  In a blog posted in 2016, the Parent Coalition for Student Privacy raised a number of concerns regarding Summit Learning and the potential for data harvesting and provided a list of 25 questions that Summit should be required to answer regarding its privacy policies.  

I’ve read Summit’s privacy policy and terms of use, and to the uninitiated they seem to provide pretty good protection. But there are loopholes.  

For example, Summit agrees to comply with the federal Children’s Online Privacy Protection Act (COPPA), which of course it is required to do by federal law.  However, COPPA only applies to children up to age 12 and students with disabilities.  Its protections do not extend to many middle schoolers or any high schoolers without disabilities. And COPPA provides only a limited amount of protection against the use of user data for direct advertising.

In addition, while Summit agrees to abide by the requirements of the federal Family Educational Rights and Privacy Act (FERPA), this, too, is not as ironclad as it sounds.  FERPA places restrictions on the use and disclosure of information in school records.  However, the FERPA “lock box” applies only to school records.  This means that student information that is obtained from a source other than school records—such as through a Summit Learning platform—is not protected by FERPA, even if it’s the same information.

 But what about student information that Summit obtains from school records? Surely that’s protected, right? Well, no, because Summit’s terms of use give it the keys to the lock box. Here is a direct quote from their TOU: “For the purposes of FERPA, to the extent Personally Identifiable Information from Education Records are transmitted to Summit Learning from Partner School, Summit Learning shall be considered a School Official, under the control and direction of the Partner Schools as it pertains to the use of Education Records…” This is huge, because FERPA gives “school officials” the authority to make critical decisions about the disclosure of student information from school records without the prior written approval of a parent or guardian.

Generally, FERPA prohibits the disclosure of student information without the prior written approval of the parent or guardian.  However, there are certain exceptions to this general prohibition.  For example, prior written approval is not needed to send student records to another school to which a student is transferring.  Another exception is the disclosure of information to organizations conducting certain studies for or on behalf of the school. As a “school official,” Summit has the authority to disclose student information to other organizations, such as, but not restricted to, third party contractors without getting prior written approval from the parents or guardians. 

This is regarded as a major problem by the Electronic Frontier Foundation.  And the concern is not merely academic.  Last November students at the Secondary School for Journalism in Brooklyn walked out in protest of the school’s use of the Summit Learning platform. Two organizers of the walkout co-signed a letter, which included the following:

 “Another issue that raises flags to us is all our personal information the Summit program collects without our knowledge or consent. We were never informed about this by Summit or anyone at our school, but recently learned that Summit is collecting our names, student ID numbers, email addresses, our attendance, disability, suspension and expulsion records, our race, gender, ethnicity and socio-economic status, our date of birth, teacher observations of our behavior, our grade promotion or retention status, our test scores and grades, our college admissions, our homework, and our extracurricular activities. Summit also says on its website that they plan to track us after graduation through college and beyond. Summit collects too much of our personal information, and discloses this to 19 other corporations.”

You can read more about it here.

If you are still comfortable with the level of data protection provided by the terms of use, you should know that they can be changed unilaterally by Summit at any time.  Its terms of use state, “Summit reserves the right to modify or replace these Terms at any time.”  And if you don’t like the changes?  Well, Summit has an answer for you: “If you do not agree to the changes, please stop using the Platform.”  It is silent on what happens to the information already collected.  

California’s Flawed Methodology for Identifying the Lowest Performing Schools

Or, there’s no escaping the long arm of the law of large numbers

In 2000, the Gates Foundation announced a new plan to improve high school performance and said it would pay school districts to adopt this new “reform.”  After reviewing top performing schools around the nation for characteristics that are correlated with high achievement, the Gates people saw something that caught their collective eye—small schools are overrepresented among the top schools they surveyed.  In other words, the percentage of small schools in the population of high performing schools was higher than the percentage of small schools among all schools.  All things being equal, we might expect the percentages to be about the same.

It was only a small—if not entirely logical—leap to the conclusion that smallness contributes to greatness.  Armed with this new insight, the Gates Foundation launched the Small School Initiative and over the next eight years spent approximately $2 billion convincing school districts around the country to break up large, comprehensive high schools into smaller schools.  Other foundations followed the Gates lead and added their own resources to the effort. 

So, for eight years, school districts in cities like Los Angeles, Chicago, New York, and others closed or broke up schools into smaller schools within schools, transferred students and staff, and otherwise disrupted educational programs in pursuit of this new “research based” pathway to school improvement.  A total of 2,602 small schools in 48 states and the District of Columbia were created.

By 2008 it was all over.  Bill Gates announced the end of the program with the statement that it had not achieved the results hoped for.  Many skeptics of the small school movement were not surprised and offered various explanations for its failure, such as the idea that small schools have a narrower curriculum and fewer advanced study opportunities.  Bill Gates himself blamed the failure on implementation issues, saying that he did not foresee or sufficiently appreciate the logistical difficulties and disruptive consequences of breaking up large schools into smaller ones.

Whatever the reason, the idea that small schools produce better student outcomes was not drawn from thin air.  After all, the Gates Foundation did its research and found that small schools actually are overrepresented among the highest achieving schools.  True enough.  But here’s the thing:  if the Gates Foundation had bothered to look, they would have found that small schools are also overrepresented among the lowest performing schools.  The reason is purely statistical.  Small schools are overrepresented at both ends of the distribution because their smallness makes them statistically more variable.  This is the Law of Large Numbers.  And it has a long arm, extending to the sample of schools reviewed by Gates and—as I will show—to the 5% of the lowest performing schools identified by California for accountability purposes.

Here’s how it works. If you flip a coin, it has a 50% chance of coming up heads.  The more times you flip it, the more likely it is that half—or nearly half—of the results will be heads.  If you flip a coin only 10 times, it may come up heads 5 times, but relatively large deviations from the expected result of 5 heads would not be surprising with so few coin tosses.  (I use the term “expected result” to refer to a 50-50 result of 5 heads and 5 tails.  In reality, the chances of getting a 50-50 result with so few flips are less than 50-50 and not really to be expected.)

I just flipped a coin ten times, and it came up heads 6 times.  Then I flipped it for 9 more sets of 10 flips each, and here is the number of heads that came up each time:  6, 4, 5, 7, 5, 2, 4, 6, 3.  There is quite a bit of variance around the expected result of 5.  But it’s not surprising.  We all know intuitively that anything can happen with such a small sample size.  Now, if we combine the 10 sets of 10 flips into one set of 100 flips, then we get 48 heads, which is proportionately closer to the expected result of 50 than I got in 8 of the 10 sets of flips.  This illustrates the Law of Large Numbers:  the average of the results obtained from a large number of trials should be close to the expected value and will tend to become closer as more trials are performed.  (Just for fun, I used a computer to simulate 1,000 flips and got heads 512 times.  Proportionately, this is about half of the variance from the expected result that I got with 48 out of 100.)  We should not only accept that with a small number of trials the result could vary substantially from the mean, we should also expect it.  While a result of 20 heads out of 100 flips is extremely unlikely, a result of 2 heads out of 10 flips is not.  
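For anyone who wants to replicate the experiment without flipping a physical coin, here is a short simulation.  The random seed is arbitrary; different seeds will give different, but similarly behaved, results:

```python
import random

random.seed(2019)  # arbitrary; change it and the pattern still holds

def heads_fraction(n_flips):
    """Fraction of heads in n_flips tosses of a fair coin."""
    return sum(random.random() < 0.5 for _ in range(n_flips)) / n_flips

# The deviation from the expected 50 percent tends to shrink as the
# number of flips grows -- the Law of Large Numbers at work.
for n in (10, 100, 1000, 10000):
    frac = heads_fraction(n)
    print(f"{n:>6} flips: {frac:.1%} heads (deviation {abs(frac - 0.5):.1%})")
```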

(The converse of the Law of Large Numbers would be the Law of Small Numbers, which is that the smaller the number of trials—or observations—the larger the variance from the expected result.  However, I’m avoiding that terminology to prevent confusion with the use of the term by Amos Tversky and Daniel Kahneman in their article, “Belief in the Law of Small Numbers” [Psychological Bulletin, 1971, Vol. 76, No. 2], in which they define the Law of Small Numbers to mean the erroneous belief that the Law of Large Numbers applies to small numbers as well—i.e., that equally valid conclusions can be drawn from small samples as from large samples.)

So, returning to the small school initiative, we can see that the reason that small schools are overrepresented among high performing schools as well as low performing schools is due to the statistical fact that small school performance is more variable.  The difference that smallness can make can be quite dramatic, as illustrated by Howard Wainer and Harris Zwerling in “Evidence that Smaller Schools Do Not Improve Student Achievement” (Phi Delta Kappan, December 1, 2006).  Doing a county-by-county analysis of kidney cancer death rates among men, they found that rural counties had the lowest rates.  They also found that rural counties had the highest rates.  The common denominator?  Small populations.  As they put it, “A county with, say, 100 inhabitants that has no cancer deaths would be in the lowest category [that is, among the counties with the lowest incidence of cancer deaths].  But if it has one cancer death it would be among the highest.”  In a small population county, the difference between zero and one is the difference between being ranked among the lowest or highest counties in the death rate.
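The same phenomenon can be simulated for schools.  In the sketch below, every school has the same underlying quality—each student has an identical 50 percent chance of passing—and the schools differ only in size.  Ranking them and taking the extreme 5 percent tails shows the small schools crowding both ends.  All numbers here are hypothetical:

```python
import random

random.seed(1)  # arbitrary seed for reproducibility

def school_pass_rate(size, p=0.5):
    """Simulated pass rate for a school of `size` students, each with the
    same underlying probability `p` of passing -- i.e., no school is
    actually better or worse than any other."""
    return sum(random.random() < p for _ in range(size)) / size

# 500 small schools (50 students each) and 500 large schools (1,000 each).
schools = [("small", school_pass_rate(50)) for _ in range(500)]
schools += [("large", school_pass_rate(1000)) for _ in range(500)]

# Rank all 1,000 schools by pass rate and inspect the extreme 5% tails.
ranked = sorted(schools, key=lambda s: s[1])
bottom_5pct = ranked[:50]
top_5pct = ranked[-50:]

print("small schools in bottom 5%:", sum(k == "small" for k, _ in bottom_5pct))
print("small schools in top 5%:  ", sum(k == "small" for k, _ in top_5pct))
```

Even though no school is genuinely better than any other, the small schools dominate both the top and bottom of the ranking, purely because of their greater statistical variability.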

The Law of Large Numbers applies to all situations in which statistical measures are used to sort, rank, or classify any type of entity, including schools.  We have already seen that from the example of the small school initiative.  It’s also evident in the method by which California identifies the lowest performing 5% of schools for state and federal accountability purposes.  

To identify schools for Comprehensive Support and Improvement (CSI), California uses the California Dashboard, which incorporates data on suspensions, expulsions, absenteeism, graduation rates (for high schools), English learner reclassification (pending), and academic indicators drawn from the Smarter Balanced assessments. The dashboard uses five color-coded performance levels for each indicator.  Each indicator incorporates current year status plus the change (positive or negative) from the prior year.  The lowest performance is red.  A school is identified for CSI primarily on the basis of the number of red cells on its Dashboard.  

In a small school, the status and/or change from the prior year of just a few students can determine whether or not that school is in the red category for an indicator.  This is a significant issue, because California has 2,158 schools with an enrollment of less than 200, and 1,512 enroll fewer than 100.  That’s a lot of schools that are subject to the Law of Large Numbers.  The California Department of Education (CDE) has recognized that small schools (which it defines as schools with an enrollment of less than 150) are overidentified in both the highest (blue) and lowest (red) categories, at least with respect to data reflecting the change from the prior year.  To adjust for this, the CDE developed an alternative methodology for small schools called the “Safety Net,” which is intended to limit the large swings in change data that can occur in small schools.  In 2017 the Safety Net methodology was applied to the graduation rate and suspension rate indicators, and in 2018 it was also applied to a third indicator—chronic absenteeism.

Even with this adjustment, however, small schools are still overrepresented among the lowest-achieving 5%.  The average enrollment in California is 594 students per school.  The average enrollment in California schools that have been identified for CSI is 410 students.  That alone tells us that small schools are overrepresented among CSI schools.  But there’s more evidence.  Here are the percentages of all California schools compared to the percentages of CSI schools with enrollments of less than 100 and less than 200:

              Enrollment < 100   Enrollment < 200
All Schools        14.3%              20.6%
CSI Schools        24.2%              42.2%

The percentage of small schools identified for CSI is substantially larger than the percentage of all schools that are small.  In particular, the proportion of schools with an enrollment of less than 200 students that have been identified for CSI is more than double the proportion of those schools among all schools.  There’s no escaping the long arm of the Law of Large Numbers.  

The point is not that small schools are being unfairly picked on (though that’s not an unreasonable conclusion), but that the over identification of small schools leads to an under identification of the larger schools, which is where most of our struggling students are enrolled.  The sole purpose of identifying schools for CSI is to help those schools improve student outcomes.  In other words, school improvement is not an end in itself, but a means to the end of improved student performance.  With this in mind, our ultimate objective should be to maximize the number of struggling students whose schools receive additional assistance.  By over identifying small schools and under identifying larger schools, we fall short of this objective.

I understand that federal law requires states to identify the lowest performing 5% of schools.  However, federal law also gives states the authority to determine how those schools will be identified.  It does not mandate California’s current methodology.  California’s attempt to ameliorate the effect of smallness in the identification of low performing schools does not seem to be working.  Other approaches, such as averaging data from the prior two or three years, also seem to have limited effect (although a simulation using California data could prove me wrong).  And these relatively minor adjustments will do little to help the large number of struggling students in the bigger schools.  A possible solution is to use a hybrid system, in which schools would be identified partly on the basis of the percentage of struggling students and partly on the basis of the number of struggling students they enroll.  This need not change the number of schools that get identified, but it would increase the number of struggling students whose schools receive assistance.
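To make the hybrid idea concrete, here is one possible scoring sketch.  The weighting, the scoring function, and the school data are all hypothetical illustrations of the concept, not a proposal for specific statutory language:

```python
# A hypothetical hybrid score: part percentage of struggling students,
# part (normalized) count of struggling students.  The weight, the
# function, and the school data are all illustrative assumptions.

def hybrid_score(pct_struggling, n_struggling, max_n, weight=0.5):
    """Blend the struggling-student rate with the normalized count.
    weight=1.0 reproduces a percentage-only ranking."""
    return weight * pct_struggling + (1 - weight) * (n_struggling / max_n)

schools = {                       # name: (pct struggling, count struggling)
    "Small School A": (0.60, 30),     # 50 students enrolled
    "Large School B": (0.40, 400),    # 1,000 students enrolled
}
max_n = max(n for _, n in schools.values())

for name, (pct, n) in schools.items():
    print(f"{name}: {hybrid_score(pct, n, max_n):.3f}")

# A percentage-only ranking flags School A first; the hybrid score
# ranks School B higher, where far more struggling students are enrolled.
```

The weight could be tuned (or set by simulation against actual California data) so that the total number of identified schools stays the same while the number of struggling students covered goes up.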