STUDENT LEARNING, STUDENT DEMOGRAPHICS, OR SOMETHING ELSE?

A QUANTCRIT ANALYSIS OF HOW SCHOOL ACCOUNTABILITY REFLECTS STUDENT LABELS BUT NOT STUDENT NEEDS
by
Kim Strong
B.A., Portland State University, 2009

M.A.T., North Carolina State University, 2013

A thesis submitted to the
Faculty of the Graduate School of the
University of Colorado in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
School of Education
2022

Committee Members:

Kathy Escamilla

Mimi Engel

Kira Hall

Sue Hopewell

Michelle Renée Valladares

Abstract

Kim Strong (Ph.D., School of Education)

Student Learning, Student Demographics, Or Something Else? A QuantCrit analysis of how school accountability reflects student labels but not student needs

Thesis directed by Professor Emeritus Kathy Escamilla

Improving educational outcomes and school conditions for historically marginalized students has been a primary goal since the school accountability movement began in the 1960s. However, despite decades of legislation, policy, and implementation designed to achieve this purpose, historically marginalized students continue to suffer from disparate academic outcomes. Using Critical Race Theory and QuantCrit frameworks, this dissertation analyzed accountability outcomes in relation to student demographics, school contexts, and English Learner characteristics and services in Denver Public Schools over three years to understand what the district's accountability framework was and was not measuring. Findings indicate that the schools with the highest accountability ratings consistently had (a) smaller proportions of students of color, students receiving Free and Reduced Lunch services, and English Learners; (b) higher rates of Fully Qualified teachers and of students identified as Gifted and Talented; and (c) nearly half the frequency of disciplinary actions, incidents, and actions resulting in instructional loss. When these variables were entered into Ordinary Least Squares (OLS) and ordinal logit multiple regressions, student demographics and disciplinary actions emerged as statistically significant predictors of both accountability scores and outcomes. These results indicate that the district's accountability framework was biased in favor of schools that served small proportions of historically marginalized students, while ignoring, and hence failing to address, disparate access to educational resources such as high-quality teachers, Gifted and Talented programs, native language supports, and less punitive disciplinary environments. These failures to measure, and thus to encourage, equitable learning environments coincided with a downward trend in which increasing numbers of schools received a failing accountability status during the study. Charter schools – which some see as solutions to public school dysfunctions – had the highest rates of discipline and the lowest rates of language supports for English Learners. Implications of this study include the recommendation that districts conduct “equity reviews” to ensure that accountability policies do not disproportionately harm historically marginalized students, and that accountability frameworks include metrics evaluating school contexts and services in order to promote the equitable allocation of resources and opportunities to all students.

Dedication

This work is dedicated to the many people whose support, wisdom, generosity, kindness, and
guidance allowed me to conduct this work.

I would like to thank my family in North Carolina: María, Chofo, Jared, Isa, Flaco,
Sonya, Byron, Yetnaletzi, Chayo, Maje, Ashira, Aye, Edith, Jeremy, Mitzy, Cristal, Dulce, Laila,
and Panchito. From the moment we met, you made me feel like part of the family. Thank you for
teaching me what it means to have a big heart and a family the way it should be. In particular, I want
to thank Mitzy, who inspired me to pursue a doctorate in education. Mitzy, you deserve
much more than what you have received from schools, and that is the fault of the system, not yours. I
look forward with great joy to seeing all the great things that you and your family achieve. This work is for
all of you, for sharing your stories with me and letting me be part of your lives.

To my Colorado family I owe more gratitude than I can express, not just for what you have done
for me but for my family as well. You welcomed us into your home for holidays, birthdays, and
barbecues. You celebrate our triumphs and share in our tears. You have been Katherine’s
‘buelos (grandparents), so that neither she nor we feel so alone here. From the bottom of my heart, thank you for
being so good to us. We love you very much, don Manuel and doña Kathy. Thank you for being
our family. Kathy, I could not have gotten through this program without you. You have been my
idea sounding board, my encouragement to move forward during difficult times, my role model,
my mentor, and my friend. If I can achieve a fraction of the brilliance, kindness, goodness, and
service that is your legacy, I will consider my life a success. You are the best person I know, and
the lessons I have learned from you have made me who I am today. Anything that I may
accomplish will be in no small part due to the impact you have had in my life and that of my
family. Thank you for being the best advisor, role model, and friend I could hope for. I humbly
dedicate this work to you as a token of gratitude for refusing to give up on me.

Finally, this dissertation is dedicated to Katherine and Simitrio, to whom I owe everything. Not
just for your patience through the long nights and missed weekends of graduate school, but
because you have given me a reason to be here. Thank you for the (many!) times you both
encouraged me to just drop out, saying I didn’t need a PhD for you to be proud of me. You have
taught me what real love is. Thank you for showing me what it is to have a family, to care and be
cared for, to know contentment and peace beyond what I had imagined was possible. You are my
world, and I love you with all my heart.

Acknowledgments

This research was conducted in support of the decades-long advocacy of the Congress of
Hispanic Educators. Thank you for allowing me to contribute to your historic fight for the
educational rights of bilingual students and families. Special thanks to Dr. Martha Urioste, Esther
Romero, Dr. Darlene LeDoux, Roger Rice, and Lu Liñan (in memoriam) for your mentorship
and encouragement throughout the years.

Without a doubt, I could not have done this work without my committee, and to them I owe a
special debt of gratitude. More than acting as examples of outstanding scholarship and complex
thinking, more than modeling professional talents and accomplishments, they exemplify how to be
fundamentally good people. From them, I have learned what it means to be a true scholar:
unapologetically brilliant yet not condescending, leaders in their fields who make time for
mentorship, exceptionally accomplished and still kind-hearted. I have learned from each of you
so much more than academic skills – you have taught me how to be a strong woman with a clear
sense of purpose, both in academia and beyond, and that is a lesson that will stay with me for a
lifetime. Thank you, Dr. Kathy Escamilla, Dr. Mimi Engel, Dr. Kira Hall, Dr. Sue Hopewell, and
Dr. Michelle Renée Valladares.

Table of Contents

LIST OF TABLES

LIST OF FIGURES

INTRODUCTION

SOCIOHISTORICAL CONTEXT

STUDY SITE CONTEXT

THEORETICAL FRAMEWORK

CONCEPTUAL FRAMEWORK

PURPOSE OF STUDY

RESEARCH QUESTIONS

SIGNIFICANCE OF THE RESEARCH

LITERATURE REVIEW

RESEARCH REGARDING THE VALIDITY OF ACCOUNTABILITY POLICIES

RESEARCH REGARDING THE EFFICACY AND OUTCOMES OF ACCOUNTABILITY POLICIES FOR HISTORICALLY MARGINALIZED STUDENTS

QUANTCRIT APPROACHES TO UNDERSTANDING SCHOOL ENVIRONMENTS AND OUTCOMES FOR HISTORICALLY MARGINALIZED STUDENTS

RELATIONSHIP BETWEEN PREVIOUS RESEARCH AND THE DISSERTATION

METHODS

DATA OVERVIEW AND STUDY PARAMETERS

RESEARCH PROCESS

METHODS PER RESEARCH QUESTION (RQ)

ANALYSIS AND INTERPRETATION

POSITIONALITY STATEMENT

RESULTS

RESEARCH QUESTION 1: WHAT ARE THE STUDENT DEMOGRAPHICS, EL CHARACTERISTICS, AND SCHOOL CONTEXTS PER SCHOOL PERFORMANCE FRAMEWORK (SPF) RATING BRACKET?

RESEARCH QUESTION 2: AT WHAT RATE DID SCHOOLS REMAIN IN, ENTER INTO, OR EXIT THE MOST EXTREME SPF RATINGS DESIGNATIONS OF INTERVENTION AND BLUE STATUS, AND WHAT ARE THE STUDENT DEMOGRAPHICS, EL CHARACTERISTICS AND SERVICES, AND SCHOOL CONTEXTS IN THESE STATUSES?

RESEARCH QUESTION 3: WHAT ARE THE STUDENT DEMOGRAPHICS, EL CHARACTERISTICS AND SERVICES, AND SCHOOL CONTEXTS PER CHARTERS AND DISTRICT-RUN SCHOOLS?

RESEARCH QUESTION 4: DO STUDENT DEMOGRAPHICS PREDICT PERCENT SPF POINTS EARNED?

RESEARCH QUESTION 5: DO STUDENT DEMOGRAPHICS PREDICT SPF OUTCOMES?

SUMMARY

DISCUSSION

IMPLICATIONS FOR RESEARCHERS

IMPLICATIONS FOR TEACHERS, FAMILIES, AND ADVOCATES

LIMITATIONS

CONCLUSION

REFERENCES

APPENDIX A

APPENDIX B

List of Tables

Table 1. Pearson Correlations of Potential Control Variables and SPF Percent Points Earned

Table 2. Pearson Correlation of Student Demographic Predictors and SPF Percentage Used in Multiple Regressions

Table 3. Means of Student Demographics, English Learner Characteristics, Outcomes and Programs, and School Contexts Across SPF Ratings Brackets for Academic Years 2016-2017 through 2018-2019

Table 4. Descriptive Statistics of Schools that Remained In, Entered Into, and Exited From Intervention Status and Blue Status per District-Run and Charter Schools as of the Final Year of the Study (2018-2019)

Table 5. Key to Abbreviated Variable Names

Table 6. Descriptive Statistics of Means of Schools that Remained In, Entered Into, and Exited From Intervention Status and Blue Status Across the Three-Year Study Timeframe Aggregate

Table 7. Means of Study Variables per District-Run and Charter Schools for Each Year of Study and Three-Year Aggregate

Table 8. Individual Predictor and Saturated Models OLS Regressions with Cubed Terms for Academic Years 2016-2017 Through 2018-2019

Table 9. Descriptive Statistics of Percentiles, Minimum and Maximum Values, Standard Deviations, and Means for Each Student Demographic Predictor Variable Used in Multiple Regressions

Appendix Tables

Table 1. Data Sources, Datasets, Data Types, and Data Uses in Dissertation

Table 2. Means of Student Demographics, English Learner Characteristics, English Learner Outcomes and Programs, and School Contexts Across SPF Ratings Brackets for Academic Year 2016-2017

Table 3. Means of Student Demographics, English Learner Characteristics, English Learner Outcomes and Programs, and School Contexts Across SPF Ratings Brackets for Academic Year 2017-2018

Table 4. Means of Student Demographics, English Learner Characteristics, English Learner Outcomes and Programs, and School Contexts Across SPF Ratings Brackets for Academic Year 2018-2019

List of Figures

Figure 1. Denver Public Schools SPF Color-Coded Rating Brackets Description and Points Cutoffs

Figure 2. Scatterplot Panels of Student Demographics and the Percent of SPF Points Earned

Figure 3. Mean Percentages of Select Student Demographics, EL Characteristics and Services, and School Contexts Across SPF Ratings Brackets for Each Year

Figure 4. Predicted Percent SPF Points Earned per Individual Student Demographic Variables Reflecting Models 2, 4, 6, and 8

Figure 5. Predicted Probabilities of Receiving Simplified SPF Outcomes (A) Intervention, (B) On-Watch, or (C) High Performing per Student Demographic Predictor Using Models 2, 4, 6, and 8 from Research Question 4

Introduction

Sociohistorical Context

Legislative History

Although assessments have been used in the United States since the nineteenth century, the way they embodied “accountability” was very different from our contemporary understanding. Historically, assessments held students, rather than teachers, accountable for their own learning (Ravitch, 2002). Public oversight of education was exercised through school board elections and the input-oriented reporting of funding allocations to ensure that all students were provided with adequate resources (Cuban, 2004; Elmore & Fuhrman, 1995). This changed in the early twentieth century, as pseudo-scientific standardized testing gained popularity as part of the eugenics movement (Zuberi, 2001) and the newly formed departments of education at US colleges began to see assessments as scientific instruments that could precisely measure learning and progress (Ravitch, 2002).

Although professional educators at the time adhered to the belief that the purpose of education was to prepare future citizens and that educational shortcomings would be best remedied through improved support (Ravitch, 2002), this orientation was dramatically altered with the passage of the Elementary and Secondary Education Act (ESEA) in 1965. The ESEA was remarkable not only because it was a sweeping piece of federal legislation specifically designed to improve the education of students of color and students in poverty (DeBray-Pelot & McGuinn, 2009; Thomas & Brady, 2005), but also because it represented a turning point by tying federal funding to measurable evidence of program effectiveness, thus setting the stage for testing to become the measure of school success and the bedrock of the modern school accountability movement (Cuban, 2004; Ryan & Shepard, 2008).

This focus on outputs such as test scores was further cemented with the release of the 1966 Coleman Report, a congressionally mandated study that examined academic outputs such as test scores rather than inputs such as supports. The report concluded that school funding and resources alone did not predict academic achievement; rather, a combination of other variables, such as family background and school composition, was more strongly correlated with outcomes (Coleman et al., 1966). Despite employing problematic methodology (Darling-Hammond, 2004), the Coleman Report had an outsized impact on the popular understanding of education reform: its recommendation to desegregate was lost to the deficit view that families in poverty and families of color were variables correlated with student academic failure (Ladson-Billings, 2006), a failure that no amount of additional funds or resources could ameliorate (Hanushek, 1997).

The Coleman Report also ushered in new public attention to educational achievement, and by the 1980s there was growing pressure on lawmakers to rectify what was increasingly seen as a “crisis” in US schools. In response, the Reagan administration commissioned a study of the state of public education, which was published in 1983 as A Nation at Risk. The report warned that public education in the US was in a dire situation due to the rising mediocrity of student outcomes, which would directly imperil the country’s geopolitical and economic competitiveness (Slater, 2015). It argued that only a top-down reform agenda guided by more rigorous standards – measured by the regular use of standardized testing and enforced through clear incentives and sanctions for schools and teachers – could remedy public education’s precarious state in the US (National Commission on Excellence in Education, 1983).

Governors at the time uniformly turned to business communities for guidance, and those communities, with equal uniformity, responded according to the dispositions and knowledge available to them: schools, it was concluded, should be run more like businesses and hence subjected to hierarchical management, cost-cutting initiatives, standardization, and external control, and their success would be best measured through their balance sheets and quantifiable performance (Ravitch, 2002). This paradigm paved the way for the modern accountability movement’s focus on behaviorist logics, in which rewards and punishments are seen to motivate changes in performance (Heubert & Hauser, 1999). Under this paradigm, test scores are taken to be directly attributable to educational environments, such that poor test performance is seen as indicative of poor teaching that merits sanctions (Wiliam, 2010), such as the loss of status and possibly the consequent loss of students, staff, and funds, and even the loss of the school altogether if it is converted into a charter (Dworkin, 2005). Although seemingly straightforward, these behaviorist logics rely on the unstated premises that (a) youth are dangerously ignorant; (b) high-stakes tests are reliable, valid, and appropriate for all students; and (c) the White, middle-class students who consistently serve as the normative reference for these standardized tests are the ideal against which all other students and educational contexts should be measured and toward which they should aspire (Mathison & Ross, 2013). Together, the shifts following A Nation at Risk represented not only a growing national education reform agenda but also a fundamental reconceptualization of the purposes and outcomes of public schooling. Although some educational scholars at the time advocated for alternative conceptions of accountability – such as the proposal by Smith and O’Day (1992-1993) that school reform movements prioritize equality of opportunities to learn challenging content – the conception of accountability ultimately adopted at the national level eschewed equitable inputs in favor of standardized, quantifiable, and performance-based outputs enforced through punitive consequences (Guiton & Oakes, 1995).

The public appetite for school reform spurred bipartisan cooperation throughout the 1990s, leading to the No Child Left Behind (NCLB) Act, signed into law in 2002 (McGuinn, 2006). Under the NCLB, schools were to be evaluated primarily according to performance-based output measures as determined by standardized test results, thus cementing the transition away from the input measures of school quality that had defined accountability for almost a century. These output measures were used to evaluate schools and identify low performance, which could then be remedied by additional supports under the School Improvement Grant (SIG) component of the NCLB. Under the Obama administration, low-performing schools applying for SIG funds were required to implement one of four intervention models: transformation (including evaluation, curricular, and structural redesigns), turnaround (including staff layoffs), restart (reopening as a charter or under charter or external management), or closure (Trujillo & Renée, 2015).

Although the NCLB was a reauthorization of the ESEA, it departed from the ESEA’s targeted commitment to improving education specifically for students of color and students in poverty, seeking instead to reform public education for all students (Cuban, 2004). As such, states and school districts were required to report on the learning of all students as measured by standardized testing, with a special mandate to report historically marginalized students’ test scores and achievement outcomes as disaggregated subgroups to ensure that these students’ learning received particular attention (Cramer, Little & McHatton, 2018; Fusarelli, 2004). The requirement to report disaggregated student data was carried into the subsequent reauthorization of the ESEA, the Every Student Succeeds Act (ESSA) of 2015. Departing from the NCLB, the ESSA gave states more flexibility in deciding which indicators to use to measure school success and how much weight to give each (Callahan & Hopkins, 2017; Darling-Hammond et al., 2016). However, this flexibility has not interrupted the focus on test scores in Colorado, as the Colorado Department of Education has opted to continue relying primarily on the outcomes of standardized assessments when evaluating schools (Colorado Department of Education, 2019).

Civil Rights History

Yet the focus on outputs that characterizes contemporary accountability law and policy contradicts the origins of the accountability movement. Born during the Civil Rights Era, school accountability as established by the ESEA emerged from a sociohistorical context that included grassroots civil rights organizers who protested racial and linguistic bias in public schools, demanding that school officials be held accountable for the education of students of color as measured both by the output of academic success and by the input of ending discriminatory practices (Contreras, 2011; Palazzolo, 2013; Roney & Gutierrez, 2019). Despite the history of political and corporate interest in and influence over education reform (Kornhaber, 2004), the push to hold schools accountable for the academic success of historically marginalized populations began during the Civil Rights Era of the 1960s and 1970s. However, this movement was led not by politicians but by community coalitions, which often represented communities of color who were aware that public schools were chronically failing their children (Peck, 2012).

For example, in 1969 students and community members organized in Crystal City, Texas, to fight a school system that systematically underserved and marginalized Latinx students, ultimately demanding that the school provide bilingual and bicultural education while also improving Latinx representation in the curriculum, teaching staff, administration, and student activities (Palazzolo, 2013). In the same period, students and community members organized walkouts in Los Angeles and Denver in response to school systems that habitually failed to provide Latinx students with equal access to quality education while also overtly discriminating against them. Students in Denver were met with police force when they walked out (Roney & Gutierrez, 2019), and in Los Angeles community meetings were routinely infiltrated by plainclothes police (Contreras, 2011). Nonetheless, these grassroots movements succeeded not only in demanding that their local schools be more responsive to community needs but also in achieving actual policy changes that made the schools more accountable for the educational success of historically marginalized students according to community input – not standardized, high-stakes tests. This focus on using accountability as a tool to promote attention to the needs of students of color and bilingual students was taken up again in the 1980s and 1990s, when national coalitions such as the National Council of La Raza, the Education Trust, the Citizens’ Commission on Civil Rights, the Center for Law and Education, the Education Equality Project, and the NAACP joined together to organize political and corporate support for accountability, employing conservative ideology and business platforms to argue successfully that improved outcomes for historically marginalized students were both necessary and feasible if federal education reform centered on school accountability and standardization (Rhodes, 2011).

Although the original grassroots call to focus on inputs was diminished in this alignment with conservative and business interests, the disparities were not. Not only were students of color approximately twice as likely to work with ineffective teachers (Darling-Hammond, 1998), but schools that served more low-income students and students of color also had less access to learning resources such as laboratories and computers (Oakes, 1990). This disparity was exacerbated in high school, where such schools chronically lacked Advanced Placement courses and instead tracked students of color and students in poverty into remedial and vocational courses (Oakes & Guiton, 1995). These disparities were rooted in unequal funding structures across the US, which resulted in the wealthiest schools spending up to ten times more per pupil than the poorest schools; schools that served more students of color consequently lacked textbooks, science labs, licensed teachers, art and music instruction, and functioning bathrooms, even though the poorest districts consistently taxed themselves at higher rates than the richest districts (Kozol, 1991). Such disparities in resource investment continue to reverberate today, as high schools with large Black and Latinx student populations are less likely to offer calculus, physics, chemistry, or Algebra II than high schools with small Black and Latinx enrollments (Office for Civil Rights, 2016), leaving students in poorer districts with only basic courses structured around rote memorization and vocational tracks while their wealthier peers take classes in foreign language, art, music, technology, and science-based learning (Darling-Hammond, 2013).

Leading up to the passage of the NCLB, educators and researchers increasingly insisted that attention to issues such as access to quality teachers, quality curriculum, and resources be incorporated into any new accountability framework. They feared that failing to do so would only exacerbate existing disparities in outcomes between historically marginalized populations and their dominant-group peers, as students and teachers would be expected to meet ever-higher standards without being given the supports necessary to achieve them (Darling-Hammond, 1998; Guiton & Oakes, 1995). However, the counterarguments prevailed: that resources and opportunities were too difficult to measure (McDonnell, 1995), that attending to them would not necessarily increase achievement (Elmore & Fuhrman, 1995), and that they were largely irrelevant to learning (Hanushek, 1997). Under Goals 2000, the predecessor to the NCLB, consideration of such opportunities and resources was optional (Guiton & Oakes, 1995). Under the NCLB, standards for investment in opportunities and resources were entirely absent, save the requirement that schools employ “highly qualified teachers,” a term whose definition was left to individual states and which, ironically, led to “English Learner”¹ (EL) and immigrant students being disproportionately served by novice teachers in the years after the NCLB was implemented (Dabach, 2015).

As feared, this lack of consideration for how opportunities and resources were invested in

schools resulted in schools that serve larger numbers of students of color, students in poverty,

and EL students being disproportionately given low accountability ratings which, under the

NCLB, were tied to the loss of funds. The loss of students as a direct result of low ratings then

further exacerbated the lack of resources that these schools had (Glynn & Waldeck, 2013;

Martin, 2012; Martinez-Garcia, LaPrairie & Slate, 2011; McNeil, Coppola, Radigan & Vasquez

Heilig, 2008). Worse yet, such low scores prompted many of these schools to narrow the

curriculum to only those subjects and skills – including test-taking – that were measured by the

standardized tests the NCLB used to evaluate schools. This curriculum narrowing resulted in the

loss of challenging curriculum coupled with an incentive for schools to push out low performing

students in the hopes of raising test scores (Darling-Hammond, 2007; Vasquez Heilig, Young &

Williams, 2012). Together, the loss of funds, loss of students, loss of challenging curriculum, and

incentive to push the most vulnerable students out of school resulted in an accountability

framework that, in the name of increasing performance, disproportionately punished schools that

serve historically marginalized students while winnowing the opportunities and resources these

students had.

1
This dissertation will discuss policies and legislation that use deficit language to describe raced, classed, and
linguistically marked groups. When referencing those documents, I will endeavor to use the language of the original
texts because of the legally-defined nature of the terms. This does not imply that I agree with the deficit language or
ideologies behind it. When I describe populations generally and not in relation to specific policies and legislation, I
will do so with more inclusive and equity-oriented phrasing. For example, I will use the term “emergent bilingual”
except when referencing specific policies and legislation that employ different terminology, such as in this case
when “English Learner” references a specific legal designation.

Despite seeking to promote the educational success of historically marginalized students,

the accountability movement and the high-stakes, standardized tests which drive it have been

broadly criticized for exacerbating rather than remediating the educational inequities that such

students face. Emergent bilinguals are disadvantaged by many standardized tests because these

tests are normed on monolingual English-speaking populations; since, by definition, emergent

bilingual students have not yet mastered English, these high-stakes assessments measure not only

content knowledge but also the language skills that such students are necessarily still developing

(Abedi, 2004; Menken, 2010; Solórzano, 2008; Tsang, Katz & Stack, 2008). Other historically

marginalized students, such as students of color or students in poverty, are likewise

disadvantaged by accountability systems which punish schools and teachers for low performance

on standardized tests, resulting in de facto policies that encourage low-performing students to

drop out or encourage their teachers to retain them in grade so that their scores are not recorded

(McNeil, Coppola, Radigan & Vasquez Heilig, 2008; Vasquez Heilig & Darling-Hammond,

2008). Such discrepancies make the outcomes of high-stakes standardized tests, and consequently

of the accountability frameworks that they inform, as much reflections of student

demographics as they are of student performance (Glynn & Waldeck, 2013; Martin, 2012;

Martinez-Garcia, LaPrairie & Slate, 2011; Strong & Escamilla, 2020), resulting in accountability

systems that punish historically marginalized students and their teachers rather than promote

learning.

Ideological History

The tension between the grassroots origins of the accountability movement and the

contemporary outcomes of those reforms raises the question: why did the latter interpretation of

accountability successfully inform national policy reforms while the former did not? In their

international comparative study of accountability frameworks, Dorn and Ydesen (2014) identified

the highly cultural nature of school accountability as it reflects sociopolitical contexts and serves

to legitimize certain conceptions of the purpose of education at the expense of others. Although

there are hosts of interpretive frameworks available to make meaning of the world (Keane,

2018), the actors and institutions which already possess disproportionate cultural, social, or

economic capital are also more likely to have disproportionate control over which interpretive

frameworks are used (Bourdieu & Thompson, 1991; Fairclough, 1995). This control over

ideological resources can be employed in order to promote those worldviews that are most

advantageous to the already-powerful by, for example, selectively disseminating the discourses

which normalize existing power relations or promote negative out-group and positive in-group identities

(van Dijk, 1993). In this way the NCLB, the ESSA, and any policy text should be seen as both

reflecting our social world by employing pre-existing ideas as well as constitutive of it by

promoting some ideas at the expense of others (Anderson & Holloway, 2020).

This holds true in the US context as well, which saw the interests of dominant political

and economic actors ultimately succeed in defining accountability according to standardized

outputs. Seeing an opportunity in the perceived education crisis inspired by the A Nation At Risk

report, political and economic elites promoted school reforms that instituted neoliberal logics of

individualization by placing the locus of responsibility of institutional success and failure on

students and teachers instead of states (Finn, Nybell & Shook, 2010; Wilson, 2018), while

diverting attention away from the local and global contexts in which those students and teachers

operate (Burman et al., 2017). This reflected a market rationality that sought to maximize outputs

while minimizing investments (Ambrosio, 2013), reconceptualizing the purpose of education

away from the production of future citizens and toward the production of future workers

(Jenlink, 2016) who, along with teachers, are measured through standardized and thus

decontextualized metrics that are presented as stable, unitary, and universally applicable

(Gershon, 2016).

Through such quantitatively defined standard metrics like test scores and school ratings,

these neoliberal logics were able to then claim that some schools were failing and deserved the

punishment of closure (Sunderman, Coghlan & Mintrop, 2017), thereby justifying the

privatization of public resources when ‘failing’ schools were consequently converted into private

charters (Ambrosio, 2013). Although charters are understandably viewed by some historically

marginalized communities as attractive alternatives to public education systems that have failed

them, they unfortunately not only fail to outperform public schools (Ravitch, 2010) but

can also further marginalize students, such as in the case of emergent bilingual and special

needs students who are enrolled in charters at lower rates due to exclusionary practices and the

denial of appropriate services (Shum, 2018). Sadly, it is these very communities that are most

negatively impacted, as neoliberal and racial logics converge to justify the transfer of public

resources into private hands by framing communities of color as economically irrelevant to

economic elites and thus both undeserving of state investments and legitimate targets of

disenfranchisement (Lipman, 2013).

Study Site Context

Historical Context

These historical, conceptual, and policy contexts converge at the site of the study, Denver

Public Schools (DPS). Understanding the historical context of Denver Public Schools’

relationship to marginalized student populations demonstrates why it is particularly well-suited

for a study regarding the intersection between education policy and provision of educational

opportunities and resources with special attention to the needs of emergent bilingual students. In

the 1973 Supreme Court case, Keyes v. School Dist. No. 1, DPS was ordered to enact racial

desegregation in a ruling that was also notable because it confirmed that “hispanics”2 were an

identifiable class for 14th Amendment purposes and thus DPS could no longer argue that a

school with a majority African American and Latinx population was desegregated (Keyes v.

School District No. 1, 1973). In 1980, the Congress of Hispanic Educators (CHE) filed a

supplemental complaint based on the Equal Educational Opportunities Act (1974) to argue that

“limited-English proficient” students also experienced unequal education. The resulting 1983

District Court case Keyes v. School Dist. No. 1 found that DPS was obligated “to take

appropriate action to eliminate language barriers which currently prevent a great number of

students from participating equally in the educational programs offered by the district” and that

“the issues which have been brought before the court by the plaintiff-intervenors [CHE] are part

and parcel of the mandate to establish a unitary [desegregated] school system” (Keyes v. School

Dist. No. 1, 1983).

2
Although “Latinx” is my preferred term because it is both non-male and non-cis normative, here I use “hispanics”
as this was the term used in the Supreme Court ruling.

While not mandating that DPS provide bilingual education, the ruling concluded that

providing services to ensure that bilingual students have access to equal education is

indistinguishable from desegregation. As a consequence of this ruling, DPS entered into a

Consent Decree (CD) in the 1984 Order Approving Programs for Limited English Proficient

Students which gave the court oversight of DPS’s plans to improve education for emergent

bilingual students (Keyes v. School Dist. No. 1, 1984). Although DPS was let out of court

oversight from the desegregation order in the 1990s, the court has continued oversight of DPS’s

provision of services to bilingual students to this day, making this court order the oldest in the

country. With CHE as the plaintiff and the Department of Justice as the “plaintiff-intervenor” as

of 1999, the most recent iteration of the Consent Decree in 2012 stipulates that DPS engage in

systematic tracking of services provided to and outcomes of “English Language Learners”

(Consent Decree of the U.S. District Court, 2012).

Accountability Context

This unique historical context reflects another area in which DPS stands out: It is not only

the largest school district in the state, but also the only one that created its own accountability

framework rather than use the framework created by the Colorado Department of Education.

Since its rollout in 2008, the accountability system designed by the district for its own use, called

the School Performance Framework (SPF), had undergone nearly annual revisions due to

consistent public backlash over its policies and outcomes (Asmar, 2016b; Asmar, 2017; Asmar,

2019a; Asmar, 2020b) until it was disbanded entirely in 2020 (Asmar, 2020c; Denver Public

Schools, n.d. - d). The SPF evaluated schools using different indicators according to school

context. Depending on how schools scored across these different indicators, they were given a

percentage of points earned out of total points possible, which placed them into one of five

color-coded accountability ratings. Red was the lowest rating possible, followed by Orange, then

Yellow, then Green, then Blue, which was the highest (Denver Public Schools, n.d. - d). See

Figure 1 for a description of each color-coded SPF rating bracket, the SPF points necessary to

achieve each rating bracket, and the district’s description of what each rating bracket indicates

regarding school quality.

Figure 1.
Denver Public Schools SPF Color-Coded Rating Brackets Description And Points Cutoffs

Note: Image taken from Denver Public Schools website “Learn more with an SPF Report
Guide,” retrieved from https://spf.dpsk12.org/en/understanding-your-spf-report/

The bulk of SPF scores were determined by students’ performance on annual state-

administered standardized assessments (Denver Public Schools, n.d. - e). Consistent with federal

requirements, these test scores were used to calculate both single-year snapshots of student

performance, called Status, as well as year-to-year measures of advances in learning reported as

student Growth (Asmar, 2019a), with Growth being weighted more heavily than Status (Asmar,

2017). In addition, in the 2016-2017 academic year the district implemented an Equity Indicator,

which described the degree of performance differentials between dominant and marginalized

students’ standardized test outcomes wherein schools with large “academic gaps” were

prohibited from receiving the highest SPF rating regardless of their scores on all other indicators

(Asmar, 2016b). A small percentage of schools’ SPF scores were also derived from the results of

the Parent and Student Satisfaction surveys and, for high schools, graduation rates and post-

secondary readiness (Denver Public Schools, n.d. - e).
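To make the rating logic described above more concrete, the following Python sketch maps a school’s percentage of SPF points earned to a color-coded rating and then applies the Equity Indicator cap. It is a simplified illustration, not the district’s calculation: the actual cutoffs appear in Figure 1 and are not reproduced here, so the cutoff values, the function name `assign_spf_rating`, and the `large_equity_gap` flag are all hypothetical. The sketch also assumes that a capped school drops to the next-highest rating (Green), a detail the district materials summarized here do not specify, and it does not model the weighting of Growth over Status.

```python
# Hypothetical sketch of SPF rating assignment. The cutoffs below are
# placeholders, NOT the district's actual values (see Figure 1).
HYPOTHETICAL_CUTOFFS = [
    (80.0, "Blue"),
    (65.0, "Green"),
    (50.0, "Yellow"),
    (35.0, "Orange"),
    (0.0, "Red"),
]


def assign_spf_rating(points_earned: float, points_possible: float,
                      large_equity_gap: bool = False) -> str:
    """Return a color rating from the percentage of SPF points earned."""
    if points_possible <= 0:
        raise ValueError("points_possible must be positive")
    pct = 100.0 * points_earned / points_possible

    # Walk the brackets from highest to lowest and take the first one met.
    rating = "Red"
    for cutoff, color in HYPOTHETICAL_CUTOFFS:
        if pct >= cutoff:
            rating = color
            break

    # Equity Indicator: a school with large "academic gaps" cannot be rated
    # Blue. Capping it at Green is an assumption made for illustration.
    if large_equity_gap and rating == "Blue":
        rating = "Green"
    return rating
```

Under these placeholder cutoffs, a school earning 85 of 100 possible points would be rated Blue, but the same school would be rated Green if it also showed a large gap between dominant and marginalized students’ outcomes.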

The SPF operated under a behaviorist perspective in which rewards and punishments are

seen to motivate schools and teachers to perform differently (Dworkin, 2005). In DPS, the SPF

was intended to reward high performing schools with publicly available desirable SPF ratings

while identifying low-performing schools for extra supports and, if improvements were not

achieved, sanctions (Asmar, 2017). These sanctions included reduced

teacher pay, mandated improvement plans, possible restart or closure, and publicly available low

ratings which reduced enrollment and funding. These negative consequences mirrored those

mandated by the federal government under the Obama Administration in which low performing

schools seeking federal grants were required to implement interventions of either transformation,

turnaround, restart, or closure (Trujillo & Renée, 2015).

The publicly available SPF ratings translated into more families choosing Blue or Green

rated schools than Red schools under the district’s universal school choice model, thereby

reducing low-performing schools’ enrollment and the funding attached to it (Asmar, 2019a) and

jeopardizing these schools’ ability to fund teachers and programming (Asmar, 2019b). SPF

ratings also impacted teacher pay, as the district used an incentives-based system (Asmar,

2016c). Although the district shifted its policy regarding school restarts and closures several

times in the last decade, Red and Orange status triggered review and intervention in both of the

most recent policies. Beginning in 2015, the district adopted the School Performance Compact

(Denver Public Schools, n.d. - a), which initially used SPF scores to identify the lowest

performing schools for review and potential closure or restart (Asmar, 2016a), with schools

scoring Red two years in a row, or receiving a Red or Orange rating in the two years preceding a

Red rating, automatically slated for restart or closure if student progress was insufficient (Asmar,

2018; Denver Public Schools, 2018). In 2018, that policy was revised to mandate that all schools

with two years of Red SPF ratings, or an Orange SPF rating followed by a Red rating, submit

improvement plans, which would then be reviewed by a committee before an intervention

– ranging from one or two years of monitoring to restart or closure – was suggested to and voted

upon by the school board (Asmar, 2018), although this policy was also suspended due to

community concerns in the 2018-2019 academic year (Denver Public Schools, n.d. - a).
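Read literally, the 2018 trigger is a rule over a school’s two most recent SPF ratings. The Python sketch below encodes that reading as an illustration only; the function name `requires_improvement_plan` and the representation of ratings as a chronological list of color strings are assumptions, and the sketch covers only the trigger for submitting an improvement plan, not the committee review or school board vote that followed.

```python
from typing import Sequence


def requires_improvement_plan(ratings: Sequence[str]) -> bool:
    """Hypothetical encoding of the 2018 trigger.

    `ratings` lists a school's SPF colors from oldest to newest. The trigger
    fires when the two most recent ratings are Red-Red or Orange-Red.
    """
    if len(ratings) < 2:
        return False
    previous, latest = ratings[-2], ratings[-1]
    return latest == "Red" and previous in ("Red", "Orange")
```

Under this reading, a school moving from Orange to Red would be required to submit a plan, whereas a school moving from Red to Orange would not, since the most recent rating must be Red.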

However, it must be noted that district-provided information about the district’s policies

regarding interventions following low SPF ratings was unclear, as the district describes its policy

in vague terms such as, “When schools do not meet expectations for academic growth and

achievement on the School Performance Framework, DPS provides intensive support to help

them get back on track… . Although a restart, turnaround or closure is never an ideal outcome, it

is sometimes necessary” (Denver Public Schools, n.d. – a), and, “If schools are not able to show

improvement after significant support efforts over time, DPS believes that the students served by

these schools deserve a major change in their learning environment” (Denver Public Schools,

2018). Neither of these statements nor any publicly available information from the district

specifies exactly what kinds of supports are provided, what amount of time schools have to show

improvement, what constitutes significant improvement, what “major change in learning

environments” entails, what “restart” entails, or how and under what circumstances schools are

closed. Upon request for clarification, the district representative for accountability was

unresponsive.

Research Context

Taken together, the federal mandate for disaggregated reporting found in the ESSA and

the local mandate for disaggregated reporting found in the Consent Decree result in a unique

accountability context in which DPS is required to provide exceptional amounts of accountability

data due to its historical struggle to provide equitable education to historically marginalized

populations and English Learners in particular. Currently, DPS serves over 90,000 students, most

of whom are Latinx (approximately 64%) and more than a third of whom (37%) are labeled

as “English Learners.” The student population of DPS is also 13% African American and 23%

White, with 67% of all students qualifying for Free and Reduced Lunch services, a proxy for

socioeconomic status, and 11% of students classified as receiving Special Education services.

While these characteristics are not uncommon in the US, they do result in the district serving

primarily students of color and students in poverty, with over one in three students requiring

special language interventions, tracking, and reporting per the CD.

Despite the flexibility of the ESSA and the commitment to serving bilingual students and

students of color expressed by the current superintendent, Alex Marrero, DPS has struggled to

implement school accountability that equitably measures student learning. A recent study by

Strong and Escamilla (2020) found that SPF school ratings had statistically significant

relationships to English Learners’ levels of English proficiency as measured by student WIDA

ACCESS scores, making the SPF a reflection not only of student learning but also of student

demographics. Since neither schools nor teachers can control what kinds of students they serve,

an accountability framework that penalizes them for serving historically marginalized

populations threatens to stigmatize these students as educators may falsely conclude that the

negative outcomes of low SPF ratings – including district intervention, loss of students and

funds, and school closure – are attributable to the students themselves rather than a faulty

accountability system (Strong & Escamilla, 2020).

Theoretical Framework

Critical Race Theory

The ways that accountability policies both purport to serve historically marginalized

students while also resulting in outcomes in which the schools that serve these students are

disadvantaged and further marginalized is predictable if one employs a Critical Race Theory

(CRT) lens, which understands schools to be sites of social reproduction that are socially- and

historically-situated. This perspective allows us to connect current practices in education to

broader patterns of institutional power disparities across society. Critical Race Theory is both a

theoretical and methodological framework with origins in critical legal studies (Bell, 1980;

Matsuda, 1993). CRT posits that, rather than a temporary or isolated aberration, racism is

endemic throughout US society and institutions (Russell, 1992) and thus must serve as a focal

point of any social science research (Howard & Navarro, 2016). By foregrounding race,

research is thus able to highlight the processes by which racialization reproduces inequality such

as through public education policy (Gillborn, Warmington & Demack, 2018). Racialization does

not reflect inherent, essential differences between individuals (Bonilla-Silva & Zuberi, 2008) but

rather is a social construct whose fluid boundaries shift in the service of maintaining a white

supremacist social order (Roediger, 2005). Race as a social construct nonetheless has real,

material consequences, such as regarding the allocations of school resources and opportunities

(Ladson-Billings & Tate, 1995), the right to property (Harris, 1993), and even language status

designations (Rosa, 2016). Furthermore, a tenet of CRT is that, although racialization is endemic

throughout US society, it interacts with other forms of oppression and power to create unique

intersectional identity categories that must be understood holistically rather than reduced to their

constituent parts (Crenshaw, 1991). In this way, students’ race interacts with students’ economic

status, gender, language practice, and cultures to create specific contexts in which students’

opportunities are afforded or denied (Howard & Navarro, 2016).

CRT explicitly calls attention to the fact that students of color, students in poverty, and

emergent bilingual students are poorly served by public schools, often as the result of policies,

legislation, and practices that systematically disenfranchise such students and the communities

from which they come (Baker & Wright, 2017; Donato, 1997; Donato & Hanson, 2012; Flores,

2005; Kozol, 1991; Leonardo, 2015; Menchaca, 1993; San Miguel & Donato, 2010; Santa Ana,

2004). This patterned disenfranchisement can be seen in part as stemming from disparities in

school contexts and investments, making the “achievement gap” more accurately understood as

an “opportunity gap” (Ladson-Billings, 2013b). Although such systemic educational

disenfranchisement has contributed to the inequalities which accountability reforms purport to

address, CRT scholarship clarifies this seemingly contradictory dynamic through the concept of

interest convergence, or the way that institutional policies like accountability are justified

through liberal, race-neutral, or even social justice frameworks but in reality serve to further the

interests of dominant-group members while obscuring how such policies are actually self-serving

(Bell, 1980), although some scholars contend that the interest convergence concept centers

whiteness and dominant group members at the expense of explicit attention to the needs and

interests of communities of color (Garces, Ishimaru & Takahashi, 2017).

QuantCrit

Building off of Critical Race Theory, QuantCrit posits that, due to the problematic history

and uses of demographic statistics, researchers employing these methods must do so carefully

and only for social justice purposes lest the research inadvertently perpetuate the white

supremacist status quo (Gillborn, Warmington & Demack, 2018). The tenets of QuantCrit,

discussed below, can be summarized as: (a) racism is central to US society; (b) quantitative data

are neither objective nor politically neutral but imbued with social and research bias, like any

kind of data; (c) likewise, racial categories are not objective, stable, or natural but social

constructions; (d) quantitative data, like all data, require interpretation, the act of which is

imbued with researcher bias, and thus numbers should not be taken to ‘speak for themselves’;

and (e) because of the ideological and political nature of quantitative research, like all research,

in addition to its problematic history of being used as justification for a white supremacist and

oppressive social order, researchers using quantitative data must do so with explicitly antiracist

designs and purposes (Gillborn, Warmington & Demack, 2018).

It must be noted that using statistical methods to study the relationship between student

demographics and learning outcomes is highly problematic, especially considering one of the

demographic variables of interest in this study – student race. Born during a time of overt racism,

statistical analysis of racial demographics was developed as a part of the eugenics movement,

which sought scientific justification for white supremacy through the “objective” measurement

of the “inferiority” of people of color that “necessitated” a racist social order (Zuberi, 2001). In

addition, racial categories themselves are highly arbitrary and unstable due to the socially

constructed nature of race, making racial categorizations fluid and ideational rather than a

discrete, consistent marker of biological difference (Bonilla-Silva & Zuberi, 2008). The use of

racial categories to define populations is thus both error prone and easily co-opted by white

supremacist interpretations of racial groups’ different outcomes as implications of inherent racial

difference and racial superiority/inferiority (Gillborn, Warmington & Demack, 2018).

These concerns raise the question: why measure racial groups at all? Although race is a

social construct, the ways individuals are racialized in society have very real material

consequences. Because public schools, like many institutions in the US, operate as systems of

social reproduction (Bourdieu & Thompson, 1991; Covarrubias, 2011), and the historical

foundation in this country is built on white supremacy (Russell, 1992), public education in this

country often reflects and reproduces white supremacist social orders (Leonardo, 2015; Yosso,

2002). For example, in a review of all 50 states’ elementary and secondary standards, Sabzalian,

Shear and Snyder (2021) used descriptive statistics guided by QuantCrit and TribCrit to reveal

that over half of states have little or no discussion of tribal sovereignty anywhere in their K-12

standards. For reasons like these, it is imperative that educational researchers are attentive to

issues of race in public schools without reifying racist narratives of inherent racial difference

(Crawford, Demack, Gillborn & Warmington, 2019).

In order to do this in quantitative research, scholars can adopt a critical stance toward

statistical methods and data (Stage, 2007). QuantCrit scholars note that although quantitative

data are often taken by academics, policymakers, and the public as somehow more objective than

qualitative data, they are nonetheless still subject to a host of researcher biases at every step of

the research design, implementation, and interpretation: Which questions are asked, how

populations are defined and measured, and the ways that relationships between population

groups and social outcomes are interpreted are all reflections of researcher choice and hence

potentially researcher bias (Crawford, Demack, Gillborn & Warmington, 2019; Stage, 2007;

Suzuki, Morris & Johnson, 2021). Thus, quantitative data are no more apolitical, objective, or

value-free than qualitative data and need to be understood for their subjective, political nature

(Bonilla-Silva & Zuberi, 2008).

Although a diffuse framework, QuantCrit is guided by several positions derived from

Critical Race Theory such as the understanding that white supremacy is endemic in US society

(Bell, 1992) and thus necessarily constitutes the context of all our social practices and hence

research foci, making all research essentially political (DeCuir-Gunby & Thandeka, 2019).

Critical scholars, then, must be explicitly dedicated to antiracist research approaches lest our

findings by default perpetuate the white supremacist status quo (Garcia, López & Vélez, 2018).

Conceptual Framework

Taken together, Critical Race Theory and QuantCrit frameworks are useful tools to

highlight the mechanisms by which school accountability functions to both further

disenfranchise historically marginalized communities and legitimize that marginalization.

These frameworks call on scholars to be attentive to the processes of marginalization, such as

through the role of the privatization of public resources as in the example of charters. They also

call on scholars to interrogate the relationship between racialized populations and disparate

investments and outcomes that purportedly justify such privatization. This dissertation drew on

the explicitly antiracist foundations of both of these frameworks in its intention and design, as it

sought to challenge deficit views of historically marginalized communities by highlighting

patterns of systemic oppression (Yosso, 2002), such as how institutional processes like

accountability reproduce systemic inequalities through the disparate school contexts in which

racialized, classed, and linguistically marked students find themselves and that result in

disparities in these groups’ academic outcomes. In addition, these two critical theoretical

frameworks were employed to conceptualize how variables extrinsic to the SPF accountability

framework that reflect institutional patterns of marginalization might impact accountability

outcomes, thereby guiding the selection of quantitative variables for analysis. Specifically, this

study explored the intersections between SPF accountability outcomes and (a) student

demographics, (b) school contexts identified in previous research that reflect student

demographics (teacher quality, discipline, charter status, and enrollment), and (c) characteristics

and services relevant to English Learners specifically. The last set of variables reflects both the

CRT concept of intersectional identities of racialized students in which their language and class

status contribute to unique social locations and thus avenues of marginalization as well as the

specific context of the study site, Denver Public Schools, whose struggles to adequately serve

English Learners resulted in the intervention of the Department of Justice as discussed in the

above section. The rationale for selecting these variables for analysis is presented below.

Student Demographics

This study included variables describing student populations of students of color, students

receiving Free and Reduced Lunch services (a proxy for classed status via income), and students

with the English Learner label due to research which shows that racialized, classed, and

linguistically marked populations are often denied the academic services and resources necessary

for school success (Darling-Hammond, 2004; Martin, 2012; Wu, 2013). Critical Race Theory

and QuantCrit describe this maldistribution of resources and opportunities as reflecting the

failures of educational systems rather than of students and families (Morris & Parker, 2013;

Ramlackhan & Wang, 2021). Accordingly, this study included student demographic variables to investigate whether

accountability outcomes mirrored student demographics while framing any disproportionality

under a CRT and QuantCrit lens as reflections of systems rather than communities.

An additional area of systemic disproportionality that has been shown to negatively

impact historically marginalized students concerns referral to and placement in Gifted and

Talented (GT) programs and Special Education (SPED) services. Black and Latinx students are

more likely to be referred for Special Education services than their White peers (Tenenbaum &

Ruck, 2007), even within income categories, indicating that race is a stronger predictor of

disproportionate SPED placements than socioeconomic status alone (Grindal, Schifter, Schwartz

& Hehir, 2019). Similarly, English Learner students, taken to be students of color, are more

likely to be labeled as having not only learning disabilities but also mental retardation relative to

their White peers (Sullivan, 2011).

The potential institutional biases against students of color that produce these disproportionalities are also evident in the underrepresentation of the same students in GT programs (Grissom & Redding, 2015). The propensity to under-identify the talents of historically marginalized students was exemplified when a large school district in Florida moved from identifying students for GT based on teacher referrals to identifying them through universal screening: without any change in the standards for entry into GT programs, the district saw enormous increases in the numbers of girls, students of color, students in poverty, and English Learner students qualifying for placement (Card & Giuliano, 2016). The role of teacher bias in failing to identify the talents and strengths of historically marginalized students, as evidenced in these studies, might explain national patterns of disproportionality in GT programs: Black and Latinx students represent 42% of enrollment in schools that have GT programs but only 28% of GT participants, and English Learners likewise make up 11% of students but only 3% of GT participants (US Commission on Civil Rights, 2018).
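
The gap in these national figures can also be expressed as a representation ratio: a group’s share of GT participants divided by its share of enrollment, where 1.0 indicates proportional representation. The short Python sketch below applies that ratio to the US Commission on Civil Rights (2018) percentages cited above. The function name `representation_ratio` is a hypothetical illustration, and the ratio is offered as one way of summarizing disproportionality, not necessarily a metric used in this study’s analyses.

```python
def representation_ratio(share_of_participants: float, share_of_enrollment: float) -> float:
    """Return a group's share of program participants divided by its share of enrollment.

    1.0 means proportional representation; values below 1.0 indicate
    underrepresentation and values above 1.0 indicate overrepresentation.
    """
    if share_of_enrollment <= 0:
        raise ValueError("share_of_enrollment must be positive")
    return share_of_participants / share_of_enrollment


# National GT figures cited in the text (US Commission on Civil Rights, 2018),
# given as (percent of GT participants, percent of enrollment).
groups = {
    "Black and Latinx students": (28, 42),
    "English Learners": (3, 11),
}

for name, (gt_share, enrollment_share) in groups.items():
    ratio = representation_ratio(gt_share, enrollment_share)
    print(f"{name}: {ratio:.2f}")
```

With these figures, Black and Latinx students appear in GT programs at about two-thirds (0.67) of their enrollment share, and English Learners at only about a quarter (0.27) of theirs.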

The overrepresentation of historically marginalized students in SPED programs and the underrepresentation of these same students in GT programs represent a failure to provide them with appropriate educational services, as placement decisions are based on students’ racial, class, and language statuses rather than their learning needs. As such, this study operationalized the Critical Race and QuantCrit frameworks, which highlight these patterned disparities, by including variables describing the rates of GT and SPED participation in each school in addition to variables describing the rates of students classified as students of color, students receiving Free and Reduced Lunch services, and English Learner students.

School Contexts

Another set of study variables focused on school contexts, including enrollment, student-teacher ratios, charter status, discipline rates, and teacher quality. Students in poverty, students of color, and emergent bilingual students are less likely to work with highly qualified teachers (Darling-Hammond, 2004; Goldhaber, Lavery, & Theobald, 2015; Lankford, Loeb & Wyckoff, 2002), whether teacher quality is measured through years of experience (Clotfelter, Ladd & Vigdor, 2005; Dabach, 2015), teacher effectiveness ratings (Borman & Kimball, 2005), or, for middle and high school teachers, having majored or minored in the content area (Jerald & Ingersoll, 2002; Peske & Haycock, 2006). In fact, an international comparative study found that the US ranked fourth of 46 countries in disparities in access to “quality” math teachers between high- and low-SES students (Akiba, 2007). Teacher qualifications, especially certification and education in the content area, have been found to affect student achievement more than student demographics, class size, teacher salaries, and general school funding (Darling-Hammond, 2000), making the disparate access to quality teachers across raced, classed, and linguistically marked students especially troubling. Strong and Escamilla (2020) found that schools serving larger proportions of students designated as English Learners had, on average, smaller proportions of teachers designated as “Fully Qualified” to teach bilingual students than schools serving student populations that were Whiter, wealthier, and more composed of English-monolingual and English-proficient students. Applying a CRT lens to this issue, disparate access to “Fully Qualified” teachers is a metric that likely represents mechanisms by which the district denies full investments of resources and opportunities to raced, classed, and linguistically marked populations, resulting in disparate accountability outcomes that in turn exacerbate the lack of equitable access. For this reason, CRT and QuantCrit frameworks are operationalized in this study through the inclusion of a variable describing the percentage of “Fully Qualified” teachers at each school.

Variables describing disciplinary environments in schools were also included in the study, as the race and socioeconomic status of students have been found to predict rates of disciplinary referrals (Bryan, Day-Vines, Griffin & Moore-Thomas, 2012; Skiba, Chung, Trachok, Baker, Sheya & Hughes, 2014), with Black students overrepresented in suspensions and expulsions in K-12 and, tragically, even in preschool (US Commission on Civil Rights, 2018). Teachers have been found not only to direct more positive speech at White students but also to recommend them for behavior referrals less often than Black and Latinx students (Tenenbaum & Ruck, 2007), with White teachers more likely to interpret their Black students’ behaviors as disruptive (Bates & Glick, 2013; Wright, 2015). Such dynamics result in students of color being overrepresented in disciplinary actions that remove them from school altogether, such as out-of-school suspensions (Anyon, Wiley, Samimi & Trujillo, 2021). These trends also hold at the study site, Denver Public Schools, where Black students are overrepresented in law enforcement referrals, tickets, and arrests (Asmar, 2020a). This study posits that disproportionality in discipline is an additional quantitative measure of how disparate school contexts affect historically marginalized students’ educational experiences and, consequently, accountability outcomes. Accordingly, this study operationalized CRT and QuantCrit frameworks through the inclusion of variables describing the rates of disciplinary (a) incidents, (b) actions, and (c) actions that resulted in loss of instructional time.

Another school context variable included in the study described whether a school was a charter or district-run. Charters have grown in popularity due to the perception that they offer market-based alternatives to the inefficient bureaucracy of public schools: their focus on pleasing their clients, who are in turn free to choose the schools with the best results, is expected to spur innovative improvements in organization and learning (Chubb & Moe, 2011). Because charter schools are driven by competition, the theory holds that they will achieve superior outcomes and be more responsive to community needs, as those that fail to do so will also fail to attract the families they need to operate and will be forced to close (Howe, Eisenhart & Betebenner, 2002). Because of this perception, charter schools are seen as attractive alternatives to district-run schools with low accountability ratings, and through the “Call for New Quality Schools” process Denver Public Schools allows new schools to be created to replace those that are low-performing (Denver Public Schools, 2018). However, whether charters actually produce greater student achievement is unclear, as the corporate and market logics that undergird charters replace attention to the role of race- and class-based inequities in producing disparate learning outcomes with solutions grounded in competition, standardization, and resource management, which potentially exacerbate these inequities because schools are free to reject the highest-needs students (Kantor & Lowe, 2016). This dynamic reinforces the disparate opportunities and resources afforded to raced and classed families, who can be excluded from high-performing charters through exclusionary enrollment practices and then blamed, under market logics of individual responsibility, for participating in low-performing schools (Howe, Eisenhart & Betebenner, 2002; Lipman, 2013). For these reasons, CRT and QuantCrit were operationalized in this study through the inclusion of data describing the charter status of schools, as this status has special implications both for accountability outcomes and for how well historically marginalized populations are served in the district.

Finally, this study also examined school context variables describing student population size, both as an absolute value (total enrollment) and as a relative value (the ratio of students to teachers). Data regarding school enrollment size were included because research has found that size can shape the relative disadvantages of students of color in public schools, with larger schools showing greater structural disadvantages for students of color compared to White students (Fitzgerald, Gordon, Canty, Stitt, Onwuegbuzie & Frels, 2013). In addition, enrollment size is currently being considered as a metric in decisions about whether to close or consolidate schools at the study site, Denver Public Schools (Asmar, 2021), making it not only a timely variable for analysis but also a potential route to the same end, school closure, that low accountability ratings can produce.

Similarly, student-teacher ratios have been used in previous research as a factor that can affect student learning independently of the other variables used in this study (Driscoll, Halcoussis & Svorny, 2003; Powers, 2003; Wu, 2013), making this ratio both a necessary control and an example of the school context factors external to the metrics measured by the accountability framework used in Denver Public Schools. Because CRT and QuantCrit prioritize examinations of the systemic mechanisms by which institutional oppression is reproduced and legitimized, this study operationalized these theoretical frameworks through the inclusion of these population-size variables alongside those describing charter status, disciplinary environments, and teacher quality, as all have been found to relate to historically marginalized student populations’ disparate access to high-quality education.

English Learner Characteristics and Services

Emergent bilingual students have a unique historical context of being marginalized and racialized. Thus, it is critical to center their skills and learning needs rather than applying a color-blind lens (Bonilla-Silva, 2006) that would treat these students, their history, and their needs as indistinguishable from those of dominant-group students. As such, this study included metrics to evaluate the characteristics and educational services unique to emergent bilingual students through variables describing (a) placement of ELs in programs to develop English, (b) parent preferences for such program placement, (c) the rates at which English Learner students are redesignated, exited, and re-entered into English Learner status, (d) the level of English proficiency of students, (e) the language status of English Learner students, and (f) the representation of English Learners in Special Education (SPED) and Gifted and Talented (GT) programs. These variables matter because if these students learn in environments where their bilingualism is positioned as a deficiency to be overcome rather than an asset to be embraced, or that are otherwise subtractive through assimilationist ideologies and monolingual normativity (Valenzuela, 1999), then they will experience diminished school quality regardless of their school contexts and school demographics.

Research has shown that when emergent bilingual students have the opportunity to develop their bilingualism and biliteracy, they do better in math, reading, and even English (Ramírez, 1992; Thomas & Collier, 1997), making English-only or transitional bilingual education ineffective if educators are seriously committed to promoting their academic achievement (Rolstad, Mahoney & Glass, 2005). However, data from a 2010-2011 representative sample of kindergarten students participating in English Learner programs showed that only 8% of such students participated in programs designed to develop bilingualism (Redford, 2018). Instead, most emergent bilingual students are denied these opportunities and advantages. Making matters worse, when schools receive low accountability ratings, the curriculum is often further narrowed to only those subjects and skills reflected in accountability measures (Diamond & Spillane, 2004), which can even lead to the loss of otherwise successful bilingual education programs (Menken & Solorza, 2014).

The loss of bilingual programming due to accountability frameworks that do not value, and hence do not measure, bilingualism is an example of how language policies, historically and currently, act as educational gatekeepers in the service of reproducing power differentials in society (Tollefson & Tsui, 2014). In the nineteenth century, Native American children were forcibly interned in boarding schools and prohibited from speaking their home languages through a violent policy of linguistic and cultural assimilation (Wiese & Garcia, 1998). Today, anti-immigrant and assimilationist ideologies are still evident in English-only programs and accountability paradigms that implicitly position students’ home languages and cultures as obstacles to overcome rather than assets (Baker & Wright, 2017; Black, 2006; Wiley & Wright, 2004), despite research showing the academic, interpersonal, and cognitive benefits of bilingualism (Bialystok, Craik, Green, & Gollan, 2009; Dorner, Orellana & Li-Grining, 2007; Martínez, 2010). Such ideologies are also evident in the invalid yet ubiquitous practice of using monolingual-normed tests to assess bilingual students, which has led to students being misidentified as in need of remediation or even as lacking language (Hopewell & Escamilla, 2014; MacSwan, 2005).

Because of this unique historical and contemporary context, the CRT attention to intersectional mechanisms of oppression was operationalized in this study through the inclusion of variables describing both the variation in English Learners’ language needs and the ways that different school contexts supported, or failed to support, those needs. These variables described both the kinds of language support programs in which English Learners were placed and their families’ preferences for language support programs. These data were complemented by variables describing the rates at which English Learners were redesignated from, exited from, and re-entered into English Learner status, as holding this label has been shown to reduce students’ access to challenging curriculum (Brooks, 2020) and negatively impact their learning outcomes (Kim, 2017). In addition, this study examined variables describing English Learners’ level of English proficiency according to WIDA ACCESS scores, as different levels of English proficiency indicate distinct student needs and have been found to be statistically significant predictors of accountability outcomes (Strong & Escamilla, 2020).

Variables describing English Learner characteristics included the rates at which English Learners were placed in SPED and GT programs, as English Learners are often overrepresented in SPED (Sullivan, 2011) and underrepresented in GT programming (US Commission on Civil Rights, 2018). Additionally, this study included a variable describing whether English Learners were Spanish speakers. This is due to evidence that raciolinguistic ideologies position racialized Spanish-speaking bilinguals as particularly linguistically deficient as a means of legitimizing white supremacist racial hierarchies (Hill, 2009; Rosa & Flores, 2017), a role that potentially parallels that of accountability outcomes in legitimizing the maldistribution of opportunities and resources across schools (Lipman, 2013).

In order to operationalize the CRT focus on how institutional and ideological mechanisms disenfranchise students with complex intersectional identities, this study also included analyses of disproportionate rates of English Learner representation in SPED and GT programs and of the disparate outcomes and school contexts of Spanish-speaking English Learners. Together with quantitative data describing language service program placement, family preferences for language services, rates of redesignation, exit, and re-entry into the English Learner label, and level of English proficiency, these analyses allowed this dissertation to explore the characteristics and services unique to English Learners.

Summary

By operationalizing CRT and QuantCrit to include the learning contexts and needs

specific to historically marginalized students, this research project sought to decenter normative

whiteness, monolingualism, and middle-classness while drawing attention to potential areas for

accountability focus that might be more effective in explaining and overcoming disparate

achievement outcomes.

Purpose of Study

This research project applied a Critical Race and QuantCrit lens to explore the unique historical and accountability context of Denver Public Schools. The study explored the relationship of non-achievement metrics, such as student demographics, school contexts, and English Learner characteristics and services, to the metrics used in the SPF. Research has shown that these broader factors affect learning outcomes (Guiton & Oakes, 1995; Teddlie, Stringfield & Reynolds, 2002; Wang, 1998; Wu, 2013), yet because the SPF did not include them, DPS leaders and educators who relied on the SPF alone were not able to consider them. Specifically, this project seeks to expand our understanding of how and why accountability is defined by school districts. This study is particularly relevant as school districts around the nation rethink their accountability systems. Capitalizing on the expansive accountability reporting available in DPS, and including metrics describing the student demographics, school contexts, and English Learner characteristics and services that research has shown to affect historically marginalized populations’ educational experiences, may broaden policymakers’ understanding of schools’ unique contexts and needs.

Critical Race and QuantCrit frameworks alert us to the need to examine the ways that institutional policies result in the disenfranchisement of historically marginalized communities, especially through ostensibly race-neutral policies like accountability (Gillborn, 2005). Taken together, this study sought to identify the relationships between schools’ SPF accountability outcomes and the student demographics, school contexts, and English Learner characteristics and services that are not measured by the SPF. This approach may highlight the ways in which the SPF erroneously measures, and holds teachers and schools accountable for, non-academic, contextual, and demographic variables of schools instead of only student learning and school quality.

This investigation also examines the ability of the SPF to achieve its goal of promoting improved student learning and school quality. If the SPF accountability framework is effective in promoting school success as revealed through SPF ratings, then trends should indicate that schools enter or remain in the lowest categories only briefly, as the accountability consequences of low status lead to improved performance. Conversely, as a result of accountability consequences that encourage high performance, schools should increasingly gain and maintain high-performing rating designations over time. Finally, an effective accountability framework should not only result in brief low-rating designations and greater rates of entering into and remaining in high-rating designations, but it should also reflect only school success rather than student demographics. For this reason, if the SPF accountability framework reflects only school success, then the characteristics of the schools at the extremes of the highest and lowest SPF ratings should be approximately similar. As such, this study centers the student demographic, school context, and EL characteristic and service variables extrinsic to the SPF, as an unbiased accountability framework should not reflect any of these non-achievement metrics, and these non-achievement metrics should likewise have no impact on schools’ SPF ratings.
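
The expectation that schools at the two extremes should look approximately similar can be checked one variable at a time. One widely used effect-size measure for such a comparison, Cohen’s d (the difference in group means divided by the pooled standard deviation), is sketched below in Python. It is offered only as an illustration under that assumption, not necessarily the statistic used in this study, and the school values in the usage lines are invented placeholders rather than DPS data.

```python
from statistics import mean, stdev


def standardized_mean_difference(group_a: list[float], group_b: list[float]) -> float:
    """Cohen's d: difference in group means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two schools")
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * stdev(group_a) ** 2
                  + (n_b - 1) * stdev(group_b) ** 2) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("pooled standard deviation is zero; d is undefined")
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5


# Hypothetical placeholder values (percent of students receiving Free and
# Reduced Lunch services in each school); these are NOT study data.
intervention_schools = [80, 85, 90]
blue_schools = [20, 25, 30]

d = standardized_mean_difference(intervention_schools, blue_schools)
print(f"Cohen's d (Intervention vs. Blue): {d:.2f}")
```

Under the reasoning above, values of d near zero for non-achievement variables would be consistent with an unbiased framework, whereas large values would indicate that ratings track school characteristics; with real data the comparison would be repeated for each demographic, school context, and EL variable.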

Research Questions

In order to investigate the relationship between accountability outcomes and student demographics, EL characteristics and services, and school contexts, this study addressed the following research questions:
1. What are the student demographics, EL characteristics and services, and school contexts per SPF rating bracket?
2. At what rate do schools remain in, enter into, or exit the most extreme SPF rating statuses of Intervention vs. Blue, and what are the student demographics, EL characteristics and services, and school contexts in them?
3. What are the student demographics, EL characteristics and services, and school contexts per charters and district-run schools?
4. Do student demographics predict SPF scores?
5. Do student demographics predict SPF outcomes?
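
Research Question 2 turns on year-over-year movement into and out of the two extreme ratings. A minimal Python sketch of how such movement could be tallied is shown below. It assumes each year’s ratings are stored as a mapping from school to rating label; the function names, school names, and the “Other” placeholder category (standing in for every rating between the extremes) are hypothetical and are not drawn from the study’s data or code.

```python
def transition_counts(year1: dict[str, str], year2: dict[str, str], rating: str) -> dict[str, int]:
    """Count schools that remained in, entered, or exited `rating` between two years.

    Only schools rated in both years are compared; schools that opened or
    closed between the two years are skipped.
    """
    counts = {"remained": 0, "entered": 0, "exited": 0}
    for school in year1.keys() & year2.keys():
        was_in, is_in = year1[school] == rating, year2[school] == rating
        if was_in and is_in:
            counts["remained"] += 1
        elif is_in:
            counts["entered"] += 1
        elif was_in:
            counts["exited"] += 1
    return counts


def persistence_rate(counts: dict[str, int]) -> float:
    """Share of schools starting in the rating that were still in it the next year."""
    starting = counts["remained"] + counts["exited"]
    return counts["remained"] / starting if starting else float("nan")


# Hypothetical ratings for placeholder schools; these are NOT DPS data.
# School E has no second-year rating (e.g., it closed), so it is skipped.
year1 = {"School A": "Intervention", "School B": "Intervention", "School C": "Blue",
         "School D": "Other", "School E": "Blue"}
year2 = {"School A": "Intervention", "School B": "Other", "School C": "Blue",
         "School D": "Intervention"}

for rating in ("Intervention", "Blue"):
    counts = transition_counts(year1, year2, rating)
    print(rating, counts, f"persistence = {persistence_rate(counts):.2f}")
```

In this toy example, one of the two schools that began in Intervention remained there (persistence 0.50), one exited, and one school entered, while the only Blue school rated in both years stayed Blue. Applied across consecutive years, the same tallies would show whether schools leave the lowest rating quickly, as an effective framework would predict.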

Significance of the Research

I posit that it is only through understanding a problem that one can work toward a solution. In the same way, it is only through better understanding the institutional mechanisms through which disenfranchisement occurs that targeted policy solutions can be crafted and implemented. A better understanding of how student demographics, school contexts, and accountability outcomes interact, with attention to student-specific needs and outcomes, can potentially help policymakers reframe how accountability is conceptualized, institutionalized, and enacted in order to confront systemic disparities in accountability outcomes that punish historically marginalized students and the schools that serve them. Empirical data on the ways that the most recent accountability framework used by Denver Public Schools affected historically marginalized populations can help district leaders design better, more equitable accountability policies and practices that measure, and provide, the student-specific resources that different historically marginalized groups, such as emergent bilinguals, deserve. Given the country’s long history of educational disenfranchisement of raced, classed, and linguistically marked students, research like this study, which seeks to identify policy mechanisms of marginalization in order to craft improved services and outcomes for such students, is as timely as ever.
Literature Review

The literature review is organized around three strands of extant research: (a) the efficacy, validity, and utility of accountability policies; (b) the efficacy and outcomes of accountability policies for historically marginalized students in particular, with a special focus on emergent bilingual students; and (c) QuantCrit approaches to understanding school environments and outcomes for historically marginalized populations. To conclude, the connection between the previous research explored here and the current dissertation is discussed.

The search for literature was conducted using ERIC (ProQuest) and Education Full Text (EBSCO), in addition to bibliographic chaining. Search terms for the first two sections (regarding the efficacy, validity, and utility of accountability policies generally and for historically marginalized students specifically) were “school accountability” in combination with “validity,” “efficacy,” or “outcomes.” These results were then coded to indicate whether the studies described general accountability research or issues pertinent to historically marginalized students, with those about emergent bilingual students sub-coded as a distinct category. Search terms for the third section (regarding QuantCrit studies) were “school accountability” in combination with (a) “historically marginalized students” or “student demographics,” and (b) “QuantCrit” or “critical race theory.”

Results were refined in three ways: (a) to only books and articles (excluding dissertations, opinion pieces, and other types of media); (b) to only publications after 2000, representing the current accountability era characterized by the passage of the No Child Left Behind Act; and (c) to only those pertinent to the US context, although studies that discussed international contexts in addition to the US, as in the case of the Wiliams (2010) piece, were also included.
Research Regarding the Validity of Accountability Policies

The most recent federal accountability law, the Every Student Succeeds Act (ESSA) of 2015, allows individual states leeway in implementing accountability policies, resulting in a patchwork of systems in which states use different indicators and different weights (or omit some indicators entirely) to measure different constructs (Darling-Hammond et al., 2016). This flexibility came on the heels of an already lenient system that was prone to inconsistencies such as divergent cutoff points, tests, and growth scales, as well as the incorporation of non-academic factors. A comparative cross-state analysis (Martin, Sargrad, Batel & Center for American Progress, 2016) concluded that cross-state variation was so great that a child could easily be considered proficient in a subject area in one state only to find that in the next she is below average. Similarly, using a path analysis of relationships between policy inputs, outcomes, and contexts across all 50 US states, Lee (2010) found that, because states had wide latitude in implementing federal standards, some opted to manipulate their own standards frameworks in order to artificially inflate their scores and delay implementation of federal objectives entirely so as to avoid the negative repercussions of under-performance (a rational choice when accountability standards demanded increased performance without providing increased supports). Such “gaming” of the accountability system was also found by Vasquez Heilig (2011), whose longitudinal descriptive cohort progression analysis of 45,000 students investigated accountability reporting outcomes and found that the publicly reported graduation rates were mathematically impossible.

Beyond the manipulation of accountability frameworks by districts and states, research has questioned whether even faithfully implemented accountability systems actually reflect student learning or school success. In a qualitative comparison of accountability rating systems, Murray and Howe (2017) concluded that systems reporting single metrics of school success, such as a letter grade, are unlikely to accurately describe school quality or motivate the very improvements on which the accountability system is premised, because such oversimplified ratings do not actually reflect differences in performance. A quantitative study using multilevel modeling and ANOVAs conducted by Adams, Forsyth, Ware, Mwavita, Barnes, and Khojasteh (2016) revealed that students from the schools with the highest ratings did not have statistically significantly higher reading or math outcomes than students in schools with lower ratings. More strikingly, the highest-rated schools were also home to the greatest achievement gaps between students receiving Free and Reduced Lunch services (FRL) and students of color compared to their non-FRL and White counterparts. This is due in part to the fact that FRL students and students of color in the lowest-rated schools actually had higher average reading and math performance than FRL students and students of color in the highest-rated schools (Adams, Forsyth, Ware, Mwavita, Barnes & Khojasteh, 2016).

Disaggregating outcomes by student demographics could help address the misleading nature of single-metric accountability ratings. For example, Glynn and Waldeck (2013) found in their comparative analysis of SchoolDigger school ratings in four states not only that single-metric ratings often obscured such achievement gaps but also that the differences in student outcomes between one rating category and the next were not statistically significant. However, disaggregation is not a panacea for the unreliability of accountability scores, as highlighting performance by student subgroup introduces new problems. Vasquez Heilig, Young, and Williams (2012) found in their qualitative study using focus groups and interviews that teachers and administrators saw “at-risk” students’ disaggregated lower test scores and consequently interpreted these students as threats to the school’s accountability rating. Similarly, in a mixed methods case study of a Latino-majority high school over seven years, McNeil, Coppola, Radigan, and Vasquez Heilig (2008) found that disaggregating outcomes by student demographics led to the view that historically marginalized students were not pupils to be taught but liabilities who imperiled accountability ratings.

The accuracy of the tests used to determine accountability ratings has also been questioned, as many standardized assessments reflect constructs beyond student content learning. For example, Spees, Potochnick, and Perreira (2016) used regressions to evaluate the relationship between individual-level eighth grade scores on the National Assessment of Educational Progress, student demographics, and contextual factors including type of city. They found that, for emergent bilingual students, scores reflected whether students lived in new or established immigrant communities, demonstrating how variables far removed from the quality of instruction can affect test outcomes and consequently accountability ratings. In a review of literature regarding the validity of teacher evaluation instruments used for teachers of emergent bilingual students, Turkan and Buzick (2016) found that because “there is no uniform definition of necessary teaching knowledge and skills to be effective teachers of ELLs [or English Language Learners]” (p. 238), the use of value-added models to evaluate teachers of emergent bilingual students is likely unreliable and invalid.

However, for emergent bilingual students, a more consistent source of test invalidity is language bias within the assessments themselves. For example, Menken’s (2010) word frequency analysis of the New York statewide Regents exam found that the exam was a test not only of content knowledge but also of English language skills that, by definition, emergent bilingual students are still developing. Abedi (2004) reached a similar conclusion through a different approach, using descriptive tracking of scores and internal consistency analyses of Adequate Yearly Progress as mandated by No Child Left Behind, finding that because of language factors, test results were not directly comparable between emergent bilinguals and their peers. Similarly, in a correlational study of the relationship between language demands and SAT math scores, Tsang, Katz, and Stack (2008) found that even bilingual students who achieved above the national average in math were still disadvantaged by “language interference” (p. 19) in math word problems, indicating that districts and schools may be labeled as failing due to the size of their emergent bilingual population and the test biases these students face (Fairbairn & Fox, 2009). Even accountability models that prioritize growth scores rather than stand-alone outcomes on single tests are likely disadvantaging emergent bilingual students according to Lakin and Young (2013), whose quantitative comparison of the targets used by different growth models in California found evidence that growth models might not accurately project future achievement for emergent bilingual students, thus subjecting these students to unrealistic growth targets that are much higher than those of their non-bilingual peers.

In addition to issues of implementation and test validity, research has shown that accountability ratings themselves might not provide the public with the reliable information about school quality that was one of the goals of the accountability movement. The format of an accountability rating (e.g., whether it is presented as a letter grade or a proficiency score) affects public interpretation of school quality, as shown through an ANOVA of results from a population-based survey of 59 school report cards (Jacobsen, Snyder & Saultz, 2014). Beyond this inconsistent interpretability, what the ratings reflect may not be valuable to public audiences in the first place. The majority of the public who participated in a mixed-methods survey reported that they do not see standardized test scores as indicators of school quality, with fewer than a quarter of respondents endorsing the opposing view (Brewer, Knoeppel & Lindle, 2015).

The gap between the metrics the public believes indicate school quality and the metrics used in accountability frameworks reflects the ideological underpinnings of the accountability movement. The movement has historically positioned the public, and especially historically marginalized communities, as expendable rather than integral to defining the principles and goals of accountability (Lipman, 2013). Such findings cast doubt on whether accountability policies actually fulfill their market-based rationale that desirable and undesirable reputations will drive improved performance. In fact, "[m]isinformation about school quality may steer families away from areas they otherwise would have selected" (Glynn & Waldeck, 2013; p. 476) and vice versa. Families may therefore make decisions ostensibly based on objective indicators of school quality that in reality may reflect little more than formatting choices (Jacobsen, Snyder & Saultz, 2014).

Even setting aside these concerns about test validity, as does the majority of accountability research on disadvantaged schools according to Huilla’s (2020) content analysis, research on the ability of accountability frameworks to improve learning is mixed. The implementation of high-stakes accountability structures has been found to have some positive impacts on student achievement, but it remains undetermined whether these impacts are due to the accountability frameworks themselves or to other contextual factors. This ambiguity is heightened by the fact that, internationally, differences in quality of instruction account for only approximately 10% of inter-school test score variability (Wiliam, 2010).

Some research has found that accountability increases achievement on tests, such as Fuller and Johnson’s (2001) analysis of high-stakes test outcomes, Advanced Placement course patterns, and college entrance examinations. However, it does not follow that even these findings of improvement “imply that all accountability systems will drive improvement in student achievement. They will not" (Fuller & Johnson, 2001; p. 281). Instead, each accountability system must be weighed for its potential harms as well as its benefits, since even accountability systems shown to increase performance can have negative consequences. Hanushek and Raymond (2005) highlighted this dynamic. Their study estimated accountability effects with data from the National Assessment of Educational Progress and found that accountability led, in the aggregate, to more achievement growth than would have occurred without it. It also widened racial achievement gaps, however, as Black and Latinx students gained less from accountability than their White peers. These findings echo those of Lee and Wong (2004), whose analysis of state policy surveys and achievement data showed that accountability policy had no effect on reducing the achievement gap.

Research Regarding the Efficacy and Outcomes of Accountability Policies for Historically Marginalized Students

Because this dissertation focuses on the impact of local accountability policies on historically marginalized populations, with special attention to emergent bilinguals, it is important to review research specific to these students. Unfortunately, when accountability efficacy and outcomes are considered for these students in particular, research has come to even more dire conclusions. These include the finding that accountability frameworks can reduce rather than enhance access to high-quality education for historically marginalized students such as emergent bilinguals and students with special needs. By definition, these students’ unique needs defy the logic of standardization. Yet standardization, the bedrock of accountability, is continually applied to assessing them, their teachers, and their schools, often with inaccurate results that are detrimental to the students themselves (Cramer, Little & McHatton, 2018). Overlooking specific populations’ different needs leads to accountability practices that are counterproductive or even harmful. For example, Reyes and Garcia (2014) found in their case study of the successful turnaround of a ‘failing’ school that such practices dissuaded local leaders from implementing culturally and linguistically relevant interventions that were more likely to succeed.

Accountability impedes not only culturally and linguistically appropriate interventions but also appropriate instruction. In her qualitative study of 10 New York high schools, Menken (2006) found that, under accountability pressure for students to perform well on English-language exams, the majority of schools began ‘teaching to the test,’ or narrowing instruction to only the skills and knowledge that would be tested. For example, schools reduced bilingual education options. They also replaced English as a Second Language curricula designed to address the communicative needs of emergent bilingual students with curricula based on the English Language Arts curriculum designed for native English speakers. This phenomenon is not limited to emergent bilingual students. Diamond and Spillane (2004) found in their qualitative study that such ‘teaching to the test’ was more common in low-rated schools, which tend to have higher-than-average proportions of historically marginalized students. This might be because of conditions reported by teachers in contexts where accountability frameworks are implemented especially punitively, with consequences for low test performance including school closure, intervention, and staff turnover. These teachers have reported that their opportunities for professional development have likewise been narrowed to focus on instruction that will produce higher test results. As a result, students in these contexts have fewer opportunities to learn from teachers whose professional abilities are being fully developed, as found in a case study conducted by Jacobs, Burns, and Yendol-Hoppey (2015).

In addition to responding to accountability pressures by ‘teaching to the test,’ some schools and teachers have responded by removing students whom they perceive as likely to have low test scores. Preventing students from taking high-stakes tests is one way schools can “game” accountability. For example, Haney (2000) reviewed Texas’s remarkable test outcomes, using tests of statistical significance to establish "discriminatory impact" according to the legal standard. The review found that these outcomes reflected an increase in low-achieving students being removed from the pool of test takers through higher retention rates of Black and Latinx students, higher dropout rates, and the exclusion of students from testing. In a qualitative study, Vasquez Heilig, Young, and Williams (2012) found that more than two-thirds of teachers and administrators confirmed that their schools eliminated students who were perceived to lower accountability scores. The schools did so by failing these students, by implementing ‘waivers’ to excuse low-performing students from tests, and by encouraging low-performing students to drop out of school, even through threats to report students to US immigration authorities due to perceived undocumented status.

These findings mirror what Vasquez Heilig and Darling-Hammond (2008) encountered in their longitudinal mixed-methods study of 25,000 students over seven years. The study showed that test scores increased as low-achieving students disappeared from the testing pool because they were retained or encouraged to leave school, which left Black, Latinx, and emergent bilingual students with the lowest graduation rates. The motivation to push out students in order to “game” accountability outcomes might explain discrepancies in graduation rates across student racial groups. Fitzgerald, Gordon, Canty, Stitt, Onwuegbuzie, and Frels (2013) used a nonparametric repeated measures analysis of variance of three years of data, representing between 500 and 600 schools each year. They found evidence that, in schools with large enrollments, White students had statistically significantly higher high school completion rates than Latinx and Black students. These differences represented large effect sizes and indicated that students of color were at a particular disadvantage in those school settings. Together, these findings indicate that accountability policies create disincentives for schools to work with historically marginalized populations (Menken, 2010).

Accountability policies have led schools to narrow historically marginalized students’ curriculum and instruction and to erroneously retain them or discourage them from participating in school, and teachers and administrators perceive these students as threats. It is therefore little surprise that accountability outcomes often reflect student populations. In a regression analysis of the relationship between schools’ reported Adequate Yearly Progress and student demographics, Martin (2012) found that the schools that failed to achieve their benchmarks had larger populations of economically disadvantaged students. Similarly, Harris (2007) used descriptive statistics of student demographics and school ratings for 60,000 schools across 47 states. Schools with the smallest populations of students of color and students in poverty were 89 times more likely to be rated as high performing than schools with larger populations of these students. This finding indicated that the accountability framework was holding schools accountable for student demographics, a factor beyond their control. Martinez-Garcia, LaPrairie, and Slate (2011) confirmed these findings. Their MANOVA of student demographics and accountability ratings for 4,000 schools found, with moderate to large effect sizes, that “[e]xemplary elementary schools had the lowest percentages of Black students, Hispanic students, economically disadvantaged students, at-risk students, students with LEP, and mobility percent whereas Academically Unacceptable schools had the highest percentages” (p. 16). Because the ratings also reflected student demographics, they were likely unreliable. Regarding emergent bilingual students, the correlational study conducted by Tsang, Katz, and Stack (2008) indicates that, due to “language interference” (p. 19), the standardized tests used to calculate accountability ratings are unreliable, as these ratings reflect schools’ emergent bilingual populations rather than student learning.

Such disparities in outcomes along the lines of student demographics reflect disparities in inputs. Wu (2013) used a fixed effects regression analysis to evaluate the relationship between student demographics, school resources, and outcomes over ten years. The analysis found that a change as small as 1% in the proportion of student racial groups, students receiving Free and Reduced Lunch, or English learner students was enough to change schools’ accountability outcomes. Unsurprisingly, a similar relationship was detected for schools’ resources. Achievement increased as the proportion of teachers with full teaching credentials increased, and it decreased as school enrollment and class size increased. Together, these findings indicate that accountability ratings reflect not only student learning but also student demographics and school resources.

QuantCrit Approaches to Understanding School Environments and Outcomes for Historically Marginalized Students

The theoretical and methodological framework employed in this dissertation was QuantCrit. QuantCrit has roots in Critical Race Theory and seeks to use quantitative data for antiracist purposes, in part by acknowledging that, like all data, quantitative data are not objective but require specific interpretations and reflect researcher intention (Gillborn, Warmington & Demack, 2018). In statistical demographic research, this translates into attention to how racial categories are understood to be causally related to disparate social outcomes. QuantCrit scholars hold that quantitative data do not “speak for themselves” but, like all data, only become meaningful through interpretation. For this reason, the ways that quantitative researchers discuss and interpret findings, such as the relationship between racial groups and social outcomes, have powerful implications. These interpretations can either perpetuate white supremacist understandings of inherent racial difference and deficit or disrupt those narratives (Covarrubias, Nava, Lara, Burciaga & Solórzano, 2019). For example, instead of interpreting data as a person’s race “causing” disparate social outcomes, we must insist on interpretations that explore how social processes of racialization are related to outcomes (Bonilla-Silva & Zuberi, 2008). Therefore, a better way to report statistical findings is through a commitment to coupling discussions about race with discussions about racism (Gillborn, Warmington & Demack, 2018).

This approach is embodied in recent research into disparities in educational outcomes across student demographics, which Van Dusen, Nissen, Talbot, Huvard, and Shultz (2022) framed as reflections of education debt rather than student ability. The authors used hierarchical linear models of pre- and post-tests of over 4,000 college students taking introductory chemistry courses across 12 institutions. They found that Black men and women were owed the largest education debts by society. For example, White Hispanic women would need to take the introductory course two and a half times to be repaid their legacy education debt. Another way demographic data can be used within QuantCrit frameworks is to calculate relative difference composition indexes, Equity Indexes, and Inequity Scores. Young and Young (2022) demonstrated this approach and found that Black students were nationally underrepresented in Gifted and Talented programs by between 31% and 56%. Similarly, Ramlackhan and Wang (2021) studied math and English Language Arts achievement across demographic groups using descriptive statistics and growth mixture models with varying numbers of latent classes. They found that student demographics varied dramatically across the higher- and lower-achievement classes, with White students overrepresented in the high-achievement classes. However, instead of attributing these differences to the students themselves, the authors used a QuantCrit framework to call for greater investigation of the “underlying structures and oppressive mechanisms in society that creates differential access to resources and opportunity in urban communities" (Ramlackhan & Wang, 2021; p. 22).
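This chapter does not define the relative difference composition index mentioned above. The minimal sketch below assumes the definition commonly used in gifted-education equity research: a group's share of program enrollment minus its share of the overall population, divided by its population share and expressed as a percentage. The function name and the enrollment shares in the example are hypothetical and are not taken from Young and Young (2022).

```python
def relative_difference_composition(program_share: float, population_share: float) -> float:
    """Percent difference between a group's share of a program and its share of the population.

    Assumed definition: (program share - population share) / population share * 100.
    Negative values indicate underrepresentation; positive values indicate overrepresentation.
    Shares are proportions between 0 and 1.
    """
    if population_share <= 0:
        raise ValueError("population_share must be greater than zero")
    return (program_share - population_share) / population_share * 100.0


# Hypothetical example: a group that is 15% of all students but 9% of
# Gifted and Talented enrollees is underrepresented by about 40%.
rdci = relative_difference_composition(program_share=0.09, population_share=0.15)
print(f"{rdci:.1f}%")  # prints -40.0%
```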

Other recent research has drawn on the QuantCrit view that neither quantitative data nor racial categorizations are neutral or objective in order to explore the process of student racialization in educational reporting on student demographics. For example, Campbell-Montalvo (2020) used a QuantCrit lens to find that actors across schools and districts interpreted and reported student race differently. The resulting inconsistencies revealed the artificial nature of racial demographics in school data and, consequently, in race-based educational policy. Likewise, Crawford (2019) examined the deeply political and biased nature of educational statistics in her investigation of much-publicized data showing “discrimination” against White, working-class, male students in the UK. She found that the statistics had failed to disaggregate by educational attainment status and had misrepresented class status. The effect was to present White males as chronically underserved in public schools when in reality they continued to outperform their peers of color. The statistics were thus little more than misleading data that served to legitimize the centering of White needs through a false sense of White, male victimhood.

Additionally, many QuantCrit scholars borrow from theories of intersectionality, which hold that our various raced, classed, gendered, sexual, linguistic, religious, and other social identities intersect in different arenas of our daily lives. These intersections create overlapping and at times contradictory spheres of oppression and privilege (Crenshaw, 1991). Demographic statistics can fail to account for intersectionality, for example through the misuse of multiple regression analyses that seek to isolate and control for identity variables as if they were discrete and extractable aspects of social existence rather than necessarily interconnected (López, Erwin, Binder & Chavez, 2018).

QuantCrit scholars have attempted to address this shortcoming through quantitative intersectionality, a method and approach grounded in intersectionality. Quantitative intersectionality purposefully examines how various social identities together lead to differential outcomes. It does not treat singular identities such as race or gender as if they were homogeneous social categories that relate to homogeneous experiences (Covarrubias, Nava, Lara, Burciaga & Solórzano, 2019).

Using an intersectionality framework, several scholars have problematized the misleading homogenization of students by gender, race, class, and immigration status. These scholars have identified statistical differences in educational attainment trends according to intersectional identities. López, Erwin, Binder, and Chavez (2018) used saturated logistic models in an analysis of a large public university’s six-year graduation rates and developmental coursework enrollment. They found that students’ intersectional identity categories were strongly related to their likelihood of graduating, with low-income American Indian men being approximately 45% less likely than high-income White women to graduate in six years.
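Cell-level rates help show the contrast between additive regression controls and quantitative intersectionality. A logistic model is saturated when it includes every interaction among its categorical predictors. In that case, its fitted probabilities equal the observed proportion in each combination of categories. The sketch below computes those intersectional cell rates directly instead of fitting a model. The records and field layout are invented for illustration and are not López and colleagues' data.

```python
from collections import defaultdict

# Hypothetical student records: (race, gender, income, graduated_in_six_years)
records = [
    ("American Indian", "man", "low", False),
    ("American Indian", "man", "low", True),
    ("American Indian", "man", "low", False),
    ("White", "woman", "high", True),
    ("White", "woman", "high", True),
    ("White", "woman", "high", False),
    ("White", "woman", "high", True),
]


def cell_rates(rows):
    """Graduation rate for each full combination (cell) of identity categories.

    These cell proportions are what a saturated logistic model, with all
    interactions among the categorical predictors, reproduces as fitted values.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [graduates, total students]
    for race, gender, income, graduated in rows:
        cell = (race, gender, income)
        counts[cell][0] += int(graduated)
        counts[cell][1] += 1
    return {cell: grads / total for cell, (grads, total) in counts.items()}


rates = cell_rates(records)
for cell, rate in sorted(rates.items()):
    print(cell, round(rate, 3))
```

Each cell is estimated separately, so small cells yield unstable rates. This is one practical cost of treating every intersection as its own category.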

An especially fruitful area for quantitative intersectionality is the educational pipeline. In pipeline studies, rates of academic attainment can be tracked by intersectional identities that holistically capture the social locations of, for example, upper-income, noncitizen Chicanas as opposed to middle-income, citizen African American men (Covarrubias, 2011). Framing quantitative investigations in this way promises a richer, more accurate understanding of how identity categories relate to differential outcomes. It also disrupts majoritarian narratives that essentialize historically marginalized communities to singular aspects of their identities, a disruption that is a central goal of Critical Race Theory and QuantCrit (DeCuir & Dixson, 2004; Garcia, López & Vélez, 2018). Other research on the educational pipeline has used similar methods to find that gender, class, and citizenship status directly relate to educational attainment and earning potential throughout the K-PhD spectrum. This holds for Asian Americans (Covarrubias & Liou, 2014), for students of Mexican origin (Covarrubias & Lara, 2014), and across racial groups (Covarrubias, 2011).

These lenses also lend themselves to critical studies of proportionality. Cruz, Kulkarni, and Firestone (2021) used mixed multilevel logistic regression models and discrete-time hazard models to find that BIPOC students were overrepresented in both in- and out-of-school suspensions, which represent a form of instructional loss for these students. Even when controlling for gender, Free and Reduced Lunch status, parent education, and school characteristics such as the percentage of White students and average years of teacher experience, they found that disciplinary actions tracked student race. Black and Latinx students were about twice as likely to be suspended as White students, and the odds of any student being suspended decreased as the percentage of the student body that was White increased. These results mirrored those of Anyon, Wiley, Samimi, and Trujillo (2021), who also used descriptive statistics and multilevel logistic regressions to calculate odds ratios of being suspended for each student demographic group. They found that, compared to White students, Black, Latinx, and multiracial students had significantly higher odds of receiving both in- and out-of-school suspensions. Both of these studies were grounded in QuantCrit. They therefore interpreted these discrepancies not as inherent attributes of students but as reflections of systemic inequities within public education policies and practices.
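Both suspension studies above report odds ratios. The minimal sketch below shows what such a ratio expresses by computing an unadjusted odds ratio from a 2×2 table. The counts are hypothetical. The multilevel models in those studies also adjust for covariates and for the clustering of students within schools, and this sketch does not attempt either.

```python
def odds_ratio(events_a: int, non_events_a: int, events_b: int, non_events_b: int) -> float:
    """Unadjusted odds ratio of an event (e.g., suspension) for group A relative to group B."""
    if non_events_a == 0 or events_b == 0 or non_events_b == 0:
        raise ValueError("counts used as denominators must be non-zero")
    odds_a = events_a / non_events_a  # odds of suspension in group A
    odds_b = events_b / non_events_b  # odds of suspension in group B
    return odds_a / odds_b


# Hypothetical counts, not drawn from either study:
# group A: 60 suspended, 940 not suspended; group B: 30 suspended, 970 not suspended
ratio = odds_ratio(60, 940, 30, 970)
print(f"{ratio:.2f}")  # prints 2.06
```

In this example, group A's suspension probability (6%) is exactly twice group B's (3%), while the odds ratio is about 2.06. The two measures diverge further as the outcome becomes more common, so "twice as likely" and "twice the odds" are not interchangeable in general.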

Other work highlights the inaccuracy of attributing disparities to students. Morris (2021) conducted a multilevel multivariate logistic regression study using the nationally representative Education Longitudinal Study. The study found no support for the belief that students of color were more likely to attend disruptive, violent schools that necessitated these disparate discipline rates: “students who attend minority segregated schools are, at worst, no more likely to be victimized, and, once statistical controls are put into place, they appear less likely to be victimized” (p. 13) by other students at their schools. This work quantitatively demonstrates the fallacy of attributing disparate disciplinary outcomes to students themselves. It also illustrates the power of QuantCrit to both highlight educational inequities and directly contest deficit narratives. The power of such work is not limited to students but extends to all participants in the educational system. Campbell-Montalvo (2020) used hierarchical linear modeling to describe disparities in teacher evaluations in North Carolina Department of Public Instruction administrative data. Even when quality of teaching indicators were similar, Black women were rated lower than White women after classroom observations were conducted. Campbell-Montalvo draws on QuantCrit to interpret these data not as implications of inherent racial difference but as evidence of racial bias inherent in public school settings, bias that affects all participants, students and teachers alike.

Relationship Between Previous Research and the Dissertation

This dissertation drew from QuantCrit in its design, interpretation, and purposes. This project explicitly aimed to understand the ways that the School Performance Framework accountability system measures student demographics rather than student learning. Its primary purpose was therefore to advance critical scholarship that problematizes school accountability policies and frameworks as metrics of social and institutional biases. Such biases legitimate the maldistribution of educational resources, both symbolic (e.g., high accountability ratings and prestige) and material (e.g., appropriate, high-quality curriculum). They may even legitimate the discontinuation of programs that have the potential to be effective for students, such as bilingual or dual language programs.

In this way, this research was fundamentally dedicated to promoting social justice causes

as it sought to explore the ways that institutional practices in the form of school accountability

frameworks result in disparate outcomes for historically marginalized populations while

benefiting historically dominant populations. By highlighting the systemic nature of institutional

bias in this context, this research project offers empirical evidence that counters deficit narratives

about historically marginalized communities and points to policy reforms that account for and

thus overcome such biases.

Additionally, this project looked at all available data regarding student demographics,

including racial designations, class designations (as approximated by Free and Reduced Lunch

status), language designations (as measured by “English Learner” status and WIDA scores in

English proficiency and Spanish-language status), and ability designations (as measured by

Special Education status), as well as intersectional identities, such as the percentage of English

Learners who receive Special Education services.

Doing so grounded this project in an intersectional lens in which student identities are seen not as discrete but as interwoven constellations of social locations that combine to impact educational outcomes. Such a focus allowed this research to produce more robust descriptions of patterns of institutional bias against raced, classed, and linguistically- and ability-marked students. It also embodied a commitment to disrupting essentializing portrayals of historically marginalized populations in research. Both of these goals, and the intentional research design and data collection procedures they inspired, are derived from scholarship on intersectionality generally and on QuantCrit specifically.

Finally, this project drew on CRT and QuantCrit regarding the need for scholars to take

explicitly political, antiracist stances during all stages of the research process, including during

the analysis and interpretation phases. For this reason, during analysis and interpretation this

project only looked for and discussed relationships between historically marginalized identity

categories and disparate academic outcomes in terms of racism and racialization rather than

racial causation. This commitment was extended to other social processes of marginalization

such as classing and linguistic discrimination as warranted by the data. At no time did this

dissertation entertain the possibility that students’ racial, class, language, or ability statuses

“cause” disparate accountability outcomes. Rather, any relationship found between those statuses

and accountability outcomes was investigated as a reflection of institutional biases against such

populations that relate to the lack of construct validity in accountability frameworks, inadequate

supports for schools that serve large numbers of such identified students, or both.

Methods

Data Overview and Study Parameters

This study sought to produce antiracist, social justice scholarship and to acknowledge the complexity of student social identities, school contexts, and Denver Public Schools’ historical struggle to equitably serve historically marginalized students. To that end, it used a transformative research design (Teddlie & Tashakkori, 2009) based in QuantCrit principles (Gillborn, Warmington, & Demack, 2018). Such quantitative data analysis allows for a better understanding of the ways the accountability framework used by DPS inadvertently measures non-academic variables instead of student learning. These variables include student demographics, English Learner (EL) characteristics and services, and school contexts. Doing so allowed this project to highlight the ways that the School Performance Framework (SPF) measures variables extrinsic to the accountability framework, policy, and purposes. The SPF thus reflects institutional biases that reproduce inequality in education rather than student strengths and needs.

Research Questions

To investigate the relationship between accountability outcomes and student demographics, EL characteristics and services, and school contexts, this study addressed the following research questions:

1. What are the student demographics, EL characteristics and services, and school contexts

per SPF rating bracket?

2. At what rate do schools remain in, enter into, or exit the most extreme SPF rating statuses (Intervention and Blue), and what are the student demographics, EL characteristics and services, and school contexts in them?

3. What are the student demographics, EL characteristics and services, and school contexts in charter schools versus district-run schools?

4. Do student demographics predict SPF scores?

5. Do student demographics predict SPF outcomes?
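Research question 2 asks how often schools remain in, enter, or exit the two extreme SPF ratings. One way to tabulate this is to compare each school's rating in one year with its rating in the next. The sketch below makes several assumptions. The school identifiers and ratings are hypothetical, the intermediate labels ("Green", "Yellow") are placeholders, and the counting rule is illustrative rather than the exact procedure used in this study.

```python
from collections import Counter

EXTREMES = ("Intervention", "Blue")


def extreme_transitions(ratings_by_year, year_from, year_to):
    """Count remain/enter/exit events for each extreme SPF rating between two years.

    ratings_by_year maps an academic year to {school_id: rating}. Only schools
    rated in both years are compared.
    """
    before = ratings_by_year[year_from]
    after = ratings_by_year[year_to]
    tally = Counter()
    for school in before.keys() & after.keys():
        for status in EXTREMES:
            was, now = before[school] == status, after[school] == status
            if was and now:
                tally[(status, "remained")] += 1
            elif now:
                tally[(status, "entered")] += 1
            elif was:
                tally[(status, "exited")] += 1
    return tally


# Hypothetical ratings for illustration only
ratings = {
    "2016-2017": {"A": "Intervention", "B": "Blue", "C": "Yellow", "D": "Green"},
    "2017-2018": {"A": "Intervention", "B": "Green", "C": "Intervention", "D": "Blue"},
}
transitions = extreme_transitions(ratings, "2016-2017", "2017-2018")
print(transitions)
```

Dividing each "remained" count by the number of schools that held that status in the earlier year would turn these counts into retention rates.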

Timeframe

This study drew on data from the three most recent academic years in which the School Performance Framework (SPF) was consistently implemented in Denver Public Schools: the 2016-2017, 2017-2018, and 2018-2019 academic years (AYs). After the 2018-2019 AY, COVID-related disruptions made many of the metrics used in the SPF unreliable, and no SPF scores were issued for the 2019-2020 AY (Denver Public Schools, n.d.-d). Before the 2016-2017 academic year, the district implemented several changes to how accountability scores were calculated that made comparison across years problematic. These changes included (a) the addition of the new Equity Indicator, (b) a switch from the Partnership for Assessment of Readiness for College and Careers (PARCC) standardized tests to the Colorado Measures of Academic Success (CMAS) standardized tests to calculate SPF scores, and (c) a lowering of the threshold for what constitutes adequate performance on some measures (Asmar, 2016). Scoring changed in the prior years, and education was globally disrupted in the subsequent years. The span from 2016-2017 through 2018-2019 therefore represents the most recent years in which accountability policy was consistently applied in the district.

In this study, each academic year is represented by its own dataset. The descriptive statistics show trends for each individual academic year as well as the averages derived when the three years of data are aggregated. The regression models use the three-year aggregate and include dichotomous variables to control for year.
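The sketch below shows one way to add dichotomous year variables to the pooled three-year data. One year serves as the reference category, and each remaining year gets a 0/1 indicator. This captures year-to-year shifts without creating perfect collinearity with the regression intercept. The chapter does not specify the reference year, so choosing 2016-2017 is an assumption. The record fields (`school`, `academic_year`, `pct_frl`) are hypothetical.

```python
YEARS = ("2016-2017", "2017-2018", "2018-2019")
REFERENCE_YEAR = YEARS[0]  # assumed reference category; it gets no indicator


def year_dummies(year):
    """Return 0/1 indicators for every non-reference academic year."""
    if year not in YEARS:
        raise ValueError(f"unknown academic year: {year}")
    return {f"year_{y}": int(year == y) for y in YEARS if y != REFERENCE_YEAR}


def pooled_row(school_year_record):
    """Copy one school-year observation and append its year indicators."""
    row = dict(school_year_record)
    row.update(year_dummies(row["academic_year"]))
    return row


# Hypothetical observation in the pooled dataset
example = pooled_row({"school": "Hypothetical ES", "academic_year": "2018-2019", "pct_frl": 0.72})
print(example)
```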

Data Sources

Only publicly available school-level data pertaining to the district were included. Data were drawn from various datasets across three sources: (a) the Colorado Department of Education (CDE) publicly available online data of school-level staff, discipline, and student statistics; (b) DPS annual SPF Reports; and (c) Consent Decree reports of “English Learner” services and outcomes. The Consent Decree reports are required by the Modified Consent Decree (2012), which addresses the services, programs, and assessments required for students identified as English Learners (Consent Decree of the U.S. District Court, 2012). Data represent both district-run schools and charters. In total, nine datasets were used to compile each academic year’s final dataset: four datasets came from the CDE, one dataset came from the SPF Report, and four datasets came from the reports mandated by the Consent Decree. A summary of data sources can be found in Appendix Table 1 (Appendix A).

Inclusion Criteria

Because this dissertation examined the relationship between accountability outcomes in the form of SPF scores and student demographics, EL characteristics, and school contexts, the principal inclusion criterion was the availability of SPF accountability scores. Because the datasets came from diverse reporting sources, the combined dataset for each academic year was consistently incomplete once all nine datasets were merged. For example, a school might have had data in the SPF Report, the CDE datasets, and most of the Consent Decree datasets, yet have no entry in the dataset of English Learner participation rates in Gifted and Talented programs. In this case, the school was still included, as the secondary inclusion criterion was the availability of data in at least one dataset beyond the SPF Report. Fortunately, all schools with SPF scores met this secondary criterion.
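The two inclusion criteria can be expressed as a simple filter. The following is a minimal Python sketch, not the procedure actually used (the screening was done while combining the datasets in Excel and Stata); it assumes each school's records are held in a dictionary keyed by hypothetical dataset names such as `spf_report`, and the example values are invented for illustration.

```python
def meets_inclusion_criteria(school_records):
    """Return True if a school satisfies both inclusion criteria.

    school_records maps a (hypothetical) dataset name to that dataset's
    row for the school, or to None when the school has no entry.
    """
    # Primary criterion: an SPF accountability score must be available.
    spf_row = school_records.get("spf_report")
    if spf_row is None or spf_row.get("spf_points_pct") is None:
        return False
    # Secondary criterion: data in at least one dataset besides the SPF Report.
    return any(
        row is not None
        for name, row in school_records.items()
        if name != "spf_report"
    )


# Invented example: SPF score and CDE data present, no GT participation entry.
example_school = {
    "spf_report": {"spf_points_pct": 61.5},
    "cde_enrollment": {"total_enrollment": 412},
    "consent_decree_gt": None,
}
print(meets_inclusion_criteria(example_school))  # True
```

A missing entry in any single Consent Decree dataset therefore does not exclude a school, which matches the example in the paragraph above.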

Exclusion Criteria

As only schools with reported SPF scores were included, any school lacking these data was omitted. This resulted in the omission of 30 schools in the 2016-2017 AY, 17 schools in the 2017-2018 AY, and 24 schools in the 2018-2019 AY. In addition, some datasets reported multilevel schools (e.g., those serving grades K-8) as a single entity, while others disaggregated the levels and reported them separately. For example, the 2018-2019 SPF Report has an entry for “Bruce Randolph School,” but the 9VA2 Consent Decree report for the same year lists “Bruce Randolph HS” and “Bruce Randolph MS.” Because there was no way to determine which outcomes of the two or more disaggregated entries corresponded to which outcomes in the single aggregated entry, multilevel schools with inconsistent reporting were omitted. This resulted in the omission of 38 schools in the 2016-2017 AY, 13 schools in the 2017-2018 AY, and 14 schools in the 2018-2019 AY.
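As an illustration only, the check for inconsistently reported multilevel schools can be sketched in Python. The level labels stripped by `LEVEL_SUFFIX` and all function names are hypothetical; the actual reports may label levels differently, and a school whose name legitimately ends in one of these labels would need manual review.

```python
import re

# Hypothetical level labels that may end a school name in the reports.
LEVEL_SUFFIX = re.compile(r"\s+(HS|MS|ES|SCHOOL)$")


def base_name(name):
    """Strip one trailing level label: 'Bruce Randolph HS' -> 'BRUCE RANDOLPH'."""
    return LEVEL_SUFFIX.sub("", name.strip().upper())


def group_by_base(names):
    """Map each base name to the set of full (upper-cased) entries sharing it."""
    groups = {}
    for name in names:
        groups.setdefault(base_name(name), set()).add(name.strip().upper())
    return groups


def inconsistent_multilevel(names_a, names_b):
    """Return base names reported as a different number of entries in the
    two datasets (e.g., one combined entry vs. separate HS and MS entries)."""
    groups_a, groups_b = group_by_base(names_a), group_by_base(names_b)
    return {
        base
        for base in groups_a.keys() & groups_b.keys()
        if len(groups_a[base]) != len(groups_b[base])
    }


spf_names = ["Bruce Randolph School", "Lincoln HS"]
consent_decree_names = ["Bruce Randolph HS", "Bruce Randolph MS", "Lincoln HS"]
print(inconsistent_multilevel(spf_names, consent_decree_names))  # {'BRUCE RANDOLPH'}
```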

Variable Type and Selection

These datasets were chosen because they provided the variables necessary to address the

research questions of the study. The variables used can be categorized broadly into the following

themes: (a) student demographics, (b) English Learner characteristics and services, (c) school

contexts, and (d) accountability outcomes.

Student Demographics

Student demographic variables were defined by the respective reporting agency (i.e., the

CDE or DPS) and included counts of students classified as Students of Color (SoC), English

Learners (EL), Special Education (SPED), Gifted and Talented (GT), and Free and Reduced

Lunch (FRL). These variables were included for several reasons. First, research has shown that students in the SoC, EL, FRL, and SPED classifications have historically been denied the equitable services, opportunities, and resources necessary for school success (Darling-Hammond, 2004; Martin, 2012; Wu, 2013), and disparate opportunities to participate in GT programming are an additional hallmark of this inequitable allocation of resources (Card & Giuliano, 2016), which is the rationale for including the GT metric in the study. Second, because this study pays special attention to ELs, the SoC, FRL, and SPED variables are relevant, as ELs frequently occupy one or more of these additional classifications (Blanchett, Klingner & Harry, 2009; Cramer, Little & McHatton, 2018). Finally,

the outcomes of students of color, students receiving FRL services, ELs, and students receiving

SPED services are specific indicators used by DPS to calculate an important part of the SPF

score called the Equity Indicator (Asmar, 2016b). Because of the centrality of these student

demographic classifications in research on equitable education, their importance to ELs

specifically, and their role in calculating SPF scores, all of these variables were included in the

study as “Student Demographics.” The raw counts of these student demographics were

transformed into percentages of each student demographic type out of the total student

population. In this study, all four of these classifications of historically marginalized students

(SoC, ELs, FRL, SPED) are used as predictors of SPF scores in multiple regressions in addition

to reporting of descriptive statistics.

EL Characteristics and Services

This dissertation also drew on the expansive accountability reporting mandated by the

Modified Consent Decree of 2012 (Consent Decree of the U.S. District Court, 2012) regarding

the characteristics, needs, and outcomes of ELs. Since ELs are typically overrepresented in

SPED programming (Sullivan, 2011) and underrepresented in GT programming (US

Commission on Civil Rights, 2018), this study included metrics describing EL participation rates

in these programs. The district reported the percent of ELs classified as GT and the percent of

SPED students that were classified as ELs in each school. In addition, this study reports on the language status of ELs to specify when their bilingualism includes Spanish, as raciolinguistic ideologies that index racial status by language practice have led to English-Spanish bilingualism being especially denigrated in the US when embodied by heritage speakers of color (Hill, 2009; Rosa & Flores, 2017).

This study also includes variables describing Parent Preference 1, 2, and 3 (PPF1, PPF2,

PPF3), which indicate what kinds of language supports parents desire for their EL students, with

PPF1 indicating a preference for native language instruction designed for emergent bilingual

students, PPF2 indicating a preference for English-only instruction designed for emergent

bilingual students, and PPF3 indicating a desire to decline all services offered specifically to

ELs. These data are paired with EL participation rates in the program settings the district calls Mainstream (reflecting PPF3), English Language Acquisition-English (reflecting PPF2), English Language Acquisition-Spanish (reflecting PPF1), and Dual Language programs. Access to

language instruction settings is particularly important to ELs, who are often denied opportunities

to participate in challenging and appropriate curriculum (Callahan & Hopkins, 2017).

Additionally, data describing the rates at which ELs were Redesignated from, Exited

from, and Re-Entered into EL status were also included, as these rates can reflect policy and instruction that affect both ELs’ opportunities to access challenging curriculum and their

achievement outcomes (Brooks, 2020; Kim, 2017). Finally, EL data describing WIDA ACCESS

scores – which measure English-language proficiency across the domains of reading, writing,

speaking, and listening – were also included to indicate the percentage of ELs that were

Beginning, Intermediate, and Advanced Level in their development of English, as prior research

has shown these distinctions to be statistically significant predictors of SPF outcomes (Strong &

Escamilla, 2020). In this study, all of these variables were calculated as percentages of each school’s total EL population and used in descriptive statistics to illustrate differences in EL characteristics and services across schools.

School Contexts

In order to describe the variation across school settings as well as to provide controls for

multiple regression models, this study also included variables regarding school characteristics.

Because the race and socioeconomic status of students have been found to predict rates of disciplinary referrals (Bryan, Day-Vines, Griffin & Moore-Thomas, 2012; Skiba, Chung, Trachok, Baker, Sheya & Hughes, 2014), one such characteristic describes disciplinary actions and incidents, to represent whether students are learning in particularly discipline-heavy environments, which might be related to bias against students of color and students in poverty. All disciplinary action and incident counts were converted into rates of actions and incidents per 100 students. The discipline action counts were also used to calculate a new variable describing the rate of disciplinary actions that resulted in instructional loss per 100 students. Some types of discipline, such as out-of-school suspensions, result in considerable loss of access to teachers and instruction, so that the disparate rates of discipline that students of color confront amount to the loss of months or even more than a year of instructional time (Losen & Martinez, 2020). This variable was created to capture an additional potential impediment to learning outcomes that could influence SPF ratings, one that analyzing each type of disciplinary action count in isolation would obscure. It was made by combining the counts of expulsions, out-of-school suspensions, and classroom removals and then calculating the rate of those aggregated counts per 100 students. All discipline variables were included in descriptive statistics, and the variable describing loss of instructional time was also included as a control in the multiple regressions.
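The instructional-loss variable reduces to summing three counts and scaling by enrollment. A minimal Python sketch follows; the function names are hypothetical and the counts for the example school are invented.

```python
def per_100(count, enrollment):
    """Convert a raw count into a rate per 100 enrolled students."""
    if not enrollment:
        return None  # rate is undefined when enrollment is missing or zero
    return 100 * count / enrollment


def instructional_loss_rate(expulsions, out_of_school_suspensions,
                            classroom_removals, enrollment):
    """Rate per 100 students of disciplinary actions that remove students
    from instruction (expulsion, out-of-school suspension, classroom removal)."""
    total = expulsions + out_of_school_suspensions + classroom_removals
    return per_100(total, enrollment)


# Invented school: 2 expulsions, 30 suspensions, 18 removals, 500 students.
print(instructional_loss_rate(2, 30, 18, 500))  # 10.0
```

The same `per_100` scaling applies to the separate action and incident rates described above.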

Another variable included in the descriptive statistics describes total school enrollment.

This was included both because of previous research that has found that enrollment size can

impact relative disadvantages of students of color (Fitzgerald, Gordon, Canty, Stitt,

Onwuegbuzie & Frels, 2013), and because the district is currently considering closing schools

with small enrollment (Asmar, 2021), making it especially pertinent to immediate district

interests and considerations when defining school success. Additionally, this study uses variables

describing student-teacher ratios and the percentage of teachers that are considered “Fully

Qualified” to work with culturally and linguistically diverse students according to the district.

The teacher qualification metric was selected because students in poverty, students of color, and

emergent bilingual students are less likely to work with highly qualified teachers (Darling-

Hammond, 2004; Goldhaber, Lavery, & Theobald, 2015; Lankford, Loeb & Wyckoff, 2002).

The student-teacher ratio metric was included due to previous research that found that these

ratios are related to student achievement and teacher stress (Alspaugh, 1994; Hojo, 2021; Koc &

Celik, 2015). Finally, this study also included a dichotomous variable describing whether a school was district-run or a charter in order to address Research Question 3. While all of these variables were included in descriptive statistics, only the percent of teachers classified as “fully qualified,” the student-teacher ratio, and the rate of disciplinary actions that result in instructional loss per 100 students were also included in the multiple regressions as controls. The theoretical and data-based decision-making process regarding model construction is discussed in the section describing methods for each research question.

Variables Created

As mentioned, most of these variables were transformed from counts into rates and

percentages in order to standardize occurrences across schools of different sizes, although some

new variables were also created. For example, a new variable was created to represent Simplified

SPF Ratings designations by (a) collapsing the ratings categories (Red and Orange) that result in

district intervention into one category, called “Intervention;” (b) leaving the middle rating

category (Yellow) as a single category, called “On Watch;” and (c) collapsing the two highest

categories (Green and Blue) into one category, called “High Performing.” These categories were created for two reasons. First, the collapsed ratings result in similar outcomes (such as prestige, stigma, or intervention), so the distinctions between them are not meaningful. Second, they made the ordinal logit regressions, which predict ordered categorical outcomes, easier to interpret, as the results show predicted probabilities of broadly distinct accountability outcomes rather than framing ratings with similar consequences as different.
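The recoding is a many-to-one mapping. A minimal Python sketch follows; the numeric codes used for the ordinal outcome (1 to 3) are an illustrative assumption rather than the codes stored in the study's dataset.

```python
# Five SPF ratings collapsed into three Simplified SPF designations.
SIMPLIFIED_SPF = {
    "Red": "Intervention",
    "Orange": "Intervention",
    "Yellow": "On Watch",
    "Green": "High Performing",
    "Blue": "High Performing",
}

# Assumed ordered codes for an ordinal outcome (higher = more favorable).
ORDINAL_CODE = {"Intervention": 1, "On Watch": 2, "High Performing": 3}


def simplify(rating):
    """Collapse a five-level SPF rating into its Simplified SPF designation."""
    return SIMPLIFIED_SPF[rating]


ratings = ["Red", "Yellow", "Blue", "Orange", "Green"]
print([simplify(r) for r in ratings])
# ['Intervention', 'On Watch', 'High Performing', 'Intervention', 'High Performing']
print([ORDINAL_CODE[simplify(r)] for r in ratings])  # [1, 2, 3, 1, 3]
```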

In a similar way, a set of variables was created to describe SPF trends over time. To do

so, I looked at schools that remained in, entered into, or exited from the SPF ratings categories at

the most extreme ends of high and low performance. At one extreme, schools were coded to

describe if they remained in, entered into, or exited Intervention Status (as defined above), and at

the other they were coded to describe whether they remained in, entered into, or exited Blue Status, the most exclusive and thus most prestigious designation at the other pole of accountability outcomes. The rationale and use of these categories are discussed below in the methods for Research Question 2.

Research Process

Data Collection and Cleaning

Each academic year’s nine datasets (27 in total) were downloaded from the three data

sources as Excel files. If rows did not contain individual school cases or contained nested data, the Pivot tool of Excel was used to clean the data so that each row described a single school. Values of “0” were inspected to ensure they actually represented a count or percent of 0

and not the absence of data. In the few cases in which values of “0” represented an absence of

data, the value was deleted and a blank space was left in its place.

Because of inconsistency in how school names were reported across the nine datasets, I

used Excel to clean the name text for each school. First, I used the UPPER function to capitalize

all school names. Then, I used the replace tool to ensure all instances of school level descriptions

were consistent, as in some datasets a school could be described as, for example, “Lincoln High

School” and in other datasets as “Lincoln HS.” Finally, I used the TRIM function to remove

additional spaces. This resulted in consistent reporting of school names across the datasets.
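The three Excel steps can be mirrored in Python as follows. This is a sketch under stated assumptions: the replacement rules in `LEVEL_RULES` are hypothetical examples of level descriptions, and Python's `split`/`join` idiom stands in for Excel's TRIM, which likewise removes leading and trailing spaces and collapses runs of internal spaces to one.

```python
import re

# Hypothetical rules mapping spelled-out levels to a common abbreviation.
LEVEL_RULES = [
    (re.compile(r"\bHIGH\s+SCHOOL\b"), "HS"),
    (re.compile(r"\bMIDDLE\s+SCHOOL\b"), "MS"),
    (re.compile(r"\bELEMENTARY\s+SCHOOL\b"), "ES"),
]


def clean_school_name(name):
    """Mirror the Excel steps: UPPER, then find-and-replace, then TRIM."""
    text = name.upper()                       # UPPER
    for pattern, replacement in LEVEL_RULES:  # find and replace level labels
        text = pattern.sub(replacement, text)
    return " ".join(text.split())             # TRIM: strip ends, collapse inner spaces


print(clean_school_name("  Lincoln High School "))  # LINCOLN HS
print(clean_school_name("Lincoln HS"))              # LINCOLN HS
```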

Data Consolidation (Excel)

These cleaning procedures allowed me to use the Consolidate tool of Excel to combine

all nine datasets for each academic year, using the school name as the identifying metric.

Although a numeric code would have been a preferable case identifier, DPS uses school-level numeric identifiers that differ from those used by the Colorado Department of Education, for reasons that were not available to me. The result is two sets of irreconcilable identifiers that only a name-by-name check could match, a process that would have introduced unacceptable degrees of human error.
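Conceptually, consolidating on school name is a full outer join keyed on the cleaned name. The Python sketch below is an analogy to, not a reproduction of, Excel's Consolidate tool; it assumes names have already been cleaned and that variable names are unique across datasets (a repeated variable name would be overwritten by the later dataset). The school names and values are invented.

```python
def consolidate(datasets):
    """Combine several datasets into one record per school.

    datasets: list of dicts, each mapping a cleaned school name to a dict of
    that dataset's variables. A school missing from a dataset simply lacks
    those variables, the equivalent of a blank cell.
    """
    combined = {}
    for data in datasets:
        for school, variables in data.items():
            combined.setdefault(school, {}).update(variables)
    return combined


spf = {"LINCOLN HS": {"spf_points_pct": 48.2}}
cde = {"LINCOLN HS": {"enrollment": 1210}, "SCHOOL B": {"enrollment": 305}}
merged = consolidate([spf, cde])
print(merged["LINCOLN HS"])  # {'spf_points_pct': 48.2, 'enrollment': 1210}
print(merged["SCHOOL B"])    # {'enrollment': 305}
```

Because the join key is the name string, any residual spelling difference produces two separate records, which is why the name cleaning above matters.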

Data Merging (Stata)

Once there was a single combined dataset for each academic year, I imported the three datasets into Stata, holding them in memory together and creating dichotomous variables to indicate each distinct academic year. This allowed me to calculate averages, run regressions, and conduct analyses for the three-year aggregate as well as produce analyses and output for individual years.
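Pooling the yearly datasets with year indicators can be sketched in Python as below. The indicator names (e.g., `ay_2017_2018`) are hypothetical. In the regressions, the 2016-2017 indicator is the omitted reference category, so only the other two indicators would enter a model.

```python
def stack_years(yearly):
    """Pool per-year datasets into one list of records with 0/1 year indicators.

    yearly maps an academic-year label such as "2016-2017" to a list of
    school records (dicts) for that year.
    """
    years = sorted(yearly)
    pooled = []
    for year in years:
        for record in yearly[year]:
            row = dict(record, academic_year=year)
            for other in years:
                # Hypothetical indicator name, e.g. "ay_2017_2018".
                row["ay_" + other.replace("-", "_")] = int(other == year)
            pooled.append(row)
    return pooled


yearly = {
    "2016-2017": [{"school": "LINCOLN HS"}],
    "2017-2018": [{"school": "LINCOLN HS"}],
    "2018-2019": [{"school": "LINCOLN HS"}],
}
pooled = stack_years(yearly)
print(pooled[1])
# {'school': 'LINCOLN HS', 'academic_year': '2017-2018', 'ay_2016_2017': 0, 'ay_2017_2018': 1, 'ay_2018_2019': 0}
```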

Stata Functions

I then used Stata to create variables to transform data from counts into percentages and

rates of student demographics and service types, discipline per 100 students, and fully qualified

teachers. Some student demographic variables, like percentages of Spanish-speaking ELs and

ELs receiving Special Education services, reflect the percentages of these students out of the

total number of their respective subpopulations (i.e., ELs) rather than the total number of

students enrolled.

Stata was also used to create the new variables, such as the Simplified SPF Ratings and SPF Trends. Some of these new variables required several calculations, such as the variables

describing the percentages of ELs according to level of English proficiency, which were created

by combining the counts of ACCESS scores of: (a) 1 and 2 to create the count of Beginning

Level ELs, (b) 3 and 4 to create the count of Intermediate Level ELs, and (c) 5 and 6 to create

the count of Advanced Level ELs. These counts were transformed into percentages to represent

the rates of ELs in each level of English proficiency out of the total number of ELs in a school.
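The banding can be sketched as follows, assuming counts are reported by whole ACCESS proficiency level and that the denominator is the school's total EL enrollment, passed in separately. Function and argument names are hypothetical, and the counts are invented.

```python
def proficiency_percentages(level_counts, total_els):
    """Group ACCESS proficiency-level counts into three bands and express
    each band as a percentage of the school's total EL population.

    level_counts maps an ACCESS level (1-6) to the number of ELs at that level.
    """
    if not total_els:
        return None  # percentages are undefined for a school with no ELs
    bands = {"Beginning": (1, 2), "Intermediate": (3, 4), "Advanced": (5, 6)}
    return {
        band: 100 * sum(level_counts.get(level, 0) for level in levels) / total_els
        for band, levels in bands.items()
    }


# Invented counts for a school with 100 ELs.
counts = {1: 10, 2: 15, 3: 30, 4: 25, 5: 12, 6: 8}
print(proficiency_percentages(counts, 100))
# {'Beginning': 25.0, 'Intermediate': 55.0, 'Advanced': 20.0}
```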
Stata was then used to create tables of descriptive statistics to address Research Questions 1, 2, and 3, and to run the multiple regressions required to address Research Questions 4, 5, and 6. Stata was also used to export the data used in all of the tables and in the figures created in R Studio (below), and to create the figures used in Research Questions 4 and 6.

Creation of Figures

R Studio was used to create figures for the descriptive statistics in Research Question 1

and the predicted probabilities resulting from the ordinal logit regressions in Research Question

5.

Methods per Research Question (RQ)

RQ1. What are the student demographics, EL characteristics and services, and school contexts

per SPF rating bracket?

To address this research question, I used Stata to calculate the mean of each variable described above for each of the five SPF ratings brackets (Red, Orange, Yellow, Green, Blue). Results were exported to Excel and reported in a table showing each individual academic year’s means as well as the three-year aggregate means. These data were then used to create figures in R Studio using the ggplot2 package.

RQ2. At what rate do schools remain in, enter into, or exit the most extreme SPF ratings statuses

of Intervention vs. Blue, and what are the student demographics, EL characteristics and services,

and school contexts in them?

To address this research question, schools were coded to describe whether they remained

in, entered into, or exited either Intervention Status or Blue Status. These two statuses were chosen to represent the poles of accountability outcomes as a means of evaluating the effectiveness of the SPF accountability framework in discouraging school failure and promoting school

success (Murray & Howe, 2017). At one pole is “Intervention Status,” representing the schools

receiving either Red or Orange SPF ratings, which trigger district intervention (Asmar, 2018;

Denver Public Schools, 2018). As such, schools in the Intervention Status category represent

primary targets of the accountability framework; namely, if the accountability framework is

effective in promoting school success, schools should receive Intervention Status only

temporarily as the accountability consequences of low ratings promote higher levels of success.

At the other extreme are the schools that earned the highest rating possible, or “Blue Status,” and

thus were used to represent the end toward which the accountability framework should, in

theory, move schools and at which schools should aspire to remain. Together, Intervention Status

and Blue Status not only represent the extremes of the SPF accountability system but also the

targets of that system.

To conduct this analysis, I created a variable “SPF Trends” and gave a non-ordinal

numeric code to schools that either (a) remained in Intervention Status, (b) remained in Blue

Status, (c) entered into Intervention Status, (d) entered into Blue Status, (e) exited Intervention

Status, or (f) exited Blue Status. Schools were coded as “remaining” in either status if they began

and ended the study timeframe in that same respective status. They were coded as “exiting” one

of those statuses if they began the study in that status and ended with any other SPF rating. They

were coded as “entering” one of those statuses if they began the study in a different SPF rating

and ended the study in either Intervention or Blue Status.

Schools that did not meet any of these criteria were coded as 0. Only schools with SPF data for all three years of the study were eligible to receive non-zero codes, as the research question seeks to identify trends over time; missing even a single year’s data would reduce a trend to a single year-over-year change, which I judged insufficient to distinguish a trend from potential noise.
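The coding rules above can be written as a small function. In this Python sketch the numeric codes 1 to 6 are hypothetical nominal labels (they carry no order), only the first and last study years are compared, as the rules describe, and the rules are checked in the order they are listed. The text does not say how a school that moved directly between Intervention and Blue was coded; here such a school is coded as entering its final status, which is an assumption.

```python
INTERVENTION_RATINGS = {"Red", "Orange"}

# Hypothetical nominal codes; the numbers are labels, not an ordering.
TREND_CODES = {
    "remained_intervention": 1,
    "remained_blue": 2,
    "entered_intervention": 3,
    "entered_blue": 4,
    "exited_intervention": 5,
    "exited_blue": 6,
}


def status(rating):
    """Map an SPF rating to 'intervention', 'blue', or 'other'."""
    if rating in INTERVENTION_RATINGS:
        return "intervention"
    return "blue" if rating == "Blue" else "other"


def spf_trend(ratings):
    """Code a school's SPF Trend from its ratings in the three study years.

    Returns 0 when any year is missing or when no criterion applies.
    """
    if len(ratings) != 3 or any(r is None for r in ratings):
        return 0
    start, end = status(ratings[0]), status(ratings[-1])
    if start == end == "intervention":
        return TREND_CODES["remained_intervention"]
    if start == end == "blue":
        return TREND_CODES["remained_blue"]
    # From here on, start and end statuses differ (or both are "other").
    if end == "intervention":
        return TREND_CODES["entered_intervention"]
    if end == "blue":
        return TREND_CODES["entered_blue"]
    if start == "intervention":
        return TREND_CODES["exited_intervention"]
    if start == "blue":
        return TREND_CODES["exited_blue"]
    return 0


print(spf_trend(["Red", "Yellow", "Orange"]))  # 1
print(spf_trend(["Yellow", "Green", "Blue"]))  # 4
print(spf_trend(["Blue", "Blue", None]))       # 0
```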

I then used Stata to create and export descriptive statistics of each SPF Trend category for

each individual year and the three-year aggregate means. I also used Stata to create quartiles of the variables, which I used to run crosstabs of the SPF Trend categories by quartile. In doing so, my aim was to triangulate the findings by showing that differences in student demographics, EL characteristics and services, and school contexts did not merely represent potentially insignificant variation of a few percentage points but reflected schools at the extremes of the distribution of the study variables.
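A minimal Python sketch of quartile assignment and a crosstab follows. It uses `statistics.quantiles` (Python 3.8 or later) for the cut points; Stata's quartile commands may place values that fall on a boundary differently, so the assignments here are illustrative only, and the data are invented.

```python
from bisect import bisect_left
from collections import Counter
from statistics import quantiles


def quartile_labels(values):
    """Assign each value to quartile 1-4 using sample quartile cut points."""
    cuts = quantiles(values, n=4)  # three cut points ("exclusive" method by default)
    return [bisect_left(cuts, v) + 1 for v in values]


def crosstab(row_labels, col_labels):
    """Count schools in each (SPF Trend category, quartile) cell."""
    return Counter(zip(row_labels, col_labels))


# Invented data: percent FRL enrollment and SPF Trend category for eight schools.
pct_frl = [12, 35, 48, 60, 71, 83, 90, 95]
trends = ["Remained Blue", "Remained Blue", "None", "None",
          "None", "None", "Remained Intervention", "Remained Intervention"]
quartiles = quartile_labels(pct_frl)
print(quartiles)  # [1, 1, 2, 2, 3, 3, 4, 4]
print(crosstab(trends, quartiles)[("Remained Intervention", 4)])  # 2
```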

RQ3. What are the student demographics, EL characteristics and services, and school contexts

per charters and district-run schools?

Similar methods to those used to answer Research Questions 1 and 2 were also employed

to address Research Question 3, as all three of these research questions resulted in the creation of

descriptive statistics of the means of the study variables for each individual academic year as

well as the means for the three-year aggregate. For this research question, I used Stata to create and export the means of all the study variables for two categories, district-run schools and charters, for each individual academic year as well as for the three-year aggregate.

RQ4. Do student demographics predict SPF scores?

To address Research Question 4, Stata was used to run OLS multiple regressions to test

whether student demographics predicted the percent of SPF points earned. Individual student

demographic predictors were the percent of the student population classified as (a) Students of

Color, (b) English Learner, (c) Special Education, or (d) Free and Reduced Lunch.
These regressions held constant: (a) the percent of teachers that are classified as Fully

Qualified, (b) the student-teacher ratio, (c) the number of disciplinary actions that result in

instructional loss per 100 students, and (d) dichotomous variables for the academic years 2017-

2018 and 2018-2019, with the variable for 2016-2017 being omitted as the reference. Although

there were other school context variables present in the study that could have served as alternative or additional controls, the decision to include or omit these variables as controls in the multiple regression models was based both on the data and on theory.

I decided not to include rates of disciplinary actions and incidents as controls because the disciplinary outcomes that could most directly affect learning were already captured by the variable of disciplinary actions resulting in instructional loss. In addition, this latter variable was highly correlated with the rates of disciplinary actions and incidents (r = 0.77 and r = 0.75, respectively; Table 1), making them problematic additional controls. Similarly, the categorical variable indicating whether a school was a charter or district-run did not have a statistically significant correlation with the outcome of interest, the percent of SPF points earned (Table 1), and the existing research literature did not provide a sufficient rationale to justify its inclusion despite this lack of correlation. Finally, the enrollment variable was not included, both because of its small correlation coefficient (r = 0.10), which implies a lack of practical significance despite its statistical significance, and because of the limited extant research literature regarding its mediating role in accountability outcomes that could justify its inclusion on theoretical grounds. As seen in Table 1, the controls chosen in these regressions all had statistically significant correlations with the percent of SPF points earned, with coefficients larger in magnitude than that of enrollment (|r| = 0.16 to 0.25). This data-based rationale for inclusion complemented the theoretical and research-based rationales regarding how these controls could affect achievement outcomes independently of, yet in relation to, student demographics, thus representing alternate factors, apart from the student demographic predictors, that could influence the learning outcomes the accountability policies aim to measure.

Table 1.
Pearson Correlations of Potential Control Variables and SPF Percent Points Earned

Variable                                              (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)
(1) SPF Points Earned %                               1.00
(2) Discipline - Instructional Loss per 100 Students  -0.25*  1.00
(3) Fully Qualified Teachers %                        0.18*   -0.25*  1.00
(4) Student-Teacher Ratio                             0.16*   -0.16*  0.17*   1.00
(5) School Type (charter or district-run)             -0.08   0.19*   -0.58*  -0.21*  1.00
(6) Enrollment                                        0.10*   -0.09*  0.15*   0.33*   -0.24*  1.00
(7) Discipline - Incidents per 100 Students           -0.21*  0.75*   -0.31*  -0.18*  0.18*   -0.05   1.00
(8) Discipline - Actions per 100 Students             -0.23*  0.77*   -0.29*  -0.18*  0.18*   -0.03   0.96*   1.00

* Indicates p-value ≤ 0.05


These regressions used a series of student demographic variables to predict the percentage of SPF points earned while holding these control variables constant. First, I ran two models for each individual student demographic predictor with the controls: one in which the predictor entered as a linear term and one in which it entered with cubed terms. Cubed terms were chosen because of the apparent nonlinear relationship between each student demographic predictor and the outcome, the percent of SPF points earned, as evident in the scatterplots of each predictor against the percent of SPF points earned (Figure 2).
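Written as an equation, and assuming the cubed-term models retain the lower-order linear and squared terms (the conventional polynomial specification; the text does not state this explicitly), an individual-predictor model for school i in year t would take the form below. The symbols are notation introduced here: D is the demographic predictor (percent SoC, EL, SPED, or FRL), FQT the percent of fully qualified teachers, STR the student-teacher ratio, Loss the rate of instructional-loss disciplinary actions per 100 students, and the two AY terms the year indicators, with 2016-2017 as the reference. The linear-term model corresponds to setting the squared and cubed coefficients to zero.

```latex
\mathrm{SPF}_{it} = \beta_0 + \beta_1 D_{it} + \beta_2 D_{it}^{2} + \beta_3 D_{it}^{3}
  + \gamma_1 \mathrm{FQT}_{it} + \gamma_2 \mathrm{STR}_{it} + \gamma_3 \mathrm{Loss}_{it}
  + \delta_1 \mathrm{AY}^{2017\text{-}18}_{t} + \delta_2 \mathrm{AY}^{2018\text{-}19}_{t}
  + \varepsilon_{it}
```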

Figure 2.
Scatterplot Panels of Student Demographics and the Percent of SPF Points Earned
[Four scatterplot panels (All District, AY 2016-2017 through AY 2018-2019), each plotting schools by Percent SPF Points Earned (y-axis) against, respectively, Percent Students of Color Enrollment, Percent Free & Reduced Lunch Student Enrollment, Percent English Learner Student Enrollment, and Percent Special Education Student Enrollment (x-axis).]

Then, I estimated saturated models that included all the student demographic predictors together along with the controls. However, because of high collinearity between the Student of Color and the Free and Reduced Lunch variables (r = 0.95), as shown in the Pearson correlations in Table 2, the two could not be included in a single model. I therefore created two sets of saturated models: one using the Student of Color variable and the other using the Free and Reduced Lunch variable. As with the models of individual student demographic predictors, I first estimated a model using linear terms for all the student demographic predictors and then added squared terms, and cubed terms where they were statistically significant.

Table 2.
Pearson Correlation of Student Demographic Predictors and SPF Percentage Used in Multiple Regressions

Variable                      (1)     (2)     (3)     (4)     (5)
(1) SPF %                     1.00
(2) Student of Color %        -0.37*  1.00
(3) Free & Reduced Lunch %    -0.38*  0.95*   1.00
(4) English Learner %         -0.19*  0.75*   0.77*   1.00
(5) Special Education %       -0.40*  0.28*   0.34*   0.07    1.00

* Indicates p-value ≤ 0.05

Finally, I used each individual-predictor model with cubed terms (rather than the models using linear terms) to create predicted margins, which were then used to create figures in Stata showing how changes in the student demographic predictor relate to changes in the predicted percent of SPF points earned.

RQ5. Do student demographics predict SPF outcomes?

To address Research Question 5, the same student demographic predictor variables and

same controls were used to run ordinal logit regressions with the outcome of Simplified SPF

categories of (a) Intervention Status, (b) On-Watch Status, and (c) High-Performing Status.

The first step toward addressing this question involved the creation of a new series of

Simplified SPF ratings categories. This was done to capture the similarities of accountability

outcomes rather than treating those similarities as artificially distinct. For example, a school is
subject to poor repute and district intervention if it receives either a Red or Orange SPF rating.

The Simplified SPF ratings categories also reflected the ways the district describes similarities

between SPF ratings outcomes. On the DPS website, Blue and Green ratings are described as

representing a similar accountability outcome, as they are “the top ratings,” each indicating that a

school “is generally doing well in the areas of student academic growth, family satisfaction,

equity, and more,” and each representing the points toward which all schools should aspire as

“all schools are working to achieve Green or Blue ratings” (Denver Public Schools, n.d. - b).

Likewise, DPS describes both Red and Orange ratings as indicating the “need of significant

improvement” (Denver Public Schools, n.d. - c) because “the school needs a lot of extra support

to improve,” which initially comes in the form of an Improvement Plan, and, if that is not

successful, then “DPS may also need to make significant changes to the school program or

leadership. If a school receives a Red or Orange rating for several years, then DPS may restart or

close the school” (Denver Public Schools, n.d. - c). For these reasons, I decided that combining

similar SPF ratings outcomes would be the most efficient means of addressing this research

question, as it seeks to explore the relationship between student demographic predictors and

accountability outcomes and the district describes multiple ratings as resulting in similar

outcomes.

In order to capture trends regarding these broad similarities in outcomes, the five SPF

ratings were collapsed into three Simplified SPF designations: (a) all schools that received a Red

or Orange SPF rating were included in the “Intervention” Simplified SPF designation; (b) all

schools that received a Yellow SPF rating were included in the “On Watch” Simplified SPF

designation, using the term for Yellow schools employed by the district; and (c) all schools that

received a Green or a Blue SPF rating were included in the “High Performing” Simplified SPF

73
designation. These Simplified SPF categories were then used to run ordinal logit regressions

using cubed terms for all the student demographic variables used in Research Question 4, with

the exception of the Special Education variable, which only used squared terms as its quadratic

coefficient was no longer statistically significant. Like Research Question 5, these regressions

also held constant: (a) the percent of teachers that are classified as Fully Qualified, (b) the

student-teacher ratio, (c) the number of disciplinary actions that result in instructional loss per

100 students, and (d) dichotomous variables for the academic years 2017-2018 and 2018-2019,

with the variable for 2016-2017 omitted as the reference category. Finally, like those for Research Question 4, these models first explored each student demographic predictor individually. These individual predictor models were used to produce predicted margins in Stata, which were exported to Excel and then used to create figures in RStudio showing how changes in each

student demographic predictor variable related to predicted changes in the probability of

receiving each of the three Simplified SPF designation outcomes.
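The collapsing of ratings and the ordinal logit step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's Stata code: the mapping follows the three Simplified SPF designations in the text, while the model uses the standard proportional-odds (ordered logit) form with a cubic polynomial in a single predictor. The coefficients, cutpoints, and function names are hypothetical, and Stata's predicted margins would additionally average such predictions over the observed sample rather than evaluating a single covariate profile.

```python
import math

# Five SPF ratings collapsed into the three Simplified SPF designations.
SIMPLIFIED_SPF = {
    "Red": "Intervention",
    "Orange": "Intervention",
    "Yellow": "On Watch",
    "Green": "High Performing",
    "Blue": "High Performing",
}
# Lowest to highest: the order the ordinal outcome assumes.
DESIGNATIONS = ["Intervention", "On Watch", "High Performing"]


def simplify(rating: str) -> str:
    """Return the Simplified SPF designation for a five-level SPF rating."""
    return SIMPLIFIED_SPF[rating]


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def ordered_logit_probs(x, coefs, cutpoints):
    """Predicted probability of each designation under a proportional-odds model.

    The linear index is a cubic polynomial in one predictor x:
        xb = b1*x + b2*x**2 + b3*x**3
    and the cumulative probabilities are P(Y <= j) = logistic(cut_j - xb).
    Covariates held constant are assumed (hypothetically) to be absorbed
    into the cutpoints; cutpoints must be in increasing order.
    """
    xb = sum(b * x ** power for power, b in enumerate(coefs, start=1))
    cumulative = [logistic(cut - xb) for cut in cutpoints] + [1.0]
    probs = []
    previous = 0.0
    for cum in cumulative:
        probs.append(cum - previous)  # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        previous = cum
    return probs


# Hypothetical coefficients and cutpoints, chosen only to show the mechanics.
probs = ordered_logit_probs(80.0, coefs=[-0.05, 0.0, 0.0], cutpoints=[-3.0, -1.0])
for label, p in zip(DESIGNATIONS, probs):
    print(f"{label}: {p:.3f}")
```

Sweeping x across its observed range and plotting each of the three probabilities would produce curves of the kind the text describes.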

Analysis and Interpretation

Because this is a QuantCrit study, all these results were analyzed and interpreted with

explicitly antiracist aims. At no point were disparities in accountability outcomes, school

contexts, or student demographics attributed to inherent characteristics of the students

themselves or being “caused” by student demographics. Rather, all disparities were framed as

reflections of accountability policies and institutional practices, pointing to the need for better

policies and practices instead of the need for different kinds of students.

Positionality Statement

One of the reasons I aspired to earn a doctorate was that too often I sat with grieving families and children who had internalized the deficit views that harmful, discriminatory educational policies and practices beget: mothers who blamed themselves for not speaking

English well enough to understand how and why their child was being placed in Special

Education; my niece, who tearfully told me she was “too dumb” to pass kindergarten and needed

to be retained. Critical research has explored the process and consequences of historically

marginalized populations' internalization of racial hierarchies and the deficit ideologies which

maintain them (Kohli, 2014). The interactions I had with families, both as a social worker and a

family member, showed me time and again the truly insidious consequences of education

policies and practices which teach families and children that they are fundamentally inadequate.

This research project was intended for them.

However, as a researcher dedicated to social justice work it is incumbent upon me to

name, reflect on, and interrogate how my positionality as a straight, White, cis, English-dominant, middle-class woman represents limitations of this work (Hartsock, 1997; Milner, 2007;

North, 2008). This study explores the mechanisms by which accountability policy marginalizes

communities that are already marginalized. Yet, because I am both outside these communities as

well as privileged rather than disadvantaged by such policies, I may not be able to fully

understand them as embodied practices. This work may land in highly personal and painful ways

on the historically marginalized communities which this study takes as its focal population, and

even new methodologies such as QuantCrit may not be completely adequate for capturing this

marginalization. If the findings of such research merely serve to inform, remind, or retraumatize

these populations about the various ways that they are socially constructed as inferior or

institutionally marginalized, then this study is arguably as destructive as the phenomena it aspires

to bring to light. Because of this, in conducting this work I have especially endeavored to not

trivialize the experiences of the focal populations of this study, whose identities and experiences

transcend the narrow boundaries that I have employed through demographic categorizations. In

doing this work, I lean heavily on my experiences working and living with Mexican, immigrant,

and undocumented communities over the last sixteen years, experiences which have engendered

a deep place of love and respect in my heart for the families with whom I have been privileged to

work and serve. Such feelings are compounded by the love and respect I have for my family who

come from similar backgrounds. Despite how my positionality limits my ability to fully

understand the issues explored in this dissertation, it is my hope that these limitations are

tempered by the attitude of service and love I bring to the project.

Results

Research Question 1: What are the student demographics, EL characteristics, and school

contexts per School Performance Framework (SPF) rating bracket?

Schools in the lowest-rated brackets consistently served higher proportions of historically marginalized student populations, including students of color (SoC), students receiving Free and Reduced Lunch (FRL), Special Education students (SPED), and English Learner students (EL),

while serving lower percentages of Gifted and Talented students (GT). Disparities between the

lowest and highest SPF ratings (Red and Blue) were the most dramatic. At no point during the

study Timeframe did schools in the Blue SPF ratings category serve average student populations

that were either (a) above the district average for historically marginalized populations, or (b)

below the district average for GT students or Fully Qualified Teachers. The inverse trend was

evident regarding schools in the Red SPF ratings category: With the exception of ELs in the

2016-2017 academic year, throughout the study Timeframe Red schools consistently served

historically marginalized student populations that exceeded district averages while having

percentages of GT students and Fully Qualified Teachers that never reached the district averages.

Below, I provide a brief description of the discrepancies between student demographics in the

lowest-rated (Red) and highest-rated (Blue) schools as evident in the three-year aggregate means,

followed by a summary of general trends between Red and Blue schools for the remaining

variable categories. Table 3 shows the means of each variable per SPF ratings bracket for the

aggregated three academic years of the study, and Appendix Tables 2, 3, and 4 in Appendix B

show the means for each individual academic year. Figure 3 shows the mean percentages of the

student demographics and select EL characteristics and school contexts per SPF ratings bracket

during each year of the study.

Table 3.
Means of Student Demographics, English Learner Characteristics, Outcomes and Programs, and
School Contexts Across SPF Ratings Brackets for Academic Years 2016-2017 through 2018-2019
School Characteristics Red Orange Yellow Green Blue District Average
N 48 55 186 226 47 562
% 8.5% 9.8% 33.1% 40.2% 8.4% 100%
Student Demographics
Students of Color % 87.5 86.2 78.4 77.5 54.7 77.6
Free and Reduced Lunch % 78.5 76.6 71.5 69.0 42.3 69.2
Special Education % 15.4 14.4 12.3 10.9 8.3 11.8
English Learner % 36.7 38.8 32.2 37.2 21.0 34.3
Gifted and Talented % 10.5 11.6 12.4 11.2 18.7 12.4
English Learner Characteristics
Special Education as English Learners % 41.5 44.0 35.2 42.1 30.3 39.0
Spanish-Speaking English Learner % 87.0 84.8 80.3 78.4 59.7 78.8
English Learners in Gifted and Talented % 2.1 2.6 2.7 2.7 10.2 3.1
Beginning Level English Learner % 24.2 25.6 22.9 21.1 15.7 22.0
Intermediate Level English Learner % 72.2 68.9 70.9 70.0 66.3 70.1
Advanced Level English Learner % 3.6 5.4 6.2 8.9 17.9 7.9
English Learner Services
Redesignation % 10.5 9.7 14.8 10.4 18.6 12.5
Exit % 6.7 6.5 6.4 5.7 10.0 6.4
Re-Entry % 0.7 1.5 1.0 0.7 1.4 0.9
Parent Preference 1 % (bilingual) 40.1 42.5 38.6 41.0 27.5 39.1
Parent Preference 2 % (whatever is at school) 50.5 49.3 53.7 52.9 64.7 53.6
Parent Preference 3 % (nothing) 9.2 8.5 7.9 6.1 9.4 7.5
Mainstream % 20.6 35.8 17.2 24.2 34.1 23.5
ELA - English % 69.0 46.6 66.9 55.6 57.9 59.9
ELA - Spanish (ELAS) % 10.4 13.9 15.4 16.3 5.7 14.4
Dual Language (DL) % 0.0 3.7 0.5 3.8 2.2 2.3
Native Language (ELAS+DL) % 10.4 17.6 15.9 20.1 8.0 16.7
School Contexts
Total Enrollment 314.0 423.3 479.2 442.1 439.2 441.4
Student-Teacher Ratio 14.3 14.6 14.6 15.0 16.2 14.9
Fully Qualified Teacher % 67.4 71.4 80.2 79.5 86.1 78.7
Disciplinary Actions per 100 Students 16.5 11.5 11.8 7.6 6.1 10.0
Disciplinary Incidents per 100 Students 23.2 18.4 16.9 10.5 8.3 14.3
Disciplinary Actions Resulting in Instructional Loss per 100 Students 11.8 8.1 7.0 4.2 2.8 6.1
Charter School % 54.2 43.6 20.4 28.8 31.9 29.9

Figure 3.
Mean Percentages of Select Student Demographics, EL Characteristics and Services, and School
Contexts Across SPF Ratings Brackets for Each Year

Student Demographics

Students of Color: Blue schools served average percentages of students of color that

were 22.9 percentage points lower than the district average (m=77.6%) and 32.8 percentage

points lower than Red school averages (m=87.5%), meaning that on average students of color

populations were 60% larger in Red schools compared to Blue schools. On average, Red schools

had 9.9 percentage points more students of color than the district average. Although during the

three years of the study all the non-Blue SPF ratings brackets had average populations of

students of color that were over 75%, the Blue SPF ratings bracket had average populations of

students of color of only 54.7%.

Free and Reduced Lunch (FRL): During the study Timeframe, all non-Blue schools

had average FRL populations between 69% and 78.5%, while Blue schools had average FRL

populations of only 42.3%. Blue schools served average percentages of FRL students that were

27 percentage points lower than the district average (m=69.2%) and 36.3 percentage points lower

than the Red school average (m=78.5%), while Red schools served average FRL populations 9.3

percentage points above the district average. As such, Red schools served average FRL

populations that were 85.9% larger than those in Blue schools.

Special Education (SPED): It may appear that there was greater parity between Blue schools and the district average regarding SPED populations, as Blue schools served an average SPED population that was only 3.5 percentage points below the district average. However, the generally low frequency of these students makes this comparison misleading, since that 3.5 percentage point discrepancy actually represents a district average SPED population that was 38.9% larger than that of Blue schools. Similarly, Red schools served average SPED populations that were 3.6 percentage points or 30.1% larger than the district average. Taken together, Blue schools served average SPED populations 7.1 percentage points smaller than those of Red schools. Since the

district average for SPED students was 11.8%, a difference of 7.1 percentage points is relatively

large, as it translates to Red schools serving average SPED populations (m=15.4%) that were

85.8% larger than those in Blue schools (m=8.3%).
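Because this section moves between percentage-point gaps and relative (percent) differences, a small helper makes the distinction concrete. This is a sketch using the rounded three-year means from Table 3; the function names are illustrative. From these rounded values, the Red-Blue SPED gap is 7.1 percentage points and a relative difference of about 85.5%, slightly different from the 85.8% reported above, which may reflect calculations on unrounded means.

```python
def percentage_point_gap(value: float, reference: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return value - reference


def relative_difference(value: float, reference: float) -> float:
    """Difference expressed as a percent of the reference rate.

    Positive: `value` is that many percent larger than `reference`.
    Negative: `value` is that many percent smaller than `reference`.
    """
    return (value - reference) / reference * 100


# Rounded three-year means from Table 3: (value, reference, description).
comparisons = [
    (15.4, 8.3, "SPED, Red vs. Blue"),
    (87.5, 54.7, "Students of color, Red vs. Blue"),
    (21.0, 34.3, "EL, Blue vs. district average"),
]
for value, reference, label in comparisons:
    gap = percentage_point_gap(value, reference)
    rel = relative_difference(value, reference)
    print(f"{label}: {gap:+.1f} percentage points, {rel:+.1f}% relative")
```

The choice of reference matters: the same gap looks larger relative to a small base, which is why low-frequency groups such as SPED students show large relative differences from modest percentage-point gaps.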

English Learner (EL): Red schools did not serve average EL populations that were

much larger than the district average as they only surpassed the district average by 2.4

percentage points. However, Blue schools served EL populations that were considerably smaller

(m=21.0%) than the district average (m=34.3%), representing a difference of 13.3 percentage

points or 38.8% smaller than the district average. Compared to Red schools, Blue schools served

EL populations that were 15.7 percentage points smaller, meaning that Red schools served EL

populations 74.8% larger than those in Blue schools.

Gifted and Talented (GT): The frequency of students receiving GT services is an especially clear indicator of policy- and practice-based disparities, unless one assumes that talents are not evenly distributed across the population. Rejecting this logic, it is undeniable that the highest- and lowest-rated schools identified gifts and talents in their students at very different rates. On

average, nearly one in five students at Blue schools were designated for GT (m=18.7%), while in

Red schools only one in ten students (m=10.5%) were designated as such. Both of these rates

differ from the district average of 12.4%, with Blue schools’ GT rate being about 50% higher than the district average and Red schools’ GT rate being 15.2% lower than the district

average. Refusing to accept the causal reasoning that there are nearly half as many students with

gifts and talents in Red schools as compared to Blue schools, these discrepancies highlight the

differential treatment, opportunities, and acknowledgement that students in the majority SoC and

FRL Red schools face.

EL Characteristics and Services

Unsurprisingly, the discrepancy between the rate of students receiving GT in Blue and

Red schools is also evident in the rate of EL participation in GT. About one in ten ELs in Blue

schools participated in GT (m=10.2%), while only about one in 50 ELs in Red schools (m=2.1%)

were given the same opportunity. Compared to the district average, Blue schools had 228.9%

higher rates of ELs in GT. Similarly, Red schools saw a greater percentage of their SPED

students co-classified as ELs (m=41.5%), which is 11.2 percentage points higher than the rate in

Blue schools (m=30.3%), whose rate of SPED students that are also ELs is 8.7 percentage points

or 22.3% less than the district average. A raciolinguistic lens which highlights how Spanish-

English bilingualism is especially denigrated in the US might clarify the ideological roots of

these discrepancies, as Red schools on average also had 27.4 percentage points or 45.9% larger

Spanish-speaking EL populations than Blue schools, whose Spanish-speaking EL populations

were 19.2 percentage points smaller than the district average.

Similarly, Red and Blue schools served ELs at markedly different points in their

trajectory toward English development. Blue schools on average served percentages of

Advanced Level ELs (m=17.9%) that were 10.0 percentage points or 126.1% larger than the district average (m=7.9%). Since the percentage of Intermediate Level ELs was similar between

Red and Blue schools (m=72.2% and m=66.3% respectively), the discrepancy of Advanced

Level ELs was reflected in a similar discrepancy in the rate of Beginning Level ELs, as Red

schools on average had 8.4 percentage points or 53.5% larger Beginning Level EL populations

than Blue schools. Since by definition these students are not yet proficient in English, and the primary standardized test used to calculate SPF scores is only offered in Spanish until the fourth grade (Colorado Department of Education, n.d.), it is plausible that some of these students are nonetheless taking standardized tests in English, which

might account for the low accountability scores received by the schools with more of these

students.

Because Blue schools had higher rates of Advanced Level ELs, it is likewise expected that they also redesignated and exited their ELs from English Learner services (m=18.6% and m=10.0%, respectively) at higher rates than the district average (m=12.5% and m=6.4%, respectively), whose rates mirrored those of Red schools. Blue schools also re-entered ELs into English Learner services afterwards (m=1.4%) at double the rate of Red schools (m=0.7%). These different rates might reflect the kinds of language support services available to students at Red and Blue schools: on average, 13.5 percentage points more ELs in Blue schools were in Mainstream settings (m=34.1%) than in Red schools (m=20.6%), despite both having similar rates of parents choosing the preference option (PP3) to decline EL services, the choice that would make mainstream settings the most appropriate placement.

Interestingly, 12.6 percentage points (or 45.8%) more parents wanted their EL students to receive native language services (PP1) in Red schools (m=40.1%) as compared to Blue schools

(m=27.5%), although Red schools had no ELs in Dual Language programs.

School Contexts

These differences in EL characteristics and outcomes were sadly also reflected in the

percent of teachers the district classifies as “Fully Qualified” to teach ELs, where Red schools

had 11.3 percentage points fewer “Fully Qualified” teachers than the district average and Blue

schools had 7.4 percentage points more than the district average. As such, when compared to Blue schools, Red schools had 18.7 percentage points or 21.7% fewer “Fully Qualified” teachers,

despite scholarship that calls for students with the greatest needs to be given more – not less –

access to high quality teachers. Not only did Red schools have considerably smaller “Fully

Qualified” teacher populations despite having greater proportions of historically marginalized

students, ELs, and Beginning Level ELs, students in these schools also experienced disciplinary

regimes unlike those in the Blue schools. During the study Timeframe, on average students in

Red schools received 172.8% more disciplinary actions, 181.6% more disciplinary incidents, and

322.5% more disciplinary actions resulting in instructional loss than their peers in Blue schools.

Like the GT rates, this study rejects the possibility that students in majority SoC and FRL

schools are three times more deserving of discipline that removes them from learning than

students in whiter and wealthier schools. These differences in disciplinary environments possibly

reflect differences in discipline policies in charter schools, as such schools make up 54.2% of

Red schools but only 29.9% of the district.

Summary

During the three years of the study Timeframe, a similar number of schools were

categorized as Red (n=48) and Blue (n=47). Despite representing similar proportions of the

district, schools at each pole of the SPF ratings brackets served very different kinds of students

under very different kinds of school contexts. These findings indicate that the SPF is not only

measuring student learning, but also student demographics and, to a lesser extent, school

contexts. By measuring conditions extrinsic to student learning, the SPF results in accountability

ratings that disadvantage the schools with the most historically marginalized students, Beginning

Level ELs, and Spanish-speaking ELs, while not providing these schools with the supports (such

as “Fully Qualified” teachers and improved mechanisms for identifying GT students) that these

students need and deserve to thrive.

Research Question 2: At what rate did schools remain in, enter into, or exit the most

extreme SPF ratings designations of Intervention and Blue status, and what are the student

demographics, EL characteristics and services, and school contexts in these statuses?

Although the findings from Research Question 1 indicate that schools with the highest

and lowest SPF ratings serve different student populations under different school contexts, a

potential counterpoint would be that these differences merely reflect the “education debt”

(Ladson-Billings, 2006) owed to historically marginalized students, or the historical failure to

provide these students with equitable resources and opportunities that the accountability

movement sought to highlight and rectify. From this perspective, the fact that low-rated schools serve higher proportions of historically marginalized students would demonstrate the need for accountability policies, which seek to improve outcomes for students by identifying and discouraging low performance while identifying and encouraging high performance. Research Question 2 sought to investigate this potential counterpoint by exploring the effectiveness of the SPF in achieving its goals of promoting higher performance (and thus higher SPF ratings) while discouraging low performance (and thus low SPF ratings). If the SPF is effective, then schools should demonstrate trends toward higher performance and thus higher ratings over time, remaining in low-rated statuses only briefly before the consequences of accountability begin to promote improvements.

To address this research question, the two poles of the SPF framework were contrasted: At one

end was Intervention Status, representing the schools that earned either the Red or Orange SPF

ratings, which indicate school failure and result in district intervention; at the other end, Blue

Status represented schools that earned the highest SPF rating possible, and the point toward

which the accountability framework should move schools and at which schools should aspire to

remain.

Table 4 shows descriptive statistics of the counts and rates of SPF Trends, or schools that

remained in, entered into, or exited from these two poles of Intervention Status and Blue Status.

An additional row, called “Began in Status,” is included in this table to indicate how many

schools were in either Intervention or Blue Status at the beginning of the study; the combined

counts of schools that remained in or exited each Status equal the counts of those that “Began in

Status.” These results only show counts and rates of the final academic year of the study, 2018-

2019, as these data reflect the final outcome of trends at the end of the study Timeframe and

using the three-year aggregate would result in repeat counts of schools. In addition, this table

disaggregates counts and rates per district-run and charter schools, one of the school context

variables identified in the research question. Table 6 shows descriptive statistics of the remaining

study variables for schools per each SPF Trend status using the three-year aggregate means in

order to capture the average student demographics, EL characteristics and services, and school

contexts of schools in each of these SPF Trend statuses throughout the study.
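The SPF Trend categories used in Table 4 can be illustrated with a short tally. The classification rule below is a hypothetical reading of the definitions above (remaining means holding the status every year; exiting means beginning in the status but not holding it every year; entering means gaining the status after the first year), and the school names and ratings are invented. The study's own handling of edge cases, such as a school that enters and then leaves a status, is not specified here.

```python
from collections import Counter

# Ratings that place a school in each status (Red/Orange = Intervention).
INTERVENTION = {"Red", "Orange"}
BLUE = {"Blue"}


def spf_trend(ratings, status):
    """Classify one school's yearly ratings, listed in chronological order.

    Hypothetical rule for illustration:
      remained - in the status every year
      exited   - in the status the first year, but not every year
      entered  - not in the status the first year, but in it in a later year
    Returns None if the school was never in the status.
    """
    in_status = [rating in status for rating in ratings]
    if in_status[0]:
        return "remained" if all(in_status) else "exited"
    if any(in_status[1:]):
        return "entered"
    return None


def trend_counts(schools, status):
    """Tally Began in Status, Remained, Exited, and Entered across schools."""
    counts = Counter()
    for ratings in schools.values():
        if ratings[0] in status:
            counts["began"] += 1
        trend = spf_trend(ratings, status)
        if trend is not None:
            counts[trend] += 1
    return counts


# Hypothetical schools with ratings for 2016-2017, 2017-2018, 2018-2019.
schools = {
    "School A": ["Red", "Orange", "Red"],
    "School B": ["Orange", "Yellow", "Green"],
    "School C": ["Yellow", "Yellow", "Red"],
    "School D": ["Blue", "Blue", "Green"],
    "School E": ["Green", "Blue", "Blue"],
}
intervention = trend_counts(schools, INTERVENTION)
blue = trend_counts(schools, BLUE)
# By construction, remained + exited equals began, mirroring Table 4.
assert intervention["remained"] + intervention["exited"] == intervention["began"]
print(dict(intervention), dict(blue))
```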

Table 4.
Descriptive Statistics of Schools that Remained In, Entered Into, and Exited From Intervention Status
and Blue Status per District-Run and Charter Schools as of the Final Year of the Study (2018-2019)
District-Run Schools Charter Schools All District Total
SPF Status Trend Count Percent (a) Count Percent (a) Count Percent (b)
Total 129 69.7% 56 30.3% 185 100%
Intervention Status
Began in Status 9 56.3% 7 43.8% 16 8.6%
Remained 4 44.4% 5 55.6% 9 4.9%
Exited 5 71.4% 2 28.6% 7 3.8%
Entered 17 54.8% 14 45.2% 31 16.8%
Blue Status
Began in Status 11 61.1% 7 38.9% 18 9.7%
Remained 4 80.0% 1 20.0% 5 2.7%
Exited 7 53.9% 6 46.2% 13 7.0%
Entered 7 87.5% 1 12.5% 8 4.3%
(a) Percent reflects the total for each Status Trend category
(b) Percent reflects the total count of schools in the district (n=185)

SPF Trends: Rates of Schools Remaining In, Entering Into, and Exiting Intervention Status and

Blue Status

During the study Timeframe, more than four times as many schools entered into Intervention Status (n=31, or 16.8% of schools in the district) as exited it (n=7). Of the 16 schools that

began the study in Intervention Status, the majority of them (n=9) remained in this status

throughout the three years of the study. Although a similar number of schools began the study in

Blue Status (n=18), the majority of them (n=13) subsequently exited it, with only five schools remaining in Blue Status throughout the study and only eight schools entering it. These data suggest that during the study Timeframe there was a downward trend in accountability outcomes, as schools were more likely to gain and maintain Intervention Status than they were to gain or maintain Blue Status. Similarly, schools were more likely to lose Blue Status than lose Intervention Status. Together, these trends indicate that the SPF accountability framework was ineffective in promoting school success during the study: despite its explicit purpose of promoting higher performance, there was no overall improvement in SPF status at the district level; rather, schools experienced increasing rates of failure and declining rates of success.

SPF Trends: District-Run and Charter Schools

District-run schools (69.7% of all schools) were overrepresented in the categories of

schools that (a) remained in Blue Status (80.0%), (b) entered Blue Status (87.5%), and (c) exited

Intervention Status (71.4%), although here their overrepresentation was to a lesser extent. They

were underrepresented in the categories of schools that (a) remained in Intervention Status

(44.4%), (b) entered Intervention Status (54.8%), and (c) exited Blue Status (53.9%). Charter schools (30.3% of all schools) were overrepresented in the categories of schools that (a) remained in Intervention Status (55.6%), (b) entered Intervention Status (45.2%), and (c) exited Blue Status (46.2%). They were underrepresented in the categories of schools that (a) exited Intervention Status (28.6%), (b) remained in Blue Status (20.0%), and (c) entered Blue Status (12.5%). These data suggest that during the study Timeframe charter schools were less likely than district-run schools to achieve the high performance that the accountability framework seeks to promote, as only 1.8% of charters entered Blue Status and only 1.8% remained in it, while 3.1% of district-run schools remained in Blue Status and 5.4% of them entered it. At the same time, charters were twice as likely to exit Blue Status as district-run schools, with 10.7% of charters and only 5.4% of district-run

schools exiting. Charters remained in Intervention Status at a rate (8.9%) almost three times

higher than district-run schools (3.1%). Similarly, during the study Timeframe one in every four

charters (25.0%) entered into Intervention Status, while only 13.2% of district-run schools did

the same. Together, these data show that during the study Timeframe charters were more likely

to be or become low performing than district-run schools while being less likely to be or become

high performing.
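The two kinds of comparison in this paragraph can be reproduced from the Table 4 counts: a sector's share of a trend category compared with its share of the district (over- or underrepresentation), and the percent of a sector's own schools that fall in a category. The dictionary and function names below are illustrative, not part of the study.

```python
# Counts of schools as of 2018-2019, taken from Table 4.
SECTOR_TOTALS = {"district-run": 129, "charter": 56}
TREND_COUNTS = {
    "remained in Intervention": {"district-run": 4, "charter": 5},
    "entered Intervention": {"district-run": 17, "charter": 14},
    "exited Blue": {"district-run": 7, "charter": 6},
}


def within_sector_rate(trend, sector):
    """Percent of a sector's own schools that fall in a trend category."""
    return TREND_COUNTS[trend][sector] / SECTOR_TOTALS[sector] * 100


def share_of_category(trend, sector):
    """Percent of a trend category's schools that belong to a sector."""
    counts = TREND_COUNTS[trend]
    return counts[sector] / sum(counts.values()) * 100


def share_of_district(sector):
    """Percent of all district schools in a sector: the representation baseline."""
    return SECTOR_TOTALS[sector] / sum(SECTOR_TOTALS.values()) * 100


for trend in TREND_COUNTS:
    charter_rate = within_sector_rate(trend, "charter")
    district_rate = within_sector_rate(trend, "district-run")
    print(
        f"{trend}: charter share {share_of_category(trend, 'charter'):.1f}% "
        f"(baseline {share_of_district('charter'):.1f}%); "
        f"rates {charter_rate:.1f}% charter vs {district_rate:.1f}% district-run"
    )
```

A sector is overrepresented in a category when its share of that category exceeds its district baseline; the within-sector rates answer the different question of how common each trend was among that sector's own schools.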

SPF Trends: Student Demographic, EL Characteristics and Services, and School Contexts

These discrepancies might be partially attributable to the different student populations

and school contexts in the schools in each of the SPF Trend statuses (Table 6). Table 5 (below)

presents a key to the abbreviated table variable labels used in Table 6 and elsewhere in this

chapter.

Table 5.
Key To Abbreviated Variable Names
Abbreviated Variable            Description
Student Demographics
SoC %                           Percent of students that are students of color (SoC)
FRL %                           Percent of students that receive Free and Reduced Lunch (FRL)
SPED %                          Percent of students that receive Special Education (SPED) services
EL %                            Percent of students that receive English Learner (EL) services
GT %                            Percent of students that receive Gifted and Talented (GT) services
English Learner Characteristics
SPED as ELs %                   Percent of Special Education students that are also ELs
Spanish EL %                    Percent of ELs that are Spanish speakers
ELs in GT %                     Percent of ELs that receive Gifted and Talented services
Beginning EL %                  Percent of ELs that are in the Beginning Level of English acquisition
Intermediate EL %               Percent of ELs that are in the Intermediate Level of English acquisition
Advanced EL %                   Percent of ELs that are in the Advanced Level of English acquisition
English Learner Services
Redes. %                        Rate at which English Learners were redesignated from EL services
Exit %                          Rate at which English Learners were exited from EL services
Re-Entry %                      Rate at which English Learners were re-entered into EL services
PP1 %                           Percent of families who request native language supports designed for ELs for their EL children
PP2 %                           Percent of families who request English-only supports designed for ELs for their EL children
PP3 %                           Percent of families who request no supports for their EL children
Main. %                         Percent of ELs placed in Mainstream programs, which are not specifically designed for ELs
ELA-E %                         Percent of ELs placed in English Language Acquisition-English programs, which are specifically designed for ELs and taught through English only
ELA-S %                         Percent of ELs placed in English Language Acquisition-Spanish programs, which are specifically designed for ELs and taught through Spanish
DL %                            Percent of ELs placed in Dual Language programs
Nat. Lang. %                    Percent of ELs placed in either Dual Language or ELA-S programs
School Contexts
Enrollment                      Total student enrollment
Student-Teacher Ratio           Ratio of students to teachers
Full. Qual. Teacher %           Percent of teachers with the label of “Fully Qualified” to teach emergent bilingual students according to district metrics
Disp. Actions Rate              Count of disciplinary actions per 100 students
Disp. Incidents Rate            Count of disciplinary incidents per 100 students
Disp. Instructional Loss Rate   Count of disciplinary actions that result in instructional loss per 100 students
SPF %                           Percent of SPF points earned out of total points possible

Table 6.
Descriptive Statistics of Means of Schools that Remained In, Entered Into, and Exited From
Intervention Status and Blue Status Across the Three-Year Study Timeframe Aggregate
Intervention Status Blue Status
Remain Enter Exit Remain Enter Exit
N 9 31 7 5 8 13
% 4.9% 16.8% 3.8% 2.7% 4.3% 7.0%
Student Demographics
SoC % 87.5 87.3 80.0 29.5 59.0 73.9
FRL % 75.8 77.6 74.9 15.5 48.5 62.9
SPED % 18.2 12.7 12.0 5.7 10.2 10.5
EL % 31.9 40.6 44.7 9.8 21.9 34.0
GT % 10.1 12.0 8.5 23.1 12.9 16.3
English Learner Characteristics
SPED as ELs % 39.1 46.4 40.9 11.8 38.1 44.4
Spanish EL % 83.7 88.3 80.9 36.7 62.4 77.3
ELs in GT % 3.9 1.9 2.4 28.9 6.0 2.2
Beginning EL % 27.8 23.6 24.6 6.4 15.9 20.8
Intermediate EL % 67.8 70.5 69.7 66.5 70.4 69.5
Advanced EL % 4.4 5.9 5.7 27.1 13.7 9.7
English Learner Services
Redes. % 9.3 9.8 10.3 27.2 9.6 25.9
Exit % 4.2 6.0 4.8 14.4 7.1 7.4
Re-Entry % 0.6 1.3 1.4 2.8 0.4 1.0
PP1 % 37.5 44.1 49.5 11.5 34.8 37.7
PP2 % 51.5 47.9 45.6 75.8 63.7 54.8
PP3 % 12.0 7.8 4.9 12.6 3.9 8.6
Main. % 38.3 34.6 21.2 34.0 13.9 40.6
ELA-E % 46.7 49.2 49.5 66.0 86.1 43.7
ELA-S % 11.1 12.9 29.3 0.00 0.00 9.3
DL % 3.8 3.3 0.00 0.00 0.00 6.4
Nat. Lang. % 15.0 16.2 29.3 0.00 0.00 15.7
School Contexts
Enrollment 228.4 419.4 545.7 459.1 473.5 369.0
Student-Teacher Ratio 13.7 14.6 14.7 18.0 15.6 14.9
Full. Qual. Teacher % 61.3 75.9 78.4 88.8 81.8 82.9
Disp. Actions Rate 17.7 18.8 20.6 5.5 15.1 7.4
Disp. Incidents Rate 11.7 13.3 14.7 4.5 10.7 5.5
Disp. Instructional Loss Rate 10.0 8.4 9.3 2.5 2.8 3.2

Just as the findings from Research Question 1 showed that the highest- and lowest-rated schools vary across student demographics, EL characteristics and services, and school contexts, so, too, do the trends of schools that remained in, entered into, and exited from the extremes of the SPF ratings brackets (Intervention Status and Blue Status) vary along these lines, with the most distinct variation evidenced between the student populations in the schools that were consistently low- and high-rated. Compared to schools that remained in Blue Status for every year of the study, schools that remained in Intervention Status on average had roughly three times larger proportions of students of color (87.5%), EL students (31.9%), and SPED students (18.2%), and five times larger FRL populations (75.8%), with less than half the rate of GT students (10.1%). Schools that were able to exit Intervention Status had smaller average

proportions of SoC (80.0%) and SPED students (12.0%) than schools that remained in this

status. Interestingly, schools that exited Intervention Status had larger proportions of ELs (12.8

percentage points more) than schools that remained. Schools that entered Intervention Status not

only had larger proportions of EL students than schools that remained in Intervention Status, but

they also had slightly larger FRL proportions and almost identical SoC proportions. These data

imply that schools that entered Intervention Status had historically marginalized student

populations that were similar to those that remained in Intervention status, while schools that

exited this status had smaller proportions of these students with the exception of ELs.

Likewise, the 13 schools that exited Blue Status had much larger average proportions of

SoC (73.9%), FRL (62.9%), SPED (10.5%) and EL (34.0%) students than schools that remained

in Blue Status, whose average proportions of SoC (29.5%), FRL (15.5%), SPED (5.7%), and EL

(9.8%) students were approximately one-quarter to one-half as large. Schools that entered Blue Status had average historically marginalized student populations that fell between these two groups, with SoC, FRL, and EL populations that were 29.5, 33.0, and 12.1 percentage points higher, respectively, than those of schools that remained in Blue Status, but 14.9, 14.4, and 12.1 percentage points lower, respectively, than those of schools that exited it. Just as

with the trends evident in the Intervention Status, these data indicate that schools with larger

91
proportions of historically marginalized students were more likely to exit Blue Status, while

schools with smaller proportions of these students were more likely to enter into or remain in it.

Similar disparities were evident in EL characteristics and services, as schools that experienced Intervention Status had a little more than twice the proportion of Spanish-speaking ELs and approximately four times higher rates of ELs in SPED than schools that remained in Blue Status. Notably, all schools in this analysis had low rates of ELs in GT, ranging between 1.9% and 6%, except for schools that remained in Blue Status, where nearly one in three GT students were also ELs. However, these schools also served five to six times larger proportions of Advanced Level ELs compared to all schools that experienced Intervention Status, whose Beginning Level EL populations were likewise about four times larger. The larger proportion of Advanced Level ELs in schools that remained in Blue Status is mirrored in these schools' higher rates of redesignating and exiting students from English Learner services. These rates appear to be unrelated to the program settings EL students were placed in, as these schools had very similar rates of Mainstream participation to those that experienced Intervention Status, with differences in ELA-E and ELA-S participation appearing to correspond to differences in the proportions of Spanish-speaking ELs and in parent preferences.

While ELs seemed to be placed in program settings following similar logics in both Intervention and Blue Status schools, in schools that remained in Blue Status 88.8% of teachers were Fully Qualified to work with such students, while only 61.3% of teachers were similarly qualified in schools that remained in Intervention Status. The difference in teacher qualification was compounded by differences in disciplinary environments, as students in schools that experienced Intervention Status received disciplinary actions and incidents two to five times as frequently as students in schools that remained in Blue Status. Sadly, students in schools that remained in Intervention Status received disciplinary actions that resulted in instructional loss at four times the rate of their peers in schools that remained in Blue Status.

Summary

Together, these trends indicate that schools that remained in Intervention Status were characterized by greater proportions of students of color, students receiving Free and Reduced Lunch, Special Education students, Spanish-speaking ELs, and Beginning Level ELs, along with higher rates of discipline, while schools that exited Intervention Status had smaller proportions and lower rates on these metrics. Conversely, schools that remained in Blue Status had strikingly smaller proportions of these students and lower rates of discipline, while schools that exited Blue Status had larger proportions of these students and higher rates of discipline. Schools that experienced Blue Status at some point during the study timeframe all had higher rates of Fully Qualified teachers and Gifted and Talented students, and lower rates of discipline, than schools that experienced Intervention Status. Further, the study timeframe showed a downward trajectory, as more schools entered into or remained in Intervention Status than entered into or remained in Blue Status. Together, these findings indicate that the SPF was not successful in promoting higher degrees of school success. This raises questions for future research about the impact of not incorporating the student demographic, EL characteristic and service, and school context discrepancies explored here into the accountability framework.

Research Question 3: What are the student demographics, EL characteristics and services,

and school contexts per charters and district-run schools?

The findings from the previous two research questions have indicated that student demographics, EL characteristics and services, and school contexts vary both across SPF ratings brackets and across schools that remained in, entered into, and exited from the SPF statuses that represent the special focus of the accountability framework. Low accountability ratings can lead to school closure and replacement by a charter school restart. Additionally, these previous findings have indicated that larger proportions of historically marginalized students are learning in distinct school contexts in these low-rated schools. Thus, this third research question sought to understand whether these same metrics also vary between district-run schools and the charters that are potentially replacing them. Table 7 shows the means of the study variables in each academic year as well as the three-year aggregate means. The same abbreviated variable names used in the previous section are employed in Table 7; refer to Table 5 for a description of variable names.

Table 7.
Means of Study Variables per District-Run And Charter Schools for Each Year of Study and Three-
Year Aggregate
2016-2017 2017-2018 2018-2019 All Years Avg.
Dist. Chart. Dist. Chart. Dist. Chart. Dist. Chart.
N 132 54 133 58 129 56 394 168
% 71.0% 29.0% 69.6% 30.4% 69.7% 30.3% 70.1% 29.9%
Student Demographics
SoC % 74.9 84.4 74.7 85.1 74.4 85.0 74.7 84.9
FRL % 67.1 74.1 67.6 74.9 66.1 74.5 66.9 74.5
SPED % 11.1 11.0 12.1 12.1 12.4 12.3 11.9 11.8
EL % 32.3 37.3 33.5 40.3 31.7 39.2 32.5 39.0
GT % 12.4 15.1 15.0 14.4 9.3 10.0 11.8 13.1
English Learner Characteristics
SPED as ELs % 39.5 45.8 34.7 45.2 34.8 45.9 36.3 45.6
Spanish EL % 76.6 84.3 76.0 86.1 75.6 85.8 76.1 85.4
ELs in GT % 3.3 3.6 2.8 2.3 3.5 2.2 3.2 2.8
Beginning EL % 22.2 10.3 24.7 17.2 26.7 21.0 24.5 16.3
Intermediate EL % 68.1 80.4 67.9 77.9 65.0 73.1 67.0 77.0
Advanced EL % 9.7 9.3 7.4 4.9 8.2 5.9 8.4 6.6
English Learner Services
Redes. % 7.8 7.8 12.3 15.1 14.4 21.4 11.5 15.0
Exit % 3.6 15.5 4.6 7.2 6.3 9.1 4.8 10.4
Re-Entry % 0.9 1.2 0.3 0.4 1.3 1.8 0.8 1.1
PP1 % 39.5 37.5 39.3 37.5 39.8 39.8 39.5 38.3
PP2 % 54.3 50.2 54.2 52.9 54.4 53.2 54.3 52.1
PP3 % 6.5 11.5 6.1 8.8 8.0 6.5 6.7 8.8
Main. % 3.4 95.5 4.5 82.0 3.2 32.7 3.7 69.4
ELA-E % 74.8 3.8 71.9 16.8 74.4 60.6 73.7 27.7
ELA-S % 18.7 0.7 20.6 1.2 18.9 6.7 19.4 2.9
DL % 3.1 0.0 3.1 0.0 3.5 0.0 3.2 0.0
Nat. Lang. % 21.8 0.7 23.6 1.2 22.4 6.7 22.6 2.9
School Contexts
Enrollment 495.1 330.9 479.8 340.2 481.4 345.3 485.4 339.0
Student-Teacher Ratio 15.5 14.4 15.2 13.6 15.2 13.6 15.3 13.8
Full. Qual. Teacher % 82.8 25.0 84.4 44.2 81.6 No data 83.0 42.6
Disp. Actions Rate 9.6 22.1 13.5 19.2 12.9 18.6 12.0 19.9
Disp. Incidents Rate 6.9 15.9 9.0 14.0 9.1 11.8 8.3 13.9
Disp. Instruction Loss Rate 6.4 12.2 4.2 7.5 3.9 7.8 4.8 9.1
SPF % 57.6 58.2 55.2 50.1 51.9 48.5 54.9 52.2

In each year of the study, compared to district-run schools, charters served higher percentages of students of color, FRL students, ELs, Spanish-speaking ELs, Special Education students who were ELs, and Intermediate Level ELs, with lower percentages of Beginning and Advanced Level ELs. In some cases, these differences were stark. For example, on average, charters served 13.7% (10.2 percentage points) larger proportions of students of color, 20.0% (6.5 percentage points) larger proportions of ELs, and 25.6% (9.3 percentage points) larger proportions of SPED students who were ELs than district-run schools, along with 33.5% (8.2 percentage points) smaller proportions of Beginning Level ELs and 21.4% (1.8 percentage points) smaller proportions of Advanced Level ELs. However, there were no consistent disparities between charters and district-run schools in the proportions of GT students or ELs in GT, and the two served nearly identical proportions of SPED students. These data indicate that, on average, charters served student populations that were less White, less wealthy, and more bilingual than those of district-run schools, with higher rates of Special Education students who were ELs, and of ELs who were Spanish speakers and at the Intermediate Level. These findings correspond with the previous results that found charters to be overrepresented in the schools experiencing Intervention Status (Research Question 2), and historically marginalized students to be overrepresented in low-rated schools (Research Question 1).
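Because this section reports both relative differences (e.g., 13.7%) and percentage-point differences (e.g., 10.2 percentage points), the two calculations are easy to conflate. The minimal Python sketch below uses the three-year aggregate means from Table 7 to show how each figure is derived, taking the district-run mean as the base for the relative difference. The dictionary and function names are illustrative only.

```python
# Three-year aggregate means from Table 7: (district-run, charter)
TABLE7_MEANS = {
    "SoC %": (74.7, 84.9),
    "EL %": (32.5, 39.0),
    "SPED as ELs %": (36.3, 45.6),
    "Beginning EL %": (24.5, 16.3),
    "Advanced EL %": (8.4, 6.6),
    "Main. %": (3.7, 69.4),
    "Exit %": (4.8, 10.4),
}


def differences(district: float, charter: float) -> tuple[float, float]:
    """Return (percentage-point difference, relative % difference) of charter vs. district.

    The relative difference uses the district-run mean as its base.
    """
    pp = charter - district
    relative = pp / district * 100
    return round(pp, 1), round(relative, 1)


for name, (district, charter) in TABLE7_MEANS.items():
    pp, rel = differences(district, charter)
    print(f"{name}: {pp:+.1f} percentage points, {rel:+.1f}% relative")
```

For Mainstream placement, for instance, the same calculation yields a 65.7 percentage-point gap, which corresponds to the 1,775.7% relative difference reported below.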

However, these larger proportions of ELs who would benefit from and qualify for the native language programming offered in the district did not translate into charters offering more students such opportunities. Despite district-run schools and charters having similar percentages of families preferring native language supports (PP1) and English-only language instruction designed for ELs (PP2), charters had considerably higher rates of placing ELs in Mainstream class settings, which reflect neither of these preferences. While on average only 3.7% of ELs in district-run schools were placed in Mainstream programming, 69.4% of ELs in charter schools were in Mainstream classes, representing a 1,775.7% higher rate of Mainstream placement. Charters placed just over one in four (27.7%) ELs in English-only language instruction designed for ELs, or ELA-English classes, while district-run schools placed ELs in these programs at more than two and a half times that rate (73.7%). Unfortunately, this trend is also evident in native language programming: on average, nearly one in five (19.4%) ELs in district-run schools were placed in ELA-Spanish programs, compared with only 2.9% of ELs in charters. Despite these dramatic differences in participation rates in programs designed for emergent bilingual students, charter schools exited ELs from English Language services 116.7% more frequently than district-run schools, exiting on average about one in ten ELs every year, while district-run schools exited about one in every 20 ELs annually. Although charters also had 37.5% higher average re-entry rates for these students, suggesting that some of these exits were premature, this higher rate of re-entries does not account for the discrepancy in removing ELs from English Language supports, raising the question of what happens to these students in charters once they are no longer tracked as ELs.

Students in charters were not only less likely to participate in programs designed for emergent bilinguals; they were also much more likely to experience disciplinary actions and incidents and to lose instructional time as a consequence. In every year of the study, charters had higher rates of disciplinary actions, disciplinary incidents, and disciplinary actions resulting in instructional loss than district-run schools, with 65.8% higher rates of disciplinary actions, 67.5% higher rates of disciplinary incidents, and 89.6% higher (nearly double) rates of disciplinary actions resulting in instructional loss. Charters also had considerably smaller percentages of teachers who were Fully Qualified to work with emergent bilingual students according to the district's "Fully Qualified" teacher metric, with rates approximately one-third to one-half of those in district-run schools. However, these data were incomplete, as the district did not report charters' Fully Qualified teacher rates for the 2018-2019 academic year, and only partial reporting was available for the previous years.

Summary

Taken together, this analysis indicates that despite having greater proportions of students of color, students receiving Free and Reduced Lunch services, English Learners, and the Intermediate Level and Spanish-speaking English Learners who would especially benefit from native language supports or other programs specifically designed to serve ELs, charters provided dramatically fewer of these opportunities to their ELs. Further, all students in charters were subject to stricter disciplinary environments. While it might be argued that high rates of Mainstream classes for ELs, exiting ELs from English Learner status, and discipline are all reflections of the different approaches to education that make charters unique and successful alternatives to district-run schools, that success was not evidenced in SPF scores, as charters' average SPF scores were 4.9% lower than those of district-run schools (52.2% versus 54.9% of SPF points earned).

Research Question 4: Do student demographics predict percent SPF points earned?

The previous findings have consistently indicated that student demographics, EL characteristics and services, and school contexts all vary between schools with high and low SPF ratings, whether one evaluates this variation across SPF ratings brackets, across schools that remained in, began in, or ceased being especially high- or low-performing over time, or by charter or district-run school status. However, none of these analyses included tests of statistical significance, meaning that despite the consistency of these variations, the differences could simply be statistical "noise," or random fluctuations above and below the means.

To evaluate whether these differences are statistically significant or just random, I ran a series of OLS regressions predicting the percent of SPF points earned from each student demographic variable while controlling for (a) the percentage of teachers classified as "Fully Qualified" (F.Tch %), (b) the student-teacher ratio (ST Ratio), and (c) the rate of disciplinary actions resulting in instructional loss per 100 students (Disp. Loss). Because these models pool the data from all three years, they also controlled for year by using dichotomous variables for the 2017-2018 (Year 2017) and 2018-2019 (Year 2018) academic years, with the 2016-2017 academic year omitted as the reference category. Table 8 shows the results from the series of individual student demographic predictors, along with saturated models in which multiple student demographic predictors are included together (and thus controlled for). For each student demographic predictor, a regression model is shown first using a linear term and then using polynomial terms, which allow for curvilinear relationships between the predictor and the outcome. Quadratic terms are indicated with a squared exponent (²) next to the variable name, and cubed terms with a cubed exponent (³).
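The model structure described above can be sketched in Python with NumPy. This is a minimal illustration under assumed inputs, not the study's code. It assumes a hypothetical data layout with one row per school-year: a demographic percentage `x`, a three-column array standing in for the controls, and the fall year of each academic year. The data are simulated, not the district's. The sketch shows how the polynomial terms and year dummies enter the design matrix, and how Adjusted R2 is computed when comparing linear and cubed specifications.

```python
import numpy as np


def design_matrix(x, controls, year, degree=3):
    """Build an OLS design matrix: intercept, x..x**degree, controls, year dummies.

    `year` holds the fall year of each academic year (2016, 2017, 2018);
    2016 (the 2016-2017 academic year) is the omitted reference category.
    """
    poly = [x ** p for p in range(1, degree + 1)]
    dummies = [(year == 2017).astype(float), (year == 2018).astype(float)]
    return np.column_stack([np.ones_like(x), *poly, controls, *dummies])


def fit_ols(X, y):
    """Return OLS coefficients and Adjusted R-squared for design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    n, k = X.shape  # k counts the intercept column
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    return beta, adj_r2


# Simulated stand-in data (NOT the district's): 430 school-years
rng = np.random.default_rng(0)
n = 430
x = rng.uniform(0, 100, n)              # a demographic percentage
controls = rng.normal(size=(n, 3))      # stand-ins for F.Tch %, ST Ratio, Disp. Loss
year = rng.choice([2016, 2017, 2018], size=n)
y = 80 - 0.9 * x + 0.006 * x ** 2 + rng.normal(0, 5, n)  # curvilinear outcome

linear_beta, linear_adj = fit_ols(design_matrix(x, controls, year, degree=1), y)
cubic_beta, cubic_adj = fit_ols(design_matrix(x, controls, year, degree=3), y)
print(f"Adj R2 linear: {linear_adj:.3f}, cubed: {cubic_adj:.3f}")
```

The cubed design matrix has nine columns: an intercept, three polynomial terms, three controls, and two year dummies. With 430 school-year observations, the residual degrees of freedom are therefore 430 - 9 = 421, which matches the F(8, 421) reported for Model 2.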

Individual Student Demographic Predictors

Predictor: Student of Color (SoC) Percent

In Models 1 and 2, the predictor is the percentage of a school's population classified as students of color (SoC %). Model 1 uses a linear term for the SoC predictor, while Model 2 adds squared and cubed terms. The quadratic term (SoC % ²) is statistically significant, indicating a curvilinear relationship, although the cubed term (SoC % ³) is not. The slightly higher Adjusted R2 (Adj R2) of Model 2 likewise indicates that it is a better fit for these data, as Model 1 accounted for 29% of the variation in the data while Model 2 accounted for 33%. In both Model 1 and Model 2, the percentage of students of color is a statistically significant predictor of SPF score even when controlling for the percentage of Fully Qualified teachers, student-teacher ratio, rate of disciplinary actions resulting in instructional loss, and year. In the linear model (Model 1), schools serving one percentage point more students of color are predicted to earn 0.25 fewer percentage points of SPF points, on average, holding the controls constant. In the cubed model (Model 2), the coefficient on the linear term (-2.13) is the predicted slope at 0% students of color; because of the squared and cubed terms, the predicted difference associated with each additional percentage point of students of color narrows as the share of students of color grows (see Figure 4). Model 2 was statistically significant (Adj R2 = 0.33, F(8, 421) = 27.01, p < 0.001). The percent of students of color in a school significantly predicted the percent of SPF points earned (β = -2.13, p = 0.002), and the fitted model was:

Percent SPF Points Earned = 107.72 - 2.13(percent students of color) + 0.03(percent students of color)² + 0.00(percent students of color)³ + 0.02(percent Fully Qualified teachers) + 0.44(student-teacher ratio) - 0.29(rate of disciplinary actions resulting in instructional loss) - 2.90(Year 2017) - 6.27(Year 2018)
Table 8.
Individual Predictor and Saturated Models OLS Regressions with Cubed Terms for Academic Years 2016-2017 Through 2018-2019
Predictor: SoC (Models 1-2); Predictor: EL (Models 3-4); Predictor: SPED (Models 5-6); Predictor: FRL (Models 7-8); Saturated, SoC (Models 9-10); Saturated, FRL (Models 11-12)
Model 1 Model 2 Model 3 Model 4 Model 5 Model 6 Model 7 Model 8 Model 9 Model 10 Model 11 Model 12
Predictor
SoC % -0.25*** -2.13** – – – – – – -0.33*** -2.00** – –
SoC % ² – 0.03* – – – – – – – 0.03* – –
SoC % ³ – 0.00 – – – – – – – 0.00+ – –
EL % – – -0.11*** -1.59*** – – – – 0.14** 0.07 0.14** 0.21
EL % ² – – – 0.04*** – – – – – 0.00 – 0.00
EL % ³ – – – 0.00*** – – – – – 0.00 – 0.00
SPED % – – – – -0.76*** -8.39*** – – -0.47** -3.27+ -0.43* -1.17
SPED % ² – – – – – 0.50*** – – – 0.20 – 0.06
SPED % ³ – – – – – -0.01*** – – – 0.00 – 0.00
FRL % – – – – – – -0.21*** -1.55*** – – -0.29*** -1.38***
FRL % ² – – – – – – – 0.02*** – – – 0.02**
FRL % ³ – – – – – – – 0.00* – – – 0.00*

Controls
F.Tch % 0.03 0.02 0.03 0.02 0.04 0.04 0.04 0.02 0.02 0.02 0.04 0.03
ST Ratio 0.39 0.44+ 1.11*** 0.81** 0.79** 0.69** 0.26 0.36 0.12 0.17 -0.10 0.08
Disp. Loss -0.32*** -0.29*** -0.36*** -0.29*** -0.33*** -0.32*** -0.32*** -0.28*** -0.26*** -0.25*** -0.26*** -0.24***
Year 2017 -3.13* -2.90* -3.31* -2.94+ -2.65+ -2.13 -3.19* -3.03* -3.23* -2.89* -3.31* -3.02*
Year 2018 -6.55*** -6.27*** -7.11*** -7.02*** -5.41** -5.28** -6.67*** -6.50*** -6.50*** -6.34*** -6.76*** -6.65***

Constant 70.08*** 107.72*** 44.31*** 61.98*** 52.78*** 87.85*** 67.29*** 85.53*** 81.11*** 122.33*** 77.67*** 92.59***
Adj R2 0.29 0.33 0.20 0.26 0.21 0.24 0.28 0.34 0.33 0.35 0.32 0.35
Obs. 430 430 416 416 421 421 430 430 409 409 409 409
+ Indicates p-value ≤ 0.1
* Indicates p-value ≤ 0.05
** Indicates p-value ≤ 0.01
*** Indicates p-value ≤ 0.001
– Indicates the term was not included in the model
Note: Year variables represent binaries for the academic years 2017-2018 and 2018-2019; the binary variable for the academic year 2016-2017 was omitted as the reference

Predictor: English Learner (EL) Percent

Models 3 and 4 use the percentage of English Learners in a school as the predictor. The statistically significant quadratic and cubed terms in Model 4 indicate a curvilinear relationship between the percent of SPF points earned and the percent of students classified as English Learners, and its higher Adjusted R2 indicates that the cubed model is the better fit, as it accounted for 26% of the variation in the data while Model 3 accounted for only 20%. The percent of students who are English Learners is a statistically significant predictor of the percent of SPF points earned. Holding the controls constant, the linear model (Model 3) predicts 0.11 fewer percentage points of SPF points earned for every one-point positive difference in the percent of students who are ELs; in the cubed model (Model 4), the linear coefficient of -1.59 is the predicted slope at 0% ELs, and the predicted difference per additional percentage point narrows at higher EL percentages (see Figure 4). Model 4 was statistically significant (Adj R2 = 0.26, F(8, 407) = 19.26, p < 0.001), with the percent of English Learners in a school significantly predicting the percent of SPF points earned (β = -1.59, p < 0.001). Its fitted model was:

Percent SPF Points Earned = 61.98 - 1.59(percent English Learners) + 0.04(percent English Learners)² + 0.00(percent English Learners)³ + 0.02(percent Fully Qualified teachers) + 0.81(student-teacher ratio) - 0.29(rate of disciplinary actions resulting in instructional loss) - 2.94(Year 2017) - 7.02(Year 2018)

Predictor: Special Education (SPED) Percent

Models 5 and 6 use the percentage of students receiving Special Education services as the predictor. As in Model 4, the statistically significant quadratic and cubed terms in Model 6 indicate a curvilinear relationship between the percent of SPF points earned and the percent of students classified as Special Education, and its higher Adjusted R2 indicates that the cubed model is the better fit, as it accounts for 24% of the variation in the data while Model 5 accounts for only 21%. As in the previous models, the percent of students who receive Special Education services is a statistically significant predictor of the percent of SPF points earned. Holding the controls constant, the linear model (Model 5) predicts 0.76 fewer percentage points of SPF points earned for every one-point positive difference in the percent of SPED students; in the cubed model (Model 6), the linear coefficient of -8.39 is the predicted slope at 0% SPED, and the predicted difference per additional percentage point narrows as the SPED share grows (see Figure 4). Model 6 was statistically significant (Adj R2 = 0.24, F(8, 412) = 17.69, p < 0.001), with the percent of students in Special Education in a school significantly predicting the percent of SPF points earned (β = -8.39, p < 0.001). Its fitted model was:

Percent SPF Points Earned = 87.85 - 8.39(percent Special Education students) + 0.50(percent Special Education students)² - 0.01(percent Special Education students)³ + 0.04(percent Fully Qualified teachers) + 0.69(student-teacher ratio) - 0.32(rate of disciplinary actions resulting in instructional loss) - 2.13(Year 2017) - 5.28(Year 2018)
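In a model with linear, squared, and cubed terms, the predicted difference per one-point change in the predictor is not constant. It is the first derivative, b1 + 2·b2·x + 3·b3·x², which equals the linear coefficient only at x = 0. The minimal Python sketch below illustrates this with the rounded Model 6 (SPED %) coefficients from Table 8. The function names are illustrative, and because the published coefficients are rounded, results are approximate.

```python
def cubic_marginal_effect(b1: float, b2: float, b3: float, x: float) -> float:
    """Slope of b1*x + b2*x**2 + b3*x**3 at x, i.e., its first derivative."""
    return b1 + 2 * b2 * x + 3 * b3 * x ** 2


def cubic_difference(b1: float, b2: float, b3: float, x0: float, x1: float) -> float:
    """Change in the polynomial part of the prediction when x moves from x0 to x1.

    The constant, controls, and year terms cancel because they are held fixed.
    """
    def poly(x: float) -> float:
        return b1 * x + b2 * x ** 2 + b3 * x ** 3

    return poly(x1) - poly(x0)


# Model 6 (SPED %) coefficients as rounded in Table 8
B1, B2, B3 = -8.39, 0.50, -0.01

print(cubic_marginal_effect(B1, B2, B3, 0))    # slope at 0% SPED
print(cubic_marginal_effect(B1, B2, B3, 10))   # slope at 10% SPED
print(cubic_difference(B1, B2, B3, 0, 5))      # predicted change from 0% to 5% SPED
```

With these rounded values, the slope is -8.39 at 0% SPED but only about -1.4 at 10% SPED, consistent with the flattening described for Figure 4. The predicted change from 0% to 5% SPED is about -30.7 points, close to the gap between the 97.5% and 66.9% predictions reported for Figure 4; the small discrepancy reflects rounding.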

Predictor: Free and Reduced Lunch (FRL) Percent

Models 7 and 8 use the percentage of students receiving Free and Reduced Lunch services as the predictor. As in Models 4 and 6, the statistically significant quadratic and cubed terms in Model 8 indicate a curvilinear relationship between the percent of SPF points earned and the percent of students classified as Free and Reduced Lunch, and its higher Adjusted R2 indicates that the cubed model is the better fit, as it accounts for 34% of the variation in the data while Model 7 accounts for only 28%. As in the previous models, the percent of students who receive Free and Reduced Lunch services is a statistically significant predictor of the percent of SPF points earned. Holding the controls constant, the linear model (Model 7) predicts 0.21 fewer percentage points of SPF points earned for every one-point positive difference in the percent of FRL students; in the cubed model (Model 8), the linear coefficient of -1.55 is the predicted slope at 0% FRL, and the predicted difference per additional percentage point narrows as the FRL share grows (see Figure 4). Model 8 was statistically significant (Adj R2 = 0.34, F(6, 423) = 28.59, p < 0.001), with the percent of students receiving Free and Reduced Lunch in a school significantly predicting the percent of SPF points earned (β = -1.55, p < 0.001). Its fitted model was:

Percent SPF Points Earned = 85.53 - 1.55(percent Free and Reduced Lunch students) + 0.02(percent Free and Reduced Lunch students)² + 0.00(percent Free and Reduced Lunch students)³ + 0.02(percent Fully Qualified teachers) + 0.36(student-teacher ratio) - 0.28(rate of disciplinary actions resulting in instructional loss) - 3.03(Year 2017) - 6.50(Year 2018)

Saturated Models

Two sets of saturated models were also created to test the predictive power of these student demographic variables when they were combined into single models. Because of the high degree of collinearity (r = 0.95) between the percent students of color and the percent Free and Reduced Lunch variables discussed in the Methods section, these two variables could not be used together in a single model. For this reason, one saturated model includes the percent students of color variable, and the other includes the percent Free and Reduced Lunch variable. As with the previous models, these saturated models are presented in the table first using linear and then cubed terms.
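The scale of that collinearity can be expressed as a variance inflation factor (VIF). The minimal Python sketch below assumes the simplest case of exactly two predictors, where the R-squared from regressing one predictor on the other equals r squared. In the saturated models, which also include EL and SPED terms, the corresponding R-squared for either variable could be no lower, so this value is a lower bound. A commonly cited rule of thumb treats VIF values above 10 as a sign of problematic collinearity.

```python
def vif_two_predictors(r: float) -> float:
    """Variance inflation factor for one of two predictors correlated at r.

    With exactly two predictors, the R-squared from regressing one on the
    other equals r**2, so VIF = 1 / (1 - r**2).
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r ** 2)


# Correlation between percent students of color and percent FRL reported in the text
print(round(vif_two_predictors(0.95), 2))
```

A correlation of 0.95 gives a VIF of roughly 10.3, above that rule-of-thumb threshold.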

Saturated Model with Student of Color Percent Variable

Only the percent students of color variable continued to be a statistically significant predictor (at p ≤ 0.05) in the saturated model using cubed terms (Model 10; the SPED term was marginal at p ≤ 0.1), although all the student demographic variables were statistically significant predictors of the percent of SPF points earned in the previous individual models and in the saturated model using linear terms (Model 9). This suggests that in the previous models, the variables that appeared to be statistically significant predictors of the percent SPF points earned (e.g., percent English Learners and percent Special Education) perhaps were not predictors in and of themselves; rather, their significance may have derived from co-occurring characteristics of these students, namely, that students of color are overrepresented in the English Learner and Special Education classifications. In this reading, the EL and SPED variables appeared to be significant predictors of the percent SPF points earned, but this relationship may have been due not to students' EL and SPED status but to their co-occurring student of color status. Consistent with this, when all these student demographic predictors were included in a single saturated model with cubed terms, only the percent students of color variable remained significant, suggesting that this was the classification most responsible for the relationship with the percent SPF points earned. As in the previous model sets, the model using curvilinear terms (Model 10) has a slightly higher Adjusted R2 (0.35) than the model using linear terms (Model 9, 0.33), indicating that the curvilinear model – in which only the percent students of color variable was a statistically significant predictor of the percent SPF points earned – is a better fit for these data. Model 10 was statistically significant (Adj R2 = 0.35, F(14, 394) = 16.68, p < 0.001), with the percent students of color in a school significantly predicting the percent SPF points earned (β = -2.00, p = 0.006). Its fitted model (Model 10) was:

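Percent SPF Points Earned = 122.33 - 2.00(percent students of color) + 0.03(percent students of color)² + 0.00(percent students of color)³ + 0.07(percent English Learners) + 0.00(percent English Learners)² + 0.00(percent English Learners)³ - 3.27(percent Special Education students) + 0.20(percent Special Education students)² + 0.00(percent Special Education students)³ + 0.02(percent Fully Qualified teachers) + 0.17(student-teacher ratio) - 0.25(rate of disciplinary actions resulting in instructional loss) - 2.89(Year 2017) - 6.34(Year 2018)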

Saturated Model with Free and Reduced Lunch Percent Variable

Similarly, although all the student demographic variables were statistically significant predictors of the percent of SPF points earned in the previous individual models and in the saturated model using linear terms (Model 11), when combined into a single saturated model using cubed terms (Model 12), only the variable for the percent of Free and Reduced Lunch students continued to be a statistically significant predictor. As with the other saturated model, this suggests that the predictive power of the variables that previously appeared to be statistically significant predictors of the percent SPF points earned (e.g., percent English Learners and percent Special Education) was perhaps derived from co-occurring characteristics, as many of these students were also classified as receiving Free and Reduced Lunch. As such, when all these student demographic predictors were included in a single saturated model with cubed terms, only the percent FRL variable remained significant, suggesting that this was the classification most responsible for the relationship with the percent SPF points earned. Continuing the trend evident throughout these regressions, the model using curvilinear terms (Model 12) has a slightly higher Adjusted R2 (0.35) than the model using linear terms (Model 11, 0.32), indicating that the curvilinear model – in which only the percent FRL variable was a statistically significant predictor of the percent SPF points earned – is a better fit for these data. Model 12 was statistically significant (Adj R2 = 0.35, F(14, 394) = 16.95, p < 0.001), with the percent Free and Reduced Lunch students in a school significantly predicting the percent SPF points earned (β = -1.38, p < 0.001). Its fitted model (Model 12) was:

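Percent SPF Points Earned = 92.59 - 1.38(percent Free and Reduced Lunch students) + 0.02(percent Free and Reduced Lunch students)² + 0.00(percent Free and Reduced Lunch students)³ + 0.21(percent English Learners) + 0.00(percent English Learners)² + 0.00(percent English Learners)³ - 1.17(percent Special Education students) + 0.06(percent Special Education students)² + 0.00(percent Special Education students)³ + 0.03(percent Fully Qualified teachers) + 0.08(student-teacher ratio) - 0.24(rate of disciplinary actions resulting in instructional loss) - 3.02(Year 2017) - 6.65(Year 2018)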

Figures of Predicted Percent SPF Points Earned

Figure 4 shows panels of the predicted percent of SPF points earned per student demographic variable. These figures were created using the individual predictor models with curvilinear terms discussed in detail above (Models 2, 4, 6, and 8). The y-axis shows the predicted percent of SPF points earned, and the x-axis shows the corresponding value of each student demographic variable. The x-axis values begin at 0% and continue through the 99th percentile of each student demographic in order to capture the proportions of student demographic populations as they existed in the district, with the exception of the students of color variable, which begins at 20%, as its first percentile value was 18.1 and its 50th percentile value was 90.0 (Table 9).

Table 9.
Descriptive Statistics of Percentiles, Minimum and Maximum Values, Standard Deviations, and Means for Each Student Demographic Predictor Variable Used in Multiple Regressions
Predictor variable P1 P10 P50 P90 P99 Min. value Max. value SD Mean
SoC % 18.1 36.3 90.0 97.4 99.4 0.0 100.0 23.5 77.6
FRL % 5.6 25.1 79.8 94.5 97.6 3.9 100.0 26.7 69.2
EL % 2.5 7.3 33.6 62.8 79.5 1.1 87.0 20.9 34.3
SPED % 3.4 6.6 11.3 17.9 27.3 1.6 37.7 4.8 11.8
Note regarding abbreviations in row and column titles: "Percentile" written as "P," and "standard deviation" written as "SD"

Figure 4.
Predicted Percent SPF Points Earned per Individual Student Demographic Variables Reflecting Models 2, 4, 6, and 8
[Figure: four panels, each titled "Predicted SPF Score per Percent ... In Multiple Regression Model with Cubed Term," for Students of Color, Free & Reduced Lunch Students, English Learner Students, and Special Education Students. Y-axis: Predicted SPF Score; x-axis: percent of the corresponding student group.]

These models show a nonlinear relationship between each student demographic predictor and the percent SPF points earned. Specifically, they highlight that having greater proportions of historically marginalized students is not predicted to consistently result in the same difference in percent SPF points earned. Rather, for each of these student demographic variables, there is a dramatic negative difference in predicted percent SPF points earned as a school serves these populations above the district's lowest thresholds, and the downward trend eventually flattens out as the student demographics reach about one standard deviation below the district means.
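The "one standard deviation below the district means" benchmark can be made concrete from Table 9. The Python sketch below computes it for each predictor; the names are illustrative. The resulting values, roughly 54% for SoC, 43% for FRL, 13% for EL, and 7% for SPED, can be compared with the approximate flattening points described for each panel of Figure 4.

```python
# Means and standard deviations from Table 9: (mean, SD)
TABLE9 = {
    "SoC %": (77.6, 23.5),
    "FRL %": (69.2, 26.7),
    "EL %": (34.3, 20.9),
    "SPED %": (11.8, 4.8),
}


def one_sd_below_mean(mean: float, sd: float) -> float:
    """Value one standard deviation below the mean, rounded to one decimal."""
    return round(mean - sd, 1)


for name, (mean, sd) in TABLE9.items():
    print(name, one_sd_below_mean(mean, sd))
```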

For example, for the percent students of color variable, holding constant all the model controls, we would expect a school with 20% students of color (representing schools in the first percentile) to have a predicted SPF score of about 78.4. Schools that serve larger proportions of students of color are predicted to earn fewer SPF points, until this negative relationship flattens out around 60% students of color, which has a predicted SPF score of 53.4 points. In this way, for every additional ten percentage points of students of color in a school, beginning at a base of 20%, we would expect a steadily narrowing difference in the percent of SPF points earned: schools with 30% students of color are predicted to earn about 10 fewer SPF percentage points than schools with 20% students of color; schools with 40% about 7 fewer than schools with 30%; schools with 50% about 4 fewer than schools with 40%; and schools with 60% about 3 fewer than schools with 50%. Beyond that point, the predicted differences in SPF percentage points earned continue to narrow between schools with larger and larger percentages of students of color, ranging from about 1 to less than 1 percentage point.
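The narrowing pattern described above follows from the cubed specification: with linear, squared, and cubed terms, the predicted difference for a fixed ten-point step depends on where the step starts. The sketch below illustrates this under stated assumptions. The coefficients B1, B2, and B3 are hypothetical values chosen only to mimic the general shape of the curves; they are not this study’s estimates, and the intercept and control terms are left out because they cancel when two schools with identical control values are compared.

```python
# Sketch: why a cubed term produces narrowing, not constant, differences.
# B1, B2, B3 are HYPOTHETICAL coefficients chosen for illustration; they are
# not the estimates from this study's models.
B1, B2, B3 = -1.5, 0.02, -0.0001


def demographic_component(pct: float) -> float:
    """Part of the predicted SPF score that depends on the demographic
    predictor (in percent). The intercept and control terms are omitted
    because they cancel when comparing schools with identical controls."""
    return B1 * pct + B2 * pct**2 + B3 * pct**3


def stepwise_differences(start: float, stop: float, step: float = 10.0) -> list[float]:
    """Predicted SPF-point difference between schools `step` percentage
    points apart, for each step from `start` up to `stop`."""
    diffs = []
    pct = start
    while pct + step <= stop:
        diffs.append(demographic_component(pct + step) - demographic_component(pct))
        pct += step
    return diffs


diffs = stepwise_differences(20, 60)
for lower, diff in zip(range(20, 60, 10), diffs):
    print(f"{lower}% -> {lower + 10}%: {diff:+.1f} SPF points")
```

With these illustrative coefficients, the printed step differences shrink from about 6.9 to about 2.1 SPF points, mirroring the qualitative pattern (though not the magnitudes) reported for the students of color model.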

The other student demographic predictors follow similar patterns. While a school with 0% FRL students is predicted to earn 88.2% of SPF points when all the model controls are held constant, predicted SPF points earned are considerably lower for schools with larger percentages of FRL students. The sharpest differences appear among schools with low percentages of FRL students relative to those with 0%; after schools reach about 40% FRL students (predicted to earn 54.3% of SPF points), the differences flatten out. Holding the model controls constant, a school with 0% EL students is predicted to earn 71.3% of possible SPF points, but schools that serve larger proportions of ELs are predicted to initially earn dramatically fewer SPF percentage points: a school with 10% ELs rather than 0% is predicted to earn about 13 fewer SPF percentage points, lowering the predicted share from 71.3% to 58.8%, until the trendline flattens at 20% ELs with a predicted SPF score of 52.0. Likewise, holding the model controls constant, a school with 0% SPED students is predicted to earn 97.5% of possible SPF points, whereas a school with a 5% SPED population rather than 0% is predicted to earn about 31 fewer SPF percentage points, with a predicted SPF score of 66.9%. The steep downward trend continues only until schools reach about 10% SPED, after which it flattens.

Control Variables

While this analysis focused on the student demographic variables as predictors, in a multiple regression model any variable can be interpreted as a predictor or a control, and theory should dictate which role each plays. Given this flexibility, an alternative interpretation of these data could examine the predicted percentages of SPF points earned when student demographic variables are held constant and there is variation in the percent of teachers who are Fully Qualified, the student-teacher ratio, the rate of disciplinary actions resulting in instructional loss, or the year. Although the percent of teachers who are Fully Qualified was not a statistically significant predictor of percent SPF points in any model, and student-teacher ratios were statistically significant predictors in only some models, the rate of disciplinary actions resulting in instructional loss was a highly statistically significant predictor in every model, and the year variables were also statistically significant predictors, albeit with larger p-values. This indicates that even among schools with similar student demographics, the rate at which disciplinary actions remove students from instruction is consistently a highly statistically significant predictor of percent SPF points earned, with each additional such disciplinary action per 100 students associated with 0.2 to 0.3 fewer SPF percentage points (depending on the model). Similarly, schools evaluated in the 2017-2018 academic year rather than the 2016-2017 academic year were predicted to earn about 3 fewer SPF percentage points on account of the change in year alone, holding constant all other model variables. In addition, in 2018-2019 schools earned on average 6 fewer SPF percentage points than in 2016-2017, conditional on the other variables in the model.
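Because the discipline coefficient is expressed per additional action per 100 students, its practical size depends on how far apart two schools’ rates are. A minimal sketch, using hypothetical enrollment and incident counts, converts raw counts into a per-100-students rate and applies the 0.2 to 0.3 percentage-point range reported above; the year shifts are the approximate values reported relative to the 2016-2017 reference year. The function names are illustrative only.

```python
# Sketch: converting raw discipline counts to a per-100-students rate and
# translating the reported coefficient range into a predicted SPF gap.
# The school counts below are HYPOTHETICAL.

def incidents_per_100(incidents: int, enrollment: int) -> float:
    """Disciplinary actions resulting in instructional loss per 100 students."""
    if enrollment <= 0:
        raise ValueError("enrollment must be positive")
    return 100.0 * incidents / enrollment


def predicted_spf_gap(rate_a: float, rate_b: float, coef_per_unit: float) -> float:
    """Predicted SPF percentage-point difference (school A minus school B)
    for two schools that differ only in their discipline rate."""
    return coef_per_unit * (rate_a - rate_b)


# Approximate year shifts relative to the 2016-2017 reference year,
# as reported in the text (about -3 and -6 SPF percentage points).
YEAR_SHIFT = {"2016-2017": 0.0, "2017-2018": -3.0, "2018-2019": -6.0}

rate_a = incidents_per_100(150, 500)  # hypothetical school A: 30 per 100
rate_b = incidents_per_100(40, 400)   # hypothetical school B: 10 per 100
low_gap = predicted_spf_gap(rate_a, rate_b, -0.2)
high_gap = predicted_spf_gap(rate_a, rate_b, -0.3)
print(f"Predicted gap: {low_gap:.1f} to {high_gap:.1f} SPF points")
print("2018-2019 vs 2016-2017 shift:", YEAR_SHIFT["2018-2019"])
```

For two otherwise similar schools 20 actions per 100 students apart, the reported range implies a predicted gap of roughly 4 to 6 SPF percentage points.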

Summary

These findings reiterate those from the previous research questions by showing that, like student demographics, school context variables extrinsic to the SPF framework do in fact correlate with SPF outcomes, making the SPF a reflection not only of student learning but also of school contexts, student demographics, and even the vagaries of the year. Because these data indicate that school context variables extrinsic to the SPF had significant relationships with SPF outcomes, alternative accountability policies would do well not only to attend to these factors but also to design interventions that target them.

Research Question 5: Do student demographics predict SPF outcomes?

The previous findings have shown that student demographics (and school contexts) are statistically significant predictors of percent SPF points earned, with each one-percentage-point difference in a historically marginalized population correlating with 1.5 to 8 fewer SPF percentage points, depending on the model. However, a difference of only a few SPF percentage points may not be meaningful in practice. For example, whether a school earns 55% or 60% of SPF points, it will still be rated Green, the second-highest of the five ratings in the Red, Orange, Yellow, Green, and Blue SPF rating system, and it will still enjoy the prestige that accompanies that rating. Because it is not the SPF percentage points themselves but the SPF outcome (whether a school is subjected to interventions or granted prestige) that shapes the experience of students, teachers, and families, this research question sought to understand whether the same student demographic predictor models, using the same controls as the previous research question, also predicted SPF outcomes broadly.

To evaluate the relationship between student demographics and SPF outcomes, I ran school-level ordinal logit regression models that included cubed terms for key predictors to allow for nonlinear associations, holding constant: (a) percent of teachers who are Fully Qualified, (b) student-teacher ratio, (c) number of disciplinary incidents that result in instructional loss per 100 students, and (d) dichotomous variables for academic years 2017-2018 and 2018-2019, omitting the 2016-2017 year as a reference. Each model included a single student demographic predictor: (a) percent of the student population who are students of color, (b) percent who are Special Education students, (c) percent who are English Learners, or (d) percent who receive Free and Reduced Lunch services. I then used these regressions to predict the probability of a school receiving one of three Simplified SPF designations: (a) Intervention, denoting either a Red or Orange SPF rating bracket, which warrants district intervention; (b) On Watch, denoting a Yellow SPF rating bracket and using the term the district applies to such schools; and (c) High Performing, denoting either a Green or Blue SPF rating bracket, which are the highest rating brackets available in the district and imply exemplary performance. The predicted margins for each academic year were used to create figures in R using ggplot (Figure 5).
Figure 5.
Predicted Probabilities of Receiving Simplified SPF Outcomes (a) Intervention, (b) On-Watch, or (c) High Performing per Student Demographic Predictor Using Models 2, 4, 6, and 8 From Research Question 4
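For readers less familiar with ordinal logit models, the following sketch shows how a linear predictor and two cutpoints translate into probabilities for the three Simplified SPF designations under a proportional-odds specification, where P(Y <= k) = logistic(cutpoint_k - eta). This is the parameterization used by, for example, R’s MASS::polr; other software may write the sign differently. The coefficients (B1, B2, B3) and cutpoints (CUT1, CUT2) are hypothetical and are not the estimates behind Figure 5, and the control variables are treated as fixed and absorbed into the cutpoints for simplicity.

```python
import math

# Ordered categories, lowest to highest, as defined in this study.
CATEGORIES = ("Intervention", "On Watch", "High Performing")

# HYPOTHETICAL coefficients for a demographic predictor with a cubed term,
# and HYPOTHETICAL cutpoints; not the estimates behind Figure 5.
B1, B2, B3 = -0.12, 0.0015, -0.000006
CUT1, CUT2 = -3.0, -1.0


def logistic(z: float) -> float:
    """Standard logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_predictor(pct: float) -> float:
    """Cubic function of the demographic percentage (controls held fixed)."""
    return B1 * pct + B2 * pct**2 + B3 * pct**3


def ordinal_logit_probs(eta: float, cut1: float, cut2: float) -> dict:
    """Proportional-odds probabilities: P(Y <= k) = logistic(cut_k - eta)."""
    if not cut1 < cut2:
        raise ValueError("cutpoints must be increasing")
    p_le_1 = logistic(cut1 - eta)
    p_le_2 = logistic(cut2 - eta)
    return {
        CATEGORIES[0]: p_le_1,
        CATEGORIES[1]: p_le_2 - p_le_1,
        CATEGORIES[2]: 1.0 - p_le_2,
    }


for pct in (0, 20, 40, 60):
    probs = ordinal_logit_probs(linear_predictor(pct), CUT1, CUT2)
    print(pct, {k: round(v, 2) for k, v in probs.items()})
```

Predicted margins like those plotted in Figure 5 are commonly obtained by computing these probabilities for each school at its observed control values and averaging across schools, rather than from a single profile as here.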

These findings confirm those from the previous research question while adding greater nuance. In each model, there is a dramatic negative difference in the predicted probability of receiving a High Performing designation as schools serve proportions of historically marginalized student populations ranging from just above 0% to about one standard deviation below the district mean for the respective population, as found in Research Question 4. This indicates not only that historically marginalized student demographics predict percent SPF points earned, but also that serving such students in proportions even modestly above the lowest observed levels appears to predict schools’ probability of receiving High Performing designations most dramatically.

Schools with 0% students of color or 0% Free and Reduced Lunch (FRL) students have a predicted probability of approximately 100% of being High Performing. Beginning at about 20% students of color and at 0% FRL students, those probabilities drop sharply as schools serve greater proportions of these students, until these populations reach around 40%, after which the negative trendline flattens. As schools serve greater proportions of students of color and FRL students, they have not only steeply lower probabilities of being High Performing but also greater probabilities of being On-Watch and Intervention, until a threshold of about 50% of each student population is reached. As these student populations rise above 0%, the upward trend in the predicted probability of being On-Watch is steeper than that of being Intervention, indicating that higher percentages of students of color or FRL students are associated with larger increases in the probability of On-Watch status than of Intervention status. Between 0% and 40% of these student populations, the predicted probabilities of Intervention status do not differ as dramatically as the probabilities of High Performing status do. Instead, the predicted probabilities of Intervention status begin to show positive differences only at around 40% students of color or 20% FRL students, and each trendline begins to flatten at around 60%.

Schools with 0% ELs or 0% SPED students have approximately a 75% to 80% predicted probability of being High Performing. As seen elsewhere, this probability is dramatically lower for schools that serve more than 0% of these students, with downward trendlines flattening out at around 20% for both ELs and SPED. Once a school has approximately 10% EL students, the predicted probabilities of High Performing and On-Watch status converge, so that this student population no longer predicts one status more than the other. Interestingly, once the percentage of ELs reaches about 60%, the predicted probability of being High Performing is once again the highest of the three, meaning that schools with such large proportions of ELs have greater predicted probabilities of being High Performing than of being On-Watch or Intervention. Unlike in the students of color and FRL models, the predicted probability of Intervention status is immediately higher beyond 0% ELs, although this trend flattens at around 30% ELs, after which schools with larger percentages of ELs have lower predicted probabilities of being in Intervention status. In contrast, the predicted probability of Intervention status does not differ until schools have 5% or more SPED students, and this upward trend continues until 25% SPED students, after which it flattens. Also unlike the EL model, the negative difference in the predicted probability of being High Performing as SPED proportions rise above 0% is immediate, and it mirrors the positive difference in the predicted probability of being On-Watch until a school is about 10% SPED, after which this student demographic no longer predicts one status more than the other.

Together, these trends confirm previous findings: the percentages of historically marginalized students predict SPF outcomes, with schools serving larger proportions of marginalized populations having lower SPF scores and, relatedly, a greater likelihood of being in the lowest SPF category. Positive differences in the predicted probabilities of Intervention status appear later, at about 5% for EL and SPED and 20% for students of color and FRL. Despite this nuance, a central tendency remains: given the relationship between historically marginalized student populations and SPF scores, schools that wish to earn higher SPF scores or maintain high scores have incentives to serve as small a proportion of these students as possible. Because the accountability movement in education was in part rooted in a civil rights struggle to ensure that public schools better served these very student populations, an accountability framework that disincentivizes schools from working with such students is a sad outcome indeed.

Summary

Findings from each of the five research questions yielded similar and mutually confirming results: in Denver Public Schools, the accountability ratings derived from the School Performance Framework reflect not only student learning but also student demographics. As a result, schools that serve greater proportions of historically marginalized students, in particular students of color and students receiving Free and Reduced Lunch services, are predicted to have lower SPF ratings, potentially indicating that raced and classed students experience disparate access to educational opportunities. Furthermore, the highest- and lowest-rated schools also served markedly different English Learner (EL) populations in markedly different ways, both in terms of these students’ stage in developing English, home language, and participation in Gifted and Talented and Special Education programs, and in terms of the language supports these students could access. Together, these findings indicate that the SPF reflects factors extrinsic to what the framework purported to evaluate. The result is an accountability policy that implicitly disadvantages the schools that serve more historically marginalized students while rewarding those that serve the fewest.

Discussion

The results from this study reiterate a central finding: the School Performance Framework reflects and measures historically marginalized student demographics in a way that punishes the schools that serve the largest proportions of these students while failing to account for school context factors, like disciplinary rates and teacher qualifications, that also appear to drive SPF outcomes. Following this study’s commitment to a QuantCrit use and interpretation of quantitative data, this finding is framed as a failure of the accountability policy itself, of the district that instituted it, and of the systemic inequalities in society that research has shown to produce such disparities, rather than as a failure of the historically marginalized students who are disadvantaged by the SPF or the teachers who serve them. Such an intentional reframing of the locus of responsibility for racial inequities, away from racialized populations and onto institutions and the policymakers and leaders who guide them, is necessary to interrupt the legacy of quantitative data being used to “obfuscate, camouflage, and even to further legitimate racist inequities” (Gillborn, Warmington & Demack, 2018, p. 160). As such, the remainder of this chapter focuses on the ways that policymakers, researchers, and community members can use these findings as tools for pursuing greater racial and social justice rather than for justifying current inequities.

Implications for Policy

Equity Reviews of Accountability Frameworks

That SPF ratings outcomes consistently reflect demographic metrics extrinsic to the accountability policy should alert district leaders and policymakers alike to the need to conduct what I call “equity reviews”: data analyses similar to those used in this study, whose purpose is to systematically review accountability data and outcomes to evaluate whether district policies result in disproportionate harms to historically marginalized communities. The statistical methods and data used in this study are accessible to district leadership and policymakers, especially within offices of evaluation, assessment, and data management. As such, there are no logistical or methodological constraints that limit the district’s ability to incorporate regular equity reviews of the outcomes of its accountability policy. Beyond being possible, conducting such reviews is also a responsibility, lest such policies further marginalize and disadvantage historically marginalized communities.

A primary rationale of the accountability movement was the need to ensure that schools are accountable for the outcomes of their students, especially their historically marginalized students (DeBray-Pelot & McGuinn, 2009). In the same way, the policymakers and district leaders who design and implement accountability policies should hold themselves accountable for the outcomes of their work. Just as the rationale for the accountability movement holds that, when outcomes show that schools are disproportionately failing historically marginalized communities, such practices are unacceptable and deserve remediation, so, too, should accountability policies and frameworks themselves be scrutinized to evaluate whether they result in disparate impacts for the very students they are intended to help. Accountability policies that result in disproportionate harm to these students should be reevaluated, revised, or dismantled as necessary. Failure to do so results in a system that discourages schools and teachers from working with historically marginalized students, because serving these students is associated with lower accountability ratings and the negative consequences those ratings incur.

Such a commitment to uprooting discriminatory systems is already present in the goals of the district. In 2021, the Denver Public Schools school board adopted a new governance structure with which to orient its work and evaluate the superintendent through the use of “end goals.” One of these “end goals” is to ensure the district is “free of oppressive systems and structures rooted in racism” (Asmar, 2022, para. 5). An accountability framework that punishes schools for working with large proportions of historically marginalized students while ignoring the disparities in school contexts, resources, and opportunities provided to those students is an example of an oppressive system rooted in racism. Whether the district uses an accountability framework similar to the one examined in this study or an alternate framework, the potential for the same disparate impacts remains. For this reason, the regularly administered equity reviews this study recommends will remain necessary for whichever accountability policy the district adopts. Failure to incorporate equity reviews of the district’s work and policies is a betrayal both of the district’s goals and of the students and families in the district, who are subjected to externally created accountability policies, expectations, and consequences yet have no voice in how those policies are created and implemented.

Such an equity review of district accountability policies could mirror the work of this dissertation. By producing descriptive statistics of the student characteristics and school contexts in the highest- and lowest-rated schools, the district could better identify whether the accountability policy is resulting in disparate impact on certain student populations, as well as other factors, such as the dramatically different rates of disciplinary actions resulting in instructional loss identified in this study, that could also be related to school ratings and thus deserve attention and, if necessary, amelioration, as discussed in the section “Identifying Needed Supports” below. A district-initiated equity review of its own work could result in publicly available ratings, like those of the SPF, in which the accountability system itself is evaluated and rated. Families and teachers deserve to know whether the accountability policies used by the district are effective and fair. Likewise, if the accountability policy results in a disproportionate number of historically marginalized students being found in the lowest-rated schools, or, worse, if such student populations actually predict accountability scores, as this study found, that is information the public needs as it interprets its schools’ ratings.
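As a minimal sketch of the descriptive step of such an equity review, the code below groups schools by SPF rating bracket and compares average demographic and context measures between the lowest-rated (Red) and highest-rated (Blue) schools. The records, field names, and values are hypothetical; an actual review would draw on the district’s own school-level data and would likely report all rating brackets, not only the extremes.

```python
from collections import defaultdict
from statistics import mean

# HYPOTHETICAL school-level records; the field names are illustrative and
# do not correspond to any actual district data file.
SCHOOLS = [
    {"rating": "Blue", "pct_students_of_color": 35.0, "pct_frl": 20.0, "discipline_per_100": 4.0},
    {"rating": "Blue", "pct_students_of_color": 45.0, "pct_frl": 30.0, "discipline_per_100": 6.0},
    {"rating": "Yellow", "pct_students_of_color": 70.0, "pct_frl": 60.0, "discipline_per_100": 7.0},
    {"rating": "Red", "pct_students_of_color": 90.0, "pct_frl": 85.0, "discipline_per_100": 11.0},
    {"rating": "Red", "pct_students_of_color": 94.0, "pct_frl": 91.0, "discipline_per_100": 9.0},
]
FIELDS = ("pct_students_of_color", "pct_frl", "discipline_per_100")


def mean_by_rating(records, fields):
    """Average each field within each SPF rating bracket."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["rating"]].append(record)
    return {
        rating: {field: mean(r[field] for r in rows) for field in fields}
        for rating, rows in grouped.items()
    }


def disparity_ratio(summary, field, low="Red", high="Blue"):
    """Ratio of the lowest-rated bracket's mean to the highest-rated bracket's mean."""
    return summary[low][field] / summary[high][field]


summary = mean_by_rating(SCHOOLS, FIELDS)
for field in FIELDS:
    print(f"{field}: Red/Blue ratio = {disparity_ratio(summary, field):.2f}")
```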

Bonilla-Silva (2006) reminds us that white supremacy is not limited to a few extremist individuals but rather permeates the worldviews and institutions that constitute our shared social reality. The SPF reflects this dynamic: although nothing in the accountability policy, its metrics, or its goals states an intent to perpetuate the marginalization of historically marginalized communities, in practice (and in history) this is the outcome. Stated goals, overt acknowledgement, or even purposeful intentions are irrelevant (Leonardo, 2004), and it is likely that many if not all of those who worked to craft and implement this accountability policy did so without malice. And yet, once again, institutional policies and practices produced the same outcome, in which the marginalization of communities already battling intergenerational marginalization not only continued but was legitimized through ostensibly ideologically neutral metrics like test scores. Investigations like the one undertaken in this study are necessary because there is no warning label on policies that result in marginalization; there are no written statements from policymakers announcing their intent to harm historically marginalized communities, as such overt intent is both likely nonexistent and certainly irrelevant when evaluating the merit and consequences of these policies.

For these reasons, conducting equity reviews of the accountability policy is not only in line with the district’s stated goals and fair to the community but also imperative to ensure that the district is not perpetuating these historical ills. Using a lens grounded in Critical Race Theory to analyze education policy highlights the dynamic of purportedly race-neutral policies resulting in harms to racialized populations by asking simple questions such as, ‘Who is the policy designed by? Who does it benefit and harm? What are the outcomes?’ (Gillborn, 2005). Doing so reveals the ways that education policies such as accountability reflect, perpetuate, and legitimize the interests and worldviews of those who benefit from a white supremacist status quo at the expense of racially marked communities. Policies like the SPF, which result in schools with larger proportions of students of color being more likely to be labeled as failures and closed, perpetuate worldviews that frame marginalized students as causing their own marginalization and thus deserving of its adverse consequences, while actively discouraging teachers from working with such students. This worldview is incompatible with one in which all students are capable of success and deserving of opportunities. For these reasons, this study strongly recommends that the district adopt regular equity reviews of its accountability policies, as failing to do so not only allows ineffective systems to continue but also fundamentally fails the district’s goals and responsibilities toward the communities it serves.

Evaluating Success of Accountability Framework

Such a review of the district’s accountability framework could also identify trends of schools gaining or losing high-rated and low-rated accountability status over time. This study found that during the study timeframe the SPF was unsuccessful in prompting greater school success, as evidenced by the downward trend in which increasing numbers of schools were given low SPF ratings while decreasing numbers of schools were given high ratings. These findings indicate that, despite the behaviorist and market logics that undergird school accountability (Trujillo & Renée, 2015), not all accountability systems will be equally successful in achieving their goals of promoting improvements in learning outcomes and school quality (Fuller & Johnson, 2001).

Incorporating regular reviews of how the accountability framework is functioning, in addition to the equity reviews described above, can help districts evaluate the efficacy of their accountability policies. Findings from this study indicate that accountability policies can be ineffective deterrents against rising rates of low performance. If an accountability policy is found to be ineffective in promoting the learning outcomes and school quality metrics it seeks to advance, then district leaders and policymakers would have data supporting the need to revise their policies so that the accountability frameworks they implement have better chances of achieving their purposes. Rather than aiming for one permanent, correct system, regular review of efficacy and equity would encourage districts to engage in cycles of learning and inquiry, allowing them to adjust to the changing needs and contexts of the students and communities they serve.
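One simple way to operationalize such a review of framework efficacy is to track, year over year, the share of schools in the Intervention (Red or Orange) and High Performing (Green or Blue) groupings used in this study. The sketch below assumes hypothetical (year, rating) pairs; the helper names are illustrative, and a flag like `is_worsening` is only a descriptive screen, not a test of whether the policy caused the trend.

```python
from collections import Counter

# Simplified SPF groupings as defined in this study.
INTERVENTION = {"Red", "Orange"}
HIGH_PERFORMING = {"Green", "Blue"}

# HYPOTHETICAL (year, rating) pairs for illustration only.
RATINGS = (
    [("2016-2017", r) for r in ("Blue", "Green", "Green", "Yellow", "Red")]
    + [("2017-2018", r) for r in ("Blue", "Green", "Yellow", "Orange", "Red")]
    + [("2018-2019", r) for r in ("Green", "Yellow", "Yellow", "Orange", "Red")]
)


def share_by_year(ratings, bracket):
    """Share of rated schools in each year whose rating falls in `bracket`."""
    totals = Counter(year for year, _ in ratings)
    hits = Counter(year for year, rating in ratings if rating in bracket)
    return {year: hits[year] / totals[year] for year in sorted(totals)}


def is_worsening(low_share, high_share):
    """True if the low-rated share never falls and the high-rated share
    never rises from one year to the next."""
    years = sorted(low_share)
    return all(
        low_share[a] <= low_share[b] and high_share[a] >= high_share[b]
        for a, b in zip(years, years[1:])
    )


low = share_by_year(RATINGS, INTERVENTION)
high = share_by_year(RATINGS, HIGH_PERFORMING)
print("Intervention share:", low)
print("High Performing share:", high)
print("Worsening trend:", is_worsening(low, high))
```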

Failure to incorporate reviews of the success and efficacy of accountability frameworks risks imposing adverse consequences on students, teachers, and communities without any benefit. Like many accountability frameworks, the SPF functioned through behaviorist logics that saw sanctions and other negative consequences as mechanisms to spur desired changes in outcomes (Dworkin, 2005; Finnigan & Gross, 2007). These negative consequences included the loss of students and the funds they bring under the district’s universal choice model (Asmar, 2019a), which reduced low-rated schools’ ability to afford the teachers and programs that made them attractive destinations in the first place (Asmar, 2019b), in addition to reduced teacher pay (Asmar, 2016) and district intervention in the form of required improvement plans and possible restart or closure (Asmar, 2018; Denver Public Schools, 2018). If an accountability system results in students and teachers facing dwindling resources and enrollment, negative repute, reduced teacher pay, and possible elimination, then these adverse consequences must at least serve the admirable goal of improving outcomes for students. However, if an accountability system is found not even to improve learning outcomes for students or to promote greater school quality or school success generally, then these punishments serve only to harm students and teachers for no purpose. Because this study found that historically marginalized students are concentrated in the schools most likely to receive such punishments, it is especially imperative for districts to evaluate the effectiveness of their accountability systems, lest these students and their teachers be adversely impacted for no other reason than a faulty, ineffective accountability system.

Identifying Needed Supports

In addition, employing methods similar to those used in this study can help districts identify school and student needs that the SPF did not address. For example, this study found that the lowest-rated schools had higher rates of discipline and lower rates of Fully Qualified teachers than the highest-rated schools. However, because neither of these metrics was measured by the SPF, the framework was unable to identify how these disparate school contexts might have contributed to disparate achievement outcomes, thus obscuring information that could have prompted the district to offer appropriate supports and interventions. By considering the non-achievement contexts of schools that accountability frameworks typically disregard, districts can provide targeted interventions and supports that reflect actual disparities in schools, thereby ensuring that all students are provided the resources and learning environments conducive to school success.

Policymakers and district leaders could use such information to craft accountability policy interventions. For example, because this study identified disparate rates of Fully Qualified teachers in the highest- and lowest-rated schools, an appropriate response would be to place more Fully Qualified teachers in the lowest-rated schools, either by relocating them or by investing in the training and incentives needed to ensure that teachers already practicing at those schools can become Fully Qualified. Similarly, in response to the findings that the lowest-rated schools experience disciplinary actions, incidents, and actions resulting in instructional loss at nearly double the rate of the highest-rated schools, and that rates of disciplinary actions resulting in instructional loss in fact predict SPF scores, an appropriate intervention could be to provide these schools with additional support and training in restorative justice and other non-punitive social and emotional supports that address students’ behavioral needs without resulting in instructional loss.

Because what is not measured is not acted upon, attending to school context variables can help district leaders identify and provide the resources, services, and supports that students need but may not be receiving. Although research has long identified the benefits of bilingual education for emergent bilingual students (Ramírez, 1992; Rolstad, Mahoney & Glass, 2005; Thomas & Collier, 1997), on average fewer than one in five English Learner students in DPS received any type of native language support. An accountability framework that includes metrics describing whether emergent bilingual students have access to bilingual education would implicitly encourage schools to provide the resources that research has established as beneficial to these students. Furthermore, such metrics could be included in accountability frameworks differentiated to each school’s context and student needs, representing an accountability system that both understands that different student communities have different needs and seeks to provide for those needs through differentiated metrics that encourage schools to meet them.

Another example relates to this study’s finding regarding Gifted and Talented (GT) participation in the highest- and lowest-rated schools: Red schools identified students for GT about half as often as Blue schools, which not only had higher rates of GT participation for all students but also placed English Learners in GT at five times the rate of Red schools. If dynamics such as GT participation rates were measured, then the accountability framework could use findings like these to identify the need for more teacher training to address cultural, linguistic, and racial biases that might prevent teachers from nominating students from historically marginalized backgrounds for such placement. Another response could be the inclusion of a metric evaluating the proportionate representation of historically marginalized students in GT programs in order to encourage the sorts of programmatic changes the accountability movement is premised upon. Including such metrics beyond test scores would allow accountability frameworks to identify and address these kinds of disparate school contexts and opportunities provided to students.
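A proportionate-representation metric of the kind suggested above could take many forms. One common, simple option, assumed here rather than drawn from this study, is a composition-based representation index: a group’s share of GT participants divided by its share of enrollment, where 1.0 indicates proportionate representation. The function name and school counts below are hypothetical.

```python
def representation_index(group_in_program: int, program_total: int,
                         group_enrolled: int, enrollment_total: int) -> float:
    """Group's share of program participants divided by its share of
    enrollment. 1.0 means proportionate representation; values below 1.0
    mean underrepresentation."""
    if program_total <= 0 or enrollment_total <= 0 or group_enrolled <= 0:
        raise ValueError("totals and group enrollment must be positive")
    program_share = group_in_program / program_total
    enrollment_share = group_enrolled / enrollment_total
    return program_share / enrollment_share


# HYPOTHETICAL school: 400 students, 100 of them English Learners;
# 40 students in GT, 4 of them English Learners.
el_gt_index = representation_index(4, 40, 100, 400)
print(f"EL representation index in GT: {el_gt_index:.2f}")
```

Alternatives such as risk ratios, which compare identification rates between groups, may be preferable when group sizes are small; the choice of metric would be a policy decision.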

These are only a few examples of the ways that an accountability framework that

measures school context variables can identify the contexts which distinguish high- and low-

performing schools and thus provide the supports and interventions necessary to equalize the

learning environments and resources students enjoy in each. As such, these findings not only

highlight specific interventions that are likely necessary in Denver Public Schools, but the utility

of an accountability framework that thinks outside the narrow confines of test scores to measure

contextual factors and thus address root causes of achievement disparities.

Evaluating Success and Needs of Charters

Finally, this study highlighted the distinctive contexts of charter schools, which may inform understanding of the role these schools can play in improving learning outcomes for historically marginalized students. This study found that charters were overrepresented in the lowest-rated SPF brackets (the Red and Orange rating brackets, referred to in this study as Intervention Status) and were less likely to enter or remain in the highest SPF rating bracket, Blue. In addition, although charters on average served larger proportions of historically marginalized students than district-run schools did, including students of color, students receiving Free and Reduced Lunch services, and English Learner students, they did so in learning environments that were distinct from those of district-run schools and difficult to reconcile with providing these students equitable educational opportunities. For example, although on average more than 90% of EL families in charters preferred some type of language support for their children, on average less than 4% of ELs in charters were provided with these resources. Not only were these historically marginalized students denied the language supports their families requested, but all students in charters learned in disciplinary environments that appeared to be much harsher than those of district-run schools, with nearly double the average rates of disciplinary actions, incidents, and actions resulting in instructional loss. This dynamic suggests that, rather than rectifying the shortcomings of public schools, charter schools were replicating and exacerbating some of the very same problems, such as the discipline disparities that negatively impact students of color and students in poverty (Bryan, Day-Vines, Griffin & Moore-Thomas, 2012; Skiba, Chung, Trachok, Baker, Sheya & Hughes, 2014; US Commission on Civil Rights, 2018) and the denial of adequate supports to emergent bilingual students (Redford, 2018). Because charters did not achieve improved learning outcomes, as indicated by their propensity toward low SPF ratings, this study suggests that in Denver Public Schools charter schools may not be the solution to public school challenges that some perceive them to be (Chubb & Moe, 2011) but may instead be amplifying the inequitable practices historically marginalized students face in schools today (Kantor & Lowe, 2016).

Implications for Researchers

The quantitative data and methods employed by this study are valuable tools for district leaders and policymakers seeking to advance more equitable educational systems for historically marginalized students, and they can and should also be used by education researchers whose work advocates for these same ends. This study paid special attention to the characteristics of, and services provided to, students who carry the English Learner label, as these students’ frequently racialized and inherently linguistically marked statuses have resulted in an extensive and well-documented history of these students being poorly served in public schools (Commins & Miramontes, 1989; MacSwan, 2005; Poza, 2016; San Miguel & Donato, 2010; Santa Ana, 2004; Shannon & Escamilla, 1999; Valdés, 1998). Although quantitative data and methods are not uncommon in the field of bilingual education, as evidenced by assessment (Buono & Jang, 2021) and mixed methods (Hopewell, 2011) studies, this dissertation argues that expanding the use of these tools can increase the effectiveness of research advocacy in service of emergent bilingual students and families.

Currently, work that frames bilingual research and teaching as advocacy is dominated by qualitative studies (Palmer, 2018). While the prioritization of qualitative data and methods in highlighting the experiences and needs of historically marginalized communities, such as emergent bilinguals and others whose language practices are marked, is well founded in Critical Race methods and literature (DeCuir & Dixson, 2004; Delgado, 1989), this study proposes that researchers who view their work as advocacy in service of bilingual communities would be well served to expand their methodological approaches to include more quantitative tools in line with QuantCrit.

Understandably, there might be hesitation to use quantitative data and analysis in advocacy research, as such tools have historically been employed to legitimize the very sort of oppressive institutional policies and practices (Bonilla-Silva & Zuberi, 2008) that social justice researchers seek to dismantle. However, this study demonstrates the potential for bilingual education researcher advocates to reclaim quantitative data and analysis in service of our goals. By using such approaches, this study revealed trends of interest to bilingual education researchers dedicated to promoting equitable educational experiences for bilingual students, such as the following: charter schools placed English Learner students in environments designed for their success and in accordance with their families’ preferences at marginal rates; no ELs in the lowest-rated schools participated in Dual Language programs; the lowest-rated schools had the largest gap between the parents who wanted native language programming (averaging 40% of parents’ preferences) and the ELs who received it (averaging 10% of ELs); nowhere in the district did all families who wanted native language programming for their EL students receive it (on average 39.1% of parents wanted it, but only 16.7% of ELs received it); Spanish-speaking ELs were overrepresented in the lowest-rated schools; although approximately one in three students were ELs, only one in 30 ELs were in GT programs; and in all schools, English Learner participation rates in Special Education programs were disproportionate to ELs’ share of the overall student population, with this disproportionality most severe in the highest-rated schools.

Findings such as these can be used by bilingual education researchers for advocacy purposes, not only by highlighting areas in which emergent bilingual students are underserved and to which researcher advocates should therefore attend, but also by providing us with quantitative data that can be easily disseminated in research publications, policy briefs, and other avenues through which we work directly with policymakers in the hope of effecting institutional reforms. For example, the finding that Spanish-speaking ELs were overrepresented in the lowest-rated schools could point to the need for qualitative research into teachers’ raciolinguistic language ideologies regarding English-Spanish bilingualism, while the finding that parent demand for native language programming far exceeds what is currently provided could inform policy research and advocacy that prompts districts to provide more of these services and schools of education to invest more in preparing the bilingual teachers these programs require. Similarly, the finding that ELs participated in Special Education programs at rates disproportionate to their share of the overall student population could serve as evidence toward the legal standard for proving "discriminatory impact" (Haney, 2000), which bilingual education researcher advocates could then use to push district leaders, policymakers, and legislators to revise policies so that emergent bilingual students are better served in public schools.

Despite its problematic history, quantitative data analysis remains an evidentiary threshold that such decision makers use when crafting policy. Acknowledging this does not minimize the problematic tendency to present quantitative data as objective and value-neutral; rather, it accepts that, despite being ideological in nature, such evidence is effective in speaking to those with the power to enact the change for which we are fighting (Crawford, Demack, Gillborn & Warmington, 2019). In addition, bilingual education researchers can use quantitative data to inform future projects and to substantiate our recommendations for increased supports and investments in both district and teacher education contexts.

Implications for Teachers, Families, and Advocates

Finally, this work has particularly important implications for the students, families, and teachers who are adversely impacted by flawed accountability policies like the School Performance Framework, as well as for those who consider themselves allies and advocates for such communities. Because marginalized populations may internalize deficit ideologies (Kohli, 2014), research must provide counterevidence whenever possible. This research provides empirical evidence that the flaw lies in the systems used to manage and evaluate students, not in the students themselves. Absent such evidence, policies like the SPF, which report that historically marginalized students are concentrated in ‘failing’ schools, implicitly place responsibility for that failure on students and teachers rather than on an ineffective and biased accountability system or on a school system that denies them the opportunities, resources, and supports they deserve. It is therefore little wonder that families, students, and teachers come to interpret the disparate outcomes of accountability as reflections of disparate abilities, talents, and merits. This interpretation is not only inaccurate but deeply harmful. For this reason, this work is intended not only for policymakers and researchers but also for those who are subjected to the worst outcomes of biased educational policies, in the hope that offering alternative interpretations of academic disparities can deter the internalization of blame for them.

Without such data, families and teachers may come to interpret disparate accountability and achievement outcomes as reflections of personal failure. Even if they do not, families may erroneously believe that if they can only find an alternative school, such as through the allure of charters, their children will have greater opportunities and success. Sadly, the results of the regression analyses in this study suggest otherwise, as they indicated that the proportion of students of color in a school was a statistically significant predictor of both accountability scores and outcomes. This means that if schools are subjected to biased accountability frameworks, or if school systems do not provide equitable resources and opportunities to students of color, there is no “escaping” these low ratings and disparate outcomes. Because the low ratings are related to student demographics and not necessarily to student learning or school quality, they are predicted to follow students who change schools. Such information is important for families and teachers to have, not only because it shifts the blame for low accountability scores away from them personally but also because it clarifies that simply changing schools is unlikely to result in improvement, since the accountability system, and possibly the district mechanisms for allocating resources and opportunities, are the cause of the disparate outcomes, not the students or the schools.

Using methods and data like those employed in this study can directly counter such deficit interpretations of academic disparities by highlighting the role that non-student, non-teacher, and non-family factors play in producing disparate outcomes such as the overrepresentation of historically marginalized students in low-rated schools. Just as Critical Race scholars use counterstories to contest dominant deficit narratives (Ladson-Billings, 2013a), quantitative research that highlights the institutional mechanisms by which historically marginalized communities are further disenfranchised can also serve as a counternarrative, as such data dispel interpretations that place the blame for academic disparities on those communities themselves. Teachers, families, and students who are subjected to deficit narratives that attribute responsibility for academic disparities to them personally deserve access to counterevidence that more accurately ascribes responsibility to the policymakers and district leaders who craft and enact the accountability frameworks and educational policies that lead to biased outcomes and inadequate attention to students’ needs.

Further, these data and methods are accessible to a wide range of audiences. Because I have only a limited background in statistical methods, the research design of this study by necessity reflects an intuitive approach to quantitative data and analysis that I have found to be accessible to the teachers, advocates, and families with whom I share this work. As such, for teachers, families, students, and their allies, the approach used in this dissertation can offer a means of communicating about and understanding educational disparities that is accessible to policymakers and community members alike, thus offering not only counternarratives to combat the internalization of deficit views but also tools to advocate for educational policies and practices that better serve historically marginalized students.

Limitations

This is not to say the study is without limitations. The focus of the study, the School Performance Framework, was discontinued in 2020 and replaced by the accountability framework developed and used by the Colorado Department of Education, also called the School Performance Framework (Denver Public Schools, n.d.-d), so the issues and shortcomings explored here no longer have a current referent in the district. However, because of the centrality of racism to US history and institutions (Ladson-Billings, 2013a), similar investigations of other iterations of accountability policy will still be needed, as the disparate impacts of accountability frameworks that disadvantage historically marginalized communities are not isolated to Denver (Glynn & Waldeck, 2013; Harris, 2007; Lakin & Young, 2013; Martinez-Garcia, LaPrairie & Slate, 2011; McNeil, Coppola, Radigan & Vasquez Heilig, 2008; Menken, 2006; Reyes & Garcia, 2014; Tsang, Katz & Stack, 2008; Vasquez Heilig & Darling-Hammond, 2008; Wu, 2013). Nonetheless, the fact that this study describes the specific nature and outcomes of an accountability policy that is no longer used limits the utility of its findings, although the implications for policymakers, researchers, and community members to similarly employ quantitative data and methods remain, as does the need for future iterations of accountability to be reviewed for equity and efficacy, not only in Denver but in any district employing similar accountability frameworks.

In addition, because the multiple regressions in this study treated student demographic categories as discrete rather than intersectional, this study may perpetuate inaccurate representations of student identities that compromise the utility and accuracy of the findings (Covarrubias & Vélez, 2013; Covarrubias, Nava, Lara, Burciaga, Vélez & Solórzano, 2018). Future studies employing similar methods for similar purposes would be well served to expand the methodological framework in order to produce more nuanced findings and more accurately represent the intersectional identities of the historically marginalized communities at the heart of this study.

Conclusion

This study used QuantCrit and Critical Race frameworks to examine the student demographic, school context, and English Learner characteristics and services that previous research suggested affect students’ learning opportunities and outcomes but that were extrinsic to the accountability framework used in Denver Public Schools. The central finding of this study confirmed the need for this analysis, as these factors were all reflected in accountability outcomes yet not officially measured by the accountability framework. This finding indicates that the SPF used by Denver Public Schools was a measure not only of student learning or school quality but also of student demographics, school contexts, and English Learner characteristics and services. Yet, because it did not actually measure these factors, the accountability framework was unable to identify and respond to how they appear to relate to the disparate learning outcomes that the SPF did measure. The disconnect between the learning outcomes the SPF purported to measure and the extrinsic factors it actually reflected resulted in an accountability framework with limited ability to identify the needs of low-performing schools and thereby provide needed interventions and supports, which likely accounts for the finding that more and more schools became low-performing over time despite the accountability framework’s intention to have the opposite effect.

This study recognizes that the accountability movement has roots in the struggle of historically marginalized communities to create more equitable learning environments for their students. Yet the way accountability was implemented in Denver Public Schools appears to have had the opposite effect, penalizing schools and teachers for working with larger proportions of these students and offering solutions in the form of charters, which further exacerbated the inequities in the environments and supports these students received. As such, this study highlights the need for accountability policy to be more intentional in its design and implementation, with a greater focus on evaluating non-test metrics of school needs and contexts in order to provide the supports necessary to equalize the learning environments and opportunities between the lowest- and highest-rated schools. Other non-test metrics, such as student demographics, must also be measured to ensure that accountability systems do not reproduce the disenfranchisement of historically marginalized communities. This study recommends that equity checks be incorporated into any accountability policy to ensure that adverse impacts are not disproportionately felt by historically marginalized students, and that the outcomes of these checks be published so that students and families can evaluate both the efficacy and the fairness of accountability results.
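A minimal sketch of what one such equity check might compute, assuming school-level counts of total enrollment and of a focal group (for example, English Learners) alongside each school’s SPF color rating. The class and function names, the sample figures, and the 1.25 flagging threshold are hypothetical illustrations, not values from this study or from any district policy. The check compares the share of focal-group students enrolled in Intervention Status (Red or Orange) schools with the same share for all other students; a ratio above 1 means the focal group is more likely to attend a low-rated school.

```python
from dataclasses import dataclass

# Red and Orange together correspond to "Intervention Status" in this study's terms.
INTERVENTION_STATUS = {"Red", "Orange"}


@dataclass
class School:
    name: str      # hypothetical school identifier
    rating: str    # SPF color rating, e.g. "Blue", "Green", "Yellow", "Orange", "Red"
    enrolled: int  # total students enrolled
    focal: int     # students in the focal group (e.g., English Learners)


def intervention_risk_ratio(schools: list[School]) -> float:
    """Share of focal-group students in Red/Orange schools divided by the
    share of all other students in Red/Orange schools."""
    focal_total = sum(s.focal for s in schools)
    other_total = sum(s.enrolled - s.focal for s in schools)
    low_rated = [s for s in schools if s.rating in INTERVENTION_STATUS]
    focal_low = sum(s.focal for s in low_rated)
    other_low = sum(s.enrolled - s.focal for s in low_rated)
    if focal_total == 0 or other_total == 0 or other_low == 0:
        raise ValueError("risk ratio is undefined for these counts")
    return (focal_low / focal_total) / (other_low / other_total)


def equity_check(schools: list[School], threshold: float = 1.25) -> tuple[float, bool]:
    """Return the risk ratio and whether it exceeds the (hypothetical) threshold."""
    ratio = intervention_risk_ratio(schools)
    return ratio, ratio > threshold


if __name__ == "__main__":
    # Invented district data for illustration only.
    district = [
        School("A", "Blue", 500, 50),
        School("B", "Green", 400, 120),
        School("C", "Red", 300, 180),
        School("D", "Orange", 200, 110),
    ]
    ratio, flagged = equity_check(district)
    print(f"risk ratio = {ratio:.2f}, flagged = {flagged}")
```

Run as a script, the invented data yield a ratio of about 2.82, which the hypothetical threshold flags. Where to set that threshold is a policy decision rather than a statistical one, and publishing the ratio itself alongside ratings would let families judge the result directly.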

Perhaps more importantly, this study used quantitative data to highlight the disparities in school contexts, services, and student populations between the highest- and lowest-rated schools as a means of providing a counternarrative to the deficit view that would ascribe disparate learning outcomes to student and teacher failure. The finding that the accountability policy employed by the district reflected student demographic and school context metrics extrinsic to the framework is a valuable counternarrative to the deficit ideologies that would ascribe responsibility for educational disenfranchisement to the communities that suffer it rather than to the district leaders and policymakers who allocate resources and opportunities. The research methods and data employed here are not only fruitful means of producing such counternarratives but also useful tools for policymakers, bilingual education researchers and advocates, and community members and allies who likewise seek to identify the mechanisms by which educational policy reproduces and legitimizes marginalization. Doing so can help us explore how educational policies can be revised and converted into a means of equitably serving and empowering the students of color, students in poverty, and especially emergent bilingual students for whom I hope this study has been of service.

Although I will never see, experience, or understand the world as my friends and family do, I have witnessed the casual, chronic, and systemic abuses they have endured in public schools. Because of this, I hope this work succeeds not only in exposing the ways that accountability policy results in marginalization but also in aiding the pursuit of better educational systems that treat all children with the love and humanity they deserve. This investigation into the disparate outcomes of accountability policy strives to highlight the places where current policy fails to serve the raced, classed, and linguistically marked students in Denver. In doing so, I hope this project serves the work of all those who strive toward creating a better, more equitable system for my daughter, nieces, and nephews, and all the students like them.

References

Abedi, J. (2004). The No Child Left Behind Act and English language learners: Assessment and accountability issues. Educational Researcher, 33(1), 4-14.

Adams, C. M., Forsyth, P. B., Ware, J. K., Mwavita, M., Barnes, L. L., & Khojasteh, J. (2016). An empirical test of Oklahoma’s A-F grades. Education Policy Analysis Archives, 24, 4.

Akiba, M., LeTendre, G. K., & Scribner, J. P. (2007). Teacher quality, opportunity gap, and national achievement in 46 countries. Educational Researcher, 36(7), 369-387.

Alspaugh, J. W. (1994). The relationship between school size, student teacher ratio and school efficiency. Education, 114(4), 593-602.

Ambrosio, J. (2013). Changing the subject: Neoliberalism and accountability in public education. Educational Studies, 49(4), 316-333.

Anderson, K. T., & Holloway, J. (2020). Discourse analysis as theory, method, and epistemology in studies of education policy. Journal of Education Policy, 35(2), 188-221.

Anyon, Y., Wiley, K., Samimi, C., & Trujillo, M. (2021). Sent out or sent home: Understanding racial disparities across suspension types from critical race theory and QuantCrit perspectives. Race, Ethnicity and Education, 1-20.

Asmar, M. (2016a, May 12). Here’s how Denver Public Schools will decide to close low-performing schools. Chalkbeat. https://co.chalkbeat.org/2016/5/12/21103235/here-s-how-denver-public-schools-will-decide-to-close-low-performing-schools

Asmar, M. (2016b, November 16). Which Denver schools are falling short on the school district’s new equity rating? Chalkbeat. https://co.chalkbeat.org/2016/11/16/21100574/which-denver-schools-are-falling-short-on-the-school-district-s-new-equity-rating

Asmar, M. (2016c, October 27). Your guide to understanding Denver Public Schools’ color-coded school rating system. Chalkbeat. https://co.chalkbeat.org/2016/10/27/21100475/your-guide-to-understanding-denver-public-schools-color-coded-school-rating-system

Asmar, M. (2017, December 4). Why Denver’s school rating system is coming under fire on multiple fronts. Chalkbeat. https://co.chalkbeat.org/2017/12/4/21103858/why-denver-s-school-rating-system-is-coming-under-fire-on-multiple-fronts

Asmar, M. (2018, October 16). Closure is still an option, but a new approach will let struggling Denver schools make their case. Chalkbeat. https://co.chalkbeat.org/2018/10/16/21105926/closure-is-still-an-option-but-a-new-approach-will-let-struggling-denver-schools-make-their-case

Asmar, M. (2019a, April 3). Calls are mounting to change Denver’s school rating system. Here’s how it works now. Chalkbeat. https://chalkbeat.org/posts/co/2019/04/03/calls-are-mounting-to-change-denvers-school-rating-system-heres-how-it-works-now/

Asmar, M. (2019b, October 12). Record number of Denver schools earn top ratings on latest district quality scale. Chalkbeat. https://chalkbeat.org/posts/co/2017/10/12/record-number-of-denver-schools-earn-top-ratings-on-latest-district-quality-scale/

Asmar, M. (2020a, June 10). Black students in Denver are much more likely to be ticketed or arrested at school. Chalkbeat. https://co.chalkbeat.org/2020/6/10/21287249/black-students-denver-more-likely-ticketed-arrested

Asmar, M. (2020b, May 4). Committee: Denver should adopt Colorado school rating system, plus additional data. Chalkbeat. https://co.chalkbeat.org/2020/5/4/21247438/reimagine-spf-committee-denver-recommendations-school-ratings

Asmar, M. (2020c, August 21). Denver discards school rating system, will move forward with an information dashboard. Chalkbeat. https://co.chalkbeat.org/2020/8/21/21386185/denver-discards-school-rating-system-will-move-forward-with-an-information-dashboard

Asmar, M. (2021, November 5). Denver to develop criteria for when to close under-enrolled schools. Chalkbeat. https://co.chalkbeat.org/2021/11/5/22762476/denver-school-closure-consolidation-develop-criteria

Asmar, M. (2022, May 23). Denver superintendent’s goals include dismantling ‘oppressive systems.’ Chalkbeat. https://co.chalkbeat.org/2022/5/23/23138733/denver-alex-marrero-superintendent-goals-school-board

Baker, C., & Wright, W. E. (2017). Foundations of bilingual education and bilingualism (6th ed.). Bristol; Buffalo: Multilingual Matters.

Bates, L. A., & Glick, J. E. (2013). Does it matter if teachers and schools match the student? Racial and ethnic disparities in problem behaviors. Social Science Research, 42(5), 1180-1190.

Bell, D. A. (1980). Brown v. Board of Education and the interest-convergence dilemma. Harvard Law Review, 93(3), 518-533.

Bell, D. A. (1992). Faces at the bottom of the well: The permanence of racism. New York: Basic Books.

Bialystok, E., Craik, F. I., Green, D. W., & Gollan, T. H. (2009). Bilingual minds. Psychological Science in the Public Interest, 10(3), 89-129.

Black, W. R. (2006). Constructing accountability performance for English language learner students: An unfinished journey toward language minority rights. Educational Policy, 20(1), 197-224.

Blanchett, W. J., Klingner, J. K., & Harry, B. (2009). The intersection of race, culture, language, and disability: Implications for urban education. Urban Education, 44(4), 389-409.

Bonilla-Silva, E. (2006). Racism without racists: Color-blind racism and the persistence of racial inequality in the United States. Rowman & Littlefield Publishers.

Bonilla-Silva, E., & Zuberi, T. (2008). Toward a definition of white logic and white methods. In T. Zuberi & E. Bonilla-Silva (Eds.), White logic, white methods: Racism and methodology (pp. 9-31). Lanham: Rowman & Littlefield Publishers.

Borman, G. D., & Kimball, S. M. (2005). Teacher quality and educational equality: Do teachers with higher standards-based evaluation ratings close student achievement gaps? The Elementary School Journal, 106(1), 3-20.

Bourdieu, P., & Thompson, J. B. (1991). Language and symbolic power. Cambridge, Mass.: Harvard University Press.

Brewer, C., Knoeppel, R. C., & Lindle, J. C. (2015). Consequential validity of accountability policy: Public understanding of assessments. Educational Policy, 29(5), 711-745.

Brooks, M. D. (2020). Transforming literacy education for long-term English Learners: Recognizing brilliance in the undervalued. Routledge, Taylor & Francis Group.

Bryan, J., Day-Vines, N. L., Griffin, D., & Moore-Thomas, C. (2012). The disproportionality dilemma: Patterns of teacher referrals to school counselors for disruptive behavior. Journal of Counseling & Development, 90(2), 177-190.

Buono, S., & Jang, E. E. (2021). The effect of linguistic factors on assessment of English language learners’ mathematical ability: A differential item functioning analysis. Educational Assessment, 26(2), 125-144.

Burman, E., Greenstein, A., Bragg, J., Hanley, T., Kalambouka, A., Lupton, R., ... & Winter, L. (2017). Subjects of, or subject to, policy reform? A Foucauldian discourse analysis of regulation and resistance in UK narratives of educational impacts of welfare cuts: The case of the ‘bedroom tax’. Education Policy Analysis Archives, 25, 26.

Callahan, R. M., & Hopkins, M. (2017). Policy brief: Using ESSA to improve secondary English learners’ opportunities to learn through course taking. Journal of School Leadership, 27(5), 755-766.

Campbell-Montalvo, R. A. (2020). Being QuantCritical of U.S. K-12 demographic data: Using and reporting race/ethnicity in Florida heartland schools. Race, Ethnicity and Education, 23(2), 180-199.

Card, D., & Giuliano, L. (2016). Universal screening increases the representation of low-income and minority students in gifted education. Proceedings of the National Academy of Sciences, 113(48), 13678-13683.

Chubb, J. E., & Moe, T. M. (2011). Politics, markets, and America's schools. Brookings Institution Press.

Clotfelter, C. T., Ladd, H. F., & Vigdor, J. (2005). Who teaches whom? Race and the distribution of novice teachers. Economics of Education Review, 24(4), 377-392.

Coleman, J. S., Campbell, E. Q., Hobson, C. J., McPartland, J., Mood, A. M., Weinfeld, F. D., & York, R. L. (1966). Equality of educational opportunity. Washington, DC: U.S. Government Printing Office.

Colorado Department of Education. (2019). A look back: History of performance frameworks in Colorado. https://www.cde.state.co.us/accountability/historyofperformanceframeworks

Commins, N. L., & Miramontes, O. B. (1989). Perceived and actual linguistic competence: A descriptive study of four low-achieving Hispanic bilingual students. American Educational Research Journal, 26(4), 443-472.

Consent Decree of the U.S. District Court. (2012). Consent Decree of the U.S. District Court: Denver Public Schools; English Language Acquisition Program. Denver Public Schools. http://thecommons.dpsk12.org/cms/lib/CO01900837/Centricity/domain/48/governance/Consent%20Decree%20about%20ELA-PACs.pdf

Contreras, R. (2011, April 4). East Los Angeles students walkout for educational reform (East L.A. Blowouts), 1968. Global Nonviolent Action Database. https://nvdatabase.swarthmore.edu/content/east-los-angeles-students-walkout-educational-reform-east-la-blowouts-1968

Covarrubias, A. (2011). Quantitative intersectionality: A critical race analysis of the Chicana/o educational pipeline. Journal of Latinos and Education, 10(2), 86-105.

Covarrubias, A., & Liou, D. D. (2014). Asian American education and income attainment in the era of post-racial America. Teachers College Record, 116(6), 1-38.

Covarrubias, A., & Vélez, V. (2013). Critical race quantitative intersectionality: An antiracist research paradigm that refuses to “let the numbers speak for themselves.” In M. Lynn & A. Dixson (Eds.), Handbook of critical race theory in education (pp. 270-285). New York: Routledge.

Covarrubias, A., Nava, P. E., Lara, A., Burciaga, R., & Solórzano, D. G. (2019). Expanding educational pipelines: Critical Race Quantitative Intersectionality as a transactional methodology. In J. T. DeCuir-Gunby, T. K. Chapman, & P. A. Schutz (Eds.), Understanding Critical Race Research Methods and Methodologies: Lessons From the Field (pp. 138-149). Routledge.

Covarrubias, A., Nava, P. E., Lara, A., Burciaga, R., Vélez, V. N., & Solórzano, D. G. (2018).

Critical race quantitative intersections: A testimonio analysis. Race, Ethnicity and

Education , 21 (2), 253– 273.

Cramer, E., Little, M. E., & McHatton, P. A. (2018). Equity, equality, and standardization:

Expanding the conversations. Education and Urban Society, 50(5), 483-501.

Crawford, C. E. (2019). The one-in-ten: Quantitative Critical Race Theory and the education of

the ‘new (white) oppressed’. Journal of Education Policy, 34(3), 423-444.

Crawford, C. E., Demack, S., Gillborn, D. & Warmington, P. (2019). Quants and crits: Using

numbers for social justice (or, how not to be lied to with statistics). In J. T. Decuir-

Gunby, T. K. Chapman & P. A. Schutz (Eds.), Understanding Critical Race Research

Methods and Methodologies: Lessons From the Field (pp. 125-137). Routledge.

Crenshaw, K. (1991). Mapping the margins: Intersectionality, identity politics, and violence

against women of color. Stanford Law Review, 43(6), 1241-1299.

Cruz, R. A., Kulkarni, S. S., & Firestone, A. R. (2021). A QuantCrit analysis of context,

discipline, special education, and disproportionality. AERA Open, 7, 233285842110413.

Cuban, L. (2004). Looking through the rearview mirror at school accountability. In K. A.

Sirotnik (Ed.), Holding accountability accountable: What ought to matter in public

education (pp. 18-34). London; New York: Teachers College Press.

Dabach, D. B. (2015). Teacher placement into immigrant English learner classrooms: Limiting

access in comprehensive high schools. American Educational Research Journal, 52(2),

243-274.

Darling-Hammond, L. (1998). Unequal opportunity: Race and education. The Brookings Review,

16(2), 28.

Darling-Hammond, L. (2000). Teacher quality and student achievement. Education Policy

Analysis Archives, 8, 1.

Darling-Hammond, L. (2004). The color line in American education: Race, resources, and student

achievement. Du Bois Review: Social Science Research on Race, 1(2), 213-246.

Darling-Hammond, L. (2007a). Race, inequality and educational accountability: The irony of 'no

child left behind'. Race Ethnicity and Education, 10(3), 245-260.

Darling-Hammond, L. (2013). Inequality and School Resources: What It Will Take to Close the

Opportunity Gap. In P. L. Carter & K. G. Welner (Eds.), Closing the opportunity gap:

What America must do to give every child an even chance (pp. 76-97). New York: Oxford

University Press.

Darling-Hammond, L., Bae, S., Cook-Harvey, C. M., Lam, L., Mercer, C., Podolsky, A., &

Stosich, E. L. (2016). Pathways to new accountability through the Every Student

Succeeds Act. Palo Alto, CA: Learning Policy Institute.

DeBray-Pelot, E., & McGuinn, P. (2009). The new politics of education: Analyzing the federal

education policy landscape in the post-NCLB era. Educational Policy, 23(1), 15-42.

DeCuir-Gunby, J., & Thandeka, K. (2019). Critical Race Theory, racial justice, and education:

Understanding Critical Race research methods and methodologies. In J. T. DeCuir-

Gunby, T. K. Chapman & P. A. Schutz (Eds.), Understanding Critical Race Research

Methods and Methodologies: Lessons From the Field (pp. 3-10). Routledge.

DeCuir, J. T., & Dixson, A. D. (2004). “So when it comes out, they aren’t that surprised that it is

there”: Using critical race theory as a tool of analysis of race and racism in education.

Educational Researcher, 33(5), 26-31.

Delgado, R. (1989). Storytelling for oppositionists and others: A plea for narrative. Michigan

Law Review, 87(8), 2411-2441.

Denver Public Schools (2018). Portfolio Management Team: Accountability Report Reflecting

on School Year 2016-2017.

https://drive.google.com/file/d/1fXoljVjQShaj8kAsljcYb0DHMW2jGHve/view

Denver Public Schools (n.d. - a). Portfolio Management Team: School Performance Compact.

https://portfolio.dpsk12.org/school-performance-compact/

Denver Public Schools (n.d. - b). Portfolio Management Team: School Performance Framework.

https://portfolio.dpsk12.org/school-performance-framework/

Denver Public Schools (n.d. - c). School Performance Framework: Learn more with an SPF

Report Guide. https://spf.dpsk12.org/en/understanding-your-spf-report/

Denver Public Schools (n.d. - d). School Performance Framework: Shifting to the Colorado

School Performance Framework. https://spf.dpsk12.org/en/

Denver Public Schools (n.d. - e). School Performance Framework: What does the SPF measure?

https://spf.dpsk12.org/en/what-does-the-spf-measure/

Diamond, J. B., & Spillane, J. P. (2004). High-stakes accountability in urban elementary schools:

Challenging or reproducing inequality? Teachers College Record, 106(6), 1145-1176.

Donato, R. (1997). The other struggle for equal schools: Mexican Americans during the civil

rights era. Albany, NY: State University of New York Press.

Donato, R., & Hanson, J. (2012). Legally white, socially “Mexican”: The politics of de jure and

de facto school segregation in the American southwest. Harvard Educational Review,

82(2), 202-225.

Dorn, S., & Ydesen, C. (2014). Towards a comparative and international history of school testing

and accountability. Education Policy Analysis Archives/Archivos Analíticos de Políticas

Educativas, 22, 1-11.

Dorner, L. M., Orellana, M. F., & Li-Grining, C. P. (2007). “I helped my mom,” and it helped

me: Translating the skills of language brokers into improved standardized test scores.

American Journal of Education, 113(3), 451-478.

Driscoll, D., Halcoussis, D., & Svorny, S. (2003). School district size and student performance.

Economics of Education Review, 22, 193–201.

Dworkin, A. G. (2005). The No Child Left Behind Act: Accountability, high-stakes testing, and

roles for sociologists. Sociology of Education, 78(2), 170–174.

Elmore, R., & Fuhrman, S. (1995). Opportunity to learn and the state role in education. Teachers

College Record, 96(3), 433-458.

Fairbairn, S. B., & Fox, J. (2009). Inclusive achievement testing for linguistically and culturally

diverse test takers: Essential considerations for test developers and decision makers.

Educational Measurement: Issues and Practice, 28(1), 10-24.

Fairclough, N. (1995). Media discourse. London; New York: E. Arnold.

Finn, J. L., Nybell, L. M., & Shook, J. J. (2010). The meaning and making of childhood in the

era of globalization: Challenges for social work. Children and Youth Services Review,

32(2), 246-254.

Finnigan, K. S., & Gross, B. (2007). Do accountability policy sanctions influence teacher

motivation? Lessons from Chicago's low-performing schools. American Educational

Research Journal, 44(3), 594-629.

Fitzgerald, K., Gordon, T., Canty, A., Stitt, R. E., Onwuegbuzie, A. J., & Frels, R. K. (2013).

Ethnic Differences in Completion Rates as a Function of School Size in Texas High

Schools. Journal of At-Risk Issues, 17(2), 1-10.

Flores, B. (2005). The intellectual presence of the deficit view of Spanish-speaking children in

the educational literature during the 20th century. Latino education: An agenda for

community action research, 75-98.

Fuller, E. J., & Johnson, J. F. (2001). Can state accountability systems drive improvements in

school performance for children of color and children from low-income homes?

Education and Urban Society, 33(3), 260-283.

Fusarelli, L. D. (2004). The potential impact of the No Child Left Behind Act on equity and

diversity in American education. Educational Policy, 18(1), 71–94.

Garces, L. M., Ishimaru, A. M., & Takahashi, S. (2017). Introduction to beyond interest

convergence: Envisioning transformation for racial equity in education. Peabody Journal

of Education, 92(3), 291-293.

Garcia, N. M., López, N., & Vélez, V. N. (2018). QuantCrit: Rectifying quantitative methods

through critical race theory. Race Ethnicity and Education, 21(2), 149-157.

Gershon, I. (2016). "I'm not a businessman, I'm a business, man": Typing the neoliberal self into

a branded existence. HAU: Journal of Ethnographic Theory, 6(3), 223.

Gillborn, D. (2005). Education policy as an act of white supremacy: Whiteness, critical race

theory and education reform. Journal of Education Policy, 20(4), 485-506.

Gillborn, D., Warmington, P., & Demack, S. (2018). QuantCrit: education, policy, ‘Big Data’

and principles for a critical race theory of statistics. Race Ethnicity and Education, 21(2),

158-179.

Glynn, T. P., & Waldeck, S. E. (2013). Penalizing diversity: How school rankings mislead the

market. The Journal of Law and Education, 42(3), 417.

Goldhaber, D., Lavery, L., & Theobald, R. (2015). Uneven playing field? Assessing the teacher

quality gap between advantaged and disadvantaged students. Educational Researcher,

44(5), 293-307.

Grindal, T., Schifter, L. A., Schwartz, G., & Hehir, T. (2019). Racial Differences in Special

Education Identification and Placement: Evidence Across Three States. Harvard

Educational Review, 89(4), 525-553.

Grissom, J. A., & Redding, C. (2015). Discretion and disproportionality: Explaining the

underrepresentation of high-achieving students of color in gifted programs. AERA Open,

2(1).

Guiton, G., & Oakes, J. (1995). Opportunity to learn and conceptions of educational equality.

Educational Evaluation and Policy Analysis, 17(3), 323-336.

Haney, W. (2000). The myth of the Texas miracle in education. Education Policy Analysis

Archives, 8(41), 41.

Hanushek, E. A. (1997). Assessing the effects of school resources on student performance: An

update. Educational Evaluation and Policy Analysis, 19(2), 141-164.

Hanushek, E. A., & Raymond, M. E. (2005). Does school accountability lead to improved

student performance? Journal of Policy Analysis and Management, 24(2), 297-327.

Harris, C. (1993). Whiteness as property. Harvard Law Review, 106(8), 1707-1791.

Harris, D. N. (2007). High-Flying schools, student disadvantage, and the logic of NCLB.

American Journal of Education, 113(3), 367-394.

Hartsock, N. (1997). The Feminist Standpoint. In L. J. Nicholson (Ed.), The second wave: A

reader in feminist theory. New York: Routledge.

Heubert, J. P., & Hauser, R. M. (1999). High stakes: Testing for tracking, promotion, and

graduation. Washington, DC: National Academy Press.

Hill, J. H. (2009). The everyday language of white racism. John Wiley & Sons.

Hojo, M. (2021). Association between student-teacher ratio and teachers’ working hours and

workload stress: evidence from a nationwide survey in Japan. BMC Public Health, 21(1).

Hopewell, S. (2011). Leveraging bilingualism to accelerate English reading comprehension.

International Journal of Bilingual Education and Bilingualism, 14(5), 603-620.

Hopewell, S., & Escamilla, K. (2014). Struggling reader or emerging biliterate student?

Reevaluating the criteria for labeling emerging bilingual students as low achieving.

Journal of Literacy Research, 46(1), 68-89.

Howard, T. C., & Navarro, O. (2016). Critical race theory 20 years later: Where do we go from

here? Urban Education, 51(3), 253-273.

Howe, K., Eisenhart, M., & Betebenner, D. (2002). The Price of Public School Choice.

Educational Leadership, 59(7), 20-24.

Huilla, H. (2020). A circle of research on disadvantaged schools, improvement and test-based

accountability. Improving Schools, 23(1), 68-84.

Jacobs, J., Burns, R. W., & Yendol-Hoppey, D. (2015). The inequitable influence that varying

accountability contexts in the United States have on teacher professional development.

Professional Development in Education, 41(5), 849-872.

Jacobsen, R., Snyder, J. W., & Saultz, A. (2014). Informing or shaping public opinion? The

influence of school accountability data format on public perceptions of school quality.

American Journal of Education, 121(1), 1-27.

Jenlink, P. M. (2016). Teacher Education, Democracy, and the Social Imaginary of

Accountability. Teacher Education and Practice, 29(1), 5+.

Jerald, C. D., & Ingersoll, R. (2002). All talk, no action: Putting an end to out of field teaching.

The Education Trust.

Kantor, H., & Lowe, R. (2016). Educationalizing the welfare state and privatizing education.

Learning from the Federal Market-Based Reforms: Lessons for Every Student Succeeds

Act, 37-60.

Keane, W. (2018). On semiotic ideology. Signs and Society, 6(1), 64-87.

Keyes v. School Dist. No. 1, 396 U.S. 1215 (1973).

Keyes v. School Dist. No. 1, 576 F. Supp. 1503 (D. Colo. 1983).

Keyes v. School Dist. No. 1, No. C-1499 (D. Colo. Aug. 17, 1984).

Kim, W. G. (2017). Long-term English language learners’ educational experiences in the context

of high-stakes accountability. Teachers College Record, 119(9), 1-32.

Koc, N., & Celik, B. (2015). The impact of number of students per teacher on student

achievement. Procedia-Social and Behavioral Sciences, 177, 65-70.

Kohli, R. (2014). Unpacking internalized racism: Teachers of color striving for racially just

classrooms. Race Ethnicity and Education, 17(3), 367-387.

Kornhaber, M. L. (2004). Appropriate and inappropriate forms of testing, assessment, and

accountability. Educational Policy, 18(1), 45-70.

Kozol, J. (1991). Savage inequalities: Children in America's schools. New York: Crown.

Ladson-Billings, G. (2006). From the achievement gap to the education debt: Understanding

achievement in US schools. Educational Researcher, 35(7), 3-12.

Ladson-Billings, G. (2013a). Critical race theory: What it is not! In M. Lynn & A. D. Dixson

(Eds.), Handbook of critical race theory in education (pp. 34-47). New York: Routledge.

Ladson-Billings, G. (2013b). Lack of Achievement or Loss of Opportunity? In P. L. Carter & K.

G. Welner (Eds.), Closing the opportunity gap: What America must do to give every child

an even chance (pp. 11-24). New York: Oxford University Press.

Ladson-Billings, G., & Tate, W. F., IV. (1995). Toward a critical race theory of education.

Teachers College Record, 97(1), 47.

Lakin, J. M., & Young, J. W. (2013). Evaluating growth for ELL students: Implications for

accountability policies. Educational Measurement: Issues and Practice, 32(3), 11-26.

Lankford, H., Loeb, S., & Wyckoff, J. (2002). Teacher sorting and the plight of urban schools: A

descriptive analysis. Educational Evaluation and Policy Analysis, 24(1), 37-62.

Lee, J. (2010). Trick or treat: New ecology of education accountability system in the USA.

Journal of Education Policy, 25(1), 73-93.

Lee, J., & Wong, K. K. (2004). The impact of accountability on racial and socioeconomic equity:

Considering both school resources and achievement outcomes. American Educational

Research Journal, 41(4), 797-832.

Leonardo, Z. (2004). The color of supremacy: Beyond the discourse of ‘white privilege’.

Educational Philosophy and Theory, 36(2), 137-152.

Leonardo, Z. (2015). Contracting race: Writing, racism, and education. Critical Studies in

Education, 56(1), 86-98.

Lipman, P. (2013). Economic crisis, accountability, and the state's coercive assault on public

education in the USA. Journal of Education Policy, 28(5), 557-573.

López, N., Erwin, C., Binder, M., & Chavez, M. J. (2018). Making the invisible visible:

Advancing quantitative methods in higher education using critical race theory and

intersectionality. Race Ethnicity and Education, 21(2), 180-207.

Losen, D. J., & Martinez, P. (2020). Lost opportunities: How disparate school discipline

continues to drive differences in the opportunity to learn. Palo Alto, CA/Los Angeles,

CA: Learning Policy Institute; Center for Civil Rights Remedies at the Civil Rights

Project, UCLA.

MacSwan, J. (2005). The “non-non” crisis and academic bias in native language assessment of

linguistic minorities. In J. Cohen, K. T. McAlister, K. Rolstad, & J. MacSwan (Eds.), ISB4:

Proceedings of the 4th International Symposium on Bilingualism (pp. 1415-1422).

Martin, C., Sargrad, S., Batel, S., & Center for American Progress. (2016). Making the grade: A

50-state analysis of school accountability systems. Distributed by ERIC Clearinghouse.

Martin, P. C. (2012). Misuse of high-stakes test scores for evaluative purposes: Neglecting the

reality of schools and students. Current Issues in Education, 15(3).

Martinez-Garcia, C., LaPrairie, K., & Slate, J. R. (2011). Accountability ratings of elementary

schools: Student demographics matter. Current Issues in Education, 14(1).

Martínez, R. A. (2010). "Spanglish" as Literacy Tool: Toward an Understanding of the Potential

Role of Spanish-English Code-Switching in the Development of Academic Literacy.

Research in the Teaching of English, 124-149.

Mathison, S., & Ross, E. W. (2002). The hegemony of accountability in schools and universities.

WorkPlace: A Journal for Academic Labor, 5(1).

Matsuda, M. J. (1993). Words that wound: Critical race theory, assaultive speech, and the first

amendment. Boulder, Colo: Westview Press.

McDonnell, L. M. (1995). Opportunity to learn as a research concept and a policy instrument.

Educational Evaluation and Policy Analysis, 17(3), 305-322.

McGuinn, P. J. (2006). No Child Left Behind and the transformation of federal education policy,

1965-2005. Lawrence: University Press of Kansas.

McNeil, L. M., Coppola, E., Radigan, J., & Vasquez Heilig, J. (2008). Avoidable losses: High-

stakes accountability and the dropout crisis. Education Policy Analysis Archives, 16(3), 3.

Menchaca, M. (1993). Chicano Indianism: A historical account of racial repression in the United

States. American Ethnologist, 20(3), 583-603.

Menken, K. (2006). Teaching to the test: How No Child Left Behind impacts language policy,

curriculum, and instruction for English language learners. Bilingual Research Journal,

30(2), 521-546.

Menken, K. (2010). NCLB and English language learners: Challenges and consequences. Theory

Into Practice, 49(2), 121-128.

Menken, K., & Solorza, C. (2014). No child left bilingual: Accountability and the elimination of

bilingual education programs in New York City schools. Educational Policy, 28(1), 96-

125.

Milner, H. R. (2007). Race, culture, and researcher positionality: Working through dangers seen,

unseen, and unforeseen. Educational Researcher, 36(7), 388-400.

Morris, D. S. (2021). Challenging the stereotype that minority segregated schools are unsafe: Are

crime and violence really more prevalent in segregated minority high schools? Race

Ethnicity and Education, 1-22.

Morris, J. E., & Parker, B. D. (2013). CRT in education: Historical/archival analyses. In J. T.

DeCuir-Gunby, T. K. Chapman, & P. A. Schutz (Eds.), Understanding Critical Race

Research Methods and Methodologies: Lessons from the Field (pp. 24-33). Routledge.

Murray, K. & Howe, K. R. (2017). Neglecting Democracy in Education Policy: A-F School

Report Card Accountability Systems. Education Policy Analysis Archives, 25(109), 1-31.

National Commission on Excellence in Education. (1983). A nation at risk: The imperative for

education reform. Washington, DC: Government Printing Office.

Oakes, J. (1990). Multiplying inequalities: The effects of race, social class, and tracking on

opportunities to learn mathematics and science. Santa Monica: The RAND Corporation.

Oakes, J., & Guiton, G. (1995). Matchmaking: The dynamics of high school tracking decisions.

American Educational Research Journal, 32, 151-181.

Office for Civil Rights (2016). A first look: 2013-2014 civil rights data collection. US

Department of Education. https://www2.ed.gov/about/offices/list/ocr/docs/2013-14-first-look.pdf

Palazzolo, N. (2013, June 5). Chicano students strike for equality of education in Crystal City,

Texas, 1969-1970. Global Nonviolent Action Database.

https://nvdatabase.swarthmore.edu/content/chicano-students-strike-equality-education-crystal-city-texas-1969-1970

Palmer, D. K. (2018). Teacher leadership for social change in bilingual and bicultural

education. Multilingual Matters.

Peck, C. (2014). Paradigms, power, and PR in New York City: Assessing two school

accountability implementation efforts. Education Policy Analysis Archives, 22, 114.

Peske, H. G., & Haycock, K. (2006). Teaching Inequality: How Poor and Minority Students Are

Shortchanged on Teacher Quality: A Report and Recommendations by the Education

Trust. Education Trust.

Powers, J. M. (2003). An analysis of performance-based accountability: Factors shaping school

performance in two urban school districts. Educational Policy, 17(5), 558-585.

Poza, L. (2016). Barreras: Language ideologies, academic language, and the marginalization of

Latin@ English Language Learners. Whittier Law Review, 37(3), 401.

Ramírez, J. D. (1992). Executive summary. Bilingual Research Journal, 16(1-2), 1-62.

Ramlackhan, K., & Wang, Y. (2021). Urban school district performance: A longitudinal analysis

of achievement. Urban Education, 4208592110449.

Ravitch, D. (2002). Testing and accountability, historically considered. In H. J. Walberg & W.

M. Evers (Eds.), School accountability (pp. 9-21). Stanford, Calif.: Hoover Institution

Press, Stanford University.

Ravitch, D. (2010). The Death and Life of the Great American School System: How Testing and

Choice Are Undermining Education. New York: Basic Books.

Redford, J. (2018). English Language Program Participation among Students in the

Kindergarten Class of 2010-11: Spring 2011 to Spring 2012. Stats in Brief. NCES 2018-

086. National Center for Education Statistics.

Reyes, A., & Garcia, A. (2014). Turnaround policy and practice: A case study of turning around

a failing school with English-language-learners. The Urban Review, 46(3), 349-371.

Rhodes, J. H. (2011). Progressive policy making in a conservative age? Civil rights and the

politics of federal education standards, testing, and accountability. Perspectives on

Politics, 9(3), 519-544.

Roediger, D. (2005). Working toward whiteness: How America’s immigrants became white.

New York: Basic Books.

Rolstad, K., Mahoney, K., & Glass, G. V. (2005). The big picture: A meta-analysis of program

effectiveness research on English language learners. Educational Policy, 19(4), 572-594.

Roney, E., & Gutierrez, S. (2019, March 20). 50 years later: A look at one of the most violent

student protests in Colorado. 9News.

https://www.9news.com/article/news/local/next/50-years-later-a-look-at-one-of-the-most-violent-student-protests-in-colorado/73-005d4626-9536-47b8-87d8-102ba1ba2536

Rosa, J. D. (2016). Standardization, racialization, languagelessness: Raciolinguistic ideologies

across communicative contexts. Journal of Linguistic Anthropology, 26(2), 162-183.

Rosa, J., & Flores, N. (2017). Unsettling race and language: Toward a raciolinguistic

perspective. Language in Society, 46(5), 621-647.

Russell, M. (1992). Entering Great America: Reflections on race and the convergence of

progressive legal theory and practice. Hastings Law Journal, 43, 749-767.

Ryan, K. E., & Shepard, L. A. (2008). The future of test-based educational accountability.

Routledge.

Sabzalian, L., Shear, S. B., & Snyder, J. (2021). Standardizing indigenous erasure: A TribalCrit

and QuantCrit analysis of K-12 U.S. civics and government standards. Theory and

Research in Social Education, 49(3), 321-359.

San Miguel Jr, G., & Donato, R. (2010). Latino education in twentieth-century America: A brief

history. In E. G. Murillo (Ed.), Handbook of Latinos and education: Theory, research

and practice (pp. 53-88). New York: Routledge.

Santa Ana, O. (2004). Chronology of events, court decisions, and legislation affecting language

minority children in American public education. In O. Santa Ana (Ed.), Tongue-tied: The

lives of multilingual children in public education (pp. 86-105). Lanham: Rowman &

Littlefield Publishers.

Shannon, S. M., & Escamilla, K. (1999). Mexican immigrants in US schools: Targets of

symbolic violence. Educational Policy, 13(3), 347-370.

Shum, B. (2018). Civil Rights Protections for Students Enrolled in Charter Schools. In I. C.

Rotberg & J. L. Glazer (Eds.), Choosing charters: Better schools or more segregation?

Teachers College Press.

Skiba, R. J., Chung, C. G., Trachok, M., Baker, T. L., Sheya, A., & Hughes, R. L. (2014).

Parsing disciplinary disproportionality: Contributions of infraction, student, and school

characteristics to out-of-school suspension and expulsion. American Educational

Research Journal, 51(4), 640-670.

Slater, G. B. (2015). Education as recovery: Neoliberalism, school reform, and the politics of

crisis. Journal of Education Policy, 30(1), 1-20.

Smith, M. S., & O'Day, J. A. (1992-1993). School Reform and Equal Opportunity: An

Introduction to the Education Symposium. Stanford Law & Policy Review, 4, 15-20.

Solórzano, R. W. (2008). High stakes testing: Issues, implications, and remedies for English

language learners. Review of Educational Research, 78(2), 260-329.

Spees, L. P., Potochnick, S., & Perreira, K. M. (2016). The academic achievement of Limited

English Proficient (LEP) youth in new and established immigrant states: Lessons from

the national assessment of educational progress (NAEP). Education Policy Analysis

Archives, 24, 99.

Stage, F. K. (2007). Answering critical questions using quantitative data. New Directions for

Institutional Research, 2007(133), 5-16.

Strong, K. A., & Escamilla, K. (2020). The need for nuance: Relationships between EL English

proficiency and accountability outcomes. Bilingual Research Journal, 43(1), 92-110.

Sullivan, A. L. (2011). Disproportionality in special education identification and placement of

English language learners. Exceptional Children, 77(3), 317-334.

Sunderman, G. L., Coghlan, E., & Mintrop, R. (2017). School Closure as a Strategy to Remedy

Low Performance. Boulder, CO: National Education Policy Center. Retrieved July 9,

2022 from http://nepc.colorado.edu/publication/closures

Suzuki, S., Morris, S. L., & Johnson, S. K. (2021). Using QuantCrit to advance an anti-racist

developmental science: Applications to mixture modeling. Journal of Adolescent

Research, 36(5), 535-560.

Teddlie, C., Stringfield, S., & Reynolds, D. (2002). Context issues within school effectiveness

research. In R. David, C. Teddlie & D. Reynolds (Eds.), The international handbook of

school effectiveness research (pp. 160-185). Routledge.

Tenenbaum, H. R., & Ruck, M. D. (2007). Are teachers' expectations different for racial

minority than for European American students? A meta-analysis. Journal of Educational

Psychology, 99(2), 253.

Thomas, J. Y., & Brady, K. P. (2005). Chapter 3: The Elementary and Secondary Education Act

at 40: Equity, accountability, and the evolving federal role in public education. Review of

Research in Education, 29(1), 51-67.

Thomas, W., & Collier, V. (1997). School effectiveness for language minority students.

Washington, DC: National Clearinghouse for Bilingual Education.

Tollefson, J. W., & Tsui, A. B. (2014). Language diversity and language policy in educational

access and equity. Review of Research in Education, 38(1), 189-214.

Trujillo, T., & Renée, M. (2015). Irrational exuberance for market-based reform: How federal

turnaround policies thwart democratic schooling. Teachers College Record, 117(6), 1-34.

Tsang, S., Katz, A., & Stack, J. (2008). Achieving testing for English language learners, ready or

not? Education Policy Analysis Archives, 16(1), 1-25.

Turkan, S., & Buzick, H. M. (2016). Complexities and issues to consider in the evaluation of

content teachers of English language learners. Urban Education, 51(2), 221-248.

US Commission on Civil Rights. (2018). Public education funding inequity: In an era of

increasing concentration of poverty and resegregation.

https://www.usccr.gov/files/pubs/2018/2018-01-10-Education-Inequity.pdf

Valdés, G. (1998). The world outside and inside schools: Language and immigrant children.

Educational Researcher, 27(6), 4-18.

Valenzuela, A. (1999). Subtractive schooling: U.S.-Mexican youth and the politics of caring.

Albany: State University of New York Press.

van Dijk, T. A. (1993). Principles of critical discourse analysis. Discourse & Society, 4(2), 249-

283.

Van Dusen, B., Nissen, J., Talbot, R. M., Huvard, H., & Shultz, M. (2022). A QuantCrit

investigation of Society’s educational debts due to racism and sexism in chemistry

student learning. Journal of Chemical Education, 99(1), 25-34.

Vasquez Heilig, J., & Darling-Hammond, L. (2008). Accountability Texas-style: The progress

and learning of urban minority students in a high-stakes testing context. Educational

Evaluation and Policy Analysis, 30(2), 75-110.

Vasquez Heilig, J. (2011). As good as advertised? Tracking urban student progress through high

school in an environment of accountability. American Secondary Education, 39(3), 17-

41.

Vasquez Heilig, J., Young, M., & Williams, A. (2012). At-risk student averse: Risk management

and accountability. Journal of Educational Administration, 50(5), 562-585.

Wang, J. (1998). Opportunity to learn: The impacts and policy implications. Educational

Evaluation and Policy Analysis, 20(3), 137-156.

Wiese, A. M., & Garcia, E. E. (1998). The Bilingual Education Act: Language minority students

and equal educational opportunity. Bilingual Research Journal, 22(1), 1-18.

Wiley, T. G., & Wright, W. E. (2004). Against the undertow: Language-minority education

policy and politics in the “age of accountability”. Educational Policy, 18(1), 142-168.

Wiliam, D. (2010). Standardized testing and school accountability.

Educational Psychologist, 45(2), 107-122.

Wilson, J. A. (2018). Neoliberalism. New York, NY: Routledge.

Wright, A. C. (2015). Teachers’ perceptions of students’ disruptive behavior: The effect of racial

congruence and consequences for school suspension. University of California, Santa

Barbara.

Wu, M. (2013). The effects of student demographics and school resources on California school

performance gain: A fixed effects panel model. Teachers College Record, 115(4), 1-28.

Yosso, T. J. (2002). Toward a critical race curriculum. Equity & Excellence in Education, 35(2),

93-107.

Young, J. L., & Young, J. (2022). Underrepresentation in gifted education revisited: The promise

of single-group summaries and meta-analytic QuantCrit. The Gifted Child Quarterly,

66(2), 136-138.

Zuberi, T. (2001). Thicker than blood: How racial statistics lie. U of Minnesota Press.

Appendix A

Appendix Table 1.
Data Sources, Datasets, Data Types, and Data Uses in Dissertation

Source of Description
Variables within Dataset Use in Dissertation Type of Data
Dataset of Dataset

• “Destruction of school property”


• “Detrimental behavior”
Suspension/ • “Disobedient”
• “Code of conduct” • Counts used to
Expulsion
CDE calculate rate of
Statistics – • “Classroom suspension”
Education disciplinary actions • Counts
Statistics
Discipline • “In-school suspension”
and incidents per
by Race/ • “Out of school suspension” 100 students
Ethnicity • “Other action,”
• “Expulsion”
• “Referral to law enforcement”

• Counts used to
Pupil
CDE calculate percentage
Member- • “Students of Color”
Education of students of color • Counts
Statistics
ship – Race/ • “PreK-12 Total Enrollment”
out of total
Ethnicity
enrollment

• “Gifted and Talented” • Counts used to


Pupil
CDE calculate
Membership • “Special Education”
Education percentages of • Counts
Statistics
by Service • “English Learner”
students per service
Types • “Free and Reduced Lunch” type

CDE School/
• Percent
Education District Staff • “Student Teacher Ratios” • No changes made
(rate)
Statistics Statistics

All Schools
SPF • Percent
DPS SPF • “SPF Rating”
Indicator • No changes made • Categorical
Reports • “SPF Earned Points %”
Summary variables
Report

• Counts used to
• “Redesignated English Learners Count” calculate rate of • Counts by
• “Re-Entered English Learners Count” Redesignations, language
CD July
9VC5 • “Exited English Learner Count” Exits, and Re- status
Report
• “ELA Program Type” Entries per total • Categorical
• “School Type” [district-run or charter] English Learner variables
population

• Teacher counts
CD July • “Teacher total” used to calculate
9VC11 • Counts
Report • “Fully qualified teacher counts” percentage of fully
qualified teachers

161
• “Gifted and Talented – English Learner
%”
• “Gifted and Talented – Never English
CD
Learner %”
October 9VC23 • No changes made • Percent
Report • “Gifted and Talented – Exited English
Learner %”
• “Gifted and Talented – Redesignated
English Learner %”

CD July Report | 9VA2 | “WIDA Access Scores”; “English Learner/Provisional Total”; “English Learner/Provisional by Language Status”; “English Learners in Special Education Total”; “N of all ELs” per PPF category; “N of all ELs” per program setting | ACCESS scores 1-2, 3-4, and 5-6 were combined to make the respective “Beginner,” “Intermediate,” and “Advanced” Level counts; those counts were used to calculate the percentage of ELs per Level; Spanish-speaking ELs were totaled, and that total was used to calculate the percentage of Spanish-speaking ELs; counts of ELs per PPF category and program setting were used to calculate the percentages of students in each category | Counts
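Appendix A describes two kinds of derived variables. First, ACCESS levels are combined into three bands. Second, counts are converted to percentages of a base population, such as redesignations per total ELs. The Python sketch below illustrates both calculations under stated assumptions: the function names and the dictionary input format are hypothetical, ACCESS levels are assumed to be whole numbers from 1 to 6, and this is not the code used in this study.

```python
from collections import Counter

# Hypothetical mapping from whole-number ACCESS proficiency levels (assumed 1-6)
# to the three bands named in Appendix A: 1-2, 3-4, and 5-6.
LEVEL_BANDS = {
    1: "Beginner", 2: "Beginner",
    3: "Intermediate", 4: "Intermediate",
    5: "Advanced", 6: "Advanced",
}
BANDS = ("Beginner", "Intermediate", "Advanced")


def rate_per_population(count, population):
    """Return count as a percentage of population (0.0 if population is 0)."""
    return 100.0 * count / population if population else 0.0


def band_percentages(level_counts):
    """Combine per-level EL counts into bands; return the % of ELs in each band."""
    band_counts = Counter()
    for level, count in level_counts.items():
        band_counts[LEVEL_BANDS[level]] += count
    total = sum(band_counts.values())
    # Bands with no students are reported as 0.0 rather than omitted.
    return {band: rate_per_population(band_counts[band], total) for band in BANDS}


# Made-up counts for one school (not data from this study).
print(band_percentages({1: 10, 2: 10, 3: 30, 4: 30, 5: 15, 6: 5}))
# → {'Beginner': 20.0, 'Intermediate': 60.0, 'Advanced': 20.0}
print(rate_per_population(12, 150))  # e.g., 12 redesignations among 150 ELs → 8.0
```

The same `rate_per_population` helper would cover the other percentage-type variables in the table, such as the share of Spanish-speaking ELs or of fully qualified teachers, provided the appropriate total is used as the denominator.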

Appendix B

Appendix Table 2.
Means of Student Demographics, English Learner Characteristics, English Learner Outcomes and Programs, and School Contexts Across SPF Ratings Brackets for Academic Year 2016-2017
2016-2017 Academic Year
School Characteristics Red Orange Yellow Green Blue District Average
N 9 14 49 98 20 190
% 4.7% 7.4% 25.8% 51.6% 10.5% 100%
Student Demographics
Students of Color % 89.7 84.5 81.4 76.8 59.9 75.3
Free and Reduced Lunch % 83.8 77.5 74.3 68.8 47.9 66.3
Special Education % 15.0 14.8 12.5 10.3 7.9 12.0
English Learner % 28.6 40.6 32.5 35.7 23.9 34.3
Gifted and Talented % 10.6 11.3 14.0 11.6 20.1 15.3
English Learner Characteristics
Special Education as English Learners % 38.0 42.2 37.8 43.3 36.7 38.7
Spanish-Speaking English Learner % 81.0 80.7 83.1 79.2 64.3 75.5
English Learners in Gifted and Talented % 2.3 3.9 2.6 3.1 8.1 4.1
Beginning Level English Learner % 20.3 23.6 18.1 18.2 17.6 17.0
Intermediate Level English Learner % 76.1 70.5 74.3 71.9 63.8 73.7
Advanced Level English Learner % 3.5 6.0 7.7 9.9 18.6 9.3
English Learner Services
Redesignation % 3.8 6.1 6.5 7.4 15.5 9.9
Exit % 6.0 6.3 6.3 6.9 8.5 8.5
Re-Entry % 0.2 0.7 0.9 0.9 2.3 1.9
Parent Preference 1 % (bilingual ed) 33.0 47.0 38.4 40.2 30.0 36.0
Parent Preference 2 % (whatever is at school) 57.6 44.8 52.7 51.8 62.6 52.9
Parent Preference 3 % (nothing) 6.0 10.1 8.3 7.7 9.3 10.2
Mainstream % 24.8 47.1 23.2 26.4 54.1 9.5
ELA - English % 65.9 36.8 62.6 55.9 34.9 78.4
ELA – Spanish (ELAS) % 9.3 16.1 14.2 14.5 5.7 7.7
Dual Language (DL) % 0.0 0.0 0.0 3.2 5.3 4.3
Native Language (ELAS+DL) % 9.3 16.1 14.2 17.7 11.0 12.0
School Contexts
Total Enrollment 287.4 382.2 484.4 450.8 408.3 591.4
Student-Teacher Ratio 15.5 15.5 14.8 15.2 16.0 15.2
Fully Qualified Teacher % 70.1 72.8 78.7 82.8 90.8 76.9
Disciplinary Actions per 100 Students 34.1 19.1 15.2 10.6 5.7 15.3
Disciplinary Incidents per 100 Students 29.3 11.0 11.0 8.0 4.4 10.9
Disciplinary Actions Resulting in Instructional Loss per 100 Students 25.9 11.1 10.2 5.8 3.2 10.6
Charter School % 50.0 42.9 20.8 27.1 40.0 29.0

Appendix Table 3.
Means of Student Demographics, English Learner Characteristics, English Learner Outcomes and Programs, and School Contexts Across SPF Ratings Brackets for Academic Year 2017-2018
2017-2018 Academic Year
School Characteristics Red Orange Yellow Green Blue District Average
N 17 20 71 74 12 194
% 8.8% 10.3% 36.6% 38.1% 6.2% 100%
Student Demographics
Students of Color % 86.4 88.1 77.1 79.0 46.5 80.6
Free and Reduced Lunch % 75.8 79.9 70.7 70.8 34.6 71.0
Special Education % 15.6 13.9 11.9 11.5 8.5 11.4
English Learner % 40.0 37.6 32.4 39.3 19.4 37.1
Gifted and Talented % 13.6 16.4 14.7 13.5 19.2 14.8
English Learner Characteristics
Special Education as English Learners % 42.3 43.4 35.2 40.9 19.9 42.8
Spanish-Speaking English Learner % 88.1 85.7 80.7 78.0 52.2 82.5
English Learners in Gifted and Talented % 2.7 1.9 2.4 2.2 13.5 2.5
Beginning Level English Learner % 20.2 25.1 23.1 23.0 13.7 17.3
Intermediate Level English Learner % 76.1 70.5 71.5 69.9 67.7 78.3
Advanced Level English Learner % 3.6 4.3 5.4 7.1 18.6 4.4
English Learner Services
Redesignation % 9.7 9.9 17.6 10.0 15.7 15.9
Exit % 4.1 4.3 5.5 5.2 9.9 7.5
Re-Entry % 0.4 0.7 0.2 0.3 0.9 0.8
Parent Preference 1 % 39.6 37.7 39.2 41.7 18.2 38.3
Parent Preference 2 % 47.5 54.6 52.4 53.3 72.0 50.1
Parent Preference 3 % 12.0 7.1 7.7 4.5 9.7 11.3
Mainstream % 38.4 40.6 22.7 27.7 29.7 33.4
ELA - English % 52.3 48.4 59.9 50.7 66.2 58.1
ELA – Spanish (ELAS) % 9.3 11.0 15.9 17.3 4.2 6.4
Dual Language (DL) % 0.0 0.0 1.4 4.2 0.0 2.1
Native Language (ELAS+DL) % 9.3 11.0 17.4 21.6 4.2 8.5
School Contexts
Total Enrollment 299.6 496.7 454.8 434.8 470.4 612.1
Student-Teacher Ratio 14.5 13.8 14.5 14.9 16.6 14.7
Fully Qualified Teacher % 60.5 68.5 78.8 73.8 83.3 65.3
Disciplinary Actions per 100 Students 16.7 22.2 17.2 12.1 9.5 25.2
Disciplinary Incidents per 100 Students 11.3 15.0 11.7 8.4 8.3 16.8
Disciplinary Actions Resulting in Instructional Loss per 100 Students 9.4 8.9 5.4 3.4 2.6 9.7
Charter School % 58.8 42.1 22.9 28.8 25.0 30.4

Appendix Table 4.
Means of Student Demographics, English Learner Characteristics, English Learner Outcomes and Programs, and School Contexts Across SPF Ratings Brackets for Academic Year 2018-2019
2018-2019 Academic Year
School Characteristics Red Orange Yellow Green Blue District Average
N 24 23 69 60 15 191
% 12.6% 12.0% 36.1% 31.4% 7.9% 100%
Student Demographics
Students of Color % 87.5 85.6 77.6 76.9 54.3 72.9
Free and Reduced Lunch % 78.5 73.2 70.4 67.0 40.9 61.6
Special Education % 15.5 14.6 12.5 11.0 8.6 12.3
English Learner % 37.4 39.1 31.8 37.1 17.9 28.7
Gifted and Talented % 9.2 8.4 9.8 8.0 16.2 12.7
English Learner Characteristics
Special Education as English Learners % 42.1 45.4 33.4 41.5 30.3 32.7
Spanish-Speaking English Learner % 88.4 86.5 77.9 77.7 59.8 75.5
English Learners in Gifted and Talented % 1.3 2.5 3.0 2.7 11.6 4.6
Beginning Level English Learner % 28.4 27.3 26.0 23.7 14.9 23.5
Intermediate Level English Learner % 68.0 66.7 68.0 66.8 68.7 69.3
Advanced Level English Learner % 3.7 6.1 6.0 9.5 16.4 7.1
English Learner Services
Redesignation % 13.5 11.7 17.5 16.2 24.9 20.3
Exit % 8.8 8.5 7.4 4.3 11.9 11.4
Re-Entry % 1.0 2.7 1.9 0.8 0.6 2.0
Parent Preference 1 % 42.8 44.0 38.0 41.2 31.8 31.5
Parent Preference 2 % 50.2 47.5 55.8 54.2 61.5 56.1
Parent Preference 3 % 8.3 8.9 7.8 5.6 9.1 12.3
Mainstream % 5.9 25.0 7.3 16.2 10.7 5.0
ELA - English % 82.4 50.9 77.1 61.3 82.1 84.2
ELA – Spanish (ELAS) % 11.7 15.0 15.6 18.1 7.1 7.6
Dual Language (DL) % 0.0 9.1 0.0 4.4 0.0 3.2
Native Language (ELAS+DL) % 11.7 24.1 15.6 22.5 7.1 10.8
School Contexts
Total Enrollment 333.9 380.9 500.3 437.0 455.5 616.5
Student-Teacher Ratio 13.7 14.7 14.5 14.8 16.0 14.7
Fully Qualified Teacher % 75.0 74.7 82.8 83.4 83.9 80.6
Disciplinary Actions per 100 Students 24.3 14.5 17.7 8.1 10.7 16.4
Disciplinary Incidents per 100 Students 16.0 8.6 12.4 5.9 6.6 10.4
Disciplinary Actions Resulting in Instructional Loss per 100 Students 8.8 5.6 6.3 2.6 2.5 5.4
Charter School % 52.2 45.5 17.7 31.6 26.7 30.3
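For each SPF rating bracket, Appendix Tables 2-4 report the number of schools, that bracket's share of all rated schools, and the mean of each school-level variable. The following is a minimal Python sketch of that aggregation under stated assumptions. The record layout and key names (`spf_rating`, `frl_pct`) are hypothetical, and the means are assumed to be unweighted across schools. The thesis does not state how the District Average column was computed, so the sketch does not attempt that column.

```python
from collections import defaultdict
from statistics import mean

# SPF rating brackets in the column order used by Appendix Tables 2-4.
SPF_ORDER = ("Red", "Orange", "Yellow", "Green", "Blue")


def bracket_means(schools, variable):
    """Summarize one school-level variable by SPF rating bracket.

    `schools` is a list of dicts with a hypothetical "spf_rating" key plus the
    variable of interest. Returns, per bracket, the number of schools, their
    share of all schools (%), and the unweighted mean of the variable.
    Ratings outside SPF_ORDER count toward the total but get no entry.
    """
    groups = defaultdict(list)
    for school in schools:
        groups[school["spf_rating"]].append(school[variable])
    total = len(schools)
    summary = {}
    for rating in SPF_ORDER:
        values = groups.get(rating, [])
        summary[rating] = {
            "n": len(values),
            "pct_of_schools": round(100.0 * len(values) / total, 1) if total else 0.0,
            "mean": round(mean(values), 1) if values else None,
        }
    return summary


# Made-up example: four schools with a free and reduced lunch percentage.
schools = [
    {"spf_rating": "Red", "frl_pct": 80.0},
    {"spf_rating": "Red", "frl_pct": 90.0},
    {"spf_rating": "Green", "frl_pct": 60.0},
    {"spf_rating": "Blue", "frl_pct": 40.0},
]
print(bracket_means(schools, "frl_pct")["Red"])
# → {'n': 2, 'pct_of_schools': 50.0, 'mean': 85.0}
```

Weighting every school equally means a small school moves its bracket mean as much as a large one does. An enrollment-weighted mean would be the alternative if that were the intended definition.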

ProQuest Number: 29322065

INFORMATION TO ALL USERS

The quality and completeness of this reproduction is dependent on the quality and completeness of the copy made available to ProQuest.

Distributed by ProQuest LLC (2022).

Copyright of the Dissertation is held by the Author unless otherwise noted.

This work may be used in accordance with the terms of the Creative Commons license or other rights statement, as indicated in the copyright statement or in the metadata associated with this work. Unless otherwise specified in the copyright statement or the metadata, all rights are reserved by the copyright holder.

This work is protected against unauthorized copying under Title 17, United States Code and other applicable copyright laws.

Microform Edition where available © ProQuest LLC. No reproduction or digitization of the Microform Edition is authorized without permission of ProQuest LLC.

ProQuest LLC
789 East Eisenhower Parkway
P.O. Box 1346
Ann Arbor, MI 48106-1346 USA
