Improving Teacher Evaluations

This article discusses issues with traditional "drive-by" teacher evaluations and explores comprehensive alternative models. It notes that typical evaluations involve brief classroom observations, checklists, and generic ratings that do little to improve teaching quality. Some new models, like TAP and BEST, establish clear teaching standards, use rigorous rubrics and portfolio assessments, involve intensive coaching, and provide feedback to help teachers improve. These comprehensive systems show it is possible to evaluate teachers in more productive ways than typical brief, perfunctory evaluations.


Avoiding a Rush to Judgment: Teacher Evaluation and Teacher Quality

By: Thomas Toch, Robert Rothman

Comprehensive methods of evaluating teachers that avoid the typical "drive-by"
evaluations can promote improvements in teaching.

The troubled state of teacher evaluation is a glaring and largely neglected problem
in public education, one with consequences that extend far beyond the current
debate over performance pay. Because teacher evaluations are at the center of the
educational enterprise — the quality of teaching in the nation's classrooms — they
are a potentially powerful lever of teacher and school improvement. But that
potential is being squandered throughout public education, an enterprise that
spends $400 billion annually on salaries and benefits.

The task of building better evaluation systems is as difficult as it is important. Many
hurdles stand in the way of rating teachers fairly on the basis of their students'
achievement, the solution favored by many education experts today. And it's
increasingly clear that it's not enough merely to create more defensible systems for
rewarding or removing teachers. Teacher evaluations pay much larger dividends
when they also play a role in improving teaching.

This article explores the causes and consequences of the crisis in teacher
evaluation. And it examines a number of national, state, and local evaluation
systems that point to a way out of the evaluation morass. Together, they
demonstrate that it's possible to evaluate teachers in much more productive ways
than most public schools do today.

Drive-bys
It's hard to expect people to make a task a priority when the system they are
working in signals that the task is unimportant. That's the case with teacher
evaluation.
Public education defines teacher quality largely in terms of the credentials that
teachers have earned, rather than on the basis of the quality of the work they do in
their classrooms or the results their students achieve.

It's not surprising, then, that measuring how well teachers teach is a low priority in
many states. The nonprofit National Council on Teacher Quality (NCTQ) reports
that, despite many calls for performance pay coming from state capitals, only
fourteen states require school systems to evaluate their public school teachers at
least once a year, while some are much more lax than that. Tennessee, for
example, requires evaluations of tenured teachers only twice a decade (NCTQ
2007a).

An NCTQ analysis of the teacher contracts in the nation's fifty largest districts
(which enroll 17 percent of the nation's students) suggests that not much teacher
evaluation is enshrined in local regulations, either. Teachers union contracts dictate
the professional requirements for teachers in most school districts. But the NCTQ
study found that only two-thirds of them require teachers to be evaluated at least
once a year and a quarter of them require evaluations only every three years
(NCTQ 2007b).

The evaluations themselves are typically of little value — a single, fleeting
classroom visit by a principal or other building administrator untrained in evaluation
wielding a checklist of classroom conditions and teacher behaviors that often don't
even focus directly on the quality of teacher instruction. "It's typically a couple of
dozen items on a list: 'Is presentably dressed,' 'Starts on time,' 'Room is safe,' 'The
lesson occupies students,'" says Michigan State University professor Mary Kennedy,
author of Inside Teaching: How Classroom Life Undermines Reform, who has
studied teacher evaluation extensively. "In most instances, it's nothing more than
marking 'satisfactory' or 'unsatisfactory.'"

It's easy for teachers to earn high marks under these capricious rating systems,
often called "drive-bys," regardless of whether their students learn. Raymond
Pecheone, co-director of the School Redesign Network at Stanford University and an
expert on teacher evaluation, suggests by way of example that a teacher might get
a "satisfactory" check under "using visuals" by hanging up a mobile of the planets
in the Earth's solar system, even though students could walk out of the class with
no knowledge of the sun's role in the solar system or other key concepts. These
simplistic evaluation systems also fail to be remotely sensitive to the challenges of
teaching different subjects and different grade levels, adds Pecheone.

Unsurprisingly, the results of such evaluations are often dubious. Donald Medley of
the University of Virginia and Homer Coker of Georgia State University reported in a
comprehensive 1987 study, "The Accuracy of Principals' Judgments of Teacher
Performance," that the research up to that point found the relationship between the
average principal's ratings of teacher performance and achievement by the
teachers' students to be "near zero."

Principals fared better in a recent study by Brian Jacob of Harvard's Kennedy School
of Government and Lars Lefgren of Brigham Young University (2005) that
compared teacher ratings to student gains on standardized tests. Principals were
able to identify with some accuracy their best and worst teachers — the top 10 or
so percent and the bottom 10 or so percent — when asked to rate their teachers'
ability to raise math and reading scores.

But principals don't put even those minimal talents to use in most public school
systems. A recent study of the Chicago school system by the nonprofit New Teacher
Project (2007), for example, found that 87 percent of the city's 600 schools did not
issue a single "unsatisfactory" teacher rating between 2003 and 2006. Among that
group of schools were sixty-nine that the city declared to be failing educationally.
Of all the teacher evaluations conducted during those years, only 0.3 percent
produced "unsatisfactory" ratings, while 93 percent of the city's 25,000 teachers
received top ratings of "excellent" or "superior."

And principals use evaluations to help teachers improve their performance as rarely
as they give unsatisfactory ratings. They frequently don't even bother to discuss
the results of their evaluations with teachers. "Principals are falling prey to fulfilling
the letter of the law," says Dick Flannery, director of professional development for
the National Association of Secondary School Principals, a principals' membership
organization. "They are missing the opportunity to use the process as a tool to
improve instruction and student achievement."

New models
A small number of local, state, and national initiatives have sought a different
solution to drive-by evaluations — comprehensive evaluation systems that measure
teachers' instruction in ways that promote improvement in teaching.

The Teacher Advancement Program (TAP) is a good example. Launched by the
Milken Family Foundation in 1999 and now operated by the nonprofit, California-
based National Institute for Excellence in Teaching, TAP is a comprehensive
program to strengthen teaching through intensive instructional evaluations,
coaching, career ladders, and performance-based compensation. It's now in 180
schools with 5,000 teachers and 60,000 students in five states and the District of
Columbia.

Standards for Teaching
TAP measures teaching against standards in three major categories — designing
and planning instruction, the learning environment, and instruction — and nineteen
subgroups targeting things like how well lessons are choreographed, the frequency
and quality of classroom questions, and ensuring that students are taught
challenging skills like drawing conclusions.
Schools using TAP evaluate their teachers using a rubric that rates performance as
"unsatisfactory," "proficient," or "exemplary." Standards and rubrics such as TAP's
"create a common language about teaching" for educators, says Katie Gillespie, a
fifth-grade teacher at DC Preparatory Academy, a District of Columbia charter
school in its third year of using TAP. "That's crucial," says Gillespie.

Connecticut's Beginning Educator Support and Training Program (BEST), the
nation's first — and, until recently, only — statewide evaluation system, draws
heavily on the state's teachers in drafting standards.

The Connecticut Department of Education established BEST in 1989 to strengthen
its teaching force by supplying new teachers with mentors and training and then
requiring them in their second year to submit a portfolio chronicling a unit of
instruction. The unit needs to involve at least five hours' worth of teaching, to
capture how teachers develop students' understanding of a topic over time,
something "drive-by" evaluations can't and don't do.

State-trained scorers evaluate the portfolios from four perspectives — instructional
design, instructional implementation, assessment of learning, and teachers' ability
to analyze teaching and learning — using four standards: conditional, competent,
proficient, and advanced. The state established committees of top Connecticut
teachers to draft the standards, which were circulated to hundreds of teachers,
administrators, and higher-education faculty members for comment.

The nonprofit National Board for Professional Teaching Standards also has
sponsored a large-scale system of teacher evaluations. It has conferred advanced
certification in sixteen subjects on some 63,000 teachers nationwide since its
inception in 1987, using a two-part evaluation: candidates submit a Connecticut-
like portfolio and complete a series of half-hour online essays.

Teams of teachers from around the country draft standards in each certification
area, and hundreds of teachers, administrators, and state and federal officials
comment before the standards are finalized. The Educational Testing Service (ETS)
manages the evaluation system under a contract with the National Board.

Multiple Measures
While traditional evaluations tend to be one-dimensional, relying exclusively on a
single observation of a teacher in a classroom, the comprehensive models capture a
much richer picture of a teacher's performance.

The National Board portfolios, for example, include lesson plans, instructional
materials, student work, two twenty-minute videos of the candidate working with
students in classrooms, teachers' written reflections on the two taped lessons, and
evidence of work with parents and peers. That's on top of the six online exercises
that National Board candidates take at one of 400 evaluation centers around the
country to demonstrate expertise in the subjects they teach.

In total, National Board candidates spend between 200 and 400 hours
demonstrating their proficiency in five areas: commitment to students' learning,
knowledge of subject and of how to teach it, monitoring of student learning, ability
to think systematically and strategically about instruction, and professional growth.

An advantage of portfolios is that, unlike standardized-test scores, they can be used
to evaluate teachers in nearly every discipline. National Board certification is open
to some 95 percent of elementary and secondary teachers.

Teamwork
Another way to counter the limited, subjective nature of many conventional
evaluations is to subject teachers to multiple evaluations by multiple evaluators.

In schools using TAP, teachers are evaluated at least three times a year against
TAP's teaching standards by teams of "master" and "mentor" teachers that TAP
trains to use the organization's evaluation rubrics (master teachers are more senior
and do less teaching than mentors). Schools combine the scores from the different
evaluations and evaluators into an annual performance rating.

TAP evaluators must demonstrate an ability to rate teachers at TAP's three
performance levels before TAP lets them do "live" teacher evaluations. Then TAP
requires schools using the program to enter every evaluation into a TAP-run online
Performance Appraisal Management System that produces charts and graphs of
evaluation results, which are used to compare a school's evaluation scores to TAP
evaluation trends nationally. And every year TAP ships videotaped lessons to
evaluators that they must score accurately using TAP's performance levels as a
prerequisite for continuing as TAP evaluators.

In Connecticut, every BEST portfolio is scored using the program's standards by
three state-trained teacher-evaluators who teach the same subject as the
candidate. Failing portfolios are rescored by a fourth evaluator. As in the TAP
program, scorers must complete nearly a week's worth of training and demonstrate
an ability to score portfolios accurately before participating in the program.

Not surprisingly, using evaluators with backgrounds in candidates' subject and
grade levels, as TAP and BEST do, strengthens the quality of evaluations. "Good
instruction doesn't look the same in chemistry as in elementary reading," says Mike
Gass, executive director of secondary education in Eagle County, Colorado, where
the district's fifteen schools use TAP.

Under traditional evaluations — done as they are by principals or assistant
principals — it's rarely possible to use evaluators with backgrounds in the
candidate's teaching area, especially at the middle and high school levels, where
teachers typically teach only one subject. Many evaluations, as a result, focus on
how teachers teach, at the expense of what they teach. Evaluators, writes Michigan
State's Kennedy, "are rarely asked to evaluate the accuracy, importance,
coherence, or relevance of the content that is actually taught or the clarity with
which it is taught" (Kennedy 2007).

Subject-area and grade-level specialists, scoring rubrics, evaluator training, and
recertification requirements like TAP's increase the "inter-rater reliability" of
evaluations. They produce ratings that are more consistent from evaluator to
evaluator and that teachers are more likely to trust.
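
Inter-rater reliability can be made concrete with a simple agreement calculation. The sketch below uses one common, simple measure: the share of identical ratings, averaged over every pair of evaluators. The article does not say which statistic TAP or BEST uses, so this is an illustration only, and the evaluator names and ratings are invented.

```python
from itertools import combinations

# Hypothetical ratings of the same four lessons on a three-level rubric.
# The evaluator names and the data are invented for illustration.
ratings = {
    "evaluator_a": ["proficient", "exemplary", "proficient", "unsatisfactory"],
    "evaluator_b": ["proficient", "exemplary", "proficient", "proficient"],
    "evaluator_c": ["proficient", "proficient", "proficient", "unsatisfactory"],
}


def pairwise_agreement(ratings_by_evaluator):
    """Return the share of identical ratings, averaged over every pair of evaluators."""
    agreements = []
    for first, second in combinations(ratings_by_evaluator.values(), 2):
        if len(first) != len(second):
            raise ValueError("every evaluator must rate the same lessons")
        # Count the lessons on which this pair gave exactly the same rating.
        matches = sum(a == b for a, b in zip(first, second))
        agreements.append(matches / len(first))
    return sum(agreements) / len(agreements)


agreement = pairwise_agreement(ratings)
print(f"Average pairwise agreement: {agreement:.0%}")
```

A higher figure means ratings are more consistent from evaluator to evaluator. Exact agreement is a crude measure because it ignores agreement that would occur by chance; it is used here only because it is easy to follow.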

Places to Grow
Unlike traditional teacher evaluations, these systems are part of programs to
improve teacher performance, not merely weed out bad apples. They are drive-in
rather than drive-by evaluations. At a time when research is increasingly pointing
to working conditions as being more important than higher pay in keeping good
teachers in the classroom, the teachers in the comprehensive evaluations programs
say that the combination of extensive evaluations and coaching that they receive
helps make their working conditions more professional, and thus more attractive.

At DC Preparatory Academy, which serves 275 middle school students in
northeastern Washington, D.C., using evaluations to strengthen teaching is part of
the fabric of the school. The school opened in 2003 and brought on TAP in 2005.
And in the TAP model, a key role of evaluations by master and mentor teachers is
identifying the teachers' weaknesses that mentors will work on with teachers during
the six weeks between evaluations.

"I felt I was a really good teacher before I got here," says Gillespie, in her second
year at DC Prep after spending four years teaching in nearby Fairfax County,
Virginia. "I got really high marks on my evaluations [in Fairfax]. But holy moly, I've
learned under TAP that I've got a lot of places to grow." Some studies have
suggested that teachers' performance plateaus after several years in the classroom.
But few teachers in public education get the sort of sophisticated coaching that
Gillespie receives under TAP; if more did, perhaps studies would reveal that their
performance continued to improve.

"It makes a difference when people are constantly there to help you," adds
Gillespie's colleague, seventh-grade English teacher Geoff Pecover. "The
expectations are high. My principal last year in DCPS [the District of Columbia
Public Schools, where Pecover taught for three years] showed up to evaluate my
class with the evaluation form already filled out, and the post-conference was a
waste of time. You didn't feel like you were learning anything."

To further strengthen the relationship between evaluation and instruction, TAP
requires schools to have weekly, hour-long "cluster" meetings where
master/mentor teachers work with teams of teachers of a particular subject or
grade level.

Cost factors — time and money
Not surprisingly, comprehensive classroom evaluation systems are more time-
consuming and more expensive than once-a-year principal evaluations or
evaluations based only on student test scores.

In schools with complex models like TAP's, the administrative challenges of training
and retraining evaluators, conducting classroom visits, and tying the evaluation
system to teacher professional development activities are daunting. "We didn't
realize how demanding it was," says Natalie Butler, DC Prep's principal. "You just
have to make the investment."

TAP and other comprehensive evaluation models also are a lot more demanding on
teachers under evaluation. The 200 to 400 hours that candidates for National
Board certification spend in that process suggest as much, and the demands are
even greater on teachers facing multiple evaluations and follow-up work under
programs like TAP. "The typical teacher evaluation process puts teachers in a
passive role," says Catherine Fiske Natale, a Connecticut official with the state's
BEST program. "This is different." But it is not unprecedented, at least by
international standards. Researchers Shujie Liu of the University of Southern
Mississippi and Charles Teddlie of Louisiana State University (2005) report in a
study of Chinese teacher evaluation practices that Chinese teachers are expected to
observe the classes of other teachers as many as fifteen times a semester and write
a 1,500-word essay every semester on some aspect of their teaching experience.

At $1,000 per teacher, it would cost $3 billion a year to evaluate the nation's three
million teachers using a Connecticut- or National Board-like portfolio or TAP's
multiple-evaluations, multiple-evaluators model. By way of contrast, public
education's price tag has surpassed $500 billion a year, including some $14 billion
(about $240 per student) for teachers to take "professional development" courses
and workshops that teachers themselves say don't improve their teaching in many
instances.
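
The arithmetic behind these figures can be checked with a short sketch. The dollar amounts and the teacher count are the ones quoted above; the variable and function names are illustrative and do not come from the article.

```python
# Figures quoted in the article; the names are illustrative assumptions.
COST_PER_TEACHER = 1_000            # dollars per teacher per year
NUM_TEACHERS = 3_000_000            # the nation's teachers
TOTAL_EDUCATION_SPENDING = 500e9    # "surpassed $500 billion a year"
PROFESSIONAL_DEVELOPMENT = 14e9     # "some $14 billion"


def evaluation_cost(cost_per_teacher, num_teachers):
    """Annual cost of evaluating every teacher at a flat per-teacher price."""
    return cost_per_teacher * num_teachers


total = evaluation_cost(COST_PER_TEACHER, NUM_TEACHERS)
share_of_spending = total / TOTAL_EDUCATION_SPENDING
share_of_pd = total / PROFESSIONAL_DEVELOPMENT

print(f"Evaluation cost: ${total / 1e9:.1f} billion")
print(f"Share of total spending: {share_of_spending:.1%}")
print(f"Relative to professional development spending: {share_of_pd:.0%}")
```

On these numbers, comprehensive evaluation for every teacher would cost well under 1 percent of total spending, and roughly a fifth of what is already spent on professional development.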

Yet many school systems have been reluctant to use these resources on
comprehensive evaluation systems such as TAP's. "It is really difficult to get them
to use Title II monies," says Kristan Van Hook, TAP's senior vice president for public
policy and development, referring to the section of NCLB that funnels some $3
billion in teacher-improvement grants to the nation's school systems. "They are very
reluctant to change how they spend that money. It's tied up in things like salaries
for reading tutors and class-size reduction."

Sending a message
Comprehensive evaluations — with standards and scoring rubrics and multiple
classroom observations by multiple evaluators and a role for student work and
teacher reflections — are valuable regardless of the degree to which they predict
student achievement, and regardless of whether they're used to weed out a few
bad teachers or a lot of them. They contribute much more to the improvement of
teaching than today's drive-by evaluations or test scores alone. And they contribute
to a much more professional atmosphere in schools.

As a result, they make public school teaching more attractive to the sort of talent
that the occupation has struggled to recruit and retain. Capable people want to
work in environments where they sense they matter, and using evaluation systems
as engines of professional improvement signals that teaching is such an enterprise.
Comprehensive evaluation systems send a message that teachers are professionals
doing important work.

But superficial principal drive-bys will continue to pervade public education — and
teacher evaluation's potential as a lever of teacher and school improvement will
continue to be squandered — if school systems and teachers unions lack incentives
to do things differently.

Ultimately, the single salary schedule may be the most stubborn barrier to better
teacher evaluations. As Kate Walsh, president of the National Council on Teacher
Quality and member-designate of the Maryland State Board of Education, says: "If
there are no consequences for rating a teacher at the top, the middle, or the
bottom, if everyone is getting paid the same, then why would a principal spend a
lot of time doing a careful evaluation? I wouldn't bother." Many teachers unions, of
course, argue that the failure of principals to take evaluations seriously requires a
single salary schedule.

There's no simple solution to this Catch-22. But TAP, for one, has addressed it
head-on by combining comprehensive evaluations that teachers trust with
performance pay. The program's comprehensive classroom evaluations legitimize
performance pay in teachers' minds, and its performance-pay component gives
teachers and administrators alike a compelling reason to take evaluations seriously.
Pay and evaluations become mutually reinforcing, rather than mutually exclusive.

References

Toch, T. and Rothman, R. (2008). Avoiding a Rush to Judgment: Teacher Evaluation and Teacher Quality. Voices in Urban Education, No. 20, Summer 2008.
