Materials Evaluation
Presented by Mohammad Reza Jafari and
Mohammad Reza Khorshidi
Under the Supervision of Dr. Farangis Shahidzade
An Overview of the Presentation
What is materials evaluation?
Why evaluate materials?
Evaluation vs. analysis
Principles in materials evaluation
What is being evaluated?
Types of materials evaluation
a) External
b) Internal
When is the evaluation carried out?
a) Pre-use
b) Whilst-use
c) Post-use
How is the evaluation carried out?
Who carries out the evaluation?
a) Teachers
b) Publishers
c) Learners
d) Specialists/practitioners
Developing criteria for materials evaluation
What Is Materials Evaluation?
Materials evaluation is a procedure that involves measuring the
(potential) value of a set of learning materials (Tomlinson, 2003c).
It involves making judgements about the effect of the materials on
the people using them.
Evaluation tries to measure some or all of the following:
the appeal of the materials to the learners,
the credibility of the materials to learners, teachers and
administrators,
the validity of the materials (i.e. Is what they teach worth
teaching?),
the reliability of the materials (i.e. Would they have the same
effect with different groups of target learners?),
the ability of the materials to interest the learners and the
teachers,
the ability of the materials to motivate the learners,
the flexibility of the materials (e.g. the extent to which it is
easy for a teacher to adapt the materials to suit a particular
context).
Why Evaluate Materials?
According to Sheldon (1988), there are three basic reasons to
evaluate coursebooks:
1) By evaluating materials, the teacher/program developer can
make decisions on selecting the appropriate coursebook.
2) Moreover, evaluation can familiarize the teacher with the
potential weaknesses and strengths of materials.
3) Materials evaluation can be another form of action research,
developing our understanding of the functions of materials.
“The primary function of evaluation is to assess the suitability of
materials for a given teaching and learning context” (Mishan &
Timmis, 2015, p. 66).
It is important to keep in mind that no two evaluations can be the
same because the needs, objectives, backgrounds (e.g.,
sociocultural/local) and preferred styles of the participants will
differ from context to context (Tomlinson, 2003c).
(New York vs. Tehran)
Therefore, the main point is that it is not the materials which are
being evaluated but their effects on the people who are using
them (including the evaluators, too).
Evaluation vs. Analysis
An evaluation can include an analysis or follow from one, but its
objectives and procedures are different (Tomlinson, 2003c).
An evaluation focuses on the users of the materials and makes
judgements about their effects and no matter how structured,
criterion referenced and rigorous an evaluation is, it will be
inevitably subjective (Tomlinson, 2003b).
On the other hand, an analysis focuses on the materials and it
aims to provide an objective analysis of them. It “asks questions
about what the materials contain, what they aim to achieve and
what they ask learners to do” (Tomlinson, 1999, p. 10).
Byrd (2001) makes a rather different distinction between
evaluation and analysis when she talks about “evaluation” for
selection and “analysis” for implementation.
Even a review for a publisher or journal, or an evaluation for a
ministry of education, is often “fundamentally a subjective, rule
of thumb activity” (Sheldon, 1988, p. 245).
How to eliminate/remove subjectivity?
Making an evaluation criterion-referenced can reduce (but not
remove) subjectivity and can certainly help to make an evaluation
more principled, rigorous, systematic and reliable (Tomlinson,
2003c).
Examples of Analysis
Questions
• Does it provide a transcript of the listening texts? (YES/NO)
• What does it ask the learners to do immediately after reading
a text?
Example of Evaluation
Question
• Are the listening texts likely to engage the learner? (very
unlikely/very likely)
Ideally, analysis is objective but analysts are often influenced by
their own ideology and their questions are biased accordingly
(Tomlinson, 2003c).
• For example, in the question “Does it provide a lot of guided
practice?”, the phrase “a lot of” implies it should do and this
could interfere with an objective analysis of the materials.
Analysts also often have a hidden agenda when designing their
instruments of analysis (Tomlinson, 2003c).
• For example, an analyst might ask the question “Are the
dialogues authentic?” in order to provide data to support an
argument that intermediate coursebooks do not help to
prepare learners for the realities of conversation.
Unfortunately, as Tomlinson (2003c) observes, “many publications
on materials evaluation mix analysis and evaluation” (p. 23).
For example, Cunningsworth (1984, pp. 74–9) includes both
analysis and evaluation questions in his “Checklist of Evaluation
Criteria.”
Tomlinson’s preference for separating analysis from evaluation is
also shared by Littlejohn (2011), who presents a general
framework for analyzing materials (pp. 182–98), as follows:
1) Analysis of the target situation of use,
2) Materials analysis,
3) Match and evaluation (determining the appropriacy
of the materials to the target situation of use, or
the congruence between the target situation of use
and the materials),
4) Action.
Principles in Materials Evaluation
Many evaluations are impressionistic, or at best are aided by an ad-hoc
and very subjective list of criteria (Tomlinson, 2003c).
In Tomlinson’s (2003c) view, it is very important that evaluations
are driven by a set of principles and that these principles are
articulated by the evaluator(s) prior to the evaluation.
In this way greater validity and reliability can be achieved and fewer
mistakes are likely to be made.
In developing a set of principles it is useful to consider the following.
The Evaluator’s Theory of Learning and Teaching
All teachers develop theories of learning and teaching which they apply in
their classrooms.
Many researchers (e.g. Schon, 1983) argue that it is useful for teachers to
try to achieve an articulation of their theories by reflecting on their
practice.
Edge and Wharton (1998, p. 297) argue that reflective practice can
not only lead to “perceived improvements in practice but, more
importantly, to deeper understandings of the area investigated.”
In a similar way, Tomlinson (2003b) argues that the starting point of any
evaluation should be reflection on the evaluator’s practice leading to
articulation of the evaluator’s theories of learning and teaching.
In this way, evaluators can
a) make overt their predispositions,
b) make use of them in constructing criteria for evaluation,
c) be careful not to let them weight the evaluation too much towards
their own bias and
d) learn a lot about themselves and about the learning and teaching
process.
WHAT IS BEING EVALUATED?
According to Mishan and Timmis (2015), the usual assumption in
materials evaluation is that what is being evaluated is a
coursebook.
This is actually the most common form of evaluation.
“Among the materials which could usefully be evaluated are,
for example, in-house materials, tests, graded readers or self-
access materials” (Mishan & Timmis, 2015, p. 58).
Tasks can also be evaluated (Ellis, 2011).
Ellis (2011) informs us about “micro-evaluation,” which, in the
words of Mishan and Timmis (2015), “involves researching in
detail the effects of particular classroom tasks included in the
materials” (p. 58).
Ellis (2011: 231) argues that “[micro-evaluation] forces a teacher
to examine the assumptions that lie behind the design of a task
and the procedures used to implement it. It requires them to go
beyond impressionistic evaluation by examining empirically
whether a task ‘works’ in the way it was intended and how it can
be improved for future use.”
Types of Materials
Evaluation
Evaluations differ in
• purpose: doing an evaluation to
1) help a publisher make decisions about publication,
2) help yourself in developing materials for publication,
3) select a textbook,
4) write a review for a journal or as part of a research
project.
• personnel: as an evaluator you might be
1) a learner,
2) a teacher,
3) an editor,
4) a researcher,
5) a Director of Studies or an Inspector of English.
• formality: you might be
1) doing a mental evaluation in a bookshop,
2) filling in a short questionnaire in class or
3) doing a rigorous empirical analysis of data elicited from a
large sample of users of the materials.
• timing: you might be doing your evaluation
1) before the materials are used,
2) while they are being used or
3) after they have been used.
WHEN IS THE EVALUATION
CARRIED OUT?
In terms of when the evaluation is
carried out, Cunningsworth (1995)
proposes pre-use, in-use and post-use
evaluations:
1) Pre-use Evaluation: it is intended to predict the potential
performance of a material (predictive).
2) In-use Evaluation: it is conducted while using a coursebook
“when a newly introduced coursebook is being monitored or
when a well-established but ageing coursebook is being
assessed to see whether it should be considered for
replacement” (Cunningsworth, 1995, p. 14).
3) Post-use Evaluation: it provides retrospective assessment of a
material and it is also used to decide whether to use the same
material on future occasions or not.
Tomlinson’s View
The obvious distinction in terms of when the evaluation takes
place is between pre-use, whilst-use and post-use evaluation
(Tomlinson, 2003b).
However, it is important to keep in mind that almost all these
kinds of evaluations will be preceded by a detailed analysis of the
context in which the materials will be used (McGrath, 2002).
Therefore, it can be concluded that any evaluation presupposes
an already-done analysis (Tomlinson, 2003c).
Pre-use Evaluation
Pre-use evaluation involves making predictions about the
potential value of materials for their users (Tomlinson, 2003c).
It is often done impressionistically.
Example: when a teacher flicks through a book to gain a quick
impression of its potential value.
• Publishers are well aware of this procedure and sometimes
place attractive illustrations in the top right-hand corner of
the right-hand page in order to influence the flicker in a
positive way (Tomlinson, 2003c).
According to Mishan and Timmis (2015), pre-use evaluations are
more common than whilst-use and post-use evaluations for two
reasons (p. 59):
1) They are usually designed to inform us which materials to
adopt,
2) They are easier to carry out than whilst-use or post-use
evaluations.
Whilst-use Evaluation
This involves measuring the value of materials while using them
or while observing them being used (Tomlinson, 2003c).
It can be more objective and reliable than pre-use evaluation as it
makes use of measurement rather than prediction (Tomlinson,
2003c).
However, it is limited to measuring what is observable (e.g. Are
the instructions clear to the learners?) and cannot claim to
measure what is happening in the learners’ brains.
We should take into account the following
while conducting a whilst-use evaluation:
• Clarity of instructions
• Clarity of layout
• Comprehensibility of texts
• Credibility of tasks
• Achievability of tasks
• Achievement of performance objectives
• Potential for localization
• Practicality of the materials
• Teachability of the materials
• Flexibility of the materials
• Appeal of the materials
• Motivating power of the materials
• Impact of the materials
• Effectiveness in facilitating short-term learning
Whilst-use evaluation can therefore be very useful, but it can
also be dangerous because teachers and observers can be misled by
whether the activities seem to work or not (Tomlinson, 2003c).
According to Tomlinson (2003b), an evaluator can easily be
deceived by activities which appear to work well.
For example, lessons which generate “student talking time” are
often rated highly, but we need to evaluate the quality of the
talk, not just the quantity (Mishan & Timmis, 2015).
Tomlinson (2003b) argues that most of the whilst-use evaluation
aspects can be assessed impressionistically through observation,
though he advises that it is preferable to focus on one aspect per
observation.
In other words, greater reliability can be achieved by focusing on
one criterion at a time and also by using pre-prepared
instruments of measurement.
• For example, the criterion “Appeal of the materials” could be
assessed by asking: Are the materials appealing to the learners?
McGrath (2002: 120) focuses specifically on the role of the
teacher in both whilst-use and post-use evaluations.
Teachers can, he argues, ask themselves questions of the
following kind as prompts for whilst-use and post-use evaluations:
• What proportion of the materials was I able to use unchanged?
• Did the unchanged materials appear to work well? What
evidence do I have for this?
• What spontaneous changes did I make as I taught with the
materials? Did these improvisations work well? If not, what do I
need to do differently?
Post-use Evaluation
Post-use evaluation is probably the most valuable/informative
type of evaluation because it can measure the actual effects of
the materials on the users (Tomlinson, 2003c).
It can measure:
1) the short-term effects such as motivation, impact,
achievability, instant learning, etc., and
2) the long-term effects like durable learning and application.
According to Mishan and Timmis (2015), while pre-use evaluation
has an important role in predicting poor selection of materials or
selection of poor materials, post-use evaluation is potentially the
most informative type.
McGrath (2013) also believes that retrospective evaluation (post-
use) can lead to the identification of weaknesses in the materials,
thereby leading to constructive revision and adaptation.
We should take into account the following
while conducting a post-use evaluation:
• What do the learners know which they did not know before
starting to use the materials?
• What do the learners still not know despite using the
materials?
• What can the learners do which they could not do before
starting to use the materials?
• What can the learners still not do despite using the materials?
• To what extent have the materials prepared the learners for
their examinations?
• To what extent have the materials prepared the learners for
their post-course use of the target language?
• What effect have the materials had on the confidence of the
learners?
• What effect have the materials had on the motivation of the
learners?
• To what extent have the materials helped the learners to
become independent learners? (autonomy)
• Did the teachers find the materials easy to use?
• Did the materials help the teachers to cover the syllabus?
• Did the administrators find the materials helped them to
standardize the teaching in their institution?
In other words, by conducting a post-use evaluation, one can:
1) measure the actual outcomes of the use of the materials, and
2) provide the data in order to make reliable decisions about the
use, adaptation or replacement of the materials.
How to measure the post-use effects
of materials:
tests of what has been “taught” by the materials,
tests of what the students can do (direct testing),
examinations,
interviews,
questionnaires,
criterion-referenced evaluations by the users,
post-course diaries,
post-course “shadowing” of the learners,
post-course reports on the learners by employers, subject
tutors, etc.
HOW IS THE EVALUATION
CARRIED OUT?
Evaluation criteria:
The most important factor in the design of an evaluation
instrument should be the criteria against which the materials are
evaluated (Tomlinson, 2003b).
Generating evaluation criteria:
1) The first succinct evaluative framework is the CATALYST test,
introduced by Grant (1987).
The acronym stands for Communicative, Aims, Teachability,
Available add-ons, Level, Your impression, Students’ interest and
Tried and tested.
2) The second is Tanner and Green’s (1998) practical assessment
form, based on Method, Appearance, Teacher-friendliness, Extras,
Realism, Interestingness, Affordability, Level and Skills, the
initials of which collectively make up the word MATERIALS.
3) The third framework is that of Tomlinson (2003b); he suggests
five categories of evaluation criteria, each of which can be used
to develop a number of specific criteria:
universal (driven by SLA theory): e.g. are the materials
motivating?
local (related to the context): e.g. are the materials culturally
acceptable in the context?
media-specific (e.g. audio or computer): e.g. is the sound
quality of the audio materials good?
content-specific (e.g. exam or English for Specific Purposes
(ESP)): e.g. do the materials replicate the types of real-world
tasks the target group will need to do? (content validity)
age-specific: e.g. are the visuals likely to appeal to children?
4) The fourth is that of Rubdy (2003), who argues that evaluation
criteria can be generated from three key notions:
a) psychological validity: learners’ needs, goals and pedagogical
requirements (like independence, autonomy, self-
development and creativity).
b) pedagogical validity: teachers’ skills, abilities, theories and
beliefs (like guidance, choice and reflection).
c) process validity (and content validity): the thinking underlying
the materials, writer’s presentation of the content and
approach to teaching and learning respectively (methodology,
content, layout and graphics).
5) The fifth is the framework proposed by Riazi (2003), which
consists of:
a) surveying the teaching/learning situation,
b) conducting a neutral analysis and
c) carrying out a belief-driven evaluation.
6) The sixth is that of Mukundan (2006), who describes the use of
a composite framework combining checklists, reflective journals
and computer software to evaluate ELT textbooks in Malaysia.
a) checklists,
b) reflective journals and
c) computer software.
7) The seventh framework has been proposed by McDonough,
Shaw and Masuhara (2013), who focus on developing criteria for
evaluating the suitability of materials in relation to usability,
generalizability, adaptability and flexibility.
Evaluating the suitability of materials in relation to:
a) usability,
b) generalizability,
c) adaptability and
d) flexibility.
8) The eighth framework is that of McGrath (2002), who suggests
a procedure involving materials analysis followed by first glance
evaluation, user feedback and evaluation using context-specific
checklists.
a) Materials analysis,
b) First glance evaluation,
c) User feedback and
d) (Final) evaluation using context-specific checklists.
McGrath (2002) notes the following areas which are common to
most of the frameworks:
• design: includes both layout of material on the page and
overall clarity of organization
• language content: coverage of linguistic items and language
skills
• subject matter: topics
• practical considerations: e.g. availability, durability and price
Making use of a checklist of criteria has become popular in
materials evaluation, and certain checklists from the literature
have been used frequently (Tomlinson, 2003c).
For example, the eclectic checklist proposed by Demir and Ertas
(2014), which consists of these four main sections:
Subjects & Contents (10 items),
Skills & Sub-skills (25 items),
Layout & Physical Make-up (7 items) and,
Practical Considerations (14 items).
(56 items overall)
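To make the mechanics of such a sectioned checklist concrete, here
is a minimal Python sketch. Only the section names and sizes come
from Demir and Ertas (2014); the items shown and all scores are
invented for illustration. Per-section means support discrete
verdicts, while the overall mean supports a global one.

```python
# Minimal sketch of a sectioned checklist (hypothetical scores).
# Section names follow Demir and Ertas (2014); the score lists are
# abridged stand-ins for the real 10/25/7/14 items per section.
from statistics import mean

ratings = {
    "Subjects & Contents": [4, 3, 5],
    "Skills & Sub-skills": [2, 4, 4],
    "Layout & Physical Make-up": [5, 5],
    "Practical Considerations": [3, 2, 4],
}

# Per-section means give discrete verdicts; the overall mean, a global one.
section_means = {section: mean(scores) for section, scores in ratings.items()}
overall_mean = mean(s for scores in ratings.values() for s in scores)

for section, m in section_means.items():
    print(f"{section}: {m:.2f}")
print(f"Overall: {overall_mean:.2f}")
```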
Problems of evaluation criteria/checklists:
• The problem is that no set of criteria is applicable to all
situations, and it is also important that there be congruence
between the materials and the curriculum, students and teachers
(Byrd, 2001).
• Mathews (1985), Cunningsworth (1995) and Tomlinson (2012)
have also stressed the importance of relating evaluation
criteria to what is known about the context of learning,
meaning the criteria should be consonant with the context of
learning.
Mukundan and Ahour (2010), in their review of 48 evaluation
checklists, were critical of most checklists for being too
context-bound to be generalizable.
Instead, Mukundan and Ahour (2010) proposed that a framework for
generating flexible criteria would be more useful than detailed
and inflexible checklists, and that more attention should be
given to retrospective evaluation than to predictive evaluation.
In other words, instead of relying on ready-made checklists, each
practitioner can develop their own set of evaluation criteria.
Tomlinson and Masuhara (2004, p. 7) proposed the following
criteria for evaluating, monitoring and revising the criteria an
evaluator has generated:
a) Is each question an evaluation question?
b) Does each question only ask one question?
c) Is each question answerable?
d) Is each question free of dogma?
e) Is each question reliable in the sense that other evaluators
would interpret it in the same way?
Tomlinson (2003b) also suggests a set of questions which could be
used more generally to monitor evaluation criteria in any
evaluation framework:
• Is the list based on a coherent set of principles of language
learning?
• Are all the criteria actually evaluation criteria?
• Are the criteria sufficient to help the evaluator to reach useful
conclusions?
• Are the criteria organized systematically (for example into
categories and subcategories which facilitate discrete as well
as global verdicts and decisions)?
• Are the criteria sufficiently neutral to allow evaluators with
different ideologies to make use of them?
• Is the list sufficiently flexible to allow it to be made use of by
different evaluators in different circumstances?
Other Ways to Evaluate
Materials
Regarding different methods to evaluate coursebooks,
Abdelwahab (2013) suggests three basic methods:
a) The impressionistic method: involves analyzing a coursebook
on the basis of a general impression. This method is not adequate
in itself.
b) The checklist method: needs to be integrated with the
impressionistic method to compensate for the inadequacy of
impressions alone.
c) The in-depth method: has to do with a profound scrutiny
of representative features such as the design of one particular
unit or exercise, or how particular language elements have
been treated (internal evaluation).
McDonough and Shaw (2003: 61) suggest that the evaluators
should first conduct an external evaluation “that offers a brief
overview from the outside” and then carry out “a closer and more
detailed internal evaluation.”
1) A brief external evaluation which should be conducted to
have an overview of the organizational foundation of the
material.
2) A detailed internal evaluation “to see how far the materials in
question match up to what the author claims as well as to the
aims and objectives of a given teaching program” (McDonough
& Shaw, 1993, p. 64).
The External Evaluation
In this model, the organization of the materials as stated
explicitly by the author/publisher should be examined by looking
at:
• the “blurb” or the claims made on the cover of the
teacher’s/students’ book
• the introduction and table of contents
This is actually what Tomlinson (2003c: 16) calls analysis in that
“it asks questions about what the materials contain, what they
aim to achieve and what they ask learners to do.”
At this stage, an evaluator should consider why the materials have
been produced. In other words, it should be made clear what the
purposes of the materials are.
From the “blurb” and the introduction, we can normally expect
comments on some/all of the following (McDonough, Shaw &
Masuhara, 2013, pp. 55-56):
the intended audience (who the materials are targeted at)
the proficiency level (false beginner, low intermediate, etc.)
the context in which the materials are to be used (EFL, ESL,
ESP, EAP)
how the language has been presented and organized into
teachable units/lessons (units/lessons/lengths)
the author’s views on language and methodology and the
relationship between the language, the learning process and
the learner
Other factors to take into account at this external stage are as
follows:
Are the materials to be used as the main “core” course or to
be supplementary to it?
Is a teacher’s book in print and locally available?
Is a vocabulary list/index included? (this is useful where the
learner might be doing a lot of individualized and/or
out-of-class work)
What visual material does the book contain (photographs,
charts, diagrams) and is it there for cosmetic value only or is it
integrated into the text?
Are the layout and presentation clear or cluttered?
The potential durability of the materials, paper quality and
binding need to be assessed.
Is the material too culturally biased or specific?
Do the materials represent minority groups and/or women in a
negative way? Do they present a “balanced” picture of a
particular country/society?
What is the cost of the inclusion of digital materials (e.g. CD,
DVD, interactive games, quizzes and downloadable materials
from the web)? How essential are they to ensure language
acquisition and development?
The inclusion of tests in the teaching materials (diagnostic,
progress, achievement); would they be useful for your
particular learners?
What Next?
If our external evaluation shows the materials to be potentially
appropriate and worthy of a more detailed inspection, then we
can continue with our internal or more detailed evaluation.
If not, then we can “exit” at this stage and start evaluating other
materials if we so wish.
[Flowchart] An overview of the materials evaluation process
(McDonough, Shaw & Masuhara, 2013, p. 58): macro-evaluation
(external) → if inappropriate, exit; if appropriate →
micro-evaluation (internal) → if inappropriate, exit; if
appropriate → adopt/select.
The Internal Evaluation
Now we can continue to the next stage of our evaluation
procedure by performing an in-depth investigation into the
materials.
What is important at this stage is that we have to analyze the
extent to which the aforementioned factors stated in the external
evaluation stage match up with the internal consistency and
organization of the materials (McDonough, et al., 2013, p. 58).
Therefore, there should be a congruence between the claims of
the author/publisher (at the external evaluation stage) and what
the materials really include (at the internal evaluation stage).
In order to perform an effective internal evaluation of the
materials, we need to examine at least two units (preferably
more) of a book or set of materials to investigate the following
factors (McDonough, et al., 2013, pp. 59-60):
the presentation of the skills in the materials (what skills are
covered, the proportion given to each skill, are the skills
treated in isolation (discretely) or integratively?)
the grading and sequencing of the materials.
where reading/discourse skills are involved, is there much
in the way of appropriate text beyond the sentence?
where listening skills are involved, are recordings “authentic”
or artificial?
do speaking materials incorporate what we know about the
nature of real interaction or are artificial dialogues offered
instead?
the relationship of tests and exercises to (a) learner needs and
(b) what is taught by the course material.
do you feel that the material is suitable for different learning
styles? Is a claim and provision made for self-study and is such
a claim justified?
are the materials engaging enough to motivate both students and
teachers alike, or would you foresee a student/teacher
mismatch?
At this stage, it is also useful to consider how the materials may
guide and frame “teacher–learner interaction” and “the teacher–
learner relationship.”
The framework proposed by McDonough, et al. (2013) focuses on
evaluating the suitability of materials in relation to:
1) usability,
2) generalizability,
3) adaptability and
4) flexibility.
1) Usability Factor
• How far could the materials be integrated into a particular
syllabus as “core” or supplementary?
• For example, we may need to select materials that suit a
particular syllabus or set of objectives that we have to work to.
• The materials may or may not be able to do this.
2) Generalizability Factor
• Is there restricted use of “core” features that makes the
materials more generally useful?
• Perhaps not all the material will be useful for a given
individual or group but some parts might be.
• This factor can in turn lead us to consider the next point.
3) Adaptability Factor
• Can parts be added/extracted/used in another context/modified
for local circumstances?
• There may be some very good qualities in the materials but, for
example, we may judge the listening material or the reading
passages to be unsuitable and in need of modification.
• If we think that adaptation is feasible, we may choose to do this.
4) Flexibility Factor
• How rigid is the sequencing and grading? Can the materials be
entered at different points or used in different ways?
• In some cases, materials that are not so steeply graded offer a
measure of flexibility that permits them to be integrated
easily into various types of syllabus.
WHO CARRIES OUT THE
EVALUATION?
Principled evaluations based on explicit criteria can give the
impression that evaluation is, or should be, the exclusive domain
of specialists, which may not always be the case (Mishan &
Timmis, 2015, pp. 64-65).
We need to consider the role in evaluation of stakeholders who
are not specialists in this specific field:
a) teachers,
b) learners and
c) publishers.
Teachers as Evaluators
• Masuhara (2011) says meetings could be held where new
materials are presented to the teachers, leading to discussions
of which activities the teachers preferred and why they
preferred these activities to others.
• McGrath (2002: 120) proposes a number of questions teachers
can ask themselves to systematize whilst-use evaluation.
• Again McGrath (2013) suggests that teachers might keep
records of use, noting sections of the materials they had used
or omitted, which sections went well and so on.
Learners as Evaluators
According to McGrath (2013: 151), learners also have an
important role to play in evaluation: “learners are capable of
evaluation. They do not always opt for the same point on a scale.
They discriminate. Given the opportunity, they can make
judgements which may sometimes surprise their teachers.”
Examples of learners’ involvement in evaluation:
a) learner diaries,
b) rating of tasks,
c) pyramid discussions and
d) metaphor study.
Publishers as Evaluators
Amrani (2011) notes that publishers can use either (a) piloting or
(b) reviewing of materials to determine their suitability.
However, she points out that reviewing (comments on materials
made by stakeholders) is now more common among publishers than
piloting.
Standard Approaches to
Materials Evaluation
A useful exercise for anybody writing or evaluating language
teaching materials would be to evaluate published checklists and
criteria lists, such as those from the frameworks discussed
above, against the following criteria (Tomlinson, 2003c):
• Is the list based on a coherent set of principles of language
learning?
• Are all the criteria actually evaluation criteria or are they
criteria for analysis?
• Are the criteria sufficient to help the evaluator to reach useful
conclusions?
• Are the criteria organized systematically (e.g. into categories
and subcategories which facilitate discrete as well as global
verdicts and decisions)?
• Are the criteria sufficiently neutral to allow evaluators with
different ideologies to make use of them?
• Is the list sufficiently flexible to allow it to be made use of by
different evaluators in different circumstances?
Developing Criteria for
Materials Evaluation
Tomlinson (2003c) stresses that evaluators need to develop their
own principled criteria which take into account the context of the
evaluation and their own beliefs.
He also claims that evaluation criteria should be developed
before materials are produced.
1) Brainstorm a list of universal criteria:
Universal criteria: criteria which would apply to any language
learning materials anywhere for any learners.
They derive from principles of language learning and the results
of classroom observation and provide the fundamental basis for
any materials evaluation (Tomlinson, 2003c).
Examples of universal criteria would be:
• Do the materials provide useful opportunities for the learners
to think for themselves?
• Are the target learners likely to be able to follow the
instructions?
• Are the materials likely to cater for different preferred
learning styles?
• Are the materials likely to achieve affective engagement?
2) Subdivide some of the criteria:
It is best to subdivide some of the criteria into more specific
questions if:
• the evaluation is the basis for subsequent revision or
adaptation of materials or
• if it is a formal evaluation and important decisions are to be
made based on the results of the evaluation.
For example:
Are the instructions:
• succinct? (quantity)
• sufficient? (quantity)
• self-standing? (independence)
• standardized? (quality)
• separated? (quality)
• sequenced? (from simple to complex)
• staged? (systematicity)
3) Monitor and revise the list of universal
criteria:
Is each question an evaluation question?
If the question is an analysis question then you can only give the
answer a 1 or a 5 on the 5-point scale which is recommended
later in this suggested procedure.
For example: (Does each unit include a test?)
However, if it is an evaluation question then it can be graded at
any point on the scale.
For example: (To what extent are the tests likely to provide
useful learning experiences?)
Analysis (objective) vs. Evaluation (subjective)
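A minimal Python sketch of this distinction (the two questions are
the examples above; the class itself is a hypothetical
illustration): an analysis question is effectively binary, so on
the recommended 5-point scale it can only receive the endpoints 1
or 5, whereas an evaluation question can be graded at any point.

```python
# Minimal sketch (hypothetical structure): analysis questions are
# binary, so they only admit the endpoints of the 5-point scale;
# evaluation questions can be graded at any point on it.
from dataclasses import dataclass

@dataclass
class Criterion:
    text: str
    is_analysis: bool  # True = analysis (objective); False = evaluation (subjective)

    def valid_scores(self) -> set:
        return {1, 5} if self.is_analysis else {1, 2, 3, 4, 5}

analysis_q = Criterion("Does each unit include a test?", is_analysis=True)
evaluation_q = Criterion(
    "To what extent are the tests likely to provide useful "
    "learning experiences?",
    is_analysis=False,
)

assert analysis_q.valid_scores() == {1, 5}
assert evaluation_q.valid_scores() == {1, 2, 3, 4, 5}
```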
Does each question only ask one question?
Many criteria in published lists ask two or more questions and
therefore cannot be used in any numerical grading of the
materials.
For example, Grant (1987) includes the following question, which
could be answered ‘Yes; No’ or ‘No; Yes’:
1) ‘Is it attractive? Given the average age of your students, would
they enjoy using it?’ (p. 122).
This question could be usefully rewritten as:
1) Is the book likely to be attractive to your students?
2) Is it suitable for the age of your students?
3) Are your students likely to enjoy using it?
Double-barreled questions: a question that asks about more than
one issue, yet allows for only one answer.
For example, “Do you think that students should have more
classes about history and culture?” contains two different issues;
one is about history and the other concerns culture.
Is each question answerable?
Some questions are so large and so vague that they cannot
usefully be answered; others cannot be answered without reference
to other criteria, or require expert knowledge on the part of the
evaluator.
For example: “Is it culturally acceptable?”
We need to be aware of the culture of the context in advance if
planning to answer this question.
Is each question free of dogma?
The questions should reflect the evaluators’ principles of
language learning but should not impose a rigid methodology as a
requirement of the materials.
• Are the various stages in a teaching unit adequately
developed? (presupposition: PPP)
• Do the sentences gradually increase in complexity to suit the
growing reading ability of the students? (sequence of
materials)
Is each question reliable in the sense that other
evaluators would interpret it in the same way?
There are some terms and concepts in applied linguistics which
can be interpreted differently by linguists. Therefore, it is best to
avoid them when attempting to measure the effects of materials.
• Are the materials sufficiently authentic?
• Is there an acceptable balance of skills?
• Do the activities work?
• Is each unit coherent?
Are the materials sufficiently authentic?
• Do the materials help the learners to use the language in
situations they are likely to find themselves in after the course?
Is there an acceptable balance of skills?
• Is the proportion of the materials devoted to the development of
reading skills suitable for your learners?
Do the activities work?
• Are the communicative tasks useful in providing learning
opportunities for the learners?
Is each unit coherent?
• Are the activities in each unit linked to each other in ways which
help the learners?
4) Categorize the list:
It is possible to rearrange the random list of universal criteria into
categories.
This sharpens the focus and increases the possibility of making
generalizations.
Example categories: Learning Principles, Cultural Perspective,
Topic Content, Teaching Points, Texts, Activities, Methodology,
Instructions, Design and Layout.
5) Develop media-specific criteria:
These are criteria which ask questions of particular relevance to
the medium used by the materials being evaluated (e.g. criteria
for books, for audio cassettes, for videos, etc.).
• Is it clear which sections the visuals refer to? (illustrations)
• Is the sequence of activities clearly signaled? (layout)
• Are the different voices easily distinguished? (audibility)
• Do the gestures of the actors help to make the language
meaningful in realistic ways? (movement)
6) Develop content-specific criteria:
These are criteria which relate to the topics and/or teaching
points of the materials being evaluated.
(For example, a grammar book may never include rhetorical
conventions of English writing).
• Do the examples of business texts (e.g. letters, invoices, etc.)
replicate features of real-life business practice?
• Do the reading texts represent a wide and typical sample of
genres?
7) Develop age-specific criteria:
These are criteria which relate to the age of the target learners.
These criteria ask whether the materials are suitable for
5-year-olds, for 10-year-olds, for teenagers, for young adults or
for mature adults.
These criteria would relate to cognitive and affective
development, to previous experience, to interests and to wants
and needs.
• Are there short, varied activities which are likely to match the
attention span of the learners?
• Is the content likely to provide an achievable challenge in
relation to the maturity level of the learners?
8) Develop local criteria:
These are criteria which relate to the actual or potential
environment of use.
They are actually related to measuring the value of the materials
for particular learners in particular circumstances.
It is this set of criteria which is unique to the specific evaluation
being undertaken and which is ultimately responsible for most of
the decisions made in relation to the adoption, revision or
adaptation of the materials.
Typical features of the environment which would determine this
set of criteria are:
• the type(s) of institution(s),
• class size,
• the background, needs and wants of the learners/teachers,
• the language policies of a particular region,
• the objectives of the courses,
• the intensity and extent of the teaching time available,
• the amount of exposure to the target language outside the
classroom.
Examples of local criteria would be:
• To what extent are the stories likely to interest 15-year-old
boys in Turkey?
• To what extent are the reading activities likely to prepare the
students for the reading questions in the Primary School
Leaving Examination in Singapore?
• To what extent are the topics likely to be acceptable to
parents of students in Iran?
9) Develop other criteria:
• teacher-specific,
• administrator-specific,
• gender-specific,
• culture-specific,
• L1-specific criteria and,
• criteria assessing the match between the materials and the
claims made by the publishers for them (internal vs. external
evaluation).
10) Trial the criteria:
It is always important to trial the criteria to ensure that the
criteria are sufficient, answerable, reliable and useful.
Revisions, if needed, can be made before the actual evaluation
begins.
11) Conducting the evaluation:
According to Tomlinson (2003c), the most effective way of
conducting an evaluation is to:
• make sure there is more than one evaluator (reliability issues),
• discuss the criteria to make sure there is equivalence of
interpretation,
• answer the criteria independently and in isolation from the
other evaluator(s),
• focus in a large evaluation on a typical unit for each level (and
then check its typicality by reference to other units),
• give a score for each criterion (with some sets of criteria
weighted more heavily than others),
• write comments at the end of each category,
• at the end of the evaluation aggregate each evaluator’s scores
for each criterion, category of criteria and set of criteria and
then average the scores,
• record the comments shared by the evaluators,
• write a joint report.
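The scoring and averaging steps above are simple arithmetic. The
following minimal Python sketch (all evaluators, categories,
scores and weights are invented for illustration) shows how two
evaluators’ independently awarded criterion scores might be
weighted by category, aggregated and then averaged into a joint
result.

```python
# Minimal sketch (hypothetical data): aggregate each evaluator's
# 5-point scores per criterion, weight some categories more heavily
# than others, then average across evaluators, as outlined above.

# scores[evaluator][category] = one 5-point score per criterion
scores = {
    "evaluator_A": {"universal": [4, 5, 3], "local": [2, 4]},
    "evaluator_B": {"universal": [5, 4, 4], "local": [3, 3]},
}

weights = {"universal": 2.0, "local": 1.0}  # hypothetical weighting

def weighted_total(by_category):
    """Sum each category's scores, scaled by that category's weight."""
    return sum(weights[cat] * sum(vals) for cat, vals in by_category.items())

per_evaluator = {name: weighted_total(cats) for name, cats in scores.items()}
average = sum(per_evaluator.values()) / len(per_evaluator)

print(per_evaluator)          # {'evaluator_A': 30.0, 'evaluator_B': 32.0}
print(f"Average: {average}")  # 31.0
```

In a real evaluation the weights and categories would of course
come from the criteria developed earlier, and the shared comments
and joint report would sit alongside the numbers.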
As Tomlinson (2003c) says:
What is recommended above is a very rigorous, systematic
but time-consuming approach to materials evaluation which I
think is necessary for major evaluations from which important
decisions are going to be made. However for more informal
evaluations (or when very little time is available) I would
recommend the following procedure:
Procedures for Conducting
Informal Evaluation
1) Brainstorm beliefs,
2) Decide on shared beliefs,
3) Convert the shared beliefs into universal criteria,
4) Write a profile of the target learning context for the materials,
5) Develop local criteria from the profile,
6) Evaluate and revise the universal and the local criteria,
7) Conduct the evaluation.
Conclusion
Materials evaluation is initially a time-consuming and difficult
undertaking.
Approaching it in the principled, systematic and rigorous ways
suggested above can:
1) help to make and record vital discoveries about the materials
being evaluated,
2) help the evaluators to learn a lot about materials, about
learning and teaching and about themselves.
Doing evaluations formally and rigorously can also eventually
contribute to the development of an ability to conduct principled
informal evaluations quickly and effectively when the occasion
demands:
• when asked for an opinion of a new book,
• when deciding which materials to buy in a bookshop,
• when editing other people’s materials,
• and a lot of other occasions.
List of Key Words
Evaluation, Analysis, Reliability, Validity, Credibility,
Action research, Adaptation, Adoption, Objective, Subjective,
Selection, Implementation, Criterion-referenced, Feedback,
Checklists, Usability, Generalizability, Adaptability,
Flexibility, Context-specific, Dogma, In-depth,
External/internal evaluation, Blurb, False beginner,
Localization, Achievability, Practicality, Teachability,
Whole person approach, Kinesthetic/dependent/independent learner,
Ideology, Hidden agenda, Authenticity, Micro-evaluation,
Macro-evaluation, Impressionistic, Empirical,
Pre-use/whilst-use/post-use, Predictive/retrospective,
Media/content/age-specific, Universal/local criteria,
Process/content validity, Pedagogical/psychological validity,
Double-barreled questions, Revision, Language policy, Ad hoc,
Self-investment, Attitude, Aptitude, Intake, Self-esteem, Style,
Experiential learning, Input, Strategic competence, Awareness,
Sensitivity, Inner voice, Personalization, Output
References
Abdelwahab, M. M. (2013). Developing an English Language Textbook
Evaluative Checklist. IOSR Journal of Research & Method in
Education, 1(3), 55-70.
Amrani, F. (2011). The process of evaluation: a publisher’s view. In B.
Tomlinson (ed.), Materials Development in Language Teaching, 2nd
edn. (pp. 267–95). Cambridge: Cambridge University Press.
Byrd, P. (2001). Textbooks: Evaluation for Selection and Analysis for
Implementation. In M. Celce-Murcia (Ed.), Teaching English as a
Second or Foreign Language, 3rd edn. (pp. 415-427). Boston, MA:
Heinle & Heinle.
Cunningsworth, A. (1984). Evaluating and Selecting EFL Teaching
Material. London: Heinemann.
Cunningsworth, A. (1995). Choosing Your Coursebook. London: Longman.
Demir, Y., & Ertas, A. (2014). A Suggested Eclectic Checklist for ELT
Coursebook Evaluation. The Reading Matrix, 14(2), 243-252.
Ellis, R. (2011). Macro- and micro-evaluations of task-based teaching. In
B. Tomlinson (ed.), Materials Development in Language Teaching,
2nd edn. (pp. 212–36). Cambridge: Cambridge University Press.
Grant, N. (1987). Making the Most of Your Textbook. Harlow: Longman.
Littlejohn, A. P. (2011). The analysis of language teaching materials:
inside the Trojan horse. In B. Tomlinson (ed.), Materials
Development in Language Teaching, 2nd edn. (pp. 179–212).
Cambridge: Cambridge University Press.
Masuhara, H. (2011). What do teachers really want from coursebooks? In
B. Tomlinson (ed.), Materials Development in Language Teaching,
2nd edn. (pp. 236–67). Cambridge: Cambridge University Press.
Mathews, A. (1985). Choosing the best available textbook. In A.
Mathews, M. Spratt and L. Dangerfield (eds.), At the Chalkface
(pp. 202–6). London: Edward Arnold.
McDonough, J. and Shaw, C. (1993). Materials and Methods in ELT.
Oxford: Blackwell.
McDonough, J. and Shaw, C. (2003). Materials and Methods in ELT, 2nd
edn. Oxford: Blackwell.
McDonough, J., Shaw, C. and Masuhara, H. (2013). Materials and Methods
in ELT, 3rd edn. Malden: John Wiley and Sons.
McGrath, I. (2002). Materials Evaluation and Design for Language
Teaching. Edinburgh: Edinburgh University Press.
McGrath, I. (2013). Teaching Materials and the Roles of EFL/ESL
Teachers. London: Bloomsbury.
Mishan, F. and Timmis, I. (2015). Materials Development for TESOL.
Edinburgh: Edinburgh University Press.
Mukundan, J. (2006). Are there new ways of evaluating ELT coursebooks?
In J. Mukundan (ed.), Readings on ELT Material II (pp. 170–9).
Petaling Jaya: Pearson Malaysia.
Mukundan, J. and Ahour, T. (2010). A review of textbook evaluation
checklists across four decades (1970–2008). In B. Tomlinson and H.
Masuhara (eds.), Research for Materials Development in Language
Learning: Evidence for Best Practice (pp. 336–52). London: Continuum.
Riazi, A. M. (2003). What do textbook evaluation schemes tell us? A study
of the textbook evaluation schemes of three decades. In W.
Renandya (ed.), Methodology and Materials Design in Language
Teaching: Current Perceptions and Practices and their Implications
(pp. 52–69). Singapore: SEAMEO.
Rubdy, R. (2003). Selection of materials. In B. Tomlinson (ed.),
Developing Materials for Language Teaching (pp. 37–58). London:
Continuum.
Sheldon, L. (1988). Evaluating ELT textbooks and materials. ELT Journal,
42(4), 237–46.
Schon, D. (1983). The Reflective Practitioner. London: Temple Smith.
Tanner, R., & Green, C. (1998). Tasks for Teacher Education. Harlow:
Longman.
Tomlinson, B. (1999). Developing criteria for evaluating L2 materials.
IATEFL Issues 47, March.
Tomlinson, B. (2003b). Developing principled frameworks for materials
development. In B. Tomlinson (ed.), Developing Materials for
Language Teaching (pp. 107–29). London: Continuum.
Tomlinson, B. (2003c). Materials evaluation. In B. Tomlinson (ed.),
Developing Materials for Language Teaching (pp. 15–36). London:
Continuum.
Tomlinson, B. (2012a). Materials development for language learning and
teaching. Language Teaching, 45(2), 1–37.
Tomlinson, B. (2012b). State of the art review: Materials development
for language learning and teaching. Language Teaching, 45(2), 143–79.
Tomlinson, B. (Ed.). (2013a). Applied Linguistics and Materials
Development. London: Bloomsbury.
Tomlinson, B. (2013b). Developing Materials for Language Teaching, 2nd
edn. London: Bloomsbury.
Tomlinson, B. and Masuhara, H. (2004). Developing Language Course
Materials. Singapore: SEAMEO.
Thank You