12 Testing listening
It may seem rather odd to test listening separately from speaking, since
the two skills are typically exercised together in oral interaction. However,
there are occasions, such as listening to the radio, listening to
lectures, or listening to railway station announcements, when no speaking
is called for. Also, as far as testing is concerned, there may be situations
where the testing of oral ability is considered, for one reason or another,
impractical, but where a test of listening is included for its backwash
effect on the development of oral skills. Listening may also be tested for
diagnostic purposes.
Because it is a receptive skill, the testing of listening parallels in most
ways the testing of reading. This chapter will therefore spend little time
on issues common to the testing of the two skills and will concentrate
more on matters that are particular to listening. The reader who plans
to construct a listening test is advised to read both this and the previous
chapter.
The special problems in constructing listening tests arise out of the
transient nature of the spoken language. Listeners cannot usually move
backwards and forwards over what is being said in the way that they
can a written text. The one apparent exception to this, when a tape-
recording is put at the listener's disposal, does not represent a typical
listening task for most people. Ways of dealing with these problems are
discussed later in the chapter.
Specifying what the candidate should be able to do
As with the other skills, the specifications for listening tests should say
what it is that candidates should be able to do.
Content
Operations
Some operations may be classified as global, inasmuch as they depend on
an overall grasp of what is listened to. They include the ability to:
• obtain the gist;
• follow an argument;
• recognise the attitude of the speaker.
Other operations may be classified in the same way as were oral skills in
Chapter 10. In writing specifications, it is worth adding to each operation
whether what is to be understood is explicitly stated or only implied.
Informational:
• obtain factual information;
• follow instructions (including directions);
• understand requests for information;
• understand expressions of need;
• understand requests for help;
• understand requests for permission;
• understand apologies;
• follow sequence of events (narration);
• recognise and understand opinions;
• follow justification of opinions;
• understand comparisons;
• recognise and understand suggestions;
• recognise and understand comments;
• recognise and understand excuses;
• recognise and understand expressions of preferences;
• recognise and understand complaints;
• recognise and understand speculation.
Interactional:
• understand greetings and introductions;
• understand expressions of agreement;
• understand expressions of disagreement;
• recognise speaker's purpose;
• recognise indications of uncertainty;
• understand requests for clarification;
• recognise requests for clarification;
• recognise requests for opinion;
• recognise indications of understanding;
• recognise indications of failure to understand;
• recognise and understand corrections by speaker (of self and others);
• recognise and understand modifications of statements and comments;
• recognise speaker's desire that listener indicate understanding;
• recognise when speaker justifies or supports statements, etc. of other
speaker(s);
• recognise when speaker questions assertions made by other speakers;
• recognise attempts to persuade others.
It may also be thought worthwhile testing lower level listening skills in
a diagnostic test, since problems with these tend to persist longer than
they do in reading. These might include:
• discriminate between vowel phonemes;
• discriminate between consonant phonemes;
• interpret intonation patterns (recognition of sarcasm, questions in
declarative form, etc., interpretation of sentence stress).
Texts
For reasons of content validity and backwash, texts should be specified
as fully as possible.
Text type might be first specified as monologue, dialogue, or
multi-participant, and further specified: conversation, announcement, talk or
lecture, instructions, directions, etc.
Text forms include: description, exposition, argumentation, instruction,
narration.
Length may be expressed in seconds or minutes. The extent of short
utterances or exchanges may be specified in terms of the number of turns
taken.
Speed of speech may be expressed as words per minute (wpm) or
syllables per second (sps). Reported average speeds for samples of British
English are:
                                   wpm    sps
Radio monologues                   160    4.17
Conversations                      210    4.33
Interviews                         190    4.17
Lectures to non-native speakers    140    3.17
(Tauroza and Allison, 1990)
Dialects may include standard or non-standard varieties.
Accents may be regional or non-regional.
If authenticity is called for, the speech should contain such natural
features as assimilation and elision (which tend to increase with speed
of delivery) and hesitation phenomena (pauses, fillers, etc.).
Intended audience, style, topics, range of grammar and vocabulary
may be indicated.
Setting criterial levels of performance
The remarks made in the chapter on testing reading apply equally here.
If the test is set at an appropriate level, then, as with reading, a near
perfect set of responses may be required for a ‘pass’. ACTFL, ILR or other
scales may be used to validate the criterial levels that are set.
Setting the tasks
Selecting samples of speech (texts)
Passages must be chosen with the test specifications in mind. If we are
interested in how candidates can cope with language intended for native
speakers, then ideally we should use samples of authentic speech. These
can usually be readily found. Possible sources are the radio, television,
spoken-word cassettes, teaching materials, the Internet and our own
recordings of native speakers. If, on the other hand, we want to know
whether candidates can understand language that may be addressed to
them as non-native speakers, these too can be obtained from teaching
materials and recordings of native speakers that we can make ourselves.
In some cases the indifferent quality of the recording may necessitate
re-recording. It seems to me, although not everyone would agree, that a
poor recording introduces difficulties additional to the ones that we want
to create, and so reduces the validity of the test. It may also introduce
unreliability, since the performance of individuals may be affected by the
recording faults in different degrees from occasion to occasion. If details
of what is said on the recording interfere with the writing of good items,
testers should feel able to edit the recording, or to make a fresh recording
from the amended transcript. In some cases, a recording may be used
simply as the basis for a ‘live’ presentation.
If recordings are made especially for the test, then care must be taken
to make them as natural as possible. There is typically a fair amount of
redundancy in spoken language: people are likely to paraphrase what
they have already said (‘What I mean to say is ...’), and to remove this
redundancy is to make the listening task unnatural. In particular, we
should avoid passages originally intended for reading, like the following,
which appeared as an example of a listening comprehension
passage for a well-known test:
She found herself in a corridor which was unfamiliar, but after
trying one or two doors discovered her way back to the
stone-flagged hall which opened onto the balcony. She listened for
sounds of pursuit but heard none. The hall was spacious, devoid
of decoration: no flowers, no pictures.
This is an extreme example, but test writers should be wary of trying to
create spoken English out of their imagination: it is better to base the
passage on a genuine recording, or a transcript of one. If an authentic
text is altered, it is wise to check with native speakers that it still sounds
natural. If a recording is made, care should be taken to ensure that it fits
with the specifications in terms of speed of delivery, style, etc.
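A rough check of speed of delivery against the specifications can be sketched in a few lines of Python. This is only an illustration, not an established procedure: the function names, the vowel-group syllable estimate and the 15 per cent tolerance are all assumptions made for the example, and a careful tester would count syllables by hand or by ear.

```python
import re


def count_syllables(word: str) -> int:
    """Very rough estimate: one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def speech_rate(transcript: str, duration_seconds: float) -> tuple[float, float]:
    """Return (words per minute, syllables per second) for a recording."""
    words = re.findall(r"[A-Za-z']+", transcript)
    wpm = len(words) / duration_seconds * 60
    sps = sum(count_syllables(w) for w in words) / duration_seconds
    return wpm, sps


def within_spec(wpm: float, target_wpm: float, tolerance: float = 0.15) -> bool:
    """True if wpm is within the (assumed) tolerance of the specified speed."""
    return abs(wpm - target_wpm) <= tolerance * target_wpm


if __name__ == "__main__":
    # Hypothetical recording: nine words spoken in three seconds.
    wpm, sps = speech_rate("She listened for sounds of pursuit but heard none", 3.0)
    print(round(wpm), within_spec(wpm, target_wpm=190))
```

The words-per-minute figure is reliable as long as the transcript is accurate; the syllables-per-second figure is only as good as the crude syllable estimate.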
Suitable passages may be of various lengths, depending on what is
being tested. A passage lasting ten minutes or more might be needed to
test the ability to follow an academic lecture, while twenty seconds could
be sufficient to give a set of directions.
Writing items
For extended listening, such as a lecture, a useful first step is to listen
to the passage and note down what it is that candidates should be able
to get from the passage. We can then attempt to write items that check
whether or not they have got what they should be able to get. This
note-making procedure will not normally be necessary for shorter
passages, which will have been chosen (or constructed) to test particular
abilities.
In testing extended listening, it is essential to keep items sufficiently
far apart in the passage. If two items are close to each other, candidates
may miss the second of them through no fault of their own, and the
effect of this on subsequent items can be disastrous, with candidates
listening for ‘answers’ that have already passed. Since a single faulty
item can have such an effect, it is particularly important to trial
extended listening tests, even if only on colleagues aware of the potential
problems.
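Before trialling, a simple mechanical check can flag items whose answers come too close together in the recording. The sketch below assumes that the test writer has noted the time (in seconds from the start of the passage) at which the answer to each item is heard; the function name and the 20-second minimum gap are hypothetical choices for illustration, not recommended values.

```python
def items_too_close(answer_times: list[float],
                    min_gap_seconds: float = 20.0) -> list[tuple[int, int]]:
    """Return pairs of consecutive item numbers (counting from 1) whose
    answers are heard less than min_gap_seconds apart in the passage.

    answer_times must be listed in the order the items appear in the test.
    """
    close_pairs = []
    for i in range(1, len(answer_times)):
        if answer_times[i] - answer_times[i - 1] < min_gap_seconds:
            close_pairs.append((i, i + 1))
    return close_pairs


if __name__ == "__main__":
    # Hypothetical timings: items 2 and 3 are only 8 seconds apart.
    print(items_too_close([15.0, 70.0, 78.0, 140.0]))
```

A check like this does not replace trialling; it only identifies the most obvious cases where a candidate still writing one answer may miss the next.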
Candidates should be warned by key words that appear both in the
item and in the passage that the information called for is about to be
heard. For example, an item may ask about ‘the second point that the
speaker makes’ and candidates will hear ‘My second point is ...’.
The wording does not have to be identical, but candidates should be
given fair warning in the passage. It would be wrong, for instance, to
ask about ‘what the speaker regards as her most important point’ when
the speaker makes the point and only afterwards refers to it as the
most important. Less obvious examples should be revealed through
trialling.
Other than in exceptional circumstances (such as when the candidates
are required to take notes on a lecture without knowing what the items
will be, see below), candidates should be given sufficient time at the
outset to familiarise themselves with the items. As was suggested for
reading in the previous chapter, there seems no sound reason not to write
items and accept responses in the native language of the candidates. This
will in fact often be what would happen in the real world, when a fellow
native speaker asks for information that we have to listen for in the
foreign language.
Possible techniques
Multiple choice
The advantages and disadvantages of using multiple choice in extended
listening tests are similar to those identified for reading tests in the
previous chapter. In addition, however, there is the problem of the candidates
having to hold in their heads four or more alternatives while listening to
the passage and, after responding to one item, of taking in and retain-
ing the alternatives for the next item. If multiple choice is to be used,
then the alternatives must be kept short and simple. The alternatives in
the following, which appeared in a sample listening test of a well-known
examination, are probably too complex.
When stopped by the police, how is the motorist advised to
behave?
a. He should say nothing until he has seen his lawyer.
b. He should give only what additional information the law
requires.
c. He should say only what the law requires.
d. He should in no circumstances say anything.
Better examples would be:
(Understanding request for help)
I don’t suppose you could show me where this goes, could you?
Response:
a. No, I don’t suppose so.
b. Of course I can.
c. I suppose it won’t go.
d. Not at all.
(Recognising and understanding suggestions)
I’ve been thinking. Why don’t we call Charlie and ask for his
opinion?
Response:
a. Why is this his opinion?
b. What is the point of that?
c. You think it’s his opinion?
d. Do you think Charlie has called?
Multiple choice can work well for testing lower level skills, such as
phoneme discrimination.
The candidate hears       bat
and chooses between       pat    mat    fat    bat
Short answer
This technique can work well, provided that the question is short and
straightforward, and the correct, preferably unique, response is obvious.
Gap filling
This technique can work well where a short answer question with a
unique answer is not possible.
Woman: Do you think you can give me a hand with this?
Man: I’d love to help but I’ve got to go round to my mother’s in
a minute.
The woman asks the man if he can ______ her but he has
to visit his ______.
Information transfer
This technique is as useful in testing listening as it is in testing reading,
since it makes minimal demands on productive skills. It can involve such
activities as the labelling of diagrams or pictures, completing forms,
making diary entries, or showing routes on a map. The following
example, which is taken from the ARELS examination, is one of a series
of related tasks in which the candidate ‘visits’ a friend who has been
involved in a motor accident. The friend has hurt his hand, and the
candidate (listening to a tape-recording) has to help Tom write his report
of the accident. Time allowed for each piece of writing is indicated.
In this question you must write your answers. Tom also has to draw a
sketch map of the accident. He has drawn the streets, but he can’t write
in the names. He asks you to fill in the details. Look at the sketch map
in your book. Listen to Tom and write on the map what he tells you.
[Figure: sketch map of the accident scene, to be completed by the candidate]
Tom: This is a rough map of where the accident happened. There’s the
main road going across with the cars parked on both sides of it - that’s
Queen Street. You’d better write the name on it - Queen Street. (five
seconds) And the smaller road going across it is called Green Road.
Write Green Road on the smaller road. (five seconds) Now, I was riding
along Queen Street where the arrow is and the little boy ran into the
road from my right, from between the two buildings on the right. The
building on the corner is the Star Cinema - just write Star on the corner
building. (five seconds) And the one next to it is the Post Office. Write
P.O. on that building next to the cinema. (five seconds) Well, the boy ran
out between those two buildings, and into the road. Can you put an
arrow in where the boy came from, like I did for me and the bike, but
for the boy? (five seconds) When he ran out I turned left away from him
and hit one of the parked cars. It was the second car back from the
crossroads on the left. Put a cross on the second car back. (three
seconds) It was quite funny really. It was parked right outside the police
station. A policeman heard the bang and came out at once. You’d better
write Police on the police station there on the corner. (five seconds) I
think that’s all we need. Thanks very much.
Note taking
Where the ability to take notes while listening to, say, a lecture is in
question, this activity can be quite realistically replicated in the testing
situation. Candidates take notes during the talk, and only after the talk
is finished do they see the items to which they have to respond. When
constructing such a test, it is essential to use a passage from which notes
can be taken successfully. This will only become clear when the task is
first attempted by test writers. I believe it is better to have items (which
can be scored easily) rather than attempt to score the notes, which is
not a task that is likely to be performed reliably. Items should be written
that are perfectly straightforward for someone who has taken appropriate
notes.
It is essential when including note taking as part of a listening test that
careful moderation and, if possible, trialling should take place. Otherwise,
items are likely to be included that even highly competent speakers
of the language do not respond to correctly. It should go without saying
that, since this is a testing task which might otherwise be unfamiliar,
potential candidates should be made aware of its existence and, if possible,
be provided with practice materials. If this is not done, then the
performance of many candidates will lead us to underestimate their
ability.
Partial dictation
While dictation may not be a particularly authentic listening activity
(although in lectures at university, for instance, there is often a certain
amount of dictation), it can be useful as a testing technique. As well as
providing a ‘rough and ready’ measure of listening ability, it can also be
used diagnostically to test students’ ability to cope with particular
difficulties (such as weak forms in English).
Because a traditional dictation is so difficult to score reliably, it is
recommended that partial dictation is used, where part of what the
candidates hear is already written down for them. It takes the following
form:
The candidate sees:
It was a perfect day. The sun ______ in a clear blue sky
and Diana felt that all was ______ with the world. It
wasn’t just the weather that made her feel this way. It was also
the fact that her husband had ______ agreed to a divorce.
More than that, he had agreed to let her keep the house and to
pay her a small fortune every month. Life ______ be
better.
The tester reads:
It was a perfect day. The sun shone in a clear blue sky and Diana
felt that all was right with the world. It wasn’t just the weather
that made her feel this way. It was also the fact that her husband
had finally agreed to a divorce. More than that, he had agreed
to let her keep the house and to pay her a small fortune every
month. Life couldn’t be better.
Since it is listening that is meant to be tested, correct spelling should
probably not be required for a response to be scored as correct. However,
it is not enough for candidates simply to attempt a representation
of the sounds that they hear, without making sense of those sounds.
To be scored as correct, a response has to provide strong evidence of
the candidate’s having heard and recognised the missing word, even if
they cannot spell it. It has to be admitted that this can cause scoring
problems.
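One way of making such scoring more consistent is to pre-screen responses mechanically and leave only the borderline cases to a human scorer. The following is a minimal sketch, assuming that closeness of spelling to the target word can stand in for ‘strong evidence’ of recognition. That is an assumption, not an established scoring rule; the function name and the 0.75 threshold are hypothetical. It uses the standard-library `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def is_acceptable(response: str, target: str, threshold: float = 0.75) -> bool:
    """Accept a response whose spelling is close enough to the target word.

    The threshold is an illustrative assumption; it would need to be set
    (and checked against human judgements) during trialling.
    """
    response = response.strip().lower()
    target = target.strip().lower()
    if not response:
        return False
    similarity = SequenceMatcher(None, response, target).ratio()
    return similarity >= threshold


if __name__ == "__main__":
    # Hypothetical candidate responses for the gaps 'finally' and 'shone'.
    for response, target in [("finaly", "finally"), ("shon", "shone"),
                             ("fine", "finally")]:
        print(response, target, is_acceptable(response, target))
```

A purely spelling-based measure cannot tell a misspelt word that was recognised from a phonetic guess that was not, so responses near the threshold still call for a scorer's judgement.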
The gaps may be longer than one word:
It was a perfect day. The sun shone ______
and Diana felt that all was well with the world.
While this has the advantage of requiring the candidate to do more than
listen for a single word, it does make the scoring (even) less
straightforward.
Transcription
Candidates may be asked to transcribe numbers or words which are
spelled letter by letter. The numbers may make up a telephone number.
The letters should make up a name or a word which the candidates
should not already be able to spell. The skill that items of this kind test
belongs directly to the ‘real world’. In the trialling of a test I was involved
with recently, it was surprising how many teachers of English were
unable to perform such tasks satisfactorily. A reliable and, I believe,
valid way of scoring transcription is to require the response to an item
to be entirely correct for a point to be awarded.
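All-or-nothing scoring of this kind is easy to apply mechanically. The sketch below awards one point only when the whole response matches the key. Ignoring spaces, hyphens and letter case is an assumption added for the example (on the view that layout is not what is being tested), and the function names and sample data are hypothetical.

```python
def normalise(text: str) -> str:
    """Ignore case, spaces and hyphens (an assumed, adjustable convention)."""
    return "".join(ch for ch in text.upper() if ch not in " -")


def score_transcription(responses: list[str], keys: list[str]) -> int:
    """One point per item, awarded only if the response is entirely correct."""
    return sum(1 for response, key in zip(responses, keys)
               if normalise(response) == normalise(key))


if __name__ == "__main__":
    # Hypothetical items: a dictated telephone number and a spelled-out name.
    keys = ["01184960123", "Hughes"]
    responses = ["0118 496 0123", "Hughs"]
    print(score_transcription(responses, keys))
```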
Moderating the items
The moderation of listening items is essential. Ideally it should be
carried out using the already prepared recordings or with the item
writer reading the text as it is meant to be spoken in the test. The
moderators begin by ‘taking’ the test and then analyse their items and
their reactions to them. The moderation checklist given on page 154 for
reading items needs only minor modifications in order to be used for
moderating listening items.
Presenting the texts (live or recorded?)
The great advantage of using recordings when administering a listening
test is that there is uniformity in what is presented to the candidates.
This is fine if the recording is to be listened to in a well-maintained
language laboratory or in a room with good acoustic qualities and with
suitable equipment (the recording should be equally clear in all parts of
the room). If these conditions do not obtain, then a live presentation is
to be preferred. If presentations are to be live, then greatest uniformity
(and so reliability) will be achieved if there is just a single speaker for
each (part of a) test. If the test is being administered at the same time in
a number of rooms, more than one speaker will be called for. In either
case, a recording should be made of the presentation, with which speakers
can be trained, so that the intended emphases, timing, etc. will be
observed with consistency. Needless to say, speakers should have a good
command of the language of the test and be generally highly reliable,
responsible and trustworthy individuals.
Scoring the listening test
It is probably worth mentioning again that in scoring a test of a receptive
skill there is no reason to deduct points for errors of grammar or
spelling, provided that it is clear that the correct response was intended.
Reader activities
1. Choose an extended recording of spoken language that would be
appropriate for a group of students with whom you are familiar (you
may get this from published materials, or you may record a native
speaker or something on the radio). Play a five-minute stretch to
yourself and take notes. On the basis of the notes, construct eight
short-answer items. Ask colleagues to take the test and comment on
it. Amend the test as necessary, and administer it to the group of
students you had in mind, if possible. Analyse the results. Go through
the test item by item with the students and ask for their comments.
How far, and how well, is each item testing what you thought it
would test?
2. Design short items that attempt to discover whether candidates can
recognise: sarcasm, surprise, boredom, elation. Try these on colleagues
and students as above.
3. Design a test that requires candidates to draw (or complete) simple
pictures. Decide exactly what the test is measuring. Think what other
things could be measured using this or similar techniques. Administer
the test and see if the students agree with you about what is being
measured.
Further reading
Buck (2001) is a thorough study of the assessment of listening. Freedle
and Kostin (1999) investigate the importance of the text in TOEFL
minitalk items. Sherman (1997) examines the effects of candidates
previewing listening test items. Buck and Tatsuoka (1998) analyse
performance on short-answer items. Hale and Courtney (1994) look at
the effects of note taking on performance on TOEFL listening items.
Buck (1991) uses introspection in the validation of a listening test.
Shohamy and Inbar (1991) look at the effects of texts and question type.
Arnold (2000) shows how performance on a listening test can be
improved by reducing stress in those who take it. Examples of recordings
in English that might be used as the basis of listening tests are
Crystal and Davy (1975); Hughes and Trudgill (1996), if regional
British accents are relevant.