
Running head: ACHIEVEMENT TEST FINAL REPORT 1

Listening and Speaking for Hotel Workers: Summative Achievement Test

Final Report

Jenny Stetson-Strange

Sufang Hou

Sarah van Nostrand

Colorado State University


Introduction

Importance of language area

The general language area that we have chosen for our test development project falls

under the umbrella of English for Specific Purposes (ESP). More specifically, we will be

focusing our efforts on developing a listening and speaking test for a beginning level adult

Workplace English course. This test will take the form of a performance assessment. The general

language area noted above was chosen intentionally to mirror that of one group member's real-

world curriculum development project currently nearing completion. This language area,

therefore, is important first and foremost because it is targeting real participants, and once the

test has been developed and piloted, it will be integrated into the curriculum. The curriculum

development project itself is being designed by one of our group members who plans to provide

a Workplace English course to staff working in various departments at a Northern Colorado

hotel. These departments include housekeeping, maintenance, and kitchen areas. Access to

English instruction for this staff opens up the possibility of promotion and boosted self-

confidence, allows the employees to assist their children at home with schoolwork, and helps

them to communicate more effectively within their community.

Method of organization

This final report has been organized into four major sections, excluding the references

section. They include the description of the test, pilot test procedure, test results, and discussion.

The end of this report contains multiple appendices including the table of specifications, the

actual test, and several tables describing the hypothetical test results.

Description of the Test

Purpose of test
The overall purpose of this test is summative. That is, it is an achievement test measuring

student performance at the end of instruction. The interpretation of test scores will be criterion-

referenced, and they will be used to assign a final course grade (in this case a number on a four-

point scale) to every participant. The test will be made up of three separate problem-solving tasks

that the participants must complete one-on-one with the instructor. Each task should take

approximately five minutes, with the entire test lasting 15 minutes per participant. These tasks

will reflect (as closely as possible) real-world scenarios that the hotel employees face on a

regular basis. These tasks will provide evidence of employees' ability to use common hotel

vocabulary and department-specific vocabulary, to use the proper preposition while communicating

and providing information to guests, and to understand a guest complaint and/or

request (whether in person or over the phone) and provide a timely and reasonable solution.

Specific description of TLU domain

An example of a task relating to our target language use (TLU) domain is a hotel

employee who must assist a guest of the hotel with either a request or complaint. This could

include a guest requesting more pillows or shampoo, a guest notifying the employee that they do

not want the room to be cleaned because they are staying an extra night, or the guest could be

filing a complaint that the bathroom faucet leaks or the safe will not open. The employee must be

able to understand the request or complaint, fulfill or resolve the request or complaint, and then

communicate that back to the guest.

Construct definition

In defining our construct, we must identify what it is that we are trying to gather information

about. For the purpose of this test, we are trying to gather information on employees' English

performance as it pertains to the specific hotel and occupational TLU domain. Therefore,

elements of the constructs to be tested are:

Grammatical knowledge
o Hotel and customer service-specific vocabulary
o Syntax, including appropriate use of prepositions
Textual knowledge
o Cohesion, including producing and understanding utterances in conversation.
o Conversational organization, including turn-taking.
Sociolinguistic knowledge
o Cultural references such as common figures of speech and/or metaphors.
Functional knowledge
o Ideational knowledge, including descriptions and/or explanations.
o Manipulative knowledge, including understanding instrumental functions and performing regulatory functions.
o Heuristic knowledge, including problem-solving.

Elements of the construct to be assumed:

Some listening comprehension ability.

Elements that are not included:

Reading and writing abilities.

Type/design of test

This test will be an achievement test because it is measuring specific course content from

the course curriculum. It is also a summative test since it is measuring participants' knowledge at

the end of instruction. This test will use a criterion-referenced score interpretation because the intent

is to compare students against a carefully defined standard set in place by the curriculum of the

overall course. We will not be spreading out scores in order to compare participants with one

another, as a norm-referenced score interpretation would do.

Description of Table of Specifications


The table of specifications (TOS) reflects a productive skills test rather than receptive

(refer to the following paragraph for a more detailed description of the TOS). This means that the

number of points that are on the scoring rubric have been written in the table in place of the

number of items. Listing the number of items on a table of specifications is more appropriate

where receptive skills are being targeted. As described under Description of test tasks, there are four major parts to this test

with five total tasks (Part I has two tasks). As shown in Appendix A, all five of these

tasks can be seen in the far-left column of the TOS. In the far-right column are the score

percentages for each of those tasks. The tasks are not all weighted equally. The column titled #

points reflects the percentage scores in decimal form. This is because the total number of

possible points to receive on this test is 10. In other words, 10 out of 10 points equals 100% on

the test.

The categories listed across the top of the table represent the general objectives, grammar

and function. Grammar and function each have two sub-categories: vocabulary and syntax,

and ideational and instrumental functions, respectively. Grammar is included as one of our

objectives because we include the use of prepositions and adjectives in our test tasks, and

students are expected to produce cohesive, grammatically correct sentences. These tasks require

use of appropriate vocabulary and syntax. Under the function objective, we included both

ideational and instrumental functions because students are expected to use language to express or

exchange information about ideas (ideational functions) and are also expected to perform in

order to get other people to respond. This includes making requests, giving commands, and

offering suggestions (instrumental functions) (Bachman & Palmer, 2010).

Description of test tasks


This achievement test is composed of four major parts. These parts cover both

grammatical and function objective areas. Part I tests the students' use of prepositions of place

and direction. Additionally, Part I is broken down into two tasks, A and B. The first task (A) is a

dictation, focusing on listening and speaking skills. The students will label a picture of a hotel

with the appropriate preposition they hear, which will be dictated to them by the teacher. This

section has 10 sentences (or tasks) to be completed. The second section (B) focuses on the

language use of specific prepositions that are difficult for the students to use.[1] Each student

will answer questions from the teacher about where places are located at a hotel. The combined

time allotted for sections A and B is 20 minutes.

Part II is a critical thinking task which focuses on specific vocabulary needed to

effectively communicate with the maintenance staff at a hotel. There are 10 items using specific

words that will be used in order to construct a cohesive sentence. The allotted time for Part II is

15 minutes.

Part III focuses on computer skills and cohesion and grammatical structure of sentences.

Students will be given a bag of three lost and found items (a razor, swimsuit, and a black

iPhone) and they will enter these items into a mock program resembling the Charger Back

program used at a hotel. The allotted time for Part III is 15 minutes.

Part IV is a role-play task. The students are given a scenario and will have to

communicate effectively with the teacher, who is acting as a guest of the hotel. The allotted time

for this task is 20 minutes.

The total time allotted to administer this test is 70 minutes, not including time for the

instructions and distribution of materials.

Pilot Test Procedure


Participants

The administration of this test, and therefore all test scores, are hypothetical. This is

because the test, although real, will not be administered until summer 2017. The 12 participants,

however, are real, as shown in Appendix C. These 12 participants are all current employees at the

hotel and English language learners (ELLs). They work primarily in housekeeping, but there are

kitchen and maintenance employees as well. They range in age from 18 to 50 years old and have

all emigrated from either Mexico or Guatemala. Some participants have been in the United

States for over a decade, while others have been in the United States for as little as three years.

Additionally, some participants use only Spanish at home, while others speak some limited

English at home, often if they are communicating with their child enrolled in a local (English-

speaking) school. There is a mix of both male and female test takers, and none of them have ever

been formally tested in their L2.

Administration

As reported by Miller, Linn, and Gronlund (2009), students' assessment outcomes

may be higher during a test if anxiety levels are low. The environment needs to be conducive

to all students performing well on an assessment. Therefore, the summative

assessment will be implemented at The Matthews House at 220 N Grant Avenue, Fort Collins,

CO 80524. The participants will be administered their achievement test during their lunch break

from 11:30 a.m. to 12:30 p.m. on a Thursday, which will coincide with the last class of a 10-week

unit. The classroom that will be used is relatively small, seating approximately 20 people. There

are 10 computers situated around the room as well. There are desks in the classroom for students

to sit at while taking their test.

Part I. Prepositions of Place and Direction


Students will be given a handout with a picture of a hotel room for Section A. The teacher

will dictate 10 sentences to the students. The students will then use a pen or pencil and label the

picture with the appropriate preposition. They will be sitting down at the desks while completing

Part I. As for Section B, the teacher and intern will meet with students individually. During this

time, all other students will be asked to wait in the other room, where there will be snacks and

drinks for them. Section B is graded individually.

Part II. and Part III. Application: Describing Repairs and Computer

Parts II and III will run concurrently. At the beginning of Part II, students will be asked to go

into the other room and work on the computer task (Part III). While the others are completing

Part III, the teacher will call students one at a time into another room to complete Part II,

which the teacher will administer orally. While the teacher is assessing Part II, an intern will

observe and oversee the students who are completing Part III.

Part IV. Role-play

During this last part, students will engage in a role play with the teacher and intern. This

will be completed individually. As the students are called on one by one, the other students are in

another room partaking in snacks and drinks waiting to be called upon. Once the assessment is

completed, the teacher and intern will gather the tests and score them together.

Scoring

Our test is criterion-referenced as it measures the performance of individuals' assessment

results with respect to a predetermined cut-off score. Therefore, there is no score comparison

between individuals who take the same test (Miller, Linn, & Gronlund, 2009). The cut score is 6

out of 10. Students who receive a score of six or above demonstrate that their English

language achievement meets the minimum requirements; a total score of less than six means

that a student's English language achievement does not meet the minimum requirements

after taking the course. (Please see Appendix H to view the score report.)

As cited in Miller et al. (2009), Richard Stiggins "...argued that the specification of performance

criteria is the most important aspect of developing effective performance assessments" (p. 271).

As described in the structure of test section, our test is composed of four main parts, with Part I

divided into two sections, A and B. Each part and section measures different grammatical or

topical areas, hence we developed five rubric tables in total. Task descriptions on the top of each

rubric table explain what the task measures. The scoring explanation section in the table explains

how the scores are assigned in detail. With the exception of Part I, Section A, where there is only

one correct possible answer and therefore a simple point system is assigned, holistic rubrics are

used. Holistic rubrics have been designed for the other test parts because they provide an

overview of student achievement based on their performance. Zero points is not listed in the

rubric as students are expected to provide answers for the tasks.

Part I (Section A) is simple to score because there is only one possible correct answer,

and students will either receive full or no credit. If students can label the correct prepositions

they hear, they get 0.15 points per item. Students get zero points if they label the wrong

preposition. The instructor needs to count the number of correct prepositions students get to assign

points. For Part I Section B, and Part II, there is more than one correct possible answer. Possible

answers are divided into three different categories, which all have a clear, standard description.

Students will be given one of the scores based on which standards their performance fits. The

instructor needs to count the number of correct answers to assign total points for each part.
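
The point system for Part I, Section A can be sketched in a few lines of Python. This is a minimal illustration, not the scoring procedure actually used: the 0.15 points per item and the 0.90 section cut score come from this report, while the function names, the answer key, and the sample responses are hypothetical. Exact fractions are used because six correct answers at 0.15 each can fall just short of 0.90 in floating-point arithmetic.

```python
from fractions import Fraction

# Values stated in the report for Part I, Section A.
POINTS_PER_ITEM = Fraction(15, 100)   # 0.15 points per correctly labeled preposition
SECTION_CUT_SCORE = Fraction(9, 10)   # section cut score of 0.90


def score_section_a(responses, answer_key):
    """Award 0.15 points for each label that matches the key; no partial credit."""
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")
    correct = sum(
        1
        for given, expected in zip(responses, answer_key)
        if given.strip().lower() == expected.strip().lower()
    )
    return correct * POINTS_PER_ITEM


def meets_section_cut(score):
    """Return True when the section score is at or above the cut score."""
    return score >= SECTION_CUT_SCORE


# Hypothetical key and responses, for illustration only (not the real test items).
ANSWER_KEY = ["on", "in", "under", "next to", "behind",
              "between", "above", "in front of", "across from", "near"]
responses = ["on", "on", "under", "next to", "behind",
             "between", "on", "in front of", "in", "by"]

score = score_section_a(responses, ANSWER_KEY)
print(float(score), meets_section_cut(score))
```

In this made-up example the student labels six of the ten prepositions correctly, which lands exactly on the section cut score.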

Test Results
Item statistics

The hypothetical item statistics were scored according to the knowledge and language

ability of the hotel workers. Over a nine-month period, observations and interviews were

conducted at a local hotel in Northern Colorado. The researcher was able to ascertain what level

the participants were on by observing the participants weekly. This enabled the researcher to

score according to the learners' learning level, as well as eagerness and motivation to learn the

language. The majority of participants had a basic knowledge of English. We analyzed item

facility to see the degree of difficulty of each item, and planned to use the B-index to see how

each item contributed to the pass/fail decision. It is important to note that the B-index statistics

are not available, as all students passed the test based on the hypothetical test results. Therefore,

all items were analyzed by item facility only. Statistics for the overall hypothetical test

results are shown in Appendix D.
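
To make the two item statistics concrete, the following is a minimal Python sketch, assuming dichotomously scored items (1 = correct, 0 = incorrect). It uses the common definitions of item facility (the proportion of test takers answering correctly) and of the B-index (item facility among those who passed minus item facility among those who failed). The function names and the response data are hypothetical and are not taken from the appendices.

```python
def item_facility(item_scores):
    """Proportion of test takers who answered the item correctly (1 or 0 per person)."""
    if not item_scores:
        raise ValueError("item_scores must not be empty")
    return sum(item_scores) / len(item_scores)


def b_index(item_scores, passed):
    """Item facility of passers minus item facility of non-passers.

    Returns None when either group is empty, because the index is
    undefined without both a passing and a failing group.
    """
    if len(item_scores) != len(passed):
        raise ValueError("item_scores and passed must have the same length")
    passers = [s for s, p in zip(item_scores, passed) if p]
    failers = [s for s, p in zip(item_scores, passed) if not p]
    if not passers or not failers:
        return None
    return item_facility(passers) - item_facility(failers)


# Hypothetical responses of 12 test takers to one item (illustration only).
item_1 = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1]
everyone_passed = [True] * 12

print(item_facility(item_1))             # 9 of 12 correct
print(b_index(item_1, everyone_passed))  # None: no failing group to compare with
```

The second call shows why the B-index cannot be reported when every test taker passes: there is no failing group against which to compare item facility.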

Part I, Section A itemized hypothetical test results

This is a hypothetical analysis of Part I, Section A. Appendix E details the overall

hypothetical test statistics for Part I, Section A. An analysis of students' performance on

each item shows that each student scored over the cut score of 0.90. Students were able to label 60% of

the prepositions correctly. Item 1 seems difficult for students, as only 75% of students got it correct.

Overall, students were able to label most prepositions in Part I, Section A, but they made more mistakes

when labeling the prepositions "on" and "in." This is most likely because "in" and "on" are the same

preposition in Spanish, so there were difficulties when answering these questions in Part I,

Section A.

Part I, Section B itemized hypothetical test results

This is a hypothetical analysis of Part I, Section B. Appendix F details the overall

hypothetical test statistics for Part I, Section B. There were difficulties with finding the fitness

room and restaurant and this may have been due to the fact that they were around corners and

down hallways. 83.3% of students got scores over the cut score of 0.9, and students were able to

answer 60% of the questions correctly by using the appropriate prepositions. See Appendix G for

an overview of Section A and B combined scores.

Part II itemized hypothetical test results

This is a hypothetical analysis of Part II. Appendix H details the overall hypothetical test

statistics for Part II. This task was particularly difficult for a few participants as a few questions

required multiple responses. This required both critical-thinking and problem-solving skills. However, the

majority of participants did well: students answered 60% of the questions correctly, with 25% of

students making mistakes when answering item 1.

Part III itemized hypothetical test results

This is a hypothetical analysis of Part III. Appendix I details the overall hypothetical test

statistics for Part III. Overall, the participants handled this task with no difficulty as they had

around 15 minutes to complete the task. Describing the specific items in the bag was difficult for

a few of the participants. Using the appropriate adjectives and descriptors was also difficult. Due

to the structure of the units, the lesson on computers seemed to assist the participants with typing

their answers. Overall, 91.6% of students got scores above the cut score of 1.2. Only one student

demonstrated difficulty with describing the items.

Part IV itemized hypothetical test results

This is a hypothetical analysis of Part IV. Appendix J details the hypothetical test

statistics for Part IV. Appendix K details the overall hypothetical test scores from all four parts of

the test, providing a final, single score. This task was extremely enjoyable to grade and the

instructor was able to see how well the participants progressed from the beginning of the course.

This same role play was administered at the onset of the course. There were minor errors that

occurred involving grammatical and textual coherence. Among all the tasks, the role-play was

the most challenging for students, as they needed to solve the authentic task in a limited time. Two

raters were used to ensure the consistency of scores, and the two sets of results were slightly

different. 75% of students passed under rater 1 and 83.3% of students passed under rater 2. The

correlation coefficient between the two raters is 0.87, which means that the two sets of scores are

highly consistent.

Descriptive statistics

By analyzing and interpreting the descriptive statistics, test users can discover

relationships or identify differences. We use measures of central tendency to measure

averages, and the range and standard deviation to measure score variability. From the

statistics on average and variability, instructors can see how well students are

performing on the assessment as a whole.

Several measures of central tendency can be used to describe the average of a set

of scores, including the mean, the median, and the mode. The mean,

which is calculated by adding all the raw scores and then dividing by the number of scores, is the most

widely used measure of central tendency. Based on the hypothetical test results, the mean score is

8.44, which is quite high. The instructor could interpret this to mean that students performed well on the test.

However, because the mean takes into account the value of each score, it is highly affected by extreme

scores; therefore, it is useful to look at the median as well (Miller et al., 2009, p. 502). The median

is a counting average that divides the raw scores into equal halves: half of the raw

scores fall above the point and half fall below it, so it is not affected by outliers at all (Miller et al.,

2009, p. 503). By arranging all the scores from high to low, we got the median 8.40. The mode is

the most frequent or popular score in the set (Miller et al., 2009, p. 503). The mode of the

hypothetical results is 10, which is the highest score possible.

To measure the variability, we would report the results of the range and the standard

deviation. The range is obtained by subtracting the lowest score from the highest score, which

is the simplest and crudest measure of variability (Miller et al., 2009, p. 503). The highest

score of the hypothetical result is 10, and the lowest score is 6.05, hence the range is 3.95. To

obtain more detailed information about how the scores spread out, we would calculate the

standard deviation. The standard deviation shows the degree to which a set of scores deviates

from the mean (Miller et al., 2009, p. 504). The standard deviation of the hypothetical results is

1.17. Based on statistics of the mean and the standard deviation, we can draw a figure with a

curve, where we can clearly see the percentage distributions of scores.
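
The measures described above can be computed with Python's standard `statistics` module. The sketch below is illustrative only: the four scores are made-up placeholders rather than the hypothetical results in the appendices, and it assumes the population standard deviation (`pstdev`), since this report does not state which formula was used.

```python
import statistics


def describe(scores):
    """Return the central tendency and variability measures discussed above."""
    if not scores:
        raise ValueError("scores must not be empty")
    return {
        "mean": statistics.fmean(scores),        # sum of scores / number of scores
        "median": statistics.median(scores),     # middle point of the ordered scores
        "mode": statistics.mode(scores),         # most frequent score
        "range": max(scores) - min(scores),      # highest score minus lowest score
        "sd": statistics.pstdev(scores),         # assumption: population SD
    }


# Placeholder scores on the 10-point scale (illustration only).
illustrative_scores = [6.0, 8.0, 8.0, 10.0]
summary = describe(illustrative_scores)

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

If the sample standard deviation is preferred, `statistics.stdev` can be substituted for `statistics.pstdev`; the two differ only in dividing by n - 1 rather than n.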

Reliability

Reliability refers to the consistency of measurement, also known as the consistency

among scores (Miller et al., 2009, p. 107). To get an idea of how students are performing on each

task, we analyzed each item of the first three parts. Based on the hypothetical test results, all the

students meet the minimum requirements as they all pass the test. Among all the task items,

students are able to answer 60% of the items correctly for Part I, Part II, and Part III. Item 1 of

Part I, Section B seems hard for students, as only 66.7% of students answered it correctly. 75% of

students answered Item 1 of Part I, Section A and Item 1 of Part II correctly. The correlation coefficient and the SEM

are not available for the first three parts because there is only one set of scores available. Based

on the item difficulty statistics discussed above, the scores of the first three parts of the test are quite

reliable. We did not correlate the scores among the first three sections because each section focuses

on different language content and skills with various items. The standard error of measurement

for the first three sections is also not available as we do not have a correlation coefficient.

To measure the reliability of the scores for part IV the role play, two raters were used to

score it. The scores were compared based on the mean, SD, correlation coefficient and SEM. All

the statistics are available from Appendix J. The Pearson formula was used to correlate the raw

scores from the two raters, producing a correlation coefficient of 0.87. This means that the two

raters scored the role-play task very consistently. The SEM is an indicator of the amount of error

that must be considered when interpreting an observed score (Miller et al., 2009). The SEM of

the first rater is 0.27, which means that the true score should fall within the range of the mean (2.08)

plus or minus 0.27. The SEM of the second rater is 0.26, which means the true score should fall

within the range of the mean (2.25) plus or minus 0.26.
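
The two reliability statistics can be sketched as follows. This is a minimal illustration under stated assumptions: the Pearson coefficient is written out from its textbook definition, and the SEM uses the standard formula SEM = SD × √(1 − r). The function names are hypothetical, and the standard deviation of 0.75 in the example is an illustrative value chosen for the sketch, not a figure taken from Appendix J.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson product-moment correlation between two raters' scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2 scores)")
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    # Note: division by zero occurs if a rater gives every student the same score.
    return covariance / (spread_x * spread_y)


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r), where r is the reliability coefficient."""
    return sd * math.sqrt(1 - reliability)


def score_band(observed, sem):
    """Range of plus or minus one SEM around an observed score."""
    return (observed - sem, observed + sem)


# Illustrative SD of 0.75 (an assumption) with the reported coefficient of 0.87.
sem = standard_error_of_measurement(0.75, 0.87)
low, high = score_band(2.08, sem)
print(f"SEM = {sem:.2f}, band = {low:.2f} to {high:.2f}")
```

A higher correlation between raters shrinks the SEM, which narrows the band within which the true score is expected to fall.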

Description of masters/non-masters

In our criterion-referenced interpretation, students who receive a score of 6 or higher out of 10

possible points are considered masters, and those who score below 6 are considered non-masters.

Students who earn 6 of the 10 possible points have mastered the course material,

even though those who perform at a 6 or 7 may need assistance in the future with a few items.

Accordingly, those who score at or above 60% have acquired the material from the lessons.
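
The master/non-master decision amounts to a single comparison against the cut score. The short sketch below assumes, as stated in the Scoring section, that a total of exactly 6 counts as passing; the function names are hypothetical.

```python
CUT_SCORE = 6.0    # cut score stated in the report
MAX_POINTS = 10.0  # total points possible on the test


def classify(total_score):
    """Label a total score as 'master' (at or above the cut score) or 'non-master'."""
    return "master" if total_score >= CUT_SCORE else "non-master"


def as_percentage(total_score):
    """Convert a total on the 10-point scale to a percentage."""
    return total_score * 100 / MAX_POINTS


def pass_rate(total_scores):
    """Proportion of test takers classified as masters."""
    if not total_scores:
        raise ValueError("total_scores must not be empty")
    return sum(1 for s in total_scores if s >= CUT_SCORE) / len(total_scores)


# 6.05 and 10 are the lowest and highest hypothetical totals reported above.
print(classify(6.05), classify(10.0))
```

Because the interpretation is criterion-referenced, each classification depends only on the student's own total and never on how other students scored.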

Discussion
Critique of item performance

According to Miller et al. (2009), there are three pertinent questions that need to be asked

and addressed when critiquing and assessing item performance and its effectiveness. They are:

1. Did the item function as intended?

2. Was the test item of appropriate difficulty?


3. Was the test item free of irrelevant clues and other defects? (p. 351)

In regard to the first question, hypothetically speaking, the majority of participants mastered

each part of the assessment. Therefore, the test items did function as intended. However, there

were a few participants who did not master some of the test items. In regard to the test item being of

appropriate difficulty, it was concluded that a few of the test items could be more difficult as

most of the participants mastered each task item. Finally, in regard to the test item being free of

irrelevant clues and other defects, each test item measured tasks that the participant will engage

in every day on the job. The items did not include multiple-choice or true/false formats, but focused

solely on a task-based assessment, which therefore enabled it to be free of irrelevant clues and

defects. Of course, once this course has been administered, adapting certain tasks and items will

likely be necessary.

Evaluation of test usefulness

The discussion of reliability is based on the hypothetical test results, which will not

exactly reflect the consistency of real scores. The reliability of the first three parts of the test will

be discussed, followed separately by the role-play task because the results have been analyzed

using different methods.

One rater was used to score the first three parts of the test. Through the test results and

the overall statistics, we can see that the scores are highly reliable. Overall, all the students get

scores at or above the cut-score, which means that their language ability meets the minimum

requirements. Among all the items in the first three parts, students were able to answer 60% of

the questions correctly. The item facility statistics were the same for Item 1 of part IA and item 1

of part II, hence we can see that these two items share the same difficulty level. There are five

items in the first three parts that have an item facility of 83%. Based on the limitations of one

rater and multiple tasks, the correlation coefficient and SEM were not calculated. However, the

scores are highly reliable based purely on the item difficulty statistics.

To estimate the consistency of scores for the role-play task, two raters were used to assign

credits for this section, using the same rubric. The average scores from rater 1 and rater 2 are

2.08 and 2.25, which means that rater 2 was more lenient than rater 1. Compared to rater 1, rater

2 is more likely to give students high scores. Another way to look at reliability is through the

correlation coefficient, as discussed by Miller et al. (2009). By using the Pearson formula,

we obtained a correlation coefficient of 0.87. This means that the two raters are highly consistent in

assigning scores for the role-play section.

The validity of the test will be assessed from several aspects such as content, construct,

authenticity and interactiveness. Content validity is the most important one because it describes

how an individual performs on a domain of tasks that the assessment is supposed to represent

(Miller et al., 2009, p. 75). To make the content valid, the tasks should represent the teaching

materials and the test items should assess what has been taught in the curriculum. Based on the

description of the TLU domain, the five-week, non-academic English course mainly focuses on

improving learners' English listening and speaking ability, and their ability to solve real-world

tasks they might come across while working at the hotel. The test we designed for this

curriculum focuses on an integrated approach to measuring test takers' listening, speaking, and

writing progress. The test items focus on the use of prepositions, students' ability to describe

specific items, and problem-solving skills. The role-play task measures learners' ability to solve

real-world tasks. The TOS in Appendix A clearly shows the numbers of items and the skills being

assessed. We can see that the test represents the content covered during the five-week course, and

the test measures only content that has been taught.


According to the construct definition described above, the elements of constructs to be

assessed are grammatical knowledge, textual knowledge, sociolinguistic knowledge, and

functional knowledge. Each construct is embodied through the test tasks. Grammatical

knowledge, such as syntax, is assessed through the whole test as students need to produce

grammatically correct sentences to successfully complete the tasks. Part I of the test assesses

learners' ability to use prepositions appropriately. Textual knowledge is assessed as students

need to produce cohesive and coherent utterances. The appropriate use of turn-taking will make

the response sound more organized, for which students can earn more points. Part II of the test

measures students' ability to describe repairs, which fits within the functional knowledge being

tested. Part III of the test measures students' computer skills and their ability to describe items

using grammatically correct sentences. The role-play task measures students' heuristic

knowledge through problem-solving.

Authenticity of the assessment is measured by the degree of correspondence between the assessment and the TLU domain. The TLU domain focuses on hotel employees' ability to assist a guest of the hotel with either a request or a complaint. The role-play portion of the test gives test takers an authentic problem that they need to solve, which reflects the TLU domain. Other parts of the test measure test takers' problem-solving ability indirectly. For example, a correct description of a location with prepositions helps guests find the place they are looking for. By entering information about lost and found items into a computer, employees are helping guests find their lost items. Therefore, we can see that the test closely matches the TLU domain and measures the content that should be assessed.

Interactiveness of the assessment is measured by the extent to which language abilities are being assessed. The instructions are given in English, which measures test takers' listening ability. For example, in Part I A, the instructor will read 10 sentences with prepositions and students need to label the picture with the correct preposition. Part I B measures test takers' speaking ability through questions asked by the instructor and answered by the students. The Part III computer application section measures test takers' computer skills and their ability to describe items with appropriate vocabulary and sentence structures. The role-play measures test takers' problem-solving ability, as students are asked to solve authentic problems they might encounter while working. Overall, the test measures test takers' listening, speaking, and writing abilities in an integrated manner, as well as vocabulary knowledge, grammatical knowledge, and textual knowledge. Therefore, we can conclude that the test we designed is very interactive.

The impact on the learner was assessed before creating the specific tasks. After conducting an extensive field study over the course of nine months at the hotel, the researcher was able to ascertain specific tasks that the learners will engage in and be comfortable with during the achievement test. First, the participants will engage in these specific tasks weeks before the final assessment, and the instructor will notify them that these tasks, which are embedded in the lessons, will be administered at the end of the course to help determine learner progress. Second, the instructor has built a unique relationship with the participants (through previous observations and interviews). The impact on the learner should therefore not play a significant part in the administration of this assessment. As for other stakeholders, such as the General Manager and Executive Housekeeper, this test could potentially impact them in a positive way, as their employees will be able to earn promotions on the job due to their proven English abilities. This outcome could also possibly improve employee retention in the workplace.

Estimation of whether test achieved purpose



From a hypothetical standpoint, this test has achieved its intended purpose, as outlined

above in the description of the test. However, since our results are only hypothetical, revisions

will certainly be necessary once the course is underway. After teaching the full five-week course

during summer 2017 and administering the test, we will have more data that can be analyzed in

terms of both validity and reliability. Tasks will have to be adapted as needed in order to

accurately reflect what has been covered in course material.

Reflection

Creating and producing this assessment was arduous because the TLU domain is relatively new to the field of ESP and occupational English. We had to think outside the box on every detail and, to be honest, felt as if we had not accomplished the task at hand. We had to revise the test multiple times for it to be coherent and purposeful. We administered the test to ourselves to see if it would be effective; this is where the revisions occurred. However, this was a rewarding experience, as the assessment will be implemented in the summer of 2017 and will have a significant impact on ESL learners at a local hotel in Northern Colorado. Understanding this goal enabled the project to move forward in pursuit of a valid and reliable assessment for the language learners.

By going through the process of making a summative listening and speaking test for hotel workers, we became familiar with the test design process. Before the actual test was designed, we needed to keep in mind what content the curriculum covers, as well as the TLU domain and the constructs that need to be assessed. To make a valid assessment, the test tasks should represent the content that has been taught, which is called content validity. Also, the test tasks should match the TLU domain as closely as possible so that the test is authentic. In addition, the test items should cover all the constructs that need to be assessed. Most importantly, when designing a language test, we need to keep in mind the test takers' proficiency level to make sure that the test is neither too easy nor too difficult for the target test takers. The test should be piloted if there is a chance. Based on the statistics acquired from the pilot, test designers will have an idea of how valid and reliable the test is, and how each test item works. Revisions can then be made based on the feedback. An effective test should be valid, reliable, and authentic, and should have a positive impact on students. An effective test may also need several rounds of revision to both the test items and the scoring rubric.

References

Bachman, L., & Palmer, A. (2010). Language Assessment in Practice. New York, NY: Oxford University Press.

Jasso-Aguilar, R. (1999). Sources, methods and triangulation in needs analysis: A critical perspective in a case study of Waikiki hotel maids. English for Specific Purposes, 18(1), 27-46.

Larsen-Freeman, D., & Anderson, M. (2011). Techniques & Principles in Language Teaching. New York, NY: Oxford University Press.

Miller, D., Linn, R., & Gronlund, N. (2009). Measurement and Assessment in Teaching (10th ed.). Upper Saddle River, NJ: Pearson.

Nation, I. S. P., & Newton, J. (2009). Teaching ESL/EFL Listening and Speaking. New York, NY: Routledge.

Appendix A

Table of Specifications

Tasks / Skills          Grammar             Function                    # tasks  # points  % points
                        Vocab.    Syntax    Ideational  Instrumental
                                            functions   functions
Labelling               0-.375    0-.375    0-.375      0-.375          1        1.5       15
Hotel map               0-.375    0-.375    0-.375      0-.375          1        1.5       15
Maintenance repair      0-.50     0-.50     0-.50       0-.50           1        2         20
Computer lost & found   0-.50     0-.50     0-.50       0-.50           1        2         20
Role-play               0-.75     0-.75     0-.75       0-.75           1        3         30
# tasks                                                                 5
# points                0-2.5     0-2.5     0-2.5       0-2.5                    10
% points                25        25        25          25                                 100
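The row and column totals in the Table of Specifications can be cross-checked with a short Python sketch. The point values are copied from the table; the variable and function names are our own illustrative choices and are not part of the test materials.

```python
# Maximum points per task for each skill, copied from the Table of Specifications.
# Skill order: vocabulary, syntax, ideational functions, instrumental functions.
SKILLS = ["vocab", "syntax", "ideational", "instrumental"]
TOS = {
    "Labelling": [0.375, 0.375, 0.375, 0.375],
    "Hotel map": [0.375, 0.375, 0.375, 0.375],
    "Maintenance repair": [0.50, 0.50, 0.50, 0.50],
    "Computer lost & found": [0.50, 0.50, 0.50, 0.50],
    "Role-play": [0.75, 0.75, 0.75, 0.75],
}


def task_points(tos):
    """Maximum points per task (row totals)."""
    return {task: sum(cells) for task, cells in tos.items()}


def skill_points(tos):
    """Maximum points per skill (column totals)."""
    return [sum(cells[i] for cells in tos.values()) for i in range(len(SKILLS))]


def percent(points, total):
    """Share of the total test score, in percent."""
    return 100 * points / total


total = sum(task_points(TOS).values())

for task, pts in task_points(TOS).items():
    print(f"{task}: {pts} points ({percent(pts, total):.0f}%)")
for skill, pts in zip(SKILLS, skill_points(TOS)):
    print(f"{skill}: {pts} points ({percent(pts, total):.0f}%)")
```

Under these values, the five tasks sum to 10 points and each of the four skills carries 2.5 points, or 25% of the test.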

Appendix B

Spring 2017: Listening and Speaking Achievement Test (10 points)

Directions: This assessment is designed to measure your knowledge, including how you apply
what you have learned, within specific contexts. There are four main parts to this test. You have
70 minutes to complete this test.

Part I. Prepositions of Place and Direction (3 points)

A. I will read 10 sentences out loud. I will read each sentence two times. In each sentence,
I will use a specific preposition. When you hear the preposition, use a pen or pencil to label
the picture with the preposition that you hear. There is only one possible
answer for each sentence. An example has been provided for you below. You have 10
minutes to complete this task.

Example: The remote is on the table.

1 - The shampoo is in the bathtub.


2 - The soap is on the bathroom counter.
3 - The lamp is behind the chair.
4 - The pillow is on the bed.
5 - The remote is on the nightstand.
6 - The bathroom is around the corner.
7 - The conditioner is in the closet.
8 - The toilet is in the bathroom.
9 - The chair is next to the table.
10 - The mirror is behind the sink.

B. Where are these places? You will meet with me one-on-one and give the location of a
few places shown on the hotel map below. I will ask you five specific questions about where a place is
located. You will need to give me directions using prepositions, such as around, next to, in, and
near. You will have 10 minutes to complete this task.
Teacher asks:
1 - Where is the restaurant located?
2 - Where is the pool located?
3 - Where is the front desk located?
4 - Where are the restrooms?
5 - Where is the fitness room?

Part II. Application - Describing Repairs (2 points)

This is a critical thinking activity assessing your knowledge of vocabulary usage as well as
grammatical competence and cohesion. Please answer the following questions listed below.
Please use a pen or pencil. There could be several possible answers. You will have 15 minutes to
complete this task. An example is provided for you below.
Example: Light bulb - Use light bulb in a sentence. (Possible answer: Please replace the light
bulb in room 212.)

1 - Remote control - Name 2 situations where you have to use the word remote control.
2 - TV - Use TV in a sentence.
3 - Toilet - Use toilet in a question.
4 - Stay-over - Explain what a stay-over is. What are two things that can happen with a stay-over?
5 - Repair - Use repair in a question to the maintenance staff.
6 - Sink - Name 2 situations where you have to use the word sink.
7 - Shower - Use shower in a question.
8 - Repair and sink - Use repair and sink in a question.
9 - Clean - Name 2 situations where you have to use the word clean.
10 - Pool - Use pool in a sentence along with a preposition.

Part III. Computer Application - Lost and found items (2 points)

You will be given a bag of lost and found items. These items could include a razor, pencil,
wallet, or brush. You will use the computers provided to you in the classroom and enter
information about the bag of lost and found items. This will be entered into the Charger Back
program. Please submit your document to me once you have completed the task. You will have
15 minutes to complete this task.

Part IV. Application Role-play (3 points)

You will be given a scenario and will need to effectively communicate with your teacher or co-
teacher (intern). This task will take 20 minutes.
Scenario: The teacher or intern is a guest of the hotel. The guest will pass by a housekeeper
cleaning a room and present the following scenario: "Hello, I am in room 221. I am staying over
another night." Your task is to respond to the guest and follow up with the front desk staff.

Appendix C

Participants - Housekeeping Staff


#    Department     Country     Gender   Time in U.S.
1    Housekeeping   Mexico      Female   13 years
2    Housekeeping   Mexico      Female   3 years
3    Housekeeping   Mexico      Female   10 years
4    Housekeeping   Mexico      Female   14 years
5    Housekeeping   Guatemala   Female   10 years
6    Housekeeping   Mexico      Male     6 years
7    Housekeeping   Mexico      Female   16 years
8    Housekeeping   Mexico      Male     3 years
9    Housekeeping   Mexico      Female   2 years
10   Housekeeping   Guatemala   Female   1 year
11   Housekeeping   Mexico      Female   4 years
12   Housekeeping   Mexico      Female   21 years

Appendix D

Statistics of the overall hypothetical test results


Sections                              Part I A  Part I B  Part II  Part III  Part IV   Part IV   Total
                                                                             Rater 1   Rater 2   score
Average                               1.39      1.33      1.85     1.71      2.08      2.25      8.44
Standard Deviation (SD)               0.18      0.23      0.30     0.32      0.76      0.72      1.17
Range                                 0.6       0.6       1        1         2         2         3.95
Standard Error of Measurement (SEM)   N/A       N/A       N/A      N/A       0.27      0.26      N/A

Appendix E

Part I, Section A itemized hypothetical test results


Participant   Q1     Q2     Q3     Q4     Q5     Q6     Q7     Q8     Q9     Q10
1             0.15   0.15   0.15   0.15   0.15   0.15   0.15   0.15   0.15   0.15
2             0.15   0.15   0.15   0.15   0.15   0.15   0.15   0.15   0.15   0.15
3             0.15   0.15   0.15   0.15   0.15   0.15   0.15   0.15   0.15   0.15
4             0.15   0.15   0.15   0.15   0.15   0.15   0.15   0.15   0.15   0.15
5             0      0.15   0.15   0.15   0.15   0.15   0      0.15   0.15   0.15
6             0      0      0.15   0.15   0.15   0.15   0      0      0.15   0.15
7             0.15   0.15   0.15   0.15   0.15   0.15   0.15   0.15   0.15   0.15
8             0.15   0.15   0.15   0.15   0.15   0.15   0.15   0.15   0.15   0.15
9             0.15   0.15   0.15   0.15   0.15   0.15   0.15   0.15   0.15   0.15
10            0.15   0      0.15   0.15   0.15   0.15   0.15   0      0.15   0.15
11            0      0.15   0.15   0.15   0.15   0.15   0.15   0.15   0.15   0.15
12            0.15   0.15   0.15   0.15   0.15   0.15   0.15   0.15   0.15   0.15

Appendix F

Part I, Section B itemized hypothetical test results



Participant   Q1     Q2     Q3     Q4     Q5
1             0.30   0.30   0.30   0.30   0
2             0      0.30   0.30   0.30   0
3             0.30   0.30   0.30   0.30   0.30
4             0.30   0.30   0.30   0.30   0.30
5             0      0.30   0.30   0.30   0.30
6             0      0.30   0.30   0.30   0.30
7             0.30   0.30   0.30   0.30   0.30
8             0.30   0.30   0.30   0.30   0.30
9             0.30   0.30   0.30   0.30   0.30
10            0.30   0.30   0.30   0.30   0.30
11            0      0.30   0.30   0.30   0.30
12            0.30   0.30   0.30   0.30   0.30

Appendix G

Part I, Sections A and B itemized hypothetical test results



Participant     Part A Score   Part B Score   Part I Total Score
1               1.5            1.2            2.7
2               1.5            0.9            2.4
3               1.5            1.5            3
4               1.5            1.5            3
5               1.2            1.2            2.4
6               0.9            0.9            1.8
7               1.5            1.5            3
8               1.5            1.5            3
9               1.5            1.5            3
10              1.2            1.5            2.7
11              1.35           1.2            2.55
12              1.5            1.5            3
Average Score   1.39           1.33           2.71

Appendix H

Part II itemized hypothetical test results



Participant   Q1     Q2     Q3     Q4     Q5     Q6     Q7     Q8     Q9     Q10
1             0      0.20   0.20   0.20   0.20   0.20   0.20   0      0.20   0.20
2             0.20   0.20   0.20   0.20   0.20   0.20   0.20   0.20   0.20   0.20
3             0.20   0.20   0.20   0.20   0.20   0.20   0.20   0.20   0.20   0.20
4             0.20   0.20   0.20   0.20   0.20   0.20   0.20   0.20   0.20   0.20
5             0.20   0.20   0.20   0.20   0.20   0.20   0.20   0.20   0.20   0.20
6             0      0.20   0.20   0      0.20   0.20   0.20   0.20   0.20   0.20
7             0.20   0.20   0.20   0.20   0.20   0.20   0.20   0.20   0.20   0.20
8             0.20   0.20   0.20   0.20   0.20   0.20   0.20   0.20   0.20   0.20
9             0.20   0.20   0.20   0.20   0.20   0.20   0.20   0.20   0.20   0.20
10            0.20   0.20   0.20   0.20   0.20   0.20   0.20   0.20   0.20   0.20
11            0      0.20   0.20   0      0.20   0.20   0.20   0      0.20   0
12            0.20   0.20   0.20   0.20   0.20   0.20   0.20   0.20   0.20   0.20

Appendix I

Part III itemized hypothetical test results


Participant   Description of Razor, Swimsuit, and Phone
1             2.00
2             1.50
3             2.00
4             2.00
5             2.00
6             1.50
7             2.00
8             1.50
9             1.00
10            1.50
11            1.50
12            2.00

Appendix J

Part IV itemized hypothetical test results

Participant Rater 1 Score Rater 2 Score Mean Score

1 2.00 2.00 2.00

2 1.00 1.00 1.00

3 2.00 3.00 2.50

4 2.00 2.00 2.00

5 3.00 3.00 3.00

6 3.00 3.00 3.00

7 3.00 3.00 3.00

8 2.00 2.00 2.00

9 2.00 2.00 2.00

10 1.00 2.00 1.50

11 1.00 1.00 1.00

12 3.00 3.00 3.00
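
The Part IV statistics in Appendix D can be recomputed from the two raters' scores above. The report does not state which formulas were used, so the Python sketch below rests on three assumptions: the population standard deviation, the Pearson correlation between the two raters as the reliability estimate, and SEM = SD × sqrt(1 − r). Under those assumptions it reproduces the reported means (2.08, 2.25), SDs (0.76, 0.72), and SEMs (0.27, 0.26) to two decimal places. The function names are illustrative.

```python
import math

# Part IV role-play scores for participants 1-12, copied from Appendix J.
rater1 = [2, 1, 2, 2, 3, 3, 3, 2, 2, 1, 1, 3]
rater2 = [2, 1, 3, 2, 3, 3, 3, 2, 2, 2, 1, 3]


def mean(xs):
    return sum(xs) / len(xs)


def pstdev(xs):
    """Population standard deviation (divides by n, an assumption)."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def pearson_r(xs, ys):
    """Pearson correlation, used here as an inter-rater reliability estimate."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (pstdev(xs) * pstdev(ys))


def sem(xs, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return pstdev(xs) * math.sqrt(1 - reliability)


r = pearson_r(rater1, rater2)
for name, scores in (("Rater 1", rater1), ("Rater 2", rater2)):
    print(f"{name}: mean={mean(scores):.2f} "
          f"SD={pstdev(scores):.2f} SEM={sem(scores, r):.2f}")
print(f"Inter-rater correlation r={r:.2f}")
```

A different reliability estimate, or the sample standard deviation, would give slightly different SEM values.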



Appendix K

Overall hypothetical test results


Participant   Part I Score   Part II Score   Part III Score   Part IV Score   Total Score
1             2.70           1.60            2.00             2.00            8.30
2             2.40           2.00            1.50             1.00            6.90
3             3.00           2.00            2.00             2.50            9.50
4             3.00           2.00            2.00             2.00            9.00
5             2.40           2.00            2.00             3.00            9.40
6             1.80           1.60            1.50             3.00            7.90
7             3.00           2.00            2.00             3.00            10.00
8             3.00           2.00            1.50             2.00            8.50
9             3.00           2.00            1.00             2.00            8.00
10            2.70           2.00            1.50             1.50            7.70
11            2.55           1.00            1.50             1.00            6.05
12            3.00           2.00            2.00             3.00            10.00
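
The total-score statistics in Appendix D can be checked against the table above. The Python sketch below sums the four part scores for each participant and then computes the mean, standard deviation, and range of the totals. It assumes the population standard deviation (dividing by n), which the report does not state but which matches the reported value of 1.17. Variable names are illustrative.

```python
import math

# (Part I, Part II, Part III, Part IV) scores per participant, from Appendix K.
scores = [
    (2.70, 1.60, 2.00, 2.00),
    (2.40, 2.00, 1.50, 1.00),
    (3.00, 2.00, 2.00, 2.50),
    (3.00, 2.00, 2.00, 2.00),
    (2.40, 2.00, 2.00, 3.00),
    (1.80, 1.60, 1.50, 3.00),
    (3.00, 2.00, 2.00, 3.00),
    (3.00, 2.00, 1.50, 2.00),
    (3.00, 2.00, 1.00, 2.00),
    (2.70, 2.00, 1.50, 1.50),
    (2.55, 1.00, 1.50, 1.00),
    (3.00, 2.00, 2.00, 3.00),
]

# Round each total to two decimals to avoid floating-point noise.
totals = [round(sum(parts), 2) for parts in scores]

mean_total = sum(totals) / len(totals)
# Population standard deviation (an assumption; see the note above).
sd_total = math.sqrt(sum((t - mean_total) ** 2 for t in totals) / len(totals))
score_range = max(totals) - min(totals)

print(f"mean={mean_total:.2f} SD={sd_total:.2f} range={score_range:.2f}")
```

With these inputs the sketch gives a mean of 8.44, an SD of 1.17, and a range of 3.95, in line with the Total score column of Appendix D.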
