
Testing Writing
Introduction to Direct Writing Testing

Main Point: Testing writing ability should involve direct assessment: getting people to write.

Why this is important: Even professional testing institutions struggle to create indirect tests that accurately measure writing ability.
The Testing Problem: Three Key Parts
1. Writing tasks must be representative of the types of tasks students are expected to perform.
2. The tasks should elicit valid samples that accurately reflect a student’s writing ability.
3. The writing samples must be scored reliably and validly.
Representative tasks

(i) Specify all possible content
 To accurately assess writing ability, tasks must be representative of what students are expected to perform.
 These tasks must be specified clearly in the test framework to ensure validity.
Key Elements in Specifying Writing Tasks
1. Operations: The types of actions or functions the writing should perform (e.g., expressing, directing).
2. Types of Text: The format or genre of the writing (e.g., letters, reports).
3. Addressees: The intended recipients of the text (e.g., friends, business partners).
4. Length of Texts: How long the writing should be (e.g., short notes, full essays).
5. Topics: The subject matter that students may need to write about.
6. Dialect and Style: Variations in language and tone (e.g., formal vs. informal, British vs. American English).
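For illustration only, the six elements above could be recorded in a small data structure when drafting test specifications. This is a minimal sketch in Python; the field names and example values are invented for this example and are not part of the CCSE or any other published framework.

from dataclasses import dataclass
from typing import List

@dataclass
class WritingTaskSpec:
    """Illustrative container for the six specification elements."""
    operations: List[str]       # e.g., expressing opinions, directing, narrating
    text_types: List[str]       # e.g., letters, reports, postcards
    addressees: List[str]       # e.g., friends, business partners
    length: str                 # e.g., "short note", "one-page essay"
    topics: List[str]           # subject matter candidates may write about
    dialect_and_style: str      # e.g., "informal, any standard variety of English"

# A rough specification for a hypothetical personal-letter task
letter_task = WritingTaskSpec(
    operations=["expressing opinions", "narrating events"],
    text_types=["personal letter"],
    addressees=["a friend"],
    length="about half a page",
    topics=["holidays", "daily life"],
    dialect_and_style="informal, any standard variety of English",
)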
Example 1 – CCSE Writing Test Specifications
• Overview of CCSE Certificate in Writing (Level 1):
 Operations: Tasks may involve expressing opinions, reporting, describing, giving instructions, narrating
events, etc.
 Types of Text: Includes letters (personal and business), messages, postcards, reports, and instructions.
 Addressees: The target audience for each task is specified clearly, though not always detailed in the
framework.
 Topics: Tasks may or may not be connected by a common theme.
 Dialect and Style: Unspecified, though context-dependent (could be formal or informal).

• Conclusion:
 The CCSE Certificate framework offers a broad range of writing tasks that are relevant to general
communicative language courses.
Example 2 – English for Academic Purposes (EAP) Writing Test
• Key Specifications:
 Operations: Tasks require students to describe, explain, compare and contrast, and argue for/against a position.
 Types of Text: Short examination answers (up to two paragraphs).
 Addressees: University lecturers (both native and non-native speakers of English).
 Topics: Broad, non-specialized academic topics relevant to first-year undergraduate students.
 Dialect and Style: Any standard variety of English (e.g., British, American), formal style.
 Length: About one page.

• Conclusion:
 The test focuses on academic writing tasks that are highly relevant to university-level English.
(ii) Include a representative sample of the specified content
• Key Concept:
 Content Specifications: In order to create a valid test, it’s essential to identify all possible tasks students should be able to perform.
 The test specifications help ensure that tasks align with the intended abilities.
• Challenges:
 It’s difficult to include every possible task, but test designers should strive for a representative sample of tasks.
 The ideal test would cover all relevant tasks to provide a complete measure of writing ability.
 Representative Tasks: If a test includes only a small set of tasks, it may not fully capture a candidate’s writing ability. However, a broader sample can improve the validity of the test.
 A test with a broad and representative set of tasks is likely to have a more beneficial backwash effect, improving student learning.
Example of
CCSE Test
(May/June 2000)

Test Overview:
 The exam includes
tasks related to a
Summer Camp in
America.
Tasks Include:
1. Task 1: Write a letter to inquire about a
summer camp.
2. Task 2: Fill in an application form.
3. Task 3: Write a postcard to a friend about the
camp.
4. Task 4: Write a note to explain a change of
plans to other camp helpers.
• Conclusion:
 The tasks are designed to be a representative
sample of real-world writing situations.
However, they cannot cover all potential
writing tasks, limiting content validity.
Balancing Wide Sampling and Practicality

Challenges:

• Wide Sampling: Ideally, a test would cover all possible tasks, but practical
constraints limit this.
• Practicality vs. Accuracy: We must balance the need for a broad sample with the
practicality of constructing the test.
Considerations:

• High Stakes vs. Low Stakes: The importance of the test result influences how
many tasks are needed. High-stakes tests (e.g., university admissions) require a
more representative and accurate sample of writing ability.
The Complexity of Representative Tasks
1. Identify Relevant Tasks: Use frameworks to specify key tasks students should be able to perform.
2. Content Validity: Strive for a representative set of tasks, but balance practicality.
3. Test Design: High-stakes tests require more extensive task sampling to ensure fairness and accuracy.

 While it’s challenging to cover every writing skill, a well-designed, representative test can provide a fair and valid assessment of a student’s writing ability.
Elicit a valid sample of writing ability

Set as many separate tasks as is feasible
 To measure writing ability accurately, we must set tasks that reflect a wide range of writing skills.
 The goal is to elicit a valid sample of writing ability through a variety of tasks.
 People’s performance on the same task is unlikely to be perfectly consistent.
 To improve reliability, offer candidates multiple ‘fresh starts’ by including several separate tasks. This approach leads to greater validity in assessing writing ability.
Test only writing ability, and nothing else
 Tests should focus solely on writing ability, not creativity, general knowledge, or the ability to argue.
• Example Tasks to Avoid:
1. Write a conversation about a planned holiday (requires creativity).
2. Talk about life in your country (tests speaking and presentation skills).
3. Discuss abstract topics like envy or the advantages of wealth (tests argumentation rather than writing).
Task Evaluation - Creativity vs. Writing
Ability

• Why Certain Tasks Are Problematic:


 Tasks like writing a conversation or discussing abstract ideas
may demand skills beyond writing, such as creativity or
general knowledge.
• Example:
 Task 3: ‘Envy is the sin which most harms the sinner.
Discuss.’ — This tests argumentation more than writing
ability.
Reading Skills and Writing Ability
• Key Point:
 Reading ability should not interfere with measuring writing ability.
 Simple instructions should be easy for all candidates to understand without bias.
• Example Problem:
 The second question in an example test could require reading comprehension beyond basic writing skills.
Using Illustrations to Reduce Reading Dependency

• Strategy:
 Use illustrations (e.g., pictures, charts,
diagrams) to reduce the reliance on reading
comprehension.
• Example:
 A diagram of three types of bees could lead
to a task asking students to compare and
contrast them, reducing reading difficulty.
Tasks that Elicit Narrative
Writing

• Using Visual Prompts:


 Pictures can be used to elicit narratives.
o Example: "Look at these pictures and tell
the story. Begin, 'Something very exciting
happened to my mother yesterday.'"
• Outcome: This helps to focus on writing skills
without requiring external knowledge.
Restrict candidates
 Writing tasks should be well-defined and not allow candidates to stray too far from the requirements.
• Example:
 Task: "Compare the benefits of university education in English vs. Arabic using the provided points."
o Benefits: Ensures focus on writing about specific points, not general knowledge or unrelated ideas.
Providing Notes and
Information

• Important Consideration:
 When using notes or information sheets,
avoid giving candidates too much of the
answer. Full sentences should not be
provided unless necessary.
• Example:
 Comparison task: Notes should give key
points but not fully formed sentences to
ensure candidates still need to write their
responses.
Authenticity of Tasks
• Core Principle:
 Tasks should be as authentic as possible to ensure they reflect real-world writing situations.
• Authenticity Consideration:
 A task may be authentic for some candidates (e.g., writing to a supervisor) but inauthentic for others (e.g., language teachers writing to their supervisor).
• Important: Consider the context and audience for each task to maintain relevance and authenticity.
Ensure valid and reliable scoring
• SET TASKS WHICH CAN BE RELIABLY SCORED
• The tasks selected should allow for consistent, reliable
scoring across different candidates.

• SET AS MANY TASKS AS POSSIBLE


• The more tasks a candidate performs, the more scores are
available, which leads to a more reliable total score.
Restrict candidates

• By restricting what candidates can write or how they can respond, we


can make the performances more directly comparable.

GIVE NO CHOICE OF TASKS


 Requiring candidates to perform all tasks allows for easier
comparisons between candidates' abilities.
 Uniformity in task completion leads to more consistent
assessments.
Ensure long enough samples
 Writing samples must be long enough for assessors to make reliable judgements on key aspects such as organization and coherence.
 In order to evaluate organizational skills, the writing needs to be long enough to allow these traits to become evident.
Create appropriate scales
for scoring
 Scoring scales should be clearly defined in the test
specifications under criterial levels of performance.

Types of Scoring Scales:


1. Holistic Scoring
2. Analytic Scoring
Holistic scoring
Holistic scoring (or impressionistic scoring) involves assigning a single
score to a piece of writing based on an overall impression.

Key Characteristics:
 Fast: Scoring is completed quickly, often in just minutes.
 Used for overall judgment, rather than evaluating specific components.
Advantages of Holistic Scoring
1. Rapid Scoring: Experienced scorers can evaluate a one-page composition in under two minutes.
2. Multiple Scores Per Work: Each piece can be scored multiple times for greater reliability.
3. Efficiency in High-Volume Tests: Useful for large-scale assessments like TOEFL.
Example of Holistic Scoring Scale
Scoring system used in the English-medium university writing test:
• NS: Native speaker standard
• MA: More than adequate
• A: Adequate for study
• D: Doubtful
• NA: Not adequate
• FBA: Far below adequate
Purpose: This scale was designed to determine if a student’s writing ability is adequate for university study in English.
Appropriateness of Scoring Systems
• Considerations:
 Scoring systems must be tailored to the level of candidates and the purpose of the test.
 A scale suitable for university entry may not be appropriate for other contexts.
• Example: The scale used for university tests in English was designed based on:
1. An examination of undergraduate students’ work.
2. Teacher judgments of acceptable English.
TOEFL Scoring System

TOEFL Scoring:
• Similar to holistic scoring but with general headings.
• Scores are used by multiple institutions, so the scale is broader.
Features:
• Provides linguistic features for each level to guide scorers.
• Can be more detailed than the university-specific scales.
Criticism of Scoring Systems

• Example: ACTFL Descriptors for Writing:


 10-point scale: Ranges from Novice-Low to Superior.
 Criticism: The descriptors assume that grammatical ability
and lexical ability always develop together, which is
questionable.
• Issue: Scales may not be research-based or reflect actual
language learning progression.
Use of Scales for Proficiency vs. Achievement
• ILR Levels:
 Used to assess language proficiency for specific jobs, without considering how the proficiency was achieved.
 ILR Levels assess whether language skills are sufficient for tasks, such as diplomatic posts.
Rating Individuals with Mixed Levels
Challenge:
• How to rate individuals whose language proficiency spans multiple levels?
Approach:
• Proficiency-Based: Place them at the lowest level that describes their language proficiency.
• Achievement-Based: Allow strengths in one area to compensate for weaknesses in another.
Analytic scoring
Analytic scoring assigns separate scores for different aspects
of a task, providing a detailed evaluation of a candidate's
performance.

Key Features:
 Involves assessing multiple components (e.g., grammar,
content, structure, etc.).
 Designed to offer a more nuanced understanding of writing
ability.
Advantages of Analytic Scoring
1. Uneven Development of Subskills: Analytic scoring addresses the variation in skill development across different aspects of writing.
2. Comprehensive Evaluation: Forces scorers to evaluate aspects they might overlook in holistic scoring.
3. Greater Reliability: Multiple scores across components increase the consistency and reliability of the assessment.

The Role of "Halo Effect"

 Halo Effect: The risk that scorers may be


influenced by one aspect of the writing (e.g.,
content) and allow it to affect their judgment
of other aspects (e.g., grammar).

 Despite this, multiple separate scores can


still lead to greater reliability.
Weighting in Analytic Scoring
 Anderson’s Scale: Equal weight is given to all components.
 Jacobs et al. (1981): Weighting is differentiated depending on the importance of each aspect.

Common Aspects Scored:
1. Content
2. Organization
3. Grammar
4. Vocabulary
5. Mechanics (spelling, punctuation, etc.)
Disadvantages of Analytic Scoring
Challenges:
1. Time Consumption: Analytic scoring takes more time than holistic scoring due to the need for multiple evaluations.
2. Focus on Parts vs. Whole: Scoring individual aspects may distract from the overall impact of the writing, potentially reducing validity.
o Solution: An impressionistic score can be added to balance the overall assessment.
Issue of Error Gravity
• Problem:
 A small number of grammatical errors can
significantly affect communication, while
many errors might not. The frequency of
errors and their impact on communication
may not always correlate.
• Relevance:
 This issue is present in both analytic and
holistic scoring systems.
Example: Jacobs et al. (1981) Scoring
• Description:
 Five components: Content, Organization, Vocabulary,
Grammar, Mechanics.
 Content has the highest weight, Mechanics the lowest.
 Score Range: Each component allows a range of scores,
giving scorers flexibility in evaluating performance.
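To make the idea of differential weighting concrete, the sketch below combines separate component scores into a single weighted total. The component range (0–20) and the weight values are purely illustrative assumptions, not the actual figures published by Jacobs et al. (1981); only the ordering (Content weighted most, Mechanics least) follows the description above.

def weighted_analytic_score(component_scores, weights, max_per_component=20):
    """Combine analytic component scores into a weighted total out of 100."""
    total_weight = sum(weights.values())
    total = 0.0
    for component, weight in weights.items():
        # Each component contributes in proportion to its weight
        proportion = component_scores[component] / max_per_component
        total += proportion * (weight / total_weight) * 100
    return round(total, 1)

# Illustrative weights: Content heaviest, Mechanics lightest
weights = {"content": 30, "organization": 20, "vocabulary": 20,
           "grammar": 25, "mechanics": 5}
scores = {"content": 16, "organization": 12, "vocabulary": 14,
          "grammar": 12, "mechanics": 16}
print(weighted_analytic_score(scores, weights))  # 69.0 with these illustrative numbers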
Choosing Between Holistic and Analytic Scoring
• Considerations:
 Diagnostic Information: Analytic scoring is essential when detailed insights into specific areas (like grammar or organization) are needed.
 Scoring Context:
o Small, cohesive group: Holistic scoring might be more time-efficient.
o Large or less experienced group: Analytic scoring is preferred for consistency.
Backwash Effect of Scales
 Scales inform candidates about the criteria by which they will be judged, potentially creating a backwash effect.
 If candidates are aware of the scales, they can focus on improving the specific criteria emphasized in the scoring.
Calibrate the scale to be
used
Calibration is the process of aligning the scale with actual
samples of performance.
 Calibration involves:
o Collecting performance samples under test conditions.
o Assigning each sample to a point on the scale, ensuring it
represents the full range of possible performances.
Purpose:
 Establish reference points for all future assessments.
 Serve as training materials for scorers.
Select and train scorers
 Not everyone can rate written work effectively without training.
 Ideal scorers should be:
o Native or near-native speakers of the target language.
o Experienced in teaching writing and grading.
o Trained in assessment practices.
1. Background & Rationale:
o Introduction to the scoring system and the need for
calibration.
2. Review of Writing Handbook:
o Trainees study the scoring guidelines and key descriptors.

3. Sample Work Analysis:


Training Stage 1 - o Trainees review sample pieces, comparing them against
Background and descriptors.
Overview o Discussion on how to assign levels based on specific
criteria.
4. Diverse Examples:
o Analyze pieces with different strengths (e.g., strong
grammar but poor organization).

Goal: Understand the scoring criteria and see how different types
of work can fit within the same level.
Training Stage 2 - Practical Application
1. Clarify Queries:
o Address any questions or confusion from Stage 1.
2. Calibrated Samples:
o Trainees receive calibrated samples of writing, covering all levels of the scale.
o They independently assign a score or level to each sample.
3. Discussion:
o Review and compare ratings with the trainer’s agreed-upon scores.
o Discuss any discrepancies and ensure consistency in interpretation.

Goal: Practice rating samples to reinforce understanding. Keep records of ratings to track progress.
Training Stage 3 - Final Assessment
1. Independent Rating:
o Trainees rate new samples on their own without discussion.
2. Accuracy Standard:
o A specific level of accuracy is required for raters to pass.
o Trainees who do not meet the required standard do not become official raters.

Goal: Assess the consistency and accuracy of raters before certifying them as scorers.
Initial Scoring Process

1. Present benchmark scripts to all scorers for initial scoring.
2. Scorers review and discuss these scripts
until there is agreement on how to score
them.
3. Only after consensus on benchmark
scripts should the actual scoring begin.
Independent Scoring by Multiple Scorers
 Each student’s work should be scored independently by two or more scorers.
 The goal is to increase reliability by involving as many scorers as possible.
 Record scores on separate sheets for each scorer.
Identifying and Resolving Discrepancies

 After scoring, a senior team member should:


o Collate scores from all scorers.
o Identify discrepancies between scores assigned to the same
piece of writing.
• Resolution:
 Small discrepancies can be resolved by averaging the scores.
 Larger discrepancies require discussion and a decision by
senior members of the team.
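As a rough sketch of the collation step described above (assuming a simple numeric scale and an arbitrarily chosen discrepancy threshold), the function below averages each candidate's scores when the raters are close and flags the script for discussion when the gap is larger.

def resolve_scores(ratings, max_gap=1.0):
    """ratings: {candidate_id: [score_from_rater_1, score_from_rater_2, ...]}
    Returns (final_scores, flagged), where flagged scripts need senior review."""
    final_scores, flagged = {}, []
    for candidate, scores in ratings.items():
        if max(scores) - min(scores) <= max_gap:
            # Small discrepancy: resolve by averaging
            final_scores[candidate] = sum(scores) / len(scores)
        else:
            # Large discrepancy: refer to senior members of the team
            flagged.append(candidate)
    return final_scores, flagged

ratings = {"A017": [6, 6.5], "B042": [4, 7], "C109": [5, 5]}
finals, to_discuss = resolve_scores(ratings)
print(finals)      # {'A017': 6.25, 'C109': 5.0}
print(to_discuss)  # ['B042']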
Importance of Scoring Environment
• Key Concept:
 The scoring environment should be quiet and well-lit.
 Scorers should not be tired, as tiredness affects concentration.
• Reason:
 Holistic scoring can be rapid but requires high levels of concentration.
Multiple Scoring for Reliability
 Multiple scoring ensures reliability, even if different scorers use slightly different standards.
 Encourage multiple scorers to ensure the consistency of results.
Feedback
 Feedback is essential in various situations to help candidates understand their performance.
 Feedback Pro Forma: A structured way of providing feedback on candidates’ work.
 Purpose: Feedback helps candidates improve their skills and address weaknesses.
Non-Writing-Specific Feedback Elements

• Key Points:
 Incomplete task performance:
1. Topic: Not all parts addressed or treated superficially.
2. Operations: Failure to execute required operations (e.g.,
compare and contrast).
 Pointless repetition: When the writing includes unnecessary
repetition, it detracts from the overall quality.
Writing-Specific Feedback Elements
1. Misuse of quotation marks: Incorrect or inconsistent use of quotations.
2. Inappropriate underlining: Overuse or misuse of underlining in the text.
3. Capitalization issues: Errors in capitalizing words unnecessarily or inconsistently.
4. Style conventions: Failure to adhere to appropriate writing style standards.
5. Failure to split overlong sentences: Writing that includes long, run-on sentences that should be broken up.
6. Inappropriate use of sentence fragments: Use of incomplete sentences or fragments inappropriately.
• Hughes, A. (2003). Testing for language
teachers (2nd ed.). Cambridge
References University Press.
Thank You!
