MODULE 2
Basic Concepts and Principles in Educational Assessment
Intended Learning Outcomes
By the end of this topic/chapter, you must be able to:
    1. Define educational assessment.
    2. Recognize basic concepts and principles in assessment of learning like testing,
        measurement, evaluation and assessment.
    3. Characterize testing, measurement, evaluation and assessment.
    4. Differentiate standardized and classroom assessments.
Educational Assessment
        Educational assessment is a much broader idea than exams and tests alone; its
scope and meaning are not confined to testing.
        An educational test is used to determine someone's knowledge of something, or
what they have learned; the goal of testing is to measure the skills or knowledge acquired
during the process of learning.
        Assessment is a process of documenting knowledge, skills, attitudes and beliefs
in measurable terms. The purpose of assessment in education is to improve both the
teaching process for teachers and the learning process for the students.
        Therefore, we can say that educational assessment is the process of gathering
information about what students have learned in their educational environments.
Forms of Educational Assessment
   o It may involve formal tests or performance-based activities
   o It may be administered online or using paper and pencil or other materials
   o It may be objective (requiring one answer) or subjective (there may be many
     possible answers, like essays)
   o It may be formative (carried out throughout the course) or summative (administered at
     the end of the course)
Types of Educational Assessment
   1. Formative Assessment – used throughout the educational process, with the goal
      of identifying problem areas and improving teaching and learning
   2. Summative Assessment – used at the end of the learning block, as a final test of
      students' knowledge
   3. Standardized Assessments – provide a path to discover struggles, successes,
      accelerations on specific elements
   4. Performance-based Assessment – measure student’s ability to apply skills and
      knowledge learned from a unit or units of study
   5. Norm- versus Criterion-Referenced Assessments
           •   Referenced assessments are given for the purpose of comparing students'
               results to a particular standard
           •   Norm-referenced tests – the standard is based on a large sample of students,
               whose scores are referred to as the norm
           •   Criterion-referenced tests – compare individual students' results to a
               standard, but this time the standard is based on the curriculum and is often
               designed as a cut-off for demonstrating proficiency
   6. Alternative Assessments – used to determine what students can or cannot do with
      respect to what they already know
Benefits of Good Educational Assessment
   1. Help educators track students’ progress so they can identify anyone who is
      struggling and provide remediation
   2. Provide feedback to students about their own performance, which they can use
      to improve their knowledge and skills further
   3. Motivate students as they know they will be evaluated at the end of each module
      or course
   4. Help educators set learning objectives and outcomes and determine the best
      ways to help students reach their goals
   5. Can be used to improve the curriculum
   6. Can be used to evaluate teachers’ and school systems’ performance, as well as
      the effectiveness of different teaching practices
Principles of Quality Educational Assessment
   1. Must be based on defined objectives and outcomes
   2. Must be valid
Testing
       - A formal, systematic procedure for gathering information
Test
   ❖ A tool composed of a set of questions administered during a fixed period of time
     under comparable conditions for all students
   ❖ Most dominant form of assessment
   ❖ Traditional assessment
   ❖ An instrument used to measure a construct and make decisions
   ❖ Used to measure the learning progress of a student which is formative in purpose,
     or comprehensive covering a more extended time frame which is summative
   ❖ Tests may not be the best way to measure how much students have learned, but they
     still provide valuable information about students' learning and progress
Types of Test
A. according to mode of response
1. a. Oral test (viva voce)
             • Answers are spoken
             • Measure oral communication skills
             • Used to check students’ understanding of concepts, theories and
                  procedures
             • Minimally discriminatory and more inclusive especially for learners who
                  are dyslexic
             • Plagiarism is less likely
              • Time-consuming and may be stressful for some students
              • Favors extroverted and eloquent students
1.b. Written test
            • Activities wherein students either select or provide a response to a
                prompt
            • Can be administered to a large group at one time
             • Can measure students' written communication skills
             • Can be used to assess lower and higher levels of cognition provided that
                 questions are phrased properly
             • Enables assessment of a wide range of topics
Forms of Written Assessment
           • Alternate response (true/false)
           • Multiple choice
           • Matching
           • Short answer
           • Essay
           • Completion
           • Identification
1.c. Performance test
            • Are activities that require students to demonstrate their skills or ability
              to perform specific actions
             • Tasks are designed to be authentic, meaningful, in-depth and
               multidimensional
            • Cost and efficiency are some of the drawbacks
             • Includes problem-based learning, inquiry tasks, exhibits, presentation
               tasks and capstone performances
2. a. Selected Response
            • Alternate response
            • Matching type
            • Multiple choice
2. b. Constructed response
             •   Completion
             •   Short answer
             •   Essay (restricted or extended response)
             •   Problem solving
B. according to ease of quantification of response
        1. Objective
            • Corrected and quantified easily
            • Scores can be readily compared
            • It includes true-false, multiple choice, completion and matching items
             • Test items have a single or specific convergent response
2. Subjective
            •   Elicits varied response
            •   May have more than one answer
            •   Includes restricted and extended-response essays
             •   Not easy to check because students have the liberty to write their own
                 answers
            •   Answers are divergent
            •   Scores are likely to be influenced by personal opinion or judgement by
                the person doing the scoring
C. according to mode of administration
        1. Individual Test
             • Given to one person at a time
              • Individual cognitive and achievement tests are administered to gather
                  extensive information about each student's cognitive functioning and
                  his/her ability to process and perform specific tasks
             • It can help identify intellectually gifted students
             • It can pinpoint those with learning disabilities (LDs)
              • The examiner can also observe students closely during the test to gather
                  additional information
2. Group Test
             • Administered to a class or group of examinees simultaneously
             • Developed to address the practical need of testing
             • Test is usually objective and responses are more or less restricted
             • It does not lend itself for in-depth observations of individual students
             • Less opportunity to establish rapport or help students maintain interest
                 in the test
             • Students are assessed on all items of the test
             • Students may become bored with easy items and anxious over difficult
                 ones
              • Information obtained from a group test is not as comprehensive as that
                  from individual tests
D. according to test constructor
Table 1. Type of Test According to Test Constructor
 Prepared by
    Standardized test: specialists who are well versed in the principles of assessment
    Non-standardized test: teachers who may not be adept at the principles of test
    construction; teacher-made tests are constructed haphazardly due to limited time and
    lack of opportunity to pre-test or pilot-test items
 Learning outcomes & content measured
    Standardized test: serves as an indicator of instructional effectiveness and a
    reflection of the school's performance
    Non-standardized test: not thoroughly examined for validity
 Quality of test items
    Standardized test: consists of multiple-choice items used to distinguish between
    students
    Non-standardized test: uncertain quality; one or several formats are used; items are
    not entirely objective
 Reliability
    Standardized test: can be used for a long period of time
    Non-standardized test: scores are not subjected to any statistical procedure to
    determine reliability; not intended to be used repeatedly for a long time
 Administration and scoring
    Standardized test: administered to a large group of students; scoring procedures are
    consistent; manuals and guides are available to aid in the administration and
    interpretation
    Non-standardized test: administered to one or a few classes to measure subject or
    course achievement; no established standards for scoring and interpreting
E. according to mode of interpreting results
        1. Norm-referenced interpretation
            • Evaluation instruments that measure a student’s performance in
                relation to the performance of a group on the same test
             • Comparisons are made and the student's relative position is determined
2. Criterion-referenced interpretations
            • Describe each student’s performance against an agreed upon or pre-
                established criterion or level of performance
            • The criterion is not actually a cutoff score but rather the domain of
                subject matter- the range of well-defined instructional objectives or
                outcomes
            • In a mastery test, the cut score is used to determine whether or not a
                student has achieved mastery of a given unit of instruction
F. according to nature of answer
Table 2. Type of test according to the nature of answer
 1. Personality test          o Measures one’s personality and behavioral style
                              o Used in recruitment as aid in determining how a
                                potential employee will respond to various work-
                                related activities
                              o Used in career guidance, in individual and
                                relationship counseling and in diagnosing
                                personality disorders
                              o In schools, it determines personality strength and
                                weaknesses of students
 2. Achievement test          o Measures students' learning as a result of
                                 instruction and training experiences
                              o When used summatively, it is used as a basis for
                                promotion to the next grade
                               o Measures students' ability and predicts success in
                                 college
 3. Intelligence Test         o Measure learners’ innate intelligence or mental
                                ability
                              o Contain items on verbal comprehension,
                                quantitative and abstract reasoning, among others,
                                in accordance with some recognized theory of
                                intelligence
                              o Alfred Binet & Theodore Simon (1905) – published
                                the first modern intelligence test
                               o Sternberg – constructed a set of multiple-choice
                                 questions grounded on his Triarchic Theory of
                                 Human Intelligence. The intelligence test taps into
                                 the three independent aspects of intelligence:
                                 analytic, practical and creative
 4. Sociometric Test          o Measures interpersonal relationships in a social
                                group
                              o Introduced in 1930s
                              o Allows learners to express their preferences in terms
                                of likes and dislikes for other members of the group
                              o Includes peer nomination, peer rating and
                                sociometric ranking of social acceptance
 5. Trade or                  o Assess an individual’s knowledge, skills and
 Vocational Test                competence in a particular occupation
                              o Consists of a theory test and a practical test
                               o Upon completion, the individual is given a
                                 certification of qualification
                               o Used to determine the effectiveness of training
                                 programs
Functions of Testing
A. Instructional Functions
Table 3. Instructional functions of testing

1. Tests facilitate the clarification of meaningful learning objectives
     o When constructing tests, teachers are reminded to go back to the learning outcomes

2. Tests provide a means of feedback to the instructor and the student
     o Can be used for self-diagnosis
     o Students can assess their own learning and performance
     o Test results can guide teachers in adjusting their pedagogical practices to match
       students' learning styles
     Washback – the impact of a test on teaching and learning

3. Tests can motivate learning
     o Frequent testing increases academic preparation (study time) and academic
       achievement
     o Frequent testing produces a more positive attitude among students

4. Tests can facilitate learning
     o Testing improves performance when learners are given the opportunity to practice
       retrieval before taking the final test
     o Prompt feedback informs students how they are doing
     Successive learning – a test-restudy practice method conducted at appropriate
     intervals which can bring about long-term retention

5. Tests are a useful means of overlearning
     o Preparation for a scheduled test induces overlearning
     Overlearning – continued study, review, interaction or practice of the same material
     even after concepts and skills had been mastered
Measurement
   o Refers to a “limit or quantity”; a quantitative description of an object’s
     characteristics or attributes
   o Determines how much learning a student has acquired compared to a standard
     (criterion-referenced) or in reference to other learners in a group (norm-referenced)
   o Measures particular elements of learning like readiness to learn, recall of facts,
     demonstration of specific skills, or the ability to analyze and solve applied problems
   o Uses tools or instruments like tests, oral presentations, written reports, portfolios
     and rubrics to obtain pertinent information
   o Each measurement has two components
              1. a true value of the quantity
              2. random error component
   o The objective in educational measurement is to estimate or approximate the true
     value of the quantity of interest
   o Objective measurements do not depend on the person or individual taking the
     measurements
Evaluation
   o Process of judging the quality of a performance or course of action.
   o Finding the value of an educational task.
   o Carried out both by the teacher and the student to uncover how the learning
     process is developing.
Objects of Evaluation
   1. Instructional programs
   2. School projects
   3. Teachers
   4. Students
   5. Educational goals
Categories of Evaluation
1. Formative Evaluation
   ❖   Judging the worth of the program while the program is in progress
   ❖   Focuses on the process
   ❖   Determine deficiencies so that the appropriate interventions can be done
   ❖   Used in analyzing learning materials, student learning and achievements and
       teacher effectiveness
2. Summative Evaluation
    ❖ Judging the worth of the program at the end of the program activities
    ❖ Focus is on the result
   ❖ Tools used for data gathering: questionnaire, survey forms,
     interview/observations guide and test
   ❖ Determine the effectiveness of the program based on its objectives
   ❖ Techniques for summative evaluation: pretest-posttest with experimental and
     control group; one group descriptive analysis
Assessment
        Assessment is used to determine students’ learning needs, monitor the progress
of students and examine their performance against identified learning outcomes.
         It may be implemented at different phases of instruction, such as:
         a. before instruction (pre-assessment)
         b. during instruction (formative assessment)
         c. after instruction (summative assessment)
          The term comes from the Latin word “assidere”, which means “to sit beside a
judge”; this implies that assessment is tied up with evaluation.
          Assessment pertains to any method utilized to gather information about student
performance. It covers all activities undertaken by teachers, and by their students in
assessing themselves, that provide information to be used to modify the teaching-learning
activities (TLA) in which they are engaged and that aid teachers in making informed
decisions and judgments to improve the TLA.
Nature of Assessment
Table 4. Nature of Assessment
Purpose of Assessment
Table 5. Purpose of assessment
 Assessment for          o Diagnostic and formative assessment tasks which are used
 Learning (AfL)            to determine learning needs, monitor academic progress
                           of students during a unit or block of instruction and guide
                           instruction
                         o Examples: pre-tests, written assignments, quizzes,
                           concept maps, focused questions
 Assessment as           o Employs tasks or activities that provide students with an
 Learning (AaL)            opportunity to monitor and further their own learning – to
                           think about their personal learning habits and how they
                           can adjust their learning strategies to achieve their goals
                         o Formative which may be given at any phase of the
                           learning process
                         o Involves metacognitive processes like reflection and self-
                           regulation to allow students to utilize their strengths and
                           work on their weaknesses by directing and regulating
                           their learning
                         o Students are accountable and responsible for their own
                           learning
                         o Examples: peer-assessment rubrics, portfolios
 Assessment of           o Summative and done at the end of the unit, task or
 Learning (AoL)            process or period
                         o Purpose is to provide evidence of a student’s level of
                           achievement in relation to curricular outcomes
                         o Used for grading, evaluation and reporting purposes
                         o provides the foundation for decisions on student’s
                           placement and promotion
                         o Examples: unit test, final projects
Relevance of Assessment
1. Students
    ❖ Through varied learner-centered and constructive assessment tasks, students
        become actively engaged in the learning process
    ❖ Take responsibility for their own learning
    ❖ Can learn to monitor changes in their learning patterns
    ❖ Become aware of how they think, how they learn, how they accomplish tasks and
        how they feel about their work
    ❖ Ultimately redounds to better student achievement
2. Teachers
    ❖ Informs instructional practice
    ❖ Results can reveal which teaching methods and approaches are most effective
    ❖ Provide direction as to how teachers can help students more and what teachers
       should do next
    ❖ Assessment procedures support instructors' decisions on managing instruction,
       assessing student competence, placing students into levels of educational programs,
       assigning grades to students, guiding and counseling, selecting students for
       education opportunities and certifying competence
3. Parents
    ❖ Valued source of assessment information on the educational history and learning
        habits of their children especially for preschoolers who do not yet understand
        their developmental progress
    ❖ Can help identify needs of children for appropriate intervention
4. Administrators and Program Staff
    ❖ Identify strengths and weaknesses of the program
    ❖ Designate priorities, assess options and lay down plans for improvement
    ❖ Used to make decisions regarding promotion or retention of students and
        arrangement of faculty development
5. Policy Makers
    ❖ Provides information about students’ achievements which in turn reflect the
        quality of education being provided by the school
    ❖ government agencies can set or modify standards, reward or sanction schools
        and direct educational resources
Aptitude and Achievement Tests
Achievement and aptitude tests are two kinds of test which measure two different aspects
of learning. An achievement test measures the amount of knowledge a student has already
learned or mastered and is used for determination, while an aptitude test is used for
projection: it indicates a student's potential or ability to learn.
Achievement Test
        It is a test used to assess students' achievement or mastery of content, skills or
general academic knowledge; it is often used as an admission test or placement test in
schools or for a scholarship grant.
             1. Standardized achievement test – measures specific things and results are
                 compared across age and grade level of students and often reported as
                 percentile, percentage or grade equivalency; same format, same types
                 of questions and the same content no matter when or where the test is
                 administered or who is taking the test; administered by trained
                 individuals
              2. Non-standardized achievement test – measures the stock or prior
                  knowledge and learning of students; determines ability in a specific
                  skill or subject area; it may be a cumulative final exam or a
                  performance task
Aptitude Test
           A test which measures a test taker's natural talent or abilities for current and future
use; it includes a series of questions in which the taker makes a value judgment, agreeing or
disagreeing, and the results may show what types of careers they would be suited for.
          Other types of aptitude tests include personality inventories. These types of
assessments will indicate the personal preferences and interpersonal strengths and
weaknesses of the test taker. These tests may also measure a test taker’s ability to solve
complex problems or future abilities to perform certain tasks.
References
[3] De Guzman, E.S., & Adamos, J.L. (2015). Assessment of learning 1. Adriana Publishing
Co., Inc. pp. 1-32.
[6] McMillan, J.H. (2018). Classroom assessment: Principles and practice that enhance
student learning and motivation. Pearson Education, Inc. pp. 1-33.
[8] Navarro, R.L., Santos, R.G., & Corpuz, B.B. (2019). Assessment of learning 1. Lorimar
Publishing, Inc. pp. 10-16.
[14] What is educational assessment? (2021). Retrieved from
https://www.proprofs.com/quiz-school/blog/what-is-educational-assessment-and-why-
is-it-necessary
[15] Forstall, M. (2019). Retrieved from https://www.theclassroom.com/achievement-vs-
aptitude-tests-5607096.htm
  UNIVERSITY OF SOUTHERN MINDANAO
Principles of High-Quality Assessment
         Prof Ed 221 ASL 1
  Topic Outline
1. Principles of High-Quality Assessment
2. Validity
3. Reliability
4. Ethics/Fairness
5. Practicality and efficiency
Intended Learning Outcomes
1. Interpret principles of high-quality assessment.
2. Identify factors that make assessment valid, reliable, fair, ethical, practical and
   efficient.
  What governs assessment of learning?
• Five standards of quality assessment to inform sound
  instructional decisions:
  • 1. Clear purpose
  • 2. Clear learning targets
  • 3. Sound assessment design
  • 4. Effective communication of results
  • 5. Student involvement in the assessment process
    (Chappuis, Chappuis & Stiggins, 2009)
Classroom assessment asks three questions: Why are you assessing? What do you want to
assess? How are you going to assess?
   Assessment methods and tools should be parallel to the
    learning targets or outcomes to provide learners with
     opportunities that are rich in breadth and depth and
                promote deep understanding.
• Not all assessment methods are applicable to every type
  of learning outcomes and teachers have to be skillful in
  the selection of assessment methods and designs.
• Knowledge of the different levels of assessment is
  paramount.
Example – ILO: students should be able to communicate their ideas verbally.
Assessment: written essay. Here the assessment method does not match the learning target.
  Identifying Learning Outcomes
Learning outcomes pertain to a particular level of knowledge, skills and values that a
student has acquired at the end of a unit or period of study as a result of his/her
engagement in a set of appropriate and meaningful learning experiences.
An organized set of learning outcomes helps teachers plan and deliver
appropriate instruction and design valid assessment tasks and
strategies.
      Steps in a Student Outcomes Assessment (Anderson, et al., 2005)
1. Create learning outcome statements;
2. Design teaching/assessment activities to achieve these outcome statements;
3. Implement teaching/assessment activities;
4. Analyze data on individual and aggregate levels; and
5. Reassess the process.
Taxonomy of learning domains
    Learning Outcomes
Learning outcomes are statements of performance expectations in the cognitive, affective
and psychomotor domains.
Within each domain are levels of expertise that drives assessment.
These levels are listed in order of increasing complexity.
Higher levels require more sophisticated methods of assessment but
they facilitate retention and transfer of learning.
All learning outcomes must be capable of being assessed and measured
– using direct and indirect assessment techniques.
      Cognitive (Knowledge-based)
originally devised by Bloom, Engelhart, Furst, Hill & Krathwohl (1956) and
revised by Anderson, Krathwohl et al. (2001)
produced a two-dimensional framework of Knowledge and Cognitive
Processes and accounts for 21st century needs by including metacognition
designed to help teachers understand and implement a standards-based
curriculum.
involves the development of knowledge and intellectual skills
answers the question, "What do I want learners to know?”
Cognitive (Knowledge-based)
Krathwohl (2002) stressed that the revised Bloom's taxonomy table is used not only to
classify instructional and learning activities employed to achieve the objectives but also
for assessments used to determine how well learners have attained and mastered the
objectives.
Cognitive (Knowledge-based)
Marzano & Kendall (2007) describe three systems: the self-system, the metacognitive
system and the cognitive system. The cognitive system covers Knowledge, Comprehension,
Analysis and Knowledge Utilization.
Cognitive (Knowledge-based)
Cognitive System levels:
• Knowledge – same as Remembering
• Comprehension – entails synthesis and representation
• Analysis – involves the processes of matching, classifying, error analysis, generalizing
  and specifying
• Knowledge Utilization – decision making, problem solving, experimental inquiry and
  investigation
E.g. Science
• Design an experiment to determine the factors that affect the strength of an
  electromagnet.
• Which of the following factors does not affect the strength of an electromagnet?
  a. diameter of the coil
  b. direction of the windings
  c. nature of the coil material
  d. number of turns in the coil
PSYCHOMOTOR (Skills-based)
focuses on physical and mechanical skills involving
coordination of the brain and muscular activity
answers the question "What actions do I want learners to be
able to perform?"
       PSYCHOMOTOR (Skills-based)
Dave (1970) identified 5 levels of behavior: Imitation, Manipulation, Precision,
Articulation & Naturalization.
Simpson (1972) laid down 7 progressive levels: Perception, Set, Guided Response,
Mechanism, Complex Overt Response, Adaptation & Origination.
Harrow (1972) developed her own taxonomy with 6 categories organized according to
degree of coordination: reflex movements, basic fundamental movements, perceptual
abilities, physical activities, skilled movements and non-discursive communication.
    AFFECTIVE (Values, Attitudes & Interests)
emphasizes emotional knowledge
tackles the question, "What actions do I want learners to think or
care about?”
developed by Krathwohl, Bloom & Masia (1964)
includes factors such as student motivation, attitudes, appreciation
and values
Types Of Assessment Methods
ASSESSMENT METHODS
Categorized according to the nature and characteristics of each
method
Like a carpenter's tools: you need to choose which method is apt for a given task.
It is not wise to stick to one method of assessment.
“If the only tool you have is a hammer, you tend to see every problem
as a nail.”
Assessment Methods
    McMillan (2007)
                        Selected-response
                       Constructed-response
                       Teacher-observation
                      Student self-assessment
1. SELECTED-RESPONSE FORMAT
students select from a given set of options to answer a
question or a problem
it is objective and efficient because there is only one correct
or best answer
the items are easy to grade - teacher can assess and score a
great deal of content quickly
1. SELECTED-RESPONSE FORMAT
Examples: multiple-choice, alternate response (true/false) and matching type
     2. CONSTRUCTED-RESPONSE FORMAT
demands that students create or produce their own answers in response
to a question, problem or task
is more useful in targeting higher level of cognition
items may fall under any of the following categories
 •   brief-constructed response items
 •   performance tasks
 •   essay items
 •   oral questioning
  2.A. Brief-constructed Response Items
require only short responses from students
E.g. sentence completion where students fill in a blank at the
end of the statement
E.g. short answers to open-ended questions
E.g. labeling a diagram
E.g. answering a mathematics problem by showing their
solutions
  2.b. PERFORMANCE ASSESSMENT
require students to perform a task rather than select from a given set of
options
students have to come up with a more extensive and elaborate answer or
response
called authentic or alternative assessments because students are
required to demonstrate what they can do through activities, problems
and exercises
can be a more valid indicator of students' knowledge and skills than other
assessment methods
2.b. PERFORMANCE ASSESSMENT
Scoring Rubric
• contains the performance criteria used for grading performance tasks
• may be an analytic scoring rubric, where different dimensions and characteristics of
  performance are identified and marked separately
• or a holistic rubric, where the overall process or product is rated
     2.b. PERFORMANCE TASKS
provide opportunities for students to apply their knowledge and skills in
real-world contexts
may be product-based or skills-oriented
students have to create or produce evidence of their learning or do
something and exhibit their skills
2.b. Examples of Products
written reports, reflection papers, journals, projects, web pages, tables,
spreadsheets/worksheets, poems, graphs, portfolios, audio-visual materials,
illustrations/models
2.B. Examples of Performance or Skills-based Activities
speech, role play, athletics, teaching demonstration, recital, dramatic reading, debate
2.b. PERFORMANCE ASSESSMENT
can result in better integration of assessment with instruction, a greater focus on
higher-order thinking skills, an increased motivation level in the learning process, and
improved instructional and content validity
2.c. Essay Assessments
   involve answering a question or proposition in written form
   allow students to express themselves and demonstrate their reasoning
   may be easy to construct, but they require much thought on the part of the teacher
   essay questions have to be clear so that students can organize their thoughts quickly
   and directly answer the questions
   use a rubric to score essays
2.c. Essay Assessments
                      • requires a few sentences
                      • there are constraints to the content
Restricted response
                        and nature of the response
                      • questions are more focused
                      • Allow for more flexibility on the part
                        of the student
Extended response
                      • Responses are longer and more
                        complex
    2.d. Oral Questioning
  Common assessment method during instruction to check on student
                        understanding
 May take the form of an interview or conference when done formally
    The teacher can keep students on their toes, receive acceptable responses, elicit
various types of reasoning from the students and at the same time strengthen their
confidence.
  The teacher can probe deeper and find out for himself/herself if the
             student knows what he/she is talking about.
Responses to oral questions are assessed using a scoring system or rating
                                 scale.
 3. TEACHER OBSERVATIONS
A form of on-going assessment, usually done in combination with oral
questioning
Teachers regularly observe students to check on their understanding
By watching how students respond to oral questions and behave during
individual and collaborative activities, the teacher can get information if
learning is taking place in the classroom
Non-verbal cues communicate how learners are doing. Teachers have to
be watchful if students are losing attention, misbehaving or appear non-
participative in classroom activities.
3. TEACHER OBSERVATIONS
It would be beneficial if teachers make observational or anecdotal
notes to describe how students learn in terms of concept building,
problem solving, communication skills, etc.
can also be used to assess the effectiveness of teaching strategies and
academic interventions
Information gathered from observations reveals the strengths and
weaknesses of individual students and the class as a whole
serve as basis for planning and implementing new supports for learning
4. STUDENT SELF-ASSESSMENT
one of the standards of quality assessment identified by Chappuis,
Chappuis & Stiggins (2009)
process where the students are given the chance to reflect and rate
their own work and judge how well they have performed in relation
to a set of assessment criteria
students track and evaluate their own progress or performance
self-monitoring techniques like activity checklist, diaries and self-
report inventories
  4. STUDENT SELF-ASSESSMENT
provide an opportunity to reflect on their performance, monitor
their learning progress, motivate them to do well and give
feedback to the teacher which the latter can use to improve the
subject/course
enhances student achievement, improves self-efficacy and
promotes a mastery goal orientation and more meaningful
learning
an essential component of formative assessment
   References
• https://irds.stanford.edu/sites/g/files/sbiybj10071/f/msmt.pdf
1. De Guzman, E.S., & Adamos, J.L. (2015). Assessment of learning 1. Adriana Publishing Co., Inc.
2. McMillan, J.H. (2018). Classroom assessment: Principles and practice that enhance student learning and
   motivation. Pearson Education, Inc.
3. Popham, W.J. (2017). Classroom assessment: What teachers need to know. Pearson Education, Inc.
UNIVERSITY OF SOUTHERN MINDANAO
Module 3: Validity & Reliability
       Prof Ed 221 ASL 1
    Topic Outline
1. Principles of High-Quality Assessment
2. Validity
3. Reliability
4. Ethics/Fairness
5. Practicality and efficiency
Intended Learning Outcomes
1. Interpret principles of high-quality assessment.
2. Identify factors that make assessment valid, reliable, fair, ethical, practical and
   efficient.
VALIDITY and
RELIABILITY
CHAPTER 4
VALIDITY
VALIDITY
❑is a term derived from the Latin word validus
 which means “strong”.
❑It pertains to the accuracy of the inferences teachers
 make about students based on the information
 gathered from the assessment (McMillan, 2007;
 Fives & DiDonato-Barnes, 2013)
VALIDITY
❑Content-Related Evidence
  ✓Face Validity
  ✓Instructional Validity
❑Criterion-Related Evidence
  ✓Concurrent Validity
  ✓Predictive Validity
❑Construct-Related Evidence
  ✓Convergent Validity
  ✓Divergent Validity
Content-Related Evidence
  Content-Related Evidence
❑ pertains to the extent to which the test covers the domain of
 content. If a summative test covers a unit with four topics, then the
 assessment should contain items from each topic. This is done
 through adequate sampling of content. A student's performance in
 the test may be used as an indicator of his/her content knowledge.
Face Validity
❑ The test appears to adequately measure the learning outcomes and content
❑ Based on the subjective opinion of the one viewing it
❑ Non-systematic or non-specific

Instructional Validity
❑ The extent to which an assessment is systematically sensitive to the nature of
  instruction offered
❑ An instructionally valid test is one that registers differences in the amount and kind
  of instruction to which students have been exposed (Yoon & Resnick, 1998)
   Content-Related Evidence
TABLE OF SPECIFICATION
➢ prepared before developing the test
➢ a test blueprint that identifies the content areas and describes the
  learning outcomes at each level of the domain (Notar et al., 2004)
➢ a tool used in conjunction with lesson and unit planning to help
  teachers make genuine connections between planning,
  instruction, and assessment (Fives & DiDonato-Barnes, 2013)
➢ assures teachers that they are testing students' learning across
  a wide range of content and readings as well as cognitive
  processes requiring higher-order thinking
➢ a sample layout is sketched after the list of elements below
    Content-Related Evidence
SIX ELEMENTS IN TOS DEVELOPMENT
➢Balance among the goals selected for the examination
➢Balance among the levels of learning
➢The test format
➢The total number of items
➢The number of items for each goal and level of
 learning
➢The enabling skills to be selected from each goal
 framework
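
To make the elements above concrete, here is a small hypothetical table of specifications
for a 20-item test; the topics, cognitive levels and item counts are illustrative only and are
not taken from this module.

    Content area     Remembering   Understanding   Applying   Analyzing   Total items
    Topic 1               2              2             1          1            6
    Topic 2               2              1             2          1            6
    Topic 3               1              2             2          3            8
    Total                 5              5             5          5           20

Each row shows how the items for one content area are spread across the levels of learning,
and the totals show the balance among goals and levels for the whole examination.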
Criterion-Related Evidence
    Criterion-Related Evidence
•refers to the degree to which test scores agree
 with an external criterion.
•examines the relationship between an assessment
 and another measure of the same trait (McMillan,
 2007)
•Three types of criteria:
   • Achievement test scores
   • Ratings, grades and other numerical judgments made by the
     teacher
   • Career data
    Criterion-Related Evidence
CONCURRENT VALIDITY               PREDICTIVE VALIDITY
•Provides an estimate of a       •Pertains to the power or
 student’s current                usefulness of test scores
 performance in relation to       to predict future
 previously validated or          performance
 established measure
  *In testing correlations between two data sets for both concurrent
  and predictive validity, the PEARSON CORRELATION COEFFICIENT (r)
  or SPEARMAN'S RANK ORDER CORRELATION may be used.
  Coefficient of determination = r^2
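
As a rough illustration of how criterion-related evidence can be computed, the Python
sketch below correlates scores on a new quiz with scores on a previously validated measure;
the score lists and variable names are hypothetical, not data from this module.

    # Hypothetical sketch: estimating concurrent validity by correlating a new quiz
    # with an established, previously validated test taken by the same students.
    from scipy.stats import pearsonr, spearmanr

    new_quiz       = [78, 85, 62, 90, 71, 88, 65, 80]   # scores on the new assessment
    validated_test = [75, 82, 60, 94, 70, 85, 68, 79]   # scores on the established measure

    r, _   = pearsonr(new_quiz, validated_test)    # Pearson correlation coefficient (r)
    rho, _ = spearmanr(new_quiz, validated_test)   # Spearman's rank-order correlation
    print(round(r, 2), round(rho, 2), round(r ** 2, 2))  # r, rho, coefficient of determination

A high positive correlation between the two sets of scores would support the concurrent
validity of the new quiz.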
Construct-Related Evidence
    Construct-Related Evidence
•An assessment of the quality of the instrument used
•Measures the extent to which the assessment is a
 meaningful measure of an unobservable trait or
 characteristic (McMillan, 2007)
•Three types of construct-related evidence:
    •theoretical
    •logical
    •statistical
• The construct must be operationally defined or explained
  explicitly to differentiate it from other constructs.
        Construct-Related Evidence
• In 1955, Lee Cronbach and Paul Meehl insisted that to provide evidence
  of construct validity, one has to develop a nomological network.
• Construct validity can take the form of a differential group study.
• Another form is an intervention study, wherein a test is given to a group of students who
  are weak in problem-solving strategies, an intervention is applied, and the test is given
  again; improvement in scores provides evidence for the construct.
  • Two methods of establishing construct validity: convergent and divergent
    validation.
    • Convergent validity occurs when measures of constructs that are
      related are in fact observed to be related.
    • Divergent validity occurs when measures of constructs that are unrelated are in
      reality observed not to be related.
      Construct-Related Evidence
•In 1959, Campbell and Fiske developed a statistical
 approach called the Multitrait-Multimethod Matrix (MTMM)
    •A table of correlations arranged to facilitate the assessment
     of construct validity, integrating both convergent and
     divergent validity.
• McMillan (2007) recommends, for practical purposes, the
  use of clear definitions and logical analysis as construct-
  related evidence.
Unified Concept of Validity
     Unified Concept of Validity
• Messick (1989) proposed a unified concept of validity which integrates
  considerations of content, criteria, and consequences into a construct framework
  for the empirical testing of rational hypotheses about score meaning and
  theoretically relevant relationships.
• Six distinct aspects of construct validity: content, substantive, structural,
  generalizability, external and consequential.
Validity of Assessment Methods
Validity of Other Assessment Methods
•Developing performance
 assessments involves:
 •Define the purpose
 •Choose the activity
 •Develop criteria for scoring
Validity of Assessment Methods
❑Define the purpose
 •The first step is about determining the
  essential skills students need to develop
  and content worthy of understanding.
 •To acquire validity evidence in terms of
  content, performance assessments
  should be reviewed by qualified content
  experts.
 Validity of Assessment Methods
❑Choose the activity
 • The selected performance should reflect a valued activity.
 • The completion of performance assessments should provide
   a valuable learning experience.
  • The statement of goals and objectives should be clearly
    aligned with the measurable outcomes of the performance
    activity.
 • The task should not examine extraneous or unintended
   variables.
 • Performance assessments should be fair and free from bias.
Validity of Assessment Methods
❑Develop criteria for scoring
 • In scoring, a rubric or rating scale should be created.
 • In controlled conditions, oral questioning has high validity.
  • For observations, operational and response definitions should
    accurately describe the behavior of interest.
 • It is highly valid if evidence is properly recorded and interpreted.
     • TRIANGULATION-a technique to validate results through cross verification from two
       or more sources.
 • Validity in self-assessment is described as the agreement
   between self-assessment ratings with teacher judgments or
   peer rankings.
“No  single type of instrument or method of
data collection can assess the vast array of
learning and development outcomes in a
school program“
                            -McMillan, Linn and Gronlund, 2009
Threats to Validity
Threats to Validity
McMillan, Linn and Gronlund (2009) identified ten factors that affect the
validity of assessment results.
1. Unclear test directions
2. Complicated vocabulary and sentence structure
3. Ambiguous statements
4. Inadequate time limits
5. Inappropriate level of difficulty of test items
6. Poorly constructed test items
7. Test items inappropriate for the outcomes being measured
8. Short test
9. Improper arrangement of items
10. Identifiable pattern of answers
Threats to Validity
McMillan (2007) laid down suggestions for enhancing validity:
    • Ask others to judge the clarity of what you are assessing.
    • Check to see if different ways of assessing the same thing give the same result.
    • Sample a sufficient number of examples of what is being assessed.
    • Prepare a detailed table of specifications.
    • Ask others to judge the match between the assessment items and the objectives of
      the assessment.
    • Compare groups known to differ on what is being assessed.
    • Compare scores taken before instruction to those taken after instruction.
    • Compare predicted consequences to actual consequences.
    • Compare scores on similar, but different, traits.
    • Provide adequate time to complete the assessment.
    • Ensure appropriate vocabulary, sentence structure and item difficulty.
    • Ask easy questions first.
    • Use different methods to assess the same thing.
    • Use assessment results only for their intended purposes.
RELIABILITY
 RELIABILITY
• It talks about reproducibility and consistency in methods and
  criteria.
• Reliable assessment produces the same results if given to an
  examinee on two occasions.
• It pertains to the obtained assessment results and not to the
  test or any other instrument.
• It is unlikely to turn out 100% because no two tests will
  consistently produce identical results.
• Environmental factors like lighting and noise may affect
  reliability.
• Student error and physical well-being of examinees also affect
  consistency of assessment results.
RELIABILITY
•For a test to be valid, it has to be reliable.
•It is expressed as a correlation coefficient.
•Two types of reliability:
 •Internal reliability
   •Assesses the consistency of results across items
    within a test.
 •External reliability
   •Gauges the extent to which a measure varies
    from one use to another.
SOURCES OF RELIABILITY
EVIDENCE
SOURCES OF RELIABILITY EVIDENCE
Stability, equivalence, internal consistency, scorer or rater consistency, and decision
consistency.
   STABILITY
•The test-retest reliability correlates scores
obtained from two administrations of the
same test over a period of time.
    EQUIVALENCE
•Parallel-forms reliability ascertains the equivalency of
 forms. In this method, two different versions of an
 assessment tool are administered to the same group
 of individuals. However, the items are parallel, i.e. they
 probe the same construct, base knowledge or skill.
 The two sets of scores are then correlated in order to
 evaluate the consistency of results across alternative
 versions.
    INTERNAL CONSISTENCY
•It implies that a student who has mastery learning
 will get all or most of the items correctly while a
 student who knows little or nothing about the
 subject matter will get all or most of the items
 wrongly.
•To check the internal consistency, the split-half
 method can be used.
INTERNAL CONSISTENCY
❑SPEARMAN-BROWN FORMULA
   Whole-test reliability = (2 × reliability of ½ test) / (1 + reliability of ½ test)
   *To improve the reliability of the test employing this method, items with low
   correlations are either removed or modified.
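
Below is a minimal Python sketch of the split-half method with the Spearman-Brown
correction described above; the item scores are invented for illustration only.

    # Hypothetical sketch: split-half reliability with the Spearman-Brown correction.
    from scipy.stats import pearsonr

    # rows = students, columns = items scored 1 (correct) or 0 (incorrect)
    scores = [
        [1, 1, 0, 1, 1, 0, 1, 1],
        [0, 1, 0, 0, 1, 0, 1, 0],
        [1, 1, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 1, 0, 0, 0, 1],
        [1, 0, 1, 1, 1, 1, 0, 1],
    ]

    # split the test into odd- and even-numbered items and total each half per student
    odd_totals  = [sum(row[0::2]) for row in scores]
    even_totals = [sum(row[1::2]) for row in scores]

    half_reliability, _ = pearsonr(odd_totals, even_totals)  # correlation of the two halves
    whole_reliability = (2 * half_reliability) / (1 + half_reliability)  # Spearman-Brown
    print(round(half_reliability, 2), round(whole_reliability, 2))

The corrected whole-test value exceeds the half-test correlation whenever the halves are
positively but imperfectly correlated, which is why lengthening a test tends to raise
reliability.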
   INTERNAL CONSISTENCY
❑For internal consistency, the range of
reliability measures are rated as
follows:
 ❑0.00-0.49 *low reliability
 ❑0.50-0.80 *moderate reliability
 ❑0.81 above *high reliability
      SCORER or RATER CONSISTENCY
•People do not necessarily rate in a similar way.
•Certain characteristics of the raters contribute to errors like
 bias, halo effect, mood, fatigue, among others.
•Inter-rater reliability
    • it is the degree to which different raters, observers or judges agree
      in their assessment decision.
    • It is useful when grading essays, writing samples, performance
      assessment and portfolios.
    • To estimate inter-rater reliability, the Spearman’s rho (for ordinal
      data) or Cohen’s kappa (for nominal and discrete data) may be
      used.
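
The following Python sketch shows one way these statistics can be computed for two raters
scoring the same set of essays; the ratings are hypothetical and the scipy/scikit-learn
functions are just one possible choice of tools.

    # Hypothetical sketch: inter-rater reliability for two raters grading the same essays.
    from scipy.stats import spearmanr
    from sklearn.metrics import cohen_kappa_score

    rater_a = [4, 3, 5, 2, 4, 3, 5, 1]   # ordinal rubric scores from rater A
    rater_b = [4, 3, 4, 2, 5, 3, 5, 2]   # scores from rater B on the same papers

    rho, _ = spearmanr(rater_a, rater_b)          # Spearman's rho for ordinal data
    kappa  = cohen_kappa_score(rater_a, rater_b)  # Cohen's kappa treats scores as categories
    print(round(rho, 2), round(kappa, 2))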
   DECISION CONSISTENCY
•describes how consistent the classification
 decisions are rather than how consistent the
 scores are.
•seen in situations where teachers decide who
 will receive a passing or failing mark, or who is
 considered to possess mastery or not.
MEASUREMENT ERRORS
    MEASUREMENT ERRORS
•It can be caused by examinee-specific factors like
 fatigue, boredom, lack of motivation,
 momentary lapses of memory and
 carelessness.
•It can also be caused by test-specific factors.
•It can also arise due to scoring factors.
  MEASUREMENT ERRORS
❑CLASSICAL TEST THEORY
 ❑X= T + E
   ❑X is the observation (a measured score)
  ❑T is the true value
  ❑E is some measurement error
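
A minimal Python sketch of this idea, using invented numbers: each observed score X is a
hypothetical true score T plus a random error E, and averaging repeated observations
approximates the true value.

    # Hypothetical sketch of the classical test theory model X = T + E.
    import random

    random.seed(0)
    true_score = 80                                    # T: the (unknown) true value
    errors = [random.gauss(0, 3) for _ in range(50)]   # E: random measurement error
    observed = [true_score + e for e in errors]        # X = T + E for each occasion

    # The mean of many observed scores approximates the true value of the quantity.
    print(round(sum(observed) / len(observed), 1))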
RELIABILITY of ASSESSMENT METHODS
 RELIABILITY of ASSESSMENT METHODS
•Below are the ways to improve reliability of
 assessment results (Nitko & Brookhart, 2011)
 •Lengthen the assessment procedure by
  providing more time, more questions and more
  observation whenever practical.
  •Broaden the scope of the procedure by
   assessing all the significant aspects of the
   targeted learning performance.
❖Improve objectivity by using a systematic and more
 formal procedure for scoring student performance. A
 scoring scheme or rubric would prove useful.
❖Use multiple markers by employing inter-rater
 reliability.
❖Combine results from several assessments especially
 when making crucial educational decisions.
❖Provide sufficient time for students to complete the
 assessment procedure.
❖Teach students how to perform their best by
 providing practice and training to students and
 motivating them.
❖Match the assessment difficulty to the students’
 ability levels by providing tasks that are neither too
 easy nor too difficult and tailoring the assessment to
 each student’s ability level when possible.
❖Differentiate among students by selecting
 assessment tasks that distinguish or discriminate
 the best from the least able students.
  References
• https://irds.stanford.edu/sites/g/files/sbiybj10071/f/msmt.pdf
1. De Guzman, E.S., & Adamos, J.L. (2015). Assessment of
   learning 1. Adriana Publishing Co., Inc.
2. McMillan, J.H. (2018). Classroom assessment: Principles and
   practice that enhance student learning and motivation.
   Pearson Education, Inc.
3. Popham, W.J. (2017). Classroom assessment: What teachers
   need to know. Pearson Education, Inc.
                                                          Insert Running Title   52
UNIVERSITY OF SOUTHERN MINDANAO
   Module 3 Ethics
       Prof Ed 221 ASL 1
    Topic Outline
1. Principles of High-Quality Assessment
2. Validity
3. Reliability
4. Ethics/Fairness
5. Practicality and efficiency
Intended Learning Outcomes
1. Interpret principles of high-quality assessment.
2. Identify factors that make assessment valid, reliable, fair, ethical, practical and efficient.
ETHICS
CHAPTER 6
     Teachers' assessments have important long-term and short-term consequences for students; thus teachers have an ethical responsibility to make decisions using the most valid and reliable information possible.
                           -Russell & Airasian, 2012
  Students' Knowledge of Learning
  Targets and Assessments
• Transparency
  ➢disclosure of information to students about
   assessments.
  ➢This includes
     ➢what learning outcomes are to be assessed
       and evaluated
     ➢ assessment methods and formats
     ➢ weighting of items
     ➢ allocated time in completing the
       assessment
     ➢grading criteria or rubric.
Students' Knowledge of Learning
Targets and Assessments
• For written tests, it is important that students
 know what is included and excluded in the test.
• As for performance assessments, the criteria
 should be divulged prior to assessment so that
 students will know what the teacher is looking
 for in the actual performance or product.
Students' Knowledge of Learning
Targets and Assessments
• What about surprise tests or pop quizzes?
   • Graham's (1999) study revealed that unannounced quizzes raised the test scores of mid-range undergraduate students, and a majority of students in his sample claimed to appreciate the use of quizzes.
Students' Knowledge of Learning
Targets and Assessments
• What about surprise tests or pop quizzes?
   • Kamuche (2007) reported that students given unannounced quizzes showed better academic performance than a control group given announced quizzes.
   • Graham (cited by Kamuche, 2007) stated that unannounced quizzes tend to increase examination tension and stress and do not offer a fair examination.
Students' Knowledge of Learning
Targets and Assessments
• Test-taking skills are another concern.
• Teachers should not create unusual
 hybrids of assessment formats.
Opportunity to Learn
     • McMillan (2007) asserted that fair
       assessments are aligned with instruction that
       provides adequate time and opportunities for
       all students to learn.
     • Discussing an extensive unit in an hour is
       obviously insufficient.
Opportunity to Learn
     • Inadequate instructional approaches are unjust to learners because they are not given enough experiences to process information and develop their skills.
Prerequisite Knowledge and Skills
   • Students may perform poorly in an assessment if
     they do not possess background knowledge and
     skills.
   • It would be improper if students are tested on the
     topic without any attempt or effort to address the
     gap in knowledge or skills.
   • The problem is compounded if there are
     misconceptions. The need for action and correction
     is more critical.
Prerequisite Knowledge and Skills
    • The teacher can analyze the assessment items
      and procedures and determine the pieces of
      knowledge and skills required to answer
      them.
    • The teacher can administer a prior knowledge assessment, the results of which can lead to additional or supplemental teacher- or student-managed activities like peer-assisted study sessions, compensatory groups, note swapping and active review.
Prerequisite Knowledge and Skills
     • Another problem emerges if the assessment
      focuses heavily on prior knowledge and prerequisite
      skills.
     • So as not to be unfair, the teacher must identify
      early on the prerequisite skills necessary for
      completing an assessment.
  Prerequisite Knowledge and Skills
• The teacher may also provide clinics or reinforced tutorials to
 address gaps in students' knowledge and skills.
• He/she may also recommend reading materials or advise students to
 attend supplemental instruction sessions when possible.
Prerequisite Knowledge and Skills
     • At the undergraduate level, prerequisites
      are imposed to ensure that students
      possess background knowledge and skills
      necessary to advance and become
      successful in subsequent courses.
Avoiding Stereotyping
     • A stereotype is a generalization of a group
      of people based on inconclusive
      observations of a small sample of this
      group.
     • Common stereotypes are racial, sexual and
      gender remarks.
Avoiding Stereotyping
     • Stereotyping is caused by preconceived judgments about the people one comes in contact with, and it is sometimes unintended.
     • It is different from discrimination which
      involves acting out one's prejudicial
      opinions.
Avoiding Stereotyping
     • A professional education teacher may believe that, since the education program is dominated by females, females make better teachers than males.
     • Stereotypes may either be positive or
       negative.
Avoiding Stereotyping
     • Teachers should avoid terms and
      examples that may be offensive to
      students of different gender, race,
      religion, culture or nationality.
     • Stereotypes can affect students'
      performance in examinations.
Avoiding Stereotyping
     • In 1995, Steele & Aronson developed the theory
      of stereotype threat claiming that for people
      who are challenged in areas they deem
      important like intellectual ability, their fear of
      confirming negative stereotypes can cause them
      to falter in their actual test performance.
Avoiding Stereotyping
    • To reduce the negative effects of stereotype threat,
     simple changes in classroom instruction and assessment
     can be implemented.
    • A school environment that fosters positive practices and
     supports collaboration instead of competition can be
     beneficial especially for students in diverse classrooms
     where ethnic, gender and cultural diversity thrive.
Avoiding Stereotyping
    • Jordan & Lovett (2006) recommended five
      concrete changes to psycho-educational
      assessment to alleviate stereotype threats:
      ❖Be careful in asking questions about topics related to
        a student's demographic group. This may
        inadvertently induce stereotype threats even if the
        information presented in the test is accurate.
Avoiding Stereotyping
    ❖Place measures of maximal performance like ability
     and achievement tests at the beginning of
     assessments before giving less formal self-report
     activities that contain topics or information about
     family background, current home environment,
     preferred extracurricular activities and self-
     perceptions of academic functioning.
  Avoiding Stereotyping
❖Do not describe tests as diagnostic of intellectual capacity.
❖Consider the possibility of stereotype threat when interpreting the test scores of individuals susceptible to being typecast.
Avoiding Stereotyping
    ❖Determine if there are mediators of
     stereotype threat that affect test
     performance. This can be done using
     informal interviews or through
     standardized measures of cognitive
     interference and test anxiety.
Avoiding Bias in Assessment Tasks and
procedures
      • Assessment must be free from bias.
      • Fairness demands that all learners are given
       equal chances to do well (from the task) and
       get a good assessment (from the rater).
      • Teachers should not be affected by factors that
       are not part of the assessment criteria.
Avoiding Bias in Assessment Tasks and
procedures
      • This aspect of fairness also includes removal
       of bias towards students with limited
       English or with different cultural
       experiences when providing instruction and
       constructing assessments (Russell & Airasian,
       2012).
Avoiding Bias in Assessment Tasks and
procedures
      • There are two forms of assessment bias:
       offensiveness and unfair penalization (Popham,
       2011). These forms distort test performance of
       individuals in a group.
Avoiding Bias in Assessment Tasks and
procedures
      • Offensiveness happens if test-takers get
       distressed, upset or distracted about how an
       individual or a particular group is portrayed in
       the test.
      • They tend to focus on the offensive items and
       their concentration in answering subsequent
       items suffers.
  Avoiding Bias in Assessment Tasks and
  procedures
• Ultimately, they end up not performing as well as they could have,
 reducing the validity of inferences.
Avoiding Bias in Assessment Tasks and
procedures
      • Unfair penalization harms student performance due to test content, not because the items are offensive but because the content caters to particular groups (e.g., of the same economic class, race, or gender), leaving other groups at a loss or a disadvantage.
Avoiding Bias in Assessment Tasks and
procedures
      • Unfair penalization causes distortion and
       greater variation in scores which is not due to
       differences in ability.
      • Substantial variation or disparity in assessment
       scores between student groups is called
       disparate impact.
Avoiding Bias in Assessment Tasks and
procedures
      • Popham (2011) pointed out that disparate
       impact is not tantamount to assessment bias.
      • A difference in scores may still exist, but it may be due to inadequate prior instructional experience.
Avoiding Bias in Assessment Tasks and
procedures
      • If the test shows no signs of bias, then the disparate impact is likely due to prior instructional inadequacies or lack of preparation.
Avoiding Bias in Assessment Tasks and
procedures
      • To avoid bias during the instruction phase,
       teachers should heighten their sensitivity
       towards bias and generate multiple examples,
       analogies, metaphors and problems that cut
       across boundaries.
Avoiding Bias in Assessment Tasks and
procedures
      • Teachers can have their tests reviewed by
       colleagues to remove offensive words or
       items.
      • Content-knowledgeable reviewers can scrutinize the assessment procedure or each item of the test.
• In developing high-stakes tests, a review panel is usually formed - a mix of male and female members from various subgroups who might be adversely impacted by the test.
• On each item, the panelists are asked to determine if
  it might offend or unfairly penalize any group of
  students on the basis of personal characteristics.
• Each panel member responds and gives their
  comments.
• The mean per-item absence-of-bias index is calculated by getting the average of the "no" responses (illustrated in the sketch after this list).
• If an item is found biased, the item is discarded.
• Qualitative comments are also considered in the
 decision to retain, modify or reject items.
• Afterwards, the entirety of the test is checked
 for any bias.
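A minimal sketch of the per-item absence-of-bias index described above: each panelist answers "yes" or "no" to whether an item might offend or unfairly penalize any group, and the index is the proportion of "no" responses. The panel responses and the decision rule below are hypothetical.

```python
# Hypothetical panel review: one list of yes/no responses per item.
panel_responses = {
    "item_1": ["no", "no", "no", "no", "no"],
    "item_2": ["no", "yes", "no", "yes", "no"],   # two panelists flagged possible bias
}

for item, responses in panel_responses.items():
    absence_of_bias = responses.count("no") / len(responses)   # mean of "no" responses
    decision = "retain" if absence_of_bias == 1.0 else "review or discard"
    print(f"{item}: absence-of-bias index = {absence_of_bias:.2f} -> {decision}")
# Qualitative comments from the panel would still be weighed before a final decision.
```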
• As for the empirical approach, try-out evidence is
  sought.
• The test may be pilot-tested to different groups
  after which differential item functioning (DIF)
  procedures may be employed.
• A test item is labeled with DIF when people with
  comparable abilities but from different groups have
  unequal chances of item success.
• Item response theory (IRT), Mantel-Haenszel and logistic regression are statistical procedures commonly used to detect DIF.
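As a rough sketch of the Mantel-Haenszel idea behind DIF detection (the counts below are invented, and operational DIF analyses use specialized software): examinees are first matched on total score, and at each score level the odds of answering the item correctly are compared between a reference group and a focal group.

```python
# Hypothetical 2x2 tables per matched total-score level:
# (reference correct, reference wrong, focal correct, focal wrong)
score_levels = {
    "low":    (12, 8, 10, 10),
    "middle": (25, 5, 20, 10),
    "high":   (18, 2, 15,  5),
}

numerator = 0.0
denominator = 0.0
for ref_right, ref_wrong, foc_right, foc_wrong in score_levels.values():
    n = ref_right + ref_wrong + foc_right + foc_wrong
    numerator += ref_right * foc_wrong / n
    denominator += ref_wrong * foc_right / n

alpha_mh = numerator / denominator   # Mantel-Haenszel common odds ratio
print(f"MH common odds ratio: {alpha_mh:.2f}")   # values far from 1 suggest DIF
```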
Accommodating Special Needs
    • The legal basis for accommodation is contained
      in Sec. 12 of Republic Act 7277, entitled "An Act Providing for the Rehabilitation, Self-Development and Self-Reliance of Disabled Persons and Their Integration into the Mainstream of Society and for Other Purposes."
Accommodating Special Needs
    • Another is Sec. 32 of CHED Memorandum 09, s.
     2013 on "Enhanced Policies and Guidelines on
     Student Affairs and Services" which states that
     higher education institutions should ensure that
     academic accommodation is made available to
     persons with disabilities and learners with special
     needs.
Accommodating Special Needs
    • Accommodation does not mean giving
     advantage to students with learning disabilities
     but rather allowing them to demonstrate their
     knowledge on assessments without
     hindrances from the disabilities.
Accommodating Special Needs
   • Accommodations can be placed in one of six
     categories (Thurlow, McGrew, Tindal, Thompson
     & Ysseldyke, 2000)
     o Presentation (repeat directions, read aloud, use large
       print, braille)
     o Response (mark answers in test booklet, permit
       responses via digital recorder or computer, use
       reference materials like dictionary)
Accommodating Special Needs
    oSetting (study carrel, separate room, preferential seating, individualized or small group, special lighting)
    oTiming (extended time, frequent breaks, unlimited time)
Accommodating Special Needs
   oScheduling (specific time of day, subtests in
    different order, administer test in several timed
    sessions).
   oOthers (special test preparation techniques and
    out-of-level tests)
Accommodating Special Needs
  • To ensure the appropriateness of the
   accommodation supplied, it should take into
   account three important elements:
     ▪ Nature and extent of the learner's disability
     ▪ Type and format of assessment.
     ▪ Competency and content being assessed
Accommodating Special Needs
   • Nature and extent of the learner's disability
      ▪ Accommodation is dictated by the type and degree of
        disability possessed by the learner. A learner with
        moderate visual impairment would need a larger print
        edition of the assessment or special lighting condition.
        Of course, a different type of accommodation is needed
        if the child has severe visual loss.
Accommodating Special Needs
    • Type and format of assessment
      ▪ Accommodation is matched to the type and format
        of assessment given. Accommodations vary
        depending on the length of the assessment, the
        time allotted, mode of response, etc. A partially
        deaf child would not require assistance in a written
        test.
Accommodating Special Needs
    • Type and format of assessment
      ▪ However, his/her hearing impairment would affect
        his/her performance should the test be dictated.
        He/she would also have difficulty in assessment
        tasks characterized by group discussions like round
        table sessions.
Accommodating Special Needs
  • Competency and content being assessed
    ▪ Accommodation does not alter the level of performance
      or content the assessment measures. In Science,
      permitting students to have a list of scientific formulae
      during a test is acceptable if the teacher is assessing how
      students are able to apply the formulae and not simple
      recall.
Accommodating Special Needs
    • Competency and content being assessed
      ▪ In Mathematics, if the objective is to add and
        subtract counting numbers quickly, extended time
        would not be a reasonable accommodation.
  Relevance
• Relevance can also be thought of as an aspect of
 fairness.
• Irrelevant assessment would mean short-changing
 students of worthwhile assessment experiences.
• Assessment should be set in a context that students will find purposeful. Killen (2000) gave additional criteria for achieving quality in assessment:
Relevance
   • "Assessment should reflect the knowledge and skills that are most important for students to learn."
      • Assessment should not include irrelevant and trivial content. Instead, it should measure learners' higher-order abilities such as critical thinking, problem solving and creativity, which are 21st century skills.
Relevance
   • "Assessment should support every student's
     opportunity to learn things that are important."
      • Assessment must provide genuine opportunities for
       students to show what they have learned and
       encourage reflective thinking. It should prompt them
       to explore what they think is important.
Relevance
   • "Assessment should tell teachers and individual
     students something that they do not already
     know."
      • Assessment should stretch students' ability and
       understanding. Assessment tasks should allow
       them to apply their knowledge in new situations.
Ethical Issues
       • Grades and reports generated by teachers using invalid and unreliable test instruments are unjust. The resulting interpretations are inaccurate and misleading.
Ethical Issues
      • Other ethical issues in testing (and research)
       that may arise include possible harm to the
       participants; confidentiality of results;
        deception regarding the purpose and use of the
       assessment; and temptation to assist students
       in answering tests or responding to surveys.
End
  References
1. De Guzman, E.S., & Adamos, J.L. (2015). Assessment of learning 1. Adriana Publishing Co., Inc.
2. McMillan, J.H. (2018). Classroom assessment: Principles and practice that enhance student learning and motivation. Pearson Education, Inc.
3. Popham, W.J. (2017). Classroom assessment: What teachers need to know. Pearson Education, Inc.
4. https://irds.stanford.edu/sites/g/files/sbiybj10071/f/msmt.pdf
    UNIVERSITY OF SOUTHERN MINDANAO
Development of Traditional Tools for
  Classroom-Based Assessment
           Prof Ed 221-ASL 1
  Topic Outline
• Selected-response type items: Multiple-choice, binary-choice, and matching; Advantages, disadvantages, best practices
• Constructed-response type items: Completion, short-answer, and essay; Scoring criteria; Advantages, disadvantages, best practices
  Intended Learning Outcomes
• 5.1 Recognize the advantages and disadvantages of using different
  selected-response type items, including multiple-choice, binary-choice,
  and matching.
• 5.2 Identify appropriate practices in the construction of selected-
  response items.
• 5.3 Construct sound selected-response items that match the nature of
  the learning target that is assessed.
• 5.4 Recall the advantages and disadvantages of using different types of
  constructed-response items.
• 5.5 Identify appropriate practices for writing and/or selecting effective
  completion, short-answer, and essay type items.
• 5.6 Construct effective completion, short-answer, and essay type items,
  and scoring criteria.
PREPARING A TEST BLUEPRINT
FIGURE 7.1 Test Development Process for Classroom Tests:
1. Identify purpose of the test
2. Specify learning outcomes to be assessed
3. Prepare test specifications
4. Construct pool of items
5. Review and revise items
   What is a Test Blueprint?
• A test blueprint is a list of key components defining your test,
  including:
       1. The purpose of the test: It might be something simple, such as assessing knowledge prior to instruction to get a baseline of what students know before taking a course. Alternatively, the test purpose might be more complex, such as assessing retention of material learned across several professional education courses to determine eligibility for advancement.
  What is a Test Blueprint?
• A test blueprint is a list of key components defining your test,
  including:
      2. The content framework: Start with the topics presented first
during the instruction
  What is a Test Blueprint?
• A test blueprint is a list of key components defining your test,
  including:
       3. The testing time: This includes the amount of testing time available and the need for breaks, as well as other logistical issues related to the test administration.
   What is a Test Blueprint?
• A test blueprint is a list of key components defining your test,
  including:
       4. The content weighting (aka, number of items per content area): The
number of questions per topic category should reflect the importance of the
topic; that is, they should correlate with the amount of time spent on that topic in
the course. For example, if there are 20 one-hour lectures, there may be 10
questions from each hour of lecture or associated with each hour of expected
study. The number of questions per category can be adjusted up or down to
better balance the overall test content and represent the importance of each
lecture, as well as the total lecture time.
    What is a Test Blueprint?
• A test blueprint is a list of key components defining your test,
  including:
•       5. The item formats (e.g., MCQ, essay question): The item
    formats should always be appropriate for the purpose of the
    assessment.
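To illustrate the content weighting described in point 4 above, here is a small sketch that spreads a fixed number of items across topics in proportion to instructional time; the topics, hours, and test length are hypothetical.

```python
# Hypothetical lecture hours per topic and a planned test length of 50 items.
lecture_hours = {"measurement basics": 4, "validity": 6, "reliability": 6, "ethics": 4}
total_items = 50

total_hours = sum(lecture_hours.values())
for topic, hours in lecture_hours.items():
    items = round(total_items * hours / total_hours)   # items proportional to time spent
    print(f"{topic}: {items} items")
# As noted above, the counts can then be nudged up or down to better
# reflect the relative importance of each topic.
```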
  Benefits of Test Blueprints
Test blueprints will help ensure that your tests:
o Appropriately assess the instructional objectives of the course
o Appropriately reflect key course goals and objectives – the material
to be learned
o Include the appropriate item formats for the skills being assessed
  Benefits of Test Blueprints
Test blueprints can be used for additional purposes besides
test construction:
o Demonstrate to students the topics you value, and serve as a study guide
for them
o Facilitate learning by providing a framework or mental schema for
students
o Ensure consistent coverage of exam content from year to year
o Communicate course expectations to stakeholders
  Goals of Using TOS
▪ improving validity of a teacher’s evaluations based on a given
 assessment.
       Validity is the degree to which the evaluations or judgments we make as
teachers about our students can be trusted based on the quality of evidence we
gathered (Wolming & Wilkstrom, 2010)
       -It is important to understand that validity is not a property of the test
constructed, but of the inferences we make based on the information gathered
from a test.
 Sources of Classroom Assessment Validity
1. evidence based on test content -
   underscores the degree to which a
   test (or any assessment task)
   measures what it is designed (or
   supposed) to measure (Wolming &
   Wilkstrom, 2010)
 Sources of Classroom Assessment Validity
1. evidence based on test content -
    - we are interested in knowing if the measured (tested/assessed) objectives reflect what we claim to have measured
 Sources of Classroom Assessment Validity
2. evidence based on response process
    - is concerned with the alignment of
the kinds of thinking required of
students during instruction and during
assessment (testing) activities.
  References
• https://www.nbme.org/sites/default/files/2020-01/Test-Blueprinting-Lesson-2.pdf
• https://www.montclair.edu/profilepages/media/6109/user/v18n4a_AuthorsProof_final.docx
UNIVERSITY OF SOUTHERN MINDANAO
Module 5 – Measuring
 Learning Outcomes
       Prof Ed 221 ASL 1
  Topic Outline
• Goals, standards, learning competencies, and instructional objectives
• Taxonomy of Educational Objectives: Cognitive (revised Bloom's taxonomy); Affective; Psychomotor
• Learning outcomes and assessment methods
  Intended Learning Outcomes
• 4.1 Identify differences among goals, standards, learning
  competencies, and instructional objectives.
• 4.2 Distinguish learning outcomes in the three domains of
  learning.
• 4.3 Classify learning targets.
• 4.4 Match learning outcomes with appropriate assessment
  methods.
Goals, standards, learning
competencies, and instructional
objectives
      Educational Goals
• statements that describe the skills, competencies and qualities that you should possess upon completion of a course or program. It usually involves identifying objectives, choosing attainable short-term goals and then creating a plan for achieving those goals.
• Examples: Think positive to stay focused. Stay resilient. Make time to read. Manage your time. Find time to relax. Strive for excellence. Build a strong network. Build good study habits.
   Educational Standards
• learning goals for what students should know and be
  able to do at each grade level
       Educational Standards
1. Content Standards
     - define what students should know and be able to do,
specifying skills or knowledge at various grade levels (Marzano,
1996, 1997). In the past, schools often used whatever content
was found in their textbooks. With this reform, content
standards are defined by national subject areas associations,
local districts, or states. Schools are then expected to develop
curriculum standards within and across subjects.
       Educational Standards
2. Curriculum Standards
     - usually describe instructional techniques or classroom
activities that help students achieve the content standard
(Marzano, 1996, 1997). Curriculum standards are often
developed at each grade level in all the core subjects as well as
others as defined by the school or district. These curriculum
standards are aligned with content standards and identify what
goes on in classrooms to help students achieve the standard.
          Educational Standards
3. Performance Standards
       - specify the level of performance in a skill or area of knowledge that is considered
acceptable (Burger, 1996, 1997). These measurable expectations for performance, sometimes
termed "benchmarks," are aligned with both curriculum and content standards in each subject
area. In many schools the acceptable level of performance has been defined by teachers
focusing on their own classrooms. In standards-based reform, educators and other
stakeholders define acceptable levels of performance for all students. The issue of what to do
when students do not achieve a particular performance level remains one of the great
challenges of this reform. Should the student go through remediation, get held back, be
required to take summer school, be excluded from graduation, or receive some other sanction?
  Learning Competencies
• A general statement that describes the use of desired knowledge, skills, behaviors, and abilities. Competencies often define specific applied skills and knowledge that enable people to successfully perform specific functions in a work or educational setting.
  Learning Competencies
• Functional competencies: Skills that are used on a daily or regular basis, such as cognitive, methodological, technological, and linguistic abilities
• Interpersonal competencies: Oral, written, and visual
  communication skills, as well as the ability to work effectively
  with diverse teams
• Critical thinking competencies: The ability to reason
  effectively, use systems thinking, and make judgments and
  decisions toward solving complex problems
   Instructional Objectives
• An instructional objective is a statement that describes what the learner will be able to do after completing the instruction (Kibler, Cegala, Barker, & Miles, 1974).
• According to Dick and Carey (1990), a performance objective is a detailed description of
  what students will be able to do when they complete a unit of instruction. It is also
  referred to as a behavioral objective or an instructional objective.
• Robert Mager (1984), in his book Preparing Instructional Objectives, describes an
  objective as "a collection of words and/or pictures and diagrams intended to let others
  know what you intend for your students to achieve" (pg. 3). An objective does not
  describe what the instructor will be doing, but instead the skills, knowledge, and
  attitudes that the instructor will be attempting to produce in learners.
  Instructional Objectives
• Instructional objectives are specific, measurable, short-term, observable student
  behaviors. They indicate the desirable knowledge, skills, or attitudes to be
  gained.
• An instructional objective is the focal point of a lesson plan. Objectives are the
  foundation upon which you can build lessons and assessments and instruction
  that you can prove meet your overall course or lesson goals.
Learning outcomes and
assessment methods
Matching learning targets with assessment
                methods
     CONSTRUCTIVE ALIGNMENT
• provides the "how to" by verifying that the teaching and learning activities (TLAs) and the assessment tasks (ATs) activate the same verbs as in the intended learning outcomes (ILOs)
• Performance verbs in the ILOs are indicators of the methods of assessment suitable to measure and evaluate student learning
Matching learning targets with assessment
                methods
            LEARNING TARGET
                  • a description of performance that
                    includes what learners should know
                    and be able to do
                  • contains the criteria used to judge
Learning Target     student performance
                  • derived from national and local
                    standards
                   • similar to a learning outcome
   LEARNING TARGETS & ASSESSMENT METHODS (McMillan, 2007)

   TARGET                             | SR & BCR | E | PT | OQ | O | SSA
   Knowledge & Simple Understanding   |    5     | 4 | 3  | 4  | 3 |  3
   Deep Understanding & Reasoning     |    2     | 5 | 4  | 4  | 2 |  3
   Skills                             |    1     | 3 | 5  | 2  | 5 |  3
   Products                           |    1     | 1 | 5  | 2  | 4 |  4
   Affect                             |    1     | 2 | 4  | 4  | 4 |  5

   Note: Higher numbers indicate better matches (e.g., 5 = excellent, 1 = poor).
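The matrix above can also be handled programmatically; the sketch below simply restates the ratings as a Python dictionary (keeping the column abbreviations as given in the table) and, for a chosen learning target, returns the assessment methods with the highest match.

```python
# Ratings copied from the McMillan (2007) matrix above (5 = excellent, 1 = poor).
match = {
    "Knowledge & Simple Understanding": {"SR & BCR": 5, "E": 4, "PT": 3, "OQ": 4, "O": 3, "SSA": 3},
    "Deep Understanding & Reasoning":   {"SR & BCR": 2, "E": 5, "PT": 4, "OQ": 4, "O": 2, "SSA": 3},
    "Skills":                           {"SR & BCR": 1, "E": 3, "PT": 5, "OQ": 2, "O": 5, "SSA": 3},
    "Products":                         {"SR & BCR": 1, "E": 1, "PT": 5, "OQ": 2, "O": 4, "SSA": 4},
    "Affect":                           {"SR & BCR": 1, "E": 2, "PT": 4, "OQ": 4, "O": 4, "SSA": 5},
}

def best_methods(target: str) -> list:
    """Return the assessment methods rated highest for a given learning target."""
    ratings = match[target]
    top = max(ratings.values())
    return [method for method, rating in ratings.items() if rating == top]

print(best_methods("Skills"))   # -> ['PT', 'O']
```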
  Knowledge and simple understanding
• pertains to mastery of substantive subject matter and procedures
• covers the lower-order thinking skills of remembering, understanding and applying (Bloom's taxonomy)
Selected-response and brief constructed-response items
 • best in assessing lower-level learning targets in terms of coverage and efficiency
 • facts, concepts, principles and procedures lend themselves to pencil-and-paper tests quite well
   Knowledge and simple understanding
Essays
 • elicit original responses and response patterns
 • effective especially if students are required to organize, connect or
   integrate ideas
 • used to assess writing skills of students
Oral Questioning
 • assess knowledge and simple understanding but not as efficient as
   selected-response items
 • often used during instruction to check for mastery and
   understanding of a limited amount of factual information and
   provide immediate progress feedback.
        DEEP UNDERSTANDING AND
        REASONING
Reasoning
 • mental manipulation and use of knowledge in critical
   and creative ways
Deep Understanding and Reasoning
 • involve higher order thinking skills of analyzing,
   evaluating and synthesizing
  • Essays are best for assessing complex learning outcomes because students are required to demonstrate their reasoning and thinking skills
 • Oral questioning – can also be used but it is less time
   efficient than essays
 • Performance tasks are effective as well
DEEP UNDERSTANDING AND REASONING
E.g. Compare and contrast two topics or ideas; or Explain
the pros and cons of an argument
 • Through essays, teachers can detect errors in factual
   content, writing and reasoning.
Selected-response and brief-constructed response
 • demand more thought and time in crafting in order to
   target understanding rather than simple recall or rote
   memorization.
Interpretive exercise
 • consists of a series of objective items based on a given
   verbal, tabular or graphic information like a passage
   from a story, a statistical table or a pie chart.
       SKILLS
Performance assessment
• the superior assessment method
• authentic assessment - when used in real-life and
  meaningful context
• suited for applications with less-structured problems
  where problem identification; collection, organization,
  integration and evaluation of information; and
  originality are emphasized
• used when students are tasked to conduct oral
  presentation or physical performance or create a
  product
       PRODUCTS
assessed through performance tasks
substantial and tangible output that showcases a student's understanding of concepts and skills and their ability to apply, analyze, evaluate and integrate those concepts and skills.
PRODUCTS
• musical compositions
• stories
• research studies
• poems
• drawings
• model construction
• multimedia materials
  References
• https://www.indeed.com/career-advice/career-development/educational-goals-examples
1. De Guzman, E.S., & Adamos, J.L. (2015). Assessment of learning 1. Adriana Publishing Co., Inc.
2. McMillan, J.H. (2018). Classroom assessment: Principles and practice that enhance student learning and motivation. Pearson Education, Inc.
3. Popham, W.J. (2017). Classroom assessment: What teachers need to know. Pearson Education, Inc.
    UNIVERSITY OF SOUTHERN MINDANAO
Development of Traditional Tools for
  Classroom-Based Assessment
           Prof Ed 221-ASL 1
  Topic Outline
• Selected-response type items: Multiple-choice, binary-choice, and matching; Advantages, disadvantages, best practices
• Constructed-response type items: Completion, short-answer, and essay; Scoring criteria; Advantages, disadvantages, best practices
  Intended Learning Outcomes
• 5.1 Recognize the advantages and disadvantages of using different
  selected-response type items, including multiple-choice, binary-choice,
  and matching.
• 5.2 Identify appropriate practices in the construction of selected-
  response items.
• 5.3 Construct sound selected-response items that match the nature of
  the learning target that is assessed.
• 5.4 Recall the advantages and disadvantages of using different types of
  constructed-response items.
• 5.5 Identify appropriate practices for writing and/or selecting effective
  completion, short-answer, and essay type items.
• 5.6 Construct effective completion, short-answer, and essay type items,
  and scoring criteria.
PLANNING THE TEST
OVERALL TEST DEVELOPMENT PROCESS
  • The process of test construction for classroom testing
   applies the same initial steps in the construction of any
   instrument designed to measure a psychological
   construct.
FIGURE 7.1 Test Development Process for Classroom Tests:
1. Identify purpose of the test
2. Specify learning outcomes to be assessed
3. Prepare test specifications
4. Construct pool of items
5. Review and revise items
OVERALL TEST DEVELOPMENT PROCESS
   • Planning Phase - where the purpose of the test is identified, learning outcomes to be assessed are clearly specified and, lastly, a table of specifications is prepared to guide the item construction phase.
OVERALL TEST DEVELOPMENT PROCESS
 • Item Construction Phase - where test items are constructed
  following the appropriate item format for the specified
  learning outcomes of instruction.
OVERALL TEST DEVELOPMENT PROCESS
  • Review Phase -where items are examined by the teacher or
   his/her peers, prior to administration based on judgment of
   their alignment to content and behavior components of the
   instructional competencies, and after administration, based
   on an analysis of students' performance in each item.
Identifying Purpose of Test
• Testing as an assessment mechanism aims to gather valid and reliable information useful to both learners and teachers for formative as well as summative purposes.
• Classroom formative assessments seek to uncover what students know and can do, to get feedback on what they need to alter or work on further to improve their learning.
  Identifying Purpose of Test
• Feedback provided is used primarily to address specific student learning problems while instruction is still in progress (Russell & Airasian, 2012).
Multiple-choice items
• work very well in detecting and diagnosing the source of difficulty in terms of misconceptions and areas of confusion.
• Each option or alternative can represent a type of error that students are likely to commit.
Formative use of a classroom test for
diagnosis
• Alternatives or distracters in selected-response items can be in terms of popular falsehoods, misconceptions, misinterpretations, or inadequately stated principles students may likely adopt.
• By obtaining the option plausibility of the distracters, the teacher
 can identify what to reinforce in the lesson follow-up based on
 the most frequently chosen error.
 Summative use of classroom test
• The test considers the planned competencies to be developed in the
 unit of work.
• Consequently, the learners spend enormous time reviewing, recalling
 or re-learning their past lessons prior to testing.
• Their test motivation is contingent on the stake they put on testing,
 to "pass the test", to "pass the course", or to "get high grades".
Specifying the Learning Outcomes
• Defining learning has progressed from being simply an
 accumulation of facts to being able to allow the learner to
 interpret and apply such facts to create new knowledge.
• Developments in the assessment of learning have, of late, focused on multiple measures of student performance reflecting different levels of outcomes of the teaching-learning process.
  Specifying the Learning Outcomes
• The learning outcomes communicate both specific content and
 nature of tasks to be performed.
• Assessment then becomes a quality assurance tool for tracking student progress in attaining the curriculum standards in terms of content and performance (Enc. No. 1, DepEd Order No. 73, s. 2012).
Specifying the Learning Outcomes
• Processes for assessment recognize and address different learning
 targets defined by the intended outcomes from knowledge of facts
 and information covered by the curriculum at every level to various
 facets of showing understanding of them:
    • what operative processes or skills they can demonstrate,
    • what bigger and newer ideas they can form, and
    • what innovative products and processes they can create, including their authentic application in real life.
Specifying the Learning Outcomes
• Summative tests given at the end of an instructional
  process focus on the accomplishment of the learning
  outcomes demarcated in every unit of work designed
  in the curriculum.
• As the focus of assessment varies due to recognized
  levels of learning, so do the methods or techniques
  for assessment.
• Each learning outcome when properly stated,
  defines the behavior or task to be performed within
  a given content area.
Specifying the Learning Outcomes
• Classroom tests need to be carefully planned to ensure that they validly and reliably quantify what they are intended to measure.
  • Post instructional assessment tool expected to cover
    the curriculum standards of a subject or course, grade
    or year level in terms of measurable and demonstrable
    student outcomes.
  • Pre-instructional assessment tool which can diagnose
    what the learners know of the new lesson for
    instructional adjustment on the part of the teacher.
  Preparing a Test Blueprint
• Whatever the purpose of the test may be, a teacher must appropriately determine the learning outcomes to be assessed and how they will be assessed.
Preparing a Test Blueprint
• Carefully carrying out this planning phase helps teachers make genuine connections in the trilogy of curriculum, instruction and assessment.
• The curriculum dictates the instructional as well as assessment
 strategies to be applied while assessment informs both the
 curriculum and instruction what decisions to make to improve
 learning.
Preparing a Test Blueprint
• To assure the preparation of a good test, a test
 blueprint is commonly set up in a two-way Table of
 Specifications (TOS) that basically spells out WHAT
 will be tested and HOW it will be tested to obtain the
 information needed.
Preparing a Test Blueprint
• WHAT covers two aspects:
  • content area (i.e. subject matter) being covered
  • target learning outcomes (i.e. competencies)
• HOW specifies the test format:
  • the type of assessment question or task to be used
  • the item distribution to attain an effective and balanced sampling of the skills to be tested (a minimal sketch of such a table follows below)
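Here is a minimal sketch of the WHAT and HOW of a two-way table of specifications; the content areas, cognitive levels, and item counts are all hypothetical.

```python
# Hypothetical two-way TOS: content areas (rows) by cognitive levels (columns).
tos = {
    "Fractions":       {"Remembering": 3, "Applying": 5, "Analyzing": 2},
    "Decimals":        {"Remembering": 3, "Applying": 4, "Analyzing": 3},
    "Ratio & Percent": {"Remembering": 2, "Applying": 4, "Analyzing": 4},
}

planned_length = 30
actual_length = sum(sum(row.values()) for row in tos.values())
assert actual_length == planned_length, "item counts must add up to the planned test length"

for content_area, levels in tos.items():
    print(content_area, levels, "subtotal =", sum(levels.values()))
```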
•The length of test should be able to sample what
 students should know based on an outline of
 work and not on ease of constructing questions
 particularly for low level outcomes.
•The more important a learning outcome is, the more points should be allotted to it. McMillan (2007) suggests some rules of thumb in determining how many items are sufficient for good sampling.
• A minimum of ten items is needed to assess each
  knowledge learning target in a unit but which should
  represent a good cross-section of difficulty of items.
• However, if there are more specific learning targets to be tested, at least five items would be enough for each one to allow for criterion-referenced interpretation for mastery.
• Eighty percent (80%) of the items for a competency answered correctly is an acceptable mastery criterion.
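A minimal sketch of the criterion-referenced mastery rule above: with at least five items per specific learning target, a student is considered to have mastered a target when 80% or more of its items are answered correctly. The targets and responses below are invented.

```python
# Hypothetical per-target results for one student (True = item answered correctly).
results = {
    "adds similar fractions":  [True, True, True, True, False],          # 5 items
    "solves percent problems": [True, False, True, False, True, False],  # 6 items
}

MASTERY_CUTOFF = 0.80   # 80% of items correct, as suggested above

for target, answers in results.items():
    proportion = sum(answers) / len(answers)
    status = "mastered" if proportion >= MASTERY_CUTOFF else "not yet mastered"
    print(f"{target}: {proportion:.0%} correct -> {status}")
```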
  References
• https://www.ulethbridge.ca/teachingcentre/creating-
  selected-response-questions
UNIVERSITY OF SOUTHERN MINDANAO
      TOS Making
   What is a Table of Specifications?
   A Table of Specifications is a two-way
chart which describes the topics to be
covered by a test and the number of
items or points which will be associated
with each topic.
      Significance and components of a Table of
      Specification
Kubiszyn & Borich, (2003) emphasized the following
  significance and components of TOS:
1. A Table of Specifications consists of a two-way chart
    or grid relating instructional objectives to the
    instructional content.
    The columns of the chart list the objectives or
    "levels of skills" (Gredler, 1999) to be addressed;
    the rows list the key concepts or content the test is
    to measure. (A minimal sketch of such a grid appears after this list.)
According to Bloom, et al. (1971),
 "We have found it useful to represent the
relation of content and behaviors in the form
of a two dimensional table with the
objectives on one axis, the content on the
other”.
2. A Table of Specifications identifies not only the
 content areas covered in class, it identifies the
 performance objectives at each level of the
 cognitive domain of Bloom's Taxonomy.
Teachers can be assured that they are measuring
 students' learning across a wide range of content
 and readings as well as cognitive processes
 requiring higher order thinking.
3. A Table of Specifications is developed
 before the test is written. In fact it
 should be constructed before the actual
 teaching begins.
4. The purpose of a Table of
 Specifications is to identify the
 achievement domains being measured
 and to ensure that a fair and
 representative sample of questions
 appear on the test.
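
To make the row-and-column idea concrete, here is a minimal Python sketch of a two-way TOS grid; the content areas, cognitive levels, and item counts are purely hypothetical and do not come from any table in this module.

    # Minimal sketch of a two-way Table of Specifications: rows are content
    # areas, columns are cognitive levels, and each cell holds the number of
    # items planned. All topics, levels, and counts are hypothetical.
    levels = ["Remembering", "Understanding", "Applying"]
    tos = {
        "Parts of a plant":   {"Remembering": 4, "Understanding": 3, "Applying": 1},
        "Photosynthesis":     {"Remembering": 2, "Understanding": 4, "Applying": 2},
        "Plant reproduction": {"Remembering": 2, "Understanding": 2, "Applying": 2},
    }

    # Row totals show how many items each content area gets; column totals
    # show the emphasis given to each cognitive level across the whole test.
    for topic, cells in tos.items():
        print(f"{topic:20s} items: {sum(cells.values())}")
    for level in levels:
        print(f"{level:15s} items: {sum(row[level] for row in tos.values())}")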
Carey (1988) pointed out that the time
available for testing depended not only
 on the length of the class period but
  also on students' attention spans.
     Some rules of thumb exist for how long it takes most students
     to answer various types of questions according to Linn &
     Gronlund (2000):
1. A true-false test item takes 15 seconds to
   answer unless the student is asked to
   provide the correct answer for false
   questions. Then the time increases to 30-
   45 seconds.
2. A seven item matching exercise takes 60-
   90 seconds.
3. A four response multiple choice test
 item that asks for an answer regarding a
 term, fact, definition, rule or principle
 (knowledge level item) takes 30
 seconds. The same type of test item
 that is at the application level may take
 60 seconds.
4. Any test item format that requires
 solving a problem, analyzing,
 synthesizing information or
 evaluating examples adds 30-60
 seconds to a question.
5. Short-answer test items take 30-45
 seconds.
6. An essay test takes 60 seconds for
 each point to be compared and
 contrasted.
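
Putting these rules of thumb together, the sketch below estimates the total testing time for a draft test; the item counts are hypothetical, and the per-item times use the lower end of each range quoted above.

    # Minimal sketch: estimating total testing time from the Linn & Gronlund
    # (2000) rules of thumb above. Times are in seconds (lower end of each
    # quoted range); the draft test's item counts are hypothetical.
    SECONDS_PER_ITEM = {
        "true_false": 15,
        "matching_7_items": 60,
        "mc_knowledge": 30,
        "mc_application": 60,
        "short_answer": 30,
    }
    draft_test = {
        "true_false": 10,
        "matching_7_items": 1,
        "mc_knowledge": 20,
        "mc_application": 5,
        "short_answer": 5,
    }

    total_seconds = sum(SECONDS_PER_ITEM[kind] * count for kind, count in draft_test.items())
    print(f"Estimated testing time: {total_seconds / 60:.1f} minutes")  # 21.0 minutes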
   If an individual can perform the
most difficult aspects of the objective,
the instructor can "assume" the lower
levels can be done.
   However, if testing the lower levels, the
instructor cannot "assume" the individual
can perform the higher levels.
   The cornerstone of classroom assessment
practices is the validity of the judgments about
students’ learning and knowledge.
   A TOS is one tool that teachers can use to
support their professional judgment when
creating or selecting tests for use with their
students.
    In order to understand how to best modify a
TOS to meet your needs, it is important to
understand the goal of this strategy: improving
validity of a teacher’s evaluations based on a given
assessment. Validity is the degree to which the
evaluations or judgments we make as teachers
about our students can be trusted based on the
quality of evidence we gathered (Wolming & Wikström, 2010).
    From the literature we know that standardized
tests are valid.
    The question needs to be asked: are GPAs a valid
measure of student achievement?
    GPAs are based in large measure on teacher-made
tests. If teacher-made tests are not valid, how can a
student's GPA be valid?
    The use of a Table of Specifications can provide
teacher-made tests with validity (Notar, 2004).
    Lei, Bassiri and Schultz (2001) found that
college GPA was an unreliable predictor of
student achievement. Since we assume that
norm-referenced tests are valid measures,
the tendency is to put more weight on
those results concerning student
achievement.
   According to Ooster (2003), faculty-made
tests will likely have poor content validity, a
"cause for concern because each assessment
instrument depends on its validity more than
on any other factor."
      How can the use of a Table of Specifications
      benefit your students, including those with
      special needs?
   A Table of Specifications benefits students in two ways.
   First, it improves the validity of teacher-made tests.
   Second, it can improve student learning as well.
   A Table of Specifications helps to ensure that
there is a match between what is taught and
what is tested. Classroom assessment should be
driven by classroom teaching which itself is
driven by course goals and objectives.
   Tables of Specifications provide the link
between teaching and testing. (University of
Kansas, 2013)
     Teachers can collaborate with students, other
  teachers, or colleagues on the construction of the
  Table of Specifications:
❑ what are the main ideas and topics,
❑ what emphasis should be placed on each topic,
❑ what should be on the test?
     Open discussion and negotiation of these issues
 can encourage higher levels of understanding while
 also modeling good learning and study skills.
   Selecting and Constructing Test Items and
   Tasks
CATEGORIZING TEST TYPES
                                  "Selection     of    item
                             format is dictated by the
                             instructional        outcomes
                             intended to be assessed. There
                             are formats appropriate to
                             measuring knowledge and
                             simple understanding while
                             there are those fit to
                             measuring complex or deep
                             understanding."
Tree Chart of Test Types
        "Selection of item format is dictated by the instructional
        outcomes intended to be assessed. There are formats
        appropriate to measuring knowledge and simple
        understanding while there are those fit to measuring
        complex or deep understanding."
                                                                     Insert Running Title   24
      Relating Test Types with Levels of Learning Outcomes
➢   A review of curricular frameworks of educational systems across
    various countries shows common integral domains that govern
    their content and performance standards in different subject
    areas.
    A. Measuring Knowledge and Simple Understanding
➢ Knowledge, as it appears in cognitive taxonomies (Bloom, 1956;
  Anderson & Krathwohl, 2004) as the simplest and lowest level, is
  categorized further according to the thinking process involved in
  learning. Knowledge involves remembering or recalling specific
  facts, symbols, details, elements of events and principles in order
  to acquire new knowledge.
     The examples below will differentiate declarative and
procedural knowledge as simple understanding involving
comprehension and application.
        Nitko (2001) gives categories of these lower-order thinking
skills and some examples of generic questions for assessing them
(see Table 8.3). The generic questions can be useful in formatting
completion or short-answer items to assess simple understanding.
    B. Measuring Deep Understanding
Deep Understanding
   ➢ requires more complex thinking processes
   ➢ requires the three (3) higher cognitive levels, for example
     analyzing, evaluating and creating
   ➢ higher-order thinking skills
Simple Understanding
   ➢ involves the first three (3) cognitive levels, for example
     remembering, comprehending and applying
   ➢ lower-order thinking skills
      Table 8.5 illustrates the relationship between
learning outcomes and test types. It can be observed that
test types can be made flexible and versatile to test
different levels of outcomes and need not be limited or
exclusive to only one cognitive level. The arrows suggest
that supply or selection types can be used for both lower-
level as well as higher-level outcomes. Knowledge and
simple understanding can be handled by the objective supply
type - i.e. completion and short-answer items, and the
objective selection type - i.e. alternate choice, multiple
choice and matching.
        Note that deep understanding is assessed by the same
category of item formats but using non-objective types - i.e. essay
questions, both restricted and extended, modified selected-response
types - i.e. multiple interpretive items, and performance tasks.
What indeed matters is the careful construction of the item
elements (i.e. item stimulus and item response) to appropriately
elicit the cognitive processes involved. An elicitation device like a
question or a directive for a supply type can be used to assess both
low-level and high-level outcomes, in the same way that, with the
right construction of the stem and options for selected-response
types, both simple and complex forms of cognition can be activated.
Study the examples in the two boxes.
     Miller, Linn & Gronlund (2009) presents categories of
thought questions for deep understanding and sample item
stems in Table 8.6. These sample stems can be used in
constructing test types, i.e. both supply and selection type that
can elicit complex thinking skills.
       Performance tasks, as in the case of "letter writing,"
"producing a plan," and "story writing," likewise assess
high-level learning outcomes involving complex thought
processes, e.g. analyzing, evaluating, and creating.
Angelo & Cross (1993) have extensively designed
classroom assessment tasks (CATs) for the college level that
are performance-based in nature. Some examples
given in Table 8.7 were taken from their inventory.
Table 8.7 Examples of Performance Assessment Tasks for Advanced Level

Thinking Skill: Analysis
  1. Analytic Memos - writing a one- or two-page analysis of a specific
     problem or issue
  2. Pros and Cons Grid - making a list of pros and cons of a decision made
  3. Content, Form, and Function Outline - analyzing the what, how and why
     of the particular message of an advertisement or commercial

Thinking Skill: Evaluate
  1. Muddiest Point - identifying what students find least clear in a lesson,
     story, or demonstration
  2. Misconception Check - assessing students' prior beliefs that can hinder
     learning
  3. Empty Outline - recalling and organizing the important points of a
     lecture or reading

Thinking Skill: Create
  1. Application Card - designing an application of a learned scientific
     principle or procedure in the real world
  2. Directed Paraphrasing - translating what has been learned into one's own
     words or form for a specific audience
  3. Paper or Project Prospectus - writing a first structured draft of a paper
     or project
      Constructing Objective Supply Type of Items
         The item types falling under Supply type require the learners to
  construct a response to a question or directive. The sub-types however,
  differ in terms of the structure of the response needed to answer the
  item:
1. Completion Type
      Table 8.8 illustrates the usual item structure for Completion Type. An
item structure consists of a stimulus which defines the question or problem,
and a response which defines what is to be provided or constructed by the
learner. For a completion item, an incomplete statement with a blank is often
used as stimulus and the response is a constructed word, symbol, numeral or
phrase to complete the statement.
      Sometimes, instead of a set of independent incomplete
statements as the stimulus, a discourse with gaps is used to
make it more communicative. Gap-filling is another term for
this variant, as the student fills several gaps in a discourse
depending on the target outcome. Language teachers often
utilize this form for integrative testing where more than one
type of skill (e.g. vocabulary and comprehension skills) is
needed to fill in the gaps.
 ILO: Provide synonyms for target words in a paragraph.
Directions: Give a word that has the same meaning as the
word inside the parenthesis.
More than a few people may confuse fine dining with _______
(costly) dining in restaurants. Well-trained _______ (cooks) at the top of
their profession can make their good _______ (name) in these
places. It is the cooks who bring _______ (honor) to these
restaurants.
   Experts in test development agree on some helpful guidelines in the
  construction of Completion Items (Kubiszyn and Borich, 2010; McMillan,
  2007; Nitko, 2001; Popham, 2011)
A.There should only be one correct response to complete a statement.
       This contributes to efficiency in scoring since a key to correction
can easily be prepared in advance when there is only one expected
response. Proper wording of the incomplete statement must be
carefully done to avoid having more than one correct answer. Exception
to this rule is if you are testing for verbal creativity where giving diverse
but acceptable responses is desirable. This however, should be explicitly
mentioned in the test instructions. For instance, the more synonyms
students can give to the word costly like expensive, exorbitant, and
pricey the more points they can earn. Objective scoring will likely have
to be modified here.
     In Sample A of Table 8.8, if the target concept is
quadrilateral then its wording is all right. However, if the
target concept is square, the way it is worded may be open
to more than one acceptable answer. Quadrilateral,
rectangle and parallelogram can also be considered correct.
To improve the stem, it can be worded this way to eliminate
the other terms.
     A quadrilateral with four equal sides is called_______.
B. The blank should be placed at the end or towards the end
of the incomplete statement.
     This will provide the reader appropriate and adequate
context before s/he gets to answer the blank and consequently
avoids being perplexed. In Sample B, if the blank is placed at the
beginning like:
During the______ period, Dr. Jose Rizal wrote the novel, Noli
Me Tangere.
     It can possibly call for diverse and ambiguous answers like
troubled, colonial, or earlier, without reading the rest of the
statement.
C. Avoid providing unintended clues to the correct answer.
      The validity of a student's score is jeopardized when s/he
answers correctly an item without really knowing what the
correct response is. His/her score may represent a different
kind of ability apart from what is intended to be measured. This
happens when a student who doesn't know the answer would
find one by using unintended grammatical clues e.g. presence
of indefinite articles a or an before the blank to suggest a
response that starts with a vowel.
2. Short Answer Items
       Instead of supplying words to complete statements,
relatively short answers are constructed as direct answers to
questions. See Table 8.9 for the item structure. The sample
items are the same statements in Table 8.8 which have been
transformed into interrogative form. Being able to do this
illustrates the fact that both test types can be used to test
the same learning outcomes requiring the same cognitive
processes.
     Similar to completion type, the short answer items can
assess learners' declarative and procedural knowledge that require
such thinking processes as remembering, comprehending, and
applying. Writing short-answer items similarly follows the
guidelines in writing completion items. Here are those given by
McMillan (2007, pp.170-171) and they are quite self-explanatory.
1. State the item so that only one answer is correct.
2. State the item so that the required answer is brief. Requiring a
long response would not be necessary and it can limit the number
of items students can answer within the allotted period of time.
3. Do not use questions verbatim from textbooks and other
instructional materials. This will give undue disadvantage to
students not familiar with the materials since it can become a
memory test instead of comprehension.
4. Designate the units required for the answer. This frequently matters when
the constructed response requires a definite unit to be considered
correct. Without designating the unit, a response may be rendered wrong
because of a differing mind-set.
Example:
Poor: How much does the food caterer charge?
This could be answered in different ways, like cost per head, per dish, per
plate, or as a full package.
Improved: How much does the food caterer charge per head?
5. State the item succinctly with words students understand.
This is true for all types of tests. The validity of a classroom-based test
is at risk when students cannot answer correctly, not because
they do not know the answer, but because of the messy wording of the
question.
Poor: As viewed by creatures from the earth, when does the blood moon
appear in the evening?
Improved: When does a blood moon appear?
    The two supply types, completion and short
    answer items, share common points:
• Appropriate for assessing learning outcomes involving knowledge and simple
 understanding.
• Capable of assessing both declarative and procedural knowledge.
• Both are easy and simple to construct.
• Both are objectively scored since a key to correction can be prepared in
 advance.
• Both need an ample number of items to assess a learning outcome. A single
  completion or short-answer item is not sufficient to test mastery of a
  competency.
 Constructing Non-objective Supply Type
Essay Type
      Essay Type likewise belongs to the Supply category for the
simple reason that the required response is to be fully
constructed by the students. However, unlike the completion and
short-answer items, which are highly structured, essay items allow
the students to organize their responses freely, using their own
writing style to answer the question. This format therefore is
appropriate for testing
deep understanding and reasoning. Some of the thinking
processes to satisfactorily answer essay questions involve
comparison, induction, deduction, abstracting, analyzing
perspectives, decision-making, problem-solving, constructing
support and experimental inquiry (Marzano et al., 1993). They
actually involve higher-order thinking skills.
      There are two variations of essay items: restricted-
response and extended-response. Table 8.10 approximates a
structure for these two types of essay items. The same stimulus
structure can be used for both types as well as the expected
forms of response. Sample items are provided to illustrate the
variations.
  Suggestions for constructing essay questions are
  given by Miller, Linn & Gronlund (2009, p.243):
1. Restrict the use of essay questions to those learning outcomes
that cannot be measured satisfactorily by objective items.
       Objective items cannot measure such important skills as
ability to organize, integrate, and synthesize ideas showing one's
creativity in writing style. Use of essay format encourages and
challenges students to indulge in higher-order thinking skills
instead of simply rote memorization of facts and of remembering
inconsequential details.
    2. Construct questions that will call forth the skills specified in the
    learning standards.
            A review of learning standards in school curricula shows that they
    range from knowledge to deep understanding. The performance standards
    require the learners to demonstrate application of principles, analysis of
    experimental findings, evaluation of results and creation of new knowledge,
    and these are explicitly stated in terms of the expected outcomes at every
    grade level. Teachers are expected to make it part of direct instruction to
    teach students how to develop these competencies. Students cannot learn
    these skills instantly; they should be taught how to make these thinking
    skills visible. The essay questions to be constructed should then make the
    students model how they are to perform the thinking processes.
3. Phrase the question so that the student's task is clearly defined.
        Restricted-response type of essay questions especially states the
specific task to be done in writing. As much as possible, the students
should interpret the question in the same way according to what the
teacher expects through the specifications in the question. For instance,
if the teacher aims at testing students' ability to apply learned properties
of a substance for a specific purpose, the question could be stated
"Explain the property of copper that makes it good for making cooking
pans" instead of simply "Why is copper a good material?"
    4. Indicate an approximate time limit for each question.
       This should especially be considered when the test is a
    combination of objective and non-objective formats, such as the
    inclusion of essay questions. Knowing how much time is
    allotted to each one will make the students budget their time
    so they do not spend all of it on the first question and
    consequently miss out on the others.
  5. Avoid the use of optional questions.
        Some teachers have the practice of allowing the
  students to select one or two essay questions from a set of
  five questions. Some disadvantages of this practice may
  include: not being able to use the same basis for reporting test
  results, or students being able to prepare through
  memorization for those they will likely choose.
        Essay, among the test types, is quite frequently used
  because of the seeming ease of its construction, but it is the
  least preferred when it comes to scoring. Its reliability is
  challenged since its subjective scoring may be affected by
  such irrelevant factors as the corrector's mood and biases, the
  student's penmanship, the length of the response, and even the
  time of day of scoring. The use of a scoring guide, called a
  rubric, can significantly reduce subjectivity and more or less
  help in "objectifying" the scoring of a non-objective type of item.
        Basic in the preparation of rubrics is the selection of relevant
  scoring criteria to be used in evaluating the written output. Very often
  used to evaluate essays are clarity of message, organization, depth of
  understanding, creativity of ideas, grammatical accuracy, etc. If the
  relevant criteria are singled out and scored separately to show the
  learner's profile across these different dimensions or attributes,
  analytic scoring is applied. A 5-point or a 6-point rating scale is
  prepared for each attribute and a student's output is judged using an
  aggregate score. Table 8.11 illustrates this analytic scoring structure.
  This type is useful when giving feedback to the learners as it enables
  them to realize their strong and weak points, especially when they are
  made aware of the scoring criteria.
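
As a minimal sketch of this analytic scoring idea, the code below sums per-criterion ratings on a 5-point scale into one aggregate score; the criteria and the ratings are hypothetical and are not taken from Table 8.11.

    # Minimal sketch of analytic rubric scoring: each criterion is rated on a
    # 5-point scale and the aggregate is the sum of the ratings.
    # Criteria and ratings are hypothetical.
    CRITERIA = ["clarity of message", "organization", "depth of understanding",
                "creativity of ideas", "grammatical accuracy"]
    MAX_PER_CRITERION = 5

    def aggregate_score(ratings):
        # Sum the per-criterion ratings into one analytic score.
        return sum(ratings[c] for c in CRITERIA)

    student_ratings = {"clarity of message": 4, "organization": 3,
                       "depth of understanding": 4, "creativity of ideas": 5,
                       "grammatical accuracy": 3}
    print(aggregate_score(student_ratings), "out of", MAX_PER_CRITERION * len(CRITERIA))  # 19 out of 25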
           For judging a specific writing genre like an argument, the
    rubric shown in Table 8.12 can be adapted for analytical scoring.
     When the attributes are considered together to arrive at an
overall judgment or impression, holistic scoring is in use. For
ease of scoring, teachers often use a set of labels like excellent,
good, adequate, promising, weak, inadequate or the traditional
A, B, C, D, or F marks; however, this practice neither provides the
teachers with guidance for scoring nor the students with an
understanding of their score (Miller, Linn & Gronlund, 2009). What
is suggested is to have descriptions of the labels, as in the
example below:
  There are suggestions also given by Miller, Linn &
  Gronlund (2009, p.254) to improve the reliability of
  scoring    responses     to    essay     questions:
1. Prepare an outline of the expected answer in advance.
       Particularly for restricted-response types which define
specifically the task, having a list of the expected responses, e.g.
three principles or two theories to explain a phenomenon, will be
very useful to the teacher.
2. Use the scoring rubric that is most appropriate.
       The nature of the essay question and what is assessed should
identify the type of scoring rubric that should be used. Quite
significant too, is the kind of information that will be communicated by
the teacher to the students.
3. Decide how to handle factors that are irrelevant to the learning
outcomes being measured.
      As mentioned earlier in this chapter, scoring of essay could be
influenced by irrelevant factors like spelling or handwriting. Teachers
should decide in advance whether these factors are to be ignored or not.
4. Evaluate all responses to one question before going on to the next
one.
       Very likely, scoring of the subsequent question could be influenced
by the student's response to the preceding item. Consistency in scoring is
better attained when inter-comparison of responses of the members of
the class to the same item is done.
5. When possible, evaluate the answers without looking at the
student's name.
      An acceptable practice in testing is making students use a
separate sheet for their response to the essay questions and make them
write their names at the back of the answer sheet. Sometimes teachers
could be influenced by who the student is and some form of bias could
happen in favor of better or popular students.
6. If especially important decisions are to be based on the results,
obtain two or more independent ratings.
       To do away with scorer's bias, scoring could be reliably carried out
by two independent raters and the final score being the average of the
two ratings.
 Constructing Selected-response Types
      While supply formats require learners to construct their
responses to questions or directives, selected-response types
entail choosing the best or most correct option to
answer a problem. The greatest challenge for this item
format is the construction of plausible options or distracters
so that no single option stands out as obviously correct.
  There are three sub-types of the selected-response
 format depending on the number of given options:
1. Alternate form or binary choice provides only two options,
2. Multiple-choice type offers 3 to 5 options or solutions to a
   problem, and
3. Matching type gives a set of problems or premises and a
   set of options which will be appropriately paired.
    1. Binary Choice or Alternate Form
          Table 8.13 shows the variety of structure using the alternate form
    as suggested by Nitko (2001, p.136).
      Except for the Yes-No type, which uses direct questions, all other
varieties of binary-choice or alternate-choice items have
propositions as the item stimulus. According to what the students
have learned, the veracity of such propositional statements is
judged by the students, indicating whether they are true or false,
correct or incorrect, or whether they agree or disagree with the
thought or idea expressed. Requiring the students to modify or
qualify their responses, particularly for statements judged to be false
or incorrect, more or less challenges the reasoning ability of the
learners and raises the level of outcome that can be assessed.
     Ease in the construction of binary-choice items makes this
a popular choice when constructing items, especially for
knowledge-level outcomes. The propositions are mostly
content-based in nature, so teachers can easily referee the
correctness of the items. Sometimes the difficulty lies not only
in writing the propositions but also in preparing the key to
correction! There are suggestions given for constructing good
binary-choice items (McMillan, 2007; Musial, et al., 2009) in
order to avoid guessing:
2. Multiple-Choice Items
      The wide preference for this format in classroom testing is mainly due
to its versatility in assessing various levels of understanding, from
knowledge and simple understanding to deep understanding. McMillan
(2007) asserts that multiple choice can assess whether students can use
reasoning, similar to binary-choice or other reasoning tasks.
Cognitively demanding outcomes involving analysis and evaluation lend
themselves to the use of multiple-choice items.
      Although its construction may not be as easy as binary-
choice, its advantages far exceed what true/false questions can
offer. Aside from being able to assess various outcome levels,
they are easy to score, less susceptible to guessing than
alternate-choice and more familiar to students as they often
encounter them in different testing events (Musial et al., 2009).
       Table 8.14 illustrates the item structure of multiple-
choice. Its item stimulus consists of a stem which contains the
problem in the form of a direct question or an incomplete
statement and the options which offer the alternatives/
distracters from which to select the correct answer. The item
response is selecting the correct answer or best answer from
the options or distracters given. They are listed using letters
(i.e. A, B, C, D or a, b, c, d) or numerals (1, 2, 3, 4).
      Writing good multiple-choice items requires clarity in
stating the problem in the stem and plausibility or
attractiveness of the distracters. Test experts agree on a set of
guidelines to achieve this purpose (McMillan, 2007; Miller, Linn
& Gronlund, 2009; Popham, 2011).
   Stem
1. All the words of the stem should be relevant to the task. This means
stating the problem succinctly and clearly so students understand what is
expected to be answered.
2. Stem should be meaningful by itself and should fully contain the
problem. This should especially be observed when the stem uses an
incomplete statement format. Consider this stem:
The constitution is______________.
      A stem worded this way does not make definite the
conceptual knowledge being assessed. One does not know
what is being tested. Is it after the definition of the term, its
significance, or its history? A test of whether a stem is effectively
worded is whether it can be answered without the distracters. This
stem can be improved by changing its format to a direct
question or by adding more information to the incomplete
statement, like:
• What does the constitution of an organization provide?
   (Direct-question format)
• The constitution of an organization provides
     (Incomplete-statement format)
     This way, the test writer determines what knowledge
competence to focus on and what appropriate distracters to
use.
3. The stem should use a question with only one correct or
clearly best answer. Ambiguity sets in when the stem allows for
more than one best answer. Students will likely base their
answers on personal experience instead of on facts. Consider
this example. There could be more than one best answer here.
Poor:
Which product of Thailand makes it economically stable?
A. Rice
B. Dried fruits
C. Dairy products
D. Ready-to-wear
Improved:
Which agricultural product of Thailand is most productive for export?
A. Rice
B. Fish
C. Fruits
D. Vegetables
Distracters
1. All distracters should appear plausible to uninformed test
takers. This is the key to making the item discriminating and
therefore valid. The validity of the item suffers when there is an
option that is obviously correct, like option D, or obviously
wrong, like option B, in the following item.
Poor:
What is matter?
A. Everything that surround us.
B. All things bright and beautiful.
C. Things we see and hear.
D. Anything that occupies space and has mass.
      Quite interesting are the guidelines by Miller, Linn &
Gronlund (2009, p.212) in making distracters plausible. See Table
8.15.
2. Randomly assign correct answers to alternative positions.
Item writers have a tendency to assign the correct answer to
the third alternative as they run short of incorrect alternatives.
Students who are used to taking multiple-choice tests then
wisely choose option C when guessing, for a greater chance of
being correct. No deliberate order should be followed in
assigning the correct answers (e.g. ABCDABCD or AACCBBDD)
for ease of scoring. As much as possible, have an equal number
of correct answers distributed across the alternative positions.
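
One simple way to avoid a deliberate order is to let a short script place the correct answer randomly among the options, as in this minimal sketch; the item shown reuses the Thailand example above, and the helper is hypothetical rather than a standard tool.

    # Minimal sketch: randomly positioning the correct answer among the
    # options of a multiple-choice item so no deliberate order is followed.
    import random

    item = {"stem": "Which agricultural product of Thailand is most productive for export?",
            "key": "Rice",
            "distracters": ["Fish", "Fruits", "Vegetables"]}

    options = item["distracters"] + [item["key"]]
    random.shuffle(options)                      # correct answer lands in a random position
    letters = "ABCD"
    print(item["stem"])
    for letter, option in zip(letters, options):
        print(f"{letter}. {option}")
    print("Answer:", letters[options.index(item["key"])])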
3. Avoid using "All-of-the-above" or "None-of-the-above" as
distracters. Item writers think that using them adds difficulty to the
item since it is a way to test reasoning ability. However, students
without much thinking, will tend to choose these "of-the-above"
distracters haphazardly when they see at least two distracters as
correct incorrect without considering the remaining ones. When
forced to come up with a fourth plausible option and there seems
to be none available except "All-of-the-above" or "None-of-the-
above," do not make them as the correct answer.
3. Matching Items
      Of the three general selected-response item formats, matching
items appear differently. They consist of two parallel lists of words or
phrases the students are tasked to pair. The first list, which is to be
matched, is referred to as the premises, while the other list, from which to
choose the match based on a kind of association, is the options. Table 8.16
shows the item structure of matching items followed by two illustrative
items.
  The two illustrative items exemplify the guidelines
  in constructing matching items (Kubiszyn and Borich, 2010):
1. Keep the list of premises and the list of options homogenous or
   belonging to a category.
      In Sample 1, the premises are events associated with Philippine
presidents while the options are all names of presidents. In Sample 2,
Column A lists some theories in astronomy about how the universe has
evolved and Column B lists the names of the theories. Homogeneity is a
basic principle in matching items.
2. Keep the premises always in the first column and the options in the
second column.
       Since the premises are oftentimes descriptions of events, illustrations of
principles, functions or characteristics, they appear longer than the options,
which most of the time are names, categories, objects, and parts. Ordering
the two columns this way saves reading time for the students since they will
usually read one long premise once and select the appropriate match from a list
of short words. If ordered the opposite way, the students will read a short word
as the premise and then read through long descriptions to look for the correct
answer. Especially for Sample 2, the students will normally read a theoretical
postulate first and then logically go through the names of the theories given in
Column B. Imagine the time spent if the opposite process is done.
 3. Keep the lists in the two columns unequal in number. The basic reason
for this is to avoid guessing.
       The options in Column B are usually more than the premises in
Column A. If the two lists are equal in number, students can
strategically resort to wise elimination in finding the rest of the pairs.
There are matching items, however, where the options are much fewer
than the premises. This is recommended when testing the ability to classify.
For instance, Column A may be a list of 10 animals which are to be
classified and Column B could just be 4 categories of mammals. With
this format, it is important to mention in the test directions that an
option can be used more than once.
4. Test directions always describe the basis for matching.
       "Match Column A with Column B" is a no-no in matching type.
Describe clearly what is to be found in the two columns, how they are
associated and how matching will be done. Invalid scores of students
could be due to extraneous factors like misinterpretation of how
matching is to be done, misunderstanding in using given options (e.g.
using an option only once when the teacher allows use of an option
more than once), and limiting number of items to be answered when
there are few options given.
5. Keep the number of premises to not more than eight (8), as
shown in the two sample items.
      Fatigue sets in when there are too many items in a set
and again, test validity suffers. If an item writer feels that there
are many concepts to be tested, dividing them into sets is a
better strategy. It is also suggested that a set of matching
items should appear on one page only and not be carried over to
the next page. Frequently flipping the test papers just to look
for appropriate options requires additional time.
6. Ambiguous lists should be avoided.
      This is especially true in the preparation of options for the
second column. There should only be one option appropriately
associated with a premise unless it is unequivocally mentioned that
an option could be used more than once as mentioned in # 4. This
often occurs when matching events and places or events and
names, descriptions and characters. For instance, in a description
character matching, a premise like "mean to Cinderella" may
carelessly list "stepmother" and "stepsister" as options which are
both correct. Either the premise is improved or one option
removed.
      It can be seen that matching type as a test format is used
quite appropriately in assessing knowledge outcomes
particularly for recall of terminologies, classifications, and
remembering facts, concepts, principles, formulae, and
associations. Its main advantage is its efficiency in being able to
test several concepts using the same format.
                     Analyzing and Improving Tests
ADMINISTERING THE TEST
    The test is ready. All that remains is to get the students ready and hand out the
tests. Here is a series of suggestions to help your students psychologically prepare for
the test.
    1. Maintain a positive attitude
    2. Maximize achievement motivation
    3. Equalize advantages
    4. Avoid surprises
    5. Clarify the rules
    6. Rotate distribution
    7. Remind students to check their copies
    8. Monitor students
    9. Minimize distractions
    10. Give time warnings
    11. Collect tests uniformly
SCORING THE TEST
    Some general suggestions to save scoring time and improve scoring accuracy and
consistency:
   1. Prepare an answer key
   2. Check the answer key
   3. Score blindly
   4. Check machine-scored answer sheets
   5. Check scoring
   6. Record scores
ANALYZING THE TEST
     Quantitative Item Analysis
     A numerical method for analyzing test items employing student response
alternatives or options.
    Empirically-based Improvement Procedures
         Item-improvement using empirically-based methods is aimed at improving
    the quality of an item using students’ responses to the test. Test developers refer
    to this technical process as item analysis as it utilizes data obtained separately for
    each item. An item is considered good when its quality indices i.e. difficulty index
    and discrimination index, meet certain characteristics. For a norm-referenced
    test, these two indices are related since the level of difficulty of an item
    contributes to its discriminability. An item is good if it can discriminate between
    those who perform well in the test and those who do not. However, an extremely
    easy item, that which can be answered correctly by more than 85% of the group,
     or an extremely difficult item, that which can only be answered correctly by 15%,
     is not expected to perform well as a “discriminator”. The group will appear to be
     quite homogeneous with items of this kind. They are weak items since they do
     not contribute to “score-based inference”.
         Difficulty Index
              An item's difficulty index is obtained by calculating the p value (p), which
         is the proportion of students answering the item correctly.
              The difficulty of an item, or item difficulty, is defined as the number of
         students who are able to answer the item correctly divided by the total
         number of students:

                   p = R / T

         where p is the difficulty index, R is the total number of students answering
         the item right, and T is the total number of students answering the item.
Here are two illustrative samples:

Item 1: There were 45 students in the class who responded to Item 1 and 30
answered it correctly.
                  p = 30/45 = 0.67
Item 1 has a p value of 0.67. Sixty-seven percent (67%) got the item right while
33% missed it.

Item 2: In the same class, only 10 responded correctly to Item 2.
                  p = 10/45 = 0.22
Item 2 has a p value of 0.22. Out of 45, only 10 or 22% got the item right while
35 or 78% missed it.
 For a norm-referenced test: Between the two items, Item 2 appears to be the much
 more difficult item since less than a fourth of the class was able to respond
 correctly.
 For a criterion-referenced test: The class shows much better performance in Item 1
 than in Item 2. It is still a long way for many to master the outcome tested by Item 2.
Range of Difficulty Index          Interpretation              Action
0 - 0.25                           Difficult                   Revise or Discard
0.26 - 0.75                        Right Difficulty            Retain
0.76 - above                       Easy                        Revise or Discard
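
A minimal sketch of this computation and of the interpretation table above; the numbers reuse Item 1 from the illustrative samples.

    # Minimal sketch: item difficulty index p = R / T and its interpretation
    # using the ranges in the table above.
    def difficulty_index(num_right, num_answering):
        # p = R / T, the proportion of students answering the item correctly.
        return num_right / num_answering

    def interpret(p):
        if p <= 0.25:
            return "Difficult - revise or discard"
        elif p <= 0.75:
            return "Right difficulty - retain"
        return "Easy - revise or discard"

    p = difficulty_index(num_right=30, num_answering=45)   # Item 1 from the example
    print(f"p = {p:.2f}: {interpret(p)}")                  # p = 0.67: Right difficulty - retain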
Discrimination Index
     The power of an item to discriminate between informed and uninformed groups
or between more knowledgeable and less knowledgeable learners is shown using the
item discrimination index (D). This is an item statistic that can reveal
useful information for improving an item. Basically, an item-discrimination index shows the
relationship between the student's performance on an item (i.e. right or wrong) and
his/her total performance in the test represented by the total score.
     For classroom tests, the discrimination index shows if a difference exists between
the performance of those who scored high and those who scored low in an item. As a
general rule, the higher the discrimination index (D), the more marked the
magnitude of the difference is, and thus, the more discriminating the item is. The
nature of the difference however, can take different directions:
    a. Positively discriminating item - proportion of high scoring group is greater
     than that of the low scoring group.
    b. Negatively discriminating item - proportion of high scoring group is less than
     that of the low scoring group.
    c. Not discriminating - proportion of high scoring group is equal to that of the
     low scoring group.
       Calculation of the discrimination index therefore requires obtaining the
  difference between the proportion of the high-scoring group getting the item
  correct and the proportion of the low-scoring group getting the item correct,
  using this simple formula:

            D = (Ru / nu) - (Rl / nl)

  where Ru and Rl are the numbers of students in the upper and lower groups who
  answered the item correctly, and nu and nl are the numbers of students in the
  upper and lower groups.
       Another calculation can bring about the same result (Kubiszyn and Borich,
  2010):

            D = (Ru - Rl) / n,   where n is the number of students in each group.

       As you can see, Ru/n and Rl/n are actually the p values of the item for the
upper and lower groups. So to get D is to get the difference between the p value
involving the upper half and the p value involving the lower half. So the formula
for the discrimination index (D) can also be given as (Popham, 2011):

            D = p(upper half) - p(lower half)
    To obtain the proportions of the upper and lower groups responding to the item
correctly, the teacher follows these steps:
1. Score the test papers using a key to correction to obtain the total scores of the
   student. Maximum score is the total number of objective items.
2. Order the test papers from highest to lowest score.
3. Split the test papers into halves: high group and low group.
    ➢ For a class of 50 or less students, do a 50 - 50 split. Take the upper half as the
      HIGH GROUP and the lower as the LOW GROUP.
    ➢ For a big group of 100 or so, take the upper 25 - 27% and the lower 25 - 27%.
    ➢ Maintain equal numbers of test papers for Upper and Lower groups.
4. Obtain the p value for the upper group and p value for the lower group.
5. Get the discrimination index by getting the difference between the p-values.
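
A minimal sketch of these steps for a small class using a 50-50 split; the total scores and item responses below are hypothetical.

    # Minimal sketch of the upper/lower-group procedure for the
    # discrimination index D = p(upper) - p(lower).
    def discrimination_index(item_correct, total_scores):
        # item_correct[i] is 1 if student i answered the item right, else 0;
        # total_scores[i] is that student's total test score.
        order = sorted(range(len(total_scores)), key=lambda i: total_scores[i], reverse=True)
        half = len(order) // 2
        upper, lower = order[:half], order[-half:]      # 50-50 split for a small class
        p_upper = sum(item_correct[i] for i in upper) / half
        p_lower = sum(item_correct[i] for i in lower) / half
        return p_upper - p_lower

    # Hypothetical class of 10: total scores and whether each got the item right.
    totals = [38, 35, 33, 30, 28, 25, 22, 20, 18, 15]
    item1  = [ 1,  1,  1,  1,  0,  1,  0,  0,  1,  0]
    print(f"D = {discrimination_index(item1, totals):+.2f}")   # D = +0.40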
    For the purposes of evaluating the discriminating power of items, Popham (2011)
offers the guidelines proposed by Ebel & Frisbie (1991) shown in Table 1. The teachers
can be guided on how to select satisfactory items and what to do to improve the test.
Table 1. Guidelines for Evaluating the Discriminating Efficiency of Items
Discrimination Index        Item Evaluation
.40 and above               Very good items
.30 - .39                   Reasonably good items, but possibly subject to improvement
.20 - .29                   Marginal items, usually needing improvement
.19 and below               Poor items, to be rejected or improved by revision
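
Expressed as a small sketch, the guidelines in Table 1 translate into a simple classification of D values; the sample indices fed to it are hypothetical.

    # Minimal sketch: interpreting a discrimination index using the Ebel &
    # Frisbie (1991) guidelines in Table 1.
    def evaluate_item(d):
        if d >= 0.40:
            return "Very good item"
        if d >= 0.30:
            return "Reasonably good item, but possibly subject to improvement"
        if d >= 0.20:
            return "Marginal item, usually needing improvement"
        return "Poor item, to be rejected or improved by revision"

    for d in (0.45, 0.25, -0.35):        # hypothetical discrimination indices
        print(f"D = {d:+.2f}: {evaluate_item(d)}")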
       Items with negative discrimination indices, even when these are significantly
high, are subject right away to revision if not deletion. With multiple-choice items,
a negative D is forensic evidence of errors in item writing. It suggests the
possibility of:
     ✓ Wrong key - more knowledgeable students selected a distracter which is the
       correct answer but is not the keyed option
     ✓ Unclear problem in the stem leading to more than one correct answer
     ✓ Ambiguous distracters leading the more informed students to be divided in
       choosing among the attractive options
     ✓ Implausible keyed option which more informed students will not choose
     As you can see, awareness of item-writing guidelines can provide cues on how
     to improve items bearing negative or non-significant discrimination indices.
Distracter Analysis
      Another empirical procedure to discover areas for item-improvement utilizes an
analysis of the distribution of responses across the distracters. Especially when the
difficulty index and discrimination index of the item seem to suggest that it is a
candidate for revision, distracter analysis becomes a useful follow-up. It can detect
differences in how the more able students respond to the distracters in a multiple-choice
item compared to how the less able ones do. It can also provide an index of the
plausibility of the alternatives, that is, whether they are functioning as good distracters.
Distracters not chosen at all, especially by the uninformed students, need to be revised
to increase their attractiveness.
      To illustrate this process, consider the frequency distribution of the responses of
the upper group and lower group across the alternatives for two items. Separate
counts are done for the upper and lower groups who chose A, B, C, and D. The data
are organized in a distracter analysis table.
Table 2. Distracter Analysis Table (N = 40; * marks the keyed option)

Item   Difficulty   Discrimination   Group      A      B      C      D    Omit
       Index (p)    Index (D)
  1       .38           -.35         Upper      2     10     *5      3
                                     Lower      2      0     12      6
  2       .45           -.50         Upper      2     *4     10      4
                                     Lower      5     14      1      0
Analysis:
     ⚫ What kinds of items do you see based on their D?
     ⚫ What does their respective D indicate? Cite the data supporting this.
     ⚫ Which of the two items is more discriminating? Why?
     ⚫ Which items need to be revised?
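
A minimal sketch of how such a distracter analysis table can be tallied from raw responses; the class data, the keyed option, and the group sizes are hypothetical and are not the data in Table 2.

    # Minimal sketch: tallying a distracter analysis table from raw responses
    # and computing D for the keyed option. All data are hypothetical.
    from collections import Counter

    responses = [  # (group, chosen option) for one item in a small class
        ("Upper", "A"), ("Upper", "C"), ("Upper", "C"), ("Upper", "B"), ("Upper", "C"),
        ("Lower", "B"), ("Lower", "B"), ("Lower", "D"), ("Lower", "C"), ("Lower", "A"),
    ]
    KEY = "C"            # hypothetical keyed option
    N_PER_GROUP = 5

    counts = {"Upper": Counter(), "Lower": Counter()}
    for group, option in responses:
        counts[group][option] += 1

    for group in ("Upper", "Lower"):
        row = "  ".join(f"{opt}:{counts[group][opt]}" for opt in "ABCD")
        print(f"{group:5s} {row}")

    # Difference in the keyed option's p values gives the discrimination index.
    d = counts["Upper"][KEY] / N_PER_GROUP - counts["Lower"][KEY] / N_PER_GROUP
    print(f"D = {d:+.2f}")   # D = +0.40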
Activity 6.2
Task Description: This activity will test your ability to apply empirical procedures for
item-improvement. Solve and answer the following.
1.     A final test in Science was administered to a Grade IV class of 50. The teacher
wants to improve further the items for next year’s use. Calculate a quality index that
can be used using the given data and indicate the possible revision needed by some
items.
       Item      Number getting item correct        Index           Revision needed to be done
         1                   34                 ____________
         2                   18                 ____________
         3                   10                 ____________
         4                   46                 ____________
         5                    8                 ____________
2.      Below are additional data collected for the same items. Calculate another
quality index and indicate what needs to be improved with the obtained index as a
basis.
       Item      Upper Group      Lower Group         Index          Revision needed to be done
         1            25               9          ____________
         2             9               9          ____________
         3             2               8          ____________
         4            38               8          ____________
         5             1               7          ____________
3.     A distracter analysis table is given below for a test item administered to a class
of 60. Obtain the necessary item statistics from the given data.

Item     Difficulty   Discrimination   Group              Alternatives
N=30     Index (p)    Index (D)                  A     B    *C     D     Omit
  1          ?              ?          Upper     2    18     5     0
                                       Lower     0    10    20     0
Write your evaluation on the following aspects of the item:
a. Difficulty of the Item - _______________________________________________
b. Discriminating power of the Item - ____________________________________
c. Plausibility of Options - ______________________________________________
d. Ambiguity of the answer - ____________________________________________
Qualitative Item Analysis
    A non-numerical method for analyzing test items that does not rely on student
responses but instead considers test objectives, content validity, and technical item
quality.
    Judgmentally-based improvement procedures
         This approach basically makes use of human judgment in reviewing the
    items. The judges are (a) the teachers themselves, who know exactly what the test
    is for, the instructional outcomes to be assessed, and the level of difficulty
    appropriate to their class; (b) the teachers' peers or colleagues, who are familiar
    with the curriculum standards for the target grade level, the subject matter
    content, and the ability of the learners; and (c) the students themselves, who can
    perceive difficulties based on their past experiences.
         ◼ Teachers’ Own Review (Self-review)
              It is always advisable for teachers to take a second look at the assessment
              tool they have devised for a specific purpose. Presuming it is perfect right
              after its construction may lead to failure to detect shortcomings of
              the test or assessment task. Popham (2011, p. 253) gives five (5)
              suggestions for teachers to follow in exercising judgment:
     1.    Adherence to item-specific guidelines and general item-writing
     commandments. There are specific guidelines for writing the various objective
     and non-objective constructed-response types and the selected-response types
     for measuring lower-level and higher-level thinking skills. Teachers should use
     those guidelines to check how well the items have been planned and written,
     particularly their alignment with the intended instructional outcomes.
     2.    Contribution to score-based inference. The teacher examines whether the
     scores generated by the test can contribute to making valid inferences about
     the learners. Can the scores reveal the amount of learning achieved or show
     what has been mastered? Can the scores support an inference about the
     student's readiness to move on to the next instructional level? Or do the
     scores obtained make no difference at all in describing or differentiating
     various abilities?
     3.    Accuracy of content. This review should especially be considered
     when tests are reused after a certain period of time. Changes due to new
     discoveries or developments can redefine the content of a summative test.
     If this happens, the items or the answer key may have to be revisited.
    4.    Absence of content gaps. This review criterion is especially useful in
    strengthening the score-based inference capability of the test. If the current
    tool misses out on important content now prescribed by a new curriculum
    standard, the score will likely not give an accurate description of what is
    expected to be assessed. The teacher always ensures that the assessment
    tool matches what is currently required to be learned. This is a way to check
    on the content validity of the test.
     5.    Fairness. The discussions on item-writing guidelines always warn
     against unintentionally helping uninformed students obtain higher scores.
     This results from inadvertent grammatical clues, unattractive distracters,
     ambiguous problems, and messy test instructions. Sometimes, unfairness can
     happen because of undue advantage received by a particular group, such as
     those seated in front of the classroom or those coming from a particular
     socio-economic level. Getting rid of faulty and biased items and writing
     clear instructions definitely adds to the fairness of the test.
◼    Peer review
     There are schools that encourage peer or collegial review of assessment
instruments among themselves. Time is provided for this activity and it has
almost always yielded good results for improving tests and performance-based
assessment tasks. During these teacher dyad or triad sessions, those teaching the
     same subject area can openly review together the classroom tests and tasks they
     have devised against some consensual criteria. The suggestions given by test
     experts can actually be used collegially as a basis for a review checklist:
        a. Do the items follow the specific and general guidelines in writing items
    especially on:
             ◆ Being aligned to instructional objectives?
             ◆ Making the problem clear and unambiguous?
             ◆ Providing plausible options?
             ◆ Avoiding unintentional clues?
             ◆ Having only one correct answer?
        b. Are the items free from inaccurate content?
        c. Are the items free from obsolete content?
        d. Are the test instructions clearly written for students to follow?
         e. Is the level of difficulty of the test appropriate to the level of the learners?
        f. Is the test fair to all kinds of students?
    ◼     Student Review
         Engagement of students in reviewing items has become a laudable practice
    for improving classroom tests. The judgement is based on the students’
    experience in taking the test, their impressions and reactions during the testing
    event. The process can be efficiently carried out through the use of a review
    questionnaire. Popham (2011) illustrates a sample questionnaire shown in Table
    3. It is better to conduct the review activity a day after taking the test so the
    students still remember the experience when they see a blank copy of the test.
Table 3. Item-Improvement Questionnaire for Students.
1. If any of the items seemed confusing, which ones were they?
2. Did any items have more than one correct answer? If so, which ones?
3. Did any items have no correct answers? If so, which ones?
4. Were there words in any items that confused you? If so, which ones?
5. Were the directions for the test, or for particular subsections, unclear? If so, which
   ones?
Activity 6.3 Classifying Item-Improvement Approach. Below are descriptions of
procedures done. Write J if a judgmental approach is used and E if empirically based.
__________ 1. The Math Coordinator of Grade VII classes examined the periodical
tests prepared by the Math teachers to see if their items are aligned to the target
outcomes for the first quarter.
__________ 2. The alternatives of the multiple-choice items of the Social Studies test
were reviewed to discover if they only have one correct answer.
__________ 3. To determine if the items are efficiently discriminating between the
more able students and the less able ones, a Biology teacher obtained the
discrimination index (D) of the items.
__________ 4. A Technology Education teacher was interested to see if the criterion-
referenced test he has devised shows a difference in the items’ post-test and pre-test
p-values.
__________ 5. An English teacher conducted a session with his students to find out if
there are other responses acceptable in their literature test. He encouraged them to
rationalize their answers.
ITEM ANALYSIS MODIFICATIONS FOR THE CRITERION-
REFERENCED TEST
      The statistical test analysis method discussed earlier, called quantitative item
analysis, applies most directly to the norm-referenced test. The classroom teacher
will typically use criterion-referenced tests rather than norm-referenced tests. Well,
then, we can just use these same procedures for our teacher-made criterion-
referenced tests. Right? Wrong!
      As we will discover in later chapters, variability of scores is crucial to the
appropriateness and success of norm-referenced quantitative item analysis
procedures. In short, these procedures depend on the variability or spread of scores
(i.e., low to high) if they are to do their jobs correctly. In a typical teacher-made
criterion-referenced test, however, variability of scores would be expected to be
small, assuming instruction is effective and the test and its objectives match. Thus,
the application of quantitative item analysis procedures to criterion-referenced
measures may not be appropriate, since by definition most students will answer these
items correctly (i.e., there will be minimal variability or spread of scores). In this
section we will describe several ways in which these procedures can be modified when
a criterion-referenced, mastery approach to test item evaluation is employed. As you
will see, these modifications are straightforward and easier to use than the
quantitative procedures described earlier.
✓    Using Pre- and Post-tests as Upper and Lower Groups
     The following approaches require that you administer the test as a pretest prior
to your instruction and as a post-test after your instruction. Ideally, in such a situation
the majority of students should answer most of your test items incorrectly on the
pretest and correctly on the post-test. By studying the difference between the
difficulty (p) levels for each item at the time of the pre- and post-tests, we can tell if
this is happening. At pretest, the p level should be low (e.g., 0.30 or lower), and at
post-test, it should be high (e.g., 0.70 or higher). In addition, we can consider the
pretest results for an item as the lower group (L) and post-test results for the item as
the upper group (U), and then we can perform the quantitative item analysis
procedures previously described to determine the discrimination direction for the key
and for the distractors.
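A minimal sketch of this pre/post check follows. The 0.30 and 0.70 thresholds come from the passage above; the function names and the sample counts are illustrative assumptions. Treating the pretest as the lower group and the post-test as the upper group, the discrimination value reduces to the post-test p minus the pretest p.

```python
def p_value(n_correct, n_students):
    """Proportion of the class answering the item correctly."""
    return n_correct / n_students

def pre_post_check(pre_correct, post_correct, n_students,
                   pre_max=0.30, post_min=0.70):
    """Check a criterion-referenced item against the pre/post expectations.

    With the pretest as the lower group and the post-test as the upper group,
    the item 'discrimination' is simply post-test p minus pretest p.
    """
    p_pre = p_value(pre_correct, n_students)
    p_post = p_value(post_correct, n_students)
    d = p_post - p_pre
    ok = p_pre <= pre_max and p_post >= post_min and d > 0
    verdict = "looks like a good item" if ok else "candidate for revision"
    return round(p_pre, 2), round(p_post, 2), round(d, 2), verdict

# Hypothetical item: 6 of 30 students correct before instruction, 26 of 30 after.
print(pre_post_check(6, 26, 30))   # (0.2, 0.87, 0.67, 'looks like a good item')
```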
    If a criterion-referenced test item manifests these features, it has passed our
“test” and probably is a good item with little or no need for modification. Contrast this
conclusion, however, with the following item from the same test.
    Thus, the item in Example 2 failed all the tests. Rather than modify the item, it is
probably more efficient to replace it with another.
✓   Comparing the Percentage Answering Each Item Correctly on Both Pre- and
    Post-test
    If your test is sensitive to your objectives (and assuming you teach to your
objectives), the majority of learners should receive a low score on the test prior to your
instruction and a high score afterward. This method can be used to determine
whether this is happening. Subtract the percentage of students passing each item
before your instruction from the percentage of students passing each item after your
instruction. The more positive the difference, the more you know the item is tapping
the content you are teaching. This method is similar to the first step as described in
the preceding section. For example, consider the following percentages for five test
items:
       Notice that item 3 registers no change in the percentage of students passing
from before to after instruction. In fact, a high percentage of students got the item
correct without any instruction! This item may be eliminated from the test, since little
or no instruction pertaining to it was provided and most students already knew the
content it represents.
     Now, look at item 5. Notice that the percentage is negative. That is, 14% of the
class actually changed from getting the item correct before instruction to getting it
wrong after. Here, either the instruction was not related to the item or it actually
confused some students who knew the correct answer beforehand. A revision of the
item, the objective pertaining to this item, or the related instruction is in order.
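The subtraction described in this section can be done in a few lines. In the sketch below, the percentages are hypothetical stand-ins (the module's own five-item table is not reproduced here); they are chosen only so that item 3 shows no change and item 5 shows the negative 14% change discussed above.

```python
def pre_post_difference(pct_before, pct_after):
    """Percentage passing after instruction minus percentage passing before."""
    return {item: pct_after[item] - pct_before[item] for item in pct_before}

# Hypothetical percentages of students passing each item before and after instruction.
before = {1: 20, 2: 35, 3: 80, 4: 10, 5: 60}
after  = {1: 85, 2: 78, 3: 80, 4: 72, 5: 46}
for item, diff in pre_post_difference(before, after).items():
    print(item, f"{diff:+d}%")   # item 3 shows +0% (no change); item 5 shows -14%
```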
✓ Determining the Percentage of Items Answered in the Expected Direction for
     the Entire Test
     Another, slightly different approach is to determine whether the entire test
reflects the change from fewer to more students answering items correctly from pre-
to post-test. This index uses the number of items each learner failed on the test prior
to instruction but passed on the test after instruction. Here is how it is computed:
     Step 1: Find the number of items each student failed on the pretest, prior to
instruction, but passed on the post-test, after instruction.
    The asterisks indicate just the items counted in Step 1 for Bobby. This count is
then repeated for each student.
    Step 2: Add the counts in Step 1 for all students and divide by the number of
students.
     Step 3: Divide the result from Step 2 by the number of items on the test.
     Step 4: Multiply the result from Step 3 by 100.
    Let’s see how this would work for a 25-item test given to five students before and
after instruction.
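The four steps can be carried out mechanically once each student's item-by-item results are recorded for both administrations. The sketch below uses hypothetical data for a short five-item test rather than the module's 25-item example; 1 means the item was answered correctly and 0 means it was not, and the names are illustrative.

```python
def expected_direction_index(pre, post):
    """Percentage of items answered in the expected direction for the entire test.

    pre and post: dict mapping student -> list of 0/1 item scores (same item order).
    Step 1: count items each student failed on the pretest but passed on the post-test.
    Step 2: average those counts over students.
    Step 3: divide by the number of items.  Step 4: multiply by 100.
    """
    n_items = len(next(iter(pre.values())))
    counts = []
    for student in pre:
        changed = sum(1 for b, a in zip(pre[student], post[student]) if b == 0 and a == 1)
        counts.append(changed)                    # Step 1, per student
    average = sum(counts) / len(counts)           # Step 2
    return average / n_items * 100                # Steps 3 and 4

# Hypothetical 5-item test for three students (0 = wrong, 1 = right).
pre  = {"Ana": [0, 0, 1, 0, 0], "Ben": [0, 1, 0, 0, 0], "Cai": [0, 0, 0, 0, 1]}
post = {"Ana": [1, 1, 1, 1, 0], "Ben": [1, 1, 1, 0, 1], "Cai": [1, 0, 1, 1, 1]}
print(round(expected_direction_index(pre, post), 1))   # 60.0
```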
DEBRIEFING GUIDELINES
        Before handing back answer sheets or grades, you should do the following.
        1. Discuss problem items.
        2. Listen to student reactions.
        3. Avoid on-the-spot decisions.
        4. Be equitable with changes.
        5. Ask students to double-check.
        6. Ask students to identify problems.
THE PROCESS OF EVALUATING CLASSROOM ACHIEVEMENT
     Figure 2 summarizes all of the important components of achievement testing
that we have discussed thus far. If you've studied and worked through these chapters,
you are ahead in the test construction game. What that means for you is better tests
that draw fewer complaints from students and parents and that are more valid and
reliable measurements of achievement.
   Figure 2. The process of measuring achievement in the classroom. (Kubiszyn &
                               Borich, 2013 p. 240)
                                   References
[1] De Guzman, E. S., & Adamos, J. L. (2015). Assessment of learning 1. Adriana
    Publishing Co., Inc.
[2] Kubiszyn, T., & Borich, G. (2013). Educational testing and measurement:
    Classroom application and practice. John Wiley & Sons, Inc.
[3] Navaroo, R. L., Santos, R. G., & Corpuz, B. B. (2019). Assessment of learning 1.
    Lorimar Publishing, Inc.
[4] Popham, W. J. (2017). Classroom assessment: What teachers need to know.
    Pearson Education, Inc.
  UNIVERSITY OF SOUTHERN MINDANAO
Interpretation and Utilization of
       Assessment Data
        Prof Ed 221 - ASL 1
  Topic Outline
● Types and interpretation of test scores
● Grading and reporting
● Issues in assessment and grading
     Intended Learning Outcomes
1. Provide meaning to test results using norm-referenced and
   criterion-referenced interpretations.
2. Utilize assessment results to report students’ learning
   progress and achievement.
3. Analyze issues in assessment and grading employing
   principles of assessment.
 Aim
To provide results in brief, understandable form for varied users.
  The big questions
1. What should I count—just achievement, or effort too?
2. How do I interpret a student’s score? Do I compare it to:
   • other students’ scores (norm-referenced),
   • a standard of what they can do (criterion-referenced),
   • or some estimate of what they are able to do (learning potential, or
     self-referenced)?
3. What should my distribution of grades be, and how do I determine it?
4. How do I display student progress, or strengths and weaknesses, to
students and their parents?
  Where do I get the answers?
1. Your school may have some policies or guidelines
2. Apply what you learn in this chapter
3. Consult your teaching colleagues, and then apply your good judgment
4. Learn from first-hand experience
     Functions of Grading and Reporting
     Systems
1. Improve students’ learning by:
✓ clarifying instructional objectives for them
✓ showing students’ strengths & weaknesses
✓ providing information on personal-social development
✓ enhancing students’ motivation (e.g., short-term goals)
✓ indicating where teaching might be modified
Best achieved by: day-to-day tests and feedback, plus periodic integrated summaries
    Functions of Grading and Reporting
    Systems
2. Reports to parents/guardians
✓ Communicates objectives to parents, so they can help promote learning
✓ Communicates how well objectives are being met, so parents can better plan
     Functions of Grading and Reporting
     Systems
3. Administrative and guidance uses
✓ Help decide promotion, graduation, honors, athletic
 eligibility
✓ Report achievement to other schools or to employers
✓ Provide input for realistic educational, vocational, and
 personal counseling
     Types of Grading and Reporting
     Systems
1. Traditional letter-grade system
✓ Easy to use, and grades can be averaged
✓ But of limited value when used as the sole report, because:
   1. they end up being a combination of achievement, effort, work habits, and behavior
   2. teachers differ in how many high (or low) grades they give
   3. they are therefore hard to interpret
   4. they do not indicate patterns of strength and weakness
     Types of Grading and Reporting
     Systems
2. Pass-Fail
✓ Easy to use, and grades can be averaged
✓ But of limited value when used as the sole report, because:
   1. they end up being a combination of achievement, effort, work habits, and behavior
   2. teachers differ in how many high (or low) grades they give
   3. they are therefore hard to interpret
   4. they do not indicate patterns of strength and weakness