MODULE 2
Basic Concepts and Principles in Educational Assessment
Intended Learning Outcomes
By the end of this topic/chapter, you must be able to:
    1. Define educational assessment.
    2. Recognize basic concepts and principles in assessment of learning like testing,
        measurement, evaluation and assessment.
    3. Characterize testing, measurement, evaluation and assessment.
    4. Differentiate standardized and classroom assessments.
Educational Assessment
        Educational assessment is a much broader idea than exams and tests alone; its
scope and meaning are not confined to testing.
        An educational test is used to determine someone's knowledge of something, or
what they have learned; the goal of testing is to measure the skills or knowledge acquired
during the process of learning.
        Assessment is a process of documenting knowledge, skills, attitudes and beliefs
in measurable terms. The purpose of assessment in education is to improve both the
teaching process for teachers and the learning process for the students.
        Therefore, we can say that educational assessment is the process of gathering
information about what students have learned in their educational environments.
Forms of Educational Assessment
   o It may involve formal tests or performance-based activities
   o It may be administered online or using paper and pencil or other materials
   o It may be objective (requiring one answer) or subjective (there may be many
     possible answers, like essays)
   o It may be formative (carried out throughout the course) or summative (administered at
     the end of the course)
Types of Educational Assessment
   1. Formative Assessment – used throughout the educational process, with the goal
      of identifying problem areas and improving teaching and learning
   2. Summative Assessment – used at the end of the learning block, as a final test of
      students' knowledge
   3. Standardized Assessments – provide a path to discover struggles, successes,
      accelerations on specific elements
   4. Performance-based Assessment – measure student’s ability to apply skills and
      knowledge learned from a unit or units of study
   5. Norm- versus Criterion-Referenced Assessments
           •   Referenced assessments are given for the purpose of comparing students'
               results to a particular standard
           •   Norm-referenced tests – the standard is based on a large sample of students,
               whose scores are referred to as the norm
           •   Criterion-referenced tests – compare individual students' results to a
               standard, but this time the standard is based on the curriculum and is often
               designed as a cut-off for demonstrating proficiency
   6. Alternative Assessments – used to determine what students can or cannot do with
      respect to what they already know
Benefits of Good Educational Assessment
   1. Help educators track students’ progress so they can identify anyone who is
      struggling and provide remediation
   2. Provide feedback to students about their own performance, which they can use
      to improve their knowledge and skills further
   3. Motivate students as they know they will be evaluated at the end of each module
      or course
   4. Help educators set learning objectives and outcomes and determine the best
      ways to help students reach their goals
   5. Can be used to improve the curriculum
   6. Can be used to evaluate teachers’ and school systems’ performance, as well as
      the effectiveness of different teaching practices
Principles of Quality Educational Assessment
   1. Must be based on defined objectives and outcomes
   2. Must be valid
Testing
       - A formal, systematic procedure for gathering information
Test
   ❖ A tool composed of a set of questions administered during a fixed period of time
     under comparable conditions for all students
   ❖ Most dominant form of assessment
   ❖ Traditional assessment
   ❖ An instrument used to measure a construct and make decisions
   ❖ Used to measure the learning progress of a student which is formative in purpose,
     or comprehensive covering a more extended time frame which is summative
   ❖ Tests may not be the best way to measure how much students have learned, but they
     still provide valuable information about students' learning and progress
Types of Test
A. according to mode of response
1. a. Oral test (viva voce)
             • Answers are spoken
             • Measure oral communication skills
             • Used to check students’ understanding of concepts, theories and
                  procedures
             • Minimally discriminatory and more inclusive especially for learners who
                  are dyslexic
             • Plagiarism is less likely
              • Time-consuming and may be stressful for some students
              • Favors extroverted and eloquent students
1.b. Written test
            • Activities wherein students either select or provide a response to a
                prompt
            • Can be administered to a large group at one time
             • Can measure students' written communication skills
             • Can be used to assess lower and higher levels of cognition provided that
                 questions are phrased properly
             • Enables assessment of a wide range of topics
Forms of Written Assessment
           • Alternate response (true/false)
           • Multiple choice
           • Matching
           • Short answer
           • Essay
           • Completion
           • Identification
1.c. Performance test
            • Are activities that require students to demonstrate their skills or ability
              to perform specific actions
             • Tasks are designed to be authentic, meaningful, in-depth and
               multidimensional
            • Cost and efficiency are some of the drawbacks
             • Includes problem-based learning, inquiry tasks, exhibits, presentation
               tasks and capstone performances
2. a. Selected Response
            • Alternate response
            • Matching type
            • Multiple choice
2. b. Constructed response
             •   Completion
             •   Short answer
             •   Essay (restricted or extended response)
             •   Problem solving
B. according to ease of quantification of response
        1. Objective
            • Corrected and quantified easily
            • Scores can be readily compared
            • It includes true-false, multiple choice, completion and matching items
             • Test items have a single or specific convergent response
2. Subjective
            •   Elicits varied response
            •   May have more than one answer
            •   Includes restricted and extended-response essays
             •   Not easy to check because students have the liberty to write their own
                 answers
            •   Answers are divergent
            •   Scores are likely to be influenced by personal opinion or judgement by
                the person doing the scoring
C. according to mode of administration
        1. Individual Test
             • Given to one person at a time
              • Individual cognitive and achievement tests are administered to gather
                  extensive information about each student's cognitive functioning and
                  his/her ability to process and perform specific tasks
             • It can help identify intellectually gifted students
             • It can pinpoint those with learning disabilities (LDs)
              • The examiner can also observe students closely during the test to gather
                  additional information
2. Group Test
             • Administered to a class or group of examinees simultaneously
             • Developed to address the practical need of testing
             • Test is usually objective and responses are more or less restricted
             • It does not lend itself for in-depth observations of individual students
             • Less opportunity to establish rapport or help students maintain interest
                 in the test
             • Students are assessed on all items of the test
             • Students may become bored with easy items and anxious over difficult
                 ones
              • Information obtained from a group test is not as comprehensive as that
                  from individual tests
D. according to test constructor
Table 1. Type of Test According to Test Constructor
 Prepared by
    Standardized test: specialists who are well versed in the principles of assessment
    Non-standardized test: teachers who may not be adept at the principles of test
    construction; teacher-made tests are constructed haphazardly due to limited time and
    lack of opportunity to pre-test or pilot-test items
 Learning outcomes & content measured
    Standardized test: serves as an indicator of instructional effectiveness and a
    reflection of the school's performance
    Non-standardized test: not thoroughly examined for validity
 Quality of test items
    Standardized test: consists of multiple-choice items used to distinguish between
    students
    Non-standardized test: uncertain quality; one or several formats are used; items are
    not entirely objective
 Reliability
    Standardized test: can be used for a long period of time
    Non-standardized test: scores are not subjected to any statistical procedure to
    determine reliability; not intended to be used repeatedly for a long time
 Administration and scoring
    Standardized test: administered to a large group of students; scoring procedures are
    consistent; manuals and guides are available to aid in the administration and
    interpretation
    Non-standardized test: administered to one or a few classes to measure subject or
    course achievement; no established standards for scoring and interpreting
E. according to mode of interpreting results
        1. Norm-referenced interpretation
            • Evaluation instruments that measure a student’s performance in
                relation to the performance of a group on the same test
             • Comparisons are made and the student's relative position is determined
2. Criterion-referenced interpretations
            • Describe each student’s performance against an agreed upon or pre-
                established criterion or level of performance
            • The criterion is not actually a cutoff score but rather the domain of
                subject matter- the range of well-defined instructional objectives or
                outcomes
            • In a mastery test, the cut score is used to determine whether or not a
                student has achieved mastery of a given unit of instruction
F. according to nature of answer
Table 2. Type of test according to the nature of answer
 1. Personality test          o Measures one’s personality and behavioral style
                              o Used in recruitment as aid in determining how a
                                potential employee will respond to various work-
                                related activities
                              o Used in career guidance, in individual and
                                relationship counseling and in diagnosing
                                personality disorders
                              o In schools, it determines personality strength and
                                weaknesses of students
 2. Achievement test          o Measures students' learning as a result of
                                 instruction and training experiences
                              o When used summatively, it is used as a basis for
                                promotion to the next grade
                               o Measures students' ability and predicts success in
                                 college
 3. Intelligence Test         o Measure learners’ innate intelligence or mental
                                ability
                              o Contain items on verbal comprehension,
                                quantitative and abstract reasoning, among others,
                                in accordance with some recognized theory of
                                intelligence
                              o Alfred Binet & Theodore Simon (1905) – published
                                the first modern intelligence test
                               o Sternberg – constructed a set of multiple-choice
                                 questions grounded on his Triarchic Theory of
                                 Human Intelligence. The intelligence test taps into
                                 the three independent aspects of intelligence:
                                 analytic, practical and creative
 4. Sociometric Test          o Measures interpersonal relationships in a social
                                group
                              o Introduced in 1930s
                              o Allows learners to express their preferences in terms
                                of likes and dislikes for other members of the group
                              o Includes peer nomination, peer rating and
                                sociometric ranking of social acceptance
 5. Trade or                  o Assess an individual’s knowledge, skills and
 Vocational Test                competence in a particular occupation
                              o Consists of a theory test and a practical test
                               o Upon completion, the individual is given a
                                 certification of qualification
                               o Used to determine the effectiveness of training
                                 programs
Functions of Testing
A. Instructional Functions
Table 3. Instructional functions of testing

1. Tests facilitate the clarification of meaningful learning objectives
     o When constructing tests, teachers are reminded to go back to the learning outcomes

2. Tests provide a means of feedback to the instructor and the student
     o Can be used for self-diagnosis
     o Students can assess their own learning and performance
     o Test results can guide teachers in adjusting their pedagogical practices to match
       students' learning styles
     Washback – the impact of a test on teaching and learning

3. Tests can motivate learning
     o Frequent testing increases academic preparation (study time) and academic
       achievement
     o Frequent testing produces a more positive attitude among students

4. Tests can facilitate learning
     o Testing improves performance when learners are given the opportunity to practice
       retrieval before taking the final test
     o Prompt feedback informs students how they are doing
     Successive learning – a test-restudy practice method conducted at appropriate
     intervals which can bring about long-term retention

5. Tests are a useful means of overlearning
     o Preparation for a scheduled test induces overlearning
     Overlearning – continued study, review, interaction or practice of the same material
     even after concepts and skills had been mastered
Measurement
   o Refers to a “limit or quantity”; a quantitative description of an object’s
     characteristics or attributes
   o Determines how much learning a student has acquired compared to a standard
     (criterion-referenced) or in reference to other learners in a group (norm-referenced)
   o Measures particular elements of learning like readiness to learn, recall of facts,
     demonstration of specific skills, or the ability to analyze and solve applied problems
   o Uses tools or instruments like tests, oral presentations, written reports, portfolios
     and rubrics to obtain pertinent information
   o Each measurement has two components
              1. a true value of the quantity
              2. random error component
   o The objective in educational measurement is to estimate or approximate the true
     value of the quantity of interest
   o Objective measurements do not depend on the person or individual taking the
     measurements
Evaluation
   o Process of judging the quality of a performance or course of action.
   o Finding the value of an educational task.
   o Carried out both by the teacher and the student to uncover how the learning
     process is developing.
Objects of Evaluation
   1. Instructional programs
   2. School projects
   3. Teachers
   4. Students
   5. Educational goals
Categories of Evaluation
1. Formative Evaluation
   ❖   Judging the worth of the program while the program is in progress
   ❖   Focuses on the process
   ❖   Determine deficiencies so that the appropriate interventions can be done
   ❖   Used in analyzing learning materials, student learning and achievements and
       teacher effectiveness
2. Summative Evaluation
    ❖ Judging the worth of the program at the end of the program activities
    ❖ Focus is on the result
   ❖ Tools used for data gathering: questionnaire, survey forms,
     interview/observations guide and test
   ❖ Determine the effectiveness of the program based on its objectives
   ❖ Techniques for summative evaluation: pretest-posttest with experimental and
     control group; one group descriptive analysis
Assessment
        Assessment is used to determine students’ learning needs, monitor the progress
of students and examine their performance against identified learning outcomes.
         It may be implemented at different phases of instruction, such as:
         a. before instruction (pre-assessment)
         b. during instruction (formative assessment)
         c. after instruction (summative assessment)
          The term comes from the Latin word “assidere”, which means “to sit beside a
judge”; this implies that assessment is tied up with evaluation.
          Assessment pertains to any method utilized to gather information about student
performance. It covers all activities undertaken by teachers, and by their students in
assessing themselves, that provide information to be used to modify the teaching-learning
activities (TLA) in which they are engaged and that aid teachers in making informed
decisions and judgments to improve the TLA.
Nature of Assessment
Table 4. Nature of Assessment
Purpose of Assessment
Table 5. Purpose of assessment
 Assessment for          o Diagnostic and formative assessment tasks which are used
 Learning (AfL)            to determine learning needs, monitor academic progress
                           of students during a unit or block of instruction and guide
                           instruction
                         o Examples: pre-tests, written assignments, quizzes,
                           concept maps, focused questions
 Assessment as           o Employs tasks or activities that provide students with an
 Learning (AaL)            opportunity to monitor and further their own learning – to
                           think about their personal learning habits and how they
                           can adjust their learning strategies to achieve their goals
                         o Formative which may be given at any phase of the
                           learning process
                         o Involves metacognitive processes like reflection and self-
                           regulation to allow students to utilize their strengths and
                           work on their weaknesses by directing and regulating
                           their learning
                         o Students are accountable and responsible for their own
                           learning
                         o Examples: peer-assessment rubrics, portfolios
 Assessment of           o Summative and done at the end of the unit, task or
 Learning (AoL)            process or period
                         o Purpose is to provide evidence of a student’s level of
                           achievement in relation to curricular outcomes
                         o Used for grading, evaluation and reporting purposes
                         o provides the foundation for decisions on student’s
                           placement and promotion
                         o Examples: unit test, final projects
Relevance of Assessment
1. Students
    ❖ Through varied learner-centered and constructive assessment tasks, students
        become actively engaged in the learning process
    ❖ Take responsibility for their own learning
    ❖ Can learn to monitor changes in their learning patterns
    ❖ Become aware of how they think, how they learn, how they accomplish tasks and
        how they feel about their work
    ❖ Ultimately redounds to better student achievement
2. Teachers
    ❖ Informs instructional practice
    ❖ Results can reveal which teaching methods and approaches are most effective
    ❖ Provide direction as to how teachers can help students more and what teachers
       should do next
    ❖ Assessment procedures support instructors' decisions on managing instruction,
       assessing student competence, placing students into levels of educational programs,
       assigning grades to students, guiding and counseling, selecting students for
       education opportunities and certifying competence
3. Parents
    ❖ Valued source of assessment information on the educational history and learning
        habits of their children especially for preschoolers who do not yet understand
        their developmental progress
    ❖ Can help identify needs of children for appropriate intervention
4. Administrators and Program Staff
    ❖ Identify strengths and weaknesses of the program
    ❖ Designate priorities, assess options and lay down plans for improvement
    ❖ Used to make decisions regarding promotion or retention of students and
        arrangement of faculty development
5. Policy Makers
    ❖ Provides information about students’ achievements which in turn reflect the
        quality of education being provided by the school
    ❖ government agencies can set or modify standards, reward or sanction schools
        and direct educational resources
Aptitude and Achievement Tests
Achievement and aptitude tests are two kinds of test which measure two different aspects
of learning. An achievement test measures the amount of knowledge a student has already
learned or mastered and is used for determination, while an aptitude test is used for
projection: it indicates a student's potential or ability to learn.
Achievement Test
        It is a test used to assess students' achievement or mastery of content, skills or
general academic knowledge; it is often used as an admission test or placement test in
schools or for a scholarship grant.
             1. Standardized achievement test – measures specific things and results are
                 compared across age and grade level of students and often reported as
                 percentile, percentage or grade equivalency; same format, same types
                 of questions and the same content no matter when or where the test is
                 administered or who is taking the test; administered by trained
                 individuals
              2. Non-standardized achievement test – measures the stock or prior
                  knowledge and learning of students; determines ability in a specific
                  skill or subject area; it may be a cumulative final exam or a
                  performance task
Aptitude Test
           A test which measures a test taker's natural talent or abilities for current and future
use; it includes a series of questions in which the taker makes a value judgment, agreeing or
disagreeing, and the results may show what types of careers they would be suited for.
          Other types of aptitude tests include personality inventories. These types of
assessments will indicate the personal preferences and interpersonal strengths and
weaknesses of the test taker. These tests may also measure a test taker’s ability to solve
complex problems or future abilities to perform certain tasks.
References
[3] De Guzman, E.S., & Adamos, J.L. (2015). Assessment of learning 1. Adriana Publishing
Co., Inc. pp. 1-32.
[6] McMillan, J.H. (2018). Classroom assessment: Principles and practice that enhance
student learning and motivation. Pearson Education, Inc. pp. 1-33.
[8] Navarro, R.L., Santos, R.G., & Corpuz, B.B. (2019). Assessment of learning 1. Lorimar
Publishing, Inc. pp. 10-16.
[14] What is educational assessment? (2021). Retrieved from
https://www.proprofs.com/quiz-school/blog/what-is-educational-assessment-and-why-
is-it-necessary
[15] Forstall, M. (2019). Retrieved from https://www.theclassroom.com/achievement-vs-
aptitude-tests-5607096.htm
  UNIVERSITY OF SOUTHERN MINDANAO
Principles of High-Quality Assessment
         Prof Ed 221 ASL 1
  Topic Outline
1. Principles of High-Quality Assessment
2. Validity
3. Reliability
4. Ethics/Fairness
5. Practicality and efficiency
Intended Learning Outcomes
1. Interpret principles of high-quality assessment.
2. Identify factors that make assessment valid, reliable, fair, ethical, practical and
   efficient.
  What governs assessment of learning?
• Five standards of quality assessment to inform sound
  instructional decisions:
  • 1. Clear purpose
  • 2. Clear learning targets
  • 3. Sound assessment design
  • 4. Effective communication of results
  • 5. Student involvement in the assessment process
    (Chappuis, Chappuis & Stiggins, 2009)
Classroom assessment asks three questions: Why are you assessing? What do you want to
assess? How are you going to assess?
   Assessment methods and tools should be parallel to the
    learning targets or outcomes to provide learners with
     opportunities that are rich in breadth and depth and
                promote deep understanding.
• Not all assessment methods are applicable to every type
  of learning outcomes and teachers have to be skillful in
  the selection of assessment methods and designs.
• Knowledge of the different levels of assessment is
  paramount.
Example – ILO: students should be able to communicate their ideas verbally.
Assessment: written essay. Here the assessment method does not match the learning target.
  Identifying Learning Outcomes
Learning outcomes pertain to a particular level of knowledge, skills and values that a
student has acquired at the end of a unit or period of study as a result of his/her
engagement in a set of appropriate and meaningful learning experiences.
An organized set of learning outcomes helps teachers plan and deliver
appropriate instruction and design valid assessment tasks and
strategies.
      Steps in a Student Outcomes Assessment (Anderson, et al., 2005)
1. Create learning outcome statements;
2. Design teaching/assessment activities to achieve these outcome statements;
3. Implement teaching/assessment activities;
4. Analyze data on individual and aggregate levels; and
5. Reassess the process.
Taxonomy of learning domains
    Learning Outcomes
Learning outcomes are statements of performance expectations in the cognitive, affective
and psychomotor domains.
Within each domain are levels of expertise that drives assessment.
These levels are listed in order of increasing complexity.
Higher levels require more sophisticated methods of assessment but
they facilitate retention and transfer of learning.
All learning outcomes must be capable of being assessed and measured
– using direct and indirect assessment techniques.
      Cognitive (Knowledge-based)
originally devised by Bloom, Engelhart, Furst, Hill & Krathwohl (1956) and
revised by Anderson, Krathwohl et al. (2001)
produced a two-dimensional framework of Knowledge and Cognitive
Processes and accounts for 21st century needs by including metacognition
designed to help teachers understand and implement a standards-based
curriculum.
involves the development of knowledge and intellectual skills
answers the question, "What do I want learners to know?”
Cognitive (Knowledge-based)
Krathwohl (2002) stressed that the revised Bloom's taxonomy table is used not only to
classify instructional and learning activities employed to achieve the objectives but also
for assessments used to determine how well learners have attained and mastered the
objectives.
Cognitive (Knowledge-based)
Marzano & Kendall (2007) describe three systems: the self-system, the metacognitive
system and the cognitive system. The cognitive system covers Knowledge, Comprehension,
Analysis and Knowledge Utilization.
Cognitive (Knowledge-based)
Cognitive System levels:
• Knowledge – same as Remembering
• Comprehension – entails synthesis and representation
• Analysis – involves the processes of matching, classifying, error analysis, generalizing
  and specifying
• Knowledge Utilization – decision making, problem solving, experimental inquiry and
  investigation
E.g. Science
• Design an experiment to determine the factors that affect the strength of an
  electromagnet.
• Which of the following factors does not affect the strength of an electromagnet?
  a. diameter of the coil
  b. direction of the windings
  c. nature of the coil material
  d. number of turns in the coil
PSYCHOMOTOR (Skills-based)
focuses on physical and mechanical skills involving
coordination of the brain and muscular activity
answers the question "What actions do I want learners to be
able to perform?"
       PSYCHOMOTOR (Skills-based)
Dave (1970) identified 5 levels of behavior: Imitation, Manipulation, Precision,
Articulation & Naturalization.
Simpson (1972) laid down 7 progressive levels: Perception, Set, Guided Response,
Mechanism, Complex Overt Response, Adaptation & Origination.
Harrow (1972) developed her own taxonomy with 6 categories organized according to
degree of coordination: reflex movements, basic fundamental movements, perceptual
abilities, physical activities, skilled movements and non-discursive communication.
    AFFECTIVE (Values, Attitudes & Interests)
emphasizes emotional knowledge
tackles the question, "What actions do I want learners to think or
care about?”
developed by Krathwohl, Bloom & Masia (1964)
includes factors such as student motivation, attitudes, appreciation
and values
Types Of Assessment Methods
ASSESSMENT METHODS
Categorized according to the nature and characteristics of each
method
Like a carpenter's tools: you need to choose which method is apt for a given task.
It is not wise to stick to one method of assessment.
“If the only tool you have is a hammer, you tend to see every problem
as a nail.”
Assessment Methods
    McMillan (2007)
                        Selected-response
                       Constructed-response
                       Teacher-observation
                      Student self-assessment
1. SELECTED-RESPONSE FORMAT
students select from a given set of options to answer a
question or a problem
it is objective and efficient because there is only one correct
or best answer
the items are easy to grade - teacher can assess and score a
great deal of content quickly
1. SELECTED-RESPONSE FORMAT
Examples: multiple-choice, alternate response (true/false) and matching type
     2. CONSTRUCTED-RESPONSE FORMAT
demands that students create or produce their own answers in response
to a question, problem or task
is more useful in targeting higher level of cognition
items may fall under any of the following categories
 •   brief-constructed response items
 •   performance tasks
 •   essay items
 •   oral questioning
  2.A. Brief-constructed Response Items
require only short responses from students
E.g. sentence completion where students fill in a blank at the
end of the statement
E.g. short answers to open-ended questions
E.g. labeling a diagram
E.g. answering a mathematics problem by showing their
solutions
  2.b. PERFORMANCE ASSESSMENT
require students to perform a task rather than select from a given set of
options
students have to come up with a more extensive and elaborate answer or
response
called authentic or alternative assessments because students are
required to demonstrate what they can do through activities, problems
and exercises
can be a more valid indicator of students' knowledge and skills than other
assessment methods
2.b. PERFORMANCE ASSESSMENT
Scoring Rubric
• contains the performance criteria used for grading performance tasks
• may be an analytic scoring rubric, where different dimensions and characteristics of
  performance are identified and marked separately
• or a holistic rubric, where the overall process or product is rated
     2.b. PERFORMANCE TASKS
provide opportunities for students to apply their knowledge and skills in
real-world contexts
may be product-based or skills-oriented
students have to create or produce evidence of their learning or do
something and exhibit their skills
2.b. Examples of Products
written reports, reflection papers, journals, projects, web pages, tables,
spreadsheets/worksheets, poems, graphs, portfolios, audio-visual materials,
illustrations/models
2.B. Examples of Performance or Skills-based Activities
speech, role play, athletics, teaching demonstration, recital, dramatic reading, debate
2.b. PERFORMANCE ASSESSMENT
can result in better integration of assessment with instruction, a greater focus on
higher-order thinking skills, an increased motivation level in the learning process, and
improved instructional and content validity
2.c. Essay Assessments
   involve answering a question or proposition in written form
   allow students to express themselves and demonstrate their reasoning
   may be easy to construct, but they require much thought on the part of the teacher
   essay questions have to be clear so that students can organize their thoughts quickly
   and directly answer the questions
   use a rubric to score essays
2.c. Essay Assessments
                      • requires a few sentences
                      • there are constraints to the content
Restricted response
                        and nature of the response
                      • questions are more focused
                      • Allow for more flexibility on the part
                        of the student
Extended response
                      • Responses are longer and more
                        complex
    2.d. Oral Questioning
  Common assessment method during instruction to check on student
                        understanding
 May take the form of an interview or conference when done formally
    The teacher can keep students on their toes, receive acceptable responses, elicit
various types of reasoning from the students and at the same time strengthen their
confidence.
  The teacher can probe deeper and find out for himself/herself if the
             student knows what he/she is talking about.
Responses to oral questions are assessed using a scoring system or rating
                                 scale.
 3. TEACHER OBSERVATIONS
A form of on-going assessment, usually done in combination with oral
questioning
Teachers regularly observe students to check on their understanding
By watching how students respond to oral questions and behave during
individual and collaborative activities, the teacher can get information if
learning is taking place in the classroom
Non-verbal cues communicate how learners are doing. Teachers have to
be watchful if students are losing attention, misbehaving or appear non-
participative in classroom activities.
3. TEACHER OBSERVATIONS
It would be beneficial if teachers make observational or anecdotal
notes to describe how students learn in terms of concept building,
problem solving, communication skills, etc.
can also be used to assess the effectiveness of teaching strategies and
academic interventions
Information gathered from observations reveals the strengths and
weaknesses of individual students and the class as a whole
serve as basis for planning and implementing new supports for learning
4. STUDENT SELF-ASSESSMENT
one of the standards of quality assessment identified by Chappuis,
Chappuis & Stiggins (2009)
process where the students are given the chance to reflect and rate
their own work and judge how well they have performed in relation
to a set of assessment criteria
students track and evaluate their own progress or performance
self-monitoring techniques like activity checklist, diaries and self-
report inventories
  4. STUDENT SELF-ASSESSMENT
provide an opportunity to reflect on their performance, monitor
their learning progress, motivate them to do well and give
feedback to the teacher which the latter can use to improve the
subject/course
enhances student achievement, improves self-efficacy and
promotes a mastery goal orientation and more meaningful
learning
an essential component of formative assessment
   References
• https://irds.stanford.edu/sites/g/files/sbiybj10071/f/msmt.pdf
1. De Guzman, E.S., & Adamos, J.L. (2015). Assessment of learning 1. Adriana Publishing Co., Inc.
2. McMillan, J.H. (2018). Classroom assessment: Principles and practice that enhance student learning and
   motivation. Pearson Education, Inc.
3. Popham, W.J. (2017). Classroom assessment: What teachers need to know. Pearson Education, Inc.
UNIVERSITY OF SOUTHERN MINDANAO
Module 3: Validity & Reliability
       Prof Ed 221 ASL 1
    Topic Outline
1. Principles of High-Quality Assessment
2. Validity
3. Reliability
4. Ethics/Fairness
5. Practicality and efficiency
Intended Learning Outcomes
1. Interpret principles of high-quality assessment.
2. Identify factors that make assessment valid, reliable, fair, ethical, practical and
   efficient.
VALIDITY and
RELIABILITY
CHAPTER 4
VALIDITY
VALIDITY
❑is a term derived from the Latin word validus
 which means “strong”.
❑It pertains to the accuracy of the inferences teachers
 make about students based on the information
 gathered from the assessment (McMillan, 2007;
 Fives & DiDonato-Barnes, 2013)
VALIDITY
❑Content-Related Evidence
  ✓Face Validity
  ✓Instructional Validity
❑Criterion-Related Evidence
  ✓Concurrent Validity
  ✓Predictive Validity
❑Construct-Related Evidence
  ✓Convergent Validity
  ✓Divergent Validity
Content-Related Evidence
  Content-Related Evidence
❑ pertains to the extent to which the test covers the domain of
 content. If a summative test covers a unit with four topics, then the
 assessment should contain items from each topic. This is done
 through adequate sampling of content. A student's performance in
 the test may be used as an indicator of his/her content knowledge.
Face Validity
❑ The test appears to adequately measure the learning outcomes and content
❑ Based on the subjective opinion of the one viewing it
❑ Non-systematic or non-specific

Instructional Validity
❑ The extent to which an assessment is systematically sensitive to the nature of
  instruction offered
❑ An instructionally valid test is one that registers differences in the amount and kind
  of instruction to which students have been exposed (Yoon & Resnick, 1998)
   Content-Related Evidence
TABLE OF SPECIFICATION
➢ prepared before developing the test
➢ a test blueprint that identifies the content areas and describes the
  learning outcomes at each level of the domain (Notar et al., 2004)
➢ a tool used in conjunction with lesson and unit planning to help
  teachers make genuine connections between planning,
  instruction, and assessment (Fives & DiDonato-Barnes, 2013)
➢ assures teachers that they are testing students' learning across
  a wide range of content and readings as well as cognitive
  processes requiring higher-order thinking
➢ a sample layout is sketched after the list of elements below
    Content-Related Evidence
SIX ELEMENTS IN TOS DEVELOPMENT
➢Balance among the goals selected for the examination
➢Balance among the levels of learning
➢The test format
➢The total number of items
➢The number of items for each goal and level of
 learning
➢The enabling skills to be selected from each goal
 framework
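
To make the elements above concrete, here is a small hypothetical table of specifications
for a 20-item test; the topics, cognitive levels and item counts are illustrative only and are
not taken from this module.

    Content area     Remembering   Understanding   Applying   Analyzing   Total items
    Topic 1               2              2             1          1            6
    Topic 2               2              1             2          1            6
    Topic 3               1              2             2          3            8
    Total                 5              5             5          5           20

Each row shows how the items for one content area are spread across the levels of learning,
and the totals show the balance among goals and levels for the whole examination.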
Criterion-Related Evidence
    Criterion-Related Evidence
•refers to the degree to which test scores agree
 with an external criterion.
•examines the relationship between an assessment
 and another measure of the same trait (McMillan,
 2007)
•Three types of criteria:
   • Achievement test scores
   • Ratings, grades and other numerical judgments made by the
     teacher
   • Career data
    Criterion-Related Evidence
CONCURRENT VALIDITY               PREDICTIVE VALIDITY
•Provides an estimate of a       •Pertains to the power or
 student’s current                usefulness of test scores
 performance in relation to       to predict future
 previously validated or          performance
 established measure
  *In testing correlations between two data sets for both concurrent
  and predictive validity, the PEARSON CORRELATION COEFFICIENT (r)
  or SPEARMAN'S RANK ORDER CORRELATION may be used.
  Coefficient of determination = r^2
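
As a rough illustration of how criterion-related evidence can be computed, the Python
sketch below correlates scores on a new quiz with scores on a previously validated measure;
the score lists and variable names are hypothetical, not data from this module.

    # Hypothetical sketch: estimating concurrent validity by correlating a new quiz
    # with an established, previously validated test taken by the same students.
    from scipy.stats import pearsonr, spearmanr

    new_quiz       = [78, 85, 62, 90, 71, 88, 65, 80]   # scores on the new assessment
    validated_test = [75, 82, 60, 94, 70, 85, 68, 79]   # scores on the established measure

    r, _   = pearsonr(new_quiz, validated_test)    # Pearson correlation coefficient (r)
    rho, _ = spearmanr(new_quiz, validated_test)   # Spearman's rank-order correlation
    print(round(r, 2), round(rho, 2), round(r ** 2, 2))  # r, rho, coefficient of determination

A high positive correlation between the two sets of scores would support the concurrent
validity of the new quiz.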
Construct-Related Evidence
    Construct-Related Evidence
•An assessment of the quality of the instrument used
•Measures the extent to which the assessment is a
 meaningful measure of an unobservable trait or
 characteristic (McMillan, 2007)
•Three types of construct-related evidence:
    •theoretical
    •logical
    •statistical
• The construct must be operationally defined or explained
  explicitly to differentiate it from other constructs.
        Construct-Related Evidence
• In 1955, Lee Cronbach and Paul Meehl insisted that to provide evidence
  of construct validity, one has to develop a nomological network.
• Construct validity can take the form of a differential group study.
• Another form is an intervention study, wherein a test is given to a group of students who
  are weak in problem-solving strategies, an intervention is applied, and the test is given
  again; improvement in scores provides evidence for the construct.
  • Two methods of establishing construct validity: convergent and divergent
    validation.
    • Convergent validity occurs when measures of constructs that are
      related are in fact observed to be related.
    • Divergent validity occurs when measures of constructs that are unrelated are in
      reality observed not to be related.
      Construct-Related Evidence
•In 1959, Campbell and Fiske developed a statistical
 approach called the Multitrait-Multimethod Matrix (MTMM)
    •A table of correlations arranged to facilitate the assessment
     of construct validity, integrating both convergent and
     divergent validity.
• McMillan (2007) recommends, for practical purposes, the
  use of clear definitions and logical analysis as construct-
  related evidence.
Unified Concept of Validity
     Unified Concept of Validity
• Messick (1989) proposed a unified concept of validity which integrates
  considerations of content, criteria, and consequences into a construct framework
  for the empirical testing of rational hypotheses about score meaning and
  theoretically relevant relationships.
• Six distinct aspects of construct validity: content, substantive, structural,
  generalizability, external and consequential.
Validity of Assessment Methods
Validity of Other Assessment Methods
•Developing performance
 assessments involves:
 •Define the purpose
 •Choose the activity
 •Develop criteria for scoring
Validity of Assessment Methods
❑Define the purpose
 •The first step is about determining the
  essential skills students need to develop
  and content worthy of understanding.
 •To acquire validity evidence in terms of
  content, performance assessments
  should be reviewed by qualified content
  experts.
 Validity of Assessment Methods
❑Choose the activity
 • The selected performance should reflect a valued activity.
 • The completion of performance assessments should provide
   a valuable learning experience.
  • The statement of goals and objectives should be clearly
    aligned with the measurable outcomes of the performance
    activity.
 • The task should not examine extraneous or unintended
   variables.
 • Performance assessments should be fair and free from bias.
Validity of Assessment Methods
❑Develop criteria for scoring
 • In scoring, a rubric or rating scale should be created.
 • In controlled conditions, oral questioning has high validity.
  • For observations, operational and response definitions should
    accurately describe the behavior of interest.
 • It is highly valid if evidence is properly recorded and interpreted.
     • TRIANGULATION-a technique to validate results through cross verification from two
       or more sources.
 • Validity in self-assessment is described as the agreement
   between self-assessment ratings with teacher judgments or
   peer rankings.
“No  single type of instrument or method of
data collection can assess the vast array of
learning and development outcomes in a
school program“
                            -McMillan, Linn and Gronlund, 2009
Threats to Validity
Threats to Validity
McMillan, Linn and Gronlund (2009) identified ten factors that affect the
validity of assessment results.
1. Unclear test directions
2. Complicated vocabulary and sentence structure
3. Ambiguous statements
4. Inadequate time limits
5. Inappropriate level of difficulty of test items
6. Poorly constructed test items
7. Test items inappropriate for the outcomes being measured
8. Short test
9. Improper arrangement of items
10. Identifiable pattern of answers
Threats to Validity
McMillan (2007) laid down suggestions for enhancing validity:
    • Ask others to judge the clarity of what you are assessing.
    • Check to see if different ways of assessing the same thing give the same result.
    • Sample a sufficient number of examples of what is being assessed.
    • Prepare a detailed table of specifications.
    • Ask others to judge the match between the assessment items and the objectives of
      the assessment.
    • Compare groups known to differ on what is being assessed.
    • Compare scores taken before instruction to those taken after instruction.
    • Compare predicted consequences to actual consequences.
    • Compare scores on similar, but different, traits.
    • Provide adequate time to complete the assessment.
    • Ensure appropriate vocabulary, sentence structure and item difficulty.
    • Ask easy questions first.
    • Use different methods to assess the same thing.
    • Use assessment results only for their intended purposes.
RELIABILITY
 RELIABILITY
• It talks about reproducibility and consistency in methods and
  criteria.
• Reliable assessment produces the same results if given to an
  examinee on two occasions.
• It pertains to the obtained assessment results and not to the
  test or any other instrument.
• It is unlikely to turn out 100% because no two tests will
  consistently produce identical results.
• Environmental factors like lighting and noise may affect
  reliability.
• Student error and physical well-being of examinees also affect
  consistency of assessment results.
RELIABILITY
•For a test to be valid, it has to be reliable.
•It is expressed as a correlation coefficient.
•Two types of reliability:
 •Internal reliability
   •Assesses the consistency of results across items
    within a test.
 •External reliability
   •Gauges the extent to which a measure varies
    from one use to another.
SOURCES OF RELIABILITY
EVIDENCE
SOURCES OF RELIABILITY EVIDENCE
Stability, equivalence, internal consistency, scorer or rater consistency, and decision
consistency.
   STABILITY
•The test-retest reliability correlates scores
obtained from two administrations of the
same test over a period of time.
    EQUIVALENCE
•Parallel-forms reliability ascertains the equivalency of
 forms. In this method, two different versions of an
 assessment tool are administered to the same group
 of individuals. However, the items are parallel, i.e. they
 probe the same construct, base knowledge or skill.
 The two sets of scores are then correlated in order to
 evaluate the consistency of results across alternative
 versions.
    INTERNAL CONSISTENCY
•It implies that a student who has mastery learning
 will get all or most of the items correctly while a
 student who knows little or nothing about the
 subject matter will get all or most of the items
 wrongly.
•To check the internal consistency, the split-half
 method can be used.
INTERNAL CONSISTENCY
❑SPEARMAN-BROWN FORMULA
   Whole-test reliability = (2 × reliability of ½ test) / (1 + reliability of ½ test)
   *To improve the reliability of the test employing this method, items with low
   correlations are either removed or modified.
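
Below is a minimal Python sketch of the split-half method with the Spearman-Brown
correction described above; the item scores are invented for illustration only.

    # Hypothetical sketch: split-half reliability with the Spearman-Brown correction.
    from scipy.stats import pearsonr

    # rows = students, columns = items scored 1 (correct) or 0 (incorrect)
    scores = [
        [1, 1, 0, 1, 1, 0, 1, 1],
        [0, 1, 0, 0, 1, 0, 1, 0],
        [1, 1, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 1, 0, 0, 0, 1],
        [1, 0, 1, 1, 1, 1, 0, 1],
    ]

    # split the test into odd- and even-numbered items and total each half per student
    odd_totals  = [sum(row[0::2]) for row in scores]
    even_totals = [sum(row[1::2]) for row in scores]

    half_reliability, _ = pearsonr(odd_totals, even_totals)  # correlation of the two halves
    whole_reliability = (2 * half_reliability) / (1 + half_reliability)  # Spearman-Brown
    print(round(half_reliability, 2), round(whole_reliability, 2))

The corrected whole-test value exceeds the half-test correlation whenever the halves are
positively but imperfectly correlated, which is why lengthening a test tends to raise
reliability.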
   INTERNAL CONSISTENCY
❑For internal consistency, the range of
reliability measures are rated as
follows:
 ❑0.00-0.49 *low reliability
 ❑0.50-0.80 *moderate reliability
 ❑0.81 above *high reliability
      SCORER or RATER CONSISTENCY
•People do not necessarily rate in a similar way.
•Certain characteristics of the raters contribute to errors like
 bias, halo effect, mood, fatigue, among others.
•Inter-rater reliability
    • it is the degree to which different raters, observers or judges agree
      in their assessment decision.
    • It is useful when grading essays, writing samples, performance
      assessment and portfolios.
    • To estimate inter-rater reliability, the Spearman’s rho (for ordinal
      data) or Cohen’s kappa (for nominal and discrete data) may be
      used.
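
The following Python sketch shows one way these statistics can be computed for two raters
scoring the same set of essays; the ratings are hypothetical and the scipy/scikit-learn
functions are just one possible choice of tools.

    # Hypothetical sketch: inter-rater reliability for two raters grading the same essays.
    from scipy.stats import spearmanr
    from sklearn.metrics import cohen_kappa_score

    rater_a = [4, 3, 5, 2, 4, 3, 5, 1]   # ordinal rubric scores from rater A
    rater_b = [4, 3, 4, 2, 5, 3, 5, 2]   # scores from rater B on the same papers

    rho, _ = spearmanr(rater_a, rater_b)          # Spearman's rho for ordinal data
    kappa  = cohen_kappa_score(rater_a, rater_b)  # Cohen's kappa treats scores as categories
    print(round(rho, 2), round(kappa, 2))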
   DECISION CONSISTENCY
•describes how consistent the classification
 decisions are rather than how consistent the
 scores are.
•seen in situations where teachers decide who
 will receive a passing or failing mark, or who is
 considered to possess mastery or not.
MEASUREMENT ERRORS
    MEASUREMENT ERRORS
•It can be caused by examinee-specific factors like
 fatigue, boredom, lack of motivation,
 momentary lapses of memory and
 carelessness.
•It can also be caused by test-specific factors.
•It can also arise due to scoring factors.
  MEASUREMENT ERRORS
❑CLASSICAL TEST THEORY
 ❑X= T + E
   ❑X is the observation (a measured score)
  ❑T is the true value
  ❑E is some measurement error
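
A minimal Python sketch of this idea, using invented numbers: each observed score X is a
hypothetical true score T plus a random error E, and averaging repeated observations
approximates the true value.

    # Hypothetical sketch of the classical test theory model X = T + E.
    import random

    random.seed(0)
    true_score = 80                                    # T: the (unknown) true value
    errors = [random.gauss(0, 3) for _ in range(50)]   # E: random measurement error
    observed = [true_score + e for e in errors]        # X = T + E for each occasion

    # The mean of many observed scores approximates the true value of the quantity.
    print(round(sum(observed) / len(observed), 1))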
RELIABILITY of ASSESSMENT METHODS
 RELIABILITY of ASSESSMENT METHODS
•Below are the ways to improve reliability of
 assessment results (Nitko & Brookhart, 2011)
 •Lengthen the assessment procedure by
  providing more time, more questions and more
  observation whenever practical.
  •Broaden the scope of the procedure by
   assessing all the significant aspects of the
   targeted learning performance.
❖Improve objectivity by using a systematic and more
 formal procedure for scoring student performance. A
 scoring scheme or rubric would prove useful.
❖Use multiple markers by employing inter-rater
 reliability.
❖Combine results from several assessments especially
 when making crucial educational decisions.
❖Provide sufficient time for students to complete the
 assessment procedure.
❖Teach students how to perform their best by
 providing practice and training to students and
 motivating them.
❖Match the assessment difficulty to the students’
 ability levels by providing tasks that are neither too
 easy nor too difficult and tailoring the assessment to
 each student’s ability level when possible.
❖Differentiate among students by selecting
 assessment tasks that distinguish or discriminate
 the best from the least able students.
  References
• https://irds.stanford.edu/sites/g/files/sbiybj10071/f/msmt.pdf
1. De Guzman, E.S., & Adamos, J.L. (2015). Assessment of
   learning 1. Adriana Publishing Co., Inc.
2. McMillan, J.H. (2018). Classroom assessment: Principles and
   practice that enhance student learning and motivation.
   Pearson Education, Inc.
3. Popham, W.J. (2017). Classroom assessment: What teachers
   need to know. Pearson Education, Inc.
                                                          Insert Running Title   52
UNIVERSITY OF SOUTHERN MINDANAO
   Module 3 Ethics
       Prof Ed 221 ASL 1
    Topic Outline
1. Principles of High-Quality Assessment
2. Validity
3. Reliability
4. Ethics/Fairness
5. Practicality and efficiency
Intended Learning Outcomes
1. Interpret principles of high-quality assessment.
2. Identify factors that make assessment valid, reliable, fair, ethical, practical and efficient.
ETHICS
CHAPTER 6
     Teachers' assessments have important long-term and short-term consequences for students; thus teachers have an ethical responsibility to make decisions using the most valid and reliable information possible.
                           -Russell & Airasian, 2012
  Students' Knowledge of Learning
  Targets and Assessments
• Transparency
  ➢disclosure of information to students about
   assessments.
  ➢This includes
     ➢what learning outcomes are to be assessed
       and evaluated
     ➢ assessment methods and formats
     ➢ weighting of items
     ➢ allocated time in completing the
       assessment
     ➢grading criteria or rubric.
Students' Knowledge of Learning
Targets and Assessments
• For written tests, it is important that students
 know what is included and excluded in the test.
• As for performance assessments, the criteria
 should be divulged prior to assessment so that
 students will know what the teacher is looking
 for in the actual performance or product.
Students' Knowledge of Learning
Targets and Assessments
• What about surprise tests or pop quizzes?
   • Graham's (1999) study revealed that unannounced quizzes raised the test scores of mid-range undergraduate students, and a majority of students in his sample claimed to appreciate the use of quizzes.
Students' Knowledge of Learning
Targets and Assessments
• What about surprise tests or pop quizzes?
   • Kamuche (2007) reported that students given unannounced quizzes showed better academic performance than a control group given announced quizzes.
   • Graham (cited by Kamuche, 2007) stated that unannounced quizzes tend to increase examination tension and stress and do not offer a fair examination.
Students' Knowledge of Learning
Targets and Assessments
• Test-taking skills are another concern.
• Teachers should not create unusual
 hybrids of assessment formats.
Opportunity to Learn
     • McMillan (2007) asserted that fair
       assessments are aligned with instruction that
       provides adequate time and opportunities for
       all students to learn.
     • Discussing an extensive unit in an hour is
       obviously insufficient.
Opportunity to Learn
     • Inadequate instructional approaches are unjust to learners because they are not given enough experiences to process information and develop their skills.
Prerequisite Knowledge and Skills
   • Students may perform poorly in an assessment if
     they do not possess background knowledge and
     skills.
   • It would be improper if students are tested on the
     topic without any attempt or effort to address the
     gap in knowledge or skills.
   • The problem is compounded if there are
     misconceptions. The need for action and correction
     is more critical.
Prerequisite Knowledge and Skills
    • The teacher can analyze the assessment items
      and procedures and determine the pieces of
      knowledge and skills required to answer
      them.
    • The teacher can administer a prior knowledge assessment, the results of which can lead to additional or supplemental teacher- or student-managed activities like peer-assisted study sessions, compensatory groups, note swapping and active review.
Prerequisite Knowledge and Skills
     • Another problem emerges if the assessment
      focuses heavily on prior knowledge and prerequisite
      skills.
     • So as not to be unfair, the teacher must identify
      early on the prerequisite skills necessary for
      completing an assessment.
  Prerequisite Knowledge and Skills
• The teacher may also provide clinics or reinforced tutorials to
 address gaps in students' knowledge and skills.
• He/she may also recommend reading materials or advise students to
 attend supplemental instruction sessions when possible.
Prerequisite Knowledge and Skills
     • At the undergraduate level, prerequisites
      are imposed to ensure that students
      possess background knowledge and skills
      necessary to advance and become
      successful in subsequent courses.
Avoiding Stereotyping
     • A stereotype is a generalization of a group
      of people based on inconclusive
      observations of a small sample of this
      group.
     • Common stereotypes are racial, sexual and
      gender remarks.
Avoiding Stereotyping
     • Stereotyping is caused by preconceived judgments about the people one comes in contact with, and it is sometimes unintended.
     • It is different from discrimination which
      involves acting out one's prejudicial
      opinions.
Avoiding Stereotyping
     • A professional education teacher may believe that, since the education program is dominated by females, females make better teachers than males.
     • Stereotypes may either be positive or
       negative.
Avoiding Stereotyping
     • Teachers should avoid terms and
      examples that may be offensive to
      students of different gender, race,
      religion, culture or nationality.
     • Stereotypes can affect students'
      performance in examinations.
Avoiding Stereotyping
     • In 1995, Steele & Aronson developed the theory
      of stereotype threat claiming that for people
      who are challenged in areas they deem
      important like intellectual ability, their fear of
      confirming negative stereotypes can cause them
      to falter in their actual test performance.
Avoiding Stereotyping
    • To reduce the negative effects of stereotype threat,
     simple changes in classroom instruction and assessment
     can be implemented.
    • A school environment that fosters positive practices and
     supports collaboration instead of competition can be
     beneficial especially for students in diverse classrooms
     where ethnic, gender and cultural diversity thrive.
Avoiding Stereotyping
    • Jordan & Lovett (2006) recommended five
      concrete changes to psycho-educational
      assessment to alleviate stereotype threats:
      ❖Be careful in asking questions about topics related to
        a student's demographic group. This may
        inadvertently induce stereotype threats even if the
        information presented in the test is accurate.
Avoiding Stereotyping
    ❖Place measures of maximal performance like ability
     and achievement tests at the beginning of
     assessments before giving less formal self-report
     activities that contain topics or information about
     family background, current home environment,
     preferred extracurricular activities and self-
     perceptions of academic functioning.
  Avoiding Stereotyping
❖Do not describe tests as diagnostic of intellectual capacity.
❖Consider the possibility of stereotype threat when interpreting the test scores of individuals susceptible to being typecast.
Avoiding Stereotyping
    ❖Determine if there are mediators of
     stereotype threat that affect test
     performance. This can be done using
     informal interviews or through
     standardized measures of cognitive
     interference and test anxiety.
Avoiding Bias in Assessment Tasks and
procedures
      • Assessment must be free from bias.
      • Fairness demands that all learners are given
       equal chances to do well (from the task) and
       get a good assessment (from the rater).
      • Teachers should not be affected by factors that
       are not part of the assessment criteria.
Avoiding Bias in Assessment Tasks and
procedures
      • This aspect of fairness also includes removal
       of bias towards students with limited
       English or with different cultural
       experiences when providing instruction and
       constructing assessments (Russell & Airasian,
       2012).
Avoiding Bias in Assessment Tasks and
procedures
      • There are two forms of assessment bias:
       offensiveness and unfair penalization (Popham,
       2011). These forms distort test performance of
       individuals in a group.
Avoiding Bias in Assessment Tasks and
procedures
      • Offensiveness happens if test-takers get
       distressed, upset or distracted about how an
       individual or a particular group is portrayed in
       the test.
      • They tend to focus on the offensive items and
       their concentration in answering subsequent
       items suffers.
  Avoiding Bias in Assessment Tasks and
  procedures
• Ultimately, they end up not performing as well as they could have,
 reducing the validity of inferences.
Avoiding Bias in Assessment Tasks and
procedures
      • Unfair penalization harms student performance due to test content, not because the items are offensive but because the content caters to particular groups (e.g., of the same economic class, race, or gender), leaving other groups at a loss or a disadvantage.
Avoiding Bias in Assessment Tasks and
procedures
      • Unfair penalization causes distortion and
       greater variation in scores which is not due to
       differences in ability.
      • Substantial variation or disparity in assessment
       scores between student groups is called
       disparate impact.
Avoiding Bias in Assessment Tasks and
procedures
      • Popham (2011) pointed out that disparate
       impact is not tantamount to assessment bias.
      • A difference in scores may still exist, but it may be due to inadequate prior instructional experience.
Avoiding Bias in Assessment Tasks and
procedures
      • If the test shows no signs of bias, then the disparate impact is likely due to prior instructional inadequacies or lack of preparation.
Avoiding Bias in Assessment Tasks and
procedures
      • To avoid bias during the instruction phase,
       teachers should heighten their sensitivity
       towards bias and generate multiple examples,
       analogies, metaphors and problems that cut
       across boundaries.
Avoiding Bias in Assessment Tasks and
procedures
      • Teachers can have their tests reviewed by
       colleagues to remove offensive words or
       items.
      • Content-knowledgeable reviewers can scrutinize the assessment procedure or each item of the test.
• In developing high-stakes tests, a review panel is usually formed - a mix of male and female members from various subgroups who might be adversely impacted by the test.
• On each item, the panelists are asked to determine if
  it might offend or unfairly penalize any group of
  students on the basis of personal characteristics.
• Each panel member responds and gives their
  comments.
• The mean per-item absence-of-bias index is calculated by getting the average of the "no" responses (illustrated in the sketch after this list).
• If an item is found biased, the item is discarded.
• Qualitative comments are also considered in the
 decision to retain, modify or reject items.
• Afterwards, the entirety of the test is checked
 for any bias.
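A minimal sketch of the per-item absence-of-bias index described above: each panelist answers "yes" or "no" to whether an item might offend or unfairly penalize any group, and the index is the proportion of "no" responses. The panel responses and the decision rule below are hypothetical.

```python
# Hypothetical panel review: one list of yes/no responses per item.
panel_responses = {
    "item_1": ["no", "no", "no", "no", "no"],
    "item_2": ["no", "yes", "no", "yes", "no"],   # two panelists flagged possible bias
}

for item, responses in panel_responses.items():
    absence_of_bias = responses.count("no") / len(responses)   # mean of "no" responses
    decision = "retain" if absence_of_bias == 1.0 else "review or discard"
    print(f"{item}: absence-of-bias index = {absence_of_bias:.2f} -> {decision}")
# Qualitative comments from the panel would still be weighed before a final decision.
```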
• As for the empirical approach, try-out evidence is
  sought.
• The test may be pilot-tested to different groups
  after which differential item functioning (DIF)
  procedures may be employed.
• A test item is labeled with DIF when people with
  comparable abilities but from different groups have
  unequal chances of item success.
• Item response theory (IRT), Mantel-Haenszel and logistic regression are statistical procedures commonly used to detect DIF.
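As a rough sketch of the Mantel-Haenszel idea behind DIF detection (the counts below are invented, and operational DIF analyses use specialized software): examinees are first matched on total score, and at each score level the odds of answering the item correctly are compared between a reference group and a focal group.

```python
# Hypothetical 2x2 tables per matched total-score level:
# (reference correct, reference wrong, focal correct, focal wrong)
score_levels = {
    "low":    (12, 8, 10, 10),
    "middle": (25, 5, 20, 10),
    "high":   (18, 2, 15,  5),
}

numerator = 0.0
denominator = 0.0
for ref_right, ref_wrong, foc_right, foc_wrong in score_levels.values():
    n = ref_right + ref_wrong + foc_right + foc_wrong
    numerator += ref_right * foc_wrong / n
    denominator += ref_wrong * foc_right / n

alpha_mh = numerator / denominator   # Mantel-Haenszel common odds ratio
print(f"MH common odds ratio: {alpha_mh:.2f}")   # values far from 1 suggest DIF
```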
Accommodating Special Needs
    • The legal basis for accommodation is contained
      in Sec. 12 of Republic Act 7277, entitled "An Act Providing for the Rehabilitation, Self-Development and Self-Reliance of Disabled Persons and Their Integration into the Mainstream of Society and for Other Purposes."
Accommodating Special Needs
    • Another is Sec. 32 of CHED Memorandum 09, s.
     2013 on "Enhanced Policies and Guidelines on
     Student Affairs and Services" which states that
     higher education institutions should ensure that
     academic accommodation is made available to
     persons with disabilities and learners with special
     needs.
Accommodating Special Needs
    • Accommodation does not mean giving
     advantage to students with learning disabilities
     but rather allowing them to demonstrate their
     knowledge on assessments without
     hindrances from the disabilities.
Accommodating Special Needs
   • Accommodations can be placed in one of six
     categories (Thurlow, McGrew, Tindal, Thompson
     & Ysseldyke, 2000)
     o Presentation (repeat directions, read aloud, use large
       print, braille)
     o Response (mark answers in test booklet, permit
       responses via digital recorder or computer, use
       reference materials like dictionary)
Accommodating Special Needs
    oSetting (study carrel, separate room, preferential seating, individualized or small group, special lighting)
    oTiming (extended time, frequent breaks, unlimited time)
Accommodating Special Needs
   oScheduling (specific time of day, subtests in
    different order, administer test in several timed
    sessions).
   oOthers (special test preparation techniques and
    out-of-level tests)
Accommodating Special Needs
  • To ensure the appropriateness of the
   accommodation supplied, it should take into
   account three important elements:
     ▪ Nature and extent of the learner's disability
     ▪ Type and format of assessment.
     ▪ Competency and content being assessed
Accommodating Special Needs
   • Nature and extent of the learner's disability
      ▪ Accommodation is dictated by the type and degree of
        disability possessed by the learner. A learner with
        moderate visual impairment would need a larger print
        edition of the assessment or special lighting condition.
        Of course, a different type of accommodation is needed
        if the child has severe visual loss.
Accommodating Special Needs
    • Type and format of assessment
      ▪ Accommodation is matched to the type and format
        of assessment given. Accommodations vary
        depending on the length of the assessment, the
        time allotted, mode of response, etc. A partially
        deaf child would not require assistance in a written
        test.
Accommodating Special Needs
    • Type and format of assessment
      ▪ However, his/her hearing impairment would affect
        his/her performance should the test be dictated.
        He/she would also have difficulty in assessment
        tasks characterized by group discussions like round
        table sessions.
Accommodating Special Needs
  • Competency and content being assessed
    ▪ Accommodation does not alter the level of performance
      or content the assessment measures. In Science,
      permitting students to have a list of scientific formulae
      during a test is acceptable if the teacher is assessing how
      students are able to apply the formulae and not simple
      recall.
Accommodating Special Needs
    • Competency and content being assessed
      ▪ In Mathematics, if the objective is to add and
        subtract counting numbers quickly, extended time
        would not be a reasonable accommodation.
  Relevance
• Relevance can also be thought of as an aspect of
 fairness.
• Irrelevant assessment would mean short-changing
 students of worthwhile assessment experiences.
• Assessment should be set in a context that students will find purposeful. Killen (2000) gave additional criteria for achieving quality in assessment:
Relevance
   • "Assessment should reflect the knowledge and skills that are most important for students to learn."
      • Assessment should not include irrelevant and trivial content. Instead, it should measure learners' higher-order abilities such as critical thinking, problem solving and creativity, which are 21st century skills.
Relevance
   • "Assessment should support every student's
     opportunity to learn things that are important."
      • Assessment must provide genuine opportunities for
       students to show what they have learned and
       encourage reflective thinking. It should prompt them
       to explore what they think is important.
Relevance
   • "Assessment should tell teachers and individual
     students something that they do not already
     know."
      • Assessment should stretch students' ability and
       understanding. Assessment tasks should allow
       them to apply their knowledge in new situations.
Ethical Issues
       • Grades and reports generated by teachers using invalid and unreliable test instruments are unjust. The resulting interpretations are inaccurate and misleading.
Ethical Issues
      • Other ethical issues in testing (and research)
       that may arise include possible harm to the
       participants; confidentiality of results;
        deception regarding the purpose and use of the
       assessment; and temptation to assist students
       in answering tests or responding to surveys.
End
  References
1. De Guzman, E.S., & Adamos, J.L. (2015). Assessment of learning 1. Adriana Publishing Co., Inc.
2. McMillan, J.H. (2018). Classroom assessment: Principles and practice that enhance student learning and motivation. Pearson Education, Inc.
3. Popham, W.J. (2017). Classroom assessment: What teachers need to know. Pearson Education, Inc.
4. https://irds.stanford.edu/sites/g/files/sbiybj10071/f/msmt.pdf
    UNIVERSITY OF SOUTHERN MINDANAO
Development of Traditional Tools for
  Classroom-Based Assessment
           Prof Ed 221-ASL 1
  Topic Outline
• Selected-response type items: Multiple-choice, binary-choice, and matching; Advantages, disadvantages, best practices
• Constructed-response type items: Completion, short-answer, and essay; Scoring criteria; Advantages, disadvantages, best practices
  Intended Learning Outcomes
• 5.1 Recognize the advantages and disadvantages of using different
  selected-response type items, including multiple-choice, binary-choice,
  and matching.
• 5.2 Identify appropriate practices in the construction of selected-
  response items.
• 5.3 Construct sound selected-response items that match the nature of
  the learning target that is assessed.
• 5.4 Recall the advantages and disadvantages of using different types of
  constructed-response items.
• 5.5 Identify appropriate practices for writing and/or selecting effective
  completion, short-answer, and essay type items.
• 5.6 Construct effective completion, short-answer, and essay type items,
  and scoring criteria.
PREPARING A TEST BLUEPRINT
FIGURE 7.1 Test Development Process for Classroom Tests:
1. Identify purpose of the test
2. Specify learning outcomes to be assessed
3. Prepare test specifications
4. Construct pool of items
5. Review and revise items
   What is a Test Blueprint?
• A test blueprint is a list of key components defining your test,
  including:
       1. The purpose of the test: It might be something simple, such as assessing knowledge prior to instruction to get a baseline of what students know before taking a course. Alternatively, the test purpose might be more complex, such as assessing retention of material learned across several professional education courses to determine eligibility for advancement.
  What is a Test Blueprint?
• A test blueprint is a list of key components defining your test,
  including:
      2. The content framework: Start with the topics presented first
during the instruction
  What is a Test Blueprint?
• A test blueprint is a list of key components defining your test,
  including:
       3. The testing time: This includes the amount of testing time available and the need for breaks, as well as other logistical issues related to the test administration.
   What is a Test Blueprint?
• A test blueprint is a list of key components defining your test,
  including:
       4. The content weighting (aka, number of items per content area): The
number of questions per topic category should reflect the importance of the
topic; that is, they should correlate with the amount of time spent on that topic in
the course. For example, if there are 20 one-hour lectures, there may be 10
questions from each hour of lecture or associated with each hour of expected
study. The number of questions per category can be adjusted up or down to
better balance the overall test content and represent the importance of each
lecture, as well as the total lecture time.
    What is a Test Blueprint?
• A test blueprint is a list of key components defining your test,
  including:
•       5. The item formats (e.g., MCQ, essay question): The item
    formats should always be appropriate for the purpose of the
    assessment.
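To illustrate the content weighting described in point 4 above, here is a small sketch that spreads a fixed number of items across topics in proportion to instructional time; the topics, hours, and test length are hypothetical.

```python
# Hypothetical lecture hours per topic and a planned test length of 50 items.
lecture_hours = {"measurement basics": 4, "validity": 6, "reliability": 6, "ethics": 4}
total_items = 50

total_hours = sum(lecture_hours.values())
for topic, hours in lecture_hours.items():
    items = round(total_items * hours / total_hours)   # items proportional to time spent
    print(f"{topic}: {items} items")
# As noted above, the counts can then be nudged up or down to better
# reflect the relative importance of each topic.
```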
  Benefits of Test Blueprints
Test blueprints will help ensure that your tests:
o Appropriately assess the instructional objectives of the course
o Appropriately reflect key course goals and objectives – the material
to be learned
o Include the appropriate item formats for the skills being assessed
  Benefits of Test Blueprints
Test blueprints can be used for additional purposes besides
test construction:
o Demonstrate to students the topics you value, and serve as a study guide
for them
o Facilitate learning by providing a framework or mental schema for
students
o Ensure consistent coverage of exam content from year to year
o Communicate course expectations to stakeholders
  Goals of Using TOS
▪ improving validity of a teacher’s evaluations based on a given
 assessment.
       Validity is the degree to which the evaluations or judgments we make as
teachers about our students can be trusted based on the quality of evidence we
gathered (Wolming & Wilkstrom, 2010)
       -It is important to understand that validity is not a property of the test
constructed, but of the inferences we make based on the information gathered
from a test.
 Sources of Classroom Assessment Validity
1. evidence based on test content -
   underscores the degree to which a
   test (or any assessment task)
   measures what it is designed (or
   supposed) to measure (Wolming &
   Wilkstrom, 2010)
 Sources of Classroom Assessment Validity
1. evidence based on test content -
    - we are interested in knowing if the measured (tested/assessed) objectives reflect what we claim to have measured
 Sources of Classroom Assessment Validity
2. evidence based on response process
    - is concerned with the alignment of
the kinds of thinking required of
students during instruction and during
assessment (testing) activities.
  References
• https://www.nbme.org/sites/default/files/2020-01/Test-Blueprinting-Lesson-2.pdf
• https://www.montclair.edu/profilepages/media/6109/user/v18n4a_AuthorsProof_final.docx
UNIVERSITY OF SOUTHERN MINDANAO
Module 5 – Measuring
 Learning Outcomes
       Prof Ed 221 ASL 1
  Topic Outline
• Goals, standards, learning competencies, and instructional objectives
• Taxonomy of Educational Objectives: Cognitive (revised Bloom's taxonomy); Affective; Psychomotor
• Learning outcomes and assessment methods
  Intended Learning Outcomes
• 4.1 Identify differences among goals, standards, learning
  competencies, and instructional objectives.
• 4.2 Distinguish learning outcomes in the three domains of
  learning.
• 4.3 Classify learning targets.
• 4.4 Match learning outcomes with appropriate assessment
  methods.
Goals, standards, learning
competencies, and instructional
objectives
      Educational Goals
• statements that describe the skills, competencies and qualities that you should possess upon completion of a course or program. It usually involves identifying objectives, choosing attainable short-term goals and then creating a plan for achieving those goals.
• Examples: Think positive to stay focused. Stay resilient. Make time to read. Manage your time. Find time to relax. Strive for excellence. Build a strong network. Build good study habits.
   Educational Standards
• learning goals for what students should know and be
  able to do at each grade level
       Educational Standards
1. Content Standards
     - define what students should know and be able to do,
specifying skills or knowledge at various grade levels (Marzano,
1996, 1997). In the past, schools often used whatever content
was found in their textbooks. With this reform, content
standards are defined by national subject areas associations,
local districts, or states. Schools are then expected to develop
curriculum standards within and across subjects.
       Educational Standards
2. Curriculum Standards
     - usually describe instructional techniques or classroom
activities that help students achieve the content standard
(Marzano, 1996, 1997). Curriculum standards are often
developed at each grade level in all the core subjects as well as
others as defined by the school or district. These curriculum
standards are aligned with content standards and identify what
goes on in classrooms to help students achieve the standard.
          Educational Standards
3. Performance Standards
       - specify the level of performance in a skill or area of knowledge that is considered
acceptable (Burger, 1996, 1997). These measurable expectations for performance, sometimes
termed "benchmarks," are aligned with both curriculum and content standards in each subject
area. In many schools the acceptable level of performance has been defined by teachers
focusing on their own classrooms. In standards-based reform, educators and other
stakeholders define acceptable levels of performance for all students. The issue of what to do
when students do not achieve a particular performance level remains one of the great
challenges of this reform. Should the student go through remediation, get held back, be
required to take summer school, be excluded from graduation, or receive some other sanction?
  Learning Competencies
• A general statement that describes the use of desired knowledge, skills, behaviors, and abilities. Competencies often define specific applied skills and knowledge that enable people to successfully perform specific functions in a work or educational setting.
  Learning Competencies
• Functional competencies: Skills that are used on a daily or regular basis, such as cognitive, methodological, technological, and linguistic abilities
• Interpersonal competencies: Oral, written, and visual
  communication skills, as well as the ability to work effectively
  with diverse teams
• Critical thinking competencies: The ability to reason
  effectively, use systems thinking, and make judgments and
  decisions toward solving complex problems
   Instructional Objectives
• An instructional objective is a statement that describes what the learner will be able to do after completing the instruction (Kibler, Cegala, Barker, & Miles, 1974).
• According to Dick and Carey (1990), a performance objective is a detailed description of
  what students will be able to do when they complete a unit of instruction. It is also
  referred to as a behavioral objective or an instructional objective.
• Robert Mager (1984), in his book Preparing Instructional Objectives, describes an
  objective as "a collection of words and/or pictures and diagrams intended to let others
  know what you intend for your students to achieve" (pg. 3). An objective does not
  describe what the instructor will be doing, but instead the skills, knowledge, and
  attitudes that the instructor will be attempting to produce in learners.
  Instructional Objectives
• Instructional objectives are specific, measurable, short-term, observable student
  behaviors. They indicate the desirable knowledge, skills, or attitudes to be
  gained.
• An instructional objective is the focal point of a lesson plan. Objectives are the
  foundation upon which you can build lessons and assessments and instruction
  that you can prove meet your overall course or lesson goals.
Learning outcomes and
assessment methods
Matching learning targets with assessment
                methods
     CONSTRUCTIVE ALIGNMENT
• provides the "how to" by verifying that the teaching and learning activities (TLAs) and the assessment tasks (ATs) activate the same verbs as in the intended learning outcomes (ILOs)
• Performance verbs in the ILOs are indicators of the methods of assessment suitable to measure and evaluate student learning
Matching learning targets with assessment
                methods
            LEARNING TARGET
                  • a description of performance that
                    includes what learners should know
                    and be able to do
                  • contains the criteria used to judge
Learning Target     student performance
                  • derived from national and local
                    standards
                   • similar to a learning outcome
   LEARNING TARGETS & ASSESSMENT METHODS (McMillan, 2007)

   TARGET                             | SR & BCR | E | PT | OQ | O | SSA
   Knowledge & Simple Understanding   |    5     | 4 | 3  | 4  | 3 |  3
   Deep Understanding & Reasoning     |    2     | 5 | 4  | 4  | 2 |  3
   Skills                             |    1     | 3 | 5  | 2  | 5 |  3
   Products                           |    1     | 1 | 5  | 2  | 4 |  4
   Affect                             |    1     | 2 | 4  | 4  | 4 |  5

   Note: Higher numbers indicate better matches (e.g., 5 = excellent, 1 = poor).
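The matrix above can also be handled programmatically; the sketch below simply restates the ratings as a Python dictionary (keeping the column abbreviations as given in the table) and, for a chosen learning target, returns the assessment methods with the highest match.

```python
# Ratings copied from the McMillan (2007) matrix above (5 = excellent, 1 = poor).
match = {
    "Knowledge & Simple Understanding": {"SR & BCR": 5, "E": 4, "PT": 3, "OQ": 4, "O": 3, "SSA": 3},
    "Deep Understanding & Reasoning":   {"SR & BCR": 2, "E": 5, "PT": 4, "OQ": 4, "O": 2, "SSA": 3},
    "Skills":                           {"SR & BCR": 1, "E": 3, "PT": 5, "OQ": 2, "O": 5, "SSA": 3},
    "Products":                         {"SR & BCR": 1, "E": 1, "PT": 5, "OQ": 2, "O": 4, "SSA": 4},
    "Affect":                           {"SR & BCR": 1, "E": 2, "PT": 4, "OQ": 4, "O": 4, "SSA": 5},
}

def best_methods(target: str) -> list:
    """Return the assessment methods rated highest for a given learning target."""
    ratings = match[target]
    top = max(ratings.values())
    return [method for method, rating in ratings.items() if rating == top]

print(best_methods("Skills"))   # -> ['PT', 'O']
```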
  Knowledge and simple understanding
• pertains to mastery of substantive subject matter and procedures
• covers the lower-order thinking skills of remembering, understanding and applying (Bloom's taxonomy)
Selected-response and brief constructed-response items
 • best in assessing lower-level learning targets in terms of coverage and efficiency
 • facts, concepts, principles and procedures lend themselves to pencil-and-paper tests quite well
   Knowledge and simple understanding
Essays
 • elicit original responses and response patterns
 • effective especially if students are required to organize, connect or
   integrate ideas
 • used to assess writing skills of students
Oral Questioning
 • assess knowledge and simple understanding but not as efficient as
   selected-response items
 • often used during instruction to check for mastery and
   understanding of a limited amount of factual information and
   provide immediate progress feedback.
        DEEP UNDERSTANDING AND
        REASONING
Reasoning
 • mental manipulation and use of knowledge in critical
   and creative ways
Deep Understanding and Reasoning
 • involve higher order thinking skills of analyzing,
   evaluating and synthesizing
  • Essays are best for assessing complex learning outcomes because students are required to demonstrate their reasoning and thinking skills
 • Oral questioning – can also be used but it is less time
   efficient than essays
 • Performance tasks are effective as well
DEEP UNDERSTANDING AND REASONING
E.g. Compare and contrast two topics or ideas; or Explain
the pros and cons of an argument
 • Through essays, teachers can detect errors in factual
   content, writing and reasoning.
Selected-response and brief-constructed response
 • demand more thought and time in crafting in order to
   target understanding rather than simple recall or rote
   memorization.
Interpretive exercise
 • consists of a series of objective items based on a given
   verbal, tabular or graphic information like a passage
   from a story, a statistical table or a pie chart.
       SKILLS
Performance assessment
• the superior assessment method
• authentic assessment - when used in real-life and
  meaningful context
• suited for applications with less-structured problems
  where problem identification; collection, organization,
  integration and evaluation of information; and
  originality are emphasized
• used when students are tasked to conduct oral
  presentation or physical performance or create a
  product
       PRODUCTS
assessed through performance tasks
substantial and tangible output that showcases a student's understanding of concepts and skills and their ability to apply, analyze, evaluate and integrate those concepts and skills.
PRODUCTS
• musical compositions
• stories
• research studies
• poems
• drawings
• model construction
• multimedia materials
  References
• https://www.indeed.com/career-advice/career-development/educational-goals-examples
1. De Guzman, E.S., & Adamos, J.L. (2015). Assessment of learning 1. Adriana Publishing Co., Inc.
2. McMillan, J.H. (2018). Classroom assessment: Principles and practice that enhance student learning and motivation. Pearson Education, Inc.
3. Popham, W.J. (2017). Classroom assessment: What teachers need to know. Pearson Education, Inc.
    UNIVERSITY OF SOUTHERN MINDANAO
Development of Traditional Tools for
  Classroom-Based Assessment
           Prof Ed 221-ASL 1
  Topic Outline
• Selected-response type items: Multiple-choice, binary-choice, and matching; Advantages, disadvantages, best practices
• Constructed-response type items: Completion, short-answer, and essay; Scoring criteria; Advantages, disadvantages, best practices
  Intended Learning Outcomes
• 5.1 Recognize the advantages and disadvantages of using different
  selected-response type items, including multiple-choice, binary-choice,
  and matching.
• 5.2 Identify appropriate practices in the construction of selected-
  response items.
• 5.3 Construct sound selected-response items that match the nature of
  the learning target that is assessed.
• 5.4 Recall the advantages and disadvantages of using different types of
  constructed-response items.
• 5.5 Identify appropriate practices for writing and/or selecting effective
  completion, short-answer, and essay type items.
• 5.6 Construct effective completion, short-answer, and essay type items,
  and scoring criteria.
PLANNING THE TEST
OVERALL TEST DEVELOPMENT PROCESS
  • The process of test construction for classroom testing
   applies the same initial steps in the construction of any
   instrument designed to measure a psychological
   construct.
FIGURE 7.1 Test Development Process for Classroom Tests:
1. Identify purpose of the test
2. Specify learning outcomes to be assessed
3. Prepare test specifications
4. Construct pool of items
5. Review and revise items
OVERALL TEST DEVELOPMENT PROCESS
   • Planning Phase - where the purpose of the test is identified, learning outcomes to be assessed are clearly specified and, lastly, a table of specifications is prepared to guide the item construction phase.
OVERALL TEST DEVELOPMENT PROCESS
 • Item Construction Phase - where test items are constructed
  following the appropriate item format for the specified
  learning outcomes of instruction.
OVERALL TEST DEVELOPMENT PROCESS
  • Review Phase -where items are examined by the teacher or
   his/her peers, prior to administration based on judgment of
   their alignment to content and behavior components of the
   instructional competencies, and after administration, based
   on an analysis of students' performance in each item.
Identifying Purpose of Test
• Testing as an assessment mechanism aims to gather valid and reliable information useful to both learners and teachers for formative as well as summative purposes.
• Classroom formative assessments seek to uncover what students know and can do, to get feedback on what they need to alter or work on further to improve their learning.
  Identifying Purpose of Test
• Feedback provided is used primarily to address specific student learning problems while instruction is still in progress (Russell & Airasian, 2012).
Multiple-choice items
• work very well in detecting and diagnosing the source of difficulty in terms of misconceptions and areas of confusion.
• Each option or alternative can represent a type of error that students are likely to commit.
Formative use of a classroom test for
diagnosis
• Alternatives or distracters in selected-response items can be in terms of popular falsehoods, misconceptions, misinterpretations, or inadequately stated principles students may likely adopt.
• By obtaining the option plausibility of the distracters, the teacher
 can identify what to reinforce in the lesson follow-up based on
 the most frequently chosen error.
 Summative use of classroom test
• The test considers the planned competencies to be developed in the
 unit of work.
• Consequently, the learners spend enormous time reviewing, recalling
 or re-learning their past lessons prior to testing.
• Their test motivation is contingent on the stake they put on testing,
 to "pass the test", to "pass the course", or to "get high grades".
Specifying the Learning Outcomes
• Defining learning has progressed from being simply an
 accumulation of facts to being able to allow the learner to
 interpret and apply such facts to create new knowledge.
• Developments in the assessment of learning have, of late, focused on multiple measures of student performance reflecting different levels of outcomes of the teaching-learning process.
  Specifying the Learning Outcomes
• The learning outcomes communicate both specific content and
 nature of tasks to be performed.
• Assessment then becomes a quality assurance tool for tracking student progress in attaining the curriculum standards in terms of content and performance (Enc. No. 1, DepEd Order No. 73, s. 2012).
Specifying the Learning Outcomes
• Processes for assessment recognize and address different learning
 targets defined by the intended outcomes from knowledge of facts
 and information covered by the curriculum at every level to various
 facets of showing understanding of them:
    • what operative processes or skills they can demonstrate,
    • what bigger and newer ideas they can form, and
    • what innovative products and processes they can create, including their authentic application in real life.
Specifying the Learning Outcomes
• Summative tests given at the end of an instructional
  process focus on the accomplishment of the learning
  outcomes demarcated in every unit of work designed
  in the curriculum.
• As the focus of assessment varies due to recognized
  levels of learning, so do the methods or techniques
  for assessment.
• Each learning outcome when properly stated,
  defines the behavior or task to be performed within
  a given content area.
Specifying the Learning Outcomes
• Classroom tests need to be carefully planned to ensure that they validly and reliably quantify what they are intended to measure.
  • Post instructional assessment tool expected to cover
    the curriculum standards of a subject or course, grade
    or year level in terms of measurable and demonstrable
    student outcomes.
  • Pre-instructional assessment tool which can diagnose
    what the learners know of the new lesson for
    instructional adjustment on the part of the teacher.
  Preparing a Test Blueprint
• Whatever the purpose of the test may be, a teacher must appropriately determine the learning outcomes to be assessed and how they will be assessed.
Preparing a Test Blueprint
• Carefully carrying out this planning phase helps teachers make genuine connections in the trilogy of curriculum, instruction and assessment.
• The curriculum dictates the instructional as well as assessment
 strategies to be applied while assessment informs both the
 curriculum and instruction what decisions to make to improve
 learning.
Preparing a Test Blueprint
• To assure the preparation of a good test, a test
 blueprint is commonly set up in a two-way Table of
 Specifications (TOS) that basically spells out WHAT
 will be tested and HOW it will be tested to obtain the
 information needed.
Preparing a Test Blueprint
• WHAT covers two aspects:
  • content area (i.e. subject matter) being covered
  • target learning outcomes (i.e. competencies)
• HOW specifies the test format:
  • the type of assessment question or task to be used
  • the item distribution to attain an effective and balanced sampling of the skills to be tested (a minimal sketch of such a table follows below)
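Here is a minimal sketch of the WHAT and HOW of a two-way table of specifications; the content areas, cognitive levels, and item counts are all hypothetical.

```python
# Hypothetical two-way TOS: content areas (rows) by cognitive levels (columns).
tos = {
    "Fractions":       {"Remembering": 3, "Applying": 5, "Analyzing": 2},
    "Decimals":        {"Remembering": 3, "Applying": 4, "Analyzing": 3},
    "Ratio & Percent": {"Remembering": 2, "Applying": 4, "Analyzing": 4},
}

planned_length = 30
actual_length = sum(sum(row.values()) for row in tos.values())
assert actual_length == planned_length, "item counts must add up to the planned test length"

for content_area, levels in tos.items():
    print(content_area, levels, "subtotal =", sum(levels.values()))
```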
•The length of test should be able to sample what
 students should know based on an outline of
 work and not on ease of constructing questions
 particularly for low level outcomes.
•The more important a learning outcome is, the more points should be allotted to it. McMillan (2007) suggests some rules of thumb in determining how many items are sufficient for good sampling.
• A minimum of ten items is needed to assess each
  knowledge learning target in a unit but which should
  represent a good cross-section of difficulty of items.
• However, if there are more specific learning targets to be tested, at least five items would be enough for each one to allow for criterion-referenced interpretation for mastery.
• Eighty percent (80%) of the items for a competency answered correctly is an acceptable mastery criterion.
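A minimal sketch of the criterion-referenced mastery rule above: with at least five items per specific learning target, a student is considered to have mastered a target when 80% or more of its items are answered correctly. The targets and responses below are invented.

```python
# Hypothetical per-target results for one student (True = item answered correctly).
results = {
    "adds similar fractions":  [True, True, True, True, False],          # 5 items
    "solves percent problems": [True, False, True, False, True, False],  # 6 items
}

MASTERY_CUTOFF = 0.80   # 80% of items correct, as suggested above

for target, answers in results.items():
    proportion = sum(answers) / len(answers)
    status = "mastered" if proportion >= MASTERY_CUTOFF else "not yet mastered"
    print(f"{target}: {proportion:.0%} correct -> {status}")
```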
  References
• https://www.ulethbridge.ca/teachingcentre/creating-
  selected-response-questions
UNIVERSITY OF SOUTHERN MINDANAO
      TOS Making
   What is a Table of Specifications?
   A Table of Specifications is a two-way
chart which describes the topics to be
covered by a test and the number of
items or points which will be associated
with each topic.
      Significance and components of a Table of
      Specification
Kubiszyn & Borich, (2003) emphasized the following
  significance and components of TOS:
1. A Table of Specifications consists of a two-way chart
    or grid relating instructional objectives to the
    instructional content.
    The columns of the chart list the objectives or
    "levels of skills" (Gredler, 1999) to be addressed;
    the rows list the key concepts or content the test is
    to measure. (A minimal sketch of such a grid appears after this list.)
According to Bloom, et al. (1971),
 "We have found it useful to represent the
relation of content and behaviors in the form
of a two dimensional table with the
objectives on one axis, the content on the
other”.
2. A Table of Specifications identifies not only the
 content areas covered in class, it identifies the
 performance objectives at each level of the
 cognitive domain of Bloom's Taxonomy.
Teachers can be assured that they are measuring
 students' learning across a wide range of content
 and readings as well as cognitive processes
 requiring higher order thinking.
3. A Table of Specifications is developed
 before the test is written. In fact it
 should be constructed before the actual
 teaching begins.
4. The purpose of a Table of
 Specifications is to identify the
 achievement domains being measured
 and to ensure that a fair and
 representative sample of questions
 appear on the test.
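
To make the row-and-column idea concrete, here is a minimal Python sketch of a two-way TOS grid; the content areas, cognitive levels, and item counts are purely hypothetical and do not come from any table in this module.

    # Minimal sketch of a two-way Table of Specifications: rows are content
    # areas, columns are cognitive levels, and each cell holds the number of
    # items planned. All topics, levels, and counts are hypothetical.
    levels = ["Remembering", "Understanding", "Applying"]
    tos = {
        "Parts of a plant":   {"Remembering": 4, "Understanding": 3, "Applying": 1},
        "Photosynthesis":     {"Remembering": 2, "Understanding": 4, "Applying": 2},
        "Plant reproduction": {"Remembering": 2, "Understanding": 2, "Applying": 2},
    }

    # Row totals show how many items each content area gets; column totals
    # show the emphasis given to each cognitive level across the whole test.
    for topic, cells in tos.items():
        print(f"{topic:20s} items: {sum(cells.values())}")
    for level in levels:
        print(f"{level:15s} items: {sum(row[level] for row in tos.values())}")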
Carey (1988) pointed out that the time
available for testing depended not only
 on the length of the class period but
  also on students' attention spans.
     Some rules of thumb exist for how long it takes most students
     to answer various types of questions according to Linn &
     Gronlund (2000):
1. A true-false test item takes 15 seconds to
   answer unless the student is asked to
   provide the correct answer for false
   questions. Then the time increases to 30-
   45 seconds.
2. A seven item matching exercise takes 60-
   90 seconds.
3. A four response multiple choice test
 item that asks for an answer regarding a
 term, fact, definition, rule or principle
 (knowledge level item) takes 30
 seconds. The same type of test item
 that is at the application level may take
 60 seconds.
4. Any test item format that requires
 solving a problem, analyzing,
 synthesizing information or
 evaluating examples adds 30-60
 seconds to a question.
5. Short-answer test items take 30-45
 seconds.
6. An essay test takes 60 seconds for
 each point to be compared and
 contrasted.
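
Putting these rules of thumb together, the sketch below estimates the total testing time for a draft test; the item counts are hypothetical, and the per-item times use the lower end of each range quoted above.

    # Minimal sketch: estimating total testing time from the Linn & Gronlund
    # (2000) rules of thumb above. Times are in seconds (lower end of each
    # quoted range); the draft test's item counts are hypothetical.
    SECONDS_PER_ITEM = {
        "true_false": 15,
        "matching_7_items": 60,
        "mc_knowledge": 30,
        "mc_application": 60,
        "short_answer": 30,
    }
    draft_test = {
        "true_false": 10,
        "matching_7_items": 1,
        "mc_knowledge": 20,
        "mc_application": 5,
        "short_answer": 5,
    }

    total_seconds = sum(SECONDS_PER_ITEM[kind] * count for kind, count in draft_test.items())
    print(f"Estimated testing time: {total_seconds / 60:.1f} minutes")  # 21.0 minutes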
   If an individual can perform the
most difficult aspects of the objective,
the instructor can "assume" the lower
levels can be done.
   However, if testing the lower levels, the
instructor cannot "assume" the individual
can perform the higher levels.
   The cornerstone of classroom assessment
practices is the validity of the judgments about
students’ learning and knowledge.
   A TOS is one tool that teachers can use to
support their professional judgment when
creating or selecting tests for use with their
students.
    In order to understand how to best modify a
TOS to meet your needs, it is important to
understand the goal of this strategy: improving
validity of a teacher’s evaluations based on a given
assessment. Validity is the degree to which the
evaluations or judgments we make as teachers
about our students can be trusted based on the
quality of evidence we gathered (Wolming & Wikström, 2010).
    From the literature we know that standardized
tests are valid.
    The question needs to be asked: are GPAs a valid
measure of student achievement?
    GPAs are based in large measure on teacher-made
tests. If teacher-made tests are not valid, how can a
student's GPA be valid?
    The use of a Table of Specifications can provide
teacher-made tests with validity (Notar, 2004).
    Lei, Bassiri and Schultz (2001) found that
college GPA was an unreliable predictor of
student achievement. Since we assume that
norm-referenced tests are valid measures,
the tendency is to put more weight on
those results concerning student
achievement.
   According to Ooster (2003), faculty-made
tests will likely have poor content validity, a
"cause for concern because each assessment
instrument depends on its validity more than
on any other factor."
      How can the use of a Table of Specifications
      benefit your students, including those with
      special needs?
   A Table of Specifications benefits students in two ways.
   First, it improves the validity of teacher-made tests.
   Second, it can improve student learning as well.
   A Table of Specifications helps to ensure that
there is a match between what is taught and
what is tested. Classroom assessment should be
driven by classroom teaching which itself is
driven by course goals and objectives.
   Tables of Specifications provide the link
between teaching and testing. (University of
Kansas, 2013)
     Teachers can collaborate with students, other
  teachers, or colleagues on the construction of the
  Table of Specifications:
❑ what are the main ideas and topics,
❑ what emphasis should be placed on each topic,
❑ what should be on the test?
     Open discussion and negotiation of these issues
 can encourage higher levels of understanding while
 also modeling good learning and study skills.
   Selecting and Constructing Test Items and
   Tasks
CATEGORIZING TEST TYPES
                                  "Selection     of    item
                             format is dictated by the
                             instructional        outcomes
                             intended to be assessed. There
                             are formats appropriate to
                             measuring knowledge and
                             simple understanding while
                             there are those fit to
                             measuring complex or deep
                             understanding."
Tree Chart of Test Types
        "Selection of item format is dictated by the instructional
        outcomes intended to be assessed. There are formats
        appropriate to measuring knowledge and simple
        understanding while there are those fit to measuring
        complex or deep understanding."
                                                                     Insert Running Title   24
      Relating Test Types with Levels of Learning Outcomes
➢   A review of curricular frameworks of educational systems across
    various countries shows common integral domains that govern
    their content and performance standards in different subject
    areas.
    A. Measuring Knowledge and Simple Understanding
➢ Knowledge, as it appears in cognitive taxonomies (Bloom, 1956;
  Anderson & Krathwohl, 2004) as the simplest and lowest level, is
  categorized further according to the thinking process involved in
  learning. Knowledge involves remembering or recalling specific
  facts, symbols, details, elements of events and principles in order
  to acquire new knowledge.
     The examples below will differentiate declarative and
procedural knowledge as simple understanding involving
comprehension and application.
        Nitko (2001) gives categories of these lower-order thinking
skills and some examples of generic questions for assessing them
(see Table 8.3). The generic questions can be useful in formatting
completion or short-answer items to assess simple understanding.
    B. Measuring Deep Understanding
Deep Understanding
   ➢ requires more complex thinking processes
   ➢ requires the three (3) higher cognitive levels, for example
     analyzing, evaluating and creating
   ➢ higher-order thinking skills
Simple Understanding
   ➢ involves the first three (3) cognitive levels, for example
     remembering, comprehending and applying
   ➢ lower-order thinking skills
      Table 8.5 illustrates the relationship between
learning outcomes and test types. It can be observed that
test types can be made flexible and versatile to test
different levels of outcomes and need not be limited or
exclusive to only one cognitive level. The arrows suggest
that supply or selection types can be used for both lower-
level as well as higher-level outcomes. Knowledge and
simple understanding can be handled by the objective supply
type - i.e. completion and short-answer items, and the
objective selection type - i.e. alternate choice, multiple
choice and matching.
        Note that deep understanding is assessed by the same
category of item formats but using non-objective types - i.e. essay
questions, both restricted and extended, modified selected-response
types - i.e. multiple interpretive items, and performance tasks.
What indeed matters is the careful construction of the item
elements (i.e. item stimulus and item response) to appropriately
elicit the cognitive processes involved. An elicitation device like a
question or a directive for a supply type can be used to assess both
low-level and high-level outcomes, in the same way that, with the
right construction of the stem and options for selected-response
types, both simple and complex forms of cognition can be activated.
Study the examples in the two boxes.
     Miller, Linn & Gronlund (2009) presents categories of
thought questions for deep understanding and sample item
stems in Table 8.6. These sample stems can be used in
constructing test types, i.e. both supply and selection type that
can elicit complex thinking skills.
       Performance tasks, as in the case of "letter writing,"
"producing a plan," and "story writing," likewise assess
high-level learning outcomes involving complex thought
processes, e.g. analyzing, evaluating, and creating.
Angelo & Cross (1993) have extensively designed
classroom assessment tasks (CATs) for the college level that
are performance-based in nature. Some examples
given in Table 8.7 were taken from their inventory.
Table 8.7 Examples of Performance Assessment Tasks for Advanced Level

Thinking Skill: Analysis
  1. Analytic Memos - writing a one- or two-page analysis of a specific
     problem or issue
  2. Pros and Cons Grid - making a list of pros and cons of a decision made
  3. Content, Form, and Function Outline - analyzing the what, how and why
     of the particular message of an advertisement or commercial

Thinking Skill: Evaluate
  1. Muddiest Point - identifying what students find least clear in a lesson,
     story, or demonstration
  2. Misconception Check - assessing students' prior beliefs that can hinder
     learning
  3. Empty Outline - recalling and organizing the important points of a
     lecture or reading

Thinking Skill: Create
  1. Application Card - designing an application of a learned scientific
     principle or procedure in the real world
  2. Directed Paraphrasing - translating what has been learned into one's own
     words or form for a specific audience
  3. Paper or Project Prospectus - writing a first structured draft of a paper
     or project
      Constructing Objective Supply Type of Items
         The item types falling under Supply type require the learners to
  construct a response to a question or directive. The sub-types however,
  differ in terms of the structure of the response needed to answer the
  item:
1. Completion Type
      Table 8.8 illustrates the usual item structure for Completion Type. An
item structure consists of a stimulus which defines the question or problem,
and a response which defines what is to be provided or constructed by the
learner. For a completion item, an incomplete statement with a blank is often
used as stimulus and the response is a constructed word, symbol, numeral or
phrase to complete the statement.
      Sometimes, instead of a set of independent incomplete
statements as the stimulus, a discourse with gaps is used to
make it more communicative. Gap-filling is another term for
this variant, as the student fills several gaps in a discourse
depending on the target outcome. Language teachers often
utilize this form for integrative testing where more than one
type of skill (e.g. vocabulary and comprehension skills) is
needed to fill in the gaps.
 ILO: Provide synonyms for target words in a paragraph.
Directions: Give a word that has the same meaning as the
word inside the parenthesis.
More than a few people may confuse fine dining with _______
(costly) dining in restaurants. Well-trained _______ (cooks) at the top of
their profession can make their good _______ (name) in these
places. It is the cooks who bring _______ (honor) to these
restaurants.
   Experts in test development agree on some helpful guidelines in the
  construction of Completion Items (Kubiszyn and Borich, 2010; McMillan,
  2007; Nitko, 2001; Popham, 2011)
A.There should only be one correct response to complete a statement.
       This contributes to efficiency in scoring since a key to correction
can easily be prepared in advance when there is only one expected
response. Proper wording of the incomplete statement must be
carefully done to avoid having more than one correct answer. Exception
to this rule is if you are testing for verbal creativity where giving diverse
but acceptable responses is desirable. This however, should be explicitly
mentioned in the test instructions. For instance, the more synonyms
students can give to the word costly like expensive, exorbitant, and
pricey the more points they can earn. Objective scoring will likely have
to be modified here.
     In Sample A of Table 8.8, if the target concept is
quadrilateral then its wording is all right. However, if the
target concept is square, the way it is worded may be open
to more than one acceptable answer. Quadrilateral,
rectangle and parallelogram can also be considered correct.
To improve the stem, it can be worded this way to eliminate
the other terms.
     A quadrilateral with four equal sides is called_______.
B. The blank should be placed at the end or towards the end
of the incomplete statement.
     This will provide the reader appropriate and adequate
context before s/he gets to answer the blank and consequently
avoids being perplexed. In Sample B, if the blank is placed at the
beginning like:
During the______ period, Dr. Jose Rizal wrote the novel, Noli
Me Tangere.
     It can possibly call for diverse and ambiguous answers like
troubled, colonial, or earlier, without reading the rest of the
statement.
C. Avoid providing unintended clues to the correct answer.
      The validity of a student's score is jeopardized when s/he
answers correctly an item without really knowing what the
correct response is. His/her score may represent a different
kind of ability apart from what is intended to be measured. This
happens when a student who doesn't know the answer would
find one by using unintended grammatical clues e.g. presence
of indefinite articles a or an before the blank to suggest a
response that starts with a vowel.
2. Short Answer Items
       Instead of supplying words to complete statements,
relatively short answers are constructed as direct answers to
questions. See Table 8.9 for the item structure. The sample
items are the same statements in Table 8.8 which have been
transformed into interrogative form. Being able to do this
illustrates the fact that both test types can be used to test
the same learning outcomes requiring the same cognitive
processes.
     Similar to completion type, the short answer items can
assess learners' declarative and procedural knowledge that require
such thinking processes as remembering, comprehending, and
applying. Writing short-answer items similarly follows the
guidelines in writing completion items. Here are those given by
McMillan (2007, pp.170-171) and they are quite self-explanatory.
1. State the item so that only one answer is correct.
2. State the item so that the required answer is brief. Requiring a
long response would not be necessary and it can limit the number
of items students can answer within the allotted period of time.
3. Do not use questions verbatim from textbooks and other
instructional materials. This will give undue disadvantage to
students not familiar with the materials since it can become a
memory test instead of comprehension.
4. Designate the units required for the answer. This frequently matters when
the constructed response requires a definite unit to be considered
correct. Without designating the unit, a response may be rendered wrong
because of a differing mind-set.
Example:
Poor: How much does the food caterer charge?
This could be answered in different ways, like cost per head, per dish, per
plate, or as a full package.
Improved: How much does the food caterer charge per head?
5. State the item succinctly with words students understand.
This is true for all types of tests. The validity of a classroom-based test
is at risk when students cannot answer correctly, not because
they do not know the answer, but because of the messy wording of the
question.
Poor: As viewed by creatures from the earth, when does the blood moon
appear in the evening?
Improved: When does a blood moon appear?
    The two supply types, completion and short
    answer items, share common points:
• Appropriate for assessing learning outcomes involving knowledge and simple
 understanding.
• Capable of assessing both declarative and procedural knowledge.
• Both are easy and simple to construct.
• Both are objectively scored since a key to correction can be prepared in
 advance.
• Both need an ample number of items to assess a learning outcome. A single
  completion or short-answer item is not sufficient to test mastery of a
  competency.
 Constructing Non-objective Supply Type
Essay Type
      Essay Type likewise belongs to the Supply category for the
simple reason that the required response is to be fully
constructed by the students. However, unlike the completion and
short-answer items, which are highly structured, essay items allow
the students to organize their responses freely, using their own
writing style to answer the question. This format therefore is
appropriate for testing
deep understanding and reasoning. Some of the thinking
processes to satisfactorily answer essay questions involve
comparison, induction, deduction, abstracting, analyzing
perspectives, decision-making, problem-solving, constructing
support and experimental inquiry (Marzano et al., 1993). They
actually involve higher-order thinking skills.
      There are two variations of essay items: restricted-
response and extended-response. Table 8.10 approximates a
structure for these two types of essay items. The same stimulus
structure can be used for both types as well as the expected
forms of response. Sample items are provided to illustrate the
variations.
  Suggestions for constructing essay questions are
  given by Miller, Linn & Gronlund (2009, p.243):
1. Restrict the use of essay questions to those learning outcomes
that cannot be measured satisfactorily by objective items.
       Objective items cannot measure such important skills as
ability to organize, integrate, and synthesize ideas showing one's
creativity in writing style. Use of essay format encourages and
challenges students to indulge in higher-order thinking skills
instead of simply rote memorization of facts and of remembering
inconsequential details.
    2. Construct questions that will call forth the skills specified in the
    learning standards.
            A review of learning standards in school curricula shows that they
    range from knowledge to deep understanding. The performance standards
    require the learners to demonstrate application of principles, analysis of
    experimental findings, evaluation of results and creation of new knowledge,
    and these are explicitly stated in terms of the expected outcomes at every
    grade level. Teachers are expected to make it part of direct instruction to
    teach students how to develop these competencies. Students cannot learn
    these skills instantly; they should be taught how to make these thinking
    skills visible. The essay questions to be constructed should then make the
    students model how they are to perform the thinking processes.
3. Phrase the question so that the student's task is clearly defined.
        Restricted-response type of essay questions especially states the
specific task to be done in writing. As much as possible, the students
should interpret the question in the same way according to what the
teacher expects through the specifications in the question. For instance,
if the teacher aims at testing students' ability to apply learned properties
of a substance for a specific purpose, the question could be stated
"Explain the property of copper that makes it good for making cooking
pans" instead of simply "Why is copper a good material?"
    4. Indicate an approximate time limit for each question.
       This should especially be considered when the test is a
    combination of objective and non-objective formats, such as the
    inclusion of essay questions. Knowing how much time is
    allotted to each one will make the students budget their time
    so they do not spend all of it on the first question and
    consequently miss out on the others.
  5. Avoid the use of optional questions.
        Some teachers have the practice of allowing the
  students to select one or two essay questions from a set of
  five questions. Some disadvantages of this practice may
  include: not being able to use the same basis for reporting test
  results, or students being able to prepare through
  memorization for those they will likely choose.
        Essay, among the test types, is quite frequently used
  because of the seeming ease of its construction, but it is the
  least preferred when it comes to scoring. Its reliability is
  challenged since its subjective scoring may be affected by
  such irrelevant factors as the corrector's mood and biases, the
  student's penmanship, the length of the response, and even the
  time of day of scoring. The use of a scoring guide, called a
  rubric, can significantly reduce subjectivity and more or less
  help in "objectifying" the scoring of a non-objective type of item.
        Basic in the preparation of rubrics is the selection of relevant
  scoring criteria to be used in evaluating the written output. Very often
  used to evaluate essays are clarity of message, organization, depth of
  understanding, creativity of ideas, grammatical accuracy, etc. If the
  relevant criteria are singled out and scored separately to show the
  learner's profile across these different dimensions or attributes,
  analytic scoring is applied. A 5-point or a 6-point rating scale is
  prepared for each attribute and a student's output is judged using an
  aggregate score. Table 8.11 illustrates this analytic scoring structure.
  This type is useful when giving feedback to the learners as it enables
  them to realize their strong and weak points, especially when they are
  made aware of the scoring criteria.
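
As a minimal sketch of this analytic scoring idea, the code below sums per-criterion ratings on a 5-point scale into one aggregate score; the criteria and the ratings are hypothetical and are not taken from Table 8.11.

    # Minimal sketch of analytic rubric scoring: each criterion is rated on a
    # 5-point scale and the aggregate is the sum of the ratings.
    # Criteria and ratings are hypothetical.
    CRITERIA = ["clarity of message", "organization", "depth of understanding",
                "creativity of ideas", "grammatical accuracy"]
    MAX_PER_CRITERION = 5

    def aggregate_score(ratings):
        # Sum the per-criterion ratings into one analytic score.
        return sum(ratings[c] for c in CRITERIA)

    student_ratings = {"clarity of message": 4, "organization": 3,
                       "depth of understanding": 4, "creativity of ideas": 5,
                       "grammatical accuracy": 3}
    print(aggregate_score(student_ratings), "out of", MAX_PER_CRITERION * len(CRITERIA))  # 19 out of 25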
           For judging a specific writing genre like an argument, the
    rubric shown in Table 8.12 can be adapted for analytical scoring.
     When the attributes are considered together to arrive at an
overall judgment or impression, holistic scoring is in use. For
ease of scoring, teachers often use a set of labels like excellent,
good, adequate, promising, weak, inadequate or the traditional
A, B, C, D, or F marks; however, this practice neither provides the
teachers with guidance for scoring nor the students with an
understanding of their score (Miller, Linn & Gronlund, 2009). What
is suggested is to have descriptions of the labels, as in the
example below:
  There are suggestions also given by Miller, Linn &
  Gronlund (2009, p.254) to improve the reliability of
  scoring    responses     to    essay     questions:
1. Prepare an outline of the expected answer in advance.
       Particularly for restricted-response types which define
specifically the task, having a list of the expected responses, e.g.
three principles or two theories to explain a phenomenon, will be
very useful to the teacher.
2. Use the scoring rubric that is most appropriate.
       The nature of the essay question and what is assessed should
identify the type of scoring rubric that should be used. Quite
significant too, is the kind of information that will be communicated by
the teacher to the students.
3. Decide how to handle factors that are irrelevant to the learning
outcomes being measured.
      As mentioned earlier in this chapter, scoring of essay could be
influenced by irrelevant factors like spelling or handwriting. Teachers
should decide in advance whether these factors are to be ignored or not.
4. Evaluate all responses to one question before going on to the next
one.
       Very likely, scoring of the subsequent question could be influenced
by the student's response to the preceding item. Consistency in scoring is
better attained when inter-comparison of responses of the members of
the class to the same item is done.
5. When possible, evaluate the answers without looking at the
student's name.
      An acceptable practice in testing is making students use a
separate sheet for their response to the essay questions and make them
write their names at the back of the answer sheet. Sometimes teachers
could be influenced by who the student is and some form of bias could
happen in favor of better or popular students.
6. If especially important decisions are to be based on the results,
obtain two or more independent ratings.
       To do away with scorer's bias, scoring could be reliably carried out
by two independent raters and the final score being the average of the
two ratings.
 Constructing Selected-response Types
      While supply formats require learners to construct their
responses to questions or directives, selected-response types
entail choosing the best or most correct option to
answer a problem. The greatest challenge for this item
format is the construction of plausible options or distracters
so that no single option stands out as obviously correct.
  There are three sub-types of the selected-response
 format depending on the number of given options:
1. Alternate form or binary choice provides only two options,
2. Multiple-choice type offers 3 to 5 options or solutions to a
   problem, and
3. Matching type gives a set of problems or premises and a
   set of options which will be appropriately paired.
    1. Binary Choice or Alternate Form
          Table 8.13 shows the variety of structure using the alternate form
    as suggested by Nitko (2001, p.136).
      Except for the Yes-No type, which uses direct questions, all other
varieties of binary-choice or alternate-choice items have
propositions as the item stimulus. According to what the students
have learned, the veracity of such propositional statements is
judged by the students, indicating whether they are true or false,
correct or incorrect, or whether they agree or disagree with the
thought or idea expressed. Requiring the students to modify or
qualify their responses, particularly for statements judged to be false
or incorrect, more or less challenges the reasoning ability of the
learners and raises the level of outcome that can be assessed.
     Ease in the construction of binary-choice items makes this
a popular choice when constructing items, especially for
knowledge-level outcomes. The propositions are mostly
content-based in nature, so teachers can easily referee the
correctness of the items. Sometimes the difficulty lies not only
in writing the propositions but also in preparing the key to
correction! There are suggestions given for constructing good
binary-choice items (McMillan, 2007; Musial, et al., 2009) in
order to avoid guessing:
2. Multiple-Choice Items
      The wide preference for this format in classroom testing is mainly due
to its versatility in assessing various levels of understanding, from
knowledge and simple understanding to deep understanding. McMillan
(2007) asserts that multiple choice can assess whether students can use
reasoning, similar to binary-choice or other reasoning tasks.
Cognitively demanding outcomes involving analysis and evaluation lend
themselves to the use of multiple-choice items.
      Although its construction may not be as easy as binary-
choice, its advantages far exceed what true/false questions can
offer. Aside from being able to assess various outcome levels,
they are easy to score, less susceptible to guessing than
alternate-choice and more familiar to students as they often
encounter them in different testing events (Musial et al., 2009).
       Table 8.14 illustrates the item structure of multiple-
choice. Its item stimulus consists of a stem which contains the
problem in the form of a direct question or an incomplete
statement and the options which offer the alternatives/
distracters from which to select the correct answer. The item
response is selecting the correct answer or best answer from
the options or distracters given. They are listed using letters
(i.e. A, B, C, D or a, b, c, d) or numerals (1, 2, 3, 4).
      Writing good multiple-choice items requires clarity in
stating the problem in the stem and plausibility or
attractiveness of the distracters. Test experts agree on a set of
guidelines to achieve this purpose (McMillan, 2007; Miller, Linn
& Gronlund, 2009; Popham, 2011).
   Stem
1. All the words of the stem should be relevant to the task. This means
stating the problem succinctly and clearly so students understand what is
expected to be answered.
2. Stem should be meaningful by itself and should fully contain the
problem. This should especially be observed when the stem uses an
incomplete statement format. Consider this stem:
The constitution is______________.
      A stem worded this way does not make definite the
conceptual knowledge being assessed. One does not know
what is being tested. Is it after the definition of the term, its
significance, or its history? A test of whether a stem is effectively
worded is whether it can be answered without the distracters. This
stem can be improved by changing its format to a direct
question or by adding more information to the incomplete
statement, like:
• What does the constitution of an organization provide?
   (Direct-question format)
• The constitution of an organization provides
     (Incomplete-statement format)
     This way, the test writer determines what knowledge
competence to focus on and what appropriate distracters to
use.
3. The stem should use a question with only one correct or
clearly best answer. Ambiguity sets in when the stem allows for
more than one best answer. Students will likely base their
answers on personal experience instead of on facts. Consider
this example. There could be more than one best answer here.
Poor:
Which product of Thailand makes it economically stable?
A. Rice
B. Dried fruits
C. Dairy products
D. Ready-to-wear
Improved:
Which agricultural product of Thailand is most productive for export?
A. Rice
B. Fish
C. Fruits
D. Vegetables
Distracters
1. All distracters should appear plausible to uninformed test
takers. This is the key to making the item discriminating and
therefore valid. The validity of the item suffers when there is an
option that is obviously correct, like option D, or obviously
wrong, like option B, in the following item.
Poor:
What is matter?
A. Everything that surround us.
B. All things bright and beautiful.
C. Things we see and hear.
D. Anything that occupies space and has mass.
      Quite interesting are the guidelines by Miller, Linn &
Gronlund (2009, p.212) in making distracters plausible. See Table
8.15.
2. Randomly assign correct answers to alternative positions.
Item writers have a tendency to assign the correct answer to
the third alternative as they run short of incorrect alternatives.
Students who are used to taking multiple-choice tests then
wisely choose option C when guessing, for a greater chance of
being correct. No deliberate order should be followed in
assigning the correct answers (e.g. ABCDABCD or AACCBBDD)
for ease of scoring. As much as possible, have an equal number
of correct answers distributed across the alternative positions.
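
One simple way to avoid a deliberate order is to let a short script place the correct answer randomly among the options, as in this minimal sketch; the item shown reuses the Thailand example above, and the helper is hypothetical rather than a standard tool.

    # Minimal sketch: randomly positioning the correct answer among the
    # options of a multiple-choice item so no deliberate order is followed.
    import random

    item = {"stem": "Which agricultural product of Thailand is most productive for export?",
            "key": "Rice",
            "distracters": ["Fish", "Fruits", "Vegetables"]}

    options = item["distracters"] + [item["key"]]
    random.shuffle(options)                      # correct answer lands in a random position
    letters = "ABCD"
    print(item["stem"])
    for letter, option in zip(letters, options):
        print(f"{letter}. {option}")
    print("Answer:", letters[options.index(item["key"])])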
3. Avoid using "All-of-the-above" or "None-of-the-above" as
distracters. Item writers think that using them adds difficulty to the
item since it is a way to test reasoning ability. However, students
without much thinking, will tend to choose these "of-the-above"
distracters haphazardly when they see at least two distracters as
correct incorrect without considering the remaining ones. When
forced to come up with a fourth plausible option and there seems
to be none available except "All-of-the-above" or "None-of-the-
above," do not make them as the correct answer.
3. Matching Items
      Of the three general selected-response item formats, matching
items appear differently. They consist of two parallel lists of words or
phrases the students are tasked to pair. The first list, which is to be
matched, is referred to as the premises, while the other list, from which to
choose the match based on a kind of association, is the options. Table 8.16
shows the item structure of matching items followed by two illustrative
items.
  The two illustrative items exemplify the guidelines
  in constructing matching items (Kubiszyn and Borich, 2010):
1. Keep the list of premises and the list of options homogenous or
   belonging to a category.
      In Sample 1, the premises are events associated with Philippine
presidents while the options are all names of presidents. In Sample 2,
Column A lists some theories in astronomy about how the universe has
evolved and Column B lists the names of the theories. Homogeneity is a
basic principle in matching items.
2. Keep the premises always in the first column and the options in the
second column.
       Since the premises are oftentimes descriptions of events, illustrations of
principles, functions or characteristics, they appear longer than the options,
which most of the time are names, categories, objects, and parts. Ordering
the two columns this way saves reading time for the students since they will
usually read one long premise once and select the appropriate match from a list
of short words. If ordered the opposite way, the students will read a short word
as the premise and then read through long descriptions to look for the correct
answer. Especially for Sample 2, the students will normally read a theoretical
postulate first and then logically go through the names of the theories given in
Column B. Imagine the time spent if the opposite process is done.
 3. Keep the lists in the two columns unequal in number. The basic reason
for this is to avoid guessing.
       The options in Column B are usually more than the premises in
Column A. If the two lists are equal in number, students can
strategically resort to wise elimination in finding the rest of the pairs.
There are matching items, however, where the options are much fewer
than the premises. This is recommended when testing the ability to classify.
For instance, Column A may be a list of 10 animals which are to be
classified and Column B could just be 4 categories of mammals. With
this format, it is important to mention in the test directions that an
option can be used more than once.
4. Test directions always describe the basis for matching.
       "Match Column A with Column B" is a no-no in matching type.
Describe clearly what is to be found in the two columns, how they are
associated and how matching will be done. Invalid scores of students
could be due to extraneous factors like misinterpretation of how
matching is to be done, misunderstanding in using given options (e.g.
using an option only once when the teacher allows use of an option
more than once), and limiting number of items to be answered when
there are few options given.
5. Keep the number of premises to not more than eight (8), as
shown in the two sample items.
      Fatigue sets in when there are too many items in a set
and again, test validity suffers. If an item writer feels that there
are many concepts to be tested, dividing them into sets is a
better strategy. It is also suggested that a set of matching
items should appear on one page only and not be carried over to
the next page. Frequently flipping the test papers just to look
for appropriate options requires additional time.
6. Ambiguous lists should be avoided.
      This is especially true in the preparation of options for the
second column. There should only be one option appropriately
associated with a premise unless it is unequivocally mentioned that
an option could be used more than once as mentioned in # 4. This
often occurs when matching events and places or events and
names, descriptions and characters. For instance, in a description
character matching, a premise like "mean to Cinderella" may
carelessly list "stepmother" and "stepsister" as options which are
both correct. Either the premise is improved or one option
removed.
      It can be seen that matching type as a test format is used
quite appropriately in assessing knowledge outcomes
particularly for recall of terminologies, classifications, and
remembering facts, concepts, principles, formulae, and
associations. Its main advantage is its efficiency in being able to
test several concepts using the same format.
                     Analyzing and Improving Tests
ADMINISTERING THE TEST
    The test is ready. All that remains is to get the students ready and hand out the
tests. Here is a series of suggestions to help your students psychologically prepare for
the test.
    1. Maintain a positive attitude
    2. Maximize achievement motivation
    3. Equalize advantages
    4. Avoid surprises
    5. Clarify the rules
    6. Rotate distribution
    7. Remind students to check their copies
    8. Monitor students
    9. Minimize distractions
    10. Give time warnings
    11. Collect tests uniformly
SCORING THE TEST
    Some general suggestions to save scoring time and improve scoring accuracy and
consistency:
   1. Prepare an answer key
   2. Check the answer key
   3. Score blindly
   4. Check machine-scored answer sheets
   5. Check scoring
   6. Record scores
ANALYZING THE TEST
     Quantitative Item Analysis
     A numerical method for analyzing test items employing student response
alternatives or options.
    Empirically-based Improvement Procedures
         Item-improvement using empirically-based methods is aimed at improving
    the quality of an item using students’ responses to the test. Test developers refer
    to this technical process as item analysis as it utilizes data obtained separately for
    each item. An item is considered good when its quality indices i.e. difficulty index
    and discrimination index, meet certain characteristics. For a norm-referenced
    test, these two indices are related since the level of difficulty of an item
    contributes to its discriminability. An item is good if it can discriminate between
    those who perform well in the test and those who do not. However, an extremely
    easy item, that which can be answered correctly by more than 85% of the group,
     or an extremely difficult item, that which can only be answered correctly by 15%,
     is not expected to perform well as a “discriminator”. The group will appear to be
     quite homogeneous with items of this kind. They are weak items since they do
     not contribute to “score-based inference”.
         Difficulty Index
              An item's difficulty index is obtained by calculating the p value (p), which
         is the proportion of students answering the item correctly.
              The difficulty of an item, or item difficulty, is defined as the number of
         students who are able to answer the item correctly divided by the total
         number of students:

                   p = R / T

         where p is the difficulty index, R is the total number of students answering
         the item right, and T is the total number of students answering the item.
Here are two illustrative samples:

Item 1: There were 45 students in the class who responded to Item 1 and 30
answered it correctly.
                  p = 30/45 = 0.67
Item 1 has a p value of 0.67. Sixty-seven percent (67%) got the item right while
33% missed it.

Item 2: In the same class, only 10 responded correctly to Item 2.
                  p = 10/45 = 0.22
Item 2 has a p value of 0.22. Out of 45, only 10 or 22% got the item right while
35 or 78% missed it.
 For a norm-referenced test: Between the two items, Item 2 appears to be the much
 more difficult item since less than a fourth of the class was able to respond
 correctly.
 For a criterion-referenced test: The class shows much better performance in Item 1
 than in Item 2. It is still a long way for many to master the outcome tested by Item 2.
Range of Difficulty Index          Interpretation              Action
0 - 0.25                           Difficult                   Revise or Discard
0.26 - 0.75                        Right Difficulty            Retain
0.76 - above                       Easy                        Revise or Discard
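
A minimal sketch of this computation and of the interpretation table above; the numbers reuse Item 1 from the illustrative samples.

    # Minimal sketch: item difficulty index p = R / T and its interpretation
    # using the ranges in the table above.
    def difficulty_index(num_right, num_answering):
        # p = R / T, the proportion of students answering the item correctly.
        return num_right / num_answering

    def interpret(p):
        if p <= 0.25:
            return "Difficult - revise or discard"
        elif p <= 0.75:
            return "Right difficulty - retain"
        return "Easy - revise or discard"

    p = difficulty_index(num_right=30, num_answering=45)   # Item 1 from the example
    print(f"p = {p:.2f}: {interpret(p)}")                  # p = 0.67: Right difficulty - retain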
Discrimination Index
     The power of an item to discriminate between informed and uninformed groups
or between more knowledgeable and less knowledgeable learners is shown using the
item discrimination index (D). This is an item statistic that can reveal
useful information for improving an item. Basically, an item-discrimination index shows the
relationship between the student's performance on an item (i.e. right or wrong) and
his/her total performance in the test represented by the total score.
     For classroom tests, the discrimination index shows if a difference exists between
the performance of those who scored high and those who scored low in an item. As a
general rule, the higher the discrimination index (D), the more marked the
magnitude of the difference is, and thus, the more discriminating the item is. The
nature of the difference however, can take different directions:
    a. Positively discriminating item - proportion of high scoring group is greater
     than that of the low scoring group.
    b. Negatively discriminating item - proportion of high scoring group is less than
     that of the low scoring group.
    c. Not discriminating - proportion of high scoring group is equal to that of the
     low scoring group.
       Calculation of the discrimination index therefore requires obtaining the
  difference between the proportion of the high-scoring group getting the item
  correct and the proportion of the low-scoring group getting the item correct,
  using this simple formula:

            D = (Ru / nu) - (Rl / nl)

  where Ru and Rl are the numbers of students in the upper and lower groups who
  answered the item correctly, and nu and nl are the numbers of students in the
  upper and lower groups.
       Another calculation can bring about the same result (Kubiszyn and Borich,
  2010):

            D = (Ru - Rl) / n,   where n is the number of students in each group.

       As you can see, Ru/n and Rl/n are actually the p values of the item for the
upper and lower groups. So to get D is to get the difference between the p value
involving the upper half and the p value involving the lower half. So the formula
for the discrimination index (D) can also be given as (Popham, 2011):

            D = p(upper half) - p(lower half)
    To obtain the proportions of the upper and lower groups responding to the item
correctly, the teacher follows these steps:
1. Score the test papers using a key to correction to obtain the total scores of the
   student. Maximum score is the total number of objective items.
2. Order the test papers from highest to lowest score.
3. Split the test papers into halves: high group and low group.
    ➢ For a class of 50 or less students, do a 50 - 50 split. Take the upper half as the
      HIGH GROUP and the lower as the LOW GROUP.
    ➢ For a big group of 100 or so, take the upper 25 - 27% and the lower 25 - 27%.
    ➢ Maintain equal numbers of test papers for Upper and Lower groups.
4. Obtain the p value for the upper group and p value for the lower group.
5. Get the discrimination index by getting the difference between the p-values.
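
A minimal sketch of these steps for a small class using a 50-50 split; the total scores and item responses below are hypothetical.

    # Minimal sketch of the upper/lower-group procedure for the
    # discrimination index D = p(upper) - p(lower).
    def discrimination_index(item_correct, total_scores):
        # item_correct[i] is 1 if student i answered the item right, else 0;
        # total_scores[i] is that student's total test score.
        order = sorted(range(len(total_scores)), key=lambda i: total_scores[i], reverse=True)
        half = len(order) // 2
        upper, lower = order[:half], order[-half:]      # 50-50 split for a small class
        p_upper = sum(item_correct[i] for i in upper) / half
        p_lower = sum(item_correct[i] for i in lower) / half
        return p_upper - p_lower

    # Hypothetical class of 10: total scores and whether each got the item right.
    totals = [38, 35, 33, 30, 28, 25, 22, 20, 18, 15]
    item1  = [ 1,  1,  1,  1,  0,  1,  0,  0,  1,  0]
    print(f"D = {discrimination_index(item1, totals):+.2f}")   # D = +0.40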
    For the purposes of evaluating the discriminating power of items, Popham (2011)
offers the guidelines proposed by Ebel & Frisbie (1991) shown in Table 1. The teachers
can be guided on how to select satisfactory items and what to do to improve the test.
Table 1. Guidelines for Evaluating the Discriminating Efficiency of Items
Discrimination Index        Item Evaluation
.40 and above               Very good items
.30 - .39                   Reasonably good items, but possibly subject to improvement
.20 - .29                   Marginal items, usually needing improvement
.19 and below               Poor items, to be rejected or improved by revision
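
Expressed as a small sketch, the guidelines in Table 1 translate into a simple classification of D values; the sample indices fed to it are hypothetical.

    # Minimal sketch: interpreting a discrimination index using the Ebel &
    # Frisbie (1991) guidelines in Table 1.
    def evaluate_item(d):
        if d >= 0.40:
            return "Very good item"
        if d >= 0.30:
            return "Reasonably good item, but possibly subject to improvement"
        if d >= 0.20:
            return "Marginal item, usually needing improvement"
        return "Poor item, to be rejected or improved by revision"

    for d in (0.45, 0.25, -0.35):        # hypothetical discrimination indices
        print(f"D = {d:+.2f}: {evaluate_item(d)}")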
       Items with negative discrimination indices, even when these are significantly
high, are subject right away to revision if not deletion. With multiple-choice items,
a negative D is forensic evidence of errors in item writing. It suggests the
possibility of:
     ✓ Wrong key - more knowledgeable students selected a distracter which is the
       correct answer but is not the keyed option
     ✓ Unclear problem in the stem leading to more than one correct answer
     ✓ Ambiguous distracters leading the more informed students to be divided in
       choosing among the attractive options
     ✓ Implausible keyed option which more informed students will not choose
     As you can see, awareness of item-writing guidelines can provide cues on how
     to improve items bearing negative or non-significant discrimination indices.
Distracter Analysis
      Another empirical procedure to discover areas for item-improvement utilizes an
analysis of the distribution of responses across the distracters. Especially when the
difficulty index and discrimination index of the item seem to suggest that it is a
candidate for revision, distracter analysis becomes a useful follow-up. It can detect
differences in how the more able students respond to the distracters in a multiple-choice
item compared to how the less able ones do. It can also provide an index of the
plausibility of the alternatives, that is, whether they are functioning as good distracters.
Distracters not chosen at all, especially by the uninformed students, need to be revised
to increase their attractiveness.
      To illustrate this process, consider the frequency distribution of the responses of
the upper group and lower group across the alternatives for two items. Separate
counts are done for the upper and lower groups who chose A, B, C, and D. The data
are organized in a distracter analysis table.
Table 2. Distracter Analysis Table (N = 40; * marks the keyed option)

Item   Difficulty   Discrimination   Group      A      B      C      D    Omit
       Index (p)    Index (D)
  1       .38           -.35         Upper      2     10     *5      3
                                     Lower      2      0     12      6
  2       .45           -.50         Upper      2     *4     10      4
                                     Lower      5     14      1      0
Analysis:
     ⚫ What kinds of items do you see based on their D?
     ⚫ What does their respective D indicate? Cite the data supporting this.
     ⚫ Which of the two items is more discriminating? Why?
     ⚫ Which items need to be revised?
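
A minimal sketch of how such a distracter analysis table can be tallied from raw responses; the class data, the keyed option, and the group sizes are hypothetical and are not the data in Table 2.

    # Minimal sketch: tallying a distracter analysis table from raw responses
    # and computing D for the keyed option. All data are hypothetical.
    from collections import Counter

    responses = [  # (group, chosen option) for one item in a small class
        ("Upper", "A"), ("Upper", "C"), ("Upper", "C"), ("Upper", "B"), ("Upper", "C"),
        ("Lower", "B"), ("Lower", "B"), ("Lower", "D"), ("Lower", "C"), ("Lower", "A"),
    ]
    KEY = "C"            # hypothetical keyed option
    N_PER_GROUP = 5

    counts = {"Upper": Counter(), "Lower": Counter()}
    for group, option in responses:
        counts[group][option] += 1

    for group in ("Upper", "Lower"):
        row = "  ".join(f"{opt}:{counts[group][opt]}" for opt in "ABCD")
        print(f"{group:5s} {row}")

    # Difference in the keyed option's p values gives the discrimination index.
    d = counts["Upper"][KEY] / N_PER_GROUP - counts["Lower"][KEY] / N_PER_GROUP
    print(f"D = {d:+.2f}")   # D = +0.40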
Activity 6.2
Task Description: This activity will test your ability to apply empirical procedures for
item-improvement. Solve and answer the following.
1.     A final test in Science was administered to a Grade IV class of 50. The teacher
wants to improve further the items for next year’s use. Calculate a quality index that
can be used using the given data and indicate the possible revision needed by some
items.
       Item      Number getting item correct        Index           Revision needed to be done
         1                   34                 ____________
         2                   18                 ____________
         3                   10                 ____________
         4                   46                 ____________
         5                    8                 ____________
2.      Below are additional data collected for the same items. Calculate another
quality index and indicate what needs to be improved with the obtained index as a
basis.
       Item      Upper Group      Lower Group         Index          Revision needed to be done
         1            25               9          ____________
         2             9               9          ____________
         3             2               8          ____________
         4            38               8          ____________
         5             1               7          ____________
3.     A distracter analysis table is given below for a test item administered to a class
of 60. Obtain the necessary item statistics from the given data.

Item     Difficulty   Discrimination   Group              Alternatives
N=30     Index (p)    Index (D)                  A     B    *C     D     Omit
  1          ?              ?          Upper     2    18     5     0
                                       Lower     0    10    20     0
Write your evaluation on the following aspects of the item:
a. Difficulty of the Item - _______________________________________________
b. Discriminating power of the Item - ____________________________________
c. Plausibility of Options - ______________________________________________
d. Ambiguity of the answer - ____________________________________________
Qualitative Item Analysis
    A non-numerical method for analyzing test items that does not rely on student
responses but instead considers test objectives, content validity, and technical item
quality.
    Judgmentally-based improvement procedures
         This approach basically makes use of human judgment in reviewing the
    items. The judges are (a) the teachers themselves, who know exactly what the test
    is for, the instructional outcomes to be assessed, and the level of difficulty
    appropriate to their class; (b) the teachers' peers or colleagues, who are familiar
    with the curriculum standards for the target grade level, the subject matter
    content, and the ability of the learners; and (c) the students themselves, who can
    perceive difficulties based on their past experiences.
         ◼ Teachers’ Own Review (Self-review)
              It is always advisable for teachers to take a second look at the assessment
              tool they have devised for a specific purpose. Presuming it is perfect right
              after its construction may lead to failure to detect shortcomings of
              the test or assessment task. Popham (2011, p. 253) gives five (5)
              suggestions for teachers to follow in exercising judgment:
     1.    Adherence to item-specific guidelines and general item-writing
     commandments. There are specific guidelines for writing the various objective
     and non-objective constructed-response types and the selected-response types
     for measuring lower-level and higher-level thinking skills. Teachers should use
     those guidelines to check how well the items have been planned and written,
     particularly their alignment with the intended instructional outcomes.
     2.    Contribution to score-based inference. The teacher examines whether the
     scores generated by the test can contribute to making valid inferences about
     the learners. Can the scores reveal the amount of learning achieved or show
     what has been mastered? Can the scores support an inference about the
     student's readiness to move on to the next instructional level? Or do the
     scores obtained make no difference at all in describing or differentiating
     various abilities?
     3.    Accuracy of content. This review should especially be considered
     when tests are reused after a certain period of time. Changes due to new
     discoveries or developments can redefine the content of a summative test.
     If this happens, the items or the answer key may have to be revisited.
    4.    Absence of content gaps. This review criterion is especially useful in
    strengthening the score-based inference capability of the test. If the current
    tool misses out on important content now prescribed by a new curriculum
    standard, the score will likely not give an accurate description of what is
    expected to be assessed. The teacher always ensures that the assessment
    tool matches what is currently required to be learned. This is a way to check
    on the content validity of the test.
     5.    Fairness. The discussions on item-writing guidelines always warn
     against unintentionally helping uninformed students obtain higher scores.
     This results from inadvertent grammatical clues, unattractive distracters,
     ambiguous problems, and messy test instructions. Sometimes, unfairness can
     happen because of undue advantage received by a particular group, such as
     those seated in front of the classroom or those coming from a particular
     socio-economic level. Getting rid of faulty and biased items and writing
     clear instructions definitely adds to the fairness of the test.
◼    Peer review
     There are schools that encourage peer or collegial review of assessment
instruments among themselves. Time is provided for this activity and it has
almost always yielded good results for improving tests and performance-based
assessment tasks. During these teacher dyad or triad sessions, those teaching the
     same subject area can openly review together the classroom tests and tasks they
     have devised against some consensual criteria. The suggestions given by test
     experts can actually be used collegially as a basis for a review checklist:
        a. Do the items follow the specific and general guidelines in writing items
    especially on:
             ◆ Being aligned to instructional objectives?
             ◆ Making the problem clear and unambiguous?
             ◆ Providing plausible options?
             ◆ Avoiding unintentional clues?
             ◆ Having only one correct answer?
        b. Are the items free from inaccurate content?
        c. Are the items free from obsolete content?
        d. Are the test instructions clearly written for students to follow?
         e. Is the level of difficulty of the test appropriate to the level of the learners?
        f. Is the test fair to all kinds of students?
    ◼     Student Review
         Engagement of students in reviewing items has become a laudable practice
    for improving classroom tests. The judgement is based on the students’
    experience in taking the test, their impressions and reactions during the testing
    event. The process can be efficiently carried out through the use of a review
    questionnaire. Popham (2011) illustrates a sample questionnaire shown in Table
    3. It is better to conduct the review activity a day after taking the test so the
    students still remember the experience when they see a blank copy of the test.
Table 3. Item-Improvement Questionnaire for Students.
1. If any of the items seemed confusing, which ones were they?
2. Did any items have more than one correct answer? If so, which ones?
3. Did any items have no correct answers? If so, which ones?
4. Were there words in any items that confused you? If so, which ones?
5. Were the directions for the test, or for particular subsections, unclear? If so, which
   ones?
Activity 6.3 Classifying Item-Improvement Approach. Below are descriptions of
procedures done. Write J if a judgmental approach is used and E if empirically based.
__________ 1. The Math Coordinator of Grade VII classes examined the periodical
tests prepared by the Math teachers to see if their items are aligned to the target
outcomes for the first quarter.
__________ 2. The alternatives of the multiple-choice items of the Social Studies test
were reviewed to discover if they only have one correct answer.
__________ 3. To determine if the items are efficiently discriminating between the
more able students and the less able ones, a Biology teacher obtained the
discrimination index (D) of the items.
__________ 4. A Technology Education teacher was interested to see if the criterion-
referenced test he has devised shows a difference in the items’ post-test and pre-test
p-values.
__________ 5. An English teacher conducted a session with his students to find out if
there are other responses acceptable in their literature test. He encouraged them to
rationalize their answers.
ITEM ANALYSIS MODIFICATIONS FOR THE CRITERION-
REFERENCED TEST
      The statistical test analysis method discussed earlier, called quantitative item
analysis, applies most directly to the norm-referenced test. The classroom teacher
will typically use criterion-referenced tests rather than norm-referenced tests. Well,
then, we can just use these same procedures for our teacher-made criterion-
referenced tests. Right? Wrong!
      As we will discover in later chapters, variability of scores is crucial to the
appropriateness and success of norm-referenced quantitative item analysis
procedures. In short, these procedures depend on the variability or spread of scores
(i.e., low to high) if they are to do their jobs correctly. In a typical teacher-made
criterion-referenced test, however, variability of scores would be expected to be
small, assuming instruction is effective and the test and its objectives match. Thus,
the application of quantitative item analysis procedures to criterion-referenced
measures may not be appropriate, since by definition most students will answer these
items correctly (i.e., there will be minimal variability or spread of scores). In this
section we will describe several ways in which these procedures can be modified when
a criterion-referenced, mastery approach to test item evaluation is employed. As you
will see, these modifications are straightforward and easier to use than the
quantitative procedures described earlier.
✓    Using Pre- and Post-tests as Upper and Lower Groups
     The following approaches require that you administer the test as a pretest prior
to your instruction and as a post-test after your instruction. Ideally, in such a situation
the majority of students should answer most of your test items incorrectly on the
pretest and correctly on the post-test. By studying the difference between the
difficulty (p) levels for each item at the time of the pre- and post-tests, we can tell if
this is happening. At pretest, the p level should be low (e.g., 0.30 or lower), and at
post-test, it should be high (e.g., 0.70 or higher). In addition, we can consider the
pretest results for an item as the lower group (L) and post-test results for the item as
the upper group (U), and then we can perform the quantitative item analysis
procedures previously described to determine the discrimination direction for the key
and for the distractors.
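A minimal sketch of this pre/post check follows. The 0.30 and 0.70 thresholds come from the passage above; the function names and the sample counts are illustrative assumptions. Treating the pretest as the lower group and the post-test as the upper group, the discrimination value reduces to the post-test p minus the pretest p.

```python
def p_value(n_correct, n_students):
    """Proportion of the class answering the item correctly."""
    return n_correct / n_students

def pre_post_check(pre_correct, post_correct, n_students,
                   pre_max=0.30, post_min=0.70):
    """Check a criterion-referenced item against the pre/post expectations.

    With the pretest as the lower group and the post-test as the upper group,
    the item 'discrimination' is simply post-test p minus pretest p.
    """
    p_pre = p_value(pre_correct, n_students)
    p_post = p_value(post_correct, n_students)
    d = p_post - p_pre
    ok = p_pre <= pre_max and p_post >= post_min and d > 0
    verdict = "looks like a good item" if ok else "candidate for revision"
    return round(p_pre, 2), round(p_post, 2), round(d, 2), verdict

# Hypothetical item: 6 of 30 students correct before instruction, 26 of 30 after.
print(pre_post_check(6, 26, 30))   # (0.2, 0.87, 0.67, 'looks like a good item')
```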
    If a criterion-referenced test item manifests these features, it has passed our
“test” and probably is a good item with little or no need for modification. Contrast this
conclusion, however, with the following item from the same test.
    Thus, the item in Example 2 failed all the tests. Rather than modify the item, it is
probably more efficient to replace it with another.
✓   Comparing the Percentage Answering Each Item Correctly on Both Pre- and
    Post-test
    If your test is sensitive to your objectives (and assuming you teach to your
objectives), the majority of learners should receive a low score on the test prior to your
instruction and a high score afterward. This method can be used to determine
whether this is happening. Subtract the percentage of students passing each item
before your instruction from the percentage of students passing each item after your
instruction. The more positive the difference, the more you know the item is tapping
the content you are teaching. This method is similar to the first step as described in
the preceding section. For example, consider the following percentages for five test
items:
       Notice that item 3 registers no change in the percentage of students passing
from before to after instruction. In fact, a high percentage of students got the item
correct without any instruction! This item may be eliminated from the test, since little
or no instruction pertaining to it was provided and most students already knew the
content it represents.
     Now, look at item 5. Notice that the percentage is negative. That is, 14% of the
class actually changed from getting the item correct before instruction to getting it
wrong after. Here, either the instruction was not related to the item or it actually
confused some students who knew the correct answer beforehand. A revision of the
item, the objective pertaining to this item, or the related instruction is in order.
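The subtraction described in this section can be done in a few lines. In the sketch below, the percentages are hypothetical stand-ins (the module's own five-item table is not reproduced here); they are chosen only so that item 3 shows no change and item 5 shows the negative 14% change discussed above.

```python
def pre_post_difference(pct_before, pct_after):
    """Percentage passing after instruction minus percentage passing before."""
    return {item: pct_after[item] - pct_before[item] for item in pct_before}

# Hypothetical percentages of students passing each item before and after instruction.
before = {1: 20, 2: 35, 3: 80, 4: 10, 5: 60}
after  = {1: 85, 2: 78, 3: 80, 4: 72, 5: 46}
for item, diff in pre_post_difference(before, after).items():
    print(item, f"{diff:+d}%")   # item 3 shows +0% (no change); item 5 shows -14%
```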
✓ Determining the Percentage of Items Answered in the Expected Direction for
     the Entire Test
     Another, slightly different approach is to determine whether the entire test
reflects the change from fewer to more students answering items correctly from pre-
to post-test. This index uses the number of items each learner failed on the test prior
to instruction but passed on the test after instruction. Here is how it is computed:
     Step 1: Find the number of items each student failed on the pretest, prior to
instruction, but passed on the post-test, after instruction.
    The asterisks indicate just the items counted in Step 1 for Bobby. This count is
then repeated for each student.
    Step 2: Add the counts in Step 1 for all students and divide by the number of
students.
     Step 3: Divide the result from Step 2 by the number of items on the test.
     Step 4: Multiply the result from Step 3 by 100.
    Let’s see how this would work for a 25-item test given to five students before and
after instruction.
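The four steps can be carried out mechanically once each student's item-by-item results are recorded for both administrations. The sketch below uses hypothetical data for a short five-item test rather than the module's 25-item example; 1 means the item was answered correctly and 0 means it was not, and the names are illustrative.

```python
def expected_direction_index(pre, post):
    """Percentage of items answered in the expected direction for the entire test.

    pre and post: dict mapping student -> list of 0/1 item scores (same item order).
    Step 1: count items each student failed on the pretest but passed on the post-test.
    Step 2: average those counts over students.
    Step 3: divide by the number of items.  Step 4: multiply by 100.
    """
    n_items = len(next(iter(pre.values())))
    counts = []
    for student in pre:
        changed = sum(1 for b, a in zip(pre[student], post[student]) if b == 0 and a == 1)
        counts.append(changed)                    # Step 1, per student
    average = sum(counts) / len(counts)           # Step 2
    return average / n_items * 100                # Steps 3 and 4

# Hypothetical 5-item test for three students (0 = wrong, 1 = right).
pre  = {"Ana": [0, 0, 1, 0, 0], "Ben": [0, 1, 0, 0, 0], "Cai": [0, 0, 0, 0, 1]}
post = {"Ana": [1, 1, 1, 1, 0], "Ben": [1, 1, 1, 0, 1], "Cai": [1, 0, 1, 1, 1]}
print(round(expected_direction_index(pre, post), 1))   # 60.0
```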
DEBRIEFING GUIDELINES
        Before handing back answer sheets or grades, you should do the following.
        1. Discuss problem items.
        2. Listen to student reactions.
        3. Avoid on-the-spot decisions.
        4. Be equitable with changes.
        5. Ask students to double-check.
        6. Ask students to identify problems.
THE PROCESS OF EVALUATING CLASSROOM ACHIEVEMENT
     Figure 2 summarizes all of the important components of achievement testing
that we have discussed thus far. If you've studied and worked through these chapters,
you are ahead in the test construction game. What that means for you is better tests
that draw fewer complaints from students and parents and that are more valid and
reliable measurements of achievement.
   Figure 2. The process of measuring achievement in the classroom. (Kubiszyn &
                               Borich, 2013 p. 240)
                                   References
[1] De Guzman, E. S., & Adamos, J. L. (2015). Assessment of learning 1. Adriana
    Publishing Co., Inc.
[2] Kubiszyn, T., & Borich, G. (2013). Educational testing and measurement:
    Classroom application and practice. John Wiley & Sons, Inc.
[3] Navaroo, R. L., Santos, R. G., & Corpuz, B. B. (2019). Assessment of learning 1.
    Lorimar Publishing, Inc.
[4] Popham, W. J. (2017). Classroom assessment: What teachers need to know.
    Pearson Education, Inc.
  UNIVERSITY OF SOUTHERN MINDANAO
Interpretation and Utilization of
       Assessment Data
        Prof Ed 221 - ASL 1
  Topic Outline
● Types and interpretation of test scores
● Grading and reporting
● Issues in assessment and grading
     Intended Learning Outcomes
1. Provide meaning to test results using norm-referenced and
   criterion-referenced interpretations.
2. Utilize assessment results to report students’ learning
   progress and achievement.
3. Analyze issues in assessment and grading employing
   principles of assessment.
 Aim
To provide results in brief, understandable form for varied users.
  The big questions
1. What should I count—just achievement, or effort too?
2. How do I interpret a student’s score? Do I compare it to:
   • other students’ scores (norm-referenced),
   • a standard of what they can do (criterion-referenced),
   • or some estimate of what they are able to do (learning potential, or
     self-referenced)?
3. What should my distribution of grades be, and how do I determine it?
4. How do I display student progress, or strengths and weaknesses, to
students and their parents?
  Where do I get the answers?
1. Your school may have some policies or guidelines
2. Apply what you learn in this chapter
3. Consult your teaching colleagues, and then apply your good judgment
4. Learn from first-hand experience
     Functions of Grading and Reporting
     Systems
1. Improve students’ learning by:
✓ clarifying instructional objectives for them
✓ showing students’ strengths & weaknesses
✓ providing information on personal-social development
✓ enhancing students’ motivation (e.g., short-term goals)
✓ indicating where teaching might be modified
Best achieved by: day-to-day tests and feedback, plus periodic integrated summaries
    Functions of Grading and Reporting
    Systems
2. Reports to parents/guardians
✓ Communicates objectives to parents, so they can help promote learning
✓ Communicates how well objectives are being met, so parents can better plan
     Functions of Grading and Reporting
     Systems
3. Administrative and guidance uses
✓ Help decide promotion, graduation, honors, athletic
 eligibility
✓ Report achievement to other schools or to employers
✓ Provide input for realistic educational, vocational, and
 personal counseling
     Types of Grading and Reporting
     Systems
1. Traditional letter-grade system
✓ Easy to use, and grades can be averaged
✓ But of limited value when used as the sole report, because:
   1. they end up being a combination of achievement, effort, work habits, and behavior
   2. teachers differ in how many high (or low) grades they give
   3. they are therefore hard to interpret
   4. they do not indicate patterns of strength and weakness
     Types of Grading and Reporting
     Systems
2. Pass-Fail
✓ Easy to use, and grades can be averaged
✓ But of limited value when used as the sole report, because:
   1. they end up being a combination of achievement, effort, work habits, and behavior
   2. teachers differ in how many high (or low) grades they give
   3. they are therefore hard to interpret
   4. they do not indicate patterns of strength and weakness