ITEM ANALYSIS AND
VALIDATION
LEARNING OUTCOMES
1. Explain the meaning of item analysis, item validity, reliability, item difficulty, and discrimination index.
2. Determine the validity and reliability of given test items.
3. Determine the quality of a test item by its difficulty index, discrimination index, and plausibility of options (for a selected-response test).
INTRODUCTION
THE TEACHER NORMALLY PREPARES A DRAFT OF THE TEST. SUCH A
DRAFT IS SUBJECTED TO ITEM ANALYSIS AND VALIDATION IN ORDER
TO ENSURE THE FINAL VERSION OF THE TEST WOULD BE USEFUL AND
FUNCTIONAL. THE ITEM ANALYSIS WILL PROVIDE INFORMATION THAT
WILL ALLOW THE TEACHER TO DECIDE WHETHER TO REVISE OR
REPLACE AN ITEM. FINALLY, THE REVISED DRAFT OF THE TEST IS
SUBJECTED TO VALIDATION IF THE INTENT IS TO MAKE USE OF THE
TEST AS A STANDARD TEST FOR THE PARTICULAR UNIT OR GRADING
PERIOD.
6.1. ITEM ANALYSIS: DIFFICULTY INDEX AND DISCRIMINATION INDEX
THERE ARE TWO IMPORTANT CHARACTERISTICS OF AN ITEM THAT WILL BE OF INTEREST TO THE TEACHER.
THESE ARE: (A) ITEM DIFFICULTY AND (B) DISCRIMINATION INDEX. WE SHALL LEARN HOW TO MEASURE
THESE CHARACTERISTICS AND APPLY OUR KNOWLEDGE IN MAKING A DECISION ABOUT THE ITEM IN
QUESTION.
THE DIFFICULTY OF AN ITEM OR ITEM DIFFICULTY IS DEFINED AS THE NUMBER OF STUDENTS WHO ARE
ABLE TO ANSWER THE ITEM CORRECTLY DIVIDED BY THE TOTAL NUMBER OF STUDENTS. THUS:
ITEM DIFFICULTY = NUMBER OF STUDENTS WITH CORRECT ANSWER/ TOTAL NUMBER OF STUDENTS
EXAMPLE: WHAT IS THE ITEM DIFFICULTY INDEX OF AN ITEM IF 25 STUDENTS ARE UNABLE TO ANSWER IT
CORRECTLY WHILE 75 ANSWERED IT CORRECTLY?
HERE, THE TOTAL NUMBER OF STUDENTS IS 100, HENCE THE ITEM DIFFICULTY INDEX IS 75/100 OR 75%.
ONE PROBLEM WITH THIS TYPE OF DIFFICULTY INDEX IS THAT IT MAY NOT ACTUALLY INDICATE
THAT THE ITEM IS DIFFICULT (OR EASY). A STUDENT WHO DOES NOT KNOW THE SUBJECT MATTER
WILL NATURALLY BE UNABLE TO ANSWER THE ITEM CORRECTLY EVEN IF THE QUESTION IS EASY.
HOW DO WE DECIDE ON THE BASIS OF THIS INDEX WHETHER THE ITEM IS TOO DIFFICULT OR TOO
EASY?
THE FOLLOWING ARBITRARY RULE IS OFTEN USED IN THE LITERATURE:
RANGE OF DIFFICULTY INDEX    INTERPRETATION      ACTION
0 - 0.25                     DIFFICULT           REVISE OR DISCARD
0.26 - 0.75                  RIGHT DIFFICULTY    RETAIN
0.76 AND ABOVE               EASY                REVISE OR DISCARD
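The computation and the rule of thumb above can be sketched in Python. This is a minimal illustration; the function names are chosen for this module, not taken from any standard library:

```python
def difficulty_index(num_correct, num_students):
    """Proportion of students who answered the item correctly."""
    return num_correct / num_students

def interpret_difficulty(p):
    """Apply the rule-of-thumb ranges from the table above."""
    if p <= 0.25:
        return "difficult - revise or discard"
    if p <= 0.75:
        return "right difficulty - retain"
    return "easy - revise or discard"

# The worked example: 75 of 100 students answered correctly.
p = difficulty_index(75, 100)
print(p, "->", interpret_difficulty(p))  # 0.75 -> right difficulty - retain
```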
DIFFICULT ITEMS TEND TO DISCRIMINATE BETWEEN THOSE WHO KNOW AND THOSE WHO DO NOT
KNOW THE ANSWER. CONVERSELY, EASY ITEMS CANNOT DISCRIMINATE BETWEEN THESE TWO
GROUPS OF STUDENTS. WE ARE THEREFORE INTERESTED IN DERIVING A MEASURE THAT WILL
TELL US WHETHER AN ITEM CAN DISCRIMINATE BETWEEN THESE TWO GROUPS OF STUDENTS.
SUCH A MEASURE IS CALLED AN INDEX OF DISCRIMINATION.
DISCRIMINATION INDEX IS THE DIFFERENCE BETWEEN THE PROPORTION OF THE TOP SCORERS
WHO GOT AN ITEM CORRECT AND THE PROPORTION OF THE LOWEST SCORERS WHO GOT THE ITEM
RIGHT. THE DISCRIMINATION INDEX RANGE IS BETWEEN -1 AND +1. THE CLOSER THE
DISCRIMINATION INDEX IS TO +1, THE MORE EFFECTIVELY THE ITEM CAN DISCRIMINATE OR
DISTINGUISH BETWEEN THE TWO GROUPS OF STUDENTS. A NEGATIVE DISCRIMINATION INDEX
MEANS THAT MORE STUDENTS FROM THE LOWER GROUP THAN FROM THE UPPER GROUP GOT THE
ITEM CORRECT. SUCH AN ITEM IS NOT GOOD AND MUST BE DISCARDED.
INDEX RANGE      INTERPRETATION              ACTION
-1.0 - -0.50     NEGATIVE DISCRIMINATION     DISCARD
-0.49 - 0.45     NON-DISCRIMINATING          REVISE
0.46 - 1.0       DISCRIMINATING ITEM         INCLUDE
EXAMPLE: CONSIDER A MULTIPLE CHOICE TYPE OF TEST OF WHICH THE FOLLOWING DATA WERE OBTAINED:
ITEM          OPTIONS
              A     B*    C     D
1             0     40    20    20
UPPER 25%     0     15    5     0
LOWER 25%     0     5     10    5
THE CORRECT RESPONSE IS B. LET US COMPUTE THE DIFFICULTY INDEX AND INDEX OF DISCRIMINATION:
DIFFICULTY INDEX = NO. OF STUDENTS GETTING CORRECT RESPONSE/TOTAL
= 40/100 = 40%, WITHIN RANGE OF A "GOOD ITEM"
THE DISCRIMINATION INDEX CAN SIMILARLY BE COMPUTED:
DU = NO. OF STUDENTS IN UPPER 25% WITH CORRECT
RESPONSE/NO. OF STUDENTS IN THE UPPER 25%
= 15/20 = .75 OR 75%
DL = NO. OF STUDENTS IN LOWER 25% WITH CORRECT
RESPONSE/ NO. OF STUDENTS IN THE LOWER 25%
= 5/20 = .25 OR 25%
DISCRIMINATION INDEX = DU - DL = .75 - .25 = .50 OR 50%.
THUS, THE ITEM ALSO HAS A "GOOD DISCRIMINATING POWER."
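The DU and DL computation above translates directly into code. A minimal sketch reusing the numbers from the worked example:

```python
def discrimination_index(upper_correct, n_upper, lower_correct, n_lower):
    """DU - DL: proportion correct in the upper 25% minus the
    proportion correct in the lower 25%."""
    du = upper_correct / n_upper
    dl = lower_correct / n_lower
    return du - dl

# Worked example: 15 of 20 upper-group students and 5 of 20
# lower-group students chose the correct option B.
d = discrimination_index(15, 20, 5, 20)
print(d)  # 0.5
```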
MORE SOPHISTICATED DISCRIMINATION INDEX
ITEM DISCRIMINATION REFERS TO THE ABILITY OF AN ITEM TO
DIFFERENTIATE AMONG STUDENTS ON THE BASIS OF HOW WELL THEY
KNOW THE MATERIAL BEING TESTED. VARIOUS HAND CALCULATION
PROCEDURES HAVE TRADITIONALLY BEEN USED TO COMPARE ITEM
RESPONSES TO TOTAL TEST SCORES USING HIGH AND LOW SCORING
GROUPS OF STUDENTS. COMPUTERIZED ANALYSES PROVIDE MORE ACCURATE
ASSESSMENT OF THE DISCRIMINATION POWER OF ITEMS BECAUSE THEY
TAKE INTO ACCOUNT RESPONSES OF ALL STUDENTS RATHER THAN JUST
HIGH AND LOW SCORING GROUPS.
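One widely used whole-class measure of this kind is the point-biserial correlation between the 0/1 item score and each student's total test score. The sketch below implements it from the definition of the Pearson correlation; the data are hypothetical:

```python
def point_biserial(item_scores, total_scores):
    """Pearson correlation between a dichotomous (0/1) item score and
    the total test score, using every examinee's response."""
    n = len(item_scores)
    mean_i = sum(item_scores) / n
    mean_t = sum(total_scores) / n
    cov = sum((i - mean_i) * (t - mean_t)
              for i, t in zip(item_scores, total_scores)) / n
    sd_i = (sum((i - mean_i) ** 2 for i in item_scores) / n) ** 0.5
    sd_t = (sum((t - mean_t) ** 2 for t in total_scores) / n) ** 0.5
    return cov / (sd_i * sd_t)

# Hypothetical data: higher-scoring students tend to get the item
# right, so the correlation comes out positive.
item = [1, 1, 1, 0, 1, 0, 0, 0]
total = [48, 45, 41, 40, 38, 30, 25, 22]
print(round(point_biserial(item, total), 2))
```

A value near +1 plays the same role as a high discrimination index, but every student's response contributes, not just the extreme groups.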
SUMMARY
THE ITEM-ANALYSIS PROCEDURE FOR NORM-REFERENCED TESTS PROVIDES THE
FOLLOWING INFORMATION:
1. THE DIFFICULTY OF THE ITEM;
2. THE DISCRIMINATING POWER OF THE ITEM, AND
3. THE EFFECTIVENESS OF EACH ALTERNATIVE
Some benefits derived from Item Analysis are:
1. It provides useful information for class discussion of the test.
2. It provides data which help students improve their learning.
3. It provides insights and skills that lead to the preparation for better tests in
the future.
ASSESSMENT IN LEARNING 1
6.2. VALIDATION AND VALIDITY
AFTER PERFORMING THE ITEM ANALYSIS AND REVISING THE ITEMS WHICH NEED REVISION, THE NEXT
STEP IS TO VALIDATE THE INSTRUMENT. THE PURPOSE OF VALIDATION IS TO DETERMINE THE
CHARACTERISTICS OF THE WHOLE TEST ITSELF, NAMELY, THE VALIDITY AND RELIABILITY OF THE TEST.
VALIDATION IS THE PROCESS OF COLLECTING AND ANALYZING EVIDENCE TO SUPPORT THE
MEANINGFULNESS AND USEFULNESS OF THE TEST.
VALIDITY. VALIDITY IS THE EXTENT TO WHICH A TEST MEASURES WHAT IT PURPORTS TO MEASURE.
IT ALSO REFERS TO THE APPROPRIATENESS, CORRECTNESS, MEANINGFULNESS AND USEFULNESS OF
THE SPECIFIC DECISIONS A TEACHER MAKES BASED ON THE TEST RESULTS. THESE TWO DEFINITIONS
OF VALIDITY DIFFER IN THE SENSE THAT THE FIRST DEFINITION REFERS TO THE TEST ITSELF WHILE
THE SECOND REFERS TO THE DECISIONS MADE BY THE TEACHER BASED ON THE TEST. A TEST IS VALID
WHEN IT IS ALIGNED WITH THE LEARNING OUTCOME.
A TEACHER WHO CONDUCTS TEST VALIDATION MIGHT WANT TO GATHER DIFFERENT KINDS OF
EVIDENCE. THERE ARE ESSENTIALLY THREE MAIN TYPES OF EVIDENCE THAT MAY BE
COLLECTED: CONTENT-RELATED EVIDENCE OF VALIDITY, CRITERION-RELATED EVIDENCE OF
VALIDITY AND CONSTRUCT-RELATED EVIDENCE OF VALIDITY. CONTENT-RELATED EVIDENCE
OF VALIDITY REFERS TO THE CONTENT AND FORMAT OF THE INSTRUMENT. HOW
APPROPRIATE IS THE CONTENT? HOW COMPREHENSIVE? DOES IT LOGICALLY GET AT THE
INTENDED VARIABLE? HOW ADEQUATELY DOES THE SAMPLE OF ITEMS OR QUESTIONS
REPRESENT THE CONTENT TO BE ASSESSED?
CRITERION-RELATED EVIDENCE OF VALIDITY REFERS TO THE RELATIONSHIP BETWEEN
SCORES OBTAINED USING THE INSTRUMENT AND SCORES OBTAINED USING ONE OR MORE
OTHER TESTS (OFTEN CALLED CRITERION). HOW STRONG IS THIS RELATIONSHIP? HOW WELL
DO SUCH SCORES ESTIMATE PRESENT OR PREDICT FUTURE PERFORMANCE OF A CERTAIN
TYPE?
CONSTRUCT-RELATED EVIDENCE OF VALIDITY REFERS TO THE NATURE OF THE PSYCHOLOGICAL CONSTRUCT OR
CHARACTERISTIC BEING MEASURED BY THE TEST. HOW WELL DOES A MEASURE OF THE CONSTRUCT EXPLAIN
DIFFERENCES IN THE BEHAVIOR OF THE INDIVIDUALS OR THEIR PERFORMANCE ON A CERTAIN TASK?
THE USUAL PROCEDURE FOR DETERMINING CONTENT VALIDITY MAY BE DESCRIBED AS FOLLOWS: THE TEACHER
WRITES OUT THE OBJECTIVES OF THE TEST BASED ON THE TABLE OF SPECIFICATIONS AND THEN GIVES THESE
TOGETHER WITH THE TEST TO AT LEAST TWO (2) EXPERTS ALONG WITH A DESCRIPTION OF THE INTENDED TEST
TAKERS. THE EXPERTS LOOK AT THE OBJECTIVES, READ OVER
THE ITEMS IN THE TEST AND PLACE A CHECK MARK IN FRONT OF EACH QUESTION OR ITEM THAT THEY FEEL DOES
NOT MEASURE ONE OR MORE OBJECTIVES. THEY ALSO PLACE A CHECK MARK IN FRONT OF EACH OBJECTIVE NOT
ASSESSED BY ANY ITEM IN THE TEST. THE TEACHER THEN REWRITES ANY ITEM CHECKED AND RESUBMITS TO THE
EXPERTS AND/OR WRITES NEW ITEMS TO COVER THOSE OBJECTIVES NOT COVERED BY THE EXISTING TEST. THIS
CONTINUES UNTIL THE EXPERTS APPROVE OF ALL ITEMS AND ALSO UNTIL THE EXPERTS AGREE THAT ALL OF THE
OBJECTIVES ARE SUFFICIENTLY COVERED BY THE TEST.
IN ORDER TO OBTAIN EVIDENCE OF CRITERION-RELATED VALIDITY, THE TEACHER
USUALLY COMPARES SCORES ON THE TEST IN QUESTION WITH THE SCORES ON SOME
OTHER INDEPENDENT CRITERION TEST WHICH PRESUMABLY ALREADY HAS HIGH
VALIDITY. FOR EXAMPLE, IF A TEST IS DESIGNED TO MEASURE MATHEMATICS ABILITY
OF STUDENTS AND IT CORRELATES HIGHLY WITH A STANDARDIZED MATHEMATICS
ACHIEVEMENT TEST (EXTERNAL CRITERION), THEN WE SAY WE HAVE HIGH CRITERION-
RELATED VALIDITY. IN PARTICULAR, THIS TYPE OF CRITERION-RELATED VALIDITY IS
CALLED ITS CONCURRENT VALIDITY. ANOTHER TYPE OF CRITERION-RELATED VALIDITY
IS CALLED PREDICTIVE VALIDITY WHEREIN THE TEST SCORES IN THE INSTRUMENT ARE
CORRELATED WITH SCORES ON A LATER PERFORMANCE (CRITERION MEASURE) OF THE
STUDENTS. FOR EXAMPLE, THE MATHEMATICS ABILITY TEST CONSTRUCTED BY THE
TEACHER MAY BE CORRELATED WITH THEIR LATER PERFORMANCE IN A DIVISION-WIDE
MATHEMATICS ACHIEVEMENT TEST.
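Criterion-related evidence of either kind is usually quantified with a correlation coefficient between the two sets of scores. A minimal sketch, with hypothetical scores for ten students on a teacher-made test and a standardized criterion test:

```python
def pearson_r(x, y):
    """Pearson correlation between two sets of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = (sum((a - mx) ** 2 for a in x) / n) ** 0.5
    sy = (sum((b - my) ** 2 for b in y) / n) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores: teacher-made mathematics test vs. a
# standardized achievement test (the external criterion).
teacher_test = [12, 15, 9, 20, 17, 11, 14, 18, 8, 16]
standard_test = [55, 62, 40, 80, 70, 48, 58, 75, 38, 66]
r = pearson_r(teacher_test, standard_test)
print(round(r, 2))  # a high r is evidence of concurrent validity
```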
IN SUMMARY, CONTENT VALIDITY REFERS TO HOW WELL THE TEST
ITEMS REFLECT THE KNOWLEDGE ACTUALLY REQUIRED FOR A
GIVEN TOPIC AREA (E.G. MATH). IT REQUIRES THE USE OF
RECOGNIZED SUBJECT MATTER EXPERTS TO EVALUATE WHETHER
TEST ITEMS ASSESS DEFINED OUTCOMES. DOES A PRE-
EMPLOYMENT TEST MEASURE EFFECTIVELY AND
COMPREHENSIVELY THE ABILITIES REQUIRED TO PERFORM THE
JOB? DOES AN ENGLISH GRAMMAR TEST MEASURE EFFECTIVELY
THE ABILITY TO WRITE GOOD ENGLISH?
CRITERION-RELATED VALIDITY IS ALSO KNOWN AS CONCRETE VALIDITY
BECAUSE CRITERION VALIDITY REFERS TO A TEST’S CORRELATION WITH A
CONCRETE OUTCOME.
IN THE CASE OF PRE-EMPLOYMENT TEST, THE TWO VARIABLES THAT ARE
COMPARED ARE TEST SCORES AND EMPLOYEE PERFORMANCE.
THERE ARE TWO MAIN TYPES OF CRITERION VALIDITY: CONCURRENT VALIDITY AND
PREDICTIVE VALIDITY. CONCURRENT VALIDITY REFERS TO A COMPARISON
BETWEEN THE MEASURE IN QUESTION AND AN OUTCOME ASSESSED AT THE
SAME TIME.
RELIABILITY IS ABOUT HOW CONSISTENT THE RESULTS OF A TEST ARE —
WHETHER A PERSON WOULD GET THE SAME RESULTS IF THEY TOOK THE TEST
AGAIN OR ANSWERED SIMILAR QUESTIONS. TO MEASURE THIS, METHODS
LIKE THE SPLIT-HALF OR KUDER-RICHARDSON FORMULAS (KR-20 OR KR-21)
ARE USED.
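KR-20 can be computed directly from a 0/1 response matrix. A minimal sketch with made-up responses (5 students, 4 items):

```python
def kr20(responses):
    """Kuder-Richardson Formula 20 for dichotomous (0/1) items.
    responses[s][i] is student s's score (0 or 1) on item i."""
    n_students = len(responses)
    n_items = len(responses[0])
    # Sum of p*q over items: p = proportion correct, q = 1 - p.
    pq_sum = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in responses) / n_students
        pq_sum += p * (1 - p)
    # Variance of the students' total scores.
    totals = [sum(row) for row in responses]
    mean_t = sum(totals) / n_students
    var_t = sum((t - mean_t) ** 2 for t in totals) / n_students
    return (n_items / (n_items - 1)) * (1 - pq_sum / var_t)

# Hypothetical response matrix: 5 students x 4 items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
print(round(kr20(responses), 2))  # about 0.70
```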
RELIABILITY AND VALIDITY ARE CONNECTED. IF A TEST ISN’T RELIABLE, IT
CAN’T BE VALID. BUT JUST BECAUSE SOMETHING IS RELIABLE DOESN’T
ALWAYS MEAN IT'S VALID. A TEST THAT IS SHOWN TO BE VALID, HOWEVER,
IS NECESSARILY RELIABLE AS WELL.
THERE ARE DIFFERENT TYPES OF VALIDITY.
PREDICTIVE VALIDITY CHECKS IF TEST RESULTS CAN
PREDICT FUTURE OUTCOMES — FOR EXAMPLE, SEEING
IF NAT SCORES CAN PREDICT COLLEGE GRADES.
CONSTRUCT VALIDITY CHECKS IF A TEST ACTUALLY
MEASURES WHAT IT’S SUPPOSED TO. IF YOU WANT TO
MEASURE DEPRESSION BUT YOUR TEST ENDS UP
MEASURING ANXIETY, THEN IT LACKS CONSTRUCT
VALIDITY.