

PROF ED 7
EDUCATIONAL MEASUREMENT & EVALUATION
MODULE NO. 7

I. TITLE: ESTABLISHING VALIDITY & RELIABILITY OF TESTS – part 1

II. INTRODUCTION

Test constructors believe that every assessment tool should possess
good qualities. Most of the literature considers validity and reliability the
most common technical concepts in assessment. Any type of assessment,
whether traditional or authentic, should be carefully developed so that it
serves its intended purpose. In this module, you will learn the different
ways of establishing validity and establishing reliability.

III. LEARNING OUTCOMES

At the end of this module, you should be able to:


1. define the following: validity, reliability, content validity, construct
validity, criterion-related validity, predictive validity, concurrent validity,
test-retest method, equivalent/parallel forms method, split-half method,
Kuder-Richardson formula, validity coefficient, reliability coefficient;
2. discuss the different approaches to validity;
3. present and discuss the different methods of determining the reliability
of tests;
4. identify the different factors affecting the validity of a test;
5. identify the factors affecting the reliability of a test;
6. compute validity coefficients and reliability coefficients; and
7. interpret the reliability coefficient and validity coefficient of a test.

IV. LEARNING CONTENT (KEY CONCEPTS)

A. VALIDITY
Validity is the degree to which a test measures what it intends to
measure, or the truthfulness of the responses. The validity of a test concerns
what the test measures and how well it does so. For example, to assess the
validity of a teacher-made test, it is necessary to consider what the test is
supposed to measure and how well it serves that purpose.
Validity is also concerned with whether the information obtained from
an assessment permits the teacher to make a correct decision about a
student's learning; that is, with the appropriateness of the score-based
inferences or decisions made from the students' test results.
When an assessment tool provides information that is irrelevant to
the learning objectives it was intended to serve, interpretations of the test
results become invalid. Teachers must select and use procedures,
performance criteria, and settings for all forms of assessment, especially
performance-based assessment, so that fairness to all students is maintained.
Assessing a student's performance on the basis of personal characteristics
rather than actual performance lowers the validity of the
assessment.
For instance, a Social Science test is administered twice to second-year
college students. Student C's answer to Item 12, "What is the
capital of Catanduanes?", is Virac. In the second administration of the test,
his answer to Item 12 is still the same. His answer is both valid and reliable:
valid because it is correct, and reliable because of its consistency.

❖ TYPES OF VALIDITY
1. CONTENT VALIDITY – a type of validation that refers to the
relationship between a test and the instructional objectives; it establishes
that the test measures what it is supposed to measure.
Things to remember about content validity:
a. The evidence of the content validity of a test is found in its Table
of Specifications.
b. This is the most important type of validity for a classroom
teacher.
c. There is no coefficient for content validity. It is determined
judgmentally by experts, not empirically.

2. CRITERION-RELATED VALIDITY – a type of validation that refers to
the extent to which scores from a test relate to theoretically similar
measures. It is a measure of how accurately a student's current test
score can be used to estimate a score on a criterion measure, such as
performance in courses, classes, or another measurement instrument.
a. Concurrent validity. The criterion and the predictor data are
collected at the same time. This type of validity is appropriate for
tests designed to assess a student's current status, such as a diagnostic
screening test. It is established by correlating the criterion and the
predictor using the Pearson product-moment correlation coefficient or
other statistical tools.

b. Predictive validity. A type of validation that refers to a measure of
the extent to which a student's current test result can be used to estimate
accurately the outcome of the student's performance at a later time. It is
appropriate for tests designed to assess a student's future status on a
criterion.

Predictive validity is very important in psychological testing, for
example, when psychologists want to predict responses, behaviors,
outcomes, performances, and the like. These scores will be used in the
assessment process. Regression analysis can be used to predict the
criterion from a single predictor or from multiple predictors.

3. CONSTRUCT VALIDITY. A type of validation that refers to the
extent to which a test measures theoretical, unobservable qualities such
as intelligence, mathematics achievement, or performance anxiety, over
a period of time and on the basis of gathered evidence. It is established
through intensive study of the test or measurement instrument using
convergent/divergent validation and factor analysis.

a. Convergent validity – a type of construct validation wherein a
test has a high correlation with another test that measures the same
construct.
b. Divergent validity – a type of construct validation wherein a test
has a low correlation with a test that measures a different construct. In this
case, high validity occurs only when there is a low correlation
coefficient between tests that measure different traits.
c. Factor analysis – another method of assessing the construct
validity of a test, using complex statistical procedures.

❖ IMPORTANT THINGS TO REMEMBER ABOUT VALIDITY


1. Validity refers to the decisions we make, and not to the test itself or
to the measurement.
2. Like reliability, validity is not an all-or-nothing concept; it is never
totally absent or absolutely perfect.
3. A validity estimate, called a validity coefficient, refers to a specific
type of validity. It ranges between 0 and 1.
4. Validity can never be finally determined; it is specific to each
administration of the test.

❖ FACTORS AFFECTING THE VALIDITY OF A TEST ITEM.


1. The test itself.
2. The administration and scoring of a test.
3. Personal factors influencing how students respond to the test.
4. Validity is always specific to a particular group.

❖ REASONS THAT REDUCE THE VALIDITY OF THE TEST ITEM



1. Poorly constructed test items
2. Unclear directions
3. Ambiguous test items
4. Vocabulary that is too difficult
5. Complicated syntax
6. Inadequate time limit
7. Inappropriate level of difficulty
8. Unintended clues
9. Improper arrangement of test items

❖ GUIDE QUESTIONS TO IMPROVE VALIDITY


1. What is the purpose of the test?
2. How well do the instructional objectives selected for the test
represent the instructional goals?
3. Which test item format will best measure the achievement of each
objective?
4. How many test items will be required to measure performance on
each objective adequately?
5. When and how will the test be administered?

❖ VALIDITY COEFFICIENT
The validity coefficient is the computed value of rxy. In theory,
the validity coefficient, like the correlation coefficient, ranges
from 0 to 1. In practice, most validity coefficients are
small, usually ranging from 0.3 to 0.5; few exceed 0.6 to 0.7. Hence,
there is a lot of room for improvement in most of our psychological
measurements.
Another way of interpreting the findings is to consider the
squared correlation coefficient (rxy)², called the
COEFFICIENT OF DETERMINATION. This indicates how much
variation in the criterion can be accounted for by the predictor
(the teacher-made test).
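Sample computations follow in the next module; as a quick preview, the sketch below computes rxy with the Pearson product-moment formula and squares it to get the coefficient of determination. The predictor and criterion scores are hypothetical, invented only for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient r_xy."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x)) * sqrt(sum((b - my) ** 2 for b in y))
    return num / den

# Hypothetical data: teacher-made test scores (predictor) and
# course grades (criterion) for five students.
predictor = [10, 12, 15, 18, 20]
criterion = [70, 74, 80, 85, 88]

r_xy = pearson_r(predictor, criterion)  # validity coefficient
r_squared = r_xy ** 2                   # coefficient of determination
```

Here r_squared is the proportion of variation in the criterion (course grades) accounted for by the predictor (the teacher-made test).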

B. RELIABILITY OF A TEST
- refers to the consistency with which a test yields the same rank for
individuals who take it more than once (Kubiszyn and Borich, 2007);
that is, how consistent test results or other assessment results are from one
measurement to another. We can say that a test is reliable when it yields
practically the same scores when administered twice to
the same group, with a reliability index of 0.60 or above.

- the reliability of a test can be determined by means of the Pearson
product-moment correlation coefficient, the Spearman-Brown formula, and
the Kuder-Richardson formulas.
- it is concerned with the consistency of responses from moment to
moment: even if a student takes the same test twice, the test yields the
same results. However, a reliable test may not always be valid.
For instance, a student took a Math test twice. His answer to Item 10,
"How many sides are there in a nonagon?", is six (6). In the second
administration, his answer to the same question remains the same. Thus,
his response is reliable, due to the consistency of his responses, but not
valid, because it is incorrect.

❖ FACTORS AFFECTING RELIABILITY OF A TEST


1. length of the test
2. moderate item difficulty
3. objective scoring
4. heterogeneity of the student group
5. limited time
❖ FOUR METHODS OF ESTABLISHING RELIABILITY OF A TEST
1. TEST-RETEST METHOD (Measure of Stability) - a type of reliability
determined by administering the same test twice to the same group of
students, with a time interval between the two administrations. The two
sets of scores are correlated using the Pearson product-moment
correlation coefficient ( r ), and this correlation coefficient provides a
measure of stability: it indicates how stable the test results are over a
period of time.

2. EQUIVALENT OR PARALLEL FORMS (Measure of Equivalence) - a
type of reliability determined by administering two different but
equivalent forms of the test to the same group of students in close
succession. The equivalent forms are constructed from the same set of
specifications, so that they are similar in content, type of items, and
difficulty. The two sets of scores are correlated using the Pearson
product-moment correlation coefficient ( r ), and this correlation
coefficient provides a measure of the degree to which generalization
about the performance of students from one assessment to the other
is justified. It measures the equivalence of the two forms.

3. SPLIT-HALF METHOD (Measure of Internal Consistency) - a
type of reliability determined by administering the test once and scoring
two equivalent halves of the test separately. To split the test into halves
that are equivalent, the usual procedure is to score the even-numbered
and the odd-numbered test items separately, which provides two scores
for each student. The two half-test scores are correlated, and the
correlation is adjusted using the Spearman-Brown formula; the resulting
coefficient provides a measure of internal consistency. It indicates the
degree to which consistent results are obtained from the two halves of
the test.
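The odd/even split and the Spearman-Brown step-up, r_full = 2 · r_half / (1 + r_half), can be sketched as follows. The matrix of dichotomously scored items (rows are students, columns are items) is hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x)) * sqrt(sum((b - my) ** 2 for b in y))
    return num / den

# Hypothetical 1/0 item scores: 5 students x 8 items.
items = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 1],
    [1, 1, 0, 1, 1, 0, 1, 0],
]

odd_scores = [sum(row[0::2]) for row in items]   # items 1, 3, 5, 7
even_scores = [sum(row[1::2]) for row in items]  # items 2, 4, 6, 8

r_half = pearson_r(odd_scores, even_scores)  # correlation of the two halves
r_full = 2 * r_half / (1 + r_half)           # Spearman-Brown full-test estimate
```

The step-up corrects for the fact that each half is only half as long as the real test; other things being equal, a longer test tends to be more reliable, so r_full is larger than r_half.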

4. KUDER-RICHARDSON FORMULA (Measure of Internal
Consistency) - a type of reliability determined by administering the
test once, scoring the total test, and applying a Kuder-Richardson
formula. The Kuder-Richardson formulas are applicable only in
situations where students' responses are scored dichotomously, and
are therefore most useful with traditional test items that are scored as
right or wrong, true or false, or yes or no. The KR-20 formula
estimates the degree to which the items in the test measure the same
characteristic, without assuming that all items are of equal difficulty.
The KR-21 formula is a simpler approximation of KR-20 for testing
the internal consistency of a test; it additionally assumes that all
items are of equal difficulty.
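Both formulas can be sketched from their standard definitions, KR-20 = k/(k-1) · (1 - Σpq/σ²) and KR-21 = k/(k-1) · (1 - M(k-M)/(kσ²)), where k is the number of items, p and q are the proportions passing and failing each item, M is the mean total score, and σ² is the variance of the total scores. The item matrix below is hypothetical.

```python
def _total_stats(items):
    """Mean and (population) variance of students' total scores."""
    n = len(items)
    totals = [sum(row) for row in items]
    mean = sum(totals) / n
    var = sum((t - mean) ** 2 for t in totals) / n
    return mean, var

def kr20(items):
    """KR-20 for dichotomously scored items (rows = students, cols = items)."""
    k, n = len(items[0]), len(items)
    _, var = _total_stats(items)
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in items) / n  # proportion who got item j right
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var)

def kr21(items):
    """KR-21 shortcut; additionally assumes all items are of equal difficulty."""
    k = len(items[0])
    mean, var = _total_stats(items)
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * var))

# Hypothetical 1/0 item scores: 5 students x 8 items.
items = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 1],
    [1, 1, 0, 1, 1, 0, 1, 0],
]
```

When item difficulties actually vary, as in this matrix, KR-21 comes out lower than KR-20.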

❖ RELIABILITY COEFFICIENT
It is a measure of the consistency of test scores; it reflects
the amount of error associated with the test scores.

❖ Description of Reliability Coefficient


a. The range of the reliability coefficient is from 0 to 1.0.
b. The acceptable range is 0.60 or higher.
c. The higher the value of the reliability coefficient, the
more reliable the overall test scores are.
d. Higher reliability indicates that the test items measure
the same thing.

❖ INTERPRETING RELIABILITY COEFFICIENT


1. The group variability will affect the size of the reliability
coefficient. Higher coefficient results from
heterogeneous groups than from the homogeneous
groups. As group variability increases, reliability goes
up.
2. Scoring reliability limits test score reliability. If tests are
scored unreliably, error is introduced, and this will limit
the reliability of the test scores.

3. Test length affects test score reliability. As the length


increases, the test’s reliability tends to go up.

4. Item difficulty affects test score reliability. As test items


become very easy or very hard, the test’s reliability goes
down.

LEVEL OF RELIABILITY COEFFICIENT

RELIABILITY COEFFICIENT   INTERPRETATION
Above 0.90                Excellent reliability
0.81 – 0.90               Very good for a classroom test
0.71 – 0.80               Good for a classroom test; a few items probably
                          need to be improved
0.61 – 0.70               Somewhat low. The test needs to be supplemented
                          by other measures (more tests) to determine grades
0.51 – 0.60               Suggests a need for revision of the test, unless it
                          is quite short (ten or fewer items). Needs to be
                          supplemented by other measures (more tests) for
                          grading
0.50 and below            Questionable reliability. This test should not
                          contribute heavily to the course grade, and it
                          needs revision
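For convenience, the table above can be turned into a straightforward lookup. The band edges follow the table, and the wording is condensed; coefficients are assumed to be reported to two decimals, so the bands are implemented as open lower bounds.

```python
def interpret_reliability(rc):
    """Map a reliability coefficient to the interpretation table above."""
    if rc > 0.90:
        return "Excellent reliability"
    if rc > 0.80:
        return "Very good for a classroom test"
    if rc > 0.70:
        return "Good for a classroom test; a few items probably need improvement"
    if rc > 0.60:
        return "Somewhat low; supplement with other measures to determine grades"
    if rc > 0.50:
        return "Needs revision unless quite short; supplement for grading"
    return "Questionable reliability; should not contribute heavily to the grade"
```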

INTERPRETATION OF CORRELATION VALUE ( r )

▪ An r of 0.00 indicates zero correlation.
▪ An r from ± 0.01 to ± 0.20 denotes negligible correlation.
▪ An r from ± 0.21 to ± 0.40 denotes low or slight correlation.
▪ An r from ± 0.41 to ± 0.70 denotes marked or moderate correlation.
▪ An r from ± 0.71 to ± 0.90 denotes high correlation.
▪ An r from ± 0.91 to ± 0.99 denotes very high correlation.
▪ An r of ± 1.0 means perfect correlation.

• REMEMBER: The maximum correlation is ± 1.0. If the computed value
exceeds 1.0 in absolute value, there is something wrong with the computation.

(NOTE: Sample Computations on the next module)

V. END OF MODULE ASSESSMENT. Handwritten.

Answer the following questions:

1. Is a reliable test also a valid test? Why?


2. Is a valid test always a reliable test? Why?
3. How can we improve the validity of the test?
4. How can we improve the reliability of a test?
5. Discuss briefly in your own words the different approaches in
establishing test reliability.

VI. REFERENCES

1. Gabuyo, Y. A. (2012). Assessment of Learning I (Textbook and
Reviewer) (1st ed.). Rex Book Store, Inc.
2. Calmorin, L. P. (2004). Measurement & Evaluation (3rd ed.). National
Book Store.
3. Calmorin, L. P. (2011). Assessment of Learning 1. Rex Book Store, Inc.
4. LET Reviewer
