Reliability
• Consistency in measurement.
• Reliability by itself doesn't mean that a test is good or bad.
• Measured by the reliability coefficient: the ratio of true score variance to total variance. (Total Variance = True Score Variance + Error Variance)
• The greater the proportion of the total variance attributed to true variance, the more reliable the test.
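A quick worked example of that ratio (the variance figures below are made up for illustration, not taken from these notes):

```python
# Illustrative numbers only: suppose the true-score variance is 80 and the
# error variance is 20 for a set of test scores.
true_variance = 80.0
error_variance = 20.0

total_variance = true_variance + error_variance       # 100.0
reliability = true_variance / total_variance          # 0.80

print(f"Total variance: {total_variance}")
print(f"Reliability coefficient: {reliability:.2f}")   # 0.80 of the variance is "true"
```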
Sources of Error Variance
• TEST CONSTRUCTION - Item sampling / content sampling
• TEST ADMINISTRATION - Temperature, materials, test-taker variables, examiner-related variables
• TEST SCORING AND INTERPRETATION - Scoring and scoring systems, human & electronic
• The most common way of computing a correlation is through the Pearson Product-Moment Correlation Coefficient.
• The correlation coefficient (r) expresses the degree of correspondence, or relationship, between two sets of scores. A perfect correlation is ±1.00; a zero correlation indicates complete absence of a relationship, as might occur by chance.
• Desirable reliability coefficients usually fall in the 0.80s and 0.90s.
• This concept underlies the computation of the error of measurement of a single score, whereby we can predict the range of fluctuation likely to occur in a single individual's test score as a result of irrelevant or chance factors.
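A minimal sketch of the Pearson product-moment computation (the score lists are hypothetical; the same calculation underlies the test-retest, parallel-forms, and interscorer estimates described below):

```python
import statistics

def pearson_r(x, y):
    """Pearson product-moment correlation between two lists of scores."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical scores from two administrations of the same test
first_admin  = [12, 15, 9, 20, 17, 11, 14]
second_admin = [13, 14, 10, 19, 18, 12, 15]
print(round(pearson_r(first_admin, second_admin), 3))
```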
RELIABILITY ESTIMATES

1. Test-Retest Reliability
• Estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test.
• It assumes that what is being measured is stable over time.
• It assumes no significant learning occurs in the time between administrations.
• It is subject to errors due to practice effects.
• Nature of the test may also change with repetition, as is the case with problems involving reasoning or ingenuity.
• Most appropriate for measuring reaction time and perceptual judgments.
• *The reliability estimate is called the coefficient of stability.

Take note:
• When test-retest reliability is reported in the test manual, the interval over which it is measured should always be specified.
• Since retest correlations decrease progressively as the interval lengthens, there is an infinite number of test-retest reliability coefficients for any test.
2. Parallel-Forms Reliability Estimate
• Estimate of the extent to which item sampling and other errors have affected scores on versions of the same test.
• Alternate/parallel forms: independently constructed tests designed to meet the same specifications. They should have the same number of items, and the items should be expressed in the same form and cover the same content. Range and level of difficulty should be equal.
• The degree of relationship between various forms of a test is measured by a coefficient of equivalence.
• Such a reliability coefficient is a measure of both temporal consistency and consistency of response to different item samples or test forms.
• If you want to use this kind of reliability, ensure that you pay attention to content sampling.
• Reports should also be accompanied by the length of the interval between testings as well as relevant intervening experiences.
• Note that parallel forms are helpful not just for establishing reliability but also for use in follow-up studies. They are a means of reducing the likelihood of coaching or cheating.
• May still be subject to practice effects.

3. Split-Half Reliability Estimate
• Split-half reliability is obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once.
• Temporal stability doesn't enter, as only one test session is involved.
• A measure of internal consistency, inter-item consistency, or homogeneity.
• Randomly assign items to one or the other half of the test.
• Odd-even split.
• Divide by content so that each half is equivalent in content and difficulty.

How to Split the Test
• STEP 1: Divide the test into equivalent halves.
• STEP 2: Calculate the Pearson r between the scores on the two halves.
• STEP 3: Adjust the half-test reliability using the Spearman-Brown formula, which estimates the effect of lengthening or shortening a test. Other things being equal, the longer the test, the more reliable it will be.
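A small sketch of the three steps on a made-up 0/1 item matrix, using an odd-even split and the Spearman-Brown correction r_full = 2r / (1 + r):

```python
from statistics import correlation  # Python 3.10+

# Hypothetical item responses (1 = correct, 0 = wrong); rows are examinees.
items = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 0, 1, 1, 1, 0, 1],
    [0, 0, 0, 1, 0, 0, 1, 0],
]

# STEP 1: odd-even split (items 1, 3, 5, ... vs items 2, 4, 6, ...).
odd_scores  = [sum(row[0::2]) for row in items]
even_scores = [sum(row[1::2]) for row in items]

# STEP 2: Pearson r between the two half-test scores.
r_half = correlation(odd_scores, even_scores)

# STEP 3: Spearman-Brown correction for the full-length test.
r_full = (2 * r_half) / (1 + r_half)
print(round(r_half, 3), round(r_full, 3))
```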
4. Kuder-Richardson Reliability & Coefficient Alpha
• A way of finding reliability that also uses a single form of the test, based on the consistency of responses to the items in the test.
• This interitem consistency is influenced by two sources of error variance: (1) content sampling (as in alternate-form and split-half reliability) and (2) heterogeneity of the behavior domain being sampled. The more homogeneous the domain, the higher the interitem consistency.
• It's important to also consider whether the criterion the test is trying to predict is itself relatively homogeneous or heterogeneous.
• The most common formula for finding interitem consistency is the Kuder-Richardson Formula 20 (KR-20). Unlike the split-half approach, KR-20 analyzes performance on each item.
• The Kuder-Richardson formula is applicable to tests whose items are scored as right or wrong, or according to some other all-or-none system.
• For tests that have multiple-scored items (on a Likert-scale personality test, for example), a generalized formula has been derived, called Coefficient Alpha.
• The procedure is to find the variance of all individuals' scores for each item and then to add these variances across all items.

5. Interscorer Reliability
• Degree of agreement or consistency between two or more scorers with regard to a particular measure.
• Most applicable for tests of creativity and projective tests.
• Controls for examiner variance.
• The simplest way is to have a sample of test papers independently scored by two examiners.
• The index of measurement is the coefficient of interscorer reliability.
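A minimal sketch of the item-variance procedure (the response matrices are invented): with right/wrong items the coefficient-alpha formula reduces to KR-20, and the same function applied to Likert-type items gives coefficient alpha.

```python
from statistics import pvariance

def cronbach_alpha(rows):
    """Coefficient alpha: rows are examinees, columns are item scores.
    With 0/1 item scoring this reduces to KR-20."""
    k = len(rows[0])                                   # number of items
    item_columns = list(zip(*rows))                    # column-wise item scores
    item_var_sum = sum(pvariance(col) for col in item_columns)
    total_var = pvariance([sum(row) for row in rows])  # variance of total scores
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical right/wrong (1/0) responses for six examinees on five items.
binary = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 0],
]
print(round(cronbach_alpha(binary), 3))   # KR-20 for dichotomous items

# Hypothetical 1-to-5 Likert ratings: the same formula gives coefficient alpha.
likert = [
    [4, 5, 4, 3],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [4, 4, 5, 4],
]
print(round(cronbach_alpha(likert), 3))
```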
Errors Due to Examiner's Bias

1. Error of Central Tendency
• A less-than-accurate rating or evaluation by a rater or judge, due to that rater's general tendency to make ratings near the midpoint of the scale.
• The rater doesn't want to assign low or high scores; everyone ends up rated as average.

2. Leniency/Generosity Error
• The rater's tendency to be too forgiving or insufficiently critical.

3. Severity Error
• The rater's tendency to be overly critical.

4. Halo Effect
• The tendency of the rater to judge all aspects of an individual using a general impression that was formed on only one or a few of the individual's characteristics.

5. Horn Effect
• The opposite of the halo effect: a tendency to let one poor rating influence all other ratings, resulting in a lower overall evaluation than deserved.

6. Contrast Error
• Happens when raters compare examinees with one another instead of against performance standards.
• This may result in an average employee being rated as a high performer when compared to their underperforming peers, or a good performer being rated as a poor performer when compared to their high-performing peers.

• The reliability of a test may be expressed in terms of the standard error of measurement (SEM), also called the standard error of a score. This measure is suited to the interpretation of individual scores.
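For a concrete sense of scale, a sketch of the SEM computation, SEM = SD × √(1 − reliability); the SD and reliability values are assumed for illustration only:

```python
import math

# Assumed values for illustration: a test with SD = 15 and reliability = 0.91
# (deviation-IQ-style numbers; they are not taken from these notes).
sd_test = 15.0
reliability = 0.91

sem = sd_test * math.sqrt(1 - reliability)   # standard error of measurement
print(round(sem, 2))                         # ~4.5 score points

# A rough 95% band around an obtained score of 100, using +/- 1.96 SEM.
obtained = 100
print(round(obtained - 1.96 * sem, 1), round(obtained + 1.96 * sem, 1))
```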
Validity

3 Categories:
Content validity – evaluation of the subjects, topics, or contents covered in a test
Criterion-related validity – evaluation of the relationship of scores to scores on other tests or instruments
Construct validity – comprehensive analysis of the theoretical framework plus scores on other tests

1. Content Validity
• Describes a judgment of how adequately a test samples behavior representative of the universe of behavior that the test is designed to sample.

How to Create Content-Valid Items
• a) Content validity is built into a test from the outset through the choice of appropriate items.
• Test specifications are drawn up to show the content areas or topics to be covered, the instructional objectives or processes to be tested, and the relative importance of individual topics and processes.
• b) In test manuals, the number and qualifications of subject-matter experts must be reported.
• c) Other empirical procedures: total scores and performance on individual items can be checked.
• Face validity is a judgment of how relevant the test items appear to be.
• Fundamentally, face validity concerns rapport and public relations.
• Face validity can be improved by merely reformulating test items in terms that appear relevant and plausible in the particular setting in which they will be used.

2. Criterion-related Validity
• A judgment of how adequately a test score can be used to infer an individual's most probable standing on some measure of interest.
• The measure of interest is the criterion; criterion-related validity reflects the effectiveness of the test in predicting an individual's performance in specified activities.
• Samples of criteria: academic achievement, performance in specialized training, job performance, psychiatric diagnosis.

2 TYPES:
• Concurrent validity: the extent to which scores on a new measure relate to scores from a criterion measure administered at the same time.
• Predictive validity: uses the scores from the new measure to predict performance on a criterion measure administered at a later time.

3. Construct Validity
• A judgment about the appropriateness of inferences drawn from test scores regarding individuals' standings on a variable called a construct.
• A construct is an informed scientific idea developed or hypothesized to describe or explain behavior.
• Constructs are unobservable, presupposed (underlying) traits that a test developer may invoke to
describe test behavior or criterion performance.
Examples of constructs are job satisfaction, personality,
bigotry, clerical aptitude, depression, motivation, self-
esteem, emotional adjustment, potential
dangerousness, executive potential, creativity,
mechanical comprehension and more.
• Construct validation focused attention on the role of
psychological theory in test construction.
Techniques that Contribute to Construct
Identification
a) Correlations with other tests
• Correlations between new and similar earlier tests
are sometimes cited as evidence that the new test
measures approximately the same general area of
behavior as other tests designated by the same
name.
• Correlations with other tests are employed in still another way: to demonstrate that the new test is relatively free from the influence of certain irrelevant factors.
b) Factor analysis
• Factor Analysis is a refined statistical technique for
analyzing the interrelationships of behavior data. In
the process, the number of variables or categories
in terms of which each individual’s performance
can be described is reduced from the number of
original tests to a relatively small number of factors,
or common traits.
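As a rough illustration only (these notes do not prescribe any software), a factor analysis of a small made-up score matrix could be sketched with scikit-learn's FactorAnalysis; the data and the choice of two factors are assumptions:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis  # assumes scikit-learn is installed

# Hypothetical scores of 8 people on 4 subtests (rows = people, columns = tests).
scores = np.array([
    [12, 14, 30, 28],
    [ 9, 10, 22, 20],
    [15, 16, 35, 33],
    [11, 12, 27, 25],
    [ 8,  9, 20, 19],
    [14, 15, 33, 30],
    [10, 11, 24, 23],
    [13, 13, 31, 29],
])

# Reduce the four observed variables to two common factors.
fa = FactorAnalysis(n_components=2, random_state=0)
fa.fit(scores)
print(fa.components_)   # factor loadings: one row per factor, one column per subtest
```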
c) Internal Consistency
• Items that fail to show a significantly greater
proportion of “passes” in the upper than in the
lower criterion group are considered as invalid, and
are either eliminated or revised.
• Another application involves correlation of subtests
with total score, and any subtests whose
correlation is too low will be eliminated.
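A minimal sketch of the upper-versus-lower comparison described above, using invented item responses; the difference in pass proportions is the usual item-discrimination index:

```python
# Hypothetical 1/0 responses to one item, split by standing on the criterion
# (group sizes and responses are made up).
upper_group_item = [1, 1, 1, 0, 1, 1, 1, 1]   # examinees high on the criterion
lower_group_item = [1, 0, 0, 0, 1, 0, 0, 1]   # examinees low on the criterion

p_upper = sum(upper_group_item) / len(upper_group_item)
p_lower = sum(lower_group_item) / len(lower_group_item)

# Items with a small or negative difference are the ones flagged for
# elimination or revision.
discrimination = p_upper - p_lower
print(round(p_upper, 2), round(p_lower, 2), round(discrimination, 2))
```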
d) Convergent and Discriminant Validation
• D. T. Campbell (1960) pointed out that in order to demonstrate construct validity, we must show not only that a test correlates highly with the variables it should theoretically correlate with, but also that it does not correlate with variables from which it is supposed to differ.
• Convergent evidence – a high relationship with measures the construct is supposed to be related to.
• Discriminant evidence – a low relationship with measures the construct is NOT supposed to be related to.
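A small sketch of the idea with invented scores: the new scale should correlate highly with a related measure (convergent evidence) and near zero with an unrelated one (discriminant evidence):

```python
from statistics import correlation  # Python 3.10+

# Hypothetical scores: a new depression scale, an established depression scale
# (convergent check), and a clerical-speed test (discriminant check).
new_scale = [10, 14, 8, 20, 17, 12, 15, 9]
related   = [11, 15, 9, 19, 18, 11, 14, 10]   # should correlate highly
unrelated = [27, 22, 23, 27, 24, 28, 23, 26]  # should correlate near zero

print("convergent r   =", round(correlation(new_scale, related), 2))
print("discriminant r =", round(correlation(new_scale, unrelated), 2))
```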
Standard error of estimate (SEEST): computes the margin of error to be expected in an individual's predicted criterion score as a result of the imperfect validity of the test.
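A minimal sketch of the computation, SEE = SD_criterion × √(1 − r²); the criterion SD and validity coefficient are assumed for illustration:

```python
import math

# Assumed numbers: criterion GPA with SD = 0.60 and a test whose
# criterion-related validity coefficient is r = 0.50.
sd_criterion = 0.60
validity_r = 0.50

see = sd_criterion * math.sqrt(1 - validity_r ** 2)   # standard error of estimate
print(round(see, 3))   # ~0.52: expected margin of error around a predicted GPA
```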