
Psychological Testing - Final

The document discusses the concepts of norms, reliability, and validity in psychological testing, emphasizing the importance of standard scores and norm groups for meaningful test results. It outlines various statistical concepts such as standard deviation, measures of central tendency, and the significance of normal distribution in test scoring. Additionally, it covers the methods for estimating reliability, sources of measurement error, and the different types of validity, including content and construct validity.

Uploaded by

Dene C
Copyright
© All Rights Reserved

NORMS AND RELIABILITY
__________________________________

- The raw score is the most basic level of information provided by a psychological test.
- In most cases, the raw score is useless by itself. For results of psychological tests to be meaningful, examiners must be able to convert the initial score to some form of derived score based on comparison to a standardization or norm group.

- A norm group consists of a sample of examinees who are representative of the population for whom the test is intended.
- In general, norms indicate an examinee’s standing on the test relative to the performance of other persons of the same age, gender, and so on.
* Periodic renorming of tests is very important, so that norms do not become outdated.

ESSENTIAL STATISTICAL CONCEPTS
- Frequency distribution, histogram, frequency polygon

Measures of Central Tendency
- mean, median, mode
* The mean is sensitive to extreme values. The median is a better index of central tendency if the distribution of scores is skewed.

Measures of Variability
- Variability refers to the extent of dispersion of scores about the mean in a distribution.
1) Standard Deviation (s or SD)
- It reflects the degree of dispersion of scores.
- When scores are tightly packed around a central value, the SD is small. As scores become more spread out, the SD becomes larger. When scores are identical, the SD is zero.
- The SD is simply the square root of the variance.

The Normal Distribution
- a symmetrical, mathematically defined bell-shaped curve
Why is the normal distribution preferable?
1. The normal curve has useful mathematical features that form the basis for several kinds of statistical investigations.
2. Because of its mathematical precision, it is possible to compute the area underneath different regions of the curve (34.13%, 13.59%, and 2.14% for the first, second, and third SD above and below the mean).
3. The normal curve often arises spontaneously. Many important human characteristics produce a close approximation to the normal curve when measurements for large and heterogeneous samples are graphed.

Skewness
- Skewness refers to the symmetry or asymmetry of a frequency distribution. Skewed distributions usually signify that the test developers have included too few easy items (positive skew) or too few hard items (negative skew).

RAW SCORE TRANSFORMATIONS
Percentile – expresses the percentage of persons in the standardization sample who scored below a specific raw score. Note that a higher percentile indicates a higher score.
Percentile Ranks – These are the complete reverse of usual ranking procedures. A percentile rank of 1 is at the bottom of the sample.
* Major Drawback: They distort the underlying measurement scale, especially at the extremes.
* These are the most common type of raw score transformation in psychological testing.

Standard Scores
- A standard score uses the standard deviation of the total distribution of raw scores as the fundamental unit of measurement (also called a z-score).
- It expresses not only the magnitude of deviation from the mean but also the direction of departure (positive or negative).
- To compute it, subtract the mean of the normative group from the raw score and then divide the difference by the standard deviation of the normative group.
* Standard scores exemplify the most desirable psychometric properties.
- One reason for transforming raw scores to standard scores is to depict results on different tests according to a common scale, but the distributions must have the same form for equivalent standard scores to signify comparable positions.

T Scores and Other Standardized Scores
- Standardized scores are identical to standard scores, except that standardized scores are always expressed as positive whole numbers (no decimal fractions or negative signs) by producing values other than 0 for the mean and 1 for the standard deviation of the transformed scores.
* The important point about standardized scores is that we can transform any distribution to a preferred scale with a predetermined mean and standard deviation.
- The T score has a mean of 50 and a standard deviation of 10 (to eliminate negative scores). It is common with personality tests. T scores will fall between values of 20 and 80, that is, within three standard deviations of the mean.

Normalizing Standard Scores
- Distributions that are skewed or otherwise non-normal can be transformed or normalized to fit a normal curve.
- Procedure: conversion of standard scores to percentiles, then percentiles to normalized standard scores.
* Drawback: Normalized standard scores are a nonlinear transformation of the raw scores.

Stanines, Stens, and C Scale
- In the stanine (standard nine) scale, all raw scores are converted to a single-digit system of scores ranging from 1 to 9. The mean is always 5, and the standard deviation is approximately 2.

Distribution Percentage for Stanine Conversion
Percentage   4   7   12   17   20   17   12   7   4
Stanine      1   2   3    4    5    6    7    8   9

- The main advantage of stanines is that they are restricted to single-digit numbers.
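
The raw score transformations above can be illustrated with a short Python sketch. This is only a sketch under stated assumptions: the function names are hypothetical, the T score is rounded to a whole number, and the handling of a percentile that falls exactly on a stanine boundary (it goes to the lower stanine) is an assumed convention, not something these notes specify.

```python
# Cumulative use of the stanine table: 4, 7, 12, 17, 20, 17, 12, 7, 4 percent.
STANINE_PERCENTS = [4, 7, 12, 17, 20, 17, 12, 7, 4]


def z_score(raw, norm_mean, norm_sd):
    """Standard score: (raw score - normative mean) / normative SD."""
    return (raw - norm_mean) / norm_sd


def t_score(z):
    """Standardized score with mean 50 and SD 10, as a whole number."""
    return round(50 + 10 * z)


def stanine_from_percentile(percentile):
    """Map a percentile (0-100) to a stanine using the table percentages.

    Assumption: a percentile exactly on a boundary goes to the lower stanine.
    """
    cumulative = 0
    for stanine, share in enumerate(STANINE_PERCENTS, start=1):
        cumulative += share
        if percentile <= cumulative:
            return stanine
    return 9


if __name__ == "__main__":
    # Hypothetical normative group: mean 50, SD 10.
    for raw in (40, 50, 60):
        z = z_score(raw, 50, 10)
        print(raw, z, t_score(z))
    print(stanine_from_percentile(50))
```

With a normative mean of 50 and SD of 10, a raw score of 40 lies one SD below the mean, so its z-score is negative while its T score stays positive, which is the practical reason for the T scale.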
- A variation is the 10-unit sten scale, with 5 units above and 5 units below the mean (Canfield).
- Guilford and Fruchter proposed the C Scale, consisting of 11 units.

SELECTING A NORM GROUP
- When choosing a norm group, test developers strive to obtain a representative cross section of the population for whom the test is designed.
Age Norms
- An age norm depicts the level of test performance for each separate age group in the normative sample. The purpose of age norms is to facilitate same-aged comparisons.
- For characteristics that change quickly with age, such as intellectual abilities in childhood, test developers might report separate test norms for narrowly defined age brackets. By contrast, adult characteristics change more slowly, and it might be sufficient to report normative data by large age intervals.
Grade Norms
- A grade norm depicts the level of test performance for each separate grade in the normative sample.
- This is rarely used with ability tests. However, these norms are especially useful in school settings when reporting the achievement level of school children.

- Local norms are derived from representative local examinees, as opposed to a national sample.
- Subgroup norms consist of the scores obtained from an identified subgroup. The subgroup can be formed with respect to sex, ethnic background, geographical region, urban versus rural environment, socioeconomic level, and many other factors.

- An expectancy table is one practical form that norms may take. It portrays the relationship between test scores (predictor) and expected outcome on a relevant task (criterion).

CRITERION-REFERENCED TESTS
- It represents a fundamental shift: the focus is on what the test taker can do rather than on comparisons to the performance level of others.
- It is best suited to testing of basic academic skills in educational settings.

                         | Norm-Referenced                                                      | Criterion-Referenced
Purpose                  | compare examinees’ performance to one another                        | compare examinees’ performance to a predefined standard
Item content             | broad domain of skills                                               | narrow domain of skills
Item selection           | items vary widely in difficulty level                                | most items of similar difficulty level
Interpretation of scores | usually expressed as percentile, standard score, or grade equivalent | usually expressed as percentage, with passing level predetermined

- Reliability refers to the attribute of consistency in measurement.

CLASSICAL TEST THEORY AND THE SOURCES OF MEASUREMENT ERROR
- Classical test theory is an approach also called the theory of true and error scores. Charles Spearman laid the foundation for the theory, which was subsequently extended and revised by contemporary psychologists.
- A test score results from the influence of two factors:
1) Factors that contribute to consistency (stable attributes of the individual that are being measured).
2) Factors that contribute to inconsistency (characteristics of the individual, the test, and the situation that have nothing to do with the attribute being measured).
- This conceptual breakdown can be expressed in a simple equation:
X = T + e, where X is the obtained score, T is the true score, and e represents error of measurement. Thus error of measurement represents the discrepancy between the obtained score and the true score:
e = X – T
- The true score is never known.

SOURCES OF MEASUREMENT ERROR
1) Item Selection/Sampling
2) Test Administration
3) Test Scoring
4) Systematic Measurement Error – its effect is not unpredictable and inconsistent like the other three sources. A systematic error arises when a test consistently measures something other than the trait for which it is intended.

MEASUREMENT ERROR AND RELIABILITY
Basic Assumptions of Classical Theory
1) Unsystematic measurement errors act as random influences.
2) Mean error of measurement is zero.
3) True scores and errors are uncorrelated.
4) Measurement errors are not correlated with errors on other tests.
- Test scores vary as the result of variability in true scores and variability due to measurement error.
($\sigma_X^2 = \sigma_T^2 + \sigma_e^2$)

THE RELIABILITY COEFFICIENT
- Reliability expresses the relative influence of true and error scores on obtained scores.
- The reliability coefficient is the ratio of true score variance to the total variance of test scores (0.0 to 1.0).
- It is an index of potential or actual consistency of obtained scores.
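
The variance ratio that defines the reliability coefficient can be worked through in a few lines of Python. This is a minimal sketch: the function names and the variance values in the usage lines are hypothetical, and because the true score is never known in practice, the true-score variance here is simply assumed for illustration.

```python
def total_variance(true_var, error_var):
    """Classical test theory: obtained-score variance = true + error variance."""
    return true_var + error_var


def reliability_coefficient(true_var, error_var):
    """Ratio of true-score variance to total variance (ranges from 0.0 to 1.0)."""
    return true_var / total_variance(true_var, error_var)


def measurement_error(obtained, true):
    """e = X - T: the discrepancy between obtained and true score."""
    return obtained - true


if __name__ == "__main__":
    # Hypothetical values: true-score variance 80, error variance 20.
    print(reliability_coefficient(80, 20))
    # With no error variance the ratio reaches its upper limit of 1.0.
    print(reliability_coefficient(80, 0))
```

As error variance grows relative to true-score variance, the ratio falls toward 0.0, which is why a low coefficient signals inconsistent obtained scores.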
THE CORRELATION COEFFICIENT
- A correlation coefficient expresses the degree of linear relationship between two sets of scores obtained from the same persons.
- It can take values ranging from -1.00 to +1.00.
- One use of the correlation coefficient is to gauge the consistency of psychological test scores. If the scores are highly consistent, then the scores of persons taking the test on two occasions will be strongly correlated.

Methods for Estimating Reliability
- These fall into two broad groups:

1) Temporal Stability – approaches that directly measure the consistency of test scores over time.
2) Internal Consistency – approaches that rely upon a single test administration to gauge reliability.

TEMPORAL STABILITY

Test-Retest Reliability – administering the identical test twice to the same group of heterogeneous and representative subjects.

Alternate-Forms Reliability – producing two forms of the same test. Alternate forms of a test incorporate similar content and cover the same range and level of difficulty in items.
- Reliability estimates are derived by administering both forms to the same group and correlating the two sets of scores. This has much in common with test-retest, but one fundamental difference is that alternate-forms methodology introduces item-sampling differences as an additional source of error variance.

INTERNAL CONSISTENCY
Split-Half Reliability – The reliability estimate is obtained by correlating the pairs of scores obtained from equivalent halves of a test administered only once to a representative sample of examinees.
- The major challenge with split-half reliability is dividing the test into two nearly equivalent halves. The most common method is to compare scores on the odd items versus the even items if items are arranged in approximate order of difficulty.
- The Spearman-Brown Formula is needed to compute an additional reliability estimate for the full instrument, aside from calculating a Pearson r for the two equivalent halves. A longer test is more reliable than a shorter test if the longer test embodies equivalent content and similar item difficulty.
- Coefficient Alpha can be thought of as the mean of all possible split-half coefficients. It is an index of the internal consistency of the items, that is, their tendency to correlate positively with one another.
- Traditionally, coefficient alpha has been thought of as an index of unidimensionality, that is, the degree to which a test or scale measures a single factor.

Kuder–Richardson Estimate of Reliability
- A specific formula developed by Kuder and Richardson, referred to as Kuder-Richardson formula 20 or KR-20.
- This is relevant to the special case in which each item is scored 0 or 1.

Interscorer Reliability
- A sample of tests is independently scored by two or more examiners, and scores for pairs of examiners are then correlated.

Method                      | No. of Forms | No. of Sessions | Sources of Error Variance
Test-Retest                 | 1            | 2               | Changes over time
Alternate Forms (immediate) | 2            | 1               | Item sampling
Alternate Forms (delayed)   | 2            | 2               | Item sampling; changes over time
Split-Half                  | 1            | 1               | Item sampling; nature of split
Coefficient Alpha           | 1            | 1               | Item sampling; test heterogeneity
Interscorer                 | 1            | 1               | Scorer differences

ITEM RESPONSE THEORY (IRT)
- Also known as latent trait theory or the Rasch Model
- Developed by Danish mathematician Georg Rasch

Item Response Functions (IRF)
- Also known as the item characteristic curve (ICC).
- It is a mathematical equation that describes the relation between the amount of a latent trait an individual possesses and the probability that he or she will give a designated response to a test item designed to measure that construct.
- IRFs are commonly used to eliminate items that don’t function optimally in a psychometric sense.
- They can also be used to determine the difficulty level of test items. Difficulty is indexed by how much of the trait is needed to answer the item correctly.
- The item discrimination parameter is a gauge of how well the item differentiates among individuals at a specific level of the trait in question.

Information Functions
- In psychological measurement, information represents the capacity of a test item to differentiate among people.
- The beauty of IRT is that the item information functions from different scale items can be added together to derive the scale information function, which is analogous to test reliability in classical test theory.

Invariance in IRT
- First, invariance in IRT means that an examinee’s position on the latent-trait continuum (his or her score) can be estimated from the responses to any set of test items with known IRFs.
- Second, IRFs do not depend on the characteristics of a particular population.
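
The internal-consistency estimates named in these notes (the Spearman-Brown correction for a split-half correlation, and coefficient alpha) can be sketched in Python as follows. The notes name these methods but do not give their formulas; the code uses the standard textbook forms, the function names are hypothetical, and the small 0/1 score matrix is made-up illustration data. For items scored 0 or 1, coefficient alpha computed this way coincides with KR-20.

```python
from statistics import pvariance


def spearman_brown(r_half):
    """Estimate full-test reliability from the correlation between two halves."""
    return 2 * r_half / (1 + r_half)


def coefficient_alpha(scores):
    """Coefficient alpha for a persons-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(scores[0])  # number of items
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical data: 4 examinees (rows) by 3 dichotomous items (columns).
    scores = [
        [1, 1, 1],
        [1, 1, 0],
        [1, 0, 0],
        [0, 0, 0],
    ]
    print(coefficient_alpha(scores))
    # A half-test correlation of 0.6 corrected to full length.
    print(spearman_brown(0.6))
```

The Spearman-Brown step reflects the point made above that a longer test is more reliable than a shorter one of equivalent content: the corrected value is always at least as large as the half-test correlation when that correlation is positive.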


VALIDITY
__________________________________

- Validity is the extent to which a test measures what it purports to measure. The inferences drawn from a valid test are appropriate, meaningful, and useful.

CONTENT VALIDITY
- Content validity is determined by the degree to which the questions, tasks, or items on a test are representative of the universe of behavior the test was designed to sample.
- It involves item sampling issues. If the sample (specific items on the test) is representative of the population (all possible items), then the test possesses content validity.
- When evaluating content validity, response specification is also an integral part of defining the relevant universe of behaviors.
Quantification of Content Validity
Lawshe method of interrater agreement
- Ratings of each judge are dichotomized into weak relevance versus strong relevance. For each item, the conjoint ratings of two judges can be entered into a two-by-two agreement table.
- A coefficient of content validity can be derived from the following formula, where D is the number of items both judges rated as strongly relevant and A, B, and C are the counts in the remaining cells of the table:
content validity = D / (A+B+C+D)
* If more than two judges are used, this computational procedure can be completed with all possible pair-wise combinations of judges, and the average coefficient reported.
* Quantification of content validity is no substitute for careful selection of items.

FACE VALIDITY
- Face validity is a matter of social acceptability and not a technical form of validity in the same category as content, criterion-related, or construct validity.

CRITERION-RELATED VALIDITY
- Criterion-related validity is demonstrated when a test is shown to be effective in estimating an examinee’s performance on some outcome measure, called a criterion.

Characteristics of a Good Criterion
1) A criterion must itself be reliable if it is to be a useful index of what the test measures.
The validity coefficient (rxy) can be no larger than the square root of the test reliability (rxx) multiplied by the criterion reliability (ryy).
2) A criterion must also be free from contamination from the test itself, referred to as criterion contamination.
a) the criterion is contaminated by its artificial commonality with the predictor
b) contamination is also possible when the criterion consists of ratings from experts

Concurrent Validity
- The criterion measures are obtained at approximately the same time as the test scores.
- Correlations between a new test and existing tests are often cited as evidence of concurrent validity.
Predictive Validity
- The criterion measures are obtained in the future, usually months or years after the test scores are obtained.
- A regression equation describes the best-fitting straight line for estimating the criterion from the test, for example:
Y = 0.7X + 0.2
- The Standard Error of Estimate is the margin of error to be expected in the predicted criterion score.
- Standards of predictive accuracy are, in part, value judgments. Psychological testing is not measurement per se but measurement in the service of decision making.
- Taylor-Russell Tables can determine the proportion of successes expected through the application of a test. The tester must specify (1) the predictive validity of the test, (2) the selection ratio, and (3) the base rate for successful applicants.

CONSTRUCT VALIDITY
- A construct is a theoretical, intangible quality or trait in which individuals differ.
- A test designed to measure a construct must estimate the existence of an inferred, underlying characteristic based on a limited sample of behavior. Construct validity refers to the appropriateness of these inferences about the underlying construct.
- Construct validity is regarded as the unifying concept for all types of validity.
- No criterion or universe of content is accepted as entirely adequate to define the quantity to be measured.

APPROACHES TO CONSTRUCT VALIDITY
Test Homogeneity
- The test measures a single construct (also referred to as internal consistency).
- The most commonly used method is to correlate each potential item with the total score and select items that show high correlations with the total score, and to correlate subtests with the total score in the early phases of test development.
Appropriate Developmental Changes
- Some constructs can be assumed to show age-graded changes.
Theory-Consistent Group Differences
- Persons whose background and characteristics suggest they are high on the construct measured by the test should obtain high scores, and the other way around.
Theory-Consistent Intervention Effects
- Another approach to construct validation is to show that test scores change in the appropriate direction and amount in reaction to planned or unplanned interventions.
Convergent and Discriminant Validation
- Convergent validity is demonstrated when a test correlates highly with other variables or tests with which it shares an overlap of constructs.
- Discriminant validity is demonstrated when a test does not correlate with variables or tests from which it should differ.
- Multitrait-multimethod matrix
Factor Analysis
- A specialized statistical technique which identifies the minimum number of determiners (factors) required to account for the intercorrelations among a
battery of tests.
- The goal of factor analysis is to find a smaller set of dimensions, called factors, that can account for the observed array of intercorrelations among individual tests.
- A factor loading is a correlation between an individual test and a single factor (-1.0 to +1.0).
Classification Accuracy
- Sensitivity has to do with accurate identification of patients who have a syndrome.
- Specificity has to do with accurate identification of normal patients.
- Robust levels of sensitivity and specificity provide corroborating evidence of test validity, and test developers should strive to achieve the highest possible levels of both.

EXTRAVALIDITY CONCERNS
- By acknowledging the importance of the extravalidity domain, psychologists confirm that the decision to use a test involves social, legal, and political considerations that extend beyond the traditional questions of technical validity.

Unintended Side Effects of Testing
- The examiner must determine whether the benefits of giving the test outweigh the costs of potential side effects.

THE WIDENING SCOPE OF TEST VALIDITY
- A functionalist definition of validity asserts that a test is valid if it serves the purpose for which it is used.
- Test validity, then, is an overall evaluative judgment of the adequacy and appropriateness of inferences and actions that flow from test scores.

TEST CONSTRUCTION
__________________________________

1) DEFINING A TEST
- In order to construct a new test, the developer must have a clear idea of what the test is to measure and how it differs from existing instruments.

2) SELECTING A SCALING METHOD
- The immediate purpose of testing is to assign numbers to responses on a test so that the examinee can be judged to have more or less of the characteristic measured.
Levels of Measurement
Nominal Scale – The numbers serve only as category names.
Ordinal Scale – It constitutes a form of order or ranking. However, ordinal scales fail to provide information about the relative strength of rankings.
Interval Scale – It provides information about ranking, but also supplies a metric for gauging the differences between rankings.
Ratio Scale – Has all the characteristics of an interval scale but also possesses a conceptually meaningful zero point, at which there is a total absence of the characteristic being measured. Ratio scales are rare in psychological measurement.

REPRESENTATIVE SCALING METHODS
Expert Rankings
Method of Equal-Appearing Intervals
- L. L. Thurstone proposed a method for constructing interval-level scales from attitude statements.
1. Collect as many true-false statements as possible.
2. Have known experts or judges (qualified for the task) rate the statements to determine the degree of favorability/unfavorability. Each statement is sorted into 1 of 11 equidistant categories.
3. The mean favorability and standard deviation for each item are determined.
4. Statements whose ratings have large standard deviations are dropped. Usually 20-30 items are chosen such that the statements cover the range of the dimension (favorable to unfavorable). It is then assumed that the differences between items on the final scale fulfil the properties of an interval scale.
5. Persons who take the scale are asked to mark all statements with which they agree. Their score is determined by averaging the scale values of those items endorsed.
Method of Absolute Scaling
- Also developed by Thurstone, a procedure for obtaining a measure of absolute item difficulty based on results for different age groups of test takers.
- Essentially, a set of common test items is administered to two or more age groups.
- One age group serves as the anchor group. Item difficulty is measured in common units such as standard deviation units of ability for the anchor group.
- It is widely used in group achievement and aptitude testing. It is also used as a basis for dropping redundant test items.
Likert Scale
- widely used today
- presents the examinees with a continuum of responses.
- The total score is obtained by adding the scores from individual items. It is also referred to as a summative scale.
Guttman Scales
- Respondents who endorse one statement also agree with milder statements pertinent to the same underlying continuum.
- The Guttman scale was originally devised to determine whether a set of attitude statements is unidimensional.
Method of Empirical Keying
- This scaling method is devoid of theory or expert judgment.
- In this method, test items are selected based entirely on how well they contrast a criterion group from a normative sample.
- The endorsement frequency of the criterion group is compared to the endorsement frequency of the normative sample. Items which show a large difference in endorsement frequency are selected for the scale, keyed in the direction favored by the criterion group.
- The raw score on the scale is simply the number of items answered in the keyed direction.
- The downside of this scaling method is that some
items selected for the scale may show no obvious relationship to the construct measured.
Rational Scale Construction (Internal Consistency)
- This approach to scale construction is a popular method for the development of self-report personality inventories.
- The heart of this method is that all scale items correlate with each other and also with the total score of the scale.
- The developer of the test conceptually articulates beforehand the central theme or unifying dimension around which the test items are clustered.
- Very large samples, drawn from the target population the scale is designed for, are desirable.
- The next step is to correlate scores on each of the preliminary items with the total score (the biserial correlation coefficient is needed for dichotomous scoring).
- Items with weak correlations and reversals (negative correlations) are discarded.

3) CONSTRUCTING THE ITEMS
Initial Test Questions in Test Construction
Homogeneity versus Heterogeneity of Item Content
- In large measure, whether item content is homogeneous or varied is dictated by the manner in which the test developer has defined the new instrument.
Range of Item Difficulty
- The range of item difficulty must be sufficient to allow for meaningful differentiation of examinees at both extremes.
- A ceiling effect is observed when significant numbers of examinees obtain perfect or near-perfect scores. A floor effect is observed when significant numbers of examinees obtain scores at or near the bottom of the scale.
Initial Item Pool
- The first draft of the test items must contain excess items, perhaps double the number of questions desired on the final draft.

Table of Specifications
- In determining which cognitive processes and item domains should be included, a table of specifications should be made.
- It enumerates the information and cognitive tasks on which examinees are to be assessed (e.g. content-by-process matrix).

Item Formats
- For group-administered tests of intellect or achievement, the technique of choice is the multiple-choice question.
1) It permits quick and objective machine scoring.
2) It can measure conceptual as well as factual knowledge.
3) However, it is difficult to write good distractor options, and the presence of the correct response may cue a half-knowledgeable respondent to the correct answer.
- A matching question suffers from serious psychometric shortcomings (but is popular in classroom settings):
1) Responses are not independent – missing one match usually compels the examinee to miss another.
2) Options in a matching question must be very closely related or the question will be too easy.
- For individually-administered tests, the procedure of choice is the short-answer objective item.
- Personality tests often use true-false questions because they are easy for subjects to understand.
1) A drawback is that answers to true-false items may reflect social desirability rather than personality traits.
- An alternative to T-F questions is the forced-choice methodology, in which the examinee must choose between two equally desirable (or undesirable) options.

4) TESTING THE ITEMS
- Item Analysis refers to a family of statistical procedures used to identify the best items.
Item Difficulty Index
- It is a useful tool for identifying items that should be altered or discarded.
- For true-false or multiple-choice items, the optimal level of item difficulty needs to be adjusted for the effects of guessing.
- In general, the optimal level of item difficulty can be computed from the formula (1.0 + g)/2, where g is the chance success level.
- If a test is to be used for selection of an extreme group by means of a cutting score, it may be desirable to select items with difficulty levels outside the .3 to .7 range.
p = NP/N; the number of students who answered correctly divided by the number of test takers.
Item-Reliability Index
- A test must have a high level of internal consistency, in which the items are reasonably homogeneous.
- To determine this for an item, correlate scores on that item with scores on the total test through a special type of statistic called the point-biserial correlation coefficient (used for dichotomously scored items, scored 1 or 0).
- By computing the item-reliability index for every item in the preliminary test, we can eliminate the “outlier” items that have the lowest values on this index, that is, those possessing weak dispersion of scores.
Item-Validity Index
- It is a useful tool for identifying predictively useful items in the preliminary test.
- It is the product of the item’s standard deviation and its point-biserial correlation with the criterion.
Item-Characteristic Curves
- It is an item response function. The ICC is a graphical display of the relationship between the probability of a correct response and the examinee’s position on the underlying trait measured by the test.
- The simplest ICC model is the Rasch Model, which rests on two assumptions:
1) Test items are unidimensional and measure one common trait.
2) Test items vary on a continuum of difficulty level.
- ICCs are especially useful for identifying items that perform differently for subgroups of examinees.
- The ICC is particularly appropriate for certain forms of computerized adaptive testing (CAT).
Item-Discrimination Index
- It is a statistical index of how efficiently an item discriminates between persons who obtain high and low scores on the entire test.
- On an item-by-item basis, this index compares the performance of subjects in the upper and lower regions of total test score. The upper and lower ranges are generally defined as the upper- and lower-scoring 10 percent to 33 percent of the sample.
d = (U-L) / N
(U and L are the numbers of examinees in the upper and lower ranges who answer the item correctly; N is the number of examinees in the upper or lower range.)
- In addition, a good item should show proportional dispersion of incorrect choices for both high- and low-scoring subjects.

Retained
- if difficulty level is average
- if discriminatory power is positive
Rejected
- if difficulty level is easy/very easy or difficult/very difficult
- if discriminatory power is negative or zero
Modified
- if difficulty level is average
- if discriminatory power is negative

5) REVISING THE TEST
- Collecting new data from a second tryout sample.
Cross Validation
- Refers to the practice of using the original regression equation in a new sample to determine whether the test predicts the criterion as well as it did in the original sample.
Validity Shrinkage
- A test predicts the relevant criterion less accurately with the new sample of examinees than with the original tryout sample.
- Validity shrinkage is an inevitable part of test development and underscores the need for cross validation.
Feedback from Examinees

6) PUBLISHING THE TEST
Production of Testing Materials
- The first guideline for test production is that the physical packaging of test materials must allow for quick and smooth administration.
Technical Manual and User’s Manual
- In the technical manual, the prospective user can find information about item analyses, scale reliabilities, cross-validation studies, and the like.
- The user’s manual gives instructions for administration and also provides guidelines for test interpretation. The test manual should also provide essential data on reliability and validity.

THEORIES AND INDIVIDUAL TESTS OF INTELLIGENCE AND ACHIEVEMENT
__________________________________
- Intelligence is one of the most highly researched topics in psychology.
Definitions of Intelligence
- Broadly speaking, experts agree that intelligence is
(1) the capacity to learn from experience
(2) the capacity to adapt to one’s environment

FACTOR ANALYSIS
Two Forms of Factor Analysis
- The main goal of factor analysis is to help produce a parsimonious description of large, complex data sets.
- In confirmatory factor analysis, the purpose is to confirm that test scores and variables fit a certain pattern predicted by a theory (e.g. the theory underlying a certain intelligence test prescribed that the subtests belong to three factors).
- This form of FA is essential to the validation of many ability tests.
- The central purpose of exploratory factor analysis is to summarize the interrelationships among a large number of variables in a concise and accurate manner as an aid in conceptualization (e.g. a battery of tests represents only four underlying factors).
The Correlation Matrix
- The correlation matrix is a complete table of intercorrelations among all the variables.
The Factor Matrix and Factor Loadings
- The factor matrix consists of correlations called factor loadings (which can take on values from -1.00 to +1.00), indicating the weighting of each variable on each factor.
- The derivation of factors may seem mysterious but a factor is nothing more than a weighted linear sum of the variables; that is, each factor is a precise statistical combination of the tests used in the analysis.
- A factor is produced by “adding in” carefully determined portions of some tests and perhaps “subtracting out” fractions of other tests.
The Rotated Factor Matrix
- Position of the reference axes is arbitrary. The axes can be rotated so that a sensible fit with the factor loadings can be produced.
- Thurstone’s objective statistical criteria can be used to produce the final rotated factor matrix. In a rotation to positive manifold, the computer program seeks to eliminate as many of the negative factor loadings as possible. In a rotation to simple structure, the computer program seeks to simplify the factor loadings so that each test has significant loadings on as few factors as possible.
- Varimax rotation should not be used if the theoretical expectation suggests that a general factor may occur. It is used only for multiple ability factors.
The Interpretation of Factors
- In order to interpret or name a factor, the researcher must make a reasoned judgment about the common processes and abilities shared by the tests with strong loadings on that factor.
Issues in Factor Analysis
- An important point is that a particular kind of factor can emerge from factor analysis only if the tests and measures contain that factor in the first place.
- Sample size is crucial to a stable factor analysis.
- With orthogonal axes, the factors are at right angles to one another, which means that they are uncorrelated.
- With oblique axes, the factors are correlated
among themselves. With oblique rotations it is also possible to factor analyze the factors themselves. Such a procedure may yield one or more second-order factors.

GALTON AND SENSORY KEENNESS
- The notion by Sir Francis Galton and his disciple James McKeen Cattell that intelligence is underwritten by keen sensory abilities.
- Reaction Time – Movement Time (RT-MT) apparatus

SPEARMAN AND THE g FACTOR
- Charles Spearman proposed that intelligence consisted of two kinds of factors, a single general factor g and numerous specific factors (s1, s2, s3)
- The Two-Factor Theory of Intelligence
- Factor analysis aided his investigation of the nature of intelligence.
- Spearman believed that individual differences in g were most directly reflected in the ability to use three principles of cognition: apprehension of experience, eduction of relations, and eduction of correlates.

THURSTONE AND THE PRIMARY MENTAL ABILITIES
- In his analysis of how scores on different kinds of intellectual tests correlated with each other, Thurstone concluded that several broad factors (designated as primary mental abilities) – and not a single g factor – could best explain empirical results.
1) Verbal Comprehension – vocabulary, reading comprehension, and verbal analogies
2) Word Fluency – anagrams or quickly naming words in a given category
3) Number – arithmetic computation
4) Space – ability to visualize how a 3D object would appear if it were rotated or partially disassembled
5) Associative Memory – learning to associate pairs of unrelated items
6) Perceptual Speed – checking for similarities and differences in visual details
7) Inductive Reasoning – finding a rule for number series completion
- Thurstone published the Primary Mental Abilities Test (PMAT) consisting of separate subtests.

VERNON’S HIERARCHICAL GROUP FACTOR THEORY OF INTELLIGENCE
- P.E. Vernon provided a rapprochement between the viewpoints of Spearman and Thurstone.
- In his view, g was the single factor at the top of the hierarchy that included two major group factors labelled verbal-educational (V:ed) and practical-mechanical-spatial-physical (k:m). Underneath these two major group factors are the several minor group factors resembling the PMA; specific factors occupied the bottom of the hierarchy.

CATTELL-HORN-CARROLL (CHC) THEORY
- also known as the Three-Stratum Theory of Cognitive Abilities
- Main proponent: Raymond Cattell; revised and extended by John Horn and John Carroll.
- The theory synthesizes the findings from almost a century of factor analytic research on intelligence.
- According to CHC theory, intelligence consists of pervasive, broad, and narrow abilities that are hierarchically organized.
- Stratum III, the highest level, is the g factor overseeing all cognitive abilities.
- Stratum II consists of broad factors that include basic constitutional and longstanding characteristics of individuals that can govern or influence a variety of behaviors in a given domain.
- Stratum I consists of narrow abilities that represent greater specializations of abilities (these continue to undergo revision and extension since the stratum is not yet firmly established).
Broad Factors in CHC
1) Fluid Intelligence/Reasoning – It is used for novel tasks that cannot be performed automatically. It is largely nonverbal and not heavily dependent on exposure to a specific culture.
2) Crystallized Intelligence/Knowledge – It is defined as an individual’s acquired cultural knowledge (e.g. vocabulary).
3) Domain-Specific Knowledge – Represents a person’s acquired knowledge in one or more specialized domains that is atypical of the experiences of individuals in the culture (knowledge of biology).
4) Visual-Spatial Abilities – This ability has something to do with imagining, retaining, and transforming mental representations of visual images.
5) Auditory Processing – Involves the capacity to analyze, comprehend, and synthesize patterns or groups of sounds.
6) Broad Retrieval [Memory] – Includes the ability to consolidate and store new information in long-term memory and then to retrieve the information later through association (e.g. ideational fluency, naming facility).
7) Cognitive Processing Speed – This ability refers to the speed of executing overlearned or automatized cognitive processes, especially when high levels of attention and focused concentration are required.
8) Decision/Reaction Time or Speed – This is the ability to make decisions quickly in response to simple stimuli, measured by reaction time.

PIAGET AND ADAPTATION
- theory of cognitive development by Jean Piaget
- has a number of implications for the design of children’s intelligence tests
- interviews and informal tests as methods of investigating intellectual development
1) Children’s thought is qualitatively different from adults’ thought.
- Conservation refers to the awareness that physical quantities do not change in amount when they are superficially altered in appearance.
2) Psychological structures called schemas are the primary basis for gaining new knowledge about the world.
- Schemas are an organized pattern of behavior or a well-defined mental structure that leads to knowing
how to do something.
- The mechanism by which schemas become more mature is called the process of equilibration.
- Assimilation is the application of a schema to an object, person, or event while accommodation is the adjustment of an unsuccessful schema so that it works.
3) Four Stages of Cognitive Development
Sensorimotor (birth to 2 years)
- Infants experience the world mainly through their senses and motor abilities. Object permanence develops at the end of this stage.
Preoperational (2 to 6 years)
- Conservation concepts are not yet developed, but children understand the idea of a functional relationship. The ability to mentally symbolize things with words and images also develops.
Concrete Operational (7 to 12 years)
- Children typically develop conservation and demonstrate limited capacities of logical reasoning. The concept of reversibility develops – the knowledge that one action can reverse or negate another.
Formal Operational (12 years and up)
- The systematic problem solving that we associate with adult thought usually develops in this stage.

GUILFORD AND THE STRUCTURE-OF-INTELLECT MODEL
- J.P. Guilford proposed an elegant structure-of-intellect (SOI) model, which classifies intellectual abilities along three dimensions called operations, contents, and products.
- a total of 150 factors of intellect (5 operations x 5 contents x 6 products)
1) Operations – are the kind of intellectual operations required by the test
Cognition – Discovering, knowing, or comprehending
Memory – Committing items of information to memory, such as a series of numbers
Divergent Production – Retrieving from memory items of a specific class, such as naming objects that are both hard and edible
Convergent Production – Retrieving from memory a correct item, such as a crossword puzzle word
Evaluation – Determining how well a certain item of information satisfies a specific logical requirement
2) Contents – refers to the nature of the materials or information presented to the examinee
Visual – Images presented to the eyes
Auditory – Sounds presented to the ears
Symbolic – Such as the mathematical symbols that stand for something
Semantic – Meanings, usually of word symbols
Behavioral – The ability to comprehend the mental state and behavior of other persons
3) Products – refer to the different kinds of mental structures that the brain must produce to derive a correct answer.
Unit – A single entity having a unique combination of properties or attributes
Class – What it is that similar units have in common, such as a set of triangles or high-pitched tones
Relation – An observed connection between two items, such as two tones an octave apart
System – Three or more items forming a recognizable whole such as a melody or a plan for a sequence of actions
Transformation – A change in an item of information, such as a correction of a misspelling
Implication – What an individual item implies, such as to expect thunder following lightning

Convergent production – the construction of a single correct answer to a stimulus situation
Divergent production – the creation of numerous appropriate responses to a single stimulus situation

THEORY OF SIMULTANEOUS AND SUCCESSIVE PROCESSING
- has roots in the neuropsychological investigations of Aleksandr Luria (individual case studies and clinical observations of brain-injured soldiers).
- Since this approach focuses upon the mechanics by which information is processed, it is called an information-processing theory.
Simultaneous processing of information is characterized by the execution of several different mental operations simultaneously (e.g. spatial analysis)
- associated with the occipital and parietal lobes
Successive processing of information is needed for mental activities in which a proper sequence of operations must be followed (e.g. remembering a series of digits, repeating a string of words)

INFORMATION-PROCESSING THEORIES OF INTELLIGENCE
- Information-processing conceptions of intelligence propose models of how individuals mentally represent and process information.
- The architectural system (hardware) refers to biologically based properties necessary for information processing (e.g. memory span and speed of decoding/encoding information).
- In addition to the structural component of intelligence, there are various functional components (software). The executive system, which refers to environmentally learned components that steer problem solving, provides overall guidance to the functional components (e.g. knowledge base, schemes, control processes, metacognition).

INTELLIGENCE AS A BIOLOGICAL CONSTRUCT
- nature of intelligence by looking at the properties of the brain itself
- synchronized electrical activity of brain cells
- glucose metabolism revealed by PET scan

MULTIPLE INTELLIGENCE THEORY: GARDNER
- Howard Gardner argues for the existence of several relatively independent human intelligences although he admits that the exact nature, extent, and number of the intelligences have not been definitely established.
- based loosely on brain-behavior relationships
- The seven intelligences include linguistic, logical-mathematical, spatial, musical, bodily-kinesthetic, interpersonal, and intrapersonal.

TRIARCHIC THEORY OF INTELLIGENCE: STERNBERG
- His theory emphasizes what he calls successful intelligence or the ability to adapt to, shape, and select environments to accomplish one’s goals and those of one’s society and culture.
Componential (Analytical) Intelligence
- Metacomponents or executive processes (planning)
- Performance components (short-term memory)
- Knowledge acquisition components
Experiential (Creative) Intelligence
- Ability to deal with novelty
- Ability to automatize information processing
Contextual (Practical) Intelligence
- Adaptation to real-world environment
- Selection of a suitable environment
- Shaping of the environment
- Sternberg Triarchic Abilities Test (STAT)

INDIVIDUAL TEST OF INTELLIGENCE AND ACHIEVEMENT
________________________________
The Wechsler Scales of Intelligence
- David Wechsler of Bellevue Hospital defined intelligence as “the aggregate or global capacity of the individual to act purposefully, to think rationally and to deal effectively with his environment.”
- Wechsler explained that existing instruments such as the Stanford-Binet were inadequate for assessing adult intelligence. To correct these shortcomings, he added performance items to balance verbal questions, reduced the emphasis upon speeded questions, and invented a new age-relative formula for obtaining the IQ.
IQ = Attained or actual score / expected mean score for age
- First format: Wechsler-Bellevue Intelligence Scales
- Wechsler also hoped to use this test as an aid in psychiatric diagnosis, dividing his scale into separate verbal and performance sections.
- produced 12 intelligence tests in a span of 65 years, and each new test remained faithful to the first format.

Common features of latest versions and editions of Wechsler intelligence tests:
1. Fourteen or fifteen subtests
2. Empirically based breakdown into composite scores and full scale IQ
- verbal comprehension
- perceptual organization/reasoning
- working memory
- processing speed
3. A common metric for IQ and index scores (mean of 100 and standard deviation of 15 for all tests and age groups; a mean of 10 and SD of 3 for scaled scores on each subtest)
4. Common subtests for the different test versions

Subtests:
VERBAL IQ
Vocabulary
- the single best measure of overall intelligence on the Wechsler scale
Similarities
- evaluates the examinee’s ability to distinguish resemblances in objects, facts, and ideas.
Arithmetic
- orally presented mathematics problems that must be solved without paper or pencil within a time limit
Digit Span
- consists of two separate sections: Digits Forward and Digits Backward
- measures immediate auditory recall and visual memory trace
Comprehension
- items that require explanation rather than mere factual knowledge (interpreting proverbs)
- a measure of “social intelligence”
Information
- found on all three Wechsler tests
- regarded as one of the best measures of general ability among the subtests

PERFORMANCE IQ
Picture Completion
- identifying the important part that is missing from a picture
Coding/Digit Symbol
- requires the ability to quickly produce distinctive verbal codes to represent each of the symbols in memory; on-the-spot learning of an unfamiliar task
Block Design
- the examinee must reproduce two-dimensional geometric designs by proper rotation and placement of 3D colored blocks
- requires the analysis of spatial relations, visual-motor coordination, and the rigid application of logic
- has the highest loading on Performance subtests
Matrix Reasoning
- taps pattern completion, reasoning by analogy, and serial reasoning
- an excellent measure of inductive reasoning based on figural stimuli
- the only untimed performance subtest on WAIS-III
Picture Arrangement
- deciphering the gestalt of the entire story from its disarranged elements
Picture Concepts
- measures abstract, categorical reasoning
Object Assembly
- assembling pieces of a jigsaw puzzle to form a common object
- requires high levels of perceptual organization
- least reliable of the Wechsler subtests
Letter-Number Sequencing
- found only in WAIS-III
- measures attention, concentration, and freedom from distractibility
- under Working Memory Index together with digit span and arithmetic
Symbol Search
- highly speeded subtest, a measure of processing
speed
Cancellation
- designed to measure processing speed, vigilance, and visual attention

WECHSLER ADULT INTELLIGENCE SCALE
Significant changes WAIS-R to WAIS-III
- Standardization sample: 16-89 yrs old
- addition of three subtests
- inclusion of an alternative model for scoring the test (4 index scores and Full Scale IQ)
- extending coverage to age 89
- adding easy items to improve the assessment of mental retardation

- comprised of 14 subtests
- Object assembly as optional subtest
- IQ breakdown: object assembly replaced with matrix reasoning
- index scores are “purer” measures than verbal and performance IQs
- WAIS-III is a harder test than WAIS-R.

WECHSLER INTELLIGENCE SCALE FOR CHILDREN – IV
- 6 ½ to 16 ½ standardization sample
- consists of 15 subtests, 10 of which are designated as core subtests for computing composite scores and Full Scale IQ, 5 of which are supplemental.
- abandonment of the two-factor division of intelligence

STANFORD-BINET INTELLIGENCE SCALES:
- zeitgeist: identifying those with special needs
- Alfred Binet invented the first modern intelligence test in 1905

* In the 1905 scale, (1) its aim was not measurement but classification, (2) brief and practical test, (3) measured practical intelligence and not lower-level abilities, (4) arranged in approximate level of difficulty instead of content
- heavily weighted towards verbal skills
* The 1908 scale consists of 58 items. Very simple items were dropped and new items were added at the higher end of the scale. The concept of mental level was introduced.
- In the 1911 scale, it was extended to adult range, and there are 5 tests for each age level.
- In the 1916 scale (Terman & Merrill), the concept of IQ was introduced.
* 5th edition of Stanford-Binet Scale: Five Factors of Intelligence (Verbal & Nonverbal); 2-85 yrs old
1. Fluid Reasoning
2. Knowledge
3. Quantitative Reasoning
4. Visual-Spatial Reasoning
5. Working Memory
- Routing Procedure – estimates the general cognitive ability before proceeding to the remainder of the test
- constructed according to the principles of item response theory
- includes high-end items to assess giftedness and improved low-end items to provide better assessment for very young children (as young as age 2) and adults with mental retardation, and the Working Memory factor can help in assessing ADHD
- religious tradition has been considered

DETROIT TESTS OF LEARNING APTITUDE – 4
- DTLA-4 is designed for schoolchildren 6 through 17 years of age and individually administered
- the subtests are largely within the Binet-Wechsler heritage except Story Construction
- The General Mental Ability composite is formed by combining standard scores for all 10 subtests in the battery.
- The Optimal Level composite is based on the highest four standard scores earned by the examinee and is thought to represent how well the examinee might perform under optimal circumstances.
- mean of 100 and standard deviation of 15.
Verbal | Nonverbal | Linguistic
Attention-enhanced | Attention-reduced | Attentional
Motor-enhanced | Motor-reduced | Motoric
Fluid | Crystallized | Horn & Cattell
Simultaneous | Successive | Das
Associative | Cognitive | Jensen
Verbal | Performance | Wechsler

KAUFMAN ASSESSMENT BATTERY FOR CHILDREN-II
- an individually administered test of cognitive abilities designed for children and adolescents ages 3 through 18
- the first edition is grounded on Luria’s neuropsychological theory of processing, but the second edition already includes CHC theory
- Sequential, Simultaneous, Learning, Planning, Knowledge, Global Scale (Fluid-Crystallized Index, Mental Processing Index, Nonverbal Index)
- composed of 18 subtests scaled to a mean of 10 and a standard deviation of 3 for all age groups
- The CHC approach generally is favored for evaluating children for gifted and talented programs while the Luria model is preferred for:
o a child from a bilingual background
o a child whose non-mainstream cultural background may affect knowledge acquisition and verbal development
o a child with known or suspected language disorder whether expressive, receptive, or mixed receptive-expressive
o a child with known or suspected autism
o a child who is deaf or hard of hearing
Scale | Luria | CHC
Sequential | linear thinking | short-term memory
Simultaneous | holistic processing | visual processing
Learning | attention & concentration, coding and storage of information | long-term memory and retrieval
Planning | making decisions, monitoring goals and generating hypotheses | fluid reasoning
Knowledge | - | crystallized abilities

KAUFMAN ADOLESCENT AND ADULT
INTELLIGENCE TEST (KAIT)
- a measure of intelligence constructed broadly within the Cattell-Horn model of fluid/crystallized intelligence
- suitable for ages 11 to 85
- The Fluid Scale consists of three subtests:
o Rebus Learning
o Mystery Codes
o Logical Steps
- The Crystallized Scale consists of three subtests:
o Auditory Comprehension
o Double Meanings
o Definitions
- The examiner may administer additional subtests for clinical or neuropsychological assessment:
- The instrument also includes a supplementary measure of Mental Status, used only with seriously impaired individuals, that assesses attention and orientation to day, date, time, and the like.

KAUFMAN BRIEF INTELLIGENCE TEST – 2 (KBIT-2)
- Individual intelligence scales have certain drawbacks:
(1) One problem is the time required to administer them.
(2) A second disadvantage is the amount of training required to administer them.
- can be administered in approximately 20 minutes; normed for ages 4 to 90
- mean of 100 and standard deviation of 15

INDIVIDUAL TEST OF ACHIEVEMENT
- Individual achievement tests are better suited for the appraisal of learning problems.
- The notion that intelligence and achievement typically parallel one another is at the very heart of the concept of learning disability – which involves a discrepancy between the two.

KAUFMAN TEST OF EDUCATIONAL ACHIEVEMENT – II (KTEA-II)
- KTEA-II is an untimed test of educational achievement for children ages 4 ½ through 25.
- It has a brief 3-subtest version, but the Comprehensive Form is preferred
- The Comprehensive Form consists of eight subtests in four areas:
Reading: Letter & Word Recognition, Reading Comprehension
Mathematics: Math Concepts and Applications, Math Computation
Written Language: Written Expression, Spelling
Oral Language: Listening Comprehension, Oral Expression
- It also utilizes entry and exit rules for each subtest to ensure that students only encounter items of appropriate difficulty.

NATURE AND ASSESSMENT OF LEARNING DISABILITIES
- The operational definition of learning disability: LD must demonstrate a severe discrepancy between general ability (intelligence) and specific achievement in one or more of the seven areas:
1) Oral Expression
2) Listening Comprehension
3) Written Expression
4) Basic Reading Skill
5) Reading Comprehension
6) Mathematics Calculation
7) Mathematics Reasoning
- In practical terms, a severe discrepancy has been defined as a difference of one standard deviation or more between intelligence and specific achievement.
Two Broad Categories of Learning Disability
o Dyslexia or Verbal Learning Disability
o Right Hemisphere or Nonverbal Learning Disability

GROUP TESTS AND CONTROVERSIES IN ABILITY TESTING
________________________________
- Group tests differ from individual tests in five ways:
1) multiple-choice format versus open-ended format
2) objective machine scoring versus examiner scoring
3) group versus individualized administration
4) applications in screening versus remedial planning
5) huge versus merely large standardization samples
- Group tests serve many purposes but the vast majority can be assigned to one of three types:
1) Ability Test – typically samples a broad assortment of proficiencies in order to estimate current intellectual level (can be used for screening or placement purposes)
2) Aptitude Test – usually measures a few homogeneous segments of ability and is designed to predict future performance. Predictive validity is foundational to aptitude tests (used for institutional selection purposes).
3) Achievement Test – assesses current skill attainment in relation to the goals of school and training programs. Also used to assess the adequacy of school educational programs.
- Group testing poses two interrelated risks:
(1) Some examinees will score below their true ability, owing to motivational problems or difficulty following directions, and (2) invalid scores will not be recognized as such, with undesirable consequences for these atypical examinees.

GROUP TESTS OF ABILITY
MULTIDIMENSIONAL APTITUDE BATTERY-II (MAB-II)
- a recent group intelligence test designed to be a paper-and-pencil equivalent of the WAIS-R
- appropriate for examinees from ages 16-74
- employing a multiple-choice format capable of being computer-scored.
- Digit Span from the WAIS-R is not included, and Block Design is replaced with Spatial

SHIPLEY INSTITUTE OF LIVING SCALE (SILS)
- originally proposed as an index of intellectual deterioration in an attempt to gauge the effects of dementia, brain damage, and other organic conditions
- consists of two subtests: vocabulary (40 items) and abstractions (20 items), multiple-choice format
- 10-minute limit for each section; there is also an untimed approach with separate norms
- This test has significant limitations:
(1) The SILS is inappropriate for low-IQ persons or those with significant language disabilities.
(2) The test has a low ceiling especially on the abstraction section, and does not spread out high-IQ examinees.
(3) The SILS has a band of error approaching 11 IQ points, which may be too excessive for many applications.

MULTI-LEVEL BATTERY
- School-aged children differ hugely in their intellectual abilities. The answer to this dilemma is a multi-level battery, a series of overlapping tests.
- Each group test is designed for a specific age or grade level, but adjacent tests possess some common content. Because of the overlapping content with adjacent age or grade levels, each test possesses a suitably low floor and high ceiling for proper assessment of students at both extremes of ability.
- Multi-level batteries usually provide a much desired continuity in the abilities measured.
Cognitive Abilities Test (CogAT)
- with nine subtests grouped into three batteries:
Verbal Battery: Verbal Classification, Sentence Completion, Verbal Analogies
Quantitative Battery: Quantitative Relations, Number Series, Equation Building
Nonverbal Battery: Figure Classification, Figure Analogies, Figure Analysis

CULTURE-FAIR INTELLIGENCE TEST (CFIT)
- is a nonverbal measure of fluid intelligence
- The goal of CFIT is to measure fluid intelligence – analytical and reasoning ability in abstract and novel situations – in a manner that is as free of cultural bias as possible.
- The test consists of three scales, and two equivalent forms, called Form A and Form B, are available for each scale.
- It consists of four subtests: Series, Classification, Matrices, and Conditions
Scale I
- used for mentally defective adults and children ages 4 to 8
- more of an individual intelligence test
Scale II
- for adults in the average range of intelligence and children ages 8-13
- group test of intelligence
Scale III
- for high ability adults and for high school and college students
- group test of intelligence

RAVEN’S PROGRESSIVE MATRICES (RPM)
- is a nonverbal test of inductive reasoning based on figural stimuli
- RPM was originally designed as a measure of Spearman’s factor g – defined as the “eduction of correlates”, which refers to the process of figuring out relationships based on the perceived fundamental similarities between stimuli.
- Examinees must identify a recurring pattern of relationship between figural stimuli organized in a 3x3 matrix. The items are arranged in order of increasing difficulty, hence the reference to progressive matrices.
Coloured Progressive Matrices
- 36 items
- 5-11 yrs old
- color helps hold the attention of the young children
Standard Progressive Matrices
- 60 items (5 sets of 12 progressions)
- 6 yrs & up
- most of the items are so difficult that the test is best suited for adults
Advanced Progressive Matrices
- Set I (12 items), Set II (36 items)
- 6 yrs & up
- with higher ceiling than standard
- suitable for persons of superior intellect
- Factor analytic studies of the RPM provide little support for the original intention of the test to measure a unitary construct.
- The RPM is particularly valuable for the supplemental testing of children and adults with hearing, language, or physical disabilities.

MULTIPLE APTITUDE TEST BATTERIES
- In this, the examinee is tested in several separate, homogeneous aptitude areas.
- dictated by the findings of factor analysis

DIFFERENTIAL APTITUDE TEST (DAT)
- initially for educational and vocational guidance of students in grades seven through twelve
- considered as one of the most popular multiple aptitude test batteries of all time
- The DAT consists of eight independent tests
1) Verbal Reasoning
2) Numerical Reasoning
3) Abstract Reasoning
4) Perceptual Speed and Accuracy
5) Mechanical Reasoning
6) Space Relations
7) Spelling
8) Language Usage
- The major problem with the battery is the lack of discriminant validity between the eight subtests, and it cannot be used in a diagnostic sense.

GENERAL APTITUDE TEST BATTERY (GATB)
- the premier test battery for predicting job performance (developed for the U.S. Department of Labor in the late 1930’s)
- The GATB is composed of 8 paper-and-pencil tests and 4 apparatus measures. The entire battery can be administered in approximately two and a half hours.
- The 12 tests yield a total of 9 factor scores (scores are expressed as standard scores with a mean of 100 and standard deviation of 20).
1) General Learning Ability
2) Verbal Aptitude
3) Numerical Aptitude
4) Spatial Aptitude
5) Form Perception
6) Clerical Perception
7) Motor Coordination
8) Finger Dexterity
9) Manual Dexterity
- The 9 factors can combine nicely into 3 general factors: Cognitive, Perceptual, and Psychomotor

THE ARMED SERVICES VOCATIONAL APTITUDE BATTERY (ASVAB)
- the most widely used aptitude test in existence
- decisions about ASVAB are typically based on composite scores, not subtest scores.
Academic Composites
(1) Academic Ability
(2) Verbal
(3) Math
Occupational Composites
(4) Mechanical and Crafts
(5) Business and Clerical
(6) Electronics and Electrical
(7) Health, Social, and Technology
- It falls short as a multiple aptitude test battery because the composites are so highly correlated with one another as to be essentially redundant.
- it has a computerized version supplanting the paper-and-pencil form

PREDICTING COLLEGE PERFORMANCE
SCHOLASTIC ASSESSMENT TESTS (SAT)
- The new SAT, released in 2005, consists of the SAT Reasoning Test (for college admission) and the SAT Subject Tests (for advanced college placement)
Section: Subtests
Critical Reading: Extended Reasoning; Literal Comprehension; Vocabulary in Context
Math: Numbers and Operations; Algebra and Functions; Geometry and Measurement; Data Analysis, Statistics, and Probability
Writing: Essay; Improving Sentences; Identifying Sentence Errors; Improving Paragraphs
- with a mean of 500 and standard deviation of 100
- the ACT has four tests: English, Mathematics, Reading, Science Reasoning

POSTGRADUATE SELECTION TESTS
GRADUATE RECORD EXAM (GRE)
- multiple-choice and essay test widely used in graduate programs
- three general tests: Verbal, Quantitative, and Analytical Writing
- GRE-V and GRE-Q are reported as standard scores with an approximate mean of 500 and standard deviation of 100. Scoring of GRE-AW is based on 6-point holistic ratings provided independently by two trained raters.

MEDICAL COLLEGE ADMISSION TEST (MCAT)
- There are 3 multiple-choice sections (Verbal Reasoning, Physical Sciences, Biological Sciences) and 1 essay section (Writing Sample).
- The writing samples are scored on a 6-point scale by independent raters.

LAW SCHOOL ADMISSION TEST (LSAT)
- It consists of multiple-choice questions in four areas: reading comprehension, analytical reasoning, and two logical reasoning sections.
- The score scale for the LSAT extends from a low of 120 to a high of 180.

EDUCATIONAL ACHIEVEMENT TESTS
- Practical (institutional) applications of group achievement tests include the following:
(1) To identify children and adults with specific achievement deficits who might need more detailed assessment for learning disabilities
(2) To help parents recognize the academic strengths and weaknesses of their children and thereby foster individual remedial efforts at home
(3) To identify classwide or schoolwide achievement deficiencies as a basis for redirection of instructional efforts
(4) To appraise the success of educational programs by measuring the subsequent skill attainment of students
(5) To group students according to similar skill level in specific academic domains
(6) To identify the level of instruction that is appropriate for individual students

IOWA TEST OF BASIC SKILLS (ITBS)
- It is a multilevel battery of achievement tests that covers grades K through 8.
- A companion test, the Test of Achievement and Proficiency (TAP), covers grades 9 through 12.
- It is available in several levels: levels 5-6 (grades K-1), levels 7-8 (grades 2-3), and levels 9-14 (grades 3-8).

METROPOLITAN ACHIEVEMENT TEST (MAT)
- concurrently normed with the Otis-Lennon School Ability Test (OLSAT)
- a multilevel battery designed for grades K through 12
- The areas tested by the MAT include the traditional school-related skills: reading, mathematics, language, writing, science, and social studies.
- An attractive feature of the MAT is that students’ reading scores are reported as Lexile measures, a new and practical indicator of reading level.
Lexile Measures
- This approach is based on two simple, commonsense assumptions:
(1) reading materials can be placed on a continuum as to difficulty level (comprehensibility)
(2) readers can be ordered on a continuum as to reading ability
- The Lexile scale is a true interval scale.
- The student’s comprehension can be predicted as a function of the disparity between the demands of the text and the student’s ability.

TEST OF GENERAL EDUCATIONAL DEVELOPMENT (GED)
- for high school equivalency certification
- consists of multiple-choice examinations in 5 educational areas:
(1) Language Arts – Writing
(2) Language Arts – Reading
(3) Mathematics
(4) Science
(5) Social Studies
- The GED emphasizes broad concepts rather than specific facts and details.

TEST BIAS AND OTHER CONTROVERSIES
________________________________

MISCONCEPTIONS ABOUT IQ
- The “innate IQ” fallacy states that each person possesses a near constant IQ that persists as part of his or her core identity throughout his or her life.
- IQ is not a measure of personal worth
- Another misconception is the belief that IQ scores remain stable from childhood into maturity.

THE QUESTION OF TEST BIAS
The Test-Bias Controversy
- A test is deemed biased if it is differentially valid for different subgroups. Test fairness, on the other hand, is a broad concept that recognizes the importance of social values in test usage.
- The controversy has its origin in the observed differences in average IQ among various racial and ethnic groups.
- significant gender differences also exist on some ability measures
Bias in Content Validity
- Test content is a source of cultural bias against minorities:
* The items ask for information that they have not had equal opportunity to learn.
* The scoring of the items is improper; minorities are penalized for giving answers that would be correct in their own culture but not in that of the test maker.
* The wording of the question is unfamiliar; the minority examinee may know the correct answer but may not be able to respond because he or she does not understand the question.

Bias in Predictive or Criterion-Related Validity
- An unbiased test will predict future performance equally well for persons from different subpopulations.
- Criterion of homogeneous regression: A test is unbiased if the results for all relevant subpopulations cluster equally well around a single regression line.
- Intercept bias: The regression lines for separate subgroups are parallel but have different intercepts. The use of a single regression line would then constitute a clear instance of test bias, because the test has differential predictive validity.
- Slope bias: The regression lines for separate subgroups are not even parallel.
Bias in Construct Validity
- Bias exists in regard to construct validity when a test is shown to measure different hypothetical traits (psychological constructs) for one group than for another.
- A test is nonbiased if comparisons across relevant subpopulations reveal a high degree of similarity for:
(1) the factorial structure of the test
(2) the rank order of item difficulties within the test

SOCIAL VALUES AND TEST FAIRNESS
- The proper application of psychological tests is essentially an ethical conclusion that cannot be established on objective grounds alone.
Unqualified Individualism
- The best qualified candidates should be selected for employment, admission, or other privilege.
Quotas
- By definition, fair share quotas are based initially upon population percentages.
Qualified Individualism
- a radical variant of individualism. For selection purposes, the qualified individualists rely exclusively on tested abilities, without reference to age, sex, or other demographic characteristics.

GENETIC AND ENVIRONMENTAL DETERMINANTS OF INTELLIGENCE
- measured in terms of the heritability index
- supported by adoption studies, familial research, and twin projects
- Enriching the psychological environment can boost IQ level (around 20 points)

AGE CHANGES IN INTELLIGENCE
- Cross-Sectional Research: Wechsler’s results indicated a rapid growth of intelligence in childhood through age 15 or 20, followed by a slow decline to age 65.
- The cross-sectional method often confounds age-group differences with educational disparities or other age-group differences.
- Sequential Studies: A longitudinal design eliminates age-group disparities as a confounding factor
- Pitfalls:
1) Time of measurement
2) Selective attrition
3) Practice effects
4) Regression to the mean

FLUID AND CRYSTALLIZED INTELLIGENCE
- Fluid intelligence shows a significant age-related decrement because of its reliance upon neural integrity, which is presumed to decline with advancing age.

POSTFORMAL OPERATIONS
- Formal operational thought tends to be unsuited to the fuzzy, dynamic, conditional, and unstructured problems encountered in everyday life.
- As suggested by post-Piagetian theorists, postformal operations, dialectical thought, or wisdom has the following general characteristics:
1. The recognition that knowledge is relative and temporary, not absolute and universal.
2. The acceptance of contradiction as a basic aspect of reality.
3. The ability to synthesize contradictory thoughts, emotions, and experiences into more coherent wholes.
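
The criterion of homogeneous regression described under Bias in Predictive or Criterion-Related Validity can be illustrated with a minimal sketch. It fits one regression line per subgroup (test score predicting a criterion such as later grades) and compares slopes and intercepts. The function names, the toy data, and the fixed tolerance are assumptions made for illustration only; an actual bias study would use significance tests on real samples rather than a raw tolerance.

```python
from statistics import mean


def fit_line(scores, criterion):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    mx, my = mean(scores), mean(criterion)
    sxx = sum((x - mx) ** 2 for x in scores)
    sxy = sum((x - mx) * (y - my) for x, y in zip(scores, criterion))
    slope = sxy / sxx
    return slope, my - slope * mx


def classify_bias(group_a, group_b, tol=0.05):
    """Compare two subgroups' regression lines.

    Each group is a (scores, criterion) pair. `tol` is a hypothetical
    tolerance standing in for a proper statistical test.
    """
    slope_a, intercept_a = fit_line(*group_a)
    slope_b, intercept_b = fit_line(*group_b)
    if abs(slope_a - slope_b) > tol:
        return "slope bias"        # lines are not parallel
    if abs(intercept_a - intercept_b) > tol:
        return "intercept bias"    # parallel lines, different intercepts
    return "homogeneous regression"  # one line fits both groups


# Toy data (invented): same test scores, different criterion outcomes.
scores = [1, 2, 3, 4]
reference = (scores, [2, 4, 6, 8])      # slope 2, intercept 0
shifted = (scores, [3, 5, 7, 9])        # slope 2, intercept 1
flatter = (scores, [1, 2, 3, 4])        # slope 1, intercept 0

print(classify_bias(reference, reference))
print(classify_bias(reference, shifted))
print(classify_bias(reference, flatter))
```

With intercept bias, a single pooled line would systematically overpredict the criterion for one subgroup and underpredict it for the other, which is why the notes treat it as differential predictive validity.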
TESTING SPECIAL POPULATIONS
_________________________________
ASSESSMENT OF INFANT ABILITY
- The infant and preschool period extends from birth to roughly 6 years of age.
- Infant and preschool assessment devices can help answer questions at both extremes of the spectrum: those who might be developmentally delayed, and those who might be intellectually gifted.
- Most infant tests (ages birth to 2 ½) load heavily on sensory and motor skills. Preschool tests (ages 2 ½ to 6) tend to tap cognitive skills to a significant degree.

GESELL DEVELOPMENTAL SCHEDULES
- designed to measure developmental progress of babies and children from 4 weeks to 60 months of age.
- all infant tests have borrowed or adapted items from the original schedules devised by Arnold Gesell.
- observes and evaluates the developmental attainment of children in five areas: gross-motor, fine-motor, language development, adaptive behavior, and personal-social behaviors.
- Most of the 144 items in the schedule are purely observational, based on the direct inspection of the child’s responses to toys and standard situations. In some cases, information from a parent or caretaker is needed to score individual items.
- used by pediatricians and other child specialists to identify risk of neurological impairment and mental retardation.
- Developmental quotients (DQ) can be obtained with the formula:
DQ = (Maturity Age / Chronological Age) × 100

NEONATAL BEHAVIORAL ASSESSMENT SCALE (NBAS)
- It is unique because of its theoretical orientation, which emphasizes the need to document the contributions of the newborn to the parent-infant system.
- suitable up to two months of age but is most commonly administered in the first week of life
- the scale assesses the infant’s behavioral repertoire on 28 behavior items, each scored on a 9-point scale, and 18 reflex items, each scored on a 4-point scale
- the NBAS does not provide an integrative scoring system; instead, it consists of a summary sheet with ratings of each specific item
- It can provide feedback to parents in clinical work.

ORDINAL SCALES OF PSYCHOLOGICAL DEVELOPMENT
- designed as a Piagetian-based tool for measuring intellectual development between the ages of 2 weeks and 2 years.
- consists of 6 scales, each designed to measure a specific ability that arises during Piaget’s first stage of sensorimotor intelligence.
1. Visual pursuits and permanence of objects
2. Development of means-ends
3. Vocal and gestural imitation
4. Development of operational causality
5. Construction of object relations in space
6. Development of schemes for relating to objects

BAYLEY SCALES OF INFANT AND TODDLER DEVELOPMENT-III
- this instrument is an important mainstay for the evaluation of developmental delay in infants and toddlers
- suitable for children 1 month to 42 months of age
- provides assessment of five domains: cognitive, motor, language, adaptive behavior, social-emotional
- excellent standardization and technical quality

ASSESSMENT OF PRESCHOOL INTELLIGENCE
McCARTHY SCALES OF CHILDREN’S ABILITIES (MSCA)
- individually administered intelligence test designed for children 2 ½ to 8 ½ years of age
- consists of 18 separate subtests
- yields a General Cognitive Index

DIFFERENTIAL ABILITY SCALES (DAS)
- cover an age range of 2 ½ years to 18 years in 3 overlapping batteries: lower preschool, upper preschool, and school-age
Subtests:
(1) Core – heavily saturated with the g factor
(2) Diagnostic – used for clinical analysis only

WECHSLER PRESCHOOL AND PRIMARY SCALE OF INTELLIGENCE-III (WPPSI-III)
- very similar to its predecessors but offers updated norms, more expansive assessment of cognitive functions, and application to a wider age range (2 ½ to 7 years and 3 months)
- consists of 14 subtests, each designated as one of three types
(1) Core
(2) Supplemental
(3) Optional

STANFORD-BINET INTELLIGENCE SCALES FOR EARLY CHILDHOOD
- combines the subtests from the Stanford-Binet Intelligence Scale 5th Edition with a new Test Observation Checklist (TOC) and a software-generated Parent Report
- for 2 ½ to 7 years and 3 months
- The purpose of the TOC is to provide a qualitative but highly structured format for describing a wide range of behaviors, including noncompliance, known to affect performance. It is divided into two groups:
(1) Characteristics
(2) Specific Behaviors

Predictive Validity of Infant Scales
- Infant tests generally have poor prognostic value, whereas preschool tests are moderately predictive of later intelligence.

Practical Utility of Infant Scales
- Screening for developmental disabilities
- A very low score on an infant test – two or more standard deviations below the mean –
accurately prognosticates mental retardation in childhood.

FAGAN TEST OF INFANT INTELLIGENCE
- Traditional infant tests overlook early information-processing behaviors such as recognition, memory, and attentiveness to the environment.
- It assesses visual recognition memory using a 10-trial habituation format
- Novelty Preference

SCREENING FOR SCHOOL READINESS
- The purpose of screening is to identify at-risk children. At risk refers to the likelihood of failure in the early elementary years of schooling.
- It is also linked to the concept of developmental delay, which refers to children whose cognitive development is well below age expectations.
- For practical reasons, individual intelligence tests are not suitable as screening instruments because they require a substantial amount of time. The ideal screening instrument is a short test that can be administered by individuals who have received limited training in assessment.
- A sensible screening test is one that provides a cutoff score that is accurate in classifying children as normal or at risk (that is, one that yields few false positives and false negatives).

DIAL III
- The Developmental Indicators for the Assessment of Learning-III is an individually administered screening procedure designed for quick and efficient detection of developmental problems or giftedness.
- It screens the performance of children in three developmental domains: motor, concepts, and language.

DENVER II
- probably the most widely known and researched pediatric screening tool in the US
- suitable for infants and children ages 1 month to 6 years; the test consists of 125 items in four areas: (1) personal-social, (2) fine motor-adaptive, (3) language, and (4) gross motor
- each item is arranged chronologically on the test by age of the child and marked pass/fail
- test interpretation categories (normal, questionable, abnormal)

HOME
- The Home Observation for Measurement of the Environment (HOME) is the most widely used index of children’s environment.
- it comes in three forms: Toddler (0-3), Early Childhood (3-6), and Middle Childhood (6-10)
- based on home observation and interview with the primary caretaker

TESTING PERSONS WITH DISABILITIES
Non-Language Tests
Leiter International Performance Scale – Revised
- a culturally-reduced measure of nonverbal intelligence
- a remarkable feature is the complete elimination of verbal instructions
- Leiter-R is particularly suitable for children and adolescents whose English language skills are weak, those with autism, traumatic brain injury, speech impairment, hearing problems, or an impoverished environment, or for children with attentional problems.
- Testing is performed by the child or adolescent matching small laminated cards underneath corresponding illustrations on an easel display.
- The Leiter-R contains 20 subtests organized into four domains: Reasoning, Visualization, Memory, and Attention.
- yields a composite IQ with the familiar mean of 100 and a standard deviation of 15, and subtest scaled scores with a mean of 10 and SD of 3
Human Figure Drawing Tests
- The Goodenough-Harris Drawing Test by Florence Goodenough is a brief, nonverbal test of intelligence that can be administered individually or in a group.
- 73 scorable items
- Aside from the Man Scale, it also includes the Woman and Self Scale.
- 3-16 years old
Hiskey-Nebraska Test of Learning Aptitude (H-NTLA)
- is a nonlanguage performance scale for use with children ages 3 to 17 years
- can be administered entirely through pantomime and requires no verbal response from the examinee
- It consists of 12 subtests:
Bead Patterns
Block Patterns
Memory for Color
Completion of Drawings
Picture Identification
Memory for Digits
Picture Association
Puzzle Blocks
Paper Folding
Picture Analogies
Visual Attention Span
Spatial Reasoning
- Raw scores on the subtests are converted into a Deviation Learning Quotient (LQ) with a mean of 100 and an SD of 16
- use of the H-NTLA may increase the risk of false positive misclassification

TEST OF NONVERBAL INTELLIGENCE-3 (TONI-3)
- a language-free measure of cognitive ability designed for persons with aphasia, non-English speakers, those with hearing impairments, and persons who have experienced a variety of neurological traumas
- It consists of two equivalent forms of 50 abstract/figural problem-solving items. The items fall into five
categories:
Simple Matching
Analogies
Classification
Intersection
Progressions
- Many of the items are similar to the format of Raven’s Progressive Matrices.
- The test yields two kinds of scores: percentile ranks and TONI-3 quotients (mean of 100 and SD of 15).

Non-Reading and Motor-Reduced Tests
- Nonreading tests of intelligence are designed for illiterate examinees who can, nonetheless, understand spoken English well enough to follow instructions.
- The performance subtests of most mainstream instruments qualify as nonreading tests.

Peabody Picture Vocabulary Test-III (PPVT-III)
- is the best known and most used of the nonreading, motor-reduced tests
- a well-normed measure of hearing vocabulary
- It comes in two parallel versions, each consisting of 4 practice plates and 204 testing plates. Each plate contains four line drawings of objects or everyday scenes.
- The test items are precisely ordered according to difficulty level, arranged in 17 sets of 12 items each for efficient identification of basal and ceiling levels.
- Raw scores are converted to age equivalents or standard scores (mean of 100 and SD of 15)

Testing Persons with Visual Impairments
- The Perkins Binet is based on the 1916 Stanford-Binet instrument; it retains most of the verbal items but also adapts other items to a tactual mode.
Haptic Intelligence Scale for Adult Blind (HISAB)
- consists of six subtests, 4 of which resemble the Digit Symbol, Block Design, Object Assembly, and Picture Completion tests of the WAIS Performance Scale.
- The remaining two consist of Bead Arithmetic (use of an abacus to solve arithmetic problems) and a Pattern Board (reproducing the pattern felt on a board that has rows of holes with pegs on them).
Blind Learning Aptitude Test (BLAT)
- A tactile test for children from 6 to 16 years of age who are blind; items are in bas-relief form consisting of dots and lines similar to Braille.
- Most of the items were adopted from RPM and CCFIT
- The items consist of six different types:
Recognition of Differences
Recognition of Similarities
Identification of Progressions
Identification of the missing element in a 2x2 matrix
Completion of a figure
Identification of the missing element in a 3x3 matrix
Intelligence Test for Visually Impaired Children (ITVIC)
- designed for children 6 to 15 years of age, and has separate norms for partially sighted and totally blind examinees
- takes three hours to administer the full battery
Verbal: Vocabulary, Digit Span, Verbal Fluency, Verbal Analogies, Learning Names
Nonverbal/Haptic: Perception of Objects, Perception of Figures, Block Design, Rectangle Puzzles, Map and Plan Tests, Exclusion of Figures, Figural Analogies

Testing Individuals Who Are Deaf or Hard of Hearing
- For the intellectual assessment of persons who are deaf or hard of hearing, the Wechsler Performance subtests remain the tools of choice, and sometimes Raven’s Progressive Matrices and the Hiskey-Nebraska Test of Learning Aptitude are used.

Assessment of Adaptive Behavior in Mental Retardation
Four Levels of Mental Retardation
Profound: IQ below 20-25
Severe: IQ of 20-25 to 35-40
Moderate: IQ of 35-40 to 50-55
Mild: IQ of 50-55 to 70-75+
Four Levels of Support: pervasive, extensive, limited, intermittent
- The American Association on Mental Retardation lists 10 different areas of adaptive skill:
(1) Communication
(2) Self-care
(3) Home Living
(4) Social Skills
(5) Community Use
(6) Self-direction
(7) Health and Safety
(8) Functional Academics
(9) Leisure
(10) Work

Vineland Social Maturity Scale
- the original scale consisted of 117 discrete items arranged in a year-scale format; it is now known as the Vineland Adaptive Behavior Scales (VABS)
- VABS provides an evaluation of the following domains and subdomains:
Domain: Subdomains
Communication: receptive, expressive, written
Daily Living Skills: personal, domestic, community
Socialization: interpersonal relationships, play and leisure time, coping skills
Motor Skills: gross, fine

- Scales for adaptive behavior can be distinguished into two types:
1) One group of mainly norm-referenced scales is used largely to assist in diagnosis and classification
2) Another group of mainly criterion-referenced scales is used largely in training and rehabilitation.

Scales of Independent Behavior – Revised
- The instrument consists of 259 items organized into 14 subscales, arranged into four clusters, constituting the Broad Independence Scale.
(1) Motor Skills
(2) Social and Communication Skills
(3) Personal Living Skills
(4) Community Living Skills
- The Problem Behavior Scale includes eight major categories of personal and social maladjustment that could affect adaptive behavior.
(1) Hurtful to self
(2) Hurtful to others
(3) Destructive to property
(4) Disruptive behavior
(5) Unusual or repetitive habits
(6) Socially offensive behavior
(7) Withdrawal or inattentive behavior
(8) Uncooperative behavior
- The SIB-R is an excellent tool for providing insights into an examinee’s current level of functioning in real-life situations in the home, school, and community settings.

Independent Living Behavior Checklist (ILBC)
- An extensive list of 343 independent living skills classified and presented in six categories
(1) Mobility
(2) Self-care
(3) Home maintenance and safety
(4) Food
(5) Social and communication
(6) Functional academic
- This is completely nonnormative; it only facilitates the training of the individual examinee in the skills required for independent living.

Inventory for Client and Agency Planning (ICAP)
- This test is one of the most widely used tests in the field of developmental disabilities.
- The test is a 16-page booklet that evaluates adaptive behavior, maladaptive behavior, and the need for assistance and supports.

PERSONALITY TESTING
PROJECTIVE TECHNIQUES
_________________________________
- The projective method refers to a category of tests for studying personality with unstructured stimuli. In a projective test the examinee encounters vague, ambiguous stimuli and responds with his or her constructions.
- Projective techniques are heavily vested in psychoanalytic theory.
- Unstructured, vague, and ambiguous stimuli provide the ideal circumstance for revelations about inner aspects of personality.
- The projective hypothesis states that personal interpretations of ambiguous stimuli must necessarily reflect the unconscious needs, motives, and conflicts of the examinee.
- Galton developed the first projective technique, a word association test.
- Projectives can be divided into 5 categories:
1) Association to inkblots or words
2) Construction of stories or sequences
3) Completion of sentences or stories
4) Arrangement/selection of pictures or verbal choices
5) Expression with drawings or plays

Association Techniques
THE RORSCHACH
- This association technique consists of 10 inkblots devised by Hermann Rorschach.
- The Rorschach is suited to persons age 5 and older but is most commonly used with adults
- Administration consists of two phases: free association phase and inquiry phase (location, determinant, content, popular versus original)
- Criticisms of the Rorschach include:
1) The purpose of the Rorschach is ill-defined and unclear; some proponents decline to regard it as a test, preferring instead to call it a method for generating information about personality functioning.
2) The Rorschach is highly susceptible to faking (especially of psychosis)
3) low reliability and general lack of predictive validity
HOLTZMAN INKBLOT TECHNIQUE (HIT)
- Wayne H. Holtzman sought to overcome the limitations in the Rorschach by developing a completely new technique using more inkblots with simplified procedures for administration and scoring.
- In the HIT, the examinee is limited to one response per card, across 45 cards. It comes in two parallel forms.
_________________________________________
Completion Techniques
SENTENCE COMPLETION TESTS
- The respondent is presented with a series of stems consisting of the first few words of a sentence, and the task is to provide an ending.
- It can be interpreted in two different ways: subjective-intuitive analysis and objective analysis

ROTTER INCOMPLETE SENTENCES BLANK (RISB)
- has the strongest empirical underpinnings and is the most widely used in clinical settings
- 40 sentence stems written mostly in the first person, in each of three forms: high school, college, and adult
- in the objective scoring system, each completed sentence receives an adjustment score from 0 (good adjustment) to 6 (very poor adjustment) along 6 categories: omission, conflict response, positive response, and neutral response

ROSENZWEIG PICTURE FRUSTRATION STUDY
- The purpose of the P-F Study is to assess the examinee’s characteristic manner of reacting to frustration (consists of 24 drawings)
- The value of the P-F Study is its multifaceted conceptualization of aggression according to three directions and three types
Direction
1. Extraggressive – turned onto the environment
2. Intraggressive – turned by the examinee onto the self
3. Imaggressive – it is evaded in an attempt to gloss over the frustration
Type
1. Obstacle-dominant – response focused on the barrier that caused the frustration
2. Ego-defensive – organizing capacity predominates in the response
3. Need-persistent – the solution is emphasized, pursuing the goal despite the obstacle
_________________________________________

Construction Techniques
THEMATIC APPERCEPTION TEST (TAT)
- consists of 30 pictures that portray a variety of subject matters and themes in black-and-white drawings and photographs; one card is blank
- developed by Henry Murray
- The majority of the TAT pictures exert a strong negative stimulus “pull” on storytelling. It is not surprising that projective responses to the TAT are strongly channelled toward negative, melancholic stories
PICTURE PROJECTIVE TEST (PPT)
- projective responses are of comparable length to those of the TAT but are much more positive in thematic content and emotional tone.
- places more emphasis on interpersonal stories

CHILDREN’S APPERCEPTION TEST
- a direct extension of the TAT; consists of 10 pictures and is suitable for children 3-10 years of age
- depicts animals because young children would identify better with animals than with humans
- no formal scoring system, and no statistical information is provided for reliability and validity
_________________________________________

Expression Techniques
DRAW-A-PERSON TESTS (DAP)
- pioneered by Karen Machover, and still widely used as a clinical assessment tool
HOUSE-TREE-PERSON TEST (HTP)
- HTP was originally conceived as a measure of intelligence, but now it is used almost exclusively as a projective measure of personality
- The House drawing mirrors the examinee’s home life and intrafamilial relations; the Tree drawing reflects the manner in which the examinee experiences the environment; and the Person drawing echoes the examinee’s interpersonal relationships

STRUCTURED ASSESSMENT OF PERSONALITY
__________________________________
- Self-report inventories and behavior rating scales are examples of structured approaches in personality testing.
- Three tactics for personality test development: theory-guided approaches, factor-analytic approaches, and criterion-keyed methods.

Self-report Inventories
Theory-Guided Inventories
EDWARDS PERSONAL PREFERENCE SCHEDULE (EPPS)
- an objective, structured test to measure the 15 needs proposed by Henry Murray
- the inventory uses a forced-choice format that integrates ipsative and normative scoring (making the scoring confusing)
- designed to do away with the social desirability response set (the pair of statements in each item is matched for social desirability)
- widely used in college counseling as a means of personal discovery

PERSONALITY RESEARCH FORM (PRF)
- still based on Murray’s personality theory, the same as the EPPS
- true-false item format
- used primarily with college students
- Unlike many other personality inventories, the PRF scales have no item overlap
- a desirable feature of the PRF is its readability

MYERS-BRIGGS TYPE INDICATOR (MBTI)
- a forced-choice self-report inventory that attempts to classify persons according to an adaptation of Carl Jung’s theory of personality types
- comes in a 166-item version (Form F) and a 126-item version (Form G); Form F is widely used
- MBTI is regarded as the most widely used personality test of any kind with nonpsychiatric populations
- scored on four theoretically independent dimensions:
(1) Extraversion-Introversion 16 Personality Factors/Source Traits - several items used archaic and obsolete
(2) Sensing-Intuition terminology; sexist language; some items are found
Warmth Shrewdness
(3) Thinking-Feeling Intelligence Insecurity
objectionable like those dealing with religious beliefs
(4) Judging-Perceptive Emotional Stability Radicalism - item pool was not broad enough to assess many
- examinees’ scores are summarized in a typological Dominance Self-sufficiency important characteristics (the range of item
Impulsivity Self-discipline
manner (ISTJ, ENFP) Conformity Tension
coverage must be extended)
- Interpretations seem too slick and simple, Boldness Extraversion - MMPI-2 (released in 1989)
possessing an almost horoscope-like quality. Sensitivity Anxiety - The MMPI-2 can be scored for 4 validity scales, 10
Suspiciousness Tough Poise
Caution must be urged in the application of the Imagination Independence
standard clinical scales, and dozens of
MBTI, especially when making simplistic inferences supplementary scales
from the 4-letter type formula EYSENCK PERSONALITY QUESTIONNAIRE 4 validity scales
- EPQ is a series of tests designed to measure the * Cannot Say – the total number of items omitted or
major dimensions of normal and abnormal double-marked (indicates reading problem,
personality: Psychoticism, Extraversion, and opposition to authority, defensiveness, or
Neuroticism and incorporates a Lie Scale to assess indecisiveness caused by depression)
Factor-Analytically Derived Inventories the validity of an examinee’s responses * L Scale – 15 items all scored in the false direction,
SIXTEEN PERSONALITY FACTOR - The EPQ contains 90 statements answered yes or designed to identify a general, deliberate, evasive
QUESTIONNAIRE (16PF) no (true or false format) and is designed for persons test-taking attitude (indicates defensiveness and
- The 16PF is a widely used forced-choice, untimed ages 16 and older; 81 statements is for 7-15. naivety)
test of personality that is currently available in five * F Scale – consists of 60 items answered by normal
separate forms. NEO PERSONALITY INVENTORY-REVISED subjects in the scored direction no more than 10
- Each form consists of declarative stems that (NEOP-PI R) percent of the time. (Reflects maladjustment like
require the examinee to respond to a specific - based upon the five-factor model of personality peculiar thoughts, apathy, and social alienation
situation by choosing from among the two or three (with 6 facets each domain) * K Scale – Composed of 22 items that differentiated
forced choice options. - 240 items rated on a five-point dimension normal profiles produced by defensive hospitalized
- The 16 PF is intended for high school seniors and - shorter version (NEO Five-Factor Inventory) psychiatric patients from those produced by normal
adults. - Form S is for self-reports whereas Form R is for controls. 8 items that improved discrimination of
- A significant shortcoming of this test is that most outside observers (e.g., the spouse of a client) depressive and schizophrenic symptoms. (normal
norms date back to 1970. - the item format consists of five-point ratings: range suggests good ego strength, elevated score
- The 16PF is predicated on Catell’s factor analytic strongly disagree, disagree, neutral, agree, strongly indicate a defensive test-taking attitude)
conception of personality. Source traits (the stable, agree 10 Clinical Scales
constant, but less-visible wellsprings of behavior) - particularly useful in research and shows a Hypochondriasis Paranoia
emerge only from specialized factor analyses of the promise as a measure of psychopathology Depression Psychasthenia
surface traits (the more obvious aspects of (borderline, ADHD) Hysteria Schizophrenia
personality and emerge from simple cluster Psychopathic deviate Hypomania
Masculinity-femininity Social Introversion
analyses of test responses. Criterion-Keyed Inventories
- 4 second-order indices of personality are MINNESOTA MULTIPHASIC PERSONALITY
computed from weighted linear sums of the previous - MMPI-2 scale raw scores are converted to T
INVENTORY (MMPI -2) scores, with a mean of 50 and a standard deviation
16 indices, yielding a total of 20 bipolar scales. - a 566-item true or false personality inventory
- The major thrust of application for the 16PF is of 10.Scores that exceed a T of 65 may signify a
designed originally as an aid in psychiatric diagnosis presence of psychiatric symptomatology.
career guidance, vocational exploration, and Concerns:
occupational testing. Interpretation:
- sampling issues in selecting the control group Scale by Scale
Limitations: (convenience sampling)
- short items for each scale diminishing reliability the simple approach, inspecting the four validity
scales and make hypotheses
Configural Approach
more complicated, a combination of 2 elevated clinical scales

- computerized MMPI-2 narrative reports (The Minnesota Report) provide good accuracy
- intercorrelations among the clinical scales are extremely high (indicating overlap of scales)
* A high intercorrelation of basic scales is one of the prices to be paid for using the criterion-keyed test development strategy
- likely to remain a premier instrument for assessment of psychopathology in adulthood for many years to come

INDUSTRIAL, OCCUPATIONAL, AND FORENSIC ASSESSMENT
________________________________
THE ROLE OF TESTING IN PERSONNEL SELECTION
- I/O research on personnel selection has strongly emphasized criterion validity: current assessment results must predict the future criterion of job performance.
- The interview is a widely used form of personnel assessment.
Autobiographical Data
- The rationale for the biodata approach is that future work-related behavior can be predicted from past choices and accomplishments.
- It has predictive power because certain characteristic traits that are essential for success are stable and enduring.
- Empirical keying: the biodata of successful and unsuccessful groups of hired individuals are contrasted to determine which items most accurately discriminate between successful and unsuccessful workers; strongly discriminative items in the biodata are assigned large weights.
- Cross validation: testing the scoring scheme on a second sample of successful and unsuccessful workers and comparing the results against the first sample.
Employment Interview
- widely considered a crucial make-or-break component of hiring
- Reliability of the interview can be raised by using a panel or structured interview, or by using it in conjunction with other information.
- Actuarial prediction (based on empirically derived formulas) is far superior to clinical prediction (based on subjective impressions).
Cognitive Ability Tests
- specific ability or general cognitive ability?
* The general factor of intelligence is usually a better predictor of training and job success than are scores on specific cognitive measures.
1) Wonderlic Personnel Test
- a measure of general cognitive ability (group test)
- 50 multiple-choice items, a 12-minute time limit
- items include vocabulary, sentence rearrangement, arithmetic, problem solving, logical induction, and interpretation of proverbs
2) Bennett Mechanical Comprehension Test (BMCT)
- the test emphasizes basic mechanical principles
- Despite its psychometric excellence, the BMCT is in need of modernization. The test looks old and many items are outdated.
3) Minnesota Clerical Test
- purports to measure perceptual speed and accuracy relevant to clerical work
- two subtests: Name Comparison and Number Comparison
Personality and Temperament Tests
- Personality tests must possess a demonstrated link to job performance before they are used in personnel selection.
- The California Psychological Inventory (CPI) provides an accurate measure of managerial potential, and the MMPI is a good selection tool for law enforcement. The Hogan Personality Inventory is well validated for job performance in military, hospital, and corporate settings (based on the Big Five theory of personality).
Paper-and-Pencil Integrity Tests
- Overt integrity tests can be more easily faked than personality-based integrity tests, which typically do not contain obvious references to undesirable employee behavior.
Work Sample and Situational Exercises
- A work sample is a miniature replica of the job for which examinees have applied.
- A work sample should test important job domains, not the entire job universe.
- A situational exercise is approximately the white-collar equivalent of a work sample. Situational exercises are largely used to select persons for managerial and professional positions.
- The employee is asked to perform under circumstances that are highly similar to the anticipated work environment.
- Examples are the In-Basket Test and assessment centers

APPRAISAL OF WORK PERFORMANCE
- In the absence of meaningful feedback, employees have no idea how to improve.
- Performance evaluation has four major uses:
(1) comparing individuals in terms of their overall performance level
(2) identifying and using information about individual strengths and weaknesses
(3) implementing and evaluating human resource systems in organizations
(4) documenting or justifying personnel decisions
- Criteria for effective performance are often vague.
- The criterion problem is a phrase meant to convey the difficulties involved in conceptualizing and measuring performance constructs when implementing performance evaluation.
Performance Measures
- include seemingly objective indices
Disadvantages:
1. The rate of productivity may not be under the control of the worker.
2. Production counts are not applicable to most jobs.
3. An emphasis upon production may distort the quality of the output.
4. Production counts may tap only a small proportion of job requirements, even when they appear to be a definitive criterion.
5. Unreliable, especially over short periods of time.
Personnel Data: Absenteeism
- a major problem is in defining what absenteeism is
Peer Ratings and Self-Assessments
- an important complement to supervisor ratings
- have limited application for purposes such as personal development
Supervisor Rating Scales
- Rating scales are the most common measure of job performance.
Types of Rating Scale:
graphic rating scale
- famous because of simplicity
- performance being evaluated may be vaguely defined
critical incidents checklist
- based upon actual episodes of desirable and undesirable on-the-job behavior
behaviourally anchored rating scale
- performance is identified and defined along a dimension, scaled meaningfully
forced-choice scale
- is designed to eliminate bias and subjectivity in supervisor ratings by forcing a choice between options that are equal in social desirability

SOURCES OF ERROR IN PERFORMANCE SCALE
- The failure to identify appropriate criteria for acceptable and unacceptable performance is a major source of error in performance appraisal (PA).
Halo effect
- The tendency to rate an employee high or low on all dimensions because of a global impression.
- Overgeneralization from one element of a worker’s behavior
- For example, an employee with perfect attendance may receive higher-than-deserved evaluations on productivity and work quality.
- To reduce halo effect, a diary of information relevant to appraisal must be kept
RATER BIAS
- Leniency errors occur when a supervisor tends to rate workers at the favorable extreme of the scale. Severity errors refer to the practice of rating all aspects of performance as deficient.
- Central tendency errors occur when the rater places nearly every employee near the middle of the scale regardless of actual performance.
- Context errors occur when the rater evaluates an employee in the context of other employees rather than based upon objective performance.
- Criterion Contamination is said to exist when a criterion measure includes factors that are not demonstrably part of the job. It has three types: (1) opportunity bias, (2) group characteristic bias, (3) knowledge of predictor bias

GUIDELINES FOR DEVELOPING PERFORMANCE APPRAISAL
- Base the performance appraisal upon a careful job analysis
- Develop specific, contamination-free criteria for appraisal from the job analysis
- Determine that the instrument used to rate performance is appropriate for the appraisal situation
- Train raters to be accurate, fair, and legal in their use of the appraisal instrument
- Use performance evaluations at regular intervals of six months to a year
- Evaluate the performance system periodically to determine whether it is actually improving performance

Inventories for Interest Assessment
- Interest assessment promotes two compatible goals: life satisfaction and vocational productivity

STRONG INTEREST INVENTORY (SII)
- the latest revision of the well-known Strong Vocational Interest Blank (SVIB)
- the theoretical foundation of SVIB derives from a typological, trait-oriented conception of personality
Assumptions:
1. Each occupation has a desirable pattern of interests and personality characteristics among its workers. The ideal pattern is represented by successful people in that occupation.
2. Each individual has relatively stable interests and personality traits.
3. It is highly possible to differentiate individuals in a given occupation from others in general in terms of the desirable patterns of interests and traits for that occupation.
- Strong Interest Inventory consists of 317 items grouped into 7 sections.
- All scores are expressed as standard scores with a mean of 50 and SD of 10.
- General Occupational Theme Scores: Realistic, Investigative, Artistic, Social, Enterprising, & Conventional
- A recent innovation in SII is the addition of bipolar personal style scales: Work Style, Learning Environment, Leadership Style, Risk Taking/Adventure
- Not recommended for use below high school level because most students’ interests are underdeveloped and unstabilized prior to age 13 or 14

KUDER GENERAL INTEREST SURVEY (KGIS)
- Its target population is restricted to adolescents in grades 6 through 12.
- Well suited to the development of educational and vocational goals in the early formative years of adolescence
- The inventory uses a forced-choice response triad format. The 168-item inventory produces 10 interest scores that are largely ipsative in nature.
- Inattention to opportunity is common to all interest measures.
Outdoor       Artistic
Mechanical    Literary
Computational Musical
Scientific    Social Service
Persuasive    Clerical

VOCATIONAL PREFERENCE INVENTORY (VPI)
- VPI is an objective paper-and-pencil personality interest inventory used in vocational and career assessment.
- VPI is a brief test (15-30 minutes) and is intended for persons 14 years and older. It measures 11 dimensions
Realistic     Self-Control
Investigative Masculinity/Femininity
Artistic      Status
Social        Infrequency
Enterprising  Acquiescence
Conventional
- Realistic type prefers hands-on or outdoor vocations (farmer, mechanic, electrician)
- Investigative type is a task-oriented thinker with unconventional attitudes (chemist, physicist)
- Artistic type prefers aesthetic pursuits and is individualistic
- Social type uses social competencies to solve problems, likes to help others, and prefers teaching or helping professions
- Enterprising type is a leader with good selling skills who fits well in business and managerial positions.
- Conventional type is conforming and prefers structured roles such as bank teller or computer operator
- Holland proposes that personality traits tend to cluster into a small number of vocationally relevant patterns called types. There is also a corresponding work environment best suited to each type.
- Individuals tend to move toward environments that are congruent with their personality types.
- The better the person-environment fit, the greater should be job satisfaction.

SELF-DIRECTED SEARCH
- designed to be a self-administered, self-scored, and self-interpreted test of vocational interest
- consists of dichotomous items that the examinee marks “like” or “dislike” in four sections: Activities, Competencies, Occupations, Self-estimates
- takes 30-50 minutes for completion and is intended for persons 15 years and older
- the three highest theme scores are identified from the RIASEC themes (e.g. IAR, which resembles the following occupations: anthropologist, astronomer, chemist)

CAMPBELL INTEREST AND SKILL SURVEY (CISS)
- newer measure of self-reported interests and skills
- consists of 200 interest items and 120 skill items
- The CISS is scored on several different kinds of scales: Orientation Scales, Basic Interest and Skill Scales, Occupational Scales, Special Scales, and Procedural Scales
- All scale scores are reported as T scores, with a mean of 50 and standard deviation of 10
7 Orientation Scales
Influencing
Organizing
Helping
Creating
Analyzing
Producing
Adventuring
- 29 basic skills, 58 occupational scales

FORENSIC APPLICATIONS OF ASSESSMENT
__________________________________
- The role of the psychological examiner can intersect with the legal system in a multitude of ways:
- Evaluation of possible malingering
- Assessment of mental state for the insanity plea
- Determination of competency to stand trial
- Prediction of violence and assessment of risk
- Evaluation of child custody in divorce
- Assessment of personal injury
- Interpretation of polygraph data
- Specialized forensic personality assessment

THE NATURE OF FORENSIC ASSESSMENT
- Forensic assessment is molded by the prerequisites of the legal system whereas traditional assessment is shaped by the needs of the client and the current professional standards.
- Traditional assessment usually is broadscale and provides a comprehensive picture of a client’s functioning and treatment needs; forensic assessment engages a narrow focus that may not even appear to be clinical in nature.
- In forensic assessment, the individual undergoing the assessment is not really the client; it would be accurate to refer to them as “examinee”. The judge, lawyer, or other court officer is usually the real client.
- The written report also differs in the two settings. The forensic report should:
(1) separate facts from inferences
(2) stay within the scope of the referral question
(3) avoid information overkill
(4) minimize clinical jargon

EVALUATION OF SUSPECTED MALINGERING
- For forensic clients, fabrication of symptoms may serve to excuse unacceptable behavior (favouring the insanity plea), sway sentencing recommendations (against capital punishment), or gain entitlements (certificate of disability).
- Structured Interview of Reported Symptoms (SIRS). A 172-item interview schedule, with 8 scales designed expressly for the evaluation of malingering

ASSESSMENT OF MENTAL STATE FOR THE INSANITY PLEA
- Whenever a special defense is invoked, an evaluation of the defendant’s mental state at the time of the offense is needed
- Technically, the insanity defense is known as not guilty by reason of insanity (NGRI)
o Research
FACTORS INFLUENCING THE SOUNDNESS OF TESTING
(1) the manner of administration
(2) the characteristics of the tester
(3) the context of the testing
(4) the motivation and experience of the examinee
(5) method of scoring
_________________________________________
Adverse impact is a legal term referring to the disproportionate selection of white candidates over minority candidates.
Standards for Educational and Psychological Testing
Horizontal Test – a combination of different tests is administered (e.g. test battery)
Vertical Test – the same test but different levels of
the test are administered (e.g. OLSAT)
Spiral Test Format – progressing degree of
difficulty of the tests
Cyclical Test Format – The test items are
presented in differing degree of difficulty
Normative Test Format – A test format in which each item is independent of all other items.
Ipsative Test Format – A type of measure on which one cannot legitimately compare two or more people. The overall score, averaged across all scales, is always the same for every examinee. High scores are also relative, not absolute. It reflects intraindividual variability rather than interindividual variability.
Flynn Effect: Newer individual intelligence tests invariably yield lower IQ scores in comparison to older tests
Speed test has a fixed time limit. An important focus of a speed test is the number of items completed in the time period provided. A power test, by contrast, allows the test taker sufficient time to complete all items. Typically, power tests have difficult items, with a focus on the percentage of items answered correctly.

A test is a standardized procedure for sampling behavior and describing it with categories or scores.
Tests possess these defining features:
o Standardized procedure
o Behavior sample
o Scores or categories
o Norms or standards
o Prediction of nontest behavior
Assessment is a more comprehensive term, referring to the entire process of compiling information about a person and using it to make inferences about characteristics and predict behavior. A test represents only one source of information used in the assessment process.

TYPES OF TEST
USES OF TESTING
o Classification
- Placement
- Screening
- Certification
- Selection
o Diagnosis and treatment planning
o Self-knowledge
o Program evaluation
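
The T-score convention mentioned earlier for the MMPI-2, SII, and CISS (a mean of 50 and a standard deviation of 10) can be sketched as a short conversion from a raw score. This is only a minimal illustration: the function name `t_score` and the norm-group numbers are hypothetical and are not published norms for any of these inventories.

```python
def t_score(raw, norm_mean, norm_sd):
    """Convert a raw score to a T score (mean 50, SD 10).

    norm_mean and norm_sd describe the normative group; the values
    used below are made up for illustration only.
    """
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    # Standard (z) score: distance from the norm-group mean in SD units.
    z = (raw - norm_mean) / norm_sd
    # Rescale so the norm-group mean maps to 50 and one SD maps to 10 points.
    return 50 + 10 * z


# Hypothetical norm group: mean raw score 20, SD 4.
print(t_score(raw=26, norm_mean=20, norm_sd=4))  # 65.0
```

In this made-up case a raw score 1.5 standard deviations above the norm-group mean lands at T = 65, the level the notes flag as possibly significant on the MMPI-2.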