CAT4 UK Technical Report
UK & IRELAND EDITION
COGNITIVE ABILITIES TEST
TECHNICAL REPORT
Contents
CAT4 UK EDITION
Test reliability
Test re-test reliability
Cognitive Abilities Test and National Test indicators
Key Stage 2 National Test indicators: England
      Correlations of CAT4 and KS2 scaled scores
      KS2 indicators for groups of students
CAT4 trialling
      Pre-trials
      Main trials
Gender differences
      Verbal-Spatial profile
CAT4 UK EDITION
Test reliability
The reliability of a test is a measure of the consistency of a student’s
test scores over repeated testing, assuming conditions remain
the same – that is, there was no fatigue, learning effect or lack of
motivation. Tests with poor reliability might result in very different
scores for a student across two test administrations.
The reliability of the test was estimated using Cronbach’s alpha,
which produces values ranging from 0 to 1. Values above 0.80 are
considered to be very good. The reliability values for the various
CAT4 batteries are given in the table below. Almost all are above 0.80,
and the overall CAT4 reliability is 0.90 or higher at every level.
These are based on students who took part in the UK standardisation.
                              CAT4 reliability
     CAT4       Verbal      Quantitative   Nonverbal   Spatial    Overall
     level      Reasoning   Reasoning      Reasoning   Ability    CAT4
                Battery     Battery        Battery     Battery
     Level X    0.93        0.91           0.87        0.83       0.95
     Level Y    0.89        0.88           0.89        0.78       0.94
     Pre-A      0.82        0.81           0.78        0.67       0.90
     A          0.91        0.91           0.90        0.87       0.97
     B          0.89        0.90           0.90        0.88       0.96
     C          0.86        0.91           0.87        0.85       0.96
     D          0.90        0.91           0.89        0.86       0.96
     E          0.89        0.88           0.86        0.88       0.96
     F          0.89        0.87           0.85        0.88       0.96
     G          0.90        0.84           0.85        0.86       0.95
     Average    0.89        0.88           0.87        0.84       0.95
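As an illustration of the statistic reported in the table, the following is a minimal sketch of Cronbach’s alpha computed from item-level scores. The response matrix is hypothetical and is not CAT4 data; the function name `cronbach_alpha` is ours, and the report does not describe the software actually used.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a matrix with one row per student
    and one numeric item score per column."""
    k = len(scores[0])  # number of items
    # Variance of each item across students
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of the students' total scores
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical responses: 6 students x 4 items (1 = correct, 0 = incorrect)
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Alpha rises as the items vary together: the more of the total-score variance that comes from items agreeing with one another, the closer the value is to 1.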
However, most tests show the 90% chance or confidence bands. For
values around the average, the 90% confidence band is as follows:
                       Correlation between
                       Level D and Level F SAS
     Mean CAT4         0.88
     Verbal            0.80
     Quantitative      0.75
     Nonverbal         0.71
     Spatial           0.74
The graph below illustrates the relationship between the mean CAT4
score and the KS2 Mathematics scaled scores. It shows the most likely
scaled score and the score if the student is challenged. We can see that
the scaled scores increase as the CAT4 scores increase.
[Figure: KS2 Maths scaled score against mean CAT4 score, showing the most likely and ‘if challenged’ scores]
For example, for a student with a mean CAT4 score of 90, the ‘most likely’
Mathematics scaled score is 99 and the ‘if challenged’ threshold is 103.
Not all students with a mean CAT4 score of 90 will get a Mathematics
scaled score of 99. The ‘most likely’ score is an average, so around half
of the students with mean CAT4 scores of 90 will obtain a Mathematics
scaled score below 99; 25% of the students will obtain a Mathematics
scaled score of between 99 and 102; and 25% of the students will
obtain an ‘if challenged’ score of 103 or above.
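The reading of the two indicators given above (half of students below the ‘most likely’ score, a quarter at or above the ‘if challenged’ score) corresponds to the median and the 75th percentile of outcomes for students with the same CAT4 score. The sketch below illustrates that reading with nine hypothetical scaled scores; the data are invented for illustration, and the report does not state how the published indicators are estimated.

```python
from statistics import quantiles

# Hypothetical KS2 Mathematics scaled scores for nine students who all
# have a mean CAT4 score of 90 (illustrative values only)
scaled_scores = [93, 95, 96, 98, 99, 100, 103, 105, 108]

# Quartiles of the outcome distribution for this CAT4 score
q1, median, q3 = quantiles(scaled_scores, n=4, method="inclusive")

most_likely = median      # about half of the students fall below this
if_challenged = q3        # about a quarter reach this score or above

print(most_likely, if_challenged)
```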
[Figure: percentage of students against mean CAT4 score]
The chart below illustrates the relationship between the Verbal CAT4
score and the KS2 English Reading benchmarks.
[Figure: percentage of students against Verbal CAT4 score]
The chart below illustrates the relationship between the Verbal CAT4
score and the KS2 English Spelling, Punctuation and Grammar (SPAG) benchmarks.
[Figure: percentage of students against Verbal CAT4 score]
The correlations are all highly significant. The Mathematics and Science
outcomes tend to have their highest correlation with the mean CAT4
SAS. The CAT4 Verbal Reasoning score alone gives a slightly higher
correlation than the mean CAT4 score for Literacy, English and Welsh
2nd subject.
GCSE indicators
The GCSE indicators are derived from an analysis of the relationship
between CAT4 scores from Level D and above and GCSE examination
results at age 16 for a large and nationally representative sample of
around 91,000 students in 2019. These indicators are updated regularly
as we get new data.
The ‘most likely grade achieved’ is grade C with the student having
a 64% chance of achieving grade C or below and a 34% chance of
achieving grade B or above.
GCSE grade indicators for groups of students
The table below illustrates how the group/class indicators have been
calculated for a fictitious class with five students and shows the most
likely grade achieved and the probabilities associated with getting
different Mathematics 9-1 grades. The group indicator is an average of
the individual student outcomes and probabilities. A similar method is
used for subjects using the A*-G grades.
Using individual student grade estimates to provide information about
the overall class or group grade outcomes will in most cases lead to
underestimating the number of students likely to get both the higher
and lower GCSE grades.
The group level indicators are the average of the probabilities for
all students in the group. Our research has shown that this method
provides the most accurate set of group level indicators. However,
group indicators are extremely sensitive to variations in the number of
students in the group, and may be very unstable for groups of fewer than
30 students. Group indicators should only ever be taken as a rough
guide to the possible future performance of a class.
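The contrast between the two approaches described above can be sketched as follows. The five students, the three coarse grade bands and all probabilities are hypothetical; they are chosen only to show why counting each student’s single most likely grade hides the higher and lower outcomes, whereas averaging the probabilities does not.

```python
# Hypothetical probabilities of three coarse grade bands for five students
bands = ["low", "middle", "high"]
students = [
    {"low": 0.30, "middle": 0.50, "high": 0.20},
    {"low": 0.20, "middle": 0.50, "high": 0.30},
    {"low": 0.25, "middle": 0.45, "high": 0.30},
    {"low": 0.35, "middle": 0.45, "high": 0.20},
    {"low": 0.30, "middle": 0.40, "high": 0.30},
]

# Method 1: tally each student's single most likely band
most_likely = [max(bands, key=lambda b: s[b]) for s in students]
counts_from_most_likely = {b: most_likely.count(b) for b in bands}

# Method 2: group indicator = average of the individual probabilities
group_indicator = {
    b: sum(s[b] for s in students) / len(students) for b in bands
}
# Expected number of students in each band implied by the group indicator
expected_counts = {b: group_indicator[b] * len(students) for b in bands}

print(counts_from_most_likely)   # every student lands in "middle"
print(expected_counts)           # some students expected in "low" and "high"
```

In this invented class the first method predicts no students in the low or high bands, while the averaged probabilities imply that between one and two students are expected in each.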
[Figure: Attainment 8 score against mean CAT4 score, showing the 25th percentile, median and 75th percentile]
For example, for a student with a mean CAT4 score of 90, the most
likely Attainment 8 is 42 and the ‘if challenged’ score is 49. Not all
students with a mean CAT4 score of 90 will get an Attainment 8 score
of 35.
Around half the students will get an Attainment 8 score below 35,
with around 25% of the students obtaining an Attainment 8 score of
less than 26 – the 25th percentile. Around 25% of students will
obtain the ‘if challenged’ score of 43 and above.
[Figure: Probability of five or more GCSEs at grades A*–C including English and Mathematics, against mean CAT4 score]
Setting targets
The above confirms the need for suitably cautious interpretation when
using the indicators with staff and parents, and particularly if sharing
them with individual students. In the latter context, we would advise
that school staff follow the established best practice of schools, using
the results for mentoring and target-setting purposes by:
Main trials
The main trials of all the questions in all four batteries of CAT4 were
carried out in autumn 2010.
The numbers of students taking part in the trials were as follows:
                Trial sample
         Year        Number of students
         4           2,028
         6           1,870
         8           2,179
         10          2,114
         Total       8,191
For the trials, 24 test booklets were created, six for each of the four
year groups. All students took Verbal Classification and Figure
Recognition plus two of the remaining six test types, so that all items
were taken by at least 300 students. Some of the questions were
duplicated in booklets across year groups.
The data from the trials were analysed to provide information on the
difficulty level of each question, its ability to discriminate between high
and low scorers, and the extent to which it proved equally difficult for
both sexes, once each sex’s general level of performance was taken
into account. This information was then used to select and order the
sequences of questions for the final standardisation version of CAT4.
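Two of the item statistics mentioned above, difficulty and discrimination, can be sketched as follows. This is a minimal illustration on hypothetical responses: it uses the proportion correct (facility) for difficulty and a simple upper-group minus lower-group index for discrimination. The report does not say which indices were used in the trials, and the sex-based comparison of difficulty is not covered here.

```python
def item_statistics(responses):
    """responses: one list of 0/1 item scores per student.
    Returns facility and an upper-lower discrimination index per item."""
    n_students = len(responses)
    n_items = len(responses[0])
    totals = [sum(r) for r in responses]
    # Rank students by total score and take the bottom and top thirds
    order = sorted(range(n_students), key=lambda s: totals[s])
    group = max(1, n_students // 3)
    low, high = order[:group], order[-group:]
    stats = []
    for i in range(n_items):
        facility = sum(r[i] for r in responses) / n_students
        p_high = sum(responses[s][i] for s in high) / group
        p_low = sum(responses[s][i] for s in low) / group
        stats.append({"facility": facility,
                      "discrimination": p_high - p_low})
    return stats

# Hypothetical trial data: 6 students x 5 items
responses = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
]
stats = item_statistics(responses)
for i, s in enumerate(stats):
    print(i, round(s["facility"], 2), s["discrimination"])
```

In this invented data the last item is answered correctly as often by low scorers as by high scorers, so its discrimination index is zero; an item like that would be a candidate for removal.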
CAT4 UK standardisation: levels Pre-A to G
The standardisation of CAT4 took place between September and
December 2011 in England, Wales, Scotland and Northern Ireland. A
national database of schools was created and schools were grouped
into 10 categories – by country (Wales, Scotland and Northern Ireland)
and, for England, further grouped into independent or grammar, plus
five categories of school intake based on the proportion of students
taking free school meals.
Schools were selected by stratified random sampling procedures
within these groupings. As this was a national sample, many schools
taking part in the standardisation had never used CAT4 before. For the
standardisation, schools were asked to do one pre-selected CAT4 test
level and were given an option to do other levels. Schools were free
to choose between the paper and digital version of the test. Primary
schools were asked to test all students in the year group but secondary
schools had the option either to test two randomly selected teaching
groups if they tested by paper, or to test the whole year group if they
chose the digital option.
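The selection of schools within groupings can be sketched as a stratified random sample. The example below is an assumption-laden illustration: the school list, the stratum labels and the equal sampling fraction are all hypothetical, and the report does not give the sampling fractions actually used.

```python
import random
from collections import defaultdict

def stratified_sample(schools, fraction, seed=0):
    """Draw the same fraction of schools at random from every stratum.
    schools: list of (school_name, stratum) pairs."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_stratum = defaultdict(list)
    for name, stratum in schools:
        by_stratum[stratum].append(name)
    sample = {}
    for stratum, names in sorted(by_stratum.items()):
        k = max(1, round(len(names) * fraction))
        sample[stratum] = sorted(rng.sample(names, k))
    return sample

# Hypothetical database of schools with their stratum
schools = (
    [(f"Wales-{i}", "Wales") for i in range(10)]
    + [(f"Scotland-{i}", "Scotland") for i in range(20)]
    + [(f"Eng-FSM1-{i}", "England, FSM band 1") for i in range(40)]
)
sample = stratified_sample(schools, fraction=0.1)
for stratum, chosen in sample.items():
    print(stratum, len(chosen))
```

Sampling within each stratum separately guarantees that every category of school is represented, which a single simple random draw from the whole database would not.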
The numbers of students taking part in the standardisation were as
follows:
                          Standardisation sample
     Country             Primary     Secondary     Total
     England             4,663       13,085        17,748
     Wales               269         2,169         2,438
     Scotland            259         2,439         2,698
     Northern Ireland    179         1,645         1,824
     Total               5,370       19,338        24,708
Main trials
The main trials of the CAT4 Levels X and Y questions were carried out
in autumn 2013. Approximately 1,200 students in Years 2 and 3 took
part in the trials.
Four test booklets were created: two test booklets for each year
group. Around 300 pupils took each booklet, with the parallel booklets
of each year group alternated within a class. All the questions used
in CAT4 Levels X and Y were used in the trialling with some of the
questions duplicated in booklets across the two different year groups.
The data from the trials were analysed to provide information on the
difficulty level of each question, its ability to discriminate between high
and low scorers and the extent to which it proved equally difficult for
both sexes, once overall score was taken into account. This information
was then used to select and order the sequences of questions for the
final standardisation version. Two versions of the test were created:
Form X for 7 year-olds (Year 2 in England and Wales or equivalent) and
Form Y for 8 year-olds (Year 3 in England and Wales or equivalent).
Standardisation
The standardisation of CAT4 Levels X and Y took place between May
and June 2014 in England, Wales, Scotland and Northern Ireland. A
national database of schools was created and schools were grouped
into nine categories: by country and, within England, further
grouped into ‘Independent’ plus five categories of maintained sector
schools based on the proportion of students taking free school meals.
Schools were selected by stratified random sampling procedures within
these groupings. As this was a national sample, many schools taking
part in the standardisation had never used CAT4 before. Around 1,900
students completed Form X and around 1,100 students completed Form
Y. The standardisation results were weighted to account for sample
response bias.
The mean standard age scores (SAS) for males and females for CAT4
Levels X and Y are in the tables below.
Level X
     Gender                        Nonverbal   Verbal   Quantitative   Spatial   Mean CAT4
                                   SAS         SAS      SAS            SAS       score
     Females     Mean              102.4       102.4    100.1          100.5     101.5
                 N                 944         941      941            941       945
                 Std. Deviation    14.9        15.0     14.0           14.8      11.4
     Males       Mean              98.6        98.6     100.3          99.3      99.2
                 N                 981         966      967            962       984
                 Std. Deviation    14.6        14.8     16.9           15.0      12.0
     Total       Mean              100.5       100.5    100.2          100.0     100.4
     including   N                 1931        1913     1914           1909      1934
     unknown     Std. Deviation    14.8        15.0     15.5           14.9      11.8
Level Y
Overall, female mean CAT4 scores are around 2 SAS points higher than
for males for both Levels X and Y. In particular, the mean Verbal and
Nonverbal scores are around 4 SAS points higher for females.
Note that the mean CAT4 score is not a Standard Age Score but an
average of the nonverbal, verbal, quantitative and spatial SAS. The
standard deviation for the mean CAT4 score is around 12, lower than
the 15 that is expected for an SAS. This does not indicate the sample
was unrepresentative in its spread of ability: rather, the scores for
the four components are only partly correlated, so the spread narrows
when they are averaged.
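The narrowing can be checked with the standard result for the standard deviation of an average of k equally spread, equally correlated scores: SD x sqrt((1 + (k - 1) x rho) / k). The sketch below applies it under the simplifying assumptions that each battery has an SD of exactly 15 and that all pairs of batteries share one correlation, rho. The value 0.52 is back-solved from the observed SD of about 12 and is not a figure from the report.

```python
from math import sqrt

def sd_of_mean(sd, k, rho):
    """SD of the average of k scores that each have standard deviation
    `sd` and a common pairwise correlation `rho`."""
    return sd * sqrt((1 + (k - 1) * rho) / k)

# Perfectly correlated batteries: no narrowing at all
print(sd_of_mean(15, 4, 1.0))

# Uncorrelated batteries: the spread halves
print(sd_of_mean(15, 4, 0.0))

# An assumed common correlation of 0.52 gives an SD of about 12
print(round(sd_of_mean(15, 4, 0.52), 1))
```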
   Level X                             Correlation
                      English level   Maths level   Science level
   Nonverbal SAS      0.41            0.39          0.37
   Verbal SAS         0.65*           0.57          0.52
   Quantitative SAS   0.47            0.48          0.40
   Spatial SAS        0.42            0.43          0.43
   Mean CAT4 score    0.63            0.61*         0.55*
Note: Figures marked * are the highest correlations for each outcome.
Evaluating differences between CAT4
scores
Evaluating a difference between two scores, whether scores on two
different tests or scores on the same test on two occasions, has to be a
three-stage process.
Rarity of differences
Second, if the difference is ‘real’ or statistically significant, then
the unusualness or rarity of the difference has to be evaluated. A
significant difference can sometimes be very common. For example, if
you use a millimetre ruler to measure a boy’s height when he is seven
and then again when he is eight, the difference between these two
heights can be measured very accurately to within two millimetres.
Therefore ‘real’ or statistically significant differences will be very
common in a sample of boys because the difference between the
heights is likely to be substantially greater than two millimetres in
almost all cases.
The spread of difference in scores can be determined either directly
from the data or by a formula that takes into account the spread of
scores on each test and the correlation between the two sets of scores.
If the sample size is large enough, the two methods will produce very
similar results; this was the case for the standardisation of CAT4. The
formula used is:
$$\mathrm{SEM}_{\mathrm{diff}} = \sqrt{SD_1^2 + SD_2^2 - 2\,r_{12}\,SD_1\,SD_2}$$
where $SD_1$ and $SD_2$ are the standard deviations of the scores on each
test and $r_{12}$ is the correlation between the two tests.
     Difference in SAS scores      Percentage of students obtaining
     from first to second          this extent and direction of
     occasion (1)                  difference
     Increases by >16              5%
     Increases by >12              10%
     Increases by >9               15%
     Decreases by >9               15%
     Decreases by >12              10%
     Decreases by >16              5%
     Difference in SAS scores      Percentage of students obtaining
     from Battery 1 to             this extent and direction of
     Battery 2 (2)                 difference
     Higher by >19                 5%
     Higher by >15                 10%
     Higher by >12                 15%
     Lower by >12                  15%
     Lower by >15                  10%
     Lower by >19                  5%
(1) The figures in the table have assumed a mean correlation of 0.8 between the two
occasions.
(2) The figures in the table have assumed a mean correlation of 0.7 between pairs of
batteries.
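The link between the spread-of-differences formula and the two tables above can be sketched as follows. The sketch assumes that both scores have a standard deviation of 15 and that differences are normally distributed; these assumptions are ours, and the report does not state how the published thresholds were rounded.

```python
from math import sqrt
from statistics import NormalDist

def sd_of_difference(sd1, sd2, r12):
    """Spread of the difference between two correlated scores."""
    return sqrt(sd1**2 + sd2**2 - 2 * r12 * sd1 * sd2)

def threshold(sd_diff, tail_fraction):
    """Difference exceeded by `tail_fraction` of students in one
    direction, assuming normally distributed differences."""
    return NormalDist(0, sd_diff).inv_cdf(1 - tail_fraction)

# Same test on two occasions: assumed correlation 0.8, SD 15 each time
sd_occasions = sd_of_difference(15, 15, 0.8)
# Two different batteries: assumed correlation 0.7, SD 15 each
sd_batteries = sd_of_difference(15, 15, 0.7)

for label, sd in [("occasions", sd_occasions), ("batteries", sd_batteries)]:
    for tail in (0.05, 0.10, 0.15):
        print(label, f"{tail:.0%}", round(threshold(sd, tail), 1))
```

Under these assumptions the 5% and 10% thresholds round to the tabulated values for both tables, and the 15% threshold matches the battery table. The 15% threshold for two occasions comes out slightly above the tabulated 9, which may reflect rounding or the use of the observed data rather than the formula.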
Practical significance of differences
Finally, it needs to be remembered that a difference between two
batteries which occurs commonly in the general population is not
necessarily insignificant. It can indicate a real, albeit common,
difference between the development of the cognitive abilities
underlying the two battery scores, with implications for the ways
in which the student concerned is likely to progress academically.
Such differences need to be interpreted in the light of all that is
known of a student’s background and educational record. For
example, students who have a background of poor socio-economic
and educational opportunities who gain higher scores for Nonverbal
Reasoning than for Verbal Reasoning may not have any real
difference between their abilities to reason with words and with
shapes. Instead, they may not have had the chance to acquire the
basic reading and word knowledge needed to perform well on the
verbal tasks. On the other hand, if they have good socio-economic
and educational backgrounds, then the score difference may
suggest that there is a genuine difference in abilities to think with
words and with shapes.
Verbal-Spatial profile
The table below shows the proportion of males and females within the
verbal-spatial profile for primary and secondary schools.
                                       Primary                    Secondary
    Verbal-Spatial Profile     Female   Male    Total     Female   Male    Total
    Extreme spatial bias       1%       2%      1%        1%       2%      2%
    Moderate spatial bias      3%       6%      5%        3%       6%      5%
    Mild spatial bias          9%       11%     10%       9%       14%     11%
    No bias                    68%      67%     68%       66%      63%     65%
    Mild verbal bias           13%      9%      11%       13%      10%     11%
    Moderate verbal bias       5%       3%      4%        5%       4%      5%
    Extreme verbal bias        1%       1%      1%        2%       1%      2%
                               100%     100%    100%      100%     100%    100%
CAT4 IRISH EDITION
CAT4 Irish standardisation
Irish age-based norms for the CAT4 were derived from the
administration of four levels of the tests (D to G) to students in random
samples of primary and second-level schools nationwide in 2012. The
Irish version of the tests has the same content as the UK edition and is
aimed at the following students:
        Test level       Number of students
        Level D          1,733
        Level E          1,818
        Level F          1,678
        Level G          1,387
        Total            6,617
Test reliability
The reliability of a test is a measure of the consistency of a student’s
test scores over repeated testing, assuming conditions remain
the same – that is, there was no fatigue, learning effect or lack of
motivation. Tests with poor reliability might result in very different
scores for a student across two test administrations.
The test reliabilities of the Irish version are high and are similar to the
UK edition.
                              CAT4 reliability
     Test          Verbal      Quantitative   Nonverbal   Spatial    Overall
     level         Reasoning   Reasoning      Reasoning   Ability    CAT4
                   Battery     Battery        Battery     Battery
     Level D       0.89        0.90           0.88        0.87       0.96
     Level E       0.89        0.88           0.86        0.87       0.95
     Level F       0.90        0.87           0.84        0.88       0.95
     Level G       0.91        0.86           0.83        0.88       0.95
     Average D-G   0.90        0.88           0.85        0.87       0.95
However, most tests show the 90% chance or confidence bands. For
values around the average, the 90% confidence band is as follows:
Gender differences
The table below shows the average SAS scores for all the students who
took part in the Irish standardisation, by gender.
Males were on average around three SAS points higher and around
1.5 SAS points higher for Spatial. Females were around one SAS point
higher than males on the Verbal and Nonverbal Batteries.
Irish Leaving Certificate indicators
Results were collected from 870 students who completed CAT4
and the Leaving Certificate. Subject grades were obtained as either
Ordinary (O) or Higher (H) level. The equivalence between the
Ordinary and Higher grades as set out in
https://www.cao.ie/index.php?page=scoring&s=lcepointsgrid
was used to combine results from the two levels to a common scale.
For example, Higher 6 grade is equivalent to Ordinary 2 grade and
both of these have 46 points.
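Placing both levels on a common scale amounts to a lookup from grade to points. The sketch below illustrates this. Only the H6 = O2 = 46 equivalence is stated in this report; the remaining values are our recollection of the CAO common points scale (excluding any bonus points for Higher Mathematics) and should be checked against the grid at the URL above. The names `HIGHER`, `ORDINARY`, `points` and `equivalents` are hypothetical.

```python
# Assumed points per grade; only H6 and O2 (46 points) are confirmed
# by the report text, the rest should be verified against the CAO grid.
HIGHER = {"H1": 100, "H2": 88, "H3": 77, "H4": 66,
          "H5": 56, "H6": 46, "H7": 37, "H8": 0}
ORDINARY = {"O1": 56, "O2": 46, "O3": 37, "O4": 28,
            "O5": 20, "O6": 12, "O7": 0, "O8": 0}

def points(grade):
    """Points on the common scale for a Higher or Ordinary grade."""
    table = HIGHER if grade.startswith("H") else ORDINARY
    return table[grade]

def equivalents(grade):
    """All grades, at either level, with the same non-zero points."""
    p = points(grade)
    all_grades = list(HIGHER) + list(ORDINARY)
    return sorted(g for g in all_grades if p > 0 and points(g) == p)

print(points("H6"), points("O2"))
print(equivalents("O1"))
```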
The strength of the relationship between two variables can be
measured by a statistic called the correlation coefficient. A value of
zero indicates no relationship between the two measures, whereas a
value of one indicates a perfect positive relationship. The correlations
between CAT4 scores and Leaving Certificate subject grades are
shown below. These show that the overall mean CAT4 SAS has a
moderate to strong association with the subject grades.
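For readers who want to see how such a coefficient is obtained, the following is a minimal sketch of the Pearson correlation on hypothetical paired scores. The data are invented; the report does not state which correlation coefficient was used, so Pearson is an assumption here. The coefficient can also take negative values, down to minus one for a perfect negative relationship.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical mean CAT4 scores and subject points for six students
cat4 = [85, 92, 100, 104, 111, 120]
subject_points = [37, 56, 46, 66, 77, 88]

print(round(pearson(cat4, subject_points), 2))
```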
The Leaving Certificate indicators for each subject are derived from the
statistical relationship between CAT4 scores and Leaving Certificate
subject grades or points scores. Indicators are calculated from the mean
CAT4 Standard Age Score (SAS) for Maths, Physics, Chemistry, Art and
Construction Studies and are based on verbal SAS for the other subjects.
CAT4 and Leaving Certificate ‘Best 6’ score
A summary ‘Best 6’ indicator based on the total points score for Maths
and the best of five other subjects was calculated for each student. The
correlation between the 'Best 6' points score and the mean CAT4 score
was 0.61 and the relationship is displayed graphically below.
[Figure: ‘Best 6’ points score (0 to 600) against mean CAT4 score (70 to 130)]
Leaving Certificate indicators for groups of students
The table below shows how the group/class indicators have been
calculated for a fictitious class with five students and shows the most
likely grade achieved and the probabilities associated with getting
different Mathematics grades. The group indicator is an average of the
individual student outcomes and probabilities.
Calculating group indicators for Mathematics for a fictitious class of
five students
     1                  70    99    95%    3%    1%    0%    0%    0%    0%    0%    0%   O5
     2                  85    217   72%   14%    8%    4%    1%    1%    0%    0%    0%   O5
     3                  100   337   25%   20%   21%   16%    8%    5%    3%    2%    0%   O3
     4                  115   457    4%    5%   10%   17%   16%   17%   17%   11%    3%   H5/O1
     5                  140   600    0%    0%    0%    1%    2%    3%   10%   36%   47%   H2
     Group indicator          342   39%    8%    8%    8%    6%    5%    6%   10%   10%
     (average)
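The group row can be reproduced from the five student rows by simple averaging, as the sketch below shows. The numbers are copied from the table above; because the column headings did not survive in this copy, the columns are referred to only by position, and the variable names are ours.

```python
# Rows copied from the table: (student, second column, third column,
# nine probabilities in percent, in table order)
rows = [
    (1, 70, 99,   [95, 3, 1, 0, 0, 0, 0, 0, 0]),
    (2, 85, 217,  [72, 14, 8, 4, 1, 1, 0, 0, 0]),
    (3, 100, 337, [25, 20, 21, 16, 8, 5, 3, 2, 0]),
    (4, 115, 457, [4, 5, 10, 17, 16, 17, 17, 11, 3]),
    (5, 140, 600, [0, 0, 0, 1, 2, 3, 10, 36, 47]),
]
n = len(rows)

# Average of the third column across the five students
group_third_column = sum(r[2] for r in rows) / n

# Average of each probability column across the five students
group_probabilities = [
    sum(r[3][j] for r in rows) / n for j in range(9)
]
rounded = [round(p) for p in group_probabilities]

print(group_third_column)
print(rounded)
```

Averaging the displayed percentages reproduces the group row except in the fifth probability column, where this calculation gives 5% against the tabulated 6%. A plausible explanation, which we cannot confirm, is that the published row averages unrounded probabilities.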