Cambridge International Primary Programme Test Documentation
This document is intended for users of the Cambridge International Primary Programme Tests. It
describes the nature of the tests, including their assessment structure, the process by which they
were created and standardised, their intended uses, and their limitations. This information will
allow you to understand the test scores in context and make sound interpretations of them.
Purpose and Aims
Cambridge International Primary Progression Tests
The Cambridge International Primary Progression Tests are for use by teachers within their
classrooms to assess the performance and progress of the children in their classes. The tests are
intended for children in local and international schools around the world who receive most of their
teaching in English. The tests assume proficiency in English, although they do not require English
to be the students' mother tongue. They have been written to avoid colloquial language and
inappropriate contexts, and to ensure that subject skills are being assessed rather than language
proficiency.
In Mathematics and English, there is a test for each of Stages 3-6 of the Cambridge Primary
Programme; there is also a science test at Stage 6, as shown in the table.
              Stage 3   Stage 4   Stage 5   Stage 6
English          ✓         ✓         ✓         ✓
Mathematics      ✓         ✓         ✓         ✓
Science                                        ✓
The age at which children will be ready for the tests depends on the local teaching context. As a
general guide, Stage 1 will be the first year of primary schooling when the children are
approximately 5 years old, and Stage 6 will be the final year of primary schooling and the year in
which the children reach their 11th birthday. Each Stage would usually be taught over the course
of a year, but in some cases it may be appropriate to take more or less time to teach the material.
Each school should implement the curriculum as best suits their own needs and the test delivery
should be timed appropriately. The Progression Tests are designed to give broad coverage of all
the learning outcomes in a Stage and so should be administered after teaching is complete in any
one subject at any one Stage.
The test results can be used in a variety of ways to suit the needs of the teacher, and the tests
were designed with the following aims:
• To provide a measure of the performance of students relative to a standard external benchmark;
• To allow tracking of performance from one year to the next, to determine whether students are
making progress;
• To allow identification of strengths and weaknesses in individual and class performance.
Cambridge International Primary Achievement Test
The Cambridge International Primary Achievement Test is available in English, mathematics and
science and is an internally assessed, externally moderated examination that provides students
leaving primary schooling with a Certificate of Achievement. The test is designed for children at
the end of their primary schooling, who have covered the learning objectives of Stages 1 to 6 of the
Cambridge International Primary Programme. As with the Progression Tests, the Achievement
Tests are intended for children who receive most of their teaching in English. These tests were
likewise written to avoid bias caused by language and context, and they followed the development
procedures detailed below.
Test structure and design
Subject content assessed
The Cambridge International Primary Programme Tests are designed to assess the learning
outcomes stated in the Cambridge Primary Curriculum Framework. The Primary Curriculum
Framework for mathematics and English is constructed such that each Stage builds upon the
knowledge and skills developed in the previous Stage. Thus, each of the Primary Progression
Tests assesses learning outcomes from across the current Stage and incorporates some from the
Stages below.
The Primary Curriculum Framework for science progressively introduces scientific enquiry skills
which develop from one Stage to the next, but the content is not dependent on previous Stages
and can be taught in any order. There is a Primary Progression Test for Stage 6 only and this test
covers the content from all four Stages of the Primary Curriculum Framework for Science.
If the Curriculum Framework supplied by CIE is not being followed, the teacher will have to
consider the material that has been taught alongside the question papers and the Curriculum
Framework for that year, and decide whether the tests are suitable for their classes. If the material
that has been taught differs from that assessed, the results should be interpreted accordingly.
Assessment structure
English is assessed using two question papers. The first is based upon a non-narrative text and
assesses reading, writing and usage skills. The second is based on a narrative text and assesses
reading and writing. Mathematics is assessed using three question papers. In the first two papers,
questions are set on all aspects of the curriculum; from Stage 5, a calculator is to be used for
Paper 2 (calculators are not allowed for any other mathematics Progression Test papers). Paper 3
is a mental mathematics paper and specifically assesses the students' ability to perform
mathematical operations in their heads.
Science is assessed using two question papers that are both made up of questions from across
the curriculum.
Question difficulty
The tests contain a range of question difficulties. Most questions are targeted at the expected
average ability of the students but all tests contain some easier questions and some harder ones.
In mathematics, as far as possible, the questions appear in order of difficulty. Detailed test
specifications showing distributions of marks by topic can be found in the mark scheme booklets
that accompany the tests.
Item types
A variety of item types is used throughout.
In English, reading and usage are assessed using structured short-answer, matching and
multiple-choice questions. Writing is assessed using a single open-ended sustained writing task on
each paper.
In mathematics and science, structured questions requiring numeric, one-word or short answers
are used in conjunction with questions that require graphical answers, matching, and multiple
choice.
Scoring procedures
The tests are designed to be marked by teachers using detailed, published mark schemes. Only
whole marks can be awarded (with the exception of the mental mathematics papers, where every
question is worth one mark) and there is no allowance for the accumulation of partial credit.
However, in some questions, marks can be awarded for correct procedure even if an error has
been made resulting in an incorrect final answer. Similarly, marks can be gained in later question
parts even if incorrect information is carried forward from an earlier part.
Guidelines for test administration
1. Appropriate and inappropriate uses
It should be noted that the tests have been designed and standardised for the uses listed above; if
they are used for other purposes, e.g. to determine the ability of new students who have not
studied the curriculum, care should be taken when interpreting the results.
2. Interpretation
The results of the tests provide teachers with item-level information about student performance.
The data can be analysed to determine where there are particular strengths and weaknesses in
individuals or groups of students. The data should always be interpreted with respect to the actual
material taught to the students.
The results also provide a total mark for each paper which can be translated into an overall result
for that subject and Stage. The possible overall results are shown in the table below.
          Stage 3   Stage 4   Stage 5   Stage 6
Highest     3_6       4_6       5_6       6_6
            3_5       4_5       5_5       6_5
            3_4       4_4       5_4       6_4
            3_3       4_3       5_3       6_3
            3_2       4_2       5_2       6_2
Lowest      3_1       4_1       5_1       6_1
The tests have been standardised such that the expected performance for a student of a specific
ability can be tracked from one year to the next. For example, a student who obtains 3_3 at Stage
3 will be expected to obtain 4_3 at Stage 4. If they obtain a better result than this, e.g. 4_4, they
have improved beyond expectation. Similarly, if they obtain a poorer result, e.g. 4_2, they have
not progressed as well as expected. This feature of the scoring system allows individual
targets to be set for each student and enables each student to demonstrate whether they are
achieving their potential. It should be noted, however, that at present this is only an approximate
measure of progression because the data does not yet exist to completely verify the relationship
between one Stage and the next. The section on standards setting below explains in more detail
how these level bands were set.
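To make this tracking logic concrete, the short Python sketch below maps a result code to the
expected result at the next Stage and classifies a student's progress. It is purely illustrative: the
function names are invented, and the code simply encodes the expectation described above using
the level codes from the table.

    def expected_next_result(result):
        """Map a result such as '3_3' to the expected result one Stage later ('4_3')."""
        stage, level = result.split("_")
        return f"{int(stage) + 1}_{level}"

    def progress(previous, current):
        """Compare this year's result with the expectation set by last year's."""
        expected_level = int(expected_next_result(previous).split("_")[1])
        achieved_level = int(current.split("_")[1])
        if achieved_level > expected_level:
            return "improved beyond expectation"
        if achieved_level == expected_level:
            return "progressed as expected"
        return "did not progress as well as expected"

    print(progress("3_3", "4_4"))  # -> improved beyond expectation
    print(progress("3_3", "4_2"))  # -> did not progress as well as expected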
Test development
Item writing guidelines
All items were written following guidelines that covered question style and appropriate content and
context. The language used in questions was deliberately kept simple, with straightforward
grammatical constructions, limited vocabulary and avoidance of the colloquial. This was done to
ensure, as far as possible, that interpreting the questions did not itself challenge the language
abilities of the students, so that the tests assess the students' abilities in the subject rather than
their success in interpreting the questions. Similarly, question content and context were carefully
chosen to avoid material that might be offensive, and to prevent bias or problems
associated with unfamiliarity.
Item writing process
Initial item writing
The items were written by a team of authors with experience of teaching and test writing for the
relevant age groups. All items were revised prior to being put forward to a test construction
committee, which consisted of item authors and representatives from University of Cambridge
International Examinations and Cambridge Assessment. The test construction committee made
further amendments to items and discarded those that were inappropriate, ambiguous, repetitions
of other items, or felt to be weak for some other reason. The outcome of the test construction
committee meetings was two parallel forms of each test. The exceptions were English Stages 4
and 5, where only one version of each test was written.
Initial trial
Trials of the parallel forms of the tests were conducted using an international cohort of children
from seven schools in five countries. The schools were all CIE Centres, with the exception of one
which has since become a Centre. On average, each question was attempted by approximately
32 children.
The results of these trials were used to create the final versions of the test papers. For science
and mathematics all the items from both parallel forms were considered together and the best
items for each topic were chosen for the final test papers, as follows.
• The original specification for the distribution of marks by topic was adhered to.
• Marker experience was used to discard items where unexpected answers were given or where
students appeared to have misunderstood the question.
• Item-level statistics were used to determine whether a question had an appropriate facility
and discrimination (a sketch of these calculations follows this list).
• Items that behaved strangely, e.g. appeared to be harder for more able students, were
discarded.
• Items that were too easy were discarded except where they formed an essential part of a
multi-part question.
• Items that were very hard were discarded unless there was an educational reason for not
doing so, e.g. if they assessed a curriculum area that often causes students trouble.
• The majority of items used had a facility between 0.4 and 0.8. Some items with a facility of
greater than 0.8 or less than 0.4 were kept when they were required to fulfil the test
specification because all other options had been discarded for other reasons. It was
deemed appropriate that there be some easy items, because the tests are designed to
provide a positive experience for all takers, and some difficult items, to provide a challenge
for more able students.
• Items that showed little or no discrimination were discarded.
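The facility and discrimination statistics used in this selection are standard classical-test-theory
quantities: facility is the mean proportion of the available marks obtained, and discrimination is the
correlation between an item's score and performance on the rest of the test. The Python sketch
below shows one common way to compute them. It is illustrative only (CIE's actual analysis code
is not published), and the 0.2 discrimination floor in the selection rule is an assumed, typical value
rather than a figure from the text.

    import numpy as np

    def item_statistics(scores, max_marks):
        """Classical item analysis for a students-by-items score matrix."""
        scores = np.asarray(scores, dtype=float)
        totals = scores.sum(axis=1)
        # Facility: mean proportion of the available marks obtained.
        facility = scores.mean(axis=0) / np.asarray(max_marks, dtype=float)
        # Discrimination: correlation between the item score and the total
        # of the remaining items, so an item is not correlated with itself.
        discrimination = np.empty(scores.shape[1])
        for j in range(scores.shape[1]):
            rest = totals - scores[:, j]
            discrimination[j] = np.corrcoef(scores[:, j], rest)[0, 1]
        return facility, discrimination

    def keep_item(facility, discrimination):
        """Selection rule mirroring the criteria above: facility in the
        0.4-0.8 band (from the text) and a minimum discrimination of
        0.2 (an assumed threshold)."""
        return 0.4 <= facility <= 0.8 and discrimination >= 0.2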
The questions for the final English papers could not be selected in this way because the individual
reading comprehension items and the writing task refer to a specific text and so it is not possible to
mix items from different papers. Although the items in the usage section do not directly refer to the
text contained within the question paper and could be answered in isolation, they are thematically
linked to the text and so it was decided that the usage items should remain in the same paper as
the text to which they are linked. Thus, where multiple versions of the papers existed, the paper
with the best overall and individual item statistics was selected in its entirety as the final version.
Individual item statistics were considered and any items that showed poor responses were edited
or replaced, based upon the statistical results and the professional judgement of the examiners.
Pre-test and standard setting
The papers were pre-tested in their final form so that cut-scores could be set. The pre-test sample
consisted of students from schools that were all CIE Centres, with the exception of one which has
since become a CIE Centre. Nineteen schools in 14 countries around the world were involved in
the pre-test and over 1500 individual students took part. The table below shows the total number
of students that attempted each question paper.
                     English   Mathematics   Science
Stage 3   Paper 1      329         294          -
          Paper 2      363         285          -
          Paper 3       -          281          -
Stage 4   Paper 1      321         257          -
          Paper 2      335         235          -
          Paper 3       -          218          -
Stage 5   Paper 1      308         259          -
          Paper 2      296         212          -
          Paper 3       -          227          -
Stage 6   Paper 1      188         169         208
          Paper 2      154         121         177
          Paper 3       -          151          -
Item level statistics (such as facility values, discrimination indices and omit rates) were generated
as a check that the questions were performing as intended, but the main analysis focussed on the
score distribution for each paper.
For each student, the scores on the individual papers were combined to give a total score for that
Stage and subject. The distribution of these total scores was considered and the mean, standard
deviation and standard errors were calculated. These statistics, including sample sizes, for each
Stage and subject are shown in Appendix 1.
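As an illustration of these summary calculations (the function name and data layout are
assumptions for this example, not CIE's analysis code), a minimal Python sketch:

    import numpy as np

    def summarise_totals(paper_scores):
        """Summary statistics for subject totals; rows are students,
        columns are the papers for one subject and Stage."""
        totals = np.asarray(paper_scores, dtype=float).sum(axis=1)
        n = totals.size
        sd = totals.std(ddof=1)              # sample standard deviation
        return {
            "n": n,
            "mean": totals.mean(),
            "sd": sd,
            "se_mean": sd / np.sqrt(n),      # standard error of the mean
        }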
It was decided, based on the statistics, that the score range could be split into six levels. These
levels would not be based on external criteria, but would be set by the performance of pupils in the
pre-test. It was also decided that there should be a smaller proportion of pupils achieving each of
levels 1 and 6 than achieving each of the other levels. Therefore the decision was made that cut-
scores should be set at the 10th, 30th, 50th, 70th and 90th percentiles for levels 2, 3, 4, 5 and 6,
respectively, with the level 1 cut-score set at 1 mark. To allow for the fact that pupils would be
likely to improve rapidly once the Programme was introduced to schools in its entirety, the
cut-scores were actually set at the 15th, 35th, 55th, 75th and 95th percentiles of the pre-test
pupils.
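The percentile rule just described can be written down directly. The sketch below is again a
hedged illustration rather than CIE's own procedure: it reads the shifted percentiles off the pre-test
total-score distribution, and how fractional percentile values were rounded to whole marks is an
assumption.

    import numpy as np

    def set_cut_scores(pretest_totals):
        """Cut-scores for levels 1-6 from a pre-test score distribution.
        Levels 2-6 cut at the 15th, 35th, 55th, 75th and 95th percentiles
        (the 10/30/50/70/90 targets shifted up by five points, as in the
        text); level 1 starts at 1 mark."""
        totals = np.asarray(pretest_totals, dtype=float)
        cuts = {1: 1}
        for level, p in zip(range(2, 7), (15, 35, 55, 75, 95)):
            cuts[level] = int(round(np.percentile(totals, p)))
        return cuts

    def level_for(score, cuts):
        """Highest level whose cut-score the total mark reaches."""
        return max((lvl for lvl, cut in cuts.items() if score >= cut), default=0)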
This method equates the tests from one Stage to the next by making the assumption that the
distribution of student ability remains consistent from one year to the next. This assumption is
most likely to hold if the same Centres are represented at each Stage, but not all the Centres in
the whole sample used all the Stages. Thus, for each pair of consecutive Stages, score distribution
tables were made containing only data from those Centres that used both Stages in the pair.
These tables showed that the proportions of pupils achieving each level, using the cut-scores set
on the whole sample, were appropriate from one Stage to the next and that extending the
assumption of consistent student ability from one Stage to the next in the whole sample was valid.
Once the cut-scores had been decided upon, individual tables were created for each Centre
showing the frequency and percentage of pupils achieving the different levels in each Stage. This
data confirmed the assumption that student performance within most Centres was consistent from
one year to the next.
Limitations of standard setting process
Two rounds of trials and item analysis produce a reasonable degree of confidence that all the
items discriminate well and function correctly; that is, higher-ability students obtain a greater
average score on an item than lower-ability students. The item statistics also confirm that the
questions span a range of difficulties and that, in turn, the test overall provides a good measure of
ability and produces good discrimination between students.
As stated above, some items which were rarely correctly answered were chosen for the final
versions of the tests because they assessed a piece of knowledge or a skill that was deemed to be
important for children to master. It is likely that performance on these items will improve over time
as teachers use the results of the tests to inform their teaching in the following year.
It should be noted that students in the test sample had not been following the Primary Curriculum
Framework prior to taking the tests. Analysis of teacher questionnaires confirmed that in some
Centres there were areas where the students had not been taught some of the material assessed
in the tests, but in general the content of the question papers corresponded well with what the students had
been taught. We expect that the overall results on the tests will improve over time as a school fully
adopts the Primary Curriculum Framework, especially if teachers use the diagnostic feedback to
improve their teaching in areas where students are struggling. As more test data becomes
available from students who have been taught using the Cambridge Primary Curriculum
Framework, CIE will update the results of the standard cohort. This will improve the usefulness of
any comparisons between individual, class or school performance and the standard cohort and
provide more accurate information when interpreting a result as meaning that the child is within a
specific percentile of the whole population. When comparing groups of students within a school,
the results will be robust because they do not rely on the external cohort.
Appendix 1 - Test Statistics
Mathematics
                                Stage 3   Stage 4   Stage 5   Stage 6
No. items                          54        54        92        96
Test total mark                    50        50        88        88
Sample size                       224       210       197       116
Mean mark                        29.8      29.7      52.7      48.9
Standard deviation               10.0      10.1      14.9      15.2
Standard error of the mean       0.67      0.70      1.06      1.41
Standard error of measurement    2.89      2.89      4.14      3.70
Cronbach's alpha                 0.92      0.92      0.92      0.94
English
                                Stage 3   Stage 4   Stage 5   Stage 6
No. items                          54        54        52        48
Test total mark                    85        85        85        85
Sample size                       237       177       160       156
Mean mark                        30.8      29.8      29.3      40.4
Standard deviation               14.8      13.9      17.1      12.3
Standard error of the mean       0.96      1.04      1.35      0.98
Standard error of measurement    3.87      3.85      4.09      4.01
Cronbach's alpha                 0.93      0.92      0.94      0.89
Science
                                Stage 6
No. whole questions                32
No. sub-questions                  87
Test total mark                   100
Sample size                       153
Mean mark                        59.5
Standard deviation               14.3
Standard error of the mean       1.15
Standard error of measurement    4.12
Cronbach's alpha                 0.92
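For reference, the reliability statistics in these tables are related by the standard formula
SEM = SD × √(1 − α); for Stage 3 mathematics, 10.0 × √(1 − 0.92) ≈ 2.83, consistent with the
tabulated 2.89 once rounding of α is allowed for. A minimal Python sketch of both quantities
(illustrative only; not CIE's analysis code):

    import numpy as np

    def cronbach_alpha(scores):
        """Cronbach's alpha for a students-by-items score matrix."""
        scores = np.asarray(scores, dtype=float)
        k = scores.shape[1]                     # number of items
        item_vars = scores.var(axis=0, ddof=1)  # per-item variances
        total_var = scores.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    def sem(sd, alpha):
        """Standard error of measurement from the test SD and reliability."""
        return sd * np.sqrt(1 - alpha)

    print(round(sem(10.0, 0.92), 2))  # -> 2.83 (table: 2.89, reflecting a
                                      #    less-rounded alpha in the source)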