Data Collection
Populations and Samples
In statistics, a population is the whole set of items that are of interest, e.g. all the
people in a town.
A census observes or measures every member of a population.
A sample is a selection of observations taken from a subset of the population which is used
to find out information about the population as a whole.
Method            Advantages                          Disadvantages
Census            • Should give a completely          • Time consuming and expensive
                    accurate result                   • Cannot be used when the testing
                                                        process destroys the item
                                                      • Hard to process a large quantity
                                                        of data
Sample            • Less time consuming               • The data may not be as accurate
                  • Fewer people have to respond      • Sample may not be large enough
                  • Less data to process than a         to give information about small
                    census                              sub-groups
The size of the sample affects the validity of any conclusions drawn, and the size needed
depends on the required accuracy. Generally, the larger the sample, the greater the accuracy.
Larger samples are required if the population is very varied. Different samples can lead to
different conclusions.
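The link between sample size and accuracy can be illustrated with a small simulation. This is only a sketch: the population is made up, and the function name sample_mean_spread and the choice of 200 trials are assumptions for illustration, not part of the notes. It repeatedly draws samples of a given size and measures how much the sample means vary from one sample to the next.

```python
import random
import statistics

def sample_mean_spread(population, sample_size, trials=200, seed=0):
    """Return the standard deviation of the sample means over many samples."""
    rng = random.Random(seed)
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(trials)
    ]
    return statistics.pstdev(means)

# Hypothetical population: 1000 made-up values between 0 and 100.
rng = random.Random(1)
population = [rng.uniform(0, 100) for _ in range(1000)]

# A smaller spread means different samples agree more closely with each other.
for n in (5, 50, 500):
    print(n, round(sample_mean_spread(population, n), 2))
```

The spread printed for the larger sample sizes should be smaller, which is what "the larger the sample, the greater the accuracy" means in practice.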
Individual units of a population are known as sampling units.
Often sampling units of a population are individually named or numbered to form a list
called a sampling frame.
Sampling
The sample should be representative of the population.
Random sampling helps to remove bias from a sample. There are three methods:
    • Simple random sampling
    • Systematic sampling
    • Stratified sampling
A simple random sample of size n is one where every sample of size n has an equal chance
of being selected.
To carry out a simple random sample, each item in the sampling frame is allocated a unique
number, and numbers are then selected at random.
These random numbers can be selected either using a computer or using lottery sampling
(taking numbers out of a hat).
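Selecting the numbers by computer might look like the sketch below. The sampling frame of 100 numbered items and the function name simple_random_sample are assumptions for illustration. Python's random.sample picks without replacement, so no item can be chosen twice.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Pick n distinct items; every possible sample of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(sampling_frame, n)

# Hypothetical sampling frame: items numbered 1 to 100.
frame = list(range(1, 101))
print(simple_random_sample(frame, 10, seed=42))
```

Fixing the seed only makes the example repeatable; in a real survey the seed would be left unset.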
In systematic sampling, the required elements are chosen at regular intervals from an ordered
list.
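A minimal sketch of systematic sampling follows. It assumes the usual approach, which the notes do not spell out: the interval k is the population size divided by the sample size, and the first item is chosen at random from the first k items. The function name and the list of 100 items are hypothetical.

```python
import random

def systematic_sample(ordered_list, n, seed=None):
    """Take every k-th item from an ordered list, starting at a random position."""
    k = len(ordered_list) // n      # sampling interval (assumes n <= list length)
    rng = random.Random(seed)
    start = rng.randrange(k)        # random start among the first k items
    return [ordered_list[start + i * k] for i in range(n)]

# Hypothetical ordered list of 100 items; a sample of 20 uses every 5th item.
frame = list(range(1, 101))
print(systematic_sample(frame, 20, seed=1))
```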
In stratified sampling, the population is divided into mutually exclusive strata, e.g. males
and females, and a random sample is taken from each. The proportion of each stratum sampled
should be the same.
Number sampled in a stratum = (number in stratum / number in population) × overall sample size
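As a worked example of the stratum calculation, the sketch below uses made-up figures (120 males and 80 females, with an overall sample of 40) and hypothetical function names. Each stratum's share of the sample matches its share of the population, and a simple random sample is then taken within each stratum.

```python
import random

def stratum_sample_sizes(stratum_sizes, sample_size):
    """Number to sample from each stratum, in proportion to its size."""
    total = sum(stratum_sizes.values())
    # Rounding is needed when the result is not whole, so the sizes
    # may not always add up exactly to sample_size.
    return {
        name: round(size * sample_size / total)
        for name, size in stratum_sizes.items()
    }

def stratified_sample(strata, sample_size, seed=None):
    """Take a simple random sample of the right size from each stratum."""
    rng = random.Random(seed)
    sizes = stratum_sample_sizes(
        {name: len(members) for name, members in strata.items()}, sample_size
    )
    return {name: rng.sample(members, sizes[name]) for name, members in strata.items()}

# Hypothetical population: 120 males and 80 females, overall sample of 40.
print(stratum_sample_sizes({"male": 120, "female": 80}, 40))
```

Here the male stratum is 120 / 200 of the population, so it supplies 120 / 200 × 40 = 24 of the sample, and the female stratum supplies 80 / 200 × 40 = 16.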
Sampling Method   Advantages                          Disadvantages
Simple random     • Free of bias                      • Not suitable when the population
                  • Easy and cheap for small            is large
                    populations and samples           • A sampling frame is needed
                  • Each unit has a known and equal
                    chance of selection
Systematic        • Simple and quick                  • A sampling frame is needed
                  • Suitable for large populations    • Can introduce bias if the
                    and samples                         sampling frame is not random
Stratified        • Sample accurately reflects the    • Population must be clearly
                    population structure                classified into strata
                  • Guarantees proportional           • Selection within each stratum
                    representation within the           suffers from the same
                    population                          disadvantages as simple random
                                                        sampling
Non-random sampling
There are two types of non-random sampling:
   • Quota sampling
   • Opportunity sampling
Quota sampling: the researcher selects a sample that reflects the characteristics of the
entire population. The researcher meets people, assesses which group they belong to and
assigns them to the appropriate quota.
Opportunity sampling: taking the sample from people who are available at the time and who
fit the criteria, e.g. the first 20 people outside a supermarket.
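The two non-random methods can be sketched in a few lines. The names, groups and quota sizes below are invented for illustration, and the functions are hypothetical. People are taken in the order the researcher meets them: quota sampling stops accepting a group once its quota is full, while opportunity sampling simply takes the first n people available.

```python
def quota_sample(people, quotas):
    """people: (name, group) pairs in the order met; quotas: group -> number needed."""
    chosen = {group: [] for group in quotas}
    for name, group in people:
        # Accept the person only if their group's quota is not yet full.
        if group in chosen and len(chosen[group]) < quotas[group]:
            chosen[group].append(name)
        # Stop as soon as every quota has been filled.
        if all(len(chosen[g]) == quotas[g] for g in quotas):
            break
    return chosen

def opportunity_sample(people, n):
    """Take the first n people available."""
    return people[:n]

# Hypothetical people met outside a supermarket, in order.
people = [("A", "male"), ("B", "male"), ("C", "female"), ("D", "male"), ("E", "female")]
print(quota_sample(people, {"male": 2, "female": 1}))
print(opportunity_sample(people, 3))
```

Neither function involves any random selection, which is why both methods can introduce bias.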
Method            Advantages                          Disadvantages
Quota             • Allows a small sample to          • Can introduce bias
                    represent the population          • Population must be divided into
                  • No sampling frame required          groups
                  • Quick and easy                    • Non-responses are not recorded
                  • Easy comparison between groups      as such
Opportunity       • Easy                              • Unlikely to provide a
                  • Inexpensive                         representative sample
                                                      • Highly dependent on the
                                                        individual researcher
Types of data
Data can be quantitative (numerical) or qualitative (non-numerical).
A variable that can take any value in a given range is continuous.
A variable that can only take specific values in a given range is discrete.
Large amounts of data can be stored in a frequency table or as grouped data.
When data is presented in a grouped frequency table, the specific data values are not
shown. The groups are known as classes:
   •   Class boundaries tell you the maximum and minimum values in a class
   •   The midpoint is the average of the class boundaries
   •   The class width is the difference between the upper and lower boundaries
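The midpoint and class width definitions above can be checked with a short sketch. The class boundaries used here are made up, and the function names are hypothetical.

```python
def class_midpoint(lower, upper):
    """Average of the lower and upper class boundaries."""
    return (lower + upper) / 2

def class_width(lower, upper):
    """Difference between the upper and lower class boundaries."""
    return upper - lower

# Hypothetical classes, each given as (lower boundary, upper boundary).
classes = [(0, 10), (10, 20), (20, 40)]
for lower, upper in classes:
    print(lower, upper, class_midpoint(lower, upper), class_width(lower, upper))
```

For the class with boundaries 20 and 40, the midpoint is (20 + 40) / 2 = 30 and the class width is 40 - 20 = 20.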
The large data set
Some questions will be based on weather data from the large data set provided by Edexcel.