Sampling methods
A. K. Sharma
Humanities and Social Sciences Department
Indian Institute of Technology Kanpur
9 February 2011, IME Deptt, IIT Kanpur
The aim of sampling theory is
To select a part of the universe that
represents the whole
To save money, time and other resources
To collect better-quality data through better
training and management, and so reduce
(non-sampling) errors
To assist in in-depth studies of sub-samples
Some important terms
Population/universe
Target population
Sampling frame
Sample units
Sample design
Errors
Sampling error measured by mean square
error, standard error or relative error of
estimates
Design effect (for cluster sampling, etc.)
Non-sampling errors
Sampling error decreases as sample size
increases, but non-sampling error tends to
increase as sample size increases
A larger sample is, therefore, not always
better
Sampling errors: bias and
precision
Bias is a term which refers to how far the
average statistic lies from the parameter it is
estimating. Errors from chance will cancel
each other out in the long run, those from
bias will not.
Precision is a measure of how closely the
values of an estimator are expected to cluster
around their own average over repeated samples;
it is measured by the standard error.
$$\mathrm{MSE} = \text{Sampling variance of the estimate} + \text{Bias}^2$$
The figure shows the difference
[Figure: grid of four panels; columns: Precise, Imprecise; rows: Biased, Unbiased]
For bias:
Calculate the difference between the expected value
of the estimate and the true value of the parameter
For precision:
Calculate the standard error of the estimate
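As a minimal sketch of these two calculations, the Python below enumerates every possible simple random sample of size 2 from a tiny made-up population and computes the bias, standard error and MSE of two estimators of the population mean. The population values and the function names are illustrative assumptions, not part of the lecture.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

def sampling_distribution(population, n, estimator):
    """Apply the estimator to every possible sample of size n (without replacement)."""
    return [estimator(sample) for sample in combinations(population, n)]

def bias_se_mse(estimates, true_value):
    """Return (bias, standard error, MSE) over the full sampling distribution."""
    expected = mean(estimates)                    # expected value of the estimate
    bias = expected - true_value                  # bias = E(estimate) - parameter
    variance = mean([(e - expected) ** 2 for e in estimates])
    mse = mean([(e - true_value) ** 2 for e in estimates])
    return bias, sqrt(variance), mse

population = [2.0, 4.0, 6.0, 8.0, 10.0]           # hypothetical values
true_mean = mean(population)

for name, estimator in [("sample mean", mean), ("sample maximum", max)]:
    estimates = sampling_distribution(population, 2, estimator)
    bias, se, mse = bias_se_mse(estimates, true_mean)
    print(f"{name}: bias={bias:.2f}, SE={se:.2f}, MSE={mse:.2f}")
```

In this toy case the sample mean has zero bias while the sample maximum is biased upward, and for both estimators the MSE equals the sampling variance plus the squared bias.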
Sampling methods
Probability sampling (selection by chance)
Non-probability sampling (selection by researcher)
- restricted to the readily accessible population
- selected haphazardly
- small heterogeneous population: typical units
- sample of volunteers
Matched sampling (experimental design)
Independent sampling (for comparing the findings
for different ways of measuring or interviewer’s bias)
Theoretical sampling
Thus sampling methods can be classified as
Random sampling
Stratified random sampling
Systematic sampling
Cluster sampling
Convenience sampling (for inexpensive
approximation of the truth)
Respondent driven sampling
Judgment sampling
Quota sampling
Multistage sampling
Theoretical sampling
Random sampling
Simple random sampling gives each member of the
target population a known and equal probability of
selection (probability sampling in general requires
only a known, non-zero probability). The two basic
procedures are:
1. the lottery method, e.g. picking
numbers out of a hat or bag
2. the use of a table of random
numbers.
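In software, a pseudo-random number generator usually plays the role of the hat or the random number table. The sketch below assumes a hypothetical sampling frame of 100 labelled units; the function name and the seed are illustrative choices only.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame, each with equal probability."""
    rng = random.Random(seed)           # seeded generator so the draw can be repeated
    return rng.sample(list(frame), n)   # sampling without replacement

frame = [f"unit-{i}" for i in range(1, 101)]   # hypothetical frame of 100 units
sample = simple_random_sample(frame, 10, seed=2011)
print(sample)
```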
- More suitable for small populations
for which a sampling frame is available
- When the size of units varies, the Probability
Proportional to Size (PPS) method may
be more desirable
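One simple variant of PPS selection draws units with replacement, each draw choosing a unit with probability equal to its share of the total size measure. The sketch below assumes hypothetical villages with household counts as the size measure; other PPS schemes (for example systematic PPS without replacement) are not shown.

```python
import random

def pps_sample(units, sizes, n, seed=None):
    """Select n units with replacement, with probability proportional to size."""
    rng = random.Random(seed)
    return rng.choices(units, weights=sizes, k=n)

villages = ["A", "B", "C", "D"]          # hypothetical units
households = [50, 150, 300, 500]         # hypothetical size measure
selection_prob = [h / sum(households) for h in households]

print(dict(zip(villages, selection_prob)))          # per-draw selection probabilities
print(pps_sample(villages, households, 2, seed=1))  # two draws
```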
Inappropriate for large populations
due to lack of a sampling frame
Costly
Extreme sample units may be selected
Variance of the estimate of population
proportion (SRS)
Depends on: population parameter (P), sample
size (n) and population size (N)
$$V(p) = \frac{PQ}{n}\left(\frac{N-n}{N-1}\right), \qquad Q = 1 - P$$
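The variance of a sample proportion under simple random sampling without replacement, V(p) = (PQ/n)(N - n)/(N - 1) with Q = 1 - P, can be evaluated directly. The sketch below uses assumed values (P = 0.5, n = 100) to show how little the population size matters once N is large relative to n.

```python
def variance_of_proportion(P, n, N):
    """V(p) = (P*Q/n) * (N - n)/(N - 1) for SRS without replacement, with Q = 1 - P."""
    Q = 1.0 - P
    return (P * Q / n) * (N - n) / (N - 1)

# Hypothetical values: P = 0.5 and n = 100, for two population sizes.
for N in (1_000, 100_000):
    v = variance_of_proportion(0.5, 100, N)
    print(N, v, v ** 0.5)   # population size, variance, standard error
```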
Variance of mean
The variance of the sample mean depends on the
population variance ($S^2$), the sample size (n), and
the sampling fraction (n/N).
$$V(\bar{y}) = \frac{S^2}{n}\left(\frac{N-n}{N}\right)$$
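As a check on the formula V(ybar) = (S^2/n)(N - n)/N, where S^2 is the population variance with divisor N - 1, the sketch below compares it with the variance of the sample mean computed by brute force over all possible samples. The five population values are made up for illustration.

```python
from itertools import combinations
from statistics import mean, variance

def variance_of_mean(S2, n, N):
    """V(ybar) = (S^2 / n) * (N - n) / N for SRS without replacement."""
    return (S2 / n) * (N - n) / N

population = [2.0, 4.0, 6.0, 8.0, 10.0]   # hypothetical values
N, n = len(population), 2
S2 = variance(population)                 # population variance with divisor N - 1

formula = variance_of_mean(S2, n, N)

# Brute-force check: variance of the sample mean over all possible samples.
means = [mean(s) for s in combinations(population, n)]
centre = mean(means)
brute = mean([(m - centre) ** 2 for m in means])
print(formula, brute)
```

The two printed values agree, which is what the formula predicts for sampling without replacement.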
Stratified sampling
Stratification is the process of grouping members of
the population into relatively homogeneous
subgroups before sampling.
The strata should be mutually exclusive and
exhaustive.
It produces a weighted mean that has less variability
than the arithmetic mean of a simple random sample
of the population.
Mean of stratified sample
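The estimate of the mean used in stratified sampling is
not the same as the sample mean:
$$\bar{y}_{st} = \sum_{h=1}^{L} W_h \bar{y}_h$$
where $W_h = N_h/N$ is the weight of stratum h and $\bar{y}_h$ is its sample mean.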
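The stratified estimate weights each stratum's sample mean by the stratum's share of the population, W_h = N_h/N. A minimal sketch, with made-up stratum sizes and sample values, shows how it can differ from the plain sample mean when strata are sampled at different rates.

```python
from statistics import mean

def stratified_mean(strata_sizes, strata_samples):
    """Weighted mean: sum over strata of W_h * ybar_h, with W_h = N_h / N."""
    N = sum(strata_sizes.values())
    return sum((strata_sizes[h] / N) * mean(strata_samples[h]) for h in strata_sizes)

# Hypothetical strata: population sizes N_h and sampled values per stratum.
sizes = {"urban": 300, "rural": 700}
samples = {"urban": [10.0, 12.0, 14.0], "rural": [4.0, 6.0]}

weighted = stratified_mean(sizes, samples)
unweighted = mean(samples["urban"] + samples["rural"])
print(weighted, unweighted)
```

Here the urban stratum is over-represented in the sample, so the unweighted mean is pulled toward the urban values while the stratified estimate is not.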
Strategies
Proportionate allocation uses the same sampling fraction
in each stratum as in the total population, so the sample
size from each stratum is proportional to its size.
Optimum allocation (or disproportionate
allocation): the sample size from each stratum is
proportional to the standard deviation of the
variable in the stratum and to the size of
the stratum, and inversely proportional to the
square root of the cost of sampling per unit in the
stratum. Larger samples are taken in the strata
with greater variability, greater size and lower
cost of sampling, to generate the least possible
sampling variance.
Proportionate allocation
$$\frac{n_h}{n} = \frac{N_h}{N}$$
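Under proportionate allocation each stratum receives the share n_h = n N_h / N of the total sample. The sketch below returns unrounded values for three hypothetical strata; in practice the results would be rounded to whole units.

```python
def proportionate_allocation(stratum_sizes, n):
    """n_h = n * N_h / N (unrounded; round to integers in practice)."""
    N = sum(stratum_sizes.values())
    return {h: n * N_h / N for h, N_h in stratum_sizes.items()}

sizes = {"A": 5000, "B": 3000, "C": 2000}   # hypothetical stratum sizes N_h
prop_alloc = proportionate_allocation(sizes, 200)
print(prop_alloc)
```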
Optimum allocation
$$\frac{n_h}{n} = \frac{N_h S_h / \sqrt{c_h}}{\sum_h \left(N_h S_h / \sqrt{c_h}\right)}$$
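Optimum allocation gives each stratum a share of the sample proportional to N_h S_h / sqrt(c_h). The sketch below applies this to three hypothetical strata; the sizes, standard deviations and unit costs are invented for illustration, and the results are left unrounded.

```python
from math import sqrt

def optimum_allocation(N_h, S_h, c_h, n):
    """n_h = n * (N_h S_h / sqrt(c_h)) / sum_h (N_h S_h / sqrt(c_h)), unrounded."""
    weights = {h: N_h[h] * S_h[h] / sqrt(c_h[h]) for h in N_h}
    total = sum(weights.values())
    return {h: n * w / total for h, w in weights.items()}

N_h = {"A": 5000, "B": 3000, "C": 2000}   # hypothetical stratum sizes
S_h = {"A": 4.0, "B": 10.0, "C": 20.0}    # hypothetical standard deviations
c_h = {"A": 1.0, "B": 4.0, "C": 4.0}      # hypothetical cost per sampled unit

opt_alloc = optimum_allocation(N_h, S_h, c_h, 200)
print(opt_alloc)
```

In this example stratum C is the smallest but the most variable, so it receives a larger share of the sample than its population share of 20 percent.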
Advantages of stratified random sampling
Focuses on important subpopulations but
ignores irrelevant ones
Improves the accuracy of estimation
Estimation with a desired precision from
selected strata
Sampling equal numbers from strata varying
widely in size may be used to equate the
statistical power of tests of differences
between strata.
Disadvantages
Can be difficult to select relevant
stratification variables.
Not useful when there are no
homogeneous subgroups.
Is sometimes more expensive.
Requires accurate information about the
population, or introduces bias.
Samples randomly only within the specified
subgroups.
A few sampling tips
1. Determine the size of the smallest subgroup in your
population. For example, if you want to look at males vs.
females and there are fewer females, then this is the
group you want to look at.
2. Calculate the number of people required to achieve your
desired error level and level of confidence for that subgroup.
3. Calculate what percentage of people you will need to
survey within that subgroup (number of people to survey
divided by total subgroup size).
4. Finally, calculate the number of people in each of the other
subgroups that are needed to achieve this same ratio (multiply
the percentage from step 3 by the size of each of the other
subgroups). This is how many people you will need to survey
within each group.
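The tips above can be sketched in Python. The slide does not say how the required number for the smallest subgroup is obtained; the code assumes the common sample-size formula for a proportion with a finite population correction, and the subgroup sizes, margin of error and confidence multiplier (z = 1.96) are hypothetical.

```python
from math import ceil

def required_sample_size(N, margin=0.05, z=1.96, p=0.5):
    """Sample size for estimating a proportion in a subgroup of size N (assumed formula)."""
    n0 = z ** 2 * p * (1 - p) / margin ** 2      # size for a very large population
    return ceil(n0 / (1 + (n0 - 1) / N))         # finite population correction

def allocate_by_smallest_subgroup(subgroup_sizes, margin=0.05, z=1.96):
    """Size the smallest subgroup first, then apply its sampling ratio to every subgroup."""
    smallest = min(subgroup_sizes, key=subgroup_sizes.get)
    N_small = subgroup_sizes[smallest]
    n_small = required_sample_size(N_small, margin, z)
    # Apply the ratio n_small / N_small to each subgroup, rounding up with
    # integer arithmetic to avoid floating-point rounding surprises.
    return {g: (n_small * size + N_small - 1) // N_small
            for g, size in subgroup_sizes.items()}

allocation = allocate_by_smallest_subgroup({"female": 400, "male": 600})
print(allocation)
```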
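For a stratified sample
$$V_{opt} \le V_{prop} \le V_{ran}$$
where the subscripts denote optimum allocation, proportionate allocation and simple random sampling.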
Systematic sampling
Systematic sampling is a modification of random
sampling. To arrive at a systematic sample we
simply calculate the desired sampling interval, e.g.
if there are 100 patients with a particular infection in
whom we are interested and our budget allows us
to sample, say, 20 of them, then we divide 100 by 20
and get a sampling interval of 5 (a sampling fraction
of 1 in 5). Thereafter we go through our sampling
frame selecting every 5th patient.
Example: From the first five select one randomly.
If that happens to be 3, your sample units are 3, 8,
13, 18, …
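The patient example can be sketched as follows. The function name is illustrative, the frame is assumed to be a simple list numbered 1 to 100, and the frame size is assumed to be a whole multiple of the sample size.

```python
import random

def systematic_sample(frame, n, start=None, seed=None):
    """Select every k-th unit, k = len(frame) // n, from a random start among the first k."""
    k = len(frame) // n                              # sampling interval
    if start is None:
        start = random.Random(seed).randint(1, k)    # random start, counted from 1
    return [frame[i] for i in range(start - 1, len(frame), k)][:n]

patients = list(range(1, 101))                       # hypothetical frame: patients 1..100
chosen = systematic_sample(patients, 20, start=3)
print(chosen)
```

With a start of 3 the selection begins 3, 8, 13, 18 and continues in steps of 5 through the frame.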
Cluster sampling
Involves dividing the entire population into
clusters, each representing the universe
Selection of one or more clusters randomly or
otherwise
Commonly used by treating contiguous
geographical areas as clusters
Reduces cost and time
Design effect (typically about 2-4)
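A minimal sketch of one-stage cluster sampling: whole clusters are drawn at random and every unit in them is taken. It also shows one common use of the design effect, taken here as the ratio of the variance under the cluster design to the variance under simple random sampling of the same size, so that dividing the sample size by it gives an effective sample size. The villages, household labels and the design effect of 2 are assumptions for illustration.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """Randomly select m whole clusters and return (chosen names, all their units)."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)
    units = [u for name in chosen for u in clusters[name]]
    return chosen, units

def effective_sample_size(n, deff):
    """Size of a simple random sample giving the same precision as n clustered units."""
    return n / deff

# Hypothetical clusters: 10 villages with 30 households each.
villages = {f"village-{v}": [f"v{v}-hh{h}" for h in range(1, 31)] for v in range(1, 11)}
chosen, households = cluster_sample(villages, 3, seed=7)
print(chosen, len(households), effective_sample_size(len(households), deff=2.0))
```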
Quota sampling
Is a form of non-probability sampling
In this method a definite quota is fixed for
different sub-groups in the population which
reflects the proportion of the sub-group in the
entire population
Requires a good understanding of population
composition
Is comparable to stratified random sampling
with proportionate representation
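A sketch of how quotas might be set and filled: quotas are fixed in proportion to assumed population shares, and respondents are accepted in the order they are encountered until each quota is full. The shares, the target of 10 interviews and the stream of respondents are all hypothetical; note that nothing in the procedure is random.

```python
def fill_quotas(respondents, quotas):
    """Accept respondents in arrival order until each subgroup's quota is full."""
    counts = {group: 0 for group in quotas}
    accepted = []
    for person, group in respondents:
        if group in quotas and counts[group] < quotas[group]:
            accepted.append(person)
            counts[group] += 1
    return accepted, counts

# Hypothetical population shares and a target of 10 interviews.
shares = {"male": 0.5, "female": 0.5}
quotas = {g: round(10 * s) for g, s in shares.items()}

# Hypothetical stream of 30 people met in the field: every third one is female.
stream = [(f"r{i}", "male" if i % 3 else "female") for i in range(1, 31)]
accepted, counts = fill_quotas(stream, quotas)
print(quotas, counts)
```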
Advantages of quota sampling
1. Quota sampling is less costly. A quota interview on
average costs only half or a third as much as a random
interview, but we must remember that precision is lost.
2. It is easy administratively. The labour of random
selection is avoided, and so are the headaches of
non-contact and callbacks.
3. If fieldwork has to be done quickly, perhaps to reduce
memory errors, quota sampling may be the only
possibility, e.g. to obtain immediate public reaction to
some event.
4. Quota sampling is independent of the existence of
sampling frames.
Disadvantages of quota sampling
1. It is not possible to estimate sampling errors with quota
sampling because of the absence of randomness.
Some people argue that sampling errors are so small compared
with all the other errors and biases that enter into a survey that
not being able to estimate is no great disadvantage. One does
not have the security, though, of being able to measure and
control these errors.
2. The interviewer may fail to secure a representative sample of
respondents in quota sampling. For example, are those in the
over 65 age group spread over all the age range or clustered
around 65 and 66?
3. Social class controls leave a lot to the interviewer's judgment.
4. Strict control of fieldwork is more difficult, e.g. interviewers
may place respondents in groups where cases are needed rather
than in those to which they belong.
Multistage sampling
The population is regarded as being composed of a number of
first stage or primary sampling units (PSU's), each of them
being made up of a number of second stage units. A sample of
PSU's is selected first, then a sample of second stage units is
selected in each selected PSU, and so the procedure continues
down to the final sampling unit, with the sampling ideally
being random at each stage.
The necessity of multistage sampling is easily established.
PSU's for national surveys are often administrative districts,
urban districts or parliamentary constituencies. Within the
selected PSU one may go direct to the final sampling units,
such as individuals, households or addresses, in which case
we have a two-stage sample. It would be more usual to
introduce intermediate sampling stages, i.e. administrative
districts are sub-divided into wards, then polling districts.
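A two-stage sample of the kind described can be sketched as follows: districts are drawn at random as PSU's, and households are then drawn at random within each selected district. The frame, the numbers selected at each stage and the function name are hypothetical.

```python
import random

def two_stage_sample(psus, n_psu, n_units, seed=None):
    """Stage 1: sample PSUs at random. Stage 2: sample final units within each selected PSU."""
    rng = random.Random(seed)
    selected_psus = rng.sample(sorted(psus), n_psu)
    return {psu: rng.sample(psus[psu], min(n_units, len(psus[psu])))
            for psu in selected_psus}

# Hypothetical frame: 8 districts, each listing 50 households.
districts = {f"district-{d}": [f"d{d}-hh{h}" for h in range(1, 51)] for d in range(1, 9)}
two_stage = two_stage_sample(districts, n_psu=3, n_units=5, seed=29)
for psu, households in two_stage.items():
    print(psu, households)
```

Adding intermediate stages (wards, then polling districts) would repeat the same pattern, with a further random draw inside each unit selected at the previous stage.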
Special situations warranting
non-probability sampling
Quick evaluation studies (Rural development
with proportions fixed for certain categories)
Studies of hidden populations (MSM by
respondent driven sampling)
Situation analysis (study of
providers/clinics/PHCs/CHCs/VCTCs)
Theoretical sampling
Non-sampling errors
Failure to measure some of the units due to failure
to locate some individuals
Errors of measurement on a unit
Non-response bias
Careless data collection
Errors of classification
Data entry errors
Inappropriate analysis
Misrepresentation of facts
All other types of errors (telescoping, reference
period, etc.)
Thank you!