23-02-2024
Statistics:
Determination of
Sample Size
Dr. Nilesh Fichadiya
M.D.(Community Medicine), D.P.H.
Calculation of Sample Size
• What should be the sample size to get correct results?
• Too small a sample: the study is not valid
• Too large a sample: laborious, costly and time-consuming
• We need an optimum sample size, which gives reliable
results
Sample and population
• If a sufficiently large and unbiased sample is taken
from a defined population, the inference drawn from the
sample can be applied to the whole population
Sample and population
• If the sample is representative of the population, the sample
statistics (mean x̄, standard deviation s and proportion p) will
NOT differ significantly from the population parameters
µ, σ and P
Characteristics of Representative
Sample
1. Precision: Sample Size
2. Unbiased character: Technique of sample
selection
Precision
• Also known as reproducibility or reliability
• Defined as the ability of an instrument/test/sample to
provide the same or a very similar result with
repeated measurements of the same factor
• In the case of a sample, it depends on the sample size
Precision
Precision = $\sqrt{n} / s$
where n = sample size and
s = standard deviation (SD)
Sample size depends on:
1. Study Objective and Study design:
To estimate a value : Descriptive study
To test hypothesis : Analytical study or experimental
study
Whether we have one or two sets of data
Whether the data is paired or unpaired
Sample Size and Study Design
• In an analytical study, there is ALWAYS a comparison,
so we have cases and controls (or a comparison group)
Here we need to multiply the sample size by (r+1)/r, where
r = ratio of controls to cases
• In an interventional study:
We have an intervention group and a control group
We have to double the sample size
Sample Size and Study Design
• In before-after study,
We have one group
We need not double the sample size
• In cluster sampling,
We have to multiply the sample size by the design effect
(discussed in previous topic)
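The design adjustments above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the example numbers are hypothetical, and the multipliers are applied literally as the slides describe them, to whatever base sample size the chosen formula gives.

```python
import math

def adjust_for_allocation_ratio(base_n: float, r: float) -> int:
    """Multiply a base sample size by (r + 1) / r, where r is the
    ratio of controls to cases (assumed usage, as stated on the slide)."""
    return math.ceil(base_n * (r + 1) / r)

def adjust_for_design_effect(base_n: float, design_effect: float) -> int:
    """Multiply a base sample size by the design effect (cluster sampling)."""
    return math.ceil(base_n * design_effect)

# Hypothetical base size of 100:
print(adjust_for_allocation_ratio(100, r=1))  # one control per case
print(adjust_for_allocation_ratio(100, r=2))  # two controls per case
print(adjust_for_design_effect(216, design_effect=2.0))
```

With r = 1 the multiplier is 2, which matches the doubling rule given for two-group studies.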
Sample size depends on:
2. The degree of type I error (denoted as α)
The acceptable probability of a type I error (α) is the
level of significance; the p value obtained from the data is compared against it
1 - α is the confidence level (taken as 95% or 99%)
3. The degree of type II error (denoted as β)
(1 – β) is the “power” of the study (taken as 80% or 90%)
α and β errors
• α error (type I error) is the probability that an observed
difference has occurred by chance and is not real, but we
conclude that the difference is real
• So, 1 - α (the confidence level) is the probability that the
difference is NOT occurring by chance
• Prior to starting a study, we set an acceptable value
for α, which can be 0.05 or 0.01, so that a result is significant at p<0.05 or p<0.01
(accordingly the confidence level is 95% or 99%)
α and β errors
• β error (type II error) is the probability that the study will miss a
true difference,
i.e. there is a real difference but we fail to conclude
that
• So, (1 – β) or the “power” of the study is the
probability that if there is a difference we will
detect it correctly
α and β errors
• As a rule:
• For estimating a value, only the α error is considered
• For testing a hypothesis, both α and β errors are
considered
• If the β error is considered in the formula for sample
size, we get a larger sample size
Sample size depends on:
4. Standard deviation (SD, s) or variance (s²) in the
population
5. The proportion of people experiencing (p) and
people not experiencing (1-p or q) the attribute
Sample size depends on:
6. The level of acceptable or allowable error (E)
Can be taken as 5% or 10% or 20%
7. The degree of difference or effect size expected
between the two sets of data (d)
We can estimate the effect size based on previously
reported or preclinical studies
If the effect size is large, the required sample size is small
If the effect size is small, the required sample size is large
Sample size is larger if:
• Confidence level desired is large
• Power is large
• Variance (or SD) is large
• Proportion of people with attribute is small
• Margin of error (allowable error) is small
Various Formulae for Calculation of
Sample Size:
• To estimate the value by descriptive cross-sectional
study
Quantitative variable (mean):
$N = \dfrac{Z_{1-\alpha/2}^{2}\, s^{2}}{E^{2}}$
where,
$Z_{1-\alpha/2}$ = value of the standard normal deviate for the chosen value of α,
s = standard deviation,
E = allowable error
[Table: value of $Z_{1-\alpha/2}$ and $Z_{1-\alpha}$ for two-tailed and one-tailed studies]
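The critical values that such a table lists can be reproduced from the standard normal distribution. The sketch below assumes Python's standard-library `statistics.NormalDist`; the function names are illustrative and do not come from the slides.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def z_two_tailed(alpha: float) -> float:
    """Z_{1-alpha/2}: critical value for a two-tailed study."""
    return STANDARD_NORMAL.inv_cdf(1 - alpha / 2)

def z_one_tailed(alpha: float) -> float:
    """Z_{1-alpha}: critical value for a one-tailed study."""
    return STANDARD_NORMAL.inv_cdf(1 - alpha)

for alpha in (0.05, 0.01):
    print(f"alpha={alpha}: two-tailed Z={z_two_tailed(alpha):.3f}, "
          f"one-tailed Z={z_one_tailed(alpha):.3f}")
```

At α = 0.05 the two-tailed value is about 1.96, the figure used in the worked exercises; at α = 0.01 it is about 2.576, which the slides round to 2.57.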
Exercise
• The mean haemoglobin level of girl students in the
colleges is estimated to be 11.5 gm% with an SD of 1.5
gm%. Calculate the sample size for a study of
haemoglobin estimation in physiotherapy colleges
of the Saurashtra region with an allowable error of 0.2 gm% at the
5% significance level
• $N = \dfrac{Z_{1-\alpha/2}^{2}\, s^{2}}{E^{2}}$
• $N = \dfrac{(1.96)^{2} (1.5)^{2}}{(0.2)^{2}} = \dfrac{3.84 \times 2.25}{0.04} = 216$
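The arithmetic of this exercise can be checked with a short Python sketch. The function name is hypothetical; it implements the mean-estimation formula N = Z²s²/E² with the values given in the exercise.

```python
import math

def sample_size_for_mean(z: float, s: float, e: float) -> float:
    """Unrounded sample size for estimating a mean: N = Z^2 * s^2 / E^2."""
    return (z ** 2) * (s ** 2) / (e ** 2)

# Values from the haemoglobin exercise: Z = 1.96, SD = 1.5, E = 0.2
n = sample_size_for_mean(z=1.96, s=1.5, e=0.2)
print(round(n, 2))   # unrounded result
print(math.ceil(n))  # rounded up to whole subjects
```

Without rounding 1.96² to 3.84 the result is about 216.09, so rounding up to whole subjects gives 217 rather than the 216 obtained by hand; the difference is only a rounding effect.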
Sample size calculation: For
Quantitative data:
• Exercise:
• Mean pulse rate of a population is believed to be
72/minute with Standard Deviation of 8. Calculate
minimum sample size to verify this if allowable
error is 1 at 5% significance level.
Various Formulae for Calculation of
Sample Size:
• To estimate the value by descriptive cross-
sectional study
Qualitative variable (proportion):
$N = \dfrac{Z_{1-\alpha/2}^{2}\, p q}{E^{2}}$
where,
p = positive character,
q = negative character = 1 – p, or
q = 100 – p in percentage, as p + q = 100%,
E = allowable error of p, usually 10% or 20% of p
Example
The incidence rate in the last SARS CoV epidemic was
found to be 50/1000 (5%) of the population exposed.
What should be the sample size to find the
incidence rate of SARS CoV in the current epidemic if the
allowable error is 10% or 20% of p?
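p = 5%,
q = 100 - p = 100 - 5 = 95%
E = 0.5 (at 10% of p)
E = 1 (at 20% of p)
Taking Z ≈ 2 at the 5% significance level, so that Z² ≈ 4:
• Sample size calculation at 10% allowable error
$n = \dfrac{4pq}{E^{2}} = \dfrac{4 \times 5 \times 95}{0.5 \times 0.5} = 7600$
• Sample size calculation at 20% allowable error
$n = \dfrac{4pq}{E^{2}} = \dfrac{4 \times 5 \times 95}{1 \times 1} = 1900$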
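The same calculation can be sketched in Python. The function name and its parameters are hypothetical; p is given in percent and the allowable error is expressed as a fraction of p, as in the example above.

```python
def sample_size_for_proportion(z: float, p_percent: float,
                               relative_error: float) -> float:
    """N = Z^2 * p * q / E^2 with p and q in percent and
    E = relative_error * p (allowable error as a fraction of p)."""
    q_percent = 100 - p_percent
    e = relative_error * p_percent
    return (z ** 2) * p_percent * q_percent / (e ** 2)

# SARS CoV example: p = 5%, Z taken as 2 (so Z^2 = 4)
n_10 = sample_size_for_proportion(z=2, p_percent=5, relative_error=0.10)
n_20 = sample_size_for_proportion(z=2, p_percent=5, relative_error=0.20)
print(n_10, n_20)

# Same example with the more exact Z = 1.96
n_10_exact = sample_size_for_proportion(z=1.96, p_percent=5, relative_error=0.10)
print(round(n_10_exact, 2))
```

Halving the allowable error quadruples the sample size, because E enters the formula squared. Using Z = 1.96 instead of 2 gives a slightly smaller figure (about 7299 instead of 7600 at 10% allowable error).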
Sample size calculation: For Qualitative
data:
• Exercise:
The prevalence rate of musculoskeletal disorders was
found to be 40% in earlier studies.
Calculate the sample size required to find the
prevalence rate of musculoskeletal disorders in your
area if the allowable error is 10% or 20%.
Various Formulae for Calculation of
Sample Size:
• To establish association by case control study or
independent sample cohort study:
Quantitative Data (sample size for each group):
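$N = \dfrac{2\,(Z_{1-\alpha/2} + Z_{1-\beta})^{2}\,(s_{1} \times s_{2})}{d^{2}}$
where s₁ and s₂ are the standard deviations of the two groups and d is the difference to be detected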
[Table: value of $Z_{1-\beta}$ for different power of studies]
[Table: value of $Z_{1-\alpha/2}$ and $Z_{1-\alpha}$ for two-tailed and one-tailed studies]
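The Z value for a given power can also be obtained from the standard normal distribution. A minimal sketch, assuming Python's standard-library `statistics.NormalDist`; the function name is illustrative.

```python
from statistics import NormalDist

def z_for_power(power: float) -> float:
    """Z_{1-beta}: standard normal deviate for a power of (1 - beta)."""
    return NormalDist().inv_cdf(power)

for power in (0.80, 0.90):
    print(f"power={power:.0%}: Z={z_for_power(power):.2f}")
```

For 80% and 90% power this gives about 0.84 and 1.28, the values used in the worked examples.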
Example:
• An investigator wants to conduct a study to find out
whether there is any difference in the effect of pollution on
lung function by studying forced expiratory volume (FEV)
between traffic police and the general population.
• From a previous study it is known that the SDs (s) of FEV are
3.5 l/min and 5 l/min among traffic police and the general
population respectively.
• How many subjects from the traffic police and the general
population are required for testing the null hypothesis that
there is no difference in FEV between traffic police and the
general population?
• The investigator wishes to be 90% confident of detecting (power = 90%) a
difference of 2.5 l/min or more in either direction at the 5%
level of significance.
$N = \dfrac{2\,(Z_{1-\alpha/2} + Z_{1-\beta})^{2}\,(s_{1} \times s_{2})}{d^{2}}$
$= \dfrac{2\,(1.96 + 1.28)^{2}\,(3.5 \times 5)}{(2.5)^{2}}$
$= \dfrac{2\,(10.49 \times 17.5)}{6.25}$
= 58.74 ≈ 59 from each population
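This worked example can be reproduced with a short Python sketch. The function name is hypothetical; it follows the formula exactly as used on the slide, including the product of the two standard deviations.

```python
import math

def sample_size_two_means(z_alpha: float, z_beta: float,
                          s1: float, s2: float, d: float) -> float:
    """Per-group size as used on the slide:
    N = 2 * (Z_alpha + Z_beta)^2 * (s1 * s2) / d^2."""
    return 2 * (z_alpha + z_beta) ** 2 * (s1 * s2) / d ** 2

# FEV example: 5% two-tailed significance (1.96), 90% power (1.28)
n = sample_size_two_means(z_alpha=1.96, z_beta=1.28, s1=3.5, s2=5, d=2.5)
print(round(n, 2))
print(math.ceil(n))  # subjects needed in each group
```

The unrounded value is about 58.79, which rounds up to the same 59 subjects per group.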
Various Formulae for Calculation of
Sample Size:
• To establish association by case control study or
independent sample cohort study:
Qualitative Data (sample size for each group):
$N = \dfrac{(Z_{1-\alpha/2} + Z_{1-\beta})^{2}\,(p_{1}q_{1} + p_{2}q_{2})}{d^{2}}$
Example
• An investigator is interested in studying whether lower
back pain among bank officers in Ahmedabad is more common than,
or as common as, in the rest of the population.
• Previous studies reported that the prevalence of Lower
back pain among Bank officers in Ahmedabad is 55%
whereas it is 45% among general population.
• The investigator wishes to be 80% confident of
detecting a difference of 10% or more in either
direction at 1% level of significance.
• How many Bank Officers and members of General
population should be included in this study ?
$N = \dfrac{(Z_{1-\alpha/2} + Z_{1-\beta})^{2}\,(p_{1}q_{1} + p_{2}q_{2})}{d^{2}}$
$= \dfrac{(2.57 + 0.84)^{2}\,[(0.45 \times 0.55) + (0.55 \times 0.45)]}{(0.1)^{2}}$
$\approx \dfrac{(3.41)^{2}\,(0.25 + 0.25)}{0.01} \approx 581$ in each group
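A small Python sketch of this calculation follows. The function name is hypothetical; the Z values are the ones used on the slide (2.57 for 1% two-tailed significance, 0.84 for 80% power).

```python
import math

def sample_size_two_proportions(z_alpha: float, z_beta: float,
                                p1: float, p2: float, d: float) -> float:
    """Per-group size: N = (Z_alpha + Z_beta)^2 * (p1*q1 + p2*q2) / d^2,
    with proportions written as fractions and q = 1 - p."""
    q1, q2 = 1 - p1, 1 - p2
    return (z_alpha + z_beta) ** 2 * (p1 * q1 + p2 * q2) / d ** 2

# Lower back pain example: p1 = 0.45, p2 = 0.55, d = 0.10
n = sample_size_two_proportions(z_alpha=2.57, z_beta=0.84,
                                p1=0.45, p2=0.55, d=0.10)
print(round(n, 1))
print(math.ceil(n))
```

With the unrounded products (0.45 × 0.55 = 0.2475) the result is about 576 per group; the hand calculation reaches 581 because each product is rounded up to 0.25.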
Various Formulae for Calculation of
Sample Size:
• To identify significant difference between two
groups by intervention study
Quantitative Data (sample size for each group):
$N = \dfrac{2\,(Z_{1-\alpha/2} + Z_{1-\beta})^{2}\, s^{2}}{d^{2}}$
For unpaired data:
• Take the pooled ‘s’
• The calculated sample size is for each group
Calculate Pooled s or p
• Pooled 𝑠 =
• Pooled 𝑝 =
Various Formulae for Calculation of
Sample Size:
• To identify significant difference between two
groups by intervention study
Qualitative Data (sample size for each group):
$N = \dfrac{2\,(Z_{1-\alpha/2} + Z_{1-\beta})^{2}\, p q}{d^{2}}$
For unpaired data:
• Take the pooled ‘p’ and ‘q’
• The calculated sample size is for each group
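The two intervention-study formulae can be sketched in Python as follows. The function names and all input numbers are hypothetical, chosen only for illustration; the pooled SD or pooled proportion is assumed to have been worked out beforehand and is passed in directly.

```python
import math

def intervention_size_quantitative(z_alpha: float, z_beta: float,
                                   pooled_s: float, d: float) -> float:
    """Per-group size: N = 2 * (Z_alpha + Z_beta)^2 * s^2 / d^2."""
    return 2 * (z_alpha + z_beta) ** 2 * pooled_s ** 2 / d ** 2

def intervention_size_qualitative(z_alpha: float, z_beta: float,
                                  pooled_p: float, d: float) -> float:
    """Per-group size: N = 2 * (Z_alpha + Z_beta)^2 * p * q / d^2,
    with p as a fraction and q = 1 - p."""
    pooled_q = 1 - pooled_p
    return 2 * (z_alpha + z_beta) ** 2 * pooled_p * pooled_q / d ** 2

# Hypothetical inputs: 5% two-tailed significance (1.96), 80% power (0.84)
n_quant = intervention_size_quantitative(1.96, 0.84, pooled_s=10, d=5)
n_qual = intervention_size_qualitative(1.96, 0.84, pooled_p=0.4, d=0.1)
print(math.ceil(n_quant), math.ceil(n_qual))  # per group
```

Both results are per-group sizes, so the total number of subjects in a two-arm study is twice the printed value.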
Bias in Sampling
• Bias: A result that differs from the true values
• Examples of Bias
• Types:
1. Selection Bias
2. Information Bias
3. Measurement Bias
4. Bias due to confounding
Bias in Sampling
• Selection Bias
Selection of subjects
Non-response
Loss to follow up
• Information Bias
Quality and extent of information obtained from
different subjects
Recall bias
Bias in Sampling
• Measurement Bias
Misclassification or mis-diagnosis of subjects
Unequal diagnostic work-up in different groups
Measurement error
• Bias due to confounding
A factor that is associated with both the risk factor and the
disease outcome