Biostatistics
Dr. Md Jahedul Islam
Chairman & Associate Professor
Dept. of Public Health
Lecture 3-4 Premier University, Chittagong (PUC)
Definitions and Concepts
Population:
The set of all study elements (patients, organisms, students, cases of a disease, etc.) that share some
common characteristic is called a population.
‘N’ denotes the size of population.
Example: If we want to study the average age of TB patients, then all the TB patients under study form
the population.
Sample:
A representative small part of the population is known as a sample.
In many situations it is impossible or impractical to study the whole population. In such cases
only a small part of the population is studied in order to draw conclusions about the whole population. Such a part
of the population is known as a sample.
Sample size is denoted by ‘n’.
Sample Unit: Each member of population.
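The relationship between N, n, and the sampling units can be illustrated with a small sketch. The population values below are hypothetical (they are not from the lecture), and simple random sampling without replacement is assumed as the selection method.

```python
import random

# Hypothetical population: ages of N = 10 TB patients (illustrative values only).
population = [23, 35, 41, 29, 52, 47, 38, 61, 33, 45]
N = len(population)

# Draw a simple random sample of n = 4 sampling units without replacement.
rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 4
sample = rng.sample(population, n)

# The population mean is fixed; the sample mean depends on which units are drawn.
population_mean = sum(population) / N
sample_mean = sum(sample) / n
print(f"N = {N}, population mean = {population_mean:.1f}")
print(f"n = {n}, sample = {sample}, sample mean = {sample_mean:.1f}")
```

The population mean stays the same on every run, while the sample mean changes with the seed.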
Constant:
Quantities that do not vary.
In Biostatistics, a summary value of a variable that describes a population, such as the mean, standard deviation,
standard error, correlation coefficient, or proportion of a particular population, is considered a constant.
Example: mean height, birth rate, yield etc.
Variable:
The measurements of individuals or elements of a population on a certain characteristic may vary from one to another, either
in magnitude or in quality. These measurable characteristics are called variables.
Example: Height, weight, age, marks, family size, gender, etc. are some examples of variables.
Types of variable:
Variables can be classified in different ways.
A variable may be either quality based (qualitative) or quantity based (quantitative).
Qualitative Variable
A qualitative variable is one for which numerical measurement is not possible. Quality based observations can neither be meaningfully
ordered nor physically measured.
Example: hair color (brown, black, white etc.), religion (Muslim, Hindu, etc.), gender (male, female)
Quantitative Variable
A quantitative variable is one which is expressed through numerical measurement. In other words, a quantity based variable is one for
which the resulting observations are numeric.
Example: height, weight, yield, death rate, no. of incidence etc.
Quantity based variables can be classified into two groups based on the numeric values they can take.
Discrete variable:
A variable that can assume only fixed (countable) values is called a discrete variable.
Example: No. of children in a family, no. of road accidents in a year, no. of phone calls in a phone booth etc.
Continuous variable:
A variable is said to be a continuous variable if it can take any value within a given range or ranges.
Example: height of a person (any value between 5.6 and 5.8 feet), distance (any value between 7 and 9 km) etc.
Variables can also be classified by their number of categories:
Dichotomous variable:
If there are only two categories in a variable…
Example: gender (male & female), selection response (yes & no), (good & bad), (right & wrong) etc.
Polytomous variable:
If there are more than two categories in a variable.
Example: Color of rose (red, pink, black etc.)
Observation:
An event and its measurements
Example: height of plant (event) in meter (measurements).
Data:
A set of numerical facts or figures gathered from a statistical investigation is called data. In a statistical analysis
the first task is to collect data, the raw material of statistics.
Sources and Types of data:
There are mainly two sources of data:
1. Primary and
2. Secondary.
I. Primary data: Data are said to be primary if they are obtained from an investigation conducted for the first time.
Thus, data collected for the first time by the investigator as original are known as primary data.
II. Secondary data: Data that are taken from a prior investigation and used in a new statistical analysis are called
secondary data.
Other Types of Data
1. Categorical: (e.g., Gender, Marital Status, income category)
2. Continuous: (e.g., Age, weight, height, time to achieve an outcome)
3. Discrete: (e.g., Number of Children in a family)
4. Binary or Dichotomous: (e.g., response to all Yes or No type of questions)
Before presentation, the data need to be classified and then condensed. So the steps are:
1. Classification of Data
2. Presentation of Data
Classification:
Classification is the process of arranging data (values) of a variable in groups or classes according to ‘their
affinities’ or ‘of our interest’.
Classification is the first step towards further processing of a heterogeneous mass of data into a number of
homogeneous groups and subgroups by their respective characteristics.
Purpose of Classification:
Classification is necessary to serve the following purposes:
To eliminate unnecessary details.
To bring out similarity and dissimilarity.
To enable one to form a mental picture of the object.
To enable one to make comparisons.
To pinpoint the most significant features of the data at a glance.
To enable a statistical treatment of the data.
We should know:
Frequency distribution
Class frequency
Class Interval
Class limit
Class midpoint
Cumulative frequency
Relative Frequency
Cumulative Relative Frequency
Exercise: Frequency Table
A research study has been conducted examining the number of children in the families living in a community. The following data has
been collected based on a random sample of n = 40 families from the community.
12, 21, 15, 13, 10, 11, 13, 12, 13, 14, 11, 13, 14, 5, 7, 3, 2, 4, 1, 3, 5, 18, 6, 9, 4, 2, 4, 4, 17, 16, 19, 20, 8, 9, 7, 8, 6, 5, 7, 4
Organize this data in a Frequency Table.
11 - 15
16 - 20
21 - 25
Frequency Table
A research study has been conducted examining the number of adults in the families living in a
community. The following data has been collected based on a random sample of n = 30 families from
the community.
2, 2, 5, 3, 0, 1, 3, 2, 3, 4, 1, 3, 4, 5, 7, 3, 2, 4, 1, 0, 5, 8, 6, 5, 4, 2, 4, 4, 7, 6
Organize this data in a Frequency Table!
X = No. of Adults   Count (Frequency)   Relative Freq.   Cumulative Frequency   Cumulative Relative Freq.
0                   2                   2/30 = 0.067
1                   3                   3/30 = 0.100
2                   5                   5/30 = 0.167
3                   5                   5/30 = 0.167
4                   6                   6/30 = 0.200
5                   4                   4/30 = 0.133
6                   2                   2/30 = 0.067
7                   2                   2/30 = 0.067
8                   1                   1/30 = 0.033
Total               30
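The remaining columns of the table (cumulative frequency and cumulative relative frequency) follow mechanically from the counts. A minimal sketch in Python, using the n = 30 observations listed above; the function name `frequency_table` is an illustrative choice, not part of the lecture.

```python
from collections import Counter

# Number of adults per family, n = 30 (data from the exercise above).
adults = [2, 2, 5, 3, 0, 1, 3, 2, 3, 4, 1, 3, 4, 5, 7,
          3, 2, 4, 1, 0, 5, 8, 6, 5, 4, 2, 4, 4, 7, 6]

def frequency_table(values):
    """Return rows of (value, freq, rel_freq, cum_freq, cum_rel_freq)."""
    n = len(values)
    counts = Counter(values)
    rows, cum = [], 0
    for x in sorted(counts):
        cum += counts[x]  # running total gives the cumulative frequency
        rows.append((x, counts[x], counts[x] / n, cum, cum / n))
    return rows

table = frequency_table(adults)
print(f"{'X':>3} {'Freq':>5} {'Rel':>6} {'Cum':>4} {'CumRel':>7}")
for x, f, rf, cf, crf in table:
    print(f"{x:>3} {f:>5} {rf:>6.3f} {cf:>4} {crf:>7.3f}")
```

The last cumulative frequency must equal n and the last cumulative relative frequency must equal 1, which is a quick check on any hand-built table.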
Frequency Table
Now, construct a similar frequency table for the age of patients with heart-related problems in a clinic.
The following data has been collected based on a random sample of n = 30 patients who went to the emergency
room of the clinic for heart-related problems.
The measurements are: 42, 38, 51, 53, 40, 68, 62, 36, 32, 45, 51, 67, 53, 59, 47, 63, 52, 64, 61, 43, 56, 58, 66, 54, 56,
52, 40, 55, 72, 69.
Age Groups   Frequency   Relative Frequency   Cumulative Frequency   Cumulative Relative Frequency
32-36 yr     2           2/30 = 0.067
37-41 yr     3           3/30 = 0.100
42-46 yr     3           3/30 = 0.100
47-51 yr     3           3/30 = 0.100
52-56 yr     8           8/30 = 0.267
57-61 yr     3           3/30 = 0.100
62-66 yr     4           4/30 = 0.133
67-72 yr     4           4/30 = 0.133
Total        n = 30
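For grouped data, each observation is tallied into the class whose limits contain it, and the class midpoint is the average of the lower and upper class limits. A minimal sketch in Python that tallies the 30 measurements listed above directly into the age groups of the table; the function name `grouped_table` is an illustrative choice.

```python
# Ages of n = 30 patients (data from the exercise above).
ages = [42, 38, 51, 53, 40, 68, 62, 36, 32, 45, 51, 67, 53, 59, 47,
        63, 52, 64, 61, 43, 56, 58, 66, 54, 56, 52, 40, 55, 72, 69]

# Class limits (inclusive) as used in the table; the last class is one year wider.
classes = [(32, 36), (37, 41), (42, 46), (47, 51),
           (52, 56), (57, 61), (62, 66), (67, 72)]

def grouped_table(values, classes):
    """Return rows of (lower, upper, midpoint, freq, rel, cum, cum_rel)."""
    n = len(values)
    rows, cum = [], 0
    for lower, upper in classes:
        freq = sum(lower <= v <= upper for v in values)  # class frequency
        cum += freq                                      # cumulative frequency
        midpoint = (lower + upper) / 2                   # class midpoint
        rows.append((lower, upper, midpoint, freq, freq / n, cum, cum / n))
    return rows

grouped = grouped_table(ages, classes)
for lo, hi, mid, f, rf, cf, crf in grouped:
    print(f"{lo}-{hi} yr  mid={mid:>5}  f={f}  rf={rf:.3f}  cf={cf:>2}  crf={crf:.3f}")
```

Counting from the raw measurements in this way is a useful cross-check on a table built by hand.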
Exercise: Find the class mid values and the frequency and cumulative frequency from
the given data: 12, 15, 20, 22, 26, 21, 19, 14, 25, 13, 17, 16, 20, 23, 16, 18, 15, 23, 19 and
20.
2. Methods of presentation of data
1. Numerical presentation
2. Graphical presentation
3. Mathematical presentation
1 - Numerical presentation
Tabular presentation
Simple frequency distribution Table (S.F.D.T.)
Table (I): Distribution of 50 patients at the surgical department of X hospital
in June, 2022 according to their ABO blood groups
Blood group   Frequency   %
A+            12          24
B-            18          36
AB-           5           10
O-            15          30
Total         50          100
Table (II): Distribution of 50 patients at the surgical department of X hospital in May
2023 according to their age
Age (years)   Frequency   %
20-30         12          24
31-40         18          36
41-50         5           10
51+           15          30
Total         50          100
Complex frequency distribution Table
Table (III): Distribution of 20 lung cancer cases and 40 controls at the chest department of X hospital in
August, 2022 according to smoking
                 Lung cancer
Smoking          Cases           Control         Total
                 No.     %       No.     %       No.     %
Smoker           15      75      8       20      23      38.33
Non smoker       5       25      32      80      37      61.67
Total            20      100     40      100     60      100
Complex frequency distribution Table
Table (IV): Distribution of 60 patients at the chest department of X hospital in May 2023
according to smoking & lung cancer
                 Lung cancer
Smoking          Positive        Negative        Total
                 No.     %       No.     %       No.     %
Smoker           15      65.2    8       34.8    23      100
Non smoker       5       13.5    32      86.5    37      100
Total            20      33.3    40      66.7    60      100
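Tables (III) and (IV) contain the same counts but use different percentage bases: Table (III) divides each count by its column total (percentage of smokers among cases and among controls), while Table (IV) divides by the row total (percentage with lung cancer among smokers and among non smokers). A minimal sketch in Python of both calculations; the function names are illustrative.

```python
# Counts from the smoking-by-lung-cancer tables: each row is a smoking status,
# the two values are (lung cancer positive / cases, negative / controls).
counts = {"Smoker": (15, 8), "Non smoker": (5, 32)}

# Column totals: 20 cases and 40 controls.
col_totals = [sum(row[j] for row in counts.values()) for j in range(2)]

def column_percentages(counts, col_totals):
    """Each count as a percentage of its column total (as in Table III)."""
    return {k: tuple(round(100 * c / t, 1) for c, t in zip(row, col_totals))
            for k, row in counts.items()}

def row_percentages(counts):
    """Each count as a percentage of its row total (as in Table IV)."""
    return {k: tuple(round(100 * c / sum(row), 1) for c in row)
            for k, row in counts.items()}

col_pct = column_percentages(counts, col_totals)
row_pct = row_percentages(counts)
print("Column %:", col_pct)
print("Row %:   ", row_pct)
```

Which base to use depends on the question being asked: column percentages compare exposure between cases and controls, row percentages compare outcome between exposure groups.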
2- Graphical presentation
Graphs
Line graph
Frequency polygon
Histogram
Bar graph
Pie chart
Line Graph
Year   MMR/1000
1960   50
1970   45
1980   26
1990   15
2000   12
Figure (1): Maternal mortality rate of (country), 1960-2000
Frequency polygon
Age     Males (%)   Females (%)   Age mid-point (A-M-P)
20-30   12          10            25
30-40   36          30            35
40-50   8           25            45
50-60   16          15            55
60-70   8           20            65
Figure: Distribution of 45 patients at (place), in (time) by age and sex
Bar chart
[Figures: bar charts with separate bars for males (M) and females (F)]
Pie chart
[Figure: pie chart with slices labelled Deletion, Inversion and Translocation; the percentages shown are 3%, 18% and 79%]
Doughnut chart
[Figure: doughnut chart for Hospital A and Hospital B with the categories DM, IHD and Renal]
Histogram
[Figure: histogram with percentage (%) on the vertical axis and Age (years) on the horizontal axis]
Figure: Distribution of cholera patients by age