Seven Star Health Science and Business College
Biostatistics (SPH1)4031
Zenaw A. (MSc.)
April 2024
Chapter-1
Introduction to Biostatistics
Contents
o Definition and Classification of Statistics
o Definition of Some Basic Terms
o Stages in Statistical Investigation
o Introduction to Biostatistics and its Applications
o Types of Variables and Measurement Scales
o Methods of Data Collection
Definition of Statistics
o The term Statistics is used to mean either statistical data or
statistical methods.
o Statistical data refers to numerical descriptions of things.
o It is the raw data themselves such as statistics of births, deaths,
age-sex distribution of HIV Test result, etc.
o As statistical methods, Statistics is the subject that deals with
the collection, organization, presentation, analysis and
interpretation of numerical data so as to make valid conclusions
and reasonable decisions.
It is the science concerned with developing and studying
methods for collecting, organizing, presenting, analyzing and
interpreting numerical data.
o Statistics is the art or science of learning from data.
o The discipline of statistics teaches us how to make informed
decisions in the presence of uncertainty and variation.
[Figure: Data -> Statistical tools -> Information]
Common terms in Statistics
o Population refers to the entire group or set of individuals,
objects or events with a specific property in common.
o Census is the complete enumeration of the population.
o Sample is part or portion of a population.
o Sampling is the process or method of sample selection from the
population.
o Sample Size is the number of elements or observations to be
included in the sample.
o Parameter is a numerical quantity computed from population
data.
o Statistic is a numerical quantity computed from sample data.
o Variable is an attribute or an entity that can take different
values.
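The distinction between a parameter and a statistic can be sketched in a few lines of Python. This is only an illustration: the population of birth weights is simulated, and the names `population`, `sample_size` and the seed value are assumptions, not part of the course material.

```python
import random
import statistics

# Hypothetical population: birth weights (kg) of 1,000 newborns.
# The values are simulated for illustration only; they are not real data.
rng = random.Random(42)
population = [round(rng.gauss(3.2, 0.5), 2) for _ in range(1000)]

# Parameter: a numerical quantity computed from the whole population (a census).
population_mean = statistics.mean(population)

# Sampling: select a simple random sample without replacement.
sample_size = 50
sample = rng.sample(population, sample_size)

# Statistic: the same quantity computed from the sample only.
sample_mean = statistics.mean(sample)

print(f"Parameter (population mean): {population_mean:.2f} kg")
print(f"Statistic (sample mean):     {sample_mean:.2f} kg")
```

The statistic usually differs a little from the parameter, and it changes from sample to sample.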
Classification of Statistics
o There are two broad classifications of Statistics.
o Descriptive statistics is the part of statistics that focuses on the
organization, presentation and summarization of sample data.
o It consists of a set of methods to describe the characteristics of a
set of data.
o The frequency distribution, measures of central tendency such as the
mean, median and mode, and measures of variation such as the range and
standard deviation belong to this category of statistics.
o Descriptive statistics makes raw data more manageable, presents them
in a logical form and reveals patterns so that they are easily
understood.
o Example
o The data shows that 50% of the suspected patients have Malaria.
o The average weight of newborns is 1.75 kg.
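As a minimal sketch, the descriptive measures named above can be computed with Python's standard `statistics` module. The list `weights` is a hypothetical data set invented for this illustration.

```python
import statistics
from collections import Counter

# Hypothetical weights (kg) of 8 newborns; illustrative values, not real data.
weights = [2.9, 3.1, 3.4, 3.1, 2.7, 3.6, 3.1, 3.3]

# Frequency distribution: how often each value occurs.
frequency = Counter(weights)

# Measures of central tendency.
mean_w = statistics.mean(weights)
median_w = statistics.median(weights)
mode_w = statistics.mode(weights)

# Measures of variation.
range_w = max(weights) - min(weights)
sd_w = statistics.stdev(weights)  # sample standard deviation (divisor n - 1)

print(f"mean={mean_w:.2f}, median={median_w:.2f}, mode={mode_w}")
print(f"range={range_w:.2f}, sd={sd_w:.2f}")
```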
Classification of Statistics…
o Inferential statistics is the branch of statistics concerned with
drawing conclusions (inferences) about a particular population.
o The inferences are drawn from the properties of a sample to the
corresponding properties of the population.
o Inferential statistics is used to make generalizations from a
sample to a population.
o This class of Statistics builds upon descriptive statistics.
o In short, inferential statistics enables us to make confident
decisions in the face of uncertainty.
o Example: The average calorie intake of all families (the
population in Ethiopia) can be estimated from figures
obtained from a few hundred (the sample) families.
[Figure: a sample is drawn from the population by sampling]
Descriptive Statistics
• Collect data
– e.g., Survey
• Present data
– e.g., Tables and graphs
• Summarize data
– e.g., Sample mean: $\bar{X} = \frac{\sum X_i}{n}$
Inferential Statistics
o Estimation
o Example: Estimate the population
mean weight using the sample mean
weight
o Hypothesis testing
o Example: Test the claim that the
population mean weight is 65 kg
Inference is the process of drawing conclusions or making
decisions about a population based on sample results.
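The two inferential tasks above, estimation and hypothesis testing, can be sketched as follows. The sample values are hypothetical, and the interval uses a normal approximation (about 1.96 standard errors) as a simplifying assumption; for a sample this small a t-distribution quantile would normally be preferred.

```python
import math
import statistics

# Hypothetical sample of adult weights (kg); illustrative values only.
sample = [62.0, 68.5, 70.1, 59.8, 66.3, 71.2, 64.7, 67.9, 63.4, 69.0]
n = len(sample)

# Estimation: the sample mean is a point estimate of the population mean.
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)      # sample standard deviation
se = s / math.sqrt(n)             # standard error of the mean

# Approximate 95% confidence interval (normal approximation, an assumption).
z = statistics.NormalDist().inv_cdf(0.975)
ci = (x_bar - z * se, x_bar + z * se)

# Hypothesis testing: H0 says the population mean weight is 65 kg.
mu0 = 65.0
test_statistic = (x_bar - mu0) / se

print(f"Point estimate: {x_bar:.2f} kg")
print(f"Approximate 95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"Test statistic against 65 kg: {test_statistic:.2f}")
```

A test statistic far from zero is evidence against the claim; how far is "far" depends on the chosen significance level.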
Major differences between Descriptive and Inferential Statistics
Feature           Descriptive Statistics               Inferential Statistics
Purpose           Presenting, summarizing and          Making inferences about a population
                  describing data                      based on sample data
Data              Entire population or sample          Sample data
Generalizability  Results cannot be used to make       Results can be generalized to the
                  inferences or generalizations        larger population
Examples          Mean, median, mode, range,           Estimation, hypothesis testing,
                  variance, standard deviation, etc.   prediction, confidence intervals, etc.
Exercise
a) What are the two fundamental issues considered in Statistics?
b) Briefly explain stages of statistical investigation.
c) Explain the two major classifications of Statistics with examples.
d) Define the following terms:
o Population
o Sampling Frame
o Statistic
o Parameter
e) Classify the following statements as descriptive or inferential Statistics (Biostatistics).
1. Determining the proportion of adults who smoke cigarettes in a particular country.
2. Testing whether a new drug is effective in treating a particular disease.
3. Estimating the risk of developing cancer associated with exposure to a particular
environmental toxin.
4. Predicting the survival rate of patients with a particular type of cancer.
5. Presenting blood pressure readings for patients with hypertension using a histogram.
Cont…
o In a nutshell, the field of statistics provides fundamental tools and
techniques for:
Forming hypotheses,
Designing experiments,
Collecting and summarizing data,
Drawing inferences from data, i.e., estimation and hypothesis
testing.
o In general, Statistics can be used to:
Express facts related to different situations in numbers.
Present messy data in a simple and easily understandable
manner.
Make comparisons of facts from data.
Show existing trends and make future predictions.
Stages in Statistical Investigation
o There are five basic stages for any statistical investigation.
o Collection of Data refers to the process of collecting data using
measurements, survey responses, etc.
o Organization of Data is the arrangement of data in a suitable form.
It constitutes editing, classifying and tabulation.
o Presentation of Data is the process of displaying data in a precise
manner using tables, graphs and diagrams.
o Analysis of Data is the process of systematically applying statistical
and/or logical techniques to describe, illustrate, and evaluate data.
o Interpretation of Data is concerned with providing interpretation for
numerical results obtained from analysis.
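The five stages can be traced on a very small example. The sketch below is an illustration under stated assumptions: the list `raw` holds hypothetical malaria test results, including a blank entry and inconsistent capitalization that the organization stage has to clean up.

```python
from collections import Counter

# 1. Collection: hypothetical raw malaria test results as recorded.
raw = ["Positive", "negative", "NEGATIVE", "", "positive",
       "negative", "Negative", "positive"]

# 2. Organization: edit (drop blanks, standardize spelling), classify, tabulate.
cleaned = [r.strip().lower() for r in raw if r.strip()]
table = Counter(cleaned)

# 3. Presentation: display the tabulated data.
for category, count in sorted(table.items()):
    print(f"{category:<10}{count:>3}")

# 4. Analysis: compute the proportion of positive results.
total = sum(table.values())
prop_positive = table["positive"] / total

# 5. Interpretation: state what the numerical result means.
print(f"{prop_positive:.0%} of the {total} valid results were positive.")
```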
Definition of Biostatistics
Biostatistics is the application of statistical methods to the
biological and life sciences.
Biostatistics is the scientific process of collecting, organizing,
presenting, analysing and interpreting health and health-
related data.
In general, when the data to be analysed are derived from
public health, the biological sciences and medicine, the term
Biostatistics is used to distinguish this particular
application of statistical tools and concepts from others.
Biostatistics deals with data arising from medical research,
clinical trials, epidemiology, genetics, public health, and other
related fields.
It therefore includes statistical methods pertaining to health and
health-related data.
Many investigations in the biological sciences are quantitative,
with observations consisting of numerical information called
data (singular: datum).
Indeed, statistical methods are more heavily used in health
applications than elsewhere.
Main Reasons to know Biostatistics
Health Sciences are becoming increasingly quantitative.
The planning, conduct and interpretation of much medical
research are becoming increasingly reliant on statistical
methodology.
As a result Statistics pervades the medical literature.
Example: Evaluation of Penicillin (treatment A) vs Penicillin
plus other (treatment B) for treating bacterial pneumonia in
children less than 2 years.
What sample size is needed to demonstrate a significant
difference between the two groups?
Main Reasons to know Biostatistics…
Is treatment A better than treatment B, or vice versa?
If so, how much better?
What is the normal variation in a clinical measurement (mild,
moderate and severe)?
How reliable and valid is the measurement?
What is the magnitude and effect of laboratory and technical
error?
How does one interpret abnormal values?
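The question of which treatment is better, and by how much, can be sketched by comparing cure proportions. The counts below are hypothetical and chosen only to make the arithmetic easy; the variable names are assumptions for this illustration.

```python
import math

# Hypothetical trial counts, for illustration only (not real results).
cured_a, n_a = 40, 50   # treatment A: Penicillin
cured_b, n_b = 45, 50   # treatment B: Penicillin plus other

# Proportion cured in each group.
p_a = cured_a / n_a
p_b = cured_b / n_b

# "How much better?": the difference between the two proportions.
difference = p_b - p_a

# Standard error of the difference and the corresponding z statistic.
se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
z_statistic = difference / se

print(f"Cure proportion A: {p_a:.2f}, B: {p_b:.2f}")
print(f"Difference (B - A): {difference:.2f}, z = {z_statistic:.2f}")
```

Whether an observed difference of this size is statistically significant depends on the sample size, which is why the sample size question is raised before the study is conducted.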
How Biostatistics Can Help You
o Some areas of application of Biostatistics:
o Determining study design (cross-sectional, longitudinal…)
o Sampling designs and calculating corresponding sample size
o Selection of sample and controls
o Data management and Analysis
o To make valid decisions based on part of subjects taken from a
large population.
o To monitor and evaluate health programs or interventions.
Application of Biostatistics
o To Manage and summarize Health Data: Data management
and summarization is important to transform collected health
data to meaningful information.
o In comparison of health indices: To determine what is normal
or healthy in a given population and the limits of normal values;
in finding the difference between means and proportions of
normal values at two places or in different periods.
o To make an inference: Health professionals can use inferential
statistics to draw conclusions about a large body of data by
examining only a part (sample) of it.
Application of Biostatistics…
o In prioritization of healthcare service implementation:
Biostatistics is used to quantify the magnitude of health and
health-related problems and to inform the planning, implementation
and further evaluation of such services and programs.
o To conduct Health Research: This may focus on
o Clinical trials to test the effectiveness of an intervention.
o Testing vaccines, drugs or interventions and finding out whether
the differences observed are statistically significant.
o Testing the strength of associations and identifying signs and
symptoms of a disease or syndrome.
Exercises
a) Define in detail what Biostatistics means.
b) Explain the application of Biostatistics.
c) Write a report on applications of Biostatistics for medical
laboratory (at least half page).
d) Compare and contrast Biostatistics and Epidemiology
with examples.
Types of Variables
o Quantitative Variables: variables that take numerical values and
for which arithmetic operations (+, -, x, /) make sense.
Example: Survival time, length of stay in a hospital, age, BMI, etc.
Quantitative variables are of two types:
o Discrete: can assume only certain values, and there are
usually "gaps" between the values.
Example: Number of neonates in NICU, number of wards, etc.
o Continuous: can assume any value within a specific range.
Example: BMI, systolic blood pressure, weight, height, etc.
o Qualitative or Categorical Variables: variables that express
qualities or attributes and do not take numerical values.
Example: Severity of illness, socio-economic status, medical
service satisfaction, level of anemia, etc.
Measurement Scales
o Measurement is the process of assigning numbers or symbols to the
characteristics of a variable of interest according to certain
pre-specified rules.
o The types of measurement scale are nominal, ordinal, interval and
ratio.
o Nominal Scale: the most basic scale of measurement, used to
categorize data into distinct groups based on qualitative
attributes without any inherent order or ranking.
o It is used only for naming, describing or labelling, and only
counting and frequency calculations are meaningful.
o Examples of Nominal Scales
o Blood type (A, B, AB and O), which may be coded A=1, B=2, AB=3 and O=4
o Eye color (brown, black, ...)
o Disease status (Yes, No), which may be coded Yes=1 and No=0
o Treatment group (treatment, placebo)
The nominal scale assigns numbers as a way to label or identify
characteristics.
For instance: For laboratory result of HIV test; positive can be
coded as 1 and negative as 0.
The numbers assigned are purely arbitrary and have no quantitative
meaning beyond indicating the presence or absence of the
characteristic under investigation.
Any arithmetic operation applied to these arbitrary numbers is
meaningless.
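A short sketch shows why arithmetic on nominal codes is meaningless. The blood type coding A=1, B=2, AB=3, O=4 follows the example in these notes; the list of patients and the alternative coding `alt_codes` are hypothetical.

```python
from collections import Counter

# Coding from the notes: the numbers are labels only.
codes = {"A": 1, "B": 2, "AB": 3, "O": 4}

# Hypothetical blood types of 7 patients.
blood_types = ["O", "A", "B", "O", "AB", "A", "O"]

# Counting and frequencies are meaningful on a nominal scale.
frequency = Counter(blood_types)
most_common_type, _ = frequency.most_common(1)[0]

# A "mean code" can be computed, but it has no interpretation...
mean_code = sum(codes[b] for b in blood_types) / len(blood_types)

# ...because a different, equally valid coding gives a different value.
alt_codes = {"A": 4, "B": 3, "AB": 2, "O": 1}
alt_mean_code = sum(alt_codes[b] for b in blood_types) / len(blood_types)

print(f"Frequencies: {dict(frequency)}; most common: {most_common_type}")
print(f"Mean code: {mean_code:.2f} vs {alt_mean_code:.2f} after recoding")
```

The frequencies are unchanged by recoding, while the "mean" changes, so only the frequencies describe the data.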
o Ordinal Scale
When the possible categories of a variable have a natural order,
the measurement is said to be on an ordinal scale.
On such a scale it is possible to apply mathematical inequalities
(<, >), but arithmetic operations cannot be applied.
Example 1: The status of health service at a hospital can be
labeled as poor, moderate, good or excellent.
There is a natural ordering: the category 'excellent' indicates a
better health service than the category 'moderate', so order
relations are meaningful.
However, the differences between the labels of an ordinal scale
are not meaningful.
Example 2: Severity of illness (low, medium, high)
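Ordered categories can be represented in Python with an `IntEnum`, which is one possible way to make order comparisons available. The class name `ServiceStatus` and the list `ratings` are hypothetical; the integer values encode rank only, not distances between categories.

```python
import statistics
from enum import IntEnum


class ServiceStatus(IntEnum):
    """Ordinal categories; the integers record rank order only."""
    POOR = 1
    MODERATE = 2
    GOOD = 3
    EXCELLENT = 4


# Hypothetical ratings given by five patients.
ratings = [ServiceStatus.GOOD, ServiceStatus.POOR, ServiceStatus.EXCELLENT,
           ServiceStatus.MODERATE, ServiceStatus.GOOD]

# Order relations are meaningful on an ordinal scale.
excellent_is_better = ServiceStatus.EXCELLENT > ServiceStatus.MODERATE

# The median (a middle category) is a suitable summary; the mean is not,
# because the gaps between categories are not known to be equal.
median_rating = statistics.median_low(ratings)

print(f"Excellent better than moderate: {excellent_is_better}")
print(f"Median rating: {median_rating.name}")
```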
o Interval Scale: it contains all the information of an ordinal scale,
but it also allows the differences between objects to be compared.
It is a scale with an arbitrary zero point: zero does not indicate a
total absence of the quantity being measured, i.e. there is no
true zero.
In this case mathematical inequalities, addition and subtraction can
be applied, but ratios are not meaningful.
For instance: The temperature of a certain area may be zero
degrees Celsius. This does not mean that there is no heat at all;
it simply indicates that it is cold.
Examples of interval scales: IQ and temperature (in degrees Celsius
or Fahrenheit).
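The lack of a true zero can be checked numerically. In the sketch below the two temperatures are arbitrary illustrative values; the conversion formulas are the standard ones, and Kelvin is included because it does have a true zero.

```python
def celsius_to_fahrenheit(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvin (a scale with a true zero)."""
    return c + 273.15


a_c, b_c = 10.0, 20.0  # two illustrative temperatures in degrees Celsius

# Differences are meaningful on an interval scale.
difference_c = b_c - a_c

# Ratios are not: the "ratio" depends on the arbitrary zero of the unit used.
ratio_c = b_c / a_c
ratio_f = celsius_to_fahrenheit(b_c) / celsius_to_fahrenheit(a_c)
ratio_k = celsius_to_kelvin(b_c) / celsius_to_kelvin(a_c)

print(f"Difference: {difference_c} degrees Celsius")
print(f"Ratio in Celsius: {ratio_c:.2f}, in Fahrenheit: {ratio_f:.2f}, "
      f"in kelvin: {ratio_k:.3f}")
```

Since 20 degrees is "twice" 10 degrees only in Celsius, the statement "twice as hot" has no physical meaning on this scale.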
o Ratio Scale: On a ratio scale we can identify or classify objects,
rank them and compare intervals or differences, and there is a true
zero point.
It indicates the actual amount of the variable and shows the magnitude
of differences between points on the scale and the proportions of
those differences.
Both mathematical inequalities and arithmetic operations can be applied.
Furthermore, there is no restriction on the kind of statistics that
can be computed for ratio-scaled data, i.e. all statistical
techniques are usable.
Examples: Height, laboratory measurements (blood glucose or
cholesterol level), duration of an event and others.
Summary on Scale of Measurements
Scale     Order  Difference  Meaningful Zero
Nominal   No     No          No
Ordinal   Yes    No          No
Interval  Yes    Yes         No
Ratio     Yes    Yes         Yes
Scales of Measurement…
Quantitative Variable
o Ratio Scale: differences between measurements, true zero exists
o Interval Scale: differences between measurements but no true zero
Qualitative Variable
o Ordinal Scale: ordered categories (rankings, order, or scaling but
no exact difference)
o Nominal Scale: categories (no ordering or direction)
Additional Examples
o Nominal:
• Marital status
• Eye color
• Gender
• Patient ID
o Ordinal:
• Stage of disease
• Severity of pain
• Level of satisfaction
o Interval:
• Temperature
• Exam scores
o Ratio:
• Distance
• Length
• Time until death
• Weight
o Why is the level of measurement important?
Helps to select appropriate data presentation and
summarization methods.
Knowing the level of measurement helps us to decide on how to
interpret the data.
It has a pivotal role to select an appropriate statistical method
for the given data.
In general statistical methods rely on the nature of variables.
Thus, knowing and identifying scale of measurements is vital to
select an appropriate statistical tool for the available data.
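One way to picture how the scale of measurement drives the choice of summary is a small helper that refuses to compute a mean for categorical data. The function `summarize` is a hypothetical illustration, not a standard library routine, and the mapping it uses (mode for nominal, median for ordinal, mean and standard deviation for interval or ratio) is a common convention rather than the only valid choice.

```python
import statistics
from collections import Counter


def summarize(values, scale):
    """Return a summary suited to the measurement scale (hypothetical helper)."""
    if scale == "nominal":
        # Only counting is meaningful: report the most frequent category.
        return {"mode": Counter(values).most_common(1)[0][0]}
    if scale == "ordinal":
        # Values are expected as ranks; order is meaningful, distances are not.
        return {"median": statistics.median_low(values)}
    if scale in ("interval", "ratio"):
        # Differences are meaningful, so mean and standard deviation apply.
        return {"mean": statistics.mean(values), "sd": statistics.stdev(values)}
    raise ValueError(f"unknown scale: {scale}")


print(summarize(["A", "O", "O"], "nominal"))
print(summarize([1, 3, 2, 3, 1], "ordinal"))
print(summarize([1.0, 2.0, 3.0], "ratio"))
```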
Exercise
a) Define a variable.
b) Discuss the various classifications of variables.
c) Give examples for each class of variables you indicated in (b).
d) What is a scale of measurement? What are its classifications?
Provide illustrative examples.
Methods of Data Collection
o Why do we collect data?
To answer questions,
To make decisions,
To gain a deeper understanding of some phenomena.
o Example
Does lowering speed limit reduce the number of fatal traffic
accidents?
What fraction of students in a college belong to blood group O?
o Data: a plural noun (the singular form is datum) meaning a set of
known or given facts.
o Data can be collected using surveys, experiments or observations.
Sources of Data
o Primary: a source in which first-hand data are collected by the
researcher directly from the source.
• Surveys, censuses, surveillance, and experimental and observational
studies are major primary sources of data.
• Primary data tend to require more time and expense than secondary data.
o Secondary: a source in which data are obtained from existing
sources, gathered by an organization, agency or person for
their own purpose.
• Possible sources include hospital records, health offices,
health centers, statistical offices, magazines, the internet (big data),
the Demographic and Health Survey (DHS), vital statistics and others.
Sources of Data . . .
• Example: If it is required to know the death rate of newborns at
selected private hospitals, then data can be accessed from the
medical records of selected hospitals.
• Uses of Secondary Data
• Secondary data save time and cost as compared to primary
data.
• They are less subject to intentional bias (enumerator effect).
• Secondary data are the only option for inaccessible information.
• Drawback of Secondary Data
• They may not fit all our requirements.
• They may be incomplete, outdated or inconsistent.
Types of Data
o Categorical (Qualitative) data: defined categories or groups
Examples: Data on marital status, cause of death, eye color
o Numerical (Quantitative) data: measured characteristics
• Discrete. Examples: Data on number of patients, frequency of cough
at night, number of missing teeth
• Continuous. Examples: Data on weight, blood sugar level, survival time
Types of Data . . .
• Cross-Sectional Data: a set of observations taken at one point in
time.
• Example: Data collected on the HIV/AIDS status of all students
enrolled at a given university in a given year.
• Spatial Data: data connected with a place or location.
• Example: District-wise malaria distribution in Addis Ababa.
• Time Series Data: a set of observations collected for a sequence
of times, usually at equal intervals, which may be on a weekly,
monthly, yearly, etc. basis.
Methods of Data Collection
o There are various data collection methods based on the nature of
the investigation and the availability of resources.
o Direct Observation or Measurement.
o Interview using questionnaires.
o Extracting Data from Records
• Observation or Measurement: a method that involves
systematically selecting, watching and recording the behavior of
people or the characteristics of an issue to get the required information.
• It includes all methods, from simple observation to the use of
high-level machines, measurements or sophisticated equipment.
• It provides accurate information but it is expensive.
o Example: Laboratory tests, clinical measurements and physical
examination are examples of observation data collection methods.
o Interview with questionnaires: this is the most commonly used
data collection method in research; the researcher designs a
questionnaire.
o Questionnaires: written documents which instruct the reader
or listener to answer the questions written on them.
o The major interview types are:
Face-to-face interview.
Telephone interview.
Self-administered questionnaires posted on a website or sent by email.
o Respondents (Interviewees): the individuals who answer the
questions on the questionnaire.
o Interviewers/Data Collectors/Enumerators: the individuals
who record the responses given by the respondents.
o Face to Face Interviews
• The interviewer knows exactly who is responding to the
questionnaire.
o Advantages
o The interviewer can help the respondent if he/she has difficulty in
understanding the questions. The difficulty could be due to
language, concentration or limited intellectual capacity.
o There is more flexibility in presenting the items; they can range
from closed to open ended.
o There is the ability to use the method of skip patterns.
o Skip patterns mean skipping a question or a group of
questions that are not applicable to the respondent.
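A skip pattern is essentially a branching rule, which can be sketched as follows. The question IDs, their wording in the comments and the function `questions_asked` are hypothetical and serve only to illustrate the idea.

```python
def questions_asked(answers):
    """Return the IDs of the questions asked, following the skip pattern.

    Hypothetical questionnaire:
      Q1: Have you ever visited this hospital? (yes/no)
      Q2: How many times did you visit in the last 3 years?
      Q3: How satisfied were you with the service?
      Q4: What is your age? (asked of everyone)
    """
    asked = ["Q1"]
    if answers["Q1"] == "yes":
        # Q2 and Q3 apply only to respondents who have visited the hospital.
        asked.extend(["Q2", "Q3"])
    # Respondents who answered "no" skip directly to Q4.
    asked.append("Q4")
    return asked


print(questions_asked({"Q1": "yes"}))
print(questions_asked({"Q1": "no"}))
```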
o Disadvantages
o It costs much in terms of time and money.
o Attributes of the interviewer may affect the responses due to:
o Bias of the interviewer and
o his/her social or ethnic characteristics.
o An untrained interviewer may distort the meaning of the
questions.
o Telephone Interviews
o Advantages
It is less expensive in time and money compared with face-to-face
interviews.
The interviewer is able to help the respondent if he or she
doesn’t understand the question.
Broad representative samples can be obtained for those who
have telephone lines.
o Disadvantages
Under-representation of those groups which do not have
telephones.
The respondent may be substituted by another person.
People whose telephone numbers are not listed in the directory
cannot be reached.
o Self-administered questionnaires posted on a website or sent by
email
o Here the questionnaire is sent to the respondents to be filled in.
o It is known as self-enumeration.
o Advantages
These are the cheapest.
There is no need for a trained interviewer.
There is no interviewer bias.
o Disadvantages
Low response rate
Incomplete questionnaires due to omissions or invalid
responses.
No assurance that the questionnaire was answered by the right
person
Needs intense follow up to get a high response rate.
o Extraction of Data from Records or Documentary Sources
Extracting information from existing sources.
This includes clinical or personal records like death certificate,
published mortality statistics, census publications, hospital
reports and others.
Although it is less expensive than the other two methods, care must
be taken over the quality and completeness of the data.
Check availability, relevance, accuracy and sufficiency of the
data.
Advantage of secondary data
Provides a larger database as compared to primary data
Time saving
Cost effective
Disadvantages of secondary data
It is difficult to get the information needed when records are
compiled in an unstandardized manner.
Types of Questions
o In questionnaire design there are two types of questions.
o Open-ended (free response): questions of this type give
detailed insight into the respondents' feelings, perceptions
and experiences without any restriction.
o Example: In your opinion, what is the main challenge in getting
ANC services?
o They stimulate free thought in respondents.
o They are helpful for obtaining detailed information on certain
(sensitive) issues.
o Closed-ended (restricted choice): pre-specified or
restricted choices are given and respondents must
choose from the available choices only.
o Example: How many times did you visit this hospital over the
last 3 years?
o A. Once B. Twice C. Four times D. Never
Exercise
a) What are the merits and demerits of primary and secondary data
sources?
b) Explain the methods of data collection.
c) Discuss the advantages and disadvantages of each data
collection method.
d) What is the difference between open-ended and closed-ended
questions? Provide at least three examples for each.