Course Name
Medical Statistics
(Medicine, Prevention, Nursing)
Lecture-2
2025/03/05
Lecture Contents
Some definitions
Chapter-2: Descriptive Statistics
Some Definitions
Center
Variation
Distribution
Outliers
Frequency Distribution
Definitions
• Center: A representative or average value that indicates where
the middle of the data set is located.
• Variation: A measure of the amount that the data values vary.
• Distribution: The nature or shape of the spread of data over
the range of values (such as bell-shaped, uniform, or skewed).
• Outliers: Sample values that lie very far away from the vast
majority of other sample values.
• Time: Changing characteristics of the data over time.
The frequency for a particular class is the number of original values that fall
into that class.
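• Lower Class Limits: These are the smallest numbers that can
actually belong to the different classes. For the pulse-rate
classes used in this lecture (60-69, 70-79, ..., 120-129), the
lower class limits are 60, 70, 80, 90, 100, 110, and 120.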
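• Upper Class Limits: These are the largest numbers that can
actually belong to the different classes. For the pulse-rate
classes used in this lecture (60-69, 70-79, ..., 120-129), the
upper class limits are 69, 79, 89, 99, 109, 119, and 129.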
• Class Boundaries: These are the numbers used to separate
classes but without the gaps created by class limits.
Class boundaries for the pulse-rate classes 60-69, ..., 120-129:
59.5, 69.5, 79.5, 89.5, 99.5, 109.5, 119.5, 129.5
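• Class Midpoints: These are the values in the middle of the
classes. Each midpoint is found by adding the lower class limit
to the upper class limit and dividing the sum by 2.
Class midpoints for the pulse-rate classes 60-69, ..., 120-129:
64.5, 74.5, 84.5, 94.5, 104.5, 114.5, 124.5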
• Class Width: It is the difference between two consecutive
lower class limits or two consecutive lower class boundaries.
Class width for the pulse-rate classes 60-69, ..., 120-129:
10 for every class (for example, 70 − 60 = 10).
• Relative Frequency: The relative frequency of a class is the
portion or percentage of the data that falls in that class. To
find the relative frequency of a class, divide the frequency by
the sample size.
• Cumulative Frequency: The cumulative frequency of a class is
the sum of the frequency for that class and all previous
classes.
Frequency Distribution or Frequency Table
• Frequency Distribution or Frequency Table: It shows how a data
set is partitioned among all of several categories (or classes)
by listing all of the categories along with the number of data
values in each of the categories.
Reasons for Constructing Frequency Distributions
• Large data sets can be summarized.
• We can analyze the nature of the data.
• We have a basis for constructing important graphs.
Key Concept
• When working with large data sets, it is often helpful to
organize and summarize data by constructing a table called a
frequency distribution.
• Because computer software and calculators can generate
frequency distributions, the details of constructing them are
not as important as what they tell us about data sets. It
helps us understand the nature of the distribution of a data
set.
Constructing a Frequency Distribution
• Determine the number of classes (should be between 5 and 20).
• Calculate the class width (round up):
$$\text{Class width} \approx \frac{\text{Maximum value} - \text{Minimum value}}{\text{Number of classes}}$$
• Starting point: Choose the minimum data value or a convenient
value below it as the first lower class limit.
• Using the first lower class limit and class width, proceed to
list the other lower class limits.
• List the lower-class limits in a vertical column and proceed to
enter the upper-class limits.
• Take each individual data value and put a tally mark in the
appropriate class. Add the tally marks to get the frequency.
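The steps above can be sketched in Python. This is a minimal illustration rather than a definitive procedure: the function name `frequency_distribution` and its parameters are hypothetical, it assumes whole-number data (so each upper class limit is the next lower class limit minus 1), and it uses the 40 female pulse rates from the Table 2.2 example in this lecture.

```python
import math

# Female pulse rates from the Table 2.2 example in this lecture.
pulse = [
    76, 72, 88, 60, 72, 68, 80, 64, 68, 68, 80, 76, 68, 72, 96, 72, 68, 72, 64, 80,
    64, 80, 76, 76, 76, 80, 104, 88, 60, 76, 72, 72, 88, 80, 60, 72, 88, 88, 129, 64,
]

def frequency_distribution(values, num_classes):
    """Return a list of (lower limit, upper limit, frequency) rows.

    Assumes whole-number data and uses the minimum value as the
    first lower class limit.
    """
    lowest, highest = min(values), max(values)
    # Class width = (maximum - minimum) / number of classes, rounded up.
    width = math.ceil((highest - lowest) / num_classes)
    # If the quotient was already a whole number, the maximum would fall
    # just outside the last class, so widen the classes by 1.
    if lowest + width * num_classes - 1 < highest:
        width += 1
    rows = []
    for i in range(num_classes):
        lower = lowest + i * width
        upper = lower + width - 1
        # Counting the values in the class plays the role of the tally marks.
        frequency = sum(1 for v in values if lower <= v <= upper)
        rows.append((lower, upper, frequency))
    return rows

rows = frequency_distribution(pulse, 7)
for lower, upper, frequency in rows:
    print(f"{lower}-{upper}: {frequency}")
```

With 7 classes the computed width is 10, which gives the classes 60-69 through 120-129 used in the lecture's example.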
Relative Frequency Distribution
• Includes the same class limits as a frequency distribution,
but the frequency of a class is replaced with a relative
frequency (a proportion) or a percentage frequency (a percent).
$$\text{Relative frequency} = \frac{\text{Class frequency}}{\text{Sum of all frequencies}}$$
$$\text{Percentage frequency} = \frac{\text{Class frequency}}{\text{Sum of all frequencies}} \times 100\%$$
Frequency Distribution &
Relative frequency Distribution
• Example: Using the pulse rate of females given in Table 2.2,
follow the procedure to construct the frequency distribution
and relative frequency distribution with 7 classes.
76 72 88 60 72 68 80 64 68 68 80 76 68 72 96 72 68 72 64 80
64 80 76 76 76 80 104 88 60 76 72 72 88 80 60 72 88 88 129 64
Class interval    Frequency    Relative Frequency
60-69             12           0.300
70-79             14           0.350
80-89             11           0.275
90-99              1           0.025
100-109            1           0.025
110-119            0           0.000
120-129            1           0.025
Total             40           1.000
Relative frequency Distribution
Total Frequency = 40
* Percentage frequency of the first class (60-69): (12/40) × 100 = 30%
Cumulative frequency Distribution
Cumulative Frequencies
Frequency tables
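As a small sketch of how relative, percentage, and cumulative frequencies follow from a frequency table, the snippet below uses the class frequencies of the pulse-rate example (12, 14, 11, 1, 1, 0, 1). The variable names are illustrative only.

```python
from itertools import accumulate

classes = ["60-69", "70-79", "80-89", "90-99", "100-109", "110-119", "120-129"]
freqs = [12, 14, 11, 1, 1, 0, 1]

total = sum(freqs)                           # sum of all frequencies (40)
relative = [f / total for f in freqs]        # proportions
percent = [100 * f / total for f in freqs]   # percentages
# Cumulative frequency: frequency of a class plus all previous classes.
cumulative = list(accumulate(freqs))

for c, f, r, p, cf in zip(classes, freqs, relative, percent, cumulative):
    print(f"{c:>8}  f={f:>2}  rel={r:.3f}  pct={p:5.1f}%  cum={cf}")
```

The last cumulative frequency always equals the total number of values, which is a quick check on the table.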
Graphs of Frequency Distribution
• Sometimes it is easier to identify patterns of a data set by
looking at a graph of the frequency distribution. One such
graph is a frequency histogram.
• A histogram is a graph consisting of bars of equal width drawn
adjacent to each other (without gaps). The horizontal scale
represents the classes of quantitative data values and the
vertical scale represents the frequencies.
Frequency histogram
• It is a bar graph that represents the frequency distribution
of a data set.
• The heights of the bars correspond to the frequency values.
• A histogram has the following properties.
1. The horizontal scale is quantitative and measures the data
values.
2. The vertical scale measures the frequencies of the classes.
3. Consecutive bars must touch.
Frequency histogram
Basically a graphic version of a frequency distribution.
Histogram
• The bars on the horizontal scale are labelled with one of the
following.
1. Class boundaries
2. Class midpoints.
3. Lower class limits (introduces a small error).
Horizontal Scale for Histogram: Use class boundaries or class midpoints.
Vertical Scale for Histogram: Use the class frequencies.
Relative Frequency histogram
A relative frequency histogram has the same shape and horizontal
scale as a histogram, but the vertical scale is marked with
relative frequencies instead of actual frequencies.
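A rough text sketch can show why the two histograms have the same shape. It uses the class frequencies of the pulse-rate example and draws the bars sideways (one row per class) purely for display in a terminal. The helper `text_histogram` is a hypothetical name, and a real histogram would normally be drawn with plotting software.

```python
classes = ["60-69", "70-79", "80-89", "90-99", "100-109", "110-119", "120-129"]
freqs = [12, 14, 11, 1, 1, 0, 1]
total = sum(freqs)

def text_histogram(labels, heights, scale=1):
    """Return one text row per class; bar length = round(height * scale)."""
    return [f"{label:>8} | " + "#" * round(h * scale)
            for label, h in zip(labels, heights)]

# Frequency histogram: bar length is the class frequency.
freq_bars = text_histogram(classes, freqs)
# Relative frequency histogram: heights are proportions, rescaled by the
# total so the bars are drawn at the same size for comparison.
rel_bars = text_histogram(classes, [f / total for f in freqs], scale=total)

print("\n".join(freq_bars))
```

Only the scale on the frequency axis differs between the two versions; the bars themselves are identical.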
Example 2-1
Measures of Central Tendency
• A measure of central tendency is a value that represents a
typical, or central, entry of a data set. Three commonly used
measures for describing the central tendency of a population or
sample are the mean, the median, and the geometric mean.
• For normally distributed data, use the arithmetic mean. For
skewed data, use the median. For data that become normal or
approximately normal after a logarithmic transformation, use
the geometric mean.
Measure of Center
• The value at the center or middle of a data set
Arithmetic Mean
• The measure of the center obtained by adding the values and
dividing the total by the number of values.
• What most people call an average.
Notation
Σ denotes the sum of a set of values.
x is the variable usually used to represent individual data values.
n represents the number of data values in a sample.
N represents the number of data values in a population.
x̄ is pronounced 'x-bar' and denotes the mean of a set of sample values:
$$\bar{x} = \frac{\sum x}{n}$$
µ is pronounced 'mu' and denotes the mean of all values in a population:
$$\mu = \frac{\sum x}{N}$$
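As a minimal sketch of the formula for x̄, the snippet below uses the five sample values from the median example in this lecture (22, 22, 26, 24, 23 chips). The function name `arithmetic_mean` is illustrative.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)

chips = [22, 22, 26, 24, 23]       # sample values from the median example
x_bar = arithmetic_mean(chips)     # sum is 117, n is 5
print(x_bar)
```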
Arithmetic Mean
Find the mean of the data summarized in the given frequency distribution table:
Pulse Rates of Females    Frequency
60 – 69                   12
70 – 79                   14
80 – 89                   11
90 – 99                    1
100 – 109                  1
110 – 119                  0
120 – 129                  1
Total                     40
Weighted mean
Mean for Grouped data
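• The weighted mean is the mean of a data set whose entries
have varying weights. For grouped data, each class midpoint x is
weighted by its class frequency f, so the mean is given by
$$\bar{x} = \frac{\sum (f \cdot x)}{\sum f}$$
where f is the weight (class frequency) of each entry x (class midpoint).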
Class interval    Frequency (f)    Class midpoint (x)    f · x
60 – 69           12               64.5                  774.0
70 – 79           14               74.5                  1043.0
80 – 89           11               84.5                  929.5
90 – 99            1               94.5                  94.5
100 – 109          1               104.5                 104.5
110 – 119          0               114.5                 0.0
120 – 129          1               124.5                 124.5
Total             40                                     3070.0
Mean = 3070.0 / 40 = 76.75
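The grouped-data calculation in this table can be checked with a short sketch. It assumes the class limits and frequencies of the pulse-rate table; the variable names are illustrative.

```python
limits = [(60, 69), (70, 79), (80, 89), (90, 99), (100, 109), (110, 119), (120, 129)]
freqs = [12, 14, 11, 1, 1, 0, 1]

# Class midpoint = (lower class limit + upper class limit) / 2
midpoints = [(lower + upper) / 2 for lower, upper in limits]

# Weighted mean: sum of f * x divided by sum of f
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
grouped_mean = sum_fx / sum(freqs)

print(midpoints)
print(sum_fx, grouped_mean)
```

The grouped mean is an approximation, because every value in a class is treated as if it were equal to the class midpoint.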
Median
• The middle value when the original data values are arranged in
order of increasing (or decreasing) magnitude.
• Is not affected by an extreme value - is a resistant measure of
the center.
Median
Example: Find the median of the five sample values 22 chips, 22
chips, 26 chips, 24 chips, and 23 chips.
Solution: First sort the data by arranging them in order: 22, 22,
23, 24, 26. The middle value is 23, so the median is 23 chips.
Finding the Median
First sort the values (arrange them in order), then follow one of
these two procedures:
• If the number of data values is odd, the median is the number
located in the exact middle of the list.
• If the number of data values is even, the median is found by
computing the mean of the two middle numbers.
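The two cases above can be written as a short function. This is a minimal sketch; the name `median` is illustrative, and the even-sized sample is a hypothetical example.

```python
def median(values):
    """Middle value of the sorted data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of values: the exact middle of the list.
        return ordered[middle]
    # Even number of values: mean of the two middle numbers.
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([22, 22, 26, 24, 23]))   # the lecture's five chip counts
print(median([22, 22, 24, 23]))       # hypothetical even-sized sample
```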
Mode
• The value that occurs with the greatest frequency.
• A data set can have one mode, more than one mode, or no mode.
• Bimodal: two data values occur with the same greatest
frequency.
• Multimodal: more than two data values occur with the same
greatest frequency.
• No Mode: no data value is repeated.
Mode
a. 5.40 1.10 0.42 0.73 0.48 1.10: the mode is 1.10.
b. 27 27 27 55 55 55 88 88 99: bimodal, with modes 27 and 55.
c. 1 2 3 6 7 8 9 10: no mode.
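The three examples above can be reproduced with a small sketch. The function name `modes` is illustrative, and it follows this lecture's convention that a data set with no repeated value has no mode.

```python
from collections import Counter

def modes(values):
    """Return the list of modes; an empty list means 'no mode'."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # no data value is repeated
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([5.40, 1.10, 0.42, 0.73, 0.48, 1.10]))
print(modes([27, 27, 27, 55, 55, 55, 88, 88, 99]))
print(modes([1, 2, 3, 6, 7, 8, 9, 10]))
```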
Percentiles
• Percentiles are measures of location. There are 99 percentiles
denoted P1, P2, . . . P99, which divide a set of data into 100
groups with about 1% of the values in each group.
• Finding the Percentile of a Data Value:
$$\text{Percentile of value } x = \frac{\text{Number of values less than } x}{\text{Total number of values}} \times 100$$
Converting from the kth Percentile
to the Corresponding Data Value
Notation
$$L = \frac{k}{100} \cdot n$$
n total number of values in the data set
k percentile being used
L locator that gives the position of a value
Pk kth percentile
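A hedged sketch of both directions follows. The function names and the small sample are hypothetical. The lecture gives only the locator L = (k/100)·n, so the rule for turning L into a data value is an assumption here (one common textbook convention): if L is a whole number, take the value midway between the Lth value and the next one; otherwise round L up and take that value.

```python
import math

def percentile_of_value(values, x):
    """Percentage of data values that are less than x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

def value_at_percentile(values, k):
    """Data value for the kth percentile (1 <= k <= 99), using L = k/100 * n."""
    ordered = sorted(values)
    n = len(ordered)
    locator = k * n / 100
    if locator == int(locator):
        # Assumed convention: whole-number L, average the Lth and (L+1)th values.
        i = int(locator)
        return (ordered[i - 1] + ordered[i]) / 2
    # Assumed convention: otherwise round L up to the next whole position.
    return ordered[math.ceil(locator) - 1]

sample = [60, 64, 68, 72, 72, 76, 80, 88]   # hypothetical pulse rates
print(percentile_of_value(sample, 76))
print(value_at_percentile(sample, 25))
print(value_at_percentile(sample, 40))
```

Statistical software often uses other interpolation rules, so results for small samples can differ slightly between tools.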
Geometric Mean
• The geometric mean is a mean or average which indicates a
central tendency of a set of numbers by using the product of
their values (as opposed to the arithmetic mean which uses
their sum).
• The geometric mean is defined as the nth root of the product
of n numbers, i.e., for a set of numbers a1, a2, ..., an, the
geometric mean is defined as
$$\left(\prod_{i=1}^{n} a_i\right)^{1/n} = \sqrt[n]{a_1 a_2 \cdots a_n}$$
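A minimal sketch of the geometric mean, computed two equivalent ways: directly as the nth root of the product, and as the exponential of the mean of the logarithms (which connects to the lecture's remark about logarithmic transformation). The function names and the sample values are hypothetical, and the values are assumed to be positive.

```python
import math

def geometric_mean(values):
    """nth root of the product of n positive numbers."""
    return math.prod(values) ** (1 / len(values))

def geometric_mean_via_logs(values):
    """Same quantity: exp of the arithmetic mean of the natural logs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

titers = [2, 8, 32]   # hypothetical positive measurements; product is 512
gm = geometric_mean(titers)
gm_logs = geometric_mean_via_logs(titers)
print(gm, gm_logs)
```

The log-based form is usually preferred for long lists, because the raw product can become very large.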
Measures of Dispersion and Variability
Basic Concepts of Measures of Variation
1. Range
2. Interquartile range
3. Mean deviation
4. Variance
5. Standard deviation
6. Coefficient of variation
Range
• The range of a set of data values is the difference between
the maximum data value and the minimum data value.
Range = Maximum value − Minimum value
It is very sensitive to extreme values; therefore, it is not as
useful as other measures of variation.
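A short sketch illustrates how one extreme value changes the range. The first list is the five-value chip sample from this lecture; the added value of 60 is a hypothetical outlier.

```python
def data_range(values):
    """Maximum data value minus minimum data value."""
    return max(values) - min(values)

chips = [22, 22, 26, 24, 23]
with_outlier = chips + [60]   # hypothetical extreme value

print(data_range(chips))
print(data_range(with_outlier))
```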
Quartiles
• Quartiles are measures of location, denoted Q1, Q2, and Q3,
which divide a set of data into four groups with about 25% of
the values in each group.
1. Q1 (First Quartile) separates the bottom 25% of sorted
values from the top 75%.
2. Q2 (Second Quartile) same as the median; separates the bottom
50% of sorted values from the top 50%.
3. Q3 (Third Quartile) separates the bottom 75% of sorted
values from the top 25%.
Q1, Q2, and Q3 divide the ranked scores into four equal parts:
(minimum) 25% Q1 25% Q2 (median) 25% Q3 25% (maximum)
Interquartile range
• Interquartile Range (IQR): Q3 – Q1
• 10 - 90 Percentile Range: P90 – P10
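As a sketch, the quartiles can be computed as the 25th, 50th, and 75th percentiles, and the two ranges follow by subtraction. The helper `value_at_percentile`, its rounding rule, and the sample are assumptions (the same locator convention L = (k/100)·n, averaging two neighbours when L is a whole number). Textbooks and software differ on quartile conventions.

```python
import math

def value_at_percentile(values, k):
    """Data value for the kth percentile (1 <= k <= 99), using L = k/100 * n."""
    ordered = sorted(values)
    locator = k * len(ordered) / 100
    if locator == int(locator):
        i = int(locator)
        return (ordered[i - 1] + ordered[i]) / 2   # assumed convention
    return ordered[math.ceil(locator) - 1]          # assumed convention

sample = [60, 64, 68, 72, 72, 76, 80, 88]   # hypothetical pulse rates

q1 = value_at_percentile(sample, 25)
q2 = value_at_percentile(sample, 50)   # same as the median
q3 = value_at_percentile(sample, 75)
iqr = q3 - q1                                    # interquartile range
p10_p90 = value_at_percentile(sample, 90) - value_at_percentile(sample, 10)

print(q1, q2, q3, iqr, p10_p90)
```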
The Mean deviation
• The mean of the absolute values of the numerical differences
between the numbers of a set (such as statistical data) and
their mean or median.
$$\text{Sample mean deviation} = \frac{\sum |x - \bar{x}|}{n}$$
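A minimal sketch of the mean deviation about the mean, using the five chip counts from this lecture. The function name `mean_deviation` is illustrative.

```python
def mean_deviation(values):
    """Mean of the absolute deviations from the arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n

chips = [22, 22, 26, 24, 23]   # mean is 23.4
md = mean_deviation(chips)     # absolute deviations: 1.4, 1.4, 2.6, 0.6, 0.4
print(round(md, 2))
```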
The Variance
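• The sum of the squares of the deviations from the mean is
called the sum of squares (SS) and is defined as follows (for a
sample, x̄ and n replace µ and N):
$$SS = \sum (x - \mu)^2$$
• The mean sum of squares is called the variance (mean square
deviation); for a population it is denoted by $\sigma^2$ and
defined by
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
• The sample variance is
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$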
The standard deviation
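• The standard deviation is the positive square root of the
variance. The formula for the standard deviation for a
population is given as:
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
• For a sample it is
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$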
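The sketch below works through the sum of squares, variance, and standard deviation for the five chip counts from this lecture, treating them first as a whole population (divide by N) and then as a sample (divide by n − 1, the usual sample convention, assumed here). It cross-checks against Python's standard `statistics` module.

```python
import math
import statistics

chips = [22, 22, 26, 24, 23]
n = len(chips)
mean = sum(chips) / n

# Sum of squares: squared deviations from the mean, added up.
ss = sum((x - mean) ** 2 for x in chips)

population_variance = ss / n         # sigma squared
sample_variance = ss / (n - 1)       # s squared
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(round(ss, 2), round(population_variance, 2), round(sample_variance, 2))
print(round(population_sd, 4), round(sample_sd, 4))

# The standard library gives the same results.
print(statistics.pvariance(chips), statistics.variance(chips))
```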
The coefficient of variation
• The coefficient of variation (or CV) for a set of nonnegative
sample or population data, expressed as a percent, describes
the standard deviation relative to the mean:
Sample:
$$CV = \frac{s}{\bar{x}} \cdot 100\%$$
Population:
$$CV = \frac{\sigma}{\mu} \cdot 100\%$$
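A short sketch of the coefficient of variation, again with the five chip counts from this lecture. It relies on the standard `statistics` module (`stdev` divides by n − 1, `pstdev` divides by N); the function names are illustrative.

```python
import statistics

def cv_sample(values):
    """Sample CV in percent: (s / x-bar) * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100

def cv_population(values):
    """Population CV in percent: (sigma / mu) * 100."""
    return statistics.pstdev(values) / statistics.mean(values) * 100

chips = [22, 22, 26, 24, 23]
print(round(cv_sample(chips), 2), round(cv_population(chips), 2))
```

Because the CV has no units, it can be used to compare the variation of data sets measured on different scales.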