1 Lecture-2

Uploaded by

ms5202932

Course Name

Medical Statistics
(Medicine, Prevention, Nursing)

Lecture-2
2025/03/05
Lecture Contents

• Some definitions

• Chapter-2: Descriptive Statistics


Some Definitions

• Center

• Variation

• Distribution

• Outliers

• Frequency Distribution
Definitions

• Center: A representative or average value that indicates where


the middle of the data set is located.

• Variation: A measure of the amount that the data values vary.

• Distribution: The nature or shape of the spread of data over


the range of values (such as bell-shaped, uniform, or skewed).

• Outliers: Sample values that lie very far away from the vast
majority of other sample values.

• Time: Changing characteristics of the data over time.


The frequency for a particular class is the number of original values that fall
into that class.
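• Lower Class Limits: These are the smallest numbers that can
actually belong to the different classes. For the pulse-rate
classes 60-69, 70-79, 80-89, 90-99, 100-109, 110-119, and 120-129
used in this lecture, the lower class limits are 60, 70, 80, 90,
100, 110, and 120.

• Upper Class Limits: These are the largest numbers that can
actually belong to the different classes. For the same classes,
the upper class limits are 69, 79, 89, 99, 109, 119, and 129.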
• Class Boundaries: These are the numbers used to separate
classes but without the gaps created by class limits. For the
pulse-rate classes, the class boundaries are 59.5, 69.5, 79.5,
89.5, 99.5, 109.5, 119.5, and 129.5.
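• Class Midpoints: These are the values in the middle of the
classes. Each class midpoint is found by adding the lower class
limit to the upper class limit and dividing the sum by 2. For the
pulse-rate classes, the class midpoints are 64.5, 74.5, 84.5,
94.5, 104.5, 114.5, and 124.5.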
• Class Width: It is the difference between two consecutive
lower class limits or two consecutive lower class boundaries.
For the pulse-rate classes, the class width is 10 for every
class (for example, 70 - 60 = 10).
• Relative Frequency: The relative frequency of a class is the
portion or percentage of the data that falls in that class. To
find the relative frequency of a class, divide the frequency by
the sample size.

• Cumulative Frequency: The cumulative frequency of a class is


the sum of the frequency for that class and all previous
classes.
Frequency Distribution or Frequency Table

• Frequency Distribution or Frequency Table: It shows how a data


set is partitioned among all of several categories (or classes)
by listing all of the categories along with the number of data
values in each of the categories.
Reasons for Constructing Frequency Distributions

• Large data sets can be summarized.

• We can analyze the nature of the data.

• We have a basis for constructing important graphs.


Key Concept

• When working with large data sets, it is often helpful to


organize and summarize data by constructing a table called a
frequency distribution.

• Because computer software and calculators can generate


frequency distributions, the details of constructing them are
not as important as what they tell us about data sets. It
helps us understand the nature of the distribution of a data
set.
Constructing a Frequency Distribution

• Determine the number of classes (should be between 5 and 20).

• Calculate the class width (round up).


$$\text{Class width} \approx \frac{\text{Maximum value} - \text{Minimum value}}{\text{Number of classes}}$$

• Starting point: Choose the minimum data value or a convenient


value below it as the first lower class limit.

• Using the first lower class limit and class width, proceed to
list the other lower class limits.

• List the lower-class limits in a vertical column and proceed to


enter the upper-class limits.

• Take each individual data value and put a tally mark in the
appropriate class. Add the tally marks to get the frequency.
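
The construction steps above can be sketched in Python. This is only a minimal sketch, assuming integer data and taking the minimum value as the first lower class limit. The function name `frequency_distribution` and the extra widening check are illustrative assumptions, not part of the lecture.

```python
import math

# Female pulse rates from the example used in this lecture (40 values).
pulse = [76, 72, 88, 60, 72, 68, 80, 64, 68, 68, 80, 76, 68, 72, 96, 72, 68, 72, 64, 80,
         64, 80, 76, 76, 76, 80, 104, 88, 60, 76, 72, 72, 88, 80, 60, 72, 88, 88, 129, 64]


def frequency_distribution(values, num_classes):
    """Return (lower limit, upper limit, frequency) rows for integer data."""
    lowest, highest = min(values), max(values)
    # Class width = (maximum - minimum) / number of classes, rounded up.
    width = math.ceil((highest - lowest) / num_classes)
    # Assumed safeguard (not in the lecture): widen by one if the classes
    # would otherwise stop short of the maximum value.
    if lowest + width * num_classes <= highest:
        width += 1
    rows = []
    for i in range(num_classes):
        lower = lowest + i * width
        upper = lower + width - 1  # integer data, so limits are 1 apart
        freq = sum(1 for v in values if lower <= v <= upper)  # the "tally"
        rows.append((lower, upper, freq))
    return rows


table = frequency_distribution(pulse, 7)
for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")
```

With 7 classes the width is (129 - 60) / 7, which rounds up to 10, so the classes start at 60, 70, 80, and so on.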
Relative Frequency Distribution
• Includes the same class limits as a frequency distribution,
but the frequency of a class is replaced with a relative
frequency (a proportion) or a percentage frequency (a percent).

$$\text{Relative frequency} = \frac{\text{Class frequency}}{\text{Sum of all frequencies}}$$

$$\text{Percentage frequency} = \frac{\text{Class frequency}}{\text{Sum of all frequencies}} \times 100\%$$
Frequency Distribution &
Relative frequency Distribution
• Example: Using the pulse rate of females given in Table 2.2,
follow the procedure to construct the frequency distribution
and relative frequency distribution with 7 classes.

76 72 88 60 72 68 80 64 68 68 80 76 68 72 96 72 68 72 64 80

64 80 76 76 76 80 104 88 60 76 72 72 88 80 60 72 88 88 129 64
Class interval   Frequency   Relative frequency
60-69            12          0.300
70-79            14          0.350
80-89            11          0.275
90-99             1          0.025
100-109           1          0.025
110-119           0          0.000
120-129           1          0.025
Total            40          1.000
Relative frequency Distribution

Total Frequency = 40

* For the class 60-69: 12/40 × 100% = 30%
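
The relative and percentage frequencies of every class can be computed the same way, by dividing each class frequency by the sum of all frequencies. The following is a small sketch that uses the class frequencies of the pulse-rate example; the variable names are illustrative.

```python
# Class frequencies from the pulse-rate example in this lecture.
frequencies = {
    "60-69": 12, "70-79": 14, "80-89": 11, "90-99": 1,
    "100-109": 1, "110-119": 0, "120-129": 1,
}
total = sum(frequencies.values())  # sum of all frequencies

# Relative frequency = class frequency / sum of all frequencies.
relative = {c: f / total for c, f in frequencies.items()}
# Percentage frequency = relative frequency expressed as a percent.
percentage = {c: 100 * f / total for c, f in frequencies.items()}

for c in frequencies:
    print(f"{c:>7}  {frequencies[c]:>2}  {relative[c]:.3f}  {percentage[c]:.1f}%")
```

The relative frequencies should add up to 1 (or 100%), which is a quick check on the table.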


Cumulative frequency Distribution

Cumulative Frequencies
Frequency tables
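
As defined earlier, the cumulative frequency of a class is the sum of the frequency for that class and all previous classes. A short sketch for the pulse-rate classes, assuming the classes are listed in increasing order:

```python
from itertools import accumulate

# Class frequencies for 60-69, 70-79, ..., 120-129 (pulse-rate example).
frequencies = [12, 14, 11, 1, 1, 0, 1]

# Running total: each entry adds the current class to all previous classes.
cumulative = list(accumulate(frequencies))
print(cumulative)
```

The last cumulative frequency always equals the total number of values (here 40).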
Graphs of Frequency Distribution

• Sometimes it is easier to identify patterns of a data set by


looking at a graph of the frequency distribution. One such
graph is a frequency histogram.

• A histogram is a graph consisting of bars of equal width drawn
adjacent to each other (without gaps). The horizontal scale
represents the classes of quantitative data values and the
vertical scale represents the frequencies.
Frequency histogram

• It is a bar graph that represents the frequency distribution


of a data set.

• The heights of the bars correspond to the frequency values.

• A histogram has the following properties.

1. The horizontal scale is quantitative and measures the data


values.

2. The vertical scale measures the frequencies of the classes.

3. Consecutive bars must touch.


Frequency histogram

Basically a graphic version of a frequency distribution.


Histogram

• The bars on the horizontal scale are labelled with one of the
following.

1. Class boundaries

2. Class midpoints.

3. Lower class limits (introduces a small error).

Horizontal Scale for Histogram: Use class boundaries or class midpoints.

Vertical Scale for Histogram: Use the class frequencies.
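
As a rough illustration only, the frequencies of the pulse-rate example can be drawn as a text histogram. This sketch turns the graph on its side (classes run down the page, bar length is the frequency), so it is a stand-in for a plotted histogram and not a replacement. The function name `text_histogram` is an illustrative assumption.

```python
def text_histogram(labels, frequencies, symbol="#"):
    """Return one text line per class; bar length equals the class frequency."""
    width = max(len(label) for label in labels)
    return [
        f"{label:>{width}} | {symbol * freq} ({freq})"
        for label, freq in zip(labels, frequencies)
    ]


labels = ["60-69", "70-79", "80-89", "90-99", "100-109", "110-119", "120-129"]
frequencies = [12, 14, 11, 1, 1, 0, 1]

lines = text_histogram(labels, frequencies)
print("\n".join(lines))
```

Even in this crude form the shape of the distribution is visible: most values fall in the first three classes, and the single value in 120-129 stands apart.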


Relative Frequency histogram

A relative frequency histogram has the same shape and horizontal
scale as a histogram, but the vertical scale is marked with
relative frequencies instead of actual frequencies.
Example 2-1
Measures of Central Tendency

• A measure of central tendency is a value that represents a
typical, or central, entry of a data set. Three commonly used
measures of the central tendency of a population or sample are
the mean, the median, and the geometric mean.

• Normally distributed data can be summarized with the arithmetic
mean. Skewed data are better summarized with the median. Data
that become normally or approximately normally distributed after
a logarithmic transformation are summarized with the geometric
mean.
Measure of Center

• The value at the center or middle of a data set


Arithmetic Mean

• The measure of the center obtained by adding the values and


dividing the total by the number of values.

• What most people call an average.


Notation

Σ denotes the sum of a set of values.

x is the variable usually used to represent individual data values.

n represents the number of data values in a sample.

N represents the number of data values in a population.

x̄ is pronounced 'x-bar' and denotes the mean of a set of sample values:

$$\bar{x} = \frac{\sum x}{n}$$

µ is pronounced 'mu' and denotes the mean of all values in a population:

$$\mu = \frac{\sum x}{N}$$
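
As a small sketch, the sample mean (the sum of the values divided by n) can be computed for the 40 female pulse rates listed earlier in this lecture. The function name `arithmetic_mean` is illustrative.

```python
# Female pulse rates from the example used in this lecture (40 values).
pulse = [76, 72, 88, 60, 72, 68, 80, 64, 68, 68, 80, 76, 68, 72, 96, 72, 68, 72, 64, 80,
         64, 80, 76, 76, 76, 80, 104, 88, 60, 76, 72, 72, 88, 80, 60, 72, 88, 88, 129, 64]


def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


x_bar = arithmetic_mean(pulse)
print(sum(pulse), len(pulse), x_bar)
```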
Arithmetic Mean

Find the mean of the data summarized in the given frequency distribution table:

Pulse Rates of Females   Frequency
60 – 69                  12
70 – 79                  14
80 – 89                  11
90 – 99                   1
100 – 109                 1
110 – 119                 0
120 – 129                 1
Total                    40
Weighted mean
Mean for Grouped data
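• The weighted mean is the mean of a data set whose entries
have varying weights. A weighted mean is given by

$$\bar{x} = \frac{\sum (f \cdot x)}{\sum f}$$

where f is the weight of each entry x. For grouped data, x is
the class midpoint and f is the class frequency.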


Class interval   Frequency (f)   Class midpoint (x)   f · x
60 – 69          12               64.5                 774.0
70 – 79          14               74.5                1043.0
80 – 89          11               84.5                 929.5
90 – 99           1               94.5                  94.5
100 – 109         1              104.5                 104.5
110 – 119         0              114.5                   0.0
120 – 129         1              124.5                 124.5
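
The table can be completed by summing the f · x column and dividing by the sum of the frequencies. A minimal sketch of that calculation, with illustrative variable names:

```python
# Class limits and frequencies from the pulse-rate frequency distribution.
lower_limits = [60, 70, 80, 90, 100, 110, 120]
upper_limits = [69, 79, 89, 99, 109, 119, 129]
frequencies = [12, 14, 11, 1, 1, 0, 1]

# Class midpoint = (lower class limit + upper class limit) / 2.
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))  # sum of f * x
sum_f = sum(frequencies)                                     # sum of f
grouped_mean = sum_fx / sum_f

print(midpoints)
print(sum_fx, sum_f, grouped_mean)
```

The result, 3070.0 / 40 = 76.75, is an approximation of the mean of the original values, because every value in a class is treated as if it were equal to the class midpoint.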
Median

• The middle value when the original data values are arranged in
order of increasing (or decreasing) magnitude.

• Is not affected by an extreme value - is a resistant measure of


the center.
Median

Example: Find the median of the five sample values 22 chips, 22
chips, 26 chips, 24 chips, and 23 chips.

Solution: First sort the data by arranging them in order: 22, 22,
23, 24, 26. The number of values is odd, so the median is the
middle value: median = 23 chips.
Finding the Median

First sort the values (arrange them in order), then follow one of
these two procedures:

• If the number of data values is odd, the median is the number


located in the exact middle of the list.

• If the number of data values is even, the median is found by


computing the mean of the two middle numbers.
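
The two cases can be sketched as a short function. The name `median` is illustrative; the first call uses the chip counts from the example above, and the second uses the 40 female pulse rates listed earlier in this lecture as an even-sized data set.

```python
def median(values):
    """Middle value of the sorted data (mean of the two middle values if n is even)."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd n: exact middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: mean of two middle values


chips = [22, 22, 26, 24, 23]
pulse = [76, 72, 88, 60, 72, 68, 80, 64, 68, 68, 80, 76, 68, 72, 96, 72, 68, 72, 64, 80,
         64, 80, 76, 76, 76, 80, 104, 88, 60, 76, 72, 72, 88, 80, 60, 72, 88, 88, 129, 64]

print(median(chips))   # odd number of values
print(median(pulse))   # even number of values
```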
Mode

• The value that occurs with the greatest frequency.

• A data set can have one, more than one, or no mode.

• Bimodal: two data values occur with the same greatest


frequency.

• Multimodal: more than two data values occur with the same
greatest frequency.

• No Mode: no data value is repeated.


Mode

a. 5.40 1.10 0.42 0.73 0.48 1.10 → Mode is 1.10

b. 27 27 27 55 55 55 88 88 99 → Bimodal: 27 and 55

c. 1 2 3 6 7 8 9 10 → No mode
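
The three examples can be checked with a short sketch. It follows the definition used in this lecture (no mode when no data value is repeated); the function name `modes` is illustrative.

```python
from collections import Counter


def modes(values):
    """Return all values that occur with the greatest frequency (empty list if none repeats)."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # no data value is repeated: no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([5.40, 1.10, 0.42, 0.73, 0.48, 1.10]))     # one mode
print(modes([27, 27, 27, 55, 55, 55, 88, 88, 99]))     # bimodal
print(modes([1, 2, 3, 6, 7, 8, 9, 10]))                # no mode
```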
Percentiles

• Percentiles are measures of location. There are 99 percentiles


denoted P1, P2, . . . P99, which divide a set of data into 100
groups with about 1% of the values in each group.

• Finding the Percentile of a Data Value:

$$\text{Percentile of value } x = \frac{\text{Number of values less than } x}{\text{Total number of values}} \cdot 100$$


Converting from the kth Percentile
to the Corresponding Data Value
Notation

$$L = \frac{k}{100} \cdot n$$

n    total number of values in the data set
k    percentile being used
L    locator that gives the position of a value
Pk   kth percentile
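
A sketch of both directions follows. Two things here are assumptions and not stated in the surviving text of this lecture: the percentile of a value is taken as the number of values less than it, divided by the total number of values, times 100; and the locator L is used with a common textbook convention (if L is a whole number, average the Lth value and the next one; otherwise round L up and take that value). All function names are illustrative.

```python
# Female pulse rates from the example used in this lecture (40 values).
pulse = [76, 72, 88, 60, 72, 68, 80, 64, 68, 68, 80, 76, 68, 72, 96, 72, 68, 72, 64, 80,
         64, 80, 76, 76, 76, 80, 104, 88, 60, 76, 72, 72, 88, 80, 60, 72, 88, 88, 129, 64]


def percentile_of_value(values, x):
    """Assumed rule: (number of values less than x) / (total number of values) * 100."""
    below = sum(1 for v in values if v < x)
    return below / len(values) * 100


def locator(k, n):
    """L = (k / 100) * n, the position of the kth percentile in the sorted data."""
    return k / 100 * n


def value_at_percentile(values, k):
    """Assumed convention for turning L into a data value (k is an integer from 1 to 99)."""
    if not 1 <= k <= 99:
        raise ValueError("k must be between 1 and 99")
    ordered = sorted(values)
    whole, remainder = divmod(k * len(ordered), 100)
    if remainder == 0:
        # L is a whole number: average the Lth value and the next value.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # L is not whole: round up, i.e. take the value at position whole + 1.
    return ordered[whole]


print(percentile_of_value(pulse, 76))
print(locator(25, len(pulse)), value_at_percentile(pulse, 25))
```

Statistical software often uses other interpolation rules, so results for small data sets may differ slightly between tools.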
Geometric Mean
• The geometric mean is a mean or average which indicates a
central tendency of a set of numbers by using the product of
their values (as opposed to the arithmetic mean which uses
their sum).

• The geometric mean is defined as the nth root of the product
of n numbers, i.e., for a set of numbers a1, a2, ..., an, the
geometric mean is defined as

$$G = \left( \prod_{i=1}^{n} a_i \right)^{1/n} = \sqrt[n]{a_1 a_2 \cdots a_n}$$
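
The nth root of the product can be computed through logarithms, which also shows why the geometric mean suits data that are roughly normal after a logarithmic transformation: it is the back-transformed arithmetic mean of the logged values. A minimal sketch, assuming all values are positive; the function name and the sample numbers are illustrative.

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive numbers, computed via logarithms."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean requires a non-empty set of positive values")
    mean_log = sum(math.log(v) for v in values) / len(values)
    return math.exp(mean_log)  # back-transform the mean of the logs


print(geometric_mean([2, 8]))         # square root of 2 * 8
print(geometric_mean([1, 10, 100]))   # cube root of 1 * 10 * 100
```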
Measures of Dispersion and Variability

Basic Concepts of Measures of Variation

1. Range

2. Interquartile range

3. Mean deviation

4. Variance

5. Standard deviation

6. Coefficient of variation
Range

• The range of a set of data values is the difference between


the maximum data value and the minimum data value.

Range = Maximum value – Minimum value

The range is very sensitive to extreme values; therefore it is
not as useful as other measures of variation.
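
The sensitivity to extreme values can be seen with the 40 female pulse rates listed earlier in this lecture. In this sketch, dropping the single largest value is done only for illustration.

```python
# Female pulse rates from the example used in this lecture (40 values).
pulse = [76, 72, 88, 60, 72, 68, 80, 64, 68, 68, 80, 76, 68, 72, 96, 72, 68, 72, 64, 80,
         64, 80, 76, 76, 76, 80, 104, 88, 60, 76, 72, 72, 88, 80, 60, 72, 88, 88, 129, 64]

# Range = maximum value - minimum value.
full_range = max(pulse) - min(pulse)

# Same data with the single largest value (129) left out.
without_max = sorted(pulse)[:-1]
reduced_range = max(without_max) - min(without_max)

print(full_range, reduced_range)
```

One value out of 40 changes the range from 69 to 44, while the other 39 values are untouched.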
Quartiles

• Quartiles are measures of location, denoted Q1, Q2, and Q3,


which divide a set of data into four groups with about 25% of
the values in each group.

1. Q1 (First Quartile) separates the bottom 25% of sorted


values from the top 75%.

2. Q2 (Second Quartile) same as the median; separates the bottom


50% of sorted values from the top 50%.

3. Q3 (Third Quartile) separates the bottom 75% of sorted


values from the top 25%.
Q1, Q2, and Q3 divide the ranked scores into four equal parts:

(minimum) | 25% | Q1 | 25% | Q2 (median) | 25% | Q3 | 25% | (maximum)
Interquartile range

• Interquartile Range (IQR): Q3 – Q1

• 10 - 90 Percentile Range: P90 – P10
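
A sketch using Python's standard `statistics.quantiles` function on the 40 female pulse rates listed earlier in this lecture. The interquartile range is taken here as Q3 – Q1, the usual definition. The function's default interpolation method is an assumption of this sketch; other software and textbooks use slightly different rules, so quartiles of small data sets can differ a little.

```python
import statistics

# Female pulse rates from the example used in this lecture (40 values).
pulse = [76, 72, 88, 60, 72, 68, 80, 64, 68, 68, 80, 76, 68, 72, 96, 72, 68, 72, 64, 80,
         64, 80, 76, 76, 76, 80, 104, 88, 60, 76, 72, 72, 88, 80, 60, 72, 88, 88, 129, 64]

# n=4 returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(pulse, n=4)
iqr = q3 - q1

# n=10 returns nine cut points P10, P20, ..., P90.
deciles = statistics.quantiles(pulse, n=10)
p10, p90 = deciles[0], deciles[-1]
range_10_90 = p90 - p10

print(q1, q2, q3, iqr)
print(p10, p90, range_10_90)
```

Unlike the range, neither measure is affected by the single extreme value 129.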


The Mean deviation

• The mean of the absolute values of the numerical differences


between the numbers of a set (such as statistical data) and
their mean or median.

$$\text{Sample mean deviation} = \frac{\sum |x - \bar{x}|}{n}$$
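
A minimal sketch of the mean deviation about the mean (the mean of the absolute deviations), using the five chip counts from the median example. The function name `mean_deviation` is illustrative.

```python
def mean_deviation(values):
    """Mean of the absolute deviations of the values from their mean."""
    n = len(values)
    x_bar = sum(values) / n
    return sum(abs(x - x_bar) for x in values) / n


chips = [22, 22, 26, 24, 23]   # mean is 23.4
md = mean_deviation(chips)     # (1.4 + 1.4 + 2.6 + 0.6 + 0.4) / 5
print(md)
```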
The Variance

• The sum of the squares of the deviations from the mean is
called the sum of squares (SS) and is defined as follows:

$$SS = \sum (x - \mu)^2$$

• The mean sum of squares is called the variance (mean square
deviation); for a population it is denoted by σ² and defined by

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$

• The sample variance is

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
The standard deviation

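• The standard deviation is the positive square root of the
variance. The formula for the standard deviation for a
population is given as:

$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$

• For a sample it is

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$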
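
The variance and standard deviation can be sketched as follows. The sketch assumes the standard formulas: the sum of squared deviations is divided by N for a population and by n – 1 for a sample, and the standard deviation is the positive square root of the variance. The five chip counts from the median example are treated both ways purely for illustration, and the results are compared with Python's `statistics` module.

```python
import math
import statistics

chips = [22, 22, 26, 24, 23]
n = len(chips)
mean = sum(chips) / n

# Sum of squares: squared deviations from the mean, added up.
ss = sum((x - mean) ** 2 for x in chips)

population_variance = ss / n        # divide by N
sample_variance = ss / (n - 1)      # divide by n - 1

population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(ss, population_variance, sample_variance)
print(population_sd, sample_sd)
```

The sample values are always a little larger than the population values, because the same sum of squares is divided by a smaller number.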
The coefficient of variation

• The coefficient of variation (or CV) for a set of nonnegative


sample or population data, expressed as a percent, describes
the standard deviation relative to the mean:

Sample:

$$CV = \frac{s}{\bar{x}} \cdot 100\%$$

Population:

$$CV = \frac{\sigma}{\mu} \cdot 100\%$$
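
A short sketch of the coefficient of variation as the standard deviation divided by the mean, times 100%, using the five chip counts from the median example. The function name and the `population` flag are illustrative assumptions.

```python
import statistics


def coefficient_of_variation(values, population=False):
    """Standard deviation relative to the mean, expressed as a percent."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined when the mean is 0")
    # pstdev divides by N (population); stdev divides by n - 1 (sample).
    sd = statistics.pstdev(values) if population else statistics.stdev(values)
    return sd / mean * 100


chips = [22, 22, 26, 24, 23]
cv_sample = coefficient_of_variation(chips)
cv_population = coefficient_of_variation(chips, population=True)
print(round(cv_sample, 2), round(cv_population, 2))
```

Because the CV has no units, it can be used to compare the variation of data sets measured on different scales.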
