Descriptive statistics: Data Organization and Presentation
Tufa Kolola (MPH, Asst. Prof.)
Learning objectives
At the end of this session you will be able to:
• Present qualitative data using tabular methods
• Present qualitative data using graphical methods
• Present quantitative data using tabular methods
• Present quantitative data using graphical methods
Descriptive summary statistics
Descriptive statistics: techniques used to organize and summarize a set of data in a more comprehensible and meaningful way
– Organization of data
– Summarization of data
– Presentation of data
Numbers that have not been summarized and organized are called raw data
Raw data
Definition
Data that have been collected or recorded but have not yet been arranged or processed are called raw data
Example 1: Ages of 50 students in years
21 19 24 25 29 34 26 27 37 33
18 20 19 22 19 19 25 22 25 23
25 19 31 19 23 18 23 19 23 26
22 28 21 20 22 22 21 20 19 21
25 23 18 37 27 23 21 25 21 24
Example 2: Blood groups of a sample of 50 OPD patients
O AB A AB AB B O B B O
O O B O A O O A B B
A A AB O O O A O O B
A O O O A B O O A A
O A A B AB B O A O A
Ordered array
Ordered array: a simple arrangement of individual observations in order of magnitude
- Example: Ages of 50 students (sorted; read down each column)
18 19 19 21 22 23 23 25 26 31
18 19 20 21 22 23 24 25 27 33
18 19 20 21 22 23 24 25 27 34
19 19 20 21 22 23 25 25 28 37
19 19 21 21 22 23 25 26 29 37
Becomes very difficult to construct and read when the sample size is large
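As a small illustration, an ordered array can be produced by sorting the raw data. The sketch below uses the ages from Example 1; the variable names are chosen here for illustration only.

```python
# Ages of the 50 students from Example 1 (raw data, in the order recorded)
ages = [
    21, 19, 24, 25, 29, 34, 26, 27, 37, 33,
    18, 20, 19, 22, 19, 19, 25, 22, 25, 23,
    25, 19, 31, 19, 23, 18, 23, 19, 23, 26,
    22, 28, 21, 20, 22, 22, 21, 20, 19, 21,
    25, 23, 18, 37, 27, 23, 21, 25, 21, 24,
]

# An ordered array is the same set of observations sorted by magnitude
ordered = sorted(ages)

# Print ten values per row (this layout reads across each row)
for start in range(0, len(ordered), 10):
    print(" ".join(str(age) for age in ordered[start:start + 10]))
```

Sorting keeps every observation, so the ordered array still has 50 entries; only their order changes.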
Presentation of data
Data
– Qualitative data: tabular methods, graphical methods
– Quantitative data: tabular methods, graphical methods
Frequency distribution
Frequency distribution: a table that summarizes raw data into non-overlapping classes or categories, along with their corresponding class frequencies
Class frequency: the number of observations that fall into the class
The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data
The actual summarization and organization of
data starts from frequency distribution
The distribution condenses the raw data into a
more useful form and allows for a quick visual
interpretation of the data
Frequency distribution for categorical variables
Count the number of observations (frequency) in each category and present the counts as relative frequencies
Often presented as tables, bar charts and pie charts
Relative frequency: the value for any category, obtained by dividing the number of observations in that category by the total number of observations
- Class relative frequency = Class frequency / Total number of observations
This can be reported as a percentage by multiplying the resulting fraction by 100
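The relative-frequency calculation can be sketched on the blood group data from Example 2. This is a minimal illustration using Python's standard `collections.Counter`; the variable names are assumptions made for the example.

```python
from collections import Counter

# Blood groups of the 50 OPD patients from Example 2
blood_groups = (
    "O AB A AB AB B O B B O "
    "O O B O A O O A B B "
    "A A AB O O O A O O B "
    "A O O O A B O O A A "
    "O A A B AB B O A O A"
).split()

# Class frequency: number of observations in each category
frequency = Counter(blood_groups)
total = sum(frequency.values())

# Class relative frequency = class frequency / total number of observations
relative_frequency = {group: count / total for group, count in frequency.items()}

print(f"{'Blood group':<12}{'Frequency':>10}{'Percent':>10}")
for group, count in frequency.most_common():
    print(f"{group:<12}{count:>10}{100 * relative_frequency[group]:>10.1f}")
print(f"{'Total':<12}{total:>10}{100.0:>10.1f}")
```

The relative frequencies always sum to 1 (100%), which is a quick check that no observation was missed.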
A relative frequency distribution shows the proportion of counts that fall into each class or category
For nominal and ordinal data, frequency distributions are often used as a summary
The percentage of times that each value occurs, or the relative frequency, is often listed
Tables make it easier to see how the data are distributed
Example 1: Nominal data
Table 1: Type of hospitals owned by MOH in Ethiopia in 2006/07
Source: Health and health related indicator
Example 2: Ordinal data
Table 2: Level of satisfaction with nursing care among 475 psychiatric in-patients, 1991
Frequency Distribution
for numerical variables
A frequency distribution can also show the number
of observations at different values or within
certain ranges
There are two types of frequency distribution:
– Single value (ungrouped frequency)
– Interval type (classes) – grouped frequency
Ungrouped frequency distribution
Ungrouped frequency distribution: lists each distinct data value with its respective frequency
Can be used when the range of values in the data set is not large
Classes are one unit in width
Example:
Leisure time in hours per week for 40 college
students:
23 24 18 14 20 36 24 26 23 21 16 15 19 20
22 14 13 10 19 27 29 22 38 28 34 32 23 19
21 31 16 28 19 18 12 27 15 21 25 16
Construct a frequency distribution table.
Leisure time (hours) Frequency
10 1
12 1
13 1
14 2
15 2
16 3
18 2
19 4
20 2
21 3
22 2
23 3
24 2
25 1
26 1
27 2
28 2
29 1
31 1
32 1
34 1
36 1
38 1
Total 40
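The table above can be reproduced by counting how often each distinct value occurs. A minimal sketch, using the standard `collections.Counter` and variable names chosen for this example:

```python
from collections import Counter

# Leisure time in hours per week for the 40 college students
leisure_hours = [
    23, 24, 18, 14, 20, 36, 24, 26, 23, 21, 16, 15, 19, 20,
    22, 14, 13, 10, 19, 27, 29, 22, 38, 28, 34, 32, 23, 19,
    21, 31, 16, 28, 19, 18, 12, 27, 15, 21, 25, 16,
]

# One class per distinct value (classes are one unit wide)
frequency = Counter(leisure_hours)

print(f"{'Leisure time (hours)':<22}{'Frequency':>9}")
for value in sorted(frequency):
    print(f"{value:<22}{frequency[value]:>9}")
print(f"{'Total':<22}{sum(frequency.values()):>9}")
```

With 23 distinct values for only 40 observations, the table is long, which is why grouping is preferred when the range of values is large.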
Grouped Frequency
Distribution
Can be used when the range of values in the
data set is large
The data must be grouped into classes that are
more than one unit in width
Steps in constructing frequency distribution tables
Step 1: Determine the range of the data
- R = Highest value – Lowest value
Step 2: Determine the number of classes (K) and the corresponding class width (W). We may use Sturges' rule:
$K = 1 + 3.322 \log_{10} n$
$W = \dfrac{L - S}{K}$
where
K = number of class intervals
n = number of observations
W = width of the class interval
L = the largest value
S = the smallest value
Step 3: For each class, count the number of observations (class frequency)
Step 4: Determine the relative frequency for each class
Relative frequency = Frequency of each class interval / Total number of observations
Guidelines for constructing a frequency distribution:
The classes must be mutually exclusive (each observation falls into only one class)
The classes must be continuous (no gaps between consecutive classes)
The classes must be exhaustive (every observation falls into some class)
The classes must be equal in width
Example:
Leisure time (hours) per week for 40 college
students:
23 24 18 14 20 36 24 26 23 21 16 15 19 20
22 14 13 10 19 27 29 22 38 28 34 32 23 19
21 31 16 28 19 18 12 27 15 21 25 16
Maximum value = 38, Minimum value = 10
K = 1 + 3.322 (log 40) = 6.32 ≈ 6
Width = (38 − 10)/6 = 4.67 ≈ 5
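The steps can be sketched in code for the leisure-time data. This is an illustration under stated assumptions: the number of classes comes from Sturges' rule rounded to the nearest whole number, the width is rounded up, and the first class is assumed to start at the smallest observation (the slides do not state where the first class begins).

```python
import math

# Leisure time in hours per week for the 40 college students
leisure_hours = [
    23, 24, 18, 14, 20, 36, 24, 26, 23, 21, 16, 15, 19, 20,
    22, 14, 13, 10, 19, 27, 29, 22, 38, 28, 34, 32, 23, 19,
    21, 31, 16, 28, 19, 18, 12, 27, 15, 21, 25, 16,
]

n = len(leisure_hours)
smallest, largest = min(leisure_hours), max(leisure_hours)

# Step 2: number of classes by Sturges' rule, rounded to a whole number
k = round(1 + 3.322 * math.log10(n))
# Class width, rounded up so that k classes cover the whole range
width = math.ceil((largest - smallest) / k)

# Step 3: class frequencies; class i runs from smallest + i*width
# to smallest + (i + 1)*width - 1 (assumed starting point: the minimum)
counts = [0] * k
for hours in leisure_hours:
    counts[(hours - smallest) // width] += 1

# Step 4: relative frequency of each class
print(f"{'Class':<10}{'Frequency':>10}{'Rel. freq.':>12}")
for i, count in enumerate(counts):
    lower = smallest + i * width
    upper = lower + width - 1
    print(f"{f'{lower}-{upper}':<10}{count:>10}{count / n:>12.3f}")
print(f"{'Total':<10}{sum(counts):>10}{1.0:>12.3f}")
```

Other starting points or rounding choices give slightly different tables; the guidelines only require that the classes do not overlap and that every observation is covered.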
Cumulative frequency: the running total obtained by adding the frequency of a class to the frequencies of all preceding classes
Cumulative relative frequency: the proportion of the total number of observations that have a value less than or equal to the upper limit of the interval
Mid-point: the value that lies midway between the lower and the upper limits of a class
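These three quantities can be computed directly from a grouped table. The sketch below borrows the class limits and frequencies of the home-run example that appears later in these slides; the variable names are illustrative.

```python
from itertools import accumulate

# Class limits and frequencies from the home-run example in these slides
classes = [(124, 145), (146, 167), (168, 189), (190, 211), (212, 233)]
frequencies = [6, 13, 4, 4, 3]
total = sum(frequencies)

# Cumulative frequency: running total of the class frequencies
cumulative = list(accumulate(frequencies))
# Cumulative relative frequency: cumulative frequency / total
cumulative_relative = [c / total for c in cumulative]
# Mid-point: halfway between the lower and upper class limits
midpoints = [(lower + upper) / 2 for lower, upper in classes]

print(f"{'Class':<10}{'f':>4}{'Cum. f':>8}{'Cum. rel. f':>13}{'Mid-point':>11}")
for (lower, upper), f, cf, crf, mid in zip(
    classes, frequencies, cumulative, cumulative_relative, midpoints
):
    print(f"{f'{lower}-{upper}':<10}{f:>4}{cf:>8}{crf:>13.3f}{mid:>11}")
```

The last cumulative frequency always equals the total number of observations, and the last cumulative relative frequency equals 1.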
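True limits (class boundaries): the limits that make the intervals of a continuous variable continuous in both directions, leaving no gaps between adjacent classes
Used to close the gaps between class intervals
For classes recorded in whole units, subtract 0.5 from the lower limit and add 0.5 to the upper limit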
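As a sketch of the adjustment, the class limits of the home-run example used later in these slides can be converted to true limits. The half-gap is computed from the data rather than hard-coded as 0.5, which is an illustrative choice made here.

```python
# Class limits from the home-run example in these slides
classes = [(124, 145), (146, 167), (168, 189), (190, 211), (212, 233)]

# Gap between one class's upper limit and the next class's lower limit
gap = classes[1][0] - classes[0][1]   # 1 for whole-number data
half_gap = gap / 2                    # 0.5

# True limits: lower limit minus half the gap, upper limit plus half the gap
true_limits = [(lower - half_gap, upper + half_gap) for lower, upper in classes]

for (lower, upper), (true_lower, true_upper) in zip(classes, true_limits):
    print(f"{lower}-{upper}  ->  {true_lower}-{true_upper}")
```

After the adjustment, the upper true limit of each class coincides with the lower true limit of the next, so the classes are continuous.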
Guidelines for constructing tables
Tables should be self-explanatory
Include a clear title telling what, when and where
Clearly label the rows and columns
State clearly the unit of measurement used
Explain codes and abbreviations in a footnote
Show totals
If the data are not original, indicate the source in a footnote
Graphical presentation of data
Helps users obtain an intuitive feel for the data at a glance
Should be self-explanatory
Must have a descriptive title, labeled axes and an indication of the units of measurement
Importance of graphical presentation:
Diagrams are more attractive than mere figures
They give a quick overall impression of the data
They are more memorable than mere figures
They facilitate comparison
Used to understand patterns and trends
Well-designed graphs can be a powerful means of communicating a great deal of information
Poorly designed graphs not only convey the message ineffectively but are often misleading
Types of graphs
Categorical data
– Bar chart
– Pie-chart
Quantitative data
– Histogram
– Frequency Polygon
– Ogive
– Stem-and-leaf plot
– Box plot
– Scatter Diagram
Bar chart
Definition:
A graph made of bars whose heights represent
the frequencies of respective categories is called
a bar graph.
Used to display the frequencies in the frequency distribution of a categorical variable
Each bar represents one category, and its height is the frequency or relative frequency
o y-axis: frequency, relative frequency or percentage
o x-axis: category
Bar chart
Rules
o Bars should be separated
o The gap between each bar is uniform
o All bars should be of the same width
o All the bars should rest on the same line, called the base
o The y-axis should begin at 0
o Label both axes clearly
Simple bar chart
The simple bar chart is appropriate if only one
variable is to be shown
[Bar chart; y-axis: Percentage; x-axis: First trimester, Second trimester, Third trimester; bar labels: 53.9, 40.6, 5.5]
Figure 1: First ANC booking time among pregnant women in X Town, Ethiopia, 2017
Clustered bar chart
[Clustered bar chart; y-axis: Percent; x-axis: Residence (Urban, Rural); legend: First day, Second and subsequent days; bar labels: 90, 74.3, 25.7, 10.0]
Figure 2: Timing of health care seeking reported by place of residence, X District, Ethiopia, 2011.
Pie-chart
Pie chart: a circle that is divided into sections according to the percentage of frequencies in each category of the distribution
Used to show the relative frequencies of a single categorical variable
Each slice of the pie corresponds to the relative frequency of one category of the variable
Steps to construct a pie-chart
Construct a frequency table
Change the frequencies into percentages (P)
Change the percentages into degrees, where: degrees = (P / 100) × 360°
Draw a circle and divide it accordingly
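The conversion from frequencies to slice angles can be sketched as follows. The counts are tallied from the blood group data in Example 2 (O = 21, A = 14, B = 10, AB = 5); the variable names are illustrative.

```python
# Frequency table: blood group counts tallied from Example 2
frequencies = {"O": 21, "A": 14, "B": 10, "AB": 5}
total = sum(frequencies.values())

# Change the frequencies into percentages
percentages = {group: 100 * f / total for group, f in frequencies.items()}

# Change the percentages into degrees: degrees = (P / 100) * 360
angles = {group: p / 100 * 360 for group, p in percentages.items()}

for group in frequencies:
    print(f"{group:<3} {percentages[group]:5.1f}%  {angles[group]:6.1f} degrees")
```

The angles should add up to 360 degrees, which confirms that the slices fill the whole circle.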
Example
[Pie chart: circulatory system 42%, neoplasms 30%, respiratory system 13%, others 8%, digestive system 4%, injury and poisoning 3%]
Figure 3: Distribution of causes of death for females in England and Wales, 1989
Histogram
Histograms are frequency distributions with
continuous class intervals that have been
turned into graphs
To construct a histogram, we draw the interval
boundaries on a horizontal line and the
frequencies on a vertical line
In a histogram, the bars are drawn adjacent to each other, touching, to show the underlying continuity of the data
The area of each bar is proportional to the frequency of observations in the interval
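The area property can be checked numerically. The sketch below uses the class boundaries and frequencies of the home-run example in these slides and assumes, as in that example, that all classes have the same width, so the bar height can simply be the class frequency. The text rendering is only a rough stand-in for a drawn histogram.

```python
# Class boundaries and frequencies from the home-run example in these slides
boundaries = [123.5, 145.5, 167.5, 189.5, 211.5, 233.5]
frequencies = [6, 13, 4, 4, 3]

intervals = list(zip(boundaries, boundaries[1:]))
widths = [upper - lower for lower, upper in intervals]

# With equal class widths, bar height = class frequency, so area = width * frequency
areas = [width * f for width, f in zip(widths, frequencies)]

# Rough text rendering: one '#' per observation, one row per class
for (lower, upper), f, area in zip(intervals, frequencies, areas):
    print(f"{lower:>5} to {upper:<5} | {'#' * f:<13} f={f:<2} area={area}")

# Area divided by frequency is the same constant (the class width) for every bar
ratios = [area / f for area, f in zip(areas, frequencies)]
```

If the classes had unequal widths, the heights would have to be rescaled (frequency divided by width) to keep the areas proportional to the frequencies.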
Example
Using the following frequency distribution of the
home runs hit by Major League Baseball teams
during the 2002 season, construct the histogram
Total home runs    f
124 – 145          6
146 – 167          13
168 – 189          4
190 – 211          4
212 – 233          3
Class boundaries, frequency and cumulative frequency distributions
Total home runs    Class boundaries    Frequency    Cumulative frequency
124 – 145          123.5 – 145.5       6            6
146 – 167          145.5 – 167.5       13           19
168 – 189          167.5 – 189.5       4            23
190 – 211          189.5 – 211.5       4            27
212 – 233          211.5 – 233.5       3            30
Total                                  30
[Histogram; y-axis: Frequency; x-axis: class boundaries 123.5, 145.5, 167.5, 189.5, 211.5, 233.5]
Figure 4: Total home runs hit by all players of each of the 30 Major League Baseball teams during the 2002 season
Frequency polygon
Frequency polygon: a graph formed by joining the midpoints of the tops of successive bars in a histogram with straight lines
With classes of equal width and the polygon closed on the horizontal axis at both ends, the total area under the frequency polygon is equal to the area under the histogram
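The area claim can be verified with a short calculation. The sketch below uses the midpoints and frequencies of the home-run example in these slides and assumes the polygon is closed by adding a zero-frequency point one class width before the first midpoint and one after the last.

```python
# Midpoints and frequencies from the home-run example in these slides
midpoints = [134.5, 156.5, 178.5, 200.5, 222.5]
frequencies = [6, 13, 4, 4, 3]
width = 22.0  # common class width (e.g. 145.5 - 123.5)

# Close the polygon on the horizontal axis at both ends (assumption stated above)
xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
ys = [0] + frequencies + [0]

# Area under the polygon: sum of trapezoids between consecutive vertices
polygon_area = sum(
    (x2 - x1) * (y1 + y2) / 2
    for x1, x2, y1, y2 in zip(xs, xs[1:], ys, ys[1:])
)

# Area under the histogram: class width times total frequency
histogram_area = width * sum(frequencies)

print(f"Polygon vertices: {list(zip(xs, ys))}")
print(f"Area under polygon:   {polygon_area}")
print(f"Area under histogram: {histogram_area}")
```

Both areas come out the same because each triangle cut off a bar by the polygon is matched by an equal triangle added next to it.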
[Frequency polygon; y-axis: Frequency; x-axis: class midpoints 134.5, 156.5, 178.5, 200.5, 222.5]
Figure 5: Total home runs hit by all players of each of the 30 Major League Baseball teams during the 2002 season
Ogive
Ogive: a graph of the cumulative frequency distribution, drawn by joining with straight lines the points marked above the upper boundaries of the classes at heights equal to the cumulative frequencies of the respective classes
It is obtained as follows:
On the vertical axis we mark the cumulative frequency
On the horizontal axis we mark the upper boundaries of all classes; the lower boundary of the first class is the starting point, with a cumulative frequency of 0
Then the points are joined with straight lines
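The points of an ogive can be listed directly from the class boundaries and frequencies. The sketch below uses the home-run example from these slides; the helper `cumulative_at` is a hypothetical function added here to show how a value can be read off the straight-line segments.

```python
from itertools import accumulate

# Class boundaries and frequencies from the home-run example in these slides
boundaries = [123.5, 145.5, 167.5, 189.5, 211.5, 233.5]
frequencies = [6, 13, 4, 4, 3]

# Start at the lower boundary of the first class with cumulative frequency 0,
# then plot each cumulative frequency above its class's upper boundary
points = list(zip(boundaries, [0] + list(accumulate(frequencies))))


def cumulative_at(x):
    """Read the ogive at x by linear interpolation (hypothetical helper)."""
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        if x1 <= x <= x2:
            return y1 + (y2 - y1) * (x - x1) / (x2 - x1)
    raise ValueError("x lies outside the range of the class boundaries")


for x, cf in points:
    print(f"{x:>6}  {cf:>3}")
print(cumulative_at(156.5))  # estimated number of teams with at most 156.5 home runs
```

Reading between boundaries assumes the observations are spread evenly within each class, so such values are estimates rather than exact counts.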
[Ogive; y-axis: Cumulative frequency; x-axis: class boundaries 123.5, 145.5, 167.5, 189.5, 211.5, 233.5]
Figure 6: Total home runs hit by all players of each of the 30 Major League Baseball teams during the 2002 season
Stem-and-leaf plot
Another common tool for visually displaying continuous data is the "stem-and-leaf" plot
Allows for easier identification of individual values in the sample
Very similar to a histogram
Most effective with relatively small data sets
Helps to understand the nature of the data
– Presence or absence of symmetry
Stem-and-leaf plot
Can be constructed as follows:
(1) Separate each data point into a stem component
and a leaf component
The stem component consists of the number
formed by all but the rightmost digit of the
number, and the leaf component consists of the
rightmost digit. Thus the stem of the number
483 is 48, and the leaf is 3
(2) Write the smallest stem in the data set in the
upper left-hand corner of the plot
Example: birth weights from 100 consecutive deliveries
[Data table and stem-and-leaf plot for the birth weight data (N = 100; columns: Stem, Leaves) are not recoverable here]
Stem-and-leaf plot construction (continued):
(3) Write the second stem, which equals the first stem + 1, below the first stem
(4) Continue with step 3 until you reach the largest stem in the data set
(5) Draw a vertical bar to the right of the column of stems
(6) For each number in the data set, find the appropriate stem and write the leaf to the right of the vertical bar
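The construction steps can be sketched in code. The example below applies them to the ages of the 50 students from Example 1 (not to the birth weight data); sorting the leaves within each stem is an added convenience, not one of the listed steps.

```python
from collections import defaultdict

# Ages of the 50 students from Example 1
ages = [
    21, 19, 24, 25, 29, 34, 26, 27, 37, 33,
    18, 20, 19, 22, 19, 19, 25, 22, 25, 23,
    25, 19, 31, 19, 23, 18, 23, 19, 23, 26,
    22, 28, 21, 20, 22, 22, 21, 20, 19, 21,
    25, 23, 18, 37, 27, 23, 21, 25, 21, 24,
]

# Step 1: split each value into a stem (all but the last digit) and a leaf (last digit)
leaves_by_stem = defaultdict(list)
for value in sorted(ages):
    stem, leaf = divmod(value, 10)
    leaves_by_stem[stem].append(leaf)

# Steps 2 to 4: list every stem from the smallest to the largest, one per row
# Steps 5 and 6: draw the vertical bar and write the leaves to its right
lines = []
for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
    leaves = "".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
    lines.append(f"{stem} | {leaves}")

print("\n".join(lines))
```

Turned on its side, the rows of leaves resemble the bars of a histogram, while every individual age remains readable.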
Box plot
One way to give a nice profile of a data set is the
box plot
Gives good insight into distribution shape in terms
of skewness and outlying values
A convenient tool for comparing the distribution of continuous data across multiple groups, since box plots can be plotted side by side
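A box plot is drawn from five summary values: the smallest observation, the 25th percentile, the median, the 75th percentile and the largest observation. The sketch below computes them for the ages from Example 1 (the blood pressure data are not available here) using `statistics.quantiles` from the Python standard library with its default method; other software may use slightly different quartile conventions.

```python
import statistics

# Ages of the 50 students from Example 1
ages = [
    21, 19, 24, 25, 29, 34, 26, 27, 37, 33,
    18, 20, 19, 22, 19, 19, 25, 22, 25, 23,
    25, 19, 31, 19, 23, 18, 23, 19, 23, 26,
    22, 28, 21, 20, 22, 22, 21, 20, 19, 21,
    25, 23, 18, 37, 27, 23, 21, 25, 21, 24,
]

# Quartiles: 25th percentile, median (50th) and 75th percentile
q1, median, q3 = statistics.quantiles(ages, n=4)

# The five values from which the box and whiskers are drawn
summary = {
    "smallest": min(ages),
    "25th percentile": q1,
    "median": median,
    "75th percentile": q3,
    "largest": max(ages),
}

for name, value in summary.items():
    print(f"{name:<16} {value}")
```

Here the median sits closer to the 25th percentile than the largest value does to the 75th, which hints at a longer upper tail (right skew).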
Box plot: BP for 113 males
[Figure: box plot of systolic blood pressures for a sample of 113 men; axis label: Blood Pressure; labeled features: sample median, 25th and 75th percentiles, smallest and largest observations]
Tabular and graphical procedures
Qualitative data
– Tabular methods: frequency distribution, relative frequency distribution, cumulative frequency distribution, cumulative relative frequency distribution
– Graphical methods: bar graph, pie chart
Quantitative data
– Tabular methods: frequency distribution, relative frequency distribution, cumulative frequency distribution, cumulative relative frequency distribution
– Graphical methods: histogram, frequency polygon, ogive, scatter diagram