Descriptive Statistics
Part 1: Frequency Distributions and
Their Graphs
Outline
• 2.1 Frequency Distributions and Their Graphs
• 2.2 Measures of Central Tendency
• 2.3 Measures of Variation
List of symbols used
Overview
Descriptive Statistics
• Describes the important characteristics of a set of
data.
• Organizes, presents, and summarizes data:
1. Graphically
2. Numerically
Important Characteristics of
Quantitative Data
“Shape, Center, and Spread”
• Center: A representative or average value that
indicates where the middle of the data set is located.
• Variation (spread): A measure of the amount that the values
vary among themselves.
• Distribution (shape): The nature or shape of the distribution
of data (such as bell-shaped, uniform, or skewed).
“Shape” of Distributions
Symmetric
• Data is symmetric if the left half of its histogram is
roughly a mirror image of its right half.
Skewed
• Data is skewed if it is not symmetric and if it
extends more to one side than the other.
Uniform
• Data is uniform if it is equally distributed (on a
histogram, all the bars are the same height or
approximately the same height).
The Shape of Distributions
[Figure: four example histograms labeled Symmetric, Uniform, Skewed Left, and Skewed Right]
Outliers
Outliers
• Unusual data values as compared to the rest of the set.
They may be distinguished by gaps in a histogram.
Section 2.1
Frequency Distributions
and Their Graphs
Frequency Distributions
Frequency Distribution
• A table that organizes data values into classes or
intervals, along with the number of values that fall in
each class (frequency, f ).
1. Ungrouped Frequency Distribution: for data
sets with few different values. Each value is in
its own class.
2. Grouped Frequency Distribution: for data sets
with many different values, which are grouped
together in classes.
Grouped and Ungrouped
Frequency Distributions
Ungrouped
Courses Taken    Frequency, f
1                25
2                38
3                217
4                1462
5                932
6                15

Grouped
Age of Voters    Frequency, f
18-30            202
31-42            508
43-54            620
55-66            413
67-78            158
79-90            32
Ungrouped Frequency Distributions:
step by step
1. RAW DATA
Number of Peas in a Pod (Sample Size: 50)
5 5 4 6 4
3 7 6 3 5
6 5 4 5 5
6 2 3 5 5
5 5 7 4 3
4 5 4 5 6
5 1 6 2 6
6 6 6 6 4
4 5 4 5 3
5 5 7 6 5

2. DEVELOP TABLE
Peas per pod    Freq, f
(one row for each value, 1 through 7)

3. FILL IN DATA
Peas per pod    Freq, f
1               1
2               2
3               5
4               9
5               18
6               12
7               3
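The tally in step 3 can be reproduced in a few lines. This is a minimal Python sketch, not part of the original slides; the variable names `raw` and `freq` are chosen here for illustration, and the 50 values are the raw pea counts listed above.

```python
from collections import Counter

# Raw data from the slide: number of peas in each of 50 pods.
raw = [
    5, 5, 4, 6, 4,
    3, 7, 6, 3, 5,
    6, 5, 4, 5, 5,
    6, 2, 3, 5, 5,
    5, 5, 7, 4, 3,
    4, 5, 4, 5, 6,
    5, 1, 6, 2, 6,
    6, 6, 6, 6, 4,
    4, 5, 4, 5, 3,
    5, 5, 7, 6, 5,
]

# Each distinct value is its own class; Counter tallies the frequency f.
freq = Counter(raw)

print("Peas per pod  Freq, f")
for value, f in sorted(freq.items()):
    print(f"{value:>12}  {f:>7}")
```

The frequencies should add up to the sample size, which is a quick check that no value was missed.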
Graphs of Frequency Distributions:
Frequency Histograms
Frequency Histogram
• A bar graph that represents the frequency distribution.
• The horizontal scale is quantitative and measures the
data values.
• The vertical scale measures the frequencies of the
classes.
• Consecutive bars must touch.
Larson/Farber 4th ed.
Frequency Histogram
Example. Peas per Pod
Number of Peas in a Pod
Peas per pod    Freq, f
1               1
2               2
3               5
4               9
5               18
6               12
7               3
[Histogram: Number of Peas on the horizontal axis (1 to 7), Frequency, f on the vertical axis (0 to 20)]
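As a rough stand-in for the histogram figure, the frequencies can be drawn as text bars. This is only a sketch: the helper name `text_histogram` is hypothetical, and the bars run horizontally rather than vertically, so it shows the shape of the distribution but is not a substitute for a properly drawn histogram.

```python
# Frequencies from the peas-per-pod table on the slide.
pea_freq = {1: 1, 2: 2, 3: 5, 4: 9, 5: 18, 6: 12, 7: 3}

def text_histogram(freq):
    """Return one text bar per class; bar length equals the frequency."""
    return [f"{value} | {'#' * count} ({count})"
            for value, count in sorted(freq.items())]

for line in text_histogram(pea_freq):
    print(line)
```

Because the classes are consecutive whole numbers, the rows sit directly next to each other, mirroring the rule that consecutive bars must touch.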
Relative Frequency Distributions and
Relative Frequency Histograms
Relative Frequency Distribution
• Shows the portion or percentage of the data that falls
in a particular class.
• $\text{relative frequency} = \dfrac{\text{class frequency}}{\text{sample size}} = \dfrac{f}{n}$
Relative Frequency Histogram
• Has the same shape and the same horizontal scale as
the corresponding frequency histogram.
• The vertical scale measures the relative frequencies,
not frequencies.
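The relative frequency formula can be applied to the peas-per-pod frequencies from the earlier example. This is a minimal sketch; the names `pea_freq`, `n`, and `rel_freq` are chosen here for illustration.

```python
# Frequencies from the peas-per-pod table on the slide.
pea_freq = {1: 1, 2: 2, 3: 5, 4: 9, 5: 18, 6: 12, 7: 3}

# Sample size n is the sum of all class frequencies.
n = sum(pea_freq.values())

# Relative frequency of each class: f / n.
rel_freq = {value: f / n for value, f in pea_freq.items()}

for value, r in rel_freq.items():
    print(f"{value} peas: {r:.2f} ({r:.0%})")
```

The relative frequencies sum to 1 (or 100%), which is why the relative frequency histogram keeps the shape of the frequency histogram and only the vertical scale changes.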
Relative Frequency Histogram
Has the same shape and horizontal scale as the corresponding
frequency histogram, but the vertical scale is marked with
relative frequencies.
Grouped Frequency Distributions
Grouped Frequency Distribution
• For data sets with many different values.
• Groups data into 5-20 classes of equal width.
Exam Scores    Freq, f
30-39          1
40-49          0
50-59          4
60-69          9
70-79          13
80-89          10
90-99          3
Grouped Frequency Distribution Terms
• Lower class limits: the smallest numbers that
can belong to each class.
• Upper class limits: the largest numbers that can
belong to each class.
• Class width: the difference between two
consecutive lower class limits.
Labeling Grouped Frequency
Distributions
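• Class midpoints: the value halfway between the LCL
and the UCL of a class:
$\text{midpoint} = \dfrac{(\text{Lower class limit}) + (\text{Upper class limit})}{2}$
• Class boundaries: the value halfway between a
UCL and the next LCL:
$\text{boundary} = \dfrac{(\text{Upper class limit}) + (\text{next Lower class limit})}{2}$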
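The terms above can be checked against the exam score classes (30-39 through 90-99). This is a short sketch; the variable names are chosen here for illustration only.

```python
# (lower class limit, upper class limit) for the exam score classes.
classes = [(30, 39), (40, 49), (50, 59), (60, 69),
           (70, 79), (80, 89), (90, 99)]

# Class width: difference between two consecutive lower class limits.
class_width = classes[1][0] - classes[0][0]

# Class midpoint: halfway between the LCL and UCL of the same class.
midpoints = [(lcl + ucl) / 2 for lcl, ucl in classes]

# Class boundary: halfway between a UCL and the next class's LCL.
boundaries = [(classes[i][1] + classes[i + 1][0]) / 2
              for i in range(len(classes) - 1)]

print("width:", class_width)
print("midpoints:", midpoints)
print("boundaries:", boundaries)
```

Note that the width is 10, not 9: it is measured between lower limits, not from the lower limit to the upper limit of one class.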
Constructing a Grouped Frequency
Distribution
1. Determine the range of the data:
Range = highest data value – lowest data value
May round up to the next convenient number
2. Decide on the number of classes.
Usually between 5 and 20; otherwise, it may be
difficult to detect any patterns.
3. Find the class width:
$\text{class width} = \dfrac{\text{range}}{\text{number of classes}}$
Round up to the next convenient number.
Constructing a Frequency Distribution
4. Find the class limits.
Choose the first LCL: use the minimum data
entry or something smaller that is convenient.
Find the remaining LCLs: add the class width to
the lower limit of the preceding class.
Find the UCLs: Remember that classes must
cover all data values and cannot overlap.
5. Find the frequencies for each class. (You may add a
tally column first and make a tally mark for each
data value in the class).
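The five steps can be sketched in Python as follows. Several things here are assumptions rather than part of the slides: the function name `grouped_frequency`, its `first_lower` parameter, and the list `scores`, which is made-up illustrative data. The sketch handles whole-number data only, and it always rounds the width up to the next whole number (even when the quotient is already whole) so that the last class reaches the highest value.

```python
def grouped_frequency(data, num_classes, first_lower=None):
    """Return (LCL, UCL, frequency) rows for whole-number data."""
    lowest, highest = min(data), max(data)
    data_range = highest - lowest                      # step 1: range
    # step 2: num_classes is chosen by the caller (usually 5 to 20)
    width = data_range // num_classes + 1              # step 3: round up
    start = lowest if first_lower is None else first_lower
    rows = []
    for i in range(num_classes):                       # step 4: limits
        lcl = start + i * width
        ucl = lcl + width - 1                          # classes do not overlap
        f = sum(1 for x in data if lcl <= x <= ucl)    # step 5: frequency
        rows.append((lcl, ucl, f))
    if sum(f for _, _, f in rows) != len(data):
        raise ValueError("classes do not cover all data values")
    return rows

# Hypothetical exam scores, invented for this example only.
scores = [32, 55, 58, 61, 64, 67, 71, 73, 74, 78, 79, 82, 85, 88, 91, 98]

for lcl, ucl, f in grouped_frequency(scores, 7, first_lower=30):
    print(f"{lcl}-{ucl}: {f}")
```

Passing `first_lower=30` illustrates the option in step 4 of starting at a convenient number below the minimum data entry (32 in this made-up data set).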
Other Graphs
Besides histograms, there are other methods of
graphing quantitative data:
• Stem and Leaf Plots
• Dot Plots
• Time Series
Stem and Leaf Plots
Represents data by separating each data value into
two parts: the stem (such as the leftmost digit) and
the leaf (such as the rightmost digit)
Constructing Stem and Leaf Plots
• Split each data value at the same place value to form
a stem and a leaf (aim for 5 to 20 stems).
• Arrange all possible stems vertically so there are no
missing stems.
• Write each leaf to the right of its stem, in order.
• Create a key to recreate the data.
• Variations of stem plots:
1. Split stems
2. Back to back stem plots.
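The construction steps can be sketched for two-digit values split at the tens place. The function name `stem_and_leaf` and the sample values are hypothetical, introduced only for this illustration; the sketch assumes non-negative whole numbers.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf plot (stem = tens, leaf = ones)."""
    leaves = defaultdict(list)
    for v in sorted(values):              # sorting puts the leaves in order
        leaves[v // 10].append(v % 10)
    lines = []
    # List every stem from smallest to largest so no stem is missing.
    for stem in range(min(leaves), max(leaves) + 1):
        row = f"{stem} | " + " ".join(str(d) for d in leaves[stem])
        lines.append(row.rstrip())
    return lines

# Made-up sample values, for illustration only.
sample = [32, 35, 51, 58, 58, 60, 67]
print("Key: 3 | 2 = 32")
for line in stem_and_leaf(sample):
    print(line)
```

The stem 4 appears with no leaves because the sample has no values in the 40s; keeping it preserves the shape of the distribution.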
Constructing a Stem-and-Leaf Plot
Include a key to identify
the values of the data.
Dot Plots
Dot plot
• Consists of a graph in which each data value is
plotted as a point along a scale of values
Figure 2-5
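A dot plot can be approximated in plain text by stacking one mark per data value above a number line. This is a rough sketch: the function name `dot_plot` is hypothetical, and it assumes single-digit whole-number values so that the columns line up.

```python
from collections import Counter

def dot_plot(values):
    """Return text rows: stacked 'o' marks above a scale of values."""
    counts = Counter(values)
    lo, hi = min(counts), max(counts)
    tallest = max(counts.values())
    rows = []
    # Build the plot from the top row down to the row just above the scale.
    for level in range(tallest, 0, -1):
        marks = ["o" if counts[v] >= level else " " for v in range(lo, hi + 1)]
        rows.append(" ".join(marks).rstrip())
    rows.append(" ".join(str(v) for v in range(lo, hi + 1)))
    return rows

# Small made-up example: one dot per data value.
for row in dot_plot([1, 2, 2, 3]):
    print(row)
```

Each data value contributes exactly one mark, so the height of a column equals the frequency of that value.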
Time Series
(Paired data)
Time Series
• A data set composed of quantitative entries taken at
regular intervals over a period of time.
e.g., The amount of precipitation measured each
day for one month.
• Use a time series chart to graph.
[Figure: time series chart with quantitative data on the vertical axis and time on the horizontal axis]
Time-Series Graph
Number of Screens at Drive-In Movie Theaters
Figure 2-8
Ex. www.eia.doe.gov/oil_gas/petroleum/
Graphing Qualitative Data Sets
Pie Chart
• A circle is divided into sectors
that represent categories.
Pareto Chart
• A vertical bar graph in which the
height of each bar represents
frequency or relative frequency.
[Figure: bar chart with Categories on the horizontal axis and Frequency on the vertical axis]
Constructing a Pie Chart
• Find the total sample size.
• Convert the frequencies to relative frequencies (percent).
Marital Status    Frequency, f (in millions)    Relative frequency (%)
Never Married     55.3                          55.3 / 219.7 ≈ 0.25 or 25%
Married           127.7                         127.7 / 219.7
Widowed           13.9                          13.9 / 219.7
Divorced          22.8                          22.8 / 219.7
Total:            219.7
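The remaining relative frequencies in the table can be worked out the same way as the first row. This is a minimal sketch using the marital status frequencies from the slide; the variable names are chosen here for illustration.

```python
# Frequencies in millions, from the marital status table.
marital_status = {
    "Never Married": 55.3,
    "Married": 127.7,
    "Widowed": 13.9,
    "Divorced": 22.8,
}

# Step 1: total sample size.
total = sum(marital_status.values())

# Step 2: convert each frequency to a relative frequency in percent.
percent = {status: 100 * f / total for status, f in marital_status.items()}

for status, p in percent.items():
    print(f"{status}: {p:.0f}%")
```

Each percentage gives the share of the circle that the category's sector should occupy.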
Constructing Pareto Charts
• Create a bar for each category, where the height of the
bar can represent frequency or relative frequency.
• The bars are often positioned in order of decreasing
height, with the tallest bar positioned at the left.
Figure 2-6
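The ordering rule for a Pareto chart amounts to sorting the categories by decreasing frequency. The sketch below reuses the marital status frequencies from the pie chart example; the helper name `pareto_order` is hypothetical.

```python
# Frequencies in millions, from the marital status table.
marital_status = {
    "Never Married": 55.3,
    "Married": 127.7,
    "Widowed": 13.9,
    "Divorced": 22.8,
}

def pareto_order(freq):
    """Return (category, frequency) pairs, tallest bar first."""
    return sorted(freq.items(), key=lambda item: item[1], reverse=True)

# The first pair is the bar drawn at the far left.
for category, f in pareto_order(marital_status):
    print(f"{category}: {f}")
```

The same ordering applies if relative frequencies are plotted instead, since dividing every frequency by the total does not change which bar is tallest.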
THANK YOU.