Reviewer in StatAna
Chapter 2 – Descriptive Statistics: Tabular and Graphical Displays

⇒ Summarizing Data for a Categorical Variable
  • Categorical data use labels or names to identify categories of like items.
⇒ Summarizing Data for a Quantitative Variable
  • Quantitative data are numerical values that indicate how much or how many.

Summarizing Categorical Data

§ Frequency Distribution
§ Relative Frequency Distribution
§ Percent Frequency Distribution
§ Bar Chart
§ Pie Chart

Frequency Distribution

§ A frequency distribution is a tabular summary of data showing the number (frequency) of observations in each of several non-overlapping categories or classes.
§ The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data.

Example:
§ Soft drink purchasers were asked to select one of five popular soft drinks: Coca-Cola, Diet Coke, Dr. Pepper, Pepsi, and Sprite.
§ The soft drinks selected by a sample of 20 purchasers are:

Coca-Cola    Pepsi        Dr. Pepper
Diet Coke    Dr. Pepper   Dr. Pepper
Dr. Pepper   Pepsi        Pepsi
Pepsi        Coca-Cola    Diet Coke
Pepsi        Diet Coke    Dr. Pepper
Pepsi        Pepsi        Sprite
Pepsi        Pepsi

Soft Drink    Frequency
Coca-Cola          2
Diet Coke          3
Dr. Pepper         5
Pepsi              9
Sprite             1
Total             20

Relative Frequency and Percent Frequency Distributions

Soft Drink    Relative Frequency    Percent Frequency
Coca-Cola            .10                   10
Diet Coke            .15                   15
Dr. Pepper           .25                   25
Pepsi                .45                   45
Sprite               .05                    5
Total               1.00                  100

Relative Frequency Distribution

§ The relative frequency of a class is the fraction or proportion of the total number of data items belonging to that class.

Formula:
$$\text{Relative frequency of a class} = \frac{\text{Frequency of the class}}{n}$$
where n is the total number of observations.

§ A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class.

Bar Chart

§ A bar chart is a graphical display for depicting qualitative data.
§ On one axis (usually the horizontal axis), we specify the labels that are used for each of the classes.
§ A frequency, relative frequency, or percent frequency scale can be used for the other axis (usually the vertical axis).
§ Using a bar of fixed width drawn above each class label, we extend the height appropriately.
§ The bars are separated to emphasize the fact that each class is separate.
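The soft drink tables can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `frequency_table` and the text-based bars are illustrative choices, not part of the original notes.

```python
from collections import Counter

# The 20 soft drink selections from the example.
purchases = [
    "Coca-Cola", "Pepsi", "Dr. Pepper",
    "Diet Coke", "Dr. Pepper", "Dr. Pepper",
    "Dr. Pepper", "Pepsi", "Pepsi",
    "Pepsi", "Coca-Cola", "Diet Coke",
    "Pepsi", "Diet Coke", "Dr. Pepper",
    "Pepsi", "Pepsi", "Sprite",
    "Pepsi", "Pepsi",
]

def frequency_table(items):
    """Return {label: (frequency, relative frequency, percent frequency)}."""
    counts = Counter(items)
    n = len(items)
    return {
        label: (count, count / n, 100 * count / n)
        for label, count in sorted(counts.items())
    }

table = frequency_table(purchases)
for label, (freq, rel, pct) in table.items():
    # A row of '#' characters stands in for a bar whose length is the frequency.
    print(f"{label:<11}{freq:>3}{rel:>7.2f}{pct:>6.0f}  {'#' * freq}")
```

Each printed row gives the frequency, relative frequency, and percent frequency of one class, followed by a crude horizontal bar.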
Pareto Diagram

§ In quality control, bar charts are used to identify the most important causes of problems.
§ When the bars are arranged in descending order of height from left to right (with the most frequently occurring cause appearing first), the bar chart is called a Pareto diagram.
§ This diagram is named for its founder, Vilfredo Pareto, an Italian economist.

Pie Chart

§ The pie chart is a commonly used graphical display for presenting relative frequency and percent frequency distributions for categorical data.
§ First draw a circle, then use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency for each class.
§ Since there are 360 degrees in a circle, a class with a relative frequency of .25 would consume .25(360) = 90 degrees of the circle.
§ Example:
  - Inferences from the Pie Chart
    o Almost one-half of the customers surveyed preferred Pepsi (looking at the left side of the pie).
    o The second preference is for Dr. Pepper, with 25% of the customers opting for it.
    o Only 5% of the customers opted for Sprite.

Summarizing Quantitative Data

§ Frequency Distribution
§ Relative Frequency and Percent Frequency Distributions
§ Dot Plot
§ Histogram
§ Cumulative Distributions
§ Stem-and-Leaf Display

Frequency Distribution

⇒ Step 1: Determine the number of non-overlapping classes.
⇒ Step 2: Determine the width of each class.
⇒ Step 3: Determine the class limits.

Example: Sanderson and Clifford, a small public accounting firm, wants to determine the time in days required to complete year-end audits. It takes a sample of 20 clients.

Year-end Audit Time (in Days)
12  14  19  18
15  15  18  17
20  27  22  23
22  21  33  28
14  18  16  13

Guidelines for Determining the Width of Each Class

§ Use classes of equal width.
§ Approximate class width:
$$\text{Approximate Class Width} = \frac{\text{Largest Data Value} - \text{Smallest Data Value}}{\text{Number of Classes}}$$
§ Making the classes the same width reduces the chance of inappropriate interpretations.

Note on Number of Classes and Class Width

§ In practice, the number of classes and the appropriate class width are determined by trial and error.
§ Once a possible number of classes is chosen, the appropriate class width is found.
§ The process can be repeated for a different number of classes.
§ Ultimately, the analyst uses judgement to determine the combination of the number of classes and class width that provides the best frequency distribution for summarizing the data.
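The trial-and-error process for choosing the number of classes can be sketched in Python using the audit times from the example. The helper name `approximate_class_width` and the choice to round the width up to a whole number of days are assumptions made for illustration; an analyst may prefer a different convenient width.

```python
import math

# Year-end audit times (in days) for the 20 sampled clients.
audit_days = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
              22, 23, 22, 21, 33, 28, 14, 18, 16, 13]

def approximate_class_width(data, num_classes):
    """(largest data value - smallest data value) / number of classes."""
    return (max(data) - min(data)) / num_classes

# Repeat the calculation for several candidate numbers of classes.
for k in (4, 5, 6, 7):
    width = approximate_class_width(audit_days, k)
    # Rounding up to a whole day is an assumed convention, not a fixed rule.
    print(f"{k} classes: approximate width {width:.2f}, rounded up to {math.ceil(width)}")
```

Comparing the candidate widths side by side makes it easier to judge which combination summarizes the data best.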
Guidelines for Determining the Class Limits

§ Class limits must be chosen so that each data item belongs to one and only one class.
§ The lower class limit identifies the smallest possible data value assigned to the class.
§ The upper class limit identifies the largest possible data value assigned to the class.
§ The appropriate values for the class limits depend on the level of accuracy of the data.
§ An open-end class requires only a lower class limit or an upper class limit.

Guidelines for Determining the Number of Classes

§ Use between 5 and 20 classes.
§ Data sets with a larger number of elements usually require a larger number of classes.
§ Smaller data sets usually require fewer classes.
§ The goal is to use enough classes to show the variation in the data, but not so many classes that some contain only a few data items.

Example: Sanderson and Clifford

§ If we choose five classes:
§ Approximate Class Width = (33 – 12)/5 = 4.2, which is rounded up to 5 so that five classes cover all of the data.

Time in days    Frequency
10 – 14             4
15 – 19             8
20 – 24             5
25 – 29             2
30 – 34             1
Total              20

Relative Frequency and Percent Frequency Distributions

Example: Sanderson and Clifford

Insights obtained from the Percent Frequency Distribution:

⇒ 40% of the audits required from 15 to 19 days.
⇒ Another 25% of the audits required 20 to 24 days.
⇒ Only 5% of the audits required 30 or more days.

Audit time (in days)    Relative Frequency    Percent Frequency
10 – 14                        .20                20 (0.2 * 100)
15 – 19                        .40                40
20 – 24                        .25                25
25 – 29                        .10                10
30 – 34                        .05                 5
Total                         1.00               100

Dot Plot

§ One of the simplest graphical summaries of data is a dot plot.
§ A horizontal axis shows the range of data values.
§ Then each data value is represented by a dot placed above the axis.

Example: Sanderson and Clifford
[Figure: dot plot of the year-end audit times]
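The frequency, relative frequency, and percent frequency distributions for the Sanderson and Clifford audit times can be checked with a short Python sketch. The function name `class_distribution` is hypothetical, and the class limits are taken from the five classes chosen in the example.

```python
# Year-end audit times (in days) from the Sanderson and Clifford example.
audit_days = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
              22, 23, 22, 21, 33, 28, 14, 18, 16, 13]

# (lower limit, upper limit) of each class; both limits are inclusive.
classes = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34)]

def class_distribution(data, class_limits):
    """Return rows of (lower, upper, frequency, relative freq, percent freq)."""
    n = len(data)
    rows = []
    for lower, upper in class_limits:
        # Count the data items that fall inside this class.
        freq = sum(lower <= x <= upper for x in data)
        rows.append((lower, upper, freq, freq / n, 100 * freq / n))
    return rows

rows = class_distribution(audit_days, classes)
for lower, upper, freq, rel, pct in rows:
    print(f"{lower} - {upper}: {freq:>2}  {rel:.2f}  {pct:>3.0f}")
```

Because the classes do not overlap and together cover the whole range of the data, each audit time is counted exactly once and the frequencies sum to 20.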
Class Midpoint

§ In some cases, we want to know the midpoints of the classes in a frequency distribution for quantitative data.
§ The class midpoint is the value halfway between the lower and upper class limits.

Histogram

§ Another common graphical display of quantitative data is a histogram.
§ The variable of interest is placed on the horizontal axis.
§ A rectangle is drawn above each class interval with its height corresponding to the interval’s frequency, relative frequency, or percent frequency.
§ Unlike a bar graph, a histogram has no natural separation between rectangles of adjacent classes.

Histograms Showing Skewness

§ Moderately Skewed Left
  • A longer tail to the left
  • Example: Exam Scores
§ Moderately Skewed Right
  • A longer tail to the right
  • Example: Housing Values
§ Symmetric
  • Left tail is the mirror image of the right tail
  • Example: Heights of people

Cumulative Distributions

§ Cumulative frequency distribution – shows the number of items with values less than or equal to the upper limit of each class.
§ Cumulative relative frequency distribution – shows the proportion of items with values less than or equal to the upper limit of each class.
§ Cumulative percent frequency distribution – shows the percentage of items with values less than or equal to the upper limit of each class.
§ Example: Sanderson and Clifford

Audit time (days)    Cumulative Frequency    Cumulative Relative Frequency    Cumulative Percent Frequency
≤ 14                          4                        .20                              20
≤ 19                         12                        .60                              60
≤ 24                         17                        .85                              85
≤ 29                         19                        .95                              95
≤ 34                         20                       1.00                             100

§ The last entry in a cumulative frequency distribution always equals the total number of observations.
§ The last entry in a cumulative relative frequency distribution always equals 1.00.
§ The last entry in a cumulative percent frequency distribution always equals 100.

Stem-and-Leaf Display

§ A stem-and-leaf display shows both the rank order and shape of the distribution of the data.
§ It is similar to a histogram on its side, but it has the advantage of showing the actual data values.
§ The first digits of each data item are arranged to the left of a vertical line.
§ To the right of the vertical line we record the last digit for each item in rank order.
§ Each line (row) in the display is referred to as a stem.
§ Each digit on a stem is a leaf.

[Figure: stem-and-leaf display of the number of questions answered correctly by 50 students]
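The construction of a stem-and-leaf display can be sketched in Python. The data for the 50 students is not available in these notes, so this example reuses the Sanderson and Clifford audit times instead; the function name `stem_and_leaf` is an illustrative choice, and the sketch assumes two-digit whole numbers with a leaf unit of 1.

```python
from collections import defaultdict

# Year-end audit times (in days) from the Sanderson and Clifford example.
audit_days = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
              22, 23, 22, 21, 33, 28, 14, 18, 16, 13]

def stem_and_leaf(values):
    """Map each stem (leading digit) to its leaves (last digit) in rank order."""
    display = defaultdict(list)
    for value in sorted(values):
        # With a leaf unit of 1, the last digit is the leaf and the rest is the stem.
        stem, leaf = divmod(value, 10)
        display[stem].append(leaf)
    return dict(display)

display = stem_and_leaf(audit_days)
for stem in sorted(display):
    print(f"{stem} | {' '.join(str(leaf) for leaf in display[stem])}")
```

Sorting the values first is what puts the leaves on each stem in rank order, and the length of each printed row mirrors the bar of a histogram turned on its side.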
Leaf Units

§ A single digit is used to define each leaf.
§ In the preceding example, the leaf unit was 1.
§ Leaf units may be 100, 10, 1, 0.1, and so on.
§ Where the leaf unit is not shown, it is assumed to equal 1.
§ The leaf unit indicates how to multiply the stem-and-leaf numbers in order to approximate the original data.
§ Example: Leaf Unit = 0.1
If we have data with values such as
8.6, 11.7, 9.4, 9.1, 10.2, 11.0, 8.8
Leaf Unit = 0.1
 8 | 6 8
 9 | 1 4
10 | 2
11 | 0 7
§ Example: Leaf Unit = 10
If we have data with values such as
1806, 1717, 1974, 1791, 1682, 1910, 1838
Leaf Unit = 10
16 | 8
17 | 1 9
18 | 0 3
19 | 1 7
The 82 in 1682 is rounded down to 80 and is represented as an 8.
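The role of the leaf unit can be illustrated with a small Python sketch. The function name `stem_and_leaf_with_unit` is hypothetical, and the use of `Decimal` is an assumption made here to avoid floating-point surprises with leaf units such as 0.1. Digits smaller than the leaf unit are dropped, matching the rounding down described for 1682.

```python
from collections import defaultdict
from decimal import Decimal

def stem_and_leaf_with_unit(values, leaf_unit):
    """Build {stem: [leaves]}; digits below the leaf unit are dropped."""
    unit = Decimal(str(leaf_unit))
    display = defaultdict(list)
    for value in sorted(values):
        # Express the value in whole leaf units, discarding any remainder.
        units = int(Decimal(str(value)) // unit)
        stem, leaf = divmod(units, 10)
        display[stem].append(leaf)
    return dict(display)

tens = stem_and_leaf_with_unit([1806, 1717, 1974, 1791, 1682, 1910, 1838], leaf_unit=10)
print("Leaf Unit = 10")
for stem in sorted(tens):
    print(f"{stem} | {' '.join(map(str, tens[stem]))}")
```

Multiplying a stem-and-leaf pair by the leaf unit gives back an approximation of the original value; for example, stem 16 with leaf 8 and a leaf unit of 10 corresponds to 1680.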