Describing Data: Displaying and
Exploring Data
Chapter 4
4-1 Copyright © 2022 McGraw-Hill Education. All rights reserved. No reproduction or distribution without the
prior written consent of McGraw-Hill Education.
Learning Objectives
LO4-1 Construct and interpret a dot plot
LO4-2 Identify and compute measures of position
LO4-3 Construct and analyze a box plot
LO4-4 Compute and interpret the coefficient of skewness
LO4-5 Create and interpret a scatter diagram
LO4-6 Compute and interpret the correlation coefficient
LO4-7 Develop and explain a contingency table
Dot Plots Example
Use dot plots to compare two data sets, such as the
number of vehicles serviced last month at two
different dealerships
Dot Plots Example (2 of 2)
Minitab (statistical software) provides dot plots and summary
statistics
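Without Minitab, a rough text dot plot can be produced in a few lines of Python. The sketch below is only illustrative: the function name `dot_plot_lines` and the service counts are hypothetical, not the dealership data from the slide. The plot is rotated, with one row per value and one dot per observation.

```python
from collections import Counter

def dot_plot_lines(values):
    """Return one text line per integer value, with one dot per observation."""
    counts = Counter(values)
    lines = []
    for value in range(min(values), max(values) + 1):
        # Values that never occur still get a row, so gaps stay visible.
        lines.append(f"{value:>3} | " + "." * counts[value])
    return lines

# Hypothetical daily service counts (NOT the data from the slide).
vehicles_serviced = [23, 25, 25, 26, 26, 26, 27, 27, 30]

for line in dot_plot_lines(vehicles_serviced):
    print(line)
```

Printing the rows for two data sets one after the other gives a quick visual comparison of their spread and clustering.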
Measures of Position
Measures of location also describe the shape of the
distribution and can be expressed as percentiles
Quartiles divide a set of observations into four equal
parts
The interquartile range is the difference between the
third quartile and the first quartile
Deciles divide a set of observations into 10 equal parts
Percentiles divide a set of observations into 100 equal
parts
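As a minimal sketch of these definitions, Python's standard `statistics.quantiles` function returns the cut points that divide a sample into `n` equal parts. The sample below is hypothetical, and the choice of `method="exclusive"` (which places cut points at position (n + 1)p and interpolates) is an assumption made for illustration.

```python
import statistics

# Hypothetical sample, sorted ascending (NOT data from the slides).
days = [3, 5, 7, 8, 12, 13, 14, 18, 21, 24, 30]

# Quartiles: 3 cut points that split the data into 4 equal parts.
q1, q2, q3 = statistics.quantiles(days, n=4, method="exclusive")

# Interquartile range: third quartile minus first quartile.
iqr = q3 - q1

# Deciles: 9 cut points; percentiles: 99 cut points.
deciles = statistics.quantiles(days, n=10, method="exclusive")
percentiles = statistics.quantiles(days, n=100, method="exclusive")

print("Q1, median, Q3:", q1, q2, q3)
print("Interquartile range:", iqr)
print("Second decile:", deciles[1])
print("67th percentile:", percentiles[66])
```

Other software may use a different interpolation rule, so quartiles from different packages do not always agree exactly.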
Quantiles of Discrete Distributions
https://en.wikipedia.org/wiki/Quantile
https://en.wikipedia.org/wiki/Percentile
In statistics, a percentile (or a centile) is a score below
which a given percentage of scores in its frequency
distribution falls (exclusive definition) or a score at or
below which a given percentage falls (inclusive definition).
For example, the 50th percentile (the median) is the
score below which (exclusive) or at or below which
(inclusive) 50% of the scores in the distribution may be
found.
Algorithms either return the value of a score that exists in
the set of scores (nearest-rank methods) or interpolate
between existing scores and are either exclusive or
inclusive.
The nearest‐rank method
One definition of percentile, often given in texts, is that the P-th percentile
(0<P≤100) of a list of N ordered values (sorted from least to greatest) is the
smallest value in the list such that no more than P percent of the data is
strictly less than the value and at least P percent of the data is less than or
equal to that value. This is obtained by first calculating the ordinal rank and
then taking the value from the ordered list that corresponds to that rank.
The ordinal rank n is calculated using this formula:
\[ n = \left\lceil \frac{P}{100} \times N \right\rceil \]
• A percentile calculated using the nearest-rank method will always be a member of the
original ordered list.
• The 100th percentile is defined to be the largest value in the ordered list.
Measures of Position Example
Morgan Stanley is an investment company with offices
located throughout the United States. Listed below are
the commissions earned last month by a sample of 15
brokers
First, sort the data from smallest to largest
Measures of Position Example (2 of 2)
Next, find the median
L50 = (15+1)*50/100 = 8
So the median is $2,038, the value at position 8
L25 = (15+1)*25/100 = 4        L75 = (15+1)*75/100 = 12
Therefore, the first and third quartiles are located at the 4th and 12th
positions, respectively: L25 = $1,721; L75 = $2,205
$1,460 $1,471 $1,637 $1,721 $1,758 $1,787 $1,940 $2,038
$2,047 $2,054 $2,097 $2,205 $2,287 $2,311 $2,406
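The location formula Lp = (n + 1)P/100 used in this example can be sketched in Python. The function name `percentile_by_location` is hypothetical. The slides show only whole-number locations, so the linear interpolation applied to fractional locations is an assumption about how such cases would be handled.

```python
def percentile_by_location(sorted_values, p):
    """Find the P-th percentile using the location Lp = (n + 1) * P / 100."""
    n = len(sorted_values)
    location = (n + 1) * p / 100   # 1-based position in the sorted data
    whole = int(location)          # position of the value at or below Lp
    fraction = location - whole    # how far to move toward the next value
    if whole < 1:
        return sorted_values[0]
    if whole >= n:
        return sorted_values[-1]
    below = sorted_values[whole - 1]
    above = sorted_values[whole]
    # Assumed rule: interpolate linearly between the two neighbouring values.
    return below + fraction * (above - below)

# The 15 broker commissions from the example, sorted ascending.
commissions = [1460, 1471, 1637, 1721, 1758, 1787, 1940, 2038,
               2047, 2054, 2097, 2205, 2287, 2311, 2406]

print("Median:", percentile_by_location(commissions, 50))
print("Q1:", percentile_by_location(commissions, 25))
print("Q3:", percentile_by_location(commissions, 75))
```

For this sample the three locations are the whole numbers 8, 4, and 12, so no interpolation takes place and the results match the values read from the sorted list.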
Box Plots
BOX PLOT A graphic display that shows the general shape of a
variable’s distribution. It is based on five descriptive statistics: the
maximum and minimum values, the first and third quartiles, and the
median.
The interquartile range is Q3 – Q1
Outliers are values that are inconsistent with the rest of
the data and are identified with asterisks in box plots
Box Plot Example
Alexander’s Pizza offers free delivery of its pizza within
15 miles. How long does a typical delivery take? Within
what range will most deliveries be completed?
Using a sample of 20 deliveries, Alexander determined the
following:
Minimum value = 13 minutes
Q1 = 15 minutes
Median = 18 minutes
Q3 = 22 minutes
Maximum value = 30 minutes
Develop a box plot for delivery times
Box Plot Example Continued
Begin by drawing a number line using an appropriate scale
Next, draw a box that begins at Q1 (15 minutes) and
ends at Q3 (22 minutes)
Draw a vertical line at the median (18 minutes)
Extend a horizontal line out from Q3 to the maximum
value (30 minutes) and out from Q1 to the minimum
value (13 minutes)
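The five values in the delivery example also give the interquartile range and a set of outlier limits. The sketch below assumes the common convention that a value is an outlier when it lies more than 1.5 interquartile ranges below Q1 or above Q3. That multiplier is not stated on these slides, and the function name `outlier_limits` is hypothetical.

```python
def outlier_limits(q1, q3, k=1.5):
    """Return (lower, upper) limits: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Five-number summary for the delivery times, in minutes.
minimum, q1, median, q3, maximum = 13, 15, 18, 22, 30

iqr = q3 - q1
lower, upper = outlier_limits(q1, q3)
has_outliers = minimum < lower or maximum > upper

print("Interquartile range:", iqr)
print("Outlier limits:", lower, "to", upper)
print("Any outliers:", has_outliers)
```

Under that assumption the limits are 4.5 and 32.5 minutes. The minimum (13) and maximum (30) both fall inside them, so the whiskers can extend all the way to the extremes.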
Common Shapes of Data
Skewness
The coefficient of skewness is a measure of
the lack of symmetry of a distribution
Two formulas for coefficient of skewness
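The formulas themselves did not survive on this slide. The two forms usually given are Pearson's coefficient, which matches Step 4 of the worked example in this chapter, and the moment-based coefficient reported by statistical software. The second formula is an assumption about what the slide showed.

```latex
% Pearson's coefficient of skewness
sk = \frac{3(\bar{X} - \text{Median})}{s}

% Software coefficient of skewness (assumed form)
sk = \frac{n}{(n-1)(n-2)} \sum \left( \frac{X - \bar{X}}{s} \right)^{3}
```

Here \bar{X} is the sample mean, s is the sample standard deviation, and n is the sample size.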
Skewness (2 of 2)
Pearson’s coefficient of skewness can range from −3 to
+3
A value near −3 indicates considerable negative skewness
A value of 1.63 indicates moderate positive skewness
A value of 0 means the distribution is symmetrical
Skewness Example
Following are the earnings per share for a sample of
15 software companies for the year 2018. The
earnings per share are arranged from smallest to
largest.
Begin by finding the mean, median, and standard
deviation. Find the coefficient of skewness.
What do you conclude about the shape of the
distribution?
Skewness Example (2 of 2)
Step 1: Compute the Mean
\[ \bar{X} = \frac{\sum X}{n} = \frac{\$74.26}{15} = \$4.95 \]
Step 2: Compute the Standard Deviation
\[ s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{(\$0.09 - \$4.95)^2 + \dots + (\$16.40 - \$4.95)^2}{15 - 1}} = \$5.22 \]
Step 3: Find the Median
The middle value in the set of data, arranged from smallest to largest, is $3.18
Step 4: Compute the Skewness
\[ sk = \frac{3(\bar{X} - \text{Median})}{s} = \frac{3(\$4.95 - \$3.18)}{\$5.22} = 1.017 \]
What do you conclude about the shape of the
distribution?
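The Step 4 arithmetic can be checked with a short Python sketch. `pearson_skewness` works from the summary statistics in this example. `software_skewness` works from raw data and uses the moment-based formula n/((n−1)(n−2)) Σ((X − mean)/s)³, which is assumed here to be the "software method". Both function names are hypothetical, and the earnings-per-share values are not reproduced in these slides, so the second function is shown on made-up numbers only.

```python
import statistics

def pearson_skewness(mean, median, std_dev):
    """Pearson's coefficient: sk = 3 * (mean - median) / s."""
    return 3 * (mean - median) / std_dev

def software_skewness(values):
    """Moment-based coefficient (assumed form of the 'software method')."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    cubed = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed

# Summary statistics from the earnings-per-share example.
sk = pearson_skewness(mean=4.95, median=3.18, std_dev=5.22)
print(round(sk, 3))

# Hypothetical data with one large value on the right (NOT the EPS sample).
print(software_skewness([1, 2, 3, 4, 10]) > 0)
```

The Pearson value rounds to 1.017. A positive coefficient of about 1 points to moderate positive skewness: a few large earnings values pull the mean above the median.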
Describing the Relationship Between Two
Variables
SCATTER DIAGRAM Graphical technique used to show the
relationship between two variables measured with interval or ratio
scales.
Both variables must be measured on an interval or ratio
scale
If the scatter of points moves from the lower left to the
upper right, the variables under consideration are directly
or positively related
If the scatter of points moves from the upper left to the
lower right, the variables are inversely or negatively
related
Scatter Diagrams
Correlation Coefficient
A statistic called the correlation coefficient can be
calculated to measure the direction and strength of the
relationship between two variables
The coefficient, r, can range from −1.0 to +1.0
The closer the coefficient is to −1.0 or +1.0, the stronger
the linear relationship
If r is close to 0.0, there is little or no linear relationship
between the variables
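A minimal sketch of the calculation, assuming the usual Pearson formula for r: the sum of the products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The function name `correlation` and all three data sets are hypothetical.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the two sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data (NOT from the slides).
x = [1, 2, 3, 4, 5]
r_direct = correlation(x, [2, 4, 6, 8, 10])    # points rise left to right
r_inverse = correlation(x, [10, 8, 6, 4, 2])   # points fall left to right
r_curved = correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])  # y = x squared

print(r_direct, r_inverse, r_curved)
```

The third case gives r = 0 even though y is completely determined by x. A coefficient near zero only rules out a linear relationship, so it is worth checking the scatter diagram as well.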
Correlation Coefficient (2 of 2)
Contingency Tables
A contingency table is used to classify nominal scale
observations according to two characteristics
CONTINGENCY TABLE A table used to classify observations
according to two identifiable characteristics.
It is a cross-tabulation that simultaneously summarizes
two variables of interest
Both variables need only be nominal or ordinal
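A contingency table is a count of how often each pair of categories occurs, which can be sketched with the standard `collections.Counter`. The function name `contingency_table` and the six observations are hypothetical and stand in for any two nominal or ordinal variables.

```python
from collections import Counter

def contingency_table(pairs):
    """Count how often each (row category, column category) pair occurs."""
    return Counter(pairs)

# Hypothetical observations: (meal time, ordered dessert?). NOT real data.
observations = [
    ("Lunch", "Yes"), ("Lunch", "No"), ("Lunch", "No"),
    ("Dinner", "Yes"), ("Dinner", "Yes"), ("Dinner", "No"),
]

table = contingency_table(observations)
rows = sorted({row for row, _ in observations})
cols = sorted({col for _, col in observations})

# Print the cross-tabulation with row totals.
print(f"{'':<8}" + "".join(f"{col:>6}" for col in cols) + f"{'Total':>7}")
for row in rows:
    counts = [table[(row, col)] for col in cols]
    print(f"{row:<8}" + "".join(f"{c:>6}" for c in counts) + f"{sum(counts):>7}")
```

Dividing each cell by its row total gives the row percentages, which is the kind of comparison made across categories in a cross-tabulation.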
Contingency Table Example
Applewood Auto Group’s profit comparison
90 of the 180 cars sold had a profit above the median and
the other 90 had a profit below it. This meets the definition of the median.
The percentages of cars sold with profits above the median are Kane
48%, Olean 50%, Sheffield 42%, and Tionesta 60%.
Chapter 4 Practice Problems
Question 3 LO4-1
Consider the following chart.
a. What is this chart called?
b. How many observations are in the study?
c. What are the maximum and the minimum values?
d. Around what values do the observations tend to
cluster?
Question 7 LO4-2
The Thomas Supply Company Inc. is a distributor of gas-
powered generators. As with any business, the length of
time customers take to pay their invoices is important.
Listed below, arranged from smallest to largest, is the time,
in days, for a sample of the Thomas Supply Company Inc.
invoices.
a. Determine the first and third quartiles.
b. Determine the second decile and the eighth decile.
c. Determine the 67th percentile.
Question 9 LO4-3
The box plot below shows the amount spent for books and
supplies per year by students at four-year public colleges.
a. Estimate the median amount spent.
b. Estimate the first and third quartiles for the amount spent.
c. Estimate the interquartile range for the amount spent.
d. Beyond what point is a value considered an outlier?
e. Identify any outliers and estimate their values.
f. Is the distribution symmetrical or positively or negatively
skewed?
Question 15 LO4-4
Listed below are the commissions earned ($000) last year
by the 15 sales representatives at Furniture Patch Inc.
a. Determine the mean, median, and the standard
deviation.
b. Determine the coefficient of skewness using Pearson’s
method.
c. Determine the coefficient of skewness using the
software method.
Question 17 LO4-5,6
Create a scatter diagram and compute a correlation
coefficient. How would you describe the relationship
between the values?
Question 19 LO4-7
The Director of Planning for Devine Dining Inc. wishes to study the
relationship between the time of day a customer dined and whether the guest
orders dessert. To investigate the relationship, the manager collected the
following information on 200 recent customers.
a. What is the level of
measurement of the two variables?
b. What is the above table called?
c. Does the data suggest that customers are more likely to order dessert?
Explain why.
d. Does the data suggest that customers at lunch time are more likely to
order dessert? Explain why.
e. Does the data suggest that customers at dinner time are more likely to
order dessert? Explain why.