6 SCATTER DIAGRAMS
6-1 Numerical Summaries of Data
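Definition: Sample Mean
If the n observations in a sample are denoted by $x_1, x_2, \ldots, x_n$, the sample mean is
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$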
6-1 Data Summary and Display
Example 6-1
Figure 6-1 Dot diagram showing the sample mean as a
balance point for a system of weights.
Population Mean
For a finite population with N measurements, the
mean is
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
The sample mean $\bar{x}$ is a reasonable estimate of the
population mean $\mu$.
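As a minimal sketch of the sample mean and its balance-point interpretation, the snippet below averages a small sample and checks that the deviations from the mean sum to zero. The data values and the helper name `sample_mean` are made up for illustration and do not come from Example 6-1.

```python
def sample_mean(xs):
    """Arithmetic mean: the sum of the observations divided by their count."""
    if not xs:
        raise ValueError("need at least one observation")
    return sum(xs) / len(xs)

# Hypothetical sample, chosen only for illustration.
sample = [2.0, 4.0, 4.0, 5.0, 10.0]
x_bar = sample_mean(sample)

# Deviations from the mean cancel out, which is the balance-point property.
deviations = [x - x_bar for x in sample]
print(x_bar, sum(deviations))
```

The same function applied to all N values of a finite population would give the population mean; only the interpretation changes.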
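Although the sample mean is useful, it does not convey
all of the information about a sample of data.
Sample Variance
If the n observations in a sample are denoted by $x_1, x_2, \ldots, x_n$, the sample variance is
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
The sample standard deviation, s, is the positive square root of the sample variance.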
How Does the Sample Variance Measure Variability?
Figure 6-2 How the sample variance measures variability
through the deviations $x_i - \bar{x}$.
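A short sketch of how the deviations feed into the sample variance, assuming the usual definition (sum of squared deviations divided by n − 1). The sample values and the function name `sample_variance` are illustrative assumptions, not data from the slides.

```python
import math

def sample_variance(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

# Hypothetical sample: the deviations from the mean (5.0) are -3, -1, -1, 0, 5.
sample = [2.0, 4.0, 4.0, 5.0, 10.0]
s2 = sample_variance(sample)   # (9 + 1 + 1 + 0 + 25) / 4
s = math.sqrt(s2)              # sample standard deviation
print(s2, s)
```

Larger deviations contribute disproportionately because they are squared, which is why a single distant observation can dominate the variance.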
Example 6-2
Efficient Computation of $s^2$:
$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}$$
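The sketch below assumes the usual shortcut form, s² = (Σx² − (Σx)²/n)/(n − 1), which needs only the running sums of x and x². It compares the result against the standard library's `statistics.variance`; the sample values and the name `variance_shortcut` are hypothetical.

```python
import math
import statistics

def variance_shortcut(xs):
    """Sample variance from the sums of x and x**2 (single pass over the data)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Hypothetical sample: sum of x is 25, sum of x**2 is 161.
sample = [2.0, 4.0, 4.0, 5.0, 10.0]
shortcut = variance_shortcut(sample)        # (161 - 625/5) / 4
reference = statistics.variance(sample)     # definition-based result
print(shortcut, math.isclose(shortcut, reference))
```

The shortcut is convenient by hand, but in floating point it subtracts two large, nearly equal numbers when the mean is large relative to the spread, so the deviation form is usually the safer choice in software.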
Population Variance
When the population is finite and consists of N values,
we may define the population variance as
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$$
The sample variance is a reasonable estimate of the
population variance.
Figure 6-3 Relationship between a population and a sample.
Definition: Sample Range
If the n observations in a sample are denoted by $x_1, x_2, \ldots, x_n$, the sample range is
$$r = \max(x_i) - \min(x_i)$$
• It is easy to calculate, but it ignores all of the information
in the sample data between the largest and smallest values.
• Example: the samples [1, 3, 5, 8, 9] and [1, 5, 5, 5, 9] both
have the same range (r = 8). However, their standard deviations
differ: 3.35 for the first sample and 2.83 for the second, so the
variability is actually less in the second sample.
• Sometimes, when the sample size is small (say n < 8 or 10), the
information loss associated with the range is not too serious.
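The two samples in the example can be checked directly. This sketch computes the range and the sample standard deviation of each, using the standard library's `statistics.stdev`; the helper name `sample_range` is an assumption.

```python
import statistics

def sample_range(xs):
    """Largest observation minus smallest observation."""
    return max(xs) - min(xs)

first = [1, 3, 5, 8, 9]
second = [1, 5, 5, 5, 9]

for name, xs in (("first", first), ("second", second)):
    r = sample_range(xs)
    s = statistics.stdev(xs)   # sample standard deviation (n - 1 divisor)
    print(f"{name}: range = {r}, s = {s:.2f}")
```

Both samples share the same extremes, so the range cannot distinguish them; the standard deviation uses every observation and picks up the tighter clustering of the second sample.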
Tutorial
6-2 Stem-and-Leaf Diagrams
Steps for Constructing a Stem-and-Leaf Diagram
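The construction can be sketched in code under one common convention, stated here as an assumption: each value is split into a stem (its leading digits) and a leaf (its last digit), stems are listed in a column, and the leaves are written beside their stem in order. The data values and function names below are hypothetical and are not the Table 6-2 data.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Group non-negative integers by stem (leading digits) and leaf (last digit)."""
    stems = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        stems[stem].append(leaf)
    return dict(stems)

def render(stems):
    """One text row per stem, including empty stems, so gaps stay visible."""
    rows = []
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem:>3} | {leaves}")
    return "\n".join(rows)

# Hypothetical measurements, for illustration only.
data = [105, 97, 118, 112, 101, 123, 97, 110]
stems = stem_and_leaf(data)
print(render(stems))
```

Sorting before grouping yields an ordered display, which makes the median and quartiles easy to read off by counting leaves.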
Figure 6-4 Stem-and-leaf diagram for the compressive strength data in Table 6-2.
Example 6-4
Inspection of this display immediately reveals that:
• most of the compressive strengths lie between 110 and 200 psi
• a central value is somewhere between 150 and 160 psi
• the strengths are distributed approximately symmetrically
about the central value
The stem-and-leaf diagram enables us to determine quickly
some important features of the data that were not immediately
obvious in the original display in Table 6-2.
Example 6-5
Figure 6-5 Stem-and-leaf displays for Example 6-5.
Figure 6-6 A typical computer-generated stem-and-leaf diagram.
Data Features
• The median is a measure of central tendency that divides the data
into two equal parts, half below the median and half above.
If the number of observations is even, the median is
halfway between the two central values.
From Fig. 6-6, the 40th and 41st values of strength are 160
and 163, so the median is (160 + 163)/2 = 161.5.
If the number of observations is odd, the median is the central value.
• The sample mode is the most frequently occurring data value.
• The range is a measure of variability that can be easily computed
from the ordered stem-and-leaf display: range = 245 - 76 = 169.
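The even and odd cases of the median can be sketched as follows. The six-value sample is made up; it only borrows the central values (160, 163) and extremes (76, 245) quoted above, and is not the full 80-observation data set. The function name `sample_median` is likewise an assumption.

```python
import statistics

def sample_median(xs):
    """Central value if n is odd, average of the two central values if n is even."""
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical sample with an even number of observations.
even_sample = [76, 150, 160, 163, 181, 245]
median_even = sample_median(even_sample)             # halfway between 160 and 163
range_even = max(even_sample) - min(even_sample)     # 245 - 76

# Odd number of observations: the median is the single central value.
odd_sample = [1, 5, 5, 5, 9]
median_odd = sample_median(odd_sample)
mode_odd = statistics.mode(odd_sample)               # most frequent value

print(median_even, range_even, median_odd, mode_odd)
```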
When an ordered set of data is divided into four equal parts, the
division points are called quartiles.
The first or lower quartile, q1 , is a value that has approximately
one-fourth (25%) of the observations below it and approximately
75% of the observations above.
The second quartile, q2, has approximately one-half (50%) of the
observations below its value.
The second quartile is exactly equal to the median.
The third or upper quartile, q3, has approximately three-fourths
(75%) of the observations below its value.
As with the median, the quartiles may not be unique.
• The compressive strength data in Figure 6-6 contain n = 80 observations.
• Minitab calculates the first and third quartiles as the (n + 1)/4 and
3(n + 1)/4 ordered observations and interpolates as needed.
For example, (80 + 1)/4 = 20.25 and 3(80 + 1)/4 = 60.75.
• Therefore, Minitab interpolates:
• between the 20th and 21st observations to obtain q1 = 143.50
• between the 60th and 61st observations to obtain q3 = 181.00.
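A sketch of the (n + 1)/4 and 3(n + 1)/4 positions with linear interpolation between neighbouring ordered observations. The eight data values and the helper names are hypothetical; the comparison with `statistics.quantiles` relies on its default "exclusive" method, which uses the same (n + 1)-based positions.

```python
import statistics

def ordered_value(xs, position):
    """Value at a 1-based, possibly fractional, position in the sorted data."""
    ordered = sorted(xs)
    n = len(ordered)
    position = min(max(position, 1), n)   # keep the position inside 1..n
    lower = int(position)                 # index of the observation just below
    frac = position - lower               # how far to move toward the next one
    if lower == n:
        return ordered[-1]
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

def quartiles(xs):
    """q1, q2, q3 at positions (n + 1)/4, (n + 1)/2 and 3(n + 1)/4."""
    n = len(xs)
    return tuple(ordered_value(xs, k * (n + 1) / 4) for k in (1, 2, 3))

# Hypothetical sample with n = 8, so the positions are 2.25, 4.5 and 6.75.
data = [10, 20, 30, 40, 50, 60, 70, 80]
q1, q2, q3 = quartiles(data)
builtin = statistics.quantiles(data, n=4)  # default method is "exclusive"
print(q1, q2, q3, builtin)
```

For position 2.25, the result lies one quarter of the way from the 2nd to the 3rd ordered observation, which mirrors the interpolation between the 20th and 21st observations described above.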
• The interquartile range is the difference between the upper
and lower quartiles, and it is sometimes used as a measure of
variability.
IQR = q3 - q1
• In general, the 100kth percentile is a data value such that
approximately 100k% of the observations are at or below this
value and approximately 100(1 - k)% of them are above it.
Tutorial: 6-16.
Median? Q1? Q3?
6-3 Frequency Distributions and Histograms
• A frequency distribution is a more compact
summary of data than a stem-and-leaf diagram.
• To construct a frequency distribution, we must divide
the range of the data into intervals, which are usually
called class intervals, cells, or bins.
Constructing a Histogram (Equal Bin Widths):
Figure 6-7 Histogram of compressive strength for 80
aluminum-lithium alloy specimens
Figure 6-8 A histogram of the compressive strength data
from Minitab with 17 bins.
Figure 6-9 A histogram of the compressive strength data
from Minitab with nine bins
Choose the number of bins approximately equal to the
square root of the number of observations.
A frequency distribution for the compressive strength data in
Table 6-2 is:
Relative frequency distribution
Relative frequencies are found by dividing the observed
frequency in each bin by the total number of observations.
Cumulative frequency distribution
Cumulative frequency distributions are often easier to interpret than tables of data.
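The frequency, relative frequency, and cumulative relative frequency columns can be sketched together. This example assumes equal-width bins spanning the data from minimum to maximum, with the number of bins set to the rounded square root of n; in practice bin boundaries are often rounded to convenient values instead. The data and the name `frequency_table` are hypothetical.

```python
import math

def frequency_table(data, num_bins=None):
    """Rows of (lower, upper, count, relative frequency, cumulative relative frequency)."""
    n = len(data)
    if num_bins is None:
        num_bins = max(1, round(math.sqrt(n)))   # square-root rule of thumb
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_bins or 1.0          # guard against identical values
    counts = [0] * num_bins
    for x in data:
        # Bins are closed on the left; the maximum goes into the last bin.
        index = min(int((x - lo) / width), num_bins - 1)
        counts[index] += 1
    rows, cumulative = [], 0
    for i, count in enumerate(counts):
        cumulative += count
        rows.append((lo + i * width, lo + (i + 1) * width,
                     count, count / n, cumulative / n))
    return rows

# Hypothetical sample with n = 9, so the rule of thumb gives 3 bins.
data = [1, 2, 2, 3, 5, 6, 7, 8, 10]
table = frequency_table(data)
for lower, upper, count, rel, cum in table:
    print(f"[{lower:4.1f}, {upper:4.1f})  {count}  {rel:.3f}  {cum:.3f}")
```

The cumulative column always ends at 1, and reading it at a bin's upper boundary gives the fraction of observations at or below that value.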
Figure 6-10 A cumulative distribution plot of the
compressive strength data from Minitab.
Figure 6-11 Histograms for symmetric and skewed distributions.
• Frequency distributions and histograms can also be used with
qualitative or categorical data
Figure 6-12 Boeing airplane production in 1985
6-4 Box Plots
• The box plot is a graphical display that simultaneously
describes several important features of a data set, such
as:
• center
• spread
• departure from symmetry
• identification of observations that lie unusually far
from the bulk of the data (outliers and extreme outliers)
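The quantities behind a box plot can be sketched numerically. The fences used here are an assumption based on the common convention: points more than 1.5 interquartile ranges beyond the box edges are flagged as outliers, and points more than 3 interquartile ranges beyond are flagged as extreme outliers. The data and the name `box_plot_summary` are hypothetical.

```python
import statistics

def box_plot_summary(xs):
    """Quartiles, whisker ends, and flagged points under the 1.5/3 IQR convention."""
    q1, q2, q3 = statistics.quantiles(xs, n=4)   # (n + 1)-based positions
    iqr = q3 - q1
    inner_lo, inner_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_lo, outer_hi = q1 - 3.0 * iqr, q3 + 3.0 * iqr
    inside = [x for x in xs if inner_lo <= x <= inner_hi]
    outliers = [x for x in xs
                if (outer_lo <= x < inner_lo) or (inner_hi < x <= outer_hi)]
    extreme = [x for x in xs if x < outer_lo or x > outer_hi]
    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        "whiskers": (min(inside), max(inside)),  # most distant non-flagged points
        "outliers": outliers,
        "extreme_outliers": extreme,
    }

# Hypothetical sample with one moderate and one extreme high value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 40]
summary = box_plot_summary(data)
print(summary)
```

Here the box spans 3 to 9 with the median at 6, the whiskers stop at the most distant points inside the inner fences, and the two large values are reported separately rather than stretching the whisker.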
Figure 6-13 Description of a box plot (also called a
box-and-whisker plot).
Figure 6-14 Box plot for compressive strength data in Table 6-2
Figure 6-15 Comparative box plots of a quality index at three plants.
6-5 Time Sequence Plots
• A time series or time sequence is a data set in
which the observations are recorded in the order in
which they occur.
• A time series plot is a graph in which the vertical
axis denotes the observed value of the variable (say x)
and the horizontal axis denotes the time (which could
be minutes, days, years, etc.).
• When measurements are plotted as a time series, we
often see
• trends,
• cycles, or
• other broad features of the data.
Figure 6-16 Company sales by year (a) and by quarter (b).
Figure 6-17 A digidot plot of the compressive strength data
in Table 6-2.
Figure 6-18 A digidot plot of chemical process concentration
readings, observed hourly.