
CH 06

The document discusses various methods for summarizing and displaying data, including the sample mean, sample variance, and sample range, highlighting their usefulness and limitations. It also covers stem-and-leaf diagrams, frequency distributions, histograms, and box plots, explaining how these tools can reveal important features of data sets. Additionally, time series plots are introduced to illustrate trends and cycles in data collected over time.

Uploaded by ali282h
Copyright: © All Rights Reserved

6.

6 SCATTER DIAGRAMS
6-1 Numerical Summaries of Data

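Definition: Sample Mean
If the n observations in a sample are denoted by x1, x2, ..., xn, the sample mean is
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$
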
6-1 Data Summary and Display
Example 6-1
6-1 Data Summary and Display

Figure 6-1 Dot diagram showing the sample mean as a balance point for a system of weights.
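The balance-point interpretation can be checked numerically: the deviations of the observations from the sample mean always sum to zero. The following is a minimal sketch in Python; the data values and the helper name `sample_mean` are illustrative assumptions, not the data of Example 6-1.

```python
def sample_mean(xs):
    """Arithmetic average of the observations."""
    return sum(xs) / len(xs)

# Hypothetical observations chosen only for illustration.
data = [2, 4, 4, 5, 10]

xbar = sample_mean(data)
deviations = [x - xbar for x in data]

# The positive and negative deviations cancel, which is why the mean
# acts as the balance point of the dot diagram.
print("sample mean:", xbar)
print("sum of deviations:", sum(deviations))
```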
6-1 Data Summary and Display

Population Mean
For a finite population with N measurements, the mean is
$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$

The sample mean $\bar{x}$ is a reasonable estimate of the population mean $\mu$.
6-1 Data Summary and Display

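Although the sample mean is useful, it does not convey all of the information about a sample of data.
→ Sample Variance: if x1, x2, ..., xn is a sample of n observations, the sample variance is
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
The sample standard deviation s is the positive square root of the sample variance.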
6-1 Data Summary and Display
How Does the Sample Variance Measure Variability?

Figure 6-2 How the sample variance measures variability through the deviations $x_i - \bar{x}$.
6-1 Data Summary and Display

Example 6-2
6-1 Data Summary and Display

Efficient Computation of $s^2$:
$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}$
6-1 Data Summary and Display

Computation of $s^2$
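The sample variance can be computed either from the deviations about the mean or from the sums of the observations and of their squares. The sketch below compares the two routes on a small sample; the function names are hypothetical and the data are illustrative rather than those of Example 6-2.

```python
def sample_variance(xs):
    """Definitional form: sum of squared deviations divided by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_variance_shortcut(xs):
    """Shortcut form: uses only the sum of x and the sum of x squared."""
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    return (total_sq - total ** 2 / n) / (n - 1)

# Illustrative sample.
data = [1, 3, 5, 8, 9]

v_def = sample_variance(data)
v_short = sample_variance_shortcut(data)
print(v_def, v_short)
```

Both forms agree algebraically. In floating-point arithmetic the shortcut form can lose accuracy when the mean is large relative to the spread, so the definitional form is often the safer choice in software.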
6-1 Data Summary and Display

Population Variance
When the population is finite and consists of N values, we may define the population variance as
$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$

The sample variance is a reasonable estimate of the population variance.
6-1 Data Summary and Display

Figure 6-3 Relationship between a population and a sample.
6-1 Data Summary and Display

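Definition: Sample Range
If the n observations in a sample are denoted by x1, x2, ..., xn, the sample range is
$r = \max(x_i) - \min(x_i)$
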
6-1 Data Summary and Display

Sample Range
• It is easy to calculate, but it ignores all of the information in the sample data between the largest and smallest values.
• Example: the samples [1, 3, 5, 8, 9] and [1, 5, 5, 5, 9] both have the same range (r = 8). However, their standard deviations differ: s1 = 3.35 > s2 = 2.83.
→ The variability is actually less in the second sample.
• Sometimes, when the sample size is small (say, 8 or 10), the information loss associated with the range is not too serious.
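The two samples in the example above can be checked directly. This is a minimal sketch using Python's standard `statistics` module; the helper name `sample_range` is an assumption made for illustration.

```python
import statistics

def sample_range(xs):
    """Largest observation minus smallest observation."""
    return max(xs) - min(xs)

first = [1, 3, 5, 8, 9]
second = [1, 5, 5, 5, 9]

# Both samples have the same range ...
r1, r2 = sample_range(first), sample_range(second)

# ... but different sample standard deviations (divisor n - 1).
s1 = statistics.stdev(first)
s2 = statistics.stdev(second)

print(r1, r2)
print(round(s1, 2), round(s2, 2))
```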
6-1 Data Summary and Display

Tutorial
6-2 Stem-and-Leaf Diagrams

Steps for Constructing a Stem-and-Leaf Diagram
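One common way to build such a display is to split each number into a stem (the leading digits) and a leaf (the last digit), list the stems in a column, and record each leaf beside its stem. The sketch below follows that approach under stated assumptions: the data are non-negative integers, the leaf is the units digit, and the values and function names are hypothetical (they are not the data of Table 6-2).

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by stem (value // 10) with leaf (value % 10)."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        groups[stem].append(leaf)
    return dict(groups)

def render(groups):
    """Return the display as text, one line per stem, including empty stems."""
    lines = []
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return "\n".join(lines)

# Hypothetical measurements for illustration only.
data = [105, 97, 110, 118, 121, 133, 134, 121, 99]

groups = stem_and_leaf(data)
print(render(groups))
print("Stem unit: 10, leaf unit: 1")
```

Sorting before grouping gives an ordered display, which makes it easier to read off the median and quartiles later.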


6-2 Stem-and-Leaf Diagrams

Figure 6-4 Stem-and-leaf diagram for the compressive strength data in Table 6-2.
6-2 Stem-and-Leaf Diagrams

Example 6-4
Inspection of this display immediately reveals that:

• most of the compressive strengths lie between 110 and 200 psi
• a central value is somewhere between 150 and 160 psi
• the strengths are distributed approximately symmetrically about the central value

→ The stem-and-leaf diagram enables us to determine quickly some important features of the data that were not immediately obvious in the original display in Table 6-2.
6-2 Stem-and-Leaf Diagrams

Example 6-5
6-2 Stem-and-Leaf Diagrams

Figure 6-5 Stem-and-leaf displays for Example 6-5.
6-2 Stem-and-Leaf Diagrams

Figure 6-6 A typical computer-generated stem-and-leaf diagram.
6-2 Stem-and-Leaf Diagrams
Data Features
• The median is a measure of central tendency that divides the data into two equal parts, half below the median and half above.
→ If the number of observations is even, the median is halfway between the two central values. From Fig. 6-6, the 40th and 41st values of strength are 160 and 163, so the median is (160 + 163)/2 = 161.5.
→ If the number of observations is odd, the median is the central value.
• The sample mode is the most frequently occurring data value.
• The range is a measure of variability that can be easily computed from the ordered stem-and-leaf display: range = 245 - 76 = 169.
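The median, mode, and range rules above can be sketched in a few lines of Python. The data below are hypothetical and were chosen so that the two central values are 160 and 163, as in the slide; the function names are illustrative.

```python
from collections import Counter

def median(xs):
    """Central value for odd n, midpoint of the two central values for even n."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

def modes(xs):
    """All values that occur most often (the mode need not be unique)."""
    counts = Counter(xs)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

# Hypothetical strengths, not the 80 observations of Figure 6-6.
data = [150, 158, 158, 160, 163, 170, 172, 181]

print("median:", median(data))
print("mode(s):", modes(data))
print("range:", max(data) - min(data))
```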
6-2 Stem-and-Leaf Diagrams
Data Features
When an ordered set of data is divided into four equal parts, the
division points are called quartiles.

The first or lower quartile, q1, is a value that has approximately one-fourth (25%) of the observations below it and approximately 75% of the observations above.

The second quartile, q2, has approximately one-half (50%) of the observations below its value.
→ The second quartile is exactly equal to the median.

The third or upper quartile, q3, has approximately three-fourths (75%) of the observations below its value.
→ As with the median, the quartiles may not be unique.
6-2 Stem-and-Leaf Diagrams
Data Features
• The compressive strength data in Figure 6-6 contain n = 80 observations.

• Minitab calculates the first and third quartiles as the (n + 1)/4 and 3(n + 1)/4 ordered observations and interpolates as needed.

→ For example, (80 + 1)/4 = 20.25 and 3(80 + 1)/4 = 60.75.

• Therefore, Minitab interpolates:
• between the 20th and 21st observations to obtain q1 = 143.50
• between the 60th and 61st observations to obtain q3 = 181.00.
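The (n + 1)/4 rule with linear interpolation can be sketched as follows. This is an illustration of the rule as stated on the slide, not Minitab's actual code; the function name is hypothetical, and the placeholder data 1, 2, ..., 80 are used so that each interpolated value equals its own position.

```python
def quantile_n_plus_1(xs, p):
    """Value at ordered position p * (n + 1), interpolating between neighbors."""
    s = sorted(xs)
    n = len(s)
    pos = p * (n + 1)      # 1-based position in the ordered data
    k = int(pos)           # observation just at or below the position
    frac = pos - k         # how far to move toward the next observation
    if k < 1:
        return s[0]
    if k >= n:
        return s[-1]
    return s[k - 1] + frac * (s[k] - s[k - 1])

# Placeholder data: with values 1..80 the result equals the position itself.
data = list(range(1, 81))

q1 = quantile_n_plus_1(data, 0.25)
q2 = quantile_n_plus_1(data, 0.50)
q3 = quantile_n_plus_1(data, 0.75)
print(q1, q2, q3)
```

With n = 80 the positions are 20.25, 40.5, and 60.75, so q1 lies one quarter of the way from the 20th to the 21st observation and q3 lies three quarters of the way from the 60th to the 61st.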
6-2 Stem-and-Leaf Diagrams

Data Features
• The interquartile range is the difference between the upper and lower quartiles, and it is sometimes used as a measure of variability.
→ IQR = q3 - q1

• In general, the 100kth percentile is a data value such that approximately 100k% of the observations are at or below this value and approximately 100(1 - k)% of them are above it.
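As a small worked illustration, the interquartile range for the quartiles quoted earlier (q1 = 143.50, q3 = 181.00) and a check of the percentile definition can be written as below. The helper `fraction_at_or_below` and the list `scores` are hypothetical and exist only for this sketch.

```python
def fraction_at_or_below(xs, value):
    """Share of observations that are less than or equal to value."""
    return sum(1 for x in xs if x <= value) / len(xs)

# Quartiles quoted in the slides for the compressive strength data.
q1, q3 = 143.50, 181.00
iqr = q3 - q1
print("IQR:", iqr)

# Hypothetical data 1..100: the value 90 has 90% of the observations
# at or below it, so it behaves as a 90th percentile (k = 0.9).
scores = list(range(1, 101))
print(fraction_at_or_below(scores, 90))
```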
6-2 Stem-and-Leaf Diagrams

Tutorial
6-16.
6-2 Stem-and-Leaf Diagrams

Tutorial: 6-16.

Median? Q1? Q3?



6-3 Frequency Distributions and Histograms

• A frequency distribution is a more compact summary of data than a stem-and-leaf diagram.

• To construct a frequency distribution, we must divide the range of the data into intervals, which are usually called class intervals, cells, or bins.

Constructing a Histogram (Equal Bin Widths):


6-3 Frequency Distributions and Histograms

Figure 6-7 Histogram of compressive strength for 80 aluminum-lithium alloy specimens.
6-3 Frequency Distributions and Histograms

Figure 6-8 A histogram of the compressive strength data from Minitab with 17 bins.
6-3 Frequency Distributions and Histograms

Figure 6-9 A histogram of the compressive strength data from Minitab with nine bins.
6-3 Frequency Distributions and Histograms

Choose the number of bins approximately equal to the square root of the number of observations.
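The square-root rule and the binning step can be sketched as follows. For n = 80 the rule gives √80 ≈ 8.9, or about nine bins. The function below is a simplified illustration that spreads equal-width bins between the smallest and largest observation; in practice convenient round bin limits are usually chosen instead. The function name and the data are assumptions.

```python
import math

def frequency_distribution(values, num_bins=None):
    """Count observations in equal-width bins spanning min(values)..max(values).

    Assumes the values are not all identical (bin width must be positive).
    """
    n = len(values)
    if num_bins is None:
        num_bins = round(math.sqrt(n))   # square-root rule of thumb
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The largest value would index one past the end, so clamp it.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts

# Placeholder data with 80 observations (not the values of Table 6-2).
data = list(range(80))

edges, counts = frequency_distribution(data)
print(len(counts), "bins")
print(counts)
```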

A frequency distribution for the compressive strength data in Table 6-2 is:
6-3 Frequency Distributions and Histograms

Relative frequency distribution
→ Relative frequencies are found by dividing the observed frequency in each bin by the total number of observations.

Cumulative frequency distribution
→ Such distributions are often easier to interpret than tables of data.
6-3 Frequency Distributions and Histograms

Figure 6-10 A cumulative distribution plot of the compressive strength data from Minitab.
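Relative and cumulative frequencies follow directly from the bin counts. The sketch below uses hypothetical counts (not the frequency table for Table 6-2) and illustrative function names.

```python
from itertools import accumulate

def relative_frequencies(counts):
    """Each bin count divided by the total number of observations."""
    total = sum(counts)
    return [c / total for c in counts]

def cumulative_frequencies(counts):
    """Running total of the bin counts, bin by bin."""
    return list(accumulate(counts))

# Hypothetical bin counts for illustration.
counts = [1, 4, 10, 5]

rel = relative_frequencies(counts)
cum = cumulative_frequencies(counts)
print(rel)
print(cum)
```

The last cumulative frequency always equals the number of observations, and the relative frequencies always sum to 1, which gives two quick consistency checks on a hand-built table.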
6-3 Frequency Distributions and Histograms

Figure 6-11 Histograms for symmetric and skewed distributions.


6-3 Frequency Distributions and Histograms

• Frequency distributions and histograms can also be used with qualitative or categorical data.
6-3 Frequency Distributions and Histograms

Figure 6-12 Boeing airplane production in 1985.


6-4 Box Plots

• The box plot is a graphical display that simultaneously describes several important features of a data set, such as:
• center
• spread
• departure from symmetry
• identification of observations that lie unusually far from the bulk of the data (outliers and extreme outliers)
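The quantities a box plot displays can be computed without drawing anything. The sketch below assumes the common convention that whiskers reach the most extreme points within 1.5 interquartile ranges of the box, that points between 1.5 and 3 interquartile ranges from the box are outliers, and that points farther away are extreme outliers. That convention, the function names, and the data are assumptions made for this illustration.

```python
def quantile(sorted_xs, p):
    """Value at ordered position p * (n + 1), with linear interpolation."""
    n = len(sorted_xs)
    pos = p * (n + 1)
    k = int(pos)
    if k < 1:
        return sorted_xs[0]
    if k >= n:
        return sorted_xs[-1]
    return sorted_xs[k - 1] + (pos - k) * (sorted_xs[k] - sorted_xs[k - 1])

def box_plot_summary(xs):
    """Quartiles, whisker ends, outliers, and extreme outliers."""
    s = sorted(xs)
    q1, q2, q3 = (quantile(s, p) for p in (0.25, 0.50, 0.75))
    iqr = q3 - q1
    inner_lo, inner_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # assumed whisker limit
    outer_lo, outer_hi = q1 - 3.0 * iqr, q3 + 3.0 * iqr   # assumed extreme limit
    inside = [x for x in s if inner_lo <= x <= inner_hi]
    outliers = [x for x in s
                if outer_lo <= x < inner_lo or inner_hi < x <= outer_hi]
    extreme = [x for x in s if x < outer_lo or x > outer_hi]
    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
        "extreme_outliers": extreme,
    }

# Hypothetical data with one moderate and one extreme high value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 25, 40]

summary = box_plot_summary(data)
print(summary)
```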
6-4 Box Plots

Figure 6-13 Description of a box plot (also called a box-and-whisker plot).
6-4 Box Plots

Figure 6-14 Box plot for compressive strength data in Table 6-2
6-4 Box Plots

Figure 6-15 Comparative box plots of a quality index at three plants.
6-5 Time Sequence Plots

• A time series or time sequence is a data set in which the observations are recorded in the order in which they occur.
• A time series plot is a graph in which the vertical axis denotes the observed value of the variable (say x) and the horizontal axis denotes the time (which could be minutes, days, years, etc.).
• When measurements are plotted as a time series, we often see
• trends,
• cycles, or
• other broad features of the data
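The difference between a trend and a cycle can be seen by viewing the same series at two time scales. The sketch below uses hypothetical quarterly sales figures: the yearly totals show only the upward trend, while the quarter-by-quarter values also show a pattern that repeats each year. The text-based plot stands in for a proper graph.

```python
# Hypothetical quarterly sales for three years (illustration only).
quarterly_sales = {
    "Year 1": [10, 14, 12, 8],
    "Year 2": [12, 16, 14, 10],
    "Year 3": [14, 18, 16, 12],
}

# Yearly view: aggregating hides the within-year cycle and leaves the trend.
yearly_totals = {year: sum(q) for year, q in quarterly_sales.items()}
print(yearly_totals)

# Quarterly view: a crude text time series plot, one row per time period.
series = [v for q in quarterly_sales.values() for v in q]
for t, value in enumerate(series, start=1):
    print(f"{t:>2} {'*' * value}")

# The quarter with the highest sales is the same every year,
# which is the repeating cycle visible in the quarterly plot.
peak_quarters = [q.index(max(q)) + 1 for q in quarterly_sales.values()]
print(peak_quarters)
```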
6-5 Time Sequence Plots

Figure 6-16 Company sales by year (a) and by quarter (b).


6-5 Time Sequence Plots

Figure 6-17 A digidot plot of the compressive strength data in Table 6-2.
6-5 Time Sequence Plots

Figure 6-18 A digidot plot of chemical process concentration readings, observed hourly.
