CORRELATION ANALYSIS
Learning Objectives
• Understand how correlation can be used to demonstrate a
relationship between two factors.
• Know how to perform a correlation analysis and calculate the
coefficient of linear correlation (r).
• Understand how a correlation analysis can be used in an
improvement project.
How does it help?
Correlation Analysis is necessary to:
• show a relationship between two variables. This also sets the stage for potential cause and effect.
IMPROVEMENT ROADMAP
Uses of Correlation Analysis
Common Uses
• Determine and quantify the relationship between factors (x) and output characteristics (Y).
Breakthrough Strategy
Characterization
Phase 1: Measurement
Phase 2: Analysis
Optimization
Phase 3: Improvement
Phase 4: Control
KEYS TO SUCCESS
Always plot the data
Remember: Correlation does not always imply cause & effect
Use correlation as a follow up to the Fishbone Diagram
Keep it simple and do not let the tool take on a life of its own
WHAT IS CORRELATION?
Correlation: Y = f(x)
As the input variable changes, there is an influence or bias on the output variable.
[Figure: scatter plot with the input or x variable (independent) on the horizontal axis and the output or y variable (dependent) on the vertical axis.]
WHAT IS CORRELATION?
• A measurable relationship between two variable data characteristics.
Not necessarily Cause & Effect (Y=f(x))
• Correlation requires paired data sets, i.e., (Y1,x1), (Y2,x2), etc.
• The input variable is called the independent variable (x or KPIV) since it is independent
of any other constraints
• The output variable is called the dependent variable (Y or KPOV) since it is
(theoretically) dependent on the value of x.
• The coefficient of linear correlation “r” is the measure of the strength of the
relationship.
• The square of “r” is the fraction of the variation in the response (Y) that is related to the input (x); multiply by 100 to express it as a percent.
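The two points above about paired data and the square of “r” can be illustrated with a minimal Python sketch. The function names `pair_up` and `percent_related` and the sample readings are hypothetical, chosen only for illustration; the value r = -0.47 is the one used in the worked example later in this deck.

```python
def pair_up(x_values, y_values):
    """Return (Y, x) ordered pairs; correlation needs one Y for every x."""
    if len(x_values) != len(y_values):
        raise ValueError("correlation requires paired data: lengths differ")
    return list(zip(y_values, x_values))


def percent_related(r):
    """Square of r, expressed as a percent of the response related to the input."""
    return 100.0 * r ** 2


# Hypothetical readings, for illustration only.
pairs = pair_up([1, 2, 3], [2.1, 3.9, 6.2])
print(pairs)

# For r = -0.47, roughly 22% of the response is related to the input.
print(round(percent_related(-0.47), 1))
```

Note that the sign of r drops out when it is squared, so a correlation of -0.47 and one of +0.47 relate the same share of the response to the input.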
TYPES OF CORRELATION
[Figure: six scatter plots of Y=f(x) against x, arranged in two rows (Positive, Negative) and three columns (Strong, Weak, None).]
CALCULATING “r”
Coefficient of Linear Correlation
• Calculate the sample covariance s_xy:
$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
• Calculate s_x and s_y for each data set.
• Use the calculated values to compute r_CALC:
$$r_{CALC} = \frac{s_{xy}}{s_x s_y}$$
• Add a + for positive correlation and - for a negative correlation.
While this is the most precise method to calculate Pearson’s r,
there is an easier way to come up with a fairly close
approximation...
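The covariance-based calculation described above can be sketched in plain Python as follows. This is a minimal illustration, not a prescribed implementation: the function name `pearson_r` and the sample data are assumptions. When the covariance is computed directly, the sign of r comes out of the arithmetic.

```python
import math


def pearson_r(x, y):
    """Coefficient of linear correlation: r = s_xy / (s_x * s_y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample covariance, with n - 1 in the denominator.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    # Sample standard deviations of each data set.
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when one variable does not vary")
    return s_xy / (s_x * s_y)


# Hypothetical data, for illustration only.
x = [1, 2, 3]
y = [1, 3, 2]
print(pearson_r(x, y))
```

For this small data set s_xy = 0.5 and s_x = s_y = 1, so r works out to 0.5.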
APPROXIMATING “r”
Coefficient of Linear Correlation
• Plot the data on orthogonal axes
• Draw an oval around the data
• Measure the length (L) and width (W) of the oval
• Calculate the coefficient of linear correlation (r) based on the formulas below
$$r \approx \pm\left(1 - \frac{W}{L}\right)$$
+ = positive slope
- = negative slope
Example, with W = 6.7 and L = 12.6 and a negative slope:
$$r \approx -\left(1 - \frac{6.7}{12.6}\right) = -.47$$
[Figure: scatter plot of Y=f(x) against x with an oval drawn around the data; the oval's length L = 12.6 and width W = 6.7 are read from a ruler scaled 1 to 15.]
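The oval approximation reduces to one line of arithmetic, sketched here in Python. The function name `approx_r` and its `slope_sign` parameter are hypothetical; the measurements 6.7 and 12.6 are the ones from the example above.

```python
def approx_r(width, length, slope_sign):
    """Approximate r from an oval drawn around the data: r ~ +/-(1 - W/L)."""
    if length <= 0 or width < 0 or width > length:
        raise ValueError("expected 0 <= width <= length and length > 0")
    if slope_sign not in (1, -1):
        raise ValueError("slope_sign must be +1 (positive) or -1 (negative)")
    return slope_sign * (1 - width / length)


# Oval measured as W = 6.7 and L = 12.6, with a negative slope.
r = approx_r(6.7, 12.6, -1)
print(round(r, 2))  # -0.47
```

A long, thin oval (W much smaller than L) gives a value near 1 in magnitude, while a nearly circular oval (W close to L) gives a value near 0.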
HOW DO I KNOW WHEN I HAVE CORRELATION?
• The answer should strike a familiar chord at this point… We have confidence (95%) that we have correlation when |rCALC| > rCRIT.
• Since sample size is a key determinant of rCRIT, we need to use a table to determine the correct rCRIT given the number of ordered pairs which comprise the complete data set.
• So, in the preceding example we had 60 ordered pairs of data and we computed an rCALC of -.47. Using the table below we determine that the rCRIT value for 60 is .26 (60 is not tabulated; interpolating between the entries for 50 and 80 gives about .26).
• Comparing |rCALC| > rCRIT we get .47 > .26. Therefore the calculated value exceeds the minimum critical value required for significance.
• Conclusion: We are 95% confident that the observed correlation is significant.

Ordered Pairs   rCRIT
5               .88
6               .81
7               .75
8               .71
9               .67
10              .63
15              .51
20              .44
25              .40
30              .36
50              .28
80              .22
100             .20
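The significance check can be sketched in Python using the tabulated rCRIT values. The names `R_CRIT_TABLE`, `r_crit`, and `is_significant` are hypothetical. Linear interpolation between tabulated rows is an assumption made here for sample sizes the table does not list; it reproduces the .26 quoted for 60 ordered pairs, but it is only an approximation of the true critical value.

```python
# (ordered pairs, rCRIT) at 95% confidence, copied from the table above.
R_CRIT_TABLE = [
    (5, 0.88), (6, 0.81), (7, 0.75), (8, 0.71), (9, 0.67), (10, 0.63),
    (15, 0.51), (20, 0.44), (25, 0.40), (30, 0.36), (50, 0.28),
    (80, 0.22), (100, 0.20),
]


def r_crit(n_pairs):
    """Look up rCRIT, interpolating linearly between rows (an assumption)."""
    if n_pairs < R_CRIT_TABLE[0][0] or n_pairs > R_CRIT_TABLE[-1][0]:
        raise ValueError("sample size is outside the range of the table")
    for (n_lo, r_lo), (n_hi, r_hi) in zip(R_CRIT_TABLE, R_CRIT_TABLE[1:]):
        if n_lo <= n_pairs <= n_hi:
            frac = (n_pairs - n_lo) / (n_hi - n_lo)
            return r_lo + frac * (r_hi - r_lo)
    raise ValueError("sample size not covered by the table")


def is_significant(r_calc, n_pairs):
    """True when |rCALC| > rCRIT for the given number of ordered pairs."""
    return abs(r_calc) > r_crit(n_pairs)


# The example from the text: 60 ordered pairs, rCALC = -0.47.
print(round(r_crit(60), 2))       # 0.26
print(is_significant(-0.47, 60))  # True
```

Taking the absolute value of rCALC before the comparison means the same check works for positive and negative correlation.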
Learning Objectives
• Understand how correlation can be used to demonstrate a
relationship between two factors.
• Know how to perform a correlation analysis and calculate the
coefficient of linear correlation (r).
• Understand how a correlation analysis can be used in a blackbelt
story.