Logistic Regression
~ by Heiletjé van Zyl (VZYHEI003@myuct.ac.za)
Room 5.61, P.D. Hahn Building, Level 5 (South entrance)
Linear regression
Assumptions:
(1) The underlying relationship between the CONTINUOUS dependent variable and the (either continuous and/or categorical) independent variable/s is linear.
(2) Errors are assumed to be:
> Independent
> Normally distributed for all $x$'s, with mean 0 and constant variance: $\varepsilon_i \sim N(0, \sigma^2)$

Logistic regression
Assumptions:
(1) The BINARY dependent variable has a relationship with the (either continuous and/or categorical) independent variable/s.
∴ Outcome of interest = NOT continuous and so NOT normally distributed…
[Figure: fitted lines/planes for simple (y against x) and multiple (y against x1 and x2) linear regression]
Application of LR
On 28 January 1986, the Space Shuttle Challenger exploded 73 seconds after lift-off
~ due to O-ring (the circular gasket that sealed the right rocket booster) failure, caused by exposure to a very low temperature (30 ℉, roughly -1 ℃)
Morton Thiokol engineers (specifically Roger Boisjoly) warned NASA management about this risk…
BUT NASA overruled these engineers, estimating that there was only a 1 in 100 000 chance of shuttle failure for any given launch! This subpar statistical reasoning was one of the main reasons the launch still went ahead.
“Take off your engineer hat, and put on your management hat” ~ NASA to Boisjoly
Application of LR
Of interest: to predict whether the Space Shuttle Challenger’s O-rings will fail or not,
contingent on the temperature at the time of the launch.
All of the variables involved:
• 𝒚 (failure) = binary variable indicating whether O-ring failure occurred (1) or not (0).
• 𝒙𝟏 (temperature) = temperature at the time of launch (in ℉)
Application of LR
[Figure: scatterplot of the data, with the simple linear regression fit (identity link) and the logistic regression fit (logit link) superimposed]
> Want to estimate the probability of getting a 1 (i.e., O-ring failure occurring)
> The fitted line needs to be constrained such that it falls between 0 and 1…
Logit Transformation
Logistic regression uses the following logit link function:

$$\text{logit}(p_i) = \log\!\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_p x_{pi}$$

where $p_i$ relates to the dependent variable, $\beta_0$ is the intercept parameter, $\beta_1, \dots, \beta_p$ are the slope parameters (related to the different independent variables), and $x_{1i}, \dots, x_{pi}$ are the independent variables.

This transformation enables the constraining of $p_i$ to be between 0 and 1!

$\dfrac{p_i}{1-p_i}$ = odds of the outcome of interest occurring; $\log\!\left(\dfrac{p_i}{1-p_i}\right)$ = log-odds of the outcome of interest occurring
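As a quick numerical illustration (values chosen purely for demonstration), base R's qlogis() and plogis() compute the logit and its inverse: probabilities in (0, 1) map onto the whole real line and back.

```r
# Logit maps probabilities in (0, 1) onto the whole real line...
p <- c(0.05, 0.25, 0.50, 0.75, 0.95)

log_odds <- qlogis(p)       # logit: log(p / (1 - p))
round(log_odds, 3)          # -2.944 -1.099  0.000  1.099  2.944

# ...and the inverse logit maps any real number back into (0, 1)
round(plogis(log_odds), 2)  # recovers 0.05 0.25 0.50 0.75 0.95
```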
Logit Transformation
$$\text{logit}(\hat p_i) = \log\!\left(\frac{\hat p_i}{1-\hat p_i}\right) = \hat\beta_0 + \hat\beta_1 x_{1i} + \hat\beta_2 x_{2i} + \dots + \hat\beta_p x_{pi}$$
But recall the outcome of interest is still the predicted probability $\hat p_i$…
∴ NEED TO BACKTRANSFORM
(once model is fitted and 𝛽 coefficients estimated)
$$\hat p_i = \frac{\exp(\hat\beta_0 + \hat\beta_1 x_{1i} + \hat\beta_2 x_{2i} + \dots + \hat\beta_p x_{pi})}{1 + \exp(\hat\beta_0 + \hat\beta_1 x_{1i} + \hat\beta_2 x_{2i} + \dots + \hat\beta_p x_{pi})} = \frac{1}{1 + \exp\!\big(-(\hat\beta_0 + \hat\beta_1 x_{1i} + \hat\beta_2 x_{2i} + \dots + \hat\beta_p x_{pi})\big)}$$

where $\hat\beta_0 + \hat\beta_1 x_{1i} + \hat\beta_2 x_{2i} + \dots + \hat\beta_p x_{pi}$ = LOG-ODDS (LO), so equivalently:

$$\hat p_i = \frac{\exp(\text{LO})}{1 + \exp(\text{LO})} = \frac{1}{1 + \exp(-\text{LO})}$$
$\hat p_i \in (0, 1)$, so $\hat p_i$ could take on a value like 0.66…
BUT we want to speak in terms of 0 and 1 only! So we often specify a threshold value $\pi$ (e.g., 0.5) and classify:

$$\hat y_i = \begin{cases} 0, & \text{if } \hat p_i \le \pi \\ 1, & \text{if } \hat p_i > \pi \end{cases}$$
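A minimal sketch of the back-transformation and thresholding, assuming hypothetical coefficient estimates b0 and b1 and predictor values x1; base R's plogis() computes exp(LO)/(1 + exp(LO)):

```r
# Hypothetical estimates and predictor values (for illustration only)
b0 <- -1.5; b1 <- 0.8
x1 <- c(0.5, 1.0, 2.0, 3.5)

log_odds <- b0 + b1 * x1      # linear predictor = log-odds (LO)
p_hat    <- plogis(log_odds)  # back-transform: exp(LO) / (1 + exp(LO))

# Classify using a chosen threshold (here pi = 0.5)
threshold <- 0.5
y_hat <- ifelse(p_hat > threshold, 1, 0)
cbind(x1, log_odds, p_hat, y_hat)
```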
Odds versus Log(odds)
$\dfrac{p_i}{1-p_i}$ = odds of the outcome of interest occurring

$\log\!\left(\dfrac{p_i}{1-p_i}\right)$ = log-odds of the outcome of interest occurring

> Here log denotes the logarithm with base e (the natural logarithm): use the ln button on your calculator to get this!
> DO NOT USE the log button, which denotes the logarithm with base 10!
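The same distinction applies in R: log() is the natural logarithm (base e), while log10() is base 10.

```r
odds <- 3
log(odds)      # 1.0986 -> natural log (base e): the log-odds used in logistic regression
log10(odds)    # 0.4771 -> base-10 log: NOT what is used here
exp(log(odds)) # 3      -> exp() undoes the natural log
```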
Logit Transformation
[Figure: the fitted model in terms of probability (S-shaped curve of p against x) and in terms of the logit transformation (linear in x)]
Ways in which to interpret 𝛽! coefficients of fitted LR model
Log-odds:

$$\log\!\left(\frac{\hat p_i}{1-\hat p_i}\right) = \hat\beta_0 + \hat\beta_1 x_{1i}$$

> $\hat\beta_1$ = change in the logarithm of the odds (log-odds) when $x_{1i}$ is increased by one unit.
> For more than one unit in $x_{1i}$: $r \times \hat\beta_1$ = estimated change in the log-odds when $x_{1i}$ is increased by $r$ units.

Odds, with a factor:

$$\frac{\hat p_i}{1-\hat p_i} = \exp(\hat\beta_0) \times \exp(\hat\beta_1 x_{1i})$$

> $\exp(\hat\beta_1)$ = factor that the odds get multiplied by for every one-unit increase in $x_{1i}$.
> For more than one unit in $x_{1i}$: $\exp(r \times \hat\beta_1)$ = factor that the odds get multiplied by for every $r$-unit increase in $x_{1i}$.
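A sketch of how these quantities are read off a fitted model in R, assuming an object fit from glm(..., family = "binomial") with a continuous predictor named x1 (hypothetical names):

```r
# Assumes a fitted model, e.g. fit <- glm(y ~ x1, data = dat, family = "binomial")
b1 <- coef(fit)["x1"]   # estimated change in the log-odds per one-unit increase in x1

exp(b1)                 # factor the odds are multiplied by per one-unit increase

r <- 10                 # for an r-unit increase (here r = 10)
r * b1                  # estimated change in the log-odds
exp(r * b1)             # factor the odds are multiplied by
```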
Ways in which to interpret factors
$$\frac{\hat p_i}{1-\hat p_i} = \exp(\hat\beta_0) \times \exp(\hat\beta_1 x_{1i})$$

$\exp(\hat\beta_1)$ = factor that the odds get multiplied by for every one-unit change in $x_{1i}$.

| Magnitude of factor $\exp(\hat\beta_1)$ | Interpretation |
|---|---|
| Greater than 1 | Positive effect – increase in odds |
| Less than 1 | Negative effect – decrease in odds |
| Equal to 1 | No effect |
Examples:
> $\exp(\hat\beta_1)$ = 1.8
~ a one-unit increase in some $x$ is associated with the odds of the event being multiplied by 1.8, meaning an increase of 80%.
> $\exp(\hat\beta_1)$ = 0.6
~ a one-unit increase in some $x$ is associated with the odds of the event being multiplied by 0.6, meaning a decrease of 40%.
> $\exp(\hat\beta_1)$ = 1
~ a one-unit increase in some $x$ is associated with the odds of the event being multiplied by 1, meaning there is no effect.
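To translate a factor into the percentage change in the odds, compute (factor − 1) × 100; this reproduces the figures quoted above:

```r
factors        <- c(1.8, 0.6, 1.0)      # example values of exp(beta1_hat)
percent_change <- (factors - 1) * 100   # +80%, -40%, 0% change in the odds
data.frame(factors, percent_change)
```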
Odds Ratio (OR’s)
$$\log\!\left(\frac{\hat p_i}{1-\hat p_i}\right) = \hat\beta_0 + \hat\beta_1 x_{1i}$$

Assume that $x_{1i}$ is a dichotomous variable, where 1 = exposed and 0 = unexposed.

If $\hat p_{i0}$ = Pr(event | unexposed), then substituting 0 for $x_{1i}$:
> Odds of the event among the unexposed are: $\dfrac{\hat p_{i0}}{1-\hat p_{i0}} = \exp(\hat\beta_0)$

If $\hat p_{i1}$ = Pr(event | exposed), then substituting 1 for $x_{1i}$:
> Odds of the event among the exposed are: $\dfrac{\hat p_{i1}}{1-\hat p_{i1}} = \exp(\hat\beta_0 + \hat\beta_1)$

Ratio of the odds of the event for an exposed person relative to an unexposed person:

$$OR = \frac{\hat p_{i1}/(1-\hat p_{i1})}{\hat p_{i0}/(1-\hat p_{i0})} = \frac{\exp(\hat\beta_0 + \hat\beta_1)}{\exp(\hat\beta_0)} = e^{\hat\beta_1}$$
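A small simulated check (made-up data, not the Challenger example) that $\exp(\hat\beta_1)$ from a logistic regression with a 0/1 exposure variable equals the ratio of the fitted odds for exposed versus unexposed:

```r
set.seed(1)
exposed <- rbinom(200, 1, 0.5)                         # 0 = unexposed, 1 = exposed (simulated)
event   <- rbinom(200, 1, plogis(-1 + 1.2 * exposed))  # simulated 0/1 outcome

fit <- glm(event ~ exposed, family = "binomial")

odds_unexposed <- exp(coef(fit)[1])          # exp(beta0_hat)
odds_exposed   <- exp(sum(coef(fit)))        # exp(beta0_hat + beta1_hat)

odds_exposed / odds_unexposed                # odds ratio...
exp(coef(fit)["exposed"])                    # ...equals exp(beta1_hat)
```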
Perform LR in R
In R, use the function:
glm(y ~ x1 + x2, data = ..., family = "binomial")
[Screenshot: R code and summary() output; the estimates $\hat\beta_0$ (Intercept) and $\hat\beta_1$ (slope for temperature) are read from the Coefficients table of the output]
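A minimal sketch of this call for the Challenger example, assuming a data frame named challenger with a 0/1 column failure and a column temperature in ℉ (hypothetical names — substitute your own data frame and variable names):

```r
# Hypothetical data: 'challenger' with columns 'failure' (1 = O-ring failure, 0 = none)
# and 'temperature' (degrees Fahrenheit)
fit <- glm(failure ~ temperature, data = challenger, family = "binomial")

summary(fit)   # Coefficients table gives the estimates of beta0 (Intercept) and beta1
coef(fit)      # the estimated coefficients as a vector
```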
Using R for Interpretation
$$\log\!\left(\frac{\hat p_i}{1-\hat p_i}\right) = 10.87535 - 0.17132\,(\text{Temperature})$$

THEN BACKTRANSFORMS TO:

$$\hat p_i = \frac{\exp\!\big(10.87535 - 0.17132\,(\text{Temperature})\big)}{1 + \exp\!\big(10.87535 - 0.17132\,(\text{Temperature})\big)}$$

Alternatively:

$$\hat p_i = \frac{1}{1 + \exp\!\big(-(10.87535 - 0.17132\,(\text{Temperature}))\big)}$$
In terms of log-odds
> On average, the log-odds of O-ring failure occurring decreases by 0.17 units if temperature increases by 1 ℉.
In terms of factor (one-unit)
> On average, a 1 ℉ increase in temperature is associated with the odds of O-ring failure changing by a factor of $\exp(-0.17132) = 0.84$, which translates to a 16% decrease in the odds of O-ring failure occurring.
In terms of factor (ten-unit)
> On average, a 10 ℉ increase in temperature is associated with the odds of O-ring failure changing by a factor of $\exp(-0.17132 \times 10) = 0.18$, which translates to an 82% decrease in the odds of O-ring failure occurring.
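These factors can be verified directly in R:

```r
exp(-0.17132)       # ~0.84: odds of failure multiplied by 0.84 per 1 degree F increase (-16%)
exp(-0.17132 * 10)  # ~0.18: odds of failure multiplied by 0.18 per 10 degree F increase (-82%)
```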
Checking significance of 𝒙 effect
Of interest: how confident are we that the temperature effect is real?
Compute a 95% confidence interval for $\beta_1$:

$$\text{CI} = \hat\beta_1 \pm z_{\alpha/2} \times s_{\hat\beta_1}$$

Note: the critical value is NOT from the t-distribution, but from the NORMAL!

Raw scale:
CI = $-0.17132 \pm 1.96 \times 0.08344$
∴ $(-0.335,\ -0.008)$

OR scale:
$(e^{-0.335},\ e^{-0.008})$
∴ $(0.715,\ 0.992)$
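A sketch of this calculation in R, using the estimate and standard error reported in the summary() output:

```r
b1 <- -0.17132; se_b1 <- 0.08344

ci_raw <- b1 + c(-1, 1) * qnorm(0.975) * se_b1  # 95% CI on the raw (log-odds) scale
ci_raw        # approximately (-0.335, -0.008)

exp(ci_raw)   # same interval on the OR scale: approximately (0.715, 0.992)
```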
Prediction
Of interest: to predict the probability of O-ring failure when the temperature is 30 ℉
Steps:
(1) Estimate log-odds by substituting in value of 𝑻𝒆𝒎𝒑𝒆𝒓𝒂𝒕𝒖𝒓𝒆 (i.e., 𝒙 value of interest)
(2) Substitute the estimated log-odds into the back-transformed $\hat p_i$ formula

$$\log\!\left(\frac{\hat p_i}{1-\hat p_i}\right) = 10.87535 - 0.17132\,(30) = 5.73575$$

THEN BACKTRANSFORMS TO:

$$\hat p_i = \frac{\exp(5.73575)}{1 + \exp(5.73575)}$$

Alternatively:

$$\hat p_i = \frac{1}{1 + \exp(-5.73575)}$$

Both equal 0.99678…
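The same prediction in R, either by back-transforming the estimated log-odds manually with plogis(), or via predict() on the fitted object (here called fit, a hypothetical name):

```r
# Manual back-transformation of the estimated log-odds at 30 degrees Fahrenheit
log_odds_30 <- 10.87535 - 0.17132 * 30   # = 5.73575
plogis(log_odds_30)                      # = exp(5.73575) / (1 + exp(5.73575)) ~ 0.9968

# Equivalent, assuming a fitted model object 'fit' with predictor 'temperature':
# predict(fit, newdata = data.frame(temperature = 30), type = "response")
```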
Confusion Matrix
LABEL PROPERLY!!!!!

|  | OBSERVED Positive | OBSERVED Negative |
|---|---|---|
| PREDICTED Positive | a | b |
| PREDICTED Negative | c | d |

Positive Predictive Value: PPV = a / (a + b)
> probability of being positive given that the prediction is positive.
> proportion of true positives (a) among all the predicted positives (a + b).

Negative Predictive Value: NPV = d / (c + d)
> probability of being negative given that the prediction is negative.
> proportion of true negatives (d) among all the predicted negatives (c + d).

Sensitivity = a / (a + c)
> ability of the model to predict a true positive.
> proportion of true positives (a) among all the actual (observed) positives (a + c).

Specificity = d / (b + d)
> ability of the model to predict a true negative.
> proportion of true negatives (d) among all the actual (observed) negatives (b + d).
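As an illustrative helper (not from the course code), the sketch below computes all four measures from the counts a, b, c and d laid out as above; plugging in the $\pi$ = 0.5 matrix on the next slide reproduces its values.

```r
# a = predicted positive & observed positive,  b = predicted positive & observed negative,
# c = predicted negative & observed positive,  d = predicted negative & observed negative
confusion_measures <- function(a, b, c, d) {
  c(PPV         = a / (a + b),
    NPV         = d / (c + d),
    Sensitivity = a / (a + c),
    Specificity = d / (b + d))
}

round(confusion_measures(a = 3, b = 1, c = 4, d = 16), 4)
#   PPV    NPV  Sensitivity  Specificity
#  0.75   0.80       0.4286       0.9412
```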
Confusion Matrix
Threshold value chosen to be $\pi$ = 0.5

|  | OBSERVED O-rings Fail | OBSERVED O-rings Succeed |
|---|---|---|
| PREDICTED O-rings Fail | 3 | 1 |
| PREDICTED O-rings Succeed | 4 | 16 |

PPV = 3 / (3 + 1) = 0.75
NPV = 16 / (4 + 16) = 0.80
Sensitivity = 3 / (3 + 4) ≈ 0.4286
Specificity = 16 / (1 + 16) ≈ 0.9412
Choosing the threshold value ($\pi$) involves a trade-off between sensitivity and specificity.

| Change in $\pi$ | Effect |
|---|---|
| Decrease | Classify more observations as 1 ∴ ↑ sensitivity and ↓ specificity |
| Increase | Classify more observations as 0 ∴ ↑ specificity and ↓ sensitivity |

OVERALL GOAL = MAXIMISE THE MODEL'S ABILITY TO CLASSIFY CORRECTLY!
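A sketch of how this trade-off can be examined numerically, assuming vectors p_hat (fitted probabilities from the model) and y (observed 0/1 outcomes) already exist — both are hypothetical names:

```r
# Assumes: p_hat = fitted probabilities, y = observed 0/1 outcomes (hypothetical names)
sens_spec <- function(threshold, p_hat, y) {
  y_hat <- as.numeric(p_hat > threshold)   # classify using the given threshold
  c(sensitivity = sum(y_hat == 1 & y == 1) / sum(y == 1),
    specificity = sum(y_hat == 0 & y == 0) / sum(y == 0))
}

# Lowering the threshold raises sensitivity; raising it raises specificity
sapply(c(0.2, 0.5, 0.8), sens_spec, p_hat = p_hat, y = y)
```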
Confusion Matrix
Decrease the threshold value to $\pi$ = 0.2 (no longer 0.5) ~ more observations classified as 1!

|  | OBSERVED O-rings Fail | OBSERVED O-rings Succeed |
|---|---|---|
| PREDICTED O-rings Fail | 6 | 8 |
| PREDICTED O-rings Succeed | 1 | 9 |

PPV = 6 / (6 + 8) ≈ 0.4286
NPV = 9 / (1 + 9) = 0.900
Sensitivity = 6 / (6 + 1) ≈ 0.8571
Specificity = 9 / (8 + 9) ≈ 0.5294
Evidently, decreasing $\pi$ results in the sensitivity increasing (from 0.4286 to 0.8571) and the specificity decreasing (from 0.9412 to 0.5294)…
Confusion Matrix
Increase the threshold value to $\pi$ = 0.8 (no longer 0.5) ~ more observations classified as 0!

|  | OBSERVED O-rings Fail | OBSERVED O-rings Succeed |
|---|---|---|
| PREDICTED O-rings Fail | 1 | 0 |
| PREDICTED O-rings Succeed | 6 | 17 |

PPV = 1 / (1 + 0) = 1
NPV = 17 / (6 + 17) ≈ 0.7391
Sensitivity = 1 / (1 + 6) ≈ 0.1429
Specificity = 17 / (0 + 17) = 1
Evidently, increasing $\pi$ results in the sensitivity decreasing (from 0.4286 to 0.1429) and the specificity increasing (from 0.9412 to 1)…