Logistic Regression
~ by Heiletjé van Zyl (VZYHEI003@myuct.ac.za)
Room 5.61, P.D. Hahn Building, Level 5 (South entrance)
Linear regression
Assumptions:
(1) The underlying relationship between the CONTINUOUS dependent variable and the (either continuous and/or categorical) independent variable/s is linear.
(2) Errors are assumed to be:
> Independent
> Normally distributed for all $x$'s, with mean 0 and constant variance: $\varepsilon_i \sim N(0, \sigma^2)$

Logistic regression
Assumptions:
(1) The BINARY dependent variable has a relationship with the (either continuous and/or categorical) independent variable/s.
∴ Outcome of interest = NOT continuous and so NOT normally distributed…
[Figure: fitted lines/planes for simple (y against x) and multiple (y against x1 and x2) linear regression]
Application of LR
On 28 January 1986, the Space Shuttle Challenger exploded 73 seconds after lift-off
~ due to O-ring (the circular gasket that sealed the right rocket booster) failure, caused by exposure to a very low temperature (30 ℉, roughly -1 ℃)
Morton Thiokol engineers (specifically Roger Boisjoly) warned NASA management about this risk…
BUT NASA overruled these engineers, estimating that there was only a 1 in 100 000 chance of shuttle failure for any given launch! This subpar statistical reasoning was one of the main reasons the launch still went ahead.
“Take off your engineer hat, and put on your management hat” ~ NASA to Boisjoly
Application of LR
Of interest: to predict whether the Space Shuttle Challenger’s O-rings will fail or not,
contingent on the temperature at the time of the launch.
All of the variables involved:
• 𝒚 (failure) = binary variable indicating whether O-ring failure occurred (1) or not (0).
• 𝒙𝟏 (temperature) = temperature at the time of launch (in ℉)
Application of LR
[Figure: scatterplot of the data, with the simple linear regression fit (identity link) and the logistic regression fit (logit link) superimposed]
> Want to estimate the probability of getting a 1 (i.e., O-ring failure occurring)
> The fitted line needs to be constrained such that it falls between 0 and 1…
Logit Transformation
Logistic regression uses the following logit link function:

$$\text{logit}(p_i) = \log\!\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_p x_{pi}$$

where $p_i$ relates to the dependent variable, $\beta_0$ is the intercept parameter, $\beta_1, \dots, \beta_p$ are the slope parameters (related to the different independent variables), and $x_{1i}, \dots, x_{pi}$ are the independent variables.

This transformation enables the constraining of $p_i$ to be between 0 and 1!

$\dfrac{p_i}{1-p_i}$ = odds of the outcome of interest occurring; $\log\!\left(\dfrac{p_i}{1-p_i}\right)$ = log-odds of the outcome of interest occurring
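As a quick numerical illustration (values chosen purely for demonstration), base R's qlogis() and plogis() compute the logit and its inverse: probabilities in (0, 1) map onto the whole real line and back.

```r
# Logit maps probabilities in (0, 1) onto the whole real line...
p <- c(0.05, 0.25, 0.50, 0.75, 0.95)

log_odds <- qlogis(p)       # logit: log(p / (1 - p))
round(log_odds, 3)          # -2.944 -1.099  0.000  1.099  2.944

# ...and the inverse logit maps any real number back into (0, 1)
round(plogis(log_odds), 2)  # recovers 0.05 0.25 0.50 0.75 0.95
```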
Logit Transformation
$$\text{logit}(\hat p_i) = \log\!\left(\frac{\hat p_i}{1-\hat p_i}\right) = \hat\beta_0 + \hat\beta_1 x_{1i} + \hat\beta_2 x_{2i} + \dots + \hat\beta_p x_{pi}$$
But recall the outcome of interest is still the predicted probability $\hat p_i$…
∴ NEED TO BACKTRANSFORM
(once model is fitted and 𝛽 coefficients estimated)
$$\hat p_i = \frac{\exp(\hat\beta_0 + \hat\beta_1 x_{1i} + \hat\beta_2 x_{2i} + \dots + \hat\beta_p x_{pi})}{1 + \exp(\hat\beta_0 + \hat\beta_1 x_{1i} + \hat\beta_2 x_{2i} + \dots + \hat\beta_p x_{pi})} = \frac{1}{1 + \exp\!\big(-(\hat\beta_0 + \hat\beta_1 x_{1i} + \hat\beta_2 x_{2i} + \dots + \hat\beta_p x_{pi})\big)}$$

where $\hat\beta_0 + \hat\beta_1 x_{1i} + \hat\beta_2 x_{2i} + \dots + \hat\beta_p x_{pi}$ = LOG-ODDS (LO), so equivalently:

$$\hat p_i = \frac{\exp(\text{LO})}{1 + \exp(\text{LO})} = \frac{1}{1 + \exp(-\text{LO})}$$
$\hat p_i \in (0, 1)$, so $\hat p_i$ could take on a value like 0.66…
BUT we want to speak in terms of 0 and 1 only! So we often specify a threshold value $\pi$ (e.g., 0.5) and classify:

$$\hat y_i = \begin{cases} 0, & \text{if } \hat p_i \le \pi \\ 1, & \text{if } \hat p_i > \pi \end{cases}$$
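A minimal sketch of the back-transformation and thresholding, assuming hypothetical coefficient estimates b0 and b1 and predictor values x1; base R's plogis() computes exp(LO)/(1 + exp(LO)):

```r
# Hypothetical estimates and predictor values (for illustration only)
b0 <- -1.5; b1 <- 0.8
x1 <- c(0.5, 1.0, 2.0, 3.5)

log_odds <- b0 + b1 * x1      # linear predictor = log-odds (LO)
p_hat    <- plogis(log_odds)  # back-transform: exp(LO) / (1 + exp(LO))

# Classify using a chosen threshold (here pi = 0.5)
threshold <- 0.5
y_hat <- ifelse(p_hat > threshold, 1, 0)
cbind(x1, log_odds, p_hat, y_hat)
```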
Odds versus Log(odds)
$\dfrac{p_i}{1-p_i}$ = odds of the outcome of interest occurring

$\log\!\left(\dfrac{p_i}{1-p_i}\right)$ = log-odds of the outcome of interest occurring

> Here log denotes the logarithm with base e (the natural logarithm): use the ln button on your calculator to get this!
> DO NOT USE the log button, which denotes the logarithm with base 10!
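The same distinction applies in R: log() is the natural logarithm (base e), while log10() is base 10.

```r
odds <- 3
log(odds)      # 1.0986 -> natural log (base e): the log-odds used in logistic regression
log10(odds)    # 0.4771 -> base-10 log: NOT what is used here
exp(log(odds)) # 3      -> exp() undoes the natural log
```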
Logit Transformation
[Figure: the fitted model in terms of probability (S-shaped curve of p against x) and in terms of the logit transformation (linear in x)]
Ways in which to interpret 𝛽! coefficients of fitted LR model
Log-odds:

$$\log\!\left(\frac{\hat p_i}{1-\hat p_i}\right) = \hat\beta_0 + \hat\beta_1 x_{1i}$$

> $\hat\beta_1$ = change in the logarithm of the odds (log-odds) when $x_{1i}$ is increased by one unit.
> For more than one unit in $x_{1i}$: $r \times \hat\beta_1$ = estimated change in the log-odds when $x_{1i}$ is increased by $r$ units.

Odds, with a factor:

$$\frac{\hat p_i}{1-\hat p_i} = \exp(\hat\beta_0) \times \exp(\hat\beta_1 x_{1i})$$

> $\exp(\hat\beta_1)$ = factor that the odds get multiplied by for every one-unit increase in $x_{1i}$.
> For more than one unit in $x_{1i}$: $\exp(r \times \hat\beta_1)$ = factor that the odds get multiplied by for every $r$-unit increase in $x_{1i}$.
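A sketch of how these quantities are read off a fitted model in R, assuming an object fit from glm(..., family = "binomial") with a continuous predictor named x1 (hypothetical names):

```r
# Assumes a fitted model, e.g. fit <- glm(y ~ x1, data = dat, family = "binomial")
b1 <- coef(fit)["x1"]   # estimated change in the log-odds per one-unit increase in x1

exp(b1)                 # factor the odds are multiplied by per one-unit increase

r <- 10                 # for an r-unit increase (here r = 10)
r * b1                  # estimated change in the log-odds
exp(r * b1)             # factor the odds are multiplied by
```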
Ways in which to interpret factors
$$\frac{\hat p_i}{1-\hat p_i} = \exp(\hat\beta_0) \times \exp(\hat\beta_1 x_{1i})$$

$\exp(\hat\beta_1)$ = factor that the odds get multiplied by for every one-unit change in $x_{1i}$.

| Magnitude of factor $\exp(\hat\beta_1)$ | Interpretation |
|---|---|
| Greater than 1 | Positive effect – increase in odds |
| Less than 1 | Negative effect – decrease in odds |
| Equal to 1 | No effect |
Examples:
> $\exp(\hat\beta_1)$ = 1.8
~ a one-unit increase in some $x$ is associated with the odds of the event being multiplied by 1.8, meaning an increase of 80%.
> $\exp(\hat\beta_1)$ = 0.6
~ a one-unit increase in some $x$ is associated with the odds of the event being multiplied by 0.6, meaning a decrease of 40%.
> $\exp(\hat\beta_1)$ = 1
~ a one-unit increase in some $x$ is associated with the odds of the event being multiplied by 1, meaning there is no effect.
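To translate a factor into the percentage change in the odds, compute (factor − 1) × 100; this reproduces the figures quoted above:

```r
factors        <- c(1.8, 0.6, 1.0)      # example values of exp(beta1_hat)
percent_change <- (factors - 1) * 100   # +80%, -40%, 0% change in the odds
data.frame(factors, percent_change)
```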
Odds Ratio (OR’s)
$$\log\!\left(\frac{\hat p_i}{1-\hat p_i}\right) = \hat\beta_0 + \hat\beta_1 x_{1i}$$

Assume that $x_{1i}$ is a dichotomous variable, where 1 = exposed and 0 = unexposed.

If $\hat p_{i0}$ = Pr(event | unexposed), then substituting 0 for $x_{1i}$:
> Odds of the event among the unexposed are: $\dfrac{\hat p_{i0}}{1-\hat p_{i0}} = \exp(\hat\beta_0)$

If $\hat p_{i1}$ = Pr(event | exposed), then substituting 1 for $x_{1i}$:
> Odds of the event among the exposed are: $\dfrac{\hat p_{i1}}{1-\hat p_{i1}} = \exp(\hat\beta_0 + \hat\beta_1)$

Ratio of the odds of the event for an exposed person relative to an unexposed person:

$$OR = \frac{\hat p_{i1}/(1-\hat p_{i1})}{\hat p_{i0}/(1-\hat p_{i0})} = \frac{\exp(\hat\beta_0 + \hat\beta_1)}{\exp(\hat\beta_0)} = e^{\hat\beta_1}$$
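A small simulated check (made-up data, not the Challenger example) that $\exp(\hat\beta_1)$ from a logistic regression with a 0/1 exposure variable equals the ratio of the fitted odds for exposed versus unexposed:

```r
set.seed(1)
exposed <- rbinom(200, 1, 0.5)                         # 0 = unexposed, 1 = exposed (simulated)
event   <- rbinom(200, 1, plogis(-1 + 1.2 * exposed))  # simulated 0/1 outcome

fit <- glm(event ~ exposed, family = "binomial")

odds_unexposed <- exp(coef(fit)[1])          # exp(beta0_hat)
odds_exposed   <- exp(sum(coef(fit)))        # exp(beta0_hat + beta1_hat)

odds_exposed / odds_unexposed                # odds ratio...
exp(coef(fit)["exposed"])                    # ...equals exp(beta1_hat)
```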
Perform LR in R
In R, use the function:
glm(y ~ x1 + x2, data = ..., family = "binomial")
[Screenshot: R code and summary() output; the estimates $\hat\beta_0$ (Intercept) and $\hat\beta_1$ (slope for temperature) are read from the Coefficients table of the output]
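A minimal sketch of this call for the Challenger example, assuming a data frame named challenger with a 0/1 column failure and a column temperature in ℉ (hypothetical names — substitute your own data frame and variable names):

```r
# Hypothetical data: 'challenger' with columns 'failure' (1 = O-ring failure, 0 = none)
# and 'temperature' (degrees Fahrenheit)
fit <- glm(failure ~ temperature, data = challenger, family = "binomial")

summary(fit)   # Coefficients table gives the estimates of beta0 (Intercept) and beta1
coef(fit)      # the estimated coefficients as a vector
```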
Using R for Interpretation
$$\log\!\left(\frac{\hat p_i}{1-\hat p_i}\right) = 10.87535 - 0.17132\,(\text{Temperature})$$

THEN BACKTRANSFORMS TO:

$$\hat p_i = \frac{\exp\!\big(10.87535 - 0.17132\,(\text{Temperature})\big)}{1 + \exp\!\big(10.87535 - 0.17132\,(\text{Temperature})\big)}$$

Alternatively:

$$\hat p_i = \frac{1}{1 + \exp\!\big(-(10.87535 - 0.17132\,(\text{Temperature}))\big)}$$
In terms of log-odds
> On average, the log-odds of O-ring failure occurring decreases by 0.17 units if temperature increases by 1 ℉.
In terms of factor (one-unit)
> On average, a 1 ℉ increase in temperature is associated with the odds of O-ring failure changing by a factor of $\exp(-0.17132) = 0.84$, which translates to a 16% decrease in the odds of O-ring failure occurring.
In terms of factor (ten-unit)
> On average, a 10 ℉ increase in temperature is associated with the odds of O-ring failure changing by a factor of $\exp(-0.17132 \times 10) = 0.18$, which translates to an 82% decrease in the odds of O-ring failure occurring.
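These factors can be verified directly in R:

```r
exp(-0.17132)       # ~0.84: odds of failure multiplied by 0.84 per 1 degree F increase (-16%)
exp(-0.17132 * 10)  # ~0.18: odds of failure multiplied by 0.18 per 10 degree F increase (-82%)
```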
Checking significance of 𝒙 effect
Of interest: how confident are we that the temperature effect is real?
Compute a 95% confidence interval for $\beta_1$:

$$\text{CI} = \hat\beta_1 \pm z_{\alpha/2} \times s_{\hat\beta_1}$$

Note: the critical value is NOT from the t-distribution, but from the NORMAL!

Raw scale:
CI = $-0.17132 \pm 1.96 \times 0.08344$
∴ $(-0.335,\ -0.008)$

OR scale:
$(e^{-0.335},\ e^{-0.008})$
∴ $(0.715,\ 0.992)$
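A sketch of this calculation in R, using the estimate and standard error reported in the summary() output:

```r
b1 <- -0.17132; se_b1 <- 0.08344

ci_raw <- b1 + c(-1, 1) * qnorm(0.975) * se_b1  # 95% CI on the raw (log-odds) scale
ci_raw        # approximately (-0.335, -0.008)

exp(ci_raw)   # same interval on the OR scale: approximately (0.715, 0.992)
```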
Prediction
Of interest: to predict the probability of O-ring failure when the temperature is 30 ℉
Steps:
(1) Estimate log-odds by substituting in value of 𝑻𝒆𝒎𝒑𝒆𝒓𝒂𝒕𝒖𝒓𝒆 (i.e., 𝒙 value of interest)
(2) Substitute the estimated log-odds into the back-transformed $\hat p_i$ formula

$$\log\!\left(\frac{\hat p_i}{1-\hat p_i}\right) = 10.87535 - 0.17132\,(30) = 5.73575$$

THEN BACKTRANSFORMS TO:

$$\hat p_i = \frac{\exp(5.73575)}{1 + \exp(5.73575)}$$

Alternatively:

$$\hat p_i = \frac{1}{1 + \exp(-5.73575)}$$

Both equal 0.99678…
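The same prediction in R, either by back-transforming the estimated log-odds manually with plogis(), or via predict() on the fitted object (here called fit, a hypothetical name):

```r
# Manual back-transformation of the estimated log-odds at 30 degrees Fahrenheit
log_odds_30 <- 10.87535 - 0.17132 * 30   # = 5.73575
plogis(log_odds_30)                      # = exp(5.73575) / (1 + exp(5.73575)) ~ 0.9968

# Equivalent, assuming a fitted model object 'fit' with predictor 'temperature':
# predict(fit, newdata = data.frame(temperature = 30), type = "response")
```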
Confusion Matrix
LABEL PROPERLY!!!!!

|  | OBSERVED Positive | OBSERVED Negative |
|---|---|---|
| PREDICTED Positive | a | b |
| PREDICTED Negative | c | d |

Positive Predictive Value: PPV = a / (a + b)
> probability of being positive given that the prediction is positive.
> proportion of true positives (a) among all the predicted positives (a + b).

Negative Predictive Value: NPV = d / (c + d)
> probability of being negative given that the prediction is negative.
> proportion of true negatives (d) among all the predicted negatives (c + d).

Sensitivity = a / (a + c)
> ability of the model to predict a true positive.
> proportion of true positives (a) among all the actual (observed) positives (a + c).

Specificity = d / (b + d)
> ability of the model to predict a true negative.
> proportion of true negatives (d) among all the actual (observed) negatives (b + d).
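As an illustrative helper (not from the course code), the sketch below computes all four measures from the counts a, b, c and d laid out as above; plugging in the $\pi$ = 0.5 matrix on the next slide reproduces its values.

```r
# a = predicted positive & observed positive,  b = predicted positive & observed negative,
# c = predicted negative & observed positive,  d = predicted negative & observed negative
confusion_measures <- function(a, b, c, d) {
  c(PPV         = a / (a + b),
    NPV         = d / (c + d),
    Sensitivity = a / (a + c),
    Specificity = d / (b + d))
}

round(confusion_measures(a = 3, b = 1, c = 4, d = 16), 4)
#   PPV    NPV  Sensitivity  Specificity
#  0.75   0.80       0.4286       0.9412
```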
Confusion Matrix
Threshold value chosen to be $\pi$ = 0.5

|  | OBSERVED O-rings Fail | OBSERVED O-rings Succeed |
|---|---|---|
| PREDICTED O-rings Fail | 3 | 1 |
| PREDICTED O-rings Succeed | 4 | 16 |

PPV = 3 / (3 + 1) = 0.75
NPV = 16 / (4 + 16) = 0.80
Sensitivity = 3 / (3 + 4) ≈ 0.4286
Specificity = 16 / (1 + 16) ≈ 0.9412
Choosing the threshold value ($\pi$) involves a trade-off between sensitivity and specificity.

| Change in $\pi$ | Effect |
|---|---|
| Decrease | Classify more observations as 1 ∴ ↑ sensitivity and ↓ specificity |
| Increase | Classify more observations as 0 ∴ ↑ specificity and ↓ sensitivity |

OVERALL GOAL = MAXIMISE THE MODEL'S ABILITY TO CLASSIFY CORRECTLY!
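A sketch of how this trade-off can be examined numerically, assuming vectors p_hat (fitted probabilities from the model) and y (observed 0/1 outcomes) already exist — both are hypothetical names:

```r
# Assumes: p_hat = fitted probabilities, y = observed 0/1 outcomes (hypothetical names)
sens_spec <- function(threshold, p_hat, y) {
  y_hat <- as.numeric(p_hat > threshold)   # classify using the given threshold
  c(sensitivity = sum(y_hat == 1 & y == 1) / sum(y == 1),
    specificity = sum(y_hat == 0 & y == 0) / sum(y == 0))
}

# Lowering the threshold raises sensitivity; raising it raises specificity
sapply(c(0.2, 0.5, 0.8), sens_spec, p_hat = p_hat, y = y)
```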
Confusion Matrix
Decrease the threshold value to $\pi$ = 0.2 (no longer 0.5) ~ more observations classified as 1!

|  | OBSERVED O-rings Fail | OBSERVED O-rings Succeed |
|---|---|---|
| PREDICTED O-rings Fail | 6 | 8 |
| PREDICTED O-rings Succeed | 1 | 9 |

PPV = 6 / (6 + 8) ≈ 0.4286
NPV = 9 / (1 + 9) = 0.900
Sensitivity = 6 / (6 + 1) ≈ 0.8571
Specificity = 9 / (8 + 9) ≈ 0.5294
Evidently, decreasing $\pi$ results in the sensitivity increasing (from 0.4286 to 0.8571) and the specificity decreasing (from 0.9412 to 0.5294)…
Confusion Matrix
Increase the threshold value to $\pi$ = 0.8 (no longer 0.5) ~ more observations classified as 0!

|  | OBSERVED O-rings Fail | OBSERVED O-rings Succeed |
|---|---|---|
| PREDICTED O-rings Fail | 1 | 0 |
| PREDICTED O-rings Succeed | 6 | 17 |

PPV = 1 / (1 + 0) = 1
NPV = 17 / (6 + 17) ≈ 0.7391
Sensitivity = 1 / (1 + 6) ≈ 0.1429
Specificity = 17 / (0 + 17) = 1
Evidently, increasing $\pi$ results in the sensitivity decreasing (from 0.4286 to 0.1429) and the specificity increasing (from 0.9412 to 1)…