
Lecture 3 - Classification and Non-Linear

Regression
Statistical Learning (CFAS420)

Alex Gibberd

Lancaster University

18th Feb 2020


Outline

Learning Outcomes:
• Understand the basic generalisations of linear models to binary, categorical and ordinal outputs
• Understand how to interpret the logistic regression coefficients (log-odds)
• Know how to convert a probability into a binary/ordinal decision via cutpoints
• Know a range of ways to assess performance of a (binary) classification model (Accuracy, Sensitivity, ROC)
• Recognise the mathematical construction of Generalised Additive Models
• Understand how GAMs may be used to model non-linear relationships

2
Logistic Regression

• Suppose we have binary outcomes coded as Y ∈ {1, 0}
• We want to model P(Y = 1 | X = x)
  – Treat this as a Bernoulli trial with probability p(x) := E[Y | X] = P(Y = 1 | X = x)
  – Note: 0 ≤ p(x) ≤ 1, so we need to transform it somehow
• Taking log(p(x)) is only unbounded on one side (as p(x) → 0)
• The alternative is to take the log of the ratio (the log-odds):

  log( p(x) / (1 − p(x)) ) = f_lin(x; β)

Generalised Linear Models (GLM) 4
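A minimal R sketch of fitting this model with glm(); the data frame `dat` and the variables `y`, `x1`, `x2` are hypothetical placeholders, not objects from the lecture:

```r
# Logistic regression: the log-odds of y are modelled linearly in x1 and x2.
# 'dat', 'y', 'x1', 'x2' are hypothetical names used only for illustration.
fit_logit <- glm(y ~ x1 + x2, data = dat, family = binomial(link = "logit"))
summary(fit_logit)   # coefficients are reported on the log-odds scale
```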


Logistic Regression

• For generalisation purposes, let us call

  g_logit(z) = log( z / (1 − z) )

• Solving g_logit(z) = f_lin(x; β) for z gives:

  z = 1 / (1 + exp{−f_lin(x; β)})

Generalised Linear Models (GLM) 5
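As a quick numerical check, the inverse logit can be written directly; a small base-R sketch (no extra packages assumed):

```r
# Inverse logit: maps a linear predictor back to a probability in (0, 1)
inv_logit <- function(eta) 1 / (1 + exp(-eta))

eta <- c(-2, 0, 2)
inv_logit(eta)                            # approx. 0.119, 0.500, 0.881
all.equal(inv_logit(eta), plogis(eta))    # plogis() is the same logistic CDF in base R
```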


Logistic Regression

• In summary:
  – We map the binary outcome Y ∈ {0, 1}, via its probability, to a continuous range (related to X)
  – We then model this range as we would in linear regression, via f_lin(x; β)
  – We assumed the outcome was a Bernoulli trial with probability p(x)
• Over n trials we have a Binomial distribution: the probability of getting k positive outcomes from n coin tosses

Generalised Linear Models (GLM) 6


Interpreting Logistic Regression

• The odds of an outcome (say this is success) are given by

  Odds(success) := P[success] / P[not success] = P[success] / (1 − P[success]) .

• In the logit model, replacing z = P[success] gives

  log(Odds(success)) = α + ∑_{i=1}^{p} X_i β_i ,

• The scale of the regression coefficients determines how much the log-odds of the outcome change in response to a covariate.

Generalised Linear Models (GLM) 7
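Continuing from the hypothetical `fit_logit` sketched earlier, a one-line way to read the coefficients on the odds scale:

```r
# exp() turns log-odds coefficients into odds ratios: a one-unit increase in a
# covariate multiplies the odds of success by exp(beta_i)
exp(coef(fit_logit))
```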


Generalised Linear Models (GLM)

• More generally, we can consider different link functions g(z) and their inverses g⁻¹(z)
  – A common alternative (the probit model) is to use the cumulative distribution function of the normal: g⁻¹_probit(z) = Φ(z)
• If there are multiple outputs, we need to rethink our distributional assumption
  – Can either model the cumulative probability P(Y ≤ k | X = x)
  – Or utilise a multinomial distribution (Y_1, . . . , Y_K) when taken over n trials
• The general form of the model looks like:

  g(E[Y | X = x]) = f_lin(x; β)

Generalised Linear Models (GLM) 8
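A sketch of swapping the link function: under the same hypothetical data as before, the probit fit only changes the `link` argument of binomial():

```r
# Probit regression: g^{-1} is the standard normal CDF Phi rather than the logistic CDF
fit_probit <- glm(y ~ x1 + x2, data = dat, family = binomial(link = "probit"))
```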


Binomial Classification

• The GLM models the expectation of the outcome, E[Y]
• If Y can take two outcomes, coded as Y = 1 or Y = 0, the binomial noise distribution is appropriate
• However, our logistic (GLM) regression gives us

  E[Y] = P(Y = 1) × 1 + P(Y = 0) × 0
       = P(Y = 1)

• To make a prediction about Y | X we have to decide how to convert this probability to either ŷ = 1 or ŷ = 0
• To do this, we introduce a decision rule

Converting Probabilities to Predictions 10


Cutpoints

• The simplest way to convert the probability to a class is via a hard-threshold
• Let us refer to this as a cutpoint, τ ∈ (0, 1)
• Specifically, we may consider

  ŷ = +  if P̂[Y = 1 | X = x] > τ
  ŷ = −  if P̂[Y = 1 | X = x] ≤ τ

  – Note: I have recoded our classes here as (1, 0) ⟺ (+, −)
• Remember that P̂[Y = + | X = x] = g⁻¹(f_lin(x; β̂)) is given via the GLM model

Converting Probabilities to Predictions 11
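A sketch of applying a cutpoint in R, reusing the hypothetical `fit_logit` and `dat` from the earlier sketches:

```r
# Predicted probabilities P(Y = 1 | X = x) on the response (probability) scale
p_hat <- predict(fit_logit, newdata = dat, type = "response")

# Hard-threshold at a cutpoint tau to obtain class predictions
tau   <- 0.5
y_hat <- ifelse(p_hat > tau, 1, 0)
```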


Types of Error for Classification

• In a binary setting with classes (+, −), there are only four possible outcomes:
  – A true positive (TP): ŷ = + and y = +
  – A false positive (FP): ŷ = + and y = −
  – A true negative (TN): ŷ = − and y = −
  – A false negative (FN): ŷ = − and y = +
• A common way to analyse these rates is in terms of a confusion matrix
• We count the number of each outcome and tabulate:

                 True
                 +      −
  Pred   +      #TP    #FP
         −      #FN    #TN

Evaluating Classification Performance 13
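A sketch of building this table with base R, assuming the `y_hat` from the previous sketch and true labels stored in `dat$y` (both hypothetical names):

```r
# Cross-tabulate predicted against true classes: rows = predictions, columns = truth
conf_mat <- table(Predicted = y_hat, True = dat$y)
conf_mat
```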


Predicting Survival on Titanic

• One of the lab exercises will get you to predict survival using a GLM
  – Predict class (outcome) probabilities
  – Define a decision rule (cutpoint) and apply the rule
  – Evaluate performance via a confusion matrix

Evaluating Classification Performance 14


Summarising Classification Error

• There are a few further popular ways to summarise classification performance:
  – Sensitivity: the empirical probability of correctly predicting class "+"

    Sensitivity := TP / P ≡ TP / (TP + FN)

  – Specificity: the empirical probability of correctly predicting class "−"

    Specificity := TN / N ≡ TN / (TN + FP)

  – Accuracy: the empirical probability of predicting the correct class

    Accuracy := (TP + TN) / (P + N) ≡ (TP + TN) / (TP + TN + FP + FN)

Evaluating Classification Performance 15
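A sketch of computing these summaries from the hypothetical `conf_mat` built above (this assumes both classes appear among the predictions and the truth, so all four cells exist):

```r
# Pull the four counts out of the 2x2 confusion matrix
TP <- conf_mat["1", "1"]; FP <- conf_mat["1", "0"]
FN <- conf_mat["0", "1"]; TN <- conf_mat["0", "0"]

sensitivity <- TP / (TP + FN)                  # true positive rate
specificity <- TN / (TN + FP)                  # true negative rate
accuracy    <- (TP + TN) / (TP + TN + FP + FN)
```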


The Receiver Operating Characteristic (ROC)

• In the previous examples, we evaluated the classification performance for a single cutpoint τ
• In practice, we do not always know where to place τ
  – A large τ leads to high specificity but low sensitivity, and vice versa
  – It is often informative to summarise classification performance across a range of τ
• The ROC curve is a plot of specificity vs sensitivity, where each point on the curve is obtained by evaluating a different τ.

Evaluating Classification Performance 16
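A base-R sketch of tracing out an ROC curve by sweeping the cutpoint over a grid; dedicated packages such as pROC provide roc() and auc() for the same job. `p_hat` and `dat$y` are the hypothetical objects from the earlier sketches:

```r
# For each cutpoint tau, record the sensitivity and specificity of the induced classifier
taus <- seq(0, 1, by = 0.01)
roc_points <- t(sapply(taus, function(tau) {
  y_hat <- ifelse(p_hat > tau, 1, 0)
  c(sensitivity = mean(y_hat[dat$y == 1] == 1),   # TP / P
    specificity = mean(y_hat[dat$y == 0] == 0))   # TN / N
}))

# Plot sensitivity against specificity (x-axis reversed, as pROC draws it)
plot(roc_points[, "specificity"], roc_points[, "sensitivity"],
     type = "l", xlim = c(1, 0), xlab = "Specificity", ylab = "Sensitivity")
```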


ROC Curve for Titanic Predictions

Evaluating Classification Performance 17


Multinomial Regression

• We will now look at a form of GLM which can be used for the case where we have K > 2 classes
• Consider the case where the outcomes are unordered, nominal categorical data
• Assumption: Independence of Irrelevant Alternatives (IIA)
  – The odds of preferring one class over another do not depend on the presence or absence of other "irrelevant" alternatives
  – For example, the relative probabilities of walking or taking a bus to work do not change if a bicycle is added as an additional possibility
• This allows the choice between K alternatives to be modelled as a set of K − 1 independent binary choices

Multiclass Classification 19
Multinomial Regression

• One way to model this data is as a chain of log-odds relating to the different outcomes¹

  ln( P(Y_i = 1) / P(Y_i = K) ) = f_lin(x_i; β^(1))
  ...
  ln( P(Y_i = K − 1) / P(Y_i = K) ) = f_lin(x_i; β^(K−1))

• Using the fact that all K probabilities sum to one, we find

  P(Y_i = K) = 1 / ( 1 + ∑_{k=1}^{K−1} exp{ f_lin(x_i; β^(k)) } ) .

• Exponentiating the chained equations then gives P(Y_i = k) for general k.

¹ Note: x_i ∈ ℝ^p
Multiclass Classification 20
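One common implementation of this model in R is nnet::multinom(); a sketch, with a hypothetical factor outcome `class` and the usual placeholder covariates:

```r
library(nnet)

# Multinomial logistic regression: one coefficient vector per non-reference class,
# with log-odds taken relative to the reference (first) level of the factor 'class'
fit_multi <- multinom(class ~ x1 + x2, data = dat)
head(predict(fit_multi, newdata = dat, type = "probs"))   # fitted class probabilities
```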
Proportional Odds Regression (Ordinal Data)

• In some cases we may have categorical outcomes (multiple classes) to which we can ascribe some order; these are known as ordinal data
• In these cases, we can use the ordering of the outcomes to simplify the model
• We introduce multiple cutpoints

  τ_0 = −∞ < τ_1 < · · · < τ_{K−1} < τ_K = ∞

  and link these with the GLM

  g(P(Y ≤ k | X = x)) = τ_k − f_lin(x; β)

• Note: generally g(z) = logit(z)
• Importantly, we have far fewer parameters β, as we do not need one set for each outcome

Multiclass Classification 21
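This proportional-odds form is what MASS::polr() fits; a sketch with a hypothetical ordered-factor outcome `grade`:

```r
library(MASS)

# polr() fits g(P(Y <= k)) = tau_k - f_lin(x; beta), with a logistic link by default;
# 'grade' must be an ordered factor, and the names here are purely illustrative
fit_po <- polr(grade ~ x1 + x2, data = dat, Hess = TRUE)
summary(fit_po)   # one beta per covariate plus the K - 1 cutpoints (reported as "intercepts")
```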
Aside: Kolmogorov-Arnold Representation

• The Kolmogorov–Arnold representation theorem states that any (continuous) multivariate function f(x_1, . . . , x_p) can be written as a superposition of functions acting on the individual variables x_1, . . . , x_p separately
• Specifically, one may write:

  u(x_1, . . . , x_p) = ∑_{q=0}^{2p} φ_q( ∑_{l=1}^{p} f_{q,l}(x_l) )

  where φ_q and f_{q,l} are some (potentially non-linear) functions.
• The main problem is that, while the theorem states a function of the above form exists, it does not tell us how to actually identify such a form.

Generalised Additive Models 23


Generalised Additive Models (GAM)

• The KA theorem is interesting as it suggests modelling a multivariate function in terms of a sum of univariate functions
• A GAM is similar: instead of assuming f_lin(x; β) as before, we model each contribution as a smooth function f_j(x_j), which we will approximate using a sum of basis functions
• We use the GLM framework according to:

  g(E[Y]) = α + f_1(x_1) + f_2(x_2) + · · · + f_p(x_p) .

• For more details on GAMs see the book by Simon Wood [1]

Generalised Additive Models 24
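A sketch of this model form with the mgcv package; each s() term plays the role of one f_j, and the data and names are the same hypothetical placeholders as before:

```r
library(mgcv)

# Each s(x_j) is a smooth f_j built from spline basis functions;
# the link g and noise model are set by 'family', here a logit link for a binary y
fit_gam <- gam(y ~ s(x1) + s(x2), data = dat, family = binomial)
summary(fit_gam)
```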


Basis Expansions

• To approximate the individual f_j(x_j) we now use a further set of basis functions
• Let ψ_{j,k}(x_j) represent the k-th basis function for covariate j
• Assuming that for each covariate we have K_j such functions, we can construct an approximation of a smooth function f_j(x_j) as

  f_j(x_j) = ∑_{k=1}^{K_j} β_{j,k} ψ_{j,k}(x_j) ,

  where the β_{j,k} are a set of coefficients which need to be estimated.
• This is a linear (sum) approximation, but in terms of non-linear basis functions

Generalised Additive Models 25
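A sketch of the idea with an explicit basis: build B-spline basis columns for a single covariate (via the splines package shipped with R) and estimate their coefficients with an ordinary linear model; `y_cont` denotes a hypothetical continuous response:

```r
library(splines)

# bs() constructs 8 B-spline basis functions psi_k(x1); f(x1) is then approximated
# as a linear combination of these columns, so the beta_k come from a plain lm() fit
fit_basis <- lm(y_cont ~ bs(x1, df = 8), data = dat)
summary(fit_basis)   # one estimated coefficient per basis function
```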


Spline Approximation

• There are a variety of different basis functions ψ_{j,k} one can use; however, in GAMs a popular choice is to use splines (piecewise polynomial curves)
  – In the figure, the black line is being approximated by a weighted sum of the other curves

Generalised Additive Models 26


Example of GAM Estimation
• Consider the data Y = sin(X) + ε, where X ∼ U(0, 2π) and ε ∼ N(0, σ²)
• Fit a GAM using caret (via mgcv)
• View the results... pretty impressive

Generalised Additive Models 27
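A sketch reproducing this toy example directly with mgcv (caret's method = "gam" wraps the same underlying fit); the sample size and noise level are arbitrary choices:

```r
library(mgcv)
set.seed(1)

# Simulate Y = sin(X) + noise with X uniform on (0, 2*pi)
n <- 200
x <- runif(n, 0, 2 * pi)
y <- sin(x) + rnorm(n, sd = 0.3)
dat_sim <- data.frame(x = x, y = y)

# A single smooth term recovers the sine shape without it being specified anywhere
fit_sin <- gam(y ~ s(x), data = dat_sim)
plot(fit_sin)   # plots the estimated smooth, which should track sin(x)
```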


Summary

• Introduced the logit transform to enable us to apply regression to binary classification tasks
• Demonstrated how this can be generalised to multinomial, categorical outcomes
• Presented various ways to assess binary classification performance (ROC, Accuracy, Sensitivity, ...)
• Introduced GAMs, motivated by the KA representation theorem
• Sketched how basis functions are used in GAMs to approximate continuous smooth functions

Generalised Additive Models 28


In The Lab

1. How to fit logit and probit models for binary outcome data
2. How to use binary variables as covariates via dummy variables
3. Predict survival on the Titanic, and analyse your classifier's performance
4. Fit a proportional odds model using caret

Generalised Additive Models 29


References I

[1] S. Wood. Generalized Additive Models: An Introduction with R. 2nd edition, CRC Press, 2017.

Appendix 30
