Chapter 4 – Linear Model
Prepared by: Shier Nee, SAW
Based on: Probabilistic Machine Learning by Kevin Murphy
Answer for Week 2: Exercise 5
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error as mse
import numpy as np
import matplotlib.pyplot as plt

iris = datasets.load_iris()
# iris.data columns: (Sepal Length, Sepal Width, Petal Length, Petal Width)
X = iris.data[:, :3]  # take the first three features as X
y = iris.data[:, 3]   # take the last feature (petal width) as y

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

alphas = np.logspace(-10, 1.3, 20)  # regularization strengths
nalphas = len(alphas)
mse_train = np.empty(nalphas)
mse_test = np.empty(nalphas)
ytest_pred_stored = dict()

for i, alpha in enumerate(alphas):
    model = Ridge(alpha=alpha, fit_intercept=False)
    model.fit(X_train, y_train)
    ytrain_pred = model.predict(X_train)
    ytest_pred = model.predict(X_test)
    mse_train[i] = mse(y_train, ytrain_pred)
    mse_test[i] = mse(y_test, ytest_pred)
    ytest_pred_stored[alpha] = ytest_pred

# Plot MSE vs regularization strength
fig, ax = plt.subplots()
ax.plot(alphas, mse_test, color='r', marker='x', label='test')
ax.plot(alphas, mse_train, color='b', marker='s', label='train')
ax.set_xscale('log')
ax.legend(loc='upper right', shadow=True)
plt.xlabel('L2 regularizer')
plt.ylabel('mse')
plt.show()

# here the 'best' alpha is taken where train and test MSE are closest
print('The best L2 regularizer = ', alphas[np.argmin(abs(mse_train - mse_test))])
Recap
• Probability: Univariate Model – Gaussian
• Probability: Multivariate Model – Gaussian
• Statistics – Maximum Likelihood Estimation, Regularization
• Decision Theory - Bayesian
• Information Theory - Entropy
• Optimization - Stochastic Gradient Descent
Outline
• Logistic Regression
• Linear Regression
• Generalized Linear Model
Logistic Regression
Binary logistic regression models the label y with a Bernoulli distribution whose parameter is the sigmoid of a linear function of the input (weights w and bias b):

p(y = 1 | x; θ) = σ(a) = 1 / (1 + e^(−a)), where a = log( p / (1 − p) ) = wᵀx + b
Linear Classifier
The prediction can be written as

f(x; θ) = b + wᵀx

where w is the normal vector of the separating hyperplane and b is its offset from the origin. This linear hyperplane separates the input space into two halves; it is the decision boundary.

In general there will be uncertainty about the correct class label, so we need to predict a probability distribution over labels: we feed f(x; θ) into the sigmoid function.
Sigmoid Function
The sigmoid (logistic) function is

σ(a) = 1 / (1 + e^(−a)), where a = log( p / (1 − p) ) = b + wᵀx

The weights determine the steepness of the sigmoid function, and the bias shifts it.
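A minimal NumPy sketch of the sigmoid and the resulting class probability; the weights, bias, and input below are made-up illustrative values, not fitted ones.

import numpy as np

def sigmoid(a):
    # logistic function 1 / (1 + exp(-a))
    return 1.0 / (1.0 + np.exp(-a))

# made-up weights and bias for illustration
w = np.array([2.0, -1.0])
b = 0.5
x = np.array([0.3, 1.2])

a = b + w @ x    # linear score (log-odds)
p = sigmoid(a)   # p(y = 1 | x)
print(p)         # a probability between 0 and 1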
Non-Linear Classifier
Transform the input features in a suitable way, e.g. φ(x) = [1, x₁², x₂²] with weights w = [−R², 1, 1].
The decision boundary (where f(x) = 0) then defines a circle with radius R.
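A small sketch of this feature-transform idea, using the quadratic features φ(x) = [1, x₁², x₂²] and radius R above; the test points are made up.

import numpy as np

R = 1.0
w = np.array([-R**2, 1.0, 1.0])    # weights acting on phi(x) = [1, x1^2, x2^2]

def f(x):
    phi = np.array([1.0, x[0]**2, x[1]**2])
    return w @ phi                  # = x1^2 + x2^2 - R^2

print(f(np.array([0.5, 0.5])))      # negative: inside the circle
print(f(np.array([1.5, 0.0])))      # positive: outside the circle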
Logistic Regression – Cost Function
Maximizing the likelihood (maximum likelihood estimation) is the same as minimizing the negative log likelihood (NLL):

NLL(w) = −(1/N) Σ_n [ yₙ log μₙ + (1 − yₙ) log(1 − μₙ) ]

where N is the number of samples and μₙ = σ(wᵀxₙ + b) is the predicted probability. This is the binary cross-entropy loss.
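A small NumPy sketch of this binary cross-entropy / NLL computation; the labels y and predicted probabilities mu are made up for illustration.

import numpy as np

def binary_cross_entropy(y, mu, eps=1e-12):
    # NLL(w) = -1/N * sum_n [ y_n log mu_n + (1 - y_n) log(1 - mu_n) ]
    mu = np.clip(mu, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y * np.log(mu) + (1 - y) * np.log(1 - mu))

y  = np.array([1, 0, 1, 1])            # true labels
mu = np.array([0.9, 0.2, 0.6, 0.8])    # predicted probabilities
print(binary_cross_entropy(y, mu))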
Logistic Regression – Cost Function
Check convexity: first compute the gradient of the NLL,

∇_w NLL(w) = (1/N) Σ_n (μₙ − yₙ) xₙ

Here we can see that the gradient is weighted by the error (μₙ − yₙ) for each input.
Logistic Regression – Cost Function
To ensure the NLL has a bowl shape (a single global minimum), check the Hessian matrix:

H(w) = (1/N) Σ_n μₙ(1 − μₙ) xₙxₙᵀ

Since μₙ(1 − μₙ) > 0, the Hessian is positive (semi)definite, so the NLL is convex.
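A small sketch that computes this gradient and Hessian on toy data and checks that the Hessian's eigenvalues are non-negative (the bowl shape); X, y, and w are made-up values.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# toy data and weights (illustrative only)
X = np.array([[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3], [2.0, 1.0]])
y = np.array([1.0, 0.0, 0.0, 1.0])
w = np.array([0.1, -0.2])

mu = sigmoid(X @ w)                    # predicted probabilities
grad = X.T @ (mu - y) / len(y)         # gradient: weighted by the error (mu - y)
S = np.diag(mu * (1 - mu))             # diagonal weighting matrix
H = X.T @ S @ X / len(y)               # Hessian

print(grad)
print(np.linalg.eigvalsh(H) >= 0)      # all True -> positive semidefinite -> convex NLL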
Logistic Regression – Optimizer
1. First order method
• Stochastic Gradient Descent
Slow convergence when the gradient is small.
2. Second order method
• Newton's Method (iteratively reweighted least squares)

θ_{t+1} = θ_t − α f′(θ_t) / f″(θ_t)

(In the multivariate case the step is θ_{t+1} = θ_t − α H⁻¹g, i.e. the gradient g is rescaled by the inverse Hessian.)
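A minimal sketch comparing one gradient-descent step with one Newton-style step for logistic regression, reusing the gradient and Hessian formulas above; the toy data, step size, and starting point are made up.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def grad_hess(w, X, y):
    mu = sigmoid(X @ w)
    g = X.T @ (mu - y) / len(y)
    H = X.T @ np.diag(mu * (1 - mu)) @ X / len(y)
    return g, H

X = np.array([[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3], [2.0, 1.0]])
y = np.array([1.0, 0.0, 0.0, 1.0])
w = np.zeros(2)

g, H = grad_hess(w, X, y)
w_gd     = w - 0.1 * g                  # first-order: small step along the negative gradient
w_newton = w - np.linalg.solve(H, g)    # second-order: step rescaled by the inverse Hessian
print(w_gd, w_newton)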
Logistic Regression – Overfitting
See any trend? As the polynomial degree increases, do the weights w increase or decrease?
Logistic Regression – Overfitting
Reduce overfitting:
Do not let the weights grow too large.
Add a regularizer to the NLL as a penalty term.
Logistic Regression – Overfitting
A big λ (a small C in sklearn) gives a less flexible model.
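A short sklearn sketch showing the effect of C (the inverse of λ): a small C means a large λ and hence smaller weights; the synthetic data is made up.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

for C in [0.01, 1.0, 100.0]:                 # small C = strong regularization (big lambda)
    clf = LogisticRegression(C=C, penalty='l2').fit(X, y)
    print(C, np.round(clf.coef_, 3))          # weights shrink as C decreases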
Logistic Regression – Binary vs. Multinomial
• Probability distribution: Bernoulli vs. Categorical
• Activation function: σ = sigmoid vs. σ = softmax
• Cost function: binary cross-entropy vs. cross-entropy
• Gradient and Hessian: analogous forms, with the softmax probabilities taking the place of the sigmoid (see the softmax sketch below)
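For the multinomial case, a brief NumPy sketch of the softmax activation and the corresponding cross-entropy for a single example; the logits and the true class index are made up.

import numpy as np

def softmax(a):
    a = a - a.max()          # shift for numerical stability
    e = np.exp(a)
    return e / e.sum()

logits = np.array([2.0, 0.5, -1.0])    # scores for C = 3 classes
probs = softmax(logits)
y = 0                                   # true class index
loss = -np.log(probs[y])                # cross-entropy for this example
print(probs, loss)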
Handling a large number of classes
With the regular softmax function, as the number of classes C increases, the computational cost of computing the normalization (and the Hessian H) increases.
To handle this, we can use hierarchical softmax: the output layer is decomposed into a binary tree, which reduces the complexity of obtaining the probability distribution (roughly from O(C) to O(log C)).
Handling imbalanced classes
The model pays more attention to the more 'common' class and less attention to the 'rare' class.
Approach:
- Resample the data – oversample the rare class / undersample the common class (see the sketch below).
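A sketch of oversampling the rare class with sklearn's resample utility (class weighting would be an alternative); the imbalanced toy data is made up.

import numpy as np
from sklearn.utils import resample

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.array([0] * 90 + [1] * 10)        # 90 'common' vs 10 'rare' examples

X_rare, y_rare = X[y == 1], y[y == 1]
X_up, y_up = resample(X_rare, y_rare, n_samples=90, replace=True, random_state=0)

X_bal = np.vstack([X[y == 0], X_up])     # balanced training set
y_bal = np.concatenate([y[y == 0], y_up])
print(np.bincount(y_bal))                # [90 90]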
Handling outliers
Use a mixture model for the likelihood: with probability π the label is generated uniformly at random, otherwise it is generated by the usual conditional model:

p(y | x) = π Ber(y | 0.5) + (1 − π) Ber(y | σ(wᵀx + b))
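A tiny sketch of this robust likelihood: with probability pi the label is uniform over {0, 1}, otherwise it comes from the logistic model; w, b, x, and pi are made-up values.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def robust_lik(y, x, w, b, pi=0.1):
    # p(y|x) = pi * Ber(y | 0.5) + (1 - pi) * Ber(y | sigmoid(w.x + b))
    mu = sigmoid(w @ x + b)
    return pi * 0.5 + (1 - pi) * (mu if y == 1 else 1 - mu)

w, b = np.array([1.0, -0.5]), 0.2
print(robust_lik(1, np.array([2.0, 1.0]), w, b))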
Handling outliers – Bi-tempered loss
Replace the standard cross-entropy with a tempered cross-entropy (a bounded loss) and the softmax with a tempered softmax (a heavier-tailed transfer function), each controlled by a temperature parameter.
Take a 15-minute break
Linear Regression
Linear regression follows the equation

f(x; θ) = b + wᵀx

where b is the bias and w is the slope (weight) vector. If the input is 1-dimensional, this is simple linear regression; if the input is N-dimensional, it is multiple (multivariate) linear regression.
Least squares regression
Least squares corresponds to a Gaussian likelihood with weights w and error variance σ²:

p(y | x; θ) = N(y | wᵀx + b, σ²)

The MLE is the point where the gradient of the NLL is zero. We can first optimize with respect to w, and then solve for the optimal σ.
Ordinary least squares – 1D
The residual sum of squares is

RSS(w) = Σ_n (yₙ − w xₙ)²

(figure: scatter plot of the data with the squared vertical residuals, e.g. (y₁ − w x₁)² and (y₅ − w x₅)²)
Ordinary least squares – 2D
Adding one more input dimension, the residual sum of squares becomes

RSS(w) = Σ_n (yₙ − wᵀxₙ)²

(figure: data points and the fitted plane in 3-D)
How to get w?
Minimize the RSS: we know y and X, so set the gradient of the RSS to zero. This gives the normal equations XᵀXw = Xᵀy, and w can be obtained in closed form as

ŵ = (XᵀX)⁻¹Xᵀy

(a minimal sketch follows below)
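A minimal NumPy sketch of this closed-form solution; the toy 1-D data is made up, and a column of ones is appended for the bias term.

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=20)
y = 3.0 + 2.0 * x + rng.normal(scale=1.0, size=20)    # true bias 3, true slope 2

X = np.column_stack([np.ones_like(x), x])             # design matrix with a bias column
w_hat = np.linalg.solve(X.T @ X, X.T @ y)             # solve the normal equations
print(w_hat)                                           # approximately [3, 2]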
Ridge regression
Least squares estimation can result in overfitting: the fit matches the training data closely but performs badly on the test data.

(figure: good fit on the training data, bad fit on the test data)

Ridge regression adds an L2 regularizer to avoid overfitting (it keeps the weights, and hence the slopes, from becoming very large):

RSS(w) + λ‖w‖²
Ridge regression
(figure: fits with λ = 0, which is plain least squares; a big λ; and a very big λ = 100000)

The regularizer works by penalizing weights that become too large in magnitude (a closed-form sketch follows below).
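A sketch of the ridge closed-form solution ŵ = (XᵀX + λI)⁻¹Xᵀy on made-up data; the λ values are illustrative.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = X @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(scale=0.3, size=30)

for lam in [0.0, 1.0, 1e5]:                                # lambda = 0 recovers least squares
    w = np.linalg.solve(X.T @ X + lam * np.eye(4), X.T @ y)
    print(lam, np.round(w, 3))                              # weights shrink toward zero as lambda grows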
How to choose lambda
Ridge regression adds a penalty function, the L2 regularizer. How do we choose λ?
Methods:
1. Start with a strong λ and gradually soften it, checking the results each time (this traces out the regularization path).
2. Cross-validation (see the sketch below).
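A sketch of choosing λ (called alpha in sklearn) by cross-validation with RidgeCV; the candidate grid and the toy data are made up.

import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.5, 0.0, -2.0, 0.3, 0.0]) + rng.normal(scale=0.5, size=100)

model = RidgeCV(alphas=np.logspace(-4, 4, 20), cv=5).fit(X, y)
print(model.alpha_)    # lambda selected by 5-fold cross-validation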
Lasso Regression
Least squares regression (RSS): less bias, high variance.
Ridge regression (RSS + λ‖w‖²): higher bias, lower variance.
Lasso regression (RSS + λ‖w‖₁, using the absolute values of the weights):
• Ridge allows parameters to become small but never exactly zero.
• Lasso allows parameters to be exactly zero.
This is useful because lasso can perform feature selection: the weights of certain features are driven to zero, which makes the model simpler.
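A short sklearn sketch showing lasso driving some weights exactly to zero (feature selection), while ridge only shrinks them; the data with two irrelevant features is made up.

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=100)   # features 2 and 3 are irrelevant

print(np.round(Ridge(alpha=1.0).fit(X, y).coef_, 3))   # small but nonzero everywhere
print(np.round(Lasso(alpha=0.1).fit(X, y).coef_, 3))   # irrelevant weights driven to exactly 0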
Q-norm
The general equation for the penalty is λ Σ_d |w_d|^q: q = 0 gives the L0 penalty, q = 1 the L1 penalty (lasso), and q = 2 the L2 penalty (ridge).
Elastic net combines the lasso and ridge penalties: RSS + λ₁‖w‖₁ + λ₂‖w‖².
Example - Cancer Data
Least squares – worst.
Ridge – weights are smaller but never reach zero; better than least squares.
Lasso – some features' weights are exactly zero, so those features are eliminated.
Elastic net – best.
Generalized Linear Model
If we have any of the following, ordinary least squares is not suitable:
• The relationship between x and y is exponential rather than linear.
• The variance of the errors in y is not constant, and varies with x.
• The response variable is not continuous, but categorical.

(figures: scatter plots illustrating an exponential trend and a categorical response)
Generalized Linear Model
We can't use linear regression here: the variance increases with x.
A suitable alternative is Poisson regression, one type of GLM.
GLM – Poisson Regression
A GLM is normally made up of three components:
1. Linear predictor – b0 + b1x
2. Link function – the log link function
3. Probability distribution – the Poisson distribution
(a small fitting sketch follows below)
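A sketch of fitting a Poisson GLM with statsmodels (assuming statsmodels is available); the count data is simulated so the example is self-contained.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=200)
y = rng.poisson(np.exp(0.5 + 1.2 * x))               # counts with a log-linear mean

X = sm.add_constant(x)                                # linear predictor b0 + b1*x
model = sm.GLM(y, X, family=sm.families.Poisson())    # log link is the Poisson default
print(model.fit().params)                             # approximately [0.5, 1.2]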
GLM – Linear/Logistic Regression
Linear Regression
1. Linear predictor – b0+b1x
2. Link function – identity link function
3. Probability distribution – Normal distribution
Logistic Regression
1. Linear predictor – b0+b1x
2. Link function – logit link function
3. Probability distribution – Binomial / Bernoulli distribution
Custom GLM
The relationship between x and y is not linear, so we choose the log link function.
Custom GLM
The variance seems constant.
Which probability distribution should we choose for the noise?
1. Normal
2. Poisson
Custom GLM
Since the variance seems constant (it does not grow with the mean), the Normal distribution is the appropriate choice; a Poisson distribution would imply a variance that increases with the mean.
Let's go to Colab to try out creating a logistic regression model with PyTorch (a minimal sketch of what this might look like is below).
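A minimal sketch, assuming a simple synthetic dataset: logistic regression as a single linear layer trained with binary cross-entropy; all layer sizes and hyperparameters are placeholders, not the Colab notebook's actual settings.

import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(200, 2)                              # toy inputs
y = (X[:, 0] - X[:, 1] > 0).float().unsqueeze(1)     # toy binary labels

model = nn.Linear(2, 1)                              # computes w^T x + b
loss_fn = nn.BCEWithLogitsLoss()                     # sigmoid + binary cross-entropy
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

print(loss.item())                                   # training loss after 100 epochs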