Linear Regression
Yijun Zhao
   Northeastern University
       Fall 2016
Regression Examples
     Any attributes x   =⇒   continuous value y

         {age, major, gender, race}           =⇒  GPA
         {income, credit score, profession}   =⇒  loan
         {college, major, GPA}                =⇒  future income
         ...
Regression Examples
   Data often comes in, or can be converted into, matrix form:
        Age   Gender   Race   Major      GPA
        20    0        A      Art        3.85
        22    0        C      Engineer   3.90
        25    1        A      Engineer   3.50
        24    0        AA     Art        3.60
        19    1        H      Art        3.70
        18    1        C      Engineer   3.00
        30    0        AA     Engineer   3.80
        25    0        C      Engineer   3.95
        28    1        A      Art        4.00
        26    0        C      Engineer   3.20
Formal Problem Setup
   Given N observations
             {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)},
   a regression problem tries to uncover the function
               y_i = f(x_i)   ∀ i = 1, 2, ..., N
   such that, for a new input value x*, we can
   accurately predict the corresponding value
                           y* = f(x*).
Linear Regression
      Assume the function f is a linear combination of the components of x.
      Formally, let x = (1, x_1, x_2, ..., x_d)^T. Then

          y = ω_0 + ω_1 x_1 + ω_2 x_2 + ··· + ω_d x_d = w^T x

      where w = (ω_0, ω_1, ω_2, ..., ω_d)^T.
      w is the parameter vector to estimate!
      Prediction:  y* = w^T x*
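      As a tiny numeric illustration (the weights and input below are made up,
      not from the slides), prediction is just a dot product once the constant 1
      is prepended to the input:

          import numpy as np

          # Hypothetical learned weights w = (ω_0, ω_1, ω_2): intercept plus two feature weights.
          w = np.array([0.5, 0.03, -0.2])

          # A new input x* = (1, x_1, x_2); the leading 1 pairs with the intercept ω_0.
          x_star = np.array([1.0, 21.0, 1.0])

          y_star = w @ x_star     # prediction y* = w^T x*
          print(y_star)           # 0.5 + 0.03*21 - 0.2*1 = 0.93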
Visual Illustration
         Figure: 1D and 2D linear regression
Error Measure
  Mean Squared Error (MSE):
       E(w) = (1/N) Σ_{n=1}^{N} (w^T x_n − y_n)^2
            = (1/N) ||Xw − y||^2

   where X is the matrix whose n-th row is x_n^T (one observation per row)
   and y = (y_1, y_2, ..., y_N)^T.
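   A minimal numpy sketch of this error measure, under the same conventions
   (one observation per row of X, each row starting with the constant 1):

       import numpy as np

       def mse(w, X, y):
           """E(w) = (1/N) * ||Xw - y||^2, with one observation per row of X."""
           residual = X @ w - y
           return residual @ residual / len(y)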
Minimizing Error Measure
   E(w) = (1/N) ||Xw − y||^2

   Setting the gradient to zero:

       ∇E(w) = (2/N) X^T (Xw − y) = 0
       X^T X w = X^T y
       w = X† y

   where X† = (X^T X)^{-1} X^T is the 'pseudo-inverse' of X.
LR Algorithm Summary
     Ordinary Least Squares (OLS) Algorithm
       Construct the matrix X and the vector y from the dataset
       {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)} (each x includes x_0 = 1):

           X = matrix with rows x_1^T, x_2^T, ..., x_N^T,      y = (y_1, y_2, ..., y_N)^T

       Compute X† = (X^T X)^{-1} X^T
       Return w = X† y
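     A possible numpy version of this summary (np.linalg.pinv computes the
     pseudo-inverse X†); the function name and interface are illustrative:

         import numpy as np

         def ols_fit(X_raw, y):
             """Ordinary least squares: build X with a leading column of 1s, return w = X† y."""
             N = X_raw.shape[0]
             X = np.hstack([np.ones((N, 1)), X_raw])   # each row starts with x_0 = 1
             return np.linalg.pinv(X) @ y              # X† y; equals (X^T X)^{-1} X^T y when X^T X is invertible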
Gradient Descent
      Why?
      To minimize the target function E(w) by moving downhill in the
      direction of steepest descent.
Gradient Descent
        Gradient Descent Algorithm
       Initialize the weights w(0) at time t = 0
       for t = 0, 1, 2, ... do
           Compute the gradient g_t = ∇E(w(t))
           Set the direction to move: v_t = −g_t
           Update w(t + 1) = w(t) + η v_t
           Iterate until it is time to stop
       Return the final weights w
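       A sketch of this loop for the MSE objective; the fixed iteration budget
       and the default learning rate are placeholder choices, not part of the
       algorithm as stated:

           import numpy as np

           def gradient_descent(X, y, eta=0.1, n_iters=1000):
               """Minimize E(w) = (1/N) ||Xw - y||^2 by plain gradient descent."""
               N, d = X.shape
               w = np.zeros(d)                              # w(0)
               for _ in range(n_iters):                     # fixed budget as the stopping rule
                   g = (2.0 / N) * (X.T @ (X @ w - y))      # g_t = ∇E(w(t))
                   w = w - eta * g                          # w(t+1) = w(t) + η * (-g_t)
               return w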
Gradient Descent
   How does η affect the algorithm?
       Fixed step size: η = 0.1 works well in practice
       Variable step size: η_t = η ||∇E||
OLS or Gradient Descent?
Computational Complexity
        OLS vs. Gradient Descent:
            OLS forms and inverts X^T X, which costs roughly O(N·d^2 + d^3).
            Each gradient-descent step costs only roughly O(N·d).
         OLS is expensive when the number of features d is large!
Linear Regression
  What is the Probabilistic Interpretation?
Normal Distribution
        Figure: right-skewed, left-skewed, and random distributions, compared with the normal distribution
Normal Distribution
     mean = median = mode
     symmetry about the center
      x ∼ N(µ, σ^2)  =⇒  f(x) = (1 / (σ√(2π))) e^{−(x−µ)^2 / (2σ^2)}
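      The density above, written out directly as a small numpy sketch:

          import numpy as np

          def normal_pdf(x, mu=0.0, sigma=1.0):
              """Density of N(mu, sigma^2) evaluated at x."""
              return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))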
Central Limit Theorem
     All things bell shaped!
      Random occurrences over a large population tend to wash out the
      asymmetry or uniformity of the individual events; a more 'natural'
      distribution ensues. The name for it is the normal distribution
      (the bell curve).
      Formal definition: if y_1, ..., y_n are i.i.d. with mean µ_y and
      variance σ_y^2, where 0 < σ_y^2 < ∞, then for large n the
      distribution of the sample mean ȳ is well approximated by the
      normal distribution N(µ_y, σ_y^2 / n).
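      A quick simulation of the theorem (the sample size n = 50 and the choice
      of an exponential source distribution are arbitrary): averages of i.i.d.
      right-skewed draws concentrate around µ_y with spread close to σ_y/√n.

          import numpy as np

          rng = np.random.default_rng(0)
          n, trials = 50, 10_000

          # Exponential(1) is right-skewed with mean 1 and variance 1.
          samples = rng.exponential(scale=1.0, size=(trials, n))
          means = samples.mean(axis=1)        # 10,000 realizations of the sample mean ȳ

          print(means.mean())   # ≈ µ_y = 1
          print(means.std())    # ≈ σ_y / √n = 1 / √50 ≈ 0.141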
Central Limit Theorem
  Example:
LR: Probabilistic Interpretation
LR: Probabilistic Interpretation
   Assume y_i = w^T x_i + ε_i with Gaussian noise ε_i ∼ N(0, σ^2). Then

       prob(y_i | x_i) = (1 / (√(2π) σ)) e^{−(w^T x_i − y_i)^2 / (2σ^2)}
LR: Probabilistic Interpretation
  Likelihood of the entire dataset:
         L ∝ Π_i e^{−(w^T x_i − y_i)^2 / (2σ^2)}
           = e^{−Σ_i (w^T x_i − y_i)^2 / (2σ^2)}

   Maximize L  ⇐⇒  Minimize Σ_i (w^T x_i − y_i)^2
Non-linear Transformation
     Linear is limited:
     Linear models become powerful when we
     consider non-linear feature transformations:
      X_i = (1, x_i, x_i^2)  =⇒  y_i = ω_0 + ω_1 x_i + ω_2 x_i^2
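      A sketch of this idea on made-up quadratic data, reusing the
      pseudo-inverse solution from the OLS slides:

          import numpy as np

          rng = np.random.default_rng(1)
          x = rng.uniform(-3, 3, size=100)
          y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.3, size=100)   # quadratic data + noise

          # Non-linear feature map: X_i = (1, x_i, x_i^2); the model is still linear in w.
          X = np.column_stack([np.ones_like(x), x, x**2])
          w = np.linalg.pinv(X) @ y
          print(w)   # roughly (1.0, -2.0, 0.5)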
Overfitting
Overfitting
       How do we know we have overfitted?
       E_in : error on the training data
       E_out : error on the test data
      Example:
Overfitting
  How to avoid overfitting?
     Use more data
      Evaluate on a parameter tuning set
      Regularization
Regularization
      Attempts to impose the "Occam's razor" principle
      Add a penalty term for model complexity
      Most commonly used:
          L2 regularization (ridge regression) minimizes:
                   E(w) = ||Xw − y||^2 + λ ||w||^2
          where λ ≥ 0 and ||w||^2 = w^T w
          L1 regularization (LASSO) minimizes:
                   E(w) = ||Xw − y||^2 + λ ||w||_1
          where λ ≥ 0 and ||w||_1 = Σ_{i=1}^{d} |ω_i|
Regularization
      L2: closed-form solution (a small numeric sketch follows below)
               w = (X^T X + λI)^{-1} X^T y
      L1: no closed-form solution; use quadratic programming:
               minimize ||Xw − y||^2    s.t.  ||w||_1 ≤ s
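      A minimal sketch of the ridge closed form; for simplicity it penalizes
      the intercept along with the other weights, which is a convention choice,
      and the function name is illustrative:

          import numpy as np

          def ridge_fit(X, y, lam):
              """L2-regularized least squares: w = (X^T X + λI)^{-1} X^T y."""
              d = X.shape[1]
              return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)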
L2 Regularization Example
Model Selection
  Which model?
     A central problem in supervised learning
      Simple model: "underfits" the data
          Constant function
          Linear model applied to quadratic data
      Complex model: "overfits" the data
          High-degree polynomials
          A model with enough hidden logic to fit the data to completion
Bias-Variance Trade-off
   Consider E[ (1/N) Σ_{n=1}^{N} (w^T x_n − y_n)^2 ];  let ŷ = w^T x_n
       E[(ŷ − y_n)^2] can be decomposed into (reading):
            var{noise} + bias^2 + var{ŷ}
       var{noise}: can't be reduced
       bias^2 + var{ŷ} is what counts for prediction
       High bias^2: model mismatch, often due to "underfitting"
       High var{ŷ}: training-set / test-set mismatch, often due to "overfitting"
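       One way to see the decomposition empirically (the true function, noise
       level, and polynomial degrees below are arbitrary choices): refit models
       on many resampled training sets and measure, at one test point, the
       squared bias of the average prediction and the variance of the predictions.

           import numpy as np

           rng = np.random.default_rng(2)
           truth = np.sin            # assumed "true" function; noise std = 0.1
           x_star = 1.0              # fixed test input

           for degree in (1, 3, 9):
               preds = []
               for _ in range(500):                               # many independent training sets
                   x = rng.uniform(-np.pi, np.pi, size=20)
                   y = truth(x) + rng.normal(scale=0.1, size=20)
                   coeffs = np.polyfit(x, y, deg=degree)          # least-squares polynomial fit
                   preds.append(np.polyval(coeffs, x_star))
               preds = np.array(preds)
               bias_sq = (preds.mean() - truth(x_star)) ** 2      # squared bias at x*
               variance = preds.var()                             # var{ŷ} at x*
               print(degree, bias_sq, variance)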
Bias-Variance Trade-off
     Often:       low bias ⇒ high variance
                  low variance ⇒ high bias
     Trade-off:
How to choose λ ?
  But we still need to pick λ.
       Use the test-set data? NO!
       Set aside another evaluation set
            Small evaluation set ⇒ inaccurate error estimate
            Large evaluation set ⇒ small training set
       Cross-validation
Cross Validation (CV)
      Divide the data into K folds
      In turn, train on all folds except the k-th, and test on the k-th fold
Cross Validation (CV)
     How to choose K?
     Common choices: K = 5, 10, or N (leave-one-out CV, LOOCV)
     Measure the average performance across the K folds
     Computational cost: K folds × number of candidate λ values
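     A sketch of K-fold cross-validation for choosing λ, reusing the hypothetical
     ridge_fit from the regularization sketch above; K = 5 and the candidate λ
     grid passed in are up to the user:

         import numpy as np

         def cv_choose_lambda(X, y, lambdas, K=5, seed=0):
             """Pick the λ with the lowest average validation MSE over K folds."""
             idx = np.random.default_rng(seed).permutation(len(y))
             folds = np.array_split(idx, K)
             avg_err = []
             for lam in lambdas:
                 fold_err = []
                 for k in range(K):
                     val = folds[k]                                 # held-out k-th fold
                     train = np.concatenate([folds[j] for j in range(K) if j != k])
                     w = ridge_fit(X[train], y[train], lam)         # train on the remaining folds
                     r = X[val] @ w - y[val]
                     fold_err.append(r @ r / len(val))
                 avg_err.append(np.mean(fold_err))
             return lambdas[int(np.argmin(avg_err))]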
Learning Curve
   A learning curve plots the performance of the algorithm as a
   function of the training-set size.
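   A sketch of how such a curve could be computed for OLS (the use of
   matplotlib and the specific train/test split are assumptions):

       import numpy as np
       import matplotlib.pyplot as plt

       def plot_learning_curve(X_train, y_train, X_test, y_test, sizes):
           """Plot train/test MSE of OLS as the training-set size grows."""
           train_err, test_err = [], []
           for m in sizes:
               w = np.linalg.pinv(X_train[:m]) @ y_train[:m]          # OLS on the first m examples
               train_err.append(np.mean((X_train[:m] @ w - y_train[:m]) ** 2))
               test_err.append(np.mean((X_test @ w - y_test) ** 2))
           plt.plot(sizes, train_err, label="E_in (training error)")
           plt.plot(sizes, test_err, label="E_out (test error)")
           plt.xlabel("training set size")
           plt.ylabel("MSE")
           plt.legend()
           plt.show()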
Learning Curve