
Linear Regression. Logistic Regression.

Overfitting and Regularization.


COMP5318/COMP4318 Machine Learning and Data Mining
semester 1, 2023, week 3
Irena Koprinska

Reference: Witten ch.4: 128-131, Müller & Guido: ch.2: 28-31, 47-63,
Geron: ch.4 132-137, 149-161

Outline

• Linear regression
• Logistic regression
• Overfitting and regularization
• Ridge and Lasso regression

Introduction

• Linear regression is a prediction method used for regression tasks


• Regression tasks – the predicted variable is numeric
• Examples: predict the exchange rate of AU$ based on economic indicators, predict the sales of a company based on the amount spent on advertising
• Logistic regression is an extension of linear regression for
classification tasks
• Classification tasks – the predicted variable is nominal
• Both linear regression and logistic regression are very popular in
statistics

Linear Regression

Simple (Bivariate) Regression

• Given: a dataset with 2 continuous variables:
  • feature x (also called independent variable)
  • predicted variable y (also called target variable or dependent variable)
• Goal: approximate the relationship between these variables with a straight line for the given dataset
• Prediction (typical task in DM): given a new value of the independent variable, use the line to predict the value of the dependent variable
• Descriptive analysis (typical task in psychology, health and social sciences): assess the strength of the relationship between x and y

Example – cereals dataset

• Contains nutritional information for 77 breakfast cereals


• 14 features
• cereal manufacturer, type (hot or cold), calories, protein [g], fat [g], sodium
[mg], fiber [g], carbohydrates [g], sugar [g], potassium [mg],
%recommended daily vitamins, weight of 1 serving, number of cups per
serving, shelf location (bottom, middle or top)
• Class variable (numeric): nutritional rating

• Task: Predict the nutritional rating of a cereal based on its sugar content
  1. Use this data to build the model
  2. Given the sugar content of a new cereal, use the model to predict its nutritional rating
• New cereal = a cereal not used for building the model

Task

Task: Predict the nutritional rating of a cereal based on its sugar content
1. Use this data to build the model
2. Given the sugar content of a new cereal, use the model to predict its nutritional rating

Dependent variable?
rating

Independent variable?
sugars

Example from D. Larose, Data Mining: Methods and Models, 2006, Wiley
Regression model

• The relationship between sugars and rating is modeled by a line


• The line is used to make predictions

[Figure: scatter plot of sugar content vs rating, with the fitted regression line (the model)]


Equation of a line

y = b0 + b1·x
b0: intercept, b1: slope

Example: y = 5 + 2x

Equation of a regression line


ŷ = b0 + b1·x

ŷ: estimated (predicted) value of y from the regression line
b0 and b1: regression coefficients

How to make predictions

• In our case the computed regression line (model) is:

    ŷ = 59.4 − 2.42x

• It can be used to make predictions
  • e.g. predict the nutritional rating of a new cereal type (not in the original data) that contains x = 1 g of sugar:

    ŷ = 59.4 − 2.42·1 = 56.98

• The predicted value lies precisely on the regression line
How to make predictions (2)

• We have a cereal type in our dataset with sugar = 1 g: Cheerios
  • Its nutritional rating is 50.765 (actual value), not 56.98 (predicted)
  • The difference is called the prediction error or residual

[Figure: scatter plot showing the predicted value 56.98 on the regression line and the actual value 50.765 for Cheerios]

Fitting a line

• There are many lines that can be fitted to the given dataset. Which one is the best one?
  • The one "closest" to the data
• Mathematically:
  • Prediction error (residual) = actual value − predicted value:

    εᵢ = yᵢ − ŷᵢ

  • Performance index: sum of squared prediction errors (SSE):

    SSE = Σᵢ (yᵢ − ŷᵢ)²

• Our goal: select the line which minimizes SSE
  • This can be solved using the method of least squares
Solution using the least squares method

b1 = [ Σ xᵢyᵢ − (Σ xᵢ)(Σ yᵢ)/n ] / [ Σ xᵢ² − (Σ xᵢ)²/n ]

b0 = ȳ − b1·x̄

x̄ – mean value of x
ȳ – mean value of y
n – number of training examples (= data points, observations)

• This solution is obtained by minimizing SSE using differential calculus
  • If you are interested to see how this was done, please see Appendix 1 at the end
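As an illustration of these formulas, here is a minimal NumPy sketch that computes b1 and b0 for a simple bivariate regression. The sugars/rating arrays are made-up values for illustration, not the actual cereals data.

```python
import numpy as np

# Hypothetical (x, y) pairs standing in for (sugars, rating); not the real cereals data
x = np.array([1.0, 3.0, 6.0, 9.0, 11.0, 13.0])
y = np.array([56.0, 52.0, 45.0, 38.0, 33.0, 28.0])
n = len(x)

# Least squares estimates of the slope and intercept
b1 = (np.sum(x * y) - np.sum(x) * np.sum(y) / n) / (np.sum(x**2) - np.sum(x)**2 / n)
b0 = np.mean(y) - b1 * np.mean(x)

print("b0 (intercept):", round(b0, 2), " b1 (slope):", round(b1, 2))
# Predict the rating of a new cereal with 1 g of sugar
print("Prediction for x = 1:", round(b0 + b1 * 1, 2))
```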
Coefficient of determination R2

• The least squares method finds the best fit to the data but doesn't tell us how good this fit is
  • e.g. SSE = 12; is this large or small?
• R² measures the goodness of fit of the regression line found by the least squares method:

    R² = SSR / SST

• Values between 0 and 1; the higher the better
  • = 1: the regression line fits the training data perfectly
  • close to 0: poor fit

• What are SSR and SST?


Three types of errors

• 1. SSE – sum of squared prediction errors (actual value − predicted value):

    SSE = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²

• 2. SST – sum of squared total errors (actual value − mean value):

    SST = Σᵢ₌₁ⁿ (yᵢ − ȳ)²

  • Hence, SST measures the prediction error when the predicted value is the mean value
  • SST is a function of the variance of y (variance = standard deviation²) => SST is a measure of the variability of y, without considering x:

    SST = Σᵢ₌₁ⁿ (yᵢ − ȳ)² = (n − 1)·var(y)

  • It can be used as a baseline – predicting y without knowing x
Three types of errors (2)

• 3. SSR – sum of squared regression errors (predicted value − mean value):

    SSR = Σᵢ₌₁ⁿ (ŷᵢ − ȳ)²

[Figure: example – distance travelled for a number of hours, illustrating SSE, SST and SSR on the scatter plot]
Relation between SST, SSR and SSE

• From the graph: yᵢ − ȳ = (ŷᵢ − ȳ) + (yᵢ − ŷᵢ)
• It can be shown that SST = SSR + SSE
  (For the interested students: how? By squaring each side and summing over all examples:

    Σᵢ₌₁ⁿ (yᵢ − ȳ)² = Σᵢ₌₁ⁿ (ŷᵢ − ȳ)² + Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²

  The cross product cancels out, as shown in: N. Draper and H. Smith, Applied Regression Analysis, Wiley, 1998)

[Figure: scatter plot illustrating SSE, SST and SSR]

Coefficient of determination R2 - again

• Measures the goodness of fit of the regression line to the training data:

    R² = SSR / SST

• Values between 0 and 1; the higher the better
  • 1: perfect fit, SSE = 0. Why is it 1 when SSE = 0?

    If SSE = 0:  R² = SSR/SST = (SST − SSE)/SST = SST/SST = 1

  • 0: x is not helpful for predicting y, SSR = 0

[Figure: scatter plot showing SSE, SSR and SST. Is R² high or low?]
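To make the decomposition concrete, here is a small sketch (continuing the hypothetical arrays from the earlier least squares example) that computes SSE, SST, SSR and R² directly from their definitions and checks that SST = SSR + SSE.

```python
import numpy as np

# Hypothetical data and fitted line (same illustrative values as before)
x = np.array([1.0, 3.0, 6.0, 9.0, 11.0, 13.0])
y = np.array([56.0, 52.0, 45.0, 38.0, 33.0, 28.0])
n = len(x)
b1 = (np.sum(x * y) - np.sum(x) * np.sum(y) / n) / (np.sum(x**2) - np.sum(x)**2 / n)
b0 = np.mean(y) - b1 * np.mean(x)
y_hat = b0 + b1 * x

sse = np.sum((y - y_hat) ** 2)          # actual - predicted
sst = np.sum((y - y.mean()) ** 2)       # actual - mean (baseline)
ssr = np.sum((y_hat - y.mean()) ** 2)   # predicted - mean

r2 = ssr / sst
print(f"SSE={sse:.3f}, SST={sst:.3f}, SSR={ssr:.3f}, R2={r2:.3f}")
print("SST = SSR + SSE:", np.isclose(sst, ssr + sse))
```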

Relation R2 and r

• r – correlation coefficient; measures the linear relationship between 2 vectors x and y (see slides for week 1b):

    r = corr(x, y) = covar(x, y) / (std(x)·std(y)) = covar(x, y) / sqrt(var(x)·var(y))

• R² – coefficient of determination; measures how well the regression line represents the data:

    R² = SSR / SST

• It can be shown that r and R² are directly related, except for the sign of r, which depends on the direction of the relationship (positive or negative):

    r = ±sqrt(R²)

MAE, MSE and RMSE

• MAE, MSE and RMSE are other performance measures for evaluating:
  • how good the model is (performance on training data) and
  • how well it works on new data (performance on test data)
• They are widely used in ML and DM

• Mean Absolute Error (MAE):    MAE = (1/n) Σᵢ₌₁ⁿ |ŷᵢ − yᵢ|

• Mean Squared Error (MSE):     MSE = (1/n) Σᵢ₌₁ⁿ (ŷᵢ − yᵢ)²

• Root Mean Squared Error (RMSE):    RMSE = sqrt( (1/n) Σᵢ₌₁ⁿ (ŷᵢ − yᵢ)² )
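These metrics are available in scikit-learn; a minimal sketch (with made-up actual and predicted values, for illustration only) might look like the following. Scikit-learn provides mean_absolute_error and mean_squared_error; RMSE is simply the square root of MSE.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical actual and predicted ratings, for illustration only
y_true = np.array([50.8, 45.0, 36.5, 30.2])
y_pred = np.array([56.98, 44.9, 35.0, 33.0])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)

print(f"MAE={mae:.3f}, MSE={mse:.3f}, RMSE={rmse:.3f}")
```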

Multiple regression

• Simple regression: 1 feature
• Multiple regression: more than 1 feature
  • The line becomes a plane (for 2 features) and a hyperplane (for more than 2 features)
  • R² is defined similarly; it is called the multiple coefficient of determination
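A sketch of multiple regression in scikit-learn (using two made-up features rather than the actual cereals data): LinearRegression fits the coefficients, and score() returns the coefficient of determination on whatever data it is given.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical features (e.g. sugars and fiber) and target (rating); illustrative values only
X = np.array([[1.0, 4.0], [6.0, 2.0], [9.0, 1.0], [12.0, 0.5], [3.0, 3.0], [11.0, 1.5]])
y = np.array([56.0, 44.0, 38.0, 30.0, 50.0, 33.0])

model = LinearRegression().fit(X, y)
print("Intercept b0:", model.intercept_)
print("Coefficients b1, b2:", model.coef_)
print("R^2 on the training data:", model.score(X, y))

# Predict the rating of a new cereal with 5 g sugar and 2 g fiber
print("Prediction:", model.predict([[5.0, 2.0]]))
```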
Question time

• True or False?
  • 1) The regression line minimizes the sum of the residuals
  • 2) If all residuals are 0, SST = SSR
  • 3) If the value of the correlation coefficient is negative, this indicates that the 2 variables are negatively correlated
  • 4) The value of the correlation coefficient can be calculated given the value of R²
  • 5) SSR represents an overall measure of the prediction error on the training set by using the regression line

Answers

• True or False?
  • 1) The regression line minimizes the sum of the residuals: False. No, it minimizes the sum of the squared residuals.
  • 2) If all residuals are 0, SST = SSR: True. If the residuals are 0 => SSE will be 0; SST = SSR + SSE => SST = SSR.
  • 3) If the value of the correlation coefficient is negative, this indicates that the 2 variables are negatively correlated: True.
  • 4) The value of the correlation coefficient can be calculated given the value of R²: False. r = ±√R², so the magnitude can be calculated but the sign cannot.
  • 5) SSR represents an overall measure of the prediction error on the training set by using the regression line: False. No, this is SSE, R², or other measures such as MAE, MSE, RMSE.
Negative R squared

• Note that if the LR model is fitted on one dataset but tested on another dataset, it is possible that the R² value is negative
• We will see such a case during the tutorial – a LR model trained on the training set and tested on the test set:
  • R² on the training set: 0.69
  • R² on the test set: -0.73 (negative)
• A negative value means a poor fit
  • R² on the training set: 0.69 – good fit on the training data
  • R² on the test set: -0.73 – poor fit on the test data
  • => overfitting
• If the model is trained and tested on the same dataset, R² is always between 0 and 1
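A minimal sketch of how this situation can be reproduced with scikit-learn (using a made-up noisy dataset rather than the tutorial's data): r2_score on a held-out test set can come out negative when the model generalizes poorly, while score() on the training data still looks reasonable.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Hypothetical noisy data: a weak linear trend plus a lot of noise
X = rng.uniform(0, 10, size=(30, 1))
y = 2.0 * X.ravel() + rng.normal(0, 15, size=30)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print("R^2 on the training set:", model.score(X_train, y_train))
print("R^2 on the test set:", r2_score(y_test, model.predict(X_test)))
# A negative test-set R^2 means the model predicts worse than simply using the mean of y_test
```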

Logistic Regression

Logistic regression

• Used for classification tasks
• Two classes: 0 and 1 (there are extensions for more than 2 classes)
• Fits the data to a logistic (sigmoidal) curve instead of fitting it to a straight line
  • => assumes that the relationship between the feature and the class variable is nonlinear

[Figure: logistic regression (sigmoidal curve) vs linear regression (straight line) fitted to the same data]

Simple (bivariate) logistic regression

• Example: predicting the presence (class=1) or absence (class=0) of a particular disease, given the patient's age

  ID  age  disease      ID  age  disease
   1   25     0         11   50     0
   2   29     0         12   59     1
   3   30     0         13   60     0
   4   31     0         14   62     0
   5   32     0         15   68     1
   6   41     0         16   72     0
   7   41     0         17   79     1
   8   42     0         18   80     0
   9   44     1         19   81     1
  10   49     1         20   84     1

[Figure: logistic regression (sigmoidal curve) and linear regression (straight line) fitted to the age/disease data]
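A small sketch of fitting this dataset with scikit-learn's LogisticRegression. The large C value weakens the default L2 regularization, so the fitted coefficients should be close to the plain maximum likelihood values quoted later in these slides.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Age/disease data from the table above
age = np.array([25, 29, 30, 31, 32, 41, 41, 42, 44, 49,
                50, 59, 60, 62, 68, 72, 79, 80, 81, 84]).reshape(-1, 1)
disease = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
                    0, 1, 0, 0, 1, 0, 1, 0, 1, 1])

# Large C effectively turns off the default L2 regularization
clf = LogisticRegression(C=1e6).fit(age, disease)

print("b0 (intercept):", clf.intercept_[0])
print("b1 (slope):", clf.coef_[0][0])
print("P(disease | age=50):", clf.predict_proba([[50]])[0][1])
print("Predicted class for age=50:", clf.predict([[50]])[0])
```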

Logistic regression – example

• What will be the prediction of logistic regression for patient 11 from the training data (age=50, disease=0)?

[Figure: logistic regression curve fitted to the age/disease data]

Logistic curve

• The equation of the logistic (sigmoidal) curve is:

    p = e^(b0 + b1·x) / (1 + e^(b0 + b1·x))

• It gives a value between 0 and 1 that is interpreted as the probability of class membership:
  • p is the probability of class 1 and 1 − p is the probability of class 0
• It uses the maximum likelihood method to find the parameters b0 and b1 – the curve that best fits the data

How to make predictions

• The logistic regression produced b0 = -4.372, b1 = 0.06696
• => the probability for a patient aged 50 (training example 11) to have the disease:

    p = e^(-4.372 + 0.06696·50) / (1 + e^(-4.372 + 0.06696·50)) = e^(-1.024) / (1 + e^(-1.024)) ≈ 0.26

• => 26% probability to have the disease and 74% not to have the disease
• We can use the probability directly or convert it into the 0/1 answer required for classification tasks, e.g. 0 if p < 0.5 and 1 if p >= 0.5
  • => We predict class 0 for this patient
  • Other thresholds (not 0.5) are also possible, depending on domain knowledge
• The class for new examples can be predicted similarly – e.g. make a prediction for a patient aged 45
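The same calculation as a short Python sketch, using the coefficients given above; the helper function name is just for illustration.

```python
import math

def logistic_prob(x, b0=-4.372, b1=0.06696):
    """Probability of class 1 (disease) from p = e^(b0+b1*x) / (1 + e^(b0+b1*x))."""
    z = b0 + b1 * x
    return math.exp(z) / (1 + math.exp(z))

p50 = logistic_prob(50)
print(f"P(disease | age=50) = {p50:.3f}")                    # approximately 0.26
print("Predicted class (threshold 0.5):", int(p50 >= 0.5))   # 0

# A new patient aged 45 is handled the same way
print(f"P(disease | age=45) = {logistic_prob(45):.3f}")
```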

Logistic regression equation

    p = e^(b0 + b1·x) / (1 + e^(b0 + b1·x))

• It also follows that (how can this be shown? See Appendix 2 at the end):

    ln( p / (1 − p) ) = b0 + b1·x    – a linear combination, as in linear regression

• p / (1 − p) is called the odds ratio for the default class (class 1):

    ln(odds) = b0 + b1·x   =>   odds = e^(b0 + b1·x)

• Compare:
  • Logistic regression: ln(odds) = b0 + b1·x
  • Linear regression:   ŷ = b0 + b1·x
• The model is still a linear combination of the input features, but this combination determines the log odds of the class, not directly the predicted value
Overfitting and Regularization

Overfitting

• Overfitting:
  • Small error on the training set but high error on the test set (new examples)
  • The classifier has memorized the training examples but has not learned to generalize to new examples!
• It occurs when we fit a model too closely to the particularities of the training set – the resulting model is too specific: it works well on the training data but doesn't work well on new data

• Ex.1: [Figure: a model that may be overfitting – too complex]

• Ex.2:
  Rule 1: may be overfitting – too specific
    If age>45, income>100K, has_children=3, divorced=no -> buy_boat=yes
  Rule 2:
    If age>45, income>100K -> buy_boat=yes

Overfitting (2)

• Various reasons, e.g.


• Issues with the data
• Noise in the training data
• Training data does not contain enough representative examples – too
small
• Training data very different than test data – not representative enough
• How the algorithm operates
• Some algorithms are more susceptible to overfitting than others
• Different algorithms have different strategies to deal with overfitting,
e.g.
• Decision tree – prune the tree
• Neural networks – early stopping of the training
• …

Underfitting

• The model is too simple and doesn't capture all important aspects of the data
• It performs badly on both training and test data

  Rule 1: may be overfitting – too specific
    If age>45, income>100K, has_children=3, divorced=no -> buy_boat=yes
  Rule 2:
    If age>45, income>100K -> buy_boat=yes
  Rule 3: may be underfitting – too general
    If owns_house=yes -> buy_boat=yes

Trade-off between model complexity and
generalization performance
• Generalization performance = accuracy on the test set
• Usually, the more complex we allow the model to be, the better it will predict on the training data
• However, if it becomes too complex, it will start focusing too much on each individual data point and will not generalize well to new data
• There is a point in between which will yield the best test accuracy
  • This is the model we want to find

[Figure: training and test accuracy as a function of model complexity; image from A. Mueller and S. Guido, Introduction to ML with Python]

Regularization

• Regularization means explicitly restricting a model to avoid overfitting
• It is used in some regression models (e.g. Ridge and Lasso regression) and in some neural networks

Ridge and Lasso Regression

Ridge regression

• A regularized version of the standard Linear Regression (LR)
• Also called Tikhonov regularization
• Uses the same equation as LR to make predictions
• However, the regression coefficients w are chosen so that they not only fit the training data well (as in LR) but also satisfy an additional constraint:
  • the magnitude of the coefficients is as small as possible, i.e. close to 0
• Small values of the coefficients mean
  • each feature will have little effect on the outcome
  • a small slope of the regression line
• Rationale: a more restricted (less complex) model is less likely to overfit
• Ridge regression uses the so-called L2 regularization (L2 norm of the weight vector)
Ridge regression (2)

• Minimizes the following cost function:

    (1/n) Σᵢ₌₁ⁿ (ŷᵢ − yᵢ)² + α Σᵢ₌₁ᵐ wᵢ²

  • first term: MSE; second term: regularization term
  • n – number of training examples; m – number of regression coefficients (weights)
  • Goal: high accuracy on the training data (low MSE) and a low complexity model – w close to 0
• Parameter α controls the trade-off between the performance on the training set and model complexity

Ridge regression (3)

    (1/n) Σᵢ₌₁ⁿ (ŷᵢ − yᵢ)² + α Σᵢ₌₁ᵐ wᵢ²    (MSE + L2 regularization term)

• α controls the trade-off between the performance on the training set and model complexity
• Increasing α makes the coefficients smaller (closer to 0); this typically decreases the performance on the training set but may improve the performance on the test set
• Decreasing α means less restricted coefficients. For very small α, Ridge Regression will behave similarly to LR

[Figure: Ridge regression fits for different α values; image from A. Geron, Hands-on ML with Scikit-Learn, Keras & TensorFlow]
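A minimal scikit-learn sketch of the effect of alpha in Ridge regression; the toy data is made up for illustration, and alpha here corresponds to the α above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(1)

# Hypothetical dataset with 5 features; only the first two actually influence y
X = rng.normal(size=(40, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=40)

print("LR coefficients:      ", np.round(LinearRegression().fit(X, y).coef_, 3))
for alpha in [0.1, 10.0, 1000.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    print(f"Ridge alpha={alpha:7.1f}:", np.round(model.coef_, 3))
# Larger alpha shrinks the coefficients towards 0 (but not exactly to 0)
```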

Lasso regression

• Another regularized version of the standard LR
• LASSO = Least Absolute Shrinkage and Selection Operator Regression
• As Ridge Regression, it adds a regularization term to the cost function, but it uses the L1 norm of the regression coefficient vector w:

    (1/n) Σᵢ₌₁ⁿ (ŷᵢ − yᵢ)² + α Σᵢ₌₁ᵐ |wᵢ|

  • first term: MSE; second term: regularization term (L1 norm)
  • Goal: high accuracy on the training data (low MSE) and a low complexity model
• Consequence of using L1 – some w will become exactly 0
  • => some features will be completely ignored by the model – a form of automatic feature selection
  • Fewer features – a simpler model, easier to interpret
Lasso regression (2)

    (1/n) Σᵢ₌₁ⁿ (ŷᵢ − yᵢ)² + α Σᵢ₌₁ᵐ |wᵢ|    (MSE + L1 regularization term)

• As in Ridge Regression:
  • α controls the trade-off between the performance on the training set and model complexity
  • Increasing/decreasing α – similar reasoning as before

[Figure: Lasso regression fits for different α values; image from A. Geron, Hands-on ML with Scikit-Learn, Keras & TensorFlow]
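A matching scikit-learn sketch for Lasso (same made-up data idea as in the Ridge example), showing that increasing alpha drives some coefficients to exactly 0.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)

# Hypothetical dataset: only the first two features actually matter
X = rng.normal(size=(40, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=40)

for alpha in [0.01, 0.1, 1.0]:
    model = Lasso(alpha=alpha).fit(X, y)
    print(f"Lasso alpha={alpha:5.2f}:", np.round(model.coef_, 3))
# With larger alpha, the coefficients of the irrelevant features become exactly 0
# (automatic feature selection); too large an alpha also shrinks the useful ones
```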

Summary

• Linear regression
  • Simple (bivariate) – a line is used to approximate the relationship between 2 continuous variables (feature x and class variable y)
  • Multiple – more than 1 feature; the line becomes a hyperplane
  • The least-squares method is used to find the line (hyperplane) which best fits the given data (training data)
    • "Best fit": minimizes the sum of the squared errors (SSE) between the actual and predicted values of y, over all data points
  • R² = coefficient of determination = SSR/SST – how well the line fits the data; the higher the better
  • MAE, MSE and RMSE – widely used accuracy measures in ML (can be measured on both training and test data)

Summary (2)

• Logistic regression
  • Simple (bivariate) – a sigmoidal curve is used to approximate the relationship between the feature x and class variable y
    • => assumes the relationship between the feature and class variable is nonlinear
  • Multiple – more than 1 feature; the sigmoidal curve becomes a sigmoidal hyperplane
  • Uses the maximum likelihood method to find the curve (hyperplane) which best fits the given data (training data)
• Overfitting and regularization
  • Overfitting – high accuracy on training data but low accuracy on test data (low generalization)
  • High model complexity -> low generalization
  • Regularization is a method to avoid overfitting – it makes the model more restrictive (less complex)
  • Ridge and Lasso regression are regularized linear regression models

Acknowledgements

• M. Lewis-Beck, Applied Statistics, SAGE University Paper Series on Quantitative Analysis.
• D. Larose, Data Mining: Methods and Models, 2006, Wiley.

Appendix 1: Minimizing SSE

• For interested students; not examinable


• From D. Larose, Data Mining: Methods and Models, 2006, Wiley; p.36-37

Appendix 1: Minimizing SSE (2)

Appendix 2: Logistic regression

• For interested students; not examinable
