RRB - Unit 2 Regression

SPPU University, AI&DS Engineering, Final Year BE, Semester 7, Machine Learning, Unit 2 notes


Unit 2

Regression
Syllabus - AI&DS
● Introduction- Regression, Need of Regression,
● Difference between Regression and Correlation,
● Types of Regression: Univariate vs. Multivariate, Linear vs. Nonlinear, Simple Linear vs.
Multiple Linear,
● Bias-Variance tradeoff, Overfitting and Underfitting.
● Regression Techniques - Polynomial Regression, Stepwise Regression, Decision Tree
Regression, Random Forest Regression, Support Vector Regression, Ridge Regression, Lasso
Regression, Elastic Net Regression, Bayesian Linear Regression.
● Evaluation Metrics: Mean Squared Error (MSE), Mean Absolute Error (MAE), Root Mean
Squared Error (RMSE), R-squared, Adjusted R-squared.
Syllabus - Computer
● Bias, Variance,
● Generalization, Underfitting, Overfitting,
● Linear regression,
● Regression: Lasso regression, Ridge regression
● Gradient descent algorithm
● Evaluation Metrics: MAE, RMSE, R2
Errors in Machine Learning

Important Link: https://www.javatpoint.com/bias-and-variance-in-machine-learning
● Irreducible errors are errors which will always be present in a
machine learning model, because of unknown variables, and
whose values cannot be reduced.
● Reducible errors are those errors whose values can be
further reduced to improve a model. They arise because our
model’s output function does not match the desired output
function, and they can be optimized.
Bias
● Bias is the difference between our actual and predicted values.
● Bias reflects the simplifying assumptions that our model makes
about our data in order to be able to predict new data.
Variance
● Variance can be defined as the model’s sensitivity to
fluctuations in the data.
● A high-variance model may learn from noise.
Bias vs. Variance
Bias-Variance Tradeoff
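The tradeoff can be made concrete with a small simulation. As a sketch (the estimator, sample size, and shrinkage factor below are illustrative choices, not from the notes), we estimate the mean of a distribution with a deliberately biased "shrunk" estimator and check numerically that expected squared error = bias² + variance:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma, n = 2.0, 1.0, 10   # true mean, noise std, sample size
c = 0.8                          # shrinkage factor: biased but lower-variance estimator
trials = 100_000

# Each trial: draw n samples, estimate theta with the shrunk mean c * xbar.
samples = rng.normal(theta, sigma, size=(trials, n))
estimates = c * samples.mean(axis=1)

bias = estimates.mean() - theta          # theory: (c - 1) * theta = -0.4
variance = estimates.var()               # theory: c^2 * sigma^2 / n = 0.064
mse = ((estimates - theta) ** 2).mean()  # theory: bias^2 + variance = 0.224

print(bias, variance, mse)
```

The decomposition MSE = bias² + variance holds for any estimator of this kind; shrinking toward zero adds bias but reduces variance, which is the same tension a regularized regression model exploits.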
Overfitting and Underfitting
Generalization
● Generalization is a term used to describe a model’s ability to react
to new data. That is, after being trained on a training set, a model
can digest new data and make accurate predictions.
● If a model has been trained too well on training data, it will be unable
to generalize.
● It will make inaccurate predictions when given new data, making the
model useless even though it is able to make accurate predictions for
the training data. This is called overfitting.
● The inverse is also true. Underfitting happens when a model has not
been trained enough on the data. Underfitting makes the model just as
useless: it is not capable of making accurate predictions, even with
the training data.
Overfitting
Underfitting
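Both failure modes can be seen in a minimal numeric sketch (the data and polynomial degrees are illustrative assumptions, not from the notes): fit polynomials of increasing degree to noisy quadratic data and compare their training errors.

```python
import numpy as np

# Quadratic ground truth with a small alternating "noise" pattern.
x = np.arange(6, dtype=float)            # 0..5
y = x**2 + np.array([0.5, -0.5, 0.5, -0.5, 0.5, -0.5])

def train_mse(degree):
    """Fit a polynomial of the given degree and return its training MSE."""
    coeffs = np.polyfit(x, y, degree)
    pred = np.polyval(coeffs, x)
    return np.mean((y - pred) ** 2)

mse_underfit = train_mse(1)   # degree 1: too simple, misses the curvature
mse_good = train_mse(2)       # degree 2: matches the true model
mse_overfit = train_mse(5)    # degree 5: interpolates every point, noise included

print(mse_underfit, mse_good, mse_overfit)
```

The degree-5 fit drives training error to essentially zero by memorizing the noise, which is exactly why its predictions on new x values would be unreliable: low training error alone does not imply generalization.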
Regression

"Regression shows a line or curve that passes through all the datapoints on a
target-predictor graph in such a way that the vertical distance between the
data points and the regression line is minimum."
Simple Linear Regression
Linear Regression
● Regression analysis is a statistical method to model the
relationship between a dependent (target) variable and one or
more independent (predictor) variables.
● It predicts continuous/real values such as temperature, age,
salary, price, etc.
● Regression is a supervised learning technique.
● It is mainly used for prediction, forecasting, time-series
modeling, and determining the cause-effect relationship
between variables.
Linear Regression
Y = β0 + β1X
where
● Y is the dependent variable: the variable we wish to explain (also called the
endogenous variable)
● X is the independent variable: the variable used to explain the dependent variable
(also called the exogenous variable)
● β0 is the intercept: where the line cuts the Y-axis.
● β1 is the slope of the line. (The slope is important because it indicates the
change in the Y-variable when X changes by one unit.)
Linear Regression
Simple Linear Regression

Solved Example

Google Colab
Person   X (Bahubali1)   Y (Bahubali2)
P1       4               3
P2       2               4
P3       3               2
P4       5               5
P5       1               3
P6       3               1
AVG      3 (Xavg)        3 (Yavg)
X        Y        X-Xavg   Y-Yavg   (X-Xavg)(Y-Yavg)   (X-Xavg)^2
4        3         1        0        0                  1
2        4        -1        1       -1                  1
3        2         0       -1        0                  0
5        5         2        2        4                  4
1        3        -2        0        0                  4
3        1         0       -2        0                  0
avg: 3   avg: 3                      sum: 3             sum: 10
β1 = sum / sum = 3 / 10 = 0.3
β0 = Yavg - β1·Xavg = 3 - 0.3·3 = 2.1

Y = β0 + β1X  →  y = 2.1 + 0.3x

x (Bahubali1)   Actual Y (Bahubali2)   Predicted y
4               3                      3.3
2               4                      2.7
3               2                      3.0
5               5                      3.6
1               3                      2.4
3               1                      3.0
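The hand calculation above can be checked in a few lines of NumPy (the notes mention Google Colab; this sketch applies the same Σ(x-x̄)(y-ȳ) / Σ(x-x̄)² formula used in the table):

```python
import numpy as np

x = np.array([4, 2, 3, 5, 1, 3], dtype=float)  # Bahubali1 ratings
y = np.array([3, 4, 2, 5, 3, 1], dtype=float)  # Bahubali2 ratings

# Slope and intercept via the least-squares formulas from the table.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

y_pred = b0 + b1 * x
print(b1, b0)   # 0.3 and 2.1, matching the worked example
print(y_pred)   # predictions: 3.3, 2.7, 3.0, 3.6, 2.4, 3.0
```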
Y Actual Vs Y Predicted
How Good is the model’s prediction power?

x    Actual Y   Predicted y   Y-Ypred   (Y-Ypred)^2   Ypred-AvgY   (Ypred-AvgY)^2   Y-AvgY   (Y-AvgY)^2
4    3          3.3           -0.3       0.09           0.3          0.09             0        0
2    4          2.7            1.3       1.69          -0.3          0.09             1        1
3    2          3.0           -1.0       1.00           0.0          0.00            -1        1
5    5          3.6            1.4       1.96           0.6          0.36             2        4
1    3          2.4            0.6       0.36          -0.6          0.36             0        0
3    1          3.0           -2.0       4.00           0.0          0.00            -2        4
                               SSE =     9.1            SSR =        0.9              SST =    10

SST = SSR + SSE
10 = 0.9 + 9.1
r² = SSR / SST = 0.9 / 10 = 0.09
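The goodness-of-fit table can likewise be verified numerically (same data and fitted line as in the worked example; a small sketch, not part of the original notes):

```python
import numpy as np

x = np.array([4, 2, 3, 5, 1, 3], dtype=float)
y = np.array([3, 4, 2, 5, 3, 1], dtype=float)
y_pred = 2.1 + 0.3 * x                  # fitted line from the worked example

sse = np.sum((y - y_pred) ** 2)         # unexplained (residual) variation: 9.1
ssr = np.sum((y_pred - y.mean()) ** 2)  # variation explained by the line: 0.9
sst = np.sum((y - y.mean()) ** 2)       # total variation: 10.0

r_squared = ssr / sst                   # 0.09: the line explains 9% of the variation
print(sse, ssr, sst, r_squared)
```

SST = SSR + SSE holds by construction for least-squares fits, so r² = SSR/SST = 1 - SSE/SST.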
Correlation
The term correlation is a combination of two words: 'Co' (together) and 'relation'
(connection) between two quantities. Correlation is observed when a unit change in
one variable is accompanied by an equivalent change in another variable, directly or
indirectly, during the study of the two variables.
Correlation can be either negative or positive.
If the two variables move in the same direction, i.e. an increase in one variable results in a
corresponding increase in the other, and vice versa, then the variables are considered to be
positively correlated. For example, investment and profit.

On the contrary, if the two variables move in different directions, so that an increase in one
variable leads to a decline in the other, and vice versa, the situation is known as negative
correlation. For example, product price and demand.
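The link between correlation and the regression example above can be shown directly. For the Bahubali ratings, the Pearson correlation coefficient r is 0.3, and r² = 0.09 matches the coefficient of determination computed earlier (a small sketch, not from the notes):

```python
import numpy as np

x = np.array([4, 2, 3, 5, 1, 3], dtype=float)
y = np.array([3, 4, 2, 5, 3, 1], dtype=float)

# Pearson correlation: covariance scaled by both standard deviations.
r = np.corrcoef(x, y)[0, 1]
print(r, r**2)   # r = 0.3, and r^2 = 0.09 equals the regression r-squared
```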
Polynomial Regression

The problem of non-linear regression can be solved by two methods:

1. Transformation of the non-linear data to linear data, so that
linear regression can handle the data
2. Using polynomial regression
Let's see an example:

x    y
1    1
2    4
3    9
4    15

y = -0.75 + 0.95x + 0.75x²


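The quadratic coefficients in the example can be recovered with an ordinary least-squares polynomial fit (a sketch using NumPy's `polyfit`; the notes do not specify the tool used):

```python
import numpy as np

x = np.array([1, 2, 3, 4], dtype=float)
y = np.array([1, 4, 9, 15], dtype=float)

# Fit y = c2*x^2 + c1*x + c0; polyfit returns the highest-degree coefficient first.
c2, c1, c0 = np.polyfit(x, y, 2)
print(c0, c1, c2)   # -0.75, 0.95, 0.75, matching y = -0.75 + 0.95x + 0.75x^2
```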
Stepwise Regression

Read up to the diagram only:

https://quantifyinghealth.com/stepwise-selection/
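The linked article covers forward stepwise selection. A minimal NumPy-only sketch of the idea (the toy data, stopping threshold, and helper names are illustrative assumptions, not from the article): greedily add the single feature that most reduces the residual sum of squares, and stop when no candidate improves the fit meaningfully.

```python
import numpy as np

# Toy data: y depends only on features 0 and 1; feature 2 is irrelevant.
X = np.array([[1, 2,  1],
              [2, 1, -1],
              [3, 4,  1],
              [4, 3, -1],
              [5, 6,  1],
              [6, 5, -1]], dtype=float)
y = 2 * X[:, 0] + 3 * X[:, 1]

def sse_of(cols):
    """Residual sum of squares of an OLS fit (with intercept) on the given columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.sum((y - A @ beta) ** 2)

selected, current_sse = [], np.sum((y - y.mean()) ** 2)
while len(selected) < X.shape[1]:
    remaining = [c for c in range(X.shape[1]) if c not in selected]
    best = min(remaining, key=lambda c: sse_of(selected + [c]))
    if current_sse - sse_of(selected + [best]) < 1e-8:   # no real improvement: stop
        break
    selected.append(best)
    current_sse = sse_of(selected)

print(selected)   # the two informative features, 0 and 1, in some order
```

Real stepwise procedures use a statistical criterion (p-values, AIC/BIC) rather than a raw SSE threshold, and bidirectional variants can also remove features; this sketch shows only the forward-selection skeleton.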
Regression
A statistical technique based on the average mathematical relationship between two
or more variables is known as regression; it is used to estimate the change in a metric
dependent variable due to a change in one or more independent variables.

It plays an important role in many human activities, since it is a powerful and flexible
tool used to forecast past, present, or future events on the basis of past or present
data. For example, the future profit of a business can be estimated on the basis of
past records.

There are two variables, x and y, in simple linear regression, wherein y depends on x,
or we say y is influenced by x. Here y is called the dependent variable (or criterion),
and x the independent variable (or predictor).
Types of Regression: Univariate vs. Multivariate, Linear vs. Nonlinear, Simple Linear
vs. Multiple Linear
1. Univariate data – This type of data consists of only one variable. The analysis of univariate data is
thus the simplest form of analysis, since the information deals with only one quantity that changes.

It does not deal with causes or relationships, and the main purpose of the analysis is to describe the data
and find patterns that exist within it. An example of univariate data is height.

Suppose that the heights of seven students of a class are recorded (figure 1); there is only one variable,
height, and it does not deal with any cause or relationship.
Types of Regression: Univariate vs. Multivariate, Linear vs. Nonlinear, Simple Linear
vs. Multiple Linear
2. Bivariate data – This type of data involves two different variables.

The analysis of this type of data deals with causes and relationships, and is done to find out the
relationship between the two variables. An example of bivariate data is temperature and ice cream sales
in the summer season.

Suppose temperature and ice cream sales are the two variables of a bivariate data set (figure 2). The
relationship is visible from the table: temperature and sales are directly proportional to each other, and
thus related, because as the temperature increases, the sales also increase. Bivariate data analysis thus
involves comparisons, relationships, causes and explanations.
Types of Regression: Univariate vs. Multivariate, Linear vs. Nonlinear, Simple Linear
vs. Multiple Linear

3. Multivariate data – When the data involves three or more variables, it is categorized
as multivariate. For example, suppose an advertiser wants to compare the popularity of
four advertisements on a website; the click rates could be measured for both men and
women, and relationships between variables could then be examined. It is similar to
bivariate analysis but contains more than one dependent variable. How this data is
analyzed depends on the goals to be achieved. Some of the techniques are regression
analysis, path analysis, factor analysis, and multivariate analysis of variance (MANOVA).
Click Here: Regularization (Ridge and Lasso)

Example
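For the linked regularization topic, ridge regression has a simple closed form worth sketching: it adds a penalty λ‖β‖² to least squares, giving β̂ = (XᵀX + λI)⁻¹Xᵀy, which shrinks the coefficients toward zero as λ grows (the data below is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(20, 3))               # 20 samples, 3 centered features
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=20)

def ridge(X, y, lam):
    """Closed-form ridge solution: (X^T X + lam * I)^(-1) X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

b_small = ridge(X, y, lam=0.01)   # almost ordinary least squares
b_large = ridge(X, y, lam=100.0)  # heavy shrinkage toward zero

print(np.linalg.norm(b_small), np.linalg.norm(b_large))
```

Lasso replaces the squared penalty with λ‖β‖₁, which has no closed form but can drive coefficients exactly to zero, performing feature selection.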
