RRB - Unit 2 Regression
Regression
Syllabus- AInDS
● Introduction- Regression, Need of Regression,
● Difference between Regression and Correlation,
● Types of Regression: Univariate vs. Multivariate, Linear vs. Nonlinear, Simple Linear vs.
Multiple Linear,
● Bias-Variance tradeoff, Overfitting and Underfitting.
● Regression Techniques - Polynomial Regression, Stepwise Regression, Decision Tree
Regression, Random Forest Regression, Support Vector Regression, Ridge Regression, Lasso
Regression, Elastic Net Regression, Bayesian Linear Regression.
● Evaluation Metrics: Mean Squared Error (MSE), Mean Absolute Error (MAE), Root Mean
Squared Error (RMSE), R-squared, Adjusted R-squared.
Syllabus- Computer
● Bias, Variance,
● Generalization, Underfitting, Overfitting,
● Linear regression,
● Regression: Lasso regression, Ridge regression
● Gradient descent algorithm
● Evaluation Metrics: MAE, RMSE, R2
Errors in Machine Learning
Important Link
https://www.javatpoint.com/bias-and-variance-in-machine-learning
● Irreducible errors are errors that will always be present in a
machine learning model because of unknown variables (noise in the
data); no choice of model can reduce them.
● Reducible errors are errors that can be
reduced further to improve a model. They arise
because the model's output function does not match the
desired output function, so they can be lowered by optimizing the model.
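The distinction between the two kinds of error can be illustrated with a small simulation. This is only a sketch under stated assumptions: the "true" relationship y = 2x, the noise level, and all variable names are hypothetical and not taken from the slides.

```python
import numpy as np

# Hypothetical setup: y depends on x through y = 2x, plus random noise
# that no model can predict (the irreducible part).
rng = np.random.default_rng(0)
n = 10_000
noise_sd = 1.0

x = rng.uniform(0.0, 10.0, size=n)
noise = rng.normal(0.0, noise_sd, size=n)
y = 2.0 * x + noise


def mse(y_true, y_pred):
    """Mean squared error between actual and predicted values."""
    return float(np.mean((y_true - y_pred) ** 2))


# Even the true function cannot do better than the noise variance.
mse_true_function = mse(y, 2.0 * x)

# A poor model (always predict the mean of y) has a much larger error;
# the gap between the two is the reducible part.
mse_constant_model = mse(y, np.full(n, y.mean()))

print("MSE of the true function (irreducible):", mse_true_function)
print("MSE of a constant model:", mse_constant_model)
```

The first number stays close to the noise variance (here 1.0) however good the model is, while the second can be lowered by choosing a better model.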
Bias
● Bias is the difference between the actual values and the values
predicted by our model.
Solved Example
Google Colab
How good is the model's prediction power?
Table columns: Bahubali1 (X), Bahubali2 (Y), predicted Y, (Yi - Ypredicted) squared [SSE], (Ypredicted - Avg Y) squared [SSR], (Yi - Avg Y) squared [SST]
Totals: SSE = 9.1, SSR = 0.9, SST = 10
SST = SSR + SSE
10 = 0.9 + 9.1
r^2 = SSR / SST = 0.9 / 10 = 0.09
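The same sums of squares can be computed in a few lines. The sketch below uses a small hypothetical data set (the original table values are not reproduced in these notes) and an ordinary least-squares line fitted with `numpy.polyfit`; the identity SST = SSR + SSE holds for such a fit.

```python
import numpy as np

# Hypothetical data, NOT the values from the solved example.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Fit a least-squares straight line y = slope * x + intercept.
slope, intercept = np.polyfit(x, y, deg=1)
y_pred = slope * x + intercept
y_mean = y.mean()

sse = float(np.sum((y - y_pred) ** 2))       # error (residual) sum of squares
ssr = float(np.sum((y_pred - y_mean) ** 2))  # regression sum of squares
sst = float(np.sum((y - y_mean) ** 2))       # total sum of squares

r_squared = ssr / sst

print("SSE:", sse, "SSR:", ssr, "SST:", sst)
print("R-squared:", r_squared)
```

An R-squared close to 1 means the line explains most of the variation in Y; a value such as 0.09 means it explains very little.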
Correlation
The term correlation combines two words: 'Co' (together) and 'relation', the relationship
between two quantities. Two variables are correlated when, during a study of the two, a change in
one variable is accompanied by a corresponding change in the other, either in the same
direction (direct) or in the opposite direction (inverse).
Correlation can be either negative or positive.
If the two variables move in the same direction, i.e. an increase in one variable is accompanied by a
corresponding increase in the other, and vice versa, then the variables are considered to be
positively correlated. For example, investment and profit.
On the contrary, if the two variables move in opposite directions, so that an increase in one
variable is accompanied by a decline in the other and vice versa, the variables are negatively
correlated. For example, product price and demand.
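Positive and negative correlation can be checked numerically with the Pearson correlation coefficient. The figures below are made up for illustration (they are assumptions, not data from the slides); `numpy.corrcoef` returns the correlation matrix, and the off-diagonal entry is the coefficient between the two variables.

```python
import numpy as np

# Hypothetical figures chosen only to illustrate the two cases.
investment = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
profit = np.array([2.0, 5.0, 6.0, 9.0, 11.0])

price = np.array([5.0, 6.0, 7.0, 8.0, 9.0])
demand = np.array([100.0, 90.0, 75.0, 60.0, 50.0])

# Entry [0, 1] of the 2x2 correlation matrix is the Pearson coefficient.
r_investment_profit = float(np.corrcoef(investment, profit)[0, 1])
r_price_demand = float(np.corrcoef(price, demand)[0, 1])

print("investment vs profit:", r_investment_profit)  # positive
print("price vs demand:", r_price_demand)            # negative
```

A coefficient near +1 indicates a strong positive correlation, and one near -1 a strong negative correlation.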
Polynomial Regression
1. Transform the non-linear data into linear data, so that linear regression can handle it
2. Use polynomial regression
Let's see the example
x y
1 1
2 4
3 9
4 15
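The four points above can be fitted with both a straight line and a degree-2 polynomial to compare the errors. This is a minimal sketch using `numpy.polyfit`; the helper name `sum_squared_error` is introduced here for illustration. The last part shows that polynomial regression is still linear in its coefficients: adding x squared as an extra column and solving an ordinary least-squares problem gives the same fit.

```python
import numpy as np

# The x, y values from the example table above.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 4.0, 9.0, 15.0])


def sum_squared_error(y_true, y_pred):
    """Sum of squared differences between actual and predicted values."""
    return float(np.sum((y_true - y_pred) ** 2))


# Straight-line fit: y = a*x + b
linear_coef = np.polyfit(x, y, deg=1)
sse_linear = sum_squared_error(y, np.polyval(linear_coef, x))

# Degree-2 polynomial fit: y = a*x^2 + b*x + c
quad_coef = np.polyfit(x, y, deg=2)
sse_quadratic = sum_squared_error(y, np.polyval(quad_coef, x))

# Same quadratic fit written as linear regression on the features 1, x, x^2.
features = np.column_stack([np.ones_like(x), x, x ** 2])
feature_coef, *_ = np.linalg.lstsq(features, y, rcond=None)
sse_features = sum_squared_error(y, features @ feature_coef)

print("SSE, straight line:", sse_linear)
print("SSE, degree-2 polynomial:", sse_quadratic)
print("SSE, linear regression on [1, x, x^2]:", sse_features)
```

The straight line leaves a noticeably larger error than the degree-2 polynomial, which follows the curved pattern in the data.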
https://quantifyinghealth.com/stepwise-selection/
Regression
Regression is a statistical technique based on the average mathematical relationship between two
or more variables. It is used to estimate the change in a metric
dependent variable caused by a change in one or more independent variables.
It plays an important role in many human activities because it is a powerful and flexible
tool for estimating past, present, or future events on the basis of past or present
events. For example, the future profit of a business can be estimated on the basis of
past records.
In simple linear regression there are two variables, x and y, where y depends on
(is influenced by) x. Here y is called the dependent or criterion variable,
and x is called the independent or predictor variable.
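Simple linear regression can be sketched with the closed-form least-squares formulas for the slope and intercept. The profit figures below are hypothetical and only illustrate the "estimate future profit from past records" example; they are not from the slides.

```python
import numpy as np

# Hypothetical past records: x = year number, y = profit.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # independent (predictor) variable
y = np.array([12.0, 15.0, 19.0, 22.0, 27.0])  # dependent (criterion) variable

x_mean = x.mean()
y_mean = y.mean()

# Least-squares estimates for the line y = intercept + slope * x.
slope = float(np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2))
intercept = float(y_mean - slope * x_mean)

# Estimate the profit for the next year (x = 6).
forecast_year_6 = intercept + slope * 6.0

print("slope:", slope)
print("intercept:", intercept)
print("forecast for year 6:", forecast_year_6)
```

The slope is the estimated change in y for a one-unit change in x, which is exactly the quantity regression is meant to estimate.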
Types of Regression: Univariate vs. Multivariate, Linear vs. Nonlinear, Simple Linear
vs. Multiple Linear,
1. Univariate data: This type of data consists of only one variable. The analysis of univariate data is
thus the simplest form of analysis, since the information deals with only one quantity that changes.
It does not deal with causes or relationships; the main purpose of the analysis is to describe the data
and find patterns that exist within it. An example of univariate data is height.
Suppose the heights of seven students in a class are recorded (figure 1). There is only one variable,
height, and the data does not deal with any cause or relationship.
2. Bivariate data: This type of data involves two different variables.
The analysis of this type of data deals with causes and relationships, and it is done to find the
relationship between the two variables. An example of bivariate data is temperature and ice cream sales in the
summer season.
Suppose temperature and ice cream sales are the two variables of a bivariate data set (figure 2). The
relationship is visible from the table: as the temperature increases, the sales also increase, so the two
variables are positively related. Thus bivariate data analysis involves
comparisons, relationships, causes and explanations.
3. Multivariate data: When the data involves three or more variables, it is categorized as multivariate.
For example, suppose an advertiser wants to compare the popularity of
four advertisements on a website. Their click rates could be measured for both men
and women, and the relationships between the variables can then be examined. It is similar to
bivariate data but contains more than one dependent variable. The way to analyze
this data depends on the goals to be achieved. Some of the techniques are regression
analysis, path analysis, factor analysis and multivariate analysis of variance (MANOVA).
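As one illustration of regression analysis on data with more than two variables, the sketch below fits a multiple linear regression with two independent variables and one dependent variable using `numpy.linalg.lstsq`. The data is hypothetical and was constructed so that y = 1 + 2*x1 + 3*x2 exactly; the variable names are assumptions made for this example.

```python
import numpy as np

# Hypothetical data with two independent variables and one dependent variable.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
y = 1.0 + 2.0 * x1 + 3.0 * x2  # constructed relationship, no noise

# Design matrix: a column of ones for the intercept, then one column per variable.
design = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares solution of design @ coefficients = y.
coefficients, *_ = np.linalg.lstsq(design, y, rcond=None)
intercept, coef_x1, coef_x2 = coefficients

print("intercept:", intercept)
print("coefficient of x1:", coef_x1)
print("coefficient of x2:", coef_x2)
```

Because the data was built without noise, the fit recovers the intercept 1 and the coefficients 2 and 3; with real data the estimates would only approximate the underlying relationship.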
Regularization (Ridge and Lasso)
Example