Rohit Unit 2 ML Notes

Uploaded by

Abhishek Sharma

UNIT 2 CHAPTER 1: LINEAR REGRESSION

1. What is simple linear regression? Write the steps to build a regression model.
Explain how to build a simple linear regression model.
Simple linear regression is a statistical technique used for finding the existence of an
association relationship between a dependent variable (aka response variable or outcome
variable) and an independent variable (aka explanatory variable, predictor variable or
feature). We can only establish that a change in the value of the outcome variable (Y) is
associated with a change in the value of the feature X; that is, regression cannot be used
to establish a causal relationship between two variables. Regression is one of the most
popular supervised learning algorithms in predictive analytics. A regression model requires
knowledge of both the outcome and the feature variables in the training dataset. The
following are a few examples of simple and multiple linear regression problems:
a. A hospital may be interested in finding how the total cost of a patient for a treatment
varies with the body weight of the patient.
b. Insurance companies would like to understand the association between healthcare
costs and ageing.
c. An organization may be interested in finding the relationship between revenue
generated from a product and features such as the price, money spent on promotion,
competitors’ price, and promotion expenses.
d. Restaurants would like to know the relationship between the customer waiting time
after placing the order and the revenue.

STEPS IN BUILDING A REGRESSION MODEL:

Building a regression model is an iterative process and several iterations may be required
before finalizing the appropriate model.
STEP 1: Collect/Extract Data
The first step in building a regression model is to collect or extract data on the dependent
(outcome) variable and independent (feature) variables from different data sources.
STEP 2: Pre-Process the Data
Before the model is built, it is essential to check the data for quality issues such as
reliability, completeness, usefulness, accuracy, missing data, and outliers.
STEP 3: Dividing Data into Training and Validation Datasets
In this stage the data is divided into two subsets (sometimes more than two): a training
dataset and a validation (or test) dataset. The training dataset is usually between
70% and 80% of the data, and the remaining data is treated as the validation data. The subsets
may be created using a random or stratified sampling procedure. This is an important step to
measure the performance of the model on data not used in model building. It is also
essential for checking whether the model overfits. In many cases, multiple training and
test datasets are used (called cross-validation).
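Cross-validation can be sketched with scikit-learn's `KFold` and `cross_val_score`. This is a minimal illustration assuming scikit-learn and NumPy are installed; the synthetic data, the choice of 5 folds and R² as the score are assumptions for the example, not part of these notes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Hypothetical data: one feature, a linear trend plus random noise
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 + 2.0 * X[:, 0] + rng.normal(0, 1.0, size=100)

# 5 folds: each record is used once for validation and four times for training
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LinearRegression(), X, y, cv=kfold, scoring="r2")

print("R^2 per fold:", np.round(scores, 3))
print("Mean R^2:", round(scores.mean(), 3))
```

Similar scores across folds suggest the model's performance does not depend heavily on which records were used for training.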
STEP 4: Perform Descriptive Analytics or Data Exploration
It is always good practice to perform descriptive analytics before building a
predictive analytics model. Descriptive statistics help us understand the variability in
the data, and visualizations such as a box plot show whether there are any outliers in
the data. Another visualization technique, the scatter plot, may also reveal whether
there is any obvious relationship between the two variables under consideration. A scatter plot
is useful to describe the functional relationship between the dependent (outcome) variable
and the features.
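A small pandas sketch of this exploration step. The data frame is hypothetical (loosely modelled on the hospital cost example), and the 1.5 × IQR cut-off is the conventional box-plot whisker rule, used here as an assumption rather than something the notes prescribe.

```python
import pandas as pd

# Hypothetical dataset: patient body weight (kg) and treatment cost
df = pd.DataFrame({
    "weight": [55, 60, 62, 65, 68, 70, 72, 75, 78, 140],
    "cost":   [20, 22, 23, 25, 26, 27, 29, 30, 32, 60],
})

# Summary statistics (count, mean, std, min, quartiles, max) for each column
print(df.describe())

# Box-plot rule: values more than 1.5 * IQR beyond the quartiles are flagged as outliers
q1 = df["weight"].quantile(0.25)
q3 = df["weight"].quantile(0.75)
iqr = q3 - q1
outliers = df[(df["weight"] < q1 - 1.5 * iqr) | (df["weight"] > q3 + 1.5 * iqr)]
print(outliers)

# Pearson correlation: a numeric companion to the scatter plot
r = df["weight"].corr(df["cost"])
print(round(r, 3))
```

Here only the 140 kg record falls outside the fences, so it would be examined before modelling.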
STEP 5: Build the Model
The model is built using the training dataset to estimate the regression parameters. The
method of Ordinary Least Squares (OLS) is used to estimate the regression
parameters.
STEP 6: Perform Model Diagnostics
Regression is often misused because the modeler fails to perform the necessary
diagnostic tests before applying the model. Before it can be applied, the model must be
validated against all model assumptions, including the choice of functional
form. If the model assumptions are violated, the modeler must use remedial measures.
STEP 7: Validate the Model and Measure Model Accuracy
A major concern in analytics is overfitting; that is, the model may perform very well on the
training dataset but badly on the validation dataset. It is important to ensure that the
model performance on the validation dataset is consistent with that on the training dataset. In fact,
the model may be cross-validated using multiple training and test datasets.
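One way to check for overfitting is to compute the same accuracy measures on the training and validation sets. A sketch assuming scikit-learn; R² and RMSE are example measures, and the data are synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical data from a known line plus noise
rng = np.random.default_rng(7)
X = rng.uniform(0, 10, size=(200, 1))
y = 4.0 + 2.5 * X[:, 0] + rng.normal(0, 1.5, size=200)

# Hold out 20% of the records as the validation set
train_X, test_X, train_y, test_y = train_test_split(X, y, train_size=0.8, random_state=42)
model = LinearRegression().fit(train_X, train_y)

# Same accuracy measures on the training data and on unseen validation data
results = {}
for name, feats, target in [("train", train_X, train_y), ("validation", test_X, test_y)]:
    pred = model.predict(feats)
    r2 = r2_score(target, pred)
    rmse = np.sqrt(mean_squared_error(target, pred))
    results[name] = (r2, rmse)
    print(f"{name}: R^2 = {r2:.3f}, RMSE = {rmse:.3f}")
```

Similar values on both sets suggest the model is not overfitting; a much lower validation R² (or higher RMSE) would be a warning sign.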
STEP 8: Decide on Model Deployment
The final step in building a regression model is to develop a deployment strategy in the form of
actionable items and business rules that can be used by the organization.
BUILDING SIMPLE LINEAR REGRESSION MODEL
Simple Linear Regression (SLR) is a statistical model in which there is only one
independent variable (or feature) and the functional relationship between the outcome
variable and the regression coefficient is linear. Linear regression implies that the
mathematical function is linear with respect to the regression parameters. One of the
functional forms of SLR is as follows:

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

where $\beta_0$ is the intercept, $\beta_1$ is the slope (regression coefficient) and $\varepsilon$ is the random error term.

For a dataset with n observations $(X_i, Y_i)$, where i = 1, 2, …, n, the above functional form
can be written as follows:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
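As a worked derivation of the standard OLS result for the model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$: OLS chooses the estimates that minimise the sum of squared errors (SSE), and setting the partial derivatives of SSE with respect to $\beta_0$ and $\beta_1$ to zero gives the closed-form estimates below, where $\bar{X}$ and $\bar{Y}$ are the sample means.

```latex
Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad i = 1, 2, \ldots, n

\mathrm{SSE}(\beta_0, \beta_1) = \sum_{i=1}^{n} \varepsilon_i^2
  = \sum_{i=1}^{n} \left( Y_i - \beta_0 - \beta_1 X_i \right)^2

\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}
```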
----------------------------------------------------------------------------------------------------------------
2. How to split the dataset into training and validation sets?
The train_test_split() function from the sklearn.model_selection module splits the dataset
randomly into training and validation datasets. The parameter train_size takes
a fraction between 0 and 1 that specifies the training set size; the remaining samples in the
original set form the test (validation) set. The records selected for the training and test sets
are randomly sampled. The method takes a seed value in the parameter random_state, which
fixes which samples go to the training set and which go to the test set. When called with the
features X and the response y, train_test_split() returns four variables in the following order:
1. train_X contains X features of the training set.
2. test_X contains X features of the test set.
3. train_y contains the values of the response variable for the training set.
4. test_y contains the values of the response variable for the test set.
Example:
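A minimal sketch, assuming scikit-learn and NumPy and a small hypothetical dataset. Note the unpacking order: training X, test X, training y, test y.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical dataset: 10 records, 2 features and 1 response variable
X = np.arange(20).reshape(10, 2)
y = np.arange(10) * 1.5

# 80% training, 20% validation; random_state fixes which rows go to each set
train_X, test_X, train_y, test_y = train_test_split(X, y, train_size=0.8, random_state=42)

print(train_X.shape, test_X.shape)   # (8, 2) (2, 2)
```

Rows stay aligned: each record in train_X keeps its own response value in train_y.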
3. Explain the following:

Linear Regression
Linear regression is a fundamental statistical method used for modeling the relationship
between a dependent variable and one or more independent variables. It assumes a linear
relationship between the variables and is widely used for predictive analysis.
Reducing Features with Lasso Regression
Lasso regression can be used for feature selection by shrinking some coefficients to zero,
effectively excluding them from the model.
• Feature Selection:
o Description: Lasso regression automatically selects a subset of the most
important features.
o Example: Using lasso regression to identify the most important factors
influencing house prices by shrinking the coefficients of less significant features to zero.
• Interpretation:
o Description: The features whose coefficients remain non-zero are interpreted as the most
relevant predictors.
o Example: Interpreting the non-zero coefficients in a lasso regression model to
determine the key drivers of sales performance.
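A minimal sketch of lasso-based feature selection with scikit-learn's `Lasso`. The synthetic data (five features, only the first two of which influence the response) and the penalty `alpha=0.5` are illustrative assumptions; in practice `alpha` is usually tuned, for example with cross-validation.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 5 features, but only features 0 and 1 drive the response
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
y = 4.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(0, 0.5, size=200)

# Standardise so the L1 penalty treats all features on the same scale
X_scaled = StandardScaler().fit_transform(X)

# Larger alpha means a stronger penalty and more coefficients set exactly to zero
lasso = Lasso(alpha=0.5).fit(X_scaled, y)

selected = np.flatnonzero(lasso.coef_)   # indices of non-zero coefficients
print("Coefficients:", np.round(lasso.coef_, 3))
print("Selected feature indices:", selected)
```

The surviving coefficients are also shrunk toward zero, so they are smaller in magnitude than the true values 4 and -3.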
By understanding these concepts and techniques, you can effectively apply linear regression
to various predictive modeling tasks and improve your model's performance and
interpretability.
Examples:
Simple Linear Regression:
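A minimal sketch with scikit-learn's `LinearRegression`, assuming hypothetical weight and cost values that lie exactly on the line cost = 0.5 × weight - 5 (units are illustrative).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: patient body weight (kg) vs. treatment cost
weight = np.array([50, 55, 60, 65, 70, 75, 80]).reshape(-1, 1)  # one feature as a 2-D column
cost = np.array([20.0, 22.5, 25.0, 27.5, 30.0, 32.5, 35.0])

model = LinearRegression().fit(weight, cost)
print("Intercept (beta_0):", model.intercept_)
print("Slope (beta_1):", model.coef_[0])
print("Predicted cost at 85 kg:", model.predict([[85]])[0])
```

Because the points are exactly collinear, the fit recovers an intercept of -5 and a slope of 0.5 (up to floating-point rounding), and predicts 37.5 at 85 kg.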

Multiple Linear Regression:
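A sketch of multiple linear regression with two features, assuming scikit-learn; the revenue, price and promotion variables are hypothetical and simulated from a known equation so the estimates can be checked.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: revenue explained by price and promotion spend
rng = np.random.default_rng(5)
price = rng.uniform(10, 50, size=100)
promotion = rng.uniform(0, 20, size=100)
revenue = 200 - 3.0 * price + 5.0 * promotion + rng.normal(0, 5.0, size=100)

X = np.column_stack([price, promotion])   # one column per feature
model = LinearRegression().fit(X, revenue)

print("Intercept:", round(model.intercept_, 2))
print("Coefficients [price, promotion]:", np.round(model.coef_, 2))
print("R^2:", round(model.score(X, revenue), 3))
```

Each coefficient estimates the change in revenue for a one-unit change in that feature, holding the other feature fixed.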

Fitting a Line
Ordinary Least Squares Example:
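A worked OLS example with NumPy only: the closed-form slope and intercept formulas, cross-checked against `np.linalg.lstsq` on the design matrix [1, X]. The five data points are hypothetical.

```python
import numpy as np

# Hypothetical observations (X_i, Y_i)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Closed-form OLS estimates for simple linear regression
x_bar, y_bar = X.mean(), Y.mean()
beta_1 = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)
beta_0 = y_bar - beta_1 * x_bar

# Cross-check with NumPy's least-squares solver on the design matrix [1, X]
A = np.column_stack([np.ones_like(X), X])
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)

print(beta_0, beta_1)
print(coef)
```

For these five points the sum of cross-products is 6 and the sum of squared deviations of X is 10, so both approaches give a slope of 0.6 and an intercept of 4 - 0.6 × 3 = 2.2 (up to floating-point rounding).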
