Rohit Unit 2 ML Notes

Uploaded by

Abhishek Sharma

UNIT 2 CHAPTER 1: LINEAR REGRESSION

1. What is simple linear regression? Write the steps to build a regression model.
Explain how to build a simple linear regression model.
➢ Simple linear regression is a statistical technique used for finding the existence of an
association relationship between a dependent variable (aka response variable or outcome
variable) and an independent variable (aka explanatory variable, predictor variable or
feature). We can only establish that a change in the value of the outcome variable (Y) is
associated with a change in the value of the feature X; that is, regression cannot be used
for establishing a causal relationship between two variables. Regression is one of the most
popular supervised learning algorithms in predictive analytics. A regression model requires
knowledge of both the outcome and the feature variables in the training dataset. The
following are a few examples of simple and multiple linear regression problems:
a. A hospital may be interested in finding how the total cost of a patient for a treatment
varies with the body weight of the patient.
b. Insurance companies would like to understand the association between healthcare
costs and ageing.
c. An organization may be interested in finding the relationship between revenue
generated from a product and features such as the price, money spent on promotion,
competitors’ price, and promotion expenses.
d. Restaurants would like to know the relationship between the customer waiting time
after placing the order and the revenue.

➢ STEPS IN BUILDING A REGRESSION MODEL:


Building a regression model is an iterative process and several iterations may be required
before finalizing the appropriate model.
STEP 1: Collect/Extract Data
The first step in building a regression model is to collect or extract data on the dependent
(outcome) variable and independent (feature) variables from different data sources.
STEP 2: Pre-Process the Data
Before the model is built, it is essential to check the quality of the data for issues such as
reliability, completeness, usefulness, accuracy, missing data, and outliers.
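A minimal sketch of two such checks, missing data and outliers, assuming pandas is available. The column names and values are made up for illustration, and dropping rows and the 1.5 * IQR rule are only one possible choice of remedy.

```python
import numpy as np
import pandas as pd

# Hypothetical data: body weight (kg) and treatment cost; values are made up.
df = pd.DataFrame({
    "body_weight": [62.0, 71.5, np.nan, 80.2, 55.0, 68.3, 250.0],
    "total_cost": [4100.0, 4800.0, 4500.0, 5300.0, 3900.0, np.nan, 5000.0],
})

# Completeness check: count missing values per column.
missing_counts = df.isna().sum()

# One simple remedy (an assumption, not the only option): drop incomplete rows.
clean = df.dropna()

# Flag outliers in body_weight with the 1.5 * IQR rule.
q1, q3 = clean["body_weight"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (clean["body_weight"] < q1 - 1.5 * iqr) | (
    clean["body_weight"] > q3 + 1.5 * iqr
)

print(missing_counts)
print(clean[is_outlier])
```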
STEP 3: Dividing Data into Training and Validation Datasets
In this stage the data is divided into two subsets (sometimes more than two): a training
dataset and a validation or test dataset. The training dataset is usually between
70% and 80% of the data, and the remaining data is treated as the validation data. The subsets
may be created using a random or stratified sampling procedure. This step is important because
it lets us measure the performance of the model on data not used in model building, which is
essential for detecting overfitting. In many cases, multiple training and
test datasets are used (called cross-validation).
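Cross-validation can be sketched as follows, assuming scikit-learn is available. The synthetic data, the choice of 5 folds, and the seed values are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data (assumption): y depends linearly on one feature plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 + 2.0 * X[:, 0] + rng.normal(0, 1.0, size=100)

# 5-fold cross-validation: each fold serves once as the validation set.
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LinearRegression(), X, y, cv=kfold, scoring="r2")

print(scores)         # one R-squared value per fold
print(scores.mean())  # average validation R-squared
```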
STEP 4: Perform Descriptive Analytics or Data Exploration
It is always good practice to perform descriptive analytics before building a
predictive analytics model. Descriptive statistics help us understand the variability in
the data, and visualizing the data through, say, a box plot shows whether there are
any outliers. Another visualization technique, the scatter plot, may reveal whether
there is any obvious relationship between the two variables under consideration. A scatter plot
is useful for describing the functional relationship between the dependent or outcome variable
and the features.
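A short sketch of this exploration step, assuming pandas and a synthetic dataset whose column names are hypothetical. It prints summary statistics and the correlation between the feature and the outcome.

```python
import numpy as np
import pandas as pd

# Synthetic data (assumption): cost rises roughly linearly with body weight.
rng = np.random.default_rng(1)
weight = rng.normal(70, 10, size=200)
cost = 1000 + 50 * weight + rng.normal(0, 200, size=200)
df = pd.DataFrame({"body_weight": weight, "total_cost": cost})

# Summary statistics: count, mean, std, min, quartiles, max for each column.
summary = df.describe()

# Pearson correlation as a numeric companion to the scatter plot.
corr = df["body_weight"].corr(df["total_cost"])

print(summary)
print(corr)
```

With matplotlib installed, `df.boxplot(column="body_weight")` and `df.plot.scatter(x="body_weight", y="total_cost")` would draw the box plot and scatter plot described above.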
STEP 5: Build the Model
The model is built using the training dataset to estimate the regression parameters. The
method of Ordinary Least Squares (OLS) is used to estimate the regression
parameters.
STEP 6: Perform Model Diagnostics
Regression is often misused because the modeler fails to perform the necessary
diagnostic tests before applying the model. Before it can be applied, the
model must be validated against all model assumptions, including the specification of the
functional form. If the model assumptions are violated, the modeler must use remedial measures.
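A minimal sketch of a residual-based check, assuming NumPy and synthetic data. Comparing the residual spread across two halves of the feature range is only a crude stand-in for formal diagnostic tests.

```python
import numpy as np

# Synthetic data (assumption): a linear relationship with constant noise.
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=200)
y = 5.0 + 1.5 * x + rng.normal(0, 1.0, size=200)

# Fit a straight line by least squares and compute the residuals.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# With an intercept in the model, OLS residuals average to zero.
print(residuals.mean())

# Crude constant-variance check: residual spread for low vs. high x.
low_spread = residuals[x < np.median(x)].std()
high_spread = residuals[x >= np.median(x)].std()
print(low_spread, high_spread)
```

A large gap between the two spreads, or a visible pattern in the residuals, would suggest that an assumption is violated and a remedial measure is needed.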
STEP 7: Validate the Model and Measure Model Accuracy
A major concern in analytics is over-fitting, that is, the model may perform very well on the
training dataset but badly on the validation dataset. It is important to ensure that the
model performance on the validation dataset is consistent with its performance on the training
dataset. In fact, the model may be cross-validated using multiple training and test datasets.
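One way to compare training and validation performance is sketched below, assuming scikit-learn. The synthetic data, the 80/20 split, and the choice of R-squared and RMSE as metrics are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): one feature with a linear effect plus noise.
rng = np.random.default_rng(3)
X = rng.uniform(0, 10, size=(200, 1))
y = 4.0 + 2.5 * X[:, 0] + rng.normal(0, 1.0, size=200)

train_X, test_X, train_y, test_y = train_test_split(
    X, y, train_size=0.8, random_state=42
)
model = LinearRegression().fit(train_X, train_y)

# Similar scores on both sets suggest the model is not over-fitting.
train_r2 = r2_score(train_y, model.predict(train_X))
test_r2 = r2_score(test_y, model.predict(test_X))
test_rmse = np.sqrt(mean_squared_error(test_y, model.predict(test_X)))

print(train_r2, test_r2, test_rmse)
```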
STEP 8: Decide on Model Deployment
The final step in building the regression model is to develop a deployment strategy in the form
of actionable items and business rules that can be used by the organization.
➢ BUILDING SIMPLE LINEAR REGRESSION MODEL
Simple Linear Regression (SLR) is a statistical model in which there is only one
independent variable (or feature) and the functional relationship between the outcome
variable and the regression coefficients is linear. Linear regression implies that the
mathematical function is linear with respect to the regression parameters. One of the
functional forms of SLR is as follows:

Y = β0 + β1 X + ε

Here β0 (intercept) and β1 (slope) are the regression parameters and ε is the error term.
For a dataset with n observations (Xi, Yi), where i = 1, 2, …, n, the above functional form
can be written as follows:

Yi = β0 + β1 Xi + εi
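As a worked complement, and assuming the usual notation with intercept β0, slope β1 and error εi (so that Yi = β0 + β1 Xi + εi), OLS chooses the estimates that minimize the sum of squared errors. Setting the partial derivatives of SSE to zero gives the standard closed-form estimators, where the bars denote sample means:

```latex
\begin{aligned}
\mathrm{SSE}(\beta_0, \beta_1) &= \sum_{i=1}^{n} \left( Y_i - \beta_0 - \beta_1 X_i \right)^2 \\
\hat{\beta}_1 &= \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \\
\hat{\beta}_0 &= \bar{Y} - \hat{\beta}_1 \bar{X}
\end{aligned}
```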
----------------------------------------------------------------------------------------------------------------
2. How to split the dataset into training and validation sets?
➢ The train_test_split() function from the sklearn.model_selection module provides the ability
to split the dataset randomly into training and validation datasets. The parameter train_size
takes a fraction between 0 and 1 specifying the training set size. The remaining samples in the
original set form the test or validation set. The records selected for the training and test
sets are randomly sampled. The method takes a seed value in a parameter named random_state to
fix which samples go to the training set and which go to the test set. The train_test_split()
method returns four variables as below:
1. train_X contains X features of the training set.
2. train_y contains the values of response variable for the training set.
3. test_X contains X features of the test set.
4. test_y contains the values of response variable for the test set.
Example:
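A minimal sketch of such a split, assuming a small made-up array dataset; the values and the seed 100 are illustrative. Note that train_test_split() returns the pieces in the order train_X, test_X, train_y, test_y.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 10 samples with 2 features each, and one response per sample.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# 80% of the rows go to training; random_state fixes which rows are chosen.
train_X, test_X, train_y, test_y = train_test_split(
    X, y, train_size=0.8, random_state=100
)

print(train_X.shape, test_X.shape)
print(train_y.shape, test_y.shape)
```

Re-running with the same random_state reproduces the same split, which makes results comparable across runs.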
3. Explain the following:

➢ Linear Regression
Linear regression is a fundamental statistical method used for modeling the relationship
between a dependent variable and one or more independent variables. It assumes a linear
relationship between the variables and is widely used for predictive analysis.
Reducing Features with Lasso Regression
Lasso regression adds an L1 penalty on the coefficients. The penalty can shrink some
coefficients exactly to zero, which excludes those features from the model, so lasso can be
used for feature selection.
➢ Feature Selection:
o Description: Lasso regression automatically selects a subset of the most
important features.
o Example: Using lasso regression to identify the most important factors
influencing house prices by shrinking the coefficients of less significant features to zero.
➢ Interpretation:
o Description: The coefficients that remain non-zero are interpreted as the most
relevant predictors.
o Example: Interpreting non-zero coefficients in a lasso regression model to
determine key drivers of sales performance.
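Feature selection with lasso can be sketched as follows, assuming scikit-learn. The synthetic data (five features, of which only the first two affect the response) and the penalty strength alpha=0.2 are illustrative assumptions; in practice alpha is usually tuned.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): only features 0 and 1 drive the response.
rng = np.random.default_rng(4)
X = rng.normal(size=(300, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, size=300)

# Standardize so the L1 penalty treats all features on the same scale.
X_scaled = StandardScaler().fit_transform(X)

# Fit lasso; alpha controls how strongly coefficients are shrunk.
lasso = Lasso(alpha=0.2).fit(X_scaled, y)

# Indices of features whose coefficients stayed non-zero.
selected = np.flatnonzero(lasso.coef_)

print(lasso.coef_)
print(selected)
```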
By understanding these concepts and techniques, you can effectively apply linear regression
to various predictive modeling tasks and improve your model's performance and
interpretability.
Examples:
Simple Linear Regression:
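A minimal sketch with one feature, assuming scikit-learn's LinearRegression. The five data points are made up and lie exactly on the line y = 1 + 2x.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data lying exactly on y = 1 + 2x.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])  # one feature per row
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

slr = LinearRegression().fit(X, y)

# Estimated slope and intercept of the fitted line.
slr_slope = slr.coef_[0]
slr_intercept = slr.intercept_

# Predict the outcome for a new feature value.
slr_prediction = slr.predict(np.array([[6.0]]))[0]

print(slr_slope, slr_intercept, slr_prediction)
```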

Multiple Linear Regression:
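A minimal sketch with two features, assuming scikit-learn's LinearRegression. The data is made up and generated exactly from y = 1 + 2*x1 + 3*x2.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

mlr = LinearRegression().fit(X, y)

# One coefficient per feature, plus the intercept.
mlr_coefficients = mlr.coef_
mlr_intercept = mlr.intercept_

print(mlr_coefficients, mlr_intercept)
```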


Fitting a Line
Ordinary Least Squares Example:
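A minimal sketch of fitting a line by OLS with NumPy only, using the standard closed-form estimators for one feature (slope = sum of cross-deviations divided by sum of squared deviations of x; intercept = mean of y minus slope times mean of x). The five data points are made up.

```python
import numpy as np

# Made-up data that roughly follows a straight line.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.2, 4.1, 6.2, 7.9, 10.1])

x_mean, y_mean = x.mean(), y.mean()

# Slope: covariance-like sum over variance-like sum.
beta1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)

# Intercept: the fitted line passes through the point of means.
beta0 = y_mean - beta1 * x_mean

# Sum of squared errors of the fitted line.
sse = np.sum((y - (beta0 + beta1 * x)) ** 2)

print(beta1, beta0, sse)
```

The same estimates can be cross-checked with `np.polyfit(x, y, 1)`, which returns the slope first and the intercept second.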
