
ML File 17 March

The document provides an introduction to machine learning, focusing on its definition, types, and applications, particularly in regression analysis. It explains supervised and unsupervised learning, detailing various regression techniques such as linear, polynomial, and logistic regression, along with their use cases and evaluation metrics. Additionally, it includes practical steps for implementing linear regression in Python, including data preparation and model evaluation.

Uploaded by

pranav1256kam


Practical Number- 01

INTRODUCTION TO MACHINE LEARNING

Machine learning (ML) is a field of artificial intelligence (AI) that enables computers to learn
from data and improve their performance without being explicitly programmed. It involves
algorithms that can analyse patterns, make predictions, and learn from experience, allowing
systems to autonomously adapt and improve their accuracy over time.

Types of Machine Learning

Supervised learning is a machine learning approach that uses labeled training data to learn a
mapping from input features to a label. In supervised learning, the correct output for each
training example is known (such as whether a picture shows an apple), and the model is
trained to reproduce these known outputs.

The most common supervised learning algorithms used today include:

• Linear regression
• Polynomial regression
• K-nearest neighbors
• Naive Bayes
• Decision trees

Unsupervised learning is a machine learning approach that uses unlabeled data to learn
patterns. Unlike supervised learning, the “correctness” of the output is not known ahead of
time. Rather, the algorithm learns from the data without human-provided labels (and is thus
unsupervised) and groups the data based on shared attributes. Unsupervised learning is good
at descriptive modeling and pattern matching.

The most common unsupervised learning algorithms used today include:

• Fuzzy C-means clustering
• K-means clustering
• Hierarchical clustering
• Principal component analysis (PCA)
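A minimal sketch of unsupervised learning with K-means, assuming scikit-learn and NumPy are installed. The two point clouds are synthetic data invented for illustration, and `n_clusters=2` is an assumption the user has to supply, because the algorithm receives no labels and must discover the groups on its own.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two loose groups of 2-D points; the algorithm is never told which group a point is in.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
group_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
X = np.vstack([group_a, group_b])

# K-means needs the number of clusters up front (n_clusters).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)  # one cluster id (0 or 1) per point

print("Cluster sizes:", np.bincount(labels))
print("Cluster centres:\n", kmeans.cluster_centers_)
```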

What is Regression?

In machine learning, regression is a supervised learning technique used to predict
continuous numerical values by modeling the relationship between input features and a
target variable, using statistical methods to make predictions.

Independent Variable (Predictor, Feature, Input Variable):

These are the variables that you use to predict the outcome. They are the inputs to the
model.
Dependent Variable (Response, Target, Output Variable):

This is the variable that you are trying to predict. It depends on the independent variables.

How to Use Regression?

1. Identify the Problem

• Determine if your problem involves predicting a continuous variable (e.g., price,
temperature, salary).
• Choose regression when the goal is to model the relationship between input variables and
a continuous target.

2. Collect and Prepare Data

• Gather relevant data with independent and dependent variables.
• Clean the data by handling missing values, removing outliers, and normalizing if necessary.
• Split data into training and testing sets (e.g., 80% train, 20% test).

3. Choose the Right Regression Model

• Linear Regression: When the relationship between variables is linear.
• Polynomial Regression: When the relationship is nonlinear (curved).
• Multiple Linear Regression: When multiple independent variables influence the outcome.
• Logistic Regression: For classification problems (e.g., spam or not spam).
• Ridge/Lasso Regression: For regularization to avoid overfitting.
• Decision Tree/Random Forest Regression: For complex, nonlinear relationships.

4. Train the Model

5. Evaluate the Model

6. Make Predictions.
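The steps above can be sketched end to end with scikit-learn. This is a minimal illustration, not the practical's own code: the salary-versus-experience data are synthetic (generated here as roughly 30,000 + 5,000 × years plus noise), and the 80/20 split follows the example in step 2.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Step 2: synthetic data (an assumption for illustration).
rng = np.random.default_rng(42)
experience = rng.uniform(0, 20, size=200).reshape(-1, 1)  # feature matrix must be 2-D
salary = 30_000 + 5_000 * experience.ravel() + rng.normal(0, 3_000, size=200)

# 80% train, 20% test.
X_train, X_test, y_train, y_test = train_test_split(
    experience, salary, test_size=0.2, random_state=42
)

# Step 4: train the model.
model = LinearRegression()
model.fit(X_train, y_train)

# Step 5: evaluate on the held-out test set.
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Test MSE: {mse:.1f}, R^2: {r2:.3f}")

# Step 6: predict for a new input (10 years of experience).
print("Predicted salary for 10 years:", model.predict([[10]])[0])
```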
Types of Regression:

1. Linear Regression

• Use Case: Predict continuous values (e.g., house prices, salary).

• Equation:
  Y = mX + b
• Example: Predicting salary based on years of experience.
• Simple Linear Regression → One independent variable.
• Multiple Linear Regression → Multiple independent variables.
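As a quick illustration of Y = mX + b, the sketch below fits a straight line with NumPy's `np.polyfit`. The experience/salary numbers are hypothetical, made up for this example.

```python
import numpy as np

# Hypothetical data: years of experience (X) and salary in thousands (Y).
X = np.array([1, 2, 3, 4, 5, 6])
Y = np.array([32.1, 36.8, 42.2, 46.9, 52.3, 56.7])

# A degree-1 polynomial fit returns the least-squares slope m and intercept b.
m, b = np.polyfit(X, Y, deg=1)
print(f"Y = {m:.3f}X + {b:.3f}")

# Use the fitted line to predict the salary for 8 years of experience.
print("Predicted salary (thousands):", m * 8 + b)
```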

2. Polynomial Regression

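• Use Case: When data has a non-linear relationship but the target is still continuous.
• Equation:
  Y = aX² + bX + c
• Example: Predicting population growth or temperature variations.
• Used when the relationship is curved, not straight. Although the curve is nonlinear in X,
the model is still linear in its coefficients a, b and c, so it can be fitted with ordinary least
squares.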
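A minimal sketch of polynomial regression, assuming scikit-learn: `PolynomialFeatures` expands X into the columns [X, X²], and an ordinary `LinearRegression` then estimates a, b and c. The data are synthetic, generated from Y = 2X² − 3X + 1 plus noise purely for illustration; a straight-line fit is included for comparison.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic curved data (an assumption): Y = 2X^2 - 3X + 1 plus noise.
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 60).reshape(-1, 1)
y = 2 * X.ravel() ** 2 - 3 * X.ravel() + 1 + rng.normal(0, 0.5, size=60)

# A straight line for comparison.
linear = LinearRegression().fit(X, y)

# Expand X into [X, X^2], then fit an ordinary linear model on those columns.
quadratic = make_pipeline(PolynomialFeatures(degree=2, include_bias=False), LinearRegression())
quadratic.fit(X, y)

print("Straight-line R^2:", round(linear.score(X, y), 3))
print("Quadratic R^2:   ", round(quadratic.score(X, y), 3))

lr = quadratic.named_steps["linearregression"]
# coef_ follows the column order [X, X^2], so coef_[1] is a and coef_[0] is b.
print(f"a = {lr.coef_[1]:.2f}, b = {lr.coef_[0]:.2f}, c = {lr.intercept_:.2f}")
```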

3. Logistic Regression (for Classification)

• Use Case: Binary or multi-class classification problems (e.g., spam detection, disease
prediction).
• Equation (Sigmoid Function):
  P(Y=1) = 1 / (1 + e^−(b₀ + b₁X))
• Example: Predicting whether an email is spam (Yes/No).
• Types:
· Binary Logistic Regression → Two classes (e.g., pass/fail).
· Multinomial Logistic Regression → More than two classes (e.g., predicting
 type of weather: sunny, rainy, snowy).
· Ordinal Logistic Regression → Ordered categories (e.g., rating: low, medium, high).
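The sigmoid equation can be checked directly against a fitted model. This sketch assumes scikit-learn; the hours-studied/pass-fail data are hypothetical. Note that scikit-learn's `LogisticRegression` applies L2 regularization by default (`C=1.0`), so its b₀ and b₁ are slightly shrunk compared with an unregularized fit, but the probability is still computed with the same sigmoid formula.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical data: hours studied (X) and outcome pass = 1 / fail = 0 (Y).
hours = np.array([0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6]).reshape(-1, 1)
passed = np.array([0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1])

model = LogisticRegression().fit(hours, passed)
b0, b1 = model.intercept_[0], model.coef_[0, 0]

# P(Y=1) computed by hand from the fitted b0 and b1 ...
x_new = 3.0
p_manual = sigmoid(b0 + b1 * x_new)
# ... matches scikit-learn's own probability for class 1.
p_sklearn = model.predict_proba([[x_new]])[0, 1]
print(f"P(pass | 3 hours) = {p_manual:.3f} (sklearn: {p_sklearn:.3f})")
```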

4. Ridge Regression (L2 Regularization)

• Use Case: Prevents overfitting by adding a penalty to large coefficients.
• Equation (Loss Function with Regularization Term):
  ∑(Y − Ŷ)² + λ∑β², where λ ≥ 0 controls the penalty strength
• Example: Used in high-dimensional data (e.g., financial modeling, genetics).
• Helps when multicollinearity (correlation between independent variables) is present.
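A minimal sketch of how the L2 penalty behaves under multicollinearity, assuming scikit-learn. scikit-learn's `Ridge` minimizes ∑(Y − Ŷ)² + alpha·∑β², so its `alpha` plays the role of λ. The data are synthetic: x2 is built as an almost exact copy of x1, which is an assumption chosen to make the features strongly correlated.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)  # nearly a copy of x1: strong multicollinearity
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=1.0, size=n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)  # alpha corresponds to lambda

# OLS can split the effect between x1 and x2 in unstable, large +/- pairs;
# ridge keeps the coefficients small and shares the effect between them.
print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
```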

5. Lasso Regression (L1 Regularization)

• Use Case: Feature selection by shrinking the coefficients of less important variables to
exactly zero.
• Equation (Loss Function with Regularization Term):
  ∑(Y − Ŷ)² + λ∑|β|, where λ ≥ 0 controls the penalty strength
• Example: Selecting the most relevant factors affecting house prices.
• Because coefficients can become exactly zero, Lasso removes irrelevant features from the
model automatically.
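A minimal sketch of Lasso's feature selection, assuming scikit-learn. Note that scikit-learn's `Lasso` scales the squared-error term by 1/(2n), so its `alpha` corresponds to λ only up to that constant. The data are synthetic: five candidate features, of which only the first two actually influence the target.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n = 200
X = rng.normal(size=(n, 5))  # five candidate features
# Only features 0 and 1 affect the target; 2, 3 and 4 are irrelevant.
y = 4 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=n)

lasso = Lasso(alpha=0.2).fit(X, y)
print("Lasso coefficients:", np.round(lasso.coef_, 3))
# Indices of the features whose coefficient was not shrunk to exactly zero.
print("Selected features:", np.flatnonzero(lasso.coef_))
```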

1. Read a CSV file

You can read a CSV file in Python using the pandas library. Here’s how you can do it:

Index | Country | Year | Total Water Consumption (Billion Cubic Meters) | Per Capita Water Use (Liters per Day) | Agricultural Water Use (%) | Industrial Water Use (%) | Household Water Use (%) | Rainfall Impact (Annual Precipitation in mm) | Groundwater Depletion Rate (%)
0 | Argentina | 2000 | 481.490000 | 235.431429 | 48.550000 | 20.844286 | 30.100000 | 1288.698571 | 3.255714
1 | Argentina | 2001 | 455.063000 | 299.551000 | 48.465000 | 26.943000 | 22.550000 | 1371.729000 | 3.120000
2 | Argentina | 2002 | 482.749231 | 340.124615 | 50.375385 | 29.042308 | 23.349231 | 1590.305385 | 2.733846
3 | Argentina | 2003 | 452.660000 | 326.756667 | 49.086667 | 30.476000 | 24.440000 | 1816.012667 | 2.708000
4 | Argentina | 2004 | 634.566000 | 230.346000 | 38.670000 | 36.670000 | 23.924000 | 815.998000 | 1.902000
.. | ... | ... | ... | ... | ... | ... | ... | ... | ...
495 | USA | 2020 | 418.097000 | 292.970000 | 47.448000 | 25.266000 | 27.538000 | 1510.662000 | 2.431000
496 | USA | 2021 | 572.094000 | 275.978000 | 46.195000 | 32.223000 | 26.720000 | 754.615000 | 2.628000
497 | USA | 2022 | 440.978000 | 292.039000 | 54.810000 | 30.918000 | 22.638000 | 2119.898000 | 2.871000
498 | USA | 2023 | 566.865000 | 261.197500 | 62.945000 | 25.207500 | 21.632500 | 1439.155000 | 1.597500
499 | USA | 2024 | 249.485000 | 186.374000 | 51.386000 | 24.769000 | 27.677000 | 1771.199000 | 1.638000

[500 rows x 9 columns]

· pd.read_csv("file.csv") loads the CSV file into a DataFrame.
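A minimal sketch of the loading step, assuming pandas. The real file name is not given in the practical, so `water_consumption.csv` in the comment is hypothetical; to keep the snippet self-contained, two rows copied from the output above are read from an in-memory string instead of a file.

```python
import io

import pandas as pd

# With the real file this would be, e.g.:
#   df = pd.read_csv("water_consumption.csv")   # hypothetical file name
# Two rows copied from the output above stand in for that file here.
csv_text = """Country,Year,Total Water Consumption (Billion Cubic Meters),Per Capita Water Use (Liters per Day),Agricultural Water Use (%),Industrial Water Use (%),Household Water Use (%),Rainfall Impact (Annual Precipitation in mm),Groundwater Depletion Rate (%)
Argentina,2000,481.49,235.431429,48.55,20.844286,30.1,1288.698571,3.255714
Argentina,2001,455.063,299.551,48.465,26.943,22.55,1371.729,3.12
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.shape)  # (rows, columns); the full dataset above is (500, 9)
```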

2. Perform descriptive exploration (head, summary statistics)

Index | Country | Year | Total Water Consumption (Billion Cubic Meters) | Per Capita Water Use (Liters per Day) | Agricultural Water Use (%) | Industrial Water Use (%) | Household Water Use (%) | Rainfall Impact (Annual Precipitation in mm) | Groundwater Depletion Rate (%)
0 | Argentina | 2000 | 481.490000 | 235.431429 | 48.550000 | 20.844286 | 30.100000 | 1288.698571 | 3.255714
1 | Argentina | 2001 | 455.063000 | 299.551000 | 48.465000 | 26.943000 | 22.550000 | 1371.729000 | 3.120000
2 | Argentina | 2002 | 482.749231 | 340.124615 | 50.375385 | 29.042308 | 23.349231 | 1590.305385 | 2.733846
3 | Argentina | 2003 | 452.660000 | 326.756667 | 49.086667 | 30.476000 | 24.440000 | 1816.012667 | 2.708000
4 | Argentina | 2004 | 634.566000 | 230.346000 | 38.670000 | 36.670000 | 23.924000 | 815.998000 | 1.902000

· df.head() shows the first 5 rows of the dataset.

The df.describe() method in pandas provides summary statistics (count, mean, standard
deviation, minimum, quartiles and maximum) of a DataFrame’s numerical columns.
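A minimal sketch of both calls, assuming pandas. The DataFrame is built from four of the nine columns of the five rows shown above, standing in for the full dataset loaded with `pd.read_csv`. Because `describe()` summarizes numerical columns by default, the text column `Country` does not appear in its output.

```python
import pandas as pd

# The first five rows shown above (four of the nine columns).
df = pd.DataFrame({
    "Country": ["Argentina"] * 5,
    "Year": [2000, 2001, 2002, 2003, 2004],
    "Total Water Consumption (Billion Cubic Meters)": [481.490000, 455.063000, 482.749231, 452.660000, 634.566000],
    "Groundwater Depletion Rate (%)": [3.255714, 3.120000, 2.733846, 2.708000, 1.902000],
})

print(df.head())         # first 5 rows

summary = df.describe()  # count, mean, std, min, 25%, 50%, 75%, max per numeric column
print(summary)
```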

3. Plot feature distributions (histograms, scatter plots).
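A minimal plotting sketch, assuming Matplotlib and pandas. It uses two columns from the ten rows printed above as stand-in data. The `Agg` backend and the output file name `feature_distributions.png` are assumptions that let the script run without a display; in a notebook, drop the backend line and call `plt.show()` instead.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Two columns from the ten rows printed above (first five and last five).
df = pd.DataFrame({
    "Rainfall Impact (Annual Precipitation in mm)": [
        1288.698571, 1371.729, 1590.305385, 1816.012667, 815.998,
        1510.662, 754.615, 2119.898, 1439.155, 1771.199],
    "Total Water Consumption (Billion Cubic Meters)": [
        481.49, 455.063, 482.749231, 452.66, 634.566,
        418.097, 572.094, 440.978, 566.865, 249.485],
})
rain = df["Rainfall Impact (Annual Precipitation in mm)"]
water = df["Total Water Consumption (Billion Cubic Meters)"]

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(11, 4))

# Histogram: how the values of one feature are distributed.
ax_hist.hist(rain, bins=5, edgecolor="black")
ax_hist.set_xlabel(rain.name)
ax_hist.set_ylabel("Count")

# Scatter plot: how two features vary together.
ax_scatter.scatter(rain, water)
ax_scatter.set_xlabel(rain.name)
ax_scatter.set_ylabel(water.name)

fig.tight_layout()
fig.savefig("feature_distributions.png")  # hypothetical output file
```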

4. Check the linear relationship between two features
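One common way to quantify a linear relationship is the Pearson correlation coefficient r, which ranges from −1 to +1: values near ±1 indicate a strong linear relationship and values near 0 a weak one. A minimal sketch with pandas, reusing two columns from the ten rows printed earlier as stand-in data (the practical's own choice of feature pair is not shown).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Rainfall Impact (Annual Precipitation in mm)": [
        1288.698571, 1371.729, 1590.305385, 1816.012667, 815.998,
        1510.662, 754.615, 2119.898, 1439.155, 1771.199],
    "Total Water Consumption (Billion Cubic Meters)": [
        481.49, 455.063, 482.749231, 452.66, 634.566,
        418.097, 572.094, 440.978, 566.865, 249.485],
})
x = df["Rainfall Impact (Annual Precipitation in mm)"]
y = df["Total Water Consumption (Billion Cubic Meters)"]

r = x.corr(y)  # Series.corr uses the Pearson method by default
print(f"Pearson r = {r:.3f}")

# Correlation matrix of all numeric columns at once.
print(df.corr())
```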

5. Split the dataset into 70% training and 30% test data

· train_test_split() randomly divides New_Data into:

· 70% training data (train_data)

· 30% test data (test_data)

· test_size=0.3 → 30% of the data is used for testing.

· random_state=42 ensures that the split is reproducible (the same split every time
you run it).
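A minimal sketch of the split, assuming scikit-learn. `New_Data` here is a hypothetical 500-row stand-in built for the example. When `train_test_split` is given a single DataFrame it returns the training part first and the test part second.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for New_Data: 500 rows, like the dataset above.
New_Data = pd.DataFrame({"x": np.arange(500), "y": np.arange(500) * 2.0})

train_data, test_data = train_test_split(New_Data, test_size=0.3, random_state=42)
print(len(train_data), len(test_data))  # 350 150
```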

Practical Number- 02
Introduction To Linear Regression
Linear regression is one of the most fundamental and widely used algorithms in machine
learning. It is a supervised learning technique used for predictive modeling, primarily to
estimate relationships between variables.

Linear regression models the relationship between a dependent variable (target) and one
or more independent variables (features) by fitting a straight line to the data. The objective
is to find the best-fitting line that minimizes the error between the predicted and actual
values.

Types of Linear Regression

1. Simple Linear Regression: Involves a single independent variable (feature).

The equation is:

y = mx + b

where:

o y is the predicted value,
o m is the slope (coefficient),
o x is the independent variable,
o b is the y-intercept (bias).

2. Multiple Linear Regression: Involves multiple independent variables. The equation
extends to:

y = b₀ + b₁x₁ + b₂x₂ + ... + bₙxₙ

where b₀ is the intercept, and b₁, b₂, ..., bₙ are the coefficients for the respective features
x₁, x₂, ..., xₙ.
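A minimal sketch of multiple linear regression with two features, assuming scikit-learn: after fitting, `intercept_` holds b₀ and `coef_` holds b₁, b₂. The data are synthetic, generated from an assumed "true" relationship y = 5 + 2x₁ − 1.5x₂ plus noise so that the recovered coefficients can be compared with known values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
n = 300
X = rng.normal(size=(n, 2))  # columns are x1 and x2
# Assumed "true" relationship: y = 5 + 2*x1 - 1.5*x2 + noise
y = 5 + 2 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.3, size=n)

model = LinearRegression().fit(X, y)
print("b0 (intercept):", model.intercept_)
print("b1, b2 (coefficients):", model.coef_)
```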

How Does Linear Regression Work?

Linear regression uses a statistical approach to estimate the best-fit line by minimizing the
difference between actual and predicted values. The most common method for this is
Ordinary Least Squares (OLS), which minimizes the sum of squared residuals (errors).
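To see what "minimizes the sum of squared residuals" means in practice, the sketch below solves the OLS problem with NumPy's least-squares solver and checks that nudging either coefficient increases the sum of squared residuals (SSR). The data are synthetic, and `ssr` is a helper defined only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=50)
y = 1.5 + 0.8 * x + rng.normal(scale=1.0, size=50)

# Design matrix with a column of ones for the intercept b0.
A = np.column_stack([np.ones_like(x), x])

# OLS solution: the [b0, b1] that minimizes the sum of squared residuals.
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

def ssr(coefs):
    """Sum of squared residuals for a candidate coefficient vector [b0, b1]."""
    residuals = y - A @ coefs
    return float(residuals @ residuals)

print("b0, b1 =", beta)
print("SSR at the OLS solution:   ", ssr(beta))
print("SSR with slope nudged +0.1:", ssr(beta + np.array([0.0, 0.1])))
```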

Evaluation Metrics

To assess the performance of a linear regression model, we use:

1. Mean Squared Error (MSE):

Mean Squared Error (MSE) is a common loss function used in regression tasks within machine
learning. It measures the average squared difference between the actual (true) values and the
predicted values from a model:

MSE = (1/n) ∑ᵢ₌₁ⁿ (yᵢ − ŷᵢ)²

where:

· n = number of data points

· yᵢ = actual value of the i-th data point

· ŷᵢ = predicted value of the i-th data point

Interpretation:

• A lower MSE indicates better model performance, as the predictions are closer to the
actual values.
• A higher MSE means the model has larger errors and does not fit the data as well. Because
MSE is measured in squared units of the target, it is only comparable between models
evaluated on the same data.

Applications of MSE in Machine Learning:

• Regression Models: Used as a loss function in algorithms like Linear Regression, Ridge
Regression, and Lasso Regression.
• Neural Networks: Often used in deep learning models for continuous target variables.
• Model Evaluation: Helps compare different regression models based on their prediction
accuracy.
• Hyperparameter Tuning: Used as the validation criterion when choosing settings such as
learning rates and regularization strengths.
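A small worked example of the MSE formula, computed by hand with NumPy and with scikit-learn's `mean_squared_error`. The four actual/predicted values are made up for illustration: the squared errors are 0.25, 0.25, 0 and 1, so MSE = 1.5 / 4 = 0.375.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])  # actual values y_i
y_pred = np.array([2.5, 0.0, 2.0, 8.0])   # predicted values y_hat_i

n = len(y_true)
mse_manual = np.sum((y_true - y_pred) ** 2) / n
mse_sklearn = mean_squared_error(y_true, y_pred)
print(mse_manual, mse_sklearn)  # 0.375 0.375
```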
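2. Root Mean Squared Error (RMSE):
Root Mean Squared Error (RMSE) is a commonly used metric to evaluate the performance of
regression models. It is the square root of the Mean Squared Error (MSE), which measures the
average squared difference between actual and predicted values:

RMSE = √MSE = √[(1/n) ∑ᵢ₌₁ⁿ (yᵢ − ŷᵢ)²]

Because of the square root, RMSE is expressed in the same units as the target variable.
3. R² Score (Coefficient of Determination):
Measures the proportion of the variability in the dependent variable that is explained by the
regression model:

R² = 1 − ∑(yᵢ − ŷᵢ)² / ∑(yᵢ − ȳ)²

where ȳ is the mean of the actual values. R² = 1 means a perfect fit, and a model that always
predicts ȳ scores 0.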
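Continuing with the same made-up four-point example, this sketch computes RMSE as the square root of MSE and R² from the residual and total sums of squares, and checks the manual R² against scikit-learn's `r2_score`.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# RMSE: square root of MSE, in the same units as y.
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(f"RMSE = {rmse:.4f}")  # RMSE = 0.6124
print(f"R^2  = {r2_manual:.4f} (sklearn: {r2_score(y_true, y_pred):.4f})")  # R^2  = 0.9486 (sklearn: 0.9486)
```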
Applications of Linear Regression
· Predicting house prices
· Stock market forecasting
· Sales and revenue prediction
(1.) Relationship Between Variables
The question asks whether the insurance premium depends on driving experience
or vice versa.
· Independent Variable (X): Driving Experience (years)
· Dependent Variable (Y): Monthly Auto Insurance Premium
Expected Relationship:
· As driving experience increases, the insurance premium is expected to
decrease because experienced drivers are generally considered lower risk.
· This suggests a negative correlation between the two variables.

Importing Important Libraries and Loading the Dataset

(2.) Plot the scatter diagram and regression line

(3.) Compute the correlation coefficient (r) and R²
(4.) Compute the residual standard error

(5.) Compute SSₓₓ, SSᵧᵧ, and SSₓᵧ
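The practical's dataset and code are not reproduced here, so the following is only a sketch of tasks (2) to (5) using the textbook sums-of-squares formulas, assuming NumPy and Matplotlib. The experience/premium numbers are HYPOTHETICAL placeholders to be replaced with the real data, and the plot file name is an assumption. The formulas used are b = SSₓᵧ/SSₓₓ, a = ȳ − b·x̄, r = SSₓᵧ/√(SSₓₓ·SSᵧᵧ), and sₑ = √((SSᵧᵧ − b·SSₓᵧ)/(n − 2)).

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# HYPOTHETICAL placeholder data: replace with the dataset used in the practical.
x = np.array([2, 5, 7, 10, 12, 15, 20, 25], dtype=float)     # driving experience (years)
y = np.array([90, 80, 78, 70, 65, 60, 52, 45], dtype=float)  # monthly premium
n = len(x)

# (5) Sums of squares
ss_xx = np.sum(x**2) - np.sum(x) ** 2 / n
ss_yy = np.sum(y**2) - np.sum(y) ** 2 / n
ss_xy = np.sum(x * y) - np.sum(x) * np.sum(y) / n

# Least-squares line y_hat = a + b*x
b = ss_xy / ss_xx
a = y.mean() - b * x.mean()

# (3) Correlation coefficient and coefficient of determination
r = ss_xy / np.sqrt(ss_xx * ss_yy)
r_squared = r**2

# (4) Residual standard error, with n - 2 degrees of freedom
s_e = np.sqrt((ss_yy - b * ss_xy) / (n - 2))

print(f"SSxx={ss_xx:.3f}, SSyy={ss_yy:.3f}, SSxy={ss_xy:.3f}")
print(f"y_hat = {a:.3f} + ({b:.3f})x, r={r:.3f}, R^2={r_squared:.3f}, s_e={s_e:.3f}")

# (2) Scatter diagram with the fitted regression line
fig, ax = plt.subplots()
ax.scatter(x, y, label="Observed")
xs = np.linspace(x.min(), x.max(), 100)
ax.plot(xs, a + b * xs, color="red", label="Regression line")
ax.set_xlabel("Driving experience (years)")
ax.set_ylabel("Monthly auto insurance premium")
ax.legend()
fig.savefig("insurance_regression.png")  # hypothetical file name
```

With the placeholder data the slope b is negative, matching the expected negative relationship described above; the real data may of course give different values.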

Practical Number- 03
Python script to solve the linear regression problem based on the given data

(1.) Importing Important Libraries And Loading Dataset.

(2.) Compute SSₓₓ, SSᵧᵧ, and SSₓᵧ

(3.) Explain the meaning of a (intercept) and b (slope).

(4.) Calculate the correlation coefficient (r) and r².

(5.) Plot the scatter diagram and regression line.

(6.) Predict cholesterol level for a 60-year-old.

(7.) Compute the standard deviation of errors.

(8.) Construct a 95% confidence interval for B.

(9.) Perform a hypothesis test for B at a 5% significance level.

(10.) Test whether the population correlation coefficient is positive at α = 0.025.
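The original script and data for this practical are not included, so the following is a sketch of tasks (2), (3), (4) and (6) to (10) using standard simple-regression inference, assuming NumPy and SciPy. The age/cholesterol values are HYPOTHETICAL placeholders. Two further assumptions: the slope test in (9) is taken as two-tailed (H₀: B = 0 vs H₁: B ≠ 0), and (10) as right-tailed (H₀: ρ = 0 vs H₁: ρ > 0). The plot in step (5) is omitted.

```python
import numpy as np
from scipy import stats

# HYPOTHETICAL placeholder data: replace with the ages and cholesterol levels from the practical.
age = np.array([25, 32, 38, 45, 50, 55, 62, 68], dtype=float)
chol = np.array([180, 185, 195, 200, 210, 215, 225, 230], dtype=float)
n = len(age)
dof = n - 2  # degrees of freedom for inference on the slope

# (2) Sums of squares
ss_xx = np.sum((age - age.mean()) ** 2)
ss_yy = np.sum((chol - chol.mean()) ** 2)
ss_xy = np.sum((age - age.mean()) * (chol - chol.mean()))

# (3) b: change in predicted cholesterol per extra year of age.
#     a: predicted cholesterol at age 0 (an extrapolation, rarely meaningful).
b = ss_xy / ss_xx
a = chol.mean() - b * age.mean()

# (4) Correlation coefficient r and r^2
r = ss_xy / np.sqrt(ss_xx * ss_yy)

# (6) Prediction for a 60-year-old
y_60 = a + b * 60

# (7) Standard deviation of errors
s_e = np.sqrt((ss_yy - b * ss_xy) / dof)

# (8) 95% confidence interval for the population slope B
s_b = s_e / np.sqrt(ss_xx)
t_crit = stats.t.ppf(0.975, dof)
ci_low, ci_high = b - t_crit * s_b, b + t_crit * s_b

# (9) Two-tailed test of H0: B = 0 at alpha = 0.05
t_b = b / s_b
reject_b = abs(t_b) > stats.t.ppf(1 - 0.05 / 2, dof)

# (10) Right-tailed test of H0: rho = 0 vs H1: rho > 0 at alpha = 0.025
t_r = r * np.sqrt(dof / (1 - r**2))
reject_rho = t_r > stats.t.ppf(1 - 0.025, dof)

print(f"SSxx={ss_xx:.2f}, SSyy={ss_yy:.2f}, SSxy={ss_xy:.2f}")
print(f"Regression line: y_hat = {a:.3f} + {b:.3f}x")
print(f"r = {r:.4f}, r^2 = {r**2:.4f}")
print(f"Predicted cholesterol at age 60: {y_60:.2f}")
print(f"s_e = {s_e:.4f}")
print(f"95% CI for B: ({ci_low:.4f}, {ci_high:.4f})")
print(f"t for B = {t_b:.3f}, reject H0: {reject_b}")
print(f"t for rho = {t_r:.3f}, reject H0: {reject_rho}")
```

`scipy.stats.linregress(age, chol)` returns the same slope, intercept, r and slope standard error, which makes it a convenient cross-check for the hand-computed values.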
