ML File 17 March

The document provides an introduction to machine learning, focusing on its definition, types, and applications, particularly in regression analysis. It explains supervised and unsupervised learning, detailing various regression techniques such as linear, polynomial, and logistic regression, along with their use cases and evaluation metrics. Additionally, it includes practical steps for implementing linear regression in Python, including data preparation and model evaluation.

Practical Number- 01

INTRODUCTION TO MACHINE LEARNING

Machine learning (ML) is a field of artificial intelligence (AI) that enables computers to learn
from data and improve their performance without being explicitly programmed. It involves
algorithms that can analyse patterns, make predictions, and learn from experience, allowing
systems to autonomously adapt and improve their accuracy over time.

Types of Machine Learning

Supervised learning is a machine learning approach that uses labeled training data (structured data) to map specific features to a label. In supervised learning, the output is known (such as recognizing a picture of an apple), and the model is trained on data labeled with that known output.

The most common supervised learning algorithms used today include:

· Linear regression
· Polynomial regression
· K-nearest neighbors
· Naive Bayes
· Decision trees

Unsupervised learning is a machine learning model that uses unlabeled data (unstructured data) to learn patterns. Unlike supervised learning, the “correctness” of the output is not known ahead of time. Rather, the algorithm learns from the data without human input (and is thus unsupervised) and categorizes it into groups based on attributes. Unsupervised learning is good at descriptive modeling and pattern matching.

The most common unsupervised learning algorithms used today include:

· Fuzzy C-means clustering
· K-means clustering
· Hierarchical clustering
· Partial least squares

What is Regression?

In machine learning, regression is a supervised learning technique used to predict continuous numerical values by modeling the relationship between input features and a target variable, using statistical methods to make predictions.

Independent Variable (Predictor, Feature, Input Variable):

These are the variables that you use to predict the outcome. They are the inputs to the
model.
Dependent Variable (Response, Target, Output Variable):

This is the variable that you are trying to predict. It depends on the independent variables.

How to Use Regression?

1. Identify the Problem

· Determine if your problem involves predicting a continuous variable (e.g., price, temperature, salary).

· Choose regression when the goal is to find relationships between variables.

2. Collect and Prepare Data

· Gather relevant data with independent and dependent variables.

· Clean the data by handling missing values, removing outliers, and normalizing if necessary.

· Split data into training and testing sets (e.g., 80% train, 20% test).

3. Choose the Right Regression Model

· Linear Regression: When the relationship between variables is linear.

· Polynomial Regression: When the relationship is nonlinear.

· Multiple Linear Regression: When multiple independent variables influence the outcome.

· Logistic Regression: For classification problems (e.g., spam or not spam).

· Ridge/Lasso Regression: For regularization to avoid overfitting.

· Decision Tree/Random Forest Regression: For complex, nonlinear relationships.

4. Train the Model

· Fit the chosen model on the training set so it learns the relationship between the features and the target.

5. Evaluate the Model

· Measure performance on the held-out test set using metrics such as MSE, RMSE, or R².

6. Make Predictions

· Apply the trained model to new, unseen data.
Types of Regression:

1. Linear Regression

· Use Case: Predict continuous values (e.g., house prices, salary).

· Equation: Y = mX + b

· Example: Predicting salary based on years of experience.

· Simple Linear Regression → one independent variable.

· Multiple Linear Regression → multiple independent variables.

2. Polynomial Regression

· Use Case: When data has a non-linear relationship but is still continuous.

· Equation: Y = aX² + bX + c

· Example: Predicting population growth or temperature variations.

· Used when the relationship is curved, not straight.

3. Logistic Regression (for Classification)

· Use Case: Binary or multi-class classification problems (e.g., spam detection, disease prediction).

· Equation (Sigmoid Function): P(Y) = 1 / (1 + e^−(b0 + b1X))

· Example: Predicting whether an email is spam (Yes/No).

· Types:

  o Binary Logistic Regression → two classes (e.g., pass/fail).

  o Multinomial Logistic Regression → more than two classes (e.g., predicting the type of weather: sunny, rainy, snowy).

  o Ordinal Logistic Regression → ordered categories (e.g., rating: low, medium, high).

4. Ridge Regression (L2 Regularization)


· Use Case: Prevents overfitting by adding a penalty on large coefficients.

· Equation (Loss Function with Regularization Term): Σ(Y − Ŷ)² + λΣβ²

· Example: Used with high-dimensional data (e.g., financial modeling, genetics).

· Helps when multicollinearity (correlation between independent variables) is present.

5. Lasso Regression (L1 Regularization)


· Use Case: Feature selection by reducing less important variables to zero.

· Equation (Loss Function with Regularization Term): Σ(Y − Ŷ)² + λΣ|β|

· Example: Selecting the most relevant factors affecting house prices.

· Helps in feature selection by shrinking irrelevant coefficients to zero, as shown in the sketch below.
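To make the regularized variants concrete, here is a small illustrative sketch with scikit-learn; the synthetic data and the alpha values are assumptions, chosen only to show the effect:

    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge, Lasso

    # Synthetic data: 100 samples, 5 features, only the first two matter
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

    # Compare plain OLS against the L2- and L1-regularized variants
    for model in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=0.1)):
        model.fit(X, y)
        print(type(model).__name__, np.round(model.coef_, 3))
    # Lasso typically shrinks the three irrelevant coefficients to exactly zero

Ridge keeps all coefficients but shrinks them; Lasso can zero out the irrelevant ones entirely, which is why it is used for feature selection.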

1. Read a CSV file

You can read a CSV file in Python using the pandas library. Here’s how you can do it:
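A minimal sketch (the file name water_consumption.csv is an assumption; substitute the actual path to the dataset):

    import pandas as pd

    # Load the CSV file into a DataFrame (file name assumed for illustration)
    df = pd.read_csv("water_consumption.csv")
    print(df)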

Country Year Total Water Consumption (Billion Cubic Meters) \
0 Argentina 2000 481.490000
1 Argentina 2001 455.063000
2 Argentina 2002 482.749231
3 Argentina 2003 452.660000
4 Argentina 2004 634.566000
.. ... ... ...
495 USA 2020 418.097000
496 USA 2021 572.094000
497 USA 2022 440.978000
498 USA 2023 566.865000
499 USA 2024 249.485000

Per Capita Water Use (Liters per Day) Agricultural Water Use (%) \
0 235.431429 48.550000
1 299.551000 48.465000
2 340.124615 50.375385
3 326.756667 49.086667
4 230.346000 38.670000
.. ... ...
495 292.970000 47.448000
496 275.978000 46.195000
497 292.039000 54.810000
498 261.197500 62.945000
499 186.374000 51.386000

Industrial Water Use (%) Household Water Use (%) \
0 20.844286 30.100000
1 26.943000 22.550000
2 29.042308 23.349231
3 30.476000 24.440000
4 36.670000 23.924000
.. ... ...
495 25.266000 27.538000
496 32.223000 26.720000
497 30.918000 22.638000
498 25.207500 21.632500
499 24.769000 27.677000

Rainfall Impact (Annual Precipitation in mm) \
0 1288.698571
1 1371.729000
2 1590.305385
3 1816.012667
4 815.998000
.. ...
495 1510.662000
496 754.615000
497 2119.898000
498 1439.155000
499 1771.199000

Groundwater Depletion Rate (%)
0 3.255714
1 3.120000
2 2.733846
3 2.708000
4 1.902000
.. ...
495 2.431000
496 2.628000
497 2.871000
498 1.597500
499 1.638000

[500 rows x 9 columns]

· pd.read_csv("file.csv") loads the CSV file into a DataFrame.

2. Perform descriptive exploration (head, summary statistics)

Country Year Total Water Consumption (Billion Cubic Meters) \
0 Argentina 2000 481.490000
1 Argentina 2001 455.063000
2 Argentina 2002 482.749231
3 Argentina 2003 452.660000
4 Argentina 2004 634.566000

Per Capita Water Use (Liters per Day) Agricultural Water Use (%) \
0 235.431429 48.550000
1 299.551000 48.465000
2 340.124615 50.375385
3 326.756667 49.086667
4 230.346000 38.670000

Industrial Water Use (%) Household Water Use (%) \
0 20.844286 30.100000
1 26.943000 22.550000
2 29.042308 23.349231
3 30.476000 24.440000
4 36.670000 23.924000

Rainfall Impact (Annual Precipitation in mm) \
0 1288.698571
1 1371.729000
2 1590.305385
3 1816.012667
4 815.998000

Groundwater Depletion Rate (%)
0 3.255714
1 3.120000
2 2.733846
3 2.708000
4 1.902000

· df.head() shows the first 5 rows of the dataset.

The df.describe() function in Pandas provides summary statistics of a DataFrame’s numerical columns.
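A short sketch, assuming the DataFrame df loaded in step 1:

    # First 5 rows of the dataset
    print(df.head())

    # Summary statistics (count, mean, std, min, quartiles, max) for numeric columns
    print(df.describe())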

3. Plot feature distributions (histograms, scatter plots).
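One possible sketch using matplotlib; the column names are taken from the dataset printed above:

    import matplotlib.pyplot as plt

    # Histogram showing the distribution of one feature
    df["Per Capita Water Use (Liters per Day)"].plot(kind="hist", bins=30)
    plt.xlabel("Liters per Day")
    plt.show()

    # Scatter plot of two features against each other
    df.plot(kind="scatter",
            x="Rainfall Impact (Annual Precipitation in mm)",
            y="Total Water Consumption (Billion Cubic Meters)")
    plt.show()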


4. Check the linear relationship between two features.
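A quick check is the Pearson correlation coefficient alongside the scatter plot; values near +1 or −1 suggest a strong linear relationship. A sketch, reusing df:

    # Pearson correlation between two numeric columns
    r = df["Rainfall Impact (Annual Precipitation in mm)"].corr(
        df["Total Water Consumption (Billion Cubic Meters)"])
    print("Correlation coefficient r =", r)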

5. Split the dataset into 70% train and 30% test.

· train_test_split() randomly divides New_Data into:

· 70% training data (train_data)

· 30% test data (test_data)

· test_size=0.3 → 30% of the data is used for testing.

· random_state=42 ensures that the split is reproducible (the same split every time you run it), as in the sketch below.
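A sketch matching the description above (New_Data is assumed to be the DataFrame being split):

    from sklearn.model_selection import train_test_split

    # 70% of the rows go to train_data, 30% to test_data;
    # random_state=42 makes the split reproducible
    train_data, test_data = train_test_split(New_Data, test_size=0.3, random_state=42)
    print(train_data.shape, test_data.shape)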


Practical Number- 02
Introduction To Linear Regression
Linear regression is one of the most fundamental and widely used algorithms in machine
learning. It is a supervised learning technique used for predictive modeling, primarily to
estimate relationships between variables.

Linear regression models the relationship between a dependent variable (target) and one
or more independent variables (features) by fitting a straight line to the data. The objective
is to find the best-fitting line that minimizes the error between the predicted and actual
values.

Types of Linear Regression

1. Simple Linear Regression – involves a single independent variable (feature).

The equation is:

y = mx + b

where:

o y is the predicted value,
o m is the slope (coefficient),
o x is the independent variable,
o b is the y-intercept (bias).

2. Multiple Linear Regression – involves multiple independent variables. The equation extends to:

y = b0 + b1x1 + b2x2 + ... + bnxn

where b0 is the intercept, and b1, b2, ..., bn are the coefficients for the respective features x1, x2, ..., xn.

How Does Linear Regression Work?

Linear regression uses a statistical approach to estimate the best-fit line by minimizing the
difference between actual and predicted values. The most common method for this is
Ordinary Least Squares (OLS), which minimizes the sum of squared residuals (errors).
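A minimal sketch of fitting a line with scikit-learn's LinearRegression, which uses an OLS solution internally; the experience/salary numbers are assumed purely for illustration:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Assumed example data: years of experience vs. salary
    x = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)   # feature matrix must be 2-D
    y = np.array([30000, 35000, 41000, 44000, 52000])

    model = LinearRegression().fit(x, y)
    print("slope m =", model.coef_[0], "intercept b =", model.intercept_)
    print("prediction for 6 years:", model.predict([[6]])[0])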

Evaluation Metrics

To assess the performance of a linear regression model, we use:


1. Mean Squared Error (MSE):

Mean Squared Error (MSE) is a common loss function used in regression tasks within machine learning. It measures the average squared difference between the actual (true) values and the predicted values from a model:

MSE = (1/n) Σ (yi − ŷi)²

where:

· n = number of data points

· yi = actual value of the ith data point

· ŷi = predicted value of the ith data point

Interpretation:

· A lower MSE indicates better model performance, as the predictions are closer to the actual values.

· A higher MSE means the model has larger errors and does not fit the data well.

Applications of MSE in Machine Learning:

· Regression Models: Used as a loss function in algorithms like Linear Regression, Ridge Regression, and Lasso Regression.

· Neural Networks: Often used in deep learning models for continuous target variables.

· Model Evaluation: Helps compare different regression models based on their prediction accuracy.

· Hyperparameter Tuning: Used to optimize parameters like learning rates and regularization strengths.
2. Root Mean Squared Error (RMSE):

Root Mean Squared Error (RMSE) is a commonly used metric to evaluate the performance of regression models. It is the square root of the Mean Squared Error (MSE), which measures the average squared difference between actual and predicted values:

RMSE = √MSE = √((1/n) Σ (yi − ŷi)²)
3. R² Score (Coefficient of Determination):

Measures the proportion of the variability in the target variable that the regression model explains; values closer to 1 indicate a better fit.
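A sketch computing all three metrics with scikit-learn; the actual/predicted arrays are assumed placeholders:

    import numpy as np
    from sklearn.metrics import mean_squared_error, r2_score

    # Assumed arrays: actual vs. predicted values from some regression model
    y_true = np.array([3.0, 5.0, 7.5, 9.0])
    y_pred = np.array([2.8, 5.3, 7.1, 9.4])

    mse = mean_squared_error(y_true, y_pred)
    rmse = np.sqrt(mse)          # RMSE is just the square root of MSE
    r2 = r2_score(y_true, y_pred)
    print(f"MSE={mse:.3f}  RMSE={rmse:.3f}  R²={r2:.3f}")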
Applications of Linear Regression
· Predicting house prices
· Stock market forecasting
· Sales and revenue prediction
(1.) Relationship Between Variables
The question asks whether the insurance premium depends on driving experience
or vice versa.
· Independent Variable (X): Driving Experience (years)
· Dependent Variable (Y): Monthly Auto Insurance Premium
Expected Relationship:
· As driving experience increases, the insurance premium is expected to
decrease because experienced drivers are generally considered lower risk.
· This suggests a negative correlation between the two variables.

Importing Important Libraries and Loading the Dataset

(2.) Plot the scatter diagram and regression line.

(3.) Compute the correlation coefficient (r) and R².

(4.) Compute the residual standard errors.

(5.) Compute SSₓₓ, SSᵧᵧ, and SSₓᵧ.
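A hedged sketch of steps (2)–(5) with NumPy and matplotlib; the experience/premium values are assumed sample data, not taken from the practical:

    import numpy as np
    import matplotlib.pyplot as plt

    # Assumed sample data: driving experience (years) vs. monthly premium ($)
    x = np.array([5, 2, 12, 9, 15, 6, 25, 16], dtype=float)
    y = np.array([64, 87, 50, 71, 44, 56, 42, 60], dtype=float)
    n = len(x)

    # Sums of squares
    SSxx = np.sum((x - x.mean()) ** 2)
    SSyy = np.sum((y - y.mean()) ** 2)
    SSxy = np.sum((x - x.mean()) * (y - y.mean()))

    # Least-squares estimates: slope b and intercept a
    b = SSxy / SSxx
    a = y.mean() - b * x.mean()

    # Correlation coefficient and coefficient of determination
    r = SSxy / np.sqrt(SSxx * SSyy)
    print(f"a={a:.3f}, b={b:.3f}, r={r:.3f}, R²={r**2:.3f}")

    # Residual standard error: s_e = sqrt((SSyy - b*SSxy) / (n - 2))
    se = np.sqrt((SSyy - b * SSxy) / (n - 2))
    print(f"standard deviation of errors = {se:.3f}")

    # Scatter diagram with the fitted regression line
    plt.scatter(x, y)
    plt.plot(x, a + b * x, color="red")
    plt.xlabel("Driving Experience (years)")
    plt.ylabel("Monthly Premium")
    plt.show()

With real data, a negative slope b would confirm the expected negative correlation between experience and premium.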


Practical Number- 03
Python script to solve the linear regression problem based on the given data

(1.) Import important libraries and load the dataset.

(2.) Compute SSₓₓ, SSᵧᵧ, and SSₓᵧ

(3.) Explain the meaning of a (intercept) and b (slope).

(4.) Calculate the correlation coefficient (r) and r².


(5.) Plot the scatter diagram and regression line.

(6.) Predict cholesterol level for a 60-year-old.

(7.) Compute the standard deviation of errors.


(8.) Construct a 95% confidence interval for B.

(9.) Perform a hypothesis test for B at a 5% significance level.

(10.) Test the positivity of the correlation coefficient at α = 0.025.
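A hedged end-to-end sketch of these steps with NumPy and SciPy; the age/cholesterol pairs are assumed placeholders, so substitute the data given in the practical:

    import numpy as np
    from scipy import stats

    # Assumed placeholder data: age (x) vs. cholesterol level (y)
    x = np.array([34, 43, 38, 56, 64, 45, 50, 60], dtype=float)
    y = np.array([180, 197, 190, 215, 230, 200, 210, 222], dtype=float)
    n = len(x)

    # (2) Sums of squares
    SSxx = np.sum((x - x.mean()) ** 2)
    SSxy = np.sum((x - x.mean()) * (y - y.mean()))
    SSyy = np.sum((y - y.mean()) ** 2)

    # (3) b is the slope (change in cholesterol per year of age),
    #     a is the intercept (predicted level when age is zero)
    b = SSxy / SSxx
    a = y.mean() - b * x.mean()

    # (4) Correlation coefficient and r²
    r = SSxy / np.sqrt(SSxx * SSyy)
    print(f"a={a:.3f}, b={b:.3f}, r={r:.3f}, r²={r**2:.3f}")

    # (6) Prediction for a 60-year-old
    print("predicted cholesterol at 60:", a + b * 60)

    # (7) Standard deviation of errors
    se = np.sqrt((SSyy - b * SSxy) / (n - 2))

    # (8) 95% confidence interval for the population slope B
    sb = se / np.sqrt(SSxx)
    t_crit = stats.t.ppf(0.975, df=n - 2)
    print("95% CI for B:", (b - t_crit * sb, b + t_crit * sb))

    # (9) Hypothesis test H0: B = 0 vs. H1: B != 0 at the 5% level
    t_stat = b / sb
    p_val = 2 * stats.t.sf(abs(t_stat), df=n - 2)
    print("t =", t_stat, "p =", p_val)

    # (10) One-tailed test that the correlation is positive at alpha = 0.025
    t_r = r * np.sqrt((n - 2) / (1 - r**2))
    print("reject H0 (rho <= 0)?", t_r > stats.t.ppf(0.975, df=n - 2))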
