Practical Number- 01
INTRODUCTION TO MACHINE LEARNING
    Machine learning (ML) is a field of artificial intelligence (AI) that enables computers to learn
    from data and improve their performance without being explicitly programmed. It involves
    algorithms that can analyse patterns, make predictions, and learn from experience, allowing
    systems to autonomously adapt and improve their accuracy over time.
    Types of Machine Learning
    Supervised learning is a type of machine learning that uses labeled training data to
    learn a mapping from input features to a label. In supervised learning, the correct output
    is known for every training example (such as a picture already tagged as an apple), and
    the model is trained on these known outputs.
    The most common supervised learning algorithms used today include:
   Linear regression
   Polynomial regression
   K-nearest neighbors
   Naive Bayes
   Decision trees
    Unsupervised learning is a type of machine learning that uses unlabeled data to learn
    patterns. Unlike supervised learning, the “correctness” of the output is not known ahead
    of time. Rather, the algorithm learns from the data without human-provided labels (and
    is thus unsupervised) and categorizes it into groups based on attributes. Unsupervised
    learning is good at descriptive modeling and pattern matching.
    The most common unsupervised learning algorithms used today include:
   Fuzzy c-means clustering
   K-means clustering
   Hierarchical clustering
   Partial least squares
    What is Regression?
    In machine learning, regression is a supervised learning technique used to predict
    continuous numerical values by modeling the relationship between input features and a
    target variable, using statistical methods to make predictions.
    Independent Variable (Predictor, Feature, Input Variable):
    These are the variables that you use to predict the outcome. They are the inputs to the
    model.
Dependent Variable (Response, Target, Output Variable):
This is the variable that you are trying to predict. It depends on the independent variables.
How to Use Regression?
1. Identify the Problem
      Determine if your problem involves predicting a continuous variable (e.g., price,
       temperature, salary).
      Choose regression when the goal is to find relationships between variables.
2. Collect and Prepare Data
      Gather relevant data with independent and dependent variables.
      Clean the data by handling missing values, removing outliers, and normalizing if necessary.
      Split data into training and testing sets (e.g., 80% train, 20% test).
3. Choose the Right Regression Model
      Linear Regression: When the relationship between variables is linear.
      Polynomial Regression: When the relationship is nonlinear.
      Multiple Linear Regression: When multiple independent variables influence the outcome.
      Logistic Regression: For classification problems (e.g., spam or not spam).
      Ridge/Lasso Regression: For regularization to avoid overfitting.
      Decision Tree/Random Forest Regression: For complex, nonlinear relationships.
4. Train the Model
5. Evaluate the Model
6. Make Predictions.
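The six steps above can be sketched end to end in a few lines. This is a minimal illustration under stated assumptions, not a prescribed workflow: the experience and salary values are synthetic, and a straight-line model is fitted with NumPy's `polyfit`.

```python
import numpy as np

# Hypothetical data: years of experience (feature) and salary in thousands (target).
rng = np.random.default_rng(0)
experience = rng.uniform(0, 20, size=100)
salary = 30 + 2.5 * experience + rng.normal(0, 3, size=100)

# Step 2: shuffle the row indices and split 80% train / 20% test.
indices = rng.permutation(len(experience))
train_idx, test_idx = indices[:80], indices[80:]

# Steps 3-4: choose a straight-line model and fit it on the training set.
slope, intercept = np.polyfit(experience[train_idx], salary[train_idx], deg=1)

# Step 5: evaluate on the held-out test set with mean squared error.
test_pred = slope * experience[test_idx] + intercept
test_mse = np.mean((salary[test_idx] - test_pred) ** 2)

# Step 6: predict the salary for a new input (10 years of experience).
new_prediction = slope * 10 + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, test MSE={test_mse:.2f}")
```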
Types of Regression:
1. Linear Regression
      Use Case: Predict continuous values (e.g., house prices, salary).
      Equation:
      $Y = mX + b$
      Example: Predicting salary based on years of experience.
      Simple Linear Regression → One independent variable.
      Multiple Linear Regression → Multiple independent variables.
2. Polynomial Regression
      Use Case: When data has a non-linear relationship but is still continuous.
      Equation:
      $Y = aX^2 + bX + c$
      Example: Predicting population growth or temperature variations.
      Used when the relationship is curved, not straight.
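As a small illustration of fitting a curved relationship, the sketch below recovers the coefficients a, b and c of a quadratic with NumPy's `polyfit`. The data points are hypothetical and generated without noise, so the fit is exact up to rounding.

```python
import numpy as np

# Hypothetical curved data generated from y = 2x^2 - 3x + 1 (no noise).
x = np.linspace(-3, 3, 25)
y = 2 * x**2 - 3 * x + 1

# Fit a degree-2 polynomial; coefficients are returned highest power first.
a, b, c = np.polyfit(x, y, deg=2)
print(f"a={a:.2f}, b={b:.2f}, c={c:.2f}")
```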
3. Logistic Regression (for Classification)
      Use Case: Binary or multi-class classification problems (e.g., spam detection, disease
       prediction).
      Equation (Sigmoid Function):
      $P(Y) = \frac{1}{1 + e^{-(b_0 + b_1 X)}}$
      Example: Predicting whether an email is spam (Yes/No).
      Types:
           · Binary Logistic Regression → Two classes (e.g., pass/fail).
           · Multinomial Logistic Regression → More than two classes (e.g., predicting
             type of weather: sunny, rainy, snowy).
           · Ordinal Logistic Regression → Ordered categories (e.g., rating: low, medium, high).
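To make the sigmoid equation concrete, the sketch below evaluates it for a few inputs and turns the probabilities into class labels with a 0.5 threshold. The coefficients b0 and b1 are hypothetical values chosen for illustration, not fitted ones.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients, chosen only for illustration.
b0, b1 = -4.0, 2.0
x = np.array([0.0, 2.0, 4.0])

# Probability of the positive class for each input, then a 0.5 cut-off.
probabilities = sigmoid(b0 + b1 * x)
predicted_class = (probabilities >= 0.5).astype(int)
print(probabilities, predicted_class)
```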
  4. Ridge Regression (L2 Regularization)
     Use Case: Prevents overfitting by adding a penalty to large coefficients.
     Equation (Loss Function with Regularization Term):
     $\sum (Y - \hat{Y})^2 + \lambda \sum \beta^2$
     Example: Used in high-dimensional data (e.g., financial modeling, genetics).
     Helps when multicollinearity (correlation between independent variables) is present.
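The effect of the L2 penalty can be seen with the closed-form ridge solution, which minimizes the loss above when no intercept is fitted. The sketch below is one possible illustration: the helper `ridge_fit` and the two nearly identical features are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Return the coefficients minimizing sum((y - X b)^2) + lam * sum(b^2).

    No intercept is fitted, so the data are assumed to be roughly centered.
    """
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Hypothetical data with two strongly correlated features (multicollinearity).
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.01, size=50)
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.1, size=50)

beta_ols = ridge_fit(X, y, lam=0.0)    # lam = 0 reduces to ordinary least squares
beta_ridge = ridge_fit(X, y, lam=1.0)  # the penalty shrinks the coefficients
print("OLS:  ", beta_ols)
print("Ridge:", beta_ridge)
```

With correlated features the unpenalized coefficients can be large and unstable, while the penalized ones have a smaller overall magnitude.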
  5. Lasso Regression (L1 Regularization)
     Use Case: Feature selection by reducing less important variables to zero.
     Equation (Loss Function with Regularization Term):
     $\sum (Y - \hat{Y})^2 + \lambda \sum |\beta|$
     Example: Selecting the most relevant factors affecting house prices.
     Helps in feature selection by shrinking irrelevant coefficients to zero.
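A short sketch of this feature-selection effect, assuming scikit-learn is available. The data are synthetic: only the first of three features influences the target. Note that scikit-learn's `Lasso` scales the squared-error term by 1/(2n), so its `alpha` is not numerically identical to the λ in the loss above.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Hypothetical data: three features, but only the first one drives the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=200)

# alpha controls the strength of the L1 penalty.
model = Lasso(alpha=0.5)
model.fit(X, y)

# The coefficients of the two irrelevant features are driven to zero.
print(model.coef_)
```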
  1. Read a CSV file
  You can read a CSV file in Python using the pandas library. Printing the resulting DataFrame gives:
       Country Year Total Water Consumption (Billion Cubic Meters) \
0     Argentina 2000                                     481.490000
1     Argentina 2001                                     455.063000
2     Argentina 2002                                     482.749231
3     Argentina 2003                                     452.660000
4     Argentina 2004                                     634.566000
..          ...  ...                                            ...
495         USA 2020                                     418.097000
496         USA 2021                                     572.094000
497         USA 2022                                     440.978000
498         USA 2023                                     566.865000
499         USA 2024                                     249.485000
      Per Capita Water Use (Liters per Day)                 Agricultural Water Use (%) \
0                                          235.431429                                48.550000
1                                          299.551000                                48.465000
2                                          340.124615                                50.375385
3                                          326.756667                                49.086667
4                                          230.346000                                38.670000
..                                                ...                                      ...
495                                        292.970000                                47.448000
496                                        275.978000                                46.195000
497                                        292.039000                                54.810000
498                                        261.197500                                62.945000
499                                        186.374000                                51.386000
       Industrial Water Use (%)              Household Water Use (%)   \
0                     20.844286                            30.100000
1                     26.943000                            22.550000
2                     29.042308                            23.349231
3                     30.476000                            24.440000
4                     36.670000                            23.924000
..                          ...                                  ...
495                   25.266000                            27.538000
496                   32.223000                            26.720000
497                   30.918000                            22.638000
498                   25.207500                            21.632500
499                   24.769000                            27.677000
       Rainfall Impact (Annual Precipitation in mm)              \
0                                       1288.698571
1                                       1371.729000
2                                       1590.305385
3                                       1816.012667
4                                        815.998000
..                                              ...
495                                     1510.662000
496                                      754.615000
497                                     2119.898000
498                                     1439.155000
499                                     1771.199000
       Groundwater Depletion Rate (%)
0                            3.255714
1                            3.120000
2                            2.733846
3                            2.708000
4                            1.902000
..                                ...
495                          2.431000
496                          2.628000
497                          2.871000
498                          1.597500
499                          1.638000
[500 rows x 9 columns]
· pd.read_csv("file.csv") loads the CSV file into a DataFrame.
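A minimal sketch of the loading step follows. The actual file name is not given in this practical, so an in-memory string stands in for the CSV on disk; it holds three rows and three of the nine columns copied from the output above.

```python
import io
import pandas as pd

# Stand-in for the CSV file: three rows and three columns from the output above.
csv_text = """Country,Year,Total Water Consumption (Billion Cubic Meters)
Argentina,2000,481.49
Argentina,2001,455.063
Argentina,2002,482.749231
"""

# With a real file on disk this would be pd.read_csv("file.csv").
df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.shape)  # (number of rows, number of columns)
```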
2. Perform descriptive exploration (head, summary statistics)
     Country Year Total Water Consumption (Billion Cubic Meters) \
0   Argentina 2000                                     481.490000
1   Argentina 2001                                     455.063000
2   Argentina 2002                                     482.749231
3   Argentina 2003                                     452.660000
4   Argentina 2004                                     634.566000
    Per Capita Water Use (Liters per Day)                   Agricultural Water Use (%)        \
0                              235.431429                                    48.550000
1                              299.551000                                    48.465000
2                              340.124615                                    50.375385
3                              326.756667                                    49.086667
4                              230.346000                                    38.670000
    Industrial Water Use (%)              Household Water Use (%)          \
0                  20.844286                            30.100000
1                  26.943000                            22.550000
2                  29.042308                            23.349231
3                  30.476000                            24.440000
4                  36.670000                            23.924000
    Rainfall Impact (Annual Precipitation in mm)                    \
0                                    1288.698571
1                                    1371.729000
2                                    1590.305385
3                                    1816.012667
4                                     815.998000
    Groundwater Depletion Rate (%)
0                         3.255714
1                         3.120000
2                         2.733846
3                         2.708000
4                         1.902000
· df.head() shows the first 5 rows of the dataset.
The df.describe() function in Pandas provides summary statistics of a DataFrame’s numerical
columns.
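Both calls can be tried on a small frame. The sketch below rebuilds two numeric columns from the five rows shown above rather than loading the full dataset, so the statistics it prints describe only those five rows.

```python
import pandas as pd

# Five rows taken from the df.head() output above (two numeric columns only).
df = pd.DataFrame({
    "Year": [2000, 2001, 2002, 2003, 2004],
    "Groundwater Depletion Rate (%)": [3.255714, 3.120000, 2.733846, 2.708000, 1.902000],
})

print(df.head(3))        # first 3 rows; head() with no argument returns 5
summary = df.describe()  # count, mean, std, min, quartiles and max per column
print(summary)
```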
3. Plot feature distributions (histograms, scatter plots).
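One way to draw both plot types with matplotlib is sketched below. It uses two columns from the five rows printed earlier, renders off-screen, and saves to a hypothetical file name (`feature_plots.png`); in a notebook, `plt.show()` would normally be used instead.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Two columns from the five rows printed earlier.
df = pd.DataFrame({
    "Per Capita Water Use (Liters per Day)": [235.431429, 299.551000, 340.124615, 326.756667, 230.346000],
    "Agricultural Water Use (%)": [48.550000, 48.465000, 50.375385, 49.086667, 38.670000],
})
x_col, y_col = df.columns

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: distribution of a single feature.
ax_hist.hist(df[x_col], bins=5)
ax_hist.set_xlabel(x_col)
ax_hist.set_ylabel("Frequency")

# Scatter plot: one feature against another.
ax_scatter.scatter(df[x_col], df[y_col])
ax_scatter.set_xlabel(x_col)
ax_scatter.set_ylabel(y_col)

fig.tight_layout()
fig.savefig("feature_plots.png")  # hypothetical output file name
```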
4. Check the linear relationship between two features
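A common numerical check is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it for two columns using only the five rows printed earlier, which is far too few to draw conclusions about the full dataset.

```python
import pandas as pd

# Two columns from the five rows printed earlier.
df = pd.DataFrame({
    "Industrial Water Use (%)": [20.844286, 26.943000, 29.042308, 30.476000, 36.670000],
    "Household Water Use (%)": [30.100000, 22.550000, 23.349231, 24.440000, 23.924000],
})

# Pearson correlation: values near -1 or 1 suggest a strong linear relationship,
# values near 0 suggest little or no linear relationship.
r = df["Industrial Water Use (%)"].corr(df["Household Water Use (%)"])
print(f"Pearson r = {r:.3f}")

# The full pairwise correlation matrix is available through df.corr().
print(df.corr())
```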
5. Split the dataset into 70% train and 30% test
· train_test_split() randomly divides New_Data into:
· 70% training data (train_data)
· 30% test data (test_data)
· test_size=0.3 → 30% of the data is used for testing.
· random_state=42 ensures that the split is reproducible (same split every time
you run it).
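The split can be sketched as follows, assuming scikit-learn is installed. The ten-row `New_Data` frame is a hypothetical stand-in for the real dataset.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 10-row stand-in for New_Data.
New_Data = pd.DataFrame({"feature": range(10), "target": [2 * v for v in range(10)]})

# test_size=0.3 keeps 30% of the rows for testing; the remaining 70% are for training.
train_data, test_data = train_test_split(New_Data, test_size=0.3, random_state=42)
print(len(train_data), len(test_data))
```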
                              Practical Number- 02
                       Introduction To Linear Regression
Linear regression is one of the most fundamental and widely used algorithms in machine
learning. It is a supervised learning technique used for predictive modeling, primarily to
estimate relationships between variables.
 Linear regression models the relationship between a dependent variable (target) and one
or more independent variables (features) by fitting a straight line to the data. The objective
is to find the best-fitting line that minimizes the error between the predicted and actual
values.
Types of Linear Regression
1. Simple Linear Regression: Involves a single independent variable (feature).
The equation is:
$y = mx + b$
where:
           ·   y is the predicted value,
           ·   m is the slope (coefficient),
           ·   x is the independent variable,
           ·   b is the y-intercept (bias).
2. Multiple Linear Regression: Involves multiple independent variables. The
equation extends to:
$y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$
where $b_0$ is the intercept, and $b_1, b_2, \dots, b_n$ are the coefficients for the respective features
$x_1, x_2, \dots, x_n$.
How Does Linear Regression Work?
Linear regression uses a statistical approach to estimate the best-fit line by minimizing the
difference between actual and predicted values. The most common method for this is
Ordinary Least Squares (OLS), which minimizes the sum of squared residuals (errors).
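As a rough illustration of OLS, the sketch below fits a multiple linear regression with NumPy's least-squares solver. The data are hypothetical and generated without noise, so the recovered coefficients match the generating ones and the residuals are essentially zero.

```python
import numpy as np

# Hypothetical data generated from y = 1 + 2*x1 - 3*x2 (no noise).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = 1 + 2 * X[:, 0] - 3 * X[:, 1]

# Prepend a column of ones so the first coefficient is the intercept b0.
X_design = np.column_stack([np.ones(len(X)), X])

# Least-squares solution: the coefficients that minimize the sum of squared residuals.
coefficients, *_ = np.linalg.lstsq(X_design, y, rcond=None)
residuals = y - X_design @ coefficients
print(coefficients)
```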
Evaluation Metrics
To assess the performance of a linear regression model, we use:
1. Mean Squared Error (MSE):
Mean Squared Error (MSE) is a common loss function used in regression tasks
within machine learning. It measures the average squared difference between the
actual (true) values and the predicted values from a model:
$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$
where:
· $n$ = number of data points
· $y_i$ = actual value of the ith data point
· $\hat{y}_i$ = predicted value of the ith data point
Interpretation:
        A lower MSE indicates better model performance, as the predictions are
         closer to the actual values.
        A higher MSE means the model has larger errors and does not fit the data
         well.
Applications of MSE in Machine Learning:
      Regression Models: Used as a loss function in algorithms like Linear
      Regression, Ridge Regression, and Lasso Regression.
      Neural Networks: Often used in deep learning models for continuous target
       variables.
     Model Evaluation: Helps compare different regression models based on their
        prediction accuracy.
     Hyperparameter Tuning: Used as the score when selecting settings such as learning
        rates and regularization strengths.
2. Root Mean Squared Error (RMSE):
       Root Mean Squared Error (RMSE) is a commonly used metric to evaluate the
        performance of regression models. It is the square root of the Mean Squared Error
        (MSE), so it is expressed in the same units as the target variable:
        $\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$
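3. R² Score (Coefficient of Determination):
 Measures the proportion of the variance in the dependent variable that the regression
 explains. A value of 1 indicates a perfect fit, and a value of 0 means the model does no
 better than always predicting the mean.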
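The three metrics can be computed directly from their definitions. The actual and predicted values below are hypothetical and chosen so the arithmetic is easy to follow by hand.

```python
import numpy as np

# Hypothetical actual and predicted values.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.0])

# MSE: average of the squared errors.
mse = np.mean((y_true - y_pred) ** 2)

# RMSE: square root of the MSE, in the same units as the target.
rmse = np.sqrt(mse)

# R^2: 1 minus (residual sum of squares / total sum of squares).
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"MSE={mse:.3f}, RMSE={rmse:.3f}, R2={r2:.3f}")
# prints: MSE=0.125, RMSE=0.354, R2=0.975
```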
 Applications of Linear Regression
 · Predicting house prices
 · Stock market forecasting
 · Sales and revenue prediction
(1.) Relationship Between Variables
 The question asks whether the insurance premium depends on driving experience
 or vice versa.
 · Independent Variable (X): Driving Experience (years)
 · Dependent Variable (Y): Monthly Auto Insurance Premium
 Expected Relationship:
 · As driving experience increases, the insurance premium is expected to
 decrease because experienced drivers are generally considered lower risk.
 · This suggests a negative correlation between the two variables.
Importing Important Libraries and Loading the Dataset
(2.) Plot the scatter diagram and regression line
(3.) Compute the correlation coefficient (r) and R²
(4.) Compute the residual standard error
(5.) Compute SSₓₓ, SSᵧᵧ, and SSₓᵧ
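The quantities in steps (3.) to (5.) can all be derived from the three sums of squares. The sketch below uses hypothetical experience and premium values, not the exercise's dataset, so its numbers are illustrative only.

```python
import numpy as np

# Hypothetical values, NOT the dataset of the exercise.
x = np.array([2.0, 5.0, 8.0, 12.0, 16.0, 20.0])     # driving experience (years)
y = np.array([85.0, 70.0, 66.0, 58.0, 50.0, 45.0])  # monthly premium

n = len(x)
ss_xx = np.sum((x - x.mean()) ** 2)
ss_yy = np.sum((y - y.mean()) ** 2)
ss_xy = np.sum((x - x.mean()) * (y - y.mean()))

# Least-squares slope and intercept of the regression line y = a + b*x.
b = ss_xy / ss_xx
a = y.mean() - b * x.mean()

# Correlation coefficient and coefficient of determination.
r = ss_xy / np.sqrt(ss_xx * ss_yy)
r_squared = r ** 2

# Residual standard error: sqrt(SSE / (n - 2)), with SSE = SSyy - b * SSxy.
sse = ss_yy - b * ss_xy
s_e = np.sqrt(sse / (n - 2))

print(f"SSxx={ss_xx:.2f}, SSyy={ss_yy:.2f}, SSxy={ss_xy:.2f}")
print(f"a={a:.3f}, b={b:.3f}, r={r:.3f}, r^2={r_squared:.3f}, s_e={s_e:.3f}")
```

With these made-up numbers the slope and r are negative, which matches the expected relationship described above.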
                                    Practical Number- 03
Python script to solve the linear regression problem based on the given data
(1.) Importing Important Libraries And Loading Dataset.
(2.) Compute SSₓₓ, SSᵧᵧ, and SSₓᵧ
(3.) Explain the meaning of a (intercept) and b (slope).
(4.) Calculate the correlation coefficient (r) and r².
(5.) Plot the scatter diagram and regression line.
(6.) Predict cholesterol level for a 60-year-old.
(7.) Compute the standard deviation of errors.
(8.) Construct a 95% confidence interval for B.
(9.) Perform a hypothesis test for B at a 5% significance level.
(10.) Test the positivity of the correlation coefficient at α = 0.025.
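Steps (6.) to (10.) can be sketched as below, assuming SciPy is available for the t-distribution. The age and cholesterol values are hypothetical, not the data of the exercise, so only the method carries over.

```python
import numpy as np
from scipy import stats

# Hypothetical ages and cholesterol levels, NOT the dataset of the exercise.
x = np.array([30.0, 38.0, 45.0, 52.0, 58.0, 65.0, 70.0])
y = np.array([180.0, 195.0, 200.0, 215.0, 222.0, 240.0, 245.0])
n = len(x)
dof = n - 2  # degrees of freedom for simple linear regression

ss_xx = np.sum((x - x.mean()) ** 2)
ss_yy = np.sum((y - y.mean()) ** 2)
ss_xy = np.sum((x - x.mean()) * (y - y.mean()))

# Slope b, intercept a, and the prediction for a 60-year-old.
b = ss_xy / ss_xx
a = y.mean() - b * x.mean()
prediction_at_60 = a + b * 60

# Standard deviation of errors and standard error of the slope.
s_e = np.sqrt((ss_yy - b * ss_xy) / dof)
s_b = s_e / np.sqrt(ss_xx)

# 95% confidence interval for B: b +/- t * s_b.
t_two_sided = stats.t.ppf(0.975, dof)
ci_low, ci_high = b - t_two_sided * s_b, b + t_two_sided * s_b

# Two-sided test of H0: B = 0 at the 5% significance level.
t_slope = b / s_b
p_slope = 2 * stats.t.sf(abs(t_slope), dof)
reject_slope_null = p_slope < 0.05

# One-sided test of H0: rho = 0 against H1: rho > 0 at alpha = 0.025.
r = ss_xy / np.sqrt(ss_xx * ss_yy)
t_corr = r * np.sqrt(dof / (1 - r ** 2))
reject_corr_null = t_corr > stats.t.ppf(1 - 0.025, dof)

print(f"prediction at 60: {prediction_at_60:.1f}")
print(f"95% CI for B: ({ci_low:.3f}, {ci_high:.3f})")
print(f"t={t_slope:.2f}, p={p_slope:.4f}, reject H0 (B=0): {reject_slope_null}")
print(f"r={r:.3f}, reject H0 (rho=0): {reject_corr_null}")
```

In simple linear regression the t statistic for the slope and the t statistic for the correlation coefficient are equal, so the two tests differ only in their alternative hypothesis and critical value.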