SINDHUDURG SHIKSHAN PRASARAK MANDAL’S
A.P-Harkul Budruk, Nardave Road, TAL-Kankavli, DIST-Sindhudurg PIN-416602
Department of Computer Science and Engineering (AIML)
Lab Course: CSL601 Sub: DAVLAB
EXPERIMENT NO.3
Aim: To implement multiple linear regression in Python.
Regression models are used to describe relationships between variables by fitting a line to
the observed data. Regression allows you to estimate how a dependent variable changes as
the independent variable(s) change.
Multiple linear regression is used to estimate the relationship between two or more
independent variables and one dependent variable. You can use multiple linear regression
when you want to know:
1. How strong the relationship is between two or more independent variables and one
dependent variable (e.g. how rainfall, temperature, and amount of fertilizer added
affect crop growth).
2. The value of the dependent variable at a certain value of the independent variables
(e.g. the expected yield of a crop at certain levels of rainfall, temperature, and
fertilizer addition).
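Concretely, the model estimates an equation of the form y = b0 + b1·x1 + b2·x2 + … + bn·xn, where b0 is the intercept and b1…bn are the slopes for each independent variable. The short sketch below (the coefficient and input values are made up purely for illustration) shows how a single prediction is computed from that equation:

```python
import numpy as np

# Multiple linear regression model: y = b0 + b1*x1 + b2*x2 + ... + bn*xn
# Illustrative coefficients (hypothetical values, not fitted from data)
b0 = 5.0                       # intercept
b = np.array([2.0, 3.0, 1.5])  # slopes for x1, x2, x3

# Predict y for one observation x = (x1, x2, x3)
x = np.array([1.0, 2.0, 4.0])
y_hat = b0 + np.dot(b, x)      # 5 + 2*1 + 3*2 + 1.5*4 = 19.0
print(y_hat)                   # 19.0
```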
Assumptions of multiple linear regression
Multiple linear regression makes all of the same assumptions as simple linear regression:
1. Homogeneity of variance (homoscedasticity): the size of the error in our prediction
doesn't change significantly across the values of the independent variables.
2. Independence of observations: the observations in the dataset were collected using
statistically valid sampling methods, and there are no hidden relationships among
variables. In multiple linear regression, it is possible that some of the independent
variables are correlated with one another, so it is important to check this before
developing the regression model. If two independent variables are too highly
correlated (r² > ~0.6), then only one of them should be used in the regression model.
3. Normality: the data follows a normal distribution.
4. Linearity: the line of best fit through the data points is a straight line, rather than a
curve or some sort of grouping factor.
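The correlation check mentioned above can be sketched with NumPy's corrcoef. In this synthetic example (the variable names and data are illustrative, not from the experiment), one predictor is deliberately constructed to track another closely, so their squared correlation exceeds the ~0.6 threshold:

```python
import numpy as np

# Check pairwise correlation between candidate predictors before fitting.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.1 * rng.normal(size=200)  # nearly collinear with x1

X = np.column_stack([x1, x2, x3])
r = np.corrcoef(X, rowvar=False)  # 3x3 correlation matrix
r2 = r ** 2                       # squared pairwise correlations

# Flag pairs whose squared correlation exceeds the ~0.6 rule of thumb
for i in range(3):
    for j in range(i + 1, 3):
        if r2[i, j] > 0.6:
            print(f"x{i+1} and x{j+1} are highly correlated (r^2 = {r2[i, j]:.2f})")
```

Here the loop would flag the x1/x3 pair; in that situation only one of the two should be kept in the model.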
Input code:
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Set random seed for reproducibility
np.random.seed(42)
# Generate independent variables
X1 = 2 * np.random.rand(100, 1)
X2 = 3 * np.random.rand(100, 1)
X3 = 5 * np.random.rand(100, 1)
# Combine independent variables into a single matrix
X = np.hstack((X1, X2, X3))
# Generate dependent variable with noise
y = 5 + 2 * X1 + 3 * X2 + 1.5 * X3 + np.random.randn(100, 1)
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize and train the multiple linear regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Make predictions on test data
y_pred = model.predict(X_test)
# Calculate evaluation metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
# Extract model coefficients and intercept
intercept = model.intercept_[0]
coefficients = model.coef_[0]
# Print model parameters and performance metrics
print("\nModel Parameters:")
print(f"Intercept: {intercept:.2f}")
print(f"Coefficients: X1={coefficients[0]:.2f}, X2={coefficients[1]:.2f}, X3={coefficients[2]:.2f}")
print(f"Mean Squared Error: {mse:.2f}")
print(f"R2 Score: {r2:.2f}")
# Plot Actual vs Predicted values
plt.scatter(y_test, y_pred, color='blue', label="Predicted vs Actual")
plt.plot(y_test, y_test, color='red', linewidth=2, label="Perfect Fit Line")
plt.xlabel("Actual Values")
plt.ylabel("Predicted Values")
plt.title("Actual vs Predicted Values in Multiple Linear Regression")
plt.legend()
plt.show()
Output:
Model Parameters:
Intercept: 4.51
Coefficients: X1=2.33, X2=3.14, X3=1.59
Mean Squared Error: 2.10
R2 Score: 0.88
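As a sanity check on the metrics reported above, the R² score can be computed by hand as 1 − SS_res/SS_tot and compared with sklearn's r2_score. The small arrays below are illustrative values, not the experiment's data:

```python
import numpy as np
from sklearn.metrics import r2_score

# R^2 = 1 - SS_res / SS_tot, verified against sklearn's r2_score
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 8.5])

ss_res = np.sum((y_true - y_pred) ** 2)             # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)      # total sum of squares
r2_manual = 1 - ss_res / ss_tot

assert np.isclose(r2_manual, r2_score(y_true, y_pred))
print(f"R2 = {r2_manual:.3f}")
```

An R² of 0.88, as in the output above, means the model explains about 88% of the variance in the dependent variable on the test set.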