Assignment 5
Name: Satyajit Shinde
Div: TY AI C Roll No.: 41
PRN: 12211701
Understanding Polynomial Regression
Polynomial regression is a form of regression analysis in which the
relationship between the independent variable x and the dependent
variable y is modelled as an nth-degree polynomial. This technique is
particularly useful when the data exhibits a curvilinear relationship that
cannot be captured well by a simple linear regression model.
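As a minimal sketch of this idea, the snippet below fits both a straight line and a quadratic to a hypothetical noise-free quadratic dataset (the data here is assumed purely for illustration): the degree-2 fit recovers the curve essentially exactly, while the linear fit leaves large residuals.

```python
import numpy as np

# Hypothetical data following y = 2 + 3x + 0.5x^2 (noise-free, for illustration)
x = np.arange(0.0, 10.0, 1.0)
y = 2 + 3 * x + 0.5 * x ** 2

# Fit a straight line (degree 1) and a quadratic (degree 2)
lin_coeffs = np.polyfit(x, y, deg=1)
quad_coeffs = np.polyfit(x, y, deg=2)

# Mean squared error of each fit on the same points
lin_mse = np.mean((y - np.polyval(lin_coeffs, x)) ** 2)
quad_mse = np.mean((y - np.polyval(quad_coeffs, x)) ** 2)
print(f"Linear MSE: {lin_mse:.4f}, Quadratic MSE: {quad_mse:.4f}")
```

The quadratic's MSE is near zero because the data is itself quadratic; on real, noisy data both errors would be nonzero.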
Key Steps in Polynomial Regression
1. Data Preparation:
● Gather and preprocess the dataset, ensuring that it is clean
and ready for analysis.
● Identify the independent variable(s) (features) and the
dependent variable (target).
2. Choose Polynomial Degree:
● Determine the degree n of the polynomial based on the
nature of the data and the underlying relationship you wish to
model.
3. Feature Transformation:
● Transform the independent variable(s) into polynomial
features. For example, if you have a single feature x, you
would create features such as x², x³, …, xⁿ.
4. Model Fitting:
● Fit a polynomial regression model to the transformed features
using a suitable algorithm (e.g., ordinary least squares).
5. Model Evaluation:
● Evaluate the model's performance using appropriate metrics
(e.g., R-squared, Mean Squared Error) to assess how well it
fits the data.
6. Visualization:
● Plot the original data points along with the fitted polynomial
curve to visually assess how well the model captures the
underlying trend.
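The steps above can be sketched end to end with scikit-learn; the dataset below is synthetic and the choice of degree 2 is assumed, purely to illustrate the workflow.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error, r2_score

# Step 1: prepare a clean dataset (synthetic here, for illustration)
rng = np.random.default_rng(0)
X = np.linspace(0, 5, 30).reshape(-1, 1)                   # independent variable
y = 1.0 + 2.0 * X.ravel() ** 2 + rng.normal(0, 1.0, 30)   # noisy quadratic target

# Steps 2-3: choose degree n = 2 and build polynomial features [1, x, x^2]
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

# Step 4: fit ordinary least squares on the transformed features
model = LinearRegression()
model.fit(X_poly, y)

# Step 5: evaluate the fit with MSE and R-squared
pred = model.predict(X_poly)
print(f"MSE: {mean_squared_error(y, pred):.3f}, R^2: {r2_score(y, pred):.3f}")
# Step 6 (visualization) would plot X against y and pred with matplotlib.
```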
Benefits of Polynomial Regression
● Captures Non-Linear Relationships: Polynomial regression can
model complex relationships that linear regression cannot, making
it suitable for datasets with non-linear patterns.
● Flexibility: By adjusting the degree of the polynomial, you can
create a wide range of models that can fit various types of data
distributions.
● Improved Predictions: In cases where relationships are inherently
non-linear, polynomial regression can lead to better predictive
performance compared to linear models.
Choosing the Degree of Polynomial
● Low Degree (1-2): Suitable for simple curves; less risk of
overfitting.
● Moderate Degree (3-5): Often provides a good balance between
flexibility and overfitting; commonly used in practice.
● High Degree (>5): Can fit very complex relationships but risks
overfitting and may lead to poor generalization on unseen data.
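One common way to see this trade-off is to compare training and test error across degrees on held-out data. The sketch below uses a synthetic sine-shaped dataset (an assumption, not the assignment's dataset): training error keeps shrinking as the degree grows, while test error is where overfitting shows up.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error

# Synthetic sine-shaped data (assumed, for illustration only)
rng = np.random.default_rng(42)
X = np.sort(rng.uniform(0, 3, 40)).reshape(-1, 1)
y = np.sin(2 * X.ravel()) + rng.normal(0, 0.1, 40)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

train_mse, test_mse = {}, {}
for degree in (1, 3, 10):
    poly = PolynomialFeatures(degree=degree)
    model = LinearRegression().fit(poly.fit_transform(X_tr), y_tr)
    train_mse[degree] = mean_squared_error(y_tr, model.predict(poly.transform(X_tr)))
    test_mse[degree] = mean_squared_error(y_te, model.predict(poly.transform(X_te)))
    print(f"degree {degree:2d}: train MSE {train_mse[degree]:.4f}, "
          f"test MSE {test_mse[degree]:.4f}")
# Training error never increases with degree (the lower-degree features are a
# subset of the higher-degree ones), but test error typically stops improving
# once the model starts fitting noise.
```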
Problem Statement: WAP to implement Polynomial Regression
Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error, mean_absolute_error
# Load the dataset
data = pd.read_csv('Position_Salaries.csv')
X = data.iloc[:, 1:2].values # Independent variable (Position level)
y = data.iloc[:, 2].values # Dependent variable (Salary)
# Handle missing values in y (if any)
y = np.nan_to_num(y, nan=np.nanmean(y))
# Linear Regression using scikit-learn
lin_reg = LinearRegression()
lin_reg.fit(X, y)
linear_predictions = lin_reg.predict(X)
# Calculate error metrics for Linear Regression
linear_mse = mean_squared_error(y, linear_predictions)
linear_mae = mean_absolute_error(y, linear_predictions)
print(f"Linear Regression - MSE: {linear_mse:.2f}, MAE: {linear_mae:.2f}")
# Polynomial Regression Degree 2 using scikit-learn
poly_features_2 = PolynomialFeatures(degree=2)
X_poly_2 = poly_features_2.fit_transform(X)
poly_reg_2 = LinearRegression()
poly_reg_2.fit(X_poly_2, y)
poly_predictions_2 = poly_reg_2.predict(X_poly_2)
# Calculate error metrics for Polynomial Regression Degree 2
poly_mse_2 = mean_squared_error(y, poly_predictions_2)
poly_mae_2 = mean_absolute_error(y, poly_predictions_2)
print(f"Polynomial Regression Degree 2 - MSE: {poly_mse_2:.2f}, MAE: {poly_mae_2:.2f}")
# Polynomial Regression Degree 4 using scikit-learn
poly_features_4 = PolynomialFeatures(degree=4)
X_poly_4 = poly_features_4.fit_transform(X)
poly_reg_4 = LinearRegression()
poly_reg_4.fit(X_poly_4, y)
poly_predictions_4 = poly_reg_4.predict(X_poly_4)
# Calculate error metrics for Polynomial Regression Degree 4
poly_mse_4 = mean_squared_error(y, poly_predictions_4)
poly_mae_4 = mean_absolute_error(y, poly_predictions_4)
print(f"Polynomial Regression Degree 4 - MSE: {poly_mse_4:.2f}, MAE: {poly_mae_4:.2f}")
# Visualization for Linear Regression
plt.scatter(X, y, color='red')
plt.plot(X, linear_predictions, color='blue')
plt.title('Truth or Bluff (Linear Regression)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
# Visualization for Polynomial Regression Degree 2
X_grid = np.arange(X.min(), X.max(), 0.01).reshape(-1, 1) # For smoother curve plotting
X_grid_poly_2 = poly_features_2.transform(X_grid)
plt.scatter(X, y, color='red')
plt.plot(X_grid, poly_reg_2.predict(X_grid_poly_2), color='blue')
plt.title('Polynomial Regression Degree 2')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
# Visualization for Polynomial Regression Degree 4
X_grid_poly_4 = poly_features_4.transform(X_grid)
plt.scatter(X, y, color='red')
plt.plot(X_grid, poly_reg_4.predict(X_grid_poly_4), color='blue')
plt.title('Polynomial Regression Degree 4')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
Output:
Linear Regression - MSE: 26695878787.88, MAE: 128454.55
Polynomial Regression Degree 2 - MSE: 6758833333.33, MAE: 70218.18
Polynomial Regression Degree 4 - MSE: 210343822.84, MAE: 12681.82