4 - Jupyter Notebook http://localhost:8888/notebooks/Practicals_AI/4.ipynb
4. Apply Linear Regression Model techniques to predict data on any
dataset.
In [18]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
In [19]: df = pd.read_csv(r"C:\Users\ABHISHEK\Downloads\LungCapData - LungCapData.csv")
In [20]: df.head()
Out[20]:
   LungCap  Age  Height  Smoke  Gender  Caesarean
0    6.475    6    62.1     no    male         no
1   10.125   18    74.7    yes  female         no
2    9.550   16    69.7     no  female        yes
3   11.125   14    71.0     no    male         no
4    4.800    5    56.9     no    male         no
In [21]: df.isnull().sum()
Out[21]: LungCap 0
Age 0
Height 0
Smoke 0
Gender 0
Caesarean 0
dtype: int64
In [22]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 725 entries, 0 to 724
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 LungCap 725 non-null float64
1 Age 725 non-null int64
2 Height 725 non-null float64
3 Smoke 725 non-null object
4 Gender 725 non-null object
5 Caesarean 725 non-null object
dtypes: float64(2), int64(1), object(3)
memory usage: 34.1+ KB
1 of 6 30-10-2024, 22:06
In [23]: from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
In [24]: le = LabelEncoder()
In [25]: df.Smoke = le.fit_transform(df.Smoke)
In [26]: df.Gender = le.fit_transform(df.Gender)
In [27]: df.Caesarean = le.fit_transform(df.Caesarean)
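Reusing one LabelEncoder across three columns works, but each new fit overwrites the previous mapping, so the integer codes cannot be looked up afterwards. A minimal sketch of an alternative, assuming one encoder per column; the `toy` frame, the `encoders` dict and the `mappings` dict are illustrative names, not part of the notebook:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for two of the categorical columns (illustrative rows only)
toy = pd.DataFrame({
    "Smoke": ["no", "yes", "no"],
    "Gender": ["male", "female", "female"],
})

# One encoder per column, so each mapping can be inspected or inverted later
encoders = {}
for col in ["Smoke", "Gender"]:
    enc = LabelEncoder()
    toy[col] = enc.fit_transform(toy[col])
    encoders[col] = enc

# classes_ is sorted, so the position of each label is its integer code
mappings = {
    col: {str(label): code for code, label in enumerate(enc.classes_)}
    for col, enc in encoders.items()
}
print(mappings)
```

Because the classes are sorted alphabetically, "no" and "female" receive code 0 and "yes" and "male" receive code 1, which matches the encoded values shown by df.head() in the notebook.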
In [28]: df.head()
Out[28]:
   LungCap  Age  Height  Smoke  Gender  Caesarean
0    6.475    6    62.1      0       1          0
1   10.125   18    74.7      1       0          0
2    9.550   16    69.7      0       0          1
3   11.125   14    71.0      0       1          0
4    4.800    5    56.9      0       1          0
In [29]: x = df.drop(['LungCap'],axis = 1)
In [30]: x
Out[30]:
     Age  Height  Smoke  Gender  Caesarean
0      6    62.1      0       1          0
1     18    74.7      1       0          0
2     16    69.7      0       0          1
3     14    71.0      0       1          0
4      5    56.9      0       1          0
..   ...     ...    ...     ...        ...
720    9    56.0      0       0          0
721   18    72.0      1       1          1
722   11    60.5      1       0          0
723   15    64.9      0       0          0
724   10    67.7      0       1          0

725 rows × 5 columns
In [31]: y = df['LungCap']
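In [32]: # Note: train_test_split returns (X_train, X_test, y_train, y_test). The names below are
# swapped, so the model is fit on the 20% split (145 rows) and scored on the 80% split (580 rows).
X_test,X_train,y_test,y_train = train_test_split(x,y,test_size = 0.2,random_state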
In [34]: X_train.shape,y_train.shape
Out[34]: ((145, 5), (145,))
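train_test_split returns the train part before the test part for each array it is given. A minimal sketch of the conventional unpacking order on synthetic data with the same row and feature counts as this dataset; the names `X_demo`, `y_demo`, `X_tr`, `X_te`, `y_tr`, `y_te` and the value `random_state=0` are assumptions made for the sketch, not the notebook's values:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 725 rows and 5 feature columns, random values
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(725, 5))
y_demo = rng.normal(size=725)

# Conventional order: train first, then test, for each array passed in.
# random_state=0 is an arbitrary seed chosen for this sketch.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

print(X_tr.shape, X_te.shape)  # (580, 5) (145, 5)
```

With test_size = 0.2, the larger 580-row part is the training split and the 145-row part is held out for evaluation.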
In [35]: from sklearn.linear_model import LinearRegression
In [36]: lr = LinearRegression()
In [37]: lr.fit(X_train,y_train)
Out[37]: LinearRegression()
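After fitting, a LinearRegression object exposes the learned weights through its coef_ and intercept_ attributes. A minimal sketch on synthetic, noise-free data with a known relationship, so the recovered values can be checked by eye; `X_demo`, `y_demo` and `model` are illustrative names, not the notebook's variables:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with a known linear relationship: y = 3 + 2*x0 - 1*x1 (no noise)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
y_demo = 3.0 + 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1]

model = LinearRegression().fit(X_demo, y_demo)

# intercept_ is the constant term; coef_ holds one weight per feature column
print(model.intercept_, model.coef_)
```

On the notebook's data, pairing lr.coef_ with the columns of x would show how much the predicted LungCap changes per unit of each feature, holding the others fixed.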
In [38]: y_pred = lr.predict(X_test)
In [39]: from sklearn.metrics import mean_squared_error,r2_score,mean_absolute_error
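In [40]: mse = mean_squared_error(y_test,y_pred)
In [41]: err_train = y_test - y_pred  # residuals on the evaluation split, despite the "train" name
In [43]: mse = np.mean(np.square(err_train))  # same quantity as In [40], computed by hand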
In [44]: mse
Out[44]: 1.0559850321341964
In [45]: rmse = np.sqrt(mse)
In [46]: rmse
Out[46]: 1.0276113234750754
In [47]: r2_score(y_test,y_pred)
Out[47]: 0.8511008247863296
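The error metrics used here are closely related: RMSE is the square root of MSE, MAE averages absolute errors, and R² compares the residual sum of squares with the variance of the target. A minimal sketch on four hand-checkable values; `y_true` and `y_hat` are toy arrays chosen for the sketch, not results from the notebook:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Toy values: the errors are 1, 0, -1, 0
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.0, 5.0, 8.0, 9.0])

mse = mean_squared_error(y_true, y_hat)   # (1 + 0 + 1 + 0) / 4 = 0.5
rmse = np.sqrt(mse)                       # sqrt(0.5), about 0.707
mae = mean_absolute_error(y_true, y_hat)  # (1 + 0 + 1 + 0) / 4 = 0.5
r2 = r2_score(y_true, y_hat)              # 1 - 2/20 = 0.9

# The library MSE agrees with the manual mean of squared residuals
manual_mse = np.mean(np.square(y_true - y_hat))
print(mse, rmse, mae, r2, manual_mse)
```

RMSE and MAE are in the same units as the target, which makes them easier to interpret than MSE.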
In [48]: plt.plot(err_train,"*")
Out[48]: [<matplotlib.lines.Line2D at 0x165af248dc0>]
In [51]: plt.hist(err_train,bins=20,edgecolor='g')
plt.grid()
In [55]: y_test.shape,y_pred.shape
Out[55]: ((580,), (580,))
In [65]: d = {"Actual":(y_test),
"Predicted":(y_pred)}
In [66]: pred_actual_df = pd.DataFrame(d)
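When a DataFrame is built from a dict that mixes a Series and a NumPy array, the Series index becomes the row index and the array is placed against it by position. A minimal sketch using the first three rows shown in Out[67]; the names `actual`, `predicted` and `comparison`, and the extra Residual column, are assumptions added for illustration:

```python
import numpy as np
import pandas as pd

# First three rows of Out[67]: the Series keeps the original (shuffled) row labels
actual = pd.Series([6.300, 4.950, 7.800], index=[446, 6, 423], name="LungCap")
predicted = np.array([6.251456, 6.996499, 9.078604])  # plain array, no index

# The Series supplies the index; the array must have the same length
comparison = pd.DataFrame({"Actual": actual, "Predicted": predicted})

# Hypothetical extra column: per-row residual (actual minus predicted)
comparison["Residual"] = comparison["Actual"] - comparison["Predicted"]
print(comparison)
```

This only lines up correctly because the predictions are in the same row order as the Series they were computed for.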
In [67]: pred_actual_df
Out[67]:
     Actual  Predicted
446   6.300   6.251456
6     4.950   6.996499
423   7.800   9.078604
596   3.925   4.716138
411   8.675   8.229591
..      ...        ...
71    9.700   9.940940
106  10.875  11.602824
270   6.100   5.671011
435  11.300  10.971752
102   3.450   6.361120

580 rows × 2 columns
In [69]: sns.jointplot(x ='Actual',y = 'Predicted',data= pred_actual_df ,kind = 'reg')
plt.grid()