10/8/21, 1:09 PM 20190802050_DS_Lab4
AI/ML LAB-4
Name: Pratik Jadhav
PRN: 20190802050
AIM: To implement two algorithms on a data set and compute the
accuracy score of the predictions
Q1. Write a program to implement k-Nearest Neighbour algorithm to classify the iris data
set. Print both correct and wrong predictions. Java/Python ML library classes can be used
for this problem.
In [1]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
In [2]:
iris_data = pd.read_csv("Iris.csv")
iris_data.head()
Out[2]: Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
0 1 5.1 3.5 1.4 0.2 Iris-setosa
1 2 4.9 3.0 1.4 0.2 Iris-setosa
2 3 4.7 3.2 1.3 0.2 Iris-setosa
3 4 4.6 3.1 1.5 0.2 Iris-setosa
4 5 5.0 3.6 1.4 0.2 Iris-setosa
In [3]:
len(iris_data)
Out[3]: 150
In [4]:
iris_data.isna().sum()
Out[4]: Id 0
SepalLengthCm 0
SepalWidthCm 0
PetalLengthCm 0
PetalWidthCm 0
Species 0
dtype: int64
localhost:8888/nbconvert/html/20190802050_DS_Lab4.ipynb?download=false 1/5
In [5]:
X = iris_data.drop("Species", axis=1)
y = iris_data["Species"]
len(X), len(y)
Out[5]: (150, 150)
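Note that dropping only "Species" leaves the "Id" column inside X. If the rows of Iris.csv are ordered by species (an assumption; the head of the file is at least consistent with it), the row number alone would hint at the label. A minimal sketch of keeping only the four measurements, written in plain Python with two rows copied from Out[2]; in pandas the equivalent would be iris_data.drop(["Id", "Species"], axis=1):

```python
# Two rows copied from the head of the data set shown in Out[2]
rows = [
    {"Id": 1, "SepalLengthCm": 5.1, "SepalWidthCm": 3.5,
     "PetalLengthCm": 1.4, "PetalWidthCm": 0.2, "Species": "Iris-setosa"},
    {"Id": 2, "SepalLengthCm": 4.9, "SepalWidthCm": 3.0,
     "PetalLengthCm": 1.4, "PetalWidthCm": 0.2, "Species": "Iris-setosa"},
]

# Keep every column except the row identifier and the label
feature_names = [name for name in rows[0] if name not in ("Id", "Species")]

X_features = [[row[name] for name in feature_names] for row in rows]
y_labels = [row["Species"] for row in rows]

print(feature_names)
print(X_features)
```

The scores recorded in this notebook were produced with "Id" still present, so they would likely change if the column were dropped.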
In [6]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=1)
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)
Out[6]: 0.9666666666666667
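To make the idea behind KNeighborsClassifier(n_neighbors=3) concrete, here is a minimal from-scratch sketch of k-nearest-neighbour voting. It is an illustration only, not scikit-learn's implementation; the function name knn_predict and the toy points (petal length, petal width) are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k nearest neighbours and take a majority vote over their labels
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (petal length, petal width) in cm
train_points = [(1.4, 0.2), (1.3, 0.2), (1.5, 0.2),
                (4.5, 1.5), (4.7, 1.4), (4.4, 1.3)]
train_labels = ["Iris-setosa"] * 3 + ["Iris-versicolor"] * 3

print(knn_predict(train_points, train_labels, (1.6, 0.3)))
print(knn_predict(train_points, train_labels, (4.6, 1.4)))
```

With k=3, a query point is assigned whichever species holds at least two of its three closest training points.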
In [7]:
y_preds = clf.predict(X_test)
y_preds[:10]
Out[7]: array(['Iris-setosa', 'Iris-versicolor', 'Iris-versicolor', 'Iris-setosa',
       'Iris-virginica', 'Iris-versicolor', 'Iris-virginica',
       'Iris-setosa', 'Iris-setosa', 'Iris-virginica'], dtype=object)
In [8]:
y_preds_proba = clf.predict_proba(X_test)
y_preds_proba[:10]
Out[8]: array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [1., 0., 0.],
       [0., 0., 1.],
       [0., 1., 0.],
       [0., 0., 1.],
       [1., 0., 0.],
       [1., 0., 0.],
       [0., 0., 1.]])
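With the default uniform weighting, the probabilities returned by a k-nearest-neighbour classifier are the fraction of the k neighbours that belong to each class, so with k=3 every entry is 0, 1/3, 2/3 or 1. The rows above are all 0 or 1 because the three neighbours agreed in each case. A small sketch of that calculation (the helper vote_fractions and the neighbour lists are hypothetical):

```python
from collections import Counter

def vote_fractions(neighbour_labels, classes):
    """Share of the neighbours that voted for each class, in the order of classes."""
    counts = Counter(neighbour_labels)
    return [counts[c] / len(neighbour_labels) for c in classes]

classes = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

# All three neighbours agree, as in every row printed above
unanimous = vote_fractions(["Iris-setosa"] * 3, classes)

# A hypothetical borderline point whose neighbours disagree
split = vote_fractions(
    ["Iris-versicolor", "Iris-virginica", "Iris-virginica"], classes
)

print(unanimous)
print(split)
```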
In [9]:
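from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
# Note: scikit-learn's metric functions expect (y_true, y_pred). With the
# arguments in the order used here, accuracy is unchanged, but the rows of the
# confusion matrix are predictions, and the precision and recall columns of the
# report are swapped.
accuracy = accuracy_score(y_preds, y_test)
print(f"The accuracy of the ML model for iris data: {accuracy * 100:.2f}%\n")
print(f"Classification Report: {classification_report(y_preds, y_test)}\n")
print(f"Confusion Matrix: \n{confusion_matrix(y_preds, y_test)}")
The accuracy of the ML model for iris data: 96.67%
Classification Report: precision recall f1-score support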
Iris-setosa 1.00 1.00 1.00 11
Iris-versicolor 0.92 1.00 0.96 12
Iris-virginica 1.00 0.86 0.92 7
accuracy 0.97 30
macro avg 0.97 0.95 0.96 30
weighted avg 0.97 0.97 0.97 30
Confusion Matrix:
[[11 0 0]
[ 0 12 0]
[ 0 1 6]]
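The numbers in the report can be recovered from the confusion matrix by hand. The sketch below uses the matrix exactly as printed; because the cell calls confusion_matrix(y_preds, y_test), rows follow the predictions and columns follow the test labels. The names row_share and col_share are hypothetical helpers for this illustration.

```python
# Confusion matrix as printed by confusion_matrix(y_preds, y_test):
# rows follow the first argument (predictions), columns the second (test labels).
labels = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
matrix = [
    [11, 0, 0],
    [0, 12, 0],
    [0, 1, 6],
]

total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(len(labels)))
accuracy = correct / total

row_share = {}  # diagonal / row total: share of each predicted class that is right
col_share = {}  # diagonal / column total: share of each actual class that is found
for i, label in enumerate(labels):
    row_total = sum(matrix[i])
    col_total = sum(row[i] for row in matrix)
    row_share[label] = matrix[i][i] / row_total
    col_share[label] = matrix[i][i] / col_total

print(f"accuracy: {accuracy:.4f}")
for label in labels:
    print(f"{label}: row share {row_share[label]:.2f}, "
          f"column share {col_share[label]:.2f}")
```

The row share matches the report's "recall" column and the column share matches its "precision" column. Since the rows here are predictions, the conventional reading is the other way round: Iris-virginica has precision 0.86 and Iris-versicolor has recall 0.92.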
In [10]:
from sklearn.model_selection import cross_val_score
# Cross-validate on the full data set; one accuracy score is returned per fold
cvs = cross_val_score(clf, X, y)
print(cvs)
print(f"Mean of each testing data set: {np.mean(cvs) * 100:.2f}%")
[0.66666667 1. 1. 1. 0.7 ]
Mean of each testing data set: 87.33%
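The five scores above come from splitting the data into five folds, each used once as the test set. The sketch below shows a plain, unshuffled k-fold split and the averaging step. It is a simplification under stated assumptions: for classifiers scikit-learn uses a stratified split, and the helper k_fold_indices is hypothetical.

```python
def k_fold_indices(n_samples, n_folds=5):
    """Split range(n_samples) into consecutive folds; return (train, test) index lists."""
    fold_size, remainder = divmod(n_samples, n_folds)
    folds = []
    start = 0
    for fold in range(n_folds):
        # The first `remainder` folds take one extra sample each
        stop = start + fold_size + (1 if fold < remainder else 0)
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        folds.append((train, test))
        start = stop
    return folds

folds = k_fold_indices(150)
for train, test in folds:
    print(f"train on {len(train)} rows, test on rows {test[0]}..{test[-1]}")

# Fold scores copied from the output above
scores = [0.66666667, 1.0, 1.0, 1.0, 0.7]
mean_score = sum(scores) / len(scores)
print(f"{mean_score * 100:.2f}%")
```

Each of the 150 rows is tested exactly once, and the reported 87.33% is the plain average of the five fold accuracies.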
In [11]:
y_testing = pd.Series(y_test).reset_index().drop("index",axis=1)
y_predictions = pd.Series(y_preds)
In [12]:
predictions_df = pd.DataFrame(data={
    "Species": y_testing["Species"],
    "Predicted Species": y_predictions
})
In [13]:
predicts = []
for index, i in enumerate(y_testing["Species"]):
    if i == y_preds[index]:
        predicts.append("Correct")
    else:
        predicts.append("Wrong")
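The same labelling can be written more compactly by pairing each actual label with its prediction. This is only a sketch of an alternative style; the two lists below are hypothetical and stand in for the test labels and predictions.

```python
from collections import Counter

# Hypothetical labels, chosen only to illustrate the comparison
actual = ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-versicolor"]
predicted = ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-virginica"]

# Pair each actual label with its prediction and mark the outcome
outcomes = ["Correct" if a == p else "Wrong" for a, p in zip(actual, predicted)]
counts = Counter(outcomes)

print(outcomes)
print(counts)
```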
In [14]:
predictions_df["Correct or Wrong"] = pd.Series(predicts)
predictions_df.head()
Out[14]: Species Predicted Species Correct or Wrong
0 Iris-setosa Iris-setosa Correct
1 Iris-versicolor Iris-versicolor Correct
2 Iris-versicolor Iris-versicolor Correct
3 Iris-setosa Iris-setosa Correct
4 Iris-virginica Iris-virginica Correct
In [15]:
print(f"Total Correct or Wrong Predictions:\n\
{predictions_df['Correct or Wrong'].value_counts()}")
Total Correct or Wrong Predictions:
Correct 29
Wrong 1
Name: Correct or Wrong, dtype: int64
Q2. Write a program to implement the naïve Bayesian classifier for a sample training data
set stored as a .CSV file. Compute the accuracy of the classifier, considering few test data
sets.
In [16]:
iris_data = pd.read_csv("Iris.csv")
iris_data.head()
Out[16]: Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
0 1 5.1 3.5 1.4 0.2 Iris-setosa
1 2 4.9 3.0 1.4 0.2 Iris-setosa
2 3 4.7 3.2 1.3 0.2 Iris-setosa
3 4 4.6 3.1 1.5 0.2 Iris-setosa
4 5 5.0 3.6 1.4 0.2 Iris-setosa
In [17]:
X = iris_data.drop("Species", axis=1)
y = iris_data["Species"]
len(X), len(y)
Out[17]: (150, 150)
In [18]:
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.3,
                                                    random_state=1)
gnb = GaussianNB()
gnb.fit(X_train, y_train)
gnb.score(X_test, y_test)
Out[18]: 1.0
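For intuition about what GaussianNB does, here is a minimal from-scratch sketch: each class gets a prior and a per-feature mean and variance, and a new row is assigned to the class with the highest log posterior under independent Gaussian features. It is not scikit-learn's implementation; the function names, the fixed smoothing constant and the toy data are all hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows, labels):
    """Estimate a prior and per-feature mean/variance for every class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        columns = list(zip(*members))
        means = [sum(col) / len(col) for col in columns]
        # Small constant avoids division by zero (a simplification of
        # the variance smoothing a library implementation would apply)
        variances = [
            sum((v - m) ** 2 for v in col) / len(col) + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (len(members) / len(rows), means, variances)
    return model

def predict_gaussian_nb(model, row):
    """Return the class with the highest log prior plus Gaussian log likelihood."""
    def log_posterior(prior, means, variances):
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
            for x, mean, var in zip(row, means, variances)
        )
        return math.log(prior) + log_likelihood
    return max(model, key=lambda label: log_posterior(*model[label]))

# Hypothetical toy data: (petal length, petal width) in cm
rows = [(1.4, 0.2), (1.3, 0.3), (1.6, 0.1),
        (4.5, 1.5), (4.7, 1.4), (4.4, 1.3)]
labels = ["Iris-setosa"] * 3 + ["Iris-versicolor"] * 3

model = fit_gaussian_nb(rows, labels)
print(predict_gaussian_nb(model, (1.5, 0.2)))
print(predict_gaussian_nb(model, (4.6, 1.4)))
```

Working in log space keeps the product of many small densities from underflowing.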
In [19]:
y_preds = gnb.predict(X_test)
y_preds[:10]
Out[19]: array(['Iris-setosa', 'Iris-versicolor', 'Iris-versicolor', 'Iris-setosa',
       'Iris-virginica', 'Iris-versicolor', 'Iris-virginica',
       'Iris-setosa', 'Iris-setosa', 'Iris-virginica'], dtype='<U15')
In [20]:
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_preds, y_test)
print(f"The accuracy of the ML model for iris data: {accuracy * 100:.2f}%")
The accuracy of the ML model for iris data: 100.00%
In [21]:
from sklearn.model_selection import cross_val_score
cvs = cross_val_score(gnb, X, y)
print(cvs)
print(f"Mean of each testing data set: {np.mean(cvs) * 100:.2f}%")
[0.96666667 1. 1. 1. 1. ]
Mean of each testing data set: 99.33%
In [22]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
# scikit-learn expects (y_true, y_pred); every prediction is correct here,
# so the swapped argument order does not change the output.
accuracy = accuracy_score(y_preds, y_test)
print(f"The accuracy of the ML model for iris data: {accuracy * 100:.2f}%\n")
print(f"Classification Report: {classification_report(y_preds, y_test)}\n")
print(f"Confusion Matrix: \n{confusion_matrix(y_preds, y_test)}")
The accuracy of the ML model for iris data: 100.00%
Classification Report: precision recall f1-score support
Iris-setosa 1.00 1.00 1.00 14
Iris-versicolor 1.00 1.00 1.00 18
Iris-virginica 1.00 1.00 1.00 13
accuracy 1.00 45
macro avg 1.00 1.00 1.00 45
weighted avg 1.00 1.00 1.00 45
Confusion Matrix:
[[14 0 0]
[ 0 18 0]
[ 0 0 13]]
Conclusion: We implemented the k-Nearest Neighbours and Gaussian Naive Bayes classifiers
on the iris data set and evaluated their predictions with accuracy, a classification report,
and a confusion matrix. k-Nearest Neighbours reached 96.67% accuracy on the held-out test
set and a mean of 87.33% across the five cross-validation folds. Naive Bayes reached 100%
on its test set and a mean of 99.33% across the five cross-validation folds.