Assignment No: 10
Data Analytics III
1. Implement the Simple Naïve Bayes classification algorithm using Python/R on
iris.csv
2. Compute the confusion matrix to find TP, FP, TN, FN, Accuracy, Error rate,
Precision, and Recall on the given dataset.
About the Dataset
The Iris dataset consists of 150 samples of iris flowers, each belonging to one of
three species: Setosa, Versicolor, and Virginica. For each sample, four features
were measured:
Sepal length in centimeters.
Sepal width in centimeters.
Petal length in centimeters.
Petal width in centimeters.
When loaded with scikit-learn's load_iris(), the dataset is returned as a Bunch object with the following attributes:
data : A 2D array containing the features of the dataset (150 samples x 4
features).
target : A 1D array containing the target variable (the species of each sample
encoded as integers: 0 for Setosa, 1 for Versicolor, and 2 for Virginica).
target_names : An array containing the names of the target classes (Setosa,
Versicolor, Virginica).
feature_names : An array containing the names of the features (Sepal Length,
Sepal Width, Petal Length, Petal Width).
DESCR : A description of the dataset.
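As a quick illustration (a minimal sketch, not part of the original notebook), the attributes listed above can be inspected directly after loading the dataset with scikit-learn:

from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data.shape)       # (150, 4) feature matrix
print(iris.target.shape)     # (150,) species labels encoded as 0, 1, 2
print(iris.target_names)     # ['setosa' 'versicolor' 'virginica']
print(iris.feature_names)    # ['sepal length (cm)', 'sepal width (cm)', ...]
print(iris.DESCR[:200])      # first part of the dataset description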
Step 1: Import all Necessary Libraries
In [40]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
Step 2: Load the dataset & Start Preprocessing
In [57]:
iris = load_iris()
data = pd.DataFrame(iris.data)
data['class'] = iris.target
# rename columns
data.columns = ['sepal_len', 'sepal_wid', 'petal_len', 'petal_wid', 'class']
data.head()
Out[57]:
   sepal_len  sepal_wid  petal_len  petal_wid  class
0        5.1        3.5        1.4        0.2      0
1        4.9        3.0        1.4        0.2      0
2        4.7        3.2        1.3        0.2      0
3        4.6        3.1        1.5        0.2      0
4        5.0        3.6        1.4        0.2      0
In [58]:
data.describe()
Out[58]: sepal_len sepal_wid petal_len petal_wid class
count 150.000000 150.000000 150.000000 150.000000 150.000000
mean 5.843333 3.057333 3.758000 1.199333 1.000000
std 0.828066 0.435866 1.765298 0.762238 0.819232
min 4.300000 2.000000 1.000000 0.100000 0.000000
25% 5.100000 2.800000 1.600000 0.300000 0.000000
50% 5.800000 3.000000 4.350000 1.300000 1.000000
75% 6.400000 3.300000 5.100000 1.800000 2.000000
max 7.900000 4.400000 6.900000 2.500000 2.000000
In [59]:
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 sepal_len 150 non-null float64
1 sepal_wid 150 non-null float64
2 petal_len 150 non-null float64
3 petal_wid 150 non-null float64
4 class 150 non-null int64
dtypes: float64(4), int64(1)
memory usage: 6.0 KB
In [60]:
#check null
data.isnull().sum()
Out[60]: sepal_len 0
sepal_wid 0
petal_len 0
petal_wid 0
class 0
dtype: int64
Model Training
Step 3: Split the dataset into features and target variable
In [62]:
X = data[['sepal_len','sepal_wid','petal_len','petal_wid']] # select the feature columns
Y = data['class'] # select the target variable (species)
Step 4: Split the data into training and testing sets
In [46]:
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=42)
train_test_split() is a function from scikit-learn that is commonly used to
split a dataset into two subsets:
one for training a machine learning model
the other for testing its performance
X, Y : These are the input features (X) and target variable (Y) that you want to
split into training and testing sets.
test_size=0.3 : The proportion of the dataset to include in the test split. Here it is set to 0.3,
meaning 30% of the samples are used for testing and 70% for training.
random_state=42 : This parameter is used to set the random seed for
reproducibility. Setting a specific random seed ensures that the data is split in the
same way each time the code is run, which is useful for obtaining consistent
results. In this case, the random seed is set to 42.
X_train, X_test, y_train, y_test :These are the resulting subsets of the
data. X_train and y_train contain the input features and target variable for the
training set, while X_test and y_test contain the input features and target variable
for the testing set.
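As a small sanity check (a sketch reusing X and Y from the cells above, not part of the original notebook), the sizes of the resulting splits can be verified; passing stratify=Y is an optional extra that keeps the species proportions equal in both subsets, though it would change the exact split shown here:

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=42)
print(X_train.shape, X_test.shape)   # (105, 4) (45, 4)
print(y_train.shape, y_test.shape)   # (105,) (45,)
print(y_test.value_counts())         # how many samples of each species land in the test set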
Step 5: Train the Naïve Bayes classifier
In [63]:
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
classifier = GaussianNB()
classifier.fit(X_train, y_train)
Out[63]: GaussianNB()
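GaussianNB models each feature within each class as a normal distribution. After fitting, the per-class means and variances it estimated can be inspected (a minimal sketch; theta_ and var_ are the attribute names in scikit-learn 1.0+, older versions expose sigma_ instead of var_):

print(classifier.classes_)       # array([0, 1, 2])
print(classifier.class_prior_)   # estimated prior probability of each class
print(classifier.theta_)         # shape (3, 4): mean of each feature per class
print(classifier.var_)           # shape (3, 4): variance of each feature per class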
In [65]:
from sklearn import metrics
y_pred = classifier.predict(X_test)
print("Accuracy Score: ", metrics.accuracy_score(y_test, y_pred)*100)
Accuracy Score: 97.77777777777777
In [70]:
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score
cm = confusion_matrix(y_test, y_pred)
# Get Number of classes
num_classes = cm.shape[0]
precision = []
# Calculate precision for all classes
for i in range(num_classes):
    correct_precision = cm[i][i]
    total_predicted_positives = sum(cm[:, i])
    precision.append(correct_precision / total_predicted_positives)
print("Precision for each class: ", precision)
# Calculate recall for all classes
recall = []
for i in range(num_classes):
    correct_predictions = cm[i][i]
    total_actual_positive = sum(cm[i, :])
    recall.append(correct_predictions / total_actual_positive)
print("Recall for each class: ", recall)
print("\nOverall")
conf_matrix = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='macro')
recall = recall_score(y_test, y_pred, average='macro')
error_rate = 1 - accuracy
print("Confusion Matrix:\n", conf_matrix)
print("Accuracy:", accuracy)
print("Error Rate:", error_rate)
print("Precision:", precision)
print("Recall:", recall)
Precision for each class: [1.0, 1.0, 0.9285714285714286]
Recall for each class: [1.0, 0.9230769230769231, 1.0]
Overall
Confusion Matrix:
[[19 0 0]
[ 0 12 1]
[ 0 0 13]]
Accuracy: 0.9777777777777777
Error Rate: 0.022222222222222254
Precision: 0.9761904761904763
Recall: 0.9743589743589745
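The assignment also asks for TP, FP, TN, and FN explicitly. In a multi-class problem these are defined per class (one-vs-rest); a short sketch deriving them from the confusion matrix cm computed above (not part of the original notebook):

for i in range(num_classes):
    TP = cm[i, i]                    # samples of class i correctly predicted as class i
    FP = cm[:, i].sum() - TP         # samples of other classes predicted as class i
    FN = cm[i, :].sum() - TP         # samples of class i predicted as some other class
    TN = cm.sum() - TP - FP - FN     # everything else
    print(f"Class {i}: TP={TP}, FP={FP}, FN={FN}, TN={TN}")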
In [71]:
# f1 score
from sklearn.metrics import f1_score
f1 = f1_score(y_test, y_pred, average='macro')
print("F1 Score: ", f1)
F1 Score: 0.974320987654321
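The macro F1 score is the unweighted mean of the per-class F1 scores, where each class's F1 is the harmonic mean of its precision and recall. A quick cross-check (a sketch reusing y_test and y_pred from above):

from sklearn.metrics import precision_score, recall_score

p = precision_score(y_test, y_pred, average=None)   # per-class precision
r = recall_score(y_test, y_pred, average=None)      # per-class recall
f1_per_class = 2 * p * r / (p + r)                  # harmonic mean for each class
print(f1_per_class.mean())                          # should match the macro F1 above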
In [72]:
# Classification report
print(classification_report(y_test, y_pred))
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      0.92      0.96        13
           2       0.93      1.00      0.96        13

    accuracy                           0.98        45
   macro avg       0.98      0.97      0.97        45
weighted avg       0.98      0.98      0.98        45