Miniproject 2

The document outlines a mini-project focused on classifying breast cancer data using various machine learning models, including Naive Bayes, Artificial Neural Networks (ANN), and K-Nearest Neighbors (KNN). It details the data preprocessing steps, model training, and evaluation metrics such as accuracy, precision, and recall for each model. The results indicate that the ANN model achieved the highest accuracy of approximately 97.81%.


miniproject-2

August 6, 2024

[13]: import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Load the breast cancer dataset
df = pd.read_csv("C:\\Users\\SRIRAM\\OneDrive\\Documents\\WIN-SEM 2022-23\\B2_ML\\LAB\\PROJECT\\Breast_cancer.csv")

# Column names of the dataset
column_names = ['Sample_code_number', 'Clump_Thickness', 'Uniformity_of_Cell_Size',
                'Uniformity_of_Cell_Shape', 'Marginal_Adhesion', 'Single_Epithelial_Cell_Size',
                'Bare_Nuclei', 'Bland_Chromatin', 'Normal_Nucleoli', 'Mitoses', 'Class']
df.columns = column_names

# Change the target values from {2, 4} to {0, 1}
df['Class'] = np.where(df['Class'] == 2, 0, 1)

# Drop the 'Sample_code_number' column, which is not required for classification
df.drop('Sample_code_number', axis=1, inplace=True)

# Replace missing values (denoted by '?') with NaN
df.replace('?', np.nan, inplace=True)

# Drop the rows with missing values
df.dropna(inplace=True)

# Convert columns to a numeric data type
df = df.astype({'Bare_Nuclei': 'int64', 'Class': 'int64'})

# The updated df is shown as output
print("Updated data set without sample_code_number \n", df)

# Split the dataset into features and target
X = df.iloc[:, :-1]  # features into X
y = df.iloc[:, -1]   # target into y

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

Updated data set without sample_code_number
     Clump_Thickness  Uniformity_of_Cell_Size  Uniformity_of_Cell_Shape  \
0                  5                        1                         1
1                  5                        4                         4
2                  3                        1                         1
3                  6                        8                         8
4                  4                        1                         1
..               ...                      ...                       ...
694                3                        1                         1
695                2                        1                         1
696                5                       10                        10
697                4                        8                         6
698                4                        8                         8

     Marginal_Adhesion  Single_Epithelial_Cell_Size  Bare_Nuclei  \
0                    1                            2            1
1                    5                            7           10
2                    1                            2            2
3                    1                            3            4
4                    3                            2            1
..                 ...                          ...          ...
694                  1                            3            2
695                  1                            2            1
696                  3                            7            3
697                  4                            3            4
698                  5                            4            5

     Bland_Chromatin  Normal_Nucleoli  Mitoses  Class
0                  3                1        1      0
1                  3                2        1      0
2                  3                1        1      0
3                  3                7        1      0
4                  3                1        1      0
..               ...              ...      ...    ...
694                1                1        1      0
695                1                1        1      0
696                8               10        2      1
697               10                6        1      1
698               10                4        1      1

[683 rows x 10 columns]
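
The dropna call above discards every row whose Bare_Nuclei value was '?' (683 of the original 699 rows remain). As a hypothetical alternative sketch, not part of the original notebook, those rows could be kept by imputing the column median with scikit-learn's SimpleImputer:

[ ]: # Sketch: use this in place of df.dropna() above to keep rows with
# missing 'Bare_Nuclei' by filling them with the column median.
from sklearn.impute import SimpleImputer

df['Bare_Nuclei'] = pd.to_numeric(df['Bare_Nuclei'], errors='coerce')  # '?' is already NaN
imputer = SimpleImputer(strategy='median')
df[['Bare_Nuclei']] = imputer.fit_transform(df[['Bare_Nuclei']])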

[14]: # Naive Bayes
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Train the model
nb = GaussianNB()
nb.fit(X_train, y_train)

# Make predictions
y_pred = nb.predict(X_test)

# Compute the confusion matrix for Naive Bayes
sns.heatmap(confusion_matrix(y_test, y_pred), annot=True, cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion matrix for testing set using Naive Bayes')
plt.show()
print("\n")

# Evaluate performance (accuracy)
nb_accuracy = accuracy_score(y_test, y_pred)
print("Naïve Bayesian Classifier accuracy:", nb_accuracy)
print("\n")

# Compute the precision
nb_precision = precision_score(y_test, y_pred)
print("Naive Bayes precision:", nb_precision)
print("\n")

# Compute the recall
nb_recall = recall_score(y_test, y_pred)
print("Naive Bayes recall:", nb_recall)
print("\n")

Naïve Bayesian Classifier accuracy: 0.9562043795620438

Naive Bayes precision: 0.9482758620689655

Naive Bayes recall: 0.9482758620689655
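
f1_score is imported in the first cell but never computed. A minimal sketch, assuming the y_pred from the Naive Bayes cell above is still in scope; the same one-liner applies after the ANN and KNN cells:

[ ]: # Sketch: F1 score (harmonic mean of precision and recall) for Naive Bayes.
nb_f1 = f1_score(y_test, y_pred)
print("Naive Bayes F1 score:", nb_f1)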

[15]: # ANN
from sklearn.neural_network import MLPClassifier

# Train the model
ann = MLPClassifier(hidden_layer_sizes=(100,), max_iter=500)
ann.fit(X_train, y_train)

# Make predictions
y_pred = ann.predict(X_test)

# Compute the confusion matrix for ANN
sns.heatmap(confusion_matrix(y_test, y_pred), annot=True, cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion matrix for testing set using ANN')
plt.show()
print("\n")

# Evaluate performance (accuracy)
ann_accuracy = accuracy_score(y_test, y_pred)
print("ANN accuracy:", ann_accuracy)
print("\n")

# Compute the precision
ann_precision = precision_score(y_test, y_pred)
print("ANN precision:", ann_precision)
print("\n")

# Compute the recall
ann_recall = recall_score(y_test, y_pred)
print("ANN recall:", ann_recall)
print("\n")

ANN accuracy: 0.9781021897810219

ANN precision: 0.9661016949152542

ANN recall: 0.9827586206896551
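
MLPClassifier starts from random initial weights, so the ANN numbers above can change from run to run. A hedged sketch of a reproducible configuration; random_state=42 and early_stopping=True are illustrative assumptions, not the original settings:

[ ]: # Sketch: fix the random seed for reproducibility and stop training
# early when the validation score stops improving.
ann = MLPClassifier(hidden_layer_sizes=(100,), max_iter=500,
                    random_state=42, early_stopping=True)
ann.fit(X_train, y_train)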

[16]: # KNN
from sklearn.neighbors import KNeighborsClassifier

# Train the model
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Make predictions
y_pred = knn.predict(X_test)

# Compute the confusion matrix for KNN
sns.heatmap(confusion_matrix(y_test, y_pred), annot=True, cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion matrix for testing set using KNN')
plt.show()
print("\n")

# Evaluate performance (accuracy)
knn_accuracy = accuracy_score(y_test, y_pred)
print("KNN accuracy:", knn_accuracy)
print("\n")

# Compute the precision
knn_precision = precision_score(y_test, y_pred)
print("KNN precision:", knn_precision)
print("\n")

# Compute the recall
knn_recall = recall_score(y_test, y_pred)
print("KNN recall:", knn_recall)
print("\n")

C:\Users\SRIRAM\anaconda3\lib\site-
packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other
reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode`
typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will
change: the default value of `keepdims` will become False, the `axis` over which
the statistic is taken will be eliminated, and the value None will no longer be
accepted. Set `keepdims` to True or False to avoid this warning.
mode, _ = stats.mode(_y[neigh_ind, k], axis=1)
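
This FutureWarning comes from the installed scikit-learn version calling scipy.stats.mode with pre-1.11 defaults; it does not affect the predictions. Upgrading scikit-learn removes it, or it can be silenced explicitly (a sketch using the standard warnings module):

[ ]: # Sketch: suppress the SciPy FutureWarning; it is harmless for these results.
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)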

KNN accuracy: 0.9635036496350365

KNN precision: 0.9818181818181818

KNN recall: 0.9310344827586207
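
n_neighbors=5 above is an unexamined default. A minimal sketch, not part of the original run, that tunes k with 5-fold cross-validated grid search on the training set:

[ ]: # Sketch: choose k for KNN by cross-validated grid search.
from sklearn.model_selection import GridSearchCV

param_grid = {'n_neighbors': list(range(1, 21))}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)
print("Best k:", grid.best_params_['n_neighbors'])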

[23]: # Compare the three models on accuracy, precision, and recall
x = np.arange(3)
y1 = [nb_accuracy, nb_precision, nb_recall]
y2 = [ann_accuracy, ann_precision, ann_recall]
y3 = [knn_accuracy, knn_precision, knn_recall]

# Plot the data as grouped bars: each group is a metric, each color a model
width = 0.3
plt.bar(x - 0.3, y1, width, color='violet')
plt.bar(x, y2, width, color='yellow')
plt.bar(x + 0.3, y3, width, color='green')

plt.xticks(x, ['Accuracy', 'Precision', 'Recall'])
plt.xlabel("Metrics")
plt.ylabel("Values")
plt.legend(["Naive Bayes", "ANN", "KNN"])
plt.show()


[ ]:
