Meaningful Predictive Modeling Week-4 Assignment
    CANCER DISEASE PREDICTION
    In [1]:
    #importing the libraries
    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd
    In [16]:
    #importing our cancer dataset
    dataset = pd.read_csv('cancer.csv')
    X = dataset.iloc[:, 2:31].values   #columns 2..30 as features (this slice stops one column short of the last)
    Y = dataset.iloc[:, 1].values      #target: the diagnosis column (e.g. 'M' for malignant)
    In [17]:
    dataset.head()
    Out[17]:
              842302   M   17.99   10.38   122.8    1001     0.1184    0.2776    0.3001   0.1471    ...   25.38
     0        842517   M   20.57   17.77   132.90   1326.0   0.08474   0.07864   0.0869   0.07017   ...   24.99
     1      84300903   M   19.69   21.25   130.00   1203.0   0.10960   0.15990   0.1974   0.12790   ...   23.57
     2      84348301   M   11.42   20.38    77.58    386.1   0.14250   0.28390   0.2414   0.10520   ...   14.91
     3      84358402   M   20.29   14.34   135.10   1297.0   0.10030   0.13280   0.1980   0.10430   ...   22.54
     4        843786   M   12.45   15.70    82.57    477.1   0.12780   0.17000   0.1578   0.08089   ...   15.47
    5 rows × 32 columns
    In [18]:
    print("Cancer data set dimensions : {}".format(dataset.shape))
    Cancer data set dimensions : (568, 32)
    In [19]:
    #checking for missing values (isna is an alias of isnull; only the last result is displayed)
    dataset.isnull().sum()
    dataset.isna().sum()
    Out[19]:
    842302       0
    M            0
    17.99        0
    10.38        0
    122.8        0
    1001         0
    0.1184       0
    0.2776       0
    0.3001       0
    0.1471       0
    0.2419       0
    0.07871      0
    1.095        0
    0.9053       0
    8.589        0
    153.4        0
    0.006399     0
    0.04904      0
    0.05373      0
    0.01587      0
    0.03003      0
    0.006193     0
    25.38        0
    17.33        0
    184.6        0
    2019         0
    0.1622       0
    0.6656       0
    0.7119       0
    0.2654       0
    0.4601       0
    0.1189       0
    dtype: int64
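    The "column" names above are raw measurement values plus a diagnosis letter, which suggests cancer.csv has no header row and pandas has silently promoted the first patient record to the header (it also explains the numeric column names in dataset.head()). A minimal sketch of re-reading the file under that assumption, using header=None; the 2: slice also keeps the last feature column, which the 2:31 slice above drops:

    #hypothetical re-read, assuming cancer.csv ships without a header row
    dataset = pd.read_csv('cancer.csv', header=None)
    X = dataset.iloc[:, 2:].values   #all remaining columns as features
    Y = dataset.iloc[:, 1].values    #diagnosis column as the target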
    In [20]:
    #Encoding categorical data values
    from sklearn.preprocessing import LabelEncoder
    labelencoder_Y = LabelEncoder()
    Y = labelencoder_Y.fit_transform(Y)
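    As a quick sanity check (a minimal sketch, assuming the diagnosis labels are the usual 'B'/'M' strings), the fitted encoder exposes its mapping; LabelEncoder assigns codes in sorted order, so 'B' would become 0 and 'M' would become 1:

    #inspect the label mapping learned by the encoder
    print(labelencoder_Y.classes_)                              #e.g. ['B' 'M']
    print(labelencoder_Y.transform(labelencoder_Y.classes_))    #e.g. [0 1]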
    In [21]:
    # Splitting the dataset into the Training set and Test set
    from sklearn.model_selection import train_test_split
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.25, random_state = 0)
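    Since the two diagnosis classes are typically not balanced, an optional variant (not what this notebook uses) is to stratify the split so both sets keep the same class ratio:

    #optional: stratified split preserving the class proportions
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size = 0.25, random_state = 0, stratify = Y)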
    In [22]:
    #Feature Scaling
    from sklearn.preprocessing import StandardScaler
    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)
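    Note that the scaler is fitted on the training set only and merely applied to the test set, which avoids leaking test-set statistics into training. A small illustrative check using only the variables above:

    #after scaling, the training features have roughly zero mean and unit variance;
    #the test set is transformed with the training statistics, so it is only approximately standardized
    print(X_train.mean(axis = 0).round(2))
    print(X_train.std(axis = 0).round(2))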
    In [24]:
    #Using Logistic Regression Algorithm to the Training Set
    from sklearn.linear_model import LogisticRegression
    classifier1 = LogisticRegression(random_state = 0)
    classifier1.fit(X_train, Y_train)
    #Using KNeighborsClassifier Method of neighbors class to use Nearest Neighbor algorithm
    from sklearn.neighbors import KNeighborsClassifier
    classifier2 = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
    classifier2.fit(X_train, Y_train)
    #Using SVC method of svm class to use Support Vector Machine Algorithm
    from sklearn.svm import SVC
    classifier3 = SVC(kernel = 'linear', random_state = 0)
    classifier3.fit(X_train, Y_train)
    #Using SVC method of svm class to use Kernel SVM Algorithm
    from sklearn.svm import SVC
    classifier4 = SVC(kernel = 'rbf', random_state = 0)
    classifier4.fit(X_train, Y_train)
    #Using GaussianNB method of naive_bayes class to use Naive Bayes Algorithm
    from sklearn.naive_bayes import GaussianNB
    classifier5 = GaussianNB()
    classifier5.fit(X_train, Y_train)
    #Using DecisionTreeClassifier of tree class to use Decision Tree Algorithm
    from sklearn.tree import DecisionTreeClassifier
    classifier6 = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
    classifier6.fit(X_train, Y_train)
    #Using RandomForestClassifier method of ensemble class to use Random Forest Classification algorithm
    from sklearn.ensemble import RandomForestClassifier
    classifier7 = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
    classifier7.fit(X_train, Y_train)
    C:\Users\ROHINI\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py:432:
    FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver
    to silence this warning.
      FutureWarning)
    Out[24]:
    RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
                           max_depth=None, max_features='auto', max_leaf_nodes=None,
                           min_impurity_decrease=0.0, min_impurity_split=None,
                           min_samples_leaf=1, min_samples_split=2,
                           min_weight_fraction_leaf=0.0, n_estimators=10,
                           n_jobs=None, oob_score=False, random_state=0, verbose=0,
                           warm_start=False)
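    The FutureWarning above comes from relying on LogisticRegression's default solver. As the message says, it can be silenced by naming a solver explicitly (a minimal sketch using 'lbfgs', the future default mentioned in the warning):

    #specify the solver explicitly to silence the FutureWarning
    classifier1 = LogisticRegression(solver = 'lbfgs', random_state = 0)
    classifier1.fit(X_train, Y_train)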
    In [29]:
    Y_pred1 = classifier1.predict(X_test)
    Y_pred2 = classifier2.predict(X_test)
    Y_pred3 = classifier3.predict(X_test)
    Y_pred4 = classifier4.predict(X_test)
    Y_pred5 = classifier5.predict(X_test)
    Y_pred6 = classifier6.predict(X_test)
    Y_pred7 = classifier7.predict(X_test)
    In [30]:
    from sklearn.metrics import confusion_matrix
    cm1 = confusion_matrix(Y_test, Y_pred1)
    cm2 = confusion_matrix(Y_test, Y_pred2)
    cm3 = confusion_matrix(Y_test, Y_pred3)
    cm4 = confusion_matrix(Y_test, Y_pred4)
    cm5 = confusion_matrix(Y_test, Y_pred5)
    cm6 = confusion_matrix(Y_test, Y_pred6)
    cm7 = confusion_matrix(Y_test, Y_pred7)
    print(cm1)
    print(cm2)
    print(cm3)
    print(cm4)
    print(cm5)
    print(cm6)
    print(cm7)
    [[91  1]
     [ 2 48]]
    [[91  1]
     [ 6 44]]
    [[90  2]
     [ 4 46]]
    [[92  0]
     [ 6 44]]
    [[89  3]
     [ 6 44]]
    [[84  8]
     [ 6 44]]
    [[89  3]
     [ 6 44]]
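    Accuracy alone hides the cost of false negatives, which matter most in a cancer screening setting. A minimal sketch that derives sensitivity (recall on the malignant class) and precision from each confusion matrix, assuming label 1 is the malignant class as produced by the LabelEncoder above:

    #sklearn's confusion_matrix lays out a 2x2 matrix as [[tn, fp], [fn, tp]]
    def summarize(cm):
        tn, fp, fn, tp = cm.ravel()
        return tp / (tp + fn), tp / (tp + fp)   #sensitivity, precision

    for name, cm in [('LogR', cm1), ('KNN', cm2), ('SVM', cm3), ('K-SVM', cm4),
                     ('NB', cm5), ('DT', cm6), ('RF', cm7)]:
        sens, prec = summarize(cm)
        print(name, "sensitivity:", round(sens, 3), "precision:", round(prec, 3))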
    In [34]:
    from sklearn.metrics import accuracy_score
    acc1 = accuracy_score(Y_test, Y_pred1)*100
    acc2 = accuracy_score(Y_test, Y_pred2)*100
    acc3 = accuracy_score(Y_test, Y_pred3)*100
    acc4 = accuracy_score(Y_test, Y_pred4)*100
    acc5 = accuracy_score(Y_test, Y_pred5)*100
    acc6 = accuracy_score(Y_test, Y_pred6)*100
    acc7 = accuracy_score(Y_test, Y_pred7)*100
    print("LogR",acc1)
    print("KNN",acc2)
    print("SVM",acc3)
    print("K-SVM",acc4)
    print("NB",acc5)
    print("DT",acc6)
    print("RF",acc7)
    LogR 97.88732394366197
    KNN 95.07042253521126
    SVM 95.77464788732394
    K-SVM 95.77464788732394
    NB 93.66197183098592
    DT 90.14084507042254
    RF 93.66197183098592
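    These figures come from a single 75/25 hold-out split, so they are somewhat sensitive to the choice of random_state. An optional cross-validated comparison (a sketch, wrapping the scaler and one classifier in a pipeline so scaling is re-fitted inside each fold):

    #illustrative 10-fold cross-validation for one model
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score
    pipe = make_pipeline(StandardScaler(), LogisticRegression(solver = 'lbfgs', random_state = 0))
    scores = cross_val_score(pipe, X, Y, cv = 10)
    print("LogR CV accuracy:", round(scores.mean()*100, 2), "+/-", round(scores.std()*100, 2))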
    In [38]:
    import numpy as np
    import pandas as pd
    from pandas import Series, DataFrame
    import matplotlib.pyplot as plt
    data = [acc1, acc2, acc3, acc4,acc5,acc6,acc7]
    labels = ['LogR', 'KNN', 'SVM', 'KSVM', 'NB','DT','RF']
    plt.xticks(range(len(data)), labels)
    plt.xlabel('Algorithms')
    plt.ylabel('Accuracy(%)')
    plt.title('Comparison of Algorithms')
    plt.bar(range(len(data)), data, color=['pink', 'red', 'green', 'blue', 'cyan', 'yellow', 'purple'])
    plt.show()