Article history
Received: 14 June 2013
Received in revised form: 9 September 2013
Accepted: 15 October 2013

Abstract

Accurate diagnosis of cancer plays an important role in saving human life. Diagnostic results indicated by medical experts often differ according to the experience of the individual expert, and this problem can put the lives of cancer patients at risk. From the literature, it has been found that Artificial Intelligence (AI) machine learning classifiers such as the Artificial Neural Network (ANN) and the Support Vector Machine (SVM) can help doctors diagnose cancer more precisely, and both have been proven to produce good cancer classification accuracy. The aim of this study is to compare the performance of the ANN and SVM classifiers on four different cancer datasets. For the breast cancer and liver cancer datasets, the features are based on the condition of the organs and the data are referred to as standard data, while the prostate cancer and ovarian cancer datasets are in the form of gene expression data. The datasets, which include benign and malignant tumours, are classified with the proposed methods. The performance of both classifiers is evaluated using four different measures: accuracy, sensitivity, specificity and Area Under the Curve (AUC). This study shows that the SVM classifier can obtain good performance in classifying cancer data compared to the ANN classifier.
Keywords: Support vector machine; artificial neural network; classification; cancer; accuracy
Abstrak

Accurate cancer diagnosis plays an important role in the effort to save human lives. Diagnostic decisions confirmed by medical experts differ according to their respective experience, and this problem can endanger the lives of cancer patients. Previous studies have found that Artificial Intelligence (AI) learning systems such as the Artificial Neural Network (ANN) and the Support Vector Machine (SVM) can help doctors diagnose cancer more accurately. Both algorithms have been proven to produce good cancer classification performance. The purpose of this study is to compare the performance of ANN and SVM in classifying four different cancer datasets. The features of the breast cancer and liver cancer data are based on the condition of the respective organs, while the prostate cancer and ovarian cancer datasets are in the form of gene expression data. The datasets are classified into two types of tumour, benign and malignant, using the proposed algorithms. The performance of both algorithms is evaluated using four different measures: accuracy, sensitivity, specificity and the Area Under the Curve (AUC). This study shows that SVM performs well in classifying cancer data compared to ANN.

Kata kunci: Support vector machine; artificial neural network; classification; cancer; accuracy
1.0 INTRODUCTION

Cancer is one of the leading causes of death in most countries around the world. The survival rate is strongly influenced by the stage of the malignancy (malignant tumour) at the point of diagnosis [1]. Thus, an early diagnosis is needed in order to give proper treatment to patients and to help reduce mortality and morbidity rates. Accurate diagnosis of the different types of cancer plays an important role in assisting doctors in determining and choosing the proper treatment [2]. Undeniably, the decisions made by doctors are the most important factor in diagnosis, but lately the application of various AI classification techniques has been proven to help doctors facilitate their decision-making process [3]. Recently, the use of AI classification techniques for cancer classification in the medical field has increased gradually. Possible errors that might occur due to unskilled doctors can be minimized by using classification
techniques. These techniques can also examine medical data in a shorter time and more precisely [3]. The aim of classification is to develop a set of models that are able to correctly classify the class of different objects. There are three types of inputs to such models: a set of objects, commonly described as the training data; the dependent variables, or the classes to which these objects belong; and the independent variables, a set of variables describing the different characteristics of the objects. Once a classification model is built, it can be used to classify the class of objects for which the class information is unknown [4].

Many types of classification algorithm, commonly known as classifiers, have been used for cancer diagnosis. Some of them are the Artificial Neural Network (ANN), Support Vector Machine (SVM), Genetic Algorithm (GA), Fuzzy Set (FS) and Rough Set (RS). They are used to classify cancer datasets into malignant tumours (cancerous) and benign tumours (non-cancerous). However, ANN and SVM are the classifiers that have received the most attention from researchers, and both have been proven to produce good classification accuracy. Several comparative studies on ANN and SVM have been conducted [3,5,6,7]; however, the results reported are inconsistent.

Due to the inconsistent results obtained, the aim of this study is to further validate the performance of both ANN and SVM in cancer classification. The performance of both classifiers is tested and evaluated on four different cancer datasets, which are divided into two types of cancer data, namely standard data and gene expression data. These four datasets are obtained from the UCI Machine Learning Repository and the National Cancer Institute (NCI). This paper is organized as follows. Section 2 reviews the related studies on cancer classification using the ANN and SVM classifiers and the basic concepts of ANN and SVM. Section 3 describes the datasets used. Section 4 describes the methodology of this study. Section 5 summarizes the results and discussion in tables to show the performance and comparison of the applied classifiers. Finally, the paper is concluded in Section 6.

2.0 LITERATURE REVIEW

2.1 Related Work

The ANN and SVM classifiers have been the most useful AI techniques for the research community in classifying cancer, and both have obtained excellent performance. For ANN, several studies have proven the excellent performance of this classifier. In [8], a study on liver biopsy images using a Probabilistic Neural Network (PNN) was carried out; the results show high performance of the PNN classifier, with 92% accuracy on the testing set. Besides that, an ANN classifier was also used for breast cancer classification on the Wisconsin Breast Cancer Database (WBCD) dataset [9]. A neural network with the feed-forward back propagation algorithm was used to classify the cancerous tumours from the symptoms associated with breast cancer, and this model produced a correct classification rate of 96.63% on the testing set. In 2010, [10] applied an ANN classifier to a lung cancer dataset consisting of CT images and obtained 84.6% accuracy on the unknown samples of the dataset. In 2006, [11] focused on classifying an ovarian cancer dataset; a novel Radial Basis Function (RBF) neural network was used to classify the data and was able to achieve 100% accuracy.

SVM classifiers have also gained the attention of researchers in classifying cancer. Recently, [12] proposed a cancer classification model using SVM for a prostate magnetic resonance spectra dataset; the results stress that the SVM classifier can obtain high accuracy (95.85%). In the same year, [13] applied an SVM classifier to a breast cancer dataset based on a digital ultrasound image database and obtained 86.92% accuracy with 321 samples. In addition, the SVM classifier was also used for the classification of a prostate cancer dataset by [14]. The SVM classifier was used to classify the dataset into two classes, namely normal and cancer samples, and the model produced an accurate classification rate of 95.09%. In 2007, [15] successfully classified a breast cancer dataset using the LS-SVM classifier with an accuracy rate of 98.53%.

Several comparative studies have been carried out on cancer classification in order to select the best technique for classifying cancer. However, the results obtained from these previous studies are inconsistent. Some studies state that ANN is better than SVM. The study conducted by [5], which compared the performance of ANN and SVM on Dynamic Magnetic Resonance Imaging (MRI) breast cancer data, found that the Probabilistic Neural Network (PNN) obtained the maximum accuracy among all classifiers. [6] also found that ANN outperforms SVM in the classification of Microcalcification Clusters (MCCs) in mammogram imaging. [7] studied the performance of the ANN and SVM classifiers on the Wisconsin Breast Cancer Database (WBCD) and demonstrated that the Radial Basis Function Neural Network (RBFNN) outperformed the polynomial SVM in correctly classifying the tumours.

In contrast, some studies have found that SVM performs better than ANN. In the study conducted by [16], the results showed that a polynomial SVM gives better results than ANN in classifying prostate cancer data; all of the results were determined based on the accuracy of each classifier. In 2011, [13] compared the performance of SVM and ANN in classifying a breast cancer dataset, and the experimental results demonstrate that the SVM classifier gives the best performance. Besides that, the study by [14] also stressed that SVM provides effective and powerful classification of a prostate cancer dataset compared to ANN. Table 1 summarizes the results obtained in each comparative study, and from this table it can be concluded that both classifiers can obtain good performance in classifying cancer. However, all of the previous studies compared the performance of the classifiers on only one type of cancer data, either standard data or gene expression data. Thus, this study is conducted to further validate the performance of both classifiers on both types of cancer data in order to verify which classifier performs better for both types.

Table 1 Summary of comparative studies

Author    ANN (%)    SVM (%)
[5]       94.00      88.00
[6]       78.00      72.00
[7]       96.57      92.13
[16]      79.30      81.10
[13]      86.60      86.92
[14]      94.11      95.09

2.2 Artificial Neural Network (ANN) Classifier

Artificial Neural Network (ANN) is a branch of computational intelligence that employs a variety of optimization tools to learn from past experiences and use that prior training to classify new
data, identify new patterns or make predictions. Artificial Neural Networks (ANNs) are gross simplifications of real (biological) networks of neurons. Inspired by the structure of the brain, a neural network consists of a set of highly interconnected entities, called nodes or units. Each unit is designed to mimic its biological counterpart, the neuron: each accepts a weighted set of inputs and responds with an output [17]. Figure 1 shows the working of a node in an Artificial Neural Network (ANN).

Figure 1 Working of a node in an ANN

The Back Propagation Neural Network (BPNN), also called the multi-layer feed-forward neural network or multi-layer perceptron, is very popular and the most widely used [9,17]. The BPNN is based on a supervised procedure, whereby the network constructs a model based on examples of data with known outputs [17]. Back propagation is a training algorithm in which signals travel in one direction, from the input neurons to the output neurons, without returning to their source [9]. A back propagation network consists of at least three layers of units: an input layer, at least one hidden layer and an output layer. The number of nodes in the input layer corresponds to the number of input variables, while the number of nodes in the output layer is determined by the number of output variables [5]. In the context of cancer classification, the value of the output variable is either zero for a benign tumour or 1 for a malignant tumour.

The term back propagation refers to the way the error computed at the output side is propagated backward from the output layer to the hidden layer, and finally to the input layer. Each iteration of back propagation constitutes two sweeps: a forward activation to produce a solution, and a backward propagation of the computed error to modify the weights. The forward and backward sweeps are performed repeatedly until the ANN solution agrees with the desired value within a pre-specified tolerance. The back propagation algorithm provides the needed weight adjustments in the backward sweep [9]. Table 2 shows the nine steps of the BP algorithm.
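To make the forward and backward sweeps concrete, the following minimal NumPy sketch trains a one-hidden-layer network on synthetic data, with a tangent-sigmoid hidden layer and a sigmoid output, stopping once the mean squared error falls below a tolerance. It is only an illustration of the procedure described above, not the paper's MATLAB implementation; the data, layer sizes, learning rate and tolerance are arbitrary assumptions.

    import numpy as np

    rng = np.random.default_rng(1)

    # Toy stand-in data: 2 input features, labels 0 (benign) / 1 (malignant).
    X = rng.normal(size=(100, 2))
    y = (X[:, 0] * X[:, 1] > 0).astype(float).reshape(-1, 1)

    n_in, n_hidden, n_out = 2, 5, 1
    W1 = rng.normal(scale=0.5, size=(n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, n_out))
    b2 = np.zeros(n_out)
    eta, E_min = 0.5, 0.02          # assumed learning rate and minimum-error tolerance

    for epoch in range(10000):
        # Forward sweep: tangent-sigmoid hidden layer, sigmoid output in [0, 1].
        h = np.tanh(X @ W1 + b1)
        o = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
        E = np.mean((y - o) ** 2)                    # total (mean squared) error
        if E < E_min:                                # stop once the error is small enough
            break
        # Backward sweep: propagate the error and adapt the weights.
        d_o = (o - y) * o * (1 - o)                  # output-layer delta (sigmoid derivative)
        d_h = (d_o @ W2.T) * (1 - h ** 2)            # hidden-layer delta (tanh derivative)
        W2 -= eta * h.T @ d_o / len(X)
        b2 -= eta * d_o.mean(axis=0)
        W1 -= eta * X.T @ d_h / len(X)
        b1 -= eta * d_h.mean(axis=0)

    print(f"stopped after {epoch + 1} epochs with MSE = {E:.4f}")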
2.3 Support Vector Machine (SVM) Classifier

The Support Vector Machine (SVM) is a machine learning method that has been extensively used as a classification tool and has found a great deal of success in many applications. SVM was originally developed based on the Vapnik-Chervonenkis (VC) theory and the structural risk minimization (SRM) principle [18, 19], which seek a trade-off between minimizing the training set error and maximizing the margin, in order to achieve the best generalization ability and remain resistant to overfitting. SVM is a method to estimate a function that classifies the data into two classes [16]. For cancer classification, the two classes are benign and malignant tumours. A very brief review of SVM is given in this section. There are two types of SVM classifier: the linear SVM and the non-linear SVM.

For the linear SVM, consider N pairs of training samples:

$(x_i, y_i), \quad i = 1, 2, \ldots, n$   (1)

where $x_i \in \mathbb{R}^k$ is a k-dimensional feature vector and $y_i \in \{+1, -1\}$ is the class label of $x_i$. A hyperplane in the feature space can be described as

$w \cdot x + b = 0$   (2)

where $w$ is an orthogonal vector and $b$ is a scalar. For linearly separable training samples, SVM generates the optimal hyperplane that separates the two classes with maximum margin and no training error [16, 20]. The hyperplane is placed midway between the two classes to maximize the margin [2]. Maximizing the separating margin is equivalent to maximizing the minimum value of the signed distance $d(i)$ from a point $x_i$ to the hyperplane [20, 21]. The value of $d(i)$ is given by

$d(i) = \dfrac{w \cdot x_i + b}{\lVert w \rVert}$   (3)

The parameter pair $(w, b)$ corresponding to the optimal hyperplane is the one that minimizes

$L(w) = \dfrac{1}{2}\lVert w \rVert^2$   (4)

subject to

$y_i (w \cdot x_i + b) \geq 1, \quad i = 1, 2, \ldots, n$   (5)

When the training samples are linearly non-separable, there is no hyperplane that is able to classify every training point correctly [20]. In order to handle this imperfect separation, the optimization can be generalized by introducing the concept of a soft margin [21]. The new optimization problem becomes: minimize

$L(w, \xi) = \dfrac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{N} \xi_i$   (6)

such that

$y_i (w \cdot x_i + b) \geq 1 - \xi_i, \quad i = 1, 2, \ldots, n$   (7)

where the $\xi_i$ are slack variables related to the soft margin, and C is the tuning parameter used to balance the margin and the training error. The optimization problem in (5) and (7) can be solved using Lagrange multipliers $\alpha_i$, which transform it into a quadratic optimization problem with a unique solution. According to the Kuhn-Tucker theorem of optimization theory [22], the optimal solution satisfies

$\alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right] = 0, \quad i = 1, 2, \ldots, n$   (8)

Equation (8) has non-zero Lagrange multipliers if and only if the points $x_i$ satisfy

$y_i (w \cdot x_i + b) = 1$   (9)

These points are called Support Vectors (SV) and lie either on or within the margin. Hence, if the $\alpha_i$ are the non-zero optimal solutions, the classification phase can be stated as

$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{n} \alpha_i y_i (x_i \cdot x) + b \right)$   (10)

When a linear SVM does not give good performance, a non-linear SVM is used. The function of the non-linear SVM is to map the feature vector $x$ by a non-linear mapping $\phi(x)$ into a high-dimensional feature space in which the optimal hyperplane is found [23]. The mapping into the feature
space can be performed using the kernel function, which computes the inner product of the vectors $\phi(x_i)$ and $\phi(x_j)$. The kernel function can be expressed as

$k(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$   (11)

The most commonly used kernel functions are the Radial Basis Function (RBF),

$k(x_i, x_j) = \exp\left( -\dfrac{\lVert x_i - x_j \rVert^2}{2\sigma^2} \right)$   (12)

where $\sigma$ is the parameter controlling the width of the kernel, and the polynomial function,

$k(x_i, x_j) = (x_i \cdot x_j + 1)^p$   (13)

where the parameter $p$ is the polynomial order. In this study, the RBF kernel function is used.
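As a concrete illustration of equations (10)-(12), the sketch below fits an RBF-kernel SVM with scikit-learn (not the LIBSVM/MATLAB toolchain used later in the paper) on synthetic data and reproduces its decision values from the support vectors, the coefficients $\alpha_i y_i$ and the bias $b$. The data and the chosen values of C and gamma are illustrative assumptions, with gamma = 1/(2σ²) in the notation of equation (12).

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 3))
    y = np.where(X[:, 0] + X[:, 1] ** 2 > 0.5, 1, -1)    # two synthetic classes, labels +1 / -1

    gamma = 0.5                      # gamma = 1 / (2 * sigma^2) in the notation of Eq. (12)
    clf = SVC(kernel="rbf", C=10.0, gamma=gamma).fit(X, y)

    def rbf_kernel(A, B, gamma):
        # Eq. (12): k(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * d2)

    # Eq. (10): f(x) = sgn( sum_i alpha_i * y_i * k(x_i, x) + b ), summed over the support vectors.
    # scikit-learn stores the products alpha_i * y_i in dual_coef_ and b in intercept_.
    K = rbf_kernel(clf.support_vectors_, X, gamma)
    decision = (clf.dual_coef_ @ K + clf.intercept_).ravel()

    assert np.allclose(decision, clf.decision_function(X))
    print("predicted labels:", np.sign(decision)[:5], "==", clf.predict(X)[:5])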
Table 2 The summarized steps of the BPNN algorithm

Step 1   Obtain a set of training patterns.
Step 2   Set up the ANN model, consisting of the number of input neurons, hidden neurons and output neurons.
Step 3   Set the learning rate (η) and the momentum rate (α).
Step 4   Initialize all connection weights ($W_{ij}$ and $W_{jk}$) and bias weights ($\theta_k$ and $\theta_i$) to random values.
Step 5   Set the minimum error, $E_{min}$.
Step 6   Start training by applying the input patterns one at a time, propagate through the layers, then calculate the total error.
Step 7   Back propagate the error through the output and hidden layers and adapt the weights $W_{jk}$ and $\theta_k$.
Step 8   Back propagate the error through the hidden and input layers and adapt the weights $W_{ij}$ and $\theta_i$.
Step 9   Check whether Error < $E_{min}$. If not, repeat Steps 6-9; if yes, stop training.

3.0 EXPERIMENTAL DATA

The performance of the proposed method was tested and evaluated using four different cancer datasets: breast cancer, liver cancer, prostate cancer and ovarian cancer. These datasets contain samples of benign and malignant tumours, and the aim of the classification is to classify the benign and malignant tumours correctly using the ANN and SVM classifiers. The breast cancer and liver cancer datasets are obtained from the UCI Machine Learning Repository, while the ovarian and prostate cancer datasets are obtained from the National Cancer Institute (NCI). A summary of all the datasets is given in Table 3.

The breast cancer dataset, the Wisconsin Breast Cancer Database (WBCD), was provided by W. Nick Street (1995) from the University of Wisconsin. The dataset consists of 683 samples after excluding missing values. These samples are divided into two classes: 444 benign tumours and 239 malignant tumours. There are 9 features in the dataset. For liver cancer, the BUPA Liver Disorders dataset, created by BUPA Medical Research Limited and provided by Richard S. Forsyth in 1990, is used. It contains 345 samples, of which 200 are benign tumours and 145 are malignant tumours, and each sample has 6 features. The breast cancer and liver cancer datasets represent the standard data. The prostate cancer dataset, named JNCI Data (7-3-02), consists of 322 serum spectra composed of peak amplitude measurements at 15154 points defined by corresponding m/z values in the range 0-20000 Da. There are 253 benign and 69 malignant samples in this dataset. The ovarian cancer dataset, labelled "Ovarian 8-7-02", consists of 253 samples. An upgraded PBSII SELDI-TOF mass spectrometer was employed to generate the spectra, which include 91 benign samples and 162 malignant samples. Each spectrum includes peak amplitude measurements at 15154 points defined by corresponding m/z values in the range 0-20000 Da.

From Table 3, it can be seen that the datasets differ in their number of features. The breast cancer and liver cancer datasets have few features, while the prostate and ovarian cancer datasets have a much larger number of features. The features of the breast cancer dataset are based on the physical appearance of the tumours, such as clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion and single epithelial cell size. For liver cancer, the features that indicate whether a tumour is benign or malignant are based on blood tests and the number of half-pint equivalents of alcoholic beverages drunk per day. The prostate and ovarian cancer datasets are both in the form of gene expression data, defined as the flow of genetic information from gene to protein. In this context, a data sample, commonly called a mass spectrum, contains thousands of different mass/charge ratios. For both of these datasets, each sample contains 15154 m/z values in the range 0-20000 Da, and these values are treated as the features in this study.

Table 3 Summary of the cancer datasets

Type of Cancer   Name of Dataset                          Number of Samples   Number of Features   Benign Tumours   Malignant Tumours
Breast           Wisconsin Breast Cancer Dataset (WBCD)   683                 9                    444              239
Liver            BUPA Liver Disorders                     345                 6                    200              145
Prostate         JNCI Data (7-3-02)                       322                 15154                253              69
Ovarian          Ovarian 8-7-02                           253                 15154                91               162

4.0 METHODOLOGY

4.1 Development of the SVM and ANN Classification Models

The development of the ANN and SVM classification models consists of four steps: input variable selection, data preprocessing and partitioning, setting of the model parameters, and model implementation. The difference between the ANN and SVM classification models lies in the setting of the model parameters and the model implementation. Figure 2 shows a summary of the steps involved in developing the classification models using the ANN and SVM classifiers. To facilitate the implementation of the classifiers, the Matlab R2012a Neural Network Toolbox is used to develop the ANN classification model, and the LIBSVM package introduced by [24] is implemented in Matlab R2012a to develop the SVM classification model.

The first step in developing the classification models is input variable selection. The network input variables vary for each dataset. The second step is data preprocessing and
partitioning. Data are usually pre-processed before being used for training in order to accelerate convergence. Hence, the data are normalized using a linear transformation, and the actual values are transformed into the range 0 to 1 using equation (14):

$X_n = \dfrac{X_0 - X_{min}}{X_{max} - X_{min}}$   (14)

where $X_n$ is the new value of X, $X_0$ is the initial value of X, $X_{min}$ is the minimum value of X in the sample data and $X_{max}$ is the maximum value of X in the sample data. Besides normalization, data preprocessing also involves data conversion, which requires that each data instance be represented as a vector of real numbers; thus, categorical attributes have to be converted into numeric data. For data conversion, [25] recommend using m numbers to represent an m-category attribute. In tumour classification, a sample is usually classified as either benign or malignant, so the class should be represented as (0, 1) before it can be supplied to the classifiers.
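A minimal Python sketch of this preprocessing step is given below: equation (14) applied column-wise, together with the benign/malignant labels encoded as 0/1. The example values are arbitrary placeholders; the paper performs the equivalent steps in MATLAB.

    import numpy as np

    def min_max_normalize(X):
        # Eq. (14): X_n = (X_0 - X_min) / (X_max - X_min), applied to each feature column.
        X = np.asarray(X, dtype=float)
        x_min, x_max = X.min(axis=0), X.max(axis=0)
        span = np.where(x_max > x_min, x_max - x_min, 1.0)   # guard against constant columns
        return (X - x_min) / span

    # Categorical class labels are converted to numeric values before training,
    # e.g. benign -> 0 and malignant -> 1, as described above.
    labels = np.array(["benign", "malignant", "benign", "malignant"])
    y = (labels == "malignant").astype(int)

    X_raw = np.array([[5.0, 1.0], [3.0, 2.0], [8.0, 1.5], [1.0, 4.0]])
    print(min_max_normalize(X_raw))
    print(y)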
The data are then divided into two partitions: a training set and a testing set. There is no specific rule to determine the division of data into training and testing sets [3]; in most cases, researchers use different combinations of data division, varying according to the problem. In this study, the datasets are split into training and testing partitions of 70% and 30%, respectively. The training set contains 70% of the data from each tumour class, benign and malignant, while the remaining 30% of the data is used as the testing set. For example, in the WBCD dataset, 70% (311) of the benign tumour samples and 70% (167) of the malignant tumour samples are grouped as the training set, and the other 30% of each tumour class is used as the testing set. The division of data for all datasets is summarized in Table 4.
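The split can be reproduced as a stratified 70/30 partition, sketched below with scikit-learn on a synthetic stand-in for the WBCD class sizes; the random seed and data are illustrative assumptions, and the resulting partition sizes match Table 4.

    import numpy as np
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for the WBCD class sizes: 444 benign (0) and 239 malignant (1) samples.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(683, 9))
    y = np.array([0] * 444 + [1] * 239)

    # 70/30 split, stratified so that both partitions keep the benign/malignant proportions.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=42)

    print(len(y_train), len(y_test))                  # 478 training and 205 testing samples
    print(np.bincount(y_train), np.bincount(y_test))  # class counts in each partition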
The third step, setting the model parameters, is very important: proper parameter settings can improve the classification accuracy of both ANN and SVM. Three types of parameters should be considered when training the Back Propagation Neural Network (BPNN): the network architecture, the transfer function and the learning parameters. The network architecture of the BPNN model consists of three layers: input, hidden and output. The number of hidden nodes in the hidden layer is different for each dataset and usually depends on the number of input nodes used; the number of hidden nodes is important because it can affect the results of the experiments. The tangent sigmoid is used as the transfer function in the input and hidden layers. The scaled conjugate gradient (SCG) back propagation algorithm was selected as the learning algorithm in this study. In SCG, the weight update is calculated as follows:

$w(i+1) = w(i) + \eta \, \sigma(i)$   (15)

$\sigma(i) = -\dfrac{\partial E}{\partial w(i)}$   (16)

where $i$ is the iteration count, $\eta$ is the learning rate, and $\sigma(i)$ is the step direction taken in the i-th iteration step. For the SVM classification model, two parameters of the RBF kernel function should be considered, namely the regularization parameter C and the gamma parameter $\gamma$. C determines the trade-off cost between minimizing the training error and the complexity of the model, while $\gamma$ defines the non-linear mapping from the input space to some high-dimensional space [21]. In this study, a parameter search is conducted in order to identify the best values of the parameters (C, $\gamma$) using a trial-and-error approach.

The last step in developing the classification models is model implementation. For the ANN, the best classification model is chosen based on the smallest value of the Mean Square Error (MSE) obtained during the training phase and is then used to classify the testing dataset, while for the SVM, the model is trained until the best pair of parameters (C, $\gamma$) is obtained. This process involves cross validation; in this study, 3-fold cross validation is used, meaning that each of the 3 subsets acts in turn as an independent holdout test set for the model trained on the remaining 2 subsets. The advantages of k-fold cross validation are that the impact of data dependency is minimized and the reliability of the results is improved. The best parameter pair (C, $\gamma$) is used to create the classification model, and the selected classification model is then tested on the testing dataset.
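The (C, γ) search with 3-fold cross validation can be sketched as a small grid search. The sketch below uses scikit-learn rather than the LIBSVM/MATLAB setup of the paper, and the candidate grid, data and scoring choice are illustrative assumptions.

    import numpy as np
    from sklearn.model_selection import GridSearchCV, StratifiedKFold
    from sklearn.svm import SVC

    # Synthetic stand-in for a normalized training partition.
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(200, 6))
    y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

    # Hypothetical candidate grid; the paper searches (C, gamma) by trial and error.
    param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}

    search = GridSearchCV(
        SVC(kernel="rbf"),
        param_grid,
        cv=StratifiedKFold(n_splits=3),   # 3-fold cross validation, as in the study
        scoring="accuracy",
    )
    search.fit(X_train, y_train)
    print("best (C, gamma):", search.best_params_)
    print("mean 3-fold CV accuracy:", round(search.best_score_, 3))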
Figure 2 Steps involved in developing the classification models using the ANN and SVM classifiers

Table 4 Division of the datasets

Name of Dataset                          Training Set (70%)   Testing Set (30%)
Wisconsin Breast Cancer Dataset (WBCD)   478                  205
BUPA Liver Disorders                     242                  103
JNCI Data (7-3-02)                       225                  97
Ovarian 8-7-02                           177                  76

4.2 Performance Measure

The performance of the classifiers was evaluated by the percentage of new cancer samples accurately assigned to their correct class, benign or malignant. Several measuring tools have been proposed to evaluate the performance of classifiers; they are sensitivity, specificity, accuracy and the area under the receiver operating characteristic curve (AUC). Each of them measures a different aspect of performance; in other words, the performance of the ANN, for example, is quantified by how accurately it can classify the benign and malignant tumours in data that have never been used in training.

Sensitivity, also defined as the True Positive Rate (TPR), is the percentage of benign tumour data classified as benign by the classifier. A classifier that correctly classifies benign tumours will have a higher sensitivity. Sensitivity is defined as follows [5,7]:

$\text{Sensitivity (\%)} = \text{TPR} = \dfrac{TP}{TP + FN} \times 100$   (17)
Specificity is the percentage of malignant tumour data classified as malignant by the classifier. A classifier that correctly classifies malignant tumours will have a better specificity. Specificity is also known as the True Negative Rate (TNR) and is calculated as follows [6,16]:

$\text{Specificity (\%)} = \text{TNR} = \dfrac{TN}{TN + FP} \times 100$   (18)

Accuracy evaluates the ability of the classifier to correctly classify both types of tumours; a higher accuracy indicates better performance of the classifier. It is given by [9,11,13,14]:

$\text{Accuracy (\%)} = \dfrac{TP + TN}{TP + FN + TN + FP} \times 100$   (19)

AUC represents a common measure of sensitivity and specificity over all possible thresholds. An AUC value of 100% represents perfect discrimination (the classifier can classify the tumours correctly), whereas an AUC value of 50% is equivalent to a random model. AUC is calculated as follows [5]:

$\text{AUC (\%)} = \dfrac{1}{2}\left( \dfrac{TP}{TP + FN} + \dfrac{TN}{TN + FP} \right) \times 100$   (20)

Even though several measuring tools are used to evaluate the performance of the classifiers, the best classifier is chosen based on the classification accuracy [5,6,7].
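The four measures in equations (17)-(20) can be computed directly from the confusion matrix, as in the short Python sketch below; the example labels are invented, and benign is treated as the positive class, as in the text.

    import numpy as np
    from sklearn.metrics import confusion_matrix

    # Invented example labels: 1 = benign (treated as the positive class), 0 = malignant.
    y_true = np.array([1, 1, 1, 0, 0, 1, 0, 1, 0, 0])
    y_pred = np.array([1, 1, 0, 0, 0, 1, 1, 1, 0, 0])

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

    sensitivity = tp / (tp + fn) * 100                    # Eq. (17), True Positive Rate
    specificity = tn / (tn + fp) * 100                    # Eq. (18), True Negative Rate
    accuracy = (tp + tn) / (tp + tn + fp + fn) * 100      # Eq. (19)
    auc = 0.5 * (tp / (tp + fn) + tn / (tn + fp)) * 100   # Eq. (20)

    print(sensitivity, specificity, accuracy, auc)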
5.0 RESULTS AND DISCUSSION

In this study, 8 different classification models were built using the two classifiers (i.e., ANN and SVM). The best classifier for each dataset is determined based on four measuring tools: accuracy, sensitivity, specificity and AUC. The results obtained are summarized in Table 5, and based on these results the best classification technique varies for each dataset.

Table 5 Performance of the SVM and ANN classifiers on each dataset (accuracy, sensitivity, specificity and AUC)
In terms of accuracy (Figure 3), the SVM classifier outperforms the ANN classifier on the WBCD and BUPA datasets. On the other hand, the ANN classifier obtains higher accuracy on the JNCI (7-3-02) and Ovarian 8-7-02 datasets. This indicates that SVM has the higher capability for classifying datasets with a smaller number of input features, while ANN performs better in accuracy on datasets with a larger number of input features.

Figure 4 and Figure 5 show the performance of the ANN and SVM classifiers in terms of sensitivity and specificity. Based on Figure 4, the ANN classifier gives better sensitivity for the BUPA and Ovarian 8-7-02 datasets, while both classifiers obtain the same sensitivity for WBCD and JNCI (7-3-02). For specificity (Figure 5), the SVM classifier outperforms the ANN classifier on the WBCD and BUPA datasets, while the ANN classifier gives better specificity on the JNCI (7-3-02) dataset; both classifiers obtain similar results for the Ovarian 8-7-02 dataset.

From these results, it can be seen that the SVM classifier is better than ANN at classifying the data that represent malignant tumours (specificity), because the SVM classifier obtained 100% specificity for three datasets: WBCD, BUPA and Ovarian 8-7-02. Even though the ANN classifier has higher sensitivity on two datasets (BUPA and Ovarian 8-7-02) compared to the SVM classifier, both classifiers obtain similar results on the other two datasets. Thus, it can be said that the SVM classifier also obtains good results in sensitivity; in other words, SVM can correctly classify the data belonging to benign tumours.

For the JNCI (7-3-02) and Ovarian 8-7-02 datasets, which have a larger number of input features, the imbalanced distribution of the benign and malignant data has a large effect on the performance of the SVM classifier in sensitivity and specificity. This can be seen in Figure 4 and Figure 5, where the sensitivity (correct classification of benign tumours) obtained by the SVM classifier for the Ovarian 8-7-02 dataset and its specificity (correct classification of malignant tumours) for the JNCI (7-3-02) dataset are zero. However, for the datasets with fewer input features, WBCD and BUPA, the imbalanced distribution of the two tumour classes did not affect the performance of the SVM classifier in sensitivity and specificity. Unlike the SVM classifier, the ANN classifier's sensitivity and specificity are affected by both the number of input features and the imbalanced class distribution: the sensitivity and specificity values obtained by the ANN classifier are higher for the tumour class that contains more data.

Lastly, as can be seen in Figure 6, ANN obtains a higher AUC value for the JNCI (7-3-02) and Ovarian 8-7-02 datasets, while the SVM classifier outperforms ANN in terms of AUC on the other two datasets. Similar to accuracy, SVM has a higher AUC when classifying datasets with a smaller number of input features, while ANN performs better in AUC on datasets with a larger number of input features.

In conclusion, both AI classification techniques perform well in classifying the cancer datasets, and both obtained good accuracy on the datasets used. The ANN classifier obtains good classification performance on the datasets with a larger number of input features (the prostate and ovarian cancer datasets), while the SVM classifier performs better on the datasets with a smaller number of input features (the breast cancer and liver cancer datasets). Although both classifiers give good results in accuracy and AUC, the SVM classifier is better at classifying the data belonging to each tumour class (sensitivity and specificity).
6.0 CONCLUSIONS

Accurate cancer classification is important in order to save human life. Besides using common diagnostic tools, most researchers nowadays are interested in using AI classification techniques to classify cancer. This study was conducted in order to compare the performance of two AI classification techniques, SVM and ANN, in classifying cancer data. Both techniques can be effective tools for classifying cancer data. In future work, different training rules can be used to train the ANN, while the SVM classifier can be trained with different kernel functions, in order to improve the performance of the classifiers.

Acknowledgement

This study is supported by the Fundamental Research Grant Scheme (vot: 4F086) sponsored by the Ministry of Higher Education (MOHE) and a GUP grant (vot: Q.J130000.2628.08J02). The authors would like to thank the Research Management Centre (RMC), Universiti Teknologi Malaysia, for the research activities and the Soft Computing Research Group (SCRG) for the support and motivation in making this study a success.

References

[1] Sattlecker M. 2011. Optimisation of Machine Learning Methods for Cancer Diagnostics using Vibrational Spectroscopy. PhD Thesis. Cranfield University, United Kingdom.
[2] Chu F., W. Xie, and L. Wang. 2004. Gene Selection and Cancer Classification using a Fuzzy Neural Network. IEEE.
[3] Polat K., S. Sahan, H. Kodaz, and S. Gunes. 2007. Breast Cancer and Liver Disorders Classification using Artificial Immune Recognition System (AIRS) with Performance Evaluation by Fuzzy Resource Allocation Mechanism. Expert Systems with Applications. 32: 172–183.
[4] Saravanan V., and R. Mallika. 2009. An Effective Classification Model for Cancer Diagnosis using Micro Array Gene Expression Data. 38th International Conference on Computer Engineering & Technology. 1: 137–141.
[5] Keyvanfard F., M. A. Shoorehdeli, and M. Teshnehlab. 2011. Feature Selection and Classification of Breast Cancer on Dynamic Magnetic Resonance Imaging using ANN and SVM. American Journal of Biomedical Engineering. 1: 20–25.
[6] Ren J. 2012. ANN vs. SVM: Which One Performs Better in Classification of MCCs in Mammogram Imaging. Knowledge-Based Systems. 26: 144–153.
[7] Subashini T. S., V. Ramalingam, and S. Palanivel. 2009. Breast Mass Classification Based on Cytological Patterns using RBFNN and SVM. Expert Systems with Applications. 36: 5284–5290.
[8] Pan S. M., and C. H. Lin. 2010. Fractal Features Classification for Liver Biopsy Images Using Neural Network-Based Classifier. International Symposium on Computer, Communication, Control and Automation. 2: 227–230.
[9] Azmi M. S., and Z. C. Cob. 2010. Breast Cancer Prediction Based on Backpropagation Algorithm. Student Conference on Research and Development (SCOReD). 164–168.
[10] Wu Y., N. Wang, H. Zhang, L. Qin, Z. Yan, and Y. Wu. 2010. Application of Artificial Neural Networks in the Diagnosis of Lung Cancer by Computed Tomography. Sixth International Conference on Natural Computation (ICNC). 1: 147–153.
[11] Chu F., and L. Wang. 2006. Applying RBF Neural Networks to Cancer Classification Based on Gene Expressions. International Joint Conference on Neural Networks. 1930–1934.
[12] Parfait S., P. M. Walker, G. Créhange, X. Tizon, and J. Mitéran. 2011. Classification of Prostate Magnetic Resonance Spectra using Support Vector Machine. Biomedical Signal Processing and Control. 7: 499–508.
[13] Liao R., T. Wan, and Z. Qin. 2011. Classification of Benign and Malignant Breast Tumors in Ultrasound Images Based on Multiple Sonographic and Textural Features. Third International Conference on Intelligent Human-Machine Systems and Cybernetics. 1: 71–74.
[14] Chen A. H., and C. H. Lin. 2011. A Novel Support Vector Sampling Technique to Improve Classification Accuracy and to Identify Key Genes of Leukemia and Prostate Cancers. Expert Systems with Applications. 38: 3209–3219.
[15] Polat K., and S. Güneş. 2007. Breast Cancer Diagnosis using Least Square Support Vector Machine. Digital Signal Processing. 17(4): 694–701.
[16] Murat C., E. Mehmet, Z. B. Erkan, and Y. A. Ziya. 2009. Early Prostate Cancer Diagnosis by using Artificial Neural Networks and Support Vector Machines. Expert Systems with Applications. 36: 6357–6361.
[17] Mumtaz K. 2009. Evaluation of Three Neural Network Models using Wisconsin Breast Cancer Database. International Conference on Control, Automation, Communication and Energy Conservation. 1–7.
[18] Vapnik V. 1995. The Nature of Statistical Learning Theory. New York: Springer.
[19] Vapnik V. 1998. Statistical Learning Theory. New York: Wiley.
[20] Liu Y., and Y. F. Zheng. 2004. FS_SFS: A Novel Feature Selection Method for Support Vector Machines. IEEE International Conference on Acoustics, Speech, and Signal Processing. 5: 797–800.
[21] Chen H. L. 2011. A Support Vector Machine Classifier with Rough Set-Based Feature Selection. Expert Systems with Applications. 38(7): 9014–9022.
[22] Bertsekas D. P. 1995. Nonlinear Programming. Belmont: Athena Scientific.
[23] Akay M. F. 2009. Support Vector Machines Combined with Feature Selection for Breast Cancer Diagnosis. Expert Systems with Applications. 36: 3240–3247.
[24] Chang C. C., and C. J. Lin. LIBSVM: A Library for Support Vector Machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[25] Hsu H. H., C. W. Hsieh, and M.-D. Lu. 2011. Hybrid Feature Selection by Combining Filters and Wrappers. Expert Systems with Applications. 38(7): 8144–8150.