Skin cancer detection and classification using machine learning
Article in Materials Today: Proceedings · August 2020
DOI: 10.1016/j.matpr.2020.07.366
Materials Today: Proceedings xxx (xxxx) xxx
Contents lists available at ScienceDirect
Materials Today: Proceedings
journal homepage: www.elsevier.com/locate/matpr
Skin cancer detection and classification using machine learning
M. Krishna Monika a, N. Arun Vignesh a, Ch. Usha Kumari a,⇑, M.N.V.S.S. Kumar b, E. Laxmi Lydia c
a Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad, India
b Aditya Institute of Technology and Management, Srikakulam, Andhra Pradesh, India
c Vignan’s Institute of Information Technology, Visakhapatnam, Andhra Pradesh, India
Article info

Article history:
Received 10 July 2020
Accepted 15 July 2020
Available online xxxx

Keywords:
Dermoscopic images
Dull razor method
Filters
Feature extraction
ABCD method
GLCM method
Classification
MSVM

Abstract

Skin cancer is considered one of the most dangerous types of cancer, and deaths from it are rising sharply because of poor awareness of its symptoms and prevention. Early detection is therefore necessary to prevent the cancer from spreading. Skin cancer is divided into various types, of which the most hazardous are melanoma, basal cell carcinoma and squamous cell carcinoma. This work addresses the detection and classification of various types of skin cancer using machine learning and image processing tools. In the pre-processing stage, dermoscopic images are taken as input. The Dull Razor method is used to remove unwanted hair from the skin lesion, and a Gaussian filter is then used for image smoothing. A median filter is used to filter noise while preserving the edges of the lesion. Since color is an important feature in analyzing the type of cancer, color-based k-means clustering is performed in the segmentation phase. Statistical and texture features are extracted using the Asymmetry, Border, Color, Diameter (ABCD) method and the Gray Level Co-occurrence Matrix (GLCM). The experimental analysis is conducted on the ISIC 2019 Challenge dataset, which consists of 8 different types of dermoscopic images. For classification, a Multi-class Support Vector Machine (MSVM) was implemented, and the accuracy obtained is about 96.25%.

© 2020 Elsevier Ltd. All rights reserved.
Selection and peer-review under responsibility of the scientific committee of the International Conference on Nanotechnology: Ideas, Innovation and Industries.
⇑ Corresponding author.
E-mail address: ushakumari.c@gmail.com (Ch. Usha Kumari).
https://doi.org/10.1016/j.matpr.2020.07.366
2214-7853/© 2020 Elsevier Ltd. All rights reserved.
Please cite this article as: M. K. Monika, N. Arun Vignesh, C. Usha Kumari et al., Skin cancer detection and classification using machine learning, Materials Today: Proceedings, https://doi.org/10.1016/j.matpr.2020.07.366

1. Introduction

Skin cancer ranks sixth among the types of cancer that are increasing globally. Skin consists of cells, and these cells make up tissues. Cancer is caused by the abnormal or uncontrolled growth of cells in a tissue, which can spread to adjacent tissues. Exposure to UV rays, a depressed immune system, family history and similar factors may be the cause of the cancer. The resulting irregular cell growth can be either benign or malignant. Benign tumors are not cancerous; they are generally regarded as moles and are not harmful. Malignant tumors, in contrast, are cancerous and life-threatening, and they can also damage other tissues of the body. The skin contains three types of cells: basal cells, squamous cells and melanocytes. These are the cells from which cancerous tissue develops. There are different types of skin cancer, of which melanoma, basal cell carcinoma (BCC) and squamous cell carcinoma (SCC) are considered the dangerous types. The other types include melanocytic nevus, actinic keratosis (AK), benign keratosis, dermatofibroma and vascular lesions. Of all the types, melanoma is the most dangerous and can grow back even after removal. Australia and the United States are the countries most affected by skin cancer.

This paper uses suitable techniques to categorize all the types mentioned above. The Dull Razor method and a Gaussian filter are used for image enhancement, and a median filter is used for noise removal. These steps form the pre-processing stage. Color-based k-means clustering is used to segment the pre-processed images. To extract features from the segmented images, two methods are used: the ABCD method and the GLCM method. Features from both methods are combined for classification. Lastly, an MSVM classifier is used for classification in order to achieve high accuracy.

2. Related work

In [1], two types of skin cancer, melanoma and non-melanoma, are classified. Rather than using color or gray images alone, a combination of both was used
to get better results. Segmentation is performed using k-means clustering, and features are extracted with the ABCD method (Asymmetry, Boundary irregularity, Color, Diameter). A total of 150 images is used, 75 melanoma and 75 non-melanoma. Performance is evaluated using four classifiers, of which SVC and 1-NN achieved the highest accuracy with the same number of features.

In [2], a 3D reconstruction algorithm using 2D images is proposed, in which the 3D shape and the RGB values of the image are detected. The images are pre-processed and converted into binary images of 0s and 1s. An adaptive snake algorithm is used for segmentation. Along with the other features, a 3D depth estimation parameter is used to increase the efficiency of classification.

Early detection of melanoma is the best way to reduce the effect of the disease. The approach discussed in [3] uses an MSVM classifier. Five skin lesion types (actinic keratosis, squamous cell cancer, basal cell cancer, seborrhoeic verruca and nevocytic nevus) are grouped and considered by that system. GLCM is used to extract color and texture features such as contrast, gradient and homogeneity. K-means clustering is used for segmentation. The tumor area was calculated for all five types of images. The classification and segmentation results are shown using a GUI.

Melanoma is one of the most common types of skin cancer. The work in [4] proposes classifying melanoma using shearlet transform coefficients and a naïve Bayes classifier. The dataset is decomposed using the shearlet transform with a predefined number (50, 75 and 100) of shearlet coefficients. The required coefficients are then applied to the naïve Bayes classifier. The reported accuracy is achieved at the 3rd level of classification using 100 coefficients of the shearlet transform.

Dermoscopy is the major technique used to detect skin cancer. The dermoscopic images must be very clear, and an expert dermatologist is needed to deal with issues related to the disease, which makes the process time-consuming. The work in [5] presents the basic idea of an annotation tool that can improve manual segmentation methods by building a ground truth database for the automation of segmentation and classification, developed under the guidance of dermatologists. The main functionalities of this tool are image uploading and displaying, manual segmentation, boundary reshaping, region labelling, a posteriori boundary edition, multi-user ground truth annotation and segmentation comparison, and storage of the segmented images. Of these functionalities, the tool is most advantageous for boundary reshaping and free-hand drawing.

Feature extraction is the key step in any detection system. It consists of taking features from the input image or dataset and representing them as a set of values. The features can be of different types, such as color, shape, texture and morphological features, and which features are extracted depends on the application. The work in [6] covers different feature extraction techniques and proposes the most suitable one for skin cancer detection. In that system, hair removal is the first step, followed by segmentation using the Otsu method. The extracted features include circularity, high luminance scale, fast corners, solidity, shape skewness and border skewness, and the accuracy of each is computed. Among them, shape and texture + color features achieve a high accuracy of about 97%, which suggests they are the most suitable features for skin cancer detection.

To detect melanoma at an early stage, certain features should be analysed clearly [7]. Previous work considered skin images in the frequency domain, where the histogram profile is flat since the color of the skin lesions may be stable. Thus, this paper [9] proposes using grey images instead of the color profile for texture analysis. GLCM is used for feature extraction, whereas SVM is used as a classifier to classify the various types of skin cancer.

3. Proposed methodology

The proposed methodology is shown as a block diagram in Fig. 1, and each block is explained in detail below.

Fig. 1. Block diagram of proposed methodology.

Input image: The proposed system uses a dataset of high-resolution dermoscopic images. The ISIC 2019 Challenge dataset, which consists of eight different classes, is reduced to a subset of 800 images and applied to the proposed system [8–10].

Pre-processing: Image acquisition is often non-uniform in several respects. The main goal of the pre-processing step is therefore to enhance image properties such as quality and clarity by removing or reducing unwanted parts of the image or the background. The main steps of pre-processing are grayscale conversion, image enhancement and noise removal. In the proposed system, all images are first converted to grayscale. Then two filters, a Gaussian filter and a median filter, are used for image enhancement and noise removal. Along with the filters, the Dull Razor method is used to remove unwanted hair from the skin lesion.

The aim of image enhancement is to improve image quality by increasing its visibility. Most skin lesion images contain body hair, which can be an obstacle to achieving high accuracy at classification time. To remove the unwanted hair from the images, the Dull Razor method is used. The Dull Razor method mainly performs these operations:
a) Using a grayscale morphological operation, it locates the hair on the skin lesion.
b) After locating a hair pixel, it verifies that the shape is a thin or long structure and then replaces the hair pixel using bilinear interpolation.
c) Lastly, it smooths the replaced hair pixels with an adaptive median filter.
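The hair-removal operations of the Dull Razor method described in the pre-processing stage can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: it assumes a grayscale image stored as a list of rows, uses a morphological closing along the horizontal direction only, replaces hair pixels by linear interpolation along the row instead of bilinear interpolation, and omits the thin/long shape check and the final smoothing. The function names, the window size and the threshold are hypothetical.

```python
def closing_1d(row, size):
    """Grayscale closing of one row: dilation (max) followed by erosion (min)."""
    half = size // 2
    n = len(row)
    dilated = [max(row[max(0, i - half): i + half + 1]) for i in range(n)]
    return [min(dilated[max(0, i - half): i + half + 1]) for i in range(n)]


def hair_mask(gray, size=5, threshold=50):
    """Mark pixels that are much darker than their closed neighbourhood.

    Thin dark structures (hair) disappear under closing, so the difference
    'closed - original' is large exactly where hair is present.
    """
    mask = []
    for row in gray:
        closed = closing_1d(row, size)
        mask.append([c - p > threshold for c, p in zip(closed, row)])
    return mask


def inpaint_rows(gray, mask):
    """Replace masked pixels by interpolating the nearest unmasked pixels
    to the left and right on the same row (assumed simplification)."""
    out = [row[:] for row in gray]
    for y, row in enumerate(gray):
        n = len(row)
        for x in range(n):
            if not mask[y][x]:
                continue
            left = x - 1
            while left >= 0 and mask[y][left]:
                left -= 1
            right = x + 1
            while right < n and mask[y][right]:
                right += 1
            if left < 0 and right >= n:
                continue  # the whole row is masked; leave it unchanged
            if left < 0:
                out[y][x] = row[right]
            elif right >= n:
                out[y][x] = row[left]
            else:
                t = (x - left) / (right - left)
                out[y][x] = round(row[left] * (1 - t) + row[right] * t)
    return out
```

A vertical dark line on a bright background is detected by `hair_mask` and filled in by `inpaint_rows`. A practical version would repeat the closing for several orientations so that hairs in any direction are found.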
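The noise-removal step of the pre-processing stage relies on a median filter because it suppresses isolated outlier pixels while keeping lesion edges sharp. The following minimal sketch shows that behaviour on a grayscale image stored as a list of rows. The function name, the 3 × 3 window and the border handling (windows are clipped at the image edge) are assumptions made for illustration only.

```python
from statistics import median_low


def median_filter(gray, size=3):
    """Replace every pixel by the median of its size x size neighbourhood.

    Windows are clipped at the image border. median_low keeps the result
    an existing pixel value even when a clipped window has an even count.
    """
    half = size // 2
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                gray[yy][xx]
                for yy in range(max(0, y - half), min(h, y + half + 1))
                for xx in range(max(0, x - half), min(w, x + half + 1))
            ]
            out[y][x] = median_low(window)
    return out
```

On an image with a sharp vertical edge and a single bright outlier, the outlier is removed while the edge stays where it was, which is the property the pre-processing stage needs before segmentation.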
Fig. 2. Input image.

Gaussian filters are predominantly used to blur images and to remove redundant features from the skin lesion. They are low-pass filters that perform linear smoothing. The filter uses a 2D convolution operator whose weights are chosen in the shape of the Gaussian function.

Fig. 3. Pre-processing stage results, (a) Dull razor image, (b) Gray scale image, (c) Gaussian filter, (d) Median filter.

Segmentation: Segmentation is the process of separating the region of interest from the rest of the image. This separation can be done by grouping pixels of the image that share a similar attribute. The main advantage is that, instead of processing the entire image, only the segments into which it is divided need to be processed. The most common technique is to detect the edges of the region. Other approaches, such as thresholding, clustering and region growing, detect similarities within the region. Color-based k-means clustering is implemented here.

Clustering algorithms are unsupervised algorithms, but they are similar to classification algorithms. Clustering is the process of identifying segments, or clusters, in the data provided and separating them from the background. K-means clustering partitions the given data into k parts, known as clusters, that depend on k centroids. It is mainly used for unlabelled data, where groups can be formed based on similarities in the data. The main steps of the algorithm are:
a) Select the number of clusters, k.
b) Choose k random points to serve as the initial centroids.
c) Form the clusters by assigning each data point to its nearest centroid.
d) Compute the new centroid of each cluster and replace the old one.
e) Reassign each data point to its new closest centroid. If any reassignment occurred, repeat steps d) and e) until the assignments no longer change.

Feature extraction: Feature extraction is considered the most crucial part of the entire classification process [11]. It is the extraction of relevant features from the input dataset for use in subsequent computations such as detection and classification [12]. The proposed system uses two methods, ABCD and GLCM, to extract features from the skin lesions, and the results are combined into an Excel sheet. Features such as the asymmetry index, diameter, standard vector, mean color channel values, energy, entropy, autocorrelation, correlation, homogeneity and contrast are produced for classification.

The ABCD method is the standard method in dermatological applications. Certain symptoms need to be considered in skin cancer cases: Asymmetry, Border irregularity, Color and Diameter, known as the ABCD parameters. The method of finding these parameters is termed the ABCD method. Asymmetry is calculated from the area of the lesion, where the total area of the segmented image is divided into two halves. The asymmetry index is calculated by determining how well one half of the region matches the other half and is indicated with a score of 0, 1 or 2. Border irregularity is the abruptness and unevenness of the lesion border in the image. It is important to capture the color of images that are irregular in shade. For the color values, each color channel is separated and its average intensity and standard deviation are calculated. The diameter is extracted for all images. For example, the diameter of a malignant melanoma is greater than 6 mm.

In statistical texture analysis, texture features are classified as first, second and third order, according to how many pixel positions relative to one another are considered in the image. The Grey Level Co-occurrence Matrix (GLCM) method is a way of extracting second-order statistical texture features. GLCM performs the calculation by considering two pixels at a time, called the reference pixel and the neighboring pixel. It is defined with the help of a matrix whose number of rows and number of columns both equal the number of gray levels in the image. The matrix element P(i, j | Δx, Δy) is the relative frequency with which two pixels with intensities i and j, separated by a pixel distance (Δx, Δy), occur in the image. Of the 14 features defined on the co-occurrence matrix, energy, entropy, autocorrelation, correlation, homogeneity and contrast are considered.

Classification: MSVM is an extension of the support vector machine (SVM) that is used for solving multiclass problems. SVM is a highly accurate method [13]. It is based on the concept of decision planes, which define the decision boundaries that separate objects of different classes. In a multiclass classification problem, however, the output for one class has to be compared with the outputs for the other classes, which adds complexity. The multiclass problem is therefore divided into M sub-problems.
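The clustering steps listed in the Segmentation description can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the code used in the paper: pixels are assumed to be (r, g, b) tuples, distance is squared Euclidean distance in color space, and the function name, the iteration cap and the fixed random seed are hypothetical.

```python
import random


def kmeans(pixels, k, iterations=20, seed=0):
    """Cluster color tuples into k groups with plain k-means."""
    rng = random.Random(seed)
    # Step b: pick k random pixels as the initial centroids.
    centroids = rng.sample(pixels, k)
    labels = [None] * len(pixels)
    for _ in range(iterations):
        # Steps c and e: assign every pixel to its nearest centroid.
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(px, centroids[c])),
            )
            for px in pixels
        ]
        if new_labels == labels:
            break  # no reassignment happened, so the clustering is stable
        labels = new_labels
        # Step d: move each centroid to the mean color of its members.
        for c in range(k):
            members = [px for px, lab in zip(pixels, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return labels, centroids
```

For an image, the pixel list would be the flattened color values, and the returned labels reshaped to the image size give one mask per cluster, which is how the "objects in cluster 1, 2, 3" views of a segmentation result can be produced.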
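The co-occurrence matrix P(i, j | Δx, Δy) used for the GLCM texture features can be made concrete with a small sketch. It assumes an image already quantized to integer gray levels 0 to levels − 1, counts ordered (reference, neighbor) pairs without symmetrizing, and computes four of the features named in the text. Feature definitions vary between toolboxes; the formulas below (for example homogeneity with 1 + |i − j| in the denominator) are one common convention and are an assumption, as are the function names.

```python
import math


def glcm(gray, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix for the pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    h, w = len(gray), len(gray[0])
    for y in range(h):
        for x in range(w):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:
                # gray[y][x] is the reference pixel, gray[yy][xx] the neighbor.
                counts[gray[y][x]][gray[yy][xx]] += 1
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]


def glcm_features(p):
    """Contrast, energy, homogeneity and entropy of a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
    }
```

Low contrast together with homogeneity close to 1 indicates that neighboring pixels mostly share the same gray level, which is the kind of pattern such features are meant to summarize for the classifier.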
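The paper does not state which decomposition the MSVM uses to turn the multiclass problem into binary sub-problems. One common choice, assumed here purely for illustration, is one-vs-rest: one binary SVM is trained per class, and a sample is given the class whose decision value is largest. The sketch below shows only that decision rule for linear models; the weights, biases, class names and two-dimensional feature vectors are hypothetical, hand-picked values and are not results from the paper.

```python
def decision_value(weights, bias, x):
    """Signed score w . x + b of one binary 'this class vs rest' model."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict_one_vs_rest(models, x):
    """Return the class whose binary model gives the largest score.

    models maps a class name to a (weights, bias) pair.
    """
    return max(models, key=lambda name: decision_value(*models[name], x))


# Hypothetical models over two made-up features, for illustration only.
models = {
    "melanoma": ((1.0, 0.0), -0.5),
    "nevus": ((-1.0, 1.0), 0.0),
    "bcc": ((0.0, -1.0), 0.2),
}

print(predict_one_vs_rest(models, (2.0, 0.1)))
```

With M classes this scheme needs M binary models, each evaluated once per sample. One-vs-one is an alternative decomposition that trains a model for every pair of classes and decides by voting.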
4. Results

Input: An example image from the chosen dataset is shown in Fig. 2. The sample image shows a cancerous region of the skin.

Pre-processing stage: First, the Dull Razor method is applied to the input image. The image is then converted to grayscale, followed by the application of the Gaussian filter and the median filter. The pre-processing results are shown in Fig. 3.

Segmentation: The image is segmented using color-based k-means clustering, and the results are shown in Fig. 4.

Fig. 4. Segmentation results, (a) Image labelled by cluster index, (b) Objects in cluster 1, (c) Objects in cluster 2, (d) Objects in cluster 3.

Feature extraction: The features extracted from the input image using the ABCD and GLCM methods are listed in Table 1.

Table 1
Extracted features and their values.

Features                    Values
Standard vector             20.8532
Diameter                    2.1480
Asymmetry index             1
Color values of r, g, b     37.0471, 23.2337, 27.0009
Autocorrelation             2.520931623931624e+01
Contrast                    1.228632478632479e-01
Correlation                 9.894224944536026e-01
Energy                      1.669194389655928e-01
Entropy                     2.156049329513495e+00
Homogeneity                 9.411574074074074e-01

Classification: MSVM is used for classification. Since the ISIC dataset consists of about 25,000 images, which involves complexity, a total of 800 images are considered by following 200 images for each class. The training to testing ratio is 70:30. The confusion matrix is shown in Fig. 5.

Fig. 5. Confusion matrix.

The accuracy and precision achieved are about 96.25% and 96.32%, respectively.

5. Conclusion

Globally, there is a drastic increase in the rate of skin cancer cases because of several factors, so early detection plays a crucial role in treatment. This paper discusses an approach based on MSVM classification that uses two effective methods, ABCD and GLCM, for feature extraction. The accuracy achieved is about 96.25%. The proposed system classifies eight types of skin lesions with high accuracy and precision.
CRediT authorship contribution statement

M. Krishna Monika: Conceptualization, Investigation, Writing - review & editing. N. Arun Vignesh: Methodology, Resources, Writing - review & editing. Ch. Usha Kumari: Methodology, Writing - original draft, Supervision. M.N.V.S.S. Kumar: Software, Resources, Writing - original draft, Supervision. E. Laxmi Lydia: Investigation, Writing - original draft, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] Mohd Anas, Ram Kailash Gupta, Shafeeq Ahmad, "Skin Cancer Classification Using K-Means Clustering", International Journal of Technical Research and Applications, Volume 5, Issue 1, 2017.
[2] T.Y. Satheesha, D. Satyanarayana, M.N. Giriprasad, K.N. Nagesh, "Detection of Melanoma Using Distinct Features", 3rd MEC International Conference on Big Data and Smart City, 2016.
[3] R.S. Shiyam Sundar, M. Vadivel, "Performance Analysis of Melanoma Early Detection using Skin Lesion Classification System", International Conference on Circuit, Power and Computing Technologies [ICCPCT], 2016.
[4] S. Mohan Kumar, J. Ram Kumar, K. Gopalakrishnan, "Skin cancer diagnostic using machine learning techniques - shearlet transform and naïve Bayes classifier", Int. J. Eng. Adv. Technol. (IJEAT) 9 (2) (2019) 2249–8958.
[5] P.M. Ferreira, T. Mendonça, J. Rozeira, P. Rocha, "An annotation tool for dermoscopy image segmentation", in Proceedings of the 1st International Workshop on Visual Interfaces for Ground Truth Collection in Computer Vision Applications (p. 5), ACM, May 2012.
[6] Vedanti Chintawar, Jignyasa Sanghavi, "Improving Feature Selection Capabilities in Skin Disease Detection System", International Journal of Innovative Technology and Exploring Engineering (IJITEE), Volume 8, Issue 8S3, June 2019.
[7] Hutokshi Sui, Manisha Samala, Divya Gupta, Neha Kudu, "Texture feature extraction for classification of Melanoma", International Research Journal of Engineering and Technology (IRJET), Volume 05, Issue 03, March 2018.
[8] P. Tschandl, C. Rosendahl, H. Kittler, "The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions", Sci. Data 5 (2018) 180161, https://doi.org/10.1038/sdata.2018.161.
[9] N.C.F. Codella, D. Gutman, M.E. Celebi, B. Helba, M.A. Marchetti, S.W. Dusza, A. Kalloo, K. Liopyris, N. Mishra, H. Kittler, A. Halpern, "Skin Lesion Analysis Toward Melanoma Detection: A Challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), Hosted by the International Skin Imaging Collaboration (ISIC)", 2017, arXiv:1710.05006.
[10] Marc Combalia, Noel C.F. Codella, Veronica Rotemberg, Brian Helba, Veronica Vilaplana, Ofer Reiter, Allan C. Halpern, Susana Puig, Josep Malvehy, "BCN20000: Dermoscopic Lesions in the Wild", 2019, arXiv:1908.02288.
[11] C. Usha Kumari, A.K. Panigrahy, N. Arun Vignesh, "Sleep bruxism disorder detection and feature extraction using discrete wavelet transform", Lecture Notes in Elect. Eng. 605 (2020) 833–840.
[12] K. Swaraja, "Protection of medical image watermarking", J. Adv. Res. Dyn. Control Syst. 9 (Special issue 11) (2017) 480–486.
[13] K. Padmavathi, K.S.R. Krishna, "Myocardial infarction detection using magnitude squared coherence and Support Vector Machine", 2014 International Conference on Medical Imaging, m-Health and Emerging Communication Systems, MedCom 2014, art. no. 7006037, pp. 382–385, 2014.