1, 2020 57
Abstract: Skin diseases are more sensitive than most other diseases, and
dermatitis is among the most common skin problems. The main focus of this research
paper is a dermatology database which contains different erythemato-squamous
disease classes: psoriasis, seborrheic dermatitis, lichen planus, pityriasis rosea,
chronic dermatitis and pityriasis rubra pilaris. Each record is a collection of 33
attributes which are linear values and one attribute which is nominal. 75% of
the dataset is used for modelling and 25% is held back for validation.
The purpose of this article is to find the best-performing classifier for this
collection of dermatological data. To that end, k-nearest neighbours and
support vector machines are used. The algorithms are assessed with ten-fold
cross validation using the accuracy metric. This is a gross metric that
indicates which of the developed models performs best.
Biographical notes: Vikas Chaurasia holds an MSc in Math and MCA from
UNSIET VBS Purvanchal University, U.P., India. He is currently a PhD
student and teaching assistant in the Computer Applications Department at the
V.B.S. Purvanchal University, where he teaches data mining, mathematics and
computer organisation. His research interests focus on the data mining
techniques to predict diseases. He has published more than 12 international
research papers related to disease prediction using data mining methods in
reputed journals. His area of research includes data mining, machine learning,
python programming, deep learning and artificial intelligence.
Saurabh Pal received his MSc in Computer Science in 1996 and obtained his
PhD in 2002. He then joined the Department of Computer Applications, VBS
Purvanchal University, Jaunpur as a Lecturer. Currently, he is working as Head
and Associate Professor. He has authored more than 53 research papers in
international/national conference/journals as well as four books and also guides
1 Introduction
“India suffers today in the estimation of the world, more through the world’s
ignorance of her achievements than in the absence or insignificance of these
achievements.” (Mukhopadhyay, 2016)
The Ebers Papyrus, a medical document on skin ailments dating to about 1,500 BC, was
found in ancient Egypt. It describes different skin diseases, including rashes, ulcers and
tumours, and recommends medical procedures and balms to treat the afflictions (Hartmann,
2016). From that time to the present, the skin disease segment has shown tremendous
growth. The prevalence of skin disease in India is 10 to 12% of the total
population, with eczema and psoriasis being the major contributors. Because of
pollution, ultraviolet light and global warming, photosensitive skin disorders
such as tanning, pigment darkening, sunburn, skin cancers and infectious
diseases are increasing at a faster pace. A 1% reduction in ozone leads to a 2 to 4%
increase in the incidence of tumours. The seriousness of the growing skin disease burden in
India is further underlined by the fact that the World Health Organization (WHO)
has included skin disease among the most common non-communicable diseases
in India. In addition, there is a lack of facilities that provide comprehensive skin-related
treatments under one roof. “The situation is further compounded by the low
availability of dermatologists in India. At present, there are around 6,000 dermatologists
serving a population of more than 135 crore. This means that for every 100,000 people,
just 0.49 dermatologists are available in India, compared with 3.2 in many
states of the US.” Various private tertiary care setups lack
the capacity to treat chronic, genetic and paediatric skin diseases. Likewise, their
capacity to provide complete dermatopathology and immunopathology units is also
limited. Non-surgical cosmetic services provided by standalone skin care
centres cater to only a small segment of skin treatment and a specific section of
society. Consequently, there is an urgent need for comprehensive skin care setups
providing all skin care treatments under one roof.
With the rapid development of computer-aided techniques in recent years, the use of
machine learning methods is playing an increasingly important role in skin disease
diagnosis, and different prediction algorithms are continually being explored by
researchers. When we work on a machine learning project, we often end up with
several good models to choose from. Each model will have different performance characteristics.
Using resampling techniques such as cross validation, we can estimate how accurate
each model may be on unseen data. We need to be able to use
these estimates to choose one or two of the best models from the suite of models that we have
created. When we have a new dataset, it is a good idea to visualise the data
using different techniques in order to look at the data from different
perspectives. The same idea applies to model selection. We should use a number of
different ways of looking at the estimated accuracy of our machine
Machine learning algorithms using binary classification 59
2 Literature survey
Various studies have addressed skin disease prediction and developed models
to achieve the best accuracy. The erythemato-squamous diseases all share the clinical
features of erythema and scaling, with very few differences. The diseases in this group are
psoriasis, seborrheic dermatitis, lichen planus, pityriasis rosea, chronic dermatitis and
pityriasis rubra pilaris (Güvenir et al., 1998). Several studies have reported on
the diagnosis of erythemato-squamous diseases. These studies have applied
different techniques to the problem and achieved different classification
accuracies. The main studies on the differential diagnosis of
erythemato-squamous diseases are shown in Table 1.
Table 1 A few investigations which have dealt with skin disease mining
In this paper, we have used two types of machine learning classification algorithms:
linear and nonlinear. The linear algorithms are
logistic regression (LR) and linear discriminant analysis (LDA); the nonlinear
algorithms are k-nearest neighbours (KNN), support vector machines (SVMs),
classification and regression trees (CARTs) and Gaussian Naive Bayes (NB).
Four ensemble methods are used to improve the performance of the algorithms:
two boosting algorithms, gradient boosting (GBM) and AdaBoost (AB), and two
bagging algorithms, extra trees (ETs) and random forests (RFs), which together form a
multi-model ensemble.
Finally, the proposed machine learning binary classification and multi-model ensemble
methods are used to enhance the output of the ensemble techniques.
3 Methods
The methodology of the proposed binary classification machine learning and multi-model
ensemble approach is shown in Figure 1. In the first stage, features are selected from the
erythemato-squamous dataset; selecting the important features yields the
best dataset. The best dataset is then divided into k groups of training and testing
datasets using k-fold cross validation.
We then evaluate the algorithms using the accuracy metric, a gross
metric that gives a quick idea of how correct a given model is. We apply the base
learners and evaluate their performance, and then evaluate the same base
learners on a standardised (scaled) copy of the dataset. In the next stage, an
ensemble model is used to improve the accuracy with four different machine learning
algorithms: two boosting and two bagging methods. Finally, we compare the predictions after
algorithm tuning with those of the ensemble models, with the aim of reducing the
generalisation error and obtaining a more accurate result.
The method of ten-fold cross validation used in this paper is outlined in Figure 2. We
split the loaded dataset into two parts: 80% is used to train
our models and 20% is held back as a validation dataset. We
use ten-fold cross validation to estimate accuracy. This splits our dataset into
ten parts, trains on nine and tests on one, and repeats for all combinations of train-test
splits. We use the accuracy metric to evaluate models. This is the ratio of the
number of correctly predicted instances to the total number of instances in
the dataset, multiplied by 100 to give a percentage.
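The ten-fold procedure and the accuracy metric described above can be sketched in plain Python. This is a minimal illustration rather than the code used in the paper: the function names, the seed value, and the `fit` and `predict` callables (stand-ins for any base learner) are assumptions introduced here.

```python
import random


def k_fold_indices(n_samples, k=10, seed=7):
    """Shuffle the indices 0..n_samples-1 and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i.
    return [indices[i::k] for i in range(k)]


def accuracy(y_true, y_pred):
    """Correctly predicted instances divided by total instances, times 100."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * correct / len(y_true)


def cross_val_accuracy(X, y, fit, predict, k=10, seed=7):
    """Train on k-1 folds, test on the held-out fold, and repeat for every fold.

    `fit(X_train, y_train)` and `predict(model, x)` are hypothetical callables
    standing in for whichever learner is being evaluated.
    """
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        y_pred = [predict(model, X[j]) for j in test_idx]
        scores.append(accuracy([y[j] for j in test_idx], y_pred))
    return scores
```

The function returns one accuracy value per fold, so both the mean and the spread across folds can be inspected afterwards.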
Figure 3 An illustration of the ensemble techniques structure (see online version for colours)
Output: $H(x) = \arg\max_{y \in Y} \sum_{n=1}^{N} \mathbb{1}\left(h_n(x) = y\right)$
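The ensemble output rule picks the label that receives the most votes from the N base learners. A minimal sketch of such a majority vote is given below; it assumes each base learner is a callable returning a class label, and the function name is hypothetical.

```python
from collections import Counter


def majority_vote(base_learners, x):
    """Return the label y that maximises the number of learners with h_n(x) == y."""
    votes = Counter(h(x) for h in base_learners)
    # most_common(1) returns [(label, count)] for the label with the most votes;
    # on a tie, the label that was counted first wins.
    return votes.most_common(1)[0][0]
```

For example, if two of three learners predict psoriasis and one predicts lichen planus, the ensemble output is psoriasis.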
4 Results
Here we used distance-based algorithms such as KNN and SVMs, with
ten-fold cross validation. The dataset is not too small, and this is a good standard
test harness configuration. We evaluate algorithms using the accuracy metric, a gross
metric that gives a quick idea of how correct a given model is. We first establish a
baseline of performance on this problem and spot-check a number of different algorithms.
We select a suite of different algorithms capable of handling this classification
problem. All the algorithms use default tuning parameters.
The mean accuracy values obtained on comparing the algorithms are given in Table 2.
Table 2 Output of evaluating algorithms
It is always wise to look at the distribution of accuracy values calculated across cross
validation folds. The results show a tight distribution for LR, which is encouraging
and suggests low variance. The poor results for KNN are surprising. See Figure 4.
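The spread of accuracy across folds can be summarised with a mean and a standard deviation, as in the short sketch below. The per-fold values are invented purely for illustration and are not results from this study; the function name is also hypothetical.

```python
import statistics


def summarise_scores(scores):
    """Return the mean and the sample standard deviation of per-fold accuracies."""
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical per-fold accuracies (%), invented for illustration only.
fold_scores = [96.0, 100.0, 96.0, 92.0, 100.0, 96.0, 100.0, 96.0, 92.0, 96.0]
mean_acc, std_acc = summarise_scores(fold_scores)
print(f"mean={mean_acc:.2f}%, std={std_acc:.2f}")
```

A small standard deviation corresponds to the tight distribution, and hence low variance, noted for LR.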
It is possible that the differing distributions of the attributes are affecting the
accuracy of algorithms such as KNN. In the following section we repeat this
spot-check with a standardised copy of the training dataset.
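Standardisation rescales every attribute to zero mean and unit variance. The sketch below assumes the scaling statistics are estimated from the training rows only and then reused on other data; the function names are hypothetical and this is not the authors' implementation.

```python
import statistics


def fit_standardiser(train_rows):
    """Compute the per-attribute mean and standard deviation from the training rows."""
    columns = list(zip(*train_rows))
    means = [statistics.mean(col) for col in columns]
    # pstdev is the population standard deviation; use 1.0 for constant columns
    # so that the division below is always defined.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardise(rows, means, stds):
    """Rescale each attribute using the statistics fitted on the training rows."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]
```

Reusing the training statistics for held-out rows keeps information from the validation data out of the fitted transformation.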
We can see that LR is still performing well. We can also see that
standardising the data has lifted the skill of SVM, making it the most accurate
algorithm tested so far. See Table 3.
Figure 5 plots the distribution of the accuracy scores using box and whisker plots.
68 V. Chaurasia and S. Pal
The results suggest digging deeper into the LR and SVM algorithms. It is very
likely that configuration beyond the default may yield even more accurate models.
4.2.1 Tuning LR
We use parameter C as our regularisation parameter, where C = 1/λ. Lambda (λ)
controls the trade-off between allowing the model to increase its complexity as
much as it wants and trying to keep it simple. For example, if λ is
very low or 0, the model will have enough power to increase its complexity
(overfit) by assigning large values to the weights for each parameter. If, on
the other hand, we increase the value of λ, the model will tend to underfit, as it
becomes too simple. Parameter C works the other way round.
For small values of C, we increase the regularisation strength, which produces
simple models that underfit the data. For large values of C, we lower the
strength of regularisation, which allows the model to increase its complexity
and, therefore, overfit the data. See Table 4.
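The effect of C on the penalty can be illustrated with one common formulation of L2-regularised logistic loss, in which the penalty term is (λ/2)‖w‖² with λ = 1/C. The function below is an illustrative assumption, not the exact objective optimised in the experiments; labels are assumed to be coded as −1 and +1.

```python
import math


def regularised_log_loss(weights, bias, X, y, C=1.0):
    """L2-penalised logistic loss with labels y in {-1, +1} and lambda = 1 / C."""
    lam = 1.0 / C
    data_loss = sum(
        math.log(1.0 + math.exp(-yi * (sum(w * v for w, v in zip(weights, xi)) + bias)))
        for xi, yi in zip(X, y)
    )
    # A small C gives a large lambda, so large weights are penalised heavily.
    penalty = 0.5 * lam * sum(w * w for w in weights)
    return data_loss + penalty
```

For the same weights, a smaller C yields a larger total loss because the penalty grows, which pushes the optimiser towards smaller weights and simpler models.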
Table 4 Results of tuning LR on the scaled dataset
We can see that the optimal configuration is C = 10. Since C = 1/λ, this
corresponds to relatively weak regularisation (λ = 0.1).
We can see that the most accurate configuration was SVM with a sigmoid kernel and a C value
of 0.3. The accuracy of 97.26% is seemingly lower than what LR could achieve.
Figure 6 Plots of ensemble algorithms comparison (see online version for colours)
We can see that both bagging methods provide strong accuracy scores in the low 90s (%)
with default configurations. We can plot the accuracy scores across the cross validation
folds. See Figure 6.
5 Discussion
In view of the results, we see that the proposed single classifier approach can produce
satisfactory results and outperform multi-model integration methods in disease prediction
(Chaurasia et al., 2018a, 2018b). Because of the complexity of the diagnosis and the need for
biopsy, timely and accurate diagnoses are essential. Consequently, improving
the accuracy of predictions through the application of computer-aided programs is
very helpful in treating diseases.
In this study, we examined the six individually operating classification models and
the multi-model integration methods proposed in this paper. Six classifiers that are
commonly used for predicting skin diseases were built and evaluated. As
observed for SVM and LR, each classifier may rank behind the other on the
same dataset. A similar situation may occur with other classifiers, which indicates that
each technique has its own weaknesses compared with the others. It is this
observation that inspired us to propose a procedure that combines various classifiers in
order to obtain a more accurate and reliable classification. Our results on the
dataset show that the six classifiers acting on the dataset alone have higher accuracy than
the multi-model integration method.
Moreover, a spot-check of the accuracy on the raw and standardised data shows
that the prediction performance of a single
classifier on the dataset is unstable. Of the six classifiers, we selected SVM and LR,
which are very accurate on the standardised dataset. Tuning LR and SVM
improves the accuracy further. By fitting the outputs of the four classifiers,
our proposed technique goes on to learn a weight for each classifier. In this
strategy, classifiers with higher accuracy contribute more, and
the noisy output of classifiers with lower accuracy is excluded. In this way, the
advantages of each classifier are fully considered and utilised, and better prediction
performance is obtained.
Additional tests were performed to confirm the expected improvement in LR
performance after normalising the dataset. The scaling parameters are computed from the
entire training dataset, and the same transformation is applied to the input attributes in
the validation dataset. Building six distinct classification models incurs a
higher computational cost. To overcome this limitation to some extent, we
applied a feature selection method in the data pre-processing stage, which can
significantly reduce the running time and improve the prediction accuracy at the same
time. In general, feature selection for uncovering important dermatological and
pathological findings deserves more consideration.
Ensemble techniques are used to improve the results of the prediction of skin disease.
Four ensemble techniques are used: AB, GBM, RF and ET. These techniques improve the
accuracy score up to a maximum of 99% in the case of the ET ensemble method. Instead of
these techniques, bucket of models or stacking ensemble techniques could also be
used, but testing their predictions is left as future work.
6 Conclusions
References
Abdi, H. (2007) ‘Discriminant correspondence analysis’, Encyclopedia of Measurement and
Statistics, pp.270–275, Sage, Thousand Oaks (CA).
Altman, N.S. (1992) ‘An introduction to kernel and nearest-neighbor nonparametric
regression’, The American Statistician, Vol. 46, No. 3, pp.175–185.
Amarathunga, A.A.L.C., Ellawala, E.P.W.C., Abeysekara, G.N. and Amalraj, C.R.J. (2015) ‘Expert
system for diagnosis of skin diseases’, International Journal of Scientific & Technology
Research, Vol. 4, No. 1, pp.174–178.
Badrinath, N., Gopinath, G., Ravichandran, K.S. and Soundhar, R.G. (2016) ‘Estimation of
automatic detection of erythemato-squamous diseases through AdaBoost and its hybrid
classifiers’, Artificial Intelligence Review, Vol. 45, No. 4, pp.471–488.
Bojarczuk, C.C., Lopes, H.S. and Freitas, A.A. (2001) ‘Data mining with constrained-syntax
genetic programming: applications in medical data set’, in Data Analysis in Medicine and
Pharmacology (IDAMAP-2001), a Workshop at Medinfo-2001, London, UK.
Chang, C.L. and Chen, C.H. (2009) ‘Applying decision tree and neural network to increase quality
of dermatologic diagnosis’, Expert Systems with Applications, Vol. 36, No. 2, pp.4035–4041.
Chaurasia, V., Pal, S. and Tiwari, B.B. (2018a) ‘Prediction of benign and malignant breast cancer
using data mining techniques’, Journal of Algorithms & Computational Technology, Vol. 12,
No. 2, pp.119–126.
Chaurasia, V., Pal, S. and Tiwari, B.B. (2018b) ‘Chronic kidney disease: a predictive model using
decision tree’, Int. J. Eng. Res. Technol., Vol. 11, No. 11, pp.1781–1794.
Güvenir, H.A. and Emeksiz, N. (2000) ‘An expert system for the differential diagnosis of
erythemato-squamous diseases’, Expert Systems with Applications, Vol. 18, No. 1, pp.43–49.
Güvenir, H.A., Demiröz, G. and Ilter, N. (1998) ‘Learning differential diagnosis of
erythemato-squamous diseases using voting feature intervals’, Artificial Intelligence in
Medicine, Vol. 13, No. 3, pp.147–165.
Hartmann, A. (2016) ‘Back to the roots-dermatology in ancient Egyptian medicine’, JDDG:
Journal der Deutschen Dermatologischen Gesellschaft, Vol. 14, No. 4, pp.389–396.
https://scikit-learn.org/stable/modules/feature_selection.html.
Idoko, J.B., Arslan, M. and Abiyev, R. (2018) ‘Fuzzy neural system application to differential
diagnosis of erythemato-squamous diseases’, Cyprus Journal of Medical Sciences, Vol. 3,
No. 2, pp.90–97.
Lekkas, S. and Mikhailov, L. (2010) ‘Evolving fuzzy medical diagnosis of Pima Indians diabetes
and of dermatological diseases’, Artificial Intelligence in Medicine, Vol. 50, No. 2,
pp.117–126.
Maghooli, K., Langarizadeh, M., Shahmoradi, L., Habibi-koolaee, M., Jebraeily, M. and
Bouraghi, H. (2016) ‘Differential diagnosis of erythmato-squamous diseases using
classification and regression tree’, Acta Informatica Medica, Vol. 24, No. 5, p.338.
Mukhopadhyay, A.K. (2016) ‘Dermatology in India and Indian dermatology: a medico-historical
perspective’, Indian Dermatology Online Journal, Vol. 7, No. 4, p.235.
Nanni, L. (2006) ‘An ensemble of classifiers for the diagnosis of erythemato-squamous diseases’,
Neurocomputing, Vol. 69, Nos. 7–9, pp.842–845.
Perriere, G. and Thioulouse, J. (2003) ‘Use of correspondence discriminant analysis to predict the
subcellular location of bacterial proteins’, Computer Methods and Programs in Biomedicine,
Vol. 70, No. 2, pp.99–105.
Polat, K. and Güneş, S. (2009) ‘A novel hybrid intelligent method based on C4.5 decision tree
classifier and one-against-all approach for multi-class classification problems’, Expert Systems
with Applications, Vol. 36, No. 2, pp.1587–1592.
Übeyli, E.D. (2009) ‘Combined neural networks for diagnosis of erythemato-squamous diseases’,
Expert Systems with Applications, Vol. 36, No. 3, pp.5107–5112.
Übeyli, E.D. and Doğdu, E. (2010) ‘Automatic detection of erythemato-squamous diseases using
k-means clustering’, Journal of Medical Systems, Vol. 34, No. 2, pp.179–184.
Übeyli, E.D. and Güler, I. (2005) ‘Automatic detection of erythemato-squamous diseases using
adaptive neuro-fuzzy inference systems’, Computers in Biology and Medicine, Vol. 35, No. 5,
pp.421–433.
Verma, A.K., Pal, S. and Kumar, S. (2019a) ‘Prediction of skin disease using ensemble data mining
techniques and feature selection method – a comparative study’, Applied Biochemistry and
Biotechnology, Vol. 190, No. 2, pp.1–19.
Verma, A.K., Pal, S. and Kumar, S. (2019b) ‘Classification of skin disease using ensemble data
mining techniques’, Asian Pacific Journal of Cancer Prevention, Vol. 20, No. 6,
pp.1887–1894.
Xie, J. and Wang, C. (2011) ‘Using support vector machines with a novel hybrid feature selection
method for diagnosis of erythemato-squamous diseases’, Expert Systems with Applications,
Vol. 38, No. 5, pp.5809–5815.
Zhang, X., Wang, S., Liu, J. and Tao, C. (2018) ‘Towards improving diagnosis of skin diseases by
combining deep neural network and human knowledge’, BMC Medical Informatics and
Decision Making, Vol. 18, No. 2, p.59.
Zhou, H., Xie, F., Jiang, Z., Liu, J., Wang, S. and Zhu, C. (2017) ‘Multi-classification of skin
diseases for dermoscopy images using deep learning’, in 2017 IEEE International Conference
on Imaging Systems and Techniques (IST), IEEE, October, pp.1–5.