Machine Learning in Disease Detection
All content following this page was uploaded by Adnan Mohsin Abdulazeez on 05 February 2021.
 Abstract:
 Human healthcare is one of society's most significant concerns. Patients need an
 accurate and robust disease diagnosis so that they receive the care they need as
 soon as possible. Because diagnosis is often complicated, health research also
 draws on other fields, such as statistics and computer science. Keeping up with
 new approaches that move beyond the conventional ones is a challenge for these
 disciplines. The sheer number of new techniques makes it possible to provide a
 broad overview that avoids particular aspects. To this end, we present a
 systematic review of machine learning applied to human diseases. This research
 concentrates on existing machine learning techniques applied to the diagnosis of
 human illnesses in the medical field to discover interesting trends,

 IJSB
 Literature review
 Accepted 19 January 2021
 Published 25 January 2021
 DOI: 10.5281/zenodo.4462858
IJSB                                                 Volume: 5, Issue: 2 Year: 2021 Page: 102-113
Introduction
In human society, healthcare is one of the most urgent issues, as people's quality of life
depends directly on it (Bagga & Hans, 2015). The healthcare area, however, is exceedingly
varied, broadly dispersed, and fragmented. From a clinical perspective, delivering adequate
patient care requires access to appropriate patient information, which is rarely accessible
when necessary (Grimson et al., 2001; Zeebaree et al., 2019). Besides, the large variance in
the ordering of diagnostic tests indicates the need for an adequate and suitable
collection of tests (Daniels & Schroeder, 1977; Wennberg, 1984; Zeebaree et al., 2019).
Smellie et al. (2002) expanded this claim by suggesting that the significant differences found
in general practice pathology requests arise primarily from individual variations in
clinical practice and are thus likely to improve through more transparent and
better-informed decision-making by physicians (Stuart et al., 2002). Moreover, medical data
consist of many heterogeneous variables obtained from various sources, such as
demographics, history of illness, medications, allergies, biomarkers, medical images, or
genetic markers, each of which offers a different partial view of the patient's condition.
Also, the statistical properties of these sources are fundamentally different.
Researchers and practitioners face two challenges when analyzing such data: the curse of
dimensionality (the feature space grows exponentially with the number of dimensions, so the
number of samples needed to cover it grows exponentially as well) and the heterogeneity of
feature sources and statistical properties (Pölsterl et al., 2016). These factors contribute
to delays and inaccuracies in diagnosis, so patients have not been able to obtain adequate
care. There is therefore a strong need for an appropriate and systematic approach that
enables early detection of disease and can be used as a decision-making aid for physicians
(Zhuang et al., 2009). Consequently, the medical, computer, and statistical fields face the
challenge of exploring new strategies for modeling disease prognosis and diagnosis, as
conventional paradigms struggle to handle all of this information (Huang et al., 2007). Today,
machine learning (ML) offers many essential resources for intelligent data analysis.
Furthermore, its technology is currently well suited to the study of medical data. In
particular, a wide variety of medical diagnostic work has been carried out on small,
specialized diagnostic problems (Bargarai et al., 2020; Kononenko, 2001), where the initial ML
applications were found. ML classifiers have been successfully used, for example, to
differentiate between healthy individuals and those with Parkinson's disease (Sriram et al.,
2016; Zebari et al., 2020), which is valuable in clinical diagnosis. Indeed, most ML
algorithms perform very well on a wide range of significant problems.
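The curse of dimensionality mentioned above can be illustrated with a minimal sketch. It assumes that each feature is split into a fixed number of bins and counts the grid cells needed to cover the feature space. The function names and the choice of 10 bins are illustrative assumptions and do not come from the cited works.

```python
def cells_needed(n_features: int, bins_per_feature: int = 10) -> int:
    """Number of grid cells when every feature axis is cut into equal bins."""
    return bins_per_feature ** n_features


def samples_per_cell(n_samples: int, n_features: int, bins_per_feature: int = 10) -> float:
    """Average number of samples that land in one cell of the grid."""
    return n_samples / cells_needed(n_features, bins_per_feature)


if __name__ == "__main__":
    # With a fixed budget of 1000 patient records (a hypothetical figure),
    # the data become sparse very quickly as features are added.
    for n_features in (1, 2, 3, 6):
        print(n_features, cells_needed(n_features), samples_per_cell(1000, n_features))
```

With a fixed number of records, the average number of samples per cell shrinks by a factor of ten for every added feature, which is why feature reduction and selection recur throughout the studies reviewed here.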
2. Background Theory
This section briefly introduces machine learning, its types, and the techniques most used in
the literature, and compares studies and research on machine learning.
2.1 ML
 ML is a branch of artificial intelligence that enables computers to learn from data and make
decisions without human interference. ML has made much progress in detecting various forms
of disease due to the rapid growth of artificial intelligence. A machine learning algorithm
also provides more precise predictions and better performance (Shaheamlung et al., 2020). ML
has been broadly divided into various forms, as seen in Figure 1.
a) SUPERVISED LEARNING
 This type of ML is given a labeled training data set. Based on this training set, the
 approach aims to respond correctly to all feasible inputs. Supervised learning is often
 referred to as learning from examples (Hashem et al., 2018; Sadeeq & Abdulazeez, 2018; Shi &
 Malik, 2000; Zebari et al., 2020). Regression and classification are two forms of supervised
 machine learning.
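The two forms of supervised learning can be sketched on toy data as follows. This is a minimal illustration under our own assumptions: the function names, the one-dimensional feature, and the labels are hypothetical and are not taken from any of the cited studies.

```python
def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def nearest_neighbour_label(train_x, train_y, query):
    """Classification: return the label of the closest training example."""
    distances = [abs(x - query) for x in train_x]
    return train_y[distances.index(min(distances))]


if __name__ == "__main__":
    # Regression predicts a continuous value from labeled examples.
    print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))
    # Classification predicts a discrete label from labeled examples.
    print(nearest_neighbour_label([1.0, 2.0, 8.0, 9.0],
                                  ["healthy", "healthy", "disease", "disease"], 7.5))
```

In both cases the training set pairs every input with its correct output, which is what distinguishes supervised learning from the other forms described in this section.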
b) UNSUPERVISED LEARNING
Correct answers or targets are not given. Instead, unsupervised learning techniques aim to
discover similarities in the input data and to group the data according to those
similarities. This type of learning is otherwise referred to as density estimation.
Clustering is an example of unsupervised learning (Jahwar, 2021; Najim Adeen et al., 2020;
Pan & Tompkins, 1985).
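Clustering can be sketched with a small k-means routine on one-dimensional data. This is a minimal illustration, assuming hand-picked initial centers and a fixed number of iterations; the function name and the toy values are hypothetical.

```python
def kmeans_1d(values, centers, n_iter=10):
    """Group values around the nearest center, then move each center to its group mean."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(n_iter):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign the value to the index of the closest center.
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        # Empty clusters keep their previous center.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


if __name__ == "__main__":
    centers, clusters = kmeans_1d([1.0, 1.2, 0.8, 5.0, 5.2, 4.8], centers=[0.0, 6.0])
    print(centers, clusters)
```

No labels are supplied at any point: the two groups emerge only from the similarity between the values.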
d) REINFORCEMENT LEARNING
This form of ML is inspired by behaviorist psychology. The algorithm is told when an
answer is incorrect, but not how to correct it. It therefore has to try several
possibilities before it finds the right answer. The feedback scores the answer but does not
suggest improvements.
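The trial-and-error loop described above can be sketched as follows. This is a minimal illustration under our own assumptions: the environment only reports whether an action was right or wrong, the learner keeps a running average reward per action, and untried actions start with an optimistic value so that each is tried at least once. The function name and all parameters are hypothetical.

```python
import random


def learn_by_trial_and_error(is_correct, n_actions, n_trials=100, epsilon=0.1, seed=0):
    """Return the action the learner ends up preferring after n_trials attempts."""
    rng = random.Random(seed)
    value = [1.0] * n_actions  # optimistic start: every untried action looks promising
    count = [0] * n_actions
    for _ in range(n_trials):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore a random action
        else:
            action = max(range(n_actions), key=lambda a: value[a])  # exploit best so far
        # The only feedback is right (1.0) or wrong (0.0), not how to fix the answer.
        reward = 1.0 if is_correct(action) else 0.0
        count[action] += 1
        value[action] += (reward - value[action]) / count[action]  # running average
    return max(range(n_actions), key=lambda a: value[a])


if __name__ == "__main__":
    print(learn_by_trial_and_error(lambda a: a == 2, n_actions=4))
```

The learner is never told which action is correct. It only observes that wrong attempts score zero, and it keeps trying until one attempt is rewarded.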
2.2.1. Support Vector Machines (SVM)
SVM was developed in the 1990s. It is a prominent and straightforward tool for
accomplishing ML tasks. In this method, a set of training samples is given, each
labeled as belonging to one of several categories. The support vector machine is used
primarily for classification and regression problems (Murphy, 2012; Zeebaree et al., 2019).
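A linear SVM for two classes can be sketched as sub-gradient descent on the regularized hinge loss. This is a minimal illustration, not the implementation used in any of the cited studies, several of which use kernel SVMs. The function names, the hyperparameters `lam`, `lr`, and `epochs`, and the toy data are assumptions.

```python
def train_linear_svm(samples, labels, lam=0.01, lr=0.1, epochs=200):
    """Fit w, b for labels in {-1, +1} by minimizing lam/2 * |w|^2 + hinge loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Sample is inside the margin: move the boundary away from it.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Sample is safely classified: only apply the regularization shrinkage.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


if __name__ == "__main__":
    samples = [(-2, -2), (-3, -1), (-1, -3), (2, 2), (3, 1), (1, 3)]
    labels = [-1, -1, -1, 1, 1, 1]
    w, b = train_linear_svm(samples, labels)
    print([predict(w, b, x) for x in samples])
```

The training samples that end up on or inside the margin are the ones that drive the updates, which is the role played by the support vectors.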
models (i.e., models with fewer parameters). This particular form of regression is ideal for
models with high multicollinearity levels or when certain aspects of model selection are
automated, such as variable selection/parameter elimination.
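The variable selection effect described above can be sketched with the soft-thresholding step used by coordinate-descent solvers for an L1 penalty. We assume here that the regression in question is an L1-penalized one such as LASSO, which this passage does not state explicitly. The function names and the example coefficients are hypothetical.

```python
def soft_threshold(coefficient: float, penalty: float) -> float:
    """Shrink a coefficient toward zero and set it to exactly zero if it is small."""
    if coefficient > penalty:
        return coefficient - penalty
    if coefficient < -penalty:
        return coefficient + penalty
    return 0.0


def shrink(coefficients, penalty):
    """Apply soft-thresholding to every coefficient of a model."""
    return [soft_threshold(c, penalty) for c in coefficients]


if __name__ == "__main__":
    # Small coefficients are eliminated, large ones are only reduced.
    print(shrink([3.0, -0.2, 0.5, -2.0], penalty=0.5))  # [2.5, 0.0, 0.0, -1.5]
```

Coefficients that are set to exactly zero drop out of the model, which is how the penalty yields models with fewer parameters.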
3. Related work
There are many research areas and related works on this topic. Ramana et al. (2011) used two
separate input datasets and found that the AP dataset gave better results than the UCLA
dataset for all the chosen algorithms. Among the classifiers, KNN, backpropagation, and SVM
gave the better outcomes. C4.5, backpropagation, Naïve Bayes, SVM, and KNN reached 95.07%,
96.27%, 96.93%, 97.47%, and 97.07% accuracy, respectively. Kousarrizi et al. (2012) focused
on two thyroid disease databases. The first dataset is taken from the UCI machine learning
repository. The second is real data gathered from the Imam Khomeini hospital by the
Intelligent Device Laboratory of the K.N.Toosi University of Technology. They obtained a
classification accuracy of 98.62% using SVM for the first dataset, which is the highest
accuracy achieved so far. Chitra et al. (2018) used an SVM with a radial basis function
(RBF) kernel for classification. The SVM with the RBF kernel achieved high classification
accuracy, sensitivity, and specificity, making it a suitable choice for the classification
process. Fan et al. (2013) extracted twelve morphological features from the ST segment.
Using the SVM classifier, they obtained 95.20% sensitivity, 93.29% specificity, and 93.63%
accuracy. Hariharan et al. (2014) fused neural networks and the SVM algorithm to diagnose
Parkinson's disease. The experimental findings show that, for the Parkinson's dataset, the
combination of feature preprocessing, feature reduction/selection methods, and
classification gives a maximum classification accuracy of 100%.
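The sensitivity, specificity, and accuracy figures quoted above are all derived from the counts of a confusion matrix. A minimal sketch follows; the function name and the counts in the usage example are hypothetical and are not taken from any of the cited studies.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard diagnostic metrics from confusion-matrix counts.

    tp: diseased patients correctly flagged    fn: diseased patients missed
    tn: healthy patients correctly cleared     fp: healthy patients wrongly flagged
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of diseased patients detected
        "specificity": tn / (tn + fp),  # share of healthy patients cleared
        "accuracy": (tp + tn) / (tp + fp + tn + fn),  # share of all correct decisions
    }


if __name__ == "__main__":
    print(diagnostic_metrics(tp=90, fp=5, tn=95, fn=10))
```

Reporting all three values matters in diagnosis because a classifier can reach high accuracy on an imbalanced dataset while still missing many diseased patients.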
Senturk & Kara (2014) aim to contribute to early breast cancer diagnosis. An analysis of
the diagnosis of breast cancer for patients is provided. Seven different algorithms are
used to make predictions for the patients, and the accuracy of each is reported. The
patient data come from the UCI ML repository, and the data mining tool RapidMiner 5.0 is
used to apply the chosen algorithms during the prediction process.
Vijayarani & Dhayanand (2015) compared two classification algorithms, SVM and ANNs. The
aim of the study was to predict chronic kidney disease (CKD), and the algorithms were
compared by their respective accuracy and execution time. The algorithm with the higher
accuracy and the better timing was chosen. Hashem et al. (2017) surveyed methods for
classifying liver disease. Different data mining classification methods were studied in
this analysis; the AP liver dataset gave better results than the UCLA dataset, and C4.5
achieved better results than the other algorithms. Ko et al. (2017) trained a CNN
architecture from scratch using dermoscopic and clinical images and demonstrated the
performance of the CNN approach. However, because of the limited datasets, training a
network from scratch to detect skin cancer is usually not viable. Most researchers have
therefore either fine-tuned the model or used pre-trained models.
Cinarer & Emiroglu (2019) analyzed the performance of tumour classification techniques for
classifying MR brain image characteristics as n/a, gliomatosis, multifocal, and
multicentric. The KNN, RF, LDA, and SVM machine learning algorithms were tested. The SVM
algorithm, with an accuracy of 90%, outperformed the other algorithms. Javeed et al. (2019)
addressed overfitting and developed a model to improve heart disease prediction. In their
description, overfitting means that a model provides good accuracy on the testing data but
poor accuracy on the training data when predicting heart disease. They built a model to
solve this problem and to give the best accuracy on both training and testing data. The
model combines two algorithms: a random search algorithm (RSA) and a random forest
algorithm used for prediction. The proposed model performed better on both the training
data and the testing data.
Intracerebral hemorrhage (ICH) is a source of high mortality. Liu et al. (2019) therefore
used multivariate analysis with an SVM on routinely available data to anticipate hematoma
expansion in spontaneous ICH, and pointed out 83. A randomized search approach was used in
this study for parameter tuning, and recursive feature elimination was used for feature
selection. Patient selection for thrombolytic procedures is another significant factor.
Rustam et al. (2020) produced three types of forecast with each model: the number of newly
infected cases, the number of deaths, and the number of recoveries over the next ten days.
The results of the study indicate that these methods are a promising mechanism in the
current COVID-19 pandemic scenario. Of all the models used, exponential smoothing (ES)
performs best, followed by LASSO and linear regression (LR), which perform well in
forecasting newly confirmed cases, death rate, and recovery rate, whereas SVM performs
poorly in all the prediction scenarios given the available dataset. Tanveer et al. (2020)
analyzed 165 articles from 2005-2019 that use different feature extraction techniques and
machine learning techniques. Three key categories of ML techniques are studied: SVM, ANN
and DL, and ensemble methods.
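The randomized search used for parameter tuning in some of the studies above can be sketched as follows. This is a minimal illustration under our own assumptions: `score_fn` stands in for whatever validation score a study would compute (for example, cross-validated accuracy), and the parameter name and range in the usage example are hypothetical.

```python
import random


def random_search(score_fn, param_ranges, n_trials=50, seed=0):
    """Sample parameter settings uniformly at random and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        # Draw one candidate value per parameter from its (low, high) range.
        params = {name: rng.uniform(low, high) for name, (low, high) in param_ranges.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    # Toy score with a known optimum at c = 2.0, standing in for a validation score.
    best_params, best_score = random_search(
        lambda p: -(p["c"] - 2.0) ** 2, {"c": (0.0, 10.0)}
    )
    print(best_params, best_score)
```

Unlike an exhaustive grid, the number of evaluations is fixed in advance by `n_trials`, regardless of how many parameters are tuned.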
Table 1
| Reference | Disease | Dataset | Algorithm | Performance | Research objective |
|---|---|---|---|---|---|
| (Javeed et al., 2019) | Heart disease | Cleveland heart failure (Meng et al., 2018) | RSA, RF | 93.33% (RSA+RF) | Develop an intelligent system that performs well on both training and testing data for the diagnosis of heart failure. |
| (Cinarer & Emiroglu, 2019) | Brain tumour | TCIA (Scarpace et al., 2015) | KNN, RF, SVM and LDA | SVM: 90% | The goal of the best ML and classification algorithms is to learn automatically from training and ultimately make a wise decision with high accuracy. |
| (Durai et al., 2019) | Liver disease | UCI (Shi & Malik, 2000) | J48, SVM & NB | J48: 95.04% (the J48 algorithm has a better choice of features) | Compare algorithm techniques for predicting the same definitive result and identify the one with the higher accuracy rate for detecting liver disease. |
| (Ahmed et al., 2019) | Alzheimer Diseases | ADNI | CNN | 90.05% | Increase the degree of accuracy to a level comparable to state-of-the-art techniques, address the problem of overfitting, and examine validated brain technologies that include noticeable AD diagnostic features. |
| (Zeebaree et al., 2018) | Cancer disease | Different cancer dataset | CNN | 100% | Based on gene expression data, DL algorithm applications are used to diagnose the disease. |
| (Acharya et al., 2017) | Myocardial infarction | Control: 40, CHD: 7 (Pan & Tompkins, 1985; Singh & Tiwari, 2006) | CNN | 98.99% | Diagnose MI automatically using an 11-layer deep CNN, using two separate databases (with noise and without noise). |
| (Kulkarni and Bairagi, 2017) | Alzheimer disease | 100 (50 CN, 50 AD) (Kulkarni & Bairagi, 2017) | SVM | 96% | Examine various features for Alzheimer's disease diagnosis that could serve as a potential biomarker to differentiate between AD subjects and normal subjects. |
| (Senturk et al., 2014) | Breast cancer | UCI | SVM, NB, KNN and DT | K-NN: 95.15%, SVM: 96.40% | Determine the best approaches to lead to early breast cancer detection. An overview of the diagnosis of breast cancer in patients is given. |
| (Hariharan et al., 2014) | Parkinson's disease | PD dataset from UCI | SVM | 100% | Find the best integrated approach to improve the accuracy of Parkinson's disease detection. |
| (Kumari and Chitra, 2013) | Diabetic Disease | UCI | SVM | 78% | Determine the best approaches to lead to early breast cancer detection. An overview of the diagnosis of breast cancer in patients is given. |
| (Kousarrizi et al., 2011) | Thyroid Disease | UCI | SVM | 98.62% | Choose the best methods of feature selection and classification for thyroid disease diagnosis, which is one of the most critical classification problems. |
Naqi et al. (2020) focused on 3D properties in the feature extraction process. Recent
developments in deep learning are a breakthrough in image processing. The emphasis of
automated diagnostic systems has shifted from traditional handcrafted features to deep
automated features. This helps in better identification and classification of nodular
objects in CT images. An autoencoder and softmax are considered useful tools for better
feature reduction and classification. Kumar et al. (2020) employed DL techniques, namely
CNNs; the proposed model eliminates errors in the manual process. The model, trained on
cell images, first preprocesses the images and extracts the best features. This is followed
by training the model with the optimized Dense Convolutional Neural Network (DCNN)
structure and finally predicting the type of cancer present in the cells. The model
reproduced all measurements correctly and recalled the samples correctly 94 times out of
100. The overall accuracy was 97.2%, which is better than that of ML techniques such as
SVM, DT, RF, and NB. This research shows that, on the retrieved dataset, the DCNN model's
performance is similar to that of established CNN architectures, with far fewer parameters
and less computation time. Therefore, the model can be used effectively to identify the
type of cancer in the bone marrow.
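The softmax function mentioned above as a classification tool turns the raw output scores of a network into class probabilities. A minimal sketch follows; the function name and the example scores are hypothetical and do not reproduce the models of the cited studies.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to one."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Hypothetical scores for three classes; the largest score gets the largest probability.
    print(softmax([2.0, 1.0, 0.1]))
```

The predicted class is the one with the highest probability, and the probability itself can be reported as a rough indication of how decisive the prediction is.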
Discussion
 This paper discusses various instruments and methods commonly used in the fields of
medicine and healthcare. These tools belong to ML and allow us to reach the main aim of
finding useful patterns in databases, and of explaining and making non-trivial predictions
about data. Table 1 summarizes the technical details (including the references, year,
diseases, dataset, performance, and research objective) of the research mentioned in the
previous section. As shown in Table 1, some researchers used DL algorithms to achieve a
higher detection rate and to improve precision, reliability, and performance. Five studies
(Kumar et al., 2020; Naqi et al., 2020; Ahmed et al., 2019; Zeebaree et al., 2018; Acharya
et al., 2017) focused on DL algorithms for detecting diseases (blood cancer, lung cancer,
Alzheimer's disease, cancer, and myocardial infarction). The performance column shows that
the accuracy of CNNs is higher for cancer than for the other diseases. Classification is
the process of finding a model or function that describes and distinguishes data classes
or concepts, so that the model can be used to predict the class of objects whose class
label is unknown. In classification, software is created that can learn how the data
objects can be categorized. The derived model can be presented as classification rules.
Many researchers have used different algorithms to help health care practitioners diagnose
diseases with greater accuracy. In the studies reviewed here, many classification
algorithms are used for disease detection (LR, LASSO, SVM, KNN, RF, LDA, NB, J48, RSA, and
DT). As shown in Table 1, SVM had the highest accuracy among the classification algorithms
for disease detection in Liu et al. (2019), Cinarer & Emiroglu (2019), Kulkarni and Bairagi
(2017), Senturk et al. (2014), Hariharan et al. (2014), Kumari and Chitra (2013), and
Kousarrizi et al. (2011). However, given the available dataset, Rustam et al. (2020) found
that SVM performs poorly in all prediction scenarios, and Durai et al. (2019) reported that
the J48 algorithm performs better when it comes to feature selection, with an accuracy rate
of 95.04%.
Conclusion
Intelligent data processing is becoming a social necessity for finding effective and robust
disease detection as early as possible, so that patients receive appropriate care within the
shortest possible time. In recent decades, this detection has been achieved by identifying
interesting patterns in databases.
A comprehensive overview of intelligent data analysis tools in the medical sector is given in
this paper. Examples of algorithms used in these medical areas are also presented,
examining potential patterns based on the target searched, the methodology used, and the
application field. Given the pace at which new works emerge in this field, a systematic
analysis such as the one presented here may become obsolete within a short period. For this
reason, we consider that, after a careful search for new scientific literature, it is mainly
Table 1 that should be revised, since further research in the short term is more likely to
apply established techniques in this field than to propose techniques that are novel and
not merely enhancements or modifications of existing ones.
References
Acharya, U. R., Fujita, H., Oh, S. L., Hagiwara, Y., Tan, J. H., & Adam, M. (2017). Application of deep convolutional
       neural network for automated detection of myocardial infarction using ECG signals. Information Sciences,
       415–416, 190–198. https://doi.org/10.1016/j.ins.2017.06.027
Ahmed, S., Choi, K. Y., Lee, J. J., Kim, B. C., Kwon, G. R., Lee, K. H., & Jung, H. Y. (2019). Ensembles of Patch-Based
       Classifiers for Diagnosis of Alzheimer Diseases. IEEE Access, 7, 73373–73383.
       https://doi.org/10.1109/ACCESS.2019.2920011
Al-Zebari, A., & Sengur, A. (2019). Performance Comparison of Machine Learning Techniques on Diabetes
       Disease Detection. 1st International Informatics and Software Engineering Conference: Innovative
       Technologies for Digital Transformation, IISEC 2019 - Proceedings, 2–5.
       https://doi.org/10.1109/UBMYK48245.2019.8965542
Bagga, P., & Hans, R. (2015). Applications of mobile agents in healthcare domain: A literature survey.
       International Journal of Grid and Distributed Computing, 8(5), 55–72.
       https://doi.org/10.14257/ijgdc.2015.8.5.05
Bargarai, F. A. M., Abdulazeez, A. M., Tiryaki, V. M., & Zeebaree, D. Q. (2020). Management of wireless
       communication systems using artificial intelligence-based software defined radio. International Journal of
       Interactive Mobile Technologies, 14(13), 107–133. https://doi.org/10.3991/ijim.v14i13.14211
Chitra, K. and. (2018). Classification Of Diabetes Disease Using Support Vector Machine. 3(2), 1797–1801.
       https://www.researchgate.net/publication/320395340
Cinarer, G., & Emiroglu, B. G. (2019). Classificatin of Brain Tumors by Machine Learning Algorithms. 3rd
       International Symposium on Multidisciplinary Studies and Innovative Technologies, ISMSIT 2019 -
       Proceedings. https://doi.org/10.1109/ISMSIT.2019.8932878
Daniels, M., & Schroeder, S. A. (1977). Variation among physicians in use of laboratory tests II. Relation to clinical
       productivity and outcomes of care. Medical Care, 15(6), 482–487. https://doi.org/10.1097/00005650-
       197706000-00004
Durai, V. (n.d.). Liver disease prediction using machine learning. 5(2), 1584–1588.
Fan, C. H., Hsu, Y., Yu, S. N., & Lin, J. W. (2013). Detection of myocardial ischemia episode using morphological
       features. Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and
       Biology Society, EMBS, 7334–7337. https://doi.org/10.1109/EMBC.2013.6611252
Grimson, J., Stephens, G., Jung, B., Grimson, W., Berry, D., & Pardon, S. (2001). Sharing healthcare records over the
       internet. IEEE Internet Computing, 5(3), 49–58. https://doi.org/10.1109/4236.935177
Hariharan, M., Polat, K., & Sindhu, R. (2014). A new hybrid intelligent system for accurate detection of
       Parkinson's disease. Computer Methods and Programs in Biomedicine, 113(3), 904–913.
       https://doi.org/10.1016/j.cmpb.2014.01.004
Hashem, S., Esmat, G., Elakel, W., Habashy, S., Raouf, S. A., ElHefnawi, M., Eladawy, M., & ElHefnawi, M. (2018).
       Comparison of Machine Learning Approaches for Prediction of Advanced Liver Fibrosis in Chronic
       Hepatitis C Patients. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 15(3), 861–868.
       https://doi.org/10.1109/TCBB.2017.2690848
Hazra, A., Kumar, S., & Gupta, A. (2016). Study and Analysis of Breast Cancer Cell Detection using Naïve Bayes,
       SVM and Ensemble Algorithms. International Journal of Computer Applications, 145(2), 39–45.
       https://doi.org/10.5120/ijca2016910595
Huang, F. J., & LeCun, Y. (2006). Large-scale learning with SVM and convolutional nets for generic object
       categorization. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern
       Recognition, 1(July 2006), 284–291. https://doi.org/10.1109/CVPR.2006.164
Huang, M. J., Chen, M. Y., & Lee, S. C. (2007). Integrating data mining with case-based reasoning for chronic
       diseases prognosis and diagnosis. Expert Systems with Applications, 32(3), 856–867.
       https://doi.org/10.1016/j.eswa.2006.01.038
Iswanto, I., Laxmi Lydia, E., Shankar, K., Nguyen, P. T., Hashim, W., & Maseleno, A. (2019). Identifying diseases
       and diagnosis using machine learning. International Journal of Engineering and Advanced Technology, 8(6
       Special Issue 2), 978–981. https://doi.org/10.35940/ijeat.F1297.0886S219
Jahwar, A. F. (2021). META-HEURISTIC ALGORITHMS FOR K-MEANS CLUSTERING : A REVIEW. 17(7), 1–20.
Javeed, A., Zhou, S., Yongjian, L., Qasim, I., Noor, A., & Nour, R. (2019). An Intelligent Learning System Based on
       Random Search Algorithm and Optimized Random Forest Model for Improved Heart Disease Detection.
       IEEE Access, 7, 180235–180243. https://doi.org/10.1109/ACCESS.2019.2952107
Ko, J., Swetter, S. M., Blau, H. M., Esteva, A., Kuprel, B., Novoa, R. A., & Thrun, S. (2017). Dermatologist-level
       classification of skin cancer with deep neural networks. Nature, 542(7639), 115–118.
       http://dx.doi.org/10.1038/nature21056
Kononenko, I. (2001). Machine learning for medical diagnosis: History, state of the art and perspective. Artificial
    Nareen O. M. Salim & Adnan Mohsin Abdulazeez (2021). Human Diseases Detection Based
    On Machine Learning Algorithms: A Review. International Journal of Science and Business,
    5(2), 102-113. doi: https://doi.org/10.5281/zenodo.4462858