Mini Research

This document discusses the application of machine learning techniques, specifically K-Nearest Neighbor (K-NN) and Random Forest, for predicting heart disease risk based on various patient characteristics. The study evaluates the performance of these algorithms, achieving an accuracy of 79.0% for K-NN and 80.7% for Random Forest. The findings suggest that machine learning can significantly aid in early diagnosis and management of cardiovascular diseases.

Heart disease prediction using machine learning techniques
Mohammed Ramadan Mohammeed
Computer Sciences (AI)
University of Benghazi
Benghazi, Libya
mohammed.ramadan@uob.edu.ly

Abstract: One of the most well-known uses of artificial intelligence, machine learning (ML), is revolutionizing the field of study. In this work, the use of machine learning to determine a person's risk of heart disease is discussed. Cardiovascular diseases (CVDs) are common and can be fatal for people anywhere in the world. A person's age, cholesterol level, chest pain, and other characteristics can all be taken into account by machine learning to determine whether they have a cardiovascular disease. Supervised classification algorithms can therefore facilitate the diagnosis of cardiovascular disease. This study uses two supervised machine learning algorithms, Random Forest and K-Nearest Neighbor (K-NN), to distinguish between individuals with and without heart disease. K-NN yielded a prediction accuracy of 79.0%, whereas Random Forest produced an accuracy of 80.7%.

Keywords: Heart Disease, Random Forest, K-Nearest Neighbor (K-NN), Machine Learning

I. INTRODUCTION

The human body is made up of various organs, each of which has its own function. The heart is one such organ: it pumps blood throughout the body, and if it fails to do so, the consequences can be fatal. Heart disease is one of the main causes of mortality today [1]. It is therefore necessary to make sure that the cardiovascular system, like every other system in the human body, remains healthy. Unfortunately, people all around the world suffer from cardiovascular diseases. Any technology that can help diagnose these diseases before much damage is done will help save people's money and, more importantly, their lives. Data mining techniques can be useful in predicting heart disease. Predictive models can be built by finding previously unknown patterns and trends in databases and using the obtained information [2]. Data mining is the extraction of knowledge from vast volumes of data [3]. Machine learning is one technological advancement that can assist in diagnosing heart disease early, before significant harm is done to an individual. It is a rapidly developing field in science and technology that can diagnose and categorize patients based on their risk of heart disease.

II. LITERATURE REVIEW

Research in this area has produced techniques for predicting cardiovascular disease using supervised machine learning algorithms, and several studies have been published on the subject. One report surveys the performance of many models based on machine learning algorithms and methodologies [4]. One study describes efforts to develop a Graphical User Interface (GUI) that uses a weighted association rule-based classifier to determine whether a person has heart disease [5]. Another study reports a novel method for the prediction of heart disease based on the coactive neuro-fuzzy inference system (CANFIS) [6]. One publication [7] summarizes the methods frequently used to predict heart disease and their associated difficulties. Another study [8] describes a classifier for identifying heart disease and demonstrates the use of Naive Bayes for classification. Finally, one publication surveys several papers in which one or more data mining techniques have been applied to predict heart disease [9].

III. PROPOSED METHODS

A. K-Nearest Neighbor (K-NN)

K-Nearest Neighbor (K-NN) is a popular machine learning algorithm for classification tasks. This non-parametric algorithm categorizes data points according to how close they are to one another in a feature space. When a new data point whose class is unknown is encountered, the K-NN algorithm looks at its nearest neighbors; the number of neighbors taken into account for classification is represented by the letter k. Usually, the user specifies the value of k in advance or finds it through cross-validation.

K-NN rests on the notion that data points belonging to the same class tend to be close to one another in the feature space, which makes it a straightforward but effective technique for classification tasks. The method predicts the class of an unknown data point from the local structure of the data, by taking into account the class labels of its closest neighbors.

B. Random Forest

Random Forest is a powerful ensemble learning method that combines multiple decision trees to make predictions. Using the training set, it constructs a set of decision trees, each of which independently outputs a predicted class. For classification tasks, the final prediction is the class that appears most frequently across all decision trees.



Random Forest uses the wisdom of the crowd to produce
predictions that are more reliable and accurate by building a
variety of decision trees and combining their forecasts. The
premise of this ensemble approach is that, despite the potential
biases and limitations of each decision tree, the process of
collective decision-making can make up for these drawbacks.
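The two classification rules described in this section can be illustrated with a minimal sketch. It is not the implementation used in the experiment, which relies on Orange. The toy data, the function names, and the one-rule "stumps" that stand in for trained decision trees are all assumptions made for illustration, and the Manhattan distance is only one possible choice of metric.

```python
from collections import Counter


def manhattan(a, b):
    """Sum of absolute coordinate differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k=5):
    """Label `query` with the most common class among its k nearest training points."""
    order = sorted(range(len(train_X)), key=lambda i: manhattan(train_X[i], query))
    nearest_labels = [train_y[i] for i in order[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


def forest_vote(trees, query):
    """Return the class predicted most often by the individual trees."""
    votes = [tree(query) for tree in trees]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical toy data: two features scaled to [0, 1]; label 1 = heart disease.
train_X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.3), (0.8, 0.9), (0.9, 0.7), (0.7, 0.8)]
train_y = [0, 0, 0, 1, 1, 1]

# Hypothetical stand-ins for trained decision trees: each applies a single rule.
stumps = [
    lambda p: int(p[0] > 0.5),
    lambda p: int(p[1] > 0.5),
    lambda p: int(p[0] + p[1] > 1.0),
]

knn_label = knn_predict(train_X, train_y, (0.75, 0.85), k=3)
forest_label = forest_vote(stumps, (0.75, 0.4))
```

In the second call the stumps disagree (two vote for class 1, one for class 0), and the majority vote decides the final class, which is the mechanism by which the ensemble compensates for the weakness of an individual tree.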
IV. EXPERIMENTAL SETUP
The first step is to obtain a dataset containing the characteristics of people with and without heart disease. The dataset for this experiment can be retrieved from the Kaggle website (https://www.kaggle.com). The data are analyzed with the Orange Machine Learning software. To get a quick overview of the data set, I used a tool called Data Info.
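For readers who do not use Orange, a comparable overview can be sketched in a few lines of Python. This is only an approximation of what a data summary tool reports. The inline rows are hypothetical and do not come from the dataset, and the column names (including a `target` column holding the class label) are assumptions.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt; in practice the rows would be read from the downloaded CSV file.
SAMPLE_CSV = """age,sex,cp,chol,target
60,1,3,230,1
38,1,2,251,1
42,0,1,205,1
57,1,0,,0
61,0,0,266,0
"""


def data_info(rows):
    """Summarize a list of dict rows: size, columns, missing cells, class counts."""
    columns = list(rows[0].keys())
    missing = sum(1 for row in rows for col in columns if row[col] == "")
    target_counts = Counter(row["target"] for row in rows)
    return {
        "rows": len(rows),
        "columns": columns,
        "missing_cells": missing,
        "target_counts": dict(target_counts),
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
info = data_info(rows)
```

The class counts in `info["target_counts"]` give a first indication of whether the two classes are represented in similar proportions.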
Fig 1, Data info

To obtain summary statistics for the data set, such as the average values of the attributes used, the Distributions tool offered by Orange Machine Learning is used. The target attribute is examined first; its value is 1 if the patient has heart disease and 0 if not.

Fig 2, Distributions of target attribute

The attribute distributions show that the data set used in this study is balanced.

The Distributions tool is also applied to other attributes of the dataset, such as the sex attribute, which takes the values 1 (male) and 0 (female), and the cp (chest pain) attribute, which gives the type of chest pain as a value from 0 to 3.

Fig 3, Distributions of sex attribute

Fig 4, Distributions of chest pain attribute

After checking the data balance, the correlation between the attributes is examined using a tool called Heat Map.

Fig 5, Correlation between variables
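The values behind such a correlation map can be sketched as follows, assuming the map shows Pearson correlation coefficients (the choice of coefficient is an assumption). The column values below are hypothetical and serve only to show the computation; they are not taken from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise correlations for a dict that maps column names to value lists."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical values for three attributes of six patients.
columns = {
    "max_heart_rate": [150, 187, 172, 120, 110, 125],
    "cp": [3, 2, 1, 0, 0, 1],
    "target": [1, 1, 1, 0, 0, 0],
}
matrix = correlation_matrix(columns)
```

Each entry `matrix[a][b]` lies between -1 and 1; a positive value means that the two attributes tend to increase together, which is what a warm cell in the map indicates.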
The heat map shows a positive correlation between the target attribute and attributes such as maximum heart rate achieved (thalach) and chest pain (cp).

Having confirmed the association, the dataset is processed to turn categorical variables such as sex, cp, fbs, restecg, exang, slope, ca, and thal into dummy variables. To get the best results when training the models, the values of these attributes are rescaled to lie between 0 and 1.

The original data set was split into training data, which makes up 80% of the total, and testing data, which makes up the remaining 20%. This split enables a thorough assessment of the machine learning algorithms used in the research.

The chosen machine learning algorithms were then applied to the training data once the data set was ready. These algorithms constructed heart disease prediction models from the available features and attributes. To make precise predictions, the models had to be trained to recognize the underlying relationships and patterns in the data.

A confusion matrix was used to evaluate the trained models' performance. The confusion matrix gives a comprehensive assessment of an algorithm's predictive capabilities. It can be laid out as a matrix in the following way:

Fig 6, Distributions Attribute of sex attribute

The accuracy of the algorithm can be calculated using the formula:

Accuracy = ((TP + TN) / (TP + FP + TN + FN)) × 100

where TP, FP, TN, and FN are the numbers of true positives, false positives, true negatives, and false negatives. Examining the accuracy shows how well the machine learning models predict heart disease. A higher accuracy score indicates a more reliable algorithm, suggesting that it is capable of making accurate predictions based on the given attributes.

V. RESULTS

A. K-Nearest Neighbor (K-NN)

The value of k was taken as 5 with the Manhattan metric, as 5 was one of the values that gave the highest accuracy for the algorithm. The confusion matrix obtained was as follows:

TABLE I. CONFUSION MATRIX KNN

K-Nearest Neighbor (K-NN)
                    Actual
Predicted       0      1      ∑
    0          83     26    109
    1          19    115    134
    ∑         102    141    243

The accuracy calculated from the confusion matrix is 79.0%.

B. Random Forest

The number of trees was set to 10. The confusion matrix obtained was as follows:

TABLE II. CONFUSION MATRIX RF

Random Forest
                    Actual
Predicted       0      1      ∑
    0          79     30    109
    1          21    113    134
    ∑         100    143    243

The accuracy calculated from the confusion matrix is 80.7%.

C. Results after applying each algorithm

TABLE III. RESULTS ALGORITHM

Algorithm Used     TP    FP    TN     FN    Accuracy
K-NN               83    26    115    19    79.0%
Random Forest      79    30    113    21    80.7%

VI. CONCLUSION

After putting different algorithms to use, it can be concluded that machine learning is proving very helpful in predicting heart disease, which is one of the biggest issues facing society today. As more research is done in this area, new techniques may soon make machine learning more beneficial in the healthcare industry. With the attributes at hand, the algorithms employed in this experiment achieved accuracies of 79.0% (K-NN) and 80.7% (Random Forest). Finally, it can be concluded that by anticipating heart disease, machine learning can lessen the harm done to a person's physical and mental health.

VII. ACKNOWLEDGMENTS

Thanks and appreciation to Dr. Muhammad Salem and Dr. Younis Al-Badri for everything you gave me this semester, and I hope that we will meet in future lessons.
VIII. REFERENCES

[1] Mohan, S., Thirumalai, C., & Srivastava, G. (2019). Effective heart disease prediction using hybrid machine learning techniques. IEEE Access, 7, 81542-81554.
[2] Bhatla, N., & Jyoti, K. (2012). An analysis of heart disease prediction using different data mining techniques. International Journal of Engineering, 1(8), 1-4.
[3] Patel, J., TejalUpadhyay, D., & Patel, S. (2015). Heart disease prediction using machine learning and data mining technique. Heart Disease, 7(1), 129-137.
[4] Ramalingam, V. V., Dandapath, A., & Raja, M. K. (2018). Heart disease prediction using machine learning techniques: a survey. International Journal of Engineering & Technology, 7(2.8), 684-687.
[5] Soni, J., Ansari, U., Sharma, D., & Soni, S. (2011). Intelligent and effective heart disease prediction system using weighted associative classifiers. International Journal on Computer Science and Engineering, 3(6), 2385-2392.
[6] Parthiban, L., & Subramanian, R. (2008). Intelligent heart disease prediction system using CANFIS and genetic algorithm. International Journal of Biological, Biomedical and Medical Sciences, 3(3).
[7] Chitra, R., & Seenivasagam, V. (2013). Review of heart disease prediction system using data mining and hybrid intelligent techniques. ICTACT Journal on Soft Computing, 3(04), 605-09.
[8] Medhekar, D. S., Bote, M. P., & Deshmukh, S. D. (2013). Heart disease prediction system using naive Bayes. Int. J. Enhanced Res. Sci. Technol. Eng, 2(3).
[9] Kaur, B., & Singh, W. (2014). Review on heart disease prediction system using data mining techniques. International Journal on Recent and Innovation Trends in Computing and Communication, 2(10), 3003-3008.
