C-G1 2nd Review
A PROJECT REPORT
Submitted to
Department of
ELECTRONICS AND COMMUNICATION ENGINEERING
CERTIFICATE
Certified that the Project entitled “MULTI-CLASS HEART HEALTH
CLASSIFICATION FROM ECG DATA WITH RANDOM FOREST CLASSIFIER”
that is being submitted by I. Suparna (20JN1A0409), B. Swathi (20JN1A0420), G.
Prathyusha (20JN1A0443), A. Amrutha (21JN5A0403) in partial fulfillment of the
requirements for the award of degree of BACHELOR OF TECHNOLOGY in
ELECTRONICS AND COMMUNICATION ENGINEERING by Jawaharlal Nehru
Technological University Anantapur, Ananthapuramu during the academic year
2020-2024. It is certified that all corrections/suggestions indicated for internal assessment have
been incorporated in the report. The project report has been approved as it satisfies the
academic requirements in respect of project work prescribed for the said degree.
Principal
We hereby declare that the project entitled “Multi-Class Heart Health Classification
From ECG Data with Random Forest Classifier” has been done by us under the guidance
of Mrs. A. PAVANI, M.Tech, (Ph.D), Associate Professor, Department of Electronics &
Communication Engineering. This project work has been submitted to S.V. GROUP OF
INSTITUTIONS as a part of partial fulfillment of the requirements for the award of degree
of Bachelor of Technology.
We also declare that this project report has not been submitted at any time to another
Institute or University for the award of any degree.
Project Associates,
I. SUPARNA (H.T.NO.20JN1A0409)
B. SWATHI (H.T.NO.20JN1A0420)
G. PRATHYUSHA (H.T.NO.20JN1A0443)
A. AMRUTHA (H.T.NO.20JN1A0403)
TABLE OF CONTENTS
ACKNOWLEDGMENT
TABLE OF CONTENTS
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
1 INTRODUCTION
1.1 Overview
4 PROPOSED SYSTEM
4.1 Overview
4.2 Dataset Preprocessing
4.3 Random Forest Classifier
4.4 Random Forest Prediction
4.5 Classifier
4.6 Advantages
5 MACHINE LEARNING
6 SOFTWARE ENVIRONMENT
7 SOURCE CODE
8 RESULT AND DISCUSSION
8.1 Sample Dataset
8.2 Sample Figure of ECG
8.3 The Circle of Value Count in Dataset
8.4 One ECG for Each Category
8.5 Classification Report
8.6 Confusion Matrix
9 CONCLUSION AND FUTURE SCOPE
9.1 Conclusion
9.2 Future Scope
9.3 References
ABSTRACT
Traditional ECG ailment classification methods often rely on handcrafted features and
statistical measures extracted from the ECG signals. While these methods have been used
successfully for some ailments, they may struggle to capture complex and subtle patterns
indicative of certain cardiac abnormalities. Additionally, manual feature engineering can
be time-consuming and may limit the model's ability to adapt to new and diverse datasets.
Moreover, conventional machine learning algorithms might not fully exploit the temporal
dependencies present in ECG signals, leading to suboptimal performance, especially for
long-term monitoring scenarios.
To overcome the limitations of existing methods, this work proposes a novel machine
learning approach for multi-class classification of ECG ailments. The ML models
can capture long-term dependencies and temporal
patterns, making them well suited for sequential data such as ECG signals. The model
learns to extract relevant features from the sequential data, enabling accurate
classification of different cardiac ailments.
LIST OF FIGURES
LIST OF TABLES
TABLE NO.  TABLE NAME
8.1  Existing Naive Bayes report
8.2  Proposed RFC report
8.3  Overall performance comparison
8.4  Normal class performance comparison
8.5  Atrial premature class performance comparison
8.6  Premature ventricular contraction class performance comparison
8.7  Fusion of ventricular and normal class performance comparison
8.8  Fusion of paced and normal class performance comparison
Multi-Class Heart Health Classification From ECG Data With RFC
CHAPTER-1
INTRODUCTION
1.1 OVERVIEW
Heart health classification involves the assessment and categorization of individuals
based on the condition of their cardiovascular system. This process typically includes
analyzing various factors such as blood pressure, cholesterol levels, heart rate, and overall
lifestyle habits. Healthcare professionals use this classification to determine the risk of
developing heart diseases such as coronary artery disease, heart attack, or stroke. By
classifying individuals into different risk categories, healthcare providers can tailor
preventive measures and interventions to help individuals maintain or improve their heart
health. These interventions may include lifestyle modifications such as diet and exercise,
medication management, or other medical treatments aimed at reducing the risk of
cardiovascular events.
The classification of heart health is crucial for early detection and prevention of
cardiovascular diseases. It allows healthcare providers to identify individuals who are at
higher risk of developing heart-related problems and intervene promptly to mitigate these
risks. Moreover, by regularly monitoring and updating the classification based on changes
in health status or lifestyle habits, healthcare professionals can provide personalized care
and support to individuals to optimize their heart health and overall well-being.
Ultimately, the goal of heart health classification is to empower individuals to take
proactive steps towards maintaining a healthy heart and reducing their risk of
cardiovascular diseases.
The motivation behind research into heart health classification using smartwatch
technology stems from the desire to enhance early detection and management of
cardiovascular diseases. Smartwatches equipped with electrocardiogram (ECG) sensors
can provide real-time monitoring of heart rhythms and detect irregularities. By leveraging
artificial intelligence (AI) and machine learning (ML) algorithms, these devices can
analyze ECG readings collected from the user's wrist and accurately classify potential
heart conditions. This approach enables individuals to conveniently monitor their heart
health continuously without the need for frequent visits to healthcare facilities, leading to
early detection of abnormalities and timely intervention.
Integrating AI and ML algorithms into smartwatches allows for the rapid processing and
interpretation of ECG data, enabling the device to display potential heart disease names
directly on the watch screen. This immediate feedback empowers users to take proactive
measures to address any identified issues promptly, such as seeking medical attention or
adjusting lifestyle habits. Additionally, by providing personalized insights and actionable
information in real time, smartwatches equipped with heart health classification
capabilities contribute to improving individuals' overall cardiovascular health and
reducing the burden of heart-related illnesses. Thus, the research aims to harness the
potential of wearable technology to revolutionize heart health monitoring and empower
individuals to make informed decisions about their well-being.
The existing approach by which doctors diagnose and classify heart health relies on a
combination of patient history, physical examinations, and diagnostic tests. Doctors begin
by gathering information about the patient's symptoms, medical history, lifestyle factors,
and family history of heart disease. They then conduct a physical examination to assess
vital signs, listen to the heart and lungs, and check for any signs of heart problems.
Diagnostic tests such as electrocardiograms (ECGs), echocardiograms, stress tests, and
blood tests are often performed to further evaluate heart function and detect any
abnormalities.
Based on the collected information and test results, doctors classify the patient's heart
health into various categories such as normal, at risk, or indicative of specific conditions
like coronary artery disease, arrhythmias, or heart failure. This classification guides
treatment decisions and helps in providing appropriate care to improve the patient's heart
health.
The current system for conducting clinical trials on heart health classification involves
several steps. Researchers begin by designing the trial, outlining the specific objectives
and criteria for participant eligibility. Potential participants are then recruited, often
through medical centers or community outreach programs. Once enrolled, participants
undergo a series of assessments to gather baseline data on their heart health status. These
assessments may include medical history interviews, physical examinations, and
diagnostic tests such as electrocardiograms (ECGs) and blood tests. Participants are then
randomly assigned to different treatment groups, which may involve receiving a new
medication, undergoing a specific procedure, or following a particular lifestyle
intervention.
Throughout the trial, researchers carefully monitor participants' progress and collect data
on key outcome measures, such as changes in heart function or incidence of
cardiovascular events. The trial concludes with data analysis to determine the
effectiveness and safety of the intervention in classifying heart health, ultimately
contributing to medical knowledge and informing future clinical practice.
Patients with smartwatches: This problem statement concerns patients who use
smartwatches for heart health classification. Many individuals currently rely on
smartwatches equipped with heart rate monitoring features to track their heart health.
However, there are concerns about the accuracy and reliability of these devices in
classifying heart health conditions. While smartwatches can provide
continuous heart rate data, it is uncertain whether they can detect subtle changes
or diagnose specific heart conditions accurately. Additionally, the lack of medical-grade
validation and oversight raises questions about the reliability of the information these
devices provide. Incorporating smartwatch data into clinical decision-making
presents challenges such as ensuring data accuracy, interpreting results, and integrating
this information into existing healthcare systems. Further research and validation are
necessary to determine the potential benefits and limitations of using smartwatch data
for heart health classification in clinical practice.
With more human resources: This problem statement concerns the time required to
classify heart health based on various factors. Specifically, it focuses
on how the use of additional human resources affects the time required for
this classification task. The goal is to analyze and quantify the relationship between the
number of people involved and the time it takes to classify heart health
accurately. Examining this relationship gives insight into the efficiency and
scalability of the classification process in real-world scenarios.
1.5 OBJECTIVE
The objective of heart health classification is to accurately assess and categorize
individuals' heart conditions based on various medical parameters and data. By analyzing
factors such as blood pressure, cholesterol levels, and heart rate, the aim is to provide a
reliable assessment of an individual's cardiac health status. This classification process
helps healthcare professionals in making informed decisions regarding diagnosis,
treatment, and prevention strategies for heart-related diseases. Ultimately, the goal is to
improve patient outcomes and overall cardiovascular health by effectively identifying and
managing potential heart issues at an early stage.
1.6 ADVANTAGES
Disease Diagnosis and Risk Assessment: High-accuracy heart health classification
provides useful insights into a patient's cardiac condition, enabling precise diagnosis
and proactive treatment strategies. It facilitates a deeper understanding of
cardiovascular conditions, aiding in effective patient care and risk management.
Clinical Decision Support Systems: In the context of heart health, factors like regular
exercise and a balanced diet demonstrate robustness to noise, suggesting their consistent
advantages. Stress management and avoiding smoking similarly exhibit resilience to
noise, reinforcing their importance for a healthy heart.
Research and Drug Development: In heart health discussions, activities such as exercise
and maintaining a balanced diet show advantages due to efficient parallelization, allowing
for simultaneous benefits across various aspects of wellness. Similarly, stress
management and abstaining from smoking exhibit efficiencies in parallelization,
contributing to overall heart health.
1.7 APPLICATION
1.7.4 IOMT
Heart health classification applications in the Internet of Medical Things (IOMT)
categorize individuals' heart health based on data collected from wearable devices. These
applications analyze parameters like heart rate, blood pressure, and activity levels to
assess cardiovascular health. By providing real-time monitoring and analysis, they enable
users to understand their heart health status and take proactive measures to improve it.
Healthcare professionals also utilize these classifications to diagnose heart-related
conditions and recommend personalized treatment plans. Overall, IOMT applications
play a crucial role in promoting awareness and management of heart health.
The dataset for heart health classification consists of electrocardiogram (ECG) signals
obtained from human subjects. These signals are captured using ECG sensors attached to
the body, typically on the chest area. The dataset contains various attributes extracted
from these signals, such as waveforms, intervals, and amplitudes. Each data point in the
dataset represents a specific ECG recording from an individual. Additionally, the dataset
includes corresponding labels indicating the heart health status of each subject, such as
normal, arrhythmia, or other cardiac conditions. This dataset serves as a valuable resource
for developing and testing machine learning algorithms aimed at accurately classifying
heart health based on ECG signals.
The performance evaluation of heart health classification involves various metrics, one
of which is the confusion matrix. A confusion matrix provides a detailed breakdown of
the model's performance by comparing predicted labels with actual labels. It consists of
four quadrants: true positive (TP), true negative (TN), false positive (FP), and false
negative (FN). The TP represents correctly classified instances of individuals with heart
disease, while TN indicates correctly classified instances of individuals without heart
disease. On the other hand, FP signifies instances where the model wrongly predicts the
presence of heart disease, and FN denotes instances where the model wrongly predicts
the absence of heart disease. By analyzing the values in each quadrant, researchers can
assess the accuracy, sensitivity, specificity, and other performance metrics of the heart
health classification model, providing valuable insights into its effectiveness in
identifying and classifying heart disease cases.
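Because the classification task in this project is multi-class, the four quadrants can be obtained for each class by treating that class as positive and every other class as negative. The following is a minimal Python sketch of this idea, not the project's source code; the class names, the sample labels, and the helper names `confusion_matrix` and `one_vs_rest_counts` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels]
            for actual in labels]


def one_vs_rest_counts(matrix, labels, target):
    """Collapse a multi-class matrix into TP, FP, FN, TN for one class."""
    i = labels.index(target)
    total = sum(sum(row) for row in matrix)
    tp = matrix[i][i]                         # target predicted as target
    fn = sum(matrix[i]) - tp                  # target predicted as another class
    fp = sum(row[i] for row in matrix) - tp   # another class predicted as target
    tn = total - tp - fn - fp                 # everything not involving target
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Hypothetical labels, for illustration only.
LABELS = ["normal", "atrial_premature", "pvc"]
y_true = ["normal", "normal", "normal", "atrial_premature",
          "atrial_premature", "pvc", "pvc", "pvc"]
y_pred = ["normal", "normal", "atrial_premature", "atrial_premature",
          "normal", "pvc", "pvc", "normal"]

matrix = confusion_matrix(y_true, y_pred, LABELS)
counts_pvc = one_vs_rest_counts(matrix, LABELS, "pvc")

for label, row in zip(LABELS, matrix):
    print(f"{label:>17}: {row}")
print("pvc one-vs-rest:", counts_pvc)
```

Reading the matrix row by row shows which classes are confused with each other, which a single accuracy figure does not reveal.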
The performance evaluation of heart health classification involves assessing the accuracy
and effectiveness of the classification model. One commonly used method for evaluating
the model's performance is the classification report. This report provides detailed metrics
such as precision, recall, and F1-score for each class in the classification task. Precision
represents the proportion of true positive predictions among all positive predictions made
by the model, indicating how reliable the positive predictions are. Recall, on the other
hand, measures the proportion of true positive predictions among all actual positive
instances in the dataset, reflecting the model's ability to correctly identify all positive
instances. The F1-score is the harmonic mean of precision and recall, providing a
balanced measure of the model's performance on both precision and recall.
In the context of heart health classification, the classification report helps healthcare
professionals understand how well the model performs in identifying different heart
conditions such as coronary artery disease, arrhythmias, or heart failure. By examining
the precision, recall, and F1-score for each class, clinicians can assess the model's
strengths and weaknesses in classifying specific heart conditions. For instance, a high
precision value indicates that the model makes few false positive predictions, while a
high recall value suggests that the model captures a large proportion of true positive
instances. Overall, the classification report provides valuable insights into the
performance of the heart health classification model, aiding healthcare providers in
making informed decisions about patient care and intervention strategies.
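As an illustration of how such a report is assembled, the sketch below computes precision, recall, F1-score, and support for each class from a list of true and predicted labels. It is a simplified stand-in written with the Python standard library only, under the assumption that labels are plain strings; the function name `per_class_report` and the sample labels are hypothetical and are not taken from the project's source code.

```python
def per_class_report(y_true, y_pred):
    """Return precision, recall, F1, and support for every class seen."""
    pairs = list(zip(y_true, y_pred))
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Guard against division by zero when a class is never predicted
        # or never present.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall,
                         "f1": f1, "support": tp + fn}
    return report


# Hypothetical labels, for illustration only.
y_true = ["normal", "normal", "normal", "normal", "pvc", "pvc"]
y_pred = ["normal", "normal", "normal", "pvc", "pvc", "normal"]

report = per_class_report(y_true, y_pred)
for label, m in report.items():
    print(f"{label:>8}  precision={m['precision']:.2f}  "
          f"recall={m['recall']:.2f}  f1={m['f1']:.2f}  support={m['support']}")
```

The support column matters when classes are imbalanced: a high score on a class with very few samples carries less evidence than the same score on a large class.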
1.9.3 METRICS
Accuracy: Accuracy is the metric most commonly used in everyday discussion. It
answers the question “Out of all the predictions we made, how many were correct?”
As we will see later, accuracy is a blunt measure and can sometimes be misleading.
Accuracy measures the overall correctness of the classification model by calculating the
ratio of correctly predicted cases to the total number of cases. Precision focuses on the
proportion of true positive cases among all the cases predicted as positive, highlighting
the model's ability to avoid false positives.
Recall: Recall focuses on how good the model is at finding all the positives. Recall is
also called true positive rate and answers the question “Out of all the data points that
should be predicted as true, how many did we correctly predict as true?”
Recall, also known as sensitivity, evaluates the model's ability to correctly identify all
relevant cases, measuring the proportion of true positive cases identified out of all actual
positive cases.
F1 score: The F1 score is a measure that combines recall and precision. Because there
is a trade-off between precision and recall, F1 can be used to measure how
effectively a model balances the two.
One important feature of the F1 score is that the result is zero if either component
falls to zero. It thereby penalizes extremely low values of either component.
The F1 score combines precision and recall into a single metric, providing a balanced
assessment of the classification model's performance by calculating the harmonic mean
of precision and recall.
These metrics collectively provide insights into the model's ability to accurately classify
individuals' heart health status, taking into account both true positives and false positives,
as well as true negatives and false negatives.
Precision: Precision is the proportion of true positives among all the
positives that the model predicts. It answers the question “Out of all the
positive predictions we made, how many were true?”
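For reference, the four metrics described in this section can be written in terms of the confusion matrix counts TP, TN, FP, and FN. These are the standard definitions; in a multi-class setting they are assumed here to be applied per class, with that class treated as positive.

```latex
\begin{align*}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN} \\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
     = \frac{2\,TP}{2\,TP + FP + FN}
\end{align*}
```

As a hypothetical worked example, a precision of 0.8 and a recall of 0.5 give F1 = (2 × 0.8 × 0.5) / (0.8 + 0.5) = 0.8 / 1.3 ≈ 0.615, which is lower than the arithmetic mean of 0.65. This shows how the harmonic mean pulls the score toward the weaker of the two components.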
CHAPTER-2
LITERATURE SURVEY
2.1 INTRODUCTION
Malakouti, Seyed Matin.[1] One of the most critical steps when diagnosing
cardiovascular disorders was examining and processing ECG data. Classification of
healthy and ill persons was the primary focus of research in this area, and approaches
based on machine learning were being used more often. Research in this area focused
mainly on classification, and an increasing number of researchers were turning to
Ozcan, Mert, and Serhat Peker.[2] Heart disease remained the leading cause of death,
such that nearly one-third of all deaths worldwide were estimated to be caused by heart-
related conditions. Advancing applications of classification-based machine learning to
medicine facilitated earlier detection. In this study, the Classification and Regression Tree
(CART) algorithm, a supervised machine learning method, was employed to predict heart
disease and extract decision rules to clarify relationships between input and output
variables. In addition, the study’s findings ranked the features influencing heart disease
based on importance. When considering all performance parameters, the 87% accuracy
of the prediction validated the model’s reliability. In addition, the extracted decision
rules reported in the study could simplify clinical use without requiring
additional knowledge. Overall, the proposed algorithm could support not only healthcare
professionals but also patients who faced cost and time constraints in the diagnosis
and treatment of heart disease.
Nguyen, Minh Tuan, Wei Wen Lin, and Jin H. Huang. [4] In this study, two deep learning
models were proposed to classify heart sounds
based on the log-mel spectrogram of heart sound signals. The heart sound
dataset comprised five classes, one normal class and four anomalous classes, namely,
Aortic Stenosis, Mitral Regurgitation, Mitral Stenosis, and Murmur in systole. First, the
heart sound signals were framed to a consistent length, and the log-mel
spectrogram features were then extracted. Two deep learning models, long short-term memory and
convolutional neural network, were proposed to classify heartbeat sounds based on the
extracted features. Analysis results demonstrated the high performance of classification
models, with an overall accuracy of about 99.67%. The results also showed higher
performance compared to previous studies.
Huang, Youhe, Hongru Li, and Xia Yu. [5] Electrocardiogram (ECG) was an important
tool used to analyze abnormal heart activity and assess heart health, especially in remote
cardiac health monitoring. Although deep learning had achieved significant results in
automatic ECG classification, how to combine the characteristics of ECG physiological
signals to construct inputs or features with differentiation was still a key point of
classification. To this end, a novel representation input method with temporal
characteristics was proposed in this paper. At first, the temporal characteristic of ECG
signals was extracted and transformed into a time representation input with the original
input. Subsequently, the deep learning network combining Convolutional Neural
Network and Long Short-Term Memory was employed for feature extraction.
Simultaneously, an attention mechanism was used to focus on feature differences. The
proposed method was validated in the classification of five classes of heartbeats (Normal
heartbeat, Left bundle branch block heartbeat, Right bundle branch block heartbeat, Atrial
Premature Contraction, Premature ventricular contraction), achieving a higher average
accuracy, precision, sensitivity, and specificity of 98.95%, 97.07%, 96.54%, and 99.33%
respectively in the MIT-BIH arrhythmia database. The results showed that the method
was able to combine the periodic characteristics of ECG to construct a better temporal
representation input than traditional feature fusion. This method could provide a new way
to classify similar physiological signals with periodic characteristics.
Sk, Khader Basha, D. Roja, Sunkara Santhi Priya, Lavanya Dalavi, Sai Srinivas Vellela,
and Venkateswara Reddy. [6] Digitalization in healthcare organizations
placed great emphasis on technological advances among clinical healthcare providers.
Traditional methods for measuring and evaluating patient outcomes when forecasting
and diagnosing chronic diseases were being replaced by techniques that captured the
most significant insights from medical information by combining predictive modeling
with machine learning. Heart disease was
among the most serious disorders in the world. Because the death rate from heart disease
remained significant, more intensive prevention efforts were required, such as
improving the accuracy of heart disease prediction systems. Additionally, early
diagnosis supported appropriate identification of the condition as well as the
management of its symptoms. Through predictive analytics, Machine Learning (ML)
techniques could be used to anticipate chronic diseases, including kidney and cardiac
disorders. Hence, this analysis presented coronary heart disease (CHD) prediction and
classification using hybrid machine learning methods. In this approach, a
combination of Decision Tree (DT) and AdaBoost algorithms was used as a hybrid
ML algorithm to predict CHD. The method's performance was assessed using
metrics such as accuracy, True Positive Rate (TPR), and specificity.
Li, Jiajia, Christopher Brown, Dillon J. Dzikowicz, Mary G. Carey, Wai Cheong Tam,
and Michael Xuelin Huang. [7] A machine learning-based heart health monitoring model,
named H2M, was developed. Twenty-four-hour electrocardiogram (ECG) data from 112
career firefighters were used to train the proposed model. The model used carefully
designed multilayer convolution neural networks with maximum pooling, dropout, and
global maximum pooling to effectively learn the indicative ECG characteristics. H2M
was benchmarked against three existing state-of-the-art machine learning models. Results
showed the proposed model was robust and had an overall accuracy of approximately
94.3%. A parametric study was conducted to demonstrate the effectiveness of key model
components. An additional data study was also carried out, and it was shown that using
non-firefighters’ ECG data to train the H2M model led to a substantial error of ∼40%.
The contribution of this work was to provide firefighters with on-demand, real-time
heart health status to enhance their situational awareness and safety. This could help
reduce firefighters’ injuries and deaths caused by sudden cardiac events.
Chen, Dan, Juan Feng, HongYan He, WeiPing Xiao, and XiaoJing Liu.[8] Evidence-based
medicine showed that obesity is associated with a wide range of cardiovascular (CV)
diseases. Obesity led to changes in cardiac structure and function, which could result in
obese cardiomyopathy, subclinical cardiac dysfunction, and even heart failure. It also
increased the risk of atrial fibrillation and sudden cardiac death. Many invasive and
noninvasive diagnostic methods could detect obesity-related heart disease at an early
stage, so that appropriate measures could be selected to prevent adverse CV events.
However, studies had shown a protective effect of obesity on clinical outcomes of CV
disease, a phenomenon that had been termed the obesity paradox. The “obesity paradox”
essentially referred to the fact that the classification of obesity defined by body mass
index (BMI) did not consider the impact of obesity heterogeneity on CV disease
prognosis, but simply put subjects with different clinical and biochemical characteristics
into the same category. In any case, indicators such as waist-to-hip ratio, qualitative
and quantitative measures of ectopic body fat, and CV fitness had been shown to distinguish
different CV risks in patients with the same BMI, which facilitated early
intervention in an appropriate way. A multidisciplinary approach, including lifestyle
modification, evidence-based generic and novel pharmacotherapy, and surgical
intervention, could improve CV outcomes in overweight/obese patients.
Maulani, Ahmad Alaik, Sri Winarno, Junta Zeniarja, Rusyda Tsaniya Eka Putri, and Ailsa
Nurina Cahyani.[9] Heart disease, the leading cause of death worldwide,
accounted for about 17.9 million deaths in 2019, or about 32% of total global deaths, according
to the World Health Organization (WHO). The significance of early detection of heart
disease drove research to develop effective diagnosis systems utilizing machine learning.
Machine learning in healthcare primarily played a
supporting role, helping clinicians and analysts fulfill their roles, identify
healthcare trends, and develop disease prediction models. Meanwhile, deep learning had
experienced rapid development and had become the most popular method in recent years,
one of which was detecting diseases. The main objective of this research was to optimize
the hybrid convolutional neural network (CNN) and long short-term memory (LSTM)
model for classifying heart disease by comparing hyperparameter optimization using grid
search and random search. Although random search required less time in hyperparameter
tuning, the classification performance results of grid search showed higher accuracy. In
the test, the hybrid CNN and LSTM model with grid search achieved 91.67% accuracy,
89.66% recall (sensitivity), 93.55% specificity, 92.86% precision, 91.23% f1-score, and
0.9310 AUC value. These results confirmed that using a hybrid CNN and LSTM model
with a grid search approach was better suited for classifying heart disease.
Parveen, Nikhat, Manisha Gupta, Shirisha Kasireddy, Md Shamsul Haque Ansari, and
Mohammad Nadeem Ahmed.[10] The timely prediction of heart diseases with an
automated system reduced the mortality rate of cardiac disease patients. However,
detecting cardiac disease was a difficult task because of small variations in the
ECG signal that were not easily visible to the human eye. To overcome this issue,
many techniques had been introduced to effectively classify the variation in beats.
However, those techniques had high error rates and failed to learn the spatiotemporal
features, which badly affected accuracy. Hence, a novel hybridized DL
technique was introduced, which analyzed the spatiotemporal features and performed
the heartbeat classification accurately with a lower error rate. At the initial stage, the signal
from the raw dataset was smoothed to improve accuracy. The
preprocessed samples were then balanced using the synthetic minority oversampling
technique (SMOTE) to avoid overfitting. Then, spatiotemporal features were
extracted using a novel hybridized DL-based One-Dimensional Residual Deep
Convolutional Auto-Encoder (1D-RDCAE) technique. Finally, an ML-based extreme
gradient boosting (XGB) classifier was introduced to classify the ECG heartbeats
effectively. The proposed model was implemented in Python and evaluated on the
MIT-BIH arrhythmia dataset. Performance measures like accuracy, sensitivity,
specificity, and false negative rate were analyzed and compared with existing techniques.
In the experimental section, the proposed model obtained an accuracy of 99.9% and a
specificity of 99.8%. Compared to other existing models, the proposed model showed
better outcomes. Consequently, clinical cardiac care systems might benefit from this
strategy as well.
of both structured and unstructured data, as well as the rapid advancement of analytical
methodologies. Medical diagnosis models were essential to saving human lives; thus, we
had to be confident enough to treat a patient as advised by a black-box model. Concerns
regarding the lack of openness and understandability, as well as potential bias in the
model's predictions, developed as AI's significance in healthcare increased. The use of
neural networks as a classification method became increasingly significant. The benefits
of neural networks made it possible to classify given data effectively. This study used an
optimized generalized metric learning neural network model approach to examine a
dataset on heart disorders. In the context of cardiac disease, the authors first examined
the correlation and interdependence of several medical features. One goal was to identify the
most pertinent characteristics (an ideal reduced feature subset) for detecting heart disease.
Shivadekar, Samit, Ketan Shahapure, Shivam Vibhute, and Ashley Dunn.[13] Heart
failure (HF) was a chronic cardiovascular disease that affected millions
of people globally. It could lead to various symptoms and had a significant impact on
quality of life. Despite the advancements that had been made in treating this condition, it
remained a major public health issue. One of the biggest challenges that HF management
faced was the high number of readmissions. This issue worsened
patient outcomes and increased costs for the healthcare system. Implementing effective interventions
and identifying those at high risk of returning to the hospital could help lower the financial
burden on the system. Through the use of machine learning techniques, researchers could
now predict the likelihood of HF readmissions. These tools could analyze large datasets
and provide a personalized diagnosis and treatment plan. There had been various studies
that had examined the use of ML for predicting HF readmissions. The goal of this study
was to analyze the various techniques used in predicting HF readmissions and provide a
comprehensive analysis of their performance. Through a combination of data collected
from various sources, including a diverse set of patients, we were able to explore the
performance of various ML algorithms. In addition to the algorithms' performance, we
also looked into their impact on various parameters, such as model evaluation metrics,
optimization techniques, and feature selection. The findings of this study would be used
to inform policymakers and healthcare providers about the use of ML techniques to
identify patients at high risk of HF readmissions. These insights could help them improve
the quality of care for those with this condition and develop effective interventions. The
objective of this study was to use the power of ML to improve the management of HF
and reduce the burden of readmissions on both the patients and the healthcare systems.
Chakraborty, C. Parnasree.[14] Heart disease was one of the major diseases which caused
a sudden loss of life. Early diagnosis of heart-related problems could prevent the
progression of the disease. In this paper, a Hybrid ensemble machine learning model was
suggested with a correlation-based feature selection algorithm. Our proposed model was
built using conventional ensemble bagging, boosting, and stacking methods. The standard
machine learning algorithms such as Support Vector Machine, k-Nearest Neighbors,
Logistic regression, Decision tree, and Gaussian naïve Bayes were used to build the
ensemble model. The suggested approach was well suited for medical assistance to
physicians as it achieved 97.4% disease classification accuracy and 98% precision,
improvements of 4% and 2%, respectively, over conventional methods.
Kaur, Ishleen, and Tanvir Ahmad.[15] The main goals of this study were to create a
reliable data analysis model that could help with (i) a better understanding of congenital
heart disease prediction in the presence of missing and unbalanced data and (ii) creating
cohorts of expectant mothers with similar lifestyle characteristics. Clusters of patient
cohorts were produced using the unsupervised data mining technique density-based
spatial clustering of applications with noise (DBSCAN). For more accurate CHD
prediction, a random forest model was trained using these clusters and their
corresponding patterns. This study used a dataset of 33,831 expectant mothers to make
its prediction. Missing data were handled using the k-NN imputation approach, while
extremely unbalanced data were balanced using SMOTE. These techniques were all data-
driven and needed little to no user or expert involvement.
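A minimal sketch of the interpolation idea behind SMOTE, assuming numeric features and Euclidean distance. The function name `smote_sketch`, its parameters, and the sample points are hypothetical; the cited study's own implementation is not described in the text.

```python
import math
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by interpolating
    between a random minority sample and one of its k nearest
    minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Candidate neighbours are all other minority samples.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (q - b) for b, q in zip(base, neighbour))
        )
    return synthetic

# Hypothetical minority-class samples with two numeric features.
minority = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0), (1.2, 1.8)]
new_points = smote_sketch(minority, n_new=3)
print(new_points)
```

Because each synthetic point lies on a segment between two real minority samples, the new data stay inside the region already occupied by the minority class.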
Wright, Brandon, Carly Fassler, Dmitry Tumin, and Lauren A. Sarno.[17] Patients with
congenital heart defects (CHD) seen in the clinic during 2018 and subsequently lost to
cardiology follow-up were included in the study. Loss to follow-up was defined as not
being seen in the clinic for at least 6 months past the most recently recommended follow-
up visit. Subsequent visits to other locations, including other subspecialty clinics, primary
care clinics, the emergency department (ED), and the hospital, were tracked through
2020. Of 235 patients (median age 7 years, 136/99 female/male), 96 (41%) were seen
elsewhere in the health system. Of 96 patients with any follow-up, 40 were seen by a
primary care provider and 46 by another specialist; 44 were seen in the emergency
department (ED) and 12 more were hospitalized. Patients with medical comorbidities or
Medicaid insurance and those living closer to the clinic were more likely to have
continued receiving care within the same health system.
Jou, Stephanie, Sean R. Mendez, Jason Feinman, Lindsey R. Mitrani, Valentin Fuster,
Massimo Mangiola, Nader Moazami, and Claudia Gidea.[18] Approximately 65 million
adults globally had heart failure, and the prevalence was expected to increase
substantially with ageing populations. Despite advances in pharmacological and device
therapy of heart failure, long-term morbidity and mortality remained high. Many patients
progressed to advanced heart failure and developed persistently severe symptoms. Heart
transplantation remained the gold-standard therapy to improve the quality of life,
functional status, and survival of these patients. However, there was a large imbalance
between the supply of organs and the demand for heart transplants. Therefore, expanding
the donor pool was essential to reduce mortality while on the waiting list and improve
clinical outcomes in this patient population. A shift had occurred to consider the use of
organs from donors with hepatitis C virus, HIV, or SARS-CoV-2 infection. Other
advances in this field had also expanded the donor pool, including opt-out donation
policies, organ donation after circulatory death, and xenotransplantation. We provided a
Baek, Ji Yoon, Seung Hee Seo, Sooyoung Cho, Jun-Bean Park, Bhumsuk Keam, Shin
Hye Yoo, and Aesun Shin.[20] This study aimed to examine the impact of the COVID-19
pandemic on the emergency department (ED) visits of cardiovascular disease (CVD)
patients. The customized data of the National Health Insurance Service (NHIS) from 2017
to 2020 were analyzed. CVD patients were defined by the code ‘V192’ based on the NHIS
coverage benefit expansion policy. The number of ED visits of CVD patients, as well as
executed procedures in 2020 (during the pandemic), were compared to the corresponding
average numbers in 2018 and 2019 (pre-pandemic). Stratification by age group,
residential area, and hospital location was performed. The number of ED visits of newly
diagnosed CVD patients decreased by 2.1% nationwide in 2020 (2018–2019: 97,041;
2020: 95,038) and decreased the most (by 14.1%) in March (2018–2019: 8539; 2020:
7334). However, the number of executed procedures increased by 1.1% nationwide in
2020 (2018–2019: 74,696; 2020: 75,520), while it decreased by 11.9% in April (2018–
2019: 6603; 2020: 5819). The most notable decreases in the number of newly diagnosed
CVD patients (31.7%) and procedures (29.2%) in March 2020 were observed in the
Daegu·Gyeongbuk area. CVD patients living in the epicenter of the COVID-19 pandemic
may have experienced difficulty accessing healthcare facilities and receiving proper
treatment.
dietary habits and behaviors among CVD patients in the South through implementation
science approaches as a means of promoting secondary CVD management in the region.
Wiatma, Deny Sutrisna, Reksa Samoedra, I. Putu Bayu Agus Saputra, and Bayu Setia.[22]
In 2019, there were 523 million cases of cardiovascular disease, which caused the
deaths of 18.6 million people. Protective factors such as high cardiovascular endurance
therefore needed to be considered, since high cardiovascular endurance
could reduce the incidence of cardiovascular disease by 40% to 70%. The objective was
to analyze the relationship of physical activity and smoking habits with cardiovascular
endurance among farmers in Pandan Wangi Village. The method used was quantitative research
with a cross-sectional design involving 108 respondents. The respondents were selected by a simple
random sampling technique. Data in this study were collected using GPAQ for physical
activity variables, Brinkman index questionnaire for smoking variables, and Harvard step
test for cardiovascular endurance variables. Meanwhile, Spearman rank test was used in
the data analysis. The research showed that the characteristics of respondents were
dominated by males (64.8%) within the 36-45 years old age range group (52.8%). In
addition, most of the respondents were in the non-smoker category (62.0%), had a high
level of physical activity (52.8%), and a very good level of cardiovascular endurance
(27.8%). Bivariate analysis showed that both physical activity (p-value = 0.005) and
smoking behavior (p-value = 0.047) had a significant relationship with cardiovascular
endurance among farmers in Pandan Wangi Village.
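The Spearman rank test used in this study measures how consistently one variable rises or falls with another. The sketch below computes only the rank correlation coefficient (not the p-value) as the Pearson correlation of average ranks; the function names and the sample scores are hypothetical and are not data from the cited study.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman coefficient: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical physical-activity scores and endurance scores.
activity = [600, 1200, 1800, 2400, 3000]
endurance = [41, 55, 58, 70, 93]
rho = spearman_rho(activity, endurance)
print(round(rho, 3))
```

A coefficient near +1 or -1 indicates a strong monotonic relationship; a significance test would still be needed to obtain p-values like those reported above.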
professional life. Redo-surgery was almost inevitable in patients who had multi-stage
repair of congenital heart surgeries and biological valves at a young age, and often in
those having valve repair in rheumatic disease. So, being familiar with the pitfalls and
precautions to be taken was of crucial importance. In general, the patients presenting for
repeat procedures were sicker, older, and had more comorbid conditions. The dissection
was always rendered difficult by adhesions, scarring, and previous graft placements.
Hence, prolonged dissection time, intraoperative injuries to heart chambers, great vessels,
and grafts, increased bleeding, and poorer cardiac function resulted in higher morbidity
and mortality in such subsets of patients. The outcome was worse with emergency redo-
cardiac surgeries.
Charchar, Fadi J., Priscilla R. Prestes, Charlotte Mills, Siew Mooi Ching, Dinesh
Neupane, Francine Z. Marques, James E. Sharman et al.[24] Hypertension, defined as
persistently elevated systolic blood pressure (SBP) of at least 140 mmHg and/or diastolic blood
pressure (DBP) of at least 90 mmHg (International Society of Hypertension guidelines),
affected over 1.5 billion people worldwide. Hypertension was associated with increased
risk of cardiovascular disease (CVD) events (e.g., coronary heart disease, heart failure,
and stroke) and death. An international panel of experts convened by the International
Society of Hypertension College of Experts compiled lifestyle management
recommendations as a first-line strategy to prevent and control hypertension in adulthood.
It was also recommended that lifestyle changes be continued even when blood pressure-
lowering medications were prescribed. Specific recommendations based on literature
evidence were summarized with advice to start these measures early in life, including
maintaining a healthy body weight, increasing levels of different types of physical
activity, adopting healthy eating and drinking habits, avoiding and ceasing smoking and
alcohol use, and managing stress and sleep levels. The relevance of specific approaches
including consumption of sodium, potassium, sugar, fiber, coffee, tea, intermittent
fasting, as well as integrated strategies to implement these recommendations using, for
example, behavior change-related technologies and digital tools, was also discussed.
Seng, Nang San Hti Lar, Gebremichael Zeratsion, Oscar Yasser Pena Zapata, Muhammad
Umer Tufail, and Belinda Jim.[26] Cardiovascular disease was a major cause of death
worldwide, especially in patients with chronic kidney disease (CKD). Troponin T and
troponin I were cardiac biomarkers used not only to diagnose acute myocardial infarction
(AMI) but also to prognosticate cardiovascular and all-cause mortality. The diagnosis of
AMI in the CKD population was challenging because of their elevated troponins at
baseline. The development of high-sensitivity cardiac troponins shortened the time
needed to rule in and rule out AMI in patients with normal renal function. While the
sensitivity of high-sensitivity cardiac troponins was preserved in the CKD population, the
specificity of these tests was compromised. Hence, diagnosing AMI in CKD remained
problematic even with the introduction of high-sensitivity assays. The prognostic
significance of troponins did not differ whether they were detected with standard or
high-sensitivity assays. The elevation of both troponin T and troponin I in CKD patients
remained strongly correlated with adverse cardiovascular and all-cause mortality, and the
prognosis became poorer with advanced CKD stages. Interestingly, the degree of troponin
elevation appeared to be predictive of the rate of renal decline via unclear mechanisms,
though activation of the renin-angiotensin and other hormonal/oxidative stress systems
remained suspect. In this review, we presented the latest evidence of the use of cardiac
troponins in both the diagnosis of AMI and the prognosis of cardiovascular and all-cause
mortality. We also suggested strategies to improve on the diagnostic capability of these
troponins in the CKD/end-stage kidney disease population.
Thummisetti, Bala Siva Prakash, and Haritha Atluri.[27] This research paper explored the
transformative potential of federated learning in healthcare informatics, focusing on its
pivotal role in balancing advancements with privacy and security imperatives. In an era
marked by exponential growth in healthcare data, federated learning emerged as a
promising paradigm to enable collaborative model training without compromising the
confidentiality of sensitive patient information. Through a decentralized approach, this
paper elucidated the mechanisms of secure aggregation, differential privacy, and
encryption protocols inherent in federated learning, emphasizing their significance in
preserving data privacy. By dissecting real-world implementations and case studies, it
underscored the practical applicability of federated learning while addressing ethical
implications, regulatory considerations, and potential challenges. Ultimately, this paper
advocated for the widespread integration of federated learning in healthcare informatics,
positing it as a cornerstone in advancing medical research while ensuring robust privacy
and security safeguards.
Shield, Kevin, Catherine Paradis, Peter Butt, Tim Naimi, Adam Sherk, Mark Asbridge,
Daniel Myran et al.[28] Low-Risk Alcohol Drinking Guidelines (LRDGs) aimed to
reduce the harms caused by alcohol. However, considerable discrepancies existed in the
‘low-risk’ thresholds employed by different countries. Drawing upon Canada's LRDGs
update process, the current paper offered the following propositions for debate regarding
the establishment of ‘low-risk’ thresholds in national guidelines: (1) as an indicator of
health loss, years of life lost (YLL) had several advantages that could make it more
suitable for setting guidelines than deaths, premature deaths, or disability-adjusted years
of life (DALYs) lost. (2) Presenting age-specific guidelines may not have been the most
appropriate way of providing LRDGs. (3) Given past overemphasis on the so-called
protective effects of alcohol on health, presenting cause-specific guidelines may not have
been appropriate compared with a ‘whole health’ effect derived from a weighted
composite risk function comprising conditions that were causally related to alcohol
consumption. (4) To help people reduce their alcohol use, presenting different risk zones
associated with alcohol consumption instead of a single low-risk threshold may have been
advantageous.
Neshat, Sina, Abbas Rezaei, Armita Farid, Salar Javanshir, Fatemeh Dehghan Niri,
Padideh Daneii, Kiyan Heshmat-Ghahdarijani, and Setayesh Sotoudehnia Korani.
[29] Cardiovascular diseases (CVDs) posed a serious threat to people’s health, with
extremely high global morbidity, mortality, and disability rates. This study aimed to
review the literature that examined the relationship between blood groups and CVD.
Many studies had reported that non-O blood groups were associated with an increased
risk and severity of coronary artery disease and acute coronary syndromes. Non-O blood
groups increased the risk and severity of these conditions by increasing von Willebrand
factor and plasma cholesterol levels and inducing endothelial dysfunction and
inflammation. They were also linked with increased coronary artery calcification,
coronary lesion complexity, and poor collateral circulation. Blood groups also affected
the prognosis of coronary artery disease and acute coronary syndrome and could alter the
rate of complications and mortality. Several cardiovascular complications were described
for coronavirus disease 2019, and blood groups could influence their occurrence. No
studies found a significant relationship between the Lewis blood group and CVD. In
conclusion, people with non-O blood groups should be vigilantly monitored for
cardiovascular risk factors as prevention and proper treatment of these risk factors might
mitigate their risk of CVD and adverse cardiovascular events.
Lee, Chien-Chiang, and Zihao Yuan. "Impact of energy poverty on public health: A
non-linear study from an international perspective."[30] Research on energy poverty and its
impact had been quite extensive, but research on its impact on public health was still
lacking. This paper thus presented the relationship between energy poverty and public
health of 185 countries from 2000 to 2020 as well as the role of urbanization development
levels in this nexus. To achieve this goal, this study used a partial linear function
coefficient (PLFC) method to analyze the relationship between them, which could also
clearly exhibit the non-linear impact of energy poverty on public health. First, both linear
and non-linear regression results showed that energy poverty had significantly negative
impacts on public health. Second, urbanization level had a significant moderating
effect in the energy poverty and public health nexus, meaning that energy poverty affected
public health under the influence of urbanization. According to the PLFC model results,
countries that exceeded the threshold of urbanization had significantly reduced the
adverse effects of energy poverty on public health. Third, this study investigated the
heterogeneous impact of energy poverty across different regions, comparing the
Sub-Saharan Africa region with other areas. The results revealed that, in the Sub-Saharan
Africa region, affordable energy under the influence of urbanization provided a new
pathway for improving public health, whereas this effect was considerably
smaller in other regions. Additionally, a series of tests confirmed the robustness of the
results. This paper offered a reference for the development and implementation of
renewable energy-related public health policies.
Li, Jian Ping, Amin Ul Haq, Salah Ud Din, Jalaluddin Khan, Asif Khan, and Abdus
Saboor.[31] Heart disease was one of the complex diseases, and globally, many people
suffered from this disease. Timely and efficient identification of heart disease played a
key role in healthcare, particularly in the field of cardiology. In our article, we proposed
an efficient and accurate system to diagnose heart disease, based on machine learning
techniques. The system was developed based on classification algorithms, including
Support Vector Machine, Logistic Regression, Artificial Neural Network, K-nearest
Neighbor, Naïve Bayes, and Decision Tree, while standard feature selection algorithms
were used, such as Relief, Minimal Redundancy Maximal Relevance, Least Absolute
Shrinkage Selection Operator, and Local Learning for removing irrelevant and redundant
features. We also proposed a novel fast conditional mutual information feature selection
algorithm to solve the feature selection problem. The feature selection algorithms were
used for feature selection to increase the classification accuracy and reduce the execution
Deng, Muqing, Tingting Meng, Jiuwen Cao, Shimin Wang, Jing Zhang, and Huijie
Fan.[32] Heart sound classification played a vital role in the early detection of
cardiovascular disorders, especially for small primary health care clinics. Despite much
progress being made for heart sound classification in recent years, most methods were
based on conventional segmented features and shallow structure-based classifiers. These
conventional acoustic representation and classification methods might have been
insufficient in characterizing heart sound and generally suffered from degraded
performance due to the complicated and changeable cardiac acoustic environment. In our
paper, we proposed a new heart sound classification method based on improved Mel-
frequency cepstrum coefficient (MFCC) features and convolutional recurrent neural
networks. The Mel-frequency cepstrums were first calculated without dividing the heart
sound signal. We proposed a new improved feature extraction scheme based on MFCC
to elaborate the dynamic characteristics among consecutive heart sound signals. Finally,
the MFCC-based features were fed to a deep convolutional and recurrent neural network
(CRNN) for feature learning and later classification task. The proposed deep learning
framework could take advantage of the encoded local characteristics extracted from the
convolutional neural network (CNN) and the long-term dependencies captured by the
recurrent neural network (RNN). We presented comprehensive studies on the
performance of different network parameters and different network connection strategies.
Performance comparisons with state-of-the-art algorithms were given for discussions.
Experiments showed that, for the two-class classification problem (pathological or non-
pathological), a classification accuracy of 98% had been achieved on the 2016
PhysioNet/CinC Challenge database.
Abdellatif, Abdallah, Hamdan Abdellatef, Jeevan Kanesan, Chee-Onn Chow, Joon Huang
Chuah, and Hassan Muwafaq Gheni.[33] Cardiovascular disease (CVD) was the leading
cause of death worldwide. A Machine Learning (ML) system could predict CVD in the
early stages to mitigate mortality rates based on clinical data. Recently, many research
works utilized different machine learning approaches to detect CVD or identify the
patient’s severity level. Although these works obtained promising results, none focused
on employing optimization methods to improve the ML model performance for CVD
detection and severity-level classification. This study provided an effective method based
on the Synthetic Minority Oversampling Technique (SMOTE) to handle the imbalanced
distribution issue, six different ML classifiers to detect the patient status, and
Hyperparameter Optimization (HPO) to find the best hyperparameters for each ML classifier
together with SMOTE. Two public datasets were used to build and test the model using
all features. The results showed that SMOTE and Extra Trees (ET) optimized using
hyperband achieved higher results than other models and outperformed the state-of-the-art
works by achieving 99.2% and 98.52% in CVD detection, respectively. Also, the
developed model achieved 95.73% in severity classification using the Cleveland
dataset. The proposed model could help doctors determine a patient’s current heart
disease status. As a result, it was possible to prevent heart disease-related mortality by
implementing early therapy.
Chen, Yongchao, Shoushui Wei, and Yatao Zhang.[34] We proposed a novel method that
combined modified frequency slice wavelet transform (MFSWT) and convolutional
neural network (CNN) for classifying normal and abnormal heart sounds. A hidden
Markov model was used to find the position of each cardiac cycle in the heart sound
signal and determine the exact position of the four periods of S1, S2, systole, and diastole.
Then, the one-dimensional cardiac cycle signal was converted into a two-dimensional
time-frequency picture using the MFSWT. Finally, two CNN models were trained using
the aforementioned pictures. We combined two CNN models using sample entropy
(SampEn) to determine which model was used to classify the heart sound signal. We
evaluated our model on the heart sound public dataset provided by the PhysioNet
Computing in Cardiology Challenge 2016. Experimental classification performance from
a 10-fold cross-validation indicated that sensitivity (Se), specificity (Sp), and mean
accuracy (MAcc) were 0.95, 0.93, and 0.94, respectively. The results showed the
proposed method could classify normal and abnormal heart sounds with efficiency and
high accuracy.
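Assuming the usual PhysioNet/CinC Challenge 2016 definition, in which mean accuracy is the average of sensitivity and specificity, the three figures reported above are mutually consistent. The short check below is a sketch; the function name `mean_accuracy` is hypothetical.

```python
def mean_accuracy(sensitivity, specificity):
    """Mean accuracy (MAcc) as the average of Se and Sp."""
    return (sensitivity + specificity) / 2

# Reported values: Se = 0.95, Sp = 0.93.
macc = mean_accuracy(0.95, 0.93)
print(f"MAcc = {macc:.2f}")
```

The result rounds to the reported MAcc of 0.94.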
Shah, Devansh, Samir Patel, and Santosh Kumar Bharti.[35] Heart disease, alternatively
known as cardiovascular disease, encompassed various conditions that impacted the heart and had
been the primary cause of death worldwide over the past few decades. Heart disease was associated
with many risk factors, and there was a pressing need for accurate, reliable, and
sensible approaches to make an early diagnosis and achieve prompt management of the
disease. Data mining was a commonly used technique for processing enormous data in the
healthcare domain. Researchers applied several data mining and machine learning
techniques to analyse huge, complex medical data, helping healthcare professionals to
predict heart disease. This research paper presented various attributes related to heart
disease and a model based on supervised learning algorithms such as Naïve Bayes,
decision tree, K-nearest neighbor, and random forest. It used the existing
dataset from the Cleveland database of the UCI repository of heart disease patients. The
dataset comprised 303 instances and 76 attributes. Of these 76 attributes, only 14
were considered for testing, which was important to substantiate the performance of the
different algorithms. The paper aimed to estimate the probability of developing heart
disease in the patients. The results showed that the highest accuracy score was achieved with
K-nearest neighbor.
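As a rough illustration of the K-nearest neighbor idea, the sketch below classifies a query by majority vote among its k closest training samples under Euclidean distance. The function name `knn_predict` and the toy records (age, cholesterol) are hypothetical and are not drawn from the Cleveland dataset.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest
    training samples (Euclidean distance)."""
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical (age, cholesterol) records; label 1 = disease, 0 = healthy.
train_X = [(45, 180), (50, 200), (48, 190), (62, 290), (65, 300), (60, 280)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, query=(63, 295)))
print(knn_predict(train_X, train_y, query=(47, 185)))
```

In practice the features would normally be scaled first, since distance-based methods are sensitive to attributes measured on different ranges.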
Oliveira, Jorge, Francesco Renna, Paulo Dias Costa, Marcelo Nogueira, Cristina Oliveira,
Carlos Ferreira, Alípio Jorge et al.[36] Cardiac auscultation was one of the most cost-
effective techniques used to detect and identify many heart conditions. Computer-assisted
decision systems based on auscultation could support physicians in their decisions.
Unfortunately, the application of such systems in clinical trials was still minimal since
most of them only aimed to detect the presence of extra or abnormal waves in the
phonocardiogram signal, i.e., only a binary ground truth variable (normal vs. abnormal)
was provided. This was mainly due to the lack of large publicly available datasets, where
a more detailed description of such abnormal waves (e.g., cardiac murmurs) existed. To
pave the way for more effective research on healthcare recommendation systems based
on auscultation, our team prepared the currently largest pediatric heart sound dataset. A
total of 5282 recordings were collected from the four main auscultation locations of 1568
patients; in the process, 215780 heart sounds were manually annotated. Furthermore, and
for the first time, each cardiac murmur was manually annotated by an expert annotator
according to its timing, shape, pitch, grading, and quality. In addition, the auscultation
locations where the murmur was present were identified as well as the auscultation
location where the murmur was detected more intensively. Such a detailed description for
a relatively large number of heart sounds may pave the way for new machine learning
algorithms with a real-world application for the detection and analysis of murmur waves
for diagnostic purposes.
Balaji, Tata.[37] Heart Disease (HD) was one of the critical diseases that severely
affected humankind. Heart disease arose due to insufficient blood supply
to other body parts. Hence, diagnosing HD on time could prevent heart failure.
Traditional diagnosing procedures for HD detection and prediction became
unreliable in many circumstances. Recent studies provided evidence that the application
of Machine Learning (ML) in traditional HD detection and prediction resulted in superior
performance. Further, Computer-Aided Diagnosis using one-dimensional and
multi-dimensional signals assisted in diagnosing HD at an early stage, thereby saving
human lives. The objective of this manuscript was to present an overview of HDs,
symptoms, and the role of ML in HD predictions followed by various state-of-the-art ML
algorithms that aided in the identification and prediction of HD at an early stage to save
human lives.
Pati, Abhilash, Manoranjan Parhi, and Binod Kumar Pattanayak.[38] The prediction of
heart disease (HD) helped physicians make accurate decisions towards the
improvement of patients' health. Hence, machine learning (ML), data mining (DM), and
classification techniques played a vital role in understanding and reducing the symptoms
related to HDs. In this paper, an integrated heart disease prediction model (IHDPM) was
introduced for HD prediction by considering principal component analysis (PCA) for
dimensionality reduction, sequential feature selection (SFS) for feature selection, and
a random forest (RF) classifier for classification. Experiments were performed in Python
using different evaluation measures on the Cleveland Heart Disease Dataset (CHDD)
sourced from the UCI-ML repository, leading to the conclusion that the
proposed model outperformed six other conventional classification techniques. The
proposed model could help physicians diagnose
heart patients proficiently and, at the same time, could be applicable to the
prediction of other chronic diseases such as diabetes and cancer.
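Sequential feature selection can be sketched as a greedy loop that repeatedly adds the feature giving the best score. The version below is a forward variant under stated assumptions: the function name, the feature names, and the additive `toy_score` are all hypothetical stand-ins, and in a real pipeline the score would typically be the cross-validated accuracy of the classifier on the candidate subset.

```python
def sequential_forward_selection(features, score, n_select):
    """Greedily pick n_select features, adding at each step the one
    that maximises score(selected + [candidate])."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < n_select:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical usefulness weights standing in for a real model score.
weights = {"age": 0.3, "chol": 0.2, "thalach": 0.5, "noise": 0.0}

def toy_score(subset):
    """Toy objective: sum of the assumed weights of the chosen features."""
    return sum(weights[f] for f in subset)

chosen = sequential_forward_selection(list(weights), toy_score, n_select=2)
print(chosen)
```

The greedy search evaluates far fewer subsets than an exhaustive search, at the cost of possibly missing feature combinations that are only useful together.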
Vamshi Kumar, S., T. V. Rajinikanth, and S. Viswanadha Raju.[40] Recent studies showed
that heart attack was one of the severe problems in today’s world. Prediction was one of
the crucial challenges in the medical field. The heart was supplied with blood through two
main coronary arteries. If these arteries got completely blocked,
a heart attack followed. The healthcare field had lots of data related to different
diseases, so machine learning techniques were useful to find results effectively for
predicting heart diseases. In this paper, the data were preprocessed to remove
noisy data and to fill missing values using measures of central tendency. Later, the
refined dataset was used for classification in addition to prediction. The number of
attributes was reduced using dimensionality reduction techniques, namely Linear
Transformation Techniques (LTT) such as Principal Component Analysis (PCA) and Linear
Discriminant Analysis (LDA). The performance of the classifiers was analyzed based
on various accuracy-related metrics. The designed classifier model was able to predict
the occurrence of a heart attack. The Support Vector Machine (SVM) classifier was
applied with three kernels, namely Linear (linear), Radial Basis Function (RBF),
and Polynomial (poly). Another technique, namely Decision Tree (DT), was also applied
on the Cleveland dataset; the results were compared in detail, and
conclusions were drawn from the results.
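The three SVM kernels named above differ only in how they measure similarity between two feature vectors. The sketch below writes out their standard textbook forms; the function names and the default values of `gamma`, `degree`, and `coef0` are assumptions for illustration and are not the settings used in the cited study.

```python
import math

def linear_kernel(x, y):
    """Linear kernel: plain dot product."""
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def polynomial_kernel(x, y, degree=2, gamma=1.0, coef0=1.0):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    return (gamma * linear_kernel(x, y) + coef0) ** degree

# Two hypothetical feature vectors.
u, v = (1.0, 2.0), (3.0, 4.0)
print(linear_kernel(u, v))
print(rbf_kernel(u, v))
print(polynomial_kernel(u, v))
```

The linear kernel yields a straight decision boundary, while the RBF and polynomial kernels let the SVM separate classes that are not linearly separable in the original feature space.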
Ali, Farman, Shaker El-Sappagh, SM Riazul Islam, Daehan Kwak, Amjad Ali,
Muhammad Imran, and Kyung-Sup Kwak.[41] The accurate prediction of heart disease
was essential to efficiently treating cardiac patients before a heart attack occurred. This
goal could be achieved using an optimal machine learning model with rich healthcare
data on heart diseases. Various systems based on machine learning had been presented
recently to predict and diagnose heart disease. However, these systems could not handle
high-dimensional datasets due to the lack of a smart framework that could use different
sources of data for heart disease prediction. In addition, the existing systems utilized
conventional techniques to select features from a dataset and compute a general weight
for them based on their significance. These methods had also failed to enhance the
performance of heart disease diagnosis. In this paper, a smart healthcare system was
proposed for heart disease prediction using ensemble deep learning and feature fusion
approaches. First, the feature fusion method combined the extracted features from both
sensor data and electronic medical records to generate valuable healthcare data. Second,
the information gain technique eliminated irrelevant and redundant features and selected
the important ones, which decreased the computational burden and enhanced the system
performance. In addition, the conditional probability approach computed a specific
feature weight for each class, which further improved system performance. Finally, the
ensemble deep learning model was trained for heart disease prediction. The proposed
system was evaluated with heart disease data and compared with traditional classifiers
based on feature fusion, feature selection, and weighting techniques. The proposed
system obtained an accuracy of 98.5%, which was higher than existing systems. This
result showed that the proposed system was more effective for the prediction of heart
disease than other state-of-the-art methods.
Katarya, Rahul, and Sunit Kumar Meena.[42] People were increasingly caught up in their
day-to-day work and were ignoring their health. Due to this hectic lifestyle and neglect
of health, the number of people getting sick increased every day, and many of them
suffered from heart disease. According to data from the World Health Organization
(WHO), almost 31% of global deaths were due to heart-related diseases. Predicting
whether or not heart disease would occur therefore became important for the medical
field. However, the data received by the medical sector or hospitals was so large that it
was sometimes difficult to analyze. Using machine learning techniques for this prediction
and for handling the data could be very efficient for medical practitioners. Hence, in this
study, the authors discussed heart disease and its risk factors and explained machine
learning techniques. Using those techniques, they predicted heart disease and provided a
comparative analysis of the machine learning algorithms used in the experiments. The
objective of the research was the prediction of heart disease via machine learning
techniques and the analysis of those techniques.
Katarya, Rahul, and Polipireddy Srinivas.[43] Predicting and detecting heart disease has
always been a critical and challenging task for healthcare practitioners. Hospitals and
other clinics offered expensive therapies and operations to treat heart diseases. So,
predicting heart disease at the early stages was useful to the people around the world so
that they could take necessary actions before it got severe. Heart disease was a significant
problem in recent times; the main reason for this disease was the intake of alcohol,
tobacco, and lack of physical exercise. Over the years, machine learning showed effective
results in making decisions and predictions from the broad set of data produced by the
healthcare industry. Some of the supervised machine learning techniques used in this
prediction of heart disease were artificial neural network (ANN), decision tree (DT),
random forest (RF), support vector machine (SVM), naïve Bayes (NB), and k-nearest
neighbor algorithm. Furthermore, the performances of these algorithms were
summarized.
Rath, Adyasha, Debahuti Mishra, Ganapati Panda, and Suresh Chandra Satapathy.[44] In
comparison to other diseases, the number of deaths from Heart Disease (HD) was the
highest across the globe. The trend of death due to HD was still rising, which had become
a constant source of concern amongst human beings. Researchers and doctors were
putting tremendous efforts to save lives from HD. It was observed from the literature that
a large number of researchers were currently carrying out their research work in various
aspects of HD. Among those, the early detection and diagnosis of HD were currently the
focus area of research. Appropriate, reliable, accurate, robust, and affordable HD
detection schemes were the ultimate goal for saving the lives of people. In this research,
articles on HD detection and diagnosis published in the recent past were collected and
critically analyzed. The outcome of the analysis was presented in various tabular forms
for easy understanding and further use. The paper provided thorough knowledge on
standard data sources on HD, the feature extraction, selection, and reduction methods,
and Machine Learning (ML) and Deep Learning (DL) based classification schemes. The
categorization of published articles and the various performance measures employed
were presented, which would develop interest amongst new researchers working in the
area of detection or classification of HD. The best performing technique in each category
was listed. The research challenges and future scope of work were also provided to
facilitate further research work in this promising area.
Kumar, Ashish, Rama Komaragiri, and Manjeet Kumar.[46] Heart rate monitoring and
therapeutic devices included real-time sensing capabilities reflecting the state of the heart.
Current circuitry could be interpreted as a cardiac electrical signal compression algorithm
representing the time signal information into a single event description of the cardiac
activity. It was observed that some detection techniques developed for ECG signal
detection like artificial neural network, genetic algorithm, Hilbert transform, hidden
Markov model were some sophisticated algorithms which provided suitable results but
their implementation on a silicon chip was very complicated. Due to less complexity and
high performance, wavelet transform-based approaches were widely used. In this paper,
after a thorough analysis of various wavelet transforms, it was found that Biorthogonal
wavelet transform was best suited to detect ECG signal's QRS complex. The main steps
involved in the ECG detection process consisted of de-noising and locating different ECG
peaks using adaptive slope prediction thresholding. Furthermore, the significant
challenges involved in the wireless transmission of ECG data were data conversion and
power consumption. As medical regulatory boards demanded a lossless compression
technique, a lossless compression technique with a high bit compression ratio was highly
required. Furthermore, in this work, an LZMA based ECG data compression technique
was proposed. The proposed methodology achieved the highest signal to noise ratio, and
lowest root mean square error. Also, the proposed ECG detection technique was capable
of distinguishing accurately between healthy, myocardial infarction, congestive heart
failure and coronary artery disease patients with a detection accuracy, sensitivity,
specificity, and error of 99.92%, 99.94%, 99.92% and 0.0013, respectively. The use of
LZMA data compression of ECG data achieved a high compression ratio of 18.84. The
advantages and effectiveness of the proposed algorithm were verified by comparing with
the existing methods.
Bahrami, Boshra, and Mirsaeid Hosseini Shirvani.[47] Background: New age- and
sex-specific lipoprotein cut points that were developed from National Health and Nutrition
Examination Survey (NHANES) data were considered to be a more accurate
classification of a high-risk lipoprotein level in adolescents compared with existing cut
points established by the National Cholesterol Education Program (NCEP). The aim of
the study was to determine which of the NHANES or NCEP adolescent lipoprotein
classifications was most effective for predicting abnormal levels in adulthood.
Dewan, Ankita, and Meghna Sharma.[52] Fluctuations in heart rate were intimately
related to changes in the physiological state of the organism. This relationship was
exploited by classifying a human participant's wake/sleep status using their instantaneous
heart rate (IHR) series. An approach was employed using a convolutional neural network
(CNN) to build features from the IHR series extracted from a whole-night
electrocardiogram (ECG) and predict every 30 seconds whether the participant was
awake or asleep. The training database consisted of 56 normal participants, and three
different databases were considered for validation; one was private, and two were public
with different races and apnea severities. On the private database of 27 participants, the
accuracy, sensitivity, specificity, and F1 values for predicting the wake stage were 75.3%,
52.4%, 89.4%, and 0.83, respectively. Validation performance was similar on the two
public databases. When the photoplethysmography was used instead of the ECG to obtain
the IHR series, the performance was also comparable. A robustness check was carried out
to confirm the obtained performance statistics. These results advocated for an effective
and scalable method for recognizing changes in physiological state using non-invasive
heart rate monitoring. The CNN model adaptively quantified IHR fluctuation as well as
its location in time and was suitable for differentiating between the wake and sleep stages.
Shuvo, Samiul Based, Shams Nafisa Ali, Soham Irtiza Swapnil, Mabrook S. Al-Rakhami,
and Abdu Gumaei.[53] The alarmingly high mortality rate and increasing global
prevalence of cardiovascular diseases (CVDs) signified the crucial need for early
detection schemes. Phonocardiogram (PCG) signals had been historically applied in this
domain owing to their simplicity and cost-effectiveness. That article proposed CardioXNet,
a novel lightweight end-to-end CRNN architecture for automatic detection of
five classes of cardiac auscultation namely normal, aortic stenosis, mitral stenosis, mitral
regurgitation, and mitral valve prolapse using raw PCG signal. The process was
automated by the involvement of two learning phases namely, representation learning,
and sequence residual learning. Three parallel CNN pathways were implemented in the
representation learning phase to learn the coarse and fine-grained features from the PCG
and to explore the salient features from variable receptive fields involving 2D-CNN based
squeeze-expansion. Thus, in the representation learning phase, the network extracted
efficient time-invariant features and converged with great rapidity. In the sequential
residual learning phase, because of the bidirectional-LSTMs and the skip connection, the
network could proficiently extract temporal features without performing any feature
extraction on the signal. The obtained results demonstrated that the proposed end-to-end
architecture yielded outstanding performance in all the evaluation metrics compared to
the previous state-of-the-art methods with up to 99.60% accuracy, 99.56% precision,
99.52% recall, and 99.68% F1-score on an average while being computationally
comparable. This model outperformed any previous works using the same database by a
considerable margin. Moreover, the proposed model was tested on the PhysioNet/CinC
2016 challenge dataset achieving an accuracy of 86.57%. Finally, the model was
evaluated on a merged dataset of Github PCG dataset and PhysioNet dataset achieving
an excellent accuracy of 88.09%.
Reddy, N. Satish Chandra, Song Shue Nee, Lim Zhi Min, and Chew Xin Ying.[54] Heart
disease has been one of the major causes of death worldwide. Heart disease
diagnosis was expensive, so it was necessary to predict the risk of getting heart disease
with selected features. The feature selection methods could be used as valuable
techniques to reduce the cost of diagnosis by selecting the important attributes. The
objectives of the study were to predict the classification model and to know which
selected features played a key role in the prediction of heart disease by using Cleveland
and statlog project heart datasets. The accuracy of the random forest algorithm both in
classification and feature selection models was observed to be 90–95% based on three
different percentage splits. The 8 and 6 selected features appeared to be the minimum
feature requirements for building a well-performing model; further dropping of
the 8 or 6 selected features did not lead to better performance for the prediction model.
Deng, Shi-Wen, and Ji-Qing Han.[56] In the study, a novel framework for heart sound
classification without segmentation was presented, based on the autocorrelation feature
and diffusion maps, aiming to provide a primary diagnosis in primary health centers and
home care settings. In the proposed framework, autocorrelation features were first
extracted from the sub-band envelopes calculated from the sub-band coefficients of the
heart signal using discrete wavelet decomposition (DWT). Then, the autocorrelation
features were fused to obtain the unified feature representation with diffusion maps.
Finally, the unified feature was input into the Support Vector Machines (SVM) classifier
to perform the task of heart sound classification. Additionally, the proposed framework
was evaluated on two public datasets published in the PASCAL Classifying Heart Sounds
Challenge. The experimental results demonstrated outstanding performance of the
proposed method compared with the baselines.
Singh, Jagdeep, Amit Kamra, and Harbhag Singh.[57] Today's healthcare services had
come a long way to provide medical care to the patients and protect them from various
diseases. This paper comprised the development of a framework based on associative
classification techniques on a heart dataset for early diagnosis of heart-based diseases.
Heart diseases, which could arrive suddenly and prove fatal when uncontrolled, were hard
to diagnose by observation alone. The implementation of the work was done on the
Cleveland heart diseases dataset from the University of California Irvine (UCI) machine
learning repository to test on different data mining techniques. The various attributes
related to the cause of heart diseases were viz: gender, age, chest pain type, blood
pressure, blood sugar, etc., that could predict early symptoms of heart disease. Various
data mining algorithms such as Apriori, FP-Growth, Naive Bayes, ZeroR, OneR, J48, and
k-nearest neighbor were applied in this study for the prediction of heart diseases. On the
basis of the best results, the development of a heart disease prediction system was done
by using a hybrid technique for classification associative rules (CARs) to achieve a
prediction accuracy of 99.19%.
Fida, Benish, Muhammad Nazir, Nawazish Naveed, and Sheeraz Akram.[59] Heart
disease diagnosis was considered one of the complicated tasks in the medical field. In
order to perform heart disease diagnosis, an accurate and efficient automation system
could have been very helpful. In this research, a classifier ensemble method was proposed
to improve the decision of the classifiers for heart disease diagnosis. Homogeneous
ensemble was applied for heart disease classification, and finally, results were optimized
by using Genetic algorithm. Data was evaluated by using 10-fold cross-validation, and
the performance of the system was evaluated by classifiers' accuracy, sensitivity, and
specificity to check the feasibility of the system. Comparison of the methodology with
existing ensemble techniques showed considerable improvements in terms of
classification accuracy.
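The three evaluation measures named above (accuracy, sensitivity, and specificity) can be illustrated with a minimal sketch. The function name `binary_metrics` and the toy labels are hypothetical and are not taken from the cited paper; the sketch only shows how the measures follow from the counts of true and false positives and negatives.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Sensitivity: fraction of diseased cases that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of healthy cases that were correctly cleared.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return accuracy, sensitivity, specificity


# Toy labels (illustrative only): 1 = heart disease, 0 = healthy.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
acc, sens, spec = binary_metrics(y_true, y_pred)
```

In a k-fold setting these measures would typically be computed on each held-out fold and then averaged.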
Kirsch, J., and A. McGuire.[60] The study considered the feasibility of defining a QALY
from disease-specific data using the New York Heart Association (NYHA) classification
of heart failure. Health state values for the four different NYHA classifications of disease
progression were derived using the time trade-off (TTO) instrument associated with the
five-dimensional (EQ-5D) health state valuation method. Consistent mappings between
the disease classification and the chosen QALY instrument were found. With this being
the case, the assumption of constant proportionality, necessary to define the QALY as an
acceptable measure of health-related preferences, was considered. It was found that
constant proportionality did not hold across the more severe health states, thus
questioning the use of QALYs as representing cardinal preference structures.
Current research on heart health classification reveals several notable gaps. Firstly, there
is a lack of standardized criteria for categorizing heart health status across different
populations. This inconsistency hampers the development of universally applicable
classification models. Additionally, while machine learning techniques show promise in
classifying heart health based on various data sources such as medical records and
imaging scans, there remains a need for more robust validation and comparison studies
to determine the most effective algorithms.
Malik, John, Yu-Lun Lo, and Hau-tieng Wu.[59] Fluctuations in heart rate were intimately
related to changes in the physiological state of the organism. This relationship was
exploited by classifying a human participant's wake/sleep status using his instantaneous
heart rate (IHR) series. An approach was employed using a convolutional neural network
(CNN) to build features from the IHR series extracted from a whole-night
electrocardiogram (ECG) and predict every 30 s whether the participant was awake or
asleep. The training database consisted of 56 normal participants, and three different
databases were considered for validation; one was private, and two were public with
different races and apnea severities. On the private database of 27 participants, the
accuracy, sensitivity, specificity, and F1 values for predicting the wake stage were 75.3%,
52.4%, 89.4%, and 0.83, respectively. Validation performance was similar on the two
public databases. When the
photoplethysmography was used instead of the ECG to obtain the IHR series, the
performance was also comparable. A robustness check was carried out to confirm the
obtained performance statistics. This result advocated for an effective and scalable method
for recognizing changes in physiological state using non-invasive heart rate monitoring.
The CNN model adaptively quantified IHR fluctuation as well as its location in time and
was suitable for differentiating between the wake and sleep stages.
[15] Key findings: In this section, the key findings and contributions of the research are
summarized. Limitations: This section highlights the limitations or shortcomings observed
in the methodology or findings of the research.
Heart health classification involves assessing the condition of a person's heart based on
various factors such as blood pressure, cholesterol levels, and overall cardiovascular
function. This classification helps medical professionals determine the risk of heart
disease and recommend appropriate interventions to maintain or improve heart health. By
analyzing data from medical tests and examinations, doctors can classify individuals into
different categories ranging from low risk to high risk for heart disease.
The goal of heart health classification is to identify potential issues early on and
implement preventative measures to reduce the risk of heart disease. This may include
lifestyle changes such as diet and exercise modifications, as well as medication or medical
procedures for individuals with higher risk factors. By accurately classifying heart health
status, healthcare providers can tailor treatment plans to meet the specific needs of each
patient, ultimately improving overall heart health and reducing the incidence of
cardiovascular events.
CHAPTER-3
EXISTING SYSTEM
The existing classifier commonly used for heart health classification from ECG data is
known as the Random Forest classifier. It works by constructing multiple decision trees
during training and outputs the mode of the classes (classification) or the mean prediction
(regression) of the individual trees. This classifier is particularly favoured for its
robustness and ability to handle large datasets with high dimensionality, making it a
popular choice in the medical field for tasks like identifying various heart conditions
based on ECG signals.
The Random Forest classifier is a powerful tool for assessing heart health using ECG
data. It operates by constructing numerous decision trees during its training process. Each
decision tree examines different aspects of the ECG signals, seeking patterns that might
indicate various heart conditions. These trees work together to make a collective
prediction about the health status of the heart. By combining the outputs of multiple trees,
the classifier arrives at a robust and reliable assessment of the heart's condition.
One of the strengths of the Random Forest classifier is its ability to handle large datasets
with high dimensionality, which is common in medical data like ECG recordings. Each
decision tree in the forest is trained independently and makes its own prediction based on
a subset of the data. This diversity helps to prevent overfitting and ensures that the
classifier generalizes well to new, unseen data. As a result, the Random Forest classifier
is widely trusted by medical professionals for its accuracy and robustness in identifying
various heart conditions from ECG signals.
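The idea that each tree sees its own subset of the data can be sketched as follows. This is a simplified illustration that assumes the standard bagging scheme (rows drawn with replacement, plus a random subset of features); the helper names and sizes are hypothetical, and practical implementations usually redraw the feature subset at every split rather than once per tree.

```python
import random


def bootstrap_sample(n_rows, rng):
    """Draw n_rows row indices with replacement (a bootstrap sample)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def random_feature_subset(n_features, k, rng):
    """Pick k distinct feature indices without replacement."""
    return sorted(rng.sample(range(n_features), k))


rng = random.Random(42)  # fixed seed so the sketch is repeatable
n_rows, n_features, n_trees = 10, 9, 3

# One (rows, features) plan per tree; each tree would be trained on its plan.
plans = []
for _ in range(n_trees):
    rows = bootstrap_sample(n_rows, rng)
    features = random_feature_subset(n_features, 3, rng)
    plans.append((rows, features))
```

Because the plans differ from tree to tree, the trees make different errors, which is what allows the combined prediction to be more stable than any single tree.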
In the medical field, the Random Forest classifier plays a crucial role in diagnosing
heart-related issues. Its ability to analyze complex ECG data and provide reliable
classifications makes it a valuable tool for healthcare practitioners. By leveraging the
power of ensemble learning, where multiple decision trees collaborate to make informed
decisions, the Random Forest classifier offers a dependable method for assessing heart
health and aiding in the early detection of potential cardiac abnormalities.
The Random Forest Classifier is a versatile ensemble learning method that operates by
constructing multiple decision trees during training and outputting the mode of the classes
(classification) or the mean prediction (regression) of the individual trees. It's known for
its robustness, ability to handle large datasets with high dimensionality, and resistance to
overfitting. In environmental condition monitoring, RFC can be effective in identifying
patterns and relationships within complex sensor data, allowing for accurate predictions
of environmental conditions such as temperature, humidity, air quality, and more.
On the other hand, Naive Bayes is a probabilistic classifier based on Bayes' theorem with
the assumption of independence between features. Despite its simplicity, Naive Bayes
can be remarkably effective in classification tasks, especially when dealing with large
datasets and high-dimensional feature spaces. It's particularly suitable for real-time
applications due to its fast training and prediction times. In environmental monitoring,
Naive Bayes can be utilized to classify sensor data into different environmental
conditions or detect anomalies based on probabilistic reasoning.
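The independence assumption described above can be written compactly. In the expression below, y denotes a class label and x_1, ..., x_n the feature values; this notation is introduced here for illustration and does not appear elsewhere in the report.

```latex
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} \;=\; \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

The product over individual features is where the independence assumption enters: each feature contributes its own likelihood term, and any correlation between features is ignored.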
In practice, the choice between RFC and Naive Bayes (or other classifiers) depends on
various factors such as the nature of the data, the specific objectives of the monitoring
system, computational resources, and performance requirements. In some cases, a
combination of multiple classifiers or advanced techniques such as ensemble learning
may be employed to further enhance the accuracy and reliability of the monitoring
models. Ultimately, the selection of the most suitable classifier should be based on
thorough experimentation and evaluation to ensure optimal performance in
environmental condition monitoring applications.
The Naive Bayes method, while commonly used for classification tasks, has several
drawbacks when applied to heart health classification. Firstly, it assumes that all features
are independent of each other, which is often not the case with heart health data. For
instance, factors like cholesterol levels, blood pressure, and age can be interrelated, but
Naive Bayes overlooks these correlations, potentially leading to inaccurate predictions.
Secondly, Naive Bayes struggles with handling continuous features effectively. In heart
health classification, many metrics such as heart rate and blood pressure are continuous
variables. Naive Bayes must either discretize these into categories or assume a simple
distribution (commonly Gaussian) for each feature, which may result in loss of
information and decreased accuracy in identifying patterns associated with different heart
conditions.
Lastly, Naive Bayes can be prone to overfitting, especially when dealing with small or
imbalanced datasets. It may generalize poorly to new data if the training set does not
adequately represent the full range of heart health conditions. This limitation can
undermine the reliability of the classifier and diminish its usefulness in real-world
applications for heart health assessment.
CHAPTER-4
PROPOSED SYSTEM
4.1 OVERVIEW
The proposed system is a machine learning-based system that can be used to diagnose
diseases. The system works by training a random forest model on a dataset of ECG
signals. The model is then used to test new ECG signals and predict the presence of
disease. The system can be used to diagnose a variety of diseases, including heart disease,
stroke, and epilepsy.
ECG Dataset: This is a collection of ECG signals from patients with and without disease.
Dataset Preprocessing: This component preprocesses the ECG signals before they are fed
into the model.
Train/Test Split: This component splits the dataset into a training set and a test set. The
training set is used to train the model, and the test set is used to evaluate the performance
of the model.
RFC Model: This is a random forest model that is trained on the training set.
Performance Estimation: This component evaluates the performance of the model on the
test set.
Simple Test ECG: This is a new ECG signal that is used to test the model.
Type of disease testing: This specifies the type of disease that the model is being used to
test.
The system works as follows:
Step-3: The dataset is split into a training set and a test set.
Step-9: The system is still under development, but it has the potential to be a valuable
tool for diagnosing diseases.
4.2 DATASET PREPROCESSING
Label encoding: The dataset preprocessing for heart health classification involves several
steps. Firstly, we load the dataset containing various features such as age, gender, blood
pressure, cholesterol levels, and more. Then, we check for any missing values in the
dataset and handle them by either removing the rows or imputing values using techniques
like mean or median imputation. Next, we perform label encoding on categorical
variables like gender, where we assign numeric labels to categories (e.g., 0 for male, 1
for female). This ensures that the algorithm can interpret these variables correctly during
training. Once the encoding is done, we split the dataset into training and testing sets to
evaluate the model's performance accurately. Finally, we scale the numerical features to
bring them to a similar scale, which helps in improving the model's convergence and
performance. After preprocessing, the dataset is ready for training a machine learning
model for heart health classification.
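The label encoding and train/test split steps described above can be sketched in plain Python. The helper names, the toy values, and the 80/20 split are assumptions made for illustration; they are not the project's actual code or data.

```python
import random


def label_encode(values, categories):
    """Map each category to its position in `categories` (e.g. male -> 0)."""
    mapping = {cat: idx for idx, cat in enumerate(categories)}
    return [mapping[v] for v in values]


def train_test_split_rows(rows, test_fraction, seed):
    """Shuffle a copy of the rows and hold out a fraction as the test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Toy records (illustrative only): gender and age for five patients.
gender = ["male", "female", "female", "male", "female"]
age = [63, 45, 58, 39, 71]

encoded = label_encode(gender, ["male", "female"])  # 0 = male, 1 = female
rows = list(zip(encoded, age))
train_rows, test_rows = train_test_split_rows(rows, test_fraction=0.2, seed=0)
```

Fixing the seed makes the split repeatable, which helps when comparing different models on the same test set.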
Null values removal: In preprocessing the dataset for heart health classification, an early
step is to identify and handle any null values present in the dataset. This involves
scanning each column for missing values and either imputing them with a suitable value
or dropping the rows containing nulls entirely. Removing null values ensures that the
dataset is clean and ready for analysis and model training. This process helps in
preventing bias and inaccuracies in the classification task by ensuring that all data points
are complete and reliable. After null values removal, the dataset can proceed to further
preprocessing steps such as normalization, feature scaling, and feature selection before
feeding it into the classification model.
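Both options mentioned above, imputing a missing value or dropping the affected row, can be sketched as follows. Missing entries are represented by `None` here, and the function names and cholesterol values are hypothetical.

```python
from statistics import mean, median


def impute_column(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]


def drop_incomplete_rows(rows):
    """Keep only the rows that contain no None entries."""
    return [row for row in rows if all(v is not None for v in row)]


# Toy cholesterol column with two missing readings (illustrative only).
cholesterol = [200, None, 240, 220, None]
imputed_mean = impute_column(cholesterol, "mean")
imputed_median = impute_column(cholesterol, "median")

# Toy (age, cholesterol) rows; the second row is incomplete.
rows = [(63, 200), (45, None), (58, 240)]
complete_rows = drop_incomplete_rows(rows)
```

Imputation keeps every record at the cost of introducing estimated values, while dropping rows keeps only measured values at the cost of a smaller dataset.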
Scaling: In preprocessing the dataset for heart health classification, a further step is scaling
the features to ensure uniformity and optimal performance of the machine learning model.
This involves transforming the data so that all features have a similar scale, preventing
any particular feature from dominating the others due to its larger magnitude. Typically,
techniques like MinMax scaling or Standard scaling are employed, where values are
adjusted to fall within a specific range or standardized around a mean of zero and a
standard deviation of one, respectively. By scaling the dataset, we prepare it for further
analysis and modeling, ensuring that each feature contributes proportionately to the
classification task without bias towards features with larger numeric ranges.
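The two scaling techniques named above can be written out directly. This is a minimal sketch with hypothetical function names and toy heart-rate values: MinMax scaling maps values to the range 0 to 1, and Standard scaling subtracts the mean and divides by the standard deviation.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to spread out
    return [(v - lo) / (hi - lo) for v in values]


def standard_scale(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Toy heart-rate readings in beats per minute (illustrative only).
heart_rate = [60, 70, 80, 90, 100]
scaled_01 = min_max_scale(heart_rate)
scaled_z = standard_scale(heart_rate)
```

In practice the minimum, maximum, mean, and standard deviation are usually computed on the training set only and then reused for the test set, so that no information from the test data influences training.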
4.3 RANDOM FOREST CLASSIFIER
The classifier commonly used for heart health classification from ECG data is
known as the Random Forest classifier. It works by constructing multiple decision trees
during training and outputs the mode of the classes (classification) or the mean prediction
(regression) of the individual trees. This classifier is particularly favoured for its
robustness and ability to handle large datasets with high dimensionality, making it a
popular choice in the medical field for tasks like identifying various heart conditions
based on ECG signals.
The Random Forest classifier is a powerful tool for assessing heart health using ECG
data. It operates by constructing numerous decision trees during its training process. Each
decision tree examines different aspects of the ECG signals, seeking patterns that might
indicate various heart conditions. These trees work together to make a collective
prediction about the health status of the heart. By combining the outputs of multiple trees,
the classifier arrives at a robust and reliable assessment of the heart's condition.
4.4 RANDOM FOREST PREDICTION
The random forest classifier predicts the heart health classification by analyzing a
multitude of factors gathered from individuals' medical records and health data. It
examines various indicators such as blood pressure, cholesterol levels, BMI, and lifestyle
habits to determine the likelihood of different heart health outcomes. By considering these
factors collectively, the classifier can make informed predictions about whether an
individual is at low, medium, or high risk for heart-related issues.
Through its analysis, the random forest classifier assigns each individual to one of several
categories based on their heart health status. This classification helps healthcare
professionals identify patients who may require closer monitoring, lifestyle interventions,
or medical treatment to mitigate potential risks and promote better heart health. By
leveraging the power of machine learning, the classifier provides valuable insights that
aid in preventive care and improve overall cardiovascular health outcomes for
individuals.
4.5 CLASSIFIER:
A Random Forest Classifier is a type of machine learning model that is widely used for
both classification and regression tasks. It operates on the principle of ensemble learning,
which combines multiple classifiers to solve complex problems and enhance the model's
performance. The model consists of numerous decision trees, each considering a subset
of observations and splitting based on a subset of features, resulting in a diverse set of
classifiers.
When predicting a new data point, the classifier takes the prediction from each tree and
decides the final output based on the majority votes of predictions. This approach helps
to increase accuracy and prevent overfitting, a common problem in machine learning
where a model performs well on training data but poorly on unseen data.
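The majority-vote step described above can be sketched in a few lines. This is an illustration under stated assumptions, not the project's code: the function name `majority_vote` and the five tree outputs are hypothetical.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    counts = Counter(tree_predictions)
    # most_common(1) returns [(label, count)] for the top label.
    label, _ = counts.most_common(1)[0]
    return label


# Hypothetical outputs of five trees for one heartbeat (class indices 0-4).
votes = [0, 2, 0, 0, 1]
final_class = majority_vote(votes)
print(final_class)
```

Note that scikit-learn's RandomForestClassifier combines trees by averaging their predicted class probabilities rather than by counting hard votes; the two approaches usually agree, but not always.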
Key parameters of a Random Forest Classifier include the number of trees in the forest
(n_estimators), the function to measure the quality of a split (criterion), the maximum
depth of the tree (max_depth), and the minimum number of samples required to split an
internal node (min_samples_split).
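The parameters named above map directly onto the scikit-learn constructor. The sketch below uses synthetic stand-in data generated with make_classification, not the ECG dataset, and the specific values are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 5 classes, 20 features (not the ECG dataset).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=10,
    n_classes=5, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

clf = RandomForestClassifier(
    n_estimators=100,      # number of trees in the forest
    criterion="gini",      # split-quality measure ("gini" or "entropy")
    max_depth=None,        # grow each tree until its leaves are pure
    min_samples_split=2,   # minimum samples needed to split an internal node
    random_state=42,       # makes the run repeatable
)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print("Test accuracy:", accuracy)
```

Lowering max_depth or raising min_samples_split produces smaller trees, which trades some training accuracy for less risk of overfitting.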
4.6 ADVANTAGES
Robustness: Random Forest is a robust algorithm that can handle noisy data and outliers.
It is less likely to overfit.
Accuracy: Random Forest is often among the more accurate machine learning algorithms on
tabular data. It can handle both classification and regression tasks.
Speed: Although it combines many trees, Random Forest is efficient because the trees can
be trained independently and in parallel.
Effective on Large Datasets: Random Forests are particularly well-suited for handling
large and complex datasets, dealing with high-dimensional feature spaces.
Feature Importance: Random Forest's ability to provide feature importance scores makes
it a valuable tool for understanding the significance of different variables in the dataset.
Ensemble Nature: The ensemble nature of Random Forests, combining multiple trees,
makes them less prone to overfitting compared to individual decision trees.
Handling of Many Features: Random Forest is effective on datasets with a large number
of features, and it can handle irrelevant variables well.
High Accuracy: Random forests frequently rank among the most accurate classification
methods, although the best method always depends on the dataset.
Big Data Handling: The random forest technique can also handle big data with numerous
variables running into thousands.
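The feature importance advantage listed above can be illustrated with a short sketch. It assumes scikit-learn and uses synthetic data in which only the first three columns are informative; it is not the ECG dataset, and the parameter values are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data; shuffle=False keeps the 3 informative features
# in columns 0-2 so the ranking is easy to interpret.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=3, n_redundant=0,
    n_classes=2, shuffle=False, random_state=0,
)

forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X, y)

importances = forest.feature_importances_  # one score per feature
ranking = sorted(range(len(importances)),
                 key=lambda i: importances[i], reverse=True)
for idx in ranking:
    print(f"feature {idx}: {importances[idx]:.3f}")
```

The scores are normalized to sum to 1, so each value can be read as the relative contribution of that feature to the splits made by the forest.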
CHAPTER 5
MACHINE LEARNING
What is Machine Learning?
Before we take a look at the details of various machine learning methods, let's start by
looking at what machine learning is, and what it isn't. Machine learning is often
categorized as a subfield of artificial intelligence, but that categorization can be
misleading at first. The study of machine learning certainly arose from research
in this context, but in the data science application of machine learning methods, it's more
helpful to think of machine learning as a means of building models of data.
At the most fundamental level, machine learning can be categorized into two main types:
supervised learning and unsupervised learning. Supervised learning is further divided
into classification and regression: in classification, the labels are discrete
categories, while in regression, the labels are continuous quantities. We will see examples
of both types of supervised learning in the following section.
Human beings are, at this moment, the most intelligent and advanced species on earth
because they can think, evaluate, and solve complex problems. AI, on the other hand, is
still in its initial stage and has not surpassed human intelligence in many aspects. The
question, then, is why machines need to learn at all. The most suitable reason is
“to make decisions, based on data, with efficiency and scale”.
Although machine learning is developing rapidly, it still faces several challenges:
1. Quality of data − Obtaining good-quality data for ML algorithms is one of the biggest
challenges. Low-quality data leads to problems in data preprocessing and feature
extraction.
2. Time-consuming tasks − ML projects consume a great deal of time, especially for
data acquisition, feature extraction and retrieval.
3. Lack of specialists − As ML technology is still in its infancy, experts are hard
to find.
4. No clear objective for formulating business problems − The lack of a clear objective
and a well-defined goal for business problems is another key challenge, because the
technology is not yet mature.
5. Overfitting and underfitting − A model that overfits or underfits does not represent
the problem well.
6. Curse of dimensionality − Data points with too many features can be a real
hindrance for an ML model.
7. Difficulty in deployment − The complexity of ML models makes them difficult
to deploy in real life.
Applications of Machine Learning
Machine Learning is among the most rapidly growing technologies, and according to
researchers we are in the golden years of AI and ML. It is used to solve many complex
real-world problems which cannot be solved with traditional approaches. Following are
some real-world applications of ML.
• Emotion analysis
• Sentiment analysis
• Error detection and prevention
• Weather forecasting and prediction
• Stock market analysis and forecasting
• Speech synthesis
• Speech recognition
• Customer segmentation
• Object recognition
• Fraud detection
• Fraud prevention
• Recommendation of products to customers in online shopping
How to Start Learning Machine Learning?
Arthur Samuel coined the term “Machine Learning” in 1959 and defined it as a “Field of
study that gives computers the capability to learn without being explicitly programmed”.
That was the beginning of Machine Learning. In modern times, Machine Learning is
one of the most popular career choices. According to Indeed, Machine Learning Engineer
was the best job of 2019, with 344% growth and an average base salary
of $146,085 per year.
There is still a lot of doubt about what exactly Machine Learning is and how to start
learning it. This section therefore covers the basics of Machine Learning and a path
that can be followed to become a Machine Learning engineer.
What follows is a rough roadmap; the steps can always be modified according to
individual needs and goals.
In case you are a genius, you could start ML directly but normally, there are some
prerequisites that you need to know which include Linear Algebra, Multivariate Calculus,
Statistics, and Python. And if you don’t know these, never fear! You don’t need a Ph.D.
degree in these topics to get started but you do need a basic understanding.
Both Linear Algebra and Multivariate Calculus are important in Machine Learning.
However, the extent to which you need them depends on your role as a data scientist. If
you are more focused on application heavy machine learning, then you will not be that
heavily focused on maths as there are many common libraries available. But if you want
to focus on R&D in Machine Learning, then mastery of Linear Algebra and Multivariate
Calculus is very important as you will have to implement many ML algorithms from
scratch.
Data plays a huge role in Machine Learning. In fact, around 80% of your time as an ML
expert will be spent collecting and cleaning data. Statistics is the field that handles the
collection, analysis, and presentation of data, so it is no surprise that you need to learn
it. Some of the key concepts in statistics are Statistical Significance,
Probability Distributions, Hypothesis Testing, Regression, etc. Bayesian thinking
is also a very important part of ML; it deals with concepts like Conditional
Probability, Priors and Posteriors, Maximum Likelihood, etc.
Some people prefer to skip Linear Algebra, Multivariate Calculus and Statistics and learn
them as they go along with trial and error. But the one thing that you absolutely cannot
skip is Python. While there are other languages you can use for Machine Learning, like
R and Scala, Python is currently the most popular language for ML. In fact, there are
many Python libraries that are specifically useful for Artificial Intelligence and Machine
Learning, such as Keras, TensorFlow and Scikit-learn.
So, if you want to learn ML, it’s best to learn Python. You can do that using various
online resources and courses, such as the free Fork Python course on GeeksforGeeks.
Once you are done with the prerequisites, you can move on to actually learning ML.
It’s best to start with the basics and then move on to the more
complicated material. Some of the basic concepts in ML are:
• Supervised Learning – This involves learning from a training dataset with labeled
data using classification and regression models. This learning process continues
until the required level of performance is achieved.
• Unsupervised Learning – This involves using unlabelled data and then finding the
underlying structure in the data in order to learn more and more about the data
itself using factor and cluster analysis models.
• Semi-supervised Learning – This involves using unlabelled data like
Unsupervised Learning with a small amount of labeled data. Using labeled data
vastly increases the learning accuracy and is also more cost-effective than
Supervised Learning.
• Reinforcement Learning – This involves learning optimal actions through trial and
error. So, the next action is decided by learning behaviors that are based on the
current state and that will maximize the reward in the future.
Advantages of Machine Learning
1. Easily identifies trends and patterns: Machine Learning can review large volumes
of data and discover specific trends and patterns that would not be apparent to humans.
For instance, an e-commerce website like Amazon uses it to understand the
browsing behaviors and purchase histories of its users, so that it can offer them the
right products, deals, and reminders, and show them relevant advertisements.
2. No human intervention needed (automation): With ML, you don’t need to babysit
your project every step of the way. Since it means giving machines the ability to learn, it
lets them make predictions and also improve the algorithms on their own. A common
example of this is antivirus software, which learns to filter new threats as they are
recognized. ML is also good at recognizing spam.
Disadvantages of Machine Learning
1. Data Acquisition: Machine Learning requires massive data sets to train on, and
these should be inclusive, unbiased, and of good quality. There can also be times when
you must wait for new data to be generated.
2. Time and Resources: ML needs enough time to let the algorithms learn and
develop enough to fulfill their purpose with a considerable amount of accuracy and
relevancy. It also needs massive resources to function, which can mean additional
computing power.
CHAPTER 6
SOFTWARE ENVIRONMENT
What is Python?
Below are some facts about Python.
• Python is currently the most widely used multi-purpose, high-level programming
language.
• Python language is being used by almost all tech-giant companies like – Google,
Amazon, Facebook, Instagram, Dropbox, Uber… etc.
The biggest strength of Python is its huge collection of standard libraries, which can be
used for the following –
• Machine Learning
• Test frameworks
• Multimedia
Advantages of Python
Let’s see how Python dominates over other languages.
1. Extensive Libraries
Python ships with an extensive library that contains code for various purposes like
regular expressions, documentation generation, unit testing, web browsers, threading,
databases, CGI, email, image manipulation, and more. So, we don’t have to write the
complete code for these manually.
2. Extensible
Python can be extended with other languages. You can write some
of your code in languages like C++ or C. This comes in handy, especially in projects.
3. Embeddable
Complementary to extensibility, Python is embeddable as well. You can put your Python
code in the source code of a different language, like C++. This lets us add scripting
capabilities to our code in the other language.
4. Improved Productivity
The language’s simplicity and extensive libraries make programmers more productive
than languages like Java and C++ do. You also need to write less code to get
more done.
5. IoT Opportunities
Since Python forms the basis of new platforms like Raspberry Pi, its future looks bright
for the Internet of Things. This is a way to connect the language with the real world.
6. Simple and Easy
When working with Java, you may have to create a class to print ‘Hello World’. But in
Python, just a print statement will do. It is also quite easy to learn, understand, and code.
This is why when people pick up Python, they have a hard time adjusting to other more
verbose languages like Java.
7. Readable
Because it is not such a verbose language, reading Python is much like reading English.
This is the reason why it is so easy to learn, understand, and code. It also does not need
curly braces to define blocks, and indentation is mandatory. These further aid the
readability of the code.
8. Object-Oriented
This language supports both the procedural and object-oriented programming paradigms.
While functions help us with code reusability, classes and objects let us model the real
world.
A class allows the encapsulation of data and functions into one.
9. Free and Open-Source
Python is freely available. Not only can you download Python
for free, but you can also download its source code, make changes to it, and even
distribute it. It comes with an extensive collection of libraries to help you with your
tasks.
10. Portable
When you code your project in a language like C++, you may need to make some changes
to it if you want to run it on another platform. But it isn’t the same with Python. Here,
you need to code only once, and you can run it anywhere. This is called Write Once Run
Anywhere (WORA). However, you need to be careful enough not to include any system-
dependent features.
11. Interpreted
Lastly, we will say that it is an interpreted language. Since statements are executed one
by one, debugging is easier than in compiled languages.
1. Less Coding
Almost all tasks require less code in Python than the same tasks do in
other languages. Python also has excellent standard library support, so you often don’t
have to search for third-party libraries to get your job done. This is the reason that many
people suggest Python to beginners.
2. Affordable
Python is free, so individuals, small companies and big organizations can use
the freely available resources to build applications. Python is popular and widely used, so
it also has strong community support.
The 2019 GitHub annual survey showed that Python has overtaken Java in the most
popular programming language category.
Python code can run on any machine whether it is Linux, Mac or Windows. Programmers
need to learn different languages for different jobs but with Python, you can
professionally build web apps, perform data analysis and machine learning, automate
things, do web scraping and also build games and powerful visualizations. It is an all-
rounder programming language.
Disadvantages of Python
So far, we’ve seen why Python is a great choice for your project. But if you choose it,
you should be aware of its consequences as well. Let’s now see the downsides of choosing
Python over another language.
1. Speed Limitations
We have seen that Python code is executed line by line. But since Python is interpreted,
it often results in slow execution. This, however, isn’t a problem unless speed is a focal
point for the project. In other words, unless high speed is a requirement, the benefits
offered by Python are enough to distract us from its speed limitations.
2. Weak in Mobile Computing and Browsers
While it serves as an excellent server-side language, Python is rarely seen on the
client side. Besides that, it is rarely used to implement smartphone-based
applications. One such application is called Carbonnelle.
Despite the existence of Brython, Python is not popular on the client side because it is
not considered very secure there.
3. Design Restrictions
As you know, Python is dynamically-typed. This means that you don’t need to declare
the type of variable while writing the code. It uses duck-typing. But wait, what’s that?
Well, it just means that if it looks like a duck, it must be a duck. While this is easy on the
programmers during coding, it can raise run-time errors.
5. Simple
Python’s simplicity can indeed be a problem. To programmers who are used to Python,
its syntax is so simple that the verbosity of Java code seems unnecessary, which makes
it harder to move to other languages.
This was all about the Advantages and Disadvantages of Python Programming Language.
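The duck-typing behaviour mentioned under Design Restrictions can be sketched as follows. The class and function names are hypothetical and chosen only for illustration; the point is that the type mismatch is detected only when the offending line actually runs.

```python
class Duck:
    def speak(self):
        return "Quack"


class Robot:
    def speak(self):
        return "Beep"


def make_it_speak(thing):
    # No type check: any object with a speak() method is accepted.
    return thing.speak()


results = [make_it_speak(Duck()), make_it_speak(Robot())]
print(results)

# The mistake only surfaces at run time, when the call is executed.
try:
    make_it_speak(42)
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__
print(error_name)
```

A statically typed language would typically reject the last call at compile time, which is the trade-off the text describes.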
History of Python
What do the alphabet and the programming language Python have in common? Right,
both start with ABC. If we are talking about ABC in the Python context, it's clear that the
programming language ABC is meant. ABC is a general-purpose programming language
and programming environment, which had been developed in the Netherlands,
Amsterdam, at the CWI (Centrum Wiskunde &Informatica). The greatest achievement of
ABC was to influence the design of Python. Python was conceptualized in the late 1980s.
Guido van Rossum was working at that time on a project at the CWI called Amoeba, a
distributed operating system. In an interview with Bill Venners, Guido van Rossum
recalled his earlier work on ABC at the CWI.
Guido van Rossum published the first version of Python code (version 0.9.0) at
alt.sources in February 1991. This release already included exception handling, functions,
and the core data types list, dict, str and others. It was also object oriented and had a
module system. Python version 1.0 was released in January 1994. The major new features
included in this release were the functional programming tools lambda, map, filter and
reduce, which Guido van Rossum never liked. Six and a half years later, in October 2000,
Python 2.0 was introduced. This release included list comprehensions, a full garbage
collector and support for Unicode. Python flourished for another 8 years in the
versions 2.x before the next major release, Python 3.0 (also known as "Python 3000"
and "Py3K"), was released. Python 3 is not backwards compatible with Python 2.x. The
emphasis in Python 3 had been on the removal of duplicate programming constructs and
modules, thus fulfilling or coming close to fulfilling the 13th law of the Zen of Python:
"There should be one -- and preferably only one -- obvious way to do it."
Some changes in Python 3:
• The rules for ordering comparisons have been simplified. E.g., a heterogeneous
list cannot be sorted, because all the elements of a list must be comparable to
each other. There is only one integer type left, i.e., int. long is int as well.
• The division of two integers returns a float instead of an integer. "//" can be used
to have the "old" behaviour.
• Text Vs. Data Instead of Unicode Vs. 8-bit
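The three changes listed above can be checked directly in a Python 3 interpreter. The snippet below is a small illustration; the variable names are arbitrary.

```python
# True division always returns a float; "//" gives floor division.
true_div = 7 / 2
floor_div = 7 // 2
print(true_div, floor_div)

# Ordering comparisons between unrelated types raise TypeError,
# so a heterogeneous list cannot be sorted.
try:
    sorted([3, "two", 1])
    sort_error = None
except TypeError as exc:
    sort_error = type(exc).__name__
print(sort_error)

# Text (str) and binary data (bytes) are distinct types.
text = "ECG"
data = text.encode("utf-8")
print(type(text).__name__, type(data).__name__, data.decode("utf-8") == text)
```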
Purpose
Python
Python features a dynamic type system and automatic memory management. It supports
multiple programming paradigms, including object-oriented, imperative, functional and
procedural, and has a large and comprehensive standard library.
• Python is Interactive − you can actually sit at a Python prompt and interact with
the interpreter directly to write your programs.
Python also acknowledges that speed of development is important. Readable and terse
code is part of this, and so is access to powerful constructs that avoid tedious repetition
of code. Maintainability also ties into this: the amount of code may be an all but useless
metric, but it does say something about how much code you have to scan, read and/or
understand to troubleshoot problems or tweak behaviors. This speed of development, the
ease with which a programmer of other languages can pick up basic Python skills and the
huge standard library are key to another area where Python excels. All its tools have been
quick to implement, saved a lot of time, and several of them have later been patched and
updated by people with no Python background - without breaking.
TensorFlow
TensorFlow is a free and open-source software library for dataflow and differentiable
programming across a range of tasks. It is a symbolic math library and is also used for
machine learning applications such as neural networks. It is used for both research and
production at Google.
TensorFlow was developed by the Google Brain team for internal Google use. It was
released under the Apache 2.0 open-source license on November 9, 2015.
NumPy
It is the fundamental package for scientific computing with Python. It contains various
features including these important ones:
Pandas
Matplotlib
For simple plotting the pyplot module provides a MATLAB-like interface, particularly
when combined with IPython. For the power user, you have full control of line styles,
font properties, axes properties, etc, via an object-oriented interface or via a set of
functions familiar to MATLAB users.
Scikit-learn
Python has been updated many times over the years. This section explains how to
install Python on Windows, which can be confusing for a beginner. The version
installed here is Python 3.7.4.
Note: Python 3.7.4 cannot be used on Windows XP or earlier.
Before starting the installation, check your system requirements. The Python build to
download depends on your operating system and processor type. The system used here
is a Windows 64-bit operating system, so the steps below install Python 3.7.4 on a
Windows 7 device. The steps to install Python on Windows 10, 8 and 7 are divided into
4 parts to help understand them better.
Step 1: Go to the official site to download and install python using Google Chrome or
any other web browser. OR Click on the following link: https://www.python.org
Now, check for the latest and the correct version for your operating system.
Step 3: You can either select the yellow Download Python 3.7.4 for Windows button,
or you can scroll further down and click on the download link for the version you need.
Here, we are downloading Python 3.7.4 for Windows.
Step 4: Scroll down the page until you find the Files option.
Step 5: Here you see the different builds of Python along with their operating systems.
• To download Windows 32-bit python, you can select any one from the three
options: Windows x86 embeddable zip file, Windows x86 executable installer or
Windows x86 web-based installer.
• To download Windows 64-bit python, you can select any one from the three
options: Windows x86-64 embeddable zip file, Windows x86-64 executable
installer or Windows x86-64 web-based installer.
Here we will install the Windows x86-64 web-based installer. This completes the first
part, choosing which version of Python to download. We now move on to the
second part, the installation.
Note: To see the changes or updates made in a version, click on the
Release Notes option.
Installation of Python
Step 1: Go to Downloads and open the downloaded Python installer to carry out the
installation process.
Step 2: Before you click on Install Now, make sure to tick Add Python 3.7 to
PATH.
Step 3: Click on Install Now. After the installation is successful, click on Close.
With the above three steps, you have successfully and correctly
installed Python. Now is the time to verify the installation.
Step 4: To test whether Python is correctly installed, open the command prompt, type
python -V and press Enter.
Note: If an earlier version of Python is already installed, you must first
uninstall it and then install the new one.
Step 3: Click on IDLE (Python 3.7 64-bit) and launch the program.
Step 4: To go ahead with working in IDLE you must first save the file. Click on File >
Click on Save.
Step 5: Name the file and set the save-as type to Python files. Click on SAVE. Here the
file has been named Hey World.
Step 6: Now, for example, enter print("Hey World") and press Enter.
You will see the output of the command. This completes the steps to download and
install Python on Windows.
Note: Unlike Java, Python does not need semicolons at the end of statements.
CHAPTER 7
SOURCE CODE
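import os
import warnings

import joblib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

warnings.filterwarnings("ignore")

# Parts of the original listing were lost. Every line marked "assumed" is a
# reconstruction and may differ from the code that was actually run.

# Class names in label order 0-4. Only the first two survive in the listing;
# the other three are assumed from the class descriptions in Chapter 8.
labels = ["Normal",
          "Atrial Premature",
          "Premature Ventricular Contraction",
          "Fusion of Ventricular and Normal",
          "Fusion of Paced and Normal"]

# Each row holds one heartbeat segment; column 187 is the class label.
df_train = pd.read_csv("mitbih_train.csv", header=None)
print(df_train.head())
plt.plot(df_train.iloc[1, :186])
plt.show()


def plot_equilibre(equilibre):
    """Draw the class distribution as a donut chart."""
    plt.figure(figsize=(5, 5))
    my_circle = plt.Circle((0, 0), 0.7, color="white")  # assumed arguments
    plt.pie(equilibre, labels=['n', 'q', 'v', 's', 'f'],
            colors=['red', 'green', 'blue', 'skyblue', 'orange'],
            autopct='%1.1f%%')
    p = plt.gcf()
    p.gca().add_artist(my_circle)
    plt.show()


print(df_train[187].value_counts())
plot_equilibre(df_train[187].value_counts())

X = df_train.values[:, :-1]  # assumed: every column except the label
y = df_train.values[:, -1].astype(int)
# Assumed split: the listing does not show how X_test and y_test were created.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Plot one example beat per class (indices C[0] to C[4]).
C = [np.argwhere(y_train == k).flatten() for k in range(5)]
for k, name in enumerate(labels):
    plt.plot(X_train[C[k][0]], label=name)
plt.legend()
plt.ylabel("Amplitude", fontsize=15)
plt.show()


def report(y_true, y_hat):
    """Print the classification report and plot the confusion matrix."""
    print(classification_report(y_true, y_hat, target_names=labels))
    cm = confusion_matrix(y_true, y_hat)
    fig, ax = plt.subplots()
    ax.imshow(cm, cmap="Blues")  # assumed: the plotting call is missing
    ax.set_xlabel('Predicted')
    ax.set_ylabel('Actual')
    plt.show()
    return cm


# Naive Bayes baseline ("BNB" is assumed to mean Bernoulli Naive Bayes).
if os.path.exists('BNB_classifier_weights.pkl'):
    bnb_classifier = joblib.load('BNB_classifier_weights.pkl')
else:
    bnb_classifier = BernoulliNB()
    bnb_classifier.fit(X_train, y_train)
    joblib.dump(bnb_classifier, 'BNB_classifier_weights.pkl')
y_pred = bnb_classifier.predict(X_test)
cm = report(y_test, y_pred)

# Random forest classifier: load saved weights if present, otherwise train.
if os.path.exists('rf_classifier_weights.pkl'):
    rf_classifier = joblib.load('rf_classifier_weights.pkl')
else:
    rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42,
                                           min_samples_split=2, min_samples_leaf=1)
    rf_classifier.fit(X_train, y_train)
    joblib.dump(rf_classifier, 'rf_classifier_weights.pkl')
y_pred1 = rf_classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred1)
print("Accuracy:", accuracy)
cm1 = report(y_test, y_pred1)

# Predict a few beats and print the class name of each one.
# The listing reads a separate file here (dataset = pd.read_csv(filename)),
# but "filename" is not defined in it, so rows of X_test are used instead.
predict = rf_classifier.predict(X_test[1:10, :])
for i in range(len(predict)):
    print("Row", i, "predicted as", labels[predict[i]])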
CHAPTER-8
The dataset (Figure 8.1) for heart health classification consists of electrocardiogram (ECG) signals
obtained from human subjects. These signals are captured using ECG sensors attached to
the body, typically on the chest area. The dataset contains various attributes extracted
from these signals, such as waveforms, intervals, and amplitudes. Each data point in the
dataset represents a specific ECG recording from an individual. Additionally, the dataset
includes corresponding labels indicating the heart health status of each subject, such as
normal, arrhythmia, or other cardiac conditions. This dataset serves as a valuable resource
for developing and testing machine learning algorithms aimed at accurately classifying
heart health based on ECG signals.
The dataset consists of electrocardiogram (ECG) signals collected from individuals. Each
ECG signal represents the electrical activity of the heart over a period of time. These
signals are processed to extract numerical features, such as amplitude, frequency, and
duration of specific waveforms, like the P wave, QRS complex, and T wave. These
numerical features serve as inputs for machine learning algorithms to classify the heart
health status of the individuals, distinguishing between normal and abnormal conditions.
The dataset aims to aid researchers and healthcare professionals in developing accurate
and efficient methods for ECG-based heart health classification.
The sample ECG data in Figure 8.3 displays the electrical activity
of the heart, providing insights into heart health classification. The ECG waveform
comprises distinct peaks and troughs, representing different phases of cardiac activity.
The P wave indicates atrial depolarization, while the QRS complex signifies ventricular
depolarization. The ST segment reflects the time between ventricular depolarization and
repolarization, crucial for assessing myocardial ischemia. Lastly, the T wave corresponds
to ventricular repolarization. By analyzing the amplitude, duration, and morphology of
these waveforms, healthcare professionals can evaluate the heart's rhythm, identify
abnormalities such as arrhythmias or ischemia, and classify the overall cardiac health
status of an individual.
The class distribution in Figure 8.4 shows that the majority of the dataset, 82.8%,
belongs to the category denoted 'n'. The next largest category is 'q' at 7.3%, followed
by 'v' at 6.6%. The remaining categories are much smaller, with 's' at 2.5% and 'f' at
0.7%. The dataset is therefore strongly imbalanced towards the 'n' category, although all
five categories are represented.
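Percentages like those above are obtained by counting how often each label occurs. The sketch below shows the calculation on a toy label list; the labels are made up for illustration and are not the real dataset, and the function name `class_percentages` is hypothetical.

```python
from collections import Counter


def class_percentages(labels):
    """Map each class label to its share of the samples, in percent."""
    counts = Counter(labels)
    total = len(labels)
    return {label: 100.0 * n / total for label, n in counts.most_common()}


# Toy label column (class indices), NOT the real dataset.
toy_labels = [0] * 8 + [4] * 1 + [2] * 1
shares = class_percentages(toy_labels)
for label, pct in shares.items():
    print(f"class {label}: {pct:.1f}%")
```

When one class dominates in this way, overall accuracy is driven mostly by that class, so per-class precision and recall give a fairer picture of performance on the rare classes.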
In cases of atrial premature contractions, the ECG may exhibit irregularities characterized
by premature beats originating from the atria. These premature beats disrupt the regular
rhythm, causing deviations in the ECG tracing. The color of the plot shifts to orange,
marking these irregular contractions. Similarly, premature ventricular contractions
manifest as early, abnormal beats originating from the ventricles, causing irregularities in
the ECG tracing, represented by green. Fusion of ventricular and normal beats displays a
combination of normal and irregular beats, depicted by a red color. Fusion of paced and
normal beats illustrates a combination of paced and normal beats, represented by
violet, indicating a blend of artificial pacing and natural heart rhythm.
Figure 8.5 displays one electrocardiogram (ECG) for each heart condition category
listed in the classification report. Each ECG serves as an example of
the corresponding heart condition, helping to visualize the differences between
them.
Table 8.1 provides metrics such as precision, recall, F1-score, and support for each class,
along with overall accuracy, macro average, and weighted average. It evaluates the
performance of Naive Bayes algorithm in classifying different heart conditions.
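The metrics named above follow from the counts of true positives, false positives and false negatives for each class. The sketch below computes them by hand for one class; the label lists are toy values for illustration only, and `per_class_metrics` is a hypothetical helper, not part of the project code.

```python
def per_class_metrics(y_true, y_pred, label):
    """Compute precision, recall, F1 and support for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    support = tp + fn  # number of true samples of this class
    return precision, recall, f1, support


# Toy labels for illustration only (0 = normal, 1 = abnormal).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0]
precision, recall, f1, support = per_class_metrics(y_true, y_pred, label=1)
print(precision, recall, f1, support)
```

Here class 1 has 2 true positives, 1 false positive and 2 false negatives, giving a precision of 2/3, a recall of 1/2 and an F1-score of 4/7.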
Table 8.3 directly compares overall performance metrics between Naive Bayes and RFC,
including precision, recall, F1-score, and support. It provides insights into how RFC
performs compared to Naive Bayes across all classes.
Tables 8.5, 8.6, and 8.8 offer a detailed comparison of performance metrics for
individual classes between Naive Bayes and RFC. Each table focuses on a specific class,
providing precision, recall, F1-score, and support for that class, allowing for a granular
comparison of algorithm performance.
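A per-class comparison of this kind amounts to placing the two classifiers' metrics side
by side and looking at the difference. The sketch below assumes each report is a
dictionary of the form {class: {"precision": ..., "recall": ..., "f1": ...}}; the function
name compare_reports and all numbers are placeholders and are not the values reported in
the tables.

```python
def compare_reports(baseline, candidate, metrics=("precision", "recall", "f1")):
    """Return rows (label, metric, baseline, candidate, difference) per shared class."""
    rows = []
    for label in sorted(set(baseline) & set(candidate)):
        for metric in metrics:
            b = baseline[label][metric]
            c = candidate[label][metric]
            rows.append((label, metric, b, c, c - b))
    return rows


# Placeholder numbers for illustration only; NOT the results from this report.
naive_bayes = {"n": {"precision": 0.5, "recall": 0.5, "f1": 0.5}}
random_forest = {"n": {"precision": 0.75, "recall": 0.75, "f1": 0.75}}
for label, metric, nb, rfc, delta in compare_reports(naive_bayes, random_forest):
    print(f"{label} {metric}: NB={nb:.2f} RFC={rfc:.2f} diff={delta:+.2f}")
```

A positive difference means the second classifier scores higher on that metric for that
class.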
8.6 CONFUSION MATRIX
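A confusion matrix counts, for each true class (row), how many samples were assigned to
each predicted class (column); correct predictions lie on the diagonal. The sketch below
shows how such a matrix can be built from two label lists. The function name and the toy
labels are hypothetical and do not reproduce the project's results.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Return a matrix where entry [i][j] counts true labels[i] predicted as labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Toy labels for illustration only; these are NOT the project's predictions.
toy_matrix = confusion_matrix(["n", "n", "n", "v"], ["n", "n", "v", "v"], ["n", "v"])
for label, row in zip(["n", "v"], toy_matrix):
    print(label, row)
```

Off-diagonal entries show which classes are confused with one another, which the
aggregate accuracy alone does not reveal.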
CHAPTER 9
9.1 CONCLUSION
In conclusion, the study employs a random forest classifier to classify heart health based
on ECG data.
The results indicate that the classifier achieves a high level of accuracy in distinguishing
between different heart health categories. Specifically, it accurately identifies individuals
with healthy hearts and those with various heart conditions. This suggests that ECG data
can be effectively utilized for heart health classification, offering a non-invasive and
efficient method for assessing cardiovascular well-being.
Moreover, the findings underscore the potential of incorporating ECG-based heart health
classification into routine medical practices. With advancements in technology and the
increasing availability of ECG devices, this approach could offer a cost-effective and
accessible means of assessing heart health. By identifying individuals at risk early on,
healthcare providers can intervene promptly, potentially reducing the burden of
cardiovascular diseases and improving overall patient outcomes.
In summary, the study demonstrates the efficacy of using random forest classifiers to
classify heart health based on ECG data. It emphasizes the utility of machine learning in
healthcare and advocates for the integration of ECG-based classification methods into
routine medical assessments. Ultimately, these efforts have the potential to enhance
preventive care, facilitate early diagnosis, and improve the management of cardiovascular
diseases, contributing to better health outcomes for individuals.
9.3 REFERENCES
[1] Malakouti, Seyed Matin. "Heart disease classification based on ECG using machine
learning models." Biomedical Signal Processing and Control 84 (2023): 104796.
[2] Ozcan, Mert, and Serhat Peker. "A classification and regression tree algorithm for
heart disease modeling and prediction." Healthcare Analytics 3 (2023): 100130.
[3] Fakhry, Mahmoud, and Ascensión Gallardo-Antolín. "Elastic net regularization and
gabor dictionary for classification of heart sound signals using deep learning."
Engineering Applications of Artificial Intelligence 127 (2024): 107406.
[4] Nguyen, Minh Tuan, Wei Wen Lin, and Jin H. Huang. "Heart Sound Classification
Using Deep Learning Techniques Based on Log-mel Spectrogram." Circuits, Systems,
and Signal Processing 42, no. 1 (2023): 344-360.
[5] Huang, Youhe, Hongru Li, and Xia Yu. "A novel time representation input based on
deep learning for ECG classification." Biomedical Signal Processing and Control 83
(2023): 104628.
[6] Sk, Khader Basha, D. Roja, Sunkara Santhi Priya, Lavanya Dalavi, Sai Srinivas
Vellela, and Venkateswara Reddy. "Coronary Heart Disease Prediction and
Classification using Hybrid Machine Learning Algorithms." In 2023 International
Conference on Innovative Data Communication Technologies and Application
(ICIDCA), pp. 1-7. IEEE, 2023.
[7] Li, Jiajia, Christopher Brown, Dillon J. Dzikowicz, Mary G. Carey, Wai Cheong Tam,
and Michael Xuelin Huang. "Towards real-time heart health monitoring in firefighting
using convolutional neural networks." Fire Safety Journal 140 (2023): 103852.
[8] Chen, Dan, Juan Feng, HongYan He, WeiPing Xiao, and XiaoJing Liu.
"Classification, Diagnosis, and Treatment of Obesity-Related Heart Diseases."
Metabolic Syndrome and Related Disorders (2024).
[9] Maulani, Ahmad Alaik, Sri Winarno, Junta Zeniarja, Rusyda Tsaniya Eka Putri, and
Ailsa Nurina Cahyani. "Comparison of Hyperparameter Optimization Techniques in
Hybrid CNNLSTM Model for Heart Disease Classification." Sinkron: jurnal dan
penelitian teknik informatika 9, no. 1 (2024): 455-465.
[10] Parveen, Nikhat, Manisha Gupta, Shirisha Kasireddy, Md Shamsul Haque Ansari,
and Mohammad Nadeem Ahmed. "ECG based one-dimensional residual deep
convolutional autoencoder model for heart disease classification." Multimedia Tools
and Applications (2024): 1-27.
[11] Tartarisco, Gennaro, Giovanni Cicceri, Roberta Bruschetta, Alessandro Tonacci,
Simona Campisi, Salvatore Vitabile, Antonio Cerasa et al. "An intelligent Medical
Cyber–Physical System to support heart valve disease screening and diagnosis."
Expert Systems with Applications 238 (2024): 121772.
[12] Jemima, P. Preethy, R. Gokul, R. Ashwin, and S. Matheswaran. "Optimized
Generalised Metric Learning Model for Iterative, Efficient, Accurate, and Improved
Coronary Heart Diseases." In Advanced Applications of Generative AI and Natural
Language Processing Models, pp. 373-388. IGI Global, 2024.
[13] Shivadekar, Samit, Ketan Shahapure, Shivam Vibhute, and Ashley Dunn.
"Evaluation of Machine Learning Methods for Predicting Heart Failure
Readmissions: A Comparative Analysis." International Journal of Intelligent Systems
and Applications in Engineering 12, no. 6s (2024): 694-699.
[14] Chakraborty, C. Parnasree. "Integrating Neural Networks and Traditional Models: A
Hybrid Approach for Accurate Heart Disease Prediction." (2024).
[15] Kaur, Ishleen, and Tanvir Ahmad. "A cluster-based ensemble approach for congenital
heart disease prediction." Computer Methods and Programs in Biomedicine 243
(2024): 107922.
[16] Searles, Charles D. "MicroRNAs and Cardiovascular Disease Risk." Current
Cardiology Reports (2024): 1-10.
[17] Wright, Brandon, Carly Fassler, Dmitry Tumin, and Lauren A. Sarno. "Health system
encounters after loss to cardiology follow-up among patients with congenital heart
disease." The Journal of Pediatrics (2024): 113931.
[18] Jou, Stephanie, Sean R. Mendez, Jason Feinman, Lindsey R. Mitrani, Valentin
Fuster, Massimo Mangiola, Nader Moazami, and Claudia Gidea. "Heart
transplantation: Advances in expanding the donor pool and xenotransplantation."
Nature Reviews Cardiology 21, no. 1 (2024): 25-36.
[19] Christogianni, Aikaterini. "The Benefits of Continuous Health Data Monitoring in
Cardiovascular Diseases and Dementia." In Encyclopedia of Information Science
and Technology, Sixth Edition, pp. 1-22. IGI Global, 2025.
[20] Baek, Ji Yoon, Seung Hee Seo, Sooyoung Cho, Jun-Bean Park, Bhumsuk Keam,
Shin Hye Yoo, and Aesun Shin. "Emergency department visits of newly diagnosed
cardiovascular disease patients in Korea during the COVID-19 pandemic." Scientific
Reports 14, no. 1 (2024): 397.
[21] Trivedi, Rupal. "Cardiovascular Disease Management In The South: An
Implementation Science Approach." PhD diss., University of Georgia.
[22] Wiatma, Deny Sutrisna, Reksa Samoedra, I. Putu Bayu Agus Saputra, and Bayu
Setia. "Physical Activity and Smoking Habits are Closely Related to Cardiovascular
Endurance in Farmers." Indonesian Journal of Global Health Research 6, no. 1
(2024): 263-270.
[23] Bhende, Vishal V., Tanishq S. Sharma, Mathangi Krishnakumar, Anikode
Subramanian Ramaswamy, Kanchan Bilgi, Sohilkhan R. Pathan, and Sohilkhan
Pathan. "The Myths, Perils, and Pitfalls of Redo Pediatric Cardiac Surgery: The New
Normal in Developing Countries Such as India." Cureus 16, no. 1 (2024).
[24] Charchar, Fadi J., Priscilla R. Prestes, Charlotte Mills, Siew Mooi Ching, Dinesh
Neupane, Francine Z. Marques, James E. Sharman et al. "Lifestyle management of
hypertension: International Society of Hypertension position paper endorsed by the
World Hypertension League and European Society of Hypertension." Journal of
hypertension 42, no. 1 (2024): 23-49.
[25] Campbell‐Washburn, Adrienne E., Juliet Varghese, Krishna S. Nayak, Rajiv
Ramasawmy, and Orlando P. Simonetti. "Cardiac MRI at low field strengths."
Journal of Magnetic Resonance Imaging 59, no. 2 (2024): 412-430.
[26] Seng, Nang San Hti Lar, Gebremichael Zeratsion, Oscar Yasser Pena Zapata,
Muhammad Umer Tufail, and Belinda Jim. "Utility of cardiac troponins in patients
with chronic kidney disease." Cardiology in Review 32, no. 1 (2024): 62-70.
[27]Thummisetti, Bala Siva Prakash, and Haritha Atluri. "Advancing Healthcare
Informatics for Empowering Privacy and Security through Federated Learning
Paradigms." International Journal of Sustainable Development in Computing
Science 1, no. 1 (2024): 1-16.
[28] Shield, Kevin, Catherine Paradis, Peter Butt, Tim Naimi, Adam Sherk, Mark
Asbridge, Daniel Myran et al. "New perspectives on how to formulate alcohol
drinking guidelines." Addiction 119, no. 1 (2024): 9-19.
[29] Neshat, Sina, Abbas Rezaei, Armita Farid, Salar Javanshir, Fatemeh Dehghan Niri,
Padideh Daneii, Kiyan Heshmat-Ghahdarijani, and Setayesh Sotoudehnia Korani.
"Cardiovascular diseases risk predictors: ABO blood groups in a different role."
Cardiology in Review 32, no. 2 (2024): 174-179.
[30] Lee, Chien-Chiang, and Zihao Yuan. "Impact of energy poverty on public health: A
nonlinear study from an international perspective." World Development 174 (2024):
106444.
[31] Li, Jian Ping, Amin Ul Haq, Salah Ud Din, Jalaluddin Khan, Asif Khan, and Abdus
Saboor. "Heart disease identification method using machine learning classification
in e-healthcare." IEEE access 8 (2020): 107562-107582.
[32] Deng, Muqing, Tingting Meng, Jiuwen Cao, Shimin Wang, Jing Zhang, and Huijie
Fan. "Heart sound classification based on improved MFCC features and
convolutional recurrent neural networks." Neural Networks 130 (2020): 22-32.
[33] Abdellatif, Abdallah, Hamdan Abdellatef, Jeevan Kanesan, Chee-Onn Chow, Joon
Huang Chuah, and Hassan Muwafaq Gheni. "An effective heart disease detection
and severity level classification model using machine learning and hyperparameter
optimization methods." ieee access 10 (2022): 79974-79985.
[34] Chen, Yongchao, Shoushui Wei, and Yatao Zhang. "Classification of heart sounds
based on the combination of the modified frequency wavelet transform and
convolutional neural network." Medical & Biological Engineering & Computing 58
(2020): 2039-2047.
[35] Shah, Devansh, Samir Patel, and Santosh Kumar Bharti. "Heart disease prediction
using machine learning techniques." SN Computer Science 1 (2020): 1-6.
[36] Oliveira, Jorge, Francesco Renna, Paulo Dias Costa, Marcelo Nogueira, Cristina
Oliveira, Carlos Ferreira, Alípio Jorge et al. "The CirCor DigiScope dataset: from
murmur detection to murmur classification." IEEE journal of biomedical and health
informatics 26, no. 6 (2021): 2524-2535.
[37] Balaji, Tata. "An insight on machine learning algorithms for predicting heart
diseases." Turkish Journal of Computer and Mathematics Education (TURCOMAT)
12, no. 10 (2021): 5867-5877.
[38] Pati, Abhilash, Manoranjan Parhi, and Binod Kumar Pattanayak. "IHDPM: An
integrated heart disease prediction model for heart disease prediction." International
Journal of Medical Engineering and Informatics 14, no. 6 (2022): 564-577.
[39] Nagendra, Kolluru Venkata, Maligela Ussenaiah, and N. Rajasekhar. "Design and
Development of EGB Classification Model for predicting Heart Diseases." In 2020
2nd International Conference on Innovative Mechanisms for Industry Applications
(ICIMIA), pp. 359-366. IEEE, 2020.
[40] Vamshi Kumar, S., T. V. Rajinikanth, and S. Viswanadha Raju. "Heart Attack
Classification Using SVM with LDA and PCA Linear Transformation Techniques."
In Machine Learning Technologies and Applications: Proceedings of ICACECS
2020, pp. 99-112. Springer Singapore, 2021.
[41] Ali, Farman, Shaker El-Sappagh, SM Riazul Islam, Daehan Kwak, Amjad Ali,
Muhammad Imran, and Kyung-Sup Kwak. "A smart healthcare monitoring system
for heart disease prediction based on ensemble deep learning and feature fusion."
Information Fusion 63 (2020): 208-222.
[42] Katarya, Rahul, and Sunit Kumar Meena. "Machine learning techniques for heart
disease prediction: a comparative study and analysis." Health and Technology 11
(2021): 87-97.
[43] Katarya, Rahul, and Polipireddy Srinivas. "Predicting heart disease at early stages
using machine learning: A survey." In 2020 International Conference on Electronics
and Sustainable Communication Systems (ICESC), pp. 302-305. IEEE, 2020.
[44] Rath, Adyasha, Debahuti Mishra, Ganapati Panda, and Suresh Chandra Satapathy.
"An exhaustive review of machine and deep learning based diagnosis of heart
diseases." Multimedia Tools and Applications 81, no. 25 (2022): 36069-36127.
[45] Menshawi, Alaa, Mohammad Mehedi Hassan, Nasser Allheeib, and Giancarlo
Fortino. "A Hybrid Generic Framework for Heart Problem diagnosis based on a
machine learning paradigm." Sensors 23, no. 3 (2023): 1392.
[46] Kumar, Ashish, Rama Komaragiri, and Manjeet Kumar. "Heart rate monitoring and
therapeutic devices: a wavelet transform based approach for the modeling and
classification of congestive heart failure." ISA transactions 79 (2018): 239-250.
[47] Bahrami, Boshra, and Mirsaeid Hosseini Shirvani. "Prediction and diagnosis of heart
disease by data mining techniques." Journal of Multidisciplinary Engineering
Science and Technology (JMEST) 2, no. 2 (2015): 164-168.
[48] Magnussen, Costan G., Olli T. Raitakari, Russell Thomson, Markus Juonala,
Dharmendrakumar A. Patel, Jorma SA Viikari, Jukka Marniemi et al. "Utility of
currently recommended pediatric dyslipidemia classifications in predicting
dyslipidemia in adulthood: evidence from the Childhood Determinants of Adult
Health (CDAH) study, Cardiovascular Risk in Young Finns Study, and Bogalusa
Heart Study." Circulation 117, no. 1 (2008): 32-42.
[49] Bahrami, Boshra, and Mirsaeid Hosseini Shirvani. "Prediction and diagnosis of heart
disease by data mining techniques." Journal of Multidisciplinary Engineering
Science and Technology (JMEST) 2, no. 2 (2015): 164-168.
[50] Magnussen, Costan G., Olli T. Raitakari, Russell Thomson, Markus Juonala,
Dharmendrakumar A. Patel, Jorma SA Viikari, Jukka Marniemi et al. "Utility of
currently recommended pediatric dyslipidemia classifications in predicting
dyslipidemia in adulthood: evidence from the Childhood Determinants of Adult
Health (CDAH) study, Cardiovascular Risk in Young Finns Study, and Bogalusa
Heart Study." Circulation 117, no. 1 (2008): 32-42.
[51] Acharya, R., Ashwin Kumar, P. S. Bhat, C. M. Lim, S. S. Lyengar, N. Kannathal, and
ShankarM Krishnan. "Classification of cardiac abnormalities using heart rate
signals." Medical and Biological Engineering and Computing 42 (2004): 288-293.
[52] Dewan, Ankita, and Meghna Sharma. "Prediction of heart disease using a hybrid
technique in data mining classification." In 2015 2nd International Conference on
Computing for Sustainable Global Development (INDIACom), pp. 704-706. IEEE,
2015.
[53] Shuvo, Samiul Based, Shams Nafisa Ali, Soham Irtiza Swapnil, Mabrook S. Al-
Rakhami, and Abdu Gumaei. "CardioXNet: A novel lightweight deep learning
framework for cardiovascular disease classification using heart sound recordings."
ieee access 9 (2021): 36955-36967.
[54] Reddy, N. Satish Chandra, Song Shue Nee, Lim Zhi Min, and Chew Xin Ying.
"Classification and feature selection approaches by machine learning techniques:
Heart disease prediction." International Journal of Innovative Computing 9, no. 1
(2019).
[55] Woodward, Mark. "Small area statistics as markers for personal social status in the
Scottish heart health study." Journal of epidemiology and community health 50, no.
5 (1996): 570.
[56] Deng, Shi-Wen, and Ji-Qing Han. "Towards heart sound classification without
segmentation via autocorrelation feature and diffusion maps." Future Generation
Computer Systems 60 (2016): 13-21.
[57] Singh, Jagdeep, Amit Kamra, and Harbhag Singh. "Prediction of heart diseases using
associative classification." In 2016 5th International conference on wireless networks
and embedded systems (WECON), pp. 1-7. IEEE, 2016.
[58] Malik, John, Yu-Lun Lo, and Hau-tieng Wu. "Sleep-wake classification via
quantifying heart rate variability by convolutional neural network." Physiological
measurement 39, no. 8 (2018): 085004.
[59] Fida, Benish, Muhammad Nazir, Nawazish Naveed, and Sheeraz Akram. "Heart
disease classification ensemble optimization using genetic algorithm." In 2011 IEEE
14th International Multitopic Conference, pp. 19-24. IEEE, 2011.
[60] Katarya, Rahul, and Sunit Kumar Meena. "Machine learning techniques for heart
disease prediction: a comparative study and analysis." Health and Technology 11
(2021): 87-97.