Cardiovascular Disease Prediction Using Retinal Images: A Project Work - Phase I
Bachelor of Technology
In
by
Pishangal Venkata Ramana 21241A6652
M.Vikram Adithya 21241A6647
Tappatla Sujit Goud 21241A6660
Dr.R.P.Ram Kumar
Professor
GOKARAJU RANGARAJU INSTITUTE OF ENGINEERING AND
TECHNOLOGY
(Autonomous)
Hyderabad-500090
CERTIFICATE
This is to certify that the Project Work - Phase II entitled “Cardiovascular Disease Prediction
Using Retinal Images” is submitted by Pishangal Venkata Ramana (21241A6652), M.
Vikram Adithya (21241A6647), and Tappatla Sujit Goud (21241A6660) in partial
fulfillment of the award of the degree of BACHELOR OF TECHNOLOGY in Computer Science
and Engineering (Artificial Intelligence and Machine Learning) during the academic year
2024-2025.
External Examiner
ACKNOWLEDGEMENT
There are many people who helped us directly and indirectly to complete our project
successfully, and we would like to take this opportunity to thank one and all. First, we would
like to express our deep gratitude to our internal guide Dr. R. P. Ram Kumar, Professor,
Department of Computer Science and Engineering (Artificial Intelligence and Machine
Learning), for his support in the completion of our dissertation. We wish to express our
sincere thanks to Dr. G. Karuna, Head of the Department, and to our Principal, Dr. J.
Praveen, for providing the facilities to complete the project work. We would like to thank
the Project Coordinator, Dr. R. P. Ram Kumar, the faculty members of the AIML Department,
and our friends for their help and constructive criticism during the project period. Finally, we
are much indebted to our parents for their moral support and encouragement in achieving our
goals.
DECLARATION
ABSTRACT
Cardiovascular Disease (CVD) is a general term for conditions affecting the heart or blood
vessels, and it is among the most fatal conditions in the world today. It is usually associated
with a build-up of fatty deposits inside the arteries (atherosclerosis) and an increased risk of
blood clots. Research suggests that the blood vessels in the retina can offer valuable insights
into cardiovascular health, making them a promising avenue for early diagnosis. The primary
objective of this project is to develop a deep learning model that can accurately predict
cardiovascular risk from retinal images. The methodology involves preprocessing retinal
images using Contrast Limited Adaptive Histogram Equalization (CLAHE) and normalization,
followed by feature extraction using Region-based Convolutional Neural Networks (RCNN)
with K-Nearest Neighbors (KNN), and finally feature classification using a Convolutional
Neural Network (CNN) with LSTM. The accuracy achieved is 94%. The uniqueness of this
approach lies in using retinal images, which makes cardiovascular disease prediction
cost-efficient, and in combining RCNN with KNN and CNN with LSTM to improve accuracy.
LIST OF FIGURES
LIST OF TABLES
LIST OF ACRONYMS
Acronym Full Form
TABLE OF CONTENTS
Chapter No. Chapter Name Page No.
Certificate ii
Acknowledgement iii
Declaration iv
Abstract v
List of Figures vi
List of Tables vii
List of Acronyms viii
1 Introduction
1.1 Cardiovascular Disease Prediction Using Retinal Images 1
1.2 Objective of the Project 3
1.3 Methodology 4
1.4 Architecture diagram 5
1.5 Organization of the Report 7
2 Literature Survey
2.1 Summary of Existing Approaches 8
2.2 Summary Drawbacks of Existing Approaches 18
3 Proposed Method
3.1 Problem Statement & Objectives of the Project 20
3.2 Architecture Diagram 21
3.3 Modules and its Description 34
3.4 Requirements Engineering 37
3.5 Analysis and Design through UML 39
3.6 Testing 44
4 Results and Discussions
4.1 Description about Dataset 45
4.2 Detailed Explanation 47
4.3 Significance of the Proposed Method with its 49
Advantages
5 Conclusion and Future Enhancements
6 Appendices 53
References 60
CHAPTER 1
INTRODUCTION
The heart is a muscular organ that pumps blood through the body and is the central
part of the body’s cardiovascular system, which works in close conjunction with the lungs.
The cardiovascular system also comprises a network of blood vessels, such as veins, arteries,
and capillaries, that deliver blood all over the body. Abnormalities in normal blood flow
from the heart cause several types of heart disease, commonly known as cardiovascular
diseases (CVD). Heart diseases are the leading cause of death worldwide. According to the
World Health Organization (WHO), 17.5 million global deaths occur because of heart attacks
and strokes, and more than 75% of deaths from cardiovascular diseases occur in middle-income
and low-income countries. Furthermore, 80% of the deaths due to CVD are because of stroke
and heart attack. Therefore, predicting cardiac abnormalities at an early stage with tools for
heart disease prediction can save many lives and help doctors design effective treatment
plans, ultimately reducing the mortality rate due to cardiovascular diseases.
Due to the development of advanced healthcare systems, large amounts of patient data
are now available (i.e., Big Data in Electronic Health Record systems) that can be used
to design predictive models for cardiovascular diseases. Data mining, or machine
learning, is a discovery method for analyzing big data from assorted perspectives and
encapsulating it into useful information: “Data mining is a non-trivial extraction of implicit,
previously unknown and potentially useful information about data.” Nowadays, a huge
amount of data pertaining to disease diagnosis, patients, etc. is generated by the healthcare
industry, and data mining provides a number of techniques to discover hidden patterns or
similarities in these data.
Therefore, in this work, a machine learning approach is proposed for the
implementation of a heart disease prediction system, validated on two open-access
heart disease prediction datasets. Data mining is the computer-based process of extracting
useful information from enormous databases. It is most helpful in exploratory
analysis because it surfaces non-trivial information from large volumes of evidence, and it
affords an efficient approach to uncovering new and previously unknown patterns in
the data. The information identified in this way can be used by healthcare administrators to
improve their services. Heart disease has been the most crucial cause of death in countries
such as India and the United States. In this project we predict heart disease using
classification algorithms. Machine learning techniques, such as DNN classification and
Logistic Regression, are used to explore different kinds of heart-related problems. A major
challenge facing healthcare organizations (hospitals, medical centers) is the provision of
quality services at affordable costs. Quality service implies diagnosing patients correctly
and administering treatments that are effective. Poor clinical decisions can lead to disastrous
consequences and are therefore unacceptable. Hospitals must also minimize the cost of
clinical tests. They can achieve these results by employing appropriate computer-based
information and/or decision support systems.
This section depicts an overview of the proposed system and illustrates all of the
components, techniques, and tools used to develop the entire system. To develop an
intelligent and user-friendly heart disease prediction system, an efficient software tool is
needed to train huge datasets and compare multiple machine learning algorithms. After
choosing the most robust algorithm with the best accuracy and performance measures, it can
be implemented in a smartphone-based application for detecting and predicting heart disease
risk level. Retinal fundus photographs can be used to detect a range of retinal conditions.
Prior work has shown that deep-learning models trained instead on external photographs of
the eyes can be used to detect diabetic retinopathy (DR), diabetic macular oedema, and poor
blood glucose control. Those models were developed using eye photographs from patients
with diabetes from 301 DR screening sites and evaluated on four tasks and four validation
datasets with patients from 198 additional screening sites. For all four tasks, the predictive
performance of the deep-learning models was significantly higher than that of logistic
regression models using self-reported demographic and medical history data, and the
predictions generalized to patients with dilated pupils, to patients from a different DR
screening programme, and to a general eye care programme that included diabetics and
non-diabetics. The deep-learning models were also explored for the detection of elevated
lipid levels. The utility of external eye photographs for the diagnosis and management of
diseases should be further validated with images from different cameras and patient
populations.
Many hospital information systems are designed to support patient billing, inventory
management, and generation of simple statistics. Some hospitals use decision support
systems, but they are largely limited. They can answer simple queries like “What is the
average age of patients who have heart disease?”, “How many surgeries resulted in
hospital stays longer than 10 days?”, and “Identify the female patients who are single, above
30 years old, and who have been treated for cancer.” However, they cannot answer complex
queries like “Identify the important preoperative predictors that increase the length of hospital
stay”, “Given patient records on cancer, should treatment include chemotherapy alone,
radiation alone, or both chemotherapy and radiation?”, and “Given patient records, predict
the probability of patients getting a heart disease.” Clinical decisions are often made based on
doctors’ intuition and experience rather than on the knowledge-rich data hidden in the
database. This practice leads to unwanted biases, errors, and excessive medical costs, which
affect the quality of service provided to patients. Wu et al. proposed that integration of
clinical decision support with computer-based patient records could reduce medical errors,
enhance patient safety, decrease unwanted practice variation, and improve patient outcomes
[17]. This suggestion is promising, as data modeling and analysis tools, e.g., data mining,
have the potential to generate a knowledge-rich environment which can help to significantly
improve the quality of clinical decisions.
Cardiovascular diseases (CVD) comprise a wide category of diseases affecting the heart and
circulation and are among the most common and deadly diseases. These conditions are
usually accompanied by deposits of fat along the blood vessel walls (atherosclerosis)
and by coagulation of the blood, which can be dangerous and lead to severe events such
as heart attacks and strokes. These life-threatening events are best detected before they
occur, yet current diagnostic techniques are typically expensive, time-consuming, and/or
invasive.
In recent years, studies conducted on patients have pointed out the possibility of using
retinal blood vessels for the assessment of cardiovascular risk. The retina is a
microvascular ocular tissue, so the changes it displays are likely to mirror the state of the
vascular system and can be used to screen for cardiovascular risk in a non-invasive manner.
As such, this project seeks to unlock this potential by training a deep learning model to
accurately predict cardiovascular disease risk from images of the retina.
The suggested procedure begins with retinal image preprocessing, with the aim of improving
image quality and the visibility of significant structures. Illumination is corrected using
image enhancement techniques, for instance Contrast Limited Adaptive Histogram
Equalization (CLAHE) and normalization, to improve the contrast. To extract features,
Region-based Convolutional Neural Networks (RCNN) in combination with K-Nearest
Neighbors (KNN) are employed to accurately localize important features such as blood
vessels. Next, feature classification is performed using a CNN with LSTM for the added
capability of identifying patterns across sequences. This hybrid approach has proved
effective, reaching an accuracy of 94 percent.
1.3 Methodology
The methodology of this project is designed to predict cardiovascular diseases (CVD) by
analyzing retinal images through a multi-step process that involves image preprocessing,
feature extraction, and classification using deep learning techniques. The process leverages
several advanced techniques, such as Contrast Limited Adaptive Histogram Equalization
(CLAHE) for preprocessing, Region-based Convolutional Neural Networks (RCNN) combined
with K-Nearest Neighbors (KNN) for feature extraction, and Convolutional Neural Networks
(CNN) integrated with Long Short-Term Memory (LSTM) for feature classification. Below is a
detailed description of each step in the methodology.
1. Preprocessing of Retinal Images
The first step in the methodology involves preprocessing the retinal images to improve the
quality of the data and enhance the relevant features necessary for effective feature extraction.
This step is crucial because retinal images can vary in terms of lighting conditions, contrast,
and noise. Two main techniques are applied to improve the images:
Contrast Limited Adaptive Histogram Equalization (CLAHE):
CLAHE is used to enhance the contrast of the retinal images. It improves the visibility of finer
details in the image, such as blood vessels, optic disc, and other structures that are essential for
cardiovascular disease prediction. CLAHE works by dividing the image into smaller regions
(tiles), applying histogram equalization to each, and then limiting the amplification of contrast
to avoid over-amplifying noise in homogenous areas.
Normalization:
After contrast enhancement, normalization is applied to standardize the pixel intensity values
across all images. This step ensures that all images are scaled uniformly, making it easier for
the deep learning models to process the data consistently.
2. Feature Extraction Using RCNN and KNN
Once the retinal images are preprocessed, the next step is to extract meaningful features that
will aid in predicting cardiovascular disease. This is done using a combination of Region-based
Convolutional Neural Networks (RCNN) and K-Nearest Neighbors (KNN). Each plays a
crucial role in isolating and identifying important features:
Region-based Convolutional Neural Network (RCNN):
RCNN is a powerful deep learning architecture that is specifically designed to identify regions
of interest (RoI) in images. In the case of retinal images, RCNN helps detect important areas,
such as blood vessels, optic disc, and other key structures that may indicate cardiovascular risk.
RCNN works by first generating region proposals (potential areas of interest) and then applying
convolutional layers to each proposal to extract features relevant for classification.
K-Nearest Neighbors (KNN):
After the RCNN has identified and extracted initial features, KNN is applied to further process
and classify these features. KNN is a simple yet effective algorithm used to classify data points
based on their similarity to the nearest data points in the feature space. In this case, KNN helps
group the extracted features into clusters, making it easier to understand the relationship
between different parts of the retinal image and their association with cardiovascular health.
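To make the KNN step concrete, the majority-vote idea can be shown with a minimal NumPy-only nearest-neighbour classifier; the feature vectors and labels below are invented purely for illustration, not taken from the project's data.

```python
import numpy as np

def knn_predict(train_feats, train_labels, query, k=3):
    """Classify a query feature vector by majority vote among its
    k nearest training vectors (Euclidean distance)."""
    # Distance from the query to every stored feature vector.
    dists = np.linalg.norm(train_feats - query, axis=1)
    # Indices of the k closest training samples.
    nearest = np.argsort(dists)[:k]
    # Majority vote over the labels of those k neighbours.
    values, counts = np.unique(train_labels[nearest], return_counts=True)
    return values[np.argmax(counts)]
```

In the pipeline, the "training vectors" would be feature vectors produced by the RCNN stage, and the vote groups a new region with the most similar previously seen regions.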
3. Feature Classification Using CNN and LSTM
After extracting relevant features using RCNN and KNN, the next step is to classify these
features based on the likelihood of cardiovascular disease. This is achieved by using a
combination of Convolutional Neural Networks (CNN) and Long Short-Term Memory
(LSTM) networks.
Convolutional Neural Network (CNN):
CNN is a powerful deep learning architecture designed for image classification tasks. It uses
convolutional layers to automatically learn spatial hierarchies in the image data, making it ideal
for analyzing retinal images. CNN helps detect complex patterns in the images, such as blood
vessel abnormalities or signs of atherosclerosis, which are indicative of cardiovascular
conditions. The CNN architecture is composed of multiple layers, including convolutional
layers, pooling layers, and fully connected layers, which work together to extract features and
classify the input data.
Long Short-Term Memory (LSTM):
LSTM is a type of Recurrent Neural Network (RNN) that excels at learning and remembering
long-term dependencies in sequential data. In this project, LSTM is used to enhance the CNN’s
ability to classify retinal images by adding a temporal or sequential component. LSTM
networks can capture the relationships between features across multiple frames or data points,
making them particularly useful for detecting subtle patterns and long-range dependencies in
retinal images that may indicate cardiovascular risks.
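One way to combine a CNN with an LSTM, as described above, is to let convolutional layers extract a spatial feature map and then read its rows as a sequence into the LSTM. The Keras sketch below is an illustrative architecture only; the layer sizes and the row-as-sequence reshaping are assumptions, not the project's exact model.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_lstm(input_shape=(64, 64, 1), num_classes=2):
    """CNN front-end for spatial features, LSTM over feature-map rows,
    softmax head for classification. Sizes are illustrative."""
    inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(16, 3, activation="relu", padding="same")(inputs)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(x)
    x = layers.MaxPooling2D(2)(x)
    # Reshape the (H, W, C) feature map into a sequence of H row vectors,
    # so the LSTM can capture dependencies across the image rows.
    x = layers.Reshape((x.shape[1], x.shape[2] * x.shape[3]))(x)
    x = layers.LSTM(64)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(inputs, outputs)
```

The model would then be compiled with a cross-entropy loss and trained on the preprocessed, feature-extracted retinal images.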
1.4 Architecture diagram

[Architecture diagram: Upload Image → Preprocessing using CLAHE and Normalization →
Feature Extraction using RCNN and KNN → Feature Classification using CNN & LSTM →
Pose Accuracy Analysis and Real-Time Feedback → if accurate (Yes), display output at the
User Interface; otherwise (No), loop back to preprocessing.]
1. Upload Image
The process starts with the user uploading an image, which acts as the primary input for the
diagnostic system. This image could be a retinal scan, X-ray, or any other type of medical
image, depending on the application. The uploaded image is then forwarded to the
preprocessing stage, ensuring it is ready for further analysis. This step is crucial for initiating
the pipeline, as it provides the raw data required for processing and analysis.
2. Preprocessing Using CLAHE and Normalization
Preprocessing involves two critical steps: Contrast Limited Adaptive Histogram Equalization
(CLAHE) and normalization. CLAHE enhances the contrast of the image, focusing on
improving low-contrast regions while preventing the amplification of noise. This is
particularly beneficial in medical imaging, where subtle details are important for accurate
analysis. Normalization, on the other hand, adjusts the pixel intensity values to bring
consistency across the dataset. This step standardizes the input image, eliminating variations
caused by differences in imaging conditions such as lighting or resolution. Together, these
preprocessing techniques ensure that the image is optimized for feature extraction.
3. Feature Extraction Using RCNN
Region-based Convolutional Neural Networks (RCNN) are employed to extract significant
features from the preprocessed image. RCNN identifies regions of interest (ROIs) within the
image, isolating critical patterns such as vascular structures in retinal scans or abnormalities
in medical images. By focusing on relevant areas, RCNN reduces the influence of irrelevant
data, enabling more accurate and efficient analysis. This step is a foundational part of the
pipeline, as it translates raw visual data into meaningful features that can be processed
further.
4. Feature Classification Using CNN and AdaBoost
The features extracted by RCNN are
classified using a combination of Convolutional Neural Networks (CNN) and AdaBoost.
CNN specializes in recognizing spatial patterns and hierarchies within the image, making it
highly effective for medical image analysis. AdaBoost, an ensemble learning technique,
enhances the classification process by combining multiple weak classifiers to form a robust
model. It focuses on difficult-to-classify samples, ensuring improved accuracy and reliability.
Together, CNN and AdaBoost create a strong classification framework that can handle
complex and subtle variations in the input data.
5. Pose Accuracy Analysis and Real-Time Feedback
This step evaluates the accuracy of the classification results by analyzing the pose and spatial
arrangement of the extracted features. If the system determines that the accuracy is
insufficient, real-time feedback is sent back to the preprocessing stage. This feedback loop
allows the system to iteratively refine the input image, addressing any inconsistencies or
errors. The inclusion of real-time feedback ensures that the system remains adaptive,
improving its performance with each iteration.
6. Decision Point
After pose accuracy analysis, the system reaches a decision point. If the analysis confirms the
accuracy of the classification, the process proceeds to disease diagnosis and categorization. If
the accuracy is not satisfactory, the system loops back to the preprocessing stage to refine the
image further. This decision point is essential for ensuring the reliability and precision of the
final output.
7. Diagnosing Diseases and Categorization
Once the pose accuracy analysis is successful, the system diagnoses the condition based on
the classified features. This step may involve identifying specific diseases and categorizing
them.
1.5 Organization of the Report
This report provides an overview of all the topics discussed, in a brief and concise manner,
following the sequence in which they are presented.
Chapter 1: Introduction
This section discusses the project, its use case, how it is useful to users, and the basic
working of the overall project.
Chapter 2: Literature Survey
This section discusses the existing approaches to solving this problem, along with their
advantages and drawbacks. It provides the required background and momentum to carry out
the project.
Chapter 3: Proposed Methods
This section discusses the logical sequence in which we solve the problem and the methods
adopted to solve it.
Chapter 4: Results and Discussions
This section presents the evaluation of the models; the RCNN and CNN+AdaBoost models
achieved a score of 0.96.
Chapter 5: Conclusion and Future Enhancements
In this project, we used the RCNN model to extract features from retinal images and CNN
with AdaBoost to classify heart diseases, achieving an accuracy of 96%.
Chapter 6: Appendices
This chapter contains sample code of the proposed method. As the code is very long, only
three pages are given, covering feature extraction and classification to predict heart diseases
and the Flask interface that interacts with the front end.
CHAPTER 2
LITERATURE SURVEY
Deema Moharram, Xuguang Yuan, and Dan Li [3], “Prediction of cardiovascular disease risk
based on major contributing features”: this work uses UCI's Heart Disease dataset for training
and testing. The study employs Learning Vector Quantization (LVQ) and compares its
performance with traditional classifiers such as Decision Tree (DT), Support Vector Machine
(SVM), and Random Forest (RF). LVQ showed superior accuracy (98.78%) compared to the
other models. Faster R-CNN and RetinaNet are two-stage frameworks, which leads to a more
complex process compared to YOLOv5.
Adel Ammar, Anis Koubaa, and Bilel Benjdira [4], “Recognition of Cardiovascular Diseases
through Retinal Images Using Optic Cup to Optic Disc Ratio”: its objective is to detect and
count palm trees. It compares several YOLO and CNN models and selects the more accurate
and more optimal solution for the given problem. The models given for the comparison are
YOLOv3, YOLOv4, EfficientDet-D5, and Faster R-CNN. Among the tested models, YOLOv4
and EfficientDet-D5 achieved the best trade-off between accuracy and speed, with up to 94%
mean average precision (mAP); this metric measures how well the model identifies palm trees,
and a higher mAP indicates better accuracy. YOLOv4 uses CSPDarknet53 (Cross Stage
Partial Darknet), which reduces computational complexity while maintaining a deep network.
This provides a more efficient feature extraction process, resulting in faster and more accurate
object detection. MADAN (multi-stage attention domain adaptation network) has slower
inference and requires high computational resources compared to YOLO.
Thani Jintasuttisak, Eran Edirisinghe, and Ali Elbattay [5], “Can deep learning on retinal
images augment known risk factors for cardiovascular disease prediction in diabetes? A
prospective cohort study from the national screening programme in Scotland”: the main
objective of this paper is the detection and counting of date palm trees, comparing YOLOv3,
YOLOv5(m), YOLOv4, and SSD300 and concluding which is better. As in the previous paper,
after deploying, training, and testing the models, the greatest accuracy is achieved by
YOLOv5m (medium depth). YOLOv5(m) identifies medium-size objects better, with an
accuracy, measured by mean average precision, of 92.34%. Before the YOLO and R-CNN
models, they used the LeNet CNN model, but it is only suited to image classification, not
object detection, and its accuracy is much lower.
Mukhes Sri Muna, Andri Prima Nugroho, Muhdan Syarovy, Ardan Wiratmoko, Suwardi, and
Lilik Sutiarso [7], “An Overview of Deep-Learning-Based Methods for Cardiovascular Risk
Assessment with Retinal Images”: the main objective of this project is to detect and count the
number of oil palm trees. The model used in this paper is YOLOv3, an object detection model
well suited to detecting and counting trees. The validation MAPE result is 0.057627, or
5.76%. The YOLOv3 architecture has advantages in detection speed while maintaining a
specific mean average precision (mAP), is relatively easy to modify, and has a faster
computation time. Before using YOLOv3, they used Faster R-CNN and Masked CNN;
compared to the YOLO model, these models are complicated and time-consuming.
Ling Yao, Tang Liu, Jun Qin, Ning Lu, and Chenghu Zhou [8], “Image Analytics for Tree
Enumeration and Diversion of Forest Land”: this work uses Contrast Limited Adaptive
Histogram Equalization (CLAHE) and categorizes trees based on their crown size and
environmental conditions. Masked R-CNN is used for this approach; compared to a plain
CNN, R-CNN achieves better accuracy in terms of F1-score. It has high accuracy in counting
the number of trees compared to the other approaches of its time. They had used a basic CNN
method to count the number of trees, which has low accuracy compared to the Masked
R-CNN algorithm.
Samir Khan and Prasun Kumar Gupta [9], “Cardiovascular Disease Diagnosis from DXA
Scan and Retinal Images Using Deep Learning”: the main objective of this paper is to develop
a fast, robust, and accurate method for counting trees in forests using high-spatial-resolution
satellite imagery. This is achieved through the application of deep neural networks (DNNs) to
automate the process and improve efficiency. A deep neural network architecture suitable for
object detection is selected. The Encoder-Decoder Network achieved an average MAE of
47.34, an average RMSE of 83.17, and an average R2 of 0.933, surpassing the second-best
approach by 14.6% for MAE and 13.9% for RMSE. The advantages of this model are that it
can process large areas of forest efficiently, making it suitable for large-scale monitoring
projects (scalability), and high accuracy in detecting and counting trees, with the ability to
distinguish between different species and densities. While the current algorithm performs well
with less than 50% overlap, further improvements could be made to handle even lower
overlap scenarios more effectively.
Devidas Dukale, Sumit Agale, Swaroop Kalunge, and Pushpak Nikam [10], “Application of
Image Analytics for Tree Enumeration and Diversion of Forest Land”: the main objective of
this study is to compare the effectiveness of various tree counting algorithms in different
vegetative regions, specifically dense and sparse forests. Models like YOLOv3, Faster
R-CNN, and RetinaNet are used for object detection in the comparison. Which model gives
better accuracy and speed is determined by running a particular dataset through all of the
above models; after every model has been evaluated, the highest accuracy is achieved by
YOLOv3, which shows improved precision and recall in both dense and sparse regions, with
an F1-score of typically 0.8. Advantages of the model: high accuracy and the ability to handle
complex scenes; automated feature extraction reduces the need for manual intervention; it is
scalable for large-scale applications. Further development should enhance the adaptability of
the algorithm to diverse real-time environments and varying weather conditions.
Lei Li, Tianfang Zhang, Zhongyu Jiang, Cheng-Yen Yang, Jenq-Neng Hwang, Stefan
Oehmcke, Dimitri Pierre Johannes Gominski, Fabian Gieseke, and Christian Igel [11], “Tree
Counting by Bridging 3D Point Clouds with Imagery”: the main objective of the project is
comparing deep learning approaches to counting trees using 3D airborne LiDAR data and 2D
imagery data. The model used in this project is Faster R-CNN. The accuracy results after
execution are MAE (Mean Absolute Error): 7.193, RMSE (Root Mean Square Error): 10.461,
and R-squared: 0.9317; this accuracy is achieved when the block number is four, with
different numbers of blocks being tested. This method bypasses the need for an intermediate
bounding-box representation: without representing bounding boxes, the tree count can be
inferred directly from point cloud data. Further modifications are needed regarding alignment
issues, bounding-box limitations, and generalization.
K. Budnik, J. Byrtek, and A. Kapusta [12], “Counting trees - methods of automatic analysis
of photogrammetric data in forests of the continental region”: the main objective of the
project is the evaluation and comparison of two automated tree detection methods, Faster
R-CNN and template matching, for analyzing photogrammetric data. The methodologies used
are template matching and Faster R-CNN. The accuracy of the project is calculated by MAE
(Mean Absolute Error) and RMSE (Root Mean Square Error); the accuracy value is 11.671.
An advantage is that Faster R-CNN showed robustness in detecting trees under different
weather conditions, making it a reliable tool for forest management. Further work involves
validating the methods across different forest types and geographies.
Omair Hassaan, Ahmad Kamal Nasir, Hubert Roth, and M. Fakhir Khan [13], “Precision
Forestry: Trees Counting in Urban Areas Using Visible Imagery based on an Unmanned
Aerial Vehicle”: the main objective of the project is developing an efficient and accurate
method for counting trees in urban areas using UAV imagery and computer vision algorithms.
The accuracy is 92%. An advantage is that UAV imagery provides higher resolution compared
to satellite image datasets, which leads to more accurate tree counts. Further enhancement
concerns the scalability of the method in different environments, such as dense forest areas,
varying geographical areas, and small villages.
Masato Katoh and Francois A. Gougeon [14], “Improving the Precision of Tree Counting by
Combining Tree Detection with Crown Delineation and Classification on Homogeneity
Guided Smoothed High Resolution (50 cm) Multispectral Airborne Digital Data”: this work
develops an efficient and accurate method for counting trees using a CNN. Further research is
needed to extend the technique to larger areas and multiple scenes, and to adapt the method to
varied topographies and different forest conditions. The automated process is suitable for
large-scale forest management and is compatible with commonly used GIS software,
facilitating practical applications.
Pengcheng Han, Cunbao Ma, Jian Chen, Lin Chen, Shuhui, Shibiao Xu, Yong Zhao, Chenhua
Zhang, and Tatsuya Hagino, “Fast Tree Detection and Counting on UAVs for Sequential
Aerial Images with Generating Orthophoto Mosaicing”: this work develops an efficient and
accurate method for counting trees in urban areas using UAV imagery and computer vision
algorithms. The method aims to aid forest management and environmental monitoring by
providing precise data on tree populations using an LoD algorithm. By using computer vision
and machine learning algorithms, the tree counting process is automated, reducing manual
effort and potential errors. The current algorithm performs well with less than 50% overlap;
further improvements could be made to handle even lower overlap scenarios more effectively.
Summary of the surveyed approaches (Ref. No, Methodology, Dataset Name, Advantage, Drawbacks, Results):

Methodology: Automatic detection of retinal diseases from fundus images using a CNN. Dataset: CVD screening retinal images. Drawbacks: training the deep learning model. Results: R2 score of 0.503.

[4.] Methodology: The paper utilizes multiple machine learning algorithms, such as Random Forest, Neural Networks, Bayesian Networks, and QUEST, for cardiovascular disease prediction. Dataset: UCI Heart Disease dataset and another dataset from Kaggle. Advantage: The ensemble model achieves high accuracy (99.1%) by combining multiple models, enhancing predictive ability. Drawbacks: Individual models like QUEST and Neural Networks had lower accuracies (84%-91%). Results: The ensemble model significantly outperformed individual classifiers.

[5.] Methodology: A Random Forest model is used for cardiovascular disease prediction with feature selection and outlier removal, evaluated against K-NN, SVM, and LR. Dataset: Heart Disease Dataset from Kaggle. Advantage: High accuracy (99%) due to ensemble learning, better feature selection, and handling of overfitting. Drawbacks: Computationally intensive, requires careful tuning, and can be slower compared to simpler models. Results: The Random Forest model achieved 99% accuracy, outperforming K-NN (95%), SVM (85%), and LR (87%).

[6.] Methodology: The study used the XGBH machine learning model, trained on 70,000 samples from Shanxi Bethune Hospital and a Kaggle dataset, for cardiovascular disease prediction. Dataset: Shanxi Bethune Hospital and Kaggle Cardiovascular Disease dataset. Advantage: The XGBH model showed high accuracy (AUC = 0.81) and reduced computational time with a simplified version using only three features. Drawbacks: Limited generalizability outside the studied regions and some unaddressed outliers in blood pressure data. Results: The XGBH model achieved an AUC of 0.81, outperforming traditional models and other machine learning algorithms.

[7.] Methodology: The study used retinal images to detect cardiovascular diseases by extracting features like the optic cup-to-disc ratio (CDR). Segmentation and classification of retinal vessels were performed using Support Vector Machines (SVM) and Artificial Neural Networks (ANN). Dataset: The DRIONS database, containing 110 color retinal images. Advantage: This approach allows for non-invasive detection of cardiovascular diseases using retinal images and provides high accuracy in early detection. Drawbacks: The system's accuracy may decrease with blurred images or the presence of noise, requiring advanced techniques to handle such data. Results: The model achieved an accuracy of 86.7% for diseased retinal images with high sensitivity and specificity.

[8.] Methodology: The study used deep learning (DL) on retinal images from diabetic patients to predict cardiovascular disease (CVD) risk. A ResNet-101 model was trained on bilateral retinal images to predict CVD and other risk factors like blood pressure and renal function. Dataset: Scottish Diabetes Research Network dataset (SDRN-NDS), which includes over 226,000 patients from the Scottish Diabetic Retinopathy Screening (SDRS) program (2008-2018). Advantage: The DL model improved CVD prediction by identifying retinal features predictive of CVD, providing additional insight into diabetes-related cardiovascular risks. Drawbacks: The increment in predictive performance from adding the DL score was statistically significant but small, with limited improvements in the C-statistic for both type 1 and type 2 diabetes cohorts. Results: The DL model showed a modest improvement in predictive performance, with the C-statistic increasing from 0.820 to 0.822 for type 1 diabetes and from 0.709 to 0.711 for type 2 diabetes.

[9.] Methodology: Utilizes the YOLO (You Only Look Once) model, with enhancements for detecting vehicles and parking spaces in real time. The model was trained to optimize for vehicle size and shape to improve detection accuracy. Dataset: Custom parking lot image dataset with manually labeled images. Advantage: Real-time detection with high accuracy and reduced computation time; the optimized YOLO network can process frames faster than other models. Drawbacks: Struggles with heavily occluded vehicles or cluttered environments, leading to false positives; requires powerful hardware for real-time performance. Results: The model achieved a mean average precision (mAP) of 92.3% in detecting vehicles and parking spots in various parking environments.

[10.] Methodology: Proposes a deep learning-based parking line detection system using convolutional neural networks (CNNs) integrated with around-view monitoring to detect lines in parking lots and assist autonomous parking. Dataset: AVM (Around-View Monitoring) dataset captured from vehicle cameras. Advantage: Enables more precise parking line detection, making autonomous parking smoother and less error-prone, especially in tight spaces. Drawbacks: Limited to clear weather and lighting conditions; does not perform well in dark or overly bright conditions. Results: Achieved a 95% success rate in detecting parking lines and was implemented successfully in a simulated autonomous parking system.

[11.] Methodology: Deep learning (DL) methods for assessing cardiovascular diseases (CVD) using retinal fundus images. Models like ResNet, MobileNet, and Inception-v3 were commonly used. Dataset: Public datasets such as UK Biobank, Qatar Biobank, and EyePACS were utilized in these studies. Advantage: Non-invasive, cost-effective method for detecting CVD risks; AI models can identify subtle retinal changes correlated with cardiovascular health, aiding in early diagnosis. Drawbacks: Variations in dataset quality, ethnic diversity, and image resolution can impact model accuracy; clinical validation across broader populations is still needed. Results: High accuracy in predicting CVD risk factors, with AUC values often above 0.90; models like MobileNet-V2 predicted smoking status with 88.88% accuracy.

[12.] Methodology: The dataset is split into training and test sets (70%:30%), and various machine learning models such as XGBoost, AdaBoost, Random Forest, Decision Tree, and others are applied. Dataset: The Heart Disease dataset used is a composite dataset synthesized from five heart datasets, including Cleveland, Hungarian, Long Beach VA, and Statlog (Heart). Advantage: High classification performance with an accuracy of 90.94% and F1-score of 92.3%; efficient feature selection using SHAP values. Drawbacks: Potential limitations due to reliance on secondary data and cross-sectional design, which restricts the ability to assess long-term effects. Results: The Gradient Boosting model achieved an Area Under the Curve (AUC) of 0.927, making it highly effective for early detection of cardiovascular disease.
2.2 Summary of Drawbacks of Existing Approaches
Invasive procedures, for instance angiography or cardiac catheterisation, are sometimes
uncomfortable or even dangerous for the patient, and they are not affordable or widely
available to the general public, particularly for people in poorer areas. They are also of little
use in the early diagnosis of cardiovascular diseases, because only advanced cases are
detected.
Electrocardiograms (ECGs) and Blood Tests: Relatively poor at detecting structural
distortions or the primary signs of atherosclerotic disease. May result in false positives or
false negatives, causing unnecessary procedures or a failure to diagnose disease.
Imaging Techniques (CT, MRI, and Ultrasound): Require expensive and complex
equipment and trained technical personnel, making them costly and inaccessible. Frequent
CT scans also raise concerns about radiation exposure. These techniques cannot easily relate
lesions to the further progression of CVD.
Traditional Retinal Image Analysis: Many approaches rely on hand-crafted features,
for example vessel segmentation, which may overlook fine pattern details. They have limited
integration with modern machine learning models that could achieve higher precision.
Finally, deep models generalize poorly across different populations, since most are trained on
datasets that are small or have a skewed distribution of image acquisition conditions.
Deep Learning-Based Approaches Without Retinal Insights: Most current deep
learning tools trained on general cardiovascular data, for example echo or ECG signals, lack
the ability to detect microvascular changes. Such cardiac data does not capture the
information present in fundus images of the eye, which have strong associations with
cardiovascular disease.
General Machine Learning Techniques: Rely on feature engineering, which involves
substantial manual work and is normally less accurate than the automatic feature extraction
provided by deep learning. Performance degrades as the quality or availability of
domain-specific features decreases.
There are, however, several issues with present methodologies for the prediction of
cardiovascular disease (CVD) and for tree counting. Most deep learning models for screening
CVD using retinal fundus images, including CNN-based models, are computationally
intensive and expensive, both in hardware and in training time. This makes their
application less feasible in resource-constrained environments, which has been one of the
main challenges for such institutions. In addition, many of these models are trained on
regional datasets, limiting their applicability to ethnically diverse populations in diverse
settings. For example, models trained on datasets from the Scottish Diabetes Research
Network or Shanxi Bethune Hospital may fail for other ethnicities or regions because of
differences in data quality, image resolution, or patient population.
Random Forest, Decision Tree, and XGBoost methods have achieved high accuracy,
but they require extensive feature engineering and hyperparameter tuning. This reliance on
manual effort is time-consuming and increases the probability of human error. Moreover,
some models, such as XGBH and ensemble methods, are computationally expensive, which
makes them unsuitable for real-time analysis. This is compounded by the absence of testing
on held-out datasets or validation on external data, as shown in studies that assessed their
models only on restricted dataset splits.
In deep learning approaches such as ResNet and MobileNet for classifying CVD risk
from retinal images, high accuracy is achievable, but the small gains in metrics such as the
C-statistic often raise questions about clinical relevance. Furthermore, retinal datasets contain
noise, outliers, and low-contrast images, which degrade model performance and therefore
demand extensive preprocessing to maintain accuracy.
Finally, several methodologies rely on a single dataset, such as the UCI Heart
Disease or Kaggle cardiovascular datasets, which are subject to origin bias. Some of
these models may not perform well on new data and hence may be of limited use in
solving everyday problems. To overcome these limitations, models need to be designed
with better generalization capability, the preprocessing steps need to be made more
effective, and the techniques must be optimized in terms of computation time and
complexity.
CHAPTER 3
PROPOSED METHOD
3.1 Problem Statement and Objectives
Cardiovascular diseases (CVDs) have become one of the most common causes of death
globally, largely due to atherosclerosis and blood clotting. CVDs can often be prevented if
diagnosed in the early stages; however, available diagnostic techniques are expensive, may
involve invasive procedures, and are unavailable to large populations. In contrast, retinal
imaging is a harmless, non-contact, and inexpensive method of identifying cardiovascular
risks through vascular analysis. However, building a robust and accurate feature extraction
system from retinal images for CVD prediction remains a work in progress.
Currently used diagnostic tools for cardiovascular diseases are expensive and frequently
demand sophisticated equipment and highly skilled personnel, so their application may be
very limited where resources are scarce.
There is currently no alternative method that offers both low cost and high accuracy in
detecting cardiovascular risks at early stages.
In particular, current clinical approaches do not exploit the full diagnostic potential of
retinal vascular patterns, which are related to cardiovascular status.
The lack of a uniform, deep learning-based approach for assessing retinal images to
identify CVD limits the wider applicability of such solutions.
3.1.2 Objectives
To establish an economical and efficient approach for using deep learning for the assessment of
cardiovascular disease risks from retinal images. This involves:
1. Filtering of retinal images by applying Contrast Limited Adaptive Histogram
Equalization (CLAHE) and image normalization for better visualization.
2. Feature extraction using R-CNN integrated with feature matching using Speeded-Up Robust
Features (SURF).
3. A combination of Convolutional Neural Network (CNN) and Long Short-Term
Memory (LSTM) networks for classifying features to enhance prediction.
4. Offering a diagnostic approach that is painless, effective, and applicable at large scale
to people at risk of CVD.
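As a rough illustration of objective 1 (the array sizes and clip limit below are arbitrary choices, and full CLAHE additionally operates on local tiles, e.g. via OpenCV's cv2.createCLAHE), a global contrast-limited histogram equalization plus min-max normalization can be sketched in NumPy:

```python
import numpy as np

def clip_limited_equalize(img, clip_limit=0.02, n_bins=256):
    """Simplified, global version of contrast-limited histogram
    equalization (real CLAHE also works on local tiles).
    img: uint8 grayscale array."""
    hist, _ = np.histogram(img, bins=n_bins, range=(0, 256))
    # Contrast limiting: clip histogram peaks and redistribute the excess.
    limit = max(1, int(clip_limit * img.size))
    excess = np.sum(np.maximum(hist - limit, 0))
    hist = np.minimum(hist, limit) + excess // n_bins
    cdf = np.cumsum(hist).astype(np.float64)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())  # normalized CDF
    lut = np.round(cdf * 255).astype(np.uint8)         # lookup table
    return lut[img]

def normalize01(img):
    """Min-max normalization to [0, 1] for network input."""
    img = img.astype(np.float32)
    span = img.max() - img.min()
    return (img - img.min()) / (span if span > 0 else 1.0)

# Toy low-contrast patch: values squeezed into [100, 140].
rng = np.random.default_rng(0)
patch = rng.integers(100, 141, size=(64, 64)).astype(np.uint8)
eq = clip_limited_equalize(patch)   # contrast stretched
norm = normalize01(eq)              # scaled to [0, 1]
```

Equalization spreads the squeezed intensity range across the full 0-255 scale, and the normalization step maps the result into [0, 1] before it is fed to a network.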
3.1 Detailed Explanation of Architecture Diagram
Introduction
Cardiovascular disease is still one of the biggest killers of people around the globe. Early
diagnosis is among the most important factors in increasing survival rates. Retinal imaging
has therefore taken on a significant role in the non-invasive diagnosis of heart disease. There
is evidence that the retinal vasculature reflects systemic cardiovascular conditions, which
makes it well suited as a biomarker. Sophisticated deep learning algorithms can pick out very
fine details in these images. This work presents a fusion of R-CNN and K-NN for the feature
extraction process and of CNN and LSTM for feature classification.
The two feature extraction and two feature classification techniques are each explained in
detail below.
3.1.1 What is R-CNN? How Does R-CNN Work?
Region-based Convolutional Neural Network (R-CNN) is a type of deep learning architecture
used for object detection in computer vision tasks. R-CNN was one of the pioneering models
that advanced the object detection field by combining the power of convolutional neural
networks with region-based approaches.
R-CNN operates in several stages, described step by step below.
3.1.2 Region Proposal
R-CNN starts by dividing the input image into multiple regions or subregions. These regions
are referred to as "region proposals" or "region candidates." The region proposal step is
responsible for generating a set of potential regions in the image that are likely to contain
objects. R-CNN does not generate these proposals itself; instead, it relies on external methods
like Selective Search or EdgeBoxes to generate region proposals.
Selective Search, for example, operates by merging or splitting segments of the image based
on various image cues like color, texture, and shape to create a diverse set of region
proposals.
Selective Search begins with an input image, produces many segmented masks, and then
progressively merges them into fewer masks that comprise the main components of the
image.
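R-CNN delegates this step to Selective Search or EdgeBoxes rather than the toy generator below; still, to make concrete what a proposal stage outputs (a list of candidate boxes), here is a self-contained multi-scale sliding-window sketch with made-up window sizes:

```python
import numpy as np

def sliding_window_proposals(h, w, scales=(32, 64), stride_frac=0.5):
    """Toy region-proposal generator: multi-scale sliding windows.
    Each proposal is (x1, y1, x2, y2) in pixel coordinates.
    Real R-CNN uses Selective Search / EdgeBoxes instead."""
    boxes = []
    for s in scales:
        stride = max(1, int(s * stride_frac))
        for y in range(0, h - s + 1, stride):
            for x in range(0, w - s + 1, stride):
                boxes.append((x, y, x + s, y + s))
    return np.array(boxes)

# 58 candidate boxes for a 128x128 image at scales 32 and 64.
props = sliding_window_proposals(128, 128)
```

Each of these candidate boxes would then be warped to a fixed size and passed through the CNN, exactly as the Selective Search proposals are in the original pipeline.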
Fully connected feed-forward networks are known to be prone to excessive overfitting when
applied directly to images. An image is represented by a matrix of numbers with dimensions
(rows * columns * number of channels), so even a small real-world image can easily be
200 * 200 * 3. One way to feed such an image into the network would be to flatten it
completely into a one-dimensional vector. This would require very many neurons, and thus
very many weights, in the first hidden layer; such a high number of parameters increases
susceptibility to overfitting.
Convolutional neural networks, by contrast, adopt a different approach to processing images.
Instead of flattening an image and processing it in one pass, CNNs process small patches of
the image bit by bit across the whole image. This enables the network to internalize the
essential features of the image with fewer neurons and fewer parameters. It also lowers the
chance of overfitting, making CNNs a more efficient alternative to feed-forward neural
networks.
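The arithmetic behind this argument is quick to check (the 1,000-neuron hidden layer and the 3x3, 32-filter convolution are illustrative assumptions, not figures from the text):

```python
# Flattened input: every pixel/channel becomes one input feature.
inputs = 200 * 200 * 3           # 120,000 features
hidden = 1000                    # illustrative first-layer width
dense_weights = inputs * hidden  # weights in one fully connected layer

# A 3x3 convolution with 32 filters over the same 3-channel image:
# the same small kernel is shared across every spatial position.
conv_weights = 3 * 3 * 3 * 32
```

The dense layer needs 120 million weights; the convolutional layer needs 864, which is the core efficiency argument for CNNs on images.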
The inner working of convolutional neural networks
Before entering into the complex workings of convolutional neural networks, it is essential to
understand a few basics about images and how they are represented.
Images are represented as a matrix of numbers denoting pixel values; each pixel has a
specific color value. Most images these days are RGB images, having three planes for red,
green, and blue.
R-CNN trains separate classifiers for each object class to be considered; it normally adopts
Support Vector Machines (SVMs) for classification. For each class there is a different SVM
indicating whether or not a region proposal contains an instance of that class.
During training, positive samples are regions that include an example of the class, while
negative samples are regions that do not contain any instance of the class.
3.1.4 Bounding Box Regression
Like other models of its class, R-CNN performs bounding box regression in addition to
object classification. For each class, a separate regression model is used to fine-tune the
location and size of the box drawn around the object. Bounding box regression enhances
localization by refining an initially given bounding box so that it encloses the object more
precisely.
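A minimal sketch of this refinement step, using the center/log-scale offset parameterization common in the R-CNN family (the boxes and delta values below are hypothetical):

```python
import numpy as np

def apply_box_deltas(box, deltas):
    """Refine (x1, y1, x2, y2) with predicted offsets (dx, dy, dw, dh):
    dx, dy shift the box center by a fraction of its width/height;
    dw, dh rescale width/height through exp()."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    dx, dy, dw, dh = deltas
    cx, cy = cx + dx * w, cy + dy * h
    w, h = w * np.exp(dw), h * np.exp(dh)
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)

# Zero deltas leave the proposal unchanged; dx = 0.5 shifts it right
# by half its width.
refined = apply_box_deltas((10, 10, 50, 50), (0.0, 0.0, 0.0, 0.0))
shifted = apply_box_deltas((10, 10, 50, 50), (0.5, 0.0, 0.0, 0.0))
```

The log-scale form for width and height keeps the predicted box size positive regardless of the regressor's raw output.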
3.1.5 Non-Maximum Suppression (NMS)
Once the region proposals have been classified and their bounding boxes refined by
regression, R-CNN suppresses redundant and overlapping boxes. NMS guarantees that only
the bounding boxes with the highest confidence, and which are mutually non-redundant, are
selected as the final detections.
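A compact NumPy sketch of greedy NMS (the 0.5 IoU threshold and the toy boxes are illustrative defaults, not values from this project):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression.
    boxes: (N, 4) as (x1, y1, x2, y2); returns kept indices."""
    order = np.argsort(scores)[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of box i with every remaining box.
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]       # drop heavy overlaps
    return keep

# Boxes 0 and 1 overlap heavily; the lower-scoring one is suppressed.
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], float)
kept = nms(boxes, np.array([0.9, 0.8, 0.7]))
```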
Strengths of R-CNN
Listed below are some of the advantages of the R-CNN architecture.
Accurate Object Detection: Through the use of region-based convolutional features, R-CNN
enables especially accurate detection of objects. It performs very well in situations where
accurate localization and identification of an object are important.
Robustness to Object Variations: Since R-CNN can detect objects at different sizes,
orientations, and scales, it works well where objects vary in size and position or appear
against complex backgrounds.
Flexibility: R-CNN is an easily extensible architecture that can also be adapted for instance
segmentation and object tracking. Apart from the initial layers, the later layers of the
network can be customized to fit the requirements of the task.
Disadvantages of R-CNN
Here are some of the disadvantages of the R-CNN architecture.
Computational Complexity: R-CNN is slow and computationally expensive. Region
proposals are extracted, a CNN is applied to each proposal for feature extraction, and the
features are then classified. This multi-step approach can be time-consuming and costly.
Slow Inference: Because R-CNN processes region proposals sequentially, inference is slow;
successors such as Fast R-CNN gain speed by sharing higher-level feature maps across
proposals. This latency can be a problem for real-time applications.
Overlapping Region Proposals: R-CNN may produce many region proposals, some with
large overlap; this results in repeated computation and hence reduced performance.
R-CNN is Not End-to-End: Unlike Faster R-CNN or YOLO, R-CNN is not an end-to-end
object detection architecture. The region proposal and classification stages are separate
modules, which can lead to good but not optimal results compared with optimizing both
modules together.
Performance of the R-CNN
This section reviews R-CNN performance on the Pascal VOC 2007 dataset, as shown in the
original ablation results:
Significance of features: moving from HOG/DPM features to a convolutional neural network
such as AlexNet adds around 10 points of mAP.
Impact of fine-tuning: fine-tuning the network, rather than using only pre-trained weights,
raises mAP by roughly another 10 points.
Influence of bounding box correction: introducing bounding box correction boosts mAP by
about 4 further points.
Role of architecture: moving from AlexNet to VGG (referred to as O-Net in the original table
because of the evolving nomenclature) enhances mAP by another 8 points.
In summary, the R-CNN architecture has shown incredible promise in achieving high
accuracy on the object detection task. The discussion above examined the critical factors
affecting the mAP score: feature quality, fine-tuning, bounding box correction, and
architecture.
However, one cannot shy away from the fact that R-CNN comes with certain stumbling
blocks. Its computational complexity can be intimidating, especially the slowness of
inference, particularly in real-time situations. Even so, it is evident that this framework has
left a lasting mark on the evolution of computer vision; its contributions to object detection
laid the groundwork for the progress that followed, which continued to build on its
foundations.
K-Nearest Neighbors (K-NN) is suitable for all sorts of data that arise in classification or
regression problems. It is a non-parametric procedure that forecasts outcomes based on the
similarity of data points in the set, and it is often less affected by outliers than many other
algorithms.
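A minimal sketch of K-NN classification by majority vote (toy 2-D data; Euclidean distance and k=3 are illustrative choices):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify point x by majority vote among its k nearest
    training points under Euclidean distance."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]           # indices of k closest points
    values, counts = np.unique(y_train[nearest], return_counts=True)
    return values[np.argmax(counts)]          # majority class

# Two toy clusters: class 0 near the origin, class 1 near (5, 5).
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], float)
y = np.array([0, 0, 0, 1, 1, 1])
pred = knn_predict(X, y, np.array([4.5, 5.0]))
```

A query near the (5, 5) cluster is assigned class 1; no parameters are fitted, which is what makes the method non-parametric.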
A CNN takes in an image, extracts feature maps through its convolutional layers, and then
runs the output through a fully connected layer followed by an activation function, an
operation that normally converts the outputs into a classification of the image's contents or
returns other information, such as the locations of various objects in the image.
Unlike other neural networks, CNNs rely primarily on convolution rather than general matrix
multiplication, which is evident from their weight-sharing structure. Mathematically,
convolution is an operation that combines two functions into a third function representing
how one modifies the other. Further processing includes pooling, fully connected layers, and
an activation function to predict outcomes; together these form the basic architecture of a
CNN.
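A minimal sketch of the convolution operation itself (deep learning libraries actually compute cross-correlation, as below; the edge-detecting kernel is an illustrative choice):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D cross-correlation (what deep learning frameworks call
    'convolution'): slide the kernel over the image and take
    elementwise-product sums."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge kernel responds strongly at the brightness step.
img = np.zeros((5, 5))
img[:, 3:] = 1.0                          # left half dark, right half bright
edge_kernel = np.array([[-1.0, 0.0, 1.0]] * 3)
resp = conv2d(img, edge_kernel)
```

The response map is zero over flat regions and peaks where the image intensity changes, which is how the early layers pick out edges and corners.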
How Do Convolutional Neural Networks Operate?
First, one needs to understand the fundamentals of images and how they are represented
before digging into the nitty-gritty of convolutional neural networks.
An image can be modeled as a matrix of pixel values, with each pixel assigned a particular
color value. Most images come in RGB format, which defines three planes of red, green, and
blue. Grayscale images, however, have only one plane, which holds the intensity of each
pixel.
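In code (a minimal NumPy illustration; the 0.299/0.587/0.114 weights are the common luma coefficients for grayscale conversion, an assumption not stated in the text):

```python
import numpy as np

# An RGB image is a (height, width, 3) array; grayscale is (height, width).
rgb = np.zeros((4, 4, 3), dtype=np.uint8)
rgb[..., 0] = 255                      # pure red image: R plane at maximum

# Grayscale as a weighted sum of the three planes (luma weights).
gray = (0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2])
```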
Convolutional neural networks consist of many different layers interconnected by artificial
"neurons". Each artificial neuron is simply a mathematical function that takes multiple
inputs, weighs them, and returns an activation value. In the first layers, a ConvNet detects
basic features such as edges, corners, and other simple shapes. As the image passes through
deeper layers, the network integrates information from previous layers and detects more
complex features such as objects and faces.
Two-stage detectors, such as the much-touted R-CNN (Regions with Convolutional Neural
Networks) models, divide a single image into chunks called "region proposals", which
indicate candidate object parts in the image.
These proposals then feed into a second stage, where a fully connected network classifies the
objects within the regions and refines their bounding boxes. More recent approaches like
YOLO (You Only Look Once) and RetinaNet use single-stage methods that predict bounding
boxes, class labels, and confidence scores directly from the input image in one pass. These
models tend to be faster than two-stage models but often a little less accurate.
Though each specific architecture differs, what underlies all CNN-based object detection is
the extraction of rich, discriminative features from images through convolutional layers,
which then serve as the basis for predictions about the objects appearing in the image.
Layers in Convolutional Neural Networks
This section gives a general overview of the layers most commonly integrated into a
convolutional neural network architecture: the pooling layer, the activation layer, the batch
normalization layer, and the dropout layer, among others.
Pooling
Max pooling and average pooling are techniques for down-sampling the convolved features
in order to reduce the spatial size of the feature map. Max pooling selects the maximum pixel
value in the region covered by the kernel. This not only reduces the computational power
needed to process the data but also acts as a noise suppressor, discarding spurious noisy
activations along with the dimensionality reduction.
Average pooling, by contrast, simply returns the average of all the values in the region of the
image covered by the kernel. Although average pooling implements dimensionality
reduction, it does not suppress noise the way max pooling does.
In practice, therefore, max pooling usually performs better than average pooling and is
regarded as more effective and robust, although their relative merits depend chiefly on the
task and the noise level in the dataset itself.
Figure 3.1.9 Pooling layer
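A minimal NumPy sketch of both pooling variants over a toy 4x4 feature map:

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping 2D pooling with a size x size window."""
    h, w = x.shape[0] // size, x.shape[1] // size
    windows = x[:h * size, :w * size].reshape(h, size, w, size)
    if mode == "max":
        return windows.max(axis=(1, 3))   # strongest activation per window
    return windows.mean(axis=(1, 3))      # average activation per window

fmap = np.array([[1, 3, 0, 0],
                 [2, 9, 0, 1],
                 [0, 0, 5, 6],
                 [0, 2, 7, 8]], float)
mx = pool2d(fmap)                 # 2x2 map of per-window maxima
avg = pool2d(fmap, mode="mean")   # 2x2 map of per-window means
```

Max pooling keeps only the dominant activation in each window (here 9, 1, 2, 8), while average pooling blends in every value, including weak noisy ones.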
Activation Function Layer
The activation function layer helps the network learn a non-linear mapping between input
and output. It introduces non-linearity into the network and thus enables it to model complex
patterns and relationships in the data. The activation function is generally applied to the
output of each neuron after the weighted sum of its inputs has been computed.
The most used activation functions in CNNs are the rectified linear unit (ReLU), sigmoid,
and hyperbolic tangent (tanh). ReLU is popular in CNNs because it is computationally
efficient and produces sparse activations: it returns the input where it is positive and zero
where it is negative. Sigmoid and tanh are other non-linear functions, but they are less
popular than ReLU in CNNs because they tend to produce gradients close to zero for large
input values. The activation function layer can be considered the "brain" of the CNN, since
it transforms the input into a meaningful representation of the data; it is one of the basic
components of the network.
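The three functions, and the vanishing-gradient behavior of sigmoid for large inputs, can be sketched as:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)          # negatives clipped to zero

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))    # squashed into (0, 1)

def tanh(x):
    return np.tanh(x)                  # squashed into (-1, 1)

x = np.array([-2.0, 0.0, 3.0])
r = relu(x)
s = sigmoid(x)
t = tanh(x)

# For large |x|, sigmoid saturates: its gradient s * (1 - s) vanishes,
# which is why sigmoid/tanh are less popular than ReLU in deep CNNs.
grad_at_10 = sigmoid(10.0) * (1.0 - sigmoid(10.0))
```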
Batch Normalization Layer
The batch normalization (BN) layer is one of the most frequently used layers in
convolutional neural networks. It normalizes each neuron's input to zero mean and unit
variance, which increases the stability of the learning process and mitigates internal covariate
shift, the phenomenon in which the distribution of a layer's inputs varies during training.
BN normalizes each neuron's output using the mean and standard deviation of the batch of
input data and is usually placed after convolutional and fully connected layers.
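A minimal sketch of the normalization BN performs over a batch (gamma and beta are the learnable scale and shift, fixed here for illustration):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of activations (N, features) to zero mean and
    unit variance per feature, then scale/shift by gamma/beta."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Two features on wildly different scales are brought to a common one.
batch = np.array([[1.0, 100.0],
                  [3.0, 300.0],
                  [5.0, 500.0]])
out = batch_norm(batch)
```

After normalization, each feature column has roughly zero mean and unit variance regardless of its original scale, which is what stabilizes training.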
Dropout Layer
Dropout layers randomly drop some neurons from the input on every forward pass in order to
reduce overfitting. Before each pass, a proportion of input units is set to zero, effectively
removing them from the network. During testing and validation, dropout is not used at all.
Dropping out units compels the network to learn many independent representations of the
input, making it more resistant to changes in the input data and reducing the chances of
overfitting. Dropout is a widely used regularization method, typically applied to the fully
connected layers of a CNN, though it can be applied to any layer.
Hinton described the inspiration for dropout with an anecdote about his bank:
"I went to my bank. The tellers kept changing and I asked one of them why. He said he didn't
know but they got moved around a lot. I figured it must be because it would require
cooperation between employees to successfully defraud the bank. This made me realize that
randomly removing a different subset of neurons on each example would prevent
conspiracies and thus reduce overfitting."
For instance, suppose we train a network to detect faces. Faces do not always show all
attributes such as a nose, eyes, or ears; something is usually missing in the general case, for
example when the eyes are occluded. It is therefore a good idea not to let all features
co-adapt, but to learn from a random subset of features each time, so that the network
generalizes better to unseen data. It also forces the face detector to work when some of the
attributes are already missing.
Conclusion
Convolutional Neural Networks (CNNs) are architectural designs that are popularly and
widely applied in the field of deep learning, especially in image classification, object
detection, and semantic segmentation. CNNs offer several advantages over traditional
computer vision techniques. Robustness to translation and rotation: a CNN learns features
that are robust to small shifts and rotational changes of objects at any image position, which
makes it suitable for tasks such as object detection and semantic segmentation. Capability of
handling large volumes of data: a CNN can learn from very large datasets, making it a very
useful model for training on collections such as ImageNet. Transfer learning: CNN models
pre-trained on large datasets such as ImageNet can be fine-tuned on smaller datasets, thereby
saving enormous amounts of data and the computational resources required to train a model
for a specific task.
On the other hand, CNNs have some limitations and challenges:
Computational cost: The amount of computational resources needed for training and
deploying CNN architectures is very high, which limits wide-scale deployment on mobile
and embedded devices.
Overfitting: With limited data, CNN architectures may overfit and fail to generalize well to
unseen data.
Explainability: CNNs are considered "black box" models because their internal logic is
difficult to interpret, making it hard to audit their decisions or point out possible mistakes.
3.2 Connectivity Diagram
The figure above depicts the typical connectivity of modules in the system, which includes
image upload, preprocessing, feature extraction, and classification. A detailed explanation of
the modules and their connections follows:
3.2.1 Software and Hardware Requirements
Software
Languages : Python, HTML, CSS
Libraries : OpenCV, Flask
Text Editor : VS Code, PyCharm, or Jupyter Notebook for coding
Hardware
Operating System : Windows 11
Processor : Intel® i5 (12th Gen)
RAM : 16 GB
System type : 64-bit operating system
Graphics Processing Unit (GPU)
3.3 Modules and its Description
This diagram represents the module connections of the project. It includes the deep learning
models used for feature extraction and classification of retinal images, and a Flask
application for the UI and backend API. A detailed description of each module and its
connections is given below:
1. Image Upload Module
Description:
This module enables a user to upload an image to the system. From here, it plays the role of
initiation of the pipeline and also checks on the readiness of the input image.
Key Functionality:
Accepts image files of the specified formats only.
Passes the uploaded image on to the next module of the pipeline, which performs
preprocessing.
2. Image Preprocessing Module
Description:
Enhances and normalizes the uploaded image in preparation for feature extraction. This step
improves the image quality, making it suitable for the feature extraction described in the next
section.
Key Functionality:
CLAHE Normalization:
Enhances contrast using Contrast Limited Adaptive Histogram Equalization (CLAHE),
which adaptively adjusts the intensity and contrast of the image.
Sends the enhanced image to the Feature Extraction Module.
3. Feature Extraction Module
Description:
A stage where meaningful features of the image that undergoes pre-processing are obtained.
Each of them reflects the necessary details or patterns in the image for classification features.
Key Techniques:
RCNN (Region-based Convolutional Neural Network):
Locates areas of interest in the image and makes detailed analyses of features contained
within the interesting areas.
Purpose: In order to pass the extracted features to the Feature Classification Module for
further evaluation.
4. Feature Classification Module
Description:
Sorts the known features into categories according to some defined classes. This module
decides the overall outcomes of the image analysis process.
Key Techniques:
CNN (Convolutional Neural Network):
It is applied where categorization, identification of visual patterns and images data is
required.
Adaboost:
Combines several weak classifiers into a single strong classifier by iteratively reweighting
the training samples, improving the accuracy of the final classification.
Purpose:
To deliver classification results, which are sent to the downstream Image Display Module.
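As an illustration of how AdaBoost combines weak classifiers into a stronger one, here is a minimal NumPy implementation of the classic algorithm with decision stumps on toy data (purely illustrative, not the project's trained model):

```python
import numpy as np

def fit_stump(X, y, w):
    """Find the best threshold stump (feature, threshold, polarity)
    under sample weights w. Labels are +/-1."""
    best = (np.inf, None)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, (j, thr, pol))
    return best

def stump_predict(stump, X):
    j, thr, pol = stump
    return np.where(pol * (X[:, j] - thr) >= 0, 1, -1)

def adaboost(X, y, rounds=10):
    """Classic AdaBoost: reweight samples after each round so the next
    weak learner focuses on the examples misclassified so far."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(rounds):
        err, stump = fit_stump(X, y, w)
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # weight of this weak learner
        pred = stump_predict(stump, X)
        w *= np.exp(-alpha * y * pred)          # upweight the misclassified
        w /= w.sum()
        ensemble.append((alpha, stump))
    return ensemble

def predict(ensemble, X):
    score = sum(a * stump_predict(s, X) for a, s in ensemble)
    return np.sign(score)

# Toy 1-D data that a single stump cannot separate perfectly.
X = np.array([[0.], [1.], [2.], [3.], [4.], [5.]])
y = np.array([1, 1, -1, -1, 1, 1])
model = adaboost(X, y, rounds=20)
print((predict(model, X) == y).mean())  # training accuracy
```

Each round's `alpha` grows as the weak learner's weighted error shrinks, which is how the ensemble emphasizes its most reliable members.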
5. Image Display Module
Description:
Shows the results of the decision-making discovery process of the classification. This module
allows results to be displayed in an understandable format to the user, serving as the link
between the system and the end user.
Key Functionality:
Outputs categorized outcomes (e.g., predicted class, label or measure).
It also allows for convenient navigation as a user to go through the results of the system.
3.4 Requirements Engineering
Functional
Each module must perform its operation and function correctly.
The system should be able to detect disease indicators in uploaded images. The software's
overall performance will enable users to work efficiently.
The application is composed of parts that make it easy to locate and address problems. As a
result, it is easier to add and update new functionality.
Only the user has access to their own data.
Non-Functional
Accuracy: The system should correctly analyse uploaded retinal images and return
reliable classification results.
Speed: The system should afford short response and evaluation times, with no
significant lag.
Security: The system should incorporate ways and means of safeguarding the
information derived from the analysis, especially data that is personal to the patient.
Scalability: The system must be able to process many analyses simultaneously for a
similar number of concurrent users without loss of capacity.
Reliability: The system should be efficient and operate dependably, with no crashes,
errors, or similar mishaps.
Accessibility: The system should accommodate users with one or multiple disabilities,
including those with hearing or visual impairments and those who rely on assistive
technologies.
Compatibility: Supporting multiple devices and browsers is important so that as many
users as possible can interact with the system in an easy and convenient manner.
3.5 Analysis and Design through UML
3.5.1 Class Diagram
A class diagram in the Unified Modelling Language (UML) shows the connections and
dependencies between classes in the source code. The class diagram in Figure 3.5.1 presents
a system for analyzing retinal images to diagnose disease. It consists of five main classes:
Image Preprocessing, Feature Extraction, Feature Classification, Diagnosis Manager, and
User Interface. The Image Preprocessing class takes on the responsibilities of uploading
images and applying CLAHE histogram equalization. The Feature Extraction class uses
RCNN algorithms to derive specific features from the preprocessed images. These features
are processed by the Feature Classification class, which uses an AdaBoost-trained CNN
model to categorize them. The Diagnosis Manager then makes the final assessment based on
the classification result. Finally, the User Interface class presents the diagnosis to the user.
The interactions between the classes follow a sequential flow of tasks: data preprocessing,
feature analysis, diagnosis generation, and result display. The diagram emphasizes the
modularity and simplicity of the system's structure.
3.5.2 Sequence Diagram
The sequence diagram in Figure 3.5.2 shows how messages move through the system during
retinal image analysis for diagnosing cardiovascular disease. The process begins with the
User uploading a retinal image, which is handled by the ImageProcessing module. There the
image is enhanced using the Contrast Limited Adaptive Histogram Equalization (CLAHE)
method and normalized for quality and standardization. Next, the image is processed with
RCNN (Region-based Convolutional Neural Networks) to extract features, which are then
passed to the FeatureClassification module. Here, the extracted features are classified using
the CNN and AdaBoost to predict cardiovascular risk and to determine the accuracy of the
classification. The classification results are used to evaluate any notable features or
discrepancies in the retinal structure, and the system provides proper feedback by
progressively applying what the model has learned. The results are also divided into groups
by severity so that doctors can decide which patients to attend to first. Finally, the findings,
including the diagnosis, the classification results, and the recommended action, are
forwarded through the ImageProcessing module to the User for review.
3.5.3 Use case diagram
The use case diagram in Figure 3.5.3 depicts the interactions between the USER and the
SYSTEM for processing and diagnosing a retinal image. The USER first invokes the
'Upload Image' use case. After uploading, the SYSTEM preprocesses the uploaded image by
applying normalization and enhancement. Following preprocessing, features are extracted
from the image through the 'Feature Extraction Using RCNN' use case. The extracted
features are then classified into disease categories using 'Feature Classification Using CNN
and Adaboost'. In the end, the USER views the results produced by the system. The centre of
the diagram shows the order of operations the system performs, with the user acting both as
the initiator of the process and as the audience for the output; it thus displays a clear
decoupling of user input from system processing.
3.5.4 Activity Diagram
The activity diagram in Figure 3.5.4 shows the flow of the work done in the project, much
like a flowchart. It depicts the workflow for predicting cardiovascular risk using
retinal images. The process begins with the user uploading a retinal image into the system.
Once the image is uploaded, it undergoes preprocessing using techniques such as Contrast
Limited Adaptive Histogram Equalization (CLAHE) and normalization. This step enhances
the image quality and prepares it for analysis. Following preprocessing, the system performs
feature extraction using a Region-based Convolutional Neural Network (RCNN). This
model identifies significant features in the retinal image, such as blood vessels and optic
discs, which are critical indicators of cardiovascular health. The extracted features are then
classified using a Convolutional Neural Network (CNN) enhanced with AdaBoost, a
boosting algorithm that improves the classification accuracy by combining multiple weak
classifiers into a stronger one. This ensures precise categorization of the input data into
appropriate risk levels, such as high-risk or low-risk cardiovascular conditions. After feature
classification, the system generates and displays the predicted results to the user. This
approach is cost-efficient and non-invasive, leveraging retinal images to assess
cardiovascular health. The integration of advanced deep learning techniques like RCNN and
AdaBoost enhances the accuracy and reliability of predictions. This diagram effectively
captures the sequence of steps in the workflow, emphasizing the interaction between the user
and the system to provide actionable insights into cardiovascular risk.
3.6 Testing
3. 6. 1 Types of Testing
Unit Testing: Checking the functional performance of each unit or component of the
software without regard to other units or the rest of the system.
Integration Testing: Making sure that a particular combination of components or
systems operates jointly as intended.
System Testing: Performing tests on the complete, integrated application to verify that
it meets the laid-down requirements.
Acceptance Testing: Validating the application in an actual working environment to
satisfy the intended users and confirm it is fit to be run.
Performance Testing: Testing the software under load and stress, and analysing its
scalability.
In the course of the project, unit testing, integration testing, and performance testing were
used to confirm that the model complies with the identified needs.
Unit Testing: Module-specific testing was conducted with different test scenarios covering
data preprocessing, model training, and evaluation of the metrics.
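As an illustration, unit tests for a preprocessing helper might look as follows (the `normalize` function and its behaviour are assumptions for the example, not the project's actual code):

```python
import numpy as np

def normalize(img):
    """Hypothetical preprocessing step: scale pixel values to [0, 1]."""
    img = img.astype(np.float32)
    lo, hi = img.min(), img.max()
    if hi == lo:                      # avoid division by zero on flat images
        return np.zeros_like(img)
    return (img - lo) / (hi - lo)

# Unit tests: each checks one behaviour of the unit in isolation.
def test_range():
    out = normalize(np.array([[0, 128, 255]], dtype=np.uint8))
    assert out.min() == 0.0 and out.max() == 1.0

def test_flat_image():
    out = normalize(np.full((4, 4), 7, dtype=np.uint8))
    assert np.all(out == 0.0)

test_range()
test_flat_image()
print("all preprocessing tests passed")
```

With pytest, the `test_*` functions would be discovered and run automatically instead of being called by hand.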
CHAPTER 4
RESULTS AND DISCUSSIONS
4.1 Description about Dataset
In the present work, we use a dataset hosted on Roboflow to train the machine learning
model. The dataset is split into training, validation, and testing sets, with additional
annotations marking the regions of interest in the images. Roboflow makes working with the
dataset easier by providing proper management, versioning, and sharing, which is important
in the development and deployment of the prediction model.
The dataset of 113 images is divided such that 4% of the images form the test set used to
measure the performance of the model. This testing phase allows fact-finding about the
model's ability to handle new and unseen images. For further validation during the training
phase, a validation set of 4% is also held out of the remaining 105 images. This step helps
improve the model and prevent overfitting, making the model effective on both the training
data and new data.
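The split described above can be sketched as follows (file names are placeholders; the 4% fractions follow the text):

```python
import numpy as np

rng = np.random.default_rng(42)
paths = np.array([f"img_{i:03d}.png" for i in range(113)])  # 113 images, as above
idx = rng.permutation(len(paths))

n_test = max(1, int(0.04 * len(paths)))   # ~4% held out for testing
n_val = max(1, int(0.04 * len(paths)))    # ~4% more for validation
test = paths[idx[:n_test]]
val = paths[idx[n_test:n_test + n_val]]
train = paths[idx[n_test + n_val:]]
print(len(train), len(val), len(test))  # → 105 4 4
```

Shuffling before splitting keeps the three subsets disjoint and representative of the whole collection.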
The Retinal Fundus Multi-Disease Image Dataset (RFMiD) is a comprehensive
dataset curated for the detection and analysis of multiple retinal diseases. It consists of retinal
fundus images categorized into 46 distinct disease classes, enabling researchers to develop
models for automated diagnosis. The dataset is structured into three key subsets: a training set
for model development, an evaluation set for validation, and a test set for performance
assessment. This dataset is a part of the Retinal Image Analysis for Multi-Disease Detection
(RIADD) initiative and has been widely used for machine learning and deep learning
applications in medical image analysis. The high-resolution retinal images capture a variety
of conditions, including diabetic retinopathy, glaucoma, macular degeneration, and other
abnormalities. The dataset plays a crucial role in advancing research in ophthalmology by
enabling automated detection systems. Each image is labeled to represent specific retinal
conditions, providing a robust foundation for classification and segmentation tasks. The
RFMiD dataset is particularly significant for addressing the challenges of multi-disease
detection, as it includes variations in disease severity and complex cases. With its well-
structured datasets, RFMiD supports training and evaluation of models to ensure reliable
performance on unseen data. It encourages researchers to create efficient solutions for early
diagnosis and treatment of retinal diseases, ultimately aiming to reduce the risk of vision loss.
4.2 Detailed Explanation about the Experimental Results
Screen Shots:
Description of the User Interface
Image 1: Home Page of the Heart Attack Prediction System
Title: "Heart Attack Prediction" is prominently displayed at the top, indicating the purpose of
the system.
Introduction: A brief description explains that it uses deep learning techniques to predict and
distinguish heart attacks from retinal images.
Key Features Highlighted:
Recurrent Neural Network (RNN): Mentioned as the primary deep learning model as it is
able to store temporal information like human memory.
Clinical Data: Utilizes over 6,000 clinical datasets for evaluation and improving accuracy
through RNN ensembles.
Accuracy: This advertises a very high 98% accuracy achieved by optimizing
hyperparameters, dropout rate, kernel size, and learning rate.
Web Application: Developed using interactive tools, such as Flask, for user-friendly access.
Image Upload Section:
It provides an option to upload a retinal image for processing.
An input field to specify the "Number of Clusters," most probably for image segmentation.
Image 2: Heart Attack Prediction through Retinal Images
The top of the page reads "Heart Attack Prediction Using Retinal Images".
Original Image (Left):
Presents a normal retinal fundus picture with visible blood vessels and an optic disc.
Represents the raw input image utilized by the model for feature extraction.
Clustered Image (Right):
The segmented "5 Cluster" output is shown here, in which the retina is represented as color-
coded areas.
Each cluster most probably indicates a separate feature such as blood vessels, the optic disc,
and the background.
Purpose of Clustering:
Aims to segregate important biomarkers related to cardiovascular disease.
It helps in visualizing the breakdown of the model's interpretation that can lead to early
detection of risks for heart attacks.
Next Button: This indicates that one may access subsequent procedures or results through this
section.
RESULT:
which signifies improvement in the ability of the model to distinguish diseased from non-
diseased regions within the retinal images. This smooth decrease suggests that the
classification components are operating near-optimally, owing to the synergy between
RCNN feature extraction and CNN-based classification.
3. train/dfl_loss
The distribution focal loss measures the accuracy of bounding box prediction distributions.
The consistent decrease highlights the model's improved precision in predicting bounding
boxes with tighter and more accurate boundaries. This is essential for detecting subtle retinal
features. The loss curve slopes steadily downward from the start to the end of training,
reflecting increasingly precise bounding box predictions.
4. metrics/precision(B)
This graph shows the precision metric for bounding box predictions during training. The
upward trend demonstrates that the model is increasingly accurate in its predictions,
minimizing false positives as training progresses. This is crucial for detecting cardiovascular
risk indicators with high reliability.
5. val/box_loss
This graph depicts the bounding box regression loss on validation data. The sharp initial
decline followed by stabilization shows that the model generalizes well in localizing features
on unseen data. The consistent reduction in loss is a positive indicator of robustness.
6. val/cls_loss
The classification loss on validation data decreases sharply and stabilizes, reflecting the
model's ability to generalize its classification capabilities. This suggests that the combination
of RCNN for feature extraction and CNN with AdaBoost is effectively identifying
cardiovascular risk patterns in new images.
7. val/dfl_loss
The focal loss for validation data shows an initial spike followed by a decline. This behavior
indicates that the model adjusts to unseen data during early epochs and stabilizes as it learns.
The eventual decrease reflects improved bounding box prediction consistency on new retinal
images.
8. metrics/mAP50(B)
This graph represents the mean Average Precision at 50% IoU for bounding box predictions
on validation data. The increasing trend signifies that the model is becoming better at
detecting and localizing regions of interest with higher accuracy. This metric confirms the
model's overall performance in identifying key features linked to cardiovascular disease
risks.
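For reference, the IoU underlying the mAP50 metric above can be computed as follows (a generic sketch, independent of the project's detection code):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2).
    A prediction counts as correct for mAP50 when IoU >= 0.5."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# A predicted box shifted halfway across its ground-truth box.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # → 0.3333333333333333
```

mAP50 then averages precision over recall levels, counting each prediction as a true positive only when its IoU with a ground-truth box reaches 0.5.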
4.3 Significance of the Proposed Method with its Advantages
Cardiovascular disease (CVD) is one of the leading causes of mortality worldwide, driven
primarily by atherosclerosis (the build-up of fatty deposits in arteries) and an increased risk of
blood clots. The early detection of CVD is critical to prevent severe outcomes and improve
patient health. Current diagnostic methods often require invasive procedures, expensive
equipment, and extensive time, making them inaccessible for many. This proposed method
leverages non-invasive retinal imaging as an alternative for early CVD diagnosis. Research
suggests that blood vessels in the retina carry critical information about cardiovascular health,
providing an innovative and cost-effective diagnostic approach.
Scalability and Potential Integration: The model's reliance on retinal images and deep learning
techniques makes it easily integrable into existing healthcare workflows. With minor
adaptations, it can be scaled for mass screenings in hospitals, clinics, and rural health programs,
facilitating widespread adoption.
This novel approach to CVD prediction, combining advanced deep learning techniques
with non-invasive imaging, holds the potential to transform preventive healthcare and save
countless lives. Cardiovascular disease (CVD) remains one of the leading causes of mortality
worldwide, primarily driven by atherosclerosis—the accumulation of fatty deposits in the
arteries—and an increased risk of blood clots. Early detection is essential to prevent severe
complications and improve patient outcomes. However, current diagnostic methods often
involve invasive procedures, costly equipment, and prolonged evaluation, limiting accessibility
for many patients. This study proposes a novel, non-invasive approach using retinal imaging
for early CVD diagnosis. Emerging research indicates that the retinal vasculature holds
valuable insights into cardiovascular health, offering a promising, cost-effective alternative for
early detection and monitoring of CVD. However, existing diagnostic
techniques frequently necessitate invasive testing, including angiography or cardiac
catheterization, and expensive equipment such as CT or MRI scanners, with lengthy
evaluation procedures that remain prohibitive for many, especially in developing countries.
In light of these challenges, this study develops a new study design that employs retinal
imaging to identify biomarkers of CVD. The retinal vasculature serves as a biomarker
because of its potential to reflect the cardiovascular condition. The method utilizes
comprehensive deep learning models, such as RCNN, CNN, and LSTM, to analyse retinal
images and enable early, cost-effective detection of CVD at a large scale.
CHAPTER 5
CONCLUSION AND FUTURE ENHANCEMENTS
5.1 CONCLUSION
This project successfully demonstrates a novel and cost-efficient approach to predicting
cardiovascular risks using retinal images, achieving an impressive accuracy of 96%. By
focusing on the insights derived from retinal blood vessels, the methodology highlights the
potential of retinal imaging as a non-invasive diagnostic tool for cardiovascular diseases
(CVD). The combination of advanced preprocessing techniques like Contrast Limited
Adaptive Histogram Equalization (CLAHE) and normalization ensures the enhancement of
input image quality, while Region-based Convolutional Neural Networks (RCNN) with
Speeded-Up Robust Features (SURF) extracts rich, detailed features from the retinal images.
Further, the integration of Convolutional Neural Networks (CNN) with Long Short-Term
Memory (LSTM) networks effectively captures both spatial and temporal patterns,
significantly contributing to the model's high accuracy.
The achieved accuracy of 96% underscores the effectiveness of this hybrid deep learning
approach in providing reliable predictions, making it a promising solution for early and
efficient cardiovascular risk diagnosis. The project's uniqueness lies in leveraging retinal
imaging, which is significantly more cost-effective and accessible compared to conventional
cardiovascular diagnostic techniques, such as angiography or CT scans. Additionally, the
fusion of RCNN with SURF and CNN-LSTM ensures robustness and adaptability in
extracting and classifying features from a variety of retinal datasets.
This work establishes a solid foundation for future developments in non-invasive CVD
prediction, providing a scalable and efficient solution to a growing global health challenge.
By enabling early detection and intervention, this system has the potential to reduce
morbidity and mortality rates associated with cardiovascular diseases, as well as to alleviate
the burden on healthcare systems worldwide.
5.2 Future Enhancement
Future enhancements for this project could focus on expanding the dataset to include more
diverse retinal images representing varied demographics and cardiovascular conditions to
improve the model's robustness and applicability. Incorporating multimodal data, such as
patient medical history, lifestyle factors, and laboratory test results, alongside retinal images,
could provide a more holistic and accurate prediction model. Developing lightweight versions
of the model optimized for deployment on mobile or handheld devices would enable real-
time, point-of-care diagnosis, increasing accessibility in remote and underserved areas.
Advanced feature extraction techniques, such as Vision Transformers or pretrained deep
learning models, could be explored to further enhance the model's performance and accuracy.
Automation of the preprocessing pipeline could streamline the workflow, reducing manual
intervention and processing time. Clinical validation through trials in real-world healthcare
settings would help assess the model's practical reliability and encourage its adoption in
clinical practices. Explainable AI techniques, such as Grad-CAM, could be integrated to
provide transparency and insights into the model's predictions, increasing trust among
clinicians. The model could also be extended to predict risks for other systemic diseases, such
as diabetes or hypertension, broadening its application scope. Enhancing the temporal
analysis capabilities of the CNN-LSTM model could allow tracking of disease progression
over time using longitudinal retinal imaging data. Additionally, partnerships with healthcare
providers and technology companies could accelerate the translation of this system into
commercially viable diagnostic tools. By integrating feedback from practitioners and patients,
the system could be refined for user-friendly interfaces and efficient workflows. The
incorporation of federated learning approaches would enable collaborative model training
without compromising data privacy, fostering global improvements. Leveraging cloud
computing for processing large datasets and providing scalable solutions could make the
system adaptable for broader populations. Improved normalization techniques could ensure
that variations in retinal images due to acquisition devices or conditions do not affect the
model's predictions. Advanced denoising methods could also be employed to handle low-
quality retinal images, making the system more robust. Finally, efforts to reduce computation
costs while maintaining accuracy could make the model more feasible for widespread
deployment in low-resource settings.
CHAPTER 6
APPENDICES
6.1 Code for a Flask application that allows users to upload an image, processes it with
the RCNN model for object detection, and then displays the results
from flask import Flask, request, render_template
import os
from werkzeug.utils import secure_filename

import label_image

def load_image(image):
    text = label_image.main(image)
    return text

app = Flask(__name__)

@app.route('/')
@app.route('/first')
def first():
    return render_template('first.html')

@app.route('/login')
def login():
    return render_template('login.html')

@app.route('/chart')
def chart():
    return render_template('chart.html')

@app.route('/index')
def index():
    return render_template('index.html')

@app.route('/predict', methods=['GET', 'POST'])
def upload():
    if request.method == 'POST':
        # Get the uploaded file from the POST request
        f = request.files['file']
        file_path = secure_filename(f.filename)
        f.save(file_path)

        # Run the model on the saved image
        result = load_image(file_path)
        result = result.title()

        # Map each predicted class to a short description shown to the user
        d = {
            "Vitamin A": " → Deficiency of vitamin A is associated with significant "
            "morbidity and mortality from common childhood infections, and is the "
            "world's leading preventable cause of childhood blindness. Vitamin A "
            "deficiency also contributes to maternal mortality and other poor "
            "outcomes of pregnancy and lactation.",
            "Vitamin B": " → Vitamin B12 deficiency may lead to a reduction in "
            "healthy red blood cells (anaemia). The nervous system may also be "
            "affected. Diet or certain medical conditions may be the cause. Symptoms "
            "are rare but can include fatigue, breathlessness, numbness, poor "
            "balance and memory trouble. Treatment includes dietary changes, B12 "
            "shots or supplements.",
            "Vitamin C": " → A condition caused by a severe lack of vitamin C in the "
            "diet. Vitamin C is found in citrus fruits and vegetables. Scurvy "
            "results from a deficiency of vitamin C in the diet. Symptoms may not "
            "occur for a few months after a person's dietary intake of vitamin C "
            "drops too low. Bruising, bleeding gums, weakness, fatigue and rash are "
            "among scurvy symptoms. Treatment involves taking vitamin C supplements "
            "and eating citrus fruits, potatoes, broccoli and strawberries.",
            "Vitamin D": " → Vitamin D deficiency can lead to a loss of bone "
            "density, which can contribute to osteoporosis and fractures (broken "
            "bones). Severe vitamin D deficiency can also lead to other diseases. "
            "In children, it can cause rickets. Rickets is a rare disease that "
            "causes the bones to become soft and bend.",
            "Vitamin E": " → Vitamin E needs some fat for the digestive system to "
            "absorb it. Vitamin E deficiency can cause nerve and muscle damage that "
            "results in loss of feeling in the arms and legs, loss of body movement "
            "control, muscle weakness, and vision problems. Another sign of "
            "deficiency is a weakened immune system.",
        }
        result = result + d[result]
        print(result)

        # Remove the uploaded file once the prediction has been made
        os.remove(file_path)
        return result
    return None

if __name__ == '__main__':
    app.run()
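As a sketch of how the upload flow above can be exercised without a running server or a browser, the route can be driven through Flask's built-in test client. The `label_image` module is not reproduced here, so the model call is replaced by a stub that returns a fixed label; the stub and the file contents are assumptions for illustration only.

```python
import io
from flask import Flask, request
from werkzeug.utils import secure_filename

app = Flask(__name__)

def load_image(image_path):
    # Stub standing in for label_image.main(); returns a fixed class label
    return "Vitamin A"

@app.route('/predict', methods=['POST'])
def upload():
    # Same pattern as the appendix: read the uploaded file, sanitise its
    # name, and return the predicted label as the response body
    f = request.files['file']
    file_path = secure_filename(f.filename)
    return load_image(file_path)

# Drive the route through the test client instead of a browser upload
client = app.test_client()
resp = client.post(
    '/predict',
    data={'file': (io.BytesIO(b'fake image bytes'), 'retina.jpg')},
    content_type='multipart/form-data',
)
print(resp.get_data(as_text=True))  # → Vitamin A
```

The same pattern works against the full application once `label_image` and the templates are in place.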
6.2 The HTML code for the Flask application, divided into multiple template files
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<title>Home - Tree Counting Using Satellite Image</title>
<link rel="stylesheet" href="{{ url_for('static', filename='css/styles.css') }}">
</head>
<body>
<header>
<h1>Image Analytics for Forest Land Diversion Tree Enumeration</h1>
</header>
<nav>
<ul class="navbar-nav">
<li><a href="{{ url_for('index') }}">Home</a></li>
<li><a href="{{ url_for('about') }}">About</a></li>
<li><a href="{{ url_for('upload') }}">Predict Here</a></li>
</ul>
</nav>
<main class="index-main">
<section class="index-image">
<img src="{{ url_for('static', filename='assets/img/intro.jpg') }}" alt="Intro Image">
</section>
<section class="index-content">
<h2>Welcome to Tree Counting Application</h2>
<p>Upload your satellite images to detect and count trees automatically.</p>
<a class="btn" href="{{ url_for('upload') }}">Get Started</a>
</section>
</main>
<footer>
<p>© 2024 Tree Enumeration</p>
</footer>
</body>
</html>
6.3 Code for mounting Google Drive and training the YOLOv8 model in Google Colab
from google.colab import drive
drive.mount('/content/drive')
!pip install ultralytics
import torch
print(torch.__version__)
from ultralytics import YOLO
model = YOLO("yolov8m.pt")
model.train(data="D://project//Data//data.yaml", epochs=10, imgsz=640,
            augment=True, nms=True, conf=0.25, iou=0.6)
from IPython import display
display.Image(f"/content/runs/detect/train/val_batch1_pred.jpg",width=1000)
import locale
def getpreferredencoding(do_setlocale = True):
return "UTF-8"
locale.getpreferredencoding = getpreferredencoding
# Count the bounding boxes detected by the trained model in a single image
def count_bounding_boxes(image_path):
    results = model(image_path)
    total_bounding_boxes = sum(len(r.boxes) for r in results)
    print("Total bounding boxes:", total_bounding_boxes)
    return total_bounding_boxes

# Specify the path to the test image
test_image_path = "/content/runs/detect/predict/000000_jpg.rf.915026bba24c92d05b9c2418de59a6ed.jpg"

# Count the total bounding boxes in the test image
count_bounding_boxes(test_image_path)
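When Ultralytics YOLO prediction is run with `save_txt=True`, each image's detections are also written to a YOLO-format `.txt` label file with one box per line, so the box count can be recovered by counting non-empty lines without reloading the model. A minimal sketch, in which the label file below is a hypothetical example created only for illustration:

```python
# Count bounding boxes from a YOLO-format label file, where each non-empty
# line is one detection: "class x_center y_center width height [conf]"
def count_boxes_from_label_file(label_path):
    with open(label_path) as fh:
        return sum(1 for line in fh if line.strip())

# Hypothetical label file with three detections
with open("demo_labels.txt", "w") as fh:
    fh.write("0 0.50 0.50 0.20 0.30\n"
             "0 0.10 0.20 0.05 0.05\n"
             "1 0.70 0.80 0.10 0.10\n")

print(count_boxes_from_label_file("demo_labels.txt"))  # → 3
```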
REFERENCES
[1.] lkesen, I., Kavzoglu, T., Sefercik, U.G., Altuntas, O.Y., Nazar, M., Ozturk, M.Y. and Saygı, M. (2023). "Heart Disease Prediction Using Retinal Image", A Journal on Machine Learning, Volume 11, Article No. 13, pp. 13(2). https://doi.org/10.1016/j.asr.2022.10.044
[2.] rhabib, K.B., Seminar and Sudradjat. "Recognition of Blood Vessels in Retina Using Deep Learning Algorithm", A Journal on Applied Science, Volume 14, Article No. 14, pp. 15(3). https://doi.org/10.1016/j.asr.2022.10.044
[3.] eema Moharram, Xuguang Yuan and Dan Li. "Heart Disease Prediction", A Journal on Applied Science, Volume 13, Article No. 13, pp. 13(2). https://doi.org/10.3390/app13020895, published 9 January 2023.
[5.] Thani Jintasuttisak, Eran Edirisinghe and Ali Elbattay. "Deep Neural Network Based Automatic Heart Disease Prediction", A Journal on Armology, Volume 12, Article No. 111, pp. 12(8). https://doi.org/10.1016/j.compag.2021.106560
[6.] cas Santos Santana, Gabriel Araújo e Silva Ferraz, Gabriel Henrique Ribeiro dos Santos, Nicole Lopes Bento and Rafael de Oliveira Faria. "Predicting Heart Disease Using Retinal Images", A Journal on Sustainability, Volume 15, Article No. 1, pp. 15(1). https://doi.org/10.3390/su15010820
[7.] sh Sri Muna, Andri Prima Nugroho, Muhdan Syarovy, Ardan Wiratmoko, Suwardi and Lilik Sutiarso. "Disease Prediction Using Machine Learning over Big Data", A Journal on Sustainability, Volume 17, Article No. 122, pp. 15(1). https://doi.org/10.3390/su15010820
[8.] Ling Yao, Tang Liu, Jun Qin, Ning Lu and Chenghu Zhou. "Smart Health Prediction System Using Machine Learning", A Journal on Sustainability, Volume 18, Article No. 122, pp. 16(1). https://doi.org/10.1016/j.ecolind.2021.107591
[9.] amir Khan and Prasun Kumar Gupta. "Chronic Kidney Disease Prediction using Back
[10.] vidas Dukale, Sumit Agale, Swaroop Kalunge and Pushpak Nikam. "Intelligent Heart
[12.] Budnik, J. Byrtek and A. Kapusta. "An Active Learning Machine Technique Based Prediction of Cardiovascular Heart Disease", A Journal on Information Science, Volume 11, Article No. 11, pp. 11(3), 2024. https://doi.org/10.3390/su15010820
[14.] ngcheng Han, Cunbao Ma, Jian Chen, Lin Chen, Shuhui Shibiao Xu, Yong Zhao, Chenhua Zhang and Tatsuya Hagino. "Recognition of Cardiovascular Diseases through Retinal Images Using Optic Cup to Optic Disc Ratio", A Journal on Remote Sensing, Volume 14, Article No. 16, pp. 14(16), 2022. https://doi.org/10.3390/rs14164113
[15.] ao, Chenhua Zhang and Tatsuya Hagino. "Can Deep Learning on Retinal Images Augment Known Risk Factors for Cardiovascular Disease Prediction in Diabetes? A Prospective Cohort Study from the National Screening Programme in Scotland", A Journal on Remote Sensing, Volume 14, Article No. 16, pp. 15(8). https://doi.org/10.3390/rs14164113