
A Project Report

On

MULTIPLE DISEASE PREDICTION APP


USING MACHINE LEARNING
Submitted in partial fulfillment of the requirements for the degree of

BACHELOR OF TECHNOLOGY
In
Computer Science and Engineering
(Session: 2023-24)
Submitted By
RANJEET SINGH : (Roll No. - 2005250100039)
AAKASH KUMAR SINGH : (Roll No. - 2005250100001)
GULNAZ : (Roll No. - 2005250100020)
SALONI VISHWAKARMA : (Roll No. - 2005250100043)

Under the Supervision of


RANJEET SINGH
(Assistant Professor)

Buddha Institute of Technology, GIDA, Gorakhpur


Affiliated to
Dr. A.P.J. Abdul Kalam Technical University, Lucknow, Uttar Pradesh, India
May, 2024
CANDIDATES’ DECLARATION

We hereby declare that the work presented in this report entitled “MULTIPLE
DISEASE PREDICTION APP USING MACHINE LEARNING” was carried out by us.
We have not submitted the matter embodied in this report for the award of any
other degree or diploma of any other University or Institute.
We have given due credit to the original authors/sources for all the words,
ideas, diagrams, graphics, computer programs, experiments, and results that
are not our original contribution. We have used quotation marks to identify
verbatim sentences and given credit to the original authors/sources.
We affirm that no portion of our work is plagiarized, and the experiments and
results reported in the report are not manipulated. In the event of a complaint
of plagiarism or the manipulation of the experiments and results, we shall be
fully responsible and answerable.

RANJEET SINGH Roll No. - 2005250100039

AAKASH KUMAR SINGH Roll No. - 2005250100001

GULNAZ Roll No. - 2005250100020

SALONI VISHWAKARMA Roll No. - 2005250100043

(Candidates’ Signature)

Date:
Department of Computer Science and Engineering
CERTIFICATE

This is to certify that the Project Report entitled “Multi Disease Detection Using
Convolutional Neural Network” submitted by Ranjeet Singh, Aakash Kumar Singh,
Gulnaz, and Saloni Vishwakarma in partial fulfillment of the requirement for the award of
Bachelor of Technology in Computer Science & Engineering from Buddha Institute of
Technology, Gorakhpur, affiliated to Dr. A.P.J. Abdul Kalam Technical University,
Lucknow, Uttar Pradesh, represents the work carried out by the students under my
supervision. The project embodies the result of original work and studies carried out by
the students themselves. The project's contents do not form the basis for awarding any
other degree to the candidates or anybody else.

Mr. Ranjeet Singh
(Assistant Professor)
Department of Computer Science and Engineering
Buddha Institute of Technology, GIDA, Gorakhpur
Date:
ABSTRACT

Many machine learning models currently in use for health care analysis concentrate
on a single disease at a time. For example, separate analyses exist for diabetes, heart
disease, and brain tumors; multiple diseases cannot be predicted by a single analysis
within the same framework. This study proposes a system to forecast many diseases
using the Flask Application Programming Interface (API). It offers a way to look at
Alzheimer's disease, diabetes, brain tumors, and cardiac problems; later on, more
disorders such as fever tests, COVID-19 detection, and many more might be added.
The illness analyses were implemented using the Flask API, TensorFlow, and machine
learning approaches. Python pickling is used to store the model behavior, and Python
unpickling is used to load the pickle file as needed. This study is significant because it
takes into account every aspect that contributes to an illness, allowing the disease's
entire spectrum of potential effects to be identified. For example, many of the methods
in place today for analyzing diabetes consider only a few factors, such as insulin, age,
sex, and BMI. In our suggested system, however, we took additional variables into
account, such as the number of pregnancies, skin thickness, glucose concentration,
heart rate, and pulse rate, which increases the efficiency of our model over the current
ones.
The Flask API will be developed and the behavior of the final model captured in a
Python pickle file. When using this API, the illness name and its parameters must be
supplied together. The Flask API will call the appropriate model and return the
patient's state. The purpose of this study is to look at as many diseases as feasible in
order to monitor patient conditions and notify patients when necessary, thereby
lowering the death rate.

Keywords: Convolutional Neural Network, Disease Prediction, Disease Data, Machine Learning.
ACKNOWLEDGEMENT

First and foremost, we thank God Almighty, who gave us the inner
strength, resources, and ability to complete the work successfully, without
which all our efforts would have been in vain.

We are thankful to Mr. Abhinandan Tripathi, Head of the Department of
Computer Science and Engineering at Buddha Institute of Technology,
Gorakhpur, for his valuable advice and motivation throughout.

We wholeheartedly thank our project guide Mr. Ranjeet Singh (Assistant
Professor, Department of Computer Science and Engineering) for his valuable
advice and support.

We convey our sincere thanks to all the faculty and technical experts for
their help and encouragement. We thank all our friends who have helped us
during the work with their inspiration and cooperation. We truly admire our
parents for their constant encouragement and enduring support, which was
indispensable for this success. Once again, we convey our gratitude to all those
who have directly or indirectly influenced the work.

RANJEET SINGH Roll No.- 2005250100039


AAKASH KUMAR SINGH Roll No.- 2005250100001
GULNAZ Roll No.- 2005250100020
SALONI VISHWAKARMA Roll No.- 2005250100043
CHAPTER 1
INTRODUCTION

Multiple disease prediction using machine learning is an innovative approach to healthcare that aims to
use machine learning algorithms to accurately predict the likelihood of multiple diseases in a patient
based on their medical history and other relevant factors. The goal of this approach is to enable earlier
diagnosis, better treatment, and improved patient outcomes. Machine learning algorithms are
particularly well-suited to the task of disease prediction, as they can learn from large datasets of patient
information and identify patterns and correlations that might not be immediately apparent to human
clinicians. By analyzing data from a wide range of sources, including electronic health records, medical
images, and genetic data, machine learning algorithms can identify subtle indicators of disease that
might be missed by traditional diagnostic methods. Multiple disease prediction using machine learning
has the potential to revolutionize healthcare by enabling more accurate and personalized diagnoses,
earlier interventions, and more effective treatments. However, there are also challenges and limitations
to this approach, including the need for diverse and representative data, the risk of bias in algorithms,
and the need for transparent and ethical implementation. Despite these challenges, multiple disease
prediction using machine learning is a rapidly advancing field that holds great promise for the future of
healthcare. As technology continues to evolve and more data becomes available, it is likely that machine
learning algorithms will become increasingly sophisticated and accurate, leading to improved patient
outcomes and better overall health.

Machine learning (ML) is one of the most rapidly developing fields of computer science, with several
applications. It refers to the process of extracting useful information from a large set of data. ML
techniques are used in different areas such as medical diagnosis, marketing, industry, and other scientific
fields. ML algorithms have been widely used on medical datasets and are well suited for medical data
analysis. There are various forms of ML, including classification, regression, and clustering. In this
work, we focus on classification methods, which are applied to classify a given dataset into predefined
groups and to predict future activities or information from that data, owing to their good accuracy and
performance.

The healthcare sector is gradually embracing artificial intelligence (AI) technologies, which are
pervasive in modern business and daily life. Artificial intelligence could help healthcare practitioners in
many areas of patient care and operational procedures, enabling them to solve problems more quickly
and enhance current solutions. The majority of AI and healthcare technologies are extremely pertinent to
the healthcare sector, even though hospitals and other healthcare companies may employ entirely
different methods. Precision medicine is the most widely used application of traditional machine
learning in the healthcare sector. For many healthcare organizations, the capacity to forecast which
treatment modalities will be most effective for patients, based on their characteristics and the course of
therapy, represents a significant accomplishment. Furthermore, even if some papers on the topic claim
that AI may perform at least as well as humans in some scenarios, such as disease detection, it will be
some time before AI in healthcare totally replaces people for a variety of medical duties. The majority of
AI in healthcare is employed in applications related to machine learning and precision medicine, which
require data with known outcomes in order to train; we refer to this as supervised learning.

Several analyses of the current healthcare system looked at only one disease at a time. For example,
separate articles are used to research diabetes, examine brain tumors, and predict heart issues; most
articles concentrate on a single disease. Any institution that wants to review medical records for its
patients must therefore use a variety of models. The technique of the current system is helpful for
studying only one kind of disease at a time. In the modern world, mortality has increased as a result of
inaccurate sickness detection. Even after obtaining treatment for their first ailment, patients may still
experience symptoms of another condition. For the analysis of illnesses, many of the current methods
employ only a few parameters; as a result, it may not be possible to identify the secondary conditions
that the first disease's impacts would produce. Diabetes, for example, raises the risk of heart disease,
retinopathy, neuropathy, dementia, and hearing loss. This research considered the examination of
datasets related to diabetes, Alzheimer's disease, brain tumors, and heart disease. Developers are
required to supply the model file associated with each new disease analysis they add to this API; the
developer must set up Python pickling to preserve model behavior when adding a new disease, and can
then load the pickled file through the Flask API to obtain the model behavior. To assess a patient's
health, a user has two options: if the report contains variables that are utilized to forecast other diseases,
the analysis will determine the most pertinent disease identifications; alternatively, the user can predict a
specific disease. By informing patients according to their health situation, this work aims to stop the
daily growth in the mortality ratio. Patient analysis can also be provided at a lower cost.
CHAPTER 2
LITERATURE SURVEY
M. Chen, Y. Hao, K. Hwang, L. Wang, and L. Wang [1] proposed a new convolutional neural network
(CNN)-based multimodal disease risk prediction method built on structured hospital data, and developed a
multi-disease prediction system. They predicted the occurrence of three diseases: diabetes, kidney damage,
and heart disease. Disease prediction is based on structured information. Several machine learning
algorithms, such as Naive Bayes, Decision Tree, and K-Nearest Neighbor (KNN), are used to predict heart
disease, diabetes, and stroke; the Decision Tree method performs better than the Naive Bayes and KNN
algorithms. They also indicate whether the patient is at high or low risk of stroke. They used CNN-based
multimodal disease risk prediction on text data to estimate the risk of cerebral infarction. The accuracy of
the CNN-based unimodal disease risk prediction (CNN-UDRP) algorithm is compared with the CNN-based
multimodal disease prediction system; the faster CNN-based unimodal prediction algorithm reaches a
disease prediction accuracy of 94.8%. The processes of the CNN-based multimodal disease risk prediction
algorithm are identical to those of the CNN-UDRP method, except for two additional testing steps. This
study focuses on both structured and unstructured data. Previous work was based exclusively on structured
data, and none of those authors worked with unstructured or semi-structured data, whereas this article relies
on both.

B. Qian, X. Wang, N. Cao, H. He, and Y.-G. Jiang [2] created a risk prediction system for Alzheimer's
disease using patient electronic health record (EHR) data. They used an active learning framework to solve
a real-world patient problem; here, an active patient risk model was created, and Alzheimer's risk is used in
the active risk prediction method.

Wearable 2.0 is a method presented by M. Chen, Y. Ma, Y. Li, D. Wu, Y. Zhang, and C. Youn [3],
presenting the design of washable smart clothing to improve the user experience and service quality of the
healthcare system. Chen created a new IoT-based data collection system around a newly developed
intelligent washable fabric with sensors; further analyses are performed based on the physiological data.
The main components of this washable smart fabric include several sensors, cables, and electrodes. With
these parts, the user can collect information about the patient's physiological state as well as his/her
emotional health state using a cloud-based system.

With this fabric, the physiological state of the patient can be detected, and this information is used for
analysis. The authors discussed the problems that appeared in the creation of the Wearable 2.0
architecture: physiological data collection, adverse psychological effects, resistance to wireless body area
networks, and continuous passive physiological data collection are problems with current technology.
Many functions are performed on the collected data, such as analysis, monitoring, and forecasting. The
authors further divide the functional elements of the Wearable 2.0 smart clothing into the following
groups: sensor integration, electrical cable-based networking, and digital modules. Many applications are
discussed here, including chronic disease monitoring, elderly care, and emotional management.

Y. Zhang, M. Qiu, C.-W. Tsai, M. M. Hassan, and A. Alamri [4] presented Health-CPS, a cloud-based
cyber-physical system (CPS) healthcare system that stores massive amounts of biological data. Y. Zhang
discussed the rapidly growing amount of data in the pharmaceutical industry; the problem with big data is
that data is produced quickly and tends to be stored in multiple formats. The Health-CPS system was
created with two technologies: cloud computing first and big data technology second. Data collection,
data management, and data-oriented layers are the three levels that the system considers. The data
collection layer adopts a standard format.

The data management layer is used for parallel computing and distributed storage. Several operations
are performed with the Health-CPS system, and the system also supports various health-related services.

A telehealth system was introduced by L. Qiu, K. Gai, and M. Qiu in [5], who discuss how to manage
significant amounts of hospital data in the cloud. The authors recommended improvements to the
telehealth system that focus mostly on cloud-based data exchange among all telehealth providers.
However, cloud data sharing has its own challenges, including network bandwidth and virtual machine
exchange. A cloud-based data sharing solution is suggested here to improve data sharing, and an ideal
telehealth sharing model is developed; using this model, the work focuses on time constraints, network
characteristics, and transmission probability. To this end, the authors created a completely new optimal
method to exchange large volumes of data, and users get the best method to process biological data with
this algorithm.

An optimal clinical decision method that uses patient history for disease prediction was proposed by
Ajinkya Kunjir, Harshal Sawant, and Nuzhat F. Shaikh [6]. Numerous diseases and unexpected patterns
in patient conditions are anticipated here. They created a state-of-the-art clinical decision tool used to
accurately predict disease from historical data; it also determined the perception of many diseases and
their unknown patterns. Pie charts and 2D/3D charts have been used here for visualization.

Several machine learning techniques, including fuzzy logic, fuzzy neural networks, and decision trees,
are compared by S. Leoni Sharmila, C. Dharuma, and P. Venkatesa in their paper [7]. They use liver data
to classify and perform a comparative study. According to the study, the fuzzy neural network achieves
91% accuracy, performing better than other machine learning algorithms in classifying liver disease
data. The authors know classification very well, offer very good presentations, and used the simplified
fuzzy ARTMAP in many application fields.

The authors of [8] came to the conclusion that machine learning techniques such as Naive Bayes and
Apriori are very useful for disease detection on this dataset. Here, predictions use a small amount of
information, such as symptoms or previous findings from a physical diagnosis. Limitations of this paper
include the inability to handle huge datasets; detection is also difficult today due to the ever-increasing
amount of medical data.

A CNN-based multimodal disease risk prediction (CNN-MDRP) technique was developed by Shraddha
Subhash Shirsath [9] to predict diseases based on a large volume of structured and unstructured hospital
data. Using a machine learning algorithm (Naive Bayes), CNN-MDRP works on both structured and
unstructured data, which increases the accuracy and speed of disease prediction compared to
CNN-UDRP, which analyzes only structured data. Big data is considered here.

A new method based on the artificial neural network (ANN) algorithm has been developed to predict
heart disease [10]. The researchers developed a classification-based interactive prediction method that
uses an artificial neural network algorithm and takes into account the thirteen most important clinical
factors. Health professionals will greatly benefit from the recommended strategy, which had an accuracy
of 80% in the prediction of heart disease.

A fuzzy method using membership functions was applied to predict heart disease [11]. The authors tried
to remove ambiguity and uncertainty from the data using fuzzy KNN classification. Twenty-five classes
with 22 elements in each category were created from 550 data records. The dataset was divided into
training and test halves of equal size. After applying the preprocessing methods, the fuzzy KNN
methodology was used. Accuracy, precision, and recall were some of the evaluation metrics used to
evaluate this method. The results showed that the fuzzy KNN classifier performs better than the standard
KNN classifier in terms of accuracy.

To evaluate a large image collection, Yang et al. [12] introduced the CBIR (Content-Based Image
Retrieval) method. This approach searches for a tumor with the same pathological features using the
tumor region as a query. The feature used by the system was called MID (Margin Information
Descriptor); together with the tissue surrounding the tumor,

this explains the meaning of the image. Their proposed approach achieved 89.3% accuracy. However,
since their approach leads to global linear distance measurements, it provides many local predictions for
different locations. In addition, manual segmentation of the breast tumor was used in the development of
their method.

Support vector machines (SVM) with an adaptive function were used to detect breast cancer and
diabetes [13]. The aim was to provide an automatic, fast, and flexible diagnostic approach using an
adaptive SVM. The bias value of the traditional SVM was changed to get better results. If-then rules
were created as an output of the proposed classifier. Both breast cancer and diabetes were successfully
diagnosed with the proposed method, with 100% accuracy for both diagnoses. Future research should
focus on developing more effective strategies to change the bias value in traditional SVM.
CHAPTER 3
PROPOSED METHODOLOGY
The task of predicting diabetes, heart disease, and Parkinson's disease is complicated and calls for an
all-encompassing strategy that incorporates data collection, model building, early detection, and
community involvement. A methodical strategy for creating an effective machine learning-based Multiple
Disease Prediction System is outlined below:

1. Data Collection:
● Browse Kaggle for relevant datasets containing information on patients diagnosed with diabetes, heart
disease, and Parkinson's disease.
● Look for datasets with a good balance between the number of cases (patients with the disease) and
controls (patients without the disease).
● Pay attention to the data description to ensure it includes features relevant to your chosen diseases, such
as demographics, medical history, and lifestyle factors.

2. Data Preprocessing:
● Download the chosen dataset from Kaggle.
● Use Python libraries like pandas to explore and clean the data.
● Identify missing values and address them using techniques like mean/median imputation or more
advanced methods depending on the data.
● Handle outliers by capping them or removing them if justified.
● Apply feature scaling techniques like standardization or normalization to ensure all features contribute
equally during training (a minimal sketch follows this list).
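As a minimal sketch of these steps, assuming a diabetes-style CSV with an `Outcome` label column (the file and column names are placeholders for whichever Kaggle dataset is chosen):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Load the downloaded Kaggle dataset (file name is a placeholder).
df = pd.read_csv("diabetes.csv")

# Impute missing numeric values with the column median.
df = df.fillna(df.median(numeric_only=True))

# Cap outliers at the 1st and 99th percentiles of each numeric column.
for col in df.select_dtypes("number").columns:
    low, high = df[col].quantile([0.01, 0.99])
    df[col] = df[col].clip(low, high)

# Standardize the features so they contribute equally during training.
X = StandardScaler().fit_transform(df.drop(columns=["Outcome"]))
y = df["Outcome"]
```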

3. Model Training:
● Here's where you get to experiment! Split the Kaggle dataset into training and testing sets. The training
set will be used to train the models, and the testing set will be used for evaluation later.
● Use libraries like scikit-learn to implement various machine learning algorithms like decision trees,
random forests, and artificial neural networks (see the comparison sketch after this list).
● Train each model on the training set, adjusting hyperparameters (model settings) to optimize
performance.
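The sketch below, reusing the preprocessed `X` and `y` from the previous step, splits the data and compares a few candidate classifiers with scikit-learn; the model choices and settings are illustrative, not the report's final configuration:

```python
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Hold out a test set now; it is used only in the final evaluation step.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Compare candidate models with 5-fold cross-validation on the training set.
candidates = {
    "decision_tree": DecisionTreeClassifier(random_state=42),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "svm": SVC(kernel="rbf", C=1.0),
}
for name, model in candidates.items():
    scores = cross_val_score(model, X_train, y_train, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```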

4. Model Selection:
● Evaluate the performance of each trained model on a separate hold-out validation set (a small portion
of the training data set aside for this purpose). This helps prevent overfitting and gives a better idea of
how the model will perform on unseen data.
● Use metrics like accuracy, precision, and recall to compare the models' performance on all three
diseases.
● Choose the model that delivers the most accurate and reliable predictions for disease prediction.

5. Model Evaluation:
● Now, it's time to assess the chosen model's generalizability.
● Use the completely separate testing set you created earlier to evaluate the model's performance on
unseen data.
● Again, use metrics like accuracy, precision, and recall to gauge the model's effectiveness.

6. User Interface Development:


● Design a user interface that allows users to input patient data relevant to the features present in your
Kaggle dataset.
● Ensure the interface is clear and easy to navigate.
● Integrate the interface with your trained model to receive predictions for the chosen diseases.

7. Exporting the Trained Model:


● Once you're satisfied with the model's performance, use libraries like Pickle to serialize the trained
model.
● This allows you to save the model and use it later for making predictions on new data points without
retraining (a short sketch follows).
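A short sketch of this step, reusing names from the earlier sketches (the file name is illustrative):

```python
import pickle

# Fit the chosen model on the full training split, then serialize it.
best_model = candidates["svm"].fit(X_train, y_train)
with open("diabetes_model.pkl", "wb") as f:
    pickle.dump(best_model, f)

# Later, the application reloads the model without retraining.
with open("diabetes_model.pkl", "rb") as f:
    loaded_model = pickle.load(f)
print(loaded_model.predict(X_test[:1]))
```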
8. Integration with Application:
● Develop a web or mobile application that incorporates the exported model.
● The application should allow users to input patient data and receive disease risk predictions based on
the model's output.
● Include disclaimers about the limitations of the system, emphasizing that it's not a replacement for
professional medical diagnosis.

To put it briefly, the project's suggested methodology compares several training models, chooses the SVM
model due to its high accuracy, implements the model using libraries like pandas, NumPy, scikit-learn, and
pickle, and then integrates the trained model into a disease prediction application. The implementation
offers a workable and approachable solution for disease risk assessment and decision support, while also
guaranteeing accurate disease predictions.

3.1 Browse Kaggle for Relevant Datasets


Searching for Datasets:

Use specific keywords like "diabetes dataset," "heart disease dataset," and "Parkinson's disease dataset."
Refine your search by looking for datasets tagged with terms such as "health," "medical," "clinical data,"
or the specific disease names.
2. Balance Between Cases and Controls
Evaluating Dataset Balance:

Check the dataset description and accompanying documentation for information on the number of cases
(patients with the disease) versus controls (patients without the disease).
Use the dataset preview feature on Kaggle to get an initial sense of the data distribution. For example:
The Diabetes Prediction Dataset includes a balanced set of features for both diabetic and non-diabetic
patients (Kaggle).
The Pima Indians Diabetes Database has approximately equal numbers of positive and negative cases,
which helps in unbiased predictive modeling (Kaggle).
For heart disease, datasets like the Heart Disease UCI dataset often provide a balanced sample of patients
with and without heart disease, which is crucial for model training and evaluation.
3. Pay Attention to Data Descriptions
Relevant Features for Each Disease:

● Diabetes:
Look for datasets that include demographics (age, gender), medical history (family history of diabetes),
lifestyle factors (diet, physical activity), and clinical measurements (BMI, blood pressure, glucose levels).
The Diabetes Health Indicators Dataset includes such comprehensive features (Kaggle).
● Heart Disease:
Important features include demographics (age, gender), lifestyle factors (smoking, exercise), medical
history (previous heart conditions, cholesterol levels), and diagnostic test results (ECG, blood pressure).
The Cleveland Heart Disease Dataset is a good example, containing detailed patient records including
these features.
● Parkinson's Disease:
Essential features include demographics (age, gender), clinical assessments (UPDRS scores, tremor
severity), voice measurements (for early detection), and medical history (duration of symptoms). The
Parkinson’s Disease Dataset typically includes voice measurements and other clinical features that are
crucial for research.
Example Datasets:
Diabetes Datasets:

Diabetes Prediction Dataset: Contains features like age, BMI, glucose levels, and more, with a balanced
number of diabetic and non-diabetic patients (Kaggle).
Diabetes Health Indicators Dataset: Features a large sample size with indicators such as physical activity,
cholesterol levels, and demographic information (Kaggle).
Pima Indians Diabetes Database: Well-documented dataset with medical and demographic data
specifically for diabetes research (Kaggle).
Heart Disease Datasets:

Heart Disease UCI Dataset: Includes various medical and demographic features essential for heart disease
prediction and analysis.
Framingham Heart Study Dataset: Contains detailed longitudinal data on heart disease, including lifestyle
and clinical features.
Parkinson's Disease Datasets:

Parkinson's Disease Dataset: Often includes voice recordings, UPDRS scores, and demographic data to
help in early detection and monitoring.
Parkinson's Telemonitoring Dataset: Contains biomedical voice measurements that are crucial for tracking
the progression of the disease.
Final Steps:
Download and Preview the Data: Before full-scale analysis, download the datasets and perform an initial
exploration to understand the structure and content.
Data Cleaning and Preprocessing: Ensure the datasets are clean, with no missing values or
inconsistencies, and preprocess the data as needed for your analysis.
Ethical Considerations: Always check the usage license of the datasets to ensure you comply with any
restrictions on data use, especially for sensitive medical information.
By following these steps, you can effectively gather and utilize high-quality datasets from Kaggle for
research on diabetes, heart disease, and Parkinson's disease.
CHAPTER 4
PROBLEM IDENTIFICATION

Many of the existing machine learning models for health care analysis concentrate on one
disease per analysis: for example, one model for liver analysis, one for cancer analysis, and
one for lung diseases. If a user wants to predict more than one disease, he/she has to go
through different sites. There is no common system where one analysis can perform more than
one disease prediction. Some of the models have lower accuracy, which can seriously affect
patients' health. When an organization wants to analyze its patients' health reports, it has to
deploy many models, which in turn increases cost as well as time. Some of the existing
systems consider very few parameters, which can yield false results.

4.1 EXISTING SYSTEM


● The study has identified multiple risk factors for cardiovascular disease, including
high blood pressure, high cholesterol, smoking, and diabetes.

● Based on these risk factors, a risk score can be calculated to predict an individual's
likelihood of developing cardiovascular disease.

● Traditional statistical methods are used to identify risk factors and calculate a risk
score, which can be used for disease prevention and management.

4.1.1 DISADVANTAGES OF EXISTING SYSTEM


● Data bias: One of the biggest concerns with machine learning systems is data bias. If
the training data used to develop the system is biased or incomplete, it can lead to
inaccurate predictions and misdiagnosis. This is especially problematic when it comes
to underrepresented populations, as their data may not be well-represented in the
training set.

● Overfitting: Overfitting occurs when a machine learning model is trained too closely
to a particular dataset and becomes overly specialized in predicting it. This can result
in poor generalization to new data and lower accuracy.

● Lack of interpretability: Many machine learning algorithms are "black boxes,"
meaning that it is difficult to understand how they arrive at their predictions. This can be
problematic in healthcare, where it is important to be able to explain how a diagnosis
was made.

● Limited data availability: Some diseases are rare, which means that there may not be
enough data available to train a machine learning model accurately. This can limit the
effectiveness of the system for predicting such diseases.

● Cost and implementation: Implementing machine learning systems for healthcare can
be expensive and time-consuming. Hospitals and clinics may need to invest in new
hardware, software, and staff training to implement these systems effectively.

4.1.2 FEASIBILITY STUDY

The feasibility of the project is analyzed in this phase, and a business proposal is
put forth with a very general plan for the project and some cost estimates. During
system analysis, the feasibility study of the proposed system is carried out. This
is to ensure that the proposed system is not a burden to the company. For feasibility
analysis, some understanding of the major requirements for the system is essential.

Three key considerations involved in the feasibility analysis are


● Economical Feasibility
● Technical Feasibility
● Social Feasibility

4.1.3 ECONOMICAL FEASIBILITY

This study is carried out to check the economic impact that the system will have
on the organization. The amount of funds that the company can pour into the research
and development of the system is limited; the expenditures must be justified. The
developed system was well within the budget, which was achieved because most of the
technologies used are freely available. Only the customized products had to be
purchased.
4.1.4 TECHNICAL FEASIBILITY
During this study, the analyst identifies the existing computer systems of the
concerned department and determines whether these technical resources are sufficient for the
proposed system or not. If they are not sufficient, the analyst suggests the configuration of the
computer systems that are required. The analyst generally pursues two or three different
configurations which satisfy the key technical requirements but which represent different
costs. During the technical feasibility study, financial resources and budget are also considered. The
main objective of technical feasibility is to determine whether the project is technically
feasible, provided it is economically feasible.
4.1.5 SOCIAL FEASIBILITY

This aspect of the study checks the level of acceptance of the system by the user. This
includes the process of training the user to use the system efficiently. The user must not feel
threatened by the system, but must instead accept it as a necessity. The level of acceptance by
the users solely depends on the methods that are employed to educate the users about the
system and to make them familiar with it. Their level of confidence must be raised so that they
are also able to make some constructive criticism, which is welcomed, as they are the final
users of the system.

4.2 PROPOSED SYSTEM

● This project involved analyzing a multiple-disease patient dataset with proper data
processing.

● Different algorithms were used to train and predict, including Decision Trees,
Random Forest, SVM, Logistic Regression, and AdaBoost.

● In a multi-disease model, it is possible to predict more than one disease at a time,
reducing the need to traverse multiple models to predict diseases.
● Diverse training data: To address data bias, a proposed system would use a
diverse range of training data, including data from underrepresented populations,
to ensure that the system can accurately predict diseases across all groups.
● Robust algorithms: The system would use algorithms that are robust to overfitting
and have high accuracy on unseen data. This could be achieved by using techniques
such as regularization and cross-validation.
● Explainable AI: To address the lack of interpretability of machine learning models,
the proposed system would use explainable AI techniques to provide clear and
understandable reasons for its predictions. This would increase the trust and
acceptance of the system among healthcare providers and patients.

The proposed system for multiple disease prediction using machine learning uses algorithms and
various other tools to build a system which predicts the disease of the patient from the symptoms,
comparing those symptoms with the system's previously available dataset. By comparing those
datasets with the patient's symptoms, we predict the patient's disease with an accurate percentage. The
dataset and symptoms go to the prediction model of the system, where the data is pre-processed for
future reference; feature selection is then done by the user, who will enter/select the various symptoms.
The classification of those data is done with the help of machine learning algorithms such as logistic
regression. The data then goes to the recommendation model, which shows the risk analysis involved
and also provides probability estimation, i.e., how the system behaves when n predictions are made; it
also makes recommendations for patients from their final result and their symptoms, such as what to
use and what not to use, based on the given datasets and the final results. It predicts probable diseases
by mining datasets such as COVID-19, chronic kidney disease, and heart disease. To the best of our
knowledge, in the area of medical big data analytics, none of the existing work has focused on both
data types.
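As a sketch of the classification and probability-estimation steps described above, a scikit-learn logistic regression model can return a per-class probability that the recommendation model presents as a risk percentage; the variable names assume preprocessed training arrays like those built in Chapter 3:

```python
from sklearn.linear_model import LogisticRegression

# Train on the preprocessed symptom/measurement features.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# predict_proba returns the estimated probability of each class;
# index 1 is assumed to be the "disease present" class.
probabilities = clf.predict_proba(X_test[:1])[0]
print(f"Estimated disease risk: {100 * probabilities[1]:.1f}%")
```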

4.2.1 ADVANTAGES OF PROPOSED SYSTEM

The proposed system, which leverages Kaggle datasets for analyzing diabetes, heart disease, and Parkinson's
disease, offers several advantages:
1. **Comprehensive Data Availability:**
- **Diverse and Rich Data Sources:** Kaggle hosts a variety of datasets that include extensive patient data,
encompassing demographics, medical history, clinical measurements, and lifestyle factors. This diversity
allows for a comprehensive analysis of each disease.
- **Real-World Data:** Many datasets on Kaggle are derived from real-world clinical settings, providing
realistic scenarios and practical insights into disease prediction and management.

2. **Balanced Datasets:**
- **Improved Model Accuracy:** Balanced datasets ensure that machine learning models are not biased
towards a particular class (e.g., diseased vs. non-diseased). This balance helps in developing robust and
generalizable models.
- **Enhanced Reliability:** A balanced dataset allows for more reliable evaluation metrics such as accuracy,
sensitivity, specificity, and AUC-ROC scores, leading to better assessment and comparison of model
performance.

3. **Feature Richness:**
- **Relevant Features Included:** Datasets often include critical features necessary for understanding and
predicting diseases. For example, diabetes datasets typically contain glucose levels, BMI, and family history,
while heart disease datasets might include cholesterol levels and ECG results. This ensures that the models can
capture the key factors influencing disease outcomes.
- **Customizable for Specific Needs:** Researchers can select and focus on features most relevant to their
specific study or model requirements, tailoring the analysis to their unique objectives.

4. **Ease of Access and Use:**


- **User-Friendly Interface:** Kaggle provides a straightforward platform for accessing, downloading, and
previewing datasets. This ease of access accelerates the research process and reduces the time needed to gather
data.
- **Integrated Tools:** Kaggle integrates various data science tools and programming environments, such as
Jupyter notebooks, which facilitate seamless data analysis and model development directly on the platform.

5. **Community and Collaboration:**


- **Expert Contributions:** Kaggle's community includes data scientists, researchers, and medical
professionals who often contribute high-quality datasets and share insights, fostering collaborative research.
- **Peer Reviews and Benchmarks:** Datasets and models on Kaggle are often reviewed and benchmarked
by the community, providing validation and quality assurance.

6. **Ethical and Legal Compliance:**


- **Data Usage Policies:** Kaggle datasets come with clear usage licenses and privacy guidelines, helping
researchers ensure that their work complies with ethical and legal standards regarding patient data.
- **Informed Consent:** Many datasets are derived from studies where participants have provided informed
consent, ensuring ethical use of data.

7. **Cost-Effectiveness:**
- **Free Access:** Many datasets on Kaggle are available for free, reducing the cost of data acquisition and
making it accessible to a broader range of researchers and institutions.
- **Reduction in Data Collection Costs:** By utilizing existing datasets, researchers save on the substantial
costs and logistical challenges associated with primary data collection.

8. **Support for Advanced Analytics:**


- **Machine Learning and AI Integration:** Kaggle provides resources and environments optimized for
machine learning and AI, enabling sophisticated data analysis techniques such as predictive modeling,
clustering, and anomaly detection.
- **Benchmarking and Competitions:** Researchers can participate in competitions to benchmark their
models against others, promoting innovation and improvement in analytical methods.

By leveraging these advantages, the proposed system can significantly enhance the quality and impact of
research into diabetes, heart disease, and Parkinson's disease, leading to better understanding, prevention, and
treatment strategies.

4.2.2 PROPOSED ALGORITHM


To create a predictive model for diabetes, heart disease, and Parkinson's disease using the datasets
obtained from Kaggle, we will employ a systematic approach leveraging machine learning
techniques. The proposed algorithm will involve the following steps:

1. Data Collection and Preprocessing


Data Acquisition: Download relevant datasets from Kaggle, ensuring they include features such as
demographics, medical history, and lifestyle factors.
Data Cleaning:
Handle missing values using appropriate imputation techniques (mean, median, or mode for
numerical data, and most frequent value for categorical data).
Remove duplicates and correct any inconsistencies in the data.
Data Transformation:
Encode categorical variables using techniques such as one-hot encoding or label encoding.
Normalize or standardize numerical features to ensure uniformity and improve model performance.
2. Exploratory Data Analysis (EDA)
Descriptive Statistics: Compute summary statistics to understand the central tendency, dispersion,
and shape of the data distribution.
Visualization: Use visual tools like histograms, box plots, and scatter plots to identify patterns,
correlations, and outliers.
Correlation Analysis: Identify relationships between features and the target variable using correlation
matrices and heatmaps.
3. Feature Engineering
Feature Selection: Use techniques such as Recursive Feature Elimination (RFE) and Principal
Component Analysis (PCA) to select the most relevant features.
Feature Creation: Generate new features by combining existing ones, if necessary, to enhance model
performance.
4. Model Development
Data Splitting: Divide the data into training and testing sets, typically in a 70-30 or 80-20 ratio,
ensuring a balanced representation of cases and controls in both sets.
Model Selection: Choose appropriate machine learning algorithms for the prediction task. Common
choices include:
Logistic Regression: For binary classification tasks.
Decision Trees and Random Forests: For handling non-linear relationships and interactions between
features.
Support Vector Machines (SVM): For high-dimensional feature spaces.
Gradient Boosting Machines (GBM) and XGBoost: For improved accuracy and handling complex
datasets.
Neural Networks: For deep learning approaches, especially if the dataset is large and complex.
5. Model Training and Evaluation
Training: Train the selected models on the training dataset.
Hyperparameter Tuning: Optimize model performance by tuning hyperparameters using techniques
such as Grid Search or Random Search with cross-validation (a sketch follows at the end of this section).
Evaluation Metrics: Assess model performance using metrics such as:
Accuracy
Precision, Recall, and F1-score
Area Under the Receiver Operating Characteristic Curve (AUC-ROC)
Confusion Matrix
6. Model Validation and Testing
Validation: Validate the model using the testing dataset to ensure it generalizes well to unseen data.
Cross-Validation: Use k-fold cross-validation to further validate the model's performance and
robustness.
7. Model Deployment
Implementation: Deploy the model in a real-world setting or integrate it into a decision support
system.
Monitoring and Maintenance: Continuously monitor model performance and update it with new data
to maintain accuracy and relevance.
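The hyperparameter-tuning step above can be sketched with scikit-learn's GridSearchCV; the SVM estimator and the parameter grid below are only illustrative choices:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Search a small illustrative grid with 5-fold cross-validation,
# assuming preprocessed training arrays X_train and y_train.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Best CV accuracy:", round(search.best_score_, 3))
```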

4.2.3 ARCHITECTURE DIAGRAM


4.3 SYSTEM DESIGN

A. Suggested Framework

various disease predictions makes it possible to forecast various diseases at once. Consequently, the
user doesn't need to visit multiple locations in order to predict the illnesses. We are concentrating on
diabetes, Alzheimer's, brain tumors, and heart problems. since there is a relationship between the
three conditions. We'll use Flask-API and machine learning techniques to generate many illness
analyses. When requesting access to that API, the user must provide the name of the disease as well
as its parameters. Flask will retrieve the patient's status and call the appropriate model.
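A minimal sketch of such a Flask endpoint is given below; the route, the JSON schema, and the pickle file names are assumptions for illustration, not the report's exact interface:

```python
import pickle
from flask import Flask, request, jsonify

app = Flask(__name__)

# One pickled model per supported disease (file names are placeholders).
MODEL_FILES = {
    "diabetes": "diabetes_model.pkl",
    "heart": "heart_model.pkl",
}

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()
    disease = payload["disease"]        # e.g. "diabetes"
    features = [payload["parameters"]]  # one row of feature values

    # Unpickle the model that matches the requested disease.
    with open(MODEL_FILES[disease], "rb") as f:
        model = pickle.load(f)

    status = int(model.predict(features)[0])
    return jsonify({"disease": disease, "status": status})

if __name__ == "__main__":
    app.run(debug=True)
```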

B. Architecture Overview
C. CNN

Creating a CNN (Convolutional Neural Network) algorithm for a multiple disease prediction app
using machine learning involves several steps:

1. **Data Collection**: Gather a diverse and comprehensive dataset containing medical images
related to the diseases you want to predict. Ensure the dataset is labeled with the corresponding
disease for each image.

2. **Data Preprocessing**: Preprocess the images to make them suitable for input into the CNN.
This may involve resizing, normalization, and augmentation to increase the robustness of the model.

3. **Model Architecture Design**: Design the architecture of the CNN. For a multiple disease
prediction task, you might need a more complex architecture compared to a binary classification
task. You can use pre-trained models like VGG, ResNet, or design a custom architecture tailored to
your dataset and requirements.

4. **Training**: Split your dataset into training, validation, and test sets. Train your CNN model on
the training set using appropriate optimization algorithms like Adam or RMSprop. Tune
hyperparameters such as learning rate, batch size, and number of epochs to achieve the best
performance.

5. **Evaluation**: Evaluate the trained model on the validation set to assess its performance.
Monitor metrics like accuracy, precision, recall, and F1-score. Fine-tune the model based on
validation performance to prevent overfitting.

6. **Testing**: Test the final model on the held-out test set to obtain an unbiased estimate of its
performance. Ensure that the model generalizes well to unseen data.

7. **Deployment**: Integrate the trained model into your multiple disease prediction app. Provide a
user-friendly interface for users to input medical images and receive predictions. Ensure the app's
privacy and security measures comply with healthcare regulations.

8. **Continuous Improvement**: Monitor the performance of your model in real-world scenarios


and collect feedback from users. Continuously update and improve your model based on new data
and insights.

Remember to adhere to ethical guidelines and regulations regarding the use of medical data, ensuring
patient privacy and confidentiality throughout the development and deployment process.

Let us delve deeper into each step:


1. **Data Collection**:

- Identify relevant sources for medical image data, such as publicly available datasets, hospitals,
research institutions, or collaboration with medical professionals.

- Ensure the dataset covers a wide range of diseases you intend to predict, with a sufficient number
of samples for each class.

- Pay attention to data quality and ensure that images are annotated with accurate labels.

2. **Data Preprocessing**:

- Resize images to a consistent resolution suitable for input into the CNN.

- Normalize pixel values to a common scale (e.g., [0, 1] or [-1, 1]).

- Apply data augmentation techniques such as rotation, flipping, zooming, and cropping to increase
the diversity of the training data and improve model generalization.

3. **Model Architecture Design**:

- Decide on the architecture type (e.g., VGG, ResNet, Inception) based on the complexity of your
dataset and computational resources.

- Adjust the number of layers, filter sizes, and strides based on the characteristics of medical
images and the diseases being predicted.

- Incorporate techniques like batch normalization and dropout to improve model performance and
prevent overfitting.

- Consider using transfer learning by initializing the model with pre-trained weights on a large
dataset like ImageNet, then fine-tuning the model on your medical image dataset.

4. **Training**:

- Split the dataset into training, validation, and test sets (e.g., 70-15-15 ratio).
- Choose appropriate loss functions such as categorical cross-entropy for multi-class classification.

- Select an optimization algorithm (e.g., Adam, RMSprop) and tune its hyperparameters.

- Monitor training progress using metrics like loss and accuracy, and visualize performance using
tools like TensorBoard.

- Implement techniques like early stopping to prevent overfitting.

5. **Evaluation**:

- Evaluate the trained model on the validation set to tune hyperparameters and prevent overfitting.

- Use metrics such as accuracy, precision, recall, and F1-score to assess model performance.

- Analyze confusion matrices to understand the model's strengths and weaknesses across different
disease classes.

6. **Testing**:

- Assess the model's performance on the held-out test set to ensure unbiased evaluation.

- Calculate performance metrics similar to those used during validation.

- Perform error analysis to identify common misclassifications and areas for improvement.

7. **Deployment**:

- Integrate the trained model into your multiple disease prediction app using frameworks like
TensorFlow Serving or Flask.

- Develop a user-friendly interface for users to upload medical images and receive predictions.

- Implement privacy and security measures to protect patient data, such as encryption and access
controls.

- Comply with healthcare regulations such as HIPAA (if applicable) to ensure patient
confidentiality.

8. **Continuous Improvement**:

- Monitor the deployed model's performance in real-world scenarios and collect user feedback.

- Regularly retrain the model with new data to adapt to evolving patterns and trends.

- Stay updated with the latest research in medical imaging and machine learning to incorporate
advancements into your model.

By following these detailed steps, you can develop a robust CNN algorithm for your multiple disease
prediction app, contributing to improved healthcare outcomes and patient care.
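A compact Keras sketch of such a CNN classifier is shown below; the input size, the number of disease classes, and the layer sizes are assumptions for illustration:

```python
from tensorflow.keras import layers, models

NUM_CLASSES = 4               # assumed number of disease labels
INPUT_SHAPE = (128, 128, 3)   # assumed image size after preprocessing

# A small CNN: two convolution/pooling stages, then a dense classifier head.
model = models.Sequential([
    layers.Input(shape=INPUT_SHAPE),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),      # dropout helps prevent overfitting
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

# Categorical cross-entropy and Adam, as discussed in the training step.
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```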

HARDWARE REQUIREMENTS

▪ System processor : Intel Core i7.

▪ Hard Disk : 512 GB SSD.

▪ Monitor : 15" LED.

▪ Mouse : Optical Mouse.

▪ RAM : 8.0 GB.

▪ Key Board : Standard Windows Keyboard.

SOFTWARE REQUIREMENTS

▪ Operating system : Windows 10.

▪ Coding Language : Python 3.9.

▪ Front-End : Streamlit 3.7, Python

▪ Back-End : Python 3.9.
CHAPTER 5
IMPLEMENTATION

An implementation is a realization of a technical specification or algorithm as a program, software
component, or other computer system, through computer programming and deployment. Many
implementations may exist for a given specification or standard. A special case occurs in
object-oriented programming, when a concrete class implements an interface.
Data Collection
The first step for a prediction system is data collection and deciding about the training and
testing datasets. In this project we have used a training dataset and a testing dataset.
Attribute Selection
Attributes of the dataset are properties of the dataset which are used for the system. For heart
disease prediction, many attributes are used, such as the heart beat rate, gender, and age of the
person, and many more.
Data Pre-processing
Pre-processing is needed to achieve reliable results from the machine learning
algorithms. For example, the Random Forest algorithm does not support datasets with null
values, so we have to manage the null values from the original raw data.
For our project we also have to convert some categorical values into dummy values, i.e., into the
form of "0" and "1", using code along the lines of the following sketch.
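This is a minimal illustration with pandas; the file and column names are placeholders, not the project's actual ones:

```python
import pandas as pd

df = pd.read_csv("heart.csv")  # placeholder file name

# Map a binary category directly to 0/1.
df["sex"] = df["sex"].map({"male": 1, "female": 0})

# get_dummies expands a multi-valued category into 0/1 indicator columns.
df = pd.get_dummies(df, columns=["chest_pain_type"], drop_first=True)
```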

Balancing of Data
Imbalanced datasets can be balanced in two ways. They are Under Sampling and Over
Sampling.
Under Sampling
Dataset balance is done by the reduction of the size of the data set. This process is
considered when the amount of data is adequate.
Over Sampling
In Over Sampling, dataset balance is done by increasing the size of the dataset. This
process is considered when the amount of data is inadequate.
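A sketch of simple over-sampling with scikit-learn's resample utility, assuming a binary `Outcome` label as in the diabetes data:

```python
import pandas as pd
from sklearn.utils import resample

df = pd.read_csv("diabetes.csv")  # placeholder file name
majority = df[df["Outcome"] == 0]
minority = df[df["Outcome"] == 1]

# Over-sample the minority class with replacement until the classes match.
minority_upsampled = resample(minority, replace=True,
                              n_samples=len(majority), random_state=42)
balanced = pd.concat([majority, minority_upsampled])
print(balanced["Outcome"].value_counts())
```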

5.1 MODULES
1. HEART DISEASE PREDICTION
• It uses data about affected and normal people to generate the result for the patient.
• It applies different machine learning algorithms like KNN, XGBoost, SVM, Random Forest, etc.
• It aims to predict via different supervised machine learning methods.
Attribute Information
1. Age
2. Sex
3. Chest Pain types
4. Resting blood pressure
5. Serum cholesterol
6. Fasting Blood sugar
7. Resting Electrocardiographic Results
8. Maximum Heart rate achieved
9. Exercise Induced Angina
10. Vessels coloured by Fluoroscopy
Data Collection: Gather a dataset containing patient information, including demographic data, medical
history, lifestyle factors, and diagnostic test results such as electrocardiograms (ECG), echocardiograms,
blood tests, and imaging studies. You can obtain such datasets from sources like the UCI Machine
Learning Repository or collaborate with hospitals and research institutions. Ensure the dataset is diverse
and representative of different populations.

Data Preprocessing: Clean and preprocess the collected data to handle missing values, outliers, and
inconsistencies. Perform feature engineering to extract relevant features from the raw data, such as
calculating body mass index (BMI) from height and weight measurements. Normalize numerical features
and encode categorical features for compatibility with machine learning algorithms.

Exploratory Data Analysis (EDA): Analyze the dataset to gain insights into the relationships between
different features and the target variable (presence or absence of heart disease). Visualize the data
distribution, correlations, and trends to inform feature selection and model development.

Feature Selection: Select the most informative features for heart disease prediction to reduce
dimensionality and improve model performance. You can use techniques like correlation analysis,
feature importance ranking, or domain knowledge to identify relevant features.
Model Selection: Choose appropriate machine learning algorithms for heart disease prediction, such as
logistic regression, decision trees, random forests, support vector machines (SVM), or gradient boosting
machines (GBM). Experiment with different models and evaluate their performance using
cross-validation techniques.

Training: Split the dataset into training and test sets. Train the selected model(s) on the training data
using suitable hyperparameters and optimization techniques. Tune hyperparameters using techniques
like grid search or random search to optimize model performance.

Evaluation: Evaluate the trained model(s) on the test set using performance metrics such as accuracy,
precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC). Assess
the model's performance in terms of both predictive accuracy and clinical relevance.

Validation: Validate the model's performance using external datasets or real-world clinical data.
Collaborate with healthcare professionals to assess the model's clinical utility, interpretability, and
potential impact on patient care.

Deployment: Integrate the trained model into a software module or application for heart disease
prediction. Provide a user-friendly interface where users, such as clinicians or patients, can input relevant
data and receive predictions or risk scores indicating the likelihood of heart disease.
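Since the software requirements list Streamlit as the front end, the interface for this module might look roughly like the sketch below; the input fields and the pickle file name are placeholders (a real app would expose all of the attributes listed above):

```python
import pickle
import streamlit as st

st.title("Heart Disease Prediction")

# Collect a few illustrative inputs; the real app exposes all attributes.
age = st.number_input("Age", min_value=1, max_value=120, value=45)
chol = st.number_input("Serum cholesterol (mg/dl)", value=200)
max_hr = st.number_input("Maximum heart rate achieved", value=150)

if st.button("Predict"):
    with open("heart_model.pkl", "rb") as f:  # placeholder model file
        model = pickle.load(f)
    result = model.predict([[age, chol, max_hr]])[0]
    st.write("High risk of heart disease" if result == 1 else "Low risk")
```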

Continued Monitoring and Improvement: Monitor the performance of the deployed module in
real-world settings and collect feedback from users. Continuously update and improve the model based
on new data, advancements in machine learning techniques, and feedback from healthcare professionals

2. KIDNEY STONE DETECTION


Data Collection: Obtain a dataset of medical images that includes cases with and without kidney
stones. These images could be ultrasound, X-ray, or CT scans. Ensure that the dataset is diverse and
representative of different types and sizes of kidney stones.

Data Preprocessing: Preprocess the medical images to enhance their quality and prepare them for
input into your detection model. Common preprocessing steps may include resizing, normalization,
and noise reduction.

Labeling: Annotate the images to indicate the presence or absence of kidney stones. This step is
crucial for supervised learning, where the algorithm learns to identify kidney stones based on labeled
examples.

Model Selection: Choose a suitable model architecture for kidney stone detection. Convolutional
Neural Networks (CNNs) are commonly used for image-based tasks like this. You may use
pre-trained CNN models such as ResNet, VGG, or design a custom architecture tailored to your
dataset.

Training: Split your dataset into training and validation sets. Train your chosen model on the training
data using an appropriate optimization algorithm and loss function. Monitor the model's performance
on the validation set and tune hyperparameters as needed to improve performance.

Evaluation: Evaluate the trained model on a separate test set to assess its performance metrics such
as accuracy, precision, recall, and F1-score. This step helps ensure that the model generalizes well to
unseen data.

Deployment: Integrate the trained model into a software module or application for kidney stone
detection. Provide a user-friendly interface where users can upload medical images, and the module
can output predictions indicating the presence or absence of kidney stones.

Validation: Validate the performance of the deployed module using real-world medical images.
Collect feedback from users and healthcare professionals to iteratively improve the module's
accuracy and usability.

Continued Improvement: Monitor the performance of the module over time and update it as needed
with new data or advancements in machine learning techniques. Stay informed about the latest
research in kidney stone detection to incorporate relevant improvements into your module.

3 Pneumonia Detection
Data Collection: Gather a dataset of chest X-ray images that includes cases with and without
pneumonia. You can obtain such datasets from publicly available sources like the NIH Chest X-ray
dataset or through collaboration with hospitals and research institutions. Ensure that the dataset is
diverse and covers various demographics, ages, and severity levels of pneumonia.
Data Preprocessing: Preprocess the chest X-ray images to enhance their quality and prepare them for
input into your detection model. This may involve resizing, normalization, and potentially
augmentation techniques such as rotation, flipping, and zooming to increase the robustness of the
model.

Labeling: Annotate the chest X-ray images to indicate whether pneumonia is present or absent. This
step is crucial for supervised learning, where the algorithm learns to identify pneumonia based on
labeled examples.

Model Selection: Choose an appropriate model architecture for pneumonia detection. CNNs are
commonly used for image-based tasks like this. You may use pre-trained CNN models such as
DenseNet, ResNet, or custom architectures designed to suit your dataset and requirements.

Training: Split your dataset into training and validation sets. Train your chosen model on the training
data using an appropriate optimization algorithm and loss function (e.g., binary cross-entropy for
binary classification). Monitor the model's performance on the validation set and tune
hyperparameters as needed to improve performance.

Evaluation: Evaluate the trained model on a separate test set to assess its performance metrics such
as accuracy, precision, recall, and F1-score. This step ensures that the model generalizes well to
unseen data and can reliably detect pneumonia.

Deployment: Integrate the trained model into a software module or application for pneumonia
detection. Provide a user-friendly interface where users, such as radiologists or healthcare
professionals, can upload chest X-ray images, and the module can output predictions indicating the
presence or absence of pneumonia.

Validation: Validate the performance of the deployed module using real-world chest X-ray images.
Collect feedback from users and healthcare professionals to iteratively improve the module's
accuracy and usability.
Continued Improvement: Monitor the performance of the module over time and update it as needed
with new data or advancements in machine learning techniques. Stay informed about the latest
research in pneumonia detection to incorporate relevant improvements into your module.

4 Brain Tumor Detection


Data Collection: Obtain a dataset of brain MRI (Magnetic Resonance Imaging) scans that includes
images both with and without brain tumors. Collaborate with hospitals, research institutions, or use
publicly available datasets like the BRATS (Multimodal Brain Tumor Segmentation Challenge)
dataset. Ensure the dataset covers different types, sizes, and locations of brain tumors.

Data Preprocessing: Preprocess the brain MRI scans to enhance their quality and prepare them for
input into your detection model. Common preprocessing steps include resizing, normalization, and
skull stripping to remove non-brain tissues. You may also need to address class imbalance by
augmenting the minority class (tumor images).

Labeling: Annotate the brain MRI scans to indicate the presence, location, and type of brain tumors.
This step is crucial for supervised learning, where the algorithm learns to detect tumors based on
labeled examples.

Model Selection: Choose an appropriate model architecture for brain tumor detection. Convolutional
Neural Networks (CNNs) are commonly used for image-based tasks like this. You may use
pre-trained CNN models such as U-Net, V-Net, or design a custom architecture suited to your dataset
and requirements.

Training: Split your dataset into training, validation, and test sets. Train your chosen model on the
training data using an appropriate optimization algorithm and loss function (e.g., binary
cross-entropy for binary classification or dice coefficient for segmentation tasks). Monitor the
model's performance on the validation set and tune hyperparameters as needed to improve
performance.
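
For the segmentation case, the Dice coefficient mentioned above can be implemented directly in TensorFlow; the following is a common formulation, where the smoothing constant is a conventional choice to avoid division by zero:

```python
import tensorflow as tf

def dice_coefficient(y_true, y_pred, smooth=1.0):
    # Flatten the masks and measure overlap between prediction and ground truth
    y_true_f = tf.reshape(y_true, [-1])
    y_pred_f = tf.reshape(y_pred, [-1])
    intersection = tf.reduce_sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (
        tf.reduce_sum(y_true_f) + tf.reduce_sum(y_pred_f) + smooth)

def dice_loss(y_true, y_pred):
    return 1.0 - dice_coefficient(y_true, y_pred)

# Example: model.compile(optimizer='adam', loss=dice_loss,
#                        metrics=[dice_coefficient])
```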

Evaluation: Evaluate the trained model on a separate test set to assess its performance metrics such
as accuracy, sensitivity, specificity, Dice similarity coefficient, and Hausdorff distance. This step
ensures that the model generalizes well to unseen data and can reliably detect brain tumors.
Deployment: Integrate the trained model into a software module or application for brain tumor
detection. Provide a user-friendly interface where users, such as radiologists or healthcare
professionals, can upload brain MRI scans, and the module can output predictions indicating the
presence and location of brain tumors.

Validation: Validate the performance of the deployed module using real-world brain MRI scans.
Collect feedback from users and healthcare professionals to iteratively improve the module's
accuracy and usability.

Continued Improvement: Monitor the performance of the module over time and update it as needed
with new data or advancements in machine learning techniques. Stay informed about the latest
research in brain tumor detection to incorporate relevant improvements into your module.

5.1 TECHNOLOGIES USED

5.1.1 PYTHON
Python is a high-level, general-purpose, and very popular programming language.
Python (currently Python 3) is used in web development, machine learning applications, and many
other cutting-edge areas of the software industry. It is well suited for beginners as well as for
experienced programmers coming from other languages such as C++ and Java.
Python is an interpreted, high-level, general-purpose programming language. Created
by Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code
readability with its notable use of significant whitespace. Its language constructs and object-
oriented approach aim to help programmers write clear, logical code for small and large-scale
projects.
Python is dynamically typed and garbage-collected. It supports multiple programming
paradigms, including structured (particularly procedural), object-oriented, and functional
programming. Python is often described as a "batteries included" language due to its comprehensive
standard library.
Python was conceived in the late 1980s as a successor to the ABC language. Python 2.0, released
in 2000, introduced features like list comprehensions and a garbage collection system capable of collecting
reference cycles. Python 3.0, released in 2008, was a major revision of the language that is not completely
backward-compatible, and much Python 2 code does not run unmodified on Python 3.
ADVANTAGES OF PYTHON
1. Easy to read, learn and code

Python is a high-level language with a very simple syntax. It does not require semicolons or
braces and reads almost like English, which makes it beginner-friendly. Its simplicity also keeps
maintenance costs low.
2. Dynamic Typing

In Python, variables do not need to be declared. The data type of a variable is assigned
automatically at runtime, which makes coding more flexible.
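
For instance:

```python
x = 10       # x currently refers to an int
x = "ten"    # the same name can later refer to a str; types are bound at runtime
```
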
3. Free, Open Source

Python is free and has an open-source license: the source code is publicly available, and anyone
can modify it and distribute the modified version without restriction.
This is a very useful feature that lets companies and individuals adapt Python to their own needs and use
their own version.
4. Portable

Python is also platform-independent. That is, if you write code on Windows, Mac, or Linux,
you can run the same code on the other operating systems without any changes.
This is called Write Once Run Anywhere (WORA). However, you should be careful when adding
system-dependent features.

5. Extensive Third-Party Libraries

Python comes with a wide range of libraries like NumPy, Pandas, Tkinter, Django, etc. The
Python package installer (pip) helps you install these libraries into your environment.
These libraries provide modules and packages containing built-in functions and algorithms that make
the coding process easier and simpler.
5.1.2 STREAMLIT
Streamlit is an open-source Python framework for building web apps for machine learning and
data science. We can instantly develop web apps and deploy them easily using Streamlit. Streamlit allows
you to write an app the same way you write Python code, and it makes it seamless to work in the
interactive loop of coding and viewing results in the web app.

The best thing about Streamlit is that you don't even need to know the basics of web development
to get started or to create your first web application. So if you're somebody who's into data science and
you want to deploy your models easily, quickly, and with only a few lines of code, Streamlit is a good fit.

One of the important aspects of making an application successful is to deliver it with an effective
and intuitive user interface. Many of the modern data-heavy apps face the challenge of building an
effective user interface quickly, without taking complicated steps. Streamlit is a promising open-source
Python library, which enables developers to build attractive user interfaces in no time. Streamlit is the
easiest way especially for people with no front-end knowledge to put their code into a web application:

● No front-end (HTML, JS, CSS) experience or knowledge is required.

● You don't need to spend days or months to create a web app, you can create a really beautiful
machine learning or data science app in only a few hours or even minutes.

● It is compatible with the majority of Python libraries (e.g., pandas, matplotlib, seaborn, plotly,
Keras, PyTorch, SymPy (LaTeX)).

● Less code is needed to create amazing web apps.

● Data caching simplifies and speeds up computation pipelines.
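
As a small illustration of this workflow, the sketch below shows roughly how a Streamlit app collects input and reacts to it; the widgets and labels are arbitrary examples, not part of this project's actual interface.

```python
import streamlit as st

st.title('Multiple Disease Prediction')

# Collect a couple of inputs through Streamlit widgets
age = st.number_input('Age', min_value=1, max_value=120, value=45)
chol = st.slider('Cholesterol (mg/dL)', 100, 400, 200)

if st.button('Predict'):
    # A trained model's predict(...) call would go here
    st.success(f'Prediction requested for age={age}, cholesterol={chol}')
```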


Streamlit is a popular open-source Python library that allows developers to build interactive web
applications for data science and machine learning projects with ease. Here are some of the key features of
Streamlit:

1. Ease of Use: Streamlit is easy to use for both beginners and advanced developers. Its simple
syntax allows developers to build interactive web applications quickly without having to worry
about the details of web development.
2. Data Visualization: Streamlit allows developers to create data visualizations such as charts, plots,
and graphs with just a few lines of code. It supports popular data visualization libraries like
Matplotlib, Plotly, and Altair.
3. Customizable UI Components: Streamlit provides various UI components that can be customized
to fit the needs of the application. These components include sliders, dropdowns, buttons, and text
inputs.
4. Real-time Updating: Streamlit automatically updates the web application in real-time as the user
interacts with it. This makes it easy to create dynamic applications that respond to user input in
real-time.
5. Integration with Machine Learning Libraries: Streamlit integrates seamlessly with popular
machine learning libraries like TensorFlow, PyTorch, and Scikit-learn. This allows developers to
build and deploy machine learning models with ease.
6. Sharing and Deployment: Streamlit makes it easy to share and deploy applications. Developers
can share their applications with others by simply sharing a URL. Streamlit also provides tools for
deploying applications to cloud services like Heroku and AWS.

ADVANTAGES OF STREAMLIT

Fast and Easy Development: Streamlit provides a simple and intuitive syntax that makes it easy to
build interactive web applications for data science and machine learning projects. With Streamlit,
developers can build applications faster and with less code.

Real-Time Updates: Streamlit automatically updates the web application in real-time as the user
interacts with it. This allows developers to create dynamic applications that respond to user input in
real-time, without the need for manual updates.
Seamless Integration with Popular Libraries: Streamlit integrates seamlessly with popular data
science and machine learning libraries like Matplotlib, Plotly, Pandas, TensorFlow, PyTorch, and
Scikit-learn. This allows developers to build powerful and complex applications using their preferred
libraries.

Customizable UI Components: Streamlit provides a range of UI components that can be
customized to fit the needs of the application. These components include sliders, dropdowns, buttons, and
text inputs, which can be easily customized with CSS.

Sharing and Deployment: Streamlit makes it easy to share and deploy applications. Developers
can share their applications with others by simply sharing a URL. Streamlit also provides tools for
deploying applications to cloud services like Heroku and AWS, making it easy to scale applications as
needed.

Active Community Support: Streamlit has an active community of developers and users who
contribute to the development of the library, provide support to other developers, and share their own
projects and experiences with the library.

5.1.3 JUPYTER NOTEBOOK

The Jupyter Notebook is an open source web application that you can use to create and share
documents that contain live code, equations, visualizations, and text. Jupyter Notebook is maintained by
the people at Project Jupyter.

Jupyter Notebooks are a spin-off from the IPython project, which used to have an IPython
Notebook project of its own. The name Jupyter comes from the core programming languages it
supports: Julia, Python, and R. Jupyter ships with the IPython kernel, which allows you to write your
programs in Python, but there are currently over 100 other kernels that you can also use.

A Jupyter Notebook document is a browser-based REPL containing an ordered list of input/output
cells which can contain code, text (using Markdown), mathematics, plots, and rich media. Underneath the
interface, a notebook is a JSON document following a versioned schema, usually ending with the ".ipynb"
extension.
Jupyter Notebook can connect to many kernels to allow programming in different languages. A
Jupyter kernel is a program responsible for handling various types of requests (code execution, code
completion, inspection) and providing a reply. Kernels talk to the other components of Jupyter using
ZeroMQ, and thus can be on the same or remote machines. Unlike many other notebook-like interfaces,
kernels in Jupyter are not aware that they are attached to a specific document and can be connected to
many clients at once. Usually a kernel allows execution of only a single language, but there are a couple
of exceptions. By default, Jupyter Notebook ships with the IPython kernel. As of the IPython 2.3
release (October 2014), there were 49 Jupyter-compatible kernels for many programming languages,
including Python, R, Julia, and Haskell.

A Jupyter Notebook can be converted to a number of open standard output formats (HTML,
presentation slides, LaTeX, PDF, reStructuredText, Markdown, Python) through "Download As" in the
web interface, via the nbconvert library, or with the "jupyter nbconvert" command-line interface in a
shell. To simplify visualisation of Jupyter notebook documents on the web, the nbconvert library is
provided as a service through NbViewer, which can take a URL to any publicly available notebook
document, convert it to HTML on the fly, and display it to the user.

The Jupyter Notebook combines three components:

• The notebook web application: An interactive web application for writing and running code
interactively and authoring notebook documents.
• Kernels: Separate processes started by the notebook web application that run users' code in a
given language and return output back to the notebook web application. The kernel also handles things
like computations for interactive widgets, tab completion, and introspection.
• Notebook documents: Self-contained documents that contain a representation of all content visible
in the notebook web application, including inputs and outputs of the computations, narrative text,
equations, images, and rich media representations of objects. Each notebook document has its own kernel.
CHAPTER 6
Testing
A. Suggested Framework Using CNNs

1. Data Collection and Integration


Objective: Gather relevant datasets from Kaggle and other reliable sources.

Heart Disease: Cleveland Heart Disease Dataset (structured data).


Brain Tumor: Brain MRI Images Dataset (image data).
Pneumonia: Chest X-Ray Images Dataset for Pneumonia (image data).
Lung Cancer: Lung Cancer Dataset (CT scans or structured data).

import pandas as pd
import glob

# Load structured (tabular) datasets
heart_disease_data = pd.read_csv('path/to/heart_disease.csv')
lung_cancer_data = pd.read_csv('path/to/lung_cancer.csv')

# Collect image file paths for brain tumors and pneumonia
brain_tumor_images = glob.glob('path/to/brain_tumor_images/*.jpg')
pneumonia_images = glob.glob('path/to/pneumonia_images/*.jpeg')

2. Data Preprocessing
Objective: Clean and prepare the data for model training.

For image data (brain tumors and pneumonia):

Image Processing: Resize images, normalize pixel values, and apply data augmentation.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder  # for the structured datasets
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Preprocessing image data: rescaling, a validation split, and augmentation
img_size = 128
datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2,
                             rotation_range=20, width_shift_range=0.2,
                             height_shift_range=0.2, shear_range=0.2,
                             zoom_range=0.2, horizontal_flip=True,
                             fill_mode='nearest')

def load_images(image_paths, img_size):
    # Load each image from disk, resize it, and convert it to an array
    images = [tf.keras.preprocessing.image.load_img(p, target_size=(img_size, img_size))
              for p in image_paths]
    images = [tf.keras.preprocessing.image.img_to_array(img) for img in images]
    return tf.convert_to_tensor(images)

brain_tumor_images = load_images(brain_tumor_images, img_size)
pneumonia_images = load_images(pneumonia_images, img_size)

#### 3. Model Building with CNNs

**Objective:** Build CNN models for image data.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Conv2D, MaxPooling2D, Flatten

# CNN binary classifier for image data (brain tumors and pneumonia)
def build_image_model(img_size):
    model = Sequential()
    model.add(Conv2D(32, kernel_size=(3, 3), activation='relu',
                     input_shape=(img_size, img_size, 3)))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1, activation='sigmoid'))  # single unit for binary output
    model.compile(optimizer='adam', loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model

brain_tumor_model = build_image_model(img_size)
pneumonia_model = build_image_model(img_size)
```

#### 4. Model Training

**Objective:** Train the models using the training data.

```python
# Train the image models with the augmented data generators.
# Note: with this setup the validation subset shares the augmentation
# settings of the training data; a rescale-only validation generator is
# also a common alternative.
brain_tumor_datagen = datagen.flow_from_directory(
    'path/to/brain_tumor_images', target_size=(img_size, img_size),
    batch_size=32, class_mode='binary', subset='training')
brain_tumor_val_gen = datagen.flow_from_directory(
    'path/to/brain_tumor_images', target_size=(img_size, img_size),
    batch_size=32, class_mode='binary', subset='validation')

pneumonia_datagen = datagen.flow_from_directory(
    'path/to/pneumonia_images', target_size=(img_size, img_size),
    batch_size=32, class_mode='binary', subset='training')
pneumonia_val_gen = datagen.flow_from_directory(
    'path/to/pneumonia_images', target_size=(img_size, img_size),
    batch_size=32, class_mode='binary', subset='validation')

brain_tumor_model.fit(brain_tumor_datagen,
                      validation_data=brain_tumor_val_gen, epochs=50)
pneumonia_model.fit(pneumonia_datagen,
                    validation_data=pneumonia_val_gen, epochs=50)
```

#### 5. Model Evaluation

**Objective:** Evaluate the models using the test data.

```python
# Evaluate the image models on their validation generators
brain_tumor_eval = brain_tumor_model.evaluate(brain_tumor_val_gen)
pneumonia_eval = pneumonia_model.evaluate(pneumonia_val_gen)

print(f'Brain Tumor Model - Loss: {brain_tumor_eval[0]}, Accuracy: {brain_tumor_eval[1]}')
print(f'Pneumonia Model - Loss: {pneumonia_eval[0]}, Accuracy: {pneumonia_eval[1]}')
```

#### 6. Model Interpretation

**Objective:** Use interpretation techniques to understand model predictions and derive


actionable insights.

```python
import shap
import numpy as np

# Example for the brain tumor model: use a sample of images as the background
background = np.array(brain_tumor_images[:100])
explainer = shap.DeepExplainer(brain_tumor_model, background)
shap_values = explainer.shap_values(background[:10])

# Visualize pixel-level attributions for the explained images.
# shap.image_plot is the standard SHAP visualization for image inputs,
# used here in place of force_plot, which targets tabular data.
shap.image_plot(shap_values, background[:10])
```

### Summary

This comprehensive framework integrates data collection, preprocessing, model building,
training, evaluation, and interpretation using CNNs in TensorFlow. By following these steps,
researchers can develop robust predictive models for heart disease, brain tumors, pneumonia,
and lung cancer. The use of CNNs ensures high performance in image analysis tasks, while
interpretation techniques like SHAP provide transparency and actionable insights from the
models.

App Integration
Integrating a heart disease prediction module into a mobile or web application involves several
steps:

1. **Design the User Interface (UI)**:


- Design an intuitive and user-friendly interface for the application.
- Include input fields for users to enter their demographic and clinical information required for
heart disease prediction.
- Provide clear instructions and guidance to users on how to use the application.

2. **Develop Backend Services**:


- Set up backend services to handle user requests and process prediction tasks.
- Implement API endpoints or server-side scripts to receive input data from the UI, perform
prediction using the trained model, and send back the prediction results to the UI.

3. **Integrate the Model**:


- Incorporate the trained heart disease prediction model into the backend services.
- Ensure that the model can handle incoming data in the required format and provide
predictions accurately and efficiently.

4. **Data Validation and Sanitization**:


- Validate and sanitize user input data to prevent security vulnerabilities and ensure data
integrity.
- Implement server-side validation to check for missing or invalid input values before sending
data to the prediction model.

5. **Security Considerations**:
- Implement security measures such as encryption (e.g., HTTPS) to protect sensitive user data
transmitted between the client and server.
- Implement authentication and authorization mechanisms to ensure that only authorized users
can access the application and its features.

6. **Testing**:
- Conduct thorough testing of the application to ensure functionality, usability, and
performance.
- Test different scenarios, including edge cases and error conditions, to identify and fix
potential issues.

7. **Deployment**:
- Deploy the application to a web server or a cloud platform for public access.
- Configure server infrastructure to handle incoming requests and scale resources as needed to
accommodate varying levels of traffic.

8. **Monitoring and Maintenance**:


- Monitor application performance and user feedback to identify areas for improvement.
- Regularly update the application with bug fixes, security patches, and enhancements based
on user feedback and evolving requirements.

9. **Compliance and Regulation**:


- Ensure compliance with relevant regulations and standards, such as data privacy laws (e.g.,
GDPR, HIPAA) and healthcare industry regulations.
- Implement measures to protect user privacy and maintain confidentiality of personal health
information.

10. **User Education and Support**:


- Provide user education materials, FAQs, or help documentation to assist users in
understanding how to use the application effectively.
- Offer user support channels, such as email support or live chat, to address any questions or
concerns users may have.

By following these steps, you can successfully integrate a heart disease prediction module into a
mobile or web application, providing users with valuable insights into their cardiovascular
health and promoting early detection and preventive care.
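
As a deliberately simplified sketch of steps 1-3, the snippet below wires a pickled heart disease model into a Streamlit front end. The model file name and the three-feature input are illustrative assumptions; a real deployment would pass the full feature vector the model was trained on.

```python
import pickle
import numpy as np
import streamlit as st

# Load the trained model (file name is an assumption for illustration)
model = pickle.load(open('heart_disease_model.sav', 'rb'))

st.header('Heart Disease Prediction')

# Collect a simplified subset of clinical inputs
age = st.number_input('Age', 1, 120, 45)
resting_bp = st.number_input('Resting blood pressure (mm Hg)', 80, 220, 120)
cholesterol = st.number_input('Cholesterol (mg/dL)', 100, 600, 200)

if st.button('Heart Disease Test Result'):
    features = np.array([[age, resting_bp, cholesterol]])
    prediction = model.predict(features)[0]
    if prediction == 1:
        st.error('The model predicts a risk of heart disease.')
    else:
        st.success('The model predicts no heart disease.')
```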

OUTPUT
Fig 6.1 All Disease Detection
Fig 6.2 Kidney Stone
Fig 6.3 Lung Cancer
Fig 6.4 Pneumonia Detection
Fig 6.5 Brain Tumor


CHAPTER 7
CONCLUSION

Since medical data is growing quickly and needs to be analyzed in order to provide precise disease
predictions based on symptoms, we classified patient data using standard machine learning techniques. By
providing the patient records, which help determine the degree of disease risk, we were able to develop an
accurate general disease risk prediction. With this approach, diseases and risk variables can be predicted
with minimal effort and expense. We compared the performance of the different algorithms in terms of
accuracy and processing time. The CNN algorithm processes data more quickly and with higher accuracy
than the other machine learning algorithms we evaluated; therefore, in terms of accuracy and promptness,
CNN is superior to the alternatives. In the future, we hope to add more diseases and forecast the
likelihood that a patient will develop each one. Machine learning algorithms for precise disease prediction
may facilitate early interventions, tailored treatment plans, and targeted disease management strategies.
They can improve patient care, help healthcare providers make well-informed decisions, and optimize the
distribution of resources within healthcare systems. They also hold potential for population-level disease
surveillance, which would allow for the early identification of disease outbreaks and the application of
preventative measures. This project's literature review, which focused especially on the use of SVM
models, highlighted the expanding body of knowledge on machine learning-based disease prediction;
comparative evaluations against alternative machine learning methods, along with techniques for feature
selection and optimization, were investigated, offering useful directions for further study. In summary,
this study highlights the potential of SVM models in multi-disease prediction and advances the field of
disease prediction using machine learning. By leveraging machine learning, we can move closer to more
precise, timely, and individualized healthcare interventions, ultimately improving patient outcomes and
creating more effective healthcare systems.

FUTURE WORK
Incorporating more data sources: Currently, multiple disease prediction systems typically rely
on electronic health records and medical imaging data. In the future, other data sources such as
wearable devices, social media, and environmental data could be integrated into these systems
to provide a more comprehensive picture of a patient's health.
Addressing data bias: As with all machine learning algorithms, bias in the training data can
lead to inaccurate predictions and perpetuate health disparities. Future work should focus on
developing methods to address and mitigate data bias, such as using more diverse and
representative datasets, and incorporating fairness and equity considerations into the algorithm
development process.
Advancing personalized medicine: Multiple disease prediction using machine learning has the
potential to enable more personalized and precise medicine, by predicting an individual's risk
of developing specific diseases based on their unique medical history and other factors.
CHAPTER 10
REFERENCES

1. Arvind Kumar Tiwari, “Machine Learning based Approaches for Prediction of


Parkinson’s Disease”, Machine Learning and Applications: An International
Journal (MLAU) vol. 3, June 2016.
2. Carlo Ricciardi, et al, “Using gait analysis’ parameters to classify
Parkinsonism: A data mining approach” Computer Methods and Programs in
Biomedicine vol. 180, Oct. 2019.
3. Dr. Anupam Bhatia and Raunak Sulekh, “Predictive Model for Parkinson’s
Disease through Naive Bayes Classification” International Journal of
Computer Science & Communication, vol. 9, Dec. 2017, pp. 194-202, Sept
2017 - March 2018.
4. Dragana Miljkovic et al, “Machine Learning and Data Mining Methods for
Managing Parkinson’s Disease” LNAI 9605, pp 209-220, 2016.
5. M. Abdar and M. Zomorodi-Moghadam, “Impact of Patients’ Gender on Parkinson’s
disease using Classification Algorithms” Journal of AI and Data Mining, vol. 6, 2018.
6. M. A. E. Van Stiphout, J. Marinus, J. J. Van Hilten, F. Lobbezoo, and C. De
Baat, “Oral health of Parkinson’s disease patients: a case-control study,”
Parkinson’s Disease, vol. 2018, Article ID 9315285, 8 pages, 2018.
7. Md. Redone Hassan et al, “A Knowledge Base Data Mining based on
Parkinson’s Disease” International Conference on System Modelling &
Advancement in Research Trends, 2019.
8. H. EL Massari, S. Mhammedi, Z. Sabouri, and N. Gherabi, “Ontology-Based
Machine Learning to Predict Diabetes Patients,” in Advances in Information,
Communication and Cybersecurity, Cham, 2022, pp. 437–445. doi:
10.1007/978-3-030-91738-8_40.
9. F. Alaa Khaleel and A. M. Al-Bakry, “Diagnosis of diabetes using machine
learning algorithms,” Mater. Today Proc., Jul. 2021, doi:
10.1016/j.matpr.2021.07.196.
10. J. J. Khanam and S. Y. Foo, “A comparison of machine learning algorithms for
diabetes prediction,” ICT Express, vol. 7, no. 4, pp. 432–439, Dec. 2021, doi:
10.1016/j.icte.2021.02.004.
11. P. Cıhan and H. Coşkun, “Performance Comparison of Machine Learning
Models for Diabetes Prediction,” in 2021 29th Signal Processing and
Communications Applications Conference (SIU), Jun. 2021, pp. 1–4. doi:
10.1109/SIU53274.2021.9477824.
12. M. A. Sarwar, N. Kamal, W. Hamid, and M. A. Shah, “Prediction of Diabetes
Using Machine Learning Algorithms in Healthcare,” in 2018 24th International
Conference on Automation and Computing (ICAC), Sep. 2018, pp. 1–6. doi:
10.23919/IConAC.2018.8748992.
13. Y. Jian, M. Pasquier, A. Sagahyroon, and F. Aloul, “A Machine Learning
Approach to Predicting Diabetes Complications,” Healthcare, vol. 9, no. 12,
Art. no. 12, Dec. 2021, doi: 10.3390/healthcare9121712.
14. S. Barik, S. Mohanty, S. Mohanty, and D. Singh, “Analysis of Prediction
Accuracy of Diabetes Using Classifier and Hybrid Machine Learning
Techniques,” in Intelligent and Cloud Computing, Singapore, 2021, pp.
399–409. doi: 10.1007/978-981-15-6202-0_41.
15. Santhana Krishnan J and Geetha S, “Prediction of Heart Disease using
Machine Learning Algorithms” ICIICT, 2019.
16. Aditi Gavhane, Gouthami Kokkula, Isha Panday, Prof. Kailash Devadkar,
“Prediction of Heart Disease using Machine Learning”, Proceedings of the 2nd
International conference on Electronics, Communication and Aerospace
Technology(ICECA), 2018.
17. Senthilkumar Mohan, Chandrasegar Thirumalai, and Gautam Srivastava,
“Effective Heart Disease Prediction Using Hybrid Machine Learning
Techniques,” IEEE Access, 2019.
18. Himanshu Sharma and M A Rizvi, “Prediction of Heart Disease using Machine
Learning Algorithms: A Survey” International Journal on Recent and
Innovation Trends in Computing and Communication Volume: 5 Issue: 8 ,
IJRITCC August 2017.
19. M. Nikhil Kumar, K. V. S. Koushik, and K. Deepak, “Prediction of Heart Diseases
Using Data Mining and Machine Learning Algorithms and Tools,” International
Journal of Scientific Research in Computer Science, Engineering and
Information Technology, IJSRCSEIT, 2019.
20. Amandeep Kaur and Jyoti Arora, “Heart Diseases Prediction using Data
Mining Techniques: A survey,” International Journal of Advanced Research in
Computer Science, IJARCS, 2015-2019.
21. Pahulpreet Singh Kohli and Shriya Arora, “Application of Machine Learning
in Disease Prediction,” 4th International Conference on Computing
Communication and Automation (ICCCA), 2018.
