MajorProjectReportFinal2 0
on
AI Medical Diagnosis
submitted in partial fulfilment of the requirements
for the award of the degree
of
Bachelor of Technology
in
Vaibhav Singh
Enrolment No – A60205221300
Prashant Kumar
Enrolment No – A60205221295
Sujal Shakya
Enrolment No – A60205221281
DECLARATION
We, the undersigned, students of Bachelor of Technology in Computer Science and Engineering,
hereby declare that the Major Project entitled “AI Medical Diagnosis” has been carried out by
our group and is submitted to the Department of Computer Science and Engineering, Amity
School of Engineering & Technology, Amity University Madhya Pradesh, in partial fulfilment of
the requirements for the award of the degree of Bachelor of Technology in Computer Science
and Engineering. We also confirm that this project has not been previously used as the basis for
the award of any degree, diploma, or other similar title or recognition.
Harsh Sharma
(Enrolment No – A60205221271)
Vaibhav Singh
(Enrolment No – A60205221300)
Prashant Kumar
(Enrolment No – A60205221295)
Sujal Shakya
(Enrolment No – A60205221281)
Department of Computer Science and Engineering
Amity School of Engineering & Technology
Amity University Madhya Pradesh, Gwalior
June 2025
CERTIFICATE
This is to certify that Vaibhav Singh, Prashant Kumar, Harsh Sharma, and Sujal Shakya,
students of B.Tech (C.S.E) VII semester, Section-E, Department of Computer Science and
Engineering, ASET, Amity University Madhya Pradesh, have completed their Major Project entitled
“AI Medical Diagnosis” under my guidance and supervision from 14 March 2025 to 24
May 2025.
The work was satisfactory. They have shown complete dedication and devotion to the given
project work.
ACKNOWLEDGEMENT
We are very much thankful to our Honorable Pro Chancellor Lt Gen. V. K. Sharma AVSM
(Retd) sir for allowing us to carry out our project. We would also like to thank Respected
Prof.(Dr.) R.S. Tomar sir, Pro-Vice Chancellor, Amity University Madhya Pradesh for his
valuable support. We would also like to thank Prof.(Dr.) M. P. Kaushik sir, Pro-Vice Chancellor
(Research), Amity University Madhya Pradesh for his guidance.
We extend our sincere thanks to Prof.(Dr.) Vikas Thada sir, HOI, Amity School of Engineering
and Technology, Amity University Madhya Pradesh, Gwalior and HOD, Computer Science and
Engineering, ASET for his guidance and support for the selection of appropriate labs for our
project.
We are very much grateful to Prof.(Dr.) Deepak Motwani sir, Associate Professor, Department
of Computer Science & Engineering, Amity School of Engineering and Technology, Amity
University Madhya Pradesh for his constant guidance and encouragement provided in this
endeavour.
We are also thankful to the whole staff of ASET, AUMP for always teaching and helping us. Last
but not least, we would like to thank our parents and friends for their constant support.
Harsh Sharma
Enrolment No.-A60205221271
Vaibhav Singh
Enrolment No.-A60205221300
Prashant Kumar
Enrolment No.-A60205221295
Sujal Shakya
Enrolment No.-A60205221281
ABSTRACT
The End-to-End AIML Medical Diagnosis System is an intelligent, scalable, and modular AI-
powered web application designed to facilitate early detection and diagnosis of six life-threatening
diseases: Heart Disease, Diabetes, Kidney Disease, Liver Disease, Breast Cancer, and
Malaria. The system bridges the gap between modern Artificial Intelligence (AI) advancements
and practical healthcare needs, especially for remote and underserved regions.
This comprehensive solution leverages a hybrid modeling approach that integrates both
Machine Learning (ML) and Deep Learning (DL) techniques. Diseases such as heart disease,
diabetes, liver, kidney, and breast cancer are predicted using supervised ML classifiers —
including Random Forest, Logistic Regression, and XGBoost — trained on publicly available
structured datasets from platforms like Kaggle. In contrast, Convolutional Neural Networks
(CNNs) are employed for the image-based classification of malaria, trained on cell image datasets
provided by the National Institutes of Health (NIH), delivering high-performance binary
classification for parasite detection.
From a software architecture standpoint, the platform is developed using the Flask web
framework for backend logic, integrated with a PostgreSQL database to persist user input,
predictions, and historical diagnostic data. All components are orchestrated via Docker
containers, ensuring seamless, cross-platform deployment and maintainability. The modular
structure not only supports the addition of new disease models with minimal effort but also
facilitates independent development and testing of model-specific routes, improving software
scalability and maintainability.
The user interface offers a secure authentication system, intuitive form-based input for numerical
features, and image upload functionality for malaria classification. Predictions are computed in
real-time and logged into disease-specific tables linked to the authenticated user's profile. Each
entry is timestamped and stored persistently, thereby enabling long-term tracking, analytics, and
eventual integration with Electronic Health Record (EHR) systems.
Rigorous model evaluation demonstrated accuracy rates ranging from 85% to 94%, with the
malaria CNN model achieving over 93% on validation datasets. Additional performance metrics
such as precision, recall, F1-score, and confusion matrices were used to assess the reliability of
each predictive model. The system’s end-to-end prediction latency remained under 5 seconds for
both tabular and image data, confirming its feasibility for real-time diagnostic use cases.
In conclusion, the End-to-End AIML Medical Diagnosis System represents a holistic application
of machine intelligence in healthcare diagnostics. By providing a low-cost, automated, and fast
diagnostic alternative, this project paves the way for deploying AI-based triage systems in
community health centers, mobile clinics, and telemedicine platforms. It not only validates the
technical feasibility of AI in diagnostics but also demonstrates its potential to democratize
healthcare access, reduce diagnostic delays, and improve patient outcomes.
Keywords: Artificial Intelligence (AI), Machine Learning (ML), Deep Learning (DL),
Convolutional Neural Network (CNN), Disease Prediction, Flask, PostgreSQL, Docker
LIST OF FIGURES
LIST OF ABBREVIATIONS
1 AI Artificial Intelligence
2 ML Machine Learning
3 DL Deep Learning
6 JS JavaScript
CONTENTS
Front Page
Declaration by student
Certificate by supervisor (Forwarded by HOD)
Certificate by Company
Acknowledgement
Abstract
List of Figures
List of Abbreviations
Contents
Chapter 1: Introduction
Chapter 2: Literature Review
Chapter 3: Materials and Methods
Chapter 4: Results and Discussions
Chapter 5: Conclusion
Chapter 6: Future Prospects
Bibliography
CHAPTER 1
INTRODUCTION
1.1 Background
The increasing global burden of chronic and infectious diseases demands scalable solutions for
early diagnosis and treatment. In India and many other developing nations, healthcare systems are
often overburdened, under-resourced, and inaccessible in rural and remote areas. The ratio of
qualified doctors to patients remains critically low, particularly in Tier 2 and Tier 3 cities.
At the same time, digital transformation and the proliferation of data have created new
opportunities in healthcare analytics. Artificial Intelligence (AI) and Machine Learning (ML)
have emerged as powerful tools to revolutionize clinical decision-making, diagnostics, and
preventive care. By training models on historical medical data, it becomes possible to recognize
complex patterns that can aid in early disease prediction — a crucial step in reducing mortality
and improving quality of life.
This project builds upon the potential of AI by developing a multi-disease diagnostic system
capable of predicting six critical diseases: Heart Disease, Diabetes, Kidney Disease, Liver
Disease, Breast Cancer, and Malaria. The system integrates traditional ML algorithms for
structured data and Deep Learning methods for image-based classification, providing a
comprehensive solution that combines versatility, accuracy, and accessibility.
1.3 Objectives
The primary goal of this project is to design and implement a cloud-deployable, AI-based
diagnostic platform that assists users in identifying potential health risks with high accuracy. The
following specific objectives have been defined:
1.3.1 To develop ML models for structured data inputs to predict:
o Heart Disease
o Diabetes
o Kidney Disease
o Liver Disease
o Breast Cancer
1.3.2 To develop a CNN-based model to classify parasitized vs. uninfected cells for malaria
detection using image data.
1.3.3 To design a user-friendly web interface using Flask and HTML/CSS, allowing users to
securely input medical data or upload images.
1.3.4 To ensure secure and scalable storage of predictions and input data using a relational
database (PostgreSQL).
1.3.5 To implement DevOps practices, including Docker containerization, to ensure
environment consistency, portability, and ease of deployment.
1.3.6 To maintain modularity in code and architecture to facilitate the addition of future disease
modules.
CHAPTER 2
LITERATURE REVIEW
The development of recommendation systems has been a pivotal area of research in computer
science, with applications spanning e-commerce, entertainment, and education. Accommodation
recommendation systems, while less explored compared to other domains, present unique
challenges and opportunities that have been addressed in several studies.
Clustering techniques have been extensively studied for their applicability in recommendation
systems. Xu and Wunsch (2005) provide a comprehensive survey of clustering algorithms,
highlighting their strengths in grouping similar items based on defined attributes. The K-means
algorithm, in particular, has been widely adopted for its simplicity and efficiency. Studies by
Bandyopadhyay and Barman (2021) illustrate how K-means can effectively group housing
options based on proximity and other features, enabling personalized recommendations for users
with diverse preferences.
Incorporating user preferences into recommendation systems has been a subject of ongoing
research. Hariri, Mobasher, and Burke (2012) propose a context-aware approach that accounts
for dynamic user needs and situational factors. This perspective aligns closely with the objectives
of the current project, which aims to tailor recommendations based on the facilities students
prefer near their accommodations. Trust-building mechanisms, as discussed by Chen and Pu
(2014), further enhance the usability and acceptance of recommendation systems by ensuring
transparency and reliability in suggestions.
The role of visualization in recommendation systems has also been explored extensively. Liu,
Wu, and Wang (2016) demonstrate how interactive interfaces can improve user engagement and
decision-making. For accommodation systems, visualizations such as maps and proximity
indicators are particularly valuable, as they allow users to intuitively assess the suitability of
different options.
In the context of student housing, Smith and Miller (2019) provide insights into the unique needs
and challenges faced by students. Their research highlights the importance of affordability, safety,
and access to educational institutions, which are critical factors for any accommodation
recommendation system targeting this demographic. By addressing these priorities, systems can
significantly alleviate the stress associated with finding suitable housing in competitive markets.
Despite the progress in related fields, gaps remain in the development of accommodation-specific
recommendation systems. Existing platforms often lack the ability to comprehensively integrate
user preferences, geospatial data, and clustering algorithms into a cohesive framework. This
project seeks to bridge this gap by leveraging K-means clustering and user-defined preferences
to deliver tailored recommendations for students in Gwalior City.
CHAPTER 3
MATERIALS AND METHODS
3.1 Tools & Technologies
A diverse stack of technologies and tools was selected to develop a robust, modular, and scalable
diagnostic system. The core components used in the project are:
Figure 3.2: Login page for Web App.
Each dataset was split into training and testing sets, with appropriate scaling, encoding, and null
value treatment applied. Categorical variables were label-encoded or one-hot encoded as needed.
Images were resized to a standard dimension of 50x50 pixels and normalized to enhance training
performance.
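The preprocessing steps described above can be sketched as follows. This is a minimal illustration that uses only the Python standard library so that it runs anywhere; it is not the project's actual pipeline, which presumably relied on library utilities for these steps. The function names, the choice of median imputation, the 80/20 split, and the toy records are all assumptions made for the example. Resizing images to 50x50 pixels requires an imaging library and is not shown; only the pixel normalization step is.

```python
import random
from statistics import median


def impute_median(column):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]


def label_encode(column):
    """Map each distinct category to an integer code (sorted for determinism)."""
    codes = {cat: i for i, cat in enumerate(sorted(set(column)))}
    return [codes[v] for v in column], codes


def min_max_scale(column):
    """Rescale numeric values to the range [0, 1]."""
    lo, hi = min(column), max(column)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return [(v - lo) / span for v in column]


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and split them into train and test lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def normalize_image(pixels):
    """Scale 8-bit pixel intensities (0-255) to floats in [0, 1]."""
    return [[p / 255.0 for p in row] for row in pixels]


# Toy records (hypothetical): age, sex, and cholesterol with one missing value.
ages = [52, 61, 45, 70, 38]
sexes = ["M", "F", "F", "M", "F"]
chol = [230, None, 180, 260, 200]

chol_filled = impute_median(chol)
sex_codes, sex_mapping = label_encode(sexes)
age_scaled = min_max_scale(ages)

# For brevity the scaling is computed over all rows; in practice the scaling
# statistics should come from the training split only, to avoid leakage.
rows = list(zip(age_scaled, sex_codes, min_max_scale(chol_filled)))
train_rows, test_rows = train_test_split(rows)
```

With five toy records and a test fraction of 0.2, the split holds out one record and keeps four for training.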
with backend ML/DL models and persistent data storage. The process flow is summarized
below:
3.3.1 User Registration/Login:
Users create an account or log in to access disease prediction models. Each session is
authenticated and tracked using Flask session management.
3.3.2 Model Selection:
After logging in, the user is directed to the home page where they can select a disease
module (e.g., Heart Disease, Malaria).
3.3.3 Input Interface:
o For ML models: A form interface is provided to input structured medical data.
o For Malaria: A file upload feature allows users to upload microscope images of blood cells.
3.3.4 Prediction Logic:
Once the data/image is submitted, it is passed to the corresponding trained model. ML
models use .pkl files loaded via joblib, while the CNN uses a .h5 model loaded through
TensorFlow.
3.3.5 Result Rendering:
The prediction result is shown immediately on a new result page, providing diagnostic
insight (e.g., “Heart Disease Detected” or “Malaria Not Detected”).
3.3.6 Data Logging:
Each input and its corresponding prediction are stored in disease-specific tables in the
PostgreSQL database, along with a timestamp and user ID. This ensures data traceability
and enables history retrieval for the logged-in user.
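The prediction and logging steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: SQLite (from the Python standard library) stands in for PostgreSQL so that the example runs without a database server, the rule-based predictors with arbitrary thresholds stand in for the trained .pkl and .h5 models, and the table layout, the function names, and MODEL_REGISTRY are hypothetical.

```python
import json
import sqlite3
from datetime import datetime, timezone


# Stand-in predictors (hypothetical). In the described system these would be
# trained models loaded from .pkl files via joblib or from a .h5 file via
# TensorFlow; the thresholds below are arbitrary and not clinical rules.
def predict_heart(features):
    return int(features["cholesterol"] > 240)


def predict_diabetes(features):
    return int(features["glucose"] > 125)


# One entry per disease module; adding a module means adding one entry.
MODEL_REGISTRY = {
    "heart": predict_heart,
    "diabetes": predict_diabetes,
}


def ensure_table(conn, disease):
    """Create the disease-specific log table if it does not exist yet."""
    # Table names cannot be bound as SQL parameters, so only names taken
    # from the fixed registry are ever interpolated into the statement.
    if disease not in MODEL_REGISTRY:
        raise KeyError(disease)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {disease}_predictions ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "user_id INTEGER NOT NULL, "
        "inputs TEXT NOT NULL, "
        "prediction INTEGER NOT NULL, "
        "created_at TEXT NOT NULL)"
    )


def predict_and_log(conn, user_id, disease, features):
    """Run the selected model, then store input, result, timestamp, user ID."""
    model = MODEL_REGISTRY[disease]  # raises KeyError for unknown modules
    prediction = model(features)
    ensure_table(conn, disease)
    conn.execute(
        f"INSERT INTO {disease}_predictions "
        "(user_id, inputs, prediction, created_at) VALUES (?, ?, ?, ?)",
        (
            user_id,
            json.dumps(features, sort_keys=True),
            prediction,
            datetime.now(timezone.utc).isoformat(),
        ),
    )
    conn.commit()
    return prediction


def history(conn, user_id, disease):
    """Return the logged predictions for one user, oldest first."""
    ensure_table(conn, disease)
    cur = conn.execute(
        f"SELECT inputs, prediction, created_at FROM {disease}_predictions "
        "WHERE user_id = ? ORDER BY id",
        (user_id,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
result = predict_and_log(conn, user_id=1, disease="heart",
                         features={"cholesterol": 260})
```

Keeping the models in a single registry is one way to realize the modularity goal: the routing, logging, and history code do not change when a new disease module is registered.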
3.6.1 Model Evaluation: Using metrics such as Accuracy, Precision, Recall, F1-Score, ROC-
AUC.
3.6.2 Web Interface Testing: Cross-browser compatibility and input validation.
3.6.3 Docker Testing: Verified that all containers functioned correctly on restart and could be
deployed on multiple environments.
CHAPTER 4
RESULTS AND DISCUSSIONS
The "End-to-End AIML Medical Diagnosis System" successfully demonstrated its capabilities in
providing automated disease predictions based on user-provided data. This chapter presents the
key outputs from the system, discusses the performance evaluation using various metrics, and
offers insights and observations derived from the implementation.
Figure 4.1: Input Page for the model
4.1.3 Visualizations (Conceptual for disease-specific maps): While the primary focus
is on direct diagnosis, visualizations can enhance understanding. For instance, if geographic patient
data were integrated, a map could show predicted disease clusters or hotspots.
• Figure 4.2: Final output on the map (Conceptual representation if geographical data were
involved, illustrating how diagnostic results could be visualized spatially).
• Kidney Disease: 90%
o Discussion: XGBoost demonstrated strong performance due to its ability to handle
missing values and its robust nature, leading to high accuracy in classifying kidney
disease states.
• Liver Disease: 87%
o Discussion: A Support Vector Machine (SVM) with a radial basis function kernel
achieved this accuracy, proving effective in high-dimensional feature spaces
commonly found in liver function test data.
• Breast Cancer: 92%
o Discussion: An ensemble model (e.g., a stacked classifier combining SVM and
Random Forest) performed exceptionally well, showcasing the distinguishability
between benign and malignant tumors from the cellular features. The high accuracy is
critical given the severe implications of misdiagnosis.
• Malaria (CNN): 94%
o Discussion: The Convolutional Neural Network (CNN) architecture designed for
image classification yielded excellent results. The high accuracy and precision signify
its effectiveness in distinguishing between parasitized and uninfected blood cells,
demonstrating the power of deep learning for visual diagnostics.
4.2.2 Evaluation Metrics: Beyond simple accuracy, a comprehensive evaluation involved
the following metrics:
• Confusion Matrix: Provided a detailed breakdown of True Positives (TP), True Negatives
(TN), False Positives (FP), and False Negatives (FN). This matrix was crucial for
understanding the types of errors made by the models.
• Precision: The proportion of positive identifications that were actually correct (TP/(TP+FP)).
High precision is vital to minimize false alarms.
• Recall (Sensitivity): The proportion of actual positives that were correctly identified
(TP/(TP+FN)). High recall is critical in medical diagnosis to avoid missing actual cases of
disease.
• F1-Score: The harmonic mean of Precision and Recall
(2∗(Precision∗Recall)/(Precision+Recall)). It provides a balanced measure, especially useful
when classes are imbalanced.
• AUC-ROC (Area Under the Receiver Operating Characteristic Curve): The Malaria
CNN model achieved the highest AUC score of 0.96. The AUC-ROC curve plots the True
Positive Rate (Recall) against the False Positive Rate at various threshold settings. A higher
AUC indicates a better ability of the model to distinguish between positive and negative
classes. An AUC of 0.96 suggests strong separation and robust performance for the malaria
detection model.
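The metric definitions above can be made concrete with a short sketch. The labels and scores below are toy values invented for the example, not results from the project, and the helper names are hypothetical. AUC is computed here through its rank interpretation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case), which is equivalent to the area under the ROC curve.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def precision_recall_f1(tp, fp, fn):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN), F1 = harmonic mean of both."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): four positive and four negative cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # decision threshold of 0.5

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
auc = roc_auc(y_true, scores)
```

On this toy data the threshold of 0.5 yields one false positive and one false negative, so precision, recall, and F1 are all 0.75, while the AUC is 15/16 = 0.9375 because only one of the sixteen positive/negative pairs is ranked in the wrong order.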
4.2.3 Processing Time: The efficiency of the system was evaluated by measuring the time taken
for predictions:
• Tabular Predictions: On average, the system processed user input and generated predictions
for tabular diseases in approximately 1-2 seconds. This fast turnaround time makes the system
suitable for real-time diagnostic assistance.
• Image Predictions (Malaria): Due to the larger data size (image pixels) and the complexity
of the CNN model, image predictions took slightly longer, averaging around 3-4 seconds. This
is still highly efficient compared to manual microscopic examination.
• User Interface Responsiveness: During user testing, the UI responsiveness was rated at 95%
satisfaction. This indicates that the interactive elements and page loads were quick and fluid,
contributing to a positive user experience.
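Prediction times of this kind can be collected with a small timing helper. The sketch below is an assumption about how such a measurement might be made, not the procedure used in the project; measure_latency and dummy_predict are hypothetical names, and the dummy function stands in for real model inference. It times only the function call itself, whereas the figures reported above are end-to-end and would also include form handling, network transfer, and page rendering.

```python
import time
from statistics import mean


def measure_latency(predict, payload, repeats=5):
    """Call predict(payload) several times and return each duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        predict(payload)
        durations.append(time.perf_counter() - start)
    return durations


def dummy_predict(payload):
    """Stand-in for model inference (hypothetical)."""
    return sum(payload) > 0


timings = measure_latency(dummy_predict, [0.1, 0.2, 0.3])
summary = {"mean_s": mean(timings), "max_s": max(timings)}
```

Reporting the maximum alongside the mean helps reveal slow outliers, such as the first call after a model is loaded.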
4.4 Discussion
The results demonstrate the significant potential of the "End-to-End AIML Medical Diagnosis
System" in addressing the critical need for accessible and efficient disease prediction. By
integrating robust ML and DL models with a user-friendly web interface and a scalable backend,
the system achieves its primary objective of simplifying and enhancing the diagnostic process.
The achieved accuracy rates across the six diseases are highly promising and align with, or even
surpass, performance reported in similar research studies. The efficiency in processing time
ensures that the system can be a practical tool for rapid preliminary diagnosis.
However, the project also highlights areas for continuous improvement. Enhancing data quality by
incorporating real-world, clinical datasets, exploring more advanced and interpretable AI models,
and integrating features like Explainable AI are essential next steps. Additionally, broadening the
scope to include more complex diagnostic pathways or integrating with Electronic Health Records
(EHR) systems would further amplify its real-world utility.
Despite these future considerations, the current system represents a significant advancement in
applying AI and ML to practical healthcare problems. It establishes a strong foundation for
developing more sophisticated and comprehensive medical diagnostic tools that can ultimately
contribute to better health outcomes, particularly in resource-constrained environments.
CHAPTER 5
CONCLUSION
The "End-to-End AIML Medical Diagnosis System" represents a pivotal achievement in
leveraging cutting-edge technology to address the persistent challenges in medical diagnostics,
particularly the need for accessible, efficient, and accurate disease prediction. This project
succeeded in streamlining the initial diagnostic process by providing personalized and
reliable predictions for six critical diseases: Heart Disease, Diabetes, Kidney Disease, Liver
Disease, Breast Cancer, and Malaria. By integrating robust Machine Learning and Deep Learning
models with a user-centric web interface and a scalable backend, the system has demonstrated its
significant capability in enhancing decision-making and improving the overall user experience in
a healthcare context.
5.1 Summary of Achievements
The development of this system has yielded several key achievements that underscore its efficacy
and potential:
5.1.1 Multi-Disease Predictive Accuracy: The system successfully implemented and trained
highly accurate ML models for Heart Disease (88%), Diabetes (86%), Kidney Disease (90%),
Liver Disease (87%), and Breast Cancer (92%), along with a powerful CNN model for Malaria
(94%). These accuracies demonstrate the system's robust diagnostic capabilities across diverse
medical conditions.
5.1.2 Personalized and Data-Driven Diagnostics: Users can input their specific physiological
parameters or upload relevant images, enabling the system to provide tailored diagnostic
insights. This moves beyond generic information, offering a personalized approach to
preliminary health assessment.
5.1.3 Interactive and User-Friendly Interface: The development of an intuitive web interface,
powered by Flask, HTML, CSS, and JavaScript, ensures ease of use for individuals with
varying levels of technical proficiency. The clear presentation of input forms, diagnostic
results, and historical data significantly enhances user interaction and understanding.
5.1.4 Efficient and Real-time Predictions: The system demonstrated efficient processing times,
with tabular predictions delivered in 1-2 seconds and image predictions in 3-4 seconds. This
near real-time performance is crucial for practical diagnostic applications, enabling quick
preliminary assessments.
5.1.5 Robust Data Management and Traceability: The integration with a PostgreSQL database
ensures secure, persistent storage of all patient inputs and diagnostic outcomes. This
comprehensive logging mechanism provides a valuable historical record for users and a robust
audit trail for system administrators, crucial for accountability and future analysis.
5.1.6 Scalable and Extensible Architecture: The modular design of the system, coupled with
containerization using Docker, provides a highly scalable and flexible architecture. This
design allows for easy integration of additional disease models, new data types, and future
expansion to cater to a broader range of medical conditions or user demographics with
minimal modifications.
5.4 Conclusion
In conclusion, the "End-to-End AIML Medical Diagnosis System" successfully demonstrates the
feasibility and immense utility of integrating Artificial Intelligence and Machine Learning into a
practical, user-centric healthcare solution. By effectively combining data-driven techniques with
a focus on usability and scalable deployment, the system achieves its goal of simplifying and
improving the disease diagnosis process for multiple critical conditions.
The project not only showcases the technical prowess in model development and system
integration but also highlights the transformative potential of AI in addressing complex urban and
rural health challenges. The success of this endeavor reinforces the importance of continued
research and development in the field of AI-powered medical systems, with a clear roadmap for
future enhancements. Ultimately, this system stands as a valuable tool that not only benefits
individual users by providing accessible and accurate preliminary diagnoses but also contributes
significantly to broader societal goals of improving public health outcomes, increasing healthcare
accessibility, and fostering a more data-driven approach to preventive medicine.
CHAPTER 6
FUTURE PROSPECTS
The "End-to-End AIML Medical Diagnosis System" represents a significant foundational step in
leveraging technology to enhance medical diagnostics. While the current implementation
addresses several critical aspects of automated disease prediction, there remains substantial
potential for future enhancements, expansions, and integrations to make it an even more
comprehensive, intelligent, and impactful healthcare solution.