BONAFIDE CERTIFICATE
Certified that this project report “Integrated Medical Diagnosis System using X-ray
Image Detection and RAG-based Report Generation” is the bonafide work of
“RISHIKESH K (211721243140), VISHNU S (21172124300)”, who carried out the
project work under my supervision.
SIGNATURE                                        SIGNATURE
HEAD OF THE DEPARTMENT                           SUPERVISOR
Dr. N. Kanagavalli, M.E., Ph.D.,                 Mrs. S. Saranya, M.Tech., Ph.D.,
Assistant Professor                              Assistant Professor
Artificial Intelligence and Data Science         Artificial Intelligence and Data Science
Rajalakshmi Institute of Technology              Rajalakshmi Institute of Technology
Chennai - 600 124.                               Chennai - 600 124.
ACKNOWLEDGEMENT
We begin by offering our deepest gratitude to God for His divine guidance,
blessings, and strength, without which this project would not have been possible.
We wish to express our heartfelt indebtedness to our Chairman, Mr. S. Meganathan,
B.E., F.I.E., for his sincere commitment to educating the student community at this
esteemed institution.
We extend our deep gratitude to our Chairperson, Dr. (Mrs.) Thangam
Meganathan, M.A., M.Phil., Ph.D., for her enthusiastic motivation, which greatly
assisted us in completing the project.
Our sincere thanks go to our Vice Chairman, Dr. M. Haree Shankar, MBBS., MD.,
for his dedication to providing the necessary infrastructure and support within our
institution.
We express our sincere acknowledgment to Dr. R. Maheswari, M.E., Ph.D., Head
of the Institution of Rajalakshmi Institute of Technology, for her continuous motivation
and technical support, which enabled us to complete our work on time.
With profound respect, we acknowledge the valuable guidance and suggestions of
Dr. N. Kanagavalli, M.E., Ph.D., Assistant Professor & Head of the Department of
Artificial Intelligence and Data Science, which contributed significantly to the
development and completion of our project.
We express our sincere thanks to our Coordinator, Dr. C. Srivenkateswaran, M.E.,
M.B.A., Ph.D., Professor, Department of Artificial Intelligence and Machine
Learning, for his continuous support and guidance throughout the project work.
Our heartfelt gratitude goes to our project Supervisor, Mrs. S. Saranya, M.Tech.,
Assistant Professor, Department of Artificial Intelligence and Data Science, for her
invaluable mentorship, constant support, and insightful suggestions that guided us
throughout our project journey.
Finally, we express our deep sense of gratitude to our Parents, Faculty members,
Technical staff, and Friends for their constant encouragement and moral support,
which played a vital role in the successful completion of this project.
ABSTRACT
This project introduces an advanced and integrated medical diagnostic system that
merges the power of deep learning for image-based disease detection with an
automated report generation feature, utilizing cutting-edge Retrieval-Augmented
Generation (RAG) models. The system aims to significantly streamline and enhance
the process of diagnosing medical conditions through chest X-ray images and
subsequently generating detailed diagnostic reports. By using a deep learning model
based on ResNet, the system is capable of accurately analyzing chest X-rays to detect
various diseases, such as pneumonia, tuberculosis, and even early-stage signs of more
severe conditions like lung cancer. These diseases are identified with a high degree of
precision, as ResNet's deep residual learning technique allows the model to focus on
intricate patterns in the X-ray images, offering superior performance compared to
traditional diagnostic methods.
The diagnostic pipeline begins with the preprocessing of chest X-ray images, where
each image is processed and normalized to ensure consistency and to make it suitable
for deep learning analysis. The processed images are then passed through a ResNet-
based neural network that uses convolutional layers to extract critical features from the
images. The model leverages residual blocks, which are key components of ResNet, to
prevent the vanishing gradient problem, allowing for better training of deeper models
and improving diagnostic accuracy. By employing techniques such as data
augmentation, transfer learning, and fine-tuning, the model has been trained on a large
dataset of X-ray images from diverse sources, ensuring that it is robust and
generalizable for real-world clinical environments.
TABLE OF CONTENTS
ABSTRACT v
TABLE OF CONTENTS vi
LIST OF TABLES viii
LIST OF FIGURES ix
ABBREVIATIONS X
1. INTRODUCTION 1
1.1 PROBLEM STATEMENT 2
1.2 OBJECTIVES 3
1.3 SCOPE 3
1.4 PROJECT MOTIVATION 4
1.5 METHODOLOGY OVERVIEW 4
2. LITERATURE REVIEW 6
3. METHODOLOGIES 11
3.1 RESEARCH DESIGN APPROACH 11
3.2 TOOL AND TECHNOLOGY USED 13
3.3 DATA COLLECTION 14
3.4 WORK FLOW 15
4. IMPLEMENTATION 17
4.1 DESCRIPTION 17
4.2 ARCHITECTURE 20
4.3 BLOCK DIAGRAM 20
4.4 MODEL / TECHNOLOGY 25
5. RESULT AND DISCUSSION 30
5.1 INTRODUCTION TO TESTING 30
5.2 PERFORMANCE AND EVALUATION 31
5.3 CHALLENGES FACED AND SOLUTIONS 32
6. CONCLUSION 35
7. FUTURE ENHANCEMENT 36
APPENDIX 37
REFERENCES 58
BIBLIOGRAPHY 61
CONFERENCE CERTIFICATES 64
LIST OF TABLES
LIST OF FIGURES
ABBREVIATIONS
AI Artificial Intelligence
CAD Computer-Aided Diagnosis
CNN Convolutional Neural Network
DNN Dense Neural Network
ML Machine Learning
SVM Support Vector Machine
KNN K-Nearest Neighbors
DL Deep Learning
AUC-ROC Area Under the Curve - Receiver Operating Characteristic
ResNet Residual Network
API Application Programming Interface
CHAPTER 01
INTRODUCTION
In the medical field, chest X-rays (CXR) serve as one of the most essential imaging tools
for diagnosing and monitoring a variety of thoracic and pulmonary conditions.
They provide valuable insights into the health of the chest, including the lungs,
heart, ribs, and diaphragm. With the ability to detect conditions such as
pneumonia, tuberculosis, lung nodules, pleural effusion, chronic obstructive
pulmonary disease (COPD), and lung cancer, chest X-rays are invaluable in both
clinical practice and emergency care settings. These diagnostic images offer an
efficient and cost-effective way to assess potential abnormalities in patients,
often serving as the first line of defense in detecting a wide array of diseases.
Despite the widespread use of chest X-rays, the interpretation of these images is
far from straightforward. The process of manually analyzing an X-ray requires
significant expertise and training. Radiologists—medical professionals who
specialize in interpreting diagnostic imaging—must analyze the X-ray, recognize
subtle patterns, and correlate these findings with the patient's clinical history and
symptoms. This process is inherently complex and prone to human error,
especially in cases where the abnormalities are not immediately apparent or are
difficult to distinguish from other conditions. Furthermore, radiologists often face
the challenge of managing large volumes of X-ray images, especially in busy
hospitals or during high-demand periods, leading to potential delays in diagnosis.
In addition to the technical challenges of X-ray interpretation, the accessibility of
expert radiological care is a significant concern. Many regions, particularly in
rural or under-resourced areas, lack access to experienced radiologists or even
basic radiological equipment. In such environments, patients may not receive
timely diagnoses or adequate treatment, particularly for serious diseases like
cancer or tuberculosis. As the demand for healthcare services continues to rise
globally, there is a growing need for innovative solutions that can bridge the gap
between medical expertise and patient care, especially in underserved areas.
While the advent of artificial intelligence has brought forth tools capable of
detecting specific abnormalities in medical images, a crucial gap remains in their
seamless integration into clinical workflows. Many existing AI solutions primarily
focus on identifying the presence or absence of a particular disease or finding
regions of interest within an image. However, the clinical utility of such systems is
often hampered by the lack of an integrated report generation capability.
Radiologists still need to manually synthesize the AI's findings with other clinical
information to generate a comprehensive and actionable diagnostic report. This
additional step not only adds to the overall turnaround time but also reintroduces the
potential for subjective interpretation during the report writing process. A truly
efficient and accessible system needs to bridge this gap by providing not just
detection capabilities but also automated, structured, and clinically relevant reports
that can be directly incorporated into patient records and facilitate timely clinical
decision-making.
1.2 Objectives
The limitations of current chest X-ray interpretation methods pose substantial
challenges in clinical practice. Manual analysis by radiologists introduces
subjectivity, leading to variations in interpretation between individuals and even
within the same radiologist over time. This lack of standardization may result in
inconsistent diagnoses and delayed detection of critical conditions. The time-
consuming process of manually reviewing large volumes of images creates
diagnostic delays, especially in high-volume hospitals and screening programs.
Report Generation Module: Employs a Retrieval-Augmented Generation
(RAG) model to produce structured, clinically informative diagnostic reports based
on the model's detection outputs.
These components function cohesively to enable a streamlined, AI-assisted
diagnostic workflow suitable for diverse healthcare environments.
Additionally, the system features.
1.4 Project Motivation
The development of this AI-powered diagnostic system is driven by the
growing demand for faster, more accurate, and widely accessible diagnostic
solutions in modern healthcare. Traditional diagnostic workflows, especially in
radiology, are often hindered by time constraints, high workloads, and variability in
interpretation. By automating the analysis of chest X-rays and the generation of
detailed, structured medical reports, this system aims to alleviate these challenges
and deliver meaningful improvements in clinical efficiency and diagnostic quality.
Key benefits of the system include:
1.5 Methodology Overview
Data Preparation
Acquisition and preprocessing of chest X-ray images along with corresponding
clinical annotations.
Dataset splitting into training, validation, and test sets, ensuring balanced class
distribution.
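To make this step concrete, a minimal sketch of a stratified split is given below; the metadata column name "Finding Labels", the 70/15/15 ratios, and the use of the first listed finding for stratification are illustrative assumptions rather than the project's actual configuration.

import pandas as pd
from sklearn.model_selection import train_test_split

def split_dataset(csv_path, seed=42):
    """Split image-level metadata into train/val/test (70/15/15) while keeping
    a roughly balanced class mix by stratifying on the first listed finding."""
    df = pd.read_csv(csv_path)                                   # assumed: one row per image
    primary = df["Finding Labels"].str.split("|").str[0]         # coarse label used only for stratification
    train_df, hold_df = train_test_split(df, test_size=0.30,
                                         stratify=primary, random_state=seed)
    val_df, test_df = train_test_split(hold_df, test_size=0.50,
                                       stratify=primary.loc[hold_df.index], random_state=seed)
    return train_df, val_df, test_df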
Model Architecture
A dual-headed architecture based on ResNet-50 is implemented:
Custom loss functions are applied to address class imbalance and multi-task
learning.
Inference Pipeline
Real-time inference is enabled with:
Report generation: Retrieval-Augmented Generation (RAG) framework
This end-to-end system ensures an efficient workflow from image acquisition to clinical
report generation, supporting deployment in varied healthcare environments.
CHAPTER 2
LITERATURE REVIEW
The primary objective of this project is to design and develop a comprehensive AI-powered
diagnostic system that integrates advanced deep learning techniques and natural language
processing (NLP) for the automated interpretation and reporting of chest X-ray (CXR) images.
This system aims not only to automate the disease detection process through medical image
analysis but also to generate clinically relevant and structured reports using a Retrieval-
Augmented Generation (RAG) model. The following are the detailed objectives that form the
foundation of the proposed work:
These datasets are essential for training both classification and localization components
of the model effectively.
Custom Data Preprocessing and Balancing
A tailored preprocessing pipeline is designed, incorporating normalization, histogram
equalization, image resizing, and augmentation techniques such as flipping, rotation, and
contrast adjustment. Additionally, strategies like focal loss and class reweighting are
integrated to mitigate class imbalance and improve generalization.
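A minimal sketch of such a preprocessing routine is shown below, assuming OpenCV is used for histogram equalization and resizing; the 224 x 224 target size is an illustrative assumption.

import cv2
import numpy as np

def preprocess_xray(path, size=(224, 224)):
    """Load a chest X-ray, equalize its histogram, resize, and normalize to [0, 1]."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)    # X-rays are single-channel
    img = cv2.equalizeHist(img)                     # histogram equalization for contrast
    img = cv2.resize(img, size)                     # fixed input size for the CNN
    img = img.astype(np.float32) / 255.0            # scale pixel values to [0, 1]
    return np.stack([img] * 3, axis=-1)             # replicate to 3 channels for ImageNet backbones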
Visual and Textual Output Display
The interface shows the processed X-ray image with overlaid bounding boxes and
disease labels, alongside the full generated report. This dual output enhances
interpretability and aids clinical validation.
CHAPTER 3
METHODOLOGIES
Label Encoding
Disease labels are provided as pipe-separated strings denoting multiple co-occurring
pathologies (e.g., "Pneumonia|Infiltrate"). To transform these into a machine-readable format,
the MultiLabelBinarizer from Scikit-learn is employed. This encoder converts the multi-label
string format into binary vectors, where each element represents the presence or absence of a
specific disease class. This transformation is crucial for enabling the classification head to
handle multi-label targets.
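A small illustration of this encoding step is given below; the label strings are examples rather than actual dataset rows.

from sklearn.preprocessing import MultiLabelBinarizer

labels = ["Pneumonia|Infiltrate", "Effusion", "No Finding"]   # example label strings
label_sets = [s.split("|") for s in labels]                   # split pipe-separated pathologies

mlb = MultiLabelBinarizer()
binary_targets = mlb.fit_transform(label_sets)                # one binary vector per image
print(mlb.classes_)                                           # discovered disease classes
print(binary_targets)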
Training Phase:
Data augmentation includes RandomResizedCrop, RandomHorizontalFlip, and rotation. Images
are normalized using the ImageNet mean and standard deviation values to match the input
distribution expected by the ResNet-50 backbone.
Validation Phase:
Standard transformations such as deterministic resizing and center cropping are used to ensure
consistent and unbiased performance evaluation.
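The sketch below shows how these two pipelines are typically defined with TorchVision transforms; the exact crop size and rotation limit are assumptions.

from torchvision import transforms

IMAGENET_MEAN, IMAGENET_STD = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]

train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224),            # random crop and resize for augmentation
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),                # small rotations (assumed limit)
    transforms.ToTensor(),
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
])

val_transforms = transforms.Compose([
    transforms.Resize(256),                       # deterministic resizing
    transforms.CenterCrop(224),                   # center crop for unbiased evaluation
    transforms.ToTensor(),
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
])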
Backbone: ResNet-50
ResNet-50 is chosen for its ability to learn deep hierarchical representations through residual
connections. The final fully connected layer is removed, and the 2048-dimensional feature
vector output is shared by both output heads. This feature extractor is fine-tuned on the medical
dataset, leveraging pre-trained weights to improve convergence and feature generalization.
Output Heads
Classification Head:
A stack of fully connected layers outputs raw logits representing the probability of each disease
class. BCEWithLogitsLoss is used to handle multi-label predictions effectively.
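A minimal PyTorch sketch of this dual-head design is given below; the hidden-layer sizes are illustrative assumptions, while the removal of the final fully connected layer and the shared 2048-dimensional feature vector follow the description above.

import torch
import torch.nn as nn
from torchvision import models

class DualHeadResNet(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
        self.features = nn.Sequential(*list(backbone.children())[:-1])   # drop the final FC layer
        self.classifier = nn.Sequential(                                  # multi-label disease logits
            nn.Linear(2048, 512), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(512, num_classes))
        self.bbox_head = nn.Sequential(                                   # (x, y, w, h) regression
            nn.Linear(2048, 256), nn.ReLU(),
            nn.Linear(256, 4), nn.Sigmoid())                              # normalized coordinates

    def forward(self, x):
        feats = self.features(x).flatten(1)        # shared 2048-dimensional feature vector
        return self.classifier(feats), self.bbox_head(feats)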
Component                    Tool/Library
Model Backbone               ResNet-50 (TorchVision)
Deep Learning Framework      PyTorch
Data Handling                Pandas, PIL
Multi-label Encoding         Scikit-learn
Optimizer                    Adam
Loss Functions               BCEWithLogitsLoss, MSELoss
Augmentation                 TorchVision Transforms
Model Saving                 Torch .pth format
GPU Acceleration             CUDA
3.6 Summary of Improvements and Potential Enhancements
The current system achieves a strong baseline for automated diagnosis of chest diseases but
offers several avenues for future improvement:
Focal Loss:
Replacing the BCE loss with Focal Loss could help focus learning on difficult and
underrepresented cases (a minimal sketch is given after this list).
Non-Maximum Suppression (NMS):
Applying NMS can reduce overlapping bounding box predictions, improving
interpretability in multi-lesion scenarios.
Multi-Box Prediction:
Enabling the model to output multiple bounding boxes per image would increase
robustness in cases of multiple pathologies.
Grad-CAM Integration:
Incorporating Grad-CAM for saliency visualization could enhance clinical trust and
transparency.
Medical-Specific Augmentations:
Advanced augmentation techniques such as CLAHE, gamma correction, or synthetic
noise injection could simulate real-world variability more effectively.
Domain-Specific Transfer Learning:
Utilizing models pre-trained on domain-specific datasets like RadImageNet could
improve feature relevance and model generalization in the medical context.
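As an example of the Focal Loss enhancement listed above, the sketch below implements a binary focal loss on top of BCE-with-logits for the multi-label setting; the alpha and gamma values are common defaults rather than tuned project parameters.

import torch
import torch.nn.functional as F

def binary_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Focal loss for multi-label classification: down-weights easy examples so
    that training focuses on hard or underrepresented findings."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)                                     # probability assigned to the true label
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)   # class-balancing factor
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()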
CHAPTER 4
IMPLEMENTATION/DEVELOPMENT
4.1 Introduction
This chapter outlines the process of implementing the integrated medical diagnosis
system for X-ray based disease prediction and report generation. The
implementation follows a structured approach, translating the methodologies
described in the previous chapter into a functional AI system. This involves practical
steps in data handling, model development, and system integration to create a
pipeline capable of analyzing X-ray images and generating clinically relevant
reports.
The development process was divided into several logical phases: data
preparation, model architecture design, training pipeline setup, report generation
logic, user interface development, and deployment. Each component was
individually tested and integrated into a unified system that can perform real-time
inference and deliver output reports seamlessly. The choice of technologies and
tools was made with scalability, accuracy, and clinical applicability in mind.
4.2 Architecture
1. Preprocessing Module: Responsible for preparing the raw chest X-ray images,
resizing, normalization, and data augmentation. It ensures consistency across input
images and enhances model generalization.
2. Image Classification and Localization Module: Utilizes a ResNet-50 based
neural network with dual outputs: one for multi-label classification of diseases and
another for bounding box regression to localize abnormalities.
4.3 Block Diagram
ResNet-50 Based Detection Model
The primary deep learning model used for analyzing chest X-ray images is
based on ResNet-50, a 50-layer deep convolutional neural network known for its
superior performance in image classification tasks. Its ability to maintain gradient
flow through deep layers using residual connections makes it highly effective in
identifying complex patterns in medical imaging.
Classification Head: Outputs probability scores for multiple diseases per image
using fully connected layers and sigmoid activation.
Bounding Box Head: Predicts normalized bounding box coordinates (x, y, width,
height) indicating the region of pathology.
The final training objective is: Total Loss = Classification Loss + 0.1 *
Bounding Box Loss
The model was trained using the Adam optimizer (learning rate = 0.0001,
weight decay = 1e-4) on a dataset of labeled chest X-ray images. Image
augmentations like random crop, flipping, and normalization were applied during
training to improve generalization. During validation, fixed-size center crops
ensured consistent evaluation.
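A condensed sketch of one training step under this objective is given below, assuming a dual-output model such as the one sketched in Chapter 3 and the hyperparameters stated above (Adam, learning rate 1e-4, weight decay 1e-4, bounding-box weight 0.1).

import torch
import torch.nn as nn

def make_training_step(model):
    """Build a training step that combines classification and bounding-box losses."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)
    cls_criterion = nn.BCEWithLogitsLoss()        # multi-label classification loss
    bbox_criterion = nn.MSELoss()                 # bounding-box regression loss

    def train_step(images, labels, boxes):
        optimizer.zero_grad()
        cls_logits, bbox_pred = model(images)     # dual-head forward pass
        loss = cls_criterion(cls_logits, labels) + 0.1 * bbox_criterion(bbox_pred, boxes)
        loss.backward()
        optimizer.step()
        return loss.item()

    return train_step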
Disease Overview
Clinical Symptoms
Pathophysiological Insight
Technology Stack
CUDA: To leverage GPU acceleration and reduce training/inference time.
The integration of computer vision and NLP in this system exemplifies the
potential of AI in clinical diagnostics. This implementation sets the stage for further
innovations, including support for more imaging types and enhanced report personalization.
The implementation process was divided into the following key phases:
• Data Collection and Preprocessing
1. Gathering dermoscopic images and patient metadata from publicly
available medical repositories.
2. Applying data augmentation, normalization, and encoding
techniques to ensure data consistency.
• Model Development and Architecture Design
1. Implementing EfficientNetV2 as the image feature extractor.
2. Constructing a fully connected neural network (DNN) for tabular
data processing.
3. Integrating both feature sets using a concatenation layer to form a
multimodal model (a minimal sketch of this fusion follows this list).
• Training and Optimization
1. Implementing hyperparameter tuning, class balancing, and transfer
learning to improve model performance.
2. Applying data augmentation techniques to prevent overfitting and
enhance generalization.
• Performance Evaluation and Model Validation
1. Evaluating the model using accuracy, precision, recall, F1-score, and
AUC-ROC metrics.
2. Conducting comparative analysis with other deep learning models
such as ResNet and VGG16.
• Deployment Considerations and Future Enhancements
1. Exploring real-time clinical deployment for dermatologists and
healthcare professionals.
2. Investigating the integration of edge computing and cloud-based AI
solutions to enable scalable deployment in hospitals and
telemedicine applications.
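A compressed sketch of this multimodal fusion in Keras, consistent with the appendix listing (an EfficientNetV2 backbone from KerasCV plus an encoded tabular feature vector), is shown below; the input shapes, encoded feature width, and preset name are assumptions.

import keras
import keras_cv

# Image branch: EfficientNetV2 backbone followed by global pooling
image_input = keras.Input(shape=(224, 224, 3), name="images")
backbone = keras_cv.models.EfficientNetV2Backbone.from_preset("efficientnetv2_b0")  # preset name assumed
x1 = keras.layers.GlobalAveragePooling2D()(backbone(image_input))

# Tabular branch: encoded patient metadata
feat_input = keras.Input(shape=(64,), name="features")        # 64 is an assumed encoded width
x2 = keras.layers.Dense(64, activation="relu")(feat_input)

# Fusion: concatenate both feature sets, then classify
combined = keras.layers.Concatenate()([x1, x2])
x = keras.layers.Dense(256, activation="relu")(combined)
x = keras.layers.Dropout(0.3)(x)
output = keras.layers.Dense(1, activation="sigmoid", name="target")(x)

model = keras.Model(inputs=[image_input, feat_input], outputs=output)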
CHAPTER 5
RESULT AND DISCUSSION
Note: Metrics were computed on the test split using scikit-learn’s multi-label metrics and are
indicative of the model’s balanced performance across diseases.
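A sketch of how such multi-label metrics can be computed with scikit-learn is shown below, assuming y_true (binary ground truth) and y_prob (sigmoid probabilities) arrays of shape (num_samples, num_classes).

import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score

def evaluate_multilabel(y_true, y_prob, threshold=0.5):
    """Compute macro-averaged multi-label metrics on the test split."""
    y_pred = (y_prob >= threshold).astype(int)
    return {
        "precision": precision_score(y_true, y_pred, average="macro", zero_division=0),
        "recall": recall_score(y_true, y_pred, average="macro", zero_division=0),
        "f1": f1_score(y_true, y_pred, average="macro", zero_division=0),
        "auc_roc": roc_auc_score(y_true, y_prob, average="macro"),
    }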
CHAPTER 6
CONCLUSION
In addition to model development, the system includes a fully functional Streamlit web
application that allows users to interact with the pipeline seamlessly—uploading
images, viewing predictions, reading reports, and exporting them for clinical use.
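A minimal sketch of such a Streamlit front end is shown below; the helper functions predict and generate_report are hypothetical stand-ins for the project's actual inference and RAG modules.

import streamlit as st
from PIL import Image

st.title("Chest X-ray Diagnosis Assistant")

uploaded = st.file_uploader("Upload a chest X-ray", type=["png", "jpg", "jpeg"])
if uploaded is not None:
    image = Image.open(uploaded).convert("RGB")
    st.image(image, caption="Uploaded X-ray", use_column_width=True)

    findings = predict(image)                # hypothetical: model inference returning disease labels
    report = generate_report(findings)       # hypothetical: RAG-based report generation

    st.subheader("Predicted findings")
    st.write(findings)
    st.subheader("Generated report")
    st.write(report)
    st.download_button("Download report", report, file_name="report.txt")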
The practical implications of this system are significant in both clinical and research
contexts:
Clinical Decision Support: By automating disease detection and reporting, the
system reduces the diagnostic burden on radiologists and enhances decision-making
speed in high-demand scenarios like emergency rooms and outpatient departments.
Educational Value: The system can serve as a teaching aid for medical students,
demonstrating AI-powered image interpretation and encouraging the adoption of
technology in healthcare education.
Time and Cost Efficiency: Automated report generation and visual annotation
drastically reduce the time and effort required for manual documentation, enabling
healthcare facilities to allocate human resources more effectively.
While the system establishes a robust baseline for AI-assisted diagnostics, several
avenues for improvement and future exploration exist. For instance, extending the
RAG model to support multiple languages can widen accessibility across diverse
linguistic and cultural healthcare environments.
With continued iteration, validation, and integration, the system holds immense promise
as a deployable tool in real-world medical environments—revolutionizing how we
interpret medical imagery and deliver timely care to patients.
CHAPTER 7
FUTURE ENHANCEMENTS
The implementation of this AI-powered medical diagnostic system represents a significant
advancement in the application of artificial intelligence to healthcare. However, the journey
does not end here. While the current system serves as a powerful prototype, several future
enhancements can further elevate its clinical relevance, usability, and adaptability across
diverse healthcare ecosystems.
The following key areas have been identified as high-impact enhancements that will
significantly strengthen the system’s capabilities and extend its reach.
• Generate diagnostic reports in multiple languages, improving accessibility.
• Cater to a diverse, global user base, especially in multilingual countries.
• Enhance patient understanding and trust, as native-language medical reports are more
relatable and easier to interpret.
Future iterations can integrate language models like mBART or mT5, which are pre-trained on
multilingual corpora, to achieve high-quality cross-language report generation.
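A rough sketch of how an mBART-50 checkpoint from the Hugging Face Transformers library could translate a generated English report is shown below; the checkpoint name and the Tamil target-language code are illustrative assumptions.

from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")

def translate_report(report_text, target_lang="ta_IN"):
    """Translate an English report into the requested language (example: Tamil)."""
    tokenizer.src_lang = "en_XX"                            # reports are generated in English
    encoded = tokenizer(report_text, return_tensors="pt")
    generated = model.generate(
        **encoded,
        forced_bos_token_id=tokenizer.lang_code_to_id[target_lang])
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]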
7.5 Explainable AI (XAI)
One of the major concerns in medical AI is interpretability. Clinicians must understand how
and why a system arrived at a particular decision to trust and rely on it.
To address this, future versions can integrate Explainable AI (XAI) tools such as:
Grad-CAM: Visual heatmaps showing which regions of the X-ray influenced the
model’s prediction.
LIME / SHAP: Model-agnostic explanation frameworks that highlight the most
influential features.
Clinical Pathway Tracing: Textual explanations in the generated reports detailing why
certain disease labels were assigned.
These tools will not only boost clinician confidence but also aid in auditing and debugging
AI predictions, making the system safer and more transparent.
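A compact sketch of Grad-CAM for a ResNet-style backbone, implemented with forward and backward hooks, is given below; it assumes a dual-output model like the one sketched in Chapter 3 (class logits as the first output) and a caller-supplied target convolutional layer.

import torch
import torch.nn.functional as F

def grad_cam(model, image_tensor, class_idx, target_layer):
    """Return a heatmap showing which regions drove the score for class_idx."""
    activations, gradients = [], []
    fwd = target_layer.register_forward_hook(lambda m, i, o: activations.append(o))
    bwd = target_layer.register_full_backward_hook(lambda m, gi, go: gradients.append(go[0]))

    logits, _ = model(image_tensor.unsqueeze(0))           # dual-head model: (class logits, boxes)
    model.zero_grad()
    logits[0, class_idx].backward()                        # gradient of the chosen disease logit
    fwd.remove(); bwd.remove()

    acts, grads = activations[0], gradients[0]             # shape: (1, C, H, W)
    weights = grads.mean(dim=(2, 3), keepdim=True)         # global-average-pooled gradients
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image_tensor.shape[-2:], mode="bilinear", align_corners=False)
    return (cam / (cam.max() + 1e-8)).squeeze().detach().cpu().numpy()

For the dual-head model sketched earlier, target_layer could be the last residual stage of the backbone (for example, model.features[-2]).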
APPENDIX
!pip install keras_cv
import os
import keras_cv
import keras
import tensorflow as tf
import cv2
import pandas as pd
import numpy as np
import joblib
from google.colab import drive   # needed for drive.mount below
drive.mount('/content/drive')
import matplotlib.pyplot as plt
class CFG:
    verbose = 1            # Verbosity
    seed = 42              # random seed (value taken from the later CFG definition)
    lr_mode = "cos"        # LR scheduler mode from one of "cos", "step", "exp"
    class_names = ['target']
    num_classes = 1

keras.utils.set_random_seed(CFG.seed)
BASE_PATH = "/content/drive/MyDrive/FYv1_Data"
# Train + Valid
df = pd.read_csv(f'{BASE_PATH}/train-metadata.csv')
df = df.ffill()
display(df.head(10))
class CFG:
    """Configuration for the EfficientNetV2 image-plus-tabular experiment."""
    seed = 42
    model_name = "tf_efficientnetv2_b0"
import logging
logging.basicConfig(level=logging.INFO)
def visualize_sample(image, label):
    plt.imshow(image)
    plt.title(f"Label: {label}")
    plt.axis("off")
    plt.show()
import argparse

def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=7)
    return parser.parse_args()

args = get_args()
def load_image(path):
    """Read an image file from disk (decode step added to complete the truncated snippet)."""
    image = tf.io.read_file(path)
    image = tf.image.decode_jpeg(image, channels=3)
    return image

# Testing
testing_df = pd.read_csv(f'{BASE_PATH}/test-metadata.csv')
testing_df = testing_df.ffill()
display(testing_df.head(2))
display(df.target.value_counts(normalize=True)*100)
# Fragments of a PyTorch Dataset; the class and method signatures are reconstructed
# here to restore the structure implied by the snippets (class name is illustrative).
import torch
from torch.utils.data import Dataset
from PIL import Image

class ChestXrayDataset(Dataset):
    def __init__(self, df, mode="train", transform=None):
        self.df = df
        self.file_names = df["Id"].values
        self.mode = mode
        self.transform = transform

    def __len__(self):
        return len(self.file_names)

    def __getitem__(self, index):
        image_id = self.file_names[index]   # (image_path construction elided in the original listing)
        image = np.array(Image.open(image_path).convert("RGB"))
        if self.transform:
            image = self.transform(image=image)["image"]
        label = torch.tensor(self.labels[index]).float()
        return image, label
import albumentations as A
from albumentations.pytorch import ToTensorV2

def get_transforms(mode="train"):
    if mode == "train":
        return A.Compose([
            A.Resize(*CFG.image_size),
            A.HorizontalFlip(p=0.5),
            A.ShiftScaleRotate(p=0.5),
            A.RandomBrightnessContrast(p=0.5),
            A.Normalize(),
            ToTensorV2()
        ])
    else:
        return A.Compose([
            A.Resize(*CFG.image_size),
            A.Normalize(),
            ToTensorV2()
        ])
# Sampling
# Note: this samples the target==0 (majority/negative) rows at fraction CFG.neg_sample
positive_df = df.query("target==0").sample(frac=CFG.neg_sample,
                                           random_state=CFG.seed)
display(df.target.value_counts(normalize=True)*100)
# Assume df is your DataFrame and 'target' is the column with class labels
from sklearn.utils.class_weight import compute_class_weight

class_weights = compute_class_weight('balanced', classes=np.unique(df['target']),
                                     y=df['target'])
class_weights = dict(enumerate(class_weights))
import h5py
isic_id = df.isic_id.iloc[0]
byte_string = training_validation_hdf5[isic_id][()]
print("Image:")
plt.imshow(image);
display(df.target.value_counts(normalize=True)*100)
import torch.nn as nn

def main():
    import pandas as pd
    logger = logging.getLogger(__name__)          # logger added for the calls below
    logger.info("Reading data...")
    df = pd.read_csv(CFG.labels_path)
    model = CustomModel().to(CFG.device)
    criterion = nn.BCEWithLogitsLoss()
    best_loss = float('inf')
    # (epoch loop fragments from the original listing)
    logger.info(f"Epoch {epoch+1}/{CFG.epochs}")
    best_loss = valid_loss
    torch.save(model.state_dict(), CFG.model_path)

isic_id = df.isic_id.iloc[0]
df["fold"] = -1
training_df = df.query("fold!=0")
validation_df = df.query("fold==0")
training_df.target.value_counts()
validation_df.target.value_counts()
"tbp_tile_type","tbp_lv_location", ]
"tbp_lv_deltaLBnorm", "tbp_lv_minorAxisMM", ]
def build_augmenter():
    # Define augmentations (the layer list is truncated in the original listing)
    aug_layers = [
        keras_cv.layers.RandomFlip(mode="horizontal"),
    ]

    # Fragment of the dataset builder: apply options, batch, and prefetch
    ds = ds.with_options(opt)
    ds = ds.batch(batch_size, drop_remainder=drop_remainder)
    ds = ds.prefetch(AUTO)
    return ds
## Train
print("# Training:")
training_features = dict(training_df[FEAT_COLS])
training_ids = training_df.isic_id.values
training_labels = training_df.target.values
training_ds = build_dataset(training_ids, training_validation_hdf5, training_features,
                            training_labels, batch_size=CFG.batch_size,
                            shuffle=True, augment=True)

# Valid
print("# Validation:")
validation_features = dict(validation_df[FEAT_COLS])
validation_ids = validation_df.isic_id.values
validation_labels = validation_df.target.values
validation_ds = build_dataset(validation_ids, training_validation_hdf5, validation_features,
                              validation_labels, batch_size=CFG.batch_size,
                              shuffle=False, augment=False)
feature_space = keras.utils.FeatureSpace(
    features={
        "sex": "string_categorical",
        "anatom_site_general": "string_categorical",
        "tbp_tile_type": "string_categorical",
        "tbp_lv_location": "string_categorical",
        "age_approx": "float_discretized",
        # Numerical features to normalize
        "tbp_lv_nevi_confidence": "float_normalized",
        "clin_size_long_diam_mm": "float_normalized",
        "tbp_lv_areaMM2": "float_normalized",
        "tbp_lv_area_perim_ratio": "float_normalized",
        "tbp_lv_color_std_mean": "float_normalized",
        "tbp_lv_deltaLBnorm": "float_normalized",
        "tbp_lv_minorAxisMM": "float_normalized",
    },
    output_mode="concat",
)
feature_space.adapt(training_ds_with_no_labels)
for x, _ in training_ds.take(1):
    preprocessed_x = feature_space(x["features"])
    print("preprocessed_x.shape:", preprocessed_x.shape)
    print("preprocessed_x.dtype:", preprocessed_x.dtype)
training_ds = training_ds.map(
    lambda x, y: ({"images": x["images"],
                   "features": feature_space(x["features"])}, y),
    num_parallel_calls=tf.data.AUTOTUNE)

validation_ds = validation_ds.map(
    lambda x, y: ({"images": x["images"],
                   "features": feature_space(x["features"])}, y),
    num_parallel_calls=tf.data.AUTOTUNE)
# AUC
auc = keras.metrics.AUC()
# Loss
loss = keras.losses.BinaryCrossentropy(label_smoothing=0.02)
feat_input = keras.Input(shape=(feature_space.get_encoded_features().shape[1],),
                         name="features")
backbone = keras_cv.models.EfficientNetV2Backbone.from_preset(CFG.preset)
x1 = backbone(image_input)
x1 = keras.layers.GlobalAveragePooling2D()(x1)
x1 = keras.layers.Dropout(0.2)(x1)
# Model Summary
model.summary()
import math
plt.tight_layout()
plt.show()
ckpt_cb = keras.callbacks.ModelCheckpoint(
history = model.fit(
    training_ds,
    epochs=CFG.epochs,
    callbacks=[lr_cb, ckpt_cb],
    validation_data=validation_ds,
    verbose=CFG.verbose,
    class_weight=class_weights,
)
# Extract AUC and validation AUC from history
auc = history.history['auc']
val_auc = history.history['val_auc']
epochs = range(1, len(auc) + 1)
max_val_auc_epoch = np.argmax(val_auc)
max_val_auc = val_auc[max_val_auc_epoch]
plt.xlabel('epoch'); plt.ylabel('lr')
plt.title('LR Scheduler')
plt.show()
images = inputs["images"]
num_images, NUMERIC_COLUMNS = 8, 4
image = image.numpy().astype("float32")
target= target.numpy().astype("int32")[0]
plt.imshow(image)
plt.title(f"Target: {target}")
plt.axis("off")
plt.tight_layout()
plt.show()
# Plotting
plt.show()
# Best Result
best_score = max(history.history['val_auc'])
best_epoch= np.argmax(history.history['val_auc']) + 1
print("#" * 28)
# Inputs
backbone = keras_cv.models.EfficientNetV2Backbone.from_preset(
    CFG.preset,
    load_weights=True)
x = backbone(image_input)
x = keras.layers.GlobalAveragePooling2D()(x)
# Dense layers
x = keras.layers.Dense(256, activation="relu")(combined)
x = keras.layers.Dropout(0.3)(x)
x = keras.layers.Dense(64, activation="relu")(x)
x = keras.layers.Dropout(0.2)(x)
# Output
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4),
              loss=loss,
              metrics=[auc])
history = model.fit(
    training_ds,
    validation_data=validation_ds,
    epochs=CFG.epochs,
    class_weight=class_weights,
    verbose=CFG.verbose)
plt.legend()
plt.title("Training History")
plt.show()
REFERENCES
[1] Ahmed, I., Routh, B. B., & Kohinoor, M. S. R. (2024). Multi-model attentional
fusion ensemble for accurate skin cancer classification. IEEE Access, 12, 181009-
181022.
[2] Brinker, J., Hekler, F., & Enk, A. (2019). Deep learning outperforms
dermatologists in skin cancer classification. European Journal of Cancer, 119, 88-95.
[3] Codella, N. C., Rotemberg, V., Tschandl, P., Celebi, M. E., Dusza, S., Gutman, D.,
... & Halpern, A. (2019). Skin lesion analysis toward melanoma detection: A challenge
at the 2018 ISIC workshop. IEEE Journal of Biomedical and Health Informatics,
23(3), 1097-1108.
[4] Codella, C., Gutman, D., & Celebi, M. (2018). Skin lesion analysis toward
melanoma detection: A challenge at the ISIC 2017. IEEE Transactions on Medical
Imaging, 36(10), 2204–2216.
[5] Esteva, A., Kuprel, B., & Novoa, R. A. (2017). Dermatologist-level classification
of skin cancer with deep neural networks. Nature, 542(7639), 115-118.
[6] Fujisawa, Y., Otomo, Y., Ogata, Y., Nakamura, Y., Fujita, R., Ishitsuka, Y., ... &
Fujimoto, M. (2019). Deep learning-based evaluation of skin tumor classification using
convolutional neural networks. Journal of Dermatology, 46(11), 1006-1013.
[7] Guergueb, T., & Akhloufi, M. (2021). Melanoma skin cancer detection using recent
deep learning models. Annual International Conference of the IEEE Engineering in
Medicine and Biology Society.
[8] Han, S. S., Kim, M. S., Lim, W., Park, G. H., Park, I., & Chang, S. E. (2018).
Classification of the clinical images for benign and malignant cutaneous tumors using
a deep learning model. Journal of Investigative Dermatology, 138(7), 1529-1538.
[9] ISIC Archive. (2023). International Skin Imaging Collaboration (ISIC) dataset.
ISIC Archive. [Online]. Available: https://www.isic-archive.com
[10] Kawahara, J., BenTaieb, A., & Hamarneh, G. (2016). Deep features to classify
skin lesions. 2016 IEEE 13th International Symposium on Biomedical Imaging (ISBI).
[11] Li, Y., & Shen, L. (2018). Skin lesion analysis towards melanoma detection using
deep learning network. Sensors, 18(2), 556.
[12] Liu, S., Cheng, L., & Chen, Q. (2021). AI-powered detection of skin cancer using
deep convolutional neural networks. Journal of Biomedical Informatics, 112, 103622.
[14] Oliveira, R. B., Papa, J. P., Pereira, A. S., Tavares, J. M. R. S., & Marana, A. N.
(2018). Computational methods for the image segmentation of pigmented skin lesions:
A review. Computer Methods and Programs in Biomedicine, 157, 1-20.
[15] Ravi, V. (2022). Attention cost-sensitive deep learning-based approach for skin
cancer detection and classification. Cancers, 14(5872), 1-26.
[16] Redmon, J., & Farhadi, A. (2016). YOLO9000: Better, faster, stronger. arXiv
preprint arXiv:1612.08242. Available: https://arxiv.org/abs/1612.08242
[17] Sharad, M. N., Sharad, Z. M., Anwar, S. S., & Chaudhari, N. J. (2024). MalenoCare
- Skin cancer detection and prescription using CNN and ML. International Journal of
Advanced Research in Science, Communication and Technology.
[18] Tan, M., & Le, Q. (2021). EfficientNetV2: Smaller models and faster training.
Proceedings of the International Conference on Machine Learning (ICML).
[20] Tschandl, P., Rosendahl, C., & Kittler, H. (2018). The HAM10000 dataset: A
large collection of multi-source dermatoscopic images of common pigmented skin
lesions. Scientific Data, 5(1), 180161.
[21] Venugopal, V., Raj, N. I., & Nath, M. K. (2023). Deep neural network using
modified EfficientNet for skin cancer detection in dermoscopic images. Decision
Analytics Journal, 8, 100278.
[22] World Health Organization (WHO). (2023). Skin cancers. WHO. [Online].
Available: https://www.who.int/news-room/fact-sheets/detail/skin-cancers
[23] Xie, F., Fan, H., Li, Y., Jiang, Z., Meng, R., & Bovik, A. (2020). Melanoma
classification on dermoscopy images using a neural network ensemble model. IEEE
Transactions on Medical Imaging, 39(2), 413-422.
[24] Yap, J., Yolland, W., & Tschandl, P. (2018). Multimodal skin lesion classification
using deep learning. Experimental Dermatology.
[25] Yu, C., Yang, S., & Kim, K. (2021). Automated melanoma diagnosis using deep
learning and clinical images: A review. Journal of Clinical Medicine, 10(3), 574.
BIBLIOGRAPHY
[1] Bengio, Y., Courville, A., & Vincent, P. (2013). Representation Learning: A
Review and New Perspectives. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 35(8), 1798–1828.
[3] Shorten, C., & Khoshgoftaar, T. M. (2019). A survey on Image Data Augmentation
for Deep Learning. Journal of Big Data, 6(60).
[4] Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
[5] LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep Learning. Nature, 521(7553),
436– 444.
[6] Schölkopf, B., & Smola, A. J. (2002). Learning with Kernels: Support Vector
Machines, Regularization, Optimization, and Beyond. MIT Press.
[8] Zhang, Z., & Liu, Q. (2021). Deep Learning in Medical Image Analysis: Advances
and Applications. Academic Press.
[9] Chollet, F. (2021). Deep Learning with Python (2nd Edition). Manning Publications.
[10] Aggarwal, C. C. (2018). Neural Networks and Deep Learning: A Textbook. Springer.
[11] Tan, M., & Le, Q. V. (2021). EfficientNetV2: Smaller Models and Faster Training.
arXiv preprint arXiv:2104.00298.
[12] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep Residual Learning for Image
Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), 770–778.
[13] Simonyan, K., & Zisserman, A. (2014). Very Deep Convolutional Networks for
Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556.
[14] Szegedy, C., Ioffe, S., Vanhoucke, V., & Alemi, A. (2017). Inception-v4,
Inception- ResNet and the Impact of Residual Connections on Learning. AAAI
Conference on Artificial Intelligence, 4278–4284.
[15] Esteva, A., Kuprel, B., Novoa, R. A., Ko, J., Swetter, S. M., Blau, H. M., & Thrun,
S. (2017). Dermatologist-level classification of skin cancer with deep neural networks.
Nature, 542(7639), 115–118.
[16] Goyal, M., Knackstedt, T., Yan, S., & Hassanpour, S. (2020). Artificial
Intelligence- Based Image Classification Methods for Diagnosis of Skin Cancer:
Challenges and Opportunities. Cancers, 12(8), 2278.
[17] Yasaka, K., Abe, O. (2018). Deep Learning and Artificial Intelligence in
Radiology: Current Applications and Future Directions. PLoS One, 13(11), e0207122.
[18] Kooi, T., Litjens, G., Van Ginneken, B., et al. (2017). Large Scale Deep Learning
for Computer Aided Detection of Mammographic Lesions. Medical Image Analysis, 35,
303–312.
[19] Yonekura, H., et al. (2022). Performance Analysis of EfficientNet Models for Skin
Cancer Classification. IEEE Access, 10, 100045–100057.
[20] Han, S. S., et al. (2018). Deep Neural Networks for Skin Cancer Detection and
Diagnosis: A Survey. Expert Systems with Applications, 114, 15–28.
[23] National Cancer Institute. (2023). Epidermal Carcinoma Overview. Available at:
https://www.cancer.gov
[24] Skin Cancer Foundation. (2023). Early Detection of Epidermal Carcinoma Using
AI Techniques. Available at: https://www.skincancer.org
[25] arXiv.org. (2024). Recent Advances in Deep Learning for Skin Cancer Detection.
Available at: https://arxiv.org
[26] Google AI Blog. (2023). Advancements in Deep Learning for Medical Imaging.
Available at: https://ai.googleblog.com
[27] Kaggle. (2024). Skin Cancer Detection Datasets and Models. Available at:
https://www.kaggle.com
[28] ResearchGate. (2024). Medical Imaging and AI-Based Diagnosis. Available at:
https://www.researchgate.net
[30] SpringerLink. (2024). Deep Learning in Medical Image Processing. Available at:
https://link.springer.com
[31] Skin Cancer ISIC - The International Skin Imaging Collaboration. Available at:
https://www.isic-archive.com
CONFERENCE CERTIFICATES