Brain Tumour Final
Brain Tumour Final
Submitted by
    ARULARASU A                            (810421106013)
    DINESH KUMAR B                         (810421106025)
    ELAMURUGAN E                           (810421106027)
    HEMANATH V                             (810421106038)
               BACHELOR OF ENGINEERING
                              In
                                   I
          ANNA UNIVERSITY: CHENNAI 600 025
                        BONAFIDE CERTIFICATE
SIGNATURE SIGNATURE
(Autonomous) (Autonomous)
                                        II
                            ACKNOWLEDGEMENT
      Lastly, we would like to extend our gratitude to our cherished parents and
friends for their support in bolstering the success of our endeavor.
                                          III
                                    ABSTRACT
Brain tumors are among the most critical and life-threatening forms of cancer,
requiring early detection and accurate classification to enhance treatment outcomes.
This project proposes a hybrid deep learning model that combines Convolutional
Neural Networks and Transformer architectures to recognize and classify brain
tumors effectively from multi-modal MRI data. While CNNs are proficient in spatial
feature extraction, Transformers excel at modeling long-range dependencies and
contextual relationships. By integrating these two architectures, our approach
leverages the strengths of both for superior diagnostic performance. The model is
trained on multiple MRI modalities—including T1-weighted, T2-weighted, FLAIR,
and contrast-enhanced images—to provide comprehensive information about tumor
structure and progression. Preprocessing techniques such as skull stripping,
normalization, and data augmentation are applied to enhance image quality and
model robustness. Our hybrid model outperforms traditional CNN-only approaches
in terms of accuracy, precision, recall, and F1-score. The results indicate that the
synergy between CNNs and Transformers significantly improves the model’s
capability to distinguish between tumor types and grades. This approach holds
promise for assisting radiologists in making faster, more reliable decisions in clinical
environments. The ultimate aim is to create an intelligent, automated system that can
assist healthcare professionals in the early and accurate detection of brain tumors,
leading to better treatment planning and improved patient outcomes.
                                           IV
                           TABLE OF CONTENTS
CHAPTER                                                                  PAGE
                                       TITLE
  NO                                                                     NO.
ABSTRACT VII
LIST OF ABBREVATIONS
1 INTRODUCTION 1
2 LITERATURE SURVEY
                                        V
    2.8) Transformer-based Multi-modal Learning for Medical Image
                                                                    5
    Analysis
3 SYSTEM ANALYSIS
4.1) Introduction
                                    VI
    4.6) Technology Stack                  13
5.5)Testing Environment 16
5.6) Results 17
7.1) Introduction 22
                                    VII
    7.3) Data Augmentation                        23
8.1) Overview 27
9.1) Overview 32
                                   VIII
     9.3) Limitations                                                 32
10 REFERENCES 36
APPENDIX 41
                                    IX
                          LIST OF FIGURES
FIGURE                                                 PAGE
                                DESCRIPTION
 NO                                                    NO
                                       X
        LIST OF ABBREVATIONS
AI Artificial Intelligence
                   XI
OBJECTIVE
   The overarching objective of this project is to develop a robust, intelligent, and
automated system for brain tumor recognition using advanced deep learning
techniques. Specifically, the aim is to design a hybrid model that combines
Convolutional Neural Networks (CNNs) and Transformer architectures to enhance
the accuracy, sensitivity, and reliability of tumor detection and classification using
multi-modal MRI data.
   The deep learning model is structured to take advantage of CNNs for their
superior ability to learn local spatial features, such as texture, edges, and tumor
boundaries. Ultimately, the goal is to support healthcare professionals by providing
an AI-based diagnostic tool that reduces workload, minimizes diagnostic errors, and
improves patient outcomes through early and accurate brain tumor detection.
                                         XII
PROBLEM STATEMENT
   Brain tumors are one of the most challenging and deadly forms of cancer,
accounting for a significant number of neurological deaths worldwide. Early
detection and accurate classification are essential to improve the chances of survival
and to guide proper treatment planning. However, manual interpretation of MRI
scans is time-consuming, labor-intensive, and highly dependent on the expertise
of radiologists, which introduces the possibility of human error and subjective bias.
   In parallel, Transformer models, known for their success in NLP and recently
in computer vision (Vision Transformers), provide an alternative by offering self-
attention mechanisms that model global relationships between image patches.
However, using Transformers alone in medical imaging may not be optimal due to
their large data requirements and lack of fine-grained spatial feature extraction,
which CNNs handle well.
   Hence, the core problem addressed in this project is the development of a hybrid
deep learning model that not only leverages multi-modal MRI data but also
combines the local feature-learning capability of CNNs with the global attention
mechanism of Transformers. This approach aims to overcome the limitations of
existing models by providing a more accurate, context-aware, and comprehensive
solution for automated brain tumor detection and classification.
                                         XIII
                                  CHAPTER 1
                                INTRODUCTION
   This project introduces a hybrid deep learning model that fuses CNNs and
Vision Transformers to harness both local and global feature representations.
Furthermore, we utilize multi-modal MRI data, incorporating various imaging
modalities like T1, T2, FLAIR.The ultimate aim is to create an intelligent,
automated system that can assist healthcare professionals in the early and accurate
detection of brain tumors, leading to better treatment planning and improved patient
outcomes.
                                           1
                              CHAPTER 2
                          LITERATURE SURVEY
                                       2
the model to capture global dependencies, making it suitable for distinguishing
between different tumor types.
                                       3
tumors. The hybrid approach offers a new perspective in medical image analysis,
suggesting that combining different deep learning models can yield better
diagnostic performance.
                                        4
2.7)TITLE :DeepConvolutional Neural Network for Brain Tumorclassifiction
AUTHOR : R. R. Mehta, A. K. Majumdar
YEAR : 2020
DESCRIPTION:
   The authors developed a deep CNN model for classifying brain tumors into
different types using MRI images. They optimized the architecture using
dropout layers, ReLU activation, and batch normalization to improve training
stability and prevent overfitting. The system was tested on multiple public
datasets and achieved high accuracy, showcasing the effectiveness of deep
learning in tumor diagnosis.
                                        5
2.9) TITLE : 3D CNN and Transformer Model for Brain Tumor Detection
AUTHOR : L. Wang, M. Zhang, F. Chen
YEAR : 2022
DESCRIPTION
   A novel hybrid model combining 3D CNNs and transformers is proposed to
analyze volumetric brain MRI scans. The 3D CNN extracts spatial depth
information, while the transformer captures long-range dependencies across slices.
This dual approach enhances the ability to detect and classify tumors in 3D space.
The model outperformed standalone CNNs and transformers, for real-world
implementation.
                                       6
                                  CHAPTER-3
                              SYSTEM ANALYSIS
                                          7
                     Fig 3.1:Existing CNN based System
DISADVANTAGE
  • Existing CNN-based systems for brain tumor detection often struggle to
      generalize well across different datasets due to variations in MRI scanners and
      imaging conditions.
  • They also require large amounts of annotated data, which is difficult and time-
      consuming to obtain in the medical field.
  •   Additionally, these models lack interpretability, making it hard for clinicians
      to trust and understand the decision-making process.
                                         8
      Furthermore, unlike conventional systems that rely on single-modal data,
our system utilizes multi-modal MRI images (such as T1-weighted, T2-weighted,
and FLAIR), significantly improving tumor visibility and classification accuracy.
Description of Workflow:
                                       9
   6. Classification Layer: Fully connected layers that output tumor categories
      (e.g., glioma, meningioma, pituitary).
   7. Output: Tumor class and region.
ADVANTAGES
                                        10
                               CHAPTER-4
• Input Preprocessing
• Fusion Layer
• Classification Layer
• Output Interface
Each module works in tandem to ensure a comprehensive and robust system for
brain tumor classification.
                                        11
normalized, and augmented with rotations, flipping, and cropping techniques to
improve the model’s generalizability.
4.3.2 CNN Feature Extraction
  CNNs extract spatial features from the MRI images, highlighting tumor
boundaries and intensities. The architecture consists of multiple convolutional and
pooling layers.
4.3.3 Transformer Attention Block
  Transformers enhance the model’s ability to capture long-range dependencies in
the MRI images, which helps in identifying dispersed tumor characteristics.
4.3.4 Fusion Layer
    The CNN and Transformer outputs are merged using a feature fusion technique,
ensuring both local and global features are integrated before classification.
4.3.5 Classification Layer
    Fully connected dense layers classify the tumor type into categories such as
glioma, meningioma, pituitary, or no tumor.
4.3.6 Output Interface
    The system displays the classification result, making it suitable for real-time
medical use.
                                          12
                        Fig 4.1:System Architecture Design
                                        13
                                 CHAPTER-5
                      IMPLEMENTATION AND TESTING
                                         14
dataset and enhance the model's robustness against variations in image orientation
and scale. Histogram equalization was used to improve the contrast of low-visibility
images, making the tumor regions more prominent. The preprocessed dataset was
then split into training, validation, and test sets in a 70:15:15 ratio. Care was taken
to ensure that each subset maintained a balanced distribution of the four categories
glioma, meningioma, pituitary tumor, and no tumor—to prevent bias in the learning
process.
                                          15
5.4 Evaluation Matrix
       To evaluate the performance of the model, several well-established metrics
were used. Accuracy, which measures the proportion of correctly classified samples,
served as a primary indicator of the model's overall effectiveness. Precision and
recall were used to assess the model's ability to correctly identify tumor types while
minimizing false positives and false negatives, respectively. The F1-score, which is
the harmonic mean of precision and recall, provided a balanced measure of model
performance. In addition, the area under the Receiver Operating Characteristic curve
(ROC-AUC) was computed to evaluate the model’s classification capability across
various threshold settings. These metrics together offered a comprehensive view of
the system’s strengths and areas needing improvement. The results demonstrated
that the hybrid model consistently outperformed conventional CNN-only systems,
especially in scenarios involving low-contrast and multi-modal images.
                                         16
5.6 Results
     The hybrid model achieved excellent results in the classification of brain
tumors. On the test dataset, it reached a classification accuracy of 99.10%, which is
significantly higher than traditional approaches. The precision of 99.12% indicated
that the model rarely produced false positives, while the recall of 98.88% showed its
strong ability to detect all true tumor cases. The F1-score of 99.00% confirmed the
model’s balanced performance. The ROC-AUC value of 99.10% further validated
its exceptional classification capability. These results were consistent across
different tumor classes and were especially strong in distinguishing between visually
similar categories such as glioma and meningioma. The model’s success is largely
attributed to the synergistic use of CNNs and Transformers, as well as the rigorous
preprocessing and training strategies employed.
                                           17
5.8 Result Visualization
     Visual interpretation of the model’s internal processes is critical for building
trust in AI systems, especially in the medical field. To this end, several visual outputs
were generated to explain how the model arrived at its decisions. Heatmaps derived
from the Transformer’s attention layers highlighted the areas of the image the model
focused on while making predictions. These visualizations confirmed that the model
accurately localized tumor regions, even in complex cases. Additionally, feature
maps from the CNN layers illustrated how the network detected edges, textures, and
other relevant features at different stages of processing. Bar charts comparing
predicted and actual labels provided a clear overview of model accuracy for each
class. These visual tools not only validated the model’s performance but also made
the results interpretable for clinicians and researchers.
                                           18
                       CHAPTER-6
           HARDWARE AND SOFTWARE IMPLEMENTATION
                                          19
6.3 Software Implementation
   Python 3.10 was used as the primary programming language due to its
compatibility with scientific libraries and frameworks. TensorFlow 2.x and Keras
were utilized for building and training the hybrid deep learning model, offering high-
level abstractions and GPU support. OpenCV was employed for MRI image
processing tasks, including noise reduction, resizing, and normalization. Jupyter
Notebook was used as the development environment, facilitating interactive model
design and real-time result visualization. Additional libraries included NumPy and
Pandas for data manipulation, Matplotlib and Seaborn for visual analytics, and
Scikit-learn for performance metrics. TensorBoard provided live feedback during
training, helping monitor accuracy, loss, and other critical indicators.
                                          20
6.4 Integration and Workflow
   The hybrid model was integrated into a modular pipeline consisting of input
acquisition, preprocessing, classification, and output generation. The modular
architecture allowed independent upgrades or testing of each stage. Data
augmentation and preprocessing were handled automatically during training and
testing. The trained model was exported and loaded using TensorFlow's SavedModel
format, making it easy to deploy across various platforms including web, desktop,
or embedded systems.
                                         21
                                   CHAPTER-7
                   MODEL TRAINING AND VALIDATION
7.1 Introduction
     This chapter elaborates on the detailed procedures involved in training and
validating the proposed hybrid deep learning model for brain tumor recognition. The
success of any machine learning system depends heavily on how well it is trained
and how accurately it generalizes on unseen data. Hence, this chapter discusses the
dataset preparation, model architecture configuration, training strategies, validation
techniques, and the measures adopted to optimize the model’s performance.
                                         22
7.2 Dataset Preproceesing
    Before training, all MRI images were subjected to a preprocessing pipeline to
ensure consistency and quality. The MRI slices were resized to 224×224 pixels and
normalized to have pixel values between 0 and 1. Noise reduction techniques like
Gaussian blurring and histogram equalization were applied using OpenCV. Each
modality (T1, T2, FLAIR) was treated independently, and later combined in a multi-
channel format to enable the model to extract enriched features across different
tissue contrasts.
                                         23
The architecture included:
                                        24
7.7 Training Challenges
   • Several challenges arose during training:
   • High variance between different tumor shapes
   • Slight contrast differences between modalities
   • Overfitting due to class imbalance To mitigate these, class weights were
      introduced in the loss function, and synthetic samples were generated using
      SMOTE.
                                        25
Fig 7.2: Accuracy Lose of System
              26
                                       CHAPTER-8
                             RESULT AND DISCUSSION
8.1 Overview
    The model was evaluated using a publicly available multi-modal MRI brain
tumor dataset. This dataset contains annotated images across three modalities: T1-
weighted, T2-weighted, and FLAIR. It includes MRI scans categorized into three
major tumor classes: glioma, meningioma, and pituitary tumor. Each scan is pre-
labeled, which facilitates supervised learning. The dataset was divided into 70%
training, 15% validation, and 15% testing.
                                         27
8.4 Performance Metrics
• AUC-ROC: Area under the curve that quantifies overall classification ability.
                                          28
8.6 Confusion Matrix Analysis
      A confusion matrix was plotted to examine how well the system distinguishes
between tumor classes. It shows minimal misclassifications. Most errors occurred
between glioma and meningioma, likely due to visual similarity in certain MRI
slices.
                                       29
8.8 Visual Interpretation of Predictions
    Prediction samples were collected and compared with ground truth labels. In
almost all cases, the system accurately identified the tumor type. The prediction
overlay maps demonstrate the model's sensitivity in detecting tumor boundaries.
                                       30
                           Accuracy       Precision     Recall     F1-Score
      Model
                             (%)            (%)          (%)         (%)
  EfficientNetV2             96.2           95.8         95.4        95.6
Vision Transformer
                             97.4           97.0         96.6         96.8
       (ViT)
Hybrid CNN + ViT
                             98.6           98.2         98.0         98.1
    (Proposed)
                                    31
                                    CHAPTER-9
                     CONCLUSION AND FUTURE WORK
9.1 Overview
  The proposed hybrid deep learning system for brain tumor detection and
classification presents a powerful fusion of Convolutional Neural Networks (CNNs)
and Transformer models. By leveraging the strengths of both architectures, the
system effectively addresses several limitations inherent in traditional methods. The
use of multi-modal MRI data, advanced feature extraction, and attention-based
encoding provides a robust and scalable solution suitable for real-time medical
diagnostics.
                                           32
reducing human intervention and enhancing throughput. In addition, the attention
mechanism in Transformers provided interpretability, which is valuable in the
medical domain. Experimental comparisons highlighted a 7–10% increase in
classification accuracy and a reduction in false positives.
9.4 Limitations
    Despite promising results, the system does have certain limitations. First, the
computational complexity of combining CNN and Transformer architectures
requires substantial hardware resources, which may limit accessibility in resource-
constrained environments. Second, the model was trained on a limited dataset, which
could lead to overfitting or biased predictions if not adequately generalized.
Additionally, variability in MRI scanners and acquisition protocols across hospitals
may affect performance when deployed widely.
                                          33
GPUs for real-time clinical use. Incorporating semi-supervised or unsupervised
learning techniques may improve generalization across diverse datasets. Moreover,
federated learning can be introduced to train the model on decentralized data from
multiple hospitals while preserving data privacy. Another extension could involve
3D CNNs or video transformers to utilize volumetric and temporal information for
more accurate detection.
                                          34
   •     Utilize saliency maps (e.g., Grad-CAM or LIME) to highlight tumor regions.
   •     Generate textual justifications mimicking radiologist-like interpretations.
   •     Working with regulatory bodies like the FDA and CDSCO to conduct clinical
         validation trials will be essential to gain approval for real-world use.
                                            35
                                 CHAPTER-10
                                REFERENCES
[1] Pereira, S., Pinto, A., Alves, V., & Silva, C. A. (2016). Brain tumor segmentation
using convolutional neural networks in MRI images. IEEE Transactions on Medical
Imaging, 35(5), 1240–1251. This paper forms the foundation of CNN-based brain
tumor segmentation with efficient learning techniques tailored to 3D MRI data.
[3] Isensee, F., et al. (2021). nnU-Net: Self-adapting framework for biomedical
image segmentation. Nature Methods, 18, 203–211. Discusses dynamic neural
network architectures that auto-configure for optimal performance on medical
datasets.
[4] Kamnitsas, K., et al. (2017). Efficient multi-scale 3D CNN with fully connected
CRF for accurate brain lesion segmentation. Medical Image Analysis, 36, 61–78.
This paper introduces a 3D CNN approach combined with CRFs for improved lesion
boundary detection.
[5] Chen, L. C., et al. (2018). Encoder-decoder with atrous separable convolution
for semantic image segmentation. ECCV. Explores DeepLabV3+ which
significantly enhances feature granularity for semantic regions.
                                         36
[6] Vaswani, A., et al. (2017). Attention is all you need. Advances in Neural
Information Processing Systems. Pioneers the Transformer architecture that enables
long-range dependencies in sequential data.
[7] Dosovitskiy, A., et al. (2021). An image is worth 16x16 words: Transformers for
image recognition at scale. ICLR. Demonstrates the feasibility of applying pure
Transformers to image classification tasks.
[8] Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks
for biomedical image segmentation. MICCAI, 234–241. Introduces the U-Net
architecture that became a standard for biomedical segmentation.
[9] Selvaraju, R. R., et al. (2017). Grad-CAM: Visual explanations from deep
networks via gradient-based localization. ICCV. A visualization technique crucial
for explaining CNN decisions in medical imaging.
[10] He, K., et al. (2016). Deep residual learning for image recognition. CVPR.
Presents ResNet, a highly influential model enabling deeper networks without
vanishing gradients.
[11] Huang, G., et al. (2017). Densely connected convolutional networks. CVPR.
Introduces DenseNet, which reuses features and improves gradient flow through
deep architectures.
[12] Çiçek, Ö., et al. (2016). 3D U-Net: Learning dense volumetric segmentation.
MICCAI. Extends U-Net to 3D, supporting volumetric data typical in MRI.
                                        37
[13] Litjens, G., et al. (2017). A survey on deep learning in medical image analysis.
Medical Image Analysis, 42, 60–88. Offers a comprehensive review of deep learning
applications in medical imaging.
[15] Wang, G., et al. (2018). Interactive medical image segmentation using deep
learning with image-specific fine-tuning. IEEE Transactions on Medical Imaging,
37(7), 1562–1573. Shows how deep learning models can be customized for
individual patient scans.
[16] Menze, B. H., et al. (2015). The Multimodal Brain Tumor Image Segmentation
Benchmark (BRATS). IEEE Transactions on Medical Imaging, 34(10), 1993–2024.
Establishes the BRATS dataset standard used in tumor research.
[17] Shen, D., Wu, G., & Suk, H. I. (2017). Deep learning in medical image analysis.
Annual Review of Biomedical Engineering, 19, 221–248. Discusses broad
applications and ethical considerations in AI for healthcare.
[18] Liu, Z., et al. (2021). Swin Transformer: Hierarchical vision transformer using
shifted windows. ICCV. Proposes an efficient hierarchical Transformer architecture
suitable for high-resolution images.
                                          38
[19] Gao, Y., et al. (2022). ConvTrans: A convolutional Transformer network for
medical image segmentation. Medical Image Analysis, 78, 102432. Combines CNNs
and Transformers for better spatial-contextual balance.
[20] Zhou, Z., et al. (2019). UNet++: Redesigning skip connections to exploit
multiscale features in image segmentation. IEEE Transactions on Medical Imaging,
39(6), 1856–1867. Improves segmentation via nested and dense skip pathways.
[21] Hatamizadeh, A., et al. (2022). UNETR: Transformers for 3D medical image
segmentation. CVPR. Integrates Transformer encoders into 3D medical image
models for enhanced feature learning.
[22] Bai, W., et al. (2018). Automated cardiovascular magnetic resonance image
analysis with fully convolutional networks. Journal of Cardiovascular Magnetic
Resonance, 20(1), 65. Applies FCNs in a real-world hospital imaging workflow.
[23] Milletari, F., Navab, N., & Ahmadi, S. A. (2016). V-Net: Fully convolutional
neural networks for volumetric medical image segmentation. 3DV. A 3D model
well-suited for MRI volumes of brain structures.
[24] Abadi, M., et al. (2016). TensorFlow: A system for large-scale machine
learning. OSDI. Presents the software backbone for deploying scalable medical AI
applications.
[25] Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization.
ICLR. Introduces the optimization algorithm widely used in training medical image
classification models.
                                        39
                                   APPENDIX
SOURCE CODE
11.1 Data Preprocessing Code (Multi-modal MRI Data Loading and
Normalization)
import nibabel as nib import numpy as np from sklearn.preprocessing import
MinMaxScaler
t1 = load_img(t1_path)
t2 = load_img(t2_path)
flair = load_img(flair_path)
scaler = MinMaxScaler()
normalized = scaler.fit_transform(combined.reshape(-
1,3).return(combined)
return normalized
import torch
import torch.nn as nn
                                         40
class CNNFeatureExtractor(nn.Module):
def init(self):
super(CNNFeatureExtractor, self).init()
self.features = nn.Sequential(
nn.ReLU(),
nn.MaxPool3d(2)
nn.ReLU(),
nn.MaxPool3d(2),
return self.features(x)
class TransformerModule(nn.Module):
super(TransformerModule, self).init()
                                          41
self.transformer_encoder = TransformerEncoder(encoder_layers, num_layers=6)
x = self.linear_proj(x)
x = self.transformer_encoder(x)
return x
class BrainTumorClassifier(nn.Module):
super(BrainTumorClassifier, self).init()
self.cnn = CNNFeatureExtractor()
self.transformer = TransformerModule()
self.classifier = nn.Sequential(
nn.Linear(512, 128),
nn.ReLU(),
nn.Linear(128, num_classes)
features = self.cnn(x)
                                           42
  features = features.view(features.size(0), -1, 64)
encoded = self.transformer(features)
encoded = encoded.mean(dim=1)
out = self.classifier(encoded)
return out
model.eval()
predictions = []
with torch.no_grad():
inputs = batch['image'].to(device)
outputs = model(inputs)
_, preds = torch.max(outputs, 1)
predictions.extend(preds.cpu().numpy())
return predictions
43