
Artificial Intelligence Review (2025) 58:262

https://doi.org/10.1007/s10462-025-11258-y

Alzheimer’s disease detection using deep learning and machine learning: a review

Saeed Mohsen1,2
saeed.mohsen@ksiu.edu.eg

1 Department of Electronics and Communications Engineering, Al-Madinah Higher Institute for Engineering and Technology, Giza 12947, Egypt
2 Department of Artificial Intelligence Engineering, Faculty of Computer Science and Engineering, King Salman International University (KSIU), South Sinai 46511, Egypt

Accepted: 11 May 2025
© The Author(s) 2025
Abstract
Alzheimer’s disease (AD) is a progressive neurodegenerative disorder that significantly
impacts cognitive function, posing challenges in early diagnosis and treatment. Advances
in artificial intelligence (AI) have revolutionized medical image analysis, providing robust
frameworks for accurate and automated AD detection. This paper reviews recent develop-
ments in deep learning (DL) and machine learning (ML) models for AD classification, like
convolutional neural networks (CNNs), transfer learning, hybrid architectures, and novel
attention mechanisms. Additionally, applications of AI models to AD, datasets, preprocessing techniques, challenges, and recent studies in this field are discussed. The paper also covers the medical imaging modalities used, risk factors for Alzheimer's disease, the progression stages of the disease, and several metrics for assessing AI model performance, including accuracy, Matthews correlation coefficient (MCC), F1-score, recall, precision, area under the receiver operating characteristic (ROC) curve, confusion matrix, and loss. Further, the paper presents several comparisons of different DL approaches for AD, limitations, new trends, suggestions, and future directions for this evolving field.

Keywords Deep learning · Alzheimer disease · Artificial intelligence · Datasets · Transfer learning · Evaluation metrics · CNN · Medical applications

1 Introduction

Early detection of Alzheimer’s disease (AD) is essential for implementing effective pre-
ventative strategies. As the most prevalent chronic condition among the elderly, AD affects
a significant portion of the aging population, making timely diagnosis crucial to managing
its high incidence rate. AD is a leading cause of dementia worldwide, necessitating timely

and precise diagnostic techniques. AD has multiple potential causes and affects thinking, memory, and behavior in older individuals (Chaddad et al. 2018). Traditional
approaches rely on clinical assessments and neuroimaging analysis, often involving signifi-
cant manual intervention (Iqbal et al. 2023).
Deep learning (DL), with its ability to automatically extract meaningful features from
data, has emerged as a transformative tool in medical imaging, particularly in detecting and
classifying AD from modalities such as positron emission tomography (PET), magnetic
resonance imaging (MRI), and computed tomography (CT) scans (Kaya and Çetın-Kaya
2024). Classification tasks in AD detection aim to categorize patients into healthy, mild
cognitive impairment (MCI), or AD groups. This process helps in early intervention, par-
ticularly in MCI cases where the disease might progress to AD (Pei et al. 2022).
The progression of brain stages in AD is demonstrated in Fig. 1, which highlights the differences between no impairment “normal cognition (NC)” as the healthy stage, mild decline “mild cognitive impairment (MCI)” as the middle stage, and severe decline “Alzheimer’s disease (AD)” as the clinical stage (Bellio 2021).
The likelihood of developing AD grows with age, particularly among individuals over
65 (Grundman and Petersen 2004). Several factors may contribute to an increased risk of Alzheimer's, including:

● Genetics: specific genetic variations have been linked to a heightened risk of Alzheimer's.
● Environmental influences: exposure to harmful substances or experiencing head injuries can raise the likelihood of developing the disease.
● Lifestyle choices: unhealthy habits such as poor nutrition, inactivity, and other detrimental lifestyle behaviors may play a role in increasing the risk.
● Medical conditions: health issues like hypertension, diabetes, and elevated cholesterol levels are also associated with a greater risk of Alzheimer's (Malik et al. 2024).

Figure 2 demonstrates some factors that increase the risk of Alzheimer disease.

Fig. 1 Progress of brain stages for Alzheimer’s disease (Bellio 2021)


Fig. 2 Factors of increasing risk of Alzheimer disease

Fig. 3 Human brain mass for healthy and Alzheimer's

The commonly used neuroimaging modalities include MRI that provides structural
information, enabling the detection of brain atrophy, PET that offers functional insights,
capturing metabolic changes, and CT that helps in identifying structural abnormalities (Wu
et al. 2019; AlSaeed and Omar 2022). Alzheimer's disease leads to a significant reduction in brain mass; Fig. 3 illustrates human brain mass for healthy individuals and those with Alzheimer's. In individuals
with normal cognition, brain shrinkage is minimal or negligible. However, those with MCI
experience an accelerated reduction in brain volume, losing around 1–2% annually—sig-
nificantly faster than typical aging (Huang et al. 2020). In AD, the rate of brain volume loss
increases further, reaching 3–5% per year. Regions like the hippocampus are particularly
affected, with shrinkage rates as high as 10–15% annually in advanced stages (Sungura et
al. 2021).
The main contributions of this review are the following:

● Summarizing the current AI models, whether DL or ML, for classifying Alzheimer’s disease (AD) in brain MRIs.
● Reviewing several applications of AD detection using different AI models.
● Summarizing the datasets, preprocessing techniques, and various evaluation metrics of
AI models for AD detection.
● Reviewing the recent studies in this field with comparisons of DL approaches for AD detection.
● Comparing AI approaches versus traditional AD diagnostic methods.
● Presenting a discussion on real-world implementation of AI models and dataset adapt-
ability.
● Providing a discussion on the overfitting problem in DL models and their mitigation methods.
● Discussing the potential bias in AI models and mitigation strategies for this bias.
● Presenting a discussion on the challenges and limitations in this field of AD.
● Discussing the computational cost and latency in clinical AI deployment.
● Presenting new trends, suggestions, and future directions in DL for AD detection.
● Comparing several studies in AD detection in terms of the performance, methodology,
and key contributions.

The remainder of this paper is organized as follows: Sect. 2 presents AI models for AD
classification and detection. Section 3 provides several applications of deep learning in
Alzheimer disease detection. Section 4 presents different datasets and preprocessing techniques for Alzheimer disease detection. Section 5 presents recent studies in AD. Section 6 presents a comparison of AI approaches versus traditional AD diagnostic methods. Section 7 discusses
real-world implementation of AI models and dataset adaptability. Section 8 explains the
overfitting problem in DL models and their mitigation methods. Section 9 provides some
medical modalities used with AD. Section 10 presents the evaluation metrics used in AD
detection, while the potential bias in AI models and mitigation strategies are presented in Sect. 11. Challenges and limitations in this field and comparisons of DL approaches for AD are introduced in Sect. 12. Section 13 presents a discussion about computational cost
and latency in clinical AI deployment. New trends and suggestions in DL for AD detection
are presented in Sect. 14. Future directions in deep learning for AD detection are discussed
in Sect. 15. Section 16 introduces a benchmark comparison of DL models for AD detection.
Finally, the paper is concluded in Sect. 17.

2 Key AI models for AD classification and detection

This paper reviews state-of-the-art artificial intelligence (AI) models employed for AD
detection, highlighting classification methodologies, their performance, and the challenges
faced in clinical translation. So, several advancements in AI models for AD classification
are presented as follows.
Convolutional Neural Networks (CNNs) are widely used for image-based AD detection
due to their ability to extract hierarchical features efficiently (Feng et al. 2019). They consist
of convolutional layers that learn spatial patterns and pooling layers that reduce compu-
tational complexity while preserving essential information. CNNs have proven effective
in tasks like brain tumor classification (Mohsen et al. 2023a), object detection (Zou et al.
2023), and segmentation (Sahu et al. 2018; Mohsen et al. 2024).
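To make the layer structure concrete, the following minimal Keras sketch assumes 2D MRI slices resized to 128 × 128 grayscale pixels and a hypothetical four-class labeling; it illustrates the convolution–pooling pattern described above rather than the exact architecture of any cited study.

```python
# Minimal CNN sketch for slice-based AD classification (illustrative only;
# input size and class count are assumptions, not taken from a cited study).
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(input_shape=(128, 128, 1), num_classes=4):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        # Convolutional blocks learn spatial patterns; pooling reduces resolution
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu", padding="same"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(64, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_cnn()
model.summary()
```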
Long Short-Term Memory (LSTM) networks are a type of RNN designed for sequential
data processing, addressing RNN limitations with memory cells and gating mechanisms.
The forget gate removes irrelevant data, while input and output gates regulate information
storage and usage (Nagabushanam et al. 2020; Chakraborty et al. 2020). Their ability to cap-
ture long-term dependencies makes them effective in tasks like speech recognition, machine
translation, and stock price prediction.
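As a brief illustration of the gated sequence processing described above, the following Keras sketch defines an LSTM classifier for sequences of 10 time steps with 32 features each; the shapes and the binary target are assumptions chosen only for the example.

```python
# Minimal LSTM sketch for sequence classification (illustrative shapes only).
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(10, 32)),          # (time steps, features per step)
    layers.LSTM(64),                        # gating mechanisms retain long-term context
    layers.Dense(1, activation="sigmoid"),  # binary output (e.g., decline vs. stable)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```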
Hybrid models integrate multiple deep learning paradigms, such as CNNs with attention
mechanisms or RNNs, to enhance interpretability and performance (Mohsen et al. 2021a).
They are valuable for complex tasks like medical imaging and disease diagnosis, where

balancing accuracy and explainability is crucial. Examples include CNN + RNN models
for analyzing time-series MRI data to predict disease progression and multi-branch hybrid
models that combine CNNs trained on different modalities (e.g., MRI, PET) with a shared
attention mechanism for comprehensive insights (Prakash et al. 2023).
Transfer learning uses pre-trained DL models, trained on huge datasets, that are designed to be fine-tuned for particular tasks. These models, such as BERT for language tasks or
VGG16 for image tasks, leverage transfer learning to adapt general knowledge to domain-
specific problems (Minoofam et al. 2023). Pre-trained models save time and computational
resources while achieving high performance in tasks like medical imaging, sentiment analy-
sis, and natural language processing. Also, pre-trained models such as Inception and ResNet
have been adapted for AD detection. Fine-tuning these networks on medical imaging data-
sets reduces the need for extensive labeled data and accelerates model convergence. For
example, combining CNNs with pre-trained VGG16 networks significantly improved clas-
sification accuracy in Alzheimer's studies.
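A hedged sketch of this fine-tuning workflow is shown below, using a frozen ImageNet-pretrained VGG16 backbone with a new classification head for three assumed classes (healthy, MCI, AD); the input size and training settings are illustrative choices, not those of a specific study reviewed here.

```python
# Transfer learning sketch with a pre-trained VGG16 backbone (illustrative only).
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze ImageNet features; optionally unfreeze later

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(3, activation="softmax"),  # assumed classes: healthy / MCI / AD
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Fine-tuning option: unfreeze the last convolutional block with a low learning
# rate once the new head has converged on the medical imaging dataset.
```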
Generative adversarial network (GAN) includes two networks—a generator and a dis-
criminator—working in opposition. The generator creates fake data, while the discriminator
evaluates whether the data is fake or real. Through this adversarial training process, the gen-
erator learns to produce highly realistic data. GANs are utilized in several tasks, for example image synthesis, deepfake creation, and data augmentation for training other models (Tiwari et al. 2020).
Autoencoder is a neural network designed to learn efficient representations of data
through compression and reconstruction. It has two main components: an encoder that
compresses the input into a latent representation and a decoder that reconstructs the input
from this representation. Autoencoders are useful for tasks such as noise reduction, anomaly
detection, and generating new data. Variational Autoencoders (VAEs) and Convolutional
Autoencoders are common variants used in different domains (Failed 2019).
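The encoder–decoder structure can be written compactly; the sketch below is a minimal dense autoencoder with an assumed 784-dimensional input and a 32-dimensional latent space, trained to reconstruct its own input.

```python
# Minimal dense autoencoder sketch (dimensions are illustrative assumptions).
import tensorflow as tf
from tensorflow.keras import layers, models

inputs = layers.Input(shape=(784,))
latent = layers.Dense(32, activation="relu")(inputs)       # encoder: compress
outputs = layers.Dense(784, activation="sigmoid")(latent)  # decoder: reconstruct
autoencoder = models.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
# Training reproduces the input: autoencoder.fit(x, x, epochs=..., batch_size=...)
```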
Vision transformer (ViT) introduces a transformer-based architecture to image process-
ing, replacing the traditional convolutional approach. It divides images into patches, which
are treated as input tokens for a transformer model. By applying self-attention mechanisms,
ViTs focus on relationships between patches, making them effective for tasks like object
detection, image classification, and segmentation. Their scalability allows them to excel
with large datasets, though they may require significant computational resources (Gong et
al. 2024).
Deep neural network (DNN) is a general framework for neural networks with multiple
layers. It includes an input layer, hidden layers, and an output layer, with each neuron in
one layer linked to each neuron in the next. DNNs use optimization algorithms like gradi-
ent descent and backpropagation to learn patterns from data. This versatile architecture is
applied across diverse domains, including image recognition and financial modeling, offering a robust foundation for many deep learning solutions (Wu et al. 2024).
Attention-based models, such as dual-attention mechanisms, focus on clinically relevant
regions, improving detection accuracy and reducing false positives. A common recent attention model is based on the squeeze-and-excitation (SE) block. Examples
include self-attention in transformers or spatial and channel attention in convolutional set-
tings. Attention models can highlight critical regions in brain MRI scans that contribute to
disease diagnosis, aiding clinicians in understanding model decisions (Li et al. 2024).
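For reference, a squeeze-and-excitation block of the kind mentioned above can be sketched in Keras as follows; the reduction ratio of 16 is the conventional default rather than a value taken from a cited study.

```python
# Sketch of a squeeze-and-excitation (SE) channel-attention block (illustrative).
import tensorflow as tf
from tensorflow.keras import layers

def se_block(feature_map, reduction=16):
    channels = feature_map.shape[-1]
    # Squeeze: summarize each channel by global average pooling
    s = layers.GlobalAveragePooling2D()(feature_map)
    # Excitation: learn per-channel importance weights in (0, 1)
    s = layers.Dense(channels // reduction, activation="relu")(s)
    s = layers.Dense(channels, activation="sigmoid")(s)
    s = layers.Reshape((1, 1, channels))(s)
    # Rescale the original feature map channel by channel
    return layers.Multiply()([feature_map, s])

inputs = tf.keras.Input(shape=(32, 32, 64))   # assumed feature-map size
outputs = se_block(inputs)
model = tf.keras.Model(inputs, outputs)
```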


K-nearest neighbors (KNN) is a simple, non-parametric supervised learning (SL) algorithm utilized for regression and classification applications. KNN works by finding the k closest data points to a given query point, based on a distance measure such as the Euclidean distance, and then predicting the output from the majority class of these neighbors for classification or their mean for regression. It is easy to implement and interpretable but can be computationally expensive for huge datasets because of the need to calculate distances for all points. KNN performs well with small datasets and when features
are scaled appropriately (Mohsen et al. 2021b).
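A short scikit-learn sketch of KNN classification is given below; the synthetic feature matrix stands in for tabular features (for example, volumetric measures extracted from MRI), and scaling is applied first because KNN is distance-based.

```python
# Illustrative KNN sketch on synthetic tabular features (placeholder data).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Scale features, then vote among the k = 5 nearest neighbors
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)
print("Test accuracy:", knn.score(X_test, y_test))
```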
Support vector machine (SVM) is an SL algorithm primarily utilized for classification applications, though it can also be applied to regression problems. The SVM operates by determining a hyperplane that optimally separates data points of various categories in a high-
dimensional space. It maximizes the margin between the hyperplane and the nearest data
points from each class, called support vectors. The SVM can also handle non-linear prob-
lems using kernel tricks like the radial basis function (RBF). It is effective in high dimen-
sional spaces, but can be computationally intensive with large datasets (Kazemi et al. 2022).
Decision tree (DT) is a tree-structured model utilized for regression and classification
tasks. It splits the dataset into subsets based on feature values, forming a tree with decision
nodes and leaf nodes. Each decision node represents a test on a feature, while leaf nodes
provide the output prediction (Rahmatillah et al. 2023). DT is easy to interpret and can handle both numerical and categorical data. However, it is prone to over-fitting, which can be mitigated using methods such as pruning or combining multiple trees in ensemble methods (Aaboub et al.
2023).
Ensemble learning is a powerful ML approach that improves model accuracy by combining multiple weak learners. Techniques include bag-
ging (reducing variance, e.g., Random Forest), boosting (reducing bias, e.g., AdaBoost,
Gradient Boosting), and stacking (using a meta-learner for better predictions). These meth-
ods often outperform individual models and are widely used in competitions like Kaggle for
their robustness and ability to reduce overfitting (Ramteke and Maidamwar 2023).
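The bagging, boosting, and combination ideas above can be sketched with scikit-learn as follows; the synthetic data is a placeholder for extracted imaging or clinical features, and the estimator choices are illustrative.

```python
# Illustrative ensemble sketch: bagging (random forest), boosting (AdaBoost),
# and a soft-voting combination over placeholder data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=25, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),  # bagging
        ("ada", AdaBoostClassifier(random_state=0)),                        # boosting
        ("lr", LogisticRegression(max_iter=1000)),                          # baseline learner
    ],
    voting="soft",  # average predicted probabilities
)
print("5-fold CV accuracy:", cross_val_score(ensemble, X, y, cv=5).mean())
```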
Artificial neural network (ANN) is a model inspired by biological neural networks
(BNNs). It includes an input layer, one or more hidden layers, and an output layer. Each layer con-
tains nodes linked by weights, which are updated during training to minimize the error.
ANNs can model complex relationships in data and are utilized for applications like regres-
sion, classification, and pattern identification. However, they need significant computational
resources and large datasets to perform well (Khan et al. 2023; Aswin et al. 2022).
Logistic regression is a model utilized for binary classification tasks. Despite its name,
it is a classification algorithm rather than a regression method. Logistic regression uses the
sigmoid function to map linear combinations of input features into a probability between 0
and 1. The output is thresholded (e.g., at 0.5) to assign class labels. Logistic Regression is
simple and works well for linearly separable data, but it struggles with non-linear relations
unless extended with feature transformations (Yadhukrishnan and MA 2023).
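For reference, the sigmoid mapping and thresholding step described above can be written as follows (assuming a feature vector x, learned weights w, and bias b; the 0.5 threshold is the common default and can be tuned):

$$\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = \mathbf{w}^{\top}\mathbf{x} + b, \qquad
\hat{y} = \begin{cases} 1, & \sigma(z) \ge 0.5,\\ 0, & \text{otherwise.} \end{cases}$$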
Naive Bayes (NB) is a model based on Bayes' rule, assuming conditional independence
between features given the class label. Despite its "naive" assumption of feature indepen-
dence, it often performs surprisingly well for text classification and spam filtering. Variants
include Gaussian NB (for continuous features) and Multinomial Naive Bayes (for count-
based features like word frequency). It is efficient, requires minimal data, and is robust
to irrelevant features, but its independence assumption may limit performance in certain

datasets (Vijay and Verma 2023). Figure 4 summarizes the most common AI models for AD classification.

3 Several applications of Alzheimer disease detection with AI models

Artificial Intelligence (AI) has revolutionized the field of medical imaging and diagnostic
analysis. Its application in Alzheimer’s disease (AD) detection spans a wide range of clini-
cal, research, and technological domains. The applications of AI in this field span diagnostic,
therapeutic, and research domains. By leveraging advanced AI architectures and integrating
multimodal data, these models hold immense potential to improve early detection, stream-
line clinical workflows, and enhance personalized care for AD patients. We present many key applications, highlighting their impact, as follows:

Fig. 4 Most common AI models for AD classification


1. Early Diagnosis and Risk Assessment: DL models enable early detection of MCI, a
precursor to AD, from structural and functional brain imaging. This application supports timely interventions, slowing disease progression. Also, it aids in personalized treatment planning (Shuvo et al. 2022).
2. Clinical Decision Support Systems (CDSS): Automated diagnostic tools powered by
DL assist clinicians in identifying AD from MRI, PET, or CT scans with high accuracy.
This application reduces diagnostic errors and inter-observer variability. Additionally, it saves time for clinicians by automating feature extraction
and classification (Duarte de Almeida and Oliveira 2019).
3. Neuroimaging Biomarker Identification: DL models help identify biomarkers from
imaging data, such as hippocampal atrophy that correlate with AD. The benefit of this
application is to enhance understanding of disease pathology and support the develop-
ment of targeted therapies (Skolariki et al. 2020).
4. Disease Progression Prediction: Predicting the transition from MCI to AD using
sequential imaging data analyzed by DL models (e.g., hybrid CNN-RNN architectures).
This application helps in monitoring disease progression and informs long-term care strategies and research studies (Guarín et al. 2024).
5. Personalized Treatment Monitoring: DL analyzes neuroimaging and clinical data to
assess the performance of treatments and interventions. This application enables dynamic adjustments to treatment plans and facilitates real-time monitor-
ing of therapeutic outcomes (Koutkias et al. 2010).
6. Cognitive Function Prediction: Predicting cognitive scores (e.g., MMSE) based on
imaging and non-imaging data using DL regression models. The benefit of this applica-
tion is to offer a non-invasive proxy for cognitive testing and support routine clinical
evaluations (Zhang et al. 2013).
7. Drug Development and Clinical Trials: DL aids in patient stratification by accurately
classifying participants into healthy, MCI, and AD groups, ensuring homogeneous
study cohorts. This reduces heterogeneity in clinical trials and facilitates
drug discovery and efficacy testing (Sonka and Grunkin 2002).
8. Public Health Screening: Large-scale, automated AD screening using DL-powered
platforms integrated with imaging tools (e.g., mobile MRI units or cloud-based sys-
tems). This application is used to expand access to diagnostic services in underserved
regions and support epidemiological studies and public health initiatives (Fletcher et al.
2017).
9. Integration with Wearable Devices: Combining DL models with data from wearable
sensors that track sleep, physical activity, or speech patterns to predict cognitive decline.
This application offers non-invasive, continuous monitoring
of at-risk individuals. Also, it promotes early-stage intervention (Alattar and Mohsen
2023).
10. Real-Time Diagnosis in Telemedicine: DL models integrated into telemedicine plat-
forms analyze remotely acquired imaging and clinical data to provide diagnostic
insights. This application increases diagnostic reach, especially in rural areas, and reduces the patient burden of traveling for clinical assessments (Yan and
Song 2010).
11. Cross-Modality Analysis: DL combines data from various imaging modalities (e.g.,
MRI and PET) for more comprehensive diagnostics. There are two effects for this

application. The first one is improving classification accuracy, while the second effect
is providing a holistic view of structural and functional changes in the brain (Feng et al.
2021).
12. Educational Tools for Medical Training: DL-powered visualizations (e.g., Grad-CAM,
saliency maps) are used as teaching tools for medical students and radiologists to inter-
pret AD-related changes. There are two benefits for this application: enhancing learning
through real-world data and reducing the gap between practical application and theo-
retical knowledge (Sakuma 2013).
13. Synthetic Data Generation: Generative Adversarial Networks (GANs) create synthetic
neuroimaging data to augment limited datasets. It addresses data scarcity and class
imbalance issues and facilitates training of more robust DL models (Failed 2023). Fig-
ure 5 shows several applications of Alzheimer disease with DL.

4 Datasets and preprocessing techniques for Alzheimer disease detection

There are publicly available datasets in the field of AD, such as ADNI (Alzheimer’s Disease Neuroimaging Initiative), which is a comprehensive dataset with MRI and PET images; OASIS (Open Access Series of Imaging Studies), which focuses on aging and AD detection; and the AIBL (Australian Imaging, Biomarkers, and Lifestyle) dataset, which provides longitudinal neuroimaging data.
The success of DL models in Alzheimer’s disease classification heavily relies on high-
quality, well-labeled datasets. Several public datasets have become the benchmark for
researchers in this field, allowing for reproducible results and fostering collaboration among
researchers.

Fig. 5 Applications of Alzheimer disease with deep learning


4.1 Datasets for Alzheimer disease detection

The following datasets are widely used in Alzheimer’s disease detection:

1. Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset (Chu and Gebre-Amlak 2021) is a widely used resource for AD classification, containing MRI, PET scans, and
clinical data from subjects at various disease stages. It provides both cross-sectional and
longitudinal data, aiding in classification and disease progression tracking. Deep learn-
ing models trained on ADNI have shown high accuracy, and its inclusion of genetic data
supports multimodal classification approaches.
2. Open Access Series of Imaging Studies (OASIS) dataset (Marcus et al. 2010) provides
MRI scans and clinical data for individuals with AD, MCI, and healthy controls. It is
valuable for studying early Alzheimer’s detection, as it includes MCI patients. Research
using OASIS has shown deep learning models can identify early signs of AD, such as
brain atrophy and cortical thinning.
3. Australian Imaging, Biomarkers, and Lifestyle Study (AIBL) dataset (Bloch and Fried-
rich 2019) provides a valuable resource for AD research, offering MRI scans, PET data,
and clinical information for individuals at different stages of cognitive decline. It is par-
ticularly beneficial for studying the correlation between lifestyle factors, biomarkers,
and brain changes in Alzheimer’s disease. Research leveraging this dataset has shown
how deep learning models can integrate lifestyle data and imaging data to predict the
onset of Alzheimer's, highlighting the potential for preventive strategies.
4. Indian Cognitive and Health Data Repository (ICHDR) dataset (PJKRKSN and SDP 2023) provides a rich source of data for regional studies of Alzheimer’s disease classification in the Indian population. This dataset includes brain
MRI images, neuropsychological assessments, and genetic information. Models trained
on ICHDR can be used to develop region-specific detection tools, as factors like cul-
tural practices and genetics may influence the presentation of AD.
5. Harvard Medical School (HMS) dataset (Assam et al. 2021) features T2-weighted
brain MRI images with dimensions of 265 × 256 pixels. This collection comprises 613
images, categorized into two classes related to Alzheimer’s disease: 27 images fall
under the normal class, representing two cases, while 513 images belong to the abnor-
mal class, covering forty cases. The abnormal class encompasses conditions associated
with degenerative, cerebrovascular, and provocative diseases. A unique aspect of the
HMS dataset is its focus on individuals who are non-cognitively impaired at the outset,
making it especially significant for studies targeting the early detection of Alzheim-
er’s. The dataset spans multiple years, offering a longitudinal perspective that enables
researchers to observe changes in brain structure and function over time, capturing the
disease's early progression.

HMS promotes collaboration and inclusivity by offering its dataset freely to the research
community. However, as a newer initiative compared to established datasets like ADNI
and OASIS, HMS may currently have a more limited data volume, potentially restricting
the depth of analysis. Furthermore, while the dataset is publicly accessible, specific details
regarding data types and access protocols might not be as well-documented or user-friendly
as those provided by ADNI and OASIS.


Table 1 Comparisons of different datasets

| References | Dataset | Diversity | Size (AD patients) | Modality | Strengths | Limitations |
| --- | --- | --- | --- | --- | --- | --- |
| Chu and Gebre-Amlak (2021) | ADNI | Mostly White/Caucasian | ~600–800 | MRI, PET, fMRI | Comprehensive, multi-modal, large-scale | Longitudinal data increases complexity |
| Marcus et al. (2010) | OASIS | Mostly White | ~200 | MRI | Open access, focused on aging-related studies | Smaller dataset compared to ADNI |
| Bloch and Friedrich (2019) | AIBL | Mostly White (Australian population) | ~250–300 | MRI, PET | Includes cognitive scores, longitudinal data | Geographically limited population |
| PJKRKSN and SDP (2023) | ICHDR | More diverse, includes Asian populations | ~400–500 | MRI | Includes neuropsychological assessments and genetic information | Small dataset |
| Assam et al. (2021) | HMS | Ethnically diverse (African American, Hispanic, White) | ~1000 | MRI | High resolution, large scale | Only consists of two classes |

Fig. 6 Different datasets, processing techniques, and tools used with AD

Table 1 illustrates some different dataset comparisons. Figure 6 illustrates different data-
sets, preprocessing techniques, and processing tools used with AD.
All these mentioned datasets need different preprocessing techniques such as nor-
malization and scaling to ensure consistent intensity across scans, data augmentation to
expand dataset size, improving model robustness, and shuffling to prevent data leakage and

enhance generalization. So, preprocessing techniques are very important for obtaining high effectiveness of AI models.

4.2 Preprocessing techniques for Alzheimer disease detection

The following preprocessing techniques are utilized in Alzheimer’s disease classification:

1. Data augmentation is a technique used to artificially enlarge the size and diversity
of datasets by employing different transformations to the existing data. This process
is especially valuable in scenarios where the available data is limited. For images,
common augmentation methods include flipping, rotating, zooming, cropping, add-
ing noise, and color shifting. For text, techniques such as random insertion, synonym
replacement, deletion, or translation are used. The primary goal of data augmentation is
to enhance a model's prediction ability, reduce overfitting, and improve robustness by
exposing it to a broader range of variations in the input data (Nanthini et al. 2023).
2. Normalization is a preprocessing step that changes the scale of data features to bring
them within a consistent range, ensuring comparability. In image processing, normal-
ization typically involves scaling values of pixels from 0–255 to a range such as 0–1 or
-1 to 1. This technique is beneficial because it speeds up the training process, ensures a
more uniform feature distribution, and helps machine learning models converge faster.
A common normalization formula involves subtracting the mean and dividing by the standard deviation of the dataset (Huang et al. 2023); a minimal code sketch combining normalization with augmentation is given after this list.
3. Skull stripping is a preprocessing step in medical imaging, particularly for MRI scans.
This technique involves removing the skull and other non-brain tissues from the image,
leaving only the brain region. Skull stripping enhances the focus on brain-specific areas
by eliminating irrelevant features and noise, which, in turn, enhances the efficiency of
models analyzing these images. Methods for skull stripping include threshold-based
techniques, region-growing algorithms, and advanced deep learning-based automated
approaches (Singh et al. 2023).
4. Shuffling is the process of randomizing the order of data samples before feeding them
into a machine learning model. This step is crucial in preventing the model from learn-
ing biases or patterns that are specific to the order of the data. By shuffling, we ensure
that the training process does not inadvertently favor certain sequences or classes,
enhancing the model's ability to predict. Shuffling is often used in conjunction with
stochastic gradient descent (SGD) during model training to maintain randomness and
variability in the input (Nehal et al. 2023).
5. Gradient-weighted class activation mapping (Grad-CAM) is a visualization tech-
nique designed to provide insights into the decision-making process of deep learning
models, particularly CNNs. It works by leveraging the gradients flowing into the last
convolutional layer of the model to compute a weighted activation map. This map is
then used to create a heatmap that highlights the regions in an input image that con-
tribute most to the model’s prediction. Grad-CAM is especially useful for debugging,
understanding model behavior, and improving interpretability by revealing which parts
of an image influenced the decision (Quach et al. 2023).
6. Salience maps are another visualization tool that highlights the most important regions
of an input image influencing a model's output. These maps are generated by computing

the gradients of the model's output with respect to the input image, showing the sen-
sitivity of the output to changes in specific input features. Salience maps are widely
used in tasks like feature importance analysis, object detection, and attention mecha-
nisms, providing a clearer understanding of the model's focus and decision-making
process (Mukherjee et al. 2015). Table 2 shows comparisons of several preprocessing
techniques.
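A minimal sketch of two of these steps, on-the-fly augmentation and z-score normalization, is given below using Keras preprocessing layers; the slice size, batch size, and augmentation ranges are illustrative assumptions rather than recommended settings.

```python
# Hedged preprocessing sketch: augmentation plus z-score intensity normalization.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.05),  # small rotations; large ones may be unrealistic for brain MRI
    layers.RandomZoom(0.1),
])

def zscore_normalize(volume):
    """Scale intensities to zero mean and unit variance: x' = (x - mean) / std."""
    volume = volume.astype("float32")
    return (volume - volume.mean()) / (volume.std() + 1e-8)

# Stand-in batch of 2D slices with shape (batch, height, width, channels)
batch = np.random.rand(8, 128, 128, 1).astype("float32")
batch = zscore_normalize(batch)
augmented = augment(batch, training=True)  # augmentation is active only in training mode
print(augmented.shape)
```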

4.3 Explanation of model interpretability techniques

Techniques like SHAP and LIME provide powerful ways to understand and explain AI
model decisions, increasing clinician trust and promoting AI adoption in healthcare. By
integrating these methods into decision-support systems, we can improve diagnostic accu-
racy and treatment outcomes while ensuring greater acceptance by both doctors and patients.

4.3.1 Shapley additive explanations (SHAP)

SHAP is based on Shapley values from game theory, which analyze the contribution of
each feature to a model’s prediction. The core idea is to compute the average impact of each
feature across all possible feature orderings, providing both global and local interpretability
of the model. SHAP has three advantages: first, it provides a consistent and fair explanation of feature importance. Also, it can be applied to deep learning models, random
forests, and regression models. Further, it offers global insights into the model's behavior as
well as local explanations for individual predictions.
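As an illustrative usage sketch, the shap library can explain a tree-based classifier trained on tabular features as follows; the synthetic data is a placeholder for clinical or volumetric features, and the handling of the class dimension depends on the shap version.

```python
# Hedged SHAP usage sketch on a placeholder binary classification problem.
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)   # efficient Shapley estimates for tree models
shap_values = explainer.shap_values(X)  # per-feature contributions for each sample
# Depending on the shap version, binary classifiers return a list per class or a
# 3D array; keep the contributions for the positive class in either case.
if isinstance(shap_values, list):
    shap_values = shap_values[1]
elif getattr(shap_values, "ndim", 2) == 3:
    shap_values = shap_values[:, :, 1]
shap.summary_plot(shap_values, X)       # global view of feature importance
```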

Table 2 Comparisons of several preprocessing techniques

| References | Technique | Benefits | Challenges |
| --- | --- | --- | --- |
| Nanthini et al. (2023) | Data augmentation | Expands dataset size, improves generalization | Risk of introducing unrealistic transformations |
| Huang et al. (2023) | Normalization | Reduces inter-subject variability in intensity | May lead to loss of contrast in critical regions |
| Singh et al. (2023) | Skull stripping | Removes non-brain tissues for cleaner inputs | Time-consuming, requires robust automation tools |
| Nehal et al. (2023) | Shuffling | Prevents data leakage, ensures balanced training | Adds computational cost during each epoch |
| Quach et al. (2023) | Grad-CAM | Highlights critical regions | Can mislocalize subtle features |
| Mukherjee et al. (2015) | Saliency maps | Visualizes feature contributions | Overcoming noise in input data |


4.3.2 Local interpretable model-agnostic explanations (LIME)

LIME focuses on local interpretability by creating a simplified model (e.g., linear regres-
sion) to approximate the predictions of the original model in a small neighborhood around a
specific data point. This is achieved by generating perturbed data and analyzing how feature
variations affect predictions. The advantages of LIME are that it is fast and effective in explaining individual predictions, it is model-agnostic, meaning it can be applied to any machine learning model, and it helps in understanding why a model made a particular decision for a specific patient.
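A comparable usage sketch for LIME on tabular data is shown below; the feature names, data, and model are placeholders, and the lime package must be installed separately.

```python
# Hedged LIME usage sketch: explain a single prediction with a local surrogate.
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

explainer = LimeTabularExplainer(
    X,
    feature_names=[f"f{i}" for i in range(X.shape[1])],
    class_names=["class 0", "class 1"],
    mode="classification",
)
# Perturb the neighborhood of one sample and fit a simple local model to it
exp = explainer.explain_instance(X[0], model.predict_proba, num_features=5)
print(exp.as_list())  # (feature condition, weight) pairs of the local explanation
```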

4.3.3 Role of model interpretability in enhancing clinician trust and adoption

In healthcare, clinicians rely on logical and explainable reasoning when making critical
decisions. If they understand how and why a model provides a certain diagnosis, they will
have more confidence in using it. The ability to justify predictions is crucial, as medical
professionals need to ensure that AI-driven recommendations align with established clinical
knowledge and patient history. Techniques like SHAP and LIME enhance trust by offering
clear explanations of model decisions, allowing doctors to compare AI-generated insights
with their own expertise. This transparency reassures clinicians that AI is a supportive tool
rather than an unpredictable "black-box" system. One of the biggest barriers to AI adoption
in healthcare is the concern that models operate as opaque "black boxes," making decisions
without clear reasoning. Interpretable AI models address this issue by revealing which fac-
tors contribute to specific predictions, increasing transparency and credibility.

5 Recent studies in AD detection

There are recent studies in the field of AD presented (Slimi et al. 2024; Ching et al. 2024;
Alsubaie et al. 2024; Jo et al. 2019; Lu et al. 2019; Iriondo et al. 2203; Raza et al. 2023;
Gnanasegar et al. 2020). In (Slimi et al. 2024), the hybrid model integrating Xception
and DenseNet121 architectures has demonstrated outstanding performance, achieving an
accuracy of 99.85% for AD classification. This success is attributed to the combination of
Xception’s efficient depthwise separable convolutions and DenseNet121’s connected lay-
ers, which enhance feature extraction and improve the model's representational capacity.
The dataset used consists of preprocessed MRIs. The use of the Synthetic Minority Over-sampling
Technique (SMOTE) effectively addressed class imbalance, ensuring better generalization
across underrepresented categories. However, this hybrid model comes with notable chal-
lenges. The integration of two complex architectures results in high computational cost and
increased processing overhead, which poses difficulties for real-world scalability. Addition-
ally, the model's complexity may limit its adaptability to settings with constrained computa-
tional resources, highlighting the trade-off between performance and practicality.
In (Ching et al. 2024), the EfficientNet-B0 model, leveraging transfer learning, offers
a lightweight and efficient solution for AD detection, reaching an accuracy of 87.17%. Its
streamlined architecture makes it particularly appropriate for deployment on systems with
limited computational resources, like edge devices or smaller healthcare facilities. Despite
its advantages, the model has limitations in handling complex feature representations due to

its reduced network depth, which can impact its ability to discern subtle patterns in medical
imaging data. Additionally, its performance is relatively lower compared to deeper archi-
tectures, especially in multi-class classification tasks where a greater level of feature dis-
crimination is required. These trade-offs highlight the balance between model efficiency and
accuracy, with EfficientNet-B0 excelling in resource-constrained environments but facing
challenges in more demanding classification scenarios. The datasets used are MRI scans
divided into four classes (non-demented to severe AD).
In (Alsubaie et al. 2024), Capsule Networks (CapsNets) have shown promising potential
in Alzheimer’s disease classification because of their robust feature representation capa-
bilities. CapsNets excel in capturing spatial hierarchies and relationships between features,
which is particularly beneficial for analyzing complex brain imaging data. Their architec-
ture also helps reduce overfitting, even when applied to datasets with variations in segmen-
tation, enhancing model robustness. Despite these advantages, CapsNets face significant
challenges, including high computational demand and slower training times compared to
traditional CNNs. Additionally, their scalability remains limited, as handling large-scale
datasets efficiently is difficult. These trade-offs make CapsNets a compelling, yet computa-
tionally intensive, alternative for medical imaging tasks.
In (Jo et al. 2019), deep Boltzmann machines (DBMs) have been applied to AD detection
by integrating multi-modal neuroimaging data, demonstrating their effectiveness in pre-
dicting the conversion from MCI to AD. By leveraging data from multiple sources, such
as MRI, PET, and clinical measures, DBMs capture complementary features that enhance
prediction accuracy and provide a holistic view of patient conditions. However, their effec-
tiveness depends heavily on the availability of high-quality, multi-modal datasets, which
are often challenging to collect in clinical practice. Additionally, the training of DBMs is
computationally intensive, requiring significant processing power and time. These limita-
tions, while notable, are balanced by their ability to process and synthesize complex, diverse
datasets, offering valuable insights in early diagnosis and disease progression modeling.
In (Lu et al. 2019), the CNN-LSTM hybrid model combines the spatial feature extrac-
tion capabilities of CNNs with the temporal sequence processing strength of LSTM networks,
achieving a high accuracy of 98.5% on segmented datasets for Alzheimer’s disease detec-
tion. This integration allows the model to capture both spatial patterns from medical imaging
and temporal dependencies, such as progressive changes in brain scans over time. However,
this approach necessitates advanced pre-processing techniques, including segmentation
and normalization, to optimize data quality and ensure effective model training. Addition-
ally, the hybrid architecture introduces increased training complexity and higher memory
requirements, making it more computationally demanding compared to simpler models.
Despite these challenges, the CNN-LSTM hybrid model is a powerful tool for analyzing
time-sequenced medical data in Alzheimer’s research. This work utilized two MRI datasets
with Alzheimer’s disease stages.
In (Iriondo et al. 2203), the DeepAD model leverages advanced DL techniques for pre-
dicting AD progression by integrating clinical information and 3D MRI scans from multiple
cohorts. A key strength of DeepAD lies in its ability to address domain adaptation and inter-
study biases through adversarial training, which enhances its robustness across diverse data-
sets. The model effectively combines clinical data with imaging features, utilizing mutual
information loss to promote domain generalization and improve prediction consistency
across varying data sources. Despite its strengths, DeepAD’s multi-network architecture

makes it computationally intensive, demanding significant processing power and memory


resources. Furthermore, its reliance on high-quality multi-modal datasets can limit its appli-
cability in settings where such data are unavailable, posing challenges for universal deploy-
ment. Nonetheless, DeepAD represents a significant step forward in integrating diverse data
types for accurate disease progression prediction.
In (Raza et al. 2023), the DenseNet and ResNet-based classification approach combines
the strengths of DenseNet-169 and ResNet-50 architectures for robust feature learning, uti-
lizing datasets such as OASIS and other publicly available MRI repositories. This dual-
model strategy has proven highly effective in detecting Alzheimer’s disease, particularly
in its early stages, achieving high accuracy levels. DenseNet's efficient feature reuse and
ResNet's residual learning help capture intricate patterns in neuroimaging data. However, a
notable limitation of these models is their susceptibility to performance degradation when
dealing with imbalanced datasets. Without the application of preprocessing techniques such
as oversampling or data augmentation, the models may struggle to generalize effectively,
especially when minority classes are underrepresented. These challenges underscore the
significance of preprocessing in improving model efficiency for real-world applications.
In (Gnanasegar et al. 2020), an integration of an LSTM model with Boruta-based feature selection was presented for a classification task. This model was trained on the OASIS MRI
dataset for Alzheimer's disease. The Boruta algorithm, a wrapper-based feature selection
method, was employed to identify Clinical Dementia Rating (CDR) and Mini Mental Status
Examination (MMSE) scores as key predictors of dementia. The LSTM-based model clas-
sified subjects as demented or non-demented, achieving an impressive 94% accuracy. A key advantage of this approach is its ability to mitigate overfitting, ensuring robust and reliable
performance. Table 3 summarizes several recent studies on different DL models for AD
classification.

6 Comparison of AI approaches versus traditional AD diagnostic methods

Traditional methods (MMSE, CDR, PET) remain valuable, but have limitations in early
detection, cost, and scalability, while AI models outperform them in accuracy, efficiency,
and early detection, especially when using deep learning on MRI and PET data. AI could
augment traditional methods rather than replace them, assisting clinicians in making faster,
more precise diagnoses.
Traditional AD diagnostic methods include the MMSE, CDR, and PET. The MMSE is a
simple 30-question cognitive test used to assess memory, attention, and problem-solving.
However, it lacks sensitivity to early-stage AD and is influenced by education level and lan-
guage skills, while the CDR is a clinician-led structured interview evaluating six domains
(memory, orientation, problem-solving, etc.). Results depend on expert interpretation, mak-
ing it prone to variability. The PET scans are highly accurate, but expensive and not widely
available. PET detects amyloid-beta and tau protein deposits, biomarkers of AD.
AI-based models can analyze MRI/PET changes and detect neurodegeneration years
before cognitive symptoms appear. Some models achieve > 90% accuracy, compared to
MMSE’s ~ 80%. DL models use MRI scans to objectively quantify brain atrophy in key
regions (e.g., hippocampus), reducing subjectivity.

Table 3 Several recent studies on different DL models for Alzheimer's disease classification

| Study | Model | Accuracy (%) | Strengths | Weaknesses |
| --- | --- | --- | --- | --- |
| Slimi et al. (2024) | Xception + DenseNet121 | 99.85 | Handles imbalance, excellent accuracy | High computational cost |
| Ching et al. (2024) | EfficientNet-B0 | 87.17 | Lightweight, efficient | Limited feature depth |
| Alsubaie et al. (2024) | Capsule Networks (CapsNet) | 93.50 | Robust feature representation | High computational demand |
| Jo et al. (2019) | DBM with multi-modal data | 98.8 | Effective multi-modal integration | Requires high-quality datasets |
| Lu et al. (2019) | CNN-LSTM hybrid model | 98.50 | Captures spatial-temporal features | Increased preprocessing and complexity |
| Iriondo et al. (2203) | DenseNet + adversarial networks | – | Robust to domain shifts, multi-modal learning | High computational cost, data-dependent |
| Raza et al. (2023) | DenseNet169 | 97.84 | Feature richness | Class imbalance issues |
| Gnanasegar et al. (2020) | LSTM | 94 | Mitigates overfitting, Boruta feature selection | Low accuracy, limited analysis |

AI models are more accessible and less costly, while still achieving high diagnostic performance. Some DL models trained on MRI
data achieve accuracy comparable to PET-based assessments. Table 4 shows a comparison
of AI Approaches versus Traditional AD Diagnostic Methods.

7 Real-world implementation of AI models and dataset adaptability

In this Section, we present some real-world implementations of AI models in clinical settings and their adaptability to different datasets.

7.1 AI in medical imaging

Application: AI models like Google's DeepMind and IBM Watson Health analyze X-rays,
MRIs, and CT scans to detect diseases such as cancer, pneumonia, and brain hemorrhages.
Model: Deep learning-based Convolutional Neural Networks (CNNs), use case: radiology
and pathology. Adaptability: AI models trained on large, diverse datasets like the ChestX-
ray14 dataset (NIH) can be fine-tuned with local hospital data to improve specificity. Also,
transfer learning enables adaptation to new datasets with fewer labeled images, allowing
hospitals in different regions to deploy these systems efficiently.


Table 4 Comparison of AI approaches versus traditional AD diagnostic methods

| Aspect | Traditional AD diagnostic methods | AI-based approaches |
| --- | --- | --- |
| Type of data used | Clinical assessment (MMSE, CDR), neuroimaging (PET, MRI), genetic tests | Multimodal data integration (MRI, PET, genetic, behavioral, electronic health records) |
| Subjectivity | High (depends on clinicians' expertise) | Low (objective, data-driven) |
| Early detection ability | Limited, often detects moderate to severe cases | Superior, can detect preclinical AD before symptoms appear |
| Time and cost | Time-consuming, costly (especially PET scans) | Faster processing, reduces costs over time |
| Scalability | Limited by clinicians' availability and resources | Highly scalable, can analyze large datasets automatically |
| Interpretability | Well understood by clinicians | Improving with explainable AI (XAI), but still a challenge |
| Longitudinal tracking | Requires repeated clinical visits | AI methods can track cognitive decline over time using DL |
| Accuracy | Moderate accuracy (MMSE ~80%, CDR: clinician dependent) | Higher accuracy (over 90% for some AI models) |

7.2 AI for predictive analytics in ICU

Application: AI-powered sepsis watch system at Duke University analyzes vital signs, lab
results, and medical history to predict sepsis risk in ICU patients. Model: Recurrent Neural
Networks (RNNs) and Long Short-Term Memory (LSTM) networks, use case: early sepsis
detection. Adaptability: the model integrates data from Electronic Health Records (EHR)
such as MIMIC-III (MIT). Also, continuous training on new patient data ensures adaptabil-
ity to different populations and hospital workflows.

7.3 AI in personalized treatment planning

Application: IBM Watson for Oncology assists doctors by recommending personalized treatment plans based on patient history, clinical trials, and medical literature. Model: Rein-
forcement Learning (RL) and Decision Support Systems, use case: oncology (cancer treat-
ment). Adaptability: It uses Natural Language Processing (NLP) to analyze global and local
datasets. Also, customization with regional treatment guidelines ensures adaptability to dif-
ferent healthcare environments.

7.4 AI-powered robotic surgery

Application: The da Vinci Surgical System enhances precision in robotic-assisted surgeries by analyzing real-time data. Model: Computer Vision & Reinforcement Learning, use case: minimally invasive surgeries. Adaptability: AI learns from past surgical procedures and adapts to different patient anatomies. Also, federated learning allows hospitals to share
anonymized data to improve model performance without compromising patient privacy.
When evaluating the potential clinical adoption of different models, several key factors
must be considered alongside accuracy: clinical adoption of models depends on interpret-
ability, scalability, and efficiency. Explainable AI (XAI) methods enhance trust by provid-
ing clear insights into model decisions. Scalability factors, such as computational cost and
EHR integration, affect real-world deployment, with lighter or hybrid models being more
feasible. Balancing accuracy with efficiency is crucial, as high-performance models with
excessive computational demands may be impractical for widespread clinical use.

7.5 Generalizability of AI models in healthcare

Generalizability refers to an AI model’s ability to maintain high performance when applied to new datasets, different hospitals, and diverse patient populations. Challenges to generalizability are as follows:
Data Distribution Shift: AI models trained on one dataset may not perform well on data
from another region due to variations in imaging protocols, demographics, and clinical prac-
tices. Additionally, limited Representation: many AI models are trained on datasets from a
specific region, leading to biases when applied globally. Also, overfitting to training data:
deep learning models might memorize patterns rather than learning generalizable features.

7.6 Scalability of AI models in clinical workflows

Scalability refers to an AI model’s ability to handle increasing amounts of data, users, and
healthcare facilities while maintaining efficiency. Challenges to scalability are as follows. Computational costs: large AI models require significant processing power, limiting deployment in resource-constrained hospitals. Integration with EHR systems: hospitals use different Electronic Health Record (EHR) formats (e.g., Epic, Cerner, Meditech), making AI integration complex. Regulatory and compliance hurdles: AI models must meet
local regulations (e.g., HIPAA, GDPR) before large-scale deployment.

7.7 Deployment challenges of AI models

Despite the potential of AI to revolutionize healthcare, deploying AI models in real-world


clinical environments presents several challenges. These challenges range from technical
and regulatory issues to ethical and operational hurdles.

1. Data-Related Challenges: healthcare data is sensitive and protected by regulations like


HIPAA (USA), GDPR (Europe), and other regional laws. AI models require large data-
sets, making compliance difficult. Also, healthcare data is stored in various formats
across hospitals (EHR systems like Epic, Cerner, and Meditech) with different struc-
tures. Lack of standardization makes integration difficult.
2. Model-Related Challenges: many AI models, especially deep learning models, func-
tion as "black boxes," making it difficult for clinicians to understand how decisions are
made. Also, AI models may become less accurate over time due to changes in medical
practices, equipment, or patient demographics (a phenomenon known as model drift).


3. Operational & Infrastructure Challenges: AI models often require significant modifications to existing clinical workflows, leading to resistance from healthcare professionals.
Additionally, many hospitals lack in-house AI expertise, making it difficult to maintain
and update AI models.
4. Regulatory and Ethical Challenges: AI models must comply with stringent regulations
before they can be used in clinical practice (e.g., FDA approval in the USA, CE mark-
ing in Europe). Also, patients may be reluctant to trust AI-based diagnoses or treatment
recommendations.

8 Overfitting problem in DL and mitigation methods

8.1 Overfitting description

Overfitting occurs when a trained model becomes overly adapted to the training data, lead-
ing to excellent performance on training data but poor generalization to test or unseen data.
This means the model memorizes patterns specific to the training set rather than learning
general trends that can be applied to new data.
Overfitting typically happens when the model is too complex, containing too many
parameters relative to the dataset size. It also occurs when the model fits the training data
too tightly, capturing noise or irrelevant patterns instead of useful trends. Additionally, if
the model relies on unnecessary or irrelevant features, it becomes ineffective when tested
on new datasets.
There are several signs that indicate a model is overfitting. The most common sign is an
extremely high accuracy on training data but a significant drop in performance on test data.
Another indicator is a large gap between training accuracy and test accuracy, showing that
the model is not generalizing well. If a model is excessively complex without improving
real-world predictions, it is also likely to be overfitting.

8.2 Mitigation methods against overfitting

To reduce overfitting, one effective method is increasing the dataset size. More data helps the model generalize better instead of memorizing training samples. When additional data is not available, data augmentation techniques, such as image rotation, flipping, brightness adjustments, and adding noise, can artificially expand the dataset.

Another approach is to simplify the model. Reducing the number of layers in neural networks or selecting a less complex algorithm can prevent overfitting. Similarly, reducing the number of parameters in the neural network architecture helps in making the model more generalizable.

Cross-validation is also a useful technique for mitigating overfitting. K-fold cross-validation ensures that the model is tested on multiple subsets of data, reducing the likelihood of overfitting by providing a better estimate of its performance on unseen data.

Regularization techniques play a crucial role in preventing overfitting. L1 regularization (Lasso) penalizes the absolute values of parameters, forcing some of them to become zero and effectively selecting only important features. L2 regularization (Ridge), on the other hand, penalizes the squared values of parameters, reducing their magnitude without eliminating them, which results in a more stable model. For neural networks, dropout is an effective regularization method that randomly disables neurons during training to prevent the model from depending too much on specific features.

Feature selection is another key strategy. Removing unnecessary or low-impact features reduces the complexity of the model, making it less prone to overfitting. Dimensionality reduction techniques such as Principal Component Analysis (PCA) can help retain only the most significant features while discarding irrelevant ones.

Adding noise to training data is another way to prevent overfitting. By introducing random variations, the model learns to generalize better instead of relying on minor patterns that may not be present in real-world data. This is especially useful in neural networks to enhance their robustness.

Lastly, early stopping is a widely used technique to avoid overfitting. By monitoring the model's performance on a validation set, training can be stopped when validation accuracy starts to decline, preventing the model from memorizing the training data instead of generalizing.
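As a concrete illustration, the sketch below (assuming TensorFlow/Keras; the layer sizes, input shape, and class count are hypothetical) combines three of the methods described above, L2 regularization, dropout, and early stopping, in a small CNN for 2D brain-image slices. It is a minimal example rather than a recommended architecture.

```python
# Minimal sketch of a regularized CNN with early stopping (TensorFlow/Keras).
# Layer sizes, input shape, and class count are illustrative only.
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers, callbacks

def build_regularized_cnn(input_shape=(128, 128, 1), num_classes=3):
    model = models.Sequential([
        layers.Conv2D(16, 3, activation="relu",
                      kernel_regularizer=regularizers.l2(1e-4),  # L2 (Ridge) penalty
                      input_shape=input_shape),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu",
                      kernel_regularizer=regularizers.l2(1e-4)),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.5),  # randomly disables neurons during training
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Early stopping halts training once validation accuracy stops improving.
early_stop = callbacks.EarlyStopping(monitor="val_accuracy", patience=5,
                                     restore_best_weights=True)
# model = build_regularized_cnn()
# model.fit(x_train, y_train, validation_split=0.2, epochs=100, callbacks=[early_stop])
```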

9 Medical modalities used with AD detection

In this section, we explain the different modalities used for Alzheimer's disease detection. Each
modality has unique strengths, making them complementary tools for diagnosis and research
in various medical fields, especially neurology and oncology.
Computed tomography (CT) is a medical imaging technique that utilizes X-rays to pro-
duce detailed cross-sectional images of the body. The patient lies in a rotating scanner where
X-rays capture data, which a computer reconstructs into 2D or 3D images. CT scans are
commonly used to detect tumors, fractures, and internal bleeding, making them invalu-
able in emergency diagnostics due to their speed. They give high-resolution images of both
bones and soft tissues, but CT scans involve exposure to ionizing radiation and are less
effective than MRI for visualizing soft tissue contrast (Gao and Hui 2016).
Structural magnetic resonance imaging (sMRI) is a non-invasive imaging technique
that utilizes strong magnetic fields to produce highly detailed images of body structures.
It is particularly useful for visualizing brain anatomy, helping detect abnormalities such
as tumors or atrophy. sMRI plays a key role in studying structural changes in neurodegen-
erative diseases like Alzheimer’s. It offers excellent soft tissue contrast without exposing
patients to ionizing radiation. However, it is time-consuming, expensive, and unsuitable for
individuals with metal implants or severe claustrophobia (SSCS and B U 2023).
Positron emission tomography (PET) is a functional imaging technique that reveals met-
abolic or biochemical activity in tissues. A radioactive tracer is injected into the body, and
as it decays, the PET scanner detects the emitted gamma rays to generate images. PET scans
are widely used to identify cancer, monitor its spread, and assess brain activity in condi-
tions like Alzheimer’s. They are also effective in evaluating heart function and blood flow.
While PET provides valuable functional data and is sensitive to abnormal cellular activity,
it involves exposure to radioactive materials and has lower spatial resolution compared to
MRI or CT (Salah et al. 2024).
Diffusion tensor imaging (DTI) is a form of MRI that maps the diffusion of water mol-
ecules within tissues, primarily to study neural pathways in the brain. By measuring the
direction of water diffusion, DTI visualizes white matter tracts and brain connectivity. It is
particularly useful for diagnosing conditions like traumatic brain injury, multiple sclerosis,
and stroke, as well as for research in neural development and disorders. DTI is non-invasive


and effective for assessing white matter integrity but is sensitive to motion artifacts and
requires complex data analysis (Sang and Li 2024).
Functional magnetic resonance imaging (fMRI) measures brain activity via detecting
changes in blood flow associated with neural activity (Yang et al. 2024). This technique uses
the Blood Oxygen Level-Dependent (BOLD) signal to pinpoint active brain regions during
specific tasks or rest. fMRI is a vital tool in cognitive neuroscience, mapping functions like
motor and sensory areas, and aiding pre-surgical planning for brain tumor removal. While it
is non-invasive and offers high spatial resolution, fMRI is an indirect measure of neuronal
activity, relying on blood flow changes, and is highly sensitive to motion artifacts.
Figure 7 illustrates the percentage of utilization of each different imaging modality with Alzheimer's disease.
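For readers working with these modalities programmatically, the following sketch (assuming the nibabel library; the file name is hypothetical) shows one common way of loading a structural MRI volume and normalizing its intensities before feeding it to a DL model.

```python
# Minimal sketch: loading and normalizing a NIfTI MRI volume (assumes nibabel).
import nibabel as nib
import numpy as np

def load_and_normalize_mri(path):
    """Load a NIfTI scan and rescale voxel intensities to [0, 1]."""
    volume = nib.load(path).get_fdata()  # 3D array (x, y, z)
    volume = (volume - volume.min()) / (volume.max() - volume.min() + 1e-8)
    return volume.astype(np.float32)

# Usage with a hypothetical file name:
# scan = load_and_normalize_mri("subject_001_T1w.nii.gz")
# print(scan.shape)
```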

10 Evaluation metrics used in AD detection

In Alzheimer's disease (AD) classification, assessing the performance of deep learning mod-
els is essential to ensure their reliability and clinical applicability. Various evaluation metrics
are employed to assess model performance, each providing unique insights into how well
a model distinguishes between healthy and AD patients. Standard metrics for model assessment include precision, recall, accuracy, and F1-score to quantify classification performance, the receiver operating characteristic-area under the curve (ROC-AUC) to analyze discriminative ability, and various loss metrics to evaluate convergence during training. Each metric has its advantages and is suitable for different types of analysis, based on the nature of the dataset, the issue at hand, and the consequences of model errors. The following are some of the most commonly used metrics, along with their defining equations (Mohsen et al. 2023b; Chicco et al. 2021; Patil and Nisha 2021).

10.1 Accuracy

It is one of the most basic and frequently used metrics in classification tasks. It represents
the proportion of correctly predicted samples (both positive and negative) to the overall
number of instances. In the context of Alzheimer’s classification, accuracy calculates how
often the model correctly classifies both the Alzheimer's patients and healthy individuals.
Equation (1) represents the accuracy.

Fig. 7 Percentage of utilization of each different imaging modality with Alzheimer's disease


$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{1}$$

where TP = True Positives (correctly predicted AD cases), TN = True Negatives (cor-


rectly predicted healthy cases), FP = False Positives (healthy cases predicted as AD), and
FN = False Negatives (AD cases predicted as healthy). While accuracy is a valuable mea-
surement, it can be misleading when dealing with unbalanced datasets, where the number of
healthy individuals far outweighs the AD patients.

10.2 Precision

In medical classification tasks like AD detection, where the consequences of FPs and FNs are significant, additional metrics such as precision, recall, and F1-score are preferred. Precision (also known as the positive predictive value, PPV) measures the proportion of true positive predictions among all the positive predictions made by a model. This is especially important when false positives (misdiagnosing healthy individuals as AD patients) are a concern. It is represented by Eq. (2).

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}$$

10.3 Recall

Recall, also known as sensitivity, indicates the proportion of actual positive cases that were correctly identified by a model. It is particularly important when false negatives (failing to identify AD patients) are a concern. Recall is calculated by Eq. (3).

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{3}$$

10.4 F1-score

It is the harmonic mean of precision and recall, providing a balance between them. The F1-score is especially beneficial when the dataset is unbalanced, as it provides a more
balanced view of the model’s efficiency than accuracy alone. Studies in AD detection often
prioritize F1-score because misdiagnosing AD patients (false negatives) can be more dan-
gerous than misclassifying healthy individuals. It is determined through Eq. (4).

$$F1\text{-score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{4}$$
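A minimal sketch of how Equations (1)-(4) are typically computed in practice is given below, assuming scikit-learn; the label vectors are purely illustrative.

```python
# Minimal sketch: accuracy, precision, recall, and F1-score with scikit-learn.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = AD, 0 = healthy (illustrative labels)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model predictions (illustrative)

print("Accuracy :", accuracy_score(y_true, y_pred))   # Eq. (1)
print("Precision:", precision_score(y_true, y_pred))  # Eq. (2)
print("Recall   :", recall_score(y_true, y_pred))     # Eq. (3)
print("F1-score :", f1_score(y_true, y_pred))         # Eq. (4)
```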

10.5 Area under the receiver operating characteristic curve (AUC-ROC)

It is another popular metric, particularly in binary classification tasks like Alzheimer's dis-
ease detection. It plots the True Positive Rate (Recall) against the False Positive Rate (FPR),
which is defined in Eq. (5).


$$\mathrm{FPR} = \frac{FP}{FP + TN} \tag{5}$$

The area under the curve (AUC) quantifies the ability of the model to distinguish between
the two classes (AD vs. healthy). A value of 1 refers to perfect classification, while a value
of 0.5 suggests that the model performs no better than random guessing. AUC-ROC is
particularly useful when comparing models with different thresholds for decision-making.
Higher AUC values indicate better model discrimination between AD and non-AD cases.
AUC-ROC is estimated via Eq. (6).
$$\mathrm{AUC} = \int_{0}^{1} TPR \; d(FPR) \tag{6}$$
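As a brief sketch (assuming scikit-learn; the probability scores are illustrative), the ROC curve of Eq. (5) and the AUC of Eq. (6) are obtained from predicted AD probabilities rather than hard labels:

```python
# Minimal sketch: ROC curve and AUC-ROC with scikit-learn.
from sklearn.metrics import roc_auc_score, roc_curve

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]                    # 1 = AD, 0 = healthy
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.85, 0.3]   # predicted AD probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)     # points of the ROC curve
print("AUC-ROC:", roc_auc_score(y_true, y_score))     # Eq. (6)
```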

10.6 Matthews correlation coefficient (MCC)

It is another metric that is valuable for evaluating binary classification models, especially
when dealing with imbalanced datasets. Unlike accuracy, MCC takes into account all four
confusion matrix elements (TP, TN, FP, FN) and provides a balanced metric of prediction
performance. MCC ranges from -1 (perfect inverse prediction) to + 1 (perfect prediction),
with 0 indicating random predictions. MCC is particularly useful when both classes in the dataset are of similar importance, as it considers the trade-offs between false positives (FPs) and false negatives (FNs). Equation (7) represents the MCC.

$$\mathrm{MCC} = \frac{(TP \times TN) - (FP \times FN)}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{7}$$
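The sketch below (assuming scikit-learn; the labels are illustrative) shows how the four confusion-matrix elements (TP, TN, FP, FN) and the MCC of Eq. (7) are computed together:

```python
# Minimal sketch: confusion-matrix elements and MCC with scikit-learn.
from sklearn.metrics import confusion_matrix, matthews_corrcoef

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = AD, 0 = healthy (illustrative labels)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TN, FP, FN, TP:", tn, fp, fn, tp)
print("MCC:", matthews_corrcoef(y_true, y_pred))      # Eq. (7)
```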

10.7 Confusion matrix (CM)

It is not a metric by itself, but a valuable tool for visualizing a model's performance. It dis-
plays the counts of true positive, true negative, false positive, and false negative values in a
matrix form, allowing for easy calculation of other measurements, e.g., accuracy, F1-score, precision, and recall. Table 5 illustrates the statistical values of the confusion matrix (CM). This matrix helps in understanding how well the model differentiates between the two classes and where it makes errors, which is essential in medical diagnosis, where the consequences of false
predictions can be significant.

Table 5 Statistical values of confusion matrix

                     Predicted: negative    Predicted: positive
Actual: negative     TN                     FP
Actual: positive     FN                     TP

10.8 Loss

It is often utilized to assess the performance of classification models that output probabilities rather than discrete labels. This metric quantifies the accuracy of the probabilistic predictions, penalizing incorrect classifications more heavily when the model is confident about its wrong predictions. Equation (8) illustrates the log loss.

$$\mathrm{Loss} = -\frac{1}{N}\sum_{i=1}^{N}\left[\, y_i \log_2 \hat{y}_i + (1 - y_i)\log_2(1 - \hat{y}_i)\,\right] \tag{8}$$

where $y_i$ is the true label of the $i$th instance (0 or 1), $\hat{y}_i$ is the predicted probability for the $i$th instance, and $N$ is the total number of instances.
Lower loss values indicate better model performance, with perfect predictions achieving a log loss of 0.
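A minimal sketch of the log loss is given below, assuming scikit-learn; note that sklearn.metrics.log_loss uses the natural logarithm, so its value differs from the base-2 form of Eq. (8) by a constant factor.

```python
# Minimal sketch: log loss on predicted AD probabilities (scikit-learn).
from sklearn.metrics import log_loss

y_true = [1, 0, 1, 1, 0]
y_prob = [0.9, 0.1, 0.8, 0.35, 0.2]   # predicted probability of the AD class
print("Log loss:", log_loss(y_true, y_prob))
```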
These evaluation metrics each offer distinct advantages according to the nature of the
Alzheimer’s disease classification problem. While accuracy provides a general sense of
model performance, metrics like precision, F1-score, AUC-ROC, recall, and MCC give a
more detailed understanding of model strengths and weaknesses, particularly in the case of
imbalanced datasets. The use of confusion matrices and log loss further enhances model
evaluation, providing deeper insights into the specific types of losses a model makes. Under-
standing these metrics is crucial for implementing a DL model that isn’t only accurate, but
also reliable and safe for use in clinical settings.

10.9 Error analysis in AI models for misclassification patterns

Error analysis is a crucial step in evaluating AI models, especially in high-stakes fields such
as healthcare. By identifying common misclassification patterns, researchers and clinicians
can improve model performance, enhance interpretability, and mitigate risks in clinical deci-
sion-making. Misclassification patterns often fall into several categories: false positives (Type I errors), false negatives (Type II errors), confusion between similar classes, bias towards the majority class, and context-dependent errors. Moreover, several factors contribute to these misclassification errors: data imbalance, feature overlap, poor data quality, limited training data, and bias in training data. Misclassification in AI models can have serious consequences in healthcare, including unnecessary interventions (false positives), missed diagnoses (false negatives), trust and adoption challenges, and ethical and legal concerns.

11 Potential bias in AI models and bias mitigation strategies

Bias in AI models for Alzheimer’s Disease (AD) diagnosis presents significant challenges
that can lead to disparities in healthcare outcomes. Ethical concerns, including fairness,
transparency, and accountability, must be addressed to ensure AI-driven diagnostic tools
benefit all patient populations equitably. By implementing robust bias mitigation strategies
such as improving data diversity, auditing algorithms, and enhancing model transparency,
we can reduce disparities and improve trust in AI-assisted healthcare.
Fairness in AI-driven AD diagnosis requires a collaborative effort between clinicians,
data scientists, and policymakers to create models that are both accurate and ethically
responsible. AI should serve as a decision-support tool that enhances, rather than replaces,
human expertise. With ongoing monitoring and refinement, AI has the potential to revolu-


tionize early AD detection, ensuring timely and precise diagnosis for all patients, regardless
of their background. Also, AI tools should be designed to support equitable access to early
AD screening and intervention, regardless of a patient’s socioeconomic status or location.

12 Challenges, limitations, and comparisons of deep learning approaches for AD

This section presents several challenges in this field. AD datasets are often limited in size and skewed towards certain classes, impacting model training. Also, models trained on particular datasets may struggle to predict across different populations and imaging protocols. In addition, although deep learning models achieve high accuracy, their black-box nature hinders clinical adoption. Other key challenges in AD detection include early detection, imaging, the heterogeneity of Alzheimer's, lack of ground truth, longitudinal monitoring, and the complexity of neurological data. Table 6 presents a comparison of these challenges.
Also, a comprehensive comparison of various deep learning (DL) models is presented in terms of their architectures, datasets, preprocessing techniques, evaluation metrics, and challenges. The aim is to highlight the strengths and weaknesses of different approaches. Table 7 shows some comparisons of models' architectures.

13 Computational cost and latency in clinical AI deployment

Deploying AI models in clinical settings involves computational costs and latency chal-
lenges. Deep learning models for AD detection, especially using MRI/PET, require high
computational power and may not be feasible for real-time use. EEG- and speech-based
detection methods are more suitable for real-time applications due to lower inference times
(milliseconds to seconds). Real-time AD detection using EEG or speech analysis is feasible
with lightweight AI models. Cloud-based solutions introduce network latency, while edge
AI and hardware optimizations (e.g., model compression, FPGA acceleration) can improve
feasibility. Balancing accuracy, cost, and computational efficiency is key to successful
deployment in real-world clinical environments.

Table 6 Comparisons of challenges

Challenge | Impact of deep learning model | Solutions
Limited dataset size | Transfer learning models mitigate this to an extent | Data augmentation, synthetic data generation
Imbalanced classes | CNNs may overfit to dominant classes | Use GANs or resampling techniques
Model interpretability | Black-box nature of CNNs and hybrids reduces clinical trust | Introduce explainable AI techniques (e.g., Grad-CAM)
Generalization | Poor cross-dataset performance of specialized models | Multi-domain training and testing
High computation costs | Complex models like hybrid architectures demand substantial resources | Optimize architectures, use cloud computing


Table 7 Comparisons for models' architecture

Model type | Advantages | Limitations
CNN | Extracts hierarchical spatial features, efficient for neuroimaging data | Requires large datasets to avoid overfitting; limited temporal analysis
Transfer learning | Leverages pre-trained models, reducing computation time and data requirements | Potential mismatch between pre-trained and medical domains
Hybrid models | Combines advantages of different architectures (e.g., CNN + RNN) | Increased complexity, computational overhead
Attention mechanisms | Focuses on clinically relevant regions, improving interpretability | May require large datasets for robust training of attention modules
GANs | Generates synthetic data for augmentation, addressing class imbalances | Can introduce artifacts affecting model robustness

Deep learning models for medical imaging (e.g., CNNs, transformers) require high computational power for inference, especially when
processing high-resolution MRI, PET, or EEG scans. Large-scale models (e.g., deep neural
networks with millions of parameters) demand powerful GPUs/TPUs, increasing deploy-
ment costs. Edge AI solutions (e.g., deploying models on local hospital servers or medical
devices) must balance accuracy with computational efficiency. Inference time depends on
hardware (CPU vs. GPU), model size, and input complexity.
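As an illustration of how these costs can be estimated before deployment, the sketch below (assuming PyTorch/torchvision; the ResNet-18 backbone and input size are stand-ins for an actual AD classifier) measures parameter count and single-input CPU inference latency.

```python
# Minimal sketch: estimating model size and CPU inference latency (PyTorch).
import time
import torch
import torchvision.models as models

model = models.resnet18(weights=None)       # stand-in for a deployed AD classifier
model.eval()

num_params = sum(p.numel() for p in model.parameters())
print(f"Parameters: {num_params / 1e6:.1f} M")

dummy_input = torch.randn(1, 3, 224, 224)   # placeholder for a preprocessed slice
with torch.no_grad():
    start = time.perf_counter()
    _ = model(dummy_input)
    latency_ms = (time.perf_counter() - start) * 1e3
print(f"Single-input CPU latency: {latency_ms:.1f} ms")
```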

14 New trends and suggestions in DL for Alzheimer disease detection

In this section, we discuss several directions that aim to enhance accuracy, usability, and clinical impact in Alzheimer's research. The main trends in this field are presented as follows:

1. Multi-modal data integration: Combining imaging (MRI, PET) with clinical, genetic,
and behavioral data enhances model robustness and predictive power. Practical exam-
ple: in Alzheimer's disease diagnosis, researchers combine MRI scans, PET imaging,
genetic markers, and cognitive test scores to improve early detection. DL models inte-
grate these diverse data types to enhance classification accuracy.
2. Transformers and attention mechanisms: Vision Transformers (ViT) and self-attention
layers are increasingly used for improved feature extraction. Practical example: Vision
Transformers (ViTs) are used to analyze MRI and PET scans for early detection of AD.
ViTs use self-attention mechanisms to capture long-range dependencies in brain imag-
ing data, improving feature extraction and classification accuracy.
3. Explainable AI (XAI): Efforts to make models interpretable, aiding clinicians in under-
standing predictions and decisions. Practical example: in Alzheimer Disease detection
from MRI scans, SHAP (SHapley Additive exPlanations) is used to highlight the most
important pixels in an image that contribute to a model’s decision, helping radiologists
trust AI-driven predictions.


4. Federated learning: Enabling collaborative model training across institutions with-


out sharing sensitive patient data. Practical example: a federated learning approach in
Alzheimer Disease detection allows multiple hospitals to collaboratively train a deep
learning model on MRI images without transferring patient data, preserving privacy
while improving generalization.
5. Synthetic data generation: Techniques like GANs to augment datasets, addressing
limited availability and class imbalance. Practical example: in rare disease diagnosis,
GANs (Generative Adversarial Networks) generate synthetic MRI scans to balance
underrepresented classes in datasets, improving model performance in detecting condi-
tions with limited real data.

By comparing the mentioned five trends in terms of the models, datasets, techniques, and
challenges, it becomes evident that the selection of approach depends on particular project
goals, data availability, and computational resources.
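For instance, the pre-trained-backbone idea underlying the transfer learning and multi-modal trends above can be sketched as follows (assuming TensorFlow/Keras and ImageNet weights; the four-class output head and image size are hypothetical):

```python
# Minimal sketch: transfer learning with a frozen pre-trained backbone (Keras).
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.EfficientNetB0(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))
base.trainable = False                      # freeze the pre-trained features

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.3),
    layers.Dense(4, activation="softmax"),  # e.g., non-demented / very mild / mild / moderate
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)
```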
Also, several suggestions in this field are introduced as follows:

1- Focus on early detection: Develop methods for early-stage diagnosis, such as MCI to
Alzheimer’s transition prediction.
2- Enhanced interpretability: Integrate XAI tools to build trust in clinical applications.
3- Ethical AI: Address biases in datasets to avoid disparities in diagnosis and treatment
recommendations.
4- Cross-domain collaboration: Encourage partnerships between AI experts, neuroscien-
tists, and clinicians to refine methodologies.
5- Real-world validation: Validate models on diverse, real-world datasets to ensure scal-
ability and generalization.

15 Future directions in DL for Alzheimer disease detection

Several suggestions could be applied, such as multi-modal learning to combine imaging data with genetic, clinical, and lifestyle information; explainable AI (XAI) to develop interpretable models that gain clinicians' trust; larger and more diverse datasets to enhance model robustness across demographics; and real-time applications to translate DL models into clinical settings for real-time diagnostics. Also, many future works aim to make Alzheimer's disease classification more accurate, interpretable, and clinically viable. These works can be applied as follows:

1- Integration of multi-modal data: Future research can focus on fusing various data modal-
ities like MRI, PET, clinical records, and genomic data for holistic disease modeling.
2- Advanced transfer learning: Expanding pre-trained models on domain-specific tasks to
improve early detection accuracy and reduce the need for large labeled datasets.
3- Personalized models: Creating models tailored to individual patients by leveraging lon-
gitudinal data and personal health records.
4- Real-time diagnosis tools: Developing lightweight models deployable on edge devices
for real-time, point-of-care screening.


5- Longitudinal prediction models: Exploring disease progression through temporal data


analysis using RNNs, LSTMs, or Transformers.
6- Explainability improvements: Designing interpretable models to gain clinician trust by
explaining predictions using heatmaps or feature attribution techniques.
7- Federated learning applications: Promoting decentralized training approaches to
enhance collaboration while maintaining data privacy.
8- Synthetic data and augmentation: Using GANs and variational autoencoders to expand
datasets for better model training.
9- Exploration of non-imaging biomarkers: Incorporating speech, handwriting, or gait
analysis for non-invasive early diagnosis.
10- Bias mitigation: Ensuring models are unbiased by testing across diverse populations to
generalize performance.

16 Benchmark comparison of DL models for AD detection

The following is a detailed benchmark comparison across various studies in the field of
AD classification, focusing on the deep learning models used, datasets, and accuracy per-
formance. In (Slimi et al. 2024), a hybrid DL model integrating DenseNet121 and Xcep-
tion networks has demonstrated exceptional performance in AD classification. Utilizing the
ADNI dataset comprising MRI images, this approach achieved an impressive overall accu-
racy of 99.85% through fivefold cross-validation. This model leverages the complementary
strengths of DenseNet121 and Xception for feature extraction, providing a robust frame-
work that enhances detection performance. Key innovations include using data augmenta-
tion techniques like SMOTE to balance datasets and enhance generalization. Furthermore,
the model exhibits resilience against various image noise types, including Gaussian, Salt-
and-Pepper, and Speckle, making it a robust solution for practical applications in Alzheim-
er's disease detection.
In (Turrisi et al. 2024), CNNs have been extensively used for Alzheimer’s disease clas-
sification, leveraging various architectural designs to achieve robust performance. Utilizing
the ADNI dataset, these models have consistently delivered accuracy rates ranging between
95 and 97%, with slight variations based on the specific architecture and the inclusion of
data augmentation techniques. A key strength of CNN-based approaches is their focus on
reproducibility, ensuring that experimental setups and results can be consistently validated.
Moreover, the diversity in architectural choices—ranging from simple to highly complex
designs—enables researchers to tailor models to the nuances of their datasets. The consis-
tent application of cross-validation further reinforces the reliability of these models, making
CNNs a foundational tool in Alzheimer's disease detection and classification.
In (Tong et al. 2024), Multiple Instance Learning (MIL) has emerged as an approach for dementia classification, particularly in scenarios involving weakly labeled or incomplete
data. Using the OASIS dataset, which includes MRI scans and clinical data, MIL achieves
accuracy rates between 85 and 90%, depending on the configuration. The model's strength
lies in its ability to handle missing annotations and ambiguous information, which are com-
mon challenges in clinical environments. By evaluating sets of instances (e.g., slices of MRI
scans or segments of clinical data) rather than requiring explicit labels for each instance,
MIL effectively classifies dementia even with limited or incomplete information. This capa-


bility makes it a practical and efficient tool for real-world applications where fully labeled
datasets are rare.
In (Morris et al. 2024), explainable AI (XAI) has been increasingly integrated into CNNs
for AD diagnosis, focusing on enhancing the interpretability of AI-driven decisions. Using
the ADNI dataset of MRI images, these models achieve accuracy levels of 96–98%, compa-
rable to traditional CNNs. However, the distinguishing contribution of XAI-based models
lies in their use of methods such as saliency maps and attention mechanisms to provide
visual or conceptual explanations for their predictions. This capability helps build trust
among clinicians by offering insights into how the model identifies Alzheimer’s-related
patterns, such as brain atrophy. By addressing the “black box” nature of DL, XAI-driven
models promote the adoption of AI in clinical settings, fostering collaboration between AI
systems and medical practitioners through transparent and interpretable decision-making
processes.
In (Khan et al. 2022), the study explored hybrid DL models—comprising CNN, Bidirec-
tional LSTM, and Stacked Deep Dense Neural Network (SDDNN)—for early Alzheimer’s
disease (AD) detection through text classification of clinical transcripts. Leveraging the
DementiaBank dataset, the models were trained and evaluated using both randomly initial-
ized weights and pre-trained GloVe embeddings. Extensive hyperparameter tuning was per-
formed through the GridSearch method. Among these models, the SDDNN combined with GloVe
embeddings reached the highest accuracy of 93.31% and outperformed others in metrics
such as AUC and recall. The results underscore the promise of automated approaches in
assisting clinicians with early AD diagnosis, while emphasizing the need for further research
to enhance performance on larger datasets.
In (Ramani et al. 2024), three-dimensional Convolutional Neural Networks (3D CNNs)
have proven highly effective for the early detection of AD by leveraging 3D volumetric
data from MRI scans. Utilizing the ADNI dataset, these models achieve accuracy rates of
93% to 97%, depending on their specific configurations and pre-processing approaches.
The primary strength of 3D CNNs is their ability to capture spatial dependencies across
the three-dimensional structure of the brain, enabling the detection of subtle changes in
brain regions associated with early Alzheimer's. Unlike traditional 2D CNNs, which analyze
slices of imaging data independently, 3D CNNs analyze the complete volumetric context,
offering improved sensitivity to early structural alterations. This capability makes 3D CNNs
particularly valuable for diagnosing Alzheimer's disease at its earliest stages, facilitating
timely interventions.
In (Shastry 2024), a custom CNN has been developed using the ADNI dataset, which
includes 10,000 MRI images categorized into three classes: Non-demented, MCI, and AD.
The methodology involved incorporating regularization methods, e.g. dropout, and employ-
ing data augmentation techniques to improve the model's robustness and generalizability.
These strategies significantly enhanced classification performance, achieving high accu-
racy while mitigating overfitting by expanding the effective training dataset with synthetic
variations. However, the custom CNN faces challenges, including limited interpretability,
which restricts its ability to provide explanations for its predictions—a critical factor for
clinical adoption. Additionally, the model is computationally expensive to train, requir-
ing substantial computational power and large memory, which may limit its scalability in
resource-constrained settings. Despite these limitations, the approach demonstrates a pow-
erful framework for AD detection.


In (Nanthini et al. 2024), the Hybrid Deep Belief Network (DBN) approach integrates
imaging and non-imaging data using a multi-task learning framework, applied to a com-
bined dataset of ADNI and OASIS with 15,000 images. The data is categorized into 3
classes: Cognitive Normal, MCI, and AD. The methodology leverages the representational
power of Deep Belief Networks (DBNs) to process multi-modal data, improving the ability
to extract complex patterns and dependencies. This approach excels in robust multi-modal
learning and provides effective predictions of disease progression, particularly by synthe-
sizing diverse data types like MRI scans and clinical metrics. However, the integration
of multi-modal data imposes high computational demands, requiring substantial resources
for training and deployment. Despite these challenges, the Hybrid DBN method offers a
promising avenue for improving diagnostic accuracy and understanding Alzheimer’s dis-
ease progression.
Hussain et al. (2020) introduced a 12-layer CNN model to classify two categories
(Alzheimer/healthy) on MRI data from the OASIS dataset, achieving 97.75% accuracy and
97.50% F1-score, surpassing pre-trained architectures like InceptionV3 and VGG. Cui et al. (2021) utilized adaptive logistic regression with particle swarm optimization (PSO) on the ADNI dataset, reporting accuracy values of 96.27%, 84.81%, and 76.13% for different binary classifications of Alzheimer's subtypes. Erdogmus and Kabakus (2023) proposed a 12-layer CNN optimized using 12 hyperparameters for
the DARWIN dataset, which involved converting 1D data to 2D, yielding a classification
accuracy of 90.4%.
Sun et al. (2021) enhanced ResNet50 with spatial transformer networks and attention
mechanisms, achieving 97.1% accuracy. Manimurugan (2020) fine-tuned VGG19 on
OASIS data to achieve 95.82% accuracy, while Sharma et al. (2022) employed a VGG16-
based model with ANN for Alzheimer’s classification, obtaining 90.4% accuracy for four-
class classification. Savas (2022) identified EfficientNetB0 as the top-performing model in
a comparative analysis with a 92.98% accuracy rate. Lahmiri (2023) combined CNN-based
feature extraction with KNN classification optimized using Bayesian optimization, achiev-
ing 94.96% accuracy on OASIS data.
Several common challenges were encountered across the mentioned studies on Alzheimer's disease classification using deep learning techniques. One of the primary issues
is data availability and quality. Many studies rely on publicly available datasets such as
ADNI, OASIS, and DementiaBank, which, while valuable, may not always be representa-
tive of diverse populations. Additionally, limited data can lead to overfitting, especially in
complex models requiring extensive training.
Another significant challenge is class imbalance. Some datasets contain an uneven dis-
tribution of classes, with fewer samples for early-stage Alzheimer’s or Mild Cognitive
Impairment (MCI). This imbalance can bias models toward majority classes, reducing per-
formance for underrepresented groups. Researchers often apply techniques such as overs-
ampling, undersampling, or weighted loss functions to address this issue, but it remains a
persistent limitation.
Interpretability and explainability also pose critical concerns. Deep learning models, par-
ticularly CNNs, often function as "black boxes," making it difficult for clinicians to trust
and interpret their decisions. Although explainable AI (XAI) methods like saliency maps
and attention mechanisms have been introduced, their effectiveness in real-world clinical


decision-making remains an ongoing challenge. Without clear interpretability, AI models


may struggle to gain acceptance in medical practice.
The computational complexity of deep learning models is another hurdle. Advanced
architectures, such as 3D CNNs and hybrid deep learning models, demand substantial com-
putational resources, making them difficult to implement in clinical settings with limited
infrastructure. Training deep models on large MRI datasets requires powerful GPUs and
extended processing time, limiting accessibility for many research institutions.
Generalizability and reproducibility are also key concerns. While cross-validation tech-
niques enhance model reliability, models trained on one dataset (e.g., ADNI) may not gen-
eralize well to others (e.g., OASIS or DARWIN) due to variations in imaging protocols,
demographics, and clinical criteria. Ensuring that models remain robust across different data
sources is a significant challenge that researchers must address.
Handling noisy and incomplete data presents further difficulties. Clinical datasets often
contain missing or weakly labeled information, requiring techniques like Multiple Instance
Learning (MIL) to compensate. Noisy data, arising from imaging artifacts or inconsistent
labeling, can reduce model performance and lead to unreliable predictions.
Another challenge is feature extraction and data representation. The effectiveness of deep
learning depends on the quality of extracted features, but determining the most relevant
features for Alzheimer’s detection remains complex. Multi-modal learning approaches that
integrate MRI scans, clinical records, and cognitive assessments can improve accuracy but
also introduce additional complexity in feature fusion and model design.
Overfitting and regularization are also recurrent issues. Many studies applied techniques
such as dropout, data augmentation, and hyperparameter tuning to mitigate overfitting.
However, ensuring that models remain robust across different datasets is still a challenge.
Some models achieve high accuracy on training data but fail to generalize well on unseen
test data.
Finally, clinical adoption and validation remain major barriers. Despite the high reported
accuracy of many models, deep learning approaches require extensive validation before
clinical deployment. Regulatory challenges, ethical concerns, and the need for seamless
integration into existing medical workflows hinder real-world implementation. Without
addressing these concerns, AI-based models may struggle to transition from research set-
tings to practical healthcare applications.
Table 8 presents a comparison of several studies on AD detection in terms of methodology, dataset, AI models, accuracy, and key contributions. The criterion used is the accuracy metric for the same application, Alzheimer's disease (AD) detection. All studies include AI models used for AD detection applications; these models are compared in terms of accuracy to evaluate their performance.
The architecture of a deep learning model is selected based on several factors, for instance the number of parameters, so there are many considerations when choosing a model. For example, the Inception-V3 model has higher accuracy than VGG-19 and needs less RAM than VGG-19, due to the smaller number of parameters of Inception-V3. However, it has higher computational complexity (more mathematical computations/operations), which leads to longer training time.


Table 8 A comparison for several studies in AD detection

Study | Methodology | Dataset | AI models | Accuracy | Key contributions
Slimi et al. (2024) | Hybrid deep learning model (DenseNet121 & Xception) | ADNI (MRI images) | DenseNet121, Xception (hybrid) | 99.85% | Feature fusion, noise robustness, high accuracy, SMOTE for data augmentation
Turrisi et al. (2024) | CNN-based model for AD classification | ADNI | Various CNN architectures | 95–97% | Reproducibility, consistent evaluation, performance with data augmentation
Tong et al. (2024) | Multiple Instance Learning for dementia classification | OASIS (MRI + clinical) | Multiple Instance Learning (MIL) | 85–90% | Handling weakly labeled data, dealing with incomplete clinical data
Morris et al. (2024) | Explainable AI (XAI) for AD diagnosis | ADNI | CNN + XAI techniques (e.g., saliency maps, attention) | 96–98% | Transparency, trust-building in clinical settings, interpretable decision-making
Khan et al. (2022) | Implementing three deep learning models with significant hyperparameter optimization | DementiaBank | Hybrid models (CNN, Bidirectional LSTM), SDDNN | 93.31% | Achieving significant hyperparameter optimization via GridSearch
Ramani et al. (2024) | 3D CNN for early AD detection | ADNI (3D MRI volumes) | 3D CNN (volumetric data) | 93–97% | 3D spatial dependencies for early detection, capturing subtle brain changes in volumetric MRI data
Shastry (2024) | Custom CNN + data augmentation | ADNI | CNN | – | High accuracy, reduced overfitting
Nanthini et al. (2024) | Multi-task learning | ADNI + OASIS | DBN + multi-task | – | Effective progression prediction
Hussain et al. (2020) | Applying three deep learning models for binary classification (Alzheimer/healthy) on MRI | OASIS | CNN, InceptionV3, and VGG | 97.75% | Applying CNN for both binary and multi-class classification
Cui et al. (2021) | Implementing adaptive logistic regression with particle swarm optimization (PSO) | ADNI | Developing PSO as an optimization algorithm | 96.27% | High accuracy, reduced overfitting
Erdogmus and Kabakus (2023) | Building a 12-layer CNN optimized using twelve hyperparameters | DARWIN | CNN | 90.4% | Various processing on a huge dataset
Sun et al. (2021) | Enhancing ResNet50 with spatial transformer networks and attention mechanisms | ADNI | Hybrid of transfer learning | 97.10% | A new ResNet model incorporating the Mish activation function
Manimurugan (2020) | Applying transfer learning for multi-classification | OASIS | VGG19 | 94.82% | Fine tuning of the hyperparameters
Sharma et al. (2022) | Employing a VGG16-based model with ANN | Kaggle | VGG16 | 90.4% | Features from MRI scan images are extracted using the VGG16 model
Savaş (2022) | Implementing EfficientNetB0 as the top-performing model | ADNI | Transfer learning | 92.98% | Applying several processing techniques on the dataset
Lahmiri (2023) | CNN-based feature extraction with KNN classification | OASIS | Developing CNN with an optimization algorithm | 94.96% | Applying processing with both ML and DL models

17 Conclusions

This paper examines the latest advancements in DL and ML models for AD classification.
It also explores various applications of AI in AD research, including datasets, preprocessing
methods, challenges, and notable recent studies in the field. Furthermore, the paper dis-
cusses medical imaging modalities, risk factors associated with AD, the disease's progres-
sion stages, and key metrics used to evaluate the performance of AI models. Additionally,
it provides comparative analyses of different DL approaches, highlights their limitations,
identifies emerging trends, and offers recommendations and future directions for this rapidly
evolving domain.
In conclusion, deep learning has emerged as a transformative tool in the detection and
classification of AD, offering potential for early diagnosis and personalized treatment.
While advancements in multi-modal integration, transfer learning, and model interpretabil-
ity have been made, challenges such as data scarcity, computational complexity, and model
generalization remain.
Deep learning has proven to be a game-changer in Alzheimer’s disease classifica-
tion, offering unprecedented accuracy and efficiency. While challenges remain, continued
advancements in model architectures, data availability, and interpretability promise to bridge the gap between research and clinical practice. This paper presents a review to underscore
the potential of DL models as critical tools in the fight against AD.
In the context of evaluation metrics for AD detection, it is essential to use a combination
of the evaluation metrics to ensure reliable and accurate results. Proper evaluation allows for
the optimization of DL models, making them more effective in clinical practice and improv-
ing early detection of AD.
Future work should focus on refining these models, ensuring clinical applicability, and
promoting collaborations to enhance diagnostic accuracy and expand access to AD care,
ultimately enhancing patient outcomes and advancing healthcare solutions.


Author contributions The author contributed to the writing—the original draft of the manuscript, concepts,
methodology, resources, visualization, similarity reduction, and the editing of the manuscript, the review of
the writing and grammatical errors for the manuscript, and supervision of the proposed work.

Funding Open access funding provided by The Science, Technology & Innovation Funding Authority
(STDF) in cooperation with The Egyptian Knowledge Bank (EKB).
No funding was received for this manuscript.

Data availability No datasets were generated or analysed during the current study.

Declarations

Conflict of interest The authors declare no competing interests.

Open Access  This article is licensed under a Creative Commons Attribution 4.0 International License,
which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as
you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons
licence, and indicate if changes were made. The images or other third party material in this article are
included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material.
If material is not included in the article’s Creative Commons licence and your intended use is not permitted
by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the
copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References
Aaboub F, Chamlal H, Ouaderhman T (2023) Analysis of the prediction performance of decision tree-based
algorithms, International Conference on Decision Aid Sciences and Applications (DASA), Annaba,
Algeria, pp. 7–11.
Alattar AE, Mohsen S (2023) A survey on smart wearable devices for healthcare applications. Wireless Pers
Commun 132:775–783
AlSaeed D, Omar SF (2022) Brain MRI analysis for Alzheimer’s disease diagnosis using CNN-based feature
extraction and machine learning. Sensors 22(8):2911
Alsubaie MG, Luo S, Shaukat K (2024) Alzheimer’s disease detection using deep learning on neuroimaging:
a systematic review. Mach Learn Knowl Extract 6(1):464–505
Assam M, Kanwal H, Farooq U, Shah SK, Mehmood A, Choi GS (2021) An efficient classification of MRI
brain images. IEEE Access 9:33313–33322
Aswin KS, Purushothaman M, Sritharani P (2022) ANN and deep learning classifiers for BCI applications,
Third International Conference on Intelligent Computing Instrumentation and Control Technologies
(ICICICT), Kannur, India, 2022, pp. 1603–1607.
Bellio M (2021) Translating predictive models for Alzheimer’s disease to clinical practice: user research,
adoption opportunities, and conceptual design of a decision support tool, Doctoral thesis (Ph.D), Uni-
versity College London
BJ BN, Yadhukrishnan S (2023) A comparative study on document images classification using logistic
regression and multiple linear regressions, Second International Conference on Augmented Intelligence
and Sustainable Systems (ICAISS), Trichy, India, pp. 1096–1104.
Bloch L, Friedrich CM (2019) Classification of Alzheimer’s disease using volumetric features of multiple
MRI scans, 41st Annual International Conference of the IEEE Engineering in Medicine and Biology
Society (EMBC), Berlin, Germany, pp. 2396–2401.
Chaddad A, Desrosiers C, Niazi T (2018) Deep radiomic analysis of MRI related to Alzheimer’s disease.
IEEE Access 6:58213–58221
Chakraborty S, Banik J, Addhya S, Chatterjee D (2020) Study of dependency on number of LSTM units for
character based text generation models, International Conference on Computer Science, Engineering
and Applications (ICCSEA), Gunupur, India, pp. 1–5.
Chicco D, Warrens MJ, Jurman G (2021) The Matthews correlation coefficient (MCC) is more informative
than Cohen’s Kappa and Brier score in binary classification assessment. IEEE Access 9:78368–78381
Ching WP, Abdullah SS, Shapiai MI (2024) Transfer learning for Alzheimer’s disease diagnosis using effi-
cientNet-B0 convolutional neural network. J Adv Res Appl Sci Eng Technol 35(1):181–191


Chu NN, Gebre-Amlak H (2021) Navigating neuroimaging datasets ADNI for Alzheimer’s disease. IEEE
Consum Electron Mag 10(5):61–63
Cui X, Xiao R, Liu X, Qiao H, Zheng X, Zhang Y, Du J (2021) Adaptive LASSO logistic regression based
on particle swarm optimization for Alzheimer’s disease early diagnosis. Chemom and Intell Lab Syst
215:104316
de Almeida JRD, Oliveira JL (2019) GenericCDSS—a generic clinical decision support system, IEEE 32nd
International Symposium on Computer-Based Medical Systems (CBMS), Cordoba, Spain, pp. 186–191
Erdogmus P, Kabakus AT (2023) The promise of convolutional neural networks for the early diagnosis of the
Alzheimer’s disease. Eng Appl Artif Intell 123:106254
Feng C, Elazab A, Yang P, Wang T, Zhou F, Hu H, Xiao X, Lei B (2019) Deep learning framework for
Alzheimer’s disease diagnosis via 3D-CNN and FSBi-LSTM. IEEE Access 7:63605–63618
Feng Y, Xu J, Ji YM, Wu F (2021) LLM: learning cross-modality person re-identification via low-rank local
matching, IEEE Signal Processing Letters, 28:1789–1793,
Fletcher R, Díaz XS, Bajaj H, Ghosh-Jerath S (2017) Development of smart phone-based child health screen-
ing tools for community health workers, IEEE Global Humanitarian Technology Conference (GHTC),
San Jose, CA, USA, pp. 1–9
Gao XW, Hui R (2016) A deep learning based approach to classification of CT brain images, SAI Computing
Conference (SAI), London, UK, pp. 28–31
Gnanasegar SM, Bhasuran B, Natarajan J (2020) A long short-term memory deep learning network for MRI
based Alzheimer’s disease dementia classification. J Appl Bioinf Comput Biol 9(6):1–7
Gong Z, Chanmean M, Gu W (2024) Multi-scale hybrid attention integrated with vision transformers for
enhanced image segmentation, 2nd International Conference on Algorithm, Image Processing and
Machine Vision (AIPMV), Zhenjiang, China, pp. 180–184.
Grundman M, Petersen RC et al (2004) Mild cognitive impairment can be distinguished from Alzheimer
disease and normal aging for clinical trials. Arch Neurol 61(1):59
Guarín DL, Wong JK, McFarland NR, Ramirez-Zamora A (2024) Characterizing disease progression
in Parkinson’s disease from videos of the finger tapping test. IEEE Trans Neural Syst Rehabil Eng
32:2293–2301
Hashemifar S, Iriondo C, Casey E, Hejrati M (2022) DeepAD: a robust deep learning model of Alzheimer's
disease progression for real-world clinical applications, Preprint at arXiv:2203.09096
Huang Z, Zhu X, Ding M, Zhang X (2020) Medical image classification using a light-weighted hybrid neural
network based on PCANet and densenet. IEEE Access 8:24697–24712
Huang L, Qin J, Zhou Y, Zhu F, Liu L, Shao L (2023) Normalization techniques in training DNNs: methodol-
ogy, analysis and application. IEEE Trans Pattern Anal Mach Intell 45(8):10173–10196
Hussain E, Hasan M, Hassan SZ, Azmi TH, Rahman MA Parvez MZ (2020) Deep learning based binary clas-
sification for Alzheimer’s disease detection using brain MRI images, 2020 15th IEEE Conference on
Industrial Electronics and Applications (ICIEA), Kristiansand, Norway, pp. 1115–1120.
Iqbal S, Qureshi AN, Li J, Mahmood T (2023) On the analyses of medical images using traditional machine
learning techniques and convolutional neural networks. Archiv Comput Methods Eng 30:3173–3233
Jo T, Nho K, Saykin AJ (2019) Deep learning in Alzheimer’s disease: diagnostic classification and prognostic
prediction using neuroimaging data. Front Aging Neurosci 11(220):1–14
Kaya M, Çetın-Kaya Y (2024) A novel deep learning architecture optimization for multiclass classification of
Alzheimer’s disease level. IEEE Access 12:46562–46581
Kazemi A, Boostani R, Odeh M, AL-Mousa MR (2022) Two-layer SVM, towards deep statistical learn-
ing, International Engineering Conference on Electrical, Energy, and Artificial Intelligence (EICEEAI),
Zarqa, Jordan, pp. 1–6.
Khan YF, Kaushik B, Rahmani MKI, Ahmed ME (2022) Stacked deep dense neural network model to predict
Alzheimer’s dementia using audio transcript data. IEEE Access 10:32750–32765
Khan K, Husain S, Nauryzbayev G, Hashmi M (2023) Development and evaluation of ANN, ACOR-ANN,
ALO-ANN based small-signal behavioral models for GaN-on-Si HEMT, 30th IEEE International Con-
ference on Electronics, Circuits and Systems (ICECS), Istanbul, Turkiye, pp. 1–4.
Koutkias VG, Chouvarda I, Triantafyllidis A, Malousi A, Giaglis GD, Maglaveras N (2010) A personalized
framework for medication treatment management in chronic care. IEEE Trans Inf Technol Biomed
14(2):464–472
Lahmiri S (2023) Integrating convolutional neural networks, kNN, and Bayesian optimization for effi-
cient diagnosis of Alzheimer’s disease in magnetic resonance images. Biomed Signal Process Control
80:104375
Li B, Xu K, Feng D, Mi H, Wang H Zhu J (2019) Denoising convolutional autoencoder based B-mode ultra-
sound tongue image feature extraction, IEEE International Conference on Acoustics, Speech and Signal
Processing (ICASSP), Brighton, UK, pp. 7130–7134


Li J, Liu H, Li K, Shan K (2024) Heart sound classification based on two-channel feature fusion and dual
attention mechanism, 5th International Conference on Computer Engineering and Application (ICCEA),
Hangzhou, China, pp. 1294–1297.
Lu J, Zhang Q, Yang Z, Tu M (2019) A hybrid model based on convolutional neural network and long short-
term memory for short-term load forecasting, 2019 IEEE Power & Energy Society General Meeting
(PESGM), Atlanta, GA, USA, pp. 1–5
Malik I, Iqbal A, Gu YH, Al-antari MA (2024) Deep learning for Alzheimer’s disease prediction: a compre-
hensive review. Diagnostics 14(12):1281
Manimurugan S (2020) Classification of Alzheimer’s disease from MRI images using CNN based pre-trained
VGG-19 model. J Comput Sci Intell Technol. 1(2):34–41
Marcus DS, Fotenos AF, Csernansky JG, Morris JC, Buckner RL (2010) Open access series of imaging studies:
longitudinal MRI data in nondemented and demented older adults. J Cogn Neurosci 22(12):2677–2684
Minoofam SAH, Bastanfard A, Keyvanpour MR (2023) TRCLA: a transfer learning approach to reduce
negative transfer for cellular learning automata. IEEE Trans Neural Netw Learn Syst 34(5):2480–2489
Mohsen S, Elkaseer A, Scholz SG (2021a) Industry 4.0-oriented deep learning models for human activity
recognition. IEEE Access 9:150508–150521
Mohsen S, Elkaseer A, Scholz SG (2021) Human activity recognition using K-nearest neighbor machine
learning algorithm, 8th KES International Conference on Sustainable Design and Manufacturing, Croa-
tia, pp. 304–313.
Mohsen S, Ali AM, El-Rabaie E-SM, ElKaseer A, Scholz SG, Hassan AMA (2023a) Brain tumor classifica-
tion using hybrid single image super-resolution technique with ResNext101_32× 8d and VGG19 pre-
trained models. IEEE Access 11:55582–55595
Mohsen S, Bajaj M, Kotb H, Pushkarna M, Alphonse S, Ghoneim SSM (2023b) Efficient artificial neural
network for smart grid stability prediction. Int Trans Electr Energy Syst 2023:1–13
Mohsen S, Ali AM, Emam A (2024) Automatic modulation recognition using CNN deep learning models.
Multimed Tools Appl 83:7035–7056
Morris T, Liu Z, Liu L, Zhao X (2024) Using a convolutional neural network and explainable AI to diagnose
dementia based on MRI scans. Preprint at arXiv:2406.18555
Mukherjee P, Lall B, Shah A (2015) Saliency map based improved segmentation, IEEE International Confer-
ence on Image Processing (ICIP), Quebec City, QC, Canada, pp. 1290–1294.
Nagabushanam P, George ST, Radha S (2020) EEG signal classifiation using LSTM and improved neural
network algorithms. Soft Comput 24(13):9981–10003
Nanthini K, Sivabalaselvamani D, Chitra K, Gokul P, KavinKumar S, Kishore S (2023) A survey on
data augmentation techniques, 7th International Conference on Computing Methodologies and Com-
munication, Erode, India, pp. 913–920.
Nanthini K, Tamilarasi A, Sivabalaselvamani D, Suresh P (2024) Automated classification of Alzheimer’s
disease based on deep belief neural networks. Neural Comput Appl 36:7405–7419
Nehal TH, Khan AA, Shifa SA, Saiyara L, Hossain U, Islam AE (2023) A Shuffling building block and
augmentation parameter tuning techniques to handle small medical dataset, International Conference
on Electrical, Computer and Communication Engineering (ECCE), Chittagong, Bangladesh, 2023, pp.
1–6.
Patil V, Nisha SL (2021) Detection of Alzheimer’s disease using machine learning and image processing,
2021 International Conference on Smart Generation Computing, Communication and Networking
(SMART GENCON), Pune, India, pp. 1–5.
Pei Z, Gou Y, Ma M, Guo M, Leng C, Chen Y, Li J (2022) Alzheimer’s disease diagnosis based on long-range
dependency mechanism using convolutional neural network. Multimed Tools Appl 81(25):36053–36068
PJ K, RK, SN, SDP (2023) Secure and enhanced medical data repository, 6th International Conference on
Contemporary Computing and Informatics (IC3I), Gautam Buddha Nagar, India, pp. 727–732.
Prakash S, Jalal AS, Pathak P (2023) Forecasting COVID-19 pandemic using prophet, LSTM, hybrid GRU-
LSTM, CNN-LSTM, Bi-LSTM and stacked-LSTM for India, 6th International Conference on Informa-
tion Systems and Computer Networks (ISCON), Mathura, India, pp. 1–6.
Quach L-D, Quoc KN, Quynh AN, Thai-Nghe N, Nguyen TG (2023) Explainable deep learning models with
gradient-weighted class activation mapping for smart agriculture. IEEE Access 11:83752–83762
Rahmatillah I, Astuty E, Sudirman ID (2023) An improved decision tree model for forecasting consumer
decision in a medium groceries store, IEEE 17th International Conference on Industrial and Information
Systems (ICIIS), Peradeniya, Sri Lanka, pp. 245–250.
Ramani R, Ganesh SS, Rao SPVS, Aggarwal N (2024) Integrated multi-modal 3D-CNN and RNN approach
with transfer learning for early detection of Alzheimer’s disease. Iran J Sci Technol Trans Electr Eng.
https://doi.org/10.1007/s40998-024-00769-z


Ramteke N, Maidamwar P (2023) Cardiac patient data classification using ensemble machine learning technique, 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), Delhi, India, pp. 1–6.
Raza N, Naseer A, Tamoor M, Zafar K (2023) Alzheimer disease classification through transfer learning
approach. Diagnostics 13(4):801
Resmi S, Singh T, Singh RP, Kumar P (2023) Skull stripping in magnetic resonance imaging of brain using
semantic segmentation, 14th International Conference on Computing Communication and Networking
Technologies (ICCCNT), Delhi, India, pp. 1–7.
Sahu S, Sarma H, Jyoti Bora D (2018) Image segmentation and its different techniques: an in-depth analysis, International Conference on Research in Intelligent and Computing in Engineering (RICE), San Salvador, El Salvador, pp. 1–7.
Sakuma I (2013) Education and training in regulatory science for medical device development, 35th Annual
International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Osaka,
Japan, pp. 3155–3158.
Salah SBH, Chouchene M, Zayene MA, Sayadi FE (2024) Classification of Alzheimer's diseases from PET
images using a convolutional neural network. International Conference on Control, Automation and
Diagnosis (ICCAD), Paris, France, pp. 1–5.
Sang Y, Li W (2024) Classification study of Alzheimer’s disease based on self-attention mechanism and DTI
imaging using GCN. IEEE Access 12:24387–24395
Savaş S (2022) Detecting the stages of Alzheimer’s disease with pre-trained deep learning architectures. Arab
J Sci Eng 47(2):2201–2218
Sharma S, Guleria K, Tiwari S, Kumar S (2022) A deep learning based convolutional neural network model
with VGG16 feature extractor for the detection of Alzheimer disease using MRI scans. Meas Sens
24:100506
Shastry KA (2024) Deep learning-based classification of Alzheimer’s disease using MRI scans: a customized
convolutional neural network approach. SN Comput Sci 5:917
Shuvo MMH, Ahmed N, Islam H, Alaboud K, Cheng J, Mosa ASM, Islam SK (2022) Machine learning
embedded smartphone application for early-stage diabetes risk assessment, IEEE International Sympo-
sium on Medical Measurements and Applications (MeMeA), Messina, Italy, pp. 1–6.
Skolariki K, Exarchos T, Vlamos P (2020) Contributing factors to Alzheimer’s disease and biomarker iden-
tification techniques, 5th South-East Europe Design Automation, Computer Engineering, Computer
Networks and Social Media Conference (SEEDA-CECNSM), Corfu, Greece, pp. 1–8.
Slimi H, Balti A, Abid S, Sayadi M (2024) A combinatorial deep learning method for Alzheimer’s disease
classification-based merging pretrained networks. Front Comput Neurosci 18(1444019):1–13
Sonka M, Grunkin M (2002) Image processing and analysis in drug discovery and clinical trials. IEEE Trans
Med Imaging 21(10):1209–1211
SS, CS, BU (2023) sMRI classification of Alzheimer's disease using genetic algorithm and multi-instance learning (GA+MIL), International Conference on Electrical, Electronics, Communication and Computers (ELEXCOM), Roorkee, India, pp. 1–4.
Sun H, Wang A, Wang W, Liu C (2021) An improved deep residual network prediction model for the early
diagnosis of Alzheimer’s disease. Sensors 21(12):4182
Sungura R, Onyambu C, Mpolya E, Sauli E, Vianney J-M (2021) The extended scope of neuroimaging and
prospects in brain atrophy mitigation: a systematic review. Interdiscip Neurosurg 23:100875
Swami A, TV (2023) Multi-label tabular synthetic data generation for bundle recommendation problem, IEEE 2nd International Conference on Data, Decision and Systems (ICDDS), Mangaluru, India, pp. 1–6.
Tiwari Y, Rasool A, Hajela G (2020) Machine learning with generative adversarial network, Second International Conference on Inventive Research in Computing Applications (ICIRCA), Coimbatore, India, pp. 543–548.
Tong T, Wolz R, Gao Q, Guerrero R, Hajnal JV, Rueckert D (2014) Multiple instance learning for classification of dementia in brain MRI. Med Image Anal 18(5):808–818
Turrisi R, Verri A, Barla A (2024) Deep learning-based Alzheimer’s disease detection: reproducibility and the
effect of modeling choices. Front Comput Neurosci 18(1360095):1–13
Vijay V, Verma P (2023) Variants of Naïve Bayes algorithm for hate speech detection in text documents, International Conference on Artificial Intelligence and Smart Communication (AISC), Greater Noida, India, pp. 18–21.
Wu J, Zhang Y, Wang K, Tang X (2019) Skip connection U-net for white matter hyperintensities segmenta-
tion from MRI. IEEE Access 7:155194–155202
Wu Q, Xie Q, Xia D (2024) Application of computer vision and deep learning neural network in multi-modal
information fusion, International Conference on Data Science and Network Security (ICDSNS), Tiptur,
India, pp. 1–5.

Yan L, Song K (2010) Design of ARM-based telemedicine consultation system, International Conference on Biomedical Engineering and Computer Science, Wuhan, China, pp. 1–4.
Yang D, Wang R, Song C (2024) Classification of Alzheimer's disease using fMRI-based brain functional
network data, 6th Asia Symposium on Image Processing, Tianjin, China, pp. 97–101.
Zhang S, Hu J, Bao Z, Wu J (2013) Prediction of spectrum based on improved RBF neural network in
cognitive radio, International Conference on Wireless Information Networks and Systems (WINSYS),
Reykjavik, Iceland, pp. 1–5.
Zou Z, Chen K, Shi Z, Guo Y, Ye J (2023) Object detection in 20 years: a survey. Proc IEEE 111(3):257–276

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
