
Received 1 February 2025, accepted 25 February 2025, date of publication 4 March 2025, date of current version 14 March 2025.

Digital Object Identifier 10.1109/ACCESS.2025.3547343

Federated Explainable AI-Based Alzheimer’s Disease Prediction With Multimodal Data
SOBHANA JAHAN 1 , MD. RAWNAK SAIF ADIB 2 , SYED MAHMUDUL HUDA 3 ,
MD. SAZZADUR RAHMAN 4 , M. SHAMIM KAISER 4 , (Senior Member, IEEE),
A. S. M. SANWAR HOSEN 5 , DEEPAK GHIMIRE 6 , AND MI JIN PARK 7
1 Department of Computer Science and Engineering, Bangladesh University of Professionals, Dhaka 1216, Bangladesh
2 Department of Computer Science and Engineering, International University of Business Agriculture and Technology, Dhaka 1230, Bangladesh
3 SuperAnnotate AI, San Francisco, CA 94105 USA
4 Institute of Information Technology, Jahangirnagar University, Savar, Dhaka 1342, Bangladesh
5 Department of Artificial Intelligence and Big Data, Woosong University, Daejeon 34606, South Korea
6 IT Application Research Center, Korea Electronics Technology Institute, Jeonju-si 54853, Republic of Korea
7 Department of Psychiatry, Seoul St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, Seoul 03083, South Korea

Corresponding authors: Md. Sazzadur Rahman (sazzad@juniv.edu) and Mi Jin Park (dreamy16@gmail.com)
This research was supported by a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development
Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HR22C160504) and Woosong
University Academic Research Fund, South Korea, in 2025.

ABSTRACT Alzheimer’s Disease (AD) is a progressive neurological disease that severely impairs cognitive
function. Early detection is critical for effective treatment and management. Machine Learning (ML)
methods are often used to ensure early detection and prediction. However, ML has various issues, including
the data island problem. The fragmentation that results from the data island problem makes building reliable,
effective ML models more complex, and it is particularly problematic in industries where privacy is a
concern, like healthcare. Federated Learning (FL) can help tackle the data island problem by keeping
sensitive patient data decentralized and enabling many institutions to work together on model training
without exchanging raw data, all while maintaining privacy compliance. Because Random Forest (RF) proved to
be the best-performing classifier in this research, an RF classifier is used to build the FL model. The model incorporates
multiple data modalities, such as Magnetic Resonance Imaging (MRI) segmentation and clinical and psychological
data, to capture the variety of characteristics influencing the progress of AD. Another concerning
issue with ML is its uninterpretable character. We use SHapley Additive exPlanations (SHAP) Explainable
Artificial Intelligence (XAI) techniques that emphasize important factors impacting model decisions in order
to improve predictability and transparency. This explainability promotes confidence in AI-based diagnoses
by enabling researchers and physicians to comprehend the underlying mechanisms guiding the predictions.
The combination of XAI, FL, and Open Access Series of Imaging Studies (OASIS-3) Multimodal data
offers an interpretable, scalable, reliable, and privacy-centered solution for multiple complex issues, such as
predicting AD. This approach results in better diagnosis precision, greater security, and increased confidence
in AI technologies, making it a novel methodology in medical sciences. With data privacy maintained, our
method produces 98.93% accurate predictions, providing a solid detection strategy for AD. The suggested
approach’s F1-score, Precision, Recall, and AUC are 98.93%, 98.94%, 98.93%, and 99.97%, respectively.
This work also shows that a multimodal dataset performs better than a single modal dataset.

INDEX TERMS Federated learning, data security, data privacy, clinical data, psychological data, MRI
segmentation data, explainable artificial intelligence, dementia, neurodegenerative disease.

The associate editor coordinating the review of this manuscript and approving it for publication was Zijian Zhang.

2025 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 13, 2025, page 43435
S. Jahan et al.: Federated Explainable AI-Based AD Prediction With Multimodal Data

I. INTRODUCTION
Alzheimer’s Disease (AD) is a Neurodegenerative Disorder (NDD) in which brain cell death results in memory failure and cognitive deterioration. It is estimated that at least 50 million people worldwide suffer from AD and other dementias [1]. Given that Asia is the most populous region in the world, a survey projects that by 2050 there will be around 71 million dementia patients in this region. About 5.5 million Americans suffer from AD, which is the most common cause of dementia in Western societies [2]. AD does not presently have a cure. However, some drugs can slow the course of the illness [3]. The primary sign of AD is memory deterioration, particularly for recent experiences. This is one of the early signals, but as the illness progresses, memory declines further and new symptoms emerge. A family member or close friend may be better able to identify problems as symptoms worsen.
One of the most popular uses of Machine Learning (ML) in healthcare is medical diagnosis [4]. ML can identify trends in particular diseases and notify medical professionals of any anomalies. Analysis of clinical data facilitates a deeper comprehension of the fundamental processes causing illnesses. It also helps to comprehend how risk factors affect the course of a disease. A lot of data is available to clinicians these days, covering everything from biochemical tests to imaging device outputs and clinical symptoms. Hard-to-diagnose diseases and their symptoms can be recognized and detected using ML. ML can also help diagnose other conditions that are challenging to detect, including inherited disorders and early-stage cancers.
The issues associated with AD have gained significant attention in technical innovation and healthcare research. In this battle, ML proves to be a formidable weapon, utilizing enormous volumes of data to produce insights and prediction models that were previously unattainable. However, besides the frequently criticized “uninterpretable” character of ML algorithms, where the decision-making process is opaque and difficult to understand, using centralized datasets in ML raises severe privacy and security problems.
A primary drawback of applying ML to a centralized dataset is the heightened danger to data security and privacy. When all the data is kept in one place, it becomes more susceptible to hacking, breaches, and misuse. Large data centers may also result in inefficient processing and storage because the network needs to be able to handle heavy data loads, which can be expensive. Centralization may also lead to problems with compartmentalization, which restricts what can be gained from integrating datasets from various sources. It can run into issues with compliance and regulations, especially when handling data from several jurisdictions. In addition, adversarial attacks can take advantage of model flaws and manipulate inputs to produce biased or inaccurate results. Furthermore, tracking the decision-making process can be challenging when using uninterpretable models in ML, which might impede accountability and transparency. When data is exchanged between several parties or jurisdictions, privacy concerns are exacerbated and may result in breaches of data protection laws.
To address the disadvantages of centralized datasets with respect to privacy and security, a new approach known as Federated Learning (FL) is adopted. FL allows models to be trained on decentralized datasets. The model is trained locally using the data that each device (such as smartphones, Internet of Things devices, or local servers) can access. Only the updated model parameters (not the raw data) are then transmitted back to a central server for aggregation. This approach protects data since the raw data is never transferred outside the local devices. FL uses the data’s decentralized nature to improve privacy and lower the dangers associated with centralized data storage. This is the main interaction between FL and decentralized datasets. FL makes collaborative model training possible without requiring the transfer of sensitive data, which is especially helpful in industries like finance or healthcare, where data confidentiality is crucial.
AD can be determined by various factors (such as clinical data, genetic and psychological information, and brain imaging). The incorporation of multiple sources of data into the diagnostic process increases the accuracy and reliability of the diagnosis. Unimodal strategies sometimes fail to capture significant information that may be present in other forms of data. Considering different modalities of data provides a more complete picture of the disease. FL enables collaborative training on such distributed datasets, combining insights from all modalities while preserving privacy.
Explainable Artificial Intelligence (XAI) is required to tackle the “uninterpretable” aspect of ML models, which deliver judgments without stated reasoning. Comprehending the rationale behind Artificial Intelligence (AI) judgments is crucial for promoting transparency, credibility, and trust in critical fields such as healthcare, finance, and law enforcement. XAI contributes to ensuring that ML models are impartial, fair, and compliant with ethical norms in addition to being accurate. It also sheds light on decision-making processes, which helps programmers debug and enhance models. XAI also improves model transparency, helps identify and reduce biases, enhances overall model performance, and ensures that ML models continue to reflect human goals and values.
This paper presents a server-side setup for XAI in the context of FL for Alzheimer’s research. This decentralized AI system aims to protect privacy by enabling cross-checking of model behavior across several nodes without requiring access to specific client IDs. This methodology prioritizes an overall assessment of the FL model’s explainability over client-specific insights. This work provides a unique framework designed expressly for applications in Alzheimer’s research that combines the interpretability and transparency provided by XAI with the privacy-preserving advantages of FL.
Existing ML, Deep Learning (DL), and FL models for AD prediction have certain drawbacks: their performance is suboptimal when working with a single modality, data privacy regulations limit their implementation, ML creates a data island problem, and these models are not interpretable. The isolation of essential data across several businesses, institutions, or geographic areas, which hinders data exchange and collaboration, is known as the “data island problem.” To mitigate the data island problem and the uninterpretability problem of ML and to ensure a high-performing model with a diverse dataset, this work proposes a privacy-preserving FL XAI approach using a multimodal dataset. Random Forest (RF) is one of the best-performing models, so this FL model is built using an RF classifier. Part of this work has been published in [5], where the superiority of multimodal datasets over single-modal data is analyzed broadly. The integration of FL, XAI, and multimodal processing provides a scalable, trustworthy, and privacy-preserving framework for AD prediction. It guarantees better diagnostic accuracy, enhanced security, and improved trust in AI systems, making it a cutting-edge approach to healthcare research. The main contributions of this work are listed below:
1) Proposed a novel XAI-enabled RF-FL model using a multimodal dataset to predict AD efficiently with high accuracy and ensure explainability. The multimodal data ensures data diversity, which makes the model more robust and accurate. The use of FL and XAI makes the model privacy-preserving and trustworthy.
2) Compared and analyzed the results of the proposed model against other recent related works and found that the proposed model performs better.
3) Broadly discussed SHapley Additive exPlanations (SHAP)-based XAI for the model and found the top five features for each class and their identifying factors.
This article is structured as follows: Section II provides the background of AD, FL, and XAI. Related works are presented in Section III. The proposed methodology, along with the dataset description and model structure, is presented in Section IV. The result analysis is given in Section V. Finally, Section VI offers concluding remarks.

II. BACKGROUND
A. ALZHEIMER’S DISEASE
The most frequent type of dementia is AD, a progressive neurodegenerative illness that is most commonly distinguished by acute impairment of memory and cognitive deterioration. Ultimately, AD can impact behavior, communication, visuospatial orientation, and motor abilities [6]. AD is thought to affect more than 6 million Americans, many of whom are 65 years of age or older. Many people experience the illness in their lives as companions or close relatives of a person who has it. One of the early signs of AD is the inability to recall recent events or discussions. It eventually leads to severe memory loss and failure to perform daily tasks and activities. Medication may alleviate or reduce symptoms. Support programs and services can help people with the condition and those who take care of them. AD cannot be cured. Severe brain damage at an advanced stage might lead to infection, malnutrition, and dehydration. There is a risk of death from these conditions.

1) SYMPTOMS
The primary sign of AD is memory loss, particularly of recent discussions or experiences. This is one of the first warning signs, but as the disease progresses, memory declines and new symptoms emerge. An individual experiencing the condition may initially be aware of problems with cognition and memory. If symptoms intensify, a family member may be more likely to notice difficulties. The assessment of dementia involves two steps. First and foremost, it is critical to distinguish dementia syndromes from other conditions that may present similarly, such as delirium, depression, and mild cognitive impairment. Secondly, the identification of the dementia subtype is crucial as it could dictate the appropriate course of treatment [7]. Changes in the brain linked to AD cause increasing problems with the following:
• Memory: Everyone has memory lapses from time to time, but AD-related memory loss is permanent and gets worse with time. One’s ability to function at home or at work increasingly deteriorates with memory loss.
• Thinking and Reasoning: AD makes it hard to concentrate and think clearly, especially about abstract concepts such as numbers. It can be difficult to manage several tasks at once. It can be challenging to manage finances, keep track of money, and pay bills on time. Eventually, someone with AD may lose the capacity to understand and work with numbers.
• Making Judgments and Decisions: The capacity to reason and make sound decisions in day-to-day settings is compromised by AD. For example, someone may dress inappropriately for the weather or make poor choices in social settings. People may find it more challenging to respond to everyday problems.
• Organizing and Carrying Out Known Tasks: Routine tasks that must be completed step by step become harder. This could be cooking a meal from scratch or doing something one enjoys. People with advanced AD eventually lose the ability to perform simple daily routines like dressing and bathing.

2) STAGES
The preclinical stage, also known as the presymptomatic period, can span several years or longer and is the first of the clinical phases of AD. This stage is characterized by minor cognitive decline and initial neurological changes in the cerebral cortex and hippocampus, with no functional deterioration in day-to-day activities and no clinical manifestations of AD. The mild or early stage of AD is characterized by several symptoms that patients first experience. These symptoms include mood fluctuations, the onset of depression, trouble recalling and concentrating on things, and disorientation in time and place [8]. In the middle phase of AD, the disease has extended to sections of the cerebral cortex, causing more severe memory loss, which includes difficulties


identifying relatives and close companions, losing control over impulses, and having issues with communication, reading, and writing. Severe AD, also known as late-stage AD, is characterized by a progressive loss of function and cognition, including the inability to identify loved ones, a bedridden state, difficulty swallowing and urinating, and ultimately the patient’s death from these complications. The disease spreads throughout the cortex, with severe formation of neuritic plaques and neurofibrillary tangles.

3) CAUSES
Although the exact etiology of AD is still unknown, several factors are thought to increase the risk of developing it [9]. The root cause of AD is an abnormal buildup of proteins in the brain. The buildup of tau and amyloid proteins causes brain cell death. The human brain consists of more than 100 billion neurons and other cells. Neurons work together to carry out the processes required for tasks like thinking, absorbing information, recollecting, and organizing. Several factors causing AD are mentioned below:
• Age: The risk of developing AD rises every five years after the age of 65. Nevertheless, AD is not just a problem for older people. About one in twenty patients with the condition is under the age of sixty-five. This kind of AD, sometimes referred to as young-onset or early-onset AD, can begin to affect people as early as 40.
• Family History: Although the actual increase in risk is small, the genes one inherits from one’s parents can increase one’s risk of acquiring AD. However, in a small number of families, AD is triggered by the inheritance of a single gene, and the probability of the disorder being passed on is significantly greater.
• Down’s Syndrome: Individuals who have Down’s syndrome are more likely to acquire AD. This is due to the possibility that the genetic changes causing Down’s syndrome may also later lead to the build-up of amyloid plaques in the brain, which may eventually cause AD.
• Head Injuries: AD may be more likely to develop in those who have experienced severe head injuries, although more research is required in this area.

4) RISK FACTORS
Understanding the etiology, pathophysiology, and risk factors associated with AD remains a research focus [10]. Researchers think that a variety of risk factors, such as genes, habits, and behaviors, contribute to Alzheimer’s and other dementias. Many risk factors can be modified to reduce a person’s chance of cognitive decline, even though others, including age and family history, are fixed.
• Age: Multiple epidemiological studies concur that age is one of the most significant risk factors for AD and cognitive deterioration among the numerous demographic characteristics, including gender, race, and social status [11].
• Family History: Another significant risk factor for AD is family history. Studies show that individuals with a father, brother, or sister with Alzheimer’s are at an increased risk of developing the disease themselves. The risk increases if more than one family member is affected. Risk can be further increased by modifiable health factors such as sleep patterns, smoking, diabetes, or hypertension.
• Sex: More women than men develop AD, and the fundamental reason is that women usually outlive men. Nevertheless, women over 80 still have a slightly higher chance of acquiring AD compared to men of the same age. There is no certainty as to why this is the case.
• Genetics: It is well established that genetics contribute to AD. A person’s susceptibility to an illness is influenced by two types of genes: deterministic genes and risk genes. Alzheimer’s genes have been found in both groups. Deterministic genes (those that cause a disease rather than raise the risk of developing it) are thought to be responsible for less than 1% of Alzheimer’s cases.

B. FEDERATED LEARNING
FL is an ML technique that allows a collaborative model to be built without centralizing the data. This approach addresses important issues, including data privacy, security, and access rights, which is especially important when handling sensitive data. FL operates through the following crucial steps:
1) Initialization: At the beginning of the procedure, a central server initializes a global model. All contributing clients’ devices start with this model, which has not yet been trained.
2) Distribution: Every participating client receives the initial global model. The model structure is the same for all clients; however, each will train the model separately using its own local data.
3) Local Training: When they receive the model, clients use their data to train it. This autonomous local training ensures that the information never leaves the client’s device and protects privacy.
4) Model Updating: Each client updates the model following local data training. These updates may be model parameter gradients or weights, depending on how FL is implemented.
5) Aggregation: After that, the central server receives all local model updates. The server then combines these updates to enhance the global model. Federated Averaging (FedAvg), which computes the weighted average of the received updates, is one of the aggregation techniques that can be applied.
6) Global Model Update: The global model is updated using the combined updates. Without actually exchanging the data, this stage unifies the insights from all local datasets into a single model.
7) Iteration: The steps from distribution through global model updating are repeated over multiple rounds. The global model learns from more varied local datasets with each iteration, improving its accuracy and robustness.
8) Finalization: Once the training is judged adequate or a specific criterion is satisfied (such as finishing a given number of rounds), the final global model is prepared for deployment and can make predictions on fresh, unseen data.
FL allows entities to decouple ML from the requirement to store data in the cloud by enabling collaborative learning of a shared prediction model while retaining all training data locally on the device.

C. EXPLAINABLE ARTIFICIAL INTELLIGENCE
In terms of accuracy, AI and ML have shown they can transform public services, businesses, and society. They can even outperform humans on various tasks, including speech and image recognition and language translation. DL, their most effective product in terms of accuracy, is sometimes called “uninterpretable” [12]. Explaining these models gets harder and harder as they include enormous numbers of weights that carry essential information from the training datasets. Explainable artificial intelligence, which encourages AI systems that can reveal their internal workings and explain their decision-making, was inspired by this problem [13]. XAI refers to the collection of procedures that enable human users to understand and trust the output and outcomes produced by ML algorithms. Prominent figures in research, business, and government have been examining the advantages of explainability and creating algorithms that can be applied in various situations. For example, according to research, explainability is a prerequisite for AI clinical decision support systems in the healthcare industry. This is because the capacity to understand system outputs promotes collaborative decision-making among patients and healthcare providers and offers much-needed accountability.

III. RELATED WORKS
Leveraging a multimodal dataset that included clinical, neuroimaging, and psychological data gathered from the Open Access Series of Imaging Studies (OASIS)-3 dataset, Jahan et al. [5] suggested a five-class AD prediction technique. To ensure multimodality, they conducted data-level fusion employing clinical data from the Alzheimer’s Disease Research Center (ADRC), segmentation data from brain Magnetic Resonance Imaging (MRI) scans, and psychological evaluations. To choose one trait out of 39, they employed Pearson’s correlation technique. They applied the Synthetic Minority Oversampling Technique (SMOTE) to build a balanced dataset because their chosen dataset does not have equal representation for all five classes. SMOTE was able to produce 2248 cases for each class. With 98.81% accuracy for the multimodal approach, RF was their best-performing model. For clinical, psychological, and MRI segmentation data, the obtained accuracies are 98.0%, 94.21%, and 88.85%, in that order. In addition to proposing an AD patient management system, they provided feature-based explainability for their work using SHAP, showing the significance of each feature. The present work is an extension of Jahan et al. [5], in which the advantage of multimodal datasets over a single modality was analyzed.
Ouyang et al. [14] presented ADMarker, the first complete solution that combines novel FL algorithms with multi-modal sensors to identify more than 20 multidimensional AD digital biomarkers while protecting confidentiality. It can detect biomarkers in everyday life and provides a small, multimodal device solution that can be quickly installed in homes. The suggested framework is a unique three-stage FL design that combines new unsupervised and weakly supervised multi-modal FL models to jointly solve several significant real-world biomarker identification issues, including heterogeneity, limited computing resources, and insufficient data labels. A four-week clinical experiment using ADMarker was conducted on 91 senior people, of whom 31 had AD, 30 had Mild Cognitive Impairment (MCI), and 30 were Cognitively Normal (NC). With only a relatively small amount of labeled data, the results showed the detection of over 20 everyday events with up to 93.8% detection accuracy in typical residential contexts. The authors described their strategy as interpretable and distributed, and using the identified digital biomarkers, they obtained 88.9% accuracy for MCI in the early detection of AD. The weak labels derived from the activity records for online supervised FL are typically brief and shallow because most people fail to document their behaviors at length. Despite its effectiveness, the system faces limitations, including high computational overhead during FL and potential challenges in handling heterogeneous data quality across institutions. These limitations highlight the need for further optimization to balance model performance with computational efficiency.
A model’s effectiveness suffers when the data used to identify AD differ across multi-site datasets. The standard domain adaptation strategy, which involves exchanging information between source and target domains, raises privacy concerns. The Federated Domain Adaptation Framework via Transformer (FedDAvT), developed by Lei et al. [15], offers an FL-based approach that may mitigate the data privacy and heterogeneity problems. Using mean squared error for subdomain adaptation, the self-attention maps in the source and target domains are aligned. Results have shown that, using three AD datasets, their proposed model achieved accuracy rates of 88.75%, 69.51%, and 69.88% on the AD vs. NC, MCI vs. NC, and AD vs. MCI two-way classification tasks, respectively.
Bukhari et al. [16] proposed a novel Deep Convolutional Neural Network (DCNN) model and optimized it with the help of FL. By facilitating decentralized, multi-


institution learning and exhibiting remarkable efficiency and versatility, their model protects the confidentiality of information. With a 93% accuracy rate on the OASIS database, the model shows promise for improving patient care and AD diagnosis.
Using the FL technique, Ghosh et al. [17] addressed the problem of data abundance for AD patients. They used two datasets containing brain MRI images taken in several planes: Alzheimer’s Disease Neuroimaging Initiative (ADNI) and OASIS. They gathered 436 images for the OASIS dataset, 338 of which were normal MRI scans and 98 AD-positive. Additionally, 457 images were collected for the ADNI dataset, comprising 136 AD and 321 CN MRI scans. The datasets were then combined for the training and testing phases to assess the model’s efficiency, and data augmentation was performed on both datasets to expand them. Using the OASIS, ADNI, and combined datasets, the suggested MobileNet model was trained in a federated manner and achieved accuracies of 95.24%, 81.94%, and 83.97%, respectively. The accuracy could be further improved using more diverse datasets and more sophisticated models. There could also be room for improvement in the explainability of the work.
Almarar and Otoum [18] investigated decentralized model training for AD detection using FL in conjunction with RF and XGBoost (XGB). For this research, they made use of the AD MRI dataset. They chose XGB and RF because of their track record of success with complicated datasets. The reported accuracies for RF and XGB are 94.19% and 95.53%, respectively. FL, in turn, enabled numerous devices to cooperate and train this kind of ML model without centralizing the data. A more direct implementation of FL could have made the work more impactful.
Castro et al. [19] used the FL technique to safeguard the security and safety of the medical image data included in the OASIS and ADNI datasets by using biometric identification for validating the images. The authors constructed a CNN model with a 2D input layer for training. There are 1052 photos in the OASIS collection, of which 526 images are evenly split between healthy and AD patients.
In order to predict several classes of AD, Almohimmed et al. [20] suggested a multi-level stacking approach that integrates heterogeneous models and modalities. The ADNI cognitive sub-scores, such as the Clinical Dementia Rating (sum of boxes) and the AD Assessment Scale, are among the modalities. In level 1 of the suggested technique, the authors trained six base models on each modality (Clinical Dementia Rating (CDR), Alzheimer’s Disease Assessment Scale (ADAS), and Functional Activities Questionnaire (FAQ)): RF, Logistic Regression (LR), Decision Tree (DT), Support Vector Machine (SVM), K-nearest Neighbors (KNN), and Naive Bayes (NB). Next, a stacking training set and a stacking testing set are constructed by combining the outputs of every base model on the training set and the test set, respectively. In level 2, the outputs of the six base models (RF, LR, DT, SVM, KNN, and NB) are combined in this way to create three stacking models, one for each modality. An RF meta-learner is trained on the stacking training set and assessed on the stacking testing set. In level 3, additional stacking training and testing datasets are created by combining the predictions of the stacking models from all three modalities (CDR, ADAS, and FAQ). The meta-learner is trained on the stacking training set, and the final prediction is generated by evaluating the meta-learner on the test set. The outcomes show that the multi-modality strategy performs better than the single-modality strategy. Furthermore, when compared to standard ML classifiers and stacking algorithms using full multi-modalities, the suggested multi-level stacking algorithms perform best with specific features. For example, they achieve accuracy, precision, recall, and F1-scores of 92.08%, 92.01%, 92.07%, and 92.08% for two classes and 90.03%, 90.05%, 90.19%, and 90.03% for three classes, respectively. The performance of this model leaves room for improvement: although adequate, it could not fully deal with high-dimensional data and the computational burden that comes with it. Thus, further optimizations are needed to ensure that such AI frameworks exhibit better scalability and generalizability on other datasets.
Conversely, ADNI has an enormous archive of 6400 photos, The BrainCrossFed model, proposed by Rashmi et al. [21],
3200 of which are of healthy individuals and AD sufferers. overcomes concerns about data privacy by leveraging labeled
Their two experiments made up their work. The CNN model data accessible at several federated hubs and a dataset for
was implemented in the first experiment, while the FL AD to calculate a cross-fed average of model weights without
model was used in the second. Accuracy rates for the CNN transferring data to the cloud. The accuracy of the suggested
model were 91.94% and 90.70% for ADNI and OASIS, DL model has grown from 99.11% to 99.77%, nearly 100%,
respectively. After five learning rounds for the FL model, as a result of the Cross FedAvg method. The model can
the maximum accuracy was achieved in Round 5, with also detect global minima with the help of such a cross-fed
OASIS and ADNI scoring 92.00% and 88.50%, respectively. average. The suggested model achieves a final performance
By putting protection mechanisms in place against assaults of 99.7% with 100%.
like the model reversal attack, the reliability of the suggested A Hybrid FL (HFL) framework was put forth by Baiying
system could be increased. The results may have been Lei et al. [22] to protect data privacy while simultaneously
improved by employing more potent models, integrating the enabling the use of unlabeled data for Deep Neural Network
datasets, and raising accuracy. (DNN) training. In order to depict the Region Of Interest


(ROI), the authors presented a novel Brain-Region Attention Network (BANet) that uses attention to highlight significant regions. More precisely, the authors take the preprocessed structural Magnetic Resonance Imaging (sMRI) data and apply a brain template to extract ROI signals. In addition, they supplement the existing loss with a self-supervised loss to guide the attention map creation for learning representations from unlabeled data. Lastly, the authors tested this approach on a multi-center database with five AD datasets. The experimental results demonstrate that the suggested strategy outperforms state-of-the-art techniques, with mean accuracy rates on the MCI vs. NC, AD vs. NC, and AD vs. MCI tests of 63.34%, 85.69%, and 698.89%, respectively. The results might have been improved by employing more powerful models and integrating the datasets.
Ghosh and Gayathri [23] presented DenseFed-PSO, an FL framework combining DenseNet and Particle Swarm Optimization (PSO) for AD detection. DenseNet served as the backbone for efficient feature extraction, while PSO optimized hyperparameters to enhance model performance. The federated approach ensured privacy by training models across decentralized datasets without sharing sensitive patient information. Experimental results demonstrated superior accuracy and robustness compared to traditional methods. However, its challenges include computational overhead, scalability issues with heterogeneous data, and high communication costs. Despite these limitations, DenseFed-PSO offered a promising solution for privacy-preserving, high-performance medical image analysis in Alzheimer’s diagnosis.
Raisa et al. [24] developed the Cyber-Physical Fusion System for stress detection using different modalities, such as physiological and behavioral data combined with social media usage. The system used a DNN to conduct reliable stress detection by fusing information from wearable smart devices and the web. However, data integration complexity and privacy concerns were raised. The scalability of the platform, as well as privacy-preserving methods for real-world applications, were recommended for future work.
Kubi and Nazir [25] explored dementia prediction using multimodal clinical and imaging data, integrating biomarkers, cognitive assessments, and neuroimaging features to enhance prediction accuracy. By employing Gaussian Naïve Bayes (GNB) models, the study achieved an accuracy of 95% in detecting dementia stages, demonstrating the effectiveness of combining heterogeneous data sources. The model excelled at capturing complex relationships across modalities, providing a robust and early detection mechanism. However, its limitations included high computational demands, potential biases in clinical and imaging datasets, and challenges in data harmonization across modalities. The study underscored the importance of addressing these limitations for real-world applicability and scalability.
Basnin et al. [26] proposed a novel Evolutionary Federated Learning (EFL) strategy for the diagnosis of AD utilizing the multimodal datasets available in the ADNI repository. The approach used an evolutionary model that focused on improving the effectiveness of the federated model by enhancing its capabilities in conditions where uncertainty is a key factor. For the ADNI multimodal dataset (MRI, Positron Emission Tomography (PET), and clinical data), both baseline and EFL methods were implemented, and EFL was shown to achieve higher accuracy in AD diagnosis. The high cost of evolutionary optimization and the extreme heterogeneity of the data across clients were the main limitations of this work. This work also did not address the interpretability of the diagnosis model. Future work could focus on developing models that are highly scalable and dynamically adaptable.
Mitrovska et al. [27] suggested a Secure Federated Learning (SFL) technique that makes use of homomorphic encryption and secure aggregation. They explained that this conceals the data during model training across all the institutions involved. As assessed on the ADNI dataset, SFL kept patients’ diagnostic details private while still making the diagnosis with reasonable precision. However, its limitations include the high cost of encryption and scaling issues that arise as the dataset or the number of participants grows. Future studies could work on optimizing encryption techniques and improving the efficiency required for widespread adoption.

A. RESEARCH GAP
ML models that predict AD from a single modality do not perform well enough, and there is little scope for improving them. Conventional ML and DL models also suffer from the data island problem, and they do not ensure data privacy and security. FL frameworks therefore offer a way to improve model performance. Furthermore, very few existing studies use multimodal data, so there is still scope to explore multimodal datasets and enhance model robustness. Recent works integrate various imaging and clinical data to achieve multimodality. Psychological data are also critical in predicting AD, yet, to the best of our knowledge, no work on multimodal data includes psychological data. Moreover, existing FL, ML, and DL models for AD diagnosis are mostly uninterpretable, meaning that a general user cannot know the reason behind the model’s decision. To the best of our knowledge, there is no multimodal FL-based AD prediction model that offers interpretability using XAI techniques.


IV. PROPOSED METHODOLOGY
A. DATASET COLLECTION
For our work, we used the OASIS-3 dataset [28]. This dataset incorporates information from several studies conducted at Washington University’s Knight Alzheimer’s Disease Research Center in St. Louis over more than 15 years. Participants are drawn from three primary cohorts: (i) cognitively normal individuals with a CDR of 0 and a family history of AD, (ii) cognitively normal individuals with a CDR of 0 and no family history of AD, and (iii) older adults aged 65 years and older with a CDR of 0 (cognitively normal) or a CDR of 0.5–1 (very mild to mild symptoms of AD). Medical problems that interfered with long-term participation or study requirements were among the exclusion criteria. Participants were enrolled through community outreach; they then gave blood samples for genetic testing and underwent lumbar punctures, neuroimaging, and cognitive tests every two to three years.
OASIS-3 consists of structural and functional magnetic resonance imaging, amyloid and metabolic PET imaging, neuropsychological testing, and clinical data from 1,098 individuals, of whom 605 are cognitively normal adults and 493 are at various stages of cognitive decline. The participants’ ages range from 42 to 95. An identifier was assigned to each participant’s data, and all dates were removed and normalized to the number of days since the participant’s entry into the study. A large number of Magnetic Resonance (MR) volumetric segmentation files produced by FreeSurfer accompanied the sessions. Psychological data, FreeSurfer volumetric MRI segmentation data, and ADRC clinical data are utilized in this research. Table 1 lists all the data for each modality together with several characteristics.

1) ADRC CLINICAL DATA
In compliance with the National Alzheimer Coordinating Center Uniform Data Set (UDS), participants completed clinical evaluation protocols. The UDS examinations included medical history, neurological evaluation, physical examination, and family history of AD. Clinical and cognitive assessments were performed every three years for those aged 64 and under and annually for those aged 65 and older. The ADRC clinical data [27] includes longitudinal data from 1098 unique patients. Longitudinal data is data that is gathered periodically from the same participants. The numbers of instances in the CN, AD, Uncertain, Non-AD, and Other classes are 4476, 1058, 505, 142, and 43, respectively. The CN, AD, uncertain dementia, and non-AD dementia classes all have unique characteristics. Non-AD dementia labels include vascular dementia, Parkinson’s disease, Dementia with Lewy Bodies Disease (DLBD), and frontotemporal dementia. The dataset’s key features included age, judgment, Mini-Mental State Exam (MMSE), personal care, APOE (apolipoprotein E gene), memory, height, CDR, weight, Orient (latest and long-term memory tests), and Sumbox. Each year, the ADRC offers a neuropsychological test for people aged 65 and upwards. The Mini-Mental State Examination (MMSE) measures overall cognitive ability, with scores ranging from 0 (severe impairment) to 30 (no impairment). Subjects underwent clinical evaluation protocols in line with the National Alzheimer Coordinating Center’s UDS. UDS evaluations included physical tests, health history, and a neurological assessment. Subjects aged 64 and below received clinical and psychological evaluations every three years, while subjects aged 65 and up underwent annual clinical and cognitive examinations. The UDS employed the CDR scale to evaluate participants’ dementia status. A participant was deemed to have moderate dementia if their CDR score was 2, but when they reached CDR 0.5, they were deemed no longer suitable for in-person evaluations. The OASIS data "ADRC Clinical Data" (UDS form B1 and B4 variables) includes the following information: weight, height, age at entry, and CDR assessments. As a component of the evaluation, clinicians provided a diagnostic impression after discussion. This led to a coded dementia diagnosis that was documented in the OASIS data "ADRC Clinical Data." Diagnoses for this variable include "AD dementia," "regular cognition," "vascular dementia," and contributing factors such as vitamin deficiencies, alcoholism, and mood disorders. While the diagnostic conclusions may overlap, the determination of the variables dx1–dx5 is separate from the UDS assessments.

2) PSYCHOLOGICAL DATA
Numerous well-known psychological tests, including the Boston Naming Test, animals, Trailmaking B (Trail B), Trailmaking A (Trail A), vegetables, digit span, logical memory, digit symbol, Wechsler Adult Intelligence Scale (WAIS), and others, are included in the psychological evaluation dataset [28]. Digit Span involves participants repeating a series of digits in both directions in order to assess their memory and attention span. The participant’s score was derived from the number of trials that were successfully repeated both backward and forward, in addition to the longest sequence that the subject could repeat backward. Semantic assessments of memory and language include the Boston Naming Test, which asks participants to name pictures of everyday things, and the Category Fluency Test, which asks participants to name as many words as possible that fall into a category, such as "vegetable" and "animal." Psychomotor speed was measured using the Trail Making Test Part A and the WAIS-R Digit Symbol test. The WAIS-R Digit Symbol test is scored based on how many digit-symbol pairs are completed in ninety seconds. Executive function was assessed using the Trail Making Test Part B. In Part A of the Trail Making Test, subjects were required to connect a series of numbers (1–26) and, in Part B, a series of alternating numbers and letters (1-A-2-B) in order to construct a trail. The outcome measures are the number of commission errors, the number of correct lines, and the total time to completion in seconds, with a maximum of 150 s for Trail A and 300 s for Trail B.


TABLE 1. A brief explanation of multimodal dataset [5].

The Wechsler Memory Scale-Revised Logical Memory, Story A, assesses episodic memory. Subjects are asked to recall facts from a brief narrative containing 25 bits of information immediately after an examiner reads it aloud and again following a 30-minute delay. Scores range from 0 (no recall) to 25 (complete recall).

3) MRI SCAN SEGMENTATION DATA
The Knight Alzheimer Research Imaging Program at Washington University in St. Louis performed all of the neuroimaging studies. Three distinct Siemens scanner models (Vision 1.5T, TIM Trio 3T (two separate scanners of this kind), and BioGraph mMR PET-MR 3T) were used to capture MRI data (Siemens Medical Solutions USA, Inc.). For archival purposes, every session concurrently recorded on the Biograph mMR scanner has been divided into separate PET and MRI sessions. In these sessions, the images were analyzed by an open-source software suite called FreeSurfer, which can process and interpret MRI scans of an individual’s brain. This FreeSurfer dataset provides volumetric data for various human brain regions, including the total cortex, intracranial volume, hemisphere cortex, supratentorial volume, subcortical gray matter, total gray matter, and hemisphere cortical white matter [28].
The Extensible Neuroimaging Archive Toolkit (XNAT) pipeline of FreeSurfer was used to perform the image analysis. FreeSurfer v5.0 or v5.1 was used to process the T1-weighted images. The technique encompasses motion correction and averaging of volumetric T1-weighted images, automated Talairach transformation, tessellation of the gray matter/white matter boundary, intensity normalization, automated topology correction, segmentation of the subcortical white matter and deep gray matter volumetric structures (including the hippocampus, putamen, caudate, amygdala, and ventricles), and removal of non-brain tissue using a hybrid surface deformation technique. Once the cortical models were complete, several deformable procedures were carried out for additional data processing and analysis. These procedures included surface inflation, parcellation of the cortex into units according to gyral and sulcal structure, registration to a spherical atlas based on individual cortical folding patterns to match cortical geometry across subjects, and generation of various kinds of surface-based data, such as maps of curvature and sulcal depth. This method uses continuity information from the entire three-dimensional MR volume in the segmentation and deformation procedures to produce representations of cortical thickness, which is calculated as the closest distance from the gray/white boundary to the gray/cerebrospinal fluid (CSF) boundary at each vertex on the tessellated surface. The generated maps can detect submillimeter differences between groups and are not limited by the voxel resolution of the data. Procedures for measuring cortical thickness have been validated through comparison with manual measurements and histological analysis. FreeSurfer morphometric techniques have shown good test-retest reliability across scanner manufacturers and field strengths.

B. PRE-PROCESSING OF MULTIMODAL DATASET
This research project aims to use multimodal data to predict AD. Here, multimodal data is created through data-level fusion. To achieve multimodality, we combined datasets from three distinct domains: brain MRI segmentation, clinical, and psychological data. The three separate datasets are integrated at the start of the data fusion approach. This integration was carried out on the OASIS website, which offers the ability to combine patient data from various domains according to subject ID and session.

1) QUANTITY OF EACH MODALITY’S INSTANCES
Once the combined dataset was created, 3342 instances of the same participants were found in the psychological and clinical assessments, whereas the MRI segmentation dataset contains 3220 instances. Thus, 3342 − 3220 = 122 instances are missing from the MRI segmentation dataset. There were 810 (clinical and psychological) and 799 (MRI segmentation) distinct subjects. The combined dataset has 34 different kinds of labels, all of which fall into five main categories: uncertain dementia, CN, AD, non-AD, and others.

TABLE 2. RMSE value after data imputation using KNN [5].

2) KNN-BASED DATA IMPUTATION
At this point, the missing values are filled in by applying KNN imputation. Each missing value of a sample is imputed with the average value from the N nearest neighbors in the training set. Two samples are considered near if the features that neither is missing are similar. The root mean square error (RMSE) of the imputed collection with varying


numbers of neighbors is displayed in Table 2. Two neighbors give a lower RMSE score than the other settings. Consequently, two neighbors are used to impute the joined (fused) dataset.

3) SELECTION OF FEATURES
The multimodal dataset has 39 features, but not every feature has the same impact on the prediction. Both the single datasets and the multimodal dataset undergo feature selection to determine the key features. Pearson’s correlation is used to choose features. Pearson’s correlation, shown in Figure 1, measures the linear relationship between two quantitative variables. It is regarded as the most effective method for quantifying the relationship between variables because it is based on covariance. It shows the direction of the association as well as the magnitude of the correlation.
Additionally, the Boruta feature selection approach and cross-validated Recursive Feature Elimination (RFE) were initially used to pick significant features. However, each of these two strategies has disadvantages of its own. The number of features selected for the ML models varies after every RFE execution, and RFE is computationally costly. Because RFE was not stable, it was not an appropriate feature selection strategy for this investigation. Additionally, Boruta did not offer a high accuracy score. Given these disadvantages, Pearson’s correlation is favored in this study.
For Pearson correlation, values range from -1.0 to 1.0: 1.0 denotes a perfect positive correlation, -1.0 a perfect negative correlation, and 0 no linear correlation. When two features are strongly correlated, it is not necessary to keep both; to decrease errors and reduce computational complexity, only one of the two should be retained. For the Clinical dataset, 0.9 is the threshold value that performs the best. The optimal value for the MRI dataset is obtained with thresholds between 0.9 and 0.95. The most promising results for the Psychological dataset are obtained at threshold values of 0.75, 0.8, and 0.9.
For the multimodal dataset, CortexVol, RhCortexVol, CorticalWhiteMatterVol, TotalGrayVol, and RhCorticalWhiteMatterVol were found to be the least influential features with strong correlation values at a Pearson correlation threshold of 0.95. Features with strong correlations should be eliminated because they lead to unstable models, numerical errors, and poor prediction accuracy. For this work, the selected features are: DIGIF, LOGIMEM, DIGIB, DIGIBLEN, TRAILARR, ANIMALS, TRAILA, TRAILBRR, MEMUNITS, TRAILALI, TRAILB, TRAILBLI, WAIS, MEMTIME, BOSTON, lhCortexVol, IntraCranialVol, SupraTentorialVol, DIGIFLEN, LhCorticalWhiteMatterVol, AgeAtEntry, Homehobb, Judgment, Commun, Memory, VEG, SubCortGrayVol, Orient, Perscare, MMSE, APOE, Sumbox, Height, and Weight.

4) DATASET BALANCING
The multimodal dataset was imbalanced, meaning the number of instances in each class was unequal. The CN class contained 2248 instances, whereas AD contained 669. Non-AD, Uncertain dementia, and Others contained 106, 287, and 26 instances, respectively. Oversampling is one way to balance an imbalanced medical dataset. Here, oversampling with SMOTE is employed to create a balanced dataset. SMOTE is one of the most widely used oversampling strategies for dealing with class imbalance. It corrects the class distribution by synthesizing new minority-class samples rather than merely replicating existing ones: each new instance is interpolated between an existing minority instance and one of its nearest minority-class neighbors. After oversampling, each of the five classes contained 2248 instances, so the total number of instances became 11240, as listed in Table 3.

C. MODEL IMPLEMENTATION

1) FEDERATED LEARNING MODEL
FL is an ML technique that allows building a collaborative model without centralizing the data. This architecture can be described in four steps:
Initialization: Firstly, the central server initializes the global model. The global parameters or meta parameters of the model (e.g., weights in an ML model) are set by the central server, known as the coordinator. These parameters are set to a pre-defined initial value, which is zero. A top-level configuration fixes the hyperparameters, such as the learning rate, the number of iterations (rounds), and the aggregation approach (e.g., FedAvg). The initial model, which has been trained using the global data, is then forwarded to all participating devices.
Local Training: Secondly, all the client devices start executing this RF model. Here, the number of clients is kept at 10 because this setup gave the best result. Each client trains the RF model separately on its own dataset. This is local training, and after this process, the model parameters (gradients or weights) are transferred to the global model.
Aggregation: Thirdly, using these updated weights, the central global model updates itself, improving its accuracy and other performance metrics. The server uses the FedAvg technique to combine these updates into an improved global model.
Global Update: Fourthly, the global model passes the updated RF to each local model, and the local models again train RF on their datasets. In this work, the process continues for 50 epochs, and each time the clients train RF and pass back the updated weights.

2) ALGORITHM
The FedAvg algorithm (Global Model) is given in Algorithm 1. This algorithm can be described in 8 steps.
1. Setting up the Model V_0:


FIGURE 1. Pearson’s heatmap with 39 features of multimodal dataset for correlation [5].
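
The correlation-threshold pruning described in the feature-selection step can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names `pearson` and `drop_correlated`, the toy values, and the rule of keeping whichever feature of a correlated pair comes first are all assumptions made for illustration, since the text only states that one feature of each strongly correlated pair is retained.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: correlation is undefined, treated as 0 here
    return cov / math.sqrt(var_x * var_y)

def drop_correlated(columns, threshold=0.95):
    """Keep a feature only if |r| with every already-kept feature is <= threshold.

    Assumption: the first feature of a correlated pair (in dict order) is kept.
    """
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Toy values (hypothetical, not taken from OASIS-3).
toy = {
    "lhCortexVol": [0.5, 1.0, 1.5, 2.0, 2.5],
    "CortexVol":   [1.0, 2.0, 3.0, 4.0, 5.0],   # proportional to lhCortexVol
    "MMSE":        [30.0, 22.0, 28.0, 19.0, 27.0],
}
selected = drop_correlated(toy, threshold=0.95)
print(selected)
```

In this toy case `CortexVol` is dropped because it is perfectly correlated with `lhCortexVol`, while `MMSE` is only weakly correlated with the kept feature and is retained.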

TABLE 3. Five classes and their number of samples [5].
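
The balancing arithmetic behind these class counts, together with the interpolation idea of SMOTE, can be sketched as follows. This is a simplified, standard-library illustration under stated assumptions rather than the pipeline used in the paper: the function name `smote`, the neighbor count `k`, the seed, and the toy points are hypothetical, and the paper does not state which SMOTE implementation or parameters were used.

```python
import math
import random

def smote(minority, n_new, k=5, seed=42):
    """Create n_new synthetic points by interpolating between a minority sample
    and one of its k nearest minority-class neighbours (Euclidean distance).
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic

# Class sizes reported before balancing; every class is raised to the CN size.
counts = {"CN": 2248, "AD": 669, "Uncertain": 287, "Non-AD": 106, "Others": 26}
target = max(counts.values())
needed = {label: target - n for label, n in counts.items()}
total_after = target * len(counts)
print(needed)
print(total_after)

# Toy minority class with two hypothetical features.
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(toy_minority, n_new=6, k=2)
```

Because every synthetic point lies on a segment between two existing minority samples, the new instances stay inside the region already occupied by that class instead of duplicating individual records.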


The first step in the procedure is to initialize the global model V_0. It contains all of the model’s parameters, including weights and biases.
2. Iterative Training Rounds:
The algorithm operates in a series of communication rounds (t = 1, 2, ...), during which information is periodically exchanged between the central server and the participating devices.
3. Client Selection (R_t):
In each round, a subset of the available clients (R_t) is picked. R_t is defined as a random set of max(C · P, 1) devices, where C represents the fraction of clients who participate and P represents the total number of devices. This helps to balance participation variety and communication costs.
4. Broadcasting the Global Model:
The central server provides the most recent global model V_{t-1} to all selected devices R_t. During the training round, all devices start from the same model.
5. Local Device Training:
Each selected device P uses its local dataset to train the global model V_{t-1}. The devices adjust the parameters of the local model to obtain V_t^P based on local data.
6. Uploading the Local Updates:
After local training, the devices send their updated model parameters V_t^P to the central server. These updates are usually limited to model weights or gradients, ensuring that no raw data is communicated and data privacy is protected.
7. Global Aggregation:
On the central server, the updates V_t^P from all the devices that participated in the training are merged to produce a new version of the global model V_t. This process makes use of the FedAvg technique. Each device’s update is weighted in proportion to the size of the dataset available on that device.
8. Iterative Process:
At the conclusion of each training round, the updated global model V_t is sent back to all the selected devices to begin the subsequent training round. This process is repeated for a set number of rounds or until the model reaches the desired target performance.

Algorithm 1 Algorithm of FedAvg (Global Model)
Input: V_0
for each round t = 1, 2, ... do
    R_t = (random set of max(C · P, 1) devices);
    for each device P belonging to R_t in parallel do
        Broadcast the recent global model V_{t-1} to device P;
        Receive the local model update V_t^P from device P;
    Update the global model V_t.

The FedAvg algorithm’s local device component describes how each device (or client) contributes to the collaborative learning process in an FL system. Algorithm 2 is explained in detail below:
1. Initial Process:
This is an iterative process, which means that the algorithm proceeds in sync with the global server for multiple rounds. Each round is a cycle in which the local device receives, trains, and updates the global model.
2. Receiving the Updated Global Model:
At the start of every round, the local device is provided with the updated global model V_t from the central server. This enables all the devices to commence their local training from the identical state of the global model.
3. Local Training Using the Updated Global Model:
Each local device uses the received global model as the initial state of its local training. The model is then trained on the device’s private, locally stored data. RF is applied to optimize the model on the local dataset.
4. Computing the Updated Local Model:
Upon completing the training process on the local data, the device calculates its updated local model parameters (V_t^P). These updates incorporate the information gained from the local data. Still, the changes are made in a way that keeps the information secure, meaning that the original files are never disclosed.
5. Disseminating the Updated Local Model:
The updates to the local model parameters are returned to the central server for further storage and computation. Communication is limited to exchanging model updates, such as weights, gradients, and model parameters.

Algorithm 2 Algorithm of FedAvg (Local Device)
for each round t = 1, 2, ... do
    Receive the updated global model;
    Use the updated global model to train the local model;
    Compute the new updated local model V_t^P;
    Share the recently updated local model with the global model;

3) RANDOM FOREST GLOBAL MODEL DESCRIPTION
For implementing the global model of the FL, the RF model gives the best performance. The specifics of its implementation are covered here. In this instance, the base estimator is the decision tree classifier. Each estimator is trained on a different bootstrap sample drawn from the training set. Estimators use all attributes for both training and prediction. The forest contains one hundred trees. The Gini criterion is used to evaluate the quality of a split. At least two samples are needed to split an internal node. Nodes are expanded until each leaf contains fewer than two samples. A leaf node must contain at least one sample. The square root of the number of attributes determines how many attributes are considered when looking for the best split. The tree can reach a maximum depth of 30 and a minimum of one node, which can function as a leaf. The tree is grown using the Best-First method. The relative reduction in impurity identifies the best nodes. Bootstrap samples are employed in tree construction. Every time the model

is executed, it is given identical training and test sets because the random state is fixed at 42. All five classes are assigned a weight of one.

4) EXPLAINABLE AI MODEL
"XAI" is a set of frameworks and tools intended to help people comprehend and analyze the predictions generated by ML models. It also helps people understand the model's behavior and enables debugging and performance improvements. The output of any ML model can be explained using the game-theoretic method known as SHAP. This method uses the classic Shapley values from game theory and their related extensions to connect optimal credit allocation with local explanations. The Shapley value of a feature value is its contribution to the payout, weighted and summed over all possible feature value combinations. The SHAP value for a feature k is calculated as:

$$\phi_k(val) = \sum_{G \subseteq \{1,\ldots,h\} \setminus \{k\}} \frac{|G|!\,(h - |G| - 1)!}{h!} \bigl( val(G \cup \{k\}) - val(G) \bigr) \tag{1}$$

Here, in equation 1, \phi_k(val) is the SHAP value for feature k, representing its contribution to the prediction. h denotes the total number of features. G is a subset of features excluding k. val(G) is the model's prediction when only the features in subset G are considered (other features are treated as missing).

$$val_x(G) = \int \hat{z}(w_1, \ldots, w_h)\, dH_{w \notin G} - E_W\bigl(\hat{f}(W)\bigr) \tag{2}$$

In equation 2, val_x(G) represents the value or contribution of the subset G of features to the model's prediction for a specific input x. \hat{z}(w_1, ..., w_h) describes a modified function or approximation of the model output \hat{f}, evaluated with some or all features from the subset G. dH_{w \notin G} refers to the marginal distribution over the features not included in G. Integration over this distribution accounts for the uncertainty or average contribution of the excluded features. E_W(\hat{f}(W)) refers to the expected value of the model's output \hat{f}, averaged over the entire distribution H(W) of all features. This is the baseline prediction, representing the model's behavior without specific feature contributions.

$$val_w(G) = val_w(\{1, 3\}) = \int_{\mathbb{R}} \int_{\mathbb{R}} \hat{z}(w_1, W_2, w_3, W_4)\, dH_{W_2 W_4} - E_W\bigl(\hat{f}(W)\bigr) \tag{3}$$

For each feature missing from G, an integration must be carried out. Equation 3 provides an example: applying a model with four features (w1, w2, w3, and w4) and evaluating the coalition G formed by the feature values w1 and w3.
This is comparable to the feature contributions of the linear model. The Shapley value is the only attribution method that satisfies the criteria of Efficiency, Symmetry, and Additivity, all of which are useful in defining a fair payout. Therefore, the Shapley value for a given individual is the contribution of a variable (or several variables) to the difference between the model's predicted value and the average of all individual predictions. To compute it:
Step 1: Calculate the Shapley values for a particular subject. For the input characteristics, simulate different combinations of values.
Step 2: Calculate how much each combination's average forecast differs from the predicted value. Consequently, a variable's Shapley value is equal to the average of the value contributions resulting from the different combinations. Tree SHAP is a quick and precise method for computing SHAP values for tree models and ensembles under various feature dependence assumptions.
The SHAP Tree Explainer is used in this research to analyze and visualize the features that drive each decision and their percentage contributions. This method makes it simple for a physician or patient to comprehend the model's reasoning, as well as the characteristics and proportions that go into the decision-making process.

D. PROPOSED MODEL ARCHITECTURE
A server-side global model is built on the RF model. Initially, this model is transferred to all the clients. Clients train the model using their local data and pass the parameters and model weights to the global model. Using the FedAvg technique, all the updated weights are accumulated into the global model. The newly updated model is passed on to the clients again, and this process continues until all 50 epochs are completed. In the global model, a SHAP explainer turns the otherwise uninterpretable RF global model into a white-box RF global model. Using this XAI model, the central administrator can identify whether the model is performing correctly as well as which features are the most impactful for classifying each class. As FL supports data privacy, this central administrator XAI will not breach individual patients' data privacy. Figure 2 illustrates the proposed Federated XAI-based AD Prediction Model Architecture.

V. PERFORMANCE ANALYSIS
A. PERFORMANCE METRICS
An evaluation measure describes a model's performance. Among the many available metrics, the confusion matrix stands out. A confusion matrix summarizes the overall effectiveness of the model. The matrix is N × N, with N being the total number of classes to be predicted. There are four essential terms for calculating the confusion matrix: True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN). Some crucial definitions connected to the confusion matrix are:
1) Accuracy: Accuracy is the most widely used metric for evaluating classification models. This statistic calculates the fraction of correct predictions among all cases evaluated. The best use case occurs when the
FIGURE 2. Federated explainable AI-based Alzheimer’s disease prediction framework.
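The FedAvg aggregation step, in which each device's update is weighted in proportion to the size of its local data set, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `fedavg_aggregate`, the representation of a model as a flat list of floats, and the example numbers are all assumptions. The text does not specify how the parameters of an RF model are averaged, so the sketch treats each local model as a generic parameter vector.

```python
from typing import List, Sequence


def fedavg_aggregate(local_models: Sequence[Sequence[float]],
                     sample_counts: Sequence[int]) -> List[float]:
    """Weighted average of local parameter vectors (hypothetical helper).

    Each local model is weighted by n_P / n, where n_P is the number of
    samples on device P and n is the total over the selected devices.
    """
    if len(local_models) != len(sample_counts) or not local_models:
        raise ValueError("need one sample count per local model")
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(local_models[0])
    global_model = [0.0] * dim
    for params, n_p in zip(local_models, sample_counts):
        if len(params) != dim:
            raise ValueError("all local models must have the same size")
        weight = n_p / total
        for i, value in enumerate(params):
            global_model[i] += weight * value
    return global_model


# Illustrative round: three devices with made-up parameters and data sizes.
updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sizes = [10, 30, 60]
new_global = fedavg_aggregate(updates, sizes)
```

In this made-up round the third device holds 60% of the samples, so its parameters dominate the new global vector.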

data is balanced. Equation 4 describes the method of calculating Accuracy.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{4}$$

2) Precision: Precision measures how many of the patterns predicted as positive are actually positive. It is the proportion of predicted positive cases that were correctly identified. Equation 5 describes the method of calculating Precision.

$$\text{Precision} = \frac{TP}{TP + FP} \tag{5}$$

3) Recall or Sensitivity: Recall is the fraction of actual positive cases that were correctly predicted. This statistic is also known as sensitivity. Equation 6 describes the method for calculating Recall.

$$\text{Recall} = \frac{TP}{TP + FN} \tag{6}$$

4) F1-Score: The F1 score is the harmonic mean of precision and recall, with 1 representing the best score and 0 representing the lowest score. Equation 7 describes the method of calculating F1-Score.

$$\text{F1-Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{7}$$

5) AUC: The Area Under the Curve (AUC) score represents the area under the receiver operating characteristic (ROC) curve. It summarizes a model's ability to assign relative scores that distinguish between positive and negative examples across all classification thresholds. The AUC score ranges from 0 to 1, with 0.5 representing random guessing and 1 indicating perfect performance.

B. PERFORMANCE ANALYSIS WITH THREE INDIVIDUAL DATASETS
Table 4 shows the individual dataset performance for different models. For feature selection in the psychological, clinical, and MRI segmentation datasets, the 0.9 threshold produced the maximum accuracy. For all three datasets, the RF model outperforms the other models (DT, LR, KNN, Multilayer Perceptron (MLP), Ada Boost (AB), SVM, and XGB) in terms of accuracy. All of these results were achieved using a five-class classification. The features used are as follows. Psychological dataset: DigiB, DigiF, Logimem, Animals,
FIGURE 3. Confusion matrix of the random forest classifier using multi-modal dataset.

FIGURE 4. Confusion matrix of the random forest federated learning model using multi-modal dataset.
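For an N × N confusion matrix such as those in Figures 3 and 4, the TP, FP, and FN counts used in Equations 4 to 7 are obtained one class at a time (one-vs-rest). The sketch below illustrates that bookkeeping. The function names, the convention that rows are actual classes and columns are predicted classes, and the 3 × 3 example matrix are assumptions made for illustration; the numbers are not taken from the paper's figures.

```python
from typing import Dict, List


def per_class_metrics(cm: List[List[int]]) -> List[Dict[str, float]]:
    """One-vs-rest precision, recall, and F1 for each class.

    Assumes cm[i][j] counts samples of actual class i predicted as class j.
    """
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, actually other
        fn = sum(cm[k]) - tp                       # actually k, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append({"precision": precision, "recall": recall, "f1": f1})
    return results


def accuracy(cm: List[List[int]]) -> float:
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


# Made-up 3-class confusion matrix (not the paper's data).
example_cm = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]
metrics = per_class_metrics(example_cm)
overall_accuracy = accuracy(example_cm)
```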

Veg, TrailBrr, TrailA, TrailAli, TrailArr, TrailB, TrailBli, WAIS, Memtime, and Boston. MRI Segmentation dataset: IntraCranialVol, lhCortexVol, SubCortGrayVol, SupraTentorialVol, and LhCorticalWhiteMatterVol. Clinical dataset: Mmse, AgeatEntry, Judgment, Commun, Perscare, Memory, Homehobb, Orient, Apoe, Height, and Weight.

C. ANALYZING PERFORMANCE WITH THE MULTIMODAL DATASET
Table 5 makes it evident that, when using the multimodal dataset and a feature selection threshold of 0.95, RF is the best-performing algorithm for five-class prediction. This threshold means that a feature is ignored if its correlation value is greater than 0.95. The features employed in this instance are Animals, Logimem, Digif, Digib, DigiBlen, TrailBli, Veg, DigiFlen, WAIS, Memunits, Traila, Trailarr, Trailbrr, Memtime, Boston, Trailali, Trailb, Apoe, Height, Weight, Commun, Memory, Orient, Homehobb, Judgment, Perscare, lhCortexVol, LhCorticalWhiteMatterVol, SubCortGrayVol, SupraTentorialVol, Mmse, Age At Entry, and Sumbox.
Table 5 reports the stratified 10-fold cross-validation, testing, and training accuracy. The multimodal data achieves a higher 10-fold cross-validation accuracy than the individual datasets. The cross-validation accuracy is reported because it helps guard against an overfit model. The RF classifier's precision, recall, and F1-score can be calculated from the confusion matrix (Figure 3). The RF model's training, testing, cross-validation accuracy, F1-score, precision, AUC, and recall values are 100%, 98.84%, 98.81%, 98.75%, 98.94%, 99.97%, and 98.79%, respectively. RF's ability to handle complex, multimodal, and noisy datasets with nonlinear relationships makes it well suited to AD prediction. It leverages ensemble learning to reduce overfitting and improve generalization, leading to better performance than other models.

D. MULTIMODAL DATASET WITH OR WITHOUT FL APPROACH
From Table 6, it is clear that the FL approach performs better than the approach without FL. The testing accuracy of FL is 98.93%, which is higher than that of the approach without FL. The precision, recall, F1-score, and AUC values are 98.93%, 98.94%, 98.93%, and 99.97%, respectively. Due to data diversity, minimal overfitting, scalability, model robustness, and the preservation of data heterogeneity, the FL approach outperforms the approach without FL. Figure 4 shows the confusion matrix of the FL approach.

E. COMPARISON OF MODELS WITH EXISTING WORKS
It is observed from Table 7 that the accuracy of the ADMarker model [14] for predicting the biomarker of AD is 88.9%. That of the FedDAvt [15] model is 88.57% in predicting AD. Ghosh and Gayathri [23], Castro et al. [19], and Bukhari et al. [16] used the FL technique with a unimodal dataset, achieving performances of 95.24%, 92%, and 97%, respectively. Baglat et al. [29], Kavitha et al. [30], and Almarar and Otoum [18] used unimodal (MRI) OASIS data with an RF model as the classifier and obtained accuracies of 86.8%, 86.92%, and 94.19%, respectively. Furthermore, Amrutesh et al. [31] used text data from OASIS and an RF model for classification and achieved 92.13% accuracy. On the other
TABLE 4. 10-fold cross-validation accuracy of various ML models using individual datasets [5].

TABLE 5. Analyzing accuracy using the multimodal dataset [5].

TABLE 6. Performance comparison of multimodal dataset with FL approach versus without FL approach.

TABLE 7. Comparison of the proposed work with other recent related works.
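The RF settings listed in the random forest global model description can be collected into a single configuration. The key names below follow scikit-learn's `RandomForestClassifier`, which is an assumption: the text does not name the library that was used. The best-first growth strategy mentioned in the text is not represented, because the text does not give the setting that would enable it.

```python
# Hyperparameters as stated in the text; the key names assume scikit-learn's
# RandomForestClassifier and are not confirmed by the paper.
RF_PARAMS = {
    "n_estimators": 100,      # the forest contains one hundred trees
    "criterion": "gini",      # Gini impurity measures split quality
    "max_depth": 30,          # maximum tree depth
    "min_samples_split": 2,   # samples needed to split an internal node
    "min_samples_leaf": 1,    # samples needed at a leaf node
    "max_features": "sqrt",   # features examined per split
    "bootstrap": True,        # each tree is fit on a bootstrap sample
    "class_weight": None,     # every class carries weight one
    "random_state": 42,       # fixed seed for repeatable runs
}


def describe(params: dict) -> str:
    """Render the configuration as a one-line summary sorted by key."""
    return ", ".join(f"{key}={params[key]!r}" for key in sorted(params))


summary = describe(RF_PARAMS)
```

With scikit-learn installed, such a dictionary could be passed as `RandomForestClassifier(**RF_PARAMS)`.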

hand, Kubi and Nazir [25] proposed a GNB model for predicting dementia using clinical and imaging data from the OASIS repository and achieved 95% accuracy. The details of the internal models of these related works are given in Section III. In contrast to all these aforementioned works, this work proposes an RF-FL approach using multimodal data from the OASIS repository, which is why it achieves better performance (accuracy 98.93%) than the other existing models. Figure 4 shows the confusion matrix of the RF-FL model.

F. EXPLAINABLE ARTIFICIAL INTELLIGENCE
A SHAP violin plot shows how each feature affects model predictions. It combines the qualities of a violin plot and an overview plot, displaying both the distribution of SHAP values and the size of each feature's influence on the model's output. The essential elements of the SHAP violin plot can be read as follows. The model's features are presented along the y-axis in descending order of relevance. Each feature corresponds to one violin plot, which depicts how it influences model predictions. The x-axis depicts SHAP values, which indicate how each feature affects the model's output. Positive SHAP values indicate that a feature increases the predicted output, whereas negative SHAP values indicate the opposite. The more a SHAP value deviates from zero, the greater the feature's influence on the prediction.
The width of each violin plot represents the distribution of SHAP values for that feature over every dataset record. A wider section of the violin reveals that more cases had SHAP values of that magnitude. Narrow sections represent fewer instances of specific SHAP values. The figure's symmetry indicates both the positive and negative effects of the feature. The violin plot's point colors, which typically range from blue (low values) to red (high values), represent the
FIGURE 5. SHAP violin plot for CN class.

FIGURE 6. SHAP violin plot for AD class.

FIGURE 7. SHAP violin plot for Non-AD class.

FIGURE 8. SHAP violin plot for uncertain class.
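Equation 1 can be evaluated exactly when the number of features is small. The sketch below enumerates every coalition G that excludes feature k and applies the weight |G|!(h − |G| − 1)!/h!. The value function `toy_val` is a made-up game with additive effects and one interaction term, chosen only so the result can be checked by hand; it is not the paper's model. Features are indexed from 0 here rather than from 1, and in practice Tree SHAP avoids this exponential enumeration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List


def shapley_value(k: int, h: int,
                  val: Callable[[FrozenSet[int]], float]) -> float:
    """Exact Shapley value of feature k among features 0..h-1 (Equation 1)."""
    others = [j for j in range(h) if j != k]
    phi = 0.0
    for size in range(len(others) + 1):
        # Weight |G|! (h - |G| - 1)! / h! for coalitions of this size.
        weight = factorial(size) * factorial(h - size - 1) / factorial(h)
        for subset in combinations(others, size):
            g = frozenset(subset)
            phi += weight * (val(g | {k}) - val(g))
    return phi


def toy_val(g: FrozenSet[int]) -> float:
    """Hypothetical value function: additive effects plus one interaction."""
    base = {0: 1.0, 1: 2.0, 2: 3.0}
    total = sum(base[j] for j in g)
    if 0 in g and 1 in g:
        total += 4.0  # interaction shared between features 0 and 1
    return total


phis: List[float] = [shapley_value(k, 3, toy_val) for k in range(3)]
```

The interaction term is split evenly between features 0 and 1, and the three values sum to the value of the full coalition, which is the Efficiency property mentioned in the text.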

feature values for each data point. This color gradient makes it easier to see how a feature's value influences its SHAP value. For example, high feature values (red) may systematically lead to positive SHAP values, indicating that the feature influences predictions favorably when it is high. The features near the top of the plot are more relevant to the model's estimations, while those at the bottom are less significant. The width and spread of the violin represent the heterogeneity in how each attribute affects predictions. The important features can be easily identified by the long tails on the right side of the x-axis.
From Figure 5, it is observed that memory, sumbox (0), judgment, commun, and orient are the top five essential features. Low values of memory, sumbox, judgment, commun, and orient are responsible for CN prediction.
Furthermore, sumbox is the most important feature for predicting AD (Figure 6). Here, higher values of sumbox (mostly 2-5) influence the prediction positively. Similarly, slightly higher values (mostly 1) of judgment, homehobb, orient, and memory positively influence AD identification. Figure 7 illustrates
TABLE 8. Top five influential features of each class.
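A per-class ranking such as the one in Table 8 can be derived from SHAP values by ordering features by their mean absolute SHAP value. The sketch below assumes that ordering criterion, since the text only says that features appear in descending order of relevance. The function name `top_features`, the input layout, and the numbers are invented for illustration and are not the paper's SHAP values.

```python
from typing import Dict, List, Tuple


def top_features(shap_by_feature: Dict[str, List[float]],
                 n: int = 5) -> List[Tuple[str, float]]:
    """Return the n features with the largest mean absolute SHAP value.

    shap_by_feature maps a feature name to its SHAP values over all records
    for one class (hypothetical input layout).
    """
    scores = {
        name: sum(abs(v) for v in values) / len(values)
        for name, values in shap_by_feature.items()
        if values
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Invented SHAP values for three features over four records.
example = {
    "sumbox": [0.30, -0.20, 0.40, -0.10],
    "memory": [0.10, 0.10, -0.10, 0.10],
    "height": [0.01, -0.02, 0.00, 0.01],
}
ranking = top_features(example, n=2)
```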

that the very high value of sumbox (mostly 5-10) is responsible for Non-AD prediction. Judgment, homehobb, orient, perscare, and memory are the other most important features, and their values lie between 0.5 and 2. Afterward, sumbox (mostly 0.5-2.5), memory, judgment, homehobb, and orient are the most influential features for classifying the uncertain class, which is represented in Figure 8. Finally, for the others class in Figure 9, sumbox (mostly 0, 0.5, 0.75, and 1), judgment, memory, intraCranialVolume, and homehobb are the top five most influential features. As sumbox is the most common and influential feature in each class, its value range is also stated here. Table 8 lists the top influential features in order.

FIGURE 9. SHAP violin plot for others class.

VI. CONCLUSION
This paper proposes a privacy-preserving Federated XAI-based framework to predict AD using multimodal data. The use of FL guarantees that sensitive patient data remains localized, protecting privacy while allowing collaboration between many institutions. The model incorporates several components of Alzheimer's progression by combining multimodal data, such as MRI scans, clinical data, and psychological data, collected from the OASIS-3 repository, resulting in more accurate and comprehensive predictions. The introduction of XAI complements the model's predictions by providing insight into the crucial factors driving disease progression, which is critical for both physicians and researchers. This interpretability builds trust in AI-driven forecasts, enabling healthcare professionals to make better judgments. In general, this method shows that FL, when combined with explainability and multimodal data, is a promising way to improve AD prediction, improve patient outcomes, and protect data privacy. The model's accuracy, precision, recall, F1-score, and AUC are 98.93%, 98.93%, 98.94%, 98.93%, and 99.97%, respectively. Future research can concentrate on refining the model's performance across several additional data modalities to improve predictive accuracy. As this work focused on the FL model rather than the system, system performance metrics were not calculated. Therefore, system performance metrics, such as training time, computation cost, communication overhead, and latency, could be measured in the future when establishing a complete FL system.

REFERENCES
[1] (2022). Alzheimer's: Death of Key Brain Cells Causes Daytime Sleepiness. Accessed: Jul. 15, 2022. [Online]. Available: https://www.medicalnewstoday.com/articles/326073
[2] (2022). What Is Alzheimer's Disease? Accessed: Jul. 15, 2022. [Online]. Available: https://www.nia.nih.gov/health/what-alzheimers-disease
[3] S. Jahan and M. S. Kaiser, "An explainable Alzheimer's disease prediction using efficientnet-B7 convolutional neural network architecture," in The Fourth Industrial Revolution and Beyond: Select Proceedings of IC4IR+. Cham, Switzerland: Springer, 2023, pp. 737–748.
[4] K. C. A. Khanzode and R. D. Sarode, "Advantages and disadvantages of artificial intelligence and machine learning: A literature review," Int. J. Library Inf. Sci. (IJLIS), vol. 9, no. 1, p. 3, 2020.
[5] S. Jahan, K. A. Taher, M. S. Kaiser, M. Mahmud, M. S. Rahman, A. S. M. S. Hosen, and I.-H. Ra, "Explainable AI-based Alzheimer's prediction and management using multimodal data," PLoS ONE, vol. 18, no. 11, 2023, Art. no. e0294253.
[6] M. A. DeTure and D. W. Dickson, "The neuropathological diagnosis of Alzheimer's disease," Mol. Neurodegeneration, vol. 14, no. 1, p. 32, 2019.
[7] I. Bhushan, M. Kour, G. Kour, S. Gupta, S. Sharma, and A. Yadav, "Alzheimer's disease: Causes & treatment - A review," Ann. Biotechnol., vol. 1, no. 1, p. 1002, 2018.
[8] Z. Breijyeh and R. Karaman, "Comprehensive review on Alzheimer's disease: Causes and treatment," Molecules, vol. 25, no. 24, p. 5789, Dec. 2020.
[9] S. Jahan, M. R. S. Adib, M. Mahmud, and M. S. Kaiser, "Comparison between explainable AI algorithms for Alzheimer's disease prediction using efficientnet models," in Proc. Int. Conf. Brain Informat. Cham, Switzerland: Springer, 2023, pp. 357–368.
[10] G. A. Edwards III, N. Gamez, G. Escobedo Jr., O. Calderon, and I. Moreno-Gonzalez, "Modifiable risk factors for Alzheimer's disease," Frontiers Aging Neurosci., vol. 11, p. 146, Jun. 2019.
[11] R. A. Armstrong, "Risk factors for Alzheimer's disease," Folia Neuropathol., vol. 57, no. 2, pp. 87–105, 2019.
[12] P. Angelov, E. Soares, R. Jiang, N. I. Arnold, and P. M. Atkinson, "Explainable artificial intelligence: An analytical review," Wiley Interdiscipl. Rev., Data Mining Knowl. Discovery, vol. 11, no. 5, p. e142, Jul. 2021.
[13] D. Minh, H. X. Wang, Y. F. Li, and T. N. Nguyen, "Explainable artificial intelligence: A comprehensive review," Artif. Intell. Rev., vol. 55, no. 5, pp. 3503–3568, Jun. 2022.
[14] X. Ouyang, X. Shuai, Y. Li, L. Pan, X. Zhang, H. Fu, S. Cheng, X. Wang, S. Cao, J. Xin, H. Mok, Z. Yan, D. S. F. Yu, T. Kwok, and G. Xing, "ADMarker: A multi-modal federated learning system for monitoring digital biomarkers of Alzheimer's disease," in Proc. 30th Annu. Int. Conf. Mobile Comput. Netw., May 2024, pp. 404–419.
[15] B. Lei, Y. Zhu, E. Liang, P. Yang, S. Chen, H. Hu, H. Xie, Z. Wei, F. Hao, X. Song, T. Wang, X. Xiao, S. Wang, and H. Han, "Federated domain adaptation via transformer for multi-site Alzheimer's disease diagnosis," IEEE Trans. Med. Imag., vol. 42, no. 12, pp. 3651–3664, Dec. 2023.
[16] S. M. S. Bukhari, M. H. Zafar, S. K. R. Moosavi, N. M. Khan, and F. Sanfilippo, "Enhancing Alzheimer's disease detection and classification through federated learning-optimized deep convolutional neural networks on MRI data," in Proc. Intell. Syst. Conf. Cham, Switzerland: Springer, 2024, pp. 693–712.
[17] T. Ghosh, M. I. A. Palash, M. A. Yousuf, M. A. Hamid, M. M. Monowar, and M. O. Alassafi, "A robust distributed deep learning approach to detect Alzheimer's disease from MRI images," Mathematics, vol. 11, no. 12, p. 2633, 2023.
[18] N. Almarar and S. Otoum, "A federated MRI and ML approach for precision healthcare detection," in Proc. IEEE Int. Conf. Commun. (ICC), Jun. 2024, pp. 836–842.
[19] F. Castro, D. Impedovo, and G. Pirlo, "A federated learning system with biometric medical image authentication for Alzheimer's diagnosis," in Proc. 13th Int. Conf. Pattern Recognit. Appl. Methods, 2024, pp. 951–960.
[20] A. AlMohimeed, R. M. A. Saad, S. Mostafa, N. M. El-Rashidy, S. Farrag, A. Gaballah, M. A. Elaziz, S. El-Sappagh, and H. Saleh, "Explainable artificial intelligence of multi-level stacking ensemble for detection of Alzheimer's disease based on particle swarm optimization and the sub-scores of cognitive biomarkers," IEEE Access, vol. 11, pp. 123173–123193, 2023.
[21] U. Rashmi, B. M. Beena, and S. Ambesange, "BrainCrossFed CNN model for Alzheimer classification using MRI data and comparison and benchmarking proposed model with DINOv2 and ExplainableAI using GradCAM," in Proc. Int. Conf. Confluence Advancements Robot., Vis. Interdiscipl. Technol. Manage. (IC-RVITM), Nov. 2023, pp. 1–7.
[22] B. Lei, Y. Liang, J. Xie, Y. Wu, E. Liang, Y. Liu, P. Yang, T. Wang, C. Liu, J. Du, X. Xiao, and S. Wang, "Hybrid federated learning with brain-region attention network for multi-center Alzheimer's disease detection," Pattern Recognit., vol. 153, Sep. 2024, Art. no. 110423.
[23] A. Ghosh and S. Gayathri, "DenseFed-PSO: Particle swarm optimization-based DenseNet federated model in Alzheimer's detection," in Proc. Int. Symp. Intell. Inform. Switzerland: Springer, 2023, pp. 229–243.
[24] J. F. Raisa, S. Jahan, and M. S. Kaiser, "A cyber-physical fusion system for stress detection using multimodal and social media data," in Proc. 4th Int. Conf. Ind. Revolution Beyond, S. Hossain, M. S. Hossain, M. S. Kaiser, S. P. Majumder, and K. Ray, Eds., Singapore: Springer, 2022, pp. 615–627.
[25] N. N. B. A. Kubi and S. Nazir, "Dementia prediction with multimodal clinical and imaging data," Int. J. Inf. Technol., vol. 17, no. 1, pp. 5–16, Jan. 2025.
[26] N. Basnin, T. Mahmud, R. U. Islam, and K. Andersson, "An evolutionary federated learning approach to diagnose Alzheimer's disease under uncertainty," Diagnostics, vol. 15, no. 1, p. 80, Jan. 2025.
[27] A. Mitrovska, P. Safari, K. Ritter, B. Shariati, and J. K. Fischer, "Secure federated learning for Alzheimer's disease detection," Frontiers Aging Neurosci., vol. 16, Mar. 2024, Art. no. 1324032.
[28] P. J. LaMontagne et al., "OASIS-3: Longitudinal neuroimaging, clinical, and cognitive dataset for normal aging and Alzheimer disease," medRxiv, Dec. 2019.
[29] P. Baglat, A. W. Salehi, A. Gupta, and G. Gupta, "Multiple machine learning models for detection of Alzheimer's disease using OASIS dataset," in Proc. Int. Work. Conf. Transf. Diffusion IT, Tiruchirappalli, India. Cham, Switzerland: Springer, Dec. 2020, pp. 614–622.
[30] C. Kavitha, V. Mani, S. R. Srividhya, O. I. Khalaf, and C. A. T. Romero, "Early-stage Alzheimer's disease prediction using machine learning models," Frontiers Public Health, vol. 10, Mar. 2022, Art. no. 853294.
[31] A. Amrutesh, C. G. G. Bhat, A. Amruthamsh, K. P. A. Rani, and S. Gowrishankar, "Alzheimer's disease prediction using machine learning and transfer learning models," in Proc. 6th Int. Conf. Comput. Syst. Inf. Technol. Sustain. Solutions (CSITSS), Dec. 2022, pp. 1–6.

SOBHANA JAHAN received the B.Sc. and M.Sc. degrees in information and communication engineering from Bangladesh University of Professionals (BUP), in 2020 and 2022, respectively. She was a Lecturer with the Green University of Bangladesh (GUB) and a Teaching Assistant (TA) with BUP. She is currently a Lecturer with the Department of Computer Science and Engineering, BUP. She has published several papers in Q1 international journals and conferences. Her research interests include artificial intelligence, ML, and the IoT. In 2021, she was awarded a Research Fellowship from the ICT Division of Bangladesh Government. In 2024, she was named the Best Researcher of BUP. She actively serves as a Reviewer for IEEE and PLOS One journals.

MD. RAWNAK SAIF ADIB received the B.Sc. and M.Sc. degrees in information and communication engineering from Bangladesh University of Professionals (BUP) under the Department of Information Communication and Technology, in 2020 and 2024, respectively. He is currently a Lecturer with the Department of Computer Science and Engineering, International University of Business Agriculture and Technology (IUBAT). He has published two Q1 journal articles and several international conference papers. His research interests include XAI, ML, AI in healthcare, and computer vision. He received the MIYAN PUBLICATION REWARD from the Vice-Chancellor of IUBAT for his research article, in 2024. Besides, he actively serves as a Reviewer for journals, such as PLOS One and Heliyon.

SYED MAHMUDUL HUDA received the B.Sc. degree in information and communication engineering from Bangladesh University of Professionals (BUP), in 2020. He was an Intern with the Product and Technology Division, bKash Ltd. He is currently a Quality Control Specialist with SuperAnnotate AI, where he has been working with various companies and their data to implement ML products for different sectors, such as business, agriculture, healthcare, and LLM. His research interests include artificial intelligence, ML, big data, and the IoT.
MD. SAZZADUR RAHMAN received the B.Sc. and M.S. degrees in applied physics, electronics, and communication engineering from the University of Dhaka, Dhaka, Bangladesh, in 2005 and 2006, respectively, and the Ph.D. degree in material science from Kyushu University, Japan, in 2015. He was a Faculty Member with the Faculty of Computer Science and Engineering, Hajee Mohammad Danesh Science and Technology University, from May 2009 to November 2018. Since November 2018, he has been with the Institute of Information Technology, Jahangirnagar University, Savar, Dhaka, where he is currently a Professor. His research interests include material science for computer applications, surface science, ubiquitous computing, WSNs, ML, and the IoT.

M. SHAMIM KAISER (Senior Member, IEEE) received the bachelor's and master's degrees in applied physics, electronics, and communication engineering from the University of Dhaka, Bangladesh, in 2002 and 2004, respectively, and the Ph.D. degree in telecommunication engineering from the Asian Institute of Technology (AIT), Pathum Thani, Thailand, in 2010. In 2005, he joined the Department of ETE, Daffodil International University, as a Lecturer. In 2010, he was an Assistant Professor with the Department of EEE, Eastern University, Bangladesh, and the Department of MNS, BRAC University, Dhaka. Since 2011, he has been an Assistant Professor with the Institute of Information Technology, Jahangirnagar University, Dhaka, where he became an Associate Professor, in 2015, and a Full Professor, in 2019. He has authored more than 100 papers in different peer-reviewed journals and conferences. His current research interests include data analytics, ML, wireless networks and signal processing, cognitive radio networks, big data and cyber security, and renewable energy. He is the Founding Chapter Chair of the IEEE Bangladesh Section Computer Society Chapter.

A. S. M. SANWAR HOSEN received the M.S. and Ph.D. degrees in computer science and engineering from Jeonbuk National University, Jeonju-si, South Korea. He is currently an Assistant Professor with the Department of Artificial Intelligence and Big Data, Woosong University, Daejeon, South Korea. He has published several papers in journals and international conferences. His recent research interests include wireless sensor networks, the Internet of Things, fog-cloud computing, cyber security, artificial intelligence, and blockchain. He has also been invited to serve as a guest editor and a technical program committee member for several reputed international conferences, such as IEEE and ACM. He has been an Expert Reviewer of IEEE TRANSACTIONS, Elsevier, Springer, and MDPI journals and magazines.

DEEPAK GHIMIRE received the bachelor's degree in computer engineering from Pokhara University, Nepal, in 2007, and the M.S. and Ph.D. degrees in computer science and engineering from Jeonbuk National University, South Korea, in 2011 and 2014, respectively. Before pursuing his graduate studies, he was a Lecturer with Pokhara University. From 2014 to 2020, he was a Postdoctoral Researcher with the Intelligent Image Processing Research and Development Division, Korea Electronics Technology Institute (KETI), South Korea. From 2018 to 2022, he was an AI Computer Vision Engineer with NVISO.ai, a Swiss-based company specializing in AI edge computing applications. From 2022 to June 2024, he held the position of a Research Professor with Soongsil University, South Korea. Since July 2024, he has been a Senior Researcher with the IT Application Research Center, KETI. He has authored more than 40 research papers in international journals and conferences. His research interests include developing lightweight deep learning solutions for edge computing with applications across various domains, including robotics, agriculture, and surveillance. He actively serves as a reviewer for IEEE, Elsevier, Springer, and MDPI journals.

MI JIN PARK received the bachelor's degree in psychology from Yonsei University, in 2008, the M.S./M.D. degree from the School of Medicine, Jeonbuk University, in 2015, and the Ph.D. degree from the School of Medicine, Sungkyunkwan University, in 2021. She was with the Samsung Medical Center as an Intern from 2015 to 2016, a Resident Doctor, from 2016 to 2020, and a Fellow Doctor, from 2020 to 2022. She is currently a Clinical Assistant Professor with the Department of Psychiatry, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, South Korea. Her research interests include depression, addiction, and ML-driven healthcare IoT applications.