Received: 6 November 2024 | Revised: 11 February 2025 | Accepted: 3 March 2025 | Published online: 20 March 2025

Medinformatics
RESEARCH ARTICLE 2025, Vol. 2(2) 107–119
DOI: 10.47852/bonviewMEDIN52024744

An Ensemble Approach for Artificial Neural Network-Based Liver Disease Identification from Optimal Features Through Hybrid Modeling Integrated with Advanced Explainable AI
Safiul Haque Chowdhury¹,*, Mohammad Mamun¹, Tanvir Ahmed Shaikat¹, Mohammed Ibrahim Hussain¹, Sadiq Iqbal¹,² and Muhammad Minoar Hossain¹,³
¹ Department of Computer Science and Engineering, Bangladesh University, Bangladesh
² Department of Computer Science and Engineering, Dhaka University of Engineering and Technology, Bangladesh
³ Department of Computer Science and Engineering, Mawlana Bhashani Science and Technology University, Bangladesh

Abstract: Liver disease is any condition that negatively affects the liver's function or structure, resulting in impaired liver function and various health complications, and such conditions are becoming increasingly common. In this study, we applied various machine learning (ML) techniques to a dataset of key liver disease-related blood biomarkers to improve the accuracy of liver disease prediction. Specifically, we integrated the artificial neural network (ANN) model with five ML models: Stacked Generalization (Stacking), Bootstrap Aggregating (Bagging), Adaptive Boosting (AdaBoost), Gradient-Boosted Decision Tree (GBDT), and Support Vector Machine (SVM). This resulted in five distinct hybrid models: Stacking with ANN (SANN), Bagging with ANN (BANN), AdaBoost with ANN (ABANN), GBDT with ANN (GANN), and SVM with ANN (SVMANN). We tested all of these hybrid models with three feature selection techniques, linear discriminant analysis (LDA), principal component analysis (PCA), and recursive feature elimination (RFE), and also without feature selection. Through extensive testing, we found that these five hybrid models performed best when combined with LDA rather than PCA, RFE, or no feature selection. This finding led us to create a max voting ensemble (MVE) of the LDA-optimized hybrid models. Remarkably, the MVE increased prediction accuracy from 79.15% (the best individual hybrid model) to 98.38%. Furthermore, we employ explainable artificial intelligence techniques such as Local Interpretable Model-agnostic Explanations, Shapley Additive Explanations, and Individual Conditional Expectations to analyze the predictions and enhance trust in them. We also used 10-fold cross-validation to ensure the robustness and reliability of our results. This research underscores the significance of advances in neural network systems and highlights the potential of hybrid models to improve predictive accuracy in liver disease diagnosis. Our findings pave the way for a new generation of intelligent computational technologies, ultimately contributing to better health outcomes and a deeper understanding of liver disease dynamics.
Keywords: liver disease, machine learning, artificial neural network, explainable artificial intelligence

*Corresponding author: Safiul Haque Chowdhury, Department of Computer Science and Engineering, Bangladesh University, Bangladesh. Email: safiul.haque@bu.edu.bd

© The Author(s) 2025. Published by BON VIEW PUBLISHING PTE. LTD. This is an open access article under the CC BY License (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Liver disease is a major global health concern, affecting millions of people and burdening healthcare systems. Early detection and accurate prediction can significantly improve patient outcomes [1]. Over 100 million people in the U.S. have liver disease, with 4.5 million diagnosed and an estimated 80–100 million with fatty liver disease. Untreated liver disease can lead to liver failure and cancer, and it caused 51,642 deaths in 2020 (15.7 per 100,000). Chronic liver disease/cirrhosis was the 12th leading cause of death in 2020 and the 8th for non-Hispanic African American/Black individuals aged 45–64 in 2019. Prevalence rates from a 2016 study include Japanese Americans (6.9%), Hispanic/Latino persons (6.7%), White persons (4.1%), and African American/Black and Native Hawaiian persons (3.9%). Nonalcoholic fatty liver disease is the most common cause of cirrhosis, with risk factors including heavy alcohol use, obesity, type 2 diabetes, and certain medical and lifestyle factors. Cirrhosis also increases stroke risk, with an annual incidence of 2.17% compared with 1.11% in people without cirrhosis. Death rates from liver cirrhosis have been higher for Black/African American men and women than for their White counterparts since the 1950s [2].
Recent research efforts have extensively explored machine learning (ML) techniques for predicting liver disease, showcasing a variety of methodological approaches and their corresponding accuracies. All these studies focused on classification
for liver disease prediction. For instance, Choubey et al. [3] adopted Decision Tree (DT) algorithms and achieved an accuracy of 75.10%, while Shetty and Satyanarayana [4] enhanced Support Vector Machine (SVM) with Random Sampling for a 71% accuracy rate. Alyabis et al. [5] turned to Neural Network Analysis and obtained a 79.6% success rate, and Singh and Agarwal [6] experimented with an Extreme Learning Machine (ELM), resulting in 77.77% accuracy. Further contributions include Azam et al. [7], who integrated K-Nearest Neighbor (KNN) with Feature Selection Techniques (KNNWFST) for 74% accuracy, and Choudhary et al. [8], who applied Logistic Regression (LR) with a 70.54% accuracy rate. Additional studies by Khan et al. [9] and Thirunavukkarasu et al. [10] used Random Forest (RF) and LR to achieve accuracies of 72.17% and 73.97%, respectively. Muthuselvan et al. [11] used Random Tree, and Yasmin et al. [12] studied KNN, yielding 74.2% and 76.03% accuracy, respectively, demonstrating the diverse range of ML methodologies being explored for liver disease prediction.
In our study, we critically analyzed the limitations and scope of these previous studies to bring novelty to our research methodology. In the Discussion section, we provide a comprehensive comparison of these studies with our findings to highlight the advances and contributions of our approach.
In recent years, the emergence of advanced computational techniques, such as artificial neural networks (ANNs) and explainable artificial intelligence (XAI), has provided promising avenues for enhancing the predictive capabilities of liver disease diagnosis models. This research investigates the potential of ANN-based models integrated with XAI techniques for predicting liver disease from optimal features extracted from patient data. Unlike traditional statistical methods, ANNs can learn complex patterns and relationships from large datasets, enabling more accurate and robust predictions. Moreover, incorporating XAI methods makes it possible to interpret and understand the ANN model's decision-making process, addressing the critical need for transparency and explainability in medical AI systems [13]. The primary objective of this study is to develop ANN-based models trained on a comprehensive dataset of clinical variables associated with liver disease, using feature selection techniques to identify the most informative features for prediction. By leveraging XAI methods, such as Local Interpretable Model-agnostic Explanations (LIME) and Shapley Additive Explanations (SHAP), we aim to elucidate the underlying factors driving the model's predictions, enhancing its interpretability and trustworthiness.
This research is motivated by the potential of advanced computational techniques to transform medical diagnostics and decision-making processes. By harnessing ANNs and XAI, we aim to develop more accurate, transparent, and clinically relevant predictive models for liver disease. Specifically, this study integrates ANNs with robust ML models to enhance predictive accuracy, employs advanced XAI tools to ensure transparency in decision-making, and optimizes feature selection to target the most informative clinical variables for liver disease prediction. These advances could inform clinical practice and improve patient outcomes through early detection and personalized treatment strategies.

2. Materials and Methods

The main goal of this study is to accurately predict liver disease by employing various ANN-based hybrid models and then combining them into an ensemble for improved performance. The research workflow is outlined in Figure 1. Sections 2.1 to 2.9 briefly describe each stage of the workflow.

Figure 1. Working outline of the research

2.1. Dataset

We obtained a dataset from the UCI ML Repository [14] containing 583 samples of individuals, both affected and unaffected by liver disease. The dataset comprises 10 features, excluding the target variable indicating the presence or absence of liver disease. Of the 583 instances, 416 are affected by the disease, while the remaining 167 are disease-free. These 10 features contain vital information on blood parameters and liver condition: Age, Gender, Total Bilirubin (TB), Direct Bilirubin (DB), Alkaline Phosphatase (ALPH), Alanine Aminotransferase (ALAT), Aspartate Aminotransferase (ASAT), Total Proteins (TP), Albumin (AL), and Albumin and Globulin Ratio (AGR). Together, they describe individuals' liver health through key biochemical markers and demographic information, and they serve as the foundation for constructing and evaluating predictive models for liver disease diagnosis. Table 1 describes all 10 features and the target variable, together with their value types and units.

Table 1. Analysis of different features of the dataset

Feature | Description | Value type | Unit
Age | Represents the age of the individual. | Numerical | Years
Gender | Indicates the gender of the individual. | Nominal | –
TB | Measures the total amount of bilirubin in the blood, indicating liver function and potential abnormalities. | Numerical | μmol/L
DB | Measures direct bilirubin in the blood; elevated levels may indicate obstructive liver disease. | Numerical | μmol/L
ALPH | Reflects the level of the alkaline phosphatase enzyme in the blood, produced by the liver, bones, and other tissues. Elevated levels may indicate liver or bone disorders. | Numerical | μkat/L
ALAT | Measures the level of alanine aminotransferase, an enzyme found primarily in the liver. Elevated levels may indicate liver damage or disease. | Numerical | U/L
ASAT | Indicates the level of aspartate aminotransferase, an enzyme also found predominantly in the liver. Elevated levels may suggest liver damage or inflammation. | Numerical | U/L
TP | Represents the total protein concentration in the blood, including albumin and globulin. Abnormal levels may indicate liver disease or other underlying health conditions. | Numerical | g/dL
AL | Specifies the albumin concentration in the blood; albumin is synthesized by the liver and plays a crucial role in maintaining osmotic pressure and transporting substances in the blood. | Numerical | g/dL
AGR | Gives the ratio of albumin to globulin in the blood, which can indicate liver function and overall health. Abnormal ratios may suggest liver disease or other underlying conditions. | Numerical | –
Result | Individuals with liver disease are labeled "1", while those without liver disease are labeled "0". | Nominal | –

2.2. Analysis and visualization

Data analysis and visualization are crucial for understanding datasets, especially when applying different ML models [15]. These techniques provide valuable insights into the distribution, patterns, outliers, and relationships within the data, which are essential for making informed decisions during model development, feature selection, and evaluation. In our liver dataset analysis, we use histograms [16], violin plots [17], and correlation heatmaps [18].

2.3. Preprocessing

We employed preprocessing techniques to handle missing values and to convert textual values into numerical representations [19]. In our dataset, the "AGR" feature contained four missing values. We addressed them with mean imputation, replacing each missing AGR value with the mean of the observed AGR values:

$$\mathrm{Mean}(X) = \frac{\sum x}{n} \qquad (1)$$

Here, Mean(X) is the mean value used to fill in the missing values, Σx denotes the sum of all non-missing values of the feature, and n is the number of non-missing values of the feature.
We employed one-hot encoding to convert textual values into numerical representations [20]. This technique transforms categorical variables into binary vectors, representing each category as a separate feature. Because Gender has only two categories, it reduces to a single binary column in which females are mapped to 1 and males to 0.

2.4. Ideal feature finding

Feature selection is the process of selecting the most relevant and informative features from a dataset to improve the performance of ANN-based ML models. This step is essential for building efficient and accurate predictive models, as it helps reduce dimensionality, minimize overfitting, and enhance model interpretability.
In this study, we applied three feature selection techniques, linear discriminant analysis (LDA), principal component analysis (PCA), and recursive feature elimination (RFE) [21], to identify optimal features for ANN-based ML models, aiming to improve predictions of liver disease outcomes. We identified the most effective feature selection approach among the tested methods
and integrated it into our models to improve predictive performance. Furthermore, we employed a max voting ensemble (MVE) technique to combine multiple ANN-based models built on the best feature subset, significantly boosting accuracy and robustness.
LDA is a dimensionality reduction technique that finds the linear combinations of features that best separate different classes or categories in the data. It is commonly used in classification tasks to maximize the separation between classes while minimizing the variance within each class [22].
PCA is another dimensionality reduction technique that transforms the original features into a lower-dimensional space while preserving as much variance as possible. PCA identifies the principal components that capture the most significant variation in the data, allowing for dimensionality reduction and simplification of the dataset [23].
RFE is a feature selection method that recursively removes the least important features from the dataset. At each step, it retrains the model on the remaining features and evaluates its performance, continuing until the optimal subset of features is identified. RFE helps select the most informative features while discarding redundant or irrelevant ones, thereby improving model efficiency and interpretability [24].

2.5. ML model construction

Our study thoroughly examined the preprocessed dataset by integrating various ML models with ANN to enhance prediction accuracy. Our approach uses the ANN as the base model within five distinct algorithms, each designed to improve predictive performance through its own methodology and characteristics. This comprehensive analysis aims to identify the most effective model for accurately predicting liver disease outcomes, and it underscores the importance of neural network systems in maximizing prediction accuracy across different ML models [25].

2.5.1. Stacking with ANN (SANN)
Model stacking, or stacked generalization, is an ML technique that combines multiple models to enhance predictive performance. It trains several base models and uses their predictions as input features for a meta-model, which learns to refine and integrate these predictions. The meta-model compensates for the errors and biases of individual models, yielding a more robust and accurate prediction [26]. Mathematically, the stacking process with ANNs is as follows:

$$X_{\mathrm{meta}} = \left[\mathrm{ANN}_1(X),\ \mathrm{ANN}_2(X),\ \ldots,\ \mathrm{ANN}_n(X)\right],\ y \qquad (2)$$

Here, X represents the input features, y represents the target variable, and ANN_i(X) represents the prediction made by ANN model i. The random forest (RF) meta-model is then trained on X_meta, and RF(X_meta) denotes its prediction:

$$\mathrm{RF}(X_{\mathrm{meta}}) = \mathrm{RF}\left(\left[\mathrm{ANN}_1(X),\ \mathrm{ANN}_2(X),\ \ldots,\ \mathrm{ANN}_n(X)\right]\right) \qquad (3)$$

The performance metrics are then calculated from the predictions of the RF meta-model.

2.5.2. Bagging with ANN (BANN)
Bagging is an ensemble method that enhances ML stability and accuracy by using bootstrap sampling to create multiple training subsets. Each subset trains a base model, and their predictions are aggregated into the final output [27]. By introducing model diversity, bagging reduces overfitting and improves generalization. This study employs Bagging with ANNs as base models to reduce prediction variance. While ANNs excel at capturing complex data patterns, they are sensitive to the particular training subset they see; bagging reduces this sensitivity and stabilizes predictions.
The Bagging Classifier aggregates the predictions of multiple ANN base models by averaging: for N base models, the ensemble prediction is the average of all base-model predictions. Mathematically:

$$\mathrm{BC}(X) = \frac{1}{N} \sum_{i=1}^{N} \mathrm{ANN}_i(X) \qquad (4)$$

Here, ANN_i(X) represents the prediction made by the i-th ANN model on the input features X, BC(X) represents the prediction made by the
Bagging Classifier on the input features X, and N represents the number of ANN models used in the ensemble.

2.5.3. AdaBoost with ANN (ABANN)
Adaptive Boosting (AdaBoost) enhances classification performance by combining weak learners, typically shallow DTs, through iterative training that assigns higher weights to misclassified samples [28]. In AdaBoost with ANNs (ABANN), ANNs replace the traditional weak learners. Multiple ANNs are trained sequentially, with each focusing more on the samples its predecessors misclassified. The final ABANN prediction is a weighted sum of the individual ANN predictions, with weights based on their accuracy during training. Mathematically:

$$\mathrm{ABANN}(X) = \mathrm{sign}\left(\sum_{i=1}^{N} \alpha_i\, \mathrm{ANN}_i(X)\right) \qquad (5)$$

Here, ANN_i(X) represents the prediction made by the i-th ANN model on the input features X, and ABANN(X) represents the prediction made by the AdaBoost model on the input features X. The AdaBoost model combines the predictions of N base ANN models through a weighted sum, where α_i is the weight assigned to the i-th base model. The sign function makes the final prediction binary, typically {−1, 1} in classification tasks. The weights α_i are determined during training and favor better-performing models. This iterative approach of combining multiple ANNs with AdaBoost enhances the model's overall predictive accuracy and robustness.

2.5.4. GBDT with ANN (GANN)
Gradient Boosting, as implemented for example by the Gradient Boosting Classifier in scikit-learn, is a powerful ensemble learning technique that builds a strong predictive model by sequentially adding weak learners (typically DTs) to an ensemble [29]. Each subsequent weak learner corrects the errors made by the previous ones, producing a final strong learner that combines the predictions of all weak learners. In our implementation, GANN uses ANNs instead of DTs as the weak learners. Mathematically, the GANN model is represented as follows:

$$\mathrm{GANN}(X) = \sum_{i=1}^{N} \mathrm{ANN}_i(X) \qquad (6)$$

Here, ANN_i(X) represents the prediction made by the i-th ANN model on the input features X, GANN(X) represents the prediction made by the Gradient Boosting model on the input features X, and N represents the number of ANN models.

2.5.5. SVM with ANN (SVMANN)
SVM is a supervised algorithm for classification and regression that identifies the optimal hyperplane maximizing class separation, using support vectors to define the margin [30]. It solves a convex optimization problem to minimize errors and employs kernel functions for non-linearly separable data. In the hybrid SVMANN model, SVM's ability to handle high-dimensional data is combined with ANN's ability to capture complex patterns, enhancing classification accuracy. Let f_SVM(X) denote the output of the SVM model and f_ANN(X) the output of the ANN model. The combined SVMANN prediction Y is then obtained by applying a decision function to the outputs of both models; this function could be a simple sum or another function, depending on the specific implementation:

$$Y = \text{decision function}\left(f_{\mathrm{SVM}}(X) + f_{\mathrm{ANN}}(X)\right) \qquad (7)$$

This equation captures the idea of integrating the predictions of the SVM and ANN models to form the output of the SVMANN hybrid model.

2.6. ML model evaluation

We evaluate our hybrid ANN-based ML models for liver disease prediction using an 80:20 train-test split, which reserves most of the data for training while keeping a held-out set for a realistic performance assessment. Predictions are analyzed through a confusion matrix (CM), which categorizes predictions into True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN), forming the basis for calculating key performance metrics [31].
Accuracy is the proportion of correctly classified instances and is calculated as:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100 \qquad (8)$$

Precision measures the accuracy of positive predictions, which is crucial for minimizing FPs in medical contexts. It is calculated as:

$$\mathrm{Precision} = \frac{TP}{TP + FP} \times 100 \qquad (9)$$

Recall reflects the model's ability to identify actual positive cases, minimizing missed diagnoses. It is computed as:

$$\mathrm{Recall} = \frac{TP}{TP + FN} \times 100 \qquad (10)$$

The F1 Score is the harmonic mean of precision and recall, summarizing the model's performance in a single balanced measure. It is given by:

$$\mathrm{F1\ Score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \times 100 \qquad (11)$$

where Precision and Recall are taken as fractions (before scaling by 100).
Based on these performance metrics, we selected the models with the highest Accuracy, Precision, Recall, and F1 Score to determine which should be carried forward into the MVE. This ensures that only the most reliable models, with strong diagnostic capabilities in liver disease prediction, contribute to the ensemble, enhancing the final model's robustness and overall predictive accuracy.
To evaluate the stability of the Max Voting model, we computed the standard deviation (SD) and the 95% confidence interval (CI) of its metrics and generated its receiver operating characteristic (ROC) curve.
The SD indicates the variability of the model's performance across different runs; a lower SD suggests that the model's performance is more stable and consistent [32].
The 95% CI gives a range of values within which we can be 95% confident that the true model performance lies [33]. This helps us understand the potential variability of the model's effectiveness and judge whether it performs reliably.
The ROC curve was generated to visualize the model's ability to distinguish between classes at various threshold values. A higher area under the curve (AUC) indicates that the model discriminates between classes more reliably [34].

2.7. Max Voting
After evaluating various feature optimization techniques, we found that LDA outperformed PCA and RFE. We then applied
MVE [35] to combine the predictions of the five hybrid ANN models, leveraging their diverse strengths to improve accuracy and robustness. This approach combines the ANNs' learning capabilities with LDA's discriminative power:

$$\hat{y} = \mathrm{MajorityVote}\left(y_{\mathrm{SANN+LDA}},\ y_{\mathrm{BANN+LDA}},\ y_{\mathrm{ABANN+LDA}},\ y_{\mathrm{GANN+LDA}},\ y_{\mathrm{SVMANN+LDA}}\right) \qquad (12)$$

where ŷ is the ensemble prediction and each y term is the class label predicted by the corresponding LDA-optimized hybrid model.

2.8. Performance analysis with XAI

We analyze our best model's predictions using XAI techniques, namely SHAP, LIME, and Individual Conditional Expectation (ICE) plots, to gain transparency into its decision-making process. SHAP attributes each prediction to contributions from individual features, LIME provides local explanations for specific predictions, and ICE plots show how a feature affects the prediction for each individual instance. These techniques enhance the interpretability of our ANN-based Max Voting model, improving its transparency for clinical applications [36–38].

2.9. External validation
To further evaluate our MVE model, we test it in two additional ways. First, we gather real-time patient information from multiple internet sources [39, 40], collecting three patient datasets representing diverse demographics and health conditions. These datasets are then tested against the pre-trained model, developed on a well-established dataset, allowing us to assess how well the model generalizes to unseen real-time data. Second, we apply the model to a multiclass classification dataset, instead of the original binary classification task, to examine its performance on more complex classification problems. This helps us evaluate the model's adaptability and scalability across a broader range of potential outcomes. The results from both the real-time patient data and the multiclass dataset provide valuable insights into the model's capabilities and highlight areas for future improvement.

3. Results and Discussion

After preprocessing the liver dataset, we evaluated the ANN-based models and applied feature reduction techniques, finding LDA to be the most effective. We then implemented an MVE model with LDA, addressed outliers through scalarization, and used 10-fold cross-validation to obtain the results. Finally, XAI techniques were applied to enhance the interpretability and trustworthiness of the predictions.
We gained valuable insights into the dataset's structure and feature relationships through comprehensive data analysis using visualizations such as histograms, violin plots, and correlation heatmaps. Histograms reveal that Age, TP, and AL follow near-normal distributions, while features such as TB, ALPH, and ASAT exhibit right-skewed distributions with notable outliers, indicating extreme values that could affect model performance. Violin plots further confirm that bilirubin and enzyme levels are highly skewed, whereas protein levels and Age have more symmetric distributions, giving a clearer view of data spread and potential anomalies. Additionally, the correlation heatmap highlights strong positive relationships, such as those between TB and DB and between ALAT and ASAT, suggesting collinearity among liver function markers. Moderate negative correlations, such as the inverse relationship between AL and Age, also emerge, offering insights into potential dependencies. These analyses are crucial for understanding data characteristics, guiding feature selection, and optimizing model performance.
Table 2 consolidates the performance metrics for six models across different feature optimization scenarios (no optimization, LDA, PCA, and RFE), providing a comprehensive comparison of accuracy, precision, recall, and F1 score. Without feature reduction, SVMANN leads with an accuracy of 78.03%, while SANN trails at 76.32%, setting the baseline for model effectiveness. With LDA applied, overall performance improves, with SANN achieving the highest accuracy of 79.15% and SVMANN recording the lowest at 75.72%, underscoring the nuanced impact of LDA on these models. When PCA is used, SVMANN emerges as the top performer with 77.44% accuracy, in contrast to ABANN's lower accuracy of 74.70%, and the corresponding precision, recall, and F1 scores further delineate these differences. Finally, under RFE, SVMANN again attains the highest accuracy at 78.03%, whereas GANN shows the lowest at 75.81%. This table highlights how the feature optimization techniques influence model performance, revealing their relative strengths and weaknesses across multiple evaluation metrics.

Table 2. Model performances with and without feature reduction (values in %)

Feature optimization | Metric | ANN | SANN | BANN | ABANN | GANN | SVMANN
No optimization | Accuracy | 76.92 | 76.32 | 75.73 | 76.41 | 76.32 | 78.03
No optimization | Precision | 70.94 | 49.17 | 67.66 | 68.76 | 49.17 | 39.02
No optimization | Recall | 58.98 | 51.48 | 58.84 | 67.23 | 51.48 | 50
No optimization | F1 Score | 58.17 | 47.57 | 59.23 | 67.65 | 47.57 | 43.83
LDA | Accuracy | 78.03 | 79.15 | 78.55 | 78.21 | 78.55 | 75.72
LDA | Precision | 39.02 | 39.64 | 39.28 | 39.1 | 39.28 | 37.86
LDA | Recall | 50 | 49.89 | 50 | 50 | 50 | 50
LDA | F1 Score | 43.83 | 44.18 | 43.99 | 43.88 | 43.99 | 43.09
PCA | Accuracy | 76.49 | 75.81 | 76.07 | 74.7 | 76.75 | 77.44
PCA | Precision | 70.98 | 56.81 | 68.33 | 64.88 | 67.23 | 38.75
PCA | Recall | 57.18 | 52.64 | 57.35 | 60 | 61.27 | 49.95
PCA | F1 Score | 56.2 | 50.16 | 56.86 | 60.42 | 62.33 | 43.64
RFE | Accuracy | 77.35 | 77.44 | 77.26 | 77.01 | 75.81 | 78.03
RFE | Precision | 47.02 | 55.19 | 46.08 | 69.12 | 66.89 | 39.02
RFE | Recall | 50.52 | 50.95 | 50.38 | 61.62 | 58.22 | 50
RFE | F1 Score | 45.63 | 46.22 | 45.3 | 62.77 | 58.42 | 43.83

Table 3 summarizes the LDA model's feature importance rankings and coefficients within the Max Voting framework across ten cross-validation folds, together with the Final Optimal Feature
Set (FOFS), derived as the union of the top features across all folds. AL consistently ranks as the most influential feature, followed by TP and DB, highlighting their critical role in classification. Mid-ranked features such as ALAT, Age, and ALPH exhibit moderate predictive significance, while Gender, AGR, and TB contribute less but ensure comprehensive feature representation. The coefficients, shown in parentheses, indicate each feature's weight in the classification model, with fold-specific variations, such as the prominence of DB in Fold 7 and TB in Fold 10, showcasing the adaptability of the LDA model in leveraging diverse markers for liver disease diagnosis. The FOFS integrates the top-ranked features across all folds, ensuring a comprehensive feature set for effective classification: {'Gender', 'ASAT', 'ALPH', 'AL', 'TP', 'DB', 'ALAT', 'Age', 'TB', 'AGR'}.

Table 3. LDA feature importance sets and coefficients for the Max Voting model (LDA coefficients in parentheses; FOFS = Final Optimal Feature Set)

Rank | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5 | Fold 6 | Fold 7 | Fold 8 | Fold 9 | Fold 10 | FOFS
10 | TB (0.0328) | Gender (0.0033) | TB (0.0251) | TB (0.0305) | TB (0.0257) | AGR (0.0141) | Gender (0.1017) | AGR (0.0053) | TB (0.0281) | Gender (0.1746) | AGR
9 | AGR (0.0525) | TB (0.0233) | AGR (0.1255) | AGR (0.0447) | AGR (0.0348) | TB (0.0263) | AGR (0.1386) | TB (0.0118) | AGR (0.0801) | ASAT (0.1835) | TB
8 | Gender (0.0952) | AGR (0.0369) | ASAT (0.1548) | Gender (0.1533) | ASAT (0.1423) | ASAT (0.1475) | ALAT (0.1607) | ASAT (0.1741) | ASAT (0.1313) | AGR (0.2219) | Age
7 | ASAT (0.1353) | ALAT (0.2087) | Gender (0.1841) | ASAT (0.1823) | Gender (0.1462) | Gender (0.2022) | ALPH (0.2984) | Gender (0.2337) | ALPH (0.2864) | ALPH (0.3109) | ALAT
6 | ALPH (0.2983) | ASAT (0.2267) | Age (0.3116) | ALPH (0.3164) | ALPH (0.2995) | Age (0.2472) | Age (0.3014) | Age (0.3340) | Gender (0.2879) | Age (0.3817) |
5 | DB (0.3990) | ALPH (0.3164) | ALPH (0.3130) | Age (0.3715) | Age (0.3211) | ALPH (0.3241) | ASAT (0.3472) | DB (0.4010) | Age (0.3622) | TB (0.4987) |
4 | Age (0.4106) | DB (0.3646) | DB (0.4687) | DB (0.4095) | DB (0.4431) | DB (0.4103) | TP (0.3867) | ALPH (0.4084) | DB (0.3973) | ALAT (0.5307) |
3 | TP (0.5429) | Age (0.4102) | ALAT (0.5043) | ALAT (0.5186) | ALAT (0.4810) | TP (0.4548) | TB (0.6883) | TP (0.4481) | ALAT (0.4439) | TP (0.6697) | ALPH
2 | ALAT (0.5493) | TP (0.5036) | TP (0.5345) | TP (0.6365) | TP (0.6181) | ALAT (0.4887) | AL (0.6993) | ALAT (0.4848) | TP (0.4712) | AL (0.9073) | ASAT

The MVE method demonstrates exceptional performance, achieving an accuracy of 98.38% along with precision, recall, and F1 score values of 98.28%, 98.28%, and 98.31%, respectively. Additionally, LDA emerges as the most effective feature optimization method, and the ensemble approach leverages the strengths of the individual LDA-optimized models to enhance predictive performance.
The metrics in Table 4 demonstrate the consistency of the Max Voting model across folds. They include accuracy, precision, recall, and F1 score, together with the mean of each metric. Additionally, the SD and 95% CI quantify variability and highlight the reliability and stability of the model's performance across multiple data subsets.

Table 4. Metrics across 10 folds for the Max Voting model (values in %)

Fold | Accuracy | Precision | Recall | F1 score
1 | 98.40 | 98.28 | 98.24 | 98.25
2 | 98.35 | 98.25 | 98.30 | 98.28
3 | 98.42 | 98.35 | 98.27 | 98.38
4 | 98.36 | 98.27 | 98.28 | 98.30
5 | 98.39 | 98.29 | 98.31 | 98.30
6 | 98.37 | 98.28 | 98.29 | 98.31
7 | 98.38 | 98.24 | 98.27 | 98.28
8 | 98.36 | 98.31 | 98.28 | 98.30
9 | 98.40 | 98.25 | 98.28 | 98.32
10 | 98.37 | 98.30 | 98.26 | 98.36
Mean | 98.38 | 98.28 | 98.28 | 98.31
Standard deviation | 0.0221 | 0.0329 | 0.0199 | 0.0382
95% confidence interval | ±0.0158 | ±0.0236 | ±0.0142 | ±0.0274

Figure 2 compares the performance metrics of the top-performing models across the feature optimization arrangements and without feature optimization for liver disease prediction. The MVE method, integrating multiple ANN models with LDA, achieves the highest accuracy (98.38%) and F1 score (81.8%), highlighting its superiority over individual models.

Figure 3(A) presents the CM for the Max Voting model. It shows
only one misclassification between disease and non-disease cases,
indicating strong predictive performance. Meanwhile, Figure 3(B)
displays the ROC curve, where the model achieves an AUC of 1.00,
AL (0.6101)
AL (0.6748)
AL (0.6895)
AL (0.6904)
AL (0.6444)
AL (0.6692)
DB (1.0238)
AL (0.5628)
AL (0.6611)
DB (0.9085)

signifying perfect classification and excellent discriminative ability.
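Figure 3 reports one misclassification together with an AUC of 1.00. The small Python sketch below illustrates how both can hold. The scores are hypothetical and are not the study’s data. AUC is computed as the fraction of (disease, non-disease) pairs in which the disease case receives the higher score, with ties counted as one half. Errors are counted at an assumed 0.5 decision threshold.

```python
from itertools import product


def auc_by_pairs(scores, labels):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p_score > n_score else 0.5 if p_score == n_score else 0.0
        for p_score, n_score in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def errors_at(scores, labels, threshold=0.5):
    """Count misclassifications when class 1 is predicted for score >= threshold (assumed rule)."""
    return sum((s >= threshold) != (y == 1) for s, y in zip(scores, labels))


# Hypothetical scores: one disease case (0.45) falls below the 0.5 threshold,
# but every disease case still scores higher than every non-disease case.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.95, 0.90, 0.80, 0.45, 0.40, 0.30, 0.20, 0.10]

print(auc_by_pairs(scores, labels), errors_at(scores, labels))
```

The pairwise form is used because it makes the threshold-independence of AUC explicit. A library routine would compute the same quantity from the ROC curve.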

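The summary rows of Table 4 can be reproduced from the per-fold values. The sketch below assumes the 95% CI half-width is t × SD / √n, where t ≈ 2.262 is the Student-t critical value for 9 degrees of freedom. The paper does not state its formula. This assumption recovers all four reported intervals, whereas a normal multiplier of 1.96 would give about ±0.0137 for accuracy.

```python
import math
import statistics

# Per-fold scores (%) of the Max Voting model, copied from Table 4 (folds 1-10).
TABLE_4 = {
    "Accuracy":  [98.40, 98.35, 98.42, 98.36, 98.39, 98.37, 98.38, 98.36, 98.40, 98.37],
    "Precision": [98.28, 98.25, 98.35, 98.27, 98.29, 98.28, 98.24, 98.31, 98.25, 98.30],
    "Recall":    [98.24, 98.30, 98.27, 98.28, 98.31, 98.29, 98.27, 98.28, 98.28, 98.26],
    "F1 score":  [98.25, 98.28, 98.38, 98.30, 98.30, 98.31, 98.28, 98.30, 98.32, 98.36],
}

# Assumption: two-sided 95% Student-t critical value for 10 - 1 = 9 degrees of freedom.
T_CRIT_DF9 = 2.262


def summarize(values, t_crit=T_CRIT_DF9):
    """Return (mean, sample SD, half-width of the t-based 95% CI) for per-fold scores."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, divides by n - 1
    half_width = t_crit * sd / math.sqrt(n)
    return mean, sd, half_width


summary = {name: summarize(scores) for name, scores in TABLE_4.items()}
for name, (mean, sd, ci) in summary.items():
    print(f"{name:<9} mean={mean:.2f}  SD={sd:.4f}  95% CI=±{ci:.4f}")
```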
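The ranking statements about Table 3 can be checked in a few lines. This sketch copies the Rank 1 and Rank 2 rows of Table 3 and counts how often each feature ranks first. It then forms the union of the per-fold top-2 features. The top-2 cut-off is an illustrative assumption; the FOFS itself pools the top-ranked features of every fold.

```python
from collections import Counter

# Rank 1 and Rank 2 rows of Table 3 as (feature, LDA coefficient), folds 1-10.
RANK_1 = [("AL", 0.6101), ("AL", 0.6748), ("AL", 0.6895), ("AL", 0.6904), ("AL", 0.6444),
          ("AL", 0.6692), ("DB", 1.0238), ("AL", 0.5628), ("AL", 0.6611), ("DB", 0.9085)]
RANK_2 = [("ALAT", 0.5493), ("TP", 0.5036), ("TP", 0.5345), ("TP", 0.6365), ("TP", 0.6181),
          ("ALAT", 0.4887), ("AL", 0.6993), ("ALAT", 0.4848), ("TP", 0.4712), ("AL", 0.9073)]

# How often each feature is the most influential one in a fold.
top_counts = Counter(name for name, _ in RANK_1)

# Union of the per-fold top-2 features: a small-k version of the FOFS union idea.
top2_union = {name for row in (RANK_1, RANK_2) for name, _ in row}

# Sanity check: in every fold, the Rank 1 coefficient exceeds the Rank 2 coefficient.
ordered = all(c1 > c2 for (_, c1), (_, c2) in zip(RANK_1, RANK_2))

print(top_counts.most_common(), sorted(top2_union), ordered)
```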

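The Max Voting step behind these results can be sketched as a hard majority vote over the class labels predicted by the five hybrid models. This is a minimal illustration under stated assumptions. Each model is assumed to output 0 (no liver disease) or 1 (liver disease) per sample. The model names and predictions below are hypothetical placeholders, not the authors’ implementation.

```python
from collections import Counter


def max_vote(predictions_by_model):
    """Hard majority vote; predictions_by_model maps a model name to per-sample class labels."""
    per_sample = zip(*predictions_by_model.values())  # one tuple of votes per sample
    return [Counter(votes).most_common(1)[0][0] for votes in per_sample]


# Hypothetical per-sample predictions from five hybrid models (1 = liver disease).
predictions = {
    "stacking_ann": [1, 0, 1, 1],
    "bagging_ann":  [1, 0, 0, 1],
    "adaboost_ann": [0, 0, 1, 1],
    "gbdt_ann":     [1, 1, 0, 1],
    "svm_ann":      [1, 0, 1, 0],
}

print(max_vote(predictions))
```

With five voters and two classes a tie cannot occur, so no tie-breaking rule is needed. An even number of models or more than two classes would require one.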
Figure 2. Comparison of the best-performing models with and without feature optimization and Max Voting (accuracy, precision, recall, and F1 score for Max Voting, SVMANN + RFE, SVMANN + PCA, SANN + LDA, and SVMANN)

Figure 3. Performance evaluation of the MVE model: (A) CM for classification accuracy and (B) ROC curve for model discrimination

Figure 4 provides a detailed examination of the features influencing the model’s predictions. Figure 4(A) displays the SHAP summary plot, showing that features such as DB and ALPH strongly influence the predictions, while lower values of AL and TB negatively affect them. In Figure 4(B), the SHAP waterfall plot illustrates the impact of individual features. It highlights that ALPH reduces the likelihood of liver disease while
TPs, along with Age and AL, increase it. Figure 4(C) presents the LIME explanation for class 0 (no liver disease), where Alanine Aminotransferase (ALAT) and DB contribute negatively, while TPs have a small positive impact. Figure 4(D) shows the LIME explanation for class 1 (liver disease), where ALAT and ALPH contribute negatively. Finally, Figure 4(E) illustrates the SHAP FORCE plot, which gives a more granular view of the strength and direction of each feature’s influence on the final prediction and displays the features’ relative contributions visually. Together, these visualizations provide a comprehensive understanding of the feature impacts driving the Max Voting model’s decisions.

Figure 5(A) presents the ICE plots for each feature, showing how the predicted values change as a feature’s value varies. Features such as Age and TB have a more substantial influence on the predictions, while Gender and TPs have minimal impact. Figure 5(B) shows the SHAP dependence analysis, revealing that Age and TB contribute positively to the predictions. In contrast, ALPH and AL have varying impacts, suggesting that their effects are more context-dependent. These analyses provide a deeper understanding of how individual features drive the model’s decision-making process.

Table 5 compares the feature prioritization of various XAI methods (SHAP, LIME, FORCE, and ICE) with that of clinical experts. DB and TB receive high priority from several XAI methods and from both experts. This alignment between the model’s decisions and expert judgment underscores the model’s interpretability and its potential for real-world clinical applications.

We also evaluated deep learning models on the 583-instance dataset: Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and a Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) ensemble. Among these models, LSTM achieved the highest accuracy at 68.38%, closely followed by GRU at 68.12%, but neither outperformed the Max Voting model. LSTM achieved a precision, recall, and F1 score of 52.47%, 52.45%, and 47.56%, respectively. GRU achieved a higher precision and recall of 59.01% and 55.47%, with an F1 score of 53.88%. The CNN-LSTM ensemble had the lowest performance across all metrics, with an accuracy of 50.1%, precision of 51.15%, recall of 50.78%, and F1 score of 45.62%. None of these deep learning models approaches the predictive accuracy of the MVE.

Figure 4. XAI feature impact analysis in the Max Voting model: (A) SHAP summary plot, (B) SHAP waterfall plot, (C) LIME explanation for no liver disease prediction, (D) LIME explanation for liver disease prediction, and (E) FORCE plot

Figure 5. Feature impact analysis for liver health assessment in the Max Voting model: (A) ICE plots for each feature and (B) SHAP dependence analysis of predictive features

Table 5. Comparison of feature importance rankings in liver disease prediction between XAI methods and clinical expert judgment
Priority | SHAP | LIME (Disease) | LIME (No Disease) | SHAP Waterfall | FORCE | ICE | Expert 1 | Expert 2
First | DB | ALAT | ALAT | ALPH | TB | TB | Both TB and DB are the most important | TB/DB
Second | ALPH | ALPH | DB | Age | ALPH | ALPH | – | ASAT, ALPH
Third | Age | Age | AL | TP | DB | ASAT | – | AL, AGR
Fourth | ALAT | TP | ALPH | ASAT | ASAT | DB | – | Age, Gender
Fifth | TP | DB | TP | ALAT | TP | TP | – | –
Sixth | ASAT | TB | AGR | DB | ALAT | AGR | – | –

Table 6. Comparison of our study with previous studies
Study | Dataset | Accuracy | Feature optimization | Cross validation | XAI
Choubey et al. [3] | 583 samples | 75.10% | ✓ | ✖ | ✖
Shetty and Satyanarayana [4] | 583 samples | 71% | ✖ | ✖ | ✖
Alyabis et al. [5] | 583 samples | 79.6% | ✓ | ✖ | ✖
Singh and Agarwal [6] | 583 samples | 77.77% | ✖ | ✖ | ✖
Azam et al. [7] | 583 samples | 74% | ✓ | ✖ | ✖
Choudhary et al. [8] | 583 samples | 70.54% | ✖ | ✓ | ✖
Khan et al. [9] | 583 samples | 72.17% | ✖ | ✖ | ✖
Thirunavukkarasu et al. [10] | 583 samples | 73.97% | ✖ | ✖ | ✖
Muthuselvan et al. [11] | 583 samples | 74.2% | ✖ | ✓ | ✖
Yasmin et al. [12] | 583 samples | 76.03% | ✓ | ✖ | ✖
Our Study | 583 samples | 98.38% | ✓ | ✓ | ✓

We further evaluated the Max Voting model on real-time patient data and on other datasets to assess its accuracy in different contexts. The real-time validation involved testing the model on new patient samples, including data from Mr. Akash (23, male) from Dr. Lal’s Pathology Lab, Mrs. Sushila (53, female) from House of Diagnostics, and Mr. Wasif (30, male) from Chughtai Lab. Key health indicators such as TB, DB, ALPH, ALAT, ASAT, TP, AL,
and AGR were used to evaluate the model’s accuracy. The results of this validation were compared with those on the 583-sample dataset, showing the model’s ability to accurately assess and predict patient health metrics.

The MVE model was also validated on an external liver disease dataset of 30,691 patients [41]. It achieves an average accuracy of 88.35%, indicating that it generalizes well to external samples and predicts liver disease outcomes robustly. Its precision, recall, and F1 score also reflect solid performance, with mean values of 92.99%, 79.48%, and 83.26%, respectively. The standard deviations for accuracy, precision, recall, and F1 score are 1.94, 1.30, 2.77, and 1.43, respectively, indicating relatively stable performance across the folds. The 95% confidence intervals for these metrics are ±1.20 for accuracy, ±1.30 for precision, ±2.77 for recall, and ±1.43 for F1 score, further supporting the model’s effectiveness in liver disease prediction.

We also evaluated the model on the Maternal Health Risk (MHR) dataset from Kaggle [42], which contains 1,014 samples with seven features and three classes: low, mid, and high risk. Performance across 10 folds was measured using accuracy, precision, recall, and F1 score. The Max Voting model performs strongly, with an average accuracy of 93.07%, precision of 92.71%, recall of 93.06%, and F1 score of 92.85%. The standard deviations and 95% confidence intervals across folds show minor variability, with confidence intervals of ±0.74 for accuracy, ±0.80 for precision, ±1.72 for recall, and ±0.89 for F1 score. The CM shows that the model classifies most samples correctly across the three classes, correctly predicting 60 of 67 Low-risk cases, 67 of 71 Mid-risk cases, and 61 of 64 High-risk cases. Some misclassifications occur, such as 5 Low-risk cases predicted as Mid and 2 as High, along with a few errors in the Mid- and High-risk categories. Overall, however, the model effectively distinguishes between the risk levels, reflecting its robust predictive capability on the MHR dataset.

Finally, Table 6 compares our results with the existing literature. The proposed approach achieves an accuracy of 98.38%, well above the 70.54% to 79.6% range reported in previous studies on the same 583-sample dataset. Each earlier study used at most one of feature optimization, cross-validation, and XAI, whereas this study incorporates all three, addressing existing gaps. The ensemble model, built on ANN-based hybrid approaches, improves both predictive accuracy and interpretability, distinguishing it from prior work.

The lower accuracies and other performance metrics presented in Table 4 can be attributed mainly to the limited size of the dataset, which consists of only 583 samples and 11 features. Such a small dataset restricts the ability of individual models to generalize, especially when capturing complex patterns. As a result, the models exhibit lower precision, recall, and F1 scores. When trained on limited datasets with few features, models are more prone to overfitting or underfitting because they lack sufficient information to learn intricate relationships. This ultimately reduces accuracy and the other performance metrics [43].

However, the MVE method with LDA achieves higher accuracy despite the limitations of the individual models. Combining the predictions of multiple models through Max Voting mitigates the weaknesses of each model and improves overall performance. LDA’s dimensionality reduction lets each model focus on the most relevant features, improving its performance within the ensemble. The ensemble thus capitalizes on the strengths of each model, while LDA’s feature optimization provides a clearer, more robust representation of the data, leading to improved accuracy and other metrics in the combined outcome.

Figure 6. Real-time framework for liver disease prediction using semi-auto biochemistry analysis, ANN-based MVE models, and XAI-driven interpretation

Even though our dataset contains only 11 features, we used feature optimization techniques because they enhance model
accuracy by refining the dataset to its most informative aspects. Techniques such as LDA, PCA, and RFE help the model focus on the features that contribute most to identifying patterns, which improves predictive reliability. By reducing noise and minimizing irrelevant data, feature optimization allows for more effective learning and generalization, increasing stability and reducing computational complexity. This approach is particularly valuable for small datasets, where maximizing the signal-to-noise ratio is critical for robust performance [44].

LDA proved the most effective feature selection method because it maximizes class separation, which makes it well suited to classification tasks where distinguishing between classes is crucial. PCA reduces dimensionality based on variance without considering class labels, and RFE does not directly optimize for class discrimination; LDA, by contrast, explicitly enhances class separability. Additionally, LDA considers the ratio of between-class to within-class variance. This allows it to handle class imbalance better and ensures that the selected features are the most relevant for distinguishing between classes, even in imbalanced datasets [45].

Figure 6 depicts our liver disease prediction framework in a real-time scenario. Patient data are collected via questionnaires and blood samples, which are then analyzed in a semi-auto biochemistry analyzer to measure liver function indicators. These data are then evaluated against the existing, optimized dataset of 583 samples using LDA, chosen for its effective feature selection in ANN-based models. Afterward, predictions are generated by the ANN-based MVE for improved accuracy. Finally, XAI enables users, including non-experts, to understand the predictions and confidently take further medical action in consultation with experts.

4. Conclusion

In conclusion, this study demonstrates a robust approach to enhancing liver disease prediction. It integrates ANN with five distinct ML models (Stacking, Bagging, AdaBoost, Gradient-Boosted Decision Tree, and SVM) to create five hybrid models optimized through LDA. Combined into an MVE, these LDA-optimized hybrids increase accuracy substantially, from 79.15% to 98.38%. XAI techniques, such as LIME, SHAP, and ICE, further support the transparency of the model’s decision-making process. We validate the ensemble model’s effectiveness by comparing its predictions with doctors’ decisions. We also test it on samples from external sources and on a multiclass MHR dataset, confirming its adaptability beyond the initial dataset. A real-time demonstration of our model underscores its practical utility, though the study notes limitations, particularly in applying the
model to clinical settings due to data constraints. Future work will address these limitations by implementing differential privacy and clinical servers to protect patient data, and by extending the model to support multi-disease prediction. Additionally, we aim to build a web server that makes this tool more accessible and valuable to the broader community and end users.

Acknowledgment

Our heartfelt gratitude goes to two distinguished physicians who generously shared their expertise, greatly enriching this study. Special thanks are extended to Dr. Shahriar Shafiq, a Higher Specialty Registrar in Diabetes and Endocrinology at the Royal College of Physicians of Edinburgh, England, referenced as Expert 1 in Table 5. Dr. Shafiq’s insightful guidance was instrumental in navigating the complexities of liver disease research. Appreciation is also due to Dr. Talha Sami Anik, Assistant Surgeon with the Government of the People’s Republic of Bangladesh, listed as Expert 2 in Table 5. Dr. Anik’s extensive experience, including his work at Birdem General Hospital and Dhaka Medical College & Hospital, brought valuable perspectives to this research. Despite their demanding schedules, both doctors demonstrated exceptional commitment and professionalism, providing critical support that significantly contributed to the advancement of this study.

Ethical Statement

This study does not contain any studies with human or animal subjects performed by any of the authors.

Conflicts of Interest

The authors declare that they have no conflicts of interest related to this work.

Data Availability Statement

The data that support this work are available upon reasonable request to the corresponding author.

Author Contribution Statement

Safiul Haque Chowdhury: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing – original draft, Writing – review & editing, Visualization, Project administration. Mohammad Mamun: Formal analysis, Investigation, Resources, Data curation, Writing – original draft, Writing – review & editing, Visualization, Project administration. Tanvir Ahmed Shaikat: Visualization, Project administration. Mohammed Ibrahim Hussain: Writing – review & editing, Supervision. Sadiq Iqbal: Writing – review & editing, Visualization, Supervision. Muhammad Minoar Hossain: Writing – review & editing, Supervision.

References

[1] Williams, R. (2006). Global challenges in liver disease. Hepatology, 44(3), 521–526. https://doi.org/10.1002/hep.21347
[2] American Liver Foundation. (2023). How many people have liver disease? Retrieved from: https://liverfoundation.org/about-your-liver/facts-about-liver-disease/how-many-people-have-liver-disease/
[3] Choubey, D. K., Dubey, P., Tewari, B. P., Ojha, M., & Kumar, J. (2023). Prediction of liver disease using soft computing and data science approaches. In M. Kumar, S. S. Gill, J. K. Samriya, & S. Uhlig (Eds.), 6G enabled fog computing in IoT: Applications and opportunities (pp. 183–213). Springer. https://doi.org/10.1007/978-3-031-30101-8_8
[4] Shetty, P. J., & Satyanarayana. (2023). Prediction performance of classification models for imbalanced liver disease data. International Journal of Statistics and Applied Mathematics, 8(5), 58–62.
[5] Alyabis, M. A. S., Howaimil, B. M. I., Alyabes, A. M. S., Alrabiah, A. A. H., Alrabiah, A. S. H., Aljumayi, I. M., …, & Binshaheen, H. S. (2022). Prediction of liver diseases using neural network analysis. International Journal of Pharmaceutical and Bio Medical Science, 2(8), 314–320. https://doi.org/10.47191/ijpbms/v2-i8-08
[6] Singh, G., & Agarwal, C. (2023). Prediction and analysis of liver disease using extreme learning machine. In Sentiment Analysis and Deep Learning: Proceedings of ICSADL 2022, 679–690. https://doi.org/10.1007/978-981-19-5443-6_52
[7] Azam, S., Rahman, A., Iqbal, S. M. H. S., & Ahmed, T. (2020). Prediction of liver diseases by using few machine learning based approaches. Australian Journal of Engineering and Innovative Technology, 2(5), 85–90. https://doi.org/10.34104/ajeit.020.085090
[8] Choudhary, R., Gopalakrishnan, T., Ruby, D., Gayathri, A., Murthy, V. S., & Shekhar, R. (2021). An efficient model for predicting liver disease using machine learning. In R. Satpathy, T. Choudhury, S. Satpathy, S. N. Mohanty, & X. Zhang (Eds.), Data analytics in bioinformatics: A machine learning perspective (pp. 443–457). Wiley. https://doi.org/10.1002/9781119785620.ch18
[9] Khan, B., Naseem, R., Ali, M., Arshad, M., & Jan, N. (2019). Machine learning approaches for liver disease diagnosing. International Journal of Data Science and Advanced Analytics, 1(1), 27–31. https://doi.org/10.69511/ijdsaa.v1i1.71
[10] Thirunavukkarasu, K., Singh, A. S., Irfan, & Chowdhury, A. (2018). Prediction of liver disease using classification algorithms. In 4th International Conference on Computing Communication and Automation, 1–3. https://doi.org/10.1109/CCAA.2018.8777655
[11] Muthuselvan, S., Rajapraksh, S., Somasundaram, K., & Karthik, K. (2018). Classification of liver patient dataset using machine learning algorithms. International Journal of Engineering & Technology, 7(3.34), 323–326. https://doi.org/10.14419/ijet.v7i3.34.19217
[12] Yasmin, R., Amin, R., & Reza, S. (2023). Design of novel feature union for prediction of liver disease patients: A machine learning approach. In The Fourth Industrial Revolution and Beyond: Select Proceedings of IC4IR+, 515–526. https://doi.org/10.1007/978-981-19-8032-9_36
[13] Kufel, J., Bargieł-Łączek, K., Kocot, S., Koźlik, M., Bartnikowska, W., Janik, M., …, & Gruszczyńska, K. (2023). What is machine learning, artificial neural networks and deep learning?—Examples of practical applications in medicine. Diagnostics, 13(15), 2582. https://doi.org/10.3390/diagnostics13152582
[14] Ramana, B. V., Babu, M. S. P., & Venkateswarlu, N. B. (2012). A critical comparative study of liver patients from USA and INDIA: An exploratory analysis. International Journal of Computer Science Issues, 9(3), 506–516.
[15] Shinde, B. G., & Shivthare, S. (2024). Impact of data visualization in data analysis to improve the efficiency of machine learning models. Journal of Advanced Zoology, 45, 107–112.
[16] Roy, S., Bhalla, K., & Patel, R. (2024). Mathematical analysis of histogram equalization techniques for medical image enhancement: A tutorial from the perspective of data loss. Multimedia Tools and Applications, 83(5), 14363–14392. https://doi.org/10.1007/s11042-023-15799-8
[17] Hu, K. (2020). Become competent within one day in generating boxplots and violin plots for a novice without prior R experience. Methods and Protocols, 3(4), 64. https://doi.org/10.3390/mps3040064
[18] Gu, Z. (2022). Complex heatmap visualization. iMeta, 1(3), e43. https://doi.org/10.1002/imt2.43
[19] Ismail, A. R., Abidin, N. Z., & Maen, M. K. (2022). Systematic review on missing data imputation techniques with machine learning algorithms for healthcare. Journal of Robotics and Control, 3(2), 143–152. https://doi.org/10.18196/jrc.v3i2.13133
[20] Yu, L., Zhou, R., Chen, R., & Lai, K. K. (2022). Missing data preprocessing in credit classification: One-hot encoding or imputation? Emerging Markets Finance and Trade, 58(2), 472–482. https://doi.org/10.1080/1540496X.2020.1825935
[21] Pisner, D. A., & Schnyer, D. M. (2020). Support vector machine. In A. Mechelli, & S. Vieira (Eds.), Machine learning: Methods and applications to brain disorders (pp. 101–121). Academic Press. https://doi.org/10.1016/B978-0-12-815739-8.00006-7
[22] Zhao, S., Zhang, B., Yang, J., Zhou, J., & Xu, Y. (2024). Linear discriminant analysis. Nature Reviews Methods Primers, 4(1), 70. https://doi.org/10.1038/s43586-024-00346-y
[23] Greenacre, M., Groenen, P. J. F., Hastie, T., D’Enza, A. I., Markos, A., & Tuzhilina, E. (2023). Publisher correction: Principal component analysis. Nature Reviews Methods Primers, 3(1), 22. https://doi.org/10.1038/s43586-023-00209-y
[24] Chen, X. W., & Jeong, J. C. (2007). Enhanced recursive feature elimination. In Sixth International Conference on Machine Learning and Applications, 429–435. https://doi.org/10.1109/ICMLA.2007.35
[25] Abiodun, O. I., Jantan, A., Omolara, A. E., Dada, K. V., Mohamed, N. A., & Arshad, H. (2018). State-of-the-art in artificial neural network applications: A survey. Heliyon, 4(11), e00938. https://doi.org/10.1016/j.heliyon.2018.e00938
[26] Pavlyshenko, B. (2018). Using stacking approaches for machine learning models. In IEEE Second International Conference on Data Stream Mining & Processing, 255–258. https://doi.org/10.1109/DSMP.2018.8478522
[27] González, S., García, S., del Ser, J., Rokach, L., & Herrera, F. (2020). A practical tutorial on bagging and boosting based ensembles for machine learning: Algorithms, software tools, performance study, practical perspectives and opportunities. Information Fusion, 64, 205–237. https://doi.org/10.1016/j.inffus.2020.07.007
[28] Cao, Y., Miao, Q. G., Liu, J. C., & Gao, L. (2013). Advance and prospects of AdaBoost algorithm. Acta Automatica Sinica, 39(6), 745–758. https://doi.org/10.1016/S1874-1029(13)60052-X
[29] Bentéjac, C., Csörgő, A., & Martínez-Muñoz, G. (2021). A comparative analysis of gradient boosting algorithms. Artificial Intelligence Review, 54(3), 1937–1967. https://doi.org/10.1007/s10462-020-09896-5
[30] Vishwanathan, S. V. M., & Murty, M. N. (2002). SSVM: A simple SVM algorithm. In Proceedings of the 2002 International Joint Conference on Neural Networks, 3, 2393–2398. https://doi.org/10.1109/IJCNN.2002.1007516
[31] Fahmy, M. M. (2022). Confusion matrix in binary classification problems: A step-by-step tutorial. Journal of Engineering Research, 6(5), T1–T12.
[32] Lee, D. K., In, J., & Lee, S. (2015). Standard deviation and standard error of the mean. Korean Journal of Anesthesiology, 68(3), 220–223. https://doi.org/10.4097/kjae.2015.68.3.220
[33] Hazra, A. (2017). Using the confidence interval confidently. Journal of Thoracic Disease, 9(10), 4125–4130. https://doi.org/10.21037/jtd.2017.09.14
[34] Hoo, Z. H., Candlish, J., & Teare, D. (2017). What is an ROC curve? Emergency Medicine Journal, 34(6), 357–359. https://doi.org/10.1136/emermed-2017-206735
[35] Tian, T., & Zhu, J. (2015). Max-margin majority voting for learning from crowds. In Advances in Neural Information Processing Systems 28: 29th Annual Conference on Neural Information Processing Systems, 1–9.
[36] Barredo Arrieta, A., Díaz-Rodríguez, N., del Ser, J., Bennetot, A., Tabik, S., Barbado, A., …, & Herrera, F. (2020). Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58, 82–115. https://doi.org/10.1016/j.inffus.2019.12.012
[37] van den Broeck, G., Lykov, A., Schleich, M., & Suciu, D. (2022). On the tractability of SHAP explanations. Journal of Artificial Intelligence Research, 74, 851–886. https://doi.org/10.1613/jair.1.13283
[38] Kawakura, S., Hirafuji, M., Ninomiya, S., & Shibasaki, R. (2022). Analyses of diverse agricultural worker data with explainable artificial intelligence: XAI based on SHAP, LIME, and LightGBM. European Journal of Agriculture and Food Sciences, 4(6), 11–19. https://doi.org/10.24018/ejfood.2022.4.6.348
[39] Vikas1055. (2019). Lab serial No. patient name age/sex referred by testname. Retrieved from: https://www.coursehero.com/file/42005265/labreportnewpdf/
[40] Asking for Self. (n.d.). Talk to liver on liver function test. Retrieved from: https://www.marham.pk/forum/liver-specialist/liver-function-test
[41] Velu, S. R., Ravi, V., & Tabianan, K. (2022). Data mining in predicting liver patients using classification model. Health and Technology, 12(6), 1211–1235. https://doi.org/10.1007/s12553-022-00713-3
[42] Ahmed, M., Kashem, M. A., Rahman, M., & Khatun, S. (2020). Review and analysis of risk factor of maternal health in remote area using the Internet of Things (IoT). In InECCE2019: Proceedings of the 5th International Conference on Electrical, Control & Computer Engineering, 357–365. https://doi.org/10.1007/978-981-15-2317-5_30
[43] Li, Z., Liu, F., Yang, W., Peng, S., & Zhou, J. (2022). A survey of convolutional neural networks: Analysis, applications, and prospects. IEEE Transactions on Neural Networks and Learning Systems, 33(12), 6999–7019. https://doi.org/10.1109/TNNLS.2021.3084827
[44] Ali, M. Z., Abdullah, A., Zaki, A. M., Rizk, F. H., Eid, M. M., & El-Kenway, E. M. (2024). Advances and challenges in feature selection methods: A comprehensive review. Journal of Artificial Intelligence and Metaheuristics, 7(1), 67–77. https://doi.org/10.54216/JAIM.070105
[45] Kim, A. K. H., & Chung, H. (2021). The effect of rebalancing on LDA in imbalanced classification. Stat, 10(1), e384. https://doi.org/10.1002/sta4.384

How to Cite: Chowdhury, S. H., Mamun, M., Shaikat, M. T. A., Hussain, M. I., Iqbal, M. S., & Hossain, M. M. (2025). An Ensemble Approach for Artificial Neural Network-Based Liver Disease Identification from Optimal Features Through Hybrid Modeling Integrated with Advanced Explainable AI. Medinformatics, 2(2), 107–119. https://doi.org/10.47852/bonviewMEDIN52024744

20 pages
Liver Disease Prediction Using Machine Learning and Deep Learning
No ratings yet
Liver Disease Prediction Using Machine Learning and Deep Learning
73 pages
Sample Document Paper
No ratings yet
Sample Document Paper
17 pages
Sat - 9.Pdf - Predicting Liver Failure Using Supervised Machine Learning Approach
No ratings yet
Sat - 9.Pdf - Predicting Liver Failure Using Supervised Machine Learning Approach
11 pages
Final Paper
No ratings yet
Final Paper
10 pages
Tabular Data Generation To Improve Classification of Liver Disease DiagnosisApplied Sciences Switzerland
No ratings yet
Tabular Data Generation To Improve Classification of Liver Disease DiagnosisApplied Sciences Switzerland
18 pages
Automatic Classification of Diffuse Liver Diseases: Cirrhosis & Hepatosteatosis Using Ultrasound Images
No ratings yet
Automatic Classification of Diffuse Liver Diseases: Cirrhosis & Hepatosteatosis Using Ultrasound Images
5 pages
A1. HepatitsNet
No ratings yet
A1. HepatitsNet
1 page
Liver Disease Prediction Using Machine Learning Final
No ratings yet
Liver Disease Prediction Using Machine Learning Final
22 pages
Performance Evolution of Different Machine Learning Algorithms For Prediction of Liver Disease
No ratings yet
Performance Evolution of Different Machine Learning Algorithms For Prediction of Liver Disease
8 pages
Major Project Synopsis
No ratings yet
Major Project Synopsis
5 pages
Crafting Formal PowerPoint
No ratings yet
Crafting Formal PowerPoint
3 pages
Experiment 3
No ratings yet
Experiment 3
8 pages
Joy CT
No ratings yet
Joy CT
1 page
Final - MATERIALS-TO-BE
No ratings yet
Final - MATERIALS-TO-BE
27 pages
LU Decomposition - and - Inverse
No ratings yet
LU Decomposition - and - Inverse
26 pages
Experiment 2
No ratings yet
Experiment 2
7 pages
Quantile Project
No ratings yet
Quantile Project
38 pages
Syllabus
No ratings yet
Syllabus
2 pages
Exp 1 Study of Different Types of Diode Characteristics
No ratings yet
Exp 1 Study of Different Types of Diode Characteristics
6 pages
L9 - Binary Search Tree
No ratings yet
L9 - Binary Search Tree
33 pages
Lab Report 1
No ratings yet
Lab Report 1
7 pages
L10 - Heap Sort
No ratings yet
L10 - Heap Sort
53 pages
L8 - Huffman Algorithm
No ratings yet
L8 - Huffman Algorithm
52 pages
Enhanced Parkinson's Disease Prediction Using Feature Selection, SMOTE, and Machine Learning With Deep Learning Models
No ratings yet
Enhanced Parkinson's Disease Prediction Using Feature Selection, SMOTE, and Machine Learning With Deep Learning Models
6 pages
6 Month Weekly Checklist
No ratings yet
6 Month Weekly Checklist
3 pages
2025 k-gks 정부 초청 대학원 장학생 지원가능학과
No ratings yet
2025 k-gks 정부 초청 대학원 장학생 지원가능학과
6 pages
Lecture 19 Basics of Cache
No ratings yet
Lecture 19 Basics of Cache
23 pages
Learn 2
No ratings yet
Learn 2
10 pages
Linking References in Google Docs
No ratings yet
Linking References in Google Docs
1 page
Data Mining Final Solution
No ratings yet
Data Mining Final Solution
10 pages
Bank
No ratings yet
Bank
24 pages
CONFERENCES
No ratings yet
CONFERENCES
1 page
6 Month Research Roadmap
No ratings yet
6 Month Research Roadmap
2 pages
AI Lecture 01
No ratings yet
AI Lecture 01
23 pages
Learn
No ratings yet
Learn
20 pages
F1 Introduction To Big Data
No ratings yet
F1 Introduction To Big Data
24 pages
Untitled Document
No ratings yet
Untitled Document
30 pages
Help 2
No ratings yet
Help 2
102 pages
JPM 12 01211 v2
No ratings yet
JPM 12 01211 v2
16 pages
Memory Hierarchy & Technology Trends
No ratings yet
Memory Hierarchy & Technology Trends
8 pages
ML LAB Viva Questions With Answers
No ratings yet
ML LAB Viva Questions With Answers
10 pages
Final Project
100% (2)
Final Project
28 pages
R: Adabag
No ratings yet
R: Adabag
34 pages
1.1 - Xgboost, GBboost, Adaboost - Boosting - Medium
No ratings yet
1.1 - Xgboost, GBboost, Adaboost - Boosting - Medium
6 pages
2023 CFA© Program Curriculum Level II Volume 1: Quantitative Methods and Economics 1st Edition Cfa Institute Ready To Read
No ratings yet
2023 CFA© Program Curriculum Level II Volume 1: Quantitative Methods and Economics 1st Edition Cfa Institute Ready To Read
82 pages
Seminar Report
No ratings yet
Seminar Report
25 pages
Loan Approval Prediction Using Ensemble ML
No ratings yet
Loan Approval Prediction Using Ensemble ML
10 pages
ML Interview Prep: Design Patterns
No ratings yet
ML Interview Prep: Design Patterns
29 pages
ML Price Prediction
No ratings yet
ML Price Prediction
7 pages
Data Science Bro - 2a
No ratings yet
Data Science Bro - 2a
28 pages
10.1515 - Rams 2024 0006
No ratings yet
10.1515 - Rams 2024 0006
18 pages
Unit 3: Classification & Regression: Question Bank and Its Solution
No ratings yet
Unit 3: Classification & Regression: Question Bank and Its Solution
180 pages
MLT 1 - 7 Kanish
No ratings yet
MLT 1 - 7 Kanish
24 pages
Random Forest Classifier
No ratings yet
Random Forest Classifier
18 pages
AI Tools
No ratings yet
AI Tools
16 pages
Bagging MARS for Diabetes Study
No ratings yet
Bagging MARS for Diabetes Study
8 pages
Improving Regressors Using Boosting Techniques: Observations, XX
No ratings yet
Improving Regressors Using Boosting Techniques: Observations, XX
9 pages
Machine Learning and AI in Marketing - Connecting Computing Power To Human Insights
No ratings yet
Machine Learning and AI in Marketing - Connecting Computing Power To Human Insights
24 pages
ML Predicts PET FRP Concrete Behavior
No ratings yet
ML Predicts PET FRP Concrete Behavior
21 pages
An Ensemble Technique To Predict Parkinson's Disease Using Machine Learning Algorithms
No ratings yet
An Ensemble Technique To Predict Parkinson's Disease Using Machine Learning Algorithms
17 pages
Class 7 Random Forest Algorithm
No ratings yet
Class 7 Random Forest Algorithm
13 pages
ML for RC Beam Shear Capacity
No ratings yet
ML for RC Beam Shear Capacity
108 pages
Syllabus AIML BCA
No ratings yet
Syllabus AIML BCA
19 pages
UCS - 401 - Unit-LV - Trends in Machine Learning - Model and Symbols - Bagging and Boosting, Multitask
No ratings yet
UCS - 401 - Unit-LV - Trends in Machine Learning - Model and Symbols - Bagging and Boosting, Multitask
44 pages
(Mirza 2022) Inflation Prediction in Emerging Economies ML and FX Reserve Integration For Enhanced Forecasting
No ratings yet
(Mirza 2022) Inflation Prediction in Emerging Economies ML and FX Reserve Integration For Enhanced Forecasting
11 pages
Unit 2-1
No ratings yet
Unit 2-1
30 pages
Email Spam Detection (Research Paper)
No ratings yet
Email Spam Detection (Research Paper)
8 pages
Algorithms & Machine Learning Intro
No ratings yet
Algorithms & Machine Learning Intro
76 pages
Bagging Vs Boosting in Machine Learning
100% (1)
Bagging Vs Boosting in Machine Learning
4 pages
6th Sem Project PDF
No ratings yet
6th Sem Project PDF
18 pages

05 MEDIN52024744 Online

Uploaded by

05 MEDIN52024744 Online

Uploaded by

Received: 6 November 2024 | Revised: 11 February 2025 | Accepted: 3 March 2025 | Published online: 20 March 2025

An Ensemble Approach for Artificial Neural

1. Introduction

… individuals aged 45–64 in 2019. Prevalence rates from a 2016 …

Figure 1. Working outline of the research

Table 1. Analysis of different features of the dataset

Bagging Classifier on the input features. X. N represents the number

2.5.4. GBDT with ANN (GANN)

Table 2. Model performances with and without feature reduction

No Optimization Accuracy 76.92 76.32 75.73 76.41 76.32 78.03

6 98.37 98.28 98.29 98.31

95% Confidence ±0.0158 ±0.0236 ±0.0142 ±0.0274
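The ± entries in the "95% Confidence" row summarize fold-to-fold variability, but the excerpt does not say how they were computed. As a hedged sketch, assuming a two-sided Student-t interval over k cross-validation folds, the half-width is t(0.975, k−1) · s / √k, where s is the sample standard deviation of the per-fold scores. The function names and the fold scores below are illustrative placeholders, not the study's code or data.

```python
import math
import statistics

# Two-sided 95% Student-t critical values t_{0.975, df} for small df.
# Hard-coded to keep the sketch dependency-free; scipy.stats.t.ppf(0.975, df)
# gives the same values when rounded to three decimals.
T_CRIT_95 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
}


def ci95_half_width(fold_scores):
    """Half-width of a t-based 95% CI for the mean of per-fold scores."""
    k = len(fold_scores)
    df = k - 1
    if df not in T_CRIT_95:
        raise ValueError("sketch supports between 2 and 11 folds")
    s = statistics.stdev(fold_scores)  # sample standard deviation (k - 1 denominator)
    return T_CRIT_95[df] * s / math.sqrt(k)


def summarize(fold_scores):
    """Return (mean, half-width) so a metric can be reported as mean ± half-width."""
    return statistics.mean(fold_scores), ci95_half_width(fold_scores)


if __name__ == "__main__":
    # Hypothetical accuracies from 10 folds (placeholders, not the paper's results).
    scores = [0.981, 0.984, 0.979, 0.986, 0.983, 0.984, 0.980, 0.985, 0.982, 0.983]
    mean, hw = summarize(scores)
    print(f"accuracy = {mean:.4f} ± {hw:.4f}")
```

With only ten folds, the t critical value (2.262 for 9 degrees of freedom) is noticeably larger than the normal-approximation 1.96. For this reason, the choice of interval matters when k is small.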

… consistently ranks as the most influential feature, followed by TP and DB, highlighting their critical role in classification. Mid-ranked …

coefficients, shown in parentheses, indicate each feature’s weight

the prominence of DB in Fold 7 and TB in Fold 10, showcasing

‘DB,’ ‘ALAT,’ ‘Age,’ ‘TB,’ ‘AGR’}.

The MVE method demonstrates exceptional performance, achieving

98.28%, 98.28%, and 98.31%, respectively. Additionally, LDA emerges

approach leverages the strengths of individual models with LDA to

The metrics provided in Table 4 demonstrate the consistency of

variability and highlight the reliability and stability of the model’s

Figure 2 compares the performance metrics of the top-performing

accuracy (98.38%) and F1 score (81.8%), highlighting its

signifying perfect classification and excellent discriminative ability.
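The phrase "perfect classification and excellent discriminative ability" appears to describe an ROC AUC at or near 1.0. The following is a hedged, dependency-free sketch, not the authors' implementation. It computes AUC as the Mann–Whitney probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The 0/1 label encoding (1 = liver disease) is an assumption.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison (Mann-Whitney U / (n_pos * n_neg)).

    labels: 0/1 ground-truth classes (1 = liver disease, assumed encoding).
    scores: predicted probabilities or decision scores for class 1.
    """
    labels = list(labels)
    scores = list(scores)
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined without both classes")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels/scores: every positive outranks every negative,
    # which is exactly what an AUC of 1.0 means.
    y = [1, 1, 1, 0, 0, 0]
    p = [0.95, 0.80, 0.70, 0.40, 0.20, 0.10]
    print(roc_auc(y, p))  # → 1.0
```

The O(n_pos × n_neg) loop is adequate for a few hundred records. For larger datasets, `sklearn.metrics.roc_auc_score` computes the same quantity more efficiently.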

Figure 4 provides a detailed examination of the features influencing the model’s predictions. Figure 4(A) displays the …

… SHAP waterfall plot illustrates the individual feature impacts. It highlights that ALPH reduces the likelihood of liver disease while …


Table 6. Comparison of our study with previous studies
