
mathematics

Article

Hybrid Random Feature Selection and Recurrent Neural Network for Diabetes Prediction
Oyebayo Ridwan Olaniran 1, *,† , Aliu Omotayo Sikiru 1,† , Jeza Allohibi 2 , Abdulmajeed Atiah Alharbi 2
and Nada MohammedSaeed Alharbi 2

1 Department of Statistics, Faculty of Physical Sciences, University of Ilorin, Ilorin 1515, Nigeria;
aliu.tayo30@gmail.com
2 Department of Mathematics, Faculty of Science, Taibah University, Al-Madinah Al-Munawara 42353,
Saudi Arabia; jlohibi@taibahu.edu.sa (J.A.); aahharbi@taibahu.edu.sa (A.A.A.);
nmharbi@taibahu.edu.sa (N.M.A.)
* Correspondence: olaniran.or@unilorin.edu.ng
† These authors contributed equally to this work.

Abstract: This paper proposes a novel two-stage ensemble framework that combines Long
Short-Term Memory (LSTM) and Bidirectional LSTM (BiLSTM) networks with randomized feature
selection to improve the accuracy and calibration of diabetes prediction. The method first
trains multiple LSTM/BiLSTM base models on dynamically sampled feature subsets to promote
diversity; a meta-learner then integrates their predictions into a final, robust output.
A systematic simulation study shows that the feature selection proportion critically
affects generalization: mid-range values (0.5–0.8 for LSTM; 0.6–0.8 for BiLSTM) optimize
performance, while values close to 1 induce overfitting. Furthermore, evaluation on three
benchmark datasets (Pima Indian Diabetes, Diabetic Retinopathy Debrecen, and Early Stage
Diabetes Risk Prediction) shows that the framework achieves state-of-the-art results,
surpassing conventional models (random forest, support vector machine) and recent hybrid
frameworks with an accuracy of up to 100%, AUC of 99.1–100%, and superior calibration
(Brier score: 0.006–0.023). Notably, the BiLSTM variant consistently outperforms the
unidirectional LSTM in the proposed framework, particularly in sensitivity (98.4% vs. 97.0%
on retinopathy data), highlighting its strength in capturing temporal dependencies.

Keywords: recurrent neural network; long short-term memory; diabetes prediction; ensemble learning

MSC: 62F15; 62G20; 62G08

Received: 1 February 2025
Revised: 9 February 2025
Accepted: 12 February 2025
Published: 14 February 2025

Citation: Olaniran, O.R.; Sikiru, A.O.; Allohibi, J.; Alharbi, A.A.; Alharbi, N.M. Hybrid Random
Feature Selection and Recurrent Neural Network for Diabetes Prediction. Mathematics 2025, 13, 628.
https://doi.org/10.3390/math13040628

Copyright: © 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution (CC BY)
license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

The widespread adoption of machine learning (ML) in prediction tasks has yet to fully
address the growing global burden of chronic diseases such as diabetes, which is projected
to affect 700 million people by 2045 [1]. Diabetes, categorized into type 1 (T1D) and type
2 (T2D), poses significant health risks to children and adults, respectively [1,2]. Early
detection is critical for timely intervention, yet traditional ML methods, including logistic
regression (LR), random forest (RF), and linear discriminant analysis (LDA), struggle
with temporally structured medical data. For instance, fluctuating biomarkers like blood
glucose and insulin levels require models capable of capturing longitudinal dependencies,
a challenge for conventional techniques that often overlook temporal relationships [2,3].
Recent advances in deep learning (DL), particularly Long Short-Term Memory (LSTM)
and Bidirectional LSTM (Bi-LSTM), offer promising solutions for sequence prediction
tasks. Unlike traditional recurrent neural networks (RNNs), which suffer from vanishing
gradients during backpropagation, LSTM architectures mitigate this issue through gated
memory cells [1,4–8]. Bi-LSTM extends this capability by processing data in both forward
and backward directions, enhancing sensitivity to temporal patterns [4]. Despite these
advantages, diabetes research has predominantly relied on traditional ML models, which
focus on static datasets and yield limited generalizability [9–13].
A paradigm shift is emerging in medical AI, with hybrid DL approaches demonstrating
superior performance. For example, Sun et al. [2] employed Bi-LSTM to predict blood
glucose levels, outperforming autoregressive (ARIMA) and support vector regression
(SVR) models in reducing root mean square error (RMSE). Similarly, Kusuma et al. [14]
combined convolutional and LSTM networks (CNN-LSTM) to achieve 99.5% accuracy in
heart failure prediction using ECG signals, highlighting the potential of hybrid architectures
for clinical applications. Cheng et al. [4,15] further advanced this trend with a
knowledge-extended CNN (KE-CNN) for diabetes prediction, achieving 95.8% accuracy through entity
recognition and feature selection.
Extreme Learning Machines (ELMs) have also gained traction, offering rapid training
speeds and low mean squared error (MSE). Pangaribuan et al. [16] demonstrated ELM’s
efficacy in diabetes diagnosis (MSE: 0.4036), while Elsayed et al. [17,18] achieved 98.1%
accuracy in early-stage risk prediction. However, such studies often rely on small,
non-representative datasets, limiting generalizability. Hybrid models like CNN-LSTM-SVM [19,20]
and weather-predictive LSTM [21] further illustrate the versatility of sequential learning but
face challenges in scalability and gradient management [3,22].
Apart from single-stage methods that rely solely on classification procedures, several
hybrid methods have emerged that combine feature selection with classification techniques.
For instance, ref. [23] integrated particle swarm optimization (PSO) for feature optimization
with the Fuzzy Clustering Model (FCM). Similarly, ref. [24] employed principal component
analysis (PCA) in conjunction with K-means clustering [25] for feature selection,
subsequently using the selected features to inform logistic regression predictions. In a related
approach, ref. [26] harnessed variational autoencoders (VAEs) for sample data augmentation
and sparse autoencoders (SAEs) for feature augmentation, feeding the results into a
convolutional neural network (CNN) for prediction. Additionally, ref. [27] selected key
features (KFs) before integrating them into an ensemble framework for predictions, and
ref. [7] adopted the Boruta feature selection algorithm, combining it with ensemble
learning to predict diabetes.
While effective, these methods, with the exception of the Boruta approach by [7],
primarily belong to the broad category of filter methods, which aim to identify key features
prior to classification. These filter methods often rely on a greedy learning strategy that
focuses on the most relevant features. While this approach can be effective in smooth feature
spaces devoid of interactions between relevant and non-relevant features, it may falter
in more complex scenarios, leading to difficulties in accurately identifying key features
and, consequently, increasing false positives and diminishing prediction accuracy. In
contrast, the Boruta feature selection algorithm, overlaid on a random forest procedure,
exemplifies a wrapper technique. This method considers a more comprehensive feature
space by randomly sampling features, thereby creating a sampling distribution that fosters
diversity among base learners. This framework inspired the random feature technique
utilized in the first stage of our proposed hybrid LSTM and BiLSTM models. To address
the limitations inherent in existing feature selection methods, we combined the predictions
from base models trained on random features using a stacking approach, enhancing overall
prediction accuracy.

Despite these advancements, critical gaps remain. Many existing approaches, including
Random Weighted LSTM (RWL) [28], have been validated on limited datasets such as
the Pima Indian cohort, which constrains their clinical applicability. Additionally, issues
such as vanishing gradients and computational inefficiencies impede real-time deployment
in clinical settings. To overcome these challenges, we propose the Random Feature LSTM
and BiLSTM (RFLSTM and RFBiLSTM) frameworks. These frameworks integrate dynamic
feature selection and model stacking, optimizing feature diversity while leveraging
temporal processing to enhance computational efficiency and generalizability. As a result,
RFLSTM and RFBiLSTM present a robust solution for diabetes prediction and broader
healthcare applications.

2. Random Feature Recurrent Neural Networks


The proposed method, termed the Random Feature Neural Network, is currently
implemented only for LSTM and BiLSTM models; therefore, we focus our evaluation on
these two RNN architectures. However, the proposed procedure is flexible and can be
readily extended to other types of RNN models, including Gated Recurrent Units (GRUs),
without loss of generality. Specifically, we define the dataset as
$\mathcal{D} = \{(X_i, y_i)\}_{i=1}^{n}$, where each $X_i \in \mathbb{R}^p$ represents a
feature vector with $p$ features and $y_i \in \{0, 1\}$ indicates the presence or absence
of diabetes.

2.1. Random Feature LSTM


The LSTM ensemble is trained in two stages (Algorithm 1): random feature selection
and LSTM training (Stage 1), followed by a second-stage LSTM model for stacking
predictions (Stage 2).

Algorithm 1 Random Feature LSTM

procedure RFLSTM($X_{train}$, $y_{train}$, $X_{test}$, $y_{test}$, $n_{models}$, $n_{features}$)
    Initialize matrices for predictions on train and test sets
    for each model $i$ in $1, \ldots, n_{models}$ do
        Randomly select $n_{features}$ features from $X_{train}$
        Train LSTM model $M_i$ on the selected features
        Store predictions $\hat{y}_{train}^{(i)}$ and $\hat{y}_{test}^{(i)}$
    end for
    Construct meta-features $X_{meta}$ by concatenating predictions and original features
    Train stacked LSTM model $M_{stack}$ on $X_{meta}$
    Output final predictions $\hat{y}_{final}$
end procedure

2.1.1. Stage 1: Random Feature Selection and LSTM Training


In this stage, $n_{models}$ LSTM models are trained, each using a randomly selected subset
of $n_{features} = \lfloor \eta \times p \rfloor$ features, where $0 < \eta < 1$ is the desired
proportion of the feature set to be selected. A value of $\eta \geq 0.5$ is recommended.
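The subset-sampling step can be sketched in a few lines of Python. This is a minimal illustration only: the helper names `sample_feature_subsets` and `select_columns`, the list-of-lists data layout, and the fixed seed are assumptions made for the example and are not part of the original R implementation. The sketch accepts η = 1 as well, since the simulation study also evaluates that boundary case.

```python
import math
import random

def sample_feature_subsets(p, eta, n_models, seed=0):
    """Draw one random feature-index subset per base model.

    Each subset holds floor(eta * p) distinct column indices,
    sampled uniformly without replacement.
    """
    if not 0 < eta <= 1:
        raise ValueError("eta must lie in (0, 1]")
    n_features = math.floor(eta * p)
    if n_features < 1:
        raise ValueError("eta * p must be at least 1")
    rng = random.Random(seed)
    return [sorted(rng.sample(range(p), n_features)) for _ in range(n_models)]

def select_columns(rows, columns):
    """Restrict every row of a list-of-lists matrix to the given columns."""
    return [[row[j] for j in columns] for row in rows]

# Toy example: p = 8 features and eta = 0.6 give floor(4.8) = 4 features per model.
subsets = sample_feature_subsets(p=8, eta=0.6, n_models=5)
X_train = [[float(10 * r + c) for c in range(8)] for r in range(3)]
X_first = select_columns(X_train, subsets[0])
print(subsets)
```

Each base model would then be trained only on its own column subset, which is what makes the base learners differ from one another.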
For each model Mi , the training procedure is as follows:

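$$X_{train}^{(i)} \subseteq X_{train} \quad \text{where} \quad \dim(X_{train}^{(i)}) = n_{features}, \tag{1}$$

$$h_t = \mathrm{LSTM}(X_t, h_{t-1}; \theta), \tag{2}$$

$$\hat{y}_t = \sigma(W \cdot h_t + b), \qquad \sigma(x) = \frac{1}{1 + e^{-x}}. \tag{3}$$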

The predictions on training and test sets are stored as

$$\hat{y}_{train}^{(i)} = M_i(X_{train}^{(i)}), \qquad \hat{y}_{test}^{(i)} = M_i(X_{test}^{(i)}).$$
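To make Equations (2) and (3) concrete, the following sketch implements one step of a standard LSTM cell with a single hidden unit and applies the sigmoid read-out to the last hidden state. It uses the textbook gate equations rather than the Keras layer used in the paper; the weights, the toy sequence, and the names `lstm_step` and `predict_proba` are hypothetical. How tabular rows are reshaped into sequences is not specified in this section, so the 3-step input is purely illustrative.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function, as in Equation (3)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def lstm_step(x_t, h_prev, c_prev, params):
    """One step of a standard LSTM cell with a single hidden unit.

    params maps each gate name to (input weights, recurrent weight, bias).
    """
    def gate(name, squash):
        w, u, b = params[name]
        pre = sum(wj * xj for wj, xj in zip(w, x_t)) + u * h_prev + b
        return squash(pre)

    i = gate("input", sigmoid)        # how much new information to write
    f = gate("forget", sigmoid)       # how much of the old cell state to keep
    o = gate("output", sigmoid)       # how much of the cell state to expose
    g = gate("candidate", math.tanh)  # proposed update to the cell state
    c_t = f * c_prev + i * g
    h_t = o * math.tanh(c_t)
    return h_t, c_t

def predict_proba(sequence, params, w_out, b_out):
    """Run the cell over a sequence, then apply sigmoid(W * h_T + b)."""
    h, c = 0.0, 0.0
    for x_t in sequence:
        h, c = lstm_step(x_t, h, c, params)
    return sigmoid(w_out * h + b_out)

# Hypothetical weights and a toy 3-step sequence with 2 features per step.
params = {
    "input": ([0.5, -0.3], 0.1, 0.0),
    "forget": ([0.2, 0.4], -0.2, 1.0),
    "output": ([-0.1, 0.6], 0.3, 0.0),
    "candidate": ([0.7, 0.1], 0.5, 0.0),
}
sequence = [[0.2, 1.0], [0.4, 0.8], [0.1, 0.5]]
p_hat = predict_proba(sequence, params, w_out=1.5, b_out=-0.2)
print(round(p_hat, 4))
```

The additive cell-state update `c_t = f * c_prev + i * g` is the gated memory mechanism that mitigates vanishing gradients in plain RNNs.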

2.1.2. Stage 2: Stacking LSTM Model


The second-stage LSTM model is trained using the meta-feature set:

$$X_{meta} = \left[ X_{train}, \hat{Y}_{train} \right],$$

where $\hat{Y}_{train}$ contains the predictions from the Stage 1 models. The final prediction is
obtained by training another LSTM on $X_{meta}$:

$$\hat{y}_{final} = M_{stack}(X_{meta}).$$
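Constructing the meta-feature set amounts to a column-wise concatenation of the original features with one prediction column per base model. The sketch below illustrates this with plain Python lists; the function name `build_meta_features` and the toy numbers are assumptions made for the example.

```python
def build_meta_features(X, base_predictions):
    """Append each base model's prediction as an extra column of X.

    X: list of n rows, each with p feature values.
    base_predictions: list of n_models lists, each holding n predicted
    probabilities (one per row of X).
    Returns n rows with p + n_models columns.
    """
    n = len(X)
    for preds in base_predictions:
        if len(preds) != n:
            raise ValueError("each base model must predict every row of X")
    return [list(row) + [preds[r] for preds in base_predictions]
            for r, row in enumerate(X)]

# Toy example: two rows, three original features, two base models.
X_train = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
y_hat = [[0.9, 0.2],   # predictions of base model 1 for rows 0 and 1
         [0.8, 0.1]]   # predictions of base model 2 for rows 0 and 1
X_meta = build_meta_features(X_train, y_hat)
print(X_meta)
```

Because the stacking model sees both the raw features and the base predictions, it should be regularized so that it does not simply memorize the base models' training-set outputs.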

Theorem 1 (lower misclassification error of Random Feature LSTM). Let $\mathcal{D}$ be a data
distribution over feature vectors $X \in \mathbb{R}^p$ and labels $y \in \{0, 1\}$. Let
$f_{\text{LSTM}}$ denote a standard LSTM classifier trained on all $p$ features and
$f_{\text{RF-LSTM}}$ denote the Random Feature LSTM ensemble with $n_{models}$ base LSTMs trained
on subsets of $n_{features} = \lfloor \eta p \rfloor$ features ($\eta \geq 0.5$), followed by a
stacking LSTM. Assume the following:
1. Diversity: The base LSTMs' prediction errors are not perfectly correlated due to random
feature selection.
2. Optimality: The stacking LSTM can approximate the optimal combination of base predictions
and original features.
Then, the misclassification error rate of $f_{\text{RF-LSTM}}$ is bounded above by that of
$f_{\text{LSTM}}$:

$$\epsilon_{\text{RF-LSTM}} \leq \epsilon_{\text{LSTM}}.$$

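Proof. Let the misclassification error $\epsilon(f)$ be defined as

$$\epsilon(f) = \mathbb{E}_{(X,y)\sim\mathcal{D}}\left[\mathbb{I}(f(X) \neq y)\right], \tag{4}$$

and then the bias–variance decomposition under the squared loss can be obtained using

$$\epsilon(f) = \underbrace{\mathbb{E}_X\left[(\mathbb{E}[f(X)] - \mathbb{E}[y|X])^2\right]}_{\text{Bias}^2(f)} + \underbrace{\mathbb{E}_X\left[\mathrm{Var}(f(X))\right]}_{\text{Var}(f)} + \underbrace{\mathbb{E}_X\left[\mathrm{Var}(y|X)\right]}_{\sigma^2}. \tag{5}$$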

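For classifier $f_{\text{LSTM}}$:

$$\text{Bias}^2_{\text{LSTM}} = \mathbb{E}_X\left[(\mathbb{E}[f_{\text{LSTM}}(X)] - \mathbb{E}[y|X])^2\right],$$

$$\text{Var}_{\text{LSTM}} = \mathbb{E}_X\left[\mathbb{E}[f_{\text{LSTM}}(X)^2] - (\mathbb{E}[f_{\text{LSTM}}(X)])^2\right].$$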

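For $f_{\text{RF-LSTM}}$, let $\hat{Y}^{(i)} = M_i(X^{(i)})$ be the base model predictions:

$$\text{Bias}^2_{\text{RF-LSTM}} = \mathbb{E}_X\left[(\mathbb{E}[M_{stack}(X_{meta})] - \mathbb{E}[y|X])^2\right],$$

$$\text{Var}_{\text{RF-LSTM}} = \mathbb{E}_X\left[\mathbb{E}[M_{stack}(X_{meta})^2] - (\mathbb{E}[M_{stack}(X_{meta})])^2\right].$$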

We now proceed with the main results, where we assume that the Random Feature
ensemble will reduce the variance of prediction. That is, for base models
$M_1, \ldots, M_{n_{models}}$ with predictions $\hat{Y}^{(i)}$,

$$\mathrm{Var}\left(\frac{1}{n_{models}} \sum_{i=1}^{n_{models}} \hat{Y}^{(i)}\right) = \frac{1}{n_{models}^2}\left(\sum_{i=1}^{n_{models}} \mathrm{Var}(\hat{Y}^{(i)}) + 2\sum_{i<j} \mathrm{Cov}(\hat{Y}^{(i)}, \hat{Y}^{(j)})\right).$$

Under the diversity assumption ($\mathrm{Cov}(\hat{Y}^{(i)}, \hat{Y}^{(j)}) \approx 0$),

$$\text{Var}_{\text{base}} \approx \frac{1}{n_{models}} \mathrm{Var}(M_i).$$

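Now we proceed with the second stage of the algorithm, where we propose a stacked
prediction. The stacking model receives enhanced input:

$$X_{meta} = [X, \hat{Y}^{(1)}, \ldots, \hat{Y}^{(n_{models})}].$$

With information preservation ($\eta \geq 0.5$),

$$\text{Bias}^2_{\text{RF-LSTM}} \leq \text{Bias}^2_{\text{LSTM}} + \delta_{stack},$$

where $\delta_{stack} \leq 0$ due to the stacking LSTM's capacity to reduce residual bias through
meta-features.
Finally, combining the results, we have

$$\epsilon_{\text{RF-LSTM}} = \underbrace{\text{Bias}^2_{\text{LSTM}} + \delta_{stack}}_{\text{Bias}^2_{\text{RF-LSTM}}} + \underbrace{\frac{\text{Var}_{\text{LSTM}}}{n_{models}}}_{\text{Var}_{\text{RF-LSTM}}} + \sigma^2.$$

Since $\delta_{stack} \leq 0$ and $n_{models} \geq 1$,

$$\epsilon_{\text{RF-LSTM}} \leq \text{Bias}^2_{\text{LSTM}} + \text{Var}_{\text{LSTM}} + \sigma^2 = \epsilon_{\text{LSTM}}.$$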
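The variance step of the proof can be checked numerically. The sketch below evaluates the variance of an average of $n_{models}$ predictions directly from a covariance matrix, using the same sum-of-variances-plus-covariances identity as above. The equicorrelated covariance structure, the per-model variance of 0.04, and the helper names are assumptions chosen for illustration; they are not estimates from the paper.

```python
def variance_of_mean(cov):
    """Variance of the average of n predictions with n x n covariance matrix cov."""
    n = len(cov)
    var_sum = sum(cov[i][i] for i in range(n))
    cov_sum = sum(cov[i][j] for i in range(n) for j in range(i + 1, n))
    return (var_sum + 2.0 * cov_sum) / n ** 2

def equicorrelated_cov(n, var, rho):
    """Covariance matrix with common variance var and pairwise correlation rho."""
    return [[var if i == j else rho * var for j in range(n)] for i in range(n)]

n_models, base_var = 10, 0.04

# Diversity assumption (zero covariance): the variance shrinks to base_var / n_models.
var_uncorrelated = variance_of_mean(equicorrelated_cov(n_models, base_var, 0.0))

# Correlated base models: the covariance terms limit the reduction.
var_correlated = variance_of_mean(equicorrelated_cov(n_models, base_var, 0.5))

print(var_uncorrelated, var_correlated)
```

With zero covariance the ensemble variance is the single-model variance divided by $n_{models}$; with a pairwise correlation of 0.5 it stays well above that value, which is why the diversity assumption carries most of the weight in the argument.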

Remark 1.
1. Strict inequality $\epsilon_{\text{RF-LSTM}} < \epsilon_{\text{LSTM}}$ holds with non-zero
diversity: RF-LSTM outperforms a single LSTM as long as the individual LSTM models exhibit
sufficient diversity, which enables the ensemble to combine complementary information and
reduce variance.
2. This requires regularization for $M_{stack}$ to prevent overfitting on $X_{meta}$: The
meta-classifier in the RF-LSTM framework must be regularized to avoid overfitting to the outputs
of the LSTM models, ensuring that the ensemble generalizes well to unseen data.

Corollary 1 (higher accuracy of Random Feature LSTM). Let the accuracy $A(f)$ of a classifier
$f$ be defined as

$$A(f) = 1 - \epsilon(f),$$

where $\epsilon(f)$ is the misclassification error. Under the same conditions as Theorem 1, the
accuracy of the Random Feature LSTM ($A_{\text{RF-LSTM}}$) satisfies the following inequality:

$$A_{\text{RF-LSTM}} \geq A_{\text{LSTM}}.$$

Proof. From Theorem 1, the misclassification error rate of $f_{\text{RF-LSTM}}$ is bounded above
by that of $f_{\text{LSTM}}$, i.e.,

$$\epsilon_{\text{RF-LSTM}} \leq \epsilon_{\text{LSTM}}.$$

Rewriting the relationship in terms of accuracy,

$$A_{\text{RF-LSTM}} = 1 - \epsilon_{\text{RF-LSTM}}, \qquad A_{\text{LSTM}} = 1 - \epsilon_{\text{LSTM}}.$$

Substituting the inequality for misclassification error,

$$1 - \epsilon_{\text{RF-LSTM}} \geq 1 - \epsilon_{\text{LSTM}}.$$

Simplifying,

$$A_{\text{RF-LSTM}} \geq A_{\text{LSTM}}.$$

This establishes that the Random Feature LSTM achieves accuracy at least as high as
the standard LSTM, with the equality holding when the diversity or stacking optimization
conditions are not met.

Remark 2.
1. The strict inequality $A_{\text{RF-LSTM}} > A_{\text{LSTM}}$ holds when the individual LSTM
models exhibit sufficient diversity and the stacking LSTM effectively combines their predictions
to reduce variance and bias.
2. Regularization of the stacking model ($M_{stack}$) ensures that overfitting to $X_{meta}$ does
not degrade the ensemble's generalization, preserving the accuracy advantage of RF-LSTM.

2.2. Random Feature BiLSTM


The procedure for the BiLSTM ensemble mirrors that of the LSTM ensemble, with the
critical difference being that a Bidirectional LSTM is used in both stages (Algorithm 2).

Algorithm 2 Random Feature BiLSTM

procedure RFBiLSTM($X_{train}$, $y_{train}$, $X_{test}$, $y_{test}$, $n_{models}$, $n_{features}$)
    Initialize matrices for predictions on train and test sets
    for each model $i$ in $1, \ldots, n_{models}$ do
        Randomly select $n_{features}$ features from $X_{train}$
        Train BiLSTM model $M_i^{Bi}$ on the selected features
        Store predictions $\hat{y}_{train}^{(i)}$ and $\hat{y}_{test}^{(i)}$
    end for
    Construct meta-features $X_{meta}^{Bi}$ by concatenating predictions and original features
    Train stacked BiLSTM model $M_{stack}^{Bi}$ on $X_{meta}^{Bi}$
    Output final predictions $\hat{y}_{final}^{Bi}$
end procedure

2.2.1. Stage 1: Random Feature Selection and BiLSTM Training


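For each model $M_i^{Bi}$, the forward and backward passes are computed as

$$\overrightarrow{h_t} = \mathrm{LSTM}(X_t, \overrightarrow{h_{t-1}}; \theta), \tag{6}$$

$$\overleftarrow{h_t} = \mathrm{LSTM}(X_t, \overleftarrow{h_{t+1}}; \theta), \tag{7}$$

$$h_t = \begin{bmatrix} \overrightarrow{h_t} \\ \overleftarrow{h_t} \end{bmatrix}. \tag{8}$$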

The predictions are stored similarly:

$$\hat{y}_{train}^{(i)} = M_i^{Bi}(X_{train}^{(i)}), \qquad \hat{y}_{test}^{(i)} = M_i^{Bi}(X_{test}^{(i)}).$$
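The bidirectional pass in Equations (6)–(8) can be illustrated with a small sketch that runs a recurrent cell left to right, runs it again right to left, and concatenates the two states at each time step. To keep the example short, a plain tanh recurrent cell with scalar state stands in for the LSTM cell; that substitution, the weights, and the function names are assumptions made for illustration only.

```python
import math

def rnn_step(x_t, h_prev, w=0.8, u=0.5):
    """Stand-in recurrent cell (plain tanh RNN, not an LSTM) with scalar state."""
    return math.tanh(w * x_t + u * h_prev)

def bidirectional_states(xs, step=rnn_step):
    """Return [forward state, backward state] for every time step."""
    T = len(xs)
    fwd, h = [], 0.0
    for x_t in xs:                    # left to right: h_t depends on h_{t-1}
        h = step(x_t, h)
        fwd.append(h)
    bwd, h = [0.0] * T, 0.0
    for t in range(T - 1, -1, -1):    # right to left: h_t depends on h_{t+1}
        h = step(xs[t], h)
        bwd[t] = h
    return [[f, b] for f, b in zip(fwd, bwd)]

xs = [0.1, 0.7, -0.3, 0.5]
states = bidirectional_states(xs)
for t, h_t in enumerate(states):
    print(t, [round(v, 4) for v in h_t])
```

At every time step the concatenated state summarizes both the inputs seen so far and the inputs still to come, which is the extra context the bidirectional variant adds over the unidirectional one.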

2.2.2. Stage 2: Stacking BiLSTM Model


The stacking model for BiLSTM is trained on the meta-feature set constructed using
BiLSTM predictions, and the final output is

$$\hat{y}_{final}^{Bi} = M_{stack}^{Bi}(X_{meta}^{Bi}).$$

The results for the Random Feature LSTM ensemble (RF-LSTM) extend naturally to
the Random Feature BiLSTM ensemble (RF-BiLSTM). We formally present the theorem,
corollary, and their proofs below.

Theorem 2 (misclassification error bound for RF-BiLSTM). Let $\mathcal{F}_{\text{BiLSTM}}$ denote
the hypothesis class of a single BiLSTM model trained on a dataset $\mathcal{D}$, and let
$\mathcal{F}_{\text{RF-BiLSTM}}$ denote the hypothesis class of an RF-BiLSTM ensemble constructed
by averaging predictions over $M$ independently trained BiLSTM models, each using a random subset
of input features. Under the assumption of independent errors across the individual BiLSTM models
(diversity), the expected classification error $\epsilon_{\text{RF-BiLSTM}}$ of the ensemble
satisfies

$$\epsilon_{\text{RF-BiLSTM}} \leq \epsilon_{\text{BiLSTM}},$$

where $\epsilon_{\text{BiLSTM}}$ is the classification error of a single BiLSTM model. Moreover,

$$\epsilon_{\text{RF-BiLSTM}} \leq \epsilon_{\text{RF-LSTM}} \leq \epsilon_{\text{LSTM}}.$$

Proof. The bidirectional nature of BiLSTM improves the representational capacity by
incorporating both forward and backward temporal dependencies, effectively doubling
the input context for each time step. As a result, $\mathcal{F}_{\text{BiLSTM}}$ is a richer
hypothesis class than $\mathcal{F}_{\text{LSTM}}$, leading to

$$\epsilon_{\text{BiLSTM}} \leq \epsilon_{\text{LSTM}}.$$

When random feature ensembles are applied, the variance reduction achieved by
averaging predictions across $M$ models further decreases the classification error, resulting in

$$\epsilon_{\text{RF-BiLSTM}} \leq \epsilon_{\text{BiLSTM}} \quad \text{and} \quad \epsilon_{\text{RF-LSTM}} \leq \epsilon_{\text{LSTM}}.$$

Combining these inequalities gives

$$\epsilon_{\text{RF-BiLSTM}} \leq \epsilon_{\text{RF-LSTM}} \leq \epsilon_{\text{LSTM}}.$$

Corollary 2 (classification accuracy for RF-BiLSTM). Let $A_{\text{LSTM}}$, $A_{\text{BiLSTM}}$,
$A_{\text{RF-LSTM}}$, and $A_{\text{RF-BiLSTM}}$ represent the classification accuracies of their
respective models. Then, the following inequalities hold:

$$A_{\text{RF-BiLSTM}} \geq A_{\text{BiLSTM}} \geq A_{\text{RF-LSTM}} \geq A_{\text{LSTM}}.$$

Proof. By definition, classification accuracy $A$ is inversely related to classification error
$\epsilon$:

$$A = 1 - \epsilon.$$

From the inequalities established in Theorem 2, we have

$$\epsilon_{\text{RF-BiLSTM}} \leq \epsilon_{\text{BiLSTM}} \leq \epsilon_{\text{RF-LSTM}} \leq \epsilon_{\text{LSTM}}.$$

Substituting these into the accuracy relationship gives

$$A_{\text{RF-BiLSTM}} \geq A_{\text{BiLSTM}} \geq A_{\text{RF-LSTM}} \geq A_{\text{LSTM}}.$$



Remark 3. The results in Theorem 2 and Corollary 2 build upon Theorem 1 and Corollary 1 by
highlighting the hierarchical relationship between LSTM and BiLSTM architectures in the context of
random feature ensembles. Specifically, the bidirectional nature of BiLSTM amplifies the advantages
of ensemble modelling, including reduced variance and increased robustness to overfitting. The
additional backward pass in BiLSTM not only enhances temporal context representation but also
enables a tighter upper bound on classification error, as shown in Theorem 2. This progression
emphasizes the generalizability of the random feature ensemble framework and its applicability to
both unidirectional and bidirectional recurrent neural network architectures.

The algorithms presented here are similar to the random forest (RF) procedure with regard
to model and variable uncertainty. In the first stage, an analogue of RF bootstrapping is
implemented by creating $n_{models}$ base models from random feature combinations, which
simultaneously incorporates variable uncertainty. In the second stage, instead of
aggregating the predicted outcomes as in RF, we implemented a stacking approach, similar
to boosting the base algorithm (LSTM/BiLSTM) with the predictions from the training stage.
It is important to note that the number of models $n_{models}$ and the number of features
$n_{features}$ must be less than $n$ and $p$, respectively, to avoid singularity issues during
training in Stage 1 and prediction in Stage 2.
The proposed framework presented in Figure 1 follows a two-stage architecture for
diabetes prediction, incorporating both Random Feature LSTM (RFLSTM) and Random
Feature BiLSTM (RFBiLSTM). In Stage 1, the input dataset with $p$ features undergoes random
feature selection, where each base model is trained on a subset of
$n_{features} = \lfloor \eta \times p \rfloor$ features (e.g., $\eta = 0.6$). The base models
include RFLSTM, which employs a unidirectional LSTM, and RFBiLSTM, which utilizes a
bidirectional LSTM (BiLSTM). The predictions from all base models, $\hat{Y}_{base}$, are stored.
In Stage 2, the stacking model integrates the base model predictions with the original feature
set, forming the meta-feature matrix $X_{meta}$. A final LSTM (for RFLSTM) or BiLSTM (for
RFBiLSTM) is then trained on $X_{meta}$ to generate the final aggregated prediction. Key
parameters include $\eta$, the proportion of selected features, and $n_{models}$, the number of
base models. The flowchart visually distinguishes between RFLSTM and RFBiLSTM while
highlighting their shared two-stage learning approach.

2.3. Relationship Between η and Model Performance


The parameter $\eta \in (0, 1)$ controls the proportion of features selected for each base
model in the Random Feature LSTM/BiLSTM ensemble, with $n_{features} = \lfloor \eta p \rfloor$.
The value of $\eta$ directly influences the bias–variance trade-off, accuracy, and loss of the
model. Here, we provide a detailed theoretical analysis of these effects.
The bias–variance decomposition of the misclassification error $\epsilon(f)$ for a classifier
$f$ is given by

$$\epsilon(f) = \text{Bias}^2(f) + \text{Var}(f) + \sigma^2,$$

where
• $\text{Bias}^2(f)$ is the squared bias, representing the error due to the model's inability to
capture the true underlying relationship.
• $\text{Var}(f)$ is the variance, representing the model's sensitivity to the training data.
• $\sigma^2$ is the irreducible error due to noise in the data.
The bias of the model is primarily affected by the number of features used for training.
When $\eta$ is small, the model is trained on a limited subset of features, which may not capture
the full complexity of the data. This leads to underfitting and high bias. Mathematically,

$$\text{Bias}^2(\eta) \propto (1 - \eta)^{\alpha},$$

where $\alpha > 0$ is a dataset-dependent constant. For $\eta \geq 0.5$, the bias decreases as
$\eta$ increases because more features are available to capture the underlying data distribution.

Figure 1. Flowchart showing the diagnostic process of RFLSTM and RFBiLSTM.

The variance of the model is influenced by the diversity of the base models in the
ensemble. When $\eta$ is small, the feature subsets for each base model are highly diverse,
leading to low correlation between the models' errors and reduced ensemble variance.
Mathematically,

$$\text{Var}(\eta) \propto \frac{1}{n_{models} \cdot \eta^{\beta}},$$

where $\beta > 0$ is a dataset-dependent constant. As $\eta$ increases, the feature subsets
overlap more, reducing diversity and increasing variance. The total misclassification error
$\epsilon(\eta)$ can be expressed as

$$\epsilon(\eta) = \underbrace{C_1 (1 - \eta)^{\alpha}}_{\text{Bias}^2} + \underbrace{\frac{C_2}{n_{models}\, \eta^{\beta}}}_{\text{Variance}} + \sigma^2,$$

where $C_1, C_2 > 0$ are constants that depend on the dataset and model architecture.
The optimal value of $\eta$ minimizes $\epsilon(\eta)$. Taking the derivative of $\epsilon(\eta)$
with respect to $\eta$ and setting it to zero,

$$\frac{d\epsilon}{d\eta} = -\alpha C_1 (1 - \eta)^{\alpha - 1} - \frac{\beta C_2}{n_{models}\, \eta^{\beta + 1}} = 0.$$

Solving this equation yields the optimal $\eta^*$, which balances bias and variance.
Empirically, $\eta^*$ is often found to be in the range $0.5 \leq \eta \leq 0.8$.
For η < 0.5, the feature subsets are too small to capture the full complexity of the data,
leading to high bias. While the ensemble variance is low due to high diversity, the overall
error is dominated by bias, resulting in poor accuracy. This occurs specifically as follows:
• Bias increases sharply as η → 0.
• Variance decreases but is insufficient to compensate for the high bias.
• Accuracy degrades significantly due to underfitting.
Similarly, the cross-entropy loss $L(\eta)$ for the ensemble can be decomposed into two
components:

$$L(\eta) = \underbrace{\mathbb{E}\left[-\log p(y \mid X_{meta})\right]}_{\text{Stacking Loss}} + \lambda \cdot \underbrace{\sum_{i=1}^{n_{models}} L(M_i)}_{\text{Base Model Losses}},$$

where $\lambda$ is a regularization parameter. The base model loss decreases as $\eta$ increases
because more features improve the individual models' ability to fit the training data. For
$\eta < 0.5$, the base model loss is high due to underfitting. The stacking loss is minimized at
intermediate values of $\eta$, where the meta-features
$X_{meta} = [X, \hat{Y}^{(1)}, \ldots, \hat{Y}^{(n_{models})}]$ provide the most useful
information. For $\eta < 0.5$, the stacking loss increases because the base models' predictions
are less reliable. The behavior for various $\eta$ values is summarized in Table 1.

Table 1. Phase transitions in η and their effects on accuracy and loss.

| η Range | Accuracy Behavior | Loss Behavior |
|---|---|---|
| η → 0 | High bias, low variance → poor accuracy | High base loss, high stacking loss |
| η < 0.5 | High bias dominates → degraded accuracy | High base loss, moderate stacking loss |
| η = 0.5 | Bias–variance trade-off begins | Balanced base/stacking losses |
| 0.6 ≤ η ≤ 0.8 | Practical optimum → high accuracy | Low base loss, low stacking loss |
| η → 1.0 | Low bias, high variance → overfitting | Low base loss, high stacking loss |

The following critical observations are noted:

• Diminishing returns for η > 0.8:
  – As η → 1, the ensemble effectively becomes a single model, losing the benefits of
    diversity.
  – The accuracy gains plateau, and the risk of overfitting increases.
• Diversity threshold for η ≥ 0.5:
  – This ensures that the feature overlap between base models is bounded:

    $$\text{FeatureOverlap} = \frac{\mathbb{E}\left[|X^{(i)} \cap X^{(j)}|\right]}{p} \leq \eta^2.$$

  – For η = 0.6, the overlap is ≤ 0.36, preserving sufficient diversity.
• Generalization gap:
  – Higher η reduces the gap between training and test loss due to better-aligned
    feature distributions.
  – Lower η increases the generalization gap due to underfitting.
The recommended η = 0.6 empirically balances these effects for sequence modelling
tasks while ensuring computational efficiency. For η < 0.5, the model suffers from severe
underfitting, making it impractical for most applications. The exact optimal η depends on
the dataset’s characteristics and can be determined through experimentation.
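The feature-overlap bound stated above can be checked with a short simulation. For two subsets of size $k = \lfloor \eta p \rfloor$ drawn independently and uniformly without replacement, each feature lands in both subsets with probability $(k/p)^2$, so the expected overlap fraction is $k^2/p^2 \leq \eta^2$. The sketch below compares this closed form with a Monte Carlo estimate; the function names, the number of simulated pairs, and the seed are assumptions made for the example.

```python
import math
import random

def expected_overlap_fraction(p, eta):
    """Exact expected shared-feature count divided by p, for two independent
    uniform subsets of size floor(eta * p)."""
    k = math.floor(eta * p)
    return (k * k / p) / p

def simulated_overlap_fraction(p, eta, n_pairs=20000, seed=1):
    """Monte Carlo estimate of the same quantity."""
    k = math.floor(eta * p)
    rng = random.Random(seed)
    total = 0
    for _ in range(n_pairs):
        a = set(rng.sample(range(p), k))
        b = set(rng.sample(range(p), k))
        total += len(a & b)
    return total / n_pairs / p

# p = 8 and eta = 0.6 give k = 4, so the exact value is 16 / 64 = 0.25.
exact = expected_overlap_fraction(p=8, eta=0.6)
simulated = simulated_overlap_fraction(p=8, eta=0.6)
print(exact, round(simulated, 3), 0.6 ** 2)
```

With $p = 8$ the floor operation makes the realized overlap (0.25) noticeably smaller than the bound $\eta^2 = 0.36$; the two coincide only when $\eta p$ is an integer.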

3. Simulation Study
The synthetic dataset was designed to replicate the Pima Indian Diabetes Dataset
(PIDD) for binary classification tasks, incorporating a simulation model grounded in clinical
plausibility and prior studies [29,30]. Specifically, the synthetic dataset contains $n = 500$
observations with eight predictors and one binary outcome variable. Let
$\mathbf{X} = (X_1, \ldots, X_8)$ denote the predictor matrix and $Y \in \{0, 1\}$ the diabetes
diagnosis outcome.

3.1. Variable Specifications


• X1 : glucose (mg/dL) ∼ LogNormal(µ = 4.8, σ = 0.2), truncated at [70, 200].
• X2 : BMI (kg/m²) ∼ N (µ = 30, σ = 6), truncated at [15, 50].
• X3 : age (years) ∼ N (µ = 33, σ = 12), truncated at [21, 81].
• X4 : blood pressure (mmHg) ∼ N (µ = 70, σ = 12), truncated at [40, 110].
• X5 : insulin (µU/mL) ∼ Gamma(k = 2, θ = 50) with 30% zeros.
• X6 : skin thickness (mm) ∼ Gamma(k = 3, θ = 10), truncated at [10, 60].
• X7 : genetic score ∼ 2.5 × Beta(α = 2, β = 5).
• X8 : pregnancies ∼ Poisson(λ = 2.5), truncated at [0, 15].

3.2. Correlation Structure


The predictor variables were generated using a Gaussian copula with correlation matrix:

$$R = \begin{bmatrix}
1.00 & 0.30 & 0.10 & 0.20 & 0.40 & 0.10 & 0.20 & 0.00 \\
0.30 & 1.00 & 0.20 & 0.30 & 0.10 & 0.30 & 0.10 & 0.00 \\
0.10 & 0.20 & 1.00 & 0.20 & 0.00 & 0.10 & 0.20 & 0.10 \\
0.20 & 0.30 & 0.20 & 1.00 & 0.10 & 0.20 & 0.00 & 0.00 \\
0.40 & 0.10 & 0.00 & 0.10 & 1.00 & 0.10 & 0.00 & 0.00 \\
0.10 & 0.30 & 0.10 & 0.20 & 0.10 & 1.00 & 0.00 & 0.00 \\
0.20 & 0.10 & 0.20 & 0.00 & 0.00 & 0.00 & 1.00 & 0.00 \\
0.00 & 0.00 & 0.10 & 0.00 & 0.00 & 0.00 & 0.00 & 1.00
\end{bmatrix} \tag{9}$$

3.3. Outcome Generation


The binary outcome $Y$ was generated using a logistic regression model:

$$\mathrm{logit}(P(Y = 1 \mid \mathbf{X})) = \beta_0 + \sum_{j=1}^{8} \beta_j X_j + \epsilon, \tag{10}$$

where

$$\beta_0 = -8, \qquad \boldsymbol{\beta} = [0.04, 0.06, 0.02, 0, 0, 0, 0.03, 0], \qquad \epsilon \sim N(0, 1).$$

The intercept β 0 was calibrated to achieve 25% prevalence: P(Y = 1) = 0.25. The focus
of the simulation study is to investigate the diversity and generalizability of the proposed
RFLSTM and RFBiLSTM models for 0.5 ≤ η ≤ 1. The base LSTM and BiLSTM models
were implemented in R version 4.3.3 using the Keras package. The architecture consists of a
sequential model where the primary layer is a Long Short-Term Memory (LSTM) network
with 50 units. The LSTM layer processes input sequences of predefined shape and captures
temporal dependencies. A fully connected dense layer with a sigmoid activation function
follows, outputting a probability score for binary classification. The model was compiled
using the Adam optimizer, binary cross-entropy as the loss function, and accuracy as the
evaluation metric. Training was conducted over 100 epochs to ensure sufficient learning
while preventing overfitting.
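The data-generating process of Sections 3.1–3.3 can be sketched with the Python standard library. This is a simplified illustration under explicit assumptions: the predictors are drawn independently (the Gaussian copula with correlation matrix $R$ is ignored), truncation is done by rejection sampling, "30% zeros" for insulin is read as zero inflation, and the outcome is drawn as a Bernoulli variable after adding the $N(0, 1)$ noise on the logit scale as in Equation (10). The function names and seed are hypothetical, and the empirical prevalence the sketch produces is not guaranteed to match the reported 25%.

```python
import math
import random

BETA_0 = -8.0
BETA = [0.04, 0.06, 0.02, 0.0, 0.0, 0.0, 0.03, 0.0]  # coefficients of X1..X8

def truncated(draw, low, high):
    """Rejection-sample draw() until the value falls inside [low, high]."""
    while True:
        value = draw()
        if low <= value <= high:
            return value

def poisson(rng, lam):
    """Knuth's multiplication method for a Poisson draw."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def sample_predictors(rng):
    """Draw X1..X8 independently (ASSUMPTION: the Gaussian copula is ignored)."""
    return [
        truncated(lambda: rng.lognormvariate(4.8, 0.2), 70, 200),  # X1 glucose
        truncated(lambda: rng.gauss(30, 6), 15, 50),               # X2 BMI
        truncated(lambda: rng.gauss(33, 12), 21, 81),              # X3 age
        truncated(lambda: rng.gauss(70, 12), 40, 110),             # X4 blood pressure
        0.0 if rng.random() < 0.3 else rng.gammavariate(2, 50),    # X5 insulin, 30% zeros
        truncated(lambda: rng.gammavariate(3, 10), 10, 60),        # X6 skin thickness
        2.5 * rng.betavariate(2, 5),                               # X7 genetic score
        truncated(lambda: poisson(rng, 2.5), 0, 15),               # X8 pregnancies
    ]

def sample_outcome(x, rng):
    """Equation (10): noisy logit, then a Bernoulli draw (ASSUMED reading)."""
    logit = BETA_0 + sum(b * xj for b, xj in zip(BETA, x)) + rng.gauss(0, 1)
    prob = 1.0 / (1.0 + math.exp(-logit))
    return 1 if rng.random() < prob else 0

rng = random.Random(2025)
X = [sample_predictors(rng) for _ in range(500)]
Y = [sample_outcome(x, rng) for x in X]
print("empirical prevalence:", sum(Y) / len(Y))
```

Only glucose, BMI, age, and the genetic score carry non-zero coefficients, so the remaining four predictors act as noise features that the random feature selection has to cope with.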

3.4. Simulation Results


The results in Table 2 demonstrate the impact of the feature selection proportion η on
the training and validation performance of the Random Feature LSTM (RFLSTM). For η
values between 0.5 and 0.8, the model achieves a balanced performance, with high and
consistent training and validation accuracies (97.1% and 96.7%, respectively) and relatively
small differences between training and validation losses (e.g., −0.0571 for η = 0.5). This
indicates effective generalization and a well-calibrated bias–variance trade-off. However,
for η = 0.9 and η = 1.0, the model exhibits clear signs of overfitting, as evidenced by
the large discrepancies between training and validation metrics (e.g., 95.7% vs. 76.7%
accuracy and a loss difference of −0.1874 for η = 0.9). These findings are corroborated by
the training and validation loss trajectories in Figure 2, which show a divergence in losses
for η ≥ 0.9, indicating that the model is memorizing the training data rather than learning
generalizable patterns. Thus, η values in the range of 0.5 to 0.8 are optimal, while higher
values lead to overfitting and degraded validation performance.

Figure 2. Training and validation loss trajectories of RFLSTM across varying η (proportion of predictors).

Table 2. Training and validation performance of RFLSTM at different η values.

| η | Train Accuracy | Val Accuracy | Accuracy Diff (%) | Train Loss | Val Loss | Loss Diff | Remark |
|---|---|---|---|---|---|---|---|
| 0.5 | 97.1% | 96.7% | 0.5 | 0.0923 | 0.1495 | −0.0571 | Balanced |
| 0.6 | 97.1% | 96.7% | 0.5 | 0.0665 | 0.1199 | −0.0535 | Balanced |
| 0.7 | 97.1% | 96.7% | 0.5 | 0.0554 | 0.1184 | −0.0630 | Balanced |
| 0.8 | 97.1% | 96.7% | 0.5 | 0.0664 | 0.1216 | −0.0552 | Balanced |
| 0.9 | 95.7% | 76.7% | 19.0 | 0.0660 | 0.2535 | −0.1874 | Overfit |
| 1.0 | 98.6% | 73.3% | 25.2 | 0.0665 | 0.2972 | −0.2307 | Overfit |

The results in Table 3 illustrate the impact of the feature selection proportion η on the
training and validation performance of the Random Feature BiLSTM (RFBiLSTM). For η = 0.6
and η = 0.8, the model achieves a balanced performance, with high training accuracy (97.1%
and 97.1%, respectively) and relatively small differences between training and validation
metrics (e.g., 3.8% accuracy difference and −0.0875 loss difference for η = 0.6). This indicates
effective generalization and a well-calibrated bias–variance trade-off. However, for η = 0.5,
η = 0.7, η = 0.9, and η = 1.0, the model exhibits signs of overfitting, as evidenced by the
large discrepancies between training and validation metrics (e.g., 100.0% vs. 90.0% accuracy
and a loss difference of −0.1841 for η = 0.7). These findings are corroborated by the training
and validation loss trajectories in Figure 3, which show a divergence in losses for η = 0.5, 0.7,
0.9, and 1.0, indicating that the model is memorizing the training data rather than learning
generalizable patterns. Thus, η values of 0.6 and 0.8 are optimal for RFBiLSTM, while other
values lead to overfitting and degraded validation performance.

Table 3. Training and validation performance of RFBiLSTM at different η values.

η    Train Accuracy  Val Accuracy  Accuracy Diff (%)  Train Loss  Val Loss  Loss Diff  Remark
0.5  97.1%           86.7%         10.5               0.0719      0.3357    −0.2638    Overfit
0.6  97.1%           93.3%         3.8                0.0508      0.1383    −0.0875    Balanced
0.7  100.0%          90.0%         10.0               0.0196      0.2037    −0.1841    Overfit
0.8  97.1%           96.7%         0.5                0.0597      0.0996    −0.0398    Balanced
0.9  98.6%           86.7%         11.9               0.0458      0.2905    −0.2447    Overfit
1.0  98.6%           83.3%         15.2               0.0485      0.3641    −0.3156    Overfit

Figure 3. Training and validation loss trajectories of RFBiLSTM across varying η (proportion of predictors).

4. Real-Life Data Applications


4.1. Datasets
4.1.1. Pima Indian Diabetes Dataset (PIDD)
The Pima Indian Diabetes Dataset is a well-known benchmark dataset in medical
machine learning, originally sourced from the National Institute of Diabetes and Digestive
and Kidney Diseases (NIDDK). It consists of n = 768 samples and p = 8 features, which
include the number of pregnancies, plasma glucose concentration, diastolic blood pressure,
triceps skinfold thickness, serum insulin, body mass index (BMI), diabetes pedigree func-
tion, and age. The target variable is binary, indicating the presence (1) or absence (0) of
diabetes. The dataset specifically focuses on adult female patients of Pima Indian heritage,
a population known to have a high genetic predisposition to diabetes. The age distribution
of the participants ranges from 21 to 81 years, with a mean age of approximately 33 years.
Gender-wise, the dataset exclusively includes females, as it was initially collected to study
diabetes prevalence among Pima Indian women living near Phoenix, Arizona, USA. This
dataset has been widely used in machine learning and healthcare research to develop
predictive models for diabetes detection. For instance, studies such as [31] have leveraged
it to benchmark classification algorithms, while [7] utilized it to evaluate feature selection
methods for diabetes prediction. Despite its significance, the dataset presents challenges
such as class imbalance (with approximately 35% diabetic cases and 65% non-diabetic cases)
and missing values in features like serum insulin, requiring robust imputation techniques
to ensure reliable model performance.

4.1.2. Diabetic Retinopathy Debrecen Dataset (DRDD)


The Diabetic Retinopathy Debrecen Dataset is a medical imaging dataset sourced from
the Department of Ophthalmology at the University of Debrecen, Hungary. It comprises
n = 1151 samples, each corresponding to a retinal image, and p = 19 features extracted
using image-processing algorithms. These features include the following:
1. Detected lesions, such as microaneurysms, hemorrhages, and exudates, which are
early indicators of diabetic retinopathy.
2. Anatomical descriptors, such as the optic disc and fovea, which provide spatial context
for lesion localization.
3. Image-level characteristics, including contrast, sharpness, and illumination quality,
which influence the accuracy of automated diagnosis.
The dataset’s target variable is binary, indicating the presence (1) or absence (0) of
diabetic retinopathy.
The dataset primarily consists of adult diabetic patients who underwent retinal imag-
ing as part of routine screening. While the age distribution of the participants is not
explicitly reported, diabetic retinopathy is most prevalent in individuals over 40 years
old, suggesting a predominance of middle-aged and older patients. The dataset includes
both male and female patients, but detailed gender and racial distribution statistics are not
publicly available. However, given that the dataset originates from Hungary, the majority
of participants are likely of European descent. The Diabetic Retinopathy Debrecen Dataset
has been widely used in medical image analysis and machine learning research. For in-
stance, ref. [32] developed ensemble-based classification models for automated diabetic
retinopathy detection, while [33] utilized deep learning techniques, such as convolutional
neural networks (CNNs), to enhance diagnostic accuracy. The dataset presents challenges
such as class imbalance (with a lower proportion of diabetic retinopathy cases compared
to non-diseased samples) and high-dimensional feature space, requiring effective feature
selection and augmentation techniques to improve model performance.

4.1.3. Early Stage Diabetes Risk Prediction Dataset (ESDRPD)


The Early Stage Diabetes Risk Prediction Dataset is a structured dataset designed to
predict the likelihood of diabetes onset based on demographic, clinical, and lifestyle-related
features. It comprises n = 520 samples and p = 17 features collected from individuals
exhibiting early symptoms of diabetes. This dataset originates from a clinical study aimed
at identifying risk factors associated with early-stage diabetes. The data were gathered
through self-reported questionnaires and medical evaluations conducted in healthcare
settings, focusing on individuals suspected of being at risk of diabetes. The dataset includes
three major categories of features, namely, demographic information, including age (in
years) and gender (male/female); clinical symptoms, including polyuria (frequent urina-
tion), polydipsia (excessive thirst), sudden weight loss, weakness, genital thrush, visual
blurring, itching, irritability, delayed healing of wounds, partial paresis (muscle weakness),
and muscle stiffness; and lifestyle and behavioral factors, including obesity and family
history of diabetes. The target variable is binary, indicating whether an individual is at risk
of diabetes (1) or not (0).
The dataset includes a diverse age range, with participants ranging from young adults
(20s) to elderly individuals (70+ years old), reflecting the broad spectrum of diabetes
risk. Gender distribution is balanced, with both male and female participants represented.
However, specific racial or ethnic information is not provided, though studies suggest
that diabetes risk factors can vary significantly across populations. The Early Stage Dia-
betes Risk Prediction Dataset has been widely used in predictive modelling and machine
learning applications. Studies such as [1] have leveraged deep learning approaches to
enhance early diabetes risk detection, while [22] focused on optimizing machine learning
models for deployment in resource-limited healthcare settings. The dataset poses chal-
lenges such as class imbalance, as fewer cases are classified as “at risk”, necessitating
effective data preprocessing techniques like oversampling and feature selection to improve
model performance.

4.2. Data Preprocessing


Each of the datasets underwent a comprehensive preprocessing pipeline to enhance
data quality and improve model performance. This process involved three critical steps:
feature transformation, missing data imputation, and class balancing. One of the key
challenges in diabetes prediction is dealing with imbalanced datasets, where instances
of diabetic patients are often underrepresented compared to non-diabetic cases. Such
class imbalance can bias machine learning models, leading to suboptimal sensitivity in
detecting diabetes cases. To mitigate this issue, an oversampling procedure was applied,
ensuring that the minority class had sufficient representation for robust model training, as
recommended in [8].
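
The paper does not state which oversampling procedure was used. As a minimal sketch, assuming plain random oversampling (duplicating minority-class rows until the classes are equal in size), the step could look as follows; the function name `random_oversample` and the toy data are hypothetical.

```python
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes have equal counts."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement to fill the gap to the majority-class size.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data: 6 non-diabetic (0) and 2 diabetic (1) rows.
X = [[float(i)] for i in range(8)]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = random_oversample(X, y)
print(y_bal.count(0), y_bal.count(1))  # prints: 6 6
```

In a cross-validation setting, oversampling is commonly applied to the training folds only, so that duplicated rows cannot appear in both the training and test data; the paper does not say at which stage it was applied.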
Another significant challenge in diabetes prediction involves handling high-dimensional
data, where an excessive number of features can lead to overfitting and increased computa-
tional complexity. Feature transformation was implemented using standardization, ensuring
that all features were scaled to have a unit standard deviation. This transformation not only
improved model convergence but also preserved the relative importance of each feature, par-
ticularly in deep learning architectures where feature scaling significantly influences gradient
updates. Furthermore, real-world medical datasets frequently contain missing values due
to inconsistent data collection or patient non-compliance in reporting health parameters. To
address this, missing data imputation was performed using Multiple Imputation by Chained
Equations (MICE). MICE estimates missing values by iteratively modelling each
feature as a function of the others, thereby preserving the underlying data distribution. This
method was particularly effective in ensuring that imputations maintained realistic clinical
patterns rather than introducing artificial biases [8].
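
The standardization step described above can be sketched as follows. The paper states only that features were scaled to unit standard deviation; centering each feature at zero mean (the usual z-score) is an assumption here, and the function name and toy values are hypothetical. The MICE step is not shown.

```python
import statistics


def standardize_columns(rows):
    """Scale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.pstdev(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            (value - m) / s if s > 0 else 0.0  # constant columns map to 0
            for value, m, s in zip(row, means, sds)
        ])
    return scaled


# Toy rows with two features on very different scales (made-up values).
data = [[90.0, 25.0], [150.0, 31.0], [110.0, 28.0], [130.0, 40.0]]
scaled = standardize_columns(data)
for row in scaled:
    print([round(v, 3) for v in row])
```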

4.3. Evaluation Metrics


To comprehensively assess the performance of the models, the following evaluation
metrics were used: accuracy, sensitivity (true positive rate), specificity (true negative
rate), AUC (Area Under the Curve), Brier score, ROC (Receiver Operating Characteristic)
curve, and computational time. These metrics provide insights into different aspects of
model performance, including predictive accuracy, discrimination capability, and computa-
tional efficiency.

4.3.1. Accuracy
Accuracy measures the overall correctness of predictions and is defined as the ratio of
correctly predicted instances to the total number of instances:

\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{11}
\]

where TP is the number of true positives, TN is the number of true negatives, FP is the
number of false positives, and FN is the number of false negatives [34].

4.3.2. Sensitivity (True-Positive Rate)


Sensitivity evaluates the model’s ability to correctly identify positive cases and is
calculated as
\[
\text{Sensitivity} = \frac{TP}{TP + FN}. \tag{12}
\]
This metric is particularly important in medical diagnostics, where detecting true
positives is critical [31].

4.3.3. Specificity (True-Negative Rate)


Specificity measures the model’s ability to correctly identify negative cases and is
defined as
\[
\text{Specificity} = \frac{TN}{TN + FP}. \tag{13}
\]
High specificity ensures that the model minimizes false positives, which is crucial in
scenarios where false alarms are costly [32].
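
As a worked illustration of Equations (11)–(13), the sketch below computes the three metrics from binary labels. The function names and the toy labels are hypothetical, and the code assumes that both classes are present so that no denominator is zero.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels coded as 1 (positive) and 0 (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity as defined in Equations (11)-(13)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy example: 4 positives and 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here TP = 3, FN = 1, TN = 4, and FP = 2, so the accuracy is 7/10, the sensitivity is 3/4, and the specificity is 4/6.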

4.3.4. AUC (Area Under the Curve)


The AUC summarizes the ROC curve, providing a single value that represents the
model’s discrimination capability. It is calculated as the integral of the ROC curve, which
plots the true-positive rate (sensitivity) against the false-positive rate (1 − specificity) at
various threshold settings. A higher AUC indicates better model performance [34].
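
The area under the ROC curve equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below uses this pairwise form; the function name `auc_score` and the toy scores are hypothetical, and this is not necessarily the routine used by the authors.

```python
def auc_score(y_true, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = auc_score(y_true, scores)
print(auc)  # prints: 0.75
```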

4.3.5. Brier Score


The Brier score measures the accuracy of probabilistic predictions and is defined as

\[
\text{Brier Score} = \frac{1}{N} \sum_{i=1}^{N} (p_i - y_i)^2, \tag{14}
\]

where p_i is the predicted probability, y_i is the actual outcome (0 or 1), and N is the total
number of predictions. Lower Brier scores indicate better-calibrated predictions [1].
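
Equation (14) translates directly into code. The following minimal sketch uses a hypothetical function name and toy values.

```python
def brier_score(y_true, probs):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(probs) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, y_true)) / len(y_true)


y_true = [1, 0, 1, 0]
probs = [0.9, 0.1, 0.6, 0.4]
score = brier_score(y_true, probs)
print(round(score, 3))  # prints: 0.085
```

The squared errors are 0.01, 0.01, 0.16, and 0.16, giving 0.34/4 = 0.085. A model that always predicts 0.5 scores 0.25, which is a useful reference point when reading the Brier scores reported in the tables.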

4.3.6. ROC Curve


The ROC curve visualizes the trade-off between sensitivity and specificity across
different classification thresholds. It is a key tool for evaluating the performance of binary
classifiers [13].
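
The points of a ROC curve can be obtained by sweeping the classification threshold over the observed scores. The sketch below is a minimal illustration with a hypothetical function name and toy data; it assumes that both classes are present.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs from thresholding at each distinct score, high to low."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / n_neg, tp / n_pos))  # (1 - specificity, sensitivity)
    return points


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
points = roc_points(y_true, scores)
for fpr, tpr in points:
    print(f"1 - specificity = {fpr:.2f}, sensitivity = {tpr:.2f}")
```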

4.3.7. Computational Time


Computational time measures the efficiency of the model by recording the time required
for training and inference. This metric is critical for real-time applications and
resource-constrained environments [34]. Together, these metrics enable a balanced
assessment of predictive accuracy, reliability, and practicality.
To obtain stable and reproducible estimates, the performance comparison for each dataset
is based on an average over a 10-fold cross-validation process that is repeated 10 times. In
this method, for each repetition, the dataset is randomly divided into 10 equal subsets. In
each iteration, nine subsets (90%) are used for training, while the remaining subset (10%) is
used for testing. Within a repetition, this is carried out 10 times, so that each subset serves as
the test set exactly once [34–36]. All LSTM and BiLSTM models were implemented in R
(version 4.3.3) using the Keras package. The experiments were conducted on a PC with the
following system configuration, reported here for reproducibility: Intel(R) Core(TM)
i7-8565U CPU running at approximately 1.8 GHz (8 CPUs) with 16 GB of RAM.
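
The repeated 10-fold scheme can be sketched as an index generator. This is an illustration only, written in Python rather than the R code used by the authors; the function name, the seed, and the way near-equal folds are formed are assumptions.

```python
import random


def repeated_kfold_indices(n_samples, n_folds=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a new random partition for every repetition
        folds = [order[i::n_folds] for i in range(n_folds)]  # near-equal fold sizes
        for fold, test_idx in enumerate(folds):
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield repeat, fold, train_idx, test_idx


splits = list(repeated_kfold_indices(n_samples=768))  # n = 768 as in the PIDD
print(len(splits))  # prints: 100
```

Each of the 100 train/test splits would be used to fit and score a model, and the reported figure is the average of the resulting metric values.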

5. Results
The results in Tables 4–6 demonstrate that the proposed Random Feature LSTM and
BiLSTM methods significantly outperform traditional state-of-the-art methods across three
diabetes-related datasets: Pima Indian Diabetes, Diabetic Retinopathy Debrecen, and Early
Stage Diabetes Risk. For the Pima Indian dataset, Random Feature BiLSTM achieved
the highest accuracy (99.3%) with perfect AUC (100%) and the lowest Brier score (0.006),
closely followed by Random Feature LSTM, which also excelled, with an accuracy of
97.6% and AUC of 100%. In the Diabetic Retinopathy dataset, both proposed methods
delivered accuracy above 97%, far exceeding that of traditional LSTM and BiLSTM models,
whose accuracy was only 62.3% and 64.0%, respectively. Similarly, in the Early Stage
Diabetes Risk dataset, both Random Feature LSTM and BiLSTM attained perfect scores
across all performance metrics (100% accuracy, sensitivity, specificity, and AUC, with a
Brier score of 0.000), outperforming other models, including random forest and SVM. These
findings underscore the enhanced predictive capabilities and robustness of the Random
Feature LSTM and BiLSTM approaches, particularly in achieving high accuracy, sensitivity,
specificity, and low Brier scores, making them highly effective for diabetes prediction tasks
across diverse datasets.

Table 4. Average of 10-fold cross-validation performance comparison of the proposed methods
(Random Feature LSTM and BiLSTM) with state-of-the-art methods for Pima Indian Diabetes Dataset.

Methods Accuracy Sensitivity Specificity AUC Brier Score


Random Feature LSTM 97.6% 98.6% 96.6% 100.0% 0.018
Random Feature BiLSTM 99.3% 99.0% 99.6% 100.0% 0.006
LSTM 72.5% 73.0% 72.0% 78.9% 0.190
BiLSTM 71.4% 73.2% 69.6% 79.6% 0.187
ELM 76.0% 74.4% 77.6% 84.9% 0.163
Neural network 81.2% 87.2% 75.2% 81.9% 0.180
Random forest 86.7% 91.6% 81.8% 95.3% 0.093
SVM 71.3% 72.4% 81.9% 82.7% 0.195
Logistic regression 74.8% 71.2% 78.4% 84.6% 0.161
Naive Bayes 72.8% 67.0% 78.6% 82.7% 0.184

Table 5. Average of 10-fold cross-validation performance comparison of the proposed methods
(Random Feature LSTM and BiLSTM) with state-of-the-art methods for Diabetic Retinopathy
Debrecen Dataset.

Methods Accuracy Sensitivity Specificity AUC Brier Score


Random Feature LSTM 97.2% 97.0% 97.4% 99.1% 0.023
Random Feature BiLSTM 97.5% 98.4% 96.6% 99.4% 0.018
LSTM 62.3% 41.1% 83.5% 64.7% 0.228
BiLSTM 64.0% 50.9% 77.1% 68.2% 0.222
ELM 71.2% 61.7% 80.7% 78.1% 0.195
Neural network 79.3% 73.7% 84.9% 80.2% 0.200
Random forest 80.5% 73.5% 87.6% 91.1% 0.130
SVM 74.2% 68.7% 79.7% 81.3% 0.177
Logistic regression 75.8% 66.4% 85.1% 82.9% 0.164
Naive Bayes 63.4% 43.2% 83.6% 69.1% 0.331

Table 6. Average of 10-fold cross-validation performance comparison of the proposed methods
(Random Feature LSTM and BiLSTM) with state-of-the-art methods for Early Stage Diabetes Risk Dataset.

Methods Accuracy Sensitivity Specificity AUC Brier Score


Random Feature LSTM 100.0% 100.0% 100.0% 100.0% 0.000
Random Feature BiLSTM 100.0% 100.0% 100.0% 100.0% 0.000
LSTM 83.4% 77.8% 89.1% 89.8% 0.122
BiLSTM 84.8% 78.4% 91.3% 92.0% 0.113
ELM 91.4% 86.3% 96.6% 97.8% 0.073
Neural network 93.6% 90.9% 96.3% 94.7% 0.063
Random forest 98.1% 98.4% 97.8% 99.8% 0.016
SVM 97.3% 97.5% 97.2% 99.6% 0.021
Logistic regression 94.4% 92.5% 96.3% 97.5% 0.055
Naive Bayes 90.3% 92.5% 88.1% 95.9% 0.095

Figures 4–6 illustrate the Receiver Operating Characteristic (ROC) curves for various
models across the Pima Indian Diabetes, Diabetic Retinopathy Debrecen, and Early Stage
Diabetes Risk datasets. These plots, generated from one of the ten-fold cross-validation
runs, highlight the superior performance of the proposed Random Feature LSTM and
BiLSTM methods. For the Pima Indian Diabetes Dataset, the ROC curves for these methods
approach the top-left corner, with AUC values of 0.993 and 0.980 in the plotted run,
corroborating the near-perfect classification capabilities detailed in Table 4. Similarly, in the
Diabetic Retinopathy Debrecen Dataset, the proposed methods maintained high AUC values (above
99%), showcasing their reliability in distinguishing between retinopathy and non-retinopathy cases
compared to the traditional LSTM and BiLSTM models, which exhibited significantly lower
AUC values. The Early Stage Diabetes Risk dataset further emphasized the superiority of
the proposed ensemble methods, with both achieving perfect ROC curves that align with
their flawless performance metrics across all evaluation categories in Table 6. These ROC
curves visually reinforce the findings, demonstrating the proposed methods’ robustness,
precision, and clinical applicability.
The computational time results in Table 7 show significant differences in computational
efficiency across methods. The proposed Random Feature BiLSTM (2.94 s) is much
faster than the LSTM (5.68 s), the BiLSTM (11.25 s), and the proposed Random Feature LSTM
(12.87 s), indicating that random feature selection speeds up the BiLSTM but not the LSTM,
which becomes the slowest method. ELM (0.02 s) and Naive Bayes (0.01 s) are the quickest
and are suitable for time-sensitive tasks. Traditional ML models like random forest (0.12 s), SVM
(0.27 s), and logistic regression (0.58 s) offer a balance between speed and complexity,
outperforming the deep learning models in efficiency. The neural network (0.26 s) is faster
than the LSTM-based models, SVM, and logistic regression but slower than ELM, Naive Bayes,
and random forest. Overall, the proposed Random Feature BiLSTM offers the best combination
of high predictive ability, as captured by accuracy, sensitivity, specificity, AUC, and Brier
score, and moderate computational time.

[Figure 4: ten ROC panels; x-axis: 1 − Specificity; y-axis: Sensitivity. AUC per panel: Random Feature LSTM 0.993; Random Feature BiLSTM 0.980; LSTM 0.743; BiLSTM 0.751; ELM 0.871; Neural Network 0.799; Random Forest 0.885; SVM 0.883; Logistic Regression 0.875; Naive Bayes 0.848.]

Figure 4. Receiver Operating Characteristic (ROC) curves of the various methods for the Pima
Indian dataset. (Note the AUC value shown on the plot corresponds to the AUC from one of the
ten iterations).

[Figure 5: ten ROC panels; x-axis: 1 − Specificity; y-axis: Sensitivity. AUC per panel: Random Feature LSTM 1.000; Random Feature BiLSTM 1.000; LSTM 0.654; BiLSTM 0.766; ELM 0.860; Neural Network 0.827; Random Forest 0.942; SVM 0.842; Logistic Regression 0.882; Naive Bayes 0.745.]

Figure 5. Receiver Operating Characteristic (ROC) curves of the various methods for the Diabetic
Retinopathy Debrecen Dataset. (Note the AUC value shown on the plot corresponds to the AUC
from one of the ten iterations).

[Figure 6: ten ROC panels; x-axis: 1 − Specificity; y-axis: Sensitivity. AUC per panel: Random Feature LSTM 1.000; Random Feature BiLSTM 1.000; LSTM 0.928; BiLSTM 0.934; ELM 0.984; Neural Network 0.943; Random Forest 0.994; SVM 0.997; Logistic Regression 0.997; Naive Bayes 0.964.]

Figure 6. Receiver Operating Characteristic (ROC) curves of the various methods for the Early Stage
Diabetes Risk dataset. (Note the AUC value shown on the plot corresponds to the AUC from one of
the ten iterations).

Table 7. Average of 10-fold cross-validation computational time comparison.

Methods Time (s)


Random Feature LSTM 12.87
Random Feature BiLSTM 2.94
LSTM 5.68
BiLSTM 11.25
ELM 0.02
Neural network 0.26
Random forest 0.12
SVM 0.27
Logistic regression 0.58
Naive Bayes 0.01

Benchmark Comparison with Related Studies


The performance comparison in Table 8 highlights the effectiveness of the proposed
RFLSTM and RFBiLSTM models across three diabetes datasets. For the Pima Indian
Diabetes Dataset (PIDD), the RFBiLSTM achieves state-of-the-art accuracy (99.3%) and
sensitivity (99.0%), outperforming existing methods like Boruta + EL (98.1% accuracy,
98.4% sensitivity) and Conv-LSTM (97.2% accuracy, 93.9% sensitivity). Notably, the pro-
posed models also dominate the Early Stage Diabetes Risk Prediction Dataset (ESDRPD),
achieving perfect accuracy and sensitivity (100%), surpassing methods like LGBM (96.2%
accuracy, 96.8% sensitivity) and Boruta + EL (98.6% accuracy). On the Diabetic Retinopa-
thy Debrecen Dataset (DRDD), RFBiLSTM achieves 97.5% accuracy and 98.4% sensitivity,
outperforming DNN + PCA + GWO (97.3% accuracy, 91.0% sensitivity). These results
suggest that the bidirectional architecture (RFBiLSTM) matches or improves on the
unidirectional RFLSTM, particularly in sensitivity, likely due to its ability to capture
temporal dependencies more effectively.

Table 8. Performance comparison of different methods on diabetes datasets.

Method [Authors] Dataset Accuracy Sensitivity


PSO + FCM [23] PIDD 95.4% 95.6%
PCA + K-Means + LR [24] PIDD 97.3% 97.0%
VAE + SAE + CNN [26] PIDD 92.3% —
K-Means + LR [25] PIDD 95.4% 95.4%
Conv-LSTM [1] PIDD 97.2% 93.9%
SVC [37] PIDD 79.0% 70.0%
LE [38] PIDD 75.0% 72.0%
X-BLR [39] PIDD 94.0% 94.0%
CGLSTM [40] PIDD 97.8% 89.6%
KFPredict [27] PIDD 93.5% 98.0%
LGBM [41] PIDD 95.5% 92.3%
Boruta + EL [7] PIDD 98.1% 98.4%
RFLSTM [Proposed] PIDD 97.6% 98.6%
RFBiLSTM [Proposed] PIDD 99.3% 99.0%
Boruta + EL [7] ESDRPD 98.6% 98.6%
LGBM [41] ESDRPD 96.2% 96.8%
RFLSTM [Proposed] ESDRPD 100.0% 100.0%
RFBiLSTM [Proposed] ESDRPD 100.0% 100.0%
Op-BPN [42] DRDD 96.8% 96.3%
DNN + PCA + GWO [43] DRDD 97.3% 91.0%
RFLSTM [Proposed] DRDD 97.2% 97.0%
RFBiLSTM [Proposed] DRDD 97.5% 98.4%

6. Discussion of Results
The proposed Random Feature LSTM (RFLSTM) and Random Feature BiLSTM (RFBiLSTM)
frameworks demonstrate significant advancements in diabetes prediction accuracy
and robustness compared to conventional machine learning models and standard
LSTM/BiLSTM architectures. Across three benchmark datasets, Pima Indian Diabetes
(PIDD), Diabetic Retinopathy Debrecen (DRDD), and Early Stage Diabetes Risk Prediction
(ESDRPD), the models achieve state-of-the-art performance, with RFBiLSTM matching or
outperforming RFLSTM and existing methods in accuracy and sensitivity. On the PIDD,
RFBiLSTM attains 99.3% accuracy and 99.0% sensitivity, surpassing advanced ensemble
methods like Boruta + EL (98.1% accuracy) and hybrid architectures such as Conv-LSTM
(97.2% accuracy). Similarly, on the ESDRPD, both RFLSTM and RFBiLSTM achieve flawless
accuracy and sensitivity (100%), outperforming gradient-boosted models like LGBM (96.2%
accuracy) and ensemble techniques such as Boruta + EL (98.6% accuracy). For the DRDD,
RFBiLSTM achieves 97.5% accuracy and 98.4% sensitivity, exceeding deep learning hybrids
like DNN + PCA + GWO (97.3% accuracy) and traditional models such as SVM (74.2%
accuracy; Table 5). These results underscore the efficacy of combining random feature
selection with bidirectional temporal processing, particularly in clinical contexts where
sensitivity and specificity are critical for early intervention.
The performance superiority stems from two synergistic innovations. First, the ran-
dom feature selection mechanism mitigates overfitting by promoting model diversity, as
evidenced by the η-dependency analysis (Tables 2 and 3). For RFLSTM, mid-range η values
(0.5–0.8) yield balanced generalization, with training and validation accuracies stabilizing
at 97.1% and 96.7%, respectively, and minimal loss discrepancies (e.g., −0.0571 for η = 0.5).
Conversely, higher η values (≥0.9) induce overfitting, as seen in the sharp decline in vali-
dation accuracy (76.7%) and widening loss gaps (−0.1874). Similarly, RFBiLSTM achieves
optimal performance at η = 0.6 and η = 0.8 (validation accuracy: 93.3–96.7%) but falters
at η = 0.9 (validation accuracy: 86.7%), emphasizing the necessity of retaining sufficient
feature diversity to balance bias and variance. Second, the bidirectional architecture in
RFBiLSTM enhances sensitivity by capturing temporal dependencies in both forward and
backward directions, as demonstrated by its 1.4-percentage-point sensitivity gain over
RFLSTM on the DRDD (98.4% vs. 97.0%). This aligns with findings from [2], where
bidirectional architectures improved glucose trend prediction, and [14], where hybrid
models excelled in capturing sequential patterns in medical data.
The models’ clinical reliability is further supported by their discrimination and calibration
metrics, including near-perfect AUC scores (100% for ESDRPD) and low Brier scores
(0.006–0.023), which indicate precise probabilistic predictions. These metrics suggest
that the models are not only accurate but also trustworthy in real-world settings where
predictive confidence impacts clinical decisions. However, the perfect scores on ESDRPD
warrant careful scrutiny, as they may reflect dataset-specific biases, such as homogeneous
patient demographics or limited variability in symptom presentation. Despite this caveat,
the results align with broader trends in medical AI research, such as [19], where hybrid
CNNs outperformed traditional models in diabetes prediction, and [17], which highlighted
the importance of sequential learning for early risk detection.
The findings hold critical implications for both clinical practice and machine learning
research. Clinically, the high sensitivity (98.4–100%) and specificity (96.6–100%) of RFBiLSTM
position it as a valuable tool for early diabetes screening, where false negatives can
delay critical interventions. The models’ ability to maintain robust performance across di-
verse datasets ranging from physiological measurements (PIDD) to retinal imaging (DRDD)
and symptom-based assessments (ESDRPD) suggests broad applicability in multi-modal
healthcare environments. From a technical perspective, the η dependency analysis un-
derscores the importance of optimizing the proportions of feature selection, with 50–80%
feature retention emerging as a “sweet spot” to balance information richness and model
generalizability. This insight challenges the conventional preference for high η values
in feature selection, instead advocating a middle ground that prioritizes diversity over
completeness. Architecturally, the bidirectional design of RFBiLSTM proves indispensable
for sensitivity-driven tasks, as it captures temporal dependencies more comprehensively
than unidirectional models, a finding consistent with recent advances in medical time series
analysis. Future work should focus on validating these models on larger, multi-institutional
datasets to ensure generalizability across diverse populations and addressing potential
biases in datasets with limited heterogeneity. Furthermore, exploring the integration of
attention mechanisms or explainability frameworks could further enhance clinical adoption
by providing interpretable insights into model decisions.
In addition, the proposed hybrid framework enhances accuracy, efficiency, and in-
terpretability in several key ways. Accuracy is significantly improved through a dual
mechanism: (1) randomized feature selection reduces overfitting by training diverse base
models on unique feature subsets, fostering robustness through ensemble aggregation, and
(2) bidirectional processing in BiLSTM captures temporal dependencies in both forward
and backward directions, enabling enhanced pattern recognition (e.g., 98.4% vs. 97.0%
sensitivity for retinopathy data). Systematic simulations revealed that mid-range
feature selection proportions (η = 0.5–0.8) optimize generalization, outperforming
conventional models (e.g., 100% accuracy in early-stage risk data) and recent hybrids such
as Boruta + EL. Efficiency is maintained despite the ensemble design: parallelizable base
model training and meta-learner integration minimize computational overhead, while dynamic
feature sparsity reduces per-model complexity. Interpretability is advanced through
two pathways: (1) the meta-learner’s transparent weighting mechanism clarifies how base
predictions contribute to final outputs, and (2) the superior calibration (Brier score: 0.006–0.023)
ensures probabilistic reliability, critical for clinical trust. Together, these innovations balance
performance, scalability, and actionable insights, addressing limitations of monolithic deep
learning architectures while advancing practical utility in healthcare analytics.
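
The randomized feature selection step can be illustrated with a short sketch that draws one feature subset per base model. This is a hedged illustration, not the authors' R implementation: the function name `sample_feature_subsets`, the number of base models, and the rounding rule for the subset size are assumptions, and the LSTM/BiLSTM training and the meta-learner are omitted.

```python
import random


def sample_feature_subsets(n_features, eta, n_models, seed=0):
    """Draw one random feature subset of size round(eta * n_features) per base model."""
    if not 0 < eta <= 1:
        raise ValueError("eta must be in (0, 1]")
    rng = random.Random(seed)
    k = max(1, round(eta * n_features))  # assumed rounding rule, not from the paper
    return [sorted(rng.sample(range(n_features), k)) for _ in range(n_models)]


# p = 8 features as in the PIDD; eta = 0.6 keeps 5 features per base model.
subsets = sample_feature_subsets(n_features=8, eta=0.6, n_models=5)
for b, cols in enumerate(subsets):
    print(f"base model {b}: feature indices {cols}")
```

With η = 1.0, every base model receives all p features, so the subsets are identical and the ensemble loses the diversity that the η analysis associates with better generalization.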

7. Conclusions
The integration of hybrid deep learning architectures, such as those combining ran-
dom feature selection with unidirectional/bidirectional temporal modelling, represents a
paradigm shift in medical predictive analytics. By harmonizing the strengths of ensemble
learning and sequential data processing, these frameworks address critical limitations of
conventional models, which often struggle with unstructured medical data. The success of
such architectures in the prediction of diabetes underscores their potential to advance preci-
sion medicine, offering tools that are not only accurate but also reliable in their probabilistic
calibration, a prerequisite for clinical trust.
This study reinforces the importance of balancing feature diversity and information
retention in medical AI design. Although traditional methods prioritize either interpretabil-
ity or complexity, hybrid architectures like those proposed here demonstrate that these
goals need not be mutually exclusive. Instead, they can co-exist to enhance model gener-
alizability and robustness, particularly in early-stage disease detection, where nuanced
patterns demand sophisticated analytical frameworks.
The broader implications extend beyond diabetes. The principles underlying these
models (adaptive feature selection, temporal dependency capture, and probabilistic
calibration) are transferable to other chronic diseases, from cardiovascular disorders to
neurodegenerative conditions, where early diagnosis and risk stratification are equally vital.
However, the path to clinical adoption requires addressing challenges such as dataset
heterogeneity, algorithmic transparency, and computational scalability. Future efforts must
prioritize collaborative frameworks that bridge machine learning innovation with clinical
expertise, ensuring that these tools evolve in tandem with real-world healthcare needs.
Ultimately, this work contributes to a growing movement in medical AI, one that seeks
not only to predict but to empower, transforming raw data into actionable insights that
improve patient outcomes and redefine preventive care.

Author Contributions: Conceptualization, O.R.O., A.O.S., J.A., A.A.A. and N.M.A.; methodology,
O.R.O. and A.O.S.; software, O.R.O.; validation, O.R.O., A.O.S., J.A., A.A.A. and N.M.A.; formal
analysis, O.R.O.; investigation, O.R.O., A.O.S., J.A., A.A.A. and N.M.A.; resources, J.A. and A.A.A.;
data curation, O.R.O.; writing—original draft preparation, O.R.O. and A.O.S.; writing—review and
editing, O.R.O., A.O.S., J.A., A.A.A. and N.M.A.; visualization, O.R.O.; supervision, O.R.O.; project
administration, O.R.O. All authors have read and agreed to the published version of the manuscript.

Funding: This research received no external funding.

Data Availability Statement: The authors confirm that the data supporting the findings of this study
are available within the article.

Conflicts of Interest: The authors declare no conflicts of interest.

References
1. Rahman, M.; Islam, D.; Mukti, R.J.; Saha, I. A deep learning approach based on convolutional LSTM for detecting diabetes.
Comput. Biol. Chem. 2020, 88, 107329. [CrossRef] [PubMed]
2. Sun, Q.; Jankovic, M.V.; Bally, L.; Mougiakakou, S.G. Predicting blood glucose with an lstm and bi-lstm based deep neural
network. In Proceedings of the IEEE 2018 14th Symposium on Neural Networks and Applications (NEUREL), Belgrade, Serbia,
20–21 November 2018; pp. 1–5.
3. Ishida, K.; Ercan, A.; Nagasato, T.; Kiyama, M.; Amagasaki, M. Use of one-dimensional CNN for input data size reduction
in LSTM for improved computational efficiency and accuracy in hourly rainfall-runoff modeling. J. Environ. Manag. 2024,
359, 120931. [CrossRef] [PubMed]
4. Cheng, H.; Zhu, J.; Li, P.; Xu, H. Combining knowledge extension with convolution neural network for diabetes prediction. Eng.
Appl. Artif. Intell. 2023, 125, 106658. [CrossRef]
Mathematics 2025, 13, 628 24 of 25

5. Madan, P.; Singh, V.; Chaudhari, V.; Albagory, Y.; Dumka, A.; Singh, R.; Gehlot, A.; Rashid, M.; Alshamrani, S.S.; AlGhamdi, A.S.
An optimization-based diabetes prediction model using CNN and Bi-directional LSTM in real-time environment. Appl. Sci. 2022,
12, 3989. [CrossRef]
6. Araveeporn, A. Comparing the linear and quadratic discriminant analysis of diabetes disease classification based on data
multicollinearity. Int. J. Math. Math. Sci. 2022, 2022, 7829795. [CrossRef]
7. Zhou, H.; Xin, Y.; Li, S. A diabetes prediction model based on Boruta feature selection and ensemble learning. BMC Bioinform.
2023, 24, 224. [CrossRef]
8. Jaiswal, S.; Gupta, P. Diabetes Prediction Using Bi-directional Long Short-Term Memory. SN Comput. Sci. 2023, 4, 373. [CrossRef]
9. Maniruzzaman, M.; Kumar, N.; Abedin, M.M.; Islam, M.S.; Suri, H.S.; El-Baz, A.S.; Suri, J.S. Comparative approaches for
classification of diabetes mellitus data: Machine learning paradigm. Comput. Methods Programs Biomed. 2017, 152, 23–34.
[CrossRef]
10. Zou, Q.; Qu, K.; Luo, Y.; Yin, D.; Ju, Y.; Tang, H. Predicting diabetes mellitus with machine learning techniques. Front. Genet.
2018, 9, 515. [CrossRef]
11. Yahyaoui, A.; Jamil, A.; Rasheed, J.; Yesiltepe, M. A decision support system for diabetes prediction using machine learning and
deep learning techniques. In Proceedings of the IEEE 2019 1st International Informatics and Software Engineering Conference
(UBMYK), Ankara, Turkey, 6–7 November 2019; pp. 1–4.
12. Yuvaraj, N.; SriPreethaa, K. Diabetes prediction in healthcare systems using machine learning algorithms on Hadoop cluster.
Clust. Comput. 2019, 22, 1–9. [CrossRef]
13. Khanam, J.J.; Foo, S.Y. A comparison of machine learning algorithms for diabetes prediction. Ict Express 2021, 7, 432–439.
[CrossRef]
14. Kusuma, S.; Jothi, K. ECG signals-based automated diagnosis of congestive heart failure using Deep CNN and LSTM architecture.
Biocybern. Biomed. Eng. 2022, 42, 247–257. [CrossRef]
15. Reddy, S.N.B.; Reddy, K.N.; Rao, S.T.; Kumar, K. Diabetes Prediction using Extreme Learning Machine: Application of Health
Systems. In Proceedings of the IEEE 2023 5th International Conference on Smart Systems and Inventive Technology (ICSSIT),
Tirunelveli, India, 23–25 January 2023; pp. 993–998.
16. Pangaribuan, J.J.; Suharjito. Diagnosis of diabetes mellitus using extreme learning machine. In Proceedings of the IEEE 2014
International Conference on Information Technology Systems and Innovation (ICITSI), Bandung, Indonesia, 24–27 November
2014; pp. 33–38.
17. Elsayed, N.; ElSayed, Z.; Ozer, M. Early stage diabetes prediction via extreme learning machine. In Proceedings of the IEEE
SoutheastCon 2022, Mobile, AL, USA, 26 March–3 April 2022; pp. 374–379.
18. Georga, E.I.; Protopappas, V.C.; Polyzos, D.; Fotiadis, D.I. Online prediction of glucose concentration in type 1 diabetes using
extreme learning machines. In Proceedings of the IEEE 2015 37th Annual International Conference of the IEEE Engineering in
Medicine and Biology Society (EMBC), Milan, Italy, 25–29 August 2015; pp. 3262–3265.
19. Swapna, G.; Vinayakumar, R.; Soman, K. Diabetes detection using deep learning algorithms. ICT Express 2018, 4, 243–246.
20. Hossain, M.M.; Ali, M.S.; Ahmed, M.M.; Rakib, M.R.H.; Kona, M.A.; Afrin, S.; Islam, M.K.; Ahsan, M.M.; Raj, S.M.R.H.; Rahman,
M.H. Cardiovascular disease identification using a hybrid CNN-LSTM model with explainable AI. Inform. Med. Unlocked 2023,
42, 101370. [CrossRef]
21. Karthika, S.; Priyanka, T.; Indirapriyadharshini, J.; Sadesh, S.; Rajeshkumar, G.; Rajesh Kanna, P. Prediction of Weather Forecasting
with Long Short-Term Memory using Deep Learning. In Proceedings of the IEEE 2023 4th International Conference on Smart
Electronics and Communication (ICOSEC), Trichy, India, 20–22 September 2023; pp. 1161–1168.
22. Khan, A.; Fouda, M.M.; Do, D.T.; Almaleh, A.; Rahman, A.U. Short-term traffic prediction using deep learning long short-term
memory: Taxonomy, applications, challenges, and future trends. IEEE Access 2023, 11, 94371–94391. [CrossRef]
23. Raja, J.B.; Pandian, S.C. PSO-FCM based data mining model to predict diabetic disease. Comput. Methods Programs Biomed. 2020,
196, 105659. [CrossRef]
24. Zhu, C.; Idemudia, C.U.; Feng, W. Improved logistic regression model for diabetes prediction by integrating PCA and K-means
techniques. Inform. Med. Unlocked 2019, 17, 100179. [CrossRef]
25. Wu, H.; Yang, S.; Huang, Z.; He, J.; Wang, X. Type 2 diabetes mellitus prediction model based on data mining. Inform. Med.
Unlocked 2018, 10, 100–107. [CrossRef]
26. García-Ordás, M.T.; Benavides, C.; Benítez-Andrades, J.A.; Alaiz-Moretón, H.; García-Rodríguez, I. Diabetes detection using
deep learning techniques with oversampling and feature augmentation. Comput. Methods Programs Biomed. 2021, 202, 105968.
[CrossRef]
27. Qi, H.; Song, X.; Liu, S.; Zhang, Y.; Wong, K.K. KFPredict: An ensemble learning prediction framework for diabetes based on
fusion of key features. Comput. Methods Programs Biomed. 2023, 231, 107378. [CrossRef]
28. Al Rafi, A.S.; Rahman, T.; Al Abir, A.R.; Rajib, T.A.; Islam, M.; Mukta, M.S.H. A new classification technique: Random weighted
lstm (rwl). In Proceedings of the 2020 IEEE Region 10 Symposium (TENSYMP), Dhaka, Bangladesh, 5–7 June 2020; pp. 262–265.
Mathematics 2025, 13, 628 25 of 25

29. Tagmatova, Z.; Abdusalomov, A.; Nasimov, R.; Nasimova, N.; Dogru, A.H.; Cho, Y.I. New approach for generating synthetic
medical data to predict type 2 diabetes. Bioengineering 2023, 10, 1031. [CrossRef] [PubMed]
30. Noguer, J.; Contreras, I.; Mujahid, O.; Beneyto, A.; Vehi, J. Generation of individualized synthetic data for augmentation of the
type 1 diabetes data sets using deep learning models. Sensors 2022, 22, 4944. [CrossRef] [PubMed]
31. Butt, U.M.; Letchmunan, S.; Ali, M.; Hassan, F.H.; Baqir, A.; Sherazi, H.H.R. Machine learning based diabetes classification and
prediction for healthcare applications. J. Healthc. Eng. 2021, 2021, 9930985. [CrossRef]
32. Antal, B.; Hajdu, A. An ensemble-based system for automatic screening of diabetic retinopathy. Knowl.-Based Syst. 2014, 60, 20–27.
[CrossRef]
33. Nguyen, Q.H.; Muthuraman, R.; Singh, L.; Sen, G.; Tran, A.C.; Nguyen, B.P.; Chua, M. Diabetic retinopathy detection using deep
learning. In Proceedings of the 4th International Conference on Machine Learning and Soft Computing, Haiphong City, Vietnam,
17–19 January 2020; pp. 103–107.
34. Olaniran, O.R.; Abdullah, M.A.A. Bayesian weighted random forest for classification of high-dimensional genomics data. Kuwait
J. Sci. 2023, 50, 477–484. [CrossRef]
35. Olaniran, O.R.; Alzahrani, A.R.R. On the Oracle Properties of Bayesian Random Forest for Sparse High-Dimensional Gaussian
Regression. Mathematics 2023, 11, 4957. [CrossRef]
36. Olaniran, O.R.; Alzahrani, A.R.R.; Alzahrani, M.R. Eigenvalue Distributions in Random Confusion Matrices: Applications to
Machine Learning Evaluation. Mathematics 2024, 12, 1425. [CrossRef]
37. Kumari, S.; Kumar, D.; Mittal, M. An ensemble approach for classification and prediction of diabetes mellitus using soft voting
classifier. Int. J. Cogn. Comput. Eng. 2021, 2, 40–46. [CrossRef]
38. Rajendra, P.; Latifi, S. Prediction of diabetes using logistic regression and ensemble techniques. Comput. Methods Programs Biomed.
Update 2021, 1, 100032. [CrossRef]
39. Wu, Y.; Zhang, Q.; Hu, Y.; Sun-Woo, K.; Zhang, X.; Zhu, H.; Jie, L.; Li, S. Novel binary logistic regression model based on feature
transformation of XGBoost for type 2 Diabetes Mellitus prediction in healthcare systems. Future Gener. Comput. Syst. 2022,
129, 1–12. [CrossRef]
40. Roobini, M.; Lakshmi, M. Autonomous prediction of Type 2 Diabetes with high impact of glucose level. Comput. Electr. Eng.
2022, 101, 108082. [CrossRef]
41. Nipa, N.; Riyad, M.H.; Satu, S.; Walliullah; Howlader, K.C.; Moni, M.A. Clinically adaptable machine learning model to identify
early appreciable features of diabetes. Intell. Med. 2024, 4, 22–32. [CrossRef]
42. Devi, R.M.; Keerthika, P.; Devi, K.; Suresh, P.; Sangeetha, M.; Sagana, C.; Devendran, K. Detection of diabetic retinopathy using
optimized back-propagation neural network (Op-BPN) algorithm. In Proceedings of the IEEE 2021 5th International Conference
on Computing Methodologies and Communication (ICCMC), Erode, India, 8–10 April 2021; pp. 1695–1699.
43. Gadekallu, T.R.; Khare, N.; Bhattacharya, S.; Singh, S.; Maddikunta, P.K.R.; Srivastava, G. Deep neural networks to predict
diabetic retinopathy. J. Ambient. Intell. Humaniz. Comput. 2023, 14, 5407–5420. [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
