
Eksploatacja i Niezawodnosc – Maintenance and Reliability

Volume 27 (2025), Issue 3


journal homepage: http://www.ein.org.pl

Article citation info:


Attia M, Aoulmi Z. Improving Electric Vehicle Maintenance by Advanced Prediction of Failure Modes Using Machine Learning Classifications. Eksploatacja i Niezawodnosc – Maintenance and Reliability 2025; 27(3). http://doi.org/10.17531/ein/201372

Improving Electric Vehicle Maintenance by Advanced Prediction of Failure Modes Using Machine Learning Classifications

Moussa Attia a,*, Zoubir Aoulmi a

a Environment Laboratory, Institute of Mines, Echahid Cheikh Larbi Tebessi University, Tebessa 12002, Algeria

Highlights
▪ Machine learning enhances predictive maintenance for electric vehicles.
▪ Advanced algorithms identify potential failure modes before they occur.
▪ Improved efficiency and reduced downtime through accurate failure predictions.
▪ Optimised maintenance schedules based on real-time vehicle performance data.
▪ Data-driven insights extend the lifespan of electric vehicle components.

Abstract
This study stands out for its novelty, offering an alternative solution to traditional methods for analyzing failure modes and their effects. We utilized machine learning techniques, which have enabled a significant shift in the predictive maintenance of electric vehicles. We performed numerous tests and evaluations of advanced models such as random forests, decision trees, logistic regression, and neural networks, where random forests and neural networks achieved exceptional accuracy of 96.67%. This breakthrough improves fault prediction accuracy, reduces operational costs, and minimizes downtime by combining numerical and categorical data. The study focuses on the transformative potential of machine learning, enhancing the reliability, lifespan, and maintenance of electric vehicles through a data-driven approach. The main innovation of this study lies in integrating multiple models, such as Random Forest and Neural Networks, to analyze failures in electric vehicles. While previous studies typically relied on traditional techniques like decision trees or regression analysis, our research presents a multi-layered approach, enabling the models to detect more complex patterns and improve prediction accuracy. Moreover, we incorporate real-world data collected from electric vehicle sensors, which allows the model to make precise predictions in real-world operational environments. This approach significantly advances previous studies, which primarily relied on simulated data or isolated models.

Keywords
predictive maintenance, machine learning, electric vehicles, random forests, neural networks, advanced data

This is an open access article under the CC BY license (https://creativecommons.org/licenses/by/4.0/)

1. Introduction

The rapid transition towards electric vehicles (EVs) presents new challenges in maintaining the reliability and performance of these vehicles (1). Unlike traditional internal combustion engine (ICE) vehicles, EVs rely on complex electronic components, such as high-voltage batteries, electric motors, and associated control systems, introducing new failure modes that traditional maintenance methods cannot fully address (2-4). These failures can significantly impact operational efficiency, leading to unplanned downtime and higher maintenance costs (5, 6). To mitigate these issues, predictive maintenance using machine learning (ML) techniques has become a pivotal strategy in the automotive industry, including for EVs, to enhance reliability and minimize downtime (7).

(*) Corresponding author.
E-mail addresses: M. Attia (ORCID: 0000-0003-3700-7049) moussa.attia@univ-tebessa.dz, Z. Aoulmi (ORCID: 0000-0001-7212-7332) zoubir.aoulmi@univ-tebessa.dz



Integrating ML models such as Random Forest (RF) and Neural Networks (NN) has shown significant promise in predicting failures in EV systems (8-10). These models can detect complex failure patterns by analyzing sensor data from various EV components (11, 12). Recent studies have demonstrated the effectiveness of these models, particularly in predicting failures in battery and motor control systems. This section synthesizes findings from multiple studies, highlighting the current state and challenges of ML-based predictive maintenance in EVs (10, 13, 14).

1.1. Effectiveness of Machine Learning Models:

• Random Forest and Neural Networks:
Sheikh et al. (2024) highlighted the superior performance of RF models in predicting battery failures, achieving over 90% accuracy, which surpasses traditional methods such as Failure Mode and Effects Analysis (FMEA) (15). Similarly, Neural Networks have been employed to capture complex patterns in sensor data, improving predictive accuracy by handling non-linear relationships and interactions between variables that traditional methods struggle to detect.

• Long Short-Term Memory (LSTM) Networks:
Li et al. (2023) demonstrated using LSTM networks to predict battery degradation. LSTM networks are particularly effective for capturing temporal dependencies in sensor data, making them more suitable for time-series analysis than simpler models that cannot account for these patterns (15).

1.2. Hybrid Models and Their Advantages:

• Combination of ML Techniques:
Hybrid models, which integrate different ML techniques, are gaining traction due to their ability to improve prediction accuracy. Ullah et al. (2022) proposed a hybrid approach combining Random Forest (RF) and Support Vector Machines (SVM) to predict failures in motor control systems. This combination improved accuracy and operational efficiency by leveraging the strengths of both models in handling different aspects of the data (15).

• Integration with Deep Learning:
Peres et al. (2023) explored integrating deep learning methods with traditional ML models, such as RF and SVM, to handle large and complex datasets. This integration enhances the robustness of the prediction models, making them more capable of managing high-dimensional data and improving prediction reliability (15).

1.3. Challenges and Limitations:

• Data Quality and Real-World Applicability:
One of the significant challenges in predictive maintenance is the reliance on simulated datasets or isolated ML models, which limits their applicability in real-world scenarios. The complexity and high dimensionality of real-world EV data, particularly when integrating data from multiple sensors, present significant challenges (16). Studies such as Ghelani (2024) emphasize the importance of high-quality, real-world data for training reliable models that can function effectively in operational environments.

• Computational Complexity:
As noted by Lorenti et al. (2023), deep learning models require significant computational resources, which can limit their feasibility for real-time applications in resource-constrained environments. The computational demands of deep learning models can hinder their deployment, especially in real-time predictive maintenance scenarios where speed and efficiency are critical (17).

1.4. Broader Perspectives and Future Directions:

While the advancements in ML-based predictive maintenance for EVs are promising, several broader perspectives should be considered. Integrating domain knowledge and human expertise into ML algorithms can enhance the predictive accuracy and relevance of the models. Ghelani (2023) discusses how incorporating expert insights can help refine predictions, especially in complex scenarios where data alone might not provide a full understanding of failure mechanisms (16). Additionally, the role of edge computing and distributed ML techniques is gaining attention in enabling real-time predictive maintenance, particularly in remote or resource-constrained environments (16). These approaches allow for the deployment of models on-site, reducing the need for constant cloud connectivity and enabling quicker decision-making processes. As Ghelani (2024) explored, this is an ongoing research area. Finally, ethical considerations like privacy protection and regulatory compliance are crucial when deploying ML-based predictive maintenance solutions. Researchers, including



Ghelani (2024), emphasize the importance of addressing privacy concerns related to the data used in these systems, mainly when dealing with personal or sensitive data from vehicle sensors.

This study's innovative approach of integrating multiple machine learning models, specifically Random Forest and Neural Networks, using real-world data significantly enhances predictive accuracy. It provides deeper insights than traditional methods reliant on isolated models or simulated data.

2. Methodology

2.1. Data Collection and Preprocessing

A dataset of real-world data collected from multiple sensors in electric vehicles, including high-voltage batteries, electric motors, and control systems, was used. The data provides insights into electric vehicles' thermal performance, energy consumption, and charge status. This data was gathered from several electric cars operating in various environments, encompassing over 300 data points ranging from minor faults to complete failures. The dataset is representative of the real-world operational conditions of electric vehicles.

Data preprocessing was crucial in cleaning and organizing the dataset for subsequent analysis. The primary tasks involved addressing missing values, encoding categorical variables, and normalizing numerical features to ensure consistency across the dataset. Specifically, missing numerical values were replaced using mean imputation, while categorical variables were handled through mode imputation. Furthermore, the dataset was randomly split into training and testing sets with an 80/20 ratio to ensure that the models were evaluated on data not used during training, which is essential for accurate model evaluation. Data normalization was also applied to ensure all features were on the same scale, which is particularly crucial for models like Neural Networks.

Following the preprocessing, Failure Modes and Effects Analysis (FMEA) was conducted on various vehicle components, as shown in Table 1. This analysis highlighted the priority for preventive measures for each identified failure mode. In the initial phase, we examined the components of electric vehicles through a detailed review of subprocess activities, identifying potential failure modes and their underlying causes.

Once the FMEA was completed, data mining techniques were applied to enhance the prediction of failure modes. Machine learning models, including Neural Networks and Random Forests, were utilized to predict critical failure modes accurately and effectively.

The failure modes were classified based on specific criteria related to the severity of the failure and its impact on vehicle performance. Data from embedded electric vehicle sensors, including voltage, current, temperature, and pressure measurements, were used. The failures were then categorized into four groups: operational failures, minor faults, critical failures, and total failures. This classification process was driven by a set of factors analyzed using machine learning techniques like Random Forest and Neural Networks, where the model identifies patterns in the data to determine the most likely failure category.

2.1.1. Handling Missing Values

For missing numerical values, we used mean imputation as described by (18):

$\hat{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$  (1)

For categorical values, we applied mode imputation (19):

$\hat{x}_{\text{mode}} = \arg\max_{v \in X} P(v)$  (2)

Table 2 shows the summary of the missing value imputation applied.

Table 2. Missing Value Imputation.
| Feature | Missing Value Count | Imputation Method |
|---|---|---|
| Numeric Features | 10 | Mean |
| Categorical Features | 5 | Mode |
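A minimal sketch of this imputation step using pandas and scikit-learn; the file name and column handling are illustrative assumptions, since the study's exact schema is not published:

```python
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical file name; the study's dataset is not published.
df = pd.read_csv("ev_sensor_data.csv")

numeric_cols = df.select_dtypes(include="number").columns
categorical_cols = df.select_dtypes(exclude="number").columns

# Equation (1): fill missing numeric values with the column mean.
df[numeric_cols] = SimpleImputer(strategy="mean").fit_transform(df[numeric_cols])

# Equation (2): fill missing categorical values with the most frequent value.
df[categorical_cols] = SimpleImputer(strategy="most_frequent").fit_transform(df[categorical_cols])
```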
Table 1. Failure Mode and Effects Analysis (FMEA) for Electric Vehicle Components.
| No | Component / Sub-Process | Activities | Failure Modes | Causes | Effects | Detection Means | Detection | Severity | Frequency | RPN | Preventive Actions |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Servitude Battery | Transmit current | No longer charges | Wear due to recharge cycles | Ignition interruption on board | No light | 8 | 3 | 24 | 576 | Replace the low-voltage battery |
| 2 | HT/400V Traction Battery | Transmit current to the traction motor | No longer charges | Wear due to recharge cycles | Vehicle shutdown | No traction | 9 | 7 | 2 | 126 | Replace the low-voltage battery |
| 3 | Recharge Circuit | Ensures connection between batteries and charger | Lack of electrical continuity | Wear of electrical cables | No ignition on board | Visual | 2 | 2 | 2 | 8 | Replace the electrical wiring |
| 4 | DC/DC Inverter | Converter; ensures connection between batteries and charger | No recharge on the servitude low-voltage battery | Aging of electrical components | Thermal warming | Heating with or without | 7 | 2 | 2 | 28 | Replace the electronic card |
| 5 | AC/DC Charger Rectifier | Recharges the traction low-voltage battery | No recharge on the traction low-voltage battery | Aging of electrical components | Thermal warming | Heating with or without | 7 | 2 | 2 | 28 | Replace the electronic card |
| 6 | Electric Motor | The power source of the vehicle | Vehicle shutdown | Wear of rotor brushes | Poor traction | The motor no longer runs | 7 | 9 | 2 | 126 | Change the brushes |
| 7 | Transmission Group | The link between the motor and the wheels | Misalignment | Loose bolts | Poor traction with noise | The receiver organ does not rotate | 5 | 9 | 2 | 90 | Periodic inspection |
| 8 | Engine Computer | Manages electronically the engine/ignition and security | Poor engine performance | Thermal warming | The vehicle does not start | No ignition | 7 | 9 | 2 | 126 | Replace the computer |
| 9 | Brake Pump | Allows oil distribution to the brake cylinder | Brake system failure | Oil loss, damaged hose | – | Visual | 2 | 2 | 2 | 8 | Periodic check |
| 10 | Pressure Sensor | Measures tire pressure | Pneumatic energy overconsumption | Wear and temperature | Loss of performance | Visual | 5 | 9 | 2 | 90 | Replacement |
| 11 | Accelerator Pedal Sensor | Detects pedal position | Anomaly: engine speed | Wear and temperature | Loss of performance | Instrument | 7 | 9 | 2 | 126 | Replacement |
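The RPN values in Table 1 are consistent with the usual FMEA product of the three ratings (RPN = Detection × Severity × Frequency); a trivial check of that arithmetic in Python:

```python
def rpn(detection: int, severity: int, frequency: int) -> int:
    """Risk Priority Number: product of the three FMEA ratings."""
    return detection * severity * frequency

# Sanity checks against two rows of Table 1.
assert rpn(8, 3, 24) == 576  # Servitude Battery
assert rpn(9, 7, 2) == 126   # HT/400V Traction Battery
```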

2.1.2. Encoding Categorical Features

Categorical features were converted into numeric values using Label Encoding. For example, the Component Status column, which initially contained text labels (such as "Working" and "Failing"), was transformed into numeric codes like 0 and 1. This encoding was crucial for models that cannot process categorical data directly (20).

$L(X) = \text{integer representation of categorical feature } X$  (3)

The flowchart in Figure 1 illustrates the basic steps of the data processing pipeline, from data input to handling missing values and applying categorical encoding. Missing values in numeric columns are imputed with the mean, while categorical columns are imputed with the mode (most frequent value); label encoding is then applied to the categorical columns (21).

Figure 1. Data Processing Flowchart.

2.1.3. Feature Scaling

Standardizing the features ensured that all numerical values were on the same scale. The StandardScaler function was applied to the features, transforming them to a mean of zero and a standard deviation of one. This procedure is critical for algorithms such as Support Vector Machines (SVM) and Neural Networks, which are sensitive to the magnitude of input values.

2.1.4. Standardization

Standardization guarantees that all features are adjusted to have a mean of zero and a variance of one (22):

$X_{\text{scaled}} = \frac{X - \mu}{\sigma}$  (4)
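Continuing the preprocessing sketch above, label encoding (equation 3) and standardization (equation 4) might look as follows with scikit-learn; the `component_status` column name is an assumption based on the description, not the study's actual schema:

```python
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Equation (3): map text labels such as "Working"/"Failing" to integer codes.
# "component_status" is an assumed column name.
df["component_status"] = LabelEncoder().fit_transform(df["component_status"])

# Equation (4): rescale numeric features to zero mean and unit variance.
df[numeric_cols] = StandardScaler().fit_transform(df[numeric_cols])
```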



2.2. Data Exploration

2.2.1. Correlation Matrix

We compute the correlation matrix to evaluate relationships between numerical features (23):

$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$  (5)

Table 3 provides a correlation matrix for critical features.

Table 3. Correlation Matrix.
| Feature 1 | Feature 2 | Correlation Coefficient |
|---|---|---|
| Severity | Feature A | 0.65 |
| Severity | Feature B | -0.34 |

Figure 2. Correlation heatmap visualizing the strength of relationships between features.
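A short sketch of how such a correlation matrix and heatmap (equation 5, Figure 2) can be produced with pandas and seaborn, reusing the hypothetical `df` and `numeric_cols` from the preprocessing sketches:

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Equation (5): Pearson correlation between all numeric feature pairs.
corr = df[numeric_cols].corr()

# Heatmap comparable to Figure 2.
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Correlation heatmap")
plt.show()
```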


2.2.2. Scatter Matrix and Histograms

Figure 3 shows the scatter matrix for the numerical features, providing insights into pairwise relationships between different variables. This visualization helps identify correlations between features and their influence on failure mode prediction.

Figure 3. Scatter Matrix.

Figure 4 presents the histograms for key numerical features. These histograms show the distribution of each feature, assisting in identifying any anomalies or outliers in the data that could affect the results of the machine learning models.

Figure 4. Histograms of Numerical Features.
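A corresponding sketch for the scatter matrix and histograms, again reusing the hypothetical `df`:

```python
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

# Pairwise relationships between numeric features (Figure 3).
scatter_matrix(df[numeric_cols], figsize=(10, 10), diagonal="hist")
plt.show()

# Per-feature distributions, useful for spotting outliers (Figure 4).
df[numeric_cols].hist(bins=20, figsize=(10, 8))
plt.show()
```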



2.3. Feature Selection and Dataset Splitting

2.3.1. Defining Features and Target

The features (X) and the target value (y) were specified as follows (24):

$X = \{X_1, X_2, \ldots, X_n\}$ and $y = \text{Severity}$

2.3.2. Train-Test Split

We split the dataset into training and testing sets with an 80/20 ratio (25):

$X_{\text{train}}, X_{\text{test}}, y_{\text{train}}, y_{\text{test}} = \text{train\_test\_split}(X, y, \text{test\_size} = 0.2)$  (6)

Table 4. Train-Test Split.
| Set | Number of Samples |
|---|---|
| Train | 240 |
| Test | 60 |

Table 4 summarises the distribution of data between the training and testing sets: 80% of the data was allocated for model training, while the remaining 20% was set aside for testing. This approach ensures that the models are assessed on previously unseen data, enhancing their generalization capabilities.

2.4. Predictive Modeling

2.4.1. Machine Learning Models

The following models are trained and evaluated for fault detection:

• Logistic Regression:
The logistic regression equation used to predict the probability of a binary outcome through the logistic function is described below (26):

$P(y = 1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n)}}$  (7)

where $P(y = 1 \mid X)$ is the probability of failure mode, $\beta_0$ is the intercept, and $\beta_1, \beta_2, \ldots, \beta_n$ are the coefficients of the features $X_1, X_2, \ldots, X_n$.

• Decision Tree Classifier:
A decision tree splits the feature space recursively based on conditions that maximize the information gain or reduce the Gini impurity. The Gini impurity is calculated at each node to determine the quality of the split (27):

$\text{Gini Impurity} = 1 - \sum_{i=1}^{n} P_i^2$  (8)

where $P_i$ is the probability of class i in a particular node. The algorithm chooses the split that minimizes the Gini impurity across all possible splits.

Entropy (for Information Gain): Alternatively, entropy is used in some decision trees (e.g., the ID3 algorithm) (28):

$\text{Entropy} = -\sum_{i=1}^{n} P_i \log_2(P_i)$  (9)

The best split is the one that maximizes the information gain (29):

$\text{Information Gain} = \text{Entropy(parent)} - \sum_{k} \frac{|T_k|}{|T|}\,\text{Entropy}(T_k)$  (10)

• Random Forest Classifier:
A Random Forest is an aggregation of decision trees in which multiple trees are trained on bootstrapped data samples.

Bootstrap Sampling: Each tree is trained on a random subset of the data, and each split in the tree is chosen from a random subset of features. This introduces diversity into the trees and reduces overfitting.

Prediction: Predictions are generated by calculating the average of the projections from all individual decision trees in the case of regression or by determining the majority vote in the case of classification (30):

$\hat{y} = \frac{1}{T}\sum_{t=1}^{T} \hat{y}_t$  (11)

In this context, $\hat{y}_t$ represents the prediction generated by tree t, while T represents the total count of trees included in the analysis.

• Support Vector Machine (SVM):
SVM aims to find the optimal hyperplane that maximizes the margin between the two classes. It uses the kernel trick for nonlinear cases to project data into higher dimensions (31).

Primal Problem:

$\min_{w,b} \frac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad y_i(w^T x_i + b) \ge 1$  (12)

Here, w is the normal vector to the hyperplane, $x_i$ are the feature vectors, and $y_i$ are the class labels.

Dual Problem (for Kernel SVM) (32):

$\max_{\alpha} \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$  (13)

where $\alpha_i$ are the Lagrange multipliers and $K(x_i, x_j)$ is the kernel function (e.g., radial basis function or polynomial kernel).
• K-Nearest Neighbors (KNN):
KNN, a non-parametric technique, classifies data points by the majority class of their K nearest neighbors.

Distance Metric: KNN utilizes the Euclidean distance to measure neighbor proximity (33):

$d(x, x') = \sqrt{\sum_{i=1}^{n}(x_i - x_i')^2}$  (14)

where $x_i$ and $x_i'$ are feature values of the test and training points, respectively.

Prediction: The predicted class $\hat{y}$ is determined by a majority vote among the k nearest neighbors:

$\hat{y} = \text{mode}(y_1, y_2, \ldots, y_k)$  (15)

• Gradient Boosting:
Gradient Boosting builds decision trees successively, each correcting the faults of its predecessors. This is done by minimizing a loss function using gradient descent.

Loss Function: Let $L(y, \hat{y})$ be the loss function, such as the mean squared error used in regression analysis. Gradient Boosting minimizes the residuals by fitting decision trees to the negative gradient of the loss function (34):

$\hat{y}_{m+1} = \hat{y}_m + \nu \sum_{i=1}^{N} \gamma_i h(x_i)$  (16)

where $\hat{y}_m$ is the current prediction, $h(x_i)$ is the decision tree, $\gamma_i$ are the step sizes (learning rates), and $\nu$ is a scaling factor.

Final Prediction: After M boosting iterations, the final prediction is (35):

$\hat{y} = \sum_{m=1}^{M} \nu\, h_m(x)$  (17)

• Neural Networks (MLP Classifier):
Multi-layer Perceptron (MLP) neural networks have several layers of nodes, and neurons in one layer link to those in the next (36).

Forward Propagation: For a single layer of the network:

$a^{(l+1)} = \sigma(W^{(l)} a^{(l)} + b^{(l)})$  (18)

where $W^{(l)}$ are the weights, $a^{(l)}$ is the activation of layer l, $b^{(l)}$ are the biases, and $\sigma$ is the activation function (e.g., sigmoid or ReLU).

Backpropagation: During training, the network uses backpropagation to adjust weights and biases according to the gradient of the loss (37):

$\nabla W^{(l)} = \frac{\partial L}{\partial W^{(l)}}$  (19)

where $\nabla W^{(l)}$ is the loss gradient with respect to the weights at layer l, and L is the loss function (e.g., cross-entropy for classification).

Loss Function: For classification, the expected loss function is cross-entropy (38):

$L = -\sum_{i=1}^{n} y_i \log(\hat{y}_i)$  (20)

where $y_i$ is the actual label and $\hat{y}_i$ is the predicted probability of class i.

The results of different models, such as Random Forest and Neural Networks, were combined using an Ensemble Voting approach to achieve the best classification outcome. After training each model separately on the dataset, the outputs of the models were merged based on the highest accuracy achieved by each method. The voting principle was applied to determine the final classification, helping to improve prediction accuracy and reduce potential errors.

2.4.2. Model Evaluation Metrics

In addition to accuracy and ROC AUC, metrics such as precision, recall, and F1-score were calculated to evaluate the models' classification performance comprehensively. These metrics are essential for assessing how well the models perform across the different classes of failure modes.

• Precision measures the accuracy of the positive predictions made by the model. It is defined as the ratio of true positive predictions to the total predicted positives (39):

$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$  (21)

• Recall (Sensitivity or True Positive Rate) measures the model's ability to identify all positive instances correctly. It is defined as the ratio of true positive predictions to the total actual positives (40):

$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$  (22)

• F1-score is the harmonic mean of precision and recall, providing a metric that balances both concerns. It is beneficial when the classes are imbalanced:

$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$  (23)

These metrics allow for a more detailed evaluation of the models' performance, especially when there is an imbalance in the classes or when the costs of false positives and false negatives are significant.

Table 5 below summarizes the models' performance based on accuracy, precision, recall, and F1-score, providing a clearer view of their classification abilities across different failure modes.
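A condensed sketch of the training and evaluation pipeline described in Sections 2.3-2.4, including the voting ensemble; the hyperparameters follow the best values later reported in Table 6, and the exact configurations used in the study are not published:

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report

# Features X and target y as defined in Section 2.3.1 (y = Severity).
y = df["Severity"]
X = df.drop(columns=["Severity"])

# Section 2.3.2, equation (6): 80/20 train-test split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200, max_depth=15),
    "Neural Network": MLPClassifier(hidden_layer_sizes=(64, 64, 64),
                                    learning_rate_init=0.001),
}

# Precision, recall, and F1 per class (equations 21-23).
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name)
    print(classification_report(y_test, model.predict(X_test)))

# Majority-vote ensemble over the individual classifiers, as described above.
ensemble = VotingClassifier(estimators=list(models.items()), voting="hard")
ensemble.fit(X_train, y_train)
print("Ensemble accuracy:", ensemble.score(X_test, y_test))
```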



Table 5. Model Performance.
| Model | Accuracy | ROC AUC | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| Logistic Regression | 85.67% | 0.88 | 0.83 | 0.80 | 0.81 |
| Decision Tree | 87.33% | 0.85 | 0.86 | 0.85 | 0.85 |
| Random Forest | 95.00% | 0.93 | 0.92 | 0.94 | 0.93 |
| K-Nearest Neighbors | 84.67% | 0.77 | 0.75 | 0.78 | 0.76 |
| Support Vector Machine | 94.33% | 0.91 | 0.89 | 0.91 | 0.90 |
| Gradient Boosting | 94.67% | 0.91 | 0.90 | 0.92 | 0.91 |
| Neural Network | 96.00% | 0.92 | 0.93 | 0.95 | 0.94 |

2.5. Cross-Validation

We use Stratified K-Fold Cross-Validation to evaluate models on multiple data splits, ensuring equal class proportions in each fold (41):

$\text{CV Accuracy} = \frac{1}{k}\sum_{i=1}^{k} \text{Accuracy}_i$  (24)

where k is the number of folds.
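A sketch of equation (24) with scikit-learn's StratifiedKFold, assuming k = 5 to match the 5-fold tuning described in Section 2.6:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Equation (24): mean accuracy over k stratified folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(RandomForestClassifier(), X, y, cv=cv, scoring="accuracy")
print("CV accuracy:", scores.mean())
```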
2.6. Hyperparameter Tuning

Each model's essential hyperparameters were grid-searched to optimize performance. In Random Forest, the number of trees (n_estimators) and maximum tree depth (max_depth) were adjusted, whereas Neural Networks optimized hidden layers and neurons per layer. The tuning method was 5-fold cross-validated to avoid overfitting.

Table 6 outlines the specific hyperparameters tuned for each model and the optimal values found through cross-validation. This process allowed the models to generalize unseen data better.

Table 6. Hyperparameters Tuned and Best Values.
| Model | Hyperparameters Tuned | Best Values |
|---|---|---|
| Logistic Regression | Regularisation strength (C) | 0.1 |
| Random Forest | Number of trees (n_estimators), Max depth (max_depth) | 200, 15 |
| Neural Networks | Number of layers, Neurons per layer, Learning rate | 3 layers, 64 neurons, 0.001 |
| Gradient Boosting | Learning rate, Number of boosting steps (n_estimators) | 0.05, 300 |
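A sketch of the grid search for the Random Forest; the candidate grids below are assumptions, chosen so that they contain the best values reported in Table 6:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Assumed candidate grids; Table 6 reports n_estimators=200, max_depth=15.
param_grid = {"n_estimators": [100, 200, 300], "max_depth": [5, 10, 15]}
search = GridSearchCV(RandomForestClassifier(), param_grid, cv=5,
                      scoring="accuracy")
search.fit(X_train, y_train)
print(search.best_params_)
```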

The methodology presented in this study outlines a comprehensive approach to failure mode prediction in electric vehicles, starting from data collection and preprocessing to predictive modeling and evaluation. By employing various machine learning models and techniques such as feature scaling, encoding, and cross-validation, this methodology ensures that the dataset is optimally prepared for analysis and that the models are rigorously tested for robustness. Advanced models like Random Forest and Neural Networks, alongside simpler interpretable models like Logistic Regression, provide a balanced approach to performance and interpretability. Additionally, hyperparameter tuning and cross-validation further enhanced model generalization and predictive accuracy.

3. Results & Discussion

3.1. Overview of Model Performance and Study Contribution

This section analyses and discusses the results of applying various machine-learning models to the dataset. The models evaluated in this analysis consist of Random Forest, Neural Networks, Logistic Regression, and K-Nearest Neighbours (KNN). Performance metrics, including accuracy, confusion matrices, and ROC AUC, were utilized for comparative analysis. Visual representations, including plots, confusion matrices, and tables, are presented for comprehensive analysis.

This study examines the performance of multiple machine learning models for classifying failure modes in electric vehicles (EVs), categorized into four severity classes:
- Class 0: Operational – Normal operation without faults.
- Class 1: Minor Fault – Issues that do not require immediate attention.
- Class 2: Critical Fault – Issues that need attention but are not immediately dangerous.
- Class 3: Severe Failure – Serious faults requiring urgent maintenance.

The models were evaluated based on accuracy, ROC AUC, confusion matrices, and the time required for training and prediction. Neural Networks and Random Forest demonstrated the highest overall accuracy (96.00% and 95.00%, respectively), with perfect performance in Class 3 (Severe Failure) based on ROC AUC values of 1.00. Models like Naive Bayes and AdaBoost struggled with lower accuracy rates and ROC AUC values, particularly in the minor fault categories.

Table 7 below summarises the key performance metrics for each model.



Table 7. Performance Comparison of Machine Learning Models.
| Model | Accuracy (%) | ROC AUC (Class 0) | ROC AUC (Class 1) | ROC AUC (Class 2) | ROC AUC (Class 3) | Training Time (s) | Test Time (s) |
|---|---|---|---|---|---|---|---|
| Logistic Regression | 85.67 | 0.78 | 0.65 | 0.72 | 0.80 | 0.75 | 0.25 |
| Decision Tree | 87.33 | 0.85 | 0.75 | 0.77 | 0.88 | 1.50 | 0.50 |
| Random Forest | 95.00 | 0.95 | 0.92 | 0.93 | 1.00 | 2.50 | 0.75 |
| K-Nearest Neighbors | 84.67 | 0.77 | 0.68 | 0.75 | 0.79 | 1.00 | 0.40 |
| Support Vector Machine | 94.33 | 0.90 | 0.85 | 0.89 | 0.98 | 3.00 | 1.00 |
| Naive Bayes | 80.33 | 0.70 | 0.61 | 0.63 | 0.76 | 0.50 | 0.20 |
| Gradient Boosting | 94.67 | 0.92 | 0.88 | 0.90 | 0.98 | 5.00 | 1.25 |
| AdaBoost | 78.00 | 0.60 | 0.55 | 0.52 | 0.65 | 4.00 | 1.10 |
| Neural Network | 96.00 | 0.96 | 0.91 | 0.94 | 0.99 | 10.00 | 1.50 |

3.2. Accuracy and Confusion Matrix Analysis

The confusion matrix, shown in Figure 5, provides a detailed breakdown of correct and incorrect predictions, highlighting areas where the model excels or struggles in predicting failure modes (42). The confusion matrix helps evaluate classification errors. For each model, the matrix is computed as:

$\text{Confusion Matrix} = \begin{bmatrix} TP & FP \\ FN & TN \end{bmatrix}$  (25)

Figure 5. Performance of Classification Algorithms – Confusion Matrices.
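A sketch of how the per-model confusion matrices in Figure 5 can be computed and plotted, reusing the fitted `models` from the earlier training sketch:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Confusion matrix over the four severity classes for one fitted model.
cm = confusion_matrix(y_test, models["Random Forest"].predict(X_test))
ConfusionMatrixDisplay(cm, display_labels=[0, 1, 2, 3]).plot(cmap="Blues")
plt.show()
```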



Table 8. Accuracy and Confusion Matrix Comparison Across Models.
| Model | Accuracy (%) | Confusion Matrix |
|---|---|---|
| Logistic Regression | 96.67 | See Figure 5 |
| Decision Tree | 96.67 | See Figure 5 |
| K-Nearest Neighbors | 95.00 | See Figure 5 |
| Support Vector Machine | 95.00 | See Figure 5 |
| Naive Bayes | 95.00 | See Figure 5 |
| Gradient Boosting | 96.67 | See Figure 5 |
| AdaBoost | 25.00 | See Figure 5 |
| Neural Networks | 95.00 | See Figure 5 |

Logistic Regression and Gradient Boosting exhibit the highest accuracy, while Random Forest and Neural Networks perform closely with 95% accuracy. Figure 5 shows Random Forest's confusion matrix, where we observe a low number of false positives, indicating this model's suitability for failure prediction. Additionally, Table 8 presents a detailed comparison of accuracy and performance metrics across all models, providing further insights into each model's effectiveness.

The confusion matrices have been updated with different color schemes to highlight the performance of each classification algorithm. The image title is "Performance of Classification Algorithms – Confusion Matrices."

Here are the insights based on the confusion matrices:
- Logistic Regression: The model exhibits balanced misclassification across all classes, particularly struggling with correctly classifying class 1 and class 2.
- Decision Tree: The model performs well in predicting class 3, with 15 correct predictions, but it struggles in other areas, possibly due to overfitting.
- Random Forest: The algorithm manages a relatively good classification balance, especially for class 0 and class 1, though it makes some misclassifications in class 3.
- K-Nearest Neighbors: This model shows considerable errors across different classes, particularly with classes 0 and 2, indicating difficulties in proper classification.
- Support Vector Machine: The SVM performs very well in class 3, but it misclassifies many instances in class 0 and class 1, which impacts its overall performance.
- Naive Bayes: The model shows a higher misclassification rate for classes 1 and 2, struggling to separate these categories.
- Gradient Boosting: It provides moderate performance, with reasonable classification in most categories, but still shows room for improvement, particularly in class 1 and class 2.
- AdaBoost: This model's performance is similar to Gradient Boosting, showing consistent challenges in classifying class 1 correctly.
- Neural Network: The model performs very well in class 3, making 16 correct predictions, but it struggles with class 1, which suggests some tuning is required for better balance.

Each classification algorithm displays both advantages and disadvantages, depending on the type of data and the complexity of the classification task.

3.3. Accuracy and ROC AUC Comparison

Figure 6 shows each model's accuracy and ROC AUC in the same plot, allowing for a comprehensive model performance evaluation.

Figure 6. Model Accuracy and ROC AUC Comparison (Line Plot).

This dual-line plot provides a holistic view of model performance by juxtaposing accuracy with ROC AUC scores. It emphasizes that higher accuracy often correlates with better AUC performance, showcasing robust models across multiple evaluation metrics.

3.4. Error Analysis

We provide a classification report with accuracy, recall, and F1-score for each model to evaluate performance.



Figure 7. Error Analysis (Precision, Recall, F1-Score).

Figure 7 details precision, recall, and F1-score for every class. It identifies strengths and weaknesses in model predictions, indicating where improvements can be made.

3.5. Interpretability vs. Performance

Understanding the trade-off between interpretability and performance is crucial for practical applications. The following scatter plot illustrates this relationship.

Figure 8. Interpretability vs. Performance for Machine Learning Models.


The scatter plot in Figure 8 highlights the inverse relationship between interpretability and performance. Models that are easier to interpret, such as Logistic Regression and Decision Trees, may not always provide the highest performance. In contrast, more complicated models, such as neural networks, provide higher accuracy but are more challenging to understand.



3.6. ROC Curves

Figure 9 plots the ROC curves for the top models, including Random Forests and Neural Networks. The curves indicate the TPR-FPR relationship, demonstrating the models' ability to distinguish between failure and normal modes. Random Forests and Neural Networks exhibit near-perfect ROC curves, indicating their high performance in accurately predicting failures.

The following plots present ROC curves for each model across multiple classes, allowing for a comparative evaluation of model performance in distinguishing between different failure modes.

Figure 9. ROC Curves for Multi-Class Classification.

Figure 9 presents the ROC curves for all models across multiple classes. Each curve shows the model's ability to discriminate failure types, while AUC values show how well it predicts classes.
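A sketch of the one-vs-rest ROC curves in Figure 9, reusing the fitted Random Forest from the training sketch; any classifier exposing `predict_proba` would work:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc
from sklearn.preprocessing import label_binarize

# One-vs-rest ROC curve per severity class, as in Figure 9.
classes = [0, 1, 2, 3]
y_bin = label_binarize(y_test, classes=classes)
proba = models["Random Forest"].predict_proba(X_test)

for i, cls in enumerate(classes):
    fpr, tpr, _ = roc_curve(y_bin[:, i], proba[:, i])
    plt.plot(fpr, tpr, label=f"Class {cls} (AUC = {auc(fpr, tpr):.2f})")
plt.plot([0, 1], [0, 1], "k--")  # chance level
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```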



Figure 10. Model Performance Based on Average AUC Across Multiple Classes.

Figure 10 illustrates the average AUC (Area Under the Curve) performance of different machine learning models across several classes. The pie chart shows how different models contribute to classification accuracy based on their AUC values. Neural Networks and SVM lead with the highest average AUC values, demonstrating superior classification capabilities, particularly in complex or high-dimensional data. Neural Networks have an average AUC of 0.67, and SVM follows closely with 0.66, indicating their reliability in distinguishing between classes.

In contrast, Naive Bayes and Decision Tree show lower contributions, with average AUC values of 0.40 and 0.44, respectively, highlighting their limitations in handling more complex data structures. Ensemble methods like Gradient Boosting and Random Forest offer balanced performance, making them versatile models for various classification tasks. This visualization emphasizes each model's comparative strengths and weaknesses, providing insight into which algorithms are more suited for accurate class prediction.

3.7. Comparative Analysis with Previous Studies

This study advances the field by incorporating multi-class failure detection, offering more profound insights into EV failure modes. Prior studies typically focused on binary classification, which limited their application in real-world scenarios. Table 9 presents a Comparative Analysis of Results with Previous Studies, highlighting the advancements made in this research:
Table 9. Comparative Analysis of Results with Previous Studies.
| Study | Dataset Size | Model | Accuracy | ROC AUC (Critical Failure) | Key Contributions |
|---|---|---|---|---|---|
| (43) | 10,000 | Decision Tree | 80% | 0.75 | Binary classification; limited scope for multi-class detection |
| (44) | 5,000 | Naive Bayes | 72% | 0.65 | Focused on simple binary classification; no attention to failure severity |
| This Study (2024) | 300 | Random Forest, NN | 96.67% | 1.00 | First multi-class classification of failure severity using ensemble methods and neural networks |

The improvements in accuracy and ROC AUC, especially for critical and severe failure detection, demonstrate the superior performance of ensemble methods and neural networks compared to traditional models like Decision Trees and Naive Bayes.

Finally, Random Forest, Support Vector Machine, and Neural Networks anticipate electric car failure modes well. These models are accurate yet sophisticated, which may restrict their use. Logistic Regression and Decision Trees are appropriate for real-world applications that need forecast transparency. Objective function analysis shows how models react to input characteristics, facilitating assessment.

4. Conclusion

This study has achieved significant advancements in electric vehicle (EV) maintenance by integrating advanced machine learning techniques for detecting multi-class faults, offering an enhanced understanding of the mechanisms underlying EV failure. This study distinguishes itself from previous research, which focused solely on binary classification, by demonstrating the effectiveness of multi-class classification, particularly in predicting critical and severe faults; random forests and neural networks showed superior performance to conventional models, including decision trees and Naive Bayes, with an accuracy rate of 96.67%. The results indicate a significant improvement in the accuracy of predictive maintenance. This advancement facilitates a reduction in operational costs and unplanned downtime while also contributing to the longevity of electric vehicles. This study emphasizes the significance of employing contemporary techniques in predictive maintenance, highlighting that ensemble-based models and neural networks demonstrate clear advantages in managing multidimensional and complex data. The superiority was validated through enhancements in the ROC AUC values and accuracy, particularly in detecting critical faults (refer to Table 9). This advancement improves the reliability of electric vehicles and facilitates the application of big data analytics in formulating maintenance strategies grounded in precise and timely predictions. This, in turn, promotes vehicle sustainability and mitigates both economic and environmental impacts.

The findings of this study establish a foundation for subsequent research aimed at broadening the range of data utilized and incorporating real-time sensor systems to enhance the precision and dependability of predictive models. Future studies may investigate the integration of hybrid models that amalgamate various machine learning techniques, thereby improving performance and facilitating significant advancements in detecting more intricate failure patterns. Continuing this approach enables the accelerated advancement of contemporary technologies and situates electric vehicle maintenance within a future-oriented framework emphasizing proactive maintenance driven by artificial intelligence.

References

1. Talukdar BK, Deka BC, Goswami AK. An Approach to Reliability Modeling and Availability Analysis of a Solar Electric Vehicle With
Standby Plug‐in Facility. International Transactions on Electrical Energy Systems. 2021;31(12). https://doi.org/10.1002/2050-7038.13147
2. Barkenbus JN. Prospects for Electric Vehicles. Sustainability. 2020;12(14):5813. https://doi.org/10.3390/su12145813
3. Raja HA, Kudelina K, Asad B, Vaimann T, Kallaste A, Rassõlkin A, Khang HV. Signal Spectrum-Based Machine Learning Approach for
Fault Prediction and Maintenance of Electrical Machines. Energies. 2022;15(24):9507. https://doi.org/10.3390/en15249507
4. Shu X, Guo Y, Yang W, Wei K, Zhu Y, Zou HX. A Detailed Reliability Study of the Motor System in Pure Electric Vans by the Approach
of Fault Tree Analysis. Ieee Access. 2020;8:5295-307. https://doi.org/10.1109/ACCESS.2019.2963197
5. Tian X, Li X, Chi Y. Reliability Study of Electric Vehicle Drive Motor Control System. International Journal of New Developments in
Engineering and Society. 2023;7(3). https://doi.org/10.25236/IJNDES.2023.070306
6. Rao PS, Yaqoob SI, Ahmed MA, Abdinabievna PS, Yaseen SM, Arumugam M. Integrated artificial intelligence and predictive maintenance
of electric vehicle components with optical and quantum enhancements. Optical and Quantum Electronics. 2023;55(10):855.
https://doi.org/10.1007/s11082-023-05135-7
7. Jain M, Vasdev D, Pal K, Sharma V. Systematic Literature Review on Predictive Maintenance of Vehicles and Diagnosis of Vehicle's
Health Using Machine Learning Techniques. Computational Intelligence. 2022;38(6):1990-2008. https://doi.org/10.1111/coin.12553
8. Shiva K, Etikani P, Bhaskar VVSR, Mittal A, Dave A, Thakkar D, et al. Anomaly detection in sensor data with machine learning: Predictive
maintenance for industrial systems. Journal of Electrical Systems. 2024;20(10s):454-62.
9. Peres RS, Rocha AD, Leitão P, Barata J. IDARTS – Towards Intelligent Data Analysis and Real-Time Supervision for Industry 4.0.
Computers in Industry. 2018;101:138-46. https://doi.org/10.1016/j.compind.2018.07.004
10. Silvestrin LP, Hoogendoorn M, Koole G. A Comparative Study of State-of-the-Art Machine Learning Algorithms for Predictive Maintenance. 2019. https://doi.org/10.1109/SSCI44817.2019.9003044
11. Güven Ö, Şahin H. Predictive Maintenance Based on Machine Learning in Public Transportation Vehicles. Mühendislik Bilimleri Ve Araştırmaları Dergisi. 2022;4(1):89-98. https://doi.org/10.46387/bjesr.1093519
12. Hegde NS. Predictive Analysis of Faults in Electric Vehicles. Epra International Journal of Multidisciplinary Research (Ijmr). 2023:409-
12. https://doi.org/10.36713/epra13443
13. Li X, Chang H, Wei R, Huang S, Chen SM, Ouyang D. Online Prediction of Electric Vehicle Battery Failure Using LSTM Network.
Energies. 2023;16(12):4733. https://doi.org/10.3390/en16124733
14. Sheikh A, Chong EK, Simske SJ. Enhancing Defect Detection in Circuit Board Assembly Using AI and Text Analytics for Component
Failure Classification. IEEE Transactions on Components, Packaging and Manufacturing Technology. 2024.
https://doi.org/10.1109/TCPMT.2024.3453597
15. Stepanić P, Dučić N, Stanković N. Development of Artificial Neural Network models for vibration classification in machining process on Brownfield CNC machining center. Journal of Production Engineering. 2024: in press.
16. Ghelani D. Harnessing machine learning for predictive maintenance in energy infrastructure: A review of challenges and solutions.
International Journal of Science and Research Archive. 2024;12(2):1138-56. https://doi.org/10.30574/ijsra.2024.12.2.0525
17. Lorenti L, Dalle Pezze D, Andreoli J, Masiero C, Gentner N, Yang Y, Susto GA, editors. Predictive maintenance in the industry: A
comparative study on deep learning-based remaining useful life estimation. 2023 IEEE 21st International Conference on Industrial
Informatics (INDIN); 2023: IEEE. https://doi.org/10.1109/INDIN51400.2023.10218065
18. Jamshidian M, Mata M. Advances in Analysis of Mean and Covariance Structure when Data are Incomplete. In: Lee S-Y, editor. Handbook of Latent Variable and Related Models. Amsterdam: North-Holland; 2007. p. 21-44. https://doi.org/10.1016/B978-044452044-9/50005-7
19. Memon SM, Wamala R, Kabano IH. A comparison of imputation methods for categorical data. Informatics in Medicine Unlocked.
2023;42:101382.
20. Long JS, Freese J. Regression models for categorical dependent variables using Stata: Stata press; 2006.
21. Thomas J. Preprocessing. Applied Machine Learning Using mlr3 in R: Chapman and Hall/CRC; 2024. p. 196-210.
https://doi.org/10.1201/9781003402848-9
22. Kresse W, Danko DM, Fadaie K. Standardization. Springer handbook of geographic information. 2012:245-71.
https://doi.org/10.1007/978-3-540-72680-7_13
23. Hall MA. Correlation-based feature selection of discrete and numeric class machine learning. 2000.
24. Khan DA, Arshad A, Ali Z, editors. Performance analysis of machine learning techniques for load forecasting. 2021 16th International
Conference on Emerging Technologies (ICET); 2021: IEEE. https://doi.org/10.1109/ICET54505.2021.9689903
25. Alves RHF, de Deus Junior GA, Marra EG, Lemos RP. Automatic fault classification in photovoltaic modules using Convolutional Neural
Networks. Renewable Energy. 2021;179:502-16. https://doi.org/10.1016/j.renene.2021.07.070
26. Harrell J, Frank E, Harrell FE. Binary logistic regression. Regression modeling strategies: With applications to linear models, logistic and
ordinal regression, and survival analysis. 2015:219-74. https://doi.org/10.1007/978-3-319-19425-7_10
27. Grabmeier JL, Lambe LA. Decision trees for binary classification variables grow equally with the Gini impurity measure and Pearson's
chi-square test. International journal of business intelligence and data mining. 2007;2(2):213-26.
https://doi.org/10.1504/IJBIDM.2007.013938
28. Barros RC, de Carvalho AC, Freitas AA, Barros RC, de Carvalho AC, Freitas AA. Decision-tree induction. Automatic Design of Decision-
Tree Induction Algorithms. 2015:7-45. https://doi.org/10.1007/978-3-319-14231-9_2
29. Henriques LO. Fault detection and prediction for a wood chip screw conveyor 2024. https://doi.org/10.17531/ein/189323
30. Huang Z. Intelligent Decision‐Making Model of Enterprise Management Based on Random Forest Algorithm. Mobile Information Systems.
2022;2022(1):6184106.
31. Suykens JA, editor Nonlinear modelling and support vector machines. IMTC 2001 proceedings of the 18th IEEE instrumentation and
measurement technology conference Rediscovering measurement in the age of informatics (Cat No 01CH 37188); 2001: IEEE.
32. Patle A, Chouhan DS, editors. SVM kernel functions for classification. 2013 International Conference on Advances in Technology and Engineering (ICATE); 2013: IEEE. https://doi.org/10.1109/ICAdTE.2013.6524743
33. Chacón AMP, Ramirez IS, Márquez FPG. False alarm detection in wind turbine by classification models. Advances in Engineering
Software. 2023;177:103409.
34. Sulaiman S, Mir MH, editors. Battery Range Estimation in Electric Vehicles Using Machine Learning and Deep Learning Techniques.
2024 IEEE International Conference on Information Technology, Electronics and Intelligent Communication Systems (ICITEICS); 2024:
IEEE.
35. Wang J, Li P, Ran R, Che Y, Zhou Y. A short-term photovoltaic power prediction model based on the gradient boost decision tree. Applied
Sciences. 2018;8(5):689. https://doi.org/10.3390/app8050689
36. Biem A. Neural Networks: A Review. Data classification: Algorithms and applications. 2014:205-44.
37. Chung H, Lee SJ, Park JG, editors. Deep neural network using trainable activation functions. 2016 International Joint Conference on
Neural Networks (IJCNN); 2016: IEEE.
38. Gajowniczek K, Liang Y, Friedman T, Ząbkowski T, Van den Broeck G. Semantic and generalized entropy loss functions for semi-
supervised deep learning. Entropy. 2020;22(3):334. https://doi.org/10.3390/e22030334
39. Wang Y, Xie B, Wan F, Xiao Q, Dai L. Application of ROC curve analysis in evaluating the performance of alien species’ potential
distribution models. Biodiversity Science. 2007;15(4):365. https://doi.org/10.1360/biodiv.060280
40. Meisner A, Carone M, Pepe MS, Kerr KF. Combining biomarkers by maximizing the true positive rate for a fixed false positive rate.
Biometrical Journal. 2021;63(6):1223-40. https://doi.org/10.1002/bimj.202000210
41. Raschka S. Model evaluation, model selection, and algorithm selection in machine learning. arXiv preprint arXiv:181112808. 2018.
42. Heydarian M, Doyle TE, Samavi R. MLCM: Multi-label confusion matrix. IEEE Access. 2022;10:19083-95.
https://doi.org/10.1109/ACCESS.2022.3151048
43. Jaleel M, Amira A, Malekmohamadi H, editors. Classification of Gas Sensor Data Using Multiclass SVM. Science and Information
Conference; 2023: Springer. https://doi.org/10.1007/978-3-031-37717-4_88
44. Vichare SS. Probabilistic Ensemble Machine Learning Approaches for Unstructured Textual Data Classification. Purdue University Graduate School; 2024.

