Improving Electric
a Environment Laboratory, Institute of Mines, Echahid Cheikh Larbi Tebessi University, Tebessa 12002, Algeria
Highlights

▪ Machine learning enhances predictive maintenance for electric vehicles.
▪ Advanced algorithms identify potential failure modes before they occur.
▪ Improved efficiency and reduced downtime through accurate failure predictions.
▪ Optimised maintenance schedules based on real-time vehicle performance data.
▪ Data-driven insights extend the lifespan of electric vehicle components.

Abstract

This study stands out for its novelty, offering an alternative solution to traditional methods for analyzing failure modes and their effects. We utilized machine learning techniques, which have enabled a significant shift in the predictive maintenance of electric vehicles. We performed numerous tests and evaluations of advanced models such as random forests, decision trees, logistic regression, and neural networks, where random forests and neural networks achieved exceptional accuracy of 96.67%. This breakthrough improves fault prediction accuracy, reduces operational costs, and minimizes downtime by combining numerical and categorical data. The study focuses on the transformative potential of machine learning, enhancing the reliability, lifespan, and maintenance of electric vehicles through a data-driven approach. The main innovation of this study lies in integrating multiple models, such as Random Forest and Neural Networks, to analyze failures in electric vehicles. While previous studies typically relied on traditional techniques like decision trees or regression analysis, our research presents a multi-layered approach, enabling the models to detect more complex patterns and improve prediction accuracy. Moreover, we incorporate real-world data collected from electric vehicle sensors, which allows the model to make precise predictions in real-world operational environments. This approach significantly advances previous studies, which primarily relied on simulated data or isolated models.

Keywords

predictive maintenance, machine learning, electric vehicles, random forests, neural networks, advanced data

This is an open access article under the CC BY license (https://creativecommons.org/licenses/by/4.0/).
1. Introduction
The rapid transition towards electric vehicles (EVs) presents new challenges in maintaining the reliability and performance of these vehicles (1). Unlike traditional internal combustion engine (ICE) vehicles, EVs rely on complex electronic components, such as high-voltage batteries, electric motors, and associated control systems, introducing new failure modes that traditional maintenance methods cannot fully address (2-4). These failures can significantly impact operational efficiency, leading to unplanned downtime and higher maintenance costs (5, 6). To mitigate these issues, predictive maintenance using machine learning (ML) techniques has become a pivotal strategy in the automotive industry, including for EVs, to anticipate failure modes before they occur.
2.4.1. Machine Learning Models

The following models are trained and evaluated for fault detection:

• Logistic Regression:

The logistic regression equation used to predict the probability of a binary outcome through the logistic function is described below (26):

$$P(y = 1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n)}} \quad (7)$$

where $P(y = 1 \mid X)$ is the probability of the failure mode, $\beta_0$ is the intercept, and $\beta_1, \beta_2, \dots, \beta_n$ are the coefficients of the features $X_1, X_2, \dots, X_n$.
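To make Eq. (7) concrete, here is a minimal sketch, not the study's actual pipeline, that fits scikit-learn's LogisticRegression on invented placeholder data and reproduces the predicted failure probability from the learned coefficients:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                    # placeholder sensor features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # synthetic binary failure label

model = LogisticRegression().fit(X, y)

# Eq. (7): P(y=1|X) is the sigmoid of beta_0 + beta_1*X_1 + ... + beta_n*X_n
z = model.intercept_ + X @ model.coef_.ravel()
p_manual = 1.0 / (1.0 + np.exp(-z))

# Matches the library's predict_proba up to numerical precision
assert np.allclose(p_manual, model.predict_proba(X)[:, 1])
```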
• Decision Tree Classifier:

A decision tree splits the feature space recursively based on conditions that maximize the information gain or reduce the Gini impurity. The Gini impurity is calculated at each node to determine the quality of the split (27):

$$Gini = 1 - \sum_{i=1}^{C} p_i^2$$

where $p_i$ is the proportion of samples of class $i$ at the node.

• Random Forest:

The final prediction of the ensemble is obtained by combining the outputs of the individual decision trees. In this context, $\hat{y}_t$ represents the prediction generated by tree $t$, while $T$ represents the total count of trees included in the analysis.
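The sketch below, on invented data rather than the study's dataset, trains a Gini-based decision tree and a random forest, and checks that the forest's class-probability estimate is the average of the per-tree estimates, mirroring the ensemble prediction described above:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))              # placeholder features
y = (X[:, 0] * X[:, 1] > 0).astype(int)    # nonlinear synthetic label

# Gini impurity is the default split criterion
tree = DecisionTreeClassifier(criterion="gini", random_state=1).fit(X, y)

forest = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, y)

# Ensemble prediction: average the class probabilities over the T trees
probs = np.mean([t.predict_proba(X) for t in forest.estimators_], axis=0)
assert np.allclose(probs, forest.predict_proba(X))
```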
• Support Vector Machine (SVM):

SVM aims to find the optimal hyperplane that maximizes the margin between the two classes. For nonlinear cases, it uses the kernel trick to project data into higher dimensions (31).

Primal Problem:

$$\min_{w,b} \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i(w^T x_i + b) \geq 1 \quad (12)$$

Here, $w$ is the normal vector to the hyperplane, $x_i$ are the feature vectors, and $y_i$ are the class labels.

Dual Problem (for Kernel SVM) (32):

$$\max_{\alpha} \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad (13)$$

where $\alpha_i$ are the Lagrange multipliers and $K(x_i, x_j)$ is the kernel function (e.g., radial basis function or polynomial kernel).
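A brief sketch, assuming an RBF kernel and illustrative data, of how the kernelized SVM in Eqs. (12)-(13) is typically fit with scikit-learn; the fitted model exposes the dual coefficients $\alpha_i y_i$ of the support vectors:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)   # nonlinearly separable

svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

# dual_coef_ holds alpha_i * y_i for the support vectors of the dual problem
print("support vectors per class:", svm.n_support_)

# Decision function: sum_i alpha_i y_i K(x_i, x) + b
print(svm.decision_function(X[:3]))
```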
• K-Nearest Neighbors (KNN):

KNN, a non-parametric technique, classifies data points by their K nearest neighbors' majority class.
Distance Metric: KNN utilizes the Euclidean distance to measure neighbor proximity (33):

$$d(x, x') = \sqrt{\sum_{i=1}^{n} (x_i - x_i')^2} \quad (14)$$

where $x_i$ and $x_i'$ are the feature values of the test and training points, respectively.

Prediction: The predicted class $\hat{y}$ is determined by a majority vote among the k nearest neighbors:

$$\hat{y} = \text{mode}(y_1, y_2, \dots, y_k) \quad (15)$$
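To make Eqs. (14)-(15) concrete, here is a small NumPy sketch of the Euclidean-distance majority vote; the training set, labels, and query point are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
X_train = rng.normal(size=(100, 5))       # placeholder training features
y_train = rng.integers(0, 2, size=100)    # placeholder class labels
x_query = rng.normal(size=5)
k = 5

# Eq. (14): Euclidean distance from the query to every training point
d = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))

# Eq. (15): majority vote among the k nearest neighbors
nearest = np.argsort(d)[:k]
y_hat = np.bincount(y_train[nearest]).argmax()
print("predicted class:", y_hat)
```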
• Gradient Boosting:

Gradient Boosting builds decision trees successively, with each new tree correcting the errors of its predecessors, by minimizing a loss function using gradient descent.

Loss Function: Let $L(y, \hat{y})$ be the loss function, such as the mean squared error used in regression analysis. Gradient Boosting minimizes the residuals by fitting decision trees to the negative gradient of the loss function (34):

$$\hat{y}_{m+1} = \hat{y}_m + \nu \sum_{i=1}^{N} \gamma_i h(x_i) \quad (16)$$

where $\hat{y}_m$ is the current prediction, $h(x_i)$ is the decision tree, $\gamma_i$ are the step sizes (learning rates), and $\nu$ is a scaling factor.

Final Prediction: After M boosting iterations, the final prediction is (35):

$$\hat{y} = \sum_{m=1}^{M} \nu\, h_m(x) \quad (17)$$
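The staged-prediction interface of scikit-learn's GradientBoostingClassifier makes the iterative update of Eqs. (16)-(17) visible: each stage adds one shrunken tree, with learning_rate playing the role of ν. Data and parameter values below are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + X[:, 1] ** 2 > 1.0).astype(int)

gb = GradientBoostingClassifier(n_estimators=50, learning_rate=0.1).fit(X, y)

# staged_predict yields y_hat_m after each boosting iteration m,
# i.e., the running sum of Eq. (16) up to that stage
for m, y_hat in enumerate(gb.staged_predict(X), start=1):
    if m % 10 == 0:
        print(f"stage {m:2d}: training accuracy = {(y_hat == y).mean():.3f}")
```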
• Neural Networks (MLP Classifier):

A Multi-Layer Perceptron (MLP) neural network consists of several layers of nodes; neurons in one layer link to those in the next (36).

Forward Propagation: For a single layer:

$$a^{(l+1)} = \sigma(W^{(l)} a^{(l)} + b^{(l)}) \quad (18)$$

where $W^{(l)}$ are the weights, $a^{(l)}$ is the activation of layer $l$, and $b^{(l)}$ are the biases.

Backpropagation: The weights are updated using the gradient of the loss with respect to the weights:

$$\nabla W^{(l)} = \frac{\partial L}{\partial W^{(l)}} \quad (19)$$

where $\nabla W^{(l)}$ is the gradient of the loss with respect to the weights at layer $l$, and $L$ is the loss function (e.g., cross-entropy for classification).

Loss Function: For classification, the expected loss function is cross-entropy (38):

$$L = -\sum_{i=1}^{n} y_i \log(\hat{y}_i) \quad (20)$$

where $y_i$ is the actual label and $\hat{y}_i$ is the predicted probability of class $i$.
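A minimal NumPy sketch of the forward pass in Eq. (18) and the cross-entropy loss in Eq. (20), using an invented one-hidden-layer network; the layer sizes, sigmoid activation, and one-hot target are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.normal(size=4)                          # placeholder feature vector
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)   # hidden layer weights and biases
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)   # output layer (3 classes)

a1 = sigmoid(W1 @ x + b1)                       # Eq. (18): a^(l+1) = sigma(W a + b)
logits = W2 @ a1 + b2
y_prob = np.exp(logits) / np.exp(logits).sum()  # softmax output probabilities

# Eq. (20): cross-entropy against a one-hot target (class 1 assumed)
y_true = np.array([0.0, 1.0, 0.0])
loss = -np.sum(y_true * np.log(y_prob))
print("cross-entropy loss:", loss)
```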
The results of different models, such as Random Forest and Neural Networks, were combined using an ensemble voting approach to achieve the best classification outcome. After training each model separately on the dataset, the outputs of the models were merged based on the highest accuracy achieved by each method. The voting principle was applied to determine the final classification, helping to improve prediction accuracy and reduce potential errors.
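A sketch of this ensemble step, assuming scikit-learn's VotingClassifier with soft voting (the exact voting variant is not stated in the text) over a Random Forest and an MLP configured with the best values reported in Table 6; the data are placeholders:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(6)
X = rng.normal(size=(400, 6))        # placeholder sensor features
y = rng.integers(0, 4, size=400)     # four severity classes (0-3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=6)

vote = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, max_depth=15)),
        ("mlp", MLPClassifier(hidden_layer_sizes=(64, 64, 64), max_iter=500)),
    ],
    voting="soft",                   # average the predicted class probabilities
)
vote.fit(X_tr, y_tr)
print("ensemble accuracy:", vote.score(X_te, y_te))
```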
2.4.2. Model Evaluation Metrics

In addition to accuracy and ROC AUC, metrics such as precision, recall, and F1-score were calculated to evaluate the models' classification performance comprehensively. These metrics are essential for assessing how well the models perform across the different classes of failure modes.

• Precision measures the accuracy of the positive predictions made by the model. It is defined as the ratio of true positive predictions to the total predicted positives (39):

$$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} \quad (21)$$

• Recall (Sensitivity or True Positive Rate) measures the model's ability to identify all positive instances correctly. It is defined as the ratio of true positive predictions to the total actual positives (40):

$$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} \quad (22)$$
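Eqs. (21)-(22) and the F1-score can be computed per failure class with scikit-learn's metric functions; the label arrays below are invented to show the call pattern, with macro averaging treating the four severity classes equally:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Placeholder true and predicted severity classes (0-3)
y_true = [0, 1, 2, 3, 3, 2, 1, 0, 3, 2]
y_pred = [0, 1, 2, 3, 2, 2, 0, 0, 3, 2]

print("precision:", precision_score(y_true, y_pred, average="macro"))
print("recall:   ", recall_score(y_true, y_pred, average="macro"))
print("F1-score: ", f1_score(y_true, y_pred, average="macro"))
```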
These metrics allow for a more detailed evaluation of the model's performance, especially when there is an imbalance in the classes or when the costs of false positives and false negatives are significant.

Table 5 below summarizes the models' performance based on accuracy, precision, recall, and F1-score, providing a clearer view of their classification abilities across different failure modes.
Stratified k-fold cross-validation evaluates each model on multiple data splits, ensuring equal class proportions in each fold (41):

$$\text{CV Accuracy} = \frac{1}{k} \sum_{i=1}^{k} \text{Accuracy}_i \quad (24)$$

where k is the number of folds.
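Eq. (24) corresponds to averaging the per-fold accuracies; a sketch using scikit-learn's StratifiedKFold, which preserves class proportions in each fold, is shown below on placeholder data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(7)
X = rng.normal(size=(400, 6))      # placeholder features
y = rng.integers(0, 4, size=400)   # four severity classes

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=7)
scores = cross_val_score(RandomForestClassifier(), X, y, cv=cv, scoring="accuracy")

# Eq. (24): CV accuracy is the mean of the k per-fold accuracies
print("per-fold:", np.round(scores, 3), "mean:", scores.mean())
```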
2.6. Hyperparameter Tuning

Each model's essential hyperparameters were grid-searched to optimize performance. In Random Forest, the number of trees (n_estimators) and maximum tree depth (max_depth) were adjusted, whereas for Neural Networks the number of hidden layers and the neurons per layer were optimized. Tuning used 5-fold cross-validation to avoid overfitting.
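A sketch of the described procedure, assuming scikit-learn's GridSearchCV with 5-fold cross-validation over the Random Forest grid; the reported best values (200 trees, depth 15) are included in the grid, while the other candidate values and the data are invented:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(8)
X = rng.normal(size=(400, 6))      # placeholder features
y = rng.integers(0, 4, size=400)   # four severity classes

param_grid = {
    "n_estimators": [100, 200, 300],   # number of trees
    "max_depth": [5, 10, 15],          # maximum tree depth
}
search = GridSearchCV(RandomForestClassifier(random_state=8),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print("best hyperparameters:", search.best_params_)
```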
Table 6 outlines the specific hyperparameters tuned for each model and the optimal values found through cross-validation. This process allowed the models to generalize better to unseen data.

Table 6. Hyperparameters tuned for each model and the best values found through cross-validation.

Model               | Hyperparameters Tuned                                   | Best Values
Logistic Regression | Regularisation strength (C)                             | 0.1
Random Forest       | Number of trees (n_estimators), max depth (max_depth)   | 200, 15
Neural Networks     | Number of layers, neurons per layer, learning rate      | 3 layers, 64 neurons, 0.001
Gradient Boosting   | Learning rate, number of boosting steps (n_estimators)  | 0.05, 300
The methodology presented in this study outlines a comprehensive approach to failure mode prediction in electric vehicles, starting from data collection and preprocessing to predictive modeling and evaluation. By employing various machine learning models and techniques such as feature scaling, the approach provides a robust basis for predicting failure modes (42).

3. Results and Discussion

This section analyses and discusses the results of applying various machine-learning models to the dataset. The models evaluated in this analysis consist of Random Forest, Neural Networks, Logistic Regression, and K-Nearest Neighbours (KNN). Performance metrics, including accuracy, confusion matrices, and ROC AUC, were utilized for comparative analysis. Visual representations, including plots, confusion matrices, and tables, are presented for comprehensive analysis.

This study examines the performance of multiple machine learning models for classifying failure modes in electric vehicles (EVs), categorized into four severity classes:
- Class 0: Operational – Normal operation without faults.
- Class 1: Minor Fault – Issues that do not require immediate attention.
- Class 2: Critical Fault – Issues that need attention but are not immediately dangerous.
- Class 3: Severe Failure – Serious faults requiring urgent maintenance.

The models were evaluated based on accuracy, ROC AUC, confusion matrices, and the time required for training and prediction. Neural Networks and Random Forest demonstrated the highest overall accuracy (96.00% and 95.00%, respectively), with perfect performance in Class 3 (Severe Failure) based on ROC AUC values of 1.00. Models like Naive Bayes and AdaBoost struggled with lower accuracy rates and ROC AUC values, particularly in the minor fault categories. Table 7 below summarises the key performance metrics for each model.
3.2. Accuracy and Confusion Matrix Analysis

The confusion matrix helps evaluate classification errors. For each model, the matrix is computed as:

$$\text{Confusion Matrix} = \begin{bmatrix} TP & FP \\ FN & TN \end{bmatrix} \quad (11)$$

The confusion matrix, shown in Figure 5, provides a detailed breakdown of correct and incorrect predictions, highlighting areas where the model excels or struggles in predicting failure modes.
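For reference, scikit-learn computes this matrix directly from label vectors; note that its convention places true classes on rows and predicted classes on columns, which differs in layout from Eq. (11) but contains the same counts. The labels below are placeholders:

```python
from sklearn.metrics import confusion_matrix

# Placeholder true and predicted severity classes (0-3)
y_true = [0, 0, 1, 2, 3, 3, 1, 2, 2, 3]
y_pred = [0, 1, 1, 2, 3, 3, 0, 2, 1, 3]

cm = confusion_matrix(y_true, y_pred)   # rows: true class, columns: predicted
print(cm)
```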
Of particular interest is Random Forest's confusion matrix, where we observe a low number of false positives, indicating this model's suitability for failure prediction. Additionally, Table 8 presents a detailed comparison of accuracy and performance metrics across all models, providing further insights into each model's effectiveness.

The confusion matrices are plotted with different color schemes to highlight the performance of each classification algorithm, under the title "Performance of Classification Algorithms – Confusion Matrices."

Here are the insights based on the confusion matrices:
- Logistic Regression: The model exhibits balanced misclassification across all classes, particularly struggling to classify classes 1 and 2 correctly.
- Decision Tree: The model performs well in predicting class 3, with 15 correct predictions, but it struggles in other areas, possibly due to overfitting.
- Random Forest: The algorithm achieves a relatively good classification balance, especially for classes 0 and 1, though it makes some misclassifications in class 3.
- K-Nearest Neighbors: This model shows considerable errors across different classes, particularly classes 0 and 2, indicating difficulties in proper classification.
- Support Vector Machine: The SVM performs very well in class 3, but it misclassifies many instances in classes 0 and 1, which impacts its overall performance.
- Naive Bayes: The model shows a higher misclassification rate for classes 1 and 2, struggling to separate these categories.

Figure 6 shows each model's accuracy and ROC AUC in the same plot, allowing for a comprehensive model performance evaluation.

Figure 6. Model Accuracy and ROC AUC Comparison (Line Plot)

This dual-line plot provides a holistic view of model performance by juxtaposing accuracy with ROC AUC scores. It emphasizes that higher accuracy often correlates with better AUC performance, showcasing robust models across multiple evaluation metrics.
3.4. Error Analysis

We provide a classification report with precision, recall, and F1-score for each model to evaluate performance.
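The per-model report described here can be produced with scikit-learn's classification_report, which tabulates precision, recall, and F1-score for each severity class; the inputs below are placeholders:

```python
from sklearn.metrics import classification_report

# Placeholder true and predicted severity classes (0-3)
y_true = [0, 0, 1, 2, 3, 3, 1, 2, 2, 3]
y_pred = [0, 1, 1, 2, 3, 3, 0, 2, 1, 3]

print(classification_report(
    y_true, y_pred,
    target_names=["Operational", "Minor Fault", "Critical Fault", "Severe Failure"],
))
```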
The following plots present ROC curves for each model across multiple classes, allowing for a comparative evaluation of model performance in distinguishing between different failures. Figure 9 plots the ROC curves for the top models, including Random Forests and Neural Networks. The curves indicate the TPR-FPR relationship, demonstrating the models' ability to distinguish between failure and normal modes.
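Per-class ROC curves of this kind can be generated by binarizing the labels one-vs-rest; a compact sketch with scikit-learn follows, again on placeholder data rather than the study's dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

rng = np.random.default_rng(9)
X = rng.normal(size=(400, 6))
y = rng.integers(0, 4, size=400)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=9)

clf = RandomForestClassifier(n_estimators=200).fit(X_tr, y_tr)
probs = clf.predict_proba(X_te)
y_bin = label_binarize(y_te, classes=[0, 1, 2, 3])

# One ROC curve (TPR vs. FPR) and AUC per severity class
for c in range(4):
    fpr, tpr, _ = roc_curve(y_bin[:, c], probs[:, c])
    print(f"class {c}: AUC = {roc_auc_score(y_bin[:, c], probs[:, c]):.3f}")
```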
References
1. Talukdar BK, Deka BC, Goswami AK. An Approach to Reliability Modeling and Availability Analysis of a Solar Electric Vehicle With
Standby Plug‐in Facility. International Transactions on Electrical Energy Systems. 2021;31(12). https://doi.org/10.1002/2050-7038.13147
2. Barkenbus JN. Prospects for Electric Vehicles. Sustainability. 2020;12(14):5813. https://doi.org/10.3390/su12145813
3. Raja HA, Kudelina K, Asad B, Vaimann T, Kallaste A, Rassõlkin A, Khang HV. Signal Spectrum-Based Machine Learning Approach for
Fault Prediction and Maintenance of Electrical Machines. Energies. 2022;15(24):9507. https://doi.org/10.3390/en15249507
4. Shu X, Guo Y, Yang W, Wei K, Zhu Y, Zou HX. A Detailed Reliability Study of the Motor System in Pure Electric Vans by the Approach
of Fault Tree Analysis. IEEE Access. 2020;8:5295-307. https://doi.org/10.1109/ACCESS.2019.2963197
5. Tian X, Li X, Chi Y. Reliability Study of Electric Vehicle Drive Motor Control System. International Journal of New Developments in
Engineering and Society. 2023;7(3). https://doi.org/10.25236/IJNDES.2023.070306
6. Rao PS, Yaqoob SI, Ahmed MA, Abdinabievna PS, Yaseen SM, Arumugam M. Integrated artificial intelligence and predictive maintenance
of electric vehicle components with optical and quantum enhancements. Optical and Quantum Electronics. 2023;55(10):855.
https://doi.org/10.1007/s11082-023-05135-7
7. Jain M, Vasdev D, Pal K, Sharma V. Systematic Literature Review on Predictive Maintenance of Vehicles and Diagnosis of Vehicle's
Health Using Machine Learning Techniques. Computational Intelligence. 2022;38(6):1990-2008. https://doi.org/10.1111/coin.12553
8. Shiva K, Etikani P, Bhaskar VVSR, Mittal A, Dave A, Thakkar D, et al. Anomaly detection in sensor data with machine learning: Predictive
maintenance for industrial systems. Journal of Electrical Systems. 2024;20(10s):454-62.
9. Peres RS, Rocha AD, Leitão P, Barata J. IDARTS – Towards Intelligent Data Analysis and Real-Time Supervision for Industry 4.0.
Computers in Industry. 2018;101:138-46. https://doi.org/10.1016/j.compind.2018.07.004
10. Silvestrin LP, Hoogendoorn M, Koole G. A Comparative Study of State-of-the-Art Machine Learning Algorithms for Predictive