A Research Study on Unsupervised Machine
Learning Algorithms for Early Fault Detection
         in Predictive Maintenance
            Summer Internship under the guidance of
                Dr. M. Punniyamoorthy
                       Professor (HAG)
          National Institute of Technology
                  Tiruchirapalli-620015
                       Submitted by
                        NAVEENRAJ P
                BACHELOR OF TECHNOLOGY (PRODUCTION)
              National Institute of Technology
                      Tiruchirapalli-620015
                         JUNE 2025
                         INSTITUTE BONAFIDE CERTIFICATE
This is to certify that the project entitled "A Research Study on
Unsupervised Machine Learning Algorithms for Early Fault Detection in
Predictive Maintenance" is a record of the work done by Naveenraj P
(24AC0043) in fulfillment of the Summer Internship at the National
Institute of Technology, Tiruchirapalli, during June 2025.
I declare that I have carried out the work presented in this report and that
I have not submitted the results in any form previously for the award of any
degree or diploma.
Dr. M. Punniyamoorthy,
Guide,
Department of Management Studies,
National Institute of Technology,
Tiruchirapalli-620015
Project submitted on 25.06.2025
CONTENTS
ABSTRACT
I. INTRODUCTION
II. LITERATURE REVIEW
III. FAULT DETECTION
     A. Data Collection
     B. Feature Selection Using PCA
     C. T2 Statistic
     D. Cluster Analysis
     E. Optimal Number of Clusters
     F. Hierarchical Clustering
     G. K-Means and Fuzzy C-Means Clustering
     H. Model-Based Clustering
IV. RESULTS
V. CONCLUSION
VI. FUTURE SCOPE OF WORK
REFERENCES
APPENDIX
  A Research Study on Unsupervised Machine
Learning Algorithms for Early Fault Detection in
            Predictive Maintenance
ABSTRACT
The area of predictive maintenance has gained prominence in the last
couple of years for various reasons. With new algorithms and
methodologies emerging across different learning methods, it has
remained a challenge for industries to decide which method is fit,
robust, and provides the most accurate detection. Fault detection is
one of the critical components of predictive maintenance; industries
very much need to detect faults early and accurately. In a production
environment, to minimize the cost of maintenance, it is sometimes
necessary to build a model with minimal or no historical data. In such
cases, unsupervised learning is the better option for model building.
In this paper, we take simple vibration data collected from an exhaust
fan and fit different unsupervised learning algorithms, such as the
PCA T2 statistic, hierarchical clustering, K-Means, Fuzzy C-Means
clustering, and model-based clustering, to test their accuracy,
performance, and robustness. In the end, we propose a methodology for
benchmarking the different algorithms and choosing the final model.
I. INTRODUCTION
The concept of predictive maintenance (PdM) was proposed a few
decades ago. PdM is a subset of planned maintenance, but it did not
gain prominence until the recent decade. This rapid advance is mainly
due to emerging internet technologies, connected sensors, systems
capable of handling big data sets, and the realization of the need to
use these techniques. The abrupt growth can also be attributed to the
demand for high-quality products at the least cost and with the
shortest lead time. It is estimated that every year U.S. industry
spends $200 billion on maintenance of plant equipment and facilities,
and that ineffective maintenance leads to a loss of more than $60
billion [1]. In the food and beverage industry, it was estimated that
failures and downtime accounted for 18% of OEE [2].
Over the years, different architectures, algorithms, and
methodologies have been proposed. One of the most prominent methods
is the watchdog agent, a design built around various machine learning
algorithms [3] [11]. Some of the other architectures are the OSA-CBM
architecture [4], the SIMAP architecture [5], and the predictive
maintenance framework [6]. Emerging technologies such as Internet of
Things (IoT) devices have formed a gateway to connect to machines and
their subcomponents, not only to collect process data and parameters
but also to collect physical health aspects of the machine such as
vibration, pressure, temperature, acoustics, viscosity, and flow
rate. This information is widely used for early fault detection,
fault identification, health assessment of the machine, and
prediction of the machine's future state. Much of this is made
possible by machine learning algorithms available across the
different learning domains.
Machine learning is a subfield of Artificial Intelligence (Figure 1).
Machine learning can be defined as a program or algorithm that is
capable of learning with minimal or no additional support. Machine
learning helps in solving many problems such as big data, vision,
speech recognition, and robotics [7]. Machine learning is classified
into three types: in supervised learning, both the predictors and the
response variables are known when building the model; in unsupervised
learning, only the predictors are known and no labeled responses are
available; and in reinforcement learning, an agent learns actions and
their consequences by interacting with the environment. In this
research, the main focus is the unsupervised learning methodology.
One of the most commonly used approaches in unsupervised learning is
clustering, where observations are grouped into clusters, either
user-defined or model-determined, based on the distance, model,
density, class, or characteristics of the variables. For this
research, vibration data has been used; data collection, feature
selection, and extraction are described in later sections.
II. LITERATURE REVIEW
The primary goal of PdM is to reduce the cost of a product or
service and to gain a competitive advantage in the market. Today,
business analytics is embedded across PdM to realize the need for it
and to make appropriate decisions. Business analytics can be viewed
from three different perspectives: (i) descriptive analytics, (ii)
predictive analytics, and (iii) prescriptive analytics [16].
Descriptive analytics answers questions such as "what happened in
the past?" by analyzing historical data and summarizing it in
charts; in maintenance, this step is performed using control charts.
Predictive analytics is an extension of descriptive analytics where
historical data is analyzed to predict future outcomes; in
maintenance, it is used to predict the type of failure and the time
to complete failure. Finally, prescriptive analytics is an
optimization process that identifies the best alternatives to
minimize or maximize an objective, answering questions such as "what
can be done?"; in maintenance, it can be used to optimize
maintenance schedules to minimize the cost of maintenance. In this
paper, our primary focus is on descriptive and predictive analytics
to detect faults. Predictive analytics has spread into various
applications such as railway track maintenance, vehicle monitoring
[23], automotive subcomponents [8], utility systems [19], computer
systems, electrical grids [13], aircraft maintenance [21], the oil
and gas industry, computational finance, and many more.
Fault detection is one of the concepts of predictive maintenance
that is well accepted in industry. Early failure detection could
potentially eliminate catastrophic machine failures. In one recent
research study, this process is classified into different methods:
quantitative model-based methods, qualitative model-based methods,
and process-history-based methods [25]. Principal component analysis
(PCA) is one of the oldest and most prominent algorithms still
widely used today. It was first invented by Karl Pearson in 1901.
Since then, there have been many hybrid approaches to PCA for fault
detection, such as kernel PCA [17], adaptive thresholds using
exponentially weighted moving averages for the T2 and Q statistics
[9], the multiscale neighborhood normalization-based multiple
dynamic principal component analysis (MNN-MDPCA) method [27], and
Independent Component Analysis. Another common method used for fault
detection is clustering. Similar to PCA, there are various
algorithms, such as neural network clustering and subtractive
clustering [28], K-means [10], Gaussian mixture models [15],
C-Means, hierarchical clustering [22], and Modified Rank Order
Clustering (MROC) [33].
III. FAULT DETECTION
Fault detection is one of the most critical components of
predictive maintenance. It can be defined as the process of
identifying abnormal behavior of a subsystem; any deviation from
standard behavior can be categorized as a failure. In this section,
we discuss different algorithms, namely the Principal Component
Analysis (PCA) T2 statistic, hierarchical clustering, K-Means
clustering, C-Means clustering, and model-based clustering, for
fault detection, and benchmark their results on vibration
monitoring data.
  A. Data Collection
     Vibration data is one of the most commonly used signals for
     detecting abnormalities in a submachine. In this research, a
     vibration monitoring sensor was set up on an exhaust fan.
     Vibration was collected every 240 minutes for 12 days at a
     sampling frequency of 2048 Hz on both the X and Y axes. From
     this data, different features were extracted, such as peak
     acceleration, peak velocity, turning speed, RMS velocity, and
     damage accumulation. Figure 2 shows the time series plots of
     the data.
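As an illustration of this feature-extraction step, the sketch below computes two of the named features, peak acceleration and RMS velocity, from a raw acceleration signal. The synthetic signal and the integration-based velocity estimate are assumptions for illustration; the paper does not specify how its features were computed.

import numpy as np
from scipy.integrate import cumulative_trapezoid

fs = 2048                                  # sampling frequency (Hz), as stated above
t = np.arange(0, 1.0, 1.0 / fs)
# Synthetic acceleration signal: one rotational tone plus noise (dummy values)
accel = np.sin(2 * np.pi * 29.0 * t) + 0.1 * np.random.randn(t.size)

peak_acceleration = np.max(np.abs(accel))
# Velocity obtained by numerically integrating acceleration over time
velocity = cumulative_trapezoid(accel, t, initial=0.0)
rms_velocity = np.sqrt(np.mean(velocity ** 2))
print(peak_acceleration, rms_velocity)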
Modern Definition of IMF (e.g., in Variational Mode Decomposition, VMD)
In modern methods like VMD, the concept of an IMF shifts towards a
mathematical, frequency-domain definition:
    1. Band-limited modes:
       Each IMF is modeled as a mode with a compact support in
       the frequency domain, i.e., each mode is centered around
       a specific frequency (ωₖ).
    2. Mode constraints:
       Modes are derived such that they are:
        o   Smooth in the frequency domain (minimal bandwidth).
        o   Non-overlapping in frequency.
        o   Optimized through variational principles to extract
            meaningful components.
    3. No need for envelopes or zero-crossings:
       Modern IMFs do not require extrema/zero-crossing counts
       or mean-zero envelopes as in EMD.
Use Case: VMD and other recent methods define IMFs through
optimization problems that provide better mathematical
robustness, particularly for noisy or closely spaced signal
components.
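As a sketch of how such modes can be extracted in practice, the snippet below uses the third-party vmdpy package; the package, its VMD call, and the parameter values are assumptions for illustration, not something used in this paper.

import numpy as np
from vmdpy import VMD  # third-party VMD implementation (assumed installed)

fs = 2048
t = np.arange(0, 1.0, 1.0 / fs)
# Two-tone test signal with a little noise
signal = (np.sin(2 * np.pi * 30 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)
          + 0.05 * np.random.randn(t.size))

alpha = 2000   # bandwidth constraint (penalty weight)
tau = 0.0      # noise tolerance (0 enforces exact reconstruction)
K = 2          # number of modes (IMFs) to extract
DC = 0         # do not impose a DC mode
init = 1       # initialize center frequencies uniformly
tol = 1e-7     # convergence tolerance

# u: the K band-limited modes; u_hat: their spectra; omega: center frequencies per iteration
u, u_hat, omega = VMD(signal, alpha, tau, K, DC, init, tol)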
In Figure 2, we can see a trend developing near the 60th
observation. In this paper, we test how different algorithms help
detect this fault earlier.
B. Feature Selection Using PCA
   Not all extracted features provide a true correlation. If the
   right features are not selected, a significant amount of noise is
   added to the final model, reducing its accuracy. One of the most
   prominent algorithms used for dimensionality reduction is
   principal component analysis. Principal component analysis (PCA)
   is a mathematical algorithm that reduces the dimensionality of
   the data while retaining most of the variation (information) in
   the data set [18]. In a simple context, it is an algorithm that
   identifies patterns in data and expresses the data in a way that
   showcases those similarities and differences [29]. Algorithm:
   Step 1: Consider a data matrix $[X]_{m \times n}$, where m is the
   number of rows and n is the number of columns.
   Step 2: Subtract the mean from each dimension:
   $[X_{adj}] = [X] - [\bar{X}]$ (2)
   Step 3: Calculate the covariance matrix
   $[C] = \frac{1}{m-1}[X_{adj}]^T[X_{adj}]$ (3)
   Step 4: Calculate the eigenvectors and eigenvalues of the
   covariance matrix: $([C] - \lambda_i I)\{v_i\} = \{0\}$ (4)
   Step 5: Store the eigenvectors in a matrix
   $[P] = [\{v_1\}\{v_2\}\dots\{v_n\}]$ (5)
   Step 6: Store the eigenvalues in a diagonal matrix $[\Lambda]$,
   where the eigenvalues correspond to the principal components and
   $[P]$ contains the loading vectors. (6)
   Step 7: Rank the eigenvalues in decreasing order and choose the
   top r eigenvectors to retain. (7)
   Step 8: Retain the r eigenvectors
   $[P_r] = [\{v_1\}\{v_2\}\dots\{v_r\}]$ (8)
   Step 9: Calculate the principal components $[U]$ by projecting
   the adjusted data matrix: $[U] = [X_{adj}][P_r]$ (9)
   The summary of the PCA indicates that the first two principal
   components account for 95.65% of the variance compared to the
   rest of the components. A scree plot of eigenvalues versus
   principal components, shown in Figure 4, can be used to identify
   the components that explain significant variance in the data.
   From the summary data and the scree plot, we can conclude that
   the first two principal components carry the maximum variation
   compared to the rest of the principal components.
C. T2 Statistic
   The T2 statistic is a multivariate statistical measure. The T2
   statistic for a data observation x can be calculated by [12]
   $T^2 = \sum_{i=1}^{a} \frac{t_i^2}{\lambda_i}$ (10)
   where $t_i$ is the score of the observation on the i-th principal
   component and $\lambda_i$ is the corresponding eigenvalue. The
   upper confidence limit for T2 is obtained using the
   F-distribution:
   $T^2_{a,n,\alpha} = \frac{a(n-1)}{n-a} F_{a,n-a,\alpha}$
   where n is the number of samples in the data, a is the number of
   principal components, and α is the level of significance [24].
   Values are measured against this threshold, and any value above
   it can be concluded to be out-of-control data, in this case
   faulty data. The results for the vibration data are shown in
   Figure 5.
   Based on the T2 statistic results in Figure 5, we can observe
   that faults can be detected as early as observation 41. This
   early detection would help maintenance teams monitor these
   process changes and take corrective actions accordingly.
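The threshold computation is short enough to show directly; a sketch consistent with the formula above (and with the appendix code) follows, where the sample size and level of significance are illustrative assumptions:

import numpy as np
from scipy.stats import f

n, a, alpha = 71, 2, 0.05                  # samples, retained PCs, significance level
F_val = f.ppf(1 - alpha, a, n - a)         # F_{a, n-a, alpha} quantile
T2_limit = a * (n - 1) / (n - a) * F_val   # upper confidence limit for T2
# Observations whose T2 value exceeds T2_limit are flagged as out of control (faulty)
print(T2_limit)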
D. Cluster Analysis
   Clustering is one of the unsupervised learning methods. In
   cluster analysis, similar data points are grouped into clusters.
   Some of the most prominent clustering methods are K-Means
   clustering, C-Means clustering, and hierarchical clustering.
   Clustering algorithms follow various organizing principles:
   iterative, hierarchical, density-based, metasearch-controlled,
   and stochastic. In this paper, we discuss one of the commonly
   used hierarchical clustering methods.
E. Optimal Number of Clusters
   In cluster analysis, we need to know the optimal number of
   clusters that can be formed. Although we know that we have
   healthy data and faulty data, identifying the optimal number of
   cluster formations in our data helps in understanding the
   different states in the data and representing the data more
   accurately. To identify the number of clusters, many procedures
   are available, such as the elbow method, the Bayesian Information
   Criterion, and the NbClust package in R. The results of the elbow
   method are shown in Figure 6, and the results using NbClust [30]
   are shown in Figure 7.
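A minimal elbow-method sketch with scikit-learn follows (the feature matrix is a stand-in; the elbow is read off where the curve flattens):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.random.normal(size=(71, 2))   # stand-in for the first two principal components
wss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    wss.append(km.inertia_)          # within-cluster sum of squares at k clusters
plt.plot(range(1, 11), wss, 'o-')
plt.xlabel('Number of clusters k')
plt.ylabel('Within-cluster sum of squares')
plt.title('Elbow Method')
plt.show()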
   From both procedures, shown in Figure 6 and Figure 7, we can
   identify three as the optimal number of clusters. For fault
   detection, we can use three clusters and theorize that the
   clusters represent a normal condition, a warning condition, and a
   faulty condition. In the following sections on cluster analysis,
   we observe the results each clustering algorithm provides.
F. Hierarchical Clustering
   Start by assigning each item to its own cluster, so that with N
   items there are N clusters, each containing just one item. Let
   the distances (similarities) between the clusters equal the
   distances (similarities) between the items they contain [24].
   Algorithm:
   Step 1: Find the closest (most similar) pair of clusters and
   merge them into a single cluster, so that there is now one less
   cluster.
   Step 2: Compute the distances (similarities) between the new
   cluster and each of the old clusters.
   Step 3: Repeat steps 1 and 2 until all N items are clustered into
   a single cluster.
   In Figure 8, the clusters are formed from the feature data using
   Ward's method. Whether the feature data or the principal
   components were used, the results were identical. Three clusters
   were formed: the first cluster includes observations 1 to 40, the
   second cluster observations 41 to 67, and the third cluster
   observations 68 to 71. Based on domain knowledge, we can label
   cluster 1 as the healthy data set, cluster 2 as the warning data
   set, and cluster 3 as the faulty data set.
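A sketch of this procedure with SciPy follows (Ward linkage on stand-in data, cut into three clusters as in Figure 8):

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

X = np.random.normal(size=(71, 2))               # stand-in for the feature data
Z = linkage(X, method='ward')                    # agglomerative merge tree (Ward's method)
labels = fcluster(Z, t=3, criterion='maxclust')  # cut the tree into three clusters
dendrogram(Z)
plt.title("Hierarchical Clustering (Ward's Method)")
plt.show()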
G. K-Means and Fuzzy C-Means Clustering
   K-means is one of the most common unsupervised clustering
   algorithms. The goal of this straightforward algorithm is to
   divide the data set into a pre-determined number of clusters
   based on distance; here, we have used the Euclidean distance. The
   graphical results are shown in Figure 9. C-means is a clustering
   technique where each data point belongs to every cluster to some
   degree. Fuzzy C-Means was first introduced by Bezdek [14] and has
   been applied in areas such as agriculture, engineering,
   astronomy, chemistry, geology, image analysis [14], medical
   diagnosis, shape analysis, and target recognition [26]. The
   graphical results for C-Means are also shown in Figure 9. From
   Table III, the summary of K-means and C-means clustering, we can
   observe that clusters of sizes 4, 27, and 40 are formed:
   observations 1 to 40 form one cluster, observations 41 to 67 the
   second, and observations 68 to 71 the third. These results are
   the same as those of hierarchical clustering.
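scikit-learn has no fuzzy C-means, so the sketch below implements the standard update equations directly in NumPy (three clusters, membership exponent m = 2, stand-in data); it illustrates the technique rather than reproducing the paper's exact runs.

import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, max_iter=100, tol=1e-5, seed=42):
    """Plain-NumPy fuzzy C-means; returns cluster centers and the membership matrix."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)            # memberships of each point sum to 1
    for _ in range(max_iter):
        Um = U ** m                              # fuzzified memberships
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-10)              # guard against division by zero
        inv = dist ** (-2.0 / (m - 1.0))         # standard membership update
        U_new = inv / inv.sum(axis=1, keepdims=True)
        if np.linalg.norm(U_new - U) < tol:
            return centers, U_new
        U = U_new
    return centers, U

X = np.random.normal(size=(71, 2))               # stand-in for the first two PCs
centers, U = fuzzy_c_means(X)
hard_labels = U.argmax(axis=1)                   # harden memberships to compare with K-means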
Figure 4. Scree plot to determine the variation between principal components.
Figure 5. T2 statistic results for the training and testing data sets.
Figure 6. Determining the optimal number of clusters using the elbow method.
Figure 7. Determining the number of clusters using the NbClust package.
        CLUSTER MEANS OF THE K-MEANS ALGORITHM
                         1         2
           Cluster 1  -9.665   -1.609
           Cluster 2  -0.497    1.856
           Cluster 3   1.301   -1.092
      Within-cluster sum of squares by cluster:
      16.758705, 39.575966, 8.823486 (between_SS / total_SS = 90.2%)
      FUZZY C-MEANS CLUSTER CENTERS WITH 3 CLUSTERS
                         1         2
           Cluster 1   1.275   -1.071
           Cluster 2  -0.289    1.920
           Cluster 3  -9.935   -1.723
H. Model-Based Clustering
   A Gaussian mixture model (GMM) is used for modeling data that
   comes from one of several groups: the groups might be different
   from each other, but data points within the same group can be
   well modeled by a Gaussian distribution [20]. A Gaussian finite
   mixture model is fitted by the EM algorithm, an iterative
   algorithm that starts from an initial estimate and updates it at
   every iteration until convergence is detected [31] [32].
   Initialization can begin either with a set of initial parameters
   followed by the E-step, or with a set of initial weights followed
   by the M-step; the starting point can be set randomly or chosen
   by some method.
   Summary of the classification: Mclust EVV (ellipsoidal, equal
   volume) model with five components:
   log.likelihood: -57.23501, n: 71, df: 25, BIC: -221.037,
   ICL: -222.0734
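The paper's model was fitted with R's mclust; an analogous sketch in Python using scikit-learn's GaussianMixture, selecting the number of components by BIC (note scikit-learn's BIC is lower-is-better, the opposite sign convention to mclust), might look like this on stand-in data:

import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.normal(size=(71, 2))       # stand-in for the first two PCs
best_bic, best_gmm = np.inf, None
for k in range(1, 10):
    gmm = GaussianMixture(n_components=k, covariance_type='full',
                          random_state=42).fit(X)
    bic = gmm.bic(X)                     # lower BIC = better fit/complexity trade-off
    if bic < best_bic:
        best_bic, best_gmm = bic, gmm
labels = best_gmm.predict(X)             # EM-based component assignment per observation
print(best_gmm.n_components, best_bic)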
Figure 9. K-Means and C-Means clustering for fault identification.
The results are summarized in Table III. From the classification by
the Gaussian finite mixture model fitted with the EM algorithm, a
total of five component groups were formed. Components 1 and 2 are
assigned to observations 1 to 40, component group 3 consists of
observations 41 to 63, component group 4 of observations 64 to 67,
and component 5 of observations 68 to 71. It is interesting to note
that the critical fault is detected at the same point as in the
other clustering algorithms.
IV. RESULTS
In this research, we initially hypothesized two states in the data:
a healthy data set and an unhealthy data set. Using PCA and the T2
statistic, we were able to fit our hypothesized states and detect
the faults 31 observations ahead, whereas without a tool, based on
the data plots alone, we could observe the trend only 11
observations ahead. As we moved on to fitting different unsupervised
clustering algorithms, we found that most of them provided much more
insight than the T2 statistic. Using the elbow method and the
NbClust package, we identified three as the optimal number of
clusters. When the data was then fitted with hierarchical
clustering, K-means, and C-means, the results were nearly identical.
Based on prior knowledge of the data, we were able to identify each
of the three states: the first state was identified as the healthy
state (since it was calibrated on healthy data), the second as the
warning state, and the third as the faulty state. These matching
results are not surprising, as all of these algorithms are based on
a distance measure.
Figure 10. Gaussian finite mixture model fitted by the EM algorithm: classification.
     For our final model, the Gaussian finite mixture model fitted
     by the EM algorithm was used. Unlike the methods above, where
     the number of clusters is supplied, this model identifies the
     optimal number of clusters and classifies the observations into
     groups accordingly. Here, the model recognized a total of five
     components. Upon closer investigation of the five components,
     we could observe an overlap between components 1 and 2 and
     between components 3 and 4. When these components are merged,
     we can observe a pattern much like the previous cluster
     analyses.
     V. CONCLUSION
     This research started out as a test bed for benchmarking
     different machine learning algorithms for early fault detection
     using unsupervised learning. In our results, the T2 statistic
     provided more accurate results than the GMM method, and no
     hypothesis was required to identify the relationship between
     cluster and state. One of the main benefits of this method is
     that, even when deployed to a manufacturing environment with
     minimal or no domain knowledge, one can identify a fault or
     critical condition; in clustering, by contrast, some
     information about the data is needed to label the clusters as
     healthy, warning, or critical. Clustering methodology is
     undoubtedly the better tool for detecting different levels of
     faults, where the T2 statistic would struggle beyond a certain
     level. To emphasize this: when machine maintenance is
     expensive, clustering is the more flexible option, as machine
     health can be monitored continuously until a critical level is
     reached.
In conclusion, although most algorithms provided nearly similar
results, each algorithm provided deeper insight into the data.
Hence, if the application is only to detect faults, the T2 statistic
would be an excellent tool; but if fault detection needs to be
performed at different levels, clustering algorithms are the better
choice.
VI. FUTURE SCOPE OF WORK
Fault detection is one of the preliminary analytics for predictive
maintenance; hence, detecting faults accurately is regarded as
important. This work was performed on vibration data. The scope of
this research can be extended to other physics-based parameters and
to combinations of these parameters. It would also be interesting to
observe the detection accuracy for larger sample sizes and multiple
fault states.
REFERENCES
[1] R. K. Mobley, An Introduction to Predictive Maintenance, 2nd
ed., 2002. ISBN 0-7506-7531-4.
[2] D. Battini, M. Calzavara, A. Persona, and F. Sgarbossa,
"Sustainable packaging development for fresh food supply chains,"
Packaging Technology and Science, vol. 29, pp. 25-43, 2016. doi:
10.1002/pts.2185.
[3] J. Lee, H.-A. Kao, and S. Yang, "Service innovation and smart
analytics for Industry 4.0 and big data environment," Procedia
CIRP, vol. 16, pp. 3-8, 2014.
[4] M. Lebold and M. Thurston, "Open standards for condition-based
maintenance and prognostic systems," in Proceedings of MARCON 2001,
the Fifth Annual Maintenance and Reliability Conference, Gatlinburg,
USA, 2001.
[5] E. Garcia, H. Guyennet, J.-C. Lapayre, and N. Zerhouni, "A new
industrial cooperative tele-maintenance platform," Computers &
Industrial Engineering, vol. 46, no. 4, pp. 851-864, 2004.
[6] C. Groba, S. Cech, F. Rosenthal, and A. Gossling, "Architecture
of the predictive maintenance framework," in 6th International
Conference on Computer Information Systems and Industrial Management
Applications, IEEE, 2007.
Appendix
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from scipy.stats import f

# Simulated feature data (dummy values standing in for the paper's vibration features)
np.random.seed(42)
n_samples = 72
n_features = 5
X = np.random.normal(0, 1, (n_samples, n_features))

# Standardize the data to zero mean and unit variance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# PCA from first principles: covariance matrix and its eigendecomposition
cov_matrix = np.cov(X_scaled, rowvar=False)
eigen_values, eigen_vectors = np.linalg.eigh(cov_matrix)

# Sort eigenvalues and eigenvectors in decreasing order of variance
sorted_index = np.argsort(eigen_values)[::-1]
eigen_values = eigen_values[sorted_index]
eigen_vectors = eigen_vectors[:, sorted_index]
# Scree plot of eigenvalues versus principal components
plt.figure(figsize=(8, 5))
plt.plot(range(1, len(eigen_values) + 1), eigen_values, 'o-', color='blue')
plt.title('Scree Plot')
plt.xlabel('Principal Component')
plt.ylabel('Eigenvalue')
plt.grid(True)
plt.tight_layout()
plt.show()

# Project data onto the top 2 principal components
k = 2
top_eigenvectors = eigen_vectors[:, :k]
PC_scores = np.dot(X_scaled, top_eigenvectors)
# T² statistic for each observation (PC scores scaled by their eigenvalues)
T2 = np.sum((PC_scores ** 2) / eigen_values[:k], axis=1)

# T² upper control limit from the F-distribution
n = n_samples
a = k
alpha = 0.05
F_val = f.ppf(1 - alpha, a, n - a)
T2_limit = a * (n - 1) / (n - a) * F_val

# T² plot with the control limit; points above the limit are flagged as faults
plt.figure(figsize=(10, 4))
plt.plot(T2, label='T² Statistic', color='darkred')
plt.axhline(y=T2_limit, color='green', linestyle='--', label='Control Limit')
plt.title("T² Statistic Plot")
plt.xlabel("Observation Index")
plt.ylabel("T² Value")
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.mixture import GaussianMixture
import seaborn as sns

# Use the first two principal components for clustering
data_for_clustering = PC_scores[:, :2]

# K-Means clustering with three clusters
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans_labels = kmeans.fit_predict(data_for_clustering)

# Hierarchical (agglomerative) clustering with Ward linkage
hierarchical = AgglomerativeClustering(n_clusters=3, linkage='ward')
hierarchical_labels = hierarchical.fit_predict(data_for_clustering)

# Gaussian mixture model clustering
gmm = GaussianMixture(n_components=3, random_state=42)
gmm_labels = gmm.fit_predict(data_for_clustering)
# Plot the three clusterings side by side in PC space
fig, axs = plt.subplots(1, 3, figsize=(18, 5))

# K-Means
sns.scatterplot(x=data_for_clustering[:, 0], y=data_for_clustering[:, 1],
                hue=kmeans_labels, palette="tab10", ax=axs[0])
axs[0].set_title("K-Means Clustering")
axs[0].set_xlabel("PC1")
axs[0].set_ylabel("PC2")

# Hierarchical
sns.scatterplot(x=data_for_clustering[:, 0], y=data_for_clustering[:, 1],
                hue=hierarchical_labels, palette="tab10", ax=axs[1])
axs[1].set_title("Hierarchical Clustering")
axs[1].set_xlabel("PC1")
axs[1].set_ylabel("PC2")

# GMM
sns.scatterplot(x=data_for_clustering[:, 0], y=data_for_clustering[:, 1],
                hue=gmm_labels, palette="tab10", ax=axs[2])
axs[2].set_title("Gaussian Mixture Model")
axs[2].set_xlabel("PC1")
axs[2].set_ylabel("PC2")
plt.tight_layout()
plt.show()