Deep Graph Neural Network For Fault Detection and Identification in Distribution Systems
Quang-Ha Ngo1, Bang L. H. Nguyen2, Jianhua Zhang1, Karl Schoder3, Herbert Ginn4, and Tuyen Vu1
1 Clarkson University, 2 Duy Tan University, 3 Florida State University, 4 University of South Carolina
Posted on 5 Sep 2024 — CC-BY-NC-SA 4 — https://doi.org/10.36227/techrxiv.172555531.16904989/v1 — e-Prints posted on TechRxiv are preliminary reports that are not peer reviewed. They should not b...
Abstract—With the growing integration of renewable energy sources, the penetration of distributed generation leads to increasingly dynamic power distribution system topologies. This poses challenges for traditional fault diagnostic methods in accurately classifying and locating faults. This paper develops a deep graph neural network (GNN) for detecting and managing fault events on distribution systems. The 1-D convolutional graph attention network (1-D GAT) exploits the spatial-temporal features from both voltages and branch currents to enhance the accuracy of fault diagnostics compared to existing methods. The effectiveness of the proposed method is evaluated on the Potsdam 13-bus and IEEE 123-bus test feeder systems. Results indicate notable improvements in accuracy and other metrics, achieving a 1-2% increase in fault event detection, an average 4% improvement in identifying fault types, and an average 5% improvement in fault location.

Index Terms—Fault management, distribution system, 1-D convolutional, graph attention networks, deep learning.

I. INTRODUCTION

Modern power grids are beginning to transition to clean renewable energy resources. It has been predicted that by 2050, 44% of electrical energy in the US will come from renewables (with more than 80% from solar and wind) [1]. While renewable energy resources provide benefits through the generation of clean electricity, the integration of more renewables elevates the risk of faults in power distribution systems due to bad weather, insulation failures, and improper operations. More than 70% of interruptions are caused by various faults in distribution system operation [2]. The resulting accidental blackouts can affect critical business operations and production, and even become threats to communities. Therefore, three essential tasks (fault event detection, classification, and localization) are introduced to address this issue. Firstly, fault event detection identifies whether a fault has occurred in the power system. Secondly, fault classification determines the specific fault type, such as single line-to-ground, line-to-line, or three-line-to-ground faults. This information is critical for selecting the appropriate protection schemes and isolation methods [3]. Thirdly, fault localization identifies faulted positions and expedites restoration. Solving the three tasks as simultaneously as possible is crucial to enable rapid and effective fault isolation by protection schemes and restoration in power systems, thereby minimizing disruptions and enhancing overall reliability.

Existing power system fault detection techniques, proposed in the literature [2], [4]–[12], can be categorized into digital relay-based and data-driven approaches. Digital relay-based methods rely on protection devices installed at certain locations to detect faults, typically using impedance, travelling waveform, and sparse measurements analysis, as described in [2]. Travelling wave methods analyze transient waves from fault points to facilitate the identification and localization of faults but require high-frequency sensors, substantially raising the deployment cost [13]. The impedance-based methods iteratively solve nonlinear equations with line steady-state conditions using voltage or current measurements to locate faults [14]. Voltage sag-based methods identify voltage drop patterns at certain monitored buses, as faults produce unique voltage sag propagation depending on their location [15]. Some other digital relay-based approaches, like the transient monitor function [16] or the Hilbert transform-based detection [17], identify faults by analyzing current signals. Both [18] and [19] point out that the precision of impedance-based techniques can be influenced by factors such as types of fault, measurement errors, and varying system parameters. However, these traditional coordination methods for protective relays are facing challenges related to cost, accuracy, and uncertainty due to the intermittent and distributed nature of renewable energy sources. This dynamic nature of modern distribution systems necessitates the development of new data-driven approaches for fault diagnostics.

Unlike the digital relay-based methods, data-driven methods focus on mining distinctive features from a large amount of system measurements and fault data to identify and locate faults in power distribution systems without solving complex equations of the physical systems or analyzing digital relays. [20], [21] apply a convolutional neural network (CNN) to fault classification on a 3-bus distribution network. [22] proposes an adaptive time-frequency memory (AD-TFM) cell that embeds an adaptive wavelet transform into long short-term memory (LSTM) for fault detection in power distribution systems. [23] presents a voltage data processing-based approach using Gaussian process regression for fault localization and isolation in AC microgrids. [24] introduces a bidirectional LSTM for predicting the voltage stability in hybrid AC/DC microgrids. [25] combines LSTM and an adaptive neural fuzzy inference system (ANFIS) to accurately detect faults in the IEEE 13-node system. [26] proposes a wavelet multi-resolution analysis and data mining-based approach for fault
detection and classification. [27] uses the Fast Fourier Transform for feature extraction and employs a multilayer perceptron neural network for fault classification and location in distribution systems. [28] proposes a deep CNN-transformer model that utilizes 1-D deep CNNs for feature extraction and a transformer encoder with self-attention for sequence learning, applied to fault detection in the IEEE 14-bus distribution system. Overall, the data-driven methods aim to learn the relationship between measurements and output labels by minimizing the loss function over the training data. However, these conventional approaches struggle to effectively capture correlations for regression and classification tasks, particularly with the increasing complexity of information from unevenly distributed and dynamic distribution systems.

Motivated by the limitations of existing data-driven approaches, graph neural network-based fault diagnostic methods have been proposed. [29] proposes a multi-receptive field graph convolutional network to learn feature representations from multiple neighborhood domains for mechanical fault diagnosis. In [30], a graph convolutional neural network leverages dynamic voltage measurements as nodal features to determine the types and locations of faults in an 8-bus shipboard test network. A semi-supervised graph convolutional network is proposed in [31] to address the limited availability of labeled data for electromechanical system fault diagnosis. [32] introduces a combination of contrastive learning and a generative adversarial neural network (GAN) to detect and classify faults in distribution lines. These papers mention that a graph neural network (GNN) can leverage the graph-structured data to effectively capture the spatial correlations among power system components, addressing the limitation of traditional data-driven methods.

Although GNN-based approaches achieve remarkable results for fault location and classification in power systems, there are still research gaps. The major technical differences with current GNN-based approaches are summarized in Table I. Voltage phasors and current injections are utilized as node features by [33], [34], and [35], while only voltage measurements are considered as node features for fault detection, classification, and localization under varying conditions in [36]–[39]. However, these approaches neglect branch currents, potentially limiting their fault analysis capability. Incorporating current measurements as edge features is regarded, in the future-research discussions of these papers, as essential for enhancing fault diagnostic accuracy, because different types of faults (such as line-to-ground, line-to-line, or three-phase faults) exhibit distinct patterns in branch currents [33], [34], [36]. In [40], a physics-preserved graph network is introduced to enhance fault location accuracy under limited data, but the range of fault types is limited. Regarding fault detection and identification, the work in [33]–[35], [39], [40] focuses on only one or two tasks. In summary, current GNN-based fault diagnostic methods have notable limitations, including limited fault diagnosis capability and low resolution of fault types.

In this paper, a novel fault diagnostic deep graphical learning method for distribution systems is developed. The target is to perform fault event detection, fault type identification, and fault localization using three-phase voltage and branch current data of main buses. The main contributions can be summarized as follows:
• We propose the 1-D convolutional graph attention neural network to improve the accuracy of fault diagnostics using voltages and branch currents as inputs.
• We are the first to exploit branch currents as multi-dimensional edge attributes for feature representation, enhancing the capability of diagnosing faults since branch currents can indeed reflect fault characteristics.
• The proposed method is noise-resilient, as validated with the Potsdam 13-node system and the IEEE 123-node system under noise scenarios.

The remaining sections are structured as follows. Section II describes the proposed fault diagnostic scheme. Section III presents the simulation results of the proposed method applied to the IEEE 123-node and Potsdam 13-node systems. Section IV presents the conclusions.

Fig. 1. A self-attention process in each graph attention layer [41]

II. METHODOLOGY

This section introduces the principles and structures of the 1-D convolution layer and the graph attention layer. The proposed fault diagnostic approach based on the 1-D GAT is then described in detail.

A. 1-D Convolutional Layer

The 1-D convolution layer is a deep learning technique for analyzing sequential data, such as time series, audio signals, and images over a single dimension [42]. This layer slides a filter over the sequence, extracting features by applying a convolution operation that captures the temporal dependencies within the data. This makes the 1-D convolutional layer well-suited to learning local patterns efficiently when recognizing trends and anomalies in time series [42]. The 1-D convolution operation can be mathematically formulated as follows:

$$C_k^{l} = \sigma\left( \sum_{i=1}^{N_{l-1}} \left( w_{ik}^{l-1} * C_i^{l-1} \right) + b_k^{l} \right), \tag{1}$$

where $C_k^{l}$ is the output of the $k$th neuron at layer $l$ and $C_i^{l-1}$ is the output of the $i$th neuron at the previous layer $l-1$. $N_{l-1}$ is the number of feature maps at layer $l-1$. $w_{ik}^{l-1}$ are the trainable weights from the $i$th neuron at layer $l-1$ to the $k$th neuron at layer $l$. The operator $*$ indicates the convolution operation and $b$ stands for the network bias. $\sigma(\cdot)$ is the activation function.
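As an illustration of Eq. (1), the following NumPy sketch evaluates one 1-D convolutional layer on a short multi-channel window. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `conv1d_layer`, the array shapes, the choice of ReLU for σ, the 20-sample window, and the use of "valid" convolution without padding are all chosen for the example.

```python
import numpy as np

def conv1d_layer(prev_maps, weights, biases):
    """Evaluate Eq. (1) for one layer.

    prev_maps: (N_prev, T) array, feature maps C_i^{l-1}
    weights:   (N_prev, N_out, F) array, kernels w_ik^{l-1}
    biases:    (N_out,) array, biases b_k^l
    Returns an (N_out, T - F + 1) array of output maps C_k^l.
    ReLU is used as the activation sigma (assumption).
    """
    n_prev, n_out, _ = weights.shape
    outputs = []
    for k in range(n_out):
        # Sum of "valid" 1-D convolutions over all input maps i.
        acc = sum(
            np.convolve(prev_maps[i], weights[i, k], mode="valid")
            for i in range(n_prev)
        )
        outputs.append(np.maximum(acc + biases[k], 0.0))
    return np.stack(outputs)

# Hypothetical input: three phases, 20 time samples each.
rng = np.random.default_rng(0)
window = rng.normal(size=(3, 20))
kernels = rng.normal(size=(3, 4, 3))   # 3 input maps, 4 output maps, width 3
features = conv1d_layer(window, kernels, np.zeros(4))
```

Note that `np.convolve` flips the kernel (true convolution), whereas deep learning frameworks usually compute cross-correlation. For learned kernels the two differ only by a reversal of the kernel taps.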
TABLE I
COMPARISON OF TECHNICAL ASPECTS WITH EXISTING GNN-BASED RESEARCH
Work               Fault      Fault           Fault         Branch currents  Noise      Types of fault
                   detection  classification  localization  as inputs        resilient
Chen et al. [33]   ✗          ✗               ✓             ✗                ✓          LG, LL, LLG
Hu et al. [34]     ✗          ✓               ✓             ✗                ✓          LG, LL, LLG, LLL
De et al. [35]     ✗          ✗               ✓             ✗                ✗          LG
Bang et al. [36]   ✓          ✓               ✓             ✗                ✗          LG, LL, LLG, LLL, LLLG
Xu et al. [37]     ✓          ✓               ✓             ✗                ✗          LG, LL, LLL
Chan et al. [38]   ✓          ✓               ✓             ✗                ✗          LG, LL, LLG, LLL, LLLG
Tong et al. [39]   ✓          ✓               ✗             ✗                ✓          LG, LL, LLG, LLL
Li et al. [40]     ✗          ✗               ✓             ✗                ✗          LG, LL, LLG
Ours               ✓          ✓               ✓             ✓                ✓          LG, LL, LLG, LLL, LLLG
Following each 1-D convolutional layer, a nonlinear activation layer is applied. This layer is crucial because it introduces non-linearity, enabling the network to model complex relationships between inputs and outputs. Using an activation layer also increases the flexibility and power of neural networks to model complex relationships within the data effectively. Among the widely used activation functions are the Rectified Linear Unit (ReLU), Sigmoid, and Tanh, each offering different characteristics suited to specific tasks. The ReLU function, which is employed in this paper, can be written as follows:

and attention mechanism. The components involved in the information aggregation can be described as follows.
1) Self-attention mechanism: This enables nodes in a graph to weigh the importance of their neighboring nodes when updating their own representation. This allows each node to dynamically adjust its representation based on the information from its neighbors, capturing the relational dependencies within the graph. The coefficients $e_{i,j}$ are computed across pairs of nodes $i, j$ based on their features as:
[Figure: architecture of the proposed model on the Potsdam 13-node system (buses 1-13, PMU, DGU 5). Labels: Graph-structured Dataset; Temporal Information; DataBatch(x = [13, 1, 20, 3], edge index = [2, 13], edge attrs = [13, 1, 20, 3]); Reshape; Conv1d; ReLU; Dropout; Flatten; Dense; output dimensions OD(13, 32), OD(16, 27), OD(256), OD(128); Sigmoid or Softmax; Fault Type Classification, OD(11); Fault Location, OD(13).]
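To make the self-attention step concrete, the sketch below computes attention coefficients with the standard single-head GAT formulation. The equations for $e_{i,j}$ used in this paper are not reproduced in this section, so the LeakyReLU scoring, the softmax normalization over neighbors, and all names (`gat_attention`, `W`, `a`, `slope`) are assumptions taken from the standard formulation. Edge attributes such as branch currents are deliberately left out.

```python
import numpy as np

def gat_attention(h, adj, W, a, slope=0.2):
    """Standard single-head GAT attention (assumed formulation).

    h:   (M, F) node features
    adj: (M, M) 0/1 adjacency matrix including self-loops
    W:   (F, D) shared linear transform
    a:   (2 * D,) attention vector
    Returns (alpha, out): (M, M) coefficients and (M, D) aggregated features.
    """
    z = h @ W
    d = z.shape[1]
    # e_ij = LeakyReLU(a^T [z_i || z_j]), split into source and target terms.
    e = (z @ a[:d])[:, None] + (z @ a[d:])[None, :]
    e = np.where(e > 0, e, slope * e)
    # Mask non-neighbors so each node attends only to its neighborhood.
    e = np.where(adj > 0, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)      # numerical stability
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha, alpha @ z

# Hypothetical 4-bus chain with self-loops.
adj = np.eye(4)
for i in range(3):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
rng = np.random.default_rng(1)
alpha, out = gat_attention(rng.normal(size=(4, 3)), adj,
                           rng.normal(size=(3, 2)), rng.normal(size=4))
```

Each row of `alpha` sums to one, so the updated representation of a bus is a weighted average of its own and its neighbors' transformed features.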
training process, depicted in Fig. 2, can be summarized as follows. Initially, the spatial and temporal information is collected from the power system and pre-processed into graph format. The graph datasets are divided into batches which are propagated through the 1-D GAT model. The configuration of the 1-D GAT includes one graph attention layer, one 1-D convolutional layer, and three fully connected layers with a decreasing number of features. The inclusion of a 1-D convolutional layer allows the 1-D GAT to incorporate a temporal pattern recognition capability into the graph-structured data analysis. The input data for fault diagnosis includes the node features $X$, the edge indices $\mathcal{E}$, the adjacency matrix $A$, and the edge features $E$. The learning goal is to develop a model that can effectively capture and generalize the relationship between the node and edge attributes and the desired output, as expressed in Eq. (6a). The loss function $L(\hat{y}, y)$, given in Eq. (6b), takes the probability distribution over the classes predicted by the model and compares it with the true distribution. The input-output model of the 1-D GAT and the loss function can be expressed as

$$\hat{y} = F(X, \mathcal{E}, A, E, W) \tag{6a}$$

$$L(\hat{y}, y) = -\sum_{i=1}^{N} y_i \log(\hat{y}_i) \tag{6b}$$

where $W$ denotes the trainable weights, $y$ is the ground-truth label, and $\hat{y}$ is the output label. $y_i$ represents the true probability of class $i$ (0 or 1) and $\hat{y}_i$ represents the predicted probability of class $i$. The label vectors $y$ and $\hat{y}$ could correspond to $y_{event}$ for the binary fault event, $y_{type}$ for fault categories, or $y_{location}$ for fault locations.

Each three-phase bus voltage and branch current are respectively denoted as $V_{a,k}, V_{b,k}, V_{c,k}$ and $I_{a,k}, I_{b,k}, I_{c,k}$, where $K$ is the length of the period and $k = 1, 2, \ldots, K$. The node feature of bus $i$ and the edge attribute, taken as the branch current between bus $i$ and bus $j$, are formulated as

$$X_i = \begin{bmatrix} V_{a,1} & V_{a,2} & \cdots & V_{a,K} \\ V_{b,1} & V_{b,2} & \cdots & V_{b,K} \\ V_{c,1} & V_{c,2} & \cdots & V_{c,K} \end{bmatrix}, \quad E_{ij} = \begin{bmatrix} I_{a,1} & I_{a,2} & \cdots & I_{a,K} \\ I_{b,1} & I_{b,2} & \cdots & I_{b,K} \\ I_{c,1} & I_{c,2} & \cdots & I_{c,K} \end{bmatrix} \tag{7}$$

Each data sample is considered as a graph $G = (X = \{X_1, X_2, \cdots, X_M\}, \mathcal{E}, A, E = \{E_1, E_2, \cdots, E_N\})$, where $X$ represents the set of node features, $M$ denotes the total number of buses in the system, $\mathcal{E}$ refers to the collection of edges, and each edge connects two nodes [44]. $E$ consists of the vectors of edge features, $N$ is the total number of edges, and $A$ denotes the adjacency matrix, representing the connectivity of the distribution network.

The fault types are categorized into 11 labels, including single line-to-ground (AG, BG, CG), line-to-line (AB, BC, CA), double line-to-ground (ABG, BCG, CAG), triple line-to-ground (ABCG), and triple short-circuit (ABC). The label $y_{event}$ denotes fault occurrence with binary values, 1 for presence and 0 for absence. The label $y_{type}$ ranges from 0 to 10, indicating the corresponding 11 fault labels. Similarly, the fault location label is $y_{location} = i$, where $i = 1, 2, \cdots, M$, if there is a fault at the $i$th bus.

Algorithms 1 and 2 describe the training and testing procedures for fault detection, classification, and localization. The training procedure involves iteratively sampling batches of graph data, using the proposed 1-D GAT model to predict labels, computing the loss, and updating model parameters. In the testing procedure, the trained model's parameters are loaded, the model is applied to the test data, and evaluation metrics are computed. The training on graph datasets uses stochastic gradient descent (SGD) as the optimizer. The binary cross entropy (BCE) serves as the loss function for fault event detection, while fault classification and location use the cross entropy.

[Figure: flowchart of the training and testing stages. Labels: Historical and simulation data; Power System Real-time Measurements; Data acquisition and Feature extraction; 1-D Graph Attention Network models; Deploy; Graph construction and Parameter initialization; Potsdam 13-Bus; IEEE 123-Bus; Deep Graph Neural Network Training; Fault event detection, classification & localization; Training Stage; Testing Stage.]
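The label encoding and the two loss functions named above can be sketched in a few lines of NumPy. This is an illustrative sketch only: the helper names are hypothetical, the index assigned to each fault type is an assumption (the text states only that $y_{type}$ ranges from 0 to 10), and the clipping constant `eps` is added for numerical safety.

```python
import numpy as np

# Assumed index order for the 11 fault labels (y_type = 0..10).
FAULT_TYPES = ["AG", "BG", "CG", "AB", "BC", "CA",
               "ABG", "BCG", "CAG", "ABCG", "ABC"]

def one_hot(index, num_classes):
    """Return a one-hot vector y with y[index] = 1."""
    y = np.zeros(num_classes)
    y[index] = 1.0
    return y

def cross_entropy(y_true, y_prob, eps=1e-12):
    """Eq. (6b): L = -sum_i y_i * log(yhat_i)."""
    return float(-np.sum(y_true * np.log(np.clip(y_prob, eps, 1.0))))

def binary_cross_entropy(y, p, eps=1e-12):
    """BCE for the fault event label y_event in {0, 1}."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

# A line-to-line AB fault scored by an uninformed (uniform) classifier.
y_type = one_hot(FAULT_TYPES.index("AB"), len(FAULT_TYPES))
uniform = np.full(len(FAULT_TYPES), 1.0 / len(FAULT_TYPES))
type_loss = cross_entropy(y_type, uniform)      # equals log(11)
event_loss = binary_cross_entropy(1.0, 0.5)     # equals log(2)
```

The uniform prediction gives a reference loss of log(11) for the 11-class task, which is a useful baseline when reading training curves.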
III. SIMULATION RESULTS

A. Case Studies

This paper focuses on case studies involving the Potsdam 13-node and the IEEE 123-node systems, illustrated in Figs. 3 and 4. The Potsdam 13-node system, designed by our lab, is powered by five inverter-based generators (IBGs), including one photovoltaic array, two hydro plants, and two fossil fuel-based backup generators. This microgrid can operate in grid-connected as well as islanded modes, ensuring uninterrupted power supply even during outages. The nominal voltage is 13.2 kV at 60 Hz. The microgrid is simulated using Matlab and Opal-RT, with load and IBG parameters set according to those outlined in [45]. Input measurements are captured at locations identified by blue squares. The bus voltages and branch currents are measured at a rate of 1 kHz. For the proposed fault diagnostic approach utilizing the deep graph neural network, operational data including load-change scenarios and fault conditions are gathered for the training and testing procedures. The graph structure representing the system's topology is constructed, taking into account all 13 buses in the system.

Fig. 3. The Potsdam 13-node system diagram
[Figure labels: buses 1-13; DGU 1 to DGU 5; legend: fault points, PMU measurements.]

[Figure: IEEE 123-node system diagram (bus numbers only).]

Regarding the IEEE 123-node system, we perform the simulation in Opal-RT, with parameter settings in accordance with [46]. Fault positions are created at different three-phase buses across the system, as illustrated in Fig. 4. Measurement data, collected only from the buses, are indicated by blue
Fig. 5. The three-phase voltage data in the IEEE 123-node
[Two panels of Va, Vb, Vc versus Time (ms): 0 to 1000 ms and a zoom on 400 to 420 ms.]

Fig. 6. The training accuracy curves with machine learning models
[Accuracy (%) versus Epochs (0 to 50) for MLP, CNN, GCN, GAT, and 1-D GAT.]

Fig. 7. The fault classification confusion matrix for Potsdam microgrid
[Axes: true label versus predicted label over the 11 fault types.]

Fig. 9. The fault location confusion matrix on Potsdam microgrid
[Axes: true label versus predicted label over buses B1 to B13.]
Fig. 8. The fault classification confusion matrix for IEEE 123-node system
true \ predicted    AG    BG    CG    AB    BC    CA   ABC   ABG   BCG   CAG  ABCG
AG                 268     0     0     0     0     0     0     0     0     0     0
BG                  15   303     0     0     0     0     0     0     0     0     0
CG                   4     0   296     0     2     1     0     0     0     2     0
AB                   7     0     4   302     0     0     0     0     0     0     0
BC                   0     0     0     0   278     0     0     0     0     0     0
CA                   4     0     0     0     3   296     0     0     0     0     0
ABC                  0     0     0     0     0     0   276     0     3     1    10
ABG                 11     1     0     0     0     0     0   274     0     0     0
BCG                  7     0     0     0     1     0     0     0   320     0     0
CAG                 12     0     0     0     0     0     1     0     1   297     0
ABCG                 0     1     0     0     0     0    22     0     0     0   278

Fig. 10. The fault location confusion matrix on IEEE 123-node system
true \ predicted    B7   B18   B25   B51   B53   B62   B80   B89   B97  B101
B7                 331     0     0     0     0     0     0     0     0     0
B18                  1   315     0     1     0     0     0     0     0     0
B25                  3     1   343     0     0     0     0     0     0     0
B51                  3     3     0   338     0     0     0     1     0     2
B53                  2     0     0     0   297     0     0     0     0     0
B62                  1     0     0     0     2   337     0     0     0     0
B80                  0     0     0     0     0     0   319     1     0     0
B89                  0     0     0     0     0     0     0   323     0     0
B97                  0     0     0     1     0     0     0     0   316    17
B101                 0     0     0     1     0     0     1     0    24   316
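As a worked check on Fig. 8, overall accuracy and per-class precision and recall can be computed directly from the confusion-matrix counts. The sketch below is illustrative and rests on two assumptions: the counts are as read from the figure, and the predicted-label columns follow the same order as the true-label rows. The variable names are hypothetical.

```python
import numpy as np

LABELS = ["AG", "BG", "CG", "AB", "BC", "CA",
          "ABC", "ABG", "BCG", "CAG", "ABCG"]

# Rows: true label, columns: predicted label (counts read from Fig. 8).
CM = np.array([
    [268,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
    [ 15, 303,   0,   0,   0,   0,   0,   0,   0,   0,   0],
    [  4,   0, 296,   0,   2,   1,   0,   0,   0,   2,   0],
    [  7,   0,   4, 302,   0,   0,   0,   0,   0,   0,   0],
    [  0,   0,   0,   0, 278,   0,   0,   0,   0,   0,   0],
    [  4,   0,   0,   0,   3, 296,   0,   0,   0,   0,   0],
    [  0,   0,   0,   0,   0,   0, 276,   0,   3,   1,  10],
    [ 11,   1,   0,   0,   0,   0,   0, 274,   0,   0,   0],
    [  7,   0,   0,   0,   1,   0,   0,   0, 320,   0,   0],
    [ 12,   0,   0,   0,   0,   0,   1,   0,   1, 297,   0],
    [  0,   1,   0,   0,   0,   0,  22,   0,   0,   0, 278],
])

correct = np.diag(CM)
accuracy = correct.sum() / CM.sum()       # fraction of samples on the diagonal
recall = correct / CM.sum(axis=1)         # per true class
precision = correct / CM.sum(axis=0)      # per predicted class

# Largest off-diagonal count: the most frequent confusion.
off_diag = CM - np.diag(correct)
i, j = np.unravel_index(off_diag.argmax(), off_diag.shape)
worst_pair = (LABELS[i], LABELS[j])
```

With these counts the largest single confusion is ABCG predicted as ABC, which matches the difficulty in separating the ABC and ABCG faults discussed in Section III-D.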
Fault location: The proposed model maintains its lead, most notably on the IEEE 123-node system with an accuracy of 98.03% and a precision of 98.06%. While the performance gap between the proposed model and others like the GAT and GCN is narrower in fault location than in detection and classification, the proposed model still exhibits the best overall capability in pinpointing the exact location of faults within the systems. This precision in fault location is crucial for quick responses and minimizing downtime in electrical grids.

D. Confusion Matrices for Classification and Location

The confusion matrices of the 1-D GAT for fault type classification in the Potsdam 13-node and the IEEE 123-node systems are illustrated in Figs. 7 and 8, respectively. The diagonal elements represent correct classifications, while the off-diagonal elements indicate misclassifications between certain fault types. These matrices reveal that the proposed model achieves a high degree of accuracy in classifying and predicting the nature of the fault data. However, it is hard for the model to distinguish the ABC and ABCG faults because the two fault types have a similar transient behavior. Apart from this pair, the proposed model classifies the remaining fault types in the unseen test data with few errors.

Similarly, confusion matrices of the 1-D GAT for fault location are shown in Figs. 9 and 10. From the confusion matrices, it can be seen that the diagonal elements are substantially high, implying good performance for fault location. The proposed model can localize fault positions among 13 buses for the Potsdam microgrid and 10 buses for the IEEE 123-node system.

E. Impact of Measurement Noises

To evaluate the effects of noise, we add noise to the voltage and branch current measurements before training and testing. This noise follows a normal distribution with a mean of zero, with noise ratios of 3%, 6%, and 10%. The impact of the added noise on the accuracies of both the Potsdam 13-node and the IEEE 123-node systems is presented in Table VI. The results indicate that low noise levels, such as 3% and 6%, have a small impact on the performance, with accuracy decreasing by 0.5% to 2% for each fault scenario. However, under 10% noise, the accuracies drop noticeably.

Under three noise scenarios in the IEEE 123-node system, the performance of GAT and 1-D convolution GAT is compared in Table VII. The proposed method outperforms GAT by 1.3% to 3% as noise increases from 3% to 10%. The proposed 1-D convolution GAT approach is more robust and noise-resilient than the traditional GAT method, particularly in scenarios with higher levels of noise or disturbances in the
TABLE IV
FAULT DETECTION METRICS ON POTSDAM 13-NODE MICROGRID
TABLE V
FAULT DETECTION METRICS ON IEEE 123-NODE SYSTEM