E(n) Equivariant Graph Neural Networks
Abstract
This paper introduces a new model to learn graph neural networks equivariant to rotations, translations, reflections and permutations, which we call E(n) Equivariant Graph Neural Networks (EGNNs). We evaluate our method in modelling dynamical systems, representation learning in graph autoencoders and predicting molecular properties in the QM9 dataset. Our method reports the best performance in all three experiments, achieving state-of-the-art results on several regression targets of the QM9 dataset.
2. Background
In this section we introduce the relevant materials on equivariance and graph neural networks, which will later complement the definition of our method.

2.1. Equivariance

Let T_g : X → X be a set of transformations on X for the abstract group g ∈ G. We say a function φ : X → Y is equivariant to g if there exists an equivalent transformation on its output space S_g : Y → Y such that:

φ(T_g(x)) = S_g(φ(x))    (1)
As a practical example, let φ(·) be a non-linear function, x = (x_1, . . . , x_M) ∈ R^{M×n} an input set of M points embedded in an n-dimensional space, φ(x) = y ∈ R^{M×n} the transformed set of points, T_g a translation on the input set, T_g(x) = x + g, and S_g an equivalent translation on the output set, S_g(y) = y + g. If our function φ : X → Y is translation equivariant, translating the input set T_g(x) and then applying the function φ(T_g(x)) delivers the same result as first running the function y = φ(x) and then applying an equivalent translation to the output T_g(y), such that Equation 1 is fulfilled and φ(x + g) = φ(x) + g. In this work we explore the following three types of equivariance on a set of particles x:

1. Translation equivariance. Translating the input by g ∈ R^n results in an equivalent translation of the output. Let x + g be shorthand for (x_1 + g, . . . , x_M + g). Then y + g = φ(x + g).

2. Rotation (and reflection) equivariance. For any orthogonal matrix Q ∈ R^{n×n}, let Qx be shorthand for (Qx_1, . . . , Qx_M). Then rotating the input results in an equivalent rotation of the output: Qy = φ(Qx).

3. Permutation equivariance. Permuting the input results in the same permutation of the output, P(y) = φ(P(x)), where P is a permutation on the row indices.

Note that velocities v ∈ R^{M×n} are unaffected by translations, but they transform equivalently under rotation (2) and permutation (3). Our method, introduced in Section 3, satisfies all three of the above equivariance constraints.
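To make these properties concrete, the following toy check (our own illustration, not from the paper) verifies numerically that a simple map built from relative differences, φ(x)_i = x_i + w·Σ_j(x_i − x_j), satisfies all three constraints:

```python
import numpy as np

rng = np.random.default_rng(0)
M, n = 5, 3
x = rng.normal(size=(M, n))          # M points in n-dimensional space

def phi(x, w=0.1):
    # Toy equivariant map: phi(x)_i = x_i + w * sum_j (x_i - x_j)
    return x + w * (x.shape[0] * x - x.sum(axis=0))

g = rng.normal(size=(1, n))                     # translation vector
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))    # random orthogonal matrix
P = rng.permutation(M)                          # permutation of the rows

print(np.allclose(phi(x + g), phi(x) + g))      # 1. translation equivariance
print(np.allclose(phi(x @ Q.T), phi(x) @ Q.T))  # 2. rotation/reflection equivariance
print(np.allclose(phi(x[P]), phi(x)[P]))        # 3. permutation equivariance
```

All three checks print True for this particular map.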
2.2. Graph Neural Networks
Graph Neural Networks are permutation equivariant networks that operate on graph structured data (Bruna et al., 2013; Defferrard et al., 2016; Kipf & Welling, 2016a). Given a graph G = (V, E) with nodes v_i ∈ V and edges e_ij ∈ E, we define a graph convolutional layer, following the notation from (Gilmer et al., 2017), as:

m_ij = φ_e(h_i^l, h_j^l, a_ij)
m_i = Σ_{j∈N(i)} m_ij    (2)
h_i^{l+1} = φ_h(h_i^l, m_i)

where h_i^l ∈ R^nf is the nf-dimensional embedding of node v_i at layer l, a_ij are the edge attributes, N(i) represents the set of neighbors of node v_i, and φ_e and φ_h are the edge and node operations respectively, which are commonly approximated by Multilayer Perceptrons (MLPs).
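For illustration, a minimal PyTorch sketch of the layer in Equation 2 could look as follows; the module is our own rendering under the assumption that φ_e and φ_h are small MLPs and that edges are given as an index list:

```python
import torch
import torch.nn as nn

class GCL(nn.Module):
    """Sketch of Eq. 2: m_ij = phi_e(h_i, h_j, a_ij); m_i = sum_j m_ij; h_i' = phi_h(h_i, m_i)."""
    def __init__(self, nf, hidden=64, edge_attr_dim=1):
        super().__init__()
        self.phi_e = nn.Sequential(nn.Linear(2 * nf + edge_attr_dim, hidden), nn.SiLU(),
                                   nn.Linear(hidden, hidden), nn.SiLU())
        self.phi_h = nn.Sequential(nn.Linear(nf + hidden, hidden), nn.SiLU(),
                                   nn.Linear(hidden, nf))

    def forward(self, h, edge_index, edge_attr):
        # edge_index: [2, n_edges] holding (i, j) pairs; edge_attr: [n_edges, edge_attr_dim]
        row, col = edge_index
        m_ij = self.phi_e(torch.cat([h[row], h[col], edge_attr], dim=1))
        m_i = torch.zeros(h.size(0), m_ij.size(1), dtype=h.dtype, device=h.device)
        m_i.index_add_(0, row, m_ij)              # sum messages arriving at each node i
        return self.phi_h(torch.cat([h, m_i], dim=1))
```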
3. Equivariant Graph Neural Networks

In this section we present Equivariant Graph Neural Networks (EGNNs). Following the notation from background Section 2.2, we consider a graph G = (V, E) with nodes v_i ∈ V and edges e_ij ∈ E. In addition to the feature node embeddings h_i ∈ R^nf, we now also consider an n-dimensional coordinate x_i ∈ R^n associated with each of the graph nodes. Our model preserves equivariance to rotations and translations on this set of coordinates x_i, and it also preserves equivariance to permutations on the set of nodes V in the same fashion as GNNs.

Our Equivariant Graph Convolutional Layer (EGCL) takes as input the set of node embeddings h^l = {h_0^l, . . . , h_{M−1}^l}, coordinate embeddings x^l = {x_0^l, . . . , x_{M−1}^l} and edge information E = (e_ij), and outputs a transformation on h^{l+1} and x^{l+1}. Concisely: h^{l+1}, x^{l+1} = EGCL[h^l, x^l, E]. The equations that define this layer are the following:

m_ij = φ_e(h_i^l, h_j^l, ‖x_i^l − x_j^l‖², a_ij)    (3)
x_i^{l+1} = x_i^l + Σ_{j≠i} (x_i^l − x_j^l) φ_x(m_ij)    (4)
m_i = Σ_{j∈N(i)} m_ij    (5)
h_i^{l+1} = φ_h(h_i^l, m_i)    (6)

Notice that the main differences between the above method and the original Graph Neural Network from Equation 2 are found in Equations 3 and 4. In Equation 3 we now input the relative squared distance between two coordinates, ‖x_i^l − x_j^l‖², into the edge operation φ_e. The embeddings h_i^l, h_j^l and the edge attributes a_ij are also provided as input to the edge operation, as in the GNN case. In our case the edge attributes incorporate the edge values a_ij = e_ij, but they could also include additional edge information.

In Equation 4 we update the position of each particle x_i as a vector field in a radial direction. In other words, the position of each particle x_i is updated by the weighted sum of all relative differences (x_i − x_j) ∀j. The weights of this sum are provided as the output of the function φ_x : R^nf → R^1, which takes as input the edge embedding m_ij from the previous edge operation and outputs a scalar value. This equation is the main difference of our model compared to standard GNNs, and it is the reason why equivariances 1 and 2 are preserved (proof in Appendix A). Despite its simplicity, this equivariant operation is very flexible, since the embedding m_ij can carry information from the whole graph and not only from the given edge e_ij.

Finally, Equations 5 and 6 follow the same updates as standard GNNs. Equation 5 aggregates all the incoming messages from the neighbor nodes N(i) of node v_i, and Equation 6 performs the node operation φ_h, which takes as input the aggregated messages m_i and the node embedding h_i^l and outputs the updated node embedding h_i^{l+1}.
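A compact sketch of the EGCL defined by Equations 3–6 is given below. This is our own PyTorch rendering under assumed layer sizes; the exact architectures of φ_e, φ_x and φ_h used in the experiments are described in Appendix C.

```python
import torch
import torch.nn as nn

class EGCL(nn.Module):
    """Sketch of Eqs. 3-6: invariant messages, a radial coordinate update and a node update."""
    def __init__(self, nf, hidden=64, edge_attr_dim=1):
        super().__init__()
        self.phi_e = nn.Sequential(nn.Linear(2 * nf + 1 + edge_attr_dim, hidden), nn.SiLU(),
                                   nn.Linear(hidden, hidden), nn.SiLU())
        self.phi_x = nn.Sequential(nn.Linear(hidden, hidden), nn.SiLU(),
                                   nn.Linear(hidden, 1))
        self.phi_h = nn.Sequential(nn.Linear(nf + hidden, hidden), nn.SiLU(),
                                   nn.Linear(hidden, nf))

    def forward(self, h, x, edge_index, edge_attr):
        row, col = edge_index                           # edges (i, j)
        diff = x[row] - x[col]                          # x_i^l - x_j^l
        dist2 = (diff ** 2).sum(dim=1, keepdim=True)    # ||x_i^l - x_j^l||^2
        m_ij = self.phi_e(torch.cat([h[row], h[col], dist2, edge_attr], dim=1))   # Eq. 3
        x_out = x.clone()                               # Eq. 4: radial coordinate update
        x_out.index_add_(0, row, diff * self.phi_x(m_ij))
        m_i = torch.zeros(h.size(0), m_ij.size(1), dtype=h.dtype, device=h.device)
        m_i.index_add_(0, row, m_ij)                    # Eq. 5: aggregate messages
        h_out = self.phi_h(torch.cat([h, m_i], dim=1))  # Eq. 6: node update
        return h_out, x_out
```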
3.1. Analysis on E(n) equivariance

In this section we analyze the equivariance properties of our model for E(3) symmetries (i.e. properties 1 and 2 stated in Section 2.1). In other words, our model should be translation equivariant on x for any translation vector g ∈ R^n, and it should also be rotation and reflection equivariant on x for any orthogonal matrix Q ∈ R^{n×n}. More formally, our model satisfies:

Qx^{l+1} + g, h^{l+1} = EGCL(Qx^l + g, h^l)

We provide a formal proof of this in Appendix A. Intuitively, consider an h^l feature that is already E(n) invariant; then the resulting edge embedding m_ij from Equation 3 will also be E(n) invariant, because in addition to h^l it only depends on the squared distances ‖x_i^l − x_j^l‖², which are E(n) invariant. Next, Equation 4 computes x_i^{l+1} by a weighted sum of differences (x_i − x_j) which is added to x_i; this transforms as a type-1 vector and preserves equivariance (see Appendix A). Finally, the last two equations, 5 and 6, which generate the next layer node embeddings h^{l+1}, remain E(n) invariant since they only depend on h^l and m_ij.

3.2. Extending EGNN for vector type inputs

[...] those cases where it is not 0. We can include momentum in our proposed method by simply replacing Equation 4 of our model with the following equation:

v_i^{l+1} = φ_v(h_i^l) v_i^l + Σ_{j≠i} (x_i^l − x_j^l) φ_x(m_ij)
x_i^{l+1} = x_i^l + v_i^{l+1}    (7)

Note that this extends the EGCL layer as h^{l+1}, x^{l+1}, v^{l+1} = EGCL[h^l, x^l, v^l, E]. The only difference is that we now break down the coordinate update (Eq. 4) into two steps: first we compute the velocity v_i^{l+1}, and then we use this velocity to update the position x_i^l. The velocity v^l is scaled by a new function φ_v : R^nf → R^1 that maps the node embedding h_i^l to a scalar value. Notice that if the initial velocity is set to zero (v_i^(0) = 0), Equations 4 and 7 become exactly the same for the first layer l = 0, and they become equivalent for the next layers, since φ_v just re-scales all the outputs of φ_x from the previous layers with a scalar value. We prove the equivariance of this variant of the model in Appendix B.1. This variant is used in experiment 5.1, where we have to provide the initial velocity of the system and predict a relative position change.
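A minimal sketch of the velocity update of Equation 7, reusing the quantities computed inside the EGCL sketch above (diff = x_i − x_j per edge, m_ij the messages); the sizes and the exact form of φ_v are assumptions:

```python
import torch
import torch.nn as nn

nf, hidden = 64, 64                 # placeholder sizes, matching the EGCL sketch above
phi_v = nn.Sequential(nn.Linear(nf, hidden), nn.SiLU(), nn.Linear(hidden, 1))

def velocity_update(h, x, v, diff, m_ij, row, phi_x):
    """Eq. 7: v_i <- phi_v(h_i) * v_i + sum_j (x_i - x_j) phi_x(m_ij);  x_i <- x_i + v_i."""
    agg = torch.zeros_like(x)
    agg.index_add_(0, row, diff * phi_x(m_ij))   # same radial term as in Eq. 4
    v_new = phi_v(h) * v + agg                   # phi_v(h_i) is a per-node scalar
    return x + v_new, v_new
```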
3.3. Inferring the edges

Given a point cloud or a set of nodes, we may not always be provided with an adjacency matrix. In those cases we can assume a fully connected graph where all nodes exchange messages with each other; in other words, the neighborhood operator N(i) from Equation 5 would include all other nodes of the graph except for i. This fully connected approach may not scale well to large graphs, where we may want to locally limit the exchange of messages to avoid an overflow of information.

Similarly to (Serviansky et al., 2020; Kipf et al., 2018), we present a simple solution to infer the relations/edges of the graph in our model. We can re-write the aggregation operation of our model (Eq. 5) in the following way:

m_i = Σ_{j∈N(i)} m_ij = Σ_{j≠i} e_ij m_ij    (8)
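One plausible way to instantiate the soft edge values e_ij, sketched here purely as an assumption, is to predict a weight in (0, 1) from each message m_ij with a sigmoid-gated linear map and plug it into the weighted aggregation of Equation 8:

```python
import torch
import torch.nn as nn

hidden = 64                                                   # message dimension, placeholder
phi_inf = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())   # assumed soft edge estimator

def aggregate_with_inferred_edges(m_ij, row, n_nodes):
    """Eq. 8: m_i = sum_{j != i} e_ij * m_ij, with e_ij inferred from the message itself."""
    e_ij = phi_inf(m_ij)                                      # soft edge weight in (0, 1)
    m_i = torch.zeros(n_nodes, m_ij.size(1), dtype=m_ij.dtype, device=m_ij.device)
    m_i.index_add_(0, row, e_ij * m_ij)
    return m_i
```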
Table 1. Comparison of different works from the literature under the message-passing framework notation. We created this table to provide a clear and simple way of comparing these methods. The methods from left to right are: Graph Neural Networks (Gilmer et al., 2017); Radial Field from Equivariant Flows (Köhler et al., 2019); Tensor Field Networks (Thomas et al., 2018); Schnet (Schütt et al., 2017b); and our Equivariant Graph Neural Network. The difference between two points is written r_ij = (x_i − x_j).
Our EGNN is more data efficient than GNNs, since it does not need to generalize over rotations and translations of the data, while it retains the flexibility of GNNs in the larger data regime. Due to its high model bias, the Radial Field algorithm performs well when data is scarce, but it is unable to learn the subtleties of the dataset as we increase the training size. In summary, our EGNN benefits from both the high bias of E(n) methods and the flexibility of GNNs.
5.2. Graph Autoencoder

A Graph Autoencoder can learn unsupervised representations of graphs in a continuous latent space (Kipf & Welling, 2016b; Simonovsky & Komodakis, 2018). In this experiment section we use our EGNN to build an Equivariant Graph Autoencoder. We explain how Graph Autoencoders can benefit from equivariance, and we show that our method outperforms standard GNN autoencoders on the provided datasets. This problem is particularly interesting since the embedding space can be scaled to larger dimensions and is not limited to a 3-dimensional Euclidean space.

Similarly to the work of (Kipf & Welling, 2016b), further extended by Section 3.3 of (Liu et al., 2019), our graph autoencoder z = q(G) embeds a graph G into a set of latent nodes z = {z_1, . . . , z_M} ∈ R^{M×n}, where M is the number of nodes and n the embedding size per node. Notice this reduces the memory complexity of storing the graphs from O(M²) to O(M) with respect to the number of nodes. This differs from (Simonovsky & Komodakis, 2018), which embeds the graph in a single vector z ∈ R^K; this makes the reconstruction computationally very expensive, since the nodes of the decoded graph have to be matched again to the ground truth.

More specifically, we compare our Equivariant Graph Autoencoder on the task presented in (Liu et al., 2019), where a graph G = (V, E) with node features H ∈ R^{M×nf} and adjacency matrix A ∈ {0, 1}^{M×M} is embedded into a latent space z = q(H, A) ∈ R^{M×n}. Following (Kipf & Welling, 2016b; Liu et al., 2019), we are only interested in reconstructing the adjacency matrix A, since the datasets we work with do not contain node features. The decoder g(·) proposed by (Liu et al., 2019) takes as input the embedding space z and outputs the reconstructed adjacency matrix Â = g(z); this decoder function is defined as follows:

Â_ij = g_e(z_i, z_j) = 1 / (1 + exp(w ‖z_i − z_j‖² + b))    (9)
where w and b are its only learnable parameters and g_e(·) is the decoder edge function applied to every pair of node embeddings. It reflects that edge probabilities depend on the relative distances among node embeddings. The training loss is defined as the binary cross entropy between the estimated and the ground truth edges, L = Σ_ij BCE(Â_ij, A_ij).
The symmetry problem: The above stated autoencoder may seem straightforward to implement at first sight, but in some cases there is a strong limitation regarding the symmetry of the graph. Graph Neural Networks are convolutions on the edges and nodes of a graph, i.e. the same function is applied to all edges and to all nodes. In some graphs (e.g. those defined only by their adjacency matrix) we may not have input features on the nodes, such that the difference among nodes relies only on their edges or neighborhood topology. Therefore, if the neighborhood of two nodes is exactly the same, their encoded embeddings will be the same too. A clear example of this is a cycle graph (an example of a 4-node cycle graph is provided in Figure 3). When running a Graph Neural Network encoder on a node-featureless cycle graph, we obtain the exact same embedding for each of the nodes, which makes it impossible to reconstruct the edges of the original graph from the node embeddings. The cycle graph is a severe example where all nodes have the exact same neighborhood topology, but these symmetries can be present in different ways for other graphs with different edge distributions, or even when including node features if these are not unique.

Figure 3. Visual representation of a Graph Autoencoder for a 4-node cycle graph. The bottom row illustrates that adding noise at the input graph breaks the symmetry of the embedding, allowing the reconstruction of the adjacency matrix.

An ad-hoc method to break the symmetry of the graph was introduced by (Liu et al., 2019). This method introduces noise sampled from a Gaussian distribution into the input node features of the graph, h_i^0 ∼ N(0, σI). This noise allows a different representation for each node embedding, and as a result the graph can be decoded again, but it comes with a drawback: the network has to generalize over the newly introduced noise distribution. Our Equivariant Graph Autoencoder remains translation and rotation equivariant to this sampled noise, which we find makes the generalization much easier. In our case we simply input this noise as the input coordinates x^0 ∼ N(0, σI) ∈ R^{M×n} of our EGNN, which outputs an equivariant transformation of them, x^L. This output is used as the embedding of the graph (i.e. z = x^L), which is the input to the decoder from Equation 9.
Figure 5. In the Table at the left we report the Binary Cross Entropy, % Error and F1 scores for the test partition on the Graph Autoencoding
experiment in the Community Small and Erdos&Renyi datasets. In the Figure at the right, we report the F1 score when overfitting a
training partition of 100 samples in the Erdos&Renyi dataset for different values of sparsity pe . The GNN is not able to successfully
auto-encode sparse graphs (small pe values) for the Erdos&Renyi dataset even when training and testing on the same small subset.
Dataset: We generated community-small graphs (You et al., 2018; Liu et al., 2019) by running the original code from (You et al., 2018). These graphs contain 12 ≤ M ≤ 20 nodes. We also generated a second dataset using the Erdos&Renyi generative model (Bollobás & Béla, 2001), sampling random graphs with an initial number of 7 ≤ M ≤ 16 nodes and edge probability p_e = 0.25. We sampled 5,000 graphs for training, 500 for validation and 500 for test for both datasets. Each graph is defined as an adjacency matrix A ∈ {0, 1}^{M×M}.
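For reference, a small sketch of how such Erdos&Renyi graphs could be sampled with networkx's gnp_random_graph (the function named in Appendix C); the exact sampling script used by the authors is not shown here, so treat this as an assumption:

```python
import networkx as nx
import numpy as np

def sample_erdos_renyi(num_graphs, n_min=7, n_max=16, p_e=0.25, seed=0):
    """Sample graphs with an initial 7 <= M <= 16 nodes and edge probability p_e."""
    rng = np.random.default_rng(seed)
    graphs = []
    for _ in range(num_graphs):
        m = int(rng.integers(n_min, n_max + 1))
        g = nx.gnp_random_graph(m, p_e, seed=int(rng.integers(2**31)))
        g.remove_nodes_from(list(nx.isolates(g)))   # disconnected nodes are dropped (cf. Appendix C)
        graphs.append(nx.to_numpy_array(g))         # adjacency matrix A in {0, 1}^{M x M}
    return graphs

train_graphs = sample_erdos_renyi(5000)             # 5,000 train / 500 val / 500 test in the paper
```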
with a 0.11% error in the Erdos&Renyi dataset and close to
Implementation details: Our Equivariant Graph Auto-
optimal 0.05% in the Community Small dataset.
Encoder is composed of an EGNN encoder followed by
the decoder from Equation 9. The graph edges Aij are Overfitting the training set: We explained the symme-
input as edge attributes aij in Equation 3. The noise try problem and we showed the EGNN outperforms other
used to break the symmetry is input as the coordinates methods in the given datasets. Although we observed that
x0 ∼ N (0, σI) ∈ RM ×n in the first layer and h0 is initial- adding noise to the GNN improves the results, it is difficult
ized as ones since we are working with featureless graphs. to exactly measure the impact of the symmetry limitation in
As mentioned before, the encoder outputs an equivariant these results independent from other factors such as gener-
transformation on the coordinates which is the graph em- alization from the training to the test set. In this section we
bedding and input to the decoder z = xL ∈ RM ×n . We conduct an experiment where we train the different models
use n = 8 dimensions for the embedding space. We com- in a subset of 100 Erdos&Renyi graphs and embedding size
pare the EGNN to its GNN cousin, we also compare to n = 16 with the aim to overfit the data. We evaluate the
Noise-GNN which is an adaptation of our GNN to match methods on the training data. In this experiment the GNN
the setting from (Liu et al., 2019), and we also include the is unable to fit the training data properly while the EGNN
Radial Field algorithm as an additional baseline. All four can achieve perfect reconstruction and Noise-GNN close to
models have 4 layers, 64 features for the hidden layers, the perfect. We sweep over different pe sparsity values from 0.1
Swish activation function as a non-linearity and they were to 0.9 since the symmetry limitation is more present in very
all trained for 100 epochs using the Adam optimizer and sparse or very dense graphs. We report the F1 scores of this
learning rate 1e−4 . More details are provided in Appendix experiment in the right plot of Figure 5.
C.2. Since the number of nodes is larger than the number of
In this experiment we showed that E(n) equivariance im-
layers, the receptive field of a GNN may not comprise the
proves performance when embedding graphs in a continuous
whole graph which can make the comparison unfair with our
space as a set of nodes in dimension n. Even though this is a
EGNN. To avoid this limitation, all models exchange mes-
simple reconstruction task, we think this can be a useful step
sages among all nodes and the edge information is provided
towards generating graphs or molecules where often graphs
as edge attributes aij = Aij in all of them.
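Putting the pieces together, the encoder inputs described above can be built roughly as follows; `egnn` stands for a stack of EGCL-style layers as sketched in Section 3, and sigma and the dimensions are placeholders:

```python
import torch

def embed_graph(egnn, A, n=8, sigma=1.0):
    """Sketch: featureless graph -> all-ones h0, noisy x0 -> equivariant embedding z = x^L."""
    M = A.size(0)
    h0 = torch.ones(M, 1)                          # h0 initialised to ones (featureless graphs)
    x0 = sigma * torch.randn(M, n)                 # symmetry-breaking Gaussian noise coordinates
    idx = torch.arange(M)
    row, col = idx.repeat_interleave(M), idx.repeat(M)
    mask = row != col                              # fully connected graph without self-loops
    row, col = row[mask], col[mask]
    edge_index = torch.stack([row, col])
    edge_attr = A[row, col].float().unsqueeze(1)   # edge attribute a_ij = A_ij
    h_L, x_L = egnn(h0, x0, edge_index, edge_attr)
    return x_L                                     # graph embedding z = x^L
```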
Results: In the table of Figure 5 we report the Binary Cross Entropy loss between the estimated and ground truth edges, the % Error, which is defined as the percentage of wrongly predicted edges with respect to the total number of potential edges, and the F1 score of the edge classification; all numbers refer to the test partition. We also include a "Baseline" that predicts all edges as missing, Â_ij = 0. The standard GNN seems to suffer from the symmetry problem and provides the worst performance. When introducing noise (Noise-GNN), both the loss and the error decrease, showing that it is indeed useful to add noise to the input nodes. Finally, our EGNN remains E(n) equivariant to this noise distribution and provides the best reconstruction, with a 0.11% error on the Erdos&Renyi dataset and a close to optimal 0.05% on the Community Small dataset.

Overfitting the training set: We explained the symmetry problem and showed that the EGNN outperforms the other methods on the given datasets. Although we observed that adding noise to the GNN improves the results, it is difficult to exactly measure the impact of the symmetry limitation on these results independently from other factors, such as generalization from the training to the test set. In this section we conduct an experiment where we train the different models on a subset of 100 Erdos&Renyi graphs with embedding size n = 16, with the aim of overfitting the data, and we evaluate the methods on the training data. In this experiment the GNN is unable to fit the training data properly, while the EGNN achieves perfect reconstruction and the Noise-GNN comes close to perfect. We sweep over different p_e sparsity values from 0.1 to 0.9, since the symmetry limitation is more present in very sparse or very dense graphs. We report the F1 scores of this experiment in the right plot of Figure 5.

In this experiment we showed that E(n) equivariance improves performance when embedding graphs in a continuous space as a set of nodes in dimension n. Even though this is a simple reconstruction task, we think it can be a useful step towards generating graphs or molecules, where graphs (i.e. edges) are often decoded as pairwise distances or similarities between nodes, e.g. (Kipf & Welling, 2016b; Liu et al., 2019; Grover et al., 2019), and these metrics (e.g. Eq. 9) are E(n) invariant. Additionally, this experiment also showed that our method can successfully perform an E(n) equivariant task in higher dimensional spaces where n > 3.
Table 3. Mean Absolute Error for the molecular property prediction benchmark on the QM9 dataset.
5.3. Molecular data — QM9

The QM9 dataset (Ramakrishnan et al., 2014) has become a standard in machine learning as a chemical property prediction task. The QM9 dataset consists of small molecules represented as a set of atoms (up to 29 atoms per molecule), each atom having an associated 3D position and a five-dimensional one-hot node embedding that describes the atom type (H, C, N, O, F). The dataset labels are a variety of chemical properties for each of the molecules, which are estimated through regression. These properties are invariant to translations, rotations and reflections of the atom positions. Therefore, models that are E(3) invariant are highly suitable for this task.

We imported the dataset partitions from (Anderson et al., 2019): 100K molecules for training, 18K for validation and 13K for testing. A variety of 12 chemical properties were estimated per molecule. We optimized and report the Mean Absolute Error between predictions and ground truth.
Implementation details: Our EGNN receives as input the 3D coordinate locations of each atom, which are provided as x_i^0 in Equation 3, and an embedding of the atom properties, which is provided as the input node features h_i^0. Since this is an invariant task and the x^0 positions are static, there is no need to update the particle positions x by running Equation 4 as we did in previous experiments. Consequently, we tried both manners and did not notice any improvement from updating x. When not updating the particle positions (i.e. skipping Equation 4), our model becomes E(3) invariant, which is analogous to a standard GNN where all relative squared norms between pairs of points ‖x_i − x_j‖² are inputted to the edge operation (Eq. 3). Additionally, since we are not provided with an adjacency matrix and molecules can scale up to 29 nodes, we use the extension of our model from Section 3.3 that infers a soft estimation of the edges. Our EGNN network consists of 7 layers, 128 features per hidden layer and the Swish activation function as a non-linearity. A sum-pooling operation, preceded and followed by two-layer MLPs, maps all the node embeddings h^L from the output of the EGNN to the estimated property value. Further implementation details are reported in Appendix C. We compare to NMP (Gilmer et al., 2017), Schnet (Schütt et al., 2017b), Cormorant (Anderson et al., 2019), L1Net (Miller et al., 2020), LieConv (Finzi et al., 2020), TFN (Thomas et al., 2018) and SE(3)-Tr. (Fuchs et al., 2020).
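A hedged sketch of the invariant readout described here, i.e. a node-wise MLP, a sum pooling over the atoms of each molecule, and a final MLP producing the scalar property; layer sizes and the batching convention are assumptions:

```python
import torch
import torch.nn as nn

class PropertyHead(nn.Module):
    """Sketch: node-wise MLP -> sum pooling per molecule -> MLP -> scalar property."""
    def __init__(self, nf=128, hidden=128):
        super().__init__()
        self.node_mlp = nn.Sequential(nn.Linear(nf, hidden), nn.SiLU(), nn.Linear(hidden, hidden))
        self.graph_mlp = nn.Sequential(nn.Linear(hidden, hidden), nn.SiLU(), nn.Linear(hidden, 1))

    def forward(self, h_L, batch, n_molecules):
        # batch[k] gives the molecule index of node k, so pooling is a segment sum
        z = self.node_mlp(h_L)
        pooled = torch.zeros(n_molecules, z.size(1), dtype=z.dtype, device=z.device)
        pooled.index_add_(0, batch, z)
        return self.graph_mlp(pooled).squeeze(-1)

# Training objective (MAE): loss = (pred - target).abs().mean()
```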
Results are presented in Table 3. Our method reports state of the art in 9 out of 12 property predictions, and it is the second best performing method for the remaining 3 properties. Perhaps surprisingly, we outperform other equivariant networks that consider higher order representations, while in this task we only use type-0 representations (i.e. relative distances) to define the geometry of the molecules. In Appendix D we prove that when only positional information is given (i.e. no velocity or higher order type features), the geometry is completely defined by the norms between points up to E(n)-transformations; in other words, if two collections of points separated by E(n) transformations are considered to be identical, then the relative norms between points are a unique identifier of the collection.

6. Conclusions

Equivariant graph neural networks are receiving increasing interest from the natural and medical sciences as they represent a new tool for analyzing molecules and their properties. In this work, we presented a new E(n) equivariant deep architecture for graphs that is computationally efficient, easy to implement, and significantly improves over the current state-of-the-art on a wide range of tasks. We believe these properties make it ideally suited to make a direct impact on topics such as drug discovery, protein folding and the design of new materials, as well as applications in 3D computer vision.
References

Cohen, T. S. and Welling, M. Steerable CNNs. In 5th International Conference on Learning Representations, ICLR, 2017.

Defferrard, M., Bresson, X., and Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pp. 3844–3852, 2016.

Finzi, M., Stanton, S., Izmailov, P., and Wilson, A. G. Generalizing convolutional neural networks for equivariance to Lie groups on arbitrary continuous data. arXiv preprint arXiv:2002.12880, 2020.

Fuchs, F., Worrall, D., Fischer, V., and Welling, M. SE(3)-Transformers: 3D roto-translation equivariant attention networks. Advances in Neural Information Processing Systems, 33, 2020.

Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. Neural message passing for quantum chemistry. arXiv preprint arXiv:1704.01212, 2017.

Grover, A., Zweig, A., and Ermon, S. Graphite: Iterative generative modeling of graphs. In International Conference on Machine Learning, pp. 2434–2444. PMLR, 2019.

Grzeszczuk, R., Terzopoulos, D., and Hinton, G. NeuroAnimator: Fast neural network emulation and control of physics-based models. In Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques, pp. 9–20, 1998.

Kipf, T., Fetaya, E., Wang, K.-C., Welling, M., and Zemel, R. Neural relational inference for interacting systems. arXiv preprint arXiv:1802.04687, 2018.

Miller, B. K., Geiger, M., Smidt, T. E., and Noé, F. Relevance of rotationally equivariant convolutions for predicting molecular properties. arXiv preprint arXiv:2008.08461, 2020.

Nagabandi, A., Kahn, G., Fearing, R. S., and Levine, S. Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 7559–7566. IEEE, 2018.

Ramachandran, P., Zoph, B., and Le, Q. V. Searching for activation functions. arXiv preprint arXiv:1710.05941, 2017.

Ramakrishnan, R., Dral, P. O., Rupp, M., and Von Lilienfeld, O. A. Quantum chemistry structures and properties of 134 kilo molecules. Scientific Data, 1(1):1–7, 2014.

Rezende, D. J., Racanière, S., Higgins, I., and Toth, P. Equivariant Hamiltonian flows. CoRR, abs/1909.13739, 2019.

Romero, D. W. and Cordonnier, J.-B. Group equivariant stand-alone self-attention for vision. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=JkfYjnOEo6M.

Schütt, K. T., Arbabzadah, F., Chmiela, S., Müller, K. R., and Tkatchenko, A. Quantum-chemical insights from deep tensor neural networks. Nature Communications, 8(1):1–8, 2017a.

Schütt, K. T., Kindermans, P.-J., Sauceda, H. E., Chmiela, S., Tkatchenko, A., and Müller, K.-R. SchNet: A continuous-filter convolutional neural network for modeling quantum interactions. arXiv preprint arXiv:1706.08566, 2017b.
Serviansky, H., Segol, N., Shlomi, J., Cranmer, K., Gross, E.,
Maron, H., and Lipman, Y. Set2graph: Learning graphs
from sets. Advances in Neural Information Processing
Systems, 33, 2020.
Simonovsky, M. and Komodakis, N. Graphvae: Towards
generation of small graphs using variational autoencoders.
In International Conference on Artificial Neural Net-
works, pp. 412–422. Springer, 2018.
Thomas, N., Smidt, T., Kearnes, S., Yang, L., Li, L.,
Kohlhoff, K., and Riley, P. Tensor field networks:
Rotation-and translation-equivariant neural networks for
3d point clouds. arXiv preprint arXiv:1802.08219, 2018.
Uy, M. A., Pham, Q.-H., Hua, B.-S., Nguyen, T., and Yeung,
S.-K. Revisiting point cloud classification: A new bench-
mark dataset and classification model on real-world data.
In Proceedings of the IEEE International Conference on
Computer Vision, pp. 1588–1597, 2019.
You, J., Ying, R., Ren, X., Hamilton, W. L., and Leskovec, J.
Graphrnn: Generating realistic graphs with deep auto-
regressive models. arXiv preprint arXiv:1802.08773,
2018.
A. Equivariance Proof

In this section we prove that our model is translation equivariant on x for any translation vector g ∈ R^n and rotation and reflection equivariant on x for any orthogonal matrix Q ∈ R^{n×n}. More formally, we will prove the model satisfies:

Qx^{l+1} + g, h^{l+1} = EGCL(Qx^l + g, h^l)
We will analyze how a translation and rotation of the input coordinates propagates through our model. We start by assuming h^0 is invariant to E(n) transformations on x; in other words, we do not encode any information about the absolute position or orientation of x^0 into h^0. Then the output m_ij of Equation 3 will be invariant too, since the distance between two particles is invariant to translations, ‖x_i^l + g − [x_j^l + g]‖² = ‖x_i^l − x_j^l‖², and invariant to rotations and reflections, ‖Qx_i^l − Qx_j^l‖² = (x_i^l − x_j^l)ᵀQᵀQ(x_i^l − x_j^l) = (x_i^l − x_j^l)ᵀI(x_i^l − x_j^l) = ‖x_i^l − x_j^l‖², such that the edge operation becomes invariant:

m_ij = φ_e(h_i^l, h_j^l, ‖Qx_i^l + g − [Qx_j^l + g]‖², a_ij) = φ_e(h_i^l, h_j^l, ‖x_i^l − x_j^l‖², a_ij)
The second equation of our model (Eq. 4), which updates the coordinates x, is E(n) equivariant. In the following, we prove its equivariance by showing that an E(n) transformation of the input leads to the same transformation of the output. Notice m_ij is already invariant, as proven above. We want to show:

Qx_i^{l+1} + g = Qx_i^l + g + Σ_{j≠i} (Qx_i^l + g − [Qx_j^l + g]) φ_x(m_ij)

Derivation.

Qx_i^l + g + Σ_{j≠i} (Qx_i^l + g − Qx_j^l − g) φ_x(m_ij) = Qx_i^l + g + Q Σ_{j≠i} (x_i^l − x_j^l) φ_x(m_ij)
                                                        = Q (x_i^l + Σ_{j≠i} (x_i^l − x_j^l) φ_x(m_ij)) + g
                                                        = Qx_i^{l+1} + g

Therefore, we have proven that rotating and translating x^l results in the same rotation and translation on x^{l+1} at the output of Equation 4.
Furthermore, Equations 5 and 6 only depend on m_ij and h^l which, as shown at the beginning of this proof, are E(n) invariant; therefore the output of Equation 6, h^{l+1}, will be invariant too. We thus conclude that a transformation Qx^l + g on x^l results in the same transformation on x^{l+1}, while h^{l+1} remains invariant to it, such that Qx^{l+1} + g, h^{l+1} = EGCL(Qx^l + g, h^l) is satisfied.
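As a quick numerical sanity check of this result, one can apply a random rotation and translation to the inputs of an EGCL-style layer (such as the sketch in Section 3) and compare outputs; the helper below is our own and assumes the layer signature layer(h, x, edge_index, edge_attr):

```python
import torch

def random_rotation(n):
    q, _ = torch.linalg.qr(torch.randn(n, n))   # orthogonal matrix from a QR decomposition
    return q

def check_equivariance(layer, M=5, n=3, nf=8):
    h, x = torch.randn(M, nf), torch.randn(M, n)
    idx = torch.arange(M)
    row, col = idx.repeat_interleave(M), idx.repeat(M)
    mask = row != col
    edge_index = torch.stack([row[mask], col[mask]])
    edge_attr = torch.ones(edge_index.size(1), 1)
    Q, g = random_rotation(n), torch.randn(1, n)

    h1, x1 = layer(h, x, edge_index, edge_attr)
    h2, x2 = layer(h, x @ Q.T + g, edge_index, edge_attr)
    print(torch.allclose(h1, h2, atol=1e-4),            # h^{l+1} is E(n) invariant
          torch.allclose(x1 @ Q.T + g, x2, atol=1e-4))  # x^{l+1} is E(n) equivariant
```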
B. Equivariance Proof for the Vector Type Inputs Variant

The EGCL variant that takes vector type inputs (Section 3.2) replaces the coordinate update of Equation 4 with Equation 7, giving the following layer:

m_ij = φ_e(h_i^l, h_j^l, ‖x_i^l − x_j^l‖², a_ij)
v_i^{l+1} = φ_v(h_i^l) v_i^l + Σ_{j≠i} (x_i^l − x_j^l) φ_x(m_ij)
x_i^{l+1} = x_i^l + v_i^{l+1}
m_i = Σ_{j∈N(i)} m_ij
h_i^{l+1} = φ_h(h_i^l, m_i)
In Appendix A we already proved the equivariance of our EGNN (Section 3) when not including vector type inputs. In its
velocity type inputs variant we only replaced its coordinate updates (eq. 4) by Equation 7 that includes velocity. Since this is
the only modification we will only prove that Equation 7 re-written below is equivariant.
v_i^{l+1} = φ_v(h_i^l) v_i^l + Σ_{j≠i} (x_i^l − x_j^l) φ_x(m_ij)
x_i^{l+1} = x_i^l + v_i^{l+1}

First, we prove that the first line preserves equivariance; that is, we want to show:

Qv_i^{l+1} = φ_v(h_i^l) Qv_i^l + Σ_{j≠i} (Qx_i^l + g − [Qx_j^l + g]) φ_x(m_ij)

Derivation.

φ_v(h_i^l) Qv_i^l + Σ_{j≠i} (Qx_i^l + g − [Qx_j^l + g]) φ_x(m_ij) = Qφ_v(h_i^l) v_i^l + Q Σ_{j≠i} (x_i^l − x_j^l) φ_x(m_ij)    (10)
    = Q (φ_v(h_i^l) v_i^l + Σ_{j≠i} (x_i^l − x_j^l) φ_x(m_ij))    (11)
    = Qv_i^{l+1}    (12)

Finally, it is straightforward to show that the second equation is also equivariant; that is, we want to show Qx_i^{l+1} + g = Qx_i^l + g + Qv_i^{l+1}.

Derivation.

Qx_i^l + g + Qv_i^{l+1} = Q(x_i^l + v_i^{l+1}) + g
                        = Qx_i^{l+1} + g

Concluding, we have shown that an E(n) transformation on the input set of points results in the same transformation on the output set of points, such that h^{l+1}, Qx^{l+1} + g, Qv^{l+1} = EGCL[h^l, Qx^l + g, Qv^l, E] is satisfied.
C. Implementation details
In this Appendix section we describe the implementation details of the experiments. First, we describe those parts of
our model that are the same across all experiments. Our EGNN model from Section 3 contains the following three main
learnable functions.
• The edge function φ_e (Eq. 3) is a two-layer MLP with two Swish non-linearities: Input → {LinearLayer() → Swish() → LinearLayer() → Swish()} → Output.
• The coordinate function φ_x (Eq. 4) is a two-layer MLP with one non-linearity: Input → {LinearLayer() → Swish() → LinearLayer()} → Output.
• The node function φ_h (Eq. 6) is a two-layer MLP with one non-linearity and a residual connection: Input → {LinearLayer() → Swish() → LinearLayer() → Addition(Input)} → Output.
These functions are used in our EGNN across all experiments. Notice that the GNN (Eq. 2) also contains an edge operation and a node operation, φ_e and φ_h respectively. We use the same functions described above for both the GNN and the EGNN so that comparisons are as fair as possible.
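For reference, these three MLPs can be written in PyTorch roughly as follows; the input dimensions assume one squared-distance feature and one scalar edge attribute, which is our own assumption, and the residual Addition(Input) of φ_h is applied in the node update:

```python
import torch
import torch.nn as nn

def build_mlps(nf, hidden):
    # phi_e: LinearLayer -> Swish -> LinearLayer -> Swish
    phi_e = nn.Sequential(nn.Linear(2 * nf + 2, hidden), nn.SiLU(),
                          nn.Linear(hidden, hidden), nn.SiLU())
    # phi_x: LinearLayer -> Swish -> LinearLayer (scalar output)
    phi_x = nn.Sequential(nn.Linear(hidden, hidden), nn.SiLU(), nn.Linear(hidden, 1))
    # phi_h: LinearLayer -> Swish -> LinearLayer, residual addition applied outside
    phi_h = nn.Sequential(nn.Linear(nf + hidden, hidden), nn.SiLU(), nn.Linear(hidden, nf))
    return phi_e, phi_x, phi_h

# Residual node update: h_new = h + phi_h(torch.cat([h, m_i], dim=1))
```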
• EGNN: For the EGNN we use the variation that considers vector type inputs from Section 3.2. This variation adds the function φ_v to the model, which is composed of two linear layers with one non-linearity: Input → {LinearLayer() → Swish() → LinearLayer()} → Output. The functions φ_e, φ_x and φ_h that define our EGNN are the same as for all experiments and are described at the beginning of this Appendix C.
• GNN: The GNN is also composed of 4 layers; its learnable functions, the edge operation φ_e and node operation φ_h from Equation 2, are exactly the same as φ_e and φ_h from our EGNN introduced in Appendix C. We chose the same functions for both models to ensure a fair comparison. In the GNN case, the initial position p^0 and velocity v^0 of the particles are passed through a linear layer and inputted into the GNN's first layer h^0. The particle charges are inputted as edge attributes a_ij = c_i c_j. The output of the GNN, h^L, is passed through a two-layer MLP that maps it to the estimated position.
• Radial Field: The Radial Field algorithm is described in the Related Work Section 4; its only parameters are contained in its edge operation φ_rf(), which in our case is a two-layer MLP with two non-linearities: Input → {LinearLayer() → Swish() → LinearLayer() → Tanh} → Output. Notice we introduced a Tanh at the end of the MLP, which fixes some instability issues that were causing this model to diverge in the dynamical systems experiment. We also augmented the Radial Field algorithm with the vector type inputs modifications introduced in Section 3.2. In addition to the norms between pairs of points, φ_rf() also takes as input the particle charges c_i c_j.
In this experiment we worked with Community Small (You et al., 2018) and Erdos&Renyi (Bollobás & Béla, 2001) generated
datasets.
• Community Small: We used the original code from (You et al., 2018) (https://github.com/JiaxuanYou/graph-generation) to generate a Community Small dataset. We sampled 5,000 training graphs, 500 for validation and 500 for testing.
• Erdos&Renyi is one of the most famous graph generative algorithms. We used the gnp_random_graph(M, p) function from networkx (https://networkx.org/), which generates random graphs following the Erdos&Renyi model when provided with the number of nodes M and the edge probability p. Again, we generated 5,000 graphs for training, 500 for validation and 500 for testing. We set the edge probability (or sparsity value) to p = 0.25 and the number of nodes M ranging from 7 to 16, deterministically uniformly distributed. Notice that edges are generated stochastically with probability p; therefore, there is a chance that some nodes are left disconnected from the graph. These disconnected nodes are discarded, so even if we generate graphs with 7 ≤ M ≤ 16 and p = 0.25, the generated graphs may have fewer nodes.

Finally, in the graph autoencoding experiment we also overfitted on a small partition of 100 samples (Figure 5) of the Erdos&Renyi graphs described above. We reported results for different p values ranging from 0.1 to 0.9. For each p value we generated a partition of 100 graphs with an initial number of nodes 7 ≤ M ≤ 16 using the Erdos&Renyi generative model.
Models
All models consist of 4 layers, 64 features for the hidden layers and the Swish activation function as a non-linearity. The EGNN is defined as explained in Section 3 without any additional modules (i.e. no velocity type features or edge inference). The functions φ_e, φ_x and φ_h are defined at the beginning of this Appendix C. The GNN (Eq. 2) mimics the EGNN in that it uses the same φ_h and φ_e as the EGNN for its edge and node updates. The Noise-GNN is exactly the same as the GNN but with noise inputted into the h^0 features. Finally, the Radial Field algorithm was defined in the Related Work Section 4; its edge operation φ_rf consists of a two-layer MLP: Input → {Linear() → Swish() → Linear()} → Output.
Other implementation details
All experiments were trained with learning rate 1e−4, batch size 1 and the Adam optimizer, for 100 training epochs on the 5,000-sample datasets, performing early stopping at the minimum Binary Cross Entropy loss on the validation partition. The overfitting experiments were trained for 10,000 epochs on the 100-sample subsets.
Thus showing that Ax_i = y_i for all i = 1, . . . , M, proving (1). Finally, we want to show that A is orthogonal when restricted to V_x. This follows since

⟨Ax_{i_j}, Ax_{i_k}⟩ = ⟨y_{i_j}, y_{i_k}⟩ = ⟨x_{i_j}, x_{i_k}⟩

for the basis elements x_{i_1}, . . . , x_{i_d}. This implies that A is orthogonal (at least when restricted to V_x). Finally, A can be extended via an orthogonal complement of V_x to the whole space. This concludes the proof for (2) and shows that A is indeed orthogonal.