0% found this document useful (0 votes)
20 views15 pages

E (N) Equivariant Graph Neural Networks: Thomas Et Al. 2018 Fuchs Et Al. 2020 Finzi Et Al. 2020 K Ohler Et Al. 2020

This paper introduces a new model called E(n)-Equivariant Graph Neural Networks (EGNNs) that can learn graph neural networks equivariant to rotations, translations, reflections and permutations. The proposed model achieves competitive performance without requiring computationally expensive higher-order representations, and can be scaled to higher-dimensional spaces unlike existing methods. The effectiveness of EGNNs is demonstrated on tasks including dynamical systems modelling, graph autoencoders, and predicting molecular properties.

Uploaded by

kikag60730
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
20 views15 pages

E (N) Equivariant Graph Neural Networks: Thomas Et Al. 2018 Fuchs Et Al. 2020 Finzi Et Al. 2020 K Ohler Et Al. 2020

This paper introduces a new model called E(n)-Equivariant Graph Neural Networks (EGNNs) that can learn graph neural networks equivariant to rotations, translations, reflections and permutations. The proposed model achieves competitive performance without requiring computationally expensive higher-order representations, and can be scaled to higher-dimensional spaces unlike existing methods. The effectiveness of EGNNs is demonstrated on tasks including dynamical systems modelling, graph autoencoders, and predicting molecular properties.

Uploaded by

kikag60730
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 15

E(n) Equivariant Graph Neural Networks

Victor Garcia Satorras 1 Emiel Hoogeboom 1 Max Welling 1

Abstract
This paper introduces a new model to learn graph
neural networks equivariant to rotations, transla-
arXiv:2102.09844v1 [cs.LG] 19 Feb 2021

tions, reflections and permutations called E(n)-


Equivariant Graph Neural Networks (EGNNs). In
contrast with existing methods, our work does not
require computationally expensive higher-order
representations in intermediate layers while it
still achieves competitive or better performance.
In addition, whereas existing methods are lim-
ited to equivariance on 3 dimensional spaces,
our model is easily scaled to higher-dimensional
spaces. We demonstrate the effectiveness of our
method on dynamical systems modelling, repre-
sentation learning in graph autoencoders and pre-
dicting molecular properties.

Figure 1. Example of rotation equivariance on a graph with a graph


neural network φ
1. Introduction
Although deep learning has largely replaced hand-crafted
features, many advances are critically dependent on induc- Recently, various forms and methods to achieve E(3) or
tive biases in deep neural networks. An effective method to SE(3) equivariance have been proposed (Thomas et al.,
restrict neural networks to relevant functions is to exploit 2018; Fuchs et al., 2020; Finzi et al., 2020; Köhler et al.,
the symmetry of problems by enforcing equivariance with 2020). Many of these works achieve innovations in study-
respect to transformations from a certain symmetry group. ing types of higher-order representations for intermediate
Notable examples are translation equivariance in Convo- network layers. However, the transformations for these
lutional Neural Networks and permutation equivariance in higher-order representations require coefficients or approx-
Graph Neural Networks (Bruna et al., 2013; Defferrard et al., imations that can be expensive to compute. Additionally,
2016; Kipf & Welling, 2016a). in practice for many types of data the inputs and outputs
are restricted to scalar values (for instance temperature or
Many problems exhibit 3D translation and rotation symme- energy, referred to as type-0 in literature) and 3d vectors
tries. Some examples are point clouds (Uy et al., 2019), 3D (for instance velocity or momentum, referred to as type-1 in
molecular structures (Ramakrishnan et al., 2014) or N-body literature).
particle simulations (Kipf et al., 2018). The group corre-
sponding to these symmetries is named the Euclidean group: In this work we present a new architecture that is translation,
SE(3) or when reflections are included E(3). It is often de- rotation and reflection equivariant (E(n)), and permutation
sired that predictions on these tasks are either equivariant or equivariant with respect to an input set of points. Our model
invariant with respect to E(3) transformations. is simpler than previous methods in that it does not require
the spherical harmonics as in (Thomas et al., 2018; Fuchs
1
UvA-Bosch Delta Lab, University of Amsterdam, et al., 2020) while it can still achieve competitive or bet-
Netherlands. Correspondence to: Victor Garcia Sator- ter results. In addition, equivariance in our model is not
ras <v.garciasatorras@uva.nl>, Emiel Hoogeboom
<e.hoogeboom@uva.nl>, Max Welling <m.welling@uva.nl>. limited to the 3-dimensional space and can be scaled to
larger dimensional spaces without a significant increase in
Preliminary work. Under review. computation.
E(n) Equivariant Graph Neural Networks

We evaluate our method in modelling dynamical systems, 2013; Defferrard et al., 2016; Kipf & Welling, 2016a).
representation learning in graph autoencoders and predict- Given a graph G = (V, E) with nodes vi ∈ V and edges
ing molecular properties in the QM9 dataset. Our method eij ∈ E we define a graph convolutional layer following
reports the best performance in all three experiments achiev- notation from (Gilmer et al., 2017) as:
ing state-of-the-art-results on several regression targets of
the QM9 dataset. mij = φe (hli , hlj , aij )
X
mi = mij (2)
2. Background
j∈N (i)
In this section we introduce the relevant materials on equiv- hl+1 = φh (hli , mi )
i
ariance and graph neural networks which will later comple-
ment the definition of our method. Where hli ∈ Rnf is the nf-dimensional embedding of node
vi at layer l. aij are the edge attributes. N (i) represents
2.1. Equivariance the set of neighbors of node vi . Finally, φe and φh are the
edge and node operations respectively which are commonly
Let Tg : X →− X be a set of transformations on X for the
approximated by Multilayer Perceptrons (MLPs).
abstract group g ∈ G. We say a function φ : X → − Y is
equivariant to g if there exists an equivalent transformation
on its output space Sg : Y → − Y such that: 3. Equivariant Graph Neural Networks
φ(Tg (x)) = Sg (φ(x)) (1) In this section we present Equivariant Graph Neural Net-
works (EGNNs). Following the notation from background
As a practical example, let φ(·) be a non-linear function,
Section 2.2, we consider a graph G = (V, E) with nodes
x = (x1 , . . . , xM ) ∈ RM ×n an input set of M point clouds
vi ∈ V and edges eij ∈ E. In addition to the feature node
embedded in a n-dimensional space, φ(x) = y ∈ RM ×n
embeddings hi ∈ Rnf we now also consider a n-dimensional
the transformed set of point clouds, Tg a translation on the
coordinate xi ∈ Rn associated with each of the graph nodes.
input set Tg (x) = x + g and Sg an equivalent translation
Our model will preserve equivariance to rotations and trans-
on the output set Sg (y) = y + g. If our transformation
lations on these set of coordinates xi and it will also preserve
φ:X→ − Y is translation equivariant, translating the input
equivariance to permutations on the set of nodes V in the
set Tg (x) and then applying the function φ(Tx (x)) on it,
same fashion as GNNs.
will deliver the same result as first running the function
y = φ(x) and then applying an equivalent translation to the Our Equivariant Graph Convolutional Layer (EGCL) takes
output Tg (y) such that Equation 1 is fulfilled and φ(x+g) = as input the set of node embeddings hl = {hl0 , . . . , hlM −1 },
φ(x) + g. In this work we explore the following three types coordinate embeddings xl = {xl0 , . . . , xlM −1 } and edge
of equivariance on a set of particles x: information E = (eij ) and outputs a transformation on hl+1
1. Translation equivariance. Translating the input by g ∈ and xl+1 . Concisely: hl+1 , xl+1 = EGCL[hl , xl , E]. The
Rn results in an equivalent translation of the output. equations that define this layer are the following:
Let x+g be shorthand for (x1 +g, . . . , xM +g). Then
 2

mij = φe hli , hlj , xli − xlj , aij (3)
y + g = φ(x + g) X
2. Rotation (and reflection) equivariance. For any or- xl+1 = xli + xli − xlj φx (mij )

i (4)
thogonal matrix Q ∈ Rn×n , let Qx be shorthand for j6=i
(Qx1 , . . . , QxM ). Then rotating the input results in an X
equivalent rotation of the output Qy = φ(Qx). mi = mij (5)
j∈N (i)
3. Permutation equivariance. Permuting the input re-
hl+1 = φh hli , mi

sults in the same permutation of the output P (y) = i (6)
φ(P (x)) where P is a permutation on the row indexes.
Notice the main differences between the above proposed
Note that velocities v ∈ RM ×n are unaffected by transla-
method and the original Graph Neural Network from equa-
tions, but they transform equivalently under rotation (2) and
tion 2 are found in equations 3 and 4. In equation 3 we now
permutation (3). Our method introduced in Section 3 will
input the relative squared distance between two coordinates
satisfy the three above mentioned equivariant constraints.
kxli − xlj k2 into the edge operation φe . The embeddings hli ,
hlj , and the edge attributes aij are also provided as input
2.2. Graph Neural Networks
to the edge operation as in the GNN case. In our case the
Graph Neural Networks are permutation equivariant net- edge attributes will incorporate the edge values aij = eij ,
works that operate on graph structured data (Bruna et al., but they could also include additional edge information.
E(n) Equivariant Graph Neural Networks

In Equation 4 we update the position of each particle xi as a those cases where it is not 0. We can include momentum
vector field in a radial direction. In other words, the position to our proposed method by just replacing Equation 4 of our
of each particle xi is updated by the weighted sum of all model with the following equation:
relative differences (xi −xj )∀j . The weights of this sum are X
vil+1 = φv hli vil + xli − xlj φx (mij )
 
provided as the output of the function φx : Rnf → R1 that
takes as input the edge embedding mij from the previous j6=i (7)
l+1 l l+1
edge operation and outputs a scalar value. This equation xi = xi + vi
is the main difference of our model compared to standard
Note that this extends the EGCL layer as
GNNs and it is the reason why equivariances 1, 2 are pre-
hl+1 , xl+1 , vl+1 = EGCL[hl , xl , vl , E]. The only
served (proof in Appendix A). Despite its simplicity, this
difference is that now we broke down the coordinate update
equivariant operation is very flexible since the embedding
(eq. 4) in two steps, first we compute the velocity vil+1 and
mij can carry information from the whole graph and not
then we use this velocity to update the position xli . The
only from the given edge eij .
velocity vl is scaled by a new function φv : RN → R1
Finally, equations 5 and 6 follow the same updates than that maps the node embedding hli to a scalar value. Notice
(0)
standard GNNs. Equation 5 just aggregates all the incoming that if the initial velocity is set to zero (vi = 0), both
messages from neighbor nodes N (i) to node vi and Equa- equations 4 and 7 become exactly the same for the first layer
tion 6 performs the node operation φv which takes as input l = 0 and they become equivalent for the next layers since
the aggregated messages mi , the node emedding hli and φv just re-scales all the outputs of φx from the previous
outputs the updated node embedding hl+1
i . layers with a scalar value. We proof the equivariance of
this variant of the model in Appendix B.1. This variant
3.1. Analysis on E(n) equivariance is used in experiment 5.1 where we have to provide the
initial velocity of the system, and predict a relative position
In this section we analyze the equivariance properties of our
change.
model for E(3) symmetries (i.e. properties 1 and 2 stated in
section 2.1). In other words, our model should be translation
3.3. Inferring the edges
equivariant on x for any translation vector g ∈ Rn and it
should also be rotation and reflection equivariant on x for Given a point cloud or a set of nodes, we may not always be
any orthogonal matrix Q ∈ Rn×n . More formally our provided with an adjacency matrix. In those cases we can
model satisfies: assume a fully connected graph where all nodes exchange
messages with each other, in other words, the neighborhood
Qxl+1 + g, hl+1 = EGCL(Qxl + g, hl )
operator N (i) from equation 5 would include all other nodes
We provide a formal proof of this in Appendix A. Intuitively, of the graph except for i. This fully connected approach
let’s consider a hl feature which is already E(n) invariant, may not scale well to large graphs where we may want to
then we can see that the resultant edge embedding mij from locally limit the exchange of messages to avoid an overflow
Equation 3 will also be E(n) invariant, because in addition of information.
to hl , it only depends on squared distances kxli − xlj k2 , Similarly to (Serviansky et al., 2020; Kipf et al., 2018) we
which are E(n) invariant. Next, Equation 4 computes xl+1 i present a simple solution to infer the relations/edges of
by a weighted sum of differences (xi − xj ) which is added the graph in our model. We can re-write the aggregation
to xi , this transforms as a type-1 vector and preserves equiv- operation from our model (eq. 5) in the following way:
ariance (see Appendix A). Finally the last two equations 5 X X
and 6 that generate the next layer node-embeddings hl+1 re- mi = mij = eij mij (8)
main E(n) invariant since they only depend on hl and mij j∈N (i) j6=i

which, as we saw above, are E(n) invariant. Therefore the


output hl+1 is E(n) invariant and xl+1 is E(n) equivariant Where eij takes value 1 if there is an edge between nodes
to xl . Inductively, a composition of EGCLs will also be (i, j) and 0 otherwise. Notice this doesn’t modify yet the
equivariant. original equation used in our model, it is just a change in
notation. Now we can choose to approximate the relations
eij with the following function eij = φinf (mij ), where
3.2. Extending EGNNs for vector type representations
φinf : Rnf → [0, 1]1 resembles a linear layer followed
In this section we propose a slight modification to the pre- by a sigmoid function that takes as input the current edge
sented method such that we explicitly keep track of the embedding and outputs a soft estimation of its edge value.
particle’s momentum. In some scenarios this can be useful This modification doesn’t change the E(n) properties of the
not only to obtain an estimate of the particle’s velocity at model since we are only operating on the messages mij
every layer but also to provide an initial velocity value in which are already E(n) invariant.
E(n) Equivariant Graph Neural Networks

GNN Radial Field TFN Schnet EGNN

mij = φe (hli , hlj , krlij k2 , aij )


mij = φe (hli , hlj , aij ) mij = φrf (krlij k)rlij Wlk rlji hlk mij = φcf (krlij k)φs (hlj )
P
Edge mij = k i
m̂ij = rlij φx (mij )
P
. mi =
P
mij
P P P mi = j∈N (i) mij
Agg j∈N (i) mi = j6=i mij mi = j6=i mij mi = j6=i mij P
m̂i = j6=i m̂ij
 
l+1
hi = φh hli , mi
Node hl+1
i = φh (hli , mi ) xl+1
i = xli + mi hl+1
i = wll hli + mi hl+1
i = φh (hli , mi )
xl+1
i = xli + m̂i

Non-equivariant E(n)-Equivariant SE(3)-Equivariant E(n)-Invariant E(n)-Equivariant

Table 1. Comparison over different works from the literature under the message passing framework notation. We created this table with
the aim to provide a clear and simple way to compare over these different methods. The names from left to right are: Graph Neural
Networks (Gilmer et al., 2017); Radial Field from Equivariant Flows (Köhler et al., 2019); Tensor Field Networks (Thomas et al., 2018);
Schnet (Schütt et al., 2017b); and our Equivariant Graph Neural Network. The difference between two points is written rij = (xi − xj ).

4. Related Work Relationship with existing methods


Group equivariant neural networks have demonstrated their In Table 1 the EGNN equations are detailed together with
effectiveness in a wide variety of tasks (Cohen & Welling, some of its closest methods from the literature under the
2016; 2017; Weiler & Cesa, 2019; Rezende et al., 2019; message passing notation from (Gilmer et al., 2017). This ta-
Romero & Cordonnier, 2021). Recently, various forms and ble aims to provide a simple way to compare these different
methods to achieve E(3) or SE(3) equivariance have been algorithms. It is structured in three main rows that describe
proposed. Thomas et al. (2018); Fuchs et al. (2020) utilize i) the edge ii) aggregation and iii) node update operations.
the spherical harmonics to compute a basis for the trans- The GNN algorithm is the same as the previously introduced
formations, which allows transformations between higher- in section 2.1. Our EGNN algorithm is also equivalent to the
order representations. A downside to this method is that the description in section 3 but notation has been modified to
spherical harmonics need to be recomputed which can be match the (edge, aggregation, node) format. In all equations
expensive. Currently, an extension of this method to arbi- rlij = (xi − xj )l . Notice that except the EGNN, all algo-
trary dimensions is unknown. Finzi et al. (2020) parametrize rithms have the same aggregation operation and the main
transformations by mapping kernels on the Lie Algebra. For differences arise from the edge operation. The algorithm
this method the neural network outputs are in certain situ- that we call ”Radial Field” is the E(n) equivariant update
ations stochastic, which may be undesirable. Köhler et al. from (Köhler et al., 2019). This method is E(n) equivariant,
(2019); Köhler et al. (2020) propose an E(n) equivariant however its main limitation is that it only operates on x and
network to model 3D point clouds, but the method is only it doesn’t propagate node features h among nodes. In the
defined for positional data on the nodes without any feature method φrf is modelled as an MLP. Tensor Field Networks
dimensions. (TFN) (Thomas et al., 2018) instead propagate the node
embeddings h but it uses spherical harmonics to compute
Another related line of research concerns message pass-
its learnable weight kernel W`k : R3 → R(2`+1)×(2k+1)
ing algorithms on molecular data. (Gilmer et al., 2017)
which preserves SE(3) equivariance but is expensive to
presented a message passing setting (or Graph Neural Net-
compute an limited to the 3 dimensional space. The SE(3)
work) for quantum chemistry, this method is permutation
Transformer (Fuchs et al., 2020) (not included in this table),
equivariant but not translation or rotation equivariant. (Kon-
can be interpreted as an extension of TFN with attention.
dor et al., 2018) extends the equivariance of GNNs such
Schnet (Schütt et al., 2017b) can be interpreted as an E(n)
that each neuron transforms in a specific way under per-
invariant Graph Neural Network where φcf receives as input
mutations, but this extension only affects its permutation
relative distances and outputs a continuous filter convolu-
group and not translations or rotations in a geometric space.
tion that multiplies the neighbor embeddings h. Our EGNN
Further works (Schütt et al., 2017b;a) build E(n) invariant
differs from these other methods in terms that it performs
message passing networks by considering the relative dis-
two different updates in each of the table rows, one related
tances between points. (Anderson et al., 2019; Miller et al.,
to the embeddings h and another related to the coordinates
2020) additionally include SO(3) equivariance in its inter-
x, these two variables exchange information in the edge
mediate layers for modelling the behavior and properties of
operation. In summary the EGNN can retain the flexibility
molecular data. Our method is also framed as a message
of GNNs while remaining E(n) equivariant as the Radial
passing framework but in contrast to these methods it also
Field algorithm and without the need to compute expensive
achieves E(n) equivariance.
operations (i.e. spherical harmonics).
E(n) Equivariant Graph Neural Networks

5. Experiments tails are provided in Appendix C.1. A Linear model that


simply considers the motion equation p(t) = p(0) + v(0) t
5.1. Modelling a dynamical system — N-body system is also included as a baseline. Since some methods may
In a dynamical system a function defines the time depen- be more expensive than others we also provide the forward
dence of a point or set of points in a geometrical space. Mod- pass time in seconds for each of the models for a batch of
elling these complex dynamics is crucial in a variety of ap- 100 samples in a GTX 1080 Ti GPU.
plications such as control systems (Chua et al., 2018), model
based dynamics in reinforcement learning (Nagabandi et al., Method MSE Forward time (s)
2018), and physical systems simulations (Grzeszczuk et al., Linear 0.0819 .0001
1998; Watters et al., 2017). In this experiment we forecast SE(3) Transformer 0.0244 .1346
the positions for a set of particles which are modelled by Tensor Field Network 0.0155 .0343
simple interaction rules, yet can exhibit complex dynamics. Graph Neural Network 0.0107 .0032
Similarly to (Fuchs et al., 2020), we extended the Charged Radial Field 0.0106 .0039
Particles N-body experiment from (Kipf et al., 2018) to a 3 EGNN 0.0071 .0062
dimensional space. The system consists of 5 particles that
carry a positive or negative charge and have a position and Table 2. Mean Squared Error for the future position estimation in
a velocity associated in 3-dimensional space. The system is the N-body system experiment, and forward time in seconds for a
batch size of 100 samples running in a GTX 1080Ti GPU.
controlled by physic rules: particles are attracted or repelled
depending on their charges. This is an equivariant task since
Results As shown in Table 2 our model significantly outper-
rotations and translations on the input set of particles result
forms the other equivariant and non-equivariant alternatives
in the same rotation and translation throughout the entire
while still being efficient in terms of running time. It reduces
trajectory.
the error with respect to the second best performing method
Dataset: We sampled 3.000 trajectories for training, 2.000 by a 33%. In addition it doesn’t require the computation
for validation and 2.000 for testing. Each trajectory has of spherical harmonics which makes it more time efficient
a duration of 1.000 timesteps. For each trajectory we than Tensor Field Networks and the SE(3) Transformer.
are provided with the initial particle positions p(0) =
(0) (0)
{p1 , . . . p5 } ∈ R5×3 , their initial velocities v(0) =
(0) (0)
{v1 , . . . v5 } ∈ R5×3 and their respective charges c =
{c1 , . . . c5 } ∈ {−1, 1}5 . The task is to estimate the posi-
tions of the five particles after 1.000 timesteps. We optimize
the averaged Mean Squared Error of the estimated position
with the ground truth one.
Implementation details: In this experiment we used the ex-
tension of our model that includes velocity from section 3.2.
We input the position p(0) as the first layer coordinates x0
of our model and the velocity v(0) as the initial velocity in
(0)
Equation 7, the norms kvi k are also provided as features to
h0i . The charges are input as edge attributes aij = ci cj . The
Figure 2. Mean Squared Error in the N-body experiment for the Ra-
model outputs the last layer coordinates xL as the estimated dial Field, GNN and EGNN methods when sweeping over different
positions. We compare our method to its non equivariant amounts of training data.
Graph Neural Network (GNN) cousin, and the equivariant
methods: Radial Field (Köhler et al., 2019), Tensor Field Analysis for different number of training samples: We
Networks and the SE(3) Transformer. All algorithms are want to analyze the performance of our EGNN in the small
composed of 4 layers and have been trained under the same and large data regime. In the following, we report on a
conditions, batch size 100, 10.000 epochs, Adam optimizer, similar experiment as above, but instead of using 3.000
the learning rate was tuned independently for each model. training samples we generated a new training partition of
We used 64 features for the hidden layers in the Radial 50.000 samples and we sweep over different amounts of data
Field, the GNN and our EGNN. As non-linearity we used from 100 to 50.000 samples. We compare the performances
the Swish activation function (Ramachandran et al., 2017). of our EGNN vs its non-equivariant GNN counterpart and
For TFN and the SE(3) Transformer we swept over different the Radial Field algorithm. Results are presented in Figure
number of vector types and features and chose those that 2. Our method outperforms both Radial Field and GNNs
provided the best performance. Further implementation de- in the small and large data regimes. This shows the EGNN
E(n) Equivariant Graph Neural Networks

is more data efficient than GNNs since it doesn’t require The symmetry problem: The above stated autoencoder
to generalize over rotations and translations of the data may seem straightforward to implement at first sight but in
while it ensembles the flexibility of GNNs in the larger some cases there is a strong limitation regarding the symme-
data regime. Due to its high model bias, the Radial Field try of the graph. Graph Neural Networks are convolutions
algorithm performs well when data is scarce but it is unable on the edges and nodes of a graph, i.e. the same function is
to learn the subtleties of the dataset as we increase the applied to all edges and to all nodes. In some graphs (e.g.
training size. In summary, our EGNN benefits from both those defined only by its adjacency matrix) we may not have
the high bias of E(n) methods and the flexibility of GNNs. input features in the nodes, such that the difference among
nodes relies only on their edges or neighborhood topology.
5.2. Graph Autoencoder Therefore, if the neighborhood of two nodes is exactly the
same, their encoded embeddings will be the same too. A
A Graph Autoencoder can learn unsupervised representa- clear example of this is a cycle graph (an example of a 4
tions of graphs in a continuous latent space (Kipf & Welling, nodes cycle graph is provided in Figure 3). When running a
2016b; Simonovsky & Komodakis, 2018). In this exper- Graph Neural Network encoder on a node featureless cycle
iment section we use our EGNN to build an Equivariant graph, we will obtain the exact same embedding for each
Graph Autoencoder. We will explain how Graph Autoen- of the nodes, which makes it impossible to reconstruct the
coders can benefit from equivariance and we will show how edges of the original graph from the node embeddings. The
our method outperforms standard GNN autoencoders in the cycle graph is a severe example where all nodes have the ex-
provided datasets. This problem is particularly interesting act same neighborhood topology but these symmetries can
since the embedding space can be scaled to larger dimen- be present in different ways for other graphs with different
sions and is not limited to a 3 dimensional Euclidean space. edge distributions or even when including node features if
Similarly to the work of (Kipf & Welling, 2016b) further these are not unique.
extended by section 3.3 in (Liu et al., 2019), our graph
auto-encoder z = q(G) embeds a graph G into a set of
latent nodes z = {z1 , . . . zM } ∈ RM ×n , where M is the
number of nodes and n the embedding size per node. Notice
this reduces the memory complexity to store the graphs
from O(M 2 ) to O(M ) with respect to the number of nodes.
This differs from (Simonovsky & Komodakis, 2018) which
embeds the graph in a single vector z ∈ RK , which causes
the reconstruction to be computationally very expensive
since the nodes of the decoded graph have to be matched Figure 3. Visual representation of a Graph Autoencoder for a 4
again to the ground truth. nodes cycle graph. The bottom row illustrates that adding noise at
the input graph breaks the symmetry of the embedding allowing
More specifically, we will compare our Equivariant Graph the reconstruction of the adjacency matrix.
Auto-Encoder in the task presented in (Liu et al., 2019)
where a graph G = (V, E) with node features H ∈ RM ×nf An ad-hoc method to break the symmetry of the graph intro-
and adjacency matrix A ∈ {0, 1}M ×M is embedded into duced by (Liu et al., 2019). This method introduces noise
a latent space z = q(H, A) ∈ RM ×n . Following (Kipf & sampled from a Gaussian distribution into the input node
Welling, 2016b; Liu et al., 2019), we are only interested in features of the graph h0i ∼ N (0, σI). This noise allows
reconstructing the adjacency matrix A since the datasets we different representation for all node embeddings and as a
will work with do not contain node features. The decoder result the graph can be decoded again, but it comes with
g(·) proposed by (Liu et al., 2019) takes as input the embed- a drawback, the network has to generalize over the new
ding space z and outputs the reconstructed adjacency matrix introduced noise distribution. Our Equivariant Graph Au-
 = g(z), this decoder function is defined as follows: toencoder will remain translation and rotation equivariant to
1 this sampled noise which we find makes the generalization
Âij = ge (zi , zj ) = 2 (9) much easier. In our case we will simply input this noise
1 + exp(w kzi − zj k + b)
as the input coordinates x0 ∼ N (0, σI) ∈ RM ×n of our
EGNN which will output an equivariant transformation of
Where w and b are its only learnable parameters and ge (·) is
them xL , this output will be used as the embedding of the
the decoder edge function applied to every pair of node em-
graph (i.e. z = xL ) which is the input to the decoder from
beddings. It reflects that edge probabilities will depend on
Equation 9.
the relative distances among node embeddings. The training
loss is defined as the binary cross entropy
P between the esti- Dataset: We generated community-small graphs (You et al.,
mated and the ground truth edges L = ij BCE(Âij , Aij ). 2018; Liu et al., 2019) by running the original code from
E(n) Equivariant Graph Neural Networks

Community Small Erdos&Renyi


Encoder BCE % Error F1 BCE % Error F1
Baseline - 31.79 .0000 - 25.13 0.000
GNN 6.75 1.29 0.980 14.15 4.62 0.907
Noise-GNN 3.32 0.44 0.993 4.56 1.25 0.975
Radial Field 9.69 1.23 0.981 6.28 1.51 0.970
EGNN 2.18 0.05 0.999 1.65 0.11 0.998

Figure 5. In the Table at the left we report the Binary Cross Entropy, % Error and F1 scores for the test partition on the Graph Autoencoding
experiment in the Community Small and Erdos&Renyi datasets. In the Figure at the right, we report the F1 score when overfitting a
training partition of 100 samples in the Erdos&Renyi dataset for different values of sparsity pe . The GNN is not able to successfully
auto-encode sparse graphs (small pe values) for the Erdos&Renyi dataset even when training and testing on the same small subset.

(You et al., 2018). These graphs contain 12 ≤ M ≤ 20 all numbers refer to the test partition. We also include a
nodes. We also generated a second dataset using the Er- ”Baseline” that predicts all edges as missing Âij = 0. The
dos&Renyi generative model (Bollobás & Béla, 2001) sam- standard GNN seems to suffer from the symmetry problem
pling random graphs with an initial number of 7 ≤ M ≤ 16 and provides the worst performance. When introducing
nodes and edge probability pe = 0.25. We sampled 5.000 noise (Noise-GNN), both the loss and the error decrease
graphs for training, 500 for validation and 500 for test for showing that it is actually useful to add noise to the input
both datasets. Each graph is defined as and adjacency matrix nodes. Finally, our EGNN remains E(n) equivariant to
A ∈ {0, 1}M ×M . this noise distribution and provides the best reconstruction
with a 0.11% error in the Erdos&Renyi dataset and close to
Implementation details: Our Equivariant Graph Auto-
optimal 0.05% in the Community Small dataset.
Encoder is composed of an EGNN encoder followed by
the decoder from Equation 9. The graph edges Aij are Overfitting the training set: We explained the symme-
input as edge attributes aij in Equation 3. The noise try problem and we showed the EGNN outperforms other
used to break the symmetry is input as the coordinates methods in the given datasets. Although we observed that
x0 ∼ N (0, σI) ∈ RM ×n in the first layer and h0 is initial- adding noise to the GNN improves the results, it is difficult
ized as ones since we are working with featureless graphs. to exactly measure the impact of the symmetry limitation in
As mentioned before, the encoder outputs an equivariant these results independent from other factors such as gener-
transformation on the coordinates which is the graph em- alization from the training to the test set. In this section we
bedding and input to the decoder z = xL ∈ RM ×n . We conduct an experiment where we train the different models
use n = 8 dimensions for the embedding space. We com- in a subset of 100 Erdos&Renyi graphs and embedding size
pare the EGNN to its GNN cousin, we also compare to n = 16 with the aim to overfit the data. We evaluate the
Noise-GNN which is an adaptation of our GNN to match methods on the training data. In this experiment the GNN
the setting from (Liu et al., 2019), and we also include the is unable to fit the training data properly while the EGNN
Radial Field algorithm as an additional baseline. All four can achieve perfect reconstruction and Noise-GNN close to
models have 4 layers, 64 features for the hidden layers, the perfect. We sweep over different pe sparsity values from 0.1
Swish activation function as a non-linearity and they were to 0.9 since the symmetry limitation is more present in very
all trained for 100 epochs using the Adam optimizer and sparse or very dense graphs. We report the F1 scores of this
learning rate 1e−4 . More details are provided in Appendix experiment in the right plot of Figure 5.
C.2. Since the number of nodes is larger than the number of
In this experiment we showed that E(n) equivariance im-
layers, the receptive field of a GNN may not comprise the
proves performance when embedding graphs in a continuous
whole graph which can make the comparison unfair with our
space as a set of nodes in dimension n. Even though this is a
EGNN. To avoid this limitation, all models exchange mes-
simple reconstruction task, we think this can be a useful step
sages among all nodes and the edge information is provided
towards generating graphs or molecules where often graphs
as edge attributes aij = Aij in all of them.
(i.e. edges) are decoded as pairwise distances or similarities
Results: In the table from Figure 5 we report the Binary between nodes e.g. (Kipf & Welling, 2016b; Liu et al., 2019;
Cross Entropy loss between the estimated and ground truth Grover et al., 2019), and these metrics (e.g. eq. 9) are E(n)
edges, the % Error which is defined as the percentage of invariant. Additionally this experiment also showed that our
wrong predicted edges with respect to the total amount of method can successfully perform in a E(n) equivariant task
potential edges, and the F1 score of the edge classification, for higher dimensional spaces where n > 3.
E(n) Equivariant Graph Neural Networks

Task α ∆ε εHOMO εLUMO µ Cν G H R2 U U0 ZPVE


Units bohr3 meV meV meV D cal/mol K meV meV bohr3 meV meV meV
NMP .092 69 43 38 .030 .040 19 17 .180 20 20 1.500
Schnet .235 63 41 34 .033 .033 14 14 .073 19 14 1.700
Cormorant .085 61 34 38 .038 .026 20 21 .961 21 22 2.027
L1Net .088 68 46 35 .043 .031 14 14 .354 14 13 1.561
LieConv .084 49 30 25 .032 .038 22 24 .800 19 19 2.280
TFN .223 58 40 38 .064 .101 - - - - - -
SE(3)-Tr. .142 53 35 33 .051 .054 - - - - - -
EGNN .071 48 29 25 .029 .031 12 12 .106 12 11 1.554

Table 3. Mean Absolute Error for the molecular property prediction benchmark in QM9 dataset.

5.3. Molecular data — QM9 dings hL from the output of the EGNN to the estimated
property value. Further implementation details are reported
The QM9 dataset (Ramakrishnan et al., 2014) has become
in Appendix. C. We compare to NMP (Gilmer et al., 2017),
a standard in machine learning as a chemical property pre-
Schnet (Schütt et al., 2017b), Cormorant (Anderson et al.,
diction task. The QM9 dataset consists of small molecules
2019), L1Net (Miller et al., 2020), LieConv (Finzi et al.,
represented as a set of atoms (up to 29 atoms per molecule),
2020), TFN (Thomas et al., 2018) and SE(3)-Tr. (Fuchs
each atom having a 3D position associated and a five di-
et al., 2020).
mensional one-hot node embedding that describe the atom
type (H, C, N, O, F). The dataset labels are a variety of Results are presented in Table 3. Our method reports state
chemical properties for each of the molecules which are esti- of the art in 9 out of 12 property predictions, and it is the
mated through regression. These properties are invariant to second best performing method for the remaining 3 proper-
translations, rotations and reflections on the atom positions. ties. Perhaps, surprisingly, we outperform other equivariant
Therefore those models that are E(3) invariant are highly networks that consider higher order representations while in
suitable for this task. this task, we are only using type-0 representations (i.e. rela-
tive distances) to define the geometry of the molecules. In
We imported the dataset partitions from (Anderson et al.,
Appendix D we prove that when only positional information
2019), 100K molecules for training, 18K for validation and
is given (i.e. no velocity or higher order type features), then
13K for testing. A variety of 12 chemical properties were
the geometry is completely defined by the norms in-between
estimated per molecule. We optimized and report the Mean
points up to E(n)-transformations, in other words, if two
Absolute Error between predictions and ground truth.
collections of points separated by E(n) transformations are
Implementation details: Our EGNN receives as input the considered to be identical, then the relative norms between
3D coordinate locations of each atom which are provided as points is a unique identifier of the collection.
x0i in Equation 3 and an embedding of the atom properties
which is provided as input node features h0i . Since this is 6. Conclusions
an invariant task and also x0 positions are static, there is no
need to update the particle’s position x by running Equation Equivariant graph neural networks are receiving increasing
4 as we did in previous experiments. Consequently, we interest from the natural and medical sciences as they repre-
tried both manners and we didn’t notice any improvement sent a new tool for analyzing molecules and their properties.
by updating x. When not updating the particle’s position In this work, we presented a new E(n) equivariant deep ar-
(i.e. skipping Equation 4), our model becomes E(3) in- chitecture for graphs that is computationally efficient, easy
variant, which is analogous to a standard GNN where all to implement, and significantly improves over the current
relative squared norms between pairs of points kxi − xj k2 state-of-the-art on a wide range of tasks. We believe these
are inputted to the edge operation (eq. 3). Additionally, properties make it ideally suited to make a direct impact on
since we are not provided with an adjacency matrix and topics such as drug discovery, protein folding and the design
molecules can scale up to 29 nodes, we use the extension of new materials, as well as applications in 3D computer
of our model from Section 3.3 that infers a soft estimation vision.
of the edges. Our EGNN network consists of 7 layers, 128
features per hidden layer and the Swish activation function
as a non-linearity. A sum-pooling operation preceded and
followed by two layers MLPs maps all the node embed-
E(n) Equivariant Graph Neural Networks

References Kipf, T. N. and Welling, M. Semi-supervised classifica-


tion with graph convolutional networks. arXiv preprint
Anderson, B., Hy, T.-S., and Kondor, R. Cormorant:
arXiv:1609.02907, 2016a.
Covariant molecular neural networks. arXiv preprint
arXiv:1906.04015, 2019. Kipf, T. N. and Welling, M. Variational graph auto-encoders.
arXiv preprint arXiv:1611.07308, 2016b.
Bollobás, B. and Béla, B. Random graphs. Number 73.
Cambridge university press, 2001. Köhler, J., Klein, L., and Noé, F. Equivariant flows: sam-
pling configurations for multi-body systems with sym-
Bruna, J., Zaremba, W., Szlam, A., and LeCun, Y. Spec- metric energies. CoRR, abs/1910.00753, 2019.
tral networks and locally connected networks on graphs.
arXiv preprint arXiv:1312.6203, 2013. Köhler, J., Klein, L., and Noé, F. Equivariant flows: exact
likelihood generative learning for symmetric densities.
Chua, K., Calandra, R., McAllister, R., and Levine, S. arXiv preprint arXiv:2006.02425, 2020.
Deep reinforcement learning in a handful of trials us-
ing probabilistic dynamics models. arXiv preprint Kondor, R., Son, H. T., Pan, H., Anderson, B., and Trivedi,
arXiv:1805.12114, 2018. S. Covariant compositional networks for learning graphs.
arXiv preprint arXiv:1801.02144, 2018.
Cohen, T. and Welling, M. Group equivariant convolutional
networks. In Balcan, M. and Weinberger, K. Q. (eds.), Liu, J., Kumar, A., Ba, J., Kiros, J., and Swersky, K. Graph
Proceedings of the 33nd International Conference on normalizing flows. In Advances in Neural Information
Machine Learning, ICML, 2016. Processing Systems, pp. 13578–13588, 2019.

Cohen, T. S. and Welling, M. Steerable cnns. In 5th Inter- Miller, B. K., Geiger, M., Smidt, T. E., and Noé,
national Conference on Learning Representations, ICLR, F. Relevance of rotationally equivariant convolutions
2017. for predicting molecular properties. arXiv preprint
arXiv:2008.08461, 2020.
Defferrard, M., Bresson, X., and Vandergheynst, P. Con-
Nagabandi, A., Kahn, G., Fearing, R. S., and Levine, S.
volutional neural networks on graphs with fast localized
Neural network dynamics for model-based deep reinforce-
spectral filtering. In Advances in neural information pro-
ment learning with model-free fine-tuning. In 2018 IEEE
cessing systems, pp. 3844–3852, 2016.
International Conference on Robotics and Automation
Finzi, M., Stanton, S., Izmailov, P., and Wilson, A. G. Gen- (ICRA), pp. 7559–7566. IEEE, 2018.
eralizing convolutional neural networks for equivariance
Ramachandran, P., Zoph, B., and Le, Q. V. Searching for
to lie groups on arbitrary continuous data. arXiv preprint
activation functions. arXiv preprint arXiv:1710.05941,
arXiv:2002.12880, 2020.
2017.
Fuchs, F., Worrall, D., Fischer, V., and Welling, M. Se Ramakrishnan, R., Dral, P. O., Rupp, M., and Von Lilienfeld,
(3)-transformers: 3d roto-translation equivariant attention O. A. Quantum chemistry structures and properties of
networks. Advances in Neural Information Processing 134 kilo molecules. Scientific data, 1(1):1–7, 2014.
Systems, 33, 2020.
Rezende, D. J., Racanière, S., Higgins, I., and Toth, P. Equiv-
Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and ariant hamiltonian flows. CoRR, abs/1909.13739, 2019.
Dahl, G. E. Neural message passing for quantum chem-
istry. arXiv preprint arXiv:1704.01212, 2017. Romero, D. W. and Cordonnier, J.-B. Group equiv-
ariant stand-alone self-attention for vision. In In-
Grover, A., Zweig, A., and Ermon, S. Graphite: Iterative ternational Conference on Learning Representations,
generative modeling of graphs. In International confer- 2021. URL https://openreview.net/forum?
ence on machine learning, pp. 2434–2444. PMLR, 2019. id=JkfYjnOEo6M.
Grzeszczuk, R., Terzopoulos, D., and Hinton, G. Neuroan- Schütt, K. T., Arbabzadah, F., Chmiela, S., Müller, K. R.,
imator: Fast neural network emulation and control of and Tkatchenko, A. Quantum-chemical insights from
physics-based models. In Proceedings of the 25th an- deep tensor neural networks. Nature communications, 8
nual conference on Computer graphics and interactive (1):1–8, 2017a.
techniques, pp. 9–20, 1998.
Schütt, K. T., Kindermans, P.-J., Sauceda, H. E., Chmiela, S.,
Kipf, T., Fetaya, E., Wang, K.-C., Welling, M., and Zemel, Tkatchenko, A., and Müller, K.-R. Schnet: A continuous-
R. Neural relational inference for interacting systems. filter convolutional neural network for modeling quantum
arXiv preprint arXiv:1802.04687, 2018. interactions. arXiv preprint arXiv:1706.08566, 2017b.
E(n) Equivariant Graph Neural Networks

Serviansky, H., Segol, N., Shlomi, J., Cranmer, K., Gross, E.,
Maron, H., and Lipman, Y. Set2graph: Learning graphs
from sets. Advances in Neural Information Processing
Systems, 33, 2020.
Simonovsky, M. and Komodakis, N. Graphvae: Towards
generation of small graphs using variational autoencoders.
In International Conference on Artificial Neural Net-
works, pp. 412–422. Springer, 2018.
Thomas, N., Smidt, T., Kearnes, S., Yang, L., Li, L.,
Kohlhoff, K., and Riley, P. Tensor field networks:
Rotation-and translation-equivariant neural networks for
3d point clouds. arXiv preprint arXiv:1802.08219, 2018.

Uy, M. A., Pham, Q.-H., Hua, B.-S., Nguyen, T., and Yeung,
S.-K. Revisiting point cloud classification: A new bench-
mark dataset and classification model on real-world data.
In Proceedings of the IEEE International Conference on
Computer Vision, pp. 1588–1597, 2019.

Watters, N., Zoran, D., Weber, T., Battaglia, P., Pascanu,


R., and Tacchetti, A. Visual interaction networks: Learn-
ing a physics simulator from video. Advances in neural
information processing systems, 30:4539–4547, 2017.
Weiler, M. and Cesa, G. General e(2)-equivariant steerable
cnns. In Wallach, H. M., Larochelle, H., Beygelzimer,
A., d’Alché-Buc, F., Fox, E. B., and Garnett, R. (eds.),
Advances in Neural Information Processing Systems 32:
Annual Conference on Neural Information Processing
Systems, 2019.

You, J., Ying, R., Ren, X., Hamilton, W. L., and Leskovec, J.
Graphrnn: Generating realistic graphs with deep auto-
regressive models. arXiv preprint arXiv:1802.08773,
2018.
E(n) Equivariant Graph Neural Networks

A. Equivariance Proof
In this section we prove that our model is translation equivariant on x for any translation vector g ∈ Rn and it is rotation
and reflection equivariant on x for any orthogonal matrix Q ∈ Rn×n . More formally, we will prove the model satisfies:

Qxl+1 + g, hl+1 = EGCL(Qxl + g, hl )

We will analyze how a translation and rotation of the input coordinates propagates through our model. We start assuming h0
is invariant to E(n) transformations on x, in other words, we do not encode any information about the absolute position
or orientation of x0 into h0 . Then, the output mij of Equation 3 will be invariant too since the distance between two
particles is invariant to translations kxli + g − [xlj + g]k2 = kxli − xlj k2 , and it is invariant to rotations and reflections
kQxli − Qxlj k2 = (xli − xlj )> Q> Q(xli − xlj ) = (xli − xlj )> I(xli − xlj ) = kxli − xlj k2 such that the edge operation becomes
invariant:
 2
  2

mi,j = φe hli , hlj , Qxli + g − [Qxlj + g] , aij = φe hli , hlj , xli − xlj , aij

The second equation of our model (eq. 4) that updates the coordinates x is E(n) equivariant. Following, we prove its
equivariance by showing that an E(n) transformation of the input leads to the same transformation of the output. Notice
mij is already invariant as proven above. We want to show:
X
Qxl+1 + g = Qxli + g + Qxli + g − [Qxlj + g] φx (mi,j )

i
j6=i

Derivation.
X X
Qxli + g + Qxli + g − Qxlj − g φx (mi,j ) = Qxli + g + Q xli − xlj φx (mi,j )
 

j6=i j6=i
 
X
= Q xli + xli − xj φx (mi,j ) + g
l


j6=i

= Qxl+1
i +g

Therefore, we have proven that rotating and translating xl results in the same rotation and translation on xl+1 at the output
of Equation 4.
Furthermore equations 5 and 6 only depend on mij and hl which as saw at the beginning of this proof, are E(n) invariant,
therefore the output of Equation 6 hl+1 will be invariant too. Thus concluding that a transformation Qxl + g on xl will result
in the same transformation on xl+1 while hl+1 will remain invariant to it such that Qxl+1 + g, hl+1 = EGCL(Qxl + g, hl )
is satisfied.

B. Re-formulation for velocity type inputs


In this section we write down the EGNN transformation layer hl+1 , xl+1 , vl+1 = EGCL[hl , xl , vl , E] that can take in
velocity input and output channels. We also prove it remains E(n) equivariant.
 2

mij = φe hli , hlj , xli − xlj , aij
X
vil+1 = φv hli vil + xli − xlj φx (mij )
 

j6=i

xl+1
i = xli
+ vil+1
X
mi = mij
j∈N (i)

hl+1 = φh hli , mi

i
E(n) Equivariant Graph Neural Networks

B.1. Equivariance proof for velocity type inputs


In this subsection we prove that the velocity types input formulation of our model is also E(n) equivariant on x. More
formally, for any translation vector g ∈ Rn and for any orthogonal matrix Q ∈ Rn×n , the model should satisfy:
hl+1 , Qxl+1 + g, Qvl+1 = EGCL[hl , Qxl + g, Qvl , E]

In Appendix A we already proved the equivariance of our EGNN (Section 3) when not including vector type inputs. In its
velocity type inputs variant we only replaced its coordinate updates (eq. 4) by Equation 7 that includes velocity. Since this is
the only modification we will only prove that Equation 7 re-written below is equivariant.
X
vil+1 = φv hli vil + xli − xlj φx (mij )
 

j6=i

xl+1
i = xli + vil+1

First, we prove the first line preserves equivariance, that is we want to show:
X
Qvil+1 = φv hli Qvil + Qxli + g − [Qxlj + g] φx (mij )
 

j6=i

Derivation.
X X
φv hli Qvil + Qxli + g − [Qxlj + g] φx (mij ) = Qφv hli vil + Q xli − xlj φx (mij )
   
(10)
j6=i j6=i
 
X
hli vil + xli − xlj φx (mij )
 
= Q  φv (11)
j6=i

= Qvil+1 (12)
Finally, it is straightforward to show the second equation is also equivariant, that is we want to show Qxl+1
i +g =
Qxli + g + Qvil+1
Derivation.
Qxli + g + Qvil+1 = Q xli + vil+1 + g


= Qxl+1
i +g
Concluding we showed that an E(n) transformation on the input set of points results in the same transformation on the
output set of points such that hl+1 , Qxl+1 + g, Qvl+1 = EGCL[hl , Qxl + g, Qvl , E] is satisfied.

C. Implementation details
In this Appendix section we describe the implementation details of the experiments. First, we describe those parts of
our model that are the same across all experiments. Our EGNN model from Section 3 contains the following three main
learnable functions.

• The edge function φe (eq. 3) is a two layers MLP with two Swish non-linearities: Input →
− {LinearLayer() →
− Swish()

− LinearLayer() →
− Swish() } → − Output.
• The coordinate function φx (eq. 4) consists of a two layers MLP with one non-linearity: Input →
− {LinearLayer() →

Swish() →
− LinearLayer() } →
− Output
• The node function φh (eq. 6) consists of a two layers MLP with one non-linearity and a residual connection: Input →

{LinearLayer() →
− Swish() →
− LinearLayer() → − Addition(Input) } →
− Output

These functions are used in our EGNN across all experiments. Notice the GNN (eq. 2) also contains and edge operation and
a node operation φe and φh respectively. We use the same functions described above for both the GNN and the EGNN such
that comparisons are as fair as possible.
E(n) Equivariant Graph Neural Networks

C.1. Implementation details for Dynamical Systems


Dataset
In the dynamical systems experiment we used a modification of the Charged Particle’s N-body (N=5) system from (Kipf
et al., 2018). Similarly to (Fuchs et al., 2020), we extended it from 2 to 3 dimensions customizing the original code from
(https://github.com/ethanfetaya/NRI) and we removed the virtual boxes that bound the particle’s positions. The sampled
dataset consists of 3.000 training trajectories, 2.000 for validation and 2.000 for testing. Each trajectory has a duration of
1.000 timesteps. We actually generated trajectories of 5.000 time steps and sliced them from timestep 3.000 to timestep
4.000 such that the initial conditions are more realistic than the Gaussian Noise initialization from which they are initialized.
In our second experiment, we sweep from 100 to 50.000 training samples, for this we just created a new training partition
following the same procedure as before but now generating 50.000 trajectories instead. The validation and test partition
remain the same from last experiment.
Models
All models are composed of 4 layers, the details for each model are the following.

• EGNN: For the EGNN we use its variation that considers vector type inputs from Section 3.2. This variation adds
the function φv to the model which is composed of two linear layers with one non-linearity: Input →
− {LinearLayer()

− Swish() → − LinearLayer() } →− Output. Functions φe , φx and φh that define our EGNN are the same than for all
experiments and are described at the beginning of this Appendix C.

• GNN: The GNN is also composed of 4 layers, its learnable functions edge operation φe and node operation φh from
Equation 2 are exactly the same as φe and φh from our EGNN introduced in Appendix C. We chose the same functions
for both models to ensure a fair comparison. In the GNN case, the initial position p0 and velocity v0 from the particles
is passed through a linear layer and inputted into the GNN first layer h0 . The particle’s charges are inputted as edge
attributes aij = ci cj . The output of the GNN hL is passed through a two layers MLP that maps it to the estimated
position.

• Radial Field: The Radial Field algorithm is described in the Related Work 4, its only parameters are contained in
its edge operation φrf () which in our case is a two layers MLP with two non linearities Input →− {LinearLayer() →−
Swish() → − LinearLayer() → − Tanh } − → Output. Notice we introduced a Tanh at the end of the MLP which fixes some
instability issues that were causing this model to diverge in the dynamical system experiment. We also augmented the
Radial Field algorithm with the vector type inputs modifications introduced in Section 3.2. In addition to the norms
between pairs of points, φrf () also takes as input the particle charges ci cj .

• Tensor Field Network: We used the Pytorch implementation from https://github.com/FabianFuchsML/se3-transformer-


public. We swept over different hyper paramters, degree ∈ {2, 3, 4}, number of features ∈ {12, 24, 32, 64, 128}. We
got the best performance in our dataset for degree 2 and number of features 32. We used the Relu activation layer
instead of the Swish for this model since it provided better performance.

• SE(3) Transformers: For the SE(3)-Transformer we used code from https://github.com/FabianFuchsML/se3-


transformer-public. Notice this implementation has only been validated in the QM9 dataset but it is the only available
implementation of this model. We swept over different hyperparamters degree ∈ {1, 2, 3, 4}, number of features ∈ 16,
32, 64 and divergence ∈ {1, 2}, along with the learning rate. We obtained the best performance for degree 3, number
of features 64 and divergence 1. As in Tensor Field Networks we obtained better results by using the Relu activation
layer instead of the Swish.

Other implementation details


In Table 2 all models were trained for 10.000 epochs, batch size 100, Adam optimizer, the learning rate was fixed and
independently chosen for each model. All models are 4 layers deep and the number of training samples was set to 3.000.

C.2. Implementation details for Graph Autoneoders


Dataset
E(n) Equivariant Graph Neural Networks

In this experiment we worked with Community Small (You et al., 2018) and Erdos&Renyi (Bollobás & Béla, 2001) generated
datasets.

• Community Small: We used the original code from (You et al., 2018) (https://github.com/JiaxuanYou/graph-generation)
to generate a Community Small dataset. We sampled 5.000 training graphs, 500 for validation and 500 for testing.
• Erdos&Renyi is one of the most famous graph generative algorithms. We used the ”gnp random graph(M , p)” function
from (https://networkx.org/) that generates random graphs when povided with the number of nodes M and the edge
probability p following the Erdos&Renyi model. Again we generated 5.000 graphs for training, 500 for validation
and 500 for testing. We set the edge probability (or sparsity value) to p = 0.25 and the number of nodes M ranging
from 7 to 16 deterministically uniformly distributed. Notice that edges are generated stochastically with probability
p, therefore, there is a chance that some nodes are left disconnected from the graph, ”gnp random graph(M , p)”
function discards these disconnected nodes such that even if we generate graphs setting parameters to 7 ≤ M ≤ 16
and p = 0.25 the generated graphs may have less number of nodes.

Finally, in the graph autoencoding experiment we also overfitted in a small partition of 100 samples (Figure 5) for the
Erdos&Renyi graphs described above. We reported results for different p values ranging from 0.1 to 0.9. For each p value
we generated a partition of 100 graphs with initial number of nodes between 7 ≤ M ≤ 16 using the Erdos&Renyi generative
model.
Models
All models consist of 4 layers, 64 features for the hidden layers and the Swish activation function as a non linearity. The
EGNN is defined as explained in Section 3 without any additional modules (i.e. no velocity type features or inferring edges).
The functions φe , φx and φh are defined at the beginning of this Appendix C. The GNN (eq. 2) mimics the EGNN in terms
that it uses the same φh and φe than the EGNN for its edge and node updates. The Noise-GNN is exactly the same as the
GNN but inputting noise into the h0 features. Finally the Radial Field was defined in the Related Related work Section 4
which edge’s operation φrf consists of a two layers MLP: Input → − { Linear() →
− Swish() →− Linear() } →− Output.
Other implementation details
All experiments have been trained with learning rate 1e−4 , batch size 1, Adam optimizer, 100 training epochs for the 5.000
samples sized datasets performing early stopping for the minimum Binary Cross Entropy loss in the validation partition.
The overfitting experiments were trained for 10.000 epochs on the 100 samples subsets.

C.3. Implementation details for QM9


For QM9 (Ramakrishnan et al., 2014) we used the dataset partitions from (Anderson et al., 2019). We imported the dataloader
from his code repository (https://github.com/risilab/cormorant) which includes his data-preprocessing. Additionally all
properties have been normalized by substracting the mean and dividing by the Mean Absolute Deviation.
Our EGNN consists of 7 layers. Functions φe , φx and φh are defined at the beginning of this Appendix C. Additionally, we
use the module φinf presented in Section 3.3 that infers the edges . This function φinf is defined as a linear layer followed
by a sigmoid: Input →
− {Linear() →− sigmoid()} →− Output. Finally, the output of our EGNN hL is forwarded through a two
layers MLP that acts node-wise, a sum pooling operation and another two layers MLP that maps the averaged embedding
to the predicted property value, more formally: hL →− {Linear() → − Swish() → − Linear() → − Sum-Pooling() → − Linear() →−
Swish() →− Linear} →− Property. The number of hidden features for all model hidden layers is 128.
We trained each property individually for a total of 1.000 epochs, we used Adam optimizer, batch size 96, weight decay
1e−16 , and cosine decay for the learning rate starting at at a lr=5e−4 except for the Homo, Lumo and Gap properties where
its initial value was set to 1e−3 .
E(n) Equivariant Graph Neural Networks

D. Sometimes invariant features are all you need.


Perhaps surprisingly we find our EGNNs outperform other equivariant networks that consider higher-order representations.
In the typical story of equivariance, forcing features to be invariant destroys information and impedes performance. However,
in this section we proof that when only positional information is given (i.e. no velocity-type features) then the geometry
is completely defined by the invariant distance norms in-between points, without loss of relevant information. As a
consequence, it is not necessary to consider higher-order representation types of the relative distances, not even the relative
difference as vector. To be precise, note that these invariant features still need to be permutation equivariant, they are only
E(n) invariant.
To be specific, we want to show that for a collection of points {xi }M
i=1 the norm of in-between distances `2 (xi , xj ) are a
unique identifier of the geometry, where collections separated by an E(n) transformations are considered to be identical.
We want to show invariance of the norms under E(n) transformations and uniqueness: two point collections are identical
(up to E(n) transform) when they have the same distance norms.
Invariance. Let {xi } be a collection of M points where xi ∈ Rn and the `2 distances are `2 (xi , xj ). We want to show that
all `2 (xi , xj ) are unaffected by E(n) transformations.
Proof. Consider an arbitrary E(n) transformation Rn → Rn : x 7→ Qx + t where Q is orthogonal and t ∈ Rn is a
translation. Then for all i, j:
√ √
`2 (Qxi + t, Qxj + t) = (Qxi + t − [Qxj + t])T (Qxi + t − [Qxj + t]) = (Qxi − Qxj )T (Qxi − Qxj )
√ √
= (xi − xj )T QT Q(xi − xj ) = (xi − xj )T (xi − xj ) = `2 (xi , xj )
This proves that the `2 distances are invariant under E(n) transforms.
Uniqueness. Let {xi } and {yi } be two collection of M points each where all in-between distance norms are identical,
meaning `2 (xi , xj ) = `2 (yi , yj ). We want to show that xi = Qyi + t for some orthogonal Q and translation t, for all i.
Proof. Subtract x0 from all {xi } and y0 from all {yi }, so x̃i = xi − x0 and ỹi = yi − y0 . As proven above, since
translation is an E(n) transformation the distance norms are unaffected and:
`2 (x̃i , x̃j ) = `2 (xi , xj ) = `2 (yi , yj ) = `2 (ỹi , ỹj ).
So without loss of generality, we may assume that x0 = y0 = 0. As a direct consequence ||xi ||2 = ||yi ||2 . Now writing out
the square:
xTi xi − 2xTi xj + xTj xj = ||xi − xj ||22 = ||yi − yj ||22 = yiT yi − 2yiT yj + yjT yj
And since ||xi ||2 = ||yi ||2 , it follows that xTi xj = yiT yj or equivalently written as dot product hxi , xj i = hyi , yj i. Notice
that this already shows that angles between pairs of points are the same.
At this moment, it might already be intuı̈tive that the collections of points are indeed identical. To finalize the proof formally
we will construct a linear map A for which we will show that (1) it maps every xi to yi and (2) that it is orthogonal. First
note that from the angle equality it follows immediately that for every linear combination:
X X
|| ci xi ||2 = || ci yi ||2 (∗).
i i
Let Vx be the linear span of {xi } (so Vx is the linear subspace of all linear combinations of {xi }). Let {xij }dj=1 be a basis
of Vx , where d ≤ n. Recall that one can define a linear map by choosing a basis, and then define for each basis vector where
it maps to. Define a linear map A from Vx P to Vy by the transformation from the basis xij to yij for j = 1, ..., d. Now pick
any point xi and write it in its basis xi = j cj xij ∈ Vx . We want to show Axi = yi or alternatively ||yi − Axi ||2 = 0.
P P P
Note that Axi = A j cj xij = j cj Axij = j cj yij . Then:
X X X X
||yi − cj yij ||22 = hyi , yi i − 2hyi , c i yij i + h ci yij , ci yij i
j j j j
(∗) X X X
= hxi , xi i − 2hxi , ci xij i + h c i xij , ci xij i = hxi , xi i − 2hxi , xi i + hxi , xi i = 0.
j j j

Thus showing that Axi = yi for all i = 1, . . . , M , proving (1). Finally we want to show that A is orthogonal, when
restricted to Vx . This follows since:
hAxij , Axij i = hyij , yij i = hxij , xij i
for the basis elements xi1 , ..., xid . This implies that A is orthogonal (at least when restricted to Vx ). Finally A can be
extended via an orthogonal complement of Vx to the whole space. This concludes the proof for (2) and shows that A is
indeed orthogonal.

You might also like