Master Thesis
Faculty of Mathematics
Violetta Schäfer
supervised by
11 January 2022
Abstract
Physics-informed neural networks (PINNs) are neural networks (NNs) which are used for solving partial differential equations (PDEs). They minimize a loss function which incorporates the given PDE by using automatic differentiation. To the best of our knowledge, PINNs are currently only used for solving one instance of a PDE problem, i.e. fixed initial and boundary conditions. We investigate whether PINNs can be generalized such that they are able to solve PDEs for various instances of initial or boundary conditions. To this end, we consider two cases, fixed initial and various boundary conditions as well as various initial conditions, for the one-dimensional heat equation with Dirichlet boundary conditions. We show that for fixed initial conditions, PINNs are able to achieve good prediction accuracy on unseen data. For various initial conditions the generalization question is still open. Due to the gradient evaluations in each iteration, the training process is extremely time-consuming, which makes generalizing PINNs to many instances of the problem infeasible in practice. From the theoretical point of view, we assume that further training and optimization of the code and the training dataset could answer this question in future work.
Zusammenfassung
Physics-informed neural networks (PINNs) are neural networks (NNs) that are used for solving partial differential equations (PDEs). They minimize a loss function which incorporates the given PDE by using automatic differentiation. To the best of our knowledge, PINNs are currently only used for solving a single instance of a PDE, i.e. fixed initial and boundary conditions. We investigate whether they are able to solve PDEs for several instances of initial or boundary conditions. To this end, we consider two cases, fixed initial and varying boundary conditions as well as varying initial conditions, for the heat equation with Dirichlet boundary conditions. We show that for fixed initial conditions PINNs are able to achieve good prediction accuracy on unseen data. For varying initial conditions the question of generalization is still open. Due to the evaluation of the gradient in each iteration, the training process is extremely time-consuming. This makes the generalization of PINNs to many instances of the problem infeasible in practice. From a theoretical point of view, we assume that further training and the optimization of the code and the training dataset could answer this question in a future work.
Contents
1 Introduction
2 Methods
  2.1 Neural Networks
    2.1.1 Training
  2.2 Physics-Informed Neural Networks
    2.2.1 Training
  2.3 Generating Ground Truth
  2.4 Application to Heat Equation
3 Results
  3.1 Fixed Boundary and Initial Condition
    3.1.2 Extrapolation
  3.2 Fixed Initial and Various Boundary Conditions
4 Discussion
  4.1 Alternative Method: Learning Nonlinear Operators
5 Conclusion
Bibliography
1 Introduction
In recent years, machine learning has become well-established not only in scientific research, but also in our everyday lives, be it through the language assistants integrated in our everyday gadgets [1] or objects like cars and robots that are able to decide autonomously using artificial intelligence [2, 3]. Scientists of all research areas are developing and enhancing machine learning techniques at tremendous speed. And they all have one thing in common: data is their foundation.
Having a sufficient amount of data available is crucial for machine learning applications. It is common knowledge that neural networks only learn from data, and it is intuitively clear that a lack of data means that we probably cannot rely on the predictions of the resulting model. Fortunately, we live in an age where the amount of data and computational resources are not a concern, at least for applications such as image or language processing. However, most real-world phenomena are not described by images or words but by physical laws, and Raissi, Perdikaris, and Karniadakis observe that "in the course of analyzing complex physical, biological or engineering systems, the cost of data acquisition is prohibitive, and we are inevitably faced with the challenge of drawing conclusions and making decisions under partial information" [4].
Physics-informed neural networks (PINNs) address this issue by incorporating the governing PDE of the considered problem. They aim to provide a surrogate of the solution by minimizing the corresponding PDE residual. In general, PDEs are solved by numerical methods such as the finite element method. However, these classical methods are mesh-based: for complex problems, already the mesh generation is time consuming, not to mention the high computational cost of solving the resulting discrete systems. PINNs, in contrast, combine automatic differentiation, which computes gradients at machine precision [5] and is completely mesh-free, with the capability of neural networks to serve as universal function approximators [6].
A trained PINN, however, only solves the one problem instance it was trained for. This raises the question: Can we train a PINN to solve several reference problems such that it is possible to use this model for different input parameters afterwards, and thus save computational time? Since a fully specified problem always requires boundary and initial conditions, the question is, in other words: Can PINNs be trained in a way such that they provide reliable predictions for arbitrary boundary and initial conditions?
Motivated by the question whether PINNs are able to learn physics at all, Raissi, Perdikaris, and Karniadakis (2017) already presented promising results which indicate that "if the given PDE is well-posed" PINNs are "capable of achieving good prediction accuracy" [4]. A theoretical justification for PINNs is provided by Shin, Darbon, and Karniadakis (2020), who show the consistency of PINNs [7]. In the last three years the repertoire of scientific publications regarding PINNs has been growing rapidly. PINNs have been applied to various fluid mechanics problems [8-11] as well as heat transfer problems [12-15], especially showing the effectiveness of PINNs in these application areas. Lu et al. (2019) developed a user-friendly software tool called DeepXDE [17] which is capable of solving several forward and inverse problems involving differential equations. More recently, it has been shown [18-20] that PINNs can seamlessly integrate sparse or partial measurement data with the underlying physical model.
This thesis is organized as follows. First, we study the method of neural networks and look at the concepts that are used during training, such as regularization. Subsequently, we introduce physics-informed neural networks and point out their differences towards standard neural networks. Within this section we also describe basic features of the software library DeepXDE. Afterwards, we give a brief introduction into the finite element method and how this method is used in the numerical PDE solver FEniCS, which we utilize to construct our ground truth data. We then describe how these methods are applied to the one-dimensional heat equation. After that, we present the results obtained by the previously explained methods and discuss them afterwards. Finally, we draw conclusions based on our observations.
2 Methods
In this chapter we explain the basic concept of a neural network (NN) and how this concept is augmented such that the resulting model can be called a physics-informed neural network (PINN). Subsequently, we explain the functionality of the numerical PDE solver FEniCS which we use for generating the ground truth data. After that, we present the application example to which these concepts are applied.

2.1 Neural Networks
NNs are widely used in many application fields such as autonomous driving, extracting information from text files or teaching a robot to tidy up your room [2, 3, 21]. More relevant for our purposes is that they provide a robust approach to regression tasks. The available data is typically split into a training and a test dataset according to a certain ratio. The training is performed on the usually large training dataset in order to learn the task. The test dataset is used during training as well in order to examine whether the model is able to produce proper results on data that is not part of the training data. Additionally, it provides an indication of whether the current model overfits the training data. After training, the test data is used to make predictions. The goal is to acquire a model that generalizes well to unseen data.
NNs are either used for classification or regression. In a classification task the NN has to decide to which class a certain input feature belongs. A common example is the classification of handwritten digits, where the well-known MNIST dataset [23] is often used. Since we approximate continuous quantities, we only deal with regression tasks in this thesis.
Network Architecture
The architecture of an artificial neural network is based on the concept of the biological neuron, as depicted in Figure 2.1.

Figure 2.1: Schematic of a single artificial neuron: the inputs x1, x2, x3 are multiplied by the weights w1, w2, w3, a bias b is added, and the weighted sum Σ is passed through an activation function σ to produce the output ν(x).
A feedforward neural network (FNN) û : R^d → R^{n_L} with L layers is defined as

    û(x) := W^L σ( W^{L−1} σ( · · · σ( W^1 x + b^1 ) · · · ) + b^{L−1} ) + b^L ,

where the activation function σ is applied component-wise. Here, d is the dimension of the input layer, L denotes the number of layers, also called the depth of û, and n_1 , . . . , n_{L−1} denote the numbers of neurons of the L − 1 hidden layers, also called the widths of the respective layers. If n_1 = · · · = n_{L−1} , then n_i is called the width of û for i ∈ {1, . . . , L − 1}. n_L is the dimension of the output layer. The matrices W^l contain the network's weights and the vectors b^l its biases.
Figure 2.2: FNN with two hidden layers and x = (x, t)T ∈ R2 as input.
The graph of an FNN is acyclic and has only edges which follow the direction from the input to the output layer. The activation function determines whether a neuron is activated or "inhibited through other neurons" [22]. Moreover, it is responsible for the nonlinear transformation of the input, which prevents the model from being a mere linear regression model. Commonly used activation functions are pictured in Figure 2.3. Each activation function has its advantages and disadvantages; for more information we refer to [25]. Notice that there is no specific rule which activation function to choose for a given task. In this thesis, we use the hyperbolic tangent function

    tanh x = sinh x / cosh x = (e^x − e^{−x}) / (e^x + e^{−x}),

as it is used in [16, 26] for similar regression tasks. Note that for regression tasks
it is common to not apply an activation function to the output layer: unlike for classification tasks, continuous values are required instead of discrete ones.
For simplicity we combine the learnable parameters W^l ∈ R^{n_l × n_{l−1}} and b^l ∈ R^{n_l} into one parameter Θ^l ∈ R^{n_l × (n_{l−1} + 1)} for l ∈ {1, . . . , L}. Since the prediction û depends on Θ := (Θ^1 , . . . , Θ^L) ∈ R^µ with µ = Σ_{l=1}^{L} n_l (n_{l−1} + 1), we also write û_Θ (x).
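To make the notation concrete, the following is a minimal sketch of such an FNN in PyTorch, the framework used for our own implementation. The layer sizes are placeholders for illustration; the depths and widths actually used are listed in Tables 2.1 and 2.2.

```python
import torch
import torch.nn as nn

class FNN(nn.Module):
    """Fully connected feedforward network u_hat_Theta: R^d -> R^{n_L}."""
    def __init__(self, layers=(2, 20, 20, 20, 1)):
        super().__init__()
        self.linears = nn.ModuleList(
            nn.Linear(layers[l - 1], layers[l]) for l in range(1, len(layers))
        )

    def forward(self, x):
        # tanh activation on all hidden layers, no activation on the output layer
        for linear in self.linears[:-1]:
            x = torch.tanh(linear(x))
        return self.linears[-1](x)

# example: evaluate u_hat(x, t) for a batch of space-time points
net = FNN()
xt = torch.rand(16, 2)   # 16 points (x, t)
u_hat = net(xt)          # shape (16, 1)
```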
2.1.1 Training
Training a NN corresponds to the minimization of a loss function. For a given input x the loss function measures the error between the network's output û_Θ (x) and the ground truth u(x), i.e. the desired output. There are several loss functions used in machine learning applications, depending on the task at hand. Since we have a regression task, a proper choice is the mean squared error (MSE) loss (Definition 2.3)

    L_T (Θ) = (1/|T|) Σ_{x∈T} ‖û_Θ (x) − u(x)‖_2^2 ,

where T denotes the set of training points with ground truth values u(x). Due to the squaring, larger errors are penalized more heavily than smaller ones. In addition, quadratic functions are preferred over other mappings, e.g. the | · |_1 norm, since they are differentiable at every point. One can add an additional summand to the loss function in order to regularize the model, which we explain in the following.
Regularization
The purpose of regularization is to avoid overfitting, which would lead to a model that fits the training data too precisely. The other extreme is called underfitting. An underfitted model barely fits the training data and is most likely not able to provide reliable predictions for unseen data. The principles of overfitting and underfitting are visualized in Figure 2.4. Both cases result in a model that is not able to perform well on unseen data and thus has poor generalization capabilities. The concept can be illustrated by thinking of the prediction as a polynomial approximation of the training data.
Figure 2.4: Example of a model that is underfitted, balanced or overfitted. The blue markers represent the training data and the dotted line the prediction of the model. Source: https://analystprep.com/study-notes/cfa-level-2/quantitative-method/overfitting-methods-addressing/.
If the model is overfitted, the resulting function will contain a lot of oscillations, comparable to an interpolating polynomial of high degree whose coefficients of the higher degrees would be large. In order to smoothen the prediction, one adds a term to the loss function which penalizes weights that are considerably large compared to the others. There are different ways to do so; we use L2 regularization. Consider the MSE loss defined in Definition 2.3 for an L-layer FNN û_Θ : R^d → R^{n_L} with weights and biases summarized in the parameter Θ. If we add an L2 regularization term to the loss function we obtain

    L_T (Θ) = (1/|T|) Σ_{x∈T} ‖û_Θ (x) − u(x)‖_2^2 + λ Σ_{j∈U\B} Θ_j^2 ,

where λ > 0 controls how strongly the flexibility of the model is penalized, U denotes the index set of all entries of Θ and B the indices of the biases. Note that only the weights and not the biases are regularized.
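As an illustration, the following sketch evaluates such a regularized MSE loss for the FNN sketched above; the regularization parameter lam is a hypothetical value. PyTorch's built-in weight_decay option of the optimizers would penalize all parameters including the biases, which is why the penalty is assembled explicitly over the weight matrices here.

```python
import torch

def regularized_mse(net, x, u_true, lam=1e-4):
    """MSE loss with an L2 penalty on the weights only (biases are not penalized)."""
    mse = torch.mean((net(x) - u_true) ** 2)
    # sum of squared entries of all weight matrices W^l
    l2 = sum((p ** 2).sum() for name, p in net.named_parameters() if "weight" in name)
    return mse + lam * l2
```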
The differentiability of the loss function follows from the differentiability of the activation function and the fact that the ‖ · ‖_2 norm is differentiable as well. The training process of the network under the MSE loss can then be formulated as the optimization problem

    min_Θ L(Θ)
with loss function L as defined in Def. 2.3 and Θ ∈ R^µ representing the network's parameters. The method of steepest descent (Alg. 1) is "the simplest gradient method for optimization" [27]. However, for non-convex functions it is possible that the algorithm fails to converge towards the global minimum. This is the case for the choice of a small learning rate, for instance: the algorithm converges more slowly and possibly towards a local minimum. Conversely, although a large learning rate could accelerate the process until convergence, there is the risk of not reaching the global optimum at all. A remedy is stochastic gradient descent (SGD). Instead of computing the full gradient as in gradient descent, an SGD method only takes small subsets of the training dataset, so-called mini-batches, and uses their gradients as estimates of the actual gradients, leading to a noisy gradient in total. Due to this noise, the method is capable of escaping possible local minima. One can draw an analogy to a ball rolling down the loss landscape: when the ball rolls into a small hole, it is likely to leave it again due to its momentum, and the algorithm continues instead of reaching a local minimum due to a zero gradient. Thus, SGD is effective for objective functions with many local minima.
The SGD method we are dealing with is called Adam (adaptive moment estimation), first proposed by Kingma and Ba (2014). According to the authors, it "only requires first-order gradients with little memory requirement" [28], and the magnitudes of its parameter updates are invariant to rescaling of the gradient. Adam tackles the aforementioned problems by using estimates of the first moment (mean) and second moment (uncentered variance) of the gradient to adapt the learning rate for each parameter individually.
Algorithm 2 Adam [28]
Require: stepsize α, decay rates β1 , β2 ∈ [0, 1), small constant ε > 0, objective function L(Θ)
Require: initialization Θ0
  k ← 0                                  ▷ initialize iteration counter
  m0 ← 0                                 ▷ initialize 1st moment vector
  v0 ← 0                                 ▷ initialize 2nd moment vector
  while Θk not converged do
      k ← k + 1
      gk ← ∇Θ L(Θk−1)                    ▷ compute gradient at iteration step k
      mk ← β1 · mk−1 + (1 − β1) · gk     ▷ update biased 1st moment estimate
      vk ← β2 · vk−1 + (1 − β2) · gk²    ▷ update biased 2nd raw moment estimate
      m̂k ← mk / (1 − β1^k)               ▷ compute bias-corrected 1st moment estimate
      v̂k ← vk / (1 − β2^k)               ▷ compute bias-corrected 2nd raw moment estimate
      Θk ← Θk−1 − α · m̂k / (√v̂k + ε)     ▷ update parameters
  end while
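In PyTorch, Algorithm 2 is available as torch.optim.Adam. A minimal training loop with the learning rate of 0.0001 used throughout this thesis could look as follows; x_train and u_train are placeholder tensors, the number of epochs is illustrative, and betas corresponds to β1, β2.

```python
import torch

optimizer = torch.optim.Adam(net.parameters(), lr=1e-4, betas=(0.9, 0.999))

for epoch in range(20000):
    optimizer.zero_grad()
    loss = regularized_mse(net, x_train, u_train)
    loss.backward()    # backpropagation computes the gradient g_k
    optimizer.step()   # Adam update of the parameters
```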
The first moment estimate acts like a momentum term: being an exponentially decaying average, it "adds history to the parameter update equation based on the gradient" encountered in previous iterations. The gradient itself is computed via backpropagation, which, as the name suggests, computes the gradient by using the chain rule while propagating backwards through the network. For more details we refer to [30, pp. 140-148].
Machine learning frameworks such as TensorFlow (Google Brain Team, 2015) and PyTorch (Paszke et al., 2016) implement the method of automatic differentiation (AD). One might assume that backpropagation and AD are distinct methods as they have different names, but both are strongly related to each other. As a matter of fact, backpropagation is a special case of reverse-mode AD applied to the loss function of a neural network.
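To illustrate how AD provides the derivatives that are needed later for the PDE residual of a PINN, the following sketch computes u_t and u_xx of the network output with torch.autograd for input points (x, t). The function name heat_residual is ours and λ = 1 is a placeholder value.

```python
import torch

def heat_residual(net, xt, lam=1.0):
    """PDE residual u_t - lam * u_xx of the network prediction, obtained via AD."""
    xt = xt.clone().requires_grad_(True)
    u = net(xt)
    grad_u = torch.autograd.grad(u, xt, torch.ones_like(u), create_graph=True)[0]
    u_x, u_t = grad_u[:, 0:1], grad_u[:, 1:2]
    u_xx = torch.autograd.grad(u_x, xt, torch.ones_like(u_x), create_graph=True)[0][:, 0:1]
    return u_t - lam * u_xx
```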
2.2 Physics-Informed Neural Networks

The NN is a solely data-driven method, mainly used for applications where enough data is available. Especially image data can easily be gathered from open source databases. However, most real-world phenomena are not described by images but by physical laws governing, for instance, heat conduction, chemical reactions or other systems. It is possible that only partial data is available when observing such phenomena, which results in a poor predictive performance when using NNs. This motivated the idea to incorporate these physical laws, often formulated as partial differential equations, into the training of neural networks. The development of automatic differentiation simplified the practical computations and paved the way for PINNs to emerge. PINNs are NNs which are not only trained with ground truth data, but also learn the scientific laws described by the considered PDE. Before we explain the concept of a PINN in more detail, we fix the notation for PDEs.
For a multi-index k = (k_1 , . . . , k_d) we write D^k u = ∂_{x_1}^{k_1} · · · ∂_{x_d}^{k_d} u. An equation of the form

    f( x, D^{k_1} u, . . . , D^{k_m} u ) = 0,   x ∈ Ω,

is called a partial differential equation (PDE) of order p = max_{1≤i≤m} |k_i| for the function u. A PDE

    f( x, D^{k_1} u, . . . , D^{k_m} u ) = h,   x ∈ Ω,

together with a boundary condition

    B(u) = g   on ∂Ω
and, for time-dependent problems, an initial condition

    I(u) = u0   in Ω at t = 0,

constitutes an initial boundary value problem. The boundary operator B prescribes the values that u assumes along the boundary ∂Ω. Possible choices are

1. Dirichlet: u = uD on ∂Ω,
2. Neumann: n · ∇u = uN on ∂Ω,
3. Robin: α n · ∇u + β u = uR on ∂Ω.
PINN Architecture
We describe the concept of a PINN using ideas and notations from [4, 16]. Since the IC can be treated as a special type of Dirichlet BC on the space-time domain Ω̃, we consider a PDE for the solution u : Ω̃ → R denoted by

    f( x, D^{k_1} u, . . . , D^{k_m} u ) = 0,   x ∈ Ω̃,
    B(u, x) = 0,   x ∈ ∂Ω̃.
In order to solve this PDE with a PINN, the following procedure [16] has to be carried out:
1. Construct a neural network û_Θ (x) as a surrogate for the solution u.
2. Specify the two training sets Tf and Tb for the PDE and BC/IC, respectively.
3. Specify the loss function LT (Θ) by summing the weighted MSE losses of both training sets.
4. Train the network by minimizing the loss function with respect to the parameters Θ.
In the first step we use an FNN with parameters Θ as defined in Definition 2.2. Notice that PINNs are not constrained to FNNs but could have other network architectures as well. The model û_Θ acts as a
Figure 2.5: Schematic of a PINN solving the one-dimensional heat equation ut = λuxx with mixed BCs u(x, t) = gD (x, t) on ΓD ⊂ ∂Ω̃ and ∂u/∂n (x, t) = gR (u, x, t) on ΓR ⊂ ∂Ω̃. The IC is treated as a special type of Dirichlet BC. Tf and Tb denote the sets of training points for the PDE and BC/IC residuals, respectively. Source: [16].
surrogate of the solution u. In order to solve the PDE, the computation of (partial) derivatives is essential. By the use of AD we are able to compute the derivatives (of any order) of our network û_Θ with respect to all relevant input variables, independent of the structure of the underlying programming code. Thus, we can include the PDE residual in our computations with no need for a mesh generation as it is used for the finite element method. This inclusion is done by considering two subsets of the training data T ⊂ Ω̃. The set Tf ⊂ Ω̃ contains points in the domain and Tb ⊂ ∂Ω̃ points on the boundary and initial data. The training points are randomly distributed in the domain. As indicated in Figure 2.5, only for points in Tb ground
truth data is used. For points in Tf the given PDE residual is minimized. This is the key difference from a standard NN, which uses ground truth data for all considered training data. In doing so, the model û_Θ learns the physical law imposed by the PDE.
2.2.1 Training
Like NNs, PINNs aim to minimize a certain loss function. But different from standard NNs, the loss function of a PINN is a linear combination of two loss functions, one for the PDE residual and one for the boundary and initial data. Consider a PDE as above with a PINN û_Θ : Ω̃ → R as surrogate for the solution u. Then, for given sets of training points Tf ⊂ Ω̃ and Tb ⊂ ∂Ω̃, the PINN loss LT : R^µ → R is defined as

    LT (Θ) = w_f L_Tf (Θ) + w_b L_Tb (Θ),

where

    L_Tf (Θ) = (1/|Tf|) Σ_{x∈Tf} ‖ f( x, D^{k_1} û, . . . , D^{k_m} û ) ‖_2^2 ,
    L_Tb (Θ) = (1/|Tb|) Σ_{x∈Tb} ‖ B(û, x) ‖_2^2 ,

and w_f , w_b > 0 are weights.
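A sketch of this loss for the one-dimensional heat equation, reusing the heat_residual sketch from the AD example above, could look as follows. The weights w_f, w_b and the tensors xt_f, xt_b, u_b (collocation points, boundary/initial points and their ground truth values) are placeholders.

```python
import torch

def pinn_loss(net, xt_f, xt_b, u_b, w_f=1.0, w_b=1.0):
    """Weighted sum of the PDE residual loss on T_f and the data loss on T_b."""
    loss_f = torch.mean(heat_residual(net, xt_f) ** 2)   # no ground truth needed
    loss_b = torch.mean((net(xt_b) - u_b) ** 2)          # boundary and initial data
    return w_f * loss_f + w_b * loss_b
```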
In section 2.1 we explained the concept of regularization which prevents the model from overfitting. In fact, when looking at the PINN loss, the residual term L_Tf can be interpreted as a regularizer: it penalizes those parameters Θ which would lead to solutions not satisfying the PDE. "Therefore, a key property of PINNs is that they can be effectively trained using small datasets" [4]. Also, the authors of [7] note that it has been empirically reported that a properly chosen boundary weight wb can accelerate the overall training process.
Like for NNs, the training corresponds to the minimization problem min_Θ LT (Θ), and the loss is differentiable by
the same reasoning as for NNs. The training of the network's parameters Θ is performed using a gradient-based optimizer like Adam [28] or L-BFGS [33]. "The required number of iterations highly depends on the problem (e.g. the smoothness of the solution)" [16]. In each step of the training process, (partial) derivatives of the network have to be computed with AD, which makes the training computationally expensive. Lu et al. [16] proposed a strategy to improve the efficiency of the training process called residual-based adaptive refinement, which we will discuss later.
As for NNs, there is no guarantee that this process converges to the global minimum. However, Raissi, Perdikaris, and Karniadakis emphasize in [4] that "if the given PDE is well-posed" the PINN method "is capable of achieving good prediction accuracy", given a sufficiently expressive network and enough training points. Shin, Darbon, and Karniadakis present in [7] that for linear second-order elliptic and parabolic type PDEs "the sequence of minimizers strongly converges to the PDE solution in C0", where C0 denotes the set of continuous functions. They state furthermore that "if each minimizer satisfies the initial/boundary conditions the convergence mode becomes H1" [7], where H1 denotes the Sobolev space which we will introduce later. Going even further, Cai et al. ascertain in [18] that PINNs are able to solve ill-posed problems as well when only partial data is provided.
2.2.2 DeepXDE

Lu et al. developed a software library called DeepXDE [17] which enables the user to solve forward and inverse PDEs via PINNs. Additionally, it is able to solve other problem classes such as integro-differential equations. In [16] it is said that the library was designed "to make the user code stay compact", closely resembling the mathematical formulation, which simplifies the process of analyzing scientific problems when using machine learning methods. DeepXDE supports the Python libraries TensorFlow 1.x, 2.x and PyTorch; in this thesis we use the default backend TensorFlow 1.x whenever we work with DeepXDE.
A useful feature of the library is that it supports complex geometry domains using constructive solid geometry (CSG). While DeepXDE itself only provides basic geometries (rectangle, circle, etc.), more complicated domains that the user may need can be built from them via the CSG operations union, difference and intersection, as illustrated in Figure 2.6.
Figure 2.6: Examples of constructive solid geometry (CSG) in 2D. Left: A and B represent the rectangle and circle, respectively. From A and B the union A|B, difference A − B and intersection A&B are constructed. Right: A complex geometry (top) is constructed from a polygon, a rectangle and a circle (bottom). Source: [16].
One major advantage is that the user does not have to provide training data. The user defines the PDE and BC/IC as functions represented in analytical form. One can either specify the locations of the training points or only set their number; in the latter case DeepXDE will sample the required number randomly on a grid covering the specified domain.
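For the one-dimensional heat equation with sine as IC and homogeneous Dirichlet BCs, this workflow reads roughly as in the following sketch, which follows the published DeepXDE examples. Exact module paths and argument names differ between DeepXDE versions (e.g. dde.maps.FNN instead of dde.nn.FNN in older releases), and the point counts and iteration number are illustrative.

```python
import numpy as np
import deepxde as dde

geom = dde.geometry.Interval(0, np.pi)
timedomain = dde.geometry.TimeDomain(0, 1)
geomtime = dde.geometry.GeometryXTime(geom, timedomain)

def pde(x, u):                                   # residual u_t - u_xx
    u_t = dde.grad.jacobian(u, x, i=0, j=1)
    u_xx = dde.grad.hessian(u, x, i=0, j=0)
    return u_t - u_xx

bc = dde.icbc.DirichletBC(geomtime, lambda x: 0, lambda _, on_boundary: on_boundary)
ic = dde.icbc.IC(geomtime, lambda x: np.sin(x[:, 0:1]), lambda _, on_initial: on_initial)

data = dde.data.TimePDE(geomtime, pde, [bc, ic],
                        num_domain=100, num_boundary=20, num_initial=20)
net = dde.nn.FNN([2] + [20] * 3 + [1], "tanh", "Glorot normal")
model = dde.Model(data, net)
model.compile("adam", lr=1e-4)
model.train(iterations=20000)                    # older versions: epochs=20000
```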
The training process can further be improved by using the residual-based adaptive refinement (RAR) method. In some cases where functions contain areas with steep gradients, it is useful to choose more training points in these areas. However, in many applications the shape of the solution is not known in advance, so that choosing suitable locations for the training points poses a challenge. To this end, Lu et al. propose with RAR a method which improves the distribution of training points during the training process by adding more points in the locations where the PDE residual is large. This is repeated until the mean residual

    ε_r = (1/V) ∫_{Ω̃} ‖ f( x, D^{k_1} û, . . . , D^{k_m} û ) ‖ dx

drops below a prescribed threshold, where V denotes the volume of Ω̃.
Furthermore, DeepXDE provides callback functions which allow the user to monitor the training process and to make modifications in real time, e.g. change the learning rate, if necessary. Possible callback functions can, for example, save the model after certain epochs, calculate the first derivative of the outputs with respect to the inputs, or monitor a movie of the spectrum of the function's Fourier transform.
An overview of the usage of DeepXDE for solving PDEs is depicted in Figure 2.7. The user defines the computational domain, the PDE and the BCs/ICs, and specifies the training data by either setting the specific point locations or only setting the number of points.
Figure 2.7: Flow chart of the usage of DeepXDE. The white boxes define the PDE and the training hyperparameters. The blue boxes combine the PDE problem and training hyperparameters in the white boxes. The orange boxes describe the steps to solve the PDE (from right to left). Source: [16].
At the current state of the art, DeepXDE does not support the definition of various BCs/ICs for a specific PDE. Thus, we cannot use this library for the generalization task without changing the code manually. Also, although it is user-friendly, it is difficult to take control over the single steps that proceed in the background.
2.3 Generating Ground Truth
In order to train and evaluate our models, we need ground truth data, i.e. reference solutions of the considered PDEs. A common way to obtain them is via the finite element method (FEM). A popular software library for solving PDEs by using the FEM is FEniCS [34]. Before we describe how to use FEniCS in practical applications, we explain the theory of the FEM.

2.3.1 The Finite Element Method
The FEM is the most important method for solving PDEs, at least for elliptic and parabolic PDEs. The basic idea is to transform the considered PDE, which is usually given in the so-called strong form, into the weak form. Essential for this approach is the proper choice of the underlying function space. Therefore, we briefly explain the required function spaces. Let L2 (Ω) denote the Hilbert space of square-integrable functions with inner product and norm

    ⟨f, g⟩_{L2(Ω)} := ∫_Ω f(x) g(x) dx,   ‖f‖_{L2(Ω)} := ( ∫_Ω |f(x)|² dx )^{1/2} .
For classical derivatives one would intuitively use

    u′(x) = lim_{h→0} ( u(x + h) − u(x) ) / h .

However, pointwise evaluation does not make sense in L2 (Ω). Thus, we require another notion of derivative for this function space. To this end, we define the space of test functions

    C∞_0 (Ω) := { φ ∈ C∞ (Ω) : supp φ ⊂ Ω is compact }.

A function f ∈ L2 (Ω) possesses the weak derivative D^k_w f := g if there exists a g ∈ L2 (Ω) with

    ∫_Ω g(x) φ(x) dx = (−1)^{|k|} ∫_Ω f(x) D^k φ(x) dx   for all φ ∈ C∞_0 (Ω).
With this formalism we can define the important function space which is used for the FEM. For m > 0 let Hm (Ω) be the set of all functions u ∈ L2 (Ω) which possess a weak derivative D^k_w u ∈ L2 (Ω) for all |k| ≤ m. On Hm (Ω) we define an inner product and norm by

    ⟨u, v⟩_{Hm(Ω)} = Σ_{|k|≤m} ⟨D^k_w u, D^k_w v⟩_{L2(Ω)} ,   ‖u‖_{Hm(Ω)} = ( Σ_{|k|≤m} ‖D^k_w u‖²_{L2(Ω)} )^{1/2} .
Since we apply the presented methods to the heat equation later, we will describe how the weak form is derived for this equation. On a domain Ω ⊂ R^d we are looking for a solution u : Ω × [0, T] → R which fulfills the heat equation

    ut = ∆u + f   (2.1)

with mixed Dirichlet and Neumann BCs

    u = uD on ΓD ,   ∂u/∂n = uN on ΓN ,

with ΓD and ΓN forming the boundary of Ω, i.e. ΓD ∪ ΓN = ∂Ω and ΓD ∩ ΓN = ∅. Since we have a time-dependent problem, we need to prescribe initial data

    u(·, 0) = u0 on Ω.
For now, we have given our problem in strong form. The FEM requires the problem in weak (variational) form, which is obtained as follows.
1. Define the test function space

    V = { v ∈ H1 (Ω) : v = 0 on ΓD }.

2. Multiply the strong form (2.1) by test functions v ∈ V and integrate over the domain Ω:

    ∫_Ω v u̇ dx − ∫_Ω v ∆u dx = ∫_Ω v f dx   for all v ∈ V.

3. Reduce the order of the derivatives by applying Green's formula

    ∫_Ω ⟨∇u, ∇v⟩ dx = − ∫_Ω v ∆u dx + ∫_{∂Ω} v ⟨∇u, n⟩ ds

and obtain, using ∂u/∂n = uN on ΓN and v = 0 on ΓD, the weak form

    ∫_Ω v u̇ dx + ∫_Ω ⟨∇u, ∇v⟩ dx = ∫_Ω v f dx + ∫_{ΓN} v uN ds   for all v ∈ V.   (2.2)

Note that for homogeneous BCs the boundary integral vanishes and we obtain u ∈ V. It is convenient to introduce an abstract notation for (2.2). We define

    ⟨u̇, v⟩ := ∫_Ω v u̇ dx,   a(u, v) := ∫_Ω ⟨∇u, ∇v⟩ dx,   L(v) := ∫_Ω v f dx + ∫_{ΓN} v uN ds,

for given f(·, t) ∈ L2 (Ω) and uN (·, t) ∈ L2 (ΓN ), so that (2.2) reads: find u with

    ⟨u̇, v⟩ + a(u, v) = L(v)   for all v ∈ V.   (2.3)

Still, we consider an infinite-dimensional function space. Since the FEM is a numerical method, the key step is the discretization of this function space.
Replacing V by a finite-dimensional subspace and inserting a basis representation of u into (2.3) results in a system of ordinary differential equations, where the initial value is given by the initial condition u0 and which can be solved by a time discretization. In the FEM, the domain Ω is first discretized, i.e. the domain is divided into elements that have a simple shape as depicted in Figure 2.8. In each element, the solution is interpolated from its values at the element nodes by polynomials of a chosen degree.
Figure 2.8: Top: Element types of degree 1. Bottom: Element types of degree 2. The markers • identify nodes in which a function value is interpolated. Source: https://www.femto-engineering.de/stories/festigkeitsberechnungen/.
2.3.2 FEniCS

As already mentioned before, FEniCS is a numerical PDE solver which uses the FEM. To solve a PDE with FEniCS, the user has to perform the steps as stated in section 2.3.1: the goal is to provide FEniCS a variational problem. What makes FEniCS attractive is that the steps of writing the program code for defining the variational problem with all relevant assumptions "result in fairly short code, while a similar program in most other software frameworks for PDEs require much more code and technically difficult programming" [35].
Assume we have given the heat equation as stated in section 2.3.1. For simplicity we only assume Dirichlet BCs. FEniCS distinguishes between the trial space U and the test space V, defined as
U = {u ∈ H1 (Ω) : u = uD on ∂Ω},
V = {v ∈ H1 (Ω) : v = 0 on ∂Ω}.
In order to solve a PDE with FEniCS the user has to perform the following steps [35]: choose the finite element spaces U, V by specifying the domain (the mesh) and the type of the elements, express the PDE as a variational problem, and let FEniCS assemble and solve the resulting discrete systems. In the following, U and V denote the discrete finite element spaces. Also, u denotes the solution of the discrete problem and ue corresponds to the exact solution if we need to distinguish between both. Thus, we want to find u ∈ U such that

    a(u, v) = L(v)   for all v ∈ V,

where the bilinear form a(u, v) collects all terms with the unknown solution u and L(v) is the linear functional comprised of all known functions. But how does FEniCS treat the time dependence of the heat equation? The time derivative is replaced by a finite difference approximation. For a fixed t, each stationary problem is then transformed into a variational problem of the above form. Let u^n denote the solution u at time t_n, where n is an integer iterating over all
considered time levels. "For simplicity and stability reasons" [35], FEniCS chooses the backward difference quotient

    (∂u/∂t)^{n+1} ≈ (u^{n+1} − u^n) / ∆t

with ∆t as the discretization parameter. With this, we obtain the time-discrete version of the heat equation sampled at some time level t_{n+1}, also called the backward Euler scheme:
    (u^{n+1} − u^n) / ∆t = ∆u^{n+1} + f^{n+1} .
FEniCS now solves, given the initial value u0 , a sequence of spatial stationary problems for u^{n+1} iteratively:

    u^0 = u0 ,   (2.4)
    u^{n+1} − ∆t ∆u^{n+1} = u^n + ∆t f^{n+1} ,   n = 0, 1, 2, . . .   (2.5)
To turn (2.5) into weak form, we multiply the equation by a test function v ∈ V and integrate over the domain Ω, where second-order derivatives are again removed by Green's formula. This yields the variational problem a(u^{n+1}, v) = L^{n+1}(v) with

    a(u^{n+1}, v) = ∫_Ω ( u^{n+1} v + ∆t ⟨∇u^{n+1}, ∇v⟩ ) dx,
    L^{n+1}(v) = ∫_Ω ( u^n + ∆t f^{n+1} ) v dx.
Projecting the initial condition u^0 = u0 onto the finite element space yields

    a(u^0 , v) = L^0 (v)

where

    a(u^0 , v) = ∫_Ω u^0 v dx,   L^0 (v) = ∫_Ω u0 v dx.
Thus, we obtain the following sequence of variational problems in order to solve the heat equation: first determine the discrete initial value from a(u^0, v) = L^0(v), then solve a(u^{n+1}, v) = L^{n+1}(v) in every time step.
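In legacy FEniCS (the dolfin interface described in [35]) this time-stepping scheme can be sketched as below for the 1D heat equation with sine as IC and homogeneous Dirichlet BCs; the mesh resolution and step count follow the 100 × 100 space-time grid used later for the ground truth data, all other names are placeholders.

```python
from fenics import *
import numpy as np

num_steps = 100
dt = 1.0 / num_steps                                     # time interval I = [0, 1]

mesh = IntervalMesh(100, 0.0, np.pi)                     # 1D domain Omega = [0, pi]
V = FunctionSpace(mesh, "P", 1)                          # piecewise linear elements

bc = DirichletBC(V, Constant(0.0), "on_boundary")        # homogeneous Dirichlet BC
u_n = interpolate(Expression("sin(x[0])", degree=2), V)  # discrete initial value u^0

u, v = TrialFunction(V), TestFunction(V)
f = Constant(0.0)
# backward Euler weak form a(u^{n+1}, v) = L^{n+1}(v)
a = u * v * dx + dt * dot(grad(u), grad(v)) * dx
L = (u_n + dt * f) * v * dx

u_sol = Function(V)
for n in range(num_steps):
    solve(a == L, u_sol, bc)                             # one variational problem per step
    u_n.assign(u_sol)                                    # advance to the next time level
```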
2.4 Application to Heat Equation
Since PINNs provide an adequate method for solving PDEs efficiently [4], we want to investigate whether they can be applied to certain simulation models, e.g. a simplified thermal model of a satellite. A satellite lives in the Earth's orbit and absorbs heat from the sun or other sources via radiation. We are interested in modelling the heat conduction in the metal plates that have to be kept under a certain temperature. To this end, we consider the well-known heat equation. In this thesis, we only look at the one-dimensional case in order to expose first ideas of how to deal with the generalization task:

    ut = λuxx + f   in Ω × I,
    u = uD   on ∂Ω,
    u = u0   in Ω at t = 0.

For the sake of simplicity we only consider the heat equation with f = 0 and Dirichlet BCs uD. The heat equation has a unique solution for specified boundary and initial conditions.
Analytical Solution
The following is based on [37]. We exemplarily show the computation of the solution
of the heat equation with Ω = [0, π], λ = 1, f = 0 and zero Dirichlet BCs uD = 0
given by
ut = uxx in Ω×I
u=0 on ∂Ω
u(·, 0) = u0 in Ω.
We use the separation of variables ansatz u(x, t) = X(x) T(t). Inserting it into the PDE and dividing by X(x) T(t) yields

    T′(t) / T(t) = X′′(x) / X(x),

assuming X(x) ≠ 0 ≠ T(t). The left-hand side does not depend on x and the right-hand side does not depend on t, so both sides equal a constant c := T′(t)/T(t) = X′′(x)/X(x). Due to the boundary conditions we require X(0) = X(π) = 0. We distinguish the possible signs of c.
1) c = 0: Then X′′(x) = 0, hence X(x) = a1 x + a2, and the boundary conditions imply a1 = a2 = 0.

2) c > 0: Then X′′(x) = c X(x), hence X(x) = a1 e^{√c x} + a2 e^{−√c x}. The boundary conditions give a1 + a2 = 0 and a1 e^{√c π} + a2 e^{−√c π} = 0, thus a1 (e^{√c π} − e^{−√c π}) = 0 and therefore a1 = a2 = 0.
The above cases both lead to the trivial solution. Hence, c < 0. Let us denote c = −k² with k ∈ N. Then

    X′′(x) = −k² X(x)   =⇒   X(x) = ak sin(kx).

Note that X(π) = 0 for all k ∈ N. The corresponding solution for T(t) is given by

    T(t) = e^{−k² t} .
Hence, the functions sin(kx) e^{−k² t}, k ∈ N, are solutions of the one-dimensional heat equation on the interval [0, π] with zero boundary values, and so is every superposition

    u(x, t) = Σ_{k=1}^∞ ak sin(kx) e^{−k² t} .

The coefficients ak are determined by the initial condition u0. The functions { √(2/π) sin(kx) : k ∈ N } form an orthonormal basis of the Hilbert space of square integrable functions on [0, π] with zero boundary conditions. Hence, the initial condition u0 can be written as

    u0 (x) = Σ_{k=1}^∞ (2/π) ⟨u0 , sin(k ·)⟩ sin(kx) = Σ_{k=1}^∞ ( (2/π) ∫_0^π u0 (z) sin(kz) dz ) sin(kx).

It follows that

    ak = (2/π) ∫_0^π u0 (z) sin(kz) dz ,

and the solution of the initial boundary value problem is

    u(x, t) = Σ_{k=1}^∞ ( (2/π) ∫_0^π u0 (z) sin(kz) dz ) sin(kx) e^{−k² t} .
Example 2.1. Consider the one-dimensional heat equation with λ = 1 and zero Dirichlet BCs on Ω = [0, π] with initial condition u0 (x) = sin(x). Then a1 = 1 is the only nonzero coefficient and the formula above yields the solution u(x, t) = e^{−t} sin(x).
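Under the same assumptions, the series representation can be evaluated numerically by truncating the sum and computing the coefficients ak with a quadrature rule; the truncation level K and the quadrature grid below are arbitrary choices for illustration.

```python
import numpy as np

def heat_series_solution(u0, x, t, K=50, n_quad=1000):
    """Evaluate u(x,t) = sum_k a_k sin(kx) exp(-k^2 t) with
    a_k = (2/pi) * int_0^pi u0(z) sin(kz) dz (trapezoidal rule)."""
    z = np.linspace(0.0, np.pi, n_quad)
    u = np.zeros_like(x, dtype=float)
    for k in range(1, K + 1):
        a_k = (2.0 / np.pi) * np.trapz(u0(z) * np.sin(k * z), z)
        u += a_k * np.sin(k * x) * np.exp(-k ** 2 * t)
    return u

# check against Example 2.1: u0(x) = sin(x) gives u(x, t) = exp(-t) sin(x)
x = np.linspace(0.0, np.pi, 5)
print(heat_series_solution(np.sin, x, t=0.5))
print(np.exp(-0.5) * np.sin(x))
```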
Since we want our model to be generalizable, we look at the heat equation with various boundary and initial conditions. As already mentioned before (section 2.2.2), using PINNs with current state of the art methods requires the model to be trained anew for every modified boundary or initial condition. This constraint raises a fundamental question: can a PINN be trained such that it provides reliable predictions for arbitrary boundary or initial conditions?
To answer this question, we take a look at several cases. First, we compare a PINN to a standard NN for a fixed boundary and initial condition. With this, we want to expose if and under what conditions PINNs are superior to NNs. Subsequently, we vary the BCs for a fixed IC. Under this setting we also investigate the impact of an additional nonlinear term in the residual. The most interesting case we look at deals with a compilation of several ICs. Here, the BCs are given implicitly by the ICs. In order to be able to describe the input functions with a small number of parameters, we represent them by their Fourier coefficients, using ideas and results from [38, 39]. Therefore, we require some preliminaries.
Let T := [0, 1] be the torus of length 1 and L2 (T) the Hilbert space of square-integrable functions on T, defined analogously to section 2.3.1. Elements f ∈ L2 (T) are periodic functions living on R that are continued periodically beyond [0, 1]. The complex exponentials

    { e^{2π i k ·} : k ∈ Z }   (2.6)

form an orthonormal basis of L2 (T). With (2.6), we can represent the elements of the Hilbert space L2 (T) with respect to that basis. Let Sn : L2 (T) → L2 (T) be the linear operator defined by

    Sn f := Σ_{k=−n}^{n} ⟨f, e^{2π i k ·}⟩ e^{2π i k ·} ,

which we call the nth Fourier partial sum of f ∈ L2 (T). Note that this is also defined for f ∈ L1 (T) ⊃ L2 (T), where L1 (T) denotes the space of integrable functions on T. Now, we can state the following theorem.
Theorem ([38]). Every f ∈ L2 (T) has the Fourier series representation

    f(x) = Σ_{k∈Z} ck (f) e^{2π i kx} ,   ck (f) := ⟨f, e^{2π i k ·}⟩ = ∫_0^1 f(x) e^{−2π i kx} dx,   (2.7)

where the series converges in L2 (T) and

    ‖f‖²_{L2(T)} = Σ_{k∈Z} |ck (f)|² < ∞.
Proof. Let

    Pn := { Σ_{k=−n}^{n} ck e^{2π i k ·} : ck ∈ C }

denote the space of trigonometric polynomials of degree n. The partial sum Sn f is the orthogonal projection of f onto Pn , i.e.

    ‖f − Sn f‖ = min{ ‖f − p‖ : p ∈ Pn }   and   ⟨f − Sn f, pn ⟩ = 0   for all pn ∈ Pn .

Since the trigonometric polynomials are dense in L2 (T), it follows that ‖f − Sn f‖ → 0 for n → ∞. Since the functions e^{2π i kx} , k ∈ Z, form an ONB of L2 (T), the Parseval equation

    ‖f‖²_{L2(T)} = Σ_{k∈Z} |ck (f)|²

holds.
Hence, every f ∈ L2 (T) is uniquely determined by its Fourier coefficients. Note that in (2.7) the equality is meant in the sense of L2 ; there is no pointwise or uniform convergence in general. The Fourier coefficients ck (f) correspond to the amplitudes of the oscillations contained in f.
In numerical computations, a function is only available in a discrete setting. Besides, in some cases only certain frequencies k are of interest. Therefore, consider the regarded function sampled on an equispaced grid, i.e. (fj)_{j=0}^{N−1} with fj := f(j/N), N ∈ N.
Definition 2.10. [38] (Discrete Fourier Transform) The discrete Fourier transform (DFT) of the sampled values (fj)_{j=0}^{N−1} is given by

    f̂k := Σ_{j=0}^{N−1} fj e^{−2π i jk/N} ,   k = 0, . . . , N − 1.
The relation between the Fourier coefficients ck (f) and their discrete analogue f̂k is given by

    ck (f) ≈ (1/N) f̂k ,   k = 0, . . . , N/2 − 1,
    c−k (f) ≈ (1/N) f̂_{N−k} ,   k = 1, . . . , N/2,

where w.l.o.g. N is assumed to be even. In the following, we identify the phrase Fourier coefficients with the values f̂k computed by the DFT as stated in Definition 2.10. The DFT is invertible [38]: for given Fourier coefficients f̂k the original sampled values are recovered by the inverse DFT

    fj = (1/N) Σ_{k=0}^{N−1} f̂k e^{2π i jk/N} ,   j = 0, . . . , N − 1.
Before we tackle the generalization task of this thesis, we first look at two more basic questions:
1. Interpolation: Do PINNs need less training data than NNs to achieve a comparable accuracy for a simple function approximation using the ground truth data? According to Raissi et al., PINNs seem to be the answer for problems where only little data is available [4].
2. Extrapolation: Do PINNs have better extrapolation capabilities than NNs?
In order to answer these questions, we examine what impact the residual has on the training. For the sake of comparability, we include the points of the inner domain into the set of boundary points where the ground truth is used in the loss function. Generally, this would not be done when using PINNs in practice (see Def. 2.7). However, with this approach we achieve that the only difference between the two models is that the PINN has the additional residual term in the loss function. We consider the one-dimensional heat equation with the following initial and boundary conditions:

    IC: u0 (x) = sin(x),   BC: uD = 0.   (2.9)
The ground truth is determined by the PDE together with these boundary and initial data. We consider the set of training points T ⊂ Ω̃. In section 2.2 we explained that ground truth data is only used for Tb . However, as mentioned above, for comparison reasons we set Tb = T. The ground truth data is obtained by solving the PDE with FEniCS (see subsection 2.3.2) on a 100 × 100 mesh, i.e. a grid consisting of 100 points in x-direction and 100 points in t-direction.
Tackling the first question, we train the models with different sizes of the training dataset. We consider sizes in M = {10, 15, 20, 30, 40, 60, 80, 100, 120, 160, 200}. First, we train the NN with a selected size of randomly distributed training points. Then, with the same net initialization and the same training data we train the corresponding PINN. The test dataset consists of 200 randomly distributed points. For
all considered cases we use the same test dataset. With the IC and BC from (2.9) we obtain the settings for each considered size of the training dataset as depicted in Table 2.1. With the obtained models we additionally perform extrapolation and look at the predictions outside the training domain.
Table 2.1: Setting for comparison of PINN to NN. Both were trained with the same net
initialization on the same data with the Adam optimizer and a learning rate of 0.0001.
Ω I λ Depth Width Epochs
In the following we elaborate on the cases mentioned above. In general, the heat equation with a reaction term reads

    ut = ∇ · ( d(u) ∇u ) + r(u),

where d(u) > 0 is the diffusivity and r(u) a reaction term. In the cases considered before, it holds d = λ and r = 0. However, regarding the motivation to model a reaction of the satellite with the space environment, for the nonlinear case we take an additional radiative loss into account. In a vacuum, a satellite can only absorb or emit heat via radiation. We model this radiative loss in a simple fashion by the Stefan-Boltzmann law r(u) = −σ(u⁴ − v⁴), where v(x, t) denotes the temperature of the surroundings and σ is a constant. For simplicity we restrict ourselves to the one-dimensional equation
    ut = λuxx + r(u)

and consider the following cases:

a) Let Ω = [0, π] and r = 0. As IC we take u0 (x) = sin(x); the BC prescribes the constant values uD (0) = a and uD (π) = b.
We choose a, b ∈ {0, 0.2, 0.4, 0.6, 0.8, 1}. This results in a dataset consisting of
36 individual datasets.
b) Let Ω = [0, 1] and r = 0. The IC and BC are given by

    IC: u0 (x) = −x + 1,
    BC: uD (0) = a,   uD (1) = b.

Due to the resulting discontinuities at the corners of the space-time domain, we expect the initially straight line to change over time in a way such that it fulfills the BC.
c) Let Ω = [0, π], a, b ∈ [0, 1] and the IC and BC as defined in a). Unlike in a), we consider the nonlinear heat equation

    ut = λuxx − σu⁴ .
d) Let Ω = [0, 1] and r = 0. The ICs are given by functions that are parametrized by 6 Fourier coefficients (see Def. 2.10). The BCs are given implicitly by the ICs. Here, we provide the PINN 2065 functions. Thus, the dataset consists of 2065 individual datasets. The functions comprise, among others, lines of different slopes and polynomials. In doing so, we achieve that the network
learns various types of functions such that it is able to predict the solution for unseen initial conditions.
For training, the points of each individual dataset are split into points in the inner domain and points on the boundary Tb ⊂ ∂Ω̃. This is done for each individual dataset. Consider case a): if we choose a training size of 20 per dataset, we obtain 20 training points of which 10 points live in the inner region and the other 10 on the boundary. The 10 boundary points are also used for the residual loss in order to increase the regularization. Thus, the complete training dataset consists of 720 points. The same procedure is done for the test dataset; here, for each case we use a size of 200 test points. The different problem settings with the respective network architectures are summarized in Table 2.2.
Table 2.2: Settings for the generalization of PINNs. The column Size denotes the size of
the training dataset for each individual dataset. In the nonlinear case we choose σ = 3.7.
All models were trained with the Adam optimizer and a learning rate of 0.0001.
Sizes Ω I λ r Depth Width Epochs
We use Python's scipy module to compute the complex-valued Fourier coefficients f̂k for case d). We sample the function used as IC at N = 100 points and keep only the coefficients belonging to the lowest frequencies, i.e. c0 , c1 , c2 , c−1 , c−2 , c−3 . Thus, we have to use f̂0 , f̂1 , f̂2 , f̂97 , f̂98 , f̂99 according to the relation mentioned above. For simplicity we only use the real and imaginary parts of these numbers instead of their complex representation. As before, we use FEniCS to generate the ground truth data for Tb ; to this end, we use the 6 Fourier coefficients and compute the resulting function via the inverse DFT, which then serves as the initial condition for the solver.
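A sketch of this preprocessing step with scipy: a sampled IC is transformed, the six coefficients f̂0, f̂1, f̂2, f̂97, f̂98, f̂99 are kept, and a low-frequency reconstruction is obtained via the inverse DFT. The sampled function f below is a hypothetical example.

```python
import numpy as np
from scipy.fft import fft, ifft

N = 100
x = np.arange(N) / N                       # equispaced grid on the torus T = [0, 1)
f = np.sin(2 * np.pi * x) + 0.5 * x ** 2   # hypothetical sampled initial condition

f_hat = fft(f)                             # DFT coefficients \hat{f}_k
keep = [0, 1, 2, 97, 98, 99]               # correspond to c_0, c_1, c_2, c_{-3}, c_{-2}, c_{-1}
coeffs = f_hat[keep]                       # six complex values; their real and
                                           # imaginary parts serve as network inputs

f_hat_low = np.zeros(N, dtype=complex)     # low-frequency reconstruction of the IC
f_hat_low[keep] = coeffs
f_low = ifft(f_hat_low).real
```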
3 Results
We finally present the results of the tasks presented in section 2.4. First, we show the results when fixing the boundary and initial condition; here, we show the comparison between a NN and a PINN. In the next section, we look at the effects of selected initial conditions and nonlinearities in the residual when varying the boundary conditions. Lastly, we present the results gained when training the model with various initial conditions provided as six Fourier coefficients. The results are discussed in chapter 4.

3.1 Fixed Boundary and Initial Condition

We start by comparing the two models' performance for increasing size of the training dataset. In the following the
phrase size corresponds to the number of training points. For each size we present the model at that training step where it achieved the best test loss, since the test loss does not decrease monotonically during training. An overview of the results for each considered size is displayed in Table 3.1. For all models the test loss was evaluated at the same 200 test points. The relative L2 error takes all 10000 grid points into account. The relative L2 error of the PINN is smaller than the error of the NN for all sizes up to 160 inclusively. The PINN test loss is smaller than the NN test loss up to size 100. Looking at specific sizes, different behaviors of the NN and the PINN, respectively, can be observed. From size 30 to 40 the test loss and the relative L2 error of the NN decrease, whereas for the PINN these values increase or stay equal. From size 80 to 100 the test loss and the error of the NN increase, while for the PINN they decrease.
A visualization of the evolution of the relative L2 error and the runtime over the sizes can be seen in Figure 3.1. Consider the errors displayed in Figure 3.1a. At first
Table 3.1: Comparison of PINNs and NNs trained for various numbers of randomly sampled training points. Train loss, test loss and relative L2 error are compared. The column size denotes the number of training points. The test losses of all models are evaluated at the same 200 points.
NN PINN
size train loss test loss rel. L2 error train loss test loss rel. L2 error
glance, it can be seen that the PINN error is smaller than the NN error for sizes up to 160 inclusively. The largest difference between the error of the PINN and the NN is at size 15. In Figure 3.1b the runtime of the training process for each considered size is displayed. For the PINN the runtime increases over the sizes, whereas the runtime of the NNs stays within a range of about 97 and 124 seconds. At size 10 the PINN takes twice as long as the NN. For 200 training points the PINN's runtime is the largest among all considered models.
Loss Curves
We exemplarily look at the loss curves of the PINNs and NNs trained for sizes in {15, 30, 40, 80, 100, 160}, pictured in Figure 3.2. The losses for size 15 are shown in Figure 3.2a. One can see the large difference between the train and test losses for both the PINN and the NN. This indicates a lack of training data, which corresponds to an insufficient regularization and thus results in a poor generalization. For the NN this gap is larger and there is no significant decrease of the test loss from epoch 4000 on, whereas for the PINN it decreases up to epoch 19000. For size 30 (Fig. 3.2b), the gap between the PINN's train and test loss is small compared to the NN, where it is still large. Here, the NN's test loss stops decreasing significantly at epoch 13000. From size 40 on (Fig. 3.2c), the difference of train and test loss is small for both PINN and NN compared to the smaller sizes. All curves are decreasing over the number of epochs. One can see in Figures 3.2a-d that the gap between train and test loss becomes smaller for increasing size of the training dataset. This gap becomes larger again for size 100, as can be seen in Figure 3.2e. Figure 3.2f displays the similar loss curves of the PINN and NN trained with 160 points.
Solution Plots
Now, we present the solution plots that correspond to the losses presented before. In Figure 3.3 the solution plots of NNs and PINNs trained with sizes in {15, 30, 40, 80, 100, 160} are compared. For each size the prediction of the model and the used training data are displayed. Additionally, we look at the relative L∞ error evaluated at each grid point in order to see where the errors are made. Furthermore, we depict the maximal errors of all considered models in Table 3.2.
In the case of size 15 (Fig. 3.3a-b) one can see that for the NN the errors are especially made on the boundary and in those regions where no training points were available. The PINN has only minor errors in the aforementioned areas. In the case of size 30, pictured in Figures 3.3c-d, the larger errors made by the NN are located in areas of the boundary, especially around the initial condition where no training points are available. The PINN has only a maximal error of 1.95% although there are no training points in that region. Also, compared to the PINN the NN makes larger errors in the inner domain. In Figures 3.3e-f, representing 40 utilized training points, the error plots of PINN and NN have a similar ratio between large and smaller errors. Comparing size 80 (Fig. 3.3g-h) to size 100 (Fig. 3.3i-j), one can see in the case of the NN that for size 100 larger errors are made than for size
Figure 3.3: Predicted solutions of PINNs and NNs trained with sizes in {15, 30, 40}. The black markers represent the used training points. Top: Prediction. Bottom: Relative L∞ error. (a)-(d): The NN makes larger errors than the PINN; significant errors are made at the boundaries. (e)-(f): Both make similar errors, mainly at the boundaries.
Figure 3.3: Predicted solutions of PINNs and NNs trained with sizes in {80, 100, 160}. The black markers represent the used training points. Top: Predicted solution. Bottom: Relative L∞ error. (g)-(j): The NN makes larger errors than the PINN; errors are mainly made at the boundaries. (k)-(l): Both make similar errors; minor errors occur at the boundaries.
80. This can be seen by the increase of the error at the boundary in positive t-direction. In contrast to the NN, the PINNs trained with 80 and 100 training points, respectively, have a similar error distribution with a maximal error of 0.98% for size 80 and 0.45% for size 100. In the case of size 160 (Fig. 3.3k-l), the PINN and NN behave similarly in terms of the amount of errors. Both make minor errors at the boundaries, although the locations of the errors differ. Also, the inner domain of the PINN has a smoother error distribution than that of the NN, which shows isolated error spots.
Table 3.2: Predictive errors of PINNs and NNs for various numbers of training samples. The left column corresponds to the size of the training dataset; the maximal relative L∞ errors with their locations are displayed.
NN PINN
3.1.2 Extrapolation
In terms of extrapolation we depict the models trained with sizes in {10, 15, 40, 80, 100} in Figure 3.4. Generally, it can be said that the error increases in positive t-direction. Additionally, we list the errors of the extrapolation performed with all regarded models in Table 3.3. In the case of size 10 (Fig. 3.4a-b), the PINN achieves better extrapolation results than the NN. The error of the NN increases up to 81.48%. In the case of size 15 (Fig. 3.4c-d), both make a similar amount of errors. In Figures 3.4e-f it can be seen that the NN extrapolates better than the PINN. Here, the NN has a maximal error of 47.12% compared to the PINN's error of 57.09%. Looking at size 80 (Fig. 3.4g-h), the PINN achieves better
Figure 3.4: Extrapolated solutions of PINNs and NNs trained with sizes in {10, 15}. The models were trained for t ∈ [0, 1]. Top: Exact solution. Middle: Predicted solution. Bottom: Relative L∞ error. (a)-(b): The NN makes larger errors than the PINN. (c)-(d): Both make a similar amount of errors.
Figure 3.4: Extrapolated solutions of PINNs and NNs trained with sizes in {40, 80}. The models were trained for t ∈ [0, 1]. Top: Exact solution. Middle: Predicted solution. Bottom: Relative L∞ error. (e)-(f): The PINN makes larger errors than the NN. (g)-(h): The NN makes larger errors than the PINN.
Figure 3.5: Relative L2 error for extrapolation performed with the models trained with different sizes of the training dataset.
extrapolation results with only a small maximal error of 8.64%. The error of the NN increases up to 31.7%. In the case of size 100 (Fig. 3.4i-j), the NN is superior to the PINN: the maximal error of the PINN is 59.23%, whereas the NN has only
an error of 15.37%.
In Figure 3.5 we plot the relative L2 error over the number of training samples. One can see great fluctuations of the error for growing size of the training dataset. Thus, there is no specific rule that can be deduced under which assumptions a NN or a PINN extrapolates better.
Table 3.3: Predictive errors of extrapolation performed with PINN and NN models trained for various numbers of training samples. The left column corresponds to the size of the training dataset; the relative L∞ errors with their locations are depicted.
NN PINN
Within this section we validate our PyTorch implementation against the library DeepXDE for the fixed boundary and initial condition. Although DeepXDE provides the ability to use different backends, it fails to generate results with the backend PyTorch since relevant functions have not been implemented yet. Thus, for the following results we use the default backend TensorFlow 1.x. The network architecture is the same as the one used for the comparison of PINN and NN before.
In Figure 3.6 the comparison of the predicted solutions is pictured. One can see that smaller errors are achieved by DeepXDE, with a maximal error of 0.13% at (x, t)T = (π, 1)T. Our self-made implementation has a maximal error of 0.42%, also at (x, t)T = (π, 1)T. Looking at Table 3.4 it can be seen that DeepXDE is about
two times faster than the self-made implementation. The relative L2 errors of both implementations are also listed in Table 3.4.
Table 3.4: Comparison of the training process of a PINN using DeepXDE and PyTorch.
Both were trained for 20000 epochs on the same train and test data with the Adam
optimizer and a learning rate of 0.0001.
train loss test loss time rel. L2 error
3.2 Fixed Initial and Various Boundary Conditions
We now address the generalization task which is the core of this thesis. For the moment we look at models trained with a fixed initial condition and boundary conditions with values in M := {0, 0.2, 0.4, 0.6, 0.8, 1}. Within this section, we also include investigations on the behavior of the training process when the PDE residual contains an additional nonlinear term.
The loss curves of models trained with sine as initial condition and training set sizes in {20, 40, 60} are presented in Figure 3.7. For size 20 one can see a large gap between train and test loss. This indicates a poor regularization and yields a model that does not generalize well. Comparing size 40 to size 60, there is no significant difference between the test loss curves. In Table 3.5 we display the runtime of
Figure 3.7: Plot of train and test losses of models trained with sine as IC and boundary
values in {0, 0.2, 0.4, 0.6, 0.8, 1}. The models were trained for sizes in {20, 40, 60} and 60000
epochs with Adam as optimizer and a learning rate of 0.0001.
each model and the train and test loss values of the respective best model, i.e. the
model at that training step which achieves the best test loss. For the given problem
setting the best test loss is achieved by size 60. The runtime increases linearly with
Table 3.5: Runtime of the models with sine as IC trained for 60000 epochs with sizes in
{20, 40, 60}. Additionally, the train and test loss values of the best models are depicted.
size runtime train loss test loss
Table 3.6: Predictive errors of the model trained for solving the one-dimensional heat
equation with sine as IC, boundary values in {0, 0.2, 0.4, 0.6, 0.8, 1} and size 60. The two
left columns represent the left (l) and right (r) boundary values, respectively. Additionally,
the maximal relative L∞ error with locations and the overall relative L2 error are depicted.
l r (x, t)T max. error rel. L2 error
growing size of the training dataset. For size 60 the training process accordingly has the longest runtime (cf. Table 3.5).
We look at the results obtained by the model trained with size 60. We predict the solutions of the one-dimensional heat equation with BCs in [0, 1] and stepsize 0.1, resulting in 121 cases. Some examples of predicted solutions with their relative L∞ error are pictured in Figure 3.8. In Table 3.6, we additionally show the maximal relative L∞ errors with their locations, as well as the relative L2 error. For better readability, in the following we refer by maximal error to the maximal relative L∞ error and by overall error to the relative L2 error. By l and r we denote the left and right boundary values, respectively.
For all cases we have only a small maximal error of less than 1%. The largest maximal errors, and similar ones, can be observed for the cases l = 0.2, r = 0.0 and l = 0.4, r = 0.0, which are part of the training data. This can be seen in Figures 3.8a, c and Table 3.6. It occurs that for fixed l ∈ {0.2, 0.3, 0.4, 0.5} the overall error is largest for r = 0.0 and then decreases for growing r, as indicated by Figures 3.8c-f. This behavior changes for l ∈ {0.6, 0.7, 0.8, 0.9}: here, the overall error increases up to a certain point and then starts decreasing again, which is shown in Figures 3.8h-j. The smallest relative L2 error of 1.47 × 10−3 is achieved by the case l = 1.0, r = 0.9 (see Fig. 3.8l), which is not part of the training data.
Figure 3.8: Predicted solutions of a PINN trained with sine as IC and 60 training points for various left (l) and right (r) boundary values. The black markers represent the used training points. Cases where no markers are visible are not part of the training data. Top: Prediction. Bottom: Relative L∞ error. (a)-(c): Minor errors at around x = π/2 near the initial condition; the error vanishes in positive t-direction. (c)-(f): The error vanishes for fixed l and growing r.
Figure 3.8: Predicted solutions of a PINN trained with sine as IC and 60 training points for various left (l) and right (r) boundary values. The black markers represent the used training points. Cases where no markers are visible are not part of the training data. Top: Prediction. Bottom: Relative L∞ error. For all cases one can only see minor errors smaller than 0.6%. (g): Minor errors (up to 0.57%) near the initial condition. (h)-(j): The overall error decreases for fixed l and growing r. (l): Case of the smallest relative L2 error (1.47 × 10−3).
Line as Initial Condition with Discontinuous Boundaries
In Figure 3.9 the train and test losses of the models trained with sizes in {40, 60, 80} are pictured. As initial condition we choose the line u0 (x) = −x + 1, where the boundary has varying values in M. For sizes 60 and 80 the loss curves show an overfitting of the models, which is indicated by the large gap between the train and test losses, respectively. For size 60 this gap is larger. Here, the best model is already achieved at epoch 8700. The corresponding loss values and the runtimes are depicted in Table 3.7. For this setting we do not consider a training beyond 20000 epochs since this amount is already sufficient for the test loss curves to converge, as can be seen in Figure 3.9. For a size of 80 the runtime of the training process is 7.2 hours.
Figure 3.9: Train and test losses of models trained with the line u0 (x) = −x + 1 as IC and
boundary values in {0, 0.2, 0.4, 0.6, 0.8, 1}. The models were trained for sizes in {40, 60, 80}
and 20000 epochs with Adam as optimizer and a learning rate of 0.0001.
We consider results obtained by the model trained with size 80. As in the section
before, we predict the solutions of the one-dimensional heat equation with BCs in
[0, 1] and stepsize 0.1 resulting in 121 cases. In Figure 3.10, some cases of predicted
solutions with the respective relative L∞ error are pictured. The corresponding
maximal errors with their locations as well as the relative L2 errors can be found in
Table 3.8.
For all cases one can see a large error in those areas where the discontinuities occur.
Figure 3.10: Predicted solutions of a PINN trained with the line u0 (x) = −x + 1 as IC
and 80 training points for various left (l) and right (r) boundary values. The black markers
represent the used training points. Cases where no markers are visible are not part of the
training data. Top : Exact solution. Middle : Prediction. Bottom : Relative L∞ error.
(a): Largest maximal error (63.13%) and largest overall error (1.28 × 10−1 ) of all cases.
(b)-(d): Large errors (up to 40.98%) near the boundaries where the discontinuities occur.
For growing r the error areas near the initial condition decrease.
Figure 3.10: Predicted solutions of a PINN trained with the line u0 (x) = −x + 1 as IC
and 80 training points for various left (l) and right (r) boundary values. The black markers
represent the used training points. Cases where no markers are visible are not part of the
training data. Top : Exact solution. Middle : Prediction. Bottom : Relative L∞ error.
(i)-(l): Errors (up to 9.62%) are made near the boundaries where the discontinuities occur.
For growing r the error areas decrease or vanish.
Figure 3.11: Line plots of exact and predicted solutions of the one-dimensional heat
equation with the line u0 (x) = −x + 1 as IC and discontinuous boundaries depicted at
time steps t = 0, 0.1, 0.2. (a)-(b): For t = 0 the prediction becomes negative at boundary
x = 1. (c)-(f): The larger the difference between the line and BC, the larger the error at
the corresponding boundary.
Table 3.7: Runtime of the models trained with the line u0 (x) = −x + 1 as IC and sizes
in {40, 60, 80} for 20000 epochs. Additionally, the train and test loss values of the best
models are depicted.
size runtime train loss test loss
Table 3.8: Predictive errors of the model trained for solving the heat equation ut = λuxx
with the line u0 (x) = −x + 1 as IC, boundary values in {0, 0.2, 0.4, 0.6, 0.8, 1} and a
training size of 80. The two left columns represent the left (l) and right (r) boundary
values, respectively. Additionally, the maximal relative L∞ error with locations and the
overall relative L2 error are depicted.
l     r     (x, t)^T        max. error   rel. L2 error
0.0   0.0   (0, 0.01)^T     63.13%       1.28e-01
The largest maximal error of 63.13% is obtained by case l = 0.0, r = 0.0, depicted in
Figure 3.10a. Depending on the boundary condition, the error propagates further
in positive t-direction. For a larger difference between the boundary values and the
line, or in cases where the right boundary, i.e. x = 1, tends to zero, this propagation
is more distinct. Note that for most cases, the maximal error lies in other areas than
in the case l = 0.0, r = 0.0. In addition, this case makes an overall error of 0.128,
which is also the largest compared to the other cases. The prediction with the lowest
relative L2 error (7.58 × 10−3) is obtained by case l = 0.9, r = 0.3 (Fig. 3.10l).
In Figure 3.11 we additionally display for some cases how the model tries to fit
the BC at t = 0. Especially when comparing 3.11c and 3.11d, one can see that the
larger the difference between the line and the boundary value, the larger the L∞ error;
this is also reflected in the values for fixed l = 0.9 or r = 0.2 in Table 3.8. An exception
is provided by the cases where r = 0.0, i.e. u(1, 0) = 0: although the difference tends
to zero here, the prediction becomes negative in these cases, leading to a larger error.
We investigate the heat equation with an additional nonlinear term, i.e. ut = λuxx −
σu4 . Here, we consider models trained with sine as initial condition and boundary
values in M. The sizes of the training datasets are in {30, 40, 60}. In Figure 3.12
one can see the train and test losses of the considered models. For all cases the
loss curves remain within the same order of magnitude. However, for epochs larger
than 40000 the curves start to diverge slightly from each other. The best model is
achieved for size 60. In Table 3.9 the runtimes of the models are displayed. The
duration of the training process for size 60 is 19.6 h, which is comparable to the
runtime per epoch observed in the previous setting.
Figure 3.12: Train and test losses of models trained to solve the nonlinear heat equation
ut = λuxx − σu4 with sine as IC and boundary values in {0, 0.2, 0.4, 0.6, 0.8, 1}. The
models were trained for sizes in {30, 40, 60} and 60000 epochs with Adam as optimizer and
a learning rate of 0.0001.
Table 3.9: Runtime of the models trained with an additional nonlinear term with sizes
in {30, 40, 60} for 60000 epochs. Additionally, the train and test loss values of the best
models are depicted.
size runtime train loss test loss
We consider the results obtained by the model trained with size 60. As
before, we predict the solution for boundary values in [0, 1] with step size 0.1. In
Figure 3.13 we have displayed some examples of the 121 cases. We further show
the corresponding overall and maximal errors with the respective locations in Table
3.10.
Table 3.10: Predictive errors of the model trained for solving the nonlinear heat equation
ut = λuxx − σu4 with sine as IC, boundary values in {0, 0.2, 0.4, 0.6, 0.8, 1} and size 60.
The two left columns represent the left (l) and right (r) boundary values, respectively.
Additionally, the maximal relative L∞ error with locations and the overall relative L2
error are depicted.
l     r     (x, t)^T         max. error   rel. L2 error
0.1   0.0   (2.79, 0.01)^T   1.78%        1.24e-02
It appears that for fixed l ∈ {0, 0.1, 0.2, 0.3, 0.4} and r = 0.0, the prediction shows
a larger error close to the boundaries compared to other regions of the domain. For
growing r this error decreases. Instead, the error concentrates on the area around
(x, t)^T = (π/2, 0.05)^T, as can be seen in Figures 3.13a-d. Figure 3.13f shows that this
pattern is only barely present for l = 0.5. Here, one can already see the concentration
of the error on the aforementioned area for r = 0.0, with a maximal error of 1.24%.
For a fixed l > 0.5 and r = 0.0, the relative L∞ error is largest in the same area
as mentioned before. For growing r this area increases, which is shown by Figures
3.13g-j. The largest relative L2 error of 2.25 × 10−2 is obtained by case l = 1.0,
r = 1.0. The smallest overall error of 8.28 × 10−3 is achieved by the case l = 0.3,
r = 0.1, shown in Figure 3.13e.
In Figure 3.14 we show, for various cases, line plots of the exact and
predicted solutions at time levels t ∈ {0, 0.1, 0.2}. The functions shown in Figures
3.14a-b assume the shape of a parabola. In these cases the errors are mainly made
Figure 3.13: Predicted solutions of a PINN trained for solving the nonlinear heat equation
ut = λuxx − σu4 with sine as IC and 60 training points for various left (l) and right (r)
boundary values. All depicted cases represent unseen data. Top : Prediction. Bottom :
Relative L∞ error. (a): Large errors (up to 1.78%) at the boundaries. (b)-(d): For
growing r the error translocates to the increasing area around (x, t)^T = (π/2, 0.05)^T. (e):
Case with smallest relative L2 error of 8.29e-03. (f): Errors mainly near boundary x = π
and the area around (x, t)^T = (π/2, 0.05)^T.
Figure 3.13: Predicted solutions of a PINN trained for solving the nonlinear heat equation
ut = λuxx − σu4 with sine as IC and 60 training points for various left (l) and right (r)
boundary values. The black markers represent the used training points. Cases where no
markers are visible are not part of the training data. Top : Prediction. Bottom : Relative
L∞ error. (g)-(j): Large error in the area around (x, t)^T = (π/2, 0.05)^T. Increasing area of
error for growing r. (k): Case with largest relative L∞ error (5.12%) made at (0, 0)^T. (l):
Case with largest relative L2 error of 2.25 × 10−2.
Figure 3.14: Line plots of exact and predicted solutions of the nonlinear heat equation
ut = λuxx − σu4 with sine as IC, boundary values in {0, 0.2, 0.4, 0.6, 0.8, 1} and size 60
depicted at time steps t = 0, 0.1, 0.2. One can see that the more peaked the indentations
at the boundaries, the larger the error at the vertex of the parabola-shaped functions.
(a)-(b): Minor errors of less than 1% around x = π/2 for t ∈ {0.1, 0.2}. (c)-(d): Larger
errors of about 3% around x = π/2 for t ∈ {0.1, 0.2}. (e): Large deviation between exact and
predicted solution at (0, 0)^T. (f): Large error around x = π/2.
at the vertex of the parabola. Note that these are only minor errors, as mentioned
above. In the cases shown in Figures 3.14c-f, the shape of the parabola is still
recognizable for t > 0. However, indentations are visible at the boundaries. For
3.14d,f these indentations are at both boundary points, and in 3.14c,e they appear
especially at the left boundary, i.e. x = 0. Note that in the case l = 1.0, r = 0.0
the prediction already tries to simulate this indent at (0, 0)^T, resulting in the largest
maximal error of all cases (see Fig. 3.14e). Regarding these indents one can see that
the more peaked the indentations, the greater the error between the exact and
predicted solutions at the vertex.
3.3 Various Boundary and Initial Conditions
We present the results obtained by training a PINN such that it is able to solve
the one-dimensional heat equation ut = λuxx for arbitrary initial and boundary
conditions. Here, we trained the models with several initial conditions represented
by six of their corresponding Fourier coefficients and sizes in {40, 60, 80}. In Figure
3.15 we have plotted the losses of the considered models. One can see that all
Figure 3.15: Train and test losses of models trained with several initial functions parame-
terized by six Fourier coefficients and boundary values given implicitly by the corresponding
initial functions. The models were trained for sizes in {40, 60, 80} and 60000 epochs with
Adam as optimizer and a learning rate of 0.0001.
curves still have the potential to decrease further. The train and test curves of each
considered size run close to each other. It appears that the test curves of the
models trained with sizes 60 and 80 almost lie on top of each other. The best model is
achieved by size 80. In Table 3.11 we display the runtimes of the considered
Table 3.11: Runtime of the models trained with several initial functions parameterized
by Fourier coefficients and sizes in {40, 60, 80} for 60000 epochs. Additionally, the train
and test loss values of the best models are depicted.
size runtime train loss test loss
models. This setting required a tremendous amount of time for training. The model
trained with 80 points per function required about 19.5 days for 60000 epochs.
In the following we present the results achieved by the model trained with size 80.
In this setting we consider the training and test data separately. Note that the
numbering of the functions is independent for training and test data, i.e. No. 1 from
the training data does not correspond to No. 1 from the test data.
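To make the setup more tangible, the way an initial function can enter the generalized model is sketched below. This is only an illustration under assumptions that are not fixed by the text above: a sine series on [0, π] is assumed, the six coefficients are appended to the space-time coordinates as additional network inputs, and the helper names are hypothetical.

import numpy as np

def fourier_sine_coefficients(u0, n_coeffs=6, a=0.0, b=np.pi, n_quad=1000):
    """First n_coeffs sine-series coefficients of u0 on [a, b] (assumed convention)."""
    x = np.linspace(a, b, n_quad)
    L = b - a
    coeffs = []
    for k in range(1, n_coeffs + 1):
        integrand = u0(x) * np.sin(k * np.pi * (x - a) / L)
        coeffs.append(2.0 / L * np.trapz(integrand, x))
    return np.array(coeffs)

# Example initial function and its six-coefficient representation.
u0 = lambda x: np.sin(x) + 0.3 * np.sin(3 * x)
c = fourier_sine_coefficients(u0)            # shape (6,), here approx. (1, 0, 0.3, 0, 0, 0)

# A single input of the generalized PINN is then assumed to be the space-time
# point augmented by the six coefficients: (x, t, c1, ..., c6).
x_pt, t_pt = 1.2, 0.1
network_input = np.concatenate(([x_pt, t_pt], c))   # shape (8,)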
Training Data
In Figures 3.17 and 3.18 we have displayed some of the 2065 initial functions we
have used to train the model. In the former we collect some functions which the
model managed to learn, in the sense that the prediction fits the shape of the exact
solution, and in the latter we exemplarily show cases where the model fails to predict
the correct solution. In Table 3.12 we depict the errors for some of the functions.
Table 3.12: Predictive errors of the generalized model trained for solving the one-dimensional
heat equation ut = λuxx with several initial functions parameterized by six Fourier
coefficients and size 80. The model was evaluated on training data. The left column represents
the function number. Additionally, the maximal relative L∞ error with locations and the
overall relative L2 error are depicted.
No.   (x, t)^T   max. error   rel. L2 error
69    (0, 0)^T   6.74%        1.03e-02
280   (0, 0)^T   12.12%       1.97e-02
Regarding the functions shown in Figure 3.17, it appears that for odd functions
the prediction at t = 0 fits the exact solution better than for even functions. Larger
errors are made especially at the boundaries here. This can be seen in Figures
3.17a, b, e, h and also in Figures 3.18a-d, where even functions are depicted whose
prediction error is large. Function No. 432 (Fig. 3.18c) even exhibits the largest
overall error of 0.121. The smallest relative L2 error of 3.75 × 10−3 is achieved by
the function shown in Figure 3.17k.
For a better visualization of the model's performance on the training data we show
a histogram in Figure 3.16. About 90% of all 2065 functions achieve a relative L2
error of less than 2%. The remaining functions still achieve an overall error of less
than 10%.
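The summary statistics behind such a histogram can be reproduced with a few lines; note that the error values below are random placeholders rather than the actual results of this thesis.

import numpy as np

rng = np.random.default_rng(0)
# Placeholder for the 2065 relative L2 errors of the training functions.
rel_l2_errors = rng.lognormal(mean=-5.0, sigma=1.0, size=2065)

below_2, below_10 = np.mean(rel_l2_errors < 0.02), np.mean(rel_l2_errors < 0.10)
print(f"< 2%: {below_2:.1%}   < 10%: {below_10:.1%}")

# Frequency distribution as used for the histogram (bin edges chosen for illustration).
counts, edges = np.histogram(rel_l2_errors,
                             bins=np.array([0, 0.005, 0.01, 0.02, 0.05, 0.1, 1.0]))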
Figure 3.16: Histogram representing the frequency distribution of the (relative L2 ) error.
Considered are the predictions of the model evaluated at the 2065 functions that were used
as initial conditions for training.
Figure 3.17: Line plots of exact and predicted solutions of the model trained for solving
the one-dimensional heat equation with various ICs parameterized by six Fourier coefficients
and size 80. Depicted are functions at time levels t = 0, 0.1, 0.2 which were used
for training. One can see relatively small prediction errors. (a),(b),(e): Even functions
provide larger errors, especially at the boundaries.
Figure 3.17: Line plots of exact and predicted solutions of the model trained for solving
the one-dimensional heat equation with various ICs parameterized by six Fourier coefficients
and size 80. Depicted are functions at time levels t = 0, 0.1, 0.2 which were used for
training. (g)-(j): Even functions make larger errors than odd functions. (k): Case with
the smallest relative L2 error (3.75 × 10−3).
Figure 3.18: Line plots of exact and predicted solutions of the model trained for solving
the one-dimensional heat equation with various ICs parameterized by six Fourier coefficients
and size 80. Depicted are functions at time levels t = 0, 0.1, 0.2 which were used for
training. One can see large prediction errors.
Test Data
We now illustrate the performance of the model on unseen data. The test data
contains functions that are similar to the functions contained in the training data
except that they have a different scaling. As in the case of the training data, we show
well-fitted test functions and functions with a large prediction error separately.
In Figure 3.19 some of the functions which show a low prediction error are depicted.
As already observed for the training data, smaller prediction errors are made for
odd functions compared to even ones. The best prediction results are achieved by
functions No. 289 and 391 (see Figs. 3.19b, a and Tab. 3.13). No. 391 has an overall
error of only 4.24 × 10−3, whereas No. 289 has the smallest relative L2 error of
3.75 × 10−3. Note that this equals the smallest overall error achieved on the training
data. In Figure 3.20 one can see some of the functions with a poor prediction result.
Particularly function No. 140 (see Fig. 3.20d) has a large prediction error with an
L2 error of 264%. Unlike the previously presented functions, No. 122 and 98 (see
Figs. 3.20a, b) have their maximal error not near the initial condition. For function
No. 122 it is near t = 0.67 with a maximal error of 86.11%.
Table 3.13: Predictive errors of the generalized model trained for solving the one-dimensional
heat equation ut = λuxx with several initial functions parameterized by six Fourier
coefficients and size 80. The model was evaluated on test data. The left column represents
the function number. Additionally, the maximal relative L∞ error with locations and the
overall relative L2 error are depicted.
No.   (x, t)^T   max. error   rel. L2 error
Figure 3.19: Line plots of exact and predicted solutions of the model trained for solving
the one-dimensional heat equation with various ICs parameterized by six Fourier coefficients
and size 80. The model was evaluated on test data. Depicted are functions at time
levels t = 0, 0.1, 0.2 whose prediction is close to the exact solution.
Figure 3.20: Line plots of exact and predicted solutions of the model trained for solving
the one-dimensional heat equation with various ICs parameterized by six Fourier coefficients
and size 80. The model was evaluated on test data. One can see large prediction
errors. (a)-(b): Functions depicted at time levels t = 0, 0.5, 1 which have a large error in
areas different from t = 0. (c)-(f): Functions depicted at time levels t = 0, 0.1, 0.2.
For the test data, we also provide a histogram, depicted in Figure 3.21, which shows
the number of functions having a certain relative L2 error. Here, about 75% of the
functions show an error of less than 2%. However, more than 12.5% of all functions
show an error larger than 10%, and 1.5% of all functions even have an error larger
than 100%.
Figure 3.21: Histogram representing the frequency distribution of the (relative L2 ) error.
Shown are the prediction errors of the model evaluated at 392 test functions.
4 Discussion
The main purpose of this thesis is to investigate the question whether PINNs can be
generalized such that they are able to solve several PDE problems, i.e. to solve a
PDE with various boundary and initial conditions. The obtained results permit a
partial answer to this rather general question. Besides, separate from the generalization
question, we also compare a PINN with a standard NN. To this end, we evaluate and
interpret the key findings of the previously presented results by answering these two
main questions through six separate subquestions.
For size 15, the large gap between the train and test losses of the NN indicates
a poor generalization. Here, the smaller gap for the PINN indicates that better
regularization has been achieved than for the NN. This was to be expected since
the PINN loss contains the PDE residual which acts as a regularization mechanism,
as mentioned in section 2.2.1 and in [4]. The corresponding solution plots (see Fig.
3.3a-b) and errors depicted in Tables 3.1 and 3.2 allow the conclusion that for the
considered problem already 15 training points are sufficient for a PINN to learn the
problem. This is justified by the fact that the prediction was performed
on a 100 × 100 mesh, i.e. 10000 data points, of which only 15 were used for
training, and nevertheless a small (relative L2) error of 5.7 × 10−3 can be achieved.
It should be noted that no boundary or initial data was used for training, which
causes the NN to make large errors at the boundaries (up to ≈ 11%). Still, the PINN
makes errors of only about 2% there.
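To make the role of the PDE residual concrete, the following is a minimal sketch of how the residual of ut = λuxx − σu4 can be evaluated with automatic differentiation in TensorFlow. It is not the implementation used for the experiments; the network architecture and the values of λ and σ are placeholder assumptions (σ = 0 yields the linear heat equation).

import tensorflow as tf

lam, sigma = 0.4, 0.0  # assumed parameters; sigma > 0 adds the nonlinear term -sigma*u^4

net = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(20, activation="tanh"),
    tf.keras.layers.Dense(20, activation="tanh"),
    tf.keras.layers.Dense(1),
])

def pde_residual(x, t):
    """Residual u_t - lam*u_xx + sigma*u^4 evaluated via automatic differentiation."""
    with tf.GradientTape() as tape2:
        tape2.watch(x)
        with tf.GradientTape(persistent=True) as tape1:
            tape1.watch([x, t])
            u = net(tf.concat([x, t], axis=1))
        u_x = tape1.gradient(u, x)
        u_t = tape1.gradient(u, t)
    u_xx = tape2.gradient(u_x, x)
    return u_t - lam * u_xx + sigma * u**4

# The PDE part of the PINN loss is the mean squared residual at collocation points,
# penalizing predictions that violate the PDE even where no ground truth exists.
x_c = tf.random.uniform((128, 1), 0.0, 1.0)
t_c = tf.random.uniform((128, 1), 0.0, 1.0)
loss_pde = tf.reduce_mean(tf.square(pde_residual(x_c, t_c)))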
We considered the one-dimensional heat equation, a linear PDE, with sine as IC and zero
Dirichlet BCs. From the mathematical point of view, this can be regarded
as an "easily" solvable problem due to the linearity of the PDE and since the initial
condition is smooth and infinitely often differentiable. Also, the boundary conditions
are simple compared to Neumann or Robin BCs, which require derivative evaluations
themselves. It can be assumed that for more complex problems the required number
of training points increases. In such cases it should be considered to put more training
points near a sharp front which is characteristic for the respective PDE, such that the
model is able to learn the discontinuity well. Also, Raissi, Perdikaris, and Karniadakis
make a remark in [4] on the choice of "the total number of collocation points".
At first glance, Figures 3.4a, b suggest that PINNs extrapolate better than NNs due to the
smaller error and the fact that the shape of the sine is mostly maintained in case of
the PINN. However, this suggestion is not generally valid. This becomes clear for
increasing size of the training dataset, especially when looking at Figure 3.5, where
the relative L2 error is depicted for all considered sizes. The zigzag pattern
shows that neither the NN nor the PINN performs consistently better in terms of
extrapolation, such that no valid answer can be given here. This was to be expected,
however, since in general NNs do not boast proper extrapolation capabilities [40],
and PINNs share this property as they are NNs at their core.
Regarding the difference between the PINN and the NN for size 15, shown by the losses,
solution plots, and the corresponding errors: in this case, the PINN is superior to the NN
due to the observed better generalization. This observation can also be made for size 30
(see Fig. 3.2b), where a still relatively large distance between the PINN and NN test
losses is visible. Also, the solution plots (see Fig. 3.3c-d) and errors (Tabs. 3.1, 3.2)
lead to a similar conclusion as for size 15. However, from size 40 on, one can see that
the PINN and the NN perform similarly. This is indicated by the losses (see Fig. 3.2c)
and underpinned by the corresponding values (test loss, rel. L2 error, maximal rel. L∞
error), which only differ slightly from each other. For the considered problem (PDE),
a size of 40 training samples represents the threshold from which the PINN provides no
clear advantage over the NN anymore. Although the solution plots and error values for
sizes larger than 30 are more promising in case of PINNs, this comes at the cost of a
longer runtime: in case of 40 training points, the training process of the NN is about
three times faster.
This can be tied to the fact that for PINNs the gradient has to be evaluated in each
iteration. Figure 3.1b displays that for PINNs the computational time increases
linearly with the growing size of the training dataset, whereas the NN shows
no significant increase in time for sizes up to 200. This leads to the conclusion
that PINNs are superior to NNs for small sizes of the training dataset, where the
term "small" has to be seen relative to the considered problem. This statement is
also confirmed by Raissi, Perdikaris, and Karniadakis [4], who mention that
one of the key properties of PINNs is that they are able to achieve good prediction
results even when only a small amount of data is available, where standard NNs
typically struggle.
The results presented in section 3.2 indicate, with the aid of three cases, that if the
initial condition is fixed, it is indeed possible to generalize a PINN to solve the
one-dimensional heat equation for various BCs. In the following, we explain this
statement on the basis of these cases in more detail.
For Case 1 we come to the conclusion that for sine as IC it is possible to generalize a
PINN when varying the boundary conditions. This is due to the fact that for all possible
boundary values only minor errors are made, especially for cases where no training data
was used at all, e.g. l = 1.0, r = 0.9, as seen in Figure 3.8l. This can also be verified by
looking at the relative L2 errors (see Tab. 3.6), which are all of order 10−3, and the
maximal L∞ errors, which for all cases are less than 1%. This behavior fits the expectation
made beforehand, regarding the fact that NNs have proper interpolation capabilities
in general [40]. The boundary values for which training data was used are in
{0, 0.2, . . . , 1}. For future work, it would be interesting to consider larger distances
between the values, e.g. {0, 0.5, 1}, and look at the evolution of the resulting
generalization errors.
In the second case, with the line as IC, the largest errors occur in the region around
the IC. This indicates that too few training points have been selected in the areas where
the discontinuities occur and thus the PINN is not able to learn the discontinuity well.
It can be assumed that adding more points in those areas can improve the performance,
since it has already been shown with the RAR method that placing additional training
points in such areas increases the accuracy. A possible reason might also be the fact
that no form of discontinuity appears in the architecture of the network: the hyperbolic
tangent, which serves as activation function, is continuous and infinitely often
differentiable. In future work, one could investigate if this behavior changes when using
an activation function which is not differentiable at some point, e.g. the ReLU function,
which is not differentiable at x = 0. However, although the large errors near the
boundaries lead to a larger overall error, the relative L2 error stays within a reasonable
order, such that we can still speak of a successful generalization in this case.
The only difference between these two cases (Case 1 and Case 3) is the nonlinear term.
Thus, we can appropriately expose the influence of the nonlinearity on the model.
Comparing the solution plots of Case 1 (see Fig. 3.8) and Case 3 (see Fig. 3.13), one can
see that the nonlinearity causes larger errors in general, especially near the initial
condition. A reason for that could be the fact that in these areas the graph of the
function changes strongly, as can be seen in Figures 3.14i,j,l. This means that in these
areas the gradient is steep and therefore more training points should be placed at these
locations, which is not the case, as pictured in Figures 3.13i,l where the used training
points are displayed. This was already mentioned in question 1, where we emphasized
that for areas with steep gradients more training points are required. However, this can
only be done if the shape of the function is known in advance, which is generally not
the case in practical applications. Nevertheless, the PINN provides good prediction
accuracy (see Tab. 3.10), such that it can be stated that a generalization is also possible
for this setting.
Looking at the line plots and the corresponding errors, it seems that odd functions are
easier for the model to learn than even functions. In particular, Table 3.12 shows that on
the training data the relative L2 error is at least one order of magnitude higher for even
functions than for odd ones. This behavior can be observed on the test data as well.
However, this is not a statement that holds in general. Looking for example at Figure
3.20f, one can see an odd function which shows large discrepancies between the
prediction and the exact solution. Since a similar function contained in the test data
(Fig. 3.19d) achieved an L2 error of 1%, the model architecture seems to be sufficient
for the model to learn this kind of function. Thus, it can be assumed that further
training can improve the prediction accuracy in this case. Additionally, it appears that
very large errors are made for functions whose shape seems to be "easy", e.g. the shape
of a parabola (see Fig. 3.20a). Specifically, function No. 140 (Fig. 3.20d) shows a
prediction result which is not even close to optimal, providing an L2 error of 264%,
although a similar function contained in the training data (see Fig. 3.17i) has been
learned well. A possible reason for that could be the frequency distribution of the
functions within the training data: the training data tend to include more odd functions
than even ones. For possible future work, it should be considered to choose a more
balanced distribution of odd and even functions in the training data.
The histogram on the training data (see Fig. 3.16) shows that at least 90% of all initial
functions achieve an error of less than 2%. Assuming that an error of 2% can be regarded
as a measure of "well-learned", one can say that the model is able to learn these
functions. Looking at the histogram representing the test functions (Fig. 3.21), one can
see that 1.5% of all test functions (6 out of 392) have an L2 error larger than 100%.
Although these few functions are not representative for the overall performance of the
model, this makes the model unreliable. Apart from the fact that further training could
improve the overall prediction accuracy, the obtained results do not allow a conclusive
answer to the question of generalization for various initial conditions.
Nevertheless, there is still room for optimization, e.g. improving the efficiency of the
implementation code or optimizing the training dataset. Still, a duration of several weeks
for the training process cannot be ignored. An important point to mention here is that we
only investigated problems in one dimension. For higher dimensions the runtime will
grow exponentially, which makes this whole process not applicable in practice for
the generalization of PDEs over different instances of initial and boundary conditions,
respectively.
4.1 Alternative Method: Learning Nonlinear Operators
Due to the necessary gradient evaluation in each iteration the training process is
extremely time-consuming. An alternative, purely data-driven approach is to directly
learn the solution operator of the considered equation for several instances of the
equation. Like standard NNs, only data is required, instead of taking the considered
equation into account. The basis is an approximation result which states that a neural
network with a single hidden layer can approximate any nonlinear operator accurately
[41]. Lu, Jin, and Karniadakis present in [42] how to use this result in practice and
propose deep operator networks (DeepONets), a data-driven method "to learn
(nonlinear) operators accurately and efficiently". Consider an operator G mapping a
space of functions V ⊂ C(K1) into C(K2). The network to learn an operator
G : u 7→ G(u) takes an input function u ∈ V and a point y ∈ K2, and outputs a real
number G(u)(y) ∈ R (see Fig. 4.1A). In order to work with the input function u
numerically, it has to be represented discretely, i.e. by its values at sufficient
but finitely many locations {x1 , . . . , xm }, which in [42] are called sensors. The only
requirement they claim is the consistency of the sensors for all input functions u (see
Fig. 4.1B). The DeepONet consists of two subnetworks. The architecture is shown
in Figure 4.1C. There are p branch networks each taking (u(x1 ), u(x2 ), . . . , u(xm ))T
as the input and outputting a scalar bk ∈ R for k = 1, . . . , p. In addition to the
branch networks, there is the trunk network, which takes y as the input and outputs
a vector (t1 , . . . , tp )T ∈ Rp . In order to increase the performance by reducing the
generalization error [42], biases are added to the last layer of each branch network bk
and also to the last stage:
G(u)(y) ≈ Σ_{k=1}^{p} bk tk + b0 .
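To make the combination of branch and trunk outputs concrete, the following NumPy sketch implements the forward pass of an unstacked DeepONet with random, untrained weights. It is an illustration only, not the reference implementation of [42]; all layer sizes and the example input are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
m, p = 100, 10        # number of sensors and of branch/trunk outputs (p of order 10)

def mlp(sizes):
    """Random fully connected network with tanh activations (illustration only)."""
    weights = [(rng.normal(size=(a, b)), np.zeros(b)) for a, b in zip(sizes[:-1], sizes[1:])]
    def forward(z):
        for i, (W, b) in enumerate(weights):
            z = z @ W + b
            if i < len(weights) - 1:
                z = np.tanh(z)
        return z
    return forward

branch = mlp([m, 40, p])   # takes (u(x1), ..., u(xm)) and returns (b1, ..., bp)
trunk = mlp([1, 40, p])    # takes y and returns (t1, ..., tp)
b0 = 0.0                   # bias of the last stage

def deeponet(u_sensors, y):
    bk = branch(u_sensors[None, :])[0]
    tk = trunk(np.array([[y]]))[0]
    return float(bk @ tk + b0)   # G(u)(y) approximated by sum_k bk*tk + b0

# Example: an input function sampled at m fixed sensor locations and a query point y.
x_sensors = np.linspace(0, 1, m)
u_sensors = np.sin(np.pi * x_sensors)
print(deeponet(u_sensors, 0.5))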
The authors note that in practice, p is at least of the order of 10, which results in a
correspondingly large number of branch subnetworks. Instead of training p separate
branch networks, they can also be merged into one single branch network (see Fig. 4.1D),
which outputs a vector (b1 , . . . , bp )T ∈ Rp . The authors refer to the two types of
DeepONets as the stacked (Fig. 4.1C) and the unstacked (Fig. 4.1D) DeepONet.
Figure 4.1: Illustration of the problem setup and architectures of DeepONets. (A): The
network to learn an operator G : u 7→ G(u) takes two inputs (u(x1 ), u(x2 ), . . . , u(xm ))T
and y . (B): Illustration of the training data. For each input function u we require that
we have the same number of evaluations at the same scattered sensors x1 , x2 , . . . , xm .
However, there are no constraints enforced on the number of locations for the evaluation
of output functions. (C): The stacked DeepONet has one trunk network and p stacked
branch networks. (D): The unstacked DeepONet has one trunk network and one branch
network. Source [42].
All versions of DeepONets were applied to several problems and showed that by
introducing this inductive bias DeepONets can achieve "even exponential error
convergence with respect to the training dataset size" [42]. Note that the authors
emphasize that this method achieves good prediction results when only a small amount
of data is available. Thus, it can be promising to use this as an alternative approach
for the generalization question considered in this thesis.
5 Conclusion
We investigated physics-informed neural networks (PINNs), i.e. neural networks
(NNs) which incorporate the considered PDE into the training process. Specifically,
we examined the question whether PINNs can be generalized in order to solve PDE
problems for various instances of initial and boundary conditions.
Since PINNs are NNs which only differ in their training process, we first explained
the method of NNs and elucidated the regularization mechanism which is used during
training. Afterwards, we introduced PINNs and illustrated how a PDE is involved
within the training process. The PINN
loss function is divided into a PDE and a boundary loss. Only the boundary loss
considers ground truth data consisting of boundary and initial data. The PDE loss
contains the considered PDE and evaluates the required derivatives via AD. It acts
as a regularization mechanism such that PINNs can be used when only partial data is
available. In this context we also pointed to the existence of the scientific machine
learning library DeepXDE [16], which can be used to solve PDE problems for fixed
instances of boundary and initial conditions. It is user-friendly and has several features
such as the flexible monitoring of intermediate results via so-called callback functions.
Also, the method of RAR increases the training efficiency by adding more training
points in those areas where steep gradients occur.
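For orientation, a fixed-instance setup in DeepXDE has roughly the following shape. This is a generic sketch based on the library's public examples, not the code used for the experiments of this thesis; λ, the geometry, the network size, and the exact module paths (which differ slightly between DeepXDE versions, e.g. dde.icbc vs. top-level dde for the conditions) are assumptions.

import deepxde as dde
import numpy as np

lam = 0.4  # assumed diffusion coefficient

geom = dde.geometry.Interval(0, np.pi)
timedomain = dde.geometry.TimeDomain(0, 1)
geomtime = dde.geometry.GeometryXTime(geom, timedomain)

def pde(x, u):
    # x[:, 0] is the spatial coordinate, x[:, 1] is time.
    u_t = dde.grad.jacobian(u, x, i=0, j=1)
    u_xx = dde.grad.hessian(u, x, i=0, j=0)
    return u_t - lam * u_xx

bc = dde.icbc.DirichletBC(geomtime, lambda x: 0, lambda x, on_boundary: on_boundary)
ic = dde.icbc.IC(geomtime, lambda x: np.sin(x[:, 0:1]), lambda x, on_initial: on_initial)

data = dde.data.TimePDE(geomtime, pde, [bc, ic],
                        num_domain=40, num_boundary=20, num_initial=10)
net = dde.nn.FNN([2] + [20] * 3 + [1], "tanh", "Glorot normal")
model = dde.Model(data, net)
model.compile("adam", lr=1e-4)
model.train(iterations=20000)  # older DeepXDE versions use epochs=20000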
For comparison, we computed reference solutions of our problem using the software
library FEniCS. FEniCS uses the finite element method, the conventional method to
solve PDEs numerically. The user basically has to provide the considered problem in
weak form, also called the variational formulation.
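For the heat equation, such a variational formulation for one implicit Euler time step could be sketched as follows (legacy FEniCS interface). Mesh resolution, λ, the time step, and the boundary values are arbitrary illustrative choices, not the reference setup of this thesis.

from fenics import *

lam, dt = 0.4, 0.01                      # assumed diffusion coefficient and time step
mesh = IntervalMesh(100, 0.0, 1.0)       # 1D domain [0, 1]
V = FunctionSpace(mesh, "P", 1)

# Dirichlet boundary values l (left) and r (right), chosen arbitrarily here.
bcs = [DirichletBC(V, Constant(0.2), "near(x[0], 0.0)"),
       DirichletBC(V, Constant(0.8), "near(x[0], 1.0)")]

u_n = interpolate(Expression("1.0 - x[0]", degree=1), V)   # line IC u0(x) = -x + 1

u = TrialFunction(V)
v = TestFunction(V)
# Weak (variational) form of one implicit Euler step of u_t = lam * u_xx.
a = u * v * dx + dt * lam * dot(grad(u), grad(v)) * dx
L = u_n * v * dx

u_sol = Function(V)
for n in range(20):                      # march 20 time steps
    solve(a == L, u_sol, bcs)
    u_n.assign(u_sol)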
We used a PINN to solve the one-dimensional heat equation with Dirichlet boundary
conditions. First, we compared a PINN to an NN with the same prerequisites for
solving one instance of the PDE, i.e. for fixed initial and boundary conditions. As it
turned out, PINNs are superior to NNs in terms of accuracy for a small number of
training samples.
However, the training process of a PINN is considerably slower due to the gradient
evaluations in each iteration. This can be a severe bottleneck for higher-dimensional
problems, as the runtime increases exponentially with growing dimension. Regarding
the question of generalization, we approached this task in two steps. First, we fixed
the initial condition and trained the model with various instances of boundary
conditions. Secondly, we trained the model with a large number of initial conditions
and implicitly given boundary conditions. Our numerical results indicate that a PINN
can be generalized in the first setting. Due to the long runtime in the case of various
initial conditions, it is not possible to give a conclusive answer for the second setting.
The training took several weeks and it seems as if further training is still required
because of
the partially poor predictive performance on both training and test data. Still, it
looks promising that PINNs can be generalized in this case, considering the results
for fixed initial conditions and the fact that there are indeed functions in the test data
for which the model achieved good prediction accuracy. However, if many instances
are involved in the training data, one has to take into consideration that this comes
along with a massive amount of gradient evaluations, which slows down the overall
training process. To improve this, one could optimize the selection of training points
in order to reduce the number of gradient evaluations. Another approach for future
work could be adapting the source code of DeepXDE such that it can handle various
instances of initial and boundary data. The implemented RAR method could then
reduce the required amount of training points. Furthermore, it could be worthwhile
to attempt other methods for the generalization of PDEs over various instances, such
as the DeepONets presented in section 4.1, a data-driven concept which also aims to
achieve good prediction results when only a small amount of data is available.
From the machine learning perspective it is rather unintuitive that for certain
parts of the input data no ground truth is actually used or even needed. However,
from the scientific point of view it is promising that a well-known data-driven
concept, the neural network, is augmented in a way that it takes the analytical
representation of the problem into account. PINNs reveal their strengths in settings
where traditional solving methods fail to provide reliable predictions due to a lack
of data, e.g. unknown boundary or initial data. However, for applications where the
training requires a lot of input features or gradient evaluations, our results show that
the runtime quickly becomes prohibitive. Nevertheless, at least for fixed initial
conditions it is possible to use PINNs to solve general PDEs for various BCs.
Bibliography
[1] Alexander Maedche et al. AI-Based Digital Assistants. In: Business & Information
Systems Engineering 61.4 (2019), pp. 535-544. issn: 1867-0202.
tidy up home environments with service robots. In: Advanced Robotics 35.8
[4] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Physics In-
[6] Allan Pinkus. Approximation theory of the MLP model in neural networks.
[7] Yeonjong Shin, Jerome Darbon, and George Em Karniadakis. On the conver-
gence of physics informed neural networks for linear second-order elliptic and
[8] Maziar Raissi et al. Deep learning of vortex-induced vibrations. In: Journal
of Fluid Mechanics 861 (2019), pp. 119-137. doi: 10.1017/jfm.2018.872.
[9] Zhiping Mao, Ameya D. Jagtap, and George Em Karniadakis. Physics-informed
neural networks for high-speed flows. In: Computer Methods in Applied Mechanics
and Engineering 360 (2020), p. 112789. issn: 0045-7825.
[10] Xiaowei Jin et al. NSFnets (Navier-Stokes flow nets): Physics-informed neural
pdf/FEDSM2020/83730/V003T05A054/6575747/v003t05a054- fedsm2020-
20159.pdf. url: https://doi.org/10.1115/FEDSM2020-20159.
[13] Tongsheng Wang et al. Reconstruction of natural convection within an enclosure
using deep neural network. In: International Journal of Heat and Mass Transfer
164 (2021), p. 120626. issn: 0017-9310.
[16] Lu Lu et al. DeepXDE: A deep learning library for solving differential equations.
[19] Shengze Cai et al. Physics-informed neural networks (PINNs) for fluid mechanics.
[20] Liu Yang, Xuhui Meng, and George Em Karniadakis. B-PINNs: Bayesian
physics-informed neural networks for forward and inverse PDE problems with
noisy data. In: Journal of Computational Physics 425 (2021), p. 109913. issn:
0021-9991. doi: https://doi.org/10.1016/j.jcp.2020.109913. url:
https://www.sciencedirect.com/science/article/pii/S0021999120306872.
[21] Mark Craven, Johan Kumlien, et al. Constructing biological knowledge bases
by extracting information from text sources. In: ISMB. Vol. 1999. 1999,
pp. 77-86.
[23] Yann LeCun, Corinna Cortes, and Christopher J.C. Burges. The MNIST Database
of Handwritten Digits. http://yann.lecun.com/exdb/mnist/. 1998.
[24] Niklas Rottmayer. Deep Learning with Neural Networks. Bachelor Thesis.
2019.
tice and Research for Deep Learning. In: CoRR abs/1811.03378 (2018). arXiv:
1811.03378. url: http://arxiv.org/abs/1811.03378.
[26] cloud4science. Notes on Deep Learning and Differential Equations. June 2020.
url: https://cloud4scieng.org/2020/06/10/notes-on-deep-learning-and-differential-equations/.
[27] Ya-xiang Yuan. A new stepsize for the steepest descent method.
gradient-descent-with-momentum-from-scratch/.
[30] Christopher M. Bishop. Neural Networks for Pattern Recognition. Oxford
University Press, 1995.
network methods for boundary value problems with irregular boundaries. In:
[33] Richard H Byrd et al. A limited memory algorithm for bound constrained
optimization. In: SIAM Journal on Scientific Computing 16.5 (1995),
pp. 1190-1208.
[35] Hans Petter Langtangen and Anders Logg. Solving PDEs in Python - The
FEniCS Tutorial I. Springer, 2016.
[36] Bernd Simeon. Numerical PDEs I. Lecture Notes.
[37] Oct. 2021. url: http://www-m15.ma.tum.de/foswiki/pub/M15/Allgemeines/IllposedProblems/WebHome/board4.pdf.
[38] Gabriele Steidl and Ronny Bergmann. Fundamentals of Mathematical Image
Processing. 2011.
[40] Etienne Barnard and LFA Wessels. Extrapolation and interpolation in neural
network classifiers. In: IEEE Control Systems Magazine 12.5 (1992), pp. 50-53.
[41] Tianping Chen and Hong Chen. Universal approximation to nonlinear opera-
tors by neural networks with arbitrary activation functions and its application
[42] Lu Lu, Pengzhan Jin, and George Em Karniadakis. DeepONet: Learning nonlinear operators.