
University of Kaiserslautern

Faculty of Mathematics

Master Thesis

Generalization of PINNs for Various Boundary and Initial Conditions

Violetta Schäfer

supervised by

Dr. Martin Bracke

Prof. Dr. René Pinnau

Dr. rer. nat. Martin Siggel

11 January 2022
Abstract
Physics-informed neural networks (PINNs) are neural networks (NNs) which are used for solving partial differential equations (PDEs). They minimize a loss function which incorporates the given PDE by using automatic differentiation. To the best of our knowledge, PINNs are currently only used for solving one instance of a PDE problem, i.e. fixed initial and boundary conditions. We investigate whether PINNs can be generalized such that they are able to solve PDEs for various instances of initial or boundary conditions, respectively.

To this end, we consider two cases: fixed initial and various boundary conditions, and various initial conditions parameterized by six Fourier coefficients. As application example we choose the one-dimensional heat equation with Dirichlet boundary conditions. We show that for fixed initial conditions, PINNs are able to achieve good prediction accuracy on unseen data. For various initial conditions the generalization question is still open. Due to the gradient evaluations in each iteration, the training process is exceedingly time-consuming. This makes the generalization of PINNs for many instances of the problem infeasible in practice. From the theoretical point of view, we assume that further training and optimization of the code and the training dataset could answer the question in future work.

Zusammenfassung
Physics-informed neural networks (PINNs) are neural networks (NNs) that are used to solve partial differential equations (PDEs). They minimize a loss function that incorporates the given PDE by means of automatic differentiation. To the best of our knowledge, PINNs are currently only used to solve a single instance of a PDE, i.e. fixed initial and boundary conditions. We investigate whether PINNs can be generalized such that they are able to solve PDEs with several instances of initial or boundary conditions, respectively.

To this end, we consider two cases: fixed initial and varying boundary conditions, and varying initial conditions parameterized by six Fourier coefficients. As application example we choose the one-dimensional heat equation with Dirichlet boundary conditions. We show that for fixed initial conditions PINNs are able to achieve good prediction accuracy on unseen data. For varying initial conditions the question of generalization remains open. Due to the evaluation of the gradient in each iteration, the training process is extremely time-consuming. This makes the generalization of PINNs for many instances of the problem infeasible in practice. From a theoretical point of view, we assume that further training and the optimization of the code and the training dataset could answer the question in future work.

Contents
1 Introduction 1

2 Methods 3
2.1 Neural Networks 3
2.1.1 Training 6
2.2 Physics-Informed Neural Networks 10
2.2.1 Training 13
2.2.2 Application Library: DeepXDE 14
2.3 Generating Ground Truth 17
2.3.1 Finite Element Method 17
2.3.2 Numerical PDE Solver: FEniCS 20
2.4 Application to Heat Equation 23
2.4.1 Fourier Analysis 26
2.4.2 Comparison of PINN to NN 29
2.4.3 Generalization of PINNs 30

3 Results 33
3.1 Fixed Boundary and Initial Condition 33
3.1.1 Size of Training Dataset 33
3.1.2 Extrapolation 39
3.1.3 Validation with DeepXDE 43
3.2 Fixed Initial and Various Boundary Conditions 45
3.2.1 Initial Condition: Sine 45
3.2.2 Initial Condition: Line with Discontinuous Boundaries 50
3.2.3 Additional Nonlinear Term: Radiative Loss 56
3.3 Various Boundary and Initial Conditions 62

4 Discussion 73
4.1 Alternative Method: Learning Nonlinear Operators 77

5 Conclusion 81

Bibliography 83
1 Introduction
In recent years, the topic of machine learning has become well-established not only in scientific research, but also in our everyday lives, be it through the various language assistants that are integrated into our everyday gadgets [1] or objects like cars and robots that are able to decide autonomously using artificial intelligence [2, 3]. Scientists of all research areas are developing and enhancing machine learning techniques with tremendous speed, and all of these techniques have one thing in common: data is their foundation.

Having a sufficient amount of data available is crucial for machine learning applications. Neural networks learn only from data, and it is intuitively clear that a lack of data means we probably cannot rely on the predictions of the resulting model. Fortunately, we live in an age where the amount of data and computational resources are not a concern, at least for applications based on, e.g., image recognition or language processing. However, most real-world phenomena are not described by images or words but by physical laws that can be formulated as partial differential equations (PDEs). Unfortunately, it holds that "in the course of analyzing complex physical, biological or engineering systems, the cost of data acquisition is prohibitive, and we are inevitably faced with the challenge of drawing conclusions and making decisions under partial information" [4].

Physics-informed neural networks (PINNs) are neural networks which compensate for this lack of data by respecting the given law of physics that stems from the PDE of the considered problem. They aim to provide a surrogate of the solution by minimizing the corresponding PDE residual. In general, PDEs are solved by numerical methods such as the finite element method. However, these classical methods are mesh-based and use approximate derivatives, which introduce discretization errors. For complex problems, already the mesh generation is time-consuming, not to mention the high computational effort of the whole simulation itself. PINNs provide the ability to solve PDEs completely mesh-free by combining automatic differentiation, which computes gradients at machine precision [5], with the capability of neural networks to serve as universal function approximators [6].


We address this issue by investigating the generalization capabilities of PINNs. Can we train a PINN to solve several reference problems such that it is possible to use this model for different input parameters afterwards, and thus save computational time? Since a fully specified problem always requires boundary and initial conditions, the question is, in other words: can PINNs be trained in a way such that they provide reliable predictions for arbitrary boundary and initial conditions?

Motivated by the question of whether PINNs are able to learn physics at all, Raissi, Perdikaris, and Karniadakis (2017) already presented promising results which indicate that "if the given PDE is well-posed" PINNs are "capable of achieving good prediction accuracy" [4]. A theoretical justification for PINNs is provided by Shin, Darbon, and Karniadakis (2020), who show the consistency of PINNs. In the last three years the repertoire of scientific publications regarding PINNs has been growing rapidly. PINNs have been applied to various fluid mechanics problems [8-11] as well as heat transfer problems [12-15], especially showing the effectiveness of PINNs in solving inverse problems where traditional methods typically fail. Moreover, Lu et al. (2019) developed a user-friendly software tool called DeepXDE [17] which is capable of solving several forward and inverse problems involving differential equations. More recently, it has been shown [18-20] that PINNs can seamlessly integrate multi-fidelity experimental data with the governing equations.

This thesis is organized as follows. First, we study the method of neural networks and look at the concepts that are used during training, such as regularization. Subsequently, we introduce physics-informed neural networks and expose parallels and differences with respect to standard neural networks. Within this section we also describe basic features of the software library DeepXDE. Afterwards, we give a brief introduction to the finite element method and how this method is used in the numerical PDE solver FEniCS, which we utilize to construct our ground truth data. We then describe how we apply PINNs to a practical example considering the one-dimensional heat equation. After that, we present the results obtained by the previously explained methods and discuss them. Finally, we draw conclusions based on our observations.

2 Methods
In this chapter we explain the basic concept of a neural network (NN) and how this concept is augmented such that the resulting model can be called a physics-informed neural network (PINN). Subsequently, we explain the functionality of the numerical PDE solver FEniCS which we use for generating the ground truth data. After that, we present the application example to which these concepts are applied.

2.1 Neural Networks

NNs are widely used in many vision-based application fields such as autonomous driving, extracting information from text files or teaching a robot to tidy up your room [2, 3, 21]. More relevant for our purposes is that they provide a robust approach to approximating nonlinear mappings [6]. "The form of the mapping is governed by adjustable parameters that have to be trained." [22]

The NN requires a dataset that is divided into a training and a test dataset under a certain ratio. The training is performed on the usually large training dataset in order to learn the task. The test dataset is used during training as well, in order to examine whether the model is able to produce proper results on data that is not part of the training data. Additionally, it provides an indication whether the current network architecture is sufficient to learn the task or has to be adjusted. A part of the test data is used to make predictions afterwards. The goal is to acquire a model that is able to perform well on unseen data.

NNs are used either for classification or for regression. In a classification task the NN has to decide to which class a certain input feature belongs. A common example is the classification of handwritten digits, where the well-known MNIST dataset [23] is used. A regression task refers to learning a (nonlinear) function, which is what we address in this thesis.


Network Architecture

The architecture of an artificial neural network is based on the concept of its biological analogue, the brain. An important role is played by the interconnected neurons that are responsible for the exchange of information.

Definition 2.1. (Neuron)
Let d ∈ N. An artificial neuron ν : R^d → R with weight w ∈ R^d, bias b ∈ R and activation function σ : R → R is defined as the map ν(x) = σ(w · x + b) for x ∈ R^d.

Figure 2.1: Model of an artificial neuron taking an input feature of dimension d = 3 (inputs x₁, x₂, x₃ with weights w₁, w₂, w₃, bias b, activation function σ and output ν(x)).

The neurons are arranged in layers. A NN can then be regarded as a parallel

circuit of concatenated neurons as visualized in Figure 2.2.

Definition 2.2. [24] (FNN)
Let L, d, n₁, …, n_L ∈ N and n₀ := d. An L-layer feedforward neural network û : R^{n₀} → R^{n_L} with affine linear maps A_l : R^{n_{l−1}} → R^{n_l}, x ↦ A_l(x) = W^l x + b^l, where W^l ∈ R^{n_l × n_{l−1}}, b^l ∈ R^{n_l}, and activation functions σ_l : R → R, l = 1, …, L, is defined as

Input layer:    û^{(0)}(x) = x ∈ R^d
Hidden layers:  û^{(l)}(x) = σ_l(W^l û^{(l−1)}(x) + b^l) ∈ R^{n_l}   for 1 ≤ l ≤ L−1
Output layer:   û(x) = W^L û^{(L−1)}(x) + b^L ∈ R^{n_L}

where the activation functions are applied component-wise. Here, d is the dimension of the input layer, L denotes the number of layers, also called the depth of û, and n₁, …, n_{L−1} denote the numbers of neurons of the L−1 hidden layers, also called the width of the respective layer. If n₁ = ⋯ = n_{L−1}, then n_i is called the width of û for i ∈ {1, …, L−1}. n_L is the dimension of the output layer. The matrices W^l contain the network's weights and the vectors b^l the biases.


Figure 2.2: FNN with two hidden layers and x = (x, t)^T ∈ R² as input, producing the output û(x).

The graph of an FNN is acyclic and has only edges which follow the direction from input to output. The activation function "decides whether a neuron is excited or inhibited through other neurons" [22]. Moreover, it is responsible for the nonlinear transformation of the input, which prevents the model from being a linear regression model. Commonly used activation functions are pictured in Figure 2.3.

Figure 2.3: Examples of activation functions used for neural networks.

Each activation function has its advantages and disadvantages. For more information we refer to [25]. Notice that there is no specific rule for which activation function to choose for a given task. In this thesis, we use the hyperbolic tangent function

tanh x = sinh x / cosh x = (e^x − e^{−x}) / (e^x + e^{−x})

as it is used in [16, 26] for similar regression tasks.


Note that for regression tasks it is common to not apply an activation function to the output layer. Unlike for classification tasks, continuous values are required here instead of discrete ones.

For simplicity we combine the learnable parameters W^l ∈ R^{n_l × n_{l−1}} and b^l ∈ R^{n_l} into one parameter Θ^l ∈ R^{n_l × (n_{l−1}+1)} for l ∈ {1, …, L}. Since the prediction û depends on Θ := (Θ¹, …, Θ^L) ∈ R^μ with μ = Σ_{l=1}^{L} n_l (n_{l−1} + 1), we also write û_Θ(x).
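As an illustration of Definition 2.2, a minimal NumPy sketch of the forward pass could look as follows; the layer sizes, the random initialization and the helper names (init_params, fnn) are our own choices for this example and not part of any library used later in this thesis.

import numpy as np

def init_params(layer_sizes, rng=np.random.default_rng(0)):
    # one pair (W^l, b^l) per layer l = 1, ..., L
    return [(rng.standard_normal((n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def fnn(params, x):
    # forward pass of Definition 2.2: tanh in the hidden layers, linear output layer
    u = x
    for W, b in params[:-1]:
        u = np.tanh(W @ u + b)
    W_L, b_L = params[-1]
    return W_L @ u + b_L

params = init_params([2, 50, 50, 1])   # d = 2 inputs (x, t), two hidden layers of width 50
print(fnn(params, np.array([0.5, 0.1])))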

2.1.1 Training

The training of the network's parameters corresponds to minimizing a specified loss function. For a given input x the loss function measures the error between the network's output û_Θ(x) and the ground truth u(x), i.e. the desired output. There are several loss functions that are used for machine learning applications, depending on the task to be solved. Since we have a regression task, the mean squared error (MSE) loss is a proper choice.

Definition 2.3. (MSE Loss)
Let û_Θ : R^d → R^{n_L} with d, n_L ∈ N be an FNN as defined in Def. 2.2. Further, let Ω ⊂ R^d be a bounded domain and u : Ω → R^{n_L} be the ground truth function the FNN aims to approximate. Then, for a given set of training points T ⊂ Ω the mean squared error loss L_T : R^μ → R is defined as

L_T(Θ) = (1/|T|) Σ_{x∈T} ‖û_Θ(x) − u(x)‖₂².

Due to the squaring, larger errors are penalized more heavily than smaller ones. In addition, quadratic functions are preferred over other mappings, e.g. the | · |₁-norm, since they are differentiable at every point. One can add an additional summand to the loss function containing a certain regularization.

Regularization
The purpose of regularization is to avoid overfitting, which would lead to a model that fits the training data too precisely. The other extreme is called underfitting. An underfitted model barely fits the training data and is most likely not able to provide reliable predictions for unseen data. The principles of overfitting and underfitting are visualized in Figure 2.4. Both cases result in a model that is not able to perform well on unseen data and thus has poor generalization capabilities. Since the concept of generalization is the key essence of this thesis, we explain the functionality of regularization with an example.


Figure 2.4: Example of a model that is underfitted, balanced or overfitted. The blue markers represent the training data and the dotted line the prediction of the model. Source: https://analystprep.com/study-notes/cfa-level-2/quantitative-method/overfitting-methods-addressing/.

Consider a regression problem where the constructed NN serves as function approximation. If the model is overfitted, the resulting function will contain a lot of curvature. Suppose this function has the mathematical description of a polynomial. Then the coefficients of higher degrees would be large. In order to smoothen the function, these coefficients ought to shrink. This is achieved by regularization. It penalizes the weights that are considerably large compared to the others. There are different types of regularization, such as L1 (Lasso) regularization or L2 (Ridge) regularization. We show the example of L2 regularization. Consider the MSE loss as defined in Definition 2.3 for an L-layer FNN û_Θ : R^d → R^{n_L} with weights and biases summarized in the parameter Θ. If we add an L2 regularization to the loss function we obtain

L_T(Θ) = (1/|T|) Σ_{x∈T} ‖û_Θ(x) − u(x)‖₂² + λ Σ_{j∈U∖B} Θ_j²

for U = {0, …, μ−1} and B = ⋃_{l=1}^{L} B_l as the set of indices of all biases, where B_l = { Σ_{i=1}^{l−1} n_i(n_{i−1}+1) + k_l(n_{l−1}+1) − 1 : 1 ≤ k_l ≤ n_l } denotes the set of indices of the biases of layer l. The regularization parameter λ ≥ 0 governs how much we want to penalize the flexibility of the model. Note that only the weights and not the biases are considered for regularization.
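For a network stored as a list of (W^l, b^l) pairs, as in the NumPy sketch of section 2.1, the regularized loss can be written in a few lines. The helper below is our own illustration and the value of λ is an arbitrary example.

import numpy as np

def mse_l2_loss(params, X, y, predict, lam=1e-4):
    # MSE loss plus L2 penalty; only the weight matrices W^l enter the penalty, the biases b^l do not
    preds = np.array([predict(params, x) for x in X])
    mse = np.mean(np.sum((preds - y) ** 2, axis=-1))
    penalty = sum(np.sum(W ** 2) for W, _ in params)
    return mse + lam * penalty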

The differentiability of the loss function follows from the differentiability of the activation function and the fact that the ‖ · ‖₂-norm is differentiable as well. The training process of the network under the MSE loss can then be formulated as the optimization problem

min_Θ L(Θ)


with loss function L as defined in Def. 2.3 and Θ ∈ R^μ representing the network's parameters. The minimization is done using a gradient descent approach. The method of steepest descent (Alg. 1) is "the simplest gradient method for optimization" [27]. However, for non-convex functions it is possible that the algorithm fails to converge towards the global minimum. This is the case for the choice of a small learning rate, for instance: the algorithm converges more slowly and possibly towards a local minimum. Conversely, although a large learning rate could accelerate the process until convergence, there is the risk of not reaching the global optimum at all. Therefore, this method is rarely used in practice.

Algorithm 1 Method of Steepest Descent
Require: objective function L(Θ)
Require: initialization Θ₀
  k ← 0                              ▷ Initialize iteration step
  α_k > 0                            ▷ Compute step size via line search
  while Θ_k not converged do
      k ← k + 1
      g_k ← ∇_Θ L(Θ_{k−1})           ▷ Compute gradient at iteration step k
      Θ_k ← Θ_{k−1} − α_{k−1} g_k    ▷ Update parameters
      α_k ← α_{k−1}                  ▷ Compute new step size
  end while
  return Θ_k                         ▷ Resulting parameters

In this thesis, we consider a stochastic gradient descent (SGD) approach. Unlike gradient descent, an SGD method only takes small subsets of the training dataset, so-called mini-batches, in each optimization step. This yields only approximations of the actual gradients, leading to a noisy gradient in total. Due to this noise, the method is capable of escaping possible local minima. One can draw an analogy to the momentum from physics, where it is described as a mass having a certain velocity while moving in a specific direction. If the algorithm reaches a local minimum, it is likely to leave it again due to the momentum, and the algorithm continues optimizing. A plain gradient descent approach, by contrast, would end the optimization when reaching a local minimum due to a zero gradient. Thus, SGD is effective for objective functions with a large amount of curvature.

The SGD method we are dealing with is called Adam (adaptive moment estimation), first proposed by Kingma and Ba, 2014. According to the authors, it "only requires first-order gradients with little memory requirement" [28]. Also, "the magnitudes of parameter updates are invariant to rescaling of the gradient". Adam tackles the aforementioned problems by using estimates of the first moment (mean) and the second moment (uncentered variance) of the gradient to adapt the learning rate for the network's parameters in each update. By calculating the exponential moving average, it "adds history to the parameter update equation based on the gradient encountered in the previous updates" [29].

Algorithm 2 Adam
Require: stepsize α, decay rates β₁, β₂ ∈ [0, 1), objective function L(Θ)
Require: initialization Θ₀
  k ← 0                                     ▷ Initialize iteration step
  m₀ ← 0                                    ▷ Initialize 1st moment vector
  v₀ ← 0                                    ▷ Initialize 2nd moment vector
  while Θ_k not converged do
      k ← k + 1
      g_k ← ∇_Θ L(Θ_{k−1})                  ▷ Compute gradient at iteration step k
      m_k ← β₁ · m_{k−1} + (1 − β₁) · g_k   ▷ Update biased 1st moment estimate
      v_k ← β₂ · v_{k−1} + (1 − β₂) · g_k²  ▷ Update biased 2nd raw moment estimate
      m̂_k ← m_k / (1 − β₁^k)                ▷ Compute bias-corrected 1st moment estimate
      v̂_k ← v_k / (1 − β₂^k)                ▷ Compute bias-corrected 2nd raw moment estimate
      Θ_k ← Θ_{k−1} − α · m̂_k / (√v̂_k + ε)  ▷ Update parameters
  end while
  return Θ_k                                ▷ Resulting parameters

In Algorithm 2, g_k² denotes the elementwise square g_k ⊙ g_k. The authors recommend α = 0.001, β₁ = 0.9, β₂ = 0.999 and ε = 10⁻⁸ as default settings. Further details can be found in [28].
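Written out in NumPy, one iteration of Algorithm 2 with the recommended default hyperparameters reads as follows; the toy objective L(Θ) = Θ² at the end is only there to show the update in action.

import numpy as np

def adam_step(theta, g, m, v, k, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # one iteration of Algorithm 2 for gradient g at step k >= 1
    m = beta1 * m + (1 - beta1) * g           # biased 1st moment estimate
    v = beta2 * v + (1 - beta2) * g ** 2      # biased 2nd raw moment estimate
    m_hat = m / (1 - beta1 ** k)              # bias-corrected 1st moment
    v_hat = v / (1 - beta2 ** k)              # bias-corrected 2nd moment
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
for k in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, k)   # gradient of L(theta) = theta^2
print(theta)   # close to the minimizer 0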

In deep learning the gradient ∇L(Θ) is obtained by backpropagation which, as the name suggests, computes the gradient by using the chain rule while propagating backwards through the network. For more details we refer to [30, pp. 140-148]. Machine learning frameworks such as TensorFlow (Google Brain Team, 2015) and PyTorch (Paszke et al., 2016) have implemented the method of automatic differentiation (AD). One might assume that these are distinct methods as they have different names, but both are strongly related to each other. As a matter of fact, backpropagation is a special case of AD.


2.2 Physics-Informed Neural Networks

The NN is a solely data-driven method mainly used for applications where enough data is available. Especially image data can easily be gathered from open source databases. However, most real-world phenomena are not described by images but by physical laws characterizing conservation laws, diffusion processes, advection-diffusion-reaction or other systems. It is possible that only partial data is available when observing such phenomena, which results in a poor predictive performance when using NNs. This motivated the idea to involve these physical laws, often formulated as partial differential equations (PDEs), as an additional source of information in practical computations and paved the way for PINNs to emerge. PINNs are NNs which are not only trained with ground truth data, but also learn the scientific laws described by the considered PDE. Before we explain the concept of a PINN in more detail, we introduce some basic definitions which are based on [31].

Definition 2.4. (Higher-Order Partial Derivative)
Let u : Ω → R with Ω ⊂ R^d being a domain. Further, let k = (k₁, …, k_d)^T ∈ N₀^d be a multi-index of order |k| := k₁ + ⋯ + k_d. Then

D^k u = ∂_{x₁}^{k₁} ⋯ ∂_{x_d}^{k_d} u

denotes a partial derivative of order |k|.

Definition 2.5. (PDE)
Let f : Ω × R^m → R, let k₁, …, k_m be multi-indices and u : Ω → R. Then

f(x, D^{k₁}u, …, D^{k_m}u) = 0,   x ∈ Ω,

is called a partial differential equation (PDE) of order p = max_{1≤i≤m} |k_i| for the function u.

We restrict ourselves to the class of well-posed problems, first introduced by Hadamard. In the context of PDEs we obtain the following definition.

Definition 2.6. (Well-Posed Problem)
Consider the problem described by the PDE

f(x, D^{k₁}u, …, D^{k_m}u) = h   on Ω × (0, T), Ω ⊂ R^d,

with boundary condition (BC)

B(u) = g   on ∂Ω

and initial condition (IC)

I(u) = u₀   in Ω at t = 0.

Then, the problem is well-posed in the sense of Hadamard if the following holds:

1. There exists at least one solution for every (appropriate) set of data.

2. For every h, g and u₀ there exists at most one solution.

3. The solution u = u(h, g, u₀) depends continuously on the data h, g and u₀, i.e. small changes in the data yield only small changes in the solution.

A PDE together with an IC and BCs constitutes an initial-boundary-value problem.

In order to deal with well-posed problems in practice, we need to prescribe proper BCs that u assumes along the boundary ∂Ω. Possible choices are

1. Dirichlet: u = u_D on ∂Ω

2. Neumann: n · ∇u = u_N on ∂Ω

3. Robin: α n · ∇u + β u = u_R on ∂Ω

for given functions u_D, u_N, u_R : ∂Ω → R and constants α, β ∈ R. Here, n denotes the normal vector and ∇ = (∂/∂x₁, …, ∂/∂x_d)^T the gradient operator. Further, by ∆ = ∂²/∂x₁² + ⋯ + ∂²/∂x_d² we denote the Laplace operator.

PINN Architecture

We describe the concept of a PINN using ideas and notations from [4, 16]. Since we consider PDEs in numerical simulations, we restrict our domain to be bounded. Therefore, let Ω̃ = Ω × [0, T] where Ω ⊂ R^d, d ∈ N, is bounded. Additionally, we define our boundary to be ∂Ω̃ = (∂Ω × (0, T)) ∪ (Ω × {0}), since numerically the boundary and initial data are treated equally. Thus, in the following we treat the IC as a special type of BC. For x = (x₁, …, x_d, t)^T ∈ Ω̃ we consider a PDE with solution u : Ω̃ → R denoted by

f(x, D^{k₁}u, …, D^{k_m}u) = 0,   x ∈ Ω̃,

with boundary conditions

B(u, x) = 0,   x ∈ ∂Ω̃.

In order to solve this PDE with a PINN, the following procedure [16] is carried out.

1. Construct a NN û_Θ with network parameters Θ.

2. Specify the two training sets T_f and T_b for the PDE and the BC/IC, respectively.

3. Specify the loss function L_T(Θ) by summing the weighted MSE losses of both the PDE and the BC/IC residual.

4. Train the NN by minimizing the loss function L_T(Θ).

A visualization of this procedure solving the one-dimensional heat equation u_t = λu_xx with mixed BCs is pictured in Figure 2.5. The basis is formed by a NN û_Θ with network parameters Θ as defined in Definition 2.2. Notice that PINNs are not constrained to FNNs but could have other network architectures as well.

Figure 2.5: Schematic of a PINN solving the one-dimensional heat equation u_t = λu_xx with mixed BCs u(x, t) = g_D(x, t) on Γ_D ⊂ ∂Ω̃ and ∂u/∂n(x, t) = g_R(u, x, t) on Γ_R ⊂ ∂Ω̃. The IC is treated as a special type of Dirichlet BC. T_f and T_b denote the sets of training points for the PDE and BC/IC residuals, respectively. Source: [16].

The model û_Θ acts as a surrogate of the solution u. In order to solve the PDE, the computation of (partial) derivatives is essential. Using AD, we are able to compute the derivatives (of any order) of our network û_Θ with respect to all relevant input variables, independently of the structure of the underlying programming code. Thus, we can include the PDE residual in our computations with no need for mesh generation as it is used in the finite element method. This inclusion is done by considering two subsets of the training data T ⊂ Ω̃. The set T_f ⊂ Ω̃ contains points in the domain and T_b ⊂ ∂Ω̃ points on the boundary and initial data. The training points are randomly distributed in the domain. As indicated in Figure 2.5, ground truth data is used only for points in T_b. For points in T_f the given PDE residual is minimized. This is the key difference from a standard NN, which uses ground truth data for all considered training data. In doing this, the model û_Θ learns the physical law imposed by the PDE.

2.2.1 Training

Like NNs, PINNs aim to minimize a certain loss function. But different from standard NNs, the loss function of a PINN is a linear combination of two loss functions evaluated at points in T_f and T_b, separately. It is convenient to choose the MSE as loss function for each training set.

Definition 2.7. (PINN Loss)
Let û_Θ : R^d → R with d ∈ N be an FNN as defined in Def. 2.2. Further, let Ω̃ ⊂ R^d, d ∈ N, be a bounded domain and u : Ω̃ → R be the solution of the PDE f(x, D^{k₁}u, …, D^{k_m}u) = 0 with boundary and initial data B(u, x) = 0 for x ∈ ∂Ω̃. Then, for given sets of training points T_f ⊂ Ω̃ and T_b ⊂ ∂Ω̃ the PINN loss L_T : R^μ → R is defined as

L_T(Θ) = w_f L_{T_f}(Θ) + w_b L_{T_b}(Θ)

where

L_{T_f}(Θ) = (1/|T_f|) Σ_{x∈T_f} ‖f(x, D^{k₁}û, …, D^{k_m}û)‖₂²,
L_{T_b}(Θ) = (1/|T_b|) Σ_{x∈T_b} ‖B(û, x)‖₂²

with weights w_f and w_b.
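To make Definition 2.7 concrete for the one-dimensional heat equation u_t = λu_xx, the following TensorFlow sketch evaluates both loss terms with automatic differentiation. It is our own illustration rather than code from [4] or [16]; the network architecture, λ = 1 and the weights w_f, w_b are example choices.

import tensorflow as tf

lam = 1.0   # thermal diffusivity (example value)

# surrogate network u_hat(x, t): R^2 -> R with tanh activations, built lazily on first call
u_hat = tf.keras.Sequential([
    tf.keras.layers.Dense(50, activation="tanh"),
    tf.keras.layers.Dense(50, activation="tanh"),
    tf.keras.layers.Dense(1),
])

def pde_residual(xt):
    # f(x, t) = u_t - lam * u_xx, computed with nested gradient tapes (AD)
    xt = tf.convert_to_tensor(xt, dtype=tf.float32)
    x, t = xt[:, 0:1], xt[:, 1:2]
    with tf.GradientTape() as outer:
        outer.watch(x)
        with tf.GradientTape(persistent=True) as inner:
            inner.watch([x, t])
            u = u_hat(tf.concat([x, t], axis=1))
        u_x = inner.gradient(u, x)
        u_t = inner.gradient(u, t)
    u_xx = outer.gradient(u_x, x)
    return u_t - lam * u_xx

def pinn_loss(xt_f, xt_b, u_b, w_f=1.0, w_b=1.0):
    # L_T(Theta) = w_f * L_Tf(Theta) + w_b * L_Tb(Theta) as in Definition 2.7
    u_b = tf.convert_to_tensor(u_b, dtype=tf.float32)
    loss_f = tf.reduce_mean(tf.square(pde_residual(xt_f)))                                   # PDE residual on T_f
    loss_b = tf.reduce_mean(tf.square(u_hat(tf.convert_to_tensor(xt_b, tf.float32)) - u_b))  # data term on T_b
    return w_f * loss_f + w_b * loss_b

A training loop would then repeatedly compute the gradient of pinn_loss with respect to the network's variables and pass it to an optimizer such as Adam.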

In section 2.1 we explained the concept of regularization, which prevents the model from overfitting. In fact, when looking at the PINN loss, the term w_f L_{T_f}(Θ) acts as a regularization with regularization parameter w_f in order to penalize those parameters Θ which would lead to solutions not satisfying the PDE. "Therefore, a key property of PINNs is that they can be effectively trained using small datasets" [4]. Also, the authors of [7] note that "it has been empirically reported that a properly chosen boundary weight wb could accelerate the overall training and result in a better performance". It is recommended [32] to choose w_b large and positive in order to improve the accuracy of the model.

Like for NNs, the training corresponds to the minimization problem min_Θ L(Θ). Since the PINN loss is a linear combination of two MSE losses, it is differentiable by the same reasoning as for NNs. The training of the network's parameters Θ proceeds with a gradient descent approach like Adam [28] or L-BFGS [33]. "The required number of iterations highly depends on the problem (e.g. the smoothness of the solution)" [16]. In each step of the training process, (partial) derivatives have to be calculated. Thus, if the number of points in T_f is very large, it is computationally expensive to calculate the PINN loss in every iteration. Lu et al. proposed a strategy to improve the efficiency of the training process called residual-based adaptive refinement, which we will discuss later.

Since we are solving a non-convex optimization problem, there is no theoretical guarantee that this process converges to the global minimum. However, Raissi, Perdikaris, and Karniadakis emphasize in [4] that "if the given PDE is well-posed" the PINN method "is capable of achieving good prediction accuracy given a sufficient number of collocation points" lying in T_f. As a matter of fact, Shin, Darbon, and Karniadakis present in [7] that for linear second-order elliptic and parabolic type PDEs "the sequence of minimizers strongly converges to the PDE solution in C0", where C0 denotes the set of continuous functions. They state furthermore that "if each minimizer satisfies the initial/boundary conditions the convergence mode becomes H1" [7], where H1 denotes the Sobolev space which we will introduce later. Cai et al. go even further in [18], where they ascertained with great success that PINNs are able to solve ill-posed problems as well when providing only partial measurement data on the boundary.

2.2.2 Application Library: DeepXDE

Lu et al. developed a software library called DeepXDE [17] which enables the user to solve forward and inverse PDEs via PINNs. Additionally, it is able to solve other kinds of problems such as forward and inverse integro-differential equations. In [16] it is said that the library was designed "to make the user code stay compact and manageable, resembling closely the mathematical formulation". Thus, it simplifies the process of analyzing scientific problems when using machine learning methods. DeepXDE supports the Python libraries TensorFlow 1.x, 2.x and PyTorch. In the following, we expose some of the advantages of DeepXDE proposed in [16].

A useful feature of the library is that it supports complex geometry domains using the technique of constructive solid geometry (CSG). Although DeepXDE already supports basic geometries (rectangle, circle, etc.), it is possible that the user needs to define a more complex geometry. CSG supports two- and three-dimensional domains. Some examples of possible domains are displayed in Figure 2.6.

Figure 2.6: Examples of constructive solid geometry (CSG) in 2D. Left: A and B represent the rectangle and circle, respectively. From A and B the union A|B, the difference A − B and the intersection A&B are constructed. Right: A complex geometry (top) is constructed from a polygon, a rectangle and a circle (bottom). Source: [16].

One major advantage is that the user does not have to provide training data. The user defines the PDE and BC/IC as functions represented in analytical form. One can either specify the locations of the points or only set the number of points, and DeepXDE will then sample the required number randomly on a grid covering the specified domain.

Another advantage is that DeepXDE increases the training efficiency by employing the residual-based adaptive refinement (RAR) method. In some cases where functions contain areas with steep gradients, it is useful to choose more training points in these areas. However, in many applications the shape of the solution is unknown in advance. Thus, the design of the distribution of the considered training points poses a challenge. To this end, Lu et al. propose with RAR a method which improves the distribution of training points during the training process by adding more points in the locations where the PDE residual is large. This is repeated until the mean residual

ε_r = (1/V) ∫_{Ω̃} |f(x, D^{k₁}u, …, D^{k_m}u)| dx

is smaller than a threshold ε, where V denotes the volume of Ω̃.
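In simplified form, one round-based version of this refinement idea can be sketched as follows. This is our own schematic, not DeepXDE's implementation: residual_fn, sample_candidates and retrain are placeholders for evaluating the PDE residual of the current model, drawing random points in Ω̃ and training the PINN on the current point set.

import numpy as np

def rar_refine(train_points, residual_fn, sample_candidates, retrain, n_add=10, eps=1e-3):
    # residual-based adaptive refinement: retrain, then add points where the residual is largest
    while True:
        retrain(train_points)                          # train the PINN on the current points
        candidates = sample_candidates(1000)           # random candidate points in the domain
        res = np.abs(residual_fn(candidates)).ravel()
        if res.mean() < eps:                           # Monte Carlo estimate of the mean residual
            return train_points
        worst = candidates[np.argsort(res)[-n_add:]]   # locations with the largest |residual|
        train_points = np.concatenate([train_points, worst], axis=0)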


Further, the user has the ability to add so-called callback functions. This enables the user to monitor the training process and to make modifications in real time, e.g. change the learning rate, if necessary. Possible callback functions can, for example, save the model after certain epochs, calculate the first derivative of the outputs with respect to the inputs, or monitor a movie of the spectrum of the function's Fourier transform. It is convenient to define customized callback functions which are called at specified stages of the training process.


An overview of the usage of DeepXDE for solving PDEs is depicted in Figure 2.7. The corresponding procedure is, according to [16], summarized as follows; a code sketch following these steps is given below.

1. Specify the computational domain using the geometry module.

2. Specify the PDE using the grammar of TensorFlow.

3. Specify the boundary and initial conditions.

4. Combine geometry, PDE and BCs/ICs together into data.PDE or data.TimePDE for time-independent or time-dependent problems, respectively. Specify the training data by either setting the specific point locations or only setting the number of points.

5. Construct a NN using the maps module.

6. Define a model by combining the PDE problem in 4 and the NN in 5.

7. Call model.compile to set the optimization hyperparameters such as optimizer and learning rate. The weights w_f, w_b can be set here via loss_weights.

8. Call model.train to train the network from random initialization or from a pre-trained model using the argument model_restore_path. It is extremely flexible to monitor and modify the training behavior using callbacks.

9. Call model.predict to predict the PDE solution at different locations.

Figure 2.7: Flow chart of the usage of DeepXDE. The white boxes define the PDE and the training hyperparameters. The blue boxes combine the PDE problem and training hyperparameters in the white boxes. The orange boxes describe the steps to solve the PDE (from right to left). Source: [16].
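As an illustration of these steps, the sketch below sets up the one-dimensional heat equation u_t = λu_xx on [0, π] × [0, 1] with zero Dirichlet BCs and u₀(x) = sin(x). It is only a sketch: the numbers of points, network size and loss weights are example choices, and some names (e.g. dde.maps.FNN, the epochs argument) differ between DeepXDE versions, so they should be checked against the installed release.

import numpy as np
import deepxde as dde

lam = 1.0

def pde(x, u):
    # x = (x, t); PDE residual u_t - lam * u_xx built with automatic differentiation
    u_t = dde.grad.jacobian(u, x, i=0, j=1)
    u_xx = dde.grad.hessian(u, x, i=0, j=0)
    return u_t - lam * u_xx

geom = dde.geometry.Interval(0, np.pi)                        # step 1: computational domain
timedomain = dde.geometry.TimeDomain(0, 1)
geomtime = dde.geometry.GeometryXTime(geom, timedomain)

bc = dde.DirichletBC(geomtime, lambda x: 0,                   # step 3: boundary condition u = 0
                     lambda x, on_boundary: on_boundary)
ic = dde.IC(geomtime, lambda x: np.sin(x[:, 0:1]),            # initial condition u0(x) = sin(x)
            lambda x, on_initial: on_initial)

data = dde.data.TimePDE(geomtime, pde, [bc, ic],              # step 4: only the numbers of points are set
                        num_domain=2000, num_boundary=100, num_initial=100)

net = dde.maps.FNN([2] + [50] * 4 + [1], "tanh", "Glorot uniform")   # step 5
model = dde.Model(data, net)                                         # step 6

model.compile("adam", lr=1e-3, loss_weights=[1, 10, 10])      # step 7: weights for [PDE, BC, IC]
model.train(epochs=20000)                                     # step 8
print(model.predict(np.array([[np.pi / 2, 0.5]])))            # step 9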

At the current state of the art, DeepXDE does not support the definition of various BCs/ICs for a specific PDE. Thus, we cannot use this library for the generalization task without having to change the code manually. Also, although it is user-friendly, it is difficult to take control over the single steps that proceed in the background.


2.3 Generating Ground Truth

Since we address the generalization of PINNs in the manner of a proof of concept, we need reference solutions to which we compare our numerical results. Thus, we have to solve various initial-boundary-value problems numerically. The conventional way to do this is via the finite element method (FEM). A popular software library for solving PDEs with the FEM is FEniCS [34]. Before we describe how to use FEniCS in practical applications, we explain the theory of the FEM using material from [35, 36].

2.3.1 Finite Element Method

The FEM is the most important method for solving PDEs, at least for elliptic and parabolic PDEs. The basic idea is to transform the considered PDE, which is usually given in the so-called strong form, into the weak form. Essential for this approach is the proper choice of the underlying function space. Therefore, we briefly explain which function spaces we are dealing with in this section.

Consider an open set Ω ⊂ R^d with piecewise smooth boundary. By L₂(Ω) we denote the Hilbert space of square-integrable functions with inner product and norm

⟨f, g⟩_{L₂(Ω)} := ∫_Ω f(x) g(x) dx,   ‖f‖_{L₂(Ω)} := (∫_Ω |f(x)|² dx)^{1/2}.

Concerning PDEs, we obviously need to deal with derivatives. On R we would intuitively use

u′(x) = lim_{h→0} (u(x + h) − u(x)) / h.
However, pointwise evaluation does not make sense in L₂(Ω). Thus, we require another definition of derivatives for this function space. To this end, we define

C₀^∞(Ω) := {φ ∈ C^∞(Ω) : supp φ ⊂ Ω is compact}

where C^∞(Ω) denotes the set of infinitely often differentiable functions on Ω.

Definition 2.8. [36] (Weak Derivative)
A function f ∈ L₂(Ω) possesses a weak derivative D_w^k f if there exists a function g ∈ L₂(Ω) with

∫_Ω g(x) φ(x) dx = (−1)^{|k|} ∫_Ω f(x) D^k φ(x) dx   for all φ ∈ C₀^∞(Ω).


Then it holds D_w^k f = g.

We can express Definition 2.8 also in terms of the inner product by

⟨g, φ⟩_{L₂(Ω)} = (−1)^{|k|} ⟨f, D^k φ⟩_{L₂(Ω)}   for all φ ∈ C₀^∞(Ω).

With this formalism we can define the important function space which is used for the FEM.

Definition 2.9. [36] (Sobolev Space)
For m > 0 let H^m(Ω) be the set of all functions u ∈ L₂(Ω) which possess a weak derivative D_w^k u ∈ L₂(Ω) for all |k| ≤ m. On H^m(Ω) we define an inner product and norm

⟨u, v⟩_{H^m(Ω)} = Σ_{|k|≤m} ⟨D_w^k u, D_w^k v⟩_{L₂(Ω)},   ‖u‖_{H^m(Ω)} = (Σ_{|k|≤m} ‖D_w^k u‖²_{L₂(Ω)})^{1/2}.

H^m(Ω) is called Sobolev space.


The Sobolev space Hm (Ω) is complete and hence a Hilbert space [36]. Since we

apply the presented methods to the heat equation later, we will describe how the

FEM is applied to this example.

Let Ω ⊂ R^d be a bounded domain and [0, T] the time interval of interest. We are looking for a solution u : Ω × [0, T] → R which fulfills the heat equation

u_t = ∆u + f   (2.1)

where f : Ω → R is the supplied heat. We consider a combination of Dirichlet and Neumann BCs

u = u_D on Γ_D,   ∂u/∂n = u_N on Γ_N,

with Γ_D and Γ_N forming the boundary of Ω, i.e. Γ_D ∪ Γ_N = ∂Ω and Γ_D ∩ Γ_N = ∅. Since we have a time-dependent problem, we need to prescribe initial data

u(·, 0) = u₀ in Ω.

For now, our problem is given in strong form. The FEM requires the problem to be in weak form. The idea is to fix a time t and consider (2.1) as a stationary problem. Then, the following steps are carried out.


1. Define the function space for the so-called test functions

V = {v ∈ H¹(Ω) : v = 0 on Γ_D}.

2. Multiply the strong form (2.1) by test functions v ∈ V and integrate over the domain Ω:

∫_Ω v u̇ dx − ∫_Ω v ∆u dx = ∫_Ω v f dx   for all v ∈ V.

3. Transform second-order derivatives into first-order derivatives by applying Green's formula

∫_Ω ⟨∇u, ∇v⟩ dx = −∫_Ω v ∆u dx + ∫_{∂Ω} v ⟨∇u, n⟩ ds

and obtain, due to ∂u/∂n = u_N on Γ_N, the weak form:

Find u ∈ H¹(Ω) with u = u_D on Γ_D such that

∫_Ω v u̇ dx + ∫_Ω ⟨∇u, ∇v⟩ dx = ∫_Ω v f dx + ∫_{Γ_N} v u_N ds   for all v ∈ V.   (2.2)

Note that for homogeneous BCs the boundary integral vanishes and we obtain u ∈ V. It is convenient to introduce an abstract notation for (2.2). We define on H¹(Ω) × H¹(Ω) the bilinear form

a(u, v) := ∫_Ω ⟨∇u, ∇v⟩ dx

and the linear functionals

⟨u̇, v⟩ := ∫_Ω v u̇ dx,   L(v) := ∫_Ω v f dx + ∫_{Γ_N} v u_N ds,

for given f(·, t) ∈ L₂(Ω) and u_N(·, t) ∈ L₂(Γ_N). Still, we consider an infinite-dimensional function space. Since the FEM is a numerical method, the key step is to switch to a discrete (finite-dimensional) space V_h ⊂ V, where h denotes the grid size of the considered mesh, and we end up in the final step.

4. Apply the Galerkin projection, which corresponds to the task:

For each time step t find the approximation u_h ∈ V_h with u_h = u_D on Γ_D such that

⟨u̇_h, v⟩ + a(u_h, v) = L(v)   for all v ∈ V_h.   (2.3)


(2.3) results in a system of ordinary differential equations where the initial value u_h(·, 0) can be computed by interpolation of the IC u(·, 0) = u₀ or by a so-called L₂ projection of u₀ given by ⟨u_h(·, 0), v⟩ = ⟨u₀, v⟩ for all v ∈ V_h.

It remains the question of how to choose the subspace V_h for the Galerkin projection. In the FEM, the domain Ω is first discretized, i.e. the domain is divided into elements that have a simple shape as depicted in Figure 2.8. On each element, the functions in V_h are locally defined piecewise polynomials in d variables, typically with the same degree over all elements.

Figure 2.8: Top: Element types of degree 1. Bottom: Element types of degree 2. The markers • identify nodes in which a function value is interpolated. Source: https://www.femto-engineering.de/stories/festigkeitsberechnungen/.

2.3.2 Numerical PDE Solver: FEniCS

As already mentioned before, FEniCS is a numerical PDE solver which uses the FEM. It can be used when programming in Python or C++. Basically, the user has to perform the steps as stated in section 2.3.1. The goal is to provide FEniCS with a variational problem. What makes FEniCS attractive is that the steps of writing the program code for defining the variational problem with all relevant assumptions "result in fairly short code, while a similar program in most other software frameworks for PDEs require much more code and technically difficult programming" [35].

Assume we are given the heat equation as stated in section 2.3.1. For simplicity we only assume Dirichlet BCs. FEniCS distinguishes between the trial space U and the test space V, where

U = {u ∈ H¹(Ω) : u = u_D on ∂Ω},
V = {v ∈ H¹(Ω) : v = 0 on ∂Ω}.

In order to solve a PDE with FEniCS the user has to perform the following steps [35]:

• Choose the finite element spaces U, V by specifying the domain (the mesh) and the type of function space (polynomial degree and type).

• Express the PDE as a (discrete) variational problem.

We want a representation of the form (2.3). Thus, we consider discrete finite-dimensional subspaces U_h ⊂ U, V_h ⊂ V and want to find the discrete solution u_h. In the following, the superscript h is dropped, since FEniCS aims to provide "(almost) a one-to-one relationship between the mathematical formulation of a problem and the corresponding FEniCS program". Thus, we refer by U and V to the discrete finite element spaces. Also, u denotes the solution of the discrete problem and u_e corresponds to the exact solution if we need to distinguish between both. Thus, we want to solve the discrete variational problem:

For each time step t find u ∈ U such that

a(u, v) = L(v)   for all v ∈ V,

where the bilinear form a(u, v) collects all terms with the unknown solution u and L(v) is the linear functional comprised of all known functions. But how does FEniCS handle the derivative in time? As mentioned in the introduction, there are no methods implemented here that compute derivatives exactly. FEniCS solves time-dependent problems by first discretizing the time derivative with a finite difference approximation. For a fixed t, each stationary problem is then transformed into a variational formulation. We stick to the notation given in [35].

Let u^n denote the solution u at time t_n, where n is an integer iterating over all considered time levels. "For simplicity and stability reasons" [35], FEniCS chooses a simple backward difference

(∂u/∂t)^{n+1} ≈ (u^{n+1} − u^n) / ∆t

with ∆t as the discretization parameter. With this, we obtain the time-discrete version of the heat equation sampled at some time level t_{n+1}, also called backward

Euler or implicit Euler discretization:

(u^{n+1} − u^n) / ∆t = ∆u^{n+1} + f^{n+1}.

Given the initial value u₀, FEniCS now solves a sequence of spatial stationary problems for u^{n+1} iteratively:

u⁰ = u₀,   (2.4)
u^{n+1} − ∆t ∆u^{n+1} = u^n + ∆t f^{n+1}   for n = 0, 1, 2, …   (2.5)

To turn (2.5) into weak form, we multiply the equation by a test function v ∈ V and integrate over the domain Ω, where second-order derivatives are removed by Green's formula. We obtain the variational formulation

a(u^{n+1}, v) = L^{n+1}(v)

where

a(u^{n+1}, v) = ∫_Ω (u^{n+1} v + ∆t ⟨∇u^{n+1}, ∇v⟩) dx,
L^{n+1}(v) = ∫_Ω (u^n + ∆t f^{n+1}) v dx.

We also need to approximate the IC (2.4). As mentioned in section 2.3.1, this is done by either interpolating the initial value u₀ or performing an L₂ projection of u₀ into the finite element space. Turning (2.4) into a variational formulation as well yields

a(u⁰, v) = L⁰(v)

where

a(u⁰, v) = ∫_Ω u⁰ v dx,   L⁰(v) = ∫_Ω u₀ v dx.

Thus, we obtain the following sequence of variational problems in order to solve the heat equation with the FEM in FEniCS:

Find u⁰ ∈ U such that a(u⁰, v) = L⁰(v) for all v ∈ V.
For n = 0, 1, 2, …: find u^{n+1} ∈ U such that a(u^{n+1}, v) = L^{n+1}(v) for all v ∈ V.
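A compact (legacy) FEniCS sketch of this time-stepping scheme could look as follows; it is our own illustration in the spirit of [35], with the interval [0, π], u₀(x) = sin(x), homogeneous Dirichlet BCs, and mesh resolution and time step chosen only as examples.

from fenics import *
import numpy as np

T_end, num_steps = 1.0, 100
dt = T_end / num_steps

mesh = IntervalMesh(100, 0.0, np.pi)                      # Omega = [0, pi]
V = FunctionSpace(mesh, "P", 1)                           # piecewise linear Lagrange elements

bc = DirichletBC(V, Constant(0.0), "on_boundary")         # u = 0 on the boundary
u_n = interpolate(Expression("sin(x[0])", degree=2), V)   # approximation of the IC u0

u = TrialFunction(V)
v = TestFunction(V)
f = Constant(0.0)                                         # no source term

# backward Euler step in variational form: a(u^{n+1}, v) = L^{n+1}(v)
F = u * v * dx + dt * dot(grad(u), grad(v)) * dx - (u_n + dt * f) * v * dx
a, L = lhs(F), rhs(F)

u = Function(V)
t = 0.0
for n in range(num_steps):
    t += dt
    solve(a == L, u, bc)    # solve the stationary problem at time level t^{n+1}
    u_n.assign(u)           # the new solution becomes the data of the next step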


2.4 Application to Heat Equation

Since PINNs provide an adequate method for solving PDEs efficiently [4], we want to investigate whether they can be applied to certain simulation models, e.g. a simplified thermal model of a satellite. A satellite lives in the Earth's orbit and absorbs heat from the sun or other sources via radiation. We are interested in modelling the heat conduction in the metal plates that have to be kept below a certain temperature. To this end, we consider the well-known heat equation. In this thesis, we only look at the one-dimensional case in order to expose first ideas of how to deal with a generalization task when using PINNs.

Let Ω ⊂ R be a bounded domain and I = [0, T] an interval. Then u : Ω × I → R is a solution of the heat equation if

u_t = λ u_xx + f

with thermal diffusivity λ and source term f : Ω → R. The solution u is unique if it further satisfies given boundary and initial conditions

u = u_D on ∂Ω,
u = u₀ in Ω at t = 0.

For the sake of simplicity we only consider the heat equation with f = 0 and Dirichlet BCs u_D. For specified boundary and initial conditions, the heat equation has a unique analytical solution which we use for training.

Analytical Solution

The following is based on [37]. We exemplarily show the computation of the solution of the heat equation with Ω = [0, π], λ = 1, f = 0 and zero Dirichlet BCs u_D = 0, given by

u_t = u_xx   in Ω × I,
u = 0   on ∂Ω,
u(·, 0) = u₀   in Ω.

We solve this PDE by separation of variables and use the ansatz

u(x, t) = X(x) T(t).

We immediately obtain X(x) T′(t) = X″(x) T(t), which yields

T′(t)/T(t) = X″(x)/X(x).

Since the left-hand side depends only on t and the right-hand side only on x, both sides must equal a constant, assuming X(x) ≠ 0 ≠ T(t). Let c := T′(t)/T(t) denote this constant. Due to the boundary conditions X(0) = X(π) = 0 it follows:

1) c = 0: X″(x) = c X(x) implies X(x) = a₁ x + a₂ and hence a₁ = a₂ = 0.

2) c > 0: X″(x) = c X(x) implies X(x) = a₁ e^{√c x} + a₂ e^{−√c x}. We obtain a₁ + a₂ = 0 and a₁ e^{√c π} + a₂ e^{−√c π} = 0, hence a₁ (e^{√c π} − e^{−√c π}) = 0 and thus a₁ = a₂ = 0.

Both cases lead to the trivial solution. Hence, c < 0. Let us write c = −k² with k ∈ N. Then

X″(x) = −k² X(x) implies X(x) = a_k sin(kx).

Note that X(π) = 0 for all k ∈ N. The corresponding solution for T(t) is given by

T(t) = e^{−k² t}.

Thus, all functions

u_k(x, t) = a_k sin(kx) e^{−k² t}

are solutions of the one-dimensional heat equation on the interval [0, π] with zero boundary conditions. By superposition we obtain the general solution

u(x, t) = Σ_{k=1}^∞ a_k sin(kx) e^{−k² t}.

The coefficients a_k are determined by the initial condition u₀. The functions {√(2/π) sin(kx) : k ∈ N} form an orthonormal basis of the Hilbert space of square-integrable functions on [0, π] with zero boundary conditions. Hence, the initial condition u₀ can be written as

u₀(x) = Σ_{k=1}^∞ ⟨u₀, sin(k·)⟩ (2/π) sin(kx) = Σ_{k=1}^∞ ((2/π) ∫₀^π u₀(z) sin(kz) dz) sin(kx).

It follows

a_k = (2/π) ∫₀^π u₀(z) sin(kz) dz.

Thus, the analytical solution is given by

u(x, t) = Σ_{k=1}^∞ ((2/π) ∫₀^π u₀(z) sin(kz) dz) sin(kx) e^{−k² t}.

Example 2.1. Consider the one-dimensional heat equation with λ = 1 and zero Dirichlet boundary conditions u_D = 0. Let u₀(x) = sin(x) be the initial condition for x ∈ [0, π]. Then the analytical solution is given by

u(x, t) = sin(x) e^{−t}.
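The truncated series can also be evaluated numerically for an arbitrary initial condition; the short sketch below (our own helper, with a hypothetical truncation order K and quadrature from SciPy) computes the coefficients a_k and reproduces Example 2.1 for u₀(x) = sin(x).

import numpy as np
from scipy.integrate import quad

def fourier_sine_coefficients(u0, K):
    # a_k = (2/pi) * int_0^pi u0(z) sin(k z) dz for k = 1, ..., K
    return [2.0 / np.pi * quad(lambda z, k=k: u0(z) * np.sin(k * z), 0.0, np.pi)[0]
            for k in range(1, K + 1)]

def heat_solution(u0, x, t, K=20):
    # truncated series u(x, t) = sum_k a_k sin(k x) exp(-k^2 t)
    a = fourier_sine_coefficients(u0, K)
    return sum(a[k - 1] * np.sin(k * x) * np.exp(-k ** 2 * t) for k in range(1, K + 1))

x, t = np.pi / 2, 0.3
print(heat_solution(np.sin, x, t), np.sin(x) * np.exp(-t))   # both values agree (Example 2.1)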

Since we want our model to be generalizable, we look at the heat equation with various boundary and initial conditions. As already mentioned (section 2.2.2), using PINNs with current state-of-the-art methods, this requires the model to be trained anew for every modified boundary or initial condition. This constraint raises a fundamental question:

Can we construct a model which enables us to make reliable predictions for various problems without the necessity of training the model over and over again?


To answer this question, we take a look at several cases. First, we compare a PINN to a NN in terms of the size of the training dataset and extrapolation capabilities. With this, we want to expose if and under what conditions PINNs are superior to NNs. Subsequently, we vary the BCs for a fixed IC. Under this setting we also investigate the impact of an additional nonlinear term in the residual. The most interesting case we look at deals with a compilation of several ICs. Here, the BCs are given implicitly by the ICs. In order to be able to describe the input functions with as few parameters as possible, we have to choose a proper representation. Hence, we parameterize the functions by Fourier coefficients.

2.4.1 Fourier Analysis

We want to justify the use of Fourier coefficients as function representation by using ideas and results from [38, 39]. Therefore, we require some preliminaries. Let T := [0, 1] be the torus of length 1 and L₂(T) the Hilbert space of square-integrable functions f : T → C with inner product and norm as stated in section 2.3.1. Elements f ∈ L₂(T) are periodic functions living on R that are continued periodically beyond the interval T.

Theorem 2.2. (ONB of L₂(T))
The functions

{e^{2πikx} = cos(2πkx) + i sin(2πkx) : k ∈ Z}   (2.6)

form an orthonormal basis (ONB) of L₂(T).

Proof. See for example [38].

With (2.6), we can represent the elements of the Hilbert space L₂(T) with respect to that basis. Let S_n : L₂(T) → L₂(T) be the linear operator defined by

S_n f := Σ_{k=−n}^{n} ⟨f, e^{2πik·}⟩ e^{2πik·},

which we call the n-th Fourier partial sum of f ∈ L₂(T). Note that this is also defined for f ∈ L₁(T) ⊃ L₂(T), where L₁(T) denotes the space of integrable functions. Now we can state the following theorem.

Theorem 2.3. (Fourier Series of a Function)
Every function f ∈ L₂(T) has a unique representation of the form

f(x) = Σ_{k∈Z} c_k(f) e^{2πikx},   c_k(f) := ⟨f, e^{2πik·}⟩ = ∫₀¹ f(x) e^{−2πikx} dx,   (2.7)

where the sequence (S_n f)_{n=0}^∞ converges to f in L₂(T). We call (2.7) the Fourier series of f with Fourier coefficients c_k(f). Further, the Parseval equation is fulfilled:

‖f‖²_{L₂(T)} = Σ_{k∈Z} |c_k(f)|² < ∞.

Proof. Let

P_n := { Σ_{k=−n}^{n} c_k e^{2πik·} : c_k ∈ C }

be the space of all trigonometric polynomials of degree ≤ n. We show that the Fourier partial sum operator S_n : L₂(T) → L₂(T) is an orthogonal projector onto P_n, i.e.

‖f − S_n f‖ = min{‖f − p‖ : p ∈ P_n}

for an arbitrary f ∈ L₂(T). Let f ∈ L₂(T) and p_n = Σ_{k=−n}^{n} c_k e^{2πik·} ∈ P_n with c_k ∈ C. Then it holds

⟨S_n f − f, p_n⟩ = ⟨S_n f, p_n⟩ − ⟨f, p_n⟩
= Σ_{k=−n}^{n} c_k(f) c̄_k ⟨e^{2πik·}, e^{2πik·}⟩ − Σ_{k=−n}^{n} c̄_k ⟨f, e^{2πik·}⟩
= Σ_{k=−n}^{n} c̄_k (c_k(f) − ⟨f, e^{2πik·}⟩)
= 0   for all p_n ∈ P_n,

where we used the orthonormality of the basis functions, ⟨e^{2πik·}, e^{2πik·}⟩ = 1, and the definition c_k(f) = ⟨f, e^{2πik·}⟩. Thus, f − S_n f is orthogonal to S_n f ∈ P_n. It immediately follows

‖f − S_n f‖²_{L₂(T)} = ‖f‖²_{L₂(T)} − ‖S_n f‖²_{L₂(T)}
= ‖f‖²_{L₂(T)} − ∫₀¹ |Σ_{k=−n}^{n} c_k(f) e^{2πikx}|² dx
= ‖f‖²_{L₂(T)} − Σ_{k=−n}^{n} |c_k(f)|² ∫₀¹ |e^{2πikx}|² dx
= ‖f‖²_{L₂(T)} − Σ_{k=−n}^{n} |c_k(f)|²,   (2.8)

where the mixed terms vanish again by orthonormality and |e^{2πikx}|² = 1. Since the functions {e^{2πikx} : k ∈ Z} form an ONB of L₂(T), the Parseval equation holds:

‖f‖²_{L₂(T)} = Σ_{k∈Z} |c_k(f)|².

As an immediate consequence the assertion follows from (2.8) for n → ∞.


Theorem 2.3 ensures that a function f ∈ L₂(T) can be described completely by its Fourier coefficients. Note that in (2.7) the equality is meant in the sense of L₂; there is no pointwise or uniform convergence here. The Fourier coefficients c_k(f) correspond to the amplitudes of the oscillations contained in f.

To be able to analyze a function in practical applications, it is crucial to have a discrete setting. Besides, in some cases only certain frequencies k are of interest. Therefore, consider the regarded function sampled on an equispaced grid, i.e. (f_j)_{j=0}^{N−1} with f_j := f(j/N), N ∈ N.

Definition 2.10. [38] (Discrete Fourier Transform)
The linear map F : C^N → C^N, F f = f̂, defined by

f̂_k := Σ_{j=0}^{N−1} f_j e^{−2πijk/N},   k = 0, …, N − 1,

is called the discrete Fourier transform (DFT) of f ∈ C^N.

The discretization induces an aliasing error. A proper approximation for the relation between the Fourier coefficients c_k(f) and their discrete analogues f̂_k is given by

c_k(f) ≈ (1/N) f̂_k,   k = 0, …, N/2 − 1,
c_{−k}(f) ≈ (1/N) f̂_{N−k},   k = 1, …, N/2,

where w.l.o.g. N is even. In the following, we identify the phrase Fourier coefficients with the values f̂_k computed by the DFT as stated in Definition 2.10. The DFT is invertible [38]. For given Fourier coefficients f̂_k the original sampled function can be obtained by the inverse discrete Fourier transform.

Definition 2.11. (Inverse Discrete Fourier Transform)
The linear map F⁻¹ : C^N → C^N, F⁻¹ f̂ = f, defined by

f_j := (1/N) Σ_{k=0}^{N−1} f̂_k e^{2πijk/N},   j = 0, …, N − 1,

is called the inverse discrete Fourier transform (IDFT).
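In practice these transforms correspond to NumPy's FFT routines, which use the same sign and normalization conventions as Definitions 2.10 and 2.11; the short check below samples an example function on N = 16 grid points and recovers the approximation c_k(f) ≈ f̂_k/N stated above.

import numpy as np

N = 16
j = np.arange(N)
f = np.sin(2 * np.pi * j / N) + 0.5 * np.cos(2 * np.pi * 3 * j / N)   # samples f_j of an example function

f_hat = np.fft.fft(f)      # DFT as in Definition 2.10
c = f_hat / N              # approximation of the Fourier coefficients c_k(f)
print(np.round(c[[1, N - 1, 3, N - 3]], 3))   # modes k = 1 and k = 3 carry the sine and cosine parts

f_rec = np.fft.ifft(f_hat)                    # IDFT as in Definition 2.11 recovers the samples
print(np.allclose(f, f_rec))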


2.4.2 Comparison of PINN to NN

Before we tackle the generalization task of this thesis, we first look at a more basic question: what is the added value of a PINN compared to a NN which performs a simple function approximation using the ground truth data? According to Raissi et al., PINNs seem to be the answer for problems where only little data is available [4]. Thus, we consider two cases:

1. Size of Training Dataset
How many training points are required in order to obtain a reliable model?

2. Extrapolation
Do PINNs have better extrapolation capabilities than NNs?

In order to answer this, we examine what impact the residual has on the training process of a PINN. This is done by comparing a PINN to a NN with the same network architecture and net initialization. Furthermore, we include points of the inner domain in the set of boundary points where the ground truth is used in the loss function. Generally, this would not be done when using PINNs in practice (see Def. 2.7). However, with this approach we achieve that the only difference between the two models is the additional residual term of the PINN in the loss function, which enables us to compare both methods.

Let Ω = [0, π], I = [0, 1] and λ = 1. Consider a function u: Ω × I → R satisfying the one-dimensional heat equation with the following initial and boundary conditions

IC: u_0(x) = sin(x)                                    (2.9)
BC: u_D = 0.

As in section 2.2 we define Ω̃ = Ω × I with ∂Ω̃ = ∂Ω × (0, T) ∪ Ω × {0} containing boundary and initial data. We consider the set of training points T ⊂ Ω̃. In section 2.2.1 it is claimed that T is divided into sets T_f ⊂ Ω̃ and T_b ⊂ ∂Ω̃ where ground truth data is only used for T_b. However, as mentioned above, for comparison reasons we set T_b = T. The ground truth data is obtained by solving the PDE with FEniCS (see subsection 2.3.2) on a 100 × 100 mesh, i.e. a grid consisting of 100 points in x- and 100 points in t-direction where x ∈ Ω, t ∈ I.
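For this particular setting there is also a well-known closed-form solution, mentioned here only as a sanity check for the numerical reference data (the thesis itself relies on the FEniCS solution):
\[
u(x, t) = e^{-\lambda t} \sin(x),
\]
since u_t = −λ e^{−λt} sin(x) = λ u_xx, u(0, t) = u(π, t) = 0 and u(x, 0) = sin(x).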

Tackling the first question, we train the models with different sizes of the training dataset. We consider sizes in M = {10, 15, 20, 30, 40, 60, 80, 100, 120, 160, 200}. First, we train the NN with a selected size of randomly distributed training points. Then, with the same net initialization and the same training data we train the corresponding PINN. The test dataset consists of 200 randomly distributed points. For all considered cases we use the same test dataset. With the IC and BC from (2.9) we obtain the settings for each considered size of the training dataset as depicted in Table 2.1. With the obtained models we perform extrapolation and look at the behavior for the extreme case t ∈ [1, 5].

Table 2.1: Setting for comparison of PINN to NN. Both were trained with the same net
initialization on the same data with the Adam optimizer and a learning rate of 0.0001.
Ω I λ Depth Width Epochs

[0, π] [0, 1] 1. 5 50 20000

2.4.3 Generalization of PINNs

The quality of a NN is usually measured by its generalization ability. In fact, one

can consider a model as useless if it makes inaccurate predictions on unseen data.

Thus, we investigate the generalization capabilities of PINNs for various problem

settings of the heat equation. We consider two main cases

1. Fixed Initial and Various Boundary Conditions


a) Sine as IC

b) Line as IC with discontinuities at the boundary

c) Nonlinear reaction term in residual with sine as IC

2. Various Boundary and Initial Conditions


d) Functions parametrized by Fourier coefficients

In the following we elaborate on the cases mentioned above. In general, the heat equation is a special case of the diffusion-reaction system

u_t = ∇ · (d(u)∇u) + r(u)

where d(u) > 0 is the diffusivity and r(u) a reaction term. In the cases considered before, it holds d = λ and r = 0. However, regarding the motivation to model a reaction of the satellite with the space environment, for the nonlinear case we take an additional radiative loss into account. In a vacuum, a satellite can only absorb or emit heat via radiation. We model this radiative loss in a simple fashion by the Stefan-Boltzmann law r(u) = −σ(u^4 − v^4), where v(x, t) denotes the temperature of the surroundings and σ is a constant. For simplicity we restrict to v = 0. In doing this, we investigate the PINN's behavior when the residual contains an additional nonlinear term.
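In terms of the PINN loss, this only changes the residual term. Building on the illustrative PyTorch sketch from section 2.4.2 (and again using assumed tensor names, with v = 0 as stated above), the residual of the nonlinear equation would read:

    # Residual of u_t = lam * u_xx - sigma * u**4  (radiative loss with v = 0);
    # u_t and u_xx are computed via autograd as in the earlier sketch.
    residual = u_t - lam * u_xx + sigma * u ** 4
    mse_res = torch.mean(residual ** 2)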

To summarize, we are looking for a solution u: Ω × I → R with Ω, I ⊂ R that satisfies the PDE

u_t = λu_xx + r(u)

for given IC u_0 and BC u_D. For all cases we consider I = [0, 1].

a) Let Ω = [0, π], r = 0 and

IC: u_0(x) = sin(x)
BC: u_D(x) = a for x = 0, and u_D(x) = b for x = π.

We choose a, b ∈ {0, 0.2, 0.4, 0.6, 0.8, 1}. This results in a dataset consisting of 36 individual datasets.

b) Let Ω = [0, 1], r = 0 and

IC: u_0(x) = −x + 1
BC: u_D(x) = a for x = 0, and u_D(x) = b for x = 1.

As in a) we choose a, b ∈ [0, 1] with stepsize 0.2 and obtain a dataset consisting of 36 individual datasets. The discontinuities at the boundaries force the straight line to change over time in a way such that it fulfills the BC.

c) Let Ω = [0, π], a, b ∈ [0, 1] and the IC and BC as defined in a). Unlike in a) and b), here it holds r ≠ 0. The considered PDE becomes

u_t = λu_xx − σu^4.

We again obtain a dataset consisting of 36 individual datasets.

d) Let Ω = [0, 1] and r = 0. The ICs are given by functions that are parametrized by 6 Fourier coefficients (see Def. 2.10). The BCs are given implicitly. Here, we provide the PINN 2065 functions. Thus, the dataset consists of 2065 individual datasets. The functions are variations of rectangular functions, lines with different slopes and polynomials. In doing so, we achieve that the network learns various types of functions such that it is able to predict the solution for an arbitrary IC represented by its Fourier coefficients.

As usual for PINNs, we define a set of training points T ⊂ Ω̃ with T_f ⊂ Ω̃ and T_b ⊂ ∂Ω̃. This is done for each individual dataset. Consider case a): if we choose a size of 20 training points, each of the 36 individual datasets uses 20 training points, where 10 points lie in the inner region and the other 10 on the boundary. The 10 boundary points are also used for the residual loss in order to increase the regularization. Thus, the complete training dataset consists of 720 points. The same procedure is done for the test dataset. Here, for each case we use a size of 200 test points. The different problem settings with the respective network architectures are summarized in Table 2.2.

Table 2.2: Settings for the generalization of PINNs. The column Size denotes the size of
the training dataset for each individual dataset. In the nonlinear case we choose σ = 3.7.
All models were trained with the Adam optimizer and a learning rate of 0.0001.
Sizes Ω I λ r Depth Width Epochs

Sine 20, 40, 60 [0, π] [0, 1] 1. 0 5 60 60000

Line 40, 60, 80 [0, 1] [0, 1] 0.4 0 5 100 20000

Nonlinear 30, 40, 60 [0, π] [0, 1] 0.1 −σu4 5 100 60000

Fourier 40, 60, 80 [0, 1] [0, 1] 0.2 0 5 60 60000

We use Python's scipy module to compute the complex-valued Fourier coefficients f̂_k. We sample the function used as IC at 100 points and take 6 coefficients as its parametrization. We are interested in the coefficients of the lowest frequencies, i.e. c_0, c_1, c_2, c_{−1}, c_{−2}, c_{−3}. Thus, we have to use f̂_0, f̂_1, f̂_2, f̂_97, f̂_98, f̂_99 according to the relation mentioned above. For simplicity, we use the real and imaginary parts of these numbers instead of their complex representation. As before, we use FEniCS to generate the ground truth data for T_b. To this end, we take the 6 Fourier coefficients and compute the resulting function using the IDFT.
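A minimal sketch of this parameterization step is given below. It assumes that scipy's FFT is applied directly to the 100 samples of the IC; the variable names and the example IC are illustrative and not taken from the thesis code.

    import numpy as np
    from scipy.fft import fft, ifft

    N = 100
    x = np.linspace(0.0, 1.0, N, endpoint=False)
    u0 = -x + 1.0                            # example IC on Omega = [0, 1]

    f_hat = fft(u0)                          # complex DFT coefficients f_hat[0..99]

    # Keep the six coefficients c_0, c_1, c_2, c_{-1}, c_{-2}, c_{-3},
    # i.e. the DFT entries with indices 0, 1, 2, 99, 98, 97.
    idx = [0, 1, 2, 97, 98, 99]
    params = np.concatenate([f_hat[idx].real, f_hat[idx].imag])   # 12 real parameters

    # Approximate reconstruction of the IC from the six kept coefficients via the IDFT,
    # as used for generating the FEniCS ground truth.
    f_hat_kept = np.zeros(N, dtype=complex)
    f_hat_kept[idx] = f_hat[idx]
    u0_rec = ifft(f_hat_kept).real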

3 Results
We finally present the results of the tasks presented in section 2.4. First, we show the results when fixing the boundary and initial condition. Here, we show the comparison between a NN and a PINN. In the next section, we look at the effects of selected initial conditions and nonlinearities in the residual when varying the boundary conditions. Lastly, we present the results gained when training the model with various initial conditions provided as six Fourier coefficients. The results are generated on a 100 × 100 grid.

3.1 Fixed Boundary and Initial Condition

3.1.1 Size of Training Dataset

We compare a PINN to a NN with the same prerequisites. We are interested in the models' performance for increasing size of the training dataset. In the following, the phrase size corresponds to the number of training points. For each size we present the model at that training step where it achieved the best test loss, since the test loss serves as an indicator for a good generalization ability. An overview of the obtained results for each considered size is displayed in Table 3.1. For all models the test loss was evaluated at the same 200 test points. The relative L2 error takes all 10000 grid points into account. The relative L2 error of the PINN is smaller than the error of the NN for all sizes up to 160 inclusively. The PINN test loss is smaller than the NN test loss up to size 100. Looking at specific sizes, different behaviors of the NN and the PINN, respectively, can be observed. From size 30 to 40 the test loss and the relative L2 error of the NN decrease, whereas for the PINN these values increase or stay equal. From size 80 to 100 the test loss and the error of the NN increase, while for the PINN they decrease. From size 100 to 120, the relative L2 error of the NN decreases whereas for the PINN it increases.
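For reference, the two error measures used throughout this chapter could be computed as in the following sketch. The exact normalization of the pointwise relative L∞ error is not spelled out here, so the version below (normalization by the maximum of the exact solution, with assumed array names) is only one plausible choice.

    import numpy as np

    def relative_l2_error(u_pred, u_exact):
        """Relative L2 error over all evaluation points (here: the 100 x 100 grid)."""
        return np.linalg.norm(u_pred - u_exact) / np.linalg.norm(u_exact)

    def relative_linf_error(u_pred, u_exact):
        """Pointwise relative error field, e.g. to locate where the largest errors occur."""
        return np.abs(u_pred - u_exact) / np.max(np.abs(u_exact))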

A visualization of the evolution of the relative L2 error and the runtime over the sizes can be seen in Figure 3.1. Consider the errors displayed in Figure 3.1a. At first


Table 3.1: Comparison of PINNs and NNs trained for various numbers of randomly sampled training points. Train loss, test loss and relative L2 error are compared. The column size denotes the number of training points. The test losses of all models are evaluated at the same 200 points.
NN PINN

size train loss test loss rel. L2 error train loss test loss rel. L2 error

10 1.99e-06 4.24e-04 5.45e-02 6.71e-06 9.1e-05 1.91e-02

15 4.85e-06 2.67e-04 3.04e-02 4.5e-07 8.e-06 5.7e-03

20 1.8e-07 2.23e-05 1.58e-02 4.e-07 5.1e-06 4.57e-03

30 7.78e-06 5.1e-05 2.04e-02 1.3e-06 4.3e-06 4.65e-03

40 1.25e-06 4.38e-06 6.31e-03 2.9e-06 4.3e-06 6.22e-03

60 1.1e-06 2.92e-06 5.01e-03 1.2e-06 1.4e-06 2.73e-03

80 8.8e-07 1.77e-06 4.23e-03 1.2e-06 1.5e-06 2.46e-03

100 5.7e-07 4.15e-06 7.04e-03 4.5e-07 1.e-06 1.93e-03

120 1.86e-06 6.94e-06 5.61e-03 7.3e-07 1.5e-06 2.34e-03

160 1.e-06 1.27e-06 2.4e-03 1.2e-06 1.3e-06 2.37e-03

200 5.7e-07 8.3e-07 1.92e-03 9.e-07 1.e-06 2.11e-03

(a) Relative L2 error (b) Runtime


Figure 3.1: Plot of relative L2 error and runtime over sizes. (a): The relative L2 error is
evaluated at the best model. (b): The runtime corresponds to a training of 20000 epochs.

glance, it can be seen that the PINN error is smaller than the NN error for sizes up to 160 inclusively. The largest difference between the error of the PINN and the NN is at size 15. In Figure 3.1b the runtime of the training process for each considered size is displayed. For the PINN the runtime increases over the sizes, whereas the runtime of the NNs stays within a range of about 97 to 124 seconds. At size 10 the PINN takes twice as long as the NN. For 200 training points the PINN's runtime is


about 6 times larger than the runtime of the NN.

Loss Curves
We exemplarily look at the loss curves of the PINNs and NNs trained for sizes in {15, 30, 40, 80, 100, 160} pictured in Figure 3.2. The losses for size 15 are shown in Figure 3.2a. One can see the large difference between the train and test losses for both the PINN and NN. This indicates a lack of training data, which corresponds to an insufficient regularization and thus results in a poor generalization. For the NN this gap is larger and there is no significant decrease of the test loss from epoch 4000 on, whereas for the PINN it decreases up to epoch 19000. For size 30 (Fig. 3.2b), the gap between the PINN's train and test loss is small compared to the NN, where it is still large. Here, the NN's test loss stops decreasing significantly at epoch 13000. From size 40 on (Fig. 3.2c), the difference between train and test loss is small for both PINN and NN compared to the smaller sizes. All curves are decreasing over the number of epochs. One can see in Figures 3.2a-d that the gap between train and test loss becomes smaller for increasing size of the training dataset. This gap becomes larger again for size 100, as can be seen in Figure 3.2e. Figure 3.2f displays the similar loss values of both the PINN and NN.

Solution Plots
Now, we present the solution plots that correspond to the losses presented be-

fore. In Figure 3.3 the solution plots of NNs and PINNs trained with sizes in

{15, 30, 40, 80, 100, 160} are compared. For each size the prediction of the model

and the used training data are displayed. Additionally, we look at the relative

L∞ error evaluated at each grid point in order to see where the errors are made.

Furthermore, we depict the maximal errors of all considered models in Table 3.2.

In case of size 15 (Fig. 3.3a-b) one can see that in the case of the NN the errors

are especially made on the boundary and in those regions where no training points

were available. The PINN has in the aforementioned areas only minor errors. In

case of size 30, pictured in Figures 3.3c-d, the larger errors made by the NN are

located in areas of the boundary, especially around the initial condition where no

training points are available. The PINN has only a maximal error of 1.95% although

there are no training points in that region. Also, compared to the PINN the NN

makes larger errors in the inner domain. In Figures 3.3e-f, representing 40 utilized

training points, the error plots of PINN and NN have a similar ratio between large

and smaller errors. Comparing size 80 (Fig. 3.3g-h) to size 100 (Fig. 3.3i-j), one

can see in the case of the NN that for size 100 larger errors are made than for size


(a) Size 15 (b) Size 30

(c) Size 40 (d) Size 80

(e) Size 100 (f ) Size 160


Figure 3.2: Loss curves of PINN and NN over number of epochs for sizes in
{15, 30, 40, 80, 100, 160}. (a): Large gap between train and test loss for both PINN and
NN. For the NN this gap is larger. (b): Gap between train and test loss still large for NN
but smaller for PINN. (c)-(d): For growing size the loss curves are approaching each other
for both PINN and NN. (e): Dierence between train and test loss increases again (for NN
larger dierence). (f): Loss values of PINN and NN similar for almost all epochs.


(a) NN: Size 15 (b) PINN: Size 15

(c) NN: Size 30 (d) PINN: Size 30

(e) NN: Size 40 (f ) PINN: Size 40

Figure 3.3: Predicted solutions of PINNs and NNs trained with sizes in {15, 30, 40}. The
black markers represent the used training points. Top : Prediction. Bottom : Relative
L∞ error. (a)-(d): NN makes larger errors than PINN. Significant errors are made at
the boundaries. (e)-(f): Both are making similar errors. Errors are mainly made at the
boundaries.


(g) NN: Size 80 (h) PINN: Size 80

(i) NN: Size 100 (j) PINN: Size 100

(k) NN: Size 160 (l) PINN: Size 160

Figure 3.3: Predicted solutions of PINNs and NNs trained with sizes in {80, 100, 160}.
The black markers represent the used training points. Top : Predicted solution. Bottom :
Relative L∞ error. (g)-(j): NN makes larger errors than PINN. Errors are mainly made
at the boundaries. (k)-(l): Both are making similar errors. Minor errors are made at the
boundaries.


80. This can be seen by the increase of the error at the boundary in positive t-direction. In contrast to the NN, the PINNs trained with 80 and 100 training points, respectively, have a similar error distribution with a maximal error of 0.98% for size 80 and 0.45% for size 100. In case of size 160 (Fig. 3.3k-l), the PINN and NN behave similarly in terms of the magnitude of the errors. Both make minor errors at the boundaries, although the locations of the errors differ. Also, in the inner domain the PINN has a smoother error distribution than the NN, which shows isolated spots.

Table 3.2: Predictive errors of PINNs and NNs for various numbers of training samples. The left column corresponds to the size of the training dataset; the maximal relative L∞ errors with their locations are displayed.

      NN                              PINN
size  (x, t)^T       max. error       (x, t)^T       max. error
10    (0, 1)^T       14.5%            (0, 1)^T       4.5%
15    (π, 0)^T       10.86%           (π, 0)^T       2.05%
20    (0, 0)^T       7.77%            (π, 1)^T       1.43%
30    (0, 0)^T       10.61%           (0, 0)^T       1.95%
40    (0, 1)^T       2.72%            (0, 1)^T       2.7%
60    (0, 0)^T       3.32%            (π, 1)^T       0.89%
80    (0, 0)^T       2.92%            (π, 1)^T       0.98%
100   (0, 0)^T       4.87%            (π, 0)^T       0.45%
120   (π, 0)^T       2.38%            (0.63, 0)^T    0.57%
160   (0, 0)^T       0.67%            (π, 1)^T       0.83%
200   (π, 0.27)^T    0.54%            (π, 1)^T       0.78%

3.1.2 Extrapolation

In terms of extrapolation we depict the models trained with sizes in {10, 15, 40, 80, 100}
in Figure 3.4. Generally, it can be said that the error increases in positive t-direction.

For completeness we additionally display the maximal errors of the extrapolation

performed with all regarded models in Table 3.3. In case of size 10 (Fig. 3.4a-b),

the PINN achieves better extrapolation results than the NN. The error of the NN

increases up to 81.48%. In case of size 15 (Fig. 3.4c-d), both make errors of similar magnitude. In Figures 3.4e-f it can be seen that the NN extrapolates better than the PINN. Here, the NN has a maximal error of 47.12% compared to the

PINN's error of 57.09%. Looking at size 80 (Fig. 3.4g-h), the PINN achieves better


(a) NN: Size 10 (b) PINN: Size 10

(c) NN: Size 15 (d) PINN: Size 15

Figure 3.4: Extrapolated solution of PINNs and NNs trained with sizes in {10, 15}. The
models were trained for t ∈ [0, 1]. Top : Exact solution. Middle : Predicted solution.
Bottom : Relative L∞ error. (a)-(b): NN makes larger errors than PINN. (c)-(d): Both are

making similar errors.


(e) NN: Size 40 (f ) PINN: Size 40

(g) NN: Size 80 (h) PINN: Size 80

Figure 3.4: Extrapolated solution of PINNs and NNs trained with sizes in {40, 80}. The
models were trained for t ∈ [0, 1]. Top : Exact solution. Middle : Predicted solution.
Bottom : Relative L∞ error. (e)-(f): PINN makes larger errors than NN. (g)-(h): NN
makes larger errors than PINN.


(i) NN: Size 100 (j) PINN: Size 100


Figure 3.4: Extrapolated solution of PINN and NN trained with 100 training points.
The models were trained for t ∈ [0, 1]. Top : Exact solution. Middle : Predicted solution.
Bottom : Relative L∞ error. (i)-(j): PINN makes larger errors than NN.

Figure 3.5: Relative L2 error for extrapolation performed with the models trained with
different sizes of the training dataset.

extrapolation results with only a small maximal error of 8.64%. The error of the NN increases up to 31.7%. In case of size 100 (Fig. 3.4i-j), the NN is superior to the PINN. The maximal error of the PINN is 59.23% whereas the NN has only


an error of 15.37%.

In Figure 3.5 we plotted the relative L2 error over the number of training samples. One can see great fluctuations of the error for growing size of the training dataset. Thus, no specific rule can be deduced as to under which assumptions a NN or a PINN performs better in terms of extrapolation.

Table 3.3: Predictive errors of extrapolation performed with PINN and NN models trained for various numbers of training samples. The left column corresponds to the size of the training dataset; the maximal relative L∞ errors with their locations are depicted.

      NN                              PINN
size  (x, t)^T       max. error       (x, t)^T       max. error
10    (π, 5)^T       81.48%           (0, 4.75)^T    60.66%
15    (π, 5)^T       44.76%           (0, 5)^T       45.23%
20    (0, 5)^T       55.03%           (0, 5)^T       21.8%
30    (π, 5)^T       33.25%           (0.98, 5)^T    45.47%
40    (1.87, 5)^T    47.12%           (2.38, 5)^T    57.09%
60    (2.82, 5)^T    19.05%           (1.75, 5)^T    18.61%
80    (2.82, 5)^T    31.7%            (1.3, 5)^T     8.64%
100   (1.02, 5)^T    15.37%           (0.63, 5)^T    59.23%
120   (0, 5)^T       19.34%           (0, 5)^T       37.18%
160   (2.16, 5)^T    37.98%           (0, 5)^T       14.91%
200   (0, 4.8)^T     9.29%            (1.33, 5)^T    26.46%

3.1.3 Validation with DeepXDE

Within this section we validate our PyTorch implementation against the library DeepXDE mentioned in section 2.2.2. This is done in order to identify advantages or disadvantages of using DeepXDE compared to a self-made PINN implementation. Although DeepXDE provides the ability to use different backends, it fails to generate results with the backend PyTorch since relevant functions have not been implemented yet. Thus, for the following results we use the default backend TensorFlow 1.x. The network architecture is the same as the one used for the comparison of NN and PINN (see Tab. 2.1).
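For orientation, a minimal DeepXDE setup for this validation problem could look roughly as follows. This sketch is based on the publicly documented DeepXDE API and the architecture from Table 2.1; it is not the configuration used in the thesis, and exact module paths or argument names may differ between DeepXDE versions.

    import numpy as np
    import deepxde as dde

    geom = dde.geometry.Interval(0, np.pi)
    timedomain = dde.geometry.TimeDomain(0, 1)
    geomtime = dde.geometry.GeometryXTime(geom, timedomain)

    def pde(x, u):
        # Residual of u_t = u_xx (lambda = 1); x has columns (x, t).
        u_t = dde.grad.jacobian(u, x, i=0, j=1)
        u_xx = dde.grad.hessian(u, x, i=0, j=0)
        return u_t - u_xx

    bc = dde.icbc.DirichletBC(geomtime, lambda x: 0, lambda _, on_boundary: on_boundary)
    ic = dde.icbc.IC(geomtime, lambda x: np.sin(x[:, 0:1]), lambda _, on_initial: on_initial)

    data = dde.data.TimePDE(geomtime, pde, [bc, ic],
                            num_domain=100, num_boundary=50, num_initial=50)
    net = dde.nn.FNN([2] + [50] * 5 + [1], "tanh", "Glorot normal")

    model = dde.Model(data, net)
    model.compile("adam", lr=1e-4)
    model.train(iterations=20000)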

In Figure 3.6 the comparison of the predicted solutions is pictured. One can see that smaller errors are achieved by DeepXDE with a maximal error of 0.13% at (x, t)^T = (π, 1)^T. Our self-made implementation has a maximal error of 0.42%, also at (x, t)^T = (π, 1)^T. Looking at Table 3.4 it can be seen that DeepXDE is about


two times faster than the self-made implementation. Also, the relative L2 error of

DeepXDE is one order of magnitude smaller.

Table 3.4: Comparison of the training process of a PINN using DeepXDE and PyTorch.
Both were trained for 20000 epochs on the same train and test data with the Adam
optimizer and a learning rate of 0.0001.
train loss test loss time rel. L2 error

DeepXDE 1.04e-06 1.18e-06 112.8s 3.56e-04

Torch 2.42e-06 3.79e-06 273.3s 2.33e-03

(a) DeepXDE (b) PyTorch


Figure 3.6: Comparison of DeepXDE with own PINN implementation using PyTorch.
Top : Predicted solution. The black markers represent the used training data (100 data
points). Bottom : Relative L∞ error.


3.2 Fixed Initial and Various Boundary Conditions

We now address the generalization task which is the core of this thesis. At the moment we are looking at models trained with fixed initial conditions and boundary conditions with values in M := {0, 0.2, 0.4, 0.6, 0.8, 1}. Within this section, we also include investigations on the behavior of the training process when the PDE residual contains an additional nonlinear term. The following results are computed on a Tesla V100 SXM2 unless otherwise stated.

3.2.1 Initial Condition: Sine

The loss curves of models trained with sine as initial condition and training set

sizes in {20, 40, 60} are presented in Figure 3.7. For size 20 one can see a large gap

between train and test loss. This indicates a poor regularization and yields a model

that does not generalize well. Comparing size 40 to size 60, there is no significant difference between the test loss curves. In Table 3.5 we have displayed the runtime of

Figure 3.7: Plot of train and test losses of models trained with sine as IC and boundary
values in {0, 0.2, 0.4, 0.6, 0.8, 1}. The models were trained for sizes in {20, 40, 60} and 60000
epochs with Adam as optimizer and a learning rate of 0.0001.

each model and the train and test loss values of the respective best model, i.e. the

model at that training step which achieves the best test loss. For the given problem

setting the best test loss is achieved by size 60. The runtime increases linearly with


Table 3.5: Runtime of the models with sine as IC trained for 60000 epochs with sizes in
{20, 40, 60}. Additionally, the train and test loss values of the best models are depicted.
size runtime train loss test loss

20 15.3h 1.2e-04 3e-04

40 17.8h 3e-05 9.5e-05

60 19.5h 4e-05 7.35e-05

Table 3.6: Predictive errors of the model trained for solving the one-dimensional heat equation with sine as IC, boundary values in {0, 0.2, 0.4, 0.6, 0.8, 1} and size 60. The two left columns represent the left (l) and right (r) boundary values, respectively. Additionally, the maximal relative L∞ errors with their locations and the overall relative L2 error are depicted.

l     r     (x, t)^T          max. error    rel. L2 error
0.2   0.0   (1.65, 0.05)^T    0.7%          5.28e-03
0.3   0.0   (1.65, 0.05)^T    0.74%         5.51e-03
0.4   0.0   (1.62, 0.05)^T    0.74%         5.38e-03
0.4   0.4   (1.52, 0.02)^T    0.68%         4.47e-03
0.4   0.7   (1.55, 0.02)^T    0.62%         3.85e-03
0.4   0.9   (1.59, 0.06)^T    0.57%         3.24e-03
0.5   0.0   (1.62, 0.05)^T    0.69%         4.95e-03
0.6   0.7   (1.52, 0.01)^T    0.57%         3.64e-03
0.8   0.0   (0.29, 0.03)^T    0.46%         2.38e-03
0.8   0.3   (1.55, 0.02)^T    0.52%         3.41e-03
0.8   0.7   (1.52, 0.01)^T    0.53%         3.02e-03
1.0   0.7   (1.59, 0.01)^T    0.54%         2e-03
1.0   0.9   (1.62, 0.01)^T    0.52%         1.47e-03


growing size of the training dataset. For size 60 the training process has a runtime

of about 19.5 hours.

We are looking at the results obtained by the model trained with size 60. We predict the solutions of the one-dimensional heat equation with BCs in [0, 1] and stepsize 0.1, resulting in 121 cases. Some examples of predicted solutions with their relative L∞ error are pictured in Figure 3.8. In Table 3.6, we additionally show the maximal relative L∞ errors and their locations, as well as the relative L2 error. For better readability, in the following we refer to the maximal relative L∞ error as the maximal error and to the relative L2 error as the overall error. By l, r we denote the left and right boundary value, respectively.

For all cases we have only a small maximal error of less than 1%. The largest (maximal and L2) error can be observed for case l = 0.3, r = 0.0, shown in Figure 3.8b. Here, the maximal error is 0.74% and the relative L2 error is 5.51 × 10−3. A similar error can be observed for cases l = 0.2, r = 0.0 and l = 0.4, r = 0.0, which are part of the training data. This can be seen in Figures 3.8a, c and Table 3.6. It turns out that for fixed l ∈ {0.2, 0.3, 0.4, 0.5} the overall error is largest for r = 0.0 and then decreases for growing r, as indicated by Figures 3.8c-f. This behavior changes for l ∈ {0.6, 0.7, 0.8, 0.9}. Here, the overall error increases up to a certain point and then starts decreasing again, which is shown in Figures 3.8h-j. The smallest relative L2 error of 1.47 × 10−3 is achieved by case l = 1.0, r = 0.9 (see Fig. 3.8l), which is not part of the training data.


(a) l=0.2, r=0.0 (b) l=0.3, r=0.0

(c) l=0.4, r=0.0 (d) l=0.4, r=0.4

(e) l=0.4, r=0.7 (f ) l=0.4, r=0.9

Figure 3.8: Predicted solutions of a PINN trained with sine as IC and 60 training points
for various left (l) and right (r) boundary values. The black markers represent the used
training points. Cases where no markers are visible are not part of the training data. Top :
Prediction. Bottom : Relative L∞ error. (a)-(c): Minor errors at around x = π/2 near initial
condition. Error vanishes in positive t-direction. (c)-(f): Error vanishes for fixed l and
growing r.


(g) l=0.6, r=0.7 (h) l=0.8, r=0.0

(i) l=0.8, r=0.3 (j) l=0.8, r=0.7

(k) l=1.0, r=0.7 (l) l=1.0, r=0.9

Figure 3.8: Predicted solutions of a PINN trained with sine as IC and 60 training points
for various left (l) and right (r) boundary values. The black markers represent the used
training points. Cases where no markers are visible are not part of the training data. Top :
Prediction. Bottom : Relative L∞ error. For all cases one can only see minor errors smaller
than 0.6%. (g): Minor errors (up to 0.57%) near initial condition. (h)-(j): The overall error
decreases for fixed l and growing r. (l): Case of smallest relative L2 error (1.47 × 10−3).


3.2.2 Initial Condition: Line with Discontinuous Boundaries

In Figure 3.9 the train and test losses of the models trained with sizes in {40, 60, 80} are pictured. As initial condition we choose the line u_0(x) = −x + 1, where the boundary has varying values in M. For sizes 60 and 80 the loss curves show an overfitting of the models, which is indicated by the large gap between the train and test losses, respectively. For size 60 this gap is larger. Here, the best model is already achieved at epoch 8700. The corresponding loss values and the runtimes are depicted in Table 3.7. For this setting we do not consider a training beyond 20000 epochs since this amount is already sufficient for the test loss curves to converge, as can be seen in Figure 3.9. For a size of 80 the runtime of the training process is 7.2 hours.

Figure 3.9: Train and test losses of models trained with the line u0 (x) = −x + 1 as IC and
boundary values in {0, 0.2, 0.4, 0.6, 0.8, 1}. The models were trained for sizes in {40, 60, 80}
and 20000 epochs with Adam as optimizer and a learning rate of 0.0001.

We consider results obtained by the model trained with size 80. As in the section

before, we predict the solutions of the one-dimensional heat equation with BCs in

[0, 1] and stepsize 0.1 resulting in 121 cases. In Figure 3.10, some cases of predicted

solutions with the respective relative L∞ error are pictured. The corresponding

maximal errors with their locations as well as the relative L2 errors can be found in

Table 3.8.

For all cases one can see a large error in those areas where the discontinuities occur.


(a) l=0.0, r=0.0 (b) l=0.3, r=0.0

(c) l=0.3, r=0.5 (d) l=0.3, r=0.9

(e) l=0.0, r=0.2 (f ) l=0.4, r=0.2

Figure 3.10: Predicted solutions of a PINN trained with the line u0 (x) = −x + 1 as IC
and 80 training points for various left (l) and right (r) boundary values. The black markers
represent the used training points. Cases where no markers are visible are not part of the
training data. Top : Exact solution. Middle : Prediction. Bottom : Relative L∞ error.
(a): Largest maximal error (63.13%) and largest overall error (1.28 × 10−1 ) of all cases.
(b)-(d): Large errors (up to 40.98%) near the boundaries where the discontinuities occur.
For growing r the error areas near the initial condition decrease.


(g) l=0.7, r=0.2 (h) l=1.0, r=0.2

(i) l=0.9, r=0.0 (j) l=0.9, r=0.1

(k) l=0.9, r=0.2 (l) l=0.9, r=0.3

Figure 3.10: Predicted solutions of a PINN trained with the line u0 (x) = −x + 1 as IC
and 80 training points for various left (l) and right (r) boundary values. The black markers
represent the used training points. Cases where no markers are visible are not part of the
training data. Top : Exact solution. Middle : Prediction. Bottom : Relative L∞ error.
(i)-(l): Errors (up to 9.62%) are made near the boundaries where the discontinuities occur.
For growing r the error areas decrease or vanish.


(a) l=0.0, r=0.0 (b) l=0.3, r=0.0

(c) l=0.3, r=0.5 (d) l=0.3, r=0.9

(e) l=0.9, r=0.1 (f ) l=0.9, r=0.3

Figure 3.11: Line plots of exact and predicted solutions of the one-dimensional heat
equation with the line u0 (x) = −x + 1 as IC and discontinuous boundaries depicted at
time steps t = 0, 0.1, 0.2. (a)-(b): For t = 0 the prediction becomes negative at boundary
x = 1. (c)-(f): The larger the difference between the line and BC, the larger the error at
the corresponding boundary.


Table 3.7: Runtime of the models trained with the line u0 (x) = −x + 1 as IC and sizes
in {40, 60, 80} for 20000 epochs. Additionally, the train and test loss values of the best
models are depicted.
size runtime train loss test loss

40 5.9h 2.78e-03 3.67e-03

60 6.7h 3e-03 3.48e-03

80 7.2h 5e-04 1.46e-03

Table 3.8: Predictive errors of the model trained for solving the heat equation u_t = λu_xx with the line u_0(x) = −x + 1 as IC, boundary values in {0, 0.2, 0.4, 0.6, 0.8, 1} and a training size of 80. The two left columns represent the left (l) and right (r) boundary values, respectively. Additionally, the maximal relative L∞ errors with their locations and the overall relative L2 error are depicted.

l     r     (x, t)^T        max. error    rel. L2 error
0.0   0.0   (0, 0.01)^T     63.13%        1.28e-01
0.3   0.0   (0, 0.01)^T     40.98%        5.1e-02
0.3   0.5   (0, 0.01)^T     30.28%        2.3e-02
0.3   0.9   (0, 0.01)^T     31.11%        2.04e-02
0.9   0.0   (1, 0.03)^T     8.3%          1.66e-02
0.9   0.1   (0, 0.03)^T     6.24%         1.06e-02
0.9   0.2   (1, 0.01)^T     6.25%         8.1e-03
0.9   0.3   (1, 0.01)^T     9.62%         7.58e-03
0.0   0.2   (0, 0.01)^T     53.94%        7.39e-02
0.4   0.2   (0, 0.01)^T     29.41%        2.74e-02
0.7   0.2   (0, 0.01)^T     11.18%        1.15e-02
1.0   0.2   (0, 0.02)^T     9.49%         1.29e-02

The largest maximal error of 63.13% is obtained by case l = 0.0, r = 0.0, depicted in Figure 3.10a. Depending on the boundary condition, the error propagates further in positive t-direction. For a larger difference between the boundary values and the line, or in cases where the right boundary value, i.e. at x = 1, tends to zero, this propagation is more distinct. Note that for most cases, the error in other areas is about 1% or smaller. As already mentioned, the largest maximal error is made in case of l = 0.0, r = 0.0. In addition, this case has an overall error of 0.128, which is also the largest compared to the other cases. The prediction with the lowest relative L2 error (7.58×10−3) is obtained by case l = 0.9, r = 0.3 (Fig. 3.10l), as depicted in Table 3.8.


In Figure 3.11 we additionally display for some cases how the model tries to fit the BC at t = 0. Especially when comparing 3.11c and 3.11d, one can see that the larger the difference between the line and the boundary value, the larger the L∞ error at the respective boundary. This can be verified by looking at the corresponding values for fixed l = 0.9 or r = 0.2 in Table 3.8. An exception is provided by the cases where r = 0.0, i.e. u(1, 0) = 0. Although the difference tends to zero here, the prediction becomes negative in these cases, leading to a larger error.


3.2.3 Additional Nonlinear Term: Radiative Loss

We investigate the heat equation with an additional nonlinear term, i.e. u_t = λu_xx − σu^4. Here, we consider models trained with sine as initial condition and boundary values in M. The sizes of the training datasets are in {30, 40, 60}. In Figure 3.12 one can see the train and test losses of the considered models. For all cases the loss curves remain within the same order of magnitude. However, for epochs larger than 40000 the curves start to deviate slightly from each other. The best model is achieved for size 60. In Table 3.9 the runtimes of the models are displayed. The duration of the training process for size 60 is 19.6h, which is similar to the case with sine as IC and without the additional nonlinear term.

Figure 3.12: Train and test losses of models trained to solve the nonlinear heat equation
ut = λuxx − σu4 with sine as IC and boundary values in {0, 0.2, 0.4, 0.6, 0.8, 1}. The
models were trained for sizes in {30, 40, 60} and 60000 epochs with Adam as optimizer and
a learning rate of 0.0001.

Table 3.9: Runtime of the models trained with an additional nonlinear term with sizes
in {30, 40, 60} for 60000 epochs. Additionally, the train and test loss values of the best
models are depicted.
size runtime train loss test loss

30 17.2h 1.6e-04 5.5e-04

40 18h 6e-05 2.3e-04

60 19.6h 7e-05 1.6e-04

We are looking at the results obtained by the model trained with size 60. As


before, we predict the solution for boundary values in [0, 1] with stepsize 0.1. In

Figure 3.13 we have displayed some examples of the 121 cases. We further show

the corresponding overall and maximal errors with the respective locations in Table

3.10.

Table 3.10: Predictive errors of the model trained for solving the nonlinear heat equation u_t = λu_xx − σu^4 with sine as IC, boundary values in {0, 0.2, 0.4, 0.6, 0.8, 1} and size 60. The two left columns represent the left (l) and right (r) boundary values, respectively. Additionally, the maximal relative L∞ errors with their locations and the overall relative L2 error are depicted.

l     r     (x, t)^T          max. error    rel. L2 error
0.1   0.0   (2.79, 0.01)^T    1.78%         1.24e-02
0.1   0.2   (0.48, 0.05)^T    0.99%         1.02e-02
0.1   0.6   (1.68, 0.06)^T    1.3%          9.66e-03
0.1   0.9   (1.94, 0.03)^T    1.96%         1.24e-02
0.3   0.1   (2.79, 0.01)^T    1.18%         8.29e-03
0.5   0.0   (1.36, 0.06)^T    1.24%         9.63e-03
0.8   0.0   (1.33, 0.04)^T    1.73%         1.09e-02
0.8   0.2   (1.43, 0.04)^T    2.14%         1.31e-02
0.8   0.6   (1.55, 0.03)^T    2.77%         1.66e-02
0.8   0.9   (1.52, 0.02)^T    3.28%         1.95e-02
1.0   0.0   (0, 0)^T          5.12%         1.26e-02
1.0   1.0   (1.49, 0.02)^T    4%            2.25e-02

It appears that for fixed l ∈ {0, 0.1, 0.2, 0.3, 0.4} and r = 0.0, the prediction shows a larger error close to the boundaries compared to other regions of the domain. For growing r this error decreases. Instead, the error concentrates on the area around (x, t)^T = (π/2, 0.05)^T, as can be seen in Figures 3.13a-d. Figure 3.13f shows that this pattern is only barely present for l = 0.5: here, one can already see the concentration of the error in the aforementioned area for r = 0.0, with a maximal error of 1.24%. For a fixed l > 0.5 and r = 0.0, the relative L∞ error is largest in the same area as mentioned before. For growing r this area increases, which is shown by Figures 3.13g-j. The largest relative L2 error of 2.25 × 10−2 is obtained by case l = 1.0, r = 1.0. The smallest overall error of 8.29 × 10−3 is achieved by the case l = 0.3, r = 0.1 shown by Figure 3.13e.
In Figure 3.14 we show, for various cases, line plots of the exact and predicted solutions at time levels t ∈ {0, 0.1, 0.2}. The functions shown in Figures 3.14a-b assume the shape of a parabola. In these cases the errors are mainly made


(a) l=0.1, r=0.0 (b) l=0.1, r=0.2

(c) l=0.1, r=0.6 (d) l=0.1, r=0.9

(e) l=0.3, r=0.1 (f ) l=0.5, r=0.0

Figure 3.13: Predicted solutions of a PINN trained for solving the nonlinear heat equation
ut = λuxx − σu4 with sine as IC and 60 training points for various left (l) and right (r)
boundary values. All depicted cases represent unseen data. Top : Prediction. Bottom :
Relative L∞ error. (a): Large errors (up to 1.78%) at the boundaries. (b)-(d): For
growing r the error translocates to the increasing area around (x, t)^T = (π/2, 0.05)^T. (e):
Case with smallest relative L2 error of 8.29e-03. (f): Errors mainly near boundary x = π
and the area around (x, t)^T = (π/2, 0.05)^T.


(g) l=0.8, r=0.0 (h) l=0.8, r=0.2

(i) l=0.8, r=0.6 (j) l=0.8, r=0.9

(k) l=1.0, r=0.0 (l) l=1.0, r=1.0

Figure 3.13: Predicted solutions of a PINN trained for solving the nonlinear heat equation
ut = λuxx − σu4 with sine as IC and 60 training points for various left (l) and right (r)
boundary values. The black markers represent the used training points. Cases where no
markers are visible are not part of the training data. Top : Prediction. Bottom : Relative
L∞ error. (g)-(j): Large error in the area around (x, t)^T = (π/2, 0.05)^T. Increasing area of
error for growing r. (k): Case with largest relative L∞ error (5.12%) made at (0, 0)^T. (l):
Case with largest relative L2 error of 2.25 × 10−2 .


(a) l=0.1, r=0.2 (b) l=0.3, r=0.1

(c) l=0.8, r=0.6 (d) l=0.8, r=0.9

(e) l=1.0, r=0.0 (f ) l=1.0, r=1.0

Figure 3.14: Line plots of exact and predicted solutions of the nonlinear heat equation
ut = λuxx − σu4 with sine as IC, boundary values in {0, 0.2, 0.4, 0.6, 0.8, 1} and size 60
depicted at time steps t = 0, 0.1, 0.2. One can see that the more peaked the indentations
at the boundaries, the larger the error at the vertex of the parabolic shaped functions.
(a)-(b): Minor errors of less than 1% around x = π/2 for t ∈ {0.1, 0.2}. (c)-(d): Larger
error of about 3% around x = π/2 for t ∈ {0.1, 0.2}. (e): Large deviation between exact and
predicted solution at (0, 0)^T. (f): Large error around x = π/2.


at the vertex of the parabola. Note that these are only minor errors as mentioned above. In the cases shown in Figures 3.14c-f, for t > 0 the shape of the parabola is still recognizable. However, indentations are visible at the boundaries. For 3.14d,f these indentations occur at both boundary points, and in 3.14c,e they appear especially at the left boundary, i.e. x = 0. Note that in case of l = 1.0, r = 0.0 the prediction already tries to simulate this indent at (0, 0)^T, resulting in the largest maximal error of all cases (see Fig. 3.14e). Regarding these indents, one can see that the more peaked the indentations, the greater the error between the exact and predicted solutions at the vertex of the parabola.


3.3 Various Boundary and Initial Conditions

We present the results obtained by training a PINN such that it is able to solve

the one-dimensional heat equation ut = λuxx for arbitrary initial and boundary

conditions. Here, we trained the models with several initial conditions represented

by six of their corresponding Fourier coefficients and sizes in {40, 60, 80}. In Figure

3.15 we have plotted the losses of the considered models. One can see that all

Figure 3.15: Train and test losses of models trained with several initial functions parame-
terized by six Fourier coefficients and boundary values given implicitly by the corresponding
initial functions. The models were trained for sizes in {40, 60, 80} and 60000 epochs with
Adam as optimizer and a learning rate of 0.0001.

curves still have potential to decrease further. The train and test curves of each considered size remain close to each other. It appears that the test curves of the models trained with sizes 60 and 80 almost lie on top of each other. The best model is achieved by size 80. In Table 3.11 we have displayed the runtimes of the considered

Table 3.11: Runtime of the models trained with several initial functions parameterized
by Fourier coefficients and sizes in {40, 60, 80} for 60000 epochs. Additionally, the train
and test loss values of the best models are depicted.
size runtime train loss test loss

40 10d 23h 8.4e-04 7.25e-04

60 16d 22h 4.45e-04 4.51e-04

80 19d 11h 3.6e-04 4.44e-04


models. This setting required a tremendous amount of time for training. The model

trained with 80 points per function required about 19.5 days for 60000 epochs.

In the following we present results achieved by the model trained with size 80.

In this setting we consider the training and test data separately. Note that the numbering of functions is independent between the two sets, i.e. No. 1 from the training data is a different function than No. 1 from the test data.

Training Data
In Figures 3.17 and 3.18 we have displayed some of the 2065 initial functions we

have used to train the model. In the former we collect some functions which the

model managed to learn, in the sense that the prediction fits the shape of the exact

solution, and in the latter we exemplarily show cases where the model fails to predict

the correct solution. In Table 3.12 we depict the errors for some of the functions.

Table 3.12: Predictive errors of the generalized model trained for solving the one-dimensional heat equation u_t = λu_xx with several initial functions parameterized by six Fourier coefficients and size 80. The model was evaluated on training data. The left column represents the function number. Additionally, the maximal relative L∞ errors with their locations and the overall relative L2 error are depicted.

No.    (x, t)^T          max. error    rel. L2 error
69     (0, 0)^T          6.74%         1.03e-02
280    (0, 0)^T          12.12%        1.97e-02
314    (0.49, 0)^T       25.86%        7.99e-02
369    (0.51, 0)^T       21.37%        2.4e-02
384    (0.19, 0)^T       5.54%         1.2e-02
432    (0.83, 0.29)^T    29.8%         1.21e-01
794    (0, 0)^T          4.7%          5.6e-03
1085   (0, 0)^T          1.85%         3.75e-03
1166   (0.66, 0.01)^T    0.72%         4.82e-03
1581   (0.59, 0.01)^T    1.86%         4.46e-03
1711   (0, 0)^T          1.7%          4.41e-03
1946   (1, 0)^T          1.71%         5.12e-03
2050   (0.7, 0)^T        1.57%         4.51e-03

Regarding the functions shown in Figure 3.17, it appears that for odd functions

the prediction at t = 0 fits the exact solution better than for even functions. Larger

errors are especially made at the boundaries here. This can be seen in Figures


3.17a, b, e, h and also in Figures 3.18a-d where even functions are depicted whose

prediction error is large. Function No. 432 (Fig. 3.18c) even exhibits the largest
−3
overall error of 0.121. The smallest relative L2 error of 3.75×10 is achieved by

function No. 1085 (see Fig. 3.17k).

For better visualization of the model's performance on the training data, we show a histogram in Figure 3.16. About 90% of all 2065 functions achieve an L2 error of less than 2%. The remaining functions still have an overall error of less than 10%.

Figure 3.16: Histogram representing the frequency distribution of the (relative L2 ) error.
Considered are the predictions of the model evaluated at the 2065 functions that were used
as initial conditions for training.


(a) No. 384 (b) No. 280

(c) No. 1166 (d) No. 1976

(e) No. 282 (f ) No. 2050

Figure 3.17: Line plots of exact and predicted solutions of the model trained for solving
the one-dimensional heat equation with various ICs parameterized by six Fourier coefficients
and size 80. Depicted are functions at time levels t = 0, 0.1, 0.2 which were used
for training. One can see relatively small prediction errors. (a),(b),(e): Even functions
provide larger errors, especially at the boundaries.


(g) No. 1852 (h) No. 289

(i) No. 345 (j) No. 743

(k) No. 1085 (l) No. 794

Figure 3.17: Line plots of exact and predicted solutions of the model trained for solving
the one-dimensional heat equation with various ICs parameterized by six Fourier coefficients
and size 80. Depicted are functions at time levels t = 0, 0.1, 0.2 which were used for
training. (g)-(j): Even functions make larger errors than odd functions. (k): Case with
the smallest relative L2 error (3.75×10−3).


(a) No. 69 (b) No. 314

(c) No. 432 (d) No. 369

(e) No. 1581 (f ) No. 1711

Figure 3.18: Line plots of exact and predicted solutions of the model trained for solving
the one-dimensional heat equation with various ICs parameterized by six Fourier coefficients
and size 80. Depicted are functions at time levels t = 0, 0.1, 0.2 which were used for
training. One can see large prediction errors.


Test Data
We now illustrate the performance of the model on unseen data. The test data contains functions that are similar to the functions contained in the training data except that they have a different scaling. As in case of the training data, we show test functions that are well-fitted and functions with a large prediction error separately. In Figure 3.19 some of the functions which show a low prediction error are depicted. It is noticeable that this collection includes especially odd functions. We already observed in the training data that smaller prediction errors are made for odd functions compared to even ones. The best prediction results are achieved by functions No. 289 and 391 (see Figs. 3.19b, a and Tab. 3.13). No. 391 has an overall error of only 4.24×10−3, whereas No. 289 has the smallest relative L2 error of 3.75×10−3. Note that this is the same smallest overall error that was achieved on the training data as well. In Figure 3.20 one can see some of the functions with a poor prediction result. Particularly function No. 140 (see Fig. 3.20d) has a large prediction error with an L2 error of 264%. Unlike the previously presented functions, No. 122 and 98 (see Figs. 3.20a, b) have their maximal error not near the initial condition. For function No. 122 it is near t = 0.67 with a maximal error of 86.11%, and in case of No. 98 it is at t = 1.

Table 3.13: Predictive errors of the generalized model trained for solving the one-dimensional heat equation u_t = λu_xx with several initial functions parameterized by six Fourier coefficients and size 80. The model was evaluated on test data. The left column represents the function number. Additionally, the maximal relative L∞ errors with their locations and the overall relative L2 error are depicted.

No.    (x, t)^T          max. error    rel. L2 error
18     (1, 0)^T          141.21%       5.33e-01
84     (0, 0)^T          12.12%        2e-02
98     (0.01, 1)^T       21.78%        3.3e-01
120    (1, 0)^T          110.74%       6.63e-01
122    (0.85, 0.67)^T    86.11%        7.62e-01
140    (0, 0)^T          401.17%       2.64e-00
176    (0.76, 0)^T       29.76%        5.37e-02
208    (0, 0)^T          6.11%         1e-02
280    (0.63, 0.02)^T    1.25%         5.49e-03
289    (0, 0)^T          1.92%         3.75e-03
314    (0, 0)^T          3.49%         4.5e-03
391    (0.65, 0)^T       0.99%         4.24e-03


(a) No. 391 (b) No. 289

(c) No. 84 (d) No. 208

(e) No. 280 (f ) No. 314

Figure 3.19: Line plots of exact and predicted solutions of the model trained for solving
the one-dimensional heat equation with various ICs parameterized by six Fourier coefficients
and size 80. The model was evaluated on test data. Depicted are functions at time
levels t = 0, 0.1, 0.2 whose prediction is close to the exact solution.


(a) No. 122 (b) No. 98

(c) No. 120 (d) No. 140

(e) No. 176 (f ) No. 18

Figure 3.20: Line plots of exact and predicted solutions of the model trained for solving
the one-dimensional heat equation with various ICs parameterized by six Fourier coefficients
and size 80. The model was evaluated on test data. One can see large prediction
errors. (a)-(b): Functions depicted at time levels t = 0, 0.5, 1 which have a large error in
areas different from t = 0. (c)-(f): Functions depicted at time levels t = 0, 0.1, 0.2.


For the test data, we also show a histogram in Figure 3.21, which depicts the number of functions having a certain relative L2 error. Here, about 75% of the functions show an error of less than 2%. However, more than 12.5% of all functions show an error larger than 10%. Of all functions, 1.5% even have an error larger than 100%.

Figure 3.21: Histogram representing the frequency distribution of the (relative L2 ) error.
Shown are the prediction errors of the model evaluated at 392 test functions.

4 Discussion
The main purpose of this thesis is to investigate the question whether PINNs can be generalized such that they are able to solve several PDE problems, i.e. to solve a PDE with various boundary and initial conditions. The obtained results allow only a partial answer to this rather general question. Besides, separately from the generalization task, we examined what the added value of a PINN is compared to a standard NN. To this end, we evaluate and interpret the key findings of the previously presented results by answering these two main questions through six separate subquestions.

1) How many training points are required in order to gain a reliable model? For size 15 (see Fig. 3.2a), the large gap between train and test loss for both the PINN and NN indicates an insufficient regularization which could lead to a poor generalization. Here, the smaller gap for the PINN suggests that a better regularization has been achieved than for the NN. This was to be expected since the PINN loss contains the PDE residual which acts as a regularization mechanism, as mentioned in section 2.2.1 and in [4]. The corresponding solution plots (see Fig. 3.3a-b) and errors depicted in Tables 3.1 and 3.2 allow the conclusion that for the considered problem already 15 training points are sufficient for a PINN to learn the problem. This can be justified by the fact that the prediction was performed on a 100 × 100 mesh, i.e. 10000 data points, of which only 15 were used for training, and nevertheless a small (relative L2) error of 5.7×10−3 can be achieved. It should be noted that no boundary or initial data was used for training, which causes the NN to make large errors at the boundaries (up to ≈11%). Still, the PINN achieves good prediction accuracy at the boundaries with a maximal L∞ error of only 2%.

It should be mentioned that this question is highly problem dependent. We considered the one-dimensional heat equation, a linear PDE, with sine as IC and zero Dirichlet BCs. From the mathematical point of view, this problem can be regarded as an "easily" solvable problem due to the linearity of the PDE and since the initial condition is smooth and infinitely often differentiable. Also, the boundary conditions are simple compared to Neumann or Robin BCs which require derivative evaluations themselves. It can be assumed that for more complex problems the required number of training samples increases. This assumption is supported by Lu et al. [16], who mention that in case of Burgers' equation, a nonlinear PDE, it should be considered to put more training points near the sharp front which is characteristic for this PDE such that the model is able to learn the discontinuity well. Also, Raissi, Perdikaris, and Karniadakis note in [4] that "the total number of collocation points needed [. . .] will increase exponentially" for higher dimensional problems.

2) Do PINNs have better extrapolation capabilities than NNs? At first glance, Figures 3.4a, b suggest that PINNs extrapolate better than NNs due to the smaller error and the fact that the shape of the sine is mostly maintained in case of the PINN. However, this suggestion is not generally valid. This becomes clear for increasing size of the training dataset, especially when looking at Figure 3.5 where the relative L2 error is depicted for all considered sizes. The zigzag pattern shows that neither the NN nor the PINN consistently performs better in terms of extrapolation, so no definitive answer can be given here. This was to be expected, however, since in general NNs do not have good extrapolation capabilities [40] and PINNs are a special variant of NNs.

3) When are PINNs superior to NNs? In question 1 we already pointed to the difference between PINN and NN for size 15, shown by the losses, solution plots, and the corresponding errors. In this case, the PINN is superior to the NN due to the observed better generalization. This observation can also be made for size 30 (see Fig. 3.2b), where a still relatively large distance between the PINN and NN test loss is visible. Also, the solution plots (see Fig. 3.3c-d) and errors (Tabs. 3.1, 3.2) lead to a similar conclusion as for size 15. However, from size 40 on, one can see that the PINN and NN perform similarly. This is indicated by the losses (see Fig. 3.2c) and underpinned by the corresponding values (test loss, rel. L2 error, maximal rel. L∞ error), which differ only slightly from each other. For the considered problem (PDE), a size of 40 training samples represents the threshold from which the PINN no longer provides a significant advantage over the respective NN. Although the solution plots and error values for sizes larger than 30 are more promising in case of PINNs, a severe bottleneck can be observed in terms of computational runtime. Already in case of 40 training points, the training process of the NN is about three times faster. This can be tied to the fact that for PINNs the gradient has to be evaluated in each iteration. Figure 3.1b displays that for PINNs the computational time increases linearly with growing size of the training dataset, compared to the NN which shows no significant increase of time for sizes up to 200. This leads to the conclusion that PINNs are superior to NNs for small sizes of the training dataset, where the term "small" has to be seen relative to the considered problem. This statement is also confirmed by Raissi, Perdikaris, and Karniadakis [4], who mention that one of the key properties of PINNs is that they are able to achieve good prediction results even when only a small amount of data is available, where standard NNs usually perform worse.

Can a PINN be generalized to solve the 1D heat equation for . . .


4) . . . fixed initial and various boundary conditions? The results from section 3.2 indicate with the aid of three cases that if the initial condition is fixed, it is

indeed possible to generalize a PINN to solve the one-dimensional heat equation for

various BCs. In the following, we explain this statement on the basis of these cases

in more detail.

Initial Condition: Sine


The solution plots (see Fig. 3.8) and error values depicted in Table 3.6 allow the conclusion that for sine as IC it is possible to generalize a PINN when varying the boundary conditions. This is due to the fact that for all possible boundary values only minor errors are made, especially for cases where no training data was used at all, e.g. l=1.0, r=0.9, as seen in Figure 3.8l. This can also be verified by looking at the relative L2 errors (see Tab. 3.6), which are all of order 10−3, and the maximal L∞ errors, which for all cases are less than 1%. This behavior fits the expectation made beforehand, regarding the fact that NNs have good interpolation capabilities in general [40]. The boundary values for which training data was used are in {0, 0.2, . . . , 1}. For future work, it would be interesting to consider larger distances between the values, e.g. {0, 0.5, 1}, and look at the evolution of the resulting generalization errors.

Initial Condition: Line with Discontinuous Boundaries


Due to the discontinuity at the boundaries, large errors up to 63.13% (see Tab. 3.8)

occur in the region around the IC. This indicates that too few training points have

been selected in the areas where the discontinuities occur and thus the PINN is not

able to learn the discontinuity well. It can be assumed that adding more points in

those areas can improve the performance since it has already been shown with the

example of Burger's equation [4] that a PINN is able to learn discontinuities. A

possible reason might also be the fact that no form of discontinuity appears in the

75
Discussion

architecture of the network. The hyperbolic tangent which serves as activation func-

tion is continuous and innitely often dierentiable. In some future work, one can

investigate if this behavior changes when using an activation function which is also

not dierentiable in some point, e.g. the ReLU function which is not dierentiable

at x = 0. However, although the large errors near the boundaries lead to a larger

overall error, the relative L2 error stays within a reasonable order such that we can

argue that in this case the generalization is possible as well.
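Swapping the activation function as suggested above is straightforward. The following PyTorch sketch is purely illustrative (the thesis itself works with tanh networks; the width and depth are placeholders) and only shows how the nonlinearity would be exchanged; note that ReLU has zero second derivative almost everywhere, which is relevant when the PDE residual contains u_xx.

import torch

def make_net(activation=torch.nn.Tanh, width=20, depth=3):
    """Fully connected network (x, t) -> u with a configurable activation class."""
    layers, in_dim = [], 2
    for _ in range(depth):
        layers += [torch.nn.Linear(in_dim, width), activation()]
        in_dim = width
    layers.append(torch.nn.Linear(width, 1))
    return torch.nn.Sequential(*layers)

net_tanh = make_net(torch.nn.Tanh)  # smooth, infinitely differentiable
net_relu = make_net(torch.nn.ReLU)  # not differentiable at 0; second derivative vanishes a.e.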

Additional Nonlinear Term: Radiative Loss


We compare this case (Case 3) to the case with sine as IC (Case 1) since the only

difference between these two cases is the nonlinear term. Thus, we can appropriately
expose the influence of the nonlinearity on the model. Comparing the solution

plots of Case 1 (see Fig. 3.8) and Case 3 (see Fig. 3.13), one can see that the
nonlinearity causes larger errors in general, especially near the initial condition. A reason

for that could be the fact that in these areas the graph of the function changes
strongly, as can be seen in Figures 3.14i,j,l. This means that in these areas the

gradient is steep and therefore more training points should be set at these locations

which is not the case as pictured in Figures 3.13i,l where the used training points

are displayed. This was already mentioned in question 1 where we emphasized that

for areas with steep gradients more training points are required. However, this can

only be done if the shape of the function is known in advance, which is generally not

the case in practical applications. Nevertheless, the PINN provides good prediction

accuracy (see Tab. 3.10) such that it can be stated that for this setting it is possible

to generalize the model.
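Regarding the placement of training points in steep-gradient regions: one way to do this without knowing the shape of the solution in advance is residual-based refinement in the spirit of the RAR method provided by DeepXDE. The sketch below illustrates only the principle and is not DeepXDE's implementation; the unit space-time domain and the residual_fn callable (e.g. the PDE residual of the current network) are assumptions.

import torch

def select_new_points(residual_fn, n_candidates=10000, n_add=50):
    """Evaluate the PDE residual on random candidates and keep the worst points."""
    x = torch.rand(n_candidates, 1)   # assumed spatial domain [0, 1]
    t = torch.rand(n_candidates, 1)   # assumed time interval [0, 1]
    res = residual_fn(x, t).abs().squeeze().detach()
    idx = torch.topk(res, n_add).indices   # candidates where the PDE is violated most
    return x[idx], t[idx]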

5) . . . various boundary and initial conditions? Considering the line plots

and corresponding errors it seems that odd functions are easier for the model to

learn than even functions. In particular, Table 3.12 shows that on the training data the
relative L2 error for even functions is at least one order of magnitude higher than

for odd functions. This behavior can be observed on the test data as well. However,

this is not a statement that holds in general. Looking for example at Figure 3.20f,

one can see an odd function which shows large discrepancies between the prediction

and the exact solution. Since a similar function contained in the test data (Fig. 3.19d)
achieved an L2 error of 1%, the model architecture seems to be sufficient for the

model to learn this kind of function. Thus, it can be assumed that further training

can improve the prediction accuracy in this case. Additionally, it appears that very

large errors are made for functions whose shape seems to be "easy", e.g. the shape

of a parabola (see Fig. 3.20a). Specifically, function No. 140 (Fig. 3.20d) shows a


prediction result which is not even close to optimal, with an L2 error of 264%,

although a similar function contained in the training data (see Fig. 3.17i) has been

learned well. A possible reason for that could be the frequency distribution of the

functions within the training data. The training data tends to include more odd
functions than even ones. For future work, it should be considered to choose a

training dataset with an equal distribution of odd and even functions.

The histogram representing the frequency distribution of the (relative L2 ) error

on the training data (see Fig. 3.16) shows that at least 90% of all initial functions

achieve an error of less than 2%. Assuming that an error of 2% can be regarded

as a measure of "well-learned", one can say that the model is able to learn these

kinds of functions. At first glance, this is a promising result. However, regarding

the histogram representing the test functions (Fig. 3.21), one can see that 1.5% of

all test functions (6 out of 392) have an L2 error larger than 100%. Although these
few functions are not representative of the overall performance of the model, this
makes the model unreliable. Apart from the fact that further training could

improve the overall prediction accuracy, the obtained results do not allow us to give a
well-founded answer here concerning the generalization question of this thesis.

6) Is it advisable to use PINNs for a generalization task as proposed
in this thesis? Independently of whether PINNs can be generalized in
general, the computational cost for the training process is tremendous. Certainly,

there is still room for optimization, e.g. improving the efficiency of the implementation

code or optimizing the training dataset. Still, a duration of several weeks for the

training process cannot be ignored. An important point to mention here is that we

only investigated problems in one dimension. For higher dimensions the runtime will

grow exponentially, which makes this whole process inapplicable in practice. For

real-world applications we recommend considering alternative methods for the
generalization of PDEs to different instances of initial and boundary conditions,

respectively.

4.1 Alternative Method: Learning Nonlinear

Operators

Due to the necessary gradient evaluation in each iteration the training process is

exceedingly time-consuming. Although the presented methods could be further im-

proved in terms of accuracy and runtime, it should be considered to attempt alter-

native approaches. Dealing with the generalization capabilities of PINNs, we came


across another method tackling a similar generalization task. We briefly introduce
the method of learning nonlinear operators in order to solve a (partial) differential
equation for several instances of the equation. As with standard NNs, only data is
required; the considered equation itself is not taken into account. The basis is an
approximation result which states that a neural network with a single hidden layer
can accurately approximate any nonlinear operator [41]. Lu, Jin, and Karniadakis

present in [42] how to use this result in practice and propose deep operator networks
(DeepONets), a data-driven method "to learn (nonlinear) operators accurately and

efficiently from a relatively small dataset" [42].

We explain the idea of DeepONets by reference to Figure 4.1. Let X be a Banach


space and K1 ⊂ X, K2 ⊂ R^d be two compact sets in X and R^d, respectively. V
is a compact set in C(K1) and G is a nonlinear continuous operator which maps V
into C(K2). The network to learn an operator G : u ↦ G(u) takes an input function
u ∈ V and a point y ∈ K2, and outputs a real number G(u)(y) ∈ R (see Fig. 4.1A).
In order to work with the input function u numerically, it has to be represented

discretely. A straightforward way to do this is to evaluate the function at a sufficient
but finite number of locations {x1 , . . . , xm }, which in [42] are called sensors. The only

requirement they impose is that the sensor locations are the same for all input functions u (see
Fig. 4.1B). The DeepONet consists of two subnetworks. The architecture is shown

in Figure 4.1C. There are p branch networks each taking (u(x1 ), u(x2 ), . . . , u(xm ))T
as the input and outputting a scalar bk ∈ R for k = 1, . . . , p. In addition to the

branch networks, there is the trunk network which takes y as the input and outputs

(t1 , . . . , tp )T ∈ Rp . The outputs are merged together such that


\[
  G(u)(y) \approx \sum_{k=1}^{p} b_k t_k .
\]

In order to increase the performance by reducing the generalization error [42], biases

are added to the last layer of each branch network bk and also to the last stage

\[
  G(u)(y) \approx \sum_{k=1}^{p} b_k t_k + b_0 .
\]

The authors note that in practice, p is at least of the order of 10, which results in

a network that is computationally expensive. To reduce the number of branch

networks, they merge them together into one single branch network (see Fig. 4.1D)

which outputs a vector (b1 , . . . , bp )T ∈ Rp . The authors refer to the two types of

DeepONets as stacked (Fig. 4.1C) and unstacked (Fig. 4.1D) DeepONet. All versions of
DeepONets are implemented in DeepXDE [17], a scientific machine learning library
which we already introduced in section 2.2.2.

Figure 4.1: Illustration of the problem setup and architectures of DeepONets. (A): The
network to learn an operator G : u ↦ G(u) takes two inputs (u(x1 ), u(x2 ), . . . , u(xm ))T
and y. (B): Illustration of the training data. For each input function u we require that
we have the same number of evaluations at the same scattered sensors x1 , x2 , . . . , xm .
However, there are no constraints enforced on the number of locations for the evaluation
of the output functions. (C): The stacked DeepONet has one trunk network and p stacked
branch networks. (D): The unstacked DeepONet has one trunk network and one branch
network. Source: [42].
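To make the structure concrete, a minimal unstacked DeepONet forward pass could look as follows. This PyTorch sketch is for illustration only; the sensor count m, the number of basis functions p, and the layer widths are placeholders, and it is not the DeepXDE implementation.

import torch

class UnstackedDeepONet(torch.nn.Module):
    """Sketch of an unstacked DeepONet: one branch network, one trunk network."""

    def __init__(self, m=100, p=40, width=40):
        super().__init__()
        # Branch net: encodes the input function u sampled at the m sensors.
        self.branch = torch.nn.Sequential(
            torch.nn.Linear(m, width), torch.nn.Tanh(),
            torch.nn.Linear(width, p),
        )
        # Trunk net: encodes the evaluation point y.
        self.trunk = torch.nn.Sequential(
            torch.nn.Linear(1, width), torch.nn.Tanh(),
            torch.nn.Linear(width, p), torch.nn.Tanh(),
        )
        self.b0 = torch.nn.Parameter(torch.zeros(1))  # bias b_0 of the final sum

    def forward(self, u_sensors, y):
        # u_sensors: (batch, m) values (u(x_1), ..., u(x_m)); y: (batch, 1)
        b = self.branch(u_sensors)                          # b_1, ..., b_p
        t = self.trunk(y)                                   # t_1, ..., t_p
        return (b * t).sum(dim=1, keepdim=True) + self.b0   # G(u)(y) ≈ sum_k b_k t_k + b_0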

In [42], DeepONets are applied to four ordinary/partial differential equation
problems, and it is shown that by introducing this inductive bias DeepONets can achieve

smaller generalization errors compared to fully connected networks. They observed

"even exponential error convergence with respect to the training dataset size". Also,

the theoretical dependence of the approximation error on different factors is derived.

Note that the authors emphasize that this method achieves good prediction results

when only little data is available. Thus, it can be promising to use this method as an

alternative to PINNs for the generalization task.

5 Conclusion
We investigated physics-informed neural networks (PINNs), i.e. neural networks

(NNs) which incorporate the considered PDE into the training process. Specifically,
we examined the question of whether PINNs can be generalized in order to solve PDE

problems for various instances of boundary and initial conditions.

Since PINNs are NNs which only differ in their training process, we first explained
the method of NNs and elucidated the regularization mechanism which is used during
training in order to avoid overfitting. Subsequently, we introduced the method of

PINNs and illustrated how a PDE is involved within the training process. The PINN

loss function is divided into a PDE and a boundary loss. Only the boundary loss

considers ground truth data consisting of boundary and initial data. The PDE loss

contains the considered PDE and evaluates the required derivatives via AD. It acts

as a regularization mechanism such that PINNs can be used when only partial data is

available. In this context we also pointed to the existence of the scientific machine
learning library DeepXDE, which provides several advantages when investigating
PDE problems for fixed instances of boundary and initial conditions. It is user-
friendly and has several features such as the flexible monitoring of intermediate
results via so-called callback functions. Also, the method of RAR increases
the training efficiency by adding more training points in those areas where steep

gradients occur. To be able to evaluate our investigations, we produced reference

solutions of our problem using the software library FEniCS. FEniCS uses the finite

element method, the conventional method to solve PDEs numerically. The user

basically has to provide the considered problem in weak form, also called variational

formulation.
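As an illustration of what such a variational formulation looks like for the heat equation u_t = α u_xx considered here (α and the unit interval are written generically; the precise formulation used for the reference solutions is the one given in the corresponding chapter), one multiplies by a test function v vanishing on the Dirichlet boundary and integrates by parts:

\[
  \int_0^1 u_t \, v \, \mathrm{d}x + \alpha \int_0^1 u_x \, v_x \, \mathrm{d}x = 0
  \qquad \text{for all admissible test functions } v.
\]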

We used a PINN to solve the one-dimensional heat equation with Dirichlet bound-

ary conditions. In addition to the generalization task proposed in this thesis, we

compared a PINN to an NN with the same prerequisites for solving one instance of
the PDE, i.e. for fixed initial and boundary conditions. As it turned out, PINNs
are superior to NNs in terms of accuracy for a small number of training samples.

However, we observed that a PINN requires more computational time than an NN


due to the gradient evaluations in each iteration. This can be a severe bottleneck

for higher-dimensional problems, as the runtime increases exponentially with a growing

number of input features.

In order to be able to answer the generalization question properly, we approached
this task in two steps. First, we fixed the initial condition and trained the model

with various instances of boundary conditions. Secondly, we trained the model with

a large number of initial conditions and implicitly given boundary conditions. Our
numerical results indicate that a PINN can be generalized in the first setting. Due

to the long runtime in case of various initial conditions, it is not possible to give

an appropriate answer to the generalization question in this case. The training

took several weeks and it seems as if further training is still required because of

the partially poor predictive performance on both training and test data. Still, it

looks promising that PINNs can be generalized in this case considering the results

for fixed initial conditions and the fact that there are indeed functions in the
test data for which the model achieved good prediction accuracy. However, if

many instances are involved in the training data, one has to take into consideration

that this comes along with a massive number of gradient evaluations, which slows

down the overall training process. To improve this, one could optimize the selection

of functions contained in the training dataset in order to reduce the number of

gradient evaluations. Another approach for some future work could be adapting

the source code of DeepXDE such that it can handle various instances of initial and

boundary data. The implemented RAR method could reduce the required amount of

computational resources. Concerning practical applications, it should be considered

to attempt other methods when it comes to the generalization of PDEs for various

boundary and initial conditions. The method of learning nonlinear operators is a

data-driven concept which also aims to achieve good prediction results when only

partial data is available.

From the machine learning perspective it is rather unintuitive that for certain

parts of the input data no ground truth is actually used or even needed. How-

ever, from the scientific point of view it is promising that a well-known data-driven

concept, the neural network, is augmented in a way that it takes the analytical rep-

resentation of the problem into account. PINNs reveal their strengths in settings

where traditional solving methods fail to provide reliable predictions due to a lack

of data, e.g. unknown boundary or initial data. However, for applications where the

training requires many input features or gradient evaluations, our results show that
it should be considered to use different approaches. Nonetheless, for fixed initial

conditions it is possible to use PINNs to solve general PDEs for various BCs.

Bibliography
[1] Alexander Maedche et al. "AI-Based Digital Assistants". In: Business & Information Systems Engineering 61.4 (2019), pp. 535–544. issn: 1867-0202. doi: 10.1007/s12599-019-00600-8. url: https://doi.org/10.1007/s12599-019-00600-8.

[2] Sorin Grigorescu et al. "A survey of deep learning techniques for autonomous driving". In: Journal of Field Robotics 37.3 (2020), pp. 362–386. doi: 10.1002/rob.21918. url: https://onlinelibrary.wiley.com/doi/abs/10.1002/rob.21918.

[3] Akira Taniguchi et al. "Autonomous planning based on spatial concepts to tidy up home environments with service robots". In: Advanced Robotics 35.8 (2021), pp. 471–489.

[4] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. "Physics Informed Deep Learning (Part I): Data-driven Solutions of Nonlinear Partial Differential Equations". In: (Nov. 2017). arXiv: 1711.10561 [cs.AI].

[5] Atilim Gunes Baydin et al. "Automatic differentiation in machine learning: a survey". In: Journal of Machine Learning Research 18 (2018), pp. 1–43.

[6] Allan Pinkus. "Approximation theory of the MLP model in neural networks". In: Acta Numerica 8 (1999), pp. 143–195.

[7] Yeonjong Shin, Jerome Darbon, and George Em Karniadakis. "On the convergence of physics informed neural networks for linear second-order elliptic and parabolic type PDEs". In: arXiv preprint arXiv:2004.01806 (2020).

[8] Maziar Raissi et al. "Deep learning of vortex-induced vibrations". In: Journal of Fluid Mechanics 861 (2019), pp. 119–137. doi: 10.1017/jfm.2018.872.

[9] Zhiping Mao, Ameya D. Jagtap, and George Em Karniadakis. "Physics-informed neural networks for high-speed flows". In: Computer Methods in Applied Mechanics and Engineering 360 (2020), p. 112789. issn: 0045-7825. doi: 10.1016/j.cma.2019.112789. url: https://www.sciencedirect.com/science/article/pii/S0045782519306814.

[10] Xiaowei Jin et al. "NSFnets (Navier-Stokes flow nets): Physics-informed neural networks for the incompressible Navier-Stokes equations". In: Journal of Computational Physics 426 (2021), p. 109951. issn: 0021-9991. doi: 10.1016/j.jcp.2020.109951. url: https://www.sciencedirect.com/science/article/pii/S0021999120307257.

[11] Oliver Hennigh et al. "NVIDIA SimNet™: an AI-accelerated multi-physics simulation framework". In: (2020). arXiv: 2012.07938 [physics.flu-dyn].

[12] Heat Transfer Prediction With Unknown Thermal Boundary Conditions Using Physics-Informed Neural Networks. Vol. 3: Computational Fluid Dynamics; Micro and Nano Fluid Dynamics. Fluids Engineering Division Summer Meeting. V003T05A054. July 2020. doi: 10.1115/FEDSM2020-20159. eprint: https://asmedigitalcollection.asme.org/FEDSM/proceedings-pdf/FEDSM2020/83730/V003T05A054/6575747/v003t05a054-fedsm2020-20159.pdf. url: https://doi.org/10.1115/FEDSM2020-20159.

[13] Tongsheng Wang et al. "Reconstruction of natural convection within an enclosure using deep neural network". In: International Journal of Heat and Mass Transfer 164 (2021), p. 120626. issn: 0017-9310. doi: 10.1016/j.ijheatmasstransfer.2020.120626. url: https://www.sciencedirect.com/science/article/pii/S0017931020335626.

[14] Didier Lucor, Atul Agrawal, and Anne Sergent. Physics-aware deep neural networks for surrogate modeling of turbulent natural convection. 2021. arXiv: 2103.03565 [cs.LG].

[15] Shengze Cai et al. "Flow over an espresso cup: inferring 3-D velocity and pressure fields from tomographic background oriented Schlieren via physics-informed neural networks". In: Journal of Fluid Mechanics 915 (2021).

[16] Lu Lu et al. "DeepXDE: A deep learning library for solving differential equations". In: (July 2019). arXiv: 1907.04502 [cs.LG].

[17] DeepXDE, a library for scientific machine learning. https://github.com/lululxvi/deepxde.

[18] Shengze Cai et al. "Physics-informed neural networks for heat transfer problems". In: Journal of Heat Transfer 143.6 (2021), p. 060801.

[19] Shengze Cai et al. "Physics-informed neural networks (PINNs) for fluid mechanics: A review". In: arXiv preprint arXiv:2105.09506 (2021).

[20] Liu Yang, Xuhui Meng, and George Em Karniadakis. "B-PINNs: Bayesian physics-informed neural networks for forward and inverse PDE problems with noisy data". In: Journal of Computational Physics 425 (2021), p. 109913. issn: 0021-9991. doi: 10.1016/j.jcp.2020.109913. url: https://www.sciencedirect.com/science/article/pii/S0021999120306872.

[21] Mark Craven, Johan Kumlien, et al. "Constructing biological knowledge bases by extracting information from text sources". In: ISMB. Vol. 1999. 1999, pp. 77–86.

[22] Violetta Schäfer. "Automatic Differentiation for High-Dimensional Optimization Problems". Bachelor Thesis. 2019.

[23] Yann LeCun, Corinna Cortes, and Christopher J.C. Burges. The MNIST Database of Handwritten Digits. http://yann.lecun.com/exdb/mnist/. 1998.

[24] Niklas Rottmayer. "Deep Learning with Neural Networks". Bachelor Thesis. 2019.

[25] Chigozie Nwankpa et al. "Activation Functions: Comparison of trends in Practice and Research for Deep Learning". In: CoRR abs/1811.03378 (2018). arXiv: 1811.03378. url: http://arxiv.org/abs/1811.03378.

[26] cloud4science. Notes on Deep Learning and Differential Equations. June 2020. url: https://cloud4scieng.org/2020/06/10/notes-on-deep-learning-and-differential-equations/.

[27] Ya-xiang Yuan. "A new stepsize for the steepest descent method". In: Journal of Computational Mathematics 24.2 (2006), pp. 149–156. issn: 02549409, 19917139. url: http://www.jstor.org/stable/43694074.

[28] Diederik P. Kingma and Jimmy Ba. "Adam: A Method for Stochastic Optimization". In: (Dec. 2014). arXiv: 1412.6980 [cs.LG].

[29] Jason Brownlee. Gradient Descent With Momentum from Scratch. Oct. 2021. url: https://machinelearningmastery.com/gradient-descent-with-momentum-from-scratch/.

[30] Christopher M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.

[31] René Pinnau. "Partial Differential Equations: An Introduction". Lecture Notes.

[32] Isaac E. Lagaris, Aristidis C. Likas, and Dimitris G. Papageorgiou. "Neural-network methods for boundary value problems with irregular boundaries". In: IEEE Transactions on Neural Networks 11.5 (2000), pp. 1041–1049.

[33] Richard H. Byrd et al. "A limited memory algorithm for bound constrained optimization". In: SIAM Journal on Scientific Computing 16.5 (1995), pp. 1190–1208.

[34] A. Logg, K.-A. Mardal, and G. N. Wells, eds. Automated Solution of Differential Equations by the Finite Element Method. Springer, 2012. doi: 10.1007/978-3-642-23099-8.

[35] Hans Petter Langtangen and Anders Logg. Solving PDEs in Python - The FEniCS Tutorial I. Springer, 2016.

[36] Bernd Simeon. "Numerical PDEs I". Lecture Notes.

[37] Oct. 2021. url: http://www-m15.ma.tum.de/foswiki/pub/M15/Allgemeines/IllposedProblems/WebHome/board4.pdf.

[38] Gabriele Steidl and Ronny Bergmann. "Fundamentals of Mathematical Image Processing". Lecture Notes. 2017.

[39] André Neubauer. DFT - Diskrete Fourier-Transformation. Springer Vieweg, 2011.

[40] Etienne Barnard and L.F.A. Wessels. "Extrapolation and interpolation in neural network classifiers". In: IEEE Control Systems Magazine 12.5 (1992), pp. 50–53.

[41] Tianping Chen and Hong Chen. "Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems". In: IEEE Transactions on Neural Networks 6.4 (1995), pp. 911–917.

[42] Lu Lu, Pengzhan Jin, and George Em Karniadakis. "DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators". In: arXiv preprint arXiv:1910.03193 (2019).
