
SIAM REVIEW © 2021 Society for Industrial and Applied Mathematics
Vol. 63, No. 1, pp. 208–228

DeepXDE: A Deep Learning Library for Solving Differential Equations*

Lu Lu†, Xuhui Meng‡, Zhiping Mao§, George Em Karniadakis¶

Abstract. Deep learning has achieved remarkable success in diverse applications; however, its use in
solving partial differential equations (PDEs) has emerged only recently. Here, we present
an overview of physics-informed neural networks (PINNs), which embed a PDE into the
loss of the neural network using automatic differentiation. The PINN algorithm is simple,
and it can be applied to different types of PDEs, including integro-differential equations,
fractional PDEs, and stochastic PDEs. Moreover, from an implementation point of view,
PINNs solve inverse problems as easily as forward problems. We propose a new residual-
based adaptive refinement (RAR) method to improve the training efficiency of PINNs.
For pedagogical reasons, we compare the PINN algorithm to a standard finite element
method. We also present a Python library for PINNs, DeepXDE, which is designed to
serve both as an educational tool to be used in the classroom and as a research tool
for solving problems in computational science and engineering. Specifically, DeepXDE can
solve forward problems given initial and boundary conditions, as well as inverse problems
given some extra measurements. DeepXDE supports complex-geometry domains based on
the technique of constructive solid geometry and enables the user code to be compact,
resembling closely the mathematical formulation. We introduce the usage of DeepXDE
and its customizability, and we also demonstrate the capability of PINNs and the user-
friendliness of DeepXDE for five different examples. More broadly, DeepXDE contributes
to the more rapid development of the emerging scientific machine learning field.

Key words. education software, DeepXDE, differential equations, deep learning, physics-informed
neural networks, scientific machine learning

AMS subject classifications. 65-01, 65-04, 65L99, 65M99, 65N99

DOI. 10.1137/19M1274067

* Received by the editors July 10, 2019; accepted for publication (in revised form) May 8, 2020; published electronically February 4, 2021.
https://doi.org/10.1137/19M1274067
Funding: This work was supported by DOE PhILMs project DE-SC0019453, by AFOSR grant FA9550-17-1-0013, and by DARPA-AIRA grant HR00111990025.
† Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139 USA (lu_lu@mit.edu).
‡ Division of Applied Mathematics, Brown University, Providence, RI 02912 USA (xuhui_meng@brown.edu).
§ School of Mathematical Sciences, Xiamen University, Xiamen, Fujian 361005, China (zpmao@xmu.edu.cn).
¶ Division of Applied Mathematics, Brown University, Providence, RI 02912 USA, and Pacific Northwest National Laboratory, Richland, WA 99354 USA (George_Karniadakis@brown.edu).



1. Introduction. In the last 15 years, deep learning in the form of deep neural
networks has been used very effectively in diverse applications [32], such as computer
vision and natural language processing. Despite the remarkable success in these and
related areas, deep learning has not yet been widely used in the field of scientific
computing. However, more recently, solving partial differential equations (PDEs),
e.g., in the standard differential form or in the integral form, via deep learning has
emerged as a potentially new subfield under the name of scientific machine learning
(SciML) [4]. In particular, we can replace traditional numerical discretization methods
with a neural network that approximates the solution to a PDE.
To obtain the approximate solution of a PDE via deep learning, a key step is to
constrain the neural network to minimize the PDE residual, and several approaches
have been proposed to accomplish this. Compared to traditional mesh-based methods,
such as the finite difference method (FDM) and the finite element method (FEM),
deep learning could be a mesh-free approach by taking advantage of automatic dif-
ferentiation [47], and could break the curse of dimensionality [45, 18]. Among these
approaches, some can only be applied to particular types of problems, such as image-
like input domain [28, 33, 58] or parabolic PDEs [6, 19]. Some researchers adopt the
variational form of PDEs and minimize the corresponding energy functional [16, 20].
However, not all PDEs can be derived from a known functional, and thus Galerkin-
type projections have also been considered [39]. Alternatively, one could use the PDE
in strong form directly [15, 52, 30, 31, 7, 50, 47]; in this form, automatic differentia-
tion could be used directly to avoid truncation errors and the numerical quadrature
errors of variational forms. This strong form approach was introduced in [47], coining
the term physics-informed neural networks (PINNs). An attractive feature of PINNs
is that they can be used to solve inverse problems with minimum change of the code
for forward problems [47, 48, 51, 21, 13]. In addition, PINNs have been further ex-
tended to solve integro-differential equations (IDEs), fractional differential equations
(FDEs) [42], and stochastic differential equations (SDEs) [57, 55, 41, 56].
In this paper, we present various PINN algorithms implemented in the Python
library DeepXDE,1 which is designed to serve both as an educational tool to be
used in the classroom and as a research tool for solving problems in computational
science and engineering (CSE). DeepXDE can be used to solve multiphysics problems,
and supports complex-geometry domains based on the technique of constructive solid
geometry (CSG), hence avoiding tedious and time-consuming computational geometry
tasks. By using DeepXDE, from an implementation point of view, time-dependent
PDEs can be solved as easily as steady states by only defining the initial conditions.
In addition to the main workflow of DeepXDE, users can readily monitor and modify
the solution process via callback functions, e.g., monitoring the Fourier spectrum of
the neural network solution, which can reveal the learning mode of the neural network
shown later in Figure 2. Last but not least, DeepXDE is designed to make the user
code stay compact and manageable, resembling closely the mathematical formulation.
The paper is organized as follows. In section 2, after briefly introducing deep
neural networks and automatic differentiation, we present the algorithm, approxima-
tion theory, and error analysis of PINNs, and we make a comparison between PINNs
and FEM. We then discuss how to use PINNs to solve IDEs and inverse problems. In
addition, we propose the RAR method to improve the training efficiency of PINNs.
In section 3, we introduce the usage of our library, DeepXDE, and its customizability.

1 Source code is published under the Apache License, Version 2.0, on GitHub: https://github.com/lululxvi/deepxde.


In section 4, we demonstrate the capability of PINNs and the user-friendliness of DeepXDE in five different examples. Finally, we conclude the paper in section 5.

2. Algorithm and Theory of Physics-Informed Neural Networks. In this section, we first provide a brief overview of deep neural networks and automatic differentiation and present the algorithm and theory of PINNs for solving PDEs. We then make a comparison between PINNs and FEM and discuss how to use PINNs to solve IDEs and inverse problems. Next we propose RAR, an efficient way to select the residual points adaptively during the training process.
2.1. Deep Neural Networks. Mathematically, a deep neural network is a par-
ticular choice of a compositional function. The simplest neural network is the feed-
forward neural network (FNN), also called multilayer perceptron (MLP), which ap-
plies linear and nonlinear transformations to the inputs recursively. Although many
different types of neural networks have been developed in the past decades, such as the
convolutional neural network and the recurrent neural network, we consider in this pa-
per FNN, which is sufficient for most PDE problems, and the residual neural network
(ResNet), which is easier to train for deep networks. However, it is straightforward
to employ other types of neural networks.
Let $\mathcal{N}^L(x) : \mathbb{R}^{d_{\mathrm{in}}} \to \mathbb{R}^{d_{\mathrm{out}}}$ be an $L$-layer neural network, or an $(L-1)$-hidden layer neural network, with $N_\ell$ neurons in the $\ell$th layer ($N_0 = d_{\mathrm{in}}$, $N_L = d_{\mathrm{out}}$). Let us denote the weight matrix and bias vector in the $\ell$th layer by $\boldsymbol{W}^\ell \in \mathbb{R}^{N_\ell \times N_{\ell-1}}$ and $\boldsymbol{b}^\ell \in \mathbb{R}^{N_\ell}$, respectively. Given a nonlinear activation function $\sigma$, which is applied elementwise, the FNN is recursively defined as follows:

input layer: $\mathcal{N}^0(x) = x \in \mathbb{R}^{d_{\mathrm{in}}}$,
hidden layers: $\mathcal{N}^\ell(x) = \sigma(\boldsymbol{W}^\ell \mathcal{N}^{\ell-1}(x) + \boldsymbol{b}^\ell) \in \mathbb{R}^{N_\ell}$ for $1 \le \ell \le L-1$,
output layer: $\mathcal{N}^L(x) = \boldsymbol{W}^L \mathcal{N}^{L-1}(x) + \boldsymbol{b}^L \in \mathbb{R}^{d_{\mathrm{out}}}$;

see also a visualization of a neural network in Figure 1. Commonly used activation functions include the logistic sigmoid $1/(1 + e^{-x})$, the hyperbolic tangent (tanh), and the rectified linear unit (ReLU, $\max\{x, 0\}$).
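To make the recursion above concrete, the following is a minimal NumPy sketch of an FNN forward pass; the layer sizes, the tanh activation, and the random initialization are illustrative choices of ours, not part of the definition above.

import numpy as np

def fnn_forward(x, weights, biases, sigma=np.tanh):
    """Evaluate an L-layer FNN: N^0 = x, N^l = sigma(W^l N^{l-1} + b^l); the output layer is linear."""
    N = x
    for W, b in zip(weights[:-1], biases[:-1]):
        N = sigma(W @ N + b)                 # hidden layers
    return weights[-1] @ N + biases[-1]      # output layer (no activation)

# Example: d_in = 2, one hidden layer with 3 neurons, d_out = 1
rng = np.random.default_rng(0)
weights = [rng.standard_normal((3, 2)), rng.standard_normal((1, 3))]
biases = [rng.standard_normal(3), rng.standard_normal(1)]
print(fnn_forward(np.array([2.0, 1.0]), weights, biases))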
2.2. Automatic Differentiation. In PINNs, it is required to compute the deriva-
tives of the network outputs with respect to the network inputs. There are four
possible methods for computing the derivatives [5, 38]: (1) hand-coded analytical
derivative; (2) finite difference or other numerical approximations; (3) symbolic dif-
ferentiation (used in software programs such as Mathematica, Maxima, and Maple);
and (4) automatic differentiation (AD; also called algorithmic differentiation). In
deep learning, the derivatives are evaluated using backpropagation [49], a specialized
technique of AD.
Since the neural network represents a compositional function, AD applies the chain rule repeatedly to compute the derivatives. There are
two steps in AD: one forward pass to compute the values of all variables, and one
backward pass to compute the derivatives. To demonstrate AD, we consider an FNN
of only one hidden layer with two inputs, $x_1$ and $x_2$, and one output, $y$:

$$v = -2x_1 + 3x_2 + 0.5,$$
$$h = \tanh v,$$
$$y = 2h - 1.$$

The forward pass and backward pass of AD for computing the partial derivatives $\partial y/\partial x_1$ and $\partial y/\partial x_2$ at $(x_1, x_2) = (2, 1)$ are shown in Table 1.

We can see that AD requires only one forward pass and one backward pass to com-
pute all the partial derivatives, no matter what the input dimension is. In contrast,
using finite differences, computing each partial derivative $\partial y/\partial x_i$ requires two function evaluations $y(x_1, \ldots, x_i, \ldots, x_{d_{\mathrm{in}}})$ and $y(x_1, \ldots, x_i + \Delta x_i, \ldots, x_{d_{\mathrm{in}}})$ for some small number $\Delta x_i$, and thus in total $d_{\mathrm{in}} + 1$ forward passes are required to evaluate all the partial
derivatives. Hence, AD is much more efficient than finite differences when the input
dimension is high (see [5, 38] for more details of the comparison between AD and the
other three methods). To compute nth-order derivatives, AD can be applied recur-
sively n times. However, this nested approach may lead to inefficiency and numerical
instability, and hence other methods, e.g., Taylor-mode AD, have been developed for
this purpose [8, 9]. Finally we note that with AD we differentiate the neural network
and therefore we can deal with noisy data [42].
Table 1 Example of AD to compute the partial derivatives $\partial y/\partial x_1$ and $\partial y/\partial x_2$ at $(x_1, x_2) = (2, 1)$.

Forward pass                          | Backward pass
$x_1 = 2$, $x_2 = 1$                  | $\frac{\partial y}{\partial y} = 1$
$v = -2x_1 + 3x_2 + 0.5 = -0.5$       | $\frac{\partial y}{\partial h} = \frac{\partial (2h-1)}{\partial h} = 2$
$h = \tanh v \approx -0.462$          | $\frac{\partial y}{\partial v} = \frac{\partial y}{\partial h}\frac{\partial h}{\partial v} = \frac{\partial y}{\partial h}\,\mathrm{sech}^2(v) \approx 1.573$
$y = 2h - 1 = -1.924$                 | $\frac{\partial y}{\partial x_1} = \frac{\partial y}{\partial v}\frac{\partial v}{\partial x_1} = \frac{\partial y}{\partial v} \times (-2) = -3.146$
                                      | $\frac{\partial y}{\partial x_2} = \frac{\partial y}{\partial v}\frac{\partial v}{\partial x_2} = \frac{\partial y}{\partial v} \times 3 = 4.719$
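As an illustration of how these derivatives are obtained in practice, below is a minimal sketch that reproduces the values in Table 1 with reverse-mode AD; the use of TensorFlow's tf.GradientTape is our choice of tool here, not part of the original example.

import tensorflow as tf

x1 = tf.Variable(2.0)
x2 = tf.Variable(1.0)
with tf.GradientTape() as tape:
    v = -2.0 * x1 + 3.0 * x2 + 0.5   # v = -0.5
    h = tf.tanh(v)                   # h ~ -0.462
    y = 2.0 * h - 1.0                # y ~ -1.924
dy_dx1, dy_dx2 = tape.gradient(y, [x1, x2])
print(dy_dx1.numpy(), dy_dx2.numpy())  # approximately -3.146 and 4.719, matching Table 1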

2.3. Physics-Informed Neural Networks (PINNs) for Solving PDEs. We consider the following PDE parameterized by $\boldsymbol{\lambda}$ for the solution $u(x)$ with $x = (x_1, \ldots, x_d)$ defined on a domain $\Omega \subset \mathbb{R}^d$:

(2.1) $$f\left(x; \frac{\partial u}{\partial x_1}, \ldots, \frac{\partial u}{\partial x_d}; \frac{\partial^2 u}{\partial x_1 \partial x_1}, \ldots, \frac{\partial^2 u}{\partial x_1 \partial x_d}; \ldots; \boldsymbol{\lambda}\right) = 0, \quad x \in \Omega,$$

with suitable boundary conditions

$$\mathcal{B}(u, x) = 0 \quad \text{on } \partial\Omega,$$

where $\mathcal{B}(u, x)$ could be Dirichlet, Neumann, Robin, or periodic boundary conditions. For time-dependent problems, we consider time $t$ as a special component of $x$, and $\Omega$ contains the temporal domain. The initial condition can be simply treated as a special type of Dirichlet boundary condition on the spatio-temporal domain.

The algorithm of PINN [31, 47] is shown in Procedure 2.1, and visually in the schematic of Figure 1, solving a diffusion equation $\frac{\partial u}{\partial t} = \lambda \frac{\partial^2 u}{\partial x^2}$ with mixed boundary conditions $u(x, t) = g_D(x, t)$ on $\Gamma_D \subset \partial\Omega$ and $\frac{\partial u}{\partial n}(x, t) = g_R(u, x, t)$ on $\Gamma_R \subset \partial\Omega$. We explain each step as follows. In a PINN, we first construct a neural network $\hat{u}(x; \boldsymbol{\theta})$ as a surrogate of the solution $u(x)$, which takes the input $x$ and outputs a vector with the same dimension as $u$. Here, $\boldsymbol{\theta} = \{\boldsymbol{W}^\ell, \boldsymbol{b}^\ell\}_{1 \le \ell \le L}$ is the set of all weight matrices and bias vectors in the neural network $\hat{u}$. One advantage of PINNs by choosing neural networks as the surrogate of $u$ is that we can take the derivatives of $\hat{u}$ with respect to its input $x$ by applying the chain rule for differentiating compositions of


Procedure 2.1 The PINN algorithm for solving differential equations.
Step 1 Construct a neural network $\hat{u}(x; \boldsymbol{\theta})$ with parameters $\boldsymbol{\theta}$.
Step 2 Specify the two training sets $\mathcal{T}_f$ and $\mathcal{T}_b$ for the equation and boundary/initial conditions.
Step 3 Specify a loss function by summing the weighted $L^2$ norm of both the PDE equation and boundary condition residuals.
Step 4 Train the neural network to find the best parameters $\boldsymbol{\theta}^*$ by minimizing the loss function $\mathcal{L}(\boldsymbol{\theta}; \mathcal{T})$.

Fig. 1 Schematic of a PINN for solving the diffusion equation $\frac{\partial u}{\partial t} = \lambda \frac{\partial^2 u}{\partial x^2}$ with mixed boundary conditions (BC) $u(x, t) = g_D(x, t)$ on $\Gamma_D \subset \partial\Omega$ and $\frac{\partial u}{\partial \mathbf{n}}(x, t) = g_R(u, x, t)$ on $\Gamma_R \subset \partial\Omega$. The initial condition (IC) is treated as a special type of boundary condition. $\mathcal{T}_f$ and $\mathcal{T}_b$ denote the two sets of residual points for the equation and BC/IC.

functions using AD, which is conveniently integrated in machine learning packages such as TensorFlow [1] and PyTorch [43].

In the next step, we need to restrict the neural network $\hat{u}$ to satisfy the physics imposed by the PDE and boundary conditions. In practice, we restrict $\hat{u}$ on some scattered points (e.g., randomly distributed points, or clustered points in the domain [37]), i.e., the training data $\mathcal{T} = \{x_1, x_2, \ldots, x_{|\mathcal{T}|}\}$ of size $|\mathcal{T}|$. In addition, $\mathcal{T}$ comprises two sets, $\mathcal{T}_f \subset \Omega$ and $\mathcal{T}_b \subset \partial\Omega$, which are the points in the domain and on the boundary, respectively. We refer to $\mathcal{T}_f$ and $\mathcal{T}_b$ as the sets of "residual points."

To measure the discrepancy between the neural network $\hat{u}$ and the constraints, we consider the loss function defined as the weighted summation of the $L^2$ norm of residuals for the equation and boundary conditions:

(2.2) $$\mathcal{L}(\boldsymbol{\theta}; \mathcal{T}) = w_f \mathcal{L}_f(\boldsymbol{\theta}; \mathcal{T}_f) + w_b \mathcal{L}_b(\boldsymbol{\theta}; \mathcal{T}_b),$$

where

$$\mathcal{L}_f(\boldsymbol{\theta}; \mathcal{T}_f) = \frac{1}{|\mathcal{T}_f|} \sum_{x \in \mathcal{T}_f} \left\| f\left(x; \frac{\partial \hat{u}}{\partial x_1}, \ldots, \frac{\partial \hat{u}}{\partial x_d}; \frac{\partial^2 \hat{u}}{\partial x_1 \partial x_1}, \ldots, \frac{\partial^2 \hat{u}}{\partial x_1 \partial x_d}; \ldots; \boldsymbol{\lambda}\right) \right\|_2^2,$$

$$\mathcal{L}_b(\boldsymbol{\theta}; \mathcal{T}_b) = \frac{1}{|\mathcal{T}_b|} \sum_{x \in \mathcal{T}_b} \|\mathcal{B}(\hat{u}, x)\|_2^2,$$


and $w_f$ and $w_b$ are the weights. The loss involves derivatives, such as the partial derivative $\partial\hat{u}/\partial x_1$ or the normal derivative at the boundary $\partial\hat{u}/\partial n = \nabla\hat{u} \cdot \mathbf{n}$, which are handled via AD.
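For concreteness, the following is a minimal sketch of the loss (2.2) for a 1D Poisson problem $-\hat{u}_{xx} = f(x)$ with zero Dirichlet boundary conditions, written directly in TensorFlow; the network `net` and source term `f` are placeholders for whatever the user supplies, and the nested gradient tapes are just one way to obtain the second derivative.

import tensorflow as tf

def pinn_loss(net, f, x_f, x_b, w_f=1.0, w_b=1.0):
    """Weighted sum of the mean squared PDE residual on T_f and the BC residual on T_b."""
    with tf.GradientTape() as t2:
        t2.watch(x_f)
        with tf.GradientTape() as t1:
            t1.watch(x_f)
            u = net(x_f)
        u_x = t1.gradient(u, x_f)
    u_xx = t2.gradient(u_x, x_f)
    loss_f = tf.reduce_mean(tf.square(-u_xx - f(x_f)))  # PDE residual on T_f
    loss_b = tf.reduce_mean(tf.square(net(x_b)))         # u = 0 on the boundary points T_b
    return w_f * loss_f + w_b * loss_b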
In the last step, the procedure of searching for a good $\boldsymbol{\theta}$ by minimizing the loss $\mathcal{L}(\boldsymbol{\theta}; \mathcal{T})$ is called "training." Considering the fact that the loss is highly nonlinear and nonconvex with respect to $\boldsymbol{\theta}$ [10], we usually minimize the loss function by gradient-based optimizers such as gradient descent, Adam [29], and L-BFGS [12]. We remark that, based on our experience, for smooth PDE solutions L-BFGS can find a good solution with fewer iterations than Adam, because L-BFGS uses second-order derivatives of the loss function, while Adam relies only on first-order derivatives. However, for stiff solutions L-BFGS is more likely to be stuck at a bad local minimum. The required number of iterations highly depends on the problem (e.g., the smoothness of the solution), and to check whether the network converges or not, we can monitor the loss function or the PDE residual using callback functions. We also note that acceleration of training can be achieved by using an adaptive activation function that may remove bad local minima; see [24, 23].
Unlike traditional numerical methods, for PINNs there is no guarantee of unique
solutions, because PINN solutions are obtained by solving nonconvex optimization
problems, which in general do not have a unique solution. In practice, to achieve a
good level of accuracy, we need to tune all the hyperparameters, e.g., network size,
learning rate, and the number of residual points. The required network size depends
highly on the smoothness of the PDE solution. For example, a small network (e.g.,
a few layers and twenty neurons per layer) is sufficient for solving the 1D Poisson
equation, but a deeper and wider network is required for the 1D Burgers equation to
achieve a similar level of accuracy. We also note that PINNs may converge to different
solutions from different network initial values [42, 51, 21], and thus a common strategy
is that we train PINNs from random initialization a few times (e.g., 10 independent
runs) and choose the network with the smallest training loss as the final solution.
In the algorithm of PINN introduced above, we enforce soft constraints of boundary/initial conditions through the loss $\mathcal{L}_b$. This approach can be used for complex domains and any type of boundary conditions. On the other hand, it is possible to enforce hard constraints for simple cases [30]. For example, when the boundary condition is $u(0) = u(1) = 0$ with $\Omega = [0, 1]$, we can simply choose the surrogate model as $\hat{u}(x) = x(x-1)\mathcal{N}(x)$ to satisfy the boundary condition automatically, where $\mathcal{N}(x)$ is a neural network.
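A sketch of this hard-constraint idea is shown below; the two-hidden-layer Keras network is an arbitrary illustrative choice for $\mathcal{N}(x)$.

import tensorflow as tf

# N(x): an unconstrained network; the output transform enforces u_hat(0) = u_hat(1) = 0.
N = tf.keras.Sequential([
    tf.keras.layers.Dense(20, activation="tanh", input_shape=(1,)),
    tf.keras.layers.Dense(20, activation="tanh"),
    tf.keras.layers.Dense(1),
])

def u_hat(x):
    # u_hat(x) = x (x - 1) N(x) vanishes at both boundaries by construction.
    return x * (x - 1.0) * N(x)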
We note that we have great flexibility in choosing the residual points $\mathcal{T}$, and here
we provide three possible strategies:
1. We can specify the residual points at the beginning of training, which could
be grid points on a lattice or random points, and never change them during
the training process.
2. In each optimization iteration, we could select randomly different residual
points.
3. We could improve the location of the residual points adaptively during train-
ing, e.g., the method proposed in subsection 2.8.
When the number of residual points required is very large, e.g., in multiscale problems,
it is computationally expensive to calculate the loss and gradient in every iteration.
Instead of using all residual points, we can split the residual points into small batches,
and in each iteration we only use one batch to calculate the loss and update model
parameters; this is the so-called "mini-batch" gradient descent. The aforementioned


strategy (2), i.e., resampling in each step, is a special case of mini-batch gradient descent by choosing $\mathcal{T} = \Omega$ with $|\mathcal{T}| = \infty$.
Recent studies show that for function approximation, neural networks learn target functions from low to high frequencies [46, 54], but here we show that the learning mode of PINNs is different due to the existence of high-order derivatives. For example, when we approximate the function $f(x) = \sum_{k=1}^{5} \sin(2kx)/(2k)$ in $[-\pi, \pi]$ by a neural network, the function is learned from low to high frequency (Figure 2A). However, when we employ a PINN to solve the Poisson equation $-f_{xx} = \sum_{k=1}^{5} 2k\sin(2kx)$ with zero boundary conditions in the same domain, all frequencies are learned almost simultaneously (Figure 2B). Interestingly, by comparing Figure 2A and Figure 2B we can see that at least in this case solving the PDE using a PINN is faster than approximating a function using a neural network. We can monitor this training process using the callback functions in our library DeepXDE as discussed later.

Fig. 2 Convergence of the amplitude for each frequency during the training process. (A) A neural network is trained to approximate the function $f(x) = \sum_{k=1}^{5} \sin(2kx)/(2k)$. The color represents amplitude values with the maximum amplitude for each frequency normalized to 1. (B) A PINN is used to solve the Poisson equation $-f_{xx} = \sum_{k=1}^{5} 2k\sin(2kx)$ with zero boundary conditions. We use a neural network of 4 hidden layers and 20 neurons per layer. The learning rate is chosen as $10^{-4}$, and 500 random points are sampled for training.

2.4. Approximation Theory and Error Analysis for PINNs. One fundamental
question related to PINNs is whether there exists a neural network satisfying both
the PDE equation and the boundary conditions, i.e., whether there exists a neural
network that can simultaneously and uniformly approximate a function and its partial
derivatives. To address this question, we first introduce some notation. Let $\mathbb{Z}_+^d$ be the set of $d$-dimensional nonnegative integers. For $m = (m_1, \ldots, m_d) \in \mathbb{Z}_+^d$, we set $|m| := m_1 + \cdots + m_d$, and

$$D^m := \frac{\partial^{|m|}}{\partial x_1^{m_1} \cdots \partial x_d^{m_d}}.$$

We say $f \in C^m(\mathbb{R}^d)$ if $D^k f \in C(\mathbb{R}^d)$ for all $k \le m$, $k \in \mathbb{Z}_+^d$, where $C(\mathbb{R}^d) = \{f : \mathbb{R}^d \to \mathbb{R} \mid f \text{ is continuous}\}$ is the space of continuous functions. Then we recall the following theorem of derivative approximation using single hidden layer neural networks due to Pinkus [44].

Theorem 2.1. Let $m^i \in \mathbb{Z}_+^d$, $i = 1, \ldots, s$, and set $m = \max_{i=1,\ldots,s} |m^i|$. Assume $\sigma \in C^m(\mathbb{R})$ and that $\sigma$ is not a polynomial. Then the space of single hidden layer neural nets

$$\mathcal{M}(\sigma) := \mathrm{span}\{\sigma(w \cdot x + b) : w \in \mathbb{R}^d, b \in \mathbb{R}\}$$

is dense in

$$C^{m^1,\ldots,m^s}(\mathbb{R}^d) := \cap_{i=1}^{s} C^{m^i}(\mathbb{R}^d),$$

i.e., for any $f \in C^{m^1,\ldots,m^s}(\mathbb{R}^d)$, any compact $K \subset \mathbb{R}^d$, and any $\varepsilon > 0$, there exists a $g \in \mathcal{M}(\sigma)$ satisfying

$$\max_{x \in K} |D^k f(x) - D^k g(x)| < \varepsilon$$

for all $k \in \mathbb{Z}_+^d$ for which $k \le m^i$ for some $i$.

Fig. 3 Illustration of errors of a PINN. The total error consists of the approximation error, the optimization error, and the generalization error. Here, $u$ is the PDE solution, $u_\mathcal{F}$ is the best function close to $u$ in the function space $\mathcal{F}$, $u_\mathcal{T}$ is the neural network whose loss is at a global minimum, and $\tilde{u}_\mathcal{T}$ is the function obtained by training a neural network.


Theorem 2.1 shows that feed-forward neural nets (FNNs) with enough neurons can simultaneously and uniformly approximate any function and its partial derivatives. However, neural networks in practice have limited size. Let $\mathcal{F}$ denote the family of all the functions that can be represented by our chosen neural network architecture. The solution $u$ is unlikely to belong to the family $\mathcal{F}$, and we define $u_\mathcal{F} = \arg\min_{f \in \mathcal{F}} \|f - u\|$ as the best function in $\mathcal{F}$ close to $u$ (Figure 3). Because we only train the neural network on the training set $\mathcal{T}$, we define $u_\mathcal{T} = \arg\min_{f \in \mathcal{F}} \mathcal{L}(f; \mathcal{T})$ as the neural network whose loss is at a global minimum. For simplicity, we assume that $u$, $u_\mathcal{F}$, and $u_\mathcal{T}$ are well defined and unique. Finding $u_\mathcal{T}$ by minimizing the loss is often computationally intractable [10], and our optimizer returns an approximate solution $\tilde{u}_\mathcal{T}$.

We can then decompose the total error $\mathcal{E}$ as [11]

$$\mathcal{E} := \|\tilde{u}_\mathcal{T} - u\| \le \underbrace{\|\tilde{u}_\mathcal{T} - u_\mathcal{T}\|}_{\mathcal{E}_{\mathrm{opt}}} + \underbrace{\|u_\mathcal{T} - u_\mathcal{F}\|}_{\mathcal{E}_{\mathrm{gen}}} + \underbrace{\|u_\mathcal{F} - u\|}_{\mathcal{E}_{\mathrm{app}}}.$$

The approximation error $\mathcal{E}_{\mathrm{app}}$ measures how closely $u_\mathcal{F}$ can approximate $u$. The generalization error $\mathcal{E}_{\mathrm{gen}}$ is determined by the number/locations of residual points in $\mathcal{T}$ and the capacity of the family $\mathcal{F}$. Neural networks of larger size have smaller approximation errors but could lead to higher generalization errors, which is called the bias-variance tradeoff. Overfitting occurs when the generalization error dominates. In addition, the optimization error $\mathcal{E}_{\mathrm{opt}}$ stems from the loss function complexity and the optimization setup, such as learning rate and number of iterations. However, currently there is no error estimation for PINNs yet, and even quantifying the three errors for supervised learning is still an open research problem [36, 35, 25].
2.5. Comparison between PINNs and FEM. To further explain the ideas of
PINNs and to help those with knowledge of FEM to understand PINNs more easily,
we make a comparison between PINNs and FEM point by point (Table 2):


Table 2 Comparison between PINN and FEM.

                  | PINN                                                              | FEM
Basis function    | Neural network (nonlinear)                                        | Piecewise polynomial (linear)
Parameters        | Weights and biases                                                | Point values
Training points   | Scattered points (mesh-free)                                      | Mesh points
PDE embedding     | Loss function                                                     | Algebraic system
Parameter solver  | Gradient-based optimizer                                          | Linear solver
Errors            | $\mathcal{E}_{\mathrm{app}}$, $\mathcal{E}_{\mathrm{gen}}$, and $\mathcal{E}_{\mathrm{opt}}$ (subsection 2.4) | Approximation/quadrature errors
Error bounds      | Not available yet                                                 | Partially available [14, 26]

• In FEM we approximate the solution $u$ by a piecewise polynomial with point values to be determined, while in PINNs we construct a neural network as the surrogate model parameterized by weights and biases.
• FEM typically requires a mesh generation, while PINN is totally mesh-free, and we can use either a grid or random points.
• FEM converts a PDE to an algebraic system, using the stiffness matrix and mass matrix, while PINN embeds the PDE and boundary conditions into the loss function.
• In the last step, the algebraic system in FEM is solved exactly by a linear solver, but the network in PINN is learned by a gradient-based optimizer.
At a more fundamental level, PINNs provide a nonlinear approximation to the function and its derivatives, whereas FEM represents a linear approximation.
2.6. PINNs for Solving Integro-differential Equations. When solving integro-differential equations (IDEs), we still employ the automatic differentiation technique to analytically derive the integer-order derivatives, while we approximate integral operators numerically using classical methods (Figure 4) [42], such as Gaussian quadrature. Therefore, we introduce a fourth error component, the discretization error $\mathcal{E}_{\mathrm{dis}}$, due to the approximation of the integral by Gaussian quadrature.

For example, when solving

$$\frac{dy}{dx} + y(x) = \int_0^x e^{t-x} y(t)\,dt,$$

we first use Gaussian quadrature of degree $n$ to approximate the integral

$$\int_0^x e^{t-x} y(t)\,dt \approx \sum_{i=1}^{n} w_i\, e^{t_i(x)-x}\, y(t_i(x)),$$

and then we use a PINN to solve the following PDE instead of the original equation:

$$\frac{dy}{dx} + y(x) \approx \sum_{i=1}^{n} w_i\, e^{t_i(x)-x}\, y(t_i(x)).$$
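As an illustration of the quadrature step, the sketch below maps the Gauss-Legendre nodes on $[-1, 1]$ to $t_i(x) = \frac{x}{2}(s_i + 1)$ and scales the weights by $x/2$ (the standard change of variables to $[0, x]$); the helper name is ours.

import numpy as np

def volterra_integral(y, x, n=20):
    """Approximate int_0^x e^{t-x} y(t) dt with n-point Gauss-Legendre quadrature."""
    s, w = np.polynomial.legendre.leggauss(n)  # nodes and weights on [-1, 1]
    t = 0.5 * x * (s + 1.0)                    # mapped nodes t_i(x) in [0, x]
    return 0.5 * x * np.sum(w * np.exp(t - x) * y(t))

# Example with the exact solution y(x) = e^{-x} cosh(x) from subsection 4.5:
# volterra_integral(lambda t: np.exp(-t) * np.cosh(t), 2.0)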

PINNs can also be easily extended to solve FDEs [42] and SDEs [57, 55, 41, 56], but
we do not discuss such cases here due to page limit constraints.
2.7. PINNs for Solving Inverse Problems. In inverse problems, there are some unknown parameters $\boldsymbol{\lambda}$ in (2.1), but we have some extra information on some points $\mathcal{T}_i \subset \Omega$ besides the differential equation and boundary conditions:

$$\mathcal{I}(u, x) = 0 \quad \text{for } x \in \mathcal{T}_i.$$


Fig. 4 Schematic illustrating the modification of the PINN algorithm for solving IDEs. We employ
the AD technique to analytically derive the integer-order derivatives, and we approximate
integral operators numerically using standard methods. (The figure is reproduced from [42].)

From an implementation point of view, PINNs solve inverse problems as easily as forward problems [47, 42]. The only difference between solving forward and inverse problems is that we add an extra loss term to (2.2):

$$\mathcal{L}(\boldsymbol{\theta}, \boldsymbol{\lambda}; \mathcal{T}) = w_f \mathcal{L}_f(\boldsymbol{\theta}, \boldsymbol{\lambda}; \mathcal{T}_f) + w_b \mathcal{L}_b(\boldsymbol{\theta}, \boldsymbol{\lambda}; \mathcal{T}_b) + w_i \mathcal{L}_i(\boldsymbol{\theta}, \boldsymbol{\lambda}; \mathcal{T}_i),$$

where

$$\mathcal{L}_i(\boldsymbol{\theta}, \boldsymbol{\lambda}; \mathcal{T}_i) = \frac{1}{|\mathcal{T}_i|} \sum_{x \in \mathcal{T}_i} \|\mathcal{I}(\hat{u}, x)\|_2^2.$$

We then optimize $\boldsymbol{\theta}$ and $\boldsymbol{\lambda}$ together, and our solution is $\boldsymbol{\theta}^*, \boldsymbol{\lambda}^* = \arg\min_{\boldsymbol{\theta}, \boldsymbol{\lambda}} \mathcal{L}(\boldsymbol{\theta}, \boldsymbol{\lambda}; \mathcal{T})$.
2.8. Residual-Based Adaptive Refinement (RAR). As we discussed in subsection 2.3, the residual points $\mathcal{T}$ are usually randomly distributed in the domain. This works well for most cases, but it may not be efficient for certain PDEs that exhibit solutions with steep gradients. Take the Burgers equation as an example; intuitively we should put more points near the sharp front to capture the discontinuity well. However, it is challenging, in general, to design a good distribution of residual points for problems whose solution is unknown. To overcome this challenge, we propose a residual-based adaptive refinement (RAR) method to improve the distribution of residual points during the training process (Procedure 2.2), conceptually similar to FEM refinement methods [2]. The idea of RAR is that we will add more residual points in the locations where the PDE residual $\left\| f\left(x; \frac{\partial \hat{u}}{\partial x_1}, \ldots, \frac{\partial \hat{u}}{\partial x_d}; \frac{\partial^2 \hat{u}}{\partial x_1 \partial x_1}, \ldots, \frac{\partial^2 \hat{u}}{\partial x_1 \partial x_d}; \ldots; \boldsymbol{\lambda}\right) \right\|$ is large, and we repeatedly add points until the mean residual

(2.3) $$\mathcal{E}_r = \frac{1}{V} \int_\Omega \left\| f\left(x; \frac{\partial \hat{u}}{\partial x_1}, \ldots, \frac{\partial \hat{u}}{\partial x_d}; \frac{\partial^2 \hat{u}}{\partial x_1 \partial x_1}, \ldots, \frac{\partial^2 \hat{u}}{\partial x_1 \partial x_d}; \ldots; \boldsymbol{\lambda}\right) \right\| dx$$

is smaller than a threshold $\mathcal{E}_0$, where $V$ is the volume of $\Omega$.


Procedure 2.2 RAR for improving the distribution of residual points for training.
Step 1 Select the initial residual points $\mathcal{T}$, and train the neural network for a limited number of iterations.
Step 2 Estimate the mean PDE residual $\mathcal{E}_r$ in (2.3) by Monte Carlo integration, i.e., by the average of values at a set of randomly sampled dense locations $\mathcal{S} = \{x_1, x_2, \ldots, x_{|\mathcal{S}|}\}$:

$$\mathcal{E}_r \approx \frac{1}{|\mathcal{S}|} \sum_{x \in \mathcal{S}} \left\| f\left(x; \frac{\partial \hat{u}}{\partial x_1}, \ldots, \frac{\partial \hat{u}}{\partial x_d}; \frac{\partial^2 \hat{u}}{\partial x_1 \partial x_1}, \ldots, \frac{\partial^2 \hat{u}}{\partial x_1 \partial x_d}; \ldots; \boldsymbol{\lambda}\right) \right\|.$$

Step 3 Stop if $\mathcal{E}_r < \mathcal{E}_0$. Otherwise, add $m$ new points with the largest residuals in $\mathcal{S}$ to $\mathcal{T}$, retrain the network, and go to Step 2.
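A minimal sketch of Procedure 2.2 in plain Python/NumPy is given below; `train`, `pde_residual`, and `sample_domain` are hypothetical stand-ins for the user's training routine, residual evaluator, and domain sampler, and the default constants mirror the notation above.

import numpy as np

def rar(train, pde_residual, sample_domain, T, m=1, n_dense=100000, eps0=0.005):
    """Residual-based adaptive refinement (Procedure 2.2), assuming:
    train(T): trains the network on the residual points T for a limited number of iterations,
    pde_residual(X): returns a 1D array of residual magnitudes |f(x; ...)| for the rows of X,
    sample_domain(n): returns an (n, d) array of random points in Omega."""
    train(T)                                       # Step 1
    while True:
        S = sample_domain(n_dense)                 # Step 2: dense random set S
        r = pde_residual(S)
        if np.mean(r) < eps0:                      # Step 3: stop once the mean residual is small
            return T
        T = np.vstack([T, S[np.argsort(r)[-m:]]])  # add the m points with the largest residuals
        train(T)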

3. DeepXDE Usage and Customization. In this section, we introduce the usage of DeepXDE and how to customize DeepXDE to meet new problem requirements.
3.1. Usage. Compared to traditional numerical methods, the code written with
DeepXDE is much shorter and more comprehensive, resembling closely the math-
ematical formulation. Solving differential equations in DeepXDE is no more than
specifying the problem using the built-in modules, including computational domain
(geometry and time), PDE equations, boundary/initial conditions, constraints, train-
ing data, neural network architecture, and training hyperparameters. The workflow
is shown in Procedure 3.1 and Figure 5.

Procedure 3.1 Usage of DeepXDE for solving differential equations.
Step 1 Specify the computational domain using the geometry module.
Step 2 Specify the PDE using the grammar of TensorFlow.
Step 3 Specify the boundary and initial conditions.
Step 4 Combine the geometry, PDE, and boundary/initial conditions together into data.PDE or data.TimePDE for time-independent or time-dependent problems, respectively. To specify training data, we can either set the specific point locations, or only set the number of points and then DeepXDE will sample the required number of points on a grid or randomly.
Step 5 Construct a neural network using the maps module.
Step 6 Define a Model by combining the PDE problem in Step 4 and the neural net in Step 5.
Step 7 Call Model.compile to set the optimization hyperparameters, such as optimizer and learning rate. The weights in (2.2) can be set here by loss_weights.
Step 8 Call Model.train to train the network from random initialization or a pretrained model using the argument model_restore_path. It is extremely flexible to monitor and modify the training behavior using callbacks.
Step 9 Call Model.predict to predict the PDE solution at different locations.
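To make the workflow concrete, here is a minimal sketch for the 1D Poisson problem $-u_{xx} = \pi^2 \sin(\pi x)$ on $[-1, 1]$ with $u(\pm 1) = 0$, following Steps 1-9; it assumes the TensorFlow-v1-style backend used at the time of writing, and module or argument names may differ slightly across DeepXDE versions.

import numpy as np
import tensorflow as tf
import deepxde as dde

def pde(x, y):                                   # Step 2: PDE residual in TensorFlow grammar
    dy_x = tf.gradients(y, x)[0]
    dy_xx = tf.gradients(dy_x, x)[0]
    return -dy_xx - np.pi ** 2 * tf.sin(np.pi * x)

geom = dde.geometry.Interval(-1, 1)              # Step 1: computational domain
bc = dde.DirichletBC(geom, lambda x: 0, lambda x, on_boundary: on_boundary)  # Step 3

data = dde.data.PDE(geom, pde, bc, num_domain=16, num_boundary=2)   # Step 4
net = dde.maps.FNN([1] + [50] * 3 + [1], "tanh", "Glorot uniform")   # Step 5
model = dde.Model(data, net)                                          # Step 6

model.compile("adam", lr=0.001)                  # Step 7
model.train(epochs=10000)                        # Step 8
u = model.predict(np.linspace(-1, 1, 100)[:, None])  # Step 9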

In DeepXDE, the built-in primitive geometries include interval, triangle, rectangle, polygon, disk, cuboid, and sphere. Other geometries can be constructed from these primitive geometries using three boolean operations: union (|), difference (-), and intersection (&). This technique is called constructive solid geometry (CSG); see Figure 6 for examples. CSG supports both 2D and 3D geometries.
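For example, a sketch of the 2D boolean operations is given below; the particular rectangle and disk are arbitrary choices of ours.

import deepxde as dde

A = dde.geometry.Rectangle([0, 0], [2, 1])
B = dde.geometry.Disk([1, 0.5], 0.4)

union = A | B          # union A|B
difference = A - B     # difference A-B
intersection = A & B   # intersection A&B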


Fig. 5 Flowchart of DeepXDE corresponding to Procedure 3.1. The white boxes define the PDE
problem and the training hyperparameters. The blue boxes combine the PDE problem and
training hyperparameters in the white boxes. The orange boxes are the three steps (from right
to left) to solve the PDE.

Fig. 6 Examples of constructive solid geometry (CSG) in 2D. (left) A and B represent the rectangle and circle, respectively. The union A|B, difference A-B, and intersection A&B are constructed from A and B. (right) A complex geometry (top) is constructed from a polygon, a rectangle, and two circles (bottom) through the union, difference, and intersection operations. This capability is included in the module geometry of DeepXDE.

DeepXDE supports four standard boundary conditions, including Dirichlet (DirichletBC), Neumann (NeumannBC), Robin (RobinBC), and periodic (PeriodicBC), and a more general boundary condition can be defined using OperatorBC. The initial condition can be defined using IC. There are two types of neural networks available in DeepXDE: the feed-forward neural network (maps.FNN) and the residual neural network (maps.ResNet). It is also convenient to choose different training hyperparameters, such as loss types, metrics, optimizers, learning rate schedules, initializations, and regularizations.
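As an example, a sketch of mixed boundary conditions on the unit interval is shown below; the boundary-selector functions follow DeepXDE's (x, on_boundary) convention, and the specific boundary values are arbitrary.

import numpy as np
import deepxde as dde

geom = dde.geometry.Interval(0, 1)

def left(x, on_boundary):                # selects the boundary point x = 0
    return on_boundary and np.isclose(x[0], 0)

def right(x, on_boundary):               # selects the boundary point x = 1
    return on_boundary and np.isclose(x[0], 1)

bc_dirichlet = dde.DirichletBC(geom, lambda x: 0, left)   # u(0) = 0
bc_neumann = dde.NeumannBC(geom, lambda x: 1, right)      # du/dn(1) = 1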
In addition to solving differential equations, DeepXDE can also be used to ap-
proximate functions from multifidelity data [40] and learn nonlinear operators [34].
3.2. Customizability. All the components of DeepXDE are loosely coupled, and
thus DeepXDE is well structured and highly configurable. In this subsection, we
discuss how to customize DeepXDE to address new problem requirements, e.g., new
geometry or network architecture.


3.2.1. Geometry. As introduced above, DeepXDE already supports 7 basic geometries and the CSG technique. However, it is still possible that the user needs a new geometry that cannot be constructed in DeepXDE. In this situation, a new geometry can be defined as shown in Procedure 3.2. Currently DeepXDE does not accurately support descriptions of complex curvilinear boundaries; however, a future extension could be the incorporation of the nonuniform rational basis spline (NURBS) [22] for such representations.

Procedure 3.2 Customization of the new geometry module MyGeometry. The class methods should only be implemented as needed.

class MyGeometry(Geometry):
    def inside(self, x):
        """Check if x is inside the geometry."""
    def on_boundary(self, x):
        """Check if x is on the geometry boundary."""
    def boundary_normal(self, x):
        """Compute the unit normal at x for Neumann or Robin boundary conditions."""
    def periodic_point(self, x, component):
        """Compute the periodic image of x for periodic boundary condition."""
    def uniform_points(self, n, boundary=True):
        """Compute the equispaced point locations in the geometry."""
    def random_points(self, n, random="pseudo"):
        """Compute the random point locations in the geometry."""
    def uniform_boundary_points(self, n):
        """Compute the equispaced point locations on the boundary."""
    def random_boundary_points(self, n, random="pseudo"):
        """Compute the random point locations on the boundary."""

3.2.2. Neural Networks. DeepXDE currently supports two neural networks: the feed-forward neural network (maps.FNN) and the residual neural network (maps.ResNet). A new network can be added, as shown in Procedure 3.3.

Procedure 3.3 Customization of the neural network MyNet.

class MyNet(Map):
    @property
    def inputs(self):
        """Return the net inputs."""
    @property
    def outputs(self):
        """Return the net outputs."""
    @property
    def targets(self):
        """Return the targets of the net outputs."""
    def build(self):
        """Construct the network."""

3.2.3. Callbacks. It is usually a good strategy to monitor the training process of the neural network, and then make modifications in real time, e.g., change the learning rate. In DeepXDE, this can be implemented by adding a callback function, and here we only list a few commonly used ones already implemented in DeepXDE:
• ModelCheckpoint, which saves the model after certain epochs or when a better model is found.
• OperatorPredictor, which calculates the values of the operator applied to the outputs.
• FirstDerivative, which calculates the first derivative of the outputs with respect to the inputs. This is a special case of OperatorPredictor with the operator being the first derivative.
• MovieDumper, which dumps the movie of the function during the training progress, and/or the movie of the spectrum of its Fourier transform.
It is very convenient to add other callback functions, which will be called at different stages of the training process; see Procedure 3.4.

Procedure 3.4 Customization of the callback MyCallback. Here, we only show how to add functions to be called at the beginning/end of every epoch. Similarly, we can call functions at the other training stages, such as at the beginning of training.

class MyCallback(Callback):
    def on_epoch_begin(self):
        """Called at the beginning of every epoch."""
    def on_epoch_end(self):
        """Called at the end of every epoch."""

4. Demonstration Examples. In this section, we use PINNs and DeepXDE to solve different problems. In all examples, we use tanh as the activation function, and the other hyperparameters are listed in Table 3. The weights $w_f$, $w_b$, and $w_i$ in the loss function are set to 1. The codes of all examples are published on GitHub.
Table 3 Hyperparameters used for the following 5 examples. "Adam, L-BFGS" means that we first use Adam for a certain number of iterations, and then switch to L-BFGS. The optimizer L-BFGS does not require a learning rate, and the neural network (NN) is trained until convergence, so the number of iterations is also ignored for L-BFGS.

Example | NN depth | NN width | Optimizer    | Learning rate | # Iterations
1       | 4        | 50       | Adam, L-BFGS | 0.001         | 50000
2       | 3        | 20       | Adam, L-BFGS | 0.001         | 10000
3       | 3        | 40       | Adam         | 0.001         | 60000
4       | 3        | 20       | Adam         | 0.001         | 80000
5       | 4        | 20       | L-BFGS       | -             | -

4.1. Poisson Equation over an L-shaped Domain. Consider the following 2D Poisson equation over an L-shaped domain $\Omega = [-1, 1]^2 \setminus [0, 1]^2$:

$$-\Delta u(x, y) = 1, \quad (x, y) \in \Omega,$$
$$u(x, y) = 0, \quad (x, y) \in \partial\Omega.$$

We choose 1200 and 120 random points drawn from a uniform distribution as $\mathcal{T}_f$ and $\mathcal{T}_b$, respectively. The PINN solution is given in Figure 7B. For comparison, we also present the numerical solution obtained using the spectral element method (SEM) [27] (Figure 7A). The result of the absolute error is shown in Figure 7C.
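A sketch of setting up this problem in DeepXDE is shown below, constructing the L-shaped domain by a CSG difference; the derivative expressions follow the TensorFlow-style PDE definition of Step 2 in Procedure 3.1, and argument names may differ across versions.

import tensorflow as tf
import deepxde as dde

# L-shaped domain: [-1, 1]^2 minus [0, 1]^2
geom = dde.geometry.Rectangle([-1, -1], [1, 1]) - dde.geometry.Rectangle([0, 0], [1, 1])

def pde(x, y):
    dy = tf.gradients(y, x)[0]
    dy_xx = tf.gradients(dy[:, 0:1], x)[0][:, 0:1]
    dy_yy = tf.gradients(dy[:, 1:2], x)[0][:, 1:2]
    return -dy_xx - dy_yy - 1          # residual of -Delta u = 1

bc = dde.DirichletBC(geom, lambda x: 0, lambda x, on_boundary: on_boundary)
data = dde.data.PDE(geom, pde, bc, num_domain=1200, num_boundary=120)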
4.2. RAR for the Burgers Equation. We first consider the 1D Burgers equation:

$$\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} = \nu\frac{\partial^2 u}{\partial x^2}, \quad x \in [-1, 1],\ t \in [0, 1],$$
$$u(x, 0) = -\sin(\pi x), \quad u(-1, t) = u(1, t) = 0.$$

Let $\nu = 0.01/\pi$. Initially, we randomly select 2500 points (spatio-temporal domain) as the residual points, and then 40 more residual points are added adaptively via


Fig. 7 Example in subsection 4.1. Comparison of the PINN solution with the solution obtained by using the spectral element method (SEM). (A) The SEM solution $u_{SEM}$, (B) the PINN solution $u_{NN}$, (C) the absolute error $|u_{SEM} - u_{NN}|$.

the RAR method developed in subsection 2.8 with $m = 1$, $|\mathcal{S}| = 100000$, and $\mathcal{E}_0 = 0.005$. The training procedure after adding new points is shown in Table 3. We compare the PINN solution with RAR and the PINN solution based on 2540 randomly selected training data (panels A and B of Figure 8), and demonstrate that PINN with RAR can capture the discontinuity much better. For comparison, the finite difference solutions using the central difference scheme for space discretization and the forward Euler scheme for time discretization for the Burgers equation in the conservative form are also shown in Figure 8A. Here, we also present two examples for the residual points added via the RAR method. As shown in panels C and D of Figure 8, the added points (green crosses) are quite close to the sharp interface, which indicates the effectiveness of RAR.
We further solve the following 2D Burgers equation using RAR:

$$\partial_t u + u\,\partial_x u + v\,\partial_y u = \frac{1}{Re}\left(\partial_x^2 u + \partial_y^2 u\right),$$
$$\partial_t v + u\,\partial_x v + v\,\partial_y v = \frac{1}{Re}\left(\partial_x^2 v + \partial_y^2 v\right),$$
$$x, y \in [0, 1], \quad t \in [0, 1],$$

where $u$ and $v$ are the velocities along the $x$ and $y$ directions, respectively. In addition, $Re$ is a nondimensional number (Reynolds number) defined as $Re = UL/\nu$, in which $U$ and $L$ are, respectively, the characteristic velocity and length, and $\nu$ is the kinematic viscosity of the fluid. The exact solution can be obtained [3] as

$$u(x, y, t) = \frac{3}{4} - \frac{1}{4\left[1 + \exp((-4x + 4y - t)Re/32)\right]},$$
$$v(x, y, t) = \frac{3}{4} + \frac{1}{4\left[1 + \exp((-4x + 4y - t)Re/32)\right]},$$

using the Dirichlet boundary conditions on all boundaries. In the present study, Re
is set to be 5000, which is quite challenging due to the fact that the high Reynolds
number leads to a steep gradient in the solution. Initially, 200 residual points are
randomly sampled in the spatio-temporal domain, and 5000 and 1000 random points
are used for each initial and boundary condition, respectively. We only add 10 extra
residual points using RAR, and the total number of the residual points is 210 after
convergence. For comparison, we also test the case without RAR using 210 randomly
sampled residual points. The results are displayed in panels E and F of Figure 8,
demonstrating the effectiveness of RAR.


Fig. 8 Example in subsection 4.2. Comparisons of the PINN solutions of 1D (A, B, C, and D) and 2D (E and F) Burgers equations with and without RAR. (A) The cyan, green, and red lines represent the reference solution of $u$ from [47], the PINN solution without RAR, and the PINN solution with RAR at $t = 0.9$, respectively. For the finite difference (FD) method, $200 \times 100 = 20000$ spatial-temporal grid points are used to achieve a good solution (blue line). If only $60 \times 40 = 2400$ points are used, the FD solution has large oscillations around the discontinuity (brown line). (B) $L^2$ relative error versus the number of residual points. The red solid line and shaded region correspond to the mean and one-standard-deviation band for the $L^2$ relative error of PINN with RAR, respectively. The blue dashed line is the mean and one-standard-deviation for the error of PINN using 2540 random residual points. The mean and standard deviation are obtained from 10 runs with random initial residual points. (C) and (D) Two representative examples for the residual points added via RAR. Black dots: initial residual points; green crosses: added residual points; 6 and 11 residual points are added in C and D, respectively. (E) and (F) Comparison of the velocity profiles at $y = 2/3$ from the PINNs with and without RAR for the 2D Burgers equation. The profiles at three different times ($t = 0.2$, 0.5, and 1) are presented with $t$ increasing along the direction of the arrow.


4.3. Inverse Problem for the Lorenz System. Consider the parameter identification problem of the Lorenz system

$$\frac{dx}{dt} = \rho(y - x), \quad \frac{dy}{dt} = x(\sigma - z) - y, \quad \frac{dz}{dt} = xy - \beta z,$$

with initial condition $(x(0), y(0), z(0)) = (-8, 7, 27)$, where $\rho$, $\sigma$, and $\beta$ are the three parameters to be identified from the observations at certain times. The observations are produced by solving the above system to $t = 3$ using Runge-Kutta (4, 5) with the underlying true parameters $(\rho, \sigma, \beta) = (10, 15, 8/3)$. We choose 400 uniformly distributed random points and 25 equispaced points as the residual points $\mathcal{T}_f$ and $\mathcal{T}_i$, respectively. The evolution trajectories of $\rho$, $\sigma$, and $\beta$ are presented in Figure 9A, with the final identified values of $(\rho, \sigma, \beta) = (10.002, 14.999, 2.668)$.

Fig. 9 Examples in subsections 4.3 and 4.4. The identified values of (A) the Lorenz system and
(B) diffusion-reaction system converge to the true values during the training process. The
parameter values are scaled for plotting.

4.4. Inverse Problem for Diffusion-Reaction Systems. A diffusion-reaction system in porous media for the solute concentrations $C_A$, $C_B$, and $C_C$ ($A + 2B \rightarrow C$) is described by

$$\frac{\partial C_A}{\partial t} = D\frac{\partial^2 C_A}{\partial x^2} - k_f C_A C_B^2, \quad \frac{\partial C_B}{\partial t} = D\frac{\partial^2 C_B}{\partial x^2} - 2k_f C_A C_B^2, \quad x \in [0, 1],\ t \in [0, 10],$$
$$C_A(x, 0) = C_B(x, 0) = e^{-20x}, \quad C_A(0, t) = C_B(0, t) = 1, \quad C_A(1, t) = C_B(1, t) = 0,$$

where $D = 2 \times 10^{-3}$ is the effective diffusion coefficient, and $k_f = 0.1$ is the effective reaction rate. Because $D$ and $k_f$ depend on the pore structure and are difficult to measure directly, we estimate $D$ and $k_f$ based on 40000 observations of the concentrations $C_A$ and $C_B$ in the spatio-temporal domain. The identified $D$ ($1.98 \times 10^{-3}$) and $k_f$ (0.0971), displayed in Figure 9B, agree well with their true values.
4.5. Volterra IDE. Here, we consider the first-order IDE of Volterra type in the domain [0, 5]:

$$\frac{dy}{dx} + y(x) = \int_0^x e^{t-x} y(t)\,dt, \quad y(0) = 1,$$

with the exact solution $y(x) = e^{-x} \cosh x$. We solve this IDE using the method in subsection 2.6 and approximate the integral using Gauss-Legendre quadrature of degree 20. The $L^2$ relative error is $2 \times 10^{-3}$, and the solution is shown in Figure 10.


Fig. 10 Example in subsection 4.5. The PINN algorithm for solving the Volterra IDE. The solid blue line is the exact solution, and the dashed red line is the numerical solution from PINN. Twelve equispaced residual points (black dots) are used.

5. Concluding Remarks. In this paper, we present the algorithm, approximation theory, and error analysis of physics-informed neural networks (PINNs) for solving different types of partial differential equations (PDEs). Compared to traditional
ing different types of partial differential equations (PDEs). Compared to traditional
numerical methods, PINNs employ automatic differentiation to handle differential
operators, and thus they are mesh-free. Unlike numerical differentiation, automatic
differentiation does not differentiate the data, and hence it can tolerate noisy data
for training. We also discuss how to extend PINNs to solve other types of differential
equations, such as integro-differential equations, and also how to solve inverse prob-
lems. In addition, we propose a residual-based adaptive refinement (RAR) method
to improve the distribution of residual points during the training process, and thus
increase the training efficiency.
To benefit both the educational and the computational science communities, we
have developed the Python library DeepXDE, an implementation of PINNs. By in-
troducing the usage of DeepXDE, we show that DeepXDE enables user codes to be
compact and follow closely the mathematical formulation. We also demonstrate how
to customize DeepXDE to meet new problem requirements. Our numerical examples
for forward and inverse problems verify the effectiveness of PINNs and the capabil-
ity of DeepXDE. Scientific machine learning is emerging as a new and potentially
powerful alternative to classical scientific computing, so we hope that libraries such
as DeepXDE will accelerate this development and make it accessible not only to stu-
dents but also to other researchers who may find the need to adopt PINN-like methods
in their research, which can be very effective, especially for inverse problems.
Despite the aforementioned advantages, PINNs still have some limitations. For
forward problems, PINNs are currently slower than finite elements, but this can be
alleviated via offline training [58, 53]. For long time integration, one can also use
time-parallel methods to simultaneously compute on multiple GPUs for shorter time
domains. Another limitation is the search for effective neural network architectures,
which is currently done empirically by users; however, emerging meta-learning tech-
niques can be used to automate this search; see [59, 17]. Moreover, while here we
enforce the strong form of PDEs, which is easily implemented using automatic dif-
ferentiation, alternative weak/variational forms may also be effective, although they
require the use of quadrature grids. Many other extensions for multiphysics and

multiscale problems are possible across different scientific disciplines by creatively de-
signing the loss function and introducing suitable solution spaces. For instance, in the
five examples we present here, we only assume data on scattered points; however, in
geophysics or biomedicine we may have mixed data in the form of images and point
measurements. In this case, we can design a composite neural network consisting of
one convolutional neural network and one PINN sharing the same set of parameters,
and minimize the total loss, which could be a weighted summation of multiple losses
from each neural network.
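As a minimal sketch of such a weighted summation of losses (the individual loss callables, e.g., an image misfit for the convolutional branch, a point-measurement misfit, and a PDE residual loss for the PINN branch, are hypothetical placeholders):

def total_loss(theta, losses, weights):
    # Weighted summation of the losses from each network branch,
    # all evaluated at the shared set of parameters theta.
    return sum(w * loss(theta) for w, loss in zip(weights, losses))

# For example, with hypothetical callables image_loss, point_loss, and pde_loss:
# L = total_loss(theta, [image_loss, point_loss, pde_loss], [1.0, 10.0, 1.0])

The relative weights control the balance between the data misfits and the PDE residual and generally need to be tuned for each problem.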

REFERENCES

[1] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard et al., TensorFlow: A system for large-scale machine learning,
in 12th USENIX Symposium on Operating Systems Design and Implementation, 2016,
pp. 265--283. (Cited on p. 212)
[2] M. Ainsworth and J. T. Oden, A Posteriori Error Estimation in Finite Element Analysis,
John Wiley & Sons, 2011. (Cited on p. 217)
[3] A. Baeza and P. Mulet, Adaptive mesh refinement techniques for high-order shock capturing
schemes for multi-dimensional hydrodynamic simulations, Internat. J. Numer. Methods
Fluids, 52 (2006), pp. 455--471. (Cited on p. 222)
[4] N. Baker, F. Alexander, T. Bremer, A. Hagberg, Y. Kevrekidis, H. Najm, M. Parashar,
A. Patra, J. Sethian, S. Wild et al., Workshop Report on Basic Research Needs for
Scientific Machine Learning: Core Technologies for Artificial Intelligence, Tech. report,
U.S. DOE Office of Science, Washington, DC, 2019. (Cited on p. 209)
[5] A. G. Baydin, B. A. Pearlmutter, A. A. Radul, and J. M. Siskind, Automatic differen-
tiation in machine learning: A survey, J. Mach. Learn. Res., 18 (2017), pp. 5595--5637.
(Cited on pp. 210, 211)
[6] C. Beck, W. E, and A. Jentzen, Machine learning approximation algorithms for high-
dimensional fully nonlinear partial differential equations and second-order backward
stochastic differential equations, J. Nonlinear Sci., 29 (2019), pp. 1563--1619. (Cited on
p. 209)
[7] J. Berg and K. Nyström, A unified deep artificial neural network approach to partial differ-
ential equations in complex geometries, Neurocomput., 317 (2018), pp. 28--41. (Cited on
p. 209)
[8] M. Betancourt, A Geometric Theory of Higher-Order Automatic Differentiation, preprint,
https://arxiv.org/abs/1812.11592, 2018. (Cited on p. 211)
[9] J. Bettencourt, M. J. Johnson, and D. Duvenaud, Taylor-mode automatic differentiation
for higher-order derivatives in JAX, in NeurIPS 2019 Workshop on Program Transforma-
tions, 2019, https://openreview.net/forum?id=SkxEF3FNPH. (Cited on p. 211)
[10] A. Blum and R. L. Rivest, Training a 3-node neural network is NP-complete, in Advances
in Neural Information Processing Systems, Morgan Kaufmann, 1989, pp. 494--501. (Cited
on pp. 213, 215)
[11] L. Bottou and O. Bousquet, The tradeoffs of large scale learning, in Advances in Neural
Information Processing Systems, Morgan Kaufmann, 2008, pp. 161--168. (Cited on p. 215)
[12] R. H. Byrd, P. Lu, J. Nocedal, and C. Zhu, A limited memory algorithm for bound con-
strained optimization, SIAM J. Sci. Comput., 16 (1995), pp. 1190--1208, https://doi.org/
10.1137/0916069. (Cited on p. 213)
[13] Y. Chen, L. Lu, G. E. Karniadakis, and L. D. Negro, Physics-Informed Neural Networks
for Inverse Problems in Nano-optics and Metamaterials, preprint, https://arxiv.org/abs/
1912.01085, 2019. (Cited on p. 209)
[14] P. G. Ciarlet, The Finite Element Method for Elliptic Problems, Classics Appl. Math. 40,
SIAM, 2002, https://doi.org/10.1137/1.9780898719208. (Cited on p. 216)
[15] M. Dissanayake and N. Phan-Thien, Neural-network-based approximations for solving partial
differential equations, Comm. Numer. Methods Engrg., 10 (1994), pp. 195--201. (Cited on
p. 209)
[16] W. E and B. Yu, The deep Ritz method: A deep learning-based numerical algorithm for solving
variational problems, Commun. Math. Stat., 6 (2018), pp. 1--12. (Cited on p. 209)
[17] C. Finn, P. Abbeel, and S. Levine, Model-agnostic meta-learning for fast adaptation of
deep networks, in Proceedings of the 34th International Conference on Machine Learning,
PMLR, 2017, pp. 1126--1135. (Cited on p. 225)


[18] P. Grohs, F. Hornung, A. Jentzen, and P. Von Wurstemberger, A Proof That Artifi-
cial Neural Networks Overcome the Curse of Dimensionality in the Numerical Approxi-
mation of Black-Scholes Partial Differential Equations, preprint, https://arxiv.org/abs/1809.02362, 2018. (Cited on p. 209)
[19] J. Han, A. Jentzen, and W. E, Solving high-dimensional partial differential equations using
deep learning, Proc. Natl. Acad. Sci. USA, 115 (2018), pp. 8505--8510. (Cited on p. 209)
[20] J. He, L. Li, J. Xu, and C. Zheng, ReLU Deep Neural Networks and Linear Finite Elements,
preprint, https://arxiv.org/abs/1807.03973, 2018. (Cited on p. 209)
[21] Q. He, D. Brajas-Solano, G. Tartakovsky, and A. M. Tartakovsky, Physics-Informed
Neural Networks for Multiphysics Data Assimilation with Application to Subsurface Trans-
port, preprint, https://arxiv.org/abs/1912.02968, 2019. (Cited on pp. 209, 213)
[22] T. J. Hughes, J. A. Cottrell, and Y. Bazilevs, Isogeometric analysis: CAD, finite elements,
NURBS, exact geometry and mesh refinement, Computer Methods Appl. Mech. Engrg.,
194 (2005), pp. 4135--4195. (Cited on p. 220)
[23] A. D. Jagtap, K. Kawaguchi, and G. E. Karniadakis, Locally Adaptive Activation Func-
tions with Slope Recovery Term for Deep and Physics-Informed Neural Networks, preprint,
https://arxiv.org/abs/1909.12228, 2019. (Cited on p. 213)
[24] A. D. Jagtap, K. Kawaguchi, and G. E. Karniadakis, Adaptive activation functions accel-
erate convergence in deep and physics-informed neural networks, J. Comput. Phys., 404
(2020), art. 109136. (Cited on p. 213)
[25] P. Jin, L. Lu, Y. Tang, and G. E. Karniadakis, Quantifying the Generalization Error in
Deep Learning in Terms of Data Distribution and Neural Network Smoothness, preprint,
https://arxiv.org/abs/1905.11427, 2019. (Cited on p. 215)
[26] C. Johnson, Numerical Solution of Partial Differential Equations by the Finite Element
Method, Courier Corporation, 2012. (Cited on p. 216)
[27] G. E. Karniadakis and S. J. Sherwin, Spectral/hp Element Methods for Computational Fluid
Dynamics, 2nd ed., Oxford University Press, 2013. (Cited on p. 221)
[28] Y. Khoo, J. Lu, and L. Ying, Solving parametric PDE problems with artificial neural net-
works, preprint, https://arxiv.org/abs/1707.03351, 2017. (Cited on p. 209)
[29] D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, in the 3rd International
Conference on Learning Representations, 2015; preprint available from https://arxiv.org/
abs/1412.6980. (Cited on p. 213)
[30] I. E. Lagaris, A. Likas, and D. I. Fotiadis, Artificial neural networks for solving ordinary
and partial differential equations, IEEE Trans. Neural Networks, 9 (1998), pp. 987--1000.
(Cited on pp. 209, 213)
[31] I. E. Lagaris, A. C. Likas, and D. G. Papageorgiou, Neural-network methods for bound-
ary value problems with irregular boundaries, IEEE Trans. Neural Networks, 11 (2000),
pp. 1041--1049. (Cited on pp. 209, 211)
[32] Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, 521 (2015), pp. 436--444. (Cited
on p. 209)
[33] Z. Long, Y. Lu, X. Ma, and B. Dong, PDE-net: Learning PDEs from data, in Proceedings
of the 35th International Conference on Machine Learning, PMLR, 2018, pp. 3214--3222.
(Cited on p. 209)
[34] L. Lu, P. Jin, and G. E. Karniadakis, DeepONet: Learning Nonlinear Operators for Identi-
fying Differential Equations Based on the Universal Approximation Theorem of Operators,
preprint, https://arxiv.org/abs/1910.03193, 2019. (Cited on p. 219)
[35] L. Lu, Y. Shin, Y. Su, and G. E. Karniadakis, Dying ReLU and Initialization: Theory and
Numerical Examples, preprint, https://arxiv.org/abs/1903.06733, 2019. (Cited on p. 215)
[36] L. Lu, Y. Su, and G. E. Karniadakis, Collapse of Deep and Narrow Neural Nets, preprint,
https://arxiv.org/abs/1808.04947, 2018. (Cited on p. 215)
[37] Z. Mao, A. D. Jagtap, and G. E. Karniadakis, Physics-informed neural networks for high-
speed flows, Comput. Methods Appl. Mech. Engrg., 360 (2020), art. 112789. (Cited on
p. 212)
[38] C. C. Margossian, A review of automatic differentiation and its efficient implementation,
Wiley Interdiscip. Rev. Data Mining Knowl. Disc., 9 (2019), art. e1305. (Cited on pp. 210,
211)
[39] A. J. Meade, Jr., and A. A. Fernandez, The numerical solution of linear ordinary differential
equations by feedforward neural networks, Math. Comput. Modelling, 19 (1994), pp. 1--25.
(Cited on p. 209)
[40] X. Meng and G. E. Karniadakis, A Composite Neural Network That Learns from Multi-
fidelity Data: Application to Function Approximation and Inverse PDE Problems,
preprint, https://arxiv.org/abs/1903.00104, 2019. (Cited on p. 219)


[41] M. A. Nabian and H. Meidani, A Deep Neural Network Surrogate for High-Dimensional
Random Partial Differential Equations, preprint, https://arxiv.org/abs/1806.02957, 2018.
(Cited on pp. 209, 216)
[42] G. Pang, L. Lu, and G. E. Karniadakis, fPINNs: Fractional physics-informed neural net-
works, SIAM J. Sci. Comput., 41 (2019), pp. A2603--A2626, https://doi.org/10.1137/
18M1229845. (Cited on pp. 209, 211, 213, 216, 217)
[43] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison,
L. Antiga, and A. Lerer, Automatic differentiation in PyTorch, in NIPS 2017 Workshop
on Autodiff, 2017, https://openreview.net/forum?id=BJJsrmfCZ. (Cited on p. 212)
[44] A. Pinkus, Approximation theory of the MLP model in neural networks, Acta Numer., 8 (1999),
pp. 143--195. (Cited on p. 214)
[45] T. Poggio, H. Mhaskar, L. Rosasco, B. Miranda, and Q. Liao, Why and when can deep---
but not shallow---networks avoid the curse of dimensionality: A review, Internat. J. Au-
tomation Comput., 14 (2017), pp. 503--519. (Cited on p. 209)
[46] N. Rahaman, A. Baratin, D. Arpit, F. Draxler, M. Lin, F. Hamprecht, Y. Bengio,
and A. Courville, On the spectral bias of neural networks, in Proceedings of the 36th
International Conference on Machine Learning, PMLR, 2019, pp. 5301--5310. (Cited on
p. 214)
[47] M. Raissi, P. Perdikaris, and G. E. Karniadakis, Physics-informed neural networks: A deep
learning framework for solving forward and inverse problems involving nonlinear partial
differential equations, J. Comput. Phys., 378 (2019), pp. 686--707. (Cited on pp. 209, 211,
217, 223)
[48] M. Raissi, A. Yazdani, and G. E. Karniadakis, Hidden fluid mechanics: Learning velocity
and pressure fields from flow visualizations, Science, 367 (2020), pp. 1026--1030. (Cited on
p. 209)
[49] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning representations by back-
propagating errors, Nature, 323 (1986), pp. 533--536. (Cited on p. 210)
[50] J. Sirignano and K. Spiliopoulos, DGM: A deep learning algorithm for solving partial dif-
ferential equations, J. Comput. Phys., 375 (2018), pp. 1339--1364. (Cited on p. 209)
[51] A. M. Tartakovsky, C. O. Marrero, P. Perdikaris, G. D. Tartakovsky, and D. Barajas-
Solano, Learning Parameters and Constitutive Relationships with Physics Informed Deep
Neural Networks, preprint, https://arxiv.org/abs/1808.03398, 2018. (Cited on pp. 209,
213)
[52] B. P. van Milligen, V. Tribaldos, and J. Jiménez, Neural network differential equation and
plasma equilibrium solver, Phys. Rev. Lett., 75 (1995), pp. 3594--3597. (Cited on p. 209)
[53] N. Winovich, K. Ramani, and G. Lin, ConvPDE-UQ: Convolutional neural networks with
quantified uncertainty for heterogeneous elliptic partial differential equations on varied
domains, J. Comput. Phys., 394 (2019), pp. 263--279. (Cited on p. 225)
[54] Z.-Q. J. Xu, Y. Zhang, T. Luo, Y. Xiao, and Z. Ma, Frequency Principle: Fourier Analysis
Sheds Light on Deep Neural Networks, preprint, https://arxiv.org/abs/1901.06523, 2019.
(Cited on p. 214)
[55] L. Yang, D. Zhang, and G. E. Karniadakis, Physics-Informed Generative Adversarial Net-
works for Stochastic Differential Equations, preprint, https://arxiv.org/abs/1811.02033,
2018. (Cited on pp. 209, 216)
[56] D. Zhang, L. Guo, and G. E. Karniadakis, Learning in Modal Space: Solving Time-
Dependent Stochastic PDEs Using Physics-Informed Neural Networks, preprint, https:
//arxiv.org/abs/1905.01205, 2019. (Cited on pp. 209, 216)
[57] D. Zhang, L. Lu, L. Guo, and G. E. Karniadakis, Quantifying total uncertainty in physics-
informed neural networks for solving forward and inverse stochastic problems, J. Comput.
Phys., 397 (2019), art. 108850. (Cited on pp. 209, 216)
[58] Y. Zhu, N. Zabaras, P.-S. Koutsourelakis, and P. Perdikaris, Physics-Constrained Deep
Learning for High-Dimensional Surrogate Modeling and Uncertainty Quantification with-
out Labeled Data, preprint, https://arxiv.org/abs/1901.06314, 2019. (Cited on pp. 209,
225)
[59] B. Zoph and Q. V. Le, Neural Architecture Search with Reinforcement Learning, preprint,
https://arxiv.org/abs/1611.01578, 2016. (Cited on p. 225)
