DeepXDE: A Deep Learning Library for Solving Differential Equations
Abstract. Deep learning has achieved remarkable success in diverse applications; however, its use in
solving partial differential equations (PDEs) has emerged only recently. Here, we present
an overview of physics-informed neural networks (PINNs), which embed a PDE into the
loss of the neural network using automatic differentiation. The PINN algorithm is simple,
and it can be applied to different types of PDEs, including integro-differential equations,
fractional PDEs, and stochastic PDEs. Moreover, from an implementation point of view,
PINNs solve inverse problems as easily as forward problems. We propose a new residual-
based adaptive refinement (RAR) method to improve the training efficiency of PINNs.
For pedagogical reasons, we compare the PINN algorithm to a standard finite element
method. We also present a Python library for PINNs, DeepXDE, which is designed to
serve both as an educational tool to be used in the classroom and as a research tool
for solving problems in computational science and engineering. Specifically, DeepXDE can
solve forward problems given initial and boundary conditions, as well as inverse problems
given some extra measurements. DeepXDE supports complex-geometry domains based on
the technique of constructive solid geometry and enables the user code to be compact,
resembling closely the mathematical formulation. We introduce the usage of DeepXDE
and its customizability, and we also demonstrate the capability of PINNs and the user-
friendliness of DeepXDE for five different examples. More broadly, DeepXDE contributes
to the more rapid development of the emerging scientific machine learning field.
Key words. education software, DeepXDE, differential equations, deep learning, physics-informed
neural networks, scientific machine learning
DOI. 10.1137/19M1274067
1. Introduction. In the last 15 years, deep learning in the form of deep neural
networks has been used very effectively in diverse applications [32], such as computer
vision and natural language processing. Despite the remarkable success in these and
related areas, deep learning has not yet been widely used in the field of scientific
computing. However, more recently, solving partial differential equations (PDEs),
e.g., in the standard differential form or in the integral form, via deep learning has
emerged as a potentially new subfield under the name of scientific machine learning
(SciML) [4]. In particular, we can replace traditional numerical discretization methods
with a neural network that approximates the solution to a PDE.
To obtain the approximate solution of a PDE via deep learning, a key step is to
constrain the neural network to minimize the PDE residual, and several approaches
have been proposed to accomplish this. Compared to traditional mesh-based methods,
such as the finite difference method (FDM) and the finite element method (FEM),
deep learning could be a mesh-free approach by taking advantage of automatic dif-
ferentiation [47], and could break the curse of dimensionality [45, 18]. Among these
approaches, some can only be applied to particular types of problems, such as image-
like input domain [28, 33, 58] or parabolic PDEs [6, 19]. Some researchers adopt the
variational form of PDEs and minimize the corresponding energy functional [16, 20].
However, not all PDEs can be derived from a known functional, and thus Galerkin-
type projections have also been considered [39]. Alternatively, one could use the PDE
in strong form directly [15, 52, 30, 31, 7, 50, 47]; in this form, automatic differentia-
tion could be used directly to avoid truncation errors and the numerical quadrature
errors of variational forms. This strong form approach was introduced in [47], coining
the term physics-informed neural networks (PINNs). An attractive feature of PINNs
is that they can be used to solve inverse problems with minimal changes to the code
for forward problems [47, 48, 51, 21, 13]. In addition, PINNs have been further ex-
tended to solve integro-differential equations (IDEs), fractional differential equations
(FDEs) [42], and stochastic differential equations (SDEs) [57, 55, 41, 56].
In this paper, we present various PINN algorithms implemented in the Python
library DeepXDE,1 which is designed to serve both as an educational tool to be
used in the classroom and as a research tool for solving problems in computational
science and engineering (CSE). DeepXDE can be used to solve multiphysics problems,
and supports complex-geometry domains based on the technique of constructive solid
geometry (CSG), hence avoiding tedious and time-consuming computational geometry
tasks. By using DeepXDE, from an implementation point of view, time-dependent
PDEs can be solved as easily as steady states by only defining the initial conditions.
In addition to the main workflow of DeepXDE, users can readily monitor and modify
the solution process via callback functions, e.g., monitoring the Fourier spectrum of
the neural network solution, which can reveal the learning mode of the neural network,
as shown later in Figure 2. Last but not least, DeepXDE is designed to make the user
code stay compact and manageable, resembling closely the mathematical formulation.
The paper is organized as follows. In section 2, after briefly introducing deep
neural networks and automatic differentiation, we present the algorithm, approxima-
tion theory, and error analysis of PINNs, and we make a comparison between PINNs
and FEM. We then discuss how to use PINNs to solve IDEs and inverse problems. In
addition, we propose the RAR method to improve the training efficiency of PINNs.
In section 3, we introduce the usage of our library, DeepXDE, and its customizability.
1 Source code is published under the Apache License, Version 2.0, on GitHub: https://github.com/lululxvi/deepxde.
We can see that AD requires only one forward pass and one backward pass to com-
pute all the partial derivatives, no matter what the input dimension is. In contrast,
using finite differences, computing each partial derivative $\partial y/\partial x_i$ requires two function evaluations $y(x_1, \ldots, x_i, \ldots, x_{d_{\mathrm{in}}})$ and $y(x_1, \ldots, x_i + \Delta x_i, \ldots, x_{d_{\mathrm{in}}})$ for some small number $\Delta x_i$, and thus in total $d_{\mathrm{in}} + 1$ forward passes are required to evaluate all the partial
derivatives. Hence, AD is much more efficient than finite difference when the input
dimension is high (see [5, 38] for more details of the comparison between AD and the
other three methods). To compute nth-order derivatives, AD can be applied recur-
sively n times. However, this nested approach may lead to inefficiency and numerical
instability, and hence other methods, e.g., Taylor-mode AD, have been developed for
this purpose [8, 9]. Finally we note that with AD we differentiate the neural network
and therefore we can deal with noisy data [42].
Table 1 Example of AD to compute the partial derivatives $\partial y/\partial x_1$ and $\partial y/\partial x_2$ at $(x_1, x_2) = (2, 1)$.
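To make the mechanics concrete, the following is a minimal sketch of reverse-mode AD using TensorFlow's GradientTape (any reverse-mode AD tool would serve equally well); the two-variable function below is a toy stand-in for illustration, not the network of Table 1.

import tensorflow as tf

# Both partial derivatives are obtained from one forward and one backward pass.
x = tf.constant([2.0, 1.0])
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.sin(x[0]) * x[1] + x[0] ** 2  # a toy function y(x1, x2)
dy_dx = tape.gradient(y, x)              # [dy/dx1, dy/dx2]
print(dy_dx.numpy())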
where \scrB (u, x) could be Dirichlet, Neumann, Robin, or periodic boundary conditions.
For time-dependent problems, we consider time t as a special component of x, and
\Omega contains the temporal domain. The initial condition can be simply treated as a
special type of Dirichlet boundary condition on the spatio-temporal domain.
The algorithm of PINN [31, 47] is shown in Procedure 2.1, and visually in the schematic of Figure 1, solving a diffusion equation $\frac{\partial u}{\partial t} = \lambda \frac{\partial^2 u}{\partial x^2}$ with mixed boundary conditions $u(x, t) = g_D(x, t)$ on $\Gamma_D \subset \partial\Omega$ and $\frac{\partial u}{\partial n}(x, t) = g_R(u, x, t)$ on $\Gamma_R \subset \partial\Omega$. We
explain each step as follows. In a PINN, we first construct a neural network $\hat{u}(x; \bfittheta)$ as a surrogate of the solution $u(x)$, which takes the input $x$ and outputs a vector with the same dimension as $u$. Here, $\bfittheta = \{\bfitW^\ell, \bfitb^\ell\}_{1 \leq \ell \leq L}$ is the set of all weight matrices and bias vectors in the neural network $\hat{u}$. One advantage of PINNs in choosing neural networks as the surrogate of $u$ is that we can take the derivatives of $\hat{u}$ with respect to its input $x$ by applying the chain rule for differentiating compositions of functions via automatic differentiation.

Procedure 2.1 The PINN algorithm for solving differential equations.
Step 1 Construct a neural network $\hat{u}(x; \bfittheta)$ with parameters $\bfittheta$.
Step 2 Specify the two training sets $\scrT_f$ and $\scrT_b$ for the equation and boundary/initial conditions.
Step 3 Specify a loss function by summing the weighted L2 norm of both the PDE
equation and boundary condition residuals.
Step 4 Train the neural network to find the best parameters \bfittheta \ast by minimizing the
loss function \scrL (\bfittheta ; \scrT ).
Fig. 1 Schematic of a PINN for solving the diffusion equation $\frac{\partial u}{\partial t} = \lambda \frac{\partial^2 u}{\partial x^2}$ with mixed boundary conditions (BC) $u(x, t) = g_D(x, t)$ on $\Gamma_D \subset \partial\Omega$ and $\frac{\partial u}{\partial n}(x, t) = g_R(u, x, t)$ on $\Gamma_R \subset \partial\Omega$. The initial condition (IC) is treated as a special type of boundary condition. $\scrT_f$ and $\scrT_b$ denote the two sets of residual points for the equation and BC/IC.
(2.2) $\scrL(\bfittheta; \scrT) = w_f \scrL_f(\bfittheta; \scrT_f) + w_b \scrL_b(\bfittheta; \scrT_b)$,
where
$$\scrL_f(\bfittheta; \scrT_f) = \frac{1}{|\scrT_f|} \sum_{x \in \scrT_f} \left\| f\!\left(x; \frac{\partial \hat{u}}{\partial x_1}, \ldots, \frac{\partial \hat{u}}{\partial x_d}; \frac{\partial^2 \hat{u}}{\partial x_1 \partial x_1}, \ldots, \frac{\partial^2 \hat{u}}{\partial x_1 \partial x_d}; \ldots; \bfitlambda\right) \right\|_2^2,$$
$$\scrL_b(\bfittheta; \scrT_b) = \frac{1}{|\scrT_b|} \sum_{x \in \scrT_b} \|\scrB(\hat{u}, x)\|_2^2,$$
and $w_f$ and $w_b$ are the weights. The loss involves derivatives, such as the partial derivative $\partial \hat{u}/\partial x_1$ or the normal derivative at the boundary $\partial \hat{u}/\partial n = \nabla \hat{u} \cdot \bfn$, which are handled via AD.
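For concreteness, here is a minimal sketch of how such a loss could be assembled with TensorFlow's GradientTape for the diffusion equation of Figure 1; it illustrates the idea only and is not DeepXDE's internal implementation. The network net, the residual points x_f, t_f, the boundary/initial points x_b, t_b, and the prescribed values g_b (all 1-D tensors) are assumed to be given.

import tensorflow as tf

def pinn_loss(net, x_f, t_f, x_b, t_b, g_b, lam, w_f=1.0, w_b=1.0):
    """Weighted loss of the form (2.2) for u_t = lam * u_xx (illustrative sketch)."""
    # PDE residual on the set T_f; all derivatives are obtained via AD.
    with tf.GradientTape() as tape2:
        tape2.watch(x_f)
        with tf.GradientTape(persistent=True) as tape1:
            tape1.watch([x_f, t_f])
            u = net(tf.stack([x_f, t_f], axis=1))
        u_x = tape1.gradient(u, x_f)
        u_t = tape1.gradient(u, t_f)
    u_xx = tape2.gradient(u_x, x_f)
    loss_f = tf.reduce_mean(tf.square(u_t - lam * u_xx))
    # Dirichlet BC/IC residual on the set T_b.
    u_b = tf.squeeze(net(tf.stack([x_b, t_b], axis=1)), axis=1)
    loss_b = tf.reduce_mean(tf.square(u_b - g_b))
    return w_f * loss_f + w_b * loss_b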
In the last step, the procedure of searching for a good \bfittheta by minimizing the loss
\scrL(\bfittheta; \scrT) is called "training." Considering the fact that the loss is highly nonlinear and
nonconvex with respect to \bfittheta [10], we usually minimize the loss function by gradient-
based optimizers such as gradient descent, Adam [29], and L-BFGS [12]. We remark
that, based on our experience, for smooth PDE solutions L-BFGS can find a good
solution with fewer iterations than Adam, because L-BFGS uses second-order deriva-
tives of the loss function, while Adam relies only on first-order derivatives. However,
for stiff solutions L-BFGS is more likely to be stuck at a bad local minimum. The
required number of iterations highly depends on the problem (e.g., the smoothness
of the solution), and to check whether the network converges or not, we can monitor
the loss function or the PDE residual using callback functions. We also note that
acceleration of training can be achieved by using an adaptive activation function that
may remove bad local minima; see [24, 23].
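In DeepXDE, this two-stage strategy is usually expressed as two successive compile/train calls, as sketched below; model is assumed to be a dde.Model built as in section 3, the iteration count is arbitrary, and the exact optimizer string for L-BFGS may differ between DeepXDE versions.

# Two-stage training (sketch): Adam first, then L-BFGS refinement.
model.compile("adam", lr=0.001)
model.train(epochs=15000)
model.compile("L-BFGS-B")
model.train()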
Unlike traditional numerical methods, for PINNs there is no guarantee of unique
solutions, because PINN solutions are obtained by solving nonconvex optimization
problems, which in general do not have a unique solution. In practice, to achieve a
good level of accuracy, we need to tune all the hyperparameters, e.g., network size,
learning rate, and the number of residual points. The required network size depends
highly on the smoothness of the PDE solution. For example, a small network (e.g.,
a few layers and twenty neurons per layer) is sufficient for solving the 1D Poisson
equation, but a deeper and wider network is required for the 1D Burgers equation to
achieve a similar level of accuracy. We also note that PINNs may converge to different
solutions from different network initial values [42, 51, 21], and thus a common strategy
is that we train PINNs from random initialization a few times (e.g., 10 independent
runs) and choose the network with the smallest training loss as the final solution.
In the algorithm of PINN introduced above, we enforce soft constraints of bound-
ary/initial conditions through the loss \scrL b . This approach can be used for complex
domains and any type of boundary conditions. On the other hand, it is possible to
enforce hard constraints for simple cases [30]. For example, when the boundary condition is $u(0) = u(1) = 0$ with $\Omega = [0, 1]$, we can simply choose the surrogate model as $\hat{u}(x) = x(x - 1)\scrN(x)$ to satisfy the boundary condition automatically, where $\scrN(x)$ is a neural network.
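A minimal sketch of this hard-constraint construction, assuming a small fully connected network built with tf.keras (the architecture shown is arbitrary):

import tensorflow as tf

# N(x): an arbitrary small fully connected network.
N = tf.keras.Sequential([
    tf.keras.layers.Dense(20, activation="tanh"),
    tf.keras.layers.Dense(20, activation="tanh"),
    tf.keras.layers.Dense(1),
])

def u_hat(x):
    """Surrogate u(x) = x (x - 1) N(x), which vanishes at x = 0 and x = 1 by construction."""
    return x * (x - 1.0) * N(x)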
We note that we have great flexibility in choosing the residual points \scrT , and here
we provide three possible strategies:
1. We can specify the residual points at the beginning of training, which could
be grid points on a lattice or random points, and never change them during
the training process.
2. In each optimization iteration, we could select randomly different residual
points.
3. We could improve the location of the residual points adaptively during train-
ing, e.g., the method proposed in subsection 2.8.
When the number of residual points required is very large, e.g., in multiscale problems,
it is computationally expensive to calculate the loss and gradient in every iteration.
Instead of using all residual points, we can split the residual points into small batches,
and in each iteration we only use one batch to calculate the loss and update model
parameters; this is the so-called "mini-batch" gradient descent. The aforementioned strategy (2), i.e., resampling in each step, is a special case of mini-batch gradient descent by choosing $\scrT = \Omega$ with $|\scrT| = \infty$.
Recent studies show that for function approximation, neural networks learn tar-
get functions from low to high frequencies [46, 54], but here we show that the learning
mode of PINNs is different due to the existence of high-order derivatives. For example, when we approximate the function $f(x) = \sum_{k=1}^{5} \sin(2kx)/(2k)$ in $[-\pi, \pi]$ by a neural network, the function is learned from low to high frequency (Figure 2A). However, when we employ a PINN to solve the Poisson equation $-f_{xx} = \sum_{k=1}^{5} 2k \sin(2kx)$ with zero boundary conditions in the same domain, all frequencies are learned almost simultaneously (Figure 2B). Interestingly, by comparing Figure 2A and Figure 2B we can see that at least in this case solving the PDE using a PINN is faster than approximating a function using a neural network. We can monitor this training process using the callback functions in our library DeepXDE, as discussed later.
Fig. 2 Convergence of the amplitude for each frequency during the training process. (A) A neural network is trained to approximate the function $f(x) = \sum_{k=1}^{5} \sin(2kx)/(2k)$. The color represents amplitude values with the maximum amplitude for each frequency normalized to 1. (B) A PINN is used to solve the Poisson equation $-f_{xx} = \sum_{k=1}^{5} 2k \sin(2kx)$ with zero boundary conditions. We use a neural network of 4 hidden layers and 20 neurons per layer. The learning rate is chosen as $10^{-4}$, and 500 random points are sampled for training.
2.4. Approximation Theory and Error Analysis for PINNs. One fundamental
question related to PINNs is whether there exists a neural network satisfying both
the PDE equation and the boundary conditions, i.e., whether there exists a neural
network that can simultaneously and uniformly approximate a function and its partial
derivatives. To address this question, we first introduce some notation. Let $\BbbZ_+^d$ be the set of $d$-dimensional nonnegative integers. For $m = (m_1, \ldots, m_d) \in \BbbZ_+^d$, we set $|m| := m_1 + \cdots + m_d$, and
$$D^m := \frac{\partial^{|m|}}{\partial x_1^{m_1} \cdots \partial x_d^{m_d}}.$$
We say $f \in C^m(\BbbR^d)$ if $D^k f \in C(\BbbR^d)$ for all $k \leq m$, $k \in \BbbZ_+^d$, where $C(\BbbR^d) = \{f : \BbbR^d \to \BbbR \mid f \text{ is continuous}\}$ is the space of continuous functions. Then we recall the following theorem of derivative approximation using single hidden layer neural networks due to Pinkus [44].
Theorem 2.1. Let $m^i \in \BbbZ_+^d$, $i = 1, \ldots, s$, and set $m = \max_{i=1,\ldots,s} |m^i|$. Assume $\sigma \in C^m(\BbbR)$ and that $\sigma$ is not a polynomial. Then the space of single hidden layer neural nets
$$\scrM(\sigma) := \mathrm{span}\{\sigma(w \cdot x + b) : w \in \BbbR^d, b \in \BbbR\}$$
is dense in
$$C^{m^1, \ldots, m^s}(\BbbR^d) := \cap_{i=1}^{s} C^{m^i}(\BbbR^d),$$
i.e., for any $f \in C^{m^1, \ldots, m^s}(\BbbR^d)$, any compact $K \subset \BbbR^d$, and any $\varepsilon > 0$, there exists a $g \in \scrM(\sigma)$ satisfying
$$\max_{x \in K} |D^k f(x) - D^k g(x)| < \varepsilon$$
for all $k \in \BbbZ_+^d$ with $k \leq m^i$ for some $i$.

Fig. 3 Illustration of errors of a PINN. The total error consists of the approximation error, the optimization error, and the generalization error. Here, $u$ is the PDE solution, $u_\scrF$ is the best function close to $u$ in the function space $\scrF$, $u_\scrT$ is the neural network whose loss is at a global minimum, and $\tilde{u}_\scrT$ is the function obtained by training a neural network.

As illustrated in Figure 3, the total error $\scrE$ can be bounded by three components:
$$\scrE := \|\tilde{u}_\scrT - u\| \leq \underbrace{\|\tilde{u}_\scrT - u_\scrT\|}_{\scrE_{\mathrm{opt}}} + \underbrace{\|u_\scrT - u_\scrF\|}_{\scrE_{\mathrm{gen}}} + \underbrace{\|u_\scrF - u\|}_{\scrE_{\mathrm{app}}}.$$
The approximation error \scrE app measures how closely u\scrF can approximate u. The
generalization error \scrE gen is determined by the number/locations of residual points
in \scrT and the capacity of the family \scrF . Neural networks of larger size have smaller
approximation errors but could lead to higher generalization errors, which is called
bias-variance tradeoff. Overfitting occurs when the generalization error dominates.
In addition, the optimization error \scrE opt stems from the loss function complexity and
the optimization setup, such as learning rate and number of iterations. However,
currently there is no error estimation for PINNs yet, and even quantifying the three
errors for supervised learning is still an open research problem [36, 35, 25].
2.5. Comparison between PINNs and FEM. To further explain the ideas of
PINNs and to help those with knowledge of FEM to understand PINNs more easily,
we make a comparison between PINNs and FEM point by point (Table 2):
Table 2 Comparison between PINN and FEM.

                     PINN                                            FEM
Basis function       Neural network (nonlinear)                      Piecewise polynomial (linear)
Parameters           Weights and biases                              Point values
Training points      Scattered points (mesh-free)                    Mesh points
PDE embedding        Loss function                                   Algebraic system
Parameter solver     Gradient-based optimizer                        Linear solver
Errors               \scrE_app, \scrE_gen, \scrE_opt (subsection 2.4)    Approximation/quadrature errors
Error bounds         Not available yet                               Partially available [14, 26]
and then we use a PINN to solve the following PDE instead of the original equation:
$$\frac{dy}{dx} + y(x) \approx \sum_{i=1}^{n} w_i e^{t_i(x) - x} y(t_i(x)).$$
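A brief sketch of how such a quadrature approximation of the integral operator can be formed in NumPy, using Gauss-Legendre nodes mapped from [-1, 1] to [0, x]; the exponential kernel matches the Volterra IDE of this example, and y is assumed to be a callable (e.g., the trained network surrogate).

import numpy as np

def integral_operator(y, x, n=20):
    """Approximate int_0^x exp(t - x) y(t) dt with an n-point Gauss-Legendre rule."""
    s, w = np.polynomial.legendre.leggauss(n)  # nodes/weights on [-1, 1]
    t = 0.5 * x * (s + 1.0)                    # affine map to [0, x]
    return 0.5 * x * np.sum(w * np.exp(t - x) * y(t))

# Sanity check with y(t) = 1: the exact integral is 1 - exp(-x).
x = 2.0
print(integral_operator(lambda t: np.ones_like(t), x), 1 - np.exp(-x))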
PINNs can also be easily extended to solve FDEs [42] and SDEs [57, 55, 41, 56], but
we do not discuss such cases here due to page limit constraints.
2.7. PINNs for Solving Inverse Problems. In inverse problems, there are some
unknown parameters \bfitlambda in (2.1), but we have some extra information on some points
$\scrT_i \subset \Omega$ besides the differential equation and boundary conditions:
$$\scrI(u, x) = 0 \quad \text{for } x \in \scrT_i.$$
Fig. 4 Schematic illustrating the modification of the PINN algorithm for solving IDEs. We employ
the AD technique to analytically derive the integer-order derivatives, and we approximate
integral operators numerically using standard methods. (The figure is reproduced from [42].)
The loss function (2.2) is then augmented with an additional term that penalizes the mismatch at the measurement points:
$$\scrL(\bfittheta, \bfitlambda; \scrT) = w_f \scrL_f(\bfittheta, \bfitlambda; \scrT_f) + w_b \scrL_b(\bfittheta, \bfitlambda; \scrT_b) + w_i \scrL_i(\bfittheta, \bfitlambda; \scrT_i),$$
where
$$\scrL_i(\bfittheta, \bfitlambda; \scrT_i) = \frac{1}{|\scrT_i|} \sum_{x \in \scrT_i} \|\scrI(\hat{u}, x)\|_2^2.$$
We then optimize $\bfittheta$ and $\bfitlambda$ together, and our solution is $\bfittheta^\ast, \bfitlambda^\ast = \arg\min_{\bfittheta, \bfitlambda} \scrL(\bfittheta, \bfitlambda; \scrT)$.
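Conceptually, the only change relative to the forward problem is that $\bfitlambda$ becomes a trainable variable optimized jointly with the network weights. A minimal TensorFlow sketch follows (not DeepXDE's internal code); pde_residual is a hypothetical helper returning the pointwise PDE residual, (x_i, u_i) are the extra measurements, and the boundary term of (2.2) is omitted for brevity.

import tensorflow as tf

lam = tf.Variable(1.0)  # unknown PDE parameter, initialized with a guess

def train_step(net, optimizer, x_f, x_i, u_i, w_f=1.0, w_i=1.0):
    """One gradient step updating the network weights and lam jointly."""
    with tf.GradientTape() as tape:
        loss_f = tf.reduce_mean(tf.square(pde_residual(net, x_f, lam)))  # hypothetical helper
        loss_i = tf.reduce_mean(tf.square(net(x_i) - u_i))               # data mismatch on T_i
        loss = w_f * loss_f + w_i * loss_i
    variables = net.trainable_variables + [lam]
    grads = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(grads, variables))
    return loss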
2.8. Residual-Based Adaptive Refinement (RAR). As we discussed in subsec-
tion 2.3, the residual points \scrT are usually randomly distributed in the domain. This
works well for most cases, but it may not be efficient for certain PDEs that exhibit
solutions with steep gradients. Take the Burgers equation as an example; intuitively
we should put more points near the sharp front to capture the discontinuity well.
However, it is challenging, in general, to design a good distribution of residual points
for problems whose solution is unknown. To overcome this challenge, we propose
a residual-based adaptive refinement (RAR) method to improve the distribution of
residual points during the training process (Procedure 2.2), conceptually similar to
FEM refinement methods [2]. The idea of RAR is that we will add more residual points
in the locations where the PDE residual
$$\left\| f\!\left(x; \frac{\partial \hat{u}}{\partial x_1}, \ldots, \frac{\partial \hat{u}}{\partial x_d}; \frac{\partial^2 \hat{u}}{\partial x_1 \partial x_1}, \ldots, \frac{\partial^2 \hat{u}}{\partial x_1 \partial x_d}; \ldots; \bfitlambda\right) \right\|$$
is large, and we repeatedly add points until the mean residual
(2.3) $$\scrE_r = \frac{1}{V} \int_{\Omega} \left\| f\!\left(x; \frac{\partial \hat{u}}{\partial x_1}, \ldots, \frac{\partial \hat{u}}{\partial x_d}; \frac{\partial^2 \hat{u}}{\partial x_1 \partial x_1}, \ldots, \frac{\partial^2 \hat{u}}{\partial x_1 \partial x_d}; \ldots; \bfitlambda\right) \right\| dx$$
is smaller than a threshold $\scrE_0$, where $V$ is the volume of $\Omega$.
Procedure 2.2 RAR for improving the distribution of residual points for training.
Step 1 Select the initial residual points \scrT , and train the neural network for a limited
number of iterations.
Step 2 Estimate the mean PDE residual \scrE r in (2.3) by Monte Carlo integration,
i.e., by the average of values at a set of randomly sampled dense locations
\scrS = \{ x1 , x2 , . . . , x| \scrS | \} :
$$\scrE_r \approx \frac{1}{|\scrS|} \sum_{x \in \scrS} \left\| f\!\left(x; \frac{\partial \hat{u}}{\partial x_1}, \ldots, \frac{\partial \hat{u}}{\partial x_d}; \frac{\partial^2 \hat{u}}{\partial x_1 \partial x_1}, \ldots, \frac{\partial^2 \hat{u}}{\partial x_1 \partial x_d}; \ldots; \bfitlambda\right) \right\|.$$
Step 3 Stop if \scrE r < \scrE 0 . Otherwise, add m new points with the largest residuals in \scrS
to \scrT , retrain the network, and go to Step 2.
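A compact sketch of the RAR loop; train, residual, and sample_domain are hypothetical helpers standing in for the surrounding PINN code (training routine, pointwise PDE residual, and random sampling of the domain), so only the point-selection logic of Procedure 2.2 is shown.

import numpy as np

def rar(train, residual, sample_domain, X_init, m=10, eps0=0.005, n_dense=100000):
    """Residual-based adaptive refinement: grow the residual set where |f| is largest."""
    X = X_init                      # initial residual points T
    train(X)                        # Step 1: train for a limited number of iterations
    while True:
        S = sample_domain(n_dense)  # Step 2: dense random sample of the domain
        r = np.abs(residual(S))     # PDE residual magnitude at each point of S (1-D array)
        if r.mean() < eps0:         # Step 3: stop once the mean residual is small enough
            break
        worst = np.argsort(r)[-m:]  # indices of the m largest residuals
        X = np.concatenate([X, S[worst]])
        train(X)                    # retrain with the enlarged residual set
    return X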
Fig. 5 Flowchart of DeepXDE corresponding to Procedure 3.1. The white boxes define the PDE
problem and the training hyperparameters. The blue boxes combine the PDE problem and
training hyperparameters in the white boxes. The orange boxes are the three steps (from right
to left) to solve the PDE.
Fig. 6 Examples of constructive solid geometry (CSG) in 2D. (left) A and B represent the rectangle and circle, respectively. The union A|B, difference A - B, and intersection A&B are constructed from A and B. (right) A complex geometry (top) is constructed from a polygon, a rectangle, and two circles (bottom) through the union, difference, and intersection operations. This capability is included in the module geometry of DeepXDE.
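For instance, the constructions in the left panel of Figure 6 can be expressed directly with DeepXDE geometry primitives and the |, -, and & operators; the sketch below assumes the primitive names Rectangle and Disk, which may differ slightly across versions.

import deepxde as dde

# A rectangle and a circle combined via CSG operators, as in Figure 6 (left).
A = dde.geometry.Rectangle(xmin=[0, 0], xmax=[2, 1])
B = dde.geometry.Disk([1.5, 0.5], 0.5)

union = A | B          # A | B
difference = A - B     # A - B
intersection = A & B   # A & B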
new geometry can be defined as shown in Procedure 3.2. Currently, DeepXDE cannot accurately describe complex curvilinear boundaries; however, a future extension could be the incorporation of nonuniform rational basis splines (NURBS) [22] for such representations.
Procedure 3.2 Customization of the new geometry module MyGeometry. The class
methods should only be implemented as needed.
class MyGeometry(Geometry):
    def inside(self, x):
        """Check if x is inside the geometry."""
    def on_boundary(self, x):
        """Check if x is on the geometry boundary."""
    def boundary_normal(self, x):
        """Compute the unit normal at x for Neumann or Robin boundary conditions."""
    def periodic_point(self, x, component):
        """Compute the periodic image of x for periodic boundary condition."""
    def uniform_points(self, n, boundary=True):
        """Compute the equispaced point locations in the geometry."""
    def random_points(self, n, random="pseudo"):
        """Compute the random point locations in the geometry."""
    def uniform_boundary_points(self, n):
        """Compute the equispaced point locations on the boundary."""
    def random_boundary_points(self, n, random="pseudo"):
        """Compute the random point locations on the boundary."""
\bullet FirstDerivative, which calculates the first derivative of the outputs with
respect to the inputs. This is a special case of OperatorPredictor with the
operator being the first derivative.
\bullet MovieDumper, which dumps the movie of the function during the training
progress, and/or the movie of the spectrum of its Fourier transform.
It is very convenient to add other callback functions, which will be called at different
stages of the training process; see Procedure 3.4.
Procedure 3.4 Customization of the callback MyCallback. Here, we only show how
to add functions to be called at the beginning/end of every epoch. Similarly, we can
call functions at the other training stages, such as at the beginning of training.
class MyCallback(Callback):
    def on_epoch_begin(self):
        """Called at the beginning of every epoch."""
    def on_epoch_end(self):
        """Called at the end of every epoch."""
Fig. 7 Example in subsection 4.1. Comparison of the PINN solution with the solution obtained by using the spectral element method (SEM). (A) The SEM solution $u_{SEM}$, (B) the PINN solution $u_{NN}$, (C) the absolute error $|u_{SEM} - u_{NN}|$.
RAR developed in subsection 2.8 with m = 1, | \scrS | = 100000, and \scrE 0 = 0.005. The
training procedure after adding new points is shown in Table 3. We compare the
PINN solution with RAR and the PINN solution based on 2540 randomly selected
training data (panels A and B of Figure 8), and demonstrate that PINN with RAR
can capture the discontinuity much better. For a comparison, the finite difference
solutions using the central difference scheme for space discretization and the forward
Euler scheme for time discretization for the Burgers equation in the conservative form
are also shown in Figure 8A. Here, we also present two examples for the residual points
added via the RAR method. As shown in panels C and D of Figure 8, the added points
(green crosses) are quite close to the sharp interface, which indicates the effectiveness
of RAR.
We further solve the following 2D Burgers equation using the RAR:
$$\partial_t u + u \partial_x u + v \partial_y u = \frac{1}{Re}\left(\partial_x^2 u + \partial_y^2 u\right),$$
$$\partial_t v + u \partial_x v + v \partial_y v = \frac{1}{Re}\left(\partial_x^2 v + \partial_y^2 v\right),$$
$$x, y \in [0, 1], \quad t \in [0, 1],$$
where u and v are the velocities along the x and y directions, respectively. In addition,
Re is a nondimensional number (Reynolds number) defined as Re = U L/\nu , in which U
and L are, respectively, the characteristic velocity and length, and \nu is the kinematic
viscosity of fluid. The exact solution can be obtained [3] as
$$u(x, y, t) = \frac{3}{4} - \frac{1}{4\left[1 + \exp\left((-4x + 4y - t)Re/32\right)\right]},$$
$$v(x, y, t) = \frac{3}{4} + \frac{1}{4\left[1 + \exp\left((-4x + 4y - t)Re/32\right)\right]},$$
using the Dirichlet boundary conditions on all boundaries. In the present study, Re
is set to be 5000, which is quite challenging due to the fact that the high Reynolds
number leads to a steep gradient in the solution. Initially, 200 residual points are
randomly sampled in the spatio-temporal domain, and 5000 and 1000 random points
are used for each initial and boundary condition, respectively. We only add 10 extra
residual points using RAR, and the total number of the residual points is 210 after
convergence. For comparison, we also test the case without RAR using 210 randomly
sampled residual points. The results are displayed in panels E and F of Figure 8,
demonstrating the effectiveness of RAR.
Fig. 8 Examples in subsection 4.2: PINN with and without RAR for the Burgers equations. The panels show u versus x compared against the finite difference (FD) and reference solutions (A), the L2 relative error versus the number of residual points (B), the residual points added by RAR in the (t, x) plane (C, D), and u(x, 2/3, t) and v(x, 2/3, t) for the 2D case (E, F).
4.3. Inverse Problem for the Lorenz System. Consider the parameter identifi-
cation problem of the Lorenz system
$$\frac{dx}{dt} = \rho(y - x), \quad \frac{dy}{dt} = x(\sigma - z) - y, \quad \frac{dz}{dt} = xy - \beta z,$$
with initial condition (x(0), y(0), z(0)) = ( - 8, 7, 27), where \rho , \sigma , and \beta are the three
parameters to be identified from the observations at certain times. The observations
are produced by solving the above system to t = 3 using Runge--Kutta (4, 5) with
the underlying true parameters (\rho , \sigma , \beta ) = (10, 15, 8/3). We choose 400 uniformly
distributed random points and 25 equispaced points as the residual points \scrT f and \scrT i ,
respectively. The evolution trajectories of \rho , \sigma , and \beta are presented in Figure 9A,
with the final identified values of (\rho , \sigma , \beta ) = (10.002, 14.999, 2.668).
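The synthetic observations for this example can be generated with any Runge-Kutta (4, 5) integrator; below is a sketch using SciPy's solve_ivp, where the 25 observation times are taken equispaced on [0, 3] to match the setup above.

import numpy as np
from scipy.integrate import solve_ivp

rho, sigma, beta = 10.0, 15.0, 8.0 / 3.0       # true parameters, later to be identified

def lorenz(t, w):
    x, y, z = w
    return [rho * (y - x), x * (sigma - z) - y, x * y - beta * z]

t_obs = np.linspace(0, 3, 25)                   # 25 equispaced observation times
sol = solve_ivp(lorenz, [0, 3], [-8, 7, 27], method="RK45", t_eval=t_obs)
observations = sol.y.T                          # shape (25, 3): observed (x, y, z)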
Fig. 9 Examples in subsections 4.3 and 4.4. The identified values of (A) the Lorenz system and
(B) diffusion-reaction system converge to the true values during the training process. The
parameter values are scaled for plotting.
Fig. 10 Example in subsection 4.5. The PINN algorithm for solving Volterra IDE. The solid blue
line is the exact solution, and the dashed red line is the numerical solution from PINN.
Twelve equispaced residual points (black dots) are used.
multiscale problems are possible across different scientific disciplines by creatively de-
signing the loss function and introducing suitable solution spaces. For instance, in the
five examples we present here, we only assume data on scattered points; however, in
geophysics or biomedicine we may have mixed data in the form of images and point
measurements. In this case, we can design a composite neural network consisting of
one convolutional neural network and one PINN sharing the same set of parameters,
and minimize the total loss which could be a weighted summation of multiple losses
from each neural network.
REFERENCES
[18] P. Grohs, F. Hornung, A. Jentzen, and P. Von Wurstemberger, A Proof That Artifi-
cial Neural Networks Overcome the Curse of Dimensionality in the Numerical Approxi-
mation of Black-Scholes Partial Differential Equations, preprint, https://arxiv.org/abs/
[41] M. A. Nabian and H. Meidani, A Deep Neural Network Surrogate for High-Dimensional
Random Partial Differential Equations, preprint, https://arxiv.org/abs/1806.02957, 2018.
(Cited on pp. 209, 216)
[42] G. Pang, L. Lu, and G. E. Karniadakis, fPINNs: Fractional physics-informed neural net-
works, SIAM J. Sci. Comput., 41 (2019), pp. A2603--A2626, https://doi.org/10.1137/
18M1229845. (Cited on pp. 209, 211, 213, 216, 217)
[43] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison,
L. Antiga, and A. Lerer, Automatic differentiation in PyTorch, in NIPS 2017 Workshop
on Autodiff, 2017, https://openreview.net/forum?id=BJJsrmfCZ. (Cited on p. 212)
[44] A. Pinkus, Approximation theory of the MLP model in neural networks, Acta Numer., 8 (1999),
pp. 143--195. (Cited on p. 214)
[45] T. Poggio, H. Mhaskar, L. Rosasco, B. Miranda, and Q. Liao, Why and when can deep---
but not shallow---networks avoid the curse of dimensionality: A review, Internat. J. Au-
tomation Comput., 14 (2017), pp. 503--519. (Cited on p. 209)
[46] N. Rahaman, A. Baratin, D. Arpit, F. Draxler, M. Lin, F. Hamprecht, Y. Bengio,
and A. Courville, On the spectral bias of neural networks, in Proceedings of the 36th
International Conference on Machine Learning, PMLR, 2019, pp. 5301--5310. (Cited on
p. 214)
[47] M. Raissi, P. Perdikaris, and G. E. Karniadakis, Physics-informed neural networks: A deep
learning framework for solving forward and inverse problems involving nonlinear partial
differential equations, J. Comput. Phys., 378 (2019), pp. 686--707. (Cited on pp. 209, 211,
217, 223)
[48] M. Raissi, A. Yazdani, and G. E. Karniadakis, Hidden fluid mechanics: Learning velocity
and pressure fields from flow visualizations, Science, 367 (2020), pp. 1026--1030. (Cited on
p. 209)
[49] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning representations by back-
propagating errors, Nature, 323 (1986), pp. 533--536. (Cited on p. 210)
[50] J. Sirignano and K. Spiliopoulos, DGM: A deep learning algorithm for solving partial dif-
ferential equations, J. Comput. Phys., 375 (2018), pp. 1339--1364. (Cited on p. 209)
[51] A. M. Tartakovsky, C. O. Marrero, P. Perdikaris, G. D. Tartakovsky, and D. Barajas-
Solano, Learning Parameters and Constitutive Relationships with Physics Informed Deep
Neural Networks, preprint, https://arxiv.org/abs/1808.03398, 2018. (Cited on pp. 209,
213)
[52] B. P. van Milligen, V. Tribaldos, and J. Jiménez, Neural network differential equation and
plasma equilibrium solver, Phys. Rev. Lett., 75 (1995), pp. 3594--3597. (Cited on p. 209)
[53] N. Winovich, K. Ramani, and G. Lin, ConvPDE-UQ: Convolutional neural networks with
quantified uncertainty for heterogeneous elliptic partial differential equations on varied
domains, J. Comput. Phys., 394 (2019), pp. 263--279. (Cited on p. 225)
[54] Z.-Q. J. Xu, Y. Zhang, T. Luo, Y. Xiao, and Z. Ma, Frequency Principle: Fourier Analysis
Sheds Light on Deep Neural Networks, preprint, https://arxiv.org/abs/1901.06523, 2019.
(Cited on p. 214)
[55] L. Yang, D. Zhang, and G. E. Karniadakis, Physics-Informed Generative Adversarial Net-
works for Stochastic Differential Equations, preprint, https://arxiv.org/abs/1811.02033,
2018. (Cited on pp. 209, 216)
[56] D. Zhang, L. Guo, and G. E. Karniadakis, Learning in Modal Space: Solving Time-
Dependent Stochastic PDEs Using Physics-Informed Neural Networks, preprint, https:
//arxiv.org/abs/1905.01205, 2019. (Cited on pp. 209, 216)
[57] D. Zhang, L. Lu, L. Guo, and G. E. Karniadakis, Quantifying total uncertainty in physics-
informed neural networks for solving forward and inverse stochastic problems, J. Comput.
Phys., 397 (2019), art. 108850. (Cited on pp. 209, 216)
[58] Y. Zhu, N. Zabaras, P.-S. Koutsourelakis, and P. Perdikaris, Physics-Constrained Deep
Learning for High-Dimensional Surrogate Modeling and Uncertainty Quantification with-
out Labeled Data, preprint, https://arxiv.org/abs/1901.06314, 2019. (Cited on pp. 209,
225)
[59] B. Zoph and Q. V. Le, Neural Architecture Search with Reinforcement Learning, preprint,
https://arxiv.org/abs/1611.01578, 2016. (Cited on p. 225)