Candidate
Tomà Sartori – 2022779
Sommario
This thesis analyzes possible applications of artificial neural networks, in particular the so-called
"Physics-informed neural networks" (PINNs), a type of artificial neural network that exploits the
knowledge of physical laws to compensate for the lack of data and to speed up the training process,
in the field of computational electromagnetics, by developing a code that uses PINNs to solve
electromagnetic problems, in particular 1D and 2D electrostatic problems. An example taken from
the Mathworks® website is presented, in which Poisson's equation is solved on a circular domain
using an artificial neural network. Starting from the code shown in that example, a MATLAB
algorithm is then developed that can solve the electrostatic problem of the parallel-plate capacitor
in various configurations: 1D with homogeneous permittivity, 1D with non-homogeneous permittivity,
and 2D with non-homogeneous permittivity. The results are compared with the solutions of the same
problems obtained analytically or with a FEM code. The final result is a MATLAB code that can be
used to solve the electrostatic problem of the parallel-plate capacitor in the configurations listed
above and that provides solutions whose accuracy is comparable to that of solutions obtained with
the finite element method. The limitations that characterize this type of algorithm and hinder its
application to more complex problems are also analyzed.
Abstract
The goal of this thesis is to analyze the possible applications of artificial neural networks, in
particular the so-called "Physics-informed neural networks" (PINN), a type of artificial neural
networks that exploits the knowledge of the physical laws to overcome the lack of data and
speed up the training process, in the field of computational electromagnetism, developing a
code that uses PINNs to solve electromagnetic problems described by partial differential equa-
tions, in particular 1D and 2D electrostatic problems. An example taken from the Mathworks®
website is presented, in which a PINN is used to solve Poisson's equation on a unit disk.
Starting from this example, we develop a MATLAB algorithm that is able to solve
the electrostatic problem of the parallel plate capacitor, in various configurations: 1D with ho-
mogeneous permittivity, 1D with non-homogeneous permittivity, 2D with non-homogeneous
permittivity. The results are then compared to the analytical or FEM solutions of the same
problems. The final result is a MATLAB code that can be used to solve the parallel-plate
capacitor problem in the aforementioned configurations and provides solutions with accuracy
comparable to those obtained via the finite element method. The limitations that characterize
this type of algorithm and restrict its application to more complex problems are also analyzed.
Contents
Sommario iii
Abstract v
List of Figures ix
6 Conclusion 55
A MATLAB codes 57
A.1 Main algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
A.2 Loss function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Bibliography 67
List of Figures
3.1 Model setup and mesh generation in MATLAB using the PDE toolbox (source:
Mathworks website [19]) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.2 Geometry and mesh . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.3 Definition of the network structure (source: Mathworks website [19]) . . . . . 17
3.4 Identification of the training points (source: Mathworks website [19]) . . . . . 18
3.5 Generation of the mini-batches of collocation points (source: Mathworks web-
site [19]) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.6 The training algorithm (source: Mathworks website [19]) . . . . . . . . . . . . 19
3.7 Loss function initialization and gradient computation (source: Mathworks web-
site [19]) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.8 Calculation of the loss function gradient (source: Mathworks website [19]) . . 20
3.9 Implementation of the adamupdate function (source: Mathworks website [19]) 21
3.10 Initialization of the training progress monitor (source: Mathworks website [19]) 21
3.11 Training progress monitor . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.12 Analytical solution (above) and PINN solution (below) (source: Mathworks
website [19]) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
List of Abbreviations
AI Artificial Intelligence
ML Machine Learning
DL Deep Learning
ANN Artificial Neural Network
PINN Physics-Informed Neural Network
AD Automatic Differentiation
MSE Mean Squared Error
FEM Finite Element Method
BVP Boundary Value Problem
IBVP Initial-Boundary Value Problem
Chapter 1
are intended to artificially replicate the brain synapses. Each node receives signals (real num-
bers) from other neurons as inputs, and its output is a non-linear function of the sum of its
inputs, called activation function, which can be for example a hard limiter or some kind of
sigmoid function. Each connection to a node has a weight, meaning that each node performs a
weighted sum of its inputs. A bias might also be added to the sum of the inputs. The weights
and biases represent the adjustable parameters of the network. A node (or neuron) of a network
can be mathematically represented as a function:
Y = θ(∑_{i=1}^{n} Wi Xi + b),  (1.1)
where θ is the activation function, Wi are the weights and b is the bias [5].
The nodes of an ANN are organized in layers. There is usually an input layer, which takes
the input data (for example, some features of sample objects) and feeds them to the rest of
the network, and an output layer, which performs the required task (for example, cataloguing
the objects according to certain criteria). In between them there can be one or more layers,
called hidden layers, whose nodes receive their inputs from the nodes connected to them in
the previous layer, and send their outputs to the nodes connected to them in the next one.
Networks organized in this way are called feedforward networks. If the neurons in one layer
are all connected to every neuron in the next layer, the feedforward network is called fully
connected. Conversely, if some nodes in a layer also send their output to other nodes in
the same layer or in previous layers, the network is called recurrent. If a layer has N neurons,
whose outputs are described by (1.1), the mathematical representation of the output of the layer
is:
Y = \begin{bmatrix} Y_1 \\ \vdots \\ Y_i \\ \vdots \\ Y_N \end{bmatrix} = \begin{bmatrix} \theta(\sum_{j=1}^{M} W_{1j} X_j + b_1) \\ \vdots \\ \theta(\sum_{j=1}^{M} W_{ij} X_j + b_i) \\ \vdots \\ \theta(\sum_{j=1}^{M} W_{Nj} X_j + b_N) \end{bmatrix}  (1.2)
If we represent the weights Wi, j as a matrix
W = \begin{bmatrix} W_{11} & \cdots & W_{1M} \\ \vdots & \ddots & \vdots \\ W_{N1} & \cdots & W_{NM} \end{bmatrix},  (1.3)
the output of the layer can be written compactly as Y = θ(WX + b),  (1.4)
where X is the vector of the outputs of the previous layer and b is the vector of the biases. If we
take a look at the network in Figure 1.2, its output, according to (1.4) can be mathematically
formulated as:
Y3 = θ(W3 Y2 + b3) = θ(W3 θ(W2 Y1 + b2) + b3) = θ(W3 θ(W2 θ(W1 X + b1) + b2) + b3).  (1.5)
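As a concrete illustration of (1.4) and (1.5), the forward pass of such a network can be written in MATLAB with a few matrix operations; the layer sizes and the random weights below are purely illustrative, and tanh is used as the activation function θ:

X  = rand(2,1);                   % input vector (two features)
W1 = rand(4,2);  b1 = rand(4,1);  % first hidden layer (4 neurons)
W2 = rand(4,4);  b2 = rand(4,1);  % second hidden layer (4 neurons)
W3 = rand(1,4);  b3 = rand(1,1);  % output layer (1 neuron)
Y1 = tanh(W1*X  + b1);            % output of the first layer, cf. (1.4)
Y2 = tanh(W2*Y1 + b2);            % output of the second layer
Y3 = tanh(W3*Y2 + b3);            % network output, cf. (1.5)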
Chapter 2
∇ · D = ρ, (2.1)
∇ · B = 0, (2.2)
∇ × E = −∂B/∂t,  (2.3)
∇ × H = J + ∂D/∂t,  (2.4)
where D is the electric displacement field, ρ is the charge density, B is the magnetic flux
density, E is the electric field, H is the magnetic field strength and J is the current density.
These equations can be rearranged and combined depending on the particular conditions of
the problem (e.g. electrostatics, magnetostatics etc.). Under specific conditions, discussed in
Section 2.6, they can be reduced to a particular form of the equation:
∇ · k1∇u + k2 ∂u/∂t = F,  (2.5)
which describes field problems characterized by the presence of a scalar potential u. Together
with boundary and initial conditions, (2.5) fully describes the problem to be solved. In (2.5), u
is the scalar potential (electric or magnetic), F is the source term (e.g. current density) and k1
and k2 are material properties (such as permittivity and reluctivity). Although the application of
PINNs is not limited to this type of PDEs, in this thesis we are going to deal only with equations
like (2.5). A PINN can be trained so that its output approximates the potential distribution u
in a domain. This is done by making it fulfill the PDE together with boundary and initial
conditions. In the next paragraphs we will see how the knowledge of the physical laws can
be embedded in the cost function and how the boundary and initial conditions can be used as
additional training data in order to make the network output convergent to the exact solution.
Lb = (1/B) ∑_{i=1}^{B} ∥Yb,i − BCi∥²,  (2.6)
where B is the number of sample points on the boundary, Yb,i is the network output on the i-th
boundary point and BCi is the value of the potential on the i-th boundary point prescribed by
Dirichlet boundary conditions. In the case of Neumann conditions, (2.6) changes very little:
the only difference is that we need to compute the derivative of the network output at the Neumann
boundary points. This can easily be done by a technique called automatic differentiation (AD),
which will be discussed in Section 2.3. The second term of the cost function is derived directly
from the differential equation. Equation (2.5), in the case of static problems, reads:
∇ · k∇u = F. (2.7)
In order to obtain a solution consistent with the equation, the identity (2.8) must be fulfilled at
collocation points, thus obtaining the second term of our loss function:
LPDE = (1/C) ∑_{i=1}^{C} resi²,  (2.9)
where C is the number of collocation points, k and u are respectively the vector of the values
of the material coefficient and the vector of the values of the network output on the collocation
points and Fi is the value of the source term on the i-th collocation point. Now we can combine
the two terms to obtain the complete cost function:
L = λ1 Lb + λ2 LPDE , (2.11)
λ1 + λ2 = 1. (2.12)
By minimizing the cost function L, an approximate solution of the field problem is sought.
The two main aspects of the training process are the computation of the cost function, which
involves the differentiation of the network output with respect to the input coordinates, and its
optimization, which is gradient-based and involves the differentiation of the loss function with
respect to the network parameters. Differentiation and optimization are discussed in the next
sections.
AD makes use of this notation to compute the derivative of the function f . For the reason
mentioned above, this method is suitable for differentiating functions expressed not only in
closed form but also through an algorithm (numerical algorithms perform a combination of
elementary operations). This is of particular interest since, in the practical implementation of
PINNs, we will need to compute the gradient of functions expressed as algorithms.
AD can be performed in two different modes: forward mode and reverse mode. Here we
only present AD in reverse mode, since it is the mode that is commonly used in ML appli-
cations. An in-depth investigation on this topic is presented in [13]. The following example,
proposed by Baydin et al. in [13] to introduce AD, is provided. If we consider a function
f(x1, x2) = ln(x1) + x1 x2 − sin(x2), evaluated at (x1, x2) = (2, 5), we can compute its gradient
∇f = (∂f/∂x1, ∂f/∂x2) in the following way, splitting the process into two phases.
Table 2.1: Forward phase, evaluation of the intermediate variables.

v−1 = x1        = 2
v0  = x2        = 5
v1  = ln(v−1)   = ln(2) = 0.693
v2  = v−1 v0    = 2 · 5 = 10
v3  = sin(v0)   = sin(5) = −0.959
v4  = v1 + v2   = 0.693 + 10 = 10.693
v5  = v4 − v3   = 10.693 + 0.959 = 11.652
y   = v5        = 11.652
The first, forward phase computes all the intermediate variables vi (see Table 2.1); the
second, reverse phase computes the desired derivatives by combining the derivatives
of the intermediate variables. Once the vi have been defined, we proceed by assigning to each
intermediate variable its adjoint
v̄i = ∂y/∂vi,  (2.14)
which represents the sensitivity of y with respect to a variation of vi. Our goal is to compute
v̄−1 and v̄0. From Table 2.1, we see that v0 affects y through v2 and v3. This means that we can
compute v̄0 as:
v̄0 = ∂y/∂v0 = (∂y/∂v2)(∂v2/∂v0) + (∂y/∂v3)(∂v3/∂v0) = v̄2 ∂v2/∂v0 + v̄3 ∂v3/∂v0.  (2.15)
At this point, it is clear how we can compute all the derivatives just by looking at how the
variables interact with each other.
Table 2.2 shows how the derivatives of f with respect to the inputs (x1, x2) are computed. Note
that v̄0 and v̄−1 are computed in two incremental steps. In the case of ℝⁿ → ℝ functions, like the
one in the example, only one run of the reverse mode is needed to compute the gradient of the
function ∇f = (∂f/∂x1, ..., ∂f/∂xi, ..., ∂f/∂xn). This is a big advantage in network training applications,
since gradients of scalar functions with a large number of variables need to be computed.
Table 2.2: Reverse phase, calculation of the derivatives (adjoints v̄i).

from the y expression:    v̄5  = ȳ = 1
from the v5 expression:   v̄4  = v̄5 ∂v5/∂v4 = v̄5 · 1 = 1
                          v̄3  = v̄5 ∂v5/∂v3 = v̄5 · (−1) = −1
from the v4 expression:   v̄1  = v̄4 ∂v4/∂v1 = v̄4 · 1 = 1
                          v̄2  = v̄4 ∂v4/∂v2 = v̄4 · 1 = 1
from the v3 expression:   v̄0  = v̄3 ∂v3/∂v0 = v̄3 cos v0 = −0.284
from the v2 expression:   v̄−1 = v̄2 ∂v2/∂v−1 = v̄2 v0 = 5
                          v̄0  = v̄0 + v̄2 ∂v2/∂v0 = v̄0 + v̄2 v−1 = 1.716
from the v1 expression:   v̄−1 = v̄−1 + v̄1 ∂v1/∂v−1 = v̄−1 + v̄1 /v−1 = 5.5
result:                   x̄1  = ∂y/∂x1 = v̄−1 = 5.5
                          x̄2  = ∂y/∂x2 = v̄0 = 1.716
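The same gradient can be checked with the reverse-mode automatic differentiation provided by MATLAB's Deep Learning Toolbox (dlarray, dlfeval and dlgradient); the following minimal sketch reproduces the values obtained in Table 2.2:

f  = @(x) log(x(1)) + x(1)*x(2) - sin(x(2));     % function of the example
x0 = dlarray([2; 5]);                            % evaluation point (x1, x2) = (2, 5)
g  = dlfeval(@(x) dlgradient(f(x), x), x0);      % one reverse-mode pass
% g is approximately [5.5; 1.7163], i.e. the values of x̄1 and x̄2 in Table 2.2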
where mk and vk are the first and second moment at iteration k, β1 and β2 are two hyperparame-
ters, valued between 0 and 1 (usually close to 1), ∇θ C(θk−1 ) is the gradient of the cost function
computed after (k − 1) iterations and θk−1 is the set of network parameters at iteration (k − 1).
As we said, Adam is a combination of two algorithms. The update rules for Momentum and
RMSProp are:
where α is the learning rate, which is set a priori and controls the amplitude of the movement
in the search space [16], and ε is a small coefficient to avoid division by zero. As we can see
from (2.18) and (2.19), Momentum uses the previous gradient to smooth out fluctuations in
the optimization process, while RMSProp scales the learning rate based on the magnitude of
recent gradients [17]. Adam combines the two update rules to obtain an adaptive learning rate
for each parameter. Before proceeding, we note that at the beginning of the process, mk and vk
are set to zero, and, since β1 and β2 are close to 1, they both tend to be biased towards zero. A
bias correction is therefore applied, yielding the corrected estimates m̂k = mk/(1 − β1^k) and
v̂k = vk/(1 − β2^k).
We can now use m̂k and v̂k to update the network parameters in the following way:
θk = θk−1 − α · m̂k/(√v̂k + ε).  (2.22)
Adam offers a few advantages compared to other optimization algorithms [17], such as adap-
tive learning rates for each parameter, which allows a faster convergence and higher accuracy in
high-dimensional parameter spaces (which is the case of neural networks), the bias correction
of the moment estimates, and an overall robustness to hyperparameter choices.
In the next paragraph we combine all the knowledge acquired so far about cost function
definition, automatic differentiation and Adam optimization to explain the training process of
a PINN more in-depth.
where α0 is the initial learning rate, β is the decay rate and k is the current iteration. β1 and
β2 are usually kept at their default value (β1 = 0.9, β2 = 0.999) and ε is set in the order of
10⁻⁸ [17].
Once all the hyperparameters have been set, we can start the optimization process. The
collocation points are divided into batches, and a batch queue is formed. The cost function C is
computed on the batch collocation points and on the boundary points, as described in Section
2.2, and then its gradient with respect to the network parameters is obtained through AD.
Once the gradient is computed, we pass it to the Adam algorithm, which updates the network
parameters. The process is repeated on the next batches of points until the batch queue is empty.
Then the algorithm proceeds to the next training epoch, until all epochs are performed and the
training process is over. Algorithm 1 shows a pseudocode describing the network training
process.
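A schematic MATLAB version of this loop could look as follows; modelLoss, the minibatchqueue mbq of collocation points, the boundary data and the hyperparameters are assumed to be defined elsewhere, so this is only a sketch of the structure of Algorithm 1:

averageGrad = [];  averageSqGrad = [];  iteration = 0;
for epoch = 1:numEpochs
    reset(mbq);                                 % start a new pass over the batch queue
    while hasdata(mbq)
        iteration = iteration + 1;
        XY = next(mbq);                         % next batch of collocation points
        % cost function value and gradient via automatic differentiation
        [loss,gradients] = dlfeval(@modelLoss,net,XY,BdNodes,BdCond);
        % Adam update of the network parameters
        [net,averageGrad,averageSqGrad] = adamupdate(net,gradients, ...
            averageGrad,averageSqGrad,iteration,learningRate);
    end
    learningRate = initialLR/(1 + decayRate*epoch);   % learning-rate decay
end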
Once the training is successfully completed, i.e. the cost function has reached a value very
close to zero, we can evaluate the network output on a set of domain points different from
collocation points, and we will obtain the solution of the given problem on those points. If we
have a solution obtained analytically or through a conventional solver (for example FEM) on
the same set of points, we can compare it to the solution obtained through the PINN to see if
the training process was actually done properly. This phase is called network testing and serves
as validation of the whole algorithm.
∇×E = 0 (2.24)
∇·D = ρ (2.25)
D = εE. (2.26)
Since the electric field is curl-free, there exists an electric scalar potential V such that:
E = −∇V. (2.27)
Combining (2.24), (2.25), (2.26) and (2.27) we obtain a single equation involving only the
scalar potential V :
∇ · ε∇V = −ρ, (2.28)
which is a particular form of (2.5) and can be used to train a PINN as described in Sections 2.2
and 2.5.
∇×H = J (2.29)
∇·B = 0 (2.30)
H = νB, (2.31)
Since the magnetic flux density is divergence-free, there exists a magnetic vector potential A such
that:
B = ∇ × A. (2.32)
To ensure the uniqueness of the solution, we need to impose a gauge on A. For magnetostatics,
the Coulomb gauge is typically adopted:
∇ · A = 0.  (2.33)
Combining (2.29), (2.30), (2.31) and (2.32) we obtain a single equation involving only the
vector potential A:
∇ × ν∇ × A = J. (2.34)
For 2D magnetostatics, if plane symmetry is present (i.e. the x and y components of B vary
only on the (x, y) plane), we can rearrange (2.34) to obtain an equation like (2.5). If B only has
components on the (x, y) plane, i.e. B = (Bx , By , 0), A only has one component along the z-axis
(A = (0, 0, Az )). If we formally compute the curl of A, we obtain:
B = ∇ × A = (∂Az/∂y, −∂Az/∂x, 0)  (2.35)
−∇ · ν∇Az = Jz , (2.36)
which is a particular form of (2.5) and can be used to train a PINN as described in Sections 2.2
and 2.5.
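The step from (2.35) to (2.36) can be made explicit. Since no quantity depends on z, the curl of ν∇ × A has only a z-component:

\nabla \times (\nu \nabla \times \mathbf{A}) = \nabla \times \left( \nu \frac{\partial A_z}{\partial y},\; -\nu \frac{\partial A_z}{\partial x},\; 0 \right) = \left( 0,\; 0,\; -\frac{\partial}{\partial x}\!\left(\nu \frac{\partial A_z}{\partial x}\right) - \frac{\partial}{\partial y}\!\left(\nu \frac{\partial A_z}{\partial y}\right) \right),

so the z-component of (2.34) is exactly −∇ · ν∇Az = Jz, i.e. equation (2.36).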
training process, the solution will be consistent with the physics of the problem. Moreover, a
small adjustment in the cost function definition could allow PINNs to also treat problems in
time domain without the need for time-stepping [10]. It would be sufficient to add time as an
input of the network and take collocation points in space and time [9]. The cost function can
be modified by adding the time-dependent term of the PDE to (2.10) and by adding a term to (2.11)
to take the initial conditions into consideration.
PINNs could also be a great tool for non-linear electromagnetic problems. When we treat
magnetic media with non-linear B-H characteristics with FEM, we need to perform an iterative
procedure [18] to account for the variations of the magnetic permeability, which can be very
time-consuming. If we take the variations of µ into consideration during the network training
process, we could use PINNs to treat non-linear problems just as we do with linear ones.
The goal of this thesis is to find out if it is actually possible to practically implement what is
presented in this chapter at the current stage of PINNs development. We will investigate PINNs
capabilities and limitations when it comes to computational electromagnetism, starting from a
simple problem like the solution of Poisson’s equation on a circular domain and then moving
on to more complex settings and tasks. A simple MATLAB algorithm is presented in the next
chapter as a starting point, and then, in the following part of the thesis, we will try to modify it
to obtain the solution of increasingly more challenging electromagnetic problems.
Chapter 3
−∇2 u = f , (3.1)
where u is the potential distribution, f is the source function and ∇2 is the Laplacian. The
Laplace operator can be written in Cartesian coordinates as:
∇² = ∂²/∂x² + ∂²/∂y² + ∂²/∂z².  (3.2)
In order to introduce the practical implementation of PINNs, the example of PINN solver
present in the MATLAB library [19] is here examined. In this example, the Poisson’s equation,
with f = 1, is solved on a unit disk (i.e. a circle with r = 1), with u = 0 on the boundary
(Dirichlet condition). The collocation points and boundary points are obtained from the triangle
mesh of the disk and then the potential distribution is approximated by training a PINN. To test
its accuracy, the solution is compared with the analytical solution, which in this case is:
u(x, y) = (1 − x² − y²)/4.  (3.3)
−∇²u = 1 on Ω, with u = 0 on ∂Ω.  (3.4)
The domain Ω is a circle with unitary radius. The MATLAB PDE toolbox [20] is used to set
up the model and generate the mesh. Figure 3.1 shows how to generate the model and the mesh
using the functions provided by the PDE toolbox. The function createpde generates a blank
PDE model, then the geometry of the domain is defined by the function geometryFromEdges
(circleg is the function that describes a unit disk centered at the origin). The Dirichlet boundary
condition is imposed using applyBoundaryCondition, where "Edge" defines the edges on which
the condition is imposed. After that, the function specifyCoefficients is used to assign to each
FIGURE 3.1: Model setup and mesh generation in MATLAB using the PDE
toolbox (source: Mathworks website [19])
coefficient of the equation its prescribed value. Coefficients m, d, c, a, f are defined as:
m ∂²u/∂t² + d ∂u/∂t − ∇ · (c∇u) + au = f,  (3.5)
so in this specific case the coefficients are imposed as m=0, d=0, c=1, a=0, f=1 in order to
obtain a PDE equal to the one defined in (3.4). The domain is then meshed using the function
generateMesh, through which the target maximum length of the mesh edges (Hmax ), the mesh
growth rate (Hgrad ) and the target edge size around selected domain edges (Hedge ) can be set. In
this case a smaller edge length around the boundary edges is chosen to obtain a large number
of sampled nodes on the boundary. In this way the boundary conditions can be accurately
enforced on all ∂ Ω. Figure 3.2 shows the meshed domain.
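For reference, the setup shown in Figure 3.1 follows the pattern sketched below; the numerical values of Hmax, Hgrad and Hedge are illustrative and not necessarily those of the original example:

model = createpde();
geometryFromEdges(model,@circleg);              % unit disk centred at the origin
applyBoundaryCondition(model,"dirichlet", ...
    "Edge",1:model.Geometry.NumEdges,"u",0);    % u = 0 on the whole boundary
specifyCoefficients(model,"m",0,"d",0,"c",1,"a",0,"f",1);  % coefficients of (3.5)
generateMesh(model,"Hmax",0.1,"Hgrad",1.5, ...
    "Hedge",{1:model.Geometry.NumEdges,0.02});  % finer mesh along the boundary edges
pdemesh(model);                                 % plot the mesh (cf. Figure 3.2)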
The network that we defined earlier can receive as input 1 × 2 vectors representing the coordi-
nates of each sample point. If we have a batch of N points on which the network output has
to be evaluated, we need to store it as a deep learning array with 2 channels (corresponding to
the two dimensions x, y) of size N. This is done using "BC" as the format input (B=batch size,
C=number of channels) in minibatchqueue. This way the network knows that each row of the
input batch corresponds to a point and each column corresponds to a dimension. The size of
each batch, the number of epochs, the initial learning rate of the optimization algorithm and its
decay rate represent the training options of the problem, i.e. the hyperparameters that we need
to set a priori. In this example, the hyperparamters are chosen as:
• nepochs = 50: number of training epochs,
of points in the queue and passes it, together with all the other required inputs, to the function
that computes the loss function value and its gradient. The command dlfeval [26] is used to
evaluate the function output at each iteration. This command is used to evaluate user defined
deep learning models, such as that implemented in the function modelLoss, proposed in the
example. The loss function defined by the user takes the PDE model, the network structure,
the current batch of collocation points and the coefficients of the PDE as inputs, and yields the
value of the loss function and its gradient with respect to the network parameters.
The first step is to compute the network output on the collocation points. This is done
using the command forward, which takes the network structure and the coordinates as inputs
and gives a vector with values of the network output on the input coordinates as a result. After
that, the Laplacian of the output is computed. This is carried out by evaluating the gradient
of the output with respect to the collocation points coordinates and then the gradient of the x
and y components of the gradient, again with respect to the collocation points coordinates. To
compute gradients, we use automatic differentiation. In MATLAB, automatic differentiation
is performed using the command dlgradient [23], which takes the sum of the network output
vector components and the coordinates as inputs, and gives the gradient of the network output
with respect to the input coordinates as output. In order to be able to compute the Laplacian,
we need to enable the computation of the derivatives of the gradient via the option
EnableHigherDerivatives, which ensures the differentiability of the gradient.
compute the residuals of the PDE on each collocation point of the batch and thus the first term
of the loss function. The residual on the i-th collocation point is computed as:
where uPINN,i is the neural network output on the i-th collocation point. A vector called res,
containing the residuals resi computed on all the collocation points, is defined. For the train-
ing to be considered successful, the residuals should all be zero. We define a target vector
zeroTarget of the same dimension as res, composed of only zeros. The first term of the loss
function is:
lossF = MSE(res, zeroTarget). (3.7)
Now the second term of the loss function, the one related to boundary conditions, needs to be
computed. First, the mesh nodes belonging to each boundary edge are found with the command
findNodes and the respective boundary condition with the command findBoundaryConditions
(in this case it is u = 0 on all the boundary). After that, the coordinates of the boundary
nodes are formatted using the function dlarray, which transforms the matrix of coordinates
into a formatted deep learning array (again, we need to set the format argument to "BC", see
Section 3.2 for the details). Then the output of the network on the boundary nodes is evaluated.
If we call the boundary condition imposed on the j-th boundary node as actualBC j and the
network output on the same node as predictedBC j , and we define two vectors actualBC and
predictedBC containing all the actualBC j and predictedBC j terms, the second term of the loss
function is computed as:
Now we have all the elements to calculate the value of the total loss function. We choose
λ1 = 0.6 and λ2 = 0.4 (see (2.11)), and the loss function is computed as:
Now, the loss function gradient with respect to the network parameters (also called learnables)
is computed with, again, automatic differentiation (see figure 3.8).
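A condensed sketch of such a modelLoss function is reported below. Variable names are illustrative: XY is the current batch of collocation points, BdXY and BdCond hold the boundary nodes and the corresponding Dirichlet values, and the 0.6/0.4 weights follow the λ1, λ2 values chosen above, with λ1 applied to the boundary term as in (2.11):

function [loss,gradients] = modelLoss(net,XY,BdXY,BdCond)
    U = forward(net,XY);                         % network output on the collocation points
    % first derivatives of u with respect to (x,y) via automatic differentiation
    gradU = dlgradient(sum(U,"all"),XY,EnableHigherDerivatives=true);
    Ux = gradU(1,:);  Uy = gradU(2,:);
    % second derivatives and Laplacian of the network output
    Uxx = dlgradient(sum(Ux,"all"),XY,EnableHigherDerivatives=true);
    Uyy = dlgradient(sum(Uy,"all"),XY,EnableHigherDerivatives=true);
    lap = Uxx(1,:) + Uyy(2,:);
    res   = -lap - 1;                            % residual of the PDE (3.4) with f = 1
    lossF = l2loss(res,zeros(size(res)));        % PDE term, cf. (3.7)
    Ubd   = forward(net,BdXY);                   % network output on the boundary nodes
    lossU = l2loss(Ubd - BdCond,zeros(size(Ubd)));   % boundary-condition term
    loss  = 0.6*lossU + 0.4*lossF;               % weighted sum, cf. (2.11)
    gradients = dlgradient(loss,net.Learnables); % gradient w.r.t. the learnables
end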
After the computation of the loss function value and its gradient, we can proceed with the opti-
mization. The Adam algorithm (see Section 2.4 for details) is implemented in MATLAB with
the function adamupdate [28]. This function takes the network structure, the loss function gra-
dient, the previous first and second moments of the gradient (averageGrad and averageSqGrad
in figure 3.9, they need to be initialized to zero before the first epoch of training), the current
iteration number and the learning rate. As outputs, it gives the updated network structure and
the value of the two moments to be used in the next iteration.
After the process has been repeated with all the batches in the queue, the algorithm updates
the learning rate as:
α = α0/(1 + β · Epoch),  (3.10)
communicates the new loss function value to the training progress monitor and updates its
parameters (see last three rows in figure 3.6). Figure 3.10 shows how the training progress
monitor is initialized in MATLAB.
To test the accuracy of the solution, Upinn , which is the array of potentials evaluated on mesh
nodes, can be compared with the analytical solution (3.3) computed on the same nodes (Utrue ).
To do so we plot the two solutions and we compute the L2 norm of the error vector as:
Figure 3.12 shows the analytical solution and the solution obtained through the training of the
PINN. We can see that the potential distribution is approximated quite well by the PINN, and
the norm of the error is 4.3 · 10⁻¹. It means that with just 28 seconds of training the network
is able to solve the Poisson’s problem with good accuracy. To further improve the accuracy
of the solution, one can use a finer mesh and a higher number of training epochs. This simple
example is useful to understand how the training of a PINN can be implemented in MATLAB
and how PINN algorithms can be applied to electromagnetic problems. It also represents a
starting point for the development of codes discussed in the next chapters.
FIGURE 3.12: Analytical solution (above) and PINN solution (below) (source:
Mathworks website [19])
Chapter 4
The goal is to compute the potential distribution inside the dielectric region knowing the value
of the potential at the terminals (V0 and VL ), the distance between them (L) and the charge
density (ρv ) and permittivity (ε) between the two plates.
In electrostatic conditions, the potential distribution inside the capacitor is the solution of
(2.28). The Dirichlet conditions on the two terminals ensure the uniqueness of the solution.
We can use PINNs to solve the equation by modifying the MATLAB code presented in the
previous chapter. The accuracy of the PINN solutions was tested by comparing them to analyt-
ical solutions or to solutions obtained through a FEM algorithm developed in MATLAB. The
FEM algorithm was validated by comparing its results to solutions obtained through COMSOL
Multiphysics® .
means that the electric potential u will vary only along the x-axis and (2.28) becomes:
∂²u/∂x² = 0.  (4.1)
The boundary conditions are u(x = 0) = 1 V and u(x = L) = 0 V. The distance L between the
two terminals is 1m.
We start from the domain definition and the mesh. In this case the domain is simply a
segment of length L, and can be subdivided in n subsegments of equal length, that will be the
elements of the mesh. The higher n, the finer the mesh. In this case we choose n = 50, so we
have 50 elements and 51 nodes. For a 1D problem it is not necessary to use the MATLAB PDE
toolbox to generate the mesh of the geometry. The equation coefficients are all zero except c,
which is equal to the air permittivity ε0 and can be directly plugged in the loss function formula
without using the specifyCoefficients function (like we did in Section 3.2). The same applies to
the boundary conditions. They can be defined as a two-component vector BdCond and passed
directly to the loss function input without using the applyBoundaryCondition function.
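A possible MATLAB definition of the quantities described above is sketched here; using all 51 nodes as collocation points and the "BC" data format are assumptions of this sketch:

L = 1;                                 % distance between the terminals [m]
n = 50;                                % number of mesh elements
x = linspace(0,L,n+1)';                % 51 equally spaced mesh nodes (column vector)
collocationPoints = dlarray(x,"BC");   % collocation points for the PDE residual
BdNodes = dlarray([0; L],"BC");        % the two boundary nodes (terminals)
BdCond  = dlarray([1; 0],"BC");        % u(x=0) = 1 V, u(x=L) = 0 V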
collocation points, since there is no need to divide such a small number of points into batches.
The training process hyperparameters are:
res1,i = ∂²uPINN,i/∂x²,  (4.2)
where uPINN,i is the network output on the i-th collocation point. We use dlgradient to compute
the derivative in (4.2). We define a vector res1 containing all the res1,i terms. We then define a
target vector target1 of zeros with a number of entries equal to the number of collocation points
(the residual of the PDE must be zero on every collocation point) and compute the first term of
the loss function as:
lossF = MSE(res1 , target1 ). (4.3)
The MSE is computed using the MATLAB function l2loss. The second part of the loss func-
tion is simply constructed by evaluating the network output on the two boundary nodes, and
computing the residual on the j-th boundary point as:
res2,j = ubd,j − BdCondj,  (4.4)
where ubd, j is the network output on the j-th boundary point and BdCond j is the boundary
condition imposed on the j -th boundary point. We define a vector res2 containing all the res2, j
terms and a vector target2 of zeros of the same size as res2 . Then the loss function term related
to the boundary conditions is computed as:
lossU = MSE(res2, target2),  (4.5)
where target2 is a 2-component vector of zeros. The final loss function expression (constructed
as (2.11)), with λ1 = 0.4 and λ2 = 0.6 is:
The loss function gradient with respect to the network parameters is computed using dlgradi-
ent. Figure 4.4 shows the MATLAB function for the loss function calculation.
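A possible form of the function shown in Figure 4.4 is sketched below; variable names are illustrative, X is the dlarray of collocation points, BdX holds the two boundary nodes and BdCond the vector of Dirichlet values, and the weights follow the λ1 = 0.4, λ2 = 0.6 choice above, with λ1 applied to the boundary term as in (2.11):

function [loss,gradients] = modelLoss1D(net,X,BdX,BdCond)
    U   = forward(net,X);                       % network output on the collocation points
    Ux  = dlgradient(sum(U,"all"),X,EnableHigherDerivatives=true);   % du/dx
    Uxx = dlgradient(sum(Ux,"all"),X,EnableHigherDerivatives=true);  % d2u/dx2
    res1  = Uxx;                                % PDE residual (4.2)
    lossF = l2loss(res1,zeros(size(res1)));     % (4.3)
    Ubd   = forward(net,BdX);                   % output on the two boundary nodes
    res2  = Ubd - BdCond;                       % boundary residual (4.4)
    lossU = l2loss(res2,zeros(size(res2)));     % (4.5)
    loss  = 0.4*lossU + 0.6*lossF;              % weighted combination as in (2.11)
    gradients = dlgradient(loss,net.Learnables);
end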
4.3.2.2 Differentiation
Another problem concerning non-homogeneous media is that, if the loss function of the prob-
lem is constructed as in (2.9), one has to compute the spatial derivative of the material parame-
ter. Unless the parameter variation is described by a function in closed form, AD is not able to
compute its derivatives (see Section 2.3 for details). The only alternative is using finite differ-
ences, but, when moving from one region to another, we may encounter a sharp change in the
value of the material parameter. This leads to large derivatives and has a detrimental impact on
the stability of the network training [10] and subsequently on the accuracy of the final solution.
To solve the aforementioned problem, Gong et al., in [10], propose a different way to define
the loss function. They consider a magnetostatic problem, but their approach can be easily
transposed to electrostatics. In Chapter 2, we have seen the PDE formulations of electrostatic
and magnetostatic problems. Both can be described by a Poisson’s equation (2.28) in terms
of scalar potential. In the approach proposed in [10] Maxwell’s equations are not combined
together: for instance, for electrostatics, Gauss' equation is not combined with the constitutive
equation for dielectrics; in this way, the field problem is formulated in terms of electric potential
and electric displacement field:
−ε∇V = D (4.9)
∇ · D = ρ. (4.10)
−ε ∂V/∂x = D,  (4.11)
∂D/∂x = ρ.  (4.12)
We see that, in (4.11), the permittivity is not differentiated. If we construct a neural network
with two outputs (approximating V and D) instead of just one, we can train it using (4.11) and
(4.12) to define the loss function. This way, instead of one residual for lossF and one residual
for lossU, we will have multiple residuals to combine. On the i-th collocation point we will
have:
res1,i = −εi ∂VPINN,i/∂x − DPINN,i  (4.13)
res2,i = ∂DPINN,i/∂x − ρ  (4.14)
for lossF, where VPINN,i , DPINN,i and εi are the network outputs and the permittivity value on
the i-th collocation point, and, on the j-th boundary point,
for lossU, where Vbd, j is the first network output on the j-th boundary point and VDir, j are the
corresponding Dirichlet condition. We can now build the loss function as:
where the lossFk in (4.16) are computed as in (4.3) and the lossU in (4.17) as in (4.5).
res1,i = −pi ∂VPINN,i/∂x − DPINN,i  (4.18)
lossF1 = MSE(res1, target1),  (4.19)
where pi is the component of the vector p (see Section 4.3.1) corresponding to the i-th colloca-
tion point, res1 is a vector containing all the res1,i terms and target1 is a vector of zeros of the
same size as the number of collocation points. The second term of the loss function is defined
from (4.10). In this case ρ = 0, so we end up with
res2,i = ∂DPINN,i/∂x  (4.20)
lossF2 = MSE(res2, target2),  (4.21)
where res2 is a vector containing all the res2,i terms and target2 is a vector of zeros of the same
size as the number of collocation points. The derivatives in (4.18) and (4.20) are computed
using dlgradient. The loss function term related to the PDE is then computed as:
Now we need to define the two terms pertaining to the boundary conditions on V and D. The
conditions on V are simply V(x = 0) = 1 and V(x = L) = 0. The boundary conditions on D
come from (4.9). In a 2D domain, if we have Dirichlet conditions on V, then ∂V/∂t = 0 along
the boundary (t denoting the tangential direction), leading to Dt = −ε ∂V/∂t = 0. In this case
the domain is 1D, so we only have
the normal component Dn = Dx , meaning that we actually don’t need to enforce any boundary
condition on D. This leaves us with only one loss function term for the BCs, defined as in (4.4):
where Vbd, j and BdCond j are the network output and the boundary condition on the j-th bound-
ary point and target3 is a vector of zeros of the same size as the number of boundary points.
The total loss function is then obtained by combining lossF and lossU:
Its gradient with respect to the network parameters is computed using dlgradient.
Figure 4.8 shows the modified MATLAB function that computes the loss function and its gra-
dient. The rest of the training process is the same as described in Section 3.3.3.
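A sketch of such a two-output loss function, in the spirit of Figure 4.8, is given below. Variable names and the equal weighting of the individual terms are illustrative assumptions; X is the batch of collocation points, p the row vector of permittivity values on those points, BdX the boundary nodes and BdCond the Dirichlet values:

function [loss,gradients] = modelLossVD(net,X,p,BdX,BdCond)
    out = forward(net,X);
    V = out(1,:);  D = out(2,:);                % the two network outputs
    dV = dlgradient(sum(V,"all"),X,EnableHigherDerivatives=true);    % dV/dx
    dD = dlgradient(sum(D,"all"),X,EnableHigherDerivatives=true);    % dD/dx
    res1 = -p.*dV - D;                          % residual of (4.18)
    res2 = dD;                                  % residual of (4.20), rho = 0
    lossF = 0.5*l2loss(res1,zeros(size(res1))) + ...
            0.5*l2loss(res2,zeros(size(res2))); % PDE term combining lossF1 and lossF2
    Vbd  = forward(net,BdX);
    res3 = Vbd(1,:) - BdCond;                   % boundary residual on V only
    lossU = l2loss(res3,zeros(size(res3)));     % boundary-condition term
    loss  = 0.5*lossF + 0.5*lossU;              % combination of lossF and lossU
    gradients = dlgradient(loss,net.Learnables);
end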
The L2 norm of the error vector between the analytical and the PINN solution is 0.0075. Figure
4.9 shows the loss function evolution in time while Figure 4.10 shows the comparison between
the two solutions evaluated on the mesh nodes. We can see that the PINN manages to approxi-
mate the real solution quite well with a fairly small number of training epochs.
To further test the accuracy of the solution, we can exploit the differentiability of the PINN
output (see Section 1.4) to compute the electric field inside the capacitor. For this purpose, a
MATLAB function that exploits dlgradient for automatic differentiation has been developed.
It computes the electric field from the scalar potential as:
E = −∂V/∂x.  (4.27)
If we compute E on the mesh nodes, we can compare it with the analytical solution derived
from (4.26):
E(x) = (V0 − VL)/log(2) · 1/(x + L).  (4.28)
In figure 4.11 we see the function used to compute the electric field, while figure 4.12 shows
the comparison between the electric field computed via (4.28) and the one computed differen-
tiating the PINN solution via AD. The L2 norm of the error vector between the two solutions,
computed as in (3.11), is 0.1365, which confirms the good accuracy of the PINN solution.
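A possible form of the function shown in Figure 4.11 is the following; the derivative of the PINN output is again obtained through dlgradient, evaluated inside dlfeval:

function E = electricFieldAD(net,X)
    % electric field E = -dV/dx obtained by differentiating the PINN output, cf. (4.27)
    E = dlfeval(@(x) -dlgradient(sum(forward(net,x),"all"),x), X);
end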
FIGURE 4.13: One-dimensional capacitor with varying permittivity, PINN vs.
analytical, old approach
alytical solution. We can see that the PINN was not able to approximate the variation of the
permittivity and the solution looks like the one obtained with a constant permittivity value.
This is due to the fact that AD, as already mentioned in Section 4.3.2.2, is not able to compute
the derivatives of the permittivity, because the permittivity is passed to the modelLoss function
as a vector of constant coefficients, while AD can only differentiate functions expressed in
closed form or by an algorithm (see Section 2.3). In this particular case, since ε is defined by
an expression in closed form (see (4.8)), we could solve this problem by defining the permit-
tivity vector inside the modelLoss MATLAB function instead of passing it to the function as an
input, allowing AD to compute its derivatives. However, in the next Sections, we will present
some cases in which this is not possible. This means that the approach proposed in Section 4.3
allows us to treat problems that would be impossible to solve using the algorithm discussed in
the previous chapter, improving the generality of the code.
res1,i = −pi ∂VPINN,i/∂x − Dx,PINN,i  (4.29)
res2,i = −pi ∂VPINN,i/∂y − Dy,PINN,i  (4.30)
res3,i = ∂Dx,PINN,i/∂x + ∂Dy,PINN,i/∂y.  (4.31)
On the j-th Dirichlet node we have two residuals to minimize:
where Vbd,j and BdCondj are the network output corresponding to the electric potential and the
boundary condition on the j-th Dirichlet node and Dbd,x, j is the network output corresponding
to the electric displacement field component tangent to the Dirichlet boundary on the j-th
Dirichlet node. On the k-th Neumann node we have two residuals to minimize, related to
Neumann conditions on V and D:
res6,k = ∂VbdN,k/∂x  (4.34)
res7,k = DbdN,x,k,  (4.35)
where VbdN,k and DbdN,x,k are the network output components on the k-th Neumann node. Each
loss function component is then defined as in (4.3):
The complete loss function is then a linear combination of all the lossn terms. Figure 4.18
shows the MATLAB function used to compute the loss function and its gradient.
After 300 epochs of training, for a total training time of 1 minute and 28 seconds, the loss
function has reached a value of 1.9226 · 10⁻⁴. We can test if the approach was successful
by comparing the potential distribution along the y-axis with the one obtained in the 1D case.
To test the accuracy of the solution, we can plot the potential distribution along the y-axis on
50 equally spaced points for three different values of the x-coordinate (x = 0 m, x = 0.5 m
and x = 1 m). If the training is carried out properly, the potential distributions obtained from
these three x-coordinates must be approximately the same. We can also compare them with
the analytical solution of the 1D case to see if they are accurate. Figure 4.19 shows the results.
We can see that in all the three cases the PINN solution is accurate compared to the analytical
one. To validate the graphical results, we can check the L2 norm of the error vectors. For x = 0
m, L2 (E) = 0.0203, for x = 0.5 m, L2 (E) = 0.0224 and for x = 1 m, L2 (E) = 0.0183. All the
norms are small and of the same order of magnitude.
We can conclude that the proposed approach works for both 1D and 2D electrostatic prob-
lems with non-homogeneous permittivity. In the next Section we will see if it also works when
we have a permittivity that is not a continuous function (unlike (4.8)), but has a constant value
in one region of the dielectric and another constant value in another region.
batches, one with the points belonging to the first region and one with the points belonging to
the second region. We will call these two arrays XY1 and XY2.
Now it is time to modify our MATLAB function to compute the loss function and its gra-
dient. The idea to solve this kind of problem is to enforce (4.9) and (4.10) on the collocation
points of the two regions separately, and then combine the two solutions by imposing an in-
terface condition. In this case we can enforce the continuity of the electric displacement field
component normal to the interface (Dy in this case). This new condition requires the addition
of a new term to the loss function:
where Dy,int1,i and Dy,int2,i are the y-components of the D field evaluated on the i-th couple of
interface points (interface points that have the same x-coordinate but belong one to region 1
and one to region 2), resint is a vector containing all the resint,i terms and targetint is a vector
of zeros of the same size as resint . Figure 4.21 shows the bit of code that enforces the interface
condition in the MATLAB function. nod_int1 and nod_int2 are the interface nodes in the two
regions.
FIGURE 4.21: Enforcement of the interface condition in the loss function algorithm
Boundary conditions will be enforced in the same way as we did in the previous example. The
final loss function will be the linear combination of the loss functions related to the PDE in the
two regions, the loss function related to the interface condition and the loss functions related to
the boundary conditions.
two regions, as we did in the previous examples. The first region is the area around the
inclusion (cyan area in figure 4.26), the second one is the inclusion (red area in Figure 4.26).
The interface conditions to enforce are the following:
[Dy] = Dy⁺ − Dy⁻ = 0 for interfaces parallel to the capacitor terminals,  (4.40)
[Dx] = Dx⁺ − Dx⁻ = 0 for interfaces normal to the capacitor terminals.  (4.41)
(4.40) and (4.41) are enforced in the same way as described in Section 4.5.2, being careful not
to enforce two different conditions on the nodes that are in the corners of the inclusion. They
could belong to both parallel and normal interfaces, so we arbitrarily decide that they belong to
the parallel ones. The training is then performed in the same way as before, with the addition
of two loss function terms to account for interface conditions on Dx . The number of epochs is
initially set to 10000.
time, even though the loss function has reached a value of 2.407 · 10⁻⁶, the solution obtained
was not accurate compared to the one obtained with FEM. We can clearly observe from Figure
4.28 that the PINN output is not an accurate approximation of the potential inside the capacitor.
The equipotential lines are straight, while they should bend near the edges of the dielectric
inclusion, and they are concentrated towards the lower terminal of the capacitor. Numer-
ical results show that the algorithm is not able to treat 2D electrostatic problems in which the
electric displacement has two spatial components.
Chapter 5
of a ferromagnetic material. We can observe how the curve saturates, showing a non linear be-
havior. An in-depth explanation of ferromagnetic materials and the phenomenon of hysteresis
is presented in [31].
As already mentioned in Section 2.7, a finite element solution of non-linear electromagnetic
problems can be computationally demanding because it often requires a time-domain solver.
Integrating the non-linearity into the loss function of a network can be a way to solve non-linear
problems using PINNs. The value of µ can be updated at each training iteration, knowing the
magnetization curve of the material and the value of H and B. B can be easily obtained from
the curl of the magnetic potential A (by using automatic differentiation), while we would need to
add one more PINN output to obtain the magnetic field strength H. This can be done using
the approach proposed in [10], that we already readapted to solve the parallel plate capacitor
problems in Chapter 4. An additional term would be added to the loss function to account for
the variation of µ:
µcurve,i = BPINN,i/HPINN,i,  (5.1)
resµ,i = BPINN,i/HPINN,i − µcurve,i,  (5.2)
where µcurve,i is the value of the permeability on the i-th collocation point obtained from the
magnetization curve, and BPINN,i and HPINN,i are the values of the magnetic flux density and
the magnetic field strength on the i-th collocation point obtained through the PINN training. In
this way the non-linearity would be integrated in the PINN training and the solution obtained
should be coherent with the physics of the problem.
∇ · k1∇u(x, y, t) + k2 ∂u(x, y, t)/∂t = F.  (5.3)
The residuals that define the loss function would be:
resPDE,i = −∇ · k1∇uPINN,i − k2 ∂uPINN,i/∂t − F,  (5.4)
resin,i = uPINN,i,0 − u0,i,  (5.5)
resbd,j = uPINN,bd,j − ubd,j,  (5.6)
where uPINN,i is the network output computed on the collocation point (xi , yi ,ti ), uPINN,i,0 is
the network output on the collocation point (xi , yi , 0), uPINN,bd, j is the network output on the
boundary point (xbd, j , ybd, j ,t j ) and u0,i and ubd, j are the initial and boundary conditions on
the i-th collocation point and on the j-th boundary point. All the loss function terms can be
computed as in (3.7).
be tackled in order to make PINNs a viable alternative to traditional numerical methods such
as FEM. If what is discussed in this chapter is implemented successfully, it could make the
PINN-based algorithm a serious competitor against conventional solvers for electromagnetic
initial-boundary value problems (IBVPs). However, the issues mentioned in Sections 5.1 and
5.2 must be resolved.
Chapter 6
Conclusion
The goal of this work was to develop a MATLAB code that was able to solve increasingly more
difficult electromagnetic BVPs using PINNs. We analyzed all the essential aspects of artificial
neural networks structure and training and then we tried to apply the knowledge acquired to the
solution of practical examples. We started from the Poisson’s equation example taken from the
MathWorks® website and then moved on to analyze various configurations of the parallel plate
capacitor, re-adapting the initial code depending on the situation. We successfully adapted the
MATLAB PINN solver to the analysis of a 1D electrostatic problem with homogeneous media.
In order to extend the analysis to inhomogeneous media, a new approach, based on field and
scalar potential variables, was developed. The approach proposed in [10], which we rearranged in order
to apply it to electrostatic problems, combined with the interface conditions method that we
implemented, turned out to be successful in various applications. Numerical results showed a
very good agreement with both analytical and FEM solutions.
However, the extension to 2D static problems led to a considerable increase in the code
complexity. The training time has risen from 9 s all the way up to 50 min and 34 s, which is a
considerable amount of time compared to a standard FEM solver for linear, two-dimensional
electrostatic problems like the ones we presented in Section 4.7. Moreover, it was not possible
to solve a full 2D electrostatic problem with a dielectric inclusion.
Even though the attempts made in this thesis were not always successful and there are still
many steps to be taken in order to make PINN-based solvers competitive with FEM-based ones,
we also need to highlight the positive results that are presented in this thesis. Even if the prob-
lems that the algorithm was able to solve were very simple, it must be remarked that PINNs
are still at an early stage of development and the literature is still limited, especially if electro-
magnetic problems are considered. It is clear that developing a code that is able to deal with
the vast majority of problems that electrical engineers are required to solve is not an easy task.
The algorithm presented in this thesis is just at the beginning of its development and already
encountered challenges that seem unsolvable with the tools and strategies that we introduced
in the previous chapters. A huge amount of research will be required to bring everything that
we mentioned in this thesis together into a code that is usable, solid and efficient.
The possible future developments of the code presented in this thesis, as discussed in
Chapter 5, concern the possibility of solving problems that feature non-homogeneous media, field
sources, non-linear materials and time-dependent equations. As this thesis highlights, PINNs
offer all the tools needed to tackle these types of problems. However, a way to successfully im-
plement them in practical applications must be found.
Appendix A
MATLAB codes
In this Appendix we present the last functioning version of the algorithm, which was used to
solve the problem discussed in Section 4.7. Section A.1 shows the main algorithm and Section
A.2 shows the function used to compute the loss function value and its gradients.
%Domain
L_x=1; %[m]
L_y=1; %[m]
rel_toll=1e-6;
%Boundary conditions
V_b=1; %bottom
V_t=0; %top
V_0=V_t-V_b; %applied voltage
%% MESH GENERATION
centres=zeros(M,2);
for i=1:M
nodes=conn(i,:);
nodtr=nod(nodes,:);
centres(i,:)=mean(nodtr,1);
end
ind=convhull(nod(:,1),nod(:,2));
Fr=sort(ind(1:end-1));
indFr1=find(abs(nod(Fr,2))<rel_toll*L_y); %bottom nodes:
%they have y coordinate equal to 0
FrD1=Fr(indFr1); %dirichlet boundary nodes (bottom)
indFr2=find(abs(nod(Fr,2)-L_y)<rel_toll*L_y); %top nodes:
%they have y coordinate equal to L_y
FrD2=Fr(indFr2); %dirichlet boundary nodes (top)
indN=setdiff(Fr,FrD1);
indN=setdiff(indN,FrD2);
BdNodes_top=nod(FrD2,:);
BdNodes_bottom=nod(FrD1,:);
BdNodes=nod(Fr,:);
BdNodesD=[BdNodes_top
BdNodes_bottom];
BdNodesD=dlarray(BdNodesD,"BC");
BdNodesN=nod(indN,:);
BdNodesN=dlarray(BdNodesN,"BC"); %neumann nodes
%Boundary conditions
BdCond=zeros(size(BdNodesD,1),1);
BdCond(1:size(BdNodes_top,1))=V_t;
BdCond(size(BdNodes_top,1)+1:size(BdNodes_top,1)+size(BdNodes_bottom,1))=V_b;
BdCond=dlarray(BdCond,"BC");
ind_reg2=[];
for e=1:M
triangle=conn(e,:); %triangle nodes
nodtri=nod(triangle,:); %triangle coordinates
centre=mean(nodtri,1); %compute the centre of the triangle
%check if the triangle is inside the inclusion:
if abs(centre(2))>0.3 && abs(centre(2))<0.7
ind_reg2=[ind_reg2,e]; %#ok<AGROW>
patch(nodtri(:,1),nodtri(:,2),'red')
else
patch(nodtri(:,1),nodtri(:,2),'blue')
end
end
ind_reg3=[];
for e=1:M
triangle=conn(e,:); %triangle nodes
nodtri=nod(triangle,:); %triangle coordinates
centre=mean(nodtri,1); %compute the centre of the triangle
%check if the triangle is inside the inclusion:
if abs(centre(2))>0.7
ind_reg3=[ind_reg3,e]; %#ok<AGROW>
patch(nodtri(:,1),nodtri(:,2),'green')
end
end
ind_tot=1:M;
ind_reg1=setdiff(ind_tot,ind_reg2);
ind_reg1=setdiff(ind_reg1,ind_reg3);
%% INTERFACE POINTS
ind_int3=[ind_int3,i]; %#ok<AGROW>
end
nod_int1=sortrows(nod_int1);
nod_int1=dlarray(nod_int1,"BC");
nod_int2_1=sortrows(nod_int2_1);
nod_int2_1=dlarray(nod_int2_1,"BC");
nod_int2_2=sortrows(nod_int2_2);
nod_int2_2=dlarray(nod_int2_2,"BC");
nod_int3=sortrows(nod_int3);
nod_int3=dlarray(nod_int3,"BC");
centres=dlarray(centres',"BC");
domainCollocationPoints=centres;
ind_int=[ind_int1, ind_int2_1, ind_int2_2, ind_int3];
ind_reg1=setdiff(ind_reg1,ind_int1);
ind_reg2=setdiff(ind_reg2,ind_int2_1);
ind_reg2=setdiff(ind_reg2,ind_int2_2);
ind_reg3=setdiff(ind_reg3,ind_int3);
domainCollocationPoints(ind_int,:)=[];
%% COEFFICIENTS VECTOR
p=p1.*ones(M,1);
p(ind_reg2)=p2;
p=dlarray(p,"BC");
%% NETWORK ARCHITECTURE
numNeurons=150;
layers=[
featureInputLayer(2,Name="featureinput")
fullyConnectedLayer(numNeurons,Name="fc1")
tanhLayer(Name="tanh_1")
fullyConnectedLayer(numNeurons,Name="fc2")
tanhLayer(Name="tanh_2")
fullyConnectedLayer(numNeurons,Name="fc3")
tanhLayer(Name="tanh_3")
fullyConnectedLayer(3,Name="fc4")
];
pinn=dlnetwork(layers);
%% TRAINING OPTIONS
%Initialize the average gradients and squared average gradients for Adam
average_grad=[];
average_sq_grad=[];
N_iterations=N_epochs;
monitor=trainingProgressMonitor(Metrics="Loss",Info="Epoch", ...
    XLabel="Iteration");
%% TRAIN PINN
iteration=0;
epoch=0;
learning_rate=initialLR;
[loss,gradients]=dlfeval(@modelLoss_cap2D,pinn,BdNodesD,BdNodesN, ...
    BdCond,XY1,XY2,XY3,nod_int1,nod_int2_1,nod_int2_2,nod_int3);
[pinn,average_grad,average_sq_grad]=adamupdate(pinn,gradients, ...
    average_grad,average_sq_grad,iteration,learning_rate);
learning_rate=initialLR/(1+LRDecay*iteration);
recordMetrics(monitor,iteration,Loss=loss);
updateInfo(monitor,Epoch=epoch+"of"+N_epochs);
monitor.Progress=100*iteration/N_iterations;
end
%% TEST PINN
%Contour plot
nodes_dlarray=dlarray(nod',"CB");
Out_pinn=gather(extractdata(predict(pinn,nodes_dlarray)));
Vpinn=Out_pinn(1,:);
Dxpinn=Out_pinn(2,:);
Dypinn=Out_pinn(3,:);
figure (3)
Z=reshape(Vpinn,[N_y+1,N_x+1]); %reshape the solution vector as a matrix
contour(X,Y,Z,'LineWidth',1)
colorbar
axis equal
xlabel('x');
ylabel('y');
zlabel('potential (PINN)')
xt = 0.35;
yt = 0.6;
str = 'PINN solution';
text(xt,yt,str,'Color','black','FontSize',14)
f=gcf;
exportgraphics(f,'PINN solution.jpg')
[out1]=forward(net,XY1);
V=out1(1,:);
D_x=out1(2,:);
D_y=out1(3,:);
%Compute derivatives of V
dV=dlgradient(sum(V,"all"),XY1,EnableHigherDerivatives=true);
dV_x=dV(1,:);
dV_y=dV(2,:);
%Compute derivatives of D
dD_x=dlgradient(sum(D_x,"all"),XY1,EnableHigherDerivatives=true);
dD_xx=dD_x(1,:);
dD_y=dlgradient(sum(D_y,"all"),XY1,EnableHigherDerivatives=true);
dD_yy=dD_y(2,:);
res1=-dV_x-D_x;
res2=-dV_y-D_y;
res3=dD_xx+dD_yy;
target1=zeros(size(res1));
lossF1=l2loss(res1,target1);
target2=zeros(size(res2));
lossF2=l2loss(res2,target2);
target3=zeros(size(res3));
lossF3=l2loss(res3,target3);
lambda1=0.25;
lambda2=0.25;
lossF_reg1=lambda1*lossF1+lambda2*lossF2+(1-lambda1-lambda2)*lossF3;
[out2]=forward(net,XY2);
V=out2(1,:);
D_x=out2(2,:);
D_y=out2(3,:);
%Compute derivatives of V
dV=dlgradient(sum(V,"all"),XY2,EnableHigherDerivatives=true);
dV_x=dV(1,:);
dV_y=dV(2,:);
%Compute derivatives of D
dD_x=dlgradient(sum(D_x,"all"),XY2,EnableHigherDerivatives=true);
dD_xx=dD_x(1,:);
dD_y=dlgradient(sum(D_y,"all"),XY2,EnableHigherDerivatives=true);
dD_yy=dD_y(2,:);
res1=-16.*dV_x-D_x;
res2=-16.*dV_y-D_y;
res3=dD_xx+dD_yy;
target1=zeros(size(res1));
lossF1=l2loss(res1,target1);
target2=zeros(size(res2));
lossF2=l2loss(res2,target2);
target3=zeros(size(res3));
lossF3=l2loss(res3,target3);
lambda1=0.25;
lambda2=0.25;
lossF_reg2=lambda1*lossF1+lambda2*lossF2+(1-lambda1-lambda2)*lossF3;
[out3]=forward(net,XY3);
V=out3(1,:);
D_x=out3(2,:);
D_y=out3(3,:);
%Compute derivatives of V
dV=dlgradient(sum(V,"all"),XY3,EnableHigherDerivatives=true);
dV_x=dV(1,:);
dV_y=dV(2,:);
%Compute derivatives of D
dD_x=dlgradient(sum(D_x,"all"),XY3,EnableHigherDerivatives=true);
dD_xx=dD_x(1,:);
dD_y=dlgradient(sum(D_y,"all"),XY3,EnableHigherDerivatives=true);
dD_yy=dD_y(2,:);
res1=-dV_x-D_x;
res2=-dV_y-D_y;
res3=dD_xx+dD_yy;
target1=zeros(size(res1));
lossF1=l2loss(res1,target1);
target2=zeros(size(res2));
lossF2=l2loss(res2,target2);
target3=zeros(size(res3));
lossF3=l2loss(res3,target3);
lambda1=0.25;
lambda2=0.25;
lossF_reg3=lambda1*lossF1+lambda2*lossF2+(1-lambda1-lambda2)*lossF3;
[out_int1]=forward(net,nod_int1);
[out_int2_1]=forward(net,nod_int2_1);
[out_int2_2]=forward(net,nod_int2_2);
[out_int3]=forward(net,nod_int3);
Dy1=out_int1(3,:);
Dy21=out_int2_1(3,:);
Dy23=out_int2_2(3,:);
Dy3=out_int3(3,:);
res4=Dy1-Dy21;
res5=Dy23-Dy3;
target4=zeros(size(res4));
target5=zeros(size(res5));
lossF4=l2loss(res4,target4);
lossF5=l2loss(res5,target5);
loss_int=0.5*lossF4+0.5*lossF5;
lossF=0.25*lossF_reg1+0.25*lossF_reg2+0.25*lossF_reg3+(1-0.75)*loss_int;
%Electric potential
[Bdout]=forward(net,BdNodesD);
V_bd=Bdout(1,:);
res4=V_bd-BdCond;
target4=zeros(size(res4));
lossU1=l2loss(res4,target4);
D_bdx=Bdout(2,:);
res5=D_bdx;
target5=zeros(size(res5));
lossU2=l2loss(res5,target5);
lambda3=1;
lossUd=lambda3*lossU1+(1-lambda3)*lossU2;
%Electric potential
[Bdout]=forward(net,BdNodesN);
V_bd=Bdout(1,:);
dV_bd=dlgradient(sum(V_bd,"all"),BdNodesN,EnableHigherDerivatives=true);
dV_bd_x=dV_bd(1,:);
res6=dV_bd_x;
target6=zeros(size(res6));
lossU3=l2loss(res6,target6);
D_bdx=Bdout(2,:);
res7=D_bdx;
target7=zeros(size(res7));
lossU4=l2loss(res7,target7);
lambda4=0.5;
lossUn=lambda4*lossU3+(1-lambda4)*lossU4;
lambda5=0.5;
lossU=lambda5*lossUd+(1-lambda5)*lossUn;
lambda=0.5;
loss=lambda*lossF+(1-lambda)*lossU;
gradients=dlgradient(loss,net.Learnables);
end
Bibliography
[2] LeCun, Y., Bengio, Y., Hinton, G. "Deep learning", Nature, Vol. 521, pp. 436-44, May
2015.
[3] Shinde, P.P., Shah, S. "A Review of Machine Learning and Deep Learning Applica-
tions", Fourth International Conference on Computing Communication Control and Au-
tomation (ICCUBEA), 2018.
[4] Krogh, A. "What are artificial neural networks?", NATURE BIOTECHNOLOGY, Vol.
26, No. 2, Feb. 2008.
[5] Awad, M., Khanna, R. Efficient Learning Machines, Theories, Concepts, and Applica-
tions for Engineers and System Designers, pp. 127-142, ApressOpen, 2015.
[6] Krenker, A., Bešter, J., Kos, A. "Introduction to the Artificial Neural Networks", Arti-
ficial Neural Networks Methodological Advances and Biomedical Applications, InTech,
2011.
[7] Beltrán-Pulido, A., Bilionis, I., Aliprantis, D. "Physics-Informed Neural Networks for
Solving Parametric Magnetostatic Problems", IEEE Transactions on Energy Conver-
sion, Vol. 20, No. 10, Jun. 2022.
[8] Lagaris, I.E., Likas, A., Fotiadis, D.I. "Artificial Neural Networks for Solving Ordinary
and Partial Differential Equations", IEEE Transactions on Neural Networks, Vol. 9, No.
5, Sep. 1998.
[9] Lu, L., Meng, X., Mao, Z., Karniadakis, G.M. "DeepXDE: A Deep Learning Library
for Solving Differential Equations", SIAM Review, Vol. 63, No. 1, pp. 208–228, Society
for Industrial and Applied Mathematics, Feb. 2021.
[10] Gong, Z., Chu, Y., Yang, S. "Physics-Informed Neural Networks for Solving Two-
Dimensional Magnetostatic Fields", IEEE Transactions on Magnetics, Vol. 59, No. 11,
Nov. 2023.
[11] Katsikis, D., Muradova, A.D., Stavroulakis, G.E. "A Gentle Introduction to Physics-
Informed Neural Networks, with Applications in Static Rod and Beam Problems", Jour-
nal of Advances in Applied & Computational Mathematics, Vol. 9, pp. 103-128, May
2022.
[12] Hornik, K., Stinchcombe, M., White, H. "Multilayer feedforward networks are universal
approximators", Neural Networks, Vol. 2, No. 5, pp. 359-366, 1989.
[13] Baydin, A.G., Pearlmutter, B.A., Radul, A.A., Siskind, J.M. "Automatic Differentiation
in Machine Learning: a Survey", Journal of Machine Learning Research, Vol. 18, pp.
1-43, Apr 2018.
[14] Griewank, A., Walther, A. Evaluating Derivatives: Principles and Techniques of Algo-
rithmic Differentiation, Society for Industrial and Applied Mathematics, 2008.
[15] Kingma, D.P., Ba, J.L. "Adam: a method for stochastic optimization", International
Conference on Learning Representations, 2015.
[18] Kim, N.H. Introduction to Nonlinear Finite Element Analysis, Springer, 2015.
[19] "Solve Poisson Equation on Unit Disk Using Physics-Informed Neural Networks",
MathWorks Help Center,
<https://it.mathworks.com/help/pde/ug/solve-poisson-equation-on-unit-disk-using-pinn.html>
[30] "Electrical Transformers, What Transformers Are and How They Work", Maddox Trans-
former,
<https://www.maddoxtransformer.com/electrical-transformers>.
[31] Hernando, A., Crespo, P., Marín, P., González, A. "Magnetic Hysteresis", Encyclopedia
of Materials: Science and Technology, pp. 4780-4787, Elsevier Science Ltd., 2001.
[32] Wang, J., Lin, H., Fang, S., Huang, Y. "A General Analytical Model of Permanent Mag-
net Eddy Current Couplings", IEEE Transactions on Magnetics, Vol. 50, No. 1, Jan.
2014.
[34] Mattey, R., Ghosh, S. "A Physics Informed Neural Network for Time–Dependent Non-
linear and Higher Order Partial Differential Equations", Computer Methods in Applied
Mechanics and Engineering, Jun. 2021.