<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Finding Multiple Solutions of ODEs with Neural Networks</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute for Applied Computational Science, Harvard University</institution>
          ,
          <addr-line>Cambridge, MA</addr-line>
          ,
          <country country="US">United States</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Marco Di Giovanni</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Politecnico di Milano. Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB).</institution>
          <addr-line>Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Applications of neural networks to numerical problems have gained increasing interest. Among different tasks, finding solutions of ordinary differential equations (ODEs) is one of the most intriguing problems, as there may be advantages over well-established and robust classical approaches. In this paper, we propose an algorithm to find all possible solutions of an ordinary differential equation that has multiple solutions, using artificial neural networks. The key idea is the introduction of a new loss term that we call the interaction loss. The interaction loss prevents different solution families from coinciding. We carried out experiments with two nonlinear differential equations, each admitting more than one solution, to show the effectiveness of our algorithm, and we performed a sensitivity analysis to investigate the impact of different hyper-parameters.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>A wide variety of phenomena are governed by
mathematical laws that can be represented using ordinary differential
equations, ranging from the field of physics to chemistry to
finance.</p>
      <p>In the field of physics, examples include the Navier-Stokes equations, the Schrödinger equation, and many others. However, it is not always possible to solve these problems analytically, and numerical approaches are required when the equations are too complicated or the number of equations and variables is too large.</p>
      <p>Numerical approaches to solve differential equations have
been widely studied and improved almost since the first
differential equations were written down. Classical approaches
for spatial discretization of partial differential equations
(PDEs) include finite differences, finite elements, finite
volumes, and spectral methods. Classical methods for
discretizing ordinary differential equations (ODEs) include the
Runge-Kutta (RK) methods and the backward difference
formulas (BDF). Artificial neural networks (ANNs) have also been applied to this task. Lagaris, Likas, and Fotiadis(1998) listed a set of theoretical advantages of the ANN approach, suggesting that even if the results are not quite as good as those of classical approaches, there are good reasons to continue research in this direction.</p>
      <p>In fact, the solution obtained is differentiable with a closed analytic form, and this approach is easily parallelizable and general, since it can be applied to ODEs, systems of ODEs, and PDEs.</p>
      <p>Copyright © 2020, for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>In this paper we design an algorithm that applies this technique to a different problem: finding all possible solutions of an ordinary differential equation that admits non-unique solutions. The vast majority of analytical and numerical techniques have been developed and applied to differential equations admitting unique solutions. However, there are practical scientific problems that are modeled by nonlinear differential equations that do not have unique solutions. The Bratu equation, for example, has been used to model combustion phenomena as well as temperatures in the sun’s core. The Bratu equation does not have a known analytical solution in more than one spatial dimension. Moreover, classical numerical methods are not able to find all solution families (Karkowski 2013).</p>
      <p>After describing the background concepts and the
proposed algorithm, we list a set of simple problems where
the solutions are known in order to test the algorithm. The
results are presented including sensitivity analysis of the
hyper-parameters that need to be tuned.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Algorithms to solve differential equations using neural
networks were originally developed by Lagaris, Likas, and
Fotiadis(1998). The authors proposed a novel approach to
tackle the problem of numerically solving differential
equations using ANNs, listing some advantages of their
algorithm with respect to classical ones. Testing it with
ordinary differential equations (ODEs), partial differential
equations (PDEs) and systems of ODEs, they checked their
generalization abilities, obtaining surprisingly good results.
Finally, they extended it to problems with arbitrarily shaped
domains, in more than one dimension (Lagaris, Likas, and
Papageorgiou(1998)) and they applied it to quantum
mechanics (Lagaris, Likas, and Fotiadis(1997)). An interesting
survey is presented by Kumar and Yadav(2011). The
authors collect a large number of papers about solving
differential equations, focusing on both multi-layer
perceptrons and radial basis functions techniques, published
between 2001 and 2011. A similar approach, called the Deep
Galerkin Method (DGM), was proposed by Sirignano and
Spiliopoulos(2018). The authors present an algorithm
similar to Galerkin methods, but with neural networks instead of
basis functions. Raissi, Perdikaris, and Karniadakis(2019) formulate physics-informed neural networks, a framework developed to solve two main classes of problems: the data-driven solution of PDEs, also inspired by Lagaris, Likas, and Fotiadis(1998), and the data-driven discovery of PDEs, encoding in the neural networks the physical laws that govern a data set. In another work,
        <xref ref-type="bibr" rid="ref3">Baymani, Kerayechian, and
Effati(2010</xref>
        ) compare the performance of neural networks with
the errors using Aman-Kerayechian method, for obtaining
the solutions of Stokes Equation.
      </p>
      <p>As stated before, one of the theoretical advantages of using artificial neural networks to solve ODEs, with respect to classical methods, is that the latter are affected by the curse of dimensionality, while the NN approach scales better with the number of dimensions. E, Han, and Jentzen(2017) analyze this property, reformulating the PDEs using backward stochastic differential equations, and applying the method to solve the nonlinear Black-Scholes equation, the Hamilton-Jacobi-Bellman equation, and the Allen-Cahn equation.</p>
      <p>It is also essential that neural networks incorporate known
physical laws. Mattheakis et al.(2019) embed physical
symmetries into the structure of the neural network. The
symplectic neural network obtained is tested with a system of
energy-conserving differential equations and outperforms an
unsupervised, non-symplectic neural network.</p>
      <p>
        There is also a new trend investigating dynamical
systems, introducing new families of neural networks,
characterized by continuous-depth, whose output is computed
using black-box differential equation solvers. Their
application is different from this work, since they are mainly used
for classification tasks (He et al. 2016; Chen et al.
        <xref ref-type="bibr" rid="ref6">2018;
Zhu, Chang, and Fu 2018</xref>
        ).
      </p>
    </sec>
    <sec id="sec-3">
      <title>Background</title>
      <p>As described by Lagaris, Likas, and Fotiadis(1998), fully connected feed-forward neural networks can be used to compute solutions of differential equations, with some advantages and disadvantages with respect to classical approaches. In this section, we briefly review the background concepts.</p>
      <sec id="sec-3-1">
        <title>Differential equations</title>
        <sec id="sec-3-1-1">
          <title>General form</title>
          <p>
            We write a general ODE as
F(t, u(t), u′(t), u″(t), …) = 0,
(1)
where t is the independent variable and u(t) is the solution of the differential equation. We restrict the scope of the present paper to differential equations in a single variable and interpret t as time. Note that u′ represents the first derivative and u″ represents the second derivative. We remark that the differential equation (1) considered herein need not be separable with respect to u′(t), as required by classical Runge-Kutta methods. This is one of the main advantages of using neural networks to solve differential equations with respect to classical approaches. In order to determine u(t) from (1), one must specify either initial or boundary values. Boundary value problems (BVPs) impose the values of the function at the boundaries of the interval t ∈ [t0, tf], e.g. u0 = u(t0) and uf = u(tf). Initial value problems (IVPs), on the other hand, impose the values of the function and its derivatives at t0, e.g. u0 = u(t0), v0 = u′(t0). Both of these approaches can be implemented using neural networks with very little modification to the loss. For example, switching between an IVP and a BVP does not require any modification to the network architecture; one need only adjust the loss function to incorporate the appropriate initial or boundary conditions. Classical methods often require different algorithms for IVPs and BVPs and therefore may require additional code infrastructure. The situation may become more severe when considering PDEs. The neural network architecture will now take multiple inputs (e.g. space and time) and the loss function will incorporate the appropriate initial and boundary conditions. However, traditional PDE software codes may use multiple algorithms to solve the entire PDE (e.g. time integration in time and finite elements in space
            <xref ref-type="bibr" rid="ref1 ref5">(Seefeldt et al. 2017; Alnaes et al. 2015;
Burns et al. 2016)</xref>
            .
          </p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Solving ODEs with Neural networks</title>
        <p>Following the methodology explained in Lagaris, Likas, and Fotiadis(1998), we use a fully connected feed-forward neural network to solve ODEs.</p>
        <p>
          The key idea of this approach is that the neural network
itself represents the solution of the differential equation. Its
parameters are learnt in order to obtain a function that
minimizes a loss, forcing the differential equation to be solved
and its boundary values (or initial conditions) to be satisfied.
Firstly, the architecture of the neural network has to be
selected, setting the number of layers, the number of units per
layer and the activation functions. Fine-tuning these
parameters and testing more complex architectures can be done to
better approximate the loss functions. However, for the
purpose of this work, we fix the architecture of the neural
network selecting the one that we believe is suitable enough for
the differential equations selected, by manually inspecting
the validation loss. Architectures that are too small could be unable to model the target function, while bigger architectures could be harder to train. For ODEs, the neural network only has
a single input: the independent variable t at several discrete
points ti. The output of the neural network represents the
value of the function that we want to learn. Since we are only
focusing on scalar equations in the present work, the
number of units in the output layer should also be one. Automatic
differentiation
          <xref ref-type="bibr" rid="ref7">(Corliss et al. 2002)</xref>
          is used to compute exact
derivatives with respect not only to the parameters of the
network, but also to the inputs to the neural network (here t).
The training process of the neural network is accomplished by tuning the parameters of the neural network to minimize a loss function. In the current problem, the loss function is the squared residual of the differential operator acting on the predicted function, evaluated at the training points. That is,
L_ODE = Σ_i F(t_i, u(t_i), u′(t_i), …)².
(2)
We note that this method is related to the Galerkin Least-Squares method, where the residual is evaluated at discrete points (similar to a collocation method). A function that guarantees F(·) = 0 everywhere is said to satisfy the differential equation. In practice, the best case is that the neural network finds a function u(t) that minimizes (2) by obtaining values of the residual close to zero.
        </p>
        <p>To train the neural network, we pick a set of ntrain points
belonging to the domain in which we are interested [t0; tf ].
In this work we consider equally spaced points, although this
is not required. We perform a forward pass of the network
which provides the function approximation and calculate the
derivatives as required by the differential operator through
automatic differentiation. The forward pass is completed by
computing the loss via (2). The weights and biases of the network are updated with the gradient descent algorithm via application of the backpropagation algorithm. This process is repeated until convergence or until the maximum number of epochs is reached. The output of the process is the function that obtains the lowest loss on an evaluation set of nval points equally spaced in the same domain.</p>
        <p>A critical consideration is enforcing the specified initial and/or boundary conditions. To encode boundary values or initial conditions, we use the method described by Sirignano and Spiliopoulos(2018): we add a term to the loss function evaluating the squared distance between the actual value of the function at the boundaries and the imposed values. In the case of BVPs this additional loss term is</p>
        <p>L_BV = (u(t0) − u0)² + (u(tf) − uf)².
(3)
In the case of IVPs, the new loss term is</p>
        <p>L_IC = Σ_{n=0}^{nord−1} (u^(n)(t0) − v0^(n))²,
(4)
where nord is the order of the differential operator, u^(n) is the nth derivative of the function u and v0^(n) is the initial value corresponding to the nth derivative. For example, for a second order IVP, the loss would contain v0^(0) and v0^(1).</p>
        <p>The total loss is</p>
        <p>L = L_ODE + L_I,
(5)
where L_I is given by (3) or (4) depending on the context of the problem.</p>
        <p>In some applications, it may be necessary to introduce
a penalty parameter before LI to mitigate different
scalings between LODE and LI . In the present work, we fix the
penalty parameter to unity and note that this did not
interfere with the current results. It may be necessary to tune this
penalty parameter for more complicated problems.</p>
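        <p>As an illustration of the training process above, the following minimal sketch (our own, not the authors’ code; a hypothetical one-hidden-layer tanh architecture, with finite-difference parameter gradients standing in for backpropagation) trains a tiny network so that the residual loss (2) plus the initial-condition penalty decreases:</p>

```python
import numpy as np

# Minimal sketch (not the authors' code) of solving u'(t) = -u(t), u(0) = 1,
# with a one-hidden-layer tanh network. The derivative with respect to t is
# computed analytically for this tiny architecture; the paper uses automatic
# differentiation instead. Parameter gradients use central finite differences
# for brevity rather than backpropagation.

rng = np.random.default_rng(0)
H = 8                                          # hidden units
params = rng.normal(0.0, 0.5, size=3 * H + 1)  # w1, b1, w2, output bias

def unpack(p):
    return p[:H], p[H:2 * H], p[2 * H:3 * H], p[3 * H]

def u(p, t):
    w1, b1, w2, b2 = unpack(p)
    return np.tanh(np.outer(t, w1) + b1) @ w2 + b2

def du_dt(p, t):
    w1, b1, w2, _ = unpack(p)
    return (np.cosh(np.outer(t, w1) + b1) ** -2 * w1) @ w2  # d/dx tanh = sech^2

def loss(p, t):
    l_ode = np.mean((du_dt(p, t) + u(p, t)) ** 2)      # squared residual, as in (2)
    l_ic = (u(p, np.array([0.0]))[0] - 1.0) ** 2       # initial-condition term
    return l_ode + l_ic                                # total loss, as in (5)

def grad(p, t, eps=1e-6):
    g = np.zeros_like(p)
    for i in range(p.size):
        d = np.zeros_like(p)
        d[i] = eps
        g[i] = (loss(p + d, t) - loss(p - d, t)) / (2 * eps)
    return g

t_train = np.linspace(0.0, 2.0, 32)   # equally spaced training points
loss_before = loss(params, t_train)
for _ in range(300):                  # plain gradient descent
    params -= 0.02 * grad(params, t_train)
loss_after = loss(params, t_train)
print(loss_after < loss_before)       # training should reduce the loss
```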
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Equation library</title>
      <p>The proposed approach is tested on two nonlinear differential equations. Each of these equations is known to admit multiple families of solutions. For later reference, the two forms of the Clairaut equation considered below are
f(u′(t)) = 1/u′(t),
(9)
with the two solutions
u1(t) = 2√t,
(11a)
u2(t) = t + 1,
(11b)
and
f(u′(t)) = u′(t)³,
(12)
with the three solutions
u1(t) = −t − 1,
(14a)
u2(t) = 2t + 8,
(14b)
u3(t) = 2(−t/3)^(3/2).
(14c)</p>
      <sec id="sec-4-1">
        <title>1D Bratu problem</title>
        <p>
          The one-dimensional Bratu problem is
u″(t) + C e^u(t) = 0, u(0) = u(1) = 0.
(15)
This ODE has two solutions if C &lt; C_crit and only one solution if C ≥ C_crit, where C_crit ≈ 3.51. We select C = 1, in the regime where two functions solve the problem
          <xref ref-type="bibr" rid="ref4">(Bratu
1914)</xref>
          . The exact analytical solution is given below.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Clairaut equation</title>
        <p>The general Clairaut equation is
u(t) = t u′(t) + f(u′(t)), u(t0) = u0,
(6)
where f is a known nonlinear function (Ince 1956). There are two families of solutions to (6). The first is a family of lines of the form
u(t) = Ct + f(C),
(7)
where C is determined by the initial conditions. The second family of solutions can be represented in parametric form as
t(z) = −f′(z),
(8a)
u(z) = −z f′(z) + f(z).
(8b)</p>
        <p>In this work, we consider two forms for the function f. The first form is f(u′(t)) = 1/u′(t) (9), which can be written in the form of (1) as
F(t, u(t), u′(t)) = t u′(t)² − u(t) u′(t) + 1 = 0.
(10)
With u(1) = 2, the two separate solutions are u1(t) = 2√t and u2(t) = t + 1, as given in (11a) and (11b).</p>
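        <p>As a quick numerical sanity check (ours, not part of the paper), both claimed solutions can be verified to drive the residual in (10) to zero:</p>

```python
import numpy as np

# Verify that u1(t) = 2*sqrt(t) and u2(t) = t + 1 both satisfy
# F(t, u, u') = t*u'^2 - u*u' + 1 = 0 on [1, 5].
t = np.linspace(1.0, 5.0, 100)
u1, du1 = 2.0 * np.sqrt(t), 1.0 / np.sqrt(t)   # u1 and its derivative
u2, du2 = t + 1.0, np.ones_like(t)             # u2 and its derivative
r1 = t * du1 ** 2 - u1 * du1 + 1.0
r2 = t * du2 ** 2 - u2 * du2 + 1.0
print(np.allclose(r1, 0.0) and np.allclose(r2, 0.0))  # True
```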
        <sec id="sec-4-2-1">
          <title>Second form</title>
          <p>For the second problem we select f(u′(t)) = u′(t)³ (12), which can be written as
F(t, u(t), u′(t)) = t u′(t) − u(t) + u′(t)³ = 0.
(13)
This problem has three distinct solutions. With u(−3) = 2, the three solutions are given in (14a)-(14c).</p>
          <p>The exact analytical solution of the Bratu problem (15) is
u(t) = 2 ln[cosh(α) / cosh(α(1 − 2t))],
(16)
where α satisfies
cosh(α) = 4α / √(2C).
(17)
For C = 1, equation (17) has two solutions given by α1 ≈ 0.379 and α2 ≈ 2.73, which ultimately results in two solutions for (16).</p>
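          <p>As a numerical cross-check (ours), the two roots of the condition cosh(α) = 4α/√(2C) for C = 1 can be located by bisection and agree with the quoted values:</p>

```python
import math

# Locate the two roots of cosh(a) = 4*a/sqrt(2*C) for C = 1 by bisection.
# The quoted values are approximately 0.379 and 2.73.
def f(a):
    return math.cosh(a) - 4.0 * a / math.sqrt(2.0)

def bisect(lo, hi, steps=100):
    # f changes sign on [lo, hi]; halve the bracket repeatedly
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

a1 = bisect(0.1, 1.0)   # sign change on [0.1, 1.0]
a2 = bisect(2.0, 3.5)   # sign change on [2.0, 3.5]
print(round(a1, 3), round(a2, 2))  # 0.379 2.73
```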
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Methodology</title>
      <p>In this section, we describe an algorithm that generalizes the method described in the section “Solving ODEs with Neural networks” to find all families of solutions to a given differential equation.</p>
      <p>To find the whole set of solutions of an ODE, we
represent each of the N solutions with a different neural network
ul(t), l = 1; :::; N . Each function ul(t) is modelled through
a neural network with the same architecture, the same
number of layers and the same number of units, but weights are
not shared.</p>
      <p>The training is done simultaneously, minimizing a loss function
L = Σ_{j=1}^{N} ℓ_j,
(18)
where
ℓ_j = L_ODE,j + L_I,j,
(19)
as in equation (5). Since the neural networks are not “interacting”, each will learn one of the possible solutions, but there is no requirement that the learnt solutions be different. To overcome this shortcoming, we define an interaction function g(u1, u2) that detects whether two predicted functions are close. Given two functions u1(t) and u2(t), the value of g(u1, u2) needs to be high if they are similar, while, if they are different, its value should be low. We propose an interaction function of the form
g_p(u1, u2) = Σ_{i=1}^{ntrain} 1 / |u1(t_i) − u2(t_i)|^p,
(20)
where p is a hyperparameter that controls the separation between the two functions. We propose a new loss function that incorporates this interaction via
L_tot = L + ε Σ_{i≠j} g_p(u_i, u_j),
(21)
where L is given by (18) and ε is a hyperparameter setting the importance of the interaction.</p>
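      <p>A small sketch of this interaction term, assuming an inverse-power form gp(u1, u2) = Σi 1/|u1(ti) − u2(ti)|^p, shows that nearly coincident predictions score much higher than two genuinely distinct solutions:</p>

```python
import numpy as np

# Sketch of the interaction term, assuming the inverse-power form
# g_p(u1, u2) = sum_i 1/|u1(t_i) - u2(t_i)|^p: nearly coincident
# functions yield a large value, well-separated ones a small value.
def g_p(u1, u2, p=5, tiny=1e-12):
    return np.sum(1.0 / (np.abs(u1 - u2) ** p + tiny))

t = np.linspace(1.5, 5.0, 50)           # away from t = 1, where the solutions meet
g_close = g_p(t + 1.0, t + 1.001)       # two nearly identical predictions
g_far = g_p(t + 1.0, 2.0 * np.sqrt(t))  # two genuinely distinct Clairaut solutions
print(g_close > g_far)  # True
```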
      <p>Training the neural networks with the loss defined in (21) could lead to bad results near the boundaries or initial points. This is because the different solution families must be close to each other in those regions, but (21) insists that they should be different. To overcome this problem, we use the modified loss in equation (21) only for an interval of epochs [ni, nf], where ni &gt; 0 and nf &lt; ntot.</p>
      <p>To improve the performance of the algorithm, the interaction term can be scaled by a factor dependent on the distance of the point from the boundaries. For example, if we are dealing with an IVP, the scaling factor will be (t − t0)/(tf − t0), so that the interaction is low near t0, while it is maximum near tf. For BVPs, the procedure is similar, with low values at the borders and high values in the middle of the interval.</p>
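      <p>This scaling can be sketched as follows; the IVP weight is the ramp described above, while the BVP window is an assumed illustrative choice (the paper only requires low values at the borders and high values in the middle):</p>

```python
import numpy as np

# Boundary-aware scaling of the interaction term. For an IVP the scaling
# is (t - t0)/(tf - t0); for a BVP the paper only states "low at the
# borders, high in the middle", so the parabolic window below is an
# assumed illustrative choice, not the authors' formula.
t0, tf = 0.0, 1.0
t = np.linspace(t0, tf, 5)
w_ivp = (t - t0) / (tf - t0)                       # 0 at t0, 1 at tf
w_bvp = 4.0 * (t - t0) * (tf - t) / (tf - t0)**2   # 0 at both ends, 1 in the middle
print(w_ivp[0], w_ivp[-1], w_bvp[0], w_bvp[2], w_bvp[-1])  # 0.0 1.0 0.0 1.0 0.0
```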
      <p>The basic training process is as follows. First, N neural
networks are randomly initialized. The first part of the
training proceeds for ni epochs using the loss in (18). The
networks could converge to the same minima or to different
ones. After ni epochs, the new loss in (21) replaces the loss
in (18) and the networks begin to interact through the
interaction term. The networks gradually try to converge to
solutions of the differential equation while moving away from
each other. After nf total epochs, the interaction term is
removed and the original loss in (18) is used to help each
network converge to the real minima. If the second phase of
the training separated the functions enough to be near
different minima, then they will converge to different solutions. A
variant to this approach is to fix one or more of the neural
networks during the second part of the training, so that the
effect of the interaction is to move the remaining solutions
away from the others. For example, if there are two distinct
solutions, one will be fixed near a real minimum, while the
other will be pushed away. If the number of solutions of the
differential equation is lower than the number of neural
networks, at least two of them will eventually converge to the
same solution.</p>
      <p>To better explore the space of solutions, we can gradually increase the strength ε of the interaction. During the second part of the training, the differential equation term is then gradually neglected and the functions tend to become more distinct.</p>
      <p>Data: ε0; εM; lossM; Dm; k
N ← 1; ε ← ε0; NNold ← None
while ε &lt; εM do
    NN ← createNN(N);
    loss, D ← train(NN);
    if ∃ i : loss_i &gt; lossM then
        return NNold;
    end
    if ∃ i, j : Dij &lt; Dm then
        ε ← ε · k;
    else
        NNold ← NN;
        N ← N + 1;
    end
end
return NNold;</p>
      <p>Algorithm 1: Complete algorithm</p>
      <p>Algorithm 1 depicts the method in detail. We initialize the number of neural networks to 1 and the interaction parameter ε to ε0. We train the networks and compute the loss and the pairwise distances. The pairwise distances Dij are calculated using (22),
Dij = (1/ntrain) Σ_{l=1}^{ntrain} |u_i(t_l) − u_j(t_l)|.
(22)</p>
      <p>Then, we check that every final loss is lower than a threshold lossM, in order to determine whether the algorithm converged. If not, the training process must be improved (for example by using a better optimization technique, increasing the number of epochs or widening the architecture).</p>
      <p>If the final losses are all lower than the threshold, we check the pairwise distances to determine whether the solutions obtained are the same or not. If at least one pair of solutions has a distance lower than a fixed threshold Dm, then those functions converged to the same minimum, so we increase the strength of the interaction by a factor k and perform the training again. Otherwise, we were able to find at least N distinct solutions. We then increase the number of networks by 1 and rerun the algorithm to check whether there are more than N possible solutions. This procedure allows us to calculate all the solutions without knowing their number a priori. To speed up the training, we can use the neural networks already learned as the initialization of the new ones, so that only one network has to be learned from scratch each time we increase N.</p>
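      <p>The outer loop of Algorithm 1 can be sketched as follows, with a stubbed train() function (our own mock, standing in for the full neural-network training) that imitates a problem admitting exactly two solutions:</p>

```python
# Hedged sketch of Algorithm 1's outer loop. train() is a stub standing in
# for the full NN training; it mimics a problem with exactly two solutions
# that only separate once the interaction strength eps is large enough.
def find_all_solutions(train, eps0=0.1, eps_max=10.0, loss_max=1e-4,
                       d_min=0.1, k=2.0):
    n, eps, found = 1, eps0, None
    while eps < eps_max:
        losses, dists = train(n, eps)      # per-network losses, pairwise distances
        if any(l > loss_max for l in losses):
            return found                   # training failed to converge
        if any(d < d_min for d in dists):
            eps *= k                       # solutions collapsed: strengthen interaction
        else:
            found = n                      # n distinct solutions found; try n + 1
            n += 1
    return found

def fake_train(n, eps):
    losses = [1e-5] * n
    if n == 1:
        dists = []                         # nothing to compare yet
    elif n == 2 and eps >= 0.4:
        dists = [0.6]                      # the two solutions separated
    else:
        dists = [0.01] * (n * (n - 1) // 2)  # at least one pair coincides
    return losses, dists

print(find_all_solutions(fake_train))  # 2
```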
    </sec>
    <sec id="sec-6">
      <title>Results and discussion</title>
      <p>In this section the results are presented, followed by a
sensitivity analysis of the hyper-parameters selected.</p>
      <sec id="sec-6-1">
        <title>Neural network architecture</title>
        <p>All results were obtained using the same network architecture. The network consists of two fully-connected layers with 8 units per layer and sigmoid activation functions. The architecture also uses a skip connection, which we observed helps with learning linear functions. The network is trained using the Adam optimizer with learning rate 10⁻², β1 = 0.99 and β2 = 0.999.</p>
      </sec>
      <sec id="sec-6-2">
        <title>Interaction function analysis</title>
        <p>We analyzed different shapes of interaction functions. The results suggest that the interaction function in equation (20) represents our idea of interaction fairly well and can be easily tuned. Figure 1 shows the shape of the function as p changes, while figure 2 shows the interaction functions for different levels of strength ε. While ε increases during the training, we fix the value of p = 5.
The proposed approach is able to successfully find the distinct solutions corresponding to the analytical ones (equations (11), (14), and (16)), up to a threshold of 10⁻⁵, calculated as the mean squared error over 10000 test points equally spaced in the domains. In figures 3, 4, and 5, the solutions of the three problems are plotted. The real solutions and the ones obtained with the new algorithm are in very good agreement; their difference is not visible.
Figures 6 and 7 show how the solutions evolve in the different training phases on the Clairaut equation, when the loss function changes from equation (18) to equation (21). Figure 6 shows the functions learned after ni = 1000 epochs. They have not yet converged, and we note that they are converging to the same minimum. Figure 7 shows the functions learned nf = 1000 epochs after the first training phase. The functions learned do not yet satisfy the differential equation, but are distinct enough to converge to the different minima that represent the two solutions of the differential equation. The proposed algorithm does not guarantee that the solutions will be far enough apart to reach different minima, if they exist, but it increases the chances of not converging to the same one. After the third training phase, the solutions converged (see figure 3).</p>
        <p>In figure 8, we plot an example of the loss function for the Clairaut problem (problem 1). The blue line represents the loss in equation (18), while the green one is the total loss (equation (21)). We notice that they coincide before ni = 1000 and after ni + nf = 2000, while they differ elsewhere, where the algorithm minimizes the loss with the interaction term included.</p>
      </sec>
      <sec id="sec-6-3">
        <title>Selection of Hyperparameters</title>
        <p>The algorithm described above (algorithm 1) requires as
inputs a set of hyper-parameters, that we describe and analyze
in this section:</p>
        <p>ε0 is the initial strength of the interaction. This is not a sensitive parameter. If not properly tuned, it will only slow down the algorithm without any severe impact on the accuracy of the solutions. The only caution needed is to select it small enough to make the algorithm converge; otherwise the second training phase could move the solutions so far from the minima that they are unable to properly converge back to the real solutions. We set this value to 10⁻¹.</p>
        <p>k is the strength increase and is usually set to 2 in order to double the strength of the interaction at every iteration.</p>
        <p>εM is the maximum strength to try, usually set as a stopping condition. We should be careful to set it not too small, since the second training phase should separate the solutions enough to make them converge to different minima, but not too high, in order not to spend too much time in the training phase (the higher εM, the higher the number of iterations of the loop). We set this value to 10.</p>
        <p>lossM is the maximum value of the loss for which we consider the algorithm converged (i.e. the functions learned satisfy the differential equation). We set it to 10⁻⁴.</p>
        <p>Dm is the most important hyper-parameter to set, since it defines whether two learned functions are the same or not. If this value is too low, then two functions will be considered different even if they are actually the same, while if it is set too high, then two real solutions too close to each other will be viewed as the same, leading to an error. This parameter is analyzed in more detail in the following section.</p>
        <p>Analysis of Dm. In figure 9, we show how a wrong selection of Dm leads to wrong results. If we set Dm = 10⁻², a value that is too low, the algorithm finds more solutions than the real ones. All the functions shown in the figure have loss less than lossM and are therefore considered solutions of the differential equation. However, they are more than Dm distant from each other, so they are not considered the same solution.</p>
        <p>If we pick Dm too high, distinct solutions could be considered the same and the algorithm will not be able to detect both of them. For example, since for the Clairaut equation (problem 1) the average distance of the two solutions is D ≈ 0.61 in the range [1, 5], setting Dm ≥ D will lead to wrong results.</p>
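        <p>The quoted value D ≈ 0.61 can be reproduced directly from the pairwise distance (22) applied to the two solutions u1(t) = 2√t and u2(t) = t + 1:</p>

```python
import numpy as np

# Reproduce the quoted average distance D between the two Clairaut
# solutions u1(t) = 2*sqrt(t) and u2(t) = t + 1 on [1, 5], using the
# pairwise distance of equation (22).
t = np.linspace(1.0, 5.0, 10001)
D = np.mean(np.abs(2.0 * np.sqrt(t) - (t + 1.0)))
print(round(D, 2))  # 0.61
```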
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Conclusion</title>
      <p>In this paper, we designed a novel algorithm to find all possible solutions of a differential equation with non-unique solutions using neural networks. We tested its accuracy by applying it to three simple nonlinear differential problems that are known to admit more than one solution. The method successfully finds the solutions with high accuracy. We tested different interaction functions and hyper-parameters, and we verified the robustness of the approach. We expect this algorithm to also work for more challenging ODEs, if an accurate selection of hyper-parameters is performed. Future work will involve not only the application and analysis of the proposed algorithm to higher-dimensional ODEs and systems of ODEs, but also the exploitation of more appropriate architectures such as dynamical systems like ResNets (He et al. 2016).</p>
      <p>E, W.; Han, J.; and Jentzen, A. 2017. Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Communications in Mathematics and Statistics 5(4):349–380.</p>
      <p>He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).</p>
      <p>Ince, E. 1956. Ordinary Differential Equations. Dover Books on
Mathematics. Dover Publications.</p>
      <p>Karkowski, J. 2013. Numerical experiments with the bratu
equation in one, two and three dimensions. Computational and Applied
Mathematics 32(2):231–244.</p>
      <p>Kumar, M., and Yadav, N. 2011. Multilayer perceptrons and
radial basis function neural network methods for the solution of
differential equations: A survey. Computers and Mathematics with
Applications 62(10):3796 – 3811.</p>
      <p>Lagaris, I.; Likas, A.; and Fotiadis, D. 1997. Artificial neural
network methods in quantum mechanics. Computer Physics
Communications 104(1):1 – 14.</p>
      <p>Lagaris, I. E.; Likas, A.; and Fotiadis, D. I. 1998. Artificial
neural networks for solving ordinary and partial differential equations.
Trans. Neur. Netw. 9(5):987–1000.</p>
      <p>Lagaris, I. E.; Likas, A.; and Papageorgiou, D. G. 1998. Neural
network methods for boundary value problems defined in
arbitrarily shaped domains. CoRR cs.NE/9812003.</p>
      <p>Mattheakis, M.; Protopapas, P.; Sondak, D.; Giovanni, M. D.; and
Kaxiras, E. 2019. Physical symmetries embedded in neural
networks.</p>
      <p>Raissi, M.; Perdikaris, P.; and Karniadakis, G. 2019.
Physics-informed neural networks: A deep learning framework for solving
forward and inverse problems involving nonlinear partial
differential equations. Journal of Computational Physics 378:686–707.</p>
      <p>Seefeldt, B.; Sondak, D.; Hensinger, D. M.; Phipps, E. T.; Foucar,
J. G.; Pawlowski, R. P.; Cyr, E. C.; Shadid, J. N.; Smith, T. M.;
Weber, P. D.; et al. 2017. Drekar v. 2.0. Technical report, Sandia
National Lab. (SNL-NM), Albuquerque, NM (United States).</p>
      <p>Sirignano, J., and Spiliopoulos, K. 2018. DGM: A deep
learning algorithm for solving partial differential equations. Journal of
Computational Physics 375:1339–1364.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Alnaes</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Blechta</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Hake</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Johansson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Kehlet</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Logg</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Richardson</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ring</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Rognes</surname>
            ,
            <given-names>M. E.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Wells</surname>
            ,
            <given-names>G. N.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>The FEniCS project version 1.5</article-title>
          .
          <source>Archive of Numerical Software</source>
          <volume>3</volume>
          (
          <issue>100</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Baymani</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Kerayechian</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Effati</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>Artificial neural networks approach for solving Stokes problem</article-title>
          .
          <source>Applied Mathematics</source>
          <volume>1</volume>
          :
          <fpage>288</fpage>
          -
          <lpage>292</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Bratu</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <year>1914</year>
          .
          <article-title>Sur les équations intégrales non linéaires</article-title>
          .
          <source>Bulletin de la Société Mathématique de France</source>
          <volume>42</volume>
          :
          <fpage>113</fpage>
          -
          <lpage>142</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Burns</surname>
            ,
            <given-names>K. J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Vasil</surname>
            ,
            <given-names>G. M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Oishi</surname>
            ,
            <given-names>J. S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Lecoanet</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Brown</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Dedalus: Flexible framework for spectrally solving differential equations</article-title>
          .
          <source>Astrophysics Source Code Library.</source>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          2018.
          <article-title>Neural ordinary differential equations</article-title>
          . In Bengio, S.;
          <string-name>
            <surname>Wallach</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Larochelle</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Grauman</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Cesa-Bianchi</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ; and Garnett, R., eds.,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>31</volume>
          . Curran Associates, Inc.
          <fpage>6571</fpage>
          -
          <lpage>6583</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Corliss</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Faure</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Griewank</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Hascoet</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Naumann</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          <year>2002</year>
          .
          <article-title>Automatic differentiation of algorithms: from simulation to optimization</article-title>
          , volume
          <volume>1</volume>
          . Springer Science &amp; Business Media.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Fu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Convolutional neural networks combined with Runge-Kutta methods</article-title>
          . arXiv preprint arXiv:1802.08831.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>