                     Finding Multiple Solutions of ODEs with Neural Networks


                  Marco Di Giovanni¹, David Sondak², Pavlos Protopapas², Marco Brambilla¹

        ¹ Politecnico di Milano, Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB), Milan, Italy
        ² Institute for Applied Computational Science, Harvard University, Cambridge, MA, United States




                          Abstract

Applications of neural networks to numerical problems have gained increasing interest. Among the different tasks, finding solutions of ordinary differential equations (ODEs) is one of the most intriguing problems, as there may be advantages over well-established and robust classical approaches. In this paper, we propose an algorithm to find all possible solutions of an ordinary differential equation that has multiple solutions, using artificial neural networks. The key idea is the introduction of a new loss term that we call the interaction loss. The interaction loss prevents different solution families from coinciding. We carried out experiments with two nonlinear differential equations, each admitting more than one solution, to show the effectiveness of our algorithm, and we performed a sensitivity analysis to investigate the impact of the different hyper-parameters.

Copyright © 2020, for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

                          Introduction

A wide variety of phenomena are governed by mathematical laws that can be represented using ordinary differential equations, in fields ranging from physics to chemistry to finance.
   In physics, examples include the Navier-Stokes equations, the Schrödinger equation, and many others. However, it is not always possible to solve these problems analytically, and numerical approaches are sometimes required because the equations are too complicated or the number of equations and variables is too large.
   Numerical approaches to solving differential equations have been widely studied and improved almost since the first differential equations were written down. Classical approaches for the spatial discretization of partial differential equations (PDEs) include finite differences, finite elements, finite volumes, and spectral methods. Classical methods for discretizing ordinary differential equations (ODEs) include the Runge-Kutta (RK) methods and the backward differentiation formulas (BDF). Artificial neural networks (ANNs) have also been applied to this task. Lagaris, Likas, and Fotiadis (1998) listed a set of theoretical advantages of the ANN approach, suggesting that even if the results are not quite as good as those of the classical approaches, there are good reasons to continue research in this direction. In fact, the solution obtained is differentiable and has a closed analytic form, and the approach is easily parallelizable and general, since it can be applied to ODEs, systems of ODEs, and PDEs.
   In this paper we design an algorithm that applies this technique to a different problem: finding all possible solutions of an ordinary differential equation that admits non-unique solutions. The vast majority of analytical and numerical techniques have been developed for, and applied to, differential equations admitting unique solutions. However, there are practical scientific problems that are modeled by nonlinear differential equations that do not have unique solutions. The Bratu equation, for example, has been used to model combustion phenomena as well as temperatures in the sun's core. The Bratu equation does not have a known analytical solution in more than one spatial dimension, and classical numerical methods are not able to find all solution families (Karkowski 2013).
   After describing the background concepts and the proposed algorithm, we list a set of simple problems with known solutions on which we test the algorithm. The results are presented, including a sensitivity analysis of the hyper-parameters that need to be tuned.

                          Related Work

Algorithms to solve differential equations using neural networks were originally developed by Lagaris, Likas, and Fotiadis (1998). The authors proposed a novel approach to the problem of numerically solving differential equations using ANNs, listing several advantages of their algorithm with respect to classical ones. Testing it on ordinary differential equations (ODEs), partial differential equations (PDEs), and systems of ODEs, they checked its generalization abilities and obtained surprisingly good results. They later extended the method to problems with arbitrarily shaped domains in more than one dimension (Lagaris, Likas, and Papageorgiou 1998) and applied it to quantum mechanics (Lagaris, Likas, and Fotiadis 1997). An interesting survey is presented by Kumar and Yadav (2011). The authors collect a large number of papers on solving
differential equations, focusing on both multi-layer perceptron and radial basis function techniques, published between 2001 and 2011. A similar approach, called the Deep Galerkin Method (DGM), was proposed by Sirignano and Spiliopoulos (2018). The authors present an algorithm similar to Galerkin methods, but with neural networks in place of basis functions. Raissi, Perdikaris, and Karniadakis (2019) formulate physics-informed neural networks, a framework developed to solve two main classes of problems: the data-driven solution of PDEs, also inspired by Lagaris, Likas, and Fotiadis (1998), and the data-driven discovery of PDEs, encoding into the neural networks the physical laws that govern a data set. In another work, Baymani, Kerayechian, and Effati (2010) compare the errors of neural network solutions of the Stokes equation with those obtained using the Aman-Kerayechian method.
   As stated before, one of the theoretical advantages of using artificial neural networks to solve ODEs, with respect to classical methods, is that the latter are affected by the curse of dimensionality, while the NN approach scales better with the number of dimensions. E, Han, and Jentzen (2017) analyze this property, reformulating the PDEs using backward stochastic differential equations and applying the method to the nonlinear Black-Scholes equation, the Hamilton-Jacobi-Bellman equation, and the Allen-Cahn equation.
   It is also important that neural networks incorporate known physical laws. Mattheakis et al. (2019) embed physical symmetries into the structure of the neural network. The resulting symplectic neural network is tested on a system of energy-conserving differential equations and outperforms an unsupervised, non-symplectic neural network.
   There is also a recent trend of investigating dynamical systems, introducing new families of neural networks characterized by continuous depth, whose output is computed using black-box differential equation solvers. Their application is different from this work, since they are mainly used for classification tasks (He et al. 2016; Chen et al. 2018; Zhu, Chang, and Fu 2018).

                          Background

As described by Lagaris, Likas, and Fotiadis (1998), fully connected feed-forward neural networks can be used to compute solutions of differential equations, with some advantages and disadvantages with respect to classical approaches. In this section, we briefly review the background concepts.

Differential equations

We write a general ODE as

        F(t, u(t), u'(t), u''(t), ...) = 0                (1)

where t is the independent variable and u(t) is the solution of the differential equation. We restrict the scope of the present paper to differential equations of a single variable and interpret t as time. Note that u' represents the first derivative and u'' the second derivative. We remark that the differential equation considered herein, (1), need not be solvable explicitly for u'(t), as required in classical Runge-Kutta methods. This is one of the main advantages of using neural networks to solve differential equations with respect to classical approaches. In order to determine u(t) from (1), one must specify either initial or boundary values. Boundary value problems (BVPs) impose the values of the function at the boundaries of the interval t ∈ [t0, tf], e.g. u0 = u(t0) and uf = u(tf). Initial value problems (IVPs), on the other hand, impose the value of the function and its derivatives at t0, e.g. u0 = u(t0), v0 = u'(t0). Both of these approaches can be implemented using neural networks with very little modification to the loss. For example, switching between an IVP and a BVP does not require any modification to the network architecture; one only needs to adjust the loss function to incorporate the appropriate initial or boundary conditions. Classical methods often require different algorithms for IVPs and BVPs and therefore may require additional code infrastructure. The situation may become more severe when considering PDEs. The neural network architecture then takes multiple inputs (e.g. space and time) and the loss function incorporates the appropriate initial and boundary conditions, whereas traditional PDE software may use multiple algorithms to solve the same PDE (e.g. time integration in time and finite elements in space) (Seefeldt et al. 2017; Alnæs et al. 2015; Burns et al. 2016).

Solving ODEs with Neural networks

Following the methodology explained in Lagaris, Likas, and Fotiadis (1998), we use a fully connected feed-forward neural network to solve ODEs.
   The key idea of this approach is that the neural network itself represents the solution of the differential equation. Its parameters are learned so as to obtain a function that minimizes a loss forcing the differential equation to be satisfied along with its boundary values (or initial conditions). First, the architecture of the neural network has to be selected: the number of layers, the number of units per layer, and the activation functions. Fine-tuning these parameters and testing more complex architectures could further improve the approximation; however, for the purpose of this work, we fix the architecture of the neural network, selecting one that we believe is adequate for the differential equations considered by manually inspecting the validation loss. Architectures that are too small may be unable to model the target function, while larger architectures can be harder to train. For ODEs, the neural network has a single input: the independent variable t, evaluated at several discrete points ti. The output of the neural network represents the value of the function that we want to learn. Since we only consider scalar equations in the present work, the number of units in the output layer is also one. Automatic differentiation (Corliss et al. 2002) is used to compute exact derivatives not only with respect to the parameters of the network, but also with respect to its input (here t).
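   For illustration, the following is a minimal PyTorch sketch (ours, not the authors' code) of a trial network u(t) and of the computation of u' and u'' with respect to the input t via automatic differentiation; the network size and variable names are placeholders.

    import torch
    import torch.nn as nn

    # Small fully connected trial network: input t, output u(t).
    u_net = nn.Sequential(nn.Linear(1, 8), nn.Sigmoid(),
                          nn.Linear(8, 8), nn.Sigmoid(),
                          nn.Linear(8, 1))

    # Discrete points t_i in the domain, marked as differentiable inputs.
    t = torch.linspace(0.0, 1.0, 100).reshape(-1, 1).requires_grad_(True)
    u = u_net(t)

    # First and second derivatives of the network output with respect to t.
    du = torch.autograd.grad(u, t, torch.ones_like(u), create_graph=True)[0]
    d2u = torch.autograd.grad(du, t, torch.ones_like(du), create_graph=True)[0]

   The create_graph=True flag keeps the derivatives themselves differentiable, so they can appear inside a loss that is later backpropagated.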
The training process of the neural network is accomplished by tuning its parameters to minimize a loss function. In the current problem, the loss function is simply the squared residual of the differential operator applied to the predicted function. That is,

        L_ODE = Σ_i F(ti, u(ti), u'(ti), u''(ti), ...)^2.                (2)

We note that this method is related to the Galerkin least-squares method, with the residual evaluated at discrete points (similar to a collocation method). A function for which F(·) = 0 everywhere is said to satisfy the differential equation. In practice, the best case is that the neural network finds a function u*(t) that minimizes (2) by obtaining values of the residual close to zero.
   To train the neural network, we pick a set of ntrain points belonging to the domain of interest, [t0, tf]. In this work we consider equally spaced points, although this is not required. We perform a forward pass of the network, which provides the function approximation, and compute the derivatives required by the differential operator through automatic differentiation. The forward pass is completed by computing the loss via (2). The weights and biases of the network are then updated with the gradient descent algorithm via backpropagation. This process is repeated until convergence or until the maximum number of epochs is reached. The output of the process is the function that attains the lowest loss on an evaluation set of nval equally spaced points in the same domain.
   A critical consideration is how to realize the specified initial and/or boundary conditions. To encode boundary values or initial conditions, we use the method described by Sirignano and Spiliopoulos (2018): we add a term to the loss function measuring the squared distance between the value of the function at the boundaries and the imposed values. In the case of BVPs this additional loss is

        L_BV = (u(t0) − u0)^2 + (u(tf) − uf)^2.                (3)

In the case of IVPs, the additional loss is

        L_IC = Σ_{n=0}^{nord−1} (u^(n)(t0) − v0^(n))^2                (4)

where nord is the order of the differential operator, u^(n) is the nth derivative of the function u, and v0^(n) is the initial value corresponding to the nth derivative. For example, for a second-order IVP, the loss would contain v0^(0) and v0^(1).
   The total loss is

        L = L_ODE + L_I                (5)

where L_I is given by (3) or (4), depending on the type of problem.
   In some applications, it may be necessary to introduce a penalty parameter in front of L_I to mitigate different scalings between L_ODE and L_I. In the present work, we fix the penalty parameter to unity and note that this did not interfere with the current results. It may be necessary to tune this penalty parameter for more complicated problems.
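   As a concrete illustration of how (2)-(5) are assembled, the sketch below is our own reconstruction, not the authors' code: the residual function is a placeholder (here instantiated with the Bratu operator of equation (15) with C = 1), and the boundary handling assumes the grid includes t0 and tf.

    import torch
    import torch.nn as nn

    u_net = nn.Sequential(nn.Linear(1, 8), nn.Sigmoid(), nn.Linear(8, 1))

    def residual(t, u, du, d2u):
        # Placeholder for F(t, u, u', u'', ...); here the Bratu operator (15), C = 1.
        return d2u + torch.exp(u)

    t0, tf, u0, uf = 0.0, 1.0, 0.0, 0.0     # boundary conditions u(t0) = u0, u(tf) = uf
    t = torch.linspace(t0, tf, 100).reshape(-1, 1).requires_grad_(True)

    u = u_net(t)
    du = torch.autograd.grad(u, t, torch.ones_like(u), create_graph=True)[0]
    d2u = torch.autograd.grad(du, t, torch.ones_like(du), create_graph=True)[0]

    loss_ode = (residual(t, u, du, d2u) ** 2).sum()          # equation (2)
    loss_bv = (u[0, 0] - u0) ** 2 + (u[-1, 0] - uf) ** 2     # equation (3)
    loss = loss_ode + loss_bv                                # equation (5)
    loss.backward()                                          # gradients for one optimizer step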
                          Equation library

The proposed approach is tested on two nonlinear differential equations. Each of these equations is known to admit multiple families of solutions.

Clairaut equation   The general Clairaut equation is

        u(t) = t u'(t) + f(u'(t)),    u(0) = u0                (6)

where f is a known nonlinear function (Ince 1956). There are two families of solutions to (6). The first is a line of the form

        u(t) = Ct + f(C)                (7)

where C is determined by the initial conditions. The second family of solutions can be represented in parametric form as

        t(z) = −f'(z),
        u(z) = −z f'(z) + f(z).                (8)

In this work, we consider two forms for the function f. The first form is

        f(u'(t)) = 1/u'(t)                (9)

which can be written in the form of (1) as

        F(t, u(t), u'(t)) = t u'(t)^2 − u(t) u'(t) + 1 = 0.                (10)

With u(1) = 2, the two separate solutions are

        u1(t) = 2√t                (11a)
        u2(t) = t + 1.                (11b)

For the second problem we select

        f(u'(t)) = u'(t)^3                (12)

which can be written as

        F(t, u(t), u'(t)) = t u'(t) − u(t) + u'(t)^3 = 0.                (13)

This problem has three distinct solutions. With u(−3) = 2, the three solutions are

        u1(t) = 2 (−t/3)^(3/2)                (14a)
        u2(t) = 2t + 8                (14b)
        u3(t) = −t − 1.                (14c)

1D Bratu problem   The one-dimensional Bratu problem is

        u''(t) + C e^{u(t)} = 0,    u(0) = u(1) = 0.                (15)

This ODE has two solutions if C < Ccrit and only one solution if C ≥ Ccrit, where Ccrit ≈ 3.51. We select C = 1, in the regime where two functions solve the problem (Bratu 1914). The exact analytical solution is

        u(t) = 2 ln [ cosh(α) / cosh(α(1 − 2t)) ]                (16)

where α satisfies

        cosh(α) = (4 / √(2C)) α.                (17)

For C = 1, equation (17) has two solutions, α1 ≈ 0.379 and α2 ≈ 2.73, which ultimately result in two solutions of (16).
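   The two values of α can be recovered numerically from (17); a small SciPy sketch (ours, with C = 1 and bracketing intervals chosen by inspection) is shown below.

    import numpy as np
    from scipy.optimize import brentq

    C = 1.0

    def f(a):
        # Residual of equation (17): cosh(alpha) - (4 / sqrt(2C)) * alpha.
        return np.cosh(a) - 4.0 / np.sqrt(2.0 * C) * a

    alpha1 = brentq(f, 0.1, 1.0)    # ~ 0.379
    alpha2 = brentq(f, 2.0, 3.5)    # ~ 2.73

    # The two exact Bratu solutions of (16) on a grid:
    t = np.linspace(0.0, 1.0, 101)
    u1 = 2.0 * np.log(np.cosh(alpha1) / np.cosh(alpha1 * (1.0 - 2.0 * t)))
    u2 = 2.0 * np.log(np.cosh(alpha2) / np.cosh(alpha2 * (1.0 - 2.0 * t)))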
                          Methodology

In this section, we describe an algorithm that generalizes the method described in the section "Solving ODEs with Neural networks" to find all families of solutions of a given differential equation.
   To find the whole set of solutions of an ODE, we represent each of the N solutions with a different neural network ul(t), l = 1, ..., N. Each function ul(t) is modelled by a neural network with the same architecture, the same number of layers, and the same number of units, but the weights are not shared.
   The training is done simultaneously, minimizing the loss function

        L = Σ_{j=1}^{N} ℓj                (18)

where

        ℓj = L_ODE,j + L_I,j                (19)

as in equation (5). Since the neural networks are not "interacting", each will learn one of the possible solutions, but there is no guarantee that the learned solutions will be different. To overcome this shortcoming, we define an interaction function g(u1, u2) that detects whether two predicted functions are close. Given two functions u1(t) and u2(t), the value of g(u1, u2) should be high if they are similar and low if they are different. We propose an interaction function of the form

        gp(u1, u2) = Σ_{i=1}^{ntrain} |u1(ti) − u2(ti)|^(−p)                (20)

where p is a hyper-parameter that controls the separation between the two functions. We propose a new loss function that incorporates this interaction via

        Ltot = L + λ Σ_{i≠j} gp(ui, uj)                (21)

where L is given by (18) and λ is a hyper-parameter setting the importance of the interaction.
   Training the neural networks with the loss defined in (21) could lead to poor results near the boundaries or initial points. This is because the different solution families must be close to each other in those regions, whereas (21) pushes them apart. To overcome this problem, we use the modified loss in equation (21) only during an intermediate training interval [ni, ni + nf], where ni > 0 and ni + nf < ntot.
   To improve the performance of the algorithm, the interaction term can also be scaled by a factor that depends on the distance of each point from the boundaries. For example, for an IVP the scaling factor is (t − t0)/(tf − t0), so that the interaction is low near t0 and maximum near tf. For BVPs, the procedure is similar, with low values at the borders and high values in the middle of the interval.
   The basic training process is as follows. First, N neural networks are randomly initialized. The first part of the training proceeds for ni epochs using the loss in (18); the networks may converge towards the same minima or towards different ones. After ni epochs, the loss in (21) replaces the loss in (18) and the networks begin to interact through the interaction term: they gradually try to converge to solutions of the differential equation while moving away from each other. After ni + nf total epochs, the interaction term is removed and the original loss in (18) is used again to help each network converge to a real minimum. If the second phase of the training separated the functions enough that they lie near different minima, they will converge to different solutions. A variant of this approach is to fix one or more of the neural networks during the second part of the training, so that the effect of the interaction is to move only the remaining solutions away from the others. For example, if there are two distinct solutions, one can be held fixed near a real minimum while the other is pushed away. If the number of solutions of the differential equation is lower than the number of neural networks, at least two of them will eventually converge to the same solution.
   To explore the space of solutions more thoroughly, we can gradually increase the strength of the interaction λ. During the second part of the training, the differential equation is then gradually neglected and the functions tend to become more distinct.

    Data: λ0, λM, lossM, Dm, k
    N ← 1, λ ← λ0, NNold ← None;
    while λ < λM do
        NN ← createNN(N);
        loss, D ← train(NN);
        if ∃ i | lossi > lossM then
            return NNold;
        end
        if ∃ i, j | Dij < Dm then
            λ ← λ · k;
        else
            NNold ← NN;
            N ← N + 1;
        end
    end
    return NNold;
                Algorithm 1: Complete algorithm

   Algorithm 1 describes the method in detail. We initialize the number of neural networks to 1 and the interaction strength to λ0. We train the networks and compute the final losses and the pairwise distances. The pairwise distances Dij are calculated using

        Dij = (1/ntrain) Σ_{l=1}^{ntrain} |ui(tl) − uj(tl)|.                (22)

   We then check that every final loss is lower than a threshold lossM, in order to determine whether the algorithm converged. If not, the training process must be improved (for example by using a better optimization technique, increasing the number of epochs, or widening the architecture).
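   For concreteness, the interaction term of (20)-(21) and the distances of (22) might be evaluated as in the following sketch (ours; the function names are placeholders, and a small eps is added to avoid division by zero when two outputs coincide exactly, a case the paper does not discuss).

    import torch

    def interaction(u_list, t, p=5, eps=1e-8):
        # Sum of g_p(u_i, u_j) over all ordered pairs i != j, as in (20)-(21);
        # the caller multiplies the result by the strength lambda.
        total = torch.zeros(())
        for i in range(len(u_list)):
            for j in range(len(u_list)):
                if i != j:
                    diff = torch.abs(u_list[i](t) - u_list[j](t)) + eps
                    total = total + (diff ** (-p)).sum()
        return total

    def pairwise_distance(u_i, u_j, t):
        # Mean absolute distance between two learned functions, equation (22).
        return torch.abs(u_i(t) - u_j(t)).mean()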
   If the final losses are all below the threshold, we then check the pairwise distances to determine whether the solutions obtained are distinct. If at least one pair of solutions has a distance lower than a fixed threshold Dm, those functions have converged to the same minimum, so we increase the strength of the interaction by a factor k and train again. Otherwise, we were able to find at least N solutions: we increase the number of networks by 1 and rerun the algorithm to check whether there are more than N possible solutions. This procedure allows us to compute all the solutions without knowing their number a priori. To speed up the training, we can use the previously learned neural networks as the initialization of the new ones, so that only one network has to be learned from scratch each time N is increased.
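   In Python-like form, the outer loop of Algorithm 1 could look as follows; create_networks and train are hypothetical helpers standing in for the training procedure described above, so this is a sketch of the control flow, not the authors' implementation.

    def find_all_solutions(lambda0, lambda_max, loss_max, d_min, k,
                           create_networks, train):
        """Sketch of Algorithm 1: grow N and the interaction strength until
        no new distinct solution is found or training stops converging."""
        n, lam, nets_old = 1, lambda0, None
        while lam < lambda_max:
            nets = create_networks(n)              # N randomly initialized trial networks
            losses, distances = train(nets, lam)   # three-phase training described above
            if any(l > loss_max for l in losses):  # some network failed to converge
                return nets_old
            if any(d < d_min for d in distances):  # two networks found the same solution:
                lam = lam * k                      # strengthen the interaction and retry
            else:                                  # found N distinct solutions:
                nets_old = nets
                n = n + 1                          # look for one more
        return nets_old

   With the settings reported later in the paper, the inputs would correspond to lambda0 = 0.1, k = 2, lambda_max = 10, and loss_max = 1e-4, with Dm chosen per problem.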

                  Results and discussion

In this section the results are presented, followed by a sensitivity analysis of the selected hyper-parameters.

Neural network architecture

All results were obtained using the same network architecture. The network consists of two fully connected layers with 8 units per layer and sigmoid activation functions. The architecture also uses a skip connection, which we observed helps with learning linear functions. The network is trained using the Adam optimizer with learning rate 10^−2, β1 = 0.99 and β2 = 0.999.
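   A possible PyTorch rendering of this architecture is sketched below; the exact form of the skip connection is not specified in the paper, so the version here, which adds a direct linear map from the input to the output, is an assumption.

    import torch
    import torch.nn as nn

    class TrialNet(nn.Module):
        # Two fully connected hidden layers with 8 units and sigmoid activations,
        # plus a linear skip connection from the input to the output (assumed form).
        def __init__(self, hidden=8):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(1, hidden), nn.Sigmoid(),
                                      nn.Linear(hidden, hidden), nn.Sigmoid(),
                                      nn.Linear(hidden, 1))
            self.skip = nn.Linear(1, 1)

        def forward(self, t):
            return self.body(t) + self.skip(t)

    model = TrialNet()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-2, betas=(0.99, 0.999))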

Interaction function analysis

We analyzed different shapes of the interaction function. The results suggest that the interaction function in equation (20) captures our notion of interaction fairly well and can be easily tuned. Figure 1 shows the shape of the function for different values of p, while figure 2 shows the interaction function for different strengths λ. While λ is increased during the training, we fix the value p = 5.

Figure 1: Interaction function at different p values, at λ = 1.
Figure 2: Interaction function at different λ values, at p = 1.

Quantitative results

The proposed approach successfully finds the distinct solutions corresponding to the analytical ones (equations (11), (14), and (16)), up to a threshold of 10^−5, calculated as the mean squared error over 10000 equally spaced test points in the respective domains. Figures 3, 4, and 5 plot the solutions of the three problems. The exact solutions and the ones obtained with the new algorithm are in such good agreement that their difference is not visible.

Figure 3: Solutions of the Clairaut equation, first problem (equation 11).
Figure 4: Solutions of the Clairaut equation, second problem (equation 14).
Figure 5: Solutions of the Bratu problem (equation 16).

Training phases

Figures 6 and 7 show how the solutions evolve during the different training phases on the Clairaut equation, as the loss function changes from equation (18) to equation (21). Figure 6 shows the functions learned after ni = 1000 epochs. They have not yet converged, and we note that they are converging towards the same minimum. Figure 7 shows the functions learned after a further nf = 1000 epochs, at the end of the second training phase. These functions do not yet satisfy the differential equation, but they are distinct enough to converge to different minima representing the two solutions of the differential equation. The proposed algorithm does not guarantee that the solutions will be separated enough to reach different minima, if such minima exist, but it increases the chances of not converging to the same one. After the third training phase, the solutions have converged (see figure 3).

Figure 6: Two solutions of the Clairaut equation (problem 1), after the first training phase.
Figure 7: Two solutions of the Clairaut equation (problem 1), after the second training phase.
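   The phase switching just described reduces to a condition on the epoch counter; a sketch (ours, with hypothetical names, where base_loss is the sum in (18) and interaction_term the sum in (21)):

    def training_loss(epoch, n_i, n_f, lam, base_loss, interaction_term):
        # Phases 1 and 3 use the plain loss (18); the intermediate phase
        # [n_i, n_i + n_f) adds the interaction term of (21).
        if n_i <= epoch < n_i + n_f:
            return base_loss + lam * interaction_term
        return base_loss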
   In figure 8, we plot an example of the loss during training for the Clairaut problem (problem 1). The blue line represents the loss in equation (18), while the green line is the total loss in equation (21). The two curves coincide before ni = 1000 epochs and after ni + nf = 2000 epochs, and differ in between, where the algorithm minimizes the loss with the interaction term included.

Figure 8: Validation loss during training.

Selection of Hyperparameters

The algorithm described above (algorithm 1) requires as inputs a set of hyper-parameters, which we describe and analyze in this section:
• λ0 is the initial strength of the interaction. This is not a sensitive parameter: if not properly tuned, it will only slow down the algorithm, without any severe impact on the accuracy of the solutions. The only caution needed is to select it small enough for the algorithm to converge; otherwise the second training phase could move the solutions so far from the minima that they cannot properly converge back to the real solutions. We set this value to 10^−1.
• k is the factor by which the interaction strength is increased; it is usually set to 2 in order to double the strength of the interaction at every iteration.
• λM is the maximum strength to try, usually used as a stopping condition. It should not be set too small, since the second training phase must separate the solutions enough to make them converge to different minima, but also not too high, in order not to spend too much time in the training phase (the higher λM, the higher the number of iterations of the loop). We set this value to 10.
• lossM is the maximum value of the loss below which we consider the algorithm converged (i.e., the learned functions satisfy the differential equation). We set it to 10^−4.
• Dm is the most important hyper-parameter to set, since it defines whether two learned functions are considered the same or not. If this value is too low, two functions will be considered different even when they are actually the same; if it is set too high, two real solutions that are close to each other will be treated as the same, leading to an error. This parameter is analyzed in more detail in the following section.

Analysis of Dm   In figure 9, we show how a wrong selection of Dm leads to wrong results. If we set Dm = 10^−2, a value that is too low, the algorithm finds more solutions than actually exist. All the functions shown in the figure have a loss lower than lossM and are therefore considered solutions of the differential equation; however, because they are more than Dm apart from each other, they are not recognized as the same solution and are all counted as distinct.
   If we pick Dm too high, distinct solutions could be considered the same and the algorithm will not be able to detect both of them. For example, since for the Clairaut equation (problem 1) the average distance between the two solutions is D* ≈ 0.61 in the range [1, 5], setting Dm ≥ D* will lead to wrong results.

Figure 9: Example of a Dm value set too low, for the Clairaut equation (problem 1).

                          Conclusion

In this paper, we designed a novel algorithm to find all possible solutions of a differential equation with non-unique solutions using neural networks.
We tested its accuracy by applying it to three simple nonlinear differential equations that are known to admit more than one solution. The method successfully finds the solutions with high accuracy. We tested different interaction functions and hyper-parameters and verified the robustness of the approach. We expect this algorithm to also work for more challenging ODEs, provided the hyper-parameters are selected carefully. Future work will involve not only the application and analysis of the proposed algorithm to higher-dimensional ODEs and systems of ODEs, but also the exploitation of more appropriate architectures, such as dynamical-systems-inspired networks like ResNets (He et al. 2016).

                          References

Alnæs, M.; Blechta, J.; Hake, J.; Johansson, A.; Kehlet, B.; Logg, A.; Richardson, C.; Ring, J.; Rognes, M. E.; and Wells, G. N. 2015. The FEniCS project version 1.5. Archive of Numerical Software 3(100).
Baymani, M.; Kerayechian, A.; and Effati, S. 2010. Artificial neural networks approach for solving Stokes problem. Applied Mathematics 1:288-292.
Bratu, G. 1914. Sur les équations intégrales non linéaires. Bulletin de la Société Mathématique de France 42:113-142.
Burns, K. J.; Vasil, G. M.; Oishi, J. S.; Lecoanet, D.; and Brown, B. 2016. Dedalus: Flexible framework for spectrally solving differential equations. Astrophysics Source Code Library.
Chen, T. Q.; Rubanova, Y.; Bettencourt, J.; and Duvenaud, D. K. 2018. Neural ordinary differential equations. In Bengio, S.; Wallach, H.; Larochelle, H.; Grauman, K.; Cesa-Bianchi, N.; and Garnett, R., eds., Advances in Neural Information Processing Systems 31. Curran Associates, Inc. 6571-6583.
Corliss, G.; Faure, C.; Griewank, A.; Hascoet, L.; and Naumann, U. 2002. Automatic Differentiation of Algorithms: From Simulation to Optimization, volume 1. Springer Science & Business Media.
E, W.; Han, J.; and Jentzen, A. 2017. Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Communications in Mathematics and Statistics 5(4):349-380.
He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Ince, E. 1956. Ordinary Differential Equations. Dover Books on Mathematics. Dover Publications.
Karkowski, J. 2013. Numerical experiments with the Bratu equation in one, two and three dimensions. Computational and Applied Mathematics 32(2):231-244.
Kumar, M., and Yadav, N. 2011. Multilayer perceptrons and radial basis function neural network methods for the solution of differential equations: A survey. Computers and Mathematics with Applications 62(10):3796-3811.
Lagaris, I.; Likas, A.; and Fotiadis, D. 1997. Artificial neural network methods in quantum mechanics. Computer Physics Communications 104(1):1-14.
Lagaris, I. E.; Likas, A.; and Fotiadis, D. I. 1998. Artificial neural networks for solving ordinary and partial differential equations. IEEE Transactions on Neural Networks 9(5):987-1000.
Lagaris, I. E.; Likas, A.; and Papageorgiou, D. G. 1998. Neural network methods for boundary value problems defined in arbitrarily shaped domains. CoRR cs.NE/9812003.
Mattheakis, M.; Protopapas, P.; Sondak, D.; Di Giovanni, M.; and Kaxiras, E. 2019. Physical symmetries embedded in neural networks.
Raissi, M.; Perdikaris, P.; and Karniadakis, G. 2019. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378:686-707.
Seefeldt, B.; Sondak, D.; Hensinger, D. M.; Phipps, E. T.; Foucar, J. G.; Pawlowski, R. P.; Cyr, E. C.; Shadid, J. N.; Smith, T. M.; Weber, P. D.; et al. 2017. Drekar v. 2.0. Technical report, Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States).
Sirignano, J., and Spiliopoulos, K. 2018. DGM: A deep learning algorithm for solving partial differential equations. Journal of Computational Physics 375:1339-1364.
Zhu, M.; Chang, B.; and Fu, C. 2018. Convolutional neural networks combined with Runge-Kutta methods. arXiv preprint arXiv:1802.08831.