              Sparsely constrained neural networks for model discovery of PDEs

                                  Gert-Jan Both,¹ Gijs Vermariën,² Remy Kusters¹

      ¹ Université de Paris, INSERM U1284, Center for Research and Interdisciplinarity (CRI), F-75006 Paris, France
                          ² Leiden Observatory, Leiden University, Leiden, The Netherlands
              gert-jan.both@cri-paris.org, vermarien@strw.leidenuniv.nl, remy.kusters@cri-paris.org




Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


                                                      Abstract

Sparse regression on a library of candidate features has developed into the prime method for discovering the partial differential equation underlying a spatio-temporal data-set. These features consist of higher-order derivatives, limiting model discovery to densely sampled data-sets with low noise. Neural network-based approaches circumvent this limit by constructing a surrogate model of the data, but have to date ignored advances in sparse regression algorithms. In this paper we present a modular framework that dynamically determines the sparsity pattern of a deep-learning based surrogate using any sparse regression technique. Using our new approach, we introduce a new constraint on the neural network and show how a different network architecture and sparsity estimator improve model discovery accuracy and convergence on several benchmark examples. Our framework is available at https://github.com/PhIMaL/DeePyMoD.


                                                   Introduction

Model discovery aims at finding interpretable models in the form of PDEs from large spatio-temporal data-sets. Most algorithms apply sparse regression on a predefined set of candidate terms, as initially proposed by Brunton et al. for ODEs with SINDy (Brunton, Proctor, and Kutz 2016) and by Rudy et al. for PDEs with PDE-find (Rudy et al. 2017). By writing the unknown differential equation as ∂_t u = f(u, u_x, ...) and assuming the right-hand side is a linear combination of predefined terms, i.e. f(u, u_x, ...) = au + bu_x + ... = Θξ, model discovery reduces to finding a sparse coefficient vector ξ. Calculating the time derivative u_t and the function library Θ is notoriously hard for noisy and sparse data, since it involves calculating higher-order derivatives. The error in these terms is typically high due to the use of numerical differentiation techniques such as finite differences or spline interpolation, limiting classical model discovery to low-noise and densely sampled data-sets. Deep learning-based methods circumvent this issue by constructing a surrogate from the data and calculating the feature library Θ as well as the time derivative u_t from this digital twin using automatic differentiation. This approach significantly improves the accuracy of the time derivative and the library on noisy and sparse data-sets, but it suffers from convergence issues and, to date, does not leverage advanced sparse regression techniques.
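To make this concrete, the sketch below builds such a candidate library Θ on a synthetic heat-equation solution and solves u_t = Θξ with a Lasso. The grid, the candidate terms and the penalty are illustrative choices rather than the PDE-find implementation, and the recovered coefficients are only indicative.

    import numpy as np
    from sklearn.linear_model import Lasso

    # Synthetic solution of the heat equation u_t = 0.5 * u_xx on a dense grid.
    x = np.linspace(-5, 5, 256)
    t = np.linspace(0.1, 2.0, 128)
    xx, tt = np.meshgrid(x, t, indexing="ij")
    u = np.exp(-xx**2 / (4 * 0.5 * tt)) / np.sqrt(4 * np.pi * 0.5 * tt)

    # Numerical derivatives by finite differences (the step that breaks down for noisy data).
    u_t = np.gradient(u, t, axis=1)
    u_x = np.gradient(u, x, axis=0)
    u_xx = np.gradient(u_x, x, axis=0)

    # Candidate library Theta = [1, u, u_x, u_xx, u*u_x] and sparse regression u_t = Theta @ xi.
    theta = np.stack([np.ones_like(u), u, u_x, u_xx, u * u_x], axis=-1).reshape(-1, 5)
    xi = Lasso(alpha=1e-4, fit_intercept=False).fit(theta, u_t.ravel()).coef_
    print(dict(zip(["1", "u", "u_x", "u_xx", "u*u_x"], xi.round(3))))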
   In this paper we present a modular approach to combining deep-learning based models with state-of-the-art sparse regression techniques. Our framework consists of a neural network to model the data, from which we construct the function library. Key to our approach is that we dynamically apply a mask to select the active terms in the function library throughout training, and constrain the network to solutions of the equation given by these active terms. To determine this mask, we can use any non-differentiable sparsity-promoting algorithm (see figure 1). This allows us to use a constrained neural network to model the data and construct an accurate function library, while an advanced sparsity-promoting algorithm is used to dynamically discover the equation based on the output of the network.
   We present three experiments to show how varying these components improves the performance of model discovery. (I) We replace the gradient-based optimisation of the constraint by one based on ordinary least squares, leading to much faster convergence. (II) We show that using PDE-find to find the active components outperforms a threshold-based Lasso approach on highly noisy data-sets. (III) We demonstrate that using a SIREN (Sitzmann et al. 2020) instead of a standard feed-forward neural network allows us to discover equations from highly complex data-sets.


                                                   Related Work

Sparse regression   Sparse regression as a means to discover differential equations was pioneered by SINDy (Brunton, Proctor, and Kutz 2016) and PDE-find (Rudy et al. 2017). These methods have since been extended with automated hyperparameter tuning (Champion et al. 2019a; Maddu et al. 2019), a Bayesian approach to model discovery using Sparse Bayesian Learning (Yuan et al. 2019), model discovery for parametric differential equations (Rudy, Kutz, and Brunton 2019) and an evolutionary approach to PDE discovery (Maslyaev, Hvatov, and Kalyuzhnaya 2019).

Deep learning-based model discovery   With the advent of physics-informed neural networks (Raissi, Perdikaris, and Karniadakis 2017a,b), a neural network has become one of the prime approaches to create a surrogate of the data and then perform sparse regression on the network's prediction (Schaeffer 2017; Berg and Nyström 2019). Alternatively, neural ODEs have been introduced to discover unknown governing equations from physical data-sets (Rackauckas et al. 2020). A different optimisation strategy based on the method of alternating direction is considered in (Chen, Liu, and Sun 2020), and graph-based approaches have been developed recently (Seo and Liu 2019; Sanchez-Gonzalez et al. 2018). (Greydanus, Dzamba, and Yosinski 2019) and (Cranmer et al. 2020) directly encode symmetries in neural networks using, respectively, the Hamiltonian and Lagrangian framework. Finally, auto-encoders have been used to model PDEs and discover latent variables (Lu, Kim, and Soljačić 2019; Iten et al. 2020), but they do not lead to an explicit equation and require large amounts of data.
Figure 1: Schematic overview of our framework. (I) A function approximator constructs a surrogate of the data, (II) from which a library of possible terms and the time derivative is constructed using automatic differentiation. (III) A sparsity estimator selects the active terms in the library using sparse regression, and (IV) a constraint restricts the function approximator to solutions allowed by the active terms.


                         Deep-learning based model discovery with sparse regression

Deep learning-based model discovery uses a neural network to construct a surrogate model û of the data u. A library of candidate terms Θ is constructed from û using automatic differentiation, and the neural network is constrained to solutions allowed by this library (Both et al. 2019). The loss function of the network thus consists of two contributions: (i) a mean squared error to learn the mapping (x, t) → û and (ii) a term to constrain the network,

    L = (1/N) Σ_{i=1}^{N} (u_i − û_i)² + (1/N) Σ_{i=1}^{N} (∂_t û_i − Θ_i ξ)².        (1)

The sparse coefficient vector ξ is learned concurrently with the network parameters and plays two roles: 1) determining the active (i.e. non-zero) components of the underlying PDE and 2) constraining the network according to these active terms. We propose to separate these two tasks by decoupling the constraint from the sparsity selection process itself. We first calculate a sparsity mask g and constrain the network only by the active terms in the mask: instead of constraining the neural network with ξ, we constrain it with ξ ∘ g, replacing eq. 1 with

    L = (1/N) Σ_{i=1}^{N} (u_i − û_i)² + (1/N) Σ_{i=1}^{N} (∂_t û_i − Θ_i (ξ ∘ g))².        (2)

Training using eq. 2 requires two steps: first, we calculate g using a sparse estimator; next, we minimise the loss with respect to the network parameters using the masked coefficient vector. The sparsity mask g need not be calculated differentiably, so any classical, non-differentiable sparse estimator can be used. This approach has several additional advantages: i) it provides an unbiased estimate of the coefficient vector, since we do not apply l1 or l2 regularisation on ξ; ii) the sparsity pattern is determined from the full library Θ, rather than only from the remaining active terms, allowing dynamic addition and removal of active terms throughout training; and iii) we can use cross-validation in the sparse estimator to find the optimal hyperparameters for model selection. Finally, we note that the sparsity mask g mirrors the role of attention in transformers (Bahdanau, Cho, and Bengio 2016).
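A minimal sketch of the masked loss of eq. 2, assuming the library Θ and the time derivative ∂_t û have already been computed from the surrogate; tensor shapes and names are illustrative and not the actual DeePyMoD code.

    import torch

    def masked_pde_loss(u, u_hat, dt_u_hat, theta, xi, mask):
        """Sketch of eq. 2: MSE on the data plus the constraint built from
        the masked coefficient vector xi * mask.

        u, u_hat, dt_u_hat: (N, 1) tensors; theta: (N, M) library; xi: (M, 1);
        mask: (M, 1) tensor produced by the (non-differentiable) sparsity estimator.
        """
        mse = torch.mean((u - u_hat) ** 2)
        residual = dt_u_hat - theta @ (xi * mask)   # only active terms constrain the network
        reg = torch.mean(residual ** 2)
        return mse + reg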
   Using this change, we construct a general framework for deep learning-based model discovery using four modules (see figure 1). (I) A function approximator constructs a surrogate model of the data, (II) from which a library of possible terms and the time derivative is constructed using automatic differentiation. (III) A sparsity estimator constructs a sparsity mask to select the active terms in the library using some sparse regression algorithm, and (IV) a constraint restricts the function approximator to solutions allowed by the active terms obtained from the sparsity estimator.
Training   We typically calculate the sparsity mask g using an external, non-differentiable estimator. In this case, updating the mask at the right time is crucial: before the function approximator has reasonably approximated the data, updating the mask would adversely affect training, as it is likely to select the wrong terms. Vice versa, updating the mask too late risks using a function library from an overfitted network. We implement a procedure in the spirit of "early stopping" to decide when to update: the data-set is split into a train- and test-set, and we update the mask once the mean squared error on the test-set reaches a minimum or changes by less than a preset value δ. We typically set δ = 10⁻⁶ to ensure the network has learned a good representation of the data.
   After the first update, we periodically update the mask using the sparsity estimator. In figure 2 we demonstrate this training procedure on a Burgers equation with 1500 samples and 2% white noise. It shows the losses on the train- and test-set in panel A, the coefficients of the constraint in panel B and the sparsity mask in panel C. In practice we observe that large data-sets with little noise typically lead to discovery of the correct PDE after only a single sparsity update, but that noisy data-sets require several updates, removing only a few terms at a time. Final convergence is reached when the l1 norm of the coefficient vector remains constant.
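The trigger for the first mask update can be expressed as a small, self-contained helper; the function name and the plateau criterion below are illustrative choices, not the exact implementation used in our package.

    def should_update_mask(test_mse_history, delta=1e-6):
        """Return True once the last improvement of the test MSE is below delta,
        mimicking an early-stopping criterion for the first sparsity update."""
        if len(test_mse_history) < 2:
            return False
        return test_mse_history[-2] - test_mse_history[-1] < delta

    # Example: a loss that is still improving vs. one that has plateaued.
    print(should_update_mask([0.5, 0.1, 0.05]))        # False, still improving
    print(should_update_mask([0.05, 0.049999999]))     # True, change below 1e-6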

Figure 2: A) MSE on the test-set and the total loss on the train-set as a function of the number of epochs. The vertical line indicates the first time the sparsity mask is applied. B) The twelve coefficients as a function of the number of epochs. The two terms u_xx and uu_x need to be recovered. C) Dynamic sparsity mask during training. Yellow components are active, blue components are inactive.

Package   We provide our framework as a Python-based package at https://github.com/PhIMaL/DeePyMoD, with documentation and examples available at https://phimal.github.io/DeePyMoD/. Mirroring our approach, each model consists of four modules: a function approximator, library, constraint and sparsity estimator module. Each module can be customised or replaced without affecting the other modules, allowing for quick experimentation. Our framework is built on PyTorch (Paszke et al. 2019) and any PyTorch model (e.g. a recurrent neural network) can be used as function approximator. The sparse estimator module follows the Scikit-learn API (Pedregosa et al.; Buitinck et al. 2013), i.e., all the built-in Scikit-learn estimators, as well as those in PySINDy (de Silva et al. 2020) or sktime (Löning et al.), can be used.
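As an illustration of this interface, a custom sparsity estimator only needs to expose fit() and a coef_ attribute; the class below is a hypothetical thresholded-Lasso example built on Scikit-learn and is not part of DeePyMoD or Scikit-learn itself.

    import numpy as np
    from sklearn.base import BaseEstimator
    from sklearn.linear_model import LassoCV

    class ThresholdedLassoEstimator(BaseEstimator):
        def __init__(self, threshold=0.1):
            self.threshold = threshold

        def fit(self, theta, dt_u):
            # Cross-validated Lasso on the library, then zero out small
            # coefficients; the non-zero pattern of coef_ acts as the mask.
            coefs = LassoCV(fit_intercept=False).fit(theta, dt_u).coef_
            self.coef_ = np.where(np.abs(coefs) > self.threshold, coefs, 0.0)
            return self

    # Any object exposing fit() and coef_ in this way can act as the sparsity estimator.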
                                                   Experiments

Constraint   The sparse coefficient vector ξ in eq. 1 is typically found by optimising it concurrently with the neural network parameters θ. Considering a network with parameter configuration θ*, the problem of finding ξ can be rewritten as arg min_ξ |u_t(θ*) − Θ(θ*)ξ|². This can be solved analytically by least squares under mild assumptions; we calculate ξ by solving this problem every iteration, rather than optimising it using gradient descent. In figure 3 we compare the two constraining strategies on a Burgers data-set¹, training for 5000 epochs without updating the sparsity mask².

¹ We solve u_t = u_xx + νuu_x with a delta-peak initial condition for ν = 0.1 on x = [−3, 4], t = [0.5, 5], randomly sample 2000 points and add 10% white noise.
² All experiments use a network of 5 layers with 30 neurons per layer and a tanh activation function. The network is optimised using the ADAM optimiser with a learning rate of 2·10⁻³ and β = (0.99, 0.999).
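A sketch of this least-squares constraint, assuming the library Θ, the time derivative and the current mask are available each iteration; torch.linalg.lstsq is one way to solve the problem analytically, and the details below (such as detaching the inputs) are illustrative rather than the paper's exact implementation.

    import torch

    def solve_constraint(theta, dt_u, mask):
        """Obtain xi analytically by ordinary least squares on the active
        (masked) columns of the library. A sketch, not the paper's code.

        theta: (N, M) library, dt_u: (N, 1) time derivative, mask: (M,) boolean tensor.
        """
        active = torch.nonzero(mask, as_tuple=True)[0]
        # Least-squares solution on the active terms only; no gradients are
        # needed for xi itself, so the inputs are detached.
        sol = torch.linalg.lstsq(theta[:, active].detach(), dt_u.detach()).solution
        xi = torch.zeros(theta.shape[1], 1, dtype=theta.dtype)
        xi[active] = sol
        return xi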
   Panel A shows that the least-squares approach reaches a consistently lower loss. More strikingly, panel B shows that the mean absolute error in the coefficients is three orders of magnitude lower. We explain the difference as a consequence of the random initialisation of ξ: the network is initially constrained by incorrect coefficients, prolonging convergence. The random initialisation also causes the larger spread in results compared to the least-squares method. The least-squares method does not suffer from this sensitivity to the initialisation and consistently converges.
Figure 3: A) Loss and B) mean absolute error of the coefficients obtained with the gradient descent and the least-squares constraint as a function of the number of epochs. Results have been averaged over twenty runs and the shaded area denotes the standard deviation.

Sparsity estimator   Implementing the sparsity estimator separately from the neural network allows us to use any sparsity-promoting algorithm. Here we show that a classical method for PDE model discovery, PDE-find (Rudy et al. 2017), can be used together with neural networks to perform model discovery in highly sparse and noisy data-sets. In figure 4 we compare it with the thresholded Lasso approach³ (Both et al. 2019) on a Burgers data-set⁴ with varying amounts of noise. The PDE-find estimator discovers the correct equation in the majority of cases, even with up to 60%–80% noise, whereas the thresholded Lasso mostly fails at 40%. We emphasise that the modular approach we propose here allows us to combine classical and deep learning-based techniques. More advanced sparsity estimators such as SR3 (Champion et al. 2019b) can easily be included in this framework.

³ We use a pre-set threshold of 0.1.
⁴ See footnote 1, only with 1000 points randomly sampled.

Figure 4: Fraction of correctly discovered Burgers equations (averaged over 10 runs) as a function of the noise level for the thresholded Lasso and PDE-find sparsity estimators.
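PDE-find selects active terms by sequentially thresholded ridge regression (STRidge). The NumPy sketch below illustrates that idea with illustrative parameter values; it is not the reference implementation of Rudy et al.

    import numpy as np

    def stridge_mask(theta, dt_u, threshold=0.1, lam=1e-5, iters=10):
        """Sketch of sequentially thresholded ridge regression: repeatedly fit a
        ridge regression and drop terms whose coefficients fall below the threshold."""
        n_terms = theta.shape[1]
        active = np.ones(n_terms, dtype=bool)
        xi = np.zeros(n_terms)
        for _ in range(iters):
            A = theta[:, active]
            # Ridge solution on the currently active terms.
            coef = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ dt_u)
            xi[:] = 0.0
            xi[active] = coef
            keep = np.abs(xi) > threshold
            if keep.sum() == 0 or np.array_equal(keep, active):
                break
            active = keep
        return active, xi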
Function approximator   We show in figure 5 that a tanh-based NN fails to converge on a data-set of the Kuramoto-Sivashinsky (KS) equation⁵ (panels A and B). Consequently, the coefficient vectors are incorrect (panel D). As our framework is agnostic to the underlying function approximator, we instead use a SIREN⁶, which is able to learn very sharp features in the underlying dynamics. In panel B we show that a SIREN is able to learn the complex dynamics of the KS equation, and in panel C that it discovers the correct equation⁷. This example shows that the choice of function approximator can be a decisive factor in the success of neural network-based model discovery. Using our framework we can also explore using RNNs, neural ODEs (Rackauckas et al. 2020) or graph neural networks (Seo and Liu 2019).
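For reference, a SIREN differs from a tanh network only in its sine activations and their initialisation. The layer below is a minimal PyTorch sketch following Sitzmann et al. (2020), with simplified initialisation and illustrative hyperparameters.

    import numpy as np
    import torch
    import torch.nn as nn

    class SineLayer(nn.Module):
        """Minimal SIREN-style layer: a linear map followed by sin(omega_0 * x).
        The weight initialisation follows the scheme of Sitzmann et al. (2020)
        in simplified form; the values used here are illustrative."""
        def __init__(self, in_features, out_features, omega_0=30.0, first=False):
            super().__init__()
            self.omega_0 = omega_0
            self.linear = nn.Linear(in_features, out_features)
            bound = 1 / in_features if first else np.sqrt(6 / in_features) / omega_0
            nn.init.uniform_(self.linear.weight, -bound, bound)

        def forward(self, x):
            return torch.sin(self.omega_0 * self.linear(x))

    # A small SIREN mapping (x, t) -> u, usable as the function approximator module.
    siren = nn.Sequential(
        SineLayer(2, 50, first=True),
        SineLayer(50, 50),
        nn.Linear(50, 1),
    )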
⁵ We solve ∂_t u + uu_x + u_xx + u_xxxx = 0 on x = [0, 100], t = [0, 44], randomly sample 25000 points and add 5% white noise.
⁶ Both networks use 8 layers with 50 neurons. We train the SIREN using ADAM with a learning rate of 2.5·10⁻⁴ and β = (0.999, 0.999).
⁷ In bold; uu_x: green, u_xx: blue and u_xxxx: orange.

Figure 5: A) Solution of the KS equation; the lower panel shows the cross section at the last time point, t = 44. B) MSE as a function of the number of epochs for both the tanh-based NN and the SIREN. Coefficients as a function of the number of epochs for C) the SIREN and D) the tanh-based NN. The bold curves in panels C and D are the terms of the KS equation; green: uu_x, blue: u_xx and orange: u_xxxx. Only the SIREN is able to discover the correct equation.
                                          Discussion and future work

In this paper we introduced a framework for model discovery that combines classical sparsity estimation with deep learning-based surrogates. Building on this, we showed that replacing the function approximator or the constraint, or dynamically applying the sparsity estimator during training, can extend model discovery to more complex data-sets, speed up convergence or make it more robust to noise. Each of the four components is decoupled from the rest and can be changed independently, making our approach a solid base for future research. Currently, the function approximator simply learns the solution using a feed-forward neural network. We suspect that adding more structure, for example by using recurrent, convolutional or graph neural networks, will improve the performance of model discovery. It might also be beneficial to regularise the constraint, for example by implementing Lasso or ridge regression. Updating the sparsity mask in a non-differentiable manner works because the neural network is able to learn a fairly accurate surrogate without imposing sparsity on the constraint. If the network is unable to learn an accurate representation, our approach breaks down. Updating the mask in a differentiable manner would not suffer from this drawback, and we intend to pursue this in future work.

                                               Acknowledgments

This work received support from the CRI Research Fellowship attributed to Remy Kusters. We thank the Bettencourt Schueller Foundation for its long-term partnership and NVIDIA for supplying the GPU under the Academic Grant program. We would also like to thank the authors and contributors of NumPy (Harris et al. 2020), SciPy (Virtanen et al. 2020), Scikit-learn (Pedregosa et al.), Matplotlib (Hunter 2007), IPython (Perez and Granger 2007), and PyTorch (Paszke et al. 2019) for making our work possible through their open-source software. The authors declare no competing interest.


                                                  References

Bahdanau, D.; Cho, K.; and Bengio, Y. 2016. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv:1409.0473 [cs, stat]. URL http://arxiv.org/abs/1409.0473.

Berg, J.; and Nyström, K. 2019. Data-driven discovery of PDEs in complex datasets. Journal of Computational Physics 384: 239–252. doi:10.1016/j.jcp.2019.01.036. URL http://arxiv.org/abs/1808.10788.

Both, G.-J.; Choudhury, S.; Sens, P.; and Kusters, R. 2019. DeepMoD: Deep learning for Model Discovery in noisy data. arXiv:1904.09406 [physics, q-bio, stat]. URL http://arxiv.org/abs/1904.09406.

Brunton, S. L.; Proctor, J. L.; and Kutz, J. N. 2016. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences 113(15): 3932–3937. doi:10.1073/pnas.1517384113. URL http://www.pnas.org/lookup/doi/10.1073/pnas.1517384113.

Buitinck, L.; Louppe, G.; Blondel, M.; Pedregosa, F.; Mueller, A.; Grisel, O.; Niculae, V.; Prettenhofer, P.; Gramfort, A.; Grobler, J.; Layton, R.; Vanderplas, J.; Joly, A.; Holt, B.; and Varoquaux, G. 2013. API design for machine learning software: experiences from the scikit-learn project. arXiv:1309.0238 [cs]. URL http://arxiv.org/abs/1309.0238.

Champion, K.; Lusch, B.; Kutz, J. N.; and Brunton, S. L. 2019a. Data-driven discovery of coordinates and governing equations. arXiv:1904.02107 [stat]. URL http://arxiv.org/abs/1904.02107.

Champion, K.; Zheng, P.; Aravkin, A. Y.; Brunton, S. L.; and Kutz, J. N. 2019b. A unified sparse optimization framework to learn parsimonious physics-informed models from data. arXiv:1906.10612 [physics]. URL http://arxiv.org/abs/1906.10612.

Chen, Z.; Liu, Y.; and Sun, H. 2020. Deep learning of physical laws from scarce data. arXiv:2005.03448 [physics, stat]. URL http://arxiv.org/abs/2005.03448.

Cranmer, M.; Greydanus, S.; Hoyer, S.; Battaglia, P.; Spergel, D.; and Ho, S. 2020. Lagrangian Neural Networks. arXiv:2003.04630 [physics, stat]. URL http://arxiv.org/abs/2003.04630.

de Silva, B. M.; Champion, K.; Quade, M.; Loiseau, J.-C.; Kutz, J. N.; and Brunton, S. L. 2020. PySINDy: A Python package for the Sparse Identification of Nonlinear Dynamics from Data. arXiv:2004.08424 [physics]. URL http://arxiv.org/abs/2004.08424.

Greydanus, S.; Dzamba, M.; and Yosinski, J. 2019. Hamiltonian Neural Networks. arXiv:1906.01563 [cs]. URL http://arxiv.org/abs/1906.01563.

Harris, C. R.; Millman, K. J.; van der Walt, S. J.; Gommers, R.; Virtanen, P.; Cournapeau, D.; Wieser, E.; Taylor, J.; Berg, S.; Smith, N. J.; Kern, R.; Picus, M.; Hoyer, S.; van Kerkwijk, M. H.; Brett, M.; Haldane, A.; del Río, J. F.; Wiebe, M.; Peterson, P.; Gérard-Marchant, P.; Sheppard, K.; Reddy, T.; Weckesser, W.; Abbasi, H.; Gohlke, C.; and Oliphant, T. E. 2020. Array programming with NumPy. Nature 585(7825): 357–362. doi:10.1038/s41586-020-2649-2. URL http://www.nature.com/articles/s41586-020-2649-2.

Hunter, J. D. 2007. Matplotlib: A 2D Graphics Environment. Computing in Science & Engineering 9(3): 90–95. doi:10.1109/MCSE.2007.55.

Iten, R.; Metger, T.; Wilming, H.; del Rio, L.; and Renner, R. 2020. Discovering physical concepts with neural networks. Physical Review Letters 124(1): 010508. doi:10.1103/PhysRevLett.124.010508. URL http://arxiv.org/abs/1807.10300.

Löning, M.; Bagnall, A.; Ganesh, S.; and Kazakov, V. sktime: A Unified Interface for Machine Learning with Time Series.

Lu, P. Y.; Kim, S.; and Soljačić, M. 2019. Extracting Interpretable Physical Parameters from Spatiotemporal Systems using Unsupervised Learning. arXiv:1907.06011 [physics, stat]. URL http://arxiv.org/abs/1907.06011.

Maddu, S.; Cheeseman, B. L.; Sbalzarini, I. F.; and Müller, C. L. 2019. Stability selection enables robust learning of partial differential equations from limited noisy data. arXiv:1907.07810 [physics]. URL http://arxiv.org/abs/1907.07810.

Maslyaev, M.; Hvatov, A.; and Kalyuzhnaya, A. 2019. Data-driven PDE discovery with evolutionary approach. arXiv:1903.08011 [cs, math] 11540: 635–641. doi:10.1007/978-3-030-22750-0_61. URL http://arxiv.org/abs/1903.08011.

Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; Desmaison, A.; Köpf, A.; Yang, E.; DeVito, Z.; Raison, M.; Tejani, A.; Chilamkurthy, S.; Steiner, B.; Fang, L.; Bai, J.; and Chintala, S. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. arXiv:1912.01703 [cs, stat]. URL http://arxiv.org/abs/1912.01703.

Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; Vanderplas, J.; Passos, A.; and Cournapeau, D. Scikit-learn: Machine Learning in Python.

Perez, F.; and Granger, B. E. 2007. IPython: A System for Interactive Scientific Computing. Computing in Science & Engineering 9(3): 21–29. doi:10.1109/MCSE.2007.53.

Rackauckas, C.; Ma, Y.; Martensen, J.; Warner, C.; Zubov, K.; Supekar, R.; Skinner, D.; and Ramadhan, A. 2020. Universal Differential Equations for Scientific Machine Learning. arXiv:2001.04385 [cs, math, q-bio, stat]. URL http://arxiv.org/abs/2001.04385.

Raissi, M.; Perdikaris, P.; and Karniadakis, G. E. 2017a. Physics Informed Deep Learning (Part I): Data-driven Solutions of Nonlinear Partial Differential Equations. arXiv:1711.10561 [cs, math, stat]. URL http://arxiv.org/abs/1711.10561.

Raissi, M.; Perdikaris, P.; and Karniadakis, G. E. 2017b. Physics Informed Deep Learning (Part II): Data-driven Discovery of Nonlinear Partial Differential Equations. arXiv:1711.10566 [cs, math, stat]. URL http://arxiv.org/abs/1711.10566.

Rudy, S. H.; Brunton, S. L.; Proctor, J. L.; and Kutz, J. N. 2017. Data-driven discovery of partial differential equations. Science Advances 3(4): e1602614. doi:10.1126/sciadv.1602614. URL http://advances.sciencemag.org/lookup/doi/10.1126/sciadv.1602614.

Rudy, S. H.; Kutz, J. N.; and Brunton, S. L. 2019. Deep learning of dynamics and signal-noise decomposition with time-stepping constraints. Journal of Computational Physics 396: 483–506. doi:10.1016/j.jcp.2019.06.056. URL http://arxiv.org/abs/1808.02578.

Sanchez-Gonzalez, A.; Heess, N.; Springenberg, J. T.; Merel, J.; Riedmiller, M.; Hadsell, R.; and Battaglia, P. 2018. Graph networks as learnable physics engines for inference and control. arXiv:1806.01242 [cs, stat]. URL http://arxiv.org/abs/1806.01242.

Schaeffer, H. 2017. Learning partial differential equations via data discovery and sparse optimization. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 473(2197): 20160446. doi:10.1098/rspa.2016.0446. URL https://royalsocietypublishing.org/doi/10.1098/rspa.2016.0446.

Seo, S.; and Liu, Y. 2019. Differentiable Physics-informed Graph Networks. arXiv:1902.02950 [cs, stat]. URL http://arxiv.org/abs/1902.02950.

Sitzmann, V.; Martel, J. N. P.; Bergman, A. W.; Lindell, D. B.; and Wetzstein, G. 2020. Implicit Neural Representations with Periodic Activation Functions. arXiv:2006.09661 [cs, eess]. URL http://arxiv.org/abs/2006.09661.

Virtanen, P.; Gommers, R.; Oliphant, T. E.; Haberland, M.; Reddy, T.; Cournapeau, D.; Burovski, E.; Peterson, P.; Weckesser, W.; Bright, J.; van der Walt, S. J.; Brett, M.; Wilson, J.; Millman, K. J.; Mayorov, N.; Nelson, A. R. J.; Jones, E.; Kern, R.; Larson, E.; Carey, C. J.; Polat, İ.; Feng, Y.; Moore, E. W.; VanderPlas, J.; Laxalde, D.; Perktold, J.; Cimrman, R.; Henriksen, I.; Quintero, E. A.; Harris, C. R.; Archibald, A. M.; Ribeiro, A. H.; Pedregosa, F.; van Mulbregt, P.; and SciPy 1.0 Contributors. 2020. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods 17(3): 261–272. doi:10.1038/s41592-019-0686-2. URL http://arxiv.org/abs/1907.10121.

Yuan, Y.; Li, J.; Li, L.; Jiang, F.; Tang, X.; Zhang, F.; Liu, S.; Goncalves, J.; Voss, H. U.; Li, X.; Kurths, J.; and Ding, H. 2019. Machine Discovery of Partial Differential Equations from Spatiotemporal Data. arXiv:1909.06730 [physics, stat]. URL http://arxiv.org/abs/1909.06730.