=Paper= {{Paper |id=Vol-2964/article_209 |storemode=property |title=Neural Ordinary Differential Equations for Data-Driven Reduced Order Modeling of Environmental Hydrodynamics |pdfUrl=https://ceur-ws.org/Vol-2964/article_209.pdf |volume=Vol-2964 |authors=Sourav Dutta,Peter Rivera-Casillas,Matthew Farthing |dblpUrl=https://dblp.org/rec/conf/aaaiss/DuttaRF21 }} ==Neural Ordinary Differential Equations for Data-Driven Reduced Order Modeling of Environmental Hydrodynamics== https://ceur-ws.org/Vol-2964/article_209.pdf
         Neural Ordinary Differential Equations for Data-Driven Reduced Order
                      Modeling of Environmental Hydrodynamics
                           Sourav Dutta,1 Peter Rivera-Casillas,2 Matthew W. Farthing,1
         1
           U.S. Army Engineer Research and Development Center, Coastal and Hydraulics Laboratory, Vicksburg, MS,
     2
         U.S. Army Engineer Research and Development Center, Information and Technology Laboratory, Vicksburg, MS,
                        1, 2
                             {sourav.dutta, peter.g.rivera-casillas, matthew.w.farthing}@erdc.dren.mil


                            Abstract                                  the construction of a solution-dependent, linear basis space
  Model reduction for fluid flow simulation continues to be of
                                                                      spanned by a set of RB “modes”, which are extracted from
  great interest across a number of scientific and engineering        a collection of high-fidelity solutions, also called snapshots.
  fields. Here, we explore the use of Neural Ordinary Differ-         The RB “modes” can be thought of as a set of global ba-
  ential Equations, a recently introduced family of continuous-       sis functions spanning a linear subspace that can be used to
  depth, differentiable networks (Chen et al. 2018), as a way to      approximate the dynamics of the high-fidelity model. The
  propagate latent-space dynamics in reduced order models. We         most well known method to extract the reduced basis is
  compare their behavior with two classical non-intrusive meth-       called proper orthogonal decomposition (POD) (Sirovich
  ods based on proper orthogonal decomposition and radial ba-         1987; Berkooz, Holmes, and Lumley 1993), which is partic-
  sis function interpolation as well as dynamic mode decompo-         ularly effective when the coherent structures of the flow can
  sition. The test problems we consider include incompressible        be hierarchically ranked in terms of their energy content.
  flow around a cylinder as well as real-world applications of
  shallow water hydrodynamics in riverine and estuarine sys-             In the online stage of traditional RB methods, a linear
  tems. Our findings indicate that Neural ODEs provide an el-         combination of the reduced order RB modes is used to ap-
  egant framework for stable and accurate evolution of latent-        proximate the high-fidelity solution for a new configura-
  space dynamics with a promising potential of extrapolatory          tion of flow parameters. The procedure adopted to com-
  predictions. However, in order to facilitate their widespread       pute the expansion coefficients leads to the classification
  adoption for large-scale systems, significant effort needs to       of these methods into two broad categories: intrusive and
  directed at accelerating their training times. This will enable a   non-intrusive. In an intrusive RB method, the expansion co-
  more comprehensive exploration of the hyperparameter space
                                                                      efficients are determined by the solution of a reduced or-
  for building generalizable Neural ODE approximations over
  a wide range of system dynamics.                                    der system of equations, which is typically obtained via a
                                                                      Galerkin or Petrov-Galerkin projection of the high-fidelity
                                                                      (full-order) system onto the RB space (Lozovskiy, Farthing,
                        Introduction                                  and Kees 2017). Typically this projection and solution in-
Despite the trend of hardware improvements and significant            volves modification of high-fidelity simulators and hence the
gains in the algorithmic efficiency of standard discretiza-           label intrusive. For linear systems, Galerkin projection is the
tion procedures, high-fidelity numerical simulation of engi-          most popular choice. However, in the presence of nonlin-
neering systems governed by nonlinear partial differential            earities, an affine expansion of the nonlinear (or non-affine)
equations still pose a prohibitive computational challenge            differential operator must be recovered in order to make the
(Quarteroni, Manzoni, and Negri 2016) for several decision-           evaluation of the projection-based reduced model indepen-
making applications involving control (Proctor, Brunton,              dent of the number of DOFs of the high-fidelity solution.
and Kutz 2016), optimal design and multi-fidelity optimiza-              Several different techniques, collectively referred to as
tion (Peherstorfer, Willcox, and Gunzburger 2016), and/or             hyper-reduction methods (Amsallem et al. 2015), have been
uncertainty quantification (Sapsis and Majda 2013). Re-               proposed to address this problem. These include the em-
duced order models (ROMs) offer a valuable alternative way            pirical interpolation method (EIM), its discrete counterpart
to simulate such dynamical systems with considerably re-              DEIM (Chaturantabut and Sorensen 2010), “gappy POD”
duced computational cost (Benner, Gugercin, and Willcox               (Willcox 2006), as well as the residual DEIM method (Xiao
2015).                                                                et al. 2014). Beyond the need for hyper-reduction to re-
   Reduced basis (RB) methods (Quarteroni, Manzoni, and               cover efficiency, in complex nonlinear problems it is also
Negri 2016) constitute a family of widely popular ROM                 common that some of the intrinsic structures present in the
techniques that are usually implemented with an offline-              high-fidelity model may be lost during order reduction us-
online decomposition paradigm. The offline stage involves             ing Galerkin projection-based approaches. This is because
Copyright © 2021, for this paper by its authors. Use permitted un-    the Galerkin projection approach inherently assumes that
der Creative Commons License Attribution 4.0 International (CC        the residual generated by the truncated representation of the
BY 4.0).                                                              high-fidelity model is orthogonal to the reduced basis space
which leads to the loss of higher order nonlinear interac-        Machine learning techniques can be introduced at any of
tion terms in the reduced representation. This can result in      these stages. For example, many works have explored the
qualitatively wrong solutions or instability issues (Amsallem     use of deep learning-based approaches like autoencoders as
and Farhat 2012). As a remedy, Petrov-Galerkin projec-            a way to introduce a nonlinear alternative for dimension re-
tion based approaches have been proposed (Carlberg, Bou-          duction (Lusch, Nathan Kutz, and Brunton 2018; Ghorban-
Mosleh, and Farhat 2011; Carlberg et al. 2013; Fang et al.        idehno et al. 2021). Combining these methods with data-
2013).                                                            driven latent-space propagation (for example via fully con-
   An alternative family of methods to address the issues of      nected or recurrent neural networks) leads to a fully non-
instability and loss of efficiency in the intrusive ROM frame-    intrusive approach (Gonzalez and Balajewicz 2018). On the
works is represented by non-intrusive reduced order models        other hand, one can also combine nonlinear dimension re-
(NIROMs), and forms the subject of this study. The primary        duction with intrusive projection to create a hybrid method
advantage of this class of methods is that complex modifica-      (Lee and Carlberg 2020; Kim et al. 2020). In this work,
tions to the source code describing the physical model can        we study three different data-driven strategies for accurate
be avoided, thus making it easier to develop reduced models       learning of system dynamics within the context of linear di-
when the legacy or proprietary source codes are not avail-        mension reduction. In the first two methods we adopt the
able. In these methods, instead of a Galerkin-type projec-        POD technique for identification of an optimal global ba-
tion, the expansion coefficients for the reduced solution are     sis. For the latent space evolution, we utilize a kernel-based
obtained via interpolation on the space of a reduced basis        multivariate interpolation method called radial basis func-
extracted from snapshot data. However, since the reduced          tion (RBF) interpolation, and a machine learning strategy
dynamics generally belong to nonlinear, matrix manifolds,         designed for sequential learning of time-series data called
a variety of interpolation techniques have been proposed          neural ordinary differential equations (NODE). In the third
that are capable of enforcing the constraints characterizing      strategy, the three stages of ROM development are combined
those manifolds. Regression-based non-intrusive methods           together by using a classical modal decomposition technique
have been proposed that, among others, use artificial neu-        called the dynamic mode decomposition (DMD), that is sup-
ral networks (ANNs), in particular multi-layer perceptrons        ported by rigorous mathematical analysis of Koopman mode
(Hesthaven and Ubbiali 2018), Gaussian process regression         theory.
(GPR) (Guo and Hesthaven 2019), and radial basis function
(RBF) (Audouze, De Vuyst, and Nair 2013) to perform the           Proper orthogonal decomposition
interpolation.
                                                                  POD is a popular technique for dimension reduction (An-
   Here, we will explore an alternative approach to propagat-
                                                                  toulas and Sorensen 2001) of the solution manifold of a
ing latent-space dynamics based on Neural ODEs, which are
                                                                  dynamical system by determining a linear reduced space
a family of continuous-depth, differentiable networks that
                                                                  spanned by an orthogonal basis with an associated energetic
can be seen as an extension of ResNets in the limit of a zero
                                                                  hierarchy. (Taira et al. 2020) provides an excellent overview
discretization step size (Dupont, Doucet, and Teh 2019). De-
                                                                  of POD as well as a comparison with other dimension-
tails of our approach follow below. In addition, we consider
                                                                  reduction techniques.
two NIROM techniques - a) based on linear dimension re-
                                                                     Consider a snapshot matrix S = [b                b M ] ∈ RN ×M
                                                                                                         v1 , . . . , v
duction via POD and latent space evolution via Radial Ba-
                                                                  containing a collection of M high-fidelity snapshots of the
sis Functions (RBF), and b) Dynamic Mode Decomposition
                                                                  solution manifold from time t = 0 to t = T such that
(DMD) that will serve as the benchmarks in our numerical
                                                                  b k ∈ RN is the k th snapshot with the temporal mean value
                                                                  v
experiments to provide comparisons with the Neural ODE                                                                 vi
                                                                                  b k = vk − v̄ where v̄ = M
                                                                                                            P
approach. We then proceed with several numerical experi-          removed, i.e. , v                              i=1 M is the time-
ments based on incompressible flow around a cylinder and          averaged solution. The goal of thePOD procedure is to iden-
shallow water hydrodynamics in order to evaluate the meth-        tify a linear subspace χ = span ψ 1 , . . . , ψ r , (r  M )
ods’ performance for fast replay applications in complex          which approximates the solution manifold optimally with
fluid-dynamics problems.                                          respect to the L2 -norm.
                                                                     The POD bases can be efficiently extracted by performing
                                                                  a “thin” singular value decomposition (SVD) of the snap-
                       Methodology
                                                                  shot matrix S = Θ   eΣeΨ e T , where Σ
                                                                                                       e = diag(σ1 , . . . , σR ) is
The standard ROM development framework usually consists
                                                                  a R × R diagonal matrix containing the singular values ar-
of three stages:
                                                                  ranged in decreasing order of magnitude, σ1 ≥ σ2 . . . ≥ σR
1. identification of a low-dimensional latent (or reduced-        and R < min{N, M } is the rank of S. Θ     e and Ψ    e are N × R
   order) space,                                                  and M × R matrices respectively, whose columns are the
2. determining a latent-space representation of the nonlinear     orthonormal left and right singular vectors of S such that
   dynamical system in terms of the reduced basis and mod-        ΘeTΘ e = IR = Ψ   e T Ψ.
                                                                                        e The columns θ n of the matrix Θ      e are
   eling the evolution of the system of modal coefficients,       ordered corresponding to the singular values σn and these
   and                                                            provide the desired POD basis. Let Θ denote the matrix of
3. reconstruction in the high-fidelity space for validation and   the first m columns of Θ,  e Ψ be the matrix containing the
   analysis.                                                      first m rows of Ψ, and Σ be a diagonal matrix containing
                                                                                     e
the first m singular values from Σ,
                                 e then the high-fidelity so-             modal coefficients as {zl | l = 0, . . . , M − 1} such that
lution vn at time tn can be approximated as,                              Ni = Ne = M , and making some simplifying assumptions
                             m                                            leads to a symmetric, linear system of M equations to solve
                                                                          for the unknown interpolation coefficients, αj,k
                             X
  vn ≈ v̄ + Θzn = v̄ +              zin θ i ,                       (1)
                              i=1
                                                                            Aαj = g j , for j = 1, . . . , m,                              (4)
where zn ∈ Rm is a vector of modal coefficients with re-
spect to the reduced basis. The modal coefficient matrix                  where
Z = ΘT S constitutes our training data for the latent space                       [An,k ] = [φ kzn − zk k ],
                                                                                                         
                                                                                                                   n, k = 0, . . . M − 1,
learning methods. Due to the Eckart-Young-Mirsky theo-
rem, the POD basis provides an optimal rank-m approxi-                      α = [αj,0 , . . . , αj,M −1 ] , g = [gj,0 , . . . , gj,M −1 ]T .
                                                                              j                          T     j

mation Sb = ΘΣΨT of the snapshot matrix S with a desired
                                                                          The coefficients αj define a unique RBF interpolant which
level of accuracy, τP OD .                                                can then be used to approximate eq. (2) and generate a non-
   The POD method has been successfully applied in statis-                intrusive model for the evolution of the modal coefficients
tics (Jolliffe 1986), signal analysis and pattern recognition             In this work, a first-order forward Euler scheme has been
(Deheuvels and Martynov 2008), ocean models (Vermeulen                    employed for the discretization of the time derivative, and a
and Heemink 2006), air pollution models (Fang et al. 2014),               strictly positive-definite Matérn C 0 kernel, given by φ(r) =
convective Boussinesq flows (San and Borggaard 2015), and                 e−cr has been adopted, where r is the Euclidean distance
Shallow Water Equation (SWE) models (Stefanescu, Sandu,                   and c is the RBF shape factor (Fasshauer 2007).
and Navon 2014; Lozovskiy et al. 2016).
                                                                             Adopting RBF interpolation for modeling the latent space
Latent space evolution                                                    evolution of the modal coefficients has been shown to be
                                                                          quite successful for nonlinear, time-dependent partial dif-
In this section, we outline two non-intrusive methods for                 ferential equations (PDEs) (Xiao et al. 2015; Dutta et al.
modeling the evolution of time-series data in the latent space            2020), nonlinear, parametrized PDEs (Audouze, De Vuyst,
defined by the POD basis. RBF interpolation is a classical,               and Nair 2013; Xiao et al. 2017), and aerodynamic shape op-
data-driven, kernel-based method for computing an approx-                 timization (Iuliano and Quagliarella 2013), to name a few.
imate continuous response surface that aligns with the given
multivariate data. The second technique called NODE is a                  Neural ordinary differential equations Recurrent neural
neural-network based method to predict the continuous evo-                network (RNN) architectures like LSTM and GRU are of-
lution of a vector c over time, that is designed to preserve              ten employed to encode time-series data and forecast fu-
memory effects within the architecture.                                   ture states, as their internal memory preserving architec-
                                                                          ture allows them to incorporate state information over a se-
Radial basis function interpolation For simplicity, let                   quence of input data. Although RNNs have seen great suc-
the time evolution of the modal coefficients z be represented             cess in natural language processing tasks, they have had rel-
as a semi-discrete dynamical system,                                      atively limited success in high-fidelity scientific computing
   ż = f (z, t), with z0 = ΘT v0 − v̄
                                        
                                                          (2)             applications (Ferrandis et al. 2019; Wang, Ripamonti, and
                                                                          Hesthaven 2020), as it has been observed that a sequence
where all the information about the temporal dynamics in-                 generated by an RNN may fail to preserve temporal reg-
cluding the effects of any numerical stabilization of the high-           ularity of the underlying signal, and thus may not repre-
fidelity solver and all the nonlinear terms are embedded in               sent true continuous dynamics. (Chen et al. 2018). With
f (z, t). In the POD-RBF NIROM framework (Dutta et al.                    deep neural networks (DNN) such as ResNet, the evolu-
2020), instead of the Galerkin projection, the components of              tion of the features over the network depth is equivalent
the time derivative function fj (j = 1, . . . , m) are approxi-           to solving an ordinary differential equation (ODE) such as
mated using RBF interpolation.                                            dz
                                                                           dt = F (z, θ) using the forward Euler method, and this con-
   Let Fj denote a RBF approximation of the time derivative               nection between ResNet’s architecture and numerical inte-
function fj , which is defined by a linear combination of Ni              grators has been explored in details by (Ruthotto and Haber
instances of a radial basis function φ,                                   2019) and others. Several other deep learning methods have
             Ni                                                           been proposed for learning ODEs and PDEs. These include
                                                                          using PDE-based network (Long, Lu, and Dong 2019), train-
             X
  Fj (z) =         αj,k φ (kz − b
                                zk k) ,         j = 1, . . . , m,   (3)
                                                                          ing DNNs using physics-informed soft penalty constraints
             k=1
                                                                          (Raissi, Perdikaris, and Karniadakis 2019), and using sparse
where {b zk | k = 1, . . . , Ni } denotes the set of interpolation        regularizers and regression (Brunton et al. 2016; Champion
centers and αj,k (k = 1, . . . , Ni ) is the unknown interpola-           et al. 2019), to name a few.
tion coefficient corresponding to the k th center for the j th               Chen et al. (2018) proposed a ’continuous-depth’ neu-
component of the modal coefficient. These interpolation co-               ral network called ODE-Net that effectively replaces the
efficients are computed by enforcing the interpolation func-              layers in ResNet-like architectures with a trainable ODE
tion Fj to exactly match the time derivative of the modal                 solver. The memory efficiency and stability of this neural or-
coefficients at Ne test points (Ne ≥ Ni ). Choosing the cen-              dinary differential equation (NODE) approach was further
ters and the test points identically from the set of snapshot             improved in (Gholami, Keutzer, and Biros 2019; Dupont,
Doucet, and Teh 2019) and others. (Maulik et al. 2020) ap-                In this work, we utilize the TFDiffEq (https://github.com/
plied the NODE framework to obtain latent space closure                titu1994/tfdiffeq) library that runs on the Tensorflow Eager
models for ROMs of a one-dimensional advecting shock                   Execution platform to train the NODE models. Although a
problem and a one-dimensional Burgers’ turbulence prob-                single layer architecture guarantees upper-bounds accord-
lem that exhibits multiscale behavior in the wavenumber                ing to the universal approximation theorem (Barron 1993),
space. Some other notable recent applications of NODE in-              deeper networks with up to four layers as well as several
clude the identification of ODE or PDE models from time-               linear and nonlinear activation functions are also explored
dependent data (Sun, Zhang, and Schaeffer 2020), modeling              due to their possibly improved expressibility for more com-
of irregularly spaced time series data (Rubanova, Chen, and            plex nonlinear dynamics (Zhang et al. 2019). RMSProp is
Duvenaud 2019), modeling of spatio-temporal information                adopted for loss minimization with an initial learning rate
in video signals (Kanaa et al. 2019). Finlay et al. (2020) used        of 0.001, a staircase decay function with a range of variable
a combination of optimal transport theory and stability reg-           decay schedules, and a momentum coefficient of 0.9. NODE
ularizations to propose a neural-ODE generative model that             predictions of comparable accuracy were obtained for all
can be efficiently trained on large-scale datasets. Here we            the numerical experiments by using both the adjoint method
further explore the application of the POD-NODE method-                as well as by backpropagating gradients directly through
ology to complex, real-world flows characterized by systems            the hidden steps of the ode solver. However, for large-scale
of two-dimensional, nonlinear PDEs.                                    training data the latter method may lead to memory issues,
   We assume that the time evolution of the modal coef-                especially while computing on GPU nodes.
ficients of the high-fidelity dynamical system in the latent
space can be modeled using a (first-order) ODE,                        Dynamic mode decomposition
    dz                                                                 As a final point of comparison, we consider Dynamic mode
          = F(t, z(t)), with z(0) = z0 , z ∈ Rd , d ≥ 1. (5)           decomposition (DMD). DMD is a data-driven ROM tech-
     dt
                                                                       nique that represents the temporal dynamics of a com-
 The goal is to obtain a NN approximation Fb of the dynamics           plex, nonlinear system (Schmid 2010; Kutz et al. 2016)
 function F such that ddtz ≈ net(t, z) = F(t,      b z, ω). The full   as the combination of a few linearly evolving, spatially
 procedure can be outlined as follows:                                 coherent modes that oscillate at a fixed frequency, and
1. Compute the time series of modal coefficients                       which are closely related to the eigenvectors of the infinite-
    [z0 , . . . , zM −1 ] for t ∈ {0, . . . , M − 1} where zk ∈ Rm .   dimensional Koopman operator (Koopman 1931; Mezić
                                                                       2013). Consider the following snapshot matrices contain-
2. Initialize a NN approximation for the dynamics function             ing a few temporally-equispaced snapshots of a high-
    F(t,
     b z, ω) where ω represents the initial NN parameters.             dimensional dynamical system:
3. The NN parameters are optimized iteratively through the                X = v0 v1 . . . vM −1 , X0 = v1 v2 . . . vM
                                                                                                                          
    following steps.
  (a) Compute the approximate forward time trajectory of               where vk ∈ RN is the k th solution snapshot, N is the spa-
       the modal coefficients by solving eq. (5) using a stan-         tial degrees of freedom of the discretized system, and M is
       dard ODE integrator as,                                         the total number of temporal snapshots. DMD involves the
                                                                       identification of the best-fit linear operator AX that relates
         ẑM −1 = ODESolve(F,
                           b ω, z0 , t0 , tM −1 )               (6)    the above matrices as X0 = AX X, and computing its eigen-
                                                                       values and eigenvectors. Computing a least-square approx-
 (b) The free parameters of the NODE model are                         imation of AX using the Moore-Penrose pseudoinverse(† )
         0 M −1
     {ω,
       t ,t    }. Evaluate the differentiable lossfunction           may pose computational challenges due to the size of the
     L ODESolve(F,  b ω, z0 , t0 , tM −1 ) − zM −1 .                   discrete dynamical system. For computational efficiency, the
  (c) To optimise the loss, compute gradients with respect to          exact DMD algorithm (adopted here) avoids computing the
       the free parameters. Similar to the usual backpropaga-          Moore-Penrose pseudoinverse(† ) by projecting the operator
       tion algorithm, this can be achieved by first computing         on to a reduced space obtained by POD, as outlined in (Alla
       the gradient ∂L/∂b  z(t), and then a performing a reverse       and Kutz 2017).
       traversal through the intermediate states of the ODE               In recent years, Koopman mode theory has provided a rig-
       integrator. For a memory-efficient implementation, the          orous theoretical background for an efficient modal decom-
       adjoint method (Chen et al. 2018) can be used to back-          position in problems describing oscillations and other non-
       propagate the errors by solving an adjoint system for           linear dynamics using DMD (Rowley et al. 2009). Several
       the augmented state vector b = [ ∂L        ∂L ∂L T              variants of the DMD algorithm have been proposed (Proctor,
                                              z , ∂ω , ∂t ] back-
                                             ∂b
                             M −1      0                               Brunton, and Kutz 2016; Kutz, Fu, and Brunton 2016; Alek-
       wards in time from t        to t .
                     ∂L                                                seev et al. 2016; Le Clainche and Vega 2017) and have been
 (d) The gradient ∂ω     (t = 0) computed in the previous step         successfully applied as efficient ROM techniques for deter-
       is used to update the parameters ω by using an opti-            mining the optimal global basis modes for nonlinear, time-
       mization algorithm like RMSProp or Adam.                        dependent problems (Bistrian and Navon 2015, 2017). For
4. The trained NODE approximation of the dynamics func-                non-parametrized PDEs, DMD presents an efficient frame-
    tion can be used to compute predictions for the time tra-          work that combines all the three stages of ROM develop-
    jectory of the modal coefficients.                                 ment to learn a linear operator in an optimal least square
sense. However, this approach cannot be directly applied to     with “tanh” activations were found to train better when ev-
parametrized problems (Alsayyari et al. 2021).                  ery element of the input state vector was individually scaled
                                                                to be bounded in [−1, 1], while networks with “elu” activa-
               Numerical experiments                            tions trained better without scaling of input vectors. Aug-
In this section, we first assess the performance of different   mentation of input states as outlined in (Dupont, Doucet,
NODE architectures for a benchmark flow problem char-           and Teh 2019) was found to have no significant impact on
acterized by the incompressible Navier Stokes equations         the training. The RMSProp optimizer paired with either a
(NSE), and then further evaluate the relative performance       step decay function or an exponential decay function were
of all three NIROM models for two real-world applications       found to be equally effective. However, further numerical
governed by the shallow water equations (SWE). The POD-         experiments are necessary to study the efficiency of alterna-
RBF and DMD NIROM training runs were performed on               tive first-order and second-order optimization methods. The
a Macbook Pro 2018 with a 2.9 GHz 6-Core Intel Core i9          number of decay steps were varied in discrete increments
processor and 32 GB 2400 MHz DDR4 RAM. The NODE                 between 5000 to 25000, and decay rates ranging from 0.1 to
models were trained in serial on Vulcanite, a high perfor-      0.9 were studied. It was observed that a lower initial learning
mance computer at the U.S. Army Engineer Research and           rate (≈ 0.001) combined with either larger decay steps and
Development Center DoD Supercomputing Resource Center           smaller decay rates or vice versa led to a desirable training
(ERDC-DSRC). Vulcanite is equipped with NVIDIA Tesla            trajectory. Fig. 1 shows the evolution of the 1st , 3rd , and 5th
V100 PCIe GPU accelerator nodes and has 32GBytes mem-           latent-space modal coefficients for the pressure and the x-
ory/node.                                                       velocity solutions, obtained using the best 8 NODE models.
                                                                All the models generate accurate predictions at a finer tem-
Flow around a cylinder                                          poral resolution than the training data, and have excellent
                                                                agreement with the high-fidelity solution even while extrap-
This problem simulates a time-periodic fluid flow through
                                                                olating outside the training data (5 ≤ t ≤ 6 seconds).
a 2D pipe with a circular obstacle. The flow domain is a
rectangular pipe with a circular hole of radius r = 0.05,
denoted by Ω = [0, 2.2] × [−0.2, 0.21] \ Br (0.2, 0). The
flow is governed by
  ∂u
     + ∇ · (u ⊗ u) − ν∆u + ∇p = 0,                        (7)
  ∂t
                         ∇·u=0                            (8)
where u denotes the velocity, p the pressure, ⊗ is the outer
product (dyadic product) given by a ⊗ b = abT , and
ν = 0.001 is the kinematic viscosity. No slip boundary con-
ditions are specified along the lower and upper walls, and
on the boundary of the circular obstacle. A parabolic inflow
velocity profile is prescribed on the left wall,
                                           
                    (0.21 − y)(y − 0.2)
   u(0, y) = 4U                           ,0 ,            (9)
                            0.412
and zero gradient outflow boundary conditions on the right      Figure 1: Comparison of NODE models in the latent space
wall. High-fidelity simulation data is obtained with Open-      for the cylinder example
FOAM using an unstructured mesh with 14605 nodes at
Re = 100, such that the flow exhibits the periodic shedding        Fig. 2 compares the time trajectory of the spatial root
of von Karman vertices. 313 training snapshots are collected    mean square errors (RMSE) in the high-fidelity space for
for t = [2.5, 5.0] seconds with ∆t = 0.008 seconds, and the     two of the best NODE models with two DMD NIROM solu-
NIROM predictions are obtained for t = [2.5, 6.0] seconds       tions obtained using truncation levels of r = 20 and r = 8.
with ∆t = 0.002 seconds.                                        It is encouraging to note that even though the NODE solu-
   A large collection of NODE architectures and hyperpa-        tions are computed using a latent-space representation that
rameter configurations were trained for 50000 epochs and        is roughly comparable to the DMD solution with a smaller
details of the best 8 models are presented in Table 1. A        truncation level (r = 8), they are superior in accuracy to
fourth-order Runge-Kutta solver was found to be the opti-       the coarsely truncated DMD solutions. Furthermore, unlike
mal choice in terms of both accuracy and efficiency among       the POD-RBF solution that is trained with a first-order Eu-
all the available solvers ranging from the fixed-step for-      ler time discretization, the NODE solutions did not exhibit
ward Euler and the midpoint solvers to the adaptive-step        any significant loss in accuracy with time, even while pre-
Dormand-Prince (dopri5) solver. The “tanh” and “elu” acti-      dicting outside the training region. It is, however, important
vation functions were found to be the most effective among      to note that the training time for any new NODE architec-
all the available linear and nonlinear activation functions.    ture was extremely high (see Table 1) when compared to
Due to the nature of the activation functions, the networks     generating a POD-RBF or a DMD NIROM model, which
                                                                                                        LR      decay
                    Id                Layers           Units              Act.                                          Scaling   Augmented       MSE        Training
                                                                                                        steps, rate
                                                                          linear, relu,                 5000-25000,
                    Range                  1-4         32-512
                                                                          elu, tanh, ...                0.1-0.9
                    NODE1                   1            256              elu                           10000, 0.3       No           No         8.87e-4    24.45 hrs
                    NODE2                   1            256              tanh                          5000, 0.7        Yes          No         9.01e-4    24.56 hrs
                    NODE3                   1            512              elu                           5000, 0.5        No           No         8.86e-4    24.39 hrs
                    NODE4                   1            256              tanh                          10000, 0.25      Yes          Yes        9.02e-4    22.97 hrs
                    NODE5                   4             64              tanh                          5000, 0.5        Yes          No         9.19e-4    27.98 hrs
                    NODE6                   1            256              elu                           10000, 0.1       No           No         8.87e-4    24.13 hrs
                    NODE7                   2            128              elu                           5000, 0.5        No           No         8.86e-4    25.80 hrs
                    NODE8                   1            512              tanh                          5000, 0.5        Yes          Yes        9.00e-4    24.77 hrs

Table 1: Best NODE architectures for the cylinder example. All models were trained for 50000 epochs using the fourth-order
Runge-Kutta solver and the RMSProp optimizer with an initial learning rate of 1e-3 and a momentum of 0.9.


usually required less than a minute in most cases. Such long                                                      Bay in California, USA. The AdH high-fidelity model con-
training times may pose a significant challenge for exhaus-                                                       sists of N = 6311 nodes, uses tidal data obtained from
tive explorations of the design space for optimal architec-                                                       NOAA/NOS Co-Ops website at a tailwater elevation inflow
tures and hyperparameters, and may hinder the adoption of                                                         boundary and has no flow boundary conditions everywhere
existing packages for automated architecture search. Thus,                                                        else. Further details are available in (Dutta et al. 2020).
a concerted effort needs to be directed towards acceleration                                                         The training space is generated using 1801 high-fidelity
of NODE training times and towards constraining the design                                                        snapshots obtained between t = 41 minutes to t = 50
space by a priori identification of promising architectures.                                                      hours at a time interval of ∆t = 100 seconds. The pre-
                                                                                                                  dicted ROM solutions are computed for the same time in-
 0.05
                          DMD(20):p        NODE3:p     0.06                 DMD(20):vx           NODE3:vx         terval with ∆t = 50 seconds. A latent space of dimension
 0.04                     DMD(8):p         NODE5:p                          DMD(8):vx            NODE5:vx
                                                                                                                  265 is generated by using a POD truncation tolerance of
 0.03                                                  0.04
 0.02
                                                                                                                  τP OD = 5 × 10−7 for each solution component. The RBF
                                                       0.02                                                       NIROM approximation is computed using a shape factor,
 0.01
 0.00                                                  0.00                                                       c = 0.01. The simulation time points provided as input to
        2.5   3.0   3.5     4.0 4.5 5.0    5.5   6.0          2.5   3.0   3.5     4.0 4.5 5.0     5.5    6.0
                          Time (seconds)                                        Time (seconds)                    the NODE model are normalized to lie in t ∈ [0, 1]. The
                                                                                                                  ‘dopri5’ ODE solver is adopted for computing the hidden
Figure 2: Comparison of RMSE of best NODE models with                                                             states both forward and backward in time. Learning from the
DMD for the cylinder example                                                                                      conclusions of the cylinder example, a network consisting of
                                                                                                                  a single hidden layer with 256 neurons is deployed and the
                                                                                                                  RMSProp optimizer with an initial learning rate of 0.001,
                                                                                                                  a staircase decay rate of 0.5 every 5000 epochs, and a mo-
Shallow water equations                                                                                           mentum of 0.9 is utilized for training the model over 20000
The next two numerical examples involve flows governed by                                                         epochs. For the DMD NIROM, the simulation time points
the depth-averaged SWE which is written in a conservative                                                         are normalized to an unit time step, and a truncation level of
residual formulation as                                                                                           r = 115 is used to compute the DMD eigen-spectrum.
                ∂q ∂px    ∂py                                                                                        Figure 3 shows the NIROM solutions (top row) for ux at
   R≡              +    +     + r = 0,                                                                  (10)      t = 17.36 hours and the corresponding error plots.
                ∂t   ∂x    ∂y
                                                                                                                     Figure 4 shows the spatial RMSE over time for the
where the state variable q = [h, ux h, uy h]T consists of the                                                     depth (left) and the x-velocity (right) NIROM solutions. The
flow depth, h, and the discharges in the x and y directions,                                                      NODE NIROM solution has comparable accuracy to the
given by ux h and uy h, respectively. Further details about                                                       DMD NIROM solution and unlike the RBF NIROM solu-
the flux vectors px , py and the high-fidelity model equa-                                                        tion, does not exhibit any appreciable accumulation of error
tions are available in (Dutta et al. 2020). The high-fidelity                                                     over time.
numerical solutions of the SWE are obtained using the 2D
depth-averaged module of the Adaptive Hydraulics (AdH)                                                            Riverine flow in Red River The final numerical example
finite element suite, which is a U.S. Army Corps of Engi-                                                         involves an application of the 2D SWE to simulate riverine
neers (USACE) high-fidelity, finite element resource for 2D                                                       flow in a section of the Red River in Louisiana, USA. The
and 3D dynamics (Trahan et al. 2018).                                                                             AdH high-fidelity model uses N = 12291 nodes, has a natu-
                                                                                                                  ral inflow velocity condition upstream, a tailwater elevation
Tidal flow in San Diego bay This numerical example in-                                                            boundary downstream, and no flow boundary along the river
volves the simulation of tide-driven flow in the San Diego                                                        bank. For further details see (Dutta et al. 2020).
            RBF solution at t=17.36 hrs                       DMD solution at t=17.36 hrs                   NODE solution at t=17.36 hrs                               RBF solution at t=3.61 hrs                      DMD solution at t=3.61 hrs                    NODE solution at t=3.61 hrs
              0.90163 < ux < 1.09034                            0.90537 < ux < 1.07212                         0.96310 < ux < 1.06566                                    0.18894 < ux < 0.31979                          0.35848 < ux < 0.61996                        0.24505 < ux < 0.32397

                                              1.00                                               1.00                                          1.00                                                                                                     0.5                                              0.3
                                              0.75                                               0.75                                          0.75                                                                                                     0.4
                                                                                                                                                                                                        0.2                                                                                              0.2
                                              0.50                                               0.50                                          0.50                                                                                                     0.3
                                              0.25                                               0.25                                          0.25                                                     0.1                                             0.2                                              0.1
                                              0.00                                               0.00                                          0.00                                                                                                     0.1
                                                                                                                                                                                                                                                                                                         0.0
                                                                                                   0.25                                          0.25                                                   0.0                                             0.0
                                                0.25
                                                                                                                                                 0.50                                                                                                     0.1                                             0.1
                                                0.50                                               0.50                                                                                                     0.1
                                                0.75                                               0.75                                          0.75                                                                                                     0.2                                             0.2




            (a) RBF ux                                       (b) DMD ux                                    (c) NODE ux                                              (a) RBF ux                                    (b) DMD ux                                    (c) NODE ux

        0.020851