=Paper= {{Paper |id=Vol-2964/article_209 |storemode=property |title=Neural Ordinary Differential Equations for Data-Driven Reduced Order Modeling of Environmental Hydrodynamics |pdfUrl=https://ceur-ws.org/Vol-2964/article_209.pdf |volume=Vol-2964 |authors=Sourav Dutta,Peter Rivera-Casillas,Matthew Farthing |dblpUrl=https://dblp.org/rec/conf/aaaiss/DuttaRF21 }} ==Neural Ordinary Differential Equations for Data-Driven Reduced Order Modeling of Environmental Hydrodynamics== https://ceur-ws.org/Vol-2964/article_209.pdf

Neural Ordinary Differential Equations for Data-Driven Reduced Order
Modeling of Environmental Hydrodynamics
Sourav Dutta,1 Peter Rivera-Casillas,2 Matthew W. Farthing,1
1
U.S. Army Engineer Research and Development Center, Coastal and Hydraulics Laboratory, Vicksburg, MS,
2
U.S. Army Engineer Research and Development Center, Information and Technology Laboratory, Vicksburg, MS,
1, 2
{sourav.dutta, peter.g.rivera-casillas, matthew.w.farthing}@erdc.dren.mil

Abstract the construction of a solution-dependent, linear basis space
Model reduction for fluid flow simulation continues to be of
spanned by a set of RB “modes”, which are extracted from
great interest across a number of scientific and engineering a collection of high-fidelity solutions, also called snapshots.
fields. Here, we explore the use of Neural Ordinary Differ- The RB “modes” can be thought of as a set of global ba-
ential Equations, a recently introduced family of continuous- sis functions spanning a linear subspace that can be used to
depth, differentiable networks (Chen et al. 2018), as a way to approximate the dynamics of the high-fidelity model. The
propagate latent-space dynamics in reduced order models. We most well known method to extract the reduced basis is
compare their behavior with two classical non-intrusive meth- called proper orthogonal decomposition (POD) (Sirovich
ods based on proper orthogonal decomposition and radial ba- 1987; Berkooz, Holmes, and Lumley 1993), which is partic-
sis function interpolation as well as dynamic mode decompo- ularly effective when the coherent structures of the flow can
sition. The test problems we consider include incompressible be hierarchically ranked in terms of their energy content.
flow around a cylinder as well as real-world applications of
shallow water hydrodynamics in riverine and estuarine sys- In the online stage of traditional RB methods, a linear
tems. Our findings indicate that Neural ODEs provide an el- combination of the reduced order RB modes is used to ap-
egant framework for stable and accurate evolution of latent- proximate the high-fidelity solution for a new configura-
space dynamics with a promising potential of extrapolatory tion of flow parameters. The procedure adopted to com-
predictions. However, in order to facilitate their widespread pute the expansion coefficients leads to the classification
adoption for large-scale systems, significant effort needs to of these methods into two broad categories: intrusive and
directed at accelerating their training times. This will enable a non-intrusive. In an intrusive RB method, the expansion co-
more comprehensive exploration of the hyperparameter space
efficients are determined by the solution of a reduced or-
for building generalizable Neural ODE approximations over
a wide range of system dynamics. der system of equations, which is typically obtained via a
Galerkin or Petrov-Galerkin projection of the high-fidelity
(full-order) system onto the RB space (Lozovskiy, Farthing,
Introduction and Kees 2017). Typically this projection and solution in-
Despite the trend of hardware improvements and significant volves modification of high-fidelity simulators and hence the
gains in the algorithmic efficiency of standard discretiza- label intrusive. For linear systems, Galerkin projection is the
tion procedures, high-fidelity numerical simulation of engi- most popular choice. However, in the presence of nonlin-
neering systems governed by nonlinear partial differential earities, an affine expansion of the nonlinear (or non-affine)
equations still pose a prohibitive computational challenge differential operator must be recovered in order to make the
(Quarteroni, Manzoni, and Negri 2016) for several decision- evaluation of the projection-based reduced model indepen-
making applications involving control (Proctor, Brunton, dent of the number of DOFs of the high-fidelity solution.
and Kutz 2016), optimal design and multi-fidelity optimiza- Several different techniques, collectively referred to as
tion (Peherstorfer, Willcox, and Gunzburger 2016), and/or hyper-reduction methods (Amsallem et al. 2015), have been
uncertainty quantification (Sapsis and Majda 2013). Re- proposed to address this problem. These include the em-
duced order models (ROMs) offer a valuable alternative way pirical interpolation method (EIM), its discrete counterpart
to simulate such dynamical systems with considerably re- DEIM (Chaturantabut and Sorensen 2010), “gappy POD”
duced computational cost (Benner, Gugercin, and Willcox (Willcox 2006), as well as the residual DEIM method (Xiao
2015). et al. 2014). Beyond the need for hyper-reduction to re-
Reduced basis (RB) methods (Quarteroni, Manzoni, and cover efficiency, in complex nonlinear problems it is also
Negri 2016) constitute a family of widely popular ROM common that some of the intrinsic structures present in the
techniques that are usually implemented with an offline- high-fidelity model may be lost during order reduction us-
online decomposition paradigm. The offline stage involves ing Galerkin projection-based approaches. This is because
Copyright © 2021, for this paper by its authors. Use permitted un- the Galerkin projection approach inherently assumes that
der Creative Commons License Attribution 4.0 International (CC the residual generated by the truncated representation of the
BY 4.0). high-fidelity model is orthogonal to the reduced basis space
which leads to the loss of higher order nonlinear interac- Machine learning techniques can be introduced at any of
tion terms in the reduced representation. This can result in these stages. For example, many works have explored the
qualitatively wrong solutions or instability issues (Amsallem use of deep learning-based approaches like autoencoders as
and Farhat 2012). As a remedy, Petrov-Galerkin projec- a way to introduce a nonlinear alternative for dimension re-
tion based approaches have been proposed (Carlberg, Bou- duction (Lusch, Nathan Kutz, and Brunton 2018; Ghorban-
Mosleh, and Farhat 2011; Carlberg et al. 2013; Fang et al. idehno et al. 2021). Combining these methods with data-
2013). driven latent-space propagation (for example via fully con-
An alternative family of methods to address the issues of nected or recurrent neural networks) leads to a fully non-
instability and loss of efficiency in the intrusive ROM frame- intrusive approach (Gonzalez and Balajewicz 2018). On the
works is represented by non-intrusive reduced order models other hand, one can also combine nonlinear dimension re-
(NIROMs), and forms the subject of this study. The primary duction with intrusive projection to create a hybrid method
advantage of this class of methods is that complex modifica- (Lee and Carlberg 2020; Kim et al. 2020). In this work,
tions to the source code describing the physical model can we study three different data-driven strategies for accurate
be avoided, thus making it easier to develop reduced models learning of system dynamics within the context of linear di-
when the legacy or proprietary source codes are not avail- mension reduction. In the first two methods we adopt the
able. In these methods, instead of a Galerkin-type projec- POD technique for identification of an optimal global ba-
tion, the expansion coefficients for the reduced solution are sis. For the latent space evolution, we utilize a kernel-based
obtained via interpolation on the space of a reduced basis multivariate interpolation method called radial basis func-
extracted from snapshot data. However, since the reduced tion (RBF) interpolation, and a machine learning strategy
dynamics generally belong to nonlinear, matrix manifolds, designed for sequential learning of time-series data called
a variety of interpolation techniques have been proposed neural ordinary differential equations (NODE). In the third
that are capable of enforcing the constraints characterizing strategy, the three stages of ROM development are combined
those manifolds. Regression-based non-intrusive methods together by using a classical modal decomposition technique
have been proposed that, among others, use artificial neu- called the dynamic mode decomposition (DMD), that is sup-
ral networks (ANNs), in particular multi-layer perceptrons ported by rigorous mathematical analysis of Koopman mode
(Hesthaven and Ubbiali 2018), Gaussian process regression theory.
(GPR) (Guo and Hesthaven 2019), and radial basis function
(RBF) (Audouze, De Vuyst, and Nair 2013) to perform the Proper orthogonal decomposition
interpolation.
POD is a popular technique for dimension reduction (An-
Here, we will explore an alternative approach to propagat-
toulas and Sorensen 2001) of the solution manifold of a
ing latent-space dynamics based on Neural ODEs, which are
dynamical system by determining a linear reduced space
a family of continuous-depth, differentiable networks that
spanned by an orthogonal basis with an associated energetic
can be seen as an extension of ResNets in the limit of a zero
hierarchy. (Taira et al. 2020) provides an excellent overview
discretization step size (Dupont, Doucet, and Teh 2019). De-
of POD as well as a comparison with other dimension-
tails of our approach follow below. In addition, we consider
reduction techniques.
two NIROM techniques - a) based on linear dimension re-
Consider a snapshot matrix S = [b b M ] ∈ RN ×M
v1 , . . . , v
duction via POD and latent space evolution via Radial Ba-
containing a collection of M high-fidelity snapshots of the
sis Functions (RBF), and b) Dynamic Mode Decomposition
solution manifold from time t = 0 to t = T such that
(DMD) that will serve as the benchmarks in our numerical
b k ∈ RN is the k th snapshot with the temporal mean value
v
experiments to provide comparisons with the Neural ODE vi
b k = vk − v̄ where v̄ = M
P
approach. We then proceed with several numerical experi- removed, i.e. , v i=1 M is the time-
ments based on incompressible flow around a cylinder and averaged solution. The goal of thePOD procedure is to iden-
shallow water hydrodynamics in order to evaluate the meth- tify a linear subspace χ = span ψ 1 , . . . , ψ r , (r M )
ods’ performance for fast replay applications in complex which approximates the solution manifold optimally with
fluid-dynamics problems. respect to the L2 -norm.
The POD bases can be efficiently extracted by performing
a “thin” singular value decomposition (SVD) of the snap-
Methodology
shot matrix S = Θ eΣeΨ e T , where Σ
e = diag(σ1 , . . . , σR ) is
The standard ROM development framework usually consists
a R × R diagonal matrix containing the singular values ar-
of three stages:
ranged in decreasing order of magnitude, σ1 ≥ σ2 . . . ≥ σR
1. identification of a low-dimensional latent (or reduced- and R < min{N, M } is the rank of S. Θ e and Ψ e are N × R
order) space, and M × R matrices respectively, whose columns are the
2. determining a latent-space representation of the nonlinear orthonormal left and right singular vectors of S such that
dynamical system in terms of the reduced basis and mod- ΘeTΘ e = IR = Ψ e T Ψ.
e The columns θ n of the matrix Θ e are
eling the evolution of the system of modal coefficients, ordered corresponding to the singular values σn and these
and provide the desired POD basis. Let Θ denote the matrix of
3. reconstruction in the high-fidelity space for validation and the first m columns of Θ, e Ψ be the matrix containing the
analysis. first m rows of Ψ, and Σ be a diagonal matrix containing
e
the first m singular values from Σ,
e then the high-fidelity so- modal coefficients as {zl | l = 0, . . . , M − 1} such that
lution vn at time tn can be approximated as, Ni = Ne = M , and making some simplifying assumptions
m leads to a symmetric, linear system of M equations to solve
for the unknown interpolation coefficients, αj,k
X
vn ≈ v̄ + Θzn = v̄ + zin θ i , (1)
i=1
Aαj = g j , for j = 1, . . . , m, (4)
where zn ∈ Rm is a vector of modal coefficients with re-
spect to the reduced basis. The modal coefficient matrix where
Z = ΘT S constitutes our training data for the latent space [An,k ] = [φ kzn − zk k ],

n, k = 0, . . . M − 1,
learning methods. Due to the Eckart-Young-Mirsky theo-
rem, the POD basis provides an optimal rank-m approxi- α = [αj,0 , . . . , αj,M −1 ] , g = [gj,0 , . . . , gj,M −1 ]T .
j T j

mation Sb = ΘΣΨT of the snapshot matrix S with a desired
The coefficients αj define a unique RBF interpolant which
level of accuracy, τP OD . can then be used to approximate eq. (2) and generate a non-
The POD method has been successfully applied in statis- intrusive model for the evolution of the modal coefficients
tics (Jolliffe 1986), signal analysis and pattern recognition In this work, a first-order forward Euler scheme has been
(Deheuvels and Martynov 2008), ocean models (Vermeulen employed for the discretization of the time derivative, and a
and Heemink 2006), air pollution models (Fang et al. 2014), strictly positive-definite Matérn C 0 kernel, given by φ(r) =
convective Boussinesq flows (San and Borggaard 2015), and e−cr has been adopted, where r is the Euclidean distance
Shallow Water Equation (SWE) models (Stefanescu, Sandu, and c is the RBF shape factor (Fasshauer 2007).
and Navon 2014; Lozovskiy et al. 2016).
Adopting RBF interpolation for modeling the latent space
Latent space evolution evolution of the modal coefficients has been shown to be
quite successful for nonlinear, time-dependent partial dif-
In this section, we outline two non-intrusive methods for ferential equations (PDEs) (Xiao et al. 2015; Dutta et al.
modeling the evolution of time-series data in the latent space 2020), nonlinear, parametrized PDEs (Audouze, De Vuyst,
defined by the POD basis. RBF interpolation is a classical, and Nair 2013; Xiao et al. 2017), and aerodynamic shape op-
data-driven, kernel-based method for computing an approx- timization (Iuliano and Quagliarella 2013), to name a few.
imate continuous response surface that aligns with the given
multivariate data. The second technique called NODE is a Neural ordinary differential equations Recurrent neural
neural-network based method to predict the continuous evo- network (RNN) architectures like LSTM and GRU are of-
lution of a vector c over time, that is designed to preserve ten employed to encode time-series data and forecast fu-
memory effects within the architecture. ture states, as their internal memory preserving architec-
ture allows them to incorporate state information over a se-
Radial basis function interpolation For simplicity, let quence of input data. Although RNNs have seen great suc-
the time evolution of the modal coefficients z be represented cess in natural language processing tasks, they have had rel-
as a semi-discrete dynamical system, atively limited success in high-fidelity scientific computing
ż = f (z, t), with z0 = ΘT v0 − v̄

(2) applications (Ferrandis et al. 2019; Wang, Ripamonti, and
Hesthaven 2020), as it has been observed that a sequence
where all the information about the temporal dynamics in- generated by an RNN may fail to preserve temporal reg-
cluding the effects of any numerical stabilization of the high- ularity of the underlying signal, and thus may not repre-
fidelity solver and all the nonlinear terms are embedded in sent true continuous dynamics. (Chen et al. 2018). With
f (z, t). In the POD-RBF NIROM framework (Dutta et al. deep neural networks (DNN) such as ResNet, the evolu-
2020), instead of the Galerkin projection, the components of tion of the features over the network depth is equivalent
the time derivative function fj (j = 1, . . . , m) are approxi- to solving an ordinary differential equation (ODE) such as
mated using RBF interpolation. dz
dt = F (z, θ) using the forward Euler method, and this con-
Let Fj denote a RBF approximation of the time derivative nection between ResNet’s architecture and numerical inte-
function fj , which is defined by a linear combination of Ni grators has been explored in details by (Ruthotto and Haber
instances of a radial basis function φ, 2019) and others. Several other deep learning methods have
Ni been proposed for learning ODEs and PDEs. These include
using PDE-based network (Long, Lu, and Dong 2019), train-
X
Fj (z) = αj,k φ (kz − b
zk k) , j = 1, . . . , m, (3)
ing DNNs using physics-informed soft penalty constraints
k=1
(Raissi, Perdikaris, and Karniadakis 2019), and using sparse
where {b zk | k = 1, . . . , Ni } denotes the set of interpolation regularizers and regression (Brunton et al. 2016; Champion
centers and αj,k (k = 1, . . . , Ni ) is the unknown interpola- et al. 2019), to name a few.
tion coefficient corresponding to the k th center for the j th Chen et al. (2018) proposed a ’continuous-depth’ neu-
component of the modal coefficient. These interpolation co- ral network called ODE-Net that effectively replaces the
efficients are computed by enforcing the interpolation func- layers in ResNet-like architectures with a trainable ODE
tion Fj to exactly match the time derivative of the modal solver. The memory efficiency and stability of this neural or-
coefficients at Ne test points (Ne ≥ Ni ). Choosing the cen- dinary differential equation (NODE) approach was further
ters and the test points identically from the set of snapshot improved in (Gholami, Keutzer, and Biros 2019; Dupont,
Doucet, and Teh 2019) and others. (Maulik et al. 2020) ap- In this work, we utilize the TFDiffEq (https://github.com/
plied the NODE framework to obtain latent space closure titu1994/tfdiffeq) library that runs on the Tensorflow Eager
models for ROMs of a one-dimensional advecting shock Execution platform to train the NODE models. Although a
problem and a one-dimensional Burgers’ turbulence prob- single layer architecture guarantees upper-bounds accord-
lem that exhibits multiscale behavior in the wavenumber ing to the universal approximation theorem (Barron 1993),
space. Some other notable recent applications of NODE in- deeper networks with up to four layers as well as several
clude the identification of ODE or PDE models from time- linear and nonlinear activation functions are also explored
dependent data (Sun, Zhang, and Schaeffer 2020), modeling due to their possibly improved expressibility for more com-
of irregularly spaced time series data (Rubanova, Chen, and plex nonlinear dynamics (Zhang et al. 2019). RMSProp is
Duvenaud 2019), modeling of spatio-temporal information adopted for loss minimization with an initial learning rate
in video signals (Kanaa et al. 2019). Finlay et al. (2020) used of 0.001, a staircase decay function with a range of variable
a combination of optimal transport theory and stability reg- decay schedules, and a momentum coefficient of 0.9. NODE
ularizations to propose a neural-ODE generative model that predictions of comparable accuracy were obtained for all
can be efficiently trained on large-scale datasets. Here we the numerical experiments by using both the adjoint method
further explore the application of the POD-NODE method- as well as by backpropagating gradients directly through
ology to complex, real-world flows characterized by systems the hidden steps of the ode solver. However, for large-scale
of two-dimensional, nonlinear PDEs. training data the latter method may lead to memory issues,
We assume that the time evolution of the modal coef- especially while computing on GPU nodes.
ficients of the high-fidelity dynamical system in the latent
space can be modeled using a (first-order) ODE, Dynamic mode decomposition
dz As a final point of comparison, we consider Dynamic mode
= F(t, z(t)), with z(0) = z0 , z ∈ Rd , d ≥ 1. (5) decomposition (DMD). DMD is a data-driven ROM tech-
dt
nique that represents the temporal dynamics of a com-
The goal is to obtain a NN approximation Fb of the dynamics plex, nonlinear system (Schmid 2010; Kutz et al. 2016)
function F such that ddtz ≈ net(t, z) = F(t, b z, ω). The full as the combination of a few linearly evolving, spatially
procedure can be outlined as follows: coherent modes that oscillate at a fixed frequency, and
1. Compute the time series of modal coefficients which are closely related to the eigenvectors of the infinite-
[z0 , . . . , zM −1 ] for t ∈ {0, . . . , M − 1} where zk ∈ Rm . dimensional Koopman operator (Koopman 1931; Mezić
2013). Consider the following snapshot matrices contain-
2. Initialize a NN approximation for the dynamics function ing a few temporally-equispaced snapshots of a high-
F(t,
b z, ω) where ω represents the initial NN parameters. dimensional dynamical system:
3. The NN parameters are optimized iteratively through the X = v0 v1 . . . vM −1 , X0 = v1 v2 . . . vM

following steps.
(a) Compute the approximate forward time trajectory of where vk ∈ RN is the k th solution snapshot, N is the spa-
the modal coefficients by solving eq. (5) using a stan- tial degrees of freedom of the discretized system, and M is
dard ODE integrator as, the total number of temporal snapshots. DMD involves the
identification of the best-fit linear operator AX that relates
ẑM −1 = ODESolve(F,
b ω, z0 , t0 , tM −1 ) (6) the above matrices as X0 = AX X, and computing its eigen-
values and eigenvectors. Computing a least-square approx-
(b) The free parameters of the NODE model are imation of AX using the Moore-Penrose pseudoinverse(† )
0 M −1
{ω,
t ,t }. Evaluate the differentiable lossfunction may pose computational challenges due to the size of the
L ODESolve(F, b ω, z0 , t0 , tM −1 ) − zM −1 . discrete dynamical system. For computational efficiency, the
(c) To optimise the loss, compute gradients with respect to exact DMD algorithm (adopted here) avoids computing the
the free parameters. Similar to the usual backpropaga- Moore-Penrose pseudoinverse(† ) by projecting the operator
tion algorithm, this can be achieved by first computing on to a reduced space obtained by POD, as outlined in (Alla
the gradient ∂L/∂b z(t), and then a performing a reverse and Kutz 2017).
traversal through the intermediate states of the ODE In recent years, Koopman mode theory has provided a rig-
integrator. For a memory-efficient implementation, the orous theoretical background for an efficient modal decom-
adjoint method (Chen et al. 2018) can be used to back- position in problems describing oscillations and other non-
propagate the errors by solving an adjoint system for linear dynamics using DMD (Rowley et al. 2009). Several
the augmented state vector b = [ ∂L ∂L ∂L T variants of the DMD algorithm have been proposed (Proctor,
z , ∂ω , ∂t ] back-
∂b
M −1 0 Brunton, and Kutz 2016; Kutz, Fu, and Brunton 2016; Alek-
wards in time from t to t .
∂L seev et al. 2016; Le Clainche and Vega 2017) and have been
(d) The gradient ∂ω (t = 0) computed in the previous step successfully applied as efficient ROM techniques for deter-
is used to update the parameters ω by using an opti- mining the optimal global basis modes for nonlinear, time-
mization algorithm like RMSProp or Adam. dependent problems (Bistrian and Navon 2015, 2017). For
4. The trained NODE approximation of the dynamics func- non-parametrized PDEs, DMD presents an efficient frame-
tion can be used to compute predictions for the time tra- work that combines all the three stages of ROM develop-
jectory of the modal coefficients. ment to learn a linear operator in an optimal least square
sense. However, this approach cannot be directly applied to with “tanh” activations were found to train better when ev-
parametrized problems (Alsayyari et al. 2021). ery element of the input state vector was individually scaled
to be bounded in [−1, 1], while networks with “elu” activa-
Numerical experiments tions trained better without scaling of input vectors. Aug-
In this section, we first assess the performance of different mentation of input states as outlined in (Dupont, Doucet,
NODE architectures for a benchmark flow problem char- and Teh 2019) was found to have no significant impact on
acterized by the incompressible Navier Stokes equations the training. The RMSProp optimizer paired with either a
(NSE), and then further evaluate the relative performance step decay function or an exponential decay function were
of all three NIROM models for two real-world applications found to be equally effective. However, further numerical
governed by the shallow water equations (SWE). The POD- experiments are necessary to study the efficiency of alterna-
RBF and DMD NIROM training runs were performed on tive first-order and second-order optimization methods. The
a Macbook Pro 2018 with a 2.9 GHz 6-Core Intel Core i9 number of decay steps were varied in discrete increments
processor and 32 GB 2400 MHz DDR4 RAM. The NODE between 5000 to 25000, and decay rates ranging from 0.1 to
models were trained in serial on Vulcanite, a high perfor- 0.9 were studied. It was observed that a lower initial learning
mance computer at the U.S. Army Engineer Research and rate (≈ 0.001) combined with either larger decay steps and
Development Center DoD Supercomputing Resource Center smaller decay rates or vice versa led to a desirable training
(ERDC-DSRC). Vulcanite is equipped with NVIDIA Tesla trajectory. Fig. 1 shows the evolution of the 1st , 3rd , and 5th
V100 PCIe GPU accelerator nodes and has 32GBytes mem- latent-space modal coefficients for the pressure and the x-
ory/node. velocity solutions, obtained using the best 8 NODE models.
All the models generate accurate predictions at a finer tem-
Flow around a cylinder poral resolution than the training data, and have excellent
agreement with the high-fidelity solution even while extrap-
This problem simulates a time-periodic fluid flow through
olating outside the training data (5 ≤ t ≤ 6 seconds).
a 2D pipe with a circular obstacle. The flow domain is a
rectangular pipe with a circular hole of radius r = 0.05,
denoted by Ω = [0, 2.2] × [−0.2, 0.21] \ Br (0.2, 0). The
flow is governed by
∂u
+ ∇ · (u ⊗ u) − ν∆u + ∇p = 0, (7)
∂t
∇·u=0 (8)
where u denotes the velocity, p the pressure, ⊗ is the outer
product (dyadic product) given by a ⊗ b = abT , and
ν = 0.001 is the kinematic viscosity. No slip boundary con-
ditions are specified along the lower and upper walls, and
on the boundary of the circular obstacle. A parabolic inflow
velocity profile is prescribed on the left wall,

(0.21 − y)(y − 0.2)
u(0, y) = 4U ,0 , (9)
0.412
and zero gradient outflow boundary conditions on the right Figure 1: Comparison of NODE models in the latent space
wall. High-fidelity simulation data is obtained with Open- for the cylinder example
FOAM using an unstructured mesh with 14605 nodes at
Re = 100, such that the flow exhibits the periodic shedding Fig. 2 compares the time trajectory of the spatial root
of von Karman vertices. 313 training snapshots are collected mean square errors (RMSE) in the high-fidelity space for
for t = [2.5, 5.0] seconds with ∆t = 0.008 seconds, and the two of the best NODE models with two DMD NIROM solu-
NIROM predictions are obtained for t = [2.5, 6.0] seconds tions obtained using truncation levels of r = 20 and r = 8.
with ∆t = 0.002 seconds. It is encouraging to note that even though the NODE solu-
A large collection of NODE architectures and hyperpa- tions are computed using a latent-space representation that
rameter configurations were trained for 50000 epochs and is roughly comparable to the DMD solution with a smaller
details of the best 8 models are presented in Table 1. A truncation level (r = 8), they are superior in accuracy to
fourth-order Runge-Kutta solver was found to be the opti- the coarsely truncated DMD solutions. Furthermore, unlike
mal choice in terms of both accuracy and efficiency among the POD-RBF solution that is trained with a first-order Eu-
all the available solvers ranging from the fixed-step for- ler time discretization, the NODE solutions did not exhibit
ward Euler and the midpoint solvers to the adaptive-step any significant loss in accuracy with time, even while pre-
Dormand-Prince (dopri5) solver. The “tanh” and “elu” acti- dicting outside the training region. It is, however, important
vation functions were found to be the most effective among to note that the training time for any new NODE architec-
all the available linear and nonlinear activation functions. ture was extremely high (see Table 1) when compared to
Due to the nature of the activation functions, the networks generating a POD-RBF or a DMD NIROM model, which
LR decay
Id Layers Units Act. Scaling Augmented MSE Training
steps, rate
linear, relu, 5000-25000,
Range 1-4 32-512
elu, tanh, ... 0.1-0.9
NODE1 1 256 elu 10000, 0.3 No No 8.87e-4 24.45 hrs
NODE2 1 256 tanh 5000, 0.7 Yes No 9.01e-4 24.56 hrs
NODE3 1 512 elu 5000, 0.5 No No 8.86e-4 24.39 hrs
NODE4 1 256 tanh 10000, 0.25 Yes Yes 9.02e-4 22.97 hrs
NODE5 4 64 tanh 5000, 0.5 Yes No 9.19e-4 27.98 hrs
NODE6 1 256 elu 10000, 0.1 No No 8.87e-4 24.13 hrs
NODE7 2 128 elu 5000, 0.5 No No 8.86e-4 25.80 hrs
NODE8 1 512 tanh 5000, 0.5 Yes Yes 9.00e-4 24.77 hrs

Table 1: Best NODE architectures for the cylinder example. All models were trained for 50000 epochs using the fourth-order
Runge-Kutta solver and the RMSProp optimizer with an initial learning rate of 1e-3 and a momentum of 0.9.

usually required less than a minute in most cases. Such long Bay in California, USA. The AdH high-fidelity model con-
training times may pose a significant challenge for exhaus- sists of N = 6311 nodes, uses tidal data obtained from
tive explorations of the design space for optimal architec- NOAA/NOS Co-Ops website at a tailwater elevation inflow
tures and hyperparameters, and may hinder the adoption of boundary and has no flow boundary conditions everywhere
existing packages for automated architecture search. Thus, else. Further details are available in (Dutta et al. 2020).
a concerted effort needs to be directed towards acceleration The training space is generated using 1801 high-fidelity
of NODE training times and towards constraining the design snapshots obtained between t = 41 minutes to t = 50
space by a priori identification of promising architectures. hours at a time interval of ∆t = 100 seconds. The pre-
dicted ROM solutions are computed for the same time in-
0.05
DMD(20):p NODE3:p 0.06 DMD(20):vx NODE3:vx terval with ∆t = 50 seconds. A latent space of dimension
0.04 DMD(8):p NODE5:p DMD(8):vx NODE5:vx
265 is generated by using a POD truncation tolerance of
0.03 0.04
0.02
τP OD = 5 × 10−7 for each solution component. The RBF
0.02 NIROM approximation is computed using a shape factor,
0.01
0.00 0.00 c = 0.01. The simulation time points provided as input to
2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0
Time (seconds) Time (seconds) the NODE model are normalized to lie in t ∈ [0, 1]. The
‘dopri5’ ODE solver is adopted for computing the hidden
Figure 2: Comparison of RMSE of best NODE models with states both forward and backward in time. Learning from the
DMD for the cylinder example conclusions of the cylinder example, a network consisting of
a single hidden layer with 256 neurons is deployed and the
RMSProp optimizer with an initial learning rate of 0.001,
a staircase decay rate of 0.5 every 5000 epochs, and a mo-
Shallow water equations mentum of 0.9 is utilized for training the model over 20000
The next two numerical examples involve flows governed by epochs. For the DMD NIROM, the simulation time points
the depth-averaged SWE which is written in a conservative are normalized to an unit time step, and a truncation level of
residual formulation as r = 115 is used to compute the DMD eigen-spectrum.
∂q ∂px ∂py Figure 3 shows the NIROM solutions (top row) for ux at
R≡ + + + r = 0, (10) t = 17.36 hours and the corresponding error plots.
∂t ∂x ∂y
Figure 4 shows the spatial RMSE over time for the
where the state variable q = [h, ux h, uy h]T consists of the depth (left) and the x-velocity (right) NIROM solutions. The
flow depth, h, and the discharges in the x and y directions, NODE NIROM solution has comparable accuracy to the
given by ux h and uy h, respectively. Further details about DMD NIROM solution and unlike the RBF NIROM solu-
the flux vectors px , py and the high-fidelity model equa- tion, does not exhibit any appreciable accumulation of error
tions are available in (Dutta et al. 2020). The high-fidelity over time.
numerical solutions of the SWE are obtained using the 2D
depth-averaged module of the Adaptive Hydraulics (AdH) Riverine flow in Red River The final numerical example
finite element suite, which is a U.S. Army Corps of Engi- involves an application of the 2D SWE to simulate riverine
neers (USACE) high-fidelity, finite element resource for 2D flow in a section of the Red River in Louisiana, USA. The
and 3D dynamics (Trahan et al. 2018). AdH high-fidelity model uses N = 12291 nodes, has a natu-
ral inflow velocity condition upstream, a tailwater elevation
Tidal flow in San Diego bay This numerical example in- boundary downstream, and no flow boundary along the river
volves the simulation of tide-driven flow in the San Diego bank. For further details see (Dutta et al. 2020).
RBF solution at t=17.36 hrs DMD solution at t=17.36 hrs NODE solution at t=17.36 hrs RBF solution at t=3.61 hrs DMD solution at t=3.61 hrs NODE solution at t=3.61 hrs
0.90163 < ux < 1.09034 0.90537 < ux < 1.07212 0.96310 < ux < 1.06566 0.18894 < ux < 0.31979 0.35848 < ux < 0.61996 0.24505 < ux < 0.32397

1.00 1.00 1.00 0.5 0.3
0.75 0.75 0.75 0.4
0.2 0.2
0.50 0.50 0.50 0.3
0.25 0.25 0.25 0.1 0.2 0.1
0.00 0.00 0.00 0.1
0.0
0.25 0.25 0.0 0.0
0.25
0.50 0.1 0.1
0.50 0.50 0.1
0.75 0.75 0.75 0.2 0.2

(a) RBF ux (b) DMD ux (c) NODE ux (a) RBF ux (b) DMD ux (c) NODE ux

0.020851