=Paper=
{{Paper
|id=Vol-2964/article_209
|storemode=property
|title=Neural Ordinary Differential Equations for Data-Driven Reduced Order Modeling of Environmental Hydrodynamics
|pdfUrl=https://ceur-ws.org/Vol-2964/article_209.pdf
|volume=Vol-2964
|authors=Sourav Dutta,Peter Rivera-Casillas,Matthew Farthing
|dblpUrl=https://dblp.org/rec/conf/aaaiss/DuttaRF21
}}
==Neural Ordinary Differential Equations for Data-Driven Reduced Order Modeling of Environmental Hydrodynamics==
Neural Ordinary Differential Equations for Data-Driven Reduced Order Modeling of Environmental Hydrodynamics Sourav Dutta,1 Peter Rivera-Casillas,2 Matthew W. Farthing,1 1 U.S. Army Engineer Research and Development Center, Coastal and Hydraulics Laboratory, Vicksburg, MS, 2 U.S. Army Engineer Research and Development Center, Information and Technology Laboratory, Vicksburg, MS, 1, 2 {sourav.dutta, peter.g.rivera-casillas, matthew.w.farthing}@erdc.dren.mil Abstract the construction of a solution-dependent, linear basis space Model reduction for fluid flow simulation continues to be of spanned by a set of RB “modes”, which are extracted from great interest across a number of scientific and engineering a collection of high-fidelity solutions, also called snapshots. fields. Here, we explore the use of Neural Ordinary Differ- The RB “modes” can be thought of as a set of global ba- ential Equations, a recently introduced family of continuous- sis functions spanning a linear subspace that can be used to depth, differentiable networks (Chen et al. 2018), as a way to approximate the dynamics of the high-fidelity model. The propagate latent-space dynamics in reduced order models. We most well known method to extract the reduced basis is compare their behavior with two classical non-intrusive meth- called proper orthogonal decomposition (POD) (Sirovich ods based on proper orthogonal decomposition and radial ba- 1987; Berkooz, Holmes, and Lumley 1993), which is partic- sis function interpolation as well as dynamic mode decompo- ularly effective when the coherent structures of the flow can sition. The test problems we consider include incompressible be hierarchically ranked in terms of their energy content. flow around a cylinder as well as real-world applications of shallow water hydrodynamics in riverine and estuarine sys- In the online stage of traditional RB methods, a linear tems. Our findings indicate that Neural ODEs provide an el- combination of the reduced order RB modes is used to ap- egant framework for stable and accurate evolution of latent- proximate the high-fidelity solution for a new configura- space dynamics with a promising potential of extrapolatory tion of flow parameters. The procedure adopted to com- predictions. However, in order to facilitate their widespread pute the expansion coefficients leads to the classification adoption for large-scale systems, significant effort needs to of these methods into two broad categories: intrusive and directed at accelerating their training times. This will enable a non-intrusive. In an intrusive RB method, the expansion co- more comprehensive exploration of the hyperparameter space efficients are determined by the solution of a reduced or- for building generalizable Neural ODE approximations over a wide range of system dynamics. der system of equations, which is typically obtained via a Galerkin or Petrov-Galerkin projection of the high-fidelity (full-order) system onto the RB space (Lozovskiy, Farthing, Introduction and Kees 2017). Typically this projection and solution in- Despite the trend of hardware improvements and significant volves modification of high-fidelity simulators and hence the gains in the algorithmic efficiency of standard discretiza- label intrusive. For linear systems, Galerkin projection is the tion procedures, high-fidelity numerical simulation of engi- most popular choice. However, in the presence of nonlin- neering systems governed by nonlinear partial differential earities, an affine expansion of the nonlinear (or non-affine) equations still pose a prohibitive computational challenge differential operator must be recovered in order to make the (Quarteroni, Manzoni, and Negri 2016) for several decision- evaluation of the projection-based reduced model indepen- making applications involving control (Proctor, Brunton, dent of the number of DOFs of the high-fidelity solution. and Kutz 2016), optimal design and multi-fidelity optimiza- Several different techniques, collectively referred to as tion (Peherstorfer, Willcox, and Gunzburger 2016), and/or hyper-reduction methods (Amsallem et al. 2015), have been uncertainty quantification (Sapsis and Majda 2013). Re- proposed to address this problem. These include the em- duced order models (ROMs) offer a valuable alternative way pirical interpolation method (EIM), its discrete counterpart to simulate such dynamical systems with considerably re- DEIM (Chaturantabut and Sorensen 2010), “gappy POD” duced computational cost (Benner, Gugercin, and Willcox (Willcox 2006), as well as the residual DEIM method (Xiao 2015). et al. 2014). Beyond the need for hyper-reduction to re- Reduced basis (RB) methods (Quarteroni, Manzoni, and cover efficiency, in complex nonlinear problems it is also Negri 2016) constitute a family of widely popular ROM common that some of the intrinsic structures present in the techniques that are usually implemented with an offline- high-fidelity model may be lost during order reduction us- online decomposition paradigm. The offline stage involves ing Galerkin projection-based approaches. This is because Copyright © 2021, for this paper by its authors. Use permitted un- the Galerkin projection approach inherently assumes that der Creative Commons License Attribution 4.0 International (CC the residual generated by the truncated representation of the BY 4.0). high-fidelity model is orthogonal to the reduced basis space which leads to the loss of higher order nonlinear interac- Machine learning techniques can be introduced at any of tion terms in the reduced representation. This can result in these stages. For example, many works have explored the qualitatively wrong solutions or instability issues (Amsallem use of deep learning-based approaches like autoencoders as and Farhat 2012). As a remedy, Petrov-Galerkin projec- a way to introduce a nonlinear alternative for dimension re- tion based approaches have been proposed (Carlberg, Bou- duction (Lusch, Nathan Kutz, and Brunton 2018; Ghorban- Mosleh, and Farhat 2011; Carlberg et al. 2013; Fang et al. idehno et al. 2021). Combining these methods with data- 2013). driven latent-space propagation (for example via fully con- An alternative family of methods to address the issues of nected or recurrent neural networks) leads to a fully non- instability and loss of efficiency in the intrusive ROM frame- intrusive approach (Gonzalez and Balajewicz 2018). On the works is represented by non-intrusive reduced order models other hand, one can also combine nonlinear dimension re- (NIROMs), and forms the subject of this study. The primary duction with intrusive projection to create a hybrid method advantage of this class of methods is that complex modifica- (Lee and Carlberg 2020; Kim et al. 2020). In this work, tions to the source code describing the physical model can we study three different data-driven strategies for accurate be avoided, thus making it easier to develop reduced models learning of system dynamics within the context of linear di- when the legacy or proprietary source codes are not avail- mension reduction. In the first two methods we adopt the able. In these methods, instead of a Galerkin-type projec- POD technique for identification of an optimal global ba- tion, the expansion coefficients for the reduced solution are sis. For the latent space evolution, we utilize a kernel-based obtained via interpolation on the space of a reduced basis multivariate interpolation method called radial basis func- extracted from snapshot data. However, since the reduced tion (RBF) interpolation, and a machine learning strategy dynamics generally belong to nonlinear, matrix manifolds, designed for sequential learning of time-series data called a variety of interpolation techniques have been proposed neural ordinary differential equations (NODE). In the third that are capable of enforcing the constraints characterizing strategy, the three stages of ROM development are combined those manifolds. Regression-based non-intrusive methods together by using a classical modal decomposition technique have been proposed that, among others, use artificial neu- called the dynamic mode decomposition (DMD), that is sup- ral networks (ANNs), in particular multi-layer perceptrons ported by rigorous mathematical analysis of Koopman mode (Hesthaven and Ubbiali 2018), Gaussian process regression theory. (GPR) (Guo and Hesthaven 2019), and radial basis function (RBF) (Audouze, De Vuyst, and Nair 2013) to perform the Proper orthogonal decomposition interpolation. POD is a popular technique for dimension reduction (An- Here, we will explore an alternative approach to propagat- toulas and Sorensen 2001) of the solution manifold of a ing latent-space dynamics based on Neural ODEs, which are dynamical system by determining a linear reduced space a family of continuous-depth, differentiable networks that spanned by an orthogonal basis with an associated energetic can be seen as an extension of ResNets in the limit of a zero hierarchy. (Taira et al. 2020) provides an excellent overview discretization step size (Dupont, Doucet, and Teh 2019). De- of POD as well as a comparison with other dimension- tails of our approach follow below. In addition, we consider reduction techniques. two NIROM techniques - a) based on linear dimension re- Consider a snapshot matrix S = [b b M ] ∈ RN ×M v1 , . . . , v duction via POD and latent space evolution via Radial Ba- containing a collection of M high-fidelity snapshots of the sis Functions (RBF), and b) Dynamic Mode Decomposition solution manifold from time t = 0 to t = T such that (DMD) that will serve as the benchmarks in our numerical b k ∈ RN is the k th snapshot with the temporal mean value v experiments to provide comparisons with the Neural ODE vi b k = vk − v̄ where v̄ = M P approach. We then proceed with several numerical experi- removed, i.e. , v i=1 M is the time- ments based on incompressible flow around a cylinder and averaged solution. The goal of thePOD procedure is to iden- shallow water hydrodynamics in order to evaluate the meth- tify a linear subspace χ = span ψ 1 , . . . , ψ r , (r M ) ods’ performance for fast replay applications in complex which approximates the solution manifold optimally with fluid-dynamics problems. respect to the L2 -norm. The POD bases can be efficiently extracted by performing a “thin” singular value decomposition (SVD) of the snap- Methodology shot matrix S = Θ eΣeΨ e T , where Σ e = diag(σ1 , . . . , σR ) is The standard ROM development framework usually consists a R × R diagonal matrix containing the singular values ar- of three stages: ranged in decreasing order of magnitude, σ1 ≥ σ2 . . . ≥ σR 1. identification of a low-dimensional latent (or reduced- and R < min{N, M } is the rank of S. Θ e and Ψ e are N × R order) space, and M × R matrices respectively, whose columns are the 2. determining a latent-space representation of the nonlinear orthonormal left and right singular vectors of S such that dynamical system in terms of the reduced basis and mod- ΘeTΘ e = IR = Ψ e T Ψ. e The columns θ n of the matrix Θ e are eling the evolution of the system of modal coefficients, ordered corresponding to the singular values σn and these and provide the desired POD basis. Let Θ denote the matrix of 3. reconstruction in the high-fidelity space for validation and the first m columns of Θ, e Ψ be the matrix containing the analysis. first m rows of Ψ, and Σ be a diagonal matrix containing e the first m singular values from Σ, e then the high-fidelity so- modal coefficients as {zl | l = 0, . . . , M − 1} such that lution vn at time tn can be approximated as, Ni = Ne = M , and making some simplifying assumptions m leads to a symmetric, linear system of M equations to solve for the unknown interpolation coefficients, αj,k X vn ≈ v̄ + Θzn = v̄ + zin θ i , (1) i=1 Aαj = g j , for j = 1, . . . , m, (4) where zn ∈ Rm is a vector of modal coefficients with re- spect to the reduced basis. The modal coefficient matrix where Z = ΘT S constitutes our training data for the latent space [An,k ] = [φ kzn − zk k ], n, k = 0, . . . M − 1, learning methods. Due to the Eckart-Young-Mirsky theo- rem, the POD basis provides an optimal rank-m approxi- α = [αj,0 , . . . , αj,M −1 ] , g = [gj,0 , . . . , gj,M −1 ]T . j T j mation Sb = ΘΣΨT of the snapshot matrix S with a desired The coefficients αj define a unique RBF interpolant which level of accuracy, τP OD . can then be used to approximate eq. (2) and generate a non- The POD method has been successfully applied in statis- intrusive model for the evolution of the modal coefficients tics (Jolliffe 1986), signal analysis and pattern recognition In this work, a first-order forward Euler scheme has been (Deheuvels and Martynov 2008), ocean models (Vermeulen employed for the discretization of the time derivative, and a and Heemink 2006), air pollution models (Fang et al. 2014), strictly positive-definite Matérn C 0 kernel, given by φ(r) = convective Boussinesq flows (San and Borggaard 2015), and e−cr has been adopted, where r is the Euclidean distance Shallow Water Equation (SWE) models (Stefanescu, Sandu, and c is the RBF shape factor (Fasshauer 2007). and Navon 2014; Lozovskiy et al. 2016). Adopting RBF interpolation for modeling the latent space Latent space evolution evolution of the modal coefficients has been shown to be quite successful for nonlinear, time-dependent partial dif- In this section, we outline two non-intrusive methods for ferential equations (PDEs) (Xiao et al. 2015; Dutta et al. modeling the evolution of time-series data in the latent space 2020), nonlinear, parametrized PDEs (Audouze, De Vuyst, defined by the POD basis. RBF interpolation is a classical, and Nair 2013; Xiao et al. 2017), and aerodynamic shape op- data-driven, kernel-based method for computing an approx- timization (Iuliano and Quagliarella 2013), to name a few. imate continuous response surface that aligns with the given multivariate data. The second technique called NODE is a Neural ordinary differential equations Recurrent neural neural-network based method to predict the continuous evo- network (RNN) architectures like LSTM and GRU are of- lution of a vector c over time, that is designed to preserve ten employed to encode time-series data and forecast fu- memory effects within the architecture. ture states, as their internal memory preserving architec- ture allows them to incorporate state information over a se- Radial basis function interpolation For simplicity, let quence of input data. Although RNNs have seen great suc- the time evolution of the modal coefficients z be represented cess in natural language processing tasks, they have had rel- as a semi-discrete dynamical system, atively limited success in high-fidelity scientific computing ż = f (z, t), with z0 = ΘT v0 − v̄ (2) applications (Ferrandis et al. 2019; Wang, Ripamonti, and Hesthaven 2020), as it has been observed that a sequence where all the information about the temporal dynamics in- generated by an RNN may fail to preserve temporal reg- cluding the effects of any numerical stabilization of the high- ularity of the underlying signal, and thus may not repre- fidelity solver and all the nonlinear terms are embedded in sent true continuous dynamics. (Chen et al. 2018). With f (z, t). In the POD-RBF NIROM framework (Dutta et al. deep neural networks (DNN) such as ResNet, the evolu- 2020), instead of the Galerkin projection, the components of tion of the features over the network depth is equivalent the time derivative function fj (j = 1, . . . , m) are approxi- to solving an ordinary differential equation (ODE) such as mated using RBF interpolation. dz dt = F (z, θ) using the forward Euler method, and this con- Let Fj denote a RBF approximation of the time derivative nection between ResNet’s architecture and numerical inte- function fj , which is defined by a linear combination of Ni grators has been explored in details by (Ruthotto and Haber instances of a radial basis function φ, 2019) and others. Several other deep learning methods have Ni been proposed for learning ODEs and PDEs. These include using PDE-based network (Long, Lu, and Dong 2019), train- X Fj (z) = αj,k φ (kz − b zk k) , j = 1, . . . , m, (3) ing DNNs using physics-informed soft penalty constraints k=1 (Raissi, Perdikaris, and Karniadakis 2019), and using sparse where {b zk | k = 1, . . . , Ni } denotes the set of interpolation regularizers and regression (Brunton et al. 2016; Champion centers and αj,k (k = 1, . . . , Ni ) is the unknown interpola- et al. 2019), to name a few. tion coefficient corresponding to the k th center for the j th Chen et al. (2018) proposed a ’continuous-depth’ neu- component of the modal coefficient. These interpolation co- ral network called ODE-Net that effectively replaces the efficients are computed by enforcing the interpolation func- layers in ResNet-like architectures with a trainable ODE tion Fj to exactly match the time derivative of the modal solver. The memory efficiency and stability of this neural or- coefficients at Ne test points (Ne ≥ Ni ). Choosing the cen- dinary differential equation (NODE) approach was further ters and the test points identically from the set of snapshot improved in (Gholami, Keutzer, and Biros 2019; Dupont, Doucet, and Teh 2019) and others. (Maulik et al. 2020) ap- In this work, we utilize the TFDiffEq (https://github.com/ plied the NODE framework to obtain latent space closure titu1994/tfdiffeq) library that runs on the Tensorflow Eager models for ROMs of a one-dimensional advecting shock Execution platform to train the NODE models. Although a problem and a one-dimensional Burgers’ turbulence prob- single layer architecture guarantees upper-bounds accord- lem that exhibits multiscale behavior in the wavenumber ing to the universal approximation theorem (Barron 1993), space. Some other notable recent applications of NODE in- deeper networks with up to four layers as well as several clude the identification of ODE or PDE models from time- linear and nonlinear activation functions are also explored dependent data (Sun, Zhang, and Schaeffer 2020), modeling due to their possibly improved expressibility for more com- of irregularly spaced time series data (Rubanova, Chen, and plex nonlinear dynamics (Zhang et al. 2019). RMSProp is Duvenaud 2019), modeling of spatio-temporal information adopted for loss minimization with an initial learning rate in video signals (Kanaa et al. 2019). Finlay et al. (2020) used of 0.001, a staircase decay function with a range of variable a combination of optimal transport theory and stability reg- decay schedules, and a momentum coefficient of 0.9. NODE ularizations to propose a neural-ODE generative model that predictions of comparable accuracy were obtained for all can be efficiently trained on large-scale datasets. Here we the numerical experiments by using both the adjoint method further explore the application of the POD-NODE method- as well as by backpropagating gradients directly through ology to complex, real-world flows characterized by systems the hidden steps of the ode solver. However, for large-scale of two-dimensional, nonlinear PDEs. training data the latter method may lead to memory issues, We assume that the time evolution of the modal coef- especially while computing on GPU nodes. ficients of the high-fidelity dynamical system in the latent space can be modeled using a (first-order) ODE, Dynamic mode decomposition dz As a final point of comparison, we consider Dynamic mode = F(t, z(t)), with z(0) = z0 , z ∈ Rd , d ≥ 1. (5) decomposition (DMD). DMD is a data-driven ROM tech- dt nique that represents the temporal dynamics of a com- The goal is to obtain a NN approximation Fb of the dynamics plex, nonlinear system (Schmid 2010; Kutz et al. 2016) function F such that ddtz ≈ net(t, z) = F(t, b z, ω). The full as the combination of a few linearly evolving, spatially procedure can be outlined as follows: coherent modes that oscillate at a fixed frequency, and 1. Compute the time series of modal coefficients which are closely related to the eigenvectors of the infinite- [z0 , . . . , zM −1 ] for t ∈ {0, . . . , M − 1} where zk ∈ Rm . dimensional Koopman operator (Koopman 1931; Mezić 2013). Consider the following snapshot matrices contain- 2. Initialize a NN approximation for the dynamics function ing a few temporally-equispaced snapshots of a high- F(t, b z, ω) where ω represents the initial NN parameters. dimensional dynamical system: 3. The NN parameters are optimized iteratively through the X = v0 v1 . . . vM −1 , X0 = v1 v2 . . . vM following steps. (a) Compute the approximate forward time trajectory of where vk ∈ RN is the k th solution snapshot, N is the spa- the modal coefficients by solving eq. (5) using a stan- tial degrees of freedom of the discretized system, and M is dard ODE integrator as, the total number of temporal snapshots. DMD involves the identification of the best-fit linear operator AX that relates ẑM −1 = ODESolve(F, b ω, z0 , t0 , tM −1 ) (6) the above matrices as X0 = AX X, and computing its eigen- values and eigenvectors. Computing a least-square approx- (b) The free parameters of the NODE model are imation of AX using the Moore-Penrose pseudoinverse(† ) 0 M −1 {ω, t ,t }. Evaluate the differentiable lossfunction may pose computational challenges due to the size of the L ODESolve(F, b ω, z0 , t0 , tM −1 ) − zM −1 . discrete dynamical system. For computational efficiency, the (c) To optimise the loss, compute gradients with respect to exact DMD algorithm (adopted here) avoids computing the the free parameters. Similar to the usual backpropaga- Moore-Penrose pseudoinverse(† ) by projecting the operator tion algorithm, this can be achieved by first computing on to a reduced space obtained by POD, as outlined in (Alla the gradient ∂L/∂b z(t), and then a performing a reverse and Kutz 2017). traversal through the intermediate states of the ODE In recent years, Koopman mode theory has provided a rig- integrator. For a memory-efficient implementation, the orous theoretical background for an efficient modal decom- adjoint method (Chen et al. 2018) can be used to back- position in problems describing oscillations and other non- propagate the errors by solving an adjoint system for linear dynamics using DMD (Rowley et al. 2009). Several the augmented state vector b = [ ∂L ∂L ∂L T variants of the DMD algorithm have been proposed (Proctor, z , ∂ω , ∂t ] back- ∂b M −1 0 Brunton, and Kutz 2016; Kutz, Fu, and Brunton 2016; Alek- wards in time from t to t . ∂L seev et al. 2016; Le Clainche and Vega 2017) and have been (d) The gradient ∂ω (t = 0) computed in the previous step successfully applied as efficient ROM techniques for deter- is used to update the parameters ω by using an opti- mining the optimal global basis modes for nonlinear, time- mization algorithm like RMSProp or Adam. dependent problems (Bistrian and Navon 2015, 2017). For 4. The trained NODE approximation of the dynamics func- non-parametrized PDEs, DMD presents an efficient frame- tion can be used to compute predictions for the time tra- work that combines all the three stages of ROM develop- jectory of the modal coefficients. ment to learn a linear operator in an optimal least square sense. However, this approach cannot be directly applied to with “tanh” activations were found to train better when ev- parametrized problems (Alsayyari et al. 2021). ery element of the input state vector was individually scaled to be bounded in [−1, 1], while networks with “elu” activa- Numerical experiments tions trained better without scaling of input vectors. Aug- In this section, we first assess the performance of different mentation of input states as outlined in (Dupont, Doucet, NODE architectures for a benchmark flow problem char- and Teh 2019) was found to have no significant impact on acterized by the incompressible Navier Stokes equations the training. The RMSProp optimizer paired with either a (NSE), and then further evaluate the relative performance step decay function or an exponential decay function were of all three NIROM models for two real-world applications found to be equally effective. However, further numerical governed by the shallow water equations (SWE). The POD- experiments are necessary to study the efficiency of alterna- RBF and DMD NIROM training runs were performed on tive first-order and second-order optimization methods. The a Macbook Pro 2018 with a 2.9 GHz 6-Core Intel Core i9 number of decay steps were varied in discrete increments processor and 32 GB 2400 MHz DDR4 RAM. The NODE between 5000 to 25000, and decay rates ranging from 0.1 to models were trained in serial on Vulcanite, a high perfor- 0.9 were studied. It was observed that a lower initial learning mance computer at the U.S. Army Engineer Research and rate (≈ 0.001) combined with either larger decay steps and Development Center DoD Supercomputing Resource Center smaller decay rates or vice versa led to a desirable training (ERDC-DSRC). Vulcanite is equipped with NVIDIA Tesla trajectory. Fig. 1 shows the evolution of the 1st , 3rd , and 5th V100 PCIe GPU accelerator nodes and has 32GBytes mem- latent-space modal coefficients for the pressure and the x- ory/node. velocity solutions, obtained using the best 8 NODE models. All the models generate accurate predictions at a finer tem- Flow around a cylinder poral resolution than the training data, and have excellent agreement with the high-fidelity solution even while extrap- This problem simulates a time-periodic fluid flow through olating outside the training data (5 ≤ t ≤ 6 seconds). a 2D pipe with a circular obstacle. The flow domain is a rectangular pipe with a circular hole of radius r = 0.05, denoted by Ω = [0, 2.2] × [−0.2, 0.21] \ Br (0.2, 0). The flow is governed by ∂u + ∇ · (u ⊗ u) − ν∆u + ∇p = 0, (7) ∂t ∇·u=0 (8) where u denotes the velocity, p the pressure, ⊗ is the outer product (dyadic product) given by a ⊗ b = abT , and ν = 0.001 is the kinematic viscosity. No slip boundary con- ditions are specified along the lower and upper walls, and on the boundary of the circular obstacle. A parabolic inflow velocity profile is prescribed on the left wall, (0.21 − y)(y − 0.2) u(0, y) = 4U ,0 , (9) 0.412 and zero gradient outflow boundary conditions on the right Figure 1: Comparison of NODE models in the latent space wall. High-fidelity simulation data is obtained with Open- for the cylinder example FOAM using an unstructured mesh with 14605 nodes at Re = 100, such that the flow exhibits the periodic shedding Fig. 2 compares the time trajectory of the spatial root of von Karman vertices. 313 training snapshots are collected mean square errors (RMSE) in the high-fidelity space for for t = [2.5, 5.0] seconds with ∆t = 0.008 seconds, and the two of the best NODE models with two DMD NIROM solu- NIROM predictions are obtained for t = [2.5, 6.0] seconds tions obtained using truncation levels of r = 20 and r = 8. with ∆t = 0.002 seconds. It is encouraging to note that even though the NODE solu- A large collection of NODE architectures and hyperpa- tions are computed using a latent-space representation that rameter configurations were trained for 50000 epochs and is roughly comparable to the DMD solution with a smaller details of the best 8 models are presented in Table 1. A truncation level (r = 8), they are superior in accuracy to fourth-order Runge-Kutta solver was found to be the opti- the coarsely truncated DMD solutions. Furthermore, unlike mal choice in terms of both accuracy and efficiency among the POD-RBF solution that is trained with a first-order Eu- all the available solvers ranging from the fixed-step for- ler time discretization, the NODE solutions did not exhibit ward Euler and the midpoint solvers to the adaptive-step any significant loss in accuracy with time, even while pre- Dormand-Prince (dopri5) solver. The “tanh” and “elu” acti- dicting outside the training region. It is, however, important vation functions were found to be the most effective among to note that the training time for any new NODE architec- all the available linear and nonlinear activation functions. ture was extremely high (see Table 1) when compared to Due to the nature of the activation functions, the networks generating a POD-RBF or a DMD NIROM model, which LR decay Id Layers Units Act. Scaling Augmented MSE Training steps, rate linear, relu, 5000-25000, Range 1-4 32-512 elu, tanh, ... 0.1-0.9 NODE1 1 256 elu 10000, 0.3 No No 8.87e-4 24.45 hrs NODE2 1 256 tanh 5000, 0.7 Yes No 9.01e-4 24.56 hrs NODE3 1 512 elu 5000, 0.5 No No 8.86e-4 24.39 hrs NODE4 1 256 tanh 10000, 0.25 Yes Yes 9.02e-4 22.97 hrs NODE5 4 64 tanh 5000, 0.5 Yes No 9.19e-4 27.98 hrs NODE6 1 256 elu 10000, 0.1 No No 8.87e-4 24.13 hrs NODE7 2 128 elu 5000, 0.5 No No 8.86e-4 25.80 hrs NODE8 1 512 tanh 5000, 0.5 Yes Yes 9.00e-4 24.77 hrs Table 1: Best NODE architectures for the cylinder example. All models were trained for 50000 epochs using the fourth-order Runge-Kutta solver and the RMSProp optimizer with an initial learning rate of 1e-3 and a momentum of 0.9. usually required less than a minute in most cases. Such long Bay in California, USA. The AdH high-fidelity model con- training times may pose a significant challenge for exhaus- sists of N = 6311 nodes, uses tidal data obtained from tive explorations of the design space for optimal architec- NOAA/NOS Co-Ops website at a tailwater elevation inflow tures and hyperparameters, and may hinder the adoption of boundary and has no flow boundary conditions everywhere existing packages for automated architecture search. Thus, else. Further details are available in (Dutta et al. 2020). a concerted effort needs to be directed towards acceleration The training space is generated using 1801 high-fidelity of NODE training times and towards constraining the design snapshots obtained between t = 41 minutes to t = 50 space by a priori identification of promising architectures. hours at a time interval of ∆t = 100 seconds. The pre- dicted ROM solutions are computed for the same time in- 0.05 DMD(20):p NODE3:p 0.06 DMD(20):vx NODE3:vx terval with ∆t = 50 seconds. A latent space of dimension 0.04 DMD(8):p NODE5:p DMD(8):vx NODE5:vx 265 is generated by using a POD truncation tolerance of 0.03 0.04 0.02 τP OD = 5 × 10−7 for each solution component. The RBF 0.02 NIROM approximation is computed using a shape factor, 0.01 0.00 0.00 c = 0.01. The simulation time points provided as input to 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 Time (seconds) Time (seconds) the NODE model are normalized to lie in t ∈ [0, 1]. The ‘dopri5’ ODE solver is adopted for computing the hidden Figure 2: Comparison of RMSE of best NODE models with states both forward and backward in time. Learning from the DMD for the cylinder example conclusions of the cylinder example, a network consisting of a single hidden layer with 256 neurons is deployed and the RMSProp optimizer with an initial learning rate of 0.001, a staircase decay rate of 0.5 every 5000 epochs, and a mo- Shallow water equations mentum of 0.9 is utilized for training the model over 20000 The next two numerical examples involve flows governed by epochs. For the DMD NIROM, the simulation time points the depth-averaged SWE which is written in a conservative are normalized to an unit time step, and a truncation level of residual formulation as r = 115 is used to compute the DMD eigen-spectrum. ∂q ∂px ∂py Figure 3 shows the NIROM solutions (top row) for ux at R≡ + + + r = 0, (10) t = 17.36 hours and the corresponding error plots. ∂t ∂x ∂y Figure 4 shows the spatial RMSE over time for the where the state variable q = [h, ux h, uy h]T consists of the depth (left) and the x-velocity (right) NIROM solutions. The flow depth, h, and the discharges in the x and y directions, NODE NIROM solution has comparable accuracy to the given by ux h and uy h, respectively. Further details about DMD NIROM solution and unlike the RBF NIROM solu- the flux vectors px , py and the high-fidelity model equa- tion, does not exhibit any appreciable accumulation of error tions are available in (Dutta et al. 2020). The high-fidelity over time. numerical solutions of the SWE are obtained using the 2D depth-averaged module of the Adaptive Hydraulics (AdH) Riverine flow in Red River The final numerical example finite element suite, which is a U.S. Army Corps of Engi- involves an application of the 2D SWE to simulate riverine neers (USACE) high-fidelity, finite element resource for 2D flow in a section of the Red River in Louisiana, USA. The and 3D dynamics (Trahan et al. 2018). AdH high-fidelity model uses N = 12291 nodes, has a natu- ral inflow velocity condition upstream, a tailwater elevation Tidal flow in San Diego bay This numerical example in- boundary downstream, and no flow boundary along the river volves the simulation of tide-driven flow in the San Diego bank. For further details see (Dutta et al. 2020). RBF solution at t=17.36 hrs DMD solution at t=17.36 hrs NODE solution at t=17.36 hrs RBF solution at t=3.61 hrs DMD solution at t=3.61 hrs NODE solution at t=3.61 hrs 0.90163 < ux < 1.09034 0.90537 < ux < 1.07212 0.96310 < ux < 1.06566 0.18894 < ux < 0.31979 0.35848 < ux < 0.61996 0.24505 < ux < 0.32397 1.00 1.00 1.00 0.5 0.3 0.75 0.75 0.75 0.4 0.2 0.2 0.50 0.50 0.50 0.3 0.25 0.25 0.25 0.1 0.2 0.1 0.00 0.00 0.00 0.1 0.0 0.25 0.25 0.0 0.0 0.25 0.50 0.1 0.1 0.50 0.50 0.1 0.75 0.75 0.75 0.2 0.2 (a) RBF ux (b) DMD ux (c) NODE ux (a) RBF ux (b) DMD ux (c) NODE ux 0.020851