Variational Autoencoders for Learning Nonlinear Dynamics of Physical Systems

Ryan Lopez 3, Paul J. Atzberger 1,2,+ *
1 Department of Mathematics, University of California Santa Barbara (UCSB).
2 Department of Mechanical Engineering, University of California Santa Barbara (UCSB).
3 Department of Physics, University of California Santa Barbara (UCSB).
+ atzberg@gmail.com http://atzberger.org/

* Work supported by grants DOE Grant ASCR PHILMS DE-SC0019246 and NSF Grant DMS-1616353.
Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

We develop data-driven methods for incorporating physical information into priors to learn parsimonious representations of nonlinear systems arising from parameterized PDEs and mechanics. Our approach is based on Variational Autoencoders (VAEs) for learning nonlinear state space models from observations. We develop ways to incorporate geometric and topological priors through general manifold latent space representations. We investigate the performance of our methods for learning low dimensional representations for the nonlinear Burgers equation and constrained mechanical systems.

Introduction

The general problem of learning dynamical models from a time series of observations has a long history spanning many fields [51, 67, 15, 35], including dynamical systems [8, 67, 68, 47, 50, 52, 32, 19, 23], control [9, 51, 60, 63], statistics [1, 48, 26], and machine learning [15, 35, 46, 58, 3, 73]. Referred to as system identification in control and engineering, many approaches have been developed starting with linear dynamical systems (LDS). These include the Kalman Filter and extensions [39, 22, 28, 70, 71], Proper Orthogonal Decomposition (POD) [12, 49], and more recently Dynamic Mode Decomposition (DMD) [63, 45, 69] and Koopman Operator approaches [50, 20, 42]. These successful and widely-used approaches rely on assumptions on the model structure, most commonly, that a time-invariant LDS provides a good local approximation or that noise is Gaussian.

There also has been research on more general nonlinear system identification [1, 65, 15, 35, 66, 47, 48, 51]. Nonlinear systems pose many open challenges and fewer unified approaches given the rich behaviors of nonlinear dynamics. For classes of systems and specific application domains, methods have been developed which make different levels of assumptions about the underlying structure of the dynamics. Methods for learning nonlinear dynamics include the NARX and NOE approaches with function approximators based on neural networks and other model classes [51, 67], sparse symbolic dictionary methods that are linear-in-parameters such as SINDy [9, 64, 67], and dynamic Bayesian networks (DBNs), such as Hidden Markov Models (HMMs) and Hidden-Physics Models [58, 54, 62, 5, 43, 26].

A central challenge in learning non-linear dynamics is to obtain representations that are not only capable of reproducing outputs similar to those observed directly in the training dataset, but that also infer structures providing stable longer-term extrapolation over multiple future steps and input states. In this work, we develop learning methods aiming to obtain robust non-linear models by providing ways to incorporate more structure and information about the underlying system related to smoothness, periodicity, topology, and other constraints. We focus particularly on developing Probabilistic Autoencoders (PAEs) that incorporate noise-based regularization and priors to learn lower dimensional representations from observations. This provides the basis of non-linear state space models for prediction. We develop methods for incorporating into such representations geometric and topological information about the system. This facilitates capturing qualitative features of the dynamics to enhance robustness and to aid in interpretability of results. We demonstrate and perform investigations of our methods to obtain models for reductions of parameterized PDEs and for constrained mechanical systems.

Learning Nonlinear Dynamics with Variational Autoencoders (VAEs)

We develop data-driven approaches based on a Variational Autoencoder (VAE) framework [40]. We learn from observation data a set of lower dimensional representations that are used to make predictions for the dynamics. In practice, data can include experimental measurements, large-scale computational simulations, or solutions of complicated dynamical systems for which we seek reduced models. Reductions aid in gaining insights for a class of inputs or physical regimes into the underlying mechanisms generating the observed behaviors. Reduced descriptions are also helpful in many optimization problems in design and in development of controllers [51].

Standard autoencoders can result in encodings that yield unstructured, scattered, disconnected coding points for system features z. VAEs provide probabilistic encoders and decoders where noise provides regularizations that promote more connected encodings, smoother dependence on inputs, and more disentangled feature components [40]. As we shall discuss, we also introduce other regularizations into our methods to help aid in interpretation of the learned latent representations.

Figure 1: Learning Nonlinear Dynamics. Data-driven methods are developed for learning robust models to predict from u(x, t) the non-linear evolution to u(x, t+τ) for PDEs and other dynamical systems. Probabilistic Autoencoders (PAEs) are utilized to learn representations z of u(x, t) in low dimensional latent spaces with prescribed geometric and topological properties. The model makes predictions using learnable maps that (i) encode an input u(x, t) ∈ U as z(t) in latent space (top), (ii) evolve the representation z(t) → z(t + τ) (top-right), and (iii) decode the representation z(t + τ) to predict û(x, t + τ) (bottom-right).

Figure 2: Variational Autoencoder (VAE). VAEs [40] are used to learn representations of the nonlinear dynamics. Deep Neural Networks (DNNs) are trained (i) to serve as feature extractors to represent functions u(x, t) and their evolution in a low dimensional latent space as z(t) (encoder ∼ qθe), and (ii) to serve as approximators that can construct predictions u(x, t+τ) using features z(t+τ) (decoder ∼ pθd).

We learn VAE predictors using a Maximum Likelihood Estimation (MLE) approach for the Log Likelihood (LL) L_LL = log(pθ(X, x)). For dynamics of u(s), let X = u(t) and x = u(t+τ). We base pθ on the autoencoder framework in Figures 1 and 2. We use variational inference to approximate the LL by the Evidence Lower Bound (ELBO) [7] to train a model with parameters θ using encoders and decoders based on minimizing the loss function

θ* = arg min_{θe, θd} −L_B(θe, θd, θℓ; X(i), x(i)),
L_B = L_RE + L_KL + L_RR,        (1)
L_RE = E_{qθe(z|X(i))} [ log pθd(x(i) | z′) ],
L_KL = −β D_KL( qθe(z|X(i)) ‖ p̃θd(z) ),
L_RR = γ E_{qθe(z′|x(i))} [ log pθd(x(i) | z′) ].

The qθe denotes the encoding probability distribution and pθd the decoding probability distribution. The loss ℓ = −L_B provides a regularized form of MLE.
The terms L_RE and L_KL arise from the ELBO variational bound L_LL ≥ L_RE + L_KL when β = 1 [7]. This provides a way to estimate the log likelihood that the encoder-decoder reproduce the observed data sample pairs (X(i), x(i)) using the codes z′ and z. Here, we include a latent-space mapping z′ = fθℓ(z) parameterized by θℓ, which we can use to characterize the evolution of the system or for further processing of features. The X(i) is the input and x(i) is the output prediction. For the case of dynamical systems, we take X(i) ∼ u^i(t), a sample of the initial state function u^i(t), and the output x(i) ∼ u^i(t + τ), the predicted state function u^i(t + τ). We discuss the specific distributions used in more detail below.

The L_KL term involves the Kullback-Leibler Divergence [44, 18] acting similar to a Bayesian prior on latent space to regularize the encoder conditional probability distribution so that for each sample this distribution is similar to p̃θd. We take p̃θd = η(0, σ0²), a multi-variate Gaussian with independent components. This serves (i) to disentangle the features from each other to promote independence, (ii) to provide a reference scale and localization for the encodings z, and (iii) to promote parsimonious codes utilizing smaller dimensions than d when possible.

The L_RR term gives a regularization that promotes retaining information in z so the encoder-decoder pair can reconstruct functions. As we shall discuss, this also promotes organization of the latent space for consistency over multi-step predictions and aids in model interpretability.

We use for the specific encoder probability distributions conditional Gaussians z ∼ qθe(z|x(i)) = a(X(i), x(i)) + η(0, σe²), where η is a Gaussian with variance σe² (i.e. E_X^i[z] = a, Var_X^i[z] = σe²). One can think of the learned mean function a in the VAE as corresponding to a typical encoder a(X(i), x(i); θe) = a(X(i); θe) = z(i), with the variance function σe² = σe²(θe) providing control of a noise source to further regularize the encoding. Among other properties, this promotes connectedness of the ensemble of latent space codes. For the VAE decoder distribution, we take x ∼ pθd(x|z(i)) = b(z(i)) + η(0, σd²). The learned mean function b(z(i); θd) corresponds to a typical decoder and the variance function σd² = σd²(θd) controls the source of regularizing noise.

The terms to be learned in the VAE framework are (a, σe, fθℓ, b, σd), which are parameterized by θ = (θe, θd, θℓ). In practice, it is useful to treat the variances σ(·) initially as hyper-parameters. We learn predictors for the dynamics by training over samples of evolution pairs {(u^i_n, u^i_{n+1})}_{i=1}^m, where i denotes the sample index and u^i_n = u^i(t_n) with t_n = t_0 + nτ for a time-scale τ.

To make predictions, the learned models use the following stages: (i) extract from u(t) the features z(t), (ii) evolve z(t) → z(t + τ), and (iii) predict using z(t + τ) the û(t + τ), summarized in Figure 1. By composition of the latent evolution map the model makes multi-step predictions of the dynamics.
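To make the role of the three loss terms concrete, the following is a minimal sketch (not the authors' implementation) of how the objective in equation (1) could be assembled for Gaussian encoder and decoder distributions with a latent evolution map z′ = fθℓ(z). The function names, the single-sample Monte Carlo estimates, and the closed-form KL term for isotropic Gaussians are illustrative assumptions.

import math
import torch

def gaussian_log_prob(x, mean, sigma):
    # log density of an isotropic Gaussian, summed over components
    return (-0.5 * ((x - mean) / sigma) ** 2
            - torch.log(sigma) - 0.5 * math.log(2 * math.pi)).sum(-1)

def vae_loss(encoder, decoder, latent_map, X, x,
             sigma_e, sigma_d, sigma_0, beta=1.0, gamma=0.5):
    sigma_e, sigma_d, sigma_0 = map(torch.as_tensor, (sigma_e, sigma_d, sigma_0))
    d = X.shape[-1] if encoder(X).dim() == 0 else encoder(X).shape[-1]

    a_X = encoder(X)                                   # mean of q(z | X)
    z = a_X + sigma_e * torch.randn_like(a_X)          # reparameterized sample z
    z_prime = latent_map(z)                            # z' = f_theta_ell(z)
    L_RE = gaussian_log_prob(x, decoder(z_prime), sigma_d)

    # Closed-form KL( N(a_X, sigma_e^2 I) || N(0, sigma_0^2 I) )
    kl = 0.5 * ((sigma_e / sigma_0) ** 2 * d
                + (a_X ** 2).sum(-1) / sigma_0 ** 2
                - d
                + 2 * d * torch.log(sigma_0 / sigma_e))
    L_KL = -beta * kl

    # Reconstruction regularization: encode the output x and reconstruct it
    a_x = encoder(x)
    z_prime_rr = a_x + sigma_e * torch.randn_like(a_x)
    L_RR = gamma * gaussian_log_prob(x, decoder(z_prime_rr), sigma_d)

    return -(L_RE + L_KL + L_RR).mean()                # minimize -L_B

In a training loop, encoder, decoder, and latent_map would be DNN modules whose parameters are optimized jointly; the variances can be kept fixed initially, as discussed above.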
Related Work

Many variants of autoencoders have been developed for making predictions of sequential data, including those based on Recurrent Neural Networks (RNNs) with LSTMs and GRUs [34, 29, 16]. While RNNs provide a rich approximation class for sequential data, they pose for dynamical systems challenges for interpretability and for training to obtain predictions stable over many steps with robustness against noise in the training dataset. Autoencoders have also been combined with symbolic dictionary learning for latent dynamics in [11], providing some advantages for interpretability and robustness, but requiring specification in advance of a sufficiently expressive dictionary. Neural networks incorporating physical information have also been developed that impose stability conditions during training [53, 46, 24]. The work of [17] investigates combining RNNs with VAEs to obtain more robust models for sequential data and considered tasks related to processing speech and handwriting.

In our work we learn dynamical models making use of VAEs to obtain probabilistic encoders and decoders between euclidean and non-euclidean latent spaces to provide additional regularizations that help promote parsimoniousness, disentanglement of features, robustness, and interpretability. Prior VAE methods used for dynamical systems include [31, 55, 27, 13, 55, 59]. These works use primarily euclidean latent spaces and consider applications including human motion capture and ODE systems. Approaches for incorporating topological information into latent variable representations include the early works by Kohonen on Self-Organizing Maps (SOMs) [41] and by Bishop on Generative Topographic Maps (GTMs) based on density networks providing a generative approach [6]. More recently, VAE methods using non-euclidean latent spaces include [37, 38, 25, 14, 21, 2]. These incorporate the role of geometry by augmenting the prior distribution p̃θd(z) on latent space to bias toward a manifold. In the recent work [57], an explicit projection procedure is introduced, but in the special case of a few manifolds having an analytic projection map.

In our work we develop further methods for more general latent space representations, including non-orientable manifolds, and applications to parameterized PDEs and constrained mechanical systems. We introduce more general methods for non-euclidean latent spaces in terms of point-cloud representations of the manifold along with local gradient information that can be utilized within general backpropagation frameworks, see Appendix A. This also allows for the case of manifolds that are non-orientable and have complex shapes. Our methods provide flexible ways to design and control both the topology and the geometry of the latent space by merging or subtracting shapes or stretching and contracting regions. We also consider additional types of regularizations for learning dynamical models facilitating multi-step predictions and more interpretable state space models. In our work, we also consider reduced models for non-linear PDEs, such as Burgers Equation, and learning representations for more general constrained mechanical systems. We also investigate the role of non-linearities making comparisons with other data-driven models.

Learning with Manifold Latent Spaces

Roles of Non-Euclidean Geometry and Topology

For many systems, parsimonious representations can be obtained by working with non-euclidean manifold latent spaces, such as a torus for doubly periodic systems or even non-orientable manifolds, such as a klein bottle as arises in imaging and perception studies [10]. For this purpose, we learn encoders E over a family of mappings to a prescribed manifold M of the form

z = Eφ(x) = Λ(Ẽφ(x)) = Λ(w),   w = Ẽφ(x).

We take the map Ẽφ(x) : x → w, where we represent a smooth closed manifold M of dimension m in R^2m, as supported by the Whitney Embedding Theorem [72]. The Λ maps (projects) points w ∈ R^2m to the manifold representation z ∈ M ⊂ R^2m. In practice, we accomplish this in two ways: (i) we provide an analytic mapping Λ to M, or (ii) we provide a high resolution point-cloud representation of the target manifold along with local gradients and use for Λ a quantized mapping to the nearest point on M. We provide more details in Appendix A.
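As a concrete example of option (i), an analytic projection Λ onto a torus S¹ × S¹ embedded in R⁴ can be written by normalizing consecutive coordinate pairs of w, matching the product-of-circles map used later for the arm mechanism. Composing it with an unconstrained network output w = Ẽθ(x) as below is an illustrative sketch, not the authors' released code.

import torch

def lambda_torus(w, eps=1e-8):
    # w: (..., 4) -> z on the torus {(z1,z2,z3,z4): z1^2+z2^2 = z3^2+z4^2 = 1}
    w1, w2 = w[..., :2], w[..., 2:]
    z1 = w1 / (w1.norm(dim=-1, keepdim=True) + eps)   # project first pair to S^1
    z2 = w2 / (w2.norm(dim=-1, keepdim=True) + eps)   # project second pair to S^1
    return torch.cat([z1, z2], dim=-1)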
This allows us to learn VAEs with latent spaces for z with general specified topologies and controllable geometric structures. The topologies of the sphere, torus, and klein bottle are intrinsically different than R^n. This allows for new types of priors, such as uniform distributions on compact manifolds or distributions with more symmetry. As we shall discuss, additional latent space structure also helps in learning more robust representations less sensitive to noise, since we can unburden the encoder and decoder from having to learn the embedding geometry and avoid the potential for them making erroneous use of extra latent space dimensions. We also have statistical gains since the decoder now only needs to learn a mapping from the manifold M for reconstructions of x. These more parsimonious representations also aid identifiability and interpretability of models.

Results

Burgers' Equation of Fluid Mechanics: Learning Nonlinear PDE Dynamics

We consider the nonlinear viscous Burgers' equation

u_t = −u u_x + ν u_xx,        (2)

where ν is the viscosity [4, 36]. We consider periodic boundary conditions on Ω = [0, 1]. Burgers equation is motivated as a mechanistic model for the fluid mechanics of advective transport and shocks, and serves as a widely used benchmark for analysis and computational methods.

The nonlinear Cole-Hopf Transform CH can be used to relate Burgers equation to the linear diffusion equation φ_t = ν φ_xx [36]. This provides a representation of the solution u

φ(x, t) = CH[u] = exp( −(1/(2ν)) ∫_0^x u(x′, t) dx′ ),
u(x, t) = CH⁻¹[φ] = −2ν (∂/∂x) ln φ(x, t).        (3)

This can be represented by the Fourier expansion

φ(x, t) = Σ_{k=−∞}^{∞} φ̂_k(0) exp(−4π²k²νt) · exp(i2πkx).

The φ̂_k(0) = F_k[φ(x, 0)] and φ(x, t) = F⁻¹[{φ̂_k(0) exp(−4π²k²νt)}], with F the Fourier transform. This provides an analytic representation of the solution of the viscous Burgers equation u(x, t) = CH⁻¹[φ(x, t)], where φ̂(0) = F[CH[u(x, 0)]]. In general, for nonlinear PDEs with initial conditions within a class of functions U, we aim to learn models that provide predictions u(t + τ) = S_τ u(t) approximating the evolution operator S_τ over time-scale τ. For the Burgers equation, the CH provides an analytic way to obtain a reduced order model by truncating the Fourier expansion to |k| ≤ n_f/2. This provides for the Burgers equation a benchmark model against which to compare our learned models. For general PDEs comparable analytic representations are not usually available, motivating development of data-driven approaches.
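The analytic Cole-Hopf representation above can serve as a benchmark solver. The sketch below is one way such a solver could be implemented with a truncated Fourier expansion; the uniform grid, trapezoidal quadrature of the integral, and spectral differentiation are illustrative choices and are not taken from the paper.

import numpy as np

def burgers_cole_hopf(u0, nu, t, n_f=None):
    # u0: samples of u(x, 0) on a uniform periodic grid over [0, 1)
    n = u0.shape[0]
    x = np.linspace(0.0, 1.0, n, endpoint=False)
    dx = x[1] - x[0]
    # phi(x, 0) = exp(-(1/(2 nu)) * int_0^x u(x', 0) dx')  (trapezoid rule)
    integral = np.concatenate([[0.0], np.cumsum(0.5 * (u0[1:] + u0[:-1]) * dx)])
    phi0 = np.exp(-integral / (2.0 * nu))
    k = np.fft.fftfreq(n, d=dx)                    # integer wavenumbers on [0, 1)
    phi_hat = np.fft.fft(phi0)
    if n_f is not None:                            # reduced-order truncation |k| <= n_f/2
        phi_hat[np.abs(k) > n_f / 2] = 0.0
    phi_hat = phi_hat * np.exp(-4.0 * np.pi**2 * k**2 * nu * t)   # exact heat decay
    phi = np.real(np.fft.ifft(phi_hat))
    # u = -2 nu d/dx log(phi), via spectral differentiation of log(phi)
    dlogphi = np.real(np.fft.ifft(2j * np.pi * k * np.fft.fft(np.log(phi))))
    return -2.0 * nu * dlogphi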
We develop VAE methods for learning reduced order models for the responses of the nonlinear Burgers Equation when the initial conditions are from a collection of functions U. We learn VAE models that extract from u(x, t) latent variables z(t) to predict u(x, t + τ). Given the non-uniqueness of representations, and to promote interpretability of the model, we introduce the inductive bias that the evolution dynamics in latent space for z is linear of the form ż = −λ0 z, giving exponential decay rate λ0. For discrete times, we take z_{n+1} = fθℓ(z_n) = exp(−λ0 τ) · z_n, where θℓ = (λ0). We still consider general nonlinear mappings for the encoders and decoders, which are represented by deep neural networks. We train the model on the pairs (u(x, t), u(x, t + τ)) by drawing m samples of u^i(x, t_i) ∈ S_{t_i} U, which generates the evolved state under Burgers equation u^i(x, t_i + τ) over time-scale τ. We perform VAE studies with parameters ν = 2 × 10⁻², τ = 2.5 × 10⁻¹, with VAE Deep Neural Networks (DNNs) with layer sizes (in)-400-400-(out), ReLU activations, γ = 0.5, β = 1, and initial standard deviations σd = σe = 4 × 10⁻³. We show results of our VAE model predictions in Figure 3 and Table 1.
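A minimal sketch of the latent evolution map with the single learnable parameter θℓ = (λ0) introduced above is given below; parameterizing λ0 through its logarithm to keep the decay rate positive is an illustrative choice, not a detail stated in the paper.

import torch

class ExpDecayLatentMap(torch.nn.Module):
    # z_{n+1} = exp(-lambda0 * tau) * z_n, applied n_steps times for multi-step prediction
    def __init__(self, tau, lambda0_init=1.0):
        super().__init__()
        self.tau = tau
        self.log_lambda0 = torch.nn.Parameter(torch.log(torch.tensor(lambda0_init)))

    def forward(self, z, n_steps=1):
        lambda0 = self.log_lambda0.exp()
        return torch.exp(-lambda0 * self.tau * n_steps) * z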
Figure 3: Burgers' Equation: Prediction of Dynamics. We consider responses for U1 = {u | u(x, t; α) = α sin(2πx) + (1−α) cos³(2πx)}. Predictions are made for the evolution u over the time-scale τ satisfying equation 2 with initial conditions in U1. We find our nonlinear VAE methods are able to learn the dynamics with 2 latent dimensions with errors < 1%. Methods such as DMD [63, 69] with 3 modes, which are only able to use a single linear space to approximate the initial conditions and prediction, encounter challenges in approximating the nonlinear evolution. We find our linear VAE method with 2 modes provides some improvements, by allowing for using different linear spaces for representing the input and output functions, but at the cost of additional computations. Results are summarized in Table 1.

We show the importance of the non-linear approximation properties of our VAE methods in capturing system behaviors by making comparisons with Dynamic Mode Decomposition (DMD) [63, 69], Proper Orthogonal Decomposition (POD) [12], and a linear variant of our VAE approach. Recent CNN-AEs have also studied related advantages of non-linear approximations [46]. Some distinctions in our work are the use of VAEs to further regularize AEs and the use of topological latent spaces to facilitate further capturing of structure. The DMD and POD are widely used and successful approaches that aim to find an optimal linear space on which to project the dynamics and learn a linear evolution law for system behaviors. DMD and POD have been successful in obtaining models for many applications, including steady-state fluid mechanics and transport problems [69, 63]. However, given their inherent linear approximations they can encounter well-known challenges related to translational and rotational invariances, as arise in advective phenomena and other settings [8]. Our comparison studies can be found in Table 1.

Table 1: Burgers' Equation: Prediction Accuracy. The reconstruction L1-relative errors in predicting u(x, t) for our VAE methods, Dynamic Mode Decomposition (DMD), Proper Orthogonal Decomposition (POD), and reduction by Cole-Hopf (CH), over multiple steps and number of latent dimensions (Dim) (top). Results when varying the strength of the reconstruction regularization γ and prior β (bottom).

Method          Dim   0.25s     0.50s     0.75s     1.00s
VAE Nonlinear   2     4.44e-3   5.54e-3   6.30e-3   7.26e-3
VAE Linear      2     9.79e-2   1.21e-1   1.17e-1   1.23e-1
DMD             3     2.21e-1   1.79e-1   1.56e-1   1.49e-1
POD             3     3.24e-1   4.28e-1   4.87e-1   5.41e-1
Cole-Hopf-2     2     5.18e-1   4.17e-1   3.40e-1   1.33e-1
Cole-Hopf-4     4     5.78e-1   6.33e-2   9.14e-3   1.58e-3
Cole-Hopf-6     6     1.48e-1   2.55e-3   9.25e-5   7.47e-6

γ      0.00s       0.25s       0.50s       0.75s       1.00s
0.00   1.600e-01   6.906e-03   1.715e-01   3.566e-01   5.551e-01
0.50   1.383e-02   1.209e-02   1.013e-02   9.756e-03   1.070e-02
2.00   1.337e-02   1.303e-02   9.202e-03   8.878e-03   1.118e-02

β      0.00s       0.25s       0.50s       0.75s       1.00s
0.00   1.292e-02   1.173e-02   1.073e-02   1.062e-02   1.114e-02
0.50   1.190e-02   1.126e-02   1.072e-02   1.153e-02   1.274e-02
1.00   1.289e-02   1.193e-02   7.903e-03   7.883e-03   9.705e-03
4.00   1.836e-02   1.677e-02   8.987e-03   8.395e-03   8.894e-03

Figure 4: Burgers' Equation: Latent Space Representations and Extrapolation Predictions. We show the latent space representation z of the dynamics for the input functions u(·, t; α) ∈ U1. The VAE organizes for u the learned representations z(α, t) in the parameter α (blue-green) into circular arcs that are concentric in the time parameter t (yellow-orange) (left). The reconstruction regularization with γ aligns subsequent time-steps of the dynamics in latent space, facilitating multi-step predictions. The learned VAE model exhibits a level of extrapolation to predict dynamics even for some inputs u ∉ U1 beyond the training dataset (right).

We also considered how our VAE methods performed when adjusting the parameter β for the strength of the prior p̃, as in β-VAEs [33], and γ for the strength of the reconstruction regularization. The reconstruction regularization has a significant influence on how the VAE organizes representations in latent space and on the accuracy of predictions of the dynamics, especially over multiple steps, see Figure 4 and Table 1. The regularization serves to align representations consistently in latent space, facilitating multi-step compositions. We also found our VAE learned representations capable of some level of extrapolation beyond the training dataset. When varying β, we found that larger values improved the multiple step accuracy whereas small values improved the single step accuracy, see Table 1.

Constrained Mechanics: Learning with Non-Euclidean Latent Spaces

To learn more parsimonious and robust representations of physical systems, we develop methods for latent spaces having geometries and topologies more general than euclidean space. This is helpful in capturing inherent structure such as periodicities or other symmetries. We consider physical systems with constrained mechanics, such as the arm mechanism for reaching for objects in Figure 5. The observations are taken to be the two locations x1, x2 ∈ R², giving x = (x1, x2) ∈ R⁴. When the segments are rigidly constrained these configurations lie on a manifold (torus). We can also allow the segments to extend and consider more exotic constraints, such as that the two points x1, x2 must be on a klein bottle in R⁴. Related situations arise in other areas of imaging and mechanics, such as in pose estimation and in studies of visual perception [56, 10, 61]. For the arm mechanics, we can use this prior knowledge to construct a torus latent space represented by the product space of two circles S¹ × S¹.
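For illustration, one way such a torus-constrained dataset could be generated is by sampling the two joint angles of the arm and recording the segment end locations. The segment lengths and sampling scheme below are assumptions, not the paper's data-generation procedure.

import numpy as np

def arm_configurations(n_samples, l1=1.0, l2=1.0, seed=0):
    # Two rigid segments with joint angles theta1, theta2; observations are the
    # end locations x1, x2 in R^2, giving x = (x1, x2) in R^4 on a torus.
    rng = np.random.default_rng(seed)
    theta1 = rng.uniform(0.0, 2 * np.pi, n_samples)
    theta2 = rng.uniform(0.0, 2 * np.pi, n_samples)
    x1 = np.stack([l1 * np.cos(theta1), l1 * np.sin(theta1)], axis=1)
    x2 = x1 + np.stack([l2 * np.cos(theta2), l2 * np.sin(theta2)], axis=1)
    return np.concatenate([x1, x2], axis=1)    # (n_samples, 4)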
To obtain a learnable class of manifold encoders, we use the family of maps Eθ = Λ(Ẽθ(x)), with Ẽθ(x) into R⁴ and Λ(w) = Λ(w1, w2, w3, w4) = (z1, z2, z3, z4) = z, where (z1, z2) = (w1, w2)/‖(w1, w2)‖ and (z3, z4) = (w3, w4)/‖(w3, w4)‖; see the VAE Section and Appendix A. For the case of klein bottle constraints, we use our point-cloud representation of the non-orientable manifold with the parameterized embedding in R⁴

z1 = (a + b cos(u2)) cos(u1),        z2 = (a + b cos(u2)) sin(u1),
z3 = b sin(u2) cos(u1/2),            z4 = b sin(u2) sin(u1/2),

with u1, u2 ∈ [0, 2π]. The Λ(w) is taken to be the map to the nearest point of the manifold M, which we compute numerically along with the needed gradients for backpropagation as discussed in Appendix A.
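A short sketch of building a point-cloud representation of this Klein bottle embedding, to which the nearest-point map Λ of Appendix A can then be applied, is given below; the radii a, b and grid resolution are illustrative choices.

import numpy as np

def klein_bottle_point_cloud(n_u=100, a=2.0, b=1.0):
    # Sample the parameterized embedding above on a uniform grid in (u1, u2)
    u1, u2 = np.meshgrid(np.linspace(0, 2 * np.pi, n_u, endpoint=False),
                         np.linspace(0, 2 * np.pi, n_u, endpoint=False))
    u1, u2 = u1.ravel(), u2.ravel()
    z = np.stack([(a + b * np.cos(u2)) * np.cos(u1),
                  (a + b * np.cos(u2)) * np.sin(u1),
                  b * np.sin(u2) * np.cos(u1 / 2.0),
                  b * np.sin(u2) * np.sin(u1 / 2.0)], axis=1)
    return z    # (n_u * n_u, 4) samples of the manifold M embedded in R^4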
Our VAE methods are trained with encoder and decoder DNNs having layers of sizes (in)-100-500-100-(out) with Leaky-ReLU activations with s = 1e-6, with results reported in Figure 5 and Table 2. We find learning representations is improved by use of the manifold latent spaces, in these trials even showing a slight edge over R⁴. When the wrong topology is used, such as in R², we find in both cases a significant deterioration in the reconstruction accuracy, see Table 2. This arises since the encoder must be continuous and hedge against the noise regularizations. This results in an incurred penalty for a subset of configurations. The encoder exhibits non-injectivity and a rapid doubling back over the space to accommodate the decoder by lining up nearby configurations in the topology of the input space manifold to handle noise perturbations in z from the probabilistic nature of the encoding. We also studied robustness when training with noise for X̃ = X + ση(0, 1) and measuring accuracy for reconstruction relative to the target X. As the noise increases, we see that the manifold latent spaces improve reconstruction accuracy, acting as a filter through restricting the representation. The probabilistic decoder will tend to learn to estimate the mean over samples of a common underlying configuration, and with the manifold latent space restrictions it is more likely to use a common latent representation. For R^d with d > 2, the extraneous dimensions in the latent space can result in overfitting of the encoder to the noise. We see that as d becomes larger the reconstruction accuracy decreases, see Table 2. These results demonstrate how geometric priors can aid learning in constrained mechanical systems.

Figure 5: VAE Representations of Motions using Manifold Latent Spaces. We learn from observations representations for constrained mechanical systems using general non-euclidean manifold latent spaces M. The arm mechanism has configurations x = (x1, x2) ∈ R⁴. For rigid segments, the motions are constrained to be on a manifold (torus) M ⊂ R⁴. For extendable segments, we can also consider more exotic constraints, such as requiring x1, x2 to be on a klein bottle in R⁴ (top). Results of our VAE methods for learned representations for motions under these constraints are shown. The VAE learns the segment length constraint and two nearly decoupled coordinates for the torus dataset that mimic the roles of angles. The VAE learns for the klein bottle dataset two segment motions to generate configurations (middle and bottom).

Table 2: Manifold Latent Variable Model: VAE Reconstruction Errors. The L2-relative errors of reconstruction for our VAE methods. The final is the lowest value during training. The manifold latent spaces show improved learning. When an incompatible topology is used, such as R², this can result in deterioration in learned representations. With noise in the input X̃ = X + ση(0, 1) and reconstructing the target X, the manifold latent spaces also show improvements for learning.

Torus
method            epoch 1000   epoch 2000   epoch 3000   final
VAE 2-Manifold    6.6087e-02   6.6564e-02   6.6465e-02   6.6015e-02
VAE R2            1.6540e-01   1.2931e-01   9.9903e-02   8.0648e-02
VAE R4            8.0006e-02   7.6302e-02   7.5875e-02   7.5626e-02
VAE R10           8.3411e-02   8.4569e-02   8.4673e-02   8.4143e-02

with noise σ      0.01         0.05         0.1          0.5
VAE 2-Manifold    6.7099e-02   8.0608e-02   1.1198e-01   4.1988e-01
VAE R2            8.5879e-02   9.7220e-02   1.2867e-01   4.5063e-01
VAE R4            7.6347e-02   9.0536e-02   1.2649e-01   4.9187e-01
VAE R10           8.4780e-02   1.0094e-01   1.3946e-01   5.2050e-01

Klein Bottle
method            epoch 1000   epoch 2000   epoch 3000   final
VAE 2-Manifold    5.7734e-02   5.7559e-02   5.7469e-02   5.7435e-02
VAE R2            1.1802e-01   9.0728e-02   8.0578e-02   7.1026e-02
VAE R4            6.9057e-02   6.5593e-02   6.4047e-02   6.3771e-02
VAE R10           6.8899e-02   6.9802e-02   7.0953e-02   6.8871e-02

with noise σ      0.01         0.05         0.1          0.5
VAE 2-Manifold    5.9816e-02   6.9934e-02   9.6493e-02   4.0121e-01
VAE R2            1.0120e-01   1.0932e-01   1.3154e-01   4.8837e-01
VAE R4            6.3885e-02   7.6096e-02   1.0354e-01   4.5769e-01
VAE R10           7.4587e-02   8.8233e-02   1.2082e-01   4.8182e-01

Conclusions

We developed VAEs for robustly learning nonlinear dynamics of physical systems by introducing methods for latent representations utilizing general geometric and topological structures. We demonstrated our methods for learning the non-linear dynamics of PDEs and constrained mechanical systems. We expect our methods can also be used in other physics-related tasks and problems to leverage prior geometric and topological knowledge for improving learning for nonlinear systems.

Acknowledgments

The authors' research was supported by grants DOE Grant ASCR PHILMS DE-SC0019246 and NSF Grant DMS-1616353. R.N.L. also acknowledges support from a donor to the UCSB CCS SURF program. The authors also acknowledge the UCSB Center for Scientific Computing NSF MRSEC (DMR1121053) and UCSB MRL NSF CNS-1725797. P.J.A. would also like to acknowledge a hardware grant from Nvidia.

References

governing equations. Proceedings of the National Academy of Sciences 116(45): 22445–22451. ISSN 0027-8424. doi:10.1073/pnas.1906995116. URL https://www.pnas.org/content/116/45/22445.

[1] Archer, E.; Park, I. M.; Buesing, L.; Cunningham, J.; and Paninski, L. 2015. Black box variational inference for state space models. arXiv preprint arXiv:1511.07367. URL https://arxiv.org/abs/1511.07367.

[2] Arvanitidis, G.; Hansen, L. K.; and Hauberg, S. 2018. Latent Space Oddity: on the Curvature of Deep Generative Models. In International Conference on Learning Representations. URL https://openreview.net/forum?id=SJzRZ-WCZ.

[12] Chatterjee, A. 2000. An introduction to the proper orthogonal decomposition. Current Science 78(7): 808–817. ISSN 00113891. URL http://www.jstor.org/stable/24103957.

[13] Chen, N.; Karl, M.; and Van Der Smagt, P. 2016. Dynamic movement primitives in latent space of time-dependent variational autoencoders.
In 2016 IEEE- [3] Azencot, O.; Yin, W.; and Bertozzi, A. 2019. Con- RAS 16th International Conference on Humanoid sistent dynamic mode decomposition. SIAM Jour- Robots (Humanoids), 629–636. IEEE. URL https: nal on Applied Dynamical Systems 18(3): 1565– //ieeexplore.ieee.org/document/7803340. 1585. URL https://www.math.ucla.edu/∼bertozzi/ [14] Chen, N.; Klushyn, A.; Ferroni, F.; Bayer, J.; and Van papers/CDMD SIADS.pdf. Der Smagt, P. 2020. Learning Flat Latent Manifolds [4] Bateman, H. 1915. Some Recent Researches on the with VAEs. In III, H. D.; and Singh, A., eds., Pro- Motion of Fluids. Monthly Weather Review 43(4): ceedings of the 37th International Conference on Ma- 163. doi:10.1175/1520-0493(1915)43h163:SRROTMi chine Learning, volume 119 of Proceedings of Ma- 2.0.CO;2. chine Learning Research, 1587–1596. Virtual: PMLR. [5] Baum, L. E.; and Petrie, T. 1966. Statistical Infer- URL http://proceedings.mlr.press/v119/chen20i.html. ence for Probabilistic Functions of Finite State Markov [15] Chiuso, A.; and Pillonetto, G. 2019. Sys- Chains. Ann. Math. Statist. 37(6): 1554–1563. doi: tem Identification: A Machine Learning Perspec- 10.1214/aoms/1177699147. URL https://doi.org/10. tive. Annual Review of Control, Robotics, and 1214/aoms/1177699147. Autonomous Systems 2(1): 281–304. doi:10.1146/ [6] Bishop, C. M.; Svensén, M.; and Williams, C. annurev-control-053018-023744. URL https://doi.org/ K. I. 1996. GTM: A Principled Alternative to 10.1146/annurev-control-053018-023744. the Self-Organizing Map. In Mozer, M.; Jordan, [16] Cho, K.; van Merriënboer, B.; Gulcehre, C.; Bah- M. I.; and Petsche, T., eds., Advances in Neu- danau, D.; Bougares, F.; Schwenk, H.; and Bengio, Y. ral Information Processing Systems 9, NIPS, Den- 2014. Learning Phrase Representations using RNN ver, CO, USA, December 2-5, 1996, 354–360. MIT Encoder–Decoder for Statistical Machine Translation. Press. URL http://papers.nips.cc/paper/1207-gtm-a- In Proceedings of the 2014 Conference on Empirical principled-alternative-to-the-self-organizing-map. Methods in Natural Language Processing (EMNLP), [7] Blei, D. M.; Kucukelbir, A.; and McAuliffe, J. D. 2017. 1724–1734. Doha, Qatar: Association for Computa- Variational Inference: A Review for Statisticians. Jour- tional Linguistics. doi:10.3115/v1/D14-1179. URL nal of the American Statistical Association 112(518): https://www.aclweb.org/anthology/D14-1179. 859–877. doi:10.1080/01621459.2017.1285773. URL [17] Chung, J.; Kastner, K.; Dinh, L.; Goel, K.; Courville, https://doi.org/10.1080/01621459.2017.1285773. A. C.; and Bengio, Y. 2015. A Recurrent Latent Vari- [8] Brunton, S. L.; and Kutz, J. N. 2019. Reduced Or- able Model for Sequential Data. Advances in neural der Models (ROMs), 375–402. Cambridge University information processing systems abs/1506.02216. URL Press. doi:10.1017/9781108380690.012. http://arxiv.org/abs/1506.02216. [9] Brunton, S. L.; Proctor, J. L.; and Kutz, J. N. 2016. [18] Cover, T. M.; and Thomas, J. A. 2006. Elements of In- Discovering governing equations from data by sparse formation Theory (Wiley Series in Telecommunications identification of nonlinear dynamical systems. Pro- and Signal Processing). USA: Wiley-Interscience. ceedings of the National Academy of Sciences 113(15): ISBN 0471241954. 3932–3937. ISSN 0027-8424. doi:10.1073/pnas. 1517384113. URL https://www.pnas.org/content/113/ [19] Crutchfield, J.; and McNamara, B. S. 1987. Equations 15/3932. of Motion from a Data Series. Complex Syst. 1. 
[10] Carlsson, G.; Ishkhanov, T.; de Silva, V.; and Zomoro- [20] Das, S.; and Giannakis, D. 2019. Delay-Coordinate dian, A. 2008. On the Local Behavior of Spaces of Maps and the Spectra of Koopman Operators 175: Natural Images. International Journal of Computer 1107–1145. ISSN 0022-4715. doi:10.1007/s10955- Vision 76(1): 1–12. ISSN 1573-1405. URL https: 019-02272-w. //doi.org/10.1007/s11263-007-0056-x. [21] Davidson, T. R.; Falorsi, L.; Cao, N. D.; Kipf, T.; [11] Champion, K.; Lusch, B.; Kutz, J. N.; and Brunton, and Tomczak, J. M. 2018. Hyperspherical Variational S. L. 2019. Data-driven discovery of coordinates and Auto-Encoders URL https://arxiv.org/abs/1804.00891. [22] Del Moral, P. 1997. Nonlinear filtering: Interacting [32] Hesthaven, J. S.; Rozza, G.; and Stamm, B. 2016. Re- particle resolution. Comptes Rendus de l’Académie des duced Basis Methods 27–43. ISSN 2191-8198. doi: Sciences - Series I - Mathematics 325(6): 653 – 658. 10.1007/978-3-319-22470-1 3. ISSN 0764-4442. doi:https://doi.org/10.1016/S0764- [33] Higgins, I.; Matthey, L.; Pal, A.; Burgess, C.; Glorot, 4442(97)84778-7. URL http://www.sciencedirect. X.; Botvinick, M. M.; Mohamed, S.; and Lerchner, A. com/science/article/pii/S0764444297847787. 2017. beta-VAE: Learning Basic Visual Concepts with [23] DeVore, R. A. 2017. Model Reduction and Approx- a Constrained Variational Framework. In ICLR. URL imation: Theory and Algorithms, chapter Chapter 3: https://openreview.net/forum?id=Sy2fzU9gl. The Theoretical Foundation of Reduced Basis Meth- [34] Hochreiter, S.; and Schmidhuber, J. 1997. Long Short- ods, 137–168. SIAM. doi:10.1137/1.9781611974829. Term Memory. Neural Comput. 9(8): 1735–1780. ch3. URL https://epubs.siam.org/doi/abs/10.1137/1. ISSN 0899-7667. doi:10.1162/neco.1997.9.8.1735. 9781611974829.ch3. URL https://doi.org/10.1162/neco.1997.9.8.1735. [24] Erichson, N. B.; Muehlebach, M.; and Mahoney, [35] Hong, X.; Mitchell, R.; Chen, S.; Harris, C.; Li, K.; M. W. 2019. Physics-informed autoencoders for and Irwin, G. 2008. Model selection approaches for Lyapunov-stable fluid flow prediction. arXiv preprint non-linear system identification: a review. Interna- arXiv:1905.10866 . tional Journal of Systems Science 39(10): 925–946. doi:10.1080/00207720802083018. URL https://doi. [25] Falorsi, L.; Haan, P. D.; Davidson, T.; Cao, N. D.; org/10.1080/00207720802083018. Weiler, M.; Forré, P.; and Cohen, T. 2018. Explo- rations in Homeomorphic Variational Auto-Encoding. [36] Hopf, E. 1950. The partial differential equation ut + ArXiv abs/1807.04689. URL https://arxiv.org/pdf/ uux = µxx . Comm. Pure Appl. Math. 3, 201-230 1807.04689.pdf. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/ cpa.3160030302. [26] Ghahramani, Z.; and Roweis, S. T. 1998. Learn- ing Nonlinear Dynamical Systems Using an EM [37] Jensen, K. T.; Kao, T.-C.; Tripodi, M.; and Hennequin, Algorithm. In Kearns, M. J.; Solla, S. A.; and G. 2020. Manifold GPLVMs for discovering non- Cohn, D. A., eds., Advances in Neural Informa- Euclidean latent structure in neural data URL https: tion Processing Systems 11, [NIPS Conference, //arxiv.org/abs/2006.07429. Denver, Colorado, USA, November 30 - Decem- [38] Kalatzis, D.; Eklund, D.; Arvanitidis, G.; and Hauberg, ber 5, 1998], 431–437. The MIT Press. URL S. 2020. Variational Autoencoders with Rieman- http://papers.nips.cc/paper/1594-learning-nonlinear- nian Brownian Motion Priors. arXiv e-prints dynamical-systems-using-an-em-algorithm. arXiv:2002.05227. URL https://arxiv.org/abs/2002. 05227. 
[27] Girin, L.; Leglaive, S.; Bie, X.; Diard, J.; Hueber, T.; and Alameda-Pineda, X. 2020. Dynamical Variational [39] Kalman, R. E. 1960. A New Approach to Linear Fil- Autoencoders: A Comprehensive Review . tering and Prediction Problems. Journal of Basic Engi- neering 82(1): 35–45. ISSN 0021-9223. doi:10.1115/ [28] Godsill, S. 2019. Particle Filtering: the First 25 Years 1.3662552. URL https://doi.org/10.1115/1.3662552. and beyond. In Proc. Speech and Signal Processing (ICASSP) ICASSP 2019 - 2019 IEEE Int. Conf. Acous- [40] Kingma, D. P.; and Welling, M. 2014. Auto-Encoding tics, 7760–7764. Variational Bayes. In 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, [29] Goodfellow, I.; Bengio, Y.; and Courville, A. 2016. Canada, April 14-16, 2014, Conference Track Pro- Deep Learning. The MIT Press. ISBN 0262035618. ceedings. URL http://arxiv.org/abs/1312.6114. URL https://www.deeplearningbook.org/. [41] Kohonen, T. 1982. Self-organized formation of topo- [30] Gross, B.; Trask, N.; Kuberry, P.; and Atzberger, P. logically correct feature maps. Biological cybernetics 2020. Meshfree methods on manifolds for hydrody- 43(1): 59–69. URL https://link.springer.com/article/ namic flows on curved surfaces: A Generalized Mov- 10.1007/BF00337288. ing Least-Squares (GMLS) approach. Journal of [42] Korda, M.; Putinar, M.; and Mezić, I. 2020. Data- Computational Physics 409: 109340. ISSN 0021- driven spectral analysis of the Koopman operator. 9991. doi:https://doi.org/10.1016/j.jcp.2020.109340. Applied and Computational Harmonic Analy- URL http://www.sciencedirect.com/science/article/pii/ sis 48(2): 599 – 629. ISSN 1063-5203. doi: S0021999120301145. https://doi.org/10.1016/j.acha.2018.08.002. URL [31] Hernández, C. X.; Wayment-Steele, H. K.; Sultan, http://www.sciencedirect.com/science/article/pii/ M. M.; Husic, B. E.; and Pande, V. S. 2018. Varia- S1063520318300988. tional encoding of complex dynamics. Physical Re- [43] Krishnan, R. G.; Shalit, U.; and Sontag, D. A. view E 97(6). ISSN 2470-0053. doi:10.1103/physreve. 2017. Structured Inference Networks for Nonlin- 97.062412. URL http://dx.doi.org/10.1103/PhysRevE. ear State Space Models. In Singh, S. P.; and 97.062412. Markovitch, S., eds., Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February [54] Pawar, S.; Ahmed, S. E.; San, O.; and Rasheed, A. 4-9, 2017, San Francisco, California, USA, 2101– 2020. Data-driven recovery of hidden physics in re- 2109. AAAI Press. URL http://aaai.org/ocs/index.php/ duced order modeling of fluid flows 32: 036602. ISSN AAAI/AAAI17/paper/view/14215. 1070-6631. doi:10.1063/5.0002051. [44] Kullback, S.; and Leibler, R. A. 1951. On Informa- [55] Pearce, M. 2020. The Gaussian Process Prior VAE for tion and Sufficiency. Ann. Math. Statist. 22(1): 79–86. Interpretable Latent Dynamics from Pixels. volume doi:10.1214/aoms/1177729694. URL https://doi.org/ 118 of Proceedings of Machine Learning Research, 10.1214/aoms/1177729694. 1–12. PMLR. URL http://proceedings.mlr.press/v118/ [45] Kutz, J. N.; Brunton, S. L.; Brunton, B. W.; pearce20a.html. and Proctor, J. L. 2016. Dynamic Mode De- [56] Perea, J. A.; and Carlsson, G. 2014. A Klein-Bottle- composition. Philadelphia, PA: Society for In- Based Dictionary for Texture Representation. In- dustrial and Applied Mathematics. doi:10.1137/1. ternational Journal of Computer Vision 107(1): 75– 9781611974508. URL https://epubs.siam.org/doi/abs/ 97. ISSN 1573-1405. URL https://doi.org/10.1007/ 10.1137/1.9781611974508. s11263-013-0676-2. 
[46] Lee, K.; and Carlberg, K. T. 2020. Model reduc- [57] Perez Rey, L. A.; Menkovski, V.; and Portegies, J. tion of dynamical systems on nonlinear manifolds 2020. Diffusion Variational Autoencoders. In Bessiere, using deep convolutional autoencoders. Journal of C., ed., Proceedings of the Twenty-Ninth International Computational Physics 404: 108973. ISSN 0021- Joint Conference on Artificial Intelligence, IJCAI-20, 9991. doi:https://doi.org/10.1016/j.jcp.2019.108973. 2704–2710. International Joint Conferences on Arti- URL http://www.sciencedirect.com/science/article/pii/ ficial Intelligence Organization. doi:10.24963/ijcai. S0021999119306783. 2020/375. URL https://arxiv.org/pdf/1901.08991.pdf. [47] Lusch, B.; Kutz, J. N.; and Brunton, S. L. 2018. Deep [58] Raissi, M.; and Karniadakis, G. E. 2018. Hidden learning for universal linear embeddings of nonlinear physics models: Machine learning of nonlinear par- dynamics. Nature Communications 9(1): 4950. ISSN tial differential equations. Journal of Computational 2041-1723. URL https://doi.org/10.1038/s41467-018- Physics 357: 125 – 141. ISSN 0021-9991. URL 07210-0. https://arxiv.org/abs/1708.00588. [48] Mania, H.; Jordan, M. I.; and Recht, B. 2020. Ac- [59] Roeder, G.; Grant, P. K.; Phillips, A.; Dalchau, N.; and tive learning for nonlinear system identification with Meeds, E. 2019. Efficient Amortised Bayesian Infer- guarantees. arXiv preprint arXiv:2006.10277 URL ence for Hierarchical and Nonlinear Dynamical Sys- https://arxiv.org/pdf/2006.10277.pdf. tems URL https://arxiv.org/abs/1905.12090. [49] Mendez, M. A.; Balabane, M.; and Buchlin, J. M. 2018. Multi-scale proper orthogonal decomposition [60] Samuel H. Rudy, J. Nathan Kutz, S. L. B. 2018. Deep (mPOD) doi:10.1063/1.5043720. learning of dynamics and signal-noise decomposition with time-stepping constraints. arXiv:1808:02578 [50] Mezić, I. 2013. Analysis of Fluid Flows via Spec- URL https://doi.org/10.1016/j.jcp.2019.06.056. tral Properties of the Koopman Operator. Annual Re- view of Fluid Mechanics 45(1): 357–378. doi:10.1146/ [61] Sarafianos, N.; Boteanu, B.; Ionescu, B.; and Kaka- annurev-fluid-011212-140652. URL https://doi.org/ diaris, I. A. 2016. 3D Human pose estimation: A 10.1146/annurev-fluid-011212-140652. review of the literature and analysis of covariates. Computer Vision and Image Understanding 152: 1 – [51] Nelles, O. 2013. Nonlinear system identification: 20. ISSN 1077-3142. doi:https://doi.org/10.1016/ from classical approaches to neural networks and j.cviu.2016.09.002. URL http://www.sciencedirect. fuzzy models. Springer Science & Business Me- com/science/article/pii/S1077314216301369. dia. URL https://play.google.com/books/reader?id= tyjrCAAAQBAJ&hl=en&pg=GBS.PR3. [62] Saul, L. K. 2020. A tractable latent variable model for nonlinear dimensionality reduction. Proceed- [52] Ohlberger, M.; and Rave, S. 2016. Reduced Ba- ings of the National Academy of Sciences 117(27): sis Methods: Success, Limitations and Future Chal- 15403–15408. ISSN 0027-8424. doi:10.1073/pnas. lenges. Proceedings of the Conference Algoritmy 1916012117. URL https://www.pnas.org/content/117/ 1–12. URL http://www.iam.fmph.uniba.sk/amuc/ojs/ 27/15403. index.php/algoritmy/article/view/389. [63] Schmid, P. J. 2010. Dynamic mode decomposition of [53] Parish, E. J.; and Carlberg, K. T. 2020. Time-series numerical and experimental data. Journal of Fluid Me- machine-learning error models for approximate solu- chanics 656: 5–28. doi:10.1017/S0022112010001217. tions to parameterized dynamical systems. 
Computer URL https://doi.org/10.1017/S0022112010001217. Methods in Applied Mechanics and Engineering 365: 112990. ISSN 0045-7825. doi:https://doi.org/10.1016/ [64] Schmidt, M.; and Lipson, H. 2009. Distilling Free- j.cma.2020.112990. URL http://www.sciencedirect. Form Natural Laws from Experimental Data 324: 81– com/science/article/pii/S0045782520301742. 85. ISSN 0036-8075. doi:10.1126/science.1165893. [65] Schoukens, J.; and Ljung, L. 2019. Nonlinear Sys- tem Identification: A User-Oriented Road Map. IEEE Control Systems Magazine 39(6): 28–99. doi:10.1109/ MCS.2019.2938121. [66] Schön, T. B.; Wills, A.; and Ninness, B. 2011. System identification of nonlinear state-space mod- els. Automatica 47(1): 39 – 49. ISSN 0005- 1098. doi:https://doi.org/10.1016/j.automatica.2010. 10.013. URL http://www.sciencedirect.com/science/ article/pii/S0005109810004279. [67] Sjöberg, J.; Zhang, Q.; Ljung, L.; Benveniste, A.; Delyon, B.; Glorennec, P.-Y.; Hjalmarsson, H.; and Juditsky, A. 1995. Nonlinear black- box modeling in system identification: a unified overview. Automatica 31(12): 1691 – 1724. ISSN 0005-1098. doi:https://doi.org/10.1016/0005- 1098(95)00120-8. URL http://www.sciencedirect. com/science/article/pii/0005109895001208. Trends in System Identification. [68] Talmon, R.; Mallat, S.; Zaveri, H.; and Coifman, R. R. 2015. Manifold Learning for Latent Variable Inference in Dynamical Systems. IEEE Transactions on Sig- nal Processing 63(15): 3843–3856. doi:10.1109/TSP. 2015.2432731. [69] Tu, J. H.; Rowley, C. W.; Luchtenburg, D. M.; Brun- ton, S. L.; and Kutz, J. N. 2014. On dynamic mode decomposition: Theory and applications. Journal of Computational Dynamics URL http://aimsciences.org/ /article/id/1dfebc20-876d-4da7-8034-7cd3c7ae1161. [70] Van Der Merwe, R.; Doucet, A.; De Freitas, N.; and Wan, E. 2000. The Unscented Particle Filter. In Proceedings of the 13th International Conference on Neural Information Processing Systems, NIPS’00, 563–569. Cambridge, MA, USA: MIT Press. [71] Wan, E. A.; and Van Der Merwe, R. 2000. The un- scented Kalman filter for nonlinear estimation. In Pro- ceedings of the IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium (Cat. No.00EX373), 153–158. doi:10.1109/ASSPCC. 2000.882463. [72] Whitney, H. 1944. The Self-Intersections of a Smooth n-Manifold in 2n-Space. Annals of Mathematics 45(2): 220–246. ISSN 0003486X. URL http://www.jstor.org/ stable/1969265. [73] Yang, Y.; and Perdikaris, P. 2018. Physics- informed deep generative models. arXiv preprint arXiv:1812.03511 . Appendix A: Backpropogation of Encoders for where Φk (u, w) = 21 kw − σ k (u)k22 . The w is the input and Non-Euclidean Latent Spaces given by u∗ , k ∗ is the solution sought. For smooth parameterizations, the optimal solution satisfies General Manifolds We develop methods for using backpropogation to learn en- G = ∇z Φk∗ (u∗ , w) = 0. coder maps from Rd to general manifolds M. We perform During learning we need gradients ∇w Λ(w) = ∇w z when learning using the family of manifold encoder maps of the w is varied characterizing variations of points on the mani- form Eθ = Λ(Ẽθ (x)). This allows for use of latent spaces fold z = Λ(w). We derive these expressions by considering having general topologies and geometries. We represent the variations w = w(γ) for a scalar parameter γ. We can ob- manifold as an embedding M ⊂ R2m and computationally tain the needed gradients by determining the variations of use point-cloud representations along with local gradient in- u∗ = u∗ (γ). 
formation, see Figure 6. To allow for Eθ to be learnable, we develop approaches for incorporating our maps into general backpropagation frameworks.

Figure 6: Learnable Mappings to Manifold Surfaces. We develop methods based on point cloud representations embedded in R^n for learning latent manifold representations having general geometries and topologies.

For a manifold M of dimension m, we can represent it by an embedding within R^2m, as supported by the Whitney Embedding Theorem [72]. We let z = Λ(w) be a mapping with w ∈ R^2m to points on the manifold z ∈ M. This allows for learning within the family of manifold encoders w = Ẽθ(x) any function from R^d to R^2m. This facilitates use of deep neural networks and other function classes. In practice, we shall take z = Λ(w) to map to the nearest location on the manifold. We can express this as the optimization problem

z* = arg min_{z ∈ M} (1/2) ‖w − z‖₂².

We can always express a smooth manifold using local coordinate charts σ^k(u), for example, by using a local Monge-Gauge quadratic fit to the point cloud [30]. We can express z* = σ^k(u*) for some chart k*. In terms of the coordinate charts {U_k} and local parameterizations {σ^k(u)} we can express this as

u*, k* = arg min_{k, u ∈ U_k} (1/2) ‖w − σ^k(u)‖₂².

We can express the needed gradients ∇_w Λ(w) using the Implicit Function Theorem as

0 = (d/dγ) G(u*(γ), w(γ)) = ∇_u G (du*/dγ) + ∇_w G (dw/dγ).

This implies

du*/dγ = −[∇_u G]⁻¹ ∇_w G (dw/dγ).

As long as we can evaluate at u these local gradients ∇_u G, ∇_w G, dw/dγ, we only need to determine computationally the solution u*. For the backpropagation framework, we use these to assemble the needed gradients for our manifold encoder maps Eθ = Λ(Ẽθ(x)) as follows.

We first find numerically the closest point in the manifold z* ∈ M and represent it as z* = σ(u*) = σ^{k*}(u*) for some chart k*. In this chart, the gradients can be expressed as

G = ∇_u Φ(u, w) = −(w − σ(u))ᵀ ∇_u σ(u).

We take here a column vector convention with ∇_u σ(u) = [σ_{u1} | ... | σ_{uk}]. We next compute

∇_u G = ∇_{uu} Φ = ∇_u σᵀ ∇_u σ − (w − σ(u))ᵀ ∇_{uu} σ(u)

and

∇_w G = ∇_{w,u} Φ = −I ∇_u σ(u).

For implementation it is useful to express this in more detail component-wise as

[G]_i = −Σ_k (w_k − σ_k(u)) ∂_{u_i} σ_k(u),

[∇_u G]_{i,j} = [∇_{uu} Φ]_{i,j} = Σ_k ∂_{u_j} σ_k(u) ∂_{u_i} σ_k(u) − Σ_k (w_k − σ_k(u)) ∂²_{u_i u_j} σ_k(u),

[∇_w G]_{i,j} = [∇_{w,u} Φ]_{i,j} = −Σ_k ∂_{w_j} w_k ∂_{u_i} σ_k(u) = −∂_{u_i} σ_j(u).

The final gradient is given by

dΛ(w)/dγ = dz*/dγ = ∇_u σ (du*/dγ) = −∇_u σ [∇_u G]⁻¹ ∇_w G (dw/dγ).

In summary, once we determine the point z* = Λ(w), we need only evaluate the above expressions to obtain the needed gradient for learning via backpropagation

∇_θ Eθ(x) = ∇_w Λ(w) ∇_θ Ẽθ(x),   w = Ẽθ(x).

The ∇_w Λ is determined by dΛ(w)/dγ using γ = w1, ..., wn. In practice, the Ẽθ(x) is represented by a deep neural network from R^d to R^2m. In this way, we can learn general encoder mappings Eθ(x) from x ∈ R^d to general manifolds M.
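A minimal sketch of how such a quantized nearest-point map with the gradients above could be exposed to an automatic-differentiation framework is given below. It assumes the point cloud is supplied together with an (approximate) tangent basis J = ∇_u σ at each sample, for instance from a local Monge-Gauge or PCA fit, and it neglects the curvature term in ∇_u G, so that the backward pass reduces to the projector J(JᵀJ)⁻¹Jᵀ onto the local tangent space. This is a simplified illustration under those assumptions, not the authors' implementation.

import torch

class ManifoldProjection(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w, points, tangents):
        # w: (batch, n); points: (N, n); tangents: (N, n, m)
        d = torch.cdist(w, points)              # pairwise distances (batch, N)
        idx = d.argmin(dim=1)                   # nearest point-cloud sample
        z = points[idx]                         # quantized projection z* = Lambda(w)
        J = tangents[idx]                       # local tangent frames (batch, n, m)
        ctx.save_for_backward(J)
        return z

    @staticmethod
    def backward(ctx, grad_z):
        (J,) = ctx.saved_tensors
        # First-order chain rule: dz*/dw ~= J (J^T J)^{-1} J^T (tangent projector),
        # i.e. the curvature term in grad_u G is dropped.
        JtJ = J.transpose(1, 2) @ J
        P = J @ torch.linalg.solve(JtJ, J.transpose(1, 2))
        grad_w = (P @ grad_z.unsqueeze(-1)).squeeze(-1)
        return grad_w, None, None

def project(w, points, tangents):
    return ManifoldProjection.apply(w, points, tangents)

Used after a network output w = Ẽθ(x), this makes the composed encoder Eθ(x) = Λ(Ẽθ(x)) trainable end-to-end, since gradients flow through the projection onto the point-cloud manifold.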