Enforcing constraints for time series prediction in supervised, unsupervised and reinforcement learning

Panos Stinis
Advanced Computing, Mathematics and Data Division
Pacific Northwest National Laboratory, Richland WA 99354

Copyright © 2020, for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

We assume that we are given a time series of data from a dynamical system and that our task is to learn the flow map of the dynamical system. We present a collection of results on how to enforce constraints coming from the dynamical system in order to accelerate the training of deep neural networks to represent the flow map of the system, as well as to increase their predictive ability. In particular, we provide ways to enforce constraints during training for all three major modes of learning, namely supervised, unsupervised and reinforcement learning. In general, the dynamic constraints need to include terms which are analogous to memory terms in model reduction formalisms. Such memory terms act as a restoring force which corrects the errors committed by the learned flow map during prediction. For supervised learning, the constraints are added to the objective function. For the case of unsupervised learning, in particular generative adversarial networks, the constraints are introduced by augmenting the input of the discriminator. Finally, for the case of reinforcement learning, in particular actor-critic methods, the constraints are added to the reward function. In addition, for the reinforcement learning case, we present a novel approach based on homotopy of the action-value function in order to stabilize and accelerate training. We use numerical results for the Lorenz system to illustrate the various constructions.

Introduction

Scientific machine learning, which combines the strengths of scientific computing with those of machine learning, is becoming a rather active area of research. Several related priority research directions were stated in the recently published report (Baker et al. 2019). In particular, two of these priority research directions are: (i) how to leverage scientific domain knowledge in machine learning (e.g. physical principles, symmetries, constraints); and (ii) how machine learning can enhance scientific computing (e.g. reduced-order or sub-grid physics models, parameter optimization in multiscale simulations).

Our aim in the current work is to present a collection of results that contribute to both of the aforementioned priority research directions. On the one hand, we provide ways to enforce constraints coming from a dynamical system during the training of a neural network that represents the flow map of the system. Thus, prior domain knowledge is incorporated in the neural network training. On the other hand, as we will show, the accurate representation of the dynamical system flow map through a neural network is equivalent to constructing a temporal integrator for the dynamical system, modified to account for unresolved temporal scales. Thus, machine learning can enhance scientific computing.

We assume that we are given data in the form of a time series of the states of a dynamical system (a training trajectory). Our task is to train a neural network to learn the flow map of the dynamical system, i.e. to optimize the parameters of the neural network so that, when presented with the state of the system at one time instant, it predicts accurately the state of the system at another instant a fixed time interval later. If we use the data alone to train a neural network to represent the flow map, it is easy to construct simple examples where the trained flow map has rather poor predictive ability (Stinis et al. 2019). The reason is that the given data train the flow map to respond accurately only as long as the state of the system is on the training trajectory. However, at every timestep, when we invoke the flow map to predict the estimate of the state at the next timestep, we commit an error. After some steps, the predicted trajectory veers into parts of phase space where the neural network has not trained. When this happens, the neural network's predictive ability degrades rapidly.
One way to aid the neural network in its training task is to provide data that account for this inevitable error. In (Stinis et al. 2019), we advanced the idea of using a noisy version of the training data, i.e. a noisy version of the training trajectory. In particular, we attach a noise cloud around each point on the training trajectory. During training, the neural network learns how to take as input points from the noise cloud and map them back to the noiseless trajectory at the next time instant. This is an implicit way of encoding a restoring force in the parameters of the neural network. We have found that this modification can improve the predictive ability of the trained neural network, but only up to a point.

We want to aid the neural network further by enforcing constraints that we know the state of the system satisfies. In particular, we assume that we have knowledge of the differential equations that govern the evolution of the system (our constructions also work if we assume algebraic constraints, see e.g. (Stinis et al. 2019)). Enforcing the differential equations directly at the continuum level can be effected for supervised and reinforcement learning, but it is more involved for unsupervised learning. Here we have opted to enforce constraints in discrete time: we incorporate the discretized dynamics into the training process of the neural network. The purpose of such an attempt can be explained in two ways: (i) we want to aid the neural network so that it does not have to discover the dynamics (physics) from scratch; and (ii) we want the constraints to act as regularizers for the optimization problem which determines the parameters of the neural network.
Closer inspection of the concept of noisy data and of enforcing the discretized constraints reveals that they can be combined. However, this needs to be done with care. Recall that when we use noisy data we train the neural network to map a point from the noise cloud back to the noiseless point at the next time instant. Thus, we cannot enforce the discretized constraints as they stand, because the dynamics have been modified. In particular, the use of noisy data requires that the discretized constraints be modified to account explicitly for the restoring force. We have called this modification of the discretized constraints the explicit error-correction.

The meaning of the restoring force is analogous to that of memory terms in model reduction formalisms (Chorin and Stinis 2006). Note that the memory here does not arise because we resolve only part of the system's variables (see e.g. (Ma, Wang, and E 2018; Harlim et al. 2019)) but because we use a finite timestep. The timescales that are smaller than the timestep used are not resolved explicitly. However, their effect on the resolved timescales cannot be ignored; in fact, it is what causes the inevitable error at each application of the flow map. The restoring force that we include in the modified constraints is there to remedy this error, i.e. to account for the unresolved timescales, albeit in a simplified manner. This is precisely the role played by memory terms in model reduction formalisms. In the current work we have restricted attention to linear error-correction terms. The linear terms come with coefficients whose magnitude is optimized as part of the training. In this respect, optimizing the error-correction term coefficients becomes akin to temporal renormalization: the coefficients depend on the temporal scale at which we probe the system (Goldenfeld 1992; Barenblatt 2003). Finally, we note that the error-correction term can be more complex than linear. In fact, it can be modeled by a separate neural network. It can also involve not just the previous state but also states further back in time. Results for such more elaborate error-correction terms will be presented elsewhere.

We have implemented constraint enforcing in all three major modes of learning. For supervised learning, the constraints are added to the objective function. For the case of unsupervised learning, in particular generative adversarial networks (GANs) (Goodfellow et al. 2014), the constraints are introduced by augmenting the input of the discriminator (Stinis et al. 2019). Finally, for the case of reinforcement learning, in particular actor-critic methods (Sutton et al. 1999), the constraints are added to the reward function. In addition, for the reinforcement learning case, we have developed a novel approach based on homotopy of the action-value function in order to stabilize and accelerate training.

In recent years, there has been considerable interest in the development of methods that utilize data and physical constraints in order to train predictors for dynamical systems and differential equations, e.g. see (Berry, Giannakis, and Harlim 2015; Raissi, Perdikaris, and Karniadakis 2018; Chen et al. 2018; Han, Jentzen, and E 2018; Sirignano and Spiliopoulos 2018; Felsberger and Koutsourelakis 2018; Wan et al. 2018; Ma et al. 2018) and references therein. Our approach is different: it introduces the novel concept of training on purpose with modified (noisy) data in order to incorporate (implicitly or explicitly) a restoring force in the dynamics learned by the neural network flow map. We have also provided the connection between the incorporation of such restoring forces and the concept of memory in model reduction.

Due to space limitations, we cannot expand on the details of how to enforce constraints for the three major modes of learning (please see Sections 1 and 2 in (Stinis 2019) for a detailed discussion of all the constructions). Instead we focus on the presentation of numerical results for the Lorenz system to showcase the performance of the proposed approach. Also, we note that we have not included results which show how enforcing constraints, implicitly or explicitly, is better than not enforcing constraints at all (please see (Stinis et al. 2019) and (Stinis 2019) for such results).

Numerical results

The Lorenz system is given by

\frac{dx_1}{dt} = \sigma (x_2 - x_1),   (1)
\frac{dx_2}{dt} = \rho x_1 - x_2 - x_1 x_3,   (2)
\frac{dx_3}{dt} = x_1 x_2 - \beta x_3,   (3)

where σ, ρ and β are positive. For the numerical experiments we have chosen the commonly used values σ = 10, ρ = 28 and β = 8/3. For these values of the parameters the Lorenz system is chaotic and possesses an attractor for almost all initial points. We have chosen the initial condition x_1(0) = 0, x_2(0) = 1 and x_3(0) = 0.
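To make the setup concrete, the following is a minimal NumPy sketch of how such a ground-truth training trajectory can be generated; the function and variable names are ours, since the paper does not prescribe an implementation:

```python
import numpy as np

def lorenz_rhs(x, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system (1)-(3)."""
    return np.array([sigma * (x[1] - x[0]),
                     rho * x[0] - x[1] - x[0] * x[2],
                     x[0] * x[1] - beta * x[2]])

def euler_trajectory(x0, dt=1e-4, t_final=3.0):
    """Forward Euler integration producing the training trajectory."""
    n_steps = int(round(t_final / dt))
    traj = np.empty((n_steps + 1, 3))
    traj[0] = x0
    for n in range(n_steps):
        traj[n + 1] = traj[n] + dt * lorenz_rhs(traj[n])
    return traj

# Initial condition used in the experiments: (0, 1, 0).
traj = euler_trajectory(np.array([0.0, 1.0, 0.0]))
```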
We have used as training data the trajectory that starts from the specified initial condition and is computed by the forward Euler scheme with timestep δt = 10^-4. In particular, we have used data from a trajectory for t ∈ [0, 3]. For all three modes of learning, we have trained the neural network to represent the flow map with timestep ∆t = 1.5 × 10^-2, i.e. 150 times larger than the timestep used to produce the training data. After we trained the neural network that represents the flow map, we used it to predict the solution for t ∈ [0, 9]. Thus, the trained flow map's task is to predict (through iterative application) the whole training trajectory for t ∈ [0, 3] starting from the given initial condition and then keep producing predictions for t ∈ (3, 9].

This is a severe test of the learned flow map's predictive abilities for four reasons. First, due to the chaotic nature of the Lorenz system there is no guarantee that the flow map can correct its errors so that it follows the training trajectory closely even over the interval [0, 3] used for training. Second, by extending the interval of prediction beyond the one used for training, we check whether the neural network has actually learned the map of the Lorenz system rather than just overfitting the training data. Third, we have chosen an initial condition that is far away from the attractor, but our integration interval is long enough that the system reaches the attractor and then evolves on it. In other words, we want the neural network to learn both the evolution of the transient and the evolution on the attractor. Fourth, we have chosen to train the neural network to represent the flow map corresponding to a much larger timestep than the one used to produce the training trajectory, in order to check the ability of the error-correcting term to account for a significant range of unresolved timescales (relative to the training trajectory).

We performed experiments with different values for the various parameters that enter our constructions. We present here indicative results for the case of N = 2 × 10^4 samples (N/3 for training, N/3 for validation and N/3 for testing). We have chosen N_cloud = 100 for the cloud of points around each input. This fixes the timestep ∆t = 1.5 × 10^-2: there are 20000/100 = 200 time instants in the interval [0, 3], at a distance ∆t = 3/200 = 1.5 × 10^-2 apart.

The noise cloud for the neural network at a point t was constructed by taking the point x_i(t), i = 1, 2, 3, on the training trajectory and adding random disturbances, so that it becomes the collection x_il(t)(1 − R_range + 2 R_range ξ_il), where l = 1, ..., N_cloud, the random variables ξ_il ∼ U[0, 1] and R_range = 2 × 10^-2. As explained above, we want to train the neural network to map the input from the noise cloud at time t to the noiseless point x_i(t + ∆t), i = 1, 2, 3, on the training trajectory at time t + ∆t.

We also need to motivate the value of R_range for the range of the noise cloud. Recall that the training trajectory was computed with the Euler scheme, which is a first-order scheme. For the interval ∆t = 1.5 × 10^-2 we expect the error committed by the flow map to be of similar magnitude, and thus we should accommodate this error by considering a cloud of points within this range. We found that taking R_range slightly larger, equal to 2 × 10^-2, helps the accuracy of the training.
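A minimal sketch of this noise cloud construction (the naming is ours; the sampling follows the formula above):

```python
import numpy as np

def noise_cloud(x_t, n_cloud=100, r_range=2e-2, rng=None):
    """Multiplicative noise cloud around one trajectory point x(t):
    x_il(t) * (1 - R_range + 2 * R_range * xi_il), with xi_il ~ U[0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    xi = rng.uniform(0.0, 1.0, size=(n_cloud, x_t.size))
    return x_t * (1.0 - r_range + 2.0 * r_range * xi)

# Each cloud point generated at time t is paired, as a training target,
# with the noiseless trajectory point x(t + Delta_t).
```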
We denote by (F_1(z_j), F_2(z_j), F_3(z_j)) the neural network flow map prediction at t_j + ∆t for the input vector z_j = (z_j1, z_j2, z_j3) from the noise cloud at time t_j. Also, x^data_j = (x_1(t_j + ∆t), x_2(t_j + ∆t), x_3(t_j + ∆t)) is the point on the training trajectory computed by the Euler scheme with δt = 10^-4. For the mini-batch size we have chosen m = 1000 for the supervised and unsupervised cases and m = 33 for the reinforcement learning case.

We also need to specify the constraints that we want to enforce. Using the notation introduced above, we want to train the neural network flow map so that its output (F_1(z_j), F_2(z_j), F_3(z_j)) for an input data point z_j = (z_j1, z_j2, z_j3) from the noise cloud makes zero the residuals

\epsilon_{j1} = F_1(z_j) - z_{j1} - \Delta t [\sigma(z_{j2} - z_{j1})] + \Delta t\, a_1 z_{j1},   (4)
\epsilon_{j2} = F_2(z_j) - z_{j2} - \Delta t [\rho z_{j1} - z_{j2} - z_{j1} z_{j3}] + \Delta t\, a_2 z_{j2},   (5)
\epsilon_{j3} = F_3(z_j) - z_{j3} - \Delta t [z_{j1} z_{j2} - \beta z_{j3}] + \Delta t\, a_3 z_{j3},   (6)

where a_1, a_2 and a_3 are parameters to be optimized during training along with the parameters of the neural network flow map. The first three terms on the RHS of (4)-(6) constitute the forward Euler scheme, while the fourth is the diagonal linear error-correcting term. More elaborate error-correcting terms will appear elsewhere (see also (Stinis 2019)).
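A sketch of the residuals (4)-(6) in NumPy form; in the actual training they would be expressed in the automatic-differentiation framework used for the network, so that a_1, a_2, a_3 can be optimized jointly with the network weights:

```python
import numpy as np

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0

def residuals(F_out, z, a, dt=1.5e-2):
    """Residuals (4)-(6) for a batch: forward Euler residual of the Lorenz
    dynamics plus the diagonal linear error-correcting term dt * a_l * z_jl.
    F_out, z: arrays of shape (m, 3); a: array of shape (3,)."""
    f = np.stack([SIGMA * (z[:, 1] - z[:, 0]),
                  RHO * z[:, 0] - z[:, 1] - z[:, 0] * z[:, 2],
                  z[:, 0] * z[:, 1] - BETA * z[:, 2]], axis=1)
    return F_out - z - dt * f + dt * a * z
```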
Supervised learning

The loss function used for enforcing constraints in supervised learning was

\mathrm{Loss} = \frac{1}{m} \sum_{j=1}^{m} \sum_{l=1}^{3} \left[ (F_l(z_j) - x^{data}_{jl})^2 + \epsilon_{jl}^2 \right],   (7)

where the ε_jl are the residuals given by (4)-(6). The unconstrained loss function is given by (7) without the residuals.

We used a deep neural network for the representation of the flow map with 10 hidden layers of width 20. We note that because the solution of the Lorenz system acquires values outside the range of the activation function, we removed the activation function from the last layer of the network (alternatively, we could have used batch normalization and kept the activation function). Fig. 1 compares the evolution of the prediction for x_1(t) of the neural network flow map, starting at t = 0 and computed with a timestep ∆t = 1.5 × 10^-2, to the ground truth (training trajectory) computed with the forward Euler scheme with timestep δt = 10^-4. We show plots only for x_1(t) since the results are similar for x_2(t) and x_3(t).

[Figure 1: Supervised learning. Comparison of ground truth for x_1(t) computed with the Euler scheme with timestep δt = 10^-4 (blue dots) and the neural network flow map prediction with timestep ∆t = 1.5 × 10^-2 (red crosses). Note that the timestep of the neural network flow map is 150 times larger than the timestep used to produce the training data. (a) noisy data without enforced constraints during training; (b) noisy data with enforced constraints during training.]

We make two observations. First, the prediction of the neural network flow map is able to follow the ground truth with adequate accuracy not only during the interval [0, 3] that was used for training, but also during the interval (3, 9]. Second, the explicit enforcing of constraints, i.e. the enforcing of the constraints (4)-(6) (see results in Fig. 1(b)), is better than the implicit enforcing of constraints.
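For concreteness, the constrained loss (7), reusing the residuals function sketched above (again, in practice this lives inside the autodiff graph):

```python
import numpy as np

def constrained_loss(F_out, x_data, eps):
    """Loss (7): mean over the mini-batch of the squared data misfit plus
    the squared constraint residuals, summed over the three components.
    F_out, x_data, eps: arrays of shape (m, 3)."""
    return np.mean(np.sum((F_out - x_data) ** 2 + eps ** 2, axis=1))

# Dropping the eps**2 term recovers the unconstrained loss.
```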
Unsupervised learning

For the case of unsupervised learning we have chosen GANs. To enforce constraints we consider a two-player minimax game with the modified value function V^const(D, G):

\min_G \max_D V^{const}(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x, \epsilon_D(x))] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z), \epsilon_G(z)))],   (8)

where ε_D(x) ∼ N(0, (2δt)^2) is the constraint residual for the true sample (see the explanation in Section 2.4 in (Stinis et al. 2019)) and ε_G(z) is the constraint residual for the generator-created sample (see (4)-(6) above). The unconstrained value function is given by (8) without the residuals. Note that in our setup, the generator input distribution p_z(z) is the noise cloud around the training trajectory, while the true data distribution p_data is the distribution of values of the (noiseless) training trajectory.

We have used for the GAN generator a deep neural network with 9 hidden layers of width 20 and for the discriminator a neural network with 2 hidden layers of width 20. The numbers of hidden layers for both the generator and the discriminator were chosen as the smallest that allowed the GAN training to reach its game-theoretic optimum without requiring large-scale computations. Fig. 2 compares the evolution of the prediction of the neural network flow map, starting at t = 0 and computed with a timestep ∆t = 1.5 × 10^-2, to the ground truth (training trajectory) computed with the forward Euler scheme with timestep δt = 10^-4.

[Figure 2: Unsupervised learning (GAN). Comparison of ground truth for x_1(t) computed with the Euler scheme with timestep δt = 10^-4 (blue dots) and the neural network flow map (GAN generator) prediction with timestep ∆t = 1.5 × 10^-2 (red crosses). (a) noisy data without enforced constraints during training; (b) noisy data with enforced constraints during training.]

Fig. 2(a) shows results for the implicit enforcing of constraints. We see that this is not enough to produce a neural network flow map with long-term predictive accuracy. Fig. 2(b) shows the significant improvement in predictive accuracy when we enforce the constraints explicitly. The results for this specific example are not as good as in the case of supervised learning presented earlier. We note that training a GAN, with or without constraints, is a delicate numerical task, as explained in more detail in (Stinis et al. 2019). One needs to find the right balance between the expressive strengths of the generator and the discriminator (game-theoretic optimum) to avoid instabilities, while also training the neural network flow map, i.e. the GAN generator, so that it has predictive accuracy. We also note that training with noiseless data is even more brittle. For the very few experiments where we avoided instability, the predicted solution from the trained GAN generator was not accurate at all.
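A hedged PyTorch sketch of the input augmentation in (8): the discriminator receives the sample concatenated with its constraint residual. The depth and width follow the text; the choice of activations and the class/function names are our assumptions:

```python
import torch
import torch.nn as nn

class AugmentedDiscriminator(nn.Module):
    """Discriminator with input augmented by the constraint residual,
    as in the modified value function (8). Two hidden layers of width 20
    per the text; tanh/sigmoid activations are an assumption."""
    def __init__(self, state_dim=3, width=20):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, 1), nn.Sigmoid(),
        )

    def forward(self, x, eps):
        # Concatenate the state with its residual before discriminating.
        return self.net(torch.cat([x, eps], dim=-1))

def value_function(D, x_real, eps_real, x_fake, eps_fake):
    """V^const(D, G): E[log D(x, eps_D(x))] + E[log(1 - D(G(z), eps_G(z)))].
    eps_real is drawn from N(0, (2*dt)^2); eps_fake comes from (4)-(6)."""
    return (torch.log(D(x_real, eps_real)).mean()
            + torch.log(1.0 - D(x_fake, eps_fake)).mean())
```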
Reinforcement learning

The last case we examine is that of reinforcement learning (see (Stinis 2019) for notation and details about the constructions). In particular, we use a deterministic policy actor-critic method (Lillicrap et al. 2015). In our application we have identified the neural network flow map with the action policy. For the representation of the deterministic action policy we used a deep neural network with 10 hidden layers of width 20. For the representation of the action-value function we used a deep neural network with 15 hidden layers of width 20. The task of learning an accurate representation of the action-value function is more difficult than that of finding the action policy, which justifies the need for a stronger network to represent the action-value function.

The training of actor-critic methods in their original form suffers from stability issues. Researchers have developed various modifications and tricks to stabilize training (see the review in (Pfau and Vinyals 2016)). The one that enabled us to stabilize results in the first place is that of target networks (Mnih et al. 2015; Lillicrap et al. 2015). The target network concept uses different networks to represent the action-value function and the action policy that appear in the expression for the target in the Bellman equation. However, the predictive accuracy of the trained neural network flow map, i.e. the action policy, was extremely poor unless we also used our homotopy approach for the action-value function. This was true whether or not constraints were enforced explicitly during training. With this in mind, we present results with and without the homotopy approach for the action-value function to highlight the accuracy improvement afforded by the use of homotopy.

After each iteration of the optimizer for the action-value function, the homotopy approach uses the quantity

\delta \times Q(s_t, \mu(s_t)) + (1 - \delta) \times [r_t + \gamma Q(s_{t+1}, \mu(s_{t+1}))]   (9)

in the optimization for the action policy. Here, Q(s_t, µ(s_t)) is the action-value function, µ(s_t) is the action policy, r_t is the reward, γ ∈ [0, 1] is the discount factor which expresses the degree of faith in future actions, and δ is the homotopy parameter (see Section 2.3 in (Stinis 2019)). We initialized the homotopy parameter δ at 0 and increased its value (until it reached 1) every 2000 training iterations.

We have set the discount factor to γ = 1, which is a difficult case. It corresponds to a deterministic environment, i.e. the same actions always produce the same rewards. This is the situation in our numerical experiments, where we are given a training trajectory that does not change. We have conducted more experiments for other values of γ, but a detailed presentation of those results will await a future publication.

The reward function (with constraints) for an input point z_j from the noise cloud at time t_j is

r(z_j, x^{data}_j) = -\sum_{l=1}^{3} \left[ (\mu_l(z_j) - x^{data}_{jl})^2 + \epsilon_{jl}^2 \right],   (10)

where x^data_j is the noiseless point from the training trajectory at time t_j + ∆t. Also, µ_l(z_j) is the action at z_j, i.e. the prediction of the neural network flow map, and ε_jl is the constraint residual for the prediction (see (4)-(6) above).
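A sketch of the homotopy quantity (9) and the constrained reward (10). The δ increment size below is our assumption; the text specifies only that δ starts at 0 and is increased every 2000 iterations until it reaches 1:

```python
import numpy as np

def homotopy_quantity(Q, mu, s_t, r_t, s_next, delta, gamma=1.0):
    """Quantity (9) used in the action-policy optimization: a blend of the
    raw action value and the one-step Bellman target."""
    return (delta * Q(s_t, mu(s_t))
            + (1.0 - delta) * (r_t + gamma * Q(s_next, mu(s_next))))

def delta_schedule(iteration, step=2000, n_increments=10):
    """delta grows from 0 to 1 in equal increments every `step` training
    iterations (the number of increments is an assumption)."""
    return min(1.0, (iteration // step) / n_increments)

def reward(mu_out, x_data, eps):
    """Reward (10): negative squared prediction error plus squared
    constraint residuals, summed over the three components."""
    return -np.sum((mu_out - x_data) ** 2 + eps ** 2)
```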
Fig. 3 presents results of the prediction performance of the neural network flow map when it was trained with and without the use of homotopy for the action-value function. In Fig. 3(a) we have results for the implicit enforcing of constraints, while in Fig. 3(b) for the explicit enforcing of constraints. We make two observations. First, both for implicit and explicit enforcing of the constraints, the use of homotopy leads to accurate results for long times, especially for the case of explicit enforcing, which gave us some of the best results from all the numerical experiments we conducted for the different modes of learning. Second, if we do not use homotopy, the predictions are extremely poor both for implicit and explicit enforcing. Indeed, the green curve in Fig. 3(a), representing the prediction of x_1(t) for the case of implicit constraint enforcing without homotopy, is as inaccurate as it looks: it starts at 0, drops within a few steps to a negative value and does not change much after that. The predictions for x_2(t) and x_3(t) are equally inaccurate.

[Figure 3: Reinforcement learning (actor-critic). Comparison of ground truth for x_1(t) computed with the Euler scheme with timestep δt = 10^-4 (blue dots), the neural network flow map prediction with timestep ∆t = 1.5 × 10^-2 with homotopy for the action-value function during training (red crosses) and the neural network flow map prediction with timestep ∆t = 1.5 × 10^-2 without homotopy for the action-value function during training (green triangles). (a) noisy data without enforced constraints during training; (b) noisy data with enforced constraints during training.]

Discussion and future work

We have presented a collection of results about the enforcing of known constraints for a dynamical system during the training of a neural network to represent the flow map of the system. We have provided ways in which the constraints can be enforced in all three major modes of learning, namely supervised, unsupervised and reinforcement learning. In line with the law of scientific computing that one should build into an algorithm as much prior information as possible, we observe a striking improvement in performance when known constraints are enforced during training. There is an added benefit to training with noisy data, which corresponds to the incorporation of a restoring force in the dynamics of the system (see (Stinis et al. 2019) and (Stinis 2019) for more details). This restoring force is analogous to memory terms appearing in model reduction formalisms. In our framework, the reduction is in a temporal sense, i.e. it allows us to construct a flow map that remains accurate even though it is defined for large timesteps.

The model reduction connection opens an interesting avenue of research that makes contact with complex systems appearing in real-world problems. The use of larger timesteps for the neural network flow map than for the ground truth, without sacrificing too much accuracy, is important. We can imagine an online setting where observations come at sparsely placed time instants and are used to update the parameters of the neural network flow map. The use of sparse observations could be dictated by necessity, e.g. if it is hard to obtain frequent measurements, or by efficiency, e.g. because the local processing of data in field-deployed sensors can be costly. Thus, if the trained flow map is capable of accurate estimates using larger timesteps, then its successful updated training using only sparse observations becomes more probable.

The current approach approximates the flow map using a feed-forward neural network. It will be interesting to compare its performance with other approaches, most notably recurrent neural networks, which have been used to model time series data (see e.g. the review (Bianchi et al. 2017)).

The constructions presented in the current work depend on a large number of details that can potentially affect their performance. A thorough study of the relative merits of enforcing constraints for the different modes of learning needs to be undertaken and will be presented in a future publication. We do believe, though, that the framework provides a promising research direction at the nexus of scientific computing and machine learning.

Acknowledgements

The author would like to thank Court Corley, Tobias Hagge, Nathan Hodas, George Karniadakis, Kevin Lin, Paris Perdikaris, Maziar Raissi, Alexandre Tartakovsky, Ramakrishna Tipireddy, Xiu Yang and Enoch Yeung for helpful discussions and comments. The work presented here was partially supported by the PNNL-funded "Deep Learning for Scientific Discovery Agile Investment" and the DOE-ASCR-funded "Collaboratory on Mathematics and Physics-Informed Learning Machines for Multiscale and Multiphysics Problems (PhILMs)". Pacific Northwest National Laboratory is operated by Battelle Memorial Institute for DOE under Contract DE-AC05-76RL01830.
References

Baker, N.; Alexander, F.; Bremer, T.; Hagberg, A.; Kevrekidis, Y.; Najm, H.; Parashar, M.; Patra, A.; Sethian, J.; Wild, S.; and Willcox, K. 2019. Workshop report on basic research needs for scientific machine learning: Core technologies for artificial intelligence.

Barenblatt, G. I. 2003. Scaling. Cambridge University Press.

Berry, T.; Giannakis, D.; and Harlim, J. 2015. Nonparametric forecasting of low-dimensional dynamical systems. Phys. Rev. E 91:032915.

Bianchi, F. M.; Maiorino, E.; Kampffmeyer, M. C.; Rizzi, A.; and Jenssen, R. 2017. An overview and comparative analysis of recurrent neural networks for short term load forecasting. arXiv preprint arXiv:1705.04378.

Chen, R. T. Q.; Rubanova, Y.; Bettencourt, J.; and Duvenaud, D. 2018. Neural ordinary differential equations. arXiv preprint arXiv:1806.07366v3.

Chorin, A. J., and Stinis, P. 2006. Problem reduction, renormalization and memory. Communications in Applied Mathematics and Computational Science 1:1–27.

Felsberger, L., and Koutsourelakis, P. 2018. Physics-constrained, data-driven discovery of coarse-grained dynamics. arXiv preprint arXiv:1802.03824v1.

Goldenfeld, N. 1992. Lectures on Phase Transitions and the Renormalization Group. Perseus Books.

Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; and Bengio, Y. 2014. Generative adversarial nets. Advances in Neural Information Processing Systems 2672–2680.

Han, J.; Jentzen, A.; and E, W. 2018. Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences 115(34):8505–8510.

Harlim, J.; Jiang, S. W.; Liang, S.; and Yang, H. 2019. Machine learning for prediction with missing dynamics. arXiv preprint arXiv:1910.05861.

Lillicrap, T. P.; Hunt, J. J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; and Wierstra, D. 2015. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.

Ma, H.; Leng, S.; Aihara, K.; Lin, W.; and Chen, L. 2018. Randomly distributed embedding making short-term high-dimensional data predictable. Proceedings of the National Academy of Sciences 115(43):E9994–E10002.

Ma, C.; Wang, J.; and E, W. 2018. Model reduction with memory and the machine learning of dynamical systems. arXiv preprint arXiv:1808.04258v1.

Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A. A.; Veness, J.; Bellemare, M. G.; Graves, A.; Riedmiller, M.; Fidjeland, A. K.; Ostrovski, G.; Petersen, S.; Beattie, C.; Sadik, A.; Antonoglou, I.; King, H.; Kumaran, D.; Wierstra, D.; Legg, S.; and Hassabis, D. 2015. Human-level control through deep reinforcement learning. Nature 518(7540):529–533.

Pfau, D., and Vinyals, O. 2016. Connecting generative adversarial networks and actor-critic methods. arXiv preprint arXiv:1610.01945.

Raissi, M.; Perdikaris, P.; and Karniadakis, G. 2018. Numerical Gaussian processes for time-dependent and nonlinear partial differential equations. SIAM J. Sci. Comput. 40:A172–A198.

Sirignano, J., and Spiliopoulos, K. 2018. DGM: A deep learning algorithm for solving partial differential equations. Journal of Computational Physics 375:1339–1364.

Stinis, P.; Hagge, T.; Tartakovsky, A. M.; and Young, E. 2019. Enforcing constraints for interpolation and extrapolation in generative adversarial networks. Journal of Computational Physics 397.

Stinis, P. 2019. Enforcing constraints for time series prediction in supervised, unsupervised and reinforcement learning. arXiv preprint arXiv:1905.07501.

Sutton, R. S.; McAllester, D.; Singh, S.; and Mansour, Y. 1999. Policy gradient methods for reinforcement learning with function approximation. In Proceedings of the 12th International Conference on Neural Information Processing Systems, NIPS'99, 1057–1063. Cambridge, MA, USA: MIT Press.

Wan, Z.; Vlachas, P.; Koumoutsakos, P.; and Sapsis, T. 2018. Data-assisted reduced-order modeling of extreme events in complex dynamical systems. PLoS ONE 13:e0197704.