<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Graph Networks with Physics-aware Knowledge Informed in Latent Space</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Sungyong</forename><surname>Seo</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">University of Southern California</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Yan</forename><surname>Liu</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">University of Southern California</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Graph Networks with Physics-aware Knowledge Informed in Latent Space</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">58276DDFC1FCB28FD07B49895B71D137</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T20:21+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>While physics conveys knowledge of nature built from an interplay between observations and theory, it has been considered less important for modeling deep neural networks. Despite the usefulness of physical rules, it is particularly challenging to leverage the knowledge for sparse data since most physics equations are well defined on the continuous and dense space. In addition, it is even harder to inform the equations into a model if the observations are not fully governed by the given physical knowledge. In this work, we present a novel architecture to incorporate physics or domain knowledge given as a form of partial differential equations (PDEs) on sparse observations by utilizing graph structure. Moreover, we leverage the representation power of deep learning by informing the knowledge in latent space. We demonstrate that climate prediction tasks are significantly improved and validate the effectiveness and importance of the proposed model.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Introduction</head><p>Modeling natural phenomena in the real-world, such as climate, traffic, molecule, and so on, is extremely challenging but important. Deep learning has achieved significant successes in prediction performance by learning latent representations from data-rich applications such as speech recognition <ref type="bibr">(Hinton et al. 2012</ref>), text understanding <ref type="bibr" target="#b19">(Wu et al. 2016)</ref>, and image recognition <ref type="bibr" target="#b11">(Krizhevsky, Sutskever, and Hinton 2012)</ref>. While the accuracy and efficiency of datadriven deep learning models can be improved with ad-hoc architectural changes for specific tasks, we are confronted with many challenging learning scenarios in modeling natural phenomenon, where a limited number of labeled examples are available, there is much noise in the data, and there could be constant changes in data distributions (e.g. dynamic systems). Furthermore, in many domains, data are only available on scattered collections of points (sensors or point clouds, see Figure <ref type="figure" target="#fig_0">1</ref>) where the majority of existing methods are not applicable. These challenges are not easily addressed under the purely data-driven learning models and therefore, there is a pressing need to develop new generation robust learning models that can address these challenging learning scenarios. Physics is one of the fundamental pillars describing how the real-world behaves. It is imperative that physicsinformed learning models are powerful solutions to modeling natural phenomena. Incorporating domain knowledge has several benefits: first, it helps an optimized solution to be more stable and to prevent overfitting; second, it provides theoretical guidance with which an optimized model is supposed to follow and thus, helps training with fewer data; lastly, since a model is driven by the desired inductive bias, it would be more robust to unseen data, and thus it is easier to enable accurate extrapolation.</p><p>In the meanwhile, there exist a series of challenges when we incorporate physics principles into machine learning models. First, a model needs to properly handle the spatial and temporal constraints. Many physics equations demonstrate how a set of physical quantities behaves on space and time. For example, the wave equation describes how a signal is propagated through a medium over time. Second, the model should capture relations between objects, such as image patches <ref type="bibr" target="#b18">(Santoro et al. 2017)</ref> or rigid bodies <ref type="bibr" target="#b0">(Battaglia et al. 2016;</ref><ref type="bibr">Chang et al. 2017)</ref>. Third, the learning modules should be shared over all objects because physical laws are commonly applicable to all objects. Finally, the model should be flexible to extract unknown patterns instead of be-ing strictly constrained to the physics knowledge. Since it is not always possible to describe all rules governing realworld data, data-driven learning is required to fill the gap between the known physics and real observations.</p><p>In this paper, we address the problem of modeling dynamical systems based on graph neural networks by incorporating useful knowledge described as differentiable physics equations. 
We propose a generic architecture, physics-aware graph networks (PaGN), which can leverage explicitly required physics and learn implicit patterns from data, as illustrated in Figure <ref type="figure" target="#fig_0">1</ref>. The proposed model properly handles spatially distributed objects and their relations as vertices and edges in a graph. Moreover, temporal dependencies are learned by recurrent computations. As <ref type="bibr" target="#b1">Battaglia et al. (2018)</ref> suggest, the inductive bias of a graph-based model is its invariance to node/edge permutations, and thus all trainable functions for the same input types are shared.</p><p>The contributions of this work are summarized as follows: • We develop a novel physics-aware learning architecture, PaGN, which incorporates differentiable physics equations within a graph network framework. • We explore the performance of PaGN on graph signal prediction tasks to demonstrate that physics knowledge provides a significant improvement in prediction and makes the model more robust. • We investigate the effectiveness and importance of PaGN on climate prediction to show how physics knowledge can benefit prediction performance.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Related Work</head><p>Incorporating physics Among many attempts incorporating physical knowledge into data-driven models, Cressie and Wikle (2015) covered a number of statistical models (e.g., a hierarchical Bayesian framework) handling physical equations. Raissi, Perdikaris, and Karniadakis (2017a) introduced a concept of physics-informed neural networks, which utilize physics equations explicitly to train neural networks. By optimizing the model at initial/boundary and sampled collocation points, the data-driven solutions of nonlinear PDEs can be found. Based on this fundamental idea, a number of works for simulating and discovering PDEs have been published <ref type="bibr" target="#b14">(Raissi and Karniadakis 2018;</ref><ref type="bibr" target="#b13">Raissi 2018;</ref><ref type="bibr" target="#b15">Raissi, Perdikaris, and Karniadakis 2017b)</ref>. Although these works leveraged physical knowledge, they are limited because they require all physics behind given data to be explicitly known. de Bezenac, Pajot, and Gallinari (2018) considered a similar problem as ours. They proposed how transport physics (advection and diffusion) could be incorporated for forecasting sea surface temperature (SST). In other words, they proposed how the motion flow that is helpful for the temperature flow prediction could be extracted in an unsupervised manner from a sequence of SST images. This work is a major milestone since it captures not only the dominant transport physics but also unknown patterns inferred through the neural networks. Despite of its novel architecture, the model is specifically designed for transport physics and it is not straightforward to extend the model to other physics equations. Furthermore, it is restricted in a regular grid to use conventional convolutional neural networks (CNNs) for images.</p><p>Discovering physical dynamics A class of models <ref type="bibr" target="#b8">(Grzeszczuk, Terzopoulos, and Hinton 1998;</ref><ref type="bibr" target="#b0">Battaglia et al. 2016;</ref><ref type="bibr">Chang et al. 2017;</ref><ref type="bibr" target="#b19">Watters et al. 2017;</ref><ref type="bibr" target="#b17">Sanchez-Gonzalez et al. 2018;</ref><ref type="bibr" target="#b10">Kipf et al. 2018</ref>) have been proposed based on the assumption that neural networks can learn complex physical interactions and simulate unseen dynamics based on a current state. The models along this direction are based on common relational inductive biases <ref type="bibr" target="#b18">(Santoro et al. 2017;</ref><ref type="bibr" target="#b1">Battaglia et al. 2018)</ref>, i.e., functions connecting entities and relations are shared and can be learned from a given sequence of simulated dynamics. <ref type="bibr">(Chang et al. 2017;</ref><ref type="bibr" target="#b0">Battaglia et al. 2016;</ref><ref type="bibr" target="#b17">Sanchez-Gonzalez et al. 2018)</ref> commonly assumed that the objects' behaviors were governed by classical kinetic physics equations. Then, object-and relation-centric functions were proposed to learn the transition from the current state to the next state without explicitly injecting the equations into the model. Discovering latent physics by data-driven learning has been actively studied <ref type="bibr" target="#b12">(Long et al. 2018;</ref><ref type="bibr" target="#b3">Brunton, Proctor, and Kutz 2016)</ref>. 
While the properly constrained filters enable us to identify the governing PDEs, this is only applicable when we are aware of the form of the target PDEs. Unlike this line of work, which extracts latent patterns from data only, our proposed model can incorporate known physics and at the same time extract latent patterns that cannot be captured by the existing knowledge.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Background</head><p>In this section, we introduce how differential operators in Euclidean domain are analogously defined on the discrete graph domain and briefly show that the graph networks module is able to efficiently express the differential operators.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Calculus on Graphs</head><p>Preliminary Given a graph G = (V, E) where V and E are a set of vertices V = {1, . . . , n} and edges E ⊆ V 2 , respectively, two types of real functions can be defined on the vertices, f : V → R, and edges, F : E → R, of the graph. It is also possible to define multiple functions on the vertices or edges as multiple feature maps of a pixel in CNNs. Since f and F can be viewed as scalar and vector fields in differential geometry (Figure <ref type="figure">2</ref>), the corresponding discrete operators on graphs can be defined as follow <ref type="bibr" target="#b2">(Bronstein et al. 2017)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Gradient on graphs</head><p>The gradient on a graph is the linear operator defined by</p><formula xml:id="formula_0">∇ : L 2 (V) → L 2 (E) (∇f ) ij = (f j − f i ) if {i, j} ∈ E and 0 otherwise.</formula><p>where L 2 (V) and L 2 (E) denote Hilbert spaces of vertex and edge functions, respectively, thus f ∈ L 2 (V) and F ∈ L 2 (E). As the gradient in Euclidean space measures the rate Divergence on graphs The divergence in Euclidean space maps vector fields to scalar fields. Similarly, the divergence on a graph is the linear operator defined by div :</p><formula xml:id="formula_1">L 2 (E) → L 2 (V) (div F ) i = j:(i,j)∈E w ij F ij ∀i ∈ V</formula><p>where w ij is a weight on the edge (i, j). It denotes a weighted sum of incident edge functions to a vertex i, which is interpreted as the netflow at a vertex i.</p><p>Laplacian on graphs Laplacian (∆ = ∇ 2 ) in Euclidean space measures the difference between the values of the scalar field with its average on infinitesimal balls. Similarly, the graph Laplacian is defined as</p><formula xml:id="formula_2">∆ : L 2 (V) → L 2 (V) (∆f ) i = j:(i,j)∈E w ij (f i − f j ) ∀i ∈ V</formula><p>The graph Laplacian can be represented as a matrix form, L = D − W where D = diag( j:j =i w ij ) is a degree matrix and W denotes a weighted adjacency matrix. Note that L = ∆ = −div∇ and the minus sign is required to make L positive semi-definite.</p><p>Based on the core differential operators on a graph, we can re-write differentiable physics equations (e.g., Diffusion equation or Wave equation) on a graph. Given a set of nodes (v), edges (e), and global (u) attributes, the steps of computation in a graph networks block are as follow:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Graph Networks</head><formula xml:id="formula_3">1. e ij ← φ e (e ij , v i , v j , u) for all {i, j} ∈ E pairs. 2. v i ← φ v (v i , ē i , u) for all i ∈ V.</formula><p>ē i is an aggregated edge attribute related to the node i.</p><formula xml:id="formula_4">3. u ← φ u (u, ē , v )</formula><p>ē and v are aggregated attributes of all edges and all nodes in a graph, respectively. where φ e , φ v , φ u are edge, node, and global update functions, respectively, and they can be implemented by learnable neural networks. Note that the computation order is flexible. The aggregators can be chosen freely once it is invariant to permutations of their inputs.</p><note type="other">Mapping Equation Physics example node</note><formula xml:id="formula_5">→ edge eij = φ e (vi, vj) = (∇v)ij ∇φ = −E (Electric field) edge → node vi = φ v (eij) = (div e)i ∇ • E = ρ/ 0 (Maxwell's eqn.) node → node vi = φ v (vi, {v j:(i,j)∈E }) = (∆v)i ∆φ = 0 (Laplace's eqn.)</formula><p>As φ e is a mapping function from vertices to edges, it can be replaced by the graph gradient operator to describe the known relation explicitly. Similarly, φ v can learn divergence-like mapping (edge to node) functions. For curlinvolved functions, it is required to add another updating function, φ c , which is mapping from nodes/edges/global attributes to a 3-clique attribute and vice versa. In other words, the graph networks have highly flexible modules which are able to imitate the differential operators in a graph explicitly or implicitly.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Physics-aware Graph Networks</head><p>As deep learning models are successful to model complex behaviors or extract abstract features in data, it is natural to focus on how the data-driven modeling can solve practical problems in physics or engineering fields. In this section, we provide how domain knowledge described in physics can be incorporated with the graph networks framework.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Static Physics</head><p>Many fields in physics dealing with static properties, such as Electrostatic, Magnetostatic, or Hydrostatic, describe a number of physics phenomena at rest. Among the various phenomena, it is easy to express differentiable physics rules in discrete forms on a graph with the operators in previous Section . For instances, the Poisson equation (∇ 2 φ = − ρ 0 ) in Electrostatics is realized as a simple matrix multiplication of graph Laplacian with a vertex function. Table <ref type="table" target="#tab_0">1</ref> provides some differential formulas in Electrostatic and how the updating functions are defined in graph networks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Dynamic Physics</head><p>More practical equations have been written in the dynamic forms, which describe how a given physical quantity is changing in a given region over time. GN can be regarded as a module that updates a graph state including the attributes of node, edge, and a whole graph.</p><formula xml:id="formula_6">G = GN(G)<label>(1)</label></formula><p>Equation Physics example </p><formula xml:id="formula_7">v i = vi + αφ v (vi, {v j:(i,j)∈E }) = vi + α(∆v)i u = α∆u (Diffusion eqn.) v i = 2v i − vi + c 2 φ v (v i , {v j:(i,j)∈E }) = 2v i − vi + c 2 (∆v )i ü = c 2 ∆u (Wave eqn.)</formula><formula xml:id="formula_8">f ∂u ∂t , • • • , ∂ M u ∂t M , ∂u ∂x , • • • , ∂ N u ∂x N = 0 (2)</formula><p>where u is a physical quantity spatiotemporally varying and x is the direction where u is defined on. M and N denote the highest order of time and spatial derivatives, respectively. Under the state updating view in Equation <ref type="formula" target="#formula_6">1</ref>, any types of PDEs written in Equation 2 can be represented as a form of finite differences. Table <ref type="table" target="#tab_1">2</ref> provides the examples of the dynamic physics. u and ü are the first and second order time derivatives, respectively.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Physics in Latent Space</head><p>We provide how the differential operators are implemented in a GN module in a previous section. However, it is hardly practical for modeling complicated real-world problems with the differential operators solely because it is only possible when all physics equations governing the observed phenomena are explicitly known. For example, although we are aware that there are a number of physics equations involved in climate observations, it is almost infeasible to include all required equations for modeling the observations. Thus, it is necessary to utilize the learnable parameters in GN to fill the missing dynamics which is not described by given equations.</p><p>There is another advantage to utilize learnable parameters. There are a number of unknown parameters, which need to be pre-defined to specify the physics equations, and the parameters can be inferred by the learnable parameters. For example, while we have knowledge that input signal has a wave property, the speed of waves (c in Table <ref type="table" target="#tab_1">2</ref>) should be given to fully describe the wave equation. It will be even worse when multiple input signals are involved since each signal is governed by different parameters in the same kind of equation. While both temperature and surface pressure are continuous and diffusive, they should have different diffusion coefficients (α in Table <ref type="table" target="#tab_1">2</ref>) in the same diffusion equation. To address the issue we can transform the input signals to latent space and use one equation in the latent space instead of imposing multiple equations to input signals separately. Then, the parameters in Encoder make the different signals follow the equation differently. We formalize how this idea is implemented as follow.</p><p>Forward/Recurrent computation Figure <ref type="figure">3</ref> provides how the desired physics knowledge is integrated with the graph networks. Given a graph G = {v, e, u}, it is fed into an encoder which transforms a set of attributes of nodes (v), edges (e), and a whole graph (u) into latent spaces. ṽ, ẽ, ũ = Encoder(v, e, u)</p><p>(3)</p><p>After the encoder, the encoded graph H = {ṽ, ẽ, ũ} is repeatedly updated within the core block as many as the required time steps T . For each step, H is updated to H which denotes the next state of the encoded graph.</p><formula xml:id="formula_9">H = GN(H) (4)</formula><p>Finally, the sequentially updated attributes are retransformed to the original spaces by a decoder.</p><p>v , e , u = Decoder(ṽ , ẽ , ũ )</p><p>There are two types of objective function in this architecture, physics knowledge and supervised objective. First, we define physics-informed constraint, which is a form of equations in Table <ref type="table" target="#tab_0">1</ref> and 2 depending on given physics knowledge and even mixed.</p><formula xml:id="formula_11">f s phy (H t ), f d phy (H t , • • • , H t+M ) (6) L phy = t f s phy (H t ) + f d phy (H t , • • • , H t+M ) (7)</formula><p>where f s phy (H t ) and f d phy (H t , • • • , H t+M ) are the static and dynamic physics-informed quantity, respectively. 
For example, we can impose a gradient constraint or the diffusion equation between node/edge latent representations as follows:</p><formula xml:id="formula_12">f^s_{\mathrm{phy}}(H^t) = \|\tilde{e}^t - \nabla \tilde{v}^t\|^2, \qquad f^d_{\mathrm{phy}}(H^t, H^{t+1}) = \|\tilde{v}^{t+1} - \tilde{v}^t - \alpha \nabla^2 \tilde{v}^t\|^2</formula><p>Second, the supervised loss function is defined between the predicted graph, Ĝ′, and the target graph, G′; it is constructed based on the task, e.g., the cross-entropy or the mean squared error (MSE). Finally, the total objective function is the sum of the two terms:</p><formula xml:id="formula_13">L = L_{\mathrm{sup}} + \lambda L_{\mathrm{phy}}<label>(8)</label></formula><p>where λ controls the importance of the physics term.</p></div>
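<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal sketch of these constraints on latent states follows, assuming latent node states of shape (n, d) per step and latent edge states stored as an (n, n, d) array that is zero off the edge set; all names and the diffusion coefficient are illustrative assumptions.</p><code>
import numpy as np

def graph_gradient_nd(V, W):
    """Edge-wise differences (V_j - V_i) where W_ij is nonzero; (n, n, d)."""
    G = V[None, :, :] - V[:, None, :]
    return np.where((W > 0)[:, :, None], G, 0.0)

def physics_loss(v_seq, e_seq, W, alpha=0.1):
    """Eq. 7: static gradient constraint plus dynamic diffusion constraint,
    summed over the latent rollout. v_seq[t]: (n, d) latent node states,
    e_seq[t]: (n, n, d) latent edge states (zero off the edge set)."""
    L = np.diag(W.sum(axis=1)) - W
    loss = 0.0
    for t in range(len(v_seq) - 1):
        # f_s(H^t): edge latents should match the graph gradient (Eq. 6).
        loss += np.sum((e_seq[t] - graph_gradient_nd(v_seq[t], W)) ** 2)
        # f_d(H^t, H^{t+1}): one latent diffusion step (Delta = -L here).
        loss += np.sum((v_seq[t + 1] - v_seq[t] + alpha * (L @ v_seq[t])) ** 2)
    return loss

def total_loss(y_hat, y, v_seq, e_seq, W, lam=0.01):
    """Eq. 8: supervised MSE plus the weighted physics term."""
    return np.sum((y_hat - y) ** 2) + lam * physics_loss(v_seq, e_seq, W)
</code></div>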
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Experiment</head><p>In this section, we evaluate PaGN on a real-world climate dataset on the Southern California region.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Climate Data</head><p>For the evaluation on real-world data, we used the hourly simulated climate observations for 16 days on the Southern California region <ref type="bibr" target="#b20">(Zhang et al. 2018)</ref>. In this dataset, we sampled small regions randomly from two area (Los Angeles and San Diego, Figure <ref type="figure">4</ref>) encompassing urban and rural meteorological features to generate spatially discrete observations. To build a graph, we connected a pair of the sampled regions by using k-nearest neighbors algorithm (k = 3). This data preprocessing is required to verify the proposed</p><formula xml:id="formula_14">𝒢 Encoder ℋ GN ℋ′ 𝒢 $ ′ Decoder x T</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Physics equation</head><p>Supervised Loss</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>𝒢′ ⨀</head><p>Figure <ref type="figure">3</ref>: Recurrent architecture to incorporate physics equation on GN. The blue blocks have learnable parameters and the orange blocks are objective functions. is a concatenation operator and the middle core block can be repeated as many as the required time steps (T ). idea as well as evaluate PaGN on the spatiotemporally sparse setting, which is more common for sensor-based datasets.</p><p>The vertex attributes consist of 10 climate observations, Air temperature, Albedo, Precipitation, Soil moisture, Relative humidity, Specific humidity, Surface pressure, Planetary boundary layer height, and Wind vector (2 directions). While the edge attributes are not given explicitly, we could specify the type of each edge by using the type of connected regions. There are 13 different land-usage types and each type summarizes how the corresponding land is used. Based on the types of connected regions, we assigned different embedding vectors to edges.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>PaGN Architecture</head><p>As explained in Section , PaGN consists of three modules, graph encoder, GN block, and graph decoder (Figure <ref type="figure">3</ref>). The encoder contains two feed forward networks, φ v and φ e , applied to node and edge features, respectively. By passing the encoder, the features are transformed to the latent space (H) where we will impose physics equations.</p><p>In the GN block, the node/edge/graph features are updated by the GN algorithm described in Section . The latent graph states, H and H , indicate the hidden states of the current and next observations. For the physics constraint, we informed the diffusion and wave equation in Table <ref type="table" target="#tab_1">2</ref>, which describe the behavior of the continuous physical quantities. As the most of the climate observations are varying continuously, the diffusion equation, as a part of the continuity equation, is one of the inductive bias that should be considered for modeling. In addition, the wave equation is useful to describe atmospheric phenomena, especially 1 solar day harmonics (e.g., Atmospheric tide). Note that the physics equations are not directly applied to the input observations, but rather to the latent representations. The state-updating process is repeated at least as many as the order of the equations to provide the finite difference equation. For multistep predictions, the recurrent module is repeated as many as the number of the predictions and the physics equation will be also applied multiple times as well. Finally, the decoder takes H as input to return the next predictions. The following objective is the total loss function of PaGN with the diffusion equation.</p><formula xml:id="formula_15">L = T i=1 ŷ i − y i 2 + λ T i=1 ṽ i − ṽi−1 − α∇ 2 ṽi−1 2 (9)</formula><p>where y is a vector of the target observations (i.e. node vectors) and α adjusts the diffusivity of the latent representa-  </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Experimental Settings</head><p>In our experiments, we used the air temperature as a target observation and other 9 observations were used as input. We first evaluated our model by performing the one-step and multistep prediction tasks on the two different area with a mean square error metric. For both regions, we commonly trained the model with input observations for 10 timesteps (t − 10 : t − 1) and predicted targets from t − 9 to t. First 65% of a total length was used as a training set and remaining series was split into validation (10%) and test sets (25%).</p><p>We explored several baselines: MLP, LSTM, and GNonly ignoring the physics constraint in PaGN. We also compared GN-skip which connects between H and H with the skip-connection <ref type="bibr" target="#b9">(He et al. 2016)</ref> without the physics constraint. spatiotemporally continuous. Among the graph-based models, PaGN(diff) provides the least MSEs. It validates that the diffusive property provides a strong inductive bias with the latent representation learning. Note that the standard deviations from PaGN(diff) are significantly smaller than those of other baselines and it implies that the integrated physics knowledge properly stabilizes optimization process by introducing additional objective.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>One step Prediction</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Multistep Prediction</head><p>To evaluate the effectiveness of the state-wise regularization more carefully, we conducted the multistep prediction task (10 forecast horizon). For the task, the recurrent modules are modified to predict input observations as well and the predicted one is re-fed in the model for future timesteps. While the models having a recurrent module are able to predict a few more steps reasonably, there are a couple of things we should pay attention. First, the results imply that utilizing the neighboring information is important because GN-only model shows similar or better MSEs compared to LSTM for the multistep tasks, even though it has a simple recurrent module that is not as good as that of LSTM. Second, we found that the diffusion equation in PaGN gives the stable state transition and the property provides slowly varying latent states which are desired particularly for the climate forecasting. Note that the skip-connection in GN-skip is also able to restrict the rapid changes of H. However, it is necessary to more carefully optimize the parameters in GN-skip to learn the residual term in H = H + GN(H) properly.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Effectiveness of Physics Constraint</head><p>One of the benefits of physics-aware learning is data efficiency. We explore how much the physics constraint is helpful by testing if PaGN can be well-trained when the number of data for the supervised objective is limited for the one-step prediction task. We randomly sampled training data which were used to optimize the total loss function (Equation 9) and the left unsampled data were only used to minimize the physics constraint:</p><formula xml:id="formula_16">L = L i sup + λL i phy , i is a sampled step L = λL i phy ,<label>otherwise</label></formula><p>We found that the diffusion equation can benefit to optimize PaGN even if the target observations are partially available (Figure <ref type="figure" target="#fig_4">5a</ref>). Although the overall performances of PaGN are degraded when less number of sampled data are used, the error are not far deviated from those of GN-only.    </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Importance of Physics Constraint</head><p>To study the importance of the physics term, we trained PaGN with different λ controlling the importance of the physics term. While we found that the physics term is substantially helpful from Table <ref type="table" target="#tab_3">3</ref> and 4, the term is not supposed to be dominant (See Figure <ref type="figure" target="#fig_4">5b</ref>) but tuned properly. This is intuitive since the term only provides partial knowledge (diffusive input signals), which changes loss surface to help parameters more stable to predict next signals, instead of governing the dynamics explicitly. Scaling down the physics term is similar to what <ref type="bibr" target="#b16">Sabour, Frosst, and Hinton (2017)</ref> did for reconstruction error not to dominate margin loss but to help the optimization process. We also present MSEs from PaGN(rand) defined by randomly sampling (α, β) ∈ [−2.5, 2.5] in the constraint ||v + αv + βv − c∆v|| 2 , and PaGN(diff+wave) superposing the two equations. Table <ref type="table" target="#tab_6">5</ref> shows that the random equation significantly degrades the overall prediction quality. Note that the simple superposition of two equations does not always guarantee lower error even if each equation is helpful separately. When the two equations are non-linearly connected in the unknown (fully) governing equation, the superposition cannot provide meaningful inductive bias. The results demonstrate that the physics term is an useful inductive bias when it is properly defined.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Conclusion</head><p>In this work, we introduce a new architecture PaGN based on graph networks to incorporate prior knowledge given as a form of PDEs over time and space. While existing works more focus on how to discover equations in data generated by explicit physics rules, we propose a method to leverage weakly given inductive bias describing data. We empirically analyze the performance of PaGN across a range of prediction experiments on the climate observations.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Concept of the proposed PaGN. Many sensorbased observations are only sparsely available (See circled regions) but there are continuous physical process (e.g., Diffusion) behind the sparse observations. Some of the known physics rules are injected into a model and the remained unknown dynamics will be extracted from data.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head></head><label></label><figDesc>Figure 2: Scalar/vector fields on Euclidean space and vertex/edge functions on a graph. and direction of change in a scalar field, the gradient on a graph computes differences of the values between two adjacent vertices and the differences are defined along the directions of the corresponding edges.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head></head><label></label><figDesc><ref type="bibr" target="#b1">Battaglia et al. (2018)</ref> proposed a graph networks framework, which generalizes relations among vertices, edges, and a whole graph. Graph Networks (GN) describe how edge, node, and global attributes are updated by propagating information among themselves.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: In (a) MSEs of PaGN are almost as good as GNonly (gray lines) despite the less number of training data. (b) provides how the prediction performance is dependent on the physics term.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Examples of static equations in Graph networks</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Examples of dynamic equations in Graph networkswhere G is the updated graph state. Dynamic physics formulas are written as a function of time and spatial derivatives:</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 3 :</head><label>3</label><figDesc>One step prediction error (MSE)tions, which is found through cross validation. Note that the equation term can be replaced by other equations properly.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 4 :</head><label>4</label><figDesc>Table3shows the prediction error of the baselines and PaGN on different areas. MLP and LSTM are shared over all stations and their performaces are outperformed by other models leveraging a given graph structure. It implies that knowing neighboring information is significantly helpful to infer its own state and it is intuitive since climate behaviors are Multistep prediction error (MSE)</figDesc><table><row><cell>Model</cell><cell>LA area</cell><cell>SD area</cell></row><row><cell>LSTM</cell><cell cols="2">1.9022±0.2078 1.2489±0.2295</cell></row><row><cell>GN-only</cell><cell cols="2">1.6137±0.1128 1.5532±0.2023</cell></row><row><cell>GN-skip</cell><cell cols="2">1.5429±0.0932 1.4423±0.1622</cell></row><row><cell cols="3">PaGN(diff) 1.4656±0.0474 1.0999±0.0435</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head></head><label></label><figDesc>Even the GN-only model is outperformed by PaGN when only 70% training data are used with the state-wise constraint.</figDesc><table><row><cell></cell><cell>1.2</cell><cell></cell><cell></cell><cell>LA area SD area</cell></row><row><cell></cell><cell>1.1</cell><cell></cell><cell></cell></row><row><cell></cell><cell>1.0</cell><cell></cell><cell></cell></row><row><cell>MSE</cell><cell>0.9</cell><cell></cell><cell></cell></row><row><cell></cell><cell>0.8</cell><cell></cell><cell></cell></row><row><cell></cell><cell>0.7</cell><cell></cell><cell></cell></row><row><cell></cell><cell>0.6</cell><cell></cell><cell></cell></row><row><cell></cell><cell>0.5</cell><cell></cell><cell></cell></row><row><cell></cell><cell>0%</cell><cell>25%</cell><cell>50%</cell><cell>75%</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 5 :</head><label>5</label><figDesc>One step prediction MSE with different constraints.</figDesc><table /></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Interaction networks for learning about objects, relations and physics</title>
		<author>
			<persName><forename type="first">P</forename><surname>Battaglia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Pascanu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">J</forename><surname>Rezende</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in neural information processing systems</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="4502" to="4510" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Relational inductive biases, deep learning, and graph networks</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">W</forename><surname>Battaglia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">B</forename><surname>Hamrick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Bapst</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sanchez-Gonzalez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Zambaldi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Malinowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Tacchetti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Raposo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Santoro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Faulkner</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1806.01261</idno>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Geometric deep learning: going beyond euclidean data</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M</forename><surname>Bronstein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bruna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Lecun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Szlam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Vandergheynst</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Signal Processing Magazine</title>
		<imprint>
			<biblScope unit="volume">34</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="18" to="42" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Discovering governing equations from data by sparse identification of nonlinear dynamical systems</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">L</forename><surname>Brunton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">L</forename><surname>Proctor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">N</forename><surname>Kutz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Proceedings of the National Academy of Sciences</title>
		<imprint>
			<biblScope unit="volume">113</biblScope>
			<biblScope unit="issue">15</biblScope>
			<biblScope unit="page" from="3932" to="3937" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title/>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">B</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Ullman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Torralba</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">B</forename><surname>Tenenbaum</surname></persName>
		</author>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">A Compositional Object-Based Approach to Learning Physical Dynamics</title>
	</analytic>
	<monogr>
		<title level="m">International Conference on Learning Representations</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Statistics for spatiotemporal data</title>
		<author>
			<persName><forename type="first">N</forename><surname>Cressie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">K</forename><surname>Wikle</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2015">2015</date>
			<publisher>John Wiley &amp; Sons</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Deep Learning for Physical Processes: Incorporating Prior Scientific Knowledge</title>
		<author>
			<persName><forename type="first">E</forename><surname>De Bezenac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Pajot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Gallinari</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Learning Representations</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Neuroanimator: Fast neural network emulation and control of physics-based models</title>
		<author>
			<persName><forename type="first">R</forename><surname>Grzeszczuk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Terzopoulos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Hinton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 25th annual conference on Computer graphics and interactive techniques</title>
				<meeting>the 25th annual conference on Computer graphics and interactive techniques</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="1998">1998</date>
			<biblScope unit="page" from="9" to="20" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups</title>
		<author>
			<persName><forename type="first">K</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">;</forename><surname>Hinton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE conference on computer vision and pattern recognition</title>
				<meeting>the IEEE conference on computer vision and pattern recognition</meeting>
		<imprint>
			<date type="published" when="2012">2016. 2012</date>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="page" from="82" to="97" />
		</imprint>
	</monogr>
	<note>Deep residual learning for image recognition</note>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Neural Relational Inference for Interacting Systems</title>
		<author>
			<persName><forename type="first">T</forename><surname>Kipf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Fetaya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K.-C</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Welling</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Zemel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Machine Learning</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Imagenet classification with deep convolutional neural networks</title>
		<author>
			<persName><forename type="first">A</forename><surname>Krizhevsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">E</forename><surname>Hinton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in neural information processing systems</title>
				<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="1097" to="1105" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">PDE-Net: Learning PDEs from Data</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Long</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Dong</surname></persName>
		</author>
		<ptr target="http://proceedings.mlr.press/v80/long18a.html" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 35th International Conference on Machine Learning</title>
				<meeting>the 35th International Conference on Machine Learning</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Raissi</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1801.06637</idno>
		<title level="m">Deep Hidden Physics Models: Deep Learning of Nonlinear Partial Differential Equations</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Hidden physics models: Machine learning of nonlinear partial differential equations</title>
		<author>
			<persName><forename type="first">M</forename><surname>Raissi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">E</forename><surname>Karniadakis</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1711.10561</idno>
	</analytic>
	<monogr>
		<title level="m">Physics Informed Deep Learning (Part I): Data-driven Solutions of Nonlinear Partial Differential Equations</title>
				<editor>
			<persName><forename type="first">M</forename><surname>Raissi</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Perdikaris</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><forename type="middle">E</forename><surname>Karniadakis</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2017">2018. 2017a</date>
			<biblScope unit="volume">357</biblScope>
			<biblScope unit="page" from="125" to="141" />
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Raissi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Perdikaris</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">E</forename><surname>Karniadakis</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1711.10566</idno>
		<title level="m">Physics Informed Deep Learning (Part II): Data-driven Discovery of Nonlinear Partial Differential Equations</title>
				<imprint>
			<date type="published" when="2017">2017b</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Dynamic routing between capsules</title>
		<author>
			<persName><forename type="first">S</forename><surname>Sabour</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Frosst</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">E</forename><surname>Hinton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in neural information processing systems</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="3856" to="3866" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Graph Networks as Learnable Physics Engines for Inference and Control</title>
		<author>
			<persName><forename type="first">A</forename><surname>Sanchez-Gonzalez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Heess</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">T</forename><surname>Springenberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Merel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Riedmiller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Hadsell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Battaglia</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 35th International Conference on Machine Learning</title>
				<meeting>the 35th International Conference on Machine Learning</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">A simple neural network module for relational reasoning</title>
		<author>
			<persName><forename type="first">A</forename><surname>Santoro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Raposo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">G</forename><surname>Barrett</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Malinowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Pascanu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Battaglia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lillicrap</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in neural information processing systems</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="4967" to="4976" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<title level="m" type="main">Google&apos;s neural machine translation system: Bridging the gap between human and machine translation</title>
		<author>
			<persName><forename type="first">N</forename><surname>Watters</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Tacchetti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Weber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Pascanu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Battaglia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zoran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Nips ; Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Schuster</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Norouzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Macherey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Krikun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Cao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Macherey</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1609.08144</idno>
		<imprint>
			<date type="published" when="2016">2017. 2016</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
	<note>Visual interaction networks</note>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Systematic Comparison of the Influence of Cool Wall versus Cool Roof Adoption on Urban Climate in the Los Angeles Basin</title>
		<author>
			<persName><forename type="first">J</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mohegh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Levinson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Ban-Weiss</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Environmental science &amp; technology</title>
		<imprint>
			<biblScope unit="volume">52</biblScope>
			<biblScope unit="issue">19</biblScope>
			<biblScope unit="page" from="11188" to="11197" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
