=Paper=
{{Paper
|id=Vol-2964/article_173
|storemode=property
|title=Graph Networks with Physics-aware Knowledge Informed in Latent Space
|pdfUrl=https://ceur-ws.org/Vol-2964/article_173.pdf
|volume=Vol-2964
|authors=Sungyong Seo,Yan Liu
|dblpUrl=https://dblp.org/rec/conf/aaaiss/SeoL21
}}
==Graph Networks with Physics-aware Knowledge Informed in Latent Space==
Sungyong Seo and Yan Liu
{sungyons, yanliu.cs} at usc.edu
Department of Computer Science
University of Southern California
Abstract

While physics conveys knowledge of nature built from an interplay between observations and theory, it has been considered less important for modeling deep neural networks. Despite the usefulness of physical rules, it is particularly challenging to leverage this knowledge for sparse data, since most physics equations are well defined on continuous and dense spaces. In addition, it is even harder to inform the equations to a model if the observations are not fully governed by the given physical knowledge. In this work, we present a novel architecture to incorporate physics or domain knowledge, given in the form of partial differential equations (PDEs), on sparse observations by utilizing graph structure. Moreover, we leverage the representation power of deep learning by informing the knowledge in latent space. We demonstrate that climate prediction tasks are significantly improved and validate the effectiveness and importance of the proposed model.

Figure 1: Concept of the proposed PaGN. Many sensor-based observations are only sparsely available (see circled regions), but there is a continuous physical process (e.g., diffusion) behind the sparse observations. Some of the known physics rules are injected into the model, and the remaining unknown dynamics are extracted from data.

Introduction
Modeling natural phenomena in the real world, such as climate, traffic, and molecules, is extremely challenging but important. Deep learning has achieved significant success in prediction performance by learning latent representations from data-rich applications such as speech recognition (Hinton et al. 2012), text understanding (Wu et al. 2016), and image recognition (Krizhevsky, Sutskever, and Hinton 2012). While the accuracy and efficiency of data-driven deep learning models can be improved with ad-hoc architectural changes for specific tasks, we are confronted with many challenging learning scenarios in modeling natural phenomena: a limited number of labeled examples are available, there is much noise in the data, and there can be constant changes in data distributions (e.g., dynamic systems). Furthermore, in many domains, data are only available on scattered collections of points (sensors or point clouds; see Figure 1), where the majority of existing methods are not applicable. These challenges are not easily addressed by purely data-driven learning models, and therefore there is a pressing need to develop a new generation of robust learning models that can address these challenging learning scenarios.

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Physics is one of the fundamental pillars describing how the real world behaves, and physics-informed learning models are therefore powerful candidates for modeling natural phenomena. Incorporating domain knowledge has several benefits: first, it helps an optimized solution be more stable and prevents overfitting; second, it provides theoretical guidance that an optimized model is supposed to follow, and thus helps training with less data; lastly, since a model is driven by the desired inductive bias, it is more robust to unseen data, which enables more accurate extrapolation.

Meanwhile, there exists a series of challenges when we incorporate physics principles into machine learning models. First, a model needs to properly handle spatial and temporal constraints. Many physics equations describe how a set of physical quantities behaves over space and time; for example, the wave equation describes how a signal propagates through a medium over time. Second, the model should capture relations between objects, such as image patches (Santoro et al. 2017) or rigid bodies (Battaglia et al. 2016; Chang et al. 2017). Third, the learning modules should be shared over all objects, because physical laws are commonly applicable to all objects. Finally, the model should be flexible enough to extract unknown patterns instead of being strictly constrained to the given physics knowledge. Since it is not always possible to describe all rules governing real-world data, data-driven learning is required to fill the gap between the known physics and real observations.

In this paper, we address the problem of modeling dynamical systems with graph neural networks by incorporating useful knowledge described as differentiable physics equations. We propose a generic architecture, physics-aware graph networks (PaGN), which can leverage explicitly required physics and learn implicit patterns from data, as illustrated in Figure 1. The proposed model properly handles spatially distributed objects and their relations as vertices and edges in a graph. Moreover, temporal dependencies are learned by recurrent computations. As Battaglia et al. (2018) suggest, the inductive bias of a graph-based model is its invariance to node/edge permutations, and thus all trainable functions for the same input types are shared.

The contributions of this work are summarized as follows:

• We develop a novel physics-aware learning architecture, PaGN, which incorporates differentiable physics equations within a graph network framework.
• We explore the performance of PaGN on graph signal prediction tasks to demonstrate that physics knowledge provides a significant improvement in prediction accuracy and makes a model more robust.
• We investigate the effectiveness and importance of PaGN on climate prediction to show how physics knowledge can be beneficial for prediction performance.

Related Work

Incorporating physics Among many attempts at incorporating physical knowledge into data-driven models, Cressie and Wikle (2015) covered a number of statistical models (e.g., a hierarchical Bayesian framework) handling physical equations. Raissi, Perdikaris, and Karniadakis (2017a) introduced the concept of physics-informed neural networks, which utilize physics equations explicitly to train neural networks. By optimizing the model at initial/boundary points and sampled collocation points, data-driven solutions of nonlinear PDEs can be found. Based on this fundamental idea, a number of works for simulating and discovering PDEs have been published (Raissi and Karniadakis 2018; Raissi 2018; Raissi, Perdikaris, and Karniadakis 2017b). Although these works leveraged physical knowledge, they are limited because they require all physics behind the given data to be explicitly known.

de Bezenac, Pajot, and Gallinari (2018) considered a problem similar to ours. They proposed how transport physics (advection and diffusion) could be incorporated for forecasting sea surface temperature (SST); in other words, how the motion flow that is helpful for temperature prediction could be extracted in an unsupervised manner from a sequence of SST images. This work is a major milestone since it captures not only the dominant transport physics but also unknown patterns inferred through neural networks. Despite its novel architecture, the model is specifically designed for transport physics, and it is not straightforward to extend it to other physics equations. Furthermore, it is restricted to a regular grid in order to use conventional convolutional neural networks (CNNs) for images.

Discovering physical dynamics A class of models (Grzeszczuk, Terzopoulos, and Hinton 1998; Battaglia et al. 2016; Chang et al. 2017; Watters et al. 2017; Sanchez-Gonzalez et al. 2018; Kipf et al. 2018) has been proposed based on the assumption that neural networks can learn complex physical interactions and simulate unseen dynamics from a current state. The models along this direction are based on common relational inductive biases (Santoro et al. 2017; Battaglia et al. 2018), i.e., functions connecting entities and relations are shared and can be learned from a given sequence of simulated dynamics. Chang et al. (2017), Battaglia et al. (2016), and Sanchez-Gonzalez et al. (2018) commonly assumed that the objects' behaviors were governed by classical kinetic physics equations; object- and relation-centric functions were then proposed to learn the transition from the current state to the next state without explicitly injecting the equations into the model. Discovering latent physics by data-driven learning has also been actively studied (Long et al. 2018; Brunton, Proctor, and Kutz 2016). While properly constrained filters enable us to identify the governing PDEs, this is only applicable when we are aware of the form of the target PDEs. Unlike this line of work, which extracts latent patterns from data only, our proposed model can incorporate known physics and at the same time extract latent patterns from data which cannot be captured by existing knowledge.

Background

In this section, we introduce how differential operators in the Euclidean domain are analogously defined on the discrete graph domain, and briefly show that the graph networks module is able to efficiently express these differential operators.

Calculus on Graphs

Preliminary Given a graph G = (V, E), where V = {1, . . . , n} is a set of vertices and E ⊆ V × V a set of edges, two types of real-valued functions can be defined: on the vertices, f : V → R, and on the edges, F : E → R. It is also possible to define multiple functions on the vertices or edges, analogous to multiple feature maps of a pixel in CNNs. Since f and F can be viewed as scalar and vector fields in differential geometry (Figure 2), the corresponding discrete operators on graphs can be defined as follows (Bronstein et al. 2017).

Gradient on graphs The gradient on a graph is the linear operator

∇ : L²(V) → L²(E),  (∇f)_ij = f_j − f_i if {i, j} ∈ E and 0 otherwise,

where L²(V) and L²(E) denote Hilbert spaces of vertex and edge functions, respectively; thus f ∈ L²(V) and F ∈ L²(E). As the gradient in Euclidean space measures the rate
and direction of change in a scalar field, the gradient on a graph computes differences of the values between two adjacent vertices, and the differences are defined along the directions of the corresponding edges.

Figure 2: Scalar/vector fields on Euclidean space and vertex/edge functions on a graph. (a) Scalar field. (b) Vector field. (c) Vertex function. (d) Edge function.

Mapping | Equation | Physics example
node → edge | e_ij = φe(v_i, v_j) = (∇v)_ij | ∇φ = −E (Electric field)
edge → node | v_i = φv(e_ij) = (div e)_i | ∇ · E = ρ/ε₀ (Maxwell's eqn.)
node → node | v_i = φv(v_i, {v_j : (i,j) ∈ E}) = (∆v)_i | ∆φ = 0 (Laplace's eqn.)

Table 1: Examples of static equations in Graph networks
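The graph calculus used in Table 1 can be checked numerically on a toy graph. The sketch below is our own illustrative code (not the authors' implementation): it evaluates the graph gradient defined above, plus divergence and Laplacian operators of the form the table assumes, and verifies that the Laplacian equals −div∘∇ on a small path graph.

```python
# Toy path graph 0-1-2 with unit weights: a minimal, illustrative sketch
# of the discrete operators behind Table 1 (names are ours, not the paper's).
edges = [(0, 1), (1, 2)]
n = 3
w = {e: 1.0 for e in edges}

def gradient(f):
    """(grad f)_ij = f_j - f_i, one value per oriented edge (i, j)."""
    return [f[j] - f[i] for (i, j) in edges]

def divergence(F):
    """(div F)_i = sum of w_ij * F_ij over edges incident to vertex i,
    using the convention F_ji = -F_ij for the reversed orientation."""
    out = [0.0] * n
    for k, (i, j) in enumerate(edges):
        out[i] += w[(i, j)] * F[k]
        out[j] -= w[(i, j)] * F[k]
    return out

def laplacian(f):
    """(Lap f)_i = sum_j w_ij * (f_i - f_j), i.e. (D - W) applied to f."""
    out = [0.0] * n
    for (i, j) in edges:
        out[i] += w[(i, j)] * (f[i] - f[j])
        out[j] += w[(i, j)] * (f[j] - f[i])
    return out

f = [0.0, 1.0, 3.0]
# The identity L = -div(grad f) stated in the Background section:
assert laplacian(f) == [-x for x in divergence(gradient(f))]
```

With unit weights, the Laplacian of f = [0, 1, 3] on this path graph is [−1, −1, 2], which indeed matches the negated divergence of the gradient.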
Divergence on graphs The divergence in Euclidean space maps vector fields to scalar fields. Similarly, the divergence on a graph is the linear operator

div : L²(E) → L²(V),  (div F)_i = Σ_{j:(i,j)∈E} w_ij F_ij  ∀i ∈ V,

where w_ij is a weight on the edge (i, j). It denotes the weighted sum of the edge functions incident to a vertex i, which is interpreted as the net flow at the vertex i.

Laplacian on graphs The Laplacian (∆ = ∇²) in Euclidean space measures the difference between the value of a scalar field and its average on infinitesimal balls. Similarly, the graph Laplacian is defined as

∆ : L²(V) → L²(V),  (∆f)_i = Σ_{j:(i,j)∈E} w_ij (f_i − f_j)  ∀i ∈ V.

The graph Laplacian can be represented in matrix form as L = D − W, where D = diag(Σ_{j:j≠i} w_ij) is the degree matrix and W denotes the weighted adjacency matrix. Note that L = ∆ = −div∇, where the minus sign is required to make L positive semi-definite.

Based on these core differential operators on a graph, we can re-write differentiable physics equations (e.g., the diffusion equation or the wave equation) on a graph.

Graph Networks

Battaglia et al. (2018) proposed the graph networks framework, which generalizes relations among vertices, edges, and a whole graph. Graph Networks (GN) describe how edge, node, and global attributes are updated by propagating information among themselves. Given a set of node (v), edge (e), and global (u) attributes, the steps of computation in a graph networks block are as follows:

1. e′_ij ← φe(e_ij, v_i, v_j, u) for all {i, j} ∈ E.
2. v′_i ← φv(v_i, ē′_i, u) for all i ∈ V, where ē′_i is an aggregated edge attribute related to the node i.
3. u′ ← φu(u, ē′, v̄′), where ē′ and v̄′ are aggregated attributes of all edges and all nodes in the graph, respectively.

Here φe, φv, and φu are edge, node, and global update functions, respectively, and they can be implemented by learnable neural networks. Note that the computation order is flexible, and the aggregators can be chosen freely as long as they are invariant to permutations of their inputs.

As φe is a mapping function from vertices to edges, it can be replaced by the graph gradient operator to describe a known relation explicitly. Similarly, φv can learn divergence-like (edge-to-node) mappings. For curl-involved functions, it is necessary to add another update function, φc, which maps node/edge/global attributes to a 3-clique attribute and vice versa. In other words, graph networks provide highly flexible modules which are able to imitate the differential operators on a graph explicitly or implicitly.

Physics-aware Graph Networks

As deep learning models are successful at modeling complex behaviors and extracting abstract features from data, it is natural to ask how data-driven modeling can solve practical problems in physics or engineering. In this section, we describe how domain knowledge expressed as physics can be incorporated within the graph networks framework.

Static Physics

Many fields of physics dealing with static properties, such as electrostatics, magnetostatics, or hydrostatics, describe a number of physical phenomena at rest. Among the various phenomena, it is easy to express differentiable physics rules in discrete form on a graph with the operators in the previous section. For instance, the Poisson equation (∇²φ = −ρ/ε₀) in electrostatics is realized as a simple multiplication of the graph Laplacian matrix with a vertex function. Table 1 provides some differential formulas in electrostatics and shows how the update functions are defined in graph networks.

Dynamic Physics

More practical equations are written in dynamic form, describing how a given physical quantity changes in a given region over time. GN can be regarded as a module that updates a graph state comprising the attributes of nodes, edges, and the whole graph:

G′ = GN(G)  (1)
where G′ is the updated graph state. Dynamic physics formulas are written as a function of time and spatial derivatives:

f(∂u/∂t, . . . , ∂^M u/∂t^M, ∂u/∂x, . . . , ∂^N u/∂x^N) = 0  (2)

where u is a spatiotemporally varying physical quantity and x is the direction along which u is defined. M and N denote the highest orders of the time and spatial derivatives, respectively. Under the state-updating view in Equation 1, any PDE written as Equation 2 can be represented in a finite-difference form. Table 2 provides examples of dynamic physics; u̇ and ü are the first- and second-order time derivatives, respectively.

Equation | Physics example
v′_i = v_i + α φv(v_i, {v_j : (i,j) ∈ E}) = v_i + α(∆v)_i | u̇ = α∆u (Diffusion eqn.)
v″_i = 2v′_i − v_i + c² φv(v′_i, {v′_j : (i,j) ∈ E}) = 2v′_i − v_i + c²(∆v′)_i | ü = c²∆u (Wave eqn.)

Table 2: Examples of dynamic equations in Graph networks

Physics in Latent Space

We described above how the differential operators are implemented in a GN module. However, modeling complicated real-world problems with the differential operators alone is hardly practical, because it is only possible when all physics equations governing the observed phenomena are explicitly known. For example, although we are aware that a number of physics equations are involved in climate observations, it is almost infeasible to include all the equations required to model the observations. Thus, it is necessary to utilize the learnable parameters in GN to fill in the missing dynamics that are not described by the given equations.

There is another advantage to utilizing learnable parameters. A number of unknown parameters need to be pre-defined to specify the physics equations, and these parameters can be inferred by the learnable ones. For example, while we may know that an input signal has a wave property, the speed of the waves (c in Table 2) must be given to fully describe the wave equation. It is even worse when multiple input signals are involved, since each signal is governed by different parameters of the same kind of equation. While both temperature and surface pressure are continuous and diffusive, they should have different diffusion coefficients (α in Table 2) in the same diffusion equation. To address this issue, we can transform the input signals to a latent space and impose one equation in the latent space, instead of imposing multiple equations on the input signals separately. The parameters in the encoder then make the different signals follow the equation differently. We formalize how this idea is implemented as follows.

Forward/Recurrent computation Figure 3 shows how the desired physics knowledge is integrated with the graph networks. Given a graph G = {v, e, u}, an encoder transforms the sets of node (v), edge (e), and global (u) attributes into latent spaces:

ṽ, ẽ, ũ = Encoder(v, e, u)  (3)

After the encoder, the encoded graph H = {ṽ, ẽ, ũ} is repeatedly updated within the core block for as many as the required time steps T. At each step, H is updated to H′, which denotes the next state of the encoded graph:

H′ = GN(H)  (4)

Finally, the sequentially updated attributes are transformed back to the original spaces by a decoder:

v′, e′, u′ = Decoder(ṽ′, ẽ′, ũ′)  (5)

There are two types of objective function in this architecture: the physics knowledge and the supervised objective. First, we define the physics-informed constraint, which takes the form of the equations in Tables 1 and 2 (or a mixture of them), depending on the given physics knowledge:

f^s_phy(H′_t),  f^d_phy(H′_t, . . . , H′_{t+M})  (6)

L_phy = Σ_t [ f^s_phy(H′_t) + f^d_phy(H′_t, . . . , H′_{t+M}) ]  (7)

where f^s_phy(H′_t) and f^d_phy(H′_t, . . . , H′_{t+M}) are the static and dynamic physics-informed quantities, respectively. For example, we can impose a gradient constraint or the diffusion equation on the node/edge latent representations as follows:

f^s_phy(H′_t) = ||ẽ′_t − ∇ṽ′_t||²
f^d_phy(H′_t, H′_{t+1}) = ||ṽ′_{t+1} − ṽ′_t − α∇²ṽ′_t||²

Second, the supervised loss function is defined between the predicted graph, Ĝ′, and the target graph, G′; it is constructed based on the task, such as the cross-entropy or the mean squared error (MSE). Finally, the total objective function is the sum of the two terms:

L = L_sup + λ L_phy  (8)

where λ controls the importance of the physics term.

Experiment

In this section, we evaluate PaGN on a real-world climate dataset from the Southern California region.

Climate Data

For the evaluation on real-world data, we used hourly simulated climate observations over 16 days in the Southern California region (Zhang et al. 2018). From this dataset, we randomly sampled small regions from two areas (Los Angeles and San Diego; Figure 4) encompassing urban and rural meteorological features, to generate spatially discrete observations. To build a graph, we connected pairs of sampled regions using the k-nearest-neighbors algorithm (k = 3). This data preprocessing is required to verify the proposed
Figure 3: Recurrent architecture to incorporate a physics equation on GN. The blue blocks have learnable parameters and the orange blocks are objective functions. ⨀ is a concatenation operator, and the middle core block can be repeated for as many as the required time steps (T).
idea, as well as to evaluate PaGN in the spatiotemporally sparse setting, which is more common for sensor-based datasets.

The vertex attributes consist of 10 climate observations: air temperature, albedo, precipitation, soil moisture, relative humidity, specific humidity, surface pressure, planetary boundary layer height, and wind vector (2 directions). While edge attributes are not given explicitly, we can specify the type of each edge by the types of the connected regions. There are 13 different land-usage types, each summarizing how the corresponding land is used. Based on the types of the connected regions, we assigned different embedding vectors to the edges.

PaGN Architecture

As explained above, PaGN consists of three modules: a graph encoder, a GN block, and a graph decoder (Figure 3). The encoder contains two feed-forward networks, φv and φe, applied to node and edge features, respectively. Through the encoder, the features are transformed to the latent space (H) where we impose the physics equations.

In the GN block, the node/edge/graph features are updated by the GN algorithm described above. The latent graph states, H and H′, represent the hidden states of the current and next observations. For the physics constraint, we informed the diffusion and wave equations in Table 2, which describe the behavior of continuous physical quantities. As most climate observations vary continuously, the diffusion equation, as a part of the continuity equation, is one of the inductive biases that should be considered for modeling. In addition, the wave equation is useful to describe atmospheric phenomena, especially one-solar-day harmonics (e.g., atmospheric tides). Note that the physics equations are not applied directly to the input observations, but rather to the latent representations. The state-updating process is repeated at least as many times as the order of the equations, to provide the finite-difference equation. For multistep predictions, the recurrent module is repeated as many times as the number of predictions, and the physics equation is applied multiple times as well. Finally, the decoder takes H′ as input to return the next predictions. The total loss function of PaGN with the diffusion equation is:

L = Σ_{i=1}^T ||ŷ′_i − y′_i||² + λ Σ_{i=1}^T ||ṽ′_i − ṽ_{i−1} − α∇²ṽ_{i−1}||²  (9)

where y′ is a vector of the target observations (i.e., node vectors) and α adjusts the diffusivity of the latent representations; it is found through cross-validation. Note that the equation term can be replaced by other equations as appropriate.

Figure 4: Sampled regions in the Southern California area. (Left) Los Angeles (274 nodes) and (right) San Diego (282 nodes).

Model | LA area | SD area
MLP | 0.8140±0.0651 | 0.7735±0.0539
LSTM | 0.7855±0.0644 | 0.8123±0.0875
GN-only | 0.5951±0.0517 | 0.6947±0.1859
GN-skip | 0.5906±0.0620 | 0.6456±0.1499
PaGN (wave) | 0.5366±0.0631 | 0.6413±0.1549
PaGN (diff) | 0.5289±0.0405 | 0.5746±0.0471

Table 3: One-step prediction error (MSE)

Experimental Settings

In our experiments, we used the air temperature as the target observation, and the other 9 observations were used as input. We first evaluated our model on the one-step and multistep prediction tasks in the two different areas with a mean squared error metric. For both regions, we trained the model with input observations for 10 timesteps (t − 10 : t − 1) and predicted targets from t − 9 to t. The first 65% of the total length was used as a training set, and the remaining series was split into validation (10%) and test (25%) sets.

We explored several baselines: MLP, LSTM, and GN-only, which ignores the physics constraint in PaGN. We also compared GN-skip, which connects H and H′ with a skip-connection (He et al. 2016) but without the physics constraint.

One-step Prediction

Table 3 shows the prediction errors of the baselines and PaGN in the different areas. MLP and LSTM are shared over all stations, and their performances are outperformed by the other models, which leverage the given graph structure. This implies that knowing neighboring information is significantly helpful for a node to infer its own state, which is intuitive since climate behaviors are
spatiotemporally continuous. Among the graph-based models, PaGN (diff) provides the lowest MSEs, validating that the diffusive property provides a strong inductive bias for the latent representation learning. Note that the standard deviations from PaGN (diff) are significantly smaller than those of the other baselines, implying that the integrated physics knowledge properly stabilizes the optimization process by introducing an additional objective.

Model | LA area | SD area
LSTM | 1.9022±0.2078 | 1.2489±0.2295
GN-only | 1.6137±0.1128 | 1.5532±0.2023
GN-skip | 1.5429±0.0932 | 1.4423±0.1622
PaGN (diff) | 1.4656±0.0474 | 1.0999±0.0435

Table 4: Multistep prediction error (MSE)

Multistep Prediction

To evaluate the effectiveness of the state-wise regularization more carefully, we conducted a multistep prediction task (10-step forecast horizon). For this task, the recurrent modules are modified to predict the input observations as well, and the prediction is re-fed into the model for future timesteps. While the models with a recurrent module are able to predict a few more steps reasonably, there are a couple of things worth noting (Table 4). First, the results imply that utilizing neighboring information is important, because the GN-only model shows similar or better MSEs compared to LSTM on the multistep tasks, even though its simple recurrent module is not as powerful as that of LSTM. Second, we found that the diffusion equation in PaGN gives a stable state transition, yielding slowly varying latent states, which is desirable particularly for climate forecasting. Note that the skip-connection in GN-skip is also able to restrict rapid changes of H; however, it is necessary to optimize the parameters in GN-skip more carefully to learn the residual term in H′ = H + GN(H) properly.

Effectiveness of Physics Constraint

One of the benefits of physics-aware learning is data efficiency. We explore how much the physics constraint helps by testing whether PaGN can still be well trained when the amount of data for the supervised objective is limited, on the one-step prediction task. We randomly sampled training data used to optimize the total loss function (Equation 9), while the remaining unsampled data were used only to minimize the physics constraint:

L = L^i_sup + λ L^i_phy,  if i is a sampled step
L = λ L^i_phy,  otherwise

We found that the diffusion equation can benefit the optimization of PaGN even if the target observations are only partially available (Figure 5a). Although the overall performance of PaGN degrades when fewer sampled data are used, the errors do not deviate far from those of GN-only. The GN-only model is even outperformed by PaGN when only 70% of the training data are used with the state-wise constraint.

Figure 5: (a) MSE on sampled data: MSEs of PaGN are almost as good as those of GN-only (gray lines) despite the smaller number of training data. (b) MSE with different λ: how the prediction performance depends on the physics term.

Importance of Physics Constraint

To study the importance of the physics term, we trained PaGN with different values of λ, which controls the weight of the physics term. While we found the physics term substantially helpful in Tables 3 and 4, it is not supposed to be dominant (see Figure 5b) but tuned properly. This is intuitive, since the term only provides partial knowledge (diffusive input signals) that reshapes the loss surface to help the parameters predict the next signals more stably, instead of governing the dynamics explicitly. Scaling down the physics term is similar to what Sabour, Frosst, and Hinton (2017) did to keep the reconstruction error from dominating the margin loss while still helping the optimization process.

Model | LA area | SD area
PaGN (rand) | 1.1406 | 0.7073
PaGN (diff+wave) | 0.5624 | 0.6724

Table 5: One-step prediction MSE with different constraints

We also report MSEs for PaGN (rand), defined by randomly sampling (α, β) ∈ [−2.5, 2.5] in the constraint ||v″ + αv′ + βv − c∆v||², and PaGN (diff+wave), which superposes the two equations. Table 5 shows that the random equation significantly degrades the overall prediction quality. Note that a simple superposition of two equations does not always guarantee lower error, even if each equation is helpful separately: when the two equations are non-linearly connected in the unknown (fully) governing equation, the superposition cannot provide a meaningful inductive bias. The results demonstrate that the physics term is a useful inductive bias when it is properly defined.

Conclusion

In this work, we introduce a new architecture, PaGN, based on graph networks to incorporate prior knowledge given in the form of PDEs over time and space. While existing works focus more on discovering equations in data generated by explicit physics rules, we propose a method to leverage weakly given inductive bias describing the data. We empirically analyze the performance of PaGN across a range of prediction experiments on climate observations.
References

Battaglia, P.; Pascanu, R.; Lai, M.; Rezende, D. J.; et al. 2016. Interaction networks for learning about objects, relations and physics. In Advances in Neural Information Processing Systems, 4502–4510.

Battaglia, P. W.; Hamrick, J. B.; Bapst, V.; Sanchez-Gonzalez, A.; Zambaldi, V.; Malinowski, M.; Tacchetti, A.; Raposo, D.; Santoro, A.; Faulkner, R.; et al. 2018. Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261.

Bronstein, M. M.; Bruna, J.; LeCun, Y.; Szlam, A.; and Vandergheynst, P. 2017. Geometric deep learning: going beyond Euclidean data. IEEE Signal Processing Magazine 34(4): 18–42.

Brunton, S. L.; Proctor, J. L.; and Kutz, J. N. 2016. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences 113(15): 3932–3937.

Chang, M. B.; Ullman, T.; Torralba, A.; and Tenenbaum, J. B. 2017. A Compositional Object-Based Approach to Learning Physical Dynamics. International Conference on Learning Representations.

Cressie, N.; and Wikle, C. K. 2015. Statistics for Spatio-Temporal Data. John Wiley & Sons.

de Bezenac, E.; Pajot, A.; and Gallinari, P. 2018. Deep Learning for Physical Processes: Incorporating Prior Scientific Knowledge. In International Conference on Learning Representations.

Grzeszczuk, R.; Terzopoulos, D.; and Hinton, G. 1998. NeuroAnimator: Fast neural network emulation and control of physics-based models. In Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques, 9–20. ACM.

He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778.

Hinton, G.; Deng, L.; Yu, D.; Dahl, G. E.; Mohamed, A.-r.; Jaitly, N.; Senior, A.; Vanhoucke, V.; Nguyen, P.; Sainath, T. N.; et al. 2012. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine 29(6): 82–97.

Kipf, T.; Fetaya, E.; Wang, K.-C.; Welling, M.; and Zemel, R. 2018. Neural Relational Inference for Interacting Systems. International Conference on Machine Learning.

Krizhevsky, A.; Sutskever, I.; and Hinton, G. E. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 1097–1105.

Long, Z.; Lu, Y.; Ma, X.; and Dong, B. 2018. PDE-Net: Learning PDEs from Data. In Proceedings of the 35th International Conference on Machine Learning. URL http://proceedings.mlr.press/v80/long18a.html.

Raissi, M. 2018. Deep Hidden Physics Models: Deep Learning of Nonlinear Partial Differential Equations. arXiv preprint arXiv:1801.06637.

Raissi, M.; and Karniadakis, G. E. 2018. Hidden physics models: Machine learning of nonlinear partial differential equations. Journal of Computational Physics 357: 125–141.

Raissi, M.; Perdikaris, P.; and Karniadakis, G. E. 2017a. Physics Informed Deep Learning (Part I): Data-driven Solutions of Nonlinear Partial Differential Equations. arXiv preprint arXiv:1711.10561.

Raissi, M.; Perdikaris, P.; and Karniadakis, G. E. 2017b. Physics Informed Deep Learning (Part II): Data-driven Discovery of Nonlinear Partial Differential Equations. arXiv preprint arXiv:1711.10566.

Sabour, S.; Frosst, N.; and Hinton, G. E. 2017. Dynamic routing between capsules. In Advances in Neural Information Processing Systems, 3856–3866.

Sanchez-Gonzalez, A.; Heess, N.; Springenberg, J. T.; Merel, J.; Riedmiller, M.; Hadsell, R.; and Battaglia, P. 2018. Graph Networks as Learnable Physics Engines for Inference and Control. In Proceedings of the 35th International Conference on Machine Learning.

Santoro, A.; Raposo, D.; Barrett, D. G.; Malinowski, M.; Pascanu, R.; Battaglia, P.; and Lillicrap, T. 2017. A simple neural network module for relational reasoning. In Advances in Neural Information Processing Systems, 4967–4976.

Watters, N.; Tacchetti, A.; Weber, T.; Pascanu, R.; Battaglia, P.; and Zoran, D. 2017. Visual interaction networks. NIPS.

Wu, Y.; Schuster, M.; Chen, Z.; Le, Q. V.; Norouzi, M.; Macherey, W.; Krikun, M.; Cao, Y.; Gao, Q.; Macherey, K.; et al. 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.

Zhang, J.; Mohegh, A.; Li, Y.; Levinson, R.; and Ban-Weiss, G. 2018. Systematic Comparison of the Influence of Cool Wall versus Cool Roof Adoption on Urban Climate in the Los Angeles Basin. Environmental Science & Technology 52(19): 11188–11197.