<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Data-driven Inverse Modeling from Sparse Observations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kailai Xu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eric Darve</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute for Computational and Mathematical Engineering</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Introduction: Data-driven Inverse Modeling with Neural Networks</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Mechanical Engineering, Stanford University</institution>
          ,
          <addr-line>Stanford, California 94305</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <abstract>
        <p>Deep neural networks (DNN) have been used to model nonlinear relations between physical quantities. Those DNNs are embedded in physical systems described by partial differential equations (PDE) and trained by minimizing a loss function that measures the discrepancy between predictions and observations in some chosen norm. When only sparse observations are available, this loss function often includes the PDE constraints as a penalty term, so the solution satisfies the PDE only approximately. Moreover, the penalty term typically slows down the convergence of the optimizer for stiff problems. We present a new approach that trains the embedded DNNs while numerically satisfying the PDE constraints. We develop an algorithm that enables differentiating both explicit and implicit numerical solvers in reverse-mode automatic differentiation, which allows the gradients of the DNNs and the PDE solvers to be computed in a unified framework. We demonstrate that our approach enjoys faster convergence and better stability on relatively stiff problems compared to the penalty method. Our approach has the potential to solve and accelerate a wide range of data-driven inverse modeling problems in which the physical constraints are described by PDEs and need to be satisfied accurately.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction: Data-driven Inverse Modeling with Neural Networks</title>
      <p>Models involving partial differential equations (PDE) are widely used to describe physical phenomena in science and engineering. Unknown parameters in these models can be calibrated using observations, which are typically associated with the outputs of the models.</p>
      <p>
        When the unknown is a function, one approach is to approximate the unknown with a neural network and plug it into the PDE. The neural network is trained by matching the predicted and the observed outputs of the PDE model. In the presence of full-field observations, in many cases we can approximate the derivatives in the PDE and reduce the inverse problem to a standard regression problem (see [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] for an example). However, in the context of sparse observations, i.e., when only part of the outputs of the model are observable, we must couple the PDE and the neural network to obtain the prediction.
      </p>
      <p>Specifically, we formulate the inverse problem as a PDE-constrained optimization problem
$$\min_{\theta}\; L(u) = \sum_{i \in I_{\mathrm{obs}}} \bigl(u(x_i) - u_i\bigr)^2 \quad \text{s.t.}\quad F(\theta, u) = 0,$$
where $L$ is called the loss function, which measures the discrepancy between the estimated outputs $u$ and the observed outputs $u_i$ at the locations $\{x_i\}$; $I_{\mathrm{obs}}$ is the set of indices of locations where observations are available; and $F$ is the PDE model from which we can calculate the solution $u$. The unknown $\theta$ lives in $\Theta$, the space of all neural networks with a fixed architecture, so $\theta$ can be viewed as the weights and biases; $\Theta$ can also be a physical parameter space when we solve a parametric inverse problem. One popular way to solve this problem is to minimize the augmented loss function (penalty method) [2]
$$\min_{\theta, u}\; \tilde{L}(\theta, u) = L(u) + \lambda \|F(\theta, u)\|_2^2.$$
However, this approach suffers from ill-conditioning and slow convergence, partially due to the additional independent variable $u$ besides $\theta$.</p>
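      <p>To make the penalty formulation concrete, the following is a minimal sketch on a hypothetical 1D diffusion problem; it is not the paper's implementation, and the grid size, observation data, network architecture, and penalty weight $\lambda$ are all illustrative placeholders.</p>
      <preformat>
# Minimal sketch of the penalty method for a hypothetical 1D problem
# -kappa(x) u'' = f with zero Dirichlet BCs, where kappa is a tiny
# neural network in theta. All sizes, data, and lambda are placeholders.
import jax
import jax.numpy as jnp

n = 50                                   # number of grid cells
h = 1.0 / n
xs = jnp.linspace(h, 1.0 - h, n - 1)     # interior nodes
f = jnp.ones(n - 1)                      # source term
i_obs = jnp.array([10, 25, 40])          # sparse observation indices
u_obs = jnp.array([0.10, 0.12, 0.08])    # synthetic observations
lam = 1.0e3                              # penalty weight

def init_theta(key):
    k1, k2 = jax.random.split(key)
    return {"w1": jax.random.normal(k1, (1, 8)) * 0.1,
            "b1": jnp.zeros(8),
            "w2": jax.random.normal(k2, (8, 1)) * 0.1}

def kappa(theta, x):
    # Stand-in DNN: one hidden layer, positive output.
    hdn = jnp.tanh(x[:, None] @ theta["w1"] + theta["b1"])
    return 1.0 + jax.nn.softplus(hdn @ theta["w2"])[:, 0]

def residual(theta, u):
    # F(theta, u): central-difference discretization of -kappa u'' - f.
    u_pad = jnp.pad(u, 1)                # homogeneous Dirichlet BCs
    lap = (u_pad[:-2] - 2.0 * u_pad[1:-1] + u_pad[2:]) / h**2
    return -kappa(theta, xs) * lap - f

def penalty_loss(params):
    theta, u = params
    mismatch = jnp.sum((u[i_obs] - u_obs) ** 2)
    return mismatch + lam * jnp.sum(residual(theta, u) ** 2)

# Joint gradient descent over (theta, u): u is an extra independent
# variable, the source of the ill-conditioning discussed above.
# (A real run would use a tuned optimizer such as Adam or L-BFGS.)
params = (init_theta(jax.random.PRNGKey(0)), jnp.zeros(n - 1))
grad_fn = jax.jit(jax.grad(penalty_loss))
for step in range(2000):
    g = grad_fn(params)
    params = jax.tree_util.tree_map(lambda p, q: p - 1e-4 * q, params, g)
      </preformat>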
      <p>[Figure: computational graphs of the two approaches. Penalty method: $\theta$ and $u$ both feed the observation mismatch $L(u)$ and the PDE residual $\lambda\|F(\theta,u)\|_2^2$, which sum to $\tilde{L}(\theta,u)$, and gradients are taken with respect to both $\theta$ and $u$. PCL: $\theta$ feeds a PDE solver that produces $u$, which feeds the observation mismatch $L(u)$, and gradients with respect to $\theta$ are back-propagated through the solver.]</p>
      <p>In this work, we propose a new approach, physics constrained learning (PCL), that improves the conditioning and accelerates the convergence of inverse modeling. First, we enforce the physical constraint $F(\theta, u) = 0$ by solving the PDE numerically. Our approach is compatible with common numerical schemes such as finite difference, finite volume, and finite element methods. Second, the gradient $\frac{\partial L(u(\theta))}{\partial \theta}$ needed for optimization is computed with reverse-mode automatic differentiation (AD) [3], and the required Jacobian is computed with forward-mode automatic differentiation. We use ADCME for the AD functionality in this work.</p>
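      <p>A minimal sketch of this idea follows, written with JAX rather than ADCME since the paper's code is not reproduced here: a small network is embedded in a finite-difference solve, the constraint holds exactly by construction, and reverse-mode AD differentiates through the linear solver. The problem setup and all names are hypothetical.</p>
      <preformat>
# Sketch: embedding a DNN inside a PDE solve and differentiating the
# whole pipeline in reverse mode, in the spirit of PCL. The tridiagonal
# "PDE" and the observation data are illustrative placeholders.
import jax
import jax.numpy as jnp

n = 50
h = 1.0 / n
xs = jnp.linspace(h, 1.0 - h, n - 1)
f = jnp.ones(n - 1)
i_obs = jnp.array([10, 25, 40])
u_obs = jnp.array([0.10, 0.12, 0.08])

def kappa(theta, x):
    # Stand-in DNN for the unknown coefficient.
    hdn = jnp.tanh(x[:, None] @ theta["w1"] + theta["b1"])
    return 1.0 + jax.nn.softplus(hdn @ theta["w2"])[:, 0]

def solve_pde(theta):
    # Assemble the finite-difference operator A(theta) and solve A u = f
    # exactly, so F(theta, u) = A(theta) u - f = 0 holds by construction.
    k = kappa(theta, xs)
    main = 2.0 * k / h**2
    off = -k / h**2
    A = jnp.diag(main) + jnp.diag(off[:-1], 1) + jnp.diag(off[1:], -1)
    return jnp.linalg.solve(A, f)

def loss(theta):
    u = solve_pde(theta)                 # PDE constraint enforced exactly
    return jnp.sum((u[i_obs] - u_obs) ** 2)

# Reverse-mode AD propagates through jnp.linalg.solve via its adjoint,
# so no penalty term and no extra unknown u are needed.
theta = {"w1": jnp.ones((1, 8)) * 0.1, "b1": jnp.zeros(8),
         "w2": jnp.ones((8, 1)) * 0.1}
g = jax.grad(loss)(theta)
      </preformat>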
    </sec>
    <sec id="sec-2">
      <title>Methods: Physics Constrained Learning</title>
      <p>The main step in PCL is computing the gradient $\frac{\partial L(u(\theta))}{\partial \theta}$. PCL is based on the formula, obtained by implicitly differentiating the constraint $F(\theta, u(\theta)) = 0$,
$$\frac{\partial L(u(\theta))}{\partial \theta} = -\frac{\partial L}{\partial u}\left(\frac{\partial F}{\partial u}\right)^{-1}\frac{\partial F}{\partial \theta}, \qquad (1)$$
which is evaluated in three steps:
1. Computing the Jacobian $\frac{\partial F}{\partial u}$ with forward-mode Jacobian propagation; it remains sparse as long as the numerical scheme we choose has local basis functions.
2. Solving the linear system
$$w^T = \frac{\partial L}{\partial u}\left(\frac{\partial F}{\partial u}\right)^{-1}. \qquad (2)$$
3. Applying reverse-mode automatic differentiation to compute $-w^T \frac{\partial F}{\partial \theta}$.
Here $\theta$ can be the neural network weights and biases and thus can be high dimensional. The challenge is to compute the Jacobian matrix as well as the solution $w$ of Equation (2) efficiently. The detailed algorithm and analysis are presented in [4].</p>
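      <p>The sketch below walks through the three steps for a toy linear residual $F(\theta, u) = Au - \theta b$ with a scalar $\theta$, and checks the adjoint-based gradient of Equation (1) against differentiating straight through the solver; the matrix, data, and loss are made up for illustration.</p>
      <preformat>
# Three PCL steps for a toy residual F(theta, u) = A u - theta * b,
# with scalar theta so Equation (1) is easy to verify. A, b are made up.
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
n = 20
A = jnp.eye(n) * 2.0 + jax.random.normal(key, (n, n)) * 0.1
b = jnp.ones(n)
u_obs = jnp.zeros(n)

def F(theta, u):
    return A @ u - theta * b

def L(u):
    return 0.5 * jnp.sum((u - u_obs) ** 2)

theta0 = 1.3
u_star = jnp.linalg.solve(A, theta0 * b)       # F(theta0, u_star) = 0

# Step 1: Jacobian dF/du (here simply A; in general it is assembled
# sparsely, e.g. via forward-mode AD such as jax.jacfwd).
dF_du = jax.jacfwd(F, argnums=1)(theta0, u_star)

# Step 2: solve the linear system w^T = (dL/du)(dF/du)^{-1} of
# Equation (2), i.e. (dF/du)^T w = (dL/du)^T.
dL_du = jax.grad(L)(u_star)
w = jnp.linalg.solve(dF_du.T, dL_du)

# Step 3: reverse-mode AD yields w^T dF/dtheta; Equation (1) is its
# negative.
_, vjp_theta = jax.vjp(lambda th: F(th, u_star), theta0)
grad_adjoint = -vjp_theta(w)[0]

# Check against differentiating straight through the solver.
grad_direct = jax.grad(lambda th: L(jnp.linalg.solve(A, th * b)))(theta0)
print(grad_adjoint, grad_direct)   # should agree to machine precision
      </preformat>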
    </sec>
    <sec id="sec-3">
      <title>Findings and Discussion: Enabling Faster and More Robust Convergence</title>
      <p>The key finding of our work is that enforcing physical constraints leads to faster and more robust convergence than the penalty method for stiff problems. We conduct multiple numerical examples and show that, in our benchmark problems:
1. PCL enjoys faster convergence with respect to the number of iterations needed to reach a predetermined accuracy. In particular, we observe a $10^4$ times speed-up compared with the penalty method on the Helmholtz problem. We also prove a convergence result showing that, for the chosen model problem, the condition number of the penalty method is much worse than that of PCL.
2. PCL exhibits mesh-independent convergence, while the penalty method does not scale as well as PCL with respect to the number of iterations when we refine the mesh.</p>
      <p>The model problem behind the convergence result is
$$\min_{\theta}\; \|u - u_0\|_2^2 \quad \text{s.t.}\quad Au = \theta y,$$
where $u_0 = A^{-1} y$, so that the optimum is $\theta = 1$; the corresponding penalty method solves the least-squares problem
$$\min_{\theta, u}\; \left\|A_\lambda \begin{pmatrix} u \\ \theta \end{pmatrix} - y_\lambda\right\|_2^2, \qquad A_\lambda = \begin{pmatrix} \sqrt{\lambda}\, A &amp; -\sqrt{\lambda}\, y \\ I &amp; 0 \end{pmatrix}, \qquad y_\lambda = \begin{pmatrix} 0 \\ u_0 \end{pmatrix}.$$
We have proved the following theorem.</p>
      <p>Theorem 0.1. The condition number of $A_\lambda$ satisfies
$$\liminf_{\lambda \to \infty} \kappa(A_\lambda) \geq \kappa(A)^2,$$
and therefore the condition number of the unconstrained optimization problem from the penalty method is asymptotically the square of that from PCL.</p>
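      <p>As a quick numerical illustration of Theorem 0.1 (not taken from the paper), one can form $A_\lambda$ for a small random instance and watch its condition number grow toward $\kappa(A)^2$ as $\lambda$ increases; the matrix $A$ and vector $y$ below are arbitrary placeholders.</p>
      <preformat>
# Numerical check of Theorem 0.1 on a small random instance; A and y
# are arbitrary placeholders.
import numpy as np

rng = np.random.default_rng(0)
n = 10
A = np.eye(n) * 2.0 + rng.standard_normal((n, n)) * 0.3
y = rng.standard_normal(n)

kA = np.linalg.cond(A)
for lam in [1e2, 1e4, 1e6, 1e8]:
    s = np.sqrt(lam)
    # A_lambda = [[sqrt(lam) A, -sqrt(lam) y], [I, 0]], acting on (u, theta)
    A_lam = np.block([[s * A, -s * y[:, None]],
                      [np.eye(n), np.zeros((n, 1))]])
    print(f"lambda={lam:.0e}  cond(A_lam)={np.linalg.cond(A_lam):.3e}  "
          f"cond(A)^2={kA**2:.3e}")
# The theorem predicts liminf cond(A_lam) is at least cond(A)^2.
      </preformat>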
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>We believe that enforcing physical constraints in ill-conditioned inverse problems is essential for developing robust and efficient algorithms. In particular, when the unknowns are represented by neural networks, PCL demonstrates superior robustness and efficiency compared to the penalty method. Technically, the application of automatic differentiation removes the challenging and time-consuming process of deriving and implementing gradients and Jacobians by hand. AD also allows us to leverage computational graph optimizations to improve inverse modeling performance. One limitation of PCL is that the PDE must be solved for each gradient computation, which can be expensive in both memory and computational cost. This computational challenge can be alleviated by acceleration techniques such as reduced-order modeling.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Hayden</given-names>
            <surname>Schaeffer</surname>
          </string-name>
          .
          <article-title>Learning partial differential equations via data discovery and sparse optimization</article-title>
          .
          <source>Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences</source>
          ,
          <volume>473</volume>
          (
          <issue>2197</issue>
          ):
          <fpage>20160446</fpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><given-names>Maziar</given-names> <surname>Raissi</surname></string-name>,
          <string-name><given-names>Paris</given-names> <surname>Perdikaris</surname></string-name>, and
          <string-name><given-names>George E</given-names> <surname>Karniadakis</surname></string-name>.
          <article-title>Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations</article-title>.
          <source>Journal of Computational Physics</source>,
          <volume>378</volume>:<fpage>686</fpage>-<lpage>707</lpage>,
          <year>2019</year>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><given-names>Martín</given-names> <surname>Abadi</surname></string-name>,
          <string-name><given-names>Ashish</given-names> <surname>Agarwal</surname></string-name>,
          <string-name><given-names>Paul</given-names> <surname>Barham</surname></string-name>,
          <string-name><given-names>Eugene</given-names> <surname>Brevdo</surname></string-name>,
          <string-name><given-names>Zhifeng</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>Craig</given-names> <surname>Citro</surname></string-name>,
          <string-name><given-names>Greg S</given-names> <surname>Corrado</surname></string-name>,
          <string-name><given-names>Andy</given-names> <surname>Davis</surname></string-name>,
          <string-name><given-names>Jeffrey</given-names> <surname>Dean</surname></string-name>,
          <string-name><given-names>Matthieu</given-names> <surname>Devin</surname></string-name>, et al.
          <article-title>TensorFlow: Large-scale machine learning on heterogeneous distributed systems</article-title>.
          <source>arXiv preprint arXiv:1603.04467</source>,
          <year>2016</year>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><given-names>Kailai</given-names> <surname>Xu</surname></string-name> and
          <string-name><given-names>Eric</given-names> <surname>Darve</surname></string-name>.
          <article-title>Physics constrained learning for data-driven inverse modeling from sparse observations</article-title>.
          <source>arXiv preprint arXiv:2002.10521</source>,
          <year>2020</year>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>