Data-driven Inverse Modeling from Sparse Observations

Kailai Xu (1) and Eric Darve (1, 2)
(1) Institute for Computational and Mathematical Engineering, (2) Mechanical Engineering
Stanford University, Stanford, California 94305

Copyright (c) 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Deep neural networks (DNN) have been used to model nonlinear relations between physical quantities. Those DNNs are embedded in physical systems described by partial differential equations (PDE) and trained by minimizing a loss function that measures the discrepancy between predictions and observations in some chosen norm. This loss function often includes the PDE constraints as a penalty term when only sparse observations are available. As a result, the PDE is only satisfied approximately by the solution. Moreover, the penalty term typically slows down the convergence of the optimizer for stiff problems. We present a new approach that trains the embedded DNNs while numerically satisfying the PDE constraints. We develop an algorithm that enables differentiating both explicit and implicit numerical solvers in reverse-mode automatic differentiation. This allows the gradients of the DNNs and the PDE solvers to be computed in a unified framework. We demonstrate that our approach enjoys faster convergence and better stability on relatively stiff problems compared to the penalty method. Our approach opens the possibility of solving and accelerating a wide range of data-driven inverse modeling problems in which the physical constraints are described by PDEs and need to be satisfied accurately.

Introduction: Data-driven Inverse Modeling with Neural Networks

Models involving partial differential equations (PDE) are widely used to describe physical phenomena in science and engineering. Unknown parameters in these models can be calibrated using observations, which are typically associated with the output of the models. When the unknown is a function, one approach is to approximate it with a neural network and plug the network into the PDE. The neural network is then trained by matching the predicted and the observed outputs of the PDE model. In the presence of full-field observations, we can in many cases approximate the derivatives in the PDE directly and reduce the inverse problem to a standard regression problem (see [1] for an example).

However, in the context of sparse observations, i.e., when only part of the model output is observable, we must couple the PDE and the neural network to obtain the prediction. Specifically, we formulate the inverse problem as a PDE-constrained optimization problem

    \min_{\theta \in \Theta} L(u) = \sum_{i \in I_{\mathrm{obs}}} (u(x_i) - u_i)^2
    \text{s.t. } F(\theta, u) = 0

where L is the loss function, which measures the discrepancy between the estimated output u and the observed outputs u_i at the locations {x_i}, and I_{\mathrm{obs}} is the set of indices of locations where observations are available. F is the PDE model from which we can compute the solution u. \Theta is the space of all neural networks with a fixed architecture, and \theta can be viewed as the weights and biases; \Theta can also be a physical parameter space when we solve a parametric inverse problem. One popular way to solve this problem is to minimize the augmented loss function (penalty method) [2]

    \min_{\theta, u} \tilde{L}(\theta, u) = L(u) + \lambda \| F(\theta, u) \|_2^2

However, this approach suffers from ill-conditioning and slow convergence, partly due to the additional independent variable u besides \theta.

[Figure 1: Comparison of the penalty method (left) and PCL (right).]
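For concreteness, the following is a minimal sketch (not part of the paper's experiments) of the penalty approach on a toy linear stand-in for a discretized PDE constraint, F(\theta, u) = A u - \theta y, with sparse observations. All names here (A, y, obs_idx, lam) are illustrative assumptions rather than the paper's benchmarks; both u and \theta are treated as free optimization variables, as in the augmented loss above.

```python
import numpy as np
from scipy.optimize import minimize

# Toy linear stand-in for a discretized PDE constraint: F(theta, u) = A u - theta * y.
# This is an illustrative assumption, not one of the paper's PDE benchmarks.
rng = np.random.default_rng(0)
n = 50
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # stand-in for the discretized PDE operator
y = rng.standard_normal(n)
u_true = np.linalg.solve(A, y)                       # solution corresponding to the true theta = 1
obs_idx = np.arange(0, n, 5)                         # sparse observation locations
u_obs = u_true[obs_idx]

def penalty_loss(z, lam):
    """Augmented loss L(u) + lambda * ||F(theta, u)||^2 and its gradient w.r.t. (u, theta)."""
    u, theta = z[:n], z[n]
    mismatch = u[obs_idx] - u_obs
    residual = A @ u - theta * y
    loss = mismatch @ mismatch + lam * (residual @ residual)
    grad_u = 2.0 * lam * (A.T @ residual)
    grad_u[obs_idx] += 2.0 * mismatch
    grad_theta = -2.0 * lam * (y @ residual)
    return loss, np.append(grad_u, grad_theta)

# Both the PDE solution u and theta are optimization variables; the constraint is only
# satisfied approximately, and the conditioning worsens as lambda grows.
res = minimize(penalty_loss, np.zeros(n + 1), args=(1e3,), jac=True, method="L-BFGS-B")
print(res.x[n])   # estimate of theta (true value 1); accuracy and iteration count degrade with lambda
```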
In this work, we propose a new approach, physics constrained learning (PCL), that improves the conditioning and accelerates the convergence of inverse modeling. First, we enforce the physical constraint F(\theta, u) = 0 by solving the PDE numerically. Our approach is compatible with common numerical schemes such as finite difference, finite volume, and finite element methods. Second, the gradient \partial L(u(\theta)) / \partial \theta needed for optimization is computed with reverse-mode automatic differentiation (AD) [3], and the required Jacobian is computed with forward-mode automatic differentiation. We use ADCME (https://github.com/kailaix/ADCME.jl) for the AD functionalities in this work.

Methods: Physics Constrained Learning

The main step in PCL is to compute the gradient \partial L(u(\theta)) / \partial \theta. PCL is based on the formula

    \frac{\partial L(u(\theta))}{\partial \theta} = - \left. \frac{\partial L(u)}{\partial u} \left( \frac{\partial F}{\partial u} \right)^{-1} \frac{\partial F(\theta, u(\theta))}{\partial \theta} \right|_{u = u(\theta)}

The key to efficiency is to compute this gradient in the following three steps:

1. The Jacobian \left. \frac{\partial F}{\partial u} \right|_{u = u(\theta)} is computed with forward Jacobian propagation; it remains sparse as long as the numerical scheme we choose has local basis functions.

2. Solve the linear system

    \left. \left( \frac{\partial F}{\partial u} \right)^T w = \left( \frac{\partial L(u)}{\partial u} \right)^T \right|_{u = G(\theta)}    (1)

3. Apply reverse-mode automatic differentiation to compute

    \frac{\partial L(u(\theta))}{\partial \theta} = - w^T \frac{\partial F}{\partial \theta}(\theta, G(\theta))    (2)

Here u = G(\theta) denotes the numerical solution of F(\theta, u) = 0, and \theta can be the neural network weights and biases and thus can be high dimensional. The challenge is to compute the Jacobian matrix as well as the gradient in Equation (2). The detailed algorithm and analysis are presented in [4].
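As an illustration of these three steps (again using the toy linear constraint F(\theta, u) = A u - \theta y as an assumption, and plain NumPy/SciPy in place of ADCME), the constraint is enforced exactly by a forward solve and the gradient with respect to \theta follows from a single adjoint solve:

```python
import numpy as np
from scipy.optimize import minimize

# Same toy linear constraint as before: F(theta, u) = A u - theta * y (illustrative only).
rng = np.random.default_rng(0)
n = 50
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # plays the role of dF/du for this toy problem
y = rng.standard_normal(n)
u_true = np.linalg.solve(A, y)                       # solution corresponding to the true theta = 1
obs_idx = np.arange(0, n, 5)                         # sparse observation locations
u_obs = u_true[obs_idx]

def pcl_loss(theta_vec):
    """Loss L(u(theta)) and its gradient computed via the three PCL steps."""
    theta = theta_vec[0]
    # Enforce the physical constraint exactly: solve F(theta, u) = 0 for u = G(theta)
    u = np.linalg.solve(A, theta * y)
    mismatch = u[obs_idx] - u_obs
    loss = mismatch @ mismatch
    # Step 1: dF/du is simply A here (sparse for schemes with local basis functions)
    dL_du = np.zeros(n)
    dL_du[obs_idx] = 2.0 * mismatch
    # Step 2: adjoint solve (dF/du)^T w = (dL/du)^T
    w = np.linalg.solve(A.T, dL_du)
    # Step 3: dL/dtheta = -w^T dF/dtheta, where dF/dtheta = -y for this toy constraint
    return loss, np.array([w @ y])

# Only theta is an optimization variable; u satisfies the discretized PDE at every iteration.
res = minimize(pcl_loss, x0=np.array([0.0]), jac=True, method="L-BFGS-B")
print(res.x[0])   # recovers theta close to 1
```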
Findings and Discussion: Enabling Faster and More Robust Convergence

The key finding of our work is that enforcing physical constraints leads to faster and more robust convergence than the penalty method for stiff problems. We conduct multiple numerical examples and show that, in our benchmark problems:

1. PCL enjoys faster convergence with respect to the number of iterations needed to reach a predetermined accuracy. In particular, we observe a 10^4 times speed-up compared with the penalty method in the Helmholtz problem. We also prove a convergence result showing that, for the chosen model problem, the condition number of the penalty method is much worse than that of PCL.

2. PCL exhibits mesh-independent convergence, while the penalty method does not scale as well as PCL with respect to the number of iterations when we refine the mesh.

3. PCL is more robust to noise and to the neural network architecture. The penalty method includes the solution u_h as additional independent variables to optimize, and the optimizer may converge to a nonphysical local minimum.

For the theoretical analysis, we consider the model problem

    \min_{\theta} \| u - u_0 \|_2^2 \quad \text{s.t. } A u = \theta y

where u_0 = A^{-1} y, so that the optimal value is \theta = 1. The corresponding penalty method solves the least-squares problem

    \min_{u, \theta} \left\| A_\lambda \begin{pmatrix} u \\ \theta \end{pmatrix} - \begin{pmatrix} u_0 \\ 0 \end{pmatrix} \right\|_2^2, \qquad A_\lambda = \begin{pmatrix} I & 0 \\ \sqrt{\lambda} A & -\sqrt{\lambda} y \end{pmatrix}

We have proved the following theorem.

Theorem 0.1. The condition number of A_\lambda satisfies

    \liminf_{\lambda \to \infty} \kappa(A_\lambda) \ge \kappa(A)^2

Therefore, the condition number of the unconstrained optimization problem arising from the penalty method is, asymptotically, at least the square of that of PCL.

Conclusions

We believe that enforcing physical constraints in ill-conditioned inverse problems is essential for developing robust and efficient algorithms. In particular, when the unknowns are represented by neural networks, PCL demonstrates superior robustness and efficiency compared to the penalty method. Technically, the application of automatic differentiation removes the challenging and time-consuming process of deriving and implementing gradients and Jacobians. Meanwhile, AD also allows us to leverage computational graph optimization to improve inverse modeling performance. One limitation of PCL is that the PDE must be solved for each gradient computation, which can be expensive in both memory and computational cost. This computational challenge can be alleviated by acceleration techniques such as reduced-order modeling.

References

[1] Hayden Schaeffer. Learning partial differential equations via data discovery and sparse optimization. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 473(2197):20160446, 2017.

[2] Maziar Raissi, Paris Perdikaris, and George E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686-707, 2019.

[3] Martin Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.

[4] Kailai Xu and Eric Darve. Physics constrained learning for data-driven inverse modeling from sparse observations. arXiv preprint arXiv:2002.10521, 2020.