Data-driven Inverse Modeling from Sparse Observations

Kailai Xu (1) and Eric Darve (1, 2)
(1) Institute for Computational and Mathematical Engineering, (2) Mechanical Engineering
Stanford University, Stanford, California 94305

Copyright (c) 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Deep neural networks (DNN) have been used to model nonlinear relations between physical quantities. Those DNNs are embedded in physical systems described by partial differential equations (PDE) and trained by minimizing a loss function that measures the discrepancy between predictions and observations in some chosen norm. This loss function often includes the PDE constraints as a penalty term when only sparse observations are available. As a result, the PDE is only satisfied approximately by the solution. Moreover, the penalty term typically slows down the convergence of the optimizer for stiff problems. We present a new approach that trains the embedded DNNs while numerically satisfying the PDE constraints. We develop an algorithm that enables differentiating both explicit and implicit numerical solvers in reverse-mode automatic differentiation. This allows the gradients of the DNNs and the PDE solvers to be computed in a unified framework. We demonstrate that our approach enjoys faster convergence and better stability on relatively stiff problems compared to the penalty method. Our approach opens the possibility of solving and accelerating a wide range of data-driven inverse modeling problems in which the physical constraints are described by PDEs and need to be satisfied accurately.

Introduction: Data-driven Inverse Modeling with Neural Networks

Models involving partial differential equations (PDE) are widely used to describe physical phenomena in science and engineering. Unknown parameters in these models can be calibrated using observations, which are typically associated with the output of the models. When the unknown is a function, one approach is to approximate it with a neural network and plug the network into the PDE. The neural network is then trained by matching the predicted and the observed outputs of the PDE model. In the presence of full-field observations, we can in many cases approximate the derivatives in the PDE directly and reduce the inverse problem to a standard regression problem (see [1] for an example).

However, in the context of sparse observations, i.e., when only part of the model output is observable, we must couple the PDE and the neural network to obtain the prediction. Specifically, we formulate the inverse problem as a PDE-constrained optimization problem

    \min_{\theta \in \Theta} L(u) = \sum_{i \in I_{\mathrm{obs}}} (u(x_i) - u_i)^2
    \text{s.t. } F(\theta, u) = 0

where L is the loss function, which measures the discrepancy between the estimated output u and the observed outputs u_i at the locations {x_i}, and I_{\mathrm{obs}} is the set of indices of locations where observations are available. F is the PDE model from which we can compute the solution u. \Theta is the space of all neural networks with a fixed architecture, and \theta can be viewed as the weights and biases; \Theta can also be a physical parameter space when we solve a parametric inverse problem. One popular way to solve this problem is to minimize the augmented loss function (penalty method) [2]

    \min_{\theta, u} \tilde{L}(\theta, u) = L(u) + \lambda \| F(\theta, u) \|_2^2

However, this approach suffers from ill-conditioning and slow convergence, partly due to the additional independent variable u besides \theta.

[Figure 1: Comparison of the penalty method (left) and PCL (right).]
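For concreteness, the following is a minimal sketch (not part of the paper's experiments) of the penalty approach on a toy linear stand-in for a discretized PDE constraint, F(\theta, u) = A u - \theta y, with sparse observations. All names here (A, y, obs_idx, lam) are illustrative assumptions rather than the paper's benchmarks; both u and \theta are treated as free optimization variables, as in the augmented loss above.

```python
import numpy as np
from scipy.optimize import minimize

# Toy linear stand-in for a discretized PDE constraint: F(theta, u) = A u - theta * y.
# This is an illustrative assumption, not one of the paper's PDE benchmarks.
rng = np.random.default_rng(0)
n = 50
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # stand-in for the discretized PDE operator
y = rng.standard_normal(n)
u_true = np.linalg.solve(A, y)                       # solution corresponding to the true theta = 1
obs_idx = np.arange(0, n, 5)                         # sparse observation locations
u_obs = u_true[obs_idx]

def penalty_loss(z, lam):
    """Augmented loss L(u) + lambda * ||F(theta, u)||^2 and its gradient w.r.t. (u, theta)."""
    u, theta = z[:n], z[n]
    mismatch = u[obs_idx] - u_obs
    residual = A @ u - theta * y
    loss = mismatch @ mismatch + lam * (residual @ residual)
    grad_u = 2.0 * lam * (A.T @ residual)
    grad_u[obs_idx] += 2.0 * mismatch
    grad_theta = -2.0 * lam * (y @ residual)
    return loss, np.append(grad_u, grad_theta)

# Both the PDE solution u and theta are optimization variables; the constraint is only
# satisfied approximately, and the conditioning worsens as lambda grows.
res = minimize(penalty_loss, np.zeros(n + 1), args=(1e3,), jac=True, method="L-BFGS-B")
print(res.x[n])   # estimate of theta (true value 1); accuracy and iteration count degrade with lambda
```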
In this work, we propose a new approach, physics constrained learning (PCL), that improves the conditioning and accelerates the convergence of inverse modeling. First, we enforce the physical constraint F(\theta, u) = 0 by solving the PDE numerically. Our approach is compatible with common numerical schemes such as finite difference, finite volume, and finite element methods. Second, the gradient \partial L(u(\theta)) / \partial \theta needed for optimization is computed with reverse-mode automatic differentiation (AD) [3], and the required Jacobian is computed with forward-mode automatic differentiation. We use ADCME (https://github.com/kailaix/ADCME.jl) for the AD functionalities in this work.

Methods: Physics Constrained Learning

The main step in PCL is to compute the gradient \partial L(u(\theta)) / \partial \theta. PCL is based on the formula

    \frac{\partial L(u(\theta))}{\partial \theta} = - \left. \frac{\partial L(u)}{\partial u} \left( \frac{\partial F}{\partial u} \right)^{-1} \frac{\partial F(\theta, u(\theta))}{\partial \theta} \right|_{u = u(\theta)}

The key to efficiency is to compute this gradient in the following three steps:

1. The Jacobian \left. \frac{\partial F}{\partial u} \right|_{u = u(\theta)} is computed with forward Jacobian propagation; it remains sparse as long as the numerical scheme we choose has local basis functions.

2. Solve the linear system

    \left. \left( \frac{\partial F}{\partial u} \right)^T w = \left( \frac{\partial L(u)}{\partial u} \right)^T \right|_{u = G(\theta)}    (1)

3. Apply reverse-mode automatic differentiation to compute

    \frac{\partial L(u(\theta))}{\partial \theta} = - w^T \frac{\partial F}{\partial \theta}(\theta, G(\theta))    (2)

Here u = G(\theta) denotes the numerical solution of F(\theta, u) = 0, and \theta can be the neural network weights and biases and thus can be high dimensional. The challenge is to compute the Jacobian matrix as well as the gradient in Equation (2). The detailed algorithm and analysis are presented in [4].
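As an illustration of these three steps (again using the toy linear constraint F(\theta, u) = A u - \theta y as an assumption, and plain NumPy/SciPy in place of ADCME), the constraint is enforced exactly by a forward solve and the gradient with respect to \theta follows from a single adjoint solve:

```python
import numpy as np
from scipy.optimize import minimize

# Same toy linear constraint as before: F(theta, u) = A u - theta * y (illustrative only).
rng = np.random.default_rng(0)
n = 50
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # plays the role of dF/du for this toy problem
y = rng.standard_normal(n)
u_true = np.linalg.solve(A, y)                       # solution corresponding to the true theta = 1
obs_idx = np.arange(0, n, 5)                         # sparse observation locations
u_obs = u_true[obs_idx]

def pcl_loss(theta_vec):
    """Loss L(u(theta)) and its gradient computed via the three PCL steps."""
    theta = theta_vec[0]
    # Enforce the physical constraint exactly: solve F(theta, u) = 0 for u = G(theta)
    u = np.linalg.solve(A, theta * y)
    mismatch = u[obs_idx] - u_obs
    loss = mismatch @ mismatch
    # Step 1: dF/du is simply A here (sparse for schemes with local basis functions)
    dL_du = np.zeros(n)
    dL_du[obs_idx] = 2.0 * mismatch
    # Step 2: adjoint solve (dF/du)^T w = (dL/du)^T
    w = np.linalg.solve(A.T, dL_du)
    # Step 3: dL/dtheta = -w^T dF/dtheta, where dF/dtheta = -y for this toy constraint
    return loss, np.array([w @ y])

# Only theta is an optimization variable; u satisfies the discretized PDE at every iteration.
res = minimize(pcl_loss, x0=np.array([0.0]), jac=True, method="L-BFGS-B")
print(res.x[0])   # recovers theta close to 1
```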
Findings and Discussion: Enabling Faster and More Robust Convergence

The key finding of our work is that enforcing physical constraints leads to faster and more robust convergence than the penalty method for stiff problems. We conduct multiple numerical examples and show that, in our benchmark problems:

1. PCL enjoys faster convergence with respect to the number of iterations needed to reach a predetermined accuracy. In particular, we observe a 10^4 times speed-up compared with the penalty method in the Helmholtz problem. We also prove a convergence result showing that, for the chosen model problem, the condition number of the penalty method is much worse than that of PCL.

2. PCL exhibits mesh-independent convergence, while the penalty method does not scale as well as PCL with respect to the number of iterations when we refine the mesh.

3. PCL is more robust to noise and to the neural network architecture. The penalty method includes the solution u_h as additional independent variables to optimize, and the optimizer may converge to a nonphysical local minimum.

For the theoretical analysis, we consider the model problem

    \min_{\theta} \| u - u_0 \|_2^2 \quad \text{s.t. } A u = \theta y

where u_0 = A^{-1} y, so that the optimal value is \theta = 1. The corresponding penalty method solves the least-squares problem

    \min_{u, \theta} \left\| A_\lambda \begin{pmatrix} u \\ \theta \end{pmatrix} - \begin{pmatrix} u_0 \\ 0 \end{pmatrix} \right\|_2^2, \qquad A_\lambda = \begin{pmatrix} I & 0 \\ \sqrt{\lambda} A & -\sqrt{\lambda} y \end{pmatrix}

We have proved the following theorem.

Theorem 0.1. The condition number of A_\lambda satisfies

    \liminf_{\lambda \to \infty} \kappa(A_\lambda) \ge \kappa(A)^2

Therefore, the condition number of the unconstrained optimization problem arising from the penalty method is, asymptotically, at least the square of that of PCL.

Conclusions

We believe that enforcing physical constraints in ill-conditioned inverse problems is essential for developing robust and efficient algorithms. In particular, when the unknowns are represented by neural networks, PCL demonstrates superior robustness and efficiency compared to the penalty method. Technically, the application of automatic differentiation removes the challenging and time-consuming process of deriving and implementing gradients and Jacobians. Meanwhile, AD also allows us to leverage computational graph optimization to improve inverse modeling performance. One limitation of PCL is that the PDE must be solved for each gradient computation, which can be expensive in both memory and computational cost. This computational challenge can be alleviated by acceleration techniques such as reduced-order modeling.

References

[1] Hayden Schaeffer. Learning partial differential equations via data discovery and sparse optimization. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 473(2197):20160446, 2017.

[2] Maziar Raissi, Paris Perdikaris, and George E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686-707, 2019.

[3] Martin Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.

[4] Kailai Xu and Eric Darve. Physics constrained learning for data-driven inverse modeling from sparse observations. arXiv preprint arXiv:2002.10521, 2020.