GMLS-Nets: A machine learning framework for unstructured data Nathaniel Trask 1, + , Ravi G. Patel1 , Ben J. Gross 2 , Paul J. Atzberger2,† 1 Sandia National Laboratories∗ Center for Computing Research {natrask,rgpatel}@sandia.gov 2 University of California Santa Barbara Department of Mathematics and Mechanical Engineering http://atzberger.org/ atzberg@gmail.com Abstract under partially known dynamics. This data is often scarce or highly constrained, and it has been proposed that successful Data fields sampled on irregularly spaced points arise in many SciML strategies will leverage prior knowledge to enhance science and engineering applications. For regular grids, Con- information gained from such data [18, 25]. One may exploit volutional Neural Networks (CNNs) gain benefits from weight sharing and invariances. We generalize CNNs by introducing physical properties such as transformation symmetries, con- methods for data on unstructured point clouds using Gen- servation structure, or solution regularity [6, 9, 18]. This new eralized Moving Least Squares (GMLS). GMLS is a non- application space necessitates ML architectures capable of parametric meshfree technique for estimating linear bounded utilizing such knowledge. functionals from scattered data, and has emerged as an effec- For data sampled on regular grids, Convolutional Neural tive technique for solving partial differential equations (PDEs). Networks (CNNs) are widely used to exploit translation in- By parameterizing the GMLS estimator, we obtain learning variance and hierarchical structure to extract features from methods for linear and non-linear operators with unstructured data. Here we generalize this technique to the SciML setting stencils. The requisite calculations are local, embarrassingly by introducing GMLS-Nets based on the scattered data ap- parallelizable, and supported by a rigorous approximation the- ory. We show how the framework may be used for unstructured proximation theory underlying GMLS. Similar to how CNNs physical data sets to perform operator regression, develop pre- learn stencils which benefit from weight-sharing, GMLS- dictive dynamical models, and obtain feature extractors for en- Nets operate by using local reconstructions to learn operators gineering quantities of interest. The results show the promise between function spaces. The resulting architecture is simi- of these architectures as foundations for data-driven model larly interpretable and serves as an effective generalization development in scientific machine learning applications. of CNNs to unstructured data, while providing mechanisms to incorporate knowledge of underlying physics. In this work we show how GMLS-Nets may be used in a Introduction SciML setting. Our results show GMLS-Nets are an effective Many scientific and engineering applications require process- tool to discover PDEs, which may be used as a foundation ing data sets sampled on irregularly spaced points. Consider to construct data-driven models while preserving physical e.g. GIS data associating geospatial locations with measure- invariants like conservation principles. We also show they ments, or scientific simulations with unstructured meshes. may be used to improve traditional scientific components, This need is amplified by the recent surge of interest in scien- such as time integrators. 
We show they also can be used tific machine learning (SciML) [25] targeting the application to regress engineering quantities of interest from scientific of data-driven techniques to the sciences. In this setting, data simulation data. Finally, we briefly show GMLS-Nets can typically takes the form of e.g. synthetic simulation data from perform reasonably relative to convNets on traditional com- meshes, or from sensors associated with data sites evolving puter vision benchmarks. These results indicate the promise ∗ of GMLS-Nets to support data-driven modeling efforts in Sandia National Laboratories is a multimission laboratory man- SciML applications. Implementations in TensorFlow and Py- aged and operated by National Technology and Engineering So- Torch are available at https://github.com/rgp62/gmls-nets and lutions of Sandia, LLC.,a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energys National https://github.com/atzberg/gmls-nets. Nuclear Security Administration under contract DE-NA-0003525. This paper describes objective technical results and analysis. Any Generalized Moving Least Squares (GMLS) subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Generalized Moving Least Squares (GMLS) is a non- Energy or the United States Government. parametric functional regression technique to construct ap- Copyright c 2020, for this paper by its authors. Use permitted under proximations of linear, bounded functionals from scattered Creative Commons License Attribution 4.0 International (CCBY samples of an underlying field by solving local least-square 4.0). problems. On a Banach space V with dual space V∗ , we aim to recover an estimate of a given target functional τx̃ [u] ∈ V∗ stress however that this framework supports a much broader acting on u = u(x) ∈ V, where x, x̃ denote associated loca- application. Consider e.g. learning from flux data related to tions in a compactly supported domain Ω ⊂ Rd . We assume H(div)-conforming discretizations, R where one may select u is characterized by an unstructured collection of sampling as sampling functional λi (u) = fi u · dA, or consider the N functionals, Λ(u) := {λj (u)}j=1 ⊂ V∗ . physical constraints that may be imposed by selecting P as To construct this estimate, we consider P ⊂ V and seek an be divergence free or satisfy a differential equation. element p∗ ∈ P which provides an optimal reconstruction of We illustrate now the connection between GMLS and con- the samples in the following weighted-`2 sense. volutional networks in the case of a uniform grid, Xh ⊂ Zd . Consider a sampling functional λj (u) = (u(xj ) − u(xi )), N ∗ X 2 and assume the parameterization τx̃,ξ (Φ) = ξ1 , ..., ξdim(P) , p = argmin (λj (u) − λj (p)) ω(λj , τx̃ ). (1) xi,j = xi − xj . Then the GMLS estimate is given explicitly p∈P j=1 at a point xi by Here ω(λj , τx̃ ) is a positive, compactly supported kernel !−1 function establishing spatial correlation between the tar- X X h τx˜i [u] = ξα φα (xk )W (xi,k )φβ (xk ) get functional and sampling set. If one associates locations α,β,j k (5) N Xh := {xj }j=1 ⊂ Ω with Λ(u), then one may consider φβ (xj )W (xi,j )(uj − ui ). radial kernels ω = W (||xj − x̃||2 ), with support r < . Assuming the basis P = span{φ1 , ..., φdim(P) }, and denot- Contracting P terms involving α, β and k, we may write ing Φ(x) = {φi (x)}i=1,...,dim(P) , the optimal reconstruction τxh˜i [u] = j c(τ, Λ)ij (uj − ui ). 
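To make the estimate concrete, the following is a minimal NumPy sketch of the GMLS procedure of Eqns. 1-3 for a single target point in 1D, assuming point-evaluation sampling functionals, a monomial basis, and the weight kernel W(r) = (1 - r/ε)^p_+ adopted later in the text. The function name and parameter values are illustrative only and are not taken from the released implementations.

```python
import numpy as np

def gmls_estimate_d2(x_pts, u_vals, x0, eps=0.3, order=4, p=4):
    """GMLS estimate of d^2u/dx^2 at x0 from scattered 1D samples (Eqns. 1-3).

    Assumes point-evaluation sampling functionals, P = polynomials of degree
    `order`, and the weight W(r) = (1 - r/eps)^p_+ .  Illustrative sketch only.
    """
    r = np.abs(x_pts - x0)
    mask = r < eps
    xj, uj = x_pts[mask] - x0, u_vals[mask]            # coordinates centered at x0
    w = (1.0 - r[mask] / eps) ** p                     # compactly supported weights

    Phi = np.vander(xj, N=order + 1, increasing=True)  # basis 1, (x-x0), (x-x0)^2, ...
    M = Phi.T @ (w[:, None] * Phi)                     # normal equations of Eqn. 1
    a = np.linalg.solve(M, Phi.T @ (w * uj))           # optimal coefficients a(u), Eqn. 2

    # Apply the known action of the target functional on the basis (Eqn. 3):
    # for tau = d^2/dx^2 evaluated at x0, only the quadratic monomial contributes.
    return 2.0 * a[2]

# Scattered samples of u(x) = sin(2 pi x); compare to the exact second derivative.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)
u = np.sin(2 * np.pi * x)
print(gmls_estimate_d2(x, u, 0.4), -(2 * np.pi) ** 2 * np.sin(2 * np.pi * 0.4))
```

The same weighted solve, assembled once per target site, yields stencil coefficients of the form c(τ, Λ)_ij discussed above.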
The collection of stencil may be written in terms of an optimal coefficient vector a(u) coefficients at xi ∈ Xh are {c(τ, Λ)ij }j . Therefore, one application for GMLS is to build stencils similar to convo- p∗ = Φ(x)| a(u). (2) lutional networks. A major distinction is that GMLS can Provided one has knowledge of how the target functional handle scattered data sets and a judicious selection of Λ, P acts on P, the final GMLS estimate may be obtained by and ω can be used to inject prior information. Alternatively, applying the target functional to the optimal reconstruction one may interpret the regression over P as an encoding in a low-dimensional space well-suited to characterize common τx̃h [u] = τx̃ (Φ)| a(u). (3) operators. For continuous functions for example, an opera- tor’s action on the space of polynomials is often sufficient Sufficient conditions for the existence of solutions to Eqn. to obtain a good approximation. Unlike CNNs there is no 1 depend only upon the unisolvency of Λ over V, the distri- need to handle boundary effects; GMLS-nets instead learns bution of samples Xh , and mild conditions on the domain one-sided stencils. Ω; they are independent of the choice of τx̃ . For theoretical underpinnings and recent applications, we refer readers to [5, GMLS-Nets 16, 29, 30]. From an ML perspective, GMLS estimation consists of two GMLS has primarily been used to obtain point estimates parts: (i) data is encoded via the coefficient vector a(u) pro- of differential operators to develop meshfree discretizations viding a compression of the data in terms of P, (ii) the op- of PDEs. The abstraction of GMLS however provides a math- erator is regressed over P∗ ; this is equivalent to finding a ematically rigorous approximation theory framework which function qξ : a(u) → R. We propose GMLS-Layers encod- may be applied to a wealth of problems, whereby one may ing this process in Figure 1, parameterizing a(u) = N N (u). tailor the choice of τx̃ , Λ, P and ω to a given application. In This architecture accepts input channels indexed by α the current work, we will assume the action of τx̃ on P is which consist of components of the data vector-field [u]α unknown, and introduce a parameterization τx̃,ξ (Φ), where ξ sampled over the scattered points Xh . We allow for different denote hyperparameters to be inferred from data. Classically, sampling points for each channel, which may be helpful for GMLS is restricted to linear bounded target functionals; we heterogeneous data. Each of these input channels is then used will also consider a novel nonlinear extension by considering to obtain an encoding of the input field as the vector a(u) estimates of the form identifying the optimal representer in P. τx̃h [u] = qx̃,ξ (a(u)), (4) We next select our parameterization of the functional via qξ , which may be any family of functions trainable by where qx̃,ξ is a family of nonlinear operators parameterized back-propagation. We will consider two cases in this work by ξ acting upon the GMLS reconstruction. Where unam- appropriate for linear and non-linear operators. In the lin- biguous, we will drop the x̃ dependence of operators and ear case we consider qξ (a) = ξ T a, which is sufficient to simply write e.g. τ h [u] = qξ (a(u)). We have recently used exactly reproduce differential operators. For the nonlinear related non-linear variants of GMLS to develop solvers for case we parameterize with a multi-layer perceptron (MLP), PDEs on manifolds in [29]. qξ (a) = MLP(a). 
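As an illustration of this two-stage structure, here is a hedged single-channel PyTorch sketch of a GMLS-Layer in 1D: the encoding a(u) is obtained from a per-point weighted least-squares solve (which contains no trainable parameters), and q_ξ is either the linear map ξ^T a or a small MLP. The class and argument names are hypothetical; this is not the reference implementation from the linked repositories.

```python
import torch

class GMLSLayer(torch.nn.Module):
    """Hypothetical single-channel 1D GMLS-Layer sketch.

    Encoding: per-point weighted least-squares coefficients a(u) over a monomial
    basis of degree `order` with weight W(r) = (1 - r/eps)^p_+ (no trainable
    parameters).  Learnable part: q_xi, linear (q(a) = xi^T a) or a small MLP.
    """
    def __init__(self, order=4, eps=0.1, p=4, hidden=None):
        super().__init__()
        self.order, self.eps, self.p = order, eps, p
        dim_P = order + 1
        if hidden is None:
            self.q = torch.nn.Linear(dim_P, 1, bias=False)           # linear case
        else:
            self.q = torch.nn.Sequential(                            # nonlinear case
                torch.nn.Linear(dim_P, hidden), torch.nn.ReLU(),
                torch.nn.Linear(hidden, 1))

    def encode(self, x, u):
        """Coefficient vectors a(u) at every point x_i of the cloud (Eqn. 2)."""
        dx = x[None, :] - x[:, None]                                  # offsets x_j - x_i
        w = torch.clamp(1.0 - dx.abs() / self.eps, min=0.0) ** self.p
        Phi = torch.stack([dx ** k for k in range(self.order + 1)], dim=-1)
        M = torch.einsum('ij,ijk,ijl->ikl', w, Phi, Phi)              # Phi^T W Phi, per point
        r = torch.einsum('ij,ijk,j->ik', w, Phi, u)                   # Phi^T W u, per point
        return torch.linalg.solve(M, r)                               # a(u), shape (N, dim_P)

    def forward(self, x, u):
        return self.q(self.encode(x, u)).squeeze(-1)                  # tau^h[u] at each x_i

# Usage: one channel of scattered data in, one estimated field out.
x = torch.sort(torch.rand(128)).values
layer = GMLSLayer(order=4, eps=0.1, hidden=32)
tau_u = layer(x, torch.sin(2 * torch.pi * x))
```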
Note that in the case of linear activation For simplicity, in this work we specialize as follows. Let: function, the single layer MLP model reduces to the linear Λ be point evaluations on Xh ; P be πm (Rd ), the space of model. p̄ mth -order polynomials; let W (r) = (1 − r/)+ , where f+ Nonlinearity may thus be handled within a single non- denotes the positive part of a function f and p ∈ N. We linear GMLS-Layer, or by stacking multiple linear GMLS- GMLS-Layer Splines [19], a Gaussian correlation kernel [10, 11], or a Mapping MLP kernel function based on a learnable combination of ra- dial ReLu’s [28]. The SpiderCNNs share many similarities Coefficient Channels Channels Channels with GMLS-Nets using a kernel that is based on a learnable Output Input degree-three Taylor polynomial that is taken in product with a learnable radial piecewise-constant weight function [24]. A key distinction of GMLS-Nets is that operators are re- { scattered data processing gressed directly over the dual space V∗ without constructing shape/kernel functions. Both approaches provide ways to ap- Scattered Data Inputs GMLS-Nets proximate the action of a processing operator that aggregates over scattered data. coefficients We also mention other meshfree learning frameworks: coefficient channel a0 a1 a2 a3 a4 ... aN PointNet [13, 14] and Deep Sets [17], but these are aimed primarily at set-based data and geometric processing tasks for Classification segmentation and classification. Additionally, Radial Basis classes SD SD MP ... SD MLP Function (RBF) networks are similarly built upon similar stack layers Regression approximation theory [1, 2]. SD SD ... SD MLP L[u] Related work on operator regression in a SciML context in- input channel stack layers clude [4, 9, 15, 21–23, 26, 27]. In PINNs [23, 27], a versatile framework based on DNNs is developed to regress both linear and non-linear PDE models while exploiting physics knowl- Figure 1: GMLS-Nets. Scattered data inputs are processed by edge. In [26] and PDE-Nets [21], CNNs are used to learn learnable operators τ [u] parameterized via GMLS estimators. stencils to estimate operators. In [9, 15] dictionary learning A local reconstruction is built about each data point and en- is used along with sparse optimization methods to identify coded as a coefficient vector via equation 2. The coefficient dynamical systems to infer physical laws associated with mapping q(a) of equation 4 provides the learnable action of time-series data. In [22], regression is performed over a class the operator. GMLS-Layers can be stacked to obtain deeper of nonlinear pseudodifferential operators, formed by com- architectures and combined with other neural network opera- posing neural network parameterized Fourier multipliers and tions to perform classification and regression tasks (inset, SD: pointwise functionals. scattered data, MP: max-pool, MLP: multi-layer perceptron). GMLS-Nets can be used in conjunction with the above methods. GMLS-Nets have the distinction of being able to move beyond reliance on CNNs on regular grids, no longer layers with intermediate ReLU’s, the later mapping more need moment conditions to impose accuracy and interpretabil- directly onto traditional CNN construction. 
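As a sketch of the stacked construction, reusing the hypothetical GMLSLayer above, two linear GMLS-Layers with an intermediate ReLU act on the same point cloud:

```python
import torch

class StackedGMLS(torch.nn.Module):
    """Two linear GMLS-Layers with an intermediate ReLU (hypothetical sketch,
    reusing the GMLSLayer class sketched above); mirrors a small CNN stack."""
    def __init__(self, order=2, eps=0.1):
        super().__init__()
        self.layer1 = GMLSLayer(order=order, eps=eps)   # linear q_xi (hidden=None)
        self.layer2 = GMLSLayer(order=order, eps=eps)

    def forward(self, x, u):
        v = torch.relu(self.layer1(x, u))               # intermediate field on the same cloud
        return self.layer2(x, v)

x = torch.sort(torch.rand(128)).values
net = StackedGMLS()
pred = net(x, torch.sin(2 * torch.pi * x))
```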
We next in- ity of filters for estimating differential operators [21], and do troduce pooling operators applicable to unstructured data, not require as strong assumptions about the particular form of whereby for each point in a given target point cloud Xtarget h , the PDE or a pre-defined dictionary as in [15, 27]. We expect φ(xi ) = F ({xj |j ∈ Xh , |xj − xi | < }). Here F represents that prior knowledge exploited globally in PINNs methods the pooling operator (e.g. max, average, etc.). With this col- may be incorporated into the GMLS-Layers. In particular, lection of operators, one may construct architectures similar the ability to regress natively over solver degrees of freedom to CNNs by stacking GMLS-Layers together with pooling will be particularly useful for SciML applications. layers and other NN components. Strided GMLS-layers gen- eralizing strided CNN stencils may be constructed by choos- Results ing target sites on a second, smaller point cloud. Learning differential operators and identifying Relation to other work. governing equations. Many recent works aim to generalize CNNs away from the Many data sets arising in the sciences are generated by pro- limitations of data on regular grids [8, 12]. This includes work cesses for which there are expected governing laws express- on handling inputs in the form of directed and un-directed ible in terms of ordinary or partial differential equations. graphs [7], processing graphical data sets in the form of GMLS-Nets provide natural features to regress such opera- meshes and point-clouds [14, 17], and in handling scattered tors from observed state trajectories or responses to fluctua- sub-samplings of images [8, 19]. Broadly, these works: (i) use tions. We consider the two settings the spectral theory of graphs and generalize convolution in the ∂u frequency domain [8], (ii) develop localized notions similar to = L[u(t, x)] and L[u(x)] = −f (x). (6) ∂t convolution operations and kernels in the spatial domain [28]. GMLS-Nets is most closely related to the second approach. The L[u] can be a linear or non-linear operator. When the The closest works include SplineCNNs [19], MoNet [10, data are snapshots of the system state un = u(tn ) at discrete 11], KP-Conv [28], and SpiderCNN [24]. In each of these times tn = n∆t, we use estimators based on methods a local spatial convolution kernel is approximated un+1 − un by a parameterized family of functions: open/closed B- = L[{uk }k∈K ; ξ]. (7) ∆t input prediction target op. 0.8 Initial condition Regressed FVM Laplacian 1D: Exact solution True operator FDM 0.6 Regressed FDM True operator FVM 2 u 500 0.4 t 0 0 0.2 2 500 predict input u L[u] 0 0.0 0.5 1.0 0.0 0.5 1.0 0 10 20 30 40 x Burgers 1D: 0.2 50 2 Regressed FDM 25 0.15 Regressed FVM 1 0 True operator FDM 0 0.1 True operator FVM u 25 1 Exact 50 0.0 0.5 1.0 0.05 0.0 0.5 1.0 Laplacian 2D: 0 20 30 x Figure 3: Top: Advection-diffusion solution when ∆t = ∆tCF L . The true model solution and regressed solution all agree with the analytic solution. Bottom: Solution for under- u: input L[u]: predicted L[u]: target resolved dynamics with ∆t = 10∆tCF L . The implicit inte- grator causes FDM/FVM of true operator to be overly dissi- Figure 2: Regression of Differential Operators. GMLS-Nets pative. The regressed operator matches well with the FVM can accurately learn both linear and non-linear operators, operator, matching the phase almost exactly. shown is the case of the 1D/2D Laplacians and Burger’s equation. 
In-homogeneous operators can also be learned ∆t/∆tCF L LF DM,ex LF DM LF V M,ex LF V M by including as one of the input channels the location x. Training and test data consists of random input functions 0.1 0.00093 0.00015 0.00014 0.00010 in 1d at 102 nodes on [0, 1] and in 2d at 400 nodes in 1 0.0011 0.00093 0.0011 0.00011 [0, 1] × [0, 1]. Each random input P function follows a Gaus- 10 0.0083 0.0014 0.0083 0.00035 sian distribution with u(x) = k ξk exp (i2πk · x/L) with ξk ∼ exp(−α1 k 2 )η(0, 1). Training and test data is generated Table 1: The `2 -error for data-driven finite difference model with α1 = 0.1 by computed operators with spectral accuracy (FDM) and finite volume models (FVM) for advection- for Ntrain = 5 × 104 and Ntest = 104 . diffusion equation. Comparisons made to classical discretiza- tions using exact operators. For conservative data-driven fi- nite volume model, there is an order of magnitude better In the case that K = {n + 1}, this corresponds to using an accuracy for large timestep integration. Implicit Euler scheme to model the dynamics. Many other choices are possible, and later we shall discuss estimators with conservation properties. The learning capabilities of natural degrees of freedom for a given model. This provides GMLS-Nets to regress differential operators are shown in access to structure preserving properties such as conservation, Fig. 2. As we shall discuss in more detail, this can be used e.g., conservation of mass in a physical system. to identify the underlying dynamics and obtain governing We take as a source of training data the following analytic equations. solution to the 1D unsteady advection-diffusion equation with advection and diffusion coefficients a and ν on the interval Long-time integrators: discretization for native Ω = [0, 30]. data-driven modeling.   The GMLS framework provides useful ways to target and 1 x − (x0 + at) uex (x, t) = √ exp − (8) sample arbitrary functionals. In a data transfer context, this a 4πνt 4νt has been leveraged to couple heterogeneous codes. For ex- ample, one may sample the flux degrees of freedom of a To construct a finite difference model (FDM), we assume Raviart-Thomas finite element space and target cell integral a node set N = {x0 = 0, x1 , ..., xN −1 , xN = 30}. To con- degrees of freedom of a finite volume code to perform native struct a finite volume model (FVM), we construct the set data transfer. This avoids the need to perform intermediate of cells C = {[xi , xi+1 ], xi , xi+1 ∈ N, i ∈ {0, ..., N − 1}}, projections/interpolations [20]. Motivated by this, we demon- with associated cell measure µ(ci ) = |xi+1 − xi | and set of strate that GMLS may be used to learn discretization native oriented boundary faces Fi = ∂ci = {xi+1 , −xi }. We then data-driven models, whereby dynamics are learned in the assume for uniform timestep ∆t = tn+1 − tn the Implicit Euler update for the FDM given by Brownian Trajectories Density Estimation 10 2 un+1 − uni t=0histogram i = LF DM [un+1 ; ξ], (9) x(t) 0 filtered ∆t To obtain conservation we use the FVM update -10 0 t 10 Particle Distribution t=5 un+1 − uni ρ Z 1 X i LF V M [un+1 ; ξ] · dA. (10) t= 0 = ∆t µ(ci ) f ∈Fi t= 5 For the advection-diffusion equation in the limit ∆t → 0, 0 LF DM,ex = a · ∇u + ν∇2 u and LF V M,ex = au + ν∇u. By 0 x 1 0 x 1 construction, for any choice of hyperparameters ξ the FVM Prediction of Density Evolution will be locally conservative. 
In this sense, the physics of mass 2.0 prediction conservation are enforced strongly via the discretization, and particle density we parameterize only an empirical closure for fluxes - GMLS 1.5 naturally enables such native flux regression. 1.0 t We use a single linear GMLS-net layer to parameterize ρ both LF DM and LF V M , and train over a single timestep by 0.5 using Eqn. 8 to evaluate the exact time increment in Eqns. 9- 10 . We perform gradient descent to minimize the RMS of the 0.0 residual with respect to ξ. For the FDM and FVM we use a 0.0 0.2 0.4 x 0.6 0.8 1.0 cubic and quartic polynomial space, respectively. Recall that to resolve the diffusion and advective timescales one would Figure 4: GMLS-Nets can be trained with molecular-level select a timestep of roughly ∆tCF L = min 21 a∆t ∆x , 4 ∆x2 . 1 ν∆t  data to infer continuum dynamical models. Data are simula- After regressing the operator, we solve then extracted o tions of Brownian motion with periodic boundary conditions t scheme to advance from u0i = u(xi , t0 ) i to uif inal .  on Ω = [0, 1] and diffusivity D = 1 (top-left, unconstrained i As implicit Euler is unconditionally stable, one may se- trajectory). Starting with initial density of a heaviside func- lect ∆t  ∆tCF L at the expense of introducing nu- tion, we construct histograms over time to estimate the par- merical dissipation, ”smearing” the solution. We consider ticle density (upper-right, solid lines) and perform further ∆t ∈ {0.1∆tCF L , ∆tCF L , 10∆tCF L } and compare both filtering to remove sampling noise (upper-right, dashed lines). the learned FDM/FVM dynamics to those obtained with GMLS-Net is trained using FVM estimator of equation 10. a standard discretization (i.e. letting LF DM = LF DM,ex . Predictive continuum model is obtained for the density evolu- From Fig. 3 we observe that for ∆t/∆tCF L ≤ 1 both the tion. Long-term agreement is found between the particle-level regressed and reference models agree well with the analytic simulation (bottom, solid lines) and the inferred continuum solution. However, for ∆t = 10∆tCF L , we see that while the model (bottom, dashed lines). reference models are overly dissipative, the regressed models match the analytic solution. Inspection of the `2 −norm of the solutions at tf inal in Table 1 indicates that as expected, the to accurately predict important statistical moments of the classical solutions corresponding to LF DM,ex and LF V M,ex high-fidelity model over longer timescales. As an example, converge as O(∆t). The regressed FDM is consistently more consider a mean-field continuum model derived by coarse- accurate than the exact operator. Most interesting, the re- graining a molecular dynamics simulation. Classically, one gressed FVM is roughly independent of ∆t, providing a 20× may pursue homogenization analysis to carefully derive such improvement in accuracy over the classical model. This pre- a continuum model, but such techniques are typically prob- liminary result suggests that GMLS-Nets offer promise as a lem specific and can become technical. We illustrate here how tool to develop non-dissipative implicit data-driven models. GMLS-Nets can be used to extract a conservative continuum We suggest that this is due to the ability for GMLS-Nets to PDE model from particle-level simulation data. regress higher-order differential operator corrections to the discrete time dynamics, similar to e.g. Lax-Friedrichs/Lax- Brownian motion has as its infinitesimal generator the Wendroff schemes. 
unsteady diffusion equation [3]. As a basic example, we will extract a 1D diffusion equation to predict the long- Data-driven modeling from molecular dynamics. term density of a cloud of particles undergoing pseudo- In science and engineering applications, there are often high- 1D Brownian motion. We consider the periodic domain fidelity descriptions of the physics based on molecular dy- Ω = [0, 1] × [0, 0.1], and generate a collection of Np parti- namics. One would like to extract continuum descriptions cles with initial position xp (t = 0) drawn from the uniform to allow for predictions over longer time/length-scales or re- distribution U [0, 0.5] × U [0, 0.1]. duce computational costs. Coarse-grained modeling efforts Due to this initialization and domain geometry, the particle also have similar aims while retaining molecular degrees density is statistically one dimensional. We estimate the den- of freedom. Each seek lower-fidelity models that are able sity field ρ(x, t) along the first dimension by constructing a collection C of N uniform width cells and build a histogram, MNIST Input GMLS Features Classes Image a[0] a[1] a[2] a[3] a[4] Np 1xp (t)∈c 1x∈c . XX ρ(x, t) = (11) c∈C p=1 a[5] a[6] a[7] a[8] a[9] The 1x∈A is the indicator function taking unit value for x ∈ GMLS-Layer A and zero otherwise. a[10] a[11] a[12] a[13] a[14] We evolve the particle positions xp (t) under 2D Brownian motion (the density will remain statistically 1D as the parti- cles evolve). In the limit Np /N → ∞, the particle density Case Conv-2L Hybr id-2L GMLS-2L satisfies a diffusion equation, and we can scale the Brownian MNIST 98.52% 98.41% 96.87% motion increments to obtain a unit diffusion coefficient in this limit. As the ratio Np /N is finite, there is substantial noise in the Figure 5: MNIST Classification. GMLS-Layers are substi- extracted density field. We obtain a low pass filtered density, tuted for convolution layers in a basic two-layer architecture ρe(x, t), by convolving ρ(x, t) with a Gaussian kernel of width (Conv2d + ReLu + MaxPool + Conv2d + ReLu + MaxPool + twice the histogram bin width. FC). The Conv-2L test are all Conv-Layers, Hybrib-2L has We use the FVM scheme in the same manner as in the GMLS-Layer followed by a Conv-Layer, and GMLS-2L uses previous section. In particular, we regress a flux that matches all GMLS-Layers. GMLS-Nets used a polynomial basis of the increment (e ρ(x, t = 10) − ρe(x, t = 12))/2∆t. This win- monomials. The filters in GMLS are by design more limited dow was selected, since the regression at t = 0 is ineffective than a general Conv-Layer and correspond here to estimated as the density approximates a heaviside function. Such near derivatives of the data set (top-right). Despite these restric- discontinuities are poorly represented with polynomials and tions, the GMLS-Net still performs reasonably well on this subsequently not expected to train well. Additionally, we basic classification task (bottom-table). train over a time interval of 2∆t, where in general k∆t steps can be used to help mollify high-frequency temporal noise. To show how the GMLS-Nets’ inferred operator can be of basis for p∗ and sampling functionals λj , other features used to make predictions, we evolve the regressed FVM may be extracted. For polynomials with terms in dictionary for one hundred timesteps and compare to the density field order, coefficients are shown in Fig. 5. Notice the clear trends obtained from the particle solver. 
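For concreteness, the following is a hedged sketch of the particle-data pipeline described above: pseudo-1D Brownian particles on the periodic domain, the histogram density of Eqn. 11, and the Gaussian low-pass filter. The particle count, cell count, timestep, and the use of scipy.ndimage.gaussian_filter1d are illustrative choices, not the paper's.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(0)
Np, N, dt = 50_000, 100, 0.01     # particle count, histogram cells, timestep (illustrative)
edges = np.linspace(0.0, 1.0, N + 1)
dx = edges[1] - edges[0]

# Initial positions drawn from U[0, 0.5] x U[0, 0.1] on the periodic domain [0,1] x [0,0.1].
x = np.column_stack([rng.uniform(0.0, 0.5, Np), rng.uniform(0.0, 0.1, Np)])

def density(x):
    """Histogram density of Eqn. 11 along the first coordinate, plus a low-pass
    filtered version (Gaussian kernel of roughly twice the bin width, periodic)."""
    rho, _ = np.histogram(x[:, 0], bins=edges)
    rho = rho.astype(float) / dx                    # so the integral over [0,1] recovers Np
    return rho, gaussian_filter1d(rho, sigma=2.0, mode='wrap')

snapshots = [density(x)]
for n in range(1000):
    # 2D Brownian increments scaled for unit diffusion; periodic boundary conditions.
    x = (x + np.sqrt(2.0 * dt) * rng.standard_normal(x.shape)) % np.array([1.0, 0.1])
    if (n + 1) % 100 == 0:
        snapshots.append(density(x))    # (rho, rho_filtered) pairs used as training data
```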
We apply Dirichlet bound- and directional dependence on increases and decreases in the ary conditions ρ(0, t) = ρ(1, t) = 1 and initial conditions image intensity, indicating c[1] ∼ ∂x and c[2] ∼ ∂y . Given matching the histogram ρ(x, t = 0). Again, the FVM by the history of PDE modeling, for many classification and construction is conservative, where it is easily shown for all regression tasks arising in the sciences and engineering, we t that Ω ρdx = Np . A time series summarizing the evolu- R expect such derivative-based features extracted by GMLS- tion of density in both the particle solver and the regressed Nets will be useful in these applications. continuum model is provided in Fig 4. While this is a ba- GMLS-Net on unstructured fluid simulation data. sic example, this illustrates the potential of GMLS-nets in constructing continuum-level models from molecular data. We consider the application of GMLS-Nets to unstructured These techniques also could have an impact on data-driven data sets representative of scientific machine learning appli- approaches for numerical methods, such as projective inte- cations. Many hydrodynamic flows can be experimentally gration schemes. characterized using velocimetry measurements. While veloc- ity fields can be estimated even for complex geometries, in Image processing: MNIST benchmark. such measurements one often does not have access directly While image processing is not the primary application area to fields, such as the pressure. However, integrated quanti- we intend, GMLS-Nets can be used for tasks such as classifi- ties of interest, such as drag are fundamental for performing cation. For the common MNIST benchmark task, we compare engineering analysis and yet depend upon both the velocity use of GMLS-Nets with CNNs in Figure 5. CNNs use kernel and pressure. This limits the level of characterization that size 5, zero-padding, max-pool reduction 2, channel sizes can be accomplished when using velocimetry data alone. We 16, 32, FC as linear map to soft-max prediction of the cat- construct GMLS-Net architectures that allow for prediction egories. The GMLS-Nets use the same architecture with a of the drag directly from unstructured fluid velocity data, GMLS using polynomial basis of monomials in x, y up to without any direct measurement of the pressure. degree porder = 4. We illustrate the ideas using flow past a cylinder of radius We find that despite the features extracted by GMLS-Nets L. This provides a well-studied canonical problem whose being more restricted than a general CNN, there is only a drag is fully characterized experimentally in terms of the modest decrease in the accuracy for the basic MNIST task. Reynolds number, Re = U L/ν. For incompressible flow We do expect larger differences on more sophisticated image past a cylinder, one may apply dimensional analysis to relate tasks. This basic test illustrates how GMLS-Nets with a poly- drag Fd to the Reynolds number via the drag coefficient Cd : nomial basis extracts features closely associated with taking   2Fd UL derivatives of the data field. We emphasize for other choices 2 A = Cd . (12) ρU∞ ν The U∞ is the free-stream velocity, A is the frontal area of the structured data sets. cylinder, and Cd : R → R. Such analysis requires in practice As an architecture, we provide two input channels for the engineering judgement to identify relevant dimensionless two velocity components to three stacked GMLS layers. The groups. 
After such considerations, this allows one to collapse first layer acts on the cell centers, and intermediate pooling relevant experimental parameters to (ρ, U∞ , A, L, ν) onto a layers down-sample to random subsets of Xh . We conclude single curve. with a linear activation layer to extract the drag coefficient as a single scalar output. We randomly select 80% of the samples for training, and use the remainder as a test set. We 2.5 Training data quantify using the root-mean-square (MSE) error which we GMLS-Net test data find to be below 1.5%. The excellent predictive capability demonstrated in Fig. 6 highlights GMLS-Nets ability to provide an effective means of regressing engineering quantities of interest directly from Drag coefficient 2 velocity flow data; the GMLS-Net architecture is able to identify a latent low-dimensional parameter space which is typically found by hand using dimensional analysis. This similarity relationship across the Reynolds numbers is identi- 1.5 fied, despite the fact that it does not have direct access to the viscosity parameter. These initial results indicate some of the potential of GMLS-Nets in processing unstructured data sets for scientific machine learning applications. 1 100 10000 1e+06 1e+08 Reynolds number Conclusions We have introduced GMLS-Nets for processing scattered data sets leveraging the framework of GMLS. GMLS-Nets allow for generalizing convolutional networks to scattered data, while still benefiting from underlying translational in- variances and weight sharing. The GMLS-layers provide feature extractors that are natural particularly for regressing Figure 6: GMLS-Nets are trained on a CFD data set of flow differential operators, developing dynamical models, and pre- velocity fields. Top: Training set of the drag coefficient plot- dicting quantities of interest associated with physical systems. ted as a function of Reynolds number (small black dots). The GMLS-Nets were demonstrated to be capable of obtaining GMLS-Net predictions for a test set (large red dots). Bottom: dynamical models for long-time integration beyond the lim- Flow velocity fields corresponding to the smallest (left) and its of traditional CFL conditions, for making predictions of largest (right) Reynolds numbers in the test set. density evolution of molecular systems, and for predicting directly from flow data quantities of interest in fluid mechan- ics. These initial results indicate some promising capabilities For the purposes of training a GMLS-Net, we construct a of GMLS-Nets for use in data-driven modeling in scientific synthetic data set by solving the Reynolds averaged Navier- machine learning applications. Stokes (RANS) equations with a steady state finite volume code. Let L = ρ = 1 and consider U ∈ [0.1, 20] and ν ∈ 10−2 , 108 . We consider a k −  turbulence model with inlet conditions consistent with a 10% turbulence inten- References sity and a mixing length corresponding to the inlet size. From [1] D.S. Broomhead and D. Lowe. “Multivariable Func- the solution, we extract the velocity field u at cell centers tional Interpolation and Adaptive Networks”. In: Com- to obtain an unstructured point cloud Xh . We compute Cd plex Systems 2.1 (1988), pp. 321–355. directly from the simulations. We then obtain an unstruc- [2] T. Poggio and F. Girosi. “Networks for approxima- tured data set of 400 (u)i features over Xh , with associated tion and learning”. In: Proceedings of the IEEE 78.9 labels Cd . We emphasize that although U∞ and ν are used to (1990), pp. 1481–1497. 
generate the data, they are not included as features, and the [3] Ioannis Karatzas and Steven E Shreve. “Brownian Reynolds number is therefore hidden. Motion and Stochastic Calculus”. In: Springer, 1998, We remark that the k −  model is well known to perform pp. 47–127. poorly for flows with strong curvature such as recirculation zones. Here, in our proof-of-concept demonstration, we treat [4] I. E. Lagaris, A. Likas, and D. I. Fotiadis. “Artificial the RANS-k −  solution as ground truth for simplicity, de- neural networks for solving ordinary and partial dif- spite its short-comings and acknowledge that a more physical ferential equations”. In: IEEE Transactions on Neural study would consider ensemble averages of LES/DNS data Networks 9.5 (1998), pp. 987–1000. in 3D. We aim here just to illustrate the potential utility of [5] Holger Wendland. Scattered data approximation. GMLS-Nets in a scientific setting for processing such un- Vol. 17. Cambridge university press, 2004. [6] Susanne Brenner and Ridgway Scott. The Mathemati- [21] Zichao Long et al. “PDE-Net: Learning PDEs from cal Theory of Finite Element Methods. Springer, 2008. Data”. In: Proceedings of the 35th International Con- [7] Franco Scarselli et al. “The Graph Neural Network ference on Machine Learning. Ed. by Jennifer Dy Model”. In: Trans. Neur. Netw. 20.1 (Jan. 2009), and Andreas Krause. Vol. 80. Proceedings of Ma- pp. 61–80. ISSN: 1045-9227. chine Learning Research. Stockholmsmssan, Stock- [8] Joan Bruna et al. “Spectral networks and locally con- holm Sweden: PMLR, 2018, pp. 3208–3216. nected networks on graphs”. English (US). In: In- [22] Ravi G. Patel and Olivier Desjardins. “Nonlinear ternational Conference on Learning Representations integro-differential operator regression with neural net- (ICLR2014), CBLS, April 2014. 2014. works”. In: ArXiv abs/1810.08552 (2018). [9] Steven L. Brunton, Joshua L. Proctor, and J. Nathan [23] Maziar Raissi and George Em Karniadakis. “Hidden Kutz. “Discovering governing equations from data by physics models: Machine learning of nonlinear partial sparse identification of nonlinear dynamical systems”. differential equations”. In: Journal of Computational In: 113.15 (2016), pp. 3932–3937. Physics 357 (2018), pp. 125 –141. ISSN: 0021-9991. [10] Thomas N. Kipf and Max Welling. “Semi-Supervised [24] Yifan Xu et al. “SpiderCNN: Deep Learning on Point Classification with Graph Convolutional Networks”. Sets with Parameterized Convolutional Filters”. In: In: ArXiv abs/1609.02907 (2016). Computer Vision – ECCV 2018. Ed. by Vittorio Ferrari [11] Federico Monti et al. “Geometric Deep Learning on et al. Cham: Springer International Publishing, 2018, Graphs and Manifolds Using Mixture Model CNNs”. pp. 90–105. ISBN: 978-3-030-01237-3. In: 2017 IEEE Conference on Computer Vision and [25] Nathan Baker et al. Workshop report on basic research Pattern Recognition (CVPR) (2016), pp. 5425–5434. needs for scientific machine learning: Core technolo- [12] M. M. Bronstein et al. “Geometric Deep Learning: gies for artificial intelligence. Tech. rep. USDOE Of- Going beyond Euclidean data”. In: IEEE Signal Pro- fice of Science (SC), Washington, DC (United States), cessing Magazine 34.4 (2017), pp. 18–42. ISSN: 1053- 2019. 5888. DOI: 10.1109/MSP.2017.2693418. [26] Yohai Bar-Sinai et al. “Learning data-driven discretiza- [13] Charles R. Qi et al. “PointNet: Deep Learning on Point tions for partial differential equations”. In: Proceed- Sets for 3D Classification and Segmentation”. 
In: The ings of the National Academy of Sciences 116.31 IEEE Conference on Computer Vision and Pattern (2019), pp. 15344–15349. ISSN: 0027-8424. DOI: 10. Recognition (CVPR). 2017. 1073/pnas.1814058116. [14] Charles Ruizhongtai Qi et al. “PointNet++: Deep Hi- [27] M. Raissi, P. Perdikaris, and G.E. Karniadakis. erarchical Feature Learning on Point Sets in a Metric “Physics-informed neural networks: A deep learning Space”. In: Advances in Neural Information Process- framework for solving forward and inverse problems ing Systems 30. Ed. by I. Guyon et al. Curran Asso- involving nonlinear partial differential equations”. In: ciates, Inc., 2017, pp. 5099–5108. Journal of Computational Physics 378 (2019), pp. 686 [15] Samuel H. Rudy et al. “Data-driven discovery of par- –707. tial differential equations”. In: 3.4 (2017). [28] Hugues Thomas et al. “KPCONV: Flexible and de- [16] Nathaniel Trask, Mauro Perego, and Pavel Bochev. formable convolution for point clouds”. In: Proceed- “A high-order staggered meshless method for elliptic ings of the IEEE International Conference on Com- problems”. In: SIAM Journal on Scientific Computing puter Vision. 2019, pp. 6411–6420. 39.2 (2017), A479–A502. [29] BJ Gross et al. “Meshfree methods on manifolds for [17] Manzil Zaheer et al. “Deep Sets”. In: Advances in hydrodynamic flows on curved surfaces: a generalized Neural Information Processing Systems 30. Ed. by I. moving least-squares (GMLS) approach”. In: Journal Guyon et al. Curran Associates, Inc., 2017, pp. 3391– of Computational Physics (2020), p. 109340. 3401. [30] Nathaniel Trask, Pavel Bochev, and Mauro Perego. [18] P. J. Atzberger. “Importance of the Mathematical Foun- “A conservative, consistent, and scalable meshfree dations of Machine Learning Methods for Scientific mimetic method”. In: Journal of Computational and Engineering Applications”. In: SciML2018 Work- Physics 409 (2020), p. 109187. shop, position paper, https://arxiv.org/abs/1808.02213 (2018). [19] M. Fey et al. “SplineCNN: Fast Geometric Deep Learning with Continuous B-Spline Kernels”. In: 2018 IEEE/CVF Conference on Computer Vision and Pat- tern Recognition. 2018, pp. 869–877. [20] Paul Allen Kuberry, Pavel B Bochev, and Kara J Pe- terson. A virtual control meshfree coupling method for non-coincident interfaces. Tech. rep. Sandia National Lab.(SNL-NM), Albuquerque, NM (United States), 2018. Derivation of Gradients of the Operator τxi [u]. where ∂M X  ∂  Parameters of the operator τ̃ . = p(xj , xi ) p(xj , xi )T wij ∂xi ∂xi We give here some details on the derivation of the gradients for j the learnable GMLS operator τ [u] and intermediate steps. This  ∂ T can be used in implementations for back-propagation and other + p(xj , xi ) p(xj , xi ) wij applications. ∂xi  GMLS works by mapping data to a local polynomial fit in region ∂wij + p(xj , xi )p(xj , xi )T . Ωi around xi with p∗ (x) ≈ u(x) for x ∈ Ωi . To find the optimal ∂xi fitting polynomial p∗ (x) ∈ V to the function u(x), we can consider the case with λj (x) = δ(x − xj ) and weight function wij = The derivatives in r are given by w(xi −xj ). In a region around a reference point x∗ the optimization X  ∂   ∂r ∂wij problem can be expressed parameterically in terms of coefficients a = p(xj ) uj wij + p(xj )uj . as ∂xi j ∂xi ∂xi X 2 a∗ (xi ) = arg minm uj − p(xj )T a wij . The full derivative of the linear operator τ̃ can be expressed as a∈R j     ∂ ∂ ∂ ∗ We write for short p(xj ) = p(xj , xi ), where the basis elements τ̃ (xi ) = q(xi )T a∗ (xi ) + q(xi )T a (xi ) . 
Written out, the full derivative of the linear operator reads

$$\frac{\partial}{\partial x_i} \tilde{\tau}(x_i) = \left(\frac{\partial}{\partial x_i} q(x_i)\right)^T a^*(x_i) + q(x_i)^T \left(\frac{\partial}{\partial x_i} a^*(x_i)\right).$$

To make these expressions explicit, we first record the linear system determining $a^*(x_i)$. We can compute the derivative in $a_\ell$ to obtain

$$\frac{\partial J}{\partial a_\ell}(x_i) = 0.$$

This implies

$$\left[\sum_j p(x_j)\, w_{ij}\, p(x_j)^T\right] a = \sum_j w_{ij}\, p(x_j)\, u_j.$$

Let

$$M = \sum_j p(x_j)\, w_{ij}\, p(x_j)^T, \qquad r = \sum_j w_{ij}\, p(x_j)\, u_j;$$

then we can rewrite the coefficients as the solution of the linear system

$$M\, a^*(x_i) = r.$$

This is sometimes written more explicitly for analysis and computations as $a^*(x_i) = M^{-1} r$. We can represent a general linear operator $\tilde{\tau}(x_i)$ using the $a^*$ representation as

$$\tilde{\tau}(x_i) = q(x_i)^T a^*(x_i).$$

Typically, the weights will not be spatially dependent, $q(x_i) = q_0$. Throughout, we shall denote this simply as $q$ and assume there is no spatial dependence, unless otherwise indicated.

Derivatives of τ̃ in xi, a(xi), and q. The derivative in $x_i$ is given by

$$\frac{\partial a^*}{\partial x_i}(x_i) = \frac{\partial M^{-1}}{\partial x_i}\, r + M^{-1} \frac{\partial r}{\partial x_i}, \qquad \frac{\partial M^{-1}}{\partial x_i} = -M^{-1} \frac{\partial M}{\partial x_i} M^{-1},$$

with $\partial M / \partial x_i$ and $\partial r / \partial x_i$ as given above. In the notation, we denote $p(x_j) = p(x_j, x_i)$, where the basis elements in fact can depend on the particular $x_i$; this is important when we take derivatives in the input values $x_i$ of the expressions.

In the constant case $q(x_i) = q_0$, the derivative of $\tilde{\tau}$ simplifies to

$$\frac{\partial}{\partial x_i} \tilde{\tau}(x_i) = q_0^T \left(\frac{\partial}{\partial x_i} a^*(x_i)\right).$$

The derivatives of the other terms follow more readily. For the derivative of the linear operator $\tilde{\tau}$ in the coefficients $a(x_i)$, we have

$$\frac{\partial}{\partial a(x_i)} \tilde{\tau}(x_i) = q(x_i).$$

For derivatives of the linear operator $\tilde{\tau}$ in the mapping coefficients $q$, we have

$$\frac{\partial}{\partial q(x_i)} \tilde{\tau}(x_i) = a(x_i).$$

In the case of nonlinear operators $\tilde{\tau} = q(a(x_i))$ there are further dependencies beyond just $x_i$ and $a(x_i)$, and less explicit expressions. For example, when using MLPs there may be a hierarchy of trainable weights $w$. The derivatives of the non-linear operator can be expressed as

$$\frac{\partial}{\partial w} \tilde{\tau}(x_i) = \frac{\partial q}{\partial w}\big(a(x_i)\big).$$

Here, one relies on back-propagation algorithms for evaluation of $\partial q / \partial w$. Similarly, given the generality of $q(a)$, for derivatives in $a$ and $x_i$, one can use back-propagation methods on $q$ and the chain rule with the expressions derived during the linear case for the $a$ and $x_i$ dependencies.
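In practice these derivatives need not be coded by hand: automatic differentiation propagates gradients through the weighted least-squares solve for $a^*(x_i)$ as well as through $q$, which is how the expressions above are typically realized. A small hedged PyTorch example with illustrative values:

```python
import torch

# Hypothetical illustration: autograd supplies d(tau)/d(x_i) and d(tau)/d(q),
# with gradients flowing through the weighted least-squares solve for a*(x_i).
xj = torch.linspace(-0.5, 0.5, 21)                    # neighbor locations relative to x_i
uj = torch.sin(xj)                                     # sampled field values
xi = torch.tensor(0.05, requires_grad=True)            # evaluation point
q = torch.nn.Linear(4, 1, bias=False)                  # q acting on a(u), dim(P) = 4 (cubic)

dx = xj - xi
w = torch.clamp(1.0 - dx.abs() / 0.6, min=0.0) ** 4    # W(r) = (1 - r/eps)^p_+
Phi = torch.stack([torch.ones_like(dx), dx, dx ** 2, dx ** 3], dim=-1)
M = Phi.T @ (w[:, None] * Phi)
r = Phi.T @ (w * uj)
a = torch.linalg.solve(M, r)                           # a*(x_i) = M^{-1} r
tau = q(a).squeeze()                                   # tilde-tau(x_i) = q(a*(x_i))
tau.backward()                                         # chain rule through M^{-1} r and q
print(xi.grad, q.weight.grad)
```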