GMLS-Nets: A machine learning framework for unstructured data Nathaniel Trask 1, + , Ravi G. Patel1 , Ben J. Gross 2 , Paul J. Atzberger2,† 1 Sandia National Laboratories∗ Center for Computing Research {natrask,rgpatel}@sandia.gov 2 University of California Santa Barbara Department of Mathematics and Mechanical Engineering http://atzberger.org/ atzberg@gmail.com Abstract under partially known dynamics. This data is often scarce or highly constrained, and it has been proposed that successful Data fields sampled on irregularly spaced points arise in many SciML strategies will leverage prior knowledge to enhance science and engineering applications. For regular grids, Con- information gained from such data [18, 25]. One may exploit volutional Neural Networks (CNNs) gain benefits from weight sharing and invariances. We generalize CNNs by introducing physical properties such as transformation symmetries, con- methods for data on unstructured point clouds using Gen- servation structure, or solution regularity [6, 9, 18]. This new eralized Moving Least Squares (GMLS). GMLS is a non- application space necessitates ML architectures capable of parametric meshfree technique for estimating linear bounded utilizing such knowledge. functionals from scattered data, and has emerged as an effec- For data sampled on regular grids, Convolutional Neural tive technique for solving partial differential equations (PDEs). Networks (CNNs) are widely used to exploit translation in- By parameterizing the GMLS estimator, we obtain learning variance and hierarchical structure to extract features from methods for linear and non-linear operators with unstructured data. Here we generalize this technique to the SciML setting stencils. The requisite calculations are local, embarrassingly by introducing GMLS-Nets based on the scattered data ap- parallelizable, and supported by a rigorous approximation the- ory. We show how the framework may be used for unstructured proximation theory underlying GMLS. Similar to how CNNs physical data sets to perform operator regression, develop pre- learn stencils which benefit from weight-sharing, GMLS- dictive dynamical models, and obtain feature extractors for en- Nets operate by using local reconstructions to learn operators gineering quantities of interest. The results show the promise between function spaces. The resulting architecture is simi- of these architectures as foundations for data-driven model larly interpretable and serves as an effective generalization development in scientific machine learning applications. of CNNs to unstructured data, while providing mechanisms to incorporate knowledge of underlying physics. In this work we show how GMLS-Nets may be used in a Introduction SciML setting. Our results show GMLS-Nets are an effective Many scientific and engineering applications require process- tool to discover PDEs, which may be used as a foundation ing data sets sampled on irregularly spaced points. Consider to construct data-driven models while preserving physical e.g. GIS data associating geospatial locations with measure- invariants like conservation principles. We also show they ments, or scientific simulations with unstructured meshes. may be used to improve traditional scientific components, This need is amplified by the recent surge of interest in scien- such as time integrators. 
We show they also can be used tific machine learning (SciML) [25] targeting the application to regress engineering quantities of interest from scientific of data-driven techniques to the sciences. In this setting, data simulation data. Finally, we briefly show GMLS-Nets can typically takes the form of e.g. synthetic simulation data from perform reasonably relative to convNets on traditional com- meshes, or from sensors associated with data sites evolving puter vision benchmarks. These results indicate the promise ∗ of GMLS-Nets to support data-driven modeling efforts in Sandia National Laboratories is a multimission laboratory man- SciML applications. Implementations in TensorFlow and Py- aged and operated by National Technology and Engineering So- Torch are available at https://github.com/rgp62/gmls-nets and lutions of Sandia, LLC.,a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energys National https://github.com/atzberg/gmls-nets. Nuclear Security Administration under contract DE-NA-0003525. This paper describes objective technical results and analysis. Any Generalized Moving Least Squares (GMLS) subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Generalized Moving Least Squares (GMLS) is a non- Energy or the United States Government. parametric functional regression technique to construct ap- Copyright c 2020, for this paper by its authors. Use permitted under proximations of linear, bounded functionals from scattered Creative Commons License Attribution 4.0 International (CCBY samples of an underlying field by solving local least-square 4.0). problems. On a Banach space V with dual space V∗ , we aim to recover an estimate of a given target functional τx̃ [u] ∈ V∗ stress however that this framework supports a much broader acting on u = u(x) ∈ V, where x, x̃ denote associated loca- application. Consider e.g. learning from flux data related to tions in a compactly supported domain Ω ⊂ Rd . We assume H(div)-conforming discretizations, R where one may select u is characterized by an unstructured collection of sampling as sampling functional λi (u) = fi u · dA, or consider the N functionals, Λ(u) := {λj (u)}j=1 ⊂ V∗ . physical constraints that may be imposed by selecting P as To construct this estimate, we consider P ⊂ V and seek an be divergence free or satisfy a differential equation. element p∗ ∈ P which provides an optimal reconstruction of We illustrate now the connection between GMLS and con- the samples in the following weighted-`2 sense. volutional networks in the case of a uniform grid, Xh ⊂ Zd . Consider a sampling functional λj (u) = (u(xj ) − u(xi )), N ∗ X 2 and assume the parameterization τx̃,ξ (Φ) = ξ1 , ..., ξdim(P) , p = argmin (λj (u) − λj (p)) ω(λj , τx̃ ). (1) xi,j = xi − xj . Then the GMLS estimate is given explicitly p∈P j=1 at a point xi by Here ω(λj , τx̃ ) is a positive, compactly supported kernel !−1 function establishing spatial correlation between the tar- X X h τx˜i [u] = ξα φα (xk )W (xi,k )φβ (xk ) get functional and sampling set. If one associates locations α,β,j k (5) N Xh := {xj }j=1 ⊂ Ω with Λ(u), then one may consider φβ (xj )W (xi,j )(uj − ui ). radial kernels ω = W (||xj − x̃||2 ), with support r < . Assuming the basis P = span{φ1 , ..., φdim(P) }, and denot- Contracting P terms involving α, β and k, we may write ing Φ(x) = {φi (x)}i=1,...,dim(P) , the optimal reconstruction τxh˜i [u] = j c(τ, Λ)ij (uj − ui ). 
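To make the estimate concrete, the following is a minimal NumPy sketch of the GMLS procedure of Eqns. 1-3 for a single target point in 1D, assuming point-evaluation sampling functionals, a monomial basis, and the weight kernel W(r) = (1 - r/ε)^p_+ adopted later in the text. The function name and parameter values are illustrative only and are not taken from the released implementations.

```python
import numpy as np

def gmls_estimate_d2(x_pts, u_vals, x0, eps=0.3, order=4, p=4):
    """GMLS estimate of d^2u/dx^2 at x0 from scattered 1D samples (Eqns. 1-3).

    Assumes point-evaluation sampling functionals, P = polynomials of degree
    `order`, and the weight W(r) = (1 - r/eps)^p_+ .  Illustrative sketch only.
    """
    r = np.abs(x_pts - x0)
    mask = r < eps
    xj, uj = x_pts[mask] - x0, u_vals[mask]            # coordinates centered at x0
    w = (1.0 - r[mask] / eps) ** p                     # compactly supported weights

    Phi = np.vander(xj, N=order + 1, increasing=True)  # basis 1, (x-x0), (x-x0)^2, ...
    M = Phi.T @ (w[:, None] * Phi)                     # normal equations of Eqn. 1
    a = np.linalg.solve(M, Phi.T @ (w * uj))           # optimal coefficients a(u), Eqn. 2

    # Apply the known action of the target functional on the basis (Eqn. 3):
    # for tau = d^2/dx^2 evaluated at x0, only the quadratic monomial contributes.
    return 2.0 * a[2]

# Scattered samples of u(x) = sin(2 pi x); compare to the exact second derivative.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)
u = np.sin(2 * np.pi * x)
print(gmls_estimate_d2(x, u, 0.4), -(2 * np.pi) ** 2 * np.sin(2 * np.pi * 0.4))
```

The same weighted solve, assembled once per target site, yields stencil coefficients of the form c(τ, Λ)_ij discussed above.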
The collection of stencil may be written in terms of an optimal coefficient vector a(u) coefficients at xi ∈ Xh are {c(τ, Λ)ij }j . Therefore, one application for GMLS is to build stencils similar to convo- p∗ = Φ(x)| a(u). (2) lutional networks. A major distinction is that GMLS can Provided one has knowledge of how the target functional handle scattered data sets and a judicious selection of Λ, P acts on P, the final GMLS estimate may be obtained by and ω can be used to inject prior information. Alternatively, applying the target functional to the optimal reconstruction one may interpret the regression over P as an encoding in a low-dimensional space well-suited to characterize common τx̃h [u] = τx̃ (Φ)| a(u). (3) operators. For continuous functions for example, an opera- tor’s action on the space of polynomials is often sufficient Sufficient conditions for the existence of solutions to Eqn. to obtain a good approximation. Unlike CNNs there is no 1 depend only upon the unisolvency of Λ over V, the distri- need to handle boundary effects; GMLS-nets instead learns bution of samples Xh , and mild conditions on the domain one-sided stencils. Ω; they are independent of the choice of τx̃ . For theoretical underpinnings and recent applications, we refer readers to [5, GMLS-Nets 16, 29, 30]. From an ML perspective, GMLS estimation consists of two GMLS has primarily been used to obtain point estimates parts: (i) data is encoded via the coefficient vector a(u) pro- of differential operators to develop meshfree discretizations viding a compression of the data in terms of P, (ii) the op- of PDEs. The abstraction of GMLS however provides a math- erator is regressed over P∗ ; this is equivalent to finding a ematically rigorous approximation theory framework which function qξ : a(u) → R. We propose GMLS-Layers encod- may be applied to a wealth of problems, whereby one may ing this process in Figure 1, parameterizing a(u) = N N (u). tailor the choice of τx̃ , Λ, P and ω to a given application. In This architecture accepts input channels indexed by α the current work, we will assume the action of τx̃ on P is which consist of components of the data vector-field [u]α unknown, and introduce a parameterization τx̃,ξ (Φ), where ξ sampled over the scattered points Xh . We allow for different denote hyperparameters to be inferred from data. Classically, sampling points for each channel, which may be helpful for GMLS is restricted to linear bounded target functionals; we heterogeneous data. Each of these input channels is then used will also consider a novel nonlinear extension by considering to obtain an encoding of the input field as the vector a(u) estimates of the form identifying the optimal representer in P. τx̃h [u] = qx̃,ξ (a(u)), (4) We next select our parameterization of the functional via qξ , which may be any family of functions trainable by where qx̃,ξ is a family of nonlinear operators parameterized back-propagation. We will consider two cases in this work by ξ acting upon the GMLS reconstruction. Where unam- appropriate for linear and non-linear operators. In the lin- biguous, we will drop the x̃ dependence of operators and ear case we consider qξ (a) = ξ T a, which is sufficient to simply write e.g. τ h [u] = qξ (a(u)). We have recently used exactly reproduce differential operators. For the nonlinear related non-linear variants of GMLS to develop solvers for case we parameterize with a multi-layer perceptron (MLP), PDEs on manifolds in [29]. qξ (a) = MLP(a). 
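As an illustration of this two-stage structure, here is a hedged single-channel PyTorch sketch of a GMLS-Layer in 1D: the encoding a(u) is obtained from a per-point weighted least-squares solve (which contains no trainable parameters), and q_ξ is either the linear map ξ^T a or a small MLP. The class and argument names are hypothetical; this is not the reference implementation from the linked repositories.

```python
import torch

class GMLSLayer(torch.nn.Module):
    """Hypothetical single-channel 1D GMLS-Layer sketch.

    Encoding: per-point weighted least-squares coefficients a(u) over a monomial
    basis of degree `order` with weight W(r) = (1 - r/eps)^p_+ (no trainable
    parameters).  Learnable part: q_xi, linear (q(a) = xi^T a) or a small MLP.
    """
    def __init__(self, order=4, eps=0.1, p=4, hidden=None):
        super().__init__()
        self.order, self.eps, self.p = order, eps, p
        dim_P = order + 1
        if hidden is None:
            self.q = torch.nn.Linear(dim_P, 1, bias=False)           # linear case
        else:
            self.q = torch.nn.Sequential(                            # nonlinear case
                torch.nn.Linear(dim_P, hidden), torch.nn.ReLU(),
                torch.nn.Linear(hidden, 1))

    def encode(self, x, u):
        """Coefficient vectors a(u) at every point x_i of the cloud (Eqn. 2)."""
        dx = x[None, :] - x[:, None]                                  # offsets x_j - x_i
        w = torch.clamp(1.0 - dx.abs() / self.eps, min=0.0) ** self.p
        Phi = torch.stack([dx ** k for k in range(self.order + 1)], dim=-1)
        M = torch.einsum('ij,ijk,ijl->ikl', w, Phi, Phi)              # Phi^T W Phi, per point
        r = torch.einsum('ij,ijk,j->ik', w, Phi, u)                   # Phi^T W u, per point
        return torch.linalg.solve(M, r)                               # a(u), shape (N, dim_P)

    def forward(self, x, u):
        return self.q(self.encode(x, u)).squeeze(-1)                  # tau^h[u] at each x_i

# Usage: one channel of scattered data in, one estimated field out.
x = torch.sort(torch.rand(128)).values
layer = GMLSLayer(order=4, eps=0.1, hidden=32)
tau_u = layer(x, torch.sin(2 * torch.pi * x))
```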
Note that in the case of linear activation For simplicity, in this work we specialize as follows. Let: function, the single layer MLP model reduces to the linear Λ be point evaluations on Xh ; P be πm (Rd ), the space of model. p̄ mth -order polynomials; let W (r) = (1 − r/)+ , where f+ Nonlinearity may thus be handled within a single non- denotes the positive part of a function f and p ∈ N. We linear GMLS-Layer, or by stacking multiple linear GMLS- GMLS-Layer Splines [19], a Gaussian correlation kernel [10, 11], or a Mapping MLP kernel function based on a learnable combination of ra- dial ReLu’s [28]. The SpiderCNNs share many similarities Coefficient Channels Channels Channels with GMLS-Nets using a kernel that is based on a learnable Output Input degree-three Taylor polynomial that is taken in product with a learnable radial piecewise-constant weight function [24]. A key distinction of GMLS-Nets is that operators are re- { scattered data processing gressed directly over the dual space V∗ without constructing shape/kernel functions. Both approaches provide ways to ap- Scattered Data Inputs GMLS-Nets proximate the action of a processing operator that aggregates over scattered data. coefficients We also mention other meshfree learning frameworks: coefficient channel a0 a1 a2 a3 a4 ... aN PointNet [13, 14] and Deep Sets [17], but these are aimed primarily at set-based data and geometric processing tasks for Classification segmentation and classification. Additionally, Radial Basis classes SD SD MP ... SD MLP Function (RBF) networks are similarly built upon similar stack layers Regression approximation theory [1, 2]. SD SD ... SD MLP L[u] Related work on operator regression in a SciML context in- input channel stack layers clude [4, 9, 15, 21–23, 26, 27]. In PINNs [23, 27], a versatile framework based on DNNs is developed to regress both linear and non-linear PDE models while exploiting physics knowl- Figure 1: GMLS-Nets. Scattered data inputs are processed by edge. In [26] and PDE-Nets [21], CNNs are used to learn learnable operators τ [u] parameterized via GMLS estimators. stencils to estimate operators. In [9, 15] dictionary learning A local reconstruction is built about each data point and en- is used along with sparse optimization methods to identify coded as a coefficient vector via equation 2. The coefficient dynamical systems to infer physical laws associated with mapping q(a) of equation 4 provides the learnable action of time-series data. In [22], regression is performed over a class the operator. GMLS-Layers can be stacked to obtain deeper of nonlinear pseudodifferential operators, formed by com- architectures and combined with other neural network opera- posing neural network parameterized Fourier multipliers and tions to perform classification and regression tasks (inset, SD: pointwise functionals. scattered data, MP: max-pool, MLP: multi-layer perceptron). GMLS-Nets can be used in conjunction with the above methods. GMLS-Nets have the distinction of being able to move beyond reliance on CNNs on regular grids, no longer layers with intermediate ReLU’s, the later mapping more need moment conditions to impose accuracy and interpretabil- directly onto traditional CNN construction. 
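As a sketch of the stacked construction, reusing the hypothetical GMLSLayer above, two linear GMLS-Layers with an intermediate ReLU act on the same point cloud:

```python
import torch

class StackedGMLS(torch.nn.Module):
    """Two linear GMLS-Layers with an intermediate ReLU (hypothetical sketch,
    reusing the GMLSLayer class sketched above); mirrors a small CNN stack."""
    def __init__(self, order=2, eps=0.1):
        super().__init__()
        self.layer1 = GMLSLayer(order=order, eps=eps)   # linear q_xi (hidden=None)
        self.layer2 = GMLSLayer(order=order, eps=eps)

    def forward(self, x, u):
        v = torch.relu(self.layer1(x, u))               # intermediate field on the same cloud
        return self.layer2(x, v)

x = torch.sort(torch.rand(128)).values
net = StackedGMLS()
pred = net(x, torch.sin(2 * torch.pi * x))
```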
We next in- ity of filters for estimating differential operators [21], and do troduce pooling operators applicable to unstructured data, not require as strong assumptions about the particular form of whereby for each point in a given target point cloud Xtarget h , the PDE or a pre-defined dictionary as in [15, 27]. We expect φ(xi ) = F ({xj |j ∈ Xh , |xj − xi | < }). Here F represents that prior knowledge exploited globally in PINNs methods the pooling operator (e.g. max, average, etc.). With this col- may be incorporated into the GMLS-Layers. In particular, lection of operators, one may construct architectures similar the ability to regress natively over solver degrees of freedom to CNNs by stacking GMLS-Layers together with pooling will be particularly useful for SciML applications. layers and other NN components. Strided GMLS-layers gen- eralizing strided CNN stencils may be constructed by choos- Results ing target sites on a second, smaller point cloud. Learning differential operators and identifying Relation to other work. governing equations. Many recent works aim to generalize CNNs away from the Many data sets arising in the sciences are generated by pro- limitations of data on regular grids [8, 12]. This includes work cesses for which there are expected governing laws express- on handling inputs in the form of directed and un-directed ible in terms of ordinary or partial differential equations. graphs [7], processing graphical data sets in the form of GMLS-Nets provide natural features to regress such opera- meshes and point-clouds [14, 17], and in handling scattered tors from observed state trajectories or responses to fluctua- sub-samplings of images [8, 19]. Broadly, these works: (i) use tions. We consider the two settings the spectral theory of graphs and generalize convolution in the ∂u frequency domain [8], (ii) develop localized notions similar to = L[u(t, x)] and L[u(x)] = −f (x). (6) ∂t convolution operations and kernels in the spatial domain [28]. GMLS-Nets is most closely related to the second approach. The L[u] can be a linear or non-linear operator. When the The closest works include SplineCNNs [19], MoNet [10, data are snapshots of the system state un = u(tn ) at discrete 11], KP-Conv [28], and SpiderCNN [24]. In each of these times tn = n∆t, we use estimators based on methods a local spatial convolution kernel is approximated un+1 − un by a parameterized family of functions: open/closed B- = L[{uk }k∈K ; ξ]. (7) ∆t input prediction target op. 0.8 Initial condition Regressed FVM Laplacian 1D: Exact solution True operator FDM 0.6 Regressed FDM True operator FVM 2 u 500 0.4 t 0 0 0.2 2 500 predict input u L[u] 0 0.0 0.5 1.0 0.0 0.5 1.0 0 10 20 30 40 x Burgers 1D: 0.2 50 2 Regressed FDM 25 0.15 Regressed FVM 1 0 True operator FDM 0 0.1 True operator FVM u 25 1 Exact 50 0.0 0.5 1.0 0.05 0.0 0.5 1.0 Laplacian 2D: 0 20 30 x Figure 3: Top: Advection-diffusion solution when ∆t = ∆tCF L . The true model solution and regressed solution all agree with the analytic solution. Bottom: Solution for under- u: input L[u]: predicted L[u]: target resolved dynamics with ∆t = 10∆tCF L . The implicit inte- grator causes FDM/FVM of true operator to be overly dissi- Figure 2: Regression of Differential Operators. GMLS-Nets pative. The regressed operator matches well with the FVM can accurately learn both linear and non-linear operators, operator, matching the phase almost exactly. shown is the case of the 1D/2D Laplacians and Burger’s equation. 
In-homogeneous operators can also be learned ∆t/∆tCF L LF DM,ex LF DM LF V M,ex LF V M by including as one of the input channels the location x. Training and test data consists of random input functions 0.1 0.00093 0.00015 0.00014 0.00010 in 1d at 102 nodes on [0, 1] and in 2d at 400 nodes in 1 0.0011 0.00093 0.0011 0.00011 [0, 1] × [0, 1]. Each random input P function follows a Gaus- 10 0.0083 0.0014 0.0083 0.00035 sian distribution with u(x) = k ξk exp (i2πk · x/L) with ξk ∼ exp(−α1 k 2 )η(0, 1). Training and test data is generated Table 1: The `2 -error for data-driven finite difference model with α1 = 0.1 by computed operators with spectral accuracy (FDM) and finite volume models (FVM) for advection- for Ntrain = 5 × 104 and Ntest = 104 . diffusion equation. Comparisons made to classical discretiza- tions using exact operators. For conservative data-driven fi- nite volume model, there is an order of magnitude better In the case that K = {n + 1}, this corresponds to using an accuracy for large timestep integration. Implicit Euler scheme to model the dynamics. Many other choices are possible, and later we shall discuss estimators with conservation properties. The learning capabilities of natural degrees of freedom for a given model. This provides GMLS-Nets to regress differential operators are shown in access to structure preserving properties such as conservation, Fig. 2. As we shall discuss in more detail, this can be used e.g., conservation of mass in a physical system. to identify the underlying dynamics and obtain governing We take as a source of training data the following analytic equations. solution to the 1D unsteady advection-diffusion equation with advection and diffusion coefficients a and ν on the interval Long-time integrators: discretization for native Ω = [0, 30]. data-driven modeling.   The GMLS framework provides useful ways to target and 1 x − (x0 + at) uex (x, t) = √ exp − (8) sample arbitrary functionals. In a data transfer context, this a 4πνt 4νt has been leveraged to couple heterogeneous codes. For ex- ample, one may sample the flux degrees of freedom of a To construct a finite difference model (FDM), we assume Raviart-Thomas finite element space and target cell integral a node set N = {x0 = 0, x1 , ..., xN −1 , xN = 30}. To con- degrees of freedom of a finite volume code to perform native struct a finite volume model (FVM), we construct the set data transfer. This avoids the need to perform intermediate of cells C = {[xi , xi+1 ], xi , xi+1 ∈ N, i ∈ {0, ..., N − 1}}, projections/interpolations [20]. Motivated by this, we demon- with associated cell measure µ(ci ) = |xi+1 − xi | and set of strate that GMLS may be used to learn discretization native oriented boundary faces Fi = ∂ci = {xi+1 , −xi }. We then data-driven models, whereby dynamics are learned in the assume for uniform timestep ∆t = tn+1 − tn the Implicit Euler update for the FDM given by Brownian Trajectories Density Estimation 10 2 un+1 − uni t=0histogram i = LF DM [un+1 ; ξ], (9) x(t) 0 filtered ∆t To obtain conservation we use the FVM update -10 0 t 10 Particle Distribution t=5 un+1 − uni ρ Z 1 X i LF V M [un+1 ; ξ] · dA. (10) t= 0 = ∆t µ(ci ) f ∈Fi t= 5 For the advection-diffusion equation in the limit ∆t → 0, 0 LF DM,ex = a · ∇u + ν∇2 u and LF V M,ex = au + ν∇u. By 0 x 1 0 x 1 construction, for any choice of hyperparameters ξ the FVM Prediction of Density Evolution will be locally conservative. 
In this sense, the physics of mass 2.0 prediction conservation are enforced strongly via the discretization, and particle density we parameterize only an empirical closure for fluxes - GMLS 1.5 naturally enables such native flux regression. 1.0 t We use a single linear GMLS-net layer to parameterize ρ both LF DM and LF V M , and train over a single timestep by 0.5 using Eqn. 8 to evaluate the exact time increment in Eqns. 9- 10 . We perform gradient descent to minimize the RMS of the 0.0 residual with respect to ξ. For the FDM and FVM we use a 0.0 0.2 0.4 x 0.6 0.8 1.0 cubic and quartic polynomial space, respectively. Recall that to resolve the diffusion and advective timescales one would Figure 4: GMLS-Nets can be trained with molecular-level select a timestep of roughly ∆tCF L = min 21 a∆t ∆x , 4 ∆x2 . 1 ν∆t  data to infer continuum dynamical models. Data are simula- After regressing the operator, we solve then extracted o tions of Brownian motion with periodic boundary conditions t scheme to advance from u0i = u(xi , t0 ) i to uif inal .  on Ω = [0, 1] and diffusivity D = 1 (top-left, unconstrained i As implicit Euler is unconditionally stable, one may se- trajectory). Starting with initial density of a heaviside func- lect ∆t  ∆tCF L at the expense of introducing nu- tion, we construct histograms over time to estimate the par- merical dissipation, ”smearing” the solution. We consider ticle density (upper-right, solid lines) and perform further ∆t ∈ {0.1∆tCF L , ∆tCF L , 10∆tCF L } and compare both filtering to remove sampling noise (upper-right, dashed lines). the learned FDM/FVM dynamics to those obtained with GMLS-Net is trained using FVM estimator of equation 10. a standard discretization (i.e. letting LF DM = LF DM,ex . Predictive continuum model is obtained for the density evolu- From Fig. 3 we observe that for ∆t/∆tCF L ≤ 1 both the tion. Long-term agreement is found between the particle-level regressed and reference models agree well with the analytic simulation (bottom, solid lines) and the inferred continuum solution. However, for ∆t = 10∆tCF L , we see that while the model (bottom, dashed lines). reference models are overly dissipative, the regressed models match the analytic solution. Inspection of the `2 −norm of the solutions at tf inal in Table 1 indicates that as expected, the to accurately predict important statistical moments of the classical solutions corresponding to LF DM,ex and LF V M,ex high-fidelity model over longer timescales. As an example, converge as O(∆t). The regressed FDM is consistently more consider a mean-field continuum model derived by coarse- accurate than the exact operator. Most interesting, the re- graining a molecular dynamics simulation. Classically, one gressed FVM is roughly independent of ∆t, providing a 20× may pursue homogenization analysis to carefully derive such improvement in accuracy over the classical model. This pre- a continuum model, but such techniques are typically prob- liminary result suggests that GMLS-Nets offer promise as a lem specific and can become technical. We illustrate here how tool to develop non-dissipative implicit data-driven models. GMLS-Nets can be used to extract a conservative continuum We suggest that this is due to the ability for GMLS-Nets to PDE model from particle-level simulation data. regress higher-order differential operator corrections to the discrete time dynamics, similar to e.g. Lax-Friedrichs/Lax- Brownian motion has as its infinitesimal generator the Wendroff schemes. 
unsteady diffusion equation [3]. As a basic example, we will extract a 1D diffusion equation to predict the long- Data-driven modeling from molecular dynamics. term density of a cloud of particles undergoing pseudo- In science and engineering applications, there are often high- 1D Brownian motion. We consider the periodic domain fidelity descriptions of the physics based on molecular dy- Ω = [0, 1] × [0, 0.1], and generate a collection of Np parti- namics. One would like to extract continuum descriptions cles with initial position xp (t = 0) drawn from the uniform to allow for predictions over longer time/length-scales or re- distribution U [0, 0.5] × U [0, 0.1]. duce computational costs. Coarse-grained modeling efforts Due to this initialization and domain geometry, the particle also have similar aims while retaining molecular degrees density is statistically one dimensional. We estimate the den- of freedom. Each seek lower-fidelity models that are able sity field ρ(x, t) along the first dimension by constructing a collection C of N uniform width cells and build a histogram, MNIST Input GMLS Features Classes Image a[0] a[1] a[2] a[3] a[4] Np 1xp (t)∈c 1x∈c . XX ρ(x, t) = (11) c∈C p=1 a[5] a[6] a[7] a[8] a[9] The 1x∈A is the indicator function taking unit value for x ∈ GMLS-Layer A and zero otherwise. a[10] a[11] a[12] a[13] a[14] We evolve the particle positions xp (t) under 2D Brownian motion (the density will remain statistically 1D as the parti- cles evolve). In the limit Np /N → ∞, the particle density Case Conv-2L Hybr id-2L GMLS-2L satisfies a diffusion equation, and we can scale the Brownian MNIST 98.52% 98.41% 96.87% motion increments to obtain a unit diffusion coefficient in this limit. As the ratio Np /N is finite, there is substantial noise in the Figure 5: MNIST Classification. GMLS-Layers are substi- extracted density field. We obtain a low pass filtered density, tuted for convolution layers in a basic two-layer architecture ρe(x, t), by convolving ρ(x, t) with a Gaussian kernel of width (Conv2d + ReLu + MaxPool + Conv2d + ReLu + MaxPool + twice the histogram bin width. FC). The Conv-2L test are all Conv-Layers, Hybrib-2L has We use the FVM scheme in the same manner as in the GMLS-Layer followed by a Conv-Layer, and GMLS-2L uses previous section. In particular, we regress a flux that matches all GMLS-Layers. GMLS-Nets used a polynomial basis of the increment (e ρ(x, t = 10) − ρe(x, t = 12))/2∆t. This win- monomials. The filters in GMLS are by design more limited dow was selected, since the regression at t = 0 is ineffective than a general Conv-Layer and correspond here to estimated as the density approximates a heaviside function. Such near derivatives of the data set (top-right). Despite these restric- discontinuities are poorly represented with polynomials and tions, the GMLS-Net still performs reasonably well on this subsequently not expected to train well. Additionally, we basic classification task (bottom-table). train over a time interval of 2∆t, where in general k∆t steps can be used to help mollify high-frequency temporal noise. To show how the GMLS-Nets’ inferred operator can be of basis for p∗ and sampling functionals λj , other features used to make predictions, we evolve the regressed FVM may be extracted. For polynomials with terms in dictionary for one hundred timesteps and compare to the density field order, coefficients are shown in Fig. 5. Notice the clear trends obtained from the particle solver. 
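For concreteness, the following is a hedged sketch of the particle-data pipeline described above: pseudo-1D Brownian particles on the periodic domain, the histogram density of Eqn. 11, and the Gaussian low-pass filter. The particle count, cell count, timestep, and the use of scipy.ndimage.gaussian_filter1d are illustrative choices, not the paper's.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(0)
Np, N, dt = 50_000, 100, 0.01     # particle count, histogram cells, timestep (illustrative)
edges = np.linspace(0.0, 1.0, N + 1)
dx = edges[1] - edges[0]

# Initial positions drawn from U[0, 0.5] x U[0, 0.1] on the periodic domain [0,1] x [0,0.1].
x = np.column_stack([rng.uniform(0.0, 0.5, Np), rng.uniform(0.0, 0.1, Np)])

def density(x):
    """Histogram density of Eqn. 11 along the first coordinate, plus a low-pass
    filtered version (Gaussian kernel of roughly twice the bin width, periodic)."""
    rho, _ = np.histogram(x[:, 0], bins=edges)
    rho = rho.astype(float) / dx                    # so the integral over [0,1] recovers Np
    return rho, gaussian_filter1d(rho, sigma=2.0, mode='wrap')

snapshots = [density(x)]
for n in range(1000):
    # 2D Brownian increments scaled for unit diffusion; periodic boundary conditions.
    x = (x + np.sqrt(2.0 * dt) * rng.standard_normal(x.shape)) % np.array([1.0, 0.1])
    if (n + 1) % 100 == 0:
        snapshots.append(density(x))    # (rho, rho_filtered) pairs used as training data
```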
We apply Dirichlet bound- and directional dependence on increases and decreases in the ary conditions ρ(0, t) = ρ(1, t) = 1 and initial conditions image intensity, indicating c[1] ∼ ∂x and c[2] ∼ ∂y . Given matching the histogram ρ(x, t = 0). Again, the FVM by the history of PDE modeling, for many classification and construction is conservative, where it is easily shown for all regression tasks arising in the sciences and engineering, we t that Ω ρdx = Np . A time series summarizing the evolu- R expect such derivative-based features extracted by GMLS- tion of density in both the particle solver and the regressed Nets will be useful in these applications. continuum model is provided in Fig 4. While this is a ba- GMLS-Net on unstructured fluid simulation data. sic example, this illustrates the potential of GMLS-nets in constructing continuum-level models from molecular data. We consider the application of GMLS-Nets to unstructured These techniques also could have an impact on data-driven data sets representative of scientific machine learning appli- approaches for numerical methods, such as projective inte- cations. Many hydrodynamic flows can be experimentally gration schemes. characterized using velocimetry measurements. While veloc- ity fields can be estimated even for complex geometries, in Image processing: MNIST benchmark. such measurements one often does not have access directly While image processing is not the primary application area to fields, such as the pressure. However, integrated quanti- we intend, GMLS-Nets can be used for tasks such as classifi- ties of interest, such as drag are fundamental for performing cation. For the common MNIST benchmark task, we compare engineering analysis and yet depend upon both the velocity use of GMLS-Nets with CNNs in Figure 5. CNNs use kernel and pressure. This limits the level of characterization that size 5, zero-padding, max-pool reduction 2, channel sizes can be accomplished when using velocimetry data alone. We 16, 32, FC as linear map to soft-max prediction of the cat- construct GMLS-Net architectures that allow for prediction egories. The GMLS-Nets use the same architecture with a of the drag directly from unstructured fluid velocity data, GMLS using polynomial basis of monomials in x, y up to without any direct measurement of the pressure. degree porder = 4. We illustrate the ideas using flow past a cylinder of radius We find that despite the features extracted by GMLS-Nets L. This provides a well-studied canonical problem whose being more restricted than a general CNN, there is only a drag is fully characterized experimentally in terms of the modest decrease in the accuracy for the basic MNIST task. Reynolds number, Re = U L/ν. For incompressible flow We do expect larger differences on more sophisticated image past a cylinder, one may apply dimensional analysis to relate tasks. This basic test illustrates how GMLS-Nets with a poly- drag Fd to the Reynolds number via the drag coefficient Cd : nomial basis extracts features closely associated with taking   2Fd UL derivatives of the data field. We emphasize for other choices 2 A = Cd . (12) ρU∞ ν The U∞ is the free-stream velocity, A is the frontal area of the structured data sets. cylinder, and Cd : R → R. Such analysis requires in practice As an architecture, we provide two input channels for the engineering judgement to identify relevant dimensionless two velocity components to three stacked GMLS layers. The groups. 
After such considerations, this allows one to collapse first layer acts on the cell centers, and intermediate pooling relevant experimental parameters to (ρ, U∞ , A, L, ν) onto a layers down-sample to random subsets of Xh . We conclude single curve. with a linear activation layer to extract the drag coefficient as a single scalar output. We randomly select 80% of the samples for training, and use the remainder as a test set. We 2.5 Training data quantify using the root-mean-square (MSE) error which we GMLS-Net test data find to be below 1.5%. The excellent predictive capability demonstrated in Fig. 6 highlights GMLS-Nets ability to provide an effective means of regressing engineering quantities of interest directly from Drag coefficient 2 velocity flow data; the GMLS-Net architecture is able to identify a latent low-dimensional parameter space which is typically found by hand using dimensional analysis. This similarity relationship across the Reynolds numbers is identi- 1.5 fied, despite the fact that it does not have direct access to the viscosity parameter. These initial results indicate some of the potential of GMLS-Nets in processing unstructured data sets for scientific machine learning applications. 1 100 10000 1e+06 1e+08 Reynolds number Conclusions We have introduced GMLS-Nets for processing scattered data sets leveraging the framework of GMLS. GMLS-Nets allow for generalizing convolutional networks to scattered data, while still benefiting from underlying translational in- variances and weight sharing. The GMLS-layers provide feature extractors that are natural particularly for regressing Figure 6: GMLS-Nets are trained on a CFD data set of flow differential operators, developing dynamical models, and pre- velocity fields. Top: Training set of the drag coefficient plot- dicting quantities of interest associated with physical systems. ted as a function of Reynolds number (small black dots). The GMLS-Nets were demonstrated to be capable of obtaining GMLS-Net predictions for a test set (large red dots). Bottom: dynamical models for long-time integration beyond the lim- Flow velocity fields corresponding to the smallest (left) and its of traditional CFL conditions, for making predictions of largest (right) Reynolds numbers in the test set. density evolution of molecular systems, and for predicting directly from flow data quantities of interest in fluid mechan- ics. These initial results indicate some promising capabilities For the purposes of training a GMLS-Net, we construct a of GMLS-Nets for use in data-driven modeling in scientific synthetic data set by solving the Reynolds averaged Navier- machine learning applications. Stokes (RANS) equations with a steady state finite volume code. Let L = ρ = 1 and consider U ∈ [0.1, 20] and ν ∈ 10−2 , 108 . We consider a k −  turbulence model with inlet conditions consistent with a 10% turbulence inten- References sity and a mixing length corresponding to the inlet size. From [1] D.S. Broomhead and D. Lowe. “Multivariable Func- the solution, we extract the velocity field u at cell centers tional Interpolation and Adaptive Networks”. In: Com- to obtain an unstructured point cloud Xh . We compute Cd plex Systems 2.1 (1988), pp. 321–355. directly from the simulations. We then obtain an unstruc- [2] T. Poggio and F. Girosi. “Networks for approxima- tured data set of 400 (u)i features over Xh , with associated tion and learning”. In: Proceedings of the IEEE 78.9 labels Cd . We emphasize that although U∞ and ν are used to (1990), pp. 1481–1497. 
generate the data, they are not included as features, and the [3] Ioannis Karatzas and Steven E Shreve. “Brownian Reynolds number is therefore hidden. Motion and Stochastic Calculus”. In: Springer, 1998, We remark that the k −  model is well known to perform pp. 47–127. poorly for flows with strong curvature such as recirculation zones. Here, in our proof-of-concept demonstration, we treat [4] I. E. Lagaris, A. Likas, and D. I. Fotiadis. “Artificial the RANS-k −  solution as ground truth for simplicity, de- neural networks for solving ordinary and partial dif- spite its short-comings and acknowledge that a more physical ferential equations”. In: IEEE Transactions on Neural study would consider ensemble averages of LES/DNS data Networks 9.5 (1998), pp. 987–1000. in 3D. We aim here just to illustrate the potential utility of [5] Holger Wendland. Scattered data approximation. GMLS-Nets in a scientific setting for processing such un- Vol. 17. Cambridge university press, 2004. [6] Susanne Brenner and Ridgway Scott. The Mathemati- [21] Zichao Long et al. “PDE-Net: Learning PDEs from cal Theory of Finite Element Methods. Springer, 2008. Data”. In: Proceedings of the 35th International Con- [7] Franco Scarselli et al. “The Graph Neural Network ference on Machine Learning. Ed. by Jennifer Dy Model”. In: Trans. Neur. Netw. 20.1 (Jan. 2009), and Andreas Krause. Vol. 80. Proceedings of Ma- pp. 61–80. ISSN: 1045-9227. chine Learning Research. Stockholmsmssan, Stock- [8] Joan Bruna et al. “Spectral networks and locally con- holm Sweden: PMLR, 2018, pp. 3208–3216. nected networks on graphs”. English (US). In: In- [22] Ravi G. Patel and Olivier Desjardins. “Nonlinear ternational Conference on Learning Representations integro-differential operator regression with neural net- (ICLR2014), CBLS, April 2014. 2014. works”. In: ArXiv abs/1810.08552 (2018). [9] Steven L. Brunton, Joshua L. Proctor, and J. Nathan [23] Maziar Raissi and George Em Karniadakis. “Hidden Kutz. “Discovering governing equations from data by physics models: Machine learning of nonlinear partial sparse identification of nonlinear dynamical systems”. differential equations”. In: Journal of Computational In: 113.15 (2016), pp. 3932–3937. Physics 357 (2018), pp. 125 –141. ISSN: 0021-9991. [10] Thomas N. Kipf and Max Welling. “Semi-Supervised [24] Yifan Xu et al. “SpiderCNN: Deep Learning on Point Classification with Graph Convolutional Networks”. Sets with Parameterized Convolutional Filters”. In: In: ArXiv abs/1609.02907 (2016). Computer Vision – ECCV 2018. Ed. by Vittorio Ferrari [11] Federico Monti et al. “Geometric Deep Learning on et al. Cham: Springer International Publishing, 2018, Graphs and Manifolds Using Mixture Model CNNs”. pp. 90–105. ISBN: 978-3-030-01237-3. In: 2017 IEEE Conference on Computer Vision and [25] Nathan Baker et al. Workshop report on basic research Pattern Recognition (CVPR) (2016), pp. 5425–5434. needs for scientific machine learning: Core technolo- [12] M. M. Bronstein et al. “Geometric Deep Learning: gies for artificial intelligence. Tech. rep. USDOE Of- Going beyond Euclidean data”. In: IEEE Signal Pro- fice of Science (SC), Washington, DC (United States), cessing Magazine 34.4 (2017), pp. 18–42. ISSN: 1053- 2019. 5888. DOI: 10.1109/MSP.2017.2693418. [26] Yohai Bar-Sinai et al. “Learning data-driven discretiza- [13] Charles R. Qi et al. “PointNet: Deep Learning on Point tions for partial differential equations”. In: Proceed- Sets for 3D Classification and Segmentation”. 
In: The ings of the National Academy of Sciences 116.31 IEEE Conference on Computer Vision and Pattern (2019), pp. 15344–15349. ISSN: 0027-8424. DOI: 10. Recognition (CVPR). 2017. 1073/pnas.1814058116. [14] Charles Ruizhongtai Qi et al. “PointNet++: Deep Hi- [27] M. Raissi, P. Perdikaris, and G.E. Karniadakis. erarchical Feature Learning on Point Sets in a Metric “Physics-informed neural networks: A deep learning Space”. In: Advances in Neural Information Process- framework for solving forward and inverse problems ing Systems 30. Ed. by I. Guyon et al. Curran Asso- involving nonlinear partial differential equations”. In: ciates, Inc., 2017, pp. 5099–5108. Journal of Computational Physics 378 (2019), pp. 686 [15] Samuel H. Rudy et al. “Data-driven discovery of par- –707. tial differential equations”. In: 3.4 (2017). [28] Hugues Thomas et al. “KPCONV: Flexible and de- [16] Nathaniel Trask, Mauro Perego, and Pavel Bochev. formable convolution for point clouds”. In: Proceed- “A high-order staggered meshless method for elliptic ings of the IEEE International Conference on Com- problems”. In: SIAM Journal on Scientific Computing puter Vision. 2019, pp. 6411–6420. 39.2 (2017), A479–A502. [29] BJ Gross et al. “Meshfree methods on manifolds for [17] Manzil Zaheer et al. “Deep Sets”. In: Advances in hydrodynamic flows on curved surfaces: a generalized Neural Information Processing Systems 30. Ed. by I. moving least-squares (GMLS) approach”. In: Journal Guyon et al. Curran Associates, Inc., 2017, pp. 3391– of Computational Physics (2020), p. 109340. 3401. [30] Nathaniel Trask, Pavel Bochev, and Mauro Perego. [18] P. J. Atzberger. “Importance of the Mathematical Foun- “A conservative, consistent, and scalable meshfree dations of Machine Learning Methods for Scientific mimetic method”. In: Journal of Computational and Engineering Applications”. In: SciML2018 Work- Physics 409 (2020), p. 109187. shop, position paper, https://arxiv.org/abs/1808.02213 (2018). [19] M. Fey et al. “SplineCNN: Fast Geometric Deep Learning with Continuous B-Spline Kernels”. In: 2018 IEEE/CVF Conference on Computer Vision and Pat- tern Recognition. 2018, pp. 869–877. [20] Paul Allen Kuberry, Pavel B Bochev, and Kara J Pe- terson. A virtual control meshfree coupling method for non-coincident interfaces. Tech. rep. Sandia National Lab.(SNL-NM), Albuquerque, NM (United States), 2018. Derivation of Gradients of the Operator τxi [u]. where ∂M X  ∂  Parameters of the operator τ̃ . = p(xj , xi ) p(xj , xi )T wij ∂xi ∂xi We give here some details on the derivation of the gradients for j the learnable GMLS operator τ [u] and intermediate steps. This  ∂ T can be used in implementations for back-propagation and other + p(xj , xi ) p(xj , xi ) wij applications. ∂xi  GMLS works by mapping data to a local polynomial fit in region ∂wij + p(xj , xi )p(xj , xi )T . Ωi around xi with p∗ (x) ≈ u(x) for x ∈ Ωi . To find the optimal ∂xi fitting polynomial p∗ (x) ∈ V to the function u(x), we can consider the case with λj (x) = δ(x − xj ) and weight function wij = The derivatives in r are given by w(xi −xj ). In a region around a reference point x∗ the optimization X  ∂   ∂r ∂wij problem can be expressed parameterically in terms of coefficients a = p(xj ) uj wij + p(xj )uj . as ∂xi j ∂xi ∂xi X 2 a∗ (xi ) = arg minm uj − p(xj )T a wij . The full derivative of the linear operator τ̃ can be expressed as a∈R j     ∂ ∂ ∂ ∗ We write for short p(xj ) = p(xj , xi ), where the basis elements τ̃ (xi ) = q(xi )T a∗ (xi ) + q(xi )T a (xi ) . 
Written out, the full derivative of the linear operator reads

$$\frac{\partial}{\partial x_i} \tilde{\tau}(x_i) = \left(\frac{\partial}{\partial x_i} q(x_i)\right)^T a^*(x_i) + q(x_i)^T \left(\frac{\partial}{\partial x_i} a^*(x_i)\right).$$

To make these expressions explicit, we first record the linear system determining $a^*(x_i)$. We can compute the derivative in $a_\ell$ to obtain

$$\frac{\partial J}{\partial a_\ell}(x_i) = 0.$$

This implies

$$\left[\sum_j p(x_j)\, w_{ij}\, p(x_j)^T\right] a = \sum_j w_{ij}\, p(x_j)\, u_j.$$

Let

$$M = \sum_j p(x_j)\, w_{ij}\, p(x_j)^T, \qquad r = \sum_j w_{ij}\, p(x_j)\, u_j;$$

then we can rewrite the coefficients as the solution of the linear system

$$M\, a^*(x_i) = r.$$

This is sometimes written more explicitly for analysis and computations as $a^*(x_i) = M^{-1} r$. We can represent a general linear operator $\tilde{\tau}(x_i)$ using the $a^*$ representation as

$$\tilde{\tau}(x_i) = q(x_i)^T a^*(x_i).$$

Typically, the weights will not be spatially dependent, $q(x_i) = q_0$. Throughout, we shall denote this simply as $q$ and assume there is no spatial dependence, unless otherwise indicated.

Derivatives of τ̃ in xi, a(xi), and q. The derivative in $x_i$ is given by

$$\frac{\partial a^*}{\partial x_i}(x_i) = \frac{\partial M^{-1}}{\partial x_i}\, r + M^{-1} \frac{\partial r}{\partial x_i}, \qquad \frac{\partial M^{-1}}{\partial x_i} = -M^{-1} \frac{\partial M}{\partial x_i} M^{-1},$$

with $\partial M / \partial x_i$ and $\partial r / \partial x_i$ as given above. In the notation, we denote $p(x_j) = p(x_j, x_i)$, where the basis elements in fact can depend on the particular $x_i$; this is important when we take derivatives in the input values $x_i$ of the expressions.

In the constant case $q(x_i) = q_0$, the derivative of $\tilde{\tau}$ simplifies to

$$\frac{\partial}{\partial x_i} \tilde{\tau}(x_i) = q_0^T \left(\frac{\partial}{\partial x_i} a^*(x_i)\right).$$

The derivatives of the other terms follow more readily. For the derivative of the linear operator $\tilde{\tau}$ in the coefficients $a(x_i)$, we have

$$\frac{\partial}{\partial a(x_i)} \tilde{\tau}(x_i) = q(x_i).$$

For derivatives of the linear operator $\tilde{\tau}$ in the mapping coefficients $q$, we have

$$\frac{\partial}{\partial q(x_i)} \tilde{\tau}(x_i) = a(x_i).$$

In the case of nonlinear operators $\tilde{\tau} = q(a(x_i))$ there are further dependencies beyond just $x_i$ and $a(x_i)$, and less explicit expressions. For example, when using MLPs there may be a hierarchy of trainable weights $w$. The derivatives of the non-linear operator can be expressed as

$$\frac{\partial}{\partial w} \tilde{\tau}(x_i) = \frac{\partial q}{\partial w}\big(a(x_i)\big).$$

Here, one relies on back-propagation algorithms for evaluation of $\partial q / \partial w$. Similarly, given the generality of $q(a)$, for derivatives in $a$ and $x_i$, one can use back-propagation methods on $q$ and the chain rule with the expressions derived during the linear case for the $a$ and $x_i$ dependencies.
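In practice these derivatives need not be coded by hand: automatic differentiation propagates gradients through the weighted least-squares solve for $a^*(x_i)$ as well as through $q$, which is how the expressions above are typically realized. A small hedged PyTorch example with illustrative values:

```python
import torch

# Hypothetical illustration: autograd supplies d(tau)/d(x_i) and d(tau)/d(q),
# with gradients flowing through the weighted least-squares solve for a*(x_i).
xj = torch.linspace(-0.5, 0.5, 21)                    # neighbor locations relative to x_i
uj = torch.sin(xj)                                     # sampled field values
xi = torch.tensor(0.05, requires_grad=True)            # evaluation point
q = torch.nn.Linear(4, 1, bias=False)                  # q acting on a(u), dim(P) = 4 (cubic)

dx = xj - xi
w = torch.clamp(1.0 - dx.abs() / 0.6, min=0.0) ** 4    # W(r) = (1 - r/eps)^p_+
Phi = torch.stack([torch.ones_like(dx), dx, dx ** 2, dx ** 3], dim=-1)
M = Phi.T @ (w[:, None] * Phi)
r = Phi.T @ (w * uj)
a = torch.linalg.solve(M, r)                           # a*(x_i) = M^{-1} r
tau = q(a).squeeze()                                   # tilde-tau(x_i) = q(a*(x_i))
tau.backward()                                         # chain rule through M^{-1} r and q
print(xi.grad, q.weight.grad)
```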