GMLS-Nets: A machine learning framework for unstructured data

Nathaniel Trask¹,†, Ravi G. Patel¹, Ben J. Gross², Paul J. Atzberger²,†

¹ Sandia National Laboratories∗
Center for Computing Research
{natrask,rgpatel}@sandia.gov

² University of California Santa Barbara
Department of Mathematics and Mechanical Engineering
http://atzberger.org/
atzberg@gmail.com

Abstract

Data fields sampled on irregularly spaced points arise in many science and engineering applications. For regular grids, Convolutional Neural Networks (CNNs) gain benefits from weight sharing and invariances. We generalize CNNs by introducing methods for data on unstructured point clouds using Generalized Moving Least Squares (GMLS). GMLS is a non-parametric meshfree technique for estimating linear bounded functionals from scattered data, and has emerged as an effective technique for solving partial differential equations (PDEs). By parameterizing the GMLS estimator, we obtain learning methods for linear and non-linear operators with unstructured stencils. The requisite calculations are local, embarrassingly parallelizable, and supported by a rigorous approximation theory. We show how the framework may be used for unstructured physical data sets to perform operator regression, develop predictive dynamical models, and obtain feature extractors for engineering quantities of interest. The results show the promise of these architectures as foundations for data-driven model development in scientific machine learning applications.

Introduction

Many scientific and engineering applications require processing data sets sampled on irregularly spaced points. Consider e.g. GIS data associating geospatial locations with measurements, or scientific simulations with unstructured meshes. This need is amplified by the recent surge of interest in scientific machine learning (SciML) [25] targeting the application of data-driven techniques to the sciences. In this setting, data typically takes the form of e.g. synthetic simulation data from meshes, or from sensors associated with data sites evolving under partially known dynamics. This data is often scarce or highly constrained, and it has been proposed that successful SciML strategies will leverage prior knowledge to enhance information gained from such data [18, 25]. One may exploit physical properties such as transformation symmetries, conservation structure, or solution regularity [6, 9, 18]. This new application space necessitates ML architectures capable of utilizing such knowledge.

For data sampled on regular grids, Convolutional Neural Networks (CNNs) are widely used to exploit translation invariance and hierarchical structure to extract features from data. Here we generalize this technique to the SciML setting by introducing GMLS-Nets based on the scattered data approximation theory underlying GMLS. Similar to how CNNs learn stencils which benefit from weight-sharing, GMLS-Nets operate by using local reconstructions to learn operators between function spaces. The resulting architecture is similarly interpretable and serves as an effective generalization of CNNs to unstructured data, while providing mechanisms to incorporate knowledge of underlying physics.

In this work we show how GMLS-Nets may be used in a SciML setting. Our results show GMLS-Nets are an effective tool to discover PDEs, which may be used as a foundation to construct data-driven models while preserving physical invariants like conservation principles. We also show they may be used to improve traditional scientific components, such as time integrators. We show they also can be used to regress engineering quantities of interest from scientific simulation data. Finally, we briefly show GMLS-Nets can perform reasonably relative to convNets on traditional computer vision benchmarks. These results indicate the promise of GMLS-Nets to support data-driven modeling efforts in SciML applications. Implementations in TensorFlow and PyTorch are available at https://github.com/rgp62/gmls-nets and https://github.com/atzberg/gmls-nets.

∗ Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA-0003525. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government.

Copyright © 2020, for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Generalized Moving Least Squares (GMLS)
Generalized Moving Least Squares (GMLS) is a non-parametric functional regression technique to construct approximations of linear, bounded functionals from scattered samples of an underlying field by solving local least-squares problems. On a Banach space V with dual space V∗, we aim to recover an estimate of a given target functional τ_x̃[u] ∈ V∗ acting on u = u(x) ∈ V, where x, x̃ denote associated locations in a compactly supported domain Ω ⊂ R^d. We assume u is characterized by an unstructured collection of sampling functionals, Λ(u) := {λ_j(u)}_{j=1}^N ⊂ V∗.

To construct this estimate, we consider P ⊂ V and seek an element p∗ ∈ P which provides an optimal reconstruction of the samples in the following weighted-ℓ² sense:

  p∗ = argmin_{p∈P} Σ_{j=1}^N (λ_j(u) − λ_j(p))² ω(λ_j, τ_x̃).   (1)

Here ω(λ_j, τ_x̃) is a positive, compactly supported kernel function establishing spatial correlation between the target functional and sampling set. If one associates locations X_h := {x_j}_{j=1}^N ⊂ Ω with Λ(u), then one may consider radial kernels ω = W(||x_j − x̃||₂) with support r < ε.

Assuming the basis P = span{φ₁, ..., φ_dim(P)}, and denoting Φ(x) = {φ_i(x)}_{i=1,...,dim(P)}, the optimal reconstruction may be written in terms of an optimal coefficient vector a(u):

  p∗ = Φ(x)ᵀ a(u).   (2)

Provided one has knowledge of how the target functional acts on P, the final GMLS estimate may be obtained by applying the target functional to the optimal reconstruction:

  τ_x̃^h[u] = τ_x̃(Φ)ᵀ a(u).   (3)

Sufficient conditions for the existence of solutions to Eqn. 1 depend only upon the unisolvency of Λ over V, the distribution of samples X_h, and mild conditions on the domain Ω; they are independent of the choice of τ_x̃. For theoretical underpinnings and recent applications, we refer readers to [5, 16, 29, 30].

GMLS has primarily been used to obtain point estimates of differential operators to develop meshfree discretizations of PDEs. The abstraction of GMLS however provides a mathematically rigorous approximation-theory framework which may be applied to a wealth of problems, whereby one may tailor the choice of τ_x̃, Λ, P and ω to a given application. In the current work, we will assume the action of τ_x̃ on P is unknown, and introduce a parameterization τ_x̃,ξ(Φ), where ξ denotes hyperparameters to be inferred from data. Classically, GMLS is restricted to linear bounded target functionals; we will also consider a novel nonlinear extension by considering estimates of the form

  τ_x̃^h[u] = q_x̃,ξ(a(u)),   (4)

where q_x̃,ξ is a family of nonlinear operators parameterized by ξ acting upon the GMLS reconstruction. Where unambiguous, we will drop the x̃ dependence of operators and simply write e.g. τ^h[u] = q_ξ(a(u)). We have recently used related non-linear variants of GMLS to develop solvers for PDEs on manifolds in [29].

For simplicity, in this work we specialize as follows. Let Λ be point evaluations on X_h; let P be π_m(R^d), the space of mth-order polynomials; and let W(r) = (1 − r/ε)₊^p̄, where f₊ denotes the positive part of a function f and p̄ ∈ N. We stress however that this framework supports a much broader application. Consider e.g. learning from flux data related to H(div)-conforming discretizations, where one may select as sampling functional λ_i(u) = ∫_{f_i} u · dA, or consider the physical constraints that may be imposed by selecting P to be divergence-free or to satisfy a differential equation.

We illustrate now the connection between GMLS and convolutional networks in the case of a uniform grid, X_h ⊂ Z^d. Consider a sampling functional λ_j(u) = (u(x_j) − u(x_i)), and assume the parameterization τ_x̃,ξ(Φ) = {ξ₁, ..., ξ_dim(P)}, with x_{i,j} = x_i − x_j. Then the GMLS estimate is given explicitly at a point x_i by

  τ_x̃ᵢ^h[u] = Σ_{α,β,j} ξ_α ( Σ_k φ_α(x_k) W(x_{i,k}) φ_β(x_k) )⁻¹ φ_β(x_j) W(x_{i,j}) (u_j − u_i).   (5)

Contracting terms involving α, β and k, we may write τ_x̃ᵢ^h[u] = Σ_j c(τ, Λ)_{ij} (u_j − u_i). The collection of stencil coefficients at x_i ∈ X_h is {c(τ, Λ)_{ij}}_j. Therefore, one application for GMLS is to build stencils similar to convolutional networks. A major distinction is that GMLS can handle scattered data sets, and a judicious selection of Λ, P and ω can be used to inject prior information. Alternatively, one may interpret the regression over P as an encoding in a low-dimensional space well-suited to characterize common operators. For continuous functions, for example, an operator's action on the space of polynomials is often sufficient to obtain a good approximation. Unlike CNNs, there is no need to handle boundary effects; GMLS-Nets instead learn one-sided stencils.
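To make the specialized construction above concrete, the following minimal NumPy sketch (our own illustration, not code from the released packages) assembles the GMLS estimate of Eqns. 1-3 in 1D for point-evaluation sampling functionals, a monomial basis centered at the target point, and a coefficient vector ξ playing the role of the parameterization τ_x̃,ξ(Φ); all names and parameter values here are our assumptions.

```python
import numpy as np

def gmls_estimate(x_target, xs, us, xi, eps=0.2, m=4, p_bar=2):
    """GMLS estimate tau_h[u](x_target) following Eqns. 1-3 (1D sketch).

    xs, us : scattered sample locations x_j and samples u(x_j)
    xi     : weights parameterizing the target functional tau(Phi);
             in GMLS-Nets these are the learnable parameters
    """
    # Compactly supported kernel W(r) = (1 - r/eps)_+^p_bar
    r = np.abs(xs - x_target)
    w = np.where(r < eps, (1.0 - r / eps) ** p_bar, 0.0)

    # Monomial basis Phi centered at the target point: [N, m+1]
    P = np.vander(xs - x_target, N=m + 1, increasing=True)

    # Weighted least squares for the coefficient vector a(u) (Eqns. 1-2)
    A = P.T @ (w[:, None] * P)
    b = P.T @ (w * us)
    a_u = np.linalg.solve(A, b)

    # Apply the parameterized functional (Eqn. 3): tau = xi^T a(u)
    return xi @ a_u

# With the basis {1, dx, dx^2, ...}, xi = (0, 0, 2, 0, 0) targets u'':
xs = np.sort(np.random.rand(400))
us = np.sin(2 * np.pi * xs)
est = gmls_estimate(0.25, xs, us, xi=np.array([0.0, 0.0, 2.0, 0.0, 0.0]))
print(est)  # should approximate u''(0.25) = -(2*pi)^2 ~ -39.5
```

For a linear functional such as a derivative, fixing ξ recovers a classical GMLS stencil; in a GMLS-Net the same ξ is instead trained by back-propagation.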
GMLS-Nets

From an ML perspective, GMLS estimation consists of two parts: (i) data is encoded via the coefficient vector a(u), providing a compression of the data in terms of P; (ii) the operator is regressed over P∗; this is equivalent to finding a function q_ξ : a(u) → R. We propose GMLS-Layers encoding this process in Figure 1, parameterizing a(u) = NN(u).

Figure 1: GMLS-Nets. Scattered data inputs are processed by learnable operators τ[u] parameterized via GMLS estimators. A local reconstruction is built about each data point and encoded as a coefficient vector via equation 2. The coefficient mapping q(a) of equation 4 provides the learnable action of the operator. GMLS-Layers can be stacked to obtain deeper architectures and combined with other neural network operations to perform classification and regression tasks (inset, SD: scattered data, MP: max-pool, MLP: multi-layer perceptron).

This architecture accepts input channels indexed by α which consist of components of the data vector-field [u]_α sampled over the scattered points X_h. We allow for different sampling points for each channel, which may be helpful for heterogeneous data. Each of these input channels is then used to obtain an encoding of the input field as the vector a(u) identifying the optimal representer in P.

We next select our parameterization of the functional via q_ξ, which may be any family of functions trainable by back-propagation. We will consider two cases in this work, appropriate for linear and non-linear operators. In the linear case we consider q_ξ(a) = ξᵀa, which is sufficient to exactly reproduce differential operators. For the nonlinear case we parameterize with a multi-layer perceptron (MLP), q_ξ(a) = MLP(a). Note that in the case of a linear activation function, the single-layer MLP model reduces to the linear model.

Nonlinearity may thus be handled within a single nonlinear GMLS-Layer, or by stacking multiple linear GMLS-Layers with intermediate ReLUs, the latter mapping more directly onto traditional CNN construction. We next introduce pooling operators applicable to unstructured data, whereby for each point in a given target point cloud X_h^target, φ(x_i) = F({x_j | j ∈ X_h, |x_j − x_i| < ε}). Here F represents the pooling operator (e.g. max, average, etc.). With this collection of operators, one may construct architectures similar to CNNs by stacking GMLS-Layers together with pooling layers and other NN components. Strided GMLS-Layers generalizing strided CNN stencils may be constructed by choosing target sites on a second, smaller point cloud.
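The following PyTorch sketch illustrates the structure of a single nonlinear GMLS-Layer in 1D. It is a minimal illustration under our own simplifying assumptions (a single channel, dense all-pairs neighborhoods, a monomial basis, and a small ridge term for sparse neighborhoods); the released TensorFlow/PyTorch implementations linked above differ in details such as batching, multiple channels, and neighbor search. The encoding stage producing a(u) involves no trainable parameters; only the coefficient mapping q_ξ is learned.

```python
import torch

class GMLSLayer(torch.nn.Module):
    """Minimal nonlinear GMLS-Layer sketch (1D, single channel):
    encode each neighborhood as a(u) by weighted least squares
    (Eqns. 1-2), then apply the learnable mapping q_xi (Eqn. 4)."""

    def __init__(self, m=2, hidden=64, eps=0.1, p_bar=2):
        super().__init__()
        self.m, self.eps, self.p_bar = m, eps, p_bar
        self.q = torch.nn.Sequential(                # q_xi : a(u) -> R
            torch.nn.Linear(m + 1, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, 1))

    def forward(self, x, u):
        # dx[i, j] = x_j - x_i and compact kernel W(r) = (1 - r/eps)_+^p_bar
        dx = x[None, :] - x[:, None]
        w = (1.0 - dx.abs() / self.eps).clamp(min=0.0) ** self.p_bar

        # Local monomial basis Phi(x_j - x_i): [N, N, m+1]
        P = torch.stack([dx ** k for k in range(self.m + 1)], dim=-1)

        # Weighted normal equations per target point (Eqn. 1), with a
        # small ridge term guarding against sparse neighborhoods
        A = torch.einsum('ijk,ij,ijl->ikl', P, w, P)
        A = A + 1e-8 * torch.eye(self.m + 1)
        b = torch.einsum('ijk,ij,j->ik', P, w, u)
        a = torch.linalg.solve(A, b)                 # a(u), shape [N, m+1]

        return self.q(a).squeeze(-1)                 # tau_h[u] at each x_i

# Usage: out = GMLSLayer()(torch.rand(128), torch.rand(128))
```

Replacing the MLP with a single bias-free linear layer recovers the linear case q_ξ(a) = ξᵀa discussed above.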
Relation to other work.

Many recent works aim to generalize CNNs away from the limitations of data on regular grids [8, 12]. This includes work on handling inputs in the form of directed and un-directed graphs [7], processing graphical data sets in the form of meshes and point-clouds [14, 17], and handling scattered sub-samplings of images [8, 19]. Broadly, these works (i) use the spectral theory of graphs and generalize convolution in the frequency domain [8], or (ii) develop localized notions similar to convolution operations and kernels in the spatial domain [28]. GMLS-Nets are most closely related to the second approach.

The closest works include SplineCNNs [19], MoNet [10, 11], KP-Conv [28], and SpiderCNN [24]. In each of these methods a local spatial convolution kernel is approximated by a parameterized family of functions: open/closed B-Splines [19], a Gaussian correlation kernel [10, 11], or a kernel function based on a learnable combination of radial ReLUs [28]. The SpiderCNNs share many similarities with GMLS-Nets, using a kernel based on a learnable degree-three Taylor polynomial taken in product with a learnable radial piecewise-constant weight function [24]. A key distinction of GMLS-Nets is that operators are regressed directly over the dual space V∗ without constructing shape/kernel functions. Both approaches provide ways to approximate the action of a processing operator that aggregates over scattered data.

We also mention other meshfree learning frameworks, PointNet [13, 14] and Deep Sets [17], but these are aimed primarily at set-based data and geometric processing tasks for segmentation and classification. Additionally, Radial Basis Function (RBF) networks are built upon similar approximation theory [1, 2].

Related work on operator regression in a SciML context includes [4, 9, 15, 21-23, 26, 27]. In PINNs [23, 27], a versatile framework based on DNNs is developed to regress both linear and non-linear PDE models while exploiting physics knowledge. In [26] and PDE-Nets [21], CNNs are used to learn stencils to estimate operators. In [9, 15] dictionary learning is used along with sparse optimization methods to identify dynamical systems and infer physical laws associated with time-series data. In [22], regression is performed over a class of nonlinear pseudodifferential operators, formed by composing neural-network-parameterized Fourier multipliers and pointwise functionals.

GMLS-Nets can be used in conjunction with the above methods. GMLS-Nets have the distinction of being able to move beyond reliance on CNNs on regular grids, no longer need moment conditions to impose accuracy and interpretability of filters for estimating differential operators [21], and do not require as strong assumptions about the particular form of the PDE or a pre-defined dictionary as in [15, 27]. We expect that prior knowledge exploited globally in PINNs methods may be incorporated into the GMLS-Layers. In particular, the ability to regress natively over solver degrees of freedom will be particularly useful for SciML applications.

Results

Learning differential operators and identifying governing equations.

Many data sets arising in the sciences are generated by processes for which there are expected governing laws expressible in terms of ordinary or partial differential equations. GMLS-Nets provide natural features to regress such operators from observed state trajectories or responses to fluctuations. We consider the two settings

  ∂u/∂t = L[u(t, x)]   and   L[u(x)] = −f(x).   (6)

The L[u] can be a linear or non-linear operator. When the data are snapshots of the system state u^n = u(t_n) at discrete times t_n = nΔt, we use estimators based on

  (u^{n+1} − u^n)/Δt = L[{u^k}_{k∈K}; ξ].   (7)

In the case that K = {n + 1}, this corresponds to using an implicit Euler scheme to model the dynamics. Many other choices are possible, and later we shall discuss estimators with conservation properties. The learning capabilities of GMLS-Nets to regress differential operators are shown in Fig. 2. As we shall discuss in more detail, this can be used to identify the underlying dynamics and obtain governing equations.
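As a concrete illustration (our own sketch, assuming `model` is a GMLS-Net such as the layer sketched earlier), the estimator of Eqn. 7 with K = {n + 1} can be trained by minimizing the implicit-Euler residual over consecutive snapshot pairs:

```python
import torch

def train_operator(model, x, snapshots, dt, epochs=500, lr=1e-3):
    """Regress L[u; xi] from state snapshots via the implicit Euler
    estimator of Eqn. 7 with K = {n+1} (sketch)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        # residual of (u^{n+1} - u^n)/dt = L[u^{n+1}; xi], summed over pairs
        loss = sum(((u1 - u0) / dt - model(x, u1)).pow(2).mean()
                   for u0, u1 in zip(snapshots[:-1], snapshots[1:]))
        loss.backward()
        opt.step()
    return model
```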
Figure 2: Regression of Differential Operators. GMLS-Nets can accurately learn both linear and non-linear operators; shown is the case of the 1D/2D Laplacians and Burgers' equation. In-homogeneous operators can also be learned by including the location x as one of the input channels. Training and test data consist of random input functions in 1d at 10² nodes on [0, 1] and in 2d at 400 nodes in [0, 1] × [0, 1]. Each random input function follows a Gaussian distribution with u(x) = Σ_k ξ_k exp(i2πk · x/L) with ξ_k ∼ exp(−α₁k²)η(0, 1). Training and test data are generated with α₁ = 0.1 by computing the operators with spectral accuracy for N_train = 5 × 10⁴ and N_test = 10⁴.

Long-time integrators: discretization for native data-driven modeling.

The GMLS framework provides useful ways to target and sample arbitrary functionals. In a data-transfer context, this has been leveraged to couple heterogeneous codes. For example, one may sample the flux degrees of freedom of a Raviart-Thomas finite element space and target cell-integral degrees of freedom of a finite volume code to perform native data transfer. This avoids the need to perform intermediate projections/interpolations [20]. Motivated by this, we demonstrate that GMLS may be used to learn discretization-native data-driven models, whereby dynamics are learned in the natural degrees of freedom for a given model. This provides access to structure-preserving properties such as conservation, e.g., conservation of mass in a physical system.

We take as a source of training data the following analytic solution to the 1D unsteady advection-diffusion equation with advection and diffusion coefficients a and ν on the interval Ω = [0, 30]:

  u_ex(x, t) = (1/√(4πνt)) exp(−(x − (x₀ + at))²/(4νt)).   (8)

To construct a finite difference model (FDM), we assume a node set N = {x₀ = 0, x₁, ..., x_{N−1}, x_N = 30}. To construct a finite volume model (FVM), we construct the set of cells C = {[x_i, x_{i+1}], x_i, x_{i+1} ∈ N, i ∈ {0, ..., N − 1}}, with associated cell measure µ(c_i) = |x_{i+1} − x_i| and set of oriented boundary faces F_i = ∂c_i = {x_{i+1}, −x_i}. We then assume for uniform timestep Δt = t^{n+1} − t^n the implicit Euler update for the FDM given by

  (u_i^{n+1} − u_i^n)/Δt = L_FDM[u^{n+1}; ξ].   (9)

To obtain conservation we use the FVM update

  (u_i^{n+1} − u_i^n)/Δt = (1/µ(c_i)) Σ_{f∈F_i} ∫_f L_FVM[u^{n+1}; ξ] · dA.   (10)

For the advection-diffusion equation in the limit Δt → 0, L_FDM,ex = a · ∇u + ν∇²u and L_FVM,ex = au + ν∇u. By construction, for any choice of hyperparameters ξ the FVM will be locally conservative. In this sense, the physics of mass conservation is enforced strongly via the discretization, and we parameterize only an empirical closure for fluxes; GMLS naturally enables such native flux regression.
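To see why the update of Eqn. 10 enforces conservation for any learned flux, note that interior face fluxes telescope when summed over cells. A small sketch of the update (our own, written with an explicit flux evaluation for clarity; Eqns. 9-10 use implicit Euler, which additionally requires solving for u^{n+1}) on a uniform periodic 1D grid:

```python
import numpy as np

def fvm_step(u, face_flux, dx, dt):
    """One conservative finite volume update in the spirit of Eqn. 10.

    face_flux : callable returning the (regressed) flux L_FVM[u; xi]
                at the N+1 cell faces, with face_flux(u)[0] equal to
                face_flux(u)[-1] under periodic boundary conditions
    """
    F = face_flux(u)                            # fluxes at cell faces
    u_new = u - (dt / dx) * (F[1:] - F[:-1])    # flux divergence per cell
    # Interior fluxes cancel in sum(u_new): total mass is conserved for
    # ANY parameterization xi, so conservation is enforced strongly.
    return u_new
```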
We use a single linear GMLS-Net layer to parameterize both L_FDM and L_FVM, and train over a single timestep by using Eqn. 8 to evaluate the exact time increment in Eqns. 9-10. We perform gradient descent to minimize the RMS of the residual with respect to ξ. For the FDM and FVM we use a cubic and quartic polynomial space, respectively. Recall that to resolve the diffusive and advective timescales one would select a timestep of roughly Δt_CFL = min(½ Δx/a, ¼ Δx²/ν).

After regressing the operator, we then solve the extracted scheme to advance from {u_i⁰ = u(x_i, t₀)}_i to {u_i^{t_final}}_i. As implicit Euler is unconditionally stable, one may select Δt ≫ Δt_CFL at the expense of introducing numerical dissipation, "smearing" the solution. We consider Δt ∈ {0.1Δt_CFL, Δt_CFL, 10Δt_CFL} and compare both of the learned FDM/FVM dynamics to those obtained with a standard discretization (i.e. letting L_FDM = L_FDM,ex). From Fig. 3 we observe that for Δt/Δt_CFL ≤ 1 both the regressed and reference models agree well with the analytic solution. However, for Δt = 10Δt_CFL, we see that while the reference models are overly dissipative, the regressed models match the analytic solution. Inspection of the ℓ²-norm of the solutions at t_final in Table 1 indicates that, as expected, the classical solutions corresponding to L_FDM,ex and L_FVM,ex converge as O(Δt). The regressed FDM is consistently more accurate than the exact operator. Most interesting, the regressed FVM is roughly independent of Δt, providing a 20× improvement in accuracy over the classical model. This preliminary result suggests that GMLS-Nets offer promise as a tool to develop non-dissipative implicit data-driven models. We suggest that this is due to the ability of GMLS-Nets to regress higher-order differential operator corrections to the discrete time dynamics, similar to e.g. Lax-Friedrichs/Lax-Wendroff schemes.

Figure 3: Top: Advection-diffusion solution when Δt = Δt_CFL. The true model solution and regressed solution all agree with the analytic solution. Bottom: Solution for under-resolved dynamics with Δt = 10Δt_CFL. The implicit integrator causes the FDM/FVM with the true operator to be overly dissipative. The regressed operator matches well with the FVM operator, matching the phase almost exactly.

  Δt/Δt_CFL   L_FDM,ex   L_FDM     L_FVM,ex   L_FVM
  0.1         0.00093    0.00015   0.00014    0.00010
  1           0.0011     0.00093   0.0011     0.00011
  10          0.0083     0.0014    0.0083     0.00035

Table 1: The ℓ²-error for the data-driven finite difference model (FDM) and finite volume model (FVM) for the advection-diffusion equation. Comparisons are made to classical discretizations using the exact operators. For the conservative data-driven finite volume model, there is an order of magnitude better accuracy for large-timestep integration.

Data-driven modeling from molecular dynamics.

In science and engineering applications, there are often high-fidelity descriptions of the physics based on molecular dynamics. One would like to extract continuum descriptions to allow for predictions over longer time/length-scales or to reduce computational costs. Coarse-grained modeling efforts have similar aims while retaining molecular degrees of freedom. Each seeks lower-fidelity models that are able to accurately predict important statistical moments of the high-fidelity model over longer timescales. As an example, consider a mean-field continuum model derived by coarse-graining a molecular dynamics simulation. Classically, one may pursue homogenization analysis to carefully derive such a continuum model, but such techniques are typically problem-specific and can become technical. We illustrate here how GMLS-Nets can be used to extract a conservative continuum PDE model from particle-level simulation data.

Brownian motion has as its infinitesimal generator the unsteady diffusion equation [3]. As a basic example, we will extract a 1D diffusion equation to predict the long-term density of a cloud of particles undergoing pseudo-1D Brownian motion. We consider the periodic domain Ω = [0, 1] × [0, 0.1], and generate a collection of N_p particles with initial position x_p(t = 0) drawn from the uniform distribution U[0, 0.5] × U[0, 0.1].

Figure 4: GMLS-Nets can be trained with molecular-level data to infer continuum dynamical models. Data are simulations of Brownian motion with periodic boundary conditions on Ω = [0, 1] and diffusivity D = 1 (top-left, unconstrained trajectory). Starting with an initial density given by a Heaviside function, we construct histograms over time to estimate the particle density (upper-right, solid lines) and perform further filtering to remove sampling noise (upper-right, dashed lines). The GMLS-Net is trained using the FVM estimator of equation 10. A predictive continuum model is obtained for the density evolution. Long-term agreement is found between the particle-level simulation (bottom, solid lines) and the inferred continuum model (bottom, dashed lines).

Due to this initialization and domain geometry, the particle density is statistically one-dimensional. We estimate the density field ρ(x, t) along the first dimension by constructing a collection C of N uniform-width cells and building a histogram,
  ρ(x, t) = Σ_{c∈C} Σ_{p=1}^{N_p} 1_{x_p(t)∈c} 1_{x∈c}.   (11)

Here 1_{x∈A} is the indicator function taking unit value for x ∈ A and zero otherwise.

We evolve the particle positions x_p(t) under 2D Brownian motion (the density will remain statistically 1D as the particles evolve). In the limit N_p/N → ∞, the particle density satisfies a diffusion equation, and we can scale the Brownian motion increments to obtain a unit diffusion coefficient in this limit.

As the ratio N_p/N is finite, there is substantial noise in the extracted density field. We obtain a low-pass filtered density, ρ̃(x, t), by convolving ρ(x, t) with a Gaussian kernel of width twice the histogram bin width.

We use the FVM scheme in the same manner as in the previous section. In particular, we regress a flux that matches the increment (ρ̃(x, t = 10) − ρ̃(x, t = 12))/2Δt. This window was selected since the regression at t = 0 is ineffective, as the density approximates a Heaviside function. Such near-discontinuities are poorly represented with polynomials and subsequently not expected to train well. Additionally, we train over a time interval of 2Δt, where in general kΔt steps can be used to help mollify high-frequency temporal noise.
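A sketch of the density estimation and filtering steps (our own illustration; the bin count, normalization, and the filter width σ are assumptions, with σ given in bin units so the Gaussian kernel width is twice the bin width as described above):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def density_estimate(xp, n_cells=64, domain=(0.0, 1.0)):
    """Histogram density of Eqn. 11 from particle positions xp,
    plus the low-pass filtered density rho_tilde."""
    counts, edges = np.histogram(xp, bins=n_cells, range=domain)
    dx = edges[1] - edges[0]
    rho = counts / dx          # so that the integral of rho dx equals N_p
    # Gaussian kernel of width twice the bin width; 'wrap' = periodic
    rho_tilde = gaussian_filter1d(rho.astype(float), sigma=2.0, mode='wrap')
    return rho, rho_tilde
```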
To show how the GMLS-Nets' inferred operator can be used to make predictions, we evolve the regressed FVM for one hundred timesteps and compare to the density field obtained from the particle solver. We apply Dirichlet boundary conditions ρ(0, t) = ρ(1, t) = 1 and initial conditions matching the histogram ρ(x, t = 0). Again, the FVM is by construction conservative, and it is easily shown that ∫_Ω ρ dx = N_p for all t. A time series summarizing the evolution of density in both the particle solver and the regressed continuum model is provided in Fig. 4. While this is a basic example, it illustrates the potential of GMLS-Nets in constructing continuum-level models from molecular data. These techniques could also have an impact on data-driven approaches for numerical methods, such as projective integration schemes.

Image processing: MNIST benchmark.

While image processing is not the primary application area we intend, GMLS-Nets can be used for tasks such as classification. For the common MNIST benchmark task, we compare use of GMLS-Nets with CNNs in Figure 5. The CNNs use kernel size 5, zero-padding, max-pool reduction 2, channel sizes 16, 32, and an FC layer as a linear map to soft-max prediction of the categories. The GMLS-Nets use the same architecture with a GMLS using a polynomial basis of monomials in x, y up to degree p_order = 4.

We find that despite the features extracted by GMLS-Nets being more restricted than those of a general CNN, there is only a modest decrease in accuracy for the basic MNIST task. We do expect larger differences on more sophisticated image tasks. This basic test illustrates how GMLS-Nets with a polynomial basis extract features closely associated with taking derivatives of the data field. We emphasize that for other choices of basis for p∗ and sampling functionals λ_j, other features may be extracted. For polynomials with terms in dictionary order, coefficients are shown in Fig. 5. Notice the clear trends and directional dependence on increases and decreases in the image intensity, indicating c[1] ∼ ∂_x and c[2] ∼ ∂_y. Given the history of PDE modeling, for many classification and regression tasks arising in the sciences and engineering, we expect such derivative-based features extracted by GMLS-Nets will be useful.

Figure 5: MNIST Classification. GMLS-Layers are substituted for convolution layers in a basic two-layer architecture (Conv2d + ReLU + MaxPool + Conv2d + ReLU + MaxPool + FC). The Conv-2L test uses all Conv-Layers, Hybrid-2L has a GMLS-Layer followed by a Conv-Layer, and GMLS-2L uses all GMLS-Layers. GMLS-Nets used a polynomial basis of monomials. The filters in GMLS are by design more limited than a general Conv-Layer and correspond here to estimated derivatives of the data set (top-right). Despite these restrictions, the GMLS-Net still performs reasonably well on this basic classification task (bottom table).

  Case    Conv-2L   Hybrid-2L   GMLS-2L
  MNIST   98.52%    98.41%      96.87%

GMLS-Net on unstructured fluid simulation data.

We consider the application of GMLS-Nets to unstructured data sets representative of scientific machine learning applications. Many hydrodynamic flows can be experimentally characterized using velocimetry measurements. While velocity fields can be estimated even for complex geometries, in such measurements one often does not have direct access to fields such as the pressure. However, integrated quantities of interest, such as drag, are fundamental for performing engineering analysis and yet depend upon both the velocity and pressure. This limits the level of characterization that can be accomplished when using velocimetry data alone. We construct GMLS-Net architectures that allow for prediction of the drag directly from unstructured fluid velocity data, without any direct measurement of the pressure.

We illustrate the ideas using flow past a cylinder of radius L. This provides a well-studied canonical problem whose drag is fully characterized experimentally in terms of the Reynolds number, Re = UL/ν. For incompressible flow past a cylinder, one may apply dimensional analysis to relate the drag F_d to the Reynolds number via the drag coefficient C_d:

  2F_d/(ρU∞²A) = C_d(UL/ν).   (12)
Here U∞ is the free-stream velocity, A is the frontal area of the cylinder, and C_d : R → R. Such analysis requires in practice engineering judgement to identify relevant dimensionless groups. After such considerations, this allows one to collapse the relevant experimental parameters (ρ, U∞, A, L, ν) onto a single curve.

Figure 6: GMLS-Nets are trained on a CFD data set of flow velocity fields. Top: Training set of the drag coefficient plotted as a function of Reynolds number (small black dots), with GMLS-Net predictions for a test set (large red dots). Bottom: Flow velocity fields corresponding to the smallest (left) and largest (right) Reynolds numbers in the test set.

For the purposes of training a GMLS-Net, we construct a synthetic data set by solving the Reynolds-averaged Navier-Stokes (RANS) equations with a steady-state finite volume code. Let L = ρ = 1 and consider U ∈ [0.1, 20] and ν ∈ [10⁻⁸, 10⁻²]. We consider a k−ε turbulence model with inlet conditions consistent with a 10% turbulence intensity and a mixing length corresponding to the inlet size. From the solution, we extract the velocity field u at cell centers to obtain an unstructured point cloud X_h. We compute C_d directly from the simulations. We then obtain an unstructured data set of 400 (u)_i features over X_h, with associated labels C_d. We emphasize that although U∞ and ν are used to generate the data, they are not included as features, and the Reynolds number is therefore hidden.

We remark that the k−ε model is well known to perform poorly for flows with strong curvature, such as recirculation zones. Here, in our proof-of-concept demonstration, we treat the RANS k−ε solution as ground truth for simplicity, despite its shortcomings, and acknowledge that a more physical study would consider ensemble averages of LES/DNS data in 3D. We aim here just to illustrate the potential utility of GMLS-Nets in a scientific setting for processing such unstructured data sets.

As an architecture, we provide two input channels for the two velocity components to three stacked GMLS layers. The first layer acts on the cell centers, and intermediate pooling layers down-sample to random subsets of X_h. We conclude with a linear activation layer to extract the drag coefficient as a single scalar output. We randomly select 80% of the samples for training, and use the remainder as a test set. We quantify performance using the root-mean-square (RMS) error, which we find to be below 1.5%.
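A schematic of the stacked architecture just described (our own sketch; `GMLSLayer2D` denotes a hypothetical two-dimensional, multi-channel analogue of the 1D layer sketched earlier, and the channel widths and pooling fraction are assumptions, not values from the paper):

```python
import torch

class DragNet(torch.nn.Module):
    """Sketch: two velocity channels -> three stacked GMLS layers with
    pooling to random subsets of X_h -> linear readout of scalar C_d."""

    def __init__(self, widths=(2, 16, 16, 16)):
        super().__init__()
        # GMLSLayer2D is hypothetical: a 2D, multi-channel GMLS-Layer
        self.layers = torch.nn.ModuleList(
            GMLSLayer2D(w_in, w_out)
            for w_in, w_out in zip(widths[:-1], widths[1:]))
        self.readout = torch.nn.Linear(widths[-1], 1)

    def forward(self, x, u):           # x: [N, 2] points, u: [N, 2] velocity
        for layer in self.layers:
            u = layer(x, u)
            idx = torch.randperm(x.shape[0])[: x.shape[0] // 2]
            x, u = x[idx], u[idx]      # pool to a random subset of X_h
        return self.readout(u.mean(dim=0))   # global average -> C_d
```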
The excellent predictive capability demonstrated in Fig. 6 highlights GMLS-Nets' ability to provide an effective means of regressing engineering quantities of interest directly from velocity flow data; the GMLS-Net architecture is able to identify a latent low-dimensional parameter space which is typically found by hand using dimensional analysis. This similarity relationship across Reynolds numbers is identified despite the fact that the network does not have direct access to the viscosity parameter. These initial results indicate some of the potential of GMLS-Nets in processing unstructured data sets for scientific machine learning applications.

Conclusions

We have introduced GMLS-Nets for processing scattered data sets leveraging the framework of GMLS. GMLS-Nets allow for generalizing convolutional networks to scattered data, while still benefiting from underlying translational invariances and weight sharing. The GMLS-Layers provide feature extractors that are particularly natural for regressing differential operators, developing dynamical models, and predicting quantities of interest associated with physical systems. GMLS-Nets were demonstrated to be capable of obtaining dynamical models for long-time integration beyond the limits of traditional CFL conditions, of making predictions of the density evolution of molecular systems, and of predicting quantities of interest in fluid mechanics directly from flow data. These initial results indicate some promising capabilities of GMLS-Nets for use in data-driven modeling in scientific machine learning applications.

References

[1] D.S. Broomhead and D. Lowe. "Multivariable Functional Interpolation and Adaptive Networks". In: Complex Systems 2.1 (1988), pp. 321-355.
[2] T. Poggio and F. Girosi. "Networks for approximation and learning". In: Proceedings of the IEEE 78.9 (1990), pp. 1481-1497.
[3] Ioannis Karatzas and Steven E. Shreve. "Brownian Motion and Stochastic Calculus". Springer, 1998, pp. 47-127.
[4] I. E. Lagaris, A. Likas, and D. I. Fotiadis. "Artificial neural networks for solving ordinary and partial differential equations". In: IEEE Transactions on Neural Networks 9.5 (1998), pp. 987-1000.
[5] Holger Wendland. Scattered Data Approximation. Vol. 17. Cambridge University Press, 2004.
[6] Susanne Brenner and Ridgway Scott. The Mathematical Theory of Finite Element Methods. Springer, 2008.
[7] Franco Scarselli et al. "The Graph Neural Network Model". In: IEEE Transactions on Neural Networks 20.1 (Jan. 2009), pp. 61–80. ISSN: 1045-9227.
[8] Joan Bruna et al. "Spectral networks and locally connected networks on graphs". In: International Conference on Learning Representations (ICLR 2014). 2014.
[9] Steven L. Brunton, Joshua L. Proctor, and J. Nathan Kutz. "Discovering governing equations from data by sparse identification of nonlinear dynamical systems". In: Proceedings of the National Academy of Sciences 113.15 (2016), pp. 3932–3937.
[10] Thomas N. Kipf and Max Welling. "Semi-Supervised Classification with Graph Convolutional Networks". In: arXiv abs/1609.02907 (2016).
[11] Federico Monti et al. "Geometric Deep Learning on Graphs and Manifolds Using Mixture Model CNNs". In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017), pp. 5425–5434.
[12] M. M. Bronstein et al. "Geometric Deep Learning: Going beyond Euclidean data". In: IEEE Signal Processing Magazine 34.4 (2017), pp. 18–42. ISSN: 1053-5888. DOI: 10.1109/MSP.2017.2693418.
[13] Charles R. Qi et al. "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation". In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017.
[14] Charles Ruizhongtai Qi et al. "PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space". In: Advances in Neural Information Processing Systems 30. Ed. by I. Guyon et al. Curran Associates, Inc., 2017, pp. 5099–5108.
[15] Samuel H. Rudy et al. "Data-driven discovery of partial differential equations". In: Science Advances 3.4 (2017).
[16] Nathaniel Trask, Mauro Perego, and Pavel Bochev. "A high-order staggered meshless method for elliptic problems". In: SIAM Journal on Scientific Computing 39.2 (2017), A479–A502.
[17] Manzil Zaheer et al. "Deep Sets". In: Advances in Neural Information Processing Systems 30. Ed. by I. Guyon et al. Curran Associates, Inc., 2017, pp. 3391–3401.
[18] P. J. Atzberger. "Importance of the Mathematical Foundations of Machine Learning Methods for Scientific and Engineering Applications". In: SciML2018 Workshop, position paper, https://arxiv.org/abs/1808.02213 (2018).
[19] M. Fey et al. "SplineCNN: Fast Geometric Deep Learning with Continuous B-Spline Kernels". In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018, pp. 869–877.
[20] Paul Allen Kuberry, Pavel B. Bochev, and Kara J. Peterson. A virtual control meshfree coupling method for non-coincident interfaces. Tech. rep. Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States), 2018.
[21] Zichao Long et al. "PDE-Net: Learning PDEs from Data". In: Proceedings of the 35th International Conference on Machine Learning. Ed. by Jennifer Dy and Andreas Krause. Vol. 80. Proceedings of Machine Learning Research. Stockholmsmässan, Stockholm, Sweden: PMLR, 2018, pp. 3208–3216.
[22] Ravi G. Patel and Olivier Desjardins. "Nonlinear integro-differential operator regression with neural networks". In: arXiv abs/1810.08552 (2018).
[23] Maziar Raissi and George Em Karniadakis. "Hidden physics models: Machine learning of nonlinear partial differential equations". In: Journal of Computational Physics 357 (2018), pp. 125–141. ISSN: 0021-9991.
[24] Yifan Xu et al. "SpiderCNN: Deep Learning on Point Sets with Parameterized Convolutional Filters". In: Computer Vision – ECCV 2018. Ed. by Vittorio Ferrari et al. Cham: Springer International Publishing, 2018, pp. 90–105. ISBN: 978-3-030-01237-3.
[25] Nathan Baker et al. Workshop report on basic research needs for scientific machine learning: Core technologies for artificial intelligence. Tech. rep. USDOE Office of Science (SC), Washington, DC (United States), 2019.
[26] Yohai Bar-Sinai et al. "Learning data-driven discretizations for partial differential equations". In: Proceedings of the National Academy of Sciences 116.31 (2019), pp. 15344–15349. ISSN: 0027-8424. DOI: 10.1073/pnas.1814058116.
[27] M. Raissi, P. Perdikaris, and G. E. Karniadakis. "Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations". In: Journal of Computational Physics 378 (2019), pp. 686–707.
[28] Hugues Thomas et al. "KPConv: Flexible and deformable convolution for point clouds". In: Proceedings of the IEEE International Conference on Computer Vision. 2019, pp. 6411–6420.
[29] B. J. Gross et al. "Meshfree methods on manifolds for hydrodynamic flows on curved surfaces: a generalized moving least-squares (GMLS) approach". In: Journal of Computational Physics (2020), p. 109340.
[30] Nathaniel Trask, Pavel Bochev, and Mauro Perego. "A conservative, consistent, and scalable meshfree mimetic method". In: Journal of Computational Physics 409 (2020), p. 109187.
Derivation of Gradients of the Operator $\tau_{x_i}[u]$

Parameters of the operator $\tilde{\tau}$.

We give here some details on the derivation of the gradients for the learnable GMLS operator $\tau[u]$ and intermediate steps. These can be used in implementations for back-propagation and other applications.

GMLS works by mapping data to a local polynomial fit in a region $\Omega_i$ around $x_i$ with $p^*(x) \approx u(x)$ for $x \in \Omega_i$. To find the optimal fitting polynomial $p^*(x) \in V$ for the function $u(x)$, we can consider the case with $\lambda_j(x) = \delta(x - x_j)$ and weight function $w_{ij} = w(x_i - x_j)$. In a region around a reference point $x^*$, the optimization problem can be expressed parametrically in terms of coefficients $a$ as
$$
a^*(x_i) = \underset{a \in \mathbb{R}^m}{\arg\min} \sum_j \left( u_j - p(x_j)^T a \right)^2 w_{ij}.
$$
We write for short $p(x_j) = p(x_j, x_i)$, where the basis elements in fact do depend on $x_i$. Typically, for polynomials, we just use $p(x_j, x_i) = p(x_j - x_i)$. This is important in the case we want to take derivatives of the expressions in the input values $x_i$.

We can compute the derivative of the least-squares objective $J$ in each component $a_\ell$ to obtain
$$
\frac{\partial J}{\partial a_\ell}(x_i) = 0.
$$
This implies
$$
\left[ \sum_j p(x_j) w_{ij} p(x_j)^T \right] a = \sum_j w_{ij} p(x_j) u_j.
$$
Let
$$
M = \sum_j p(x_j) w_{ij} p(x_j)^T, \qquad r = \sum_j w_{ij} p(x_j) u_j;
$$
then we can rewrite the coefficients as the solution of the linear system
$$
M a^*(x_i) = r.
$$
This is sometimes written more explicitly for analysis and computations as
$$
a^*(x_i) = M^{-1} r.
$$
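As a concrete illustration of this assembly, the following NumPy sketch builds $M$ and $r$ and solves $M a^*(x_i) = r$ on a 1D point cloud. The quadratic basis $p(x_j, x_i) = (1,\, x_j - x_i,\, (x_j - x_i)^2)$ and the particular compactly supported weight used here are illustrative assumptions, not choices mandated by the framework.

```python
import numpy as np

def poly_basis(dx):
    """Quadratic monomial basis p(x_j, x_i) = p(x_j - x_i) in 1D."""
    return np.stack([np.ones_like(dx), dx, dx**2], axis=-1)   # (n_nbrs, m)

def weight(dx, eps):
    """Smooth, compactly supported weight w(x_i - x_j); an illustrative choice."""
    s = np.abs(dx) / eps
    return np.where(s < 1.0, (1.0 - s)**4, 0.0)

def gmls_fit(x_i, x_nbrs, u_nbrs, eps):
    """Solve M a* = r for the local polynomial coefficients a*(x_i)."""
    dx = x_nbrs - x_i
    P = poly_basis(dx)                    # (n, m)
    w = weight(dx, eps)                   # (n,)
    M = (P * w[:, None]).T @ P            # M = sum_j p w_ij p^T, shape (m, m)
    r = (P * w[:, None]).T @ u_nbrs       # r = sum_j w_ij p u_j, shape (m,)
    return np.linalg.solve(M, r)          # a*(x_i) = M^{-1} r

# Example: the coefficients recover derivatives of a smooth field at x_i.
x = np.sort(np.random.default_rng(1).uniform(0, 1, 50))
u = np.sin(2 * np.pi * x)
a_star = gmls_fit(0.5, x, u, eps=0.2)
# a_star[1] approximates u'(0.5); a_star[2] approximates u''(0.5) / 2.
```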
We can represent a general linear operator $\tilde{\tau}(x_i)$ using the $a^*$ representation as
$$
\tilde{\tau}(x_i) = q(x_i)^T a^*(x_i).
$$
Typically, the weights will not be spatially dependent, $q(x_i) = q_0$. Throughout, we shall denote this simply as $q$ and assume there is no spatial dependence, unless otherwise indicated.
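Reusing `gmls_fit` from the sketch above, a GMLS layer then applies the mapping $q$ pointwise to the local coefficients. This is a minimal sketch; the two-layer MLP is a hypothetical stand-in for the nonlinear case $\tilde{\tau}(x_i) = q(a(x_i))$.

```python
def gmls_layer_linear(points, u, q0, eps):
    """Linear GMLS layer: tau~(x_i) = q0^T a*(x_i) with constant q(x_i) = q0."""
    return np.array([q0 @ gmls_fit(xi, points, u, eps) for xi in points])

def gmls_layer_mlp(points, u, W1, b1, W2, b2, eps):
    """Nonlinear GMLS layer: tau~(x_i) = q(a*(x_i)) with q a small MLP."""
    A = np.array([gmls_fit(xi, points, u, eps) for xi in points])  # (N, m)
    H = np.tanh(A @ W1 + b1)    # hidden layer acting on coefficient channels
    return H @ W2 + b2          # per-point operator estimate, shape (N, k)
```

For the quadratic basis above, $a^*_2(x_i) \approx u''(x_i)/2$, so the constant choice `q0 = np.array([0.0, 0.0, 2.0])` makes the linear layer an estimator of the 1D Laplacian.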

Derivatives of $\tilde{\tau}$ in $x_i$, $a(x_i)$, and $q$.

The derivative in $x_i$ is given by
$$
\frac{\partial}{\partial x_i} a^*(x_i) = \frac{\partial M^{-1}}{\partial x_i} r + M^{-1} \frac{\partial r}{\partial x_i}.
$$
In the notation, we denote $p(x_j) = p(x_j, x_i)$, where the basis elements in fact can depend on the particular $x_i$. These terms can be expressed as
$$
\frac{\partial M^{-1}}{\partial x_i} = -M^{-1} \frac{\partial M}{\partial x_i} M^{-1},
$$
where
$$
\frac{\partial M}{\partial x_i} = \sum_j \left[ \left( \frac{\partial}{\partial x_i} p(x_j, x_i) \right) p(x_j, x_i)^T w_{ij}
+ p(x_j, x_i) \left( \frac{\partial}{\partial x_i} p(x_j, x_i) \right)^T w_{ij}
+ p(x_j, x_i) p(x_j, x_i)^T \frac{\partial w_{ij}}{\partial x_i} \right].
$$
The derivatives in $r$ are given by
$$
\frac{\partial r}{\partial x_i} = \sum_j \left[ \left( \frac{\partial}{\partial x_i} p(x_j) \right) u_j w_{ij} + p(x_j) u_j \frac{\partial w_{ij}}{\partial x_i} \right].
$$
The full derivative of the linear operator $\tilde{\tau}$ can be expressed as
$$
\frac{\partial}{\partial x_i} \tilde{\tau}(x_i) = \left( \frac{\partial}{\partial x_i} q(x_i)^T \right) a^*(x_i) + q(x_i)^T \left( \frac{\partial}{\partial x_i} a^*(x_i) \right).
$$
In the constant case $q(x_i) = q_0$, the derivative of $\tilde{\tau}$ simplifies to
$$
\frac{\partial}{\partial x_i} \tilde{\tau}(x_i) = q_0^T \left( \frac{\partial}{\partial x_i} a^*(x_i) \right).
$$
The derivatives of the other terms follow more readily. For the derivative of the linear operator $\tilde{\tau}$ in the coefficients $a(x_i)$, we have
$$
\frac{\partial}{\partial a(x_i)} \tilde{\tau}(x_i) = q(x_i).
$$
For derivatives of the linear operator $\tilde{\tau}$ in the mapping coefficient values $q$, we have
$$
\frac{\partial}{\partial q(x_i)} \tilde{\tau}(x_i) = a(x_i).
$$
In the case of nonlinear operators $\tilde{\tau} = q(a(x_i))$, there are further dependencies beyond just $x_i$ and $a(x_i)$, and less explicit expressions. For example, when using MLPs there may be a hierarchy of trainable weights $w$. The derivatives of the nonlinear operator can be expressed as
$$
\frac{\partial}{\partial w} \tilde{\tau}(x_i) = \frac{\partial q}{\partial w}(a(x_i)).
$$
Here, one relies on back-propagation algorithms for evaluation of $\partial q / \partial w$. Similarly, given the generality of $q(a)$, for derivatives in $a$ and $x_i$ one can use back-propagation methods on $q$ and the chain rule with the expressions derived in the linear case for the $a$ and $x_i$ dependencies.
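These expressions can be checked numerically. The sketch below implements the closed-form derivative, in the equivalent form $\partial a^*/\partial x_i = M^{-1}\left(\partial r/\partial x_i - (\partial M/\partial x_i)\, a^*\right)$, for the illustrative quadratic basis and quartic weight used earlier, and compares it against a centered finite difference; `poly_basis`, `gmls_fit`, `x`, and `u` are reused from the first sketch.

```python
def grad_a_star(x_i, x_nbrs, u_nbrs, eps):
    """Closed-form d a*/d x_i = M^{-1} (dr/dx_i - (dM/dx_i) a*),
    equivalent to (dM^{-1}/dx_i) r + M^{-1} dr/dx_i."""
    dx = x_nbrs - x_i
    P = poly_basis(dx)                                  # p(x_j - x_i)
    dP = np.stack([np.zeros_like(dx),                   # d/dx_i of each basis element,
                   -np.ones_like(dx),                   # using d(dx)/dx_i = -1
                   -2.0 * dx], axis=-1)
    s = np.abs(dx) / eps
    w = np.where(s < 1.0, (1.0 - s) ** 4, 0.0)
    dw = np.where(s < 1.0, 4.0 * (1.0 - s) ** 3 * np.sign(dx) / eps, 0.0)

    M = (P * w[:, None]).T @ P
    r = (P * w[:, None]).T @ u_nbrs
    a_star = np.linalg.solve(M, r)

    # dM/dx_i = sum_j [ dp p^T w + p dp^T w + p p^T dw ]
    dM = (dP * w[:, None]).T @ P + (P * w[:, None]).T @ dP + (P * dw[:, None]).T @ P
    # dr/dx_i = sum_j [ dp u_j w + p u_j dw ]
    dr = (dP * w[:, None]).T @ u_nbrs + (P * dw[:, None]).T @ u_nbrs
    return np.linalg.solve(M, dr - dM @ a_star)

# Centered finite-difference check of the closed-form gradient.
h = 1e-6
fd = (gmls_fit(0.5 + h, x, u, 0.2) - gmls_fit(0.5 - h, x, u, 0.2)) / (2.0 * h)
print(np.max(np.abs(grad_a_star(0.5, x, u, 0.2) - fd)))  # small; finite-difference error only
```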