         Groundwater Flow Meta-model for Multilevel Monte Carlo Methods

                                                        Martin Špetlík, Jan Březina

                  Technical University of Liberec, Faculty of Mechatronics, Informatics and Interdisciplinary Studies,
                                        Institute of New Technologies and Applied Informatics
                                         Studentská 1402/2, 461 17 Liberec 1, Czech Republic
                                          martin.spetlik@tul.cz, jan.brezina@tul.cz

Abstract: This paper presents a meta-modeling-based technique to reduce the computational cost of Monte Carlo methods. A stochastic simulation of groundwater flow is substituted with a graph neural network meta-model. This type of neural network can deal with the non-Euclidean structure of the input data, which in our case is a graph representing a random field on an unstructured mesh. In order to find the most suitable meta-model, a comparison of the standard support vector regression with spectral and spatial graph convolutional neural networks is provided. Both the Monte Carlo method and the multilevel Monte Carlo method are extended by the meta-model level. In both cases, up to 50% savings in computational cost are achieved while maintaining the accuracy of Monte Carlo estimates.

1 Introduction

Groundwater flow in the vicinity of a future nuclear waste repository is the major motivation for our research. In the first stage, our quantity of interest (QoI) is the total flow through the observed area. However, since not all indispensable properties of the rock environment are known, it is not feasible to determine the desired total flow directly. For that reason, and given the nature of the rock, the missing properties are modeled as random fields (RFs). The probability density function (PDF) of our QoI is under scrutiny. To approximate the PDF, the maximum entropy method is adopted. The method utilizes so-called generalized statistical moments. In order to estimate these moments, Monte Carlo methods are employed. These methods consist in repeating a random experiment. Depending on the required accuracy, this might result in thousands of groundwater flow simulations, which can significantly affect the total computational cost.

This fact was behind the idea to substitute simulations with an approximation model, often called a meta-model. Furthermore, we use the multilevel Monte Carlo method (MLMC) instead of the standard Monte Carlo method to reduce the variance of estimates more effectively. Since the MLMC uses simulations of different accuracy, we aim to substitute those with the lowest accuracy, which are performed in the largest number. Their approximation should not be excessively challenging for meta-modeling. We presume that this approach could reduce the computational cost of MLMC estimates. It is also necessary to maintain the accuracy of the estimates to achieve a good PDF approximation. Graph convolutional neural networks and support vector regression [2] are used as meta-models.

First, the groundwater flow problem definition and a brief introduction to the maximum entropy method are provided in section 2. Then, Monte Carlo methods are presented in section 3. Spectral and spatial graph convolutional neural networks used as meta-models are delineated in section 5. Section 6 provides the link between the MLMC and a meta-model, including the related work. Finally, section 7 is devoted to the results and discussion.

Copyright ©2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2 Problem Definition

A 2D benchmark problem of groundwater flow through a porous medium is used to test our proposed methods. The same problem can be found in the paper by Blaheta et al. [5]. The groundwater flow is described by the boundary value problem on the unit square:

    −div(K(x)∇p) = 0,
    −K∇p ⋅ n⃗ = 0,    (1)

where K is the hydraulic conductivity, p is the pore water pressure, −K∇p is Darcy's velocity, and n⃗ is the unit normal vector. We are interested in the total flow Y through the specified area

    Y = ∫₀¹ [−K∇p ⋅ n⃗](1, y) dy.    (2)

The problem is prescribed on unstructured meshes and is solved by the finite element method using the Flow123d software [6]. Since the proposed approach shall be adopted for more complicated problems with complex irregular geometry, unstructured meshes are preferred over structured ones. The hydraulic conductivity is considered as a random field (RF) with the exponential covariance function C(r) = σ² exp(−r/λ), where λ is a correlation length and σ is a standard deviation. The GSTools software library [23] is utilized to generate RFs; see Hesse et al. [16] for theoretical details.

Figure 1: An illustration of the porous media flow problem
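To make this step concrete, the following minimal sketch samples a field with the stated covariance at the element centers of a mesh using GSTools; the coordinates, variance, correlation length, and seed are hypothetical placeholders, not values taken from the paper:

```python
import gstools as gs

# Exponential covariance C(r) = sigma^2 * exp(-r / lambda);
# var and len_scale correspond to sigma^2 and lambda.
model = gs.Exponential(dim=2, var=1.0, len_scale=0.1)
srf = gs.SRF(model, seed=42)

# hypothetical element-center coordinates of an unstructured mesh
x = [0.12, 0.47, 0.85]
y = [0.33, 0.61, 0.09]

field = srf((x, y))  # one field realization evaluated at the given points
```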
2.1 Maximum Entropy Method

When considering the hydraulic conductivity as a random field, the total flow Y is a real random variable. The aim is to determine its probability density function ρ(Y), which is nonnegative on a bounded domain Ω. The maximum entropy method [17] (MEM) is employed for this purpose. Its basic idea lies in approximating a PDF from generalized statistical moments and the functions that calculate them:

    ∫_Ω ρ(y) dy = 1,
    ∫_Ω φ_r(y) ρ(y) dy = µ_r,    r = 1, ..., R,    (3)

where ρ(Y) is unknown, {µ_r}_{r=1}^R are the values of the generalized moments, and {φ_r}_{r=1}^R are linearly independent functions (see [3, p. 1350]) used for calculating the generalized moments, satisfying φ_1 = 1, φ_r ∈ C^R(Ω), r = 2, ..., R. In our case, {φ_r}_{r=1}^R are the Legendre polynomials, and R ∈ ℕ is the number of moments.

Provided that the system of equations 3 has a solution, it has an infinite number of them. Therefore, the most apposite solution is the one with the maximum Shannon entropy (defined in [27]), according to E. T. Jaynes [17, p. 623]. Determining ρ(Y) with the maximum entropy takes the form of finding the global maximum of the functional

    H(ρ) = −∫_Ω ρ(y) ln(ρ(y)) dy    (4)

under the constraints prescribed in equations 3.

A numerical solution and general limitations of the MEM are beyond the scope of this paper. A more detailed insight is provided in [3, 4].
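For illustration only, one common way to solve this constrained problem numerically is through its convex dual in the Lagrange multipliers. The following minimal sketch assumes a hypothetical domain Ω = [−1, 1], a simple quadrature grid, and hypothetical target moments:

```python
import numpy as np
from numpy.polynomial import legendre
from scipy.optimize import minimize

# A minimal MEM sketch via the convex dual: find multipliers lam so that
# rho(y) = exp(sum_r lam_r phi_r(y)) / Z matches the target moments.
# The grid, R, and the moment values mu are hypothetical placeholders.

y = np.linspace(-1.0, 1.0, 2001)   # quadrature grid on Omega = [-1, 1]
dy = y[1] - y[0]
R = 4
phi = np.stack([legendre.Legendre.basis(r)(y) for r in range(1, R)])
mu = np.array([0.0, 0.1, 0.0])     # hypothetical moments for r = 1, 2, 3

def dual(lam):
    # log-partition term minus the linear moment term (convex in lam)
    log_z = np.log(np.sum(np.exp(lam @ phi)) * dy)
    return log_z - lam @ mu

lam = minimize(dual, np.zeros(R - 1), method="BFGS").x
rho = np.exp(lam @ phi)
rho /= rho.sum() * dy              # normalized maximum-entropy PDF
```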
3 Monte Carlo Methods

The MEM utilizes moment values {µ_r}_{r=1}^R that are themselves random variables derived from some random input X (which is the hydraulic conductivity K in this study), so it is necessary to estimate their expected values. Monte Carlo methods are used for this.

The standard Monte Carlo method (MC) is an approach for estimating expected values of some stochastic simulation variable. The basic idea comes from the strong law of large numbers. Therefore, in order to obtain an unbiased estimate of the expected value Y = E[P(x)], the MC consists in the arithmetic mean of N independent samples

    Ŷ = (1/N) ∑_{n=1}^{N} P(x_n),    (5)

where P is a random variable depending on X. According to the central limit theorem:

    Ŷ ∼ N(µ, σ²/N).    (6)

Consequently, the computational cost of reducing the variance (increasing N) of Ŷ can be very high. To overcome this drawback (for details, see [11, p. 4]), the multilevel Monte Carlo method (MLMC) was formulated.

3.1 Multilevel Monte Carlo Method

In the case of the MLMC [11], the expected value of a random variable P is estimated based on the sequence of its approximations P_1, ..., P_L:

    E[P] = E[P_1] + ∑_{l=2}^{L} E[P_l − P_{l−1}],    (7)

where P_{l−1} ≈ P_l. The accuracy of the approximation of P improves from P_1 to P_L. The unbiased estimate of E[P] is

    P̂ = (1/N_1) ∑_{n=1}^{N_1} P_1(x_n^1) + ∑_{l=2}^{L} { (1/N_l) ∑_{n=1}^{N_l} (P_l(x_n^l) − P_{l−1}(x_n^l)) },    (8)

where L is the total number of levels and N_l is the number of simulation samples at level l. The input random data x_n^l are dependent within each simulation pair n at level l, but they are independent across levels. Since the differences P_l − P_{l−1} are also independent across levels, the total estimated variance V̂ of P̂ has the following form:

    V̂ = ∑_{l=1}^{L} V̂_l / N_l,    (9)

where V̂_1 is an estimate of the variance of P_1 and V̂_l for l > 1 is an estimate of the variance of P_l − P_{l−1}. The MLMC computational cost is

    C = ∑_{l=1}^{L} N_l C_l,    (10)

where C_l is the cost of a single simulation sample at level l measured in, for instance, the number of computational operations, execution time, CPU time, etc. Let C_1 denote the cost of P_1; C_l is the cost of P_l − P_{l−1} for l > 1.

Given the MLMC theoretical properties stated in [11, theorem 1], the variance V_l should decrease from l = 1 to l = L, while the computational cost C_l should increase from l = 1 to l = L.
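The following minimal sketch evaluates the estimates (8) and (9) for a two-level case; the fine and coarse simulations are stand-in functions, not the groundwater flow model:

```python
import numpy as np

rng = np.random.default_rng(0)

def P_coarse(x):   # stand-in for the cheap, low-accuracy simulation P_1
    return x + 0.3 * np.sin(x)

def P_fine(x):     # stand-in for the accurate simulation P_2
    return x + 0.3 * np.sin(x) + 0.05 * x**2

N1, N2 = 2000, 100
x1 = rng.standard_normal(N1)   # level-1 inputs
x2 = rng.standard_normal(N2)   # level-2 inputs, shared within each pair

level1 = P_coarse(x1)                 # P_1 samples
level2 = P_fine(x2) - P_coarse(x2)    # correlated pair => small variance

P_hat = level1.mean() + level2.mean()                      # equation (8)
V_hat = level1.var(ddof=1) / N1 + level2.var(ddof=1) / N2  # equation (9)
```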
3.2 Number of Simulation Samples

The way of determining N_l is a crucial part of the MLMC. There are two main approaches:

• minimizing the total variance V with respect to a prescribed computational cost,
• minimizing the total computational cost C with respect to a prescribed target variance V_t.

Our attention is paid to the latter approach, which has the form of finding the minimum of equation 10 under the constraint

    V_t = ∑_{l=1}^{L} V̂_l / N_l.    (11)

After some calculus, and concerning the use of the MEM, the N_l are determined with respect to the R moment values:

    N_l^r = √(V̂_l^r / C_l) ⋅ (∑_{i=1}^{L} √(V̂_i^r C_i)) / V_t,    r = 1, ..., R,    (12)

where V̂_l^r is the estimated variance of φ_r^l(x) − φ_r^{l−1}(x) for the r-th moment at level l. Finally, N_l = max_{r=1,...,R} N_l^r.

The mlmc software library [7] is employed to schedule samples and post-process results, including our MEM implementation.
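A direct sketch of this allocation, with hypothetical variances and costs, could look as follows:

```python
import numpy as np

# V_hat[l, r]: estimated variance of the r-th moment difference at level l;
# C[l]: per-sample cost at level l; V_t: target variance. All hypothetical.
V_hat = np.array([[1e-2, 4e-3],
                  [1e-3, 5e-4],
                  [1e-4, 6e-5]])      # shape (L, R) with L = 3, R = 2
C = np.array([0.1, 1.0, 10.0])
V_t = 1e-5

total = np.sqrt(V_hat * C[:, None]).sum(axis=0)            # sum_i sqrt(V_i^r C_i)
N_lr = np.sqrt(V_hat / C[:, None]) * total[None, :] / V_t  # equation (12)
N_l = np.ceil(N_lr.max(axis=1)).astype(int)                # N_l = max_r N_l^r
```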


4 Meta-modeling

A meta-model is a simpler, explicit mathematical function that approximates a complicated function, which can be implicit and evaluated by a simulation model, measurement data, or experiments. Alternative names can be found in the literature, such as surrogate model, response surface model, emulator, etc. Meta-modeling is the process of designing meta-models. The common meta-models used in finite element analysis include support vector regression (SVR), artificial neural networks (ANNs), Kriging, also known as Gaussian process regression [19], radial basis functions (RBF) [35], etc.

Artificial neural networks have become very popular for meta-model design. A regression problem is solved by an ANN trained by supervised learning. The aim is to obtain an unknown mapping of input neuron activities to an output neuron activity. With regard to the nature of our input data, which is a 2D correlated random field, it would be suitable to use a convolutional neural network (CNN) meta-model. Nevertheless, CNNs cannot be applied directly to data on unstructured meshes (see [31]).

There are a few ways to overcome this difficulty. The basic solution is to directly apply a 1D CNN to the nodal values of an unstructured mesh considered as a vector. However, it has limited success [15]. Another option is interpolating data from an unstructured mesh to a structured mesh and then using a CNN. This approach leads to an additional error caused by the interpolation, and it requires a larger number of grid points to capture features from the original unstructured data, according to [22]. Wang et al. [29] proposed a CNN on unstructured meshes. This novel approach is limited to random fields described by the spectral representation method or the Karhunen-Loève expansion. Another popular approach is the representation of unstructured meshes as undirected graphs, which allows us to use graph convolutional neural networks (GCNs) [31]. In this study, we use GCN-based meta-models and support vector regression meta-models.

4.1 Graph Representing a Random Field

A graph G = (V, E) is an unweighted undirected graph describing the mesh structure on which a random field is generated. V is a set of vertices representing mesh elements, V = {v_1, v_2, ..., v_S}, where S is the number of vertices (= mesh elements). The neighborhood of a vertex v is defined as N(v) = {u ∈ V | (v, u) ∈ E}. Each vertex has a feature representing the random field value at the corresponding mesh element. E is a set of edges; an edge connects adjacent mesh elements. An adjacency matrix A is used to represent G on a computer.

Figure 2: An example of a graph on 546 mesh elements
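For a 2D triangular mesh, such an adjacency matrix can be assembled from the element connectivity; the following sketch uses a hypothetical three-element mesh, with two elements considered adjacent if they share a mesh edge:

```python
import numpy as np
from itertools import combinations
from collections import defaultdict

# hypothetical mesh: node ids of each triangular element
triangles = np.array([[0, 1, 2], [1, 2, 3], [2, 3, 4]])

# map every mesh edge to the elements containing it
edge_to_elements = defaultdict(list)
for e, tri in enumerate(triangles):
    for edge in combinations(sorted(tri), 2):
        edge_to_elements[edge].append(e)

S = len(triangles)
A = np.zeros((S, S))
for elems in edge_to_elements.values():
    for i, j in combinations(elems, 2):  # elements sharing an edge are adjacent
        A[i, j] = A[j, i] = 1.0
```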
5 Graph Neural Networks

A variety of real-world problems can be represented as graphs. Imagine social networks, molecules, or, in our case, random fields on unstructured meshes.

Graph neural networks (GNNs) are deep learning-based models that operate on the graph domain [34]. They have a wide range of applications in classification, relation extraction, molecular fingerprint prediction, and so on. A comprehensive survey on graph neural networks can be found in [30]. In brief, GNNs are categorized into several groups, such as graph convolutional networks [33], graph attention networks, graph recurrent networks, etc. The graph convolutional networks (GCNs) are the most important ones because they are fundamental to other graph neural network models (see [21]). GCNs can be divided into spectral GCNs and spatial GCNs.

5.1 Spectral Graph Convolutional Neural Networks

Spectral GCNs are based on knowledge from the field of graph signal processing. The well-known convolution has the following form:

    (f ∗ g)(x) = ∫_{ℝ^k} f(t) g(x − t) dt,    (13)

but it is unclear how to interpret the translation g(x − t) for a graph signal. Thus the convolution operation is not defined on structures like graphs. W. Hamilton [13] and K. Otness [24] provide a detailed explanation of the graph convolution. The basic idea is to take advantage of the fact that convolution in the spatial domain corresponds to pointwise multiplication in the spectral domain.
A graph signal x ∈ ℝ^S (the vector of all G vertex values) is transformed by a graph Fourier transform from the spatial domain to the spectral domain. Loosely speaking, the standard Fourier transform is connected to the eigendecomposition of the Laplace operator. In the case of graphs, the Laplace operator is represented by the Laplacian matrix [33, p. 4]. The normalized Laplacian matrix L can also be used as a Laplace operator:

    L = I − D^{−1/2} A D^{−1/2},    (14)

where A is a graph adjacency matrix and D is a diagonal matrix of vertex degrees. L is a symmetric real-valued positive semi-definite matrix that can be factorized as L = UΛU^T, where U is a matrix of eigenvectors and Λ is a diagonal matrix of eigenvalues. Then the graph Fourier transform of x is [34, p. 60]:

    F(x) = U^T x    (15)

and its inverse:

    F^{−1}(x̂) = U x̂.    (16)

Finally, the graph convolution of x and a filter g ∈ ℝ^S is

    x ∗_G g = F^{−1}(F(x) ⊙ F(g)),    (17)

where ∗_G is the convolutional operator on a graph and ⊙ represents the element-wise Hadamard product. The signal x can be filtered in the spectral domain by a filter g_θ in this way (for details, see [13, p. 83]):

    x ∗_G g_θ = U g_θ(Λ) U^T x,    (18)

where g_θ(Λ) is a polynomial of the eigenvalues of the Laplacian. The filter represents learnable weights.

ChebNet. A GCN with a Chebyshev convolutional layer was first introduced by Defferrard et al. [10] in 2016. The filter g is approximated by a truncated expansion of Chebyshev polynomials T_k(x) up to the K-th order:

    g_θ = ∑_{k=0}^{K−1} θ_k T_k(Λ̃),    (19)

where θ ∈ ℝ^K is a vector of Chebyshev coefficients, Λ̃ = 2Λ/λ_max − I, and λ_max is the maximum eigenvalue of Λ. The Chebyshev polynomials are defined recursively by T_{k≥2}(x) = 2x T_{k−1}(x) − T_{k−2}(x) with T_0(x) = 1, T_1(x) = x. Then x filtered by g is

    x ∗_G g = U (∑_{k=0}^{K−1} θ_k T_k(Λ̃)) U^T x.    (20)

The eigendecomposition can be avoided [30, p. 10]:

    x ∗_G g = (∑_{k=0}^{K−1} θ_k T_k(L̃)) x,    (21)

where L̃ = 2L/λ_max − I. L̃ is called the rescaled graph Laplacian: its eigenvalues are mapped from [0, λ_max] to [−1, 1], where the Chebyshev polynomials form an orthogonal basis.

Several settings of K were tested. Since K > 1 did not bring improvement for our problem, K = 1 is used in our study. For this setting, the ChebNet is very similar to the GCN proposed by Kipf and Welling [18]. Filters are exactly K-localized, meaning a filter modifies the information at a particular vertex based on the information from vertices in its K-hop neighborhood. This essentially connects spectral-based methods with spatial-based ones.
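The following sketch ties equations (14) and (21) together on a toy graph; it uses K = 2 to show one Chebyshev recursion step beyond the K = 1 setting adopted in our study, and the adjacency matrix, signal, and coefficients are hypothetical:

```python
import numpy as np

# Toy graph: a path on three vertices; x holds a field value per vertex.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
x = np.array([0.2, -1.3, 0.7])

d = A.sum(axis=1)                               # vertex degrees
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L = np.eye(3) - D_inv_sqrt @ A @ D_inv_sqrt     # equation (14)

lam_max = np.linalg.eigvalsh(L).max()
L_tilde = 2.0 * L / lam_max - np.eye(3)         # rescaled graph Laplacian

theta = np.array([0.5, -0.1])                   # learnable coefficients
T0, T1 = x, L_tilde @ x                         # T_0(L~)x and T_1(L~)x
y = theta[0] * T0 + theta[1] * T1               # equation (21) with K = 2
```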
5.2 Spatial Graph Convolutional Neural Networks

As for spatial GCNs [9], the convolution is performed in the graph domain by propagating information along edges between adjacent vertices. The concept is based on message passing neural networks [12]. The spatial graph convolution is defined as follows [30, p. 12]:

    h_v^{(k)} = U_k(h_v^{(k−1)}, ∑_{u ∈ N(v)} M_k(h_v^{(k−1)}, h_u^{(k−1)}, x_{vu}^e)),    (22)

where h_v^{(k)} represents the features of vertex v in a hidden layer k, and x_{vu}^e is an optional feature vector of the edge (v, u). U_k and M_k are the update and message functions with learnable parameters.

GraphSage is an aggregation-based learning model proposed by Hamilton et al. [14]. It performs the convolution as follows:

    h_v^{(k)} = σ(W^{(k)} ⋅ f_k(h_v^{(k−1)}, {h_u^{(k−1)} ∀u ∈ S_{N(v)}})),    (23)

where h_v^{(0)} is a representation vector of the features of vertex v, σ represents a nonlinear activation function, W^{(k)} is a weight matrix in layer k, f_k represents an aggregation function, and S_{N(v)} ⊆ N(v). Thus GraphSage enables the use of huge graphs by selecting a subset from each vertex neighbourhood instead of using all neighbours. Different operations can be used as f_k, such as average, sum, max, min, etc. In this study, we use GraphSage with a single layer, and f_k is the summation.

A comparison between spectral GCNs and spatial GCNs is summarized by Wu et al. [30, p. 13]. They state that spatial GCNs are usually preferred over spectral GCNs. However, in our problem from section 2, we aim to approximate the solution of an elliptic partial differential equation. To do that numerically, it is possible to use spectral methods [8]. This entitles us to assume that spectral GCNs will be more appropriate.
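As a concrete illustration of equation 23, the following sketch implements one possible reading of a single GraphSage layer with sum aggregation over the full neighbourhood, where the vertex's own features are concatenated to the aggregate; all values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.array([[0.2], [-1.3], [0.7]])      # one input feature per vertex
W = rng.normal(size=(2, 8))               # weights for [self, aggregate]

agg = A @ H                               # f_k: summation over N(v)
Z = np.concatenate([H, agg], axis=1) @ W  # apply W^(k)
H_next = np.maximum(Z, 0.0)               # sigma: ReLU
```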
6 Monte Carlo Methods with Meta-model

There are several approaches to using meta-models to reduce the computational cost of Monte Carlo estimates. A relatively frequent approach (e.g., [1, 20, 28]) is to construct a meta-model of a simulation and then conduct the MC with the meta-model instead of the simulation. Other approaches consist in replacing the whole Monte Carlo estimate by a meta-model. For instance, Safta et al. [26] use a polynomial chaos expansion meta-model, which requires a significantly lower number of samples than the MC. Rosenbaum and Staum [25] construct a stochastic simulation meta-model by way of the MLMC.

We propose a different approach based on adding a new coarse meta-model-based level to the original Monte Carlo method (MC or MLMC). The MLMC estimate from equation 8 now has the following form:

    P̂_meta = (1/N_1) ∑_{n=1}^{N_1} P̃_1(x_n^1) + (1/N_2) ∑_{n=1}^{N_2} (P_1(x_n^2) − P̃_1(x_n^2)) + ∑_{l=3}^{L} { (1/N_l) ∑_{n=1}^{N_l} (P_l(x_n^l) − P_{l−1}(x_n^l)) },    (24)

where P̃_1 denotes a meta-model approximation of P_1. Since the MLMC assumes P_{l−1} ≈ P_l, it is completely valid to employ a meta-model as the coarsest level (which we denote as the meta-level) and the difference between P_1 and P̃_1 as the subsequent first level. Let MLMC_meta denote the multilevel Monte Carlo method with the meta-level.

In order to meet the basic theoretical properties of the MLMC (mentioned in section 3.1), the meta-level computational cost C_1 should be lower than C_2, and the meta-level sample variance V_1 should be greater than V_2.
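The following minimal sketch shows the meta-level construction of equation 24 for the simplest case L = 2; both the simulation and the meta-model are stand-in functions:

```python
import numpy as np

rng = np.random.default_rng(1)

def P1(x):        # the original coarse simulation
    return x + 0.3 * np.sin(x)

def P1_meta(x):   # stand-in for the trained meta-model approximating P_1
    return x + 0.28 * np.sin(x)

N1, N2 = 20000, 2000
x_meta = rng.standard_normal(N1)   # meta-level inputs: cheap to evaluate
x_pair = rng.standard_normal(N2)   # shared inputs for the correction pair

# meta-level mean plus the P_1 - P~_1 correction, cf. equation (24)
P_hat_meta = P1_meta(x_meta).mean() + (P1(x_pair) - P1_meta(x_pair)).mean()
```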
6.1 Computational cost

The total MLMC cost (see equation 10) depends on C_l, in our case measured in execution time. As a meta-model brings additional computational costs, the parts of the learning process, including their costs, are introduced below.

First, the number of training samples N_tr is determined, and numerical simulations are executed. Then, for each training sample, a stored random field is pre-processed to become a meta-model input; let C_pr denote the cost of this operation. The pre-processing can slightly differ between meta-models: the SVR input is a vector of random field elements, while the GCN input is their graph. Afterward, a meta-model is trained, and the cost of this operation is C_ml. Once the meta-model is prepared, it is possible to make predictions at the cost of C_pred. In the case of the meta-level, there is no stored random field because no simulation is executed; thus a random field is generated and pre-processed at the cost of C_rf.

Let C_2 denote the computational cost of a sample at the first MLMC_meta level:

    C_2 = C_1* + (C_2^tr + C_2^pr (N_2 − N_tr)) / N_2,    (25)

where C_1* is the cost of a simulation sample. C_2^tr is the cost of the meta-model training procedure:

    C_2^tr = C_pr N_tr + C_ml + C_pred N_tr,    (26)

and C_2^pr represents the cost of a first-level meta-model prediction sample:

    C_2^pr = C_pr + C_pred.    (27)

Let C_1 denote the cost of a meta-level sample that utilizes the meta-model trained at the first level:

    C_1 = C_rf + C_pred.    (28)

Sample costs C_l for l > 2 are not affected by the meta-modeling. Meta-model training is performed on a cluster; 16 CPUs (Intel Xeon Silver 4114, 2.2 GHz) and 16 GB RAM (DDR4 2400 ECC Reg dual rank) are assigned for that task. Also, MLMC simulations are executed in parallel. Therefore, C measured in execution time is not equal to the real elapsed time. However, the savings in C can significantly affect the total elapsed time, especially for a small V_t, which is necessary to obtain a neat PDF by the MEM.
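The bookkeeping of equations (25)-(28) reduces to a few lines; all values below are hypothetical per-sample costs in seconds:

```python
# hypothetical costs: simulation, pre-processing, training, prediction,
# and random field generation, respectively
C_sim, C_pr, C_ml, C_pred, C_rf = 2.0, 0.01, 300.0, 0.002, 0.05
N_tr, N2 = 2000, 5000

C2_tr = C_pr * N_tr + C_ml + C_pred * N_tr          # equation (26): training
C2_pr = C_pr + C_pred                               # equation (27): prediction
C2 = C_sim + (C2_tr + C2_pr * (N2 - N_tr)) / N2     # equation (25): first level
C1 = C_rf + C_pred                                  # equation (28): meta-level
```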
7 Results

The proposed meta-modeling techniques are investigated in this section. The most suitable one is used by the Monte Carlo methods. The models are compared for different mesh sizes and random field parameters.

A cross-validation-like procedure is implemented to compare the models. Initially, N = 50000 simulation samples are generated. The number of training samples N_tr is empirically determined based on the properties of the MLMC. Since we are interested in V_t ≤ 1e−5 (for the sake of the MEM capability to approximate a PDF neatly), the number of samples at the coarsest level is at least 2000 for our problem. Thus, the procedure is as follows: 2000 training samples (N_tr = 2000) out of N are chosen for meta-model training. The rest is considered to be test data. Validation data accounts for 20% of N_tr. This procedure is repeated 25 times with independent training sets. Given that we use a random field with the exponential correlation function, the logarithm of the RF values is used as the meta-model input to facilitate training. In all of the following cases, the final N_l is determined based on a geometric sequence of the initial numbers of samples decreasing across levels from N_1^0 = 2000 to N_L^0 = 100. The number of moments R = 25 is utilized.
7.1 Meta-models Setting

We have tried dozens of GCN topologies. The most promising ones consist of one ChebNet/GraphSage layer with 8/60 output channels and the ReLU activation function, followed by a global summation pool and an output layer with one neuron with the identity activation function. Table 1 contains the main hyperparameters that enable us to obtain an accurate and computationally efficient meta-model.

Table 1: GCN hyperparameters common to all cases

    optimizer          Adam
    hidden activation  ReLU
    output activation  identity
    learning rate      0.001
    regularization     None
    loss               MSE
    max epochs         2000
    patience           150

Another important hyperparameter is the number of output channels. The optimal number differs across types of GCN layers. The channels form a vector of hidden representations corresponding to each vertex. It is possible to think of each channel as responding to some different set of features, so different channels could become specialized to recognize different objects, as described by Zhang et al. [32].

Regarding the SVR, the Gaussian radial basis function (RBF) kernel and the regularization parameter W = 0.06 are used. An explanation of the role of the SVR parameters is provided in [2].
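A sketch of the ChebNet variant of this topology could look as follows, assuming a Keras-based GCN library such as Spektral, whose ChebConv and GlobalSumPool layers match the described setup; note that the mapping of our K onto the library's K parameter, and the required Laplacian preprocessing, are assumptions to be checked against equations (19)-(21):

```python
import tensorflow as tf
from spektral.layers import ChebConv, GlobalSumPool

S = 546  # number of mesh elements (graph vertices)

x_in = tf.keras.Input(shape=(S, 1))   # one (log) field value per vertex
# ChebConv consumes a preprocessed (rescaled) graph Laplacian; Spektral's
# ChebConv.preprocess(A) is assumed to produce it from the adjacency A.
l_in = tf.keras.Input(shape=(S, S))

h = ChebConv(8, K=1, activation="relu")([x_in, l_in])
h = GlobalSumPool()(h)                                    # global summation pool
y_out = tf.keras.layers.Dense(1, activation="linear")(h)  # identity output

model = tf.keras.Model([x_in, l_in], y_out)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")
stop = tf.keras.callbacks.EarlyStopping(patience=150, restore_best_weights=True)
# model.fit(..., epochs=2000, callbacks=[stop])  # as per Table 1
```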
7.2 Meta-models on Different Meshes

Three meta-model techniques are compared: the SVR, the ChebNet GCN, and the GraphSage GCN. Recall that low-accuracy simulations are supposed to be substituted with meta-models. Hence the emphasis is placed on meta-models trained on small meshes. In particular, meshes of 6, 48, and 546 elements are compared. The random field parameters are by default λ = 0.1 and σ = 1. For a given mesh size, the models are trained and tested on the same data.

Table 2 and Table 3 show the models' final accuracy on training data and test data, respectively. The arithmetic mean and the standard error of 25 calculations of the mean squared error (MSE), (1/D) ∑_{i=1}^{D} (y_i − y_i^meta)², are provided, where y_i is a correct value, y_i^meta is a predicted value, and D is the number of samples. The relative squared error (RSE) is used to compare the models across cases (see Table 4).

Table 2: Meta-models train MSE

    train MSE          number of mesh elements
                       6          48          546
    arithmetic mean
    SVR                0.01711    0.007185    0.004657
    ChebNet GCN        0.03476    0.009441    0.004633
    GraphSage GCN      0.02368    0.01045     0.006848
    standard error
    SVR                4.1e−4     6.5e−5      3.6e−5
    ChebNet GCN        7.0e−3     8.1e−5      2.2e−4
    GraphSage GCN      4.3e−4     9.9e−5      1.8e−4

Table 3: Meta-models test MSE

    test MSE           number of mesh elements
                       6          48          546
    arithmetic mean
    SVR                0.03694    0.008968    0.006696
    ChebNet GCN        0.04331    0.008392    0.004595
    GraphSage GCN      0.02284    0.009200    0.006839
    standard error
    SVR                6.0e−4     5.4e−5      6.1e−5
    ChebNet GCN        2.6e−3     7.9e−5      1.9e−4
    GraphSage GCN      1.9e−4     1.2e−4      1.9e−4

Table 4: Meta-models test RSE

    test RSE           number of mesh elements
                       6          48          546
    SVR                0.1263     0.1551      0.1335
    ChebNet GCN        0.1481     0.1451      0.1064
    GraphSage GCN      0.1162     0.1591      0.1364

The data in the tables show that all models provide similar results in terms of the train MSE and the test MSE, and the RSE values are of the same order of magnitude. Importantly, the presented data show no significant outliers, which would be highly undesirable for our purposes. Although the results are very similar, it can be seen that the optimal meta-model varies depending on the mesh size. While the GraphSage GCN or the SVR is more advantageous for very small meshes, the ChebNet GCN is most suitable in the case of larger meshes.

Nevertheless, additional experiments show that the meta-model approximation is not sufficiently accurate for meshes of thousands of elements. We face the so-called curse of dimensionality due to the limited number of training samples. An increasing number of samples can overcome this difficulty, but many simulations would need to be performed in such a case, and the learning cost increases. Thus, to keep N_tr = 2000, we limited ourselves to meta-models based on simulations on meshes with a maximum of ca. 1000 elements.

In practice, we see that the ChebNet GCN is not only better for larger meshes (from ca. 500 up to ca. 1000 elements) but also provides a more stable learning process, in terms of a smoother decrease of the validation loss, than the GraphSage GCN. Therefore the ChebNet GCN is preferred over the GraphSage GCN for our task.
7.3 Role of Random Field Parameters

The random field parameters affect the meta-model learning ability. Changing the standard deviation σ only scales the features of the graph vertices, which our learning procedure can handle. The correlation length λ plays a more significant role. As λ decreases, the correlation between the features of vertices decreases as well. For small correlation lengths, e.g., λ = 0.001, all features are almost uncorrelated, which has a similar effect as increasing the number of vertices while keeping the original correlation length λ = 0.1. Thus, increasing the number of training samples is necessary to obtain the same results for λ = 0.001.

This reveals one of the SVR drawbacks, which is the requirement of a large number of training samples [2, p. 76]. While 5000 training samples are enough for the ChebNet GCN on meshes of 546 elements with λ = 0.001, in the case of the SVR, even 15000 training samples are not enough to get results for λ = 0.001 similar to those for λ = 0.1. For this reason, the ChebNet GCN is preferred for further analysis. It is also worth noting that, naturally, fewer training samples suffice for λ > 0.1.
7.4 MC extended by the meta-level

Let us use the trained meta-models with the Monte Carlo method described in section 6. To illustrate our approach, we use three different MCs with simulations on meshes of 6, 48, and 546 elements, each extended to a two-level MLMC_meta. Table 5 shows the ratio between the total cost C_meta of our MLMC_meta and the total cost C of the original MC, for V_t = 1e−5. It is important to note that the ratio C_meta/C is a bit smaller for V_t << 1e−5, where the meta-model learning cost is negligible due to the larger N_2. It emerges that computational cost savings of at least 50% are achieved for R = 2.

Table 5: Computational cost of MLMC_meta to MC

    C_meta/C           number of mesh elements
                       6          48         546
    SVR                0.2077     0.4947     0.4666
    ChebNet GCN        0.3648     0.4371     0.4395
    GraphSage GCN      0.2812     0.4485     0.4494

In the rest of this section, MLMC_meta employs the ChebNet GCN on 546 mesh elements. Since the MEM utilizes the moment values µ, it is important to verify that their estimates by the MLMC_meta are the same as the estimates by the original MC. We construct a reference MC (MC_ref) of 50000 samples on meshes of 217208 elements; its moment estimates are denoted µ̂_i^ref. Figure 3 shows the MSE (1/25) ∑_{i=1}^{25} (µ̂_i^ref − µ̂_i)² for the moment values estimated by the MC, and the MSE (1/25) ∑_{i=1}^{25} (µ̂_i^ref − µ̂_i^meta)² for the moments estimated by the MLMC_meta.

Figure 3: Comparison of the MSE between reference moments and moments estimated by MC and MLMC_meta, V_t = 1e−5

Since the error of the moment estimates is similar for both the MC and the MLMC_meta, it is a good prerequisite for a fine PDF approximation. Figure 4 shows an example of a PDF approximated by the MEM based on moments from the MC and the MLMC_meta. Given the 25 repetitions, the average KL divergences are KL(ρ_ref || ρ_MC) = 0.034 and KL(ρ_ref || ρ_MLMC_meta) = 0.031, for R = 25 and V_t = 1e−5. Thus, it is possible to get comparable results by both approaches.
Figure 4: Comparison of PDFs approximated by moments from the MC (blue) and the MLMC_meta (red dotted); D_MC = KL(ρ_ref || ρ_MC) is the Kullback-Leibler (KL) divergence between the PDF ρ_MC from MC data and the reference PDF ρ_ref, and D_meta = KL(ρ_ref || ρ_MLMC_meta), where ρ_MLMC_meta is the density from MLMC_meta data; V_t = 1e−6, R = 25. In the shown example, D_MC = 0.0002149 and D_meta = 0.0002178.

7.5 MLMC extended by the meta-level

Since we cannot effectively train the meta-models on meshes with more than ca. 1000 elements, it is advisable to start from the MLMC, which generally reduces the computational cost compared to the MC. Applying formula 24, a meta-model is trained on the coarsest level, where a mesh should have a small number of elements; thus, meta-model learning is feasible. To illustrate the MLMC extended by the meta-level, let us assume a 3-level MLMC with simulations on meshes of 546, 6772 and 87794 elements, and the ChebNet GCN meta-model. The RF parameters are λ = 0.1, σ = 1.

Figure 5: Moment variances across MLMC levels. A level is identified by the mesh step h of its simulation; the added numbers represent the MLMC_meta costs C_l

Figure 5 shows the variance decrease from the coarsest level to the finest level. Levels are here identified by the simulation mesh step h; the smaller the h, the finer the simulation. It can be observed that in our current MLMC implementation, the decrease is steeper for lower moments. In general, the more the variance decreases across levels, the more effective the use of the MLMC. Thus, in our case, a higher R results in a less effective MLMC compared to a lower R. The number of moments affects the final N_l and consequently the total computational cost C.

Table 6 provides the ratio between the MLMC_meta total cost C_meta and the MLMC total cost C. To measure the quality of the PDF approximation, the KL divergence KL(ρ_ref^25 || ρ_MLMC_meta^R) is added to the table, where ρ_ref^25 denotes a PDF based on 25 moments estimated by the MC_ref, and ρ_MLMC_meta^R is a PDF approximated from the moments estimated by the MLMC_meta.

Table 6: Computational cost of MLMC_meta to MLMC

    R     C_meta/C    KL(ρ_ref^25 || ρ_MLMC_meta^R)
    2     0.41        0.522
    5     0.64        0.00227
    15    0.83        0.00338
    25    0.87        0.00503
    50    0.87        0.0112
    75    0.85        0.0176

It can be seen that the computational cost savings are greatest for the smallest number of moments. However, R = 2 is insufficient to obtain a decent PDF by the MEM; in our case, we need at least R = 5. On the other hand, with V_t >= 1e−5 and R > 30, the moment estimation error causes the PDF to contain obvious ripples. Using 5 ≤ R ≤ 30, we still achieve savings in the computational cost of at least 10%. To give an idea, for R = 25 and V_t = 1e−6, the absolute computational costs in seconds are as follows: C_meta ≈ 73266 and C ≈ 84603.

The effect of R is also manifested in the case of the MC extended by the meta-level. In both cases, it can be noted that the cost savings are almost constant for R ≥ 15.

8 Conclusions

In this paper, we dealt with the meta-model design for the groundwater flow problem. The motivation was to incorporate a meta-model into the multilevel Monte Carlo method and achieve additional savings in the MLMC computational cost. After comparing the meta-models based on the ChebNet GCN, the GraphSage GCN, and the support vector regression, the ChebNet GCN was selected as the most suitable for our task. The proposed meta-modeling technique is effective for simulations on unstructured meshes with a maximum of ca. 1000 elements. Computational cost savings of up to 50% were achieved for both the MC and the MLMC extended by the meta-level. Due to the observed uneven decrease in the variances of MLMC estimates across levels, the amount of computational cost savings depends on the number R of generalized statistical moments. In order to obtain a good PDF approximation by the MEM, we required R ≥ 5. In these cases, we were still able to achieve at least 10% savings in computational costs.

Although the obtained results are auspicious, to provide more general conclusions, it is necessary to try our approach with a more complex simulation, which will be more challenging for the meta-model design.
Acknowledgement

This work was partly supported by the Student Grant Scheme at the Technical University of Liberec through project nr. SGS-2020-3068. The project leading to this work has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 847593.
References

[1] Fatma Abid, Khalil Dammak, Abdelkhalak El Hami, Tarek Merzouki, Hassen Trabelsi, Lassaad Walha, and Mohamed Haddar. Surrogate models for uncertainty analysis of micro-actuator. Microsystem Technologies, 26(8):2589–2600, 2020.

[2] Mariette Awad and Rahul Khanna. Support vector regression. In Efficient Learning Machines, pages 67–80. Apress, Berkeley, CA, 2015.

[3] Andrew R. Barron and Chyong-Hwa Sheu. Approximation of density functions by sequences of exponential families. The Annals of Statistics, 19(3):1347–1369, September 1991.

[4] Claudio Bierig and Alexey Chernov. Approximation of probability density functions by the Multilevel Monte Carlo Maximum Entropy method. Journal of Computational Physics, 314:661–681, 2016.

[5] R. Blaheta, M. Béreš, and S. Domesová. A study of stochastic FEM method for porous media flow problem. In Applied Mathematics in Engineering and Reliability, pages 281–289. CRC Press, 2016.

[6] Jan Březina, Jan Stebel, Pavel Exner, and Jan Hybš. Flow123d. http://flow123d.github.com, repository: http://github.com/flow123d/flow123d, 2011–2021.

[7] Jan Březina and Martin Špetlík. MLMC Python library. http://github.com/GeoMop/MLMC, 2021.

[8] C. Canuto. Spectral Methods in Fluid Dynamics. Springer-Verlag, New York, 1988.

[9] Tomasz Danel, Przemysław Spurek, Jacek Tabor, Marek Śmieja, Łukasz Struski, Agnieszka Słowik, and Łukasz Maziarka. Spatial graph convolutional networks. In Neural Information Processing, pages 668–675, Cham, 2020. Springer International Publishing.

[10] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 29. Curran Associates, Inc., 2016.

[11] Michael B. Giles. Multilevel Monte Carlo methods. Acta Numerica, 24:259–328, May 2015.

[12] Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. Neural message passing for quantum chemistry. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1263–1272. PMLR, 2017.

[13] William L. Hamilton. Graph representation learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 14(3):1–159, 2020.

[14] William L. Hamilton, Rex Ying, and Jure Leskovec. Inductive representation learning on large graphs. CoRR, abs/1706.02216, 2017.

[15] C. Heaney, Yuling Li, O. Matar, and C. Pain. Applying convolutional neural networks to data on unstructured meshes with space-filling curves. ArXiv, abs/2011.14820, 2020.

[16] Falk Heße, Vladyslav Prykhodko, Steffen Schlüter, and Sabine Attinger. Generating random fields with a truncated power-law variogram: A comparison of several numerical methods. Environmental Modelling & Software, 55:32–48, May 2014.

[17] E. T. Jaynes. Information theory and statistical mechanics. Physical Review, 106(4):620–630, 1957.

[18] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. CoRR, abs/1609.02907, 2016.

[19] J. Kleijnen. Kriging: Methods and applications. ERN: Computational Techniques (Topic), 2017.

[20] Shaoning Li and Luca Caracoglia. Surrogate model Monte Carlo simulation for stochastic flutter analysis of wind turbine blades. Journal of Wind Engineering and Industrial Aerodynamics, 188:43–60, 2019.

[21] Yi Ma, Jianye Hao, Yaodong Yang, Han Li, Junqi Jin, and Guangyong Chen. Spectral-based graph convolutional network for directed graphs, 2019.

[22] Julian Mack, Rossella Arcucci, Miguel Molina-Solana, and Yi-Ke Guo. Attention-based convolutional autoencoders for 3d-variational data assimilation. Computer Methods in Applied Mechanics and Engineering, 372, 2020.

[23] Sebastian Müller and Lennart Schüler. GSTools. https://github.com/GeoStat-Framework/GSTools, 2019.

[24] Karl T. Otness. Graph convolutions and machine learning. Thesis, Harvard University, 2018.

[25] Imry Rosenbaum and Jeremy Staum. Multilevel Monte Carlo metamodeling. Operations Research, 65(4):1062–1077, 2017.

[26] Cosmin Safta, Richard L.-Y. Chen, Habib N. Najm, Ali Pinar, and Jean-Paul Watson. Toward using surrogates to accelerate solution of stochastic electricity grid operations problems. In 2014 North American Power Symposium (NAPS), pages 1–6, 2014.

[27] C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27(3):379–423, 1948.

[28] R. E. Stern, J. Song, and D. B. Work. Accelerated Monte Carlo system reliability analysis through machine-learning-based surrogate models of network connectivity. Reliability Engineering & System Safety, 164:1–9, 2017.

[29] Ze Zhou Wang, Changlin Xiao, Siang Huat Goh, and Min-Xuan Deng. Metamodel-based reliability analysis in spatially variable soils using convolutional neural networks. Journal of Geotechnical and Geoenvironmental Engineering, 147(3), 2021.

[30] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S. Yu. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 32(1):4–24, 2021.

[31] Mengfei Xu, S. Song, Xuxiang Sun, and Weiwei Zhang. UCNN: A convolutional strategy on unstructured mesh. ArXiv, abs/2101.05207, 2021.

[32] Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola. Dive into Deep Learning. 2020. https://d2l.ai.

[33] Si Zhang, Hanghang Tong, Jiejun Xu, and Ross Maciejewski. Graph convolutional networks. Computational Social Networks, 6(1), 2019.

[34] Jie Zhou, Ganqu Cui, Shengding Hu, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, Lifeng Wang, Changcheng Li, and Maosong Sun. Graph neural networks: A review of methods and applications. AI Open, 1:57–81, 2020.

[35] L. Zhou, G. Yan, and J. Ou. Response surface method based on radial basis functions for modeling large-scale structures in model updating. Computer-Aided Civil and Infrastructure Engineering, 28:210–226, 2013.