         Groundwater Flow Meta-model for Multilevel Monte Carlo Methods

                                                        Martin Špetlík, Jan Březina

                  Technical University of Liberec, Faculty of Mechatronics, Informatics and Interdisciplinary Studies,
                                        Institute of New Technologies and Applied Informatics
                                         Studentská 1402/2, 461 17 Liberec 1, Czech Republic
                                          martin.spetlik@tul.cz, jan.brezina@tul.cz

Abstract: This paper presents a meta-modeling-based technique to reduce the computational cost of Monte Carlo methods. A stochastic simulation of groundwater flow is substituted with a graph neural network meta-model. This type of neural network can deal with the non-Euclidean structure of the input data, which in our case is a graph representing a random field on an unstructured mesh. In order to find the most suitable meta-model, a comparison of the standard support vector regression with spectral and spatial graph convolutional neural networks is provided. Both the Monte Carlo method and the multilevel Monte Carlo method are extended by the meta-model level. In both cases, up to 50% savings in computational cost are achieved while maintaining the accuracy of Monte Carlo estimates.

1 Introduction

Groundwater flow in the vicinity of a future nuclear waste repository is the major motivation for our research. In the first stage, our quantity of interest (QoI) is the total flow through the observed area. However, since not all indispensable properties of the rock environment are known, it is not feasible to determine the desired total flow directly. For that reason, and given the nature of the rock, the missing properties are modeled as random fields (RFs). The probability density function (PDF) of our QoI is under scrutiny. To approximate the PDF, the maximum entropy method is adopted. The method utilizes so-called generalized statistical moments. In order to estimate these moments, Monte Carlo methods are employed. These methods consist in repeating a random experiment. Depending on the required accuracy, this might result in thousands of groundwater flow simulations, which can significantly affect the total computational cost.

This fact was behind the idea to substitute simulations with an approximation model, often called a meta-model. Furthermore, we use the multilevel Monte Carlo method (MLMC) instead of the standard Monte Carlo method to reduce the variance of estimates more effectively. Since the MLMC uses simulations of different accuracy, we aim to substitute those with the lowest accuracy, which are performed in the largest number. Their approximation should not be excessively challenging for meta-modeling. We presume that this approach could reduce the computational cost of MLMC estimates. It is also necessary to maintain the accuracy of the estimates to achieve a good PDF approximation. Graph convolutional neural networks and support vector regression [2] are used as meta-models.

First, the groundwater flow problem definition and a brief introduction to the maximum entropy method are provided in section 2. Then, Monte Carlo methods are presented in section 3. Spectral and spatial graph convolutional neural networks used as meta-models are delineated in section 5. Section 6 provides the link between the MLMC and a meta-model, including the related work. Finally, section 7 is devoted to the results and discussion.

Copyright ©2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2 Problem Definition

A 2D benchmark problem of groundwater flow through a porous medium is used to test our proposed methods. The same problem can be found in the paper by Blaheta et al. [5]. The groundwater flow is described by the boundary value problem on the unit square:

    −div(K(x)∇p) = 0,
    −K∇p ⋅ n⃗ = 0,    (1)

where K is the hydraulic conductivity, p is the pore water pressure, −K∇p is Darcy's velocity, and n⃗ is the unit normal vector. We are interested in the total flow Y through the specified area

    Y = ∫₀¹ [−K∇p ⋅ n⃗](1, y) dy.    (2)

The problem is prescribed on unstructured meshes and is solved by the finite element method using the Flow123d software [6]. Since the proposed approach shall be adopted for more complicated problems with complex irregular geometry, unstructured meshes are preferred over structured ones. The hydraulic conductivity is considered as a random field (RF) with the exponential covariance function C(r) = σ² exp(−r/λ), where λ is a correlation length and σ is a standard deviation. The GSTools software library [23] is utilized to generate RFs; see Hesse et al. [16] for theoretical details.

Figure 1: An illustration of the porous media flow problem
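To make this step concrete, the following minimal sketch samples a field with the stated covariance at the element centers of a mesh using GSTools; the coordinates, variance, correlation length, and seed are hypothetical placeholders, not values taken from the paper:

```python
import gstools as gs

# Exponential covariance C(r) = sigma^2 * exp(-r / lambda);
# var and len_scale correspond to sigma^2 and lambda.
model = gs.Exponential(dim=2, var=1.0, len_scale=0.1)
srf = gs.SRF(model, seed=42)

# hypothetical element-center coordinates of an unstructured mesh
x = [0.12, 0.47, 0.85]
y = [0.33, 0.61, 0.09]

field = srf((x, y))  # one field realization evaluated at the given points
```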
2.1 Maximum Entropy Method

When considering the hydraulic conductivity as a random field, the total flow Y is a real random variable. The aim is to determine its probability density function ρ(Y), which is nonnegative on a bounded domain Ω. The maximum entropy method [17] (MEM) is employed for this purpose. Its basic idea lies in approximating a PDF from generalized statistical moments and the functions that calculate them:

    ∫_Ω ρ(y) dy = 1,
    ∫_Ω φ_r(y) ρ(y) dy = µ_r,    r = 1, ..., R,    (3)

where ρ(Y) is unknown, {µ_r}_{r=1}^R are the values of the generalized moments, and {φ_r}_{r=1}^R are linearly independent functions (see [3, p. 1350]) used for calculating the generalized moments, satisfying φ_1 = 1, φ_r ∈ C^R(Ω), r = 2, ..., R. In our case, {φ_r}_{r=1}^R are the Legendre polynomials, and R ∈ ℕ is the number of moments.

Provided that the system of equations 3 has a solution, it has an infinite number of them. Therefore, the most apposite solution is the one with the maximum Shannon entropy (defined in [27]), according to E. T. Jaynes [17, p. 623]. Determining ρ(Y) with the maximum entropy takes the form of finding the global maximum of the functional

    H(ρ) = −∫_Ω ρ(y) ln(ρ(y)) dy    (4)

under the constraints prescribed in equations 3.

A numerical solution and general limitations of the MEM are beyond the scope of this paper. A more detailed insight is provided in [3, 4].
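For illustration only, one common way to solve this constrained problem numerically is through its convex dual in the Lagrange multipliers. The following minimal sketch assumes a hypothetical domain Ω = [−1, 1], a simple quadrature grid, and hypothetical target moments:

```python
import numpy as np
from numpy.polynomial import legendre
from scipy.optimize import minimize

# A minimal MEM sketch via the convex dual: find multipliers lam so that
# rho(y) = exp(sum_r lam_r phi_r(y)) / Z matches the target moments.
# The grid, R, and the moment values mu are hypothetical placeholders.

y = np.linspace(-1.0, 1.0, 2001)   # quadrature grid on Omega = [-1, 1]
dy = y[1] - y[0]
R = 4
phi = np.stack([legendre.Legendre.basis(r)(y) for r in range(1, R)])
mu = np.array([0.0, 0.1, 0.0])     # hypothetical moments for r = 1, 2, 3

def dual(lam):
    # log-partition term minus the linear moment term (convex in lam)
    log_z = np.log(np.sum(np.exp(lam @ phi)) * dy)
    return log_z - lam @ mu

lam = minimize(dual, np.zeros(R - 1), method="BFGS").x
rho = np.exp(lam @ phi)
rho /= rho.sum() * dy              # normalized maximum-entropy PDF
```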
3 Monte Carlo Methods

The MEM utilizes moment values {µ_r}_{r=1}^R that are themselves random variables derived from some random input X (which is the hydraulic conductivity K in this study), so it is necessary to estimate their expected values. Monte Carlo methods are used for this.

The standard Monte Carlo method (MC) is an approach for estimating expected values of some stochastic simulation variable. The basic idea comes from the strong law of large numbers. Therefore, in order to obtain an unbiased estimate of the expected value Y = E[P(x)], the MC consists in the arithmetic mean of N independent samples

    Ŷ = (1/N) ∑_{n=1}^{N} P(x_n),    (5)

where P is a random variable depending on X. According to the central limit theorem:

    Ŷ ∼ N(µ, σ²/N).    (6)

Consequently, the computational cost of reducing the variance (increasing N) of Ŷ can be very high. To overcome this drawback (for details, see [11, p. 4]), the multilevel Monte Carlo method (MLMC) was formulated.

3.1 Multilevel Monte Carlo Method

In the case of the MLMC [11], the expected value of a random variable P is estimated based on the sequence of its approximations P_1, ..., P_L:

    E[P] = E[P_1] + ∑_{l=2}^{L} E[P_l − P_{l−1}],    (7)

where P_{l−1} ≈ P_l. The accuracy of the approximation of P improves from P_1 to P_L. The unbiased estimate of E[P] is

    P̂ = (1/N_1) ∑_{n=1}^{N_1} P_1(x_n^1) + ∑_{l=2}^{L} { (1/N_l) ∑_{n=1}^{N_l} (P_l(x_n^l) − P_{l−1}(x_n^l)) },    (8)

where L is the total number of levels and N_l is the number of simulation samples at level l. The input random data x_n^l are dependent within each simulation pair n at level l, but they are independent across levels. Since the differences P_l − P_{l−1} are also independent across levels, the total estimated variance V̂ of P̂ has the following form:

    V̂ = ∑_{l=1}^{L} V̂_l / N_l,    (9)

where V̂_1 is an estimate of the variance of P_1 and V̂_l for l > 1 is an estimate of the variance of P_l − P_{l−1}. The MLMC computational cost is

    C = ∑_{l=1}^{L} N_l C_l,    (10)

where C_l is the cost of a single simulation sample at level l measured in, for instance, the number of computational operations, execution time, CPU time, etc. Let C_1 denote the cost of P_1; C_l is the cost of P_l − P_{l−1} for l > 1.

Given the MLMC theoretical properties stated in [11, theorem 1], the variance V_l should decrease from l = 1 to l = L, while the computational cost C_l should increase from l = 1 to l = L.
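The following minimal sketch evaluates the estimates (8) and (9) for a two-level case; the fine and coarse simulations are stand-in functions, not the groundwater flow model:

```python
import numpy as np

rng = np.random.default_rng(0)

def P_coarse(x):   # stand-in for the cheap, low-accuracy simulation P_1
    return x + 0.3 * np.sin(x)

def P_fine(x):     # stand-in for the accurate simulation P_2
    return x + 0.3 * np.sin(x) + 0.05 * x**2

N1, N2 = 2000, 100
x1 = rng.standard_normal(N1)   # level-1 inputs
x2 = rng.standard_normal(N2)   # level-2 inputs, shared within each pair

level1 = P_coarse(x1)                 # P_1 samples
level2 = P_fine(x2) - P_coarse(x2)    # correlated pair => small variance

P_hat = level1.mean() + level2.mean()                      # equation (8)
V_hat = level1.var(ddof=1) / N1 + level2.var(ddof=1) / N2  # equation (9)
```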
3.2 Number of Simulation Samples

The way of determining N_l is a crucial part of the MLMC. There are two main approaches:

• minimizing the total variance V with respect to a prescribed computational cost,
• minimizing the total computational cost C with respect to a prescribed target variance V_t.

Our attention is paid to the latter approach, which has the form of finding the minimum of equation 10 under the constraint

    V_t = ∑_{l=1}^{L} V̂_l / N_l.    (11)

After some calculus, and concerning the use of the MEM, the N_l are determined with respect to the R moment values:

    N_l^r = √(V̂_l^r / C_l) ⋅ (∑_{i=1}^{L} √(V̂_i^r C_i)) / V_t,    r = 1, ..., R,    (12)

where V̂_l^r is the estimated variance of φ_r^l(x) − φ_r^{l−1}(x) for the r-th moment at level l. Finally, N_l = max_{r=1,...,R} N_l^r.

The mlmc software library [7] is employed to schedule samples and post-process results, including our MEM implementation.
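A direct sketch of this allocation, with hypothetical variances and costs, could look as follows:

```python
import numpy as np

# V_hat[l, r]: estimated variance of the r-th moment difference at level l;
# C[l]: per-sample cost at level l; V_t: target variance. All hypothetical.
V_hat = np.array([[1e-2, 4e-3],
                  [1e-3, 5e-4],
                  [1e-4, 6e-5]])      # shape (L, R) with L = 3, R = 2
C = np.array([0.1, 1.0, 10.0])
V_t = 1e-5

total = np.sqrt(V_hat * C[:, None]).sum(axis=0)            # sum_i sqrt(V_i^r C_i)
N_lr = np.sqrt(V_hat / C[:, None]) * total[None, :] / V_t  # equation (12)
N_l = np.ceil(N_lr.max(axis=1)).astype(int)                # N_l = max_r N_l^r
```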


4 Meta-modeling

A meta-model is a simpler, explicit mathematical function that approximates a complicated function, which can be implicit and evaluated by a simulation model, measurement data, or experiments. Alternative names can be found in the literature, such as surrogate model, response surface model, emulator, etc. Meta-modeling is the process of designing meta-models. The common meta-models used in finite element analysis include support vector regression (SVR), artificial neural networks (ANNs), Kriging, also known as Gaussian process regression [19], radial basis functions (RBF) [35], etc.

Artificial neural networks have become very popular for meta-model design. A regression problem is solved by an ANN trained by supervised learning. The aim is to obtain an unknown mapping of input neuron activities to an output neuron activity. With regard to the nature of our input data, which is a 2D correlated random field, it would be suitable to use a convolutional neural network (CNN) meta-model. Nevertheless, CNNs cannot be applied directly to data on unstructured meshes (see [31]).

There are a few ways to overcome this difficulty. The basic solution is to directly apply a 1D CNN to the nodal values of an unstructured mesh considered as a vector. However, it has limited success [15]. Another option is interpolating data from an unstructured mesh to a structured mesh and then using a CNN. This approach leads to an additional error caused by the interpolation, and it requires a larger number of grid points to capture features from the original unstructured data, according to [22]. Wang et al. [29] proposed a CNN on unstructured meshes. This novel approach is limited to random fields described by the spectral representation method or the Karhunen-Loève expansion. Another popular approach is the representation of unstructured meshes as undirected graphs, which allows us to use graph convolutional neural networks (GCNs) [31]. In this study, we use GCN-based meta-models and support vector regression meta-models.

4.1 Graph Representing a Random Field

A graph G = (V, E) is an unweighted undirected graph describing the mesh structure on which a random field is generated. V is a set of vertices representing mesh elements, V = {v_1, v_2, ..., v_S}, where S is the number of vertices (= mesh elements). The neighborhood of a vertex v is defined as N(v) = {u ∈ V | (v, u) ∈ E}. Each vertex has a feature representing the random field value at the corresponding mesh element. E is a set of edges; an edge connects adjacent mesh elements. An adjacency matrix A is used to represent G on a computer.

Figure 2: An example of a graph on 546 mesh elements
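For a 2D triangular mesh, such an adjacency matrix can be assembled from the element connectivity; the following sketch uses a hypothetical three-element mesh, with two elements considered adjacent if they share a mesh edge:

```python
import numpy as np
from itertools import combinations
from collections import defaultdict

# hypothetical mesh: node ids of each triangular element
triangles = np.array([[0, 1, 2], [1, 2, 3], [2, 3, 4]])

# map every mesh edge to the elements containing it
edge_to_elements = defaultdict(list)
for e, tri in enumerate(triangles):
    for edge in combinations(sorted(tri), 2):
        edge_to_elements[edge].append(e)

S = len(triangles)
A = np.zeros((S, S))
for elems in edge_to_elements.values():
    for i, j in combinations(elems, 2):  # elements sharing an edge are adjacent
        A[i, j] = A[j, i] = 1.0
```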
5 Graph Neural Networks

A variety of real-world problems can be represented as graphs. Imagine social networks, molecules, or, in our case, random fields on unstructured meshes.

Graph neural networks (GNNs) are deep learning-based models that operate on the graph domain [34]. They have a wide range of applications in classification, relation extraction, molecular fingerprint prediction, and so on. A comprehensive survey on graph neural networks can be found in [30]. In brief, GNNs are categorized into several groups, such as graph convolutional networks [33], graph attention networks, graph recurrent networks, etc. The graph convolutional networks (GCNs) are the most important ones because they are fundamental to other graph neural network models (see [21]). GCNs can be divided into spectral GCNs and spatial GCNs.

5.1 Spectral Graph Convolutional Neural Networks

Spectral GCNs are based on knowledge from the field of graph signal processing. The well-known convolution has the following form:

    (f ∗ g)(x) = ∫_{ℝ^k} f(t) g(x − t) dt,    (13)

but it is unclear how to interpret the translation g(x − t) for a graph signal. Thus the convolution operation is not defined on structures like graphs. W. Hamilton [13] and K. Otness [24] provide a detailed explanation of the graph convolution. The basic idea is to take advantage of the fact that convolution in the spatial domain corresponds to pointwise multiplication in the spectral domain.
A graph signal x ∈ ℝ^S (the vector of all G vertex values) is transformed by a graph Fourier transform from the spatial domain to the spectral domain. Loosely speaking, the standard Fourier transform is connected to the eigendecomposition of the Laplace operator. In the case of graphs, the Laplace operator is represented by the Laplacian matrix [33, p. 4]. The normalized Laplacian matrix L can also be used as a Laplace operator:

    L = I − D^{−1/2} A D^{−1/2},    (14)

where A is a graph adjacency matrix and D is a diagonal matrix of vertex degrees. L is a symmetric real-valued positive semi-definite matrix that can be factorized as L = UΛU^T, where U is a matrix of eigenvectors and Λ is a diagonal matrix of eigenvalues. Then the graph Fourier transform of x is [34, p. 60]:

    F(x) = U^T x    (15)

and its inverse:

    F^{−1}(x̂) = U x̂.    (16)

Finally, the graph convolution of x and a filter g ∈ ℝ^S is

    x ∗_G g = F^{−1}(F(x) ⊙ F(g)),    (17)

where ∗_G is the convolutional operator on a graph and ⊙ represents the element-wise Hadamard product. The signal x can be filtered in the spectral domain by a filter g_θ in this way (for details, see [13, p. 83]):

    x ∗_G g_θ = U g_θ(Λ) U^T x,    (18)

where g_θ(Λ) is a polynomial of the eigenvalues of the Laplacian. The filter represents learnable weights.

ChebNet. A GCN with a Chebyshev convolutional layer was first introduced by Defferrard et al. [10] in 2016. The filter g is approximated by a truncated expansion of Chebyshev polynomials T_k(x) up to the K-th order:

    g_θ = ∑_{k=0}^{K−1} θ_k T_k(Λ̃),    (19)

where θ ∈ ℝ^K is a vector of Chebyshev coefficients, Λ̃ = 2Λ/λ_max − I, and λ_max is the maximum eigenvalue of Λ. The Chebyshev polynomials are defined recursively by T_{k≥2}(x) = 2x T_{k−1}(x) − T_{k−2}(x) with T_0(x) = 1, T_1(x) = x. Then x filtered by g is

    x ∗_G g = U (∑_{k=0}^{K−1} θ_k T_k(Λ̃)) U^T x.    (20)

The eigendecomposition can be avoided [30, p. 10]:

    x ∗_G g = (∑_{k=0}^{K−1} θ_k T_k(L̃)) x,    (21)

where L̃ = 2L/λ_max − I. L̃ is called the rescaled graph Laplacian: its eigenvalues are mapped from [0, λ_max] to [−1, 1], where the Chebyshev polynomials form an orthogonal basis.

Several settings of K were tested. Since K > 1 did not bring improvement for our problem, K = 1 is used in our study. For this setting, the ChebNet is very similar to the GCN proposed by Kipf and Welling [18]. Filters are exactly K-localized, meaning a filter modifies the information at a particular vertex based on the information from vertices in its K-hop neighborhood. This essentially connects spectral-based methods with spatial-based ones.
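The following sketch ties equations (14) and (21) together on a toy graph; it uses K = 2 to show one Chebyshev recursion step beyond the K = 1 setting adopted in our study, and the adjacency matrix, signal, and coefficients are hypothetical:

```python
import numpy as np

# Toy graph: a path on three vertices; x holds a field value per vertex.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
x = np.array([0.2, -1.3, 0.7])

d = A.sum(axis=1)                               # vertex degrees
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L = np.eye(3) - D_inv_sqrt @ A @ D_inv_sqrt     # equation (14)

lam_max = np.linalg.eigvalsh(L).max()
L_tilde = 2.0 * L / lam_max - np.eye(3)         # rescaled graph Laplacian

theta = np.array([0.5, -0.1])                   # learnable coefficients
T0, T1 = x, L_tilde @ x                         # T_0(L~)x and T_1(L~)x
y = theta[0] * T0 + theta[1] * T1               # equation (21) with K = 2
```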
5.2 Spatial Graph Convolutional Neural Networks

As for spatial GCNs [9], the convolution is performed in the graph domain by propagating information along edges between adjacent vertices. The concept is based on message passing neural networks [12]. The spatial graph convolution is defined as follows [30, p. 12]:

    h_v^{(k)} = U_k(h_v^{(k−1)}, ∑_{u ∈ N(v)} M_k(h_v^{(k−1)}, h_u^{(k−1)}, x_{vu}^e)),    (22)

where h_v^{(k)} represents the features of vertex v in a hidden layer k, and x_{vu}^e is an optional feature vector of the edge (v, u). U_k and M_k are the update and message functions with learnable parameters.

GraphSage is an aggregation-based learning model proposed by Hamilton et al. [14]. It performs the convolution as follows:

    h_v^{(k)} = σ(W^{(k)} ⋅ f_k(h_v^{(k−1)}, {h_u^{(k−1)} ∀u ∈ S_{N(v)}})),    (23)

where h_v^{(0)} is a representation vector of the features of vertex v, σ represents a nonlinear activation function, W^{(k)} is a weight matrix in layer k, f_k represents an aggregation function, and S_{N(v)} ⊆ N(v). Thus GraphSage enables the use of huge graphs by selecting a subset from each vertex neighbourhood instead of using all neighbours. Different operations can be used as f_k, such as average, sum, max, min, etc. In this study, we use GraphSage with a single layer, and f_k is the summation.

A comparison between spectral GCNs and spatial GCNs is summarized by Wu et al. [30, p. 13]. They state that spatial GCNs are usually preferred over spectral GCNs. However, in our problem from section 2, we aim to approximate the solution of an elliptic partial differential equation. To do that numerically, it is possible to use spectral methods [8]. This entitles us to assume that spectral GCNs will be more appropriate.
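As a concrete illustration of equation 23, the following sketch implements one possible reading of a single GraphSage layer with sum aggregation over the full neighbourhood, where the vertex's own features are concatenated to the aggregate; all values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.array([[0.2], [-1.3], [0.7]])      # one input feature per vertex
W = rng.normal(size=(2, 8))               # weights for [self, aggregate]

agg = A @ H                               # f_k: summation over N(v)
Z = np.concatenate([H, agg], axis=1) @ W  # apply W^(k)
H_next = np.maximum(Z, 0.0)               # sigma: ReLU
```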
6 Monte Carlo Methods with Meta-model

There are several approaches to using meta-models to reduce the computational cost of Monte Carlo estimates. A relatively frequent approach (e.g., [1, 20, 28]) is to construct a meta-model of a simulation and then conduct the MC with the meta-model instead of the simulation. Other approaches consist in replacing the whole Monte Carlo estimate by a meta-model. For instance, Safta et al. [26] use a polynomial chaos expansion meta-model, which requires a significantly lower number of samples than the MC. Rosenbaum and Staum [25] construct a stochastic simulation meta-model by way of the MLMC.

We propose a different approach based on adding a new coarse meta-model-based level to the original Monte Carlo method (MC or MLMC). The MLMC estimate from equation 8 now has the following form:

    P̂_meta = (1/N_1) ∑_{n=1}^{N_1} P̃_1(x_n^1) + (1/N_2) ∑_{n=1}^{N_2} (P_1(x_n^2) − P̃_1(x_n^2)) + ∑_{l=3}^{L} { (1/N_l) ∑_{n=1}^{N_l} (P_l(x_n^l) − P_{l−1}(x_n^l)) },    (24)

where P̃_1 denotes a meta-model approximation of P_1. Since the MLMC assumes P_{l−1} ≈ P_l, it is completely valid to employ a meta-model as the coarsest level (which we denote as the meta-level) and the difference between P_1 and P̃_1 as the subsequent first level. Let MLMC_meta denote the multilevel Monte Carlo method with the meta-level.

In order to meet the basic theoretical properties of the MLMC (mentioned in section 3.1), the meta-level computational cost C_1 should be lower than C_2, and the meta-level sample variance V_1 should be greater than V_2.
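The following minimal sketch shows the meta-level construction of equation 24 for the simplest case L = 2; both the simulation and the meta-model are stand-in functions:

```python
import numpy as np

rng = np.random.default_rng(1)

def P1(x):        # the original coarse simulation
    return x + 0.3 * np.sin(x)

def P1_meta(x):   # stand-in for the trained meta-model approximating P_1
    return x + 0.28 * np.sin(x)

N1, N2 = 20000, 2000
x_meta = rng.standard_normal(N1)   # meta-level inputs: cheap to evaluate
x_pair = rng.standard_normal(N2)   # shared inputs for the correction pair

# meta-level mean plus the P_1 - P~_1 correction, cf. equation (24)
P_hat_meta = P1_meta(x_meta).mean() + (P1(x_pair) - P1_meta(x_pair)).mean()
```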
6.1 Computational cost

The total MLMC cost (see equation 10) depends on C_l, in our case measured in execution time. As a meta-model brings additional computational costs, the parts of the learning process, including their costs, are introduced below.

First, the number of training samples N_tr is determined, and numerical simulations are executed. Then, for each training sample, a stored random field is pre-processed to become a meta-model input; let C_pr denote the cost of this operation. The pre-processing can slightly differ between meta-models: the SVR input is a vector of random field elements, while the GCN input is their graph. Afterward, a meta-model is trained, and the cost of this operation is C_ml. Once the meta-model is prepared, it is possible to make predictions at the cost of C_pred. In the case of the meta-level, there is no stored random field because no simulation is executed; thus a random field is generated and pre-processed at the cost of C_rf.

Let C_2 denote the computational cost of a sample at the first MLMC_meta level:

    C_2 = C_1* + (C_2^tr + C_2^pr (N_2 − N_tr)) / N_2,    (25)

where C_1* is the cost of a simulation sample. C_2^tr is the cost of the meta-model training procedure:

    C_2^tr = C_pr N_tr + C_ml + C_pred N_tr,    (26)

and C_2^pr represents the cost of a first-level meta-model prediction sample:

    C_2^pr = C_pr + C_pred.    (27)

Let C_1 denote the cost of a meta-level sample that utilizes the meta-model trained at the first level:

    C_1 = C_rf + C_pred.    (28)

Sample costs C_l for l > 2 are not affected by the meta-modeling. Meta-model training is performed on a cluster; 16 CPUs (Intel Xeon Silver 4114, 2.2 GHz) and 16 GB RAM (DDR4 2400 ECC Reg dual rank) are assigned for that task. Also, MLMC simulations are executed in parallel. Therefore, C measured in execution time is not equal to the real elapsed time. However, the savings in C can significantly affect the total elapsed time, especially for a small V_t, which is necessary to obtain a neat PDF by the MEM.
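The bookkeeping of equations (25)-(28) reduces to a few lines; all values below are hypothetical per-sample costs in seconds:

```python
# hypothetical costs: simulation, pre-processing, training, prediction,
# and random field generation, respectively
C_sim, C_pr, C_ml, C_pred, C_rf = 2.0, 0.01, 300.0, 0.002, 0.05
N_tr, N2 = 2000, 5000

C2_tr = C_pr * N_tr + C_ml + C_pred * N_tr          # equation (26): training
C2_pr = C_pr + C_pred                               # equation (27): prediction
C2 = C_sim + (C2_tr + C2_pr * (N2 - N_tr)) / N2     # equation (25): first level
C1 = C_rf + C_pred                                  # equation (28): meta-level
```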
7 Results

The proposed meta-modeling techniques are investigated in this section. The most suitable one is used by the Monte Carlo methods. The models are compared for different mesh sizes and random field parameters.

A cross-validation-like procedure is implemented to compare the models. Initially, N = 50000 simulation samples are generated. The number of training samples N_tr is empirically determined based on the properties of the MLMC. Since we are interested in V_t ≤ 1e−5 (for the sake of the MEM capability to approximate a PDF neatly), the number of samples at the coarsest level is at least 2000 for our problem. Thus, the procedure is as follows: 2000 training samples (N_tr = 2000) out of N are chosen for meta-model training. The rest is considered to be test data. Validation data accounts for 20% of N_tr. This procedure is repeated 25 times with independent training sets. Given that we use a random field with the exponential correlation function, the logarithm of the RF values is used as the meta-model input to facilitate training. In all of the following cases, the final N_l is determined based on a geometric sequence of the initial numbers of samples decreasing across levels from N_1^0 = 2000 to N_L^0 = 100. The number of moments R = 25 is utilized.
7.1 Meta-models Setting

We have tried dozens of GCN topologies. The most promising ones consist of one ChebNet/GraphSage layer with 8/60 output channels and the ReLU activation function, followed by a global summation pool and an output layer with one neuron with the identity activation function. Table 1 contains the main hyperparameters that enable us to obtain an accurate and computationally efficient meta-model.

Table 1: GCN hyperparameters common to all cases

    optimizer          Adam
    hidden activation  ReLU
    output activation  identity
    learning rate      0.001
    regularization     None
    loss               MSE
    max epochs         2000
    patience           150

Another important hyperparameter is the number of output channels. The optimal number differs across types of GCN layers. The channels form a vector of hidden representations corresponding to each vertex. It is possible to think of each channel as responding to some different set of features, so different channels could become specialized to recognize different objects, as described by Zhang et al. [32].

Regarding the SVR, the Gaussian radial basis function (RBF) kernel and the regularization parameter W = 0.06 are used. An explanation of the role of the SVR parameters is provided in [2].
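A sketch of the ChebNet variant of this topology could look as follows, assuming a Keras-based GCN library such as Spektral, whose ChebConv and GlobalSumPool layers match the described setup; note that the mapping of our K onto the library's K parameter, and the required Laplacian preprocessing, are assumptions to be checked against equations (19)-(21):

```python
import tensorflow as tf
from spektral.layers import ChebConv, GlobalSumPool

S = 546  # number of mesh elements (graph vertices)

x_in = tf.keras.Input(shape=(S, 1))   # one (log) field value per vertex
# ChebConv consumes a preprocessed (rescaled) graph Laplacian; Spektral's
# ChebConv.preprocess(A) is assumed to produce it from the adjacency A.
l_in = tf.keras.Input(shape=(S, S))

h = ChebConv(8, K=1, activation="relu")([x_in, l_in])
h = GlobalSumPool()(h)                                    # global summation pool
y_out = tf.keras.layers.Dense(1, activation="linear")(h)  # identity output

model = tf.keras.Model([x_in, l_in], y_out)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")
stop = tf.keras.callbacks.EarlyStopping(patience=150, restore_best_weights=True)
# model.fit(..., epochs=2000, callbacks=[stop])  # as per Table 1
```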
7.2 Meta-models on Different Meshes

Three meta-model techniques are compared: the SVR, the ChebNet GCN, and the GraphSage GCN. Recall that low-accuracy simulations are supposed to be substituted with meta-models. Hence the emphasis is placed on meta-models trained on small meshes. In particular, meshes of 6, 48, and 546 elements are compared. The random field parameters are by default λ = 0.1 and σ = 1. For a given mesh size, the models are trained and tested on the same data.

Table 2 and Table 3 show the models' final accuracy on training data and test data, respectively. The arithmetic mean and the standard error of 25 calculations of the mean squared error (MSE), (1/D) ∑_{i=1}^{D} (y_i − y_i^meta)², are provided, where y_i is a correct value, y_i^meta is a predicted value, and D is the number of samples. The relative squared error (RSE) is used to compare the models across cases (see Table 4).

Table 2: Meta-models train MSE

    train MSE          number of mesh elements
                       6          48          546
    arithmetic mean
    SVR                0.01711    0.007185    0.004657
    ChebNet GCN        0.03476    0.009441    0.004633
    GraphSage GCN      0.02368    0.01045     0.006848
    standard error
    SVR                4.1e−4     6.5e−5      3.6e−5
    ChebNet GCN        7.0e−3     8.1e−5      2.2e−4
    GraphSage GCN      4.3e−4     9.9e−5      1.8e−4

Table 3: Meta-models test MSE

    test MSE           number of mesh elements
                       6          48          546
    arithmetic mean
    SVR                0.03694    0.008968    0.006696
    ChebNet GCN        0.04331    0.008392    0.004595
    GraphSage GCN      0.02284    0.009200    0.006839
    standard error
    SVR                6.0e−4     5.4e−5      6.1e−5
    ChebNet GCN        2.6e−3     7.9e−5      1.9e−4
    GraphSage GCN      1.9e−4     1.2e−4      1.9e−4

Table 4: Meta-models test RSE

    test RSE           number of mesh elements
                       6          48          546
    SVR                0.1263     0.1551      0.1335
    ChebNet GCN        0.1481     0.1451      0.1064
    GraphSage GCN      0.1162     0.1591      0.1364

The data in the tables show that all models provide similar results in terms of the train MSE and the test MSE, and the RSE values are of the same order of magnitude. Importantly, the presented data show no significant outliers, which would be highly undesirable for our purposes. Although the results are very similar, it can be seen that the optimal meta-model varies depending on the mesh size. While the GraphSage GCN or the SVR is more advantageous for very small meshes, the ChebNet GCN is most suitable in the case of larger meshes.

Nevertheless, additional experiments show that the meta-model approximation is not sufficiently accurate for meshes of thousands of elements. We face the so-called curse of dimensionality due to the limited number of training samples. An increasing number of samples can overcome this difficulty, but many simulations would need to be performed in such a case, and the learning cost increases. Thus, to keep N_tr = 2000, we limited ourselves to meta-models based on simulations on meshes with a maximum of ca. 1000 elements.

In practice, we see that the ChebNet GCN is not only better for larger meshes (from ca. 500 up to ca. 1000 elements) but also provides a more stable learning process, in terms of a smoother decrease of the validation loss, than the GraphSage GCN. Therefore the ChebNet GCN is preferred over the GraphSage GCN for our task.
7.3 Role of Random Field Parameters

The random field parameters affect the meta-model learning ability. Changing the standard deviation σ only scales the features of the graph vertices, which our learning procedure can handle. The correlation length λ plays a more significant role. As λ decreases, the correlation between the features of vertices decreases as well. For small correlation lengths, e.g., λ = 0.001, all features are almost uncorrelated, which has a similar effect as increasing the number of vertices while keeping the original correlation length λ = 0.1. Thus, increasing the number of training samples is necessary to obtain the same results for λ = 0.001.

This reveals one of the SVR drawbacks, which is the requirement of a large number of training samples [2, p. 76]. While 5000 training samples are enough for the ChebNet GCN on meshes of 546 elements with λ = 0.001, in the case of the SVR, even 15000 training samples are not enough to get results for λ = 0.001 similar to those for λ = 0.1. For this reason, the ChebNet GCN is preferred for further analysis. It is also worth noting that, naturally, fewer training samples suffice for λ > 0.1.
7.4 MC extended by the meta-level

Let us use the trained meta-models with the Monte Carlo method described in section 6. To illustrate our approach, we use three different MCs with simulations on meshes of 6, 48, and 546 elements, each extended to a two-level MLMC_meta. Table 5 shows the ratio between the total cost C_meta of our MLMC_meta and the total cost C of the original MC, for V_t = 1e−5. It is important to note that the ratio C_meta/C is a bit smaller for V_t << 1e−5, where the meta-model learning cost is negligible due to the larger N_2. It emerges that computational cost savings of at least 50% are achieved for R = 2.

Table 5: Computational cost of MLMC_meta to MC

    C_meta/C           number of mesh elements
                       6          48         546
    SVR                0.2077     0.4947     0.4666
    ChebNet GCN        0.3648     0.4371     0.4395
    GraphSage GCN      0.2812     0.4485     0.4494

In the rest of this section, MLMC_meta employs the ChebNet GCN on 546 mesh elements. Since the MEM utilizes the moment values µ, it is important to verify that their estimates by the MLMC_meta are the same as the estimates by the original MC. We construct a reference MC (MC_ref) of 50000 samples on meshes of 217208 elements; its moment estimates are denoted µ̂_i^ref. Figure 3 shows the MSE (1/25) ∑_{i=1}^{25} (µ̂_i^ref − µ̂_i)² for the moment values estimated by the MC, and the MSE (1/25) ∑_{i=1}^{25} (µ̂_i^ref − µ̂_i^meta)² for the moments estimated by the MLMC_meta.

Figure 3: Comparison of the MSE between reference moments and moments estimated by MC and MLMC_meta, V_t = 1e−5

Since the error of the moment estimates is similar for both the MC and the MLMC_meta, it is a good prerequisite for a fine PDF approximation. Figure 4 shows an example of a PDF approximated by the MEM based on moments from the MC and the MLMC_meta. Given the 25 repetitions, the average KL divergences are KL(ρ_ref || ρ_MC) = 0.034 and KL(ρ_ref || ρ_MLMC_meta) = 0.031, for R = 25 and V_t = 1e−5. Thus, it is possible to get comparable results by both approaches.
Figure 4: Comparison of PDFs approximated by moments from the MC (blue) and the MLMC_meta (red dotted); D_MC = KL(ρ_ref || ρ_MC) is the Kullback-Leibler (KL) divergence between the PDF ρ_MC from MC data and the reference PDF ρ_ref, and D_meta = KL(ρ_ref || ρ_MLMC_meta), where ρ_MLMC_meta is the density from MLMC_meta data; V_t = 1e−6, R = 25. In the shown example, D_MC = 0.0002149 and D_meta = 0.0002178.

7.5 MLMC extended by the meta-level

Since we cannot effectively train the meta-models on meshes with more than ca. 1000 elements, it is advisable to start from the MLMC, which generally reduces the computational cost compared to the MC. Applying formula 24, a meta-model is trained on the coarsest level, where a mesh should have a small number of elements; thus, meta-model learning is feasible. To illustrate the MLMC extended by the meta-level, let us assume a 3-level MLMC with simulations on meshes of 546, 6772 and 87794 elements, and the ChebNet GCN meta-model. The RF parameters are λ = 0.1, σ = 1.

Figure 5: Moment variances across MLMC levels. A level is identified by the mesh step h of its simulation; the added numbers represent the MLMC_meta costs C_l

Figure 5 shows the variance decrease from the coarsest level to the finest level. Levels are here identified by the simulation mesh step h; the smaller the h, the finer the simulation. It can be observed that in our current MLMC implementation, the decrease is steeper for lower moments. In general, the more the variance decreases across levels, the more effective the use of the MLMC. Thus, in our case, a higher R results in a less effective MLMC compared to a lower R. The number of moments affects the final N_l and consequently the total computational cost C.

Table 6 provides the ratio between the MLMC_meta total cost C_meta and the MLMC total cost C. To measure the quality of the PDF approximation, the KL divergence KL(ρ_ref^25 || ρ_MLMC_meta^R) is added to the table, where ρ_ref^25 denotes a PDF based on 25 moments estimated by the MC_ref, and ρ_MLMC_meta^R is a PDF approximated from the moments estimated by the MLMC_meta.

Table 6: Computational cost of MLMC_meta to MLMC

    R     C_meta/C    KL(ρ_ref^25 || ρ_MLMC_meta^R)
    2     0.41        0.522
    5     0.64        0.00227
    15    0.83        0.00338
    25    0.87        0.00503
    50    0.87        0.0112
    75    0.85        0.0176

It can be seen that the computational cost savings are greatest for the smallest number of moments. However, R = 2 is insufficient to obtain a decent PDF by the MEM; in our case, we need at least R = 5. On the other hand, with V_t >= 1e−5 and R > 30, the moment estimation error causes the PDF to contain obvious ripples. Using 5 ≤ R ≤ 30, we still achieve savings in the computational cost of at least 10%. To give an idea, for R = 25 and V_t = 1e−6, the absolute computational costs in seconds are as follows: C_meta ≈ 73266 and C ≈ 84603.

The effect of R is also manifested in the case of the MC extended by the meta-level. In both cases, it can be noted that the cost savings are almost constant for R ≥ 15.

8 Conclusions

In this paper, we dealt with the meta-model design for the groundwater flow problem. The motivation was to incorporate a meta-model into the multilevel Monte Carlo method and achieve additional savings in the MLMC computational cost. After comparing the meta-models based on the ChebNet GCN, the GraphSage GCN, and the support vector regression, the ChebNet GCN was selected as the most suitable for our task. The proposed meta-modeling technique is effective for simulations on unstructured meshes with a maximum of ca. 1000 elements. Computational cost savings of up to 50% were achieved for both the MC and the MLMC extended by the meta-level. Due to the observed uneven decrease in the variances of MLMC estimates across levels, the amount of computational cost savings depends on the number R of generalized statistical moments. In order to obtain a good PDF approximation by the MEM, we required R ≥ 5. In these cases, we were still able to achieve at least 10% savings in computational costs.

Although the obtained results are auspicious, to provide more general conclusions, it is necessary to try our approach with a more complex simulation, which will be more challenging for the meta-model design.
Acknowledgement

This work was partly supported by the Student Grant Scheme at the Technical University of Liberec through project nr. SGS-2020-3068. The project leading to this work has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 847593.
References

[1] Fatma Abid, Khalil Dammak, Abdelkhalak El Hami, Tarek Merzouki, Hassen Trabelsi, Lassaad Walha, and Mohamed Haddar. Surrogate models for uncertainty analysis of micro-actuator. Microsystem Technologies, 26(8):2589–2600, 2020.

[2] Mariette Awad and Rahul Khanna. Support vector regression. In Efficient Learning Machines, pages 67–80. Apress, Berkeley, CA, 2015.

[3] Andrew R. Barron and Chyong-Hwa Sheu. Approximation of density functions by sequences of exponential families. The Annals of Statistics, 19(3):1347–1369, September 1991.

[4] Claudio Bierig and Alexey Chernov. Approximation of probability density functions by the Multilevel Monte Carlo Maximum Entropy method. Journal of Computational Physics, 314:661–681, 2016.

[5] R. Blaheta, M. Béreš, and S. Domesová. A study of stochastic FEM method for porous media flow problem. In Applied Mathematics in Engineering and Reliability, pages 281–289. CRC Press, 2016.

[6] Jan Březina, Jan Stebel, Pavel Exner, and Jan Hybš. Flow123d. http://flow123d.github.com, repository: http://github.com/flow123d/flow123d, 2011–2021.

[7] Jan Březina and Martin Špetlík. MLMC Python library. http://github.com/GeoMop/MLMC, 2021.

[8] C. Canuto. Spectral Methods in Fluid Dynamics. Springer-Verlag, New York, 1988.

[9] Tomasz Danel, Przemysław Spurek, Jacek Tabor, Marek Śmieja, Łukasz Struski, Agnieszka Słowik, and Łukasz Maziarka. Spatial graph convolutional networks. In Neural Information Processing, pages 668–675, Cham, 2020. Springer International Publishing.

[10] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 29. Curran Associates, Inc., 2016.

[11] Michael B. Giles. Multilevel Monte Carlo methods. Acta Numerica, 24:259–328, May 2015.

[12] Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. Neural message passing for quantum chemistry. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1263–1272. PMLR, 2017.

[13] William L. Hamilton. Graph representation learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 14(3):1–159, 2020.

[14] William L. Hamilton, Rex Ying, and Jure Leskovec. Inductive representation learning on large graphs. CoRR, abs/1706.02216, 2017.

[15] C. Heaney, Yuling Li, O. Matar, and C. Pain. Applying convolutional neural networks to data on unstructured meshes with space-filling curves. ArXiv, abs/2011.14820, 2020.

[16] Falk Heße, Vladyslav Prykhodko, Steffen Schlüter, and Sabine Attinger. Generating random fields with a truncated power-law variogram: A comparison of several numerical methods. Environmental Modelling & Software, 55:32–48, May 2014.

[17] E. T. Jaynes. Information theory and statistical mechanics. Physical Review, 106(4):620–630, 1957.

[18] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. CoRR, abs/1609.02907, 2016.

[19] J. Kleijnen. Kriging: Methods and applications. ERN: Computational Techniques (Topic), 2017.

[20] Shaoning Li and Luca Caracoglia. Surrogate model Monte Carlo simulation for stochastic flutter analysis of wind turbine blades. Journal of Wind Engineering and Industrial Aerodynamics, 188:43–60, 2019.

[21] Yi Ma, Jianye Hao, Yaodong Yang, Han Li, Junqi Jin, and Guangyong Chen. Spectral-based graph convolutional network for directed graphs, 2019.

[22] Julian Mack, Rossella Arcucci, Miguel Molina-Solana, and Yi-Ke Guo. Attention-based convolutional autoencoders for 3d-variational data assimilation. Computer Methods in Applied Mechanics and Engineering, 372, 2020.

[23] Sebastian Müller and Lennart Schüler. GSTools. https://github.com/GeoStat-Framework/GSTools, 2019.

[24] Karl T. Otness. Graph convolutions and machine learning. Thesis, Harvard University, 2018.

[25] Imry Rosenbaum and Jeremy Staum. Multilevel Monte Carlo metamodeling. Operations Research, 65(4):1062–1077, 2017.

[26] Cosmin Safta, Richard L.-Y. Chen, Habib N. Najm, Ali Pinar, and Jean-Paul Watson. Toward using surrogates to accelerate solution of stochastic electricity grid operations problems. In 2014 North American Power Symposium (NAPS), pages 1–6, 2014.

[27] C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27(3):379–423, 1948.

[28] R. E. Stern, J. Song, and D. B. Work. Accelerated Monte Carlo system reliability analysis through machine-learning-based surrogate models of network connectivity. Reliability Engineering & System Safety, 164:1–9, 2017.

[29] Ze Zhou Wang, Changlin Xiao, Siang Huat Goh, and Min-Xuan Deng. Metamodel-based reliability analysis in spatially variable soils using convolutional neural networks. Journal of Geotechnical and Geoenvironmental Engineering, 147(3), 2021.

[30] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S. Yu. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 32(1):4–24, 2021.

[31] Mengfei Xu, S. Song, Xuxiang Sun, and Weiwei Zhang. UCNN: A convolutional strategy on unstructured mesh. ArXiv, abs/2101.05207, 2021.

[32] Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola. Dive into Deep Learning. 2020. https://d2l.ai.

[33] Si Zhang, Hanghang Tong, Jiejun Xu, and Ross Maciejewski. Graph convolutional networks. Computational Social Networks, 6(1), 2019.

[34] Jie Zhou, Ganqu Cui, Shengding Hu, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, Lifeng Wang, Changcheng Li, and Maosong Sun. Graph neural networks: A review of methods and applications. AI Open, 1:57–81, 2020.

[35] L. Zhou, G. Yan, and J. Ou. Response surface method based on radial basis functions for modeling large-scale structures in model updating. Computer-Aided Civil and Infrastructure Engineering, 28:210–226, 2013.