=Paper=
{{Paper
|id=Vol-2962/paper40
|storemode=property
|title=Groundwater Flow Meta-model for Multilevel Monte Carlo Methods
|pdfUrl=https://ceur-ws.org/Vol-2962/paper40.pdf
|volume=Vol-2962
|authors=Martin Špetlík,Jan Březina
|dblpUrl=https://dblp.org/rec/conf/itat/SpetlikB21
}}
==Groundwater Flow Meta-model for Multilevel Monte Carlo Methods ==
Martin Špetlík, Jan Březina
Technical University of Liberec, Faculty of Mechatronics, Informatics and Interdisciplinary Studies, Institute of New Technologies and Applied Informatics, Studentská 1402/2, 461 17 Liberec 1, Czech Republic
martin.spetlik@tul.cz, jan.brezina@tul.cz

Copyright ©2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract: This paper presents a meta-modeling-based technique to reduce the computational cost of Monte Carlo methods. A stochastic simulation of groundwater flow is substituted with a graph neural network meta-model. This type of neural network can deal with a non-Euclidean structure of the input data, which in our case is a graph representing a random field on an unstructured mesh. In order to find the most suitable meta-model, a comparison of the standard support vector regression with spectral and spatial graph convolutional neural networks is provided. Both the Monte Carlo method and the multilevel Monte Carlo method are extended by the meta-model level. In both cases, up to 50% savings in computational cost are achieved while maintaining the accuracy of Monte Carlo estimates.

1 Introduction

Groundwater flow in the vicinity of a future nuclear waste repository is the major motivation for our research. In the first stage, our quantity of interest (QoI) is the total flow through the observed area. However, since not all indispensable properties of the rock environment are known, it is not feasible to determine the desired total flow. For that reason, and given the nature of the rock, the missing properties are modeled as random fields (RFs). The probability density function (PDF) of our QoI is under scrutiny. To approximate the PDF, the maximum entropy method is adopted. The method utilizes so-called generalized statistical moments. In order to estimate these moments, Monte Carlo methods are employed. These methods consist in repeating a random experiment. Depending on the required accuracy, this might result in thousands of groundwater flow simulations, which can significantly affect the total computational cost.

This fact was behind the idea to substitute simulations with an approximation model, often called a meta-model. Furthermore, we use the multilevel Monte Carlo method (MLMC) instead of the standard Monte Carlo method to reduce the variance of estimates more effectively. Since the MLMC uses simulations of different accuracy, we aim to substitute those with the lowest accuracy, which are performed in the largest number. Their approximation should not be excessively challenging for meta-modeling. We presume that this approach could reduce the computational cost of MLMC estimates. It is also necessary to maintain the accuracy of the estimates to achieve a good PDF approximation. Graph convolutional neural networks and support vector regression [2] are used as meta-models.

First, the groundwater flow problem definition and a brief introduction to the maximum entropy method are provided in section 2. Then, Monte Carlo methods are presented in section 3. Meta-modeling is introduced in section 4, and the spectral and spatial graph convolutional neural networks used as meta-models are delineated in section 5. Section 6 provides the link between the MLMC and a meta-model, including the related work. Finally, section 7 is devoted to the results and discussion.

2 Problem Definition

A 2D benchmark problem of groundwater flow through a porous medium is used to test our proposed methods. The same problem can be found in the paper by Blaheta et al. [5]. The groundwater flow is described by the boundary value problem on the unit square:

−div(K(x)∇p) = 0,   −K∇p ⋅ n⃗ = 0,    (1)

where K is the hydraulic conductivity, p is the pore water pressure, −K∇p is the Darcy velocity, and n⃗ is the unit normal vector. We are interested in the total flow Y through the specified area

Y = ∫₀¹ [−K∇p ⋅ n⃗](1, y) dy.    (2)

The problem is prescribed on unstructured meshes and is solved by the finite element method using the Flow123d software [6]. Since the proposed approach shall be adopted for more complicated problems with complex irregular geometry, unstructured meshes are preferred over structured ones. The hydraulic conductivity is considered as a random field (RF) with the exponential covariance function C(r) = σ² exp(−r/λ), where λ is a correlation length and σ is a standard deviation. The GSTools software library [23] is utilized to generate the RFs; see Heße et al. [16] for theoretical details.

Figure 1: An illustration of the porous media flow problem
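As an illustration of the setup above, a random field with the exponential covariance C(r) = σ² exp(−r/λ) can be sampled at mesh-element centroids with GSTools [23] roughly as follows. This is a minimal sketch, not the authors' code; the centroid coordinates and the seed are placeholder values.

<pre>
# Minimal sketch (not the authors' code): sampling a random field with
# exponential covariance C(r) = sigma^2 * exp(-r/lambda) at unstructured
# mesh element centroids using GSTools [23].
import numpy as np
import gstools as gs

sigma, corr_len = 1.0, 0.1                # default sigma and lambda (cf. section 7.2)
rng = np.random.default_rng(0)
centroids = rng.random((546, 2))          # placeholder element centroids on the unit square

model = gs.Exponential(dim=2, var=sigma**2, len_scale=corr_len)
srf = gs.SRF(model, seed=20210601)
field = srf((centroids[:, 0], centroids[:, 1]))   # one field value per mesh element
</pre>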
2.1 Maximum Entropy Method

When considering the hydraulic conductivity as a random field, the total flow Y is a real random variable. The aim is to determine its probability density function ρ(Y), which is nonnegative on a bounded domain Ω. The maximum entropy method (MEM) [17] is employed for this purpose. Its basic idea lies in approximating a PDF from generalized statistical moments and the functions that calculate them:

∫_Ω ρ(y) dy = 1,   ∫_Ω φ_r(y) ρ(y) dy = µ_r,   r = 1, ..., R,    (3)

where ρ(Y) is unknown, {µ_r}_{r=1}^R are the values of the generalized moments, and {φ_r}_{r=1}^R are linearly independent functions (see [3, p. 1350]) used for calculating the generalized moments and satisfying φ_1 = 1, φ_r ∈ C^R(Ω), r = 2, ..., R. In our case, {φ_r}_{r=1}^R are the Legendre polynomials and R ∈ N is the number of moments.

Provided that the system of equations (3) has a solution, it has an infinite number of them. Therefore, the most apposite solution is the one with the maximum Shannon entropy (defined in [27]), according to E. T. Jaynes [17, p. 623]. Determining ρ(Y) with the maximum entropy takes the form of finding the global maximum of the functional

H(ρ) = −∫_Ω ρ(y) ln(ρ(y)) dy    (4)

under the constraints prescribed in equations (3). A numerical solution and general limitations of the MEM are beyond the scope of this paper. A more detailed insight is provided in [3, 4].

3 Monte Carlo Methods

Since the MEM utilizes the moment values {µ_r}_{r=1}^R, which are themselves random variables derived from some random input X (the hydraulic conductivity K in this study), it is necessary to estimate their expected values. Monte Carlo methods are used for this.

The standard Monte Carlo method (MC) is an approach for estimating expected values of some stochastic simulation variable. The basic idea comes from the strong law of large numbers. Therefore, in order to obtain an unbiased estimate of the expected value Y = E[P(X)], the MC consists in the arithmetic mean of N independent samples

Ŷ = (1/N) ∑_{n=1}^{N} P(x_n),    (5)

where P is a random variable depending on X. According to the central limit theorem,

Ŷ ∼ N(µ, σ²/N).    (6)

Consequently, the computational cost of reducing the variance of Ŷ (increasing N) can be very high. To overcome this drawback (for details, see [11, p. 4]), the multilevel Monte Carlo method (MLMC) was formulated.

3.1 Multilevel Monte Carlo Method

In the case of the MLMC [11], the expected value of a random variable P is estimated based on the sequence of its approximations P_1, ..., P_L:

E[P] = E[P_1] + ∑_{l=2}^{L} E[P_l − P_{l−1}],    (7)

where P_{l−1} ≈ P_l. The accuracy of the approximation of P improves from P_1 to P_L. The unbiased estimate of E[P] reads

P̂ = (1/N_1) ∑_{n=1}^{N_1} P_1(x_n^1) + ∑_{l=2}^{L} { (1/N_l) ∑_{n=1}^{N_l} ( P_l(x_n^l) − P_{l−1}(x_n^l) ) },    (8)

where L is the total number of levels and N_l is the number of simulation samples at level l. The input random data x_n^l are dependent within each simulation pair n at level l, but they are independent across levels. Since the differences P_l − P_{l−1} are also independent across levels, the total estimated variance V̂ of P̂ has the following form:

V̂ = ∑_{l=1}^{L} V̂_l / N_l,    (9)

where V̂_1 is an estimate of the variance of P_1 and V̂_l for l > 1 is an estimate of the variance of P_l − P_{l−1}. The MLMC computational cost is

C = ∑_{l=1}^{L} N_l C_l,    (10)

where C_l is the cost of a single simulation sample at level l, measured in, for instance, the number of computational operations, execution time, CPU time, etc. Let C_1 denote the cost of P_1; C_l is the cost of P_l − P_{l−1} for l > 1.

Given the MLMC theoretical properties stated in [11, Theorem 1], the variance V_l should decrease from l = 1 to l = L, while the computational cost C_l should increase from l = 1 to l = L.
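For readers who prefer code, the telescoping estimate (8) can be written as a short routine. This is an illustrative sketch only; sample_level is a hypothetical callable returning correlated pairs (P_l, P_{l−1}) evaluated on the same random inputs.

<pre>
# Sketch of the MLMC estimate (8); `sample_level(l, n)` is hypothetical and is
# expected to return n values of P_l and of P_{l-1} computed from the same inputs.
import numpy as np

def mlmc_estimate(sample_level, N):
    """N[l] is the number of samples at level l (l = 0 corresponds to level 1 in the paper)."""
    estimate = 0.0
    for l, n_l in enumerate(N):
        fine, coarse = sample_level(l, n_l)          # arrays of length n_l
        if l == 0:
            estimate += np.mean(fine)                # E[P_1] term
        else:
            estimate += np.mean(fine - coarse)       # E[P_l - P_{l-1}] correction terms
    return estimate
</pre>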
3.2 Number of Simulation Samples

The way of determining N_l is a crucial part of the MLMC. There are two main approaches:
• minimizing the total variance V with respect to a prescribed computational cost,
• minimizing the total computational cost C with respect to a prescribed target variance V_t.

Our attention is paid to the latter approach, which takes the form of finding the minimum of equation (10) under the constraint

V_t = ∑_{l=1}^{L} V̂_l / N_l.    (11)

After some calculus, and concerning the use of the MEM, the N_l are determined with respect to the R moment values:

N_l^r = √(V̂_l^r / C_l) ⋅ ( ∑_{i=1}^{L} √(V̂_i^r C_i) ) / V_t,   r = 1, ..., R,    (12)

where V̂_l^r is the estimated variance of φ_r^l(x) − φ_r^{l−1}(x), i.e. of the r-th moment difference at level l. Finally, N_l = max_{r=1,...,R} N_l^r.

The mlmc software library [7] is employed to schedule samples and post-process results, including our MEM implementation.
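A direct transcription of the allocation formula (12) could look as follows; this is a sketch under the assumption that the moment-wise variances and level costs are already estimated, with array shapes chosen only for illustration.

<pre>
# Sketch of the sample allocation (12): optimal N_l for a target variance V_t,
# maximized over the R moments. V has shape (R, L), C has shape (L,).
import numpy as np

def optimal_samples(V, C, V_t):
    weighted = np.sqrt(V * C[None, :]).sum(axis=1)            # sum_i sqrt(V_i^r C_i) per moment r
    N_rl = np.sqrt(V / C[None, :]) * weighted[:, None] / V_t  # N_l^r from equation (12)
    return np.ceil(N_rl.max(axis=0)).astype(int)              # N_l = max_r N_l^r
</pre>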
4 Meta-modeling

A meta-model is a simpler, explicit mathematical function that approximates a complicated function that can be implicit and evaluated by a simulation model, measurement data, or experiments. Alternative names can be found in the literature, such as surrogate model, response surface model, emulator, etc. Meta-modeling is the process of designing meta-models. The common meta-models used in finite element analysis include support vector regression (SVR), artificial neural networks (ANN), Kriging, also known as the Gaussian process [19], radial basis functions (RBF) [35], etc.

Artificial neural networks have become very popular for meta-model design. A regression problem is solved by an ANN trained by supervised learning. The aim is to obtain an unknown mapping of input neuron activities to an output neuron activity. With regard to the nature of our input data, which is a 2D correlated random field, it would be suitable to use a convolutional neural network (CNN) meta-model. Nevertheless, CNNs cannot be applied directly to data on unstructured meshes (see [31]).

There are a few ways to overcome this difficulty. The basic solution is to directly apply a 1D CNN to the nodal values of an unstructured mesh considered as a vector; however, it has limited success [15]. Another option is interpolating data from an unstructured mesh to a structured mesh and then using a CNN. This approach leads to an additional error caused by the interpolation, and it requires a larger number of grid points to capture features from the original unstructured data, according to [22]. Wang et al. [29] proposed a CNN on unstructured meshes; this novel approach is limited to random fields described by the spectral representation method or the Karhunen-Loève expansion. Another popular approach is the representation of unstructured meshes as undirected graphs, which allows us to use graph convolutional neural networks (GCNs) [31]. In this study, we use GCN-based meta-models and support vector regression meta-models.

4.1 Graph Representing a Random Field

A graph G = (V, E) is an unweighted undirected graph describing the mesh structure on which a random field is generated. V is a set of vertices representing mesh elements, V = {v_1, v_2, ..., v_S}, where S is the number of vertices (= mesh elements). The neighborhood of a vertex v is defined as N(v) = {u ∈ V | (v, u) ∈ E}. Each vertex has a feature representing the random field value at the corresponding mesh element. E is a set of edges; an edge connects adjacent mesh elements. An adjacency matrix A is used to represent G on a computer.

Figure 2: An example of a graph on 546 mesh elements
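The graph construction of section 4.1 amounts to filling an adjacency matrix from the element-neighbour information of the mesh. The sketch below assumes a hypothetical adjacency list element_neighbors provided by a mesh library; it is not part of the paper's implementation.

<pre>
# Sketch of the graph representation from section 4.1: vertices are mesh
# elements, edges connect adjacent elements. `element_neighbors` is assumed
# to map an element index to the indices of its adjacent elements.
import numpy as np

def adjacency_matrix(element_neighbors):
    S = len(element_neighbors)
    A = np.zeros((S, S), dtype=np.float32)
    for v, neighbors in element_neighbors.items():
        for u in neighbors:
            A[v, u] = A[u, v] = 1.0          # unweighted, undirected graph
    return A

# Vertex features are the random field values, one per mesh element:
# features = field.reshape(-1, 1)
</pre>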
5 Graph Neural Networks

A variety of real-world problems can be represented as graphs: imagine social networks, molecules, or, in our case, random fields on unstructured meshes.

Graph neural networks (GNNs) are deep learning-based models that operate on the graph domain [34]. They have a wide range of applications in classification, relation extraction, molecular fingerprint prediction, and so on. A comprehensive survey of graph neural networks can be found in [30]. In brief, GNNs are categorized into several groups, such as graph convolutional networks [33], graph attention networks, graph recurrent networks, etc. The graph convolutional networks (GCNs) are the most important ones because they are the foundation of other graph neural network models (see [21]). GCNs can be divided into spectral GCNs and spatial GCNs.

5.1 Spectral Graph Convolutional Neural Networks

Spectral GCNs are based on knowledge from the field of graph signal processing. The well-known convolution has the following form:

(f ∗ g)(x) = ∫_{R^k} f(t) g(x − t) dt,    (13)

but it is unclear how to interpret the translation g(x − t) for a graph signal. Thus, the convolution operation is not defined on structures like graphs. W. Hamilton [13] or K. Otness [24] provides a detailed explanation of the graph convolution. The basic idea is to take advantage of the fact that convolution in the spatial domain corresponds to point-wise multiplication in the spectral domain.

A graph signal x ∈ R^S (the vector of all vertex values of G) is transformed by the graph Fourier transform from the spatial domain to the spectral domain. Loosely speaking, the standard Fourier transform is connected to the eigendecomposition of the Laplace operator. In the case of graphs, the Laplace operator is represented by the Laplacian matrix [33, p. 4]. The normalized Laplacian matrix L can also be used as a Laplace operator:

L = I − D^{−1/2} A D^{−1/2},    (14)

where A is the graph adjacency matrix and D is the diagonal matrix of vertex degrees. L is a symmetric real-valued positive semi-definite matrix that can be factorized as L = U Λ U^T, where U is a matrix of eigenvectors and Λ is a diagonal matrix of eigenvalues. Then the graph Fourier transform of x is [34, p. 60]

F(x) = U^T x    (15)

and its inverse

F^{−1}(x̂) = U x̂.    (16)

Finally, the graph convolution of x and a filter g ∈ R^S is

x ∗_G g = F^{−1}(F(x) ⊙ F(g)),    (17)

where ∗_G is the convolutional operator on a graph and ⊙ represents the element-wise Hadamard product. The signal x can be filtered in the spectral domain by a filter g_θ in this way (for details, see [13, p. 83]):

x ∗_G g_θ = U g_θ(Λ) U^T x,    (18)

where g_θ(Λ) is a polynomial of the eigenvalues of the Laplacian. The filter represents learnable weights.

ChebNet, a GCN with a Chebyshev convolutional layer, was first introduced by Defferrard et al. [10] in 2016. The filter g is approximated by a truncated expansion of Chebyshev polynomials T_k(x) up to the K-th order:

g_θ = ∑_{k=0}^{K−1} θ_k T_k(Λ̃),    (19)

where θ ∈ R^K is a vector of Chebyshev coefficients, Λ̃ = 2Λ/λ_max − I, and λ_max is the maximum eigenvalue of Λ. The Chebyshev polynomials are defined recursively by T_{k≥2}(x) = 2x T_{k−1}(x) − T_{k−2}(x) with T_0(x) = 1, T_1(x) = x. Then x filtered by g is

x ∗_G g = U ( ∑_{k=0}^{K−1} θ_k T_k(Λ̃) ) U^T x.    (20)

The eigendecomposition can be avoided [30, p. 10]:

x ∗_G g = ( ∑_{k=0}^{K−1} θ_k T_k(L̃) ) x,    (21)

where L̃ = 2L/λ_max − I is called the rescaled graph Laplacian; its eigenvalues are mapped from [0, λ_max] to [−1, 1], where the Chebyshev polynomials form an orthogonal basis.

Several settings of K were tested. Since K > 1 did not bring any improvement for our problem, K = 1 is used in our study. For this setting, the ChebNet is very similar to the GCN proposed by Kipf and Welling [18]. Filters are exactly K-localized, which means the filter modifies information at a particular vertex based on the information from vertices in its K-neighborhood. This essentially connects spectral-based methods with spatial-based ones.
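Equations (14) and (21) translate directly into a few lines of dense linear algebra. The following is a purely illustrative sketch of the Chebyshev filtering step, not an efficient or library implementation.

<pre>
# Sketch of the spectral filtering in equations (14) and (21): normalized
# Laplacian, rescaled Laplacian, and a K-term Chebyshev filter applied to a
# graph signal x (dense matrices, illustrative only).
import numpy as np

def normalized_laplacian(A):
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    return np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt       # equation (14)

def chebyshev_filter(A, x, theta):
    L = normalized_laplacian(A)
    lam_max = np.linalg.eigvalsh(L).max()
    L_tilde = 2.0 * L / lam_max - np.eye(len(A))              # rescaled graph Laplacian
    T_prev, T_curr = x.copy(), L_tilde @ x                    # T_0(L~)x and T_1(L~)x
    out = theta[0] * T_prev
    if len(theta) > 1:
        out += theta[1] * T_curr
    for k in range(2, len(theta)):                            # T_k = 2 L~ T_{k-1} - T_{k-2}
        T_prev, T_curr = T_curr, 2.0 * L_tilde @ T_curr - T_prev
        out += theta[k] * T_curr
    return out                                                # equation (21)
</pre>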
5.2 Spatial Graph Convolutional Neural Networks

As for spatial GCNs [9], the convolution is performed in the graph domain by propagating information along edges between adjacent vertices. The concept is based on message passing neural networks [12]. The spatial graph convolution is defined as follows [30, p. 12]:

h_v^{(k)} = U_k( h_v^{(k−1)}, ∑_{u∈N(v)} M_k( h_v^{(k−1)}, h_u^{(k−1)}, x_{vu}^e ) ),    (22)

where h_v^{(k)} represents the features of vertex v in hidden layer k, x_{vu}^e is an optional feature vector of the edge (v, u), and U_k and M_k are the update and message functions with learnable parameters.

GraphSage is an aggregation-based learning model proposed by Hamilton et al. [14]. It performs the convolution as follows:

h_v^{(k)} = σ( W^{(k)} ⋅ f_k( h_v^{(k−1)}, { h_u^{(k−1)} ∀ u ∈ S_{N(v)} } ) ),    (23)

where h_v^0 is a representation vector of the features of vertex v, σ represents a nonlinear activation function, W^{(k)} is a weight matrix in layer k, f_k represents an aggregation function, and S_{N(v)} ⊆ N(v). Thus, GraphSage enables the use of huge graphs by selecting a subset of each vertex neighbourhood instead of using all neighbours. There are different operations used as f_k, such as average, sum, max, min, etc. In this study, we use GraphSage with a single layer, and f is the summation.

A comparison between spectral GCNs and spatial GCNs is summarized by Wu et al. [30, p. 13]. They state that spatial GCNs are usually preferred over spectral GCNs. However, in our problem (section 2), we aim to approximate the solution of an elliptic partial differential equation. To do that numerically, it is possible to use spectral methods [8]. This entitles us to assume that spectral GCNs will be more appropriate.

6 Monte Carlo Methods with Meta-model

There are several approaches to using meta-models to reduce the computational cost of Monte Carlo estimates. A relatively frequent approach (e.g., [1, 20, 28]) is to construct a meta-model of a simulation; the MC is then conducted with the meta-model instead of the simulation. Other approaches consist in replacing the whole Monte Carlo estimate with a meta-model. For instance, Safta et al. [26] use a polynomial chaos expansion meta-model, which requires a significantly lower number of samples than the MC. Rosenbaum and Staum [25] construct a stochastic simulation meta-model by way of the MLMC.

We propose a different approach based on adding a new coarse meta-model-based level to the original Monte Carlo method (MC or MLMC). The MLMC estimate from equation (8) now has the following form:

P̂_meta = (1/N_1) ∑_{n=1}^{N_1} P̃_1(x_n^1) + (1/N_2) ∑_{n=1}^{N_2} ( P_1(x_n^2) − P̃_1(x_n^2) ) + ∑_{l=3}^{L} { (1/N_l) ∑_{n=1}^{N_l} ( P_l(x_n^l) − P_{l−1}(x_n^l) ) },    (24)

where P̃_1 denotes a meta-model approximation of P_1. Since the MLMC assumes P_{l−1} ≈ P_l, it is completely valid to employ a meta-model as the coarsest level (which we denote as the meta-level) and the difference between P_1 and P̃_1 as the subsequent first level. Let MLMC_meta denote the multilevel Monte Carlo method with the meta-level.

In order to meet the basic theoretical properties of the MLMC (mentioned in section 3.1), the meta-level computational cost C_1 should be lower than C_2, and the meta-level sample variance V_1 should be greater than V_2.

6.1 Computational cost

The total MLMC cost (see equation (10)) depends on C_l, in our case measured in execution time. As a meta-model brings additional computational costs, the parts of the learning process, including their costs, are introduced below.

First, the number of training samples N_tr is determined, and numerical simulations are executed. Then, for each training sample, a stored random field is pre-processed to become a meta-model input; let C_pr denote the cost of this operation. The pre-processing can slightly differ for different meta-models: the SVR input is a vector of random field elements, while the GCN input is their graph. Afterward, a meta-model is trained, and the cost of this operation is C_ml. Once the meta-model is prepared, it is possible to make predictions at the cost of C_pred. In the case of the meta-level, there is no stored random field because no simulation is executed; thus, a random field is generated and pre-processed at the cost of C_rf.

Let C_2 denote the computational cost of a sample at the first MLMC_meta level:

C_2 = C_1^* + ( C_2^tr + C_2^pr (N_2 − N_tr) ) / N_2,    (25)

where C_1^* is the cost of a simulation sample. C_2^tr is the cost of the meta-model training procedure:

C_2^tr = C_pr N_tr + C_ml + C_pred N_tr,    (26)

and C_2^pr represents the cost of a first-level meta-model prediction sample:

C_2^pr = C_pr + C_pred.    (27)

Let C_1 denote the cost of a meta-level sample that utilizes the meta-model trained at the first level:

C_1 = C_rf + C_pred.    (28)

Sample costs C_l for l > 2 are not affected by the meta-modeling. Meta-model training is performed on a cluster; 16 CPUs (Intel Xeon Silver 4114 CPU (2.2 GHz)) and 16 GB RAM (DDR4 2400 ECC Reg dual rank) are assigned for that task. Also, MLMC simulations are executed in parallel. Therefore, C measured in execution time is not equal to the real elapsed time. However, the savings in C can significantly affect the total elapsed time, especially for a small V_t, which is necessary to obtain a neat PDF by the MEM.
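The cost bookkeeping of equations (25)-(28) is simple arithmetic; a sketch with illustrative argument names is given below.

<pre>
# Sketch of the per-sample cost bookkeeping from equations (25)-(28); all
# arguments are measured costs (e.g. execution times), names are illustrative.
def meta_level_costs(C_sim, C_pr, C_ml, C_pred, C_rf, N_tr, N_2):
    C2_tr = C_pr * N_tr + C_ml + C_pred * N_tr            # training procedure, eq. (26)
    C2_pr = C_pr + C_pred                                  # one first-level prediction, eq. (27)
    C_2 = C_sim + (C2_tr + C2_pr * (N_2 - N_tr)) / N_2     # first-level sample cost, eq. (25)
    C_1 = C_rf + C_pred                                    # meta-level sample cost, eq. (28)
    return C_1, C_2
</pre>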
7 Results

The proposed meta-modeling techniques are investigated in this section. The most suitable one is then used by the Monte Carlo methods. The models are compared for different mesh sizes and random field parameters.

A cross-validation-like procedure is implemented to compare the models. Initially, N = 50000 simulation samples are generated. The number of training samples N_tr is empirically determined based on the properties of the MLMC. Since we are interested in V_t ≤ 1e−5 (for the sake of the MEM capability to approximate a PDF neatly), the number of samples at the coarsest level is at least 2000 for our problem. Thus, the procedure is as follows: 2000 training samples (N_tr = 2000) out of N are chosen for meta-model training, and the rest is considered to be test data. Validation data accounts for 20% of N_tr. This procedure is repeated 25 times with independent training sets. Given that we use a random field with the exponential correlation function, the logarithm of the RF values is used as the meta-model input to facilitate training. In all of the following cases, the final N_l is determined based on a geometric sequence of the initial numbers of samples decreasing across levels from N_1^0 = 2000 to N_L^0 = 100. The number of moments R = 25 is utilized.

7.1 Meta-models Setting

We have tried dozens of GCN topologies. The most promising ones consist of one ChebNet/GraphSage layer with 8/60 output channels and the ReLU activation function, followed by a global summation pool and an output layer with one neuron with the identity activation function. Table 1 contains the main hyperparameters that enable us to obtain an accurate and computationally efficient meta-model. Another important hyperparameter is the number of output channels; the optimal number differs across the types of GCN layers. The channels form a vector of hidden representations corresponding to each vertex. It is possible to think of each channel as responding to some different set of features, so different channels could become specialized to recognize different objects, as described by Zhang et al. [32].

Table 1: GCN hyperparameters common to all cases
  optimizer: Adam
  hidden activation: ReLU
  output activation: identity
  learning rate: 0.001
  regularization: None
  loss: MSE
  max epochs: 2000
  patience: 150

Regarding the SVR, the Gaussian radial basis function (RBF) kernel and the regularization parameter W = 0.06 are used. An explanation of the role of the SVR parameters is provided in [2].
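The paper does not state which software framework implements the GCN meta-models. Assuming the Spektral/Keras libraries, the ChebNet topology described above (one ChebConv layer with 8 channels, ReLU, a global summation pool, and a single linear output neuron, trained with Adam, learning rate 0.001, and MSE loss) could be sketched as follows; treat it as a hypothetical reconstruction, not the authors' code.

<pre>
# Hypothetical sketch of the section 7.1 topology using Spektral; the paper
# does not specify its GCN framework.
import tensorflow as tf
from spektral.layers import ChebConv, GlobalSumPool

S = 546     # number of graph vertices (mesh elements)
F = 1       # one feature per vertex: the (log) random field value

x_in = tf.keras.Input(shape=(S, F))      # batch of node-feature matrices
a_in = tf.keras.Input(shape=(S, S))      # adjacency preprocessed with ChebConv.preprocess
h = ChebConv(8, K=1, activation="relu")([x_in, a_in])
h = GlobalSumPool()(h)                   # global summation pool
y_out = tf.keras.layers.Dense(1, activation="linear")(h)

model = tf.keras.Model(inputs=[x_in, a_in], outputs=y_out)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss="mse")
# Training with at most 2000 epochs and early-stopping patience 150 (Table 1):
# model.fit(..., epochs=2000,
#           callbacks=[tf.keras.callbacks.EarlyStopping(patience=150)])
</pre>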
7.2 Meta-models on Different Meshes

Three meta-model techniques are compared: the SVR, the ChebNet GCN, and the GraphSage GCN. Recall that low-accuracy simulations are supposed to be substituted with meta-models; hence, the emphasis is placed on meta-models trained on small meshes. In particular, meshes of 6, 48, and 546 elements are compared. The random field parameters are by default λ = 0.1 and σ = 1. For a given mesh size, the models are trained and tested on the same data.

Table 2 and Table 3 show the models' final accuracy on training data and test data, respectively. The arithmetic mean and the standard error of 25 calculations of the mean squared error (MSE), (1/D) ∑_{i=1}^{D} (y_i − y_i^meta)², are provided, where y_i is a correct value, y_i^meta is a predicted value, and D is the number of samples. The relative squared error (RSE) is used to compare the models across cases (see Table 4).

Table 2: Meta-models train MSE (number of mesh elements: 6 / 48 / 546)
  arithmetic mean
    SVR: 0.01711 / 0.007185 / 0.004657
    ChebNet GCN: 0.03476 / 0.009441 / 0.004633
    GraphSage GCN: 0.02368 / 0.01045 / 0.006848
  standard error
    SVR: 4.1e−4 / 6.5e−5 / 3.6e−5
    ChebNet GCN: 7.0e−3 / 8.1e−5 / 2.2e−4
    GraphSage GCN: 4.3e−4 / 9.9e−5 / 1.8e−4

Table 3: Meta-models test MSE (number of mesh elements: 6 / 48 / 546)
  arithmetic mean
    SVR: 0.03694 / 0.008968 / 0.006696
    ChebNet GCN: 0.04331 / 0.008392 / 0.004595
    GraphSage GCN: 0.02284 / 0.009200 / 0.006839
  standard error
    SVR: 6.0e−4 / 5.4e−5 / 6.1e−5
    ChebNet GCN: 2.6e−3 / 7.9e−5 / 1.9e−4
    GraphSage GCN: 1.9e−4 / 1.2e−4 / 1.9e−4

Table 4: Meta-models test RSE (number of mesh elements: 6 / 48 / 546)
  SVR: 0.1263 / 0.1551 / 0.1335
  ChebNet GCN: 0.1481 / 0.1451 / 0.1064
  GraphSage GCN: 0.1162 / 0.1591 / 0.1364

The data in the tables show that all models provide similar results in terms of the train MSE and the test MSE, and also the RSE values are of the same order of magnitude. Importantly, the presented data show no significant outliers, which would be highly undesirable for our purposes. Although the results are very similar, it can be seen that the optimal meta-model varies depending on the mesh size. While the GraphSage GCN or the SVR is more advantageous for very small meshes, the use of the ChebNet GCN is most suitable in the case of larger meshes.

Nevertheless, additional experiments show that the meta-model approximation is not sufficiently accurate for meshes of thousands of elements. We face the so-called curse of dimensionality due to the limited number of training samples. An increasing number of samples can overcome this difficulty, but many simulations need to be performed in such a case, and the learning cost increases. Thus, to keep N_tr = 2000, we limited ourselves to meta-models based on simulations on meshes with a maximum of ca. 1000 elements.

In practice, we see that the ChebNet GCN is not only better for larger meshes (from ca. 500 up to ca. 1000 elements) but also provides a more stable learning process, in terms of a smoother decrease of the validation loss, than the GraphSage GCN. Therefore, the ChebNet GCN is preferred over the GraphSage GCN for our task.
7.3 Role of Random Field Parameters

Random field parameters affect the meta-model learning ability. Changing the standard deviation σ only scales the features of the graph vertices, which our learning procedure can handle. The correlation length λ plays a more significant role. As λ decreases, the correlation between the features of vertices decreases as well. For small correlation lengths, e.g., λ = 0.001, all features are almost uncorrelated, which has a similar effect as increasing the number of vertices with the original correlation length λ = 0.1. Thus, increasing the number of training samples is necessary to obtain the same results for λ = 0.001.

This reveals one of the SVR drawbacks, which is the requirement of a lot of training samples [2, p. 76]. While 5000 training samples are enough for the ChebNet GCN on meshes of 546 elements and λ = 0.001, in the case of the SVR even 15000 training samples are not enough to get similar results for λ = 0.001 as for λ = 0.1. For this reason, the ChebNet GCN is preferred for further analysis. It is also worth noting that, naturally, fewer training samples are enough for λ > 0.1.

7.4 MC extended by the meta-level

Let us use the trained meta-models with the Monte Carlo method described in section 6. To illustrate our approach, we use three different MCs with simulations on meshes of 6, 48, and 546 elements, each extended to a two-level MLMC_meta. Table 5 shows the ratio between the total cost C_meta of our MLMC_meta and the total cost C of the original MC, V_t = 1e−5. It is important to note that the ratio C_meta/C is a bit smaller for V_t ≪ 1e−5, where the meta-model learning cost is negligible due to the larger N_2. It emerges that computational cost savings of at least 50% are achieved for R = 2.

Table 5: Computational cost of MLMC_meta to MC, C_meta/C (number of mesh elements: 6 / 48 / 546)
  SVR: 0.2077 / 0.4947 / 0.4666
  ChebNet GCN: 0.3648 / 0.4371 / 0.4395
  GraphSage GCN: 0.2812 / 0.4485 / 0.4494

In the rest of this section, the MLMC_meta employs the ChebNet GCN on 546 mesh elements. Since the MEM utilizes the moment values µ, it is important to verify whether their estimates by the MLMC_meta are the same as the estimates by the original MC. We construct a reference MC (MC_ref) of 50000 samples on meshes of 217208 elements; its moment estimates are denoted µ̂_i^ref. Figure 3 shows the MSE (1/25) ∑_{i=1}^{25} (µ̂_i^ref − µ̂_i)² for the moment values estimated by the MC, and the MSE (1/25) ∑_{i=1}^{25} (µ̂_i^ref − µ̂_i^meta)² for the moments estimated by the MLMC_meta.

Figure 3: Comparison of the MSE between reference moments and moments estimated by the MC and the MLMC_meta, V_t = 1e−5

Since the error of the moment estimates is similar for both the MC and the MLMC_meta, it is a good prerequisite for a fine PDF approximation. Figure 4 shows an example of a PDF approximated by the MEM based on moments from the MC and the MLMC_meta. Given the 25 repetitions, the average KL divergence is KL(ρ_ref||ρ_MC) = 0.034 and KL(ρ_ref||ρ_MLMCmeta) = 0.031, for R = 25, V_t = 1e−5. Thus, it is possible to get comparable results by both approaches.

Figure 4: Comparison of PDFs approximated by moments from the MC (blue) and the MLMC_meta (red dotted); D_MC = KL(ρ_ref||ρ_MC) is the Kullback-Leibler (KL) divergence between the PDF ρ_MC from MC data and the reference PDF ρ_ref, and D_meta = KL(ρ_ref||ρ_MLMCmeta), where ρ_MLMCmeta is the density from MLMC_meta data; V_t = 1e−6, R = 25 (in the shown example, D_MC ≈ 0.0002149 and D_meta ≈ 0.0002178)
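The Kullback-Leibler divergences reported above and in section 7.5 can be evaluated numerically once both densities are available on a common grid; a minimal sketch under that assumption:

<pre>
# Sketch of a discretized KL(rho_ref || rho) for densities evaluated on a
# uniform grid with spacing dy over the bounded domain of Y.
import numpy as np

def kl_divergence(rho_ref, rho, dy):
    mask = rho_ref > 0
    return np.sum(rho_ref[mask] * np.log(rho_ref[mask] / rho[mask])) * dy
</pre>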
7.5 MLMC extended by the meta-level

Since we cannot effectively train the meta-models on meshes with more than ca. 1000 elements, it is advisable to initially use the MLMC, which generally reduces the computational cost compared to the MC. Applying formula (24), a meta-model is trained on the coarsest level, where the mesh should have a small number of elements; thus, meta-model learning is feasible. To illustrate the MLMC extended by the meta-level, let us assume a 3-level MLMC with simulations on meshes of 546, 6772 and 87794 elements, and the ChebNet GCN meta-model. The RF parameters are λ = 0.1, σ = 1.

Figure 5 shows the variance decrease from the coarsest level to the finest level. The levels are here defined by a simulation mesh step h; the smaller the h, the finer the simulation. It can be observed that in our current MLMC implementation, the decrease is steeper for lower moments. In general, the more the variance decreases across levels, the more effective the use of the MLMC. Thus, in our case, a higher R results in a less effective MLMC compared to a lower R. The number of moments affects the final N_l and consequently the total computational cost C.

Figure 5: Moment variances across MLMC levels. The level is identified by its simulation mesh step h; the added number represents the MLMC_meta C_l

Table 6 provides the ratio between the MLMC_meta total cost C_meta and the MLMC total cost C. To measure the quality of a PDF approximation, the KL divergence KL(ρ_ref^25 || ρ_MLMCmeta^R) is added to the table, where ρ_ref^25 denotes a PDF based on 25 moments estimated by the MC_ref, and ρ_MLMCmeta^R is a PDF approximated from R moments estimated by the MLMC_meta. It can be seen that the computational cost savings are greatest for the smallest number of moments. However, R = 2 is insufficient to obtain a decent PDF by the MEM; in our case, we need at least R = 5. On the other hand, with V_t ≥ 1e−5 and R > 30, the moment estimation error causes the PDF to contain obvious ripples. Using 5 ≤ R ≤ 30, we still achieve savings in the computational cost of at least 10%. To give an idea, for R = 25, V_t = 1e−6, the absolute computational costs in seconds are as follows: C_meta ≈ 73266 and C ≈ 84603.

Table 6: Computational cost of MLMC_meta to MLMC
  R = 2: C_meta/C = 0.41, KL(ρ_ref^25 || ρ_MLMCmeta^R) = 0.522
  R = 5: C_meta/C = 0.64, KL = 0.00227
  R = 15: C_meta/C = 0.83, KL = 0.00338
  R = 25: C_meta/C = 0.87, KL = 0.00503
  R = 50: C_meta/C = 0.87, KL = 0.0112
  R = 75: C_meta/C = 0.85, KL = 0.0176

The effect of R is also manifested in the case of the MC extended by the meta-level. In both cases, it can be noted that the cost savings are almost constant for R ≥ 15.

8 Conclusions

In this paper, we dealt with the meta-model design for the groundwater flow problem. The motivation was to incorporate a meta-model into the multilevel Monte Carlo method and achieve additional savings in the MLMC computational cost. After comparing the meta-models based on the ChebNet GCN, the GraphSage GCN, and the support vector regression, the ChebNet GCN was selected as the most suitable for our task. The proposed meta-modeling technique is effective for simulations on unstructured meshes with a maximum of ca. 1000 elements. Computational cost savings of up to 50% were achieved for both the MC and the MLMC extended by the meta-level. Due to the observed uneven decrease in the variances of the MLMC estimates across levels, the amount of computational cost savings depends on the number R of generalized statistical moments. In order to obtain a good PDF approximation by the MEM, we required R ≥ 5; in these cases, we were still able to achieve at least 10% savings in computational costs.

Although the obtained results are auspicious, to provide more general conclusions it is necessary to try our approach with a more complex simulation, which will be more challenging for the meta-model design.

Acknowledgement

This work was partly supported by the Student Grant Scheme at the Technical University of Liberec through project nr. SGS-2020-3068. The project leading to this work has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 847593.

References

[1] Fatma Abid, Khalil Dammak, Abdelkhalak El Hami, Tarek Merzouki, Hassen Trabelsi, Lassaad Walha, and Mohamed Haddar. Surrogate models for uncertainty analysis of micro-actuator. Microsystem Technologies, 26(8):2589–2600, 2020.
[2] Mariette Awad and Rahul Khanna. Support vector regression. In Efficient Learning Machines, pages 67–80. Apress, Berkeley, CA, 2015.
[3] Andrew R. Barron and Chyong-Hwa Sheu. Approximation of density functions by sequences of exponential families. The Annals of Statistics, 19(3):1347–1369, September 1991.
[4] Claudio Bierig and Alexey Chernov. Approximation of probability density functions by the Multilevel Monte Carlo Maximum Entropy method. Journal of Computational Physics, 314:661–681, 2016.
[5] R. Blaheta, M. Béreš, and S. Domesová. A study of stochastic FEM method for porous media flow problem. In Applied Mathematics in Engineering and Reliability, pages 281–289. CRC Press, 2016.
[6] Jan Březina, Jan Stebel, Pavel Exner, and Jan Hybš. Flow123d. http://flow123d.github.com, repository: http://github.com/flow123d/flow123d, 2011–2021.
[7] Jan Březina and Martin Špetlík. MLMC Python library. http://github.com/GeoMop/MLMC, 2021.
[8] C. Canuto. Spectral Methods in Fluid Dynamics. Springer-Verlag, New York, 1988.
[9] Tomasz Danel, Przemysław Spurek, Jacek Tabor, Marek Śmieja, Łukasz Struski, Agnieszka Słowik, and Łukasz Maziarka. Spatial graph convolutional networks. In Neural Information Processing, pages 668–675, Cham, 2020. Springer International Publishing.
[10] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 29. Curran Associates, Inc., 2016.
[11] Michael B. Giles. Multilevel Monte Carlo methods. Acta Numerica, 24:259–328, May 2015.
[12] Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. Neural message passing for quantum chemistry. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1263–1272. PMLR, 2017.
[13] William L. Hamilton. Graph representation learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 14(3):1–159.
[14] William L. Hamilton, Rex Ying, and Jure Leskovec. Inductive representation learning on large graphs. CoRR, abs/1706.02216, 2017.
[15] C. Heaney, Yuling Li, O. Matar, and C. Pain. Applying convolutional neural networks to data on unstructured meshes with space-filling curves. ArXiv, abs/2011.14820, 2020.
[16] Falk Heße, Vladyslav Prykhodko, Steffen Schlüter, and Sabine Attinger. Generating random fields with a truncated power-law variogram: A comparison of several numerical methods. Environmental Modelling & Software, 55:32–48, May 2014.
[17] E. T. Jaynes. Information theory and statistical mechanics. Physical Review, 106(4):620–630, 1957.
[18] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. CoRR, abs/1609.02907, 2016.
[19] J. Kleijnen. Kriging: Methods and applications. ERN: Computational Techniques (Topic), 2017.
[20] Shaoning Li and Luca Caracoglia. Surrogate model Monte Carlo simulation for stochastic flutter analysis of wind turbine blades. Journal of Wind Engineering and Industrial Aerodynamics, 188:43–60, 2019.
[21] Yi Ma, Jianye Hao, Yaodong Yang, Han Li, Junqi Jin, and Guangyong Chen. Spectral-based graph convolutional network for directed graphs, 2019.
[22] Julian Mack, Rossella Arcucci, Miguel Molina-Solana, and Yi-Ke Guo. Attention-based convolutional autoencoders for 3D-variational data assimilation. Computer Methods in Applied Mechanics and Engineering, 372, 2020.
[23] Sebastian Müller and Lennart Schüler. GSTools. https://github.com/GeoStat-Framework/GSTools, 2019.
[24] Karl T. Otness. Graph convolutions and machine learning. Thesis, Harvard University, 2018.
[25] Imry Rosenbaum and Jeremy Staum. Multilevel Monte Carlo metamodeling. Operations Research, 65(4):1062–1077, 2017.
[26] Cosmin Safta, Richard L.-Y. Chen, Habib N. Najm, Ali Pinar, and Jean-Paul Watson. Toward using surrogates to accelerate solution of stochastic electricity grid operations problems. In 2014 North American Power Symposium (NAPS), pages 1–6, 2014.
[27] C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27(3):379–423, 1948.
[28] R. E. Stern, J. Song, and D. B. Work. Accelerated Monte Carlo system reliability analysis through machine-learning-based surrogate models of network connectivity. Reliability Engineering & System Safety, 164:1–9, 2017.
[29] Ze Zhou Wang, Changlin Xiao, Siang Huat Goh, and Min-Xuan Deng. Metamodel-based reliability analysis in spatially variable soils using convolutional neural networks. Journal of Geotechnical and Geoenvironmental Engineering, 147(3), 2021.
[30] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S. Yu. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 32(1):4–24, 2021.
[31] Mengfei Xu, S. Song, Xuxiang Sun, and Weiwei Zhang. UCNN: A convolutional strategy on unstructured mesh. ArXiv, abs/2101.05207, 2021.
[32] Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola. Dive into Deep Learning. 2020. https://d2l.ai.
[33] Si Zhang, Hanghang Tong, Jiejun Xu, and Ross Maciejewski. Graph convolutional networks. Computational Social Networks, 6(1), 2019.
[34] Jie Zhou, Ganqu Cui, Shengding Hu, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, Lifeng Wang, Changcheng Li, and Maosong Sun. Graph neural networks: A review of methods and applications. AI Open, 1:57–81, 2020.
[35] L. Zhou, G. Yan, and J. Ou. Response surface method based on radial basis functions for modeling large-scale structures in model updating. Computer-Aided Civil and Infrastructure Engineering, 28:210–226, 2013.