<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>GPU accelerated Monte Carlo sampling for SPDEs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nikolay Shegunov</string-name>
          <email>nshegunov@fmi.uni-so</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Peter Armianov</string-name>
          <email>parmianov@fmi.uni-so</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Atanas Semerdjiev</string-name>
          <email>asemerdjiev@fmi.uni-so</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleg Iliev</string-name>
          <email>oleg.iliev@itwm.fraunhofer.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Mathematics and Informatics, Sofia University</institution>
          ,
<addr-line>Bulgaria; ITWM Fraunhofer, Kaiserslautern</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <fpage>99</fpage>
      <lpage>106</lpage>
      <abstract>
<p>Monte Carlo sampling methods are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. The idea of such algorithms is to introduce randomness in order to solve problems, even deterministic ones. They are often used for physical and mathematical problems and are most useful when other approaches are difficult or impossible due to limitations such as the cost of performing an experiment or the inability to take direct measurements. The problems typically require solving stochastic partial differential equations (SPDEs), where uncertainty is incorporated into the model, for example as an input parameter or as an initial or boundary condition. Extensive efforts have been devoted to the development of accurate numerical algorithms, so that simulation predictions are reliable in the sense that the numerical errors are well understood and under control for practical problems. Multilevel Monte Carlo (MLMC) is a novel idea: instead of sampling only from the finest approximation of the true solution, sampling is done at different levels. Such an approach is beneficial in terms of convergence rate. However, for practical simulations a large number of problems has to be solved, with a huge number of unknowns. These computational demands naturally lead to challenging parallel algorithms. To overcome some of the limitations, here we consider a parallel implementation of the MLMC algorithm for a model SPDE that uses GPU acceleration for the permeability generation.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Many mathematical models describing industrial problems are subject to
uncertainty due to some limitation. Incorporating the uncertainty typically
leads to a more accurate representation of the world; however, it comes at the
cost of solving a statistical problem. As examples, one can consider saturated
flow in the subsurface, or heat conduction in metal matrix composites. Such
stochastic models require enormous computational effort, thereby requiring
new fast algorithms to meet that need. In this paper we consider a scalar
elliptic SPDE describing single-phase flow through heterogeneous porous
media. Although the approach here is not limited to this problem, we aim at
computing the mean flux through saturated porous media with a prescribed
pressure drop and a known distribution of the random coefficients.</p>
      <p>
        One of the preferred and powerful methods for solving SPDEs is the Multilevel
Monte Carlo (MLMC) algorithm. The algorithm combines a few
expensive computations with plenty of cheap ones to compute
expected values at a significantly lower computational cost than standard Monte
Carlo. A key component is the selection of the different levels to combine.
Many different approaches exist. In [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] the authors use the number of terms of the
Karhunen-Loève expansion in order to define coarser levels. In [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], similarly
to our approach, the coarser levels are defined on coarser grids via averaging
of the coefficients in the PDE. Here we construct the levels by
renormalization. For details we refer to [
        <xref ref-type="bibr" rid="ref3 ref5">5, 3</xref>
        ]. For generation of the permeability (random
field) we use the circulant embedding algorithm [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>Due to the rapidly advancing area of computer science, methods based
on Monte Carlo sampling are of great interest. Such methods are well suited
for parallelization and can compute the expected value in reasonable time.
For realistic simulations an HPC implementation is usually needed. Here we
investigate the possible benefits of generating the permeability on GPUs and
study the reduction of the overall MLMC computation time.</p>
    </sec>
    <sec id="sec-2">
      <title>Model problem</title>
      <p>In order to test the overall performance of MLMC in combination with a CUDA
generated permeability field, we consider a simple model problem in a unit
cube domain: steady-state single-phase flow in random porous media. This
problem illustrates well the challenges in solving stochastic PDEs:
−∇ · [k(x, ω)∇p(x, ω)] = 0 for x ∈ D = (0, 1)^d, ω ∈ Ω,
(1)
subject to boundary conditions, with dimension d ∈ {2, 3}, pressure p, scalar
permeability k, and random vector ω. The quantity of interest is the mean
(expected value) of the total flux through the unit cube:</p>
      <p>Q(x, ω) := ∫_{x=0} k(x, ω) ∂_n p(x, ω) dx</p>
      <p>Both the coefficient k(x, ω) and the solution p(x, ω) are subject to
uncertainty, characterized by the random vector ω in a properly defined space.</p>
      <p>Solving this equation can be broken into three sub-problems: generating
the permeability (random field), solving the deterministic problem, and
reducing the variance with the MLMC method. We briefly discuss each
of them.</p>
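      <p>As a minimal illustration of this decomposition (a Python sketch with toy stand-ins, not the paper's C++/MPI implementation), the three stages can be wired into a plain Monte Carlo estimator; generate_permeability and solve_flux below are hypothetical simplifications:</p>

```python
import math
import random
import statistics

def generate_permeability(rng, n):
    # toy stand-in for the random-field generator: i.i.d. log-normal cells
    return [math.exp(rng.gauss(0.0, 1.0)) for _ in range(n)]

def solve_flux(k):
    # toy stand-in for the deterministic solver: the flux through a 1D series
    # of n equal cells under a unit pressure drop is the harmonic mean of k
    return len(k) / sum(1.0 / ki for ki in k)

def mc_flux(n_samples, n_cells, seed=0):
    # plain Monte Carlo: sample a permeability field, solve, average
    rng = random.Random(seed)
    q = [solve_flux(generate_permeability(rng, n_cells)) for _ in range(n_samples)]
    return statistics.mean(q), statistics.variance(q) / n_samples
```

The MLMC method described below replaces the single estimator with a telescoping combination of such estimators over a hierarchy of discretizations.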
      <sec id="sec-2-1">
        <title>Generating permeability field</title>
        <p>
          Generating permeability fields is an essential problem in solving the SPDE. Here
we consider a practical covariance proposed by [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]:
C(x, y) = σ² exp(−||x − y||_p / λ), p = 2,
(2)
where || · ||_p denotes the l_p norm in R^d, and which satisfies:
        </p>
        <p>E[K(x, ·)] = 0,
(3)
E[K(x, ·) K(y, ·)] = C(x − y) = C(y − x),
(4)
for x, y ∈ D and K(x, ω) = log(k(x, ω)).</p>
        <p>
          Several approaches have been developed for generating random
permeability fields applicable to flow simulations. Here we use an algorithm based on
forward and inverse Fourier transforms over a circulant covariance matrix to
generate the permeability. A realization of such a field is governed by two
parameters: the standard deviation σ and the correlation length λ. More details can be
found in the papers [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] and [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
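        <p>A minimal sketch of the circulant embedding idea in one space dimension may clarify the construction. It is an illustration only, not the paper's implementation: it uses a naive O(n²) DFT in place of FFTW/cuFFT, and the exponential (p = 1) covariance variant, whose minimal circulant embedding is known to be nonnegative definite:</p>

```python
import cmath
import math
import random

def dft(x):
    # naive O(n^2) discrete Fourier transform (stands in for FFTW / cuFFT)
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * j * k / n) for k in range(n))
            for j in range(n)]

def circulant_embedding_1d(m, sigma, lam, rng):
    # sample a stationary Gaussian field K on m grid points of [0, 1] with
    # covariance C(r) = sigma^2 exp(-r / lam)
    h = 1.0 / (m - 1)
    n = 2 * (m - 1)                          # size of the circulant embedding
    # first row of the circulant matrix embedding the Toeplitz covariance
    row = [sigma**2 * math.exp(-min(k, n - k) * h / lam) for k in range(n)]
    eig = [z.real for z in dft(row)]         # eigenvalues of the circulant
    # complex standard normal vector scaled by sqrt(eigenvalue / n)
    w = [cmath.sqrt(max(e, 0.0) / n) * complex(rng.gauss(0, 1), rng.gauss(0, 1))
         for e in eig]
    y = dft(w)
    # real and imaginary parts are two independent realizations; keep one
    return [v.real for v in y[:m]]

rng = random.Random(42)
field = circulant_embedding_1d(33, sigma=2.0, lam=0.2, rng=rng)
k = [math.exp(v) for v in field]             # permeability k = exp(K)
```

In the paper's setting the same transform-multiply-transform pattern runs over a 2D grid, which is what makes the generation step a natural fit for GPU FFT libraries.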
      </sec>
      <sec id="sec-2-2">
        <title>Solving the deterministic problem</title>
        <p>
          The literature provides different numerical schemes for solving PDEs. In [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ],
a multiscale finite element method is used. Here, for solving the elliptic
PDEs corresponding to each realization of the permeability field, we use a finite
volume method on a cell-centered grid. This method is mass conservative.
More details can be found in [
          <xref ref-type="bibr" rid="ref1 ref3">1, 3</xref>
          ].
        </p>
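        <p>A sketch of a cell-centered finite volume scheme in one space dimension illustrates the mass-conservative construction: face transmissibilities built from harmonic averages of the cell permeabilities, a tridiagonal system, and a flux read off at a face. This is a simplified 1D stand-in, not the paper's 2D/3D solver:</p>

```python
def fv_flux_1d(k, p_left=1.0, p_right=0.0):
    # cell-centered finite volume for -(k p')' = 0 on (0, 1) with n cells and
    # Dirichlet boundary pressures; returns (cell pressures, inflow flux)
    n = len(k)
    h = 1.0 / n
    # face transmissibilities: harmonic average of the neighbouring cells,
    # half-cell distances at the two boundaries
    t = [0.0] * (n + 1)
    t[0] = 2.0 * k[0] / h
    t[n] = 2.0 * k[n - 1] / h
    for i in range(1, n):
        t[i] = 2.0 * k[i - 1] * k[i] / ((k[i - 1] + k[i]) * h)
    # assemble the tridiagonal system A p = b
    lower = [0.0] * n; diag = [0.0] * n; upper = [0.0] * n; b = [0.0] * n
    for i in range(n):
        diag[i] = t[i] + t[i + 1]
        if i > 0: lower[i] = -t[i]
        if i < n - 1: upper[i] = -t[i + 1]
    b[0] += t[0] * p_left
    b[n - 1] += t[n] * p_right
    # Thomas algorithm (forward elimination, back substitution)
    for i in range(1, n):
        w = lower[i] / diag[i - 1]
        diag[i] -= w * upper[i - 1]
        b[i] -= w * b[i - 1]
    p = [0.0] * n
    p[n - 1] = b[n - 1] / diag[n - 1]
    for i in range(n - 2, -1, -1):
        p[i] = (b[i] - upper[i] * p[i + 1]) / diag[i]
    flux = t[0] * (p_left - p[0])   # flux through the inflow face
    return p, flux
```

Because the discrete equations balance the face fluxes of every cell, the flux entering at the left boundary equals the flux leaving at the right one, which is the mass conservation property mentioned above.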
      </sec>
      <sec id="sec-2-3">
        <title>Variance reduction</title>
        <p>
          We shortly recall the idea proposed in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
        <p>Let {Ml : l = 0 . . . L} ⊂ N be an increasing sequence of numbers called levels,
with corresponding quantities {QMl}, l = 0 . . . L, and let s ≥ 2 be a coarsening factor, such
that Ml = sMl−1 for l = 1 . . . L. Defining Yl = QMl − QMl−1 and
setting Y0 = QM0, we can use a telescopic sum to write the following identity
for the expected value E:</p>
        <p>E[QM] = E[QM0] + ∑_{l=1}^{L} E[QMl − QMl−1] = ∑_{l=0}^{L} E[Yl]
(5)</p>
        <p>The expectation on the finest level is equal to the expectation on the coarsest
level plus a sum of corrections, i.e. differences in the expectations on each pair
of consecutive levels. The terms in (5) are approximated using standard
independent MC estimators, with Nl samples. For the mean square error of the
MLMC estimator Q̂ we have:</p>
        <p>e(Q̂)² = E[(Q̂ − E[Q])²] = ∑_{l=0}^{L} Nl⁻¹ V[Yl] + (E[QM − Q])²
(6)</p>
        <p>Our goal is to have:</p>
        <p>e(Q̂)² = ∑_{l=0}^{L} Nl⁻¹ V[Yl] + (E[QM − Q])² ≤ 2ε²
(7)</p>
        <p>Denote vl = V[Yl], let tl be the mean time for computing the difference Yl
once, and let T = ∑_{l=0}^{L} nl tl be the total time for the computation. Minimizing
T under the above constraint with Lagrange multipliers, and rounding to
integer values, gives us:</p>
        <p>nl = α √(vl/tl), with Lagrangian multiplier α = ε⁻² ∑_{l=0}^{L} √(vl tl)
(8)</p>
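        <p>A small helper makes this sample allocation concrete. The sketch assumes the standard MLMC choice nl = α √(vl/tl) with α = ε⁻² ∑ √(vl tl), which enforces ∑ vl/nl ≤ ε² while minimizing the total time; the toy level data are hypothetical:</p>

```python
import math

def optimal_samples(v, t, eps):
    # n_l = ceil(alpha * sqrt(v_l / t_l)) with alpha = eps^-2 * sum sqrt(v_l t_l);
    # rounding up keeps the sampling-error constraint sum v_l / n_l <= eps^2
    alpha = sum(math.sqrt(vl * tl) for vl, tl in zip(v, t)) / eps**2
    return [max(1, math.ceil(alpha * math.sqrt(vl / tl))) for vl, tl in zip(v, t)]

# toy level data: variance of Y_l decays, cost per sample grows with the level
v = [1.0, 0.25, 0.0625]
t = [0.001, 0.004, 0.016]
n = optimal_samples(v, t, eps=0.01)
```

The allocation puts most samples on the cheap coarse levels and only a few on the expensive fine ones, which is where the speed-up over standard Monte Carlo comes from.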
        <p>
          To define the levels, the resolution of the discretization is used, such that
the number of square cells for d = 2 and cubic cells for d = 3 is an exact power
of 2. Thus on the finer level we have 4 times more cells than on the coarser
level for d = 2, and 8 times more for d = 3. To approximate the random
field, we employ a heuristic technique which combines each 4 cells in 2D into
one by a combination of simple arithmetic, geometric and harmonic averages.
For details we refer to [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
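        <p>As an illustration of this coarsening step, the sketch below combines each 2×2 block of fine cells into one coarse cell. Taking the geometric mean of the block's arithmetic and harmonic averages is one simple such combination, used here as a stand-in for the renormalization of [3]:</p>

```python
import math

def coarsen_2x2(k):
    # combine each 2x2 block of an n x n permeability field (n even) into one
    # coarse cell: geometric mean of the block's arithmetic and harmonic means,
    # which always lies between the two bounds
    m = len(k) // 2
    coarse = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            block = [k[2*i][2*j], k[2*i][2*j+1], k[2*i+1][2*j], k[2*i+1][2*j+1]]
            arith = sum(block) / 4.0
            harm = 4.0 / sum(1.0 / b for b in block)
            coarse[i][j] = math.sqrt(arith * harm)
    return coarse
```

Applying the function repeatedly produces the grid hierarchy on which the level quantities QMl are computed.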
        <p>Figure 1: (a) Permeability, (b) Solution.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Simulation results</title>
      <p>The computational time used by the setup for the generation of 100
permeability fields with σ = 2.0 and λ = 0.2 is shown in figure 2. Our underlying
grid is in 2D, with approximately 1.7 · 10⁷ cells. As can be observed, the
speed-up achieved using only CPU cores for all of the computation
steps is significant up to around 8 cores; at this point it is approximately
6.755 times faster than the calculation on a single core. Using more than 8
cores, the efficiency gradually decreases, and with 24 cores the speed-up achieved
is approximately 15 times. The GPU generation is much faster: using a single
GPU, the generation time is comparable to 11 CPU cores, and on 2 GPUs to
approximately 22 CPU cores. This is expected, since the permeability generation is
mainly a forward Fourier transform over a matrix, followed by multiplication
of each element with random numbers, and then an inverse Fourier transform.</p>
      <p>Tables 1 and 2 show the comparison of a 3-level MLMC implementation
using only CPUs versus an implementation that uses the GPU for the permeability
generation. The presented results are averaged over 10 runs of the algorithm.</p>
      <p>Figure 2: Single problem performance (CPU generation, 1 GPU generation, 2 GPU generation).</p>
      <p>The fine grid is of size 2¹⁰ × 2¹⁰ and the permeability generation parameters are
σ = 1.5, λ = 0.1. Each problem is computed by its own CPU core and all cores
use a shared GPU; thus at a given moment of the execution, 12 or 24 concurrent
calls to the GPUs may exist. This is a more taxing situation for the GPU
compared to a distribution where a group of processes solves a single problem. In
table 1 one can observe that MLMC calculated on a single GPU and 12 CPU
cores is faster than the same algorithm using only CPU cores, and the execution
of the generation step on the different levels is approximately 2 times faster.
The execution of the same MLMC implementation on a single GPU and
24 CPU cores, however, shows that the overall time is slower than execution
on the same number of CPU cores alone. The performance is significantly improved
with the introduction of a second GPU: the generation times are notably lower
than in the case with a single GPU.</p>
      <p>
        The implementation of the algorithm used for these tests is written in
C++. The method used for solving the PDE, a conjugate gradient
preconditioned with AMG, is provided by the DUNE library [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. For the implementation
of the circulant embedding algorithm, the FFTW library is used for generation on
the CPU, and the cuFFT library provided by NVIDIA for generation on the GPU.
The Multilevel Monte Carlo algorithm is implemented with pure MPI. All
tests were performed on the HybriLIT education and testing cluster, Dubna,
Russia, on a GPU node with two NVIDIA Tesla K80 GPUs and Intel Xeon
E5-2695 v2 processors.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>Generating random fields using modern GPU-accelerated computing is a
promising possibility which can decrease the computational time of the generation
step of the MLMC algorithm, provided the task is small enough to fit in the GPU
memory. When the task is larger, however, the execution time using the GPU
can be longer than using only CPU cores. The idea of GPU acceleration
is fairly new in the area of scientific computing, and not many
libraries using a GPU-accelerated approach have been implemented. Furthermore, HPC
systems with GPU-enabled nodes are rather expensive, and the number of
large scientific computing clusters with GPU capabilities is relatively small.
On the other hand, in recent years some large commercial clusters for
GPU computing have been built, inspired by the blockchain and neural network
trends. A further investigation of solving large scientific problems on such
clusters would be an interesting study.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Cliffe</surname>
            ,
            <given-names>K.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giles</surname>
            ,
            <given-names>M.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scheichl</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Teckentrup</surname>
            <given-names>A.L.</given-names>
          </string-name>
          :
          <article-title>Multilevel Monte Carlo Methods and Applications to Elliptic PDEs with Random Coefficients</article-title>
          .
          <source>Computing and Visualization in Science</source>
          <volume>14</volume>
          (
          <issue>1</issue>
          ) (pp.
          <fpage>3</fpage>
          -
          <lpage>15</lpage>
          ). Springer(
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Mohring</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Milk</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ngo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iliev</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ohlberger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Bastian</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Uncertainty Quantification for Porous Media Flow Using Multilevel Monte Carlo</article-title>
          .
          <source>In International Conference on Large-Scale Scientific Computing</source>
          (pp.
          <fpage>145</fpage>
          -
          <lpage>152</lpage>
          ). Springer(
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Iliev</surname>
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mohring</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shegunov</surname>
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Renormalization Based MLMC Method for Scalar Elliptic SPDE</article-title>
          . In: International Conference on LargeScale Scientific Computing (pp.
          <fpage>145</fpage>
          -
          <lpage>152</lpage>
          ). Springer(
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Blaheta</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Béreš</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Domesová</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>A study of stochastic FEM method for porous media flow problem</article-title>
          . In: Bris, R., Dao, P. (eds.) Applied Mathematics in Engineering and Reliability (pp.
          <fpage>281</fpage>
          -
          <lpage>289</lpage>
          ). (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Lunati</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernard</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giudici</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parravicini</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ponzini</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>A numerical comparison between two upscaling techniques: non-local inverse based scaling and simplified renormalization</article-title>
          .
          <source>Advances in Water Resources</source>
          <volume>24</volume>
          (pp.
          <fpage>913</fpage>
          -
          <lpage>929</lpage>
          ).
          <source>Elsevier</source>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Renard</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Marsily</surname>
          </string-name>
          , G.:
          <article-title>Calculating equivalent permeability: a review</article-title>
          .
          <source>Advances in Water Resources</source>
          <volume>20</volume>
          (
          <issue>5</issue>
          ) (pp.
          <fpage>253</fpage>
          -
          <lpage>278</lpage>
          ).
          <source>Elsevier</source>
          (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Graham</surname>
            ,
            <given-names>I.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuo</surname>
            <given-names>F.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nuyens</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scheichl</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sloan</surname>
            ,
            <given-names>I.H.</given-names>
          </string-name>
          :
          <article-title>Quasi-Monte Carlo methods for elliptic PDEs with random coefficients and applications</article-title>
          .
          <source>Journal of Computational Physics (230)</source>
          (pp.
          <fpage>3668</fpage>
          -
          <lpage>3694</lpage>
          ).
          <source>Elsevier</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Hoeksema</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kitanidis</surname>
            ,
            <given-names>P.K.</given-names>
          </string-name>
          :
          <article-title>Analysis of the spatial structure of properties of selected aquifers</article-title>
          .
          <source>Water Resources Research</source>
          <volume>21</volume>
          (
          <issue>4</issue>
          ) (pp.
          <fpage>563</fpage>
          -
          <lpage>572</lpage>
          ). (
          <year>1985</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Bastian</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blatt</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dedner</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Engwer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klofkorn</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ohlberger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sander</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>A Generic Grid Interface for Parallel and Adaptive Scientific Computing. Part I: Abstract Framework</article-title>
          .
          <source>Computing</source>
          <volume>82</volume>
          (
          <issue>2-3</issue>
          ) (pp.
          <fpage>103</fpage>
          -
          <lpage>119</lpage>
          ). (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>