=Paper= {{Paper |id=Vol-2076/paper-01 |storemode=property |title=Memory Efficient Algorithm for Solving the Inverse Problem of Finding a Density in a Curvilinear Layer |pdfUrl=https://ceur-ws.org/Vol-2076/paper-01.pdf |volume=Vol-2076 |authors=Elena N. Akimova,Vladimir E. Misilov,Maxim S. Arguchinsky }} ==Memory Efficient Algorithm for Solving the Inverse Problem of Finding a Density in a Curvilinear Layer== https://ceur-ws.org/Vol-2076/paper-01.pdf
         Memory Efficient Algorithm for Solving
        the Inverse Problem of Finding a Density
                  in a Curvilinear Layer

               Elena N. Akimova1,2 (ORCID 0000-0002-4462-5817),
               Vladimir E. Misilov1,2 , and Maxim S. Arguchinsky2
                   1
                      Ural Federal University, Ekaterinburg, Russia
    2
        Krasovskii Institute of Mathematics and Mechanics, Ural Branch of RAS,
                                 Ekaterinburg, Russia
         aen15@yandex.ru, v.e.misilov@urfu.ru, deryabino.49@mail.ru


        Abstract. A memory efficient algorithm is constructed and implemen-
        ted for solving the inverse problem of finding a variable density in a
        curvilinear layer using gravitational data. The algorithm is based on
        the stabilized biconjugated gradient method. The modification based on
        approximation of the SLAE matrix by Toeplitz-block-Toeplitz one sig-
        nificantly reduces memory requirements. The parallel algoritms were im-
        plemented for the Uran supercomputer using the hybrid MPI+OpenMP
        technology. A model problem with synthetic data was solved.

        Keywords: density reconstruction problem, linear inverse problem, ill-
        posed problems, SLAE, Toeplitz matrix, BiCGSTAB method, parallel
        computing, MPI, OpenMP


1   Introduction
The problem of finding a density in a horizontal or curvilinear layer between
given depths was studied in works [1–7]. Solution process consists in two sep-
arate steps. The first one is to extract approximately anomalous effect of the
considered layer from the observed gravitational field using special technique [3].
This technique was implemented for various multiprocessor systems [5, 6]. The
extracted field is the right part of an integral equation of the desired density.
After discretization of the area and approximation of the integral operator, both
of these steps are reduced to solving ill-conditioned systems of linear algebraic
equations.
    In work [7], a parallel algoritm for solving density reconstruction problem was
created on the basis of the stabilized biconjugate gradient (BiCGSTAB) method
and implemented for multicore processor using the OpenMP technology.
    In this work, this algorithm is implemented using the hybrid MPI+OpenMP
technology for the Uran supercomputer installed at the Krasovskii Institute of
Mathematics and Mechanics.
    The memory efficient algoritm for solving this problem is constructed and
implemented for multicore processor. The modification based on approximation
2

of the SLAE matrix by Toeplitz-block-Toeplitz one significantly reduces memory
requirements.


2       Statement of density reconstruction problem
Let us assume a Cartesian coordinate system where plane x0y coincides with
the Earth’s surface, and axis z is directed downwards.
   The vertical attraction of gravity ∆g(x, y, 0) generated by the layer
Π = {(x, y, z) : H1 (x, y) 6 z 6 H2 (x, y)} is given by the following equation [7]:

             Z∞ Z∞ 
                                         1
        f              p                                      −
                       (x − x0 )2 − (y − y 0 )2 − H12 (x, y)
            −∞ −∞                                                                             (1)
                                                      
                                1
            −p                                          σ(x0 , y 0 )dx0 dy 0 = ∆g(x, y, 0),
                (x − x0 )2 − (y − y 0 )2 − H22 (x, y)

where f is the gravitational constant, σ(x, y) is the density distribution in the
layer.
    Let us assume that the boundary surfaces H1 (x, y) and H2 (x, y) satisfy the
conditions H1 (x, y) < H2 (x, y), ∀(x, y) ∈ R2 , lim Hi (x, y) = const, and
                                                  x,y→±∞
                                                                     
the density is constant in areas (x, y, z) : H1 6 z 6 H1 (x, y) and (x, y, z) :
H2 (x, y) 6 z 6 H2 , where H1 = min H1 (x, y) and H2 = max H2 (x, y).
                                              x,y                        x,y
    Problem (1) is a linear two-dimensional Fredholm integral equation of the
first kind of σ(x, y); thus, the problem is ill-posed.
    After discretization on a grid n = M × N and approximation of the integral
operator using quadrature formulae, expression (1) takes the following form

          M 
        N X
        X                           1
    f         q                                            −
                                               2
                 (xv − xi ) − (yu − yj )2 − H1,(i−1)M
                           2
     i=1 j=1                                            +j
                                                  
                           1
    −q                                              σ(i−1)M +j ∆x∆y = b(v−1)M +u ,
                                      2
         (xv − xi )2 − (yu − yj )2 − H2,(i−1)M +j

    u ∈ [1..M ], v ∈ [1..N ].
                                                                            (2)
   The coefficient matrix of SLAE is an ill-conditioned nonsymmetric matrix of
n × n = M N × M N dimension.
   After application of the regularization using the Lavrentiev scheme [1], we
can rewrite this SLAE in the following form:

                                               Az = b,                                        (3)
where A = A + αE, α is the regularization parameter.
                                                                                 3

3   Algorithms for solving the problem and their parallel
    implementation
BiCGSTAB algoritm. For solving system (3), we use BiCGSTAB algoritm
(Fig.1) [8]. In work [7], we showed that this method is the most efficient of
iterative gravient methods for solving SLAE with nonsymmetric matrix.



      r0 = b − Az 0 , r = r0 ,           ρ0 = α0 = ω 0 = 1,    v0 = q0 = 0
      for(k = 0; k + +; )do
                                      ρk αk−1
      ρk = hr, r( k − 1)i,      βk =
                                    ρk−1 ωk − 1
      q k = rk−1 + β k (q k−1 − ω k−1 v k−1 )
                               ρk
      v k = Aq k ,   αk =
                             hr, v k i
      sk = rk−1 − αk v k
                          htk , sk i
      tk = Ask ,     ω=
                          htk , tk i
      z k = z k−1 + ω k sk + αk q k ,      rk = sk − ω k tk
      enddo

                           Fig. 1. BiCGSTAB algorithm


    The initial estimate is z 0 = 0. The condition Az k − b / kbk < ε for some
sufficiently small ε is taken as a termination criterion.
    For large fine grids, storing of coefficient matrix A can take large memory
space. E.g., for the 29 × 29 grid, the matrix would be of 218 × 218 dimension. Full
storage of this matrix with double precision requires 512 GB of memory, which
is more than one node of the Uran supercomputer has.
    For solving the density reconstruction problem using the BiCGSTAB method,
a parallel algoritm was developed utilizing the MPI technology. The coefficient
matrix A is divided by a number of horizontal bars with respect to the number
of used nodes, and each bar is stored and processed on its own node. The hy-
brid MPI+OpenMP technology was used to parallelize the algoritm for nodes’
multicore processors.
    Memory efficient algoritm. In work [6], the memory-efficient algoritm for
solving the density reconstrustion problem in a horizontal layer was proposed
and implemented for multicore processor. This algoritm is based on exploiting
the Toeplitz-block-Toeplitz structure of the coefficient matrix.
4

    The elements of matrix can be written in the following form:
                               
                                                  1
              ak,p,l,q = γ∆x∆y p                                    −
                                    (xk − xl ) + (yp − yq )2 + H12
                                              2
                                                                               (4)
                                                  1
                               −p                                     ,
                                    (xk − xl )2 + (yp − yq )2 + H22
where k, l = 1..M are the block indices and p, q = 1..N are the indices of elements
inside each block.
    Apparently, the matrix elements depend only on the terms (xk − xl )2 + (yp −
    2
yq ) . Note that ak,p,l,q = ak,p+1,l,q+1 and ak,p,l,q = ak+1,p,l+1,q . The former
equation means that in each block each descending diagonal from left to right is
constant. The latter one means that each block diagonal is constant as well. In
other words, the matrix is symmetric Toeplitz-block-Toeplitz.
    The matrix structure for a 6 × 4 grid is shown in Fig. 2. The matrix has 4 × 4
blocks of 6 × 6 dimension. Equal elements are marked by the same colors.




                      Fig. 2. Matrix structure for a 6 × 4 grid


   The obvious way of storing this matrix is to store the first row only. Each
subsequent row is obtained by double cycle shifting of elements. This method
requires storing M · N elements, whilst full storage requires M 2 · N 2 .
   In the case of a curvilinear layer, the matrix elements will have the form
                           
                                                  1
         ak,p,l,q = γ∆x∆y q                                             −
                                                            2
                               (xk − xl ) + (yp − yq )2 + H1,(p−1)M
                                         2
                                                                     +k
                                                                             (5)
                                                  1
                           −q                                             ,
                                                            2
                               (xk − xl )2 + (yp − yq )2 + H2,(p−1)M +k
                                                                                5

where H1,(p−1)M +k and H2,(p−1)M +k are the values of depth of the boundary
surfaces at the grid nodes. Obviously, these values vary, thus, matrix A will not
be Toeplitz-block-Toeplitz.
    In this work, we propose the following way to store this matrix approximately.
Note that the farther the matrix element is from the main diagonal, the lower
its value is. We compute the elements of the main block diagonal (i.e., elements
that has p = q) using formula (5), and the rest of the elements are replaced by
their approximations computed by using formula (4) with some constant values
                                       M
                                       PN
H1 and H2 . The mean values Hw =          Hw,i /M N can be used.
                                     i=1
    The number of stored elements will be M (N 2 + 1). E.g, to store the 218 × 218
matrix, we need only 256 MB instead of 512 GB needed for full storage. Thus,
to solve the problem using this algorithm, we need only one node with multicore
processor. The algorithm was implemented utilizing the OpenMP technology


4   Numerical experiments

Let us demonstrate application of the constructed algorithms to solving the
model density reconstruction problem on a large grid.
    The model problem of reconstructing the density distribution in a curvilinear
layer between the depths H1 = 9.5 km and H2 = 11.5 km is considered.
    The boundary surfaces are shown in Fig. 3.




                            Fig. 3. Boundary surfaces



    Figure 4 shows the model gravitational field. This field was obtained by
solving the forward problem (1) using the model density distribution σ(x, y).
6




                          Fig. 4. Model gravitational field



    Figure 5a shows the original density distribution σ(x, y).




         Fig. 5. Original σ (a) and reconstructed σ̂ (b) density distributions
                                                                                     7

   The inverse problem was solved on 29 × 29 grid by the BiCGSTAB algorithm
and the memory efficient algorithm as described above.
   Figure 5b shows the reconstructed density distribution σ̂. The regularization
parameter was α = 0.1, ε = 0.005 was used as termination criterion.
   Solution by the memory efficient algorithm has a relative error kσ̂ − σk / kσk
not greater than the error of solution by the unmodified BiCGSTAB. Both errors
are lower than 0.05.
   Table 1 shows the computing times for solving the problem by the BiCG-
STAB method using the MPI technology with the Uran supercomputer. Each of
supercomputer’s nodes has Intel E5-2650 CPU and 192 GB of RAM. One core
was used on each node. The coefficient matrix requires 512 GB for full storage.
Thus, to solve the problem, we need at least 4 nodes.


               Table 1. Computing times of BiCGSTAB algorithm

                                  Computing time           Relative speedup
            Number of nodes m
                                    Tm , minutes             Sm = T4 /Tm
                    4                    5.7                      –
                    8                    3                       1.9
                    16                   1.6                     3.6



    The last table line contains the time to solve the problem utilizing the hy-
brid MPI+OpenMP technology with 8 cores at each node. The computing time
reduces to 20 seconds and the efficiency of this technology was 70 %.
    The memory efficient algorithm requires 256 MB to store matrix; thus, to
solve the problem, we need only one node with multicore processor.
    Table 2 shows computing times, parallel speedup, and efficiency for solving
the problem by the memory efficient algorithm using the node with two eight-
core Intel E5-2650 CPUs. The parallel algoritm show good scaling.


             Table 2. Computing times of memory efficient algorithm

      Number of threads,    Computing time          Speedup             Efficiency
              m               Tm , minutes         Sm = T1 /Tm         Em = Sm /m
              1                   22.2                 –                   –
              2                   11.5                1.9                 0.96
              4                   5.9                 3.7                 0.94
              8                    3                  7.4                 0.92
              16                  1.5                 14.8                0.92
8

5   Conclusion
The memory efficient parallel algoritm was constructed on the basis of the Bi-
CGSTAB method for solving the inverse problem of density reconstruction in a
curvilinear layer. The modification is based on approximation of matrix blocks
by the blocks of Toeplitz-block-Toeplitz matrix. Optimized method for matrix
storage significantly reduces the memory requirements.
    The parallel programs were developed for solving the inverse problem of
density reconstruction in a curvilinear layer on the Uran supercomputer.
    The first program is based on BiCGSTAB algorithm with full matrix storage
using the hybrid MPI+OpenMP technology. The second program is based on the
memory efficient algorithm with optimized matrix storage utilizing the OpenMP
technology.
    The model problem with synthetic data was solved, the speedup and effi-
ciency of parallel algorithms was studied.

Acknowledgments
This work was partly supported by the Center of Excellence “Geoinformation
technologies and geophisical data complex interpretation” of the Ural Federal
University Program and by Ural Branch of RAS, project 18-1-1-8.

References
 1. Lavrentiev, M. M., Romanov, V. G., Shishatskii, S. P.: Ill-Posed Problems of Math-
    ematical Physics and Analysis. Translations of Math. Monographs 64. Amer. Math.
    Soc., Providence (1986)
 2. Novoselitskii, V. M.: On the Theory of Determining Density Variations in a Hor-
    izontal Layer from Gravity Anomaly Data. Izv. Akad. Nauk SSSR, Fiz. Zemli.
    Vol. 5, 25–32 (1965)
 3. Martyshko, P. S., Prutkin, I. L.: Technology of depth distribution of gravitational
    field sources (in Russian). Geophysicheskii Journal. Vol. 25 (3), 159–168 (2003)
 4. Martyshko, P. S., Koksharov, D. E.: On finding the density in a layered medium
    using gravitational data (in Russian). Geophysicheskii Journal. Vol. 27 (4), 678–684
    (2005)
 5. Martyshko, P. S., Fedorova, N. V., Akimova, E. N., Gemaidinov, D. V.: Studying
    the structural features of the lithospheric magnetic and gravity fields with the use
    of parallel algorithms. Izvestiya, Physics of the Solid Earth. Vol. 50(4), 508–513
    (2014) https://doi.org/10.1134/S1069351314040090
 6. Akimova, E. N., Martyshko, P. S., Misilov, V. E., Kosivets, R. A.: An efficient
    numerical technique for solving the inverse gravity problem of finding a lateral
    density. Applied Mathematics and Information Sciences. Vol. 10, Iss. 5, 1681–1688
    (2016) http://doi.org/10.18576/amis/100506
 7. Akimova, E. N., Martyshko, P. S., Misilov, V. E.: On finding a density in a curvi-
    linear layer by biconjugate gradient type methods. AIP Conference Proceedings.
    Vol. 1863, 050009 (2017) https://doi.org/10.1063/1.4992206
 8. Van der Vorst, H. A.: Iterative Krylov methods for large linear systems. Vol. 13.
    Cambridge University Press (2003)