Solving the Structural Inverse Gravimetry Problem in the Case of Multilayered Medium Using GPU? Elena Akimova1,2[0000−0002−4462−5817] , Vladimir Misilov1,2 , and Murat Sultanov3[0000−0002−0068−0996] 1 Ural Federal University, Ekaterinburg, Russia 2 Krasovskii Institute of Mathematics and Mechanics, Ekaterinburg, Russia 3 Akhmet Yassawi International Kazakh-Turkish University, Turkistan, Kazakhstan aen15@yandex.ru, v.e.misilov@urfu.ru, murat.sultanov@ayu.edu.kz Abstract. An optimized parallel algorithm is constructed for solving the nonlinear inverse gravimetry problem of finding several boundary surfaces between layers in multilayered medium. The algorithm is based on the modified nonlinear conjugate gradient method with weighting factors. The efficient implementation for GPU was developed. A model problem with synthetic gravitational data was solved. Keywords: Inverse gravimetry problem · Conjugate gradient method · GPU 1 Introduction The problem considered in this paper is finding several interfaces between layers in a multilayered medium using known gravitational data [1, 2]. This problem is described by a nonlinear integral Fredholm equation of the first kind; so, it is ill-posed. The real gravity measurements are carried out over a large area producing the large-scale grids. Processing the gravitational data is a time con- suming process and requires a lot of memory. So, it is necessary to develop parallel algorithms for parallel computing systems. In works [3, 4], for solving the structural gravimetry problem, efficient parallel algorithms based on the modified conjugate gradient method were proposed and implemented multicore CPUs. The modification is based on approximation of the Jacobian matrix by calculating only significant elements. This method reduces the computation time in comparison with the full calculation approach. Here, we construct a parallel algorithm on the basis of this method and implement it for GPUs using the CUDA technology. We compare the CPU and GPU implementations in terms of computation time in solving the model problem with synthetic gravitational data. ? This work was financially supported by the Ministry of Education and Science of the Republic of Kazakhstan (project AP 05133873). 34 E. Akimova et al. 2 Statement of the Inverse Gravity Problem for the Model of Multilayered Medium We assume that the lower half-space is composed of several layers with constant densities, which are separated by the sought surfaces Sl , l = 1..L, where L is the number of boundary surfaces (see, Fig. 1). Fig. 1. Model of multilayer medium The gravitational field due to this half-space is equal to the sum of the gravitational fields due to each surface. Let the boundary surfaces be specified by the functions z = ζl (x, y), let the density contrasts on them be ∆σl , and let the surfaces have the horizontal asymptotic planes z = Hl . The field ∆g produced by the superposition of the boundary surfaces and measured on the Earth’s surface z = 0 is found by the following equation (with accuracy up to a constant term of summation) [1]: Z∞ Z∞  X 1 f ∆σl p − l=1..L (x − x0 )2 + (y − y 0 )2 + ζ 2 (x0 , y 0 ) −∞ −∞ (1)  1 −p dx0 dy 0 = ∆g(x, y, 0), (x − x0 )2 + (y − y 0 )2 + ζ 2 (x0 , y 0 ) where f is the gravitational constant. Solving the Structural Inverse Gravimetry Problem 35 This equation is the Fredholm nonlinear equation of the first kind of functions ζl and is ill-posed: it has a nonunique solution which unstably depends on the initial data. After the discretization of equation (1) on the rectangular grid n = M × N , where the right-hand side ∆g(x, y, 0) is given, and the approximation of the left-side integral operator A(ζ1 , ..., ζl ) by the quadrature rules, we obtain the vector of the right-hand side F with the length n, the combined surfaces vector z = [ζ1 (x1 , y1 ), ..., ζ1 (xM , yN ), ..., ζ2 (x1 , y1 ), ..., ζ2 (xM , yN ), ..., ζL (xM , yN )] with the length Ln, and the system of nonlinear equations having the form A(z) = F. (2) 3 Algorithm for Solving the Inverse Problem In this work, to solve problem (2), we use the approach based on the modified linearized conjugate gradient method with weighting factors [5]. This method has the following form: hpk , S(z k ) k z k+1 = z k − ψ p , kA0 (z k ) − pk k pk = v k + β k pk−1 , p0 = v 0 , hv k , v k − v k−1 i (3) β k = max ,0 kv k−1 k v = γ ◦ S(z), S(z) = A0 (z)∗ (A(z) − F ), where z k is the solution estimate at kth iteration, ψ is the damping factor, γ is the vector of weighting factors, A0 (z) is the Jacobean matrix of the discretized integral operator A(z), ◦ is operation of componentwise vector multiplication. In this work, we propose the following rules for selection of the weighting factors: [F1 , ..., FL ] → [f1 , ..., fLn ] → [γ1 , ..., γLn ], Fl → [γn(l−1) , γn(l−1)+1 , ..., γnl ], p (4) fi2 + µ γl = p , 0 < µ < 1, max { fi2 + µ} i=1..n where µ is the smoothing parameter. The fields Fl are extracted from the total field F using heightwise transfor- mation technique from [6]. The condition kA(z) − F k / kF k < ε for sufficiently small ε is used as the termination criterion for the iterative process. 36 E. Akimova et al. 4 Numerical Implementation In works [3, 4], the efficient technique for reducing the computing time was pro- posed and implemented. The main idea is to drop out small elements and replace matrix A0 with the block-band matrix A0 as shown in Fig. 2. The elements inside each square block depend on terms (x − x0 ) and (y − y 0 ). Thus, the farther the matrix element is from the main diagonal of block, the less its value is. For each block, we get the bandwidth parameter 0 < βl ≤ 1 by automatic adjustment procedure. First, we find the maximal element amax l of the block. To find it, we just need to look over the main diagonal of the block. Then, we set the parame- ter βl in such way that the resulting band matrix elements would be larger then some threshold, i.e. aij > θamaxl for the threshold parameter θ. Fig. 2. Scheme of matrix structure 5 Parallel Implementation for GPU For solving the inverse problem, the parallel algorithm was developed for graphics processor utilizing the CUDA technology. Most expensive part is calculation of the Jacobian matrix A0 and vector A(z k ) at each iteration. The matrix and vector are divided into a number of fragments, and each fragment is processed by its own thread. The elements of the Jacobian matrix are calculated on-the-fly, which means that the value of an element is computed when calling this element, without storing it in memory. The adjustment of the kernel execution parameters for the grid size is an important problem. In previous works [7, 8], we implemented the original method for automatic adjustment of parameters. For the reference 128 × 128 grid and M2090 GPUs, the optimal parameters were found manually. For the grid sizes divisible by 128, the reference parameters are multiplied by the coefficient. When using multiple GPUs, the x dimension is divided by the number of GPUs; i.e., the number of threads in the block is reduced while number of blocks in the grid remains constant. This imposes some constraints on the input data and GPUs configuration: grid size should be divisible by 128 (128, 256, 512, 1024 ...) and GPUs number should be a power of 2 (1, 2, 4, 8, ...). Solving the Structural Inverse Gravimetry Problem 37 6 Numerical Experiments To test the constructed modified algorithm and to compare it with the unmodi- fied one in terms of execution time, we use the model problem of reconstructing three boundary surfaces on a large grid (512 × 512 nodes) using the quasi-real data. The model gravitational field shown in Fig. 3 was obtained by solving the forward problem using three surfaces S1 , S2 , S3 (see, Fig. 4a) with asymprotic planes H1 = 10km, H2 = 20km, H3 = 30km from work [3] and density contrasts ∆σ1 = ∆σ2 = ∆σ3 = 0.2g/cm3 . This surfaces were obtained from gravity maps [9] for an area of 600x600 km near Ekaterinburg, Russia. Fig. 3. Total gravitational field The inverse problem was solved using the eight-core Intel Xeon E5-2650 pro- cessor and NVIDIA Tesla M2090 GPUs) incorporated in Uran parallel computing system. The Jacobian matrix for this problem is 262144 × 786432. The threshold parameter θ = 0.1 was used, which gives the bandwidth parameters β1 = 0.078, β2 = 0.094, β3 = 0.1. The smoothing parameter was µ = 0.2. For the stopping criterion, ε = 0.1 was used. The algorithm took 90 iterations. The reconstructed surfaces S1 , S2 , S3 are shown in Fig. 4b. The relative errors Sl − Sl / kSl k are lower than 1%. 38 E. Akimova et al. Fig. 4. Original surfaces (a) S1 , S2 , S3 and reconstructed surfaces (b) S1 , S2 , S3 (up to down) The table contains the execution times Tm for various number m of Tesla M2090 GPUs, relative speedup Sm = T1 /Tm and efficiency Em = Sm /m. It also contains computation time for eight-core CPU. Table 1. Results of the numerical experiments Devices Computation time Tm , minutes Speedup Sm Efficiency Em 1× M2090 30 1 1 2× M2090 16 1.88 0.94 4× M2090 8 3.75 0.93 8× M2090 4.13 7.24 0.9 Intel Xeon E5-2650 480 The implementation for multiple GPU demonstrates an excellent scaling; the efficiency is more than 90% for eight GPUs. Solving the Structural Inverse Gravimetry Problem 39 7 Conclusion For solving the inverse gravimetry problem of finding several boundary surfaces in multilayered medium, the parallel algorithm was constructed and implemented for multiple GPUs using the CUDA technology. The algorithm is based on the nonlinear conjugate gradient method and on the approach with weighting fac- tors previously proposed by authors. The numerical implementation uses the modification based on approximation the Jacobian matrix by dropping out the less significant elements and to replacing the matrix by a block-band one. The model problem of reconstructing three surfaces using the quasi-real grav- itational data was solved on a large grid. The GPU implementation reduces the computation time by two orders of magnitude in comparison with CPU. The multi-GPU implementation demonstrates an excellent scaling and nearly 90% efficiency. References 1. Akimova E. N., Martyshko P. S., Misilov V. E.: Algorithms for solving the structural gravity problem in a multilayer medium. Doklady Earth Sciences 453(2), 1278–1281 (2013) 2. Akimova E. N., Martyshko P. S., Misilov V. E.: Parallel algorithms for solving structural inverse magnetometry problem on multicore and graphigs processors. In: International Multidisciplinary Scientific GeoConference Surveying Geology and Mining Ecology Management SGEM 2014. Vol. 2. Issue 2, pp. 713–720 (2014) 3. Akimova E. N., Misilov V. E., Klimov M. V.: Modified Conjugate Gradient Method for Solving the Nonlinear Inverse Gravimetry Problem in the Case of Multilayered Medium. In: 18th International Multidisciplinary Scientific Geoconference SGEM 2018, Vol. 18. Issue. 1.1, pp. 893–900 (2018) 4. Akimova, E., Misilov, V., Klimov, M.: Modified parallel algorithm for solving the inverse structural gravity problem. In: 17th International Conference on Geoinfor- matics - Theoretical and Applied Aspects, EAGE (2018) 5. Martyshko P. S., Akimova E. N., Misilov V. E.: Solving the structural inverse gravity problem by the modified gradient methods. Izvestiya, Physics of the Solid Earth 2(5), 704–708 (2016) 6. Martyshko P. S., Fedorova N. V., Akimova E. N., Gemaidinov D. V.: Studying the structural features of the lithospheric magnetic and gravity fields with the use of parallel algorithms. Izvestiya, Physics of the Solid Earth 50(4), 508–513 (2014). 7. Akimova, E.N., Misilov, V.E., Tretyakov, A.I.: Optimized algorithms for solving structural inverse gravimetry and magnetometry problems on GPUs. In: Sokolinsky, L., Zymbler, M. (eds.) PCT 2017. CCIS, vol. 753, pp. 144–155. Springer (2017) 8. Akimova, E.N., Misilov, V.E., Tretyakov, A.I.: Modified Componentwise Gradient Method for Solving Structural Magnetic Inverse Problem. In: Sokolinsky, L., Zym- bler, M. (eds.) PCT 2018. CCIS, vol. 910, pp. 162–173. Springer (2018) 9. Bonvalot S., Balmino G., Briais A., M. Kuhn, Peyrefitte A., Vales N., Biancale R., Gabalda G., Reinquin F., Sarrailh M.: World Gravity Map. Commission for the Geological Map of the World, Eds. BGI-CGMW-CNES-IRD, Paris (2012)