Performance-Effective Algorithm for Solving Large-Scale Forward Gravity Problem for Elliptical Objects Petr S. Martyshko1,2 , Igor V. Ladovskii1 , Denis D. Byzov1 , and Alexander I. Chernoskutov1,2 1 Bulashevich Institute of Geophysics, Ural branch of Russian Academy of Sciences, Yekaterinburg, Russia 2 Yeltsin Ural Federal University, Yekaterinburg, Russia Abstract. In this paper we purpose a performance-effective algorithm for solving forward gravity problem for elliptical objects. The algorithm is subject to parallel computations, it was implemented and tested with the CUDA technology. In general, the suggested method can be applied to density distribution models of arbitrary geometry that can be trian- gulated. Keywords: computational geophysics, potential fields, gravity field, spher- ical Earth crust models 1 Problem statement Modern studies of the internal structure of the Earth require density model reconstructions with high resolution and accuracy. This raises questions on error estimation due to spherical shape of the Earth and creates the need of refinement of the modelling results with sphericity taken into account. Thus, there is a need for an effective tool that allows solving the forward gravity problem for spherical and elliptical density models. In this paper, we propose an algorithm based on the new representations of the triangular plate potential in space that makes it possible to effectively use the capabilities of modern computational parallelization technologies (in particular, CUDA) and, as a result, to obtain high-performance gravity field solving software. Define the 3D density model as follows. Upper boundary S of the model (ground-air interface) is a sector of a surface of an ellipsoid of revolution (for example, Krasovsky Ellipsoid), all points located at a distance of no more than H along the inner normal to S are forming a set D (see Fig. 1). In the domain D ⊂ R3 , the density distribution ρ(p) is defined, p ∈ D. The vertical component of the gravitational field intensity ∆g generaded by the domain D at the outer point q ∈ / D \ ∂D is determined by the integral Z ∂ ρ(p)dVp ∆g(q) = −γ , (1) ∂nq D |r − r0 | 97 Fig. 1. Visual representation of the 3D density model where γ is the gravitational constant, dVp is a volume element of integration, nq is the outer normal to S in the orthogonal projection of the point q to S, r and r0 are the radius vectors of the points p and q, respectively. Assume the ellipsoid of rotation for our spherical model D has equatorial and polar radii equal to a and b. The position of the point on the surface of the ellipsoid of rotation is uniquely defined by the geodesic latitude B ∈ [ −π/2; π/2] and the longitude L ∈ ( −π; π] (except for the “poles”, where longitude is not defined). As a third coordinate H we employ the distance between the given point and the ellipsoid surface; it has a plus sign if the point lies outside the ellipsoid, and the minus sign if the point lies inside it. Position of the given point 2 in space is uniquely defined by three values: (L, B, H), where H √ p > −N (1 − e ), 2 2 2 2 e = a − b /a (the eccentricity of the ellipsoid), and N = a/ 1 − e sin B. 2 Gravity field of a spherical model Assume Di,j,k is a descrete element of D and the density of a descrete element be a constant ρi,j,k . Then the field ∆g at point q generated by D is equals to x −1 N NX y −1 Nz −1 X X ∆g(q) = γ ρi,j,k Gi,j,k (q), (2) i=0 j=0 k=0 where Gi,j,k (q) is the field at the point q of Di,j,k up to a coefficient γ. In order to calculate Gi,j,k (q), we introduce in the space of a spherical model a rectangular Cartesian coordinate system O0 x0 y 0 z 0 . The center O0 coincides with the center of the ellipsoid, O0 z 0 axis is ellipsoid’s axis of rotation and directed from the “south pole” to the “north” (i.e. when B = π/2, then z 0 > 0), the O0 x0 axis points to (0, 0, 0) in (L, B, H), the axis O0 y 0 complements the system to the right-handed 98 axis set. Equations for transition from (L, B, H) to O0 x0 y 0 z 0 are then as follows:  0 x = (N + H) cos B cos L  y 0 = (N + H) cos B sin L , (3)  0  z = (N (1 − e2 ) + H) sin B and   cos B cos L n =  cos B sin L  , (4) sin B where n is a unit vector of the external normal to the ellipsoid at the point (L, B, 0). Integral (1) for the Gi,j,k (q) field can not be expressed analytically. It is also problematic to calculate in numerically (with the help of the cubature formulas), since the boundaries of Di,j,k usually have a very complex description in the (L, B, H) or (x0 , y 0 , z 0 ) coordinate system and, in order to achieve acceptable accuracy, a large number of nodes is required [1, 6]. Therefore, we propose to calculate integral (1) not for Di,j,k , but for the polyhedron of approximation D̂i,j,k formed by the triangulation of Di,j,k . Thereby, Z ∂ ρ(p)dVp Gi,j,k (q) ≈ Ĝi,j,k (q) = − . (5) ∂nq D̂i,j,k |r − r0 | In formula (5), we proceed to integration over the surface, using the Ostro- gradsky theorem Z ! I ! 1 np Ĝi,j,k (q) = nq , ∇p dVp = nq , dS . D̂i,j,k |r − r0 | ∂ D̂i,j,k |r − r0 | Next, we divide the surface integral into parts along the faces of D̂i,j,k and take into account that the outer normal np at the integration point is constant for each face Z X dS Ĝi,j,k (q) = (nq , ni1 ) , (6) Si1 |r − r0 | Si1 ∈S(D̂i,j,k ) where ni1 is the outer normal to the face Si1 . Now we need to acquire a formula R dS for the integral Si1 over a triangle (which is the gravitational potential |r − r0 | of a triangle with a unit surface density without γ coefficient). Let ri (i = 1, 2, 3) be the radius vector of the vertices of the triangle hp1 , p2 , p3 i; q is the field computation point; ai = ri − r0 ; a(j,i) = ai − aj = ri − rj ; N = a(i−1,i) × a(i,i+1) is the normal to the plane of the triangle with a length equal to its doubled area; n = N/|N| is the unit normal to the plane of the triangle; A(j,i) = a(j,i) /|a(j,i) |; 99 (ai · n) is the distance from the point q to the plane of the triangle (with the appropriate sign). Then Z  dS (A1 ; [A2 ; A3 ]) = 2 n; −a1 arctan + hp1 ,p2 ,p3 i |r − r0 | 1 + (A3 ; A1 ) + (A1 ; A2 ) + (A2 ; A3 ) 3 ! X [ai−1 ; ai ] |ai−1,i | + arctanh . i=1 |ai−1,i | |ai−1 | + |ai | (7) Summarizing the process of calculating the vertical component of the field ∆g(q) from the model D, we have: 1. for each element Di,j,k using coordinates transformation (3), obtain the set of faces S(D̂i,j,k ) of the approximating polyhedron D̂i,j,k (in the Cartesian system O0 x0 y 0 z 0 ); 2. with the help of formulas (6), (7) calculate the vertical component of the field Ĝi,j,k (q) for the polyhedron D̂i,j,k at the point q, the normal vector n(q) is calculated by formula (4); 3. assume Gi,j,k (q) ≈ Ĝi,j,k (q) and by field summation of each D̂i,j,k (2) acquire ∆g(q). 3 Implementation In the presented algorithm, computation of formula (7) is taking most of the time of the running program. Indeed, in order to calculate the field of body that consists of Nb discrete elements in Nf points of space, the expression should be calculated N = kNb Nf times, where k is the amount of triangles in one element of the body. In practice, N is of order of 1015 for high-resolution Earth crust models (with surface area of order 1000×1000km and depth of 100km). With- out using contemporary parallel technologies such computations would become senseless since it would take months or even years to run. Hence, our goal is to utilize most powerful computational devices available, which are GPUs. Here, we assume that all the preprocessing steps have been done and as our input we have a set of triangles T (represented by triplets of points in space) that has been obtained as a result of triangulation of the discrete elements of the body. Additionally, we have a set of points Q, in which we want to calculate the normal component (or a vector) of gravitational field of the body. We are not going into details of this step because its computational complexity does not depend on Nf ; thus, its becoming insignificant for large values of Nf . The field computation requires minimum two nested cycles through Q and T . The iterations of the outer loop (over Q) are mutually independent and can be split between different computing nodes or/and GPUs. The inner loop (over T ) can be expressed using map-reduce pattern: gq = reduce(mapq (T )), where g is the resulting gravitational field vector the at the point q. The trick is that those operations are implemented with high efficiency in the CUDA Thrust library [5] in the form of a single function: thrust::transform reduce(). All we were left to do is to implement formula (7) and pass it as a parameter. 100 4 Performance tests All the tests have been performed using two regional Earth crust models that have been previously reconstructed in Bulashevich Institute of Geophysics (Ural Branch of Russian Academy of Sciences) as a result of solving inverse grav- ity problem [2–4]. Test models characteristics are shown in Table 1. Krasovsky Ellipsoid is taken as the reference one. Table 1. Test models characteristics Discretization of the Discretization of the Physical size model (with respect to field (with respect to Name (length×width×height) length, width, height) length, width) Workload 1 793km×1057km×81km 256×256×81 256×256 Workload 2 1336km×969km×81km 1336×969×81 334×243 Comparison of our most resent CPU available (i7-6900K 8 physical cores @ 3.2GHz) and our oldest GPU (GeForce GTX TITAN Black) showed that GPU outperforms CPU from 15 to 20 times even for considerably small workloads (Workload 1). So, future tests were conducted using only the GPUs. Fig. 2. Performance test results 101 Figure 2 shows scalability potential of the algorithm. A slight drop of per- formance after switching from 2 GPU to 4 and from 4 to 8 is mostly due to the fact that the load has been split between GPUs evenly, when actualy it needs to be carefully balanced (since our cluster consists of different models of GPUs3 ). The resulting field for the second workload is shown on Fig. 3. Fig. 3. The resulting field for the second workload Acknowledgments This work was partly supported by the Center of Excellence “Geoinformation technologies and geophisical data complex interpretation” of the Ural Federal University Program and by the The Russian Foundation for Fundamental Re- search (project 17-05-00916 ). References 1. Heck, B., Seitz, K. A.: Comparison of the tesseroid, prism and point-mass ap- proaches for mass reductions in gravity field modelling. Journal of Geodesy. 81, 121–136 (2007) 2. Ladovskii, I. V., Martyshko, P. S., Byzov, D. D., Kolmogorova, V. V.: On Select- ing the Excess Density in Gravity Modeling of Inhomogeneous Media. Izvestiya, Physics of the Solid Earth, Vol. 53, No. 1, 130-139 (2017) https://doi.org/10. 1134/S1069351316060057 3 GPUs of the cluster: 2x Quadro M6000 24GB, 3x GeForce GTX TITAN Black, 3x GeForce GTX TITAN X 102 3. Martyshko, P. S., Ladovskii, I. V., Fedorova, N. V., Byzov, D. D., Tsidaev, A. G.: Theory and methods of complex interpretation of geophysical data. Yekaterinburg: Ural Department of Russian Academy of Sciences, 94p. (2016). ISBN 978-5-7691- 2463-1. 4. Martyshko, P. S., Byzov, D. D., Ladovskii, I. V., Tsidaev, A. G.: 3D density models construction method for layered media. 15th International Multidisci- plinary Scientific GeoConference SGEM 2015, www.sgem.org, SGEM2015 Con- ference Proceedings, Albena. Bulgaria. Book 2 Vol. 1, 425–432 (2015) https: //doi.org/10.5593/SGEM2015/B21/S8.053 5. NVIDIA CUDA Toolkit Documentation. Trust.: http://docs.nvidia.com/cuda/ thrust/index.html 6. Wild-Pfeiffer, F., Augustin, W. und Heck, B.: Optimierung der Rechenzeit bei der Berechnung der 2. Ableitungen des Gravitationspotentials von Massenelementen. Zeitschrift für Geodäsie. № 6 (132), 377–384 (2007)