=Paper=
{{Paper
|id=Vol-2267/333-336-paper-63
|storemode=property
|title=Algorithms for the calculation of nonlinear processes on hybrid architecture clusters
|pdfUrl=https://ceur-ws.org/Vol-2267/333-336-paper-63.pdf
|volume=Vol-2267
|authors=Alexander V. Bogdanov,Vladimir V. Mareev,Nikita Storublevtcev
}}
==Algorithms for the calculation of nonlinear processes on hybrid architecture clusters==
Proceedings of the VIII International Conference "Distributed Computing and Grid-technologies in Science and Education" (GRID 2018), Dubna, Moscow region, Russia, September 10-14, 2018

ALGORITHMS FOR THE CALCULATION OF NONLINEAR PROCESSES ON HYBRID ARCHITECTURE CLUSTERS

A.V. Bogdanov a, V.V. Mareev b and N. Storublevtcev c

St. Petersburg State University, 7/9 Universitetskaya nab., St. Petersburg, 199034 Russia

E-mail: a bogdanov@csa.ru, b map@csa.ru, c 100.rub@mail.ru

The problem of porting programs from one hardware platform to another has not become less relevant or simpler with time. The purpose of our work is to identify the key features of algorithms when porting codes that calculate essentially nonlinear processes to a modern cluster of hybrid architecture that includes both CPUs (Intel Xeon) and GPUs (NVIDIA TESLA). As a test problem for studying the process of porting a code to a cluster of hybrid architecture, the Kadomtsev-Petviashvili (KPI) equation was chosen, written in integro-differential form [1], [2].

Keywords: High performance computing, CPU architectures, GPU, FPGA

© 2018 Alexander V. Bogdanov, Vladimir V. Mareev, Nikita Storublevtcev

1. Introduction

Porting a computational task to a graphics processor is a difficult problem. As a rule, a program is ported to a graphics accelerator for the sake of improved performance. The main problem during the transfer is preserving the correctness of program execution. It is impossible to transfer the entire code to the graphics processor: the code will always be launched from the central processor.
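The resulting division of labor, with the CPU driving the computation and the accelerator handling only the offloadable part, can be sketched as follows. This is an illustrative pattern only, not the authors' code; the helper `step_on_device` and the chosen domain-size are our own, and the GPU path assumes the optional CuPy library.

```python
import numpy as np

try:
    # GPU path: CuPy mirrors the NumPy API on NVIDIA devices.
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()   # raises if no CUDA device is present
    to_host = xp.asnumpy
except Exception:
    # No GPU available: fall back to plain NumPy on the CPU.
    xp = np
    to_host = lambda a: a

def step_on_device(A_host, f_host):
    """One iteration's linear solve offloaded to the accelerator;
    pre- and postprocessing stay on the CPU, which launches the code."""
    A = xp.asarray(A_host)        # host -> device transfer
    f = xp.asarray(f_host)
    sol = xp.linalg.solve(A, f)   # solved on the GPU when available
    return to_host(sol)           # device -> host transfer
```

Note that the two explicit transfers bracket the device work; minimizing such transfers is exactly the "increased cost of data transfer" concern raised below.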
In any program there are serial sections of code that cannot be parallelized, and transferring them to a graphics processor is pointless due to the nature of the GPU architecture and the increased cost of data transfer.

2. The test problem

As a test problem, consider the two-dimensional Kadomtsev-Petviashvili equation (KPI):

    [u_t + 0.5(u^2)_x + \beta u_{xxx} - G]_x = \varepsilon u_{yy}    (1)

Equation (1) with respect to the function u(x, y, t) is considered in the domain t \ge 0, x, y \in (-\infty, \infty), with \beta, \varepsilon \ge 0 and G(x, y) an external source [1]. Instead of the original equation (1), its integro-differential analogue is considered [2]:

    u_t + 0.5(u^2)_x + \beta u_{xxx} = \varepsilon \int_{-\infty}^{x} u_{yy}(x', y, t)\,dx' + G(x, y)    (2)

The solution of equation (2) in the half-plane t \ge 0 is sought for the initial distribution u(x, y, 0) = \varphi(x, y). The numerical simulation of equation (2) is carried out using a linearized implicit finite-difference scheme, in some cases with the flux-corrected transport (FCT) procedure [3]. For equation (2) the approximation is performed using central-difference operators:

    u_{i,j}^{n+1} - u_{i,j}^{n} + \frac{\Delta t}{4\Delta x}\left(F_{i+1,j}^{n+1} - F_{i-1,j}^{n+1}\right) + \beta \frac{\Delta t}{2\Delta x^{3}}\left(u_{i+2,j}^{n+1} - 2u_{i+1,j}^{n+1} + 2u_{i-1,j}^{n+1} - u_{i-2,j}^{n+1}\right) = \varepsilon \Delta t\, P_{i,j}^{n+1} + \Delta t\, G_{i,j}    (3)

The resulting system of difference equations (3) is reduced to the form:

    a_i\,\Delta u_{i-2,j}^{n+1} + b_i\,\Delta u_{i-1,j}^{n+1} + c_i\,\Delta u_{i,j}^{n+1} + d_i\,\Delta u_{i+1,j}^{n+1} + e_i\,\Delta u_{i+2,j}^{n+1} = f_{i,j}^{n}    (4)

with \Delta u_{i,j}^{n+1} = u_{i,j}^{n+1} - u_{i,j}^{n} and F_{i,j}^{n+1} \equiv (u^2)_{i,j}^{n+1} = (u^2)_{i,j}^{n} + 2u_{i,j}^{n}\,\Delta u_{i,j}^{n+1} + O(\Delta t^2).

The notation used in equation (3) is traditional for finite-difference schemes: u(i\Delta x, j\Delta y, n\Delta t) = u_{i,j}^{n} and \int_{-\infty}^{x} u_{yy}\,dx' \approx \int_{x_{\min}}^{x} u_{yy}\,dx' \equiv P_{i,j}^{n}, with \Delta x, \Delta y being the spatial coordinate steps, \Delta t the time step, and [x_{\min}, x_{\max}] \times [y_{\min}, y_{\max}] \times [0, T] the computational domain.
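System (4) is pentadiagonal in the x-direction. As a minimal illustration (our own helper, not the authors' code), such a system can be solved with a banded LAPACK routine, where the coefficients a_i, b_i, c_i, d_i, e_i multiply the unknowns at offsets -2 … +2:

```python
import numpy as np
from scipy.linalg import solve_banded

def solve_pentadiagonal(a, b, c, d, e, f):
    """Solve a_i*x[i-2] + b_i*x[i-1] + c_i*x[i] + d_i*x[i+1] + e_i*x[i+2] = f_i
    for i = 0..n-1 (out-of-range terms absent), as in system (4) along one row j.
    Uses LAPACK's banded solver with bandwidth 2 on each side."""
    n = len(c)
    ab = np.zeros((5, n))      # diagonal-ordered form expected by solve_banded
    ab[0, 2:]  = e[:-2]        # second superdiagonal: A[i, i+2] = e_i
    ab[1, 1:]  = d[:-1]        # first superdiagonal:  A[i, i+1] = d_i
    ab[2, :]   = c             # main diagonal:        A[i, i]   = c_i
    ab[3, :-1] = b[1:]         # first subdiagonal:    A[i, i-1] = b_i
    ab[4, :-2] = a[2:]         # second subdiagonal:   A[i, i-2] = a_i
    return solve_banded((2, 2), ab, f)
```

In production, the five-point elimination would be written out explicitly (and batched over all rows j); the banded-solver call above is just the most compact way to state the same computation.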
The boundary conditions used are: u_x = u_{xx} = 0 along the boundary lines x_1 and x_N, and u_y = 0 along the lines y_1 and y_L (x_{\min} = x_1, x_{\max} = x_N, y_{\min} = y_1, y_{\max} = y_L):

    u_{-1,j}^{n} = u_{0,j}^{n} = u_{1,j}^{n}; \quad u_{N+2,j}^{n} = u_{N+1,j}^{n} = u_{N,j}^{n}; \quad u_{i,0}^{n} = u_{i,1}^{n}; \quad u_{i,L+1}^{n} = u_{i,L}^{n}

The system (4) is solved by a five-point run. As the initial distribution, the ellipsoid of rotation is considered:

    \varphi(x, y) = c_1 \sqrt{1 - \frac{x^2}{a_1^2} - \frac{y^2}{b_1^2}}    (5)

with the volume V_1 = 2\pi a_1 b_1 c_1 / 3, and a_1, b_1, c_1 being the half-axes. Similarly to the initial distribution (5), the distribution of sources is chosen as an ellipsoid of rotation:

    G(x, y) = c_2 \sqrt{1 - \frac{(x - x_0)^2}{a_2^2} - \frac{(y - y_0)^2}{b_2^2}}

with the volume V_2 = 2\pi a_2 b_2 c_2 / 3, a_2, b_2, c_2 being the half-axes, and (x_0, y_0) the center of the ellipsoid.

The proposed approach is quite natural for porting to GPGPU, since it consists of many iterations within which large systems of linear equations must be solved. Taking into account the peculiarities of the GPGPU architecture [4], we solve the systems of linear equations on the GPGPU, leaving all pre- and postprocessing to the CPU. This approach is realized by a semi-automatic procedure described in [4].

In Figures 1 and 2 we show moments of the evolution of the perturbations for the values a_1 = 2, b_1 = 3, c_1 = 7.5, i.e. V_1 = 20\pi, and a_2 = 2, b_2 = 3, c_2 = 2.5, x_0 = -14, y_0 = 14, V_2 = 10\pi. The calculation was carried out without the FCT procedure.

Figure 1. Without source at t = 16.5. Grid: 600 × 850, \Delta t = 10^{-4}, \Delta x = \Delta y = 0.2

Figure 2. 3D perturbation with source at t = 16.5.
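Evaluating the initial distribution (5) and the source G on the grid from the figure captions can be sketched as below. The grid spacing and the half-axis values come from the text; the extent of the domain (centered at the origin) and the helper name are our assumptions.

```python
import numpy as np

def ellipsoid_cap(X, Y, amp, a, b, x0=0.0, y0=0.0):
    """Upper half of an ellipsoid of rotation with amplitude `amp` and
    footprint half-axes a, b, clipped to zero outside the footprint."""
    r2 = 1.0 - (X - x0) ** 2 / a ** 2 - (Y - y0) ** 2 / b ** 2
    return amp * np.sqrt(np.maximum(r2, 0.0))

dx = 0.2                            # Δx = Δy = 0.2 from the figure captions
x = -60.0 + dx * np.arange(600)     # 600 x 850 grid; the domain extent
y = -85.0 + dx * np.arange(850)     # itself is a hypothetical choice
X, Y = np.meshgrid(x, y, indexing="ij")

u0 = ellipsoid_cap(X, Y, amp=7.5, a=2.0, b=3.0)                     # phi(x, y)
G  = ellipsoid_cap(X, Y, amp=2.5, a=2.0, b=3.0, x0=-14.0, y0=14.0)  # source
```

The `np.maximum(..., 0.0)` clip keeps the square root real, giving compactly supported initial data and source, as assumed by the boundary conditions above.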
Grid: 600 × 850, \Delta t = 10^{-4}, \Delta x = \Delta y = 0.2

3. Conclusion

1. As a result of our approach, an algorithm was proposed for transferring the simulation program for a two-dimensional nonstationary model problem to a hybrid system. The features of such a transition are revealed.
2. The use of modern hybrid systems in combination with the new algorithmic approach has also allowed us to create a software and hardware platform for mass computations of wave processes.
3. There are no substantial bottlenecks in GPGPU onboard memory, and the attempts to use heterogeneous systems for 3D computations are justified.

Acknowledgement

This work was supported by the grant of Saint Petersburg State University no. 26520170 and the Russian Foundation for Basic Research (RFBR), grant #16-07-01113.

References

[1] Bogdanov A.V., Mareev V.V. Numerical Simulation KPI Equation // Proceedings of the 15th International Ship Stability Workshop, June 2016, Stockholm, Sweden. pp. 115-117.
[2] Bogdanov A., Mareev V., Kulabukhova N., Shchegoleva N. Influence of External Source on KPI Equation // Lecture Notes in Computer Science, vol. 10963, 2018. pp. 123-135.
[3] Fletcher C.A.J. Computational Techniques for Fluid Dynamics 1 // 2nd edition. Springer-Verlag, 1991. 401 p.
[4] Bogdanov A., Storublevtcev N., Mareev V. On porting of applications to new heterogeneous systems // Proceedings of the VIII International Conference "Distributed Computing and Grid-technologies in Science and Education" (GRID 2018) [In print]