Proceedings of the VIII International Conference "Distributed Computing and Grid-technologies in Science and Education" (GRID 2018), Dubna, Moscow region, Russia, September 10 - 14, 2018 ALGORITHMS FOR THE CALCULATION OF NONLINEAR PROCESSES ON HYBRID ARCHITECTURE CLUSTERS A.V. Bogdanov a, V.V. Mareev b and N. Storublevtcev c St.Petersburg State University, 7/9 Universitetskaya nab., St. Petersburg, 199034 Russia E-mail: a bogdanov@csa.ru, b map@csa.ru, c 100.rub@mail.ru The problem of porting programs from one hardware platform to another has not ceased to be less relevant and simpler with time. The purpose of our work is to identify the key features of algorithms in porting codes for calculating of essentially nonlinear processes to a modern cluster of hybrid architecture that includes both CPUs (Intel Xeon) and GPU (NVIDIA TESLA) processors. As a test problem for studying the process of porting a code to a cluster of hybrid architecture, the KPI equation of Kadomtsev-Petviashvili was chosen, written in integro-differential form [1], [2]. Keywords: High performance computing, CPU architectures, GPU, FPGA Β© 2018 Alexander V. Bogdanov, Vladimir V. Mareev, Nikita Storublevtcev 333 Proceedings of the VIII International Conference "Distributed Computing and Grid-technologies in Science and Education" (GRID 2018), Dubna, Moscow region, Russia, September 10 - 14, 2018 1. Introduction Porting a computational task to a graphics processor is a difficult problem. As a rule, the program is ported to a graphics accelerator for the sake of improving performance. The main problem during the transfer is to preserve the correctness of program execution. It is impossible to transfer the entire code to the graphics processor. The code will always be launched from the central processor. In any program, there are serial sections of code that cannot be parallelized and thus are meaninglessly transferred to a graphics processor due to its peculiarities, nature of GPU architecture and the increased cost of data transfer. 2. The test problem As a test problem consider the two-dimensional Kadomtsev-Petviashvili equation β€” KPI [𝑒𝑑 + 0.5(𝑒 2 )π‘₯ + 𝛽𝑒π‘₯π‘₯π‘₯ βˆ’ 𝐺]π‘₯ = πœ‚π‘’π‘¦π‘¦ (1) Equation (1) with respect to function 𝑒(π‘₯, 𝑦, 𝑑) is considered in the domain 𝑑 β‰₯ 0, π‘₯, 𝑦 ∈ (βˆ’βˆž, ∞), 𝛽, πœ‚ β‰₯ 0, 𝐺(π‘₯, 𝑦) is external source [1]. Instead of the original equation (1) its integro-differential analogue is considered [2] π‘₯ 𝑒𝑑 + 0.5(𝑒2 )π‘₯ + 𝛽𝑒π‘₯π‘₯π‘₯ = πœ‚ ∫ 𝑒𝑦𝑦 (π‘₯β€², 𝑦, 𝑑)𝑑π‘₯β€² + 𝐺(π‘₯, 𝑦) (2) βˆ’βˆž Solution of the equation (2) in half-plane 𝑑 β‰₯ 0 is sought for initial distribution 𝑒(π‘₯, 𝑦, 0) = π‘ž(π‘₯, 𝑦). The numerical simulation of the equation (2) is carried out using a linearized implicit finite- difference scheme using in some cases the flux correction procedure (FCT) [3]. For equation (2), the approximation is performed using the central-difference operators. 𝑛+1 𝑛 βˆ†π‘‘ 𝑛+1 𝑛+1 βˆ†π‘‘ 𝑒𝑗,π‘˜ βˆ’ 𝑒𝑗,π‘˜ + (𝐹𝑗+1,π‘˜ βˆ’ πΉπ‘—βˆ’1,π‘˜ )+𝛽 (𝑒 𝑛+1 βˆ’ 2𝑒𝑗+1,π‘˜ 𝑛+1 𝑛+1 + 2π‘’π‘—βˆ’1,π‘˜ 𝑛+1 βˆ’ π‘’π‘—βˆ’2,π‘˜ ) 4βˆ†π‘₯ 2βˆ†π‘₯ 3 𝑗+2,π‘˜ (3) 𝑛+1 = βˆ†π‘‘πœ‚π‘†π‘—,π‘˜ + βˆ†π‘‘πΊπ‘—,π‘˜ The resulting system of difference equations (3) is reduced to the form: 𝑛+1 𝑛+1 𝑛+1 𝑛+1 𝑛+1 𝑛 π‘Žπ‘— βˆ†π‘’π‘—βˆ’2,π‘˜ +𝑏𝑗 βˆ†π‘’π‘—βˆ’1,π‘˜ + 𝑐𝑗 βˆ†π‘’π‘—,π‘˜ + 𝑑𝑗 βˆ†π‘’π‘—+1,π‘˜ + 𝑒𝑗 βˆ†π‘’π‘—+2,π‘˜ = 𝑓𝑗,π‘˜ (4) 𝑛+1 𝑛+1 𝑛 𝑛+1 with βˆ†π‘’π‘—,π‘˜ = 𝑒𝑗,π‘˜ βˆ’ 𝑒𝑗,π‘˜ and 𝐹𝑗,π‘˜ ≑ (𝑒 2 )𝑛+1 2 𝑛 𝑛 𝑛+1 2 𝑗,π‘˜ = (𝑒 )𝑗,π‘˜ + 2𝑒𝑗,π‘˜ βˆ†π‘’π‘—,π‘˜ + 𝑂(βˆ†π‘‘ ) Notations used in equation (3) traditional for finite difference schemes: 𝑛 𝑗 π‘₯ π‘₯ 𝑛 𝑒(π‘—βˆ†π‘₯, π‘˜βˆ†π‘¦, π‘›βˆ†π‘‘) = 𝑒𝑗,π‘˜ , βˆ«βˆ’βˆž 𝑒𝑦𝑦 𝑑π‘₯β€² β‰ˆ ∫π‘₯ 𝑗 𝑒𝑦𝑦 𝑑π‘₯β€² ≑ 𝑆𝑗,π‘˜ , min with βˆ†π‘₯, βˆ†π‘¦ being the spatial coordinates steps, βˆ†π‘‘ being the time step, [π‘₯min , π‘₯max ] Γ— [𝑦min , 𝑦max ] Γ— [0, 𝑇] β€” computational domain. The boundary conditions are used: 𝑒π‘₯ = 𝑒π‘₯π‘₯ = 0 along boundary lines π‘₯1 and π‘₯𝑀, and 𝑒𝑦 = 0 along the lines 𝑦1 and 𝑦𝐿 ( π‘₯min = π‘₯1 , π‘₯max = π‘₯𝑀 , 𝑦min = 𝑦1 , 𝑦max = 𝑦𝐿 : 𝑛 𝑛 𝑛 𝑛 𝑛 𝑛 𝑛 𝑛 𝑛 𝑛 π‘’βˆ’1,π‘˜ = 𝑒0,π‘˜ = 𝑒1,π‘˜ ; 𝑒𝑀+2,π‘˜ = 𝑒𝑀+1,π‘˜ = 𝑒𝑀,π‘˜ ; 𝑒𝑗,0 = 𝑒𝑗,1 ; 𝑒𝑗,𝐿+1 = 𝑒𝑗,𝐿 The system (4) is solved by a five-point run. As an initial distribution is considered the ellipsoid of rotation: π‘₯ 2 𝑦2 π‘ž(π‘₯, 𝑦) = 𝑐1 √1 βˆ’ βˆ’ , (5) π‘Ž12 𝑏12 334 Proceedings of the VIII International Conference "Distributed Computing and Grid-technologies in Science and Education" (GRID 2018), Dubna, Moscow region, Russia, September 10 - 14, 2018 with the volume 𝑉1 = 2πœ‹π‘Ž1 𝑏1 𝑐1 /3, and π‘Ž1 , 𝑏1 , 𝑐1 being the half axis. Similarly to the initial distribution (5), the distribution of sources as an ellipsoid of rotation is chosen: (π‘₯ βˆ’ π‘₯0 )2 (𝑦 βˆ’ 𝑦0 )2 𝐺(π‘₯, 𝑦) = 𝑐2 √1 βˆ’ βˆ’ π‘Ž22 𝑏22 with the volume 𝑉2 = 2πœ‹π‘Ž2 𝑏2 𝑐2 /3, and π‘Ž2 , 𝑏2 , 𝑐2 being the half axis, (π‘₯0 , 𝑦0 ) β€” center of ellipsoid. The proposed approach is quite natural for porting to GPGPU since it consists of many iterations within which it is necessary to solve large systems of linear equations. Taking into account peculiarities of GPGPU architecture [4] we solve systems of linear equations on GPGPU leaving all pre- and postprocessing to CPU. This approach is realized by semi-automatic procedure, described in [4]. In Figure 1, 2 we show the moments of the perturbations evolution for the values π‘Ž1 = 2, 𝑏1 = 3, 𝑐1 = 7.5, Ρ‚.Π΅. 𝑉1 = 20πœ‹ and π‘Ž2 = 2, 𝑏2 = 3, 𝑐2 = 2.5, π‘₯0 = βˆ’14, 𝑦0 = 14, 𝑉2 = 10πœ‹. The calculation was carried out without FCT procedure. Figure 1. Without source at 𝑑 = 16.5. Grid: 600 Γ— 850, βˆ†π‘‘ = 10βˆ’4 , βˆ†π‘₯ = βˆ†π‘¦ = 0.2 Figure 2. 3D perturbation with source at 𝑑 = 16.5. Grid: 600 Γ— 850, βˆ†π‘‘ = 10βˆ’4 , βˆ†π‘₯ = βˆ†π‘¦ = 0.2 335 Proceedings of the VIII International Conference "Distributed Computing and Grid-technologies in Science and Education" (GRID 2018), Dubna, Moscow region, Russia, September 10 - 14, 2018 3. Conclusion 1. As a result of our approach, an algorithm was proposed for transferring the simulation program for a two-dimensional nonstationary model problem to a hybrid system. The features of such a transition are revealed. 2. The use of modern hybrid systems in combination with the new algorithmic approach has allowed also to create a software and hardware platform for mass computations of wave processes. 3. There are no substantial bottlenecks for GPGPU onboard memory and the attempts to use heterogeneous systems for 3D computations are justified. Acknowledgement This work was supported by the grant of Saint Petersburg State University no. 26520170 and the Russian Foundation for Basic Research (RFBR), grant #16-07-01113. References [1] Bogdanov A.V., Mareev V.V. Numerical Simulation KPI Equation // Proceedings of the 15th International Ship Stability Workshop, June 2016, Stockholm, Sweden. pp. 115-117. [2] Bogdanov A., Mareev V., Kulabukhova N., Shchegoleva N. Influence of External Source on KPI Equation. Lecture Notes in Computer Science book series (LNCS, volume 10963), 2018, pp 123-135 [3] Fletcher C.A.J. Computational Techniques for Fluid Dynamics 1 // 2nd edition. – Springer-Verlag, 1991. 401 p. [4] Alexander Bogdanov, Nikita Storublevtcev, Vladimir Mareev. On porting of applications to new heterogeneous systems // Proceedings of the VIII International Conference "Distributed Computing and Grid-technologies in Science and Education" (GRID 2018) [In print] 336