=Paper=
{{Paper
|id=Vol-2267/333-336-paper-63
|storemode=property
|title=Algorithms for the calculation of nonlinear processes on hybrid architecture clusters
|pdfUrl=https://ceur-ws.org/Vol-2267/333-336-paper-63.pdf
|volume=Vol-2267
|authors=Alexander V. Bogdanov,Vladimir V. Mareev,Nikita Storublevtcev
}}
==Algorithms for the calculation of nonlinear processes on hybrid architecture clusters==
<pdf width="1500px">https://ceur-ws.org/Vol-2267/333-336-paper-63.pdf</pdf>
<pre>
Proceedings of the VIII International Conference "Distributed Computing and Grid-technologies in Science and
             Education" (GRID 2018), Dubna, Moscow region, Russia, September 10 - 14, 2018


 ALGORITHMS FOR THE CALCULATION OF NONLINEAR PROCESSES ON HYBRID ARCHITECTURE CLUSTERS
              A.V. Bogdanov (a), V.V. Mareev (b) and N. Storublevtcev (c)
       St. Petersburg State University, 7/9 Universitetskaya nab., St. Petersburg, 199034 Russia

                     E-mail: (a) bogdanov@csa.ru, (b) map@csa.ru, (c) 100.rub@mail.ru


The problem of porting programs from one hardware platform to another has not become any less
relevant, or any simpler, over time. The purpose of our work is to identify the key algorithmic features
that matter when porting codes for calculating essentially nonlinear processes to a modern
hybrid-architecture cluster that includes both CPUs (Intel Xeon) and GPUs (NVIDIA Tesla). As a test
problem for studying the process of porting a code to a hybrid-architecture cluster, we chose the
Kadomtsev-Petviashvili (KPI) equation, written in integro-differential form [1], [2].

Keywords: High performance computing, CPU architectures, GPU, FPGA

                                   © 2018 Alexander V. Bogdanov, Vladimir V. Mareev, Nikita Storublevtcev




1. Introduction
         Porting a computational task to a graphics processor is a difficult problem. As a rule, a
program is ported to a graphics accelerator to improve performance, and the main challenge during the
port is to preserve the correctness of program execution.
         The entire code cannot be transferred to the graphics processor: the program is always
launched from the central processor. Moreover, every program contains serial sections that cannot be
parallelized. Moving them to the graphics processor is pointless because of the nature of the GPU
architecture and the additional cost of data transfer between host and device.
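The cost argument above can be made quantitative with an Amdahl-style estimate. The following is a minimal sketch. It assumes that host-device transfer time can be lumped into a single fraction of the original run time, which is a simplification of ours and not a model from the paper. The function name and parameters are hypothetical.

```python
def hybrid_speedup(serial_frac: float, gpu_speedup: float, transfer_frac: float = 0.0) -> float:
    """Amdahl-style estimate of the overall speedup of a CPU+GPU port.

    serial_frac   -- fraction of the original CPU run time that stays serial on the CPU
    gpu_speedup   -- speedup of the parallelizable fraction once it runs on the GPU
    transfer_frac -- host<->device transfer time as a fraction of the original run time
                     (a simplifying assumption of this sketch)
    """
    if not 0.0 <= serial_frac <= 1.0:
        raise ValueError("serial_frac must lie in [0, 1]")
    if gpu_speedup <= 0.0 or transfer_frac < 0.0:
        raise ValueError("gpu_speedup must be positive and transfer_frac non-negative")
    parallel_frac = 1.0 - serial_frac
    new_time = serial_frac + parallel_frac / gpu_speedup + transfer_frac
    return 1.0 / new_time


if __name__ == "__main__":
    # With 10% serial code the speedup can never exceed 1 / 0.1 = 10, however fast the GPU is.
    print(round(hybrid_speedup(0.1, 50.0), 2))        # 8.47
    print(round(hybrid_speedup(0.1, 50.0, 0.05), 2))  # transfer overhead lowers it further
```

The serial fraction and the transfer term stay in the denominator regardless of GPU speed. This is why serial sections are left on the CPU.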


2. The test problem
        As a test problem, consider the two-dimensional Kadomtsev-Petviashvili equation (KPI):
                 $$\left[u_t + 0.5\,(u^2)_x + \beta u_{xxx} - G\right]_x = \eta\, u_{yy} \tag{1}$$
      Equation (1) for the function 𝑢(𝑥, 𝑦, 𝑡) is considered in the domain 𝑡 ≥ 0, 𝑥, 𝑦 ∈
(−∞, ∞), with 𝛽, 𝜂 ≥ 0; 𝐺(𝑥, 𝑦) is an external source [1].
        Instead of the original equation (1), its integro-differential analogue is considered [2]:
      $$u_t + 0.5\,(u^2)_x + \beta u_{xxx} = \eta \int_{-\infty}^{x} u_{yy}(x', y, t)\,dx' + G(x, y) \tag{2}$$

        The solution of equation (2) in the half-space 𝑡 ≥ 0 is sought for the initial distribution
𝑢(𝑥, 𝑦, 0) = 𝑞(𝑥, 𝑦). Equation (2) is simulated numerically with a linearized implicit finite-difference
scheme, in some cases combined with the flux-corrected transport (FCT) procedure [3].
        Equation (2) is approximated with central-difference operators:

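        $$u^{n+1}_{j,k} - u^{n}_{j,k} + \frac{\Delta t}{4\Delta x}\left(F^{n+1}_{j+1,k} - F^{n+1}_{j-1,k}\right)
          + \beta\,\frac{\Delta t}{2\Delta x^{3}}\left(u^{n+1}_{j+2,k} - 2u^{n+1}_{j+1,k} + 2u^{n+1}_{j-1,k} - u^{n+1}_{j-2,k}\right)
          = \Delta t\,\eta\,S^{n+1}_{j,k} + \Delta t\,G_{j,k} \tag{3}$$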
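The two spatial difference operators in (3) can be checked in isolation. The following is a minimal sketch, and the function names are hypothetical. The third-derivative stencil is exact for cubic u and second-order accurate for smooth u. The flux difference is exact when F is quadratic in x.

```python
import math


def d3_central(u, j, dx):
    """Central approximation of u_xxx at node j, as used in scheme (3):
    (u[j+2] - 2 u[j+1] + 2 u[j-1] - u[j-2]) / (2 dx^3)."""
    return (u[j + 2] - 2.0 * u[j + 1] + 2.0 * u[j - 1] - u[j - 2]) / (2.0 * dx**3)


def half_flux_derivative(F, j, dx):
    """Central approximation of 0.5 * F_x with F = u^2: (F[j+1] - F[j-1]) / (4 dx).
    Multiplied by dt this is the Delta t / (4 Delta x) term of (3)."""
    return (F[j + 1] - F[j - 1]) / (4.0 * dx)


if __name__ == "__main__":
    # For u = sin(x) the exact value at x = 1 is u_xxx = -cos(1); the error shrinks like dx^2.
    for dx in (0.1, 0.05, 0.025):
        u = [math.sin(1.0 + (i - 2) * dx) for i in range(5)]
        print(dx, abs(d3_central(u, 2, dx) + math.cos(1.0)))
```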

        The resulting system of difference equations (3) is reduced to the form:
      $$a_j\,\Delta u^{n+1}_{j-2,k} + b_j\,\Delta u^{n+1}_{j-1,k} + c_j\,\Delta u^{n+1}_{j,k} + d_j\,\Delta u^{n+1}_{j+1,k} + e_j\,\Delta u^{n+1}_{j+2,k} = f^{n}_{j,k} \tag{4}$$
 with $\Delta u^{n+1}_{j,k} = u^{n+1}_{j,k} - u^{n}_{j,k}$ and $F^{n+1}_{j,k} \equiv (u^2)^{n+1}_{j,k} = (u^2)^{n}_{j,k} + 2u^{n}_{j,k}\,\Delta u^{n+1}_{j,k} + O(\Delta t^2)$.
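Substitute the linearization of F and the definition of Δu into (3), then collect terms. With r = Δt/(4Δx) and s = βΔt/(2Δx^3), this gives one consistent set of coefficients for (4):

- a_j = -s
- b_j = 2s - 2r u^n_{j-1,k}
- c_j = 1
- d_j = 2r u^n_{j+1,k} - 2s
- e_j = s

The right-hand side f^n_{j,k} collects the explicit terms. The sketch below assembles these coefficients for one grid line. It is our own derivation from (3), not the authors' code. It makes two assumptions:

- The integral term S (defined below) is passed in as a known array, that is, evaluated from the latest available values.
- The ghost-node coefficients of the first and last two rows are not folded into the diagonal, which is left out for brevity.

The function name and argument layout are hypothetical.

```python
def line_coefficients(u, S, G, dt, dx, beta, eta):
    """Coefficients a, b, c, d, e and right-hand side f of system (4) on one grid line k.

    u    -- u^n along x on this line, padded with two ghost nodes at each end (len M + 4)
    S, G -- integral term and source at the M interior nodes; S is treated as known
            (an assumption of this sketch)
    Returns six lists of length M for the interior nodes.
    """
    r = dt / (4.0 * dx)            # factor of the flux difference in (3)
    s = beta * dt / (2.0 * dx**3)  # factor of the third-difference stencil in (3)
    F = [v * v for v in u]         # F^n = (u^2)^n
    a, b, c, d, e, f = [], [], [], [], [], []
    for i in range(len(S)):
        j = i + 2                  # index of interior node i in the padded array
        # Implicit part: linearized flux F^{n+1} ~ F^n + 2 u^n du, plus the stencil acting on du.
        a.append(-s)
        b.append(2.0 * s - 2.0 * r * u[j - 1])
        c.append(1.0)
        d.append(2.0 * r * u[j + 1] - 2.0 * s)
        e.append(s)
        # Explicit part: the same operators applied to u^n, moved to the right-hand side.
        explicit = (r * (F[j + 1] - F[j - 1])
                    + s * (u[j + 2] - 2.0 * u[j + 1] + 2.0 * u[j - 1] - u[j - 2]))
        f.append(-explicit + dt * eta * S[i] + dt * G[i])
    return a, b, c, d, e, f
```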
       The notation in equation (3) is traditional for finite-difference schemes:
      $$u(j\Delta x, k\Delta y, n\Delta t) = u^{n}_{j,k}, \qquad \int_{-\infty}^{x_j} u_{yy}\,dx' \approx \int_{x_{\min}}^{x_j} u_{yy}\,dx' \equiv S^{n}_{j,k},$$
where ∆𝑥 and ∆𝑦 are the spatial steps, ∆𝑡 is the time step, and [𝑥min, 𝑥max] × [𝑦min, 𝑦max] × [0, 𝑇]
is the computational domain.
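The integral term S^n_{j,k} can be evaluated on each y-line as a running sum in x. The following is a minimal sketch with two choices of ours, since the paper does not specify its quadrature:

- u_yy is taken from the central second difference in y.
- The x-integral uses the cumulative trapezoidal rule, starting from S = 0 at x_min.

The function name and grid layout are hypothetical.

```python
def integral_term(u, dx, dy):
    """Approximate S_{j,k} = integral from x_min to x_j of u_yy dx' on every y-line.

    u[k][j] holds values on rows k = 0..L+1 (one ghost row at each y end, already
    filled by the boundary conditions) and columns j along x, starting at x_min.
    Returns S as a list of L rows, one per interior y-line.
    """
    S = []
    for k in range(1, len(u) - 1):
        # Central second difference in y on line k.
        uyy = [(u[k + 1][j] - 2.0 * u[k][j] + u[k - 1][j]) / dy**2
               for j in range(len(u[k]))]
        # Cumulative trapezoidal rule in x, with S = 0 at x_min.
        row = [0.0]
        for j in range(1, len(uyy)):
            row.append(row[-1] + 0.5 * dx * (uyy[j - 1] + uyy[j]))
        S.append(row)
    return S
```

For example, with u = y^2 the second derivative u_yy is 2 everywhere, so each row of S should equal 2(x_j - x_min).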
        The following boundary conditions are used: 𝑢𝑥 = 𝑢𝑥𝑥 = 0 along the boundary lines 𝑥1 and 𝑥𝑀,
and 𝑢𝑦 = 0 along the lines 𝑦1 and 𝑦𝐿 (𝑥min = 𝑥1, 𝑥max = 𝑥𝑀, 𝑦min = 𝑦1, 𝑦max = 𝑦𝐿):
      $$u^{n}_{-1,k} = u^{n}_{0,k} = u^{n}_{1,k};\quad u^{n}_{M+2,k} = u^{n}_{M+1,k} = u^{n}_{M,k};\quad u^{n}_{j,0} = u^{n}_{j,1};\quad u^{n}_{j,L+1} = u^{n}_{j,L}$$
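In code, the discrete boundary conditions above amount to copying values into ghost nodes before each step. The sketch below assumes a hypothetical padded layout: two ghost columns at each x end and one ghost row at each y end.

```python
def apply_boundary_conditions(u):
    """Fill the ghost nodes of a padded grid u[k][j] in place and return it.

    Assumed layout: columns j = 0..M+3 hold x-nodes -1..M+2 (two ghost columns at
    each x end); rows k = 0..L+1 hold y-nodes 0..L+1 (one ghost row at each y end).
    """
    for row in u:
        row[0] = row[1] = row[2]     # x-nodes -1 and 0 copy node 1   (u_x = u_xx = 0)
        row[-1] = row[-2] = row[-3]  # x-nodes M+2 and M+1 copy node M
    u[0] = list(u[1])                # y-node 0 copies row 1           (u_y = 0)
    u[-1] = list(u[-2])              # y-node L+1 copies row L
    return u
```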
        System (4) is solved by the five-point sweep method (the pentadiagonal analogue of the Thomas algorithm).
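The five-point sweep is Gaussian elimination specialized to a pentadiagonal matrix, followed by back substitution. The following is a minimal generic sketch without pivoting, not the authors' GPU code. It assumes the matrix is diagonally dominant or otherwise safe to factor without row exchanges. With S treated as known, each grid line k yields an independent system of this form. A batch of such solves is therefore a natural unit of work to hand to the GPU.

```python
def solve_pentadiagonal(a, b, c, d, e, f):
    """Solve a pentadiagonal system by elimination and back substitution (no pivoting).

    Row i reads: a[i] x[i-2] + b[i] x[i-1] + c[i] x[i] + d[i] x[i+1] + e[i] x[i+2] = f[i].
    Coefficients that fall outside the matrix (a[0], a[1], b[0], d[-1], e[-2], e[-1])
    are ignored.
    """
    n = len(c)
    # Work on copies so the caller's arrays are left untouched (a is only read).
    b, c, d, e, f = list(b), list(c), list(d), list(e), list(f)
    for p in range(n - 1):
        # Eliminate x[p] from row p+1; its current coefficient of x[p] is b[p+1].
        m = b[p + 1] / c[p]
        c[p + 1] -= m * d[p]
        d[p + 1] -= m * e[p]
        f[p + 1] -= m * f[p]
        if p + 2 < n:
            # Eliminate x[p] from row p+2; its coefficient of x[p] is a[p+2].
            m = a[p + 2] / c[p]
            b[p + 2] -= m * d[p]
            c[p + 2] -= m * e[p]
            f[p + 2] -= m * f[p]
    # Back substitution on the remaining upper band (c, d, e).
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = f[i]
        if i + 1 < n:
            s -= d[i] * x[i + 1]
        if i + 2 < n:
            s -= e[i] * x[i + 2]
        x[i] = s / c[i]
    return x
```

For a diagonal system, for example, solve_pentadiagonal([0]*3, [0]*3, [2.0]*3, [0]*3, [0]*3, [2, 4, 6]) returns [1.0, 2.0, 3.0].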
        The initial distribution is taken to be the upper half of an ellipsoid:

                         $$q(x, y) = c_1\sqrt{1 - \frac{x^2}{a_1^2} - \frac{y^2}{b_1^2}}, \tag{5}$$




 with volume 𝑉1 = 2𝜋𝑎1𝑏1𝑐1/3, where 𝑎1, 𝑏1, 𝑐1 are the semi-axes.
        Similarly to the initial distribution (5), the source distribution is chosen as the upper half
of an ellipsoid:

                 $$G(x, y) = c_2\sqrt{1 - \frac{(x - x_0)^2}{a_2^2} - \frac{(y - y_0)^2}{b_2^2}}$$

with volume 𝑉2 = 2𝜋𝑎2𝑏2𝑐2/3, where 𝑎2, 𝑏2, 𝑐2 are the semi-axes and (𝑥0, 𝑦0) is the center of the ellipsoid.
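The initial distribution q(x, y) in (5) and the source G(x, y) share the same half-ellipsoid profile, so one helper covers both. The following sketch makes two assumptions:

- The profile is zero outside the ellipse, where the square root would be imaginary.
- The volume formula 2πabc/3 is checked numerically with a midpoint-rule quadrature over the bounding box.

The names are hypothetical.

```python
import math


def ellipsoid_cap(x, y, a, b, c, x0=0.0, y0=0.0):
    """Half-ellipsoid profile c*sqrt(1 - (x-x0)^2/a^2 - (y-y0)^2/b^2).

    With x0 = y0 = 0 this is q(x, y) from (5); with a shifted centre it is G(x, y).
    Assumed to be zero outside the ellipse.
    """
    r2 = ((x - x0) / a) ** 2 + ((y - y0) / b) ** 2
    return c * math.sqrt(1.0 - r2) if r2 < 1.0 else 0.0


def cap_volume(a, b, c, x0=0.0, y0=0.0, h=0.01):
    """Midpoint-rule volume under the profile over the ellipse's bounding box."""
    nx, ny = round(2.0 * a / h), round(2.0 * b / h)
    hx, hy = 2.0 * a / nx, 2.0 * b / ny  # cell sizes that tile the box exactly
    total = 0.0
    for i in range(nx):
        x = x0 - a + (i + 0.5) * hx
        for j in range(ny):
            y = y0 - b + (j + 0.5) * hy
            total += ellipsoid_cap(x, y, a, b, c, x0, y0)
    return total * hx * hy


if __name__ == "__main__":
    # Source parameters from the text: the result should be close to V2 = 10*pi.
    print(cap_volume(2.0, 3.0, 2.5, -14.0, 14.0), 10.0 * math.pi)
```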
         The proposed approach lends itself naturally to porting to GPGPU, since it consists of many
iterations, each of which requires solving large systems of linear equations. Taking into account the
specifics of the GPGPU architecture [4], we solve the systems of linear equations on the GPGPU and
leave all pre- and postprocessing to the CPU. This approach is implemented by a semi-automatic
procedure described in [4].
         Figures 1 and 2 show snapshots of the perturbation evolution for 𝑎1 = 2, 𝑏1 = 3, 𝑐1 = 7.5,
i.e. 𝑉1 = 20𝜋, and 𝑎2 = 2, 𝑏2 = 3, 𝑐2 = 2.5, 𝑥0 = −14, 𝑦0 = 14, 𝑉2 = 10𝜋. The calculation was
carried out without the FCT procedure.


              Figure 1. Perturbation without source at 𝑡 = 16.5. Grid: 600 × 850, ∆𝑡 = 10^-4, ∆𝑥 = ∆𝑦 = 0.2


       Figure 2. 3D perturbation with source at 𝑡 = 16.5. Grid: 600 × 850, ∆𝑡 = 10^-4, ∆𝑥 = ∆𝑦 = 0.2



3. Conclusion
           1.     Using our approach, we proposed an algorithm for transferring a simulation program for a
two-dimensional non-stationary model problem to a hybrid system, and identified the key features of
such a transition.
           2.     Combining modern hybrid systems with the new algorithmic approach also made it possible
to create a software and hardware platform for mass computations of wave processes.
           3.     GPGPU onboard memory presents no substantial bottleneck, so attempts to use
heterogeneous systems for 3D computations are justified.
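The memory claim in item 3 can be checked with back-of-the-envelope arithmetic for the 600 × 850 grid of Figures 1 and 2. The count of ten full-grid double-precision arrays is an assumption for illustration, because the paper does not list its arrays. The function names are hypothetical.

```python
def gpu_memory_estimate(nx, ny, arrays=10, bytes_per_value=8):
    """Rough device-memory footprint, in bytes, of full-grid arrays on an nx-by-ny grid.

    'arrays' is a hypothetical count (for example the diagonals a..e, the right-hand
    side f, u^n, S, G and one work array); ghost nodes are ignored.
    """
    return nx * ny * arrays * bytes_per_value


def time_steps(t_end, dt):
    """Number of fixed-size time steps needed to reach t_end."""
    return round(t_end / dt)


if __name__ == "__main__":
    print(gpu_memory_estimate(600, 850) / 1e6, "MB")  # 40.8 MB
    print(time_steps(16.5, 1e-4))                      # 165000
```

Under these assumptions the grid data take about 40.8 MB. That is a small fraction of the onboard memory of a Tesla-class GPU. Reaching 𝑡 = 16.5 with ∆𝑡 = 10^-4 takes 165,000 time steps.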


Acknowledgement
       This work was supported by Saint Petersburg State University grant no. 26520170
and by the Russian Foundation for Basic Research (RFBR), grant no. 16-07-01113.


References
[1] Bogdanov A.V., Mareev V.V. Numerical Simulation KPI Equation // Proceedings of the 15th
International Ship Stability Workshop, Stockholm, Sweden, June 2016. pp. 115-117.
[2] Bogdanov A., Mareev V., Kulabukhova N., Shchegoleva N. Influence of External Source on KPI
Equation // Lecture Notes in Computer Science, vol. 10963. 2018. pp. 123-135.
[3] Fletcher C.A.J. Computational Techniques for Fluid Dynamics 1. 2nd edition. Springer-Verlag,
1991. 401 p.
[4] Bogdanov A., Storublevtcev N., Mareev V. On porting of applications to new heterogeneous
systems // Proceedings of the VIII International Conference "Distributed Computing and
Grid-technologies in Science and Education" (GRID 2018). [In print]



</pre>