
10.18287/1613-0073-2016-1638-444-450

APPLICATION OF THE METHOD OF PYRAMID FOR SYNTHESIS OF PARALLEL ALGORITHM FOR DIFFERENCE SOLUTION OF THE TWO-DIMENSIONAL PARTIAL DIFFERENTIALS EQUATION

D.L. Golovashkin

L.V. Yablokova

E.V. Belova

Image Processing Systems Institute - Branch of the Federal Scientific Research Centre "Crystallography and Photonics" of Russian Academy of Sciences, Samara, Russia
Samara National Research University, Samara, Russia

2016, pp. 444-450

This work is devoted to the synthesis and investigation of a parallel algorithm for the finite difference solution of the Poisson equation by the Jacobi method. Using the two-dimensional case as an example, it demonstrates the efficiency of the method of pyramids in synthesizing such an algorithm.

Keywords: method of pyramids, Jacobi method, parallel algorithm

Since the mid-1950s [1], the difference solution of differential equations has been actively used for computational experiments in various fields of natural science (for example, in nanophotonics [2] and diffractive optics [3]). In some cases a natural experiment is either extraordinarily expensive and time-consuming (the Large Hadron Collider), forbidden by international treaties (for example, those banning nuclear tests), or impossible altogether (the evolution of large astrophysical objects).

A popular tool of mathematical modeling is the solution of the grid equations of implicit difference schemes for various differential problems. This procedure differs from the solution of explicit difference schemes by its better stability [4] (often unconditional), but it requires solving systems of linear equations that do not arise for explicit schemes. This circumstance motivates researchers to develop vector and parallel methods for solving banded systems of linear equations [5, 6, 7, 8] in order to reduce the duration of modeling.

The recent work [9] proposes a parallel multigrid algorithm for solving elliptic equations using a Chebyshev iterative procedure.

Its authors tested it on the "Lomonosov" supercomputer and obtained, on 64 computing nodes (each containing two six-core Intel Xeon X5670 processors), a 25-fold speedup compared with the simple iteration method. Seeking to further reduce the computation time of similar algorithms, the authors of the present work use the method of pyramids, which shortens communication time at the cost of duplicating arithmetic operations on different processors. As an example, we choose the implicit difference scheme for the two-dimensional Poisson equation and the Jacobi method for solving the grid equations. A similar problem for other differential equations or other computing architectures was previously solved successfully in [10, 11, 12].

The difference scheme for the Poisson equation and the Jacobi method

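Consider the inhomogeneous Poisson equation in the two-dimensional case (a rectangular computational domain with sides $l_1$ and $l_2$, respectively):

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = f(x, y).$$

We set zero boundary conditions (the Dirichlet condition) and define the right-hand side of the equation as

$$f(x, y) = 2\pi^2 \sin\frac{\pi x}{l_1} \sin\frac{\pi y}{l_2},$$

which makes it easy to write down the analytical solution needed to verify convergence.

Following [4], we construct the approximating difference scheme for the Poisson equation by replacing the derivatives with divided differences on the grid domain

$$\omega_h = \left\{ (x_i, y_j) : x_i = i h_x,\ i = 0, \ldots, I+1,\ h_x = \frac{l_1}{I+1};\ y_j = j h_y,\ j = 0, \ldots, J+1,\ h_y = \frac{l_2}{J+1} \right\}.$$

Denoting by $u_{i,j}$ the grid approximation of $u(x_i, y_j)$ and by $f_{i,j} = f(x_i, y_j)$, the scheme takes the form

$$\frac{u_{i-1,j} - 2u_{i,j} + u_{i+1,j}}{h_x^2} + \frac{u_{i,j-1} - 2u_{i,j} + u_{i,j+1}}{h_y^2} = f_{i,j}, \quad i = 1, \ldots, I,\ j = 1, \ldots, J,$$

with $u_{0,j} = u_{I+1,j} = u_{i,0} = u_{i,J+1} = 0$.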

Method of the pyramids

The method of pyramids is based on the idea of obtaining completely independent branches [13] of the computational process that require no synchronization or information exchange during the computation.

The classical version of this method is described in [13] and [14]. Here the authors propose a modification obtained by introducing the concept of pyramid height.

In this work, the two-dimensional grid domain is decomposed one-dimensionally (Fig. 1) into µ tasks, each of which computes the iterative approximations of the grid function on its subdomain n iterations (the pyramid height) ahead. The computation starts from R = r + 2n values (the pyramid base) and ends with r values (the pyramid top), so the region processed during n iterations has the shape of a pyramid with a flat top.
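The following Python sketch simulates this scheme in a single process for a row-wise (one-dimensional) decomposition: each of µ tasks copies its pyramid base (its own rows plus up to n halo rows on each side), performs n Jacobi sweeps on a region that shrinks by one row per side per sweep, and keeps only its own r rows (the pyramid top). Edge tasks have a neighbour on one side only, so their base holds r + n rows rather than r + 2n. The function names, the even split of rows and the in-memory "exchange" are our assumptions, not the authors' FORTRAN implementation. The check against serial Jacobi illustrates why duplicated arithmetic can replace per-iteration communication without changing the result.

```python
import math


def jacobi_sweep(u, f, hx, hy, rows, off=0):
    """One Jacobi sweep of the five-point scheme over the global rows in `rows`.

    `u` holds consecutive global rows starting at row `off`; boundary rows and
    columns carry the zero Dirichlet values and are never updated.
    """
    cx, cy = 1.0 / hx ** 2, 1.0 / hy ** 2
    diag = 2.0 * (cx + cy)
    new = [row[:] for row in u]
    J = len(u[0]) - 2
    for i in rows:
        li = i - off  # local row index
        for j in range(1, J + 1):
            new[li][j] = (cx * (u[li - 1][j] + u[li + 1][j])
                          + cy * (u[li][j - 1] + u[li][j + 1]) - f[i][j]) / diag
    return new


def serial_jacobi(f, hx, hy, K):
    """Reference: K sweeps over all interior rows 1..I."""
    u = [[0.0] * len(f[0]) for _ in f]
    for _ in range(K):
        u = jacobi_sweep(u, f, hx, hy, range(1, len(f) - 1))
    return u


def pyramid_jacobi(f, hx, hy, K, mu, n):
    """Simulate mu tasks with a 1D row decomposition and pyramid height n.

    Returns the grid after K iterations and the number of halo exchanges.
    """
    I = len(f) - 2
    u = [[0.0] * len(f[0]) for _ in f]
    # task t owns interior rows bounds[t] .. bounds[t + 1] - 1 (the pyramid top)
    bounds = [1 + (I * t) // mu for t in range(mu + 1)]
    done, exchanges = 0, 0
    while done < K:
        s = min(n, K - done)  # height of this pyramid (the last one may be lower)
        new_u = [row[:] for row in u]
        for t in range(mu):
            lo, hi = bounds[t], bounds[t + 1]
            # pyramid base: own rows plus up to s halo rows on each side
            base_lo, base_hi = max(0, lo - s), min(I + 2, hi + s)
            local = [u[i][:] for i in range(base_lo, base_hi)]
            for k in range(1, s + 1):
                # the region holding a valid k-th iterate shrinks by one row per side
                rows = range(max(1, lo - s + k), min(I + 1, hi + s - k))
                local = jacobi_sweep(local, f, hx, hy, rows, off=base_lo)
            for i in range(lo, hi):  # keep only the pyramid top
                new_u[i] = local[i - base_lo]
        u = new_u  # stands in for the halo exchange between tasks
        done += s
        if done < K:
            exchanges += 1
    return u, exchanges


if __name__ == "__main__":
    I, J, K = 9, 7, 20
    hx, hy = 1.0 / (I + 1), 1.0 / (J + 1)
    f = [[2 * math.pi ** 2 * math.sin(math.pi * i * hx) * math.sin(math.pi * j * hy)
          for j in range(J + 2)] for i in range(I + 2)]
    ref = serial_jacobi(f, hx, hy, K)
    for n in (1, 2, 4, 8, K):
        u, exchanges = pyramid_jacobi(f, hx, hy, K, mu=2, n=n)
        print(f"n={n:2d}: halo exchanges={exchanges:2d}, matches serial: {u == ref}")
```

Because every row of the pyramid top is computed from exactly the same neighbouring values as in the serial sweep, the results coincide exactly for any height n; only the number of exchanges and the amount of duplicated work change.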

Hereafter, for simplicity, we consider the case of two processors (µ = 2) with pyramid heights n = 1 (Fig. 2), 2 (Fig. 3), 4, 8 and 16.

Fig. 2 shows that the case n = 1 corresponds to the conventional parallel Jacobi algorithm from [5], which performs communications at every iteration of the method.

For n = 2 (Fig. 3), the number of communications is halved: they occur on every even iteration. Continuing this trend, for an arbitrary height n the algorithm communicates once every n iterations. For n = K (K being the total number of iterations) we obtain the classical version of the method of pyramids, which involves no communications at all. We thus have a tool that allows the number of communications to be varied arbitrarily from K (at n = 1) to zero (at n = K). Note that the reduction in communication time as the number of communications decreases is accompanied by growth in the time spent on arithmetic operations. Indeed, as Fig. 1 shows, as n increases, the pyramid base length R grows, and with it the number of grid function values handled by one processor. Moreover, these extra calculations are duplicated by the neighbouring processors; this is the price paid for reducing communications. The proposed approach to constructing parallel algorithms was validated by computational experiment. Experiments were initially also carried out with an implementation for two cores of a desktop computer; only the cluster case, as the most successful one, is reported in this article.
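The trade-off just described can be illustrated with a toy cost model for one edge task when µ = 2, with K = 100 iterations and r = 27 rows (the first half of I = 55). A pyramid of height s makes the task compute s - k extra rows on sweep k, i.e. s(s - 1)/2 duplicated row updates per pyramid, while halo exchanges are needed only between pyramids. All unit costs below (`alpha` per-exchange latency, `beta` per transferred halo row, `t_row` per row update) are hypothetical values chosen only to show the shape of the curve; they are not measurements from this work, and the model ignores cache effects.

```python
def pyramid_heights(K, n):
    """Heights of consecutive pyramids covering K iterations (the last may be lower)."""
    heights, done = [], 0
    while done < K:
        s = min(n, K - done)
        heights.append(s)
        done += s
    return heights


def edge_task_cost(K, n, r, alpha, beta, t_row):
    """Toy cost of one edge task (one neighbour) for pyramid height n.

    Assumes n does not exceed the neighbour's block, so the base is not clipped.
    Returns (exchanges, duplicated_rows, total_time) in hypothetical units.
    """
    heights = pyramid_heights(K, n)
    exchanges = len(heights) - 1  # halo refresh before every pyramid but the first
    duplicated = sum(s * (s - 1) // 2 for s in heights)
    comm = exchanges * (alpha + beta * n)  # latency plus n halo rows (upper bound)
    arith = (K * r + duplicated) * t_row
    return exchanges, duplicated, comm + arith


if __name__ == "__main__":
    K, r = 100, 27
    for n in (1, 2, 4, 8, 16):
        ex, dup, total = edge_task_cost(K, n, r, alpha=20.0, beta=1.0, t_row=1.0)
        print(f"n={n:2d}: exchanges={ex:3d}, duplicated rows={dup:4d}, time={total:.0f}")
```

With these made-up constants the total is smallest at n = 8: fewer exchanges win at first, then the quadratic growth of duplicated rows takes over. Other ratios of latency to arithmetic cost move the minimum, so the best height depends on the machine.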

We used a node of the "Sergey Korolyov" supercomputer [15], namely two cores of a node with four Intel Xeon E7-4860 processors running the Red Hat Enterprise Linux 5.11 operating system. The algorithms were implemented in FORTRAN and compiled with GFORTRAN 4.6. The problem size was set by I = 55 and K = 100, at which the speedup of the traditional parallel algorithm over the serial one stops changing and fluctuates around S = 2.045. The table (where Tparallel is the run time of the parallel algorithm) and Fig. 4 show that applying the method of pyramids increased the speedup of the computation by more than 1.7 times compared with the traditional serial approach and even exceeded the theoretical limit set by Amdahl's law. Perhaps, in addition to the optimization of communications between cores, this effect is related to the processor cache memory, whose use improves when the grid domain is decomposed. The plot in Fig. 4 is clearly U-shaped: as the pyramid height grows further, the initial gain in computation time from fewer communications begins to be offset by the increasing volume of arithmetic operations.
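Amdahl's law bounds the speedup on N processors by 1/((1 - p) + p/N), where p is the fraction of the serial run time that can be parallelized. Since this bound never exceeds N, any measured speedup above 2 on two cores lies outside the model, which is why effects such as the cache memory have to be invoked. A small sketch of the bound (the function names are ours):

```python
def amdahl_bound(p, N):
    """Amdahl's law: maximum speedup on N processors when a fraction p
    (0 <= p <= 1) of the serial run time can be parallelized."""
    if not 0.0 <= p <= 1.0 or N < 1:
        raise ValueError("need 0 <= p <= 1 and N >= 1")
    return 1.0 / ((1.0 - p) + p / N)


def is_superlinear(measured_speedup, N):
    """The bound peaks at N (for p = 1), so a speedup above N exceeds it for every p."""
    return measured_speedup > N


if __name__ == "__main__":
    for p in (0.5, 0.9, 0.99, 1.0):
        print(f"p = {p:4.2f}: Amdahl bound on 2 cores = {amdahl_bound(p, 2):.3f}")
```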

Conclusion

The technique proposed in this work for constructing parallel algorithms for the iterative solution of the grid equations of implicit difference schemes (using the Jacobi method for the Poisson equation as an example) made it possible to increase the speedup of calculations by 1.2 times compared with the traditional parallel approach. The authors hope that the developed technique will find application in the difference solution of a wide range of implicit grid equations.

Acknowledgment

This work was supported by grant RFBR 14-07-00291-а.

1. Ryabenky, Filippov. About stability of difference equations. Moscow, Gostekhizdat, 1956.

2. Diffractive Nanophotonics, ed. by V.A. Soifer. CRC Press, Taylor & Francis Group, CISP, Boca Raton, 679 p.

3. Computer design of diffractive optics, ed. by V.A. Soifer. Woodhead Publishing Limited, Cambridge, 2012.

4. Samarsky. Theory of difference schemes. Moscow, Science, 1977.

5. Golub, Van Loan C. Matrix Computations. Johns Hopkins University Press, 1996; 728 p.

6. Wenhua Yu, Raj Mittra, Tao Su, Yongjun Liu, Xiaoling Yang. Parallel finite-difference time-domain method. Artech House, Boston, London, 2006; 272 p.

7. McDonald, Fisher, Rigden, Perala R. Parallel FDTD Electromagnetic Effects Simulation using On-Demand Cloud HPC Resources. Electromagnetic Compatibility (EMC), IEEE International Symposium, 2013; 499-502.

8. Ortega J. Introduction to parallel and vector methods for solving linear systems. Moscow, Mir, 1991. [in Russian]

9. Zhukov, Novikov, Feodoritova. Parallel multigrid method for solving elliptic equations. Journal Mathematical Models and Computer Simulation, 2014; 26(1): 55-68.

10. Kochurov, Golovashkin. GPU implementation of Jacobi Method and Gauss-Seidel Method for Data Arrays that Exceed GPU-dedicated Memory Size. Journal of Mathematical Modelling and Algorithms in Operations Research, 2015; 14(4): 393-405.

11. Kochurov, Vorotnikova, Golovashkin. GPU implementation of Jacobi method for data arrays that exceed GPU-dedicated memory size. CEUR Workshop Proceedings, 2015; 1490: 414-419. DOI: 10.18287/1613-0073-2015-1490-414-419.

12. Golovashkin, Kochurov. Pyramid method for GPU-aided finite difference method. Proceedings of the 13th International Conference on Computational and Mathematical Methods in Science and Engineering, CMMSE 2013, 24-27 June 2013; 746-756.

13. Lamport L. The parallel execution of DO loops. Communications of the ACM, 1974; 17(2): 83-93.

14. Valkovski. Parallel execution of cycles. The method of the pyramids. Cybernetics, 1983; 5: 51-55.

15. URL: http://hpc.ssau.ru/node/6 (date of access 30.03.2016).