<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <article-id pub-id-type="doi">10.18287/1613-0073-2016-1638-444-450</article-id>
      <title-group>
        <article-title>APPLICATION OF THE METHOD OF PYRAMID FOR SYNTHESIS OF PARALLEL ALGORITHM FOR DIFFERENCE SOLUTION OF THE TWO- DIMENSIONAL PARTIAL DIFFERENTIALS EQUATION</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>D.L. Golovashkin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>L.V. Yablokova</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>E.V. Belova</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Image Processing Systems Institute - Branch of the Federal Scientific Research Centre "Crystallography and Photonics" of Russian Academy of Sciences</institution>
          ,
          <addr-line>Samara</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Samara National Research University</institution>
          ,
          <addr-line>Samara</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <fpage>444</fpage>
      <lpage>450</lpage>
      <abstract>
        <p>The work is devoted to the synthesis and investigation of parallel algorithm for a finite difference solution of the Poisson equation using the Jacobi method. For example, two-dimensional case demonstrates the efficacy of the method of the pyramids in the synthesis of said algorithm.</p>
      </abstract>
      <kwd-group>
        <kwd>Method of the pyramids</kwd>
        <kwd>Jacobi method</kwd>
        <kwd>parallel algorithm</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        The difference solving differential equations since the mid-50s [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is actively used for
the production of the computational experiments in various fields of natural sciences
(for example, in nanophotonics [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], diffractive optics [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], ). In some cases statement
of natural experiment or is extraordinary expensive and very expensive on time
(Large Hadron Collider), or is forbidden by international treaties (for example, about
non-renewal of nuclear tests), or is impossible at all (evolution of large astrophysical
objects).
      </p>
      <p>
        The popular tool of mathematical model operation is the solution of the grid equations
of implicit difference schemes for various differential tasks. Procedure of obtaining
the specified decision, differing from algorithms of the solution of apparent difference
schemes of the equations in the best stability [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] (often absolute), it is characterized
by work with the systems of the linear equations which do not arise for the apparent
equations. The specified circumstance induces researchers to development of vector
and parallel methods of solution of systems of linear equations of a tape look
[
        <xref ref-type="bibr" rid="ref5 ref6 ref7 ref8">5,6,7,8</xref>
        ], for the purpose of decrease in duration of model operation.
      </p>
      <p>
        In recent work [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] the parallel algorithm of a multigrid method of solution of elliptical
equations by means of Chebyshev iterative procedure is offered.
      </p>
      <p>
        Authors hold its testing on a supercomputer "Lomonosov", trying to obtain on 64
computing clusters (everyone contains 2 six nuclear Intel Xeon X5670 processors)
acceleration of calculations by 25 times in comparison with method of prime iteration.
Seeking for further reduction of duration of calculations on similar algorithms,
authors of the offered work use a method of pyramids for reduction of duration of
communications due to duplication of arithmetic operations by various processors. As an
example are chosen the implicit difference scheme for a two-dimensional Poisson
equation and a method of Jacobi, for the solution of the grid equations.
A similar problem applied to other differential equations or other clusters of the
previously successfully solved in the works [
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10,11,12</xref>
        ].
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>The difference scheme for a Poisson equation and a Jacobi method</title>
      <p>Let's consider the nonuniform Poisson equation in a two-dimensional case
(rectangular area of computing experiment with the parties 11 and 12 respectively):
We will put boundary conditions equal to zero (a condition of Dirichlet). We define a
right member of the equation as:
x2u2  y2u2  f (x, y)
f (x, y)  22 sin x sin y</p>
      <p>l1 l2
for ease of recording of analytical solutions needed at the convergence of the
verification.</p>
      <p>
        In compliance with [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] we will make creation of the approximating difference scheme
replacement of derivatives with divided differences on grid area for a Poisson
equation.
h  {(xi , y j ) : xi  ih x ;i  0, I 1, h x 
y j  j h y ; j  0, J1, h y  l2 }
      </p>
      <p>J  1
Then the scheme looks as follows:</p>
      <p>l1 ,</p>
      <p>I 1</p>
    </sec>
    <sec id="sec-3">
      <title>Method of the pyramids</title>
      <p>
        The idea about receiving completely independent branches [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] of computing process
which do not need in the course of the account synchronization and exchange of
information is the cornerstone of a method of the pyramids.
      </p>
      <p>
        The classic version of this method is described in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. The authors proposed
here is a modification obtained by introducing the concept of the height of the
pyramid.
      </p>
      <p>In this work one-dimensional decomposition of two-dimensional grid area (fig. 1) on
µ tasks is carried out, each of which calculates values of iterative approximations of
grid function on n of iterations (pyramid height) for the subarea forward. At the same
time calculations begin with R=r+2n of values (the pyramid basis) and come to the
end for r of values (pyramid top), and the subarea limited to n iterations has a pyramid
appearance with flat top.</p>
      <p>Further, in this article, for simplicity, the case of two processors (µ=2), characterized
by pyramids with height n=1 (fig. 2), 2 (fig.3), 4, 8, 16.</p>
      <p>
        From figure 2 it is visible that the case of n=2 corresponds to routine parallel
algorithm from [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], to Jacobi who is characterized by production of communications on
each iteration of a method.
      </p>
      <p>For n=2 (fig. 3) the number of communications is cut by half, they accompany
calculation on each even iteration. Following the planned tendency, further we will note
that with the arbitraries height of n of communications are made for algorithm
through each n of iterations.
At n=K we will receive the classical option of a method of the pyramids which is not
characterized by communications at all. Thus, at our disposal there is a tool allowing
to vary randomly the number of communications from K (at n=1) to zero (at n=K).
Let's note that reduction of duration of communications at decrease of their quantity is
followed by body height of time of arithmetic operations. Really, apparently from fig.
1 at increase in n also R pyramid basis length and consequently also the number of the
grid functions referred to maintaining one processor grows. Moreover, the specified
calculations are duplicated by the next processors, being a payment for reduction of
communications.
Confirmation of operability of the offered approach to creation of parallel algorithms
was made by method of computing experiment. Initially researches were conducted in
comparison with realization for 2 kernels of the desktop computer, only the cluster
case as the most successful is mentioned in article.</p>
      <p>
        We were used the supercomputer knot "Sergey Korolyov" [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], in particular 2 cores
of the processor 4x Intel Xeon E7-4860 working under control of the Red Hat
Enterprise Linux 5.11 operating system. Program realization of algorithms was made in the
FORTRAN language, compilation – on GFORTRAN 4.6. Parameters of sampling of
grid area were defined by the sizes I=55, K=100, at which acceleration of traditional
parallel algorithm, in comparison with serial, ceases to change and fluctuates
approximately at one S=2.045 level.
From the table, where Tparallel is the run time of parallel algorithm, and drawing is
visible that application of a method of pyramids allowed to achieve increase of
accelerations of computing process more than by 1.7 times in comparison with traditional
serial approach and even to surpass the theoretical limit set by Amdahl's law.
Perhaps, made mention effect also the processor cache memory at decomposition of
grid area is bound to padding optimization of communications between quick.
The chart in fig. 4 has an apparent U-shaped appearance as the tentative prize in
duration of calculations due to decrease of number of communications further, with body
height of height of a pyramid, begins to be compensated by increase in volume of
arithmetic operations.
5
      </p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>The reception of drawing up parallel algorithms of the iterative solution of the grid
equations of implicit difference schemes offered in work (on the example of a method
of Jacobi for a Poisson equation) allowed to increase acceleration of calculations in
comparison with traditional parallel approach by 1.2 times. The author hopes that the
developed technique will find application at the difference solution of a wide range of
the implicit grid equations.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgment</title>
      <p>This work was supported by grant RFBR 14-07-00291-а.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Ryabenky</surname>
            <given-names>VS</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Filippov</surname>
            <given-names>AF</given-names>
          </string-name>
          .
          <article-title>About stability of difference equations</article-title>
          . Moscow, Gostekhizdat,
          <year>1956</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. Diffracrive Nanophotonics, ed. by V.A. Soifer, CRC Press,Taylor &amp; Francis Group,
          <string-name>
            <surname>CISP</surname>
          </string-name>
          , Boca Raton,
          <volume>679</volume>
          p.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. Сomputer design of diffractive optics, ed. by V.A.
          <string-name>
            <surname>Soifer</surname>
          </string-name>
          , Woodhead Publishing Limited, Cambridge,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Samarsky</surname>
            <given-names>AA</given-names>
          </string-name>
          .
          <article-title>Theory of difference schemes</article-title>
          . Moscow, Science,
          <year>1977</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Golub</surname>
            <given-names>G</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Loan C. Matrix Computations</surname>
          </string-name>
          Johns Hopkins University Press,
          <year>1996</year>
          ; 728 p.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Wenhua</given-names>
            <surname>Yu</surname>
          </string-name>
          , Raj Mittra, Tao Su, Yongjun Liu,
          <string-name>
            <given-names>Xiaoling</given-names>
            <surname>Yang</surname>
          </string-name>
          .
          <article-title>Parallel finite-difference time-domain method</article-title>
          .
          <source>Artech house Boston</source>
          , London,
          <year>2006</year>
          ; 272 p.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>McDonald</surname>
            <given-names>T</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fisher</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rigden</surname>
            <given-names>G</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perala R. Parallel FDTD Electromagnetic Effects</surname>
          </string-name>
          <article-title>Simulation using On-Demand Cloud HPC Resources</article-title>
          .
          <article-title>Electromagnetic Compatibility (EMC)</article-title>
          ,
          <source>IEEE International Symposium</source>
          ,
          <year>2013</year>
          ;
          <fpage>499</fpage>
          -
          <lpage>502</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Ortega</given-names>
            <surname>Dzh</surname>
          </string-name>
          .
          <article-title>Introduction to parallel and vector methods for solving linear systems</article-title>
          . Moscow, Mir,
          <year>1991</year>
          . [in Russia].
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Zhukov</surname>
            <given-names>VT</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Novikov</surname>
            <given-names>ND</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feodoritova</surname>
            <given-names>OB</given-names>
          </string-name>
          .
          <article-title>Parallel multigrid method for solving elliptic equations</article-title>
          .
          <source>Journal Mathematical Models and Computer Simulation</source>
          ,
          <year>2014</year>
          ;
          <volume>26</volume>
          (
          <issue>1</issue>
          ):
          <fpage>55</fpage>
          -
          <lpage>68</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Kochurov</surname>
            <given-names>AV</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Golovashkin</surname>
            <given-names>DL</given-names>
          </string-name>
          .
          <article-title>GPU implementation of Jacobi Method and GaussSeidel Method for Data Arrays that Exceed GPU-dedicated Memory Size</article-title>
          .
          <source>Journal of Mathematical Modelling and Algorithms in Operations Research</source>
          ,
          <year>2015</year>
          ;
          <volume>14</volume>
          (
          <issue>4</issue>
          ):
          <fpage>393</fpage>
          -
          <lpage>405</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Kochurov</surname>
            <given-names>AV</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vorotnikova</surname>
            <given-names>DG</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Golovashkin</surname>
            <given-names>DL</given-names>
          </string-name>
          .
          <article-title>GPU implementation of Jacobi method for data arrays that exceed GPU-dedicated memory size</article-title>
          .
          <source>CEUR Workshop Proceedings</source>
          ,
          <year>2015</year>
          ;
          <volume>1490</volume>
          :
          <fpage>414</fpage>
          -
          <lpage>419</lpage>
          . DOI:
          <volume>10</volume>
          .18287/
          <fpage>1613</fpage>
          -0073-2015-1490-414-419.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Golovashkin</surname>
            <given-names>DL</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kochurov</surname>
            <given-names>AV</given-names>
          </string-name>
          .
          <article-title>Pyramid method for GPU-aided finite difference method</article-title>
          .
          <source>Proceedings of the 13th International Conference on Computational and Mathematical Methods in Science and Engineering</source>
          , CMMSE 2013
          <fpage>24</fpage>
          -
          <lpage>27</lpage>
          June,
          <year>2013</year>
          ;
          <fpage>746</fpage>
          -
          <lpage>756</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13. Lamport L.
          <article-title>The parallel execution of DO loops</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <year>1974</year>
          ;
          <volume>17</volume>
          (
          <issue>2</issue>
          ):
          <fpage>83</fpage>
          -
          <lpage>93</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Valkovski</surname>
            <given-names>VA</given-names>
          </string-name>
          .
          <article-title>Parallel execution cycles. The method of the pyramids</article-title>
          .
          <source>Cybernetics</source>
          ,
          <year>1983</year>
          ;
          <volume>5</volume>
          :
          <fpage>51</fpage>
          -
          <lpage>55</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15. URL: http://hpc.ssau.
          <source>ru/node/6 (data of access 30.03</source>
          .
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>