<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Proceedings</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.18287/1613-0073-2016-1638</article-id>
      <title-group>
        <article-title>IMPLEMENTATION OF DIFFERENCE SOLUTIONS OF MAXWELL'S EQUATIONS ON THE GPU BY METHOD OF PYRAMID</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>L.V. Yablokova</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>D.L. Golovashkin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Samara National Research University</institution>
          ,
          <addr-line>Samara</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <volume>1638</volume>
      <fpage>469</fpage>
      <lpage>476</lpage>
      <abstract>
        <p>The work is dedicated to the development of the method of pyramids in the annex to the decision of Maxwell's equations in the time domain by means of the finite difference (FDTD) and its implementation on the GPU. This method allows you to remove the restriction on the amount of memory the graphics computing device.</p>
      </abstract>
      <kwd-group>
        <kwd>FDTD</kwd>
        <kwd>Maxwell's equations</kwd>
        <kwd>method of pyramids</kwd>
        <kwd>GPU</kwd>
        <kwd>CUDA</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Difference solution of Maxwell's equations method in the time domain (FDTD) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is
widely used in modern research in the simulation: the propagation of radiation [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
materials with new properties [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], the interaction of radiation with biological objects
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and in many other applications.
      </p>
      <p>Its popularity is due to methodological clarity (based on the replacement of
derivatives by difference quotients), wide scope of application (rigorous theory of
diffraction) and an abundance of ready-made software implementations (many commercial
and open source packages for different architectures and operating systems). The
price for the universality of the method - high demands on processor speed and
amount of RAM. Therefore developed half a century ago, the method has been
actively used in computational practice until now.</p>
      <p>
        The development of computer technology hardware base faced with difficulties
("silicon dead end"). It is not by increasing clock speed of processors. Produces special
architectural solutions. They allow to increase the speed of certain types of
operations. The most notable successes achieved in this direction during the development
of the architecture of graphics processors (GPU) and related software tools (CUDA
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], the OpenCL [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]), allowing multiple accelerate action with vectors and matrices.
From a mathematical point of view, based on the FDTD method is an operation of
subtraction of matrices. Modern implementations FDTD-method on GPU [
        <xref ref-type="bibr" rid="ref7">7, 8</xref>
        ]
characterized by an increase in productivity of 10 times compared to the central
computing processor. However, a limited amount of video memory (2-6 Gb. compared with
4-32 Gb. RAM) does not allow full use of the advantage of GPU performance.
Overcoming this limitation - an urgent task. To solve this problem in this paper, we
propose to apply the pyramid method [
        <xref ref-type="bibr" rid="ref10 ref8 ref9">9,10,11</xref>
        ]. It is based on reducing the intensity
of communication between RAM and VRAM due to duplication of arithmetic
operations.
      </p>
      <p>
        A similar problem applied to other differential equations of the previously
successfully solved in the works [
        <xref ref-type="bibr" rid="ref11 ref12 ref13">12,13,14</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>Yee algorithm</title>
      <p>
        Illustrating the application of the method of the pyramids to the times-difference
solution of Maxwell's equations, the authors settled on the classical Yee scheme for the
three-dimensional case [
        <xref ref-type="bibr" rid="ref14">15</xref>
        ], which is currently the most popular tool for
computational electrodynamics [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The main feature of this scheme is to separate the location
of the nodes of the mesh region for each projection of the electric and magnetic fields,
providing increased order of approximation of the initial differential problem by time
and space.
We considered three-dimensional space of the area of computational experiment D3 (
0  t  T , 0  x  Lx , 0  y  Ly , 0  z  Lz ) the grid structure is superimposed on the
area D3h . In the nodes { (tmd , xia , y jb , zk c ) : tmd  (m  d )ht , m  0,1, ,
M  Mt , (M  T / ht ) , xia  (i  a)hx , i  0,1, , I  I x ,  I  Lx / hx  ,
y jb  ( j  b)hy
,
j  0,1, , J  J y,
(
      </p>
      <p>J  Ly / hy
),
zk c  (k  c)hz
md
k  0,1,.., K  K , K  Lz / hz } of grid defined projection of the field Aia, jb,kc . The
z
values of coefficients for different grid projections of electric and magnetic fields are
presented in table 1. The empty cells correspond to zero value of the coefficient. The
difference scheme for this region is represented by the equations:
1 
2i, j,k  0</p>
      <p>.  0 ,  0 it electric and magnetic constants,  i, j,k , i, j,k it grid
Where, C
ai, j,k

, C
bi, j,k

, D
ai, j,k

of the relative electrical and of the magnetic permittivity of linear isotropic
environment without dispersion,  i, j,k ,  *i, j,k it grid-specific of the electric and of the
magnetic conductivity.</p>
    </sec>
    <sec id="sec-3">
      <title>Software implementation of FDTD-method</title>
      <p>
        Consider the propagation of a plane wave in the three-dimensional region of space,
bounded absorbing layers CPML [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. We consider the Maxwell equations and
difference scheme Yee for them [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The solution of difference equations is a means of
modeling.
      </p>
      <p>Define the grid area the size of 256256256 samples in space. The width of the
absorbing layer select equals 11 counts. It should be 467 Mb to accommodate such a
task in the memory card.</p>
      <p>
        We apply the method to a one-dimensional pyramid decomposition [
        <xref ref-type="bibr" rid="ref11">12</xref>
        ] and carry out
the partition of the original grid area of one chosen direction. The following is the
author's code to CUDA C to Yee algorithm in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]:
__host__ void raschetGPU_pyramid_1d()
{
...
      </p>
      <p>// The number of launched blocks
int SIZEx = ((Imax-1)%BLOCK_DIMx==0) ?
(Imax-1)/BLOCK_DIMx : (Imax-1)/BLOCK_DIMx+1;
int SIZEy = ((Jmax-1)%BLOCK_DIMy==0) ?
(Jmax-1)/BLOCK_DIMy : (Jmax-1)/BLOCK_DIMy+1;
dim3 gridSize = dim3(SIZEx,SIZEy, 1);
dim3 blockSize = dim3(BLOCK_DIMx,BLOCK_DIMy,
BLOCK_DIMz);
create_temp_fields();
// Copying of overlapping regions into a temporary
// buffer
int countPyramidsK = ceil((Kmax-1) /
(pyramidBaseLengthK + 0.0f));
// The number of pyramids
int currentTime = 1;// The current step of time
int timeOfPass =((nmax)%PASS==0)?
nmax/PASS : nmax/PASS+1;
// The number of passes
// The loop of passes
for (int pass = 1; pass &lt;= PASS; pass++)
{
int currentBaseLengthK; // the width of the upper base
//of the current pyramid
int leftOffsetK; // the width of the left overlap of
// the bottom base of the pyramid
int rightOffsetK; // the width of the right overlap of
//the bottom base of the pyramid
int fullBaseLengthK;// the width of the bottom base of
//the current pyramid
int leftPyramidBorderK; // the index of the left
// border of the base of the pyramid
int rightPyramidBorderK;// the index of the right
// border of the base of the pyramid
int durationOfTimePass = timeOfPass * pass; // the
// time step corresponding to the end of the current
// passage
// Copying into a temporary buffer initial field
//values to pass
copyFields(...);
// The loop for pyramid
for(int pyramidIdK = 0; pyramidIdK &lt; countPyramidsK;
pyramidIdK++)
{
int startPyramidBasePositionK = pyramidIdK *
pyramid BaseLengthK; // The starting index of the
//upper base of the pyramid in the grid
getOffsets(pyramidIdK, pyramidBaseLengthK,
&amp;currentBaseLengthK, &amp;leftOffsetK, &amp;rightOffsetK);
getBorders(currentBaseLengthK, leftOffsetK,
rightOffsetK, &amp;fullBaseLengthK, &amp;leftPyramidBorderK,
&amp;rightPyramidBorderK);
// Copying fields on a graphics card
create_arrays_on_GPU(Imax,Jmax,fullBaseLengthK);
copy_temp_arrays_to_GPU(0,0,leftPyramidBorderK,
Imax,Jmax,fullBaseLengthK);
copy_constant_to_device();
// The loop of time inside passage
for (int n = currentTime; n &lt;= durationOfTimePass
&amp;&amp; n &lt;=nmax; n++)
{
// Kernel for conversion of magnetic field
//components on the GPU
kernelH_pyramid1&lt;&lt;&lt;gridSize,blockSize&gt;&gt;&gt;(...);
cudaEventSynchronize(syncEvent);
// Kernel for conversion of electric field
//components on the GPU
kenelE_pyramid1&lt;&lt;&lt;gridSize,blockSize&gt;&gt;&gt;(...);
cudaEventSynchronize(syncEvent);
}
// Copying data from video memory
copy_arrays_from_GPU_with_offset(currentBaseLengthK,
tartPyramidBasePositionK, leftOffsetK);
delete_arrays_on_GPU();
}
currentTime = durationOfTimePass +1;
}
Grid subdomain at the lower base of the pyramid (containing the initial data for each
pass) intersect. Therefore, these areas are in a temporary buffer (implemented
procedure create_temp_fields) must be copied to the beginning of the next pass. Then the
values necessary for the processing of the adjacent pyramid will not be lost.</p>
    </sec>
    <sec id="sec-4">
      <title>Computational experiments</title>
      <p>Numerical calculations to determine the duration of the experiments were carried out
on the video card NVIDIA GeForce GTX 660 Ti, and the CPU Intel Core 2 Duo
E8500. The study was carried out in the Ubuntu operating system. Exploring the
conditions of efficiency of the method of the pyramids will conduct a series of
computational experiments. Limit the amount of video memory of 245 MB, which will build
up to 3 pyramids with different base width and height. The purpose of the
experiments - determination dependence on the acceleration of calculations of these
parameters.
Minimum of time calculation is obtained by achieving a balance between reducing the
number of communications and increasing the number of arithmetic operations. This
is obtained (Fig. 1) using a pyramid base width 64 and height 20 in the total number
of time layers equal 256. With further increase of the height of the pyramid
calculation time begins to increase due to increased volume of duplicate data. This is
illustrated by the U-shaped dependence of the computation time of the height of the
pyramid.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>Implementation of FDTD method with the use of the pyramids allows to overcome
the limitation on the amount of memory of the GPU. This allows you to use the GPU
advantage in speed. Pyramid method is based on reducing the intensity of
communication between main and video memory by duplicating arithmetic operations.
In applying the method of the pyramids is important to strike a balance between the
volume of communication and memory volume. To do this, you must find the
optimum width and height of the base of the pyramid.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgment</title>
      <p>This work was supported by grant RFBR 14-07-00291-а.
8. FDTD solver. URL: http://www.acceleware.com/fdtd-solvers.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Taflove</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hagness</surname>
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Computational</surname>
          </string-name>
          <article-title>Electrodynamics: The Finite-Difference TimeDomain Method</article-title>
          . Boston: Arthech House Publishers (Thrid Edition),
          <year>2005</year>
          ; 1006 p.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Kotlyar</surname>
            <given-names>VV</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stafeev</surname>
            <given-names>SS</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feldman AYu</surname>
          </string-name>
          .
          <article-title>Photonic nanojets formed by square microsteps</article-title>
          .
          <source>Computer Optics</source>
          ,
          <year>2014</year>
          ;
          <volume>38</volume>
          (
          <issue>1</issue>
          ):
          <fpage>72</fpage>
          -
          <lpage>80</lpage>
          [in Russia].
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Tiranov</surname>
            <given-names>AD</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalachev</surname>
            <given-names>AA</given-names>
          </string-name>
          .
          <article-title>Collective spontaneous emission in a waveguide with close to zero refractive index</article-title>
          .
          <source>Bulletin of the Russian Academy of Sciences</source>
          ,
          <year>2014</year>
          ;
          <volume>78</volume>
          (
          <issue>3</issue>
          ):
          <fpage>271</fpage>
          -
          <lpage>275</lpage>
          [In Russian].
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Perov</surname>
            <given-names>SYu</given-names>
          </string-name>
          , Bogacheva EV.
          <article-title>Theoretical and experimental dosimetry in the evaluation of biological effects of electromagnetic fields of portable radio stations. Message 1</article-title>
          . Flat phantoms.
          <source>Radiation Biology. Radioecology</source>
          ,
          <year>2014</year>
          ;
          <volume>54</volume>
          (
          <issue>1</issue>
          ):
          <fpage>57</fpage>
          -
          <lpage>61</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Boreskov</surname>
            <given-names>AV</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harlamov</surname>
            <given-names>AA</given-names>
          </string-name>
          .
          <article-title>The basics of working with CUDA technology</article-title>
          . Moscow: “DMK Press” Publisher,
          <year>2010</year>
          ; 232 p. [in Russian].
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>OpenCL -</surname>
          </string-name>
          <article-title>The open standard for parallel programming of heterogeneous systems</article-title>
          . URL: http://www.khronos.org/opencl/.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>B-CALM - Belgium California</surname>
          </string-name>
          Light Machine. URL: http://b-calm.sourceforge.net/
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          9.
          <string-name>
            <surname>Lamport</surname>
            <given-names>L</given-names>
          </string-name>
          .
          <article-title>The parallel execution of DO loops</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <year>1974</year>
          ;
          <volume>17</volume>
          (
          <issue>2</issue>
          ):
          <fpage>83</fpage>
          -
          <lpage>93</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          10.
          <string-name>
            <surname>Valkovski</surname>
            <given-names>VA</given-names>
          </string-name>
          .
          <article-title>Parallel execution cycles. The method of the pyramids</article-title>
          .
          <source>Cybernetics</source>
          ,
          <year>1983</year>
          ;
          <volume>5</volume>
          :
          <fpage>51</fpage>
          -
          <lpage>55</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          11.
          <string-name>
            <surname>Golovoshkin</surname>
            <given-names>DL</given-names>
          </string-name>
          .
          <article-title>The solution of grid equations on graphics processing devices. The method of the pyramids</article-title>
          .
          <article-title>Modern problems of applied mathematics and mechanics: theory, experiment and practice. International conference dedicated to the 90th anniversary from the birthday of academician NN</article-title>
          . Yanenko, Novosibirsk, Russia, 30 May - 4 June 2011 - Novosibirsk,
          <source>ICT SB RAS</source>
          ,
          <year>2011</year>
          , N 0321101160, http://conf.nsc.ru/files/conferences /niknik-90/fulltext/37858/ 46076/kochurov_final.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          12.
          <string-name>
            <surname>Kochurov</surname>
            <given-names>AV</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Golovashkin</surname>
            <given-names>DL</given-names>
          </string-name>
          .
          <article-title>GPU implementation of Jacobi Method and Gauss-Seidel Method for Data Arrays that Exceed GPU-dedicated Memory Size</article-title>
          .
          <source>Journal of Mathematical Modelling and Algorithms in Operations Research</source>
          ,
          <year>2015</year>
          ;
          <volume>14</volume>
          (
          <issue>4</issue>
          ):
          <fpage>393</fpage>
          -
          <lpage>405</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          13.
          <string-name>
            <surname>Kochurov</surname>
            <given-names>AV</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vorotnikova</surname>
            <given-names>DG</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Golovashkin</surname>
            <given-names>DL</given-names>
          </string-name>
          .
          <article-title>GPU implementation of Jacobi method for data arrays that exceed GPU-dedicated memory size</article-title>
          .
          <source>CEUR Workshop Proceedings</source>
          ,
          <year>2015</year>
          ;
          <volume>1490</volume>
          :
          <fpage>414</fpage>
          -
          <lpage>419</lpage>
          . DOI:
          <volume>10</volume>
          .18287/
          <fpage>1613</fpage>
          -0073-2015-1490-414-419.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          14.
          <string-name>
            <surname>Golovashkin</surname>
            <given-names>DL</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kochurov</surname>
            <given-names>AV</given-names>
          </string-name>
          .
          <article-title>Pyramid method for GPU-aided finite difference method</article-title>
          .
          <source>Proceedings of the 13th International Conference on Computational and Mathematical Methods in Science and Engineering</source>
          , CMMSE 2013
          <fpage>24</fpage>
          -
          <lpage>27</lpage>
          June,
          <year>2013</year>
          ;
          <fpage>746</fpage>
          -
          <lpage>756</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          15.
          <string-name>
            <surname>Yee</surname>
            <given-names>KS</given-names>
          </string-name>
          .
          <article-title>Numerical solution of initial boundary value problems involving Maxwell's equations in isotropic media</article-title>
          .
          <source>IEEE Trans. Antennas Propag</source>
          .,
          <year>1966</year>
          ;
          <volume>14</volume>
          :
          <fpage>302</fpage>
          -
          <lpage>307</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>