=Paper=
{{Paper
|id=Vol-1638/Paper57
|storemode=property
|title=Implementation of difference solutions of Maxwell's equations on the GPU by method of pyramid 
|pdfUrl=https://ceur-ws.org/Vol-1638/Paper57.pdf
|volume=Vol-1638
|authors=Lyudmila V. Yablokova,Dmitriy L. Golovashkin
}}
==Implementation of difference solutions of Maxwell's equations on the GPU by method of pyramid ==
<pdf width="1500px">https://ceur-ws.org/Vol-1638/Paper57.pdf</pdf>
<pre>
High-Performance Computing


 IMPLEMENTATION OF DIFFERENCE SOLUTIONS
   OF MAXWELL’S EQUATIONS ON THE GPU BY
           METHOD OF PYRAMID

                          L.V. Yablokova, D.L. Golovashkin

                   Samara National Research University, Samara, Russia


       Abstract. The work is dedicated to the development of the method of pyramids
       in the annex to the decision of Maxwell's equations in the time domain by
       means of the finite difference (FDTD) and its implementation on the GPU. This
       method allows you to remove the restriction on the amount of memory the
       graphics computing device.

       Keywords: FDTD, Maxwell's equations, method of pyramids, GPU, CUDA.


       Citation: Yablokova LV, Golovashkin DL. Implementation of difference solu-
       tions of Maxwell’s equations on the GPU by method of pyramid. CEUR Work-
       shop Proceedings, 2016; 1638: 469-476. DOI: 10.18287/1613-0073-2016-1638-
       469-476


Introduction
Difference solution of Maxwell's equations method in the time domain (FDTD) [1] is
widely used in modern research in the simulation: the propagation of radiation [2],
materials with new properties [3], the interaction of radiation with biological objects
[4] and in many other applications.
Its popularity is due to methodological clarity (based on the replacement of deriva-
tives by difference quotients), wide scope of application (rigorous theory of diffrac-
tion) and an abundance of ready-made software implementations (many commercial
and open source packages for different architectures and operating systems). The
price for the universality of the method - high demands on processor speed and
amount of RAM. Therefore developed half a century ago, the method has been active-
ly used in computational practice until now.
The development of computer technology hardware base faced with difficulties ("sili-
con dead end"). It is not by increasing clock speed of processors. Produces special
architectural solutions. They allow to increase the speed of certain types of opera-
tions. The most notable successes achieved in this direction during the development
of the architecture of graphics processors (GPU) and related software tools (CUDA
[5], the OpenCL [6]), allowing multiple accelerate action with vectors and matrices.


Information Technology and Nanotechnology (ITNT-2016)                                  469
High-Performance Computing                                               Yablokova LV, Golovashkin DL…


From a mathematical point of view, based on the FDTD method is an operation of
subtraction of matrices. Modern implementations FDTD-method on GPU [7, 8] char-
acterized by an increase in productivity of 10 times compared to the central compu-
ting processor. However, a limited amount of video memory (2-6 Gb. compared with
4-32 Gb. RAM) does not allow full use of the advantage of GPU performance.
Overcoming this limitation - an urgent task. To solve this problem in this paper, we
propose to apply the pyramid method [9,10,11]. It is based on reducing the intensity
of communication between RAM and VRAM due to duplication of arithmetic opera-
tions.
A similar problem applied to other differential equations of the previously successful-
ly solved in the works [12,13,14].


Yee algorithm
Illustrating the application of the method of the pyramids to the times-difference solu-
tion of Maxwell's equations, the authors settled on the classical Yee scheme for the
three-dimensional case [15], which is currently the most popular tool for computa-
tional electrodynamics [1]. The main feature of this scheme is to separate the location
of the nodes of the mesh region for each projection of the electric and magnetic fields,
providing increased order of approximation of the initial differential problem by time
and space.

       Table 1. The coefficients are grid to projections of the electric and magnetic fields

  A             d            Mt          a           Ix          b          Jy             c          Kz

  Ex                                    0,5          1

  Ey                                                           0,5              1

  Ez                                                                                       0,5         1

  Hx          0,5            1                                 0,5              1          0,5         1

  Hy          0,5            1          0,5          1                                     0,5         1

  Hz          0,5            1          0,5          1         0,5              1

We considered three-dimensional space of the area of computational experiment D3 (
0  t  T , 0  x  Lx , 0  y  Ly , 0  z  Lz ) the grid structure is superimposed on the

area D3h . In the nodes { (tm d , xi  a , y j b , zk c ) : tm  d  ( m  d ) ht , m  0,1,                      ,
M  M t , ( M  T / ht ) ,               xi  a  (i  a )hx ,       i  0,1,       , I  Ix ,    I  Lx / hx  ,
y j  b  ( j  b ) hy   ,        j  0,1,    , J  J y,   (     J  L y / hy         ),    zk  c  ( k  c ) hz        ,


Information Technology and Nanotechnology (ITNT-2016)                                                               470
High-Performance Computing                                                                                           Yablokova LV, Golovashkin DL…


k  0,1,.., K  K z , K  Lz / hz } of grid defined projection of the field Aima,dj b,k c . The
values of coefficients for different grid projections of electric and magnetic fields are pre-
sented in table 1. The empty cells correspond to zero value of the coefficient. The differ-
ence scheme for this region is represented by the equations:

                                                                                             Ezm              Ezmi , j ,k 0.5 E ymi , j 0.5,k 1  E ymi , j 0.5,k 
H xmi ,j 0.5         Da                       H xmi ,j 0.5         Db                  i , j 1,k 0.5                                                          
           0.5,k 0.5        i , j 0.5,k 0.5              0.5,k 0.5    i , j 0.5,k 0.5                 hy                                     hz                  
                                                                                                                                                                       

                                                                                              Exm                    Exmi0.5, j ,k     Ezmi1, j ,k 0.5  Ezmi , j ,k 0.5 
H ymi0.5,            Dai0.5, j ,k 0.5 H ymi0.5,            Dbi0.5, j ,k 0.5                                                                                         
                                                                                                   i 0.5, j ,k 1
        0.5                                        0.5
                                                                                                                                        
            j ,k 0.5                                  j ,k 0.5
                                                                                                                hz                                        hx                  
                                                                                                                                                                              

                                                                                       E ym                 E ymi , j 0.5,k Exmi0.5, j 1,k  Exmi0.5, j ,k 
H zmi0.5,0.5j 0.5,k  Dai0.5, j 0.5,k H zmi0.5,0.5j 0.5,k  Dbi0.5, j 0.5,k                                                                           
                                                                                            i 1, j 0.5,k
                                                                                                                              
                                                                                                          hx                                  hy                
                                                                                                                                                                

                                                                 H zm  0.5           H zmi0.5,0.5j 0.5,k H ymi0.5,
                                                                                                                       0.5
                                                                                                                                      H ymi0.5,
                                                                                                                                               0.5           
Exmi0.5,
       1
                 C             E m
                                                C                  i 0.5, j 0.5,k
                                                                                                              
                                                                                                                           j ,k 0.5               j ,k 0.5
                                                                                                                                                             
           j ,k    ai 0.5, j ,k xi 0.5, j ,k    bi 0.5, j ,k
                                                                                     hy                                             hz                      
                                                                                                                                                            

                                                                      H xm  0.5            H xmi ,j 0.5          H zmi0.5,0.5j 0.5,k  H zmi0.5,0.5j 0.5,k 
E ymi ,j 10.5,k  Cai , j 0.5,k E ymi , j 0.5,k  Cbi , j 0.5,k                                                                                                 
                                                                          i , j 0.5,k 0.5              0.5,k 0.5
                                                                                                                    
                                                                                           hz                                               hx                       
                                                                                                                                                                     

                                                                         H ym  0.5             H ymi0.5,
                                                                                                          0.5
                                                                                                                          H xmi ,j 0.5         H xmi ,j 0.5        
Ezmi ,j ,1k 0.5  Cai , j ,k 0.5 E zmi , j ,k 0.5  Cbi , j ,k 0.5                                                                                                
                                                                              i 0.5, j ,k 0.5               j ,k 0.5              0.5,k 0.5              0.5,k 0.5
                                                                                                                        
                                                                                               hx                                              hy                      
                                                                                                                                                                       

                                   i , j ,k ht                                                      ht                                            *i , j , k ht
                                   1                                                                                                             1
                                  2 i , j ,k  0                                               i, j ,k  0                                      2 i , j , k  0
Where,            Cai , j ,k                                     ,      Cb                                                ,     Dai , j ,k                                  ,
                                   i , j ,k ht                               i , j ,k            i , j ,k ht                                     *i , j ,k ht
                               1                                                            1                                                1
                                  2 i , j ,k  0                                               2 i , j ,k  0                                   2 i , j , k  0
                          ht
               i , j , k  0
Dbi , j ,k                      .  0 ,  0 it electric and magnetic constants,  i , j , k , i , j ,k it grid
                 *i , j , k ht
             1
                2 i , j , k  0
of the relative electrical and of the magnetic permittivity of linear isotropic environ-
ment without dispersion,  i , j ,k ,  *i , j ,k it grid-specific of the electric and of the mag-
netic conductivity.


Information Technology and Nanotechnology (ITNT-2016)                                                                                                                     471
High-Performance Computing                              Yablokova LV, Golovashkin DL…


Software implementation of FDTD-method
Consider the propagation of a plane wave in the three-dimensional region of space,
bounded absorbing layers CPML [1]. We consider the Maxwell equations and differ-
ence scheme Yee for them [1]. The solution of difference equations is a means of
modeling.
Define the grid area the size of 256  256  256 samples in space. The width of the
absorbing layer select equals 11 counts. It should be 467 Mb to accommodate such a
task in the memory card.
We apply the method to a one-dimensional pyramid decomposition [12] and carry out
the partition of the original grid area of one chosen direction. The following is the
author's code to CUDA C to Yee algorithm in [1]:
__host__ void raschetGPU_pyramid_1d()
{
  ...
   // The number of launched blocks
   int SIZEx = ((Imax-1)%BLOCK_DIMx==0) ?
   (Imax-1)/BLOCK_DIMx : (Imax-1)/BLOCK_DIMx+1;
   int SIZEy = ((Jmax-1)%BLOCK_DIMy==0) ?
   (Jmax-1)/BLOCK_DIMy : (Jmax-1)/BLOCK_DIMy+1;
   dim3 gridSize = dim3(SIZEx,SIZEy, 1);
   dim3 blockSize = dim3(BLOCK_DIMx,BLOCK_DIMy,
   BLOCK_DIMz);
   create_temp_fields();
   // Copying of overlapping regions into a temporary
   // buffer
   int countPyramidsK = ceil((Kmax-1) /
   (pyramidBaseLengthK + 0.0f));
   // The number of pyramids
   int currentTime = 1;// The current step of time
   int timeOfPass =((nmax)%PASS==0)?
   nmax/PASS : nmax/PASS+1;
   // The number of passes
   // The loop of passes
   for (int pass = 1; pass <= PASS; pass++)
   {
     int currentBaseLengthK; // the width of the upper base
     //of the current pyramid
     int leftOffsetK; // the width of the left overlap of
     // the bottom base of the pyramid
     int rightOffsetK; // the width of the right overlap of
     //the bottom base of the pyramid
     int fullBaseLengthK;// the width of the bottom base of
     //the current pyramid
     int leftPyramidBorderK; // the index of the left
     // border of the base of the pyramid
     int rightPyramidBorderK;// the index of the right
     // border of the base of the pyramid


Information Technology and Nanotechnology (ITNT-2016)                            472
High-Performance Computing                              Yablokova LV, Golovashkin DL…


      int durationOfTimePass = timeOfPass * pass; // the
      // time step corresponding to the end of the current
      // passage

      // Copying into a temporary buffer initial field
      //values to pass
      copyFields(...);

      // The loop for pyramid
      for(int pyramidIdK = 0; pyramidIdK < countPyramidsK;
      pyramidIdK++)
      {
        int startPyramidBasePositionK = pyramidIdK *
        pyramid BaseLengthK; // The starting index of the
        //upper base of the pyramid in the grid
        getOffsets(pyramidIdK, pyramidBaseLengthK,
        &currentBaseLengthK, &leftOffsetK, &rightOffsetK);
        getBorders(currentBaseLengthK, leftOffsetK,
        rightOffsetK, &fullBaseLengthK, &leftPyramidBorderK,
        &rightPyramidBorderK);

          // Copying fields on a graphics card
          create_arrays_on_GPU(Imax,Jmax,fullBaseLengthK);
          copy_temp_arrays_to_GPU(0,0,leftPyramidBorderK,
          Imax,Jmax,fullBaseLengthK);
          copy_constant_to_device();
          // The loop of time inside passage
          for (int n = currentTime; n <= durationOfTimePass
          && n <=nmax; n++)
          {
            // Kernel for conversion of magnetic field
            //components on the GPU
            kernelH_pyramid1<<<gridSize,blockSize>>>(...);
            cudaEventSynchronize(syncEvent);
            // Kernel for conversion of electric field
            //components on the GPU
            kenelE_pyramid1<<<gridSize,blockSize>>>(...);
            cudaEventSynchronize(syncEvent);
          }

      // Copying data from video memory
      copy_arrays_from_GPU_with_offset(currentBaseLengthK,
      tartPyramidBasePositionK, leftOffsetK);
      delete_arrays_on_GPU();
      }
      currentTime = durationOfTimePass +1;
     }
    ...
}


Information Technology and Nanotechnology (ITNT-2016)                            473
High-Performance Computing                                  Yablokova LV, Golovashkin DL…


Grid subdomain at the lower base of the pyramid (containing the initial data for each
pass) intersect. Therefore, these areas are in a temporary buffer (implemented proce-
dure create_temp_fields) must be copied to the beginning of the next pass. Then the
values necessary for the processing of the adjacent pyramid will not be lost.


Computational experiments
Numerical calculations to determine the duration of the experiments were carried out
on the video card NVIDIA GeForce GTX 660 Ti, and the CPU Intel Core 2 Duo
E8500. The study was carried out in the Ubuntu operating system. Exploring the con-
ditions of efficiency of the method of the pyramids will conduct a series of computa-
tional experiments. Limit the amount of video memory of 245 MB, which will build
up to 3 pyramids with different base width and height. The purpose of the experi-
ments - determination dependence on the acceleration of calculations of these parame-
ters.
Fig. 1 shows the results of measuring the duration of computation.


Fig. 1. The duration of the calculation y (ms) as a function of height x (number of samples) of
                         the pyramid for varying the width of the base

Minimum of time calculation is obtained by achieving a balance between reducing the
number of communications and increasing the number of arithmetic operations. This
is obtained (Fig. 1) using a pyramid base width 64 and height 20 in the total number
of time layers equal 256. With further increase of the height of the pyramid calcula-
tion time begins to increase due to increased volume of duplicate data. This is illus-
trated by the U-shaped dependence of the computation time of the height of the pyra-
mid.


Information Technology and Nanotechnology (ITNT-2016)                                       474
High-Performance Computing                               Yablokova LV, Golovashkin DL…


Conclusion
Implementation of FDTD method with the use of the pyramids allows to overcome
the limitation on the amount of memory of the GPU. This allows you to use the GPU
advantage in speed. Pyramid method is based on reducing the intensity of communi-
cation between main and video memory by duplicating arithmetic operations.
In applying the method of the pyramids is important to strike a balance between the
volume of communication and memory volume. To do this, you must find the opti-
mum width and height of the base of the pyramid.


Acknowledgment

This work was supported by grant RFBR 14-07-00291-а.


References
 1. Taflove A, Hagness S. Computational Electrodynamics: The Finite-Difference Time-
    Domain Method. Boston: Arthech House Publishers (Thrid Edition), 2005; 1006 p.
 2. Kotlyar VV, Stafeev SS, Feldman AYu. Photonic nanojets formed by square microsteps.
    Computer Optics, 2014; 38(1): 72-80 [in Russia].
 3. Tiranov AD, Kalachev AA. Collective spontaneous emission in a waveguide with close to
    zero refractive index. Bulletin of the Russian Academy of Sciences, 2014; 78(3): 271-275
    [In Russian].
 4. Perov SYu, Bogacheva EV. Theoretical and experimental dosimetry in the evaluation of
    biological effects of electromagnetic fields of portable radio stations. Message 1. Flat
    phantoms. Radiation Biology. Radioecology, 2014; 54(1): 57-61.
 5. Boreskov AV, Harlamov AA. The basics of working with CUDA technology. Moscow:
    “DMK Press” Publisher, 2010; 232 p. [in Russian].
 6. OpenCL – The open standard for parallel programming of heterogeneous systems.
    URL: http://www.khronos.org/opencl/.
 7. B-CALM - Belgium California Light Machine. URL: http://b-calm.sourceforge.net/
 8. FDTD solver. URL: http://www.acceleware.com/fdtd-solvers.
 9. Lamport L. The parallel execution of DO loops. Communications of the ACM, 1974;
    17(2): 83-93.
10. Valkovski VA. Parallel execution cycles. The method of the pyramids. Cybernetics, 1983;
    5: 51-55.
11. Golovoshkin DL. The solution of grid equations on graphics processing devices. The
    method of the pyramids. Modern problems of applied mathematics and mechanics: theory,
    experiment and practice. International conference dedicated to the 90th anniversary from
    the birthday of academician NN. Yanenko, Novosibirsk, Russia, 30 May – 4 June 2011 -
    Novosibirsk, ICT SB RAS, 2011, N 0321101160, http://conf.nsc.ru/files/conferences
    /niknik-90/fulltext/37858/ 46076/kochurov_final.pdf.


Information Technology and Nanotechnology (ITNT-2016)                                   475
High-Performance Computing                               Yablokova LV, Golovashkin DL…


12. Kochurov AV, Golovashkin DL. GPU implementation of Jacobi Method and Gauss-Seidel
    Method for Data Arrays that Exceed GPU-dedicated Memory Size. Journal of Mathemati-
    cal Modelling and Algorithms in Operations Research, 2015; 14(4): 393-405.
13. Kochurov AV, Vorotnikova DG, Golovashkin DL. GPU implementation of Jacobi method
    for data arrays that exceed GPU-dedicated memory size. CEUR Workshop Proceedings,
    2015; 1490: 414-419. DOI: 10.18287/1613-0073-2015-1490-414-419.
14. Golovashkin DL, Kochurov AV. Pyramid method for GPU-aided finite difference method.
    Proceedings of the 13th International Conference on Computational and Mathematical
    Methods in Science and Engineering, CMMSE 2013 24–27 June, 2013; 746-756.
15. Yee KS. Numerical solution of initial boundary value problems involving Maxwell’s equa-
    tions in isotropic media. IEEE Trans. Antennas Propag., 1966; 14: 302-307.


Information Technology and Nanotechnology (ITNT-2016)                                  476

</pre>