<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Application of the pyramid method in difference solution d'Alembert equations on graphic processor with the use of Matlab</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>L.V. Yablokova</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>D.L. Golovashkin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Image Processing Systems Institute - Branch of the Federal Scientific Research Centre “Crystallography and Photonics” of Russian Academy of Sciences</institution>
          ,
          <addr-line>151 Molodogvardeyskaya st., 443001, Samara</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Samara National Research University</institution>
          ,
          <addr-line>34 Moskovskoe Shosse, 443086, Samara</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <fpage>68</fpage>
      <lpage>70</lpage>
      <abstract>
        <p>The paper proposes a modification of the pyramid method for constructing algorithms for the difference solution of the d'Alembert equation on a graphics processor in the event of a shortage of video memory. The authors demonstrate the effectiveness of the method on the practical example of dividing the grid area into two sub domains. Acceleration reaches the characteristic for the case of a domain entirely located in the video memory. In the article investigated the effectiveness of using the author's approach depending on the height of the pyramid and showed the boundaries of applicability of the proposed modification.</p>
      </abstract>
      <kwd-group>
        <kwd>The method of the pyramids</kwd>
        <kwd>the grid area</kwd>
        <kwd>difference solution</kwd>
        <kwd>computing acceleration</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        E k 1  2Eik  E k 1
i i
ht2
 c2 Eik1  2hEz2ik  Eik1
(
        <xref ref-type="bibr" rid="ref1">1</xref>
        )
is written with respect to the grid function defined on the domain
Dh  {(tk , zi ) : tk  kht , k  0,1,..., Nt  T ht , zi  iht , i  1,..., N z 
Lz
strength is, c is the speed of light in free space, T and Lz are size of the region in time and space.
h  1} , where E the value of the electric field
z
      </p>
      <p>
        High-Performance Computing / L.V. Yablokova, D.L. Golovashkin
Below is a fragment of the computational procedure for solving (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) in MATLAB, where c1  c2 h2 h2 , c5  2 c ht  .
t z
% Placement of grid functions on two time layers in video memory
      </p>
      <p>E1=zeros(1,Nz,'gpuArray'); E2=zeros(1,Nz,'gpuArray');
for k=1:2:Nt % Passage through time layers of the grid area through one
E1(2:Nz-1)=2*E2(2:Nz-1)-E1(2:Nz-1)+c1*diff(E2,2); % Calculations on the layer k</p>
      <p>
        E1(
        <xref ref-type="bibr" rid="ref2">2</xref>
        )=sin(c5*k); % Hard radiation condition on the layer к
E2(2:Nz-1)=2*E1(2:Nz-1)-E2(2:Nz-1)+c1*diff(E1,2); % Calculations on the layer k+1
      </p>
      <p>
        E2(
        <xref ref-type="bibr" rid="ref2">2</xref>
        )=sin(c5*(k+1)); % Hard radiation condition on the layer k+1
      </p>
      <p>End
E=gather(E2); % Transfer of results to RAM</p>
      <p>For Nz  5107 and Nt  100 the duration of calculations on the Intel Core i7 CPU was 57.08 s., On the GeForce GTX
TITAN X GPU - 5.55 s. (acceleration of 10.29 times) using MATLAB 2015b and the operating system CentOS 7.2. Both used
arrays occupied 762 MB in memory, however, during the computations on the CPU, the memory requirements increased by one
and a half time, on the GPU the memory requirements increased threefold. Apparently, with the implementation of calculations
for the design E1(2:Nz-1)=2*E2(2:Nz-1)-E1(2:Nz-1)+c1*diff(E2,2) on the CPU, the execution of the operation of numerical
differentiation diff(E2,2) resulted in allocating additional memory for two copies of the array E2, and the execution on the GPU
of the design as a completely required separate area of memory for all the arrays involved and for double copying E2. MATLAB
takes about 0.4 gigabytes in RAM, but does not use video memory for its placement. Thus, the execution of the whole algorithm
on the CPU was accompanying by the extraction of 1.52 GB. In addition, the execution of the whole algorithm on the GPU was
accompanying by the extraction of 2.24 GB. Moreover, if the researcher has a video card with 2 GB of memory (like most
popular video processors now) then the organization of calculations on the GPU becomes impossible. In his previous publication
[7], using the difference scheme for the Maxwell equations, the CUDA C software tool the authors proposed to solve this
problem using the method of pyramids.</p>
    </sec>
    <sec id="sec-2">
      <title>3. The pyramid method application</title>
      <p>2 .</p>
      <p>Will this be possible in this case, given that MATLAB is not specialized for working with graphics processors and its tools
in this area are very meager?</p>
      <p>The essence of the mentioned method in the author's modification consists in splitting the grid domain into overlapping sub
regions that fit in the video memory completely, with the subsequent organization of communications in the production of
vector computations in each sub region separately. In this case, transfers from RAM to video and vice versa are performed not
on each time layer, but through a certain number of them h (the height of the pyramid). This, on the one hand, leads to a
reduction in h the number of communications. On the other hand, to the duplication of arithmetic operations in overlapping
fragments of grid subdomains (the form of pyramids is available in the two-dimensional case).</p>
      <p>A fragment of the computational procedure implementing this strategy in the case under consideration is presented below,
where N  Nz
% creating temporary layers on CPU and GPU</p>
      <p>E1=zeros(1,Nz); E2=zeros(1,Nz); E1m=zeros(1,h); E2m=zeros(1,h);
E1g=zeros(1,N+h,'gpuArray'); E2=zeros(1,N+h,'gpuArray');</p>
      <p>for t=1:h:Nt % Passage through the pyramids
% work with the left subdomain</p>
      <p>E1g=gpuArray(E1(1:N+h)); E2g=gpuArray(E2(1:N+h)); % Forwarding CPU ==&gt; GPU
for k=1:2:h % Calculations inside the first pyramids</p>
      <p>E1g(2:N+h-k)=2*E2g(2:N+h-k)-E1g(2:N+h-k)+c1*diff(E2g(1:N+h-k+1),2);</p>
      <p>
        E1g(
        <xref ref-type="bibr" rid="ref2">2</xref>
        )=sin(c5*(t+k-1));
      </p>
      <p>High-Performance Computing / L.V. Yablokova, D.L. Golovashkin
E2g(2:N+h-k-1)=2*E1g(2:N+h-k-1)-E2g(2:N+h-k-1)+c1*diff(E1g(1:N+h-k),2);</p>
      <p>
        E2g(
        <xref ref-type="bibr" rid="ref2">2</xref>
        )=sin(c5*(t+k));
end
E1(2:N-h)=gather(E1g(2:N-h)); E2(2:N-h)=gather(E2g(2:N-h)); % Forwarding GPU ==&gt; CPU
      </p>
      <p>E1m(1:h)=gather(E1g(N-h+1:N)); E2m(1:h)=gather(E2g(N-h+1:N));
% work with the right subdomain</p>
      <p>E1g(1:N+h-1)=gpuArray(E1(N-h+1:Nz)); E2g(1:N+h-1)=gpuArray(E2(N-h+1:Nz));
for t=1:2:h % Calculation by layers of the pyramid</p>
      <p>E1g(t+1:N+h-2)=2*E2g(t+1:N+h-2)-E1g(t+1:N+h-2)+c1*diff(E2g(t:N+h-1),2);</p>
      <p>E2g(t+2:N+h-2)=2*E1g(t+2:N+h-2)-E2g(t+2:N+h-2)+c1*diff(E1g(t+1:N+h-1),2);
end
E1(N+1:Nz-1)=gather(E1g(h+1:N+h-2)); % Forwarding GPU ==&gt; CPU
E2(N+1:Nz-1)=gather(E2g(h+1:N+h-2));</p>
      <p>E1(N-h+1:N)=E1m(1:h); E2(N-h+1:N)=E2m(1:h); % Replenishment of the result
end</p>
      <p>In the course of experiments with the new algorithm, the dependence of the calculation time on the height of the pyramid.
The Table 1 contains the results.</p>
      <p>Height of the pyramid, h</p>
      <p>Computation time (s)</p>
      <p>Acceleration
2
4
10
20
50
53.02
29.58
15.49
10.75
7.9
1.08
1.93
3.7
5.31
7.23</p>
    </sec>
    <sec id="sec-3">
      <title>4. Conclusion</title>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgements References</title>
      <p>Thus, the method of pyramids can be effectively using in arranging calculations for solving difference equations with the help
of MATLAB on graphic processors in the case when arrays storing values of grid functions do not fit into video memory as a
whole. The development of the proposed algorithm for cases of large dimensions will be the next stage of the authors' research in
this direction.</p>
      <p>The research leading to these results has received funding from the Russian Science Foundation grant №16-47-630560-r_a.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Taflove</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hagness</surname>
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Computational Electrodynamics: The Finite-Difference</surname>
            <given-names>Time-Domain</given-names>
          </string-name>
          <string-name>
            <surname>Method</surname>
          </string-name>
          . Boston: Aerotech House Publishers,
          <year>2005</year>
          ; 1006 p.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Elsherbeni</surname>
            <given-names>А</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demir</surname>
            <given-names>V</given-names>
          </string-name>
          .
          <article-title>The Finite-Difference Time-Domain Method for Electromagnetics with MATLAB Simulations</article-title>
          . Scitech Publishing Inc.
          <year>2009</year>
          ; 426 p.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Grigorjev</surname>
            <given-names>IS</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mejlihov</surname>
            <given-names>EZ</given-names>
          </string-name>
          . Physical Values: Reference Book. Moscow: Energoatomizdat Publisher,
          <year>1991</year>
          ; 1232 p.
          <article-title>(in Russian)</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Malysheva</surname>
            <given-names>SA</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Golovashkin</surname>
            <given-names>DL</given-names>
          </string-name>
          .
          <article-title>Realization of the difference solution of the Maxwell equations on graphic processors by the pyramid method</article-title>
          .
          <source>Computer Optics</source>
          <year>2016</year>
          ;
          <volume>40</volume>
          (
          <issue>2</issue>
          ):
          <fpage>179</fpage>
          -
          <lpage>187</lpage>
          . DOI:
          <volume>10</volume>
          .18287/
          <fpage>2412</fpage>
          -6179-2016-40-2-
          <fpage>179</fpage>
          -187.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Kozlova</surname>
            <given-names>ES</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kotlyar</surname>
            <given-names>VV</given-names>
          </string-name>
          .
          <article-title>Simulation of the propagation of a short two-dimensional pulse of light</article-title>
          .
          <source>Computer Optics</source>
          <year>2012</year>
          ;
          <volume>36</volume>
          (
          <issue>2</issue>
          ):
          <fpage>158</fpage>
          -
          <lpage>164</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Vorotnikova</surname>
            <given-names>DG</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Golovashkin</surname>
            <given-names>DL</given-names>
          </string-name>
          .
          <article-title>Difference solution of the wave equation on graphical processors with repeated use of pairwise sums of the differential pattern</article-title>
          .
          <source>Computer Optics</source>
          <year>2017</year>
          ;
          <volume>41</volume>
          (
          <issue>1</issue>
          ):
          <fpage>134</fpage>
          -
          <lpage>138</lpage>
          . DOI:
          <volume>10</volume>
          .18287/
          <fpage>2412</fpage>
          -6179-2017-41-1-
          <fpage>134</fpage>
          -138.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Vorotnikova</surname>
            <given-names>DG</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Golovashkin</surname>
            <given-names>DL</given-names>
          </string-name>
          .
          <article-title>Algorithms with "long" vectors for solving grid equations of explicit difference schemes</article-title>
          .
          <source>Computer Optics</source>
          <year>2015</year>
          ;
          <volume>39</volume>
          (
          <issue>1</issue>
          ):
          <fpage>87</fpage>
          -
          <lpage>93</lpage>
          . DOI:
          <volume>10</volume>
          .18287/
          <fpage>0134</fpage>
          -2452-2015-39-1-
          <fpage>87</fpage>
          -93.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>