Application of the pyramid method in difference solution d’Alembert equations on graphic processor with the use of Matlab L.V. Yablokova1, D.L. Golovashkin1,2 1 Samara National Research University, 34 Moskovskoe Shosse, 443086, Samara, Russia 2 Image Processing Systems Institute – Branch of the Federal Scientific Research Centre “Crystallography and Photonics” of Russian Academy of Sciences, 151 Molodogvardeyskaya st., 443001, Samara, Russia Abstract The paper proposes a modification of the pyramid method for constructing algorithms for the difference solution of the d'Alembert equation on a graphics processor in the event of a shortage of video memory. The authors demonstrate the effectiveness of the method on the practical example of dividing the grid area into two sub domains. Acceleration reaches the characteristic for the case of a domain entirely located in the video memory. In the article investigated the effectiveness of using the author's approach depending on the height of the pyramid and showed the boundaries of applicability of the proposed modification. Keywords: The method of the pyramids; the grid area; difference solution; computing acceleration 1. Introduction The deep interconnectedness of optics and computing technology is due to their mutual influence in the course of which at the turn of the 70-ies and 80-ies of the last century there were two independent branches of science: computer optics associated with the development of numerical methods of calculation and simulation of optical devices on a computer and optical engineering, in which the optical elements are created computing devices. The growth of the relevance of the mentioned industries due to the perfection of the architecture of computers (multithreading, multicore, vectored calculations) and technologies of formation of optical elements (transition from micro to nano-size). The first feature allowed us to use the methods of a rigorous diffraction theory [1] for calculating the nano-sized elements of optical processors, characterized by high computational complexity. Among the numerical methods of the strict theory of diffraction, FDTD [1], deserving high universality (Maxwell's equations describe all wave electromagnetic processes) and the simplicity of understanding (based on the replacement of derivatives by difference relations) deserved wide popularity. The latter circumstance makes it possible to write the computational procedures of the method in a clear form in the popular language of matrix calculations of MATLAB [2] Unfortunately, the software implementation of the FDTD method on modern graphics computing devices that provide faster CPU computation by an order of magnitude is encountered, when using this language, with high demands on the amount of video memory: in the production of calculations, it is necessary to use several times more volume than when Work on the central processor. This circumstance is aggravated traditionally by small video memory sizes (up to 2GB in modern budget video cards) in comparison with operating memory (not less than 4GB, even for office computers). The authors of this publication see the application of the pyramid method as an example of the organization of calculations using the difference scheme Yee [1] on the GPU using CUDA C [3]. 2. Difference solution of the d'Alembert equation (one-dimensional case) on a graphics processor Traditionally, the FDTD method is understood to mean exclusively the difference solution of Maxwell's equations, which is not entirely correct. In the early 80's of the last century [1], the difference solution of the d'Alembert equation was also applied to it, which is still being done [1, 4]. We note that when solving the wave equation the problem of video memory shortage is more acute than for Maxwell's equations because of the necessity of finite-difference approximation of second, not first-order derivatives. However, the decision of the wave equation on the GPU seems more promising due to the high efficiency of vectorization of computational procedures [5]. Outlining the concept of the work, the authors decided to dwell on the one-dimensional equation of d'Alembert, seeking to demonstrate the possibilities of the pyramid method on a simple example. So known [1] the difference scheme for this equation Eik 1  2 Eik  Eik 1 2 Ei 1  2 Ei  Ei 1 k k k c (1) ht2 hz2 is written with respect to the grid function defined on the domain L D  {(tk , zi ) : tk  kht , k  0,1,..., Nt  T h , z  iht , i  1,..., N z  z } , where E the value of the electric field ht i hz  1 strength is, c is the speed of light in free space, T and Lz are size of the region in time and space. 3rd International conference “Information Technology and Nanotechnology 2017” 68 High-Performance Computing / L.V. Yablokova, D.L. Golovashkin Below is a fragment of the computational procedure for solving (1) in MATLAB, where c1  c ht 2 2 hz2 , c5  2 c ht  . % Placement of grid functions on two time layers in video memory E1=zeros(1,Nz,'gpuArray'); E2=zeros(1,Nz,'gpuArray'); for k=1:2:Nt % Passage through time layers of the grid area through one E1(2:Nz-1)=2*E2(2:Nz-1)-E1(2:Nz-1)+c1*diff(E2,2); % Calculations on the layer k E1(2)=sin(c5*k); % Hard radiation condition on the layer к E2(2:Nz-1)=2*E1(2:Nz-1)-E2(2:Nz-1)+c1*diff(E1,2); % Calculations on the layer k+1 E2(2)=sin(c5*(k+1)); % Hard radiation condition on the layer k+1 End E=gather(E2); % Transfer of results to RAM For N z  5 10 and N t  100 the duration of calculations on the Intel Core i7 CPU was 57.08 s., On the GeForce GTX 7 TITAN X GPU - 5.55 s. (acceleration of 10.29 times) using MATLAB 2015b and the operating system CentOS 7.2. Both used arrays occupied 762 MB in memory, however, during the computations on the CPU, the memory requirements increased by one and a half time, on the GPU the memory requirements increased threefold. Apparently, with the implementation of calculations for the design E1(2:Nz-1)=2*E2(2:Nz-1)-E1(2:Nz-1)+c1*diff(E2,2) on the CPU, the execution of the operation of numerical differentiation diff(E2,2) resulted in allocating additional memory for two copies of the array E2, and the execution on the GPU of the design as a completely required separate area of memory for all the arrays involved and for double copying E2. MATLAB takes about 0.4 gigabytes in RAM, but does not use video memory for its placement. Thus, the execution of the whole algorithm on the CPU was accompanying by the extraction of 1.52 GB. In addition, the execution of the whole algorithm on the GPU was accompanying by the extraction of 2.24 GB. Moreover, if the researcher has a video card with 2 GB of memory (like most popular video processors now) then the organization of calculations on the GPU becomes impossible. In his previous publication [7], using the difference scheme for the Maxwell equations, the CUDA C software tool the authors proposed to solve this problem using the method of pyramids. 3. The pyramid method application Will this be possible in this case, given that MATLAB is not specialized for working with graphics processors and its tools in this area are very meager? The essence of the mentioned method in the author's modification consists in splitting the grid domain into overlapping sub regions that fit in the video memory completely, with the subsequent organization of communications in the production of vector computations in each sub region separately. In this case, transfers from RAM to video and vice versa are performed not on each time layer, but through a certain number of them h (the height of the pyramid). This, on the one hand, leads to a reduction in h the number of communications. On the other hand, to the duplication of arithmetic operations in overlapping fragments of grid subdomains (the form of pyramids is available in the two-dimensional case). A fragment of the computational procedure implementing this strategy in the case under consideration is presented below, N where N  z . 2 Fig. 1. The scheme of the algorithm of the pyramids to work with the first domain; (1) there is a message to GPU, (2) there is a calculate to GPU, . (1) there is a message to CPU. % creating temporary layers on CPU and GPU E1=zeros(1,Nz); E2=zeros(1,Nz); E1m=zeros(1,h); E2m=zeros(1,h); E1g=zeros(1,N+h,'gpuArray'); E2=zeros(1,N+h,'gpuArray'); for t=1:h:Nt % Passage through the pyramids % work with the left subdomain E1g=gpuArray(E1(1:N+h)); E2g=gpuArray(E2(1:N+h)); % Forwarding CPU ==> GPU for k=1:2:h % Calculations inside the first pyramids E1g(2:N+h-k)=2*E2g(2:N+h-k)-E1g(2:N+h-k)+c1*diff(E2g(1:N+h-k+1),2); E1g(2)=sin(c5*(t+k-1)); 3rd International conference “Information Technology and Nanotechnology 2017” 69 High-Performance Computing / L.V. Yablokova, D.L. Golovashkin E2g(2:N+h-k-1)=2*E1g(2:N+h-k-1)-E2g(2:N+h-k-1)+c1*diff(E1g(1:N+h-k),2); E2g(2)=sin(c5*(t+k)); end E1(2:N-h)=gather(E1g(2:N-h)); E2(2:N-h)=gather(E2g(2:N-h)); % Forwarding GPU ==> CPU E1m(1:h)=gather(E1g(N-h+1:N)); E2m(1:h)=gather(E2g(N-h+1:N)); % work with the right subdomain E1g(1:N+h-1)=gpuArray(E1(N-h+1:Nz)); E2g(1:N+h-1)=gpuArray(E2(N-h+1:Nz)); for t=1:2:h % Calculation by layers of the pyramid E1g(t+1:N+h-2)=2*E2g(t+1:N+h-2)-E1g(t+1:N+h-2)+c1*diff(E2g(t:N+h-1),2); E2g(t+2:N+h-2)=2*E1g(t+2:N+h-2)-E2g(t+2:N+h-2)+c1*diff(E1g(t+1:N+h-1),2); end E1(N+1:Nz-1)=gather(E1g(h+1:N+h-2)); % Forwarding GPU ==> CPU E2(N+1:Nz-1)=gather(E2g(h+1:N+h-2)); E1(N-h+1:N)=E1m(1:h); E2(N-h+1:N)=E2m(1:h); % Replenishment of the result end In the course of experiments with the new algorithm, the dependence of the calculation time on the height of the pyramid. The Table 1 contains the results. Table 1. The dependence of the calculation duration of the calculation of the height of the pyramid. Height of the pyramid, h Computation time (s) Acceleration 2 53.02 1.08 4 29.58 1.93 10 15.49 3.7 20 10.75 5.31 50 7.9 7.23 4. Conclusion Thus, the method of pyramids can be effectively using in arranging calculations for solving difference equations with the help of MATLAB on graphic processors in the case when arrays storing values of grid functions do not fit into video memory as a whole. The development of the proposed algorithm for cases of large dimensions will be the next stage of the authors' research in this direction. Acknowledgements The research leading to these results has received funding from the Russian Science Foundation grant №16-47-630560-r_a. References [1] Taflove A, Hagness S. Computational Electrodynamics: The Finite-Difference Time-Domain Method. Boston: Aerotech House Publishers, 2005; 1006 p. [2] Elsherbeni А, Demir V. The Finite-Difference Time-Domain Method for Electromagnetics with MATLAB Simulations. Scitech Publishing Inc. 2009; 426 p. [3] Grigorjev IS, Mejlihov EZ. Physical Values: Reference Book. Moscow: Energoatomizdat Publisher, 1991; 1232 p. (in Russian). [4] Malysheva SA, Golovashkin DL. Realization of the difference solution of the Maxwell equations on graphic processors by the pyramid method. Computer Optics 2016; 40(2): 179–187. DOI: 10.18287/2412-6179-2016-40-2- 179-187. [5] Kozlova ES, Kotlyar VV. Simulation of the propagation of a short two-dimensional pulse of light. Computer Optics 2012; 36(2): 158–164. [6] Vorotnikova DG, Golovashkin DL. Difference solution of the wave equation on graphical processors with repeated use of pairwise sums of the differential pattern. Computer Optics 2017; 41(1): 134–138. DOI: 10.18287/2412-6179-2017-41-1-134-138. [7] Vorotnikova DG, Golovashkin DL. Algorithms with "long" vectors for solving grid equations of explicit difference schemes. Computer Optics 2015; 39(1): 87–93. DOI: 10.18287/0134-2452-2015-39-1-87-93. 3rd International conference “Information Technology and Nanotechnology 2017” 70