=Paper=
{{Paper
|id=Vol-1787/376-380-paper-65
|storemode=property
|title=Calculation of ground states of few-body nuclei using NVIDIA CUDA technology
|pdfUrl=https://ceur-ws.org/Vol-1787/376-380-paper-65.pdf
|volume=Vol-1787
|authors=Mikhail Naumenko,Vyacheslav Samarin
}}
==Calculation of ground states of few-body nuclei using NVIDIA CUDA technology==
<pdf width="1500px">https://ceur-ws.org/Vol-1787/376-380-paper-65.pdf</pdf>
<pre>
        Calculation of ground states of few-body nuclei using NVIDIA CUDA technology
                                 M. A. Naumenko(1,a), V. V. Samarin(1,2)
 (1) Flerov Laboratory of Nuclear Reactions, Joint Institute for Nuclear Research, 6 Joliot-Curie st., Moscow reg., Dubna, 141980, Russian Federation
 (2) Dubna State University, 19 Universitetskaya st., Moscow reg., Dubna, 141982, Russian Federation
 (a) E-mail: anaumenko@jinr.ru


     This work investigates the use of modern parallel computing solutions to speed up calculations of the ground states of few-body nuclei by Feynman's continual integrals method. These calculations may require long computation times, particularly for systems with many degrees of freedom. We present results of general-purpose computing on graphics processing units (GPGPU) using NVIDIA CUDA technology. An algorithm that performs the calculations directly on the GPU was developed and implemented in C++. Calculations were performed on the NVIDIA Tesla K40 accelerator installed within the heterogeneous cluster of the Laboratory of Information Technologies, Joint Institute for Nuclear Research, Dubna. The energy and the square modulus of the wave function of the ground states of several few-body nuclei were calculated. The results show that the use of GPGPU significantly increases the speed of calculations.

        Keywords: NVIDIA CUDA, Feynman’s continual integrals method, few-body nuclei

The work was supported by grant 15-07-07673-a of the Russian Foundation for Basic Research (RFBR).


                                                      © 2016 Mikhail Alekseevich Naumenko, Vyacheslav Vladimirovich Samarin


1. Introduction
      In this work, modern parallel computing solutions are used to speed up the calculations of ground states of few-body nuclei by Feynman's continual integrals method [Samarin, 2015; Samarin, Naumenko, 2016; Naumenko, Samarin, 2016]. An algorithm that performs the calculations directly on the GPU was developed and implemented in C++. The energy and the square modulus of the wave function of the ground states of several few-body nuclei were calculated using NVIDIA CUDA technology [NVIDIA CUDA; Sanders, Kandrot, 2011]. The results show that the GPU is very effective for these calculations. For 3He, the ground-state energy is obtained up to about 390 times faster than with single-threaded CPU code.


2. Theory and computing

     The energy $E_0$ and the square modulus of the wave function $|\Psi_0|^2$ of the ground state of a system of few particles with coordinates $q$ may be calculated by Feynman's continual integrals method. This uses the propagator $K_E(q,\tau;q,0)$ in Euclidean time $\tau$ [Shuryak, Zhirov, 1984]:

    $$K_E(q,\tau;q,0) = \sum_n |\Psi_n(q)|^2 \exp\left(-\frac{E_n\tau}{\hbar}\right) + \int_{E \ge E_{cont}} |\Psi_E(q)|^2 \exp\left(-\frac{E\tau}{\hbar}\right) g(E)\,dE. \tag{1}$$

Here $E_n$ and $\Psi_n$ are the energies and wave functions of the discrete spectrum, and $\Psi_E$ are the wave functions of the continuous spectrum $E \ge E_{cont}$. The function $g(E)$ is the density of states in the continuous spectrum. For a system with a discrete spectrum and finite motion of particles, the square modulus of the ground-state wave function may be found in the limit $\tau \to \infty$, together with the energy $E_0$:

    $$K_E(q,\tau;q,0) \to |\Psi_0(q)|^2 \exp\left(-\frac{E_0\tau}{\hbar}\right), \quad \tau \to \infty. \tag{2}$$
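Taking the logarithm of (2) shows where the straight line used later in the energy calculation comes from. The short derivation below follows only from (1) and (2). Here $E_1$ and $\Psi_1$ denote the lowest excited level; this notation is introduced only for illustration.

```latex
\ln K_E(q,\tau;q,0) = \ln|\Psi_0(q)|^2 - \frac{E_0\tau}{\hbar}
  + \ln\left[1 + \frac{|\Psi_1(q)|^2}{|\Psi_0(q)|^2}\, e^{-(E_1-E_0)\tau/\hbar} + \dots\right]

E_0 \approx -\hbar\,\frac{\ln K_E(q,\tau_2;q,0) - \ln K_E(q,\tau_1;q,0)}{\tau_2-\tau_1},
\qquad \tau_{lin} \le \tau_1 < \tau_2
```

The bracket tends to 1 at a rate set by the gap $E_1 - E_0$. Beyond some finite time $\tau_{lin}$, the logarithm of the propagator is therefore linear in $\tau$ with slope $-E_0/\hbar$, and $K_E$ is proportional to $|\Psi_0(q)|^2$.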
                                                                      
The theoretical approach is described in detail in [Naumenko, Samarin, 2016]. For a fixed $\tau$, $K_E(q,\tau;q,0)$ was calculated by parallel evaluation of the exponentials

    $$F = \exp\left(-b_0 \sum_{k=1}^{N} V(q_k)\right) \tag{3}$$

for every random trajectory $q_k = f(q,k)$, where $N = \tau/\Delta\tau$ is the number of time steps $\Delta\tau$. The same nucleon-nucleon interaction potentials were used to compute $V(q_k)$ for all the studied nuclei.
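Equation (3) can be illustrated with a minimal serial C++ sketch. It assumes the potential energies $V(q_k)$ along each path are already available. It also assumes, as in the standard Feynman-Kac estimator, that $K_E(q,\tau;q,0)$ is proportional to the average of $F$ over trajectories. The function names are hypothetical. In the GPU code each weight would presumably be evaluated by its own thread.

```cpp
#include <cmath>
#include <vector>

// Weight F = exp(-b0 * sum_{k=1}^{N} V(q_k)) of one discretized trajectory, eq. (3).
// potential_values holds V(q_1), ..., V(q_N), already evaluated along the path.
double trajectory_weight(const std::vector<double>& potential_values, double b0) {
    double sum_v = 0.0;  // sum of the potential over the N time slices
    for (double v : potential_values) sum_v += v;
    return std::exp(-b0 * sum_v);
}

// Average of F over independent trajectories (one inner vector per trajectory).
// Assumption: K_E(q, tau; q, 0) is proportional to this average (Feynman-Kac form).
double mean_weight(const std::vector<std::vector<double>>& paths, double b0) {
    if (paths.empty()) return 0.0;
    double total = 0.0;
    for (const std::vector<double>& p : paths) total += trajectory_weight(p, b0);
    return total / static_cast<double>(paths.size());
}
```

Only one exponential per trajectory is needed, so the work splits naturally into independent, equally sized tasks.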
     The Monte Carlo algorithm for the numerical calculations was developed and implemented in C++ using NVIDIA CUDA technology. The integration method does not require any additional integration libraries. The calculation consisted of three steps:
     1) $K_E(q,\tau;q,0)$ was calculated at a set of multidimensional points $q$, and its maximum (i.e., the maximum of $|\Psi_0|^2$) was found.
     2) The point $q_0$ corresponding to this maximum was fixed. $K_E(q_0,\tau;q_0,0)$ was then calculated for several increasing values of $\tau$, and the region where $\ln K_E(q_0,\tau;q_0,0)$ depends linearly on $\tau$ was found.
     3) The time $\tau_{lin}$ at the beginning of this linear region was fixed. $K_E(q,\tau_{lin};q,0)$, i.e., $|\Psi_0(q)|^2$ up to a constant factor, was then calculated at all points $q$ of the required region.
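The three steps can be sketched in serial C++. The sketch assumes a one-dimensional coordinate $q$ and a callable that returns an estimate of $K_E(q,\tau;q,0)$. The paper does not state how the linear region is detected. The rule used here is our own assumption: two successive finite-difference slopes of $\ln K_E$ must agree within a tolerance. All names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical Monte Carlo estimate of K_E(q, tau; q, 0) for a 1D coordinate q.
using Propagator = std::function<double(double q, double tau)>;

// Step 1: index of the point q where K_E is largest (approximately the maximum of |Psi_0|^2).
std::size_t find_maximum(const Propagator& K, const std::vector<double>& qs, double tau) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < qs.size(); ++i)
        if (K(qs[i], tau) > K(qs[best], tau)) best = i;
    return best;
}

// Step 2: first tau from which two successive finite-difference slopes of ln K_E
// agree within tol (assumed detection rule). taus must be increasing and non-empty.
double find_linear_start(const Propagator& K, double q0, const std::vector<double>& taus, double tol) {
    auto slope = [&](std::size_t a, std::size_t b) {
        return (std::log(K(q0, taus[b])) - std::log(K(q0, taus[a]))) / (taus[b] - taus[a]);
    };
    for (std::size_t j = 0; j + 2 < taus.size(); ++j)
        if (std::fabs(slope(j, j + 1) - slope(j + 1, j + 2)) < tol) return taus[j];
    return taus.back();  // no linear region found within the sampled times
}

// Step 3: K_E(q, tau_lin; q, 0) at every point, proportional to |Psi_0(q)|^2.
std::vector<double> density_at(const Propagator& K, const std::vector<double>& qs, double tau_lin) {
    std::vector<double> out;
    out.reserve(qs.size());
    for (double q : qs) out.push_back(K(q, tau_lin));
    return out;
}
```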


      The general scheme of the calculation of the ground-state energy is shown in Fig. 1. The propagator is calculated using $L$ sequential kernel launches. Each kernel launch simulates $n$ random trajectories evolving from Euclidean time $\tau = 0$ to $\tau_j$, where $j = 1, \ldots, L$. All trajectories have $N_j = \tau_j/\Delta\tau$ time steps. Each starts at the same point $q^{(0)}$ and, at the moment $\tau_j$, returns to that point, according to the chosen probability distribution.
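One way to generate such closed trajectories is a Brownian-bridge construction from normally distributed steps. Each path starts at $q^{(0)}$ and returns to it after $N_j$ steps. The sketch below is a serial CPU illustration for a single coordinate. It assumes units with $\hbar = m = 1$ and $b_0 = \Delta\tau/\hbar$. The paper's own "chosen probability distribution" and its CUDA kernel may differ, and all names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>
#include <random>
#include <vector>

// Closed path of N steps: starts at q0 (tau = 0) and returns to q0 (tau = N * dtau).
// Assumed units hbar = m = 1, so free-motion steps are Gaussian with variance dtau.
std::vector<double> closed_path(double q0, int N, double dtau, std::mt19937_64& rng) {
    if (N < 1) return {q0};
    std::normal_distribution<double> step(0.0, std::sqrt(dtau));
    std::vector<double> walk(N + 1, 0.0);  // free random walk w_0 = 0, ..., w_N
    for (int k = 1; k <= N; ++k) walk[k] = walk[k - 1] + step(rng);
    std::vector<double> path(N + 1);
    for (int k = 0; k <= N; ++k)  // bridge: remove the linear drift so the end returns to q0
        path[k] = q0 + walk[k] - (static_cast<double>(k) / N) * walk[N];
    return path;
}

// One "launch": average of exp(-b0 * sum_{k=1}^{N} V(q_k)) over n closed paths,
// with b0 = dtau (assumption: b0 = dtau / hbar and hbar = 1).
template <typename Potential>
double launch_mean_weight(Potential V, double q0, int N, double dtau, int n, std::uint64_t seed) {
    if (n < 1) return 0.0;
    std::mt19937_64 rng(seed);
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        std::vector<double> p = closed_path(q0, N, dtau, rng);
        double sum_v = 0.0;
        for (int k = 1; k <= N; ++k) sum_v += V(p[k]);
        sum += std::exp(-dtau * sum_v);
    }
    return sum / n;
}
```

Each of the $L$ launches in Fig. 1 would correspond to one call with $N = N_j$. On the GPU, the $n$ iterations of the outer loop are what the threads would share.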


                        Fig. 1. The scheme of calculation of the ground state energy $E_0$

     All threads in a given kernel launch finish at approximately the same time, so the scheme remains efficient despite the overhead associated with each kernel launch. Moreover, the number of kernel launches $L$ required to calculate the ground-state energy usually does not exceed 100.
     Starting from a certain time $\tau_{lin}$, the obtained values of the logarithm of the propagator, $b_0^{-1} \ln K_E$, lie on a straight line whose slope determines the ground-state energy. The time $\tau_{lin}$ is then used in the calculation of the square modulus of the wave function.
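The energy extraction can be sketched as an ordinary least-squares fit. The sketch assumes that $\ln K_E(q_0,\tau;q_0,0)$ has already been estimated at several $\tau \ge \tau_{lin}$. By (2), the fitted slope against $\tau$ equals $-E_0/\hbar$. Fitting against $\tau$ directly, rather than the paper's $b_0^{-1}\ln K_E$ axis, is our choice, and the function names are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Ordinary least-squares slope of y against x.
double ls_slope(const std::vector<double>& x, const std::vector<double>& y) {
    if (x.size() != y.size() || x.size() < 2)
        throw std::invalid_argument("need at least two (x, y) pairs");
    const double n = static_cast<double>(x.size());
    double mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) { mx += x[i]; my += y[i]; }
    mx /= n;
    my /= n;
    double sxy = 0.0, sxx = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sxy += (x[i] - mx) * (y[i] - my);
        sxx += (x[i] - mx) * (x[i] - mx);
    }
    if (sxx == 0.0) throw std::invalid_argument("x values must not all be equal");
    return sxy / sxx;
}

// From eq. (2): ln K_E(q0, tau; q0, 0) ~ ln|Psi_0(q0)|^2 - E_0 * tau / hbar for tau >= tau_lin,
// so the fitted slope is -E_0 / hbar. The units of hbar are the caller's choice.
double ground_state_energy(const std::vector<double>& tau, const std::vector<double>& ln_k, double hbar) {
    return -hbar * ls_slope(tau, ln_k);
}
```

A rising $\ln K_E$, as for the bound nuclei studied here, gives a negative $E_0$.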
       The general scheme of the calculation of the square modulus of the wave function is shown in Fig. 2. Similarly, the calculation is performed using $M$ sequential kernel launches. Each kernel launch simulates $n$ random trajectories evolving from Euclidean time $\tau = 0$ to the time $\tau_{lin}$ determined in the calculation of the ground-state energy. All trajectories start at the same point $q^{(s)}$ and, at the moment $\tau_{lin}$, return to that point, according to the chosen probability distribution. Here $s = 1, \ldots, M$, where $M$ is the total number of points at which the square modulus of the wave function must be calculated.
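Each of the $M$ kernel launches needs its own point $q^{(s)}$. The paper does not specify the grid layout. If a regular rectangular grid is assumed, a simple option is to decode a linear index into per-dimension indices. The `Grid` type is hypothetical, and $s$ is 0-based here, whereas the text counts from 1.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical regular grid: counts[d] points along dimension d, starting at
// origin[d] with spacing step[d]. The total number of points M is the product of counts.
struct Grid {
    std::vector<double> origin, step;
    std::vector<std::size_t> counts;

    std::size_t size() const {
        std::size_t m = 1;
        for (std::size_t c : counts) m *= c;
        return m;
    }

    // Point q^(s) for a 0-based linear index s (the first dimension varies fastest).
    std::vector<double> point(std::size_t s) const {
        std::vector<double> q(counts.size());
        for (std::size_t d = 0; d < counts.size(); ++d) {
            q[d] = origin[d] + step[d] * static_cast<double>(s % counts[d]);
            s /= counts[d];
        }
        return q;
    }
};
```

Because each point is computed independently from its index alone, no precomputed grid has to be kept in memory.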
      One of the benefits of the approach is that the calculation can easily be resumed later. For example, the square modulus of the wave function may first be calculated with a large spatial step to obtain the general features of the probability distribution. New intermediate points may later be calculated and combined with those obtained previously. This is very useful because the square modulus of the wave function is generally much more time-consuming to calculate than the energy, since it requires calculations at many points of the multidimensional space.
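The sketch below shows how previously computed points can be reused when the spatial step is refined. It assumes a two-dimensional region, integer keys on the finest grid, and an in-memory `std::map`; in practice, results would presumably be saved to disk between runs. All names are hypothetical.

```cpp
#include <functional>
#include <map>
#include <utility>

// Hypothetical 2D example: values of |Psi_0|^2 (up to a factor) keyed by integer
// indices (i, j) on the finest grid with spacing h, so that q = (i*h, j*h).
using Key = std::pair<int, int>;
using Cache = std::map<Key, double>;

// Calculates the points with indices in [-n, n] that are multiples of `stride`
// (n should be a multiple of stride) and are not yet in the cache.
// Returns how many new points were calculated.
int compute_missing(Cache& cache, int n, int stride, double h,
                    const std::function<double(double, double)>& density) {
    int computed = 0;
    for (int i = -n; i <= n; i += stride)
        for (int j = -n; j <= n; j += stride) {
            Key key{i, j};
            if (cache.count(key)) continue;  // already calculated in an earlier run
            cache[key] = density(i * h, j * h);
            ++computed;
        }
    return computed;
}
```

For example, a coarse pass with stride 2 on a 9 × 9 fine grid calculates 25 points. A later pass with stride 1 adds only the remaining 56.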
      An important feature of the algorithm that allows effective use of graphics processors is its low memory consumption during the calculation: there is no need to prepare a grid of values and store it in memory.
      The cuRAND random number generator was used to obtain normally distributed random numbers. Following the recommendations of the cuRAND developers, each experiment was assigned a unique seed. Within an experiment, each thread of computation was assigned a unique sequence number. All threads in all kernel launches were given the same seed, and the sequence numbers were assigned in monotonically increasing order.
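The seeding rule can be made concrete with a small host-side helper. The helper assumes one thread per trajectory, $n$ threads per launch, and the simple formula `launch * n + thread_id`; the paper states only that the numbers were unique and monotonically increasing. On the device, such a number would be passed as the `subsequence` argument of the cuRAND device function `curand_init(seed, subsequence, offset, &state)`, with the same `seed` for every thread. The helper name is hypothetical.

```cpp
#include <cstdint>

// Hypothetical cuRAND subsequence number for thread `thread_id` (0 <= thread_id < n)
// in 0-based kernel launch `launch`. It is unique within the experiment and grows
// monotonically from one launch to the next.
std::uint64_t sequence_number(std::uint64_t launch, std::uint64_t n, std::uint64_t thread_id) {
    return launch * n + thread_id;
}
```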


                 Fig. 2. The scheme of calculation of the square modulus of the wave function

      Calculations were performed on the NVIDIA Tesla K40 accelerator installed within the heterogeneous cluster [Heterogeneous Cluster] of the Laboratory of Information Technologies, Joint Institute for Nuclear Research, Dubna. The code was compiled with NVIDIA CUDA 7.5 for compute capability 3.5. Calculations were performed in single precision.
      The ground-state energy is negative. Therefore, only the first ($n = 0$) term in formula (1) increases with increasing $\tau$. The energies of the excited states are positive, so the other terms decrease with increasing $\tau$. The slope of the linear regression in the linear region gives the ground-state energy $E_0$.
      The theoretical binding energies $E_b = -E_0$ are listed in Tab. 1 together with the experimental values taken from the NRV knowledge base [Zagrebaev, Denikin, Karpov, Alekseev, Naumenko, Rachkov, Samarin, Saiko]. The theoretical values are reasonably close to the experimental ones, although close agreement was not the goal of this work. The calculated binding energy of 3H is larger than that of 3He, as in the experimental data.

 Tab. 1. Comparison of theoretical and experimental binding energies for the ground states of the studied nuclei
        Atomic nucleus   Theoretical value, MeV   Experimental value, MeV
        2H               1.17 ± 1                 2.225
        3H               9.29 ± 1                 8.482
        3He              6.86 ± 1                 7.718
        4He              26.95 ± 1                28.296
     The code implementing Feynman's continual integrals method was initially written for the CPU. Tab. 2 compares the calculation time of the ground-state energy for 3He on an Intel Core i5 3470 (double precision) and on an NVIDIA Tesla K40 (single precision) for different statistics. The CPU code used only one thread, double precision, and a different random number generator. Even allowing for this, the difference is striking: the GPU is about 232 times faster for $n = 10^5$ and about 391 times faster for $n = 10^6$. With NVIDIA CUDA technology, we can therefore increase the statistics and the accuracy of the calculations.
     Tab. 3 compares the calculation time of the square modulus of the wave function for the ground state of 3He on the Intel Core i5 3470 and the NVIDIA Tesla K40. The statistics are $n = 10^6$ at every point, with a total of 43200 points. The CPU value of ~177 days is an estimate based on the performance gain obtained in the ground-state energy calculation. Thus, besides the performance gain, NVIDIA CUDA technology may in certain cases enable calculations that were previously infeasible.

            Tab. 2. Comparison of the calculation time of the ground state energy for 3He nucleus
      Statistics,   Intel Core i5 3470                  NVIDIA Tesla K40         Performance gain,
          n         (1 thread, double precision), sec   (single precision), sec  times
        10^5        ~1854                               ~8                       ~232
        10^6        ~18377                              ~47                      ~391
        5·10^6      −                                   ~221                     −
        10^7        −                                   ~439                     −

 Tab. 3. Comparison of the calculation time of the square modulus of the wave function for the ground state of 3He nucleus
      Statistics,   Intel Core i5 3470                          NVIDIA Tesla K40
          n         (1 thread, double precision, estimation)    (single precision)
        10^6        ~177 days                                   ~11 hours
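The entries of Tab. 2 and Tab. 3 are related by simple arithmetic, sketched below with hypothetical helper names. Assume the ~391-fold gain measured for $n = 10^6$ carries over to the wave-function calculation. Then 11 GPU hours correspond to about 11 × 391 / 24 ≈ 179 days on the CPU, which is consistent with the ~177-day estimate, since the GPU time is itself rounded.

```cpp
// Speed-up of the GPU over the single-threaded CPU code (ratio of run times).
double speedup(double cpu_seconds, double gpu_seconds) {
    return cpu_seconds / gpu_seconds;
}

// CPU time in days extrapolated from a GPU time in hours and a measured speed-up.
double cpu_days_estimate(double gpu_hours, double gain) {
    return gpu_hours * gain / 24.0;
}
```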


3. Conclusion
      In this work, modern parallel computing solutions were applied to speed up the calculations of ground states of few-body nuclei by Feynman's continual integrals method. An algorithm that performs the calculations directly on the GPU was developed and implemented in C++. The method was applied to nuclei consisting of nucleons, but it may also be applied to cluster nuclei. The results show that the use of GPGPU significantly increases the speed of calculations. This allows us to increase the statistics and the accuracy of the calculations and to reduce the spatial step in calculations of wave functions. It also greatly simplifies debugging and testing. In certain cases, NVIDIA CUDA enables calculations that were previously infeasible.


References
Heterogeneous Cluster of LIT, JINR [Electronic resource]: http://hybrilit.jinr.ru/.
Naumenko M. A., Samarin V. V. Application of CUDA technology to calculation of ground states of few-body nuclei by Feynman's continual integrals method // Supercomputing Frontiers and Innovations. — 2016. — Vol. 3, No. 2. — P. 80–95.
NVIDIA CUDA [Electronic resource]: http://developer.nvidia.com/cuda-zone/.
Samarin V. V. Quantum Description of Coupling to Neutron-Rearrangement Channels in Fusion Reactions near the Coulomb Barrier // Phys. At. Nucl. — 2015. — Vol. 78, No. 7. — P. 861–872.
Samarin V. V., Naumenko M. A. Study of Ground States of 3,4,6He Nuclides by Feynman's Continual Integrals Method // Bull. Russ. Acad. Sci. Phys. — 2016. — Vol. 80, No. 3. — P. 283–289.
Sanders J., Kandrot E. CUDA by Example: An Introduction to General-Purpose GPU Programming. — New York: Addison-Wesley, 2011.
Shuryak E. V., Zhirov O. V. Testing Monte Carlo Methods for Path Integrals in Some Quantum Mechanical Problems // Nucl. Phys. B. — 1984. — Vol. 242. — P. 393–406.
Zagrebaev V. I., Denikin A. S., Karpov A. V., Alekseev A. P., Naumenko M. A., Rachkov V. A., Samarin V. V., Saiko V. V. NRV web knowledge base on low-energy nuclear physics [Electronic resource]: http://nrv.jinr.ru/.



</pre>