          Proceedings of the 27th International Symposium Nuclear Electronics and Computing (NEC’2019)
                             Budva, Becici, Montenegro, September 30 – October 4, 2019




  ACCELERATING THE PARTICLE-IN-CELL METHOD OF PLASMA AND PARTICLE BEAM SIMULATION USING CUDA TOOLS
                                          I.S. Kadochnikov1,2
  1 Joint Institute for Nuclear Research, 6 Joliot-Curie St, Dubna, Moscow Region, 141980, Russia
  2 Plekhanov Russian University of Economics, Stremyanny lane, 36, Moscow, 117997, Russia

                                           E-mail: kadivas@jinr.ru


Numerical simulation of charged particle dynamics is required for the development of electron beam
ion sources (EBIS). Special attention should be paid to the formation and evolution of different
instabilities that can manifest in such sources. Understanding these processes will allow the efficiency
of ion sources to be improved. The simulation of two-stream instability emerging between slow ions
and fast electrons is particularly important. To perform the required simulation, the “ef” and “ef_python”
applications are under development. In these applications simulation is performed using the particle-
in-cell (PIC) method, which consists of four main parts: a particle mover, particle-to-grid scatter, a
field solver, and field-to-particle interpolation. The performance of each part depends on the number of
particles as well as on the granularity of the spatial simulation mesh. The field solver usually
accounts for the bulk of the computational cost. This report describes the efforts to improve the
performance of the ef_python application using CuPy, AMGX and PyAMG libraries. With minimal
changes in the source code, CuPy allowed us to port all simulation computations to CUDA. In
addition, to accelerate the field solver, algebraic multigrid methods, implemented by the PyAMG
library on CPU and the AMGX library on GPU, were utilized. Major speed-up was achieved,
especially when running simulations with very high-resolution spatial grids on high-performance
GPUs.

Keywords: plasma simulation, particle-in-cell, CUDA, GPU, CuPy



                                                               Copyright © 2019 for this paper by its authors.
                       Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).








1. Introduction
        Numerical simulation of charged particle dynamics is required for the development of electron
beam ion sources (EBIS) [1]. Special attention should be paid to the formation and evolution of
different instabilities that can manifest in such sources. Understanding these processes can allow the
efficiency of ion sources to be improved. The simulation of two-stream instability emerging between
slow ions and fast electrons is particularly important.
        To perform the required simulation, the “ef” and “ef_python” applications are under development
[2] using the particle-in-cell (PIC) method [3]. Ef is written in C++ for performance reasons, and there
are several initiatives to bring parallel and GPU computing to ef via MPI, OpenMP and CUDA.
Ef_python is written in Python for fast and flexible development. It uses NumPy for simulation, and
this report describes the efforts to improve its performance and adapt it to use graphics accelerators.
        A previous study using a 2-dimensional dynamic particle model as a sample showed that both
OpenCL and CUDA could dramatically improve particle simulation performance compared to
NumPy [4]. For this project we chose to use CUDA as the acceleration backend because of the
availability of high-level libraries, namely CuPy and AMGX.


2. Particle-in-cell
         The idea of the particle-in-cell method is to simulate individual positions and velocities of
particles over small time steps, while using a grid to approximate collective parameters, such as charge
density and current density created by particles. The simulation domain is defined by a cuboid regular
mesh and a time step. As part of the problem definition, ef and ef_python allow defining conducting
volumes with a fixed potential within the simulation domain. Only simple volume shapes, such as a
cylinder or a sphere, are supported right now. The same shapes define particle sources that generate
new particles with a normal velocity distribution within their volumes at each time step. It is possible
to add externally defined electric and magnetic fields.
         Each simulation step in ef_python consists of the following operations: advance particle
positions and momenta, generate new particles, calculate the charge density, compute the electric
potential, and calculate the electric field.
2.1 Grid-to-particle: field interpolation
        To compute the electric field acting on each particle, linear interpolation of the electric field
from the 8 surrounding grid nodes is used. This is not the only possible interpolation kernel. Grid-to-
particle and particle-to-grid kernels are an important area of possible algorithmic optimization to
improve simulation accuracy and performance in this project.
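A minimal NumPy sketch of this first-order (trilinear) grid-to-particle interpolation is shown below. It is only an illustration under simplifying assumptions (a regular grid with the same step h along every axis, all particles strictly inside the domain); the names are illustrative and do not correspond to the actual ef_python API.

import numpy as np

def interpolate_field(field, positions, origin, h):
    # field: (nx, ny, nz, 3) array of field vectors at grid nodes
    # positions: (n_particles, 3) array of particle coordinates
    # origin: coordinates of grid node (0, 0, 0); h: grid step
    s = (positions - origin) / h          # fractional grid indices
    i0 = np.floor(s).astype(int)          # lower corner of the surrounding cell
    t = s - i0                            # local coordinates in [0, 1)
    result = np.zeros_like(positions)
    # accumulate contributions of the 8 surrounding nodes
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                wx = t[:, 0] if dx else 1.0 - t[:, 0]
                wy = t[:, 1] if dy else 1.0 - t[:, 1]
                wz = t[:, 2] if dz else 1.0 - t[:, 2]
                node = field[i0[:, 0] + dx, i0[:, 1] + dy, i0[:, 2] + dz]
                result += (wx * wy * wz)[:, None] * node
    return result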
2.2 Particle mover
        A “particle mover” denotes the simulation component responsible for updating particle
positions and velocities at each time step. The choice of particle mover is important for ensuring
conservation of invariants such as energy. Ef_python uses the well-established second-order explicit
leapfrog method known as the Boris scheme [5].
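The sketch below shows the non-relativistic form of the Boris push for illustration only; it is not the actual ef_python implementation, and the fields E and B are assumed to be already interpolated to the particle positions.

import numpy as np

def boris_push(x, v, q, m, E, B, dt):
    # x, v: (n_particles, 3) positions and velocities (v lags x by dt/2)
    # E, B: (n_particles, 3) fields at the particle positions
    qm = q * dt / (2.0 * m)
    v_minus = v + qm * E                          # first half of the electric kick
    t = qm * B                                    # rotation vector
    s = 2.0 * t / (1.0 + np.sum(t * t, axis=1, keepdims=True))
    v_prime = v_minus + np.cross(v_minus, t)      # magnetic rotation, step 1
    v_plus = v_minus + np.cross(v_prime, s)       # magnetic rotation, step 2
    v_new = v_plus + qm * E                       # second half of the electric kick
    x_new = x + v_new * dt                        # drift with the updated velocity
    return x_new, v_new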
2.3 Particle creation and destruction
         Particles that leave the simulation volume or collide with inner conducting volumes are
absorbed and removed from the simulation. New particles generated by particle sources are added to the
system, and their momentum is initialized half a time step back, as required by the leapfrog particle
mover.
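In the same illustrative notation as above, this half-step shift amounts to one backward electric kick applied to freshly emitted particles before they enter the main loop:

# stagger new particles' velocities by -dt/2 relative to their positions,
# as the leapfrog (Boris) scheme expects
v_new_particles = v_source - 0.5 * dt * (q / m) * E_at_new_particles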







2.4 Particle-to-grid: particle weighting
        The charge of each particle is linearly distributed among the 8 surrounding grid nodes. This first-
order scheme is sometimes called cloud-in-cell. The collective current is not used in ef or ef_python
right now, as the magnetic field created by particles is considered negligible.
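A sketch of this cloud-in-cell deposition in plain NumPy is given below; it mirrors the interpolation sketch above and makes the same simplifying assumptions (regular grid, particles inside the domain), with illustrative names only. np.add.at is used so that contributions of several particles landing on the same node accumulate correctly.

import numpy as np

def deposit_charge(positions, charge, grid_shape, origin, h):
    # positions: (n_particles, 3); charge: scalar or (n_particles,) array
    rho = np.zeros(grid_shape)
    s = (positions - origin) / h
    i0 = np.floor(s).astype(int)          # lower corner of the containing cell
    t = s - i0                            # local coordinates in [0, 1)
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                wx = t[:, 0] if dx else 1.0 - t[:, 0]
                wy = t[:, 1] if dy else 1.0 - t[:, 1]
                wz = t[:, 2] if dz else 1.0 - t[:, 2]
                np.add.at(rho,
                          (i0[:, 0] + dx, i0[:, 1] + dy, i0[:, 2] + dz),
                          charge * wx * wy * wz)
    return rho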
2.5 Field solver
        To simulate the electromagnetic field created by particles, Maxwell’s equations on the grid
can be approximately solved by many algorithms based on 3 main approaches: finite difference
methods (FDM), finite element methods (FEM) and spectral methods. In ef and ef_python, FDM is
used to solve Poisson's equation and compute the electric potential created by the
simulated particles, while considering the effect of conducting volumes within the simulation domain.
The electric field is then simply a numerically computed gradient of the potential on the grid.
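A minimal illustration of this FDM approach is sketched below: a 7-point Laplacian assembled with SciPy sparse matrices, a conjugate-gradient solve of Poisson's equation, and the field as the numerical gradient of the potential. It is a simplified sketch, not the ef_python solver: it assumes zero potential on the outer boundary, ignores inner conducting volumes, and assumes SI units, which need not match the units used in ef_python.

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

def laplacian_3d(n, h):
    # 7-point finite-difference Laplacian on an n x n x n grid with step h
    # and zero-potential (Dirichlet) boundaries, built from 1D operators
    d1 = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n)) / h**2
    eye = sp.identity(n)
    return (sp.kron(sp.kron(d1, eye), eye) +
            sp.kron(sp.kron(eye, d1), eye) +
            sp.kron(sp.kron(eye, eye), d1)).tocsr()

n, h, eps0 = 32, 0.01, 8.854e-12
A = laplacian_3d(n, h)
rho = np.zeros((n, n, n))                 # charge density from the deposition step
phi, info = cg(A, -rho.ravel() / eps0)    # Poisson: laplacian(phi) = -rho / eps0
phi = phi.reshape(n, n, n)
Ex, Ey, Ez = np.gradient(-phi, h)         # E = -grad(phi) on the grid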


3. Simulation performance improvements
        The contribution of each simulation step to the total execution time depends on the simulation
parameters and on implementation details. Profiling tools allowed us to find bottlenecks and improve
performance across different simulation setups. The field solver is usually the dominant component
of the simulation in terms of resource usage.
3.1 Algorithm improvements
         As a result of profiling, several unreasonably slow function implementations were fixed,
including rewriting the FDM equation sparse matrix generator using diagonal SciPy matrix classes,
caching which mesh nodes are inside conducting volumes, delegating field-to-particle interpolation to
SciPy, and rewriting particle-to-field weighting for better NumPy parallelization.
         In order to safely conduct the major code restructuring and reimplementation required for this
project, over 200 unit tests were created, providing total code coverage of 91%.
3.2 Algebraic multigrid solver
        The point of greatest computational complexity in ef_python is the field solver. As such, it was
the first component picked for acceleration. Before these improvements, ef_python used the SciPy
conjugate gradient solver for sparse matrices.

Figure 1. Comparison of the axially symmetric beam shape: an infinite beam predicted analytically and a beam starting from a cylinder computed numerically

        Multigrid methods solve differential equations numerically on a hierarchy of increasingly coarse
grids, which allows the long-wavelength error to be reduced faster [6]. Algebraic multigrid (AMG)
methods construct the multigrid hierarchy directly from the matrix of the linear system, without
explicitly relying on the geometry of the system or the partial differential equations being solved.
Thus, algebraic multigrid methods are easy to apply without complex problem-dependent setup. The
PyAMG library provides many AMG methods for Python, and the AMGX library implements AMG
on CUDA. PyAMGX is a Python wrapper over AMGX, allowing one to run AMGX from Python on
a single GPU only.
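As an illustration of how little problem-specific setup AMG requires, the snippet below builds a PyAMG hierarchy directly from a sparse matrix and solves the system. Smoothed aggregation is just one of the methods PyAMG offers, and the report does not state which configuration was actually used, so this is only a sketch with a stand-in matrix.

import numpy as np
import pyamg

# stand-in for the FDM matrix and right-hand side of the Poisson problem
A = pyamg.gallery.poisson((64, 64, 64), format='csr')
b = np.random.rand(A.shape[0])

ml = pyamg.smoothed_aggregation_solver(A)   # hierarchy built from the matrix alone
residuals = []
x = ml.solve(b, tol=1e-8, residuals=residuals)
print(ml)                                   # summary of the constructed grid hierarchy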







3.3 Using CuPy for GPU acceleration
        CuPy is a library that aims to make it easy to accelerate NumPy-based code on the GPU by
providing drop-in replacements for most NumPy functions and operators. By using CuPy we
accelerated most of the simulation operations in ef_python with minor code changes.
        Grid interpolation was not available in CuPy and had to be rewritten as a custom CUDA kernel.
The gradient function from NumPy that was used to compute the electric field from the potential was
not implemented in CuPy either; it had to be reimplemented for the case of regular grid steps using
CuPy operators.

Figure 2. Comparison of the simulation time for the two-stream instability problem with higher precision of the field solver and a smaller number of particles

        Simulation classes were rewritten to use class injection for convenient runtime NumPy/CuPy
and PyAMG/AMGX selection. This allowed using the selected simulation backend with minimal code
duplication. Only the field solver and spatial mesh classes required explicit implementation-dependent
subclasses.
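The backend-selection idea can be sketched as follows; get_array_module and gradient_regular are illustrative helpers, not the actual ef_python classes, and the gradient is shown along one axis only.

import numpy

try:
    import cupy
except ImportError:
    cupy = None

def get_array_module(use_gpu):
    # pick the array backend once; the rest of the code only refers to xp
    return cupy if (use_gpu and cupy is not None) else numpy

xp = get_array_module(use_gpu=True)

# most NumPy expressions work unchanged on CuPy arrays
positions = xp.zeros((1000000, 3))
velocities = xp.ones_like(positions)
positions += velocities * 1e-6

def gradient_regular(phi, h):
    # central differences in the interior, one-sided at the edges,
    # written with operators available in both NumPy and CuPy
    g = xp.empty_like(phi)
    g[1:-1] = (phi[2:] - phi[:-2]) / (2.0 * h)
    g[0] = (phi[1] - phi[0]) / h
    g[-1] = (phi[-1] - phi[-2]) / h
    return g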


4. Sample simulation
        In addition to unit tests, a sample simulation was regularly performed to detect errors during
development: the simulation of a radially symmetric charged particle beam starting from a cylindrical
emitter. The shape of the beam caused by electrostatic particle repulsion can be predicted analytically.
The comparison between the theoretical shape and the numerical solution is shown in Figure 1.
        The simulation of two-stream instability was performed with ef using OpenMP with 1 and 12
threads to provide a baseline for comparison. Its performance was compared to ef_python using CuPy
and PyAMGX running on Nvidia Tesla K40 and Nvidia Tesla K80 graphics cards. The results of this
comparison are shown in Figure 2 and Figure 3.

Figure 3. Comparison of the simulation time for the two-stream instability problem with higher precision of the field solver and a smaller number of particles


5. Results and acknowledgements
        With minimal changes to the program, CuPy allowed accelerating most of the simulation
operations on GPU through CUDA. In addition, to improve the field solver, algebraic multigrid
methods, implemented by the PyAMG library on CPU and the AMGX library on GPU, were utilized.
Major speed-up was achieved, especially when running simulations with very high-resolution spatial
grids on high-performance GPUs.
        Using multiple GPUs is an important future prospect for the ef_python application. This will
require using MPI for process coordination and data exchange. AMGX already has MPI-based multi-
GPU support, but the PyAMGX interface library will have to be extended to expose it. Another
promising avenue of future research is using OpenCL as an acceleration backend, as it can utilize non-
Nvidia GPUs.
        Profiling and testing were partly performed on the HybriLIT heterogeneous computing
platform [7] in the Laboratory of Information Technologies at JINR.
        The reported study was funded by RFBR according to the research project № 18-32-00239.


References
[1] Donets, E.D., 1976. Review of the JINR Electron Beam Ion Sources. IEEE Transactions on
Nuclear Science 23, 897. https://doi.org/10.1109/TNS.1976.4328375
[2] Ef [WWW Document], n.d. GitHub. URL https://github.com/epicf (accessed 11.11.19).
[3] Grigorʹev, I.N., Vshivkov, V.A., Fedoruk, M.P., 2002. Numerical “particle-in-cell” methods:
theory and applications. VSP, Utrecht; Boston.
[4] Boytsov, A., Kadochnikov, I., Zuev, M., Bulychev, A., Zolotuhin, Y., Getmanov, I., 2018.
Comparison of Python 3 single-GPU parallelization technologies on the example of a charged
particles dynamics simulation problem.
[5] Qin, H., Zhang, S., Xiao, J., Liu, J., Sun, Y., Tang, W.M., 2013. Why is Boris algorithm so good?
Physics of Plasmas 20, 084503. https://doi.org/10.1063/1.4818428
[6] Shapira, Y., 2003. Matrix-Based Multigrid: Theory and Applications. Springer Science &
Business Media.
[7] Adam, Gh., Bashashin, M., Belyakov, D., Kirakosyan, M., Matveev, M., Podgainy, D.,
Sapozhnikova, T., Streltsova, O., Torosyan, Sh., Vala, M., Valova, L., Vorontsov, A., Zaikina, T.,
Zemlyanaya, E., Zuev, M. IT-ecosystem of the HybriLIT heterogeneous platform for high-performance
computing and training of IT-specialists. Selected Papers of the 8th International Conference
“Distributed Computing and Grid-technologies in Science and Education” (GRID 2018), Dubna,
Russia, September 10-14, 2018. CEUR-WS.org/Vol-2267



