CPU and GPU Parallel Kramers-Klein Calculations Bogdan I. Iaparov, Anton N. Karmatsky, and Alexander S. Moskvin Ural Federal University, Yekaterinburg, Russia bogdan.iaparov@urfu.ru Abstract. The Kramers-Klein equation is a special Fokker-Planck equa- tion describing the brownian motion in a potential. This equation is used in different areas of biology, chemistry and physics. In this paper we present the comparison between numerical and analytical solutions in case of a linear force and usage of multi-core CPU and GPU for solv- ing Kramers-Klein equation with different boundary conditions. It was shown that the GPU calculations under reflecting boundary conditions are slower as compared with other boundary conditions because in the first case we need to call two kernels sequentially. Keywords: Kramers-Klein equation · parallel algorithm · CUDA · OpenMP 1 Introduction Brownian motion(BM) is the random motion of a particle in a fluid. Hereafter we’re discussing the one-dimensional case. BM can be mathematically described in two ways. The first one is a Langevin equation, a stochastic differential equa- tion for the particle’s coordinate: mẍ + mγ ẋ + mf 0 (x) = mΓ (t) < Γ (t) >= 0 (1) 0 0 < Γ (t)Γ (t ) >= 2γkB T δ(t − t) here m is the mass of the particle, γ is the damping constant, f 0 (x) is the stationary potential force per mass due to the potential −f (x), Γ (t) – delta- correlated Gaussian random process. It’s clearly seen from (1) that x(t) and v(t) = ẋ(t) are stochastic quantities due to Γ (t) in the equation. Therefore we may ask for probability density function (PDF)(ρ(x, v, t)) of finding particle in a coordinate x with a velocity v at time t. An equation for PDF in case of BM in an external potential field is called Kramers-Klein (KK) equation: ∂ρ ∂ρ ∂ρ ∂vρ γkB T ∂ 2 ρ = −v + f 0 (x) +γ + (2) ∂t ∂x ∂v ∂v m ∂v 2 This equation was used first to describe reaction kinetics [1], but later it turned out that it can be used in very different fields of physics such as relaxation of dipoles, superionic conductors and so on [2]. In the paper we address parallel numerical solutions of (2). 18 Bogdan I. Iaparov, Anton N. Karmatsky, and Alexander S. Moskvin 2 Numerical Solution Despite intensive analytical studies of KK equation [2], analytic solution can be found only in special cases. This makes numerical solutions as a significant resource for studying stochastic processes. Difficulties of studying KK equation numerically come from the fact, that this partial differential equation is of a mixed type. The problem behaves like a parabolic equation in the v-direction and like a hyperbolic equation in the x-direction. Moreover, we have different types of hyperbolic equations in the x-directions for v > 0 and v < 0. We used a numerical method, computational domain and the uniform grid as in [3].Central differences scheme were used for the discretization in v - direction of first and second derivatives, for x - direction the upwind scheme discretization was used, for t - direction forward Euler scheme was used. It is an explicit method that efficiently deals with different types of boundary conditions and potential fields. Convergence and stability of the method is discussed in Ref.[3]. The accuracy of the method is defined as O(∆x + ∆v 2 + ∆t). 3 Boundary Conditions Say we have a rectangular 2D computational domain(in x and v direction, xL ≤ x ≤ xR ,vL ≤ v ≤ vR ) with uniform grid. In the v-direction the problem doesn’t have boundaries and we assume that computational domain is so large that we can assume the BC as follows: ρ(x, vL , t) = ρ(x, vR , t) = 0 (3) for all x, t. In the x-direction we consider three types of BC: – Dirichlet. In analogy with v-direction we assume that computational domain is so large that particle can’t be in boundaries of domain: ρ(xL , v, t) = ρ(xR , v, t) = 0 (4) for all v, t. – Absorbing. In this case we only have physical boundaries defined at xL for v < 0 and at xR for v > 0: ρ(xL , v, t) = 0, v > 0 (5) ρ(xR , v, t) = 0, v < 0 The physical meaning of the absorbing BC is that there is no particle flux from the boundary into the interior. – Reflecting. In this case we have the BC as follows: ρ(xL , v, t) = ρ(xL , −v, t) (6) ρ(xR , v, t) = ρ(xR , −v, t) The physical meaning of the reflecting BC is that no particle would be lost at the boundary and all particles will be reflected back into the interior. CPU and GPU Parallel Kramers-Klein Calculations 19 4 Parallelization Parallel CPU calculations were implemented using OpenMP, the GPU calcula- tions were done using CUDA. The numerical method we used [3] is explicit, the PDF inside the domain is fully calculated from PDF at previous time, as a con- sequence, calculation of PDF in each interior’s node is independent of another nodes at the same time: ρ(x, v, ti+1 ) = ρ(x, v, ρ(x, v, ti )) (7) hence the method can be easily parallelized using “stencil” parallel pattern. In case of CPU parallelization, the boundaries were calculated separately and without parallelization because the highest performance was shown in this case. In case of GPU parallelization the boundaries were calculated in one CUDA kernel with interior in case of the Dirichlet and absorbing BC. In case of the reflecting BC two CUDA kernels run sequentially: the first one calclulates interior and half of boundaries in x-direction with v > 0 and the second one calculates half of the boundaries in x-direction with v < 0. We did it because in this case PDF on the boundary depends on PDF in another points at the same time(see (6)). As a result, GPU calculations with reflecting BC are slower than with the Dirichet or absorbing ones. In CUDA calculations only global memory was used. 5 Results Program for CPU was written in C++ and compiled with gcc (g++ -O3) com- piler. GPU implementation is also in C++ and CUDA part was compiled with proprietary nvcc compiler while g++ has been used for the host code. CPU cal- culations were done on PC(4x Intel Core i5 3.2 GHz, NVIDIA GTX 750), GPU calculations were done on PC and Uran supercomputer(NVIDIA Tesla M2050) from Institute of Mathematics and Mechanics UrB RAS. Calculations were done using double precision floating point numbers. First, we compared results of the numerical solution with the analytic one in case of a linear force (parabolic potential). The analytic solution in this case was taken from [2] and is a two-variable Gaussian distribution of coordinate and velocity with the time-dependent means and variances. The error was calculated as a discrete norm:  0.5 X ε(t) = ∆x∆v (ρ(xi , vj , t) − ρa (xi , vj , t))2  (8) i,j where ρa is an analytic solution. In Figure 1 the numerical and analytic PDFs are shown at t = 12 (dimensionless units). Number of nodes in the x-direction is 2049 and in the v-direction is 129. It is clearly seen that numerical solution does approximate the analytic one quite accurately. 20 Bogdan I. Iaparov, Anton N. Karmatsky, and Alexander S. Moskvin 0.09 0.09 0.05 0.08 0.05 0.08 0.07 0.07 0.00 0.06 0.00 0.06 0.05 0.05 0.04 0.04  0.05  0.05 0.03 0.03 0.02 0.02  0.10 0.01 0.01 0.00 4 4 2 2 4 0 4 0 2 2 v 2 2 v 0 0 x 2 4 4 x 2 4 4 (a) Numerical solution (b) Analytic solution Fig. 1: PDF in case of potential 0.18x2 , γ = 3. 1a is a numerical and 1b is an analytic solution at t = 12. At t = 0, the particle has zero velocity and x = 0. ε(12) = 8.3e − 4. All variables are in dimensionless units. Then we looked on the performance of parallel CPU and GPU calculations comparing with the serial CPU ones with different BC. Calculations were done on an uniform grid with 2049 nodes in x-direction and with 257 nodes in v- direction, total of 526593 nodes. Time step was 1.4e − 4 and calculations were stopped when t = 10. The results are shown in the Table 1. Parallel solving of KK equation provides a significant speed up of the calculations even in case of the parallel CPU solution. Table 1: Duration of solving KK equation on a 2049x257 grid with ∆t = 1.4e − 4 to t = 10. In brackets is written the speed up factor comparing with serial CPU calculations(1x CPU). Type of BC Dirichlet Absorbing Reflecting Hardware 1x CPU 592.6 s 601.2 s 600.8 s 4x CPU 218.3 s (2.7x) 226.74 s (2.7x) 220 s (2.7x) 1x GPU(GTX 750) 114.7 s (5.2x) 116.6 s (5.2x) 122 s (4.9x) 1x GPU(Tesla M2050) 39.8 s (14.9x) 40 s (15x) 43.9 s (13.7x) Calculations on grids with different Np and Nx have shown that duration of solving KK equation depends linearly on N = Np · Nx (data not shown). 6 Discussion KK equation is a partial differential equation of mixed type which describes the time evolution of a PDF of a brownian particle in an external potential. In CPU and GPU Parallel Kramers-Klein Calculations 21 literature the Kramers problem(find the escape rate of a Brownian particle from a potential well) is usually solved analytically using some assumptions(harmonic approximation near minima [1]), piecewise-linear approximation [4] and high energy barriers approximation[5]. Finding PDF is a much harder task and can be solved analytically only in a few cases. In this paper we solved the KK equation numerically. Solving KK equation numerically allows us to get a solution for both finite and infinite geometry and arbitrary potential field. We showed that both CPU and GPU parallel computing provide a significant gain in the solving time. Even home PC’s GPU can solve the equation for an acceptable time. BC type is important from a physical viewpoint on the problem when it has a finite geometry. Different types of collisions of brownian particle with a wall: the elastic(reflecting BC) and inelactic (absorbing BC) ones influence the PDF very significantly[3]. From a viewpoint of computations, in case of the reflecting BC, the GPU parallelization is less effective because in one iteration we need to call two kernels sequentially and Table 1 shows quite clearly, that the GPU acceleration is smaller in case of the reflecting BC. From a viewpoint of computations, further works should include more accu- rate and efficient schemes for solving the KK equation and its parallelization[7]. Further physical researches should include numerical studies of the KK equation for the ion channel kinetics[6] with different physically reasonable potentials, infinite and finite geometries. Acknowledgements. Supported by the Russian Scientific Foundation, project # 14-35-00005. References 1. Kramers, H.A.: Brownian motion in a field of force and the diffusion model of chemical reactions, Physica. 7: 284-304 (1940) 2. Risken, H.: The Fokker-Planck Equation, Springer, Berlin (1989) 3. Araújo, A., Das, A.K., Sousa, E.: A numerical approach to study the Kramers equation for finite geometries: boundary conditions and potential fields, J. Phys. A: Math. Theor. 48:045202 (2015) 4. Goychuk, I., Hänggi, P.: The role of conformational diffusion in ion channel gating, Physica A. 325:9–18 (2003) 5. Mel’nikov, V.I., Meshkov, S.V.: Theory of activated rate processes: Exact solution of the Kramers problem, J. Chem. Phys. 85:1018–1027 (1986) 6. Sigg, D.: Modeling ion channels: Past, present, and future, J Gen Physiol., 144: 726 (2014) 7. Shu, C.-W.: Essentially non-oscillatory and weighted essentially non-oscillatory schemes for hyperbolic conservation laws, Advanced Numerical Approximation of Nonlinear Hyperbolic Equations,Lecture Notes in Mathematics series, 1697: 325432 (2006)