Performance Evaluation of Space Fractional FitzHugh-Nagumo Model: an Implementation with PETSc Library Nikita Markov1 , Konstantin Ushenin1,2 , and Ahmed Hendy1 1 Ural Federal University, Yekaterinburg, Russia 2 Institute of Immunology and Physiology of the Ural Branch of the Russian Academy of Sciences, Yekaterinburg, Russia Abstract. Space fractional derivatives have been used instead of integer operators with success in a variety of practical applications to study and describe transport processes in media characterized by spatial connectiv- ity properties and high structural heterogeneity altering the classical laws of diffusion. This study provides an implementation of space fractional FitzHugh-Nagumo model of excitable medium in two-dimensional space. This implementation is based on shifted Grunwald-Letnikov computa- tional scheme which transforms the space fractional diffusion equations to a system of algebraic equations. After that, the conjugated gradient method is used to solve the sparse matrix system. MPI and PETSc li- brary is used for implementation. We investigate the influence of some factors to our implementation performance and scalability such as core numbers, size of computation mesh, and fractional derivative orders. All tests are done on ccNUMA architecture with two CPUs. Keywords: space fractional reaction-diffusion equations · fractional Laplacian · FitzHugh-Nagumo model · spiral waves · conjugated gra- dient method · distributed memory system · PETSc library 1 Introduction Living system simulations make great impact on the understanding of biological processes. They help in checking the side effects of drugs and allow to create a personalized model of human organs which help in planning of surgery and medication. Recently, some researchers have been looking for a way to apply fractional derivatives [1] to living system models. The main advantage of these approaches is to describe sub-diffusion processes. The sub-diffusion is a diffusion process which gives a squared displacement of particles which are non-linearly dependent on time. Especially, this variant of diffusion exists in tissue and sub- cellular space. The model of visco-elastic property of soft tissue which can explain oscillation damping more precisely is one of models which contains fractional derivatives. The review of these sorts of models are described in the introduction of Ph.D. thesis [2]. In addition, there are several other applications of models Performance Evaluation of Space Fractional FitzHugh-Nagumo Model 67 with fractional derivatives to life science: cancer growth [3], initial stage of HIV [4], and population growth [5]. Individual attention is required for the excitable medium models. These mod- els describe neurons and muscle tissue (myocardium). The special problem is a more realistic description of diffusion in consideration of subdiffusion transport in subcellural space and tissue structure. In [6], the authors proposed a replac- ment of integer Laplacian in excitable medium model by fractional Laplacian. This replacement gives a precise reproduction of several physiological experi- ments in computer simulations. An important feature of fractional derivative in a new model behavior, which can not be repeated with changing diffusion coefficient or introduction of nonlinear parts. Unfortunately, the replacement of integer derivatives by fractional ones leads to a significant increase in the computational time. This fact motivates us to introduce a high performance implementation for fractional FitzHugh-Nagumo model [7] based on some computational methods [8]. A fractional FitzHughNagu- mo model is a generalizing of the standard model that describes the propagation of the electrical potential in heterogeneous cardiac tissue. The fractional model consists of a couple fractional Riesz space nonlinear reaction-diffusion model and a system of ordinary differential equations. This implementation is required for physiological experiments in the future. We use PETSc library [9–11], which provides iterative solvers for sparse ma- trix. This library is used in more than 14 software packages for scientific com- puting [9] and support multicore architectures, computation accelerators and systems with distributed memory. The implementation based on this popular and well-known library is more suitable for our goals. Besides that, we investigate the influence of some factors in our implemen- tation performance and scalability such as core numbers, size of computation mesh, and fractional derivative orders. All tests were conducted on Cache co- herent non-uniform memory access (ccNUMA) architecture with two six core central processing unit (CPU). The rest of this paper is as follows: the next sec- tion is devoted to describe FitzHugh-Nagumo model and fractional Laplalacian operator, then we proceed to describe conducted experiments. 2 Numerical Implementation of FitzHugh-Nagumo Model The FitzHugh-Nagumo model is a reaction-diffusion equation which describes waves propagation in an excitable medium. Initially, it was created as a mathe- matical description of the action potential propagation on an axon membrane. ∂u = ∇ · (K∇u) + u(1 − u)(u − a) − v, (1) ∂t ∂v = ε(βu − γv − σ), (2) ∂t where u is a “fast” variable which describes membrane potential of a cell; v is a “slow” variable which connects with the medium conductivity by inverse ratio; 68 Nikita Markov, Konstantin Ushenin, and Ahmed Hendy K is a diffusion tensor; ε, β, γ, σ are the constants which describe the model behavior. During a long period of time, the FitzHugh-Nagumo model was being used for research of auto-oscillatory process in the excitable media. For example, the spiral waves in a two dimensional domain and scroll waves in a three dimen- sional domain. These structures are models of arrhythmic activity in mammalian hearts. The model is suitable for this study, because it is simple, contains only two phase variables and converges on big time steps. In addition, spiral waves provide test example for a performance analysis which is close to realistic problems. 2.1 Space Fractional Laplacian Operators Usually, researchers and engineers use derivatives and integrals with integer or- der of degree in applications. However, there are several definitions of derivatives and integrals with real and complex numbers of degree. They are named frac- tional derivatives and fractional integrals. Some of these definitions are given by Grunwald-Letnikov, Riemann-Liouville, Caputo, Marchaud, Hadamard, and more [12]. In some cases, many of these definitions are equivalent to each other. In this paper we use a fractional derivative with order α1 (1 < α1 ≤ 2) which is defined by Riesz [12]. ∂ α1 v 1  ∂ α1 v ∂ α1 v  = − + , (3) ∂|x|α1 2 cos(πα1 /2) ∂xα1 ∂(−x)α1 where Z x ∂ α1 v(x, y, t) 1 ∂2 v(ζ, y, t) = α1 −1 dζ, (4) ∂xα1 Γ (2 − α1 ) ∂x2 −∞ (x − ζ) Z ∞ ∂ α1 v(x, y, t) 1 ∂2 v(ζ, y, t) = dζ. (5) ∂(−x)α1 Γ (2 − α1 ) ∂x2 x (ζ − x)α1 −1 Similar expressions are defined for the space Riesz fractional derivative of order α2 (1 < α2 ≤ 2) with respect to y. After that, we can introduce a fractional Laplacian operator.  ∂β ∂β ∂ β T ∇β = , , (6) ∂|x|β ∂|y|β ∂|z|β In [1, 13, 12], the authors describe the theory of fractional derivatives in detail. After the replacement of integer Laplacian operator with the fractional one, the fractional FitzHugh-Nagumo model appears: ∂u ∂ α1 u ∂ α2 u = Kx + K y + u(1 − u)(u − a) − v, (7) ∂t ∂|x|α1 ∂|y|α2 ∂v = ε(βu − γv − σ), (8) ∂t where Kx and Ky are conductivity coefficients along axes. For further explana- tion, we denote non-linear part of first equation as f = u(1 − u)(u − a) − v, and left side of second equations as s = ε(βu − γv − σ). Performance Evaluation of Space Fractional FitzHugh-Nagumo Model 69 2.2 Numerical Method In our research, we use the numerical method which is described in [14, 8]. This numerical method transforms (7) to a system of linear algebraic equations which can be solved by conjugated gradient (CG) method. The numerical simulation is formulated by considering positive integers m1 , m2 and N such that hx = Dx /m1 , hy = Dy /m2 , τ = T /N be characteristics of space and time grids. The orthogonal uniform computation mesh includes m × m nodes with step by space h. τ is a time step and n is a number of conducted time steps. uni,j is the value of unknown function u(xi , yj , tn ). The GrunwaldLetnikov approximations for l.h.s and r.h.s fractional deriva- tives are given by: i+1 ∂αu 1 X (l) = gα u(xi−l+1 , yj , tn ) + O(hx ), (9) ∂xα (xi ,yi ,tn ) (hx )a l=0 m−i+1 ∂αu 1 X = gα(l) u(xi+l−1 , yj , tn ) + O(hx ), (10) ∂(−x)α (xi ,yi ,tn ) (hx )a l=0 where coefficients gαl are defined by recurrent formula:   α−l+1 gα(0) = 1, gα(l) = (−1)l αl = −gαl−1 . (11) l Similar expressions are defined for y axis direction. The Order of fractional derivative for y direction is denoted as α2 . Substitution of expressions (9), (10) in (7) leads to the following numerical scheme [14]: i+1 hX m−i+1 X i uni,j + r(1) gα(l)1 uni−l+1,j + gα(l)1 un1+l−1,j l=0 l=0 j+1 hX m−j+1 X i + r(2) gα(l)2 uni,j−l+1 + gα(l)2 uni,j+l−1 = un−1 n−1 i,j + τ fi,j , (12) l=0 l=0 n−1 n vi,j = vi,j + τ sn−1 i,j , (13) τ K x τ Ky r(1) = πα1 , r(2) = , (14) 2 cos( 2 )h α 1 2 cos( πα 2 )h 2 α2 n where fi,j = uni,j (1 − uni,j )(uni,j − a) − vi,j n is nonlinear part in the first model n n n equation, and si,j = ε(βui,j − γvi,j − σ) is right hand part of second equations. Boundary conditions in Dirichlet form are described in the following form: un0,j = unm,j = uni,0 = uni,m = 0. (15) The linear system Au(n) = b(n−1) is constructed based on these expressions. A is matrix of the system with size (m − 1)(m − 1) × (m − 1)(m − 1) in left hand 70 Nikita Markov, Konstantin Ushenin, and Ahmed Hendy (a) α1 = α2 = 1.9 (b) α1 = α2 = 2.0 Fig. 1: Structure of the matrix A for different values α1 , α2 side of equations (12). un = (un1,1 , un1,2 , ..., un1,m−1 , un2,1 , ..., unm−1,m−1 ). b(n−1) is a right hand side vector of system (12) with length (m − 1)(m − 1). Figure 1 presents matrix A with different values of α1 = α2 . The problem became identical to usual FitzHugh-Nagumo model if α1 = α2 = 2.0. In work [14] authors use Gauss-Seidel method to solve the equations. In our research we use conjugated gradient method without precondition. 2.3 Spiral Wave Initiation Protocol In this research, we use initial conditions which lead to a stable spiral wave in excitable medium. These structures are models of arrhythmic activity in mam- malian hearts. Spiral waves are auto oscilatory process. This allows to use them for long time simulations without additional activation of excitable medium. Initial conditions correspond to ones in [14]. These are square with u0i,j = 1 0 and rectangle with vi,j = 0.1 which are not intersect with each other. For better understanding, we rewrite expressions from [14] in terms of percent from square side: ( 1.0, if 0% < x ≤ 50%, 0% < y < 50%, u(x, y, 0) = (16) 0.0, otherwise ( 0.1, if 0% < x ≤ 50%, 50% < y < 100%, v(x, y, 0) = (17) 0.0, otherwise Spiral waves for different fractional derivative orders are shown in Figure 2. 2.4 Implementation At the first stage of the algorithm, arrays and the matrix are initialized by PETSc library. PETSc library does a partition of arrays and the matrix across different processes. The matrix and arrays are constructed in a parallel way. Each process keeps in memory a single part of every structure. The formulas of matrix values Performance Evaluation of Space Fractional FitzHugh-Nagumo Model 71 (a) α1 = 1.3, α2 = 1.7 (b) α1 = α2 = 1.5 (c) α1 = α2 = 1.75 (d) α1 = α2 = 1.8 (e) α1 = α2 = 1.9 (f) α1 = α2 = 2.0 Fig. 2: Spiral waves for different values of α1 , α2 (curvature of the waves is due to boundary conditions) are described in numerical scheme (12). After initialization and construction of the matrix, the program sets u0 and v 0 in order to initiate the spiral wave in the simulation (16)-(17). It should be noted that in the main computational cycle each operation is conducted in a parallel way and the linear system is solved by conjugate gradient method. 3 Numerical Experiments and Results We consider three series of experiments to study performance of the simulation. In the first series, we establish the effect of order of fractional derivative to the performance. We take different values for α1 , α2 and fix the number of cores to one. The second series is concerned with the establishment of scalability for different numbers of threads (from 1 to 12). The fractional orders α1 = α2 are fixed. The third series discusses the relation between number of mesh nodes and scalability. The order of fractional derivatives α1 = α2 chosen to be 1.9 and 2.0. All experiments were conducted on one node of Ural Federal University’s cluster with two CPU Intel Xeon E5-2620 v2 @ 2.10GHz, which are joined in ccNUMA architecture. In [6], the authors compare features of excitation wave behavior when the order of fractional derivative is equal to 1.9, 1.75, and 1.5. In this research, we 72 Nikita Markov, Konstantin Ushenin, and Ahmed Hendy Algorithm 1 The algorithm of initialization and main computational cycle 1: procedure Compute(t, s, f, T ) 2: Initiate(Ai,j , un n i,j , vi,j ) 3: ConstructTheMatrix(A) 4: SetInitialValues(u0 , v 0 ) 5: while t · n < T do 6: b ← un−1 + t · f (un−1 , v n−1 ) 7: Solve(Aun = b) 8: v n ← v n−1 + t · b 9: Print(t, un , v n ) 10: n←n+1 11: end while 12: return un , v n 13: end procedure Fig. 3: Histogram (a) shows computation time for different α1 , α2 ∈ [1.1, 2.0], α1 = α2 ; plot (b) shows computation time for several possible combi- nations of α1 , α2 ∈ [1.1, 2.0] use the same parameters. Other variants and combinations are used to complete the study. If α1 = α2 = 2.0, fractional FitsHugh-Nagumo model is reduced to standard FitsHugh-Nagumo model with integer order of Laplacian. If α1 = 1.9 or α2 = 1.9, the simulation behavior is similar to the case of ordinary Laplacian, but performance drops. This happens because the matrix of the linear system is more complex for fractional derivative, see Fig. 1. This leads to a high number of steps in iterative CG solver. The first series of experiments are done on a square area with size 2.5 × 2.5 and 256 × 256 mesh nodes, step-by-time τ = 0.1. The parameters α1 , α2 are chosen in range from 1.1 to 2.0 with step 0.1. The simulation takes 1400 time steps for each parameter combination. A spiral wave initiation protocol (16,17) is used as initial condition.The computational power is restricted by one core. When α1 = α2 , α1 ∈ [1.1, 2.0], the results are shown in Figure 3(a). For all possible combinations of parameters α1 ∈ [1.1, 2.0], α2 ∈ [1.1, 2.0], the results are shown in Figure 3(b). The computational time grows with increasing α1 , α2 Performance Evaluation of Space Fractional FitzHugh-Nagumo Model 73 Fig. 4: Diagram (a) shows performance for integer Laplacian order α1 = α2 = 2.0. This is a case when fractional derivatives are reduced to ordinary deriva- tives. Diagram (b) shows performance for other order of fractional Laplacian when α1 = α2 . Please, notice the scale. Diagram (c) shows scalability of the implementation for different orders of fractional derivatives and dramatically decreases when α1 = 2.0 or α2 = 2.0. The computational time for α1 = α2 = 2.0 is 35 times lower than in case of α1 = α2 = 1.9. We can explain this result by reduction of the fractional problem to the problem with integer Laplacian when α1 = α2 = 2.0. This reduction leads to a simplification of the matrix (Fig. 1). The multi-diagonal matrix of fractional problem is reduced to banded matrix with five diagonals. This simple matrix structure requires less iterations of CG solvers. The difference in computational time for different derivative orders can be explained by features of the matrix or features of spiral wave behavior. In the second series, we fixed values of α1 = α2 ; α1 , α2 ∈ [1.1, 2.0] and studied the scalability for number of cores from 1 to 12. Fractional derivative orders are set to be 1.1, 1.25, 1.5, 1.75, 1.9, 2.0. Simulation takes 1000 time steps on a 5.0 × 5.0 mesh with 512 × 512 mesh nodes. 74 Nikita Markov, Konstantin Ushenin, and Ahmed Hendy Fig. 5: A Plot (a) shows the implementation speedup for fractional Laplacian with integer and not-integer orders on computation mesh with different size. A chart (b) shows a dependency between scalability and computational mesh size when number of cores equal to 12 and parameters α1 = α2 are set to 1.9 or 2.0 The time of computations is shown on a Diagrams 4(a), 4(b) and scalability is shown on a Plot 4(c). The scalability is close to linear for α1 = α2 = 2.0, but in case α1 = α2 < 2.0 it increases up to a sixth core and stays on plate. Similar result is shown on two and three nodes of cluster. In our opinion, these results may be explained by counts of slow non-local memory readings from different cores in ccNUMA architecture. In the third series, we study a dependency between scalability and mesh size. The computation is done on a different square meshes with sizes 1.25 × 1.25, 2.5 × 2.5, 5.0 × 5.0 and 128 × 128 , 256 × 256 , 512 × 512. This formulation of the problem is important, because the computational scheme 12 establishes dependency of each value to the values from adjoined row and column way to the mesh border. Expansion of the computational mesh leads to increasing of the number of nonzero elements in the matrix. Parameters of experiment are taken as: h = 0.009767 is step by space, τ = 0.1 is step by time, 1000 time steps were computed. The order of fractional derivatives α1 = α2 are set to be 1.9 and 2.0. Figure 5(b) shows scalability of the problem with fractional and ordinary Laplacians. Scalability is close to linear for a “big” mesh, which employs near or more than 256 × 256 mesh nodes. For fractional model, linear scalability is bounded by six cores. For problem with integer order Laplacian, we may explain the results by decreasing of the ratio between useful computation and utility operations. After this experiment, we suppose that our implementation may be used for various electrophysiological computational study, but the application cases are restricted by mesh size and one cluster node. 4 Comparison with Related Work In this section we compare our implementation with some existed works in literature. In [15], the authors proposed some numerical methods for efficient computation of a class of matrix functions on GPU. This method allows to Performance Evaluation of Space Fractional FitzHugh-Nagumo Model 75 improve performance of exponential integrator and computation of fractional Laplacian. The authors describe computationally fractional-by-space reaction- diffusion equations and fractional-by-space FitzHugh-Nagumo model on CPU and GPU. Libraries cuBLAS and cuSPARCE are used as parallelization tech- nology. MATLAB is used for implementation. In their study, 20000 time steps are computed during 175 369 seconds with CPU and 14 579 seconds with GPU. Approximately, it is about 8.768 seconds for one time step on CPU and 0.729 seconds for GPU. Our implementation computes 1000 time steps in 772 sec. with 12 CPU cores. It is about 0.771 steps in one second. Thus, performance of our implementation is faster than CPU implementation from the paper [15] and close in performance to GPU implementation. In [16], the authors described parallel algorithm for solution fractional-by- space reaction-diffusion equations in one dimensional case. Finite difference method is used for fractional partial derivative discretization. Performance study was achieved with one core, one CPU (Intel Xeon X5540) and 64 cores. In the last case, MPI technology is used. Programming language was Fortran 90. Re- markably, computational scheme is based on Grunwald approximations. This is similar to our study, but we solve the problem in two dimensions. In their study, performance increased by 3.38-3.58 times with 4 cores and 4.55–5.23 times with 8 cores. We computed minimal and maximal speedup achieved in our simulation for all variant of parameters excluding case of Lapla- cian with integer order. In our implementation, performance increased by 2.29 – 3.85 times on 4 cores, 3.5 – 5.0 times on 8 cores and 4.06 – 5.28 times on 12 cores. Based on these metrics, we can say that our implementation scalability is close to scalability of the implementation [16]. Besides that, the authors in [17] described program to solve Maxwell’s frac- tional equation which uses PETSc, but performance and scalability tests are not described. In [18] and [19], a study of performance for several numerical meth- ods without parallelization was performed. In [2] and [20, 21], a study of effective parallelization techniques was done. The authors used PETSc. However, these works are not compatible with our work directly, because the authors solved the problem with another conditions and other computational methods. In ad- dition, our study is more detailed. We study performance from hardware and model parameters together in one research. 5 Conclusion In this paper, we describe a high performance implementation of the fractional FitzHugh-Nagumo model. It is based on numerical method [8] to approximate fractional reaction diffusion equation and conjugate gradient method to solve the sparse linear system. Our solution is done with the aid of PETSc library which is used a lot for scientific computing [9]. The simulation with fractional derivatives shows 17%-65% less performance than ones with integer order derivatives. We study the dependencies of simulation performance such as size of mesh and 76 Nikita Markov, Konstantin Ushenin, and Ahmed Hendy order of fractional derivative. In addition, we establish simulation scalability in ccNUMA architecture with two six-core CPU. The proposed implementation shows a linear scalability when a fractional problem is reduced to a problem with an ordinary Laplacian (α1 = α2 = 2.0). However, the implementation speedup reaches plateau from six cores and more with other values α1 , α2 . The proposed implementation can be used for research of physiological fea- tures of excitable medium model with fractional derivative in the future. Acknowledgements. The study was supported by the Russian Science Foun- dation (no. 14-35-00005). We used the computational cluster of Ural Federal University for this research. References 1. Podlubny, I.: Fractional differential equations: an introduction to fractional deriva- tives, fractional differential equations, to methods of their solution and some of their applications. Volume 198. Academic press (1998) 2. Zhang, W.: High Performance Computing for Solving Fractional Differential Equa- tions with Applications. PhD thesis, University of Oslo (2014) 3. Ahmed, E., Hashish, A., Rihan, F.: On fractional order cancer model. Journal of Fractional Calculus and Applied Analysis 3(2) (2012) 1–6 4. Arafa, A., Rida, S., Khalil, M.: Fractional modeling dynamics of hiv and cd4+ t-cells during primary infection. Nonlinear Biomedical Physics 6(1) (2012) 1 5. Xu, H.: Analytical approximations for a population growth model with fractional order. Communications in Nonlinear Science and Numerical Simulation 14(5) (2009) 1978–1983 6. Bueno-Orovio, A., Kay, D., Grau, V., Rodriguez, B., Burrage, K.: Fractional dif- fusion models of cardiac electrical propagation: role of structural heterogeneity in dispersion of repolarization. Journal of The Royal Society Interface 11(97) (2014) 20140352 7. FitzHugh, R.: Impulses and physiological states in theoretical models of nerve membrane. Biophysical journal 1(6) (1961) 445 8. Liu, F., Zhuang, P., Turner, I., Anh, V., Burrage, K.: A semi-alternating direction method for a 2-d fractional fitzhugh–nagumo monodomain model on an approxi- mate irregular domain. Journal of Computational Physics 293 (2015) 252–263 9. Balay, S., Abhyankar, S., et al.: PETSc Web page. http://www.mcs.anl.gov/petsc (2016) 10. Balay, S., Abhyankar, S., et al.: PETSc users manual. Technical Report ANL-95/11 - Revision 3.7, Argonne National Laboratory (2016) 11. Balay, S., Gropp, W., McInnes, L., Smith, B.: Efficient management of parallelism in object oriented numerical software libraries. In Arge, E., Bruaset, A.M., Lang- tangen, H.P., eds.: Modern Software Tools in Scientific Computing, Birkhäuser Press (1997) 163–202 12. Samko, S., Kilbas, A., Marichev, O., et al.: Fractional integrals and derivatives. Theory and Applications, Gordon and Breach, Yverdon 1993 (1993) 13. Baleanu, D., Diethelm, K., Scalas, E., Trujillo, J.: Models and numerical methods. World Scientific 3 (2012) 10–16 Performance Evaluation of Space Fractional FitzHugh-Nagumo Model 77 14. Liu, F., Turner, I., Anh, V., Yang, Q., Burrage, K.: A numerical method for the fractional fitzhugh–nagumo monodomain model. ANZIAM Journal 54 (2013) 608–629 15. Farquhar, M., Moroney, T., Yang, Q., Turner, I.: Gpu accelerated algorithms for computing matrix function vector products with applications to exponential integrators and fractional diffusion. SIAM Journal on Scientific Computing 38(3) (2016) C127–C149 16. Gong, C., Bao, W., Tang, G.: A parallel algorithm for the riesz fractional reaction- diffusion equation with explicit finite difference method. Fractional Calculus and Applied Analysis 16(3) (2013) 654–669 17. Ge, J.: Fractional Diffusion Modeling of Electromagnetic Induction in Fractured Rocks. PhD thesis, Texas A&M University (2014) 18. Burrage, K., Hale, N., Kay, D.: An efficient implicit fem scheme for fractional-in- space reaction-diffusion equations. SIAM Journal on Scientific Computing 34(4) (2012) A2145–A2172 19. Yang, Q., Turner, I., Moroney, T., Liu, F.: A finite volume scheme with precon- ditioned lanczos method for two-dimensional space-fractional reaction–diffusion equations. Applied Mathematical Modelling 38(15) (2014) 3755–3762 20. Alyoubi, A., Ganesh, M.: An efficient hpc framework for parallel long-time and large-scale simulation of a class of anomalous single-phase models. In: High Per- formance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), 2015 IEEE 12th Interna- tional Conferen on Embedded Software and Systems (ICESS), 2015 IEEE 17th International Conference on, IEEE (2015) 1552–1557 21. Alyoubi, A., Ganesh, M.: Parallel mixed fem simulation of a class of single-phase models with non-local operators. Journal of Computational and Applied Mathe- matics 307 (2016) 106–118