Comparison of scalar multiplication algorithms in a low resource device Mohamed Ramdani Mohamed Benmohammed Nadjia Benblidia LRDSI Laboratory, Computer Science LIRE Laboratory, Computer Science LRDSI Laboratory, Computer Science Dept Dept Dept University of Blida1, Blida, Algeria University of Constantine2, University of Blida1, Blida, Algeria moumouhr@yahoo.fr Constantine, Algeria benblidia@gmail.com med.ramdani@yahoo.com ben_moh123@yahoo.com 1 Introduction Abstract Cryptography has emerged as a reliable and power- ful solution for maintaining data confidentiality. To Scalar multiplication of a point is the central make information eligible, cryptographic mechanisms operation in the elliptic curves cryptography use complex mathematical methods that often involve (ECC). It’s a complex operation that requires intensive computations. This intensity is often a prob- a lot of optimization especially on execution lem for low-resource systems like embedded systems. times to reduce resource consumption in sys- There are two ways to encrypt a message; symmet- tems with low computing power and memory ric method (Secret Key Cryptography SKC) based on such as embedded systems. Several works on shared private keys, and asymmetric method (Public- the acceleration of computation and many oth- Key Cryptography PKC) based on a couple of public ers on the reduction of the complexity of math- and private keys. The advantage of the latter is that all ematical methods used in the computations on exchanges are public and its security is based on the dif- the elliptic curves have been realized. Many ficulty of finding the secret from information exchanged algorithms for computing scalar multiplication publicly. are proposed, each using their own calculation technique and mathematical methods. In this One of the most recently used asymmetric cryptosys- paper, we combine the techniques of acceler- tems is Elliptic Curve Cryptography (ECC). ECC uses ated computations with optimized mathemat- short keys for equal security to other asymmetric cryp- ical methods and we implement them on al- tosystems [MKY16]. It is recommended for systems gorithms for scalar multiplication of a point. in which electronic devices have low computing power This work aims to distinguish the most efficient and very limited memory such as embedded systems. computation algorithm with the best accelera- However, despite recent optimizations on ECC, essen- tion technique. The results show that the Joye tially the reduction of computational complexity, this algorithm combined with the co-Z Doubling- cryptosystem remains complex and requires further op- Addition acceleration technique gives better timization. results and saves more than one second of com- The important and costly operation in Elliptic Curve putation time compared to the other algorithms Cryptography for encryption, decryption, digital sig- implemented in this paper. nature, key exchange, etc is scalar multiplication of a point. This is a complex operation that requires more optimizations for the acceleration of cryptographic com- Keywords – Elliptic curves cryptography (ECC), Scalar putations. The execution time of any ECC operation de- multiplication, embedded systems, acceleration compu- pends on the execution time of the scalar multiplication. tations, co-Z addition An embedded system is an autonomous system, gen- erally dedicated to specific tasks, having a limited size Copyright c by the paper’s authors. Copying permitted only for pri- and low resources. There are two important constraints vate and academic purposes. In: Proceedings of the 3rd Edition of the International Conference on of an embedded system: optimized computing power Advanced Aspects of Software Engineering (ICAASE18), Constan- while respecting the temporal and spatial constraints, tine, Algeria, 1,2-December-2018, published at http://ceur-ws.org and an essential security to ensure data confidentiality Page 70 Comparison of scalar multiplication algorithms in a low resource device ICAASE'2018 especially for sensitive applications. The system used The standard algorithm called Dbl-and-add [Knu97] in this work is an Arduino card very limited in memory for calculating scalar multiplication of a point is given resources and computing power. It is an open source by algorithm 1. system, open hardware and open source bootloader. In this paper, we present an implementation of scalar Algorithm 1 Standard Dbl-and-add algorithm multiplication algorithms on a low-resource system to Require: k = (kd−1 ,..., k1 , k0 )2 , P ∈ E(Fp ) compare and derive the most optimized computation Ensure: Q = k·P method for such systems. We define the scalar multi- R0 = 0/ plication of a point and present some algorithms of its R1 = P computation, in section 2. Section 3 is dedicated to ac- for i ← 1 to d-1 do celeration techniques of cryptographic computations on if ki = 1 then a scalar multiplication and to the system of coordinates R0 ← R0 + R1 used. In Section 4, we present the software implemen- end if tation of acceleration techniques in scalar multiplication R1 ← 2 × R1 algorithms and present the comparison results in Section end for 5. return R0 ; 2 Scalar multiplication The basic operation performed in protocols based on El- This algorithm has a huge disadvantage for the se- liptic Curve Cryptography is the scalar multiplication of curity of the private key used because it is very vul- a point on the curve. Each scalar multiplication requires nerable to attacks SPA [MS00] and DPA [KJJ99]. In- several thousand operations in a finite field. Let k be a deed, by analyzing either the energy consumed or scalar of n bits, P and Q two points of an elliptic curve the number of clock cycles performed per operation, defined on a finite field F by the Weierstrass equation an attacker can reconstruct the private key. This [Gui13][BJ02]: fault is corrected by several works [Mon87][Joy07] [LS01][DSF16][FGDM+ 10], we choose here to work on the algorithms of Mongomery ladder [Mon87] and E : Y 2 + a1 XY + a3 = X 3 + a2 X 2 + a4 X + a6 (1) the algorithm of Joye Dbl-and-add [Joy07] (see algo- where a1 , a2 , a3 , a4 ∈ F. In this work, we consider the rithms 2 and 3). finite prime fields Fp . Equation (1) then becomes: Algorithm 2 Montgomery ladder algorithm E : Y 2 = X 3 + aX + b (2) Require: k = (k ,..., k1 , k0 )2 , P ∈ E(Fp ) d−1 where a, b ∈ Fp . The multiplication of the scalar k by Ensure: Q = k·P the point P of the curve is another point Q s.t Q = k · P R0 = 0/ = P + P + . . . + P (k times). R1 = P Two operations are necessary to perform a scalar mul- for i ← d-1 downto 0 do tiplication: the point doubling P + P = 2P = P’ and the b ← ki point addition P + P’. The computations of doubling and R1−b ← R1−b + Rb addition are given by formulas (3) and (4) below. Let P Rb ← 2 × Rb (X1 , Y1 ) and Q (X2 , Y2 ) be two points of the elliptic curve end for s.t P, Q ∈ E (F ) and P 6= Q : P + P = 2P = (X , Y ) is return R0 ; p 3 3 computed as follows: 2 3X12 + a  X3 = − 2X1 Algorithm 3 Joye Dbl-and-add algorithm 2Y1  2 2 (3) Require: k = (kd−1 ,..., k1 , k0 )2 , P ∈ E(Fp ) 3X1 + a Ensure: Q = k·P Y3 = (X1 − X3 ) −Y1 2Y1 R0 = 0/ R1 = P and P + Q = (X4 , Y4 ) is the result of an addition of the for i ← 0 to d-1 do two points P and Q is computed as follows: b ← ki Y2 −Y1 2 R1−b ← 2 × R1−b + Rb   X4 = − X1 − X2 end for X2 − X1 (4) return R0 ; Y2 −Y1 2   Y4 = (X1 − X4 ) −Y1 X2 − X1 International Conference on Advanced Aspects of Software Engineering Page 71 ICAASE, December, 01-02, 2018 Comparison of scalar multiplication algorithms in a low resource device ICAASE'2018 Doubling algorithm, a P + Q point addition by Jacobian Table 1: Calculation of point doubling and addition op- Addition and a Doubling-Addition operation (DA) 2P erations with Jacobean coordinates + Q by the Mixed Jacobian Affine algorithm. This last P+Q 2P algorithm uses mixed coordinates; a point in Jacobian A 2 X1 Z2 4X1Y1 2 coordinates and another point in Affine coordinates. In B 2 X2 Z1 2 3X1 + AZ1 4 co-Z Jacobian Formulae, a point doubling is computed C 3 Y1 Z2 / by the Initial Doubling algorithm (XYcZ-IDBL), an ad- D 3 Y2 Z1 / dition by (X;Y)-only co-Z addition (XYcZ-ADD) and E B–A / a DA by (X;Y)-only co-Z Doubling-Addition (XYcZ- F D–C / DA). X3 3 -E – 2AE + F3 2 -2A + B 2 Y3 -CE 3 + F(AE 2 – X3 ) -8Y14 + B(A – X3 ) 4 Software implementation Z3 Z1 Z2 E 2Y1 Z1 The various techniques of computational acceleration for point doubling and addition are implemented on 3 Related work the different scalar multiplication algorithms presented above in order to compare the performance of each al- The complexity of the computations in scalar multipli- gorithm with each technique. The algorithms obtained cation of a point and the large number of point doubling and used as support for our comparisons are presented and point addition calculated to perform this operation in the following: required a lot of optimization works reduce computa- tions. In this section, we present some algorithms that • Standard Dbl-and-add with Jacobian doubling and allow computation accelerations in a scalar multiplica- Jacobian addition (see algorithm 4) tion, but before that, we will focus first on the choice of coordinate systems used. In the Affine coordinate system, the addition and dou- • Standard Dbl-and-add with co-Z Initial doubling bling formulas involve point inversion operations in Fp and co-Z addition (see algorithm 5) which is considered very expensive on the finite fields. In order to avoid the inversion of points [KS17], we • Montgomery ladder with Jacobian doubling and chose to work on the Jacobian coordinates where a point Jacobian addition (see algorithm 6) of the curve is represented by three coordinates (X : Y : Z) which corresponds to the Affine point ( ZX2 , ZY3 ). Equa- • Montgomery ladder with co-Z Initial doubling and tion (2) then becomes: co-Z addition (see algorithm 7) E : Y 2 = X 3 + aXZ 4 + bZ 6 (5) • Joye Dbl-and-add with Mixed Jacobian Affine The computation of the doubling P + P = 2P = (X3 : Y3 Doubling-Addition (DA) (see algorithm 8) : Z3 ) and the addition P + Q = (X3 : Y3 : Z3 ) is given by the table 1. • Joye Dbl-and-add with co-Z Doubling-Addition An additional optimization of the addition, called co- (DA) (see algorithm 9) Z addition, has been proposed by Meloni [Mel07] where he considers that two entry points share the same Z co- ordinate. Let P = (X1 : Y1 : Z) and Q = (X2 : Y2 : Z), the Algorithm 4 Standard Dbl-and-add with Jacobian dou- addition co-Z of P and Q (with P 6= Q) is defined by P + bling and Jacobian addition Q = (X3 : Y3 : Z3 ) so that : Require: k = (kd−1 ,..., k1 , k0 )2 , P ∈ E(Fp ) X3 = D − (B +C) Ensure: Q = k·P R0 = 0/ Y3 = (Y2 −Y1 )(B − X3 ) − E (6) R1 = P Z3 = Z(X2 − X1 ) for i ← 1 to d-1 do 2 2 if ki = 1 then where A = (X2 −X1 ) , B = X1 A, C = X2 A, D = (Y2 −Y1 ) (R0 ) ← Jacobian_addition(R0 ,R1 ) and E = Y1 (C-B) end if In [Riv11], several algorithms are proposed to per- (R1 ) ← Jacobian_doubling(R1 ) form a scalar multiplication of a point in Jacobian coor- end for dinates with Standard Jacobian Formulae and co-Z ad- return R0 ; dition Jacobian Formulae. In Standard Jacobian Formu- lae, a point doubling 2P is performed by the Jacobian International Conference on Advanced Aspects of Software Engineering Page 72 ICAASE, December, 01-02, 2018 Comparison of scalar multiplication algorithms in a low resource device ICAASE'2018 Algorithm 9 Joye Dbl-and-add with Mixed Jacobian Algorithm 5 Standard Dbl-and-add with co-Z Initial Affine DA doubling and co-Z addition Require: k = (kd−1 ,..., k1 , k0 )2 , P ∈ E(Fp ) Require: k = (kd−1 ,..., k1 , k0 )2 , P ∈ E(Fp ) Ensure: Q = k·P Ensure: Q = k·P R0 = 0/ R0 = 0/ R1 = P R1 = P for i ← 0 to d-1 do for i ← 1 to d-1 do b ← ki if ki = 1 then (R1−b ,Rb ) ← XYcZ-DA(R1−b ,Rb ) (R1 ,R0 ) ← XYcZ-ADD(R0 ,R1 ) end for end if return R0 ; (R0 ,R1 ) ← XYcZ-IDBL(R0 ) end for return R0 ; 5 Results and comparison The results presented in this comparison are the result of Algorithm 6 Montgomery ladder with Jacobian dou- experiments carried out on a low-resource device with a bling and Jacobian addition very low computing capacity and a very limited mem- Require: k = (kd−1 ,..., k1 , k0 )2 , P ∈ E(Fp ) ory size. The device used is an Arduino Uno R3 board Ensure: Q = k·P whose characteristics, shown in Table 2, are close to R0 = 0/ most embedded devices. The curves used are those rec- R1 = P ommended in [Qu99]. The size of the keys used is 192 for i ← d-1 downto 0 do and 256 bits. b ← ki (R1−b ) ← Jacobian_addition(R1−b ,Rb ) Table 2: Features of Arduino Uno R3 (Rb ) ← Jacobian_doubling(Rb ) Feature Type/Value end for return R0 ; Chipset Atmega328P Clock Frequency 8/16 Mhz Algorithm 7 Montgomery ladder with co-Z Initial dou- Voltage 1.8 - 5.5 V bling and co-Z addition Bit Width 8 bits Require: k = (kd−1 ,..., k1 , k0 )2 , P ∈ E(Fp ) Ensure: Q = k·P Static RAM 2 kB R0 = 0/ Program Memory 32 kB R1 = P for i ← d-1 downto 0 do b ← ki Table 3 shows the execution time (in ms) of point (R1−b ,Rb ) ← XYcZ-ADD(Rb ,R1−b ) doubling, point addition and doubling-addition opera- (Rb ,R1−b ) ← XYcZ-IDBL(Rb ) tions in Standard Jacobian Formulae and co-Z Jacobian end for Formulae. return R0 ; We find that the addition and doubling-addition oper- ations in co-Z Jacobian Formulae are faster and require less computations, unlike the Initial Doubling operation Algorithm 8 Joye Dbl-and-add with Mixed Jacobian where the treatments are heavier than Standard Jacobian Affine DA Formulae. Require: k = (kd−1 ,..., k1 , k0 )2 , P ∈ E(Fp ) The execution time of the scalar multiplication algo- Ensure: Q = k·P rithms (algorithm 4 to algorithm 9) is given by table 4 R0 = 0/ and 5. R1 = P The standard Dbl-and-add algorithm gives better re- for i ← 0 to d-1 do sults, its resource consumption is very low because the b ← ki number of point addition is optimized to log(d/3) un- (R1−b ) ← Mixed_Jacobian_Affine_DA(R1−b ,Rb ) like the other two algorithms where at each iteration, end for an addition is performed regardless of the value of the return R0 ; bit. However, its vulnerability to SPA and DPA attacks makes it almost inappropriate. In the other results, we International Conference on Advanced Aspects of Software Engineering Page 73 ICAASE, December, 01-02, 2018 Comparison of scalar multiplication algorithms in a low resource device ICAASE'2018 Table 5: Execution time of the scalar multiplication al- gorithms (P-256) P-256 Table 3: Execution time (in ms) of point doubling, ad- Algorithm Standard co-Z Jaco- dition and doubling-addition operations Jacobian bian Formulae Algorithm P-192 P-256 Algorithm Algorithm Point Doubling (2P) Standard Dbl-and-add 4 5 Standard Jacobian 6.064 11.732 4770 5724 Jacobian Doubling Algorithm Algorithm co-Z Jaco- XYcZ- 9.708 18.592 Montgomery ladder 6 7 bian IDBL 7471 6630 Point Addition (P + Q) Algorithm Algorithm Standard Jacobian 10.056 19.332 Joye Dbl-and-add 8 9 Jacobian Addition 7815 5528 co-Z Jaco- XYcZ-ADD 3.628 6.916 bian notice that the algorithm 9 gives very interesting results Doubling-Addition (2P + Q) with a reduced execution time of 428 ms compared to Standard Mixed DA 15.501 29.712 the algorithm 7 for a 192-bit key and 1102 ms for a 256- Jacobian bit key. co-Z Jaco- XYcZ-DA 10.916 20.808 the graph below (see figure 1) gives a global compar- bian ison of all implemented algorithms. Table 4: Execution time of the scalar multiplication al- gorithms (P-192) P-192 Algorithm Figure 1: A global comparison of implemented algo- Standard co-Z Jaco- rithms Jacobian bian Algorithm Algorithm Standard Dbl-and-add 4 5 6 Conclusion 1841 2260 In this work, we compared the different computational Algorithm Algorithm acceleration techniques for scalar multiplication by a Montgomery ladder 6 7 point on an embedded system in order to compare them 2960 2641 and to distinguish the best method of computing a scalar multiplication in a low-resource system. The hardware Algorithm Algorithm used is an Arduino embedded system with a microcon- Joye Dbl-and-add 8 9 troller limited in resources including memory and com- 3085 2213 puting power. We used Jacobian coordinate system with Standard Jacobian Formulae and co-Z Jacobian Formu- lae, an improved version of the addition called co-Z ad- dition, since this type of coordinates provides less com- plex and more efficient computation operations. We International Conference on Advanced Aspects of Software Engineering Page 74 ICAASE, December, 01-02, 2018 Comparison of scalar multiplication algorithms in a low resource device ICAASE'2018 have implemented acceleration techniques on three al- [KS17] K Keerthi and B Surendiran. Elliptic gorithms widely used in practice: the standard Dbl-and- curve cryptography for secured text en- add algorithm, the Montgomery ladder algorithm and cryption. In Circuit, Power and Comput- the Joye Dbl-and-add algorithm. We found that the Joye ing Technologies (ICCPCT), 2017 Inter- Dbl-and-add algorithm widh co-Z Doubling-Addition national Conference on, pages 1–5. IEEE, gives better results if we take into account the level of 2017. security offered, unlike the standard algorithm Dbl-and- add which offers a better time but it is very vulnerable [LS01] P-Y Liardet and Nigel P Smart. Prevent- to SPA and DPA attacks. The time saved for a 256-bit ing spa/dpa in ecc systems using the ja- key is more than one second (1102 ms) for scalar multi- cobi form. In International Workshop on plication using the Joye Dbl-and-add algorithm coupled Cryptographic Hardware and Embedded with the technique co-Z Doubling-Addition compared Systems, pages 391–401. Springer, 2001. to other algorithms implemented in this work. [Mel07] Nicolas Meloni. New point addition for- mulae for ecc applications. In Inter- References national Workshop on the Arithmetic of [BJ02] Eric Brier and Marc Joye. Weierstraß Finite Fields, pages 189–201. Springer, elliptic curves and side-channel attacks. 2007. In International Workshop on Public Key [MKY16] Dindayal Mahto, Danish Ali Khan, and Cryptography, pages 335–345. Springer, Dilip Kumar Yadav. Security analysis of 2002. elliptic curve cryptography and rsa. In Proceedings of the World Congress on [DSF16] Giovanni Di Sirio and Giovanni Fontana. Engineering WCE, volume 1, 2016. Method for protecting a cryptographic de- vice against spa, dpa and time attacks, [Mon87] Peter L Montgomery. Speeding the pol- August 30 2016. US Patent 9,430,188. lard and elliptic curve methods of fac- torization. Mathematics of computation, [FGDM+ 10] Junfeng Fan, Xu Guo, Elke De Mulder, 48(177):243–264, 1987. Patrick Schaumont, Bart Preneel, and In- grid Verbauwhede. State-of-the-art of se- [MS00] Rita Mayer-Sommer. Smartly analyzing cure ecc implementations: a survey on the simplicity and the power of simple known side-channel attacks and counter- power analysis on smartcards. In Interna- measures. In Hardware-Oriented Secu- tional Workshop on Cryptographic Hard- rity and Trust (HOST), 2010 IEEE In- ware and Embedded Systems, pages 78– ternational Symposium on, pages 76–87. 92. Springer, 2000. IEEE, 2010. [Qu99] Minghua Qu. Sec 2: Recommended el- liptic curve domain parameters. Certicom [Gui13] Aurore Guillevic. Arithmetic of pairings Res., Mississauga, ON, Canada, Tech. on algebraic curves for cryptography. Rep. SEC2-Ver-0.6, 1999. PhD thesis, Ecole Normale Supérieure de Paris-ENS Paris, 2013. [Riv11] Matthieu Rivain. Fast and regular algo- rithms for scalar multiplication over el- [Joy07] Marc Joye. Highly regular right-to-left al- liptic curves. IACR Cryptology ePrint gorithms for scalar multiplication. In In- Archive, 2011:338, 2011. ternational Workshop on Cryptographic Hardware and Embedded Systems, pages 135–147. Springer, 2007. [KJJ99] Paul Kocher, Joshua Jaffe, and Benjamin Jun. Differential power analysis. In Annual International Cryptology Confer- ence, pages 388–397. Springer, 1999. [Knu97] Donald Ervin Knuth. The art of com- puter programming: sorting and search- ing, volume 3. Pearson Education, 1997. International Conference on Advanced Aspects of Software Engineering Page 75 ICAASE, December, 01-02, 2018