Comparison of scalar multiplication algorithms in a
                 low resource device
        Mohamed Ramdani                           Mohamed Benmohammed                               Nadjia Benblidia
LRDSI Laboratory, Computer Science            LIRE Laboratory, Computer Science           LRDSI Laboratory, Computer Science
                Dept                                        Dept                                          Dept
 University of Blida1, Blida, Algeria            University of Constantine2,               University of Blida1, Blida, Algeria
       moumouhr@yahoo.fr                            Constantine, Algeria                         benblidia@gmail.com
     med.ramdani@yahoo.com                        ben_moh123@yahoo.com


                                                                   1    Introduction
                          Abstract                                    Cryptography has emerged as a reliable and power-
                                                                      ful solution for maintaining data confidentiality. To
      Scalar multiplication of a point is the central
                                                                      make information eligible, cryptographic mechanisms
      operation in the elliptic curves cryptography
                                                                      use complex mathematical methods that often involve
      (ECC). It’s a complex operation that requires
                                                                      intensive computations. This intensity is often a prob-
      a lot of optimization especially on execution
                                                                      lem for low-resource systems like embedded systems.
      times to reduce resource consumption in sys-
                                                                         There are two ways to encrypt a message; symmet-
      tems with low computing power and memory
                                                                      ric method (Secret Key Cryptography SKC) based on
      such as embedded systems. Several works on
                                                                      shared private keys, and asymmetric method (Public-
      the acceleration of computation and many oth-
                                                                      Key Cryptography PKC) based on a couple of public
      ers on the reduction of the complexity of math-
                                                                      and private keys. The advantage of the latter is that all
      ematical methods used in the computations on
                                                                      exchanges are public and its security is based on the dif-
      the elliptic curves have been realized. Many
                                                                      ficulty of finding the secret from information exchanged
      algorithms for computing scalar multiplication
                                                                      publicly.
      are proposed, each using their own calculation
      technique and mathematical methods. In this                        One of the most recently used asymmetric cryptosys-
      paper, we combine the techniques of acceler-                    tems   is Elliptic Curve Cryptography (ECC). ECC uses
      ated computations with optimized mathemat-                      short  keys for equal security to other asymmetric cryp-
      ical methods and we implement them on al-                       tosystems     [MKY16]. It is recommended for systems
      gorithms for scalar multiplication of a point.                  in which    electronic devices have low computing power
      This work aims to distinguish the most efficient                and   very  limited  memory    such as embedded systems.
      computation algorithm with the best accelera-                   However,     despite recent  optimizations  on ECC, essen-
      tion technique. The results show that the Joye                  tially  the  reduction  of computational    complexity,   this
      algorithm combined with the co-Z Doubling-                      cryptosystem     remains  complex   and  requires further op-
      Addition acceleration technique gives better                    timization.
      results and saves more than one second of com-                     The important and costly operation in Elliptic Curve
      putation time compared to the other algorithms                  Cryptography      for encryption, decryption, digital sig-
      implemented in this paper.                                      nature,   key  exchange,   etc is scalar multiplication of a
                                                                      point. This is a complex operation that requires more
                                                                      optimizations for the acceleration of cryptographic com-
Keywords – Elliptic curves cryptography (ECC), Scalar putations. The execution time of any ECC operation de-
multiplication, embedded systems, acceleration compu- pends on the execution time of the scalar multiplication.
tations, co-Z addition                                                   An embedded system is an autonomous system, gen-
                                                                      erally dedicated to specific tasks, having a limited size
Copyright c by the paper’s authors. Copying permitted only for pri- and low resources. There are two important constraints
vate and academic purposes.
In: Proceedings of the 3rd Edition of the International Conference on
                                                                      of an embedded system: optimized computing power
Advanced Aspects of Software Engineering (ICAASE18), Constan- while respecting the temporal and spatial constraints,
tine, Algeria, 1,2-December-2018, published at http://ceur-ws.org     and an essential security to ensure data confidentiality


                                                                                                                            Page 70
    Comparison of scalar multiplication algorithms in a low resource device                                              ICAASE'2018


     especially for sensitive applications. The system used                      The standard algorithm called Dbl-and-add [Knu97]
     in this work is an Arduino card very limited in memory                   for calculating scalar multiplication of a point is given
     resources and computing power. It is an open source                      by algorithm 1.
     system, open hardware and open source bootloader.
        In this paper, we present an implementation of scalar                 Algorithm 1 Standard Dbl-and-add algorithm
     multiplication algorithms on a low-resource system to                    Require: k = (kd−1 ,..., k1 , k0 )2 , P ∈ E(Fp )
     compare and derive the most optimized computation                        Ensure: Q = k·P
     method for such systems. We define the scalar multi-                       R0 = 0/
     plication of a point and present some algorithms of its                    R1 = P
     computation, in section 2. Section 3 is dedicated to ac-                   for i ← 1 to d-1 do
     celeration techniques of cryptographic computations on                       if ki = 1 then
     a scalar multiplication and to the system of coordinates                         R0 ← R0 + R1
     used. In Section 4, we present the software implemen-                        end if
     tation of acceleration techniques in scalar multiplication                   R1 ← 2 × R1
     algorithms and present the comparison results in Section                   end for
     5.                                                                         return R0 ;
     2     Scalar multiplication
     The basic operation performed in protocols based on El-            This algorithm has a huge disadvantage for the se-
     liptic Curve Cryptography is the scalar multiplication of       curity of the private key used because it is very vul-
     a point on the curve. Each scalar multiplication requires       nerable to attacks SPA [MS00] and DPA [KJJ99]. In-
     several thousand operations in a finite field. Let k be a       deed, by analyzing either the energy consumed or
     scalar of n bits, P and Q two points of an elliptic curve       the number of clock cycles performed per operation,
     defined on a finite field F by the Weierstrass equation         an attacker can reconstruct the private key. This
     [Gui13][BJ02]:                                                  fault is corrected by several works [Mon87][Joy07]
                                                                     [LS01][DSF16][FGDM+ 10], we choose here to work
                                                                     on the algorithms of Mongomery ladder [Mon87] and
          E : Y 2 + a1 XY + a3 = X 3 + a2 X 2 + a4 X + a6        (1) the algorithm of Joye Dbl-and-add [Joy07] (see algo-
     where a1 , a2 , a3 , a4 ∈ F. In this work, we consider the rithms 2 and 3).
     finite prime fields Fp . Equation (1) then becomes:
                                                                     Algorithm 2 Montgomery ladder algorithm
                         E : Y 2 = X 3 + aX + b                  (2) Require: k = (k ,..., k1 , k0 )2 , P ∈ E(Fp )
                                                                                      d−1
     where a, b ∈ Fp . The multiplication of the scalar k by Ensure: Q = k·P
     the point P of the curve is another point Q s.t Q = k · P          R0 = 0/
     = P + P + . . . + P (k times).                                     R1 = P
        Two operations are necessary to perform a scalar mul-           for i ← d-1 downto 0 do
     tiplication: the point doubling P + P = 2P = P’ and the              b ← ki
     point addition P + P’. The computations of doubling and              R1−b ← R1−b + Rb
     addition are given by formulas (3) and (4) below. Let P              Rb ← 2 × Rb
     (X1 , Y1 ) and Q (X2 , Y2 ) be two points of the elliptic curve    end  for
     s.t P, Q ∈ E (F ) and P 6= Q : P + P = 2P = (X , Y ) is            return   R0 ;
                       p                                       3   3
     computed as follows:
                                         2
                                3X12 + a
                                   
                            X3 =            − 2X1               Algorithm 3 Joye Dbl-and-add algorithm
                                  2Y1
                        2      2                          (3) Require: k = (kd−1 ,..., k1 , k0 )2 , P ∈ E(Fp )
                        3X1 + a                                 Ensure: Q = k·P
                  Y3 =             (X1 − X3 ) −Y1
                          2Y1                                     R0 = 0/
                                                                  R1 = P
     and P + Q = (X4 , Y4 ) is the result of an addition of the
                                                                  for i ← 0 to d-1 do
     two points P and Q is computed as follows:
                                                                    b ← ki
                              Y2 −Y1 2                              R1−b ← 2 × R1−b + Rb
                                     
                    X4 =                 − X1 − X2                end for
                             X2 − X1
                                                            (4)   return R0 ;
                        Y2 −Y1 2
                                
                Y4 =                 (X1 − X4 ) −Y1
                       X2 − X1


International Conference on Advanced Aspects of Software Engineering                                                             Page 71
ICAASE, December, 01-02, 2018
    Comparison of scalar multiplication algorithms in a low resource device                                    ICAASE'2018


                                                                   Doubling algorithm, a P + Q point addition by Jacobian
     Table 1: Calculation of point doubling and addition op-
                                                                   Addition and a Doubling-Addition operation (DA) 2P
     erations with Jacobean coordinates
                                                                   + Q by the Mixed Jacobian Affine algorithm. This last
                        P+Q                       2P               algorithm uses mixed coordinates; a point in Jacobian
           A                2
                        X1 Z2                  4X1Y1 2             coordinates and another point in Affine coordinates. In
           B                2
                        X2 Z1                   2
                                            3X1 + AZ1    4         co-Z Jacobian Formulae, a point doubling is computed
           C                3
                        Y1 Z2                      /               by the Initial Doubling algorithm (XYcZ-IDBL), an ad-
           D                3
                        Y2 Z1                      /               dition by (X;Y)-only co-Z addition (XYcZ-ADD) and
           E            B–A                        /               a DA by (X;Y)-only co-Z Doubling-Addition (XYcZ-
           F            D–C                        /               DA).
          X3        3
                  -E – 2AE + F3    2          -2A + B  2

          Y3 -CE 3 + F(AE 2 – X3 ) -8Y14 + B(A – X3 )              4 Software implementation
          Z3           Z1 Z2 E                  2Y1 Z1
                                                                   The various techniques of computational acceleration
                                                                   for point doubling and addition are implemented on
     3 Related work                                                the different scalar multiplication algorithms presented
                                                                   above in order to compare the performance of each al-
     The complexity of the computations in scalar multipli- gorithm with each technique. The algorithms obtained
     cation of a point and the large number of point doubling and used as support for our comparisons are presented
     and point addition calculated to perform this operation in the following:
     required a lot of optimization works reduce computa-
     tions. In this section, we present some algorithms that
                                                                      • Standard Dbl-and-add with Jacobian doubling and
     allow computation accelerations in a scalar multiplica-
                                                                        Jacobian addition (see algorithm 4)
     tion, but before that, we will focus first on the choice of
     coordinate systems used.
        In the Affine coordinate system, the addition and dou-        • Standard Dbl-and-add with co-Z Initial doubling
     bling formulas involve point inversion operations in Fp            and co-Z addition (see algorithm 5)
     which is considered very expensive on the finite fields.
     In order to avoid the inversion of points [KS17], we             • Montgomery ladder with Jacobian doubling and
     chose to work on the Jacobian coordinates where a point            Jacobian addition (see algorithm 6)
     of the curve is represented by three coordinates (X : Y :
     Z) which corresponds to the Affine point ( ZX2 , ZY3 ). Equa-    • Montgomery ladder with co-Z Initial doubling and
     tion (2) then becomes:                                             co-Z addition (see algorithm 7)

                      E : Y 2 = X 3 + aXZ 4 + bZ 6                   (5)      • Joye Dbl-and-add with Mixed Jacobian Affine
     The computation of the doubling P + P = 2P = (X3 : Y3                      Doubling-Addition (DA) (see algorithm 8)
     : Z3 ) and the addition P + Q = (X3 : Y3 : Z3 ) is given by
     the table 1.                                                  • Joye Dbl-and-add with co-Z Doubling-Addition
        An additional optimization of the addition, called co-       (DA) (see algorithm 9)
     Z addition, has been proposed by Meloni [Mel07] where
     he considers that two entry points share the same Z co-
     ordinate. Let P = (X1 : Y1 : Z) and Q = (X2 : Y2 : Z), the Algorithm 4 Standard Dbl-and-add with Jacobian dou-
     addition co-Z of P and Q (with P 6= Q) is defined by P + bling and Jacobian addition
     Q = (X3 : Y3 : Z3 ) so that :                               Require: k = (kd−1 ,..., k1 , k0 )2 , P ∈ E(Fp )
                               X3 = D − (B +C)                   Ensure: Q = k·P
                                                                   R0 = 0/
                     Y3 = (Y2 −Y1 )(B − X3 ) − E             (6)   R1 = P
                                Z3 = Z(X2 − X1 )                   for i ← 1 to d-1 do
                           2                                   2     if ki = 1 then
     where A = (X2 −X1 ) , B = X1 A, C = X2 A, D = (Y2 −Y1 )             (R0 ) ← Jacobian_addition(R0 ,R1 )
     and E = Y1 (C-B)                                                end if
        In [Riv11], several algorithms are proposed to per-          (R1 ) ← Jacobian_doubling(R1 )
     form a scalar multiplication of a point in Jacobian coor-     end for
     dinates with Standard Jacobian Formulae and co-Z ad-          return R0 ;
     dition Jacobian Formulae. In Standard Jacobian Formu-
     lae, a point doubling 2P is performed by the Jacobian


International Conference on Advanced Aspects of Software Engineering                                                 Page 72
ICAASE, December, 01-02, 2018
    Comparison of scalar multiplication algorithms in a low resource device                                              ICAASE'2018


                                                        Algorithm 9 Joye Dbl-and-add with Mixed Jacobian
     Algorithm 5 Standard Dbl-and-add with co-Z Initial
                                                        Affine DA
     doubling and co-Z addition
                                                        Require: k = (kd−1 ,..., k1 , k0 )2 , P ∈ E(Fp )
     Require: k = (kd−1 ,..., k1 , k0 )2 , P ∈ E(Fp )   Ensure: Q = k·P
     Ensure: Q = k·P                                      R0 = 0/
       R0 = 0/                                            R1 = P
       R1 = P                                             for i ← 0 to d-1 do
       for i ← 1 to d-1 do                                   b ← ki
         if ki = 1 then                                      (R1−b ,Rb ) ← XYcZ-DA(R1−b ,Rb )
             (R1 ,R0 ) ← XYcZ-ADD(R0 ,R1 )                end for
         end if                                           return R0 ;
         (R0 ,R1 ) ← XYcZ-IDBL(R0 )
       end for
       return R0 ;
                                                                              5   Results and comparison
                                                                              The results presented in this comparison are the result of
     Algorithm 6 Montgomery ladder with Jacobian dou-                         experiments carried out on a low-resource device with a
     bling and Jacobian addition                                              very low computing capacity and a very limited mem-
     Require: k = (kd−1 ,..., k1 , k0 )2 , P ∈ E(Fp )                         ory size. The device used is an Arduino Uno R3 board
     Ensure: Q = k·P                                                          whose characteristics, shown in Table 2, are close to
        R0 = 0/                                                               most embedded devices. The curves used are those rec-
        R1 = P                                                                ommended in [Qu99]. The size of the keys used is 192
        for i ← d-1 downto 0 do                                               and 256 bits.
          b ← ki
          (R1−b ) ← Jacobian_addition(R1−b ,Rb )
                                                                                       Table 2: Features of Arduino Uno R3
          (Rb ) ← Jacobian_doubling(Rb )
                                                                                             Feature          Type/Value
        end for
        return R0 ;                                                                      Chipset               Atmega328P
                                                                                         Clock Frequency        8/16 Mhz
     Algorithm 7 Montgomery ladder with co-Z Initial dou-              Voltage              1.8 - 5.5 V
     bling and co-Z addition                                           Bit Width               8 bits
     Require: k = (kd−1 ,..., k1 , k0 )2 , P ∈ E(Fp )
     Ensure: Q = k·P                                                   Static RAM               2 kB
        R0 = 0/                                                        Program Memory          32 kB
        R1 = P
        for i ← d-1 downto 0 do
          b ← ki                                             Table 3 shows the execution time (in ms) of point
          (R1−b ,Rb ) ← XYcZ-ADD(Rb ,R1−b )               doubling,    point addition and doubling-addition opera-
          (Rb ,R1−b ) ← XYcZ-IDBL(Rb )                    tions in  Standard  Jacobian Formulae and co-Z Jacobian
        end for                                           Formulae.
        return R0 ;                                          We find that the addition and doubling-addition oper-
                                                          ations in co-Z Jacobian Formulae are faster and require
                                                          less computations, unlike the Initial Doubling operation
     Algorithm 8 Joye Dbl-and-add with Mixed Jacobian where the treatments are heavier than Standard Jacobian
     Affine DA                                            Formulae.
     Require: k = (kd−1 ,..., k1 , k0 )2 , P ∈ E(Fp )        The execution time of the scalar multiplication algo-
     Ensure: Q = k·P                                      rithms (algorithm 4 to algorithm 9) is given by table 4
        R0 = 0/                                           and 5.
        R1 = P                                               The standard Dbl-and-add algorithm gives better re-
        for i ← 0 to d-1 do                               sults, its resource consumption is very low because the
          b ← ki                                          number of point addition is optimized to log(d/3) un-
          (R1−b ) ← Mixed_Jacobian_Affine_DA(R1−b ,Rb )   like the other two algorithms where at each iteration,
        end for                                           an addition is performed regardless of the value of the
        return R0 ;                                       bit. However, its vulnerability to SPA and DPA attacks
                                                          makes it almost inappropriate. In the other results, we


International Conference on Advanced Aspects of Software Engineering                                                             Page 73
ICAASE, December, 01-02, 2018
    Comparison of scalar multiplication algorithms in a low resource device                                              ICAASE'2018


                                                                              Table 5: Execution time of the scalar multiplication al-
                                                                              gorithms (P-256)
                                                                                                                     P-256
     Table 3: Execution time (in ms) of point doubling, ad-                            Algorithm
                                                                                                          Standard       co-Z Jaco-
     dition and doubling-addition operations                                                              Jacobian       bian
           Formulae         Algorithm          P-192       P-256
                                                                                                          Algorithm      Algorithm
                          Point Doubling (2P)                                     Standard Dbl-and-add    4              5
           Standard        Jacobian            6.064      11.732                                          4770           5724
           Jacobian        Doubling
                                                                                                          Algorithm      Algorithm
           co-Z Jaco-      XYcZ-               9.708      18.592                  Montgomery ladder       6              7
           bian            IDBL
                                                                                                          7471           6630
                        Point Addition (P + Q)
                                                                                                          Algorithm      Algorithm
           Standard        Jacobian           10.056      19.332                  Joye Dbl-and-add        8              9
           Jacobian        Addition
                                                                                                          7815           5528
           co-Z Jaco-      XYcZ-ADD            3.628       6.916
           bian
                                                                              notice that the algorithm 9 gives very interesting results
                     Doubling-Addition (2P + Q)
                                                                              with a reduced execution time of 428 ms compared to
           Standard        Mixed DA           15.501      29.712              the algorithm 7 for a 192-bit key and 1102 ms for a 256-
           Jacobian                                                           bit key.
           co-Z Jaco-      XYcZ-DA            10.916      20.808                 the graph below (see figure 1) gives a global compar-
           bian                                                               ison of all implemented algorithms.


     Table 4: Execution time of the scalar multiplication al-
     gorithms (P-192)
                                                  P-192
               Algorithm                                                      Figure 1: A global comparison of implemented algo-
                                      Standard          co-Z Jaco-
                                                                              rithms
                                      Jacobian          bian
                                      Algorithm         Algorithm
        Standard Dbl-and-add          4                 5
                                                                              6    Conclusion
                                      1841              2260
                                                                              In this work, we compared the different computational
                                      Algorithm         Algorithm             acceleration techniques for scalar multiplication by a
        Montgomery ladder             6                 7                     point on an embedded system in order to compare them
                                      2960              2641                  and to distinguish the best method of computing a scalar
                                                                              multiplication in a low-resource system. The hardware
                                      Algorithm         Algorithm             used is an Arduino embedded system with a microcon-
        Joye Dbl-and-add              8                 9                     troller limited in resources including memory and com-
                                      3085              2213                  puting power. We used Jacobian coordinate system with
                                                                              Standard Jacobian Formulae and co-Z Jacobian Formu-
                                                                              lae, an improved version of the addition called co-Z ad-
                                                                              dition, since this type of coordinates provides less com-
                                                                              plex and more efficient computation operations. We


International Conference on Advanced Aspects of Software Engineering                                                             Page 74
ICAASE, December, 01-02, 2018
    Comparison of scalar multiplication algorithms in a low resource device                              ICAASE'2018


     have implemented acceleration techniques on three al- [KS17]             K Keerthi and B Surendiran. Elliptic
     gorithms widely used in practice: the standard Dbl-and-                  curve cryptography for secured text en-
     add algorithm, the Montgomery ladder algorithm and                       cryption. In Circuit, Power and Comput-
     the Joye Dbl-and-add algorithm. We found that the Joye                   ing Technologies (ICCPCT), 2017 Inter-
     Dbl-and-add algorithm widh co-Z Doubling-Addition                        national Conference on, pages 1–5. IEEE,
     gives better results if we take into account the level of                2017.
     security offered, unlike the standard algorithm Dbl-and-
     add which offers a better time but it is very vulnerable [LS01]          P-Y Liardet and Nigel P Smart. Prevent-
     to SPA and DPA attacks. The time saved for a 256-bit                     ing spa/dpa in ecc systems using the ja-
     key is more than one second (1102 ms) for scalar multi-                  cobi form. In International Workshop on
     plication using the Joye Dbl-and-add algorithm coupled                   Cryptographic Hardware and Embedded
     with the technique co-Z Doubling-Addition compared                       Systems, pages 391–401. Springer, 2001.
     to other algorithms implemented in this work.             [Mel07]        Nicolas Meloni. New point addition for-
                                                                              mulae for ecc applications. In Inter-
     References                                                               national Workshop on the Arithmetic of
     [BJ02]           Eric Brier and Marc Joye. Weierstraß                    Finite Fields, pages 189–201. Springer,
                      elliptic curves and side-channel attacks.               2007.
                      In International Workshop on Public Key [MKY16]         Dindayal Mahto, Danish Ali Khan, and
                      Cryptography, pages 335–345. Springer,                  Dilip Kumar Yadav. Security analysis of
                      2002.                                                   elliptic curve cryptography and rsa. In
                                                                              Proceedings of the World Congress on
     [DSF16]          Giovanni Di Sirio and Giovanni Fontana.                 Engineering WCE, volume 1, 2016.
                      Method for protecting a cryptographic de-
                      vice against spa, dpa and time attacks, [Mon87]         Peter L Montgomery. Speeding the pol-
                      August 30 2016. US Patent 9,430,188.                    lard and elliptic curve methods of fac-
                                                                              torization. Mathematics of computation,
     [FGDM+ 10] Junfeng Fan, Xu Guo, Elke De Mulder,                          48(177):243–264, 1987.
                Patrick Schaumont, Bart Preneel, and In-
                grid Verbauwhede. State-of-the-art of se- [MS00]              Rita Mayer-Sommer. Smartly analyzing
                cure ecc implementations: a survey on                         the simplicity and the power of simple
                known side-channel attacks and counter-                       power analysis on smartcards. In Interna-
                measures. In Hardware-Oriented Secu-                          tional Workshop on Cryptographic Hard-
                rity and Trust (HOST), 2010 IEEE In-                          ware and Embedded Systems, pages 78–
                ternational Symposium on, pages 76–87.                        92. Springer, 2000.
                IEEE, 2010.                               [Qu99]              Minghua Qu. Sec 2: Recommended el-
                                                                              liptic curve domain parameters. Certicom
     [Gui13]          Aurore Guillevic. Arithmetic of pairings
                                                                              Res., Mississauga, ON, Canada, Tech.
                      on algebraic curves for cryptography.
                                                                              Rep. SEC2-Ver-0.6, 1999.
                      PhD thesis, Ecole Normale Supérieure de
                      Paris-ENS Paris, 2013.                   [Riv11]        Matthieu Rivain. Fast and regular algo-
                                                                              rithms for scalar multiplication over el-
     [Joy07]          Marc Joye. Highly regular right-to-left al-             liptic curves. IACR Cryptology ePrint
                      gorithms for scalar multiplication. In In-              Archive, 2011:338, 2011.
                      ternational Workshop on Cryptographic
                      Hardware and Embedded Systems, pages
                      135–147. Springer, 2007.

     [KJJ99]          Paul Kocher, Joshua Jaffe, and Benjamin
                      Jun. Differential power analysis. In
                      Annual International Cryptology Confer-
                      ence, pages 388–397. Springer, 1999.

     [Knu97]          Donald Ervin Knuth. The art of com-
                      puter programming: sorting and search-
                      ing, volume 3. Pearson Education, 1997.


International Conference on Advanced Aspects of Software Engineering                                            Page 75
ICAASE, December, 01-02, 2018