=Paper=
{{Paper
|id=Vol-3179/Short_2.pdf
|storemode=property
|title=Algorithm for Increasing the Stability Level of Cryptosystems
|pdfUrl=https://ceur-ws.org/Vol-3179/Short_2.pdf
|volume=Vol-3179
|authors=Roman Kvyetnyy,Yaroslav Ivanchuk,Andrii Yarovyi,Yurii Horobets
|dblpUrl=https://dblp.org/rec/conf/iti2/KvyetnyyIYH21
}}
==Algorithm for Increasing the Stability Level of Cryptosystems==
Algorithm for Increasing the Stability Level of Cryptosystems Roman Kvyetnyy, Yaroslav Ivanchuk, Andrii Yarovyi and Yurii Horobets Vinnytsia National Technical University, Khmelnytsky highway 95, Vinnytsia, 21021, Ukraine Abstract The article considers the types of DSA and ECDSA cryptosystems affected by side channel attacks. A method for identifying the type of impact based on lattice attacks has been developed, which showed that if there are common bits in ephemeral keys, the sender's private key can be restored in polynomial time, if there are the required number of messages, depending on the number of common bits. As countermeasure this attack an ephemeral key generation algorithm was developed, using a deterministic signature, to increase the crypto- resistance of systems to lattice attacks, and a modular inverse algorithm was improved to perform the operation at a constant time to prevent sidechannel attack on modular inverse operation, which using for generation signature. Keywords 1 lattice, shared bits, modular inverse, constant-time, deterministic signature, cryptosystems 1. Introduction Cryptosystem is the implementation of cryptographic methods and their infrastructure for the provision of information security services [1]. Cryptosystems convert source data into unreadable form using encryption and decryption keys, which in turn allows for confidentiality of information. At the heart of any encryption system is the availability of information on the exchange of plaintext data only from the sender and recipient. Depending on the method of encrypting and decrypting information, cryptosystems are divided into two types [1]: 1. A symmetric encryption system is based on the use of the same type of key, the information on the use of which is agreed by the parties before the data session. For example, cryptosystems of this type: DES, Triple-DES, BLOWFISH, IDEA, AES [2]. The main disadvantage of this encryption system is the complexity of key management in large networks. 2. The asymmetric encryption system uses different types of keys, which are interconnected by a certain mathematical dependence. This approach allows you to get the original plaintext by using the inverse functions of mathematical operations. The main representatives of these cryptosystems encryption are DSA, DSS, RSA, ECDSA, El-Gamal [3 – 5]. The main disadvantages of these encryption systems are the availability of information about the public key of the sender and recipient when creating an encrypted message; the level of security of encryption systems is proportional to the level of complexity of operations based on various mathematical problems (integer factorization complexity, discrete logarithm problem [6]). For some encryption systems, an ephemeral key (nonce) is used, which is needed to add more complexity to the encryption algorithm, the main requirement for it, the ephemeral key must be used only once. For DSA, ElGamal, ECDSA encryption systems - when creating encrypted messages, an ephemeral key is additionally used, which in turn leads to attacks on side channels, FLUSH+RELOAD ephemeral key data, attacks on the pseudo-random number generator [7], using oscillography for timings attack [8], using masked digital signal [9]. All of these attack types result in the substitution or interception of individual bits of ephemeral key data, giving third parties access to encrypted messages, which in turn compromises the sender's secret key. A particularly dangerous cryptoattack types are side channel attack, which allow, by analyzing the time of generation of the ephemeral key, to find the most Information Technology and Implementation (IT&I-2021), December 01–03, 2021, Kyiv, Ukraine EMAIL: rkvetny@sprava.net (A. 1); ivanchuk@ukr.net (A. 2); a.yarovyy@gmail.com (A. 3); yuriy.sparrow@gmail.com (A. 4) ORCID: 0000-0002-9192-9258 (A. 1); 0000-0002-4775-6505 (A. 2); 0000-0002-6668-2425 (A. 3); 0000-0002-0483-1042 (A. 4) ©️ 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org) 293 or least significant bits of the ephemeral key. Then using special methods the sender's private key is searched. Articles [11, 12] show a crypto attack on a DSA-type encryption system, in which the sender's private key can be found using methods for solving number theory problems, namely the problem of hidden numbers, to solve which uses the method of short vectors or the method of close vectors. by collecting from the n-number of encrypted messages less and / or more significant bits of the ephemeral key, using a side channel attack, but this attack is not possible if the numerical value of these bits is unknown.Scientific research [12] investigates the cryptoattack type on the private key of the sender using the method of fast Fourier transformation, which allows you to effectively search for a private access key among a large number of encrypted messages. The high efficiency of this crytoattack leaves open the question of developing a reliable method of protecting systems from such cryptoattack. The article [14, 15] reveals the issue of protection against side channel attack, using the method of constant-time addition of numbers when creating encrypted messages. Using this method requires significant system resources, which generally slows down the process of generating encrypted messages. Therefore, it is important to increase the level of resilience of cryptosystems to various sidechannel attack, which will increase the overall level of security of data encryption when exchanging information. The aim of the work is to increase the degree of protection of the process of information exchange in system networks on the basis of the developed effective data encryption algorithm. To achieve this goal it is necessary to determine the vulnerability type of cryptosystems to sidechannel attack, as well as to develop an effective method protection cryptosystems from sidechannel attack to prevent unauthorized access to encrypted information. 2. A summary of the main material To identify a potentially dangerous type attack on private key, in based on the DSA encryption algorithm (ECDSA), it is necessary to analyze the mathematical description of this algorithms. To use the DSA algorithm, the signer selects two prime numbers p and q, and generator g, where p – prime number of size between 512 and 1024 bits, with increments of 64 bits; q – prime number, recommended by NIST standards [14] with size at least 160 bits, and corresponds to equality 2 N 1 q 2 N , where N {160, 224, 256} , and q | p 1 ; g – generator of order q subgroup G of finite field Fp abelian group. Furthermore, he selects randomly a {1,..., q 1} and computes А g a mod p . The public key of the signer is ( p, q, g , A) and his private key a. He also publishes a hash function h :{0,1} {0,..., q 1}, for mapping message into range {1, , q 1} . * To sign a message m, signer selects randomly k {1,..., q 1} , which is the ephemeral key (or nonce), and computes: r ( g k mod p) mod q; s k 1 (h(m) ar ) mod q . (1) The signature of m is the pair (r, s). The signature (1) is valid if and only if: 1 1 r (( g s h( m)mod q As r mod q ) mod p) mod q . (2) For the ECDSA the signer chooses an elliptic curve E over finite field F p , a point P E (F p ) (respectively to DSA is generator) with prime order q with size at least 160 bits. Appropriately, to FIPS 186-3 binary length of prime number p must be in set {160, 224, 256, 512}. Furthermore, for randomly chosen a {1, ..., q 1} computes Q aP . The public key of signer is ( E , p, q, P, Q) and his private key a. Also, signer publishes hash function h :{0,1}* {0,..., q 1} . To sign a message m, signer chooses random number k {1, ..., q 1} , computes kP ( x, y ) (where x and y is integers in range {0, , p 1} . Further, computes the signature of message m, is pair of integers (r, s): r x mod q; s k 1 (h(m) ar ) mod q . (3) For verification the signatures, receiver computes: u1 s 1h(m) mod q, u2 s 1r mod q, u1P u2Q ( x0 , y0 ) . (4) 294 The signature (3) is valid if and only if, when r x0 mod p . The security of the two systems s relied on the assumption that the only way to forge the signature is to recover either the secret key a, or the ephemeral key k. Thus, the parameters of these systems were chosen in such a way that the computation of discrete logarithms is computationally infeasible. To perform a potentially dangerous crypto attack by an attacker, which will reveal the signer private key, an attacker can block some bits of register or memory that contain the value of the ephemeral key k, while the value of the blocked bits is unknown. Assume that an attacker was able to collect n-number encrypted messages mi (i=1, …, n) with the associated signatures (ri, si), where all the corresponding ephemeral keys ki shared total ẟ bits between more significant (MSB) and less significant (LSB) bits independent of i. In this way, they will meet the following dependency for all i=1, ..., n: ki k 2t ki 2t ' k ' . (5) where 0 k 2t , 0 k ' 2 N t ' , t ' t , 0 ki 2 N , with k and k´ common for all ki. Figure 1 shows a diagram of the placement of bits in the ephemeral key. Figure 1: Scheme of the ephemeral key k – numerical value of the total more significant bits (MSB), k′ – numerical value of common less significant bits (LSB), k i - numeric value of the current bits, t – number of common more significant bits, t′ – number of common less significant bits It should be noted that the values of the variables ki , k , ki , k ' are unknown. In the n of equations (4), which define the signature: m1 ar1 s1k1 0 (mod q); m ar s k 0 (mod q); 2 2 2 2 (6) mn arn sn kn 0 (mod q), where mi – message, a – signer private key, ri – numerical value of the first part of the signature, si – numerical value of the second part of the signature, ki – ephemeral key, i-message index. If in the system of equations (6) replace the parameters mi, ri, si with the value (5) and eliminate common variables k and k′, then we obtain: ( s11m1 s2 1m2 ) a( s11r1 s2 1r2 ) 2t (k1 k2 ) 0(mod q); 1 1 1 1 ( s1 m1 s3 m3 ) a( s1 r1 s3 r3 ) 2 (k1 k3 ) 0(mod q); t (7) ( s 1m s 1m ) a( s 1r s 1r ) 2t (k k ) 0(mod q). 1 1 n n 1 1 n n 1 n Let i , i i Z be such that: i : 2t ( s11m1 si 1mi ) mod q; i : 2 ( s1 r1 si ri ) mod q; t 1 1 (8) i : k1 ki . Then, system of equalities (7) becomes: 295 2 a 2 k2 0 (mod q); a 3 0 (mod q); (9) a 0 (mod q), n n n where a, κi and αi, βi – are known value and unknown value. The set of solutions: L {( x0 , x1 ,..., xn ) Z n 1 | x0 i x1i xi 0 (mod q)} , (10) forms an (n+1)–dimension lattice spanned by the row vectors of the following basis matrix: 1 0 n 0 1 n M 0 0 q 0 . (11) 0 q 0 0 0 To find the short vector in the matrix (11), LLL or BKZ algorithms [17, 18] can be used. If the system of equations (7) has a solution ((determining the value of a short vector 0 with elements v0 (1, a, n ) , which will be identical to the element of the system of solutions (10), then the value of the signer’s private key a will be obtained. However, the norm 0 is lower bounded by secret key a, which can be an integer of approximately N bits. Then the next element of the vector 0 is much larger than the next ones, which are N-ẟ integers. Vector 0 there is no ground to be a short vector L, when signer private key a is upper-bounded. Therefore, assuming that private key a, smaller than 2N , then, it is necessary to adapt the method described above to find the signer’s private key a length of N bits, to solve this problem can be used one of methods described below: If the value ẟ is small (no more then 8) , then the method goes to the previous, with searching for N N most significant bits of a, then we have a a 2 a ' , where a 2 and a ' 2 . Then, in (8) αi changing to i 2 t ( s11m1 si 1mi ) a '2 t ( s11r1 si 1ri ) . Removing the second column from the matrix (11), then get the matrix: 1 n 0 n M 0 q ' 0 , n 2. (12) 0 0 q 0 But this method requires more messages and larger matrix that negatively smashes the speed of work. Using a weighted Euclidean product of vectors to get a vector v0 in the basis of reduced algorithms using knowledge about the size of the target vector ν0. For example, we take a weighted Euclidian product while performing reduction of basis: n ( x0 ,..., xn ), y0 ,..., yn ) : xi yi 2 2( N [log2 ( v0 ,i )]) , (13) i 0 The use of this method reduces the number of common bits ẟ. Weights can be used without changing the norm that in reality corresponds to the multiplication of all columns to the corresponding weight. The previous method is used when the ephemeral keys are shared by multiple bits blocks. Now assume that we have n messages mi (i = 1, …, n) with the associated signatures (ri, si), where ephemeral keys ki share a total ẟ bits distributed between l block of bits. Denote by ẟi the number of bits i-th block at 296 position pi . Let t = (t1, …, tl) be a l-set of integers, then set 2t (2t1 , ..., 2tl ) . Then ephemeral key ki has form: ki 2 p b 2 t k i , (14) where b=(b1, …, bi) is the vector of shared bits blocks, p=(p1, …, pl) is vector of position a l blocks, ki=(ki,0, …, ki,l) the vector of no shared bits blocks at positions t = (t0, …, tl). On figure 2, depicts the scheme of location of bits blocks. Figure 2: Scheme of bits location After stating that t : 0 and pl 1 : N , it follows that for all i =1, …, n і and for all j = 1, …, n: t j p j j , j , 0 b j 2 j , 0 ki , j 2 j1 j . (p t ) (15) j Values of ki , ki , j , b j are unknown. In signatures (7), substitute ki by (15) and eliminate the common variable b: ( s11m1 s2 1m2 ) a ( s11r1 s2 1r2 ) l 2t j (k1, j k 2, j ) 0 (mod q ); j 0 ( s 1m s 1m ) a ( s 1r s 1r ) l 2 j (k1, j k3, j ) 0 (mod q ); t 1 1 3 3 1 1 3 3 j 0 (16) 1 ( s1 m1 sn 1mn ) a ( s11r1 sn 1rn ) j 0 2 j (k1, j kn , j ) 0 (mod q ). l t Let i , i Z та i , t Z l are those: i : s11m1 si 1mi mod q; i : s1 r1 si ri mod q; 1 1 (17) i : (k1,1 , ...., k1,l ) (ki ,1 , , ki ,l ); t : (t , ...., t ), 1 l then (16) becomes: 2 a 2 2t 2 k1,0 k2,0 (mod q ); a 2 3 k1,0 k3,0 (mod q ); t (18) a 2t k k (mod q ), n n n 1,0 n ,0 where a and κi – unknown αi and βi known. Embedding this equation in a lattice L, spanned by row vectors of the following basis matrix. I l ( n1)2 2 n 2 n tI I l ( n1)2 2 1 ( n1) (19) M . I l ( n1)2 I 2 t1 I ( n1) l ( n1)2 0 qI ( n 1) 297 Then i , i , i Z be this form: i : ( s11m1 si 1mi ) mod q; i : ( s1 r1 si ri ) mod q; 1 1 (22) : k k . i 1,1 i ,1 In this case, (18) be this: 2 a 2 2t1 2 k1,0 k2,0 (mod q ); a 2 1 3 k1,0 k3,0 (mod q ); t (23) a 2t1 k k (mod q ). n n n 1,0 n ,0 Then, lattice L spanned by row vectors of the following basis matrix be such that: 1 0 0 0 2 an 0 1 0 0 2 n 0 0 1 0 2t1 0 M 0 0 0 t1 (24) 1 0 2 0 0 0 0 q 0 0 0 0 0 0 q Target element L is a vector: v0 (1, a, 2 ,..., n , k1,0 k2,0 ,..., k1,0 kn,0 ) (1, a, 2 ,...., n , 2 ,...., n ) M . (25) To search for a private key a length of N bits, can also apply the methods described above. The developed approach to obtaining access to private keys from signed messages proves that the DSA (ECDSA) algorithm is vulnerable to side-channel attack, in case when the ephemeral keys of each encrypted message have shared bits in less and/or more significant bits. Figure 3 shows the relationship between the number of shared bits and the number of necessary messages to the success rate. Figure 3: Diagram of dependence between the number of shared bits and the number of messages necessary to the percentage of the probability attack success 298 To achieve this goal, a function for generating an ephemeral key in the form of a pseudocode, which will receive a message (m) and private key of the sender (d), is proposed – algorithm 1 (Deterministic Ephemeral Key Generation Algorithm): function ( EC ) DSA _ SIGN (m, d ) e H ( m) repeat repeat u HMAC _ SIZE (d , e) ( x, y ) uP E (F p ) # for ECDSA r x mod q # for ECDSA r ( g k mod p ) mod q # for DSA until r 0 s u 1 (e dr ) mod q until s 0 return (r , s). The feature of the represented function for ephemeral key generation, from others, is the use of an HMAC algorithm, which allows you to generate more reliable random numbers, and prevents attacks by side channels. Due to the use of additional hash function that hides bits of ephemeral key from a potential attack, when other algorithms are presented in various cryptographic libraries use direct operations with the bits of the ephemeral key in the memory registers, which leads to attacks on side channels. The use of a new generation algorithm does not require changes to the message verification procedure, and will provide protection against attacks by side channels, thanks to the use of additional hash function that follows the integrity of the data, and mixes the ephemeral key for greater randomness. Also, one of the vulnerable points in the creation of signatures (1), (3) is modular inverse operation in the – algorithm 2 (Not constant-time modular inverse algorithm [17]): Input: a, b Z0 with gcd(a, b) 1,0 b a , (b1 2k mod a, k ) Output: [log 2 (a)] k 2[log 2 (a)]. u a, v b, r 0, s 1, k 0 while v 1do if u 0 (mod 2) then u u / 2, s 2 s else if v 0 (mod 2) then v v / 2, r 2 r else if u v then u (u v ) / 2, r r s, s 2 s else v (v u ) / 2, s r s, r 2 r end if k k 1 end while return ( s, k ). The execution time of the algorithm 2 depends on both values a and b, namely the number of iterations of the cycle while, namely the number of iterations of the While cycle, as well as the degree 299 of two, which should be removed in the end of the algorithm, are significantly different for input data. Therefore, to convert an algorithm 2 into an algorithm performed by a constant time for a given wn bit module m, it is necessary to save the following requirements: 1. One iteration is always calculated by the same amount of time, it means to calculate all four branches from the algorithm 2 and select the correct values for a constant time. This ensures that the time of execution of one iteration does not depend on the branch taken, but means that the calculation time increases until the calculation of all branches; 2. The algorithm must perform the one-time number of iterations (for a constant time), which means that it is necessary to calculate the worst number of 2wn iterations. This can be realized by determining when the algorithm 2 is completed (when v=1). Depending on this condition, we create a bit mask and choose the input value for this iteration (when v=1), or the value, calculation with iteration with constant time ( v 1 ). After making modifications in the algorithm 2, we will receive a new algorithm 3 that will perform a modular inverse operation for a constant time – algorithm 3 (Constant-time algorithm of modular inverse): Input: a, b Z0 with gcd(a, b) 1,0 b a , (b1 2k mod a, k ) Output: [log 2 (a)] k 2[log 2 (a)]. u a, v b, r 0, s 1, k 0 S S bitflip (m3 ) for i 1to 2[log 2 (a )] do m 5 S (0 uv < ) uv < sub(u ', u, v) m 6 bitflip (m 5 ) uv = equal (u ', 0) rshift1 (u ', u ') # else if u v 0 if u v select (u , u ', m 6 , u , m 5 ) d 0 uv = # d 2 1if u v select (r , rs, m 6 , r , m 5 ) lshift1 ( s , s) select (r , s , m 6 , s, m 5 ) add (rs, r , s ) rshift1 (u, u ) S S bitflip (m 5 ) m1 d (0 (u 0 1)) # if u 0 m 7 bitflip (S) m 2 bitflip(m1 ) sub(v , v, u ) select (u, u, m 2 , u, m1 ) rshift1 (v , v ) # else select ( s, s , m 2 , s, m1 ) select (v, v , m 7 , v, S) lshift1 (r , r ) select ( s, rs, m 7 , s, S) S (d bitflip(m1 )) select (r , r , m 7 , r , S) m 3 S 0 ( v 0 1) m 4 bitflip(m 3 ) # else if v 0 k ((k d) ((k 1) bitflip (d))) rshift1 (v , v) end for select (v, u, m 4 , v, m 3 ) return ( s, k ). select (r , r , m 4 , r , m 3 ) 3. Conclusions The vulnerability type of cryptosystems is determined for side channel attack, which is concluded in blocking the bits of the ephemeral key or determining the signature time of each message. Using the developed method, the signers private key can be determined in the presence of a sufficient number of 300 messages in less than a minute, bypassing a discrete logarithm problem, which in turn fully compromises the stability of cryptosystems. Was developed a effective method for protecting cryptosystems to attack by side channels to prevent unauthorized access to encrypted information, based on a new ephemeral key generation algorithm to prevent attacks by side channels, which increases the count a mathematical operation to 2N. The use of the developed constant time algorithm of modular inverse, provides a constant number of clock cycles, namely, the number of bits of 256, the number of cycles will not be constant 50-60. When using this algorithm, the number of clock cycles is always 486, which provides resistant to the attack. 4. References [1] A. J. Menezes, P. C. Van Oorschot, and S. A. Vanstone, Handbook of applied cryptography. 1996. [2] M. Ubaidullah and Q. Makki, “A Review on Symmetric Key Encryption Techniques in Cryptography,” Int. J. Comput. Appl., vol. 147, no. 10, 2016, doi: 10.5120/ijca2016911203. [3] T. Elgamal, “A Public Key Cryptosystem and a Signature Scheme Based on Discrete Logarithms,” IEEE Trans. Inf. Theory, vol. 31, no. 4, 1985, doi: 10.1109/TIT.1985.1057074. [4] M. Kirschmer, J. Voight, Algorithmic enumeration of ideal classes for quaternion orders, SIAM J. Comput. 39 (2010) 1714–1747. URL: http://dx.doi.org/10.1137/080734467. doi:10.1137/080734467. [5] Roman N. Kvyetnyy, Yevhenii A. Titarchuk, Volodymyr Y. Kotsiubynskyi, Waldemar Wójcik, Nursanat Askarova. Partially homomorphic encryption algorithm based on elliptic curves.- Proc. SPIE 10808, Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2018, 108082H (1 October 2018); 8 p. doi: 10.1117/12.2501583; https://doi.org/10.1117/12.2501583. [6] P. Q. Nguyen and P. Q. Nguyen, “Public-key Cryptanalysis,” 2008. [7] P. A. Fouque, M. Tibouchi, and J. C. Zapalowicz, “Recovering private keys generated with weak prngs,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2013, vol. 8308 LNCS, doi: 10.1007/978-3- 642-45239-0_10. [8] Y. Ivanchuk. Mechatronic Systems II. Applications in Material Handling Processes and Robotics, edited by Leonid Polishchuk, Orken Mamyrbayev, Konrad Gromaszek, (2021), Taylor & Francis Group, CRC Press, Balkema book Boca Raton, London, New York, Leiden, 352 P. ISBN 978-1- 032-10585-7, DOI: 10.1201/9781003225447. [9] A. A. Yarovyy, L. I. Timchenko, N. I. Kokriatskaia, "Parallel-Hierarchical Computing System for Multi-Level Transformation of Masked Digital Signals," Advances in Electrical and Computer Engineering, vol.12, no.3, pp.13-20, 2012, doi:10.4316/AECE.2012.03002. [10]D. Boneh and R. Venkatesan, “Hardness of computing the most significant bits of secret keys in diffie-hellman and related schemes,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 1996, vol. 1109, doi: 10.1007/3-540-68697-5_11. [11] N. A. Howgrave-Graham and N. P. Smart, “Lattice Attacks on Digital Signature Schemes,” Des. Codes, Cryptogr., vol. 23, no. 3, 2001, doi: 10.1023/A:1011214926272. [12] E. De Mulder, M. Hutter, M. E. Marson, and P. Pearson, “Using Bleichenbacher’s solution to the hidden number problem to attack nonce leaks in 384-bit ECDSA: Extended version,” J. Cryptogr. Eng., vol. 4, no. 1, 2014, doi: 10.1007/s13389-014-0072-z. [13] D. Jayasinghe, R. Ragel, and D. Elkaduwe, “Constant time encryption as a countermeasure against remote cache timing attacks,” 2012, doi: 10.1109/ICIAFS.2012.6419893. [14] P. D. Gallagher and C. Romine, “FIPS PUB 186-4 Digital Signature Standard (DSS),” Encycl. Cryptogr. Secur., no. July, 2013. [15] P. Q. Nguyễn and D. Stehlé, “Floating-point LLL revisited,” in Lecture Notes in Computer Science, 2005, vol. 3494, doi: 10.1007/11426639_13. [16] C. P. Schnorr and M. Euchner, “Lattice basis reduction: Improved practical algorithms and solving subset sum problems,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 1991, vol. 529 LNCS, doi: 10.1007/3-540-54458-5-51. [17] B. S. Kaliski, “The Montgomery Inverse and Its Applications,” IEEE Trans. Comput., vol. 44, no. 8, 1995, doi: 10.1109/12.403725. 301