=Paper=
{{Paper
|id=Vol-2393/paper_363
|storemode=property
|title=Polynomials Multiplier under Irreducible Polynomial Module for High-Performance Cryptographic Hardware Tools
|pdfUrl=https://ceur-ws.org/Vol-2393/paper_363.pdf
|volume=Vol-2393
|authors=Maksat Kalimoldayev,Sahibay Tynymbayev,Miras Magzom,Margulan Ibraimov,Serik Khokhlov,Akmaral Abisheva,Viktoriia Sydorenko
|dblpUrl=https://dblp.org/rec/conf/icteri/KalimoldayevTMI19
}}
==Polynomials Multiplier under Irreducible Polynomial Module for High-Performance Cryptographic Hardware Tools==
<pdf width="1500px">https://ceur-ws.org/Vol-2393/paper_363.pdf</pdf>
<pre>
 Polynomials Multiplier under Irreducible Polynomial
Module for High-Performance Cryptographic Hardware
                        Tools

      Maksat Kalimoldayev1, Sakhybay Tynymbaev2, Miras Magzom2, Margulan
       Ibraimov2, Serik Khokhlov2, Akmaral Abisheva2, Viktoriia Sydorenko3
     1Information and Computational Technologies Institute, Almaty, 050010, Kazakhstan
            2Al-Farabi Kazakh National University, Almaty, 050040, Kazakhstan
                    3National Aviation University, Kyiv, 03058, Ukraine


              s.tynym@mail.ru margulan.ibraimov@kaznu.kz
     magzom_miras@gmail.com ak_maral@mail.ru v.sydorenko@ukr.net


       Abstract. One of the most popular and effective methods of information
       security is cryptographic, that can be realized in software as well as in hardware
       tools. Hardware cryptographic devices are oriented on confidentiality ensuring
       put some actual problems must be solved. For the purpose to raise the
       performance of computing devices productivity, it is necessary to use number
       systems without the disadvantages of the radix numeration system. This is due
       to the fact that while performing on multi-digit numbers arithmetic operations
       represented in the positional system, it becomes necessary to take into account
       inter-bit transfers that far slows down the computation speed and complicates
       the calculator structure. The new ways search to improve the computing devices
       performance led researchers to an objective conclusion that in this direction of
       the positional number system all possibilities have been exhausted. In order to
       boost productivity of computing devices, it is necessary to use number systems
       without such disadvantages.

       Keywords: Information Security, Cryptosystem, FPGA, Polynomials, Modular
       Multiplication.


1.   Introduction

Topical tendencies of computer equipment and system development require the
elaboration of high-performance computing devices, including information security.
By information and communication networks and the integrating devices
development the need for creating efficient cryptographic transformations hardware
solutions will grow. For example, hardware cryptographic devices are few times
faster than software cryptographic tools. But hardware tools have some problems that
must be solved to provide efficient confidentiality ensuring.
2. Modern Approaches and Problem Definition
There are tasks leading to calculations when the integer values variables far exceed the
maximum range of typical computing devices, defined by the hardware-supported
machine word length [1, 2]. The hardware implementation deemed to be efficient from
the point of view of processing speed and capabilities, solving such issues by traditional
approaches is near impossible [3, 4].
For example, concerning ECC or RSA cryptosystems, the main difficulty in
cryptographic transformations is first of all due to the need to perform sequential
modular multiplication by multi-digit numbers [5]. In such cryptosystem
implementation, an important task is to ensure effective modular multiplication [6].
It should not be overlooked, that most of the modern computing equipment operates in a
radix numeration system. For multi-digit numbers arithmetic operations represented in the
traditional positional system, a need to take into account inter-bit transfers arises, that
significantly slows down the computation speed and complicates the calculator structure.
Consequently, relevant researches devoted for searching new ways to improve the
computing devices performance are topical. The studies focused on the use of non-
traditional methods of coding numerical information and the corresponding parallel
variants of computer arithmetic are of great importance.
The value of each numeric character (number) in the designation of a number depends
on its position or digit in the traditional radix numeration system. However, besides this,
there are also so-called “non - radix numeration system”, one of which is the “residue
number system” (RNS) [7]. RNS application is an efficient way of large data
calculations. Particularly, the RNS application allows to increase the operations speed
due to the lack of transfer when adding, dividing a large block of input data into smaller
sub-blocks and parallel processing.
The residue number system is a data representation system in computational arithmetic,
where integer is denoted by a set of smaller numbers.
In the residue number system, a positive integer is represented as a sequence of residues
or deductions:
                                           A  (a1 , a2 ,, an ).                       (1)
From dividing to set positive integers p1 , p2 ,, pn , that are called as system basis.
i numbers are derived in such a way:
                                         A
                              i  A        pi , i  1, n,                           (2)
                                         pi 
where  A / pi  means whole part from dividing A by pi . From (2) it follows, that
number i -bit i of number A is the least positive remainder of the dividing A by pi
and i  pi . In this case, the digits formation of each bit is carried out independently of
each other. In accordance with the Chinese theorem on remainder number
representation A in the form (1) will be unique if the numbers are pi pairwise simple.
The range volume of representable numbers in this case is equals to P  p1 , p2 ,..., pn .
In this case, similarly to the radix numeration system, the range of representable
numbers grows as a product of bases, and the digit bits of numbers grows as the sum of
the digit capacity of the same bases.
The main privileges that make it possible to effectively use modular arithmetic in some
fields of computing technology are: a high level of natural parallelism at the number
system level, that is related to the absence of digits transfer in addition and
multiplication, as well as the absence of error propagation. In contrast to the radix
numeration system, all vector elements are equivalent, and an error in one of them leads
only to a dynamic range reduction. This fact allows you to design devices of high fall-
over protection and error correction [8].
These features ensure good advantages for RNS over the radix number system at
modular operations of addition, subtraction and multiplication. This is especially true if
multi-digit numbers act as operands.
Strategic pathway in RNS application in computing is the development of cryptographic
information security tools. The research team headed by R.G. Biyashev proposed
modular encryption and digital signature generation algorithms based on the
nonpositional polynomial number system (NPNS) [9-11]. The purpose of research is the
development, research and implementation of information security cryptographic
algorithms, developed on the basis of non-positional polynomial number systems, in
information and communication systems and networks for various purposes. The block
symmetric encryption algorithms developed by them are built on the basis of this
approach and are the research results analyzing the possibility of using the non-
positional encryption algorithm in practice [12]. Also in this direction there are
important works devoted to parallel computation [13-14] as well as papers [15-16].
Taking into account the above stated, the development of computation hardware for the
NPNS is an urgent task, the solution of which will provide opportunities for creating
efficient cryptosystems hardware implementations based on polynomial RNS.


3. Results and Discussion
Nowadays, the residual numbers system (RNS) is often applied for the development
of efficient and high-performance special-purpose processes. RNS is widely applied
in cryptography. For example, modular arithmetic allows to create an effective
cryptographic systems hardware implementation. The non-positional number systems
application allows us to accelerate slow computations in asymmetric encryption
algorithms and increase reliability.
The developed non-positional encryption systems, as a cryptographic strength
criterion, applies not the key length, but the cryptographic strength of the
cryptoalgorithms themselves. The use of non-positional polynomial number systems
(NPNS) also makes possible to increase the algorithms efficiency, as in accordance
with NPNS rules, all arithmetic operations can be performed in parallel using the
modules of the NPNS bases.
For the implementation of the developed algorithms in the form of modules combined
into a cryptographic security system (CSS) works on program efficiency are carried
out. As well as, work is being carried out to build software and hardware and
hardware implementations of cryptographic information security symmetric
algorithms based on the NPNS.
As hardware-software and hardware implementation has the best speed
characteristics, the cryptographic algorithm integrity is guaranteed and allows to
optimize many of the mathematical operations adopted in encryption algorithms. For
developed algorithms software and hardware implementation, parts of the procedures
are implemented in hardware.
The basic device for non-positional polynomial number systems is a device for
multiplying polynomials modulo an irreducible polynomial, where data encryption
and decryption routine calculations are performed.
In this research, we consider an approach to polynomials multiplying A( x) and B( x)
modulo an irreducible polynomial P( x), that is,            A( x)  B( x) mod P( x), where
 deg A( x), deg B( x)  deg P( x).
In each multiplication process step, the partial remainder ri is calculated by the
former partial remainder shaper by adding modulo two double the previous partial
remainder 2ri 1 , with the result of the logical multiplication of the polynomial A( x)
(multiplicand) by the next high bit of the polynomial B( x) – multiplier modulo
irreducible polynomial P( x).
Then the           i -th partial remainder is determined by the formula:
 ri  (2ri 1  A( x)  bi ) mod P( x), where bi is the i -th high bit of the binary image of
the polynomial B( x), (bi  0,1). A is the binary image of the polynomial A( x). P
is binary image of the irreducible polynomial P( x).
The considered multiplier functional diagram is shown in Fig.1.
The device includes RgA for binary image storing of the polynomial A( x)
(multiplicative), shifting the RgB register for the binary image of the polynomial
storing B( x) (multiplier), the RgR register for storing the binary image of an
irreducible polynomial (module), adder AD1, where modulo 2 sum up the doubled
previous remainder 2ri 1 with the multiplicand A( x) with bi 1  1 , forming
 Ci  2ri 1  bi 1  A( x) . Modulo-two adder (AD2), together with a multiplexer (MS),
modifies Ci modulo P( x). Register RgR serves to store intermediate remainders.
Additionally, the multiplier contains a subtracting timing pulse (COUNT), where, at
the end of the operation, the “End of Operation” signal is generated. T Trigger, that
allow the passage of timing pulses into the circuit.
We consider the multiplier operation. By “START” signal, the binary A( x), B( x)
and P( x) polynomials coefficients are received by the blocks of the I1, I2, I3
diagrams, respectively, in the registers RgA( x) and RgB( x), RgP( x). Besides this,
by “START” signal, the binary code (k) of the multiplier digits number is received in
the TP count. The “Start” signal prior to reaching at the single trigger input T is
delayed on the DL.1 delay line. The delay on LZ.1 is determined by the total delay
time on RgA( x), I6, AD1, AD2, MS and the recording time of the remainder in the
 RgR register and the delay time is shifted by Shf (L1).
     Fig. 1 Functional multiplier diagram of polynomials irreducible polynomials modulo

Upon the “Start” signal is reached the trigger input T and translates it into a single
state, that allows the first timing pulse TP1 to pass from the output of the I4 diagram.
At this point in the RgR register, the partial remainder is r0  C0 , with ri 1  1 , since
 egA( x)  deg P( x).
The first timing signal RgB( x) is shifted to the left by one digit, while in the high
order RgB( x) the value of the next coefficient of the polynomial B( x)  bi 2 is fixed,
provided to the control inputs of the I6 diagrams, and to the other inputs of the
polynomial A( x) values coefficients. If, at the same time, bi 2  1, then the
polynomial coefficients are provided to the right-hand inputs of the AD1 adder. TSI at
the time of the RgB( x) shift is delayed by the delay line DL.2 and is provided to the
control inputs of diagram I7, and the information inputs are supplied by the remainder
from the outputs of the Shf diagrams(L1) (L1) 2ri 1 .
From I7 output , the doubled remainder is provided to the left inputs of the AD1
adder. When bi 2  1, the output of this adder is C1 2ri 1  A( x) .
If bi 2  0, C1  2r0 . Next, the C1 value is provided to the left inputs of the adder
modulo 2 (AD2). Moreover, if C1  P( x), then the multiplexer (MS) outputs the
value C1 and is written to the RgR register forming the value r1 .
If C1  P( x), then the MS multiplexer outputs the result C1  P( x) , shaping also the
value r1 . Further, the remainder r1 is shifted one digit to the left by the Shf shifter (L1).
At this point, the I4 diagram output of the receives the TP2 timing pulse shifting the
contents of the RgB( x) register. AD 1, RgA( x) inputs are provided depending on the
value of bi 3 , and the second inputs are provided with the bits of the residual r1
multiplied by two. AD1 output, the C2 value is formed and with the help of the adder
AD2 and the multiplexer MS, C2 is modulo, shaping the remainder r2 .
It is noteworthy that when each timing pulse reaches, a unit is subtracted from the TP
count. Upon n  1 timing signal reaches the RgR register, the result of multiplying
the polynomials modulo the irreducible polynomial is generated and the TPC is set to
“0” and the counter generates a “end of operation” signal that sets the trigger T to the
zero position, preventing the next timing signal from passing output diagram I4. At
the time of last remainder shaping signal "end of operation" are delayed on the delay
line DL.3. After that, the result is given to the outputs by the diagram I8.
If to consider an example of multiplying polynomials modulo an irreducible
polynomial in the multiplier diagram shown in Figure 1.
Let A( x)  x 4  x  1, B( x)  x 4  x 2  1, P( x)  x5  x 2  1.
Binary representations of polynomials are presented below (see Table 1):
 A  100112; B  10,012; P  1001012.
                              Table 1. Example of multiplying
           №             RgR bits                AD1                       AD2
           1                2                     3                         4
                                              A  010011               C0  010011
                                                                               
          Start           b4  1                                       P ( x) 100101
                                                  000000
                                              A  010011               r1  010011

                                                                       C1  100110
                                                                               
          ТP1             b3  0          C1  2r1  100110            P( x) 100101
                                                                       r2  000011
                                             C2  2r2  A              C2  010101
                                                                                 
                                                 000110
          ТP 2             b2  1                  
                                                                        P ( x) 100101
                                                 010011                 r3  010101
                                                 010101                 as C3  P( x)
                                                                        C3  101010
                                                                                
          ТP 3             b1  0        C3  2r3  0  101010          P( x) 100101
                                                                        r4  001111
                                             C4  2r4  A              C4  001101
                                                 011110                          
          ТP 4             b0  1                                      P ( x) 100101
                                                 010011                 r5  001101
                                                 001101

Checking: ( x4  x  1)  ( x4  x2  1)  ( x3  x2  1), accordingly binary display of this
polinomial – 011012.
This algorithm was tested on Nexys 4 Artix-7 FPGA Board. Figure 2 contains
diagram for encoding and decoding the number A in hexadecimal.


                 Fig. 2 The timing diagram of the algorithm for an 8-bit number

       Table 2. The number of resources used in encoding and decoding as a percentage
    Slice Logic      % of the FPGA % of the FPGA % of the FPGA % of the FPGA
    Utilization      resource used resource used resource used resource used
                     when encoding when encoding when encoding when encoding
                     and decoding     and decoding      and decoding     and decoding
                     4-bit code       8-bit code        12-bit code      24-bit code
    Number of
                          0.02%             0.1%            0.39%             1.25%
    Slice Registers
    Number of
                          0.05%            0.33%            1.48%              4.9%
    Slice LUTs
    Number of
                            7%              13%              19%               36%
    bonded IOBs
    Number of
    BUFG/BUFG               3%               3%               3%                3%
    CTRLs
Table 2 shows the number of resources used in encoding and decoding processes in
percents.

4. Conclusions
The precondition research for is the growing need to create efficient hardware
solutions for cryptographic transformations and the difficulties that arise in using the
radix numeration system.
As was stated above, the basic privileges of nonpositional number system applying is
the absence of transfer of digits in the operations of addition and multiplication, and,
consequently, the parallel operations possibility on each of the bases of the system,
which significantly speeds up the calculation process. It stands to mention that most
modern general-purpose processors are not able to effectively perform nonpositional
number system calculations.
For the most effective implementation of computing devices based on the residual
number system, it is required to develop non-standard circuit solutions that effectively
perform calculations in a nonpositional number system.

References
     1. A. Poschmann, Lightweight Cryptography – Cryptographic Engineering for a Pervasive
World. IACR ePrint archive 2009, 516 р., (2009).
     2. J. Guajardo, S. Kumar, C. Paar, J. Pelzl, Efficient software-implementation of finite
fields with applications to cryptography, Acta Applicandae Mathematica, Vol. 93, Iss. 1-3, pp
3-32, 2006.
     3. Kalimoldayev M., Tynymbayev S., Gnatyuk S., Ibraimov M., Magzom M. The device
for multiplying polynomials modulo an irreducible polynomial News of the National Academy
of Sciences of the Republic of Kazakhstan, Series of Geology and Technical Sciences, 2 (434),
pp. 199-205 (2019).
     4. Tynymbayev S., Gnatyuk S.A., Aitkhozhayeva Y.Z., Berdibayev R.S., Namazbayev
T.A. Modular reduction based on the divider by blocking negative remainders, News of the
National Academy of Sciences of the Republic of Kazakhstan, Series of Geology and Technical
Sciences, 2 (434), pp. 238-248, 2019.
     5. N. Gura, Comparing Elliptic Curve Cryptography and RSA on 8-bit CPUs’’ Proc. 6th
Int’l Workshop Cryptographic Hardware and Embedded Systems (CHES 04) LNCS 3156. –
Springer (2004).
     6. D. Schinianakis., T. Stouraitis, RNS-based RSA and ECC cryptography basic
operations, algorithms, and hardware, Embedded Systems Design with Special Arithmetic and
Number Systems, Springer (2017).
     7. Hu Z., Gnatyuk S., Kovtun M., Seilova N. Method of searching birationally equivalent
Edwards curves over binary fields, Advances in Intelligent Systems and Computing, Vol. 754,
pp. 309-319, 2018.
     8. S. Gnatyuk, M. Kovtun, V. Kovtun, A. Okhrimenko, Search method development of
birationally equivalent binary Edwards curves for binary Weierstrass curves from DSTU 4145-
2002, Proceedings of 2nd Intern. Scientific-Practical Conf. on the Problems of Infocom-
munications. Science and Technology (PIC S&T 2015), Kharkiv, Ukraine, рр. 5-8 (2015).
     9. M. Kalimoldayev, R. Biyashev, S. Nyssanbayeva, Y. Begimbayeva, Modification of the
digital signature, developed on the nonpositional polynomial notations, Eurasian Journal of
Mathematical and Computer Applications, ISSN 2306–6172, Volume 4 , Issue 2 (2016),
рр. 33-38 (2016).
    10. R. Biyashev, M. Kalimoldayev, S. Nyssanbayeva, M. Magzom, Development of an
encryption algorithm based on nonpositional polynomial notations, Proceedings of the
International Conference on Advanced Materials Science and Environmental Engineering
рр. 112-118 (2016).
    11. S. Gnatyuk, A. Okhrimenko, M. Kovtun, T. Gancarczyk, V. Karpinskyi, Method of
Algorithm Building for Modular Reducing by Irreducible Polynomial, Proceedings of the
16th International Conference on Control, Automation and Systems, Gyeongju, Korea,
рр. 1476-1479 (2016).
    12. S. Gnatyuk, V. Kinzeryavyy, M. Iavich, D. Prysiazhnyi, Kh. Yubuzova, High-
Performance Reliable Block Encryption Algorithms Secured against Linear and Differential
Cryptanalytic Attacks, CEUR Workshop Proceedings (Proceedings of the 14th International
Conference on ICT in Education, Research and Industrial Applications. Integration,
Harmonization and Knowledge Transfer, Kyiv, Ukraine, May 14-17, 2018), vol. 2104, pp. 657-
668 (2018).
    13. M. Ciet, M. Neve, E. Peeters, J. Quisquater, Parallel FPGA implementation of RSA
with residue number systems-can side-channel threats be avoided?, In 2003 46th Midwest
Symposium on Circuits and Systems, Vol. 2, pp. 806-810, 2003.
    14. R. Hobson, P. McGinn, Co-processor for performing modular multiplication, U.S.
Patent No. 6,209,016. 27 Mar. 2001.
    15. V. Krasnobayev, S. Koshman, A. Yanko, A. Martynenko, Method of Error Control of
the Information Presented in the Modular Number System, Problems of Infocommunications.
Science and Technology: International Scientific-Practical Conference, pp. 35-42, 2018.
    16. V. Krasnobayev, S. Koshman, A Method for Operational Diagnosis of Data
Represented in a Residue Number System, Cybernetics and Systems Analysis, vol. 54, issue 2,
pp. 336-344, 2018.

</pre>