GPU Implementation of Quantum Secure ABC Cryptosystem on
Cuda
Sedat Akleyleka, Ramazan Koyutürkb and Hakan Kutucuc
a
  Ondokuz Mayıs University, Department of Computer Engineering, Samsun, Turkey
b
  Ege University, Department of Mathematics, Izmir, Turkey
c
  Karabuk University, Department of Computer Engineering, Karabuk, Turkey


                 Abstract
                 In this paper, we consider the ABC cryptosystem based on multivariate polynomial systems
                 which is one of the post-quantum cryptosystems. We review the theoretical structure of the
                 ABC cryptosystem and implement it on the GPU by using the NVIDIA CUDA technology.
                 We carry out the GPU and CPU implementation details of the ABC cryptosystem on three
                 computers with different graphics cards. We also give a comprehensive comparison between
                 the implementations. We compute the required number of arithmetic operations for each
                 process: key generation, encryption and decryption. According to the experimental results,
                 the GPU implementations have better memory performance than the CPU implementations.
                 Moreover, the encryption process is faster in the GPU implementation. Due to the structure
                 of ABC cryptosystem, the decryption process is slower in the GPU implementation.

                 Keywords 1
                 Post-quantum cryptography, Multivariate polynomials, GPU, CUDA

1. Introduction
    Public-key encryption systems developed in the late 1970s are becoming a crucial component of
communication networks. Especially with the spread of the internet, public key systems have become
common with the key exchange in SSL (Secure Sockets Layer). Systems developed for the secure
transmission and storage of information have enabled further study and development on public key
encryption. However, it is thought that the classical public key encryption systems will lose their
reliability when quantum computers working with very small cubits are developed and spread out. For
this reason, it is essential to develop public key encryption systems that are resistant to quantum
attacks. Besides, the reliability of the developed systems against quantum attacks should be tested.
    At present, the internet and the other communication systems are mainly based on the Digital
Signature Algorithm (DSA), the Elliptic Curve DSA or the Diffie-Hellman key exchange using
related algorithms, RSA encryption and digital signatures [1]. These cryptosystems guarantee their
reliability due to the difficulty of several theoretical problems such as integer factorization, discrete
logarithm, elliptic curves, etc. However, Peter Shor showed in 1994 that each of these problems could
be solved with quantum computers in polynomial time [2]. For this reason, almost all the encryption
methods we use will become insecure. Multivariate public key cryptography systems are supposed to
be resistant to quantum computer attacks since they rely on a multivariate polynomials over finite
fields that is an NP-hard problem [3, 4].


IntelITSIS’2021: 2nd International Workshop on Intelligent Information Technologies and Systems of Information Security, March 24–26,
2021, Khmelnytskyi, Ukraine
EMAIL: sedat.akleylek@bil.omu.edu.tr (S. Akleylek); ramazankoyuturk@gmail.com (R. Koyutürk); hakankutucu@karabuk.edu.tr (H.
Kutucu)
ORCID: 0000-0001-7005-6489 (S. Akleylek); 0000-0001-7105-5197 (R. Koyutürk); 0000-0001-7144-7246 (H. Kutucu)
            © 2021 Copyright for this paper by its authors.
            Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
            CEUR Workshop Proceedings (CEUR-WS.org)
1.1.    Literature Survey
    When quantum computers of the required size are available, RSA [1], Diffie-Hellman, DSA and
ECC (Elliptic Curve Cryptography), which are currently the most widely used public key encryption
algorithms, will become insecure. The reason for this security problem is the fact that Shor [2]
algorithm, which solves the factorization and the discrete logarithm problems in polynomial time on
quantum computers, can be run quickly. Therefore, alternative methods to classical encryption
methods based on mathematical problems against quantum computer attacks are required. There are
currently five main classes that are believed to be resistant to quantum attacks: multivariate
polynomial systems, lattice, abstract, code and isogeny based cryptography. In this study, the ABC
cryptosystem using multivariate polynomial systems will be considered and the details about the GPU
implementation developed using the NVIDIA CUDA technology will be given.
    The cryptosystems based on multivariate polynomials are becoming very fast due to the
infrastructure they use, but multivariate polynomial systems generally require more memory due to
the key size. This is advantageous for low-cost devices such as smart cards and RFID [5] chips.
Although there are many practical multivariate signature schemes [6,7,8], the number of secure and
efficient multivariate encryption schemes is low.
    Multivariate polynomial systems based cryptosystems are very attractive in the post-quantum
world. GeMSS, LUOV, Rainbow, MQDSS and their derivations have been proposed [8,9,10,11].
ABC is an encryption method defined for the post-quantum world. Recently, a new simple and
efficient multivariate public key encryption scheme based on matrix multiplication called "simple
matrix scheme" was proposed. Then, ABC derivatives whose cubic polynomials are at least randomly
quadratic were given. They showed that they broke it using algebraic attacks that were as difficult as
solving an equation. A generalization of the ABC scheme using a non-square matrix instead of a
square matrix was defined. A new ABC version that uses tensor multiplication of matrices to
eliminate decoding errors was proposed. Optimized implementation of ABC by leveraging the
features of the modern x64 CPU to increase productivity was provided [12].

1.2.    Motivation and Contribution
   The security of the information obtained as a result of certain labor and the applications we use in
our social and professional lives is vitally important. There are several encryption algorithms for the
computers and for quantum computers that are currently prototypes. The ABC encryption algorithm
based on multivariate polynomials is one of these algorithms. In this study, an implementation of the
ABC encryption system which works on CPU and GPU has been developed to provide necessary
security after quantum computers are available. We also give a comprehensive comparison between
the implementations.
   The organization of the paper as follows: In Section 2, we recall basic parts of the ABC
cryptosystem. In Section 3, we give implementation details as well as a comprehensive comparison.
We conclude the paper in Section 4.


2. ABC Cryptosystem
   In this section, the details of ABC cryptosystem is given. ABC is one of the quantum attack
resistant cryptosystems. In [12], an efficient implementation of ABC cryptosystem was provided.
   The security of ABC cryptosystem depends on the hardness of the solution of multivariate
polynomial systems. In a multivariate quadratic (MQ) polynomial system, there are 𝑚𝑚 equations
 (1) − (𝑚𝑚) and 𝑛𝑛 variables as follows:
                                                    𝑛𝑛   𝑛𝑛                  𝑛𝑛

                     𝑝𝑝   (1)
                                (𝑥𝑥1 , … , 𝑥𝑥𝑛𝑛 ) = � � 𝑝𝑝𝑖𝑖𝑖𝑖 𝑥𝑥𝑖𝑖 𝑥𝑥𝑗𝑗 + � 𝑝𝑝𝑖𝑖(1) 𝑥𝑥𝑖𝑖 + 𝑝𝑝0(1)
                                                              (1)
                                                                                                     (1)
                                                    𝑖𝑖=1 𝑗𝑗=1              𝑖𝑖=1
                                                             𝑛𝑛   𝑛𝑛                  𝑛𝑛

                              𝑝𝑝   (2)
                                         (𝑥𝑥1 , … , 𝑥𝑥𝑛𝑛 ) = � � 𝑝𝑝𝑖𝑖𝑖𝑖 𝑥𝑥𝑖𝑖 𝑥𝑥𝑗𝑗 + � 𝑝𝑝𝑖𝑖(2) 𝑥𝑥𝑖𝑖 + 𝑝𝑝0(2)
                                                                       (2)
                                                                                                                            (2)
                                                             𝑖𝑖=1 𝑗𝑗=1              𝑖𝑖=1
                                                                             .
                                                                             .
                                                                             .
                                                             𝑛𝑛   𝑛𝑛                  𝑛𝑛

                          𝑝𝑝   (𝑚𝑚)
                                         (𝑥𝑥1 , … , 𝑥𝑥𝑛𝑛 ) = � � 𝑝𝑝𝑖𝑖𝑖𝑖 𝑥𝑥𝑖𝑖 𝑥𝑥𝑗𝑗 + � 𝑝𝑝𝑖𝑖(𝑚𝑚) 𝑥𝑥𝑖𝑖 + 𝑝𝑝0(𝑚𝑚)
                                                                       (𝑚𝑚)
                                                                                                                            (𝑚𝑚)
                                                             𝑖𝑖=1 𝑗𝑗=1              𝑖𝑖=1


    In MQ problem, the aim is to find 𝑥𝑥´ = (𝑥𝑥´ 1 , … , 𝑥𝑥´ 𝑛𝑛 ) such that 𝑝𝑝(1) (𝑥𝑥´ ) = 0, … , 𝑝𝑝(𝑚𝑚) (𝑥𝑥´ ) = 0. It’s
proved that MQ problem is NP-hard when 𝑚𝑚 ≅ 𝑛𝑛 over 𝐺𝐺𝐺𝐺(2).
    The encryption/decryption process of ABC cryptosystem is given in Figure 1. An invertible
multivariate quadratic polynomial system is needed such as 𝐹𝐹: 𝐹𝐹 𝑛𝑛 → 𝐹𝐹 𝑚𝑚 . Then, to hide the linear
structure of public key, linear transformations are used such as 𝐿𝐿1 : 𝐹𝐹 𝑛𝑛 → 𝐹𝐹 𝑛𝑛 and 𝐿𝐿2 : 𝐹𝐹 𝑚𝑚 → 𝐹𝐹 𝑚𝑚 .
Then, the public key is formed as 𝐹𝐹´ = 𝐿𝐿2 𝑜𝑜𝑜𝑜𝑜𝑜𝐿𝐿1 . The private key set includes 𝐿𝐿1 , 𝐹𝐹 and 𝐿𝐿2 . ABC
cryptosystem has three phases: key generation, encryption and decryption.


Figure 1:The encryption/decryption process of ABC cryptosystem

Key Generation : Let F be a finite field with q elements. Let 𝑠𝑠 ∈ 𝑆𝑆, 𝑛𝑛 = 𝑠𝑠 2 and 𝑚𝑚 = 2𝑛𝑛. Then
generate A, B, C matrices in the following form:

                𝑥𝑥1                 ⋯ 𝑥𝑥𝑠𝑠             𝑏𝑏1                       ⋯    𝑏𝑏𝑠𝑠              𝑐𝑐1        ⋯ 𝑐𝑐𝑠𝑠
   𝐴𝐴 = �        ⋮                  ⋱ ⋮ �, 𝐵𝐵 = �       ⋮                        ⋱     ⋮ �, 𝐶𝐶 = �       ⋮         ⋱ ⋮�
         𝑥𝑥(𝑠𝑠−1)(𝑠𝑠+1)             ⋯ 𝑥𝑥𝑛𝑛       𝑏𝑏(𝑠𝑠−1)(𝑠𝑠+1)                  ⋯    𝑏𝑏𝑛𝑛        𝑐𝑐(𝑠𝑠−1)(𝑠𝑠+1)   ⋯ 𝑐𝑐𝑛𝑛

    𝐹𝐹[𝑥𝑥1 , … , 𝑥𝑥𝑛𝑛 ] is obtained by using (𝑥𝑥1 , … , 𝑥𝑥𝑛𝑛 ) monomials. Then, by the linear combinations of
(𝑥𝑥1 , … , 𝑥𝑥𝑛𝑛 ), (𝑏𝑏1 , … , 𝑏𝑏𝑛𝑛 ) and (𝑐𝑐1 , … , 𝑐𝑐𝑛𝑛 ) are computed. 𝐸𝐸1 = 𝐴𝐴𝐴𝐴 and 𝐸𝐸2 = 𝐴𝐴𝐴𝐴 are calculated.
Two invertible linear maps 𝐿𝐿2 : 𝐹𝐹 𝑚𝑚 → 𝐹𝐹 𝑚𝑚 and 𝐿𝐿1 : 𝐹𝐹 𝑛𝑛 → 𝐹𝐹 𝑛𝑛 are randomly chosen. Then, the public
key is the composition 𝐹𝐹´ = 𝐿𝐿2 𝑜𝑜𝑜𝑜𝑜𝑜𝐿𝐿1 : 𝐹𝐹 𝑛𝑛 → 𝐹𝐹 𝑚𝑚 . The private key includes B, C matrices and 𝐿𝐿1 , 𝐿𝐿2
linear transformations.

Encryption: Let 𝑑𝑑 ∈ 𝐹𝐹 𝑛𝑛 be a message (plaintext). Then, the ciphertext is computed as follows: 𝑐𝑐 =
𝐹𝐹´ (𝑑𝑑) and 𝑐𝑐 ∈ 𝐹𝐹 𝑚𝑚 .

Decryption: After receiving the ciphertext, 𝑧𝑧 = 𝐿𝐿−1             −1               −1
                                                   2 (𝑐𝑐), 𝑦𝑦 = 𝐹𝐹 (𝑧𝑧) and 𝑑𝑑 = 𝐿𝐿1 (𝑦𝑦) are performed.
                         𝑛𝑛
Then, the message 𝑑𝑑 ∈ 𝐹𝐹 is obtained. The detailed decryption process is as follows:

         1)           𝑧𝑧 = 𝐿𝐿−1                               𝑛𝑛                ´       ´
                             2 (𝑐𝑐) is computed. Then, 𝑧𝑧 ∈ 𝐹𝐹 is used to form 𝐸𝐸1 and 𝐸𝐸2 matrices.

                        𝑧𝑧1                ⋯    𝑧𝑧𝑠𝑠                𝑧𝑧𝑛𝑛+1           ⋯ 𝑧𝑧𝑛𝑛+𝑠𝑠
         𝐸𝐸1 = �         ⋮                 ⋱     ⋮ �, 𝐸𝐸2 = �          ⋮             ⋱    ⋮ �
                   𝑧𝑧(𝑠𝑠−1)(𝑠𝑠+1)          ⋯    𝑧𝑧𝑛𝑛          𝑧𝑧𝑛𝑛+(𝑠𝑠−1)(𝑠𝑠+1)      ⋯ 𝑧𝑧𝑚𝑚
        2)       𝑦𝑦 = (𝑦𝑦1 , … , 𝑦𝑦𝑛𝑛 ) is found such that 𝐹𝐹(𝑦𝑦) = 𝑧𝑧.
        3)       𝑑𝑑 = 𝐿𝐿−1
                        1 (𝑦𝑦1 , … , 𝑦𝑦𝑛𝑛 ) is computed.

    The decryption failure probability in Step 2 is 1/q. This means that there might be more than one
solution of 𝑦𝑦 (1) , … , 𝑦𝑦 (𝑙𝑙) . Therefore, each possible solution should be checked. In order to implement
ABC cryptosystem, finite field arithmetic (polynomial addition, multiplication inversion), matrix
multiplication, matrix inversion, matrix transpoze, and Gaussian elimination algorithms are needed.


3. Implementations of the ABC Algorithm
   This section details the implementations of the ABC cryptosystem running on CPU and GPU
separately. The ABC system operates in three main stages that are “key generation”, “encryption” and
“decryption”. The parameters in the implementations on the CPU and GPU are given in Table 1. The
implementations were carried out on three different platforms with the different operating systems;
Windows 8 x64 and Windows 10 x64. The experiments were conducted on NVIDIA GeForce GT
740M, NVIDIA GeForce GT 840M and NVIDIA GeForce GTX 1050Ti graphics cards. One of the
processes in which performance improvement is required is the addition in GF(28).

Table 1
Parameters in the implementations
                S                                        8
                N                                       64
               M                                        128
      CENTRAL_MAP_SIZE                                 2080
        PUBLIC_KEY_SIZE                               266240
        SECRET_KEY_SIZE                               294912

3.1.    CPU Implementation
   In the implementation on the CPU, after the predefined variables and the variables in the Main()
function are defined, the implementation is executed by calling the functions written for key
generation, encryption and decryption in the given order. The number of calls of the operations
mentioned in Section 2 by three functions (key generation, encryption, decryption) is given in Table 2
for the CPU and GPU implementation.

Table 2
The number of calls of the operations in the implementation
                                   Key generation      Encryption            Decryption         Total
 Addition                              4452352              0                 286720           4739072
 Subtraction                               0                0                 2064512          2064512
 Multiplication                      107740992           532480               2617792         110891264
 Division                                  0                0                  16256            16256
 Inverse                                  192               0                    1               193
 Matrix Inverse                            2                0                    0                2
 Matrix Multiplication                    257               0                    0               257
 Matrix Transposition                      1                0                    0                1
 Computation of Coefficients              128               0                    0               128
 Echelon                                   0                0                    1                1
3.2.    GPU Implementation
   Addition, subtraction and multiplication in the finite field GF(28) can be implemented on the
Graphics Processing Unit (GPU) to provide performance improvement for the ABC cryptosystem.
   The main task of a GPU is to display the images on the screen that are created on the computer.
The first GPUs only performed this task. Over time, the Central Processing Unit (CPU) was
inadequate for major computational problems, and the idea of parallel computing using the GPU is
revealed. A GPGPU model has been created to provide that GPUs have a programmable interface and
can be programmed in high- level languages.
   Compute Unified Device Architecture (CUDA) is a parallel computing architecture introduced by
NVIDIA in 2006 to take advantage of computing power of the GPU. CUDA is an application
programming interface (API) model that supports programming languages such as FORTRAN,
C/C++ and Python. It runs on Linux, Windows and Mac Osx platforms. Its advantages over its
competitors include shared memory usage, faster data reading from the GPU, and bit-level operation.
   The GPU differs from the CPU in that it has a SIMD (Single Instruction Multiple Data)
architecture. Figure 2 shows the schematic comparison of CPU and GPU structures. While CPU
calculations are performed in series, GPU performs calculations in parallel.


Figure 2: Comparison of CPU and CPU structures

   One of the most commonly used operations in the ABC algorithm is the addition in GF(28). When
the overall algorithm is considered, the summation function is called once in the “key generation”
function and three times in the “decryption” function.

              global void addcuda(WORD *a, WORD *b, WORD *c) {
              *c = *a ^ *b;
          }

   Similar to the function definition in C++ programming language, but a function other than the
main function is defined in CUDA. “a”, “b” and “c” are pointer variables of type “WORD”. “c” is the
variable that will store the result of the summation of “a” and “b” and carries it over the GPU to the
CPU.
   Calling the “addcuda” function in the key generation function is as follows. First, “*tempA”,
“*tempB”, “*tempC” and “tempFB” are defined as WORD data type. Then the cudaMalloc which is a
pre-defined CUDA function is used to define the size of the GPU. With the cudaMemcpy function,
the values on the CPU are copied to the GPU. We call the function on the GPU with “addcuda <<< 1,
1 >>> ()”. After completing this process, we use the cudaMemcpy function to copy the values on the
GPU back to the CPU.
   WORD *tempA, *tempB, *tempC, tempFB;
   cudaMalloc((void**)&tempA, sizeof(WORD));
   cudaMalloc((void **)&tempB, sizeof(WORD));
   cudaMalloc((void **)&tempC, sizeof(WORD));
   for (s = 0; s < VARIABLE; s++) {
       for (t = s; t < VARIABLE; t++) {
            if (t == s)
                FB[flag++] = V[s * VARIABLE + t];
            else {
                cudaMemcpy(tempA, &V[s*VARIABLE+t], sizeof(WORD), cudaMemcpy HostToDevice);
                cudaMemcpy(tempB, &V[t*VARIABLE+s], sizeof(WORD), cudaMemcpy HostToDevice);
                 addcuda<<<1,1>>>(tempA,tempB,tempC);
                 cudaMemcpy(&tempFB,tempC,sizeof(WORD), cudaMemcpyDeviceToHost);
                 FB[flag++] = tempFB;
            }
       }
   }
   cudaFree(tempA);
   cudaFree(tempB);
   cudaFree(tempC);


3.3.       Comparison of the implementations
   The implementation prepared in .Net environment using Microsoft Visual Studio for the ABC
cryptosystem has been tested on three different computers, each with different graphics processors.
Table 3 summarizes the computer, graphics processor, and operating system information used.

Table 3
Test environment
 Computer                   CPU                    RAM          Graphics Card          Operating
                                                                                        System
       1         Intel(R) Core(TM) i7-4500U        12 GB    GeForce GT 740M 2 GB       Windows
                       CPU @ 1.80 GHz                                                     8.1
       2         Intel(R) Core(TM) i7-4210U        8 GB     GeForce GT 840M 2 GB       Windows
                       CPU @ 1.70 GHz                                                     10
       3         Intel(R) Core(TM) i7-8750H        16 GB   GeForce GTX 1050Ti 4 GB     Windows
                       CPU @ 2.20 GHz                                                     10

In Table 4, the total running time of both CPU and GPU implementations prepared for the ABC
cryptosystem in three different environments are given in microseconds (µs).

Table 4
Execution times of the implementations
                                                 CPU                           GPU
             Computer1                        4336000 µs                    6089299 µs
             Computer2                        5123000 µs                    6589299 µs
             Computer3                        2214000 µs                    4856900 µs

Table 5 shows the memory usage of the implementations prepared for the ABC cryptosystem on the
CPU for three different environments.
Table 5
Memory usage of the implementations
                                               CPU                              GPU
           Computer1                          90 MB                            76 MB
           Computer2                          100 MB                           87 MB
           Computer3                          127 MB                           118 MB

In Table 6, the running times of the encryption and decryption functions on both CPU and GPU
implementations prepared for the ABC cryptosystem in three different environments are given in
microseconds (µs).

Table 6
Execution times of the encryption and decryption functions
                                       CPU                                    GPU
                            Encryption        Decryption         Encryption         Decryption
     Computer1               16000 µs         132000 µs            8000 µs          3291599 µs
     Computer2               19000 µs         145000 µs           11000 µs          3701789 µs
     Computer3               10000 µs          78000 µs            3000 µs          2479900 µs


    The running times of the implementations and the functions required for the ABC cryptosystem
shown in Table 2 in detail are evaluated by counting the number of times the key generation,
encryption and decryption are called and used. As can be noticed in the call of functions the entire
addition operations in GF(28) are performed on the GPU. Furthermore, although an increase in total
time has been observed, an improvement has been achieved in the process of encryption. In addition
to running time improvements, there is an improvement in memory usage. The running time in Table
4 is measured for both implementations as follows: time starts before the key generation and ends
with the completion of the decryption.
    The running time of the CPU implementation is 4336000 microseconds in Computer1. The
running time of GPU implementation is 6089299 microseconds on the same computer. There is a 40%
increase in time compared to the CPU implementation. However, the memory consumption of the
CPU implementation is 90MB while the memory consumption of the GPU implementation is 76MB.
That is, a 15% memory saving is achieved compared to the CPU implementation. In the encryption
function, the time is reduced from 16000 microseconds to 8000 microseconds and a 50% time-saving
is achieved.
    The running time of the CPU implementation is 5123000 microseconds in Computer2. The
running time of GPU implementation is 6589299 microseconds on the same computer. There is a 28%
increase in time compared to the CPU implementation. However, the memory consumption of the
CPU implementation is 100MB while the memory consumption of the GPU implementation is 87MB.
That is, a 13% memory saving is achieved compared to the CPU implementation. In the encryption
function, the time is reduced from 19000 microseconds to 11000 microseconds and a 42% time-
saving is achieved.
    The running time of the CPU implementation is 2214000 microseconds in Computer3. The
running time of GPU implementation is 4856900 microseconds on the same computer. There is a 19%
increase in time compared to the CPU implementation. However, the memory consumption of the
CPU implementation is 127MB while the memory consumption of the GPU implementation is
118MB. That is, a 7% memory saving is achieved compared to the CPU implementation. In the
encryption function, the time is reduced from 10000 microseconds to 3000 microseconds and a 70%
time-saving is achieved.
    Regardless of the working environment, the GPU implementation is slower than the CPU
implementation, but it is better in memory consumption. The reason for this difference is that the
addition operations in GF(28) are performed on the GPU.
4. Conclusion
   It is believed that the ABC, one of the multivariate public key cryptosystems, will be robust to
computational attacks using quantum computers. As a result of this study, an application or
implementation that works with a graphics processor regardless of the working environment of the
computers is better in memory usage. The running times are evaluated by counting the number of
calls of the key generation, encryption and decryption. The number of the arithmetic operations is
computed. The memory consumption of the CPU implementation is larger than the GPU
implementation in all platforms. In addition to the efficiency in terms of memory consumption,
running the entire implementation on the GPU will allow better results to increase performance in
terms of time.

   Acknowledgement
   The first and second authors were partially supported by TÜBİTAK under grant no. EEEAG-
116E279.

5. References
   [1] R. L. Rivest, A. Shamir, L. Adleman, A Method for Obtaining Digital Signatures and Public-
       Key Cryptosystems. Commun, ACM 21 2 (1978) 120–126
   [2] P. Shor, Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a
       Quantum Computer, SIAM J. Computing 26 5 (1997) 1484–1509
   [3] J. A. Buchmann, D. Butin, Post-Quantum Cryptography: State of the Art. The New Code-
       breakers, LNCS 9100 (2016) 88-108.
   [4] N. Kundu, S.M. Debnath, D. Mishra, A Secure and Efficient Group Signature Scheme Based
       On Multivariate Public Key Cryptography, Journal of Information Security and Applications
       58 102776 (2021). doi: 10.1016/j.jisa.2021.102776
   [5] J. Chen, J. Ning, J. Ling, T.S.C. Lau, Y. Wang, A New Encryption Scheme For Multivariate
       Quadratic Systems, Theoretical Computer Science 809 (2020) 372-383
   [6] N. Kundu, S.M. Debnath, D. Mishra, T. Choudhury, Post-Quantum Digital Signature Scheme
       Based On Multivariate Cubic Problem, Journal of Information Security Applications, 53
       (2020) 102512. doi: 10.1016/j.jisa.2020.102512
   [7] D. Smith-Tone, C. Tone, A Multivariate Cryptosystem Inspired By Random Linear Codes,
       Finite Fields and Their Applications 69 101778 (2021). doi: 10.1016/j.ffa.2020.101778
   [8] Post-Quantum          Cryptography       Round      2     Submissions,       2020.    URL:
       https://csrc.nist.gov/projects/post-quantum-cryptography/round-2-submissions
   [9] M. S. Chen, A. Hülsing, J. Rijneveld, S. Samardjiska, P. Schwabe, From 5-pass MQ-based
       identification to MQ-based signatures, in: Proceedings of 22th Annual International
       Conference on the Theory and Application of Cryptology and Information Security,
       ASIACRYPT, Hanoi, 2016, vol. 10032, pp. 135-165
   [10] S. Akleylek, M. Soysaldı, A novel 3-pass identification scheme and signature scheme based
       on multivariate quadratic polynomials, Turkish Journal of Mathematics 43 1 (2019) 241-257
   [11] S. Akleylek, M. Soysald, W.-K. Lee, S.O. Hwang, D. Chee-Keong Wong,Novel Post-
         quantum MQ-based Signature Scheme for Internet of Things with Parallel Implementation,
         IEEE Internet of Things Journal (2021). doi:10.1109/JIOT.2020.3038388, 2021.
   [12] Z. Peng, S. Tang, J. Chen, C. Wu and X. Zhang, Fast Implementation of Simple Matrix
         Encryption Scheme on Modern x64 CPU, in: ISPEC 2016, 10060, pp. 151-166.