<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Hardware Implementation for Code-based Post-quantum Asymmetric Cryptography.∗</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kristjane Koleci</string-name>
          <email>kristjane.koleci@polito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Baldi</string-name>
          <email>m.baldi@univpm.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maurizio Martina</string-name>
          <email>maurizio.martina@polito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guido Masera</string-name>
          <email>guido.masera@polito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Politecnico di Torino</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universit`a Politecnica delle Marche</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents a dedicated hardware implementation of the LEDAcrypt cryptosystem, which uses Quasi-Cyclic Low-Density Parity-Check codes and a decoding algorithm known as Q-decoder for the decryption function. The designed architecture is synthesized for both FPGA and ASIC technologies, featuring an intrinsic scalability over a wide range of parallelism degrees, which makes it possible to target multiple application scenarios, with different trade-offs between decryption latency and implementation complexity. The proposed system achieves a large speed-up over both software execution and a previous hardware implementation, with a the decryption latency as low as 3.16 ms for the FPGA version, and 1.2 ms when synthesized for a 65 nm CMOS technology.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The efficiency of post-quantum cryptographic algorithms when implemented in hardware
is considered among the requirements of candidates to the NIST post-quantum cryptography
standardization process [22]. This motivates research in this area, and in the design of efficient
hardware solutions for the implementation of these new cryptographic primitives.
1.1</p>
      <sec id="sec-1-1">
        <title>Related work</title>
        <p>Several works have already appeared in the literature concerning the hardware implementation
of post-quantum cryptographic primitives. Among them, isogeny-based cryptographic
primitives have been considered in [17, 16] and lattice-based primitives have been considered in
[13, 12, 8], while primitives based on the lattice-based problem variant known as ring-learning
with errors (Ring-LWE) have been considered in [1].</p>
        <p>Concerning the implementation of code-based post-quantum primitives, the classic McEliece
scheme based on binary Goppa codes has been considered in [24], while variants based on
QuasiCyclic Moderate-Density Parity-Check (QC-MDPC) codes have been considered in [18, 15]. The
LEDAcrypt post-quantum cryptographic primitives based on QC-LDPC codes have also been
recently considered for hardware implementation in [14].</p>
        <p>Concerning post-quantum digital signatures, a hardware-oriented analysis of NIST
postquantum cryptography standardization candidates is reported in [23].
1.2</p>
      </sec>
      <sec id="sec-1-2">
        <title>Contribution</title>
        <p>This work addresses the implementation of an hardware accelerator for the LEDAcrypt
primitives. The proposed architecture is synthesizable for both ASIC and FPGA technologies.
Moreover, it is scalable in terms of processing parallelism, thus achieving different trade-offs
between performance and implementation complexity.</p>
        <p>The paper is organized as follows: firstly the description of the algorithm is given in Section
2, then the overview of the architecture and additional details on one key processing unit are
provided in Section 3 and 4. Finally, the synthesized results are summarized in Section 5 and
the conclusions are given in Section 6.
2</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Algorithm</title>
      <p>From the complexity standpoint, a crucial algorithm for the LEDAcrypt primitives is the
decoding algorithm, which is an iterative algorithm that estimates a sparse error vector e starting
from a syndrome vector s, by exploiting the knowledge of two secret matrices: the secret
QCLDPC code parity-check matrix H and the secret transformation matrix Q. In LEDAcrypt,
this is performed through an iterative algorithm derived from the classic bit-flipping decoding
algorithm, and known as Q-decoder.</p>
      <p>The decoding algorithm starts from an initial syndrome s that is computed from the
encrypted message m and the two secret matrices H and Q as follows:</p>
      <p>
        Initial Syndrome : s(0)T = (HQ)mT
(
        <xref ref-type="bibr" rid="ref1">1</xref>
        )
Then, at every iteration l = 1, · · · , Itmax, the algorithm generates an updated syndrome s(l)
and a refined estimate of the error vector e(l), by computing the following quantities:
      </p>
      <p>Sigma : σ(l) = s(l−1)H
Correlation : ρ(l) = [ρ(1l), ρ(2l), . . . , ρ(nl)] = σ(l)Q
Thresholds : b(l) =</p>
      <p>max
j=1,··· ,n
!ρ(l)"</p>
      <p>j
Positions : Pl = {v ∈ [1, n]|Rv(l) &gt; b(l)}</p>
      <p>
        Errors : e(l) = e(l) = e(l−1) + # qv
Syndrome : s(l) = s(0) + e(l)HT
v∈P l
(
        <xref ref-type="bibr" rid="ref2">2</xref>
        )
(
        <xref ref-type="bibr" rid="ref3">3</xref>
        )
(
        <xref ref-type="bibr" rid="ref4">4</xref>
        )
(
        <xref ref-type="bibr" rid="ref5">5</xref>
        )
(
        <xref ref-type="bibr" rid="ref6">6</xref>
        )
(
        <xref ref-type="bibr" rid="ref7">7</xref>
        )
where qv is the vth row of QT . The product in 1 and 7 are performed in GF(
        <xref ref-type="bibr" rid="ref2">2</xref>
        ), while the result
of equations 2 and 3 are integers. The stop condition is reached when s = 0 (s being the sum
of the entries of s) or l = Itmax. When the decoding process terminates recovering the correct
error vector, such a vector can be straightforwardly used to retrieve the cleartext message m.
      </p>
      <p>LEDAcrypt provides various parameters related to the involved matrices and vectors. Table
1 gives the main algorithm parameters for different levels of security (Category): n0 is the
number of circulant blocks forming the parity-check matrix H, p is the size of the circulant
blocks, dv and m are the numbers of asserted bits in each column of the matrices H and Q
respectively, t is the number of intentional errors used for encryption and Itmax is the maximum
number of iterations required to successfully recover the encrypted message.</p>
      <p>The estimation of the most time consuming units in the decoder is important to achieve
an efficient implementation. Therefore, we used a Matlab implementation with the relevant
profiling capabilities to derive the processing time required for each main task. Table 2 provides
the collected results for a code with n0 = 2, p = 27, 779 and Itmax = 3. Two versions of
the algorithm software implementation have been simulated: the first version is the original
model [3], while the second version has been obtained by exploiting specific Matlab options to
accelerate the execution. The Matlab code has been run on a laptop with 16GB of RAM and
Intel Core i7-6700 HQ CPU with 2.60GHz clock frequency.</p>
      <p>It is clear from the profiling results that the most time consuming functions in the
Qdecoder are the Initial Syndrome calculation and the Syndrome update. Therefore, a parallel
formulation of these tasks is desirable to map them onto a dedicated hardware architecture and
to accelerate the whole algorithm. An additional advantage that is expected from the hardware
implementation of the Q-decoder is the lower energy dissipation.
The complete architecture is divided into three main blocks (Figure 1): the Memory, the Control
Unit (CU) and the Data Path (DP). The Memory unit contains several memory components
that store the data structures used by the decoding process. The CU is implemented as a Finite
State Machine (FSM) that drives the execution of the algorithm. Finally, the DP includes the</p>
      <sec id="sec-2-1">
        <title>Memory</title>
        <p>Message
Syndrome</p>
        <p>Sigma
Correlation
Error
H
Q
L</p>
      </sec>
      <sec id="sec-2-2">
        <title>Control Unit</title>
        <p>Idle</p>
        <p>Initial
Syndrome</p>
        <p>Cmp
Message
Corrected
Message Not
Corrected
true
false</p>
        <p>Syndrome,
Correlation and
Message
Evaluation
SW = 0</p>
        <p>false
It &lt; ItMax true</p>
      </sec>
      <sec id="sec-2-3">
        <title>Data Path</title>
        <p>Syndrome
Computation</p>
        <p>Unit
Correlation
Computation</p>
        <p>Unit</p>
        <p>Message
Update</p>
        <p>Unit
Syndrome
Update</p>
        <p>
          Unit
resources necessary to process all the algorithm variables and it is structured into four main
units: (i) the Syndrome Computation Unit (SCU), which includes the evaluation of the Initial
Syndrome s(0) as in (
          <xref ref-type="bibr" rid="ref1">1</xref>
          ), (ii) the Correlation Computation Unit (CCU), which evaluates the
Correlation ρ, namely (
          <xref ref-type="bibr" rid="ref3">3</xref>
          ), (iii) the Message Update Unit (MUU), which derives the errors e in
the message and its correction (
          <xref ref-type="bibr" rid="ref6">6</xref>
          ), and (iv) the Syndrome Update Unit (SUU), which iteratively
updates the syndrome (s(l)) based on the obtained errors (
          <xref ref-type="bibr" rid="ref7">7</xref>
          ).
        </p>
        <p>The Control Unit FSM basically follows the sequence of processing tasks given in Section
2. The iterative nature of the algorithm and the data dependencies within a single iteration
impose a sequential execution of the key steps. However, several operations inside the syndrome
and correlation computations allow for a parallel execution. Moreover, the similarities among
these operations suggest the reuse of some hardware resources in both SCU and CCU.
3.1</p>
        <sec id="sec-2-3-1">
          <title>Memory Organization</title>
          <p>The vectors and matrices sizes are derived from Table 1. There are three kinds of vectors
handled by the algorithm: vectors containing positions, binary vectors and integer vectors.
Vectors containing positions in the range [0, p] require np = ⌈log2(p)⌉ bits. Binary vectors
are stored in a matrix format, as Nb bit words, where Nb is the decided degree of processing
parallelism. Integer vectors are stored in a matrix format, as nc and ns bit words for Correlation
and Sigma values, respectively.</p>
          <p>Based on the dynamic range of each data, the expected size for the required memories can
be evaluated as in Table 3. The reported size values are related to the case of two circulant
block (n0 = 2). Therefore, every array (except the syndrome s and the matrix Q) is represented
as a two-block component. As an example, the message m is expressed as m = [m0 m1] and
L = HQ = [L0 L1]T . Similarly, the matrix Q is divided into four components (Q00 to Q11).
where ⊕ indicates the bit wise xor operation, while the CCU calculates
$m0
m1%
&amp; L0 '</p>
          <p>L1
(m0L0*</p>
          <p>(s0*
= )</p>
          <p>⊕
m1L1
+ = )⊕+</p>
          <p>s1
σ = s $ H0</p>
          <p>H1 % = $ σ0</p>
          <p>σ1 %
ρ = $ σ0
σ1 %
&amp; Q00</p>
          <p>Q10</p>
          <p>Q01 '
Q11
= )
(σ0Q00</p>
          <p>+
σ1Q10
σ0Q01*</p>
          <p>
            + + = $ρ0
σ1Q11
ρ1%
(
            <xref ref-type="bibr" rid="ref8">8</xref>
            )
(
            <xref ref-type="bibr" rid="ref9">9</xref>
            )
(
            <xref ref-type="bibr" rid="ref10">10</xref>
            )
In both units, the key requirement is the calculation of the product between a vector and a
sparse cyclic matrix, that is a VectorByCirculant operation. In the SCU a binary vector m is
multiplied, while in the CCU the product is needed twice, involving a binary vector in (
            <xref ref-type="bibr" rid="ref9">9</xref>
            ) and
an integer vector in (
            <xref ref-type="bibr" rid="ref10">10</xref>
            ). However, a unified architecture can be conceived to cover both types
of product, as detailed in Section 4.
3.3
          </p>
        </sec>
        <sec id="sec-2-3-2">
          <title>Message Update Unit</title>
          <p>The update of the message requires to find the positions of the errors intentionally inserted at
the encoding side. This estimation is based on Syndrome and Correlation. The circuit in Figure
2(a) shows the evaluation of the syndrome weight. Initially, the syndrome weight is set to 0,
then the Syndrome memory is read row by row and the number of ones per row is accumulated
in the Syndrome Weight register.</p>
          <p>The circuit for the evaluation of the error is in Figure 2(b). A row is read from the
Correlation memory and its elements are compared with the threshold b: if at least one value in the
row is higher than b (Cmp or = 1), the error position is saved into the Error memory, in terms
of row address and location within the row (index). The process ends when the last row of the
Correlation memory is checked. The threshold is derived from a Look-Up-Table that returns a
value of b given the input range of Syndrome Weighs (SWs) [14].</p>
          <p>Finally, Figure 2(c) shows the update of the message vector. The circuit receives a message
word (Nb elements) from the Message memory and the corresponding error positions from the
Error memory. A set of Nb xor gates applies the correction and the updated message is stored
back into the Message memory.</p>
          <p>Syndrome Row Register
Syndrome Row Register</p>
          <p>N bit
N bit</p>
          <p>Cou1nster Cou1nster
N bit N bit
N bit</p>
          <p>+ +
Syndrome Syndrome
Weight Weight</p>
          <p>N bit
(a) Syndrome Weight
evaluation logic.</p>
          <p>Correlation Row Register
Correlation Row Register</p>
          <p>Message Row Regis</p>
          <p>Message Row Register
Threshold &gt; T&gt;hre&gt;sho&gt;ld &gt;&gt; &gt;&gt; &gt;&gt; &gt;&gt; &gt; &gt; &gt; &gt;
Cmp_or</p>
          <p>Cmp_Comrp</p>
          <p>Cmp
(b) Error Position evaluation.</p>
          <p>Bit
Selection</p>
          <p>Bit
Selection</p>
          <p>Updated MessageURpedgaitsetedrMessage Re</p>
          <p>Decoded
index</p>
          <p>Message Row Register
Updated Message Register
(c) Message Update.</p>
          <p>Decoded</p>
          <p>index
3.4</p>
        </sec>
        <sec id="sec-2-3-3">
          <title>Syndrome Update Unit</title>
          <p>
            The syndrome update equations (
            <xref ref-type="bibr" rid="ref6">6</xref>
            ) and (
            <xref ref-type="bibr" rid="ref7">7</xref>
            ) are modified into a different form that simplifies
the hardware implementation:
e(sly)n = e(l)L
s(l) = s(l−1) ⊕ e(sly)n
where L = HQ and esyn is the error location vector for the syndrome. The latter equation
can be implemented by means of the same circuit given in Figure 2(c), while a sparse vector by
circulant product is needed for the calculation of e(sly)n.
          </p>
          <p>This operation is implemented as described in Algorithm 1.</p>
          <p>Algorithm 1 SparseVectorByCirculant</p>
          <p>Input: el,L;
Output: elSyn;
r ⇐ 0;
indexSyn = 0;
while indexP os &lt; dv do
while indexErr &lt; M axP os do
elSyn(indexSyn) = mod(el(indexErr) −
L(indexP os), p);
indexSyn = indexSyn + 1;
end while
end while
Algorithm 2 VectorByCirculant</p>
          <p>Input: v(1, n),Pos(1, d);
Output: r(1, n);
r ⇐ 0;
indexP os = 1;
while indexP os ≤ d do
k = Pos(indexP os);
vshifted = [v(k : n), v(1 : k − 1)] ;
r ⇐ r + vshifted
end while
4</p>
          <p>
            Vector By Circulant product architecture
The Vector By Circulant unit, included in both the SCU and CCU, evaluates the product
r = Av of a cyclic and sparse binary matrix A by a vector (integer or binary), v. In a cyclic
matrix, all the rows are cyclic shifts of the first one. An example of the Vector By Circulant
product is given in (
            <xref ref-type="bibr" rid="ref11">11</xref>
            ) and (
            <xref ref-type="bibr" rid="ref12">12</xref>
            ) for the case of size p = 15, with dv = 2 (a2 and a5 asserted
values in the first row of A). The real values of p are given in Table 1. The strategy described
is the same employed in the QcBits Algorithm[10].
          </p>
          <p>r14 = a1v0 + a2v1 · · · + a0v14 + v4</p>
          <p>The straightforward implementation of the product, i.e. the direct mapping of these
equations into an hardware architecture is not efficient, because of the size of v and A .</p>
          <p>
            However, it can be seen from (
            <xref ref-type="bibr" rid="ref12">12</xref>
            ) that the elements of r are obtained by combining shifted
versions of the v vector. For example, in the given example for p = 15, r can be calculated by
taking two circular shifts of v, respectively starting at positions 2 and 5 (the asserted values in
the first row), and adding them element by element. In general, r can be calculated by adding
(modulo-2 addition for the binary case) dv shifted versions of v (indicated as vshifted). Their
initial positions are determined by the elements in the first row of A, stored in the vector Pos.
The product calculation is detailed in Algorithm 2.
          </p>
          <p>To efficiently implement Algorithm 2, we proceed in a partially parallel way and update
Nb elements of r at a time. Let us assume that the A matrix is stored in the sparse format,
i.e. all non-zero elements in the first row are available in a linear array (Pos). Moreover, we
assume that v is stored in the memory Mv, organized as p/Nb words of Nb elements. At every
read from the memory, we obtain Nb elements of v in parallel. Said x the initial position of a
shifted version of v (that is an element of the Pos vector), we find the first element in vshifted
by calculating the address i = ⌊x/Nb⌋ and the index j = x mod Nb. If Nb is a power of two,
i is simply equal to the log2 Nb most significant bits of x and j is equal to the remaining least
significant bits of x. Therefore, the desired shifted version of v is obtained by selecting in the
read word all elements with index ≥ j; the vector is then completed by means of additional
reads from Mv, at addresses &gt; i. Overall, up to ⌈p/Nb⌉ reads are needed to obtain the complete
vshifted. For example, with p = 15, Nb = 4 and x = 3, the first vector element, v3, is read
at the first cycle, together with elements v0 to v2, which are not used at this time. In the two
following cycles, elements v4 to v7, and v8 to v11 are taken from Mv. Finally elements v12,
v13 and v14 are read at the fourth cycle. The architecture of the complete architecture for the
product Vector by Circulant is shown in Figure 3.</p>
          <p>Pos
index
log N
initial
address
Row</p>
          <p>Counter
address
p</p>
          <p>N</p>
          <p>Row
N</p>
          <p>collapse unit
…</p>
          <p>Binary
+
+ … + Integer</p>
          <p>N
M</p>
          <p>N
N
N</p>
          <p>Add</p>
          <p>Result</p>
          <p>N</p>
          <p>Control Unit
counters and register
enables</p>
          <p>N
M</p>
          <p>N</p>
          <p>In order to process the data read from the Mv memory, we need two Nb-element registers
(Rown and Rown+1) that store two consecutive rows of Mv. At every cycle, a new row is loaded
into the Rown + 1 register and the previous row is moved to the Rown register, at the same
time. Moreover, a collapse unit merges the elements in the two registers in a sequence of Nb
ordered elements starting from the position pointed by index. The circuit, at the first cycle,
skips elements with index lower than the initial shift position and reuse them at the last cycle,
to complete the tail of the sequence.</p>
          <p>The collapse unit provides Nb consecutive elements to an accumulator that calculates the
final product as the sum of several shifted versions of v. The temporary accumulated values
are stored in the result memory, Mr, which is set to zero at the beginning. Then, at every
iteration, the Add unit receives a new portion of a shifted vector from the collapse unit and
combines it with the old accumulated values read from Mr. The Add unit actually contains
plain xor gates in the case of a binary vector and a set of complete adders for an integer vector.</p>
          <p>The complete result is available in the Mr memory after the loading of the last vshifted.</p>
          <p>This solution is scalable and provides a speed-up by a factor nb over the sequential
implementation.
5</p>
          <p>Simulations and Synthesis results
The proposed architecture has been described in a parameterized form, for multiple degrees
of parallelism, ranging from Nb = 8 up to Nb = 64. The synthesis has been completed using
Synopsys Design Compiler and a CMOS 65nm technology library. Functional and post-synthesis
simulations have been done to exactly estimate the required number of cycles and to derive the
power dissipation.</p>
          <p>The results provided by the Modelsim simulations are given in Table 4 for two codes with
n0 = 2 and p = 27, 779 and p = 15013.</p>
          <p>The FPGA synthesis has been carried out with Xilinx Vivado, targeting the Artix-7
xc7a50tcpg236-3 device and setting the clock frequency at 100 MHz. The occupied resources
are detailed in Table 6.</p>
          <p>Although the hardware implementation of QC-LDPC code decoders for wireless
communications has been deeply investigated [11], only a few dedicated architectures for code-based
post-quantum cryptography are available in the open literature. The FPGA results reported
in Table 6 can be compared against the LEDAcrypt implementation proposed in [14], which
supports the n0 = 2, p = 15, 013 code with a bit parallelism of 32 bits: the reported resource
usage for a Xilinx Virtex-6 device is 650 FFs and 2222 LUTs, the achievable clock frequency is
140 MHz and the required number of cycles is 2.62 · 106. Therefore, the architecture described
in [14] has a decryption latency of 2,620,000 cycles at 140 MHz, corresponding to 18.7 ms. On
the other side, the solution described in this paper achieves a decryption latency, for the same
code and degree of parallelism, equal to 507,000 cycles at 100 MHz, equal to 5.05 ms.
6</p>
          <p>Conclusions and Future Work
This paper presents a hardware implementation of the LEDAcrypt post-quantum cryptographic
primitives based on QC-LDPC codes. A key advantage of the present design is the scalability
of the obtained decoder, which allows for different trade-offs between computational time and
implementation cost. The decryption latency ranges between 3.16 ms up to 16 ms, for an
internal degree of parallelism between 8 and 64.</p>
          <p>As a future work, the effect of an extended parallelism in the critical processing tasks could
be explored more in detail. Given the large impact of processing parallelism on the total amount
of occupied resources or Silicon area, several architecture-level choices have to be explored in
order to balance the reduction of the decoding time and the increase of the implementation cost.
A second line of research for future works can aim at improving the implementation efficiency
by means of joint algorithm and architecture optimizations. As an example, possible algorithm
simplifications can be investigated to evaluate both the effects on the cryptographic primitives
and the provided advantages in terms of their hardware implementation.</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Rashmi</given-names>
            <surname>Agrawal</surname>
          </string-name>
          , Lake Bu, Alan Ehret, and
          <string-name>
            <surname>Michel</surname>
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Kinsy</surname>
          </string-name>
          .
          <article-title>Open-source fpga implementation of post-quantum cryptographic hardware primitives</article-title>
          .
          <source>In Field Programmable Logic and Applications (FPL)</source>
          ,
          <source>2019 International Conference on, Sep</source>
          .
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Frank</given-names>
            <surname>Arute</surname>
          </string-name>
          , Kunal Arya,
          <string-name>
            <given-names>Ryan</given-names>
            <surname>Babbush</surname>
          </string-name>
          , et al.
          <article-title>Quantum supremacy using a programmable superconducting processor</article-title>
          .
          <source>Nature</source>
          ,
          <volume>574</volume>
          :
          <fpage>505</fpage>
          -
          <lpage>510</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Baldi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barenghi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Chiaraluce</surname>
          </string-name>
          , G. Pelosi, and
          <string-name>
            <given-names>P.</given-names>
            <surname>Santini</surname>
          </string-name>
          . Ledacrypt. https://github. com/LEDAcrypt/LEDAcrypt/tree/master last viewed
          <source>February</source>
          <year>2019</year>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Baldi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bodrato</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Chiaraluce</surname>
          </string-name>
          .
          <article-title>A new analysis of the McEliece cryptosystem based on QC-LDPC codes</article-title>
          .
          <source>In Proceedings of the 6th international conference on Security and Cryptography for Networks (SCN</source>
          <year>2008</year>
          ), pages
          <fpage>246</fpage>
          -
          <lpage>262</lpage>
          , Berlin, Heidelberg,
          <year>2008</year>
          . Springer-Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Marco</given-names>
            <surname>Baldi</surname>
          </string-name>
          , Alessandro Barenghi, Franco Chiaraluce, Gerardo Pelosi, and Paolo Santini.
          <article-title>LEDAcrypt: QC-LDPC code-based cryptosystems with bounded decryption failure rate</article-title>
          . In Marco Baldi, Edoardo Persichetti, and Paolo Santini, editors,
          <source>Code-Based Cryptography</source>
          , pages
          <fpage>11</fpage>
          -
          <lpage>43</lpage>
          , Cham,
          <year>2019</year>
          . Springer International Publishing.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>E.</given-names>
            <surname>Berlekamp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>McEliece</surname>
          </string-name>
          , and H. van Tilborg.
          <article-title>On the inherent intractability of certain coding problems (corresp</article-title>
          .).
          <source>Information Theory</source>
          , IEEE Transactions on,
          <volume>24</volume>
          (
          <issue>3</issue>
          ):
          <fpage>384</fpage>
          -
          <lpage>386</lpage>
          , 5
          <year>1978</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Daniel</surname>
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Bernstein</surname>
          </string-name>
          .
          <article-title>Grover vs</article-title>
          .
          <source>McEliece. In Proceedings Post-Quantum Cryptography: Third International Workshop (PQCrypto</source>
          <year>2010</year>
          ), pages
          <fpage>73</fpage>
          -
          <lpage>80</lpage>
          , Darmstadt, Germany,
          <volume>5</volume>
          <fpage>2010</fpage>
          . Springer Berlin Heidelberg.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>K.</given-names>
            <surname>Braun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Fritzmann</surname>
          </string-name>
          , G. Maringer,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schamberger</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Seplveda</surname>
          </string-name>
          .
          <article-title>Secure and compact full ntru hardware implementation</article-title>
          .
          <source>In 2018 IFIP/IEEE International Conference on Very Large Scale Integration (VLSI-SoC)</source>
          , pages
          <fpage>89</fpage>
          -
          <lpage>94</lpage>
          ,
          <year>Oct 2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Lily</given-names>
            <surname>Chen</surname>
          </string-name>
          , Stephen
          <string-name>
            <surname>Jordan</surname>
          </string-name>
          ,
          <string-name>
            <surname>Yi-Kai</surname>
            <given-names>Liu</given-names>
          </string-name>
          , Dustin Moody, Rene Peralta, Ray Perlner, and
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Smith-Tone</surname>
          </string-name>
          .
          <article-title>Report on post-quantum cryptography</article-title>
          .
          <source>National Institute of Standards and Technology Internal Report</source>
          ,
          <volume>8105</volume>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Tung</given-names>
            <surname>Chou</surname>
          </string-name>
          . Qcbits:
          <article-title>Constant-time small-key code-basedcryptography</article-title>
          . https://www.win.tue. nl/~tchou/papers/qcbits.pdf,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C.</given-names>
            <surname>Condo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Martina</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Masera</surname>
          </string-name>
          .
          <article-title>A network-on-chip-based turbo/ldpc decoder architecture</article-title>
          .
          <source>In 2012 Design, Automation Test in Europe Conference Exhibition (DATE)</source>
          , pages
          <fpage>1525</fpage>
          -
          <lpage>1530</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>F.</given-names>
            <surname>Farahmand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. U.</given-names>
            <surname>Sharif</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Briggs</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Gaj</surname>
          </string-name>
          .
          <article-title>A high-speed constant-time hardware implementation of ntruencrypt sves</article-title>
          .
          <source>In 2018 International Conference on Field-Programmable Technology (FPT)</source>
          , pages
          <fpage>190</fpage>
          -
          <lpage>197</lpage>
          ,
          <year>12 2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Howe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Moore</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. O'Neill</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Regazzoni</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Gneysu</surname>
            , and
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Beeden</surname>
          </string-name>
          .
          <article-title>Lattice-based encryption over standard lattices in hardware</article-title>
          .
          <source>In 2016 53nd ACM/EDAC/IEEE Design Automation Conference (DAC)</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          ,
          <year>June 2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Baldi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Santini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ling</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <article-title>Lightweight key encapsulation using ldpc codes on fpgas</article-title>
          .
          <source>IEEE Transactions on Computers</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>1</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hu</surname>
          </string-name>
          and
          <string-name>
            <given-names>R. C. C.</given-names>
            <surname>Cheung</surname>
          </string-name>
          .
          <article-title>Area-time efficient computation of niederreiter encryption on qc-mdpc codes for embedded hardware</article-title>
          .
          <source>IEEE Transactions on Computers</source>
          ,
          <volume>66</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1313</fpage>
          -
          <lpage>1325</lpage>
          ,
          <year>Aug 2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>B.</given-names>
            <surname>Koziel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Azarderakhsh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Kermani</surname>
          </string-name>
          .
          <article-title>A high-performance and scalable hardware architecture for isogeny-based cryptography</article-title>
          .
          <source>IEEE Transactions on Computers</source>
          ,
          <volume>67</volume>
          (
          <issue>11</issue>
          ):
          <fpage>1594</fpage>
          -
          <lpage>1609</lpage>
          ,
          <year>Nov 2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>B.</given-names>
            <surname>Koziel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Azarderakhsh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Mozaffari</given-names>
            <surname>Kermani</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Jao</surname>
          </string-name>
          .
          <article-title>Post-quantum cryptography on fpga based on isogenies on elliptic curves</article-title>
          .
          <source>IEEE Transactions on Circuits and Systems I: Regular Papers</source>
          ,
          <volume>64</volume>
          (
          <issue>1</issue>
          ):
          <fpage>86</fpage>
          -
          <lpage>99</lpage>
          ,
          <year>Jan 2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Ingo</given-names>
            <surname>Von</surname>
          </string-name>
          <string-name>
            <surname>Maurich</surname>
          </string-name>
          , Tobias Oder, and
          <article-title>Tim Gu¨neysu. Implementing qc-mdpc mceliece encryption</article-title>
          .
          <source>ACM Trans. Embed. Comput. Syst.</source>
          ,
          <volume>14</volume>
          (
          <issue>3</issue>
          ),
          <year>April 2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>R. J.</given-names>
            <surname>McEliece</surname>
          </string-name>
          .
          <article-title>A public-key cryptosystem based on algebraic coding theory</article-title>
          .
          <source>Deep Space Network Progress Report</source>
          ,
          <volume>44</volume>
          :
          <fpage>114</fpage>
          -
          <lpage>116</lpage>
          ,
          <year>January 1978</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <article-title>National Institute of Standards and Technology. Post-quantum cryptography project</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <article-title>National Institute of Standards and Technology. Post-quantum cryptography project - round 2 submissions.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <article-title>National Institute of Standards and Technology. Submission requirements and evaluation criteria for the post-quantum cryptography standardization process</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Deepraj</given-names>
            <surname>Soni</surname>
          </string-name>
          .
          <article-title>A hardware evaluation study of nist post-quantum cryptographic signature schemes</article-title>
          . In Second PQC Standardization Conference, Santa Barbara, CA,
          <year>August 2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Wen</surname>
            <given-names>Wang</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jakub Szefer</surname>
            , and
            <given-names>Ruben</given-names>
          </string-name>
          <string-name>
            <surname>Niederhagen</surname>
          </string-name>
          .
          <article-title>Fpga-based niederreiter cryptosystem using binary goppa codes</article-title>
          .
          <source>In Tanja Lange and Rainer Steinwandt</source>
          , editors,
          <source>Post-Quantum Cryptography</source>
          , pages
          <fpage>77</fpage>
          -
          <lpage>98</lpage>
          , Cham,
          <year>2018</year>
          . Springer International Publishing.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>