<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>February</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Classification of Encrypted Word Embeddings using Recurrent Neural Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Robert Podschwadt</string-name>
          <email>rpodschwadt1@student.gsu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniel Takabi</string-name>
          <email>takabi@gsu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Georgia State University</institution>
          ,
          <addr-line>Atlanta, Georgia</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>07</volume>
      <issue>2020</issue>
      <abstract>
        <p>Deep learning has made many exciting applications possible, and given the popularity of social networks and user-generated content, there is no shortage of data for these applications. The content generated by users is written or spoken in natural language, which needs to be processed by computers. Recurrent Neural Networks (RNNs) are a popular choice for language processing due to their ability to process sequential data. On the other hand, this data is some of the most privacy-sensitive information. Therefore, privacy-preserving methods for natural language processing are crucial. In this paper, we focus on settings where a client has private data and wants to use machine learning as a service (MLaaS) to perform classification on the data without disclosing it to the entity offering the service. We employ homomorphic encryption techniques to achieve this. Homomorphic encryption allows data to be processed without being decrypted, thereby protecting the user's privacy. Although homomorphic encryption has been used for privacy-preserving machine learning, most of the work has focused on image processing and convolutional neural networks (CNNs); RNNs have not been studied. In this work, we use homomorphic encryption to build privacy-preserving RNNs for natural language processing tasks. We show that RNNs can be run over encrypted data without loss in accuracy compared to a plaintext implementation by evaluating our system on a sentiment classification task on the IMDb movie review dataset.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Security and privacy; • Computing methodologies →
Natural language processing; Neural networks;</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>
        Artificial neural networks have been very successful and popular
over the last few years in a variety of domains. CNNs have shown
better-than-human performance in image classification tasks [
        <xref ref-type="bibr" rid="ref13 ref38">13,
38</xref>
        ] and have also been applied to language processing tasks [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
RNNs, another type of neural network, are specifically designed
to work with sequences. Unlike other types of networks, RNNs
take the output of the previous sequence step into consideration.
There are different types of RNN architectures, such as Long Short-Term
Memory (LSTM), Gated Recurrent Unit (GRU), and a simple
fully connected variant, the Elman network [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. In this work we
use Elman networks and, unless specified otherwise, will use
the term RNN to mean Elman network. Recurrent architectures
are very popular in natural language processing (NLP) due to the
sequential nature of language. There are many different sub-fields in
NLP; in this work we investigate the task of sentiment classification.
      </p>
      <p>Many companies have built a business around offering MLaaS.
In MLaaS the model is hosted in the cloud. The service provider has
the infrastructure and know-how to build the models. The client
owns the data and sends it to the provider (also called the server) for
processing.</p>
      <p>A concern for the client of MLaaS is the privacy of the data.
To process the data the server needs access to the data. This is
often unwanted or unacceptable depending on the sensitivity of the
data. There are three main techniques for preserving the privacy of
the data while still allowing for ML algorithms to work: 1) Secure
Multiparty Computation (SMC), 2) Differential Privacy (DP) and 3)
Homomorphic Encryption (HE).</p>
      <p>
        In previous work, a variety of different machine learning algorithms
have been adapted for privacy-preserving processing, such as linear
regression [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ], linear classifiers [
        <xref ref-type="bibr" rid="ref17 ref4">4, 17</xref>
        ], decision trees [
        <xref ref-type="bibr" rid="ref1 ref4">1, 4</xref>
        ] or
neural networks [
        <xref ref-type="bibr" rid="ref14 ref29 ref32">14, 29, 32</xref>
        ]. Solutions based on SMC [
        <xref ref-type="bibr" rid="ref29 ref32">29, 32</xref>
        ] come
with a huge communication overhead.
      </p>
      <p>
        We propose an approach based on homomorphic
encryption and recurrent neural networks. Unlike SMC approaches, it does not
require continuous interactive communication between client and server,
although for longer sequences we use occasional interactive
communication to control the noise introduced by HE. Very little prior work
deals with recurrent neural networks; much of the work is done
on CNNs in the image domain [
        <xref ref-type="bibr" rid="ref10 ref14 ref22">10, 14, 22</xref>
        ]. Zhang et al. [
        <xref ref-type="bibr" rid="ref39">39</xref>
        ] perform
encrypted speech recognition, which is an NLP task, but the model
used is also a CNN. Badawi et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] research privacy-preserving
text classification, the same task we address in this paper,
but the authors do not use an RNN. To the best of our knowledge,
there is only one prior paper working with a recurrent architecture.
Lou and Jiang propose a system [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] that is capable of implementing
LSTM networks based on TFHE [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Their LSTM model suffers
from a small drop in accuracy, though, when running on encrypted
data. Our solution maintains the same accuracy as the
plaintext model. We present a solution that can process RNNs with
arbitrary-length input sequences in a privacy-preserving manner
and introduce a way of using word embeddings with encrypted
data. To ensure the privacy of the data we rely on the CKKS [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
crypto scheme. We evaluate our system on a text classification task.
The basic idea of our proposed approach is to run RNNs on
encrypted data by taking advantage of HE schemes. The server
hosts the trained model; the client transmits the encrypted data for
processing and receives an encrypted result. The training of the
model is done on plaintext. In this work, we make the following
main contributions:
• We propose an approach that combines RNNs, specifically
Elman networks, and homomorphic encryption to perform
inference over encrypted data in natural language processing
tasks.
• We present an innovative approach to working with word
embeddings on encrypted data.
• We perform thorough benchmarking of our system, both
with respect to run-time performance and communication
cost. Our results demonstrate that we are able to run RNNs
over encrypted data without sacrificing accuracy and with
reasonable performance and communication cost.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Threat Model and Problem Statement</title>
      <p>In this paper, we apply privacy-preserving machine learning
techniques based on HE to RNNs. We focus on a client-server setting
such as MLaaS, in which the client has full control over the data
and the server has full control over the model. We assume that
the model has been trained on plaintext data and the server offers
inference as a service to the clients. The clients want to use the
inference service and wish to keep their data private, while the
server wishes to keep its model private.</p>
      <p>Threat Model: We assume that all parties are honest but curious.
They will not deviate from the protocol but will attempt to learn
any information possible in the process. The server does not share
information about the architecture of the model with the client. The
client encrypts the data and sends it to the server for processing.
Where possible, the server processes the data and sends back the final
result in encrypted form. In some cases data will be sent back
to the client, where it is decrypted, encrypted again to remove the
built-up noise, and sent back to the server to continue processing.
In addition to the privacy of the data, we aim to achieve
accurate predictions: the predictions made on encrypted
data should be as close as possible to predictions made on plaintext
data.</p>
    </sec>
    <sec id="sec-4">
      <title>BACKGROUND</title>
    </sec>
    <sec id="sec-5">
      <title>Homomorphic Encryption</title>
      <p>Homomorphic encryption (HE) schemes are similar to other
asymmetric encryption schemes in that they have a public key pk for
encrypting (Enc) data and a private or secret key sk for
decryption (Dec). Additionally, HE schemes have a so-called
evaluation function, Eval. This evaluation function allows the
evaluation of a circuit C over encrypted data without the need for
decryption. Given a set of plaintexts {m0, ..., mn} and their encryptions
{c0, ..., cn} = Enc(pk, {m0, ..., mn}), the circuit C can be evaluated as:
Dec(sk, Eval(pk, C, c0, ..., cn)) = C(m0, ..., mn).</p>
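      <p>The Eval property above can be illustrated with a toy additively homomorphic scheme. The sketch below implements the Paillier cryptosystem, which supports only homomorphic addition; it is not the RLWE-based CKKS scheme used in this paper, and the parameters are toy-sized for illustration only:</p>
      <preformat>
```python
import random
from math import gcd

def keygen(p, q):
    # Toy Paillier key generation (small primes, illustration only).
    n = p * q
    lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    mu = pow(lam, -1, n)       # valid because we fix g = n + 1
    return (n,), (lam, mu)

def encrypt(pk, m):
    (n,) = pk
    n2 = n * n
    r = random.randrange(2, n)
    while gcd(r, n) != 1:
        r = random.randrange(2, n)
    # With g = n + 1, g^m mod n^2 equals 1 + m*n.
    return ((1 + m * n) * pow(r, n, n2)) % n2

def decrypt(pk, sk, c):
    (n,) = pk
    lam, mu = sk
    n2 = n * n
    ell = (pow(c, lam, n2) - 1) // n     # the Paillier L function
    return (ell * mu) % n

def eval_add(pk, c1, c2):
    # Eval for the addition circuit: multiplying ciphertexts adds plaintexts.
    (n,) = pk
    return (c1 * c2) % (n * n)

pk, sk = keygen(61, 53)
c = eval_add(pk, encrypt(pk, 12), encrypt(pk, 30))
print(decrypt(pk, sk, c))   # 42 = 12 + 30, computed without decrypting the inputs
```
      </preformat>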
      <p>
        Most modern HE schemes are based on the ring learning with
errors (RLWE) problem. Roughly speaking, to encrypt a plaintext
some noise is added, and decryption removes
that noise. For more details see Brakerski et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and Cheon
et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. When operations are performed on ciphertexts the
noise grows, and once it passes a certain threshold the ciphertext
can no longer be decrypted correctly. Multiplications add much
more noise than additions. One way of controlling the noise is to use
so-called leveled homomorphic encryption (LHE). LHE allows for a
certain number of multiplications, determined by the parameters chosen
for the encryption scheme, and the evaluation of circuits of known depth.
Computation cost can be mitigated in some cases by using single
instruction multiple data (SIMD) techniques introduced by Smart
and Vercauteren [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ].
      </p>
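      <p>The slot-wise semantics of SIMD batching can be sketched on plaintext vectors; the function names are illustrative and no real encryption is performed. In a real scheme, one homomorphic operation on a packed ciphertext acts on every slot at once, amortizing its cost over the whole batch:</p>
      <preformat>
```python
# Slot-wise semantics of SIMD batching, simulated on plaintext vectors.
# A real HE scheme would encrypt the packed vector as one ciphertext.

def pack(values):
    return list(values)          # one "ciphertext" holding many slots

def slot_add(a, b):
    # one homomorphic addition updates every slot simultaneously
    return [x + y for x, y in zip(a, b)]

def slot_mul(a, b):
    # likewise for multiplication (which costs far more noise in real HE)
    return [x * y for x, y in zip(a, b)]

batch_a = pack([1.0, 2.0, 3.0, 4.0])
batch_b = pack([10.0, 20.0, 30.0, 40.0])
print(slot_add(batch_a, batch_b))   # [11.0, 22.0, 33.0, 44.0]
```
      </preformat>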
    </sec>
    <sec id="sec-6">
      <title>Recurrent Neural Networks</title>
      <p>In contrast to fully connected or convolutional neural networks,
which are feed forward only, recurrent neural networks feed some
part of their hidden state back into themselves.</p>
      <p>
        There are many different types of recurrent neural network cells,
with Long Short-Term Memory (LSTM) [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] and Gated Recurrent
Unit (GRU) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] being the most popular ones. These cells are more
complex than the simple RNNs we focus on in this paper.
While LSTM and GRU lead to better task performance, we focus on the
simpler RNN type due to its lower computational complexity.
      </p>
      <p>The RNN used in this paper consists of three main components:
the input (xt), hidden state (st) and output (o) of the network at time
step t. The hidden state st for one neuron is calculated by the formula
st = f(xt · w + st−1 · v), where f is the activation function and
· is the vector dot product. Tanh, defined as (e^x − e^−x)/(e^x + e^−x), is the most common
activation function used in RNNs.</p>
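      <p>For a full layer, the recurrence generalizes from scalars to vectors and weight matrices. A minimal plaintext sketch of the Elman step, with shapes and random weights chosen purely for illustration:</p>
      <preformat>
```python
import numpy as np

def elman_forward(xs, W, V, f=np.tanh):
    # xs: sequence of input vectors; W: input weights; V: recurrent weights
    s = np.zeros(W.shape[1])
    for x in xs:
        s = f(x @ W + s @ V)     # hidden state feeds back into itself
    return s

rng = np.random.default_rng(0)
xs = rng.normal(size=(5, 3))          # sequence of length 5, input dim 3
W = rng.normal(size=(3, 4)) * 0.1     # 4 hidden units
V = rng.normal(size=(4, 4)) * 0.1
print(elman_forward(xs, W, V).shape)  # (4,)
```
      </preformat>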
    </sec>
    <sec id="sec-7">
      <title>Polynomial Approximation: Theoretical</title>
    </sec>
    <sec id="sec-8">
      <title>Foundation</title>
      <p>One of the major limitations of homomorphic encryption is the
limited set of operations that can be performed. CKKS supports
addition and multiplication. Division is supported only for plaintext
divisors. This essentially restricts us to evaluating polynomials. Tanh,
a popular activation function in RNNs, cannot be expressed as a
polynomial, so we cannot evaluate it over encrypted data.
A way to circumvent this is to find a polynomial approximation.</p>
      <p>
        Hesamifard et al. [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] use an approach based on
Chebyshev polynomials. Let C(X) be the family of all continuous real-valued
functions on a non-empty compact space X, and let µ be a finite
measure on X. For f, g ∈ C(X) the authors define ⟨f, g⟩ = ∫X f g dµ.
To generate Chebyshev polynomials they use dµ = dx/√(1 − x²) as the
measure on [−1, 1]. For better computational performance we want
to stick to low-degree polynomials.
      </p>
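      <p>A low-degree approximation of Tanh on [−1, 1] can be produced with standard Chebyshev tooling. The sketch below uses NumPy's built-in Chebyshev interpolation rather than the exact procedure of [21], and simply measures the worst-case error of a degree-3 fit:</p>
      <preformat>
```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Degree-3 Chebyshev interpolant of tanh on [-1, 1]
cheb = C.chebinterpolate(np.tanh, 3)

# Worst-case error of the approximation over a fine grid
xs = np.linspace(-1.0, 1.0, 1001)
err = float(np.max(np.abs(C.chebval(xs, cheb) - np.tanh(xs))))
print(round(err, 4))   # small for a smooth function like tanh
```
      </preformat>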
    </sec>
    <sec id="sec-9">
      <title>NLP with Neural Networks</title>
      <p>
        Recurrent neural networks are widely used for addressing
challenges in Natural Language Processing. Recurrent neural networks have
reached state-of-the-art performance on different tasks such as:
Speech Recognition, [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] and [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], Generating Image Descriptions,
[
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] and [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ], Machine Translation, [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], Language
Modeling, [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] and [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ]. The implementation of an NLP pipeline using
RNNs can be broken down into four major parts: 1) designing the
network, 2) encoding the data, 3) training the model and 4)
inference on new instances.
      </p>
      <p>In the next section we look at the individual steps in detail
and describe the changes that are necessary for computation in a
privacy-preserving setting.</p>
    </sec>
    <sec id="sec-10">
      <title>THE PROPOSED PRIVACY-PRESERVING</title>
    </sec>
    <sec id="sec-11">
      <title>CLASSIFICATION FOR RECURRENT</title>
    </sec>
    <sec id="sec-12">
      <title>NEURAL NETWORKS</title>
      <p>Looking at the components of the RNN pipeline described in Section
2.4 we determine what changes need to be made to adhere to the
constraints of homomorphic encryption.</p>
      <p>Network Design. As long as we only use fully connected and
recurrent layers, the only consideration we need to make is the choice of
activation functions. All other operations inside
an RNN can be performed over encrypted data using HE schemes.
However, it is not possible to implement common activation
functions within current HE schemes. We aim to find the best low-degree
polynomial approximation to replace the activation functions
within the RNN.</p>
      <p>Data Encoding. In this paper, we use word embeddings as an
encoding scheme for textual data. We describe our approach to
handling embeddings in more detail in Section 3.1.</p>
      <p>Model Training. In this paper, we assume that the training of
the model is performed by the server on plain training data.</p>
      <p>Inference. This is the part of the pipeline in our system that
is run on encrypted data. At no point during this process is the
data decrypted on the server, thus ensuring its privacy is protected.
During processing by the model, the encrypted data accumulates
noise. We describe a way of circumventing the problem of the noise
crossing the threshold after which correct decryption is no longer
possible in Section 3.2. Once the data has been processed by the
entire network, the result of the classification is sent back to the
client. The result of the classification is still encrypted and needs
to be decrypted by the client.</p>
      <p>
        A variety of activation functions have been proposed as
replacements for common activation functions used in NNs. Dowlin et al.
[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] use polynomials of degree 2 to substitute the Sigmoid function
in CNNs, and Shortell and Shokoufandeh [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ] use a polynomial of
degree 3 to approximate the natural logarithm function. Hesamifard
et al. [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] use Chebyshev polynomials to approximate activation
functions such as ReLU, Sigmoid and Tanh. We use the
approach of [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] to approximate Tanh, which is the most popular
activation function in RNNs. The Softmax function cannot be
performed over encrypted data, but since it is typically the very
last function of a neural network, we move it to the client side. The
server computes the neural network all the way to the inputs of
the Softmax function; the Softmax function is then applied by
the client after decryption to obtain the classification results.
      </p>
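      <p>The client-side Softmax step is an ordinary plaintext computation once the final-layer outputs are decrypted; a minimal, numerically stabilized sketch:</p>
      <preformat>
```python
import numpy as np

def client_softmax(logits):
    # applied by the client after decrypting the network's final outputs
    z = logits - np.max(logits)    # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum()

# e.g. decrypted logits for the two sentiment classes
probs = client_softmax(np.array([2.0, 0.5]))
print(probs)   # probabilities over {positive, negative}; they sum to 1
```
      </preformat>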
    </sec>
    <sec id="sec-13">
      <title>Encrypted word embeddings</title>
      <p>
        Word embeddings are a way to turn words into real valued vectors.
The embedding layer is essentially a lookup table that maps any word
in a dictionary to a real-valued vector. The lookup of an embedding
for a given word cannot be performed efficiently in HE schemes.
We address this problem by moving the embedding layer out of
the RNN and to the client where it can be performed in plaintext.
After performing the embedding lookup, the client encrypts the
embeddings and sends the result to the server. To enhance the
privacy of the model, the model owner can use one of the many
pretrained embeddings such as GloVe [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ], Elmo [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ], Bert [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] or
XLNet [
        <xref ref-type="bibr" rid="ref37">37</xref>
        ] and share those with the client.
      </p>
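      <p>The client-side lookup itself is a plain table access performed before encryption. A toy sketch with an illustrative three-word vocabulary and a hypothetical shared embedding matrix (in practice the matrix would come from the server or a public pretrained embedding):</p>
      <preformat>
```python
import numpy as np

# Hypothetical shared embedding table: one row per vocabulary word.
vocab = {"the": 0, "movie": 1, "rocked": 2}
E = np.array([[0.1, 0.2],
              [0.3, 0.1],
              [0.9, 0.8]])

def embed(tokens):
    # plaintext lookup on the client; the result is what gets encrypted
    return np.stack([E[vocab[t]] for t in tokens])

seq = embed(["the", "movie", "rocked"])
print(seq.shape)   # (3, 2): sequence length 3, embedding dimension 2
```
      </preformat>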
    </sec>
    <sec id="sec-14">
      <title>Noise growth in HE</title>
      <p>In an RNN architecture, a sequence is processed by feeding its
entries into a fully connected layer which also takes the output of
that layer produced for the previous sequence entry. The current
output and the previous output are combined into the new output.
Due to the noise build-up in HE we need to keep track of the number
of operations performed on ciphertexts. To process a sequence
of length n with an RNN layer the resulting ciphertext needs to
pass through the layer n times. That means n dot products and activation
functions are applied. It is not always possible to process all of the
sequence entries due to the accumulated noise. Our approach
is to send the encrypted data back to the client, where it is decrypted
and re-encrypted, thereby removing the built-up noise.</p>
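      <p>Given a fixed number of RNN steps that one encryption can survive, the number of refresh round trips follows directly from the sequence length. A sketch using the budget of 27 steps per encryption reported in Section 4:</p>
      <preformat>
```python
import math

def refresh_rounds(seq_len, steps_per_encryption):
    # number of decrypt/re-encrypt round trips needed mid-sequence
    return math.ceil(seq_len / steps_per_encryption) - 1

print(refresh_rounds(200, 27))   # 7 round trips for the padded IMDb reviews
```
      </preformat>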
    </sec>
    <sec id="sec-15">
      <title>Implementation</title>
      <p>
        We use CKKS to protect the privacy of the client data. The server
trains a plaintext model and shares the embedding matrix with the
client. The activation in the model needs to be compatible with
HE. This is achieved by approximating Tanh using the method by
Hesamifard et al. [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. The client performs the embedding lookup
and encrypts the result. The encrypted embeddings are sent to the
server, where they are processed. When the noise built up during
computation reaches the limit, the data is sent back to the client, where it is
decrypted, thereby removing all noise, re-encrypted, and sent back to
the server. Once the model has completed processing, the server sends
the still-encrypted result back to the client, where it can be decrypted.
We implement our proposed solution in C++11. We train the model
using Keras [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and the homomorphic encryption primitives are
provided by HElib [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. On plaintext, we tried different activation
functions and found that Tanh and Tanh approximations work
best. Other activation functions, such as x² or the linear function,
cause the model not to train properly. We find that the best replacement
for our purposes is −0.00163574303018748x³ + 0.249476365628036x.
      </p>
    </sec>
    <sec id="sec-16">
      <title>EXPERIMENTAL RESULTS</title>
      <p>
        The experiments were performed on an Ubuntu 18.04 64-bit machine
with an AMD Ryzen 5 2600 @ 3.5GHz processor and 32GB of RAM.
The IMDb [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] dataset contains 50,000 movie reviews labeled as
either positive or negative of which 25,000 are used as training
and 25,000 as test data. The tokenization is performed by Keras.
We train a model to perform sentiment classification which is
classifying a review as either positive or negative. Out of the 25,000
training instances we use 2,000 as validation data for
hyperparameter tuning. We use a vocabulary of the top 20,000 words. We pad or
truncate the reviews to be 200 words long. Our model consists of
an embedding layer that turns the words in the reviews into real-valued
vectors of dimension 128. The embedding matrix is randomly
initialized and updated during the training process. The embedding
layer is followed by an RNN layer with 128 units. We use the
Tanh approximation from Section 3.3 as the activation function. The
last layer is a fully connected layer with two units and Softmax
activation. The training is performed on the plain data using Keras
and yields 86.47% accuracy on the unseen test data. We achieve the
same accuracy on the encrypted data.
      </p>
      <p>We extract the learned weights and run experiments with
different batch sizes. In our experiments the noise growth exceeds the
workable threshold after 27 timesteps. This means we need to add
communication between client and server seven times to refresh
the noise in order to classify the IMDb sequences of length 200.</p>
      <p>The amount of data that needs to be transmitted depends on the
batch size. The encrypted embeddings are larger than the plaintext
data by a factor of 1,280. See Table 1 for different batch sizes. The
Embeddings column is the amount of data that is initially transferred
from the client to server. Noisy ciphertext gives the size of the data
the server sends to the client to be refreshed and Refreshed ciphertext
is the reencrypted answer. These are the values for only one refresh
operation. The Batch column is the total amount of data transferred
between client and server during classification of one batch which
requires seven refresh rounds.</p>
      <p>The amount of data that needs to be transmitted initially makes
up the largest portion of the transfer. To run our network seven
noise removal communications are required. At a batch size of 256
the server sends 106MB to the client and the client responds with
70MB. One round of noise removal therefore requires 176MB to
be transferred, and all seven rounds take 1,232MB, which is less than
10% of the initial transfer. The increase in size of the ciphertexts
is nearly linear. Smaller ciphertext sizes carry more overhead per
instance than larger ones.</p>
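      <p>The batch-size-256 refresh traffic quoted above can be checked directly from the per-round figures (values in MB, taken from the text):</p>
      <preformat>
```python
# Refresh traffic at batch size 256, in MB (figures from the text)
noisy_to_client = 106      # server sends noisy ciphertexts to the client
refreshed_to_server = 70   # client returns re-encrypted ciphertexts

rounds = 7                 # refreshes needed for length-200 sequences
per_round = noisy_to_client + refreshed_to_server
total = per_round * rounds
print(per_round, total)    # 176 MB per round, 1232 MB over all seven rounds
```
      </preformat>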
      <p>
        Table 2 lists the execution time for different batch sizes. The
times are given for encrypted and plain data, both as the actual time
it takes to process the batch and as the resulting time per
instance. The noise removal is not performed by the client, though;
It is simulated on the server. The measurements also do not include
the encryption and transfer of the embeddings. We can see that
increasing the batch size leads to lower per instance classification
time. The effect is lost when increasing the batch size from 128 to
256. On the plain data we still see improvement after that point.
To get an accurate comparison the plain text measurements are
performed on the same implementation as the encrypted experiments.
The growth in execution time for the encrypted data
appears roughly exponential, while the plain version appears logarithmic.
Our implementation performs best on encrypted data with a batch
size of 128 and worst with a batch size of one, if we look at the
time per sample. The overhead is smallest, though, for one instance
per batch: there the encrypted version is 40 times slower than the
plain version. For our optimal batch size of 128 the encrypted
version is 92 times slower. This is due to the different growth rates of
execution time for the encrypted and plain data.
Badawi et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] presented PrivFT, a system for privacy-preserving
text classification built on Facebook's fastText [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. The main difference from our work is that we use a recurrent
architecture. In PrivFT the embedding operation is also not
outsourced to the client. The client needs to one-hot encode each word,
encrypt it and send it to the server, where the embedding operation is
performed as a matrix multiplication. The message size is similar.
The inference time for a single instance on IMDb is higher in our
scenario, but using larger batch sizes allows us to reach a lower per-instance
time. In contrast to our work, PrivFT features schemes for
training on encrypted data and a CKKS implementation with GPU
acceleration. Lou and Jiang created SHE [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ], a privacy-preserving
neural network framework based on TFHE. It offers support for
LSTM cells. The authors replace the computationally expensive,
high-noise matrix operations normally required
by LSTMs with much cheaper shift operations. Zhang et al. [
        <xref ref-type="bibr" rid="ref39">39</xref>
        ]
perform a different NLP task, namely encrypted speech recognition
based on a CNN. The last step of the network, which matches the
output to actual text, is performed on the client side.
      </p>
    </sec>
    <sec id="sec-17">
      <title>CONCLUSION</title>
      <p>In this paper, we present an approach that allows the use of
recurrent neural networks on homomorphically encrypted data based
on the CKKS scheme. We present a solution to perform NLP tasks
over encrypted data using recurrent neural networks, in our case
sentiment analysis on the IMDb dataset. We achieve
this with no loss in accuracy compared to the plaintext model. This
is made possible by introducing communication between client
and server to refresh the noise. We trade network traffic for the
ability to efficiently use word embeddings. Our future work aims at
investigating other recurrent architectures such as LSTM and GRU.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Louis J. M.</given-names>
            <surname>Aslett</surname>
          </string-name>
          ,
          <string-name>
            <surname>Pedro M. Esperança</surname>
          </string-name>
          , and
          <string-name>
            <surname>Chris</surname>
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Holmes</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Encrypted statistical machine learning: new privacy preserving methods</article-title>
          .
          <source>CoRR</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Ahmad</given-names>
            <surname>Al Badawi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Luong</given-names>
            <surname>Hoang</surname>
          </string-name>
          , Chan Fook Mun, Kim Laine, and Khin Mi Mi Aung.
          <year>2019</year>
          .
          <article-title>PrivFT: Private and Fast Text Classification with Homomorphic Encryption</article-title>
          . arXiv preprint arXiv:1908.06972 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Dzmitry</given-names>
            <surname>Bahdanau</surname>
          </string-name>
          , Kyunghyun Cho, and
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Neural Machine Translation by Jointly Learning to Align and Translate</article-title>
          .
          <source>In 3rd International Conference on Learning Representations, ICLR</source>
          <year>2015</year>
          , San Diego, CA, USA, May 7-
          <issue>9</issue>
          ,
          <year>2015</year>
          , Conference Track Proceedings. http://arxiv.org/abs/1409.0473
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Raphael</given-names>
            <surname>Bost</surname>
          </string-name>
          , Raluca Ada Popa, Stephen Tu, and
          <string-name>
            <given-names>Shafi</given-names>
            <surname>Goldwasser</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Machine Learning Classification over Encrypted Data. In 22nd Annual Network and Distributed System Security Symposium</article-title>
          ,
          <string-name>
            <surname>NDSS</surname>
          </string-name>
          , San Diego, California, USA.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Zvika</given-names>
            <surname>Brakerski</surname>
          </string-name>
          , Craig Gentry, and
          <string-name>
            <given-names>Vinod</given-names>
            <surname>Vaikuntanathan</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>(Leveled) Fully Homomorphic Encryption Without Bootstrapping</article-title>
          .
          <source>In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference (ITCS '12)</source>
          . ACM, New York, NY, USA,
          <fpage>309</fpage>
          -
          <lpage>325</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Jung Hee</given-names>
            <surname>Cheon</surname>
          </string-name>
          , Andrey Kim, Miran Kim, and
          <string-name>
            <given-names>Yongsoo</given-names>
            <surname>Song</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Homomorphic Encryption for Arithmetic of Approximate Numbers</article-title>
          .
          <source>In Advances in Cryptology - ASIACRYPT 2017</source>
          , Tsuyoshi Takagi and Thomas Peyrin (Eds.). Springer International Publishing, Cham,
          <fpage>409</fpage>
          -
          <lpage>437</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Ilaria</given-names>
            <surname>Chillotti</surname>
          </string-name>
          , Nicolas Gama, Mariya Georgieva, and
          <string-name>
            <given-names>Malika</given-names>
            <surname>Izabachène</surname>
          </string-name>
          .
          <year>August 2016</year>
          .
          <article-title>TFHE: Fast Fully Homomorphic Encryption Library</article-title>
          . https://tfhe.github.io/tfhe/.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Kyunghyun</given-names>
            <surname>Cho</surname>
          </string-name>
          , Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation</article-title>
          .
          <source>In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          . Association for Computational Linguistics, Doha, Qatar,
          <fpage>1724</fpage>
          -
          <lpage>1734</lpage>
          . https://doi.org/10.3115/v1/D14-1179
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>François</given-names>
            <surname>Chollet</surname>
          </string-name>
          et al.
          <year>2017</year>
          . Keras. https://github.com/fchollet/keras.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Edward</given-names>
            <surname>Chou</surname>
          </string-name>
          , Josh Beal, Daniel Levy, Serena Yeung,
          <string-name>
            <given-names>Albert</given-names>
            <surname>Haque</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Li</given-names>
            <surname>Fei-Fei</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Faster CryptoNets: Leveraging sparsity for real-world encrypted inference</article-title>
          .
          <source>arXiv preprint arXiv:1811.09953</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Junyoung</given-names>
            <surname>Chung</surname>
          </string-name>
          , Caglar Gulcehre, Kyunghyun Cho, and
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Empirical evaluation of gated recurrent neural networks on sequence modeling</article-title>
          .
          <source>In NIPS 2014 Workshop on Deep Learning</source>
          ,
          <year>December 2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ming-Wei</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</article-title>
          . CoRR abs/1810.04805 (
          <year>2018</year>
          ). arXiv:1810.04805 http://arxiv.org/abs/1810.04805
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Terrance</given-names>
            <surname>DeVries</surname>
          </string-name>
          and
          <string-name>
            <given-names>Graham W.</given-names>
            <surname>Taylor</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Improved Regularization of Convolutional Neural Networks with Cutout</article-title>
          .
          <source>CoRR abs/1708.04552</source>
          (
          <year>2017</year>
          ). arXiv:1708.04552 http://arxiv.org/abs/1708.04552
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Nathan</given-names>
            <surname>Dowlin</surname>
          </string-name>
          , Ran Gilad-Bachrach, Kim Laine, Kristin Lauter, Michael Naehrig, and
          <string-name>
            <given-names>John</given-names>
            <surname>Wernsing</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>CryptoNets: Applying Neural Networks to Encrypted Data with High Throughput and Accuracy</article-title>
          .
          <source>Technical Report MSR-TR-2016-3.</source>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Jeffrey L.</given-names>
            <surname>Elman</surname>
          </string-name>
          .
          <year>1990</year>
          .
          <article-title>Finding structure in time</article-title>
          .
          <source>Cognitive Science</source>
          <volume>14</volume>
          ,
          <issue>2</issue>
          (
          <year>1990</year>
          ),
          <fpage>179</fpage>
          -
          <lpage>211</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Jonas</given-names>
            <surname>Gehring</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Michael</given-names>
            <surname>Auli</surname>
          </string-name>
          , David Grangier,
          <string-name>
            <given-names>Denis</given-names>
            <surname>Yarats</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Yann N.</given-names>
            <surname>Dauphin</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Convolutional sequence to sequence learning</article-title>
          .
          <source>In Proceedings of the 34th International Conference on Machine Learning - Volume 70. JMLR.org</source>
          ,
          <fpage>1243</fpage>
          -
          <lpage>1252</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Thore</given-names>
            <surname>Graepel</surname>
          </string-name>
          , Kristin Lauter, and
          <string-name>
            <given-names>Michael</given-names>
            <surname>Naehrig</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>ML Confidential: Machine Learning on Encrypted Data</article-title>
          .
          <source>In Proceedings of the 15th International Conference on Information Security and Cryptology (ICISC'12)</source>
          . Springer-Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Alex</given-names>
            <surname>Graves</surname>
          </string-name>
          and
          <string-name>
            <given-names>Navdeep</given-names>
            <surname>Jaitly</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Towards End-to-end Speech Recognition with Recurrent Neural Networks</article-title>
          .
          <source>In Proceedings of the 31st International Conference on International Conference on Machine Learning - Volume 32 (ICML'14)</source>
          . JMLR.org, II-1764 - II-1772. http://dl.acm.org/citation.cfm?id=3044805.3045089
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Graves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mohamed</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Hinton</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Speech recognition with deep recurrent neural networks</article-title>
          .
          <source>In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing</source>
          .
          <fpage>6645</fpage>
          -
          <lpage>6649</lpage>
          . https://doi.org/10.1109/ICASSP.2013.6638947
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Shai</given-names>
            <surname>Halevi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Victor</given-names>
            <surname>Shoup</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Algorithms in HElib</article-title>
          . In Advances in Cryptology - CRYPTO - 34th Annual Cryptology Conference, CA, USA, Proceedings.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Ehsan</given-names>
            <surname>Hesamifard</surname>
          </string-name>
          , Hassan Takabi, and
          <string-name>
            <given-names>Mehdi</given-names>
            <surname>Ghasemi</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>CryptoDL: Towards Deep Learning over Encrypted Data</article-title>
          . In Annual Computer Security Applications Conference (ACSAC).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Ehsan</given-names>
            <surname>Hesamifard</surname>
          </string-name>
          , Hassan Takabi, and
          <string-name>
            <given-names>Mehdi</given-names>
            <surname>Ghasemi</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Deep Neural Networks Classification over Encrypted Data</article-title>
          .
          <source>In Proceedings of the Ninth ACM Conference on Data and Application Security and Privacy. ACM</source>
          ,
          <fpage>97</fpage>
          -
          <lpage>108</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Sepp</given-names>
            <surname>Hochreiter</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jürgen</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          .
          <year>1997</year>
          .
          <article-title>Long Short-Term Memory</article-title>
          .
          <source>Neural Computation</source>
          <volume>9</volume>
          ,
          <issue>8</issue>
          (
          <year>1997</year>
          ),
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          . https://doi.org/10.1162/neco.1997.9.8.1735
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Armand</given-names>
            <surname>Joulin</surname>
          </string-name>
          , Edouard Grave, Piotr Bojanowski, and
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Bag of Tricks for Efficient Text Classification</article-title>
          .
          <source>In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers</source>
          . Association for Computational Linguistics, Valencia, Spain,
          <fpage>427</fpage>
          -
          <lpage>431</lpage>
          . https://www.aclweb.org/anthology/E17-2068
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Andrej</given-names>
            <surname>Karpathy</surname>
          </string-name>
          and
          <string-name>
            <given-names>Li</given-names>
            <surname>Fei-Fei</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Deep Visual-Semantic Alignments for Generating Image Descriptions</article-title>
          .
          <source>In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).</source>
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Qian</given-names>
            <surname>Lou</surname>
          </string-name>
          and
          <string-name>
            <given-names>Lei</given-names>
            <surname>Jiang</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>SHE: A Fast and Accurate Deep Neural Network for Encrypted Data</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          .
          <fpage>10035</fpage>
          -
          <lpage>10043</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Andrew L.</given-names>
            <surname>Maas</surname>
          </string-name>
          , Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and
          <string-name>
            <given-names>Christopher</given-names>
            <surname>Potts</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Learning Word Vectors for Sentiment Analysis</article-title>
          .
          <source>In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies</source>
          . Association for Computational Linguistics
          , Portland, Oregon, USA,
          <fpage>142</fpage>
          -
          <lpage>150</lpage>
          . http://www.aclweb.org/anthology/P11-1015
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Martin Karafiát, Lukás Burget, Jan Cernocký, and
          <string-name>
            <given-names>Sanjeev</given-names>
            <surname>Khudanpur</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Recurrent neural network based language model</article-title>
          .
          <source>In INTERSPEECH.</source>
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>P.</given-names>
            <surname>Mohassel</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>SecureML: A System for Scalable Privacy-Preserving Machine Learning</article-title>
          .
          <source>In IEEE Symposium on Security and Privacy (SP).</source>
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Pennington</surname>
          </string-name>
          , Richard Socher, and
          <string-name>
            <given-names>Christopher D.</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>GloVe: Global Vectors for Word Representation</article-title>
          .
          <source>In Empirical Methods in Natural Language Processing (EMNLP)</source>
          .
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          . http://www.aclweb.org/anthology/D14-1162
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>Matthew E.</given-names>
            <surname>Peters</surname>
          </string-name>
          , Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and
          <string-name>
            <given-names>Luke</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Deep contextualized word representations</article-title>
          .
          <source>In Proc. of NAACL.</source>
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>M. Sadegh</given-names>
            <surname>Riazi</surname>
          </string-name>
          , Christian Weinert, Oleksandr Tkachenko, Ebrahim M. Songhori, Thomas Schneider, and
          <string-name>
            <given-names>Farinaz</given-names>
            <surname>Koushanfar</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Chameleon: A Hybrid Secure Computation Framework for Machine Learning Applications</article-title>
          . CoRR abs/1801.03239 (
          <year>2018</year>
          ). arXiv:1801.03239 http://arxiv.org/abs/1801.03239
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Shortell</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ali</given-names>
            <surname>Shokoufandeh</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Secure Signal Processing Using Fully Homomorphic Encryption</article-title>
          .
          <source>In Advanced Concepts for Intelligent Vision Systems - 16th International Conference</source>
          , ACIVS, Italy, Proceedings.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>N. P.</given-names>
            <surname>Smart</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Vercauteren</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Fully homomorphic SIMD operations</article-title>
          .
          <source>Designs, Codes and Cryptography</source>
          <volume>71</volume>
          ,
          <issue>1</issue>
          (
          <year>Apr 2014</year>
          ),
          <fpage>57</fpage>
          -
          <lpage>81</lpage>
          . https://doi.org/10.1007/s10623-012-9720-4
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>Martin</given-names>
            <surname>Sundermeyer</surname>
          </string-name>
          , Ralf Schlüter, and Hermann Ney.
          <year>2012</year>
          .
          <article-title>LSTM Neural Networks for Language Modeling</article-title>
          . In INTERSPEECH.
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>Kelvin</given-names>
            <surname>Xu</surname>
          </string-name>
          , Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Rich Zemel, and
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Show, Attend and Tell: Neural Image Caption Generation with Visual Attention</article-title>
          .
          <source>In Proceedings of the 32nd International Conference on Machine Learning (Proceedings of Machine Learning Research)</source>
          , Francis Bach and David Blei (Eds.), Vol.
          <volume>37</volume>
          . PMLR, Lille, France,
          <fpage>2048</fpage>
          -
          <lpage>2057</lpage>
          . http://proceedings.mlr.press/v37/xuc15.html
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>Zhilin</given-names>
            <surname>Yang</surname>
          </string-name>
          , Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and
          <string-name>
            <given-names>Quoc V.</given-names>
            <surname>Le</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>XLNet: Generalized Autoregressive Pretraining for Language Understanding</article-title>
          . arXiv preprint arXiv:1906.08237 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>Sergey</given-names>
            <surname>Zagoruyko</surname>
          </string-name>
          and
          <string-name>
            <given-names>Nikos</given-names>
            <surname>Komodakis</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Wide Residual Networks</article-title>
          .
          <source>CoRR abs/1605.07146</source>
          (
          <year>2016</year>
          ). arXiv:1605.07146 http://arxiv.org/abs/1605.07146
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gong</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Encrypted Speech Recognition Using Deep Polynomial Networks</article-title>
          .
          <source>In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</source>
          .
          <fpage>5691</fpage>
          -
          <lpage>5695</lpage>
          . https://doi.org/10.1109/ICASSP.2019.8683721
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>