<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Gray-box Techniques for Adversarial Text Generation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Prithviraj Dasgupta</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Joseph Collins</string-name>
          <email>joseph.collins@nrl.navy.mil</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anna Buhman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>P. Dasgupta and A. Buhman are with the Computer Science Dept., University of Nebraska at Omaha</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We consider the problem of adversarial text generation in the context of cyber-security tasks such as email spam filtering and text classification for sentiment analysis on social media sites. In adversarial text generation, an adversary attempts to perturb valid text data to generate adversarial text such that the adversarial text ends up getting mis-classified by a machine classifier. Many existing techniques for perturbing text data use gradient-based or white-box methods, where the adversary observes the gradient of the loss function from the classifier for a given input sample, and uses this information to strategically select portions of the text to perturb. On the other hand, black-box methods where the adversary does not have access to the gradient of the loss function from the classifier and has to probe the classifier with different input samples to generate successful adversarial samples, have been used less often for generating adversarial text. In this paper, we integrate black-box methods where the adversary has a limited budget of the number of probes to the classifier, with white-box, gradient-based methods, and evaluate the effectiveness of the adversarially generated text in misleading a deep network classifier model.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Machine learning-based systems are currently used widely
for different cyber-security tasks including email spam
filtering, network intrusion detection, sentiment analysis of
posts on social media sites like Twitter, and validating
authenticity of news posts on social media sites. Most of these
systems use supervised learning techniques where a
learning algorithm called a classifier is trained to categorize data
into multiple categories. For instance, an automated
classifier for an email spam filter classifies incoming email into
spam and non-spam categories. A critical vulnerability of
classifier-based learning algorithms is that they are
susceptible to adversarial attacks where a malicious adversary could
send corrupted or poisoned data to the classifier that results
in incorrect classification, or, for more calculated attacks,
use corrupted data to alter the classification decisions of the
classifier. Both these attacks could compromise the
operation of the classifier, making it behave in an unintended,
possibly insecure and unsafe manner. As an example, a
compromised email classifier could end up categorizing valid email
messages as spam (false positives) while delivering spam
email messages as non-spam (false negatives). An
important defense against adversarial attacks is to build a model
of the adversary including the data that the adversary
generates, and use this model as a virtual adversary to improve the
learner’s robustness to adversarial attacks. Towards this
objective, in this paper, we investigate adversarial data
generation techniques that could be used to model corrupted data
generated by a virtual adversary. Our work focuses on
adversarial text data generation as many of the aforementioned
cyber-security tasks operate mainly on text data.</p>
      <p>
        Adversarial data generation techniques can be broadly
classified into two categories called white-box and
black-box. In white-box techniques, the adversary has access to
information of the classifier, such as the parameters of
the model of the classifier, e.g., weights in a deep neural
network-based classifier, or, to internal calculations of the
classifier, such as the gradients of the loss function
calculated by the classifier on a sample. In contrast, in
black-box techniques, the adversary does not have any
information about the classifier and treats it like a black-box, by
passing data samples as queries to the classifier and
observing the classifier’s output category or label for each sample.
Additionally, in most practical black-box settings, the
adversary can send only a finite number of queries to the
classifier due to the adversary’s budget limitations in generating
samples, or, due to restrictions in the number of queries
accepted by the classifier. While white-box techniques such as
gradient-based methods
        <xref ref-type="bibr" rid="ref15">(Liang et al. 2018)</xref>
        have been
proposed for generating adversarial text, black-box methods for
generating adversarial text
        <xref ref-type="bibr" rid="ref5">(Gao et al. 2018)</xref>
        are less
investigated in the literature. In this paper, we describe gray-box
techniques for generating adversarial text, in which
gradient-based white-box techniques
are combined with budget-limited, black-box techniques for
generating adversarial samples. We have evaluated the
adversarial text data generated using our proposed technique
using the DBPedia dataset that contains short articles on
14 different categories extracted from Wikipedia, in terms
of the difference from the original data used as a basis to
generate the adversarial data, as well as the effectiveness of
the adversarial data in fooling a classifier into making
misclassifications. Our results show that using gray-box
techniques, the divergence of the perturbed text from the original,
unperturbed text increases with the amount of perturbation,
and, more perturbation of original text results in changing
the label of the perturbed text with respect to the original
text more often.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Adversarial data generation techniques have received
considerable attention from researchers in the recent past. The
main concept in adversarial data generation is to add slight
noise to valid data using a perturbation technique so that the
corrupted data still appears legitimate to a human but gets
mis-classified by a machine classifier. Most research on
adversarial data generation has focused on image data,
including gradient-based methods for perturbing valid data (Biggio
et al. 2013),
        <xref ref-type="bibr" rid="ref19">(Goodfellow, Shlens, and Szegedy 2014)</xref>
        ,
        <xref ref-type="bibr" rid="ref20">(Papernot et al. 2016)</xref>
        , recurrent neural networks
        <xref ref-type="bibr" rid="ref8">(Gregor et al.
2015)</xref>
        , and generative adversarial networks (GANs)
(Goodfellow et al. 2014) and its extensions
        <xref ref-type="bibr" rid="ref19">(Mirza and Osindero
2014)</xref>
        , (Chen et al. 2016),
        <xref ref-type="bibr" rid="ref1 ref12 ref28">(Arjovsky, Chintala, and
Bottou 2017)</xref>
        . However, rather than generating synthetic data
towards misleading a machine classifier, the objective of
GAN-enabled data generation has been to create synthetic
data that looks convincing to humans.
      </p>
      <p>
Adversarial Text Generation. For adversarial image
generation, image characteristics like RGB pixel values, HSV
values, brightness, contrast, etc., are real-valued numbers
that can be manipulated using mathematical operations to
generate perturbed images that can fool a machine classifier
while remaining imperceptible to humans. In contrast, for
perturbing text, adding a real value to a numerical
representation of a word, character, or token in an embedding space,
might result in a new word that might either be nonsense,
or, might not fit in with the context of the unperturbed,
original text; in both cases, the adversarial text can be easily
flagged by both a machine classifier and a human. To
address this shortcoming, instead of directly using image
perturbation techniques to generate adversarial text, researchers
have proposed using auto-encoders
        <xref ref-type="bibr" rid="ref2">(Bengio et al. 2013)</xref>
        and
recurrent neural networks (RNN) with long short term
memory (LSTM) architecture to generate text as sequences of
tokens
        <xref ref-type="bibr" rid="ref17">(Mikolov et al. 2010)</xref>
        , albeit with limitations when
generating adversarial text that looks realistic to humans
        <xref ref-type="bibr" rid="ref3">(Bengio et al. 2015)</xref>
        ,
        <xref ref-type="bibr" rid="ref10">(Huszár 2015)</xref>
        . Recently, GAN-based
methods have been shown to be successful for adversarial text
generation. In adversarial text generation GANs, a feedback
signal (Yu et al.
        <xref ref-type="bibr" rid="ref9">2017), (Zhang et al. 2017</xref>
        )
        <xref ref-type="bibr" rid="ref26">(Subramanian
et al. 2017)</xref>
        such as a reward value within a reinforcement
learning framework
        <xref ref-type="bibr" rid="ref24 ref4">(Fedus, Goodfellow, and Dai 2018)</xref>
        or
high-level features of text identified by the classifier called
leaked information (Guo et al. 2017), is evaluated by the
discriminator or classifier and provided to the generator or
adversary. The adversary treats this information as a feedback
signal from the classifier about the quality of its
adversarially generated text, and adapts future adversarial generations
of words or phrases in the text towards improving the
quality of its generated text. Additional methods for generating
adversarial text are described in
        <xref ref-type="bibr" rid="ref11">(Iyyer et al. 2018)</xref>
        , (Jia and
Liang
        <xref ref-type="bibr" rid="ref9">2017), (Shen et al. 2017</xref>
        ). Following gradient-based
techniques for image perturbation
        <xref ref-type="bibr" rid="ref19">(Goodfellow, Shlens, and
Szegedy 2014)</xref>
        , (Ebrahimi et al. 2018),
        <xref ref-type="bibr" rid="ref15">(Liang et al. 2018)</xref>
        have used gradient-based methods to identify tokens in text
that have the highest gradient and consequently most
influence in determining the text’s label in the classifier. These
tokens are then strategically modified so that the text with
modified tokens results in a different label when classified,
as compared to the original text’s label. Both these
methods require the gradient information for the
unperturbed text when it is classified by the classifier to perturb the
text. In contrast, in
        <xref ref-type="bibr" rid="ref5">(Gao et al. 2018)</xref>
        , Gao et al. use a similar
idea, but instead of selecting words or characters to perturb
based on classifier-calculated gradients, they first rank words
in a piece of text to be perturbed based on a score assigned
to the word’s context, followed by changing the spelling of
the highest ranked words. Their technique is relevant when
the gradient information from the classifier might not be
readily available to perturb text, for example, in an online
classification service where the classifier can be accessed
only as a black box. Sethi and Kantardzic
        <xref ref-type="bibr" rid="ref24">(Sethi and
Kantardzic 2018)</xref>
        , also described black-box methods for
generating adversarial numerical and text data while considering
budget limitations of the adversary in generating
adversarial data. Our work is complementary to these gradient-based
and black-box approaches as it compares the effectiveness
of generating adversarial text by integrating gradient-based
methods with concepts used in budget-limited black-box
methods.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Adversarial Text Perturbation</title>
      <p>
        One of the main functions of an adversary in an adversarial
learning setting is to generate, and possibly mislabel,
adversarial text to fool the classifier. Adversarial text is usually
generated by starting from valid text and changing certain
words or characters in it. This process is also referred
to as perturbing text. We define adversarial text
perturbation using the following formalization: Let V denote a
vocabulary of English words and X denote a corpus of
English text. {Xi} ⊂ X denotes a dataset consisting of
English text samples, where Xi denotes the i-th sample.
Further, Xi = (Xi(1), Xi(2), ..., Xi(W)), where Xi(j) denotes the
j-th word in Xi and W is the maximum number of words
of a text sample. Each word Xi(j) is represented by a feature
vector Xi(j) = f1:F generated using some word embedding
like word2Vec
        <xref ref-type="bibr" rid="ref18">(Mikolov et al. 2013)</xref>
        , Global Vector (GloVe)
        <xref ref-type="bibr" rid="ref19 ref23">(Pennington, Socher, and Manning 2014)</xref>
        , or fastText (Joulin
et al. 2016), where F is the embedding space dimension.
Each sample Xi belongs to a class with label l ∈ L, where
L is a finite set of class labels. For notational convenience,
we assume that l = y(Xi), where y : X → L is a
relation that ascertains the ground truth label l for Xi. A classifier
C is also used to determine a label for Xi; the classifier’s
output is given by yC : X → L. We say that the
classifier classifies Xi correctly only when yC(Xi) = y(Xi). A
valid example Xi is perturbed by altering a subset of words
{Xi(j)} ⊂ Xi using a perturbation strategy π. The
perturbation strategy π : X → X modifies the j-th word Xi(j) to a
word X̃i(j) ∈ V, 1 ≤ j ≤ W. Let n = |{X̃i(j)}| denote
the number of words perturbed by the perturbation strategy,
and let X̃i,n denote the text Xi after perturbing n words in it.
Finally, let ∆ : X × X → [0, 1] denote a divergence measure
between two pieces of text, with ∆(Xi, Xi) = 0. Within this
formalization, the objective of the adversary is to determine
a minimal perturbation n* of Xi satisfying:
      </p>
      <p>n* = arg min_n ∆(Xi, X̃i,n)
s.t. yC(X̃i,n) ≠ y(Xi),
and yC(Xi) = y(Xi)
(1)</p>
      <p>The objective function in Equation 1 finds the number
of words to perturb that gives the minimum divergence
between the original and perturbed text, while the last two
constraints respectively ensure that the classifier mis-classifies
the perturbed text, X̃i,n*, giving it a different label than the
ground truth y(Xi), but the classifier correctly classifies the
original text Xi. In the next section, we describe different
perturbation strategies we have used to perturb valid text and
generate adversarial text.</p>
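      <p>As a concrete illustration of Equation 1, the following sketch (ours, not an implementation from the paper; the classifier, perturbation strategy, and divergence measure are toy stand-ins) searches for the smallest number of perturbed words that flips the classifier’s label:</p>

```python
def minimal_perturbation(x, classify, perturb, divergence, max_words=40):
    """Search for the smallest n such that perturbing n words of x changes
    the classifier's label (the constraints in Equation 1)."""
    original_label = classify(x)
    for n in range(1, min(max_words, len(x)) + 1):
        x_tilde = perturb(x, n)                  # perturb n words of x
        if classify(x_tilde) != original_label:  # y_C(x~) != y(x)
            return n, x_tilde, divergence(x, x_tilde)
    return None                                  # no successful perturbation

# Toy stand-ins, for illustration only:
def classify(words):    # label 1 iff "good" words are the majority
    return int(sum(w == "good" for w in words) > len(words) / 2)

def perturb(words, n):  # replace the first n "good" words with "bad"
    out, flipped = list(words), 0
    for i, w in enumerate(out):
        if flipped == n:
            break
        if w == "good":
            out[i], flipped = "bad", flipped + 1
    return out

def divergence(a, b):   # fraction of word positions that differ
    return sum(x != y for x, y in zip(a, b)) / len(a)

text = ["good"] * 4 + ["bad"]
result = minimal_perturbation(text, classify, perturb, divergence)
```

      <p>Because the divergence here grows monotonically with n, the first successful n also minimizes the divergence, matching the objective in Equation 1.</p>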
<p>Perturbation Methods for Text Data. An adversarial
perturbation method adds a certain amount of
noise to unperturbed data to generate perturbed data. There
are two main questions in a perturbation method: how much
noise to add, and where in the data to add the noise. For the
first question, the two types of perturbation methods,
whitebox and black-box, both add a random, albeit small amount
of noise. But these two types of methods differ in their
approach to answer the second question. Below, we describe
the white-box and black-box perturbation methods, and
propose a gray-box method that combines the advantages of the
two.</p>
      <p>White-box Gradient-based Perturbation. White-box
methods first query the classifier with the unperturbed text
and observe the gradient of the loss function of the
classifier’s output from the query for the different words in the
queried text. Because the loss function expresses the
difference of the classifier’s determined label from the ground
truth label for the queried text, the word in the text that
corresponds to the most negative gradient or change of the loss
function is most influential in changing the label for the text
determined by the classifier. This word is selected as the
word to perturb. Mathematically, the selected word, x*,
having the most negative gradient, is given by:</p>
      <p>x* = arg min_j ∑_k w_{j,k} (∂L/∂f_k)   (2)</p>
      <p>where L is the loss function from the classifier, ∂L/∂f_k is
the gradient of the loss function with respect to the k-th feature of a
word, and w_{j,k} is the weight in the embedding-layer neural
network connecting the j-th word to its k-th feature in
embedding space. x* is then perturbed by replacing it with x̃(j),
the word from the vocabulary that has the smallest positive
gradient of the loss function and, consequently, the least
influence in changing the label determined by the classifier.
Mathematically, x̃(j) is given by:</p>
      <p>x̃(j) = arg min_j ∑_k w_{j,k} (∂L/∂f_k), s.t. ∂L/∂f_k &gt; 0   (3)</p>
      <p>The general schematic of the white-box gradient-based
perturbation is shown in Figure 1.</p>
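      <p>Under our reading of Equations 2 and 3, this selection step can be sketched as follows (the embedding matrix, gradients, and token ids below are made-up toy values): each position’s score is the sum over features of the embedding weight times the loss gradient; the most negative score marks the word to perturb, and the vocabulary word with the smallest strictly positive score is its replacement.</p>

```python
def select_and_replace(grad, emb, token_ids):
    """grad: per-position gradients of the loss w.r.t. word features (W x F);
    emb: embedding matrix w_{j,k} (V x F); token_ids: word ids of the text.
    Returns (position with most negative score, replacement vocabulary id)."""
    dot = lambda a, b: sum(u * v for u, v in zip(a, b))
    # Equation 2: score_j = sum_k w_{j,k} * dL/df_k; most negative wins.
    scores = [dot(emb[t], g) for t, g in zip(token_ids, grad)]
    pos = min(range(len(scores)), key=scores.__getitem__)
    # Equation 3: among vocabulary words, smallest strictly positive score.
    vocab_scores = [dot(row, grad[pos]) for row in emb]
    candidates = [v for v, s in enumerate(vocab_scores) if s > 0]
    repl = min(candidates, key=vocab_scores.__getitem__)
    return pos, repl

# Toy example: 4-word vocabulary with 2-dimensional embeddings.
emb = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.0]]
grad = [[-1.0, 0.0], [0.0, 1.0]]   # gradients for a 2-word query
pos, repl = select_and_replace(grad, emb, token_ids=[0, 1])
```
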
      <p>
Black-box Perturbation. In contrast to white-box
methods, black-box methods select the perturbation position
randomly within unperturbed text. Consequently, black-box
methods could yield adversarial examples that are less
effective in misguiding the classifier than white-box generated
examples. Below, we describe two black-box methods used
to perturb text.
• Black-box, Budget-Limited Perturbation (Anchor
Points). The anchor points (AP) method, proposed
in
        <xref ref-type="bibr" rid="ref24">(Sethi and Kantardzic 2018)</xref>
        , is prescribed when the
adversary has a limited budget and can send fewer queries to
the classifier. The adversary starts with a set of valid,
unperturbed samples drawn randomly from the unperturbed
dataset. AP adversarial data generation proceeds in two
stages called exploration and exploitation, each with its
respective budget. In the exploration stage, the adversary
uses one unit of exploration budget to randomly select
one sample from the unperturbed set and adds to it
a perturbation vector drawn from a normal distribution
N (0; R), where R ∈ [Rmin; Rmax) is a perturbation
radius. The perturbed sample is sent as a query to the
classifier and if it is categorized with the same label as the
original sample, it is retained as a candidate
adversarial example. The perturbation radius is adjusted
proportional to the fraction of candidate adversarial examples
and these steps are repeated until the exploration budget
is expended. During the exploit phase, the adversary
creates an adversarial example by generating a convex
combination of a pair of randomly selected candidate
adversarial examples created during exploration phase.
Generating each adversarial example consumes one unit of
adversary’s exploitation budget.
• Black-box, Budget-Lenient Perturbation (Reverse
Engineering). The reverse engineering (RE) attack, also
proposed in
        <xref ref-type="bibr" rid="ref24">(Sethi and Kantardzic 2018)</xref>
        , is accomplished
once again in two stages called exploration and
exploitation. The adversary once again starts with a set of valid,
unperturbed samples drawn randomly from the
unperturbed dataset. In the exploration stage, the adversary
trains its own stand-in classifier representing the real
classifier. To get training data for its stand-in, it generates
sample adversarial data by generating a random vector in
a direction that is orthogonal to the difference vector
between a pair of valid samples taken from the unperturbed
set. The adversarial example is generated by adding the
random vector and the average of the two valid samples
used to generate the random vector. The adversarial
example and its category obtained by sending the example
as a query to the classifier, are recorded as training data
and used to train the adversary’s stand-in classifier. In the
exploitation stage, the stand-in classifier is used to
generate samples that are predicted by it to produce a desired
classification. The anchor points method described above
could be used for the exploitation stage. The reader is
referred to
        <xref ref-type="bibr" rid="ref24">(Sethi and Kantardzic 2018)</xref>
for further details about
the AP and RE algorithms.
      </p>
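      <p>The exploration/exploitation loop of the AP method can be sketched as follows (our simplified reading of Sethi and Kantardzic (2018), operating on generic feature vectors; the exact radius-adaptation rule below is one plausible choice, not necessarily the authors’ own):</p>

```python
import random

def anchor_points(samples, query_label, explore_budget, exploit_budget,
                  r_min=0.1, r_max=1.0):
    """samples: valid feature vectors of one class; query_label: sends one
    probe to the (black-box) classifier per call, consuming a budget unit."""
    base_label = query_label(samples[0])
    candidates, successes, radius = [], 0, r_min
    # Exploration: perturb random samples with Gaussian noise of radius R;
    # keep probes the classifier still assigns the original label.
    for attempt in range(1, explore_budget + 1):
        x = random.choice(samples)
        probe = [xi + random.gauss(0.0, radius) for xi in x]
        if query_label(probe) == base_label:
            successes += 1
            candidates.append(probe)
        # Grow the radius in proportion to the candidate fraction so far.
        radius = r_min + (r_max - r_min) * successes / attempt
    # Exploitation: each adversarial example is a convex combination of a
    # random pair of candidates, costing one exploitation-budget unit.
    adversarial = []
    for _ in range(exploit_budget):
        if len(candidates) >= 2:
            a, b = random.sample(candidates, 2)
            t = random.random()
            adversarial.append([t * u + (1 - t) * v for u, v in zip(a, b)])
    return adversarial
```

      <p>Retaining the original label means a candidate sits inside the classifier’s decision region for that class while being displaced from real data; the convex combinations stay in that region whenever the region is convex.</p>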
      <p>Gray-box Perturbation. Despite their limitation of
generating less effective adversarial text, black-box methods can
generate a larger and more diverse set of adversarial examples
with fewer queries or probes to the classifier than white-box
methods. With this insight, we propose to combine
white-box and black-box methods into a gray-box method to
investigate perturbation methods that can generate effective
yet diverse adversarial examples. In the gray-box method,
instead of drawing unperturbed samples for the black-box
methods randomly from the original dataset, we first use the
white-box method to generate a small seed set of perturbed
examples from samples that are randomly drawn from the
original dataset. This seed set of perturbed examples is then
used to create a larger set of perturbed examples using the
AP and RE black-box methods.</p>
      <!-- Figure 2: number of samples retaining the same label vs. BLEU score, for 1, 10, 20, and 30 perturbed words. -->
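      <p>The gray-box pipeline is then just the composition of the two stages. In outline (a sketch of the pipeline described above, with the white-box perturbation and the black-box expansion passed in as callables; the stand-in stages in the example are hypothetical):</p>

```python
import random

def gray_box_generate(dataset, white_box_perturb, black_box_expand,
                      seed_size=10):
    """Draw seed_size random samples, perturb each with the white-box
    gradient-based method, then hand the perturbed seed set to a black-box
    stage (AP or RE) to expand it into a larger adversarial set."""
    seed = [white_box_perturb(x) for x in random.sample(dataset, seed_size)]
    return black_box_expand(seed)

# Toy illustration with placeholder stages:
dataset = [[float(i), float(-i)] for i in range(20)]
shift = lambda x: [v + 0.1 for v in x]          # stand-in white-box step
midpoints = lambda seed: [[(a + b) / 2 for a, b in zip(u, v)]
                          for u, v in zip(seed, seed[1:])]  # stand-in expansion
```
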
    </sec>
    <sec id="sec-4">
      <title>Experimental Evaluation</title>
      <p>
        Word Embedding and Classifier Network. We have
evaluated our proposed techniques on the DBPedia dataset.
The dataset contains 700,000 short articles extracted from
Wikipedia and categorized into 14 different categories. For
generating word embeddings from English words we have
used a widely used vector representation format called
Word2Vec
        <xref ref-type="bibr" rid="ref18">(Mikolov et al. 2013)</xref>
        . Word2Vec trains a neural
network on a vocabulary of 150,000 English words; each
word in the vocabulary is given a unique one-hot
identifier. Word2Vec gives a feature vector in a 300-dimension
space for each word in the vocabulary. We have used a
publicly available, pre-trained Word2Vec model for generating
word embeddings for our experiments. Following recent
research
        <xref ref-type="bibr" rid="ref27">(Yu et al. 2017)</xref>
        that showed that the prefix of a long
piece of text is important for determining its label, we have
assumed each query is limited to the first 40 words in the
text. Before sending a query text to the classifier, words in
the text are given a unique integer identifier, words not in the
vocabulary are given an id of zero. The classifier used for our
experiments is based on
        <xref ref-type="bibr" rid="ref12 ref27 ref28 ref29 ref30">(Zhang and Wallace 2017)</xref>
. It uses
a deep neural network with three convolutional layers with
filter sizes of 3, 4, and 5, respectively (code available at
https://github.com/dennybritz/cnn-textclassification-tf). A rectified linear unit
(ReLU) activation function and max pooling are used with
each layer. The pooled outputs are combined by
concatenating them, followed by a dropout layer with keep probability
0.5. The output of the dropout layer is a probability
distribution indicating the confidence level of the classifier for
the query text over the different data categories. The
category with the maximum confidence level is output as the
category of the query text. The loss function used for
calculating gradients of the input is implemented using softmax
cross entropy. The gradients are backpropagated through the
classifier’s network to calculate the gradients of the different
features at the input. The feature gradients are again
backpropagated through the Word2Vec network to calculate the
gradients of the words, as shown in Figure 1.
      </p>
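      <p>The query preprocessing described above (truncate to the first 40 words, map each word to its integer id, id 0 for out-of-vocabulary words) is straightforward; a sketch with a hypothetical two-word vocabulary:</p>

```python
def encode_query(text, vocab, max_words=40):
    """Return integer ids for the first max_words words of text; words
    outside the vocabulary map to id 0, as described above."""
    words = text.lower().split()[:max_words]
    return [vocab.get(w, 0) for w in words]

vocab = {"movie": 7, "great": 12}   # hypothetical tiny vocabulary
ids = encode_query("Great movie but strange", vocab)
```
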
      <p>
        For evaluating the effectiveness of the perturbation
technique we have used the Bilingual Evaluation Understudy
(BLEU) score
        <xref ref-type="bibr" rid="ref21">(Papineni et al. 2002)</xref>
        . BLEU is a widely used
evaluation metric for automated machine translation of text
and has recently been used to measure the difference
between adversarially generated synthetic text with original
text
        <xref ref-type="bibr" rid="ref12 ref27 ref28 ref29 ref30">(Zhang, Xu, and Li 2017)</xref>
        . It compares the similarity
between a machine translation of text and a professional
human translation without considering grammar or whether the
text makes sense. We have used BLEU-4 that scores sets of
four consecutive words, or 4-grams. When two pieces of text
are identical, their BLEU score is 1.0, and as the dissimilarity
increases, the BLEU score approaches 0.
      </p>
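      <p>A minimal sentence-level BLEU-4 can be written directly from this description (a simplified sketch without the smoothing used by standard toolkits, so any text pair sharing no n-gram of some order scores 0.0):</p>

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    # Multiset of n-grams of the token sequence.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(reference, candidate):
    """Geometric mean of modified 1- to 4-gram precisions times the brevity
    penalty; identical texts score 1.0 and the score falls toward 0 as the
    texts diverge."""
    ref, cand = reference.split(), candidate.split()
    precisions = []
    for n in range(1, 5):
        cand_grams, ref_grams = ngram_counts(cand, n), ngram_counts(ref, n)
        overlap = sum((cand_grams & ref_grams).values())  # clipped matches
        if overlap == 0:
            return 0.0
        precisions.append(overlap / sum(cand_grams.values()))
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(math.log(p) for p in precisions) / 4)
```
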
      <p>In our first experiment, we analyzed the effect of the
amount of perturbation of different text samples measured
in terms of BLEU score (x-axis) on the number of samples
that are assigned the same label before and after perturbation
(y-axis). Perturbed text was generated using the white-box
gradient-based method where the features of the words with
the most negative gradients, calculated using Equation 2,
were perturbed with a small random amount. The number of
words to perturb was varied over {1, 10, 20, 30}. For each
of these perturbation amounts, a batch of 1000 text samples
were perturbed, and results were averaged over 4 batches.
BLEU scores of perturbed text were binned into intervals of
0.1 by rounding each BLEU score to the first decimal
place. Results, shown in Figure 2, illustrate that as the BLEU
score decreases (perturbed text becomes more different from
the original text), the fraction of samples that retain the same
label before and after perturbation also decreases. This result
appears intuitive because the more different a piece of text
is from its original, unperturbed version, the less likely it is
to retain the same label. However, for BLEU scores of 0.4
and lower, we observe that the fraction of samples retaining
the same label as the original slightly increases when 10 or
more words are perturbed. This is possibly due to the fact
that when the perturbed text is very different from the
original text, most of the perturbed words are out of context from
the original text and the text appears as nonsense to a human
reader. The machine classifier however is confounded into
labeling the nonsensical, perturbed text with the same label
as the original text, possibly due to one or more of the
unperturbed words. Further investigation into this issue would
lead to a better understanding of the degree of perturbation
that converts text into rubbish for a human and possibly
unintelligible for the machine classifier as well.</p>
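      <p>The binning step above (round each BLEU score to one decimal, then compute per bin the fraction of samples keeping their label) amounts to the following, shown here with made-up scores:</p>

```python
from collections import defaultdict

def fraction_same_label_by_bin(results):
    """results: iterable of (bleu_score, kept_same_label) pairs. Bins scores
    by rounding to the first decimal place and returns, per bin, the
    fraction of samples whose label survived the perturbation."""
    bins = defaultdict(lambda: [0, 0])          # bin -> [same, total]
    for bleu, same in results:
        b = round(bleu, 1)
        bins[b][0] += int(same)
        bins[b][1] += 1
    return {b: same / total for b, (same, total) in bins.items()}

fractions = fraction_same_label_by_bin([(0.81, True), (0.78, False), (0.42, True)])
```
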
      <p>For our next set of experiments we analyzed the effect
of the main algorithm parameters in the AP and RE
algorithms, the maximum and minimum perturbation radii,
Rmin and Rmax on the amount of perturbation measured
in terms of BLEU score, and the fraction of samples that
retain the same label as the original. Results are shown in
Figures 3 through 5. We observe that for both AP (Figure 3(a))
and RE (Figure 4(a)) algorithms, as the perturbation radius
increases, the BLEU score reduces, which corroborates the
fact that the degree of perturbation increases the divergence
between the original and perturbed text. Correspondingly,
the fraction of perturbed text that retains the same label as
the original text generally decreases with the increase in
perturbation radius for both AP and RE, in Figure 3(b) and
Figure 4(b) respectively. Figure 5 shows the BLEU score
and fraction of samples that retain same label as the original
versus the number of words perturbed using the white-box
gradient-based approach, where the words with highest
negative gradient were replaced with words with smallest
positive gradient calculated using Equations 2 and 3. As before,
we observe that more perturbation results in lower BLEU
scores, implying more divergent text after perturbation. More
perturbation also reduces the fraction of perturbed samples
that retain the same label as the original text.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions and Future Work</title>
      <p>In this paper we investigated gray-box techniques for
generating adversarial text as a combination of white-box
gradient-based techniques and black-box techniques. We
validated the behavior of the gray-box techniques in
generating perturbed text, showing that more perturbation
results in greater divergence as well as a greater degree of
label changes in the perturbed text with respect to the
original text. In the future, we plan to compare the proposed
gray-box adversarial text generation methods with
GAN-based and RNN-based synthetic text generation. While
single words are usually treated as the basic lexical unit, it
would be interesting to analyze how character-level
perturbations and word sequence or n-gram-level perturbations
affect adversarial text generation. We are also interested in
getting a better understanding of how to determine a critical or
minimal amount of perturbation that would be successful in
generating adversarial text. We envisage that this work and
its extensions will generate interesting results that would
enable a better understanding of, and novel means for, adversarial
text generation, and methods to make machine classifiers
robust against adversarial text-based attacks.</p>
      <!-- Figures 3-5: BLEU score and fraction of samples retaining the original label, plotted against the maximum perturbation radius (AP and RE methods, for Rmin values of .1, .4, .7, and 1.0) and against the number of words perturbed (white-box method). -->
      <p>Biggio, B.; Corona, I.; Maiorca, D.; Nelson, B.; Šrndić, N.; Laskov, P.; Giacinto, G.; and Roli, F. 2013. Evasion attacks against machine learning at test time. In Joint European conference on machine learning and knowledge discovery in databases, 387–402. Springer.</p>
      <p>Chen, X.; Duan, Y.; Houthooft, R.; Schulman, J.; Sutskever, I.; and Abbeel, P. 2016. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In Advances in neural information processing systems, 2172–2180.</p>
      <p>Ebrahimi, J.; Rao, A.; Lowd, D.; and Dou, D. 2018. HotFlip: White-box adversarial examples for text classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, Volume 2: Short Papers, 31–36.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Arjovsky</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Chintala</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Bottou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Wasserstein GAN</article-title>
          .
<source>arXiv preprint arXiv:1701.07875</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Yao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ;
<string-name>
            <surname>Alain</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Vincent</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2013</year>
          .
          <article-title>Generalized denoising auto-encoders as generative models</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          ,
<fpage>899</fpage>
          -
          <lpage>907</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Vinyals</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ;
<string-name>
            <surname>Jaitly</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Shazeer</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
<year>2015</year>
          .
          <article-title>Scheduled sampling for sequence prediction with recurrent neural networks</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
<string-name>
            <surname>Fedus</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Goodfellow</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Dai</surname>
            ,
            <given-names>A. M.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Maskgan: Better text generation via filling in the ______</article-title>
          .
          <source>arXiv preprint arXiv:1801.07736</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Lanchantin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
;
          <string-name>
            <surname>Soffa</surname>
            ,
            <given-names>M. L.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Qi</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Blackbox generation of adversarial text sequences to evade deep learning classifiers</article-title>
          .
          <source>arXiv preprint arXiv:1801.04354</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
<string-name>
            <surname>Goodfellow</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Pouget-Abadie</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Mirza</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Warde-Farley</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ozair</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Courville</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Generative Adversarial Nets</article-title>
          . In
          <string-name>
            <surname>Ghahramani</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Welling</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Cortes</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Lawrence</surname>
          </string-name>
          , N. D.; and
          <string-name>
            <surname>Weinberger</surname>
          </string-name>
          , K. Q., eds.,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>27</volume>
          ,
          <fpage>2672</fpage>
          -
          <lpage>2680</lpage>
          . Curran Associates, Inc.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
<string-name>
            <surname>Goodfellow</surname>
            ,
            <given-names>I. J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Shlens</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Szegedy</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Explaining and harnessing adversarial examples</article-title>
          .
          <source>CoRR abs/1412.6572</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Gregor</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Danihelka</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Graves</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Rezende</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Wierstra</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Draw: A recurrent neural network for image generation</article-title>
          . In Bach, F., and
          <string-name>
            <surname>Blei</surname>
          </string-name>
          , D., eds.,
          <source>Proceedings of the 32nd International Conference on Machine Learning</source>
          , volume
          <volume>37</volume>
          <source>of Proceedings of Machine Learning Research</source>
          ,
<fpage>1462</fpage>
          -
          <lpage>1471</lpage>
          . Lille, France: PMLR.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
<year>2017</year>
          .
          <article-title>Long text generation via adversarial training with leaked information</article-title>
          .
          <source>arXiv preprint arXiv:1709</source>
          .
          <fpage>08624</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
<surname>Huszár</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>How (not) to train your generative model: Scheduled sampling, likelihood, adversary?</article-title>
          .
          <source>arXiv preprint arXiv:1511.05101</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Iyyer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Wieting</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Gimpel</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Adversarial example generation with syntactically controlled paraphrase networks</article-title>
          .
<source>arXiv preprint arXiv:1804.06059</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Jia</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Liang</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Adversarial examples for evaluating reading comprehension systems</article-title>
          .
<source>arXiv preprint arXiv:1707.07328</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Joulin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Grave</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Bojanowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Bag of Tricks for Efficient Text Classification</article-title>
          .
          <source>arXiv:1607.01759 [cs]</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Liang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Su</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Bian</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Shi</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Deep text classification can be fooled</article-title>
          .
          <source>In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18</source>
          ,
          <fpage>4208</fpage>
          -
          <lpage>4215</lpage>
          .
          <source>International Joint Conferences on Artificial Intelligence Organization.</source>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ; Karafiát, M.;
          <string-name>
            <surname>Burget</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ; Černocký, J.; and
          <string-name>
            <surname>Khudanpur</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>Recurrent neural network based language model</article-title>
          .
          <source>In Eleventh Annual Conference of the International Speech Communication Association.</source>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
;
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G. S.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2013</year>
          .
          <article-title>Distributed Representations of Words and Phrases and their Compositionality</article-title>
          . In Burges,
          <string-name>
            <given-names>C. J. C.</given-names>
            ;
            <surname>Bottou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ;
            <surname>Welling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ;
            <surname>Ghahramani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            ; and
            <surname>Weinberger</surname>
          </string-name>
          , K. Q., eds.,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>26</volume>
          ,
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          . Curran Associates, Inc.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <surname>Mirza</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Osindero</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Conditional generative adversarial nets</article-title>
          .
<source>arXiv preprint arXiv:1411.1784</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <surname>Papernot</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>McDaniel</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Jha</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
;
          <string-name>
            <surname>Fredrikson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Celik</surname>
            ,
            <given-names>Z. B.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Swami</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>The limitations of deep learning in adversarial settings</article-title>
          .
          <source>In Security and Privacy (EuroS&amp;P)</source>
          ,
          <source>2016 IEEE European Symposium on</source>
          ,
          <fpage>372</fpage>
          -
          <lpage>387</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <surname>Papineni</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Roukos</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ward</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>W.-J.</given-names>
          </string-name>
          <year>2002</year>
          .
          <article-title>BLEU: A Method for Automatic Evaluation of Machine Translation</article-title>
          .
          <source>In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics</source>
          , ACL '02,
          <fpage>311</fpage>
          -
          <lpage>318</lpage>
          . Stroudsburg, PA, USA: Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Glove: Global Vectors for Word Representation</article-title>
          .
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <surname>Sethi</surname>
            ,
            <given-names>T. S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Kantardzic</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Data driven exploratory attacks on black box classifiers in adversarial domains</article-title>
          .
          <source>Neurocomputing</source>
          <volume>289</volume>
          :
          <fpage>129</fpage>
          -
          <lpage>143</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Lei</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Barzilay</surname>
          </string-name>
          , R.; and
          <string-name>
            <surname>Jaakkola</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Style transfer from non-parallel text by cross-alignment</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          ,
<fpage>6830</fpage>
          -
          <lpage>6841</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <surname>Subramanian</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Rajeswar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Dutil</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Pal</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Courville</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Adversarial generation of natural language</article-title>
          .
          <source>In Proceedings of the 2nd Workshop on Representation Learning for NLP</source>
          ,
          <fpage>241</fpage>
          -
          <lpage>251</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
;
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          and
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Seqgan: Sequence generative adversarial nets with policy gradient</article-title>
          .
          <source>In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9</source>
          ,
          <year>2017</year>
          , San Francisco, California, USA.,
<fpage>2852</fpage>
          -
          <lpage>2858</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Wallace</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>A sensitivity analysis of (and practitioners guide to) convolutional neural networks for sentence classification</article-title>
          .
          <source>In Proceedings of the Eighth International Joint Conference on Natural Language Processing</source>
          (Volume 1: Long Papers), volume
          <volume>1</volume>
          ,
          <fpage>253</fpage>
          -
          <lpage>263</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
<string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Gan</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Fan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Henao</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Carin</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Adversarial feature matching for text generation</article-title>
          .
          <source>In International Conference on Machine Learning</source>
          ,
          <fpage>4006</fpage>
          -
          <lpage>4015</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
<string-name>
            <surname>Zhang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks</article-title>
          .
          <source>In IEEE International Conference on Computer Vision</source>
          , ICCV 2017, Venice, Italy,
          <source>October 22-29</source>
          ,
          <year>2017</year>
          ,
          <fpage>5908</fpage>
          -
          <lpage>5916</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>