<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>Conference and Labs of the Evaluation Forum</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>TurQUaz at CheckThat! 2024: Creating Adversarial Examples using Genetic Algorithm</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Basak Demirok</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Selin Mergen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bugra Oz</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mucahid Kutlu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Qatar University</institution>
          ,
          <addr-line>Doha</addr-line>
          ,
          <country country="QA">Qatar</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>TOBB University of Economics and Technology</institution>
          ,
          <addr-line>Ankara</addr-line>
          ,
          <country country="TR">Türkiye</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>0</volume>
      <fpage>9</fpage>
      <lpage>12</lpage>
      <abstract>
<p>As we increasingly integrate artificial intelligence into our daily tasks, it is crucial to ensure that these systems are reliable and robust against adversarial attacks. In this paper, we present our participation in Task 6 of the CLEF CheckThat! 2024 lab. In our work, we explore several methods, which can be grouped into two categories. The first group focuses on using a genetic algorithm to detect words and change them via several methods, such as adding/deleting words and using homoglyphs. In the second group of methods, we use large language models to generate adversarial attacks. Based on our comprehensive experiments, we pick the genetic algorithm-based model which utilizes a combination of splitting words and homoglyphs as its text manipulation method as our primary model. We ranked third based on both the BODEGA metric and manual evaluation.</p>
      </abstract>
      <kwd-group>
<kwd>Adversarial Examples</kwd>
        <kwd>Credibility Assessment</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Robustness</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Researchers have investigated various adversarial attacks and defense mechanisms for NLP tasks [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
Chen et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] examined backdoor attacks where manipulated training data causes models to fail with
specific triggers but perform normally otherwise. Yang et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] demonstrated that altering a single
word embedding can fool sentiment analysis models without affecting clean data results. Kurita et al.
[
        <xref ref-type="bibr" rid="ref9">9</xref>
] compared various backdoor attacks for different NLP tasks, finding that attack success varies across
tasks. Dai et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
] showed that inserting trigger sentences into LSTM-based models is highly effective.
In this study, we target already trained models.
      </p>
      <p>
        Researchers have also examined the vulnerabilities of NLP models in a black-box setting. The methods
explored in prior work can be categorized into three types: 1) character-level changes, where words are
modified with different spelling errors [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], 2) word-level changes, involving the replacement, removal,
or addition of words [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], and 3) sentence-level changes, where new sentences or phrases are added, or
existing ones are removed or paraphrased [<xref ref-type="bibr" rid="ref13">13</xref>]. While our text manipulation techniques are similar to
the ones in the literature, we use a genetic algorithm to decide the words to be changed and how to
change them. In addition, we explore the utilization of LLMs for adversarial example generation.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Proposed Methods</title>
      <p>In our work, we propose two different methods: a genetic algorithm-based approach and an LLM-based
approach. We explain these methods below.</p>
      <sec id="sec-3-1">
        <title>3.1. Genetic Algorithm Based Approach</title>
        <p>Some words are more important than others in several NLP tasks. For instance, let us consider the
following statement for the fact-checking task: "The capital city of Türkiye is Ankara." If we change the
word Ankara to any other city name, the statement will be false. Therefore, if we can make the word
Ankara unreadable for the models, the model is likely to be confused in its prediction. Once
an important word is selected, then the next question is how to modify it to fool the models. Therefore,
our approach can be considered in two steps: i) detecting the words to be modified and ii) applying the
modification method. Now we explain these two steps in detail.</p>
        <sec id="sec-3-1-1">
          <title>3.1.1. Selecting Words to be Modified</title>
          <p>We develop a genetic algorithm for selecting the words to be modified. Algorithm 1 describes our
genetic algorithm. As the first step, we tokenize the input text and create potential mutations to form
an initial population [Line 1]. Based on our mutation strategy, we apply mutations to each word
with a probability defined by the mutation_rate (0.1). Each candidate’s fitness is evaluated based on its
ability to deceive the target model [Line 3]. If a candidate changes the label of the input, it receives a
fitness score that reflects the modifications made. Otherwise, the candidate receives a 1000-point
penalty in addition to the modification cost. Through the selection phase, the most promising candidates are
retained, and through genetic operations like crossover and mutation, a new generation of text variants
is created [Lines 4-5]. The crossover operation is executed by selecting a random, appropriate point
in the token list of the chosen parents, ensuring that the structural integrity of words is maintained.
Offspring are then produced by merging segments from each parent up to and beyond this point [Line
5]. As for the mutation, it alters each token based on the chosen mutation strategy/strategies, such as
homoglyph replacement or various strategic word splits with a probability defined by the mutation_rate
[Line 5]. We explain the mutation strategies in detail in Section 3.1.2. This iterative process continues
until a successful adversarial text is generated or the maximum number of generations is reached [Lines
2-7]. If no successful manipulation is achieved, the original text is returned as a fallback [Line 8]. We
set the maximum number of generations to 10 and the population size to 20 in all our experiments.</p>
          <preformat>Algorithm 1: Genetic Algorithm Structure
1: Initialize Population: generate population_size many mutations.
2: for g = 1 to max_generations do
3:     Evaluate Fitness: calculate fitness for each candidate.
4:     Selection: select the top half of the population by fitness.
5:     Crossover and Mutation: apply crossover and mutation on selected parents to create a new population.
6:     if any candidate changes the label then
7:         return adversarial text
8: return original_text as fallback</preformat>
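          <p>For concreteness, the following is a minimal Python sketch of Algorithm 1 under the parameters above. Here, victim_predict (the target classifier) and mutate_token (one of the mutation strategies of Section 3.1.2) are hypothetical stand-ins; the fitness function follows the description above, counting modifications and adding a 1000-point penalty when the label is not flipped.</p>
          <preformat>import random

MUTATION_RATE = 0.1    # per-token mutation probability
POPULATION_SIZE = 20   # population size used in our experiments
MAX_GENERATIONS = 10   # maximum number of generations
PENALTY = 1000         # added when the victim's label is unchanged

def fitness(candidate, original_tokens, original_label, victim_predict):
    """Lower is better: few edits and a flipped label."""
    n_modified = sum(c != o for c, o in zip(candidate, original_tokens))
    score = n_modified
    if victim_predict(" ".join(candidate)) == original_label:
        score += PENALTY   # label not flipped: heavy penalty
    return score

def crossover(parent_a, parent_b):
    """Merge two parents at a random token boundary."""
    if min(len(parent_a), len(parent_b)) > 1:
        point = random.randrange(1, min(len(parent_a), len(parent_b)))
        return parent_a[:point] + parent_b[point:]
    return list(parent_a)

def mutate(tokens, mutate_token):
    """Apply the chosen mutation strategy per token with prob. MUTATION_RATE."""
    return [t if random.random() >= MUTATION_RATE else mutate_token(t)
            for t in tokens]

def attack(text, original_label, victim_predict, mutate_token):
    original = text.split()
    population = [mutate(original, mutate_token) for _ in range(POPULATION_SIZE)]
    for _ in range(MAX_GENERATIONS):
        population.sort(key=lambda c: fitness(c, original, original_label,
                                              victim_predict))
        best = population[0]
        if victim_predict(" ".join(best)) != original_label:
            return " ".join(best)          # successful adversarial text
        parents = population[:POPULATION_SIZE // 2]   # keep the top half
        children = [mutate(crossover(*random.sample(parents, 2)), mutate_token)
                    for _ in range(POPULATION_SIZE - len(parents))]
        population = parents + children
    return text                            # fallback: original text</preformat>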
        </sec>
        <sec id="sec-3-1-2">
          <title>3.1.2. Mutation Methods</title>
          <p>In this section, we explain the word modification techniques used in our study. Figure 2 shows an
example modified sentence for each method.</p>
          <p>Homoglyph Replacement (HomoglyphRep). Some letters are visually similar, i.e., homoglyphs, but
their encodings are different. In this approach, we replace the characters with their visually similar
ones. Figure 1 shows the letters we replaced and which letters were used for the replacement. In case
there are multiple homoglyphs for a letter, we randomly select one of them.</p>
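          <p>A minimal sketch of HomoglyphRep is given below; the mapping shown is only an illustrative subset, and the full replacement table we use is the one in Figure 1.</p>
          <preformat>import random

# Illustrative subset of our homoglyph table (Figure 1): each Latin
# letter maps to visually similar Unicode characters.
HOMOGLYPHS = {
    "a": ["\u0430"],            # Cyrillic a
    "e": ["\u0435"],            # Cyrillic e
    "i": ["\u0456"],            # Cyrillic i
    "o": ["\u043e", "\u03bf"],  # Cyrillic o, Greek omicron
}

def homoglyph_rep(word):
    """Replace each mappable character with a randomly chosen homoglyph."""
    return "".join(random.choice(HOMOGLYPHS[ch]) if ch in HOMOGLYPHS else ch
                   for ch in word)</preformat>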
          <p>Splitting Words Randomly (SplitR). If a word is split into two, we can still easily understand the
meaning of the corresponding sentence/phrase (e.g., "nat ural language processing"). However, models like
BERT can be highly affected by this kind of typo because it leads to incorrect tokenization and may
yield out-of-vocabulary representations for these words. Therefore, in this approach, we split words by
adding a space character at a randomly selected index of the word.</p>
          <p>Splitting Words Meaningfully (SplitM). In this method, instead of splitting a word at a random
position, we try to create meaningful words after splitting, e.g., "langu age". With this approach,
the model receives a completely irrelevant but valid word, causing large changes in its representation.
We use the NLTK word corpus to identify valid words and split accordingly. As an exceptional
case, we avoid splitting off the first and last characters unless the word starts or ends with ‘a’.</p>
          <p>Split Words Heuristically (SplitH). In this method, we split based on the first and the last character
of the targeted word. In particular, if the targeted word starts or ends with the characters ‘a’ or ‘i’, we
split it at the beginning or the end accordingly. Otherwise, we choose a random index to split (a sketch
of all three splitting strategies is given after this list of methods).</p>
          <p>CombineHomoglyphRep&amp;Split. In this method, we combine the HomoglyphRep and Split methods. In
particular, we randomly choose one of them and apply it accordingly.</p>
          <p>CombineHADSSh. In this method, we randomly select one of the following methods: i)
HomoglyphRep, ii) SplitR, iii) adding random characters into words, iv) deleting a randomly selected letter,
and v) shuffling the order of the letters within the targeted word.</p>
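          <p>The following is a minimal Python sketch of the three splitting strategies, assuming NLTK and its word corpus are available; the split boundaries follow the descriptions above.</p>
          <preformat>import random
from nltk.corpus import words  # requires: nltk.download("words")

VALID = None  # lazily loaded set of valid English words

def split_r(word):
    """SplitR: insert a space at a random interior index."""
    if len(word) > 1:
        i = random.randrange(1, len(word))
        return word[:i] + " " + word[i:]
    return word

def split_m(word):
    """SplitM: split so that a resulting half is a valid word (e.g.
    "langu age"), avoiding one-character halves unless the word
    starts or ends with 'a'."""
    global VALID
    if VALID is None:
        VALID = {w.lower() for w in words.words()}
    lo = 1 if word.startswith("a") else 2
    hi = len(word) if word.endswith("a") else len(word) - 1
    candidates = [i for i in range(lo, hi)
                  if word[:i].lower() in VALID or word[i:].lower() in VALID]
    if not candidates:
        return word
    i = random.choice(candidates)
    return word[:i] + " " + word[i:]

def split_h(word):
    """SplitH: split off a leading/trailing 'a' or 'i' when present,
    otherwise fall back to a random split."""
    if len(word) > 1 and word[0] in "ai":
        return word[0] + " " + word[1:]
    if len(word) > 1 and word[-1] in "ai":
        return word[:-1] + " " + word[-1]
    return split_r(word)</preformat>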
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. LLM Based Approach</title>
        <p>As LLMs have impressive performance in text generation and semantic analysis of text, we investigate
how they can be utilized to create adversarial examples. We propose three different methods based on two
LLMs, LLAMA 3 (https://ollama.com/library/llama3:8b) and Mistral (https://ollama.com/library/mistral, v0.2).
The details of these methods are explained below. Figure 3 shows an example modified sentence for each
LLM-based method.</p>
        <sec id="sec-3-2-1">
          <title>3.2.1. Paraphrasing with LLMs (LLMParaphrase)</title>
          <p>In this method, we explore the impact of paraphrasing using LLMs. We use LLAMA 3 to paraphrase the
texts with the following prompt: "Paraphrase the following sentence with similar length T: S" where S is
the input sentence and T represents the token count of the given text.</p>
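          <p>A minimal sketch of LLMParaphrase is shown below, assuming a local Ollama server (the distribution linked above, at its default endpoint) serving the llama3:8b model; the word count is used as a simple proxy for the token count T.</p>
          <preformat>import requests

def llm_paraphrase(sentence):
    """Ask LLAMA 3 (via a local Ollama server) to paraphrase a sentence."""
    token_count = len(sentence.split())  # simple proxy for T
    prompt = (f"Paraphrase the following sentence with similar length "
              f"{token_count}: {sentence}")
    resp = requests.post("http://localhost:11434/api/generate",
                         json={"model": "llama3:8b",
                               "prompt": prompt,
                               "stream": False})
    return resp.json()["response"].strip()</preformat>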
      </sec>
      <sec id="sec-3-4">
        <title>3.2.2. Identifying Words to be Changed Using LLMs (LLMIdentify)</title>
        <p>In this method, we use LLMs to identify the words that convey the most important information for the
general meaning of the given text. We directly ask LLAMA 3 to identify the two most important words and
then apply the HomoglyphRep method to those words. We use the following prompt for this task: "You are an
information extractor and your task is to extract and return the two most important words that convey the
meaning of the sentence. You should output the extracted words in the ’word1, word2’ format. Sentence:
{sentence}"</p>
      </sec>
      <sec id="sec-3-5">
        <title>3.2.3. Creating Adversarial Examples (LLMAdversarial)</title>
          <p>In this method, we utilize LLMs to create adversarial examples and pre-evaluate their validity with
another LLM. Figure 4 shows the process flow of our method. In particular, instead of only asking to
paraphrase a given text, we ask LLAMA 3 to create an adversarial example for it. Next, we ask Mistral
to check whether the generated text is an adversarial attack for the corresponding task. If Mistral
verifies it, we use that generated text. Otherwise, we generate another sample using LLAMA 3. This
generation-and-verification process continues for at most three iterations. After three attempts, we use
LLAMA 3’s output even if Mistral does not verify it.</p>
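          <p>A minimal sketch of this generation-and-verification loop is given below; generate (LLAMA 3) and verify (Mistral) are hypothetical wrappers around the two models, and the prompts shown are illustrative paraphrases of the ones we use.</p>
          <preformat>MAX_ATTEMPTS = 3

def llm_adversarial(text, generate, verify):
    """Generate with LLAMA 3, verify with Mistral, retry up to three times."""
    candidate = text
    for _ in range(MAX_ATTEMPTS):
        candidate = generate(
            f"Create an adversarial example for the following text, "
            f"preserving its meaning: {text}")
        answer = verify(
            f"Is the second text an adversarial example of the first? "
            f"Answer yes or no.\nOriginal: {text}\nModified: {candidate}")
        if answer.strip().lower().startswith("yes"):
            return candidate   # Mistral verified the attack
    return candidate           # after three attempts, keep the last output</preformat>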
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>In this section, we present the experimental setup and the results.</p>
      <sec id="sec-4-1">
        <title>4.1. Experimental Setup</title>
        <p>Datasets. The dataset shared by the organizers of the lab covers five different binary classification
tasks: style-based news bias assessment (HN), propaganda detection (PR2), fact-checking (FC), rumor
detection (RD), and COVID-19 misinformation detection (C19). Table 1 provides statistics about the
datasets.</p>
        <p>Evaluation Metrics. This task uses the BODEGA score [<xref ref-type="bibr" rid="ref14">14</xref>] to evaluate the systems, which is the
product of the confusion score (or success rate), the semantic score, and the character score. The score
takes values between 0 and 1. A high score indicates that the model is deceived while the meaning and
appearance of the text are preserved, whereas a low score indicates weak deception and/or substantial
changes to meaning or appearance.</p>
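        <p>Concretely, under this definition, the score can be sketched as a simple product of the three components, each in [0, 1]; the component definitions themselves follow [14].</p>
        <preformat>def bodega(confusion, semantic, character):
    """BODEGA score: product of confusion (success), semantic, and
    character scores, each normalized to [0, 1]."""
    return confusion * semantic * character

# e.g., a successful attack that preserves meaning and appearance well:
assert round(bodega(1.0, 0.85, 0.9), 3) == 0.765</preformat>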
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Experimental Results</title>
        <p>In this section, we present results for our proposed methods on the test set. Firstly, we present the results
for the genetic algorithm-based approaches against the victim models, averaged over all problem
domains (Section 4.2.1). Next, we report the results for both LLM-based and genetic algorithm-based
approaches in the fact-checking task (Section 4.2.2). Lastly, we present our official results (Section 4.3).</p>
        <sec id="sec-4-2-1">
          <title>4.2.1. Results for Genetic Algorithm Based Approach</title>
          <p>We report genetic algorithm-based results when the target model is BiLSTM, BERT, and Surprise. The
results are shown in Table 2. Our observations based on the results are as follows. Firstly, among split
models, SplitM has significantly lower success rate and BODEGA score, but slightly higher semantic
and character scores. While SplitR and SplitH have highly similar scores, SplitR outperforms SplitR
slightly in terms of BODEGA. Secondly, HomoglyphRep achieves the highest semantic score in all
cases and the highest character score in two cases, but its BODEGA score and success rate are lower than those of
our combination-based methods. Thirdly, CombineHomoglyphRep&amp;Split outperforms CombineHADSSh in all
cases in terms of BODEGA, suggesting that we can focus on only a few text manipulation methods
instead of covering all. Finally, among all models, CombineHomoglyphRep&amp;Split and SplitR have the same
average BODEGA score, but CombineHomoglyphRep&amp;Split slightly outperforms SplitR in terms of success
rate and semantic score.</p>
        </sec>
        <sec id="sec-4-2-2">
          <title>4.2.2. Results for LLM Based Approach</title>
          <p>As obtaining results for LLM-based approaches requires much more computational power and time, we
obtained results only for the fact-checking task and for the BERT and BiLSTM models. Table 3 shows the results
for LLM-based approaches and the corresponding genetic algorithm-based approaches for comparison.</p>
          <p>All LLM-based approaches resulted in lower BODEGA scores compared to all genetic algorithm-based
methods. Furthermore, genetic algorithm-based methods achieve very high success rates and character
scores. CombineHomoglyphRep&amp;Split achieves a perfect success rate in both cases but HomoglyphRep
slightly outperforms CombineHomoglyphRep&amp;Split in terms of average BODEGA score.</p>
          <p>Among LLM-based approaches, our results are mixed. LLMIdentify achieves the lowest BODEGA score
when the target model is BERT. However, it yields the highest BODEGA score when the target model
is BiLSTM. This suggests that BiLSTM models are highly affected by homoglyph attacks.
Interestingly, LLMParaphrase outperforms LLMAdversarial for both target models, although the prompt we use in
LLMParaphrase just asks to paraphrase the text without any intention of creating an adversarial example,
while we explicitly ask for an adversarial example in LLMAdversarial.</p>
        </sec>
      </sec>
      <sec id="sec-4-3">
<title>4.3. Official Ranking</title>
        <p>We submitted the results of CombineHomoglyphRep&amp;Split as our official run because of its superior
performance on average. We achieved a 0.4859 BODEGA score on average, ranking third among participants.
Based on the results of the manual annotations for meaning preservation, we achieved 0.62, again ranking
third.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this paper, we present our participation in Task 6 of the CLEF 2024 CheckThat! Lab. In our study, we
explore two different approaches to create adversarial examples. In the first approach, we use a genetic
algorithm to detect the words to be changed and to identify text manipulation methods. We investigate
various text manipulation methods, such as adding/deleting a letter, using homoglyphs, and shuffling
the order of letters within a text. In the second approach, we utilize large language models to create
adversarial examples. This involves three different methods: asking LLMs to paraphrase a given text,
using LLMs to directly generate adversarial examples, and employing LLMs to identify the words that
need to be changed to create adversarial examples.</p>
      <p>In our comprehensive set of experiments, which involve five different tasks, three different target
models, and a total of nine methods, we have the following observations. Firstly, genetic algorithm-based
methods outperform all LLM-based approaches. Secondly, among the genetic algorithm-based methods,
using the combination of homoglyphs and splitting words as text manipulation outperforms the other
methods. This suggests that we need to be more selective in text manipulation methods instead of using
all possible methods. In the official ranking, our primary model is ranked third based on the BODEGA
score and semantic preservation scores which are based on manual annotations.</p>
      <p>In the future, we plan to extend this work in two different directions. Firstly, although LLM-based models
did not achieve high performance in this task, we believe that their effectiveness can be improved
through several strategies, such as using different prompts and fine-tuning them specifically for this task.
Secondly, regarding the genetic algorithm-based methods, we plan to explore other text manipulation
methods to enhance this model further.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>W. E.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. Z.</given-names>
            <surname>Sheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Alhazmi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Adversarial attacks on deep-learning models in natural language processing: A survey</article-title>
          ,
          <source>ACM Transactions on Intelligent Systems and Technology (TIST) 11</source>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Przybyła</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shvets</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Mu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. C.</given-names>
            <surname>Sheang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Saggion</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF2024 CheckThat! lab task 6 on robustness of credibility assessment with adversarial examples (incrediblae)</article-title>
          ,
          <source>in: Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum, CLEF</source>
          <year>2024</year>
          , Grenoble, France,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Struß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Elsayed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Przybyła</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Caselli</surname>
          </string-name>
          , G. Da San Martino,
          <string-name>
            <given-names>F.</given-names>
            <surname>Haouari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Piskorski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ruggeri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Suwaileh</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF-2024 CheckThat! Lab: Check-worthiness, subjectivity, persuasion, roles, authorities and adversarial robustness</article-title>
          , in: L.
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Mulhem</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Quénot</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Schwab</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Soulier</surname>
            ,
            <given-names>G. M.</given-names>
          </string-name>
          <string-name>
            <surname>Di Nunzio</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Galuščáková</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>García Seco de Herrera</surname>
          </string-name>
          , G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF</source>
          <year>2024</year>
          ),
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Alzantot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Elgohary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.-J.</given-names>
            <surname>Ho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <article-title>Generating natural language adversarial examples</article-title>
          ,
          <source>in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>2890</fpage>
          -
          <lpage>2896</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Boucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Shumailov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Anderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Papernot</surname>
          </string-name>
          ,
          <article-title>Bad characters: Imperceptible nlp attacks</article-title>
          ,
          <source>in: 2022 IEEE Symposium on Security and Privacy (SP)</source>
          , IEEE,
          <year>2022</year>
          , pp.
          <fpage>1987</fpage>
          -
          <lpage>2004</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>K.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Adversarial attacks and defenses in deep learning</article-title>
          ,
          <source>Engineering</source>
          <volume>6</volume>
          (
          <year>2020</year>
          )
          <fpage>346</fpage>
          -
          <lpage>360</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Salem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Backes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Badnl:
          <article-title>Backdoor attacks against nlp models with semantic-preserving improvements</article-title>
          ,
          <source>in: Proceedings of the 37th Annual Computer Security Applications Conference</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>554</fpage>
          -
          <lpage>569</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>W.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <article-title>Be careful about poisoned word embeddings: Exploring the vulnerability of the embedding layers in nlp models</article-title>
          ,
          <source>in: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>2048</fpage>
          -
          <lpage>2058</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.</given-names>
            <surname>Kurita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Michel</surname>
          </string-name>
          , G. Neubig,
          <article-title>Weight poisoning attacks on pretrained models</article-title>
          ,
          <source>in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>2793</fpage>
          -
          <lpage>2806</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>A backdoor attack against lstm-based text classification systems</article-title>
          ,
          <source>IEEE Access 7</source>
          (
          <year>2019</year>
          )
          <fpage>138872</fpage>
          -
          <lpage>138878</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Belinkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bisk</surname>
          </string-name>
          ,
          <article-title>Synthetic and natural noise both break neural machine translation</article-title>
          ,
          <source>in: International Conference on Learning Representations</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. T.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , P. Szolovits,
          <article-title>Is bert really robust? a strong baseline for natural language attack on text classification and entailment</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume 34,
          <year>2020</year>
          , pp.
          <fpage>8018</fpage>
          -
          <lpage>8025</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>D.</given-names>
            <surname>Dogan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Altun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Zengin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kutlu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Elsayed</surname>
          </string-name>
          ,
          <article-title>Catch me if you can: Deceiving stance detection and geotagging models to protect privacy of individuals on twitter</article-title>
          ,
          <source>in: Proceedings of the International AAAI Conference on Web and Social Media</source>
          , volume 17,
          <year>2023</year>
          , pp.
          <fpage>173</fpage>
          -
          <lpage>184</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>P.</given-names>
            <surname>Przybyła</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shvets</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Saggion</surname>
          </string-name>
          ,
          <article-title>Bodega: Benchmark for adversarial example generation in credibility assessment</article-title>
          ,
          <source>arXiv preprint arXiv:2303.08032</source>
          (
          <year>2023</year>
          )
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>