<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Adversarial Attack</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Haokun He</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yafeng Song</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dylan Massey</string-name>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <abstract>
        <p>This paper describes the submission of attack methods and results for shared task 6 at CheckThat! Lab at CLEF 2024. We present two novel attack methods to test the robustness of credibility assessment (CA) classifiers across five tasks: fact-checking, COVID-19 misinformation detection, propaganda detection, style-based news bias assessment, and rumor detection. The methods were evaluated using the BODEGA score, which accounts for the success of the attack while preserving the original text's meaning. Our GloTa method, combining GloVe embeddings with RoBERTa-based substitutions, demonstrated superior effectiveness in most tasks compared to baselines. Notably, GloTa achieved the highest BODEGA scores in propaganda detection and fact-checking, indicating significant vulnerability in these areas. However, the method showed comparable performance to baselines in style-based news bias and rumor detection, reflecting the inherent robustness of classifiers in these tasks. Against a more robust pre-trained RoBERTa classifier, GloTa still outperformed RoBERTa-ATTACK, although with generally lower success rates. These findings highlight the need for continuous improvement in adversarial attack techniques to enhance the robustness of CA systems against evolving threats.</p>
      </abstract>
      <kwd-group>
        <kwd>robustness</kwd>
        <kwd>adversarial attack</kwd>
        <kwd>BODEGA score</kwd>
        <kwd>GloVe embeddings</kwd>
        <kwd>credibility assessment</kwd>
        <kwd>RoBERTa</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Credibility assessment (CA) can be understood as a family of tasks whose goal is to determine
whether a given textual document adheres to constraints, such as factuality, or not [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Advances in
NLP techniques and increased availability of high-quality domain-specific data have made classifiers
for CA viable for real-world deployment in contexts such as automated moderation of comments in
online platforms.
      </p>
      <p>
        However, recent studies [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ] indicate that text classifiers can be easily deceived through simple
manipulations. For example, a user might circumvent a misinformation classifier by selectively
replacing alphabetic characters with numbers. In the statement "drinking water kills", this would lead to
a perturbation such as "drinking w4ter k1lls", which a classifier might not robustly handle, leading to
misclassification. Such alterations, both simple and sophisticated, highlight that classifiers still lack the
robustness needed to withstand attacks from users with potentially malicious intent.
      </p>
      <p>The robustness of classifiers can be systematically assessed by automatically perturbing initially
correctly classified input examples until the classifier’s decision is altered. From an attacker’s perspective,
the goal is to develop an algorithm that generates adversarial examples for each text sequence, resulting
in an opposite label compared to the original text sequence. If a decision can be changed (i.e., the
classifier gets confused), the attack is considered successful. To ensure that perturbations remain
human-readable and convey the original content, semantic and character-based distance metrics can be
employed in systematic robustness assessments.</p>
      <p>In the present working notes, we detail two attack methods used to assess the robustness of
classifiers on the five tasks given by this shared task: fact checking (FC), COVID-19 misinformation detection (C19),
propaganda detection (PR), style-based news bias assessment (HN), and rumor detection (RD). These
tasks are interpreted as binary classification tasks aimed at determining whether a given piece of text is
credible or not.</p>
      <p>
        We evaluate our attack methods on various classifier models, referred to as victim models, which
differ in architecture but are applied to the same tasks for comparison. The evaluation metric employed
is the BODEGA score, as proposed by Przybyła et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which measures the success rate of confusion
under the constraint of meaning preservation.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>
        Credibility assessment (CA) is the high-level task concerned with determining whether some given
natural language expression is credible with regard to some aspect (e.g., veracity). Assessing
the robustness of classifiers performing CA is vital, for otherwise users with malicious intent might
easily bypass such classifiers in automated content moderation contexts, such as the screening
of contributions to online forums. We restrict our focus to five CA tasks. The first task, FC, is concerned
with classifying whether a given natural language statement is true or false relative to some body of
knowledge [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The fact checking classifiers we attack are based on data from Thorne et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The other
four tasks are similar to FC, but differ by text type and, consequently, by the datasets they were trained
and evaluated on. These tasks have the goal of assessing whether some given text is misinformation [5],
a rumor [6], propaganda [7] or fake news [8], respectively. A summary of the statistical information for
these tasks is presented in Table 1.
      </p>
      <p>Previous work on classifier attack strategies can be broadly classified by information available to
an attacker (black-box, grey-box, white-box) and perturbation granularity (sentence-level, word-level,
character-level). In a white-box setting an attacker has full visibility of the model’s internals, including
model weights [9]. In a black-box scenario – as understood in the context of this task – an attacker
can only obtain the (binary) classification decision and confidence scores.</p>
      <p>
        We solve for the grey-box version of the task as outlined in Przybyła et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. An attacker hence
(1) can obtain the confidence scores from the model, (2) is provided information about the high-level
architecture (though not the model parameters), and (3) has access to the training and development datasets as
well as the evaluation method. Further, an attacker is free to query the model as many times as needed in
order to confuse it. Przybyła et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] introduce an attack effectiveness metric, called the BODEGA score,
which is composed of the confusion success rate, while also penalizing semantic and lexical
(character-level) distance. They detail BODEGA scores for a number of methods on the five aforementioned tasks
(FC, C19, PR, HN, RD) subsuming three different model types – BERT [10], BiLSTM and RoBERTa [11].
The datasets used to train the victim models are openly available (cf. https://gitlab.com/checkthat_lab/clef2024-checkthat-lab/-/tree/main/task6/incrediblAE_public_release),
and the detailed code for our methods is available at: https://github.com/yafengsong/InCrediblAE-2024-GloTa.
Note that the numbers of texts in Table 1 are from the data used for the BERT Classifier; they may vary
slightly for the other two models.
      </p>
      <p>
        The measure of evaluation, the BODEGA score, is computed as the product of one binary and two
real-valued numbers: BODEGA = succ · semdist · chrdist, where succ ∈ {0, 1}. The success variable
(succ) takes the value 1 when confusion is achieved and 0 when not. The other two values, the semantic score
(semdist) and the character-edit score (chrdist), lie in [0, 1]: a value of 1 indicates that similarity to the
original is preserved, whereas a value closer to 0 signifies a higher divergence. The BODEGA
scores over the individual adversarial generations are then mean-aggregated across test data points and
tasks to generate a final score. Thus, a high BODEGA score against a classifier implies low robustness
of the classifier and high effectiveness of the attack method. The BODEGA score can either consider only those
attacks successful that are targeted towards a potential attacker’s goals, i.e. only changes from 1 → 0, or
consider both confusion directions as a success, which can be understood as untargeted. We consider
only the untargeted scenario.
      </p>
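      <p>As an illustration, the scoring described above can be sketched in a few lines of Python. This is a minimal stand-in rather than the official BODEGA implementation: the real metric computes the semantic term with BLEURT, so here the semantic score is supplied as a plain number, and a simple edit-distance ratio serves as the character term.</p>

```python
def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def char_score(original: str, adversarial: str) -> float:
    """Character-level similarity: 1.0 for identical texts, near 0 for heavy edits."""
    longest = max(len(original), len(adversarial))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(original, adversarial) / longest

def bodega(succ: int, sem_score: float, chr_score: float) -> float:
    """BODEGA = succ * semdist * chrdist, with succ in {0, 1}."""
    assert succ in (0, 1)
    return succ * sem_score * chr_score

# Example: a successful single-character attack barely dents the character term.
orig = "drinking water kills"
adv = "drinking w4ter kills"
score = bodega(succ=1, sem_score=0.9, chr_score=char_score(orig, adv))
```

      <p>Per-example scores such as `score` above are then averaged over the test data points of a task, as described in the text.</p>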
      <p>As a basis for our experiments, we follow two promising methods, the latter of which also serves as one
of our baselines: first, Li et al. [9], who replace words using nearest-neighbor search, and second, Li
et al. [12], who use BERT to detect potential replacements for each input instance. Li et al. [12]’s method,
BERT-ATTACK, consists of probing the victim model for words that have high potential to change
the classification confidence and then in a subsequent step looking for suitable replacement words
for the most vulnerable words that still preserve the meaning. Their method outperforms previous
methods and is shown to work relatively well independent of the specific classifier architecture or task
(model-agnostic). The second baseline method is from Alzantot et al. [13], who use a genetic algorithm
over multiple generations to generate adversarial samples that are maximally fit to confuse the classifier.
As a framework for evaluation, we rely on the OpenAttack toolkit developed by Zeng et al. [14]. While
our work focuses on the word level, some approaches have addressed perturbations on a more coarse-
or fine-grained level [15], such as character-switching [16] or paraphrasing [17].</p>
      <p>Our contributions in the CLEF CheckThat! 2024 edition [18, 19] of the Shared Task on Robustness of
Credibility Assessment with Adversarial Examples (IncrediblAE) [20] can be summarized as follows:
• We introduce two novel methods to efficiently generate adversarial text samples for robustness
assessment of CA classifiers.
• We outperform previous baselines in the majority of CA tasks, with our GloTa approach appearing
as the most promising.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <p>Motivated by previous work, we initially attempted to address this task using either rule-based
algorithms or neural network-based methods. However, our experiments indicated that rule-based
algorithms, such as randomly arranging characters or replacing words using a preset synonym list, did
not achieve satisfactory performance across the five test tasks. Consequently, we focused primarily on
developing a new model-based method to solve this shared task.</p>
      <sec id="sec-3-1">
        <title>3.1. Contextual Embedding with RoBERTa Attacker</title>
        <p>Inspired by BERTAttacker [12], which provides a framework for automatically generating adversarial
samples, we first adopted a similar approach. BERTAttacker calculates an importance score for each
word and generates a candidate word list for substitution. However, we opted to use the RoBERTa
model [11] instead of BERT, given RoBERTa’s focus on masked word prediction with dynamic masking
and its training on a larger dataset, which should enhance its semantic understanding. We use the
RoBERTa-base model to generate importance scores for each word by calculating the difference in
output probability distributions between the original input sequence and the masked input sequence.</p>
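        <p>The following is a simplified sketch of this importance scoring. It is an illustrative variant, not our released implementation: instead of comparing full output probability distributions, it measures how far a victim classifier's confidence moves when each word is masked out, with a toy victim function standing in for a real model.</p>

```python
from typing import Callable, List, Tuple

def importance_scores(
    tokens: List[str],
    victim: Callable[[str], float],   # confidence assigned to the original label
    mask_token: str = "<mask>",
) -> List[Tuple[int, float]]:
    """Rank token positions by how much masking each one shifts the
    victim's confidence; larger shifts mean more vulnerable words."""
    base = victim(" ".join(tokens))
    scores = []
    for i in range(len(tokens)):
        masked = tokens[:i] + [mask_token] + tokens[i + 1:]
        scores.append((i, abs(base - victim(" ".join(masked)))))
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Toy victim: confident in the "not credible" label only while "kills" is present.
toy_victim = lambda text: 0.9 if "kills" in text else 0.2
ranking = importance_scores("drinking water kills".split(), toy_victim)
# The position of "kills" surfaces first in the ranking.
```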
        <p>Once the importance scores for each word are obtained, we rank the words in the sequence based
on these scores. This ranking identifies the most vulnerable words, with the highest-ranking word
being the most susceptible to an attack that could alter the classifier’s output (an implementation of
BERTAttacker can be found at: https://github.com/thunlp/OpenAttack/blob/master/OpenAttack/attackers/bert_attack/__init__.py).
Then, we iteratively
substitute these words to execute the attack on the victim model. For each substitution, we identify the
word’s position and extract its contextual embedding from the second-to-last layer of the RoBERTa
model. We select a number of other words (36 in our case) from the masked sequence’s predictions at that
position and extract their contextual embeddings after substituting the original word in the RoBERTa
model. We experimented with different values of this number and determined that 36 offers the optimal balance
between attack success rate and semantic preservation. By comparing the original word’s contextual
embedding with those of the selected words, we retain only those with a similarity score above a
preset threshold (0.3 in our case) as candidates for substitution.</p>
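        <p>This filtering step can be sketched as follows. The embeddings here are toy two-dimensional vectors rather than RoBERTa's contextual embeddings, and the helper names are illustrative; only the 0.3 threshold mirrors the value stated above.</p>

```python
from typing import Dict, List

def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def filter_candidates(
    orig_emb: List[float],
    cand_embs: Dict[str, List[float]],
    threshold: float = 0.3,
) -> List[str]:
    """Keep only candidates whose contextual embedding stays close to the
    original word's embedding."""
    return [w for w, e in cand_embs.items() if cosine(orig_emb, e) > threshold]

# "hate" drifts too far from the original embedding and is discarded.
kept = filter_candidates([1.0, 0.0], {"miss": [0.9, 0.2], "hate": [-1.0, 0.05]})
```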
        <p>After obtaining a list of candidate words for each position in the original sequence, we substitute
each original word with candidates and check if the substitution fools the victim model. If successful,
we stop and return the modified sequence. If none of the candidates succeed, we retain the word that
most reduces the confidence in the original label and repeat the process for the next position. This
continues until either the attack is successful or all words in the input sequence have been processed.
The whole process of this method is shown in Algorithm 1.</p>
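        <p>The greedy substitution loop described above can be sketched as follows. This is a schematic rendering under assumed interfaces: the victim callables and candidate lists are stand-ins for the actual classifier and the RoBERTa-generated candidates.</p>

```python
from typing import Callable, Dict, List, Optional

def greedy_attack(
    tokens: List[str],
    positions: List[int],                  # token positions, most vulnerable first
    candidates: Dict[int, List[str]],      # substitution candidates per position
    victim_label: Callable[[str], int],
    victim_conf: Callable[[str], float],   # confidence in the original label
) -> Optional[str]:
    """Substitute candidates position by position; return the first sequence
    that flips the victim's decision, otherwise keep the most damaging
    substitute and move on to the next position."""
    original_label = victim_label(" ".join(tokens))
    for pos in positions:
        best_conf = victim_conf(" ".join(tokens))
        best_word = tokens[pos]
        for cand in candidates.get(pos, []):
            trial = " ".join(tokens[:pos] + [cand] + tokens[pos + 1:])
            if victim_label(trial) != original_label:
                return trial                        # attack succeeded
            if victim_conf(trial) < best_conf:      # most confidence-reducing word
                best_conf, best_word = victim_conf(trial), cand
        tokens = tokens[:pos] + [best_word] + tokens[pos + 1:]
    return None                                     # every position exhausted

# Toy victim flips its label as soon as "kills" disappears.
label = lambda t: 1 if "kills" in t else 0
conf = lambda t: 0.9 if "kills" in t else 0.2
result = greedy_attack("drinking water kills".split(), [2], {2: ["hurts"]}, label, conf)
```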
      </sec>
      <sec id="sec-3-2">
        <title>3.2. GloTa: Combining GloVe Embeddings with RoBERTa</title>
        <p>GloTa, which stands for GloVe and RoBerTa, represents a method that combines GloVe [21] embeddings
and RoBERTa to enhance adversarial attack techniques. Applying the aforementioned method yielded
a high success rate, but the semantic score was often low due to extensive substitution by RoBERTa-
generated candidates. These substitutions do not necessarily preserve the original meaning and may
even introduce opposite meanings. For example, RoBERTa-generated candidates for the word love
in the sentence I love you might include miss, forgive, or hate. To address this issue, we use GloVe
embeddings to generate candidate lists for substituting vulnerable words in the input sequence.</p>
        <p>The candidate lists are generated using a process akin to the initial step of the Genetic algorithm [13].
We build a large synonym dataset by computing the nearest neighbors of each selected word based
on distance in the GloVe embedding space (Common Crawl, 840B tokens, 2.2M vocab), using the aclImdb
dataset [22] of movie reviews from IMDB as the dictionary to construct the synonym dictionary, thereby
mitigating the semantic loss associated with candidates generated from masked language models. We
still use the aclImdb dataset because it was employed in the original Genetic algorithm paper, allowing
us to maintain consistency and compare our results with the original findings.</p>
        <p>Algorithm 2 Adversarial Attack using RoBERTa and GloVe
Require: original sequence X, victim model F, RoBERTa model M, GloVe embeddings G, synonym
dictionary D, number of candidates k, similarity threshold t
Ensure: modified sequence X′
1: S ← CalculateImportanceScores(X, F)
2: W ← RankWordsByImportance(S)
3: for word w in W do
4:   if w ∈ D then
5:     C ← D[w]
6:   else
7:     e ← ExtractEmbedding(w, M)
8:     C ← GenerateCandidateWords(w, M, k)
9:     E ← ExtractEmbeddings(C, M)
10:    C′ ← { c ∈ C | Similarity(e, E[c]) &gt; t }
11:    C ← ReRankCandidatesByBLEURT(C′, X)
12:   end if
13:   for candidate c in C do
14:     X′ ← SubstituteWord(X, w, c)
15:     if F(X′) ̸= F(X) then
16:       return X′
17:     end if
18:   end for
19:   w′ ← WordThatReducesConfidenceMost(C, F)
20:   X ← SubstituteWord(X, w, w′)
21: end for
22: return X</p>
        <p>Once the synonym dictionary is constructed, we extract the most vulnerable word by the method
in Algorithm 1, generate a list of the closest words in the GloVe embeddings as candidates, and then
substitute the original word with candidates until the decision is flipped. If none of the words in the
list achieves this, we proceed to the next vulnerable word in the sequence and repeat the process.
However, because the input sequences span five different domains and may even include URLs and
emojis, many words are absent from the constructed synonym dictionary. For these out-of-vocabulary
words, we revert to using RoBERTa to generate candidate lists and then re-rank the substitution words
by BLEURT [23] to prioritize semantically close words for substitution. The entire process is illustrated
in Algorithm 2.</p>
        <p>Additionally, we set two types of hyperparameters that can be tuned for different input datasets:
(1) max_candidates, and (2) max_substitutes together with max_sub_rate. The first hyperparameter is the number
of substitution candidates per word: a longer list may increase the success rate but reduce semantic integrity. The
second type comprises two hyperparameters: the absolute number of substitutions made to the original sequence
and the substitution rate relative to the length of the original input sentence. Ideally, we could substitute every
word, but excessive substitutions may significantly alter the semantic meaning. Therefore, we establish
these thresholds to balance the trade-off between the semantic score and the success score. The attack on
the sequence terminates when either threshold is reached. In our experiments, we tested different
parameter values and set the optimal parameters as follows: max_candidates to 30, max_substitutes to
80, and max_sub_rate to 0.5.</p>
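        <p>The two stopping criteria can be sketched as a small helper. The threshold values are those stated above; the helper itself (`should_stop`) is an illustrative name, not part of our released code.</p>

```python
def should_stop(num_substitutions: int, seq_len: int,
                max_substitutes: int = 80, max_sub_rate: float = 0.5) -> bool:
    """Stop the attack once either substitution budget is exhausted, trading
    success rate against semantic preservation."""
    if num_substitutions >= max_substitutes:
        return True                       # absolute budget reached
    return seq_len > 0 and num_substitutions / seq_len >= max_sub_rate

# A 40-token input halts after 20 substitutions (rate threshold of 0.5),
# while a 200-token input halts after 80 (absolute threshold).
```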
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>We conducted the RoBERTa-ATTACK and our GloTa method, along with BERT-ATTACK and Genetic
algorithms as baselines, on five tasks to attack three victim models: the BERT Classifier, the Bi-LSTM
Classifier, and the RoBERTa Classifier. The last model, the RoBERTa Classifier, was introduced as
a "surprise" model for this shared task. The results are summarized in Table 2. GloTa achieved the
highest BODEGA scores in the Propaganda Detection (PR) and Fact Checking (FC) tasks. Analyzing the
sub-scores of these tasks, the gains were primarily from the success scores compared to the baseline
methods, indicating that our method is more effective at fooling the classifier in these tasks.</p>
      <p>For the Style-based News Bias Assessment (HN) and Rumor Detection (RD) tasks, our GloTa method
achieved similar BODEGA scores as the baseline. In the HN task, the baseline BERT-ATTACK already
achieved a high success score, limiting the potential for additional gains. At the same time, GloTa’s
semantic score did not outperform the baseline, resulting in a comparable BODEGA score. In the
RD task, the victim model’s robustness led to low success scores across all methods, limiting GloTa’s
effectiveness in this context.</p>
      <p>The COVID-19 Misinformation Detection (C19) task is based on a new dataset released for this shared
task, which lacks prior baselines for comparison. Both GloTa and RoBERTa-ATTACK achieved a relatively high
success score, indicating this task’s susceptibility. However, the semantic score was not high due to
the presence of non-word tokens such as URLs, hashtags, and emojis from Twitter data, making it
challenging to find semantically similar alternative words.</p>
      <p>The newly introduced classifier in this shared task, the RoBERTa classifier, is purported to be more
robust. We employed both RoBERTa-ATTACK and GloTa methods to attack this model, although we
did not have baseline results for comparison. GloTa significantly outperformed RoBERTa-ATTACK in
the HN and PR tasks, while achieving comparable results in the FC and RD tasks. These outcomes from
the RoBERTa classifier align with the comparative performance between GloTa and RoBERTa-ATTACK
observed in the BERT and Bi-LSTM classifier results. However, the performance gap between them
has widened, suggesting that GloTa is more effective at attacking more robust classifiers. Nonetheless,
RoBERTa-ATTACK marginally outperformed GloTa in the C19 task, primarily due to its higher success
score. Furthermore, when compared to all the BODEGA scores in BERT and Bi-LSTM classifiers, only
the FC task exhibited a higher BODEGA score in the RoBERTa classifier, while the other tasks showed
lower scores due to decreased success score or semantic score. This indicates that the RoBERTa classifier
is indeed more robust.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion and Future Work</title>
      <p>Given that the BODEGA score consists of three components—success score, semantic score, and
character score—the evaluation of adversarial attacks can be analyzed within these divisions. As shown
in Table 2, the character scores consistently remain high across different tasks. Therefore, our discussion
is focused on improving the success score and semantic score.</p>
      <sec id="sec-5-1">
        <title>5.1. Success Score</title>
        <p>Our GloTa method achieves near-perfect success scores across all three classifiers, with the exception
of the RD task and, for the RoBERTa classifier, also the HN task. We attribute this
high success rate primarily to our "greedy" attack method, which identifies vulnerable words and
replaces them sequentially until the classification is altered. While it is possible to replace all words
in the input sentences, this approach would significantly slow the process and lower the semantic
score, given that GloTa creates word-level replacements without considering the context. Therefore, we
introduced hyperparameters to control the number of substitutions in the original sequences, balancing
the trade-off between success and semantic scores. Additionally, as shown in Table 1, we observed that
the HN and RD tasks involve longer texts compared to the other tasks. This can lead to a more robustly trained
victim model, which either requires replacing more words to alter the result or causes the attack to fail when
the thresholds set by the hyperparameters are reached. Future research could investigate allowing varying
numbers of substitutions in low-success tasks, such as RD, to determine if increasing the success rate
can enhance the overall BODEGA score.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Semantic Score</title>
        <p>
          Our initial decision to combine BERT-ATTACK and the Genetic algorithm was motivated by their superior
performance in the experiments conducted by Przybyła et al. [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. We found that using a masked language model to replace
words could not maintain a relatively high semantic score, and character-level replacement resulted in
a low success rate. Therefore, we employed GloVe embeddings at word level to improve the semantic
score in the BODEGA calculation while maintaining a high success rate. However, results indicate that
the semantic score of our GloTa method showed only minor improvement compared to using words
from masked language models. For some tasks, such as FC, the semantic score decreased despite an
increase in the success score, due to more words being replaced in the original sequences. This may
be because the synonym dictionary was constructed from the aclImdb dataset, which differs from the
domains of the test tasks (HN, PR, FC, RD, and C19). As a result, many out-of-vocabulary words were
found in the original sequences, which required using RoBERTa to generate alternative words. This
likely explains the similarity in semantic scores between RoBERTa-ATTACK and GloTa.
        </p>
        <p>In addition, our system achieved an average BODEGA score of 0.4776 across all tasks and classifiers
with an average semantic score of 0.5867, ranking 4th out of 6 teams. However, in human evaluations,
our system managed to preserve the meaning in only 14% of the attack samples, also ranking 4th
out of 6 teams. The details of the ranking and human evaluation methodology are explained in the
overview paper of this shared task by Barrón-Cedeño et al. [18]. The human evaluation result is
notably lower compared to the semantic score achieved by our system. This discrepancy is likely due to
several factors. First, in the automatic evaluation, the BODEGA score utilizes BLEURT to calculate the
semantic score, where substituting a single word with its close synonym often results in a high score,
whereas human evaluation considers the entire context. Second, as previously mentioned, the synonym
dictionary did not adequately cover the domain of the test tasks, leading to candidate words that were
not semantically similar enough to the original words. Third, word-level replacements do not consider
the full context, leading to decreased sentence fluency and greater deviation from the original meanings
as more replacements are performed. However, we found that using GloVe embeddings performed
better in human evaluation compared to other teams that used only masked language models for word
substitution. This is likely because, as noted in Section 3.2, the words generated by the masked language
model do not necessarily preserve the original meaning and may even introduce opposite meanings.</p>
        <p>To further improve semantic scores, constructing the synonym dictionary using text data from the five
test tasks could reduce the occurrence of out-of-vocabulary words, thereby enhancing the semantic score.
Additionally, methods such as DeepWordBug [24], which performs character-level modifications, could
be explored to enhance the semantic score in both automatic and human evaluations after identifying
vulnerable words by the masked language model. Furthermore, our current methodology overlooks the
impact of emojis, which were ignored despite their importance in conveying emotional information.
Future research should incorporate emoji embeddings to enhance semantic understanding and model
performance.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this shared task, we explored the robustness of classifiers for credibility assessment (CA) tasks by
developing and evaluating two attack methods: RoBERTa-ATTACK and GloTa. The GloTa method, which
combines GloVe embeddings and RoBERTa to enhance adversarial attack capabilities, demonstrated
superior effectiveness, achieving the highest BODEGA scores in the propaganda detection (PR) and fact-
checking (FC) tasks. This indicates a significant vulnerability in these areas. However, its performance
was on par with baselines in style-based news bias assessment (HN) and rumor detection (RD), reflecting
the inherent robustness of classifiers in these tasks.</p>
      <p>When tested against a more robust RoBERTa classifier, GloTa outperformed RoBERTa-ATTACK,
although with generally lower success rates, underscoring the enhanced robustness of the RoBERTa
classifier. These findings highlight the trade-off between achieving high attack success rates and
maintaining semantic integrity. Our study enhances the understanding of CA classifier robustness and
demonstrates that using a masked language model to identify vulnerable words and replace them with
similar word embeddings in the original texts can be an effective method for adversarial attacks.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>Special thanks to all the organizers of the CheckThat! Lab at CLEF 2024. We would also like to extend
our gratitude to Andrianos Michail, Simon Clematide and the Department of Computational Linguistics
at the University of Zurich for valuable suggestions and continuous support for this shared task.</p>
      <p>[4] J. Thorne, A. Vlachos, O. Cocarascu, C. Christodoulopoulos, A. Mittal, The fact extraction and
VERification (FEVER) shared task, in: Proceedings of the First Workshop on Fact Extraction and VERification
(FEVER), Association for Computational Linguistics, Brussels, Belgium, 2018, pp. 1–9. URL:
https://aclanthology.org/W18-5501. doi:10.18653/v1/W18-5501.
[5] Y. Jiang, X. Song, C. Scarton, I. Singh, A. Aker, K. Bontcheva, Categorising fine-to-coarse grained
misinformation: An empirical study of the COVID-19 infodemic, in: R. Mitkov, G. Angelova
(Eds.), Proceedings of the 14th International Conference on Recent Advances in Natural Language
Processing, INCOMA Ltd., Shoumen, Bulgaria, Varna, Bulgaria, 2023, pp. 556–567. URL: https:
//aclanthology.org/2023.ranlp-1.61.
[6] S. Han, J. Gao, F. Ciravegna, Neural language model based training data augmentation for weakly
supervised early rumor detection, in: Proceedings of the 2019 IEEE/ACM International Conference
on Advances in Social Networks Analysis and Mining, ACM, Vancouver British Columbia Canada,
2019, pp. 105–112. URL: https://dl.acm.org/doi/10.1145/3341161.3342892. doi:10.1145/3341161.
3342892.
[7] G. Da San Martino, A. Barrón-Cedeño, H. Wachsmuth, R. Petrov, P. Nakov, SemEval-2020 task 11:
Detection of propaganda techniques in news articles, in: A. Herbelot, X. Zhu, A. Palmer, N.
Schneider, J. May, E. Shutova (Eds.), Proceedings of the Fourteenth Workshop on Semantic Evaluation,
International Committee for Computational Linguistics, Barcelona (online), 2020, pp. 1377–1414.</p>
      <p>URL: https://aclanthology.org/2020.semeval-1.186. doi:10.18653/v1/2020.semeval-1.186.
[8] M. Potthast, J. Kiesel, K. Reinartz, J. Bevendorff, B. Stein, A stylometric inquiry into hyperpartisan
and fake news, in: I. Gurevych, Y. Miyao (Eds.), Proceedings of the 56th Annual Meeting of the
Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational
Linguistics, Melbourne, Australia, 2018, pp. 231–240. URL: https://aclanthology.org/P18-1022.
doi:10.18653/v1/P18-1022.
[9] J. Li, S. Ji, T. Du, B. Li, T. Wang, Textbugger: Generating adversarial text against real-world
applications, ArXiv abs/1812.05271 (2018). URL: https://api.semanticscholar.org/CorpusID:54815878.
[10] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of Deep Bidirectional
Transformers for Language Understanding, in: Proceedings of the 2019 Conference of the North American
Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume
1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota,
2019, pp. 4171–4186. URL: https://aclanthology.org/N19-1423. doi:10.18653/v1/N19-1423.
[11] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov,
RoBERTa: A robustly optimized BERT pretraining approach, ArXiv abs/1907.11692 (2019). URL:
https://api.semanticscholar.org/CorpusID:198953378.
[12] L. Li, R. Ma, Q. Guo, X. Xue, X. Qiu, BERT-ATTACK: Adversarial attack against BERT using BERT,</p>
      <p>ArXiv abs/2004.09984 (2020). URL: https://api.semanticscholar.org/CorpusID:216036179.
[13] M. F. Alzantot, Y. Sharma, A. Elgohary, B.-J. Ho, M. B. Srivastava, K.-W. Chang, Generating natural
language adversarial examples, ArXiv abs/1804.07998 (2018). URL: https://api.semanticscholar.
org/CorpusID:5076191.
[14] G. Zeng, F. Qi, Q. Zhou, T. Zhang, Z. Ma, B. Hou, Y. Zang, Z. Liu, M. Sun, OpenAttack: An
open-source textual adversarial attack toolkit, in: H. Ji, J. C. Park, R. Xia (Eds.), Proceedings of the
59th Annual Meeting of the Association for Computational Linguistics and the 11th International
Joint Conference on Natural Language Processing: System Demonstrations, Association for
Computational Linguistics, Online, 2021, pp. 363–371. URL: https://aclanthology.org/2021.acl-demo.43.
doi:10.18653/v1/2021.acl-demo.43.
[15] X. Han, Y. Zhang, W. Wang, B. Wang, Text Adversarial Attacks and Defenses: Issues, Taxonomy,
and Perspectives, Security and Communication Networks 2022 (2022) 6458488. URL: https://doi.org/10.1155/2022/6458488. doi:10.1155/2022/6458488, publisher: Hindawi.
[16] J. Gao, J. Lanchantin, M. L. Soffa, Y. Qi, Black-box generation of adversarial text sequences to
evade deep learning classifiers, in: 2018 IEEE Security and Privacy Workshops (SPW), 2018, pp.
50–56. doi:10.1109/SPW.2018.00016.
[17] M. Iyyer, J. Wieting, K. Gimpel, L. Zettlemoyer, Adversarial example generation with syntactically
controlled paraphrase networks, in: M. Walker, H. Ji, A. Stent (Eds.), Proceedings of the 2018
Conference of the North American Chapter of the Association for Computational Linguistics:
Human Language Technologies, Volume 1 (Long Papers), Association for Computational
Linguistics, New Orleans, Louisiana, 2018, pp. 1875–1885. URL: https://aclanthology.org/N18-1170.
doi:10.18653/v1/N18-1170.
[18] A. Barrón-Cedeño, F. Alam, J. M. Struß, P. Nakov, T. Chakraborty, T. Elsayed, P. Przybyła, T. Caselli,
G. Da San Martino, F. Haouari, C. Li, J. Piskorski, F. Ruggeri, X. Song, R. Suwaileh, Overview of
the CLEF-2024 CheckThat! Lab: Check-Worthiness, subjectivity, persuasion, roles, authorities
and adversarial robustness, in: L. Goeuriot, P. Mulhem, G. Quénot, D. Schwab, L. Soulier, G. M.
Di Nunzio, P. Galuščáková, A. García Seco de Herrera, G. Faggioli, N. Ferro (Eds.), Experimental IR
Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International
Conference of the CLEF Association (CLEF 2024), 2024.
[19] G. Faggioli, N. Ferro, P. Galuščáková, A. García Seco de Herrera (Eds.), Working Notes of CLEF
2024 - Conference and Labs of the Evaluation Forum, CLEF 2024, Grenoble, France, 2024.
[20] P. Przybyła, B. Wu, A. Shvets, Y. Mu, K. C. Sheang, X. Song, H. Saggion, Overview of the
CLEF2024 CheckThat! lab task 6 on robustness of credibility assessment with adversarial examples
(InCrediblAE), in: [19], 2024.
[21] J. Pennington, R. Socher, C. D. Manning, GloVe: Global vectors for word representation, in:
Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532–1543. URL: http://www.aclweb.org/anthology/D14-1162.
[22] A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, C. Potts, Learning word vectors for sentiment
analysis, in: Proceedings of the 49th Annual Meeting of the Association for Computational
Linguistics: Human Language Technologies, Association for Computational Linguistics, Portland,
Oregon, USA, 2011, pp. 142–150. URL: http://www.aclweb.org/anthology/P11-1015.
[23] T. Sellam, D. Das, A. Parikh, BLEURT: Learning robust metrics for text generation, in: D. Jurafsky,
J. Chai, N. Schluter, J. Tetreault (Eds.), Proceedings of the 58th Annual Meeting of the Association for
Computational Linguistics, Association for Computational Linguistics, Online, 2020, pp. 7881–7892.
URL: https://aclanthology.org/2020.acl-main.704. doi:10.18653/v1/2020.acl-main.704.
[24] J. Gao, J. Lanchantin, M. L. Soffa, Y. Qi, Black-box generation of adversarial text sequences to
evade deep learning classifiers, in: 2018 IEEE Security and Privacy Workshops (SPW), IEEE, 2018,
pp. 50–56. URL: http://arxiv.org/abs/1801.04354.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Przybyła</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. V.</given-names>
            <surname>Shvets</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Saggion</surname>
          </string-name>
          ,
          <article-title>Verifying the robustness of automatic credibility assessment</article-title>
          ,
          <year>2023</year>
          . URL: https://api.semanticscholar.org/CorpusID:257505431.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <article-title>Deep text classification can be fooled</article-title>
          ,
          <source>arXiv preprint arXiv:1704.08006</source>
          (
          <year>2017</year>
          ). URL: https://arxiv.org/abs/1704.08006.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Kong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Niu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>A survey on adversarial attack in the age of artificial intelligence</article-title>
          ,
          <source>Wireless Communications and Mobile Computing</source>
          <volume>2021</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>22</lpage>
          . doi:10.1155/2021/4907754.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Thorne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vlachos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Cocarascu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Christodoulopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mittal</surname>
          </string-name>
          ,
          <article-title>The fact extraction and VERification (FEVER) shared task</article-title>
          , in:
          <string-name>
            <given-names>J.</given-names>
            <surname>Thorne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vlachos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Cocarascu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Christodoulopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mittal</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the First Workshop on Fact Extraction and VERification</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>