<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>Fired_from_NLP at CheckThat! 2024: Estimating the Check-Worthiness of Tweets Using a Fine-tuned Transformer-based Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Md. Sajid Alam Chowdhury</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anik Mahmud Shanto</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mostak Mahmud Chowdhury</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hasan Murad</string-name>
          <email>hasanmurad@cuet.ac.bd</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Udoy Das</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Chittagong University of Engineering and Technology (CUET)</institution>
          ,
          <addr-line>Chittagong</addr-line>
          ,
          <country country="BD">Bangladesh</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Due to our immense usage of and dependence on web-based and social media platforms, we nowadays come across a great deal of information, but not all of it is true. It is therefore important to verify a statement before believing it, and checking the validity of a statement has become a core research topic in Natural Language Processing (NLP) in both low-resource and resource-enriched languages. The CheckThat! Lab at CLEF 2024 has organized a shared task named Check-worthiness Estimation (Task 1), where three datasets have been provided in the Arabic, English, and Dutch languages to determine whether a claim in a tweet and/or transcription is worth fact-checking. To perform the task, we have utilized several machine learning, deep learning, and transformer-based models to check which model performs best on the given datasets. Among all of these models, our proposed CW-BERT model has ranked 7th, 10th, and 12th, scoring F1 scores of 0.530, 0.745, and 0.543 in this task for the Arabic, English, and Dutch languages respectively.</p>
      </abstract>
      <kwd-group>
<kwd>Check-worthiness</kwd>
        <kwd>Fact-checking</kwd>
        <kwd>Tweets</kwd>
        <kwd>Transcriptions</kwd>
        <kwd>NLP</kwd>
        <kwd>Transformer</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Deep Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>Social media platforms like Facebook and Twitter are now integral parts of our daily routines. These
sites have revolutionized the way people communicate in the modern world. At the same time, the
rise and spread of misinformation has become a matter of equally grave concern. By
spreading false news and misinformation on social media sites, public opinion and points of view
can be manipulated. Therefore, as a vast amount of content is generated daily, it has become essential to
identify which claims should be checked for factuality, in order to allocate resources and stop
the spread of misleading information effectively.</p>
      <p>
        To overcome these challenges, The CheckThat! Lab [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] at CLEF 2024 has featured six distinct
tasks described in the overview paper [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], each addressing different aspects of misinformation and
content analysis on social media. Among these tasks, we have participated exclusively in Task 1:
Check-Worthiness Estimation [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Verifying the accuracy of statements has emerged as a significant task within the field of Natural
Language Processing. Like many other NLP tasks, it has been extensively explored in
high-resource languages like English [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. However, little research has been done in this domain in low-resource
languages. Transformer-based approaches [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] have been used in these research works. Before
transformer-based approaches began to outperform all other approaches, machine learning (ML)
algorithms [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and deep learning (DL) techniques [7] were employed to detect whether a tweet or
statement should be verified.
      </p>
      <p>
        The task of Check-Worthiness Estimation [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] at the CheckThat! Lab [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], part of CLEF 2024, seeks
to bridge this gap by providing annotated datasets and encouraging the development of models that
can accurately estimate the check-worthiness of texts. The introduction of multi-genre data further
complicates the task, as tweets and transcriptions vary significantly in structure, style, and context.
This task contains datasets in multiple languages (Arabic, English, and Dutch), making the task both
linguistically and contextually challenging. Traditional approaches to check-worthiness estimation
often rely on human annotators to determine if a claim is verifiable, potentially harmful, or otherwise
significant. However, automating this process requires sophisticated natural language processing (NLP)
techniques and models that are capable of handling diverse and complex data.
      </p>
      <p>For this purpose, we have presented a detailed analysis of our approach to the Check-Worthiness
Estimation task. We have also conducted a comparative analysis of several models, including
Machine Learning models (Random Forest, SVM, XGBoost), Deep Learning models (LSTM, Bi-LSTM),
and Transformer-based models (AraBERT [8] for the Arabic language, RobBERT [9] for the Dutch
language, BERT-uncased [10] for the English language, and MultiLingual-BERT-uncased [10] for all
three languages). We have named our approach of using the Multilingual-BERT-uncased model as
CW-BERT.</p>
      <p>We have used the datasets provided by the organizers for the respective languages to train our
models, and then evaluated each model on those datasets. We have
found that, among all models, fine-tuned transformer-based models have obtained better results
than all others. The core contributions of our research are given below –
• We have developed a fine-tuned CW-BERT model specifically designed to assess the
check-worthiness of claims in tweets and transcriptions across three languages: Arabic, English, and
Dutch.
• We have conducted a comparative analysis among various models based on machine learning,
deep learning, and transformer techniques to determine the most effective approach.</p>
      <p>The implementation details of this task are available in our GitHub repository: https://github.com/Fired-from-NLP/CLEF-2024-check-worthiness-estimation.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>The purpose of this work is to estimate whether a claim made in a tweet and/or
transcription is worth fact-checking. Previous works on this topic can be broadly categorized into machine learning, deep
learning, and transformer-based approaches.</p>
      <sec id="sec-2-1">
        <title>2.1. Previous Works Based on Machine Learning Approaches</title>
        <p>
          A rule-based approach using feature engineering has been proposed [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. The unsupervised
approaches have been based on K-means clustering and the supervised approaches have been based on
Cosine similarity, POS tags, and TF-IDF vectorization. An SVM-based model [11] has been proposed to
predict whether a fact is worth checking. This approach focuses not only on sentence structure but also
the context of the sentence.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Previous Works Based on Deep Learning Approaches</title>
        <p>Determining how trustworthy a claim is in the context of politics is a very important task. A
multi-task deep learning approach [12] has been used to predict whether a statement should
be prioritized for fact-checking. A CNN-based deep learning model [7] has been suggested for obtaining
semantic word embeddings while managing the complexity of natural language structures in diverse
languages.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Previous Works Based on Transformer-based Approaches</title>
        <p>Several studies have estimated fact-checking worthiness in both low-resource and
resource-enriched languages. NorBench [13] and NB-BERT-base [14] have been
successfully employed for automated claim detection. Another work has proposed a
fine-grained transformer-based technique for claim check-worthiness [15].</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset</title>
      <p>
        In our study, we have used datasets for three different languages (Arabic, Dutch, and English)
provided by CLEF 2024 - CheckThat! Lab [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] for estimating the Check-worthiness of tweets and/or
transcriptions. These datasets are categorized into two classes: ‘Yes’ (indicating the text is worth
fact-checking) and ‘No’ (indicating otherwise). The datasets are divided into four sets: train, dev, test,
and dev-test.
      </p>
      <p>For the English dataset, each split contains sentences comprising 8 to 9 words, while sentences in the Arabic and Dutch
datasets comprise 11 to 16 and 7 to 9 words respectively. Figure 1 and Figure 2 show a significant
imbalance in the English and Arabic datasets. The ‘Yes’ category, indicating check-worthy sentences, is
substantially underrepresented compared to the ‘No’ category. Specifically, the English dataset contains
2243 samples labeled as ‘Yes’ and 5090 samples labeled as ‘No’. A similar imbalance is observed in the
Arabic dataset, with 5413 ‘Yes’ samples and 17088 ‘No’ samples. Additionally, Table 1 reveals that the
Dutch dataset demonstrates a smaller sample distribution across both categories. These datasets have
been constructed from tweets and/or transcriptions, and reflect real-world text distributions.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>We have explained our methodology to develop models for check-worthiness estimation in this section.
At first, we used some preprocessing strategies on the given datasets and then utilized a variety of deep
learning and machine learning algorithms. Moreover, we have used different transformer models to
develop the system. Figure 4 provides the summary of our methodology.</p>
      <sec id="sec-4-1">
        <title>4.1. Preprocessing</title>
        <p>Throughout the training and evaluation phases, we have applied several preprocessing steps. These
include the removal of extraneous spaces and punctuation from the input text. However, since numerical
tokens contribute to the overall meaning of the text, we have not removed them. For the final
preprocessing step specific to transformer-based models, we have tokenized the sample text using the BERT
tokenizer (bert-base-multilingual-uncased), added special tokens, and truncated/padded the sequences
to a maximum length of 128 tokens.</p>
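As a rough illustration, the cleaning steps described above (stripping punctuation and extraneous whitespace while keeping numerals) can be sketched in plain Python; `clean_text` is a hypothetical helper for illustration, not the authors' actual code:

```python
import re
import string

def clean_text(text: str) -> str:
    """Remove punctuation and extraneous whitespace, but keep digits,
    since numerical tokens contribute to the meaning of a claim."""
    # Strip punctuation characters (letters and digits are preserved)
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Collapse runs of whitespace into single spaces
    return re.sub(r"\s+", " ", text).strip()

print(clean_text("COVID-19  cases rose by   300%!!"))  # -> "COVID19 cases rose by 300"
```

Tokenization for the transformer models would then be applied on top of such cleaned text.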
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Machine Learning Based Approaches</title>
        <p>We have employed conventional machine learning algorithms like Logistic Regression, Random
Forest, Support Vector Machine (SVM), and XGBoost to estimate check-worthiness. To determine the
significance of each word within a text, we have employed the TF-IDF vectorizer. Using the vectorized
training data, we have trained a Logistic Regression model with a maximum of 1000 iterations, a Random
Forest model with 100 estimators and a random state of 42, and an SVM model with a linear kernel
and a maximum of 1000 iterations. Additionally, we have developed an XGBoost model
with 100 estimators and a maximum depth of 15, which builds an ensemble of decision trees iteratively using
gradient boosting.</p>
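A minimal scikit-learn sketch of this setup, assuming the toy `texts`/`labels` stand in for the real training split; the XGBoost model (100 estimators, maximum depth 15) would plug into the same pipeline via the `xgboost` package:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline

# Toy data standing in for the task's train split (1 = check-worthy)
texts = ["the earth is flat", "nice weather today",
         "vaccines cause illness", "I like tea"]
labels = [1, 0, 1, 0]

# Hyperparameters as described in the text
models = {
    "logreg": LogisticRegression(max_iter=1000),
    "rf": RandomForestClassifier(n_estimators=100, random_state=42),
    "svm": SVC(kernel="linear", max_iter=1000),
}

for name, clf in models.items():
    # TF-IDF weighs each word's significance before classification
    pipe = make_pipeline(TfidfVectorizer(), clf)
    pipe.fit(texts, labels)
    print(name, pipe.predict(["the earth is flat"]))
```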
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Deep Learning Based Approaches</title>
        <p>We have also utilized deep learning models such as LSTM and BiLSTM to detect
check-worthiness. First, we have defined an embedding layer to convert words into vectors. Then, we have added
a spatial dropout layer to prevent overfitting, followed by an LSTM layer for the LSTM
model and a bidirectional LSTM layer for the BiLSTM model to capture sequential dependencies, then a
sigmoid activation function-based dense output layer for binary classification. We have used the Adam
optimizer and the binary cross-entropy loss function to compile both the LSTM and the BiLSTM models.</p>
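The layer stack described above can be approximated as follows. This is a PyTorch analogue for illustration (the sizes `vocab_size`, `embed_dim`, and `hidden_dim` are hypothetical, and ordinary dropout stands in for spatial dropout):

```python
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    """Embedding -> dropout -> bidirectional LSTM -> sigmoid dense output."""
    def __init__(self, vocab_size=5000, embed_dim=64, hidden_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.drop = nn.Dropout(0.2)  # stands in for spatial dropout
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, 1)  # 2x: both directions

    def forward(self, x):
        h = self.drop(self.embed(x))
        _, (hn, _) = self.lstm(h)
        # Concatenate the final forward and backward hidden states
        h = torch.cat([hn[0], hn[1]], dim=-1)
        return torch.sigmoid(self.out(h)).squeeze(-1)

model = BiLSTMClassifier()
probs = model(torch.randint(0, 5000, (4, 16)))  # batch of 4 toy sequences
print(probs.shape)
```

Training would then use Adam and binary cross-entropy (`nn.BCELoss`), matching the compilation described in the text.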
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Transformer-based Approaches</title>
        <p>Transformer-based approaches are widely applied in numerous domains. For this reason, we have
employed multiple transformer-based models for different languages in this task. We have used the
AraBERT model [8], the RobBERT model [9], and the BERT-uncased model [10] for the Arabic, Dutch,
and English language respectively.</p>
        <p>Furthermore, we have fine-tuned the mBERT (bert-base-multilingual-uncased) model [ 10] for this
task. We have named our model CW-BERT (Check-Worthiness-BERT) which is a generalized model for
all three languages (Arabic, Dutch, and English).</p>
        <p>Firstly, we have carried out a variety of preprocessing steps to prepare the data for training. Using
the BERT tokenizer (bert-base-multilingual-uncased), we have tokenized each sample text to transform
it into a format that can be entered into the BERT model. This involved padding/truncating sequences
to a maximum length of 128 tokens and inserting special tokens. Then, we mapped the labels, which
were ‘Yes’ for check-worthy claims and ‘No’ for non-check-worthy claims to binary values (1 for Yes,
0 for No). After that, we used padding for tokenized sequences to ensure uniform input length and
converted the tokenized texts and labels to PyTorch tensors for model training.</p>
        <p>In our approach, we have chosen the mBERT model (bert-base-multilingual-uncased) for this task
as it has been pre-trained on an extensive collection of multi-lingual data. We have specifically used
the BertForSequenceClassification class, which is tailored for sequence classification tasks. The model
has been fine-tuned on the training dataset using a carefully designed procedure. We have loaded
the preprocessed training data into a DataLoader with a batch size of 32, enabling efficient mini-batch
training. Then, in order to adjust the learning rate during training, we have used a linear learning rate
scheduler with warm-up and the AdamW optimizer, with a learning rate of 2e-5 and an epsilon
of 1e-8. We have trained the model for several epochs, during which the forward pass computed the
model output and the loss for each batch, and the backward pass computed gradients and updated the model
parameters. Finally, the learning rate was adjusted in accordance with the scheduler, and gradient
clipping was used to prevent exploding gradients.</p>
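The training procedure (AdamW with learning rate 2e-5 and epsilon 1e-8, a linear warm-up schedule, batch size 32, and gradient clipping) can be sketched as below; a tiny linear classifier stands in for BertForSequenceClassification so the loop itself stays self-contained:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Tiny stand-in model; the paper fine-tunes BertForSequenceClassification
model = nn.Linear(8, 2)
data = TensorDataset(torch.randn(64, 8), torch.randint(0, 2, (64,)))
loader = DataLoader(data, batch_size=32)  # batch size 32 as in the text

epochs = 2
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, eps=1e-8)
total_steps = epochs * len(loader)
warmup = total_steps // 10
# Linear schedule with warm-up: ramp up, then decay linearly to zero
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    lambda step: step / max(1, warmup) if step < warmup
    else max(0.0, (total_steps - step) / max(1, total_steps - warmup)))
loss_fn = nn.CrossEntropyLoss()

for _ in range(epochs):
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)       # forward pass
        loss.backward()                   # backward pass
        # Clip gradient norm to stop exploding gradients
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        scheduler.step()                  # adjust learning rate
print("final lr:", scheduler.get_last_lr()[0])
```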
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experimental Results</title>
      <p>This section presents the experimental findings obtained during the training and evaluation stages of
our proposed model and several other transformer-based, machine learning, and deep learning models
for comparative study.</p>
      <sec id="sec-5-1">
        <title>5.1. Environment Settings</title>
        <p>A personal computer equipped with an Intel Core i7-9750H CPU running at 3.00 GHz and an NVIDIA
GeForce GTX 2060 GPU was used to run the experiments. Additionally, a Kaggle Notebook with a
P100 GPU was used to provide the necessary processing capability.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Parameter Settings</title>
        <p>Table 2 summarizes the parameter settings that we have used in different models.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Evaluation Metrics</title>
        <p>As per the guidelines provided by the CLEF 2024 - CheckThat! Lab: Task 1 organizers, we have evaluated
our models by calculating the F1 score on the test dataset. Equation 1 gives the mathematical description
of the F1 score.</p>
        <p>F1 Score = 2 × (Precision × Recall) / (Precision + Recall)  (1)</p>
        <p>Precision = TP / (TP + FP)  (2)</p>
        <p>Recall = TP / (TP + FN)  (3)</p>
        <p>where True Positive, False Positive, and False Negative are represented by the symbols TP, FP, and FN
respectively in equations 2 and 3.</p>
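For concreteness, equations (1)–(3) can be computed directly from the confusion-matrix counts; the counts in the example below are illustrative only, not taken from the paper's results:

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 as defined in equations (1)-(3)."""
    precision = tp / (tp + fp)   # eq. (2)
    recall = tp / (tp + fn)      # eq. (3)
    return 2 * precision * recall / (precision + recall)  # eq. (1)

# Illustrative counts: 50 TP, 30 FP, 20 FN
print(round(f1_score(50, 30, 20), 3))  # precision 0.625, recall ~0.714 -> 0.667
```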
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Comparative Analysis</title>
        <p>The CW-BERT model’s performance on the Arabic, Dutch, and English test datasets is shown in Table 3.</p>
        <p>We have conducted a comparative analysis of transformer-based, machine learning, and deep learning
models, evaluating their performance using the F1 score. Table 4 reports the F1 scores of the different models
on the datasets of the various languages.</p>
        <p>When assessing models using the Arabic language test dataset, the SVM model has attained the
best F1 score of 0.403 among machine learning models. Deep learning models, specifically LSTM and
BiLSTM, have demonstrated enhanced efficacy. The BiLSTM model achieved an F1 score of 0.447,
just surpassing the LSTM model’s F1 score of 0.406. Transformer-based models have demonstrated
superior performance in contrast to both deep learning and machine learning approaches. For instance,
the AraBERT model has achieved an F1 score of 0.504, while our proposed CW-BERT model secured
the highest F1 score of 0.530 among all models evaluated for the Arabic language, placing 7th on the
leaderboard.</p>
        <p>In the context of the Dutch language, the SVM model once more demonstrated superior performance
among machine learning models, achieving an F1 score of 0.415. Among the deep learning models,
we have again seen improvement over the machine learning models as the LSTM model and the
BiLSTM model have obtained the F1 score of 0.417 and 0.429 respectively. However, transformer-based
models once again have demonstrated clear superiority over both deep learning and machine learning
approaches. Specifically, the RobBERT model has secured the second-highest F1 score of 0.532, while
the CW-BERT model attained the best F1 score of 0.543, ranking 12th on the leaderboard.</p>
        <p>Finally, for the English language, we have observed that the XGBoost model has surpassed the other
machine learning models with an F1 score of 0.516. Deep learning models have performed slightly better than
machine learning models: the BiLSTM model has scored an F1 score of 0.541, holding a slight edge over
the F1 score of the LSTM model of 0.528. However, the transformer-based approaches have achieved
significant improvement with high margins contrasted with the deep learning and machine learning
models. Although the BERT-uncased model has obtained an impressive F1 score of 0.713, our proposed
CW-BERT model excels by obtaining the highest score among these models with a remarkable F1 score
of 0.745, securing the 10th position on the leaderboard.</p>
      </sec>
      <sec id="sec-5-5">
        <title>5.5. Error Analysis</title>
        <p>From the confusion matrices, we have seen that the CW-BERT model has achieved a True Positive
Rate (TPR) of 79.8% for the Arabic language, ensuring that most of the statements that are check-worthy are correctly
identified. However, it has not performed as well for the Dutch and English languages, where it has obtained
considerably lower TPRs of 60.7% and 68.2% respectively, indicating that the model is missing some actual positive
instances (check-worthy statements).</p>
        <p>According to Figures 1, 2, and 3, it is evident that all three datasets are highly
imbalanced, with more ‘No’ (Not Check-worthy) instances than ‘Yes’ (Check-worthy) instances.
Because of this, the model is biased in favor of the majority class (‘No’), which raises the number of
false negatives. Besides that, Figure 3 shows that there are not enough training samples for
the Dutch language, which is another reason for the low TPR. These factors lead the model to some
misclassifications.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>Fact-checking is crucial for maintaining the integrity of information shared on social media. By focusing
on influential tweets, fact-checkers can more effectively mitigate the spread of misinformation and
contribute to a healthier information ecosystem. In this research, we have employed several machine
learning, deep learning, and transformer-based approaches for detecting the fact-checking worthiness
of tweets and/or transcriptions using the datasets provided in Task 1 (Check-Worthiness Estimation)
organized by the CheckThat! Lab under CLEF 2024. For three distinct languages, we have developed a
single multilingual model. In the future, we plan to develop a more comprehensive multilingual model,
which we expect to yield superior results. We also intend to extend this work so that our model
can efficiently handle data imbalance, and to apply large language models to the dataset.</p>
      <p>[7] H. Sinha, Sakshi, Y. Sharma, Text-convolutional neural networks for fake news detection in tweets, in: FICTA 2020, Volume 1, Springer, 2021.</p>
      <p>[8] W. Antoun, F. Baly, H. M. Hajj, AraBERT: Transformer-based model for Arabic language understanding, CoRR (2020).</p>
      <p>[9] P. Delobelle, T. Winters, B. Berendt, RobBERT: a Dutch RoBERTa-based language model, CoRR (2020).</p>
      <p>[10] J. Devlin, M. Chang, K. Lee, K. Toutanova, BERT: pre-training of deep bidirectional transformers for language understanding, CoRR (2018).</p>
      <p>[11] P. Gencheva, P. Nakov, L. Màrquez, A. Barrón-Cedeño, I. Koychev, A context-aware approach for detecting worth-checking claims in political debates, in: R. Mitkov, G. Angelova (Eds.), RANLP, 2017.</p>
      <p>[12] S. Vasileva, P. Atanasova, L. Màrquez, A. Barrón-Cedeño, P. Nakov, It takes nine to smell a rat: Neural multi-task learning for check-worthiness prediction, CoRR (2019).</p>
      <p>[13] D. Samuel, A. Kutuzov, S. Touileb, E. Velldal, L. Øvrelid, E. Rønningstad, E. Sigdel, A. Palatkina, NorBench – a benchmark for Norwegian language models, in: T. Alumäe, M. Fishel (Eds.), NoDaLiDa, University of Tartu Library, 2023.</p>
      <p>[14] P. E. Kummervold, J. De la Rosa, F. Wetjen, S. A. Brygfjeld, Operationalizing a national digital library: The case for a Norwegian transformer model, in: NoDaLiDa, 2021.</p>
      <p>[15] M. Sundriyal, M. S. Akhtar, T. Chakraborty, Leveraging social discourse to measure check-worthiness of claims for fact-checking, 2023.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Elsayed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Przybyła</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Struß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Haouari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hasanain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ruggeri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Suwaileh</surname>
          </string-name>
          ,
          The CLEF-2024 CheckThat! Lab:
          <article-title>Check-worthiness, subjectivity, persuasion, roles, authorities, and adversarial robustness</article-title>
          , in: N.
          <string-name>
            <surname>Goharian</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Tonellotto</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>He</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Lipani</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>McDonald</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Macdonald</surname>
          </string-name>
          , I. Ounis (Eds.),
          <source>Advances in Information Retrieval</source>
          , Springer Nature Switzerland, Cham,
          <year>2024</year>
          , pp.
          <fpage>449</fpage>
          -
          <lpage>458</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Struß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Elsayed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Przybyła</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Caselli</surname>
          </string-name>
          , G. Da San Martino,
          <string-name>
            <given-names>F.</given-names>
            <surname>Haouari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Piskorski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ruggeri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Suwaileh</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF-2024 CheckThat! Lab: Check-worthiness, subjectivity, persuasion, roles, authorities and adversarial robustness</article-title>
          , in: L.
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Mulhem</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Quénot</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Schwab</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Soulier</surname>
            ,
            <given-names>G. M.</given-names>
          </string-name>
          <string-name>
            <surname>Di Nunzio</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Galuščáková</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>García Seco de Herrera</surname>
          </string-name>
          , G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF</source>
          <year>2024</year>
          ),
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hasanain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Suwaileh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Weering</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Caselli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zaghouani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF-2024 CheckThat! lab task 1 on check-worthiness estimation of multigenre content</article-title>
          , in:
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>García Seco de Herrera</surname>
          </string-name>
          (Eds.),
          <source>Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum</source>
          , CLEF
          <year>2024</year>
          , Grenoble, France,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Shaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Babulkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Elsayed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hasanain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Suwaileh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Haouari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Da San Martino</surname>
          </string-name>
          , et al.,
          <article-title>Overview of CheckThat! 2020 English: Automatic identification and verification of claims in social media</article-title>
          ,
          <source>CLEF (working notes)</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Si</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Datta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Naskar</surname>
          </string-name>
          ,
          <article-title>A new approach to claim check-worthiness prediction and claim verification</article-title>
          , in:
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhattacharyya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Sharma</surname>
          </string-name>
          , R. Sangal (Eds.), ICON, NLPAI,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>