<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>HYBRINFOX at CheckThat! 2024 - Task 1: Enhancing Language Models with Structured Information for Check-Worthiness Estimation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Géraud Faye</string-name>
          <email>geraud.faye@centralesupelec.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Morgane Casanova</string-name>
          <email>morgane.casanova@irisa.fr</email>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Benjamin Icard</string-name>
          <email>benjamin.icard@lip6.fr</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Julien Chanson</string-name>
          <email>julien.chanson@mondeca.com</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guillaume Gadek</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guillaume Gravier</string-name>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paul Égré</string-name>
          <email>paul.egre@ens.psl.eu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Airbus Defence and Space</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institut Jean-Nicod</institution>
          ,
          <addr-line>CNRS, ENS-PSL, EHESS</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>LIP6, Sorbonne Université</institution>
          ,
          <addr-line>CNRS</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Mondeca</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Université Paris-Saclay</institution>
          ,
          <addr-line>CentraleSupélec, MICS</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>Université de Rennes</institution>
          ,
          <addr-line>CNRS, Inria, IRISA</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
<p>This paper summarizes the experiments and results of the HYBRINFOX team for the CheckThat! 2024 - Task 1 competition. We propose an approach enriching Language Models such as RoBERTa with embeddings produced from triples (subject; predicate; object) extracted from the text sentences. Our analysis of the development data shows that this method improves the performance of Language Models alone. On the evaluation data, its best performance was in English, where it achieved an F1 score of 71.1 and ranked 12th out of 27 candidates. On the other languages (Dutch and Arabic), it obtained more mixed results. Future research tracks are identified toward adapting this processing pipeline to more recent Large Language Models.</p>
      </abstract>
      <kwd-group>
        <kwd>Hybrid AI</kwd>
        <kwd>Text Classification</kwd>
        <kwd>Check-Worthiness</kwd>
        <kwd>Fact-Checking</kwd>
        <kwd>Language Models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>The recent democratisation of social media has given users unprecedented access to information,
with the possibility to contribute knowledge as well as to share personal views and opinions. By the
same token, however, it has also offered misinformation new paths to propagate, sometimes with
massive impact. As a result, automated misinformation detection and automated fact-checking
have become tasks of central interest in the data science community.</p>
      <p>
This paper deals with a specific aspect of fact-checking, namely the problem of check-worthiness
estimation, presented as Task 1 of the broader CheckThat! 2024 workshop [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], concerned with
information quality evaluation. Various works dealing with fact-checking operate under the assumption
that the entirety of the claims, sentences, or articles in a dataset are checkworthy [
        <xref ref-type="bibr" rid="ref2 ref3 ref4">2, 3, 4</xref>
        ]. But this
approach can be inefficient, and a useful preliminary step is to identify which claims are
check-worthy and which are not. The notion of check-worthiness is complex. Some claims in a text are not
check-worthy simply because they are not declarative sentences and do not report even potential facts
(e.g. questions). Others are not check-worthy because, while being declarative assertions, they make
claims of no consequence. Conversely, a declarative sentence with potentially harmful consequences
ranks high on check-worthiness. Others, finally, may not be check-worthy because they simply
report subjective views that are not susceptible of verification proper. It is therefore a non-trivial
challenge to determine which claims in a document are specifically check-worthy.
      </p>
      <p>
        Check-worthiness is a recent task [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and is mostly addressed using Language Models [6]. In this paper, we
propose an approach designed to leverage structured information from the text in order to enrich the
representation obtained with a Language Model. Because check-worthiness estimation is
related to fact-checking, it seems appropriate to identify facts in the text to help the model predict
check-worthiness. By using both structured facts and Language Model embeddings, we obtained better
results than when using Language Models alone, ranking 12th among 27 competing teams for English
(with an F1 score of 71.1). Results were more mixed for the non-English languages represented in the
test set: in Dutch our method ranked 8th out of 16 candidates (F1 score of 58.9), and in Arabic it ranked
10th out of 14 (F1 score of 51.9).
      </p>
<p>In Section 2, we open with a quick review of the state of the art on the task. In Section 3, we spell
out the functioning of the proposed processing pipeline. Then, Section 4 discusses preliminary results
obtained with the initial training data and presents the final submitted results. Finally, Section 5 offers
some perspectives on the evolution and future use of the proposed hybrid system.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>
        As explained in the previous section, check-worthiness is a fairly recent task, first mentioned in 2015 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
Several datasets have been constructed, such as the ClaimBuster dataset [7] or the datasets proposed at the
CheckThat! workshops since 2018 [8]. These datasets focus on two main types of contexts for the task:
• Classifying sentences from political debates. Such systems could be used to ease fact-checking during
televised political debates, using datasets such as [9].
• Classifying tweets. Because tweets are easily and widely shared online, check-worthiness estimation is an
important task for online discussions, to help counter information manipulation.
      </p>
      <p>Both of these categories are important, and a commonality between them is their short format.
However, the scope of this task of check-worthiness can be widened so as to also include online press,
with an eye to so-called “pink slime” news [10], encompassing longer texts whose truthfulness is
questionable.</p>
      <p>Among pioneering approaches to the task, we find methods such as ClaimRank [11], which uses traditional
NLP methods (e.g. lemmatization, TF-IDF) to identify check-worthy claims.</p>
      <p>More recent approaches take advantage of the Transformer layer and of pretrained Language Models
such as BERT [12], RoBERTa [13] or XLNet [14]. The success of these approaches can be seen in the
2023 CheckThat! Task 1 overview paper [15], which shows that nearly all teams used a transformer-based
Language Model.</p>
      <p>With the even more recent development of Large Language Models and of Generative AI, a natural
shift has been made toward the use of LLMs, relying on prompt engineering [16] and in-context
learning [17] to achieve check-worthiness estimation. These approaches were used by the winners of
this year’s competition, both in English [18] and in Dutch [19].</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Model</title>
        <p>A straightforward approach to check-worthiness estimation would be to use a pretrained Language
Model fine-tuned on the provided training data. However, these language models produce embeddings
that are opaque, even if they are sufficient most of the time. To improve the quality of the language
model predictions, we propose to use them in conjunction with a small neural network able to leverage
structured information from the input text. A visual description of the architecture is given in Figure 1.
The processing pipeline is the following:</p>
        <p>[Figure 1: Architecture of the proposed model. The text is embedded by a Language Model while, in parallel, triples extracted by OpenIE6 are embedded with fastText, passed through linear layers with ReLU activation and averaged; the concatenated representation goes through a final linear layer with sigmoid activation, producing a score in [0, 1].]</p>
        <p>
1. To begin with, the text is embedded using a Language Model. In our case, the RoBERTa model [13]
is used, producing an embedding of dimension 768. RoBERTa was chosen for its ease of use, its
high performance for classification tasks and its relatively small size when compared to recent
LLMs.
2. In parallel, the text is structured using an Open Information Extraction system. These systems
extract information from the text in the form of triples (subject; predicate; object). We used
OpenIE6 [20] to extract triples in the English language. Using triples allows us to produce
structured information from the text and to reduce syntactic complexity, with the aim of helping
sentence classification. A maximum of 4 triples per text was extracted, which is enough to
cover all information triples for more than 90% of sentences. Each part of a triple is encoded
using fastText [21], producing 3 vectors of dimension 300 per triple. These vector representations
go through a dense layer with ReLU activation function. Then, they are averaged before being
combined in the last layer to produce an embedding of dimension 768 (the same dimension as the
Language Model).
3. Encodings from the previous two parts are concatenated and go through a dense layer with ReLU
activation function with a final output producing the probability of being checkworthy by means
of a sigmoid activation function.</p>
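        <p>To make this pipeline concrete, the following is a minimal PyTorch sketch of the triple branch and the fusion head (module names, tensor shapes and implementation details are assumptions made for illustration, not the exact code of the submitted system):</p>
        <preformat>
import torch
import torch.nn as nn

class TripleEncoder(nn.Module):
    """Encodes up to 4 (subject; predicate; object) triples, each part given
    as a 300-dimensional fastText vector."""
    def __init__(self, ft_dim=300, out_dim=768):
        super().__init__()
        self.dense = nn.Sequential(nn.Linear(ft_dim, ft_dim), nn.ReLU())
        # Averaged subject/predicate/object vectors, concatenated, are
        # projected to the Language Model dimension.
        self.project = nn.Linear(3 * ft_dim, out_dim)

    def forward(self, triples):             # triples: (batch, n_triples, 3, 300)
        h = self.dense(triples)             # shared dense layer with ReLU
        h = h.mean(dim=1)                   # average over triples: (batch, 3, 300)
        h = h.flatten(start_dim=1)          # concatenate s/p/o: (batch, 900)
        return self.project(h)              # (batch, 768)

class HybridClassifier(nn.Module):
    """Concatenates the LM embedding and the triple embedding, then predicts
    the probability that the sentence is check-worthy."""
    def __init__(self, lm_dim=768):
        super().__init__()
        self.triple_encoder = TripleEncoder(out_dim=lm_dim)
        self.head = nn.Sequential(
            nn.Linear(2 * lm_dim, lm_dim), nn.ReLU(),
            nn.Linear(lm_dim, 1), nn.Sigmoid(),
        )

    def forward(self, lm_embedding, triples):
        z = torch.cat([lm_embedding, self.triple_encoder(triples)], dim=-1)
        return self.head(z)                 # probability in [0, 1]
        </preformat>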
      <p>The described architecture can be transposed to other languages when an OpenIE system and an LM
are available. In the context of Task 1 of the evaluation, for Spanish, Dutch and Arabic, RoBERTa was
swapped with a multilingual BERT [12].1 The OpenIE6 system was replaced with Multi²OIE [22] in a
zero-shot setting for non-English languages, providing worse performance than OpenIE6 on English,
but allowing us to test the architecture on other languages. In principle, the same architecture can be
used for any language, and in practice it is applicable to the 98 languages currently supported by the
multilingual version of BERT.</p>
      <sec id="sec-3-1">
        <title>3.2. Example</title>
        <p>To better understand how this architecture works, we illustrate the pipeline with a simple example. We
took a sentence from the English training dataset: "I must remind him the Democrats have controlled
the Congress for the last twenty-two years and they wrote all the tax bills." This sentence comes from a
debate between US presidential candidates Jimmy Carter and Gerald Ford on September 26, 1976. It is
deemed check-worthy, as it contains allegations about Jimmy Carter's party.
1https://huggingface.co/google-bert/bert-base-multilingual-cased</p>
        <p>The classical NLP classification pipeline using RoBERTa would consist in producing a 768-dimensional
embedding and performing classification by passing this embedding through a neural network. In our approach,
we produce in parallel embeddings of the triples that can be extracted from the text. For this example,
OpenIE6 produces the following three triples:
(I; must remind; him the Democrats have controlled the Congress for the last twenty-two years)
(the Democrats; have controlled; the Congress for the last twenty-two years)
        <p>(they; wrote; all the tax bills)</p>
        <p>Only the last two triples contain the information that is check-worthy. However, there are no
triple-level annotations, so we keep all triples. First, we encode each subject, predicate and object with
fastText, creating in this case three embedding vectors for each triple. These embeddings go through the
same linear layer, and the embeddings of subjects, predicates and objects are averaged by component
(subject, predicate, object). This leads to three vectors of dimension 300, representing the subjects,
predicates and objects of the triples extracted from the text. Finally, they are concatenated and projected
into a vector of dimension 768, the same dimension as the RoBERTa embeddings. The RoBERTa embedding
and the triple embedding are eventually concatenated for standard classification.</p>
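        <p>As an illustration, the triple branch of this example could be computed as follows (a sketch assuming the fasttext Python bindings, a local cc.en.300.bin vector file, and the HybridClassifier sketch of Section 3.1):</p>
        <preformat>
import fasttext
import numpy as np
import torch

ft = fasttext.load_model("cc.en.300.bin")   # pretrained English fastText vectors

triples = [
    ("I", "must remind",
     "him the Democrats have controlled the Congress for the last twenty-two years"),
    ("the Democrats", "have controlled",
     "the Congress for the last twenty-two years"),
    ("they", "wrote", "all the tax bills"),
]

# One 300-dimensional vector per subject, predicate and object of each triple.
vecs = np.stack([[ft.get_sentence_vector(part) for part in t] for t in triples])
triple_tensor = torch.from_numpy(vecs).unsqueeze(0)   # shape (1, 3, 3, 300)

lm_embedding = torch.zeros(1, 768)   # stand-in for the RoBERTa sentence embedding
probability = HybridClassifier()(lm_embedding, triple_tensor)   # value in [0, 1]
        </preformat>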
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>This section is divided into three parts. The first presents our training protocol. The second part
reports the evaluation of our proposed approach and of a standard Language Model, in order to measure
how the additional triple-processing part impacts performance. The third part contains an analysis of
our submitted results, as well as of the difficulties encountered.</p>
      <sec id="sec-4-1">
        <title>4.1. Training procedure</title>
        <p>Each model was trained for 5 epochs on the train set. After each epoch, the model was evaluated on
the dev set, and the best model in terms of macro-F1 score was kept. The scores reported in Section 4.2
are macro-F1 scores on the dev-test set.</p>
        <p>In our procedures, only the train set was used for training the model, the dev and dev-test sets being
used for model selection. Reported results were produced by the models with the best dev-test macro-F1
score, which were also used to make predictions on the final test set.</p>
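        <p>Schematically, the selection loop was as follows (train_one_epoch and evaluate are hypothetical helpers standing in for the actual training and inference code):</p>
        <preformat>
import copy
from sklearn.metrics import f1_score

def select_best_model(model, train_loader, dev_loader, train_one_epoch, evaluate):
    """Train for 5 epochs and keep the checkpoint with the best dev macro-F1."""
    best_f1, best_model = -1.0, None
    for epoch in range(5):
        train_one_epoch(model, train_loader)        # fit on the train set only
        dev_preds, dev_labels = evaluate(model, dev_loader)
        macro_f1 = f1_score(dev_labels, dev_preds, average="macro")
        if macro_f1 > best_f1:                      # keep the best dev checkpoint
            best_f1, best_model = macro_f1, copy.deepcopy(model)
    # best_model is then scored on the dev-test set and used for the final
    # predictions on the test set.
    return best_model
        </preformat>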
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Preliminary results on the development data</title>
        <p>The main goal of our approach was to evaluate how combining structured information from the text with
a standard Language Model impacts performance. Results observed on the dev-test set provided
before the competition are reported in Table 1.</p>
        <p>In most cases, our approach outperformed the LM baseline, achieving the highest performance gain
in Arabic, followed by English and Spanish.</p>
        <p>It appears that performance is generally lower for non-English languages. This is due to the fact that
multilingual models perform worse, but they allowed us to process different languages with the same
architecture and weights.</p>
        <p>The same goes for the OpenIE system, with OpenIE6 being specifically trained on English, and
Multi²OIE being used in a zero-shot setting (no non-English training sample was used for training).
This limitation is pointed out in the Multi²OIE paper [22], but it is the only existing open-source OpenIE
system able to process Arabic, Dutch, English and Spanish. As it relies on a multilingual BERT, it
also suffers from lower performance on relatively low-resource languages, explaining the decrease in
performance for Dutch.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Results on the evaluation data</title>
        <p>The competition scores and rankings are shown in Table 2, together with the scores of the best-performing
team (state of the art for this dataset) and of the baseline.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Discussion</title>
        <p>While the proposed approach outperformed RoBERTa on the dev-test set, several upgrades could have
been made to reduce possible errors in the processing pipeline. Firstly, it is well known that triples
extracted with OpenIE are noisy and may not always contain facts useful for the task. This can be seen
in the first triple of the example given in Section 3: (I; must remind; him the Democrats have controlled
the Congress for the last twenty-two years). One remedy would be to filter out the triples that do not contain
named entities in their subject and object parts, which would keep only the second triple in
the example. An additional step to increase the usefulness of triples would be to apply coreference
resolution, replacing pronouns with the expressions they refer to. After coreference, (they; wrote; all the tax bills)
would become (the Democrats; wrote; all the tax bills), which is more descriptive.</p>
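        <p>A possible implementation of such a named-entity filter is sketched below (it assumes spaCy and its en_core_web_sm model; the exact filtering rule would need tuning):</p>
        <preformat>
import spacy

nlp = spacy.load("en_core_web_sm")

def keep_triple(subject: str, predicate: str, obj: str) -> bool:
    """Keep a triple only when both its subject and object mention a named entity."""
    return bool(nlp(subject).ents) and bool(nlp(obj).ents)

triples = [
    ("I", "must remind",
     "him the Democrats have controlled the Congress for the last twenty-two years"),
    ("the Democrats", "have controlled",
     "the Congress for the last twenty-two years"),
    ("they", "wrote", "all the tax bills"),
]
kept = [t for t in triples if keep_triple(*t)]
# Only the second triple should survive: both "the Democrats" and
# "the Congress" name entities, while "I" and "they" do not.
        </preformat>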
        <p>Another way of improving this approach would be to use post-hoc explanation methods such as
integrated gradients [23], to identify which embeddings make the highest contribution to the prediction.
This could help identify the triples most relevant to the prediction, giving interpretability to the proposed
addition, and further input for a fact-checking system.</p>
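        <p>With the Captum library, for instance, integrated gradients could be applied to the sketch of Section 3.1 roughly as follows (the zero baselines and the per-triple aggregation are assumptions, and lm_embedding and triple_tensor are the tensors from the earlier sketches):</p>
        <preformat>
import torch
from captum.attr import IntegratedGradients

model = HybridClassifier().eval()
ig = IntegratedGradients(model)

# Attribute the predicted probability to both inputs, using zero baselines.
lm_attr, triple_attr = ig.attribute(
    (lm_embedding, triple_tensor),
    baselines=(torch.zeros_like(lm_embedding), torch.zeros_like(triple_tensor)),
    target=0,   # the single output unit
)

# Total attribution magnitude per triple ranks the triples by contribution.
per_triple = triple_attr.abs().sum(dim=(2, 3))   # shape (1, n_triples)
        </preformat>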
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Future work and conclusion</title>
      <p>The HYBRINFOX team is interested in neurosymbolic architectures, and our general aim is to improve
the performance of Language Models by adding structured information from the texts. This approach
remains to be adapted to misinformation detection or fact-checking settings. In general, we believe that all tasks
related to factual claims could benefit from adding structured information to their pipeline in
order to increase performance.</p>
      <p>The proposed approach uses Language Models such as BERT (for OpenIE) or RoBERTa, but could be
upgraded using the most recent advances in Large Language Models such as Mistral or ChatGPT. An
LLM prompted with instructions could easily perform a similar pipeline (see the sketch after this list):
1. Extract information triples from the text.
2. Select factual triples.
3. Identify if the factual triples are check-worthy.</p>
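      <p>A hypothetical prompt implementing these three steps (call_llm stands in for any chat-completion API; the wording of the instructions is only illustrative):</p>
      <preformat>
PROMPT = """You are assisting a fact-checking team.
Given the sentence below:
1. Extract information triples of the form (subject; predicate; object).
2. Keep only the triples that state verifiable facts.
3. Say whether any remaining triple is check-worthy, and explain why.

Sentence: "{sentence}"
"""

def estimate_check_worthiness(sentence: str, call_llm) -> str:
    """Send the filled-in prompt to an LLM and return its raw answer."""
    return call_llm(PROMPT.format(sentence=sentence))
      </preformat>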
      <p>This approach could help identify which part of the text contains check-worthy information with
better accuracy.</p>
      <p>To conclude, the proposed approach, which enriches Language Models with a level of structured
information, has shown promising results compared to the use of Language Models alone on the task of
check-worthiness estimation. For check-worthiness, the extraction of factual triples from the text helps
classification. However, performance was mixed on non-English texts.</p>
      <p>Further analyses need to be conducted with other expert systems to further improve performance.
As mentioned in the introduction, the definition of what counts as check-worthy is complex. One
approach, which we have not tried, might be to consider as check-worthy first and foremost sentences
making objective claims. For that purpose, we may piggyback on the methods used in Task 2 of the
CheckThat! Lab [24] dealing with the classification of subjective vs objective sentences. We leave that
exploration for further work.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>We thank two anonymous referees for helpful comments and suggestions on the first version of this
report. This work was supported by the programs HYBRINFOX (ANR-21-ASIA-0003), FRONTCOG
(ANR-17-EURE-0017), and THEMIS (n°DOS0222794/00 and n° DOS0222795/00). PE thanks Monash
University for hosting him during the writing of this paper, in the context of the program PLEXUS (Marie
Skłodowska-Curie Action, Horizon Europe Research and Innovation Programme, grant n°101086295).</p>
    </sec>
    <sec id="sec-7">
      <title>References</title>
      <p>[6] F. Alam, S. Shaar, F. Dalvi, H. Sajjad, A. Nikolov, H. Mubarak, G. Da San Martino, A. Abdelali,
N. Durrani, K. Darwish, A. Al-Homaid, W. Zaghouani, T. Caselli, G. Danoe, F. Stolk, B. Bruntink,
P. Nakov, Fighting the COVID-19 infodemic: Modeling the perspective of journalists, fact-checkers,
social media platforms, policy makers, and the society, in: M.-F. Moens, X. Huang, L. Specia, S. W.-t.
Yih (Eds.), Findings of the Association for Computational Linguistics: EMNLP 2021, Association
for Computational Linguistics, Punta Cana, Dominican Republic, 2021, pp. 611–649. URL:
https://aclanthology.org/2021.findings-emnlp.56. doi:10.18653/v1/2021.findings-emnlp.56.
[7] F. Arslan, N. Hassan, C. Li, M. Tremayne, A benchmark dataset of check-worthy factual claims, in:
International Conference on Web and Social Media, 2020. URL: https://api.semanticscholar.org/
CorpusID:216870066.
[8] P. Nakov, A. Barrón-Cedeño, T. Elsayed, R. Suwaileh, L. Màrquez, W. Zaghouani, P. Atanasova,
S. Kyuchukov, G. Da San Martino, Overview of the CLEF-2018 CheckThat! lab on automatic
identification and verification of political claims, in: P. Bellot, C. Trabelsi, J. Mothe, F. Murtagh, J. Y.
Nie, L. Soulier, E. SanJuan, L. Cappellato, N. Ferro (Eds.), Experimental IR Meets Multilinguality,
Multimodality, and Interaction, Springer International Publishing, Cham, 2018, pp. 372–387.
[9] F. Rayar, M. Delalandre, V.-H. Le, A large-scale TV video and metadata database for french political
content analysis and fact-checking, in: Proceedings of the 19th International Conference on
Content-Based Multimedia Indexing, CBMI ’22, Association for Computing Machinery, New York,
NY, USA, 2022, p. 181–185. URL: https://doi.org/10.1145/3549555.3549557. doi:10.1145/3549555.
3549557.
[10] B. D. Horne, M. G. Gruppi, NELA-PS: A dataset of pink slime news articles for the study of local
news ecosystems, ArXiv abs/2403.13657 (2024). URL: https://api.semanticscholar.org/CorpusID:
268537274.
[11] I. Jaradat, P. Gencheva, A. Barrón-Cedeño, L. Màrquez, P. Nakov, ClaimRank: Detecting check-worthy
claims in Arabic and English, in: Y. Liu, T. Paek, M. Patwardhan (Eds.), Proceedings of the
2018 Conference of the North American Chapter of the Association for Computational Linguistics:
Demonstrations, Association for Computational Linguistics, New Orleans, Louisiana, 2018, pp.
26–30. URL: https://aclanthology.org/N18-5006. doi:10.18653/v1/N18-5006.
[12] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of Deep Bidirectional
Transformers for Language Understanding, arXiv:1810.04805 [cs] (2018).
[13] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov,
Roberta: A robustly optimized BERT pretraining approach, CoRR abs/1907.11692 (2019). URL:
http://arxiv.org/abs/1907.11692. arXiv:1907.11692.
[14] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. Salakhutdinov, Q. V. Le, XLNet: generalized autoregressive
pretraining for language understanding, Curran Associates Inc., Red Hook, NY, USA, 2019.
[15] F. Alam, A. Barrón-Cedeño, G. S. Cheema, G. K. Shahi, S. Hakimov, M. Hasanain, C. Li, R. Míguez,
H. Mubarak, W. Zaghouani, P. Nakov, Overview of the CLEF-2023 CheckThat! lab Task 1 on
check-worthiness in multimodal and multigenre content, in: Conference and Labs of the Evaluation
Forum, 2023. URL: https://api.semanticscholar.org/CorpusID:264441760.
[16] P. Liu, W. Yuan, J. Fu, Z. Jiang, H. Hayashi, G. Neubig, Pre-train, prompt, and predict: A systematic
survey of prompting methods in natural language processing, ACM Comput. Surv. 55 (2023). URL:
https://doi.org/10.1145/3560815. doi:10.1145/3560815.
[17] Q. Dong, L. Li, D. Dai, C. Zheng, Z. Wu, B. Chang, X. Sun, J. Xu, L. Li, Z. Sui, A survey for in-context
learning, ArXiv abs/2301.00234 (2023). URL: https://api.semanticscholar.org/CorpusID:263886074.
[18] L. Yufeng, P. Rrubaa, Z. Arkaitz, FactFinders at CheckThat! 2024: Refining check-worthy statement
detection with LLMs through data pruning, in: Working Notes of CLEF 2024 - Conference and
Labs of the Evaluation Forum, CLEF ’2024, Grenoble, France, 2024.
[19] B. Mehmet Eren, K. Kaan Efe, K. Mucahid, TurQUaz at CheckThat! 2024: A hybrid approach of
fine-tuning and in-context learning for check-worthiness estimation, in: Working Notes of CLEF
2024 - Conference and Labs of the Evaluation Forum, CLEF ’2024, Grenoble, France, 2024.
[20] K. Kolluru, V. Adlakha, S. Aggarwal, p. Mausam, S. Chakrabarti, OpenIE6: Iterative Grid Labeling
and Coordination Analysis for Open Information Extraction, in: The 58th Annual Meeting of the
Association for Computational Linguistics (ACL), Seattle, U.S.A, 2020.
[21] P. Bojanowski, E. Grave, A. Joulin, T. Mikolov, Enriching word vectors with subword information,
arXiv preprint arXiv:1607.04606 (2016).
[22] Y. Ro, Y. Lee, P. Kang, Multi²OIE: Multilingual open information extraction based on multi-head
attention with BERT, in: T. Cohn, Y. He, Y. Liu (Eds.), Findings of the Association for
Computational Linguistics: EMNLP 2020, Association for Computational Linguistics, Online, 2020,
pp. 1107–1117. URL: https://aclanthology.org/2020.findings-emnlp.99. doi:10.18653/v1/2020.findings-emnlp.99.
[23] I. Čík, A. D. Rasamoelina, M. Mach, P. Sinčák, Explaining deep neural network using layer-wise
relevance propagation and integrated gradients, in: 2021 IEEE 19th World Symposium on Applied
Machine Intelligence and Informatics (SAMI), 2021, pp. 000381–000386. doi:10.1109/SAMI50585.
2021.9378686.
[24] J. M. Struß, F. Ruggeri, A. Barrón-Cedeño, F. Alam, D. Dimitrov, A. Galassi, G. Pachov, I. Koychev,
P. Nakov, M. Siegel, M. Wiegand, M. Hasanain, R. Suwaileh, W. Zaghouani, Overview of the
CLEF-2024 CheckThat! lab task 2 on subjectivity in news articles, 2024.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Struß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Elsayed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Przybyła</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Caselli</surname>
          </string-name>
          , G. Da San Martino,
          <string-name>
            <given-names>F.</given-names>
            <surname>Haouari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Piskorski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ruggeri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Suwaileh</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF-2024 CheckThat! Lab: Check-worthiness, subjectivity, persuasion, roles, authorities and adversarial robustness</article-title>
          , in: L.
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Mulhem</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Quénot</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Schwab</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Soulier</surname>
            ,
            <given-names>G. M.</given-names>
          </string-name>
          <string-name>
            <surname>Di Nunzio</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Galuščáková</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>García Seco de Herrera</surname>
          </string-name>
          , G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF</source>
          <year>2024</year>
          ),
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Tchechmedjiev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fafalios</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Boland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gasquet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zloch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zapilko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Todorov</surname>
          </string-name>
          ,
          <article-title>Claimskg: A knowledge graph of fact-checked claims</article-title>
          , in: C.
          <string-name>
            <surname>Ghidini</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          <string-name>
            <surname>Hartig</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Maleshkova</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Svátek</surname>
            ,
            <given-names>I. Cruz</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lefrançois</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gandon</surname>
          </string-name>
          (Eds.),
          <source>The Semantic Web - ISWC 2019</source>
          , Springer International Publishing, Cham,
          <year>2019</year>
          , pp.
          <fpage>309</fpage>
          -
          <lpage>324</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kim</surname>
          </string-name>
          , K.-s. Choi,
          <article-title>Unsupervised fact checking by counter-weighted positive and negative evidential paths in a knowledge graph</article-title>
          , in: D.
          <string-name>
            <surname>Scott</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Bel</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          Zong (Eds.),
          <source>Proceedings of the 28th International Conference on Computational Linguistics</source>
          ,
          <source>International Committee on Computational Linguistics</source>
          , Barcelona,
          <source>Spain (Online)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1677</fpage>
          -
          <lpage>1686</lpage>
          . URL: https://aclanthology.org/2020.coling-main.147. doi:10.18653/v1/2020.coling-main.147.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Thorne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vlachos</surname>
          </string-name>
          , Automated fact checking:
          <article-title>Task formulations, methods and future directions</article-title>
          , in: E. M.
          <string-name>
            <surname>Bender</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Derczynski</surname>
          </string-name>
          , P. Isabelle (Eds.),
          <source>Proceedings of the 27th International Conference on Computational Linguistics</source>
          , Association for Computational Linguistics, Santa Fe, New Mexico, USA,
          <year>2018</year>
          , pp.
          <fpage>3346</fpage>
          -
          <lpage>3359</lpage>
          . URL: https://aclanthology.org/C18-1283.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Hassan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tremayne</surname>
          </string-name>
          ,
          <article-title>Detecting check-worthy factual claims in presidential debates</article-title>
          ,
          <source>in: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management</source>
          , CIKM '15, Association for Computing Machinery, New York, NY, USA,
          <year>2015</year>
          , p.
          <fpage>1835</fpage>
          -
          <lpage>1838</lpage>
          . URL: https://doi.org/10.1145/2806416.2806652. doi:10.1145/2806416.2806652.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>