<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>CIVIC-UPM at CheckThat! 2021: Integration of Transformers in Misinformation Detection and Topic Classification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Álvaro Huertas-García</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Javier Huertas-Tato</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alejandro Martín</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Camacho</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Sciences, Universidad Rey Juan Carlos</institution>
          ,
          <addr-line>Calle Tulipán, 28933, Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer System Engineering, Universidad Politécnica de Madrid</institution>
          ,
          <addr-line>Calle de Alan Turing, 28031, Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <abstract>
        <p>The growth of Online Social Networks (OSNs) enables and amplifies the quick spread of harmful, manipulative and false information that influences public opinion and sows conflict on social and political issues. Therefore, the development of tools to detect malicious actors and to identify low-credibility information and misinformation sources is a crucial new challenge in the ever-evolving field of Artificial Intelligence. The scope of this paper is to present a Natural Language Processing (NLP) approach that uses Doc2Vec and different state-of-the-art transformer-based models for the CLEF2021 Checkthat! lab Task 3. Through this approach, the results show that it is possible to achieve a 41.43% macro-average F1-score in misinformation detection (Task A) and a 67.65% macro-average F1-score in topic classification (Task B).</p>
      </abstract>
      <kwd-group>
        <kwd>Misinformation</kwd>
        <kwd>Social Media</kwd>
        <kwd>Topic Modeling</kwd>
        <kwd>Fact-checking</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Misleading information spreads on the Internet at an incredible speed and Online Social
Networks (OSNs) amplify the quick spread of harmful, manipulative and false information. This
phenomenon undermines the integrity of online conversations, influences public opinion, and
originates conflicts on social, political, or health issues [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In particular, since COVID-19
emerged in Wuhan, China, in December 2019, the public has been bombarded with vast
quantities of information, much of it unverified, leading the World Health Organization
(WHO) to coin the term infodemic for this situation [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. Therefore, the development of tools
devoted to detecting malicious actors (e.g., bots and trolls) and identifying low-credibility
information and misinformation sources is a crucial new challenge. Throughout this paper, we
use the term misinformation instead of fake news, following the recommendations of the
Poynter Institute1 and the Council of Europe, which consider the latter inadequate to describe the
complexity of the information disorder ecosystem [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        The scope of this paper is to describe a Natural Language Processing (NLP) approach that
makes use of Machine Learning (ML) and Deep Learning (DL) techniques for the CLEF2021
Checkthat! lab Task 3 [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ]. In this competition, we carry out a comparative study between the
classical Doc2Vec algorithm [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] as document feature extractor combined with ML classifiers,
and fine-tuned state-of-the-art models based on Transformers such as T5 [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], RoBERTa [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ],
Electra [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and Longformer [25].
      </p>
      <p>
        This paper is organized into the following sections: Section 2 provides a general view of
some related works on misinformation detection and the description of the Checkthat! lab
task [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Section 3 introduces our proposed approach. Section 4 describes the results from the
experiments conducted. Finally, the conclusions are covered in Section 5.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Task Description and Related Work</title>
      <p>
        In recent years, there has been growing interest in detecting misinformation [
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10, 11, 12</xref>
        ]. Since
2017, Checkthat! organizers have proposed different misinformation detection tasks, such as
automatic identification and verification of claims, check-worthiness, or evidence retrieval [
        <xref ref-type="bibr" rid="ref13 ref14">13,
14</xref>
        ]. In addition, other authors [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] have committed to combating the misinformation generated
during the COVID-19 pandemic by collecting data since the pandemic’s outbreak to explore the
impact of fact-checkers on misinformation.
      </p>
      <p>
        The task addressed in this paper, the misinformation detection task of the Checkthat! lab at CLEF
2021 [
        <xref ref-type="bibr" rid="ref15 ref16 ref4">4, 15, 16</xref>
        ], is divided into two subtasks: Task A and Task B. Task A is designed to classify a
set of news articles into four classes (false, partially false, true, other) [17]. On the other hand, Task B
consists of classifying a subset of the news from Task A into six topical categories: health, economy,
crime, climate, elections, and education [18]. In both subtasks, the text data is divided
into the title and the body of the news, and both are multi-class classification problems with
imbalanced data (see Table 1). Therefore, the official evaluation metric is the macro-averaged
F1-score. The steps used by the organizers for the data collection were defined in the research
presenting the AMUSED framework [19]. It is important to point out that some inconsistencies
were found during the data exploration. For example, in Task A, some news titles and
bodies seemed to be unrelated. Moreover, in Task B, the title and body fields appeared to be
swapped, as the length of the title exceeded the length of the body.
      </p>
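      <p>As a concrete reference for the metric, the macro-averaged F1-score gives every class equal weight regardless of its frequency; a minimal plain-Python sketch (illustrative only; the organizers' official scorer may handle edge cases differently):</p>
      <p>```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: compute F1 per class, then average with equal
    weight, so rare classes count as much as frequent ones."""
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 is 0 for a class that is never predicted correctly.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

y_true = ["false", "false", "true", "partially false", "other"]
y_pred = ["false", "true", "true", "partially false", "false"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.5417
```</p>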
      <p>
        Regarding previous related work in the literature, the appearance of the attention
mechanism in 2017 [20] paved the way for the development of transformer architectures such as
Bidirectional Encoder Representations from Transformers (BERT) [21]. Jwa et al. [22] were
among the first to develop a BERT-based model for detecting misinformation. The authors
conclude that fine-tuning the model on the specific task leads to better results than traditional
approaches, such as using a simple classifier based on TF-IDF and cosine similarity to
classify news [23]. Nevertheless, the literature also contains examples of classical
techniques such as Doc2Vec [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] to deal with long text documents in tasks related to the fight
against misinformation [24].
      </p>
      <p>Unlike Doc2Vec, one of the main limitations of transformer-based models on Natural Language
Processing (NLP) tasks is the text length. In Task A, the average text length is 4,167 words in the
body and 286 words in the title, with maximums of 32,767 and 9,960 words, respectively. In Task B,
the averages are 4,980 and 566 words for body and title, and the maximums are 32,767 and 16,524
words, respectively. Long sequences of text are disproportionately expensive for transformers
because self-attention scales quadratically with the sequence length [21]. For this reason, a new
method, the Longformer, has recently been proposed. The authors of Longformer [25] developed
a model with an attention mechanism that scales linearly with sequence length by replacing
the full self-attention mechanism with a combination of local windowed attention and global
attention, taking longer-range interactions into account without increasing the computation and
making it feasible to process documents of thousands of tokens. Furthermore, recent research [26]
includes the Longformer in a framework for jointly predicting rumor stance and veracity on the
dataset released at SemEval 2019 RumourEval [27].</p>
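      <p>The quadratic-versus-linear contrast can be made concrete by counting attention score computations; the sketch below compares full self-attention with a Longformer-style scheme (the window size and number of global tokens are illustrative placeholders, not the actual model configuration):</p>
      <p>```python
def full_attention_ops(seq_len):
    # Full self-attention: every token attends to every token, O(n^2).
    return seq_len * seq_len

def windowed_attention_ops(seq_len, window=512, num_global=2):
    # Longformer-style attention: each token attends to a fixed local
    # window, and a few designated global tokens attend to (and are
    # attended by) all positions, O(n * w) overall.
    return seq_len * window + 2 * num_global * seq_len

for n in (4096, 32768):
    ratio = full_attention_ops(n) / windowed_attention_ops(n)
    print(f"n={n}: full attention costs {ratio:.1f}x the windowed variant")
```</p>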
    </sec>
    <sec id="sec-3">
      <title>3. Proposed Approaches and Methodology</title>
      <p>This section describes the proposed approaches for Tasks A and B of the Checkthat! lab CLEF2021.
As described in the previous section, the training data for both subtasks contains two text
fields, the title and the body of the news. To obtain the best results and avoid overfitting, we reserved
20% of the training data, split in a stratified way, as a development set. Table 2 summarizes the
hyperparameters tuned for both tasks using their respective development sets. It is essential to
highlight that, for each subtask, we explore using only titles, only body texts, or both title and
body texts as input.</p>
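      <p>The stratified 20% development split described above can be sketched as follows (plain Python over (text, label) pairs; in practice a library routine such as scikit-learn's train_test_split with the stratify option serves the same purpose):</p>
      <p>```python
import random
from collections import defaultdict

def stratified_split(examples, dev_fraction=0.2, seed=42):
    """Reserve dev_fraction of the data per class, so the development
    set keeps the same class proportions as the training set."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    rng = random.Random(seed)
    train, dev = [], []
    for label, items in by_label.items():
        rng.shuffle(items)
        n_dev = max(1, round(dev_fraction * len(items)))
        dev.extend(items[:n_dev])
        train.extend(items[n_dev:])
    return train, dev

# 80 "false" and 20 "true" examples keep a 4:1 ratio in both splits.
data = [(f"doc{i}", "false") for i in range(80)] + [(f"doc{i}", "true") for i in range(20)]
train, dev = stratified_split(data)
print(len(train), len(dev))  # 80 20
```</p>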
      <p>Two remarkable hyperparameters for the transformer-based model approaches are the sliding
window and oversampling. As previously mentioned, transformer models typically have a
restriction on the maximum length allowed for a sequence. A plausible strategy to overcome
this limitation is the sliding window approach introduced by Wang et al. [28]. Here, any
sequence exceeding the maximum length is split into several windows (sub-sequences), and
each one is assigned the label of the original sequence. We explored the use of this technique,
and to minimize the information loss that hard cutoffs between two windows may cause, we
applied a 20% overlap between the sub-sequences. Finally, we explored over-sampling the
unbalanced data so that all classes had the same frequency as the most abundant class, using the
RandomOverSampler from the imblearn2 package.
3.1. Task A</p>
      <p>
        To carry out Task A, two approaches are tested. The first is based on the use of the
classical Doc2Vec algorithm [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] as document feature extractor combined with Machine Learning
(ML) classifiers. The second approach takes advantage of different state-of-the-art transformer-based
models [20, 21] to extract dense embeddings, with a linear layer on top to classify the
documents into four categories.
      </p>
      <sec id="sec-3-1">
        <title>3.1.1. Doc2Vec approach</title>
        <p>
          Doc2Vec represents documents as dense vectors named document or paragraph embeddings.
This algorithm extends the idea of Word2Vec [29, 30], adding a new paragraph representation
that is trained along with the word embeddings to develop document-level embeddings, so that
documents of differing lengths can be represented by fixed-length vectors [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. These dense document vectors
can be obtained by concatenating the paragraph vector with the word vectors to predict a target
        </p>
        <sec id="sec-3-1-1">
          <title>2https://github.com/scikit-learn-contrib/imbalanced-learn</title>
          <p>word, or by predicting sample words from the paragraph using the paragraph vector. These two
implementations of Doc2Vec are named PV-DM and PV-DBOW, respectively. The Doc2Vec
models are obtained from the Gensim library [31]. We explore the use of PV-DM, PV-DBOW, and
the combination of both models as feature extractors for this classification task.</p>
          <p>The classifiers tested were Naive Bayes (NB), Random Forest (RF), Logistic Regression with
L1 and L2 regularization (LR1 and LR2, respectively), Elastic Net, and Support Vector Classifier
(SVC).</p>
          <p>The data processing for this approach consists of different steps. The ftfy package [32]
is used to repair Unicode and emoji errors, and the ekphrasis package [33] for lower-casing and
normalizing percentages, times, dates, emails, phone numbers and numbers. Abbreviations are
expanded using the contractions package3, and word tokenization, stop-word removal, punctuation
removal, and word lemmatization are carried out using the NLTK toolkit [34].</p>
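          <p>A simplified stand-in for this pipeline, using only the standard library in place of ftfy, ekphrasis, contractions and NLTK (the stop-word list is a tiny illustrative subset, and lemmatization is omitted for brevity):</p>
          <p>```python
import re
import string

# A small illustrative stop-word list; NLTK ships a much fuller one.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in"}

def preprocess(text):
    """Simplified sketch of the paper's pipeline: lower-case, normalize
    percentages and numbers, strip punctuation, drop stop words."""
    text = text.lower()
    text = re.sub(r"\d+(\.\d+)?%", " percent ", text)  # normalize percentages
    text = re.sub(r"\d+", " number ", text)            # normalize numbers
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [t for t in text.split() if t not in STOP_WORDS]

print(preprocess("The economy shrank 3.5% in 2020."))
# ['economy', 'shrank', 'percent', 'number']
```</p>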
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.1.2. Transformers approach</title>
        <p>
          In this approach, we use different transformer-based models to classify the Task A news.
The models tested were T5 small and T5 base [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], Longformer base [25], RoBERTa base [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]
and DistilRoBERTa base [
          <xref ref-type="bibr" rid="ref8">8, 35</xref>
          ]. The data processing procedure for this approach consists of
repairing Unicode and emoji errors with the ftfy package [32] and normalizing emails, phone
numbers and URLs with the ekphrasis package [33].
        </p>
        <p>
          Finally, the model with the best performance on the development set is selected to boost its
performance by incorporating more data from related tasks: Kaggle’s KDD20204 and Clickbait
news detection5 competitions. The KDD2020 competition consists of distinguishing fake claims
from authentic ones. On the other hand, Clickbait detection focuses on classifying articles
into news, clickbait, and other.
3.2. Task B
The proposed approach for Task B is based on transformer-based models. The models tested
were: Electra base [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], T5 base [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], RoBERTa base [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] and DistilRoBERTa base [
          <xref ref-type="bibr" rid="ref8">8, 35</xref>
          ]. As for
the transformer-based model approach in Task A, the data processing procedure consists of
repairing Unicode and emoji errors with the ftfy package [32] and normalizing emails, phone
numbers and URLs with the ekphrasis package [33].
        </p>
        <p>In addition, multi-task training was explored in the case of the T5 base model. The model was
trained on Task B and Kaggle’s Ag News task6. Ag News is a topic classification competition
with 120k news articles grouped into 4 categories: World, Sports, Business, and Sci-Tech.</p>
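        <p>Multi-task training in the T5 style casts every task as text-to-text and mixes examples tagged with a task prefix; the sketch below shows only the data mixing step (the prefix strings are hypothetical, not the ones actually used):</p>
        <p>```python
import random

def build_multitask_mix(task_b_examples, ag_news_examples, seed=0):
    """Prefix each example with its task name and shuffle the union,
    so a single text-to-text model trains on both tasks at once."""
    mixed = []
    for text, label in task_b_examples:
        mixed.append(("checkthat topic: " + text, label))
    for text, label in ag_news_examples:
        mixed.append(("ag news topic: " + text, label))
    random.Random(seed).shuffle(mixed)
    return mixed

task_b = [("Vaccine shipments delayed", "health")]
ag_news = [("Stocks rally on earnings", "Business")]
for source, target in build_multitask_mix(task_b, ag_news):
    print(source, "->", target)
```</p>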
        <sec id="sec-3-2-1">
          <title>3https://github.com/kootenpv/contractions 4https://www.kaggle.com/c/fakenewskdd2020/overview 5https://www.kaggle.com/c/clickbait-news-detection 6https://www.kaggle.com/amananandrai/ag-news-classification-dataset</title>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments and Results</title>
      <p>4.1. Task A
Table 3 reports the performance of the Doc2Vec models evaluated on the development set. The best
macro F1-score (29.23%) is achieved using the title field as input data and combining features
from the PV-DM and PV-DBOW models with a Logistic Regression classifier with L2 regularization.
Remarkably, this same approach worsens when the input data includes the body text field:
24.96% F1-score only with body text and 25.93% F1-score with title and body texts.</p>
      <p>Regarding the transformer-based model approach, Table 4 details the performance of the
models, the training data, the type of data input, and whether the oversampling and sliding
window techniques are used during training.</p>
      <p>As expected, our experiments show that state-of-the-art transformer-based models
outperform the classical Doc2Vec algorithms. The best performance, 50.96% macro-averaged F1-score,
is achieved with DistilRoBERTa base, a distilled version of RoBERTa base, using the body field
from Checkthat! data as data input with oversampling and sliding window for dealing with
long texts. The hyperparameters selected for this model were a polynomial decay scheduler
with warmup, one step of gradient accumulation, a weight decay of 0.04731, and a learning rate
equal to 9.468e-5. Significantly, the performance of the model obtained with the same
hyperparameters but without oversampling and without the sliding window was 39.61% macro-averaged
F1-score. Remarkably, the introduction of new related data from the KDD2020 and Clickbait news
detection competitions did not improve the performance on the Checkthat! lab Task A. Moreover,
the less related Clickbait task had the more noticeable negative impact, suggesting that less
related tasks harm performance more.</p>
      <p>The official test results for the best model on the development set, DistilRoBERTa base, are
shown in Table 6.
4.2. Task B</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>
        In this work, we have proposed an NLP approach for the misinformation detection task (Task A)
and the topic classification task (Task B) from the CLEF2021 Checkthat! lab Task 3 [
        <xref ref-type="bibr" rid="ref15 ref4">4, 15</xref>
        ]. Our work has
led us to conclude that transformer-based models fine-tuned explicitly for the tasks
achieve the best performance. In Task A, the results indicate that the transformer-based models
outperform the classical Doc2Vec model. Oversampling proves to be a valuable technique to deal
with unbalanced data in both tasks. However, the sliding window technique to overcome the
transformers’ maximum length limitation shows different effects in Task A and Task B. Finally,
we achieved a macro-average F1-score of 41.43% in Task A and 67.65% in Task B. In future work,
we will test new architectures, such as Hierarchical Attention Networks, and add
more related data to boost the transformer-based model performance.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>This work has been partially supported by the following grants and funding agencies: Spanish
Ministry of Science and Innovation under TIN2017-85727-C4-3-P (DeepBio) grant, by Comunidad
Autónoma de Madrid under S2018/TCS-4566 grant (CYNAMON), and by BBVA FOUNDATION
GRANTS FOR SCIENTIFIC RESEARCH TEAMS SARS-CoV-2 and COVID-19 under the grant:
"CIVIC: Intelligent characterisation of the veracity of the information related to COVID-19".
Relevant parts of this research are a result of the project IBERIFIER - Iberian Digital Media Research
and Fact-Checking Hub, funded by the European Commission under the call CEF-TC-2020-2
(European Digital Media Observatory), grant number 2020-EU-IA-0252. Finally, the work has
been supported by the Comunidad Autónoma de Madrid under Convenio Plurianual with
the Universidad Politécnica de Madrid in the actuation line of "Programa de Excelencia para el
Profesorado Universitario".</p>
      <p>[17] G. K. Shahi, A. Dirkson, T. A. Majchrzak, An exploratory study of covid-19 misinformation
on twitter, Online Social Networks and Media 22 (2021) 100104.
[18] G. K. Shahi, A multilingual domain identification using fact-checked articles: A case study
on covid-19 misinformation, arXiv preprint (2021).
[19] G. K. Shahi, Amused: An annotation framework of multi-modal social media data, arXiv
preprint arXiv:2010.00502 (2020).
[20] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I.
Polosukhin, Attention is all you need, 2017. arXiv:1706.03762.
[21] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional
transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[22] H. Jwa, D. Oh, K. Park, J. M. Kang, H. Lim, exbake: Automatic fake news detection model
based on bidirectional encoder representations from transformers (bert), Applied Sciences
9 (2019). doi:10.3390/app9194062.
[23] B. Riedel, I. Augenstein, G. P. Spithourakis, S. Riedel, A simple but tough-to-beat baseline
for the fake news challenge stance detection task, 2018. arXiv:1707.03264.
[24] B. Anjali, R. Reshma, V. Geetha Lekshmy, Detection of counterfeit news using
machine learning, in: 2019 2nd International Conference on Intelligent Computing,
Instrumentation and Control Technologies (ICICICT), volume 1, 2019, pp. 1382–1386.
doi:10.1109/ICICICT46008.2019.8993330.
[25] I. Beltagy, M. E. Peters, A. Cohan, Longformer: The long-document transformer, 2020.</p>
      <p>arXiv:2004.05150.
[26] A. Khandelwal, Fine-tune longformer for jointly predicting rumor stance and veracity
(2020).
[27] G. Gorrell, K. Bontcheva, L. Derczynski, E. Kochkina, M. Liakata, A. Zubiaga, Rumoureval
2019: Determining rumour veracity and support for rumours, 2018. arXiv:1809.06683.
[28] Z. Wang, P. Ng, X. Ma, R. Nallapati, B. Xiang, Multi-passage bert: A globally normalized
bert model for open-domain question answering, 2019. arXiv:1908.08167.
[29] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, J. Dean, Distributed representations of words
and phrases and their compositionality, 2013. arXiv:1310.4546.
[30] T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient estimation of word representations in
vector space, 2013. arXiv:1301.3781.
[31] R. Řehůřek, P. Sojka, Software Framework for Topic Modelling with Large Corpora, in:
Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, ELRA,
Valletta, Malta, 2010, pp. 45–50.
[32] R. Speer, ftfy, Zenodo, 2019. doi:10.5281/zenodo.2591652, version 5.5.
[33] C. Baziotis, N. Pelekis, C. Doulkeridis, Datastories at semeval-2017 task 4: Deep lstm
with attention for message-level and topic-based sentiment analysis, in: Proceedings of
the 11th International Workshop on Semantic Evaluation (SemEval-2017), Association for
Computational Linguistics, Vancouver, Canada, 2017, pp. 747–754.
[34] E. Loper, S. Bird, Nltk: The natural language toolkit, in: Proceedings of the ACL-02
Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing
and Computational Linguistics - Volume 1, ETMTNLP ’02, Association for Computational
Linguistics, USA, 2002, pp. 63–70. URL: https://doi.org/10.3115/1118108.1118117.
doi:10.3115/1118108.1118117.
[35] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf,
M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. L. Scao,
S. Gugger, M. Drame, Q. Lhoest, A. M. Rush, Huggingface’s transformers: State-of-the-art
natural language processing, 2020. arXiv:1910.03771.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Naeem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bhatti</surname>
          </string-name>
          , The Covid-19 'infodemic':
          <article-title>a new front for information professionals</article-title>
          ,
          <source>Health Information &amp; Libraries Journal</source>
          <volume>37</volume>
          (
          <year>2020</year>
          )
          <fpage>233</fpage>
          -
          <lpage>239</lpage>
          . doi:10.1111/hir.12311.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Cinelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Quattrociocchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Galeazzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Valensise</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Brugnoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Zola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zollo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Scala</surname>
          </string-name>
          ,
          <source>The COVID-19 social media infodemic</source>
          ,
          <source>Scientific Reports</source>
          <volume>10</volume>
          (
          <year>2020</year>
          )
          16598
          . doi:10.1038/s41598-020-73510-5.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Estrada-Cuzcano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Alfaro-Mendives</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Saavedra-Vásquez</surname>
          </string-name>
          ,
          <article-title>Disinformation y misinformation, posverdad y fake news: precisiones conceptuales, diferencias</article-title>
          , similitudes y yuxtaposiciones, Información, cultura y sociedad (
          <year>2020</year>
          )
          <fpage>93</fpage>
          -
          <lpage>106</lpage>
          . doi:10.34096/ics.i42.7427.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          , G. Da San Martino, T. Elsayed,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Míguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Haouari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hasanain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Babulkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Shahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Struß</surname>
          </string-name>
          , T. Mandl, The CLEF-2021
          <article-title>CheckThat! lab on detecting check-worthy claims, previously fact-checked claims, and fake news</article-title>
          ,
          <source>in: Proceedings of the 43rd European Conference on Information Retrieval</source>
          , ECIR '21, Lucca, Italy,
          <year>2021</year>
          , pp.
          <fpage>639</fpage>
          -
          <lpage>649</lpage>
          . URL: https://link.springer.com/chapter/10.1007/978-3-030-72240-1_75.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Shahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Struß</surname>
          </string-name>
          , T. Mandl,
          <article-title>Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection</article-title>
          , in: Working Notes of CLEF 2021-
          <article-title>Conference and Labs of the Evaluation Forum</article-title>
          , CLEF '
          <year>2021</year>
          , Bucharest, Romania (online),
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          , T. Mikolov,
          <article-title>Distributed representations of sentences and documents</article-title>
          ,
          <year>2014</year>
          . arXiv:1405.4053.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Raffel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Matena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Exploring the limits of transfer learning with a unified text-to-text transformer</article-title>
          ,
          <year>2020</year>
          . arXiv:1910.10683.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>RoBERTa: A robustly optimized BERT pretraining approach</article-title>
          ,
          <year>2019</year>
          . arXiv:1907.11692.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-T.</given-names>
            <surname>Luong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <article-title>ELECTRA: Pre-training text encoders as discriminators rather than generators</article-title>
          ,
          <year>2020</year>
          . arXiv:2003.10555.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>H.</given-names>
            <surname>Jwa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Oh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lim</surname>
          </string-name>
          ,
          <article-title>exBAKE: Automatic fake news detection model based on bidirectional encoder representations from transformers (BERT)</article-title>
          ,
          <source>Applied Sciences 9</source>
          (
          <year>2019</year>
          )
          <fpage>4062</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R. K.</given-names>
            <surname>Kaliyar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Goswami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <article-title>FakeBERT: Fake news detection in social media with a BERT-based deep learning approach</article-title>
          ,
          <source>Multimedia Tools and Applications 80</source>
          (
          <year>2021</year>
          )
          <fpage>11765</fpage>
          -
          <lpage>11788</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>G.</given-names>
            <surname>Burel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Farrell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mensio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Khare</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Alani</surname>
          </string-name>
          ,
          <article-title>Co-spread of misinformation and fact-checking content during the COVID-19 pandemic</article-title>
          , in:
          <source>Proceedings of the 12th International Social Informatics Conference (SocInfo)</source>
          , LNCS,
          <year>2020</year>
          . URL: http://oro.open.ac.uk/71786/.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>T.</given-names>
            <surname>Elsayed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hasanain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Suwaileh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Da San Martino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Atanasova</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF-2019 CheckThat! lab: Automatic identification and verification of claims</article-title>
          , in:
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction</source>
          , volume
          <volume>11696</volume>
          of Lecture Notes in Computer Science, Springer International Publishing, Cham,
          <year>2019</year>
          , pp.
          <fpage>301</fpage>
          -
          <lpage>321</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Elsayed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Da San Martino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hasanain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Suwaileh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Haouari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Babulkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hamdan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z. S.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <article-title>Overview of CheckThat! 2020: Automatic identification and verification of claims in social media</article-title>
          , in:
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction</source>
          , Lecture Notes in Computer Science, Springer International Publishing, Cham,
          <year>2020</year>
          , pp.
          <fpage>215</fpage>
          -
          <lpage>236</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Da San Martino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Elsayed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Míguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Haouari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hasanain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Babulkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Shahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Struß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kutlu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. S.</given-names>
            <surname>Kartal</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF-2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News</article-title>
          , in:
          <source>Proceedings of the 12th International Conference of the CLEF Association: Information Access Evaluation Meets Multilinguality, Multimodality, and Visualization</source>
          , CLEF '
          <year>2021</year>
          , Bucharest, Romania (online),
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Shahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nandini</surname>
          </string-name>
          ,
          <article-title>FakeCovid - a multilingual cross-domain fact check news dataset for COVID-19</article-title>
          , in:
          <source>Workshop Proceedings of the 14th International AAAI Conference on Web and Social Media</source>
          ,
          <year>2020</year>
          . URL: http://workshop-proceedings.icwsm.org/pdf/2020_14.pdf.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>