<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>TOBB ETU at CheckThat! 2020: Prioritizing English and Arabic Claims Based on Check-Worthiness</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yavuz Selim Kartal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mucahid Kutlu</string-name>
          <email>m.kutlug@etu.edu.tr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>TOBB University of Economics and Technology</institution>
          ,
          <addr-line>Ankara</addr-line>
          ,
          <country country="TR">Turkey</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Misinformation has many negative consequences on our daily life. While the spread of misinformation is very fast, investigating the veracity of claims is slow. Therefore, we urgently need systems helping human fact-checkers in the combat against misinformation. In this paper, we present our participation in the check-worthiness tasks (i.e., Task 1 and Task 5) of the CLEF-2020 Check That! Lab. For English Task 1, we use logistic regression with fine-tuned BERT predictions, POS tags, controversial topics, and a hand-crafted word list as features. For English Task 5, we again use logistic regression with fine-tuned BERT predictions and word embeddings as features. For Arabic Task 1, we use a hybrid approach combining a fine-tuned BERT model with the model used for English Task 5. For the Arabic task, we use AraBERT as our BERT model. In the official evaluation of primary submissions, our primary models a) ranked 3rd in Arabic Task 1 based on P@30 and shared the 1st rank with another group based on P@5, b) ranked 5th in English Task 1 based on average precision and shared the 1st rank with five other groups based on reciprocal rank, P@1, P@3 and P@5 metrics, and c) ranked 3rd in Task 5 based on average precision.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Social media platforms provide an incredibly easy way to share information with
others. Any information, including misinformation, can reach millions of people
in a very short time. Unfortunately, misinformation spread over the Internet causes
many unpleasant incidents, such as huge changes in stock prices. Since the start
of the on-going Covid-19 pandemic, we have also witnessed how misinformation can
cause unhealthy, potentially deadly, practices such as gargling with bleach to
prevent Covid-19.</p>
      <p>
        In order to combat misinformation, there are many fact-checking websites
which manually investigate the veracity of claims and share their findings with
the public. However, misinformation spreads much faster than true information [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] and
investigating the veracity of claims is extremely time consuming, taking around one
day for a single claim [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Considering the vast number of claims spread on the
Internet and the high cost of fact-checking, we urgently need systems that help
fact-checkers detect check-worthy claims, enabling them to focus on the important
claims instead of spending their precious time on less important ones.
      </p>
      <p>
        CLEF 2020 Check That! Lab [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] has organized two different shared tasks
(Task 1 and Task 5) for detecting check-worthy claims. Task 1 has two different
datasets, consisting of Arabic and English tweets, while Task 5 has English
political debates and transcribed speeches. In this paper, we present our methods
developed for both Task 1 and Task 5.
      </p>
      <p>In our study, we use two different ranking methodologies: a logistic
regression model and a hybrid combination of a fine-tuned BERT model with
logistic regression. We also investigate many different features, including
word embeddings, presence of comparative and superlative adjectives, a hand-crafted
word list, domain-specific controversial topics, POS tags, metadata of tweets, and
predictions of fine-tuned BERT models. Based on our experiments on the training
data, we use the following primary models for each task.</p>
      <p>- Arabic Task 1: Hybrid model using word embeddings and BERT
predictions as features for the logistic regression model.
- English Task 1: Logistic regression model with POS tags, controversial
topics, comparative and superlative adjectives, and BERT predictions as
features.
- English Task 5: Logistic regression model with word embeddings and
BERT predictions as features.</p>
      <p>CLEF 2020 Check That! Lab used precision@30 (P@30) for Arabic Task 1
and average precision (AP) for English Task 1 and Task 5 as official evaluation
metrics. Based on the official metrics, our primary models for Arabic Task 1, English
Task 1, and Task 5 ranked 3rd (out of 8 groups), 5th (out of 12 groups), and
3rd (out of 3 groups), respectively. However, based on other metrics, our models
shared the first rank with others in many cases. In particular, our primary model
for Arabic Task 1 shared the 1st rank with another group based on P@5. In
addition, our primary model in English Task 1 shared the 1st rank with 5 other
groups on RR, P@1, P@3 and P@5.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        There are a number of studies in check-worthy claim detection. Hassan et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]
develop ClaimBuster, one of the first check-worthy claim detection
models. ClaimBuster uses many features, including part-of-speech (POS) tags, named
entities, sentiment, and TF-IDF representations of claims. TATHYA [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] uses
topics detected in all presidential debates from 1976 to 2016, POS tuples, entity
history, and bag-of-words as features.
      </p>
      <p>
        Gencheva et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] propose a neural network model with a long list of
sentence level and contextual features including sentiment, named entities, word
embeddings, topics, contradictions, and others. Jaradat et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] use similar
features to Gencheva et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] but extend the model to Arabic. In follow-up
work, Vasileva et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] propose a multi-task learning model to detect
whether a claim will be fact-checked by at least five of 9 reputable fact-checking
organizations.
      </p>
      <p>
        In 2018, the Check That! Lab (CTL) was organized for the first time in
English and Arabic with the participation of seven teams [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The participants
investigated many learning models such as recurrent neural network (RNN) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ],
multilayer perceptron [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], random forest (RF) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], k-nearest neighbor (kNN) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and
gradient boosting [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] with different sets of features such as bag-of-words [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ],
character n-gram [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], POS tags [
        <xref ref-type="bibr" rid="ref10 ref22 ref23">10, 22, 23</xref>
        ], verbal forms [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], named entities [
        <xref ref-type="bibr" rid="ref22 ref23">22,
23</xref>
        ], syntactic dependencies [
        <xref ref-type="bibr" rid="ref10 ref23">23, 10</xref>
        ], and word embeddings [
        <xref ref-type="bibr" rid="ref10 ref22 ref23">10, 22, 23</xref>
        ]. On the English
dataset, the Prise de Fer [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] team achieved the best MAP scores using bag-of-words,
POS tags, named entities, verbal forms, negations, sentiment, clauses, syntactic
dependency, and word embeddings with SVM-multilayer perceptron learning.
On the Arabic dataset, the model of Yasser et al. [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] outperformed the others using
POS tags, named entities, sentiment, topics, and word embeddings.
      </p>
      <p>
        In CTL'19, the check-worthiness task was organized only for English [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. 11
teams participated in the task and used varying models such as LSTM, SVM,
naive Bayes, and logistic regression (LR) with many features, including the
readability of sentences and their context. The Copenhagen team [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] achieved the best
overall performance using syntactic dependency and word embeddings with a weakly
supervised LSTM model.
      </p>
      <p>
        The labeled datasets provided by CTL enabled further studies in this task.
Lespagnol et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] explore using SVM, LR, and Random Forests with a long
list of features including word embeddings, POS tags, syntactic dependency tags,
entities, and "information nutritional" features which represent factuality,
emotion, controversy, credibility, and technicality of statements. Kartal et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] use
logistic regression utilizing BERT model with additional features including word
embeddings, controversial topics, hand-crafted list of words, POS tags, presence
of comparative and superlative adjectives, and adverbs. They achieve the
highest AP scores on both CTL'18 and CTL'19 English datasets. In CTL'20, we use
features adapted from Kartal et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]'s model. However, we also explore
additional features such as tweet metadata features. We also investigate a hybrid
combination of a fine-tuned BERT model with logistic regression.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Proposed Approach</title>
      <p>In this section, we explain the features we investigate (Section 3.1) and the models
we use to prioritize claims (Section 3.2).</p>
      <sec id="sec-3-1">
        <title>Features</title>
        <p>
          BERT: We first remove mentions and URLs from tweets. For Arabic tweets, we
also apply spelling correction using Farasa (http://qatsdemo.cloudapp.net/farasa/). Then we fine-tune BERT models
using the respective training data sets. We use the multilingual uncased-large BERT
model [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] for the English tasks and the Ara-BERT model [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] for the Arabic task. The
prediction value of the fine-tuned BERT model is used as a feature in the
logistic regression model.
        </p>
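        <p>The tweet cleaning step described above (stripping mentions and URLs before fine-tuning) can be sketched as follows; the regular expressions are illustrative assumptions, not the exact patterns used in our pipeline.</p>

```python
import re

MENTION = re.compile(r"@\w+")          # user mentions such as @example
URL = re.compile(r"https?://\S+")      # http/https links

def clean_tweet(text: str) -> str:
    """Remove mentions and URLs, then collapse leftover whitespace."""
    text = MENTION.sub("", text)
    text = URL.sub("", text)
    return " ".join(text.split())

cleaned = clean_tweet("@user check https://t.co/abc this")  # -> "check this"
```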
        <p>
          Word Embeddings (WE): Word embeddings are able to capture semantic and
syntactic features of words. Thus, we use word embeddings to capture similarities
between claims. Specifically, we represent each sentence as the average vector of
its words. We use word2vec models pre-trained on Google news [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] in Task 5.
For Task 1, we use fastText models pre-trained on Wikipedia [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. Both word
embedding models provide a vector size of 300. We exclude out-of-vocabulary
words when we use word2vec.
        </p>
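        <p>As a concrete illustration, the sentence representation described above (the average of the word vectors, skipping out-of-vocabulary words) can be computed as in the following sketch; the toy 3-dimensional vectors stand in for the 300-dimensional word2vec/fastText embeddings.</p>

```python
from typing import Dict, List

def sentence_vector(tokens: List[str], wv: Dict[str, List[float]], dim: int = 300) -> List[float]:
    """Average the embeddings of in-vocabulary tokens; OOV tokens are skipped."""
    known = [wv[t] for t in tokens if t in wv]
    if not known:
        return [0.0] * dim
    return [sum(component) / len(known) for component in zip(*known)]

# Toy example with 3-dimensional vectors:
toy_wv = {"tax": [1.0, 0.0, 0.0], "cut": [0.0, 1.0, 0.0]}
vec = sentence_vector(["tax", "cut", "unseen"], toy_wv, dim=3)  # -> [0.5, 0.5, 0.0]
```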
        <p>
          Controversial Topics (CT): We use the controversial topics feature defined by
Kartal et al. [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. In this feature, 11 major controversial topics in current US
politics (e.g., immigration, gun policy, racism, abortion) are defined. Each topic
is represented by the average word embeddings of hand-crafted related words
(e.g., "immigrants", "illegal", "borders", "Mexican", "Latino", and "Hispanic"
for the immigration topic). We also represent the sentences/tweets to be ranked as
the average word embeddings of their words, excluding NLTK (https://www.nltk.org/) stopwords. Subsequently, we
calculate the cosine similarity between sentences/tweets and each topic using their
vector representations. This feature is used only for the English datasets because
it is valid only for claims about US politics.
        </p>
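        <p>The topic score described above reduces to a cosine similarity between two averaged embedding vectors. A minimal sketch, with toy low-dimensional vectors standing in for the real embeddings and an illustrative topic centroid:</p>

```python
import math
from typing import List

def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Each topic is the average embedding of its hand-crafted words (toy vectors here);
# the feature value is the similarity of the tweet vector to each topic centroid.
immigration_centroid = [1.0, 0.0]
tweet_vector = [1.0, 0.0]
score = cosine(tweet_vector, immigration_centroid)  # -> 1.0
```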
        <p>
          Handcrafted Word List (HW): We use the handcrafted word list feature defined
by Kartal et al. [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. In this feature, firstly, 66 words which might be correlated with
check-worthy claims are identified (e.g., unemployment). Then, we check whether
there is an overlap between the lemmas of the selected words and the lemmas of
the words in the respective sentence/tweet.
        </p>
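        <p>The overlap check is a simple set intersection over lemmas. A sketch, assuming the tokens have already been lemmatized and using a tiny illustrative subset in place of the actual 66-word list:</p>

```python
from typing import Iterable

# Illustrative subset; the actual list defined by Kartal et al. contains 66 words.
CHECKWORTHY_LEMMAS = {"unemployment", "tax", "immigration"}

def hw_feature(sentence_lemmas: Iterable[str]) -> int:
    """1 if any lemma of the sentence/tweet appears in the hand-crafted list, else 0."""
    return int(not CHECKWORTHY_LEMMAS.isdisjoint(sentence_lemmas))
```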
        <p>Part-of-speech (POS) Tags: Informative words can make a sentence/tweet
more likely to be check-worthy. Thus, in this feature set, we use the numbers
of nouns, verbs, adverbs, and adjectives in order to capture the information load of
sentences/tweets.</p>
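        <p>Given POS-tagged tokens (e.g., from SpaCy, which we use for syntactic analysis), this feature set is just four counts. A sketch over already-tagged (token, tag) pairs; the coarse tag names follow the Universal POS convention and the tagged input is an assumed preprocessing output:</p>

```python
from collections import Counter
from typing import List, Tuple

def pos_counts(tagged: List[Tuple[str, str]]) -> List[int]:
    """Counts of nouns, verbs, adverbs, and adjectives in a sentence/tweet."""
    tags = Counter(tag for _, tag in tagged)
    return [tags["NOUN"], tags["VERB"], tags["ADV"], tags["ADJ"]]

# Example: "taxes rose sharply" -> one noun, one verb, one adverb, no adjective.
features = pos_counts([("taxes", "NOUN"), ("rose", "VERB"), ("sharply", "ADV")])  # -> [1, 1, 1, 0]
```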
        <p>
          Comparative &amp; Superlative (CS): In this feature, we use the number of
comparative and superlative adjectives and adverbs in sentences/tweets, as defined
by Kartal et al. [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
        </p>
        <p>Tweet Meta Data (TMD): Metadata of tweets might be an indicator of
check-worthy claims. For instance, if a tweet is retweeted a lot or shared by an
influential person, it might be check-worthy because it reaches, and affects, many
people. Specifically, in this feature group, we use the following information
about tweets: 1) whether the account is verified, 2) whether the tweet is
flagged as sensitive content, 3) whether the tweet is quoting another tweet, 4)
presence of a URL, 5) presence of a hashtag, 6) whether a user is mentioned, 7)
retweet counts, and 8) favorite counts.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Ranking Methodology</title>
        <p>We use two different approaches to prioritize claims based on their check-worthiness,
using the features defined above.</p>
        <p>
          Logistic Regression (LR): LR is commonly used in state-of-the-art
check-worthy claim detection models [
          <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
          ]. Thus, we also train an LR model with the features
defined above. Then we rank claims based on their predicted probabilities of
being check-worthy.
        </p>
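        <p>Ranking by predicted probability can be sketched as follows; the hand-set weights below are purely illustrative stand-ins for the coefficients that a trained logistic regression model would learn from the training data.</p>

```python
import math
from typing import List, Tuple

def lr_probability(features: List[float], weights: List[float], bias: float) -> float:
    """Probability of the positive (check-worthy) class under a logistic model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def rank_claims(claims: List[Tuple[str, List[float]]],
                weights: List[float], bias: float) -> List[str]:
    """Return claim ids sorted by decreasing predicted check-worthiness."""
    return [cid for cid, feats in
            sorted(claims, key=lambda c: -lr_probability(c[1], weights, bias))]

# Illustrative: one feature (e.g., the BERT prediction), weight 1.0, bias 0.0.
ranked = rank_claims([("c1", [0.2]), ("c2", [0.9])], [1.0], 0.0)  # -> ["c2", "c1"]
```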
        <p>Hybrid: In this model, we apply a hybrid approach combining the logistic regression
model and the BERT model. We first fine-tune the BERT model as explained above and
rank claims using the fine-tuned BERT model. We keep the rankings of the top
10 claims as they are, but re-rank the other claims using logistic regression with
the word embeddings and BERT features explained above. For Arabic Task 1, we
use Ara-BERT as our BERT model.</p>
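        <p>The hybrid re-ranking step above can be sketched as follows; the claim ids and scores are illustrative inputs, with the BERT ranking and LR probabilities assumed to be computed beforehand.</p>

```python
from typing import Dict, List

def hybrid_rank(bert_ranked: List[str], lr_scores: Dict[str, float], keep: int = 10) -> List[str]:
    """Keep the top `keep` claims from the BERT ranking; re-rank the rest by LR score."""
    head = bert_ranked[:keep]
    tail = sorted(bert_ranked[keep:], key=lambda cid: -lr_scores[cid])
    return head + tail

# Illustrative with keep=2: the top two BERT-ranked claims stay fixed,
# the remaining ones are reordered by their LR probabilities.
order = hybrid_rank(["a", "b", "c", "d"], {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.7}, keep=2)
# -> ["a", "b", "d", "c"]
```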
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <sec id="sec-4-1">
        <title>Implementation</title>
        <p>
          We use ktrain (https://pypi.org/project/ktrain/) and huggingface transformers [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] to fine-tune BERT models
with the 1-cycle learning rate policy and a maximum learning rate of 2e-5 [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. We
use SpaCy (https://spacy.io/) for all syntactic and semantic analyses. We use the Scikit-learn toolkit (https://scikit-learn.org) for
the implementation of LR, with default parameters.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Experimental Setup</title>
        <p>Our experiments are in two steps. We first evaluate different models using the
training datasets. Subsequently, we report results on the test data for the models
that participated in the shared task. In the evaluation of different models with the
training data, we use a different setup for each task and language because the data
formats and sizes differ for each of them. In particular, in Arabic Task 1, we use
5-fold cross validation. In English Task 1, both training and validation data sets
are provided in the development phase of the shared task. Thus, we use the same
split. In English Task 5, transcripts of 50 political debates and speeches are
provided. Following the suggestion of the shared task organizers, we use the first
40 files (i.e., debates) as training data and the remaining 10 files for evaluating
different models in the development phase of the shared task.</p>
        <p>We evaluate the models with the following metrics: average precision (AP),
precision@1 (P@1), precision@5 (P@5), precision@10 (P@10) and precision@30
(P@30). The official metrics are P@30 for Arabic and AP for the English tasks.</p>
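        <p>For reference, these metrics reduce to simple computations over the ranked list of relevance labels (1 = check-worthy, 0 = not). A minimal sketch:</p>

```python
from typing import List

def precision_at_k(ranked_labels: List[int], k: int) -> float:
    """Fraction of check-worthy claims among the top k of the ranking."""
    return sum(ranked_labels[:k]) / k

def average_precision(ranked_labels: List[int]) -> float:
    """Mean of precision@i over the ranks i where a check-worthy claim appears."""
    hits, total = 0, 0.0
    for i, rel in enumerate(ranked_labels, start=1):
        if rel:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0

# Example: relevant claims at ranks 1 and 3 -> AP = (1/1 + 2/3) / 2 = 5/6.
ap = average_precision([1, 0, 1])
```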
      </sec>
      <sec id="sec-4-3">
        <title>Experimental Results</title>
        <p>
          Experiments on Training Data. We first compare the performance of different
models on the Arabic training dataset using 5-fold cross validation. In particular, we
use fine-tuned Multilingual BERT (M-BERT) [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], Ara-BERT, logistic regression
with different combinations of the BERT, WE and TMD features defined in Section
3.1, and our hybrid model.
        </p>
        <p>The results are shown in Table 1. Our observations are as follows. Firstly,
Ara-BERT outperforms M-BERT, showing the superior performance of language-specific
models compared to multilingual models. Secondly, TMD features do
not yield higher prediction accuracy. Lastly, the hybrid model outperforms all other
models based on all metrics. Thus, we choose the hybrid model as our primary model
for Arabic Task 1. We also choose the second best model, LR with
Ara-BERT and WE features, as our contrastive submission (C1).</p>
        <p>Next, we compare different models using the training data provided for English
Task 1. In particular, we investigate the performance of the fine-tuned BERT model,
logistic regression with different sets of features defined in Section 3.1, and our
hybrid model. In this set of experiments, we also use two different word embedding
models, word2vec and fastText (FT), for the WE features.</p>
        <p>The results are shown in Table 2. Our observations based on the results are
as follows. Firstly, word2vec yields higher AP scores than fastText in our logistic
regression model (0.625 vs. 0.573). However, we observe the opposite case in our
hybrid model, where fastText yields slightly higher results than word2vec
(0.805 vs. 0.799). Secondly, using only BERT outperforms all models that do
not use BERT. Thirdly, we achieve our best AP scores when we use
logistic regression with our BERT, POS, CT, and HW features together. Lastly,
replacing HW with CS yields slightly lower AP (0.817 vs. 0.821) but higher
P@30 (0.867 vs. 0.833). Based on these results, we choose logistic regression
with BERT, POS, CT, and HW features as our primary model.</p>
        <p>For Task 5, we investigate the performance of the fine-tuned BERT model, logistic
regression with different sets of features defined in Section 3.1, and our hybrid
model. The results are shown in Table 3. The primary model for English Task 1
(i.e., LR with POS, CT, HW and BERT features) achieves the best P@30 scores,
while the hybrid model (i.e., the primary model for Arabic Task 1) is inferior to the other
models. Logistic regression with BERT and WE features achieves the best AP
scores. Thus, we select this model as our primary model for Task 5.</p>
        <p>Experiments on Test Data. We train our primary and contrastive models
using the training data provided in the development phase of the shared task. The
results are shown in Table 4. In Arabic Task 1, our best run (C1) is ranked 2nd
among all best runs per team based on the official metric P@30. Our primary model
also shares the first rank with another group based on the P@5 metric. Considering
all runs submitted for Arabic Task 1, our contrastive and primary models are
ranked 5th and 7th among 28 participants, respectively, based on P@30.</p>
        <p>In English Task 1, our primary model is ranked 5th among all primary
models. However, our primary model and second contrastive model (C2) share the
first rank with nine other models based on the P@1 and P@5 metrics. Our second
contrastive model actually outperforms our primary model and shares the first
rank with five other models based on P@10.</p>
        <p>In English Task 5, all our models unfortunately show poor performance on
the test dataset. Our primary model ranked third among three primary models.</p>
        <p>In this paper, we present our participation in Task 1 and Task 5 of the CLEF-2020
Check That! Lab. We use three different models for Arabic Task 1, English Task
1, and Task 5 as our primary models. For Arabic Task 1, we propose a hybrid
model which uses a fine-tuned BERT model for the top ten claims and then uses a
logistic regression model with BERT and word embedding features to re-rank the
remaining claims. For English Task 1, we rank claims using logistic regression
with features including domain-specific controversial topics, predictions of a
fine-tuned BERT model, a handcrafted word list, and POS tags. For English Task
5, we use logistic regression with BERT and word embedding features.</p>
        <p>Our primary models for Arabic Task 1, English Task 1, and Task 5 ranked 3rd
(out of 8 groups), 5th (out of 12 groups), and 3rd (out of 3 groups), respectively,
based on the official evaluation metric of each task. Our models also share the first
rank with other groups in Arabic Task 1 and English Task 1 based on various
evaluation metrics.</p>
        <p>We believe that misinformation is a global problem. Therefore, we plan to
work on different languages and build a multilingual check-worthy claim
detection model in the future. Furthermore, the limited number of annotated datasets
is one of the main obstacles to developing effective systems. Thus, we also plan to
explore weak supervision methods and develop deep learning models for this
task.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>R.</given-names>
            <surname>Agez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bosc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lespagnol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Petitcol</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Mothe</surname>
          </string-name>
          . IRIT at checkthat!
          <year>2018</year>
          . In Working Notes of CLEF 2018 -
          <article-title>Conference and Labs of the Evaluation Forum</article-title>
          , Avignon, France,
          <source>September 10-14</source>
          ,
          <year>2018</year>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>W.</given-names>
            <surname>Antoun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Baly</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Hajj</surname>
          </string-name>
          . Arabert:
          <article-title>Transformer-based model for arabic language understanding</article-title>
          .
          <source>In LREC 2020 Workshop Language Resources and Evaluation Conference</source>
          <volume>11</volume>
          {
          <issue>16</issue>
          <year>May 2020</year>
          , page 9.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>P.</given-names>
            <surname>Atanasova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barron-Cedeno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Elsayed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Suwaileh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zaghouani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kyuchukov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. D. S.</given-names>
            <surname>Martino</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          .
          <article-title>Overview of the clef-2018 checkthat! lab on automatic identification and verification of political claims. task 1: Check-worthiness</article-title>
          . arXiv preprint arXiv:
          <year>1808</year>
          .05542,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>P.</given-names>
            <surname>Atanasova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          , G. Karadzhov,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mohtarami</surname>
          </string-name>
          , and G. Da San Martino.
          <article-title>Overview of the clef-2019 checkthat! lab on automatic identification and verification of claims. task 1: Check-worthiness</article-title>
          .
          <source>In CEUR Workshop Proceedings</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>A.</given-names>
            <surname>Barron-Cedeno</surname>
          </string-name>
          , T. Elsayed,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          , G. Da San Martino, M. Hasanain,
          <string-name>
            <given-names>R.</given-names>
            <surname>Suwaileh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Haouari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Babulkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hamdan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shaar</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Z. Sheikh</given-names>
            <surname>Ali</surname>
          </string-name>
          . Overview of CheckThat! 2020:
          <article-title>Automatic identification and verification of claims in social media</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          . Bert:
          <article-title>Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers), pages
          <fpage>4171</fpage>
          {
          <fpage>4186</fpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>P.</given-names>
            <surname>Gencheva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Marquez</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          <article-title>Barron-Ceden~o, and</article-title>
          <string-name>
            <surname>I. Koychev.</surname>
          </string-name>
          <article-title>A contextaware approach for detecting worth-checking claims in political debates</article-title>
          .
          <source>In Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017</source>
          , pages
          <fpage>267</fpage>
          {
          <fpage>276</fpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>B.</given-names>
            <surname>Ghanem</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Montes-y-</article-title>
          <string-name>
            <surname>Gomez</surname>
            ,
            <given-names>F. M. R.</given-names>
          </string-name>
          <string-name>
            <surname>Pardo</surname>
            , and
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Rosso</surname>
          </string-name>
          .
          <article-title>UPV-INAOE - check that: Preliminary approach for checking worthiness of claims</article-title>
          .
          <source>In Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum</source>
          , Avignon, France, September 10-14,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>E.</given-names>
            <surname>Grave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bojanowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          .
          <article-title>Learning word vectors for 157 languages</article-title>
          .
          <source>In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018)</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>C.</given-names>
            <surname>Hansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. G.</given-names>
            <surname>Simonsen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Lioma</surname>
          </string-name>
          .
          <article-title>The Copenhagen team participation in the check-worthiness task of the competition of automatic identification and verification of claims in political debates of the CLEF-2018 CheckThat! lab</article-title>
          .
          <source>In CLEF</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>C.</given-names>
            <surname>Hansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. G.</given-names>
            <surname>Simonsen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Lioma</surname>
          </string-name>
          .
          <article-title>Neural weakly supervised fact check-worthiness detection with contrastive sampling-based ranking loss</article-title>
          .
          <source>In Working Notes of CLEF 2019 - Conference and Labs of the Evaluation Forum, Lugano, Switzerland, September 9-12</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>N.</given-names>
            <surname>Hassan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Arslan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Caraballo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jimenez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gawsane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joseph</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kulkarni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Nayak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sable</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Tremayne</surname>
          </string-name>
          .
          <article-title>ClaimBuster: The first-ever end-to-end fact-checking system</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>10</volume>
          :
          <fpage>1945</fpage>
          -
          <lpage>1948</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>I.</given-names>
            <surname>Jaradat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Gencheva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Marquez</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          .
          <article-title>ClaimRank: Detecting check-worthy claims in Arabic and English</article-title>
          .
          <source>In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations</source>
          , pages
          <fpage>26</fpage>
          -
          <lpage>30</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>Y. S.</given-names>
            <surname>Kartal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Guvenen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Kutlu</surname>
          </string-name>
          .
          <article-title>Too many claims to fact-check: Prioritizing political claims based on check-worthiness</article-title>
          .
          <source>ArXiv</source>
          , abs/2004.08166,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>C.</given-names>
            <surname>Lespagnol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mothe</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. Z.</given-names>
            <surname>Ullah</surname>
          </string-name>
          .
          <article-title>Information nutritional label and word embedding to estimate information check-worthiness</article-title>
          .
          <source>In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , pages
          <fpage>941</fpage>
          -
          <lpage>944</lpage>
          . ACM,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Corrado</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>arXiv preprint arXiv:1301.3781</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>A.</given-names>
            <surname>Patwari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Goldwasser</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Bagchi</surname>
          </string-name>
          .
          <article-title>Tathya: A multi-classifier system for detecting check-worthy statements in political debates</article-title>
          .
          <source>In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management</source>
          , pages
          <fpage>2259</fpage>
          -
          <lpage>2262</lpage>
          . ACM,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>L. N.</given-names>
            <surname>Smith</surname>
          </string-name>
          .
          <article-title>A disciplined approach to neural network hyper-parameters: Part 1 - learning rate, batch size, momentum, and weight decay</article-title>
          .
          <source>ArXiv</source>
          , abs/1803.09820,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>S.</given-names>
            <surname>Vasileva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Atanasova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Marquez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          .
          <article-title>It takes nine to smell a rat: Neural multi-task learning for check-worthiness prediction</article-title>
          .
          <source>In Proceedings of the International Conference on Recent Advances in Natural Language Processing</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>S.</given-names>
            <surname>Vosoughi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Roy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Aral</surname>
          </string-name>
          .
          <article-title>The spread of true and false news online</article-title>
          .
          <source>Science</source>
          ,
          <volume>359</volume>
          (
          <issue>6380</issue>
          ):
          <fpage>1146</fpage>
          -
          <lpage>1151</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>T.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Debut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sanh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chaumond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Delangue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cistac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Louf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Funtowicz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Brew</surname>
          </string-name>
          .
          <article-title>HuggingFace's Transformers: State-of-the-art natural language processing</article-title>
          .
          <source>ArXiv</source>
          , abs/1910.03771,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>K.</given-names>
            <surname>Yasser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kutlu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Elsayed</surname>
          </string-name>
          .
          <article-title>bigIR at CLEF 2018: Detection and verification of check-worthy political claims</article-title>
          .
          <source>In Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <given-names>C.</given-names>
            <surname>Zuo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Karakas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          .
          <article-title>A hybrid recognition system for check-worthy claims using heuristics and supervised learning</article-title>
          .
          <source>In CLEF</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>