<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Team Alex at CLEF CheckThat! 2020: Identifying Check-Worthy Tweets With Transformer Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alex Nikolov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Da San Martino</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ivan Koychev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Preslav Nakov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>FMI, Sofia University "St. Kliment Ohridski"</institution>
          ,
          <country country="BG">Bulgaria</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Qatar Computing Research Institute</institution>
          ,
          <addr-line>HBKU</addr-line>
          ,
          <country country="QA">Qatar</country>
        </aff>
      </contrib-group>
      <abstract>
<p>While misinformation and disinformation have been thriving in social media for years, with the emergence of the COVID-19 pandemic, political and health misinformation merged, thus elevating the problem to a whole new level and giving rise to the first global infodemic. The fight against this infodemic has many aspects, with fact-checking and debunking false and misleading claims being among the most important ones. Unfortunately, manual fact-checking is time-consuming and automatic fact-checking is resource-intense, which means that we need to pre-filter the input social media posts and to throw out those that do not appear to be check-worthy. With this in mind, here we propose a model for detecting check-worthy tweets about COVID-19, which combines deep contextualized text representations with modeling the social context of the tweet. Our official submission to the English version of CLEF-2020 CheckThat! Task 1, system Team Alex, was ranked second with a MAP score of 0.8034, almost tied with the winning system, lagging behind by just 0.003 MAP points absolute.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
The rise of disinformation (aka "fake news") in social media has given rise to a
number of initiatives to fact-check claims of general interest and to confirm or
to debunk them. Unfortunately, manual fact-checking is a very time-consuming
process, and thus automated approaches have been proposed as a faster
alternative. Yet, even with automated methods, it is not possible to fact-check every
single claim, and the accuracy of automated fact-checking systems is significantly
lower than that of human experts. Furthermore, there is a need to pre-filter and
to prioritize what should be passed to human fact-checkers. As the need to
prioritize has been gradually gaining recognition, so has the task of check-worthiness
estimation, which is seen as an important first step in the general fact-checking
pipeline. A leading effort in this direction has been the CLEF CheckThat! lab,
which featured a check-worthiness estimation task in all its editions [
        <xref ref-type="bibr" rid="ref1 ref2">1,2,18</xref>
        ].
      </p>
      <p>
Traditionally, check-worthiness estimation has focused on political debates
and speeches, ignoring social media. In order to bridge this gap, the 2020 edition
of the CheckThat! Lab [
        <xref ref-type="bibr" rid="ref3">3</xref>
] featured Task 1 on check-worthiness
estimation on tweets, offered in Arabic and English [10,18]. Given the prominence of
disinformation related to COVID-19, which has grown to become the first global
infodemic, the English Task 1 focused on tweets related to COVID-19.
      </p>
<p>The lab organizers provided a dataset of tweets originating from the early
days of the global COVID-19 pandemic and covering a variety of
COVID-19-related topics, e.g., concerning the number of confirmed cases in different parts
of the world, the measures taken by local governments to combat the pandemic,
claims about the nature of the virus, etc. The participants were challenged to
develop systems to rank a set of input tweets according to their check-worthiness.</p>
<p>Below, we describe the system we developed for the English Task 1, which is
an ensemble combining deep contextualized text representations with social
context. Our official submission was ranked second-best, and it was almost tied with
the winner. We further describe a number of additional experiments and
comparisons, which we believe should be useful for future research as they provide
some indication about what techniques are effective for the task.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>The earliest work on claim check-worthiness estimation is the ClaimBuster
system [11], which was trained on manually annotated US Presidential debates. It
used TF.IDF features after discarding words appearing in fewer than three
documents. In addition, for each sentence, they calculated a sentiment score, word
counts, the number of occurrences of 43 Part-of-Speech (POS) tags from the Penn
Treebank, as well as the frequency of use of 26 entity types, such as Person and
Organization. They performed feature selection using a random forest and the GINI
index, and then conducted various experiments using feature subsets passed on
to Support Vector Machine, Random Forest, and Naïve Bayes classifiers.</p>
      <p>
        In a related line of work, Gencheva &amp; al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] created a dataset also based on US
Presidential debates, but with annotations from nine fact-checking organizations.
Their goal was to mimic the selection strategies of these organizations, and they
focused on modeling the context. They reused most of ClaimBuster's features
and added the number of named entities within a sentence and the number of words
belonging to one of nine lexicons, e.g., words indicating bias or negative
words. For the context, they added features for sentence positioning within a
speaker's segment, for the size of the segment a sentence belongs to, as well as
for the size of the previous and of the next segments, along with several metadata
features. They further trained an LDA topic model [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] on Presidential debates
and used it to extract a distribution over 300 learned topics, which they used
as additional features. Finally, they added averaged word2vec word embeddings
[14] for each sentence. They trained an SVM classifier as well as a feed-forward
neural network with two hidden layers and ReLU activation, and they found
that the additional context features yielded sizable performance gains.
      </p>
      <p>In follow-up work, Vasileva &amp; al. [19] used a multi-task learning neural
network that predicts whether a sentence would be selected for fact-checking by each
individual fact-checking organization (from a set of nine such organizations), as
well as by any of them.</p>
      <p>Yet another follow-up work resulted in the development of the ClaimRank
system, which was trained on more data and also included Arabic content [12].
Other related work also focused on political debates and speeches. For example,
Patwari &amp; al. [16] predicted whether a sentence would be selected by a
fact-checking organization using a boosting-like model.</p>
      <p>
        Last but not least, the task was the topic of CLEF in 2018 [
        <xref ref-type="bibr" rid="ref1">1,15</xref>
        ] and
2019 [
        <xref ref-type="bibr" rid="ref2 ref7 ref8">2,7,8</xref>
], where the focus was once again on political debates and speeches
from a single fact-checking organization.
      </p>
      <p>
In a slightly different domain, Konstantinovskiy &amp; al. [13] developed a dataset
for check-worthiness estimation consisting of manually annotated sentences from
TV debates in the UK. They used InferSent sentence embeddings [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], as well as
the number of occurrences of some POS tags, and the number of different named
entities within a sentence. They experimented with a number of classifiers such
as Logistic Regression, SVMs, Naïve Bayes, and Random Forest.
      </p>
<p>Note that all of the above systems were trained on speeches and debates, while
the task we deal with here is about tweets. While we reuse some of the
features of these systems, we focus on the first step of finding an appropriate data
representation, i.e., using pre-processing techniques specific to tweets.</p>
    </sec>
    <sec id="sec-3">
      <title>Data Pre-processing</title>
<p>We applied different pre-processing techniques to the raw tweet text, which we
describe in the following subsections.</p>
      <sec id="sec-3-1">
        <title>Default Pre-processing</title>
<p>We will begin with the description of our default pre-processing. It includes the
following processing rules and heuristics:</p>
        <p>Splitting hashtags into separate words based on UpperCamelCase. The
UpperCamelCase convention is a way to write multiple words joined together as
a single word, with the first letter of each of the constituent words capitalized within
the joined word. It is commonly used in hashtags on Twitter, e.g., the
string #TheMoreYouKnow is composed of four words: the, more, you, and know.
Such joined-word hashtags are common on Twitter, and thus we attempted to
split them, possibly extracting useful text and facilitating the understanding of
the tweets. As hashtags can deviate from the standard UpperCamelCase
convention, we further added some additional rules to cope with such variations.</p>
        <p>Unification of the hashtags about COVID-19. The dataset contains tweets
with hashtags such as #covid19, #covid 19, and #Covid2019. We replaced all such
hashtags with the canonical tag #covid-19. Similarly, we unified the different
ways of spelling (or even misspelling) the colloquial term corona virus,
including hashtags such as #coronavirus, #corona, and #korona; we replaced all such
wordforms and hashtags with the canonical form corona virus.</p>
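<p>As an illustration, the hashtag-splitting and COVID-unification rules above can be sketched as follows. This is a minimal sketch with hypothetical helper names; our actual rule set includes further heuristics for non-standard hashtags:</p>

```python
import re

def split_camel_hashtag(text: str) -> str:
    """Split #UpperCamelCase hashtags into their component words."""
    def expand(match: re.Match) -> str:
        # Break the hashtag body on capitalization boundaries,
        # e.g. #TheMoreYouKnow -> the more you know.
        words = re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+",
                           match.group(1))
        return " ".join(w.lower() for w in words)
    return re.sub(r"#(\w+)", expand, text)

def unify_covid_terms(text: str) -> str:
    """Map the many COVID-19 and corona-virus variants to canonical forms."""
    text = re.sub(r"#?[Cc]ovid[\s_-]?(2019|19)", "#covid-19", text)
    text = re.sub(r"#(coronavirus|corona|korona)\b", "corona virus",
                  text, flags=re.I)
    return text

print(split_camel_hashtag("#TheMoreYouKnow"))  # -> the more you know
print(unify_covid_terms("#Covid2019"))         # -> #covid-19
```

<p>The order of the two substitutions in unify_covid_terms matters: the #covid variants are canonicalized before the broader corona patterns are applied.</p>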
<p>Replacing '@' with 'user'. In general, the identity of a tweet's author is
irrelevant to the tweet's check-worthiness, and thus we replaced all user
mentions with the special token user. However, we preserved the identities of
influential public figures and organizations, since their status might influence
the check-worthiness of the respective tweets. In such cases, we replaced certain
usernames with the person's actual name or title. For example, we replaced
@realDonaldTrump with Donald Trump, @VP with Vice President, and @WHO with
World Health Organization.</p>
<p>Replacing URLs with 'url'. The presence of a URL can have an impact on
the final check-worthiness label. A classifier might have difficulties differentiating
between different target URLs, and thus we replaced all URLs with the special url
token.</p>
        <p>Removing hashtags at the end of tweets. Many tweets contained hashtags
coming after the meaningful textual statement, and possibly before an included
URL, e.g., 'This is a scandal! #Covid-19 #Upset #Scandal'. We observed that
such hashtags typically did not bring much information with respect to
check-worthiness, and thus we removed them.</p>
<p>Expanding shortened quantities. We replaced tokens such as 7m and 12k
with expanded versions such as 7 million and 12 thousand, respectively.</p>
        <p>Removing punctuation marks, except for quotation marks. Punctuation
does not help much semantically, and thus we removed it. However, we kept
quotation marks, which can often indicate the beginning or the ending of a
quote, which may be the key point of a claim worth fact-checking.</p>
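<p>The quantity-expansion rule can be sketched as follows (the billion suffix is our illustrative addition; the examples above cover only m and k):</p>

```python
import re

# Suffix table for shortened quantities: 7m -> 7 million, 12k -> 12 thousand.
SUFFIXES = {"k": "thousand", "m": "million", "b": "billion"}

def expand_quantities(text: str) -> str:
    def repl(match: re.Match) -> str:
        number, suffix = match.group(1), match.group(2).lower()
        return f"{number} {SUFFIXES[suffix]}"
    # A number (optionally with a decimal part) directly followed by k/m/b.
    return re.sub(r"\b(\d+(?:\.\d+)?)([kKmMbB])\b", repl, text)

print(expand_quantities("7m cases and 12k deaths"))
# -> 7 million cases and 12 thousand deaths
```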
      </sec>
      <sec id="sec-3-2">
        <title>Corona Pre-processing</title>
<p>We further applied a special pre-processing hack, which we call Corona
pre-processing. It replaces the words covid-19 and corona virus with ebola, which
helps to obtain a more meaningful semantic representation of the input text, as
ebola is in the vocabulary of pre-trained embeddings and Transformers, while
covid-19 and corona virus, which are more recent terms, are not.</p>
      </sec>
      <sec id="sec-3-3">
        <title>SW+C Pre-processing</title>
<p>A third method of pre-processing, which we call SW+C, applies the rules of the
Corona pre-processing and further removes most stop-words (as listed in the
standard NLTK stop-word list). However, it keeps personal and demonstrative
pronouns, such as he and this, as they provide references that might be
important for determining whether a claim is check-worthy.</p>
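<p>The SW+C filtering step can be sketched as follows, using a small illustrative stop-word list (a subset of NLTK's full list) and an explicit keep-list of pronouns:</p>

```python
# Illustrative subset of the NLTK stop-word list.
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "of", "to", "and",
              "he", "she", "it", "they", "this", "that", "these", "those"}
# Personal and demonstrative pronouns are kept even though they are stop-words.
KEPT_PRONOUNS = {"i", "you", "he", "she", "it", "we", "they",
                 "this", "that", "these", "those"}

def swc_filter(tokens: list) -> list:
    return [t for t in tokens
            if t.lower() not in STOP_WORDS or t.lower() in KEPT_PRONOUNS]

print(swc_filter("he said that the virus is a hoax".split()))
# -> ['he', 'said', 'that', 'virus', 'hoax']
```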
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
<p>We experimented with the above pre-processing techniques and different
representations and learning models. All results reported in Tables 1-3 show the
average performance using 5-fold cross-validation on the validation set.</p>
<p>Our baseline system is an SVM with TF.IDF token features, and it
achieved a MAP score of 0.6235.</p>
      <sec id="sec-4-1">
        <title>Experiments with SVM</title>
<p>We used an SVM classifier with word embeddings from GloVe, FastText, and Sent2vec,
as well as from Transformers. For GloVe and for FastText, we used three
different pooling options: mean pooling, max pooling, and TF-IDF pooling. For
embeddings from Transformers, we used mean pooling, max pooling, direct use
of the CLS token, and WK pooling [20].</p>
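<p>The three pooling options for per-token word embeddings can be sketched as follows (toy 4-dimensional embeddings; the TF-IDF weights are hypothetical values):</p>

```python
import numpy as np

# One row per token of a tweet; columns are embedding dimensions.
token_vecs = np.array([[0.1, 0.2, 0.3, 0.4],
                       [0.5, 0.1, 0.0, 0.2],
                       [0.3, 0.6, 0.1, 0.0]])
tfidf = np.array([0.2, 0.5, 0.3])  # one TF-IDF weight per token

mean_pooled = token_vecs.mean(axis=0)                             # mean pooling
max_pooled = token_vecs.max(axis=0)                               # max pooling
tfidf_pooled = (tfidf[:, None] * token_vecs).sum(axis=0) / tfidf.sum()

# Each strategy yields one fixed-size vector per tweet,
# regardless of the number of tokens.
assert mean_pooled.shape == max_pooled.shape == tfidf_pooled.shape == (4,)
```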
<p>For each embedding type and pre-processing technique, we performed a
randomized search with 1,000 iterations. The hyper-parameter search space
consisted of three different kernel types: linear, polynomial, and RBF, each with an
equal probability of being selected. In addition, the regularization parameter C
and the kernel coefficient for the RBF and for the polynomial kernels were
sampled from a Gamma distribution with parameters a=2, b=1, so that most
of our samples are close to the default value of 1 in order to prevent overfitting;
however, in some cases, extreme values were also tried. When selecting a
polynomial kernel, a degree parameter also needs to be provided. We sampled the
degree uniformly with values between 2 and 5. The results of the experiments
are shown in Table 1. In addition to reporting the highest achieved MAP score
for a given group, we further show the corresponding macro-F1 score.</p>
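<p>A minimal sketch of this randomized search, on toy data and with far fewer iterations than the 1,000 reported; we assume b=1 denotes the scale parameter of the Gamma distribution:</p>

```python
import numpy as np
from scipy.stats import gamma, randint
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Toy binary classification data standing in for the tweet features.
rng = np.random.RandomState(0)
X = rng.randn(60, 5)
y = (X[:, 0] + 0.1 * rng.randn(60) > 0).astype(int)

param_space = {
    "kernel": ["linear", "poly", "rbf"],  # equal selection probability
    "C": gamma(a=2, scale=1),             # mass concentrated near the default C=1
    "gamma": gamma(a=2, scale=1),         # kernel coefficient for rbf/poly
    "degree": randint(2, 6),              # polynomial degree sampled from [2, 5]
}
search = RandomizedSearchCV(SVC(), param_space, n_iter=10, cv=3,
                            scoring="average_precision", random_state=0)
search.fit(X, y)
print(sorted(search.best_params_))
```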
        <p>We can see in Table 1 that using mean pooling alongside GloVe embeddings
yields marginally better results than using max pooling or TF-IDF pooling. The
same is true for the experiments with FastText embeddings, which achieved a
slightly higher MAP and macro-F1 scores. The Twitter bigram Sent2Vec
embeddings, combined with a pre-processing step replacing all occurrences of
Covid-19 and corona virus with ebola, managed to outperform the GloVe and the
FastText embeddings. Finally, using Transformer embeddings yielded the best
overall MAP and macro-F1 scores; note that these results were achieved using
WK pooling on the BERT-base embeddings.</p>
        <p>We experimented with Logistic Regression in a setup similar to that for SVM: we
performed randomized search with 1,250 iterations and 5-fold cross-validation.
The hyper-parameter search space includes di erent solvers such as Newton-cg,
SAG, LBFGS, SAGA, and Liblinear, and the regularization parameter C was
sampled from a Gamma distribution with parameters a=2, b=1.</p>
        <p>Table 2 shows the evaluation results. The highest MAP score for GloVe
embeddings is achieved with mean pooling, similarly to SVM. In contrast, the best
MAP scores on FastText embeddings are achieved using max pooling (whereas
mean pooling was best with SVM). Overall, the results with Logistic Regression
and GloVe &amp; FastText embeddings are moderately lower than those with SVM.</p>
        <p>The Sent2Vec embedding experiments yielded the best results with Twitter
bigram embeddings (as was the case with SVM). However, they only yielded a
tiny improvement over the highest scores using GloVe and FastText embeddings.</p>
        <p>The results using Transformer embeddings as features for Logistic Regression
are similar to the ones with SVM: in both cases, WK pooling with BERT-base
worked best, and the MAP scores were also similar. In the case of Logistic
Regression, the use of Transformer embeddings considerably improved the MAP
scores compared to the use of GloVe, FastText, and Sent2Vec embeddings.</p>
        <p>For Transformers, we used randomized search with 5-fold cross-validation for 20
iterations. We sampled the hyper-parameters uniformly as follows: the number
of training epochs from {2, 3, 4, 5}, the batch size from {2, 4, 8, 12, 16, 24, 32},
and the learning rate from {6.25e-5, 5e-5, 3e-5, 2e-5}.</p>
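<p>The uniform sampling scheme above can be sketched as follows (the actual fine-tuning loop is omitted):</p>

```python
import random

random.seed(0)
EPOCHS = [2, 3, 4, 5]
BATCH_SIZES = [2, 4, 8, 12, 16, 24, 32]
LEARNING_RATES = [6.25e-5, 5e-5, 3e-5, 2e-5]

# 20 search iterations, each drawing one hyper-parameter configuration
# uniformly at random from the grid above.
trials = [{"epochs": random.choice(EPOCHS),
           "batch_size": random.choice(BATCH_SIZES),
           "lr": random.choice(LEARNING_RATES)}
          for _ in range(20)]

print(len(trials))  # -> 20
```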
        <p>The results are shown in Table 3. Among the Transformer models, RoBERTa
achieved the highest MAP score by a wide margin and for all pre-processing
techniques. The highest score was achieved using the Corona pre-processing,
which is the best overall result. The hyper-parameters that performed the best
used 5 training epochs, a batch size of 8, and a learning rate of 3e-05.
</p>
      </sec>
      <sec id="sec-4-2">
        <title>Experiments using Tweet Metadata</title>
        <p>Finally, we used information about the tweet and its author, e.g., the number of
retweets of the target tweet, the number of friends the tweet's author has, the
number of years since the account was created, the presence of a URL in the
tweet, etc. We used these extra features by concatenating them to the validation
set predictions for the best-performing RoBERTa models, which arose from the
5-fold cross-validation. The best set of parameters for each RoBERTa model and
pre-processing is shown in Table 4.</p>
        <p>As some tweets contain a link to online news articles, we designed features
modeling the factuality of reporting of the outlets that published these news
articles. For this, we used the manual judgments from Media Bias/Fact Check (http://mediabiasfactcheck.com).
We thus derived the following nine Boolean features:</p>
        <sec id="sec-4-2-1">
<title>Is the Twitter account verified?</title>
<p>Does the tweet contain a URL?
Does the tweet contain a link to an article published by a news outlet whose
factuality of reporting is
- very high?
- high?
- mostly factual?
- mixed?
- low?
- fake news?
- conspiracy?</p>
        </sec>
        <sec id="sec-4-2-2">
          <title>As well as the following three numerical features:</title>
          <p>Natural logarithm of the number of times the tweet was retweeted;
Natural logarithm of the number of friends of the Twitter account;
Years since the registration of the Twitter account.</p>
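<p>The three numerical features can be sketched as follows (the field names and the log(1+x) guard against zero counts are our illustrative assumptions; the reference date stands in for the tweet's timestamp):</p>

```python
import math
from datetime import datetime

# A hypothetical tweet's metadata.
tweet = {"retweet_count": 120, "friends_count": 350,
         "account_created": datetime(2012, 6, 1)}

features = [
    math.log(1 + tweet["retweet_count"]),   # log-scaled number of retweets
    math.log(1 + tweet["friends_count"]),   # log-scaled number of friends
    (datetime(2020, 7, 1) - tweet["account_created"]).days / 365.25,  # account age in years
]
print([round(f, 2) for f in features])
```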
<p>We conducted experiments concatenating the RoBERTa predictions to the
first nine and also to all twelve metadata features. The results are shown in
Tables 5 and 6, respectively. Once again, we ran randomized search using SVM
and Logistic Regression on the new features, while searching the same
hyper-parameter space as described in Sections 4.1 and 4.2. We can see that Logistic
Regression performed better than SVM. Moreover, while the best MAP scores
are identical when using twelve vs. nine metadata features, using all twelve
features performed a bit better in terms of the macro-F1 score.</p>
          <p>For our primary submission, we used Logistic Regression with all twelve
metadata features in addition to the RoBERTa predictions, as described in Section 4.4.</p>
<p>For our first and second contrastive runs, we used models based solely on
RoBERTa. Our contrastive 2 run averages the test set predictions of each of the
trained RoBERTa models with identical hyper-parameters, whereas our
contrastive 1 run averages the predictions of the best-performing RoBERTa models
on the corresponding validation sets. For the contrastive 1 run, we did not
impose the restriction of having identical hyper-parameters.</p>
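<p>The averaging step of these ensembling runs can be sketched as follows (the scores are toy values; in our system they come from the trained RoBERTa models):</p>

```python
import numpy as np

# Per-model check-worthiness scores: one row per trained model,
# one column per test-set tweet.
model_scores = np.array([[0.91, 0.12, 0.55],
                         [0.85, 0.20, 0.48],
                         [0.88, 0.15, 0.60]])

ensemble = model_scores.mean(axis=0)  # one averaged score per tweet
ranking = np.argsort(-ensemble)       # tweets ranked by check-worthiness
print(ranking)  # -> [0 2 1]
```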
<p>The official evaluation results are shown in Table 7. We can see that our
primary run achieved a MAP score of 0.8034, which puts us in second place,
behind the winner by only 0.003 points absolute, i.e., we are practically
tied for first place. Our primary run achieved a higher MAP score on the
test set than on the validation sets, which is a sign of a model that
generalizes well rather than overfitting. Our first and second contrastive runs also
achieved high MAP scores on the test set, but were slightly worse.</p>
          <p>We have described our system, Team Alex, for detecting check-worthy tweets
about COVID-19, which we developed for the English version of CLEF-2020
CheckThat! Task 1. It is based on an ensemble combining deep contextualized
text representations with social context, as well as advanced pre-processing. Our
system was ranked second with a MAP score of 0.8034, almost tied with
the winning system, lagging behind by just 0.003 MAP points absolute.</p>
          <p>We further described a number of additional experiments and comparisons,
which we believe should be useful for future research as they provide some
indication about what techniques are effective for the task.</p>
          <p>
            In future work, we plan to experiment with more pre-processing techniques,
with better modeling the social context, as well as with some newer Transformer
models such as T5 [17] and Electra [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ].
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
<p>This research is part of the Tanbih project (http://tanbih.qcri.org), developed at the Qatar Computing
Research Institute, HBKU, which aims to limit the effect of "fake news",
propaganda, and media bias by making users aware of what they are reading, thus
promoting media literacy and critical thinking.</p>
<p>This research is also partially supported by the National Scientific Program
"ICTinSES", financed by the Bulgarian Ministry of Education and Science, and
by Project UNITe BG05M2OP001-1.001-0004, funded by the OP "Science and
Education for Smart Growth" and the EU via the ESI Funds.</p>
      <p>[10] Hasanain, M., Haouari, F., Suwaileh, R., Ali, Z., Hamdan, B., Elsayed, T.,
Barrón-Cedeño, A., Da San Martino, G., Nakov, P.: Overview of CheckThat!
Arabic: Automatic identification and verification of claims in social media.
In: Working Notes of CLEF 2020 - Conference and Labs of the Evaluation
Forum. CLEF '2020, Thessaloniki, Greece (2020)
[11] Hassan, N., Li, C., Tremayne, M.: Detecting check-worthy factual claims in
presidential debates. In: Proceedings of the 24th ACM International
Conference on Information and Knowledge Management. pp. 1835-1838. CIKM
'15, Melbourne, Australia (2015)
[12] Jaradat, I., Gencheva, P., Barrón-Cedeño, A., Màrquez, L., Nakov, P.:
ClaimRank: Detecting check-worthy claims in Arabic and English. In:
Proceedings of the 16th Annual Conference of the North American Chapter of
the Association for Computational Linguistics. pp. 26-30. NAACL-HLT '18,
New Orleans, Louisiana, USA (2018)
[13] Konstantinovskiy, L., Price, O., Babakar, M., Zubiaga, A.: Towards
automated factchecking: Developing an annotation schema and benchmark for
consistent automated claim detection. arXiv preprint arXiv:1809.08193 (2018)
[14] Mikolov, T., Le, Q.V., Sutskever, I.: Exploiting similarities among
languages for machine translation. arXiv preprint arXiv:1309.4168 (2013)
[15] Nakov, P., Barrón-Cedeño, A., Elsayed, T., Suwaileh, R., Màrquez, L.,
Zaghouani, W., Atanasova, P., Kyuchukov, S., Da San Martino, G.: Overview
of the CLEF-2018 CheckThat! Lab on automatic identification and
verification of political claims. In: Proceedings of the Ninth International
Conference of the CLEF Association: Experimental IR Meets Multilinguality,
Multimodality, and Interaction. pp. 372-387. Lecture Notes in Computer
Science, Springer, Avignon, France (2018)
[16] Patwari, A., Goldwasser, D., Bagchi, S.: TATHYA: A multi-classifier system
for detecting check-worthy statements in political debates. In: Proceedings
of the 2017 ACM Conference on Information and Knowledge
Management. pp. 2259-2262. CIKM '17, Singapore (2017)
[17] Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou,
Y., Li, W., Liu, P.J.: Exploring the limits of transfer learning with a unified
text-to-text transformer. arXiv preprint arXiv:1910.10683 (2019)
[18] Shaar, S., Nikolov, A., Babulkov, N., Alam, F., Barrón-Cedeño, A., Elsayed,
T., Hasanain, M., Suwaileh, R., Haouari, F., Da San Martino, G., Nakov, P.:
Overview of CheckThat! English: Automatic identification and verification
of claims in social media. In: Working Notes of CLEF 2020 - Conference and
Labs of the Evaluation Forum. CLEF '2020, Thessaloniki, Greece (2020)
[19] Vasileva, S., Atanasova, P., Màrquez, L., Barrón-Cedeño, A., Nakov, P.: It
takes nine to smell a rat: Neural multi-task learning for check-worthiness
prediction. In: Proceedings of the International Conference on Recent
Advances in Natural Language Processing. pp. 1229-1239. RANLP '19, Varna,
Bulgaria (2019)
[20] Wang, B., Kuo, C.C.J.: SBERT-WK: A sentence embedding method by
dissecting BERT-based word models. arXiv preprint arXiv:2002.06652 (2020)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Atanasova</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marquez</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
<article-title>Barrón-Cedeño,</article-title>
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Elsayed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Suwaileh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Zaghouani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            ,
            <surname>Kyuchukov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            , Da San Martino, G.,
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <surname>P.</surname>
          </string-name>
          :
<article-title>Overview of the CLEF-2018 CheckThat! Lab on automatic identification and verification of political claims, Task 1: Check-worthiness</article-title>
          . In: Cappellato,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.Y.</given-names>
            ,
            <surname>Soulier</surname>
          </string-name>
          ,
          <string-name>
            <surname>L</surname>
          </string-name>
          . (eds.)
          <source>CLEF 2018 Working Notes. Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum. CEUR Workshop Proceedings</source>
          , CEUR-WS.org, Avignon, France (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Atanasova</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karadzhov</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mohtarami</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martino</surname>
            ,
            <given-names>G.D.S.:</given-names>
          </string-name>
<article-title>Overview of the CLEF-2019 CheckThat! lab on automatic identification and verification of claims. Task 1: Check-worthiness</article-title>
          . In: Cappellato,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.</surname>
          </string-name>
          , Muller, H. (eds.)
          <source>CLEF 2019 Working Notes. Working Notes of CLEF 2019 - Conference and Labs of the Evaluation Forum. CEUR Workshop Proceedings</source>
          , CEUR-WS.org, Lugano, Switzerland (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
<surname>Barrón-Cedeño</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elsayed</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , Da San Martino, G.,
          <string-name>
            <surname>Hasanain</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suwaileh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haouari</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Babulkov</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hamdan</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nikolov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shaar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ali</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
<article-title>Overview of CheckThat! 2020: Automatic identification and verification of claims in social media</article-title>
          .
          <source>In: Proceedings of the 11th International Conference of the CLEF Association: Experimental IR Meets Multilinguality</source>
, Multimodality, and Interaction. CLEF '20, Thessaloniki, Greece (
<year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Blei</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jordan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
<article-title>Latent Dirichlet allocation</article-title>
          .
          <source>Journal of Machine Learning Research</source>
<volume>3</volume>
,
<fpage>601</fpage>
&#8211;
<lpage>608</lpage>
(
<year>2001</year>
)
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luong</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q.V.</given-names>
          </string-name>
          ,
<string-name>
  <surname>Clark</surname>
  ,
  <given-names>K.</given-names>
</string-name>
,
<string-name>
  <surname>Luong</surname>
  ,
  <given-names>M.T.</given-names>
</string-name>
,
<string-name>
  <surname>Le</surname>
  ,
  <given-names>Q.V.</given-names>
</string-name>
,
<string-name>
  <surname>Manning</surname>
  ,
  <given-names>C.D.</given-names>
</string-name>
:
<article-title>ELECTRA: Pre-training text encoders as discriminators rather than generators</article-title>
.
<source>arXiv pre-print 2003.10555</source>
(
<year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Conneau</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiela</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwenk</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barrault</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bordes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Supervised learning of universal sentence representations from natural language inference data</article-title>
          .
          <source>arXiv pre-print 1705.02364</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Elsayed</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
,
<string-name>
  <surname>Barrón-Cedeño</surname>
  ,
  <given-names>A.</given-names>
</string-name>
,
<string-name>
  <surname>Hasanain</surname>
  ,
  <given-names>M.</given-names>
</string-name>
,
<string-name>
  <surname>Suwaileh</surname>
  ,
  <given-names>R.</given-names>
</string-name>
,
<string-name>
  <surname>Atanasova</surname>
  ,
  <given-names>P.</given-names>
</string-name>
,
<string-name>
  <surname>Da San Martino</surname>
  ,
  <given-names>G.</given-names>
</string-name>
:
<article-title>CheckThat! at CLEF 2019: Automatic identification and verification of claims</article-title>
          .
          <source>In: Proceedings of the 41st European Conference on Information Retrieval</source>
. pp.
<fpage>309</fpage>
&#8211;
<lpage>315</lpage>
. ECIR '19, Cologne, Germany (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Elsayed</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
,
<string-name>
  <surname>Barrón-Cedeño</surname>
  ,
  <given-names>A.</given-names>
</string-name>
,
<string-name>
  <surname>Hasanain</surname>
  ,
  <given-names>M.</given-names>
</string-name>
,
<string-name>
  <surname>Suwaileh</surname>
  ,
  <given-names>R.</given-names>
</string-name>
,
<string-name>
  <surname>Da San Martino</surname>
  ,
  <given-names>G.</given-names>
</string-name>
,
<string-name>
  <surname>Atanasova</surname>
  ,
  <given-names>P.</given-names>
</string-name>
          :
<article-title>Overview of the CLEF-2019 CheckThat!: Automatic identification and verification of claims</article-title>
.
<source>In: Experimental IR Meets Multilinguality, Multimodality, and Interaction. LNCS</source>
, Springer, Lugano, Switzerland (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Gencheva</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
<surname>Màrquez</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
,
<string-name>
  <surname>Barrón-Cedeño</surname>
  ,
  <given-names>A.</given-names>
</string-name>
,
<string-name>
  <surname>Koychev</surname>
  ,
  <given-names>I.</given-names>
</string-name>
:
          <article-title>A context-aware approach for detecting worth-checking claims in political debates</article-title>
          .
          <source>In: Proceedings of the Conference Recent Advances in Natural Language Processing</source>
. pp.
<fpage>267</fpage>
&#8211;
<lpage>276</lpage>
. RANLP '17, Varna, Bulgaria (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>