<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>bigIR at CLEF 2018: Detection and Verification of Check-Worthy Political Claims</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Khaled Yasser</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mucahid Kutlu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tamer Elsayed</string-name>
          <email>telsayed@qu.edu.qa</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Qatar University</institution>
          ,
          <addr-line>Doha</addr-line>
          ,
          <country country="QA">Qatar</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>With the enormous amount of misinformation spread over the Internet, manual fact-checking is no longer feasible to prevent its negative impact. There is an urgent need for automated systems that can make the fact-checking process faster and effectively detect the veracity of claims. In this paper, we present our participation in the two tasks of the CLEF-2018 CheckThat! Lab. To rank claims based on their check-worthiness (Task 1), we propose a learning-to-rank approach with features extracted by natural language processing tools such as named entity recognition and sentiment analysis. For veracity prediction (Task 2), we propose using an external Web search engine to retrieve potentially-relevant Web pages and extracting features from relevant segments of those pages to predict the veracity. In the official evaluation, our best performing runs for Task 1 are ranked 4th (out of 16 runs from 8 teams) and 1st (out of 5 runs from 2 teams) on the English and Arabic datasets, respectively, while our best performing run for Task 2 is ranked 6th (out of 10 runs from 5 teams) on the English dataset.</p>
      </abstract>
      <kwd-group>
        <kwd>Fact-Checking</kwd>
        <kwd>Check-Worthiness</kwd>
        <kwd>Veracity Prediction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Check-Worthiness</title>
    </sec>
    <sec id="sec-2">
      <title>Veracity Prediction.</title>
      <p>In the wake of the widespread dissemination of misinformation on the Internet and in the news, there emerged a need to combat this phenomenon as effectively as possible. However, despite the advent of technology, human fact-checkers cannot keep up with the pace at which misinformation is perpetuated and spread. As a result, there is a need to develop automated systems that help combat false claims and misinformation.</p>
      <p>
        The CLEF-2018 CheckThat! Lab [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] introduced two tasks which address two important aspects of fact-checking systems. The main goal of the first task [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is to detect the check-worthy claims in political debates and prioritize them based on their check-worthiness. Such a system has two benefits: (1) filtering the claims to be automatically fact-checked, and (2) helping human fact-checkers prioritize the claims to be fact-checked so that they can focus on the most important ones. The second task [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] aims at predicting the veracity of claims automatically, which is the ultimate goal of fact-checking systems. The CheckThat! Lab released both English and Arabic datasets for the two tasks. In this paper, we present our approaches to tackle these two important problems in the context of the lab.
      </p>
      <p>For the check-worthiness task, we use a learning-to-rank approach with features extracted for each sentence in the debates. The features include word embeddings, types of named entities, part-of-speech tags, and the sentiment and topic of sentences. We evaluated the impact of each feature group on the training data and eventually submitted (for the test phase) runs of 3 models, each using a different set of feature groups. Our best performing model, which uses only the named entity, sentiment, and topic features, is ranked 4th in the official evaluation. For the Arabic dataset, we automatically translated the dataset into English and used the same models. Our best performing model ranked 1st in the official evaluation, but only 2 groups participated in the Arabic data challenge.</p>
      <p>For the factuality task, we first retrieve potentially-relevant Web pages using a commercial Web search engine, excluding the pages not allowed (by the lab organizers) for the task. Next, we detect the relevant segments within the pages and use them to extract features for each claim. Our features include the percentages of sentences confirming and contradicting the claim, as well as the stances of relevant segments. Our best performing run is ranked 6th in the official evaluation for the English dataset; we did not participate in the Arabic data challenge for this task.</p>
    </sec>
      <sec id="sec-2-1">
        <title>Task 1: Check-Worthiness</title>
        <p>
          In Task 1 [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], the goal is to rank the sentences made in presidential debates according to their likelihood of being check-worthy. In this section, we describe our approach and present the evaluation results.
        </p>
        <sec id="sec-2-1-1">
          <title>Proposed Approach</title>
          <p>Prioritizing claims based on their check-worthiness is a ranking problem. Therefore, our approach relies on learning-to-rank (L2R) methods. There are two versions of the dataset, one in English and another in Arabic. We first explain our approach for the English dataset, then how we modify it for the Arabic dataset.</p>
          <p>
            Learning-to-Rank Model. We propose an L2R model for this task because the goal is to rank claims based on their check-worthiness rather than to classify them. We focus on pairwise and point-wise L2R models since we do not have a list of queries. We use the RankLib L2R toolkit (https://sourceforge.net/p/lemur/wiki/RankLib/) in our implementation. In our experiments on the training data, we observed that the MART algorithm [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ] outperforms the other L2R algorithms we tried (i.e., RankBoost and RankNet). Therefore, we opted for the MART algorithm in the testing phase.
          </p>
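          <p>Concretely, a MART ranker can be trained through RankLib's command-line interface, where ranker type 0 selects MART. The following is a minimal sketch of this step from Python, with hypothetical file names (train.txt, model.txt); it is an illustration of the tool's usage, not our exact pipeline:</p>
          <preformat>
import subprocess

# Train a MART ranker with RankLib (ranker type 0 = MART).
# -tree, -leaf, and -shrinkage mirror the parameters discussed in the paper
# (100 trees, 5 leaves, 0.1 learning rate); file names are hypothetical.
subprocess.run(
    [
        "java", "-jar", "RankLib.jar",
        "-train", "train.txt",   # training data in RankLib's SVMlight-style format
        "-ranker", "0",          # 0 = MART (gradient boosted regression trees)
        "-metric2t", "MAP",      # metric to optimize during training
        "-tree", "100",          # maximum number of trees
        "-leaf", "5",            # number of leaves per tree
        "-shrinkage", "0.1",     # learning rate
        "-save", "model.txt",    # where to store the learned model
    ],
    check=True,
)
          </preformat>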
          <p>Features. A sentence in the dataset can be a full or half sentence. Because the data is extracted from debates, there are sentences phrased in everyday language, sentences without any verb because the speaker was interrupted by another person, and sentences that are just continuations of previously interrupted sentences. While this kind of challenge suggests considering the sentences before and/or after each sentence during feature extraction, we chose to ignore this contextual data and consider only the individual sentences, because features extracted from the context would overlap with other statements, which could introduce noise. The features we extract for each sentence fall into 5 categories: (1) vector representation of sentences, (2) named entities, (3) part-of-speech tags, (4) sentiment, and (5) topic features. We explain each of these feature groups next (a short feature-extraction sketch follows the list).</p>
          <p>- Vector Representation of Sentence: We use Word2Vec to represent sentences. Due to the small sample size of our training data, we use an embedding-based vector representation instead of a term-based one. This allows us to capture similar sentences, not just ones that use the exact same terms, and it has a lower dimensionality than a term-based representation. We used a model pre-trained on Google News (https://code.google.com/archive/p/word2vec/) where each vector has 300 components, and represented each sentence by the average of the vectors of its terms.
- Types of Named Entities: Check-worthy claims usually contain named entities such as organizations, countries, and people. However, not all types of entities are helpful for identifying check-worthy claims. For example, a claim which mentions the name of an international company is more likely to be check-worthy than a claim which mentions the name of a local music group, although both are valid named entity types. We represent the types of named entities as an n-dimensional vector where each dimension corresponds to a type. The vector contains binary features reflecting the existence or absence of a certain entity type in a sentence. In order to detect the named entity types, we use the IBM Watson API for Natural Language Understanding (https://www.ibm.com/watson/services/natural-language-understanding/), which yields a vector of 26 features. Some of the available entity tags are: person, organization, country, and geographical entity.
- Part-of-Speech (POS) Tags: There can be structural aspects of sentences that help distinguish check-worthiness. For example, a sentence written in the future tense is less likely to be check-worthy than a sentence written in the past tense. Therefore, we use POS tags to capture the sentence structure of claims. We represent POS tags as a vector of binary features where each feature corresponds to the existence or absence of a certain tag. We use Stanford CoreNLP [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ] to extract the POS tags. The POS feature vector consists of 36 features.
- Sentiment: We use the sentiment of sentences (i.e., whether the sentence presents a positive, negative, or neutral attitude) as a feature. We extract sentiment labels using Stanford CoreNLP [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ], but we collapse the different grades of positive and negative labels (e.g., very positive and very negative) to positive and negative, respectively.
- Topic: The topic of a sentence can indicate whether it is worth checking. For example, a sentence about the use of nuclear weapons is more check-worthy than one about the release date of a movie. Therefore, we extracted the topics of sentences using the IBM Watson API, in which a sentence can be classified into multiple topics and each topic can be fine-grained up to three levels. In our implementation, we consider only topics with a confidence score of 0.5 or higher and use only the first two levels; after manually inspecting the available topics, we found the third level too fine-grained for such a small dataset. The topics, just like the types of named entities and POS tags, are represented as a vector of binary features, but with a dimensionality of 348.
          </p>
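          <p>To make the representation concrete, here is a minimal sketch (assuming the gensim library and the pre-trained Google News vectors; helper names are illustrative, not our actual code) of mapping a sentence to an averaged embedding and a tag set to a binary vector:</p>
          <preformat>
import numpy as np
from gensim.models import KeyedVectors

# Pre-trained Google News embeddings, 300 dimensions (as used in the paper).
w2v = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

def sentence_vector(sentence):
    """Average the Word2Vec vectors of the sentence's in-vocabulary terms."""
    vectors = [w2v[term] for term in sentence.split() if term in w2v]
    return np.mean(vectors, axis=0) if vectors else np.zeros(300)

def binary_tag_vector(tags_found, all_tags):
    """Existence/absence vector, as used for entity types, POS tags, and topics."""
    return np.array([1.0 if tag in tags_found else 0.0 for tag in all_tags])

# Concatenate the groups into one feature vector (toy tag inventory shown).
features = np.concatenate([
    sentence_vector("He bought weapons"),
    binary_tag_vector({"person"}, ["person", "organization", "country"]),
])
          </preformat>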
          <p>Feature Selection. After constructing the feature vectors for all sentences, we performed two-tier feature selection to choose the most discriminative features. The first tier is group-level selection, in which we evaluate the impact of each feature group mentioned above in a leave-one-out fashion. For example, we test a model with and without topic features to see their impact on the results. The first tier was done manually by trying different combinations of feature groups (see Section 2.2). The second tier adopts variance-threshold feature selection, where we remove features whose variance across the samples is below a certain threshold. We used a threshold of 0 to eliminate features which have the same value across all samples.</p>
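          <p>The second tier can be implemented with, for example, scikit-learn's VarianceThreshold; a minimal sketch with a placeholder feature matrix:</p>
          <preformat>
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# X: (n_sentences, n_features) matrix built from the feature groups above.
X = np.random.rand(100, 410)   # placeholder feature matrix
X[:, 7] = 1.0                  # a constant (zero-variance) feature

# threshold=0.0 drops features that take the same value in every sample.
selector = VarianceThreshold(threshold=0.0)
X_reduced = selector.fit_transform(X)
print(X.shape, "->", X_reduced.shape)  # the constant column is removed
          </preformat>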
          <p>Handling Class Imbalance. We observed that the training dataset given for Task 1 is highly imbalanced: only 3% of the statements have been judged as check-worthy. This could cause problems on the test data due to the lack of enough samples of check-worthy claims. Therefore, we oversample the positive samples by duplication. In our experiments with the training data, we tried multiple oversampling values and observed that an oversampling factor of 400% works best on the training data.</p>
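          <p>A minimal sketch of oversampling by duplication, with placeholder data; here we read the 400% factor as appending four extra copies of each positive sample, which is one plausible interpretation:</p>
          <preformat>
import pandas as pd

# Placeholder training data with a binary check-worthiness label (~3% positives
# in the real dataset).
df = pd.DataFrame({
    "sentence": ["claim a", "claim b", "small talk", "greeting"],
    "check_worthy": [1, 0, 0, 0],
})

# Duplicate the positive samples (400% oversampling factor).
positives = df[df["check_worthy"] == 1]
oversampled = pd.concat([df] + [positives] * 4, ignore_index=True)
          </preformat>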
          <p>
            Arabic. All the steps described so far were applied to the English dataset. One of the main challenges for the Arabic dataset is that there are no suitable NLP tools that we can use to extract the features. Therefore, we opted for machine translation. Mohammad et al. [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ] discussed the effect of machine translation on sentiment analysis and showed that models trained on a different language yield results comparable to those trained on the same language. Therefore, we first translate the Arabic dataset to English using the Google Translate API (https://cloud.google.com/translate/docs/) and then apply the same model trained on the English dataset.
          </p>
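          <p>A minimal sketch of the translation step, assuming the google-cloud-translate client library and configured credentials (the helper name is illustrative):</p>
          <preformat>
from google.cloud import translate_v2 as translate

# Requires Google Cloud credentials (GOOGLE_APPLICATION_CREDENTIALS).
client = translate.Client()

def to_english(arabic_sentence):
    """Translate an Arabic sentence to English before feature extraction."""
    result = client.translate(arabic_sentence,
                              source_language="ar",
                              target_language="en")
    return result["translatedText"]
          </preformat>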
        </sec>
        <sec id="sec-2-1-2">
          <title>Evaluation</title>
          <p>In order to pick the models to use in the test phase, we performed k-fold cross-validation to evaluate the different models. Testing the models on data from the same debates they were trained on might not give proper insight into how they would perform on unseen data; the debates could share similar topics, mention the same entities, or have similar syntactic styles. Therefore, we set up the folds such that each contains the data from a separate debate.</p>
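          <p>This per-debate folding can be expressed with scikit-learn's LeaveOneGroupOut, using the debate id as the group; a sketch with placeholder data:</p>
          <preformat>
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

X = np.random.rand(12, 5)             # placeholder sentence feature vectors
y = np.random.randint(0, 2, size=12)  # placeholder check-worthiness labels
debate_ids = np.repeat([0, 1, 2], 4)  # which debate each sentence came from

# One fold per debate: no debate appears in both training and test data.
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=debate_ids):
    X_train, X_test = X[train_idx], X[test_idx]
    # ... train the MART model on X_train, compute AP on X_test ...
          </preformat>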
          <p>In our initial (unreported) experiments, we observed that MART outperforms the other L2R algorithms when the maximum number of trees and the learning rate are set to 100 and 0.1, respectively. Subsequently, we evaluated the impact of each feature group using the MART algorithm while also varying the number-of-leaves parameter, trying values from 5 to 12 inclusive. Table 1 shows the Average Precision (AP) scores of the two best-performing models for each group of features on the English dataset.</p>
          <p>Features                     Number of leaves   AP on test
All - {POS tags, Word2Vec}   5                  0.399
All - {POS tags, Word2Vec}   6                  0.347</p>
          <p>Table 1. Results of training MART models with different groups of features and numbers of leaves for Task 1. The maximum number of trees in the training stage is set to 100. The rows in bold represent the models selected for the test phase.</p>
          <p>As shown in Table 1, the best performing model is the one trained without Word2Vec vectors as features, using trees with only 5 leaves. Aside from Word2Vec, models trained without POS features also outperform the models obtained by leaving out any other single feature group. We selected the top 3 models (shown in bold in the table) for the submission on both the English and Arabic datasets, without picking two models with the same feature groups.</p>
          <p>In the official evaluation on the English dataset, our selected models All - {Word2Vec}, All - {POS tags, Word2Vec}, and All - {POS tags} achieved MAP values of 0.1120 (10th in the ranking), 0.1319 (4th in the ranking), and 0.1117 (11th in the ranking), respectively. On the Arabic dataset, our models achieved MAP scores of 0.0899 (4th in the ranking), 0.1498 (1st in the ranking), and 0.0962 (3rd in the ranking), respectively. This clearly shows that the second model (i.e., All - {POS tags, Word2Vec}) outperforms the other two models on both datasets. Interestingly, it performed better on the Arabic dataset than it did on the English one (0.1498 vs. 0.1319).</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Task 2: Factuality</title>
        <p>
          After finding which claims are more important to check, Task 2 [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] focuses on the second stage of automated fact-checking, which is detecting the veracity of check-worthy claims.
        </p>
        <sec id="sec-2-2-1">
          <title>Proposed Approach</title>
          <p>
            We approach this task as a classification problem. For a given claim, we first find potentially-relevant Web pages using an external Web search engine. Then we detect the relevant segments in each of those pages and extract features from these segments. Finally, we predict the veracity of the claim using a learning model based on the extracted features. We discuss our approach in detail next.
          </p>
          <p>
            Web Search. The first step of our approach is retrieving search results for the claim. We follow the approach described by Karadzhov et al. [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ] to generate a query from a given claim. We use Google Custom Search (https://cse.google.com/cse/) to retrieve the results, with custom settings to filter out the websites that are not allowed for the task. We retrieve 10 results for each claim.
          </p>
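          <p>A minimal sketch of this retrieval step, using the Custom Search JSON API with hypothetical credentials; the disallowed sites are assumed to be excluded in the search engine's configuration:</p>
          <preformat>
import requests

# Hypothetical credentials: an API key and the id of a Custom Search Engine
# configured to exclude the sites disallowed by the lab organizers.
API_KEY, CSE_ID = "...", "..."

def search(claim_query, n=10):
    """Retrieve the top-n result links for a claim query."""
    resp = requests.get(
        "https://www.googleapis.com/customsearch/v1",
        params={"key": API_KEY, "cx": CSE_ID, "q": claim_query, "num": n},
    )
    resp.raise_for_status()
    return [item["link"] for item in resp.json().get("items", [])]
          </preformat>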
          <p>Relevant Segment Detection. The goal of this step is to find which parts of a Web page are relevant to the claim of interest. We compute the cosine similarity between the Word2Vec vectors of the claim and of each sentence in the page. We consider a sentence relevant if the cosine similarity score is higher than 0.5. Next, for each relevant sentence, we add the sentence before it and the sentence after it in order to capture contextual information. We call these three-sentence structures relevant segments.</p>
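          <p>A minimal sketch of segment detection, reusing an averaged-embedding helper like the one in the Task 1 sketch (names are illustrative):</p>
          <preformat>
import numpy as np

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return 0.0 if denom == 0 else float(np.dot(u, v) / denom)

def relevant_segments(claim_vec, page_sentences, sentence_vector, threshold=0.5):
    """Return three-sentence windows around sentences similar to the claim."""
    segments = []
    for i, sent in enumerate(page_sentences):
        if cosine(claim_vec, sentence_vector(sent)) > threshold:
            window = page_sentences[max(0, i - 1): i + 2]  # sentence +/- 1
            segments.append(" ".join(window))
    return segments
          </preformat>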
          <p>
            Features. We extract features from each relevant segment. We first explain the features and how they are extracted, then we explain how we aggregate the features from the different pages. The features are as follows:
- Stance Detection: Stance detection is the process of finding whether a piece of text is for, against, or unrelated to another piece of text. Following the Fake News Challenge (http://www.fakenewschallenge.org/), where the challenge was to check whether an article supports or denies a claim using stance detection, we incorporate stance detection into our method (see the aggregation sketch after this list). We use the implementation provided by [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ] and train it on the entire Emergent dataset [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ]. We first classify each relevant segment in all pages separately and then calculate the percentage of each label, yielding a feature vector of size 3.
- Contradiction in Predicates: A stance can be a subjective statement regarding a claim. For instance, let the claim be "He bought weapons" and a sentence extracted from a document be "I do not want to believe that he bought weapons." While the sentence is against the claim, it does not state whether the claim is actually true or not. On the other hand, consider the following sentence: "He forgot to buy weapons." The predicate of the claim ("buy") contradicts the one in this sentence. With this feature, we try to detect whether the predicate of a relevant segment contradicts the predicate of the given claim. In order to extract such a relation, we use TruthTeller [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ] in combination with WordNet (https://wordnet.princeton.edu/).
          </p>
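          <p>As referenced in the list above, a minimal sketch of turning per-segment stance labels into the size-3 feature vector; predict_stance is a hypothetical wrapper around the classifier of [10], not part of that implementation's actual interface:</p>
          <preformat>
from collections import Counter

def stance_features(claim, segments, predict_stance):
    """Percentage of segments labeled 'for', 'against', and 'unrelated'."""
    labels = [predict_stance(segment, claim) for segment in segments]
    counts = Counter(labels)
    total = max(len(labels), 1)
    return [counts[label] / total for label in ("for", "against", "unrelated")]
          </preformat>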
          <p>In particular, we first generate a relation matrix R which represents the relation between two sentences. R(i, j) indicates the relation between term i in the first sentence and term j in the other, where the relation can be synonyms, antonyms, or unrelated. We compare the lemmas of the terms and assign the label synonyms if the two lemmas are the same. Furthermore, we classify each term in a sentence using TruthTeller, where a term can be positive (i.e., the term is affirmed), negative (i.e., the term is negated in some way), uncertain (i.e., there is doubt around the term), or unknown (i.e., a decision could not be made). We detect the relative truth of term j in sentence B based on term i in sentence A, given that they have a relation, using the following rules:</p>
          <p>If R(i, j) = synonyms and T(A_i) = T(B_j) → confirms
If R(i, j) = synonyms and T(A_i) ≠ T(B_j) → contradicts
If R(i, j) = antonyms and T(A_i) = T(B_j) → contradicts
If R(i, j) = antonyms and T(A_i) ≠ T(B_j) → confirms</p>
          <p>Figures 1 and 2 show an example of a relation matrix and the truth predicates for the sentences "He bought weapons" and "He forgot to buy weapons", respectively. The terms "buy" and "bought" share the same lemma, so their relation is set to synonyms. The two terms are synonyms with opposite truth values, positive and negative, respectively. Following the aforementioned rules, the two sentences contradict one another. The feature vector for predicates consists of two features: the percentage of confirming predicates and the percentage of contradicting ones. We exclude uncertain and unknown truth values to simplify the process, since they do not offer any value (a sketch of these rules follows).</p>
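          <p>A minimal sketch of these rules; relation and truth are hypothetical helpers standing in for the lemma/WordNet comparison and the TruthTeller annotations, respectively:</p>
          <preformat>
def predicate_relation(term_a, sent_a, term_b, sent_b, relation, truth):
    """Apply the confirm/contradict rules to a pair of terms.

    relation(term_a, term_b) is assumed to return 'synonyms', 'antonyms', or
    'unrelated'; truth(term, sentence) is assumed to return 'positive',
    'negative', 'uncertain', or 'unknown'.
    """
    rel = relation(term_a, term_b)
    t_a, t_b = truth(term_a, sent_a), truth(term_b, sent_b)
    if rel == "unrelated" or "uncertain" in (t_a, t_b) or "unknown" in (t_a, t_b):
        return None  # uncertain/unknown truth values are excluded
    if rel == "synonyms":
        return "confirms" if t_a == t_b else "contradicts"
    # rel == "antonyms"
    return "contradicts" if t_a == t_b else "confirms"
          </preformat>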
          <p>Figure 1. The relation matrix R(A, B) for A = "He bought weapons" (columns) and B = "He forgot to buy weapons" (rows):

R(A, B) = [ synonyms    unrelated   unrelated ]
          [ unrelated   unrelated   unrelated ]
          [ unrelated   unrelated   unrelated ]
          [ unrelated   synonyms    unrelated ]
          [ unrelated   unrelated   synonyms  ]</p>
          <p>The results of both stages, stance detection and contradiction in predicates, from all pages are aggregated into a single vector of features.
This final vector carries the following 5 features: the percentage of segments supporting the claim, the percentage of segments that are against the claim, the percentage of segments unrelated to the claim, the percentage of sentences with contradicting predicates, and the percentage of sentences with confirming predicates.</p>
          <p>The evaluation process for Task 2 is similar to that of Task 1. We performed k-fold cross-validation where each fold corresponds to the claims from one debate. The optimization and evaluation for Task 2 were done on model prediction accuracy. We tried three common classifiers for this problem and performed grid-search optimization for each classifier over all of its parameters. The results are reported in Table 2.</p>
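          <p>A sketch of this model-selection step, assuming scikit-learn implementations of the three classifiers and illustrative parameter grids (the paper does not list the exact grids searched):</p>
          <preformat>
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, LeaveOneGroupOut
from sklearn.svm import SVC

X = np.random.rand(12, 5)             # placeholder claim feature vectors
y = np.random.randint(0, 2, size=12)  # placeholder veracity labels
debate_ids = np.repeat([0, 1, 2], 4)  # which debate each claim came from

# Illustrative grids; the actual values searched are not listed in the paper.
models = {
    "svm": (SVC(), {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}),
    "logreg": (LogisticRegression(max_iter=1000), {"C": [0.1, 1, 10]}),
    "random_forest": (RandomForestClassifier(), {"n_estimators": [50, 100, 200]}),
}

for name, (clf, grid) in models.items():
    # One CV fold per debate, mirroring the Task 1 evaluation setup.
    folds = list(LeaveOneGroupOut().split(X, y, groups=debate_ids))
    search = GridSearchCV(clf, grid, scoring="accuracy", cv=folds)
    search.fit(X, y)
    print(name, search.best_score_, search.best_params_)
          </preformat>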
          <p>Although stance features are more commonly used, the results in Table 2 show that using contradiction in predicates, by itself or in addition to stance, has the potential to improve classification accuracy.</p>
          <p>The selected models, highlighted in bold in Table 2, are SVM with contradiction-in-predicates features, logistic regression with all features, and random forest with stance features. The metric in the official evaluation was Mean Squared Error (MSE), and our models received scores of 0.9640, 0.9640, and 0.9425, respectively. The first two models landed in last place, while the third landed in 6th place (out of 10).</p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>Conclusion</title>
        <p>In this paper, we presented our methods for the tasks of the CLEF-2018 CheckThat! Lab. For the task of detecting check-worthy claims, we proposed a learning-to-rank based approach which uses natural language processing methods to extract the features. Our best performing model achieved 4th place on the English dataset. The same model, when used in conjunction with machine translation, achieved 1st place in the evaluation on the Arabic dataset.</p>
        <p>Regarding the task of verifying the check-worthy claims, we proposed a system which uses an external Web search engine to collect evidence and then predicts the veracity of the claims by detecting the stance of relevant statements and the contradicting or confirming sentences in the retrieved pages. Our best performing model achieved 6th place. However, we observed promising results for the contradiction-in-predicates feature, suggesting that it can be a useful feature for fact-checking with more refined methods, though it needs further experiments on larger datasets.</p>
      </sec>
      <sec id="sec-2-4">
        <title>Acknowledgments</title>
        <p>This work was made possible by NPRP grant# NPRP 7-1313-1-245 and NPRP
grant# 7-1330-2-483 from the Qatar National Research Fund (a member of Qatar
Foundation). Statements made herein are solely the responsibility of the authors.</p>
      </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. Atanasova, P., Màrquez, L., Barrón-Cedeño, A., Elsayed, T., Suwaileh, R., Zaghouani, W., Kyuchukov, S., Da San Martino, G., Nakov, P.: Overview of the CLEF-2018 CheckThat! Lab on automatic identification and verification of political claims. Task 1: Check-worthiness</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. Barrón-Cedeño, A., Elsayed, T., Suwaileh, R., Màrquez, L., Atanasova, P., Zaghouani, W., Kyuchukov, S., Da San Martino, G., Nakov, P.: Overview of the CLEF-2018 CheckThat! Lab on automatic identification and verification of political claims. Task 2: Factuality</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. Ferreira, W., Vlachos, A.: Emergent: a novel data-set for stance classification. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. pp. 1163-1168 (2016)</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. Friedman, J.H.: Greedy function approximation: a gradient boosting machine. Annals of Statistics, pp. 1189-1232 (2001)</mixed-citation>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. Karadzhov, G., Nakov, P., Màrquez, L., Barrón-Cedeño, A., Koychev, I.: Fully automated fact checking using external sources. arXiv preprint arXiv:1710.00341 (2017)</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. Lotan, A., Stern, A., Dagan, I.: TruthTeller: Annotating predicate truth. In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. pp. 752-757 (2013)</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>7. Manning, C.D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S.J., McClosky, D.: The Stanford CoreNLP natural language processing toolkit. In: Association for Computational Linguistics (ACL) System Demonstrations. pp. 55-60 (2014), http://www.aclweb.org/anthology/P/P14/P14-5010</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>8. Mohammad, S.M., Salameh, M., Kiritchenko, S.: How translation alters sentiment. Journal of Artificial Intelligence Research 55, 95-130 (2016)</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>9. Nakov, P., Barrón-Cedeño, A., Elsayed, T., Suwaileh, R., Màrquez, L., Zaghouani, W., Atanasova, P., Kyuchukov, S., Da San Martino, G.: Overview of the CLEF-2018 CheckThat! Lab on automatic identification and verification of political claims. In: Bellot, P., Trabelsi, C., Mothe, J., Murtagh, F., Nie, J., Soulier, L., Sanjuan, E., Cappellato, L., Ferro, N. (eds.) Proceedings of the Ninth International Conference of the CLEF Association: Experimental IR Meets Multilinguality, Multimodality, and Interaction. Lecture Notes in Computer Science, Springer, Avignon, France (September 2018)</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>10. Riedel, B., Augenstein, I., Spithourakis, G.P., Riedel, S.: A simple but tough-to-beat baseline for the Fake News Challenge stance detection task. arXiv preprint arXiv:1707.03264 (2017)</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>