<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>TheEarthIsFlat's Submission to CLEF'19 CheckThat! Challenge</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Luca Favano</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mark J. Carman</string-name>
          <email>mark.carman@polimi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pier Luca Lanzi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Politecnico di Milano MI 20133</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
<p>This report details our investigations in applying state-of-the-art pre-trained Deep Learning models to the problems of Automated Claim Detection and Fact Checking, as part of the CLEF'19 Lab: CheckThat!: Automatic Identification and Verification of Claims. The report provides an overview of the experiments performed on these tasks, which continue to be extremely challenging for current technology. The research focuses mainly on the use of pre-trained deep neural text embeddings that, through transfer learning, can allow for improved classification performance on small and unbalanced text datasets. We also investigate the effectiveness of external data sources for improving prediction accuracy on the claim detection and fact checking tasks. Our team submitted runs for every task/subtask of the challenge. The results appeared satisfactory for Task 1 and promising but less satisfactory for Task 2. A detailed explanation of the steps performed to obtain the submitted results is provided, including comparison tables between our submissions and other techniques investigated.</p>
      </abstract>
      <kwd-group>
        <kwd>Automated Fact Checking</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Claim Detection</kwd>
        <kwd>Text Classification</kwd>
        <kwd>Deep Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        In this report we describe our efforts to use state-of-the-art pre-trained deep
neural text embeddings for tackling the different subtasks of the CheckThat!
challenge [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In order to achieve good results, a great number of experiments
were performed. In the following sections we provide descriptions and results for
the most interesting of these experiments, in the hope of inspiring future research
in this area. In Section 2 we explain all the steps that led to our final
submission for Task 1, from the choice of the architecture to the fine-tuning of
the chosen setup. In Section 3 we explain the text pair classification approach
that we applied to the subtasks of Task 2.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 Task 1 - Check-Worthiness</title>
      <p>Table 1. An example dialog section from one of the debates1 (the check-worthiness labels are omitted here): Sanders: "And what has happened there is absolutely unacceptable." | Maddow: "Senator, thank you." | Todd: "Secretary Clinton, let me turn to the issue of trade." | Todd: "In the '90s you supported NAFTA." | Todd: "But you opposed it when you ran for the president in 2008."</p>
      <p>1 Sample sentences extracted from the file "20160209-msnbc-dem".</p>
      <p>
        The first task [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] for the CheckThat challenge involved classifying individual
statements within political debates as check-worthy (i.e. constituting a claim
that is worth fact checking) or not check-worthy. The training data consisted of
19 debates, while the test data contained seven. An example section from one
of the debates1 is shown in Table 1. Note that each debate is a dialog with the
speaker information available for each utterance.
Recent years have seen a proliferation of pre-trained embeddings for language
modeling and text classification tasks, starting from basic word embeddings such
as word2vec [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and GloVe [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], and moving to sub-word and character-level
embeddings like FastText [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. More recently pre-trained deep networks have
become available, which make use of BiLSTM [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] or self-attention layers [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] to
build deep text processing models like ELMo [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and BERT [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. These models
offer improved transfer learning ability, taking advantage of massive corpora of
unlabeled text data from the Web to learn the structure of language, and then
leveraging that knowledge to identify better features and improve prediction
performance on subsequent supervised learning tasks.
      </p>
      <p>
        In this work, we make use of a number of state-of-the-art pre-trained models
for text-processing, namely: BERT [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], ELMo [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], InferSent [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], FastText [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ],
and the Universal Sentence Encoder (USE) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>When competing in the challenge we first ran a preliminary experiment over
validation data comparing the performance of these toolkits in order to decide
which one to use for our submission. We repeated this comparison after the
annotated test set for the challenge was published, so that we could provide
results on the held-out test data. Those test results for Task 1 are reported in
Table 2. Note that default (hyper)parameters were used for each system, with
the exception of the number of training steps (or epochs), which was set based
on validation performance.</p>
      <p>Results of the preliminary experiment indicated that the Universal Sentence
Encoder (USE) was a model that could provide reasonable performance for the
claim detection task. We then investigated a number of different settings for how
to train a USE-based classifier and how to modify the training dataset in order
to improve prediction performance. The modifications to the training dataset
considered included appending speaker information or previous utterances to
the input and also the use of external training data.</p>
      <p>For the classification task, the network architecture used was to append a
fully connected Feed-Forward (FF) Neural Network with two hidden layers to the
output from the Universal Sentence Encoder. The training (hyper)parameters
for the network were set to the values shown in Table 3. Note that the weights
of the USE encoding were not fine-tuned2 during training of the classifier, due
to the small quantities of labelled training data available.</p>
      <p>2 Investigations with the parameter Trainable set to true resulted in degraded
performance.</p>
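      <p>To make this setup concrete, the following is a minimal sketch using TensorFlow
1.x Estimators and TF-Hub. The module URL, the hidden-layer sizes (512/128, as
used for our primary run) and the Adagrad optimiser follow the text, while the
input function and learning rate are illustrative assumptions.</p>
      <preformat>
import tensorflow as tf
import tensorflow_hub as hub

# USE embedding column; trainable=False, since fine-tuning the encoder
# weights degraded performance on the small labelled dataset.
embedded_text = hub.text_embedding_column(
    key="sentence",
    module_spec="https://tfhub.dev/google/universal-sentence-encoder/2",
    trainable=False)

# Feed-forward classifier with two hidden layers on top of the USE encoding.
estimator = tf.estimator.DNNClassifier(
    hidden_units=[512, 128],
    feature_columns=[embedded_text],
    n_classes=2,
    optimizer=tf.train.AdagradOptimizer(learning_rate=0.003))

# train_input_fn is an assumed input function yielding a dict
# {"sentence": batch_of_strings} together with 0/1 check-worthiness labels.
# estimator.train(input_fn=train_input_fn, steps=600)
      </preformat>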
      <p>
        The following experimental setups were evaluated. We report results for each
setting on the test data (not available at the time of run submission) in Table 4.
1. Training on the Task 1 dataset only, using each individual sentence alone as the
input text.
2. Same as setup 1, but concatenating the speaker information to the sentence
text.
3. Same as setup 1, but using as input the concatenation of the two previous
sentences with the current sentence.
4. Same as setup 1, but applying basic text pre-processing (sketched after this
list), in which contractions in the text are expanded and the text is stripped of
accented characters, special characters and extra white spaces, and then
converted to lower-case.
5. Same as setup 1, but activating the Trainable parameter of the USE module
to fine-tune the weights of the sentence encoder.
6. Supplementing the Task 1 dataset with additional positive examples
extracted from the LIAR dataset [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. The LIAR dataset contains a set of
political sentences from various sources that have been fact-checked by
PolitiFact3 and assigned a truth label. It is safe to assume that all the sentences
included in the LIAR dataset were once considered worthy of fact checking.
Based on this assumption, all the sentences in the dataset make for a valid
set of additional positive instances for the fact-checking task. Moreover, there
is a strong motivation for adding positive examples to the Task 1 training
set, since the training data is highly skewed toward the negative class, with
only a small percentage of positive training instances. An obvious
limitation of this idea is that by adding only positive instances which come from
a different source than the training data (and therefore may not share the
same vocabulary distribution), we may simply end up training the classifier
to distinguish between instances from the two datasets (the Task 1 political
debate instances and the LIAR fact-checked claims dataset).
7. Training first on the LIAR dataset [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], but keeping the 0 and 1 labels the
same as they were in the original LIAR dataset (where 1 indicates a false
statement and 0 indicates a true statement), and then training again on the Task
1 dataset.
8. Training on a much larger Headlines+Wikipedia dataset consisting of one
million headlines from news articles sourced from an Australian news source4
and one million randomly chosen sentences from the content of Wikipedia
articles5. The assumption here is that randomly chosen sentences from Wikipedia
generally neither make claims nor are worth fact-checking, while headlines
from news articles are more likely to state a claim and be interesting, and are
therefore likely worth fact checking. After first training on the 2 million
sentence corpus, we then further train (fine-tune) the model on the Task 1
dataset.
      </p>
      <p>We note from Table 4 that none of the tested modifications to the training
data resulted in improvements over the basic USE-based classifier. Of all the
techniques, the most interesting appears to be that of adding millions of positive
and negative examples from the Headlines+Wikipedia dataset, which caused a
relatively small degradation in Mean Average Precision (MAP) while providing a marked
increase in Reciprocal Rank (RR). We leave to future work an investigation of
why that was the case and whether modifications to that dataset and its use
could result in positive gains in MAP.</p>
      <p>3 https://www.politifact.com</p>
      <p>4 https://www.kaggle.com/therohk/million-headlines</p>
      <p>5 https://www.kaggle.com/mikeortman/wikipedia-sentences</p>
      <sec id="sec-5">
        <title>2.3 Comparing Different Encoder &amp; Discriminator Architectures</title>
        <p>
          The Universal Sentence Encoder (USE) offers two different pre-trained models
that differ in their internal architecture. The standard USE module is trained
with a Deep Averaging Network (DAN) [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], while the larger version of the
module is trained with a Transformer [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] encoder.6
        </p>
        <p>Performance for the two versions of the USE encoder on the test data are
shown in Table 5. We note a much higher MAP value for the larger,
transformer-based model.</p>
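        <p>For reference, the two encoder variants are published as separate TF-Hub
modules; a sketch of loading them is shown below (the module version numbers are
assumptions, as the handles available may differ).</p>
        <preformat>
import tensorflow_hub as hub

# Standard USE: Deep Averaging Network (DAN) encoder.
use_dan = hub.Module("https://tfhub.dev/google/universal-sentence-encoder/2")

# Large USE: Transformer encoder, slower but more accurate in our experiments.
use_transformer = hub.Module(
    "https://tfhub.dev/google/universal-sentence-encoder-large/3")

# Both map a batch of sentences to 512-dimensional embedding vectors.
embeddings = use_dan(["Secretary Clinton, let me turn to the issue of trade."])
        </preformat>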
        <p>
          In order to provide a discriminative model able to predict check-worthiness
labels, two different network architectures have been layered on top of the USE
architecture. The relative performance of the two models is shown in Table 6,
and their descriptions are as follows:
1. The architecture used to produce most of the results in this report is a
Feed-Forward Deep Neural Network (FF-DNN) with two hidden layers, obtained
by using the TensorFlow DNNClassifier component.
2. A second architecture consists of a dense layer with a ReLU [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] activation
function, followed by a softmax layer that categorizes the results. This
architecture was implemented in Keras7, applying a lambda layer to wrap
the USE output.
        </p>
        <p>6 A third version of the encoder, called "lite", is specifically designed for
systems with limited computational resources, and thus was not investigated here.</p>
        <p>7 https://keras.io</p>
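        <p>A minimal sketch of this second architecture is shown below, assuming
TensorFlow 1.x Keras; the hidden-layer width is an illustrative assumption, as is
the use of one-hot labels with a categorical cross-entropy loss.</p>
        <preformat>
import tensorflow as tf
import tensorflow_hub as hub
from tensorflow.keras import layers

use = hub.Module("https://tfhub.dev/google/universal-sentence-encoder/2")

def use_embedding(x):
    # Wrap the USE module so Keras can call it on a batch of raw strings.
    return use(tf.squeeze(tf.cast(x, tf.string), axis=1))

inputs = layers.Input(shape=(1,), dtype=tf.string)
embedded = layers.Lambda(use_embedding, output_shape=(512,))(inputs)
hidden = layers.Dense(256, activation="relu")(embedded)  # width is an assumption
outputs = layers.Dense(2, activation="softmax")(hidden)

model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
        </preformat>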
        <p>The TensorFlow implementation outperformed the Keras ReLU architecture
on the validation data, so we continued with that model in
the other experiments.</p>
      <p>In order to decide how many steps to train each model for, we examined
performance of the models against the number of training steps on individual
debates from the training data as shown for the Large USE model in Table 7.
For that particular model we decided to train the model for only 600 steps based
on the average results across the training debates.</p>
        <p>For the submitted runs we made use of both the standard and large USE
architectures compared in Table 5. The standard USE model has been used for
the first two runs: Primary and Contrastive 1, while the large USE model was
used for Contrastive 2. Table 8 contains the results for the submitted runs8. The
difference between the first two runs, which both use the standard USE model,
is that for the first we used the Adagrad optimiser and a feed-forward network
with two hidden layers of size 512/128, while for the second we employed the
Adam optimiser with two hidden layers of size 100 and 8. We note that our last
run (Contrastive 2) obtained the best MAP score over all runs submitted by any
team for Task 1.</p>
        <p>8 Note that some values are the same as in Table 5.</p>
        <p>The standard USE model had been chosen as the primary run because it had
provided better peak results during training, while the large model provided
more stable results. Note the results on the training data shown in Table 9,
where the standard model outperformed the large model on two of the three
debates used for training.</p>
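        <p>Since both model selection and the official ranking rely on Mean Average
Precision (MAP) and Reciprocal Rank (RR) computed per debate, a small sketch of
this evaluation (using scikit-learn for average precision) is given below; the
per-debate data structure is an assumption.</p>
        <preformat>
import numpy as np
from sklearn.metrics import average_precision_score

def evaluate(debates):
    # debates: list of (labels, scores) pairs, one per debate, where labels are
    # 0/1 check-worthiness ground truth and scores are classifier confidences.
    aps, rrs = [], []
    for labels, scores in debates:
        aps.append(average_precision_score(labels, scores))
        # Reciprocal rank of the first check-worthy sentence in the ranking.
        order = np.argsort(scores)[::-1]
        first_hit = np.argmax(np.asarray(labels)[order] == 1)
        rrs.append(1.0 / (first_hit + 1))
    return np.mean(aps), np.mean(rrs)
        </preformat>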
        <p>Independently from the model used, we see that there is large variation in
the performance across the debates in the training set. Dealing with such large
variation effectively is something that ought to be addressed in future work. We
note that on the test data, where the average MAP value is around 0.18, the
average precision across the individual debates varies from 0.05 (for the
2015-12-19 debate) to 0.5 (for the 2018-01-31 debate).</p>
      </sec>
    </sec>
    <sec id="sec-8-1">
      <title>3 Task 2 - Evidence and Factuality</title>
        <p>
          The second task of the challenge [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] contains multiple subtasks which together
form a path that aims at automating the fact-checking process. Given a claim
and a set of web pages, the subtasks consist of:
1. Ranking the web-pages based on how useful they are to assess the veracity
of the claim.
2. Labelling the web-pages based on their usefulness into four categories: very
useful, useful, not useful, not relevant.
3. Labelling individual passages within those pages that are useful for
determining the veracity of the claim.
4. Labelling the claims as true or false given the discovered information.
        </p>
        <p>Unlike Task 1, for which all the data was written in English, for Task 2
all content was written in Arabic. We generally worked directly with the Arabic
text but also experimented with translating the content into English as discussed
below.</p>
        <p>
          Every subtask has been tackled using a similar setup: after processing the
data to obtain a dataset that consists of two strings of text and a label to
predict, we feed these pairs into a pre-trained BERT model [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] that we train to
classify the relationships between the two texts. In some cases, we have also
investigated adding external data that could be useful, given that the datasets
for the subtasks were extremely small.
        </p>
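        <p>As a sketch of this text pair setup (using the HuggingFace transformers
library rather than the original BERT code, and with the multilingual checkpoint
and hyperparameters chosen for illustration), fine-tuning looks roughly as
follows.</p>
        <preformat>
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=4)  # e.g. the four 2B classes

# Encode a (claim, web-page text) pair; BERT truncates to max_length tokens.
enc = tokenizer("claim text here", "web page text here",
                truncation=True, max_length=128, return_tensors="pt")
labels = torch.tensor([0])

# One training step; in practice this loops over the full (small) dataset.
outputs = model(**enc, labels=labels)
outputs.loss.backward()
        </preformat>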
        <sec id="sec-8-1-1">
          <title>3.1 Task 2A and 2B - Determining Relevant Web-pages</title>
          <p>For the first two subtasks we used an almost identical approach: we extracted
the claim text and the associated text of each web page, using the Beautiful Soup
parser9 to remove HTML markup. The training sets then consisted of 395
labelled text pairs (claims, corresponding webpages and relationship labels).</p>
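          <p>A minimal sketch of the HTML stripping step with Beautiful Soup (the
parser choice is an assumption):</p>
          <preformat>
from bs4 import BeautifulSoup

def page_to_text(html):
    # Parse the raw HTML and drop the markup, keeping only the visible text.
    soup = BeautifulSoup(html, "html.parser")
    return soup.get_text(separator=" ", strip=True)
          </preformat>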
          <p>
A set of experiments was performed on the dataset, using a small portion
of the training data as a validation set. The accuracy results in Table 10 have
been averaged over three runs to account for the variation due to very small
training/validation sets. The techniques investigated were the following:
1. The BERT model is trained on the Task 2-AB dataset.
2. The BERT model is trained on external data, using a dataset that was previously
used for stance detection for the FakeNewsChallenge [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ].
3. The BERT model has been first trained on the FakeNewsChallenge dataset, then
on the Task 2-AB dataset.
4. The Task 2-AB dataset has been translated to English before feeding it to
the model as in 1.
          </p>
          <p>Given that training BERT over large sections of text has very large memory
requirements, the standard pre-trained BERT model was used instead of the
biggest one available10. This limited the text sections to be no more than 100
to 150 words. BERT automatically reduces the information in longer context
windows so that this limit is enforced, implying that some information is
necessarily lost from the text of longer webpages.</p>
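          <p>The pair-truncation behaviour can be sketched as follows (a simplified
version of the usual BERT pair preprocessing; the token budget shown is an
assumption):</p>
          <preformat>
def truncate_pair(claim_tokens, page_tokens, max_tokens=128):
    # Trim the longer sequence first until the pair fits the token budget,
    # so the (short) claim is usually kept intact and page text is cut.
    while len(claim_tokens) + len(page_tokens) > max_tokens:
        if len(page_tokens) > len(claim_tokens):
            page_tokens.pop()
        else:
            claim_tokens.pop()
    return claim_tokens, page_tokens
          </preformat>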
          <p>We observe in Table 10 improved performance using the FakeNewsChallenge
dataset and translating the Arabic text to English, but caution that the results
are subject to significant variation due to small sample sizes.</p>
          <p>The ranking for subtask 2A was computed using the predicted confidence
value with which the pages were being classified as useful. Analyzing the
Challenge's "Results Summary", it can be noted that while the system learnt to
classify not relevant and not useful pairs of texts, it was not able to learn to
classify useful and very useful pairs. Thus in subtask 2A the test results we
obtained were quite poor, while for subtask 2B (see Table 12) we indeed achieved a
high Accuracy value (0.79) for two-class classification but a zero Precision value,
indicating that the classifier is predicting only the negative class.</p>
          <p>9 https://www.crummy.com/software/BeautifulSoup/</p>
          <p>10 We conjecture that the use of the bigger BERT model would have increased
performance on these subtasks.</p>
          <p>Table 11. Accuracy versus the number of training epochs (3, 5, 6, 7, 8 and 10) for Task 2C.</p>
        </sec>
      <sec id="sec-9-1">
        <title>3.2 Task 2C - Finding Useful Passages</title>
        <p>For this subtask the dataset consisted of each claim text paired with a paragraph
that was linked to it. Again the set over which the results could be measured was
too small to compare the different parameter settings for the model. In this case
the scores obtained without using any external data were quite promising and
Table 11 shows the performance versus the number of epochs used for training.</p>
        <p>The results for Task 2C in Table 12 show scores that are much lower than the
ones obtained in Table 11; nonetheless, this submission got the best scores among
the teams for Precision (0.41), Recall (0.94) and F1 (0.56), while obtaining a
slightly lower result for Accuracy (0.51).</p>
      </sec>
      <sec id="sec-9-2">
        <title>3.3 Task 2D - Assessing Claim Veracity</title>
        <p>
          Subtask D has been tackled by considering how external data might be leveraged
to learn a model for assessing claim factuality. Two different datasets have been
considered: the first was the Stanford Natural Language Inference Corpus [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]
while the second was again the FakeNewsChallenge [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] stance detection dataset.
        </p>
        <p>The two datasets have been used to judge the relationship between the claims
and the text that composed the web pages. While in the first case the entailment
or contradiction confidence score is used, in the second case the confidence over
the labels agree or disagree (how much a text agrees or disagrees with a given
headline) was used instead.</p>
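        <p>A sketch of the NLI-based variant is given below, using an assumed pair
classifier fine-tuned on SNLI; the aggregation over page texts is an illustrative
choice, not the exact rule used in our runs.</p>
        <preformat>
import numpy as np

def claim_veracity(claim, page_texts, nli_model):
    # nli_model(premise, hypothesis) is assumed to return confidence scores
    # for the SNLI labels: entailment, contradiction and neutral.
    support, refute = [], []
    for text in page_texts:
        scores = nli_model(text, claim)
        support.append(scores["entailment"])
        refute.append(scores["contradiction"])
    # Label the claim true if, on average, the pages entail it more
    # strongly than they contradict it.
    return "true" if np.mean(support) > np.mean(refute) else "false"
        </preformat>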
        <p>The results obtained have been evaluated only over a subset of 31 claims, and
in this case the best Accuracy value obtained is 0.52.</p>
      </sec>
    </sec>
    <sec id="sec-9-2-1">
      <title>4 Conclusions</title>
          <p>In this report we have described our investigations in applying state-of-the-art
pre-trained deep learning models to the problems of automated claim detection
and fact checking, as part of the CLEF'19 Lab: CheckThat!: Automatic
Identification and Verification of Claims.</p>
          <p>
            For Task 1 we investigated the use of pre-trained deep neural embeddings
for the problem of check-worthiness prediction. Over a set of embeddings, we
found the Universal Sentence Encoder (USE) [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ] to provide the best performance
with little out-of-the-box tuning required. We investigated different techniques
for pre-processing the political debate data and also the use of external datasets
for augmenting the small and highly unbalanced training dataset, but did not
observe performance improvements in either case. Thus our runs for the challenge
were built by simply training a Feed-Forward neural network on top of the USE
encoding(s), without further modification of the training data.
          </p>
      <p>The results obtained for the first task were quite encouraging. With a more
judicious choice of validation set, it may have been possible to determine that the
best choice of model was indeed that used for our third run, which obtained the
highest MAP value over all teams for the task. Further work should be aimed
at levelling the differences in performance over the different debates.</p>
          <p>
            The various subtasks of Task 2 involved predicting the usefulness of
webpages and passages for determining the veracity of a particular claim as well
as predicting the veracity of the claim itself. For this task we made use of the
BERT [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ] model, which can be trained on text pairs to directly predict a
relationship label. We found this approach to the task promising, but hampered by
insufficient training data and large memory requirements for the BERT model.
Furthermore, we found that external datasets (from the FakeNewsChallenge [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ])
may be useful for improving performance on these tasks, despite the fact that
they are in a di erent language (English) from the training/test data for the
task (Arabic).
          </p>
      <p>In conclusion, the preliminary results show that pre-trained deep learning
models can be effective for a variety of tasks. Learning from small or unbalanced
datasets is a well-known problem for deep learning, yet the transfer learning
techniques that we used to face the challenge proved quite successful and may
offer an opportunity for overcoming these limitations.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Atanasova</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karadzhov</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mohtarami</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , Da San Martino, G.:
          <article-title>Overview of the CLEF-</article-title>
          2019
          <source>CheckThat! Lab on Automatic Identification and Verification of Claims. Task</source>
          <volume>1</volume>
          : Check-Worthiness
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bowman</surname>
            ,
            <given-names>S.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Angeli</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potts</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.:</given-names>
          </string-name>
          <article-title>A large annotated corpus for learning natural language inference</article-title>
          .
          <source>In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          .
          <article-title>Association for Computational Linguistics (</article-title>
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Cer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kong</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hua</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Limtiaco</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>John</surname>
          </string-name>
          , R.S.,
          <string-name>
            <surname>Constant</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guajardo-Cespedes</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yuan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tar</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sung</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strope</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kurzweil</surname>
          </string-name>
          , R.:
          <article-title>Universal sentence encoder</article-title>
          . CoRR abs/1803.11175 (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Conneau</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiela</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwenk</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barrault</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bordes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Supervised learning of universal sentence representations from natural language inference data</article-title>
          .
          <source>In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing</source>
          . pp.
          <volume>670</volume>
          -
          <fpage>680</fpage>
          . ACL, Copenhagen, Denmark (
          <year>September 2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Bert: Pre-training of deep bidirectional transformers for language understanding</article-title>
          . arXiv preprint arXiv:1810.04805 (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Elsayed</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , Barrón-Cedeño,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Hasanain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Suwaileh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            , Da San Martino, G.,
            <surname>Atanasova</surname>
          </string-name>
          ,
          <string-name>
            <surname>P.</surname>
          </string-name>
          :
          <article-title>Overview of the CLEF-2019 CheckThat!: Automatic Identification and Verification of Claims</article-title>
          .
          <source>In: Experimental IR Meets Multilinguality, Multimodality, and Interaction. LNCS</source>
          , Lugano, Switzerland (
          <year>September 2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. FakeNewsChallenge organizers:
          <article-title>FakeNewsChallenge stance detection dataset</article-title>
          . http://www.fakenewschallenge.org (
          <year>2016</year>
          ),
          <source>online; Since December 1st 2016</source>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Graves</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Framewise phoneme classification with bidirectional LSTM and other neural network architectures</article-title>
          .
          <source>Neural Networks</source>
          <volume>18</volume>
          (
          <issue>5-6</issue>
          ),
          <volume>602</volume>
          -
          <fpage>610</fpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Hasanain</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suwaileh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elsayed</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          Barrón-Cedeño,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <surname>P.</surname>
          </string-name>
          :
          <article-title>Overview of the CLEF-</article-title>
          2019
          <source>CheckThat! Lab on Automatic Identification and Verification of Claims. Task</source>
          <volume>2</volume>
          : Evidence and Factuality
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Iyyer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manjunatha</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boyd-Graber</surname>
            ,
            <given-names>J.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>III</surname>
          </string-name>
          , H.D.:
          <article-title>Deep unordered composition rivals syntactic methods for text classification</article-title>
          .
          <source>In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL 2015, July 26-31</source>
          ,
          <year>2015</year>
          , Beijing, China, Volume
          <volume>1</volume>
          :
          <string-name>
            <given-names>Long</given-names>
            <surname>Papers</surname>
          </string-name>
          . pp.
          <volume>1681</volume>
          -
          <issue>1691</issue>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Joulin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grave</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bojanowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Bag of tricks for efficient text classification</article-title>
          .
          <source>arXiv preprint arXiv:1607.01759</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          . pp.
          <volume>3111</volume>
          -
          <issue>3119</issue>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Nair</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
          </string-name>
          , G.E.:
          <article-title>Rectified linear units improve restricted Boltzmann machines</article-title>
          .
          <source>In: Proceedings of the 27th international conference on machine learning (ICML-10)</source>
          . pp.
          <volume>807</volume>
          -
          <issue>814</issue>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
          </string-name>
          , C.D.: Glove:
          <article-title>Global vectors for word representation</article-title>
          . In: EMNLP (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iyyer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gardner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Deep contextualized word representations</article-title>
          .
          <source>In: Proc. of NAACL</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Vaswani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shazeer</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parmar</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uszkoreit</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomez</surname>
            ,
            <given-names>A.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaiser</surname>
          </string-name>
          , Ł.,
          <string-name>
            <surname>Polosukhin</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Attention is all you need</article-title>
          . In: Guyon,
          <string-name>
            <given-names>I.</given-names>
            ,
            <surname>Luxburg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.V.</given-names>
            ,
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Wallach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Fergus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Vishwanathan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Garnett</surname>
          </string-name>
          ,
          <string-name>
            <surname>R</surname>
          </string-name>
          . (eds.)
          <source>Advances in Neural Information Processing Systems</source>
          <volume>30</volume>
          , pp.
          <volume>5998</volume>
          -
          <fpage>6008</fpage>
          . Curran Associates, Inc. (
          <year>2017</year>
          ), http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Wang</surname>
          </string-name>
          , W.Y.:
          <article-title>"liar, liar pants on re": A new benchmark dataset for fake news detection</article-title>
          .
          <source>CoRR abs/1705.00648</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>