<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An ML Model for Predicting Information Check-Worthiness using a Variety of Features</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Md Zia Ullah</string-name>
          <email>mdzia.ullah@irit.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IRIT, UMR5505 CNRS 118 Route de Narbonne</institution>
          ,
          <addr-line>31062 Toulouse CEDEX 9</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this communication, we introduce the important problem of information check-worthiness and present the method we developed to address it automatically. The method uses an elaborate information representation that combines “information nutritional label” features with word-embedding features. Whether a claim is check-worthy is then predicted by a machine learning model trained on these features. Our model outperforms the official participants' runs of the CheckThat! 2018 challenge.</p>
      </abstract>
      <kwd-group>
        <kwd>Information check-worthiness</kwd>
        <kwd>Information nutritional label</kwd>
        <kwd>Machine learning based model</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        The main problems associated with automatic fact-checking are (1) deciding whether
a piece of information is worth reviewing and (2) finding evidence that helps
determine whether the fact is correct or fake. Information check-worthiness refers
to the first challenge and is especially critical in political debates [
        <xref ref-type="bibr" rid="ref2 ref8">8,2</xref>
        ] where facts can
be manipulated, denied, or hidden.
The approach we developed to tackle this problem relies both on word embedding
using the Word2Vec model [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and on the Information Nutritional Label for online
documents [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The former is now a common model to represent texts for various tasks [
        <xref ref-type="bibr" rid="ref15 ref18">18,15</xref>
        ].
On the other hand, the information nutritional label, which was initially introduced to
“help readers making more informed judgments about the items they read”, provides
scores for various criteria that qualify the content of a text and has been shown to be
helpful for deciding whether a piece of information should be prioritized for checking [
        <xref ref-type="bibr" rid="ref1 ref13">13,1</xref>
        ].
      </p>
      <sec id="sec-1-1">
        <title>2.1 Information representation</title>
        <p>The information representation combines (a) the information nutritional label features
and (b) word embedding features.</p>
        <p>Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons
License Attribution 4.0 International (CC BY 4.0).</p>
        <p>
          Information nutritional label. The information nutritional label for online documents [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]
describes a textual information unit according to the following nine criteria:
        </p>
        <list list-type="order">
          <list-item><p>Factuality: the number of facts it mentions,</p></list-item>
          <list-item><p>Readability: the ease with which a reader can understand it,</p></list-item>
          <list-item><p>Virality: the speed at which it is propagated,</p></list-item>
          <list-item><p>Emotion: its emotional impact, both positive and negative,</p></list-item>
          <list-item><p>Opinion: the number of opinionated sentences it contains,</p></list-item>
          <list-item><p>Controversy: the number of controversial issues it addresses,</p></list-item>
          <list-item><p>Authority/Trust/Credibility: its credibility and the authority and trust of the source
it belongs to,</p></list-item>
          <list-item><p>Technicality: the number of technical issues it addresses and technical terms used,</p></list-item>
          <list-item><p>Topicality: its current interest, which is time-dependent.</p></list-item>
        </list>
        <p>
          Of the criteria in the initial label, our model uses four:
factuality, emotion, controversy, and technicality. Lespagnol et al. [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] discuss
this point in more detail.
        </p>
        <p>
          Word embedding. Word embedding refers to the representation of a word in a semantic
space as a vector of numerical values. Words that are semantically and syntactically
similar tend to be close in this embedding space. To represent a sentence, we use
pre-trained word vectors that were trained on the Google News corpus using the Word2Vec
model [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. We average the word vectors of all the words in a sentence. When a word is
not found in the model, we represent it with a zero vector. Although zero vectors
affect the mean [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ], this fallback is essential when none of the words of a
sentence is found in the model.
        </p>
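        <p>The averaging scheme described above can be sketched as follows. This is a minimal illustration in plain Python, not necessarily the implementation used in the experiments: the function name sentence_vector and the toy three-dimensional vectors are hypothetical stand-ins for the pre-trained Word2Vec model.</p>
        <preformat>
```python
from typing import Dict, List, Sequence


def sentence_vector(tokens: Sequence[str],
                    word_vectors: Dict[str, List[float]],
                    dim: int) -> List[float]:
    """Average the vectors of all tokens; unknown tokens count as zero vectors."""
    if not tokens:
        return [0.0] * dim
    total = [0.0] * dim
    for token in tokens:
        # Out-of-vocabulary words fall back to the zero vector.
        vector = word_vectors.get(token, [0.0] * dim)
        for i in range(dim):
            total[i] += vector[i]
    # Dividing by the full token count means zero vectors pull the mean toward 0.
    return [value / len(tokens) for value in total]


# Toy 3-dimensional vectors standing in for the pre-trained model (assumption).
toy_vectors = {
    "taxes": [1.0, 0.0, 2.0],
    "rose": [3.0, 2.0, 0.0],
}

known = sentence_vector(["taxes", "rose"], toy_vectors, dim=3)
mixed = sentence_vector(["taxes", "rose", "bigly", "covfefe"], toy_vectors, dim=3)
unknown = sentence_vector(["bigly"], toy_vectors, dim=3)
print(known)    # [2.0, 1.0, 1.0]
print(mixed)    # [1.0, 0.5, 0.5]
print(unknown)  # [0.0, 0.0, 0.0]
```
        </preformat>
        <p>The second call shows how out-of-vocabulary words pull the mean toward zero, and the third shows why the fallback is needed: a sentence without any known word still receives a well-defined vector.</p>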
      </sec>
      <sec id="sec-1-2">
        <title>2.2 Machine learning</title>
        <p>
          We use a machine learning model based on a stochastic gradient descent (SGD)
classifier with the “log loss” function, i.e., logistic regression. We keep the default
values of the other hyper-parameters of the algorithm as provided by Scikit-learn (version 3.2.4) [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
        </p>
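        <p>To make the classifier concrete, the following sketch trains a logistic regression by stochastic gradient descent on the log loss, written from scratch in plain Python. It is an illustration under stated assumptions rather than the Scikit-learn implementation: it uses a constant learning rate and no regularization, and the names train_sgd_logloss and predict_proba, the hyper-parameter values, and the toy data are all hypothetical.</p>
        <preformat>
```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_sgd_logloss(X, y, epochs=500, lr=0.5, seed=0):
    """Fit weights and bias by per-example gradient steps on the log loss."""
    rng = random.Random(seed)
    weights = [0.0] * len(X[0])
    bias = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in a new random order each epoch
        for i in order:
            score = sum(w * v for w, v in zip(weights, X[i])) + bias
            # For the log loss, d(loss)/d(score) = predicted probability - label.
            error = sigmoid(score) - y[i]
            weights = [w - lr * error * v for w, v in zip(weights, X[i])]
            bias -= lr * error
    return weights, bias


def predict_proba(x, weights, bias):
    """Probability that the feature vector x belongs to the positive class."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


# Toy, linearly separable data (assumption): label 1 stands for "check-worthy".
X = [[0.0, 0.2], [0.1, 0.0], [0.9, 1.0], [1.0, 0.8]]
y = [0, 0, 1, 1]
weights, bias = train_sgd_logloss(X, y)
for features, label in zip(X, y):
    print(label, round(predict_proba(features, weights, bias), 3))
```
        </preformat>
        <p>The predicted probabilities can then be used to rank sentences by their estimated check-worthiness.</p>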
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3 Results</title>
      <p>
        We used the CLEF 2018 CheckThat! collection (CT-CWC-18) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] for evaluation.
It consists of transcriptions of political debates and speeches from the 2016 US
presidential campaign. For each line of a transcription, the training set includes a
label indicating whether the statement is check-worthy (1) or not (0).
      </p>
      <p>
        The CT-CWC-18 training set consists of 3 sub-datasets with a total of 4,064 sentences, of
which 90 are check-worthy. The test set consists of 7 sub-datasets with a total of
4,882 sentences, of which 192 are check-worthy. The data set is strongly
unbalanced in favor of sentences that are not worth checking. While oversampling the
minority class is common practice in machine learning [
        <xref ref-type="bibr" rid="ref11 ref3">3,11</xref>
        ], it does not guarantee the
best results [
        <xref ref-type="bibr" rid="ref19 ref21">21,19</xref>
        ]. In our experiments, we studied both cases and report only the best result here,
which was achieved without oversampling, that is, keeping the initial data as is.
      </p>
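      <p>The degree of imbalance follows directly from the counts given above, and the effect of oversampling can be illustrated with a short sketch. The helper random_oversample below is a hypothetical example of naive random oversampling (duplicating minority examples with replacement); it is not SMOTE and not necessarily the oversampling procedure used in the experiments.</p>
      <preformat>
```python
import random
from collections import Counter


def positive_rate(n_positive, n_total):
    """Share of check-worthy sentences in a collection."""
    return n_positive / n_total


def random_oversample(labels, seed=0):
    """Return indices such that every class appears as often as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    target = max(len(indices) for indices in by_class.values())
    resampled = []
    for indices in by_class.values():
        resampled.extend(indices)
        # Draw duplicates with replacement until the class reaches the target size.
        resampled.extend(rng.choice(indices) for _ in range(target - len(indices)))
    return resampled


# Counts taken from the text: 90 of 4,064 (training) and 192 of 4,882 (test).
train_rate = positive_rate(90, 4064)
test_rate = positive_rate(192, 4882)
print(f"train: {train_rate:.1%} check-worthy, test: {test_rate:.1%} check-worthy")

labels = [1] * 90 + [0] * (4064 - 90)
balanced = random_oversample(labels)
print(Counter(labels[i] for i in balanced))
```
      </preformat>
      <p>After oversampling, both classes contain 3,974 examples, but the 90 check-worthy sentences are merely repeated, which adds no new information to the training set.</p>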
      <p>
        In Table 1, the results are presented in terms of mean average precision (MAP),
which is the official measure for the CLEF track [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]; we used the scripts from the
CheckThat! Lab organizers.
      </p>
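      <p>For reference, MAP can be computed as in the sketch below, which follows the standard definition: the average precision of each ranked list is the mean of the precision values at the ranks of the relevant items, and MAP is the mean over all lists. This is not the organizers' script; the function names and the two toy rankings are hypothetical.</p>
      <preformat>
```python
def average_precision(ranked_labels):
    """AP of one ranked list of 0/1 labels, best-scored sentence first."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(ranked_lists):
    """Mean of the per-list average precision values."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)


# Two toy rankings (assumption): 1 marks a check-worthy sentence.
debate_a = [1, 0, 1, 0]  # check-worthy sentences at ranks 1 and 3
debate_b = [0, 1, 0, 0]  # one check-worthy sentence at rank 2

print(average_precision(debate_a))                   # (1/1 + 2/3) / 2
print(average_precision(debate_b))                   # 0.5
print(mean_average_precision([debate_a, debate_b]))  # mean of the two values
```
      </preformat>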
      <p>
        While in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] we evaluated various other features and feature combinations,
the best results were obtained by combining word embeddings with features based on the
information nutritional label. Moreover, also in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] we considered various machine
learning models. The best results were obtained with SGD Logloss (a stochastic
gradient descent classifier trained with the “log” loss function) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
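      </p>
      <table-wrap id="tab-1">
        <label>Table 1</label>
        <caption>
          <p>MAP scores of the CheckThat! 2018 participating teams, as reported in the text.</p>
        </caption>
        <table>
          <thead>
            <tr><th>System</th><th>MAP</th></tr>
          </thead>
          <tbody>
            <tr><td>Prise de Fer [<xref ref-type="bibr" rid="ref23">23</xref>]</td><td>0.133</td></tr>
            <tr><td>Copenhagen [<xref ref-type="bibr" rid="ref9">9</xref>]</td><td>0.115</td></tr>
            <tr><td>UPV-INAOE [<xref ref-type="bibr" rid="ref7">7</xref>]</td><td>0.113</td></tr>
            <tr><td>IRIT [<xref ref-type="bibr" rid="ref1">1</xref>]</td><td>0.063</td></tr>
          </tbody>
        </table>
      </table-wrap>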
      <p>
        We also compared our method to the teams that participated in the CLEF track,
including Prise de Fer [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], Copenhagen [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], UPV-INAOE-Autoritas [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and IRIT [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Among
the participants, the best performing system is Prise de Fer [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], which obtained a MAP
score of 0.133. Prise de Fer [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] represented the sentence using word-embedding
combined with POS-tags, syntactic dependencies, and some features including named
entities, sentiment, and verbal forms. They trained a multi-layer perceptron (MLP) model
with two hidden layers (100 units and 8 units, respectively) and the hyperbolic tangent
(tanh) as an activation function. The Copenhagen team [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] represented each sentence
using word-embedding combined with POS tags and syntactic dependencies. They trained
an attention based RNN with GRU memory units and obtained a MAP score of 0.115.
The UPV-INAOE team [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] obtained a MAP score of 0.113 using character
n-grams as features and k-nearest neighbors as the model. The IRIT team [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] used
features based on the information nutritional label and trained an SVM model, which
obtained a MAP score of 0.063.
      </p>
      <p>In Table 1, we describe three variants of our method: SGD Logloss based on
information nutritional label features (SGD Logloss-N), on word-embedding
features (SGD Logloss-W), and on the combination of information nutritional label and
word-embedding features (SGD Logloss-NW). SGD Logloss-NW produces the
best performance among the three variants. Our method also outperforms all
the participating teams’ approaches in the CLEF2018 CheckThat! track (http://alt.qcri.org/clef2018-factcheck).</p>
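      <p>The three variants differ only in the feature vector passed to the classifier, which can be sketched as follows. Implementing the combination as a simple concatenation is an assumption made for illustration, and the function build_features and all numerical values are hypothetical.</p>
      <preformat>
```python
def build_features(label_scores, embedding, variant):
    """Return the feature vector of one sentence for the given variant.

    variant: "N" (nutritional label only), "W" (word embedding only),
             "NW" (both; concatenation is an assumption of this sketch).
    """
    if variant == "N":
        return list(label_scores)
    if variant == "W":
        return list(embedding)
    if variant == "NW":
        return list(label_scores) + list(embedding)
    raise ValueError(f"unknown variant: {variant!r}")


# Hypothetical scores for the four criteria used by the model:
# factuality, emotion, controversy, technicality.
label_scores = [0.7, 0.1, 0.4, 0.2]
embedding = [0.05, -0.12, 0.33]  # toy 3-dimensional sentence vector

for variant in ("N", "W", "NW"):
    print(variant, len(build_features(label_scores, embedding, variant)))
```
      </preformat>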
    </sec>
    <sec id="sec-3">
      <title>4 Related work</title>
      <p>
        Identifying check-worthy statements has been recently investigated in different
studies. In ClaimBuster [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], the authors used the manually annotated transcripts of all the US
presidential debates and proposed an SVM-based model
with sentence-level features such as sentiment, length, TF-IDF, POS tags, and entity
types. Gencheva et al. integrated several context-aware and sentence-level features to
train both SVM and feed-forward neural networks [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. This approach outperforms the
ClaimBuster system in terms of MAP and precision.
      </p>
      <p>
        The best performing system in the related shared task of the CheckThat! Lab at CLEF 2018 is
Prise de Fer [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] with a MAP of 0.133. The sentence-level features they used are
word embeddings combined with POS tags, syntactic dependencies, named entities,
sentiment, and verbal forms. They trained a multi-layer perceptron (MLP) consisting of two
hidden layers and the hyperbolic tangent as the activation function.
      </p>
      <p>
        The second best performing system is that of the Copenhagen team [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], which obtained a MAP
of 0.115. The authors represented the sentence using word embedding combined with
POS tags and syntactic dependency based features. This representation was used as
input to an RNN with GRU memory units, where the output from each word was
aggregated using attention, followed by a fully connected layer, from which the output was
predicted using a sigmoid function [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        The other participants used different representations such as character n-grams [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
or topics [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] and different machine learning algorithms such as SVM [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], Random
Forest [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], k-nearest neighbors [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], or Gradient boosting [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
      </p>
    </sec>
    <sec id="sec-4">
      <title>5 Conclusion</title>
      <p>
        In this communication, we present a method for predicting information check-worthiness
that was developed in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        Experimental results on the CheckThat! 2018 collection show that combining
information nutritional label and word-embedding features in an SGD Logloss model produces the
best performance and outperforms the known related methods. Oversampling the
training set did not improve the results, although the training examples are unbalanced. In
future work, we would like to improve the model by integrating additional components
of the information nutritional label, such as readability, and other language models
such as BERT [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>Ethical issue. While the CheckThat! challenge has its own ethical policies,
detecting information check-worthiness raises ethical issues that are beyond the scope of this
paper.</p>
      <p>Acknowledgement. This work has been partially funded by the European Union’s
Horizon 2020 H2020-SU-SEC-2018 under the Grant Agreement n°833115
(PREVISION project https://cordis.europa.eu/project/id/833115). The paper reflects
only the authors’ view and the Commission is not responsible for any use that may be
made of the information it contains.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Agez</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bosc</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lespagnol</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petitcol</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mothe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>IRIT at CheckThat! 2018</article-title>
          .
          <source>In: Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum</source>
          , Avignon, France (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bond</surname>
            ,
            <given-names>G.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schewe</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Snyder</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Speller</surname>
            ,
            <given-names>L.F.</given-names>
          </string-name>
          :
          <article-title>Reality monitoring in politics</article-title>
          .
          <source>In: The Palgrave Handbook of Deceptive Communication</source>
          , pp.
          <fpage>953</fpage>
          -
          <lpage>968</lpage>
          . Springer (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Chawla</surname>
            ,
            <given-names>N.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bowyer</surname>
            ,
            <given-names>K.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hall</surname>
            ,
            <given-names>L.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kegelmeyer</surname>
            ,
            <given-names>W.P.</given-names>
          </string-name>
          :
          <article-title>SMOTE: synthetic minority over-sampling technique</article-title>
          .
          <source>Journal of artificial intelligence research 16</source>
          ,
          <fpage>321</fpage>
          -
          <lpage>357</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          . arXiv preprint arXiv:1810.04805
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Fuhr</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giachanou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grefenstette</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurevych</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hanselowski</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jarvelin</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mothe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nejdl</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          , et al.:
          <article-title>An information nutritional label for online documents</article-title>
          .
          <source>In: ACM SIGIR Forum</source>
          . vol.
          <volume>51</volume>
          , pp.
          <fpage>46</fpage>
          -
          <lpage>66</lpage>
          . ACM (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Gencheva</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Màrquez</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barrón-Cedeño</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koychev</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>A context-aware approach for detecting worth-checking claims in political debates</article-title>
          .
          <source>In: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017</source>
          . pp.
          <fpage>267</fpage>
          -
          <lpage>276</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Ghanem</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montes-y-Gómez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pardo</surname>
            ,
            <given-names>F.M.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>UPV-INAOE - check that: Preliminary approach for checking worthiness of claims</article-title>
          .
          <source>In: Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum</source>
          , Avignon, France (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Graves</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Deciding what's true: The rise of political fact-checking in American journalism</article-title>
          . Columbia University Press (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Hansen</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hansen</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Simonsen</surname>
            ,
            <given-names>J.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lioma</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>The Copenhagen team participation in the check-worthiness task of the competition of automatic identification and verification of claims in political debates of the CLEF-2018 CheckThat! lab</article-title>
          .
          <source>In: Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum</source>
          , Avignon, France (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Hassan</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Adair</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hamilton</surname>
            ,
            <given-names>J.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tremayne</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>The quest to automate fact-checking</article-title>
          . world (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Khan</surname>
            ,
            <given-names>S.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hayat</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bennamoun</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sohel</surname>
            ,
            <given-names>F.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Togneri</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Cost-sensitive learning of deep feature representations from imbalanced data</article-title>
          .
          <source>IEEE transactions on neural networks and learning systems 29(8)</source>
          ,
          <fpage>3573</fpage>
          -
          <lpage>3587</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Kleinbaum</surname>
            ,
            <given-names>D.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dietz</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gail</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          : Logistic regression. Springer (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Lespagnol</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mothe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ullah</surname>
            ,
            <given-names>M.Z.</given-names>
          </string-name>
          :
          <article-title>Information nutritional label and word embedding to estimate information check-worthiness</article-title>
          .
          <source>In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          . pp.
          <fpage>941</fpage>
          -
          <lpage>944</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems</source>
          <volume>26</volume>
          , pp.
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          . Curran Associates, Inc. (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Mothe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Recherche d'information textuelle, apprentissage et plongement de mots</article-title>
          . In: Document numérique. Hermès (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Nakov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barrón-Cedeño</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elsayed</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suwaileh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Màrquez</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaghouani</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Atanasova</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kyuchukov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Da San Martino</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Overview of the CLEF-2018 CheckThat! Lab on automatic identification and verification of political claims</article-title>
          .
          <source>In: Proceedings of the Ninth International Conference of the CLEF Association: Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF'18)</source>
          . pp.
          <fpage>372</fpage>
          -
          <lpage>387</lpage>
          . Springer (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Pedregosa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varoquaux</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gramfort</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thirion</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grisel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blondel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prettenhofer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dubourg</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanderplas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Passos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cournapeau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brucher</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perrot</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duchesnay</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          ,
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iyyer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gardner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Deep contextualized word representations</article-title>
          .
          <source>arXiv preprint arXiv:1802.05365</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Reshma</surname>
            ,
            <given-names>I.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaspard</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Franchet</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brousset</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Faure</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mejbri</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mothe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Training set class distribution analysis for deep learning model - application to cancer detection</article-title>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Ullah</surname>
            ,
            <given-names>M.Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shajalal</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chy</surname>
            ,
            <given-names>A.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aono</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Query subtopic mining exploiting word embedding for search result diversification</article-title>
          .
          <source>In: Asia Information Retrieval Symposium</source>
          . pp.
          <fpage>308</fpage>
          -
          <lpage>314</lpage>
          . Springer (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>G.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Provost</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Learning when training data are costly: The effect of class distribution on tree induction</article-title>
          .
          <source>Journal of Artificial Intelligence Research</source>
          <volume>19</volume>
          ,
          <fpage>315</fpage>
          -
          <lpage>354</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Yasser</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kutlu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elsayed</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>bigIR at CLEF 2018: Detection and verification of check-worthy political claims</article-title>
          .
          <source>In: Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum</source>
          , Avignon, France (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Zuo</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karakas</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Banerjee</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>A hybrid recognition system for check-worthy claims using heuristics and supervised learning</article-title>
          .
          <source>In: Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum</source>
          , Avignon, France (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>