An ML Model for Predicting Information Check-Worthiness using a Variety of Features

Md Zia Ullah
IRIT, UMR5505 CNRS
118 Route de Narbonne, 31062 Toulouse CEDEX 9, France
mdzia.ullah@irit.fr

Abstract. In this communication, we introduce the important problem of information check-worthiness and present the method we developed to address it. This method relies on an elaborate information representation that combines "information nutritional label" features with word-embedding features. Check-worthiness is then predicted by a machine learning model trained on these features. Our model outperforms the official participants' runs of the CheckThat! 2018 challenge.

Keywords: Information check-worthiness; Information nutritional label; Machine learning based model

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

The main problems associated with automatic fact-checking are (1) deciding whether a piece of information is worth reviewing and (2) finding evidence that helps detect whether the fact is correct or a fake. Information check-worthiness refers to the first challenge and is especially critical in political debates [8,2], where facts can be manipulated, denied, or hidden.

2 Method

The approach we developed to tackle this problem relies both on word embedding using the Word2Vec model [14] and on the Information Nutritional Label for online documents [5]. The former is now a common way to represent texts for various tasks [18,15]. The information nutritional label, initially introduced to "help readers making more informed judgments about the items they read", provides scores for various criteria that qualify the content of a text and has been shown to be helpful for deciding whether a piece of information should be prioritized for checking or not [13,1].

2.1 Information representation

The information representation combines (a) information nutritional label features and (b) word embedding features.

Information nutritional label. The information nutritional label for online documents [5] describes a textual information unit according to nine criteria:

1. Factuality: the number of facts it mentions,
2. Readability: the ease with which a reader can understand it,
3. Virality: the speed at which it is propagated,
4. Emotion: its emotional impact, both positive and negative,
5. Opinion: the number of opinionated sentences it contains,
6. Controversy: the number of controversial issues it addresses,
7. Authority/Trust/Credibility: its credibility and the authority and trust of the source it belongs to,
8. Technicality: the number of technical issues it addresses and technical terms used,
9. Topicality: its current interest, which is time-dependent.

From the initial label, our model makes use of four of these criteria: factuality, emotion, controversy, and technicality. Lespagnol et al. [13] discuss this point in more detail.

Word embedding. Word embedding refers to the representation of a word in a semantic space as a vector of numerical values. Words that are semantically and syntactically similar tend to be close in this embedding space. To represent a sentence, we use pre-trained word vectors trained on the Google News corpus with the Word2Vec model [14]. We average the word vectors of all the words in a sentence. When a word is not found in the model, we represent it with a zero vector. Although zero vectors affect the mean [20], this fallback is essential when none of the words of a sentence appear in the model.
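As an illustration, below is a minimal sketch of this sentence representation, assuming the standard 300-dimensional GoogleNews binary file and simple whitespace tokenization (the file name and the tokenization are assumptions made for the example, not details prescribed by our method):

```python
import numpy as np
from gensim.models import KeyedVectors

# Assumed local copy of the pre-trained 300-dimensional GoogleNews vectors.
w2v = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

def sentence_vector(sentence, dim=300):
    """Average the Word2Vec vectors of the words of a sentence.

    Out-of-vocabulary words contribute a zero vector; a sentence whose
    words are all unknown is thus represented by the zero vector.
    """
    # Whitespace tokenization is an assumption made for this sketch.
    words = sentence.split()
    vectors = [w2v[w] if w in w2v else np.zeros(dim) for w in words]
    if not vectors:  # guard against an empty sentence
        return np.zeros(dim)
    return np.mean(vectors, axis=0)
```

Note that including the zero vectors in the average shrinks the representation of sentences containing unknown words, which is the effect on the mean discussed above [20].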
2.2 Machine learning

We considered a machine learning model based on a stochastic gradient descent (SGD) classifier with the "log" loss function, i.e., logistic regression. We keep the default values of the other hyper-parameters of the algorithm as provided by Scikit-learn (version 3.2.4) [17].
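The following is a minimal sketch of this configuration with scikit-learn; the feature matrix is a random stand-in with the shapes described in Section 2.1 (four nutritional-label scores concatenated with a 300-dimensional averaged word vector), not the actual CT-CWC-18 data:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
# Stand-in data: 4 nutritional-label scores concatenated with a 300-d
# averaged word vector per sentence (the real features are built as in
# Section 2.1); labels are 1 for check-worthy, 0 otherwise.
X = rng.normal(size=(1000, 4 + 300))
y = rng.integers(0, 2, size=1000)

# SGD-trained logistic regression; all other hyper-parameters are left
# at their scikit-learn defaults, as in the paper.
clf = SGDClassifier(loss="log_loss")  # named "log" in older releases
clf.fit(X, y)

# Positive-class probabilities, used to rank sentences by check-worthiness.
scores = clf.predict_proba(X)[:, 1]
```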
3 Results

We used the CLEF CheckThat! 2018 collection (CT-CWC-18) [16] for evaluation. It corresponds to transcriptions of political debates and speeches from the 2016 US Presidential campaign. For each line of a transcription, the training data set includes a label indicating whether the statement is check-worthy (1) or not (0). The training set consists of 3 sub-datasets with a total of 4,064 sentences, of which 90 are check-worthy. The test set consists of 7 sub-datasets with a total of 4,882 sentences, of which 192 are check-worthy. The data set is thus strongly imbalanced in favor of sentences that are not worth checking. While oversampling the minority class is common practice in machine learning [3,11], it does not guarantee the best results [21,19]. In our experiments, we studied both cases and report here only the best results, which were achieved without oversampling, keeping the initial data as it is.

In Table 1, the results are presented in terms of mean average precision (MAP), which is the official measure of the CLEF track [16]; we used the evaluation scripts from the CheckThat! Lab organizers (http://alt.qcri.org/clef2018-factcheck).

While in [13] we evaluated various other features and feature combinations, the best results were obtained when combining word embedding and information nutritional label features. Also in [13], we considered various machine learning models; the best results were obtained with SGD Logloss, a stochastic gradient descent classifier trained with the "log" loss function [12].

Table 1. MAP of the SGD Logloss ML algorithm, considering features based on the nutritional label (N), word embedding (W), or the combination of both (NW), without oversampling. The first three rows are variants of our model; the remaining rows are the best official runs of the CheckThat! 2018 challenge.

Method                        MAP
Our model:
  SGD Logloss – N             .079
  SGD Logloss – W             .210
  SGD Logloss – NW            .230
Official participants:
  Prise de Fer [23]           .133
  CheckThat! Copenhagen [9]   .115
  UPV-INAOE [7]               .113
  IRIT [1]                    .063

We also compared our method to the teams that participated in the CLEF track, including Prise de Fer [23], Copenhagen [9], UPV-INAOE-Autoritas [7], and IRIT [1]. Among the participants, the best performing system is Prise de Fer [23], which obtained a MAP score of 0.133. Prise de Fer represented each sentence using word embedding combined with POS tags, syntactic dependencies, and additional features including named entities, sentiment, and verbal forms; they trained a multi-layer perceptron (MLP) with two hidden layers (100 units and 8 units, respectively) and the hyperbolic tangent (tanh) as the activation function. The Copenhagen team [9] represented each sentence using word embedding combined with POS tags and syntactic dependencies; they trained an attention-based RNN with GRU memory units and obtained a MAP score of 0.115. The UPV-INAOE team [7] obtained a MAP score of 0.113 using character n-grams as features and k-nearest neighbors as the model. The IRIT team [1] used features based on the information nutritional label and trained an SVM model, which obtained a MAP score of 0.063.

Table 1 reports three variants of our method: SGD Logloss based on information nutritional label features (SGD Logloss-N), on word-embedding features (SGD Logloss-W), and on the combination of both (SGD Logloss-NW). SGD Logloss-NW produces the best performance of the three variants. Our method also outperforms all the participating teams' approaches in the CLEF 2018 CheckThat! track.
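For reference, the following is a simplified stand-in for this evaluation (not the organizers' official script), computing MAP as the mean of per-sub-dataset average precision:

```python
import numpy as np
from sklearn.metrics import average_precision_score

def mean_average_precision(debates):
    """MAP over sub-datasets: the mean of per-debate average precision.

    `debates` is a list of (y_true, y_score) pairs, one per sub-dataset,
    with 0/1 check-worthiness labels and the classifier's scores.
    """
    aps = [average_precision_score(y_true, y_score)
           for y_true, y_score in debates]
    return float(np.mean(aps))
```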
4 Related work

Identifying check-worthy statements has recently been investigated in several studies. In ClaimBuster [10], the authors used manually annotated transcripts of all the US presidential debates. They proposed an SVM-based model with sentence-level features such as sentiment, length, TF-IDF, POS tags, and entity types. Gencheva et al. [6] integrated several context-aware and sentence-level features to train both SVM and feed-forward neural network models; this approach outperforms the ClaimBuster system in terms of MAP and precision.

The best performing system in the related shared task of the CheckThat! Lab at CLEF 2018 is Prise de Fer [23], with a MAP of 0.133. The sentence-level features they used are word embedding combined with POS tags, syntactic dependencies, named entities, sentiment, and verbal forms. They trained a multi-layer perceptron (MLP) consisting of two hidden layers with the hyperbolic tangent as the activation function.

The second best performing system is the Copenhagen team's [9], which obtained a MAP of 0.115. The authors represented each sentence using word embedding combined with POS tag and syntactic dependency features. This representation was used as input to an RNN with GRU memory units, where the output from each word was aggregated using attention, followed by a fully connected layer from which the output was predicted using a sigmoid function [9].

The other participants used different representations, such as character n-grams [7] or topics [22], and different machine learning algorithms, such as SVM [1], Random Forest [1], k-nearest neighbors [7], or gradient boosting [22].

5 Conclusion

In this communication, we presented a method for predicting information check-worthiness that was developed in [13]. Experimental results on the CheckThat! 2018 collection show that combining information nutritional label and word-embedding features with the SGD Logloss model produces the best performance and outperforms the known related methods. Oversampling the training set did not improve the results, although the training examples are imbalanced. In future work, we would like to improve the model by integrating additional components from the information nutritional label, such as readability, as well as other language models such as BERT [4].

Ethical issue. While the CheckThat! challenge has its own ethical policies, detecting information check-worthiness raises ethical issues that are beyond the scope of this paper.

Acknowledgement. This work has been partially funded by the European Union's Horizon 2020 programme (H2020-SU-SEC-2018) under Grant Agreement n°833115 (PREVISION project, https://cordis.europa.eu/project/id/833115). The paper reflects only the authors' view and the Commission is not responsible for any use that may be made of the information it contains.

References

1. Agez, R., Bosc, C., Lespagnol, C., Petitcol, N., Mothe, J.: IRIT at CheckThat! 2018. In: Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum, Avignon, France (2018)
2. Bond, G.D., Schewe, S.M., Snyder, A., Speller, L.F.: Reality monitoring in politics. In: The Palgrave Handbook of Deceptive Communication, pp. 953–968. Springer (2019)
3. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 16, 321–357 (2002)
4. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
5. Fuhr, N., Giachanou, A., Grefenstette, G., Gurevych, I., Hanselowski, A., Jarvelin, K., Jones, R., Liu, Y., Mothe, J., Nejdl, W., et al.: An information nutritional label for online documents. In: ACM SIGIR Forum. vol. 51, pp. 46–66. ACM (2018)
6. Gencheva, P., Nakov, P., Màrquez, L., Barrón-Cedeño, A., Koychev, I.: A context-aware approach for detecting worth-checking claims in political debates. In: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, pp. 267–276 (2017)
7. Ghanem, B., Montes-y-Gómez, M., Pardo, F.M.R., Rosso, P.: UPV-INAOE - Check That: Preliminary approach for checking worthiness of claims. In: Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum, Avignon, France (2018)
8. Graves, L.: Deciding What's True: The Rise of Political Fact-Checking in American Journalism. Columbia University Press (2016)
9. Hansen, C., Hansen, C., Simonsen, J.G., Lioma, C.: The Copenhagen team participation in the check-worthiness task of the competition of automatic identification and verification of claims in political debates of the CLEF-2018 CheckThat! lab. In: Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum, Avignon, France (2018)
10. Hassan, N., Adair, B., Hamilton, J.T., Li, C., Tremayne, M., Yang, J., Yu, C.: The quest to automate fact-checking. In: Proceedings of the 2015 Computation+Journalism Symposium (2015)
11. Khan, S.H., Hayat, M., Bennamoun, M., Sohel, F.A., Togneri, R.: Cost-sensitive learning of deep feature representations from imbalanced data. IEEE Transactions on Neural Networks and Learning Systems 29(8), 3573–3587 (2017)
12. Kleinbaum, D.G., Dietz, K., Gail, M., Klein, M., Klein, M.: Logistic Regression. Springer (2002)
13. Lespagnol, C., Mothe, J., Ullah, M.Z.: Information nutritional label and word embedding to estimate information check-worthiness. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 941–944 (2019)
14. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems 26, pp. 3111–3119. Curran Associates, Inc. (2013)
15. Mothe, J.: Recherche d'information textuelle, apprentissage et plongement de mots. In: Document numérique. Hermès (2020)
16. Nakov, P., Barrón-Cedeño, A., Elsayed, T., Suwaileh, R., Màrquez, L., Zaghouani, W., Atanasova, P., Kyuchukov, S., Da San Martino, G.: Overview of the CLEF-2018 CheckThat! Lab on automatic identification and verification of political claims. In: Proceedings of the Ninth International Conference of the CLEF Association: Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF'18), pp. 372–387. Springer (2018)
17. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 2825–2830 (2011)
18. Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., Zettlemoyer, L.: Deep contextualized word representations. arXiv preprint arXiv:1802.05365 (2018)
19. Reshma, I.A., Gaspard, M., Franchet, C., Brousset, P., Faure, E., Mejbri, S., Mothe, J.: Training set class distribution analysis for deep learning model – application to cancer detection (2019)
20. Ullah, M.Z., Shajalal, M., Chy, A.N., Aono, M.: Query subtopic mining exploiting word embedding for search result diversification. In: Asia Information Retrieval Symposium, pp. 308–314. Springer (2016)
21. Weiss, G.M., Provost, F.: Learning when training data are costly: The effect of class distribution on tree induction. Journal of Artificial Intelligence Research 19, 315–354 (2003)
22. Yasser, K., Kutlu, M., Elsayed, T.: bigIR at CLEF 2018: Detection and verification of check-worthy political claims. In: Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum, Avignon, France (2018)
23. Zuo, C., Karakas, A., Banerjee, R.: A hybrid recognition system for check-worthy claims using heuristics and supervised learning. In: Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum, Avignon, France (2018)