UPV-INAOE-Autoritas - Check That: Preliminary Approach for Checking Worthiness of Claims

Bilal Ghanem1, Manuel Montes-y-Gómez3, Francisco Rangel1,2, and Paolo Rosso1

1 PRHLT Research Center, Universitat Politècnica de València
{bigha@doctor, prosso@dsic, fraranpa@prhlt}.upv.es
2 Autoritas Consulting, Valencia, Spain
3 Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Puebla, Mexico
mmontesg@inaoep.mx

Abstract. Journalists usually spend a long time investigating presidential debates. Their main role is to extract the sentences in the debates that contain information about facts or previous events; these sentences are called claims. Investigating these claims is important, since it can reveal how credible the speaker and the other candidates are. Therefore, proposing systems for extracting these claims can certainly improve the work of the press. In this paper, we present our approach for Task 1 of the CLEF-2018 CheckThat! lab: we use a text distortion technique to detect claims that are worth checking. Our approach achieved acceptable results, taking into account the complexity of this task.

Keywords: Factual Claims, English, Arabic.

1 Introduction

In recent years, the political situation in many countries has become more complex, and politicians have started to exchange accusations in public political debates, especially presidential ones. The prevailing situation in any debate is that each presidential candidate has a short period to respond to the claims of the other candidates. Each of them has the right to accuse the others with different claims, where the main objective is to convince the audience of his ability to hold that political position. During these long debates, the journalists' role is to investigate and validate the mutual accusations between the candidates in order to reveal the truth behind each claim. This task is hard to carry out manually and in a short time, since the debates are very long and each candidate makes many claims. Many of these claims are just opinions, while others refer to facts that happened in the past. Starting from this issue, recent research has moved towards detecting check-worthy claims in presidential debates, where automatic approaches can save journalists a lot of time in this process. In this paper, we present our approach for detecting claims that are worth checking: we try to predict factual claims by highlighting specific cue words in a classification process. In [1] we presented our approach for Task 2 (checking the factuality of claims). In the following section, we present previous work in the literature; in Section 3, we describe the task; in Section 4, we describe our approach; Section 5 presents the experimental results; and finally, in Section 6, we draw some conclusions and discuss future work.

2 Related Work

The literature has focused more on another issue related to claims, namely detecting their factuality, and several types of features have been proposed to handle it. In [2], the authors used a tree kernel approach to detect claims in the argument mining domain. Their approach used constituency trees of sentences to detect claims by capturing similarities between the trees, and employed a Support Vector Machine (SVM).
In a different direction, a set of textual features to detect check-worthy claims in presidential debates was proposed in [3]. The authors used sentiment polarity, the number of words in a sentence, Bag-of-Words (BOW) with the Tf-Idf weighting scheme, Part-of-Speech tags, and named entity types. Since they generated a large feature set, they applied a feature selection technique to extract the top N most important features. Finally, different classifiers were tested, with Random Forest showing the best results. Another related approach, also for presidential debates, was proposed in [4]. The authors used different types of features, such as sentence-level features (sentiment, named entities, linguistic features, etc.), contextual features (the position of a claim in the debate), and other mixed features such as text embeddings, the discussed topic, and contradictions in the debate. An SVM and a feed-forward neural network (FNN) were used, with the FNN showing better results than the SVM. They created a corpus from the 2016 US presidential debates, with a total of 5,415 sentences, and made it available to the research community.

3 Task Description

A set of debates from the US presidential election is provided for the task, where each claim in the debate text has been manually tagged as worth checking (1) or not (0). The text of the debates is used as it is, to give the participants the opportunity to also exploit contextual features in the debates. The debates are provided in two languages, English and Arabic, where the Arabic text was obtained by translating the English debates. The provided dataset is highly imbalanced: the total number of claims is 4,064, of which 90 are worth checking and 3,970 are not. The goal of the task is to detect the claims that are worth checking and to rank them, from the most check-worthy to the least. Average Precision, i.e., the mean of the precision values obtained at the rank of each correctly retrieved check-worthy claim, was used as the performance measure. More details about the task can be found in [8].

4 Proposed Approach

The authors of [5] proposed a text distortion technique to enhance thematic text clustering by maintaining the words that have a low frequency in a document. Later, [6] used the same text distortion technique for the authorship attribution task, maintaining instead the words with the highest frequency in the documents, in an attempt to identify the author from his writing style. We believe that check-worthiness detection is more thematic than stylistic, i.e., the writing style matters less than the thematic words. In our approach, we used the same text distortion technique to detect check-worthy claims: we concealed the words that have a high frequency in the documents while maintaining (highlighting) other cue words that are used more often in factual claims. We therefore followed [5] in concealing the most frequent words.

4.1 Text Distortion with Linguistic Features

In our approach, we maintained the thematic words (those with the lowest frequency) using a threshold C: the higher the value of C, the more thematic words are maintained. We also maintained a set of linguistic cue words (LC) that were previously used in [7] to infer the credibility of news (see Table 1). Additionally, we kept the named entities (NE), such as Iraq, Trump, and America, from being distorted: by manually inspecting the claims, we found that check-worthy claims tend to mention different types of named entities.

Table 1. Samples from the linguistic lexicons.

Linguistic lexicon  Examples
Assertives          appear, declare, guarantee, hypothesize
Factives            learn, realize, know, discover
Hedges              almost, guess, indicate, mostly
Implicatives        cause, manage, hesitate, neglect
Report              admit, answer, clarify, comment
Bias                adhere, act, agree, allow, addition
Subjectivity        afraid, champ, apologist, amusement
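To make the distortion step concrete, the following Python snippet is a minimal sketch of our reading of the procedure: it keeps the C least frequent (most thematic) words of the corpus, the cue words (LC), and the named entities (NE), and masks every other word by replacing letters with '*' and digits with '#'. The function and variable names are ours, chosen for illustration; the actual implementation may differ in details such as tokenization and tie-breaking among equally frequent words.

import re
from collections import Counter

def build_keep_set(corpus_tokens, c):
    """Return the c least frequent (most thematic) words of the corpus."""
    freq = Counter(tok.lower() for tok in corpus_tokens)
    ranked = sorted(freq, key=freq.get)  # ascending frequency: rare words first
    return set(ranked[:c])

def distort(tokens, thematic, cue_words, entities):
    """Mask every token that is neither thematic, a cue word, nor an entity."""
    kept = thematic | cue_words | entities
    out = []
    for tok in tokens:
        if tok.lower() in kept or not re.search(r"[A-Za-z0-9]", tok):
            out.append(tok)  # keep the word, or punctuation such as '$' and ','
        else:
            out.append(re.sub(r"[0-9]", "#", re.sub(r"[A-Za-z]", "*", tok)))
    return " ".join(out)

# The claim of Table 2 with C = 0 and the LC+NE words maintained
claim = "It was actually $ 1 . 7 billion in cash , obviously , I guess for the hostages .".split()
cues = {"actually", "obviously", "guess"}  # illustrative subset of the lexicons
print(distort(claim, build_keep_set(claim, 0), cues, set()))
# -> ** *** actually $ # . # ******* ** **** , obviously , * guess *** *** ******** .

With larger thresholds such as C = 2000 and C = 3500, progressively more words survive the masking, as in the last two rows of Table 2.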
In Table 2, we show an example of the distortion process. After applying it, the new version of the text is used to build a char n-gram model with the Tf-Idf weighting scheme. The distorted text is less biased by high-frequency words, such as stopwords. Finally, after preparing the distorted text, one issue remains: the value of the variable C. This value is crucial, being the threshold between the amount of thematic and stylistic words; in Section 5, we show how we select the most appropriate value of C.

Table 2. An example of the text distortion process using different values of C.

Original claim    It was actually $1.7 billion in cash, obviously, I guess for the hostages.
C = 0             ** *** ******** $ # . # ******* ** **** , ********* , * ***** *** *** ******** .
C = 0 & LC+NE     ** *** actually $ # . # ******* ** **** , obviously , * guess *** *** ******** .
C = 2000 & LC+NE  ** *** actually $ 1 . 7 ******* ** cash , obviously , * guess *** *** hostages .
C = 3500 & LC+NE  It *** actually $ 1 . 7 billion in cash , obviously , I guess for *** hostages .

For the Arabic language, we employed the same approach; the only issue we had was obtaining the Arabic version of the linguistic lexicons. Translating them manually is a time-consuming process, since they are quite large; therefore, we used the Google Translation API to translate them.

5 Experiments and Results

In this section, we present the tuning process of our approach. We carried out many experiments to test different machine learning classifiers and found that K-Nearest-Neighbors (KNN) achieved the highest Average Precision. Two parameters had to be set to select the best model: the number of neighbors K of the KNN classifier and the distortion threshold C. These values are hard to set manually; therefore, we used the Grid Search technique to select the most appropriate values for both parameters (a sketch of this tuning and ranking pipeline is given at the end of this section). The best value of K is 1, and the best value of C is 1700. The low value of K is due to the highly imbalanced dataset: larger values tend to bias the classifier towards the majority class. A similar process was applied to select the best parameters using word n-grams rather than character n-grams. For the evaluation, Average Precision @N was used; we chose @N as the last record in the testing part. The results of both runs are shown in Table 3: the char n-gram model clearly outperformed the word n-gram one.

Table 3. Results obtained during the tuning phase using the word and char n-gram models.

Approach                       Classifier  K  C value  n-gram  AVG Precision @N
Text Distortion + char n-gram  KNN         1  1700     4       0.234
Text Distortion + word n-gram  KNN         2  2500     1       0.157
Baseline, word n-gram          SVM         -  -        1       0.163

Ranking Claims

After the claims have been detected, it is important to rank them based on their check-worthiness. For the ranking, we used the confidence of the KNN classifier. First, we extracted the distance to the nearest neighbor (since we used K = 1) for each prediction in the test file, and normalized the distances to the range 0-1. Then, for each predicted instance, we checked the class of its nearest neighbor: if it was positive, we subtracted the normalized distance from 1 and used the result as the ranking score. Taking the inverse in this way means that a small distance (near zero), i.e., a high classification confidence, yields a score near 1 and thus a higher rank (more check-worthy). When the nearest neighbor belongs to the negative class, we applied the same process but multiplied the score by -1, in order to discriminate between the positive and the negative instances.
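The sketch below shows, under our assumptions, how this tuning and ranking can be wired together with scikit-learn: char 4-gram Tf-Idf features over the distorted text, a simple grid over the two parameters, and the signed, distance-based ranking score described above. The distort_claim helper (which applies the distortion of Section 4.1 with threshold c), the data variables (train_claims, train_labels, dev_claims, dev_labels), and the candidate grids are placeholders, not our exact experimental setup.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import average_precision_score

def rank_scores(knn, y_train, X):
    """Confidence score used for ranking: 1 minus the normalized distance to
    the nearest neighbor, negated when that neighbor is a negative instance."""
    dist, idx = knn.kneighbors(X, n_neighbors=1)
    dist = dist.ravel()
    dist = (dist - dist.min()) / (dist.max() - dist.min() + 1e-9)  # to [0, 1]
    conf = 1.0 - dist  # small distance = high confidence
    neighbor_is_positive = np.asarray(y_train)[idx.ravel()] == 1
    return np.where(neighbor_is_positive, conf, -conf)

best_params, best_ap = None, -1.0
for c in [0, 1000, 1700, 2500, 3500]:  # candidate distortion thresholds
    # distort_claim is a hypothetical wrapper around the sketch in Section 4.1
    train_txt = [distort_claim(t, c) for t in train_claims]
    dev_txt = [distort_claim(t, c) for t in dev_claims]
    vec = TfidfVectorizer(analyzer="char", ngram_range=(4, 4))
    X_tr, X_dev = vec.fit_transform(train_txt), vec.transform(dev_txt)
    for k in [1, 2, 3, 5]:  # candidate numbers of neighbors
        knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, train_labels)
        ap = average_precision_score(dev_labels, rank_scores(knn, train_labels, X_dev))
        if ap > best_ap:
            best_params, best_ap = (c, k), ap
print("best (C, K):", best_params, "AP:", best_ap)

In our experiments, this kind of search selected K = 1 and C = 1700 for the char n-gram model, as reported in Table 3.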
As mentioned before, the measure used for this task is the Average Precision. In the official testing phase, multiple test files were provided, and the Mean Average Precision (MAP) over them was used for the final results. The official results of Task 1 are shown in Table 4.

Table 4. Official results for Task 1, reported using the MAP measure.

Team          English  Arabic
Prise de Fer  0.1332   -
Copenhagen    0.1152   -
UPV-INAOE     0.1130   0.0585
bigIR         0.1120   0.0899
Fragarach     0.0812   -
blue          0.0801   -
RNCC          0.0632   -

In the English part of the task, our approach achieved the third position among seven teams, with results close to each other. In the Arabic part, only two teams submitted results; similarly to English, the results are close, without a big difference between them. We believe that the lower results of our approach on the Arabic part are due to the automatic translation of the lexicons; a manual translation would have been more reliable.

6 Conclusion and Future Work

Detecting the claims that are worth checking is important as a preliminary step for detecting their factuality. With these two steps, we can improve the journalists' work of manually investigating, for instance, presidential debates; as a result, journalists can do their work more quickly and more easily. As the official results show, performances are low, which reflects the difficulty of the task. We can also conclude that the text distortion method worked better than using the full text in the classification process, since it improved the results compared to the baseline with the plain BOW method. As future work, we will test more features that could discriminate the claims that are worth checking from those that are not.

7 Acknowledgements

The authors acknowledge the SomEMBED TIN2015-71147-C2-1-P MINECO research project. The work on the data in Arabic as well as this publication were made possible by NPRP grant #9-175-1-033 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the last two authors.

References

1. Ghanem, Bilal, Manuel Montes-y-Gómez, Francisco Rangel, and Paolo Rosso. UPV-INAOE-Autoritas - Check That: An Approach based on External Sources to Detect Claims Credibility. In Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum, CLEF '18, Avignon, France, September 2018.
2. Lippi, Marco, and Paolo Torroni. Context-Independent Claim Detection for Argument Mining. In IJCAI, vol. 15, pp. 185-191, 2015.
3. Hassan, Naeemul, Chengkai Li, and Mark Tremayne. Detecting Check-Worthy Factual Claims in Presidential Debates.
In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 1835-1838. ACM, 2015.
4. Gencheva, Pepa, Preslav Nakov, Lluís Màrquez, Alberto Barrón-Cedeño, and Ivan Koychev. A Context-Aware Approach for Detecting Worth-Checking Claims in Political Debates. In Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, pp. 267-276, 2017.
5. Granados, Ana, Manuel Cebrián, David Camacho, and Francisco de Borja Rodríguez. Reducing the Loss of Information Through Annealing Text Distortion. IEEE Transactions on Knowledge and Data Engineering 23, no. 7 (2011): 1090-1102.
6. Stamatatos, Efstathios. Authorship Attribution Using Text Distortion. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pp. 1138-1149, 2017.
7. Mukherjee, Subhabrata, and Gerhard Weikum. Leveraging Joint Interactions for Credibility Analysis in News Communities. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 353-362. ACM, 2015.
8. Atanasova, Pepa, Lluís Màrquez, Alberto Barrón-Cedeño, Tamer Elsayed, Reem Suwaileh, Wajdi Zaghouani, Spas Kyuchukov, Giovanni Da San Martino, and Preslav Nakov. Overview of the CLEF-2018 CheckThat! Lab on Automatic Identification and Verification of Political Claims. Task 1: Check-Worthiness. In Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum, CLEF '18, Avignon, France, September 2018.