iCompass at CheckThat! 2022: Combining Deep Language Models for Fake News Detection

Bilel Taboubi¹, Mohamed Aziz Ben Nessir¹ and Hatem Haddad¹
¹iCompass, Emeraude Palace, Rue du Lac Windermère, Les Berges du Lac, Tunis 1053

Abstract
Users of social media tend to explore different platforms to obtain news and find information about different events and activities. Furthermore, they read, share and publish news with no prior knowledge of whether it is real or fake. This necessitates the development of an automated system for fake news detection. In this paper we report a system and its output as part of the CLEF 2022 CheckThat! lab on Fighting the COVID-19 Infodemic and Fake News Detection. Task 3 was carried out using two parallel BERT base uncased models and data preprocessing with stop-word removal and lemmatization. We achieved a macro F1 score of 0.339 for news classification on the English dataset.

Keywords: categorical classification, fake news detection, BERT, RoBERTa

CLEF 2022: Conference and Labs of the Evaluation Forum, September 5–8, 2022, Bologna, Italy
bileltaboubi20@gmail.com (B. Taboubi); mohamedaziz.bennessir@etudiant-isi.utm.tn (M. A. B. Nessir); haddad.hatem@gmail.com (H. Haddad)
ORCID: 0000-0003-3599-7229 (H. Haddad)

1. Introduction

Social media platforms have grown to unimaginable heights, with a vast amount of information increasing exponentially. This increased information flow allows social media platforms to host plenty of unwanted, untruthful and misleading information that can be created and shared by anyone. As a result, a category of people has taken advantage of this and started disseminating false information about people or entities, negatively impacting individuals, businesses and society. The amount of information being shared is uncontrollable and cannot be fully covered by manual fact-checking sites; consequently, an automated system to detect whether a piece of information is real or fake is needed.

In this paper, we tackle Task 3: Fake News Detection of CLEF2022-CheckThat! [1]. The task requires multi-class classification of articles to determine whether an article's main claim is true, false, partially false or other (lack of evidence). The task is offered as a mono-lingual task in English and as a cross-lingual task for English and German (English training data, German test data) [2]. The paper discusses the results obtained on the English dataset with pre-trained transformer models and the pre-processing techniques applied.

2. Tasks Definition

Task 3 is a multi-class classification problem. Given the text and the title of a news article, determine whether the main claim made in the article is true, partially false, false, or other. The task is offered as a mono-lingual task in English and as a cross-lingual task for English and German. The CheckThat! 2022 lab organizers [3] defined the labels as follows:

• False - The claim made in the article is untrue.
• Partially False - The claim has only weak supporting evidence and cannot be considered 100% true or false.
• True - The claim is totally true.
• Other - Articles whose claims cannot be proven true, false or partially false.
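Concretely, a system emits one of these four labels per article, and submissions are ranked by the macro-averaged F1 measure (see Section 7). The following minimal sketch, assuming scikit-learn is available, shows how that measure treats the four classes; the example gold labels and predictions are invented for illustration only.

```python
# Illustrative sketch: scoring four-class Task 3 output with macro F1.
# Label names follow the task definition; the example values are made up.
from sklearn.metrics import classification_report, f1_score

LABELS = ["true", "partially false", "false", "other"]

# Hypothetical gold labels and system predictions for a handful of articles.
y_true = ["false", "true", "partially false", "other", "false"]
y_pred = ["false", "true", "false", "false", "false"]

# Macro F1 averages the per-class F1 scores, so a rare class such as
# "other" weighs as much as the frequent "false" class in the ranking.
print(f1_score(y_true, y_pred, labels=LABELS, average="macro"))
print(classification_report(y_true, y_pred, labels=LABELS, zero_division=0))
```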
3. Literature Review

The Internet and social media platforms have become a major part of our daily lives, our main source of information as well as of misinformation. With the huge increase of false information, social media platforms need to curb the spread of misinformation on their services. As a result, fake news detection has received wide attention in the NLP research community.

In [4], the authors conducted an exploratory study of COVID-19 misinformation on Twitter. They created two datasets: the first contains 1,500 tweets relating to 1,274 false and 226 partially false claims, collected from claims fact-checked by professional fact-checking organisations in different languages; non-English tweets were translated into English using the Google Translate API. The second dataset contains a corpus of 163,096 English tweets collected with the purpose of understanding the misinformation around COVID-19. The study showed that false claims propagate faster than any other fake news category, and that even verified Twitter accounts of celebrities and organizations take part in spreading misinformation.

In [5], Das et al. proposed an ensemble model for COVID-19 fake news detection for the Constraint COVID-19 shared task [6]. They combined pre-trained models with a heuristic algorithm based on the username handle and link domain in tweets.

In [7], the authors created 'FakeCovid', a multilingual dataset collected from 92 different fact-checking websites, comprising 5,182 articles circulated in 105 countries, 40.8% of them written in English. The dataset was manually annotated in three languages (English, Hindi and German) due to limited knowledge of the other languages. They applied BERT base without fine-tuning, together with preprocessing techniques such as expanding abbreviations and contractions of words and spelling correction, achieving an F1 score of 0.76 on the English dataset.

In [8], the authors presented 'AMUSED', a semi-automatic framework for collecting data from networking sites such as Twitter, YouTube and Reddit in different languages, with the following steps: identify the domain and data sources, scrape the web and detect the language, extract social media links and crawl data from them, label the crawled data, verify the labels manually, and finally merge the crawled social media data with the details from the news articles. As a use case, they applied the framework to COVID-19 misinformation, collecting 8,077 fact-checked news articles from 105 countries in 40 languages.

In [9], the authors presented an overview of the CLEF-2021 CheckThat! lab Task 3 on fake news detection. They described Task 3A, which is about determining whether a claim is true, partially true, false, or other, and Task 3B, which is about classifying an article into a topical domain (health, crime, climate, election, or education). They also described the data provided for each task, its collection and annotation steps, and the participating teams and their solutions. There were 27 teams for Task 3A; the best performing system, by the NoFake team, achieved a macro F1 score of 0.84, ahead of the rest by a rather large margin, by applying BERT base trained with additional data from different fact-checking websites. For Task 3B there were 20 teams, and the best system, by NITK_NLP [10], achieved a 0.88 macro F1 score with an ensemble of three transformer models.

4. Data Preparation

The provided dataset contains 1,294 articles in English (title and text) with their respective labels (true, partially false, false, or other), divided into a training set with 900 rows and a development set with 394 rows. Table 1 presents a sample of the Task 3 dataset and Table 2 shows the distribution of the dataset across its classes.

Table 1
Samples of the Task 3 dataset

public_id | text | title | rating
1145ea7c | U.S. military officials worked to ensure President Trump wouldn't see the warship that bears the name of the late senator, a frequent target ... | The Texas State Senate – Senator Paul Bettencourt: District 7 | true
2d06d27c | A 2,500-strong border and coastguard corps could see armed personnel sent to Greece. The island of Lesbos has been deluged with migrants. The European Union's ... | EU army to protect borders | false

Table 2
Data distribution of the Task 3 dataset

rating | occurrence
True | 211
False | 578
Partially False | 358
Other | 117

For data pre-processing we applied various techniques using the NLTK [11] library: lowercasing, lemmatization, removal of English stopwords such as "are", "the" and "is", and punctuation removal. The dataset contained null values for texts and titles; to make it more manageable, null titles were replaced by their texts and null texts were replaced by their titles.
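A minimal sketch of this pre-processing pipeline is given below, assuming pandas and NLTK are available. The column names follow Table 1; the file name and helper function are ours, for illustration.

```python
# Sketch of the pre-processing described above: lowercasing, punctuation
# removal, stop-word removal, lemmatization, and null handling.
import string

import nltk
import pandas as pd
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# One-time NLTK resources (newer NLTK versions may also need "punkt_tab").
nltk.download("stopwords")
nltk.download("punkt")
nltk.download("wordnet")

STOPWORDS = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()

def preprocess(text: str) -> str:
    # Lowercase, strip punctuation, drop stop-words, lemmatize each token.
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = [LEMMATIZER.lemmatize(t) for t in word_tokenize(text)
              if t not in STOPWORDS]
    return " ".join(tokens)

df = pd.read_csv("task3_train.csv")  # hypothetical file name

# Fill missing titles with the text and missing texts with the title.
df["title"] = df["title"].fillna(df["text"])
df["text"] = df["text"].fillna(df["title"])

df["text"] = df["text"].apply(preprocess)
df["title"] = df["title"].apply(preprocess)
```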
5. Approach

This paper introduces two concatenated parallel BERT models for classifying whether news is real or fake. The prediction of the different news categories (true, false, partially false or other) is done using the following architecture:

- Title input layer and text input layer.
- A BERT model for the text input, followed by a gated recurrent unit with 128 units and 0.3 dropout probability, and a dropout layer with 0.1 probability.
- A BERT model for the title input, followed by a global max pooling layer and a dropout layer with 0.1 probability.
- A concatenation layer to concatenate the outputs of the two BERT models.
- A dense layer with a softmax activation function and four units.

Figure 1: News classes prediction steps.

As presented in Figure 1, the model takes as input the text and the title after pre-processing. The input layers pass the data to the BERT models, one for the text input and the other for the title input. Before predicting the classes, the outputs of the two BERT models are concatenated.
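This two-branch architecture can be sketched roughly as follows in Keras; this is an illustrative reconstruction based on the layer list above, assuming the TensorFlow variant of Hugging Face's BERT, and the variable names and maximum length handling are ours.

```python
# Illustrative Keras sketch of the two-branch model described in Section 5.
# Layer sizes follow the paper; everything else is an assumption.
import tensorflow as tf
from transformers import TFBertModel

MAX_LEN = 128  # sequence length used in the paper's experiments

bert_text = TFBertModel.from_pretrained("bert-base-uncased")
bert_title = TFBertModel.from_pretrained("bert-base-uncased")

# Text branch: BERT -> GRU(128, dropout 0.3) -> Dropout(0.1).
text_ids = tf.keras.Input(shape=(MAX_LEN,), dtype=tf.int32, name="text_ids")
text_mask = tf.keras.Input(shape=(MAX_LEN,), dtype=tf.int32, name="text_mask")
text_seq = bert_text(input_ids=text_ids, attention_mask=text_mask)[0]
text_out = tf.keras.layers.GRU(128, dropout=0.3)(text_seq)
text_out = tf.keras.layers.Dropout(0.1)(text_out)

# Title branch: BERT -> GlobalMaxPooling1D -> Dropout(0.1).
title_ids = tf.keras.Input(shape=(MAX_LEN,), dtype=tf.int32, name="title_ids")
title_mask = tf.keras.Input(shape=(MAX_LEN,), dtype=tf.int32, name="title_mask")
title_seq = bert_title(input_ids=title_ids, attention_mask=title_mask)[0]
title_out = tf.keras.layers.GlobalMaxPooling1D()(title_seq)
title_out = tf.keras.layers.Dropout(0.1)(title_out)

# Concatenate both branches and classify into the four categories.
merged = tf.keras.layers.Concatenate()([text_out, title_out])
outputs = tf.keras.layers.Dense(4, activation="softmax")(merged)

model = tf.keras.Model(
    inputs=[text_ids, text_mask, title_ids, title_mask], outputs=outputs
)
```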
6. Pre-trained Models

In order to achieve the best results, different pre-trained models were used, combined and fine-tuned with different hyperparameters for the multi-class classification task.

BERT base uncased. BERT [12] is a trained Transformer encoder stack that uses bidirectional self-attention. BERT's architecture is composed of multiple encoder layers (also called Transformer blocks), twelve in the Base version, with feedforward networks of 768 hidden units and 12 attention heads. The model is pre-trained on unlabeled data over different pre-training tasks, such as masked language modelling and next sentence prediction. For fine-tuning, the model is initialized with the pre-trained parameters, which are then updated by training on labeled data from the downstream task.

RoBERTa. The self-supervised transformer model RoBERTa [13] was trained on an enormous corpus of English data comprising five English-language corpora of varying sizes and domains, totaling over 160GB of uncompressed text. Self-supervised means it was pre-trained on raw text with no human annotation, using an automated procedure to generate inputs and labels from that text. RoBERTa achieves state-of-the-art results on GLUE (the General Language Understanding Evaluation), RACE (the ReAding Comprehension dataset from Examinations) and SQuAD (the Stanford Question Answering Dataset).

7. Results & Discussion

The pre-trained models BERT base uncased and RoBERTa were trained and fine-tuned with the following architecture: a multi-input model that concatenates two sub-models just before the classification layer. The first sub-model takes the text input, followed by an embedding layer containing a BERT model, a gated recurrent unit layer with 128 units and a 0.3 dropout rate, global max pooling and a dropout layer. The second sub-model consists of an input layer, an embedding layer containing a second BERT model, a global max pooling layer and a dropout layer. The average training time of a model is around 8 minutes. The best results achieved by each pre-trained model, trained on the training set and evaluated on the dev set, are presented in Table 3.

Table 3
Task 3 pre-trained models' results on the dev set

Type | F1 | Accuracy | Precision | Recall
BERT base uncased | 0.513 | 0.511 | 0.555 | 0.511
RoBERTa | 0.227 | 0.237 | 0.220 | 0.237

RoBERTa was pre-trained on a bigger vocabulary than BERT base uncased, but it was outperformed owing to the limited resources available for training: the batch size and sequence length were constrained, and we were unable to exceed a batch size of 10 and a sequence length of 128 when training the RoBERTa model with the proposed architecture. The submitted model was BERT base uncased, trained for 10 epochs with a 2e-5 learning rate for the Adam optimizer, a sequence length of 128, a batch size of 22 and the categorical cross-entropy loss function. The model achieved an F1 score of 0.513 on the dev set.
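Under the same assumptions as the architecture sketch above, the submitted configuration could be reproduced roughly as follows; `model` is the two-branch network sketched earlier, and the data arrays are placeholders for the tokenized inputs and one-hot labels.

```python
# Training configuration reported for the submitted model: 10 epochs,
# Adam at 2e-5, batch size 22, categorical cross-entropy loss.
import tensorflow as tf

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

model.fit(
    # Placeholder arrays: tokenized text/title ids and attention masks.
    [train_text_ids, train_text_mask, train_title_ids, train_title_mask],
    train_labels_onehot,  # shape (n_samples, 4), one column per class
    validation_data=(
        [dev_text_ids, dev_text_mask, dev_title_ids, dev_title_mask],
        dev_labels_onehot,
    ),
    epochs=10,
    batch_size=22,
)
```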
Our model for Task 3 achieved interesting results on the English test set: we were placed first on the Task 3 ranking leaderboard among 25 participants, with a 0.339 macro F1 measure, as shown in Table 4.

Table 4
Top 3 on the Task 3 English leaderboard

Team | Accuracy | F1-Score
iCompass | 0.547 | 0.339
nlpiruned | 0.541 | 0.332
awakened | 0.531 | 0.323

The low macro F1 score can be explained by the categories 'other' and 'partially false', since these classes show low precision and recall scores, as presented in Table 5.

Table 5
iCompass classification report on the test set

class | precision | recall | F1-Score
false | 0.636 | 0.832 | 0.721
other | 0.105 | 0.065 | 0.079
partially false | 0.145 | 0.214 | 0.173
true | 0.602 | 0.281 | 0.383

8. Conclusion

In this paper, we analysed the pre-trained models BERT base uncased and RoBERTa. In order to obtain the best macro F1 for fake news classification on the English dataset, different pre-processing techniques were used, such as stopword removal and lemmatization, with the purpose of removing irrelevant words from the text and title before training. Our model attained a 0.339 macro F1 measure, which is unsatisfactory; this is due to the skewed class distribution of the data, especially for the categories 'other' and 'partially false'. In future work, we will explore augmentation and resampling strategies to create a large, balanced dataset for training and validating our proposed model, and try to overcome these limitations.

References

[1] J. Köhler, G. K. Shahi, J. M. Struß, M. Wiegand, M. Siegel, T. Mandl, Overview of the CLEF-2022 CheckThat! lab task 3 on fake news detection, in: Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum, CLEF '2022, Bologna, Italy, 2022.

[2] P. Nakov, A. Barrón-Cedeño, G. Da San Martino, F. Alam, J. M. Struß, T. Mandl, R. Míguez, T. Caselli, M. Kutlu, W. Zaghouani, C. Li, S. Shaar, G. K. Shahi, H. Mubarak, A. Nikolov, N. Babulkov, Y. S. Kartal, J. Beltrán, The CLEF-2022 CheckThat! lab on fighting the COVID-19 infodemic and fake news detection, in: M. Hagen, S. Verberne, C. Macdonald, C. Seifert, K. Balog, K. Nørvåg, V. Setty (Eds.), Advances in Information Retrieval, Springer International Publishing, Cham, 2022, pp. 416–428.

[3] P. Nakov, A. Barrón-Cedeño, G. Da San Martino, F. Alam, J. M. Struß, T. Mandl, R. Míguez, T. Caselli, M. Kutlu, W. Zaghouani, C. Li, S. Shaar, G. K. Shahi, H. Mubarak, A. Nikolov, N. Babulkov, Y. S. Kartal, J. Beltrán, M. Wiegand, M. Siegel, J. Köhler, Overview of the CLEF-2022 CheckThat! lab on fighting the COVID-19 infodemic and fake news detection, in: Proceedings of the 13th International Conference of the CLEF Association: Information Access Evaluation meets Multilinguality, Multimodality, and Visualization, CLEF '2022, Bologna, Italy, 2022.

[4] G. K. Shahi, A. Dirkson, T. A. Majchrzak, An exploratory study of COVID-19 misinformation on Twitter, Online Social Networks and Media 22 (2021) 100104. URL: https://www.sciencedirect.com/science/article/pii/S2468696420300458. doi:10.1016/j.osnem.2020.100104.

[5] S. D. Das, A. Basak, S. Dutta, A heuristic-driven ensemble framework for COVID-19 fake news detection, 2021. URL: https://arxiv.org/abs/2101.03545. doi:10.48550/ARXIV.2101.03545.

[6] P. Patwa, S. Sharma, S. Pykl, V. Guptha, G. Kumari, M. S. Akhtar, A. Ekbal, A. Das, T. Chakraborty, Fighting an infodemic: COVID-19 fake news dataset, in: Combating Online Hostile Posts in Regional Languages during Emergency Situation, Springer International Publishing, 2021, pp. 21–29. URL: https://doi.org/10.1007/978-3-030-73696-5_3. doi:10.1007/978-3-030-73696-5_3.

[7] G. K. Shahi, D. Nandini, FakeCovid - a multilingual cross-domain fact check news dataset for COVID-19, ICWSM, 2020. URL: https://doi.org/10.36190/2020.14. doi:10.36190/2020.14.

[8] G. K. Shahi, AMUSED: An annotation framework of multi-modal social media data, 2020. URL: https://arxiv.org/abs/2010.00502. doi:10.48550/ARXIV.2010.00502.

[9] G. K. Shahi, J. M. Struß, T. Mandl, Overview of the CLEF-2021 CheckThat! lab: Task 3 on fake news detection, 2021. URL: http://ceur-ws.org/Vol-2936/paper-30.pdf.

[10] H. R. LekshmiAmmal, A. K. Madasamy, Overview of the CLEF-2021 CheckThat! lab: Task 3 on fake news detection, 2021. URL: http://ceur-ws.org/Vol-2936/paper-49.pdf.

[11] S. Bird, E. Klein, E. Loper, Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit, O'Reilly Media, Inc., 2009.

[12] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, 2018. URL: https://arxiv.org/abs/1810.04805. doi:10.48550/ARXIV.1810.04805.

[13] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, 2019. URL: https://arxiv.org/abs/1907.11692. doi:10.48550/ARXIV.1907.11692.