CIC at CheckThat! 2022: Multi-class and Cross-lingual Fake News Detection

Muhammad Arif, Atnafu Lambebo Tonja, Iqra Ameer, Olga Kolesnikova, Alexander Gelbukh, Grigori Sidorov and Abdul Gafar Manuel Meque
Instituto Politécnico Nacional (IPN), Centro de Investigación en Computación (CIC), Av. Juan de Dios Batiz, s/n, 07320, Mexico City, Mexico

mariff2021@cic.ipn.mx (M. Arif); atnafu.lambebo@wsu.edu.et (A. L. Tonja); iqra@nlp.cic.ipn.com (I. Ameer); kolesolga@gmail.com (O. Kolesnikova); gelbukh@cic.ipn.mx (A. Gelbukh); sidorov@cic.ipn.mx (G. Sidorov); gafar1_meque@cic.ipn.mx (A. G. M. Meque)
https://atnafuatx.github.io/ (A. L. Tonja)
ORCID: 0000-0001-6141-0204 (M. Arif); 0000-0002-3501-5136 (A. L. Tonja); 0000-0002-1134-9713 (I. Ameer)

CLEF 2022: Conference and Labs of the Evaluation Forum, September 5–8, 2022, Bologna, Italy

Abstract
Social media has become one of the most widely used platforms for accessing information, and fake news spreads rapidly both there and through other media. This is a matter of serious concern because of its capacity to cause substantial social and national damage, so detecting misleading news automatically is critical. Fake news detection systems have been applied in a variety of domains, such as social media, health, and political news. This paper presents the participation of the Instituto Politécnico Nacional (Mexico) at CheckThat! 2022. We discuss the use of different algorithms for the multi-class and cross-lingual fake news detection tasks. We achieved a macro F1-score of 28.60% on the mono-lingual task in English (Task 3a) using the RoBERTa pre-trained model and 17.21% on the cross-lingual task for English and German (Task 3b) using the Bi-LSTM deep learning algorithm.

Keywords
Fake news detection, Cross-lingual classification, Multi-class detection, Fake news detection for low-resource languages, Transfer learning

1. Introduction

Fake news refers to falsified news or propaganda disseminated through traditional media platforms such as print and television, as well as non-traditional platforms such as social media [1]. The primary objective of disseminating such information is to deceive readers, harm an organization's reputation, or profit from sensationalism. Fake news is widely regarded as one of the most serious threats to democracy, free speech, and social order [2]. It is rapidly disseminated through social media platforms such as Twitter and Facebook [3]. These platforms provide a venue for the general public to express their thoughts and opinions in an unfiltered and uncensored manner, and some news pieces hosted or shared on social media receive more views than those published on conventional media publishers' platforms. Researchers who studied the speed with which fake news spreads on Twitter found that tweets containing misleading information reach individuals six times faster than factual tweets [2]. The negative consequences of false news appear to be unavoidable, ranging from making people believe Hillary Clinton had an alien baby to affecting the 2016 US presidential elections [3].
A few facts about fake news in the United States are as follows:

• 62% of Americans get their news from social media [4]
• Fake news stories were shared more widely on Facebook than legitimate news stories [5]

Another interesting, though sad, example is the false claim that "alcohol is a cure for COVID-19", which caused multiple deaths and hospitalizations in Iran [6]. This shows how helpless we are against fake news in some critical situations and how severe the consequences can be if we ignore it. The first step in dealing with fake news is recognizing it and distinguishing it from real news. Detecting fake information on social media poses numerous new and challenging research problems. Although fake news itself is not a new problem (nations and groups have used the news media for propaganda and influence operations for centuries), the rise of web-generated news on social media makes fake news a more powerful force that challenges traditional journalistic norms. Several characteristics of this problem make it uniquely challenging for automatic detection [7].

In this paper, we propose a methodology to train a model that detects whether an article is authentic or fake based on its words, phrases, sources, and titles, applying supervised machine learning algorithms to an annotated (labeled) dataset for automatic fake news detection. We used three models: the Passive Aggressive Classifier (PAC), a machine learning algorithm; Bi-LSTM, a deep learning algorithm; and RoBERTa, a pre-trained language model from the BERT family, which we fine-tuned. Guided by the confusion matrix results, feature selection methods are then applied to choose the best-fitting features and obtain the highest precision. The resulting models are evaluated on unseen data, the results are plotted, and the outcome is a model that detects and classifies fake articles and can be integrated with any system for future use [8].

This paper discusses multi-class and cross-lingual fake news detection methods for the shared task at CheckThat! 2022. The paper is organized as follows: Section 2 describes past work related to this study, Section 3 gives an overview of the dataset statistics, Section 4 explains the methodology adopted in this study, including the algorithms used, and Section 5 presents the experimental results and their discussion. Finally, Section 6 concludes the paper and sheds some light on possible future work.

2. Related Work

Faking a piece of news has been part of every era of technology in the form of yellow journalism. However, since the advent of social media, the harm it causes has grown manyfold. It has been one of the most challenging problems for researchers over the last decade, as it is very difficult to distinguish fake text from real text [9, 10, 11]. Fake news detection approaches, by and large, fall into two classes depending on whether they use (1) news content or (2) social context [7]. Theoretical fake news studies have examined the classification of fake news in the form of misinformation, disinformation, hysteria, falsehood, propaganda, clickbait, and conspiracy theories. The last decade has witnessed considerable advances in research on fake news with real-life impact. With the emergence of larger datasets, research on fake news detection gained momentum, and numerous studies appeared starting in 2017.
Among the first, Wang [12] introduced LIAR, a relatively large dataset of 12,000 fact-checked and multi-class labeled short news statements from the political domain. To this novel data, he applied various algorithms and deep learning architectures such as Support Vector Machines (SVM), Logistic Regression, Bi-LSTM, and Convolutional Neural Networks (CNN). The domain of automated fake news detection and fake news classification across languages was not studied in depth in previous research. Although [13] trained their models on news texts in one language to classify texts in another, their approach relied on machine translation as a pre-processing step. The lack of cross-language classification studies in fake news detection was due to the absence of appropriate data. Recently, however, [14] presented a multilingual dataset that makes it possible to study transfer learning in fake news classification. Using the embeddings of XLM-R without fine-tuning the model, the authors reported an accuracy of 82% for transfer classification from Italian to French. As their overall results leave room for improvement, this study aims to extend the proposed model further to gain valuable insights from transfer learning.

3. Dataset

In the experimentation phase, we used the dataset for Task 3 provided by the shared task organizers [15, 16]. The dataset contains training, validation, and test sets for English and a test set for German. The training data includes about 1,300 articles in English with one of the labels true, partially false, false, or other [17]. The features are as follows:

• ID: a unique identifier for each instance.
• Text: the content of the news article.
• Title: the headline of the news article.
• Label: the class assigned to the instance (true, partially false, false, or other).

The "partially false" label is assigned to articles that contain both true and false information and therefore cannot be considered 100% true, while the "other" label is assigned to articles that cannot be categorized as true, false, or partially false due to a lack of evidence about their claims. Figure 1 shows the statistics of the training and validation datasets and the imbalance between the four classes in both sets. The training dataset is noticeably imbalanced: 46% of the texts are labeled as false, 28% as partially false, 17% as true, and 9% as other. Thus, approximately half of the training dataset is labeled as false, and approximately one third as partially false.

Figure 1: Label distribution of the training (a) and validation (b) datasets

4. Methodology

For fake news detection, we used three models: the Passive Aggressive Classifier (PAC), a machine learning algorithm; Bi-LSTM, a deep learning algorithm; and RoBERTa [18], a pre-trained language model from the BERT family, which we fine-tuned. For Task 3a, we trained models with the three algorithms on the English training dataset, which contains four labels, and evaluated them on the English test dataset. For Task 3b, we trained the models on the English dataset and then tested them on the German test dataset to evaluate cross-lingual performance. The F1-score was used to measure the models' efficiency.

4.1. Data pre-processing

Data pre-processing is one of the important steps in natural language processing (NLP) tasks. To prepare the data for training, we performed several pre-processing steps, including removing unwanted characters, removing stop words, and converting labels to integers.
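As an illustration, the following is a minimal sketch of such a pre-processing pipeline. The paper does not specify the exact tooling, so the regular-expression cleanup, the NLTK stop-word list, and the particular label-to-integer mapping below are assumptions.

```python
import re

from nltk.corpus import stopwords  # assumes nltk.download("stopwords") has been run

# Hypothetical label order; the paper only states that labels are mapped to integers.
LABEL2ID = {"true": 0, "partially false": 1, "false": 2, "other": 3}
STOP_WORDS = set(stopwords.words("english"))

def preprocess(text: str) -> str:
    """Lowercase, strip unwanted characters, and remove stop words."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())  # drop punctuation and symbols
    return " ".join(tok for tok in text.split() if tok not in STOP_WORDS)

def encode_label(label: str) -> int:
    """Map a class name to its integer id."""
    return LABEL2ID[label.strip().lower()]
```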
4.2. Algorithms

In this section, we discuss the algorithms used in this paper: the Passive Aggressive Classifier, a machine learning algorithm; Bi-LSTM, a deep learning algorithm; and RoBERTa, a pre-trained language model.

4.2.1. Passive Aggressive Classifier

The Passive Aggressive Classifier (PAC) is an online learning classifier used in cases where data such as news or social media posts must be monitored around the clock [19]. PAC is a noteworthy classifier among online learning algorithms: the classification function is updated whenever newly seen data is misclassified or its classification score does not exceed a predetermined margin [20]. The input to PAC is a matrix of TF-IDF features. A model is fitted on the training data and then applied to the test set to evaluate its performance. We used the following parameters when training PAC for fake news detection: a TF-IDF vectorizer to transform text into vectors, with max_df=0.7, and C=0.5, max_iter=50, random_state=5 for the classifier.
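This setup maps directly onto scikit-learn. The sketch below is a minimal reconstruction using the standard TfidfVectorizer and PassiveAggressiveClassifier classes with the parameter values reported above; the variable names train_texts, train_labels, test_texts, and test_labels are placeholders for the pre-processed data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# TF-IDF features feed an online passive-aggressive classifier,
# using the parameter values reported in Section 4.2.1.
pac_pipeline = make_pipeline(
    TfidfVectorizer(max_df=0.7),
    PassiveAggressiveClassifier(C=0.5, max_iter=50, random_state=5),
)

pac_pipeline.fit(train_texts, train_labels)      # placeholder training data
predictions = pac_pipeline.predict(test_texts)   # placeholder test data
print("macro F1:", f1_score(test_labels, predictions, average="macro"))
```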
4.2.2. Bi-LSTM

Deep learning models are widely used for language modeling. Typical deep learning models such as Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) can detect complex patterns in textual data. Long Short-Term Memory (LSTM) is a recurrent neural network architecture used to analyze variable-length sequential data [21]; it learns how much of the previous network state to retain when new input is received. Bi-directional LSTM (Bi-LSTM) processes a sequence both from front to back and from back to front, so that information flows in both directions through the hidden states, and the outputs of the two LSTMs are merged at each time step. The Bi-LSTM model helps overcome the limitations of traditional RNNs, provides good accuracy, and makes the context much easier to capture. Table 1 shows the parameters used for the Bi-LSTM model in our experiment.

Table 1
Bi-LSTM parameters used in model training

Parameter       Value
hidden_units    128
embed_units     100
Dropout         0.2
learning_rate   0.0001
optimizer       adam
batch_size      32
loss            binary cross entropy
num_itr         50
activation      sigmoid
Total params    3,717,745

After pre-processing the text, we used the tokenizer function from Keras to tokenize the input, added padding to the tokenized input, and converted the padded sequences to tensors. We built the Bi-LSTM model with the Keras library (a code sketch is given at the end of Section 4.2). Our Bi-LSTM model contains an embedding layer with a vocabulary size of 34,172, 100 embedding units, and an input length of 5,840, followed by dropout layers, a fully connected layer with 256 neurons and ReLU activation, binary cross-entropy as the loss function, and Adam as the optimizer. The model was trained with a batch size of 32, 50 iterations, and a learning rate of 0.0001.

4.2.3. RoBERTa

RoBERTa is a transformer model pre-trained on a large corpus of English data in a self-supervised fashion. It is an optimized BERT model re-trained with an improved training methodology, more data, and more hardware resources, as proposed by [18]. RoBERTa is similar to BERT but drops the next sentence prediction objective and employs dynamic masking, so that the masked tokens change across training epochs. RoBERTa is trained with dynamic masking, full sentences without the NSP (Next Sentence Prediction) loss, large mini-batches, and a larger byte-level BPE (Byte-Pair Encoding) vocabulary. Furthermore, it is pre-trained on more data, longer sequences, and bigger batch sizes. Table 2 shows the parameters used to train the RoBERTa model.

Table 2
RoBERTa parameters used in model training

Parameter       Value
hidden_units    128
Dropout         0.1
learning_rate   0.0001
optimizer       adam
batch_size      32
num_itr         20
activation      softmax
Total params    125,798,148

The RoBERTa base model used in this paper was fine-tuned on the provided fake news dataset. We used the "roberta-base" model from the Hugging Face library, which is already pre-trained. For both tasks, we added a batch normalization layer to speed up training and make learning easier, and a fully connected output layer with a softmax function so that a probabilistic output over all labels for fake news detection is produced. For both tasks, we fine-tuned the model for 20 iterations. For Task 3a, the model was fine-tuned and tested on the English dataset, while for Task 3b we used the English dataset for fine-tuning and tested the model on the German test data without explicitly training on any German data.
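The following is a minimal Keras sketch of the Bi-LSTM setup described in Section 4.2.2, using the parameters in Table 1. The exact layer ordering is an assumption reconstructed from the description, and train_texts and train_labels are placeholders for the pre-processed data.

```python
import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer

MAX_LEN = 5840  # input length reported in Section 4.2.2

# Tokenize and pad the pre-processed texts, then convert to tensors.
tokenizer = Tokenizer()
tokenizer.fit_on_texts(train_texts)  # placeholder training texts
x_train = tf.convert_to_tensor(
    pad_sequences(tokenizer.texts_to_sequences(train_texts), maxlen=MAX_LEN)
)
y_train = tf.keras.utils.to_categorical(train_labels, num_classes=4)  # one-hot labels

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=34172, output_dim=100),  # vocab 34,172, 100 units
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128)),    # 128 hidden units
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(256, activation="relu"),               # fully connected layer
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(4, activation="sigmoid"),              # sigmoid output, per Table 1
])
model.compile(
    loss="binary_crossentropy",                                  # loss reported in Table 1
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    metrics=["accuracy"],
)
model.fit(x_train, y_train, batch_size=32, epochs=50)
```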
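Similarly, a minimal sketch of the RoBERTa fine-tuning setup of Section 4.2.3, assuming the Hugging Face transformers and PyTorch APIs. For simplicity it uses the library's standard classification head rather than the custom batch-normalized softmax head described above; the hyperparameters follow Table 2, and train_texts and train_labels are placeholders.

```python
import torch
from transformers import RobertaForSequenceClassification, RobertaTokenizerFast

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = RobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=4)

# Placeholder data: a list of article texts and a list of integer labels.
enc = tokenizer(train_texts, truncation=True, padding=True,
                max_length=512, return_tensors="pt")
labels = torch.tensor(train_labels)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # learning rate from Table 2
model.train()
for epoch in range(20):                                # 20 iterations, as reported
    for i in range(0, len(labels), 32):                # batch size 32
        batch = {k: v[i:i + 32] for k, v in enc.items()}
        out = model(**batch, labels=labels[i:i + 32])  # cross-entropy over the 4 classes
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```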
5. Results and Discussion

Table 3 presents the accuracy, precision, recall, and F1-scores obtained by applying the machine learning, deep learning, and transfer learning methods to the dataset provided by the organizers of the CheckThat! 2022 fake news detection shared task at CLEF 2022. In the table, "Task" refers to the monolingual task in English (Task 3a) and the cross-lingual task for English and German (Task 3b), and "Model" refers to the models applied in this study.

Table 3
Results for Task 3a and Task 3b using three different models

Task      Model     Accuracy  Precision  Recall  F1-score
Task 3a   PAC       0.51      0.29       0.25    0.20
          Bi-LSTM   0.52      0.13       0.25    0.17
          RoBERTa   0.47      0.36       0.34    0.29
Task 3b   PAC       -         -          -       -
          Bi-LSTM   0.61      0.37       0.39    0.35
          RoBERTa   0.28      0.13       0.26    0.17

As shown in Table 3, for Task 3a (multi-class fake news classification), fine-tuning the RoBERTa model on the English dataset gives better results than Bi-LSTM and PAC. This suggests that languages with fewer resources can benefit from using pre-trained models. For Task 3b (cross-lingual fake news classification), Bi-LSTM gives better results than RoBERTa and PAC when the models are tested only on the German test dataset. This suggests that monolingual pre-trained language models perform poorly on a new language unless they are fine-tuned on a sufficiently large dataset in that language.

Figures 2 and 3 show the training loss, validation loss, and training and validation accuracy of the Bi-LSTM model for Task 3b. It can be observed that the validation loss is greater than the training loss after 20 epochs, and that the training accuracy is very high while the validation accuracy remains low and constant as the number of epochs increases. This shows that the model does not predict well on the validation data and suggests that it is overfitting, which is typical for deep learning algorithms trained on datasets of this size.

Figure 2: Bi-LSTM training loss vs. validation loss for Task 3b
Figure 3: Bi-LSTM training accuracy vs. validation accuracy for Task 3b

Figures 4 and 5 display the training loss, validation loss, and training and validation accuracy of the RoBERTa model for Task 3b. Both the training loss and the validation loss decrease and stabilize at a specific point, which indicates an optimal fit. The training and validation accuracy also demonstrate how the model performs on a new dataset. Our results demonstrate that using pre-trained language models for low-resource languages can give better results than deep learning algorithms.

Figure 4: RoBERTa training and validation loss for Task 3b
Figure 5: RoBERTa training and validation accuracy for Task 3b

6. Conclusion and Future Work

This paper discussed several classification models for detecting multi-class and cross-lingual fake news. The RoBERTa pre-trained model performed well on the monolingual task compared to the Bi-LSTM and PAC algorithms applied in our experiments. We also observed that for cross-lingual fake news detection, Bi-LSTM performed better than RoBERTa. In the future, we will explore how increasing the amount of data influences the performance of pre-trained models. We also suggest that multilingual pre-trained models may improve cross-lingual fake news classification.

Acknowledgments

The work was done with partial support from the Mexican Government through grant A1S-47854 of CONACYT, Mexico, and grants 20220852, 20220859, and 20221627 of the Secretaría de Investigación y Posgrado of the Instituto Politécnico Nacional, Mexico. The authors thank CONACYT for the computing resources provided through the Plataforma de Aprendizaje Profundo para Tecnologías del Lenguaje of the Laboratorio de Supercómputo of INAOE, Mexico, and acknowledge the support of Microsoft through the Microsoft Latin America PhD Award.

References

[1] A. Thota, P. Tilak, S. Ahluwalia, N. Lohia, Fake news detection: A deep learning approach, SMU Data Science Review 1 (2018) 10.
[2] K. Langin, Fake news spreads faster than true news on Twitter, thanks to people, not bots, Science Magazine (2018).
[3] H. Allcott, M. Gentzkow, Social media and fake news in the 2016 election, Journal of Economic Perspectives 31 (2017) 211–236.
[4] J. Gottfried, E. Shearer, News use across social media platforms 2016, http://www.journalism.org/2016/05/26/news-use-across-social-media-platforms-2016/ (2016).
[5] C. Silverman, L. Alexander, How teens in the Balkans are duping Trump supporters with fake news, BuzzFeed, 14 November 2016.
[6] N. Karimi, J. Gambrell, Hundreds die of poisoning in Iran as fake news suggests methanol cure for virus, 2020.
[7] K. Shu, A. Sliva, S. Wang, J. Tang, H. Liu, Fake news detection on social media: A data mining perspective, ACM SIGKDD Explorations Newsletter 19 (2017) 22–36.
[8] Z. Khanam, B. Alwasel, H. Sirafi, M. Rashid, Fake news detection using machine learning approaches, in: IOP Conference Series: Materials Science and Engineering, volume 1099, IOP Publishing, 2021, p. 012040.
[9] V. L. Rubin, Y. Chen, N. K. Conroy, Deception detection for news: Three types of fakes, Proceedings of the Association for Information Science and Technology 52 (2015) 1–4.
[10] D. Fallis, A functional analysis of disinformation, iConference 2014 Proceedings (2014).
[11] E. C. Tandoc Jr., Z. W. Lim, R. Ling, Defining "fake news": A typology of scholarly definitions, Digital Journalism 6 (2018) 137–153.
[12] W. Y. Wang, "Liar, liar pants on fire": A new benchmark dataset for fake news detection, arXiv preprint arXiv:1705.00648 (2017).
[13] S. K. W. Chu, R. Xie, Y. Wang, Cross-language fake news detection, Data and Information Management 5 (2021) 100–109.
[14] Y. Li, B. Jiang, K. Shu, H. Liu, MM-COVID: A multilingual and multimodal data repository for combating COVID-19 disinformation, arXiv preprint arXiv:2011.04088 (2020).
[15] G. K. Shahi, J. M. Struß, T. Mandl, Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection, Working Notes of CLEF (2021).
[16] G. K. Shahi, D. Nandini, FakeCovid – a multilingual cross-domain fact check news dataset for COVID-19, in: Workshop Proceedings of the 14th International AAAI Conference on Web and Social Media, 2020. URL: http://workshop-proceedings.icwsm.org/pdf/2020_14.pdf.
[17] G. K. Shahi, A. Dirkson, T. A. Majchrzak, An exploratory study of COVID-19 misinformation on Twitter, Online Social Networks and Media 22 (2021) 100104.
[18] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, arXiv preprint arXiv:1907.11692 (2019).
[19] K. Nagashri, J. Sangeetha, Fake news detection using passive-aggressive classifier and other machine learning algorithms, in: Advances in Computing and Network Communications, Springer, 2021, pp. 221–233.
[20] J. Lu, P. Zhao, S. C. Hoi, Online passive-aggressive active learning, Machine Learning 103 (2016) 141–183.
[21] P. Bahad, P. Saxena, R. Kamal, Fake news detection using bi-directional LSTM recurrent neural network, Procedia Computer Science 165 (2019) 74–82.