MUCIC at CheckThat! 2021: FaDo - Fake News Detection and Domain Identification using Transformers Ensembling

Fazlourrahman Balouchzahi¹, Hosahalli Lakshmaiah Shashirekha², Grigori Sidorov¹
¹ Center for Computing Research, Instituto Politécnico Nacional, CDMX, Mexico
² Department of Computer Science, Mangalore University, Mangalore, India

Abstract
Since the beginning of the Covid-19 era in late 2019, the growth curve of patients has been closely accompanied by the growth of fake news. Developing tools and models to distinguish fake news from real news in various domains has therefore become more important than ever. To address fake news detection, in this paper we, team MUCIC, describe the models submitted to 'Fake News Detection', a shared task organized by the CLEF-2021 CheckThat! Lab. This shared task contains two subtasks, namely Fake News Detection of News Articles (Subtask 3A) and Topical Domain Classification of News Articles (Subtask 3B), both of which are multi-class text classification tasks. The proposed models have been developed by fine-tuning three transformer-based language models, namely Roberta, Distilbert, and BERT from HuggingFace, using the training data, and then ensembling them as estimators with majority voting. Evaluated with the evaluation script provided by the organizers, the proposed models obtained F1-scores of 0.5309 and 0.8550 for Subtask 3A and Subtask 3B respectively.

Keywords: Fake News Detection, Domain Identification, Transformers, BERT, Roberta, Distilbert

CLEF 2021 - Conference and Labs of the Evaluation Forum, September 21-24, 2021, Bucharest, Romania

1. Introduction

Anonymity is a significant attribute of the cyber world [1] that also provides ample opportunities to mislead and manipulate people's thoughts and to damage social trust [2]. The ease of access, low cost, and swift broadcasting of information on social media and networks facilitate the spread of fake news in various domains [3]. Fake news detection in different domains has received much attention, especially after the outbreak of Covid-19 and its effects on the entire world.

Texts in social media are highly unstructured and noisy, and they may belong to various domains as people comment on or share messages about various topics. Therefore, identifying the domain of the texts through which news is disseminated in social media is very important and may help in designing models/tools to detect fake news. Fake news detection and domain identification can be modeled as binary Text Classification (TC) problems if there are only two classes; otherwise they are modeled as multi-class TC problems.

Hitherto, many models based on Machine Learning (ML) and Deep Learning (DL) have been experimented with by researchers for TC in general. Of late, Transfer Learning (TL) and transformer-based models have also been getting attention due to their efficient performance in various Natural Language Processing (NLP) and TC tasks [4, 5].
Many DL and Neural Network (NN) based models such as CNN, LSTM, BiLSTM, etc. have been considered the best models for many NLP and TC tasks. However, the introduction of transformers (https://pypi.org/project/transformers/) has changed the game, and since 2017 transformer-based models have been beating DL models in many NLP-related applications (https://yale-lily.github.io/public/matt_f2018.pdf). Transformers are novel architectures with a self-attention mechanism that are primarily used for NLP. Transformer-based models are able to handle long-range dependencies and are usually used for sequence-to-sequence NLP tasks (https://towardsdatascience.com/transformers-89034557de14).

NLP researchers are challenged by the Conference and Labs of the Evaluation Forum (CLEF) 2021 (http://clef2021.clef-initiative.eu/) CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News (https://sites.google.com/view/clef2021-checkthat/home) [6]. Fake News Detection (Task 3) (https://sites.google.com/view/clef2021-checkthat/tasks/task-3-fake-news-detection) is one of the three tasks evaluated by the CheckThat! Lab, and it has two subtasks. Figure 1 gives a graphical representation of the subtasks, whose details are as follows:

• Subtask 3A - Multi-Class Fake News Detection of News Articles (English): a multi-class TC task that takes a given English text and classifies it into one of four pre-defined categories, namely 'False', 'Partially False', 'True', and 'Other'. Table 1 gives the description of the labels in Subtask 3A;
• Subtask 3B - Topical Domain Classification of News Articles: also a multi-class TC task, which further classifies a given fake news item into one of six categories representing six different domains, namely 'Health', 'Climate', 'Economy', 'Crime', 'Elections', and 'Education'. It is worth noting that the classification results of Subtask 3A are not carried over to Subtask 3B; only known fake news provided by the organizers is used for Subtask 3B.

Figure 1: Graphical representation of the Fake News Detection subtasks

Table 1: Classes in Subtask 3A

Category         Description
False            The main content of the given text is fake.
Partially false  The main claim of the given text might be true, but the text also contains false or misleading information; it is neither surely true nor certainly false.
True             The given text includes content that is clearly apparent or capable of being logically proved.
Other            Texts without enough evidence to be categorized into one of the earlier categories.

Many tools and algorithms have been introduced by researchers for TC in general. However, an algorithm that performs well on a particular dataset may not give the same performance on other datasets [3]; it is therefore not reasonable to claim that an algorithm or a model yields the same performance on all datasets. Inspired by Shashirekha et al. [3] and Gundapu et al. [7], we, team MUCIC, used pre-trained models available at HuggingFace (https://huggingface.co/) to develop three transformer-based models, namely BERT, Distilbert, and Roberta, extended each of these models with a step of fine-tuning the respective Language Model (LM), and finally ensembled the extended models with majority voting for fake news detection and domain identification.

The rest of the paper is organized as follows: Section 2 highlights recent literature and related work, followed by the proposed ensemble of transformer-based models for fake news detection and domain identification in Section 3. Section 4 describes the experiments and results, and Section 5 concludes the paper with future scope.
2. Related Work

Alongside the outbreak of Covid-19, the fake news detection task has gained more and more attention due to the critical situation the entire globe is facing, and researchers have developed many models to combat fake news and thereby curb its spread. A few of the latest related works are briefly discussed in this section.

Gundapu et al. [7] proposed several models based on ML, DL, and TL approaches for fake news detection in the English language, in which a given news item is categorized as either 'fake' or 'real'. They conducted experiments on the dataset provided by the ConstraintAI'21 (https://constraint-shared-task-2021.github.io/) shared task organizers, and their ensemble of transformer-based models outperformed the rest of their models with an F1-score of 0.9855. The dataset, described in detail by Patwa et al. [8], consists of 10,700 texts from various social media platforms such as Instagram, Twitter, etc. Their proposed transformer-based ensemble architecture consists of preprocessing and ensemble model construction. The data is preprocessed by converting emoji and hashtags to text, removing punctuation, digits, non-ASCII characters, and stop words, and stemming. Three transformer-based models, namely BERT, ALBERT, and XLNet, are ensembled in such a way that the average of the softmax values of all estimators gives the probability of the final predicted label.

A multilingual cross-domain dataset containing 5,182 fact-checked news articles related to Covid-19 was developed by Shahi et al. [9] by collecting articles from 92 fact-checking websites shared during the first five months of 2020. The collected articles include texts in 40 languages, originally categorized into 11 classes and then filtered into two categories, namely 'False' and 'Other'; the 4,132 texts belonging to the 'False' category contain false information and the remainder belong to the 'Other' category. The authors used a BERT-based model as a baseline and obtained an average F1-score of 0.76 on identifying fake news in the developed dataset.

Another work towards fake news detection in the Covid-19 domain, carried out by Paka et al. [10], includes developing a COVID-19 Twitter Fake news (CTF) dataset. CTF is a large-scale text dataset in the Covid-19 domain collected from Twitter, containing 21.85M unlabelled tweets along with 45.26K labeled tweets, of which 18.55K are labeled as genuine and the remaining 26.71K as fake. The data collection and annotation procedure was carried out in four stages, namely: segregating COVID-19 related tweets, collecting COVID-19 supporting statements, filtering genuine and fake tweets, and human annotation. The authenticity of the CTF dataset is backed by fact-checking websites such as PolitiFact, Snopes, TruthOrFiction, etc. and certain health organizations. The authors proposed Cross-SEAN, a semi-supervised DL model based on neural attention that leverages the huge amount of unlabelled data to improve its performance, and obtained an F1-score of 0.95 on the CTF dataset.

Jiang et al. [11] surveyed ML classifiers, namely Random Forest (RF), k-Nearest Neighbor (kNN), Logistic Regression (LR), and Support Vector Machine (SVM), with TF and TF-IDF features, as well as DL models, namely Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU) networks utilizing GloVe word embeddings, for fake news detection. Further, the authors proposed a stacking approach using the above-mentioned classifiers and evaluated it on two fake news datasets, namely ISOT (https://www.uvic.ca/engineering/ece/isot/datasets/fake-news/index.php) and KDnugget [12], obtaining accuracies of 99.94% and 96.05% respectively. In terms of individual performance, however, the LR classifier with an accuracy of 92.82% and RF with an accuracy of 99.87%, both fed with TF-IDF features, outperformed the other individual classifiers on the two datasets.
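As a rough illustration of the TF-IDF plus stacking setup surveyed by Jiang et al. [11], the sketch below combines LR, RF, and a linear SVM under an LR meta-classifier with scikit-learn. The toy data, n-gram range, and hyperparameters are assumptions for illustration, not the settings of [11]:

```python
# A minimal sketch of TF-IDF features with a stacking ensemble, in the
# spirit of Jiang et al. [11]; toy data and hyperparameters are illustrative.
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["miracle herb cures the virus overnight",
         "parliament passed the budget bill today",
         "celebrity secretly funds alien research",
         "central bank raised interest rates today"]
labels = [1, 0, 1, 0]  # 1 = fake, 0 = real (toy labels)

# Base estimators feed their predictions to an LR meta-classifier
stack = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=200)),
        ("svm", LinearSVC()),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=2,  # small cv only because the toy dataset is tiny
)

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), stack)
model.fit(texts, labels)
print(model.predict(["new herb kills the virus in one day"]))
```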
To address the lack of annotated datasets for fake news detection in low-resource languages, the Forum for Information Retrieval Evaluation (FIRE) 2020 (http://fire.irsi.res.in/fire/2020/home) called for UrduFake (https://www.urdufake2020.cicling.org/home), a shared task on fake news detection in the Urdu language, and provided a dataset consisting of 500 real and 400 fake news articles in Urdu [13]. Balouchzahi et al. [14] proposed various models based on ML, DL, TL, and hybrid approaches for UrduFake. Their ML model is a Voting Classifier (VC) with three estimators, namely Multinomial Naïve Bayes (MNB), Multilayer Perceptron (MLP), and LR, fed with a set of character and word n-gram features. Their DL model was developed on a multi-channel CNN with Skip-gram word embeddings, and their TL model is an implementation of Universal Language Model Fine-Tuning (ULMFiT). Further, all the models were ensembled as a hybrid model based on majority voting. The ML VC outperformed the rest of the models with an average F1-score of 0.7894.

Tools and modules for the analysis of fake news on social media are not limited to detecting the category of a given text; they also aim to identify the spreaders of such false information in order to uncover the intention behind sharing fake news. In this direction, PAN at CLEF 2020 called for a shared task to identify spreaders of fake news in Spanish and English. The dataset provided by the task organizers consists of 100 tweets per user (potential news spreaders) for 300 users. To tackle this task, Shashirekha et al. [2, 3] proposed two classifiers, namely ULMFiT based on the TL approach and an ensemble of ML classifiers as a voting classifier. The ULMFiT model initially uses an unlabelled data collection from Wikipedia to train universal LMs for both English and Spanish. As texts from Wikipedia are from the general domain, there is every chance of missing valuable words and features related to fake news; hence, the LMs obtained by training on Wikipedia are fine-tuned using the training data provided by PAN to make them more specific to the task of fake news spreader detection. The Fast.ai (https://www.fast.ai/) library is used to develop the LMs and classifiers. The proposed ML VC is built with three estimators, namely two linear SVM classifiers and an LR classifier. After preprocessing the texts by removing stop words and punctuation, converting emoji to text, and performing lemmatization, features are extracted: unigram TF-IDF and n-gram TF combined with Doc2vec features are scaled using MaxAbsScaler and used to train the VC, with the Chi-square test, Mutual Information, and the F-test used to select the important features. The final results reported by PAN show that ULMFiT and the ML VC obtained average accuracies of 0.63 and 0.70 respectively.
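Majority-voting ensembles of this kind are straightforward to assemble; the sketch below shows a hard-voting classifier over combined word and character n-gram features in the style of the VC of [14]. The n-gram ranges, toy data, and estimator settings are illustrative assumptions, not the exact configuration of [14]:

```python
# A minimal sketch of a hard (majority) voting classifier over combined
# word and character n-gram features, in the style of the VC of [14];
# n-gram ranges, toy data, and estimator settings are illustrative.
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import FeatureUnion, make_pipeline

# Word unigrams/bigrams plus character 2- to 4-grams, concatenated
features = FeatureUnion([
    ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
    ("char", TfidfVectorizer(analyzer="char", ngram_range=(2, 4))),
])

# voting="hard" takes the majority vote of the predicted labels
vc = VotingClassifier(
    estimators=[
        ("mnb", MultinomialNB()),
        ("mlp", MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="hard",
)

texts = ["shocking cure hidden by doctors", "city council approves new park",
         "moon landing was filmed in a studio", "schools reopen next monday"]
labels = ["fake", "real", "fake", "real"]

model = make_pipeline(features, vc)
model.fit(texts, labels)
print(model.predict(["doctors hide this shocking trick"]))
```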
3. Methodology

The performance and effectiveness of ensembling ML classifiers based on majority voting have already been proved for many tasks. Inspired by Gundapu et al. [7], ensembling of fine-tuned transformer-based models is experimented with in this work for fake news detection and domain identification. Transformer-based models generally involve two steps: pre-training an LM and then fine-tuning the model for the new task. As pre-trained models are publicly available at HuggingFace, it is usually only required to fine-tune the LM with a labeled dataset for the target task. The structure of the proposed model consists of the following four steps:

1. Pre-training: pre-trained models available at HuggingFace are used;
2. Fine-tuning the LMs with texts from the training set to make each LM more domain-specific for the intended task;
3. Training the models obtained in step 2 for classification (each model separately);
4. Ensembling the predictions of the trained models by majority voting.

As the domain of the texts used for pre-training an LM might differ from that of the training set, the individual LMs of the transformer-based models, namely BERT [15], Distilbert [16], and Roberta [17], are first fine-tuned using the training data, and these fine-tuned models are then used as estimators in a VC. The main characteristics of each estimator are given below:

• BERT is pre-trained with Masked Language Modeling (MLM) in a self-supervised fashion; in other words, only raw texts are used to pre-train the model, without manual annotation. MLM, together with Next Sentence Prediction (NSP), enables the model to learn deep representations of a language and to extract effective features from texts for downstream tasks.
• Distilbert is a lightweight BERT model. It follows the objectives of BERT with an additional distillation loss, which pushes it to return output probabilities close to those of the BERT teacher, and it uses a cosine embedding loss to generate hidden states close to those of BERT.
• Roberta is an optimized BERT model re-trained with an improved training methodology, more data, and more hardware resources (https://towardsdatascience.com/bert-roberta-distilbert-xlnet-which-one-to-use-3d5ab82ba5f8). Roberta is similar to BERT but drops the NSP objective and employs dynamic masking, which changes the masked tokens across training epochs.

Table 2: Configuration of the transformer-based models

Model       Type                      Max length  Batch size  Learning rate  Epochs
Roberta     roberta-base              512         16          4e-5           5
Distilbert  distilbert-base-uncased   512         16          4e-5           5
BERT        bert-base-uncased         512         16          4e-5           5

The Fast-bert (https://github.com/utterworks/fast-bert) library, with the configuration given in Table 2, is used to fine-tune the LMs in step 2. Except for the model and type, the same configuration is used for all three LMs, and the LMs are fine-tuned for only 5 epochs due to resource (RAM and GPU) constraints. Fast-bert, built on top of HuggingFace, makes it very simple to train and evaluate transformer-based models. Each individual transformer-based model is then trained for 20 epochs for classification. As per the overall design of the proposed model illustrated in Figure 2, preprocessing of the texts has been skipped, as the models performed better without it. The same architecture is used to construct the models for both subtasks.

Figure 2: Proposed ensemble of transformer-based models
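To make steps 3 and 4 concrete, the sketch below fine-tunes the three models of Table 2 for classification and combines their predictions by majority voting. It uses the plain HuggingFace Transformers Trainer API rather than Fast-bert, the toy data and dataset wrapper are illustrative, and the LM fine-tuning of step 2 (MLM on the task texts, e.g. via AutoModelForMaskedLM with DataCollatorForLanguageModeling) is omitted for brevity. This is a minimal sketch under those assumptions, not the exact submitted code:

```python
# A minimal sketch of steps 3-4: fine-tune three transformers for
# classification and ensemble their predictions by majority voting.
# Uses the HuggingFace Trainer API instead of Fast-bert; toy data and the
# Dataset wrapper are illustrative, hyperparameters follow Table 2.
import numpy as np
import torch
from collections import Counter
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

LABELS = ["false", "partially false", "true", "other"]  # Subtask 3A classes
train_texts = ["toy fake article", "toy true article"]  # placeholders
train_labels = [0, 2]
test_texts = ["unseen article"]

class NewsDataset(torch.utils.data.Dataset):
    """Wraps tokenized texts (and optional labels) for the Trainer."""
    def __init__(self, encodings, labels=None):
        self.encodings, self.labels = encodings, labels
    def __len__(self):
        return len(self.encodings["input_ids"])
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        if self.labels is not None:
            item["labels"] = torch.tensor(self.labels[idx])
        return item

all_preds = []
for name in ["roberta-base", "distilbert-base-uncased", "bert-base-uncased"]:
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(
        name, num_labels=len(LABELS))
    enc = lambda t: tokenizer(t, truncation=True, padding=True, max_length=512)
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir=f"out/{name}",
                               num_train_epochs=20,       # classification step
                               per_device_train_batch_size=16,
                               learning_rate=4e-5),
        train_dataset=NewsDataset(enc(train_texts), train_labels),
    )
    trainer.train()
    logits = trainer.predict(NewsDataset(enc(test_texts))).predictions
    all_preds.append(np.argmax(logits, axis=1))

# Majority vote across the three estimators for each test instance
# (ties are resolved by the first-seen label in this toy sketch)
votes = np.stack(all_preds, axis=1)
final = [Counter(row).most_common(1)[0][0] for row in votes]
print([LABELS[i] for i in final])
```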
4. Experimental Results

For any supervised ML task such as TC, annotated datasets are essential to train the model and to enable the machine to learn the input patterns quickly and clearly [18]. Therefore, the datasets provided by the CheckThat! Lab are used; the data collection steps are detailed by Shahi et al. [19]. Due to mistakes while generating the prediction files for the task, the results of the proposed models reported in the leaderboard are much lower than the actual performance of the systems. Therefore, along with the results reported by the task organizers, the non-official results, i.e., the actual performance of the systems evaluated with the evaluation script (https://gitlab.com/checkthat_lab/clef2021-checkthat-lab/-/tree/master/task3/evaluation) provided by the organizers on re-generated prediction files, are also included in this paper. The re-generated and re-evaluated submissions for both subtasks can be found on our GitHub page (https://github.com/fazlfrs/CheckThat-_Task3_Submissions).

The training set for Subtask 3A consists of 900 texts in four categories, namely 'False', 'Partially False', 'True', and 'Other'. The label descriptions are given in Table 1 and the label distributions over the training sets are shown in Figure 3a. It can be observed that the dataset is highly imbalanced, and, as expected, this has negatively affected the performance of the proposed model. As per the leaderboard results, on the test set of 364 texts the proposed model obtained an F1-score of 0.2334, far below expectation due to the mistakes in the prediction files mentioned earlier; non-officially, however, the model obtained an F1-score of 0.5034 on the re-generated prediction files.

A subset of the fake news from the Subtask 3A dataset is used for Subtask 3B. The Subtask 3B dataset, which includes 318 texts distributed over 6 categories, is also imbalanced; the distribution of labels is shown in Figure 3b. Based on the leaderboard results, on the test set of 137 texts the proposed model obtained an F1-score of 0.1450, far below expectation, while the actual performance obtained non-officially on the re-generated prediction files is an F1-score of 0.8550.

Figure 3: Label distribution over the training sets for the subtasks

A comparison of the official (leaderboard) and non-official results of the proposed models with the top models in the shared task is given in Table 3. There is clearly a huge gap between the officially reported results and the results obtained non-officially. Considering the actual performance of the proposed models, the effectiveness of ensembling classifiers to exploit the strengths of the single models is demonstrated once again.

Table 3: Comparison of the performance of the top models and the proposed models in the shared task

Team/participant name          F1-score (Subtask 3A)  F1-score (Subtask 3B)
sushmakumari                   0.8376                 0.8552
MUCIC (non-official)           0.5039                 0.8550
kannanrrk                      0.5034                 0.8178
jmartinez595                   0.4680                 –
MUCIC (official, leaderboard)  0.2334                 0.1450
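For reference, re-generated prediction files can be scored offline along the following lines; the file names, column names, and the choice of macro-averaged F1 here are assumptions for illustration, and the organizers' evaluation script linked above remains authoritative:

```python
# A minimal sketch of scoring a prediction file offline with scikit-learn;
# file names, column names, and macro-F1 averaging are assumptions -- the
# organizers' evaluation script is the authoritative reference.
import pandas as pd
from sklearn.metrics import classification_report, f1_score

gold = pd.read_csv("gold_labels.csv")   # hypothetical columns: id, label
pred = pd.read_csv("predictions.csv")   # hypothetical columns: id, label
merged = gold.merge(pred, on="id", suffixes=("_gold", "_pred"))

print("macro F1:", f1_score(merged["label_gold"], merged["label_pred"],
                            average="macro"))
print(classification_report(merged["label_gold"], merged["label_pred"]))
```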
5. Conclusion and Future Work

In this paper, we, team MUCIC, have presented the proposed ensemble of transformer-based VC models for Fake News Detection, a shared task (Task 3) in the CLEF-2021 CheckThat! Lab. Three transformer-based models, namely Roberta, Distilbert, and BERT, are fine-tuned twice (first the respective LM and then the downstream TC task) and ensembled as a VC that predicts the label of a given text by majority voting. Due to our mistakes in the submission files, the proposed models achieved low official F1-scores of 0.233 and 0.145, against our expectations, for Subtask 3A: Multi-Class Fake News Detection of News Articles and Subtask 3B: Topical Domain Classification of News Articles respectively. However, the actual performances of the systems show very competitive F1-scores of 0.5309 and 0.8550 for Subtask 3A and Subtask 3B respectively on the re-generated prediction files. Improving the performance of the proposed models by addressing these problems, followed by exploring ML and DL approaches with various feature sets, will be the future work.

6. Acknowledgment

Team MUCIC sincerely appreciates the efforts, guidance, and support of the shared task organizers and thanks the reviewers for their valuable comments and suggestions.

References

[1] F. Balouchzahi, H. L. Shashirekha, LAs for HASOC - learning approaches for hate speech and offensive content identification, in: P. Mehta, T. Mandl, P. Majumder, M. Mitra (Eds.), Working Notes of FIRE 2020 - Forum for Information Retrieval Evaluation, Hyderabad, India, December 16-20, 2020, volume 2826 of CEUR Workshop Proceedings, CEUR-WS.org, 2020, pp. 145-151. URL: http://ceur-ws.org/Vol-2826/T2-6.pdf.
[2] H. L. Shashirekha, F. Balouchzahi, ULMFiT for twitter fake news spreader profiling, in: L. Cappellato, C. Eickhoff, N. Ferro, A. Névéol (Eds.), Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, September 22-25, 2020, volume 2696 of CEUR Workshop Proceedings, CEUR-WS.org, 2020. URL: http://ceur-ws.org/Vol-2696/paper_126.pdf.
[3] H. L. Shashirekha, M. D. Anusha, N. S. Prakash, Ensemble model for profiling fake news spreaders on twitter, in: L. Cappellato, C. Eickhoff, N. Ferro, A. Névéol (Eds.), Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, September 22-25, 2020, volume 2696 of CEUR Workshop Proceedings, CEUR-WS.org, 2020. URL: http://ceur-ws.org/Vol-2696/paper_136.pdf.
[4] S. Gao, M. Alawad, M. T. Young, J. Gounley, N. Schaefferkoetter, H.-J. Yoon, X.-C. Wu, E. B. Durbin, J. Doherty, A. Stroup, et al., Limitations of transformers on clinical text classification, IEEE Journal of Biomedical and Health Informatics (2021).
[5] S. Prabhu, M. Mohamed, H. Misra, Multi-class text classification using BERT-based active learning, arXiv preprint arXiv:2104.14289 (2021).
[6] P. Nakov, G. D. S. Martino, T. Elsayed, A. Barrón-Cedeño, R. Míguez, S. Shaar, F. Alam, F. Haouari, M. Hasanain, N. Babulkov, A. Nikolov, G. K. Shahi, J. M. Struß, T. Mandl, The CLEF-2021 CheckThat! lab on detecting check-worthy claims, previously fact-checked claims, and fake news, in: D. Hiemstra, M. Moens, J. Mothe, R. Perego, M. Potthast, F. Sebastiani (Eds.), Advances in Information Retrieval - 43rd European Conference on IR Research, ECIR 2021, Virtual Event, March 28 - April 1, 2021, Proceedings, Part II, volume 12657 of Lecture Notes in Computer Science, Springer, 2021, pp. 639-649. URL: https://doi.org/10.1007/978-3-030-72240-1_75. doi:10.1007/978-3-030-72240-1_75.
[7] S. Gundapu, R. Mamidi, Transformer based automatic covid-19 fake news detection system, arXiv preprint arXiv:2101.00180 (2021).
[8] P. Patwa, S. Sharma, S. PYKL, V. Guptha, G. Kumari, M. S. Akhtar, A. Ekbal, A. Das, T. Chakraborty, Fighting an infodemic: Covid-19 fake news dataset, arXiv preprint arXiv:2011.03327 (2020).
[9] G. K. Shahi, D. Nandini, FakeCovid - a multilingual cross-domain fact check news dataset for covid-19, in: Workshop Proceedings of the 14th International AAAI Conference on Web and Social Media, 2020. URL: http://workshop-proceedings.icwsm.org/pdf/2020_14.pdf.
[10] W. S. Paka, R. Bansal, A. Kaushik, S. Sengupta, T. Chakraborty, Cross-SEAN: A cross-stitch semi-supervised neural attention model for covid-19 fake news detection, Applied Soft Computing 107 (2021) 107393.
[11] T. Jiang, J. P. Li, A. U. Haq, A. Saboor, A. Ali, A novel stacking approach for accurate detection of fake news, IEEE Access 9 (2021) 22626-22639.
[12] P. H. A. Faustini, T. F. Covões, Fake news detection in multiple platforms and languages, Expert Systems with Applications 158 (2020) 113503.
[13] M. Amjad, G. Sidorov, A. Zhila, Data augmentation using machine translation for fake news detection in the urdu language, in: Proceedings of the 12th Language Resources and Evaluation Conference, 2020, pp. 2537-2542.
[14] F. Balouchzahi, H. L. Shashirekha, Learning models for urdu fake news detection, in: P. Mehta, T. Mandl, P. Majumder, M. Mitra (Eds.), Working Notes of FIRE 2020 - Forum for Information Retrieval Evaluation, Hyderabad, India, December 16-20, 2020, volume 2826 of CEUR Workshop Proceedings, CEUR-WS.org, 2020, pp. 474-479. URL: http://ceur-ws.org/Vol-2826/T3-7.pdf.
[15] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[16] V. Sanh, L. Debut, J. Chaumond, T. Wolf, DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, arXiv preprint arXiv:1910.01108 (2019).
[17] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, arXiv preprint arXiv:1907.11692 (2019).
[18] G. K. Shahi, AMUSED: An annotation framework of multi-modal social media data, arXiv preprint arXiv:2010.00502 (2020).
[19] G. K. Shahi, J. M. Struß, T. Mandl, Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection, in: Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum, CLEF 2021, Bucharest, Romania (online), 2021.