Overview of the CLEF-2022 CheckThat! Lab: Task 3
on Fake News Detection
Juliane Köhler1 , Gautam Kishore Shahi2 , Julia Maria Struß1 , Michael Wiegand3 ,
Melanie Siegel4 , Thomas Mandl5 and Mina Schütz6
1 University of Applied Sciences Potsdam, Germany
2 University of Duisburg-Essen, Germany
3 Alpen-Adria-Universität Klagenfurt, Austria
4 Darmstadt University of Applied Sciences, Germany
5 University of Hildesheim, Germany
6 AIT Austrian Institute of Technology GmbH, Austria


Abstract
This paper describes the results of the CheckThat! Lab 2022 Task 3. This is the fifth edition of the lab, which concentrates on the evaluation of technologies supporting three tasks related to factuality. Task 3 is designed as a multi-class classification problem and focuses on the veracity of German and English news articles. The German subtask was to be solved using a cross-lingual approach, while the English subtask was offered as a mono-lingual task. The participants of the lab were provided an English training, development and test dataset as well as a German test dataset. In total, 25 teams submitted successful runs for the English subtask and 8 for the German subtask. The best performing system for the mono-lingual subtask achieved a macro F1-score of 0.339. The best system for the cross-lingual task achieved a macro F1-score of 0.242. In the paper at hand, we elaborate on the process of data collection, the task setup and the evaluation results, and give a brief overview of the participating systems.

Keywords
Misinformation, Fake News, Text Classification, Evaluation, Deep Learning, Machine Learning




1. Introduction
During the COVID-19 pandemic, the World Health Organization spoke not only of a pandemic, but also of an infodemic, due to the vast amount of false information about the disease.1 Given the huge societal impact of misinformation, research on the topic has received a lot of attention in recent years. To counter the spread of wrong and potentially harmful information, researchers have explored different approaches to detect fake news in different media forms such as social media [1, 2], newspapers [3, 4], deep fakes [5, 6] and others [7, 8]. The
CheckThat! Lab at CLEF contributes to those research efforts by offering tasks along the

CLEF 2022 – Conference and Labs of the Evaluation Forum, September 5-8, 2022, Bologna, Italy
" juliane.koehler@fh-potsdam.de (J. Köhler); gautam.shahi@uni-due.de (G. K. Shahi); struss@fh-potsdam.de
(J. M. Struß); michael.wiegand@aau.at (M. Wiegand); melanie.siegel@h-da.de (M. Siegel);
mandl@uni-hildesheim.de (T. Mandl); mina.schuetz@ait.ac.at.at (M. Schütz)
 0000-0002-7175-5895 (J. Köhler); 0000-0001-6168-0132 (G. K. Shahi); 0000-0001-9133-4978 (J. M. Struß);
0000-0002-5403-1078 (M. Wiegand); 0000-0002-8398-9699 (T. Mandl)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073




1 https://www.who.int/health-topics/infodemic#tab=tab_1
full verification pipeline with high-quality data and corresponding evaluation environments, thereby fostering the development of approaches for fake news identification and providing tools to support individuals. This lab offers three tasks [9, 10] which are all described in the lab's overview paper [11]. The paper at hand provides a detailed overview of Task 3 of the CheckThat! Lab 2022. The task focuses on predicting the truthfulness of articles and is further elaborated in section 3. Furthermore, it addresses the challenge of providing a dataset of genuine information and misinformation in news articles. While users might be careful about trusting low-quality information on social media applications, false information in news articles poses a threat, as victims might be less suspicious. Accordingly, efforts must be made to create the high-quality datasets that are needed to conduct fruitful experiments.
   The remainder of this paper is organized as follows: Section 2 gives an overview of the state-of-the-art research, Section 3 provides detailed information on the task, while Section 4 explains the process of data collection. Section 5 presents the evaluation results and gives an overview of the participants' approaches. Section 6 describes our baseline classifier and lists detailed descriptions of the different approaches used by the individual participating teams. Finally, we provide a brief conclusion and an outlook on potential future work in Section 7.


2. Related Work
Much recent work has been dedicated to the identification of misinformation in general [12, 13, 14] and in social media in particular [15, 16]. Fake news detection for social media poses several challenges which require more research. Among them are visual content [8] and fast dissemination [1, 2]. News articles can be considered less complex than social media posts. Nevertheless, current detection systems still cannot provide satisfying results, as Task 3a of CheckThat! 2021 showed [17]. Furthermore, most studies only model fake news detection as a binary classification problem [1, 2, 3, 4, 14, 18, 19, 20]. Task 3 of the CheckThat! 2022 Lab therefore offers a task on multi-class classification of news articles.
   Several other initiatives related to the CheckThat! Lab at CLEF aim to advance research
on fake news detection. MediaEval 2021 offered a text-based fake news and conspiracy theory
detection task with three subtasks [21]. Their data originates from Twitter posts and news
articles and the main topic in the data is COVID-19. A similar task was already hosted by
MediaEval 2020 [22] focusing on COVID-19 and 5G conspiracy theories.
   RumourEval [23, 24], as part of SemEval, addressed stance detection as well as classifying tweets according to their truthfulness. Other SemEval tasks concerned stance [25] and propaganda detection [26] as well as fact-checking in community question answering forums [27].
FEVER [28, 29] focused on Wikipedia data for supporting or invalidating claims.
   In 2019, the Qatar International Fake News Detection and Annotation Contest2 was conducted. Its task description is similar to last year's iteration of Task 3 of CheckThat! [17]. The first subtask focused on the classification of news articles, detecting whether an article is fake or legitimate [30]. The second subtask was on deciding on the topical domain of a news article and the third addressed the automatic distinction of human and bot accounts on Twitter.

   2 https://sites.google.com/view/fakenews-contest
   Another related shared task is the Fake News Challenge Stage 1 (FNC-I)3, which centered on stance detection. The aim was to develop automatic systems that, given a pair consisting of a title and an article body (from the same or from different articles), classify the pair into one of four stance classes: Agrees (if the body text agrees with the title), Disagrees (if the body text disagrees with the title), Discusses (if the body text discusses the title without taking a position), or Unrelated (if the body text and title are unrelated). Participants were given a dataset that consists of titles and bodies of news articles as well as a stance dataset4. In the latter, each pair of title and body was assigned the according stance. The most successful submission5 applied both an XGBoost classifier and a 1D convolutional neural network classifier. The weighted average of those two classifiers was taken as output.
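To illustrate this kind of combination, the following minimal sketch averages the class-probability outputs of two classifiers; the equal weights and the function name are illustrative assumptions rather than the winning system's actual configuration.

    import numpy as np

    def combine_predictions(probs_a, probs_b, weight_a=0.5, weight_b=0.5):
        """Weighted average of two classifiers' class-probability outputs.

        probs_a, probs_b: arrays of shape (n_samples, n_classes).
        The weights shown here are illustrative, not the winning system's values.
        """
        combined = weight_a * np.asarray(probs_a) + weight_b * np.asarray(probs_b)
        return combined.argmax(axis=-1)  # index of the predicted stance class per sample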
   The technology for detecting misinformation can be broadly categorised into knowledge-based approaches, which compare claims to knowledge bases in some way (e.g. [31]), and text classification approaches, which learn to distinguish between texts with wrong information and texts with correct information based on examples (e.g. [32]). Task 3 of CheckThat! 2022 is dedicated to evaluating text classification methods.


3. Task Description
The CheckThat! Task 3 evaluates systems which predict the veracity of news articles and is designed as a multi-class classification problem. In 2022, the second iteration of the task was conducted. As in 2021, the task was offered as a monolingual task in English. Additionally – in line with the general CLEF mission – the task was also offered as a cross-lingual task this year, providing English training and German test data. The overall problem definition is equivalent to Subtask 3A from last year's task:

Task 3: Multi-class fake news detection of news articles. Given the text and the title of
a news article, determine whether the main claim made in the article is true, partially true, false,
or other. The four categories were proposed based on Shahi et al. [33, 34] and the definitions
for the four categories are as follows:

  False: The main claim made in an article is untrue.

  Partially False: The main claim of an article is a mixture of true and false information. It
  includes articles in categories like partially false, partially true, mostly true, miscaptioned,
  misleading etc., as defined by different fact-checking services.

  True: This rating indicates that the primary elements of the main claim are demonstrably
  true.

  Other: An article that cannot be categorised as true, false, or partially false due to lack of
  evidence about its claims. This category includes articles in dispute and unproven articles.

    3 http://www.fakenewschallenge.org/
    4 https://github.com/FakeNewsChallenge/fnc-1
    5 https://github.com/Cisco-Talos/fnc-1
Figure 1: Overview of data crawling from fact-checked articles.


4. Data Description
For this task, we aimed for two high-quality, real-life fake news datasets that address a wide
range of topics in two languages: English and German. Fact-checking claims is not only a
time-consuming task, but also requires training and experience. Therefore, to ensure high data
quality, we relied on expert evaluations of claims in news articles that were documented on
fact-checking services’ websites. In the following section, the process of data collection and the
dataset itself will be described in detail. A summary of the approach is depicted in Figure 1.

4.1. Crawling Fact-Checking Reports
The general procedure for data crawling was adopted from last year's iteration of the task [17]: First, fact-checking sites with an appropriate focus and a structure suitable for crawling had to be found. For the English data, the same fact-checking sites were used as last year. For the German data, an analysis of available fact-checking sites was conducted, and seven websites were judged suitable for crawling for the purposes of this task. The crawling process was based on the AMUSED framework [35]. From each fact-checking site, we collected the report about a claim, the experts' judgement on the truthfulness of the claim, links to the potential source of a claim, and information on the type of the source (e.g. news article, social media post, video). Based on the source type, automatic filtering was applied, removing all social media posts and non-textual documents.
   If available, the metadata was accessed in JSON format using the ClaimReview type defined by Schema.org. However, many websites did not make use of this format, and the source link did not appear at a consistent position within the reports. Therefore, the first three links of a report were collected, based on the observation that one of those usually referred to the claim source. The collected links were then checked manually to identify the correct source link. For the German data, around 1,300 links from roughly 650 reports were manually examined this way. For the English data, the same was done for around 1,100 links from roughly 780 reports.
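As an illustration of the metadata-based variant of this step, the following minimal sketch reads Schema.org ClaimReview objects from a report page's embedded JSON-LD; the helper name and the fallback behaviour are assumptions made for illustration, not the exact AMUSED implementation.

    import json
    import requests
    from bs4 import BeautifulSoup

    def extract_claim_review(url):
        """Return ClaimReview metadata (rating, claim source) if the report embeds Schema.org JSON-LD."""
        html = requests.get(url, timeout=30).text
        soup = BeautifulSoup(html, "html.parser")
        for script in soup.find_all("script", type="application/ld+json"):
            try:
                data = json.loads(script.string or "")
            except json.JSONDecodeError:
                continue
            # A page may embed a single object or a list of objects.
            items = data if isinstance(data, list) else [data]
            for item in items:
                if item.get("@type") == "ClaimReview":
                    return {
                        "rating": item.get("reviewRating", {}).get("alternateName"),
                        "claim_url": item.get("itemReviewed", {}).get("url"),
                        "claim": item.get("claimReviewed"),
                    }
        return None  # no ClaimReview markup: fall back to collecting the first links of the report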
   Furthermore, to generate more articles for the category true, we made use of the sources the fact-checking experts referred to in order to validate and support their judgements. This decision was made because those references were implicitly judged as reliable. On top of that, these articles covered topics similar to the original claims, thus counteracting a topical bias between classes. The collection of the corresponding URLs was done manually as well.
   After the collection and evaluation of links, roughly 1,500 articles (779 German, 711 English) remained for scraping.

4.2. Scraping Articles from the Web
From the remaining article candidates and their corresponding links, title and text were extracted in an automatic scraping process. Due to the diversity of the websites, the creation of tailored scrapers was not feasible. Instead, the h1-tags were extracted as titles and the contents of the p-tags as text, excluding footer contents. For data with missing titles or texts or with a text length below a threshold of 500 characters, the corresponding articles were checked manually again. Often, those articles had not been extracted correctly, were sitting behind a paywall, or had already been removed from the web. If possible, the missing values were added to the data manually; otherwise, the article was deleted. Since many fact-checking sites relied on archived versions of an article, different URLs sometimes led to the same content. Those duplicates were also removed from the corpus.
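The following is a minimal sketch of this generic extraction step (first h1 tag as title, p-tag contents as text, footers excluded, 500-character threshold); the function and field names are illustrative, not the exact scraper used for the corpus.

    from bs4 import BeautifulSoup

    MIN_TEXT_LENGTH = 500  # articles shorter than this were re-checked manually

    def extract_article(html):
        """Generic extraction: first h1 as title, p-tag contents as text, footers excluded."""
        soup = BeautifulSoup(html, "html.parser")
        # Drop footer sections before collecting paragraphs.
        for footer in soup.find_all("footer"):
            footer.decompose()
        h1 = soup.find("h1")
        title = h1.get_text(strip=True) if h1 else None
        paragraphs = [p.get_text(" ", strip=True) for p in soup.find_all("p")]
        text = " ".join(p for p in paragraphs if p)
        needs_manual_check = not title or len(text) < MIN_TEXT_LENGTH
        return {"title": title, "text": text, "needs_manual_check": needs_manual_check}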

4.3. Data Set for Task 3
In total, we relied on data from 20 different websites (with AFP being used for both languages). Each fact-checking agency made use of its own customized labels (see Table 1 for examples). Thus, instead of providing the original labels, we merged labels with a similar meaning into one of our four categories: false, partially false, true or other.
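Such a label normalisation can be expressed as a simple lookup; the following sketch only covers a subset of the original labels shown in Table 1 and is meant as an illustration, not the complete mapping used to build the corpus.

    # Illustrative subset of the label merging (see Table 1 for more of the original labels).
    LABEL_MAP = {
        "fake": "false", "false": "false", "not true": "false", "pants-fire": "false",
        "half-true": "partially false", "mixed": "partially false", "misleading": "partially false",
        "accurate": "true", "correct": "true", "true": "true",
        "debated": "other", "unclear": "other", "unknown": "other", "unsupported": "other",
    }

    def normalise_label(original_label):
        """Map a fact-checker's original label to one of the four task classes (None if unmapped)."""
        return LABEL_MAP.get(original_label.strip().lower())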
   The participants of Task 3 were provided an English training set that consisted of last year’s
training set (900 articles) and an English development set that served as test set in the last
iteration (364 articles). The newly collected data was given as test set without labels to the
participants. The test sets consist of the title and text from 612 English and 586 German articles.
An overview of the distribution of the different classes in the fake news detection corpus, CT-FAN-22 [36], is given in Table 3. Each dataset included a unique identifier for each individual article, the title of the article, its main text and the respective class label. Table 2 shows some sample data.
Table 1
Examples of labels merged to build the final set of classes for task 3

 task label         original label English                          original label German
 false              fake, false, inaccurate, inaccurate with        Falsch, Stimmt nicht, Fälschung, Frei
                    consideration, incorrect, not true,             erfunden, Manipuliert, Fälschung
                    pants-fire
 partially false    half-true, imprecise, mixed, partially          Irreführend, Teilweise falsch, Größtenteils
                    true, partly true, barely-true, misleading      falsch, Größtenteils richtig, Stimmt
                                                                    eher nicht, Halbwahrheit
 true               accurate, correct, true                         Wahr, Stimmt, Richtig
 other              debated, other, unclear, unknown,               Keine Beweise, Keine Belege, Keine
                    unsupported                                     Hinweise, Unbelegt




Table 2
Sample data for task 3

 public_id: 86668125657521308487648037552383161291
 title:     Geheime Planspiele von „Grünen“ und SPD-Linken: Esken soll an Stelle von Scholz Kanzlerin werden!
 text:      Es ist der große Plan B, über den vor der Wahl nichts nach draußen dringen darf: Auch wenn SPD-Kanzlerkandidat Olaf Scholz am Sonntag als Sieger aus der Bundestagswahl hervorgehen sollte, könnten „Grüne“ und SPD-Linke ihn auf dem Weg ins Kanzleramt noch zu Fall bringen. [...]6
 rating:    False

 public_id: 270077888906743054673881300019061126049
 title:     Quebec's liquor and cannabis stores will require vaccine passport as of Jan. 18
 text:      Health Minister Christian Dubé hopes the measure encourages more Quebecers to get vaccinated. Quebecers will have to present proof of vaccination to access the province's liquor and cannabis stores as of Tuesday, Jan. 18. [...]7
 rating:    True




5. Submissions and Results
In this section, we present an overview of all submissions for Task 3 of the CheckThat! lab 2022.
Each team could submit up to 200 runs. Yet, only the last submission was taken into account for
the evaluation.
    6 https://deutschlandkurier.de/2021/09/geheime-planspiele-von-gruenen-und-spd-linken-esken-soll-an-stelle-von-scholz-kanzlerin-werden/
    7 https://montrealgazette.com/news/local-news/quebecs-liquor-and-cannabis-stores-will-require-vaccine-passport-as-of-jan-18
Table 3
The number of documents and class distribution for the CT-FAN-22 corpus for English (left) and German
(right) fake news detection
       Class             Training   Development       Test              Class             Test
       False               465            113         315               False             191
       True                142             69         210               True              243
       Partially False     217            141          56               Partially False    97
       Other                76             41          31               Other              55
       Total               900            364         612               Total             586




In total, there were 32 submissions evaluated for detecting English fake news and 16 submissions for the German part of the task. Out of the 32 submissions for English detection, 7 were rejected because the submission was either incorrectly formatted or incomplete. For German detection, half of the submissions, that is 8, were rejected; 6 of those 8 flawed submissions contained classification results for the English instead of the German test data.
   In total, 26 teams successfully submitted a solution that fully complied with our format specifications for at least one of the two subtasks. Most of them participated in the English monolingual subtask (25 accepted submissions), while 8 teams also successfully submitted runs to the cross-lingual subtask.
   Both subtasks are classification tasks. Therefore, we used accuracy and the macro-F1 score
for evaluation and ranked the systems by the latter.
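For reference, this scoring can be reproduced with scikit-learn's standard implementations; the following minimal sketch assumes gold and predicted labels are given as lists of the four class names.

    from sklearn.metrics import accuracy_score, f1_score

    def score_run(gold_labels, predicted_labels):
        """Compute the two official metrics; systems are ranked by macro-F1."""
        return {
            "accuracy": accuracy_score(gold_labels, predicted_labels),
            "macro_f1": f1_score(gold_labels, predicted_labels, average="macro"),
        }

    # Example usage with the four task labels:
    # score_run(["false", "true", "other"], ["false", "partially false", "other"])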
   Table 4 provides an overall summary of system performances in terms of macro-F1 score. In both subtasks the best system score is still fairly low. This underlines the general difficulty of both tasks. It comes as no surprise that the best score in the cross-lingual task is below that of the monolingual task, since the former is the more difficult problem.
   Table 4 shows that the baseline classifier, further described in Section 6.1, is very strong for both subtasks; this is particularly true for the monolingual task. In both cases, it is notably above the median. The median in the monolingual task is a bit closer to the best score than in the cross-lingual task. This suggests that the number of stronger systems is greater in the monolingual task.
   In both tasks, the range between worst and best score is still not insignificant. Despite the
fairly similar system design of the different participants within the same subtask as outlined in
section 6 (i.e. fine-tuning a transformer), there still seem to be several degrees of freedom which
have a major impact on overall performance (e.g. hyperparameter settings or the particular
choice of the language model).
   Tables 5 and 6 list the latest submission of each team for the two subtasks.

Table 4
Summary statistics for overall macro F1-scores in the two subtasks

                  Subtask         #Teams     Baseline    Min     Max     Median
                  Monolingual        25        0.312     0.117   0.339    0.271
                  Cross-lingual      8         0.242     0.111   0.290    0.191

5.1. Results of fake news categorization of news articles for English data
In total, 25 teams attempted to solve the first task, which was the monolingual English task.
The best system for the monolingual subtask was submitted by team iCompass [37] (macro-averaged F1 score: 0.339). They applied bert-base-uncased to the title and the main text separately and concatenated the results. The model was fine-tuned on the task-specific training data. They also experimented with RoBERTa, for which they got worse results. No additional external resources were employed in the final classifier.
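A minimal PyTorch sketch of such a two-encoder design is given below; the use of the [CLS] representation and a single linear classification head are assumptions made for illustration, since the system description only states that the two outputs are concatenated.

    import torch
    import torch.nn as nn
    from transformers import AutoModel, AutoTokenizer

    class TitleTextClassifier(nn.Module):
        """Encodes title and body with separate BERT encoders and classifies the concatenation."""
        def __init__(self, model_name="bert-base-uncased", num_labels=4):
            super().__init__()
            self.title_encoder = AutoModel.from_pretrained(model_name)
            self.text_encoder = AutoModel.from_pretrained(model_name)
            hidden = self.title_encoder.config.hidden_size
            self.classifier = nn.Linear(2 * hidden, num_labels)

        def forward(self, title_inputs, text_inputs):
            # Take the [CLS] token representation of each encoder (an assumption; the
            # system description only states that the two outputs are concatenated).
            title_repr = self.title_encoder(**title_inputs).last_hidden_state[:, 0]
            text_repr = self.text_encoder(**text_inputs).last_hidden_state[:, 0]
            return self.classifier(torch.cat([title_repr, text_repr], dim=-1))

    # tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    # title_inputs = tokenizer(["Some headline"], return_tensors="pt")
    # text_inputs = tokenizer(["Article body ..."], truncation=True, max_length=512, return_tensors="pt")
    # logits = TitleTextClassifier()(title_inputs, text_inputs)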
   The second-best system in this subtask was submitted by team NLP&IR@UNED [38] (macro-averaged F1 score: 0.332). The team made use of an ensemble classifier built out of a Funnel Transformer and a feed-forward neural network. The features were extracted with the LIWC text analysis tool.
   A more detailed discussion of the different approaches is given in section 6.

5.2. Results of fake news categorization of news articles for German data
In total, 8 teams attempted to solve the second subtask, which was the English-German cross-lingual task. Team ur-iw-hnt [39], the team with the most successful submission (macro-averaged F1 score: 0.290), translated the first 5,000 tokens of each article from the German test data using Google Translate. They applied an extractive summarization technique and a BERT-Large model for the multi-class classification.
   Team NITK-IT_NLP [40], the team with the second-best submission, divided the text of the news articles into windows of 500 tokens. Those windows were shifted over the text in order not to lose context. They experimented with different transformer models, with an mDeBERTa model yielding the best results. The individual results of all 8 submissions are depicted in Table 6.
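A minimal sketch of such a striding-window split is given below, assuming a HuggingFace tokenizer; the stride value and the model name in the usage comment are illustrative assumptions (the team used windows of 500 tokens).

    from transformers import AutoTokenizer

    def token_windows(text, tokenizer, window_size=500, stride=250):
        """Split a long article into overlapping token windows so no context is lost at window borders."""
        ids = tokenizer(text, add_special_tokens=False)["input_ids"]
        windows = []
        for start in range(0, max(len(ids), 1), stride):
            chunk = ids[start:start + window_size]
            windows.append(tokenizer.decode(chunk))
            if start + window_size >= len(ids):
                break
        return windows

    # tokenizer = AutoTokenizer.from_pretrained("microsoft/mdeberta-v3-base")
    # windows = token_windows(article_text, tokenizer)  # each window is then classified separately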
   A more detailed discussion of the different approaches is given in section 6.


6. Discussion of the Approaches Used
Before we give a summary of the different classification approaches in section 6.2, we will first
describe the baseline classifier that was used in this year’s shared task (section 6.1).

6.1. The Baseline Classifier
To have a starting point for the participants, we created a baseline system. The model used for
the CheckThat! 2022 Task 3 baseline is a standard bert-base-cased model from HuggingFace
(no lower-casing during training). The downloaded pre-trained model is originally trained on
English data and was fine-tuned on the 900 articles from the CheckThat! training set.
Table 5
English: Official evaluation results for English Fake News Detection ranked by the macro-F1 score,
including the F1 scores for individual classes and the overall accuracy
        Team                   True        False     Partially      Other     Accuracy     Macro-F1
                                                      False
 1      iCompass [37]          0.383       0.721        0.173        0.080       0.547       0.339
 2      NLP&IR@UNED [38]       0.446       0.729        0.097        0.057       0.541       0.332
 3      Awakened [41]          0.328       0.744        0.185        0.035       0.531       0.323
 4      UNED                   0.346       0.725        0.191        0.000       0.544       0.315
     Baseline                  0.244       0.701        0.157        0.144       0.480       0.312
5       NLytics [42]           0.339       0.707        0.184        0.000       0.513       0.308
6       SCUoL [43]             0.377       0.709        0.133        0.000       0.526       0.305
7       NITK-IT_NLP [40]       0.325       0.734        0.133        0.000       0.536       0.298
8       CIC [44]               0.111       0.682        0.215        0.136       0.475       0.286
9       ur-iw-hnt [39]         0.290       0.733        0.110        0.000       0.533       0.283
10      BUM [45]               0.207       0.694        0.140        0.063       0.472       0.276
11      boby232                0.255       0.676        0.126        0.045       0.475       0.275
12      HBDCI [46]             0.177       0.708        0.209        0.000       0.508       0.273
13      DIU_SpeedOut           0.195       0.706        0.182        0.000       0.521       0.271
14      DIU_Carbine            0.192       0.626        0.157        0.056       0.472       0.258
15      CODE [47]              0.126       0.662        0.203        0.029       0.444       0.255
16      MNB                    0.160       0.701        0.142        0.000       0.507       0.251
17      subMNB                 0.160       0.701        0.142        0.000       0.507       0.251
18      FoSIL [48]             0.141       0.670        0.169        0.022       0.462       0.251
19      TextMinor [49]         0.250       0.555        0.086        0.048       0.377       0.235
20      DLRG                   0.009       0.694        0.092        0.000       0.513       0.199
21      DIU_Phoenix            0.420       0.040        0.092        0.000       0.278       0.159
22      AIT_FHSTP [50]         0.280       0.146        0.154        0.039       0.199       0.155
23      DIU_SilentKillers      0.407       0.070        0.135        0.000       0.260       0.153
24      DIU_Fire71             0.430       0.006        0.094        0.000       0.275       0.133
25      AI Rational            0.296       0.000        0.196        0.090       0.098       0.117


The training parameters were: a batch size of 8, a maximum sequence length of 512, 10 epochs and a learning rate of 3e-5.
   Since only one article was longer than the maximum sequence length, we did not use passage classification or windows for training. We adopted AdamW as optimizer with a linear scheduler without warm-up. For the training/validation split, we used 90% of the training set for training and 10% for validation. The training loss was 0.04 after 10 epochs. The results on the validation split (part of the training data) after completion of training were the following:

      • Accuracy: 0.56

      • Precision: 0.44 (macro-averaged)

      • Recall: 0.44 (macro-averaged)
Table 6
German: Official evaluation results for German Fake News Detection ranked by the macro-F1 score,
including the F1 scores for individual classes and the overall accuracy


      Team                 True        False      Partially      Other      Accuracy    Macro-F1
                                                   False
1     ur-iw-hnt [39]       0.401       0.536        0.189         0.033       0.427        0.290
    Baseline               0.405       0.328        0.029         0.204       0.280        0.242
2     NITK-IT_NLP [40]     0.268       0.490        0.077         0.063       0.362        0.225
3     UNED                 0.298       0.166        0.210         0.162       0.213        0.209
4     AIT_FHSTP [50]       0.378       0.168        0.151         0.081       0.254        0.195
5     Awakened [41]        0.098       0.452        0.194         0.000       0.283        0.186
6     CIC [44]             0.000       0.449        0.240         0.000       0.282        0.172
7     NoFake               0.000       0.492        0.000         0.000       0.326        0.123
8     AI Rational          0.268       0.000        0.166         0.122       0.114        0.111



     • F1: 0.42 (macro-averaged)

   The results on the test set show that there was a problem with overfitting (see Table 5).
   The baseline model is trained on the title and text content of the articles. It is based on former work in [51], where a bert-base-cased model was also used on another fake news detection dataset. In that work, it was shown that placing the title in front of the body content boosted the accuracy. This boost could also be observed on the CheckThat! Task 3 data. For the German data, we automatically translated the German texts to English and then applied the baseline model. The results of the baseline system for German can be found in Table 6.
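The following is a minimal sketch of this baseline configuration (bert-base-cased, title placed in front of the body text, batch size 8, maximum sequence length 512, 10 epochs, learning rate 3e-5, default AdamW with a linear schedule and no warm-up, 90/10 train/validation split); the data loading, column names and output directory are illustrative assumptions.

    from datasets import Dataset
    from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                              Trainer, TrainingArguments)

    LABELS = ["false", "partially false", "true", "other"]
    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=len(LABELS))

    def preprocess(batch):
        # The title is placed in front of the body text, as in the baseline.
        texts = [f"{t} {x}" for t, x in zip(batch["title"], batch["text"])]
        enc = tokenizer(texts, truncation=True, max_length=512)
        enc["label"] = [LABELS.index(r.lower()) for r in batch["rating"]]
        return enc

    # `records` is assumed to be a list of dicts with "title", "text" and "rating" keys.
    dataset = Dataset.from_list(records).map(preprocess, batched=True)
    split = dataset.train_test_split(test_size=0.1)  # 90% training, 10% validation

    args = TrainingArguments(
        output_dir="baseline",
        per_device_train_batch_size=8,
        num_train_epochs=10,
        learning_rate=3e-5,
        warmup_steps=0,               # linear schedule without warm-up
        lr_scheduler_type="linear",
    )
    trainer = Trainer(model=model, args=args,
                      train_dataset=split["train"], eval_dataset=split["test"],
                      tokenizer=tokenizer)
    trainer.train()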

6.2. Classification Approaches
Most experiments involved deep learning models (16 teams); applications of BERT (12 teams), RoBERTa (6 teams) or other BERT variants (8 teams in total) were especially popular. However, almost as many teams (14) experimented with feature-based supervised-learning approaches as well. Examples are SVMs (10 teams), Logistic Regression (9 teams), Random Forests (8 teams) and Naive Bayes (7 teams). Yet, the majority merely fine-tuned a pre-trained language model and only very few experimented with other approaches.
   Although quite a few participants also experimented with feature-based approaches, only very few teams incorporated a non-standard feature design, i.e. features other than bag of words, n-grams or word embeddings. One team (NLP&IR@UNED [38]) employed features from LIWC [52], another team (HBDCI [46]) made use of surface features to capture misspellings and repeated sentences. One further team (FoSIL [48]) implemented a special feature selection scheme using human behaviour-based optimization.
   Surprisingly, only two participants (BUM [45] and NLP&IR@UNED [38]) considered em-
ploying an ensemble of different classifiers, despite the fact that this procedure is a simple and
established method for effectively combining individual classifiers of varying performances.
   Only very few teams used additional processing techniques which are not part of standard text classification algorithms. BUM [45] exploited Wikipedia for evidence retrieval. Team ur-iw-hnt [39] incorporated summarization techniques (both abstractive and extractive ones) in order to handle long documents suitably.
   In order to bridge the language gap between the English training data and the German test data in the cross-lingual subtask, either a multilingual language model (such as XLM-RoBERTa or mDeBERTa) was used or the data was automatically translated into the other language using services such as Google Translate8. Among the participants, there were no approaches that went beyond these well-established procedures.
   Since all participants pursued a supervised-learning approach, the choice of training data is also an issue that was addressed by several participants. About half of them used training data in addition to the data provided as part of the lab. A popular complement was data from Kaggle.9 10

6.3. Detailed Description of Participants' Systems
In this subsection, we provide a description of the individual participant papers to offer deeper insight into the individual approaches applied to the tasks.
Team AI Rational (monolingual:25 cross-lingual:8) employed a RoBERTa classifier and made
use of other English training data in addition to the provided dataset.
Team AIT_FHSTP [50] (monolingual:22 cross-lingual:4) primarily experimented with different transformers for this task, namely T5 and XLM-RoBERTa. For the evaluation, they used XLM-RoBERTa. For the cross-lingual subtask, the given English training data was translated into German.
Team Awakened [41] (monolingual:3 cross-lingual:5) took part in both subtasks, employing a
BiLSTM architecture with BART sentence transformers for the monolingual and BiLSTM with
XLM sentence transformers for the cross-lingual task.
Team boby232 (monolingual:11) exclusively experimented with feature-based supervised classification, more specifically, k-nearest neighbors. The focus was on tuning the parameters of the classifier. The training data provided by the previous edition of this task was used.
Team BUM [45] (monolingual:10) described an approach to the monolingual fake news detection
task. Numerous additional datasets were added for training. In addition, a Bag-of-Words
approach was used to extract text passages from Wikipedia data that match claims from the
training and test data. A T5 transformer approach checked whether a claim was a logical
consequence of these passages. As a result, the authors found that the approach worked better
for detecting the false class than for the other classes. They attribute this to unbalanced data.
Team CIC [44] (monolingual:8 cross-lingual:6) considered three different classifiers: a passive aggressive classifier, a BiLSTM and a transformer (i.e. RoBERTa). For the monolingual task, RoBERTa performed best, while for the cross-lingual task the BiLSTM performed best.


    8 https://translate.google.com/
    9 www.kaggle.com/datasets/liberoliber/onion-notonion-datasets
    10 www.kaggle.com/c/fakenewskdd2020
Team CODE [47] (monolingual:15) built a system based on two components: the first component establishes whether an instance is relevant (i.e. not belonging to the other class), while the second component classifies the relevant instances (i.e. it distinguishes between the remaining class labels of this subtask). As component classifiers, the authors fine-tuned BERT. The team employed additional training data: the Fake News Detection Challenge KDD 2020 and the Fake News Classification datasets from Kaggle.
Team DIU_Carbine (monolingual:14) first augmented the dataset with additional instances of class true from Kaggle so that the dataset was more balanced for training. TF-IDF was used to generate features for supervised learning. Four different traditional learning algorithms were tested; Logistic Regression turned out to be the one that worked best.
Team DIU_Fire71 (monolingual:24) experimented with several traditional learning algorithms,
namely XGBoost, KNN, Gradient Boosting Classifier, Random Forest, Support Vector Machines,
Naive Bayes, and Decision Trees. They also used other English training data than the one
provided in this year’s task and the dataset from the last iteration of the shared task. For the
best classification model, they used TF-IDF vectorization. XGBoost and the Gradient Boosting
Classifier algorithm achieved the best results.
Team DIU_Phoenix (monolingual:21) employed a multitude of traditional supervised learning algorithms (i.e. Support Vector Machines, Logistic Regression, Random Forests, Decision Trees, XGBoost, Gradient Boosting, Naive Bayes, KNN) and deep learning classifiers (LSTM). They also used English training data other than that provided in this year's task as well as the dataset from the last iteration of the shared task.
Team DIU_SilentKillers (monolingual:23) experimented with Support Vector Machines, Ran-
dom Forests, XGBoost, and LSTM. No additional dataset was used.
Team DIU_SpeedOut (monolingual:13) experimented with several traditional learning algo-
rithms, i.e. Naive Bayes, Logistic Regression, and Stochastic Gradient Descent, and deep learning,
more specifically, LSTM.
Team FoSIL [48] (monolingual:18) employed an SVM in combination with a feature selection
algorithm whose concept is based on human behaviour-based optimization.
Team HBDCI [46] (monolingual:12) compared two different classification approaches: a feature-
based approach using traditional supervised learning and a deep-learning approach that com-
bines BERT, CNN, non-contextual embeddings and stylometric features. Both classifiers were
evaluated in different configurations (i.e. different subsets of features). Overall, the deep-learning
approach outperformed the feature-based approach. Including a subset of stylometric features
was also helpful.
Team iCompass [37] (monolingual:1) employed two fine-tuned BERT models in parallel and concatenated their outputs. One of the models processed the title of an article and the other the main text.
Team MNB (monolingual:16) experimented with Support Vector Machines, Logistic Regression,
Random Forests, and Naive Bayes. Tuning the parameters of their classifiers was the main focus
of their research.
Team NITK-IT_NLP [40] (monolingual:7 cross-lingual:2) examined different transformer mod-
els for both subtasks. They also proposed a classifier trained on striding text windows of the
data. This approach seems necessary since some of the document instances from the given
dataset are fairly long.
Team NLP&IR@UNED [38] (monolingual:2) first used a Longformer model to be able to process the longer sequences that are typical of news texts. As a second approach, they applied data augmentation by splitting the texts into shorter sequences before classifying them with BERT-based models. As a third approach, the authors derived text and LIWC features and used them together with a transformer embedding in an ensemble.
Team NLytics [42] (monolingual:5) experimented both with RoBERTa and Longformer models.
They employed the latter for the official system submission in order to overcome the restriction
of 512 tokens. A topic modeling approach was implemented prior to the classification step to
account for the varying class distributions in different topics.
Team NoFake (cross-lingual:7) also made use of additional, unspecified English training data as well as the training data provided by the previous edition of this task. They experimented with two traditional supervised classifiers (Support Vector Machines, Logistic Regression) and one BERT deep learning classifier. They focused on exploring the different training data in their research.
Team SCUoL [43] (monolingual:6) tested four supervised learning algorithms (features: TF-IDF) and four transformers. They also experimented with additional data from the Kaggle task. The results showed that SVC was the best traditional classifier and bert-large-cased the best transformer model, slightly outperforming SVC. However, the additional data did not result in any performance gains.
Team subMNB (monolingual:17) employed Support Vector Machines, Logistic Regression, Random Forests and Naive Bayes. In addition to the training data provided in the context of the task, other English training data was used.
Team TextMinor [49] (monolingual:19) pursued a deep-learning approach based on RoBERT. In addition to exploiting the information contained in that language model, the authors also included overlap features, a singular value decomposition between text and title, and the cosine similarity between text and title based on their TF-IDF representations.
Team UNED (monolingual:4 cross-lingual:3) experimented with BERT, RoBERTa, and ALBERT
deep learning classifiers. They relied on the following publicly available models: bert-base-cased,
bert-base-uncased, albert-base-v2, multilingual-bert-base, roberta-base. Their focus was on the
fine-tuning of those classifiers.
Team ur-iw-hnt [39] (monolingual:9 cross-lingual:1) experimented with extractive and abstractive summarization. Subsequently, BERT models were applied. In the case of German, the data was first automatically translated into English using machine translation. Large language models worked well, but overfitting was identified as an issue that needs to be avoided.


7. Conclusion
We have presented a detailed overview of Task 3 of the CheckThat! Lab of CLEF 2022. It focused on the classification of news articles with respect to the correctness of their main claims. The results give a realistic estimate of the current state of the art for fake news detection. Most of the participants used transformer-based models like BERT or RoBERTa. Systems based on such technology could be applied within the fact-checking community. However, the results show that more work is required in order to improve the current systems. The macro F1 scores are not yet sufficient for a satisfying multi-class classification of news articles according to their factuality. It is a limitation that the provided dataset was unbalanced. Yet, this shared task is one of the few research initiatives that focuses not on binary, but on multi-class classification. On top of that, we offer a dataset in two languages: German and English. Future research should continue our efforts to provide high-quality, real-world datasets in multiple languages and also broaden the scope by including different kinds of metadata (e.g. social factors), thus enabling research beyond textual features.


8. Acknowledgements
We are thankful to the CheckThat! organizers for supporting this task. We also thank the student volunteers for helping with the annotation of the data.
   Part of this work has been funded by the BMBF (German Federal Ministry of Education and
Research) under the grant no. 01FP20031J. The responsibility for the contents of this publication
lies with the authors.


References
 [1] Y. Liu, Y.-F. B. Wu, Fned: A deep network for fake news early detection on social media,
     ACM Transactions on Information Systems 38 (2020) 1–33. doi:10.1145/3386253.
 [2] Z. Wang, Z. Yin, Y. A. Argyris, Detecting medical misinformation on social media using
     multimodal deep learning, IEEE journal of biomedical and health informatics 25 (2021)
     2193–2203. doi:10.1109/JBHI.2020.3037027.
 [3] V. K. Singh, I. Ghosh, D. Sonagara, Detecting fake news stories via multimodal analysis,
     Journal of the Association for Information Science and Technology 72 (2021) 3–17. doi:10.
     1002/asi.24359.
 [4] M. Villagracia Octaviano, Fake news detection using machine learning, in: 2021 5th
     International Conference on E-Society, E-Education and E-Technology, ICSET 2021,
     Association for Computing Machinery, New York, NY, USA, 2021, p. 177–180. URL:
     https://doi.org/10.1145/3485768.3485774. doi:10.1145/3485768.3485774.
 [5] C.-C. Hsu, Y.-X. Zhuang, C.-Y. Lee, Deep fake image detection based on pairwise learning,
     Applied Sciences 10 (2020) 370. doi:10.3390/app10010370.
 [6] S. Agarwal, H. Farid, T. El-Gaaly, S.-N. Lim, Detecting deep-fake videos from appearance
     and behavior, in: 2020 IEEE International Workshop on Information Forensics and Security
     (WIFS), IEEE, 2020, pp. 1–6. doi:10.1109/WIFS49906.2020.9360904.
 [7] D. Kopev, A. Ali, I. Koychev, P. Nakov, Detecting deception in political debates using acous-
     tic and textual features, in: 2019 IEEE Automatic Speech Recognition and Understanding
     Workshop (ASRU), IEEE, 2019, pp. 652–659. doi:10.1109/ASRU46091.2019.9003892.
 [8] S. Singhal, R. R. Shah, T. Chakraborty, P. Kumaraguru, S. Satoh, Spotfake: A multi-
     modal framework for fake news detection, in: 2019 IEEE Fifth International Conference on
     Multimedia Big Data (BigMM), IEEE, 2019, pp. 39–47. doi:10.1109/BigMM.2019.00-44.
 [9] P. Nakov, A. Barrón-Cedeño, G. Da San Martino, F. Alam, R. Míguez, T. Caselli, M. Kutlu,
     W. Zaghouani, C. Li, S. Shaar, H. Mubarak, A. Nikolov, Y. S. Kartal, J. Beltrán, Overview of
     the CLEF-2022 CheckThat! lab task 1 on identifying relevant claims in tweets, in: Working
     Notes of CLEF 2022—Conference and Labs of the Evaluation Forum, CLEF ’2022, Bologna,
     Italy, 2022.
[10] P. Nakov, G. Da San Martino, F. Alam, S. Shaar, H. Mubarak, N. Babulkov, Overview of
     the CLEF-2022 CheckThat! lab task 2 on detecting previously fact-checked claims, in:
     Working Notes of CLEF 2022—Conference and Labs of the Evaluation Forum, CLEF ’2022,
     Bologna, Italy, 2022.
[11] P. Nakov, A. Barrón-Cedeño, G. Da San Martino, F. Alam, J. M. Struß, T. Mandl, R. Míguez,
     T. Caselli, M. Kutlu, W. Zaghouani, C. Li, S. Shaar, G. K. Shahi, H. Mubarak, A. Nikolov,
     N. Babulkov, Y. S. Kartal, J. Beltrán, M. Wiegand, M. Siegel, J. Köhler, Overview of the
     CLEF-2022 CheckThat! lab on fighting the COVID-19 infodemic and fake news detection,
     in: Proceedings of the 13th International Conference of the CLEF Association: Information
     Access Evaluation meets Multilinguality, Multimodality, and Visualization, CLEF ’2022,
     Bologna, Italy, 2022.
[12] X. Zhou, R. Zafarani, A survey of fake news: Fundamental theories, detection methods,
     and opportunities, ACM Comput. Surv. 53 (2020) 109:1–109:40. URL: https://doi.org/10.
     1145/3395046. doi:10.1145/3395046.
[13] X. Zhang, A. A. Ghorbani, An overview of online fake news: Characterization, detection,
     and discussion, Inf. Process. Manag. 57 (2020) 102025. URL: https://doi.org/10.1016/j.ipm.
     2019.03.004. doi:10.1016/j.ipm.2019.03.004.
[14] M. Hardalov, A. Arora, P. Nakov, I. Augenstein, A Survey on Stance Detection for Mis-
     and Disinformation Identification, in: Findings of the North American Chapter of the
     Association for Computational Linguistics: Human Language Technologies, NAACL ’22
     (Findings), Association for Computational Linguistics, Seattle, WA, USA, 2022. URL: https:
     //arxiv.org/abs/2103.00242.
[15] S. I. Manzoor, J. Singla, et al., Fake news detection using machine learning approaches: A
     systematic review, in: 2019 3rd International Conference on Trends in Electronics and
     Informatics (ICOEI), IEEE, 2019, pp. 230–234. doi:https://doi.org/10.1109/ICOEI.
     2019.8862770.
[16] A. Kazemi, K. Garimella, G. K. Shahi, D. Gaffney, S. A. Hale, Research note: Tiplines to
     uncover misinformation on encrypted platforms: A case study of the 2019 indian general
     election on whatsapp, Harvard Kennedy School Misinformation Review (2022).
[17] G. K. Shahi, J. M. Struß, T. Mandl, Overview of the CLEF-2021 checkthat! lab: Task 3 on fake
     news detection, in: G. Faggioli, N. Ferro, A. Joly, M. Maistro, F. Piroi (Eds.), Proceedings of
     the Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum, Bucharest,
     Romania, September 21st - to - 24th, 2021, volume 2936 of CEUR Workshop Proceedings,
     CEUR-WS.org, 2021, pp. 406–423. URL: http://ceur-ws.org/Vol-2936/paper-30.pdf.
[18] M. Hardalov, I. Koychev, P. Nakov, In search of credible news, in: C. Dichev, G. Agre (Eds.),
     Artificial Intelligence: Methodology, Systems, and Applications, Springer International
     Publishing, Cham, 2016, pp. 172–180.
[19] G. K. Shahi, D. Nandini, Fakecovid–a multilingual cross-domain fact check news dataset
     for covid-19, arXiv preprint arXiv:2006.11343 (2020).
[20] D. Röchert, G. K. Shahi, G. Neubaum, B. Ross, S. Stieglitz, The networked context of
     covid-19 misinformation: Informational homogeneity on youtube at the beginning of the
     pandemic, Online Social Networks and Media 26 (2021) 100164.
[21] K. Pogorelov, D. T. Schroeder, S. Brenner, J. Langguth, Fakenews: Corona virus and 5g
     conspiracies multimedia analysis task at mediaeval 2021, in: Working Notes Proceedings
     of the MediaEval 2021 Workshop, 2022.
[22] K. Pogorelov, D. T. Schroeder, L. Burchard, J. Moe, S. Brenner, P. Filkukova, J. Langguth,
     Fakenews: Corona virus and 5g conspiracy task at mediaeval 2020, in: S. Hicks, D. Jha,
     K. Pogorelov, A. G. S. de Herrera, D. Bogdanov, P. Martin, S. Andreadis, M. Dao, Z. Liu, J. V.
     Quiros, B. Kille, M. A. Larson (Eds.), Working Notes Proceedings of the MediaEval 2020
     Workshop, Online, 14-15 December 2020, volume 2882 of CEUR Workshop Proceedings,
     CEUR-WS.org, 2020. URL: http://ceur-ws.org/Vol-2882/paper64.pdf.
[23] L. Derczynski, K. Bontcheva, M. Liakata, R. Procter, G. Wong Sak Hoi, A. Zubiaga, SemEval-
     2017 task 8: RumourEval: Determining rumour veracity and support for rumours, in:
     Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017),
     Association for Computational Linguistics, Vancouver, Canada, 2017, pp. 69–76. URL:
     https://aclanthology.org/S17-2006. doi:10.18653/v1/S17-2006.
[24] G. Gorrell, E. Kochkina, M. Liakata, A. Aker, A. Zubiaga, K. Bontcheva, L. Derczynski,
     SemEval-2019 task 7: RumourEval, determining rumour veracity and support for rumours,
     in: Proceedings of the 13th International Workshop on Semantic Evaluation, Association
     for Computational Linguistics, Minneapolis, Minnesota, USA, 2019, pp. 845–854. URL:
     https://aclanthology.org/S19-2147. doi:10.18653/v1/S19-2147.
[25] S. Mohammad, S. Kiritchenko, P. Sobhani, X. Zhu, C. Cherry, SemEval-2016 task 6:
     Detecting stance in tweets, in: Proceedings of the 10th International Workshop on
     Semantic Evaluation (SemEval-2016), Association for Computational Linguistics, San
     Diego, California, 2016, pp. 31–41. URL: https://aclanthology.org/S16-1003. doi:10.18653/
     v1/S16-1003.
[26] G. Da San Martino, A. Barrón-Cedeño, H. Wachsmuth, R. Petrov, P. Nakov, SemEval-
     2020 task 11: Detection of propaganda techniques in news articles, in: Proceedings of the
     Fourteenth Workshop on Semantic Evaluation, International Committee for Computational
     Linguistics, Barcelona (online), 2020, pp. 1377–1414. URL: https://aclanthology.org/2020.
     semeval-1.186. doi:10.18653/v1/2020.semeval-1.186.
[27] T. Mihaylova, G. Karadzhov, P. Atanasova, R. Baly, M. Mohtarami, P. Nakov, SemEval-2019
     task 8: Fact checking in community question answering forums, in: Proceedings of the
     13th International Workshop on Semantic Evaluation, Association for Computational
     Linguistics, Minneapolis, Minnesota, USA, 2019, pp. 860–869. URL: https://aclanthology.
     org/S19-2149. doi:10.18653/v1/S19-2149.
[28] J. Thorne, A. Vlachos, O. Cocarascu, C. Christodoulopoulos, A. Mittal, The fact ex-
     traction and VERification (FEVER) shared task, in: Proceedings of the First Work-
     shop on Fact Extraction and VERification (FEVER), Association for Computational
     Linguistics, Brussels, Belgium, 2018, pp. 1–9. URL: https://aclanthology.org/W18-5501.
     doi:10.18653/v1/W18-5501.
[29] J. Thorne, A. Vlachos, C. Christodoulopoulos, A. Mittal, FEVER: a large-scale dataset for
     fact extraction and VERification, in: Proceedings of the 2018 Conference of the North
     American Chapter of the Association for Computational Linguistics: Human Language
     Technologies, Volume 1 (Long Papers), Association for Computational Linguistics, New
     Orleans, Louisiana, 2018, pp. 809–819. URL: https://aclanthology.org/N18-1074. doi:10.
     18653/v1/N18-1074.
[30] W. Antoun, F. Baly, R. Achour, A. Hussein, H. Hajj, State of the art models for fake news
     detection tasks, in: 2020 IEEE International Conference on Informatics, IoT, and En-
     abling Technologies (ICIoT), IEEE, 2020, pp. 519–524. doi:10.1109/ICIoT48696.2020.
     9089487.
[31] M. Mayank, S. Sharma, R. Sharma, DEAP-FAKED: knowledge graph based approach for
     fake news detection, CoRR abs/2107.10648 (2021). URL: https://arxiv.org/abs/2107.10648.
     arXiv:2107.10648.
[32] Q. Su, M. Wan, X. Liu, C.-R. Huang, et al., Motivations, methods and metrics of misin-
     formation detection: an nlp perspective, Natural Language Processing Research 1 (2020)
     1–13.
[33] G. K. Shahi, A. Dirkson, T. A. Majchrzak, An exploratory study of covid-19 misinformation
     on twitter, Online social networks and media (2021) 100104. doi:10.1016/j.osnem.
     2020.100104.
[34] G. K. Shahi, T. A. Majchrzak, Exploring the spread of covid-19 misinformation on twitter,
     EasyChair Preprint no. 6009, EasyChair, 2021.
[35] G. K. Shahi, AMUSED: An annotation framework of multi-modal social media data, 2020.
     arXiv:2010.00502.
[36] G. K. Shahi, J. M. Struß, T. Mandl, J. Köhler, M. Wiegand, M. Siegel, CT-FAN-22 corpus: A
     Multilingual dataset for Fake News Detection, 2022. URL: https://doi.org/10.5281/zenodo.
     6555293. doi:10.5281/zenodo.6555293.
[37] B. Taboubi, M. A. B. Nessir, H. Haddad, iCompass at CheckThat! 2022: combining deep
     language models for fake news detection, in: Working Notes of CLEF 2022 - Conference
     and Labs of the Evaluation Forum, CLEF ’2022, Bologna, Italy, 2022.
[38] J. R. Martinez-Rico, J. Martinez-Romo, L. Araujo, NLP&IRUNED at CheckThat! 2022: en-
     semble of classifiers for fake news detection, in: Working Notes of CLEF 2022 - Conference
     and Labs of the Evaluation Forum, CLEF ’2022, Bologna, Italy, 2022.
[39] H. N. Tran, U. Kruschwitz, ur-iw-hnt at CheckThat! 2022: cross-lingual text summarization
     for fake news detection, in: Working Notes of CLEF 2022 - Conference and Labs of the
     Evaluation Forum, CLEF ’2022, Bologna, Italy, 2022.
[40] R. L. Hariharan, M. Anand Kumar, Nitk-it_nlp at checkthat! 2022: Window based approach
     for fake news detection using transformers, in: Working Notes of CLEF 2022 - Conference
     and Labs of the Evaluation Forum, CLEF ’2022, Bologna, Italy, 2022.
[41] C.-O. Truică, E.-S. Apostol, A. Paschke, Awakened at CheckThat! 2022: fake news detection
     using BiLSTM and sentence transformer, in: Working Notes of CLEF 2022 - Conference
     and Labs of the Evaluation Forum, CLEF ’2022, Bologna, Italy, 2022.
[42] A. Pritzkau, O. Blanc, M. Geierhos, U. Schade, NLytics at CheckThat! 2022: hierarchical
     multi-class fake news detection of news articles exploiting the topic structure, in: Working
     Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum, CLEF ’2022, Bologna,
     Italy, 2022.
[43] S. Althabiti, M. A. Alsalka, E. Atwell, SCUoL at CheckThat! 2022: fake news detection
     using transformer-based models, in: Working Notes of CLEF 2022 - Conference and Labs
     of the Evaluation Forum, CLEF ’2022, Bologna, Italy, 2022.
[44] M. Arif, A. L. Tonja, I. Ameer, O. Kolesnikova, A. Gelbukh, G. Sidorov, A. G. Meque, CIC
     at CheckThat! 2022: multi-class and cross-lingual fake news detection, in: Working Notes
     of CLEF 2022 - Conference and Labs of the Evaluation Forum, CLEF ’2022, Bologna, Italy,
     2022.
[45] D. La Barbera, K. Roitero, J. Mackenzie, S. Damiano, G. Demartini, S. Mizzaro, BUM
     at CheckThat! 2022: a composite deep learning approach to fake news detection using
     evidence retrieval, in: Working Notes of CLEF 2022 - Conference and Labs of the Evaluation
     Forum, CLEF ’2022, Bologna, Italy, 2022.
[46] C. P. Capetillo, D. Lecuona-Gómez, H. Gómez-Adorn, I. Arroyo-Fernández, J. Neri-Chávez,
     HBDCI at CheckThat! 2022: fake news detection using a combination of stylometric
     features and deep learning, in: Working Notes of CLEF 2022 - Conference and Labs of the
     Evaluation Forum, CLEF ’2022, Bologna, Italy, 2022.
[47] O. Blanc, A. Pritzkau, U. Schade, M. Geierhos, CODE at CheckThat! 2022: multi-class fake
     news detection of news articles with BERT, in: Working Notes of CLEF 2022 - Conference
     and Labs of the Evaluation Forum, CLEF ’2022, Bologna, Italy, 2022.
[48] A. Ludwig, J. Felser, J. Xi, D. Labudde, M. Spranger, FoSIL at CheckThat! 2022: using
     human behaviour-based optimization for text classification, in: Working Notes of CLEF
     2022 - Conference and Labs of the Evaluation Forum, CLEF ’2022, Bologna, Italy, 2022.
[49] S. Kumar, G. Kumar, S. R. Singh, TextMinor at CheckThat! 2022: fake news article detection
     using RoBERT, in: Working Notes of CLEF 2022 - Conference and Labs of the Evaluation
     Forum, CLEF ’2022, Bologna, Italy, 2022.
[50] M. Schütz, J. Böck, M. Andresel, A. Kirchknopf, D. Liakhovets, D. Slijepčević, A. Schindler,
     AIT_FHSTP at CheckThat! 2022: Cross-lingual fake news detection with a large pre-trained
     transformer, in: Working Notes of CLEF 2022 - Conference and Labs of the Evaluation
     Forum, CLEF ’2022, Bologna, Italy, 2022.
[51] M. Schütz, Detection and identification of fake news: Binary content classification with
     pre-trained language models, in: Information between Data and Knowledge, volume 74 of
     Schriften zur Informationswissenschaft, Werner Hülsbusch, Glückstadt, 2021, pp. 422–431.
     URL: https://epub.uni-regensburg.de/44959/. Gerhard Lustig Award Papers.
[52] J. W. Pennebaker, R. L. Boyd, K. Jordan, K. Blackburn, The development and psychometric
     properties of LIWC2015, Technical Report, University of Texas, 2015.