AI Rational at CheckThat! 2022: Using transformer models for tweet classification

Aleksandar Savchev
Sofia University, Bulgaria

Abstract
This paper is an overview of the approach taken by team AI Rational in CheckThat! 2022 for Task 1 in English, Bulgarian, Dutch and Turkish. Task 1 is about classifying COVID-19 tweets and has four subtasks: 1A check-worthiness, 1B verifiable factual claims detection, 1C harmful tweet detection, and 1D attention-worthy tweet detection. This document focuses on the experiments done for 1A English, where the team got first place out of 13 teams; the same techniques were applied to the other languages and subtasks. We describe our data preprocessing and data augmentation, as well as the use of the transformer models BERT, DistilBERT and RoBERTa for text classification and how we fine-tuned them for the best results.

Keywords
Check-worthiness, COVID-19, transformer models

CLEF 2022 – Conference and Labs of the Evaluation Forum, September 5–8, 2022, Bologna, Italy
EMAIL: alex1alex@abv.bg (A. Savchev)
ORCID: 0000-0003-4626-643X (A. Savchev)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction

In today's day and age, with the mass spread of the internet, access to large quantities of information is easily achieved. As the amount of data increases, the amount of disinformation increases as well. To satisfy their need for information, people rely on trustworthy news sites where articles are edited and checked by publishers. However, most people also use social media sites, where anyone can freely post information about different topics that is not fact-checked or may even be malicious in nature. Vulnerable topics include, but are not limited to, finance, healthcare and politics. To combat the spread of misinformation on social media sites, Task 1 of CheckThat! 2022 is about classifying Twitter posts (tweets) related to COVID-19 in English, Bulgarian, Turkish, Dutch and Arabic. Task 1 has four subtasks:

• 1A Check-worthiness of tweets: Given a tweet, predict whether it is worth fact-checking.
• 1B Verifiable factual claims detection: Given a tweet, predict whether it contains a verifiable factual claim.
• 1C Harmful tweet detection: Given a tweet, predict whether it is harmful to society and why.
• 1D Attention-worthy tweet detection: Given a tweet, predict whether it should get the attention of policy makers and why.

Models were trained on all four subtasks in English, Bulgarian, Dutch and Turkish. However, this document presents the approach and experiments on subtask 1A English only; the same approach was used for the other subtasks and languages. Section 2 reviews the most successful models from previous editions of this task, presents the models used to solve the task (DistilBERT, BERT and RoBERTa) and the method of fine-tuning them, and describes the data preprocessing and data augmentation applied to the training data.

2. Usage of transformer models

2.1. Previous successful approaches for CheckThat!

In CheckThat! 2021 the most successful participants used various transformer models, as seen in [1]; such models include BERT, RoBERTa, BETO, AraBERT and BERTurk. Some teams applied data augmentation, and most performed simple data preprocessing. For deeper insight, papers from individual participants were analyzed.

In [2] the team tried different transformer models. For their final submission they used BERT pre-trained on tweets and obtained first place in English and fourth in Spanish on the 2021 Task 1A, which is the equivalent of the CLEF 2022 Task 1A. The authors of [7] experimented with fine-tuning the pretrained models BETO and RoBERTa, achieving fifth place in English and first place in Spanish on Task 1A.

2.2. Dataset

For Task 1 English, the "train" corpus consists of 2122 entries. Each entry is a tweet related to COVID-19, labeled with 0 or 1 according to whether it is worth fact-checking or not. Each entry also contains the tweet's id and URL. The organizers also provided a "dev" set of 195 entries and a "dev_test" set of 574 entries.

For data preprocessing, all links in the data were replaced with "@link". No other text edits were made, since the models use their own tokenizers. For English, we experimented with back translation to increase the training set: English tweets were translated to French and then translated back to English, and tweets that remained identical were removed. Both the "train" set and the "dev" set were used for training, totaling 2317 entries. With back translation the training set increased to 4439 entries. The "dev_test" corpus was used as the validation set.
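The listing below is a minimal sketch of how the two steps described above (link replacement and English–French–English back translation) could be implemented. The URL pattern and the translate(text, src, dst) helper are illustrative assumptions rather than the exact code behind the submitted runs.

```python
# Minimal sketch of the preprocessing and back-translation steps described
# above. The URL regular expression and the translate() helper are
# illustrative assumptions, not the exact pipeline used for the submission.
import re

URL_PATTERN = re.compile(r"https?://\S+")

def preprocess(tweet: str) -> str:
    """Replace every link in a tweet with the placeholder '@link'."""
    return URL_PATTERN.sub("@link", tweet)

def back_translate(tweet: str, translate) -> str:
    """English -> French -> English round trip using a caller-supplied
    translate(text, src, dst) function (hypothetical interface)."""
    french = translate(tweet, src="en", dst="fr")
    return translate(french, src="fr", dst="en")

def augment(entries: list[dict], translate) -> list[dict]:
    """Append back-translated copies of the tweets, dropping any copy that
    is identical to its (preprocessed) original."""
    augmented = list(entries)
    for entry in entries:
        original = preprocess(entry["tweet_text"])
        translated = back_translate(original, translate)
        if translated != original:
            augmented.append({**entry, "tweet_text": translated})
    return augmented
```

Keeping only the back-translated copies that differ from their originals is why the training set grows from 2317 to 4439 entries rather than doubling exactly.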
2.3. Experiments done

Experiments for Task 1A in English were made with three different pre-trained transformer models: BERT [3], DistilBERT [4] (a distilled version of BERT) and RoBERTa [5]. All models were taken from Hugging Face (https://huggingface.co). Hyperparameter tuning was done on the DistilBERT model, since it is the fastest one to train. The more notable hyperparameters are 15 training epochs, a warm-up ratio of 6% and a learning rate of 3e-5. These settings were then reused when training the other models, as well as for the other subtasks and languages.

For the other subtasks the same experiments were done; for English the final submissions were made with RoBERTa, while for the other languages XLM-RoBERTa [6] (a cross-lingual version of RoBERTa) was used. For Dutch and Turkish no data augmentation was used, while for Bulgarian the training set was increased with back translation, translating Bulgarian to English and then back to Bulgarian to obtain more data.
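The following sketch shows how such a fine-tuning run can be set up with the Hugging Face Trainer using the hyperparameters listed above (15 epochs, 6% warm-up, learning rate 3e-5). The checkpoint name, batch size, file names and column names are assumptions made for illustration; only those three hyperparameters come from the experiments described in this section.

```python
# Sketch of the fine-tuning setup with the Hugging Face Trainer, using the
# hyperparameters reported above: 15 epochs, 6% warm-up and learning rate 3e-5.
# Checkpoint name, batch size, file names and column names are assumptions.
import numpy as np
from datasets import load_dataset
from sklearn.metrics import accuracy_score, f1_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "distilbert-base-uncased"  # swapped for BERT / RoBERTa in other runs

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Assumed CSV files with a "tweet_text" column and a 0/1 "labels" column.
data = load_dataset("csv", data_files={"train": "train_plus_dev.csv",
                                       "validation": "dev_test.csv"})

def tokenize(batch):
    return tokenizer(batch["tweet_text"], truncation=True, max_length=128,
                     padding="max_length")

data = data.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    # Accuracy and F1 for the positive (check-worthy) class, as in Table 1.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": accuracy_score(labels, preds),
            "f1_positive": f1_score(labels, preds, pos_label=1)}

args = TrainingArguments(
    output_dir="checkworthiness-1a",
    num_train_epochs=15,             # reported number of epochs
    learning_rate=3e-5,              # reported learning rate
    warmup_ratio=0.06,               # reported 6% warm-up
    per_device_train_batch_size=16,  # assumption
)

trainer = Trainer(model=model, args=args, train_dataset=data["train"],
                  eval_dataset=data["validation"], compute_metrics=compute_metrics)
trainer.train()
print(trainer.evaluate())
```

In this setup, reproducing the other runs is mostly a matter of switching MODEL_NAME to the corresponding BERT, RoBERTa or XLM-RoBERTa checkpoint.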
2.4. Results from experiments

Table 1 shows the results of the different experiments on Task 1A English with the train and test sets provided by the organizers. As expected, the best results were obtained with the RoBERTa model. Data augmentation improved the metrics slightly.

Table 1
Experiments on Task 1A English

Model                             Accuracy    F1 (positive class)
BERT                              0.8118      0.8833
DistilBERT                        0.8101      0.8836
RoBERTa                           0.8432      0.9010
BERT + data augmentation          0.8118      0.8838
DistilBERT + data augmentation    0.8153      0.8862
RoBERTa + data augmentation       0.8536      0.9070

3. Placements

Table 2 shows the placements in the official leaderboards (https://docs.google.com/spreadsheets/d/1LMjU7nrl2R7iuAE023kwip2lxQw9jG2V6hEHPChBuHE). The rankings are written in the format X/N, where X is the place taken by the model and N is the number of participants in that subtask.

Table 2
Placements

Subtask    English    Bulgarian    Dutch    Turkish
1A         1/13       3/5          2/5      -
1B         4/9        1/2          1/2      2/4
1C         2/10       1/2          2/2      3/4
1D         2/5        1/2          1/2      1/2

4. Conclusion and future work

This paper presented the experiments done with the transformer models BERT, DistilBERT and RoBERTa and with the back-translation data augmentation technique on Task 1A in English. The results achieved are satisfactory. For future work, experiments with transformer models pre-trained on tweets are advised, and experimenting with different data augmentation techniques to increase the data set may yield better results.

5. Acknowledgements

This work has been supported by professor Preslav Nakov, professor Ivan Koychev and assistant Momchil Hardalov. Victor Kostov (who worked on Task 2) and Kristian Zhelyazkov (who worked on Task 3) from the AI Rational team contributed to this work as well. All of the aforementioned people are from Sofia University.

6. References

[1] Shaden Shaar, Maram Hasanain, Bayan Hamdan, Zien Sheikh Ali, Fatima Haouari, Alex Nikolov, Mucahid Kutlu, Yavuz Selim Kartal, Firoj Alam, Giovanni Da San Martino, Alberto Barrón-Cedeño, Rubén Míguez, Javier Beltrán, Tamer Elsayed and Preslav Nakov, Overview of the CLEF-2021 CheckThat! Lab Task 1 on Check-Worthiness Estimation in Tweets and Political Debates. http://ceur-ws.org/Vol-2936/paper-28.pdf
[2] Juan R. Martinez-Rico, Juan Martinez-Romo and Lourdes Araujo, NLP&IR@UNED at CheckThat! 2021: Check-worthiness estimation and fake news detection using transformer models. http://ceur-ws.org/Vol-2936/paper-44.pdf
[3] Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://arxiv.org/pdf/1810.04805.pdf
[4] Victor Sanh, Lysandre Debut, Julien Chaumond and Thomas Wolf, DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. https://arxiv.org/pdf/1910.01108.pdf
[5] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer and Veselin Stoyanov, RoBERTa: A Robustly Optimized BERT Pretraining Approach. https://arxiv.org/pdf/1907.11692.pdf
[6] Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov, Unsupervised Cross-lingual Representation Learning at Scale. https://arxiv.org/pdf/1911.02116.pdf
[7] Robiert Sepúlveda-Torres and Estela Saquete, GPLSI team at CheckThat! 2021: Fine-tuning BETO and RoBERTa. http://ceur-ws.org/Vol-2936/paper-52.pdf