=Paper=
{{Paper
|id=Vol-3180/paper-58
|storemode=property
|title=Z-Index at CheckThat! Lab 2022: Check-Worthiness Identification on Tweet Text
|pdfUrl=https://ceur-ws.org/Vol-3180/paper-58.pdf
|volume=Vol-3180
|authors=Prerona Tarannum,Md. Arid Hasan,Firoj Alam,Sheak Rashed Haider Noori
|dblpUrl=https://dblp.org/rec/conf/clef/TarannumHAN22
}}
==Z-Index at CheckThat! Lab 2022: Check-Worthiness Identification on Tweet Text==
Prerona Tarannum¹, Md. Arid Hasan¹, Firoj Alam², Sheak Rashed Haider Noori¹
¹ Daffodil International University
² Qatar Computing Research Institute
Abstract
The wide use of social media and digital technologies facilitates the sharing of news and information about events and activities. Alongside useful information, however, misleading and false information also spreads on social media. There have been efforts to identify such misleading information both manually, by human experts, and with automatic tools. Manual effort does not scale well given the high volume of content containing factual claims that appears online. Therefore, automatically identifying check-worthy claims can be very useful for human experts. In this study, we describe our participation in Subtask 1A: Check-worthiness of tweets (English, Dutch, and Spanish) of the CheckThat! lab at CLEF 2022. We performed standard preprocessing steps and applied different models to identify whether a given text is worthy of fact-checking. We used an oversampling technique to balance the dataset and applied SVM and Random Forest (RF) with TF-IDF representations. We also used the multilingual BERT (BERT-m) and XLM-RoBERTa-base pre-trained models in our experiments. We used BERT-m for the official submissions, and our systems ranked 3rd, 5th, and 12th in Spanish, Dutch, and English, respectively. In further experiments, our evaluation shows that the transformer models (BERT-m and XLM-RoBERTa-base) outperform SVM and RF for Dutch and English, whereas a different pattern is observed for Spanish.
Keywords
Check-worthiness, Check-worthy claim detection, Fact-checking, Disinformation, Misinformation, Social Media Text, Transformer Models
1. Introduction
Recently, social media has become a primary channel for exchanging information among people and, as a result, a primary source of news [1]. Such information is helpful in our daily activities; however, a sizeable part of it is misleading content that is harmful to individuals, society, or organizations [2, 3]. Harmful or misleading content includes hate speech [4], hostility [5, 6], propagandistic news and memes [7, 8, 9, 10], harmful memes [11], abusive language [12], cyberbullying and cyber-aggression [13, 14], and rumours [15]. The misleading or harmful nature of such information has raised interest in identifying and flagging it in order to further reduce its spread. There have been significant research efforts to automatically identify
CLEF 2022: Conference and Labs of the Evaluation Forum, September 5–8, 2022, Bologna, Italy
Email: prerona15-14134@diu.edu.bd (P. Tarannum); arid.cse0325.c@diu.edu.bd (Md. A. Hasan); fialam@hbku.edu.qa (F. Alam); drnoori@daffodilvarsity.edu.bd (S. R. H. Noori)
ORCID: 0000-0002-3292-1870 (P. Tarannum); 0000-0001-7916-614X (Md. A. Hasan); 0000-0001-7172-1997 (F. Alam); 0000-0001-6937-6039 (S. R. H. Noori)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
such content. Recent surveys on fake news [16], disinformation [17], rumours [15], propaganda [7], multimodal memes [18], hate speech [4], cyberbullying [19], and offensive content [20] highlight the importance of the problem and the relevant approaches to address it.
Misleading information is often disseminated together with factual statements to make people believe it is true; such statements typically appear in political debates and in discussions of social and global agendas. Identifying whether such factual statements are true or false is an important step in fighting misleading information. Fact-checking organizations make manual efforts to identify the truthfulness of such statements. As manual effort does not scale well, it is important to identify them automatically. However, fully automated approaches have reliability issues [21]. A trade-off is to support human fact-checkers with automated tools covering the different steps of the fact-checking pipeline [22]. The first step of that pipeline is to find content that is check-worthy. The CheckThat! Lab (CTL) shared tasks have addressed this problem for the past several years. As part of this ongoing effort, this year the CheckThat! Lab offered a check-worthiness subtask in six languages (Arabic, Bulgarian, Dutch, English, Spanish, and Turkish), with data collected from Twitter [23, 24, 25]. We participated in the check-worthiness subtask and focused on Dutch, English, and Spanish. For the experiments, we used different pretrained transformer-based models, which have been widely used in many NLP tasks [2, 26]. Difficulties arise when a multilingual pretrained model is used for such tasks, as facts and claims vary by country [27], and transferring knowledge across languages could itself spread disinformation. We used the multilingual transformer models BERT-m and XLM-RoBERTa for our experiments. In addition, we used SVM and RF with TF-IDF representations.
The rest of this paper is organized as follows. In Section 2, we review related work relevant to this study. We then describe the methodology in Section 3. Results of the experiments and a detailed discussion are provided in Section 4. Finally, we conclude our study in Section 5.
2. Related Work
To deal with the factuality of statements, there have been initiatives to manually check them, and as a result many fact-checking organizations have emerged, such as FactCheck.org,¹ Snopes,² PolitiFact,³ and FullFact.⁴ In addition, there have also been international initiatives such as the Credibility Coalition⁵ and Eufactcheck⁶ [28].
One of the earlier efforts in this direction is the ClaimBuster system [29], which has been
developed using the transcripts of 30 historical US election debates with a total of 28,029
transcribed sentences. The annotation includes non-factual, unimportant factual, and check-
worthy factual class labels and has been carried out by students, professors, and journalists.
Gencheva et al. [30] also focused on the 2016 US Presidential debates for which they obtained
annotations from different fact-checking organizations. An extension of this work resulted in
¹ http://www.factcheck.org/
² http://www.snopes.com/fact-check/
³ http://www.politifact.com/
⁴ http://fullfact.org/
⁵ https://credibilitycoalition.org/
⁶ https://eufactcheck.eu/
Figure 1: Word clouds of the most frequent words in (a) Dutch, (b) English, and (c) Spanish.
the development of ClaimRank, in which the authors used more data and also included Arabic content [31]. Alam et al. [3] focused on COVID-19 topics in Arabic, Bulgarian, Dutch, and English, and achieved strong performance using pre-trained language models; the study also discussed the utility of single-task versus multitask settings. A positive unlabelled learning technique for the check-worthiness task was introduced by Wright and Augenstein [32], who experimented with it using the BERT model on different datasets and achieved the best results on two of the three. Alhindi et al. [33] introduced a multi-layer annotated news corpus augmented with discourse structure to understand the relation between fact-checking and argumentation. The first Turkish dataset for check-worthiness was studied by Kartal and Kutlu [34], where multilingual BERT outperformed other models.
Some notable research outcomes came from shared tasks. For example, the CLEF CheckThat! labs' shared tasks [35, 36, 37, 38] of the past few years featured challenges on the automatic identification [39, 40] and verification [41, 42] of claims in political debates and tweets [43].
3. Methodology
3.1. Data
The dataset used in our study was obtained from the CLEF CheckThat! 2022 lab Task 1: Identifying Relevant Claims in Tweets [25]. The data, collected from Twitter, covers the COVID-19 topic for Dutch and English, while the Spanish data is a mix of politics and COVID-19 topics. In Table 1, we present the distribution of the dataset splits used in our experiments. In Figure 1, we present a word cloud for each of the three languages to illustrate the most common words in the datasets; we first removed the stopwords from the data and then generated the clouds from the most frequent remaining words.
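The stopword-filtered frequency counting behind word clouds like those in Figure 1 can be sketched as follows. This is a minimal sketch; the stopword list, tokenizer, and example tweets here are illustrative assumptions, not the ones used in the paper.

```python
import re
from collections import Counter

# Tiny illustrative English stopword list (assumption); the paper does not
# specify which stopword lists were used for each language.
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "on"}

def most_frequent_words(tweets, top_k=10):
    """Tokenize tweets, drop stopwords, and count the remaining words."""
    counts = Counter()
    for tweet in tweets:
        for token in re.findall(r"[a-z']+", tweet.lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    return counts.most_common(top_k)

tweets = [
    "The new vaccine is safe and effective",
    "Is the vaccine safe for children?",
]
print(most_frequent_words(tweets, top_k=3))
```

The resulting word/count pairs are exactly what word-cloud libraries take as input, with counts mapped to font sizes.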
Table 1
Data splits and distributions of Subtask 1A: Check-worthiness of tweets

Language  Class label   Train    Dev   Test   Total
Dutch     No              546     44    350     940
          Yes             377     28    316     721
          Total           923     72    666    1661
English   No             1675    151    110    1936
          Yes             447     44     39     530
          Total          2122    195    149    2466
Spanish   No             3087   2195   4296    9578
          Yes            1903    305    704    2912
          Total          4990   2500   5000   12490
3.2. Preprocessing
The CTL Subtask-1A datasets are collected from Twitter; as a result, the data contains many symbols, URLs, and invisible characters. We performed several preprocessing steps to clean the noisy data. First, we removed URLs and unnecessary characters, following the approach discussed in [44]. Then, we removed the stopwords from the data. Finally, we removed hashtag signs and usernames.
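The cleaning steps above can be sketched with a few regular expressions. This is a hedged approximation: the exact rules of [44] may differ, the function name is ours, and language-specific stopword removal is omitted here.

```python
import re

def preprocess_tweet(text: str) -> str:
    """Minimal tweet cleaner: URLs, usernames, hashtag signs,
    invisible characters, and redundant whitespace."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)      # URLs
    text = re.sub(r"@\w+", " ", text)                        # usernames
    text = text.replace("#", " ")                            # hashtag signs
    text = re.sub(r"[\u200b\u200c\u200d\ufeff]", "", text)   # invisible chars
    text = re.sub(r"\s+", " ", text).strip()                 # collapse spaces
    return text

print(preprocess_tweet("Visit https://t.co/abc now \u200b"))
```

Note that only the `#` sign is dropped, so the hashtag word itself (often a topical keyword) is kept for the classifier.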
3.3. Models
We used both deep learning and traditional models for the classification experiments. As deep learning models, we used two transformer-based architectures, BERT [45] and XLM-RoBERTa [46], in their multilingual versions. Several factors were considered while choosing the algorithms: among transformer-based models, BERT and XLM-RoBERTa are comparatively large in parameter size (110 million parameters for multilingual BERT and 125 million for XLM-RoBERTa base), and the number of parameters and the network size drive both the computation time and the learning performance. As traditional models, we used two of the most popular algorithms: (i) Random Forest (RF) [47] and (ii) Support Vector Machines (SVM) [48].
3.4. Experiments
Transformer models. We used the Transformers toolkit [49] for the transformer-based models and fine-tuned each model with a learning rate of 1e-5 [45], using the model-specific tokenizers shipped with the toolkit. We trained the BERT-m model for 4, 2, and 8 epochs for Dutch, English, and Spanish, respectively, and the XLM-RoBERTa-base model for 4, 4, and 8 epochs for the same languages.

Table 2
Hyper-parameters for the traditional models, for reproducing the results.

                     Dutch           English         Spanish
Parameter            SVM     RF      SVM     RF      SVM     RF
Number of features   1850    1500    1750    2800    3200    1700
N-gram               3       3       4       3       4       3
Random seed          2814    2814    2814    2814    2814    2814

Table 3
Official results on the test set and overall ranking for Subtask 1A: Check-worthiness of tweets

Language   Model     F1 (positive class)   Rank
Dutch      BERT-m    0.497                 5th
English    BERT-m    0.478                 12th
Spanish    BERT-m    0.303                 3rd
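A fine-tuning setup like the one described above can be sketched with the Transformers `Trainer` API. The learning rate and epoch counts follow the text; the checkpoint name, output directory, and the assumption that the datasets are already tokenized are ours, not stated in the paper.

```python
from transformers import (AutoModelForSequenceClassification, Trainer,
                          TrainingArguments)

# BERT-m epoch schedule reported above, per language.
EPOCHS_BERT_M = {"dutch": 4, "english": 2, "spanish": 8}
MODEL_NAME = "bert-base-multilingual-cased"  # assumed checkpoint

def fine_tune(language, train_dataset, eval_dataset):
    """Fine-tune BERT-m for binary check-worthiness classification.
    Datasets are assumed to be already tokenized with the model tokenizer."""
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, num_labels=2)  # check-worthy vs. not
    args = TrainingArguments(
        output_dir=f"checkpoints-{language}",
        learning_rate=1e-5,                       # as used in the paper
        num_train_epochs=EPOCHS_BERT_M[language],
    )
    trainer = Trainer(model=model, args=args,
                      train_dataset=train_dataset,
                      eval_dataset=eval_dataset)
    trainer.train()
    return trainer
```

The same schedule would swap in 4/4/8 epochs and an XLM-RoBERTa-base checkpoint for the second model.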
Traditional algorithms. To train the classifiers with the traditional models mentioned above, we first transformed the preprocessed data into TF-IDF vectors over weighted n-grams (unigrams, bigrams, and trigrams) to capture contextual information. The class distribution of the provided datasets for English and Spanish is not well balanced; therefore, to balance the class distribution, we applied an oversampling technique [50] for all three languages. We merged the train and dev sets to train the model, and applied upsampling to the combined dataset so that the positive class reached a ratio of 1.0 with respect to the negative class. In Table 2, we report the hyper-parameter values needed to reproduce our results.
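The traditional pipeline above can be sketched with scikit-learn. Two caveats: the paper cites SMOTE [50] for oversampling, but for raw text a simple random-duplication upsampling is shown here as an assumption; and the feature settings below mirror only the Dutch SVM column of Table 2.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.utils import resample

def upsample_positive(texts, labels, seed=2814):
    """Duplicate positive examples at random until the positive:negative
    ratio reaches 1.0, as described above."""
    pos = [t for t, y in zip(texts, labels) if y == 1]
    neg = [t for t, y in zip(texts, labels) if y == 0]
    pos_up = resample(pos, replace=True, n_samples=len(neg), random_state=seed)
    return neg + list(pos_up), [0] * len(neg) + [1] * len(pos_up)

# TF-IDF over word 1-3-grams with a capped vocabulary, mirroring the
# Dutch SVM settings in Table 2 (1850 features, n-gram order 3).
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 3), max_features=1850),
    SVC(random_state=2814),
)
```

Training is then `model.fit(*upsample_positive(train_texts, train_labels))`, with the RF variant obtained by swapping `SVC` for `RandomForestClassifier`.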
4. Results and Discussion
In Table 3, we report the official results and rankings as evaluated by the lab organizers. The official evaluation metric for Subtask 1A is the F1 measure with respect to the positive class.
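Positive-class F1 scores only the "check-worthy" label, so a model cannot inflate its score by doing well on the majority negative class. A minimal sketch with toy labels (the gold/predicted vectors here are invented for illustration):

```python
from sklearn.metrics import f1_score

# Toy gold and predicted labels; 1 = check-worthy (the positive class).
gold = [1, 0, 1, 1, 0, 0]
pred = [1, 0, 0, 1, 1, 0]

# average="binary" with pos_label=1 computes precision/recall/F1
# for the positive class only.
score = f1_score(gold, pred, pos_label=1, average="binary")
print(round(score, 3))  # → 0.667  (TP=2, FP=1, FN=1)
```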
In Table 4, we report the detailed classification results for each language. After the gold labels were released at the end of the submission period, we re-ran all the experiments and report the detailed results here. From the table, we conclude that among the traditional models the performance of SVM is much better than that of RF, except on the Spanish data, where RF is 0.25% higher. The upsampling technique improves the traditional models by 0.10% to 1.10% across the languages with respect to the positive class. Transformer-based models are well known in the literature for their performance and capabilities: XLM-RoBERTa-base and BERT-m provide the best positive-class results for Dutch and English, whereas the traditional models outperform the transformer models on Spanish by a large margin.
Table 4
Detailed results on the test set of Subtask 1A: Check-worthiness of tweets. In the original paper, bold indicates the positive-class F1 score and underline the best F1 score for each language.

Language   Model              Accuracy   Class   Precision   Recall   F1 Score
Dutch      SVM                59.01      No      60.85       61.71    61.28
                                         Yes     56.91       56.01    56.46
           RF                 57.96      No      57.85       73.71    64.82
                                         Yes     58.18       40.51    47.76
           BERT-m             60.06      No      60.82       67.43    63.96
                                         Yes     58.99       51.90    55.22
           XLM-RoBERTa base   56.76      No      60.00       53.14    56.36
                                         Yes     53.93       60.76    57.14
English    SVM                69.80      No      85.71       70.91    77.61
                                         Yes     44.83       66.67    53.61
           RF                 75.17      No      76.64       95.45    85.02
                                         Yes     58.33       17.95    27.45
           BERT-m             63.09      No      89.86       56.36    69.27
                                         Yes     40.00       82.05    53.78
           XLM-RoBERTa base   51.01      No      91.11       37.27    52.90
                                         Yes     33.65       89.74    48.95
Spanish    SVM                84.76      No      92.89       89.08    90.95
                                         Yes     46.70       58.38    51.89
           RF                 88.62      No      91.27       95.93    93.54
                                         Yes     63.92       44.03    52.14
           BERT-m             68.30      No      91.75       69.34    78.99
                                         Yes     24.87       61.93    35.49
           XLM-RoBERTa base   70.64      No      90.33       73.72    81.18
                                         Yes     24.43       51.85    33.21
5. Conclusion
In this study, we ran comparative experiments on check-worthy claim datasets in Dutch, English, and Spanish, provided by the CLEF CheckThat! lab 2022 organizers as part of the shared tasks. We cleaned the data before running the classification experiments and investigated different machine learning algorithms, both traditional (i.e., SVM and RF) and deep learning models (i.e., multilingual BERT and XLM-RoBERTa). Despite their increased resource and time requirements, the transformer-based models did not perform well for Spanish; however, they outperformed the traditional machine learning approaches on the Dutch and English tasks.
6. Acknowledgments
We would like to thank the organizers and other participants in the challenge. We are thankful to
DIU NLP and ML Research Lab for the workplace support. Finally, thanks to all the anonymous
reviewers for their suggestions.
Part of this work was carried out within the Tanbih mega-project (http://tanbih.qcri.org), developed at the Qatar Computing Research Institute, HBKU, which aims to limit the impact of "fake news", propaganda, and media bias by making users aware of what they are reading, thus promoting media literacy and critical thinking.
References
[1] A. Perrin, Social media usage, Pew Research Center 2015: 52-68, 2020.
[2] F. Alam, F. Dalvi, S. Shaar, N. Durrani, H. Mubarak, A. Nikolov, G. Da San Martino,
A. Abdelali, H. Sajjad, K. Darwish, P. Nakov, Fighting the COVID-19 infodemic in social
media: A holistic perspective and a call to arms, in: Proceedings of the International
AAAI Conference on Web and Social Media, ICWSM ’21, 2021, pp. 913–922. URL:
https://ojs.aaai.org/index.php/ICWSM/article/view/18114.
[3] F. Alam, S. Shaar, F. Dalvi, H. Sajjad, A. Nikolov, H. Mubarak, G. Da San Mar-
tino, A. Abdelali, N. Durrani, K. Darwish, A. Al-Homaid, W. Zaghouani, T. Caselli,
G. Danoe, F. Stolk, B. Bruntink, P. Nakov, Fighting the COVID-19 infodemic: Mod-
eling the perspective of journalists, fact-checkers, social media platforms, policy mak-
ers, and the society, in: Findings of the Association for Computational Linguis-
tics: EMNLP 2021, Association for Computational Linguistics, Punta Cana, Domini-
can Republic, 2021, pp. 611–649. URL: https://aclanthology.org/2021.findings-emnlp.56.
doi:10.18653/v1/2021.findings-emnlp.56.
[4] P. Fortuna, S. Nunes, A survey on automatic detection of hate speech in text, ACM
Computing Surveys (CSUR) 51 (2018) 1–30.
[5] S. Brooke, “Condescending, Rude, Assholes”: Framing gender and hostility on Stack
Overflow, in: Proceedings of the Third Workshop on Abusive Language Online, Asso-
ciation for Computational Linguistics, Florence, Italy, 2019, pp. 172–180. URL: https:
//aclanthology.org/W19-3519. doi:10.18653/v1/W19-3519.
[6] S. Joksimovic, R. S. Baker, J. Ocumpaugh, J. M. L. Andres, I. Tot, E. Y. Wang, S. Dawson,
Automated identification of verbally abusive behaviors in online discussions, in: Proceedings
of the Third Workshop on Abusive Language Online, Association for Computational
Linguistics, Florence, Italy, 2019, pp. 36–45. URL: https://aclanthology.org/W19-3505.
doi:10.18653/v1/W19-3505.
[7] G. Da San Martino, S. Cresci, A. Barrón-Cedeño, S. Yu, R. D. Pietro, P. Nakov, A survey
on computational propaganda detection, in: C. Bessiere (Ed.), Proceedings of the Twenty-
Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020, ijcai.org, 2020,
pp. 4826–4832. URL: https://doi.org/10.24963/ijcai.2020/672. doi:10.24963/ijcai.
2020/672.
[8] G. Da San Martino, S. Yu, A. Barrón-Cedeño, R. Petrov, P. Nakov, Fine-grained analysis of
propaganda in news article, in: Proceedings of the 2019 Conference on Empirical Methods
in Natural Language Processing and the 9th International Joint Conference on Natural
Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong
Kong, China, 2019, pp. 5636–5646. URL: https://www.aclweb.org/anthology/D19-1565.
doi:10.18653/v1/D19-1565.
[9] D. Dimitrov, B. Bin Ali, S. Shaar, F. Alam, F. Silvestri, H. Firooz, P. Nakov, G. Da San Mar-
tino, Detecting propaganda techniques in memes, in: Proceedings of the 59th Annual Meet-
ing of the Association for Computational Linguistics and the 11th International Joint Con-
ference on Natural Language Processing, ACL-IJCNLP ’21, Association for Computational
Linguistics, Online, 2021, pp. 6603–6617. URL: https://aclanthology.org/2021.acl-long.516.
doi:10.18653/v1/2021.acl-long.516.
[10] D. Dimitrov, B. Bin Ali, S. Shaar, F. Alam, F. Silvestri, H. Firooz, P. Nakov, G. Da San Mar-
tino, SemEval-2021 task 6: Detection of persuasion techniques in texts and im-
ages, in: Proceedings of the 15th International Workshop on Semantic Evaluation, Se-
mEval ’21, Association for Computational Linguistics, Online, 2021, pp. 70–98. URL:
https://aclanthology.org/2021.semeval-1.7. doi:10.18653/v1/2021.semeval-1.7.
[11] S. Pramanick, S. Sharma, D. Dimitrov, M. S. Akhtar, P. Nakov, T. Chakraborty, MO-
MENTA: A multimodal framework for detecting harmful memes and their targets, in:
Findings of the Association for Computational Linguistics: EMNLP 2021, Associa-
tion for Computational Linguistics, Punta Cana, Dominican Republic, 2021, pp. 4439–
4455. URL: https://aclanthology.org/2021.findings-emnlp.379. doi:10.18653/v1/2021.
findings-emnlp.379.
[12] H. Mubarak, K. Darwish, W. Magdy, Abusive language detection on arabic social media,
in: Proceedings of the first workshop on abusive language online, 2017, pp. 52–56.
[13] C. Van Hee, E. Lefever, B. Verhoeven, J. Mennes, B. Desmet, G. De Pauw, W. Daele-
mans, V. Hoste, Detection and fine-grained classification of cyberbullying events, in:
Proceedings of the International Conference Recent Advances in Natural Language Pro-
cessing, INCOMA Ltd. Shoumen, BULGARIA, Hissar, Bulgaria, 2015, pp. 672–680. URL:
https://aclanthology.org/R15-1086.
[14] R. Kumar, A. K. Ojha, S. Malmasi, M. Zampieri, Benchmarking aggression identification
in social media, in: Proceedings of the First Workshop on Trolling, Aggression and
Cyberbullying, 2018, pp. 1–11.
[15] A. Bondielli, F. Marcelloni, A survey on fake news and rumour detection techniques,
Information Sciences 497 (2019) 38–55.
[16] X. Zhou, R. Zafarani, A survey of fake news: Fundamental theories, detection methods,
and opportunities, CSUR 53 (2020) 1–40.
[17] F. Alam, S. Cresci, T. Chakraborty, F. Silvestri, D. Dimitrov, G. D. S. Martino, S. Shaar,
H. Firooz, P. Nakov, A survey on multimodal disinformation detection, arXiv:2103.12541
(2021).
[18] T. H. Afridi, A. Alam, M. N. Khan, J. Khan, Y. K. Lee, A multimodal memes classification:
A survey and open research issues, in: 5th International Conference on Smart City Appli-
cations, SCA 2020, Springer Science and Business Media Deutschland GmbH, 2021, pp.
1451–1466.
[19] B. Haidar, M. Chamoun, F. Yamout, Cyberbullying detection: A survey on multilingual
techniques, in: 2016 European Modelling Symposium (EMS), 2016, pp. 165–171. doi:10.
1109/EMS.2016.037.
[20] F. Husain, O. Uzuner, A survey of offensive language detection for the arabic language,
ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)
20 (2021) 1–44.
[21] S. Shaar, F. Alam, G. D. S. Martino, P. Nakov, Assisting the human fact-checkers:
Detecting all previously fact-checked claims in a document, arXiv:2109.07410 (2021).
arXiv:2109.07410.
[22] P. Nakov, D. Corney, M. Hasanain, F. Alam, T. Elsayed, A. Barrón-Cedeño, P. Papotti,
S. Shaar, G. D. S. Martino, Automated fact-checking for assisting human fact-checkers, in:
Proceedings of the 30th International Joint Conference on Artificial Intelligence, IJCAI ’21,
2021, pp. 4551–4558.
[23] P. Nakov, A. Barrón-Cedeño, G. Da San Martino, F. Alam, J. M. Struß, T. Mandl, R. Míguez,
T. Caselli, M. Kutlu, W. Zaghouani, C. Li, S. Shaar, G. K. Shahi, H. Mubarak, A. Nikolov,
N. Babulkov, Y. S. Kartal, J. Beltrán, The clef-2022 checkthat! lab on fighting the
covid-19 infodemic and fake news detection, in: M. Hagen, S. Verberne, C. Macdonald,
C. Seifert, K. Balog, K. Nørvåg, V. Setty (Eds.), Advances in Information Retrieval, Springer
International Publishing, Cham, 2022, pp. 416–428.
[24] P. Nakov, A. Barrón-Cedeño, G. Da San Martino, F. Alam, J. M. Struß, T. Mandl, R. Míguez,
T. Caselli, M. Kutlu, W. Zaghouani, C. Li, S. Shaar, G. K. Shahi, H. Mubarak, A. Nikolov,
N. Babulkov, Y. S. Kartal, J. Beltrán, M. Wiegand, M. Siegel, J. Köhler, Overview of the
CLEF-2022 CheckThat! lab on fighting the COVID-19 infodemic and fake news detection,
in: Proceedings of the 13th International Conference of the CLEF Association: Information
Access Evaluation meets Multilinguality, Multimodality, and Visualization, CLEF ’2022,
Bologna, Italy, 2022.
[25] P. Nakov, A. Barrón-Cedeño, G. Da San Martino, F. Alam, R. Míguez, T. Caselli, M. Kutlu,
W. Zaghouani, C. Li, S. Shaar, H. Mubarak, A. Nikolov, Y. S. Kartal, J. Beltrán, Overview
of the CLEF-2022 CheckThat! lab task 1 on identifying relevant claims in tweets, in:
Working Notes of CLEF 2022—Conference and Labs of the Evaluation Forum, CLEF ’2022,
Bologna, Italy, 2022.
[26] F. Alam, A. Hasan, T. Alam, A. Khan, J. Tajrin, N. Khan, S. A. Chowdhury, A review
of bangla natural language processing tasks and the utility of transformer models, arXiv
preprint arXiv:2107.03844 (2021).
[27] K. Singh, G. Lima, M. Cha, C. Cha, J. Kulshrestha, Y.-Y. Ahn, O. Varol, Misinformation,
believability, and vaccine acceptance over 40 countries: Takeaways from the initial phase of
the covid-19 infodemic, Plos one 17 (2022) e0263381.
[28] M. Stencel, Number of fact-checking outlets surges to 188 in more than 60 countries, Duke
Reporters’ LAB (2019) 12–17.
[29] N. Hassan, C. Li, M. Tremayne, Detecting check-worthy factual claims in presidential
debates, in: Proceedings of the 24th ACM International on Conference on Information and
Knowledge Management, CIKM ’15, Association for Computing Machinery, Melbourne,
Australia, 2015, pp. 1835–1838. URL: https://doi.org/10.1145/2806416.2806652. doi:10.
1145/2806416.2806652.
[30] P. Gencheva, P. Nakov, L. Màrquez, A. Barrón-Cedeño, I. Koychev, A context-aware
approach for detecting worth-checking claims in political debates, in: Proceedings of the
International Conference Recent Advances in Natural Language Processing, RANLP ’17,
INCOMA Ltd., Varna, Bulgaria, 2017, pp. 267–276. URL: https://doi.org/10.26615/
978-954-452-049-6_037.
[31] I. Jaradat, P. Gencheva, A. Barrón-Cedeño, L. Màrquez, P. Nakov, ClaimRank: Detecting
check-worthy claims in Arabic and English, in: Proceedings of the 2018 Conference of the
North American Chapter of the Association for Computational Linguistics: Demonstrations,
NAACL-HLT ’18, Association for Computational Linguistics, New Orleans, Louisiana,
USA, 2018, pp. 26–30. URL: https://aclanthology.org/N18-5006. doi:10.18653/v1/
N18-5006.
[32] D. Wright, I. Augenstein, Claim check-worthiness detection as positive unlabelled learning,
arXiv preprint arXiv:2003.02736 (2020).
[33] T. Alhindi, B. McManus, S. Muresan, What to fact-check: Guiding check-worthy informa-
tion detection in news articles through argumentative discourse structure, in: Proceedings
of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2021,
pp. 380–391.
[34] Y. S. Kartal, M. Kutlu, Trclaim-19: The first collection for turkish check-worthy claim de-
tection with annotator rationales, in: Proceedings of the 24th Conference on Computational
Natural Language Learning, 2020, pp. 386–395.
[35] P. Nakov, A. Barrón-Cedeño, T. Elsayed, R. Suwaileh, L. Màrquez, W. Zaghouani,
P. Atanasova, S. Kyuchukov, G. Da San Martino, Overview of the CLEF-2018 Check-
That! lab on automatic identification and verification of political claims, in: CLEF,
Lecture Notes in Computer Science, Springer, Avignon, France, 2018, pp. 372–387. URL:
https://link.springer.com/chapter/10.1007/978-3-319-98932-7_32#citeas.
[36] T. Elsayed, P. Nakov, A. Barrón-Cedeño, M. Hasanain, R. Suwaileh, G. Da San Martino,
P. Atanasova, Overview of the CLEF-2019 CheckThat!: Automatic identification and
verification of claims, in: Experimental IR Meets Multilinguality, Multimodality, and
Interaction, LNCS, Lugano, Switzerland, 2019.
[37] T. Elsayed, P. Nakov, A. Barrón-Cedeño, M. Hasanain, R. Suwaileh, G. Da San Martino,
P. Atanasova, CheckThat! at CLEF 2019: Automatic identification and verification of
claims, in: Advances in Information Retrieval, ECIR ’19, Springer International Publishing,
Cologne, Germany, 2019, pp. 309–315. URL: https://link.springer.com/chapter/10.1007/
978-3-030-15719-7_41.
[38] S. Shaar, F. Alam, G. Da San Martino, A. Nikolov, W. Zaghouani, P. Nakov, A. Feldman,
Findings of the NLP4IF-2021 shared tasks on fighting the COVID-19 infodemic and cen-
sorship detection, in: Proceedings of the Fourth Workshop on NLP for Internet Freedom:
Censorship, Disinformation, and Propaganda, NLP4IF ’21’, Association for Computa-
tional Linguistics, Online, 2021, pp. 82–92. URL: https://aclanthology.org/2021.nlp4if-1.12.
doi:10.18653/v1/2021.nlp4if-1.12.
[39] P. Atanasova, L. Màrquez, A. Barrón-Cedeño, T. Elsayed, R. Suwaileh, W. Zaghouani,
S. Kyuchukov, G. Da San Martino, P. Nakov, Overview of the CLEF-2018 CheckThat! lab
on automatic identification and verification of political claims, task 1: Check-worthiness,
in: CLEF 2018 Working Notes. Working Notes of CLEF 2018 - Conference and Labs of the
Evaluation Forum, CEUR Workshop Proceedings, CEUR-WS.org, Avignon, France, 2018.
[40] P. Atanasova, P. Nakov, G. Karadzhov, M. Mohtarami, G. D. S. Martino, Overview of
the CLEF-2019 CheckThat! Lab on Automatic Identification and Verification of Claims.
Task 1: Check-Worthiness, in: CLEF 2019 Working Notes, CEUR Workshop Proceedings,
CEUR-WS.org, Lugano, Switzerland, 2019.
[41] A. Barrón-Cedeño, T. Elsayed, R. Suwaileh, L. Màrquez, P. Atanasova, W. Zaghouani,
S. Kyuchukov, G. Da San Martino, P. Nakov, Overview of the CLEF-2018 CheckThat!
lab on automatic identification and verification of political claims, task 2: Factuality, in:
CLEF 2018 Working Notes. Working Notes of CLEF 2018 - Conference and Labs of the
Evaluation Forum, CEUR Workshop Proceedings, CEUR-WS.org, Avignon, France, 2018.
[42] M. Hasanain, R. Suwaileh, T. Elsayed, A. Barrón-Cedeño, P. Nakov, Overview of the
CLEF-2019 CheckThat! Lab on Automatic Identification and Verification of Claims. Task
2: Evidence and Factuality, in: CLEF 2019 Working Notes. Working Notes of CLEF
2019 - Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings,
CEUR-WS.org, Lugano, Switzerland, 2019.
[43] P. Nakov, G. Da San Martino, T. Elsayed, A. Barrón-Cedeño, R. Míguez, S. Shaar, F. Alam,
F. Haouari, M. Hasanain, W. Mansour, B. Hamdan, Z. S. Ali, N. Babulkov, A. Nikolov,
G. K. Shahi, J. M. Struß, T. Mandl, M. Kutlu, Y. S. Kartal, Overview of the CLEF-
2021 CheckThat! lab on detecting check-worthy claims, previously fact-checked claims,
and fake news, in: K. Candan, B. Ionescu, L. Goeuriot, B. Larsen, H. Müller, A. Joly,
M. Maistro, F. Piroi, G. Faggioli, N. Ferro (Eds.), Experimental IR Meets Multilinguality,
Multimodality, and Interaction. Proceedings of the Twelfth International Conference of the
CLEF Association, LNCS (12880), Springer, 2021. URL: https://link.springer.com/chapter/
10.1007/978-3-030-72240-1_75.
[44] F. Alam, H. Sajjad, M. Imran, F. Ofli, CrisisBench: Benchmarking crisis-related social
media datasets for humanitarian information processing, in: Proceedings of the International
AAAI Conference on Web and Social Media, ICWSM ’21, 2021, pp. 923–932. URL:
https://ojs.aaai.org/index.php/ICWSM/article/view/18115.
[45] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional
transformers for language understanding, in: Proceedings of the 2019 Conference of
the North American Chapter of the Association for Computational Linguistics: Human
Language Technologies, NAACL-HLT ’19, Association for Computational Linguistics, Min-
neapolis, Minnesota, USA, 2019, pp. 4171–4186. URL: https://www.aclweb.org/anthology/
N19-1423. doi:10.18653/v1/N19-1423.
[46] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave,
M. Ott, L. Zettlemoyer, V. Stoyanov, Unsupervised cross-lingual representation learning
at scale, in: Proceedings of the 58th Annual Meeting of the Association for Computa-
tional Linguistics, ACL ’20, Association for Computational Linguistics, Online, 2020, pp.
8440–8451. URL: https://aclanthology.org/2020.acl-main.747. doi:10.18653/v1/2020.
acl-main.747.
[47] A. Liaw, M. Wiener, et al., Classification and regression by random forest, R news 2 (2002)
18–22.
[48] J. Platt, Sequential minimal optimization: A fast algorithm for training support vector
machines, Technical Report, Microsoft, Redmond, USA., 1998.
[49] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf,
M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu,
T. Le Scao, S. Gugger, M. Drame, Q. Lhoest, A. Rush, Transformers: State-of-the-art
natural language processing, in: Proceedings of the 2020 Conference on Empirical Methods
in Natural Language Processing: System Demonstrations, EMNLP ’20, Association for
Computational Linguistics, Online, 2020, pp. 38–45. URL: https://aclanthology.org/2020.
emnlp-demos.6. doi:10.18653/v1/2020.emnlp-demos.6.
[50] N. V. Chawla, K. W. Bowyer, L. O. Hall, W. P. Kegelmeyer, Smote: synthetic minority
over-sampling technique, Journal of artificial intelligence research 16 (2002) 321–357.