=Paper=
{{Paper
|id=Vol-3740/paper-49
|storemode=property
|title=Mela at CheckThat! 2024: Transferring Persuasion Detection from English to Arabic - A Multilingual BERT Approach
|pdfUrl=https://ceur-ws.org/Vol-3740/paper-49.pdf
|volume=Vol-3740
|authors=Sara Nabhani,Md Abdur Razzaq Riyadh
|dblpUrl=https://dblp.org/rec/conf/clef/NabhaniR24
}}
==Mela at CheckThat! 2024: Transferring Persuasion Detection from English to Arabic - A Multilingual BERT Approach==
Notebook for the CheckThat! Lab at CLEF 2024
Sara Nabhani, Md Abdur Razzaq Riyadh
Department of Artificial Intelligence, University of Malta
Abstract
This paper presents our system’s participation in CheckThat! Lab Task 3, which focuses on identifying persuasion
techniques in Arabic text. We focused solely on Arabic, a low-resource language for this task. The task required
identifying any persuasion technique applied to individual tokens within the text. Only the test set was provided
for Arabic for this task, without any corresponding development or training sets. Our research aimed to investigate
how a resource-rich language like English could benefit the low-resource Arabic language in the context of
persuasion detection. To that end, we utilized a multilingual BERT which incorporated English and Arabic
knowledge during its pre-training stage. Our system achieved first place on the Arabic leaderboard in the
shared task. The result, achieved without training on Arabic data, highlights the effectiveness of multilingual
BERT models. This also demonstrates the potential of using resource-rich languages like English to enhance
performance in low-resource languages such as Arabic for persuasion detection tasks.
Keywords
Arabic, propaganda, persuasion
1. Introduction
Throughout history, propaganda has played a significant role in shaping public opinion. Propaganda
uses various persuasive techniques to influence the way people think and act. With the advent of the
digital age, the impact of propaganda has grown even stronger. Nowadays, persuasive techniques are
widely used as tools for spreading propaganda through digital platforms. The increasing use of these
persuasion techniques highlights the need for advanced methods to identify and critically evaluate
them. This need has become urgent as the volume of digital content continues to rise, making it easier
for propaganda to spread rapidly.
This paper describes our approach to CheckThat! Task 3 [1, 2], which focuses on the identification of
persuasion techniques within the textual spans of Arabic articles. The goal of this task is to detect
various techniques used to persuade readers within Arabic texts. However, a significant challenge
we faced was the lack of training data for Arabic. While the task provided training data for several
languages, including English, French, Italian, German, Russian, and Polish, there was no training set
available for Arabic. This lack of training data for Arabic made it difficult to develop a model specifically
trained on Arabic texts. To overcome this challenge, we used the training data from the English set
to fine-tune a multilingual BERT model [3] and then evaluated it on the Arabic test set. Thus, our
study investigates the effectiveness of using a high-resource language, such as English, to enhance
the performance of a model for a low-resource language like Arabic. In the context of the persuasion
technique identification task, we aimed to demonstrate that a model trained on English data could still
perform effectively when applied to Arabic texts. This approach is based on the idea of cross-lingual
transfer learning, where knowledge gained from one language can be transferred to another language.
The paper is structured as follows: Section 2 reviews previous work in this area. Section 3 outlines
our proposed system in detail. Section 4 presents the results and discusses our findings and their
implications. Finally, Section 5 concludes the paper and suggests directions for future research.
CLEF 2024: Conference and Labs of the Evaluation Forum, September 09–12, 2024, Grenoble, France
sara.nabhani.23@um.edu.mt (S. Nabhani); md.riyadh.23@um.edu.mt (M. A. R. Riyadh)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
2. Related Work
Persuasion detection has traditionally focused on analyzing entire documents or paragraphs. However,
a fairly recent study introduced the task of identifying persuasion techniques at the token level [4].
Their work is significant because it provides one of the earlier datasets annotated with propaganda
techniques at the character level. This allows researchers to employ multi-label, multi-class classification
techniques for persuasion detection with finer granularity [4]. The authors utilized BERT [3] for this
downstream task and evaluated it using a modified F1 score that accounts for partial matching.
Several recent studies have explored persuasion detection via shared tasks like SemEval and ArAIEval
[5, 6, 7]. BERT-based classifiers are a popular choice for these tasks due to their effectiveness in text
classification [5]. There is a challenge with label distribution, as some persuasion techniques appear
much less frequently than others. Moreover, most tokens within the data lack any persuasion labels.
This is addressed by employing techniques like class weighting during loss calculation [8]. Additionally,
multi-task architectures utilizing shared representations from pre-trained models like BERT have shown
good results [9]. For persuasion detection in Arabic, previous work is commonly based on AraBERT
[10, 7]. Propaganda detection in Arabic also benefits from preprocessing steps such as reversing
code-switching and converting emojis [11].
Pre-trained multilingual models are integral to the NLP tasks for low-resource languages. BERT
[3] itself offers two multilingual versions: cased and uncased. These models are impressive in their
scope, being trained on over 100 languages. The training process leverages masked language modelling
and next sentence prediction objectives, allowing the model to learn generalizable representations across
languages. XLM [12] is another set of multilingual models that uses a translation language modelling
objective alongside causal and masked language modelling for pre-training. Similarly, mBART [13] builds upon the BART
model [14] by using a multilingual pre-training objective. The objective is reconstructing the original
text from a corrupted version in multiple languages, allowing mBART to develop robust denoising
capabilities.
The growing popularity of cross-lingual transfer learning offers a promising approach to improve
performance on Arabic NLP tasks. This is demonstrated by employing task-specific fine-tuning on
English and French data to improve Arabic NLU performance [15]. Similarly, for abstractive summa-
rization of Arabic text, fine-tuning multilingual models (mBERT and mBART) on Hungarian or English
before fine-tuning again on Arabic data demonstrated performance gains [16]. These findings highlight
the effectiveness of cross-lingual transfer learning in improving the performance of Arabic language
processing tasks.
3. Methodology
In this section, we describe the methodology employed for detecting persuasion techniques in Arabic
articles using a multilingual BERT model fine-tuned on English data.
3.1. Data Preparation
The data for this task was provided in the form of article files, with the corresponding labels given in a
separate file. The label file contained information about the persuasion techniques used and the offsets
indicating the span of text within the articles where these techniques were applied. There are 23 labels
representing different persuasion techniques. These techniques are identified within the text at the
token level, allowing for multi-label classification where each token can be associated with one or more
techniques. This detailed annotation allows the model to recognize and classify multiple techniques
within a single span of text.
For preprocessing, we first split the articles into paragraphs. This was done based on empty lines,
effectively treating each paragraph as a separate instance. Once the articles were divided into paragraphs,
we calculated the offsets for the persuasive spans within each paragraph. This allowed us to align the
provided labels with the appropriate paragraphs.
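The paragraph splitting and offset re-basing described above can be sketched as follows. This is a minimal illustration; the function names and the exact span representation are our own assumptions, not the shared-task code:

```python
import re

def split_into_paragraphs(article: str):
    """Split an article on empty lines, keeping each paragraph's starting
    character offset so that article[start:start + len(text)] == text."""
    return [(m.start(), m.group())
            for m in re.finditer(r"[^\n]+(?:\n[^\n]+)*", article)]

def align_labels(paragraphs, spans):
    """Re-base article-level (start, end, technique) spans onto the
    paragraph that contains them; offsets become paragraph-relative."""
    aligned = []
    for start, end, technique in spans:
        for p_start, text in paragraphs:
            if p_start <= start and end <= p_start + len(text):
                aligned.append((start - p_start, end - p_start, technique))
                break
    return aligned
```

Treating each paragraph as an independent instance keeps inputs within the model's maximum length while the stored offsets let predictions be mapped back to article-level spans.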
3.2. Task Formulation
We formulated the task as a multi-class, multi-label token classification problem. This means that each
token (or word) in the input text could be classified into one or more persuasion technique categories.
This approach enables the model to recognize multiple techniques that may be present in a single span
of text. After predicting labels for each of the tokens, consecutive tokens with the same labels define a
span. Table 1 demonstrates an example.
Table 1
An input sequence of length 256, where each token can be assigned one or more persuasion techniques. Consecutive tokens with the same labels form a span. In this example, {𝑥3, 𝑥4, 𝑥5} form a span for 𝑡1; {𝑥1, 𝑥2} form a span for 𝑡2; {𝑥5, 𝑥6} form a span for 𝑡2; {𝑥4} is a span for 𝑡3; and {𝑥6} is a span for 𝑡3.
𝑥1 𝑥2 𝑥3 𝑥4 𝑥5 𝑥6 ... 𝑥256
𝑡1 0 0 1 1 1 0 0 0
𝑡2 1 1 0 0 1 1 0 0
𝑡3 0 0 0 1 0 1 0 0
: : : : : : : : :
𝑡23 0 0 0 0 0 0 0 0
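The conversion from per-technique label rows back to spans can be sketched as follows (token indices here are 0-based, unlike the 1-based 𝑥𝑖 in Table 1; the function name is ours):

```python
def labels_to_spans(label_matrix):
    """Convert a {technique: [0/1 per token]} matrix into (technique,
    start, end) spans of consecutive positively-labelled tokens,
    with inclusive 0-based token indices."""
    spans = []
    for technique, row in label_matrix.items():
        start = None
        for i, v in enumerate(row):
            if v == 1 and start is None:
                start = i                      # a new span opens here
            elif v == 0 and start is not None:
                spans.append((technique, start, i - 1))  # span just closed
                start = None
        if start is not None:                  # span runs to the end
            spans.append((technique, start, len(row) - 1))
    return spans
```

Applied to the first three rows of Table 1, this recovers exactly the five spans listed in the caption.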
3.3. Model and Training
We employed a multilingual BERT model for this task. Multilingual BERT (mBERT) is pre-trained on
multiple languages, including Arabic and English, making it suitable for cross-lingual transfer learning.
For the loss calculation, we used binary cross-entropy, which is well-suited for multi-label classification
tasks.
Given the lack of Arabic training data and the zero-shot nature of the task for Arabic, we used the
provided English training data to fine-tune the mBERT model. Since there was no Arabic data provided
for validation, we utilized the Arabic validation dataset from the 2024 ArAIEval shared task on propaganda
detection (https://araieval.gitlab.io/task1/). This validation dataset consists of 921 documents, with an average of 30.25 tokens per
document, and follows the same labelling and annotation guidelines. The following hyperparameters
were used during training:
• Learning Rate: 5e-5
• Number of Epochs: 75
• Maximum Input Length: 256 tokens
Additionally, we applied positive class weights (pos_weight) in the loss calculation. This helps in handling the class
imbalance, ensuring that the model does not become biased towards the more frequent classes.
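One common way to derive such positive weights is the ratio of negative to positive tokens per technique, which matches the semantics of the pos_weight argument of PyTorch's BCEWithLogitsLoss. The sketch below illustrates that heuristic only; the exact values used in our runs are not reproduced here, and the function name is our own:

```python
def compute_pos_weights(label_rows):
    """label_rows: list of per-token multi-hot label vectors.
    Returns one weight per class, (# negative tokens) / (# positive tokens),
    so rarer techniques contribute proportionally more to the loss."""
    n_classes = len(label_rows[0])
    positives = [0] * n_classes
    for row in label_rows:
        for c, v in enumerate(row):
            positives[c] += v
    n = len(label_rows)
    # Classes with no positive examples fall back to a neutral weight of 1.0.
    return [(n - pos) / pos if pos else 1.0 for pos in positives]
```

The resulting vector would be passed as `pos_weight` when constructing the binary cross-entropy loss, up-weighting the positive term for infrequent techniques.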
4. Results and Discussion
For evaluation, we used the modified F1-micro score, which accounts for partial matching of spans.
All scores reported in this paper use that modified F1. Our model, fine-tuned using the English
training data and validated on the Arabic dev dataset, achieved an F1-micro score of 0.0998 on the
dev set. When evaluated on the test set, the model’s performance improved significantly, achieving
an F1-micro score of 0.3009. The difference in performance between the dev and test sets could be
attributed to domain-specific nuances and potential distributional differences in the test set.
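To illustrate the partial-matching idea, the sketch below scores span pairs by character overlap in the spirit of the modified F1 of [4]. It is a simplified illustration, not the official shared-task scorer, and the function names are ours:

```python
def partial_f1(pred, gold):
    """pred, gold: lists of (technique, start, end) spans, end inclusive.
    Each same-technique (predicted, gold) pair contributes
    overlap / |predicted span| to precision and
    overlap / |gold span| to recall."""
    def inter(a, b):
        # Character overlap between two inclusive-offset spans.
        return max(0, min(a[2], b[2]) - max(a[1], b[1]) + 1)

    def side(spans_a, spans_b):
        if not spans_a:
            return 0.0
        total = sum(inter(a, b) / (a[2] - a[1] + 1)
                    for a in spans_a for b in spans_b if a[0] == b[0])
        return total / len(spans_a)

    p, r = side(pred, gold), side(gold, pred)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)
```

Under this scheme an exact match scores 1.0, while a prediction covering twice the gold span is penalized on precision but not on recall.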
Table 2 presents a detailed breakdown of the F1-micro scores per technique on the validation set.
Table 2
F1-micro scores per technique on the validation set.
Technique F1-micro Technique F1-micro
Appeal to Values 0.6207 Questioning the Reputation 0.0000
Loaded Language 0.1855 Straw Man 0.2138
Consequential Oversimplification 0.6897 Repetition 0.0292
Causal Oversimplification 0.0542 Guilt by Association 0.0443
Appeal to Hypocrisy 0.0114 Conversation Killer 0.1724
False Dilemma-No Choice 0.0172 Whataboutism 0.2759
Slogans 0.0661 Obfuscation-Vagueness-Confusion 0.1034
Name Calling-Labeling 0.1257 Flag Waving 0.0709
Doubt 0.0483 Appeal to Fear-Prejudice 0.0472
Exaggeration-Minimisation 0.0983 Red Herring 0.5862
Appeal to Popularity 0.5517 Appeal to Authority 0.0949
Appeal to Time 0.8276
The results reveal a significant variation in the model’s performance across different persuasion
techniques. Techniques such as Appeal to Time, Consequential Oversimplification, and Appeal to
Values were detected more reliably, indicating that the model can effectively identify these patterns.
In contrast, techniques like Loaded Language, Straw Man, and Whataboutism showed moderate
performance. Techniques like Questioning the Reputation, Repetition, False Dilemma-No Choice, and
Appeal to Hypocrisy posed significant difficulties for the model. These techniques may be underrepre-
sented in the training data, further complicating their detection.
The variation in performance can also be attributed to the nature and categorization of the techniques.
Techniques that belong to the same category, such as different types of logical fallacies or emotional
appeals, may share linguistic features that the model struggles to distinguish. For example, both Straw
Man and Whataboutism involve misrepresentation or diversion tactics, which could confuse the model.
On the other hand, techniques like Appeal to Values and Appeal to Popularity, which are more explicit
and direct, tend to be easier for the model to identify.
It is important to note that no Arabic data was available for training. We relied on the English training
data to fine-tune the multilingual BERT model. This cross-lingual transfer learning approach introduces
additional challenges due to differences in linguistic structures and contextual usage between English
and Arabic.
5. Conclusion and Future Work
With the increasing sophistication of persuasion techniques, particularly in Arabic-language content, it
is crucial to focus research efforts on this area. This study investigated the effectiveness of a multilingual
BERT model fine-tuned on English data for the task of Arabic persuasion detection. English was selected
as the training language due to its extensive resources in Natural Language Processing (NLP) tasks,
including propaganda detection. Our aim was to evaluate how these abundant resources could be
leveraged to benefit languages with fewer resources, such as Arabic. This work achieved first place for
Arabic on the leaderboard for the test set, demonstrating the potential of cross-lingual transfer learning
[17]. However, there is still room for improvement.
Future work can explore how other high-resource languages impact the performance on Arabic. There
might be various strategies to enhance the model’s performance. Increasing the diversity and quantity
of training data, particularly for techniques where performance was low, through data augmentation or
the collection of additional labelled data, can help balance the dataset. Advanced fine-tuning techniques
like focal loss can adjust the loss function to focus more on hard-to-classify examples, while dynamic
sampling strategies can address class imbalance.
Additionally, incorporating more sophisticated features such as syntactic and semantic information,
part-of-speech tags, or dependency parsing can provide the model with greater context and improve
classification accuracy. Exploring alternative hidden layer representations within BERT may also yield
better classification performance. By addressing these areas, future research can further improve the
accuracy and robustness of models in detecting a wide range of persuasion techniques, ultimately
enhancing their utility in real-world applications.
Acknowledgments
We acknowledge the assistance of the LT-Bridge Project (GA 952194) and DFKI for the use of their
Virtual Laboratory. The authors have also been supported financially by the EMLCT programme
(https://mundus-web.coli.uni-saarland.de/) during this work.
References
[1] G. Faggioli, N. Ferro, P. Galuščáková, A. García Seco de Herrera (Eds.), Working Notes of CLEF
2024 - Conference and Labs of the Evaluation Forum, CLEF 2024, Grenoble, France, 2024.
[2] J. Piskorski, N. Stefanovitch, F. Alam, R. Campos, D. Dimitrov, A. Jorge, S. Pollak, N. Ribin, Z. Fijavž,
M. Hasanain, N. Guimarães, A. F. Pacheco, E. Sartori, P. Silvano, A. V. Zwitter, I. Koychev, N. Yu,
P. Nakov, G. Da San Martino, Overview of the CLEF-2024 CheckThat! lab task 3 on persuasion
techniques, in: [1], 2024.
[3] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers
for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[4] S. Yu, G. D. S. Martino, P. Nakov, Experiments in Detecting Persuasion Techniques in the News,
2019. URL: http://arxiv.org/abs/1911.06815, arXiv:1911.06815 [cs].
[5] D. Dimitrov, B. Bin Ali, S. Shaar, F. Alam, F. Silvestri, H. Firooz, P. Nakov, G. Da San Mar-
tino, SemEval-2021 task 6: Detection of persuasion techniques in texts and images, in:
A. Palmer, N. Schneider, N. Schluter, G. Emerson, A. Herbelot, X. Zhu (Eds.), Proceedings of
the 15th International Workshop on Semantic Evaluation (SemEval-2021), Association for Com-
putational Linguistics, Online, 2021, pp. 70–98. URL: https://aclanthology.org/2021.semeval-1.7.
doi:10.18653/v1/2021.semeval-1.7.
[6] J. Piskorski, N. Stefanovitch, G. Da San Martino, P. Nakov, SemEval-2023 Task 3: Detecting
the Category, the Framing, and the Persuasion Techniques in Online News in a Multi-lingual
Setup, in: Proceedings of the The 17th International Workshop on Semantic Evaluation (SemEval-
2023), Association for Computational Linguistics, Toronto, Canada, 2023, pp. 2343–2361. URL:
https://aclanthology.org/2023.semeval-1.317. doi:10.18653/v1/2023.semeval-1.317.
[7] M. Hasanain, F. Alam, H. Mubarak, S. Abdaljalil, W. Zaghouani, P. Nakov, G. D. S. Martino, A. A.
Freihat, ArAIEval shared task: Persuasion techniques and disinformation detection in Arabic text,
arXiv preprint arXiv:2311.03179 (2023).
[8] K. Gupta, D. Gautam, R. Mamidi, Volta at SemEval-2021 Task 6: Towards detecting persuasive texts
and images using textual and multimodal ensemble (2021). URL: http://arxiv.org/abs/2106.00240,
arXiv:2106.00240 [cs].
[9] K. Kaczyński, P. Przybyła, Homados at SemEval-2021 Task 6: Multi-task learning for propa-
ganda detection, in: Proceedings of the 15th International Workshop on Semantic Evaluation
(SemEval-2021), Association for Computational Linguistics, Online, 2021, pp. 1027–1031. URL:
https://aclanthology.org/2021.semeval-1.141. doi:10.18653/v1/2021.semeval-1.141.
[10] W. Antoun, F. Baly, H. Hajj, AraBERT: Transformer-based model for Arabic language understand-
ing, in: H. Al-Khalifa, W. Magdy, K. Darwish, T. Elsayed, H. Mubarak (Eds.), Proceedings of the 4th
Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive
Language Detection, European Language Resource Association, Marseille, France, 2020, pp. 9–15.
URL: https://aclanthology.org/2020.osact-1.2.
[11] B. Tuck, F. Qachfar, D. Boumber, R. Verma, DetectiveRedasers at ArAIEval shared task: Lever-
aging transformer ensembles for Arabic deception detection, in: Proceedings of ArabicNLP
2023, Association for Computational Linguistics, Singapore (Hybrid), 2023, pp. 494–501. URL:
https://aclanthology.org/2023.arabicnlp-1.45. doi:10.18653/v1/2023.arabicnlp-1.45.
[12] G. Lample, A. Conneau, Cross-lingual language model pretraining, arXiv preprint arXiv:1901.07291
(2019).
[13] Y. Liu, J. Gu, N. Goyal, X. Li, S. Edunov, M. Ghazvininejad, M. Lewis, L. Zettlemoyer, Multilin-
gual denoising pre-training for neural machine translation, Transactions of the Association for
Computational Linguistics 8 (2020) 726–742.
[14] M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, L. Zettlemoyer,
Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation,
and comprehension, arXiv preprint arXiv:1910.13461 (2019).
[15] K. Abboud, O. Golovneva, C. DiPersio, Cross-lingual transfer for low-resource Arabic language
understanding, in: H. Bouamor, H. Al-Khalifa, K. Darwish, O. Rambow, F. Bougares, A. Abdelali,
N. Tomeh, S. Khalifa, W. Zaghouani (Eds.), Proceedings of the Seventh Arabic Natural Language
Processing Workshop (WANLP), Association for Computational Linguistics, Abu Dhabi, United
Arab Emirates (Hybrid), 2022, pp. 225–237. URL: https://aclanthology.org/2022.wanlp-1.21.
doi:10.18653/v1/2022.wanlp-1.21.
[16] M. Kahla, Z. G. Yang, A. Novák, Cross-lingual fine-tuning for abstractive Arabic text summarization,
in: Proceedings of the International Conference on Recent Advances in Natural Language Processing
(RANLP 2021), 2021, pp. 655–663.
[17] A. Barrón-Cedeño, F. Alam, T. Chakraborty, T. Elsayed, P. Nakov, P. Przybyła, J. M. Struß, F. Haouari,
M. Hasanain, F. Ruggeri, X. Song, R. Suwaileh, The CLEF-2024 CheckThat! Lab: Check-worthiness,
subjectivity, persuasion, roles, authorities, and adversarial robustness, in: N. Goharian, N. Tonel-
lotto, Y. He, A. Lipani, G. McDonald, C. Macdonald, I. Ounis (Eds.), Advances in Information
Retrieval, Springer Nature Switzerland, Cham, 2024, pp. 449–458.