=Paper=
{{Paper
|id=Vol-3740/paper-240
|storemode=property
|title=An Oppositional Thinking Analysis Method Using BERT-based Model with BiGRU
|pdfUrl=https://ceur-ws.org/Vol-3740/paper-240.pdf
|volume=Vol-3740
|authors=Qingbiao Hu,Zhongyuan Han,Jiangao Peng,Mingcan Guo,Chang Liu
|dblpUrl=https://dblp.org/rec/conf/clef/HuHPGL24
}}
==An Oppositional Thinking Analysis Method Using BERT-based Model with BiGRU==
Notebook for PAN at CLEF 2024
Qingbiao Hu, Zhongyuan Han* , Jiangao Peng, Mingcan Guo and Chang Liu
Foshan University, Foshan, China
Abstract
The Oppositional Thinking Analysis: Conspiracy Theories vs Critical Thinking Narratives task of PAN at CLEF 2024
involves two challenges: distinguishing between conspiracy and critical narratives (Subtask 1), and identifying
key elements of oppositional narratives (Subtask 2). We treat these two challenges as binary classification and
sequence labeling problems, respectively, and perform both tasks in English and Spanish. In this paper, we
introduce our method for addressing these challenges by fine-tuning a BERT-based model with an added BiGRU
layer for Subtask 1 and employing a multi-task learning method for Subtask 2. On the official test set, our English
model achieves an MCC score of 0.821 in Subtask 1 and a span-F1 score of 0.569 in Subtask 2.
Keywords
PAN 2024, Oppositional Thinking Analysis, BERT-based Model, Multi-task Learning
1. Introduction
Conspiracy theories pose significant harm to society and are challenging to identify [1]; the difficulty
lies in distinguishing them from critical thinking narratives, since both share similarities in oppositional
thinking. Differentiating between them is nonetheless crucial, as failure to do so could push people toward
conspiracy communities, as shown in [2]. The PAN at CLEF 2024 task [3] on oppositional thinking analysis [4]
aims to address this problem. It includes two subtasks, framed as a binary classification task and a
token-level classification task, respectively.
The automatic detection of conspiracy theories in text using pre-trained language models has proven
effective [5] in recent years. Combining the transformer-based model with downstream neural networks
has achieved state-of-the-art performance in similar tasks [6]. Inspired by related works, we employ
CT-BERT [7] and BiGRU (Bidirectional Gated Recurrent Units) [8] to address this task. By integrating
the BERT-based layer with the BiGRU layer, we leverage the benefits of deep contextual embeddings
and sequence-sensitive features.
2. Oppositional Thinking Analysis Task
At PAN 2024, two subtasks are proposed for oppositional thinking analysis:
• Subtask 1: Distinguishing between critical and conspiracy texts. It is a binary classification
task that aims to distinguish between two types of messages: the first contains critical messages
that scrutinize significant decisions within the public health sector without endorsing a
conspiratorial mindset; the second includes messages that interpret the pandemic or public health
decisions as the result of a malignant conspiracy orchestrated by secretive, powerful entities. Our
task is to categorize these texts into distinct categories: CONSPIRACY or CRITICAL.
CLEF 2024: Conference and Labs of the Evaluation Forum, September 09–12, 2024, Grenoble, France
* Corresponding author.
ezio411152084@gmail.com (Q. Hu); hanzhongyuan@gmail.com (Z. Han); wyd1n910@gmail.com (J. Peng);
gmc9812@163.com (M. Guo); lc965024004@gmail.com (C. Liu)
ORCID: 0009-0004-8237-0044 (Q. Hu); 0000-0001-8960-9872 (Z. Han); 0009-0006-3780-5023 (J. Peng);
0000-0002-4977-2138 (M. Guo); 0009-0000-0887-9273 (C. Liu)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org, ISSN 1613-0073)
• Subtask 2: Detecting elements of the oppositional narratives. It is a token-level classification
task aimed at recognizing text spans corresponding to the key elements of oppositional narratives.
A span-level annotation scheme was developed that identifies the Agents (A), Facilitators (F),
Campaigners (C), Victims (V), Effects (E), and Objectives (O) in the oppositional narratives. Our
task is to identify specific spans in texts that should be annotated with the corresponding labels.
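The span annotations can be made concrete with a small sketch (our own illustration, not the organisers' code) that converts a character-level span annotation into per-token BIO tags, assuming simple whitespace tokenisation and spans aligned to token boundaries:

```python
# Illustration only: map character-level span annotations to BIO tags,
# assuming whitespace tokenisation and spans aligned to token boundaries.

def spans_to_bio(text, spans):
    """spans: list of (start_char, end_char, category) tuples."""
    tags, pos = [], 0
    for token in text.split():
        start = text.index(token, pos)
        end = start + len(token)
        pos = end
        tag = "O"  # outside any annotated span
        for s, e, cat in spans:
            if s <= start and end <= e:
                # the first token of a span gets B-, the rest get I-
                tag = ("B-" if start == s else "I-") + cat
                break
        tags.append(tag)
    return tags

# "powerful elites" annotated as an Agent (A) span over characters 4..19
print(spans_to_bio("the powerful elites hid it", [(4, 19, "A")]))
# prints: ['O', 'B-A', 'I-A', 'O', 'O']
```

Decoding a model's BIO tags back into (start, end, category) spans is the reverse of this mapping.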
3. Method
Generally speaking, our method consists of two main parts for both Subtask 1 and Subtask 2: a BERT-based
encoder and a BiGRU downstream neural network layer. It involves three primary steps: 1) fine-tune the
pre-trained BERT-based model on the given training dataset, 2) feed the sequence of embeddings from the
BERT-based model into a BiGRU layer, and 3) use the outputs from the BiGRU layer, typically the final
hidden states that encapsulate information from the entire sequence, to classify the text into categories
(e.g., critical or conspiracy) in Subtask 1, or combine them with different task heads for span annotation
in Subtask 2.
3.1. BERT-based Model with BiGRU Layer Architecture for Subtask 1
In this section, we introduce the architecture for Subtask 1. Figure 1 shows the whole architecture.
[Figure 1: pipeline from Input Data (word 1 ⋯ word N) through Tokenization, BERT-based Encoder, Sequence
Output, BiGRU Layer, Concatenated Hidden States, Dropout Layer, Linear Layer, and Softmax to the Output
(CRITICAL or CONSPIRACY).]
Figure 1: Model Architecture for Subtask 1. This architecture enhances BERT’s contextual embeddings with a
BiGRU layer for bidirectional sequential processing, which, after dropout regularization, feeds into a linear layer
for final classification.
We select the CT-BERT model, which was trained on a large dataset of COVID-19 Twitter messages, as our
encoder. The corpus for this PAN 2024 task consists of COVID-19 Telegram texts, so our model is
particularly well-suited because it was trained on similar content. Consequently, this model is expected
to outperform other BERT-based models due to its superior understanding of this specific domain.
Additionally, we chose RoBERTa [9] as a contrasting model to verify whether these expectations hold.
The BERT-based model provides rich contextual embeddings by considering the left and right contexts
within the transformer architecture. The addition of a BiGRU layer introduces an extra level of sequential
processing: it processes information in both forward and backward directions across the text, offering
a comprehensive view of the temporal dependencies. Once the BERT-based layer has generated the sequence
outputs, they are fed into the BiGRU layer, which synthesizes the information captured by BERT and adds a
further level of sequential modeling. This enhancement aids in detecting subtle cues and patterns that
differentiate various narrative types.
The BiGRU outputs are then passed through a dropout layer for regularization, followed by a linear
classification layer that maps them to the target category.
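A minimal PyTorch sketch of this architecture follows. It is our own approximation, with illustrative layer sizes (`gru_hidden=256`, dropout 0.1), and `encoder` stands in for any Hugging Face-style model such as CT-BERT or RoBERTa:

```python
import torch
import torch.nn as nn

class BertBiGRUClassifier(nn.Module):
    """Sketch of the Subtask 1 head: encoder sequence output -> BiGRU ->
    concatenated final hidden states -> dropout -> linear -> 2 logits."""

    def __init__(self, encoder, hidden_size=768, gru_hidden=256, num_labels=2):
        super().__init__()
        self.encoder = encoder  # e.g. a Hugging Face CT-BERT / RoBERTa model
        self.bigru = nn.GRU(hidden_size, gru_hidden,
                            batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(0.1)
        # forward and backward final states are concatenated: 2 * gru_hidden
        self.classifier = nn.Linear(2 * gru_hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        seq_out = self.encoder(input_ids=input_ids,
                               attention_mask=attention_mask).last_hidden_state
        _, h_n = self.bigru(seq_out)          # h_n: (2, batch, gru_hidden)
        pooled = torch.cat((h_n[0], h_n[1]), dim=-1)
        return self.classifier(self.dropout(pooled))
```

A softmax over the two logits then yields the CRITICAL vs. CONSPIRACY probabilities.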
3.2. Multi-task Learning Architecture for Subtask 2
The core architecture for Subtask 2 remains the same; however, we employ a multi-task learning method
to address the specific challenges of Subtask 2 more effectively, as shown in Figure 2.
[Figure 2: Input Text feeds a shared BERT-based Encoder layer; token classification + BiGRU task modules
for the different categories produce BIO tagging, yielding outputs such as Category: O, Start char: 2,
End char: 135.]
Figure 2: Model Architecture for Subtask 2. This architecture uses a shared BERT-based encoder layer and
BiGRU-enhanced token classification layers with BIO tagging for different categories, creating a multi-task
classifier that identifies text elements in six categories.
Given that the key elements to be identified in a text fall under one of six categories (Agents (A),
Facilitators (F), Campaigners (C), Victims (V), Effects (E), and Objectives (O)), each can be considered
a separate token classification task. All these tasks share the same need for embeddings. Therefore,
we utilize a BERT-based encoder (primarily CT-BERT) as the backbone of our architecture, with token
classification layers serving as task-specific heads. This forms our multi-task classifier architecture.
Additionally, each token classification layer is integrated with a BiGRU layer, and through BIO tagging
we obtain the span output for each category.
Recent research [10] has proven the effectiveness of a multi-task classifier based on the domain-specific
CT-BERT model. Utilizing a shared encoder, our model efficiently learns universal representations
beneficial across all tasks, while the dedicated task modules concentrate on task-specific features.
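Under the same assumptions as the Subtask 1 sketch (PyTorch, illustrative layer sizes, a Hugging Face-style `encoder`), the multi-task architecture can be outlined as one shared encoder with a BiGRU + token-classification head per category:

```python
import torch
import torch.nn as nn

CATEGORIES = ["A", "F", "C", "V", "E", "O"]  # the six narrative elements

class TokenHead(nn.Module):
    """Per-category head: BiGRU over the shared encoder output, then a
    linear layer emitting one of three BIO tags (O / B / I) per token."""

    def __init__(self, hidden_size=768, gru_hidden=256, num_tags=3):
        super().__init__()
        self.bigru = nn.GRU(hidden_size, gru_hidden,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * gru_hidden, num_tags)

    def forward(self, seq):
        seq, _ = self.bigru(seq)
        return self.out(seq)  # (batch, seq_len, num_tags)

class MultiTaskSpanTagger(nn.Module):
    """Shared BERT-based encoder with one task-specific head per category."""

    def __init__(self, encoder, hidden_size=768, gru_hidden=256):
        super().__init__()
        self.encoder = encoder  # e.g. CT-BERT for English
        self.heads = nn.ModuleDict(
            {c: TokenHead(hidden_size, gru_hidden) for c in CATEGORIES})

    def forward(self, input_ids, attention_mask):
        seq = self.encoder(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        # one BIO-logit tensor per category; spans are decoded from these tags
        return {c: head(seq) for c, head in self.heads.items()}
```

Each head is trained with its own token-level cross-entropy loss, while gradients from all six tasks update the shared encoder.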
4. Experiments and Results
4.1. Datasets
Given these two subtasks, the oppositional thinking analysis task has provided datasets [11] consisting
of Telegram texts related to COVID-19 from a list of oppositional Telegram channels, available in both
English and Spanish. The data has been pre-processed and tokenized for convenience, with emojis
and other non-text content removed. The training datasets include lists of texts fully annotated with
categories and spans of key elements, whereas the test datasets contain only the input texts. A total of
5000 texts for each language have been provided.
4.2. Evaluation
For evaluation, we used the official metrics provided. For Subtask 1: Matthews Correlation Coefficient
(MCC) [12], per-class F1 scores (F1-Consp and F1-Crit), and macro-averaged F1. For Subtask 2: span-F1 [13],
span-precision, span-recall, and micro-span-F1.
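As a quick illustration of the Subtask 1 metrics, assuming scikit-learn (the labels below are a toy example, not task data):

```python
from sklearn.metrics import matthews_corrcoef, f1_score

# toy gold labels and predictions for the binary task
y_true = ["CONSPIRACY", "CRITICAL", "CRITICAL", "CONSPIRACY", "CRITICAL"]
y_pred = ["CONSPIRACY", "CRITICAL", "CONSPIRACY", "CONSPIRACY", "CRITICAL"]

mcc = matthews_corrcoef(y_true, y_pred)
f1_consp = f1_score(y_true, y_pred, pos_label="CONSPIRACY")
f1_crit = f1_score(y_true, y_pred, pos_label="CRITICAL")
f1_avg = (f1_consp + f1_crit) / 2  # macro-averaged F1

print(round(mcc, 3), round(f1_consp, 3), round(f1_crit, 3), round(f1_avg, 3))
# prints: 0.667 0.8 0.8 0.8
```

MCC uses all four cells of the confusion matrix, which is why it is the primary metric for this class-imbalanced task [12].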
4.3. Baseline
The organisers provided baselines in both languages for each subtask: a BERT classifier for Subtask 1,
and a BERT-based multi-task token classifier for Subtask 2.
4.4. Settings
For training, we preprocessed the training set and split it using stratified 3-fold cross-validation.
Our model is trained with a cross-entropy loss function and the AdamW optimizer at a learning rate of
2e-5, with a scheduler for learning rate adjustments. Other hyperparameters include a batch size of 16
and a training duration of three epochs.
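The stratified split can be sketched as follows, assuming scikit-learn; the texts and labels here are placeholders:

```python
from sklearn.model_selection import StratifiedKFold

# placeholder corpus: 12 toy texts with alternating labels
texts = [f"telegram text {i}" for i in range(12)]
labels = ["CONSPIRACY" if i % 2 == 0 else "CRITICAL" for i in range(12)]

# stratified 3-fold CV keeps the class ratio of the full set in every fold
skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(skf.split(texts, labels)):
    val_labels = [labels[i] for i in val_idx]
    # each fold: 8 training texts, 4 validation texts, 2 per class
    print(fold, len(train_idx), len(val_idx), val_labels.count("CONSPIRACY"))
```

Stratification matters here because the conspiracy/critical classes are imbalanced in the task data.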
In Subtask 1, we selected CT-BERT and RoBERTa for experiments on the English corpus, and BERT-Spanish [14]
for the Spanish corpus. Each model was tested both with and without an added BiGRU layer. In Subtask 2, we
selected CT-BERT as the backbone for the English corpus and BERT-Spanish for the Spanish corpus, each with
an added BiGRU layer.
4.5. Results
During the training process for Subtask 1, we evaluated our models and compared them with the official
baselines. We anticipated that the CT-BERT + BiGRU model would outperform the other models on the English
corpus. For the Spanish corpus, due to the limited availability of multilingual models for
experimentation, we used BERT-Spanish with a BiGRU layer.
As shown in Table 1, our model performed better than both the baseline and RoBERTa + BiGRU,
demonstrating the effectiveness of the CT-BERT + BiGRU model in this binary classification task. When
compared with CT-BERT without the BiGRU, the version with BiGRU showed slight improvement.
However, the BERT-Spanish + BiGRU model slightly fell short of the Spanish baseline.
Table 2 shows that our model still holds up on the official test set, indicating that it is robust and
neither overfits nor underfits the training set. However, the BERT-Spanish + BiGRU model again performed
worse than the baseline.
Table 1
Results for SubTask 1 on training sets
Model Language MCC F1-Consp F1-Crit F1-avg
Baseline English 0.729 0.819 0.908 0.863
CT-BERT + BiGRU English 0.815 0.878 0.936 0.907
CT-BERT English 0.808 0.872 0.935 0.903
RoBERTa + BiGRU English 0.789 0.859 0.928 0.894
RoBERTa English 0.783 0.853 0.928 0.890
Baseline Spanish 0.677 0.790 0.886 0.838
BERT-spanish + BiGRU Spanish 0.662 0.776 0.882 0.829
For Subtask 2, similar to the approach in Subtask 1, we compared the CT-BERT + BiGRU model and the
BERT-Spanish + BiGRU model with the baseline model during training to evaluate whether this multi-task
architecture still performs better. Subsequently, we submitted our best model for testing on the official
test sets. Table 3 and Table 4 show the results obtained in Subtask 2.
Table 2
Results for Subtask 1 on official testing sets
Model Language MCC F1-Consp F1-Crit F1-avg
Baseline English 0.796 0.863 0.931 0.897
CT-BERT + BiGRU English 0.821 0.821 0.940 0.909
Baseline Spanish 0.668 0.787 0.880 0.833
BERT-spanish + BiGRU Spanish 0.653 0.768 0.880 0.824
Table 3
Results for Subtask 2 on training sets
Model Language span-F1 span-P span-R micro-span-F1
Baseline English 0.522 0.453 0.640 0.510
CT-BERT + BiGRU English 0.576 0.516 0.667 0.542
Baseline Spanish 0.475 0.429 0.544 0.475
BERT-spanish + BiGRU Spanish 0.475 0.440 0.527 0.483
Table 4
Results for Subtask 2 on official testing sets
Model Language span-F1 span-P span-R micro-span-F1
Baseline English 0.532 0.468 0.633 0.499
CT-BERT + BiGRU English 0.569 0.522 0.633 0.538
Baseline Spanish 0.493 0.453 0.562 0.495
BERT-spanish + BiGRU Spanish 0.486 0.462 0.522 0.494
5. Conclusion
This paper introduces our work on oppositional thinking analysis at PAN 2024. Our work utilizes a
BERT-based model with a BiGRU layer to enhance performance in both the binary classification and sequence
labeling tasks within this domain. On the official test sets, our method improved the MCC score in
Subtask 1 by 0.025 over the baseline (0.821 vs. 0.796) and reached 4th place in the official ranking for
the English corpus.
While the English model demonstrated strong performance, the Spanish model was less successful, with only
marginal effects attributable to the BiGRU layer. Therefore, future work should focus on investigating
how this method impacts multilingual tasks.
Acknowledgments
This work is supported by the Social Science Foundation of Guangdong Province, China (No. GD24CZY02).
References
[1] K. M. Douglas, J. E. Uscinski, R. M. Sutton, A. Cichocka, T. Nefes, C. S. Ang, F. Deravi, Understanding
conspiracy theories, Political psychology 40 (2019) 3–35.
[2] S. Phadke, M. Samory, T. Mitra, What makes people join conspiracy communities? Role of social
factors in conspiracy engagement, Proceedings of the ACM on Human-Computer Interaction 4
(2021) 1–30.
[3] A. A. Ayele, N. Babakov, J. Bevendorff, X. B. Casals, B. Chulvi, D. Dementieva, A. Elnagar, D. Freitag,
M. Fröbe, D. Korenčić, M. Mayerl, D. Moskovskiy, A. Mukherjee, A. Panchenko, M. Potthast,
F. Rangel, N. Rizwan, P. Rosso, F. Schneider, A. Smirnova, E. Stamatatos, E. Stakovskii, B. Stein,
M. Taulé, D. Ustalov, X. Wang, M. Wiegmann, S. M. Yimam, E. Zangerle, Overview of PAN 2024:
Multi-Author Writing Style Analysis, Multilingual Text Detoxification, Oppositional Thinking
Analysis, and Generative AI Authorship Verification, in: L. Goeuriot, P. Mulhem, G. Quénot,
D. Schwab, L. Soulier, G. M. D. Nunzio, P. Galuščáková, A. G. S. de Herrera, G. Faggioli, N. Ferro
(Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of
the Fifteenth International Conference of the CLEF Association (CLEF 2024), Lecture Notes in
Computer Science, Springer, Berlin Heidelberg New York, 2024.
[4] D. Korenčić, B. Chulvi, X. B. Casals, M. Taulé, P. Rosso, F. Rangel, Overview of the Oppositional
Thinking Analysis PAN Task at CLEF 2024, in: G. Faggioli, N. Ferro, P. Galuščáková, A. G. S.
de Herrera (Eds.), Working Notes of CLEF 2024 – Conference and Labs of the Evaluation Forum,
2024.
[5] K. Pogorelov, D. T. Schroeder, S. Brenner, J. Langguth, FakeNews: Corona Virus and Conspiracies
Multimedia Analysis Task at MediaEval 2021., in: MediaEval, 2021.
[6] J. Alghamdi, Y. Lin, S. Luo, Towards COVID-19 fake news detection using transformer-based models,
Knowledge-Based Systems 274 (2023) 110642.
[7] M. Müller, M. Salathé, P. E. Kummervold, COVID-Twitter-BERT: A natural language processing model
to analyse COVID-19 content on Twitter, Frontiers in Artificial Intelligence 6 (2023) 1023281.
[8] J. Chung, C. Gulcehre, K. Cho, Y. Bengio, Empirical evaluation of gated recurrent neural networks
on sequence modeling, arXiv preprint arXiv:1412.3555 (2014).
[9] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov,
RoBERTa: A robustly optimized BERT pretraining approach, arXiv preprint arXiv:1907.11692 (2019).
[10] Y. Peskine, G. Alfarano, I. Harrando, P. Papotti, R. Troncy, Detecting COVID-19-Related Conspiracy
Theories in Tweets., in: MediaEval, 2021.
[11] D. Korenčić, B. Chulvi, X. Bonet Casals, M. Taulé, P. Rosso, PAN24 Oppositional Thinking Analysis
[Data set], https://doi.org/10.5281/zenodo.11199642, 2024. Available from Zenodo.
[12] D. Chicco, N. Tötsch, G. Jurman, The Matthews correlation coefficient (MCC) is more reliable
than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix
evaluation, BioData mining 14 (2021) 1–22.
[13] G. Da San Martino, Y. Seunghak, A. Barrón-Cedeno, R. Petrov, P. Nakov, et al., Fine-grained analysis
of propaganda in news article, in: Proceedings of the 2019 conference on empirical methods
in natural language processing and the 9th international joint conference on natural language
processing (EMNLP-IJCNLP), Association for Computational Linguistics, 2019, pp. 5636–5646.
[14] J. Cañete, G. Chaperon, R. Fuentes, J.-H. Ho, H. Kang, J. Pérez, Spanish pre-trained BERT model and
evaluation data, arXiv preprint arXiv:2308.02976 (2023).