An Oppositional Thinking Analysis Method Using BERT-based Model with BiGRU Notebook for PAN at CLEF 2024 Qingbiao Hu, Zhongyuan Han* , Jiangao Peng, Mingcan Guo and Chang Liu Foshan University, Foshan, China Abstract The Oppositional thinking analysis: Conspiracy theories vs critical thinking narratives task of PAN at CLEF 2024 involves two challenges: first, distinguishing between conspiracy and critical narratives as Subtask 1, and second, identifying key elements of oppositional narratives as Subtask 2. We consider these two challenges as binary classification and sequence labeling problems, respectively. We will perform both tasks in English and Spanish. In this paper, we introduce our method to address these challenges by fine-tuning a BERT-based model with an added BiGRU layer for Subtask 1 and employing a multi-task learning method for Subtask 2. Finally, our model for English achieves MCC scores of 0.821 in Subtask 1 and Span-F1 scores of 0.569 in Subtask 2 on the official test set. Keywords PAN 2024, Oppositional Thinking Analysis, BERT-based Model, Multi-task Learning 1. Introduction As it is acknowledged that conspiracy theories pose significant harm to society and are challenging to identify [1], the difficulty lies in distinguishing them from critical thinking narratives, as both share similarities in oppositional thinking. However, it is crucial to differentiate between them, as failure to do so could push people toward conspiracy communities, as shown in [2]. The PAN at CLEF 2024 task [3] on oppositional thinking analysis [4] aims to address this problem. It includes two subtasks framed as a binary classification task and a token-level classification task, respectively. The automatic detection of conspiracy theories in text using pre-trained language models has proven effective [5] in recent years. Combining the transformer-based model with downstream neural networks has achieved state-of-the-art performance in similar tasks [6]. Inspired by related works, we employ CT-BERT [7] and BiGRU (Bidirectional Gated Recurrent Units) [8] to address this task. By integrating the BERT-based layer with the BiGRU layer, we leverage the benefits of deep contextual embeddings and sequence-sensitive features. 2. Oppositional thinking analysis Task At PAN 2024 there are two subtasks proposed for oppositional thinking analysis: • Subtask 1: Distinguishing between critical and conspiracy texts. It is a binary classification task that aims to distinguish between two types of messages: the first contains critical messages that scrutinize significant decisions within the public health sector without endorsing a con- spiratorial mindset; the second includes messages that interpret the pandemic or public health decisions as the result of a malignant conspiracy orchestrated by secretive, powerful entities. Our task is to categorize these texts into distinct categories: CONSPIRACY or CRITICAL. CLEF 2024: Conference and Labs of the Evaluation Forum, September 09–12, 2024, Grenoble, France * Corresponding author. $ ezio411152084@gmail.com (Q. Hu); hanzhongyuan@gmail.com (Z. Han); wyd1n910@gmail.com (J. Peng); gmc9812@163.com (M. Guo); lc965024004@gmail.com (C. Liu)  0009-0004-8237-0044 (Q. Hu); 0000-0001-8960-9872 (Z. Han); 0009-0006-3780-5023 (J. Peng); 0000-0002-4977-2138 (M. Guo); 0009-0000-0887-9273 (C. Liu) © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR ceur-ws.org Workshop ISSN 1613-0073 Proceedings • Subtask 2: Detecting elements of the oppositional narratives. It is a token-level classification task aimed at recognizing text spans corresponding to the key elements of oppositional narratives. A span-level annotation scheme that identifies the Agents (A), Facilitators (F), Campaigners (C), Victims (V), Effects (E), Objectives (O) in the oppositional narratives was developed. Our task is to identify specific spans in texts that should be annotated with the corresponding labels. 3. Method Generally speaking, our method consists of two main parts: the BERT-based encoder and the BiGRU downstream neural network layer for both Subtask 1 and Subtask 2. Our method involves three primary steps: 1) fine-tune the pre-trained BERT-based model with the given training dataset, 2) feed the sequence of embeddings from the BERT-based model into a BiGRU layer and 3) Use the outputs from the BiGRU layer, typically the final hidden states that encapsulate the information from the entire sequence, to classify the text into categories (e.g., critical or conspiracy) in Subtask 1 or to combine with different task heads for span annotation in Subtask 2. 3.1. BERT-based Model with BiGRU Layer Architecture for Subtask 1 In this section, we introduce the architecture for Subtask 1. Figure 1 shows the whole architecture. CRITICAL or Output CONSPIRACY Softmax Linear Layer Dropout Layer Concatenated Hidden States BiGRU Layer Sequence Output BERT-based Encoder Tokenization ⋯⋯ Input Data word 1 word N Figure 1: Model Architecture for Subtask 1. This architecture enhances BERT’s contextual embeddings with a BiGRU layer for bidirectional sequential processing, which, after dropout regularization, feeds into a linear layer for final classification. The CT-BERT model is selected as our encoder, which was trained on a large dataset of COVID-19 Twitter messages. The corpus for this PAN 2024 task consists of COVID-19 Telegram texts, making our model particularly well-suited due to its training on similar content. Consequently, this model is expected to outperform other BERT-based models due to its superior understanding of this specific domain. Additionally, we have chosen RoBERTa [9] as a contrasting model to verify whether these expectations hold. The BERT-based model provides rich contextual embeddings by considering the left and right contexts within the transformer architecture. The addition of a BiGRU layer introduces an extra level of sequential processing. It processes information in both forward and backward directions across the text, offering a comprehensive view of the temporal dependencies. Once the BERT-based layer has generated the sequence outputs, they are fed into the BiGRU layer. The BiGRU layer synthesizes the information captured by the BERT layer, adding a layer of understanding. This enhancement aids in detecting subtle cues and patterns that differentiate various narrative types. The BiGRU outputs are then passed through additional dropout layers for regularization, followed by a linear classification layer that maps the BiGRU outputs to the target category. 3.2. Multi-task Learning Architecture for Subtask 2 The core architecture for Subtask 2 remains the same, however, we employ a multi-task learning method to more effectively address the specific challenges posed by Subtask 2, as shown in Figure 2. Category:O Start char:2 Output End char:135 BIO Tagging Token Task Modules Classification For Different +BiGRU Categories BERT-based Encoder Shared Layer Input Text Figure 2: Model Architecture for Subtask 2.This architecture uses a BERT-based encoder shared layer and BiGRU-enhanced token classification layers with BIO tagging for different categories, creating a multi-task classifier that identifies text elements in six categories. Given that the key elements to be identified in a text fall under one of six categories— Agents (A), Facilitators (F), Campaigners (C), Victims (V), Effects (E), and Objectives (O)—each can be considered a separate token classification task. All these tasks share the same need for embeddings. Therefore, we utilize a BERT-based encoder (primarily CT-BERT) as the backbone of our architecture, with token classification layers serving as task-specific heads. This forms our multi-task classifier architecture. Additionally, the token classification layer is integrated with a BiGRU layer, and through BIO tagging, we achieve the span output for each category. Recent research [10] has proven the effectiveness of a multi-task classifier based on the domain-specific CT-BERT model. Utilizing a shared encoder, our model efficiently learns universal representations beneficial across all tasks, while the dedicated task modules concentrate on task-specific features. 4. Experiments and Results 4.1. Datasets Given these two subtasks, the oppositional thinking analysis task has provided datasets [11] consisting of Telegram texts related to COVID-19 from a list of oppositional Telegram channels, available in both English and Spanish. The data has been pre-processed and tokenized for convenience, with emojis and other non-text content removed. The training datasets include lists of texts fully annotated with categories and spans of key elements, whereas the test datasets contain only the input texts. A total of 5000 texts for each language have been provided. 4.2. Evaluation For evaluation, we used the official metrics provided to evaluate Subtask 1: Matthews Correlation Coefficient (MCC) [12], per-class F1 scores: F1-Consp and F1-Crit and macro-averaged F1. And we used the following metrics in Subtask 2: span-F1 [13], span-recall, span-precision and micro-span-F1. 4.3. Baseline The organisers of each subtask provided baselines in both languages for each subtask. BERT classifier is used for Subtask 1, and BERT-based multi-task token classifier is used for Subtask 2. 4.4. Settings While training, we preprocessed the training set and divided it using stratified 3-fold cross-validation. Our model is trained using a cross-entropy loss function and utilizes the AdamW optimizer with a learning rate of 2e-5, incorporating a scheduler for learning rate adjustments. Other hyperparameters include a batch size of 16 and a training duration of three epochs. In Subtask 1, we selected CT-BERT and RoBERTa for experiments on the English corpus, and bert- spanish [14] for the Spanish corpus. Each model was tested both with and without an added BiGRU layer. In Subtask 2, we selected CT-BERT as backbone on the English corpus, and bert-spanish for the Spanish corpus. Each model was tested both with an added BiGRU layer. 4.5. Results During the training process for Subtask 1, we evaluated our models and compared them with the official baselines. We anticipate that the CT-BERT + BiGRU model will outperform other models on the English corpus. For the Spanish corpus, due to the limited availability of multilingual models for experimentation, we used BERT-Spanish with a BiGRU layer. As shown in Table 1, our model performed better than both the baseline and RoBERTa + BiGRU, demonstrating the effectiveness of the CT-BERT + BiGRU model in this binary classification task. When compared with CT-BERT without the BiGRU, the version with BiGRU showed slight improvement. However, the BERT-Spanish + BiGRU model slightly fell short of the Spanish baseline. The Table 2 shows that our model still holds up, indicating that our model is robust and neither overfits nor underfits the training set. However, the BERT-Spanish + BiGRU model performed worse than the baseline. Table 1 Results for SubTask 1 on training sets Model Language MCC F1-Consp F1-Crit F1-avg Baseline English 0.729 0.819 0.908 0.863 CT-BERT + BiGRU English 0.815 0.878 0.936 0.907 CT-BERT English 0.808 0.872 0.935 0.903 RoBERTa + BiGRU English 0.789 0.859 0.928 0.894 RoBERTa English 0.783 0.928 0.853 0.890 Baseline Spanish 0.677 0.790 0.886 0.838 BERT-spanish + BiGRU Spanish 0.662 0.776 0.882 0.829 In relation to Subtask 2, and similar to the approach in Subtask 1, we compared the CT-BERT + BiGRU model and the BERT-Spanish + BiGRU model with the baseline model during training to evaluate if this multi-task architecture still performs better. Subsequently, we submitted our best model for testing on the official test sets. Table 3 and Table 4 demonstrate the results obtained in Subtask 2. Table 2 Results for Subtask 1 on official testing sets Model Language MCC F1-Consp F1-Crit F1-avg Baseline English 0.796 0.863 0.931 0.897 CT-BERT + BiGRU English 0.821 0.821 0.940 0.909 Baseline Spanish 0.668 0.787 0.880 0.833 BERT-spanish + BiGRU Spanish 0.653 0.768 0.880 0.824 Table 3 Results for Subtask 2 on training sets Model Language span-F1 span-P span-R micro-span-F1 Baseline English 0.522 0.453 0.640 0.510 CT-BERT + BiGRU English 0.576 0.516 0.667 0.542 Baseline Spanish 0.475 0.429 0.544 0.475 BERT-spanish + BiGRU Spanish 0.475 0.440 0.527 0.483 Table 4 Results for Subtask 2 on official testing sets Model Language span-F1 span-P span-R micro-span-F1 Baseline English 0.532 0.468 0.633 0.499 CT-BERT + BiGRU English 0.569 0.522 0.633 0.538 Baseline Spanish 0.493 0.453 0.562 0.495 BERT-spanish + BiGRU Spanish 0.486 0.462 0.522 0.494 5. Conclusion This paper mainly introduces our work on oppositional thinking analysis at PAN 2024. Our work utilizes a BERT-based model with a BiGRU layer to enhance performance in both binary classification and sequence labeling tasks within this domain. The results from the official testing datasets indicate that our method achieved an improvement of approximately 0.04 MCC scores in Subtask 1 and reached 4th place in the Official Ranking for the English corpus. While the English model demonstrated strong performance, the Spanish model was less successful, with only marginal improvements attributed to the BiGRU layer. Therefore, future work should focus on investigating how this method impacts multilingual tasks. Acknowledgments This work is supported by the Social Science Foundation of Guangdong Province, China (No.GD24CZY02) References [1] K. M. Douglas, J. E. Uscinski, R. M. Sutton, A. Cichocka, T. Nefes, C. S. Ang, F. Deravi, Understanding conspiracy theories, Political psychology 40 (2019) 3–35. [2] S. Phadke, M. Samory, T. Mitra, What makes people join conspiracy communities? Role of social factors in conspiracy engagement, Proceedings of the ACM on Human-Computer Interaction 4 (2021) 1–30. [3] A. A. Ayele, N. Babakov, J. Bevendorff, X. B. Casals, B. Chulvi, D. Dementieva, A. Elnagar, D. Freitag, M. Fröbe, D. Korenčić, M. Mayerl, D. Moskovskiy, A. Mukherjee, A. Panchenko, M. Potthast, F. Rangel, N. Rizwan, P. Rosso, F. Schneider, A. Smirnova, E. Stamatatos, E. Stakovskii, B. Stein, M. Taulé, D. Ustalov, X. Wang, M. Wiegmann, S. M. Yimam, E. Zangerle, Overview of PAN 2024: Multi-Author Writing Style Analysis, Multilingual Text Detoxification, Oppositional Thinking Analysis, and Generative AI Authorship Verification, in: L. Goeuriot, P. Mulhem, G. Quénot, D. Schwab, L. Soulier, G. M. D. Nunzio, P. Galuščáková, A. G. S. de Herrera, G. Faggioli, N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF 2024), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York, 2024. [4] D. Korenčić, B. Chulvi, X. B. Casals, M. Taulé, P. Rosso, F. Rangel, Overview of the Oppositional Thinking Analysis PAN Task at CLEF 2024, in: G. Faggioli, N. Ferro, P. Galuščáková, A. G. S. de Herrera (Eds.), Working Notes of CLEF 2024 – Conference and Labs of the Evaluation Forum, 2024. [5] K. Pogorelov, D. T. Schroeder, S. Brenner, J. Langguth, FakeNews: Corona Virus and Conspiracies Multimedia Analysis Task at MediaEval 2021., in: MediaEval, 2021. [6] J. Alghamdi, Y. Lin, S. Luo, Towards covid-19 fake news detection using transformer-based models, Knowledge-Based Systems 274 (2023) 110642. [7] M. Müller, M. Salathé, P. E. Kummervold, Covid-twitter-bert: A natural language processing model to analyse covid-19 content on twitter, Frontiers in artificial intelligence 6 (2023) 1023281. [8] J. Chung, C. Gulcehre, K. Cho, Y. Bengio, Empirical evaluation of gated recurrent neural networks on sequence modeling, arXiv preprint arXiv:1412.3555 (2014). [9] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, Roberta: A robustly optimized bert pretraining approach, arXiv preprint arXiv:1907.11692 (2019). [10] Y. Peskine, G. Alfarano, I. Harrando, P. Papotti, R. Troncy, Detecting COVID-19-Related Conspiracy Theories in Tweets., in: MediaEval, 2021. [11] D. Korenčić, B. Chulvi, X. Bonet Casals, M. Taulé, P. Rosso, PAN24 Oppositional Thinking Analysis [Data set], https://doi.org/10.5281/zenodo.11199642, 2024. Available from Zenodo. [12] D. Chicco, N. Tötsch, G. Jurman, The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation, BioData mining 14 (2021) 1–22. [13] G. Da San Martino, Y. Seunghak, A. Barrón-Cedeno, R. Petrov, P. Nakov, et al., Fine-grained analysis of propaganda in news article, in: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), Association for Computational Linguistics, 2019, pp. 5636–5646. [14] J. Cañete, G. Chaperon, R. Fuentes, J.-H. Ho, H. Kang, J. Pérez, Spanish pre-trained bert model and evaluation data, arXiv preprint arXiv:2308.02976 (2023).