<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Binary Battle: Leveraging Machine Learning and Transfer Learning Models to Distinguish between Conspiracy Theories and Critical Thinking</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Sidharth</forename><surname>Mahesh</surname></persName>
							<email>sidharthmaheshedu@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">Mangalore University</orgName>
								<address>
									<settlement>Mangalore</settlement>
									<region>Karnataka</region>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sonith</forename><surname>Divakaran</surname></persName>
							<email>sonithksd@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">Mangalore University</orgName>
								<address>
									<settlement>Mangalore</settlement>
									<region>Karnataka</region>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Kavya</forename><surname>Girish</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">Mangalore University</orgName>
								<address>
									<settlement>Mangalore</settlement>
									<region>Karnataka</region>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Lakshmaiah</forename><surname>Shashirekha</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">Mangalore University</orgName>
								<address>
									<settlement>Mangalore</settlement>
									<region>Karnataka</region>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Binary Battle: Leveraging Machine Learning and Transfer Learning Models to Distinguish between Conspiracy Theories and Critical Thinking</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">49472FA90DF1C2220FA8FFEDF50D33E1</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:54+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Oppositional thinking analysis: Conspiracy vs Critical Narratives</term>
					<term>Oppositional Thinking</term>
					<term>Conspiracy Theories</term>
					<term>Machine Learning</term>
					<term>Transfer Learning</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In automatic content moderation, Natural Language Processing (NLP) faces a complex task in distinguishing between conspiracy theories and critical thinking. While conspiracy theories present complex narratives attributing significant events to covert actions by powerful and malicious entities, critical thinking involves scrutinizing decisions without resorting to sinister explanations. Making this distinction is essential to avoid mislabeling valid criticism as conspiracy, which may unintentionally push people towards conspiracy communities. Conspiratorial and critical narratives are both examples of oppositional thinking, which plays an important role in public debate, particularly in controversial areas such as public health. In this direction, "Oppositional thinking analysis: Conspiracy theories vs critical thinking narratives", a shared task organized at PAN 2024, invites the research community to address the challenge of distinguishing between conspiracy and critical texts in English and Spanish. In this paper, we, team MUCS, describe the models proposed for Subtask-1: "Distinguishing between critical and conspiracy texts" of the shared task. We explored machine learning models trained with Term Frequency-Inverse Document Frequency (TF-IDF) of char n-grams in the range (1, 5) and transfer learning techniques using several BERT variants fine-tuned with the given English and Spanish datasets, to classify the given unlabeled English and Spanish texts into one of two categories: 'CONSPIRACY' or 'CRITICAL'. Among the proposed models, the English_BERT and Spanish_BERT models obtained Matthews Correlation Coefficient (MCC) scores of 0.7162 and 0.6293 for English and Spanish respectively.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Conspiracy theories and critical thinking are two forms of oppositional thinking that are common, especially on contentious issues, and analyzing oppositional thinking entails scrutinizing narratives that question mainstream perspectives. Further, understanding the impact of different types of oppositional thinking on public opinion and behavior is crucial <ref type="bibr" target="#b0">[1]</ref>. While critical thinking fosters constructive and democratic debate characterized by reasoned questioning without unfounded explanations, conspiratorial thinking can lead to misinformation and social conflict by attributing significant events to hidden malevolent forces <ref type="bibr" target="#b1">[2]</ref>. In social and political discourse, conspiracy theories can have detrimental effects on individuals as well as on organisations or society as a whole. These theories, which claim that major social events (gatherings and activities involving people) and political events (activities related to government and leadership) are carefully planned by powerful and malevolent entities, can spread false information and stir social unrest. Conspiracy theories have been associated with violence, war, terrorism, prejudice, poor health choices, and denial of climate change <ref type="bibr" target="#b2">[3]</ref>. In contrast, critical thinking involves analyzing and questioning decisions, particularly in areas like public health, without attributing events to hidden conspiracies.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1 Sample text and their corresponding labels in English dataset</head><p>Mislabeling critical discourse as conspiratorial can suppress healthy debate and alienate individuals who are merely questioning decisions <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4]</ref>. This leads to a climate of mistrust and inhibits the open and sincere exchange of ideas. On the other hand, failing to identify and address conspiratorial narratives allows misinformation to spread, leading to societal division and mistrust. Thus, accurate differentiation between the two forms of oppositional thinking by content moderation systems is essential to avoid marginalizing legitimate criticism and to prevent individuals from being drawn into conspiracy communities after valid critique is mistaken for conspiracy theory <ref type="bibr" target="#b4">[5]</ref>. Additionally, distinguishing between these forms of oppositional thinking is essential for public discourse and social harmony.</p><p>Distinguishing between the oppositional forms of thinking is challenging due to their nuanced and overlapping content, the context-dependent nature of oppositional statements, and oppositional attitudes <ref type="bibr" target="#b5">[6]</ref>. Further, differentiating between well-founded criticism and unfounded conspiracy theories requires advanced, context-aware NLP techniques, and developing such techniques is crucial for improving the accuracy and fairness of content moderation systems.
To address these challenges, the "Oppositional thinking analysis: Conspiracy theories vs Critical thinking narratives" shared task organized at PAN 2024 <ref type="bibr" target="#b6">[7]</ref> invites the research community to develop models to distinguish between conspiracy and critical thinking texts in English and Spanish. The shared task has two subtasks in English and Spanish, and we, team MUCS, participated only in Subtask-1: 'Distinguishing between critical and conspiracy texts'. This subtask, with its two categories 'CONSPIRACY' and 'CRITICAL', is modeled as a binary classification problem. In this paper, we describe various machine learning models trained with TF-IDF of char n-grams in the range (1, 5) and transfer learning techniques using several BERT variants fine-tuned with the given English and Spanish datasets, to classify the given unlabeled English and Spanish text into one of the two categories: 'CONSPIRACY' or 'CRITICAL'. Sample texts from the given English and Spanish datasets are shown in Tables <ref type="table">1 and 2</ref> respectively.</p><p>The rest of the paper is organized as follows: Section 2 describes recent literature on the two forms of oppositional thinking, and Section 3 describes the proposed models, followed by the experiments and results in Section 4. The paper concludes with future work in Section 5.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>Conspiracy theories involve elaborate, unverified claims driven by cognitive biases and emotional needs, often rejecting official explanations without sufficient evidence. In contrast, critical thinking is characterized by objective analysis, logical reasoning, and the evaluation of evidence from multiple sources. Research highlights that while conspiracy beliefs are linked to cognitive biases and feelings of powerlessness, critical thinking fosters informed decision-making and intellectual humility <ref type="bibr" target="#b2">[3]</ref>.</p><p>To explore different strategies for classifying conspiracy-related content on social media, Moosleitner and Murauer <ref type="bibr" target="#b7">[8]</ref> employed machine learning and BERT-based models for English text. Giachanou et al. <ref type="bibr" target="#b1">[2]</ref> performed a comparative analysis of the profiles and the psychological and linguistic characteristics of social media users who share posts about conspiracy theories. The authors then compared the effectiveness of these characteristics for predicting whether a user is a conspiracy propagator by proposing ConspiDetector, a Convolutional Neural Network (CNN)-based model that combines word embeddings with psycho-linguistic characteristics extracted from users' tweets to detect conspiracy propagators. Recordare et al. <ref type="bibr" target="#b10">[11]</ref> implemented various machine learning classifiers (Logistic Regression (LR), k-Nearest Neighbours (kNN), Naive Bayes (NB), SVM, Decision Trees (DT), Random Forest (RF), gradient boosting (XGBoost and LightGBM), Quadratic Discriminant Analysis, MLP, Ridge Classifier, and Linear Discriminant Analysis), trained with Bidirectional Auto-Regressive Transformers (BART)-large Multi-Genre natural language inference features, to identify users who propagate conspiracy theories based on a rich set of 871 features in English.
Among the proposed models, the LightGBM classifier outperformed the others with a macro F1 score of 0.87. To identify whether an English-language article belongs to a conspiracy theory, Ghasemizade and Onaolapo <ref type="bibr" target="#b11">[12]</ref> proposed machine learning classifiers (RF, SVM, k-NN, and NB) trained with TF-IDF of word unigrams and a deep learning model trained with padded and embedded text sequences. Tokenized and padded text sequences, produced by the respective tokenizers, were used as inputs to fine-tune the transformer models BERT and RoBERTa. Their RoBERTa model outperformed the other models with a macro F1 score of 87%.</p><p>The above literature highlights extensive research efforts aimed at detecting conspiracy theories using a range of machine learning, deep learning, and transfer learning models. These studies offer valuable insights into the detection of conspiracy theories. However, they do not specifically address the distinction between conspiracy theories and critical thinking within the framework of oppositional thinking. This gap calls for further research to effectively distinguish between these concepts, encouraging the creation of new models for this specific application.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methodology</head><p>We explored machine learning and transfer learning models for distinguishing between critical and conspiracy texts in English and Spanish. The steps involved in the construction of these models are explained in the following subsections.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Machine Learning models</head><p>The framework of the machine learning models is visualized in Figure <ref type="figure" target="#fig_1">1</ref> and the steps involved in building these classifiers are explained below:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.1.">Pre-processing</head><p>Pre-processing is the preliminary step in building learning models and involves cleaning and transforming the raw text data into a format suitable for subsequent processing. Text data usually contains noise in the form of user mentions, hashtags, punctuation, digits, and hyperlinks, and eliminating this irrelevant information makes the data less complex and improves classifier performance. Hence, in this work, this irrelevant information and stopwords are removed during pre-processing. The English and Spanish stopword lists available in the NLTK library<ref type="foot" target="#foot_0">1</ref> are used as references to remove English and Spanish stopwords respectively from the given datasets.</p></div>
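The cleaning step described above can be sketched in Python; the exact regular expressions and token handling are assumptions, since the section only names the noise categories (mentions, hashtags, punctuation, digits, hyperlinks) and the NLTK stopword lists:

```python
import re

def preprocess(text, stop_words=None, language="english"):
    """Remove hyperlinks, mentions, hashtags, digits, punctuation, and
    stopwords, then lowercase, roughly as described in Section 3.1.1."""
    if stop_words is None:
        # NLTK stopword lists for "english" or "spanish"
        from nltk.corpus import stopwords
        stop_words = set(stopwords.words(language))
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)      # hyperlinks
    text = re.sub(r"[@#]\w+", " ", text)                    # mentions, hashtags
    text = re.sub(r"[^a-zA-ZáéíóúñüÁÉÍÓÚÑÜ\s]", " ", text)  # digits, punctuation
    return " ".join(t for t in text.lower().split() if t not in stop_words)
```

For example, `preprocess("Check https://example.com @user #tag 123!", stop_words=set())` reduces the input to the single token "check".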
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.2.">Feature Extraction</head><p>The role of feature extraction is to extract relevant features from the given data to train the learning models. TF-IDF of char n-grams is used to represent the English and Spanish text. Char n-grams are sequences of n consecutive characters in a word; char n-grams in the range (1, 5) are obtained from the text and converted to TF-IDF vectors using TfidfVectorizer<ref type="foot" target="#foot_1">2</ref>.</p></div>
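In scikit-learn this corresponds to a character-level TfidfVectorizer; the choice of the "char" analyzer (rather than "char_wb") is an assumption, as the paper does not specify it:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# TF-IDF over character n-grams of length 1 to 5, as described above.
vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(1, 5))
docs = ["conspiracy narrative", "critical question"]  # toy documents
X = vectorizer.fit_transform(docs)                    # sparse TF-IDF matrix
print(X.shape[0])  # one row per document
```

Each row of `X` is the TF-IDF vector of one document over the full char n-gram vocabulary, and can be fed directly to the classifiers of Section 3.1.3.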
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.3.">Model Construction</head><p>The performance of a learning model relies on the features and the classifier used to carry out the classification. This work utilizes machine learning classifiers (MNB, LR, RF, and an ensemble of LSVC, LR, and RF with majority voting) to distinguish between conspiracy and critical texts in English and Spanish. A brief description of the machine learning classifiers is given below:</p><p>• MNB -is a probability-based classifier suitable for text classification with discrete features like word frequency counts <ref type="bibr" target="#b12">[13]</ref>. • LSVC -the LinearSVC implementation from the Scikit-learn library<ref type="foot" target="#foot_2">3</ref> finds a hyperplane that maximizes the margin between the classified samples. • LR -predicts the probability of classes based on the dependent variables and is suitable for binary classification tasks. Further, regularisation approaches in LR classifiers are useful for reducing overfitting in high-dimensional spaces <ref type="bibr" target="#b13">[14]</ref>. • RF -is a supervised learning algorithm which is flexible and can be adapted easily to different situations, but it is necessary to build a minimum number of trees in order to classify the data <ref type="bibr" target="#b14">[15]</ref>. • Ensemble learning -is a strategy for building a new classifier from several heterogeneous base classifiers, leveraging the strength of one classifier to overcome the weakness of another and obtain better classification performance <ref type="bibr" target="#b15">[16]</ref>. In this work, three machine learning classifiers (LSVC, LR, and RF) are ensembled with hard voting to distinguish between critical and conspiracy texts.</p><p>The hyperparameters and their values used in the machine learning models are shown in Table <ref type="table" target="#tab_0">3</ref>.</p></div>
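The hard-voting ensemble can be sketched with scikit-learn's VotingClassifier using the hyperparameter values from Table 3; coupling it to the char n-gram TF-IDF features through a pipeline is an assumption about the wiring, which the paper does not spell out:

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, VotingClassifier

# Hard-voting (majority-vote) ensemble of LSVC, LR, and RF
# with the hyperparameter values listed in Table 3.
ensemble = VotingClassifier(
    estimators=[
        ("lsvc", LinearSVC(C=1.0, class_weight="balanced",
                           max_iter=10000, random_state=123)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
    ],
    voting="hard",
)
clf = make_pipeline(TfidfVectorizer(analyzer="char", ngram_range=(1, 5)), ensemble)

# Toy illustration only; the real models are trained on the shared task data.
texts = ["they are hiding the truth from us", "this policy deserves closer scrutiny"]
labels = ["CONSPIRACY", "CRITICAL"]
clf.fit(texts, labels)
preds = clf.predict(texts)
```

Hard voting requires only `predict` from each base classifier, which is why LinearSVC (which lacks `predict_proba`) can participate; soft voting would not work with this estimator list.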
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Transfer Learning</head><p>Transfer learning, within the broader field of machine learning, utilizes knowledge gained from one task to improve performance on another, related task. This is realized with pretrained transformer models, which are trained on large amounts of unlabeled data and are widely accessible and applicable to various tasks. The pretrained transformer models are fine-tuned with the given dataset to fit the models to a particular task or domain. The framework of the proposed transfer learning model is shown in Figure <ref type="figure" target="#fig_2">2</ref>.</p><p>The text is pre-processed to clean and transform the raw text into a consistent format by converting numeric information to the corresponding words and removing URLs, user mentions, hashtags, and special characters. Pre-processing is applied to the sentences of the given text to retain the sentence structure, and the pre-processed text is used to fine-tune the transformer models. A brief description of the transformer models fine-tuned in this study is given below: • BERT_base<ref type="foot" target="#foot_3">4</ref> -is a conceptually simple and empirically powerful pretrained language model trained on the Toronto Book Corpus and Wikipedia with a Masked Language Modeling (MLM) objective, and is exclusively used for tasks involving English text. • English_BERT 5 -is a bilingual legal BERT model trained with 2,000 Dutch and 6,000 English legal documents, amounting to 12 GB of legal text from various areas of the legal domain such as legislation and court cases. This domain-specific BERT has resulted in improved performance over standard BERT models for legal tasks. • CT_BERT_v2 6 -is a BERT-large-uncased model pretrained on a corpus of Twitter messages about COVID-19.
This model is identical to covid-twitter-bert but trained on more data (40.7M sentences and 633M tokens), resulting in higher performance on many downstream applications. • EN_RoBERTa 7 -is a large multilingual language model trained on 12.5TB of filtered CommonCrawl data. Based on Facebook's RoBERTa model, it is fine-tuned with the conll2003 dataset in English. • ES_BERT 8 -BETO: Spanish BERT is trained on the Spanish edition of Wikipedia, the OPUS Project, and Spanish books and news articles, and is exclusively used for tasks involving Spanish text. • Distil_SpanBERT 9 -is a distilled version of Spanish BERT, trained on Spanish text sources; it is also exclusively used for tasks involving Spanish text, but is optimized for efficiency and speed. • Spanish_BERT 10 -is a sentence-transformers model which maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search. • ES_RoBERTa 11 -a variant of RoBERTa, a BERT-based model, specifically tailored for the Spanish language. It is trained on large Spanish text corpora to produce contextually relevant representations of words and sentences.</p><p>We employed the above-mentioned BERT variants from the Hugging Face library. The hyperparameters and their values used in these transfer learning models are shown in Table <ref type="table" target="#tab_1">4</ref>. These BERT variants are fine-tuned with the pre-processed Train set and used as transformer classifiers (ClassificationModel) to distinguish between conspiracy and critical texts in English and Spanish.</p></div>
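The fine-tuning step can be sketched with the simpletransformers ClassificationModel named above; the label encoding, checkpoint name, and training arguments shown here are illustrative assumptions, not values reported in the paper:

```python
import pandas as pd

LABEL2ID = {"CRITICAL": 0, "CONSPIRACY": 1}  # assumed label encoding

def to_train_frame(texts, labels):
    """Two-column frame ("text", "labels") in the format ClassificationModel expects."""
    return pd.DataFrame({"text": texts, "labels": [LABEL2ID[l] for l in labels]})

def fine_tune(train_df, model_type="bert", model_name="bert-base-uncased"):
    # Imported lazily so the data helper above works without the library installed.
    from simpletransformers.classification import ClassificationModel
    model = ClassificationModel(
        model_type, model_name, num_labels=2,
        args={"num_train_epochs": 3, "overwrite_output_dir": True},  # assumed values
    )
    model.train_model(train_df)  # fine-tune on the pre-processed Train set
    return model
```

Swapping `model_name` for another Hugging Face checkpoint selects any of the BERT variants listed above (e.g. a Spanish checkpoint for the Spanish Train set).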
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experiments and Results</head><p>The datasets provided by the organizers of the shared task consist of Train sets only and are highly imbalanced. Statistics of the datasets are as follows:</p><p>• English dataset: 2,621 samples belong to the 'CRITICAL' class and 1,379 samples belong to the 'CONSPIRACY' class. • Spanish dataset: 2,538 samples belong to the 'CRITICAL' class and 1,462 samples belong to the 'CONSPIRACY' class.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>employed machine learning models (Support Vector Machines (SVM), Multinomial Naive Bayes (MNB), and Extremely randomized Trees) trained with TF-IDF of character, word, and Document Term n-gram features, and BERT models (BERT-base, RoBERTa, and DistilBERT) for English text. For the three tasks: Task 1 -Text-Based Misinformation Detection, Task 2 -Text-Based Conspiracy Theories Recognition, and Task 3 -Text-Based Combined Misinformation and Conspiracies Detection, their proposed BERT-base models outperformed all other models, obtaining MCC scores of 0.3184, 0.3624, and 0.3347 for Tasks 1, 2, and 3 respectively. For detecting fake news during the COVID-19 pandemic, Tahat et al. [9] proposed a hybrid analysis using Structural Equation Modelling (SEM) and machine learning classification algorithms such as BayesNet, AdaBoostM1, LWL, Logistic, J48, and OneR for an English dataset. Among the proposed models, the J48 classifier outperformed the other machine learning classifiers with an F-Measure of 0.863. Peskine et al. 
[10] proposed a transformer model (an ensemble of CT-BERT models) and a node-embedding-based technique (node2vec + Multilayer Perceptron (MLP) classification head) to detect COVID-19-related conspiracy theories in English tweets across three subtasks: Task 1 -Text-Based Misinformation and Conspiracies Detection, Task 2 -Graph-Based Conspiracy Source Detection, and Task 3 -Graph and Text-Based Conspiracy Detection. Their CT-BERT ensemble obtained MCC scores of 0.710 and 0.719 for Tasks 1 and 3 respectively, and the node2vec + MLP model obtained an MCC of 0.355 for Task 2.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Framework of the proposed machine learning model</figDesc><graphic coords="4,157.84,336.96,279.60,84.40" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Framework of the proposed transfer learning model</figDesc><graphic coords="6,181.44,65.60,232.40,144.40" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="2,76.60,136.80,442.08,126.72" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 3</head><label>3</label><figDesc>Hyperparameters and their values used in machine learning models</figDesc><table><row><cell cols="2">Model Hyperparameters</cell><cell>Values</cell></row><row><cell></cell><cell>alpha</cell><cell>1.0</cell></row><row><cell>MNB</cell><cell>fit_prior</cell><cell>True</cell></row><row><cell></cell><cell>class_prior</cell><cell>None</cell></row><row><cell></cell><cell>C</cell><cell>1.0</cell></row><row><cell>LSVC</cell><cell>class_weight max_iter</cell><cell>balanced 10000</cell></row><row><cell></cell><cell>random_state</cell><cell>123</cell></row><row><cell>LR</cell><cell>max_iter</cell><cell>1000</cell></row><row><cell>RF</cell><cell>n_estimators random_state</cell><cell>100 42</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 4</head><label>4</label><figDesc>Hyperparameter and their values used in transfer learning models</figDesc><table><row><cell>Model</cell><cell>Hyperparameter</cell><cell>Value</cell></row><row><cell></cell><cell>architectures</cell><cell>BertForMaskedLM</cell></row><row><cell></cell><cell>attention_probs_dropout_prob</cell><cell>0.1</cell></row><row><cell></cell><cell>hidden_size</cell><cell>768</cell></row><row><cell>BERT_base</cell><cell>intermediate_size</cell><cell>3072</cell></row><row><cell></cell><cell>max_position_embeddings</cell><cell>512</cell></row><row><cell></cell><cell>model_type</cell><cell>bert</cell></row><row><cell></cell><cell>vocab_size</cell><cell>30522</cell></row><row><cell></cell><cell>architectures</cell><cell>BertModel</cell></row><row><cell></cell><cell>hidden_size</cell><cell>768</cell></row><row><cell></cell><cell>max_position_embeddings</cell><cell>512</cell></row><row><cell>English_BERT</cell><cell>model_type</cell><cell>bert</cell></row><row><cell></cell><cell>hidden_size</cell><cell>768</cell></row><row><cell></cell><cell>num_hidden_layers</cell><cell>12</cell></row><row><cell></cell><cell>vocab_size</cell><cell>105879</cell></row><row><cell></cell><cell>hidden_act</cell><cell>gelu</cell></row><row><cell></cell><cell>hidden_size</cell><cell>1024</cell></row><row><cell>CT_BERT_v2</cell><cell>intermediate_size</cell><cell>4096</cell></row><row><cell></cell><cell>model_type</cell><cell>bert</cell></row><row><cell></cell><cell>vocab_size</cell><cell>30522</cell></row><row><cell></cell><cell>architectures</cell><cell>XLMRobertaForTokenClassification</cell></row><row><cell></cell><cell>hidden_act</cell><cell>gelu</cell></row><row><cell>EN_RoBERTa</cell><cell>max_position_embeddings</cell><cell>514</cell></row><row><cell></cell><cell>model_type</cell><cell>xlm-roberta</cell></row><row><cell></cell><cell>vocab_size</cell><cell>250002</cell></row><row><cell><
/cell><cell>architectures</cell><cell>BertForMaskedLM</cell></row><row><cell></cell><cell>hidden_act</cell><cell>gelu</cell></row><row><cell>ES_BERT</cell><cell>hidden_size intermediate_size</cell><cell>768 3072</cell></row><row><cell></cell><cell>position_embedding_type</cell><cell>absolute</cell></row><row><cell></cell><cell>vocab_size</cell><cell>31002</cell></row><row><cell></cell><cell>architectures</cell><cell>DistilBertForMaskedLM</cell></row><row><cell>Distil_SpanBERT</cell><cell>model_type</cell><cell>distilbert</cell></row><row><cell></cell><cell>vocab_size</cell><cell>31002</cell></row><row><cell></cell><cell>architectures</cell><cell>BertModel</cell></row><row><cell></cell><cell>hidden_act</cell><cell>gelu</cell></row><row><cell>Spanish_BERT</cell><cell>model_type</cell><cell>bert</cell></row><row><cell></cell><cell>position_embedding_type</cell><cell>absolute</cell></row><row><cell></cell><cell>vocab_size</cell><cell>31002</cell></row><row><cell></cell><cell>architectures</cell><cell>RobertaForMaskedLM</cell></row><row><cell></cell><cell>hidden_act</cell><cell>gelu</cell></row><row><cell>ES_RoBERTa</cell><cell>hidden_size</cell><cell>768</cell></row><row><cell></cell><cell>max_position_embeddings</cell><cell>514</cell></row><row><cell></cell><cell>vocab_size</cell><cell>50262</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://www.nltk.org/search.html?q=stopwords</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">https://huggingface.co/google-bert/bert-base-uncased</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion and Future Work</head><p>In this paper, we, team MUCS, describe the models submitted to Subtask-1: 'Distinguishing between critical and conspiracy texts' of the shared task "Oppositional thinking analysis: Conspiracy theories vs critical thinking narratives" at PAN 2024, which aims to distinguish between critical and conspiracy texts in English and Spanish. Experiments were carried out with TF-IDF of char n-grams in the range (1, 5) to train several machine learning classifiers, and several BERT variants were fine-tuned with the English and Spanish Train sets as transfer learning models, to label the given unlabeled English and Spanish text as 'CONSPIRACY' or 'CRITICAL'. As the shared task participants were allowed to submit the predictions of only two models on the Test set, we fine-tuned English_BERT and EN_RoBERTa with the complete English Train set and Spanish_BERT and ES_RoBERTa with the complete Spanish Train set, and the predictions of these models on the English and Spanish Test sets were submitted to the organizers for evaluation. Among these models, English_BERT and Spanish_BERT obtained MCC scores of 0.7162 and 0.6293 for English and Spanish, securing 61st and 36th position respectively. As the given datasets are imbalanced, suitable text augmentation techniques, followed by efficient text representation methods and context-aware models to distinguish between the two forms of oppositional thinking, will be explored in future work.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>As the datasets consist of Train sets only, 33% of each Train set, selected at random, is used as a Validation set to evaluate the performances of the proposed models for both languages, and the remainder is used for training. Experiments were carried out by training various machine learning models on TF-IDF of character n-grams in the range (1, 5) and by fine-tuning the various BERT variants mentioned above, to distinguish between conspiracy and critical texts in English and Spanish. The performances of the proposed models were evaluated on the Validation sets using the macro F1 score, and the results are shown in Tables <ref type="table">5 and 6</ref> for the English and Spanish datasets respectively.</p><p>The results shown in Tables <ref type="table">5 and 6</ref> illustrate that the transfer learning models performed better than the machine learning models. As the shared task participants were allowed to submit the predictions of only two models on the Test sets, we fine-tuned English_BERT and EN_RoBERTa on the complete English Train set and Spanish_BERT and ES_RoBERTa on the complete Spanish Train set, and the predictions of these models on the English and Spanish Test sets, submitted to the organizers, were evaluated using MCC scores. MCC is a metric used to evaluate the quality of binary classifications, especially on imbalanced datasets. The MCC value ranges from -1 to +1, where +1 indicates perfect prediction, 0 indicates prediction no better than random, and -1 indicates total disagreement between prediction and observation. MCC provides a balanced and comprehensive measure of a model's performance, considering all types of classification errors.
MCC scores are calculated using the following formula:</p><formula>MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))</formula><p>where:</p><p>• TP (True Positives) - number of correct positive predictions,</p><p>• TN (True Negatives) - number of correct negative predictions,</p><p>• FP (False Positives) - number of incorrect positive predictions,</p><p>• FN (False Negatives) - number of incorrect negative predictions.</p><p>Among the two transfer learning models submitted to the shared task, the English_BERT and Spanish_BERT models obtained MCC scores of 0.7162 and 0.6293 for English and Spanish texts, securing the 61st and 36th positions respectively. The performances of these models are shown in Table <ref type="table">7</ref>. The low performances of the fine-tuned English_BERT and Spanish_BERT models could be attributed to the following reasons:</p><p>• The given datasets are highly imbalanced, with approximately 2/3 of the total samples belonging to the 'CRITICAL' class and 1/3 to the 'CONSPIRACY' class in both languages. Such highly imbalanced data significantly impacts the models' performance, biasing them towards the majority class. • English_BERT, pre-trained for both Dutch and English, may be less effective for purely English tasks. Further, domain-specific models like English_BERT might not generalize well outside their specialized contexts. • Spanish_BERT might be better suited to sentence similarity tasks than to classification. Further, differences in pre-training data, fine-tuning processes, and the hyperparameter values used in the models could also contribute to the disparity in their performances.</p></div>			</div>
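The MCC definition above can be computed directly from the confusion-matrix counts. The minimal stand-alone sketch below assumes the common convention of returning 0 when the denominator is 0 (as scikit-learn's matthews_corrcoef does); the example counts are illustrative, not the shared-task results.

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, a degenerate denominator (e.g. the classifier never
    # predicts one of the classes) yields MCC = 0.
    return num / den if den else 0.0

print(mcc(50, 100, 0, 0))   # perfect prediction -> 1.0
print(mcc(0, 0, 100, 50))   # total disagreement -> -1.0
print(mcc(0, 100, 0, 50))   # always predict majority class -> 0.0
```

The last case illustrates why MCC suits the imbalanced datasets discussed above: a classifier that always predicts the 2/3-majority 'CRITICAL' class reaches about 67% accuracy yet scores MCC = 0, no better than random.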
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Oppositional Ideas, Not Dichotomous Thinking: Reply to Rorty</title>
		<author>
			<persName><forename type="first">H</forename><surname>Sharp</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Political Theory</title>
				<meeting><address><addrLine>Sage CA; Los Angeles, CA</addrLine></address></meeting>
		<imprint>
			<publisher>Sage Publications</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="142" to="147" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Detection of Conspiracy Propagators using Psycho-Linguistic Characteristics</title>
		<author>
			<persName><forename type="first">A</forename><surname>Giachanou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Ghanem</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Information Science</title>
		<imprint>
			<biblScope unit="page" from="3" to="17" />
			<date type="published" when="2023">2023</date>
			<publisher>SAGE Publications Sage UK</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">What are Conspiracy Theories? A Definitional Approach to their Correlates, Consequences, and Communication</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">M</forename><surname>Douglas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">M</forename><surname>Sutton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Annual review of psychology</title>
				<imprint>
			<publisher>Annual Reviews</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="volume">74</biblScope>
			<biblScope unit="page" from="271" to="298" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Overview of PAN 2024: Multi-Author Writing Style Analysis, Multilingual Text Detoxification, Oppositional Thinking Analysis, and Generative AI Authorship Verification -Condensed Lab Overview</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">A</forename><surname>Ayele</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Babakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bevendorff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><forename type="middle">B</forename><surname>Casals</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chulvi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Dementieva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Elnagar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Freitag</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Fröbe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Korenčić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mayerl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Moskovskiy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mukherjee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Panchenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Rangel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Rizwan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Schneider</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Smirnova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Stamatatos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Taulé</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ustalov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wiegmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Yimam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Zangerle</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association CLEF-2024</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Hunting Conspiracy Theories during the COVID-19 Pandemic</title>
		<author>
			<persName><forename type="first">J</forename><surname>Moffitt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>King</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">M</forename><surname>Carley</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Social Media + Society</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page">20563051211043212</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Beyond Oppositional Thinking: Radical Respect</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">G</forename><surname>Thornton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">M</forename><surname>Romano</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Philosophical Studies in Education</title>
				<imprint>
			<publisher>ERIC</publisher>
			<date type="published" when="2007">2007</date>
			<biblScope unit="volume">38</biblScope>
			<biblScope unit="page" from="199" to="209" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Overview of PAN 2024: Multi-Author Writing Style Analysis, Multilingual Text Detoxification, Oppositional Thinking Analysis, and Generative AI Authorship Verification</title>
		<author>
			<persName><forename type="first">J</forename><surname>Bevendorff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><forename type="middle">B</forename><surname>Casals</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chulvi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Dementieva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Elnagar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Freitag</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Fröbe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Korenčić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mayerl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mukherjee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Panchenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Rangel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Smirnova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Stamatatos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Taulé</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ustalov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wiegmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Zangerle</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF 2024)</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<meeting><address><addrLine>Berlin Heidelberg New York</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="3" to="10" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">On the Performance of Different Text Classification Strategies on Conspiracy Classification in Social Media</title>
		<author>
			<persName><forename type="first">M</forename><surname>Moosleitner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Murauer</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2021">2021</date>
			<publisher>MediaEval</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Detecting Fake News during the COVID-19 Pandemic: A SEM-ML Approach</title>
		<author>
			<persName><forename type="first">K</forename><surname>Tahat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mansoori</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">N</forename><surname>Tahat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Habes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Alfaisal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Khadragy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">A</forename><surname>Salloum</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Comput. Integr. Manuf. Syst</title>
		<imprint>
			<biblScope unit="page" from="1554" to="1571" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Detection of COVID-19-Related Conspiracy Theories in Tweets using Transformer-Based Models and Node Embedding Techniques</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Peskine</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Papotti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Troncy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">MediaEval 2022, Multimedia Evaluation Workshop</title>
				<meeting><address><addrLine>Bergen, Norway</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023-01">January 2023</date>
			<biblScope unit="page" from="12" to="13" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Recordare</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Cola</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Fagni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tesconi</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2405.12566</idno>
		<title level="m">Unveiling Online Conspiracy Theorists: a Text-Based Approach and Characterization</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Developing a Hierarchical Model for Unraveling Conspiracy Theories</title>
		<author>
			<persName><forename type="first">M</forename><surname>Ghasemizade</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Onaolapo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">EPJ Data Science</title>
				<meeting><address><addrLine>Berlin Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page">31</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Text Classification on Twitter Data</title>
		<author>
			<persName><forename type="first">P</forename><surname>Harjule</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gurjar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Seth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Thakur</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2020 3rd International Conference on Emerging Technologies in Computer Engineering: Machine Learning and Internet of Things (ICETCE)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="160" to="164" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Additive Logistic Regression: A Statistical View of Boosting (With Discussion and a Rejoinder by the Authors)</title>
		<author>
			<persName><forename type="first">J</forename><surname>Friedman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Hastie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Tibshirani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The annals of statistics</title>
				<imprint>
			<date type="published" when="2000">2000</date>
			<biblScope unit="page" from="337" to="407" />
		</imprint>
		<respStmt>
			<orgName>Institute of Mathematical Statistics</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Feature Selection using Random Forest Classifier for Predicting Prostate Cancer</title>
		<author>
			<persName><forename type="first">M</forename><surname>Huljanah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Rustam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Utama</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Siswantining</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IOP Conference Series: Materials Science and Engineering</title>
				<imprint>
			<publisher>IOP Publishing</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page">52031</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Xiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1805.06525</idno>
		<title level="m">Text Classification based on Ensemble Extreme Learning Machine</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
