1. Introduction

E. B. Marino);

Linguistic Markers of Population Replacement Conspiracy Theories in YouTube Immigration Discourse

Erik Bran Marino

Davide Bassi

Renata Vieira

0 0 Universidade de Évora, CIDEHUS , Évora , Portugal 1 Universidade de Santiago de Compostela , Santiago de Compostela , Spain

2025

000 0 0003

This paper presents a linguistic analysis of YouTube comments related to immigration discourse, analyzing the contrasts between standard anti-immigration comments and those linked to Population Replacement Conspiracy Theories (PRCT). Using a dataset of 71,137 YouTube comments classified into three stance categories (PRO, NEUTRAL, CONTRA) and PRCT annotation, we analyze the linguistic features of each group through LIWC (Linguistic Inquiry and Word Count). Our findings reveal significant diferences in the language patterns of PRCT comments, both in comparison to standard anti-immigration discourse (CONTRA) and to all other groups. These diferences appear particularly in religious references, power dynamics, conflict framing, and emotional tone. The high linguistic overlap (89.7%) between conspiracy and non-conspiracy antiimmigration discourse reveals the subtle nature of these diferences. These distinctive linguistic patterns provide valuable insights both for the understanding and the automatic detection of conspiracy theories in online discourse, contributing to the growing body of research on computational approaches to identifying harmful content online.

eol>Population Replacement Conspiracy Theory Immigration discourse YouTube comments LIWC analysis LLMs Deepseek Hybrid approach Computational Social Sciences

1. Introduction Immigration has become one of the central and most con

troversial topics in cultural and political debates across Western societies. The debate is increasingly influenced by Population-Replacement Conspiracy Theories (PRCTs) narratives that portray demographic change as an élite plot to replace native populations [1, 2]. Online, the mantra at the core of these narratives—the Great Replacement—has migrated from fringe blogs to mainstream platforms, reshaping how migration is framed and politicised [3].

The impact of PRCTs goes beyond mere rhetoric. Analyses of terrorist manifestos show that the Christchurch (2019) and Utøya (2011) attackers adopted the GreatReplacement frame as moral legitimation for violence [4, 5, 6]. Experimental work further demonstrates that exposure to PRCT claims heightens Islamophobia and support for extremist action [4]. These findings underscore the societal risks tied to PRCT difusion [5].

Automatic moderation faces two intertwined challenges. First, PRCT cues are lexically sparse, domainlfexible, and embedded in high-volume comment streams, limiting rule-based filters. Second, existing supervised classifiers require large, domain-specific corpora that are rarely available for niche conspiracies [7, 8]. Even stateof-the-art large language models (LLMs) may struggle when prompted zero-shot on conspiracy detection tasks [9, 10].

This study ofers a dual contribution:

1. Methodological: We provide, to our knowledge,

the first systematic evaluation of an open-weight LLM (DeepSeek-v3) for PRCT detection in a fewshot setting. Performance is validated against a gold subset independently annotated by two experts (see §3). 2. Psycholinguistic: Using LIWC, we deliver the ifrst fine-grained comparison of PRCT language with other stances in the immigration debate (PRO, CONTRA, NEUTRAL), illuminating diferences in temporal focus, power rhetoric and conlfict framing [9] 1.

These aims translate into two research questions: RQ1 Can DeepSeek-v3, with minimal in-context exam

ples, reliably distinguish PRCT comments from non-PRCT content? RQ2 Do PRCT comments exhibit psycho-linguistic patterns that difer systematically from other immigration stances?

1Throughout this paper we use psycholinguistic in the computational

social-science sense: the study of how everyday language reflects basic social and personality processes [11].

2. Related Work Our analysis is based on a dataset comprising 71,137 unique YouTube comments related to immigration. Specifically, we expanded the dataset described in Bassi

et al. [16] by crawling a total of 15 videos about immigraPRCTs comprise a family of narratives such as the Great tion (see Table 7 in the appendix for complete video list). Replacement, the Kalergi Plan, White Genocide and Eu- Following the methodology established in the referenced rabia. Recent scholarships track their strategic main- study, which demonstrated that parent comment contexstreaming, whereby far-right actors blend demographic tual information is crucial for accurate stance detection alarmism with cultural-defence rhetoric to broaden ap- in YouTube comments, we employed the same hybrid peal [1, 2]. pipeline to reconstruct conversation chains and preserve

In terms of computational approaches to conspiracy parent-child relationships between comments. detection, early systems combined rule-based extraction For stance classification, we utilized GPT-4o with conwith bag-of-words classifiers [ 8]. More recent pipelines textual information from reconstructed comment chains present an automated pipeline using BERT embeddings to detect the stance of the comments. The vast majorto discover narrative frameworks in conspiracy theories ity of comments mention migration. The classification and conspiracies. Evaluated against expert data, it shows scheme distinguished between three primary categories: relation extraction recall of 83.7-82.9% for Pizzagate and Bridgegate [7]. • CONTRA: expressing anti-immigration views

Large Language Models ofer new possibilities for this • NEUTRAL: expressing neutral, unclear or unredomain, promising zero-shot classification without costly lated perspectives towards immigration annotation. Previous works shows that GPT-3.5 and • PRO: expressing pro-immigration views LLaMA-2 outperform RoBERTa on generic conspiracy tasks but inflate false-positive rates [ 12, 13]. However, no prior study evaluates DeepSeek on PRCT specifically, leaving a clear research gap that we address.

From a linguistic perspective, corpus studies reveal that conspiracy texts favour future-oriented temporal frames, certainty language and out-group pronouns [8, 7].

Our work isolates PRCT language to test whether it is merely an intensification of generic anti-immigration talk or a qualitatively distinct register. In this context, LIWC remains a widely validated tool for psycholinguistic profiling. In extremist contexts it is able to capture cues pertinent to radical rhetoric [14]. Yet its capacity to discriminate between sub-types of anti-immigration discourse goes beyond its goals. By integrating LIWC with stance labels, we extend its interpretive utility.

Overall, the literature lacks (i) validated LLM approaches for PRCT detection and (ii) systematic linguistic characterisation that separates PRCT from nonconspiratorial rhetoric. Our study addresses both gaps, laying empirical foundations for future detection pipelines and theory-driven analyses of demographic conspiracy talk. Furthermore, Hernaiz [15] theorizes that conspiracy theories operate within the same secular rational frame as mainstream explanations, suggesting that linguistic diferences between conspiracy and nonconspiracy discourse may be more subtle than categorical, warranting empirical investigation of their shared and distinct features.

A detailed performance evaluation of GPT-4o for

immigration-related stance labelling is provided in [16].

The model achieved a − 1 = 78.7% on a manually labelled subset, demonstrating suficient accuracy to enable automated annotation across the entire dataset.

Subsequently, the comments were further analyzed using DeepSeek v3 in a few-shot learning approach to identify those containing Population Replacement Conspiracy Theory elements, resulting in the PRCT annotation.

The classification process employed carefully structured prompts that included reference examples extracted directly from the existing labeled dataset (5 PRCT examples and 5 Non-PRCT examples) to guide the model’s understanding. Representative PRCT and Non-PRCT examples for the few-shot prompt were drawn from the training pool via stratified random sampling across the 15 videos, balancing length, topic, and stance. The five PRCT instances include both explicit markers (e.g. explicit mention of "Great Replacement") and implicit cues (coded dog-whistles such as "demographic engineering"); likewise, the five Non-PRCT examples span policy-oriented, economic, and security-focused objections free of conspiratorial framing. The prompts featured explicit definitions of PRCT content, encompassing specific conspiracy narratives such as "Great Replacement Theory", "White Genocide Theory", "Eurabia", and "Kalergi Plan", as well as broader indicators like demographic warfare narratives, terms such as "invasion", "replacement", and "remigration", and claims of orchestrated population change.

Non-PRCT examples were defined to include policy discussions, border security debates, integration challenges, and economic impact analysis without conspiracy elements. The model was configured with temperature=0 to ensure deterministic and reproducible classifications, and was explicitly instructed to respond strictly with either 3. Methodology 3.1. Dataset "PRCT" or "Non-PRCT", avoiding ambiguous classifica- Count (LIWC) tool. LIWC is a text analysis software tions. To ensure the reliability of our PRCT classification, that calculates the percentage of words pertaining to we validated DeepSeek v3’s performance using a man- specific dictionaries falling into specific psychological ually annotated gold standard dataset of 500 YouTube and linguistic categories [17]. comments, evenly split between PRCT and Non-PRCT We processed all comments through LIWC, focusing classifications 2. Each comment was independently re- on the following key dimensions: viewed by two expert annotators following detailed an- Temporal focus: refers to the extent to which innotation guidelines that provided clear criteria for iden- dividuals characteristically direct their attention to the tifying PRCT content. The inter-annotator agreement past, present, and future [18]. LIWC derives temporal demonstrated high reliability with Gwet’s AC1 = 0.891 focus scores by counting the frequency of time-related and PABAK = 0.804, indicating substantial agreement words in text. For example, past focus includes words particularly for PRCT identification (Positive Agreement like "ago" or "did;" present focus captures "today," "is," and Rate: 0.947). DeepSeek v3 achieved 94.5% accuracy on "now," while future focus is based on "may," "will," and this gold standard, with balanced precision and recall, "soon"[19]. demonstrating robust detection capabilities across difer- Pronoun usage: Pronoun use highlights whether atent PRCT manifestations. tention is on others—third-person singular/plural (he/she,

This methodology allowed us to create a comprehen- they), on ourselves as distinct entities—first-person sinsive dataset that distinguishes between standard anti- gular pronouns (I), or ourselves embedded within a social immigration discourse and discourse specifically contain- relationship—first-person plural (we) and second-person ing population replacement conspiracy theories. Given (you) [20]. the nature of our study, we proceeded by removing du- Cognitive processes: This dictionary comprises over plicated comments and applying a word count filter to 1,000 entries that identify active information-processing; retain comments between 5 and 1000 words, ensuring suf- it yields six sub-scores (insight, causation, discrepancy, ifcient content for meaningful analysis while excluding tentativeness, certainty and diferentiation) [ 21 ]. These extremely short or excessively long comments. Table 1 dimensions capture the depth and style of mental elabdescribes the final distribution of stance and PRCT anno- oration, indicating whether individuals are reasoning tations in our dataset. analytically (causation, insight), expressing uncertainty or confidence (tentativeness, certainty), or making disCategory Count (%) tinctions and comparisons (diferentiation, discrepancy). Stance Emotional dimensions: LIWC distinguishes beCONTRA 37,531 (52.76%) tween broad sentiment and specific emotions [ 22 ]. The afNEUTRAL 22,190 (31.19%) fect category encompasses both positive tone (e.g., "good," PRO 11,416 (16.05%) "love," "happy") and negative tone (e.g., "bad," "hate," PRCT "hurt") words, which reflect general sentiment. The emoNon-PRCT 65,915 (92.66%) tion categories are more targeted, focusing on specific PRCT 5,221 (7.34%) emotion labels such as positive emotion (e.g., "joy," "excited"), negative emotion (e.g., "sad," "angry"), and dis

Total Dataset 71,137 (100.00%) crete emotional states including anxiety (e.g., "worry," Table 1 "fear"), anger (e.g., "mad," "frustrated"), and sadness (e.g., Distribution of stance categories and PRCT annotations in the "disappointed," "cry") [19]. These dimensions capture dataset both the valence and intensity of emotional expression in text.

Within the CONTRA stance category, 4,905 comments Social dynamics: this dictionary captures references (13.07%) contained PRCT elements, while 32,625 com- to interpersonal relationships and social behaviors, inments (86.93%) were standard anti-immigration discourse cluding social referents (e.g., "you," "we"), prosocial behavwithout conspiracy theories. This distinction forms the ior (e.g., "help," "care"), conflict (e.g., "fight," "argue"), and basis of our comparative linguistic analysis. communication acts (e.g., "said," "tell"). The framework also measures power-related language reflecting awareness of social hierarchies and clout, which captures confi3.2. LIWC Analysis dence or leadership displayed through language [19, 20]. To analyze the linguistic characteristics of each comment Linguistic style: captures stylistic markers (such as category, we utilized the Linguistic Inquiry and Word usage of exclamation and question marks, or periods) which can reflect formality or communicative intent [ 19]. 2Detailed annotation criteria for the PRCT validation task are pub- For each category, we averaged LIWC scores and licly available at https://zenodo.org/records/16605519. conducted comparative analyses to identify significant

3.3. Statistical Analysis

diferences, particularly between CONTRA-PRCT (the content from general anti-immigration rhetoric. The bi4,905 merged class) comments and other categories. We nary comparison directly addresses whether conspiracy adopted an exploratory approach, running the complete theories represent fundamentally diferent discourse or LIWC dictionary and retaining all variables for analysis. an intensification of existing patterns. Figure 3 displays the subset that reached || > 0.2 af- PRCT-Specific Feature Classification : We categoter multiple-comparison correction; these include both rized the LIWC dimensions as either PRCT-specific (statissingle-word scores (e.g. religion) and composite cate- tically significant after FDR correction with || ≥ 0.2) or gories (e.g. analytic). shared features (|| < 0.2). The overlap percentage was calculated as the proportion of shared features relative to total features analyzed.

Statistical Test Selection: given the large sample sizes

and non-normal distributions typical of linguistic data, for each dimension, we assessed normality conditions through Shapiro-Wilk and homogeneity of variance using Levene’s test. Normality assumption was violated in all 39 cases, hence we recurred to Kruskal-Wallis test.

Multiple Comparison Correction: Given the exploratory nature of our research (comparison of multiple LIWC dimensions across 4 diferent groups), we applied multiple comparison corrections. Specifically, False Discovery Rate (FDR), and Bonferroni Correction to identify most robust efects.

Efect Size : for each significant diference, we calculated Cohen’s d. In this regard, we highlight how usually efect sizes 0.2 ≤ | | ≥ 0.5 are considered small, however we considered efect sizes of || > 0.2 as substantial, in line with field-specific benchmarks for linguistic research [ 23, 24 ].

Two-Phase Analysis: Our analytical approach comprised two phases: (1) a comprehensive four-group comparison (CONTRA-PRCT, CONTRA, NEUTRAL, PRO) to establish general immigration discourse patterns, and (2) a focused binary analysis (CONTRA-PRCT vs CONTRA) to identify features specifically distinguishing conspiracy

4. Results Our analysis revealed distinct linguistic patterns in

immigration-related discourse, with significant diferences between stance groups while highlighting substantial overlap between conspiracy and non-conspiracy antiimmigration rhetoric.

4.1. General Immigration Discourse Patterns The comprehensive four-group comparison (CONTRA

PRCT, CONTRA, NEUTRAL, PRO) revealed systematic linguistic diferences across immigration stances. After applying FDR correction for multiple comparisons, the majority of LIWC dimensions showed significant diferences (FDR < 0.05).

Anti-Immigration vs Pro-Immigration Discourse.

As shown in Figure 1, both anti-immigration groups (CONTRA-PRCT and CONTRA) demonstrated a similar depersonalised rhetoric, signalled by a higher usage of third-person plural pronouns (they), reflecting out-group focus, and first-person plural pronouns ( we), signalling in-group consolidation, compared to PRO framing of demographic change as a spiritual or civilizaand NEUTRAL comments.Specifically, "They" pronouns: tional threat.

CONTRA-PRCT (3.36), CONTRA (3.17) vs PRO (2.51) vs Power Language ( = 0.233, FDR < 0.001; NEUTRAL (1.68); and "We" pronouns: CONTRA-PRCT CONTRA-PRCT: 3.621 vs CONTRA: 2.560): PRCT dis(1.43), CONTRA (1.30) vs PRO (0.97) vs NEUTRAL (0.77). course shows 41.4% higher usage of power-related lanAdditionally, PRCT discourse exhibited distinct cogni- guage, reflecting emphasis on elite control and orchestive processing patterns. PRCT comments showed the trated manipulation. highest analytic thinking scores (43.2) compared to all Conflict Framing ( = 0.219, FDR < 0.001: other groups (PRO: 38.2, NEUTRAL: 39.1, CONTRA: 39.2), CONTRA-PRCT: 0.853 vs CONTRA: 0.437): Conspiracy suggesting more structured, logical reasoning style. Con- discourse frames immigration as active conflict/warfare versely, PRCT comments demonstrated lower insight lan- with 95.2% higher conflict language usage. guage usage (1.7) compared to PRO (2.3) and NEUTRAL Tone ( = − 0.214, FDR < 0.001; CONTRA-PRCT: (2.6) groups, indicating less expression of sudden under- 30.674 vs CONTRA: 39.347): PRCT comments exhibit standing or realization. This pattern can indicate that significantly more negative tone, with 22.0% lower poswhile PRCT discourse employs analytical framing, it may itive sentiment scores than standard anti-immigration rely more on predetermined interpretive frameworks discourse. rather than exploratory or discovery-oriented thinking.

4.2. PRCT-Specific Linguistic Signature 5. Discussion

To isolate features unique to conspiracy discourse from The linguistic patterns identified in our analysis ofer siggeneral comments against immigration, we conducted nificant insights into the nature of PRCT discourse and its a focused binary comparison between CONTRA-PRCT distinction from standard anti-immigration rhetoric. Our (n=4,905) and CONTRA non-PRCT (n=32,625) comments. findings reveal that while conspiracy and non-conspiracy This analysis revealed a striking finding: 89.7% of lin- anti-immigration discourse share 89.7% of their linguistic guistic features showed negligible diferences (Cohen’s features, they difer significantly in four key dimensions: d < 0.2) between conspiracy and non-conspiracy anti- religious references, power dynamics, conflict framing, immigration discourse, suggesting that anti-immigration and emotional tone. discourse, regardless of conspiracy content, shares fundamental characteristics of outgroup construction and 5.1. High Linguistic Overlap authoritative positioning. As shown in Figure 2, only four dimensions exceeded the meaningful efect size thresh- A potential limitation is that the Non-PRCT compariold. son set, although explicitly anti-immigration, aggregates

As shown in Figure 3, four dimensions demonstrated heterogeneous sub-registers (security, economic, assimmeaningful efect sizes ( ≥ 0.2) with statistical signifi- ilationist). This breadth may inflate the observed lincance after FDR correction: guistic overlap. Nevertheless, the residual diferences we detect—religious framing, power attribution, conflict, PRRCeTl:i1g.i2o7n4 v(s C=ON0T.R2A51: ,0.5F9D1)R: PR<CT0.d0i0sc1o;uCrsOeNshToRwAs- and tone—remain interpretable within Hernaiz [15]’s 115.6% higher usage of religious language, reflecting the framework of shared rational frames, suggesting that PRCT discourse intensifies, rather than qualitatively de- higher use of religious terminology in PRCT comments reparts from, mainstream anti-immigration rhetoric. The lfects the framing of immigration as not merely a political substantial overlap (89.7%) between PRCT and standard or economic issue, but as a threat to cultural and spiritual anti-immigration discourse, in fact, aligns with Hernaiz identity. This finding aligns with Hernaiz [15]’s obser[15]’s theoretical framework of shared secular rational vation that conspiracy theories operate within a hybrid frames. Rather than representing fundamentally diferent framework, employing rational secular arguments while discourses, conspiracy theories may intensify existing simultaneously appealing to notions of "faith" and "berhetorical patterns while operating within the same ra- lief" that pair them with religious explanations. Like relitional framework as mainstream explanations. Our find- gious narratives, PRCT discourse ascribes demographic ing of high ANALYTIC thinking combined with low IN- change to volitional agents with malevolent intent, transSIGHT language suggests that PRCT commenters employ forming a social phenomenon into a spiritual or civianalytical reasoning to validate existing beliefs rather lizational crisis. This supports previous findings that than explore new understandings, potentially reflecting replacement conspiracy theories present demographic the confirmatory versus exploratory cognitive distinction change as an existential threat to a civilization’s core val[15]. This high overlap could pose challenges for auto- ues [1]. This pattern manifests empirically in comments mated detection systems but provides valuable insights such as "Have you heard of Islamic Jihad? that’s most for understanding how conspiracy narratives emerge likely why........Islamization!!" (relig=27.27), where immifrom and relate to mainstream discourse. gration becomes reframed as deliberate religious warfare rather than demographic movement, directly invoking 5.2. PRCT-Specific Features the Eurabia conspiracy framework that portrays Muslim immigration as orchestrated civilizational replacement.

The four distinctive features of PRCT discourse, as visu- Power dynamics (d = 0.233): The emphasis on poweralized in Figure 3, provide interesting insights into its related language could reflect the classic conspiratorial conceptual structure: view that demographic changes are orchestrated by pow

Religious language (d = 0.251): The significantly erful elites rather than resulting from natural social processes. This finding corroborates studies showing that at- conflict framing and negative afect—map closely onto tribution of agency and intentionality to shadowy power defining features documented in other conspiracy famicenters is a defining characteristic of conspiracy thinking lies (e.g., QAnon, anti-vaccination, or Great Reset narra[ 10, 25 ]. The linguistic manifestation of this attribution tives). Future work can test whether these markers genappears in constructions such as "Import third world → be- eralize across domains, turning the present fine-grained come third world" (power=66.67), where the verb "import" analysis into a broader framework for detecting conspirtransforms organic migration processes into deliberate atorial escalation in online discourse. elite manipulation. This deterministic arrow formulation removes agency from migrants themselves while imply- 5.3. Socio-Linguistic Mechanisms in PRCT ing the existence of powerful orchestrators capable of Discourse: Theoretical Perspectives engineering demographic transformation, exemplifying how power-related language shifts explanatory frame- The linguistic patterns identified in our analysis invite works from socio-economic to conspiratorial causation. broader theoretical reflections on the socio-linguistic

Conflict framing (d = 0.219): PRCT discourse shows mechanisms underlying PRCT discourse. While acknowlnearly double the rate of conflict terminology compared edging the limitations of drawing definitive conclusions to standard anti-immigration comments (0.85% vs 0.44%), from a single study with an English-language YouTube representing a 95.2% relative increase. This signals how dataset, the distinctive features we observed suggest sevconspiracy theories transform social issues into exis- eral promising avenues for theoretical exploration. The tential struggles between groups [ 26 ]. This Manichean high linguistic overlap (89.7%) between PRCT and stanframing can serve to legitimize more extreme responses, dard anti-immigration discourse suggests what might as demonstrated by Bracke and Aguilar [6]. The mil- be conceptualized as a rhetorical continuum rather than itarization of discourse materializes in statements like a categorical distinction. This finding resonates with "aggressively defending our borders from invaders" (con- the concept of the Overton window [ 27 ] - the range of lfict=25.00), where immigration policy becomes reconcep- politically acceptable discourse at a given time. Rather tualized as warfare requiring defensive military action. than emerging as entirely separate discourses, conspirThe lexical choice of "invaders" transforms migrants from acy narratives may represent incremental shifts along policy subjects into military threats, while "aggressively this continuum, potentially facilitating the mainstreamdefending" positions exclusionary responses as legitimate ing of fringe ideas through gradual rhetorical transforself-defense, illustrating how conflict framing escalates mations. Within this continuum, we observe that the immigration discourse from policy debate to existential significantly higher use of religious terminology in PRCT combat. comments (+115.6%) might reflect the so-called sacral

Negative tone (d = -0.214): The markedly more nega- ization of collective identity - a process through which tive emotional tone of PRCT discourse, with 22.0% lower political issues are transformed into matters of existenpositive sentiment scores, shows the afective dimen- tial and moral value [ 28 ]. While our data cannot estabsion of conspiracy theories. This emotional negativity lish causality, this linguistic pattern aligns with Girard’s may function as a mobilizing mechanism, generating (2020) theory of sacred diferentiation, where boundaries moral outrage and urgency [4]. This heightened negativ- between in-group and out-group acquire quasi-religious ity appears in apocalyptic formulations such as "Most significance. The emphasis on power-related language of Europe has been destroyed because of illegal immi- (+41.4%) in PRCT discourse further connects to what Hofgrants" (tone_neg=30.00), where the verb "destroyed" es- stadter [ 30 ] termed the paranoid style in political rhetoric calates beyond policy criticism to civilizational annihila- - the perception of systematic, malevolent orchestration tion. The continental scope ("Most of Europe") and direct behind social phenomena. This linguistic pattern may causal attribution ("because of") exemplify how PRCT reflect the construction of alternative relevance strucdiscourse employs catastrophic language to transform tures through which events are reframed as evidence demographic statistics into existential crisis narratives, of hidden designs [ 31 ]. Equally notable is the substanintensifying emotional engagement through linguistic tial increase in conflict terminology (+95.2%), suggesting extremity. a potential militarization of the interpretive frame that

These four linguistic markers ofer insights for both transforms political debate into existential struggle. This socio-psychological understanding of conspiracy dis- might create what Bauman [ 32 ] characterizes as a discurcourse and the development of computational detection sive state of emergency in which exceptional responses systems, providing empirically grounded features that become justified by the perception of imminent threat. could enhance automated identification of PRCT content Such framing represents not merely a rhetorical choice online. While this study isolates Population-Replacement but a fundamental shift in how immigration discourse is Conspiracy Theories, the four linguistic dimensions we conceptualized and processed. These theoretical perspecidentify—religious sacralization, elite power attribution, tives collectively suggest several promising directions for future research. Longitudinal studies could track the evo- generalizability to other platforms and languages where lution of these linguistic markers over time to understand conspiracy discourse could manifest diferently. The auhow discursive shifts occur. Comparative analyses across tomatic classification process, though efective with high diferent languages and cultural contexts would test the agreement scores, inevitably introduces some risk of misgeneralizability of these patterns, while experimental classification that future work might address through studies might investigate how exposure to these specific additional validation approaches or multi-platform comlinguistic features afects audience perceptions and be- parisons. liefs. It is important to emphasize that these theoretical Regarding data handling, our research relies on userinterpretations remain speculative based on our limited generated content from public YouTube videos, raising dataset. The patterns we observed ofer intriguing cor- important privacy considerations. We conducted this relations, but establishing causal relationships between research in accordance with GDPR Article 9(2)(j) and these linguistic features and the social mechanisms de- Article 89, which permit processing of potentially senscribed would require more extensive mixed-methods sitive data for research purposes with appropriate saferesearch combining computational and qualitative ap- guards. Throughout our analysis, we removed personal proaches. Nevertheless, these preliminary findings sug- identifiers from collected comments, focused on aggregest that the subtle linguistic distinctions between con- gate linguistic patterns rather than individual profiles, spiracy and non-conspiracy discourse may reveal deeper and maintained secure data storage with restricted access. social and cognitive processes worthy of further inves- Although the YouTube videos themselves remain pubtigation. Future research might investigate whether the licly accessible, we do not publish the raw comment data transition from mainstream to conspiratorial discourse openly to protect user privacy. Researchers interested in follows predictable linguistic trajectories, and how im- accessing the dataset for scientific purposes may contact migration discourse becomes embedded within broader the authors with appropriate research ethics documencivilizational or existential frames. tation, with any data sharing conducted in compliance with GDPR and relevant national regulations.

This research also raises broader ethical questions 6. Conclusion about the study and identification of conspiracy theories online. While identifying linguistic markers of potentially harmful content could facilitate better content moderation, we recognize the complex balance between reducing harmful misinformation and protecting legitimate discourse. The high linguistic overlap (89.7%) between conspiracy and non-conspiracy anti-immigration discourse underscores the subtlety of these distinctions and the risks of over-moderation based solely on automated detection. Our findings should be interpreted as identifying patterns across large samples, not as definitive classifiers for individual comments. This complexity highlights the importance of human oversight in content moderation systems that might leverage these linguistic insights.

This study advances both methodological and theoretical

fronts. RQ1 asked whether DeepSeek-v3 can reliably detect PRCT content with minimal examples; our validation on a 500-comment gold set (§3) confirms 94.5 % accuracy (balanced precision/recall), demonstrating that a LLM in a few-shot regime is adequate for this task.

RQ2 examined whether PRCT comments exhibit distinct psycho-linguistic patterns; the comparison revealed four robust markers—religious references, power dynamics, conflict framing and negative tone—that systematically diferentiate PRCT from standard anti-immigration discourse.

While 89.7 % of linguistic features are shared between conspiracy and non-conspiracy anti-immigration comments, the four PRCT-specific dimensions remain stable and interpretable. These findings underscore that con- Acknowledgments spiracy narratives often intensify, rather than abandon, mainstream rhetorical frames, and they provide empiri- This research was conducted as part of a larger project cally grounded cues for automated moderation systems. focused on detecting disinformation such as conspiracy theories in online discourse. The authors would like to thank their supervisors and colleagues for their guidance 7. Limitations and Ethical and support throughout this research. We are particularly Considerations grateful to Katarina Laken for her valuable contributions and insightful advice. This work was supported by the While our study reveals significant linguistic patterns HYBRIDS project, which has received funding from the in PRCT discourse, several limitations and ethical con- European Union’s Horizon Europe research and innosiderations warrant discussion. Our analysis focuses on vation programme under the Marie Skłodowska-Curie English-language YouTube comments, which may limit Grant Agreement No. 101073351 and from the UK Research and Innovation (UKRI) Horizon Europe funding [9] M. Hunter, T. Grant, Is linguistic inquiry and guarantee (Grant Number: EP/X036758/1). The work is word count (liwc) reliable, eficient, and efective partially supported by the Portuguese Science Founda- for the analysis of large online datasets in forensic tion as part of the projects CEECIND/ 01997/2017 and and security contexts?, Applied Corpus LinguisUIDP/00057/2025. The content of this work reflects only tics 5 (2025) 100118. doi:10.1016/j.acorp.2025. the authors’ view and the funding agencies are not re- 100118. sponsible for any use that may be made of the information [10] A. Platt, J. Brown, A. Venske, Toward detecting it contains. conspiracy language in misinformation documents, in: Proceedings of the 2022 Computers and People Research Conference (SIGMIS–CPR ’22), 2022.

References doi:10.1145/3510606.3551895. [11] J. W. Pennebaker, The secret life of pronouns, New [1] M. Ekman, The great replacement: Strategic main- Scientist 211 (2011) 42–45.

streaming of far-right conspiracy claims, Conver- [12] T. Vergho, J.-F. Godbout, R. Rabbany, K. Pelrine, gence 28 (2022) 1127–1143. Comparing gpt-4 and open-source language mod[2] M. Sedgwick, The great replacement narrative: Fear, els in misinformation mitigation, arXiv preprint anxiety and loathing across the west, Politics, Reli- arXiv:2401.06920 (2024). gion & Ideology 25 (2024) 548–562. doi:10.1080/ [13] A. Kumar, R. Sharma, P. Bedi, Towards optimal nlp 21567689.2024.2424790. solutions: analyzing gpt and llama-2 models across [3] E. B. Marino, J. M. Benitez-Baleato, A. S. Ribeiro, model scale, dataset size, and task diversity, EngiThe polarization loop: How emotions drive propa- neering, Technology & Applied Science Research gation of disinformation in online media—the case 14 (2024) 14219–14224. of conspiracy theories and extreme right move- [14] A. Etaywe, K. Macfarlane, M. Alazab, A cybertments in southern europe, Social Sciences 13 (2024) errorist behind the keyboard: An automated text 603. analysis for psycholinguistic profiling and threat [4] M. Obaidi, J. R. Kunst, S. Ozer, S. Y. Kimel, The assessment, Journal of Language Aggression and “great replacement” conspiracy: How the per- Conflict (2024). ceived ousting of whites can evoke violent extrem- [15] H. A. P. Hernaiz, Competing explanations of global ism and islamophobia, Group Processes & Inter- evils: Theodicy, social sciences, and conspiracy thegroup Relations 25 (2021) 1675–1695. doi:10.1177/ ories, AGLOS: journal of area-based global studies 13684302211028293. 2 (2011) 27. [5] M. Davis, Violence as method: The “white replace- [16] D. Bassi, M. J. Maggini, R. Vieira, M. Pereira-Fariña, ment”, “white genocide”, and “eurabia” conspiracy A pipeline for the analysis of user interactions in theories and the biopolitics of networked violence, youtube comments: A hybridization of llms and Ethnic and Racial Studies (2024). doi:10.1080/ rule-based methods, in: 2024 11th International 01419870.2024.2304640, advance online pub- Conference on Social Networks Analysis, Managelication. ment and Security (SNAMS), 2024, pp. 146–153. [6] S. Bracke, L. M. H. Aguilar, The politics of replacement: from “race suicide” to the “great replace- [17] dYo.iR: 1.0T.a1u1sc0z9ik/,SJN.AWM.SP6e4n3n1e6b.a2k0e2r,4.T1h0e8p8s3y7c8h1o.logment”, in: The politics of replacement, Routledge, ical meaning of words: Liwc and computerized 2023, pp. 1–19. text analysis methods, Journal of Language and [7] S. Shahsavari, T. R. Tangherlini, B. Shahbazi, Social Psychology 29 (2010) 24–54. doi:10.1177/ E. Ebrahimzadeh, V. Roychowdhury, An automated pipeline for the discovery of conspiracy and [18] S0.26J.19B2a7rnXe0s9,351S6t7u6ck. in the past or living in conspiracy-theory narrative frameworks, PLOS the present? temporal focus and the spread ONE 15 (2020) e0233879. doi:10.1371/journal. of covid-19, Social Science & Medicine 280 [8] pMo.nSea.m0o2r3y3, 8T7.9M.itra, Conspiracies online: User (2021) 114057. doi:https://doi.org/10.1016/ discussions in a conspiracy community follow- [19] jR..sLo.cBsocyidm,eAd..A20sh2o1k.k1u1m40a5r,7S.. Seraj, J. W. Pening dramatic events, in: Proceedings of the nebaker, The development and psychometric propTwelfth International Conference on Web and So- erties of liwc-22, Austin, TX: University of Texas cial Media, ICWSM 2018, Stanford, California, at Austin 10 (2022) 1–47. URL: https://www.liwc. USA, June 25-28, 2018, AAAI Press, 2018, pp. 340– app/static/documents/LIWC-22%20Manual%20-% 349. URL: https://aaai.org/ocs/index.php/ICWSM/ 20Development%20and%20Psychometrics.pdf. ICWSM18/paper/view/17907. [20] E. Kacewicz, J. W. Pennebaker, M. Davis, M. Jeon,

Appendix YouTube Videos Used in Dataset Collection

• Chinese migrants are fastest growing group crossing into U.S. from Mexico youtube.com/watch?v=M7TNP2OTY2g • Native American Shuts Down Immigration Protest youtube.com/watch?v=2utsjsWOWUA • Migrants evade Texas floating barrier youtube.com/watch?v=2i8n6jCH1S4 Declaration on Generative AI

Psychology 33 ( 2014 ) 125 - 143 . doi:https://doi. org/10.1177/0261927X13502654.

[21]

R. L.

Moore , C.-J. Yen , F. E. Powers , Exploring the relationship between clout and cognitive processing in mooc discussion forums , British Journal of Educational Technology 52 ( 2021 ) 482 - 497 . doi:https://doi.org/10.1111/bjet.13033.

[22] K. K. Aldous , J.

An , B. J.

Jansen , Measuring 9 emotions of news posts from 8 news organizations across 4 social media platforms for 8 months , Trans. Soc. Comput . 4 ( 2022 ). URL: https://doi.org/10.1145/ 3516491. doi: 10 .1145/3516491.

[23]

Plonsky ,

F. L.

Oswald , How big is “big”? interpreting efect sizes in l2 research, Language learning 64 ( 2014 ) 878 - 912 .

[24]

Wei ,

Hu ,

Xiong , Efect size reporting practices in applied linguistics research: A study of one major journal , Sage Open 9 ( 2019 ) 2158244019850035 .

[25]

Brotherton , Suspicious minds: Why we believe conspiracy theories , Bloomsbury Publishing , 2015 .

[26]

Barkun , A culture of conspiracy: Apocalyptic visions in contemporary America , volume 15 , Univ of California Press, 2013 .

[27]

N. J.

Russell , An introduction to the overton window of political possibilities , Mackinac Center for Public Policy 4 ( 2006 ).

[28]

Durkheim , Suicide: A study in sociology , Routledge, 2005 .

[29]

Girard , Il capro espiatorio, Adelphi Edizioni spa , 2020 .

[30]

Hofstadter , The paranoid style in American politics , Vintage, 2012 .

[31]

Gofman , Frame analysis: An essay on the organization of experience ., Harvard University Press, 1974 .

[32]

Bauman , Retrotopia, Revista Española de Investigaciones Sociológicas (REIS) 163 ( 2018 ) 155 - 158 .

• Denmark Is Leading Europe's Anti-Immigration Policies youtube .com/watch?v=zpkBKEPxze4

• Venezuelan Immigrant: 'I Regret Having Come to the United States' youtube .com/watch?v=3FPbZcVLTBI

• Migrant group attempts mass entry into US at Mexico border youtube .com/watch?v=h_TqO9EqMhY

• Norway's Muslim immigrants attend classes on western attitudes to women youtube .com/watch?v=oKY600o3CXw

• Why does Sweden no longer wants immigrants? youtube .com/watch?v=5CSUimZjiI0

• How Sweden is Destroyed by the Immigration Crisis youtube .com/watch?v=rUw4cs2MHwc

• Migrant crisis reaches boiling point on Staten Island youtube .com/watch?v=-LDra78ksTo

• "Deportation, not relocation!" Poland votes on illegal migration youtube .com/watch?v=x4afwGepMkM

• Students Say Obama Immigration Quote Is Racist ... When They Think It's From Trump youtube .com/watch?v=Vj9IxVlLRl0

• US' illegal immigrants crisis: Elon Musk visits Texas youtube .com/watch?v=2_iYuiHyzKQ

• Migrant beats resident, steals flag from NY home youtube .com/watch?v=FTXZmor6KBY