
Textual Analysis for Radicalisation Narratives aligned with Social Sciences Perspectives

Ronald Denaux

rdenaux@expertsystem.com

Jose Manuel Gomez-Perez

jmgomez@expertsystem.com

Expert System, Madrid, Spain

2019

One of the unintended consequences of the Web is that it can function as a radicalising medium. Hence, developing information systems that are capable of detecting radicalising content is one of the key challenges faced by society to prevent and minimise radicalisation. Fortunately, much work has already been done by social scientists to understand key factors in the radicalisation process and common narratives. This paper presents work to reuse this understanding from social science in a way that is useful for designing and developing information systems. We summarise various perspectives on the concept of narratives and how they apply to radicalisation domains; in particular, we focus on Islamic radicalisation as a key example of radicalisation. We introduce three taxonomies to help capture different aspects of radicalisation narratives and present a system for identifying mentions of one such aspect in texts: strategic radicalisation narratives for Islamic radicalisation.

1 Introduction

Social scientists have been studying and modelling radicalisation processes for decades and have developed models to think about both the radicalisation process and the communication strategies that radical groups use in order to convey their message and recruit new members [KT11, Du 17, GRBM16]. In this paper we argue that much of this work can and should be used to inform the design and development of radicalisation detection systems. In particular, most of the existing systems relying on machine-learning approaches behave as black-box systems and are therefore difficult to use and trust in practice. Being able to refer to existing radicalisation models can help to explain the classification results of such systems.

In this paper, we look into how we can adapt notions of radicalisation narratives from social sciences into NLP systems. The main challenges we tackle are: (i) the ill-defined concept of narrative; (ii) the mismatch between human and textual analysis capabilities; (iii) the lack of formal, machine-actionable representations for narrative analysis; (iv) the lack of human-annotated content to train automated detection systems; and (v) the unbalanced appearance of narratives in real-world texts.

Our main contributions are: (i) the identification of three types of narratives from social science (Sect. 2); (ii) a set of taxonomies aligned to social science models (Sect. 3); and (iii) an implementation of one of these taxonomies as a text annotation system, together with an analysis of various magazines published by Islamic radical groups (Sect. 4).

2 Social Science Perspectives on Narratives

Social scientists and domain experts tend to use the term radical narrative loosely to describe the main messages that are disseminated by a radical organisation. In order to implement automated systems, we need a better understanding of what a narrative is. An extensive literature review on the conceptualisation of this term is not in the scope of this paper; however, we review a few representative works that provide a wider understanding of the term, which helps us to narrow the scope and hence to generate useful semantic resources for radicalisation narratives.

From a generic, human sciences perspective, Riessman [Rie05] states that any text can be narrative if in it "events are selected, organised, connected, and evaluated as meaningful for a particular audience". Riessman introduces four main types of narrative analysis: (i) thematic analysis focuses on what is said in the text; (ii) structural analysis focuses on how the message is structured; (iii) interactional analysis looks at how the teller interacts with the listener; and (iv) performative analysis looks at other factors besides the spoken or written word, such as actions performed by the teller alongside the messages. In the context of this paper, we will focus mainly on supporting thematic narrative analysis with some basic structural analysis. Technological approaches for aiding in interactional and performative analyses are not in the scope of this paper.

Lewandowsky et al. [LSF+13] provide a psychological point of view on narratives in the context of conflicts and disinformation [Com18]. They consider narratives as mental frames, i.e. "necessary cognitive tools, designed to pare down information in order to manage complexity". Hence they are not the same as propaganda or spin; rather, they facilitate communication. However, due to their inherent (over)simplification of reality, narratives can be misused to spread misinformation more effectively by making the misinformation coherent with the prevalent narrative. Thus, the prevalence of a narrative, i.e. whether alternative narratives are ignored by the media or the recipients, is important when considering its potential misuse.

Focusing more on radicalisation, and more specifically Islamic extremism, Halverson et al. [HGC11] define a narrative not as a single story, but as a system of stories, i.e. a collection that is coherent and reinforces the same themes. When this definition of narrative is further linked to a (cultural) group identity, it becomes a master narrative. For example, for Muslims, the sacred texts provide a collection of stories (i.e. a narrative) that tells them who they are and how they should behave. The book [HGC11] further describes 11 "master narratives" employed by Islamist extremists to "connect or resonate within a set of cultural and historical circumstances". It is worth noting that the individual narratives on their own are not extremist; they simply collect various versions of stories about specific significant events, periods and/or myths in the history of Islam. However, Islamist extremists refer to these to increase their appeal to Muslims.

Betz [Bet08] focuses on narratives in the context of conflicts, also called strategic narratives, and uses a definition by Sir Lawrence Freedman: "compelling storylines which can explain events convincingly and from which inferences can be drawn". These narratives are deliberately constructed out of current ideas, express a "sense of identity and belonging" and provide a "sense of purpose". Importantly, these narratives often "appeal to emotion" and make use of "suspect metaphors and dubious historical analogies". [Bet08] also introduces the concept of vertical narrative coherence, which states that there are multiple levels of narratives and that successful strategic narratives are those that manage to coherently link to both master and individual narratives. Based on these perspectives, we distinguish three types of narratives, which are the most useful as a basis for developing automated systems:

Cultural narratives provide a cultural identity and are grounded in tradition. The master narratives described in [HGC11] are an example of this type of narrative. This type also maps to the macro-level radicalisation indicators [Du 17, FAA18].

Strategic narratives provide a group identity and are grounded in ideology; they map to the meso-level radicalisation indicators.

Local and individual narratives are stories of specific places and individuals; these map to micro-level radicalisation indicators.
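The three narrative types and the indicator level each one maps to can be summarised as a small lookup table. The following is a minimal sketch only: the names `NarrativeType`, `identity`, `grounded_in` and `indicator_level` are assumptions made here for illustration, and the three levels are taken to be macro, meso and micro. The text does not say what local and individual narratives are grounded in, so those fields are left empty.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NarrativeType:
    """One of the three narrative types; all field names are illustrative."""
    name: str
    identity: Optional[str]      # kind of identity the narrative provides
    grounded_in: Optional[str]   # what the narrative is grounded in
    indicator_level: str         # radicalisation indicator level it maps to


NARRATIVE_TYPES = (
    NarrativeType("cultural", "cultural identity", "tradition", "macro"),
    NarrativeType("strategic", "group identity", "ideology", "meso"),
    # The text gives no identity or grounding for this type, hence None.
    NarrativeType("local and individual", None, None, "micro"),
)


def narrative_type_for_level(level: str) -> NarrativeType:
    """Return the narrative type mapped to a given indicator level."""
    for narrative_type in NARRATIVE_TYPES:
        if narrative_type.indicator_level == level:
            return narrative_type
    raise KeyError(f"unknown indicator level: {level}")
```

Such a table is mainly useful as a shared vocabulary between the taxonomies and an annotator: an annotation tagged with a narrative type can be related to an indicator level without hard-coding that link in several places.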

3 Radicalisation Narrative Taxonomies

Based on the three types of narratives described in the previous section, we set out to define taxonomies that can be used as the basis of text annotation tasks, in particular to aid in the detection of radical texts. One challenge in this regard is that, in general, it is not possible to define a single taxonomy that is applicable to all types of radicalisation (e.g. radical Islam, left- and right-wing extremism, white nationalism). Generalisability in this sense can apply to the taxonomy itself but also to the text analyser built to annotate texts according to the taxonomy. Of course, if the taxonomy itself is not generalisable, then the annotator targeting it is also not generalisable. The main reason impeding generalisability is that, as we saw in the previous section, the various narratives depend on specific knowledge about the cultural and radical group as well as on the individual.

The cultural narratives are the least generalisable: it is necessary to derive custom taxonomies for each cultural group; e.g. cultural narratives about the Battle of Badr² or the 72 virgins³ are specific to a Muslim cultural identity, but not to e.g. a Christian or Jewish cultural identity (although both narratives can have parallels in those cultures). Fortunately, the number of cultural identities is fairly limited and lists of narratives are relatively easy to acquire.

Strategic narratives are generalisable up to a point: the main strategies used by radical groups are generalisable, but the specifics can only be captured by taking into account the characteristics of the radical group. In other words, the main categories in the taxonomy are generally applicable, but categories deep in the taxonomy may be necessary to model strategic narratives used by, and applicable only to, specific radical groups. This lack of generalisability also applies to text annotators targeting even intermediate levels in the taxonomy. For example, a general strategy is to discredit other groups, but if you want to capture this in more detail, to analyse whether the group being discredited is perceived as competition or as an enemy, then a single article published by the radical group is often insufficient on its own (it may be possible to infer such relations from a larger corpus, but this is out of the scope of this paper). In such cases you need to add domain knowledge to the annotator, e.g. ISIS considers Al-Qaeda and the Muslim Brotherhood to be competitors, while it considers e.g. Christians and the West to be enemies. Similarly, radical groups, especially online, often develop their own jargon to refer to aspects of their group identity. For example, radical Islamists use the term kufr as a derogatory term to refer to non-believers. Such non-generalisable knowledge is even more prevalent when trying to capture aspects of a group's identity. For example, Incel group identity relies on a large number of terms⁴ to refer to subgroups; hence knowing the correct term to refer to the most radical members in that group again requires custom knowledge tailored to the Incel group identity. Such domain knowledge can be injected into annotation systems using different approaches, such as rule-based ones (explored in this paper) or machine-learning-based ones (not in the scope of this paper, and relying on the annotation of large corpora).
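To make the role of group-specific domain knowledge concrete, the sketch below refines the generic "discredit other groups" category into a competitor or enemy variant, using the ISIS example given above. It is a hypothetical illustration, not the annotator described in this paper: the dictionary layout, the function names and the refined label strings are all assumptions.

```python
from typing import Dict, Optional, Set

# Hypothetical per-group knowledge base. The relations and their members come
# from the ISIS example in the text; the dictionary layout is an assumption.
GROUP_KNOWLEDGE: Dict[str, Dict[str, Set[str]]] = {
    "ISIS": {
        "competitor": {"al-qaeda", "muslim brotherhood"},
        "enemy": {"christians", "the west"},
    },
}


def relation_to(author_group: str, target: str) -> Optional[str]:
    """Return how the author group perceives the target, or None if unknown."""
    relations = GROUP_KNOWLEDGE.get(author_group, {})
    for relation, members in relations.items():
        if target.lower() in members:
            return relation
    return None


def refine_discredit_label(author_group: str, target: str) -> str:
    """Refine the generic category when group-specific knowledge allows it."""
    relation = relation_to(author_group, target)
    if relation is None:
        # Without domain knowledge only the generic category can be assigned.
        return "discredit other groups"
    return f"discredit other groups / {relation}"
```

For instance, `refine_discredit_label("ISIS", "Al-Qaeda")` yields the competitor variant, whereas the same target mentioned by a group absent from the knowledge base falls back to the generic category. This mirrors the point that the top of the taxonomy generalises while deeper levels do not.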

Individual narratives are the most generalisable, as individuals tend to follow similar radicalisation processes, and descriptions of these can be captured by looking for mentions of radicalisation indicators [Du 17, FAA18] which are not specific to a radicalisation type: grievances, meeting radicalised people, personal circumstances (death of a relative, unemployment, history of crime). These narratives can be found in stories and interviews about how someone was radicalised. Such articles typically use standard language (not jargon specific to the radical group); therefore it should be possible to build annotation systems which are capable of finding such mentions in a fairly generic way. However, one problem with building this type of annotation system is that there are not many available documents containing this type of narrative; also, detecting such stories is not directly useful for detecting radical texts or preventing radicalisation.

2 http://trivalent.expertsystemlab.com/thes/conceptschemes/ISLAMIC_MASTER_NARRATIVE/c/3
3 http://trivalent.expertsystemlab.com/thes/conceptschemes/ISLAMIC_MASTER_NARRATIVE/c/12
4 https://incels.wiki/w/Incelese

In this paper we aim to improve the automatic support provided to human analysts by defining taxonomies that can be implemented as automated text annotation systems that facilitate radicalisation narrative analysis. To this end, we present a taxonomy of Islamist extremist narratives which takes into account the three types of narratives discussed above. The full taxonomy has 89 nodes organized in 3 main sub-taxonomies⁵:

Islamic Master Narrative: contains 37 subcategories, divided into 3 sublevels. The 11 subcategories in the first sublevel are those proposed in [HGC11]; further sublevels provide more specific subarguments within the narratives.

Strategic Narrative: contains 20 subcategories, divided into 3 sublevels. The first 2 sublevels are generic narratives that can be exploited to promote radical ideologies by any radical group, and the 3rd sublevel is specific to strategic narratives by Muslim extremist groups, in particular ISIS. The taxonomy has been derived from two lists of ISIS narratives, one presented in [GRBM16] and another provided by domain experts at the International Institute for Counter-Terrorism⁶. The first (generic) sublevel consists of the following categories:

- (Promote) group identity: any narrative that promotes the radical group spreading this message, e.g. by promoting a winner's image for the group, promoting the social aspects of the group, associating the group with a sense of adventure and in general promoting the ideology of the group (or encompassing groups).
- Discredit other groups: any narrative that discredits groups other than the group spreading the message. Typically the enemy is attacked, although similar but competing groups are often also discredited.
- Sow discord between groups: narratives whereby divisions between groups are exploited and reinforced.
- Moral obligation: narratives that appeal to social and cultural norms to justify a certain type of action.

Radicalisation indicator: contains 28 subcategories, divided into 2 sublevels, referring to structural causes and trigger events at all 3 (macro, meso and micro) levels of the radicalisation process. All of these subcategories are generic and not specific to any one type of radicalisation. The taxonomy itself does not define subcategories for structural cause, trigger event, macro-, meso- or micro-level; however, all of the subcategories are linked to a radicalisation ontology, which can be used to automatically classify a subcategory along one of these axes. This taxonomy was derived from the narrative conceptualisations described in [Du 17, KT11].
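The taxonomy described above is a tree, and a simple in-memory representation is enough to navigate it. The sketch below is illustrative and does not reproduce the published resource: the class `TaxonomyNode`, its methods and the root label are assumptions, and only the nodes explicitly named in the text are built (the three sub-taxonomies and the four first-level strategic categories); the remaining nodes are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaxonomyNode:
    """A node in a narrative taxonomy (illustrative, not the published format)."""
    label: str
    children: List["TaxonomyNode"] = field(default_factory=list)

    def add(self, label: str) -> "TaxonomyNode":
        """Create a child node with the given label and return it."""
        child = TaxonomyNode(label)
        self.children.append(child)
        return child

    def size(self) -> int:
        """Count the nodes in this subtree, including this node."""
        return 1 + sum(child.size() for child in self.children)

    def path_to(self, label: str) -> Optional[List[str]]:
        """Return the labels from this node down to `label`, or None if absent."""
        if self.label == label:
            return [self.label]
        for child in self.children:
            sub_path = child.path_to(label)
            if sub_path is not None:
                return [self.label] + sub_path
        return None


def build_partial_taxonomy() -> TaxonomyNode:
    """Build only the nodes named in the text; all deeper nodes are omitted."""
    root = TaxonomyNode("Islamist extremist narratives")  # root label is assumed
    root.add("Islamic Master Narrative")
    strategic = root.add("Strategic Narrative")
    root.add("Radicalisation indicator")
    for label in ("Group identity", "Discredit other groups",
                  "Sow discord between groups", "Moral obligation"):
        strategic.add(label)
    return root
```

A path such as the one returned by `build_partial_taxonomy().path_to("Moral obligation")` makes it possible to roll an annotation on a deep, group-specific node up to its generic ancestor, which is the part of the taxonomy that transfers between radical groups.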

4 Implementation and Detection Results

We have implemented a rule-based system that annotates Strategic Narratives in texts. These rules are written in a custom rule language on top of Expert System's NLP pipeline. The core of the pipeline consists of standard tokenization and lemmatization steps and a unique disambiguation step, which links n-grams in the text to concepts in a lexico-semantic knowledge graph (called Sensigrafo, which is similar to WordNet). This pre-analysed content is then fed into our rule engine, which matches parts of the text to rules hand-crafted by linguists to identify segments of text that should be annotated with one of the nodes in the taxonomy. Rules may refer to lemmas, keywords, concepts and related concepts (by following specific links in the knowledge graph) and can be combined using a variety of positional, syntactic and semantic operators. The resulting categorizer uses 80 rules, some of which rely on recognising words and concepts that are specific to Muslim extremist slang.
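The general idea of rule-based annotation can be sketched with plain regular expressions. This is a deliberately simplified stand-in and not the rule language or pipeline described above: it works on raw text rather than on lemmas and disambiguated concepts, and the two rules, their identifiers and the category labels are hypothetical examples loosely based on phrases discussed in the evaluation.

```python
import re
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Rule:
    rule_id: str          # hypothetical identifier, used for explanations
    category: str         # taxonomy node that the rule annotates
    pattern: re.Pattern   # regular expression standing in for a real rule


@dataclass(frozen=True)
class Annotation:
    category: str
    start: int
    end: int
    text: str
    rule_id: str          # kept so that every annotation can be explained


# Two illustrative rules; the real system uses 80 hand-crafted rules.
RULES = (
    Rule("homophily-1", "group identity / homophily",
         re.compile(r"\bbrothers and sisters\b", re.IGNORECASE)),
    Rule("legitimacy-1", "group identity / legitimacy of ideology",
         re.compile(r"\b\w+ says in the Qur'an\b", re.IGNORECASE)),
)


def annotate(text: str, rules: Sequence[Rule] = RULES) -> List[Annotation]:
    """Apply every rule to the text and return annotations in reading order."""
    annotations = []
    for rule in rules:
        for match in rule.pattern.finditer(text):
            annotations.append(Annotation(rule.category, match.start(),
                                          match.end(), match.group(0),
                                          rule.rule_id))
    return sorted(annotations, key=lambda a: a.start)


def explain(annotation: Annotation) -> str:
    """Describe which rule produced an annotation and on which text span."""
    return (f"'{annotation.text}' [{annotation.start}:{annotation.end}] -> "
            f"{annotation.category} (rule {annotation.rule_id})")
```

Because each annotation records the rule that fired and the matched span, an explanation can be produced directly from the annotation itself, which is the property that distinguishes this style of annotator from a black-box classifier.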

In order to evaluate the rule-based narrative detector, we analysed a few collections of magazines:

- as a baseline, we selected 9 articles from "Young Muslim Digest"⁷ (YMD), a long-running magazine from India which is non-radical. These articles were selected from their homepage as well as from the "popular" article recommendations on their about page. Topics are centred around lifestyle choices influenced by Islam (e.g. husband-wife relationships, youth, conversion to Islam), but also geo-politics (e.g. mining opportunities in Afghanistan, war).
- three issues of Al-Risalah, published by the Nusra Front [Kot18]
- 14 English issues of Dabiq, published by ISIS between 2014 and 2016
- 9 issues of Rumiyah, a magazine published by ISIS since mid 2016

5 Available from http://trivalent.expertsystemlab.com/thes/
6 http://www.ict.org.il/

Table 1 provides the results which, overall, show (i) a stark contrast between the radical publications and the non-radical baseline (YMD) and (ii) diverging prevalence of narratives.

The Group Identity narrative is the most complex branch in the taxonomy, but shows clear differences between the publications. The only topic with a relatively high coverage percentage in the non-radical texts is legitimacy of ideology. Manual inspection shows that this is caused by matches on sentences of the form "X says in the Qur'an: ...", where X is an imam, important person or deity. Indeed, YMD frequently uses this type of phrase to legitimize lifestyle choices they recommend to their readers.

Homophily also occurs in YMD (in about 10% of the pages), but is much more prevalent in the radical publications. Furthermore, while YMD uses only generic homophily phrases (e.g. "brothers and sisters"), the radical publications use terminology to glorify and idealize the more radical members (e.g. "martyrs"). Radicalising publications also push a winner narrative as part of their group identity, especially by highlighting perceived achievements.
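Figures such as "about 10% of the pages" can be read as a page-level coverage measure. The sketch below assumes that coverage means the share of pages in a publication on which a category is annotated at least once; this definition and the function name `page_coverage` are assumptions, and the input used in the usage note is made-up toy data, not the results from Table 1.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def page_coverage(pages: Iterable[Set[str]]) -> Dict[str, float]:
    """Percentage of pages on which each category is annotated at least once.

    Each element of `pages` is the set of categories annotated on one page.
    """
    page_list = list(pages)
    if not page_list:
        return {}
    counts: Dict[str, int] = defaultdict(int)
    for categories in page_list:
        for category in categories:   # a set, so each page counts at most once
            counts[category] += 1
    return {category: 100.0 * n / len(page_list)
            for category, n in counts.items()}
```

As a toy example, a ten-page publication in which only one page carries a homophily annotation gives `page_coverage([{"homophily"}] + [set()] * 9)`, i.e. a homophily coverage of 10.0. Counting pages rather than individual matches keeps a single page with many repeated phrases from dominating the comparison between publications.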

7 http://www.youngmuslimdigest.com

By far the biggest difference between radical and non-radical texts is that radical texts actively push narratives to discredit other groups, in particular groups seen as the enemy or as ineffective alternatives (e.g. political Islamists). This is closely related to the narrative of sowing discord between groups, where we see a similar pattern.

Another interesting result is that while YMD also includes narratives of moral obligation, these are framed in a neutral manner. By contrast, radical publications frame them in terms relative to the group, i.e. as an imperative to protect, avenge or pre-emptively attack. Note that, because the annotator is rule-based, it is straightforward to provide explanations for the annotations, as shown in Figure 1.

5 Conclusion and Future Work

In this paper, we presented an approach for analysing radicalisation narratives in a way that is aligned with existing social science approaches. This resulted in a set of three taxonomies that can be used to annotate radicalisation narratives at the cultural, strategic and personal level. We also presented a rule-based prototype to automatically detect strategic narratives in Islamic publications and showed that the results align with our expectations, can easily be explained, and can be used to distinguish between radical and non-radical texts and, in particular, to identify prevalent narratives [LSF+13].

A key challenge in automating this type of narrative analysis is that it is crucial to encode knowledge about the originator of the text (the radical group), their slang and who the (intended) recipient is, all of which impact generalisability, as discussed in Section 3. Our results form a first step toward constructing a line of argumentation based on how cultural, strategic and personal narratives connect [CI15].

One of the main advantages of the proposed approach is that it aligns with existing social science models, making it easy for domain experts to understand and validate the annotations. This is in contrast to machine-learning-based approaches, which function as a black box and may produce classifications that are hard to explain. This is especially relevant for radical text detection, where machine learning approaches may be subject to discriminatory bias against minorities.

Our immediate next steps include refining our rule-based annotator (e.g. adding more rules to find occurrences of narratives we are currently missing) and adapting it to other types of radicalising texts, such as online forums for right- and left-wing extremism and incel culture.

Acknowledgements

Work supported by the European Commission under grant 740934 (TRIVALENT) and grant 770302 (Co-Inform) as part of the Horizon 2020 research and innovation programme.

References

[Bet08] David Betz. The virtual dimension of contemporary insurgency and counterinsurgency. Small Wars & Insurgencies, 19(4):510-540, dec 2008.

[CI15] Alvaro Carrera and Carlos A. Iglesias. A systematic review of argumentation techniques for multiagent systems research. Artificial Intelligence Review, 44(4):509-535, dec 2015.

[Com18] European Commission. A multi-dimensional approach to disinformation. Technical report, European Commission, 2018.

[Du 17] Cind Du Bois. Literature Review on Radicalisation. Technical Report October, TRIVALENT, 2017.

[FAA18] Miriam Fernandez, Moizzah Asif, and Harith Alani. Understanding the Roots of Radicalisation on Twitter. In Proceedings of the 10th ACM Conference on Web Science - WebSci '18, pages 1-10. ACM, 2018.

[HGC11] Jeffry R. Halverson, H. Lloyd Goodall Jr., and Steven R. Corman. Master narratives of Islamist extremism. Palgrave Macmillan, 2011.

[Kot18] Ioannis E. Kotoulas. Ideological Principles of Jabhat al-Nusra in Al-Risalah Magazine. (December), 2018.

[KT11] Michael King and Donald M. Taylor. The Radicalization of Homegrown Jihadists: A Review of Theoretical Models and Social Psychological Evidence. Terrorism and Political Violence, 23(4):602-622, sep 2011.

[LCGPC17] Raul Lara-Cabrera, Antonio Gonzalez-Pardo, and David Camacho. Statistical analysis of risk assessment factors and metrics to evaluate radicalisation in Twitter. Future Generation Computer Systems, 2017.

[LSF+13] Stephan Lewandowsky, Werner G K Stritzke, Alexandra M Freund, Klaus Oberauer, and Joachim Krueger. Misinformation, Disinformation, and Violent Conflict. American Psychologist, 68(7):487-501, 2013.

[Rie05] Catherine K Riessman. Narrative Analysis. In Narrative, Memory & Everyday Life, pages 1-7.

[SDK+17] Hassan Saif, Thomas Dickinson, Leon Kastler, Miriam Fernandez, and Harith Alani. A Semantic Graph-Based Approach for Radicalisation Detection on Social Media. In ESWC, pages 571-587. Springer, 2017.