<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Does Anyone see the Irony here? Analysis of Perspective-aware Model Predictions in Irony Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Simona Frenda</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Soda Marem Lo</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Silvia Casola</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bianca Scarlini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cristina Marco</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Valerio Basile</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Davide Bernardi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Alexa AI, Amazon Development Centre Italy</institution>
          ,
          <addr-line>Turin</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Computer Science Department, University of Turin</institution>
          ,
          <addr-line>Turin</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>aequa-tech srl</institution>
          ,
          <addr-line>Turin</addr-line>
        </aff>
      </contrib-group>
      <abstract>
<p>In the framework of perspectivism, analyzing how people perceive pragmatic phenomena, like irony, is relevant for deeply understanding the different points of view, and for creating more robust perspective-aware models. This paper presents a linguistic analysis of irony perception in 11 perspectivist models. Each model is trained on annotations by crowd-sourcing workers different in gender, age, and nationality. Due to the sparsity of the dataset, we examine the texts classified as ironic and not-ironic by these perspectivist models, and identify linguistic patterns that all perspectives associate with irony. To our knowledge, we are the first to also provide evidence for the different linguistic patterns perceived as ironic by a specific perspective. For example, models trained on data annotated by American and Australian annotators are more inclined to classify a text as ironic when it includes a negative sentiment, while models trained on data annotated by the youngest annotators are particularly influenced by words related to immoral behaviors. Warning: This paper could contain content that is offensive or upsetting for the reader.</p>
      </abstract>
      <kwd-group>
        <kwd>Irony Detection</kwd>
        <kwd>Irony Interpretation</kwd>
        <kwd>Perspectivism</kwd>
        <kwd>Linguistic Analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The use of supervised learning is at the core of several areas of Artificial Intelligence, including Natural Language Processing (NLP). Models that leverage this learning paradigm are strictly dependent on either automatically produced datasets, i.e., silver data, or manually curated ones, i.e., gold standards. In the context of human-made annotations, the standard approach determines the final annotation by resolving the disagreement of multiple annotators, e.g., through majority voting. Recent research trends offer an alternative take and show that flattening the disagreement of several annotators can discard valuable information [1, 2].</p>
      <p>Some of these trends go by the name of perspectivist approaches. According to these lines of research, the discrepancies of different annotators can be exploited to model different points of view (perspectives) on a specific task [3]. This is especially important when the task is highly subjective, such as that of identifying irony [4]. While some linguistic patterns are linked to this phenomenon by a majority of people [5], irony tends to be closely related to the cultural and personal background of those who interpret it [6, 7].</p>
      <p>In this paper, we investigate the perception of irony in different segments of the English-speaking population. We focus, in particular, on two research questions (RQ):</p>
      <list list-type="bullet">
        <list-item><p>RQ1: what are the common linguistic triggers for irony interpretation, regardless of perspectives?</p></list-item>
        <list-item><p>RQ2: what are the linguistic patterns typical of each perspective?</p></list-item>
      </list>
    </sec>
    <sec id="sec-2">
      <p>To answer these questions, we exploited EPIC (English Perspectivist Irony Corpus) [8], a disaggregated English corpus for irony detection, containing 3,000 Post-Reply pairs from Twitter and Reddit, along with the demographic information of each annotator.</p>
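      <p>Since EPIC's annotations are disaggregated, a gold label for one perspective can only be derived by aggregating the votes of the annotators belonging to that group. A minimal sketch of such majority voting (the annotations and group names below are illustrative; as described later in Section 3, entries for which no majority can be computed are discarded):</p>

```python
from collections import Counter

# Hypothetical disaggregated annotations for one Post-Reply pair:
# (annotator_id, group, label) with 1 = ironic, 0 = not ironic
annotations = [("a1", "GenY", 1), ("a2", "GenY", 1), ("a3", "GenZ", 0),
               ("a4", "Boomer", 0), ("a5", "GenY", 0)]

def majority_label(annotations, group):
    """Majority vote over the annotators belonging to one perspective."""
    votes = [lab for _, gen, lab in annotations if gen == group]
    if not votes:
        return None          # no annotator of this group saw the pair
    top = Counter(votes).most_common()
    if len(top) > 1 and top[0][1] == top[1][1]:
        return None          # tie: no majority can be computed, entry discarded
    return top[0][0]

print(majority_label(annotations, "GenY"), majority_label(annotations, "GenX"))
```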
      <p>Inspired by [9], and in continuity with [8], we grouped annotators in 11 different perspectives: self-identified female and male, age-based groups (boomers, generation X, generation Y and generation Z), and country-based groups. Then, reproducing the experiments of [8], we created 11 perspective-aware models and obtained their predictions on the same set of instances.</p>
      <p>We do so to perform a quantitative and qualitative analysis of the common and specific linguistic patterns (affective, offensive, syntactic, and lexical) that activate the ironic interpretation of a text for each population segment. We leveraged the models' knowledge to predict the labels on the test set, and performed a linguistic analysis on this portion of the corpus to compare the predicted perception of each social group on the same content. In fact, since instances are annotated on average by 5 annotators, they do not necessarily contain labels for all demographic traits and perspectives. For example, an instance can be annotated by workers from GenY and GenZ only, and lack labels from annotators of the older generations.</p>
      <p>By comparing the relevance of different linguistic features for the perspectivist models, we are able, firstly, to confirm the importance – for all perspectives – of some specific features known to be of high impact in previous works [5]; secondly, we show that some patterns are perspective-specific.</p>
      <p>For instance, we found that the models trained on the female, generation Y, Australian, and American perspectives tend to recognize irony especially when the texts express negative sentiment. The Irish perspective seems to be amused by the emotional contrast in the texts. The male perspectivist model, instead, seems to be more sensitive to the recognition of irony when texts contain insults explicitly related to crimes or immoral behaviors, professions, and animals. A similar difference is also visible in the dimension of age, where words related to female genitalia appear relevant in the decision for Generation X; in contrast, the youngest generations (i.e., Y and Z) are more influenced by words related to crimes and immoral behaviors. Models trained on the perspectives of boomers and Indians are sensitive to specific syntactic patterns.</p>
      <p>These analyses shed light on the different perceptions of irony by different population segments. While we found common patterns that are independent of languages and perspectives, attention to different points of view is needed especially for creating user-centered applications and for making them explainable.</p>
      <p>This paper is organized as follows. In Section 2, we present an overview of previous works related to the analysis of linguistic features and strategies for expressing irony, focusing on a multilingual and multiperspective approach to the phenomenon. In Section 3 we describe the EPIC corpus, used to perform the source-independent (Section 4.1) and source-dependent (Section 4.2) analyses on the patterns that drive the interpretation of our perspective-aware models. Finally, Section 5 is dedicated to the discussion and conclusive observations on our results.</p>
      <p>2nd Workshop on Perspectivist Approaches to NLP. * Corresponding author: simona.frenda@unito.it (S. Frenda); sodamarem.lo@unito.it (S. M. Lo); silvia.casola@unito.it (S. Casola); scarlini@amazon.it (B. Scarlini); valerio.basile@unito.it (V. Basile). © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org, ISSN 1613-0073).</p>
      <sec id="sec-rw">
        <title>2. Related Work</title>
        <p>Literature about irony detection has explored the contribution of several linguistic features within classical and neural architectures (using gold standard datasets): syntactic [10], stylistic [11], pragmatic [12], semantic [13], and affective [14, 15, 16] ones. Despite the clear impact of some of these features on irony detection, the general cognitive mechanisms that activate irony regardless of language and domain are still being studied [17, 5, 18, 19].</p>
        <p>The authors of [5] conducted an exhaustive linguistic analysis on three Twitter datasets annotated for the irony detection task in French, Italian, and English. They looked for specific linguistic strategies used for expressing irony: analogy, metaphor, hyperbole/exaggeration, euphemism, rhetorical question, oxymoron, paradox, and other elements such as false assertion, context shift, situational irony, or specific markers (emoticons, negations, patterns of discourse, hashtags labelling the presence of humour, intensifiers, punctuation, false propositions, elements of surprise, modality, quotations, opposition, capital letters, personal pronouns, interjections, comparison, named entities, report verbs, expression of opinion, urls). Oxymorons, false assertions, and situational irony have been confirmed as triggers for irony in Italian tweets also by the authors of [20], who analysed the predictions obtained in the context of the IronITA shared task [21]. Unlike in other languages, ellipsis and apostrophes stand out for Spanish [22].</p>
        <p>Another common trait of irony detection from the multilingual perspective is the role played by affective information. For example, the authors of [14] showed how pleasantness, imagery, activation, and negative sentiment have a discriminative power in classifying ironic and non-ironic English tweets. Negative emotions, in particular, were identified primarily in English #ironic self-labelled tweets [23], in different ironic texts in Spanish [22], and in Italian ironic tweets [20]. These works show that, among the linguistic strategies that can be used for the activation of irony, some are language-independent, while others seem related to specific languages and cultures. Irony, as a subjective phenomenon, is strongly influenced by individual perception.</p>
        <p>The perspectivist framework [3] aims at modelling these aspects by incorporating the different points of view represented in the annotations. The new multi-faceted annotation process is then exploited for model training, interpretation, and analysis of the predictions [4]. Perspectivist works on irony are very few. To our knowledge, only two disaggregated datasets for English exist, on humour [24] and irony [8]. The first was used as a benchmark in the first edition of the LeWiDi (Learning with disagreement) shared task at SemEval 2021, whereas the second was used to build, with a strongly perspectivist approach, demographic-based models to encode annotators' perspectives. Results demonstrated both a variation in the perception of irony based on annotators' social group, and an increase in confidence for perspective-aware models compared to the non-perspectivist ones.</p>
      </sec>
      <p>Inspired by their work, and focusing especially on the perception of irony, we propose a linguistic analysis of the predictions of different perspectivist models, which contributes to this emerging framework by examining the most impactful linguistic features for interpreting irony.</p>
      <sec id="sec-dataset">
        <title>3. Dataset and Perspectivist Models</title>
        <p>To answer the research questions RQ1 and RQ2, we exploit EPIC, the English Perspectivist Irony Corpus released by [8]. This corpus comprises 3,000 Post-Reply pairs extracted from social media, evenly retrieved from Twitter and Reddit, and was annotated for the irony detection task by crowdsourcing workers with different demographic traits. EPIC was qualitatively examined by [8], who inspected the different demographic-based perspectives encoded in the dataset. They exploited this information to create perspectivist models trained on subsets of data annotated by workers with the same demographic trait. With the aim of examining the perception of irony, we reproduced their perspectivist models and used their predictions for the linguistic analysis.</p>
        <p>In more detail, following [8] we trained 11 perspective-aware classifiers. Each of these models was trained on data labeled by a specific subset of annotators, who were separated according to their demographic traits as shown in Table 1: gender (female, male), age (boomers, Generation X, Generation Y, Generation Z), and nationality (British, Indian, Irish, American, and Australian). As in [8], we created: i) a unique test set featuring 20% of the instances of EPIC's corpus (246 from Reddit and 307 from Twitter), used for the analyses described in Section 4; and ii) the perspective-specific datasets (see Table 1), obtained by grouping the remaining instance-annotation pairs according to the age, gender, and nationality of their annotators and used, in an 80/20 split, to train and test the perspectivist models<sup>1</sup>.</p>
        <p>Each perspective-specific training set was used to fine-tune a pre-trained BERT model [25]. In particular, similar to [8], we fine-tuned the uncased version of BERT<sup>2</sup> for Sequence Classification, with a binary (ironic and not-ironic) label. Each BERT model was trained by taking as input the representation of the Post-Reply pair. The learning rate was set in a range of 6e-5 and 5e-5, the batch size to 16, and the maximum number of epochs to 10, with an early-stopping strategy.</p>
        <p>These models have been tested on perspective-specific test sets, computing the binary label and the confidence score of each model by following [26]'s formula, based on the normalized difference between the logits of the two classes, i.e., ironic and not-ironic. The average of the confidence scores over instances and the f1-score of each model are reported in Table 1. As we can notice, the f1-scores are fair enough considering the notable imbalance between the positive (iro) and negative (non-iro) classes in each dataset.</p>
        <p>Once we validated these models, we applied them to the test set (iro: 110, non-iro: 443), obtaining the predictions (and the confidence scores of the predictions) of the perspectivist models for each instance, as in Table 2<sup>3</sup>.</p>
        <p>1. We note that, to label each instance in our perspective-specific datasets, we applied the majority voting strategy to each Post-Reply pair, given the annotations of the selected subsets of annotators. We then discarded all the entries for which we could not compute a majority vote with the available annotations. 2. https://huggingface.co/bert-base-uncased 3. For the sake of clarity, we report only the maximum and minimum confidence scores for each instance.</p>
      </sec>
      <sec id="sec-analysis">
        <title>4. Analysis on Perspectives</title>
        <p>In this Section, we focus on the analysis of the common and specific patterns that trigger the interpretation of irony in the 11 perspective-aware models across the 553 instances of the test set. As commented above, EPIC contains Post-Reply pairs extracted from two sources: Twitter and Reddit.</p>
      </sec>
      <p>Table 2 reports example Post-Reply pairs from Reddit and Twitter, together with the binary predictions and the (maximum and minimum) confidence scores of the perspectivist models.</p>
      <p>Therefore, we describe two types of analysis: firstly, a source-independent analysis (Section 4.1) and, secondly, a source-based analysis (Section 4.2). The former focuses on capturing the linguistic features that trigger the ironic interpretation of a text regardless of its source, exploring the common and diverse features among the predictions of the different perspective-based models. The latter aims at identifying in which source these models tend to predict irony, exploring the possible causes and whether there are linguistic patterns specific to a source, looking especially at the use of the strategies and markers identified by [5] in multilingual datasets.</p>
      <p>For both analyses, we took into account the predictions of the perspectivist models on the test set (Table 2). For each instance, therefore, we have the labels of all the 11 perspectives and the confidence score of each model, computed as described in Section 3. We leveraged the models' knowledge to predict the labels on the test set since – by design – not all instances of our corpus feature manual annotations covering all demographic traits/perspectives.</p>
      <p>To examine the features that are actually discriminative for the detection of irony, we selected for each model only the texts from the test set predicted with a very high confidence score. The threshold used for this selection is unique for each perspectivist model (Table 3), and it was obtained by computing the median of the list of confidence scores resulting from the predictions of the positive class (ironic texts) on the specific perspective-based test sets of EPIC (Table 1). This choice is motivated by one of the findings of [8], who proved that perspectivist models are more confident and precise when predicting labels in test sets that encode their perspectives; it also depends on our purpose of examining the perception of irony: we want to be sure that the analysed texts, especially the ones recognized as ironic, have been predicted with a very high confidence by the models.</p>
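      <p>This selection step can be sketched as follows. The exact formula of [26] is not reproduced in the text; one plausible reading of "normalized difference between the logits" is the absolute gap between the two softmax probabilities, and all logit values below are illustrative:</p>

```python
import math
import statistics

def confidence(logit_ironic, logit_not_ironic):
    # Probability of the ironic class via a two-class softmax (a sigmoid)
    p_ironic = 1.0 / (1.0 + math.exp(logit_not_ironic - logit_ironic))
    # Normalized difference between the two classes: probability gap in [0, 1]
    return abs(2.0 * p_ironic - 1.0)

# Hypothetical (ironic, not-ironic) logits for texts predicted as ironic
logits = [(2.0, -1.0), (0.3, 0.1), (1.5, -0.5), (3.0, -2.0), (0.6, 0.2)]
scores = [confidence(a, b) for a, b in logits]

# Per-model threshold: the median confidence of the positive predictions
threshold = statistics.median(scores)
high_confidence = [s for s in scores if s >= threshold]
print(round(threshold, 3), len(high_confidence))
```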
      <sec id="sec-2-1">
        <title>4.1. Source-independent Analysis</title>
        <p>To observe the commonalities and differences among the interpretations of irony by the various perspectivist models, we extracted a set of linguistic features from the texts of the test set, computed their χ² value for each model, and plotted these values in heatmaps. Since we observed that the distribution of the χ² values of the features is non-linear, we employed the logarithmic function of PowerTransformer to normalize the data.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <p>The selection of the set of features was inspired by the existing literature about multilingual and multigenre ironic texts (Section 2), and includes: 1) affective features, i.e., the sentiment, emotions, and feelings expressed in the texts (Section 4.1.1); 2) the presence of offensive language (Section 4.1.2); and 3) syntactic features (Section 4.1.3). We also performed a lexical analysis (Section 4.1.4).</p>
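      <p>A minimal sketch of this feature-scoring step, using scikit-learn's chi2 on a toy feature matrix; the data are invented, and np.log1p is a simplified stand-in for the logarithmic normalization the paper applies via PowerTransformer:</p>

```python
import numpy as np
from sklearn.feature_selection import chi2

# Toy matrix: rows are texts, columns are two linguistic features
# (e.g. summed TF-IDF weights of words from two emotion categories)
X = np.array([[0.9, 0.0],
              [0.8, 0.1],
              [0.1, 0.7],
              [0.0, 0.9]])
y = np.array([1, 1, 0, 0])     # one model's predictions: 1 ironic, 0 not ironic

scores, _ = chi2(X, y)         # one chi-squared value per feature
log_scores = np.log1p(scores)  # compress the non-linear range before plotting
print(scores.round(3), log_scores.round(3))
```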
      <sec id="sec-3-1">
        <title>4.1.1. Affective analysis</title>
        <p>We used the EmoLex dictionary [27] to extract emotions
and expressed feelings (Figure 1). EmoLex is based on
the wheel of emotions theorized by [28], which includes
8 main emotions (anger, anticipation, disgust, fear, joy,
sadness, surprise, trust) and the primary dyads or
feelings (aggressiveness, optimism, love, submission, awe,
disapproval, remorse, contempt).</p>
        <p>Favored by the design of the wheel of emotions, we also computed the variability of opposite emotions and contrary feelings by means of the standard deviation (σ). The weights of the emotional features are obtained by summing the TF-IDF<sup>5</sup> values of the words belonging to the specific emotions/feelings. We also computed the sentiment scores (positive and negative) by using SentiWordNet 3.0 [29] (Figure 2).</p>
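        <p>The emotional-contrast feature can be illustrated with a small sketch; the weights below are hypothetical TF-IDF sums, and the opposite pairs follow Plutchik's wheel:</p>

```python
import statistics

# Hypothetical per-text emotion weights (sums of TF-IDF of EmoLex words)
weights = {"joy": 0.8, "sadness": 0.6, "trust": 0.1, "disgust": 0.7}

# Opposite emotions on Plutchik's wheel; a large spread signals contrast
opposites = [("joy", "sadness"), ("trust", "disgust")]
contrast = {pair: statistics.stdev([weights[pair[0]], weights[pair[1]]])
            for pair in opposites}
print(contrast)
```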
        <p>As Figure 1 shows, negative emotions and feelings (Example 1) like disgust, contempt, and remorse report the highest χ² values for the majority of the perspectivist models. Thus, we can confirm the findings of previous analyses of English tweets [23, 14], where negative emotions were identified primarily in #ironic self-labelled tweets. Another common discriminative feature is the contrast between negative emotions and feelings and their positive counterparts (Example 2).</p>
        <p>(1) [Post] TLDR: senior positions and management get
paid more.
[Reply] And are generally the most useless pricks out
there, all talk and no action.
(2) [Post] Fuck carlow they beat me in the feile when I
was 13. They all looked like 30 year old men.
[Reply] We have to win a match in football some how.</p>
        <p>By looking at the perspective-specific models, we noticed some interesting findings. For instance, when considering the gender dimension, we can notice a higher χ² for the Fem-persp model on the presence of negative sentiment and on negative emotions/feelings (fear, sadness, disapproval, and awe) with respect to the Male-persp model (Figures 2 and 1). These values suggest the idea that female annotators tend to recognize irony in texts that express a certain negativity.</p>
        <p>A similar finding is noticed in the GenY, AU, and particularly US-persp models. All these models, indeed, appear to be more confident in detecting irony when the text is characterized by a negative sentiment, differently from their counterparts (especially the GenX, GenZ, IN, and IR-persp models). The analysis of emotions brings to light an interesting difference between the IR-persp model and all the 4 models built taking into account the provenance: the IR-persp model shows a marked and higher χ² score especially in the presence of emotional contradictions in the texts.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <p>5. To compute the TF-IDF, we cleaned the text from URLs and other non-alphanumeric symbols, tokenized it and removed the stopwords, and finally lemmatized it using the spaCy large model for English.</p>
    </sec>
    <sec id="sec-5">
      <title>4.1.2. Offensive language</title>
      <p>The authors of [20] proved that irony, especially in its sarcastic form, can be used to reinforce a negative message. For this reason, the presence of offensive language could be considered a trigger for the ironic interpretation of a text.</p>
      <p>To this purpose, we exploited HurtLex, a multilingual lexicon of offensive words. The entries in the lexicon are categorized into 17 types of offences (related to the economic and social spheres, professions, animals, and so on) (Table 4), enclosed in two macro-categories: conservative (words with a literally offensive sense) and inclusive (all the words regardless of the explicitness of the offences).</p>
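      <p>As an illustration of how such a lexicon can be applied, the sketch below counts offensive words per category; the lexicon entries and category names are invented stand-ins, not actual HurtLex data:</p>

```python
# Toy HurtLex-style lexicon: offensive lemmas mapped to categories
# (hypothetical assignments; the real lexicon has 17 categories)
lexicon = {
    "pricks": "male genitalia",
    "useless": "moral defects",
}

def offense_profile(tokens):
    """Count offensive words per lexicon category in a token list."""
    profile = {}
    for tok in tokens:
        cat = lexicon.get(tok.lower())
        if cat:
            profile[cat] = profile.get(cat, 0) + 1
    return profile

print(offense_profile("And are generally the most useless pricks out there".split()))
```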
      <p>Figure 3 shows that some categories of offensive language report the highest χ² values for the majority of the perspectivist models. These categories are related in particular to male genitalia, moral behaviours/defects, and, even in its conservative sense, to the category of physical disabilities and diversity.</p>
      <p>We can also point out interesting differences when considering the perspective-specific models. Looking at gender, we can notice higher values in the Male-persp model when the texts contain words related to crimes/immoral behaviours, professions, and animals, differently from the Fem-persp model.</p>
      <p>Observing the dimension of age, instead, the differences are not so marked, except for the offensive words related to female genitalia, which appear discriminant for the GenX-persp model, and the words related to crimes/immoral behaviours for the youngest generations (i.e., Y and Z). In the dimension of nationality, it is clear that the presence of offensive words related especially to moral behaviours/defects has some impact on the detection of irony for the AU and US-persp models, while words related to male genitalia report a higher score only for the AU and IR-persp models.</p>
      <sec id="sec-5-2">
        <title>4.1.3. Syntactic features</title>
        <p>As shown in previous work [30], syntactic features are proven to be useful to detect ironic language in social media. In particular, we captured syntactic dependencies that could reveal pragmatic information, such as: intensifiers (intens), discourse connections (disc_conn), adverbial locutions (adv_loc), mentions (mention), and nominal phrases, together with the number of nominal phrases in the tweet (nom_phrase and num_nom_phrase). As Figure 4 shows, only the adverbial locutions appear relevant for the majority of models.</p>
        <p>However, we noticed that syntactic features have a higher χ² score in a few models, such as the Boomer and IN-persp models. If the former seems to be triggered by different syntactic features (i.e., the presence of intensifiers and nominal utterances), the latter appears to discriminate irony especially in the presence of discursive connections.</p>
      </sec>
      <sec id="sec-5-1">
        <title>4.1.4. Lexical analysis</title>
        <p>To perform a lexical analysis on the test set, we extracted the top 100 unigrams, bigrams, and trigrams weighted by their TF-IDF<sup>6</sup>, applied separately for each model on the texts labelled as ironic. In order to examine the lexical patterns that may influence their choices, we manually analysed both the features that were common to at least 6 models and the ones that occurred in an individual model only.</p>
        <p>Focusing on the n-grams common to at least 5 models, we individuated a total of 18 features that recur in 5 to 7 models. Ten of them are unigrams frequent across the texts, such as family, think, feel, know, while the other 8 lexical features are bigrams and trigrams linked to the same 4 texts predicted as ironic by at least 5 models and reported in Table 5.</p>
        <p>To highlight whether some lexical features were model-specific, we filtered the data by removing all the features that recurred in more than one model of the same dimension (age, gender, and nationality). By manually inspecting these unique features per model, we noticed that, for the majority of them, the bigrams and trigrams represented a different combination of the same texts (e.g., common lannister aside, family common lannister, lannister aside obsession, aside obsession, common lannister, family common, lannister aside). The Boomer-persp and GenY-persp models were the only ones that behaved differently: their bigrams and trigrams rarely show the systematic repetition of the same lexical items described above, and they both present a higher number of unigrams compared to other models.</p>
        <p>Specifically, considering the features associated with the model based on boomers' perspective, there is a high presence of non-English words (such as usernames or foreign words, especially from Hindi), and few verbs. In fact, it relies more on nominal n-grams, which in some cases correspond to the entire text, as in Examples 3 and 4. This result is further confirmed in the analysis above (Figure 4).</p>
        <p>(3) [Post] That's damn shitty of Hugo Boss, what on earth will the chaps in the corner shop and the kebab shop call us now? [Reply] Ma man</p>
        <p>(4) [Post] Election Predictions: Republicans will win the House! Stacey Abrams will lose in Georgia! Any takers? [Reply] @USER Yo crazy dude</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>4.2. Source-based Analysis</title>
      <p>In this section, we present a quantitative and qualitative analysis of the characteristics of ironic texts on Twitter and Reddit, showing analogies and differences.</p>
      <sec id="sec-6-1">
        <title>4.2.1. Irony on Twitter is more contextual</title>
        <p>Observing the predicted texts, we noticed that the perspectivist models tend to identify irony more in posts from Reddit (63% of the cases in Table 6), even if the two sources are balanced in the creation process of our corpus.</p>
        <p>We hypothesized that this difference was due to the different level of complexity and need for context of the instances in the two sources. To measure these characteristics, we computed the length in characters and tokens<sup>7</sup> and the lexical richness of the Post-Reply pairs, in terms of type-token ratio (TTR)<sup>8</sup>. We also computed the number of named entities<sup>9</sup> and external elements<sup>10</sup> that could amplify the contextual information in each source (Table 7). We used spaCy and spaCy-udpipe, loading the available models for English, in particular to extract the interjections and the named entities. For the emoticons and emojis, we exploited the available lists in the emoji library, while all the other characteristics have been extracted using specific regexes.</p>
        <p>7. For computing the length in tokens, the texts have been cleaned and tokenized, removing urls, punctuation, emoji, and emoticons.</p>
        <p>8. TTR is the number of distinct words over the overall words in the text. We took into account token and type lists without urls, punctuation, emoji, and emoticons. Here, the texts have been cleaned and tokenized as described in the previous footnote.</p>
        <p>9. The list of named entities considered in this study includes: works of art, organizations, persons, geopolitical entities, locations, events, names of products, dates, languages, laws, and nationalities or religious or political groups.</p>
        <p>10. External elements include: hashtags, emoji, emoticons, and urls.</p>
      </sec>
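      <p>Footnote 8's type-token ratio can be sketched as follows; the cleaning here is a simplified regex version (the paper also strips emoji and emoticons and uses proper tokenization):</p>

```python
import re

def ttr(text):
    """Type-token ratio: distinct words over total words, after light cleaning."""
    # strip urls, lowercase, and keep word tokens only (punctuation dropped)
    text = re.sub(r"https?://\S+", " ", text.lower())
    tokens = re.findall(r"[a-z']+", text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0

print(ttr("Wind power my arse, wind power!"))
```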
    </sec>
    <sec id="sec-7">
      <p>6. We used the TfidfVectorizer from Scikit-learn.</p>
      <p>Table 5 lists the Post-Reply texts predicted as ironic by at least 5 models, together with their recurrent bigrams/trigrams (e.g., trump family common, meat chip bread, shame leave bar, prefer burn stuff) and the number of models (from 6 to 8) that labelled each text as ironic.</p>
      <p>As expected, posts from Reddit are longer than tweets, but the values of the lexical richness and the number of named entities suggest that the content on Twitter is more varied than that from Reddit (Table 7). This is also confirmed by the number of external elements. A similar trend is also observed in the human annotations of the texts of the test set: most annotators recognized more irony in posts from Reddit (27%) than in tweets (14%).</p>
      <p>To analyze this trend further, we explored how each model behaves with respect to the source. In general, the models identify texts from Reddit as ironic more often than tweets; the only exception is the model trained on the Boomers’ perspective, which classified instances as ironic almost equally across the two sources (52% from Reddit and 48% from Twitter).</p>
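      <p>The per-source breakdown reported above (e.g., the 52% Reddit / 48% Twitter split for the Boomers’ model) is a simple proportion over the texts a model labels as ironic. A minimal sketch, with hypothetical text ids and source labels:

```python
from collections import Counter

def ironic_share_by_source(predictions, sources):
    # predictions: {text_id: 0/1 irony label from one model};
    # sources: {text_id: "reddit" or "twitter"}.
    # Returns the share of each source among texts predicted as ironic.
    ironic = [t for t, label in predictions.items() if label == 1]
    counts = Counter(sources[t] for t in ironic)
    total = sum(counts.values())
    return {src: n / total for src, n in counts.items()}

preds = {"t1": 1, "t2": 1, "t3": 0, "t4": 1, "t5": 1}
srcs = {"t1": "reddit", "t2": "reddit", "t3": "reddit",
        "t4": "twitter", "t5": "twitter"}
print(ironic_share_by_source(preds, srcs))
# -> {'reddit': 0.5, 'twitter': 0.5}
```
</p>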
      <sec id="sec-7-1">
        <title>4.2.2. Linguistic strategies and markers</title>
        <p>We carried out a qualitative analysis of the texts predicted
as ironic by at least 5 models, which amounts to a total of
26 texts, 24 from Reddit and 2 from Twitter (Table 6). To
these, we added 22 tweets from those identified as ironic
by at least 3 models in order to conduct a comparative
linguistic analysis of the two sources. For this analysis, we
also took into account the irony strategies and markers
proposed in the schema of [5] (Section 2).</p>
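        <p>The selection above (texts predicted as ironic by at least 5 of the 11 perspective models, plus tweets flagged by at least 3) amounts to thresholding a vote count across models. A minimal sketch, with hypothetical text ids (the model names echo those used in the paper):

```python
from collections import Counter

def texts_flagged_by_at_least(predictions, k):
    # predictions: {model_name: {text_id: 0/1 irony label}}.
    # Returns the ids of texts labelled ironic by >= k models.
    votes = Counter()
    for model_preds in predictions.values():
        for text_id, label in model_preds.items():
            votes[text_id] += label
    return sorted(t for t, v in votes.items() if v >= k)

preds = {
    "Boomer-persp": {"t1": 1, "t2": 0, "t3": 1},
    "GenY-persp":   {"t1": 1, "t2": 1, "t3": 0},
    "Female-persp": {"t1": 1, "t2": 0, "t3": 1},
}
print(texts_flagged_by_at_least(preds, 2))  # -> ['t1', 't3']
```
</p>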
        <p>We found that in both sources, users tend to use similar linguistic strategies to express irony, such as paradox/oxymoron and false assertions, confirming the results presented in [5]. We also observed other interesting features, such as context shift (Example 5) and hyperbole/exaggeration (Example 6).</p>
        <p>(5) [Post] How many roads must a man walk down?
[Reply] The only word I know is grunt and I can’t
spell it.
(6) [Post] Apparently Reece Mogg will be making a
statement within the hour. It’s not going to be his
resignation is it
[Reply] @USER We can only hope! Perhaps we’ve
declared war on Russia or put a man on Mars overnight.</p>
        <p>However, some diferences are evident. Twitter users
often convey contradictions that characterize irony
through unexpected answers (Example 4) and
euphemisms (Example 7), while Reddit communities lean
towards the use of rhetorical questions (Example 8) and respect to their counterparts (respectively, generations X
metaphors. and Z, and Indian and Irish perspectives). Moreover,
dif(7) [Post] Lindsey Hoyle spent £7,500 of taxpayers money ferently from other models of the provenance dimension,
on a mattress and sheets for his bed in the speakers the Irish perspective shows to recognize irony especially
residence. in presence of emotional contradictions. In turn, the
[Reply] @USER @USER Very Toriesque male perspective model seems more sensitive to irony
(8) [Post] wind power my arse when the text reports ofences related to crimes/immoral
[Reply] so....what you think this is false? Or you prefer behaviors, professions, or animals.
burning stuf? Similar diferences are visible in the dimension of age,
where texts including female genitalia are considered
From a stylistic point of view, both Reddit and Twitter ironic by Generation X, while the youngest generations
texts contain question marks, exclamation points, and (i.e., Y and Z) are more influenced by words related to
ellipsis. Full stops are common to the two sources, but crimes/immoral behaviors. Finally, only boomers and
they are more frequent in tweets, while Reddit users are Indian perspectives are sensible to syntactical patterns,
more prone to employ swear words. such as intensifiers, nominal utterances, and discursive</p>
        <p>Tweets also contain nominal utterances more fre- connectors [RQ2]. We also noticed that all models detect
quently than Reddit posts; this is coherent with the statis- irony in Reddit posts more often than in tweets.
tics shown in Table 7, which highlight how texts from The findings of these analyses reveal the perception of
Reddit are longer and thus include verbal expressions irony of diferent segments of people. These observations,
to fulfil complete sentences. In general, in both sources, therefore, could help to create models for irony detection
texts are short and composed of straight answers. with diferent degrees of “subjectivity”: models that take
into account the most common features to detect irony,
5. Discussion and Conclusion or models that target distinct perspectives. In both cases,
this study provides the ingredients to make their decisions
explainable. In line with this purpose, we would like, in
the future, to enrich these analyses looking also at the
topic of the texts, and extend them to diferent languages,
capturing also the understanding of irony in diferent
countries.</p>
        <p>To the best of our knowledge, this work is the first to
approach the analysis of the perceptions of irony in specific
segments. Specifically, we base our analysis on the age,
gender, and nationality dimension from the EPIC dataset
[8]. To examine these patterns in a specific set of texts,
we modelled 11 perspectives (self-identified female and
male, boomers, generation X, generation Y and genera- Limitations
tion Z, British, Indian, Irish, American, and Australian),
and comparatively analysed the impact of various lin- This work is the first attempt to explore the perception
guistic features in each of them. of irony, looking at diferent perspectives. Given the</p>
        <p>The contribution of this paper is twofold. Firstly, our early stages of this framework, we are aware there are
analysis confirms most of the observations made in the some limitations, which we aim to tackle in subsequent
literature about the similar ironic patterns featured in research. In particular, the perspectives are based on a
texts of diferent languages [ 23, 14, 5]. Secondly, our small subset of characteristics (self-identified gender, age,
analysis provides evidence for the diferent perceptions and nationality), and the analysis is conducted using a
of irony experienced by people with distinct demographic limited number of data instances (553). To overcome this
traits. As a subjective task, irony identification is indeed problem, in the future, we plan to extend these analyses
impacted by experience and background. to a larger corpus that includes texts in several languages.</p>
        <p>Through this analysis exercise, we noticed that the
patterns that often trigger ironic interpretation in most
perspectivist models are negative emotions (i.e., disgust, Acknowledgments
contempt, remorse) and contrasting expressions with
their counterparts in the wheel of emotions of Plutchik The work of S. Frenda, S. Casola and V. Basile was
par(trust, submission, and love); ofensive language (related tially funded by the Multilingual Perspective-Aware NLU
in particular to male genitalia), moral behaviors or de- project in partnership with Amazon Alexa. This research
fects, physical disabilities, and diversities also play a role was funded through a donation from Amazon.
[RQ1].</p>
        <p>In addition, looking at the diferences among
perspectives, we noticed that models trained on female,
generation Y, Australian, and American perspectives, often
recognize irony when texts convey negative sentiment with
TUT, IEEE intelligent systems 28 (2013) 55–63. [26] A. A. Taha, L. Hennig, P. Knoth, Confidence
estiURL: https://www.computer.org/csdl/magazine/ex/ mation of classification based on the distribution
2013/02/mex2013020055/13rRUxAAT3i. of the neural network output layer, arXiv preprint
[18] C. Van Hee, E. Lefever, V. Hoste, SemEval-2018 arXiv:2210.07745 (2022).</p>
        <p>task 3: Irony detection in English tweets, in: Pro- [27] S. M. Mohammad, P. D. Turney, Crowdsourcing a
ceedings of the 12th International Workshop on word-emotion association lexicon, Computational
Semantic Evaluation, Association for Computa- Intelligence 29 (2013) 436–465. URL: https://doi.org/
tional Linguistics, New Orleans, Louisiana, 2018, 10.1111/j.1467-8640.2012.00460.x.
pp. 39–50. URL: https://aclanthology.org/S18-1005. [28] R. Plutchik, H. Kellerman, Theories of emotion,
voldoi:10.18653/v1/S18-1005. ume 1, Academic Press, 1980. URL: https://books.
[19] R. Giora, I. Jafe, I. Becker, O. Fein, Strongly at- google.it/books?id=TV99AAAAMAAJ.
tenuating highly positive concepts. The case of de- [29] S. Baccianella, A. Esuli, F. Sebastiani,
SentiWordfault sarcastic interpretations, Review of Cognitive Net 3.0: An enhanced lexical resource for
sentiLinguistics. Published under the auspices of the ment analysis and opinion mining, in:
ProceedSpanish Cognitive Linguistics Association 16 (2018) ings of the Seventh International Conference on
19–47. URL: https://doi.org/10.1075/rcl.00002.gio. Language Resources and Evaluation (LREC’10),
Eu[20] S. Frenda, A. T. Cignarella, V. Basile, C. Bosco, ropean Language Resources Association (ELRA),
V. Patti, P. Rosso, The unbearable hurtfulness of Valletta, Malta, 2010. URL: http://www.lrec-conf.
sarcasm, Expert Systems with Applications 193 org/proceedings/lrec2010/pdf/769_Paper.pdf .
(2022) 116398. [30] A. T. Cignarella, V. Basile, M. Sanguinetti, C. Bosco,
[21] A. T. Cignarella, S. Frenda, V. Basile, C. Bosco, P. Rosso, F. Benamara, Multilingual irony detection
V. Patti, P. Rosso, Overview of the EVALITA 2018 with dependency syntax and neural models, in:
task on irony detection in Italian tweets (IronITA), Proceedings of the 28th International Conference
in: Sixth Evaluation Campaign of Natural Language on Computational Linguistics, International
ComProcessing and Speech Tools for Italian (EVALITA mittee on Computational Linguistics, Barcelona,
2018), volume 2263, CEUR-WS, 2018, pp. 1–6. Spain (Online), 2020, pp. 1346–1358. URL: https:
[22] S. Frenda, V. Patti, Computational models for irony //aclanthology.org/2020.coling-main.116.
detection in three spanish variants, in: CEUR
Workshop Proceedings, volume 2421, CEUR-WS, 2019,
pp. 297–309.
[23] E. Sulis, D. I. H. Farías, P. Rosso, V. Patti, G. Rufo,</p>
        <p>Figurative messages and afect in Twitter:
Differences between #irony, #sarcasm and #not,
Knowledge-Based Systems 108 (2016) 132–143. URL:
https://doi.org/10.1016/j.knosys.2016.05.035, new
Avenues in Knowledge Bases for Natural Language</p>
        <p>Processing.
[24] E. Simpson, E.-L. Do Dinh, T. Miller, I. Gurevych,</p>
        <p>Predicting humorousness and metaphor novelty
with Gaussian process preference learning, in:
Proceedings of the 57th Annual Meeting of the
Association for Computational Linguistics, Association for
Computational Linguistics, Florence, Italy, 2019, pp.
5716–5728. URL: https://aclanthology.org/P19-1572.</p>
        <p>doi:10.18653/v1/P19-1572.
[25] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT:</p>
        <p>Pre-training of deep bidirectional transformers for
language understanding, in: Proceedings of the
2019 Conference of the North American
Chapter of the Association for Computational
Linguistics: Human Language Technologies, Volume 1
(Long and Short Papers), Association for
Computational Linguistics, Minneapolis, Minnesota,
2019, pp. 4171–4186. URL: https://aclanthology.org/
N19-1423. doi:10.18653/v1/N19-1423.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>