
Written Goodbyes: How Genre and Sociolinguistic Factors Influence the Content and Style of Suicide Notes

Lucia Busso

Claudia Roberta Combei

Aston Institute for Forensic Linguistics, Aston University, Birmingham, UK
Dipartimento di Studi Umanistici, Università di Pavia, Italy

The study analyses a novel corpus of 76 freely available authentic English suicide notes (SNs) (letters and social media posts), spanning from 1902 to 2023. Using NLP and corpus linguistics tools, this research aims at decoding patterns of content and style in SNs. In particular, we explore variation in linguistic features in SNs across sociolinguistic factors (age, gender, addressee, time period) and between text types, referred to as genre (letters vs. online posts). To this end, we use topic models, subjectivity analysis, and sentiment and emotion analysis. Results highlight how both discourse and emotion expression show differences depending on genre, gender, age group and time period. We suggest a more nuanced approach to personalized prevention and intervention strategies based on insights from computer-assisted linguistic analysis.

Keywords: suicide notes, topic modelling, sentiment and emotion analysis, subjectivity analysis

1. Introduction

This paper investigates the language of suicide notes, with the goal of uncovering patterns of discourse, topics, and emotional expression across various sociolinguistic factors and relationship dynamics, spanning over 100 years. A suicide note (SN) has been defined in the literature as "any available text by a suicide which was authored shortly before death" ([1]: 26).

The importance of a detailed analysis of suicide notes has been acknowledged in the scholarly debate ([2]). In fact, SNs have been widely studied in linguistics, sociology, and psychology starting with the publication in 1959 of Osgood and Walker's seminal work ([3]). Since then, the language of SNs has been investigated mainly through Genre Analysis ([4]), with some scholars working with corpus methods ([1, 5]). Lately, big corpora of SNs have been collected through the Web and used for computational analyses (inter alia [6, 7, 8]).

Research on SNs is naturally practical, being focused on suicide prevention ([9]), identification ([10]), and authenticity ([11]). For instance, the study by [6] uses classification algorithms to help mental health professionals distinguish between genuine and elicited suicide notes. This, the authors claim, can in turn help develop a prediction strategy for repeated suicide attempts, as suicide notes offer valuable insights into specific personality states and mindsets. Similarly, [7] suggests that analysing SNs may contribute to assessing the risk of repeated suicide attempts.

Despite the area being well-researched, especially in forensic linguistics, current analyses of SNs present several shortcomings. Given the difficulty of accessing data, scholars have either used dubious source material (such as the letters published on the blog "The Holy Dark"), or have reused and reanalysed SNs written by famous people (such as Virginia Woolf and Kurt Cobain, e.g., [12, 13]). Moreover, there is no study to date, to the best of the authors' knowledge, that analyses SNs using text type, which we refer to as genre, or sociolinguistic factors (such as gender, age, addressee, or time period) as covariates.

In the present paper, we set out to perform corpus and computational analyses on a novel dataset of authentic suicide notes. Specifically, we aim to explore whether and to what extent the style and content of SNs vary according to genre (letter vs. online post) and sociolinguistic factors (the victim's gender and age, as well as the addressee and time period of the SN). To this end, we employ Structural Topic Modelling ([14]) and keyword analysis, subjectivity analysis ([15]), and sentiment and emotion analysis ([16, 17]).

CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04-06, 2024, Pisa, Italy
* Corresponding authors.
† Although both authors are equally responsible for the conceptualization and the contents of this article, for the purposes of Italian academia, we specify what follows: L. Busso authored Section 1, Section 2, and Section 3; C. R. Combei authored Section 4, Section 5, and Section 6.
l.busso@aston.ac.uk (L. Busso); claudiaroberta.combei@unipv.it (C. R. Combei)
https://orcid.org/0000-0002-5665-771X (L. Busso); https://orcid.org/0000-0003-1884-8205 (C. R. Combei)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2. Data

Despite the presence in the literature of various datasets of suicide letters, none, to the authors' best knowledge, are freely available to other researchers. Furthermore, existing corpora are usually either very small, and hence not suitable for quantitative analysis, or too big, and hence not controlled for the parameters we are interested in analysing. Therefore, we decided to collect a new dataset of genuine suicide notes to fill this gap, and to make it available to researchers interested in the topic. Given the sensitivity of the topic, corpus files are available upon request to the authors. Using the semi-automated software Bootcat ([18]), we collected a corpus consisting of 76 suicide letters and social media posts¹. The SNs have the following characteristics:

• freely available on the open internet (i.e., not behind paywalls or log-in platforms)
• taken from reputable news websites to ensure authenticity (i.e. not taken from blogs or other non-official sources)
• only notes that were reproduced in full (i.e. not from extracts or quotes in other texts)

The resulting corpus contains 26,214 tokens, and includes texts from 1902 to 2023. Unavoidably, the distribution is skewed towards more recent texts (only 5 texts are from before 1950, and only 14 are from before 1990; the majority of the corpus (75%) includes SNs from 1990 to the present day). However, the corpus is balanced for textual type (genre), with 43 letters (51% of the tokens) and 33 social media posts (49% of the tokens). The SNs also cover a wide range of addressees, including messages directed to family, life-partners (including ex-partners), friends, the internet, or cases where the addressee is unspecified.

¹ We based our data retrieval on the sources provided by [7], and expanded on them through targeted Google searches. For privacy reasons, online posts were only collected if reported by newspaper articles, and were not retrieved on social media platforms themselves.

3. Topic Modelling and keywords analysis

3.1. Structural Topic Models

Topic Models (TM) are a family of unsupervised learning algorithms that cluster co-occurring words across documents into thematic nodes, or "topics" ([19]). These algorithms require substantial human input, as the topics retrieved should be interpretable by the researcher assigning meaning to the patterns discovered ([20, 21]).

In this study we use Structural Topic Modelling ([14]), a type of TM that allows topic distributions to be modelled as a function of document-level covariates in regression-like schemes. The STM analyses are performed in R ([22]). We select a number of topics K=3 based on mathematical fit and ease of interpretation, and we model the effect of the "genre" covariate on topic content (i.e. lexical content used within topics) and prevalence (i.e. the frequency with which a topic is discussed).

Figure 1 shows the top 10 word probabilities for the 3 topics in the corpus. Following extensive concordance analysis to explore the keywords in context, the three topics have been labelled:

1. Topic 1: Explanations. This topic clusters words related to reasons, motives, and emotions associated with the act of suicide.
2. Topic 2: Anguish. This topic clusters words related to the intimate feelings of pain and hurt that accompany suicidal ideation.
3. Topic 3: Connectedness. This topic clusters words that refer to close connections to other people in the victim's life.

As mentioned above, we model the effect of genre (letter vs. online post) for topical content and prevalence. While we find no statistical differences (p > .05) for topical content, some interesting differences arise in topical prevalence, as can be seen in Table 1 and in Figure 2. Specifically, we observe that online posts discuss significantly less private feelings of anguish and pain (Topic 2) and significantly more interpersonal relationships (Topic 3).
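To make the notion of topical prevalence in Section 3.1 concrete, the following Python sketch compares mean topic proportions between two genres. It is not the STM estimation itself (the study runs that in R); it only illustrates that, for a single binary covariate, regressing topic proportions on the covariate reduces to a difference in group means. The document-topic proportions in `theta`, the genre labels, and both function names are hypothetical and are not taken from the corpus.

```python
from statistics import mean


def prevalence_by_group(theta, groups):
    """Mean topic proportions per group.

    theta:  list of per-document topic proportion lists (each sums to 1)
    groups: list of group labels, one per document
    """
    rows_by_group = {}
    for proportions, label in zip(theta, groups):
        rows_by_group.setdefault(label, []).append(proportions)
    # zip(*rows) transposes documents x topics into topics x documents
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in rows_by_group.items()
    }


def prevalence_contrast(theta, groups, group_a, group_b):
    """Per-topic difference in mean prevalence (group_a minus group_b)."""
    means = prevalence_by_group(theta, groups)
    return [a - b for a, b in zip(means[group_a], means[group_b])]


# Invented proportions for K = 3 topics in four toy documents
theta = [
    [0.5, 0.3, 0.2],
    [0.4, 0.4, 0.2],
    [0.3, 0.1, 0.6],
    [0.5, 0.1, 0.4],
]
genres = ["letter", "letter", "post", "post"]

contrast = prevalence_contrast(theta, genres, "post", "letter")
print([round(c, 2) for c in contrast])
```

In this toy input the second topic is less prevalent and the third more prevalent in posts than in letters, which mirrors the direction of the reported effect without reproducing its size or its significance test.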

3.2. Keyword Analysis

To explore the corpus further, beyond the "black box" of the STM algorithm, we performed a keyword analysis. Using SketchEngine ([23]), we extracted keywords for both letters and social media posts using EnTenTen21 as reference corpus. To ensure that we only consider words that are used throughout the corpus, we discarded instances with a low ARF (average reduced frequency) score ([24]). Not surprisingly, many keywords are shared across the two subcorpora, reflecting "universal" themes of suicidal ideation such as apologies, goodbyes, and explanations. However, idiosyncratic keywords paint an interesting picture (see Figure 3), as online posts seem to display a lower prevalence of intimate feelings, and more polarized emotion words and swearwords.
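The role of ARF as a dispersion filter can be sketched in a few lines of Python. The function below follows the commonly cited definition of average reduced frequency (corpus of N tokens, word frequency f, v = N/f, and the sum of min(d, v) over the distances d between consecutive occurrences, with the first distance wrapping around the end of the corpus). Whether SketchEngine computes the score in exactly this way is an assumption, and the function name and toy positions are hypothetical.

```python
def average_reduced_frequency(positions, corpus_size):
    """ARF of one word, given its 0-based token positions and the corpus size."""
    f = len(positions)
    if f == 0:
        return 0.0
    pos = sorted(positions)
    v = corpus_size / f  # expected gap if the word were evenly spread
    # First distance wraps around from the last occurrence to the first one
    distances = [pos[0] + corpus_size - pos[-1]]
    distances += [b - a for a, b in zip(pos, pos[1:])]
    # Gaps longer than v are capped, so clumped occurrences add little
    return sum(min(d, v) for d in distances) / v


N = 100
even = average_reduced_frequency([0, 25, 50, 75], N)      # evenly spread word
clumped = average_reduced_frequency([10, 11, 12, 13], N)  # word in one burst

print(even, clumped)
```

Both toy words occur four times, but the evenly spread one keeps an ARF equal to its raw frequency (4.0), while the clumped one drops to about 1.12. A word used in only one or two notes would therefore score low and be discarded by the filter described above.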

4. Subjectivity analysis

Subjectivity analysis investigates what is generally labelled as a "private state", namely opinions, feelings, beliefs, speculations ([15]: 674), typically classifying a text on a scale ranging from high objectivity to high subjectivity.

Our paper uses this analysis because we see subjectivity as a relevant stylistic and content-related element, useful for understanding suicidal ideation. Although this is a preliminary study, we believe that findings from subjectivity, sentiment, and emotion analysis, supported by the exploration of psychosocial factors (not the object of this paper), could be useful for evaluating the risk of (repeated) suicide attempts. In particular, we expect that highly subjective texts may signal intense personal turmoil, which has, in fact, been reported as a potential risk factor for suicide ([25]).

This research uses the TextBlob library for Python, which provides tools for various textual analyses, including subjectivity, as part of its sentiment analysis function². The tool uses a pattern analyzer and a pre-defined dictionary of word polarity and subjectivity. It also incorporates intensity, accounting for the impact of modifiers, which can increase or reduce the measured subjectivity score. Each SN is processed to extract its overall subjectivity score, which ranges from 0 (i.e., highly objective) to 1 (i.e., highly subjective). To discuss the effect of genre and sociolinguistic factors on the subjectivity score, we present the results of statistical analyses conducted in R ([22]).
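The mechanism described above (a word-level dictionary plus modifiers that scale the score) can be illustrated with a toy scorer. This is a simplified sketch, not TextBlob's code or lexicon: the two dictionaries, their values, and the function name are invented for illustration, and the averaging rule is an assumption about how word-level scores might be combined.

```python
# Hypothetical mini-lexicon: word -> subjectivity in [0, 1] (invented values)
SUBJECTIVITY = {"sorry": 1.0, "love": 0.6, "tired": 0.7, "good": 0.6}

# Hypothetical modifiers: word -> multiplier applied to the next scored word
INTENSIFIERS = {"very": 1.3, "so": 1.2, "slightly": 0.7}


def subjectivity_score(text):
    """Average subjectivity of the lexicon words in text (0.0 if none occur)."""
    tokens = [t.strip(".,;:!?\"'").lower() for t in text.split()]
    scores = []
    multiplier = 1.0
    for token in tokens:
        if token in INTENSIFIERS:
            # Remember the modifier and apply it to the following word
            multiplier = INTENSIFIERS[token]
            continue
        if token in SUBJECTIVITY:
            # Cap at 1.0 so the score stays on the 0-1 scale
            scores.append(min(1.0, SUBJECTIVITY[token] * multiplier))
        multiplier = 1.0
    return sum(scores) / len(scores) if scores else 0.0


plain = subjectivity_score("I am tired.")
boosted = subjectivity_score("I am very tired.")
neutral = subjectivity_score("The note was found on Monday.")

print(plain, boosted, neutral)
```

The modifier raises the score of the same sentence, and a sentence without lexicon words stays at 0, the objective end of the scale.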

First of all, the mean subjectivity score at the corpus level (M = 0.56, SD = 0.12) indicates that SNs are characterized by a level of subjectivity that falls above the midpoint of the scale (0.50); there is, thus, a tendency toward greater subjectivity than objectivity. Interestingly, however, the mean subjectivity scores and their distributions are nearly identical between letters (M = 0.56, SD = 0.13) and social media posts (M = 0.57, SD = 0.12).

Next, based on Figure 4, SNs written from 1950-1969 seem to have the highest subjectivity score (M = 0.72, SD = 0.15). In contrast, the lowest subjectivity is found for SNs written from 1990-1999 (M = 0.49, SD = 0.12), followed by those from 1970-1989 (M = 0.52, SD = 0.15). SNs written before 1950 (M = 0.56, SD = 0.11), from 2000-2019 (M = 0.56, SD = 0.11), and from 2020-now (M = 0.56, SD = 0.13) have identical subjectivity scores.
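The group-level figures reported in this section are means and standard deviations of per-note scores. A minimal Python sketch of that summary step is given below; the study itself ran its statistics in R. The records are invented, the function name is hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the paper does not state which variant it reports.

```python
from statistics import mean, stdev


def summarize_by_group(records):
    """Return {label: (mean, sd, n)} for (label, score) pairs.

    sd is the sample standard deviation, or None for single-note groups.
    """
    groups = {}
    for label, score in records:
        groups.setdefault(label, []).append(score)
    summary = {}
    for label, scores in groups.items():
        sd = stdev(scores) if len(scores) > 1 else None
        summary[label] = (mean(scores), sd, len(scores))
    return summary


# Invented subjectivity scores, one per note
records = [
    ("letter", 0.5), ("letter", 0.7),
    ("post", 0.6), ("post", 0.6), ("post", 0.9),
]
summary = summarize_by_group(records)

for label, (m, sd, n) in summary.items():
    print(f"{label}: M = {m:.2f}, SD = {sd:.2f}, n = {n}")
```

Reporting n next to M and SD matters for this corpus, because some periods contain only a handful of notes and their means are correspondingly unstable.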

Figure 4: Subjectivity as a function of the year group

The results displayed in Figure 5 indicate that subjectivity scores of SNs addressed to life-partners (M = 0.61, SD = 0.06) are the highest, followed by those addressed to family (M = 0.60, SD = 0.09). This suggests that SNs addressed to people with whom the victim has a close relationship are characterized by a deeper personal engagement and a more vivid linguistic expression than those addressed to the internet (M = 0.56, SD = 0.12), to friends (M = 0.55, SD = 0.08), and to other addressees (M = 0.54, SD = 0.16). The standard deviations for most addressees (i.e., partner, family, friends) are relatively small, suggesting limited variation within these groups.

Figure 5: Subjectivity as a function of addressee

As regards the victim's gender, the average subjectivity score for females (M = 0.58, SD = 0.11) is slightly higher than the score for males (M = 0.54, SD = 0.14), but the standard deviations point out that the ranges overlap to a large extent. Finally, no consistent tendency emerges from the distribution of subjectivity scores with respect to the victim's age. In fact, there is substantial variation within each age group, meaning that the degree of subjectivity in SNs is influenced by other factors.

² The sentiment analysis score itself obtained from the TextBlob tool is not used in this study, as more advanced methods for investigating sentiment are preferred (see Section 5).

5. Sentiment and emotion analysis

In order to obtain a more fine-grained image of the emotional dimension of the SNs, and to complement the previously discussed findings on the topics and subjectivity of these texts, we also present and discuss the results of sentiment and emotion analysis. Sentiment analysis is defined as "the task of finding the opinions of authors about specific entities" ([26]: 82). Emotion analysis (also emotion classification), on the other hand, is often seen as a more refined version of sentiment analysis, since it deals with the identification of primary emotions in a text ([27]).

For this research we employ the latest version available (at the time of writing) of Twitter-roBERTa-base for sentiment analysis, a model trained on over 124 million tweets that is fine-tuned for this task with the TweetEval benchmark ([28, 29, 30]). For emotion classification, we use the Emotion English DistilRoBERTa-base model ([31]) to extract Ekman's six basic emotions ([32]): anger, disgust, fear, joy, sadness, and surprise, along with a neutral class. The model is a fine-tuned version of DistilRoBERTa-base, trained on six balanced datasets, each containing 2,811 observations per emotion, for a total of almost 20,000 observations.

Our analysis reveals that the average probability of negative sentiment (M = 0.61, SD = 0.31) is roughly three times higher than the average probability of neutral (M = 0.22, SD = 0.15) and positive sentiment (M = 0.17, SD = 0.28). Then, the dominant sentiment in each SN is determined by identifying the highest probability among the three sentiment classes. We find that 73% of the SNs have negative sentiment as the highest probability, 17.1% positive sentiment, and 9.2% neutral sentiment. This trend is also supported by Figure 6, which shows the distribution of sentiment probabilities, confirming that most SNs have a higher likelihood of expressing negative sentiment. We interpret these results as a reflection of the emotional distress tied to both writing the suicide notes and the thoughts surrounding the act of suicide itself.

Some interesting tendencies are observed from the analysis of sentiment distribution across sociolinguistic factors and genre. First, Figure 7 illustrates a consistent difference between the two genres: online posts have a higher prevalence of negative sentiment (90.9%) compared to letters (60.5%).

Next, all SNs from 1970-1989 show negative sentiment as being dominant (100%). A high proportion of negative sentiment (88.5%) is also found in SNs written from 2020-now. Interestingly, SNs from 1990-1999 display a balanced sentiment distribution (50% negative and 50% positive), marking the only period in our corpus with such a high presence of positive sentiment. This situation could be due to the fact that the authors of these (very long) SNs are well-known celebrities (e.g., Kurt Cobain and OJ Simpson). Even if the letters were not intended for the general public, the idea that these texts might eventually become public could have influenced the victims to transmit more positive messages.

Some patterns of sentiment distribution are traceable when considering the addressee of the SN. Positive sentiment is more common when the addressee is the victim's partner (40%) or family (35.7%). Conversely, a very high percentage of negative sentiment is observed in SNs addressed to the general public on the internet (93.1%). Figure 8 shows that negative sentiment is slightly more frequent in SNs written by female victims (72.7%) compared to male victims (68.2%). As for the victim's age, a distinct pattern is difficult to identify, but negative sentiment is the most frequent (over 65%) in SNs written by teenagers (10s) and people in their 20s, 30s, 40s, and 60s.

Moving on to emotion analysis, the average probability of SNs conveying sadness (M = 0.48, SD = 0.37) is four times higher than the average probability of conveying anger (M = 0.12, SD = 0.22), fear (M = 0.12, SD = 0.21), and neutrality (M = 0.12, SD = 0.18). Sadness (53.9%) is, indeed, the dominant emotion in the corpus, followed by neutrality (13.2%), anger (11.8%), and fear (7.9%). This is determined by identifying the highest probability among the seven emotion classes for each individual SN.

We can pinpoint some interesting outcomes from the analysis of emotions across genres and sociolinguistic factors. As concerns genre, Figure 9 depicts an obvious difference between letters and online posts. On the one hand, sadness is more frequent in online posts (59.2%) compared to letters (40%). On the other hand, neutrality and joy, the only two non-negative emotions, are more frequent in letters (14.6% and 8.8%, respectively) than in online posts (9.5% and 3.2%, respectively).

Figure 9: Emotions as a function of genre

The analysis reveals that sadness is the most prevalent emotion across all time periods. In particular, the presence of sadness exceeds 50% in SNs from 1970-1989 and from 2020-now. Then, the SNs written from 1970-1989 are also characterized by a definite presence of disgust (22.2%). In line with the sentiment analysis results, SNs from 1990-1999 contain the lowest presence of sadness (40.8%) and generally the lowest presence of negative emotions overall, compared to other periods. SNs written before 1950 display the highest presence of fear (17.3%) in the corpus, although sadness still remains the most prevalent emotion in this period.

From Figure 10, we can identify a clear disparity between the emotions transmitted by female and male victims. Sadness appears more frequently in SNs written by females (53.1%) compared to males (41.5%). Additionally, anger is more prevalent in SNs written by males (17.1%), ranking as their second most common emotion (after sadness).

Figure 10: Emotions as a function of gender

Although Figure 11 illustrates a complex distribution of emotions across the age groups of the victims, some patterns still emerge. Sadness is the most common emotion in the SNs of all age groups except for those written by people in their 30s, where neutrality prevails (36.2%). Interestingly, teenagers express the lowest neutrality (3.4%) and the highest sadness (60.1%). Additionally, fear is prominent among SNs written by people over 70 years old (31.8%), making it the second most frequent emotion for this age group. Fear is also the second most common emotion for SNs written by teenagers (14.6%).

Figure 11: Emotions as a function of age group

6. Conclusions

This mixed-methods study analysed the content and style of 76 SNs written over the course of a century, using genre, several sociolinguistic factors, and relationship dynamics as covariates. First of all, three main topics emerged from our corpus, which we labelled as Explanations, Anguish, and Connectedness. Looking at the differences in topical prevalence between the two text types, we observed that online posts displayed fewer private feelings (e.g., anguish and pain) and more polarized emotion words and swearwords.

Subjectivity analysis revealed that SNs tended to be more subjective than objective, irrespective of the genre. Some differences based on addressees were identified in the corpus; for example, SNs directed toward close relationships (i.e., life-partners and family) showed higher subjectivity scores, suggesting a more profound and personal style, compared to those directed toward the broader (internet) public.

As far as sentiment analysis is concerned, negative sentiment was dominant in the corpus (i.e. three times more frequent than neutral or positive sentiment), especially in online posts. Then, the analysis of emotions revealed that sadness was the main emotion in the corpus. This evident presence of sadness and negative sentiment reflects the complex emotional challenges and inner struggles that victims experienced at the time they wrote their SNs.

Although sadness was the most common emotion in both letters and online posts, it occurred more frequently in the latter text type. Also, letters tended to convey positive emotions (e.g., joy) more frequently than online posts. Finally, the analysis revealed that sadness was more common in the SNs written by female victims and by teenagers.

All in all, our results reveal that the content, discourse, and emotional expression in SNs vary as a function of genre, sociolinguistic factors, and relationship dynamics. These differences reveal the need to take into account specific social, demographic, and cultural variables when designing and implementing suicide prevention and intervention strategies. In this sense, we believe that corpus-based and NLP research on SNs can contribute to the improvement of these personalized strategies.

Acknowledgments

The research presented in this paper was conducted while C. R. Combei benefited from support provided by the project "PON Ricerca e Innovazione 2014–2020 - Linea Innovazione (D.M. 1062/2021)".

References

[1] J. J. Shapero, The language of suicide notes, Ph.D. thesis, University of Birmingham, 2011.
[2] Tellari, Zanchi, Il suicidio di universitari nei media italiani: Uno studio basato su corpus, in: S. Matiola, M. Milicevic Petrovic (Eds.), CLUB - Working Papers in Linguistics, volume 8, AMS Acta AlmaDL, Bologna, 2024, pp. 1–20.
[3] C. E. Osgood, E. G. Walker, Motivation and language behavior: a content analysis of suicide notes, The Journal of Abnormal and Social Psychology 59 (1959) 58–67.
[4] B. Samraj, J. M. Gawron, The suicide note as a genre: Implications for genre theory, Journal of English for Academic Purposes 19 (2015) 88–101.
[5] A. E. Jaafar, H. A.-S. Jasim, A corpus-based stylistic analysis of online suicide notes retrieved from reddit, Cogent Arts & Humanities 9 (2022) 2047434.
[6] J. Pestian, H. Nasrallah, P. Matykiewicz, A. Bennett, A. Leenaars, Suicide note classification using natural language processing: A content analysis, Biomedical informatics insights 3 (2010) BII–S4706.
[7] S. Ghosh, A. Ekbal, P. Bhattacharyya, Cease, a corpus of emotion annotated suicide notes in english, in: Proceedings of the twelfth international conference on language resources and evaluation, 2020, pp. 1618–1626.
[8] A. M. Schoene, A. Turner, G. R. D. Mel, N. Dethlefs, Hierarchical multiscale recurrent neural networks for detecting suicide notes, IEEE Transactions on Affective Computing 14 (2021) 153–164.
[9] M. Chatterjee, P. Kumar, P. Samanta, D. Sarkar, Suicide ideation detection from online social media: A multi-modal feature based technique, International Journal of Information Management Data Insights 2 (2022) 100103.
[10] T. Zhang, A. M. Schoene, S. Ananiadou, Automatic identification of suicide notes with a transformer-based deep learning model, Internet interventions 25 (2021) 100422.
[11] M. Ioannou, A. Debowska, Genuine and simulated suicide notes: An analysis of content, Forensic science international 245 (2014) 151–160.
[12] E. T. Sudjana, N. Fitri, Kurt cobain's suicide note case: Forensic linguistic profiling analysis, International Journal of Criminology and Sociological Theory 6 (2013) 217–227.
[13] N. Malini, V. Tan, Forensic linguistics analysis of virginia woolf's suicide notes, International Journal of Education 9 (2016) 53–58.
[14] M. E. Roberts, B. M. Stewart, D. Tingley, Stm: An R package for structural topic models, Journal of statistical software 91 (2019) 1–40.
[15] A. Montoyo, P. Martínez-Barco, A. Balahur, Subjectivity and sentiment analysis: An overview of the current state of the area and envisaged developments, Decision Support Systems 53 (2012) 675–679.
[16] L. Bing, Sentiment Analysis: Mining Opinions, Sentiments, and Emotions, Cambridge University Press, Cambridge, 2015.
[17] C. Strapparava, R. Mihalcea, Learning to identify emotions in text, in: Proceedings of the 2008 ACM Symposium on Applied Computing, SAC '08, Association for Computing Machinery, New York, NY, USA, 2008, pp. 1556–1560.
[18] M. Baroni, S. Bernardini, Bootcat: Bootstrapping corpora and terms from the web, in: Proceedings of the fourth international conference on language resources and evaluation, Lisbon, Portugal, 26-28 May 2004, 2004, pp. 1313–1316.
[19] D. M. Blei, Probabilistic topic models, Communications of the ACM 55 (2012) 77–84.
[20] J. Chang, S. Gerrish, C. Wang, J. Boyd-Graber, D. Blei, Reading tea leaves: How humans interpret topic models, Advances in neural information processing systems 22 (2009) 288–296.
[21] M. E. Roberts, B. M. Stewart, D. Tingley, C. Lucas, J. Leder-Luis, S. K. Gadarian, B. Albertson, D. G. Rand, Structural topic models for open-ended survey responses, American journal of political science 58 (2014) 1064–1082.
[22] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2023. URL: https://www.R-project.org/.
[23] A. Kilgarriff, V. Baisa, J. Bušta, M. Jakubíček, V. Kovář, J. Michelfeit, P. Rychlý, V. Suchomel, The sketch engine: ten years on, Lexicography 1 (2014) 7–36.
[24] J. Hlaváčová, P. Rychlý, Dispersion of words in a language corpus, in: Text, Speech and Dialogue: Second International Workshop, TSD'99 Plzen, Czech Republic, September 13–17, 1999 Proceedings 2, Springer, 1999, pp. 321–324.
[25] A. Schuck, R. Calati, S. Barzilay, S. Bloch-Elkouby, I. Galynker, Suicide crisis syndrome: A review of supporting evidence for a new suicide-specific diagnosis, Behavioral Sciences and the Law 37 (2019) 223–239.
[26] R. Feldman, Techniques and applications for sentiment analysis, Communications of the ACM 56 (2013) 82–89.
[27] C. R. Combei, A. Luporini, Sentiment and emotion analysis meet appraisal: A corpus study of tweets related to the COVID-19 pandemic, Rassegna Italiana di Linguistica Applicata 53 (2021) 115–136.
[28] J. Camacho-Collados, K. Rezaee, T. Riahi, A. Ushio, D. Loureiro, D. Antypas, J. Boisson, L. Espinosa Anke, F. Liu, E. Martínez Cámara, TweetNLP: Cutting-edge natural language processing for social media, in: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics, Abu Dhabi, UAE, 2022, pp. 38–49.
[29] D. Loureiro, F. Barbieri, L. Neves, L. Espinosa Anke, J. Camacho-Collados, TimeLMs: Diachronic language models from Twitter, in: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 251–260.
[30] F. Barbieri, J. Camacho-Collados, L. Espinosa Anke, L. Neves, TweetEval: Unified benchmark and comparative evaluation for tweet classification, in: T. Cohn, Y. He, Y. Liu (Eds.), Findings of the Association for Computational Linguistics: EMNLP 2020, Association for Computational Linguistics, Online, 2020, pp. 1644–1650.
[31] J. Hartmann, Emotion English DistilRoBERTa-base, https://huggingface.co/j-hartmann/emotion-english-distilroberta-base/, 2022.
[32] P. Ekman, Basic emotions, in: T. Dalgleish, T. Power (Eds.), The Handbook of Cognition and Emotion, John Wiley & Sons, Ltd, Sussex, U.K., 1999, pp. 45–60.