Use of Lexical and Psycho-Emotional Information to Detect Hate Speech Spreaders on Twitter
Notebook for PAN at CLEF 2021
Riccardo Cervero
Università degli Studi di Milano-Bicocca (UNIMIB), Milan, Italy

Abstract
This notebook summarises the participation in the "Profiling Hate Speech Spreaders on Twitter" shared task [1] at PAN at CLEF 2021 [2], and describes the proposed method for the binary classification of users into hate speech spreaders and non-spreaders. The method is an ensemble inspired by Buda and Bolonyai's previous work, based on the separate training of different baselines and the subsequent definition of a meta-model for the final prediction. It has been proposed for both the English and Spanish corpora, with the introduction of more in-depth features relating to the personality traits of the users and to the psychological and emotional dimensions detectable in the texts they published. The system achieved an accuracy of 0.7 on the dataset of English-writing users and 0.8 on the dataset of Spanish-writing users (with a final average result of 0.75).

Keywords
Hate speech detection, personality traits, psycho-linguistic patterns, sentiment analysis

1. Introduction
The structure of online social networks, while able to offer several advantages - including the possibility of easily sharing content with thousands of users - also encourages the proliferation of toxic narratives. In particular, the information-filtering systems used to personalise the experience of each user have caused the intellectual isolation of certain sub-communities, called echo chambers [3]. These virtual bubbles are configured as closed virtual environments, and are extremely attractive to some readers because they propose tendentious contents that provoke strong emotional engagement and because they exploit the so-called confirmation bias [4]. These ideological frameworks are the source of most of the hate messages spreading on the Web [5].
Given the scale of the phenomenon, it is necessary to exploit computational linguistics tools to stem its spread. This is precisely where the "Profiling Hate Speech Spreaders on Twitter" task at PAN at CLEF 2021 lies: the objective is to identify Twitter users who show a tendency to publish posts containing hate speech ("hate speech spreaders"), checking whether this can be done by extracting linguistic features from the last 200 tweets in their timeline. The task is carried out on two corpora, in English and Spanish.

CLEF 2021 – Conference and Labs of the Evaluation Forum, September 21–24, 2021, Bucharest, Romania
r.cervero@campus.unimib.it (R. Cervero)
0000-0002-6642-9147 (R. Cervero)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073

In particular, this project uses features that infer, from the raw text, the users' personality traits, the psychological and cognitive processes underlying the textual content, and the emotional dimension of the published messages. The underlying hypothesis is that individuals' psychological inclinations may influence not only their real-life interactions, but also their behaviour within the virtual community, as suggested by numerous studies [7, 8, 9]. For instance, this research shows that trolling is correlated with mental disorders that often result in the dissemination of violent content: anti-social tendencies, aggressive behaviour, psychopathy, narcissism and even sadistic personality disorder (SPD). The paper is organised as follows: Section 2 presents related work on hate speech detection; Section 3 introduces the dataset provided for the task and describes the proposed system, outlining not only the predictive architectures employed, but also the features on which they have been trained.
Lastly, the results and conclusions are discussed in Sections 4 and 5, respectively.

2. Related Work
The phenomenon of hate speech has recently become a popular area of research. The possibility of counteracting the dissemination of aggressive narratives towards specific targets by means of automatic processes has very often been tested. In general, a data-driven approach - i.e. extracting different types of features and exploiting them in combination with Machine Learning techniques to estimate a model that produces the smallest possible error on new data - appears to be the most frequent and effective. Classifiers used for this purpose are of various kinds: Naïve Bayes, as implemented by Kwok and Wang [10] in combination with a Bag-of-Words approach; Support Vector Machines, again applied to Bag-of-Words features by Greevy et al. [11]; Logistic Regression, trained, for instance, on N-grams, as in Waseem and Hovy's work [12] on Twitter users. Other, more in-depth research has focused on identifying sub-classes of hate speech: Salminen et al. [13] have even developed a taxonomy of hate content in online social media, including the four main macro-categories of accusation, humiliation, swearing and promotion of violence. As the tools applied to Natural Language Processing evolved, more complex approaches were tested in various studies. One case is the introduction of Deep Learning techniques: Mikolov et al. [14] use embeddings as features; in many other studies, one can find deep architectures such as Convolutional Neural Networks [15], Recurrent Neural Networks [16], a combination of both [17], and transformers - in particular, BERT [18]. In general, however, the best results are often obtained by ensemble methods. One case is that of MacAvaney et al.
[19], who exploited an innovative multi-view learning strategy, creating separate view-classifiers for groups of different features and then combining them with a Linear Support Vector Machine to produce a meta-model. Other cases are [20] and [21]. Given the effectiveness of the latter solution, an ensemble method for profiling hate speech spreaders has been chosen for this project.

3. Method
This Section illustrates the proposed system for identifying Twitter users who spread hate messages, based on the linguistic and semantic features that can be extracted from the texts they published in the past. Software submission was made via the TIRA platform [22].

3.1. Datasets
The two datasets provided for the "Profiling Hate Speech Spreaders on Twitter" task are both composed of 200 observations, each recording in a single sub-corpus the last 200 tweets published by the corresponding anonymous user included in the sample - whose username was converted into an alphanumeric sequence by means of a hashing algorithm. No other operations were performed on the original text, except the replacement with specific tokens of any URLs, hashtags and names of the users mentioned or retweeted.

3.2. Environment Setup
The entire project was developed in Python, version 3.7. The main libraries used are pandas1 and numpy2 for the management of basic data structures, scikit-learn3 for the application of the key Machine Learning techniques - such as the training of the baselines and their validation - and xgboost4 for the implementation of the gradient boosting method.

3.3. Ensemble Model
The architecture of the proposed predictive model is inspired by the work of Buda & Bolonyai [6] at the "Profiling Fake News Spreaders on Twitter" task at PAN at CLEF 2020, which aimed to investigate the feasibility of detecting authors who had shared fake news in their past timeline by looking only at the linguistic features extractable from the posts they had published.
Their system achieved the best overall accuracy on the English corpus (0.75), and tied for first place in terms of the average result on both datasets (0.775). In detail, after training 4 baselines (a regularized Logistic Regression, a Support Vector Classifier, a Random Forest and an implementation of the gradient boosting algorithm offered by the XGBoost library) on N-grams collected from the sub-corpora, and after estimating a fifth model by again applying the gradient boosting method to some stylistic features (described later, in Section 3.4.4), a Logistic Regression is applied as a meta-model: it estimates the relative weights of the predictions made separately by each of the stacked baselines, and in turn returns a final binary prediction about the tendency of the user to spread false information or not. Unlike the model originally presented by the authors, in this case it was decided to also experiment with a Ridge Classifier as a meta-model, selecting ex post the best solution on the basis of the accuracy obtained. The first four baselines undergo a training process consisting of an extensive grid search for the optimal combination among two text pre-processing methods, different vectorization techniques, and the parameters and hyperparameters of the models themselves. More specifically, as far as the pre-processing stage is concerned, both pipelines convert all the tokens to lower case and remove non-alphanumeric characters; the only difference is that the second pipeline preserves emoticons and emojis.

1 Official documentation at https://pandas.pydata.org.
2 Official documentation at https://numpy.org/.
3 Official documentation at https://scikit-learn.org/.
4 Official documentation at https://xgboost.readthedocs.io/.
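The stacked architecture just described can be sketched with scikit-learn. Everything below is illustrative: the toy corpus, the hyperparameters and the choice of base learners are placeholders (the real system tunes each baseline via the grid search described above and also includes an XGBoost baseline, omitted here to keep the sketch scikit-learn-only).

```python
# Illustrative sketch of the stacked ensemble: baselines trained on
# TF-IDF N-grams, with a Logistic Regression meta-model weighting
# their out-of-fold predictions. Hyperparameters are placeholders.
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# One concatenated sub-corpus per user, with a binary spreader label.
docs = [
    "they should all go away i hate them",
    "what a lovely sunny day with friends",
    "nobody wants those people here",
    "great match tonight well played",
    "get out of my country you animals",
    "happy birthday to my best friend",
]
labels = [1, 0, 1, 0, 1, 0]

base_learners = [
    ("logreg", LogisticRegression(max_iter=1000)),
    ("svc", SVC()),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
]
# StackingClassifier fits the meta-model on cross-validated predictions
# of the baselines, mirroring the overfitting precaution described above.
ensemble = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    StackingClassifier(estimators=base_learners,
                       final_estimator=LogisticRegression(),
                       cv=2),
)
ensemble.fit(docs, labels)
print(ensemble.predict(docs))
```

Swapping `final_estimator` for `sklearn.linear_model.RidgeClassifier` reproduces the alternative meta-model experimented with in this project.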
With regard to corpus vectorization, the Term Frequency – Inverse Document Frequency function is applied to different ranges of N-grams - considering unigrams, bigrams and both - as well as running tests to optimise the hyperparameter which sets their minimum overall document frequency. In conjunction with the search for the best pre-processing and vectorization strategies, the best parameters of the four baselines have been found using Grid Search Cross Validation, with the number of folds set at 5 by the original authors and at 10 in the case of this project. As a last step, in order to prevent the ensemble model from overfitting the training set, the Logistic Regression and the Ridge Classifier have not been trained directly on the predictions of the baselines, but on an approximation of the distribution of those predictions, obtained by refitting the sub-models with the cross-validated hyperparameters on different chunks of the training set. Thus, as just explained, N-grams were collected from both corpora provided for the "Profiling Hate Speech Spreaders on Twitter" task at PAN 2021, and 4 sub-models were trained on them; these were then stacked together with a fifth baseline: an XGBoost algorithm, whose input consists of a different category of features (in Buda and Bolonyai's original work, composed mostly of stylistic statistics). The main contribution of this paper lies in the combination of the aforementioned approach by Buda & Bolonyai, which proved effective in profiling fake news spreaders at the PAN 2020 shared task, with a set of features capable of synthesising the personality of the authors and bringing out certain psychological and emotional dynamics that can be useful in accurately classifying Twitter users as hate speech spreaders. The next Section explains these features in more detail.

3.4. Features
As anticipated, the new ensemble model includes a fifth baseline, a gradient boosting algorithm, which receives new feature sets as input. In this project, all the possible combinations were tested between the original stylistic features proposed by Buda and Bolonyai (Section 3.4.4) and the different features related to personality traits (Section 3.4.1), psycho-linguistic patterns (Section 3.4.2), and emotions and polarity (Section 3.4.3) extracted from the text. In the end, the best results were offered: for the English corpus, by the mix of personality-related features with the emotional and sentiment dimensions tagged within the text; for the Spanish corpus, by the combination of the personality scores extracted through the Five Factor Model, the psycho-linguistic patterns found by the LIWC software, the stylistic features proposed by Buda and Bolonyai, and, again, the emotions. These solutions were therefore proposed for the computation of the final accuracy on the test set held by the task organisers. It is important to note that the features describing the authors' personality traits are present in both optimal feature sets: this supports the initial hypothesis, according to which the identification of users who publish violent content can reasonably be conducted through a psychological profile built with computational tools.

3.4.1. Five Factor Model Features
The Five Factor Model [23] (FFM) consists of a process of attributing certain psychological characteristics to an individual according to the so-called 'Big Five' taxonomy, developed by Rothmann and Coetzer [24] as a modern evolution of the dispositional approach to the study of human personality and its consequences on behaviour. This theory - also known as 'trait theory' - stems from the discovery of semantic associations as a result of statistical analyses carried out on samples of personality survey data.
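The scoring idea built on this theory - cosine similarity between the embedding of a user's text and the embeddings of trait-specific benchmark adjectives, as in Neuman and Cohen's method described below - can be sketched in a few lines. The 3-dimensional vectors and the adjective lists here are entirely made up for illustration; the real method uses full word embeddings and the benchmark adjectives selected by Neuman and Cohen.

```python
# Toy sketch of embedding-based personality scoring: the score for a
# trait is the cosine similarity between the averaged embedding of the
# user's tokens and the centroid of that trait's benchmark adjectives.
# Vectors and adjective sets below are fabricated placeholders.
import numpy as np

embeddings = {
    "kind":      np.array([0.9, 0.1, 0.0]),
    "curious":   np.array([0.1, 0.9, 0.2]),
    "organised": np.array([0.0, 0.2, 0.9]),
    "helpful":   np.array([0.8, 0.2, 0.1]),
    "friendly":  np.array([0.7, 0.3, 0.2]),
}
trait_adjectives = {  # hypothetical benchmark sets, one per trait
    "agreeableness": ["kind", "helpful"],
    "openness": ["curious"],
    "conscientiousness": ["organised"],
}

def text_vector(tokens):
    """Average the embeddings of the tokens found in the lexicon."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def personality_scores(tokens):
    """One score per trait: similarity to the trait's adjective centroid."""
    v = text_vector(tokens)
    return {trait: cosine(v, text_vector(adjs))
            for trait, adjs in trait_adjectives.items()}

print(personality_scores(["friendly", "curious", "stranger"]))
```

The resulting score vector (one value per factor) is what feeds the fifth baseline of the ensemble.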
Thanks to this evidence, it was possible to show that the human psychological dimension can be summarised in only five aspects, referred to by words and expressions that recur in natural language when the personality of an individual is described. The five standard factors are: openness to experience, conscientiousness, agreeableness, extraversion, and emotional stability. Hence, this approach argues for the existence of semantic associations between different sets of words and each of the five factors through which it appears possible to synthesise the personality of an individual. From this theoretical basis, Neuman and Cohen's method [25] derives a vector of personality scores which can be provided as input to predictive Machine Learning models. In detail, these personality scores are calculated by computing the cosine similarity between the context-free embedding representations of the input text - written by the selected author - and of a set of benchmark adjectives empirically observed to encode the essence of personality (and selected by Neuman and Cohen).

3.4.2. LIWC Features
The Linguistic Inquiry and Word Count (LIWC) [26] is a software tool developed for Natural Language Processing tasks, able to automatically detect linguistic patterns and map the text into a dense representation composed of 73 psychologically meaningful linguistic categories. This strategy is therefore configured as a lexicon-based method that associates, within a dictionary, a set of predefined classes with several tokens. Using this dictionary, it is possible to tag the sought psycho-linguistic features and thus obtain evidence of the mental and cognitive processes underlying the text of the tweets. The resource used in this project is the LIWC2015 dictionary5, for both languages.
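A minimal sketch of such lexicon-based tagging, using a tiny made-up dictionary in place of the proprietary LIWC2015 resource and a handful of categories instead of the full 73:

```python
# Minimal sketch of LIWC-style tagging: a toy dictionary maps word
# stems to categories, and a text is reduced to a vector of raw
# category counts. The entries below are hypothetical; the real
# LIWC2015 dictionary covers ~6,400 stems across 73 categories.
import re
from collections import Counter

toy_dictionary = {  # stem -> categories (made-up entries)
    "i": ["pronoun", "ppron"],
    "we": ["pronoun", "ppron"],
    "happy": ["affect", "posemo"],
    "hate": ["affect", "negemo"],
    "think": ["cogproc"],
}
categories = ["pronoun", "ppron", "affect", "posemo", "negemo", "cogproc"]

def liwc_like_vector(text):
    """Count how often each category's words occur in the text."""
    counts = Counter()
    for token in re.findall(r"[a-z']+", text.lower()):
        for cat in toy_dictionary.get(token, []):
            counts[cat] += 1
    return [counts[c] for c in categories]

print(liwc_like_vector("I think we should be happy, but I hate Mondays"))
# → [3, 3, 2, 1, 1, 1]
```

In the project, the analogous 73-dimensional count vector is computed per author, over the whole sub-corpus of that author's tweets.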
This tool considers several inflected variants of about 6,400 word stems, as well as certain selected emoticons; each of these linguistic items has been assigned one or more of the 73 categories mentioned above. These psycho-linguistic classes are arranged in a hierarchical structure: three macro-categories have been set up by the creators of the software, which are in turn divided into numerous tags. The three macro-categories with which psycho-linguistic patterns can be labelled are: linguistic dimensions - comprising different types of function words, such as pronouns, articles, prepositions, etc.; grammar - including common verbs and adjectives (like eat, come, free, happy, long), comparisons (greater, best, after), interrogatives (how, when, what), numbers, etc.; and psychological processes - for instance, of an affective, cognitive, social or perceptual nature. In conclusion, the LIWC features on which the Machine Learning models are trained coincide with a 73-dimensional vector indicating the total raw occurrence of the mentioned sub-categories within the sub-corpus associated with each author.

3.4.3. Emotional Features
The aforementioned components have been combined with a vector recording the raw occurrence of eight emotional dimensions tagged within the text by exploiting the association proposed by the NRC lexicon6. In the same way, two dimensions related to sentiment polarity have been extracted.

3.4.4. Original Features
Finally, the potential of the stylistic features proposed by Buda and Bolonyai (originally used for the task of detecting fake news spreaders on Twitter) has also been tested on the new task of profiling hate speech spreaders, both in combination with those presented in the previous sections and alone.

5 The official website of the LIWC software is at: https://liwc.wpengine.com.
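These stylistic features, enumerated in detail next, amount to simple per-user aggregations over the raw tweets. A partial sketch follows; it covers only a subset of the statistics, and lemma-based lexical diversity is approximated here with raw lower-cased word forms, since no lemmatiser is assumed.

```python
# Sketch of user-wise stylistic statistics: tweet-length statistics,
# counts of special elements, and a type-token ratio. A simplified
# subset of the full feature set; lexical diversity is computed over
# raw word forms rather than lemmas.
import re
import statistics

def stylistic_features(tweets):
    word_lens = [len(t.split()) for t in tweets]
    char_lens = [len(t) for t in tweets]
    words = [w.lower() for t in tweets for w in re.findall(r"\w+", t)]
    return {
        "min_words": min(word_lens), "max_words": max(word_lens),
        "mean_words": statistics.mean(word_lens),
        "std_words": statistics.pstdev(word_lens),
        "range_chars": max(char_lens) - min(char_lens),
        "n_mentions": sum(t.count("@") for t in tweets),
        "n_hashtags": sum(t.count("#") for t in tweets),
        "n_urls": sum(t.count("http") for t in tweets),
        "type_token_ratio": len(set(words)) / len(words),
    }

tweets = ["Nice game tonight! #sports",
          "@friend check this http://t.co/x",
          "So tired of all this..."]
print(stylistic_features(tweets))
```

In the task datasets, mentions, URLs and retweets are already replaced with specific tokens, so the real counts are taken over those placeholder tokens rather than raw "@" and "http" substrings as in this toy version.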
In detail, this type of feature is configured as a set of user-wise statistics: (i) minimum, maximum, mean, standard deviation and range of the length - both in words and in characters - of the tweets; (ii) number of retweets and mentions by the author; (iii) count of additional elements: URLs, hashtags, emojis and ellipses; (iv) lexical diversity, calculated as the type-token ratio of lemmas.

4. Results
This Section presents the best results obtained by testing the several new feature sets - derived from all the possible combinations of the features presented in Section 3.4 - on which the fifth baseline is trained. The main criterion for selecting the best models on the English and Spanish corpora - then submitted to the "Profiling Hate Speech Spreaders on Twitter" task at PAN 2021 - is the maximisation of the average accuracy obtained on the 10 folds derived from the split of the training set during the Cross Validation procedure and, in case of very similar results, the minimisation of the variance of these accuracy values. As visible in Table 1, the best result on the English corpus is produced by combining Five Factor Model features with lexicon-based emotional and sentiment dimensions. As for the Spanish dataset, the best performing solution consists of the union of FFM personality scores, LIWC features, the eight emotional dimensions and Buda and Bolonyai's original set. The first system achieved an accuracy of 0.7 on the English test set; the second achieved 0.8 on the Spanish test set. These results underline the validity of the initial hypothesis - that the extraction of personality characteristics is extremely useful for the binary classification of users into hate speech spreaders and non-spreaders - and the validity of an ensemble approach for the same task.

6 The official NRC Lexicon is at: https://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm.
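The selection criterion just described - mean cross-validated accuracy over all feature-group combinations, with ties broken by lower variance - can be sketched as follows. The feature arrays are random stand-ins for the real FFM, LIWC, emotional and stylistic blocks, and the single Logistic Regression stands in for the tuned fifth baseline.

```python
# Sketch of the exhaustive search over feature-group combinations,
# scored by mean CV accuracy with lower variance as tie-breaker.
# All feature blocks are random placeholders for the real ones.
from itertools import combinations

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_users = 40
labels = np.tile([0, 1], n_users // 2)  # balanced toy labels

feature_groups = {  # hypothetical pre-computed blocks, one row per user
    "ffm":  rng.normal(size=(n_users, 5)),    # Five Factor Model scores
    "liwc": rng.normal(size=(n_users, 73)),   # LIWC category counts
    "emo":  rng.normal(size=(n_users, 10)),   # NRC emotions + polarity
    "styl": rng.normal(size=(n_users, 15)),   # stylistic statistics
}

best = None
for k in range(1, len(feature_groups) + 1):
    for combo in combinations(feature_groups, k):
        X = np.hstack([feature_groups[g] for g in combo])
        scores = cross_val_score(LogisticRegression(max_iter=1000),
                                 X, labels, cv=5, scoring="accuracy")
        # Higher mean accuracy wins; ties prefer lower variance.
        key = (scores.mean(), -scores.var())
        if best is None or key > best[0]:
            best = (key, combo)
print("best feature combination:", best[1])
```

With real features, the winning combinations were FFM + emotions + sentiment for English and FFM + LIWC + emotions + stylistic features for Spanish, as reported in Table 1.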
Table 1
Best overall solution for both the English and Spanish corpora.

Language  Features                    Accuracy (Train Set)  Accuracy (Test Set)
EN        FFM + Emo. + Sent.          0.695                 0.7
ES        FFM + LIWC + Emo. + Styl.   0.845                 0.8

5. Conclusion
The ease and speed with which today's Web technologies allow information to be shared have made online social media an extremely dangerous and effective means of disseminating offensive messages, raising the need for automated tools that can stop the flow of toxic information before it contaminates the virtual community. In this paper, hate speech detection is approached from an author-profiling perspective: instead of analysing single contents, the aim is to identify users who tend to publish posts that fall into the category of "hate speech". For the participation in the "Profiling Hate Speech Spreaders on Twitter" task at PAN at CLEF 2021, an ensemble method is proposed, inspired by a previous work by Buda and Bolonyai on the detection of fake news spreaders: four baselines are trained on N-grams, and a fifth one receives a different set of features as input. The main contribution of this paper is to propose as features of the fifth baseline, instead of the descriptive statistics related to writing style originally employed by Buda and Bolonyai, a set of features related to personality traits (defined by the Five Factor Model) and representing psycho-linguistic patterns and emotional dimensions of the text. Thanks to these strategies, the results obtained on the test sets provided by the task organisers at PAN 2021 are 0.7 and 0.8 for the English and Spanish corpora, respectively.

References
[1] Rangel F., Liz De La Peña Sarracén G., Chulvi B., Fersini E., Rosso P. "Profiling Hate Speech Spreaders on Twitter Task at PAN 2021". In: Faggioli G., Ferro N., Joly A., Maistro M., Piroi F. (eds.) "CLEF 2021 Labs and Workshops, Notebook Papers", Conference and Labs of the Evaluation Forum (CLEF 2021), CEUR-WS.org (2021).
[2] Bevendorff J., Chulvi B., Liz De La Peña Sarracén G., Kestemont M., Manjavacas E., Markov I., Mayerl M., Potthast M., Rangel F., Rosso P., Stamatatos E., Stein B., Wiegmann M., Wolska M., Zangerle E. "Overview of PAN 2021: Authorship Verification, Profiling Hate Speech Spreaders on Twitter, and Style Change Detection". In: Selcuk Candan K., Ionescu B., Goeuriot L., Larsen B., Müller H., Joly A., Maistro M., Piroi F., Faggioli G., Ferro N. "12th International Conference of the CLEF Association (CLEF 2021)", Springer, Bucharest, Romania (2021).
[3] Sunstein C. R. "The law of group polarization". In: Journal of Political Philosophy 10 (2002), pp. 175–195.
[4] Nickerson R. S. "Confirmation bias: A ubiquitous phenomenon in many guises". In: Review of General Psychology 2(2) (1998), p. 175.
[5] Del Vicario M. et al. "Echo chambers: Emotional Contagion and Group Polarization on Facebook". In: Proceedings of the National Academy of Sciences 113(3) (2016), pp. 554–559.
[6] Buda J., Bolonyai F. "An Ensemble Model Using N-grams and Statistical Features to Identify Fake News Spreaders on Twitter". In: Cappellato L., Eickhoff C., Ferro N., Névéol A. (eds.) CLEF 2020 Labs and Workshops, Notebook Papers, CEUR-WS.org (2020).
[7] Andjelovic T., Buckels E. E., Paulhus D. L., Trapnell P. D. "Internet trolling and everyday sadism: Parallel effects on pain perception and moral judgment". In: Journal of Personality 87(2) (2019), pp. 328–340.
[8] Buckels E. E. "Probing the Sadistic Minds of Internet Trolls". In: Society for Personality and Social Psychology (2019).
[9] March E., Steele G. "High Esteem and Hurting Others Online: Trait Sadism Moderates the Relationship Between Self-Esteem and Internet Trolling". In: Cyberpsychology, Behavior, and Social Networking 23(7) (2020), pp. 441–446.
[10] Kwok I., Wang Y. "Locate the hate: Detecting tweets against blacks". In: Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, AAAI'13 (2013), pp. 1621–1622.
[11] Greevy E., Smeaton A. F. "Classifying racist texts using a support vector machine". In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '04 (2004), pp. 468–469.
[12] Waseem Z., Hovy D. "Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter". In: Proceedings of the NAACL Student Research Workshop (2016), pp. 88–93.
[13] Salminen J., Almerekhi H., Milenkovic M., Jung S., Kwak H., Jansen B. J. "Anatomy of online hate: developing a taxonomy and machine learning models for identifying and classifying hate in online news media" (2018).
[14] Mikolov T., Sutskever I., Chen K., Corrado G., Dean J. "Distributed representations of words and phrases and their compositionality". In: Proc. NIPS (2013), pp. 3111–3119.
[15] Badjatiya P., Gupta S., Gupta M., Varma V. "Deep learning for hate speech detection in tweets". In: Proceedings of the 26th International Conference on World Wide Web Companion, WWW '17 Companion (2017), pp. 759–760.
[16] Del Vigna F., Cimino A., Dell'Orletta F., Petrocchi M., Tesconi M. "Hate me, hate me not: Hate speech detection on Facebook". In: ITASEC (2017).
[17] Huynh T. V., Nguyen V. D., Nguyen K. V., Nguyen N. L. T., Nguyen A. G. T. "Hate speech detection on Vietnamese social media text using the Bi-GRU-LSTM-CNN model" (2019).
[18] Devlin J., Chang M., Lee K., Toutanova K. "BERT: pre-training of deep bidirectional transformers for language understanding" (2018).
[19] MacAvaney S., Yao H. R., Yang E., Russell K., Goharian N., Frieder O. "Hate speech detection: Challenges and solutions". In: PLoS ONE 14(8) (2019), pp. 1–16.
[20] Nina-Alcocer V. "Vito at HASOC 2019: Detecting hate speech and offensive content through ensembles". In: Mehta P., Rosso P., Majumder P., Mitra M. (eds.) Working Notes of FIRE 2019 - Forum for Information Retrieval Evaluation, Kolkata, India, December 12-15, 2019, CEUR Workshop Proceedings, vol. 2517, pp. 214–220.
CEUR-WS.org (2019).
[21] Nourbakhsh A., Vermeer F., Wiltvank G., Van der Goot R. "Sthruggle at SemEval-2019 task 5: An ensemble approach to hate speech detection". In: Proceedings of the 13th International Workshop on Semantic Evaluation, pp. 484–488. Association for Computational Linguistics, Minneapolis, Minnesota, USA (2019).
[22] Potthast M., Gollub T., Wiegmann M., Stein B. "TIRA Integrated Research Architecture". In: Ferro N., Peters C. (eds.) "Information Retrieval Evaluation in a Changing World", Springer, Berlin Heidelberg New York, DOI: 10.1007/978-3-030-22948-1_5 (2019).
[23] John O. P., Srivastava S. "The Big-Five Trait Taxonomy: History, Measurement, and Theoretical Perspectives". In: Handbook of Personality: Theory and Research (1999), pp. 102–138.
[24] Rothmann S., Coetzer E. P. "The big five personality dimensions and job performance". In: SA Journal of Industrial Psychology 29 (2003).
[25] Neuman Y., Cohen Y. "A Vectorial Semantics Approach to Personality Assessment". In: Scientific Reports 4(1) (2014).
[26] Pennebaker J. W., Boyd R. L., Jordan K., Blackburn K. "The Development and Psychometric Properties of LIWC2015". Tech. rep. (2015).