=Paper=
{{Paper
|id=Vol-2936/paper-159
|storemode=property
|title=INFOTEC-LaBD at PAN@CLEF21: Profiling Hate Speech Spreaders on Twitter through Emotion-based Representations
|pdfUrl=https://ceur-ws.org/Vol-2936/paper-159.pdf
|volume=Vol-2936
|authors=Hiram Cabrera,Sabino Miranda,Eric Tellez
|dblpUrl=https://dblp.org/rec/conf/clef/CabreraMT21
}}
==INFOTEC-LaBD at PAN@CLEF21: Profiling Hate Speech Spreaders on Twitter through Emotion-based Representations==
INFOTEC-LaBD at PAN@CLEF21: Profiling Hate Speech Spreaders on Twitter through Emotion-based Representations
Notebook for PAN at CLEF 2021

Hiram Cabrera, Sabino Miranda-Jiménez and Eric S. Tellez
INFOTEC Centro de Investigación e Innovación en Tecnologías de la Información y Comunicación, Circuito Tecnopolo Sur No. 112, Fracc. Tecnopolo Pocitos, Aguascalientes, Ags., México

Abstract

Nowadays, social media is perhaps one of the most powerful channels of communication among people worldwide. Despite physical constraints, people in a social network communicate efficiently and instantaneously with few restrictions. While this promotes the interchange of ideas and information, it also allows people with dangerous idiosyncrasies to reach wider audiences with little oversight. Automatic hate speech identification in social networks is a Natural Language Processing task dedicated to pointing out users whose published content exhibits this kind of misconduct. In this work, we tackle the PAN21 hate speech identification task through semantic emotion-based models for both the Spanish and English languages. We implement several approaches; one of them is designed to output explainable results based on the user's emotional charge.

Keywords: hate speech, author profiling, emotion-based classification

1. Introduction

Social media is perhaps one of the most powerful channels of communication among people worldwide. Despite physical constraints, people in social networks such as Twitter exchange ideas and information efficiently, instantaneously, and without restriction. In this scenario, hate speech emerges as a problem when communication denigrates a person or a group based on characteristics such as race, color, gender, or sexual orientation.

Automatic identification of hate speech has become popular because of the nature of social networks, and it has been tackled on several fronts. On the one hand, several competitions have run contests at the message level. For example, the HatEval [1] challenge considers the identification of hate speech against immigrants and women on Twitter as a two-class classification problem, i.e., whether a tweet is hateful or not hateful. Likewise, the OffensEval [2] challenge consists of determining whether a given message has offensive content. This event runs several tasks, such as identifying whether a message contains offensive language and categorizing offense types. Among the offense types, OffensEval considers messages containing an insult or a threat to someone and tweets containing non-targeted profanity and swearing; a further task identifies the target, i.e., whether the offensive content is aimed at an individual, a group, or others. On the other hand, author profiling has become a powerful tool for NLP applications, offering multiple approaches to tackle tasks such as author attribute identification [3, 4], sentiment analysis [5, 6], and text classification [7].
In this sense, the PAN @ CLEF21 challenge considers the Profiling Hate Speech Spreaders task [8, 9]: identifying authors who have shared hate speech in the past, according to their published tweets. The competition determines whether a user is a hate speech spreader given a set of 200 tweets per author, for the English and Spanish languages. Approaches to this problem commonly use external information, such as lexicons or datasets from related domains, to enrich the knowledge base, or encode semantic information, generally through word embeddings. In the following sections, we introduce the tools used in our approach to profiling hate speech spreaders.

Sentiment and emotion lexicons

Using lexicons of labeled words is a fundamental tool in many text classification approaches. The main idea behind this method is to use curated information that associates words with information helpful to solve the task. Although dictionary-based classifiers are among the first approaches in the community, their simplicity and inherent explainability keep these methods popular even today. For instance, Nielsen [10] introduces AFINN (https://github.com/fnielsen/afinn), a lexicon associating a vocabulary of more than three thousand words in four languages with a degree of sentiment; it also includes sentiment scores for emojis. Bing Liu [11, 5] also provides a list of English words and their associated sentiment (https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html); currently, that lexicon contains close to 6,800 entries. Mohammad and Turney [12] introduced the NRC Word-Emotion Association Lexicon, also called EmoLex (https://www.saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm); the lexicon contains more than 14,000 English words associated with eight basic emotions: anger, fear, anticipation, trust, surprise, sadness, joy, and disgust. EmoLex also provides scores for both negative and positive sentiments.

All these lexicons have been automatically translated into several languages, and their word counts have grown since their creation. However, one of the main drawbacks of lexicons is that they need to be created by experts and have an inherent dependence on language and domain. Another critical issue is the lack of exhaustiveness due to the explosion of terms (e.g., neologisms, inflections, synonymy, hyponymy, hypernymy). Finally, depending on the domain, lexical variations and errors also negatively affect the performance of the models. Our work is based on the EmoLex dictionary along with semantic representations of the vocabulary to cope with many of the issues that affect plain lexicons.

Word embeddings

Several of these limitations can be addressed with semantic word embeddings, which are lexicons associating each word with a vector in a semantic space. Semantic spaces are learned from a huge non-annotated text corpus, based on the distributional hypothesis of semantics, i.e., words with similar meanings tend to appear in similar contexts. Some examples of word embeddings are fastText [13] and Global Vectors (GloVe) [14]. Semantic language models are more sophisticated methods dedicated to understanding language; they are trained to predict the semantics of a sequence of words over very large non-annotated corpora. Our approach centers on word meanings instead of sentence meanings; therefore, language models are beyond the scope of this contribution. The interested reader is referred to the related literature [15, 16, 17, 18].
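To ground the description above, the following is a minimal Python sketch of how an EmoLex-style lexicon could be read into per-emotion word lists. The file name and the tab-separated word/emotion/flag layout are assumptions about the distributed word-level lexicon file, not something stated in this paper:

<pre>
from collections import defaultdict

def load_emolex(path="NRC-Emotion-Lexicon-Wordlevel-v0.92.txt"):
    """Read an EmoLex-style file with lines 'word<TAB>emotion<TAB>0|1'
    into a mapping emotion -> list of associated words (assumed layout)."""
    emolex = defaultdict(list)
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # skip blank lines
            word, emotion, flag = line.rstrip("\n").split("\t")
            if flag == "1":  # keep only positive word-emotion associations
                emolex[emotion].append(word)
    return emolex

# A word tied to several emotions simply appears in several lists,
# e.g. emolex["disgust"] and emolex["anger"] may share entries.
</pre>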
FastText (https://fasttext.cc/) is a word embedding with a fast construction procedure. It learns word distributional semantics using small windows around words. In contrast to other approaches, it is designed to handle out-of-vocabulary words using subwords: small substrings that compose words and that are used to compute word vectors whenever a word is unknown. Using fastText models, we cope with the vocabulary diversity found in social networks.

Machine learning models

A popular way to tackle author profiling problems is supervised learning; in this approach, a set of labeled examples is given to an algorithm that creates a model able to label never-seen examples. PAN @ CLEF21 asks for profiling of hate speech spreaders using users' textual information retrieved from Twitter; therefore, our dataset examples are lists of text messages and their associated labels. Classical supervised learning receives a set of examples $(X, y)$ of the form $X = x_1, x_2, \cdots, x_n$ and $y = y_1, y_2, \cdots, y_n$, where $x_i$ is a vector of some dimension $\delta$, i.e., $x_i \in \mathbb{R}^{\delta}$, and $y_i$ is a categorical value that represents a valid class. Hence, profiling methods based on supervised learning need to transform input messages into real-valued vectors; $y$ is often obtained from human annotators who assign a label to each example.

From a general perspective, the vectorization can be based on how authors write, the actual content, or the meaning of their messages. In more detail, we can capture how authors write using stylometric features [19]. Content-based representations follow a generic procedure that relies on preprocessing the text, tokenizing it with a variety of possible schemes to create bags of words, and then applying a weighting scheme to vectorize [7, 20]. Other, more sophisticated approaches use word embeddings or language models to vectorize using the semantics of the authors' messages. Section 2 shows how our approach handles this step.

Once the dataset is in the $(X, y)$ form, we need to learn from these examples to obtain a predictive model. Different types of machine learning models are used for the author profiling task [21]. For instance, we evaluate our approach with Naïve Bayes, K-nearest neighbors, Support Vector Machines, Logistic Regression, and Gradient Boosting, among others. The details of these methods are studied in [22, 23], and the precise implementations are documented in [24].

Our contribution

This notebook tackles the problem of profiling hate speech spreaders, in the Spanish and English languages, through their messages published on social media. In particular, we benchmark our model on the homonymous task at PAN @ CLEF21. There is one dataset per language; each dataset contains 200 different authors, each author is represented by two hundred messages, and each author is labeled as a hate speech spreader or not.

Our approach is designed to be explainable through the projection of users into an emotion space induced by EmoLex. It can support variations of the same concepts and lexical variations of the same word thanks to the semantic representations provided by fastText. Multilingual support for the Spanish and English languages is straightforward since translations of EmoLex and pre-trained fastText models exist for both languages. We test our emotion-based encoding with seven different classifiers and provide a brief statistical analysis in the experimental results section.
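The claim above, that fastText's subwords let us handle lexical variations, can be illustrated with a short sketch using the official fasttext Python bindings; the model file name refers to the standard pre-trained English vectors and is an assumption of this example:

<pre>
import fasttext

ft = fasttext.load_model("cc.en.300.bin")  # pre-trained English model (assumed path)

# 'haaaters' is unlikely to appear in the training vocabulary, yet fastText
# still composes a vector for it from its subword (character n-gram) units.
subwords, _ = ft.get_subwords("haaaters")
vec = ft.get_word_vector("haaaters")
print(len(subwords), vec.shape)  # a handful of n-grams, and a (300,) vector
</pre>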
Roadmap

This section contextualizes our participation in PAN @ CLEF21. Section 2 details the construction and prediction stages of our model. The experimental setup and results are described in Sections 3 and 4. Our final comments and conclusions are given in Section 5.

2. Emotion-based modeling of users

Our model is general and straightforward. However, its construction needs a set of emotion prototypes based on an emotion lexicon and word embeddings; we use EmoLex and pre-trained fastText embeddings for both the Spanish and English languages.

Figure 1 illustrates the general flow of the prototypes' computation. First, the procedure segments words per emotion; EmoLex associates each word with ten emotions and sentiments: anger, anticipation, disgust, fear, joy, negative, positive, sadness, surprise, and trust. Note that a word linked with several emotions contributes to each of them. The emotion sub-lexicons are then used to create a unique prototype vector that summarizes each emotion, computed from the embeddings of its words in the companion fastText model:

$$p_i^* = \sum_{t \in \text{emotion}_i} \frac{w_t}{\lVert w_t \rVert}, \qquad p_i = \frac{p_i^*}{\lVert p_i^* \rVert},$$

where $\text{emotion}_i$ is the set of words of the $i$-th emotion found in the emotion lexicon and $w_t$ is the embedding vector of term $t$. Each prototype is a 300-dimensional vector, one per emotion. The emotion prototypes are stored as $P = \{p_1, p_2, \cdots, p_{10}\}$ and are used to encode users' messages so that we can train models and predict new instances.

Figure 1: Computing emotion-prototypes for our modeling.

Once the emotion prototypes are created, we can train and predict. Figure 2 shows the flow that transforms an author into an emotion vector. Each author is represented as the sequence of their messages; these messages are plain text that is normalized and partitioned into a list of tokens. The normalization removes diacritic symbols, punctuation signs, duplicate letters, stop words, URLs, and emojis; along with it, all user mentions are mapped to USER. Messages are then tokenized as unigrams. Finally, these unigrams form a sequence that is used to create a 300-dimensional vector.

The emotion prototypes are used to map users' messages into an emotion space through the cosine similarity between prototypes and user vectors. First, we transform the sequence of words into a single vector using the companion word embedding; then, we use this vector and the prototypes $P$ to create a 10-dimensional vector consumed by a classifier at the training and prediction stages.
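As a concrete sketch of the prototype computation in Figure 1, assuming the emolex mapping and the ft fastText model from the earlier sketches:

<pre>
import numpy as np

def emotion_prototype(words, ft):
    """p*_i = sum_t w_t / ||w_t|| over the emotion's words; p_i = p*_i / ||p*_i||."""
    acc = np.zeros(ft.get_dimension())
    for w in words:
        v = ft.get_word_vector(w)
        norm = np.linalg.norm(v)
        if norm > 0:
            acc += v / norm  # accumulate unit-length word vectors
    return acc / np.linalg.norm(acc)  # normalize the sum to unit length

# One 300-dimensional prototype per emotion/sentiment category.
prototypes = {emo: emotion_prototype(ws, ft) for emo, ws in emolex.items()}
</pre>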
3. Experimental setup

Our implementation of the emotion space detailed in §2 is based on a number of well-known libraries in the Python ecosystem: NLTK [25] for preprocessing text messages, Scikit-Learn [24] for creating features and machine learning algorithms, and fastText [26] for mapping tweets to the semantic space. In particular, we use the model pre-trained on 600 billion tokens of Common Crawl for the English task and the model from [27] for the Spanish task. We use the NRC Lexicon [12] (EmoLex) to create the lists of representative keywords; it ships as a multilanguage pack with support for the English and Spanish languages. We ran our tests on a computer with a 3.4 GHz quad-core Intel Core i7, 32 GB of RAM, and macOS Catalina 10.15.7.

Preprocessing

The data provided by the PAN organizers consist of 200 XML documents for each of English and Spanish; each document corresponds to a user and includes 200 tweets written by that user.

As shown in Figure 2, each user's messages are first preprocessed and normalized: lower-casing; removal of diacritic and punctuation symbols, duplicated letters, stop words, emoticons, and URLs. Our implementation uses fastText's get_sentence_vector to vectorize the user's messages. The normalized messages are represented as a single vector:

$$\check{u} = \sum_{t \in \text{text}} \frac{w_t}{\lVert w_t \rVert}, \qquad \hat{u} = \frac{\check{u}}{\lVert \check{u} \rVert},$$

where text is the collection of words in the user's messages and $w_t$ is the 300-dimensional vector of term $t$ in the word embedding. Each user is then represented in the emotion space as

$$u = [\cos(p_1, \hat{u}), \cos(p_2, \hat{u}), \cdots, \cos(p_{10}, \hat{u})], \qquad \cos(x, y) = \frac{\sum_i x_i \cdot y_i}{\lVert x \rVert \, \lVert y \rVert},$$

where $p_i$ is an emotion prototype and $\hat{u}$ the 300-dimensional user vector; the ten attributes of $u$ correspond to anger, anticipation, disgust, fear, joy, negative, positive, sadness, surprise, and trust.

Figure 2: Projecting users into the emotion-space.

We create the emotion prototypes and project our datasets as described in §2; the preprocessing of texts is performed with the help of the NLTK package (https://www.nltk.org/). Therefore, we characterize every user with a 10-dimensional vector, and this representation is the input for the machine learning models at the training and prediction stages.
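A minimal sketch of the projection of Figure 2, assuming the prototypes and ft objects from the earlier sketches and that the NLTK normalization (stop-word, URL, and emoji removal) has already been applied to the tweets:

<pre>
import numpy as np

def project_user(tweets, prototypes, ft):
    """Map a user's (already normalized) tweets to the 10-dim emotion space."""
    # get_sentence_vector averages norm-scaled word vectors, which matches
    # the construction of u-check up to a constant factor; the factor
    # vanishes after the unit-length normalization below.
    text = " ".join(t.lower() for t in tweets).replace("\n", " ")
    u = ft.get_sentence_vector(text)
    u = u / np.linalg.norm(u)  # u-hat: unit-length user vector
    # Cosine against unit-length prototypes reduces to a dot product.
    return np.array([float(p @ u) for p in prototypes.values()])
</pre>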
Model selection

In order to participate in the PAN contest, we developed several models using the training data provided by the organizers. We split the original training dataset into two subsets to test our models, randomly assigning 70%/30% of the training dataset to training/test splits, each of which forms a balanced subset.

We considered several classifiers for creating our emotion-based models: Naive Bayes (NB), K-Nearest Neighbors (KNN), both linear and non-linear Support Vector Machines (SVM), Nearest Centroid (NC), Logistic Regression (LR), and Gradient Boosting (GB), all from the Scikit-Learn package. We ran a hyperparameter optimization process to select the models that perform best; nonetheless, we used default parameters for Logistic Regression. Table 1 lists the parameter grid used for the model selection.

Table 1: Grid-searched hyperparameters for the machine learning models.

LinearSVM: iterations (max_iter) 5000; random number generator (random_state) 42.
KNN: number of neighbors (n_neighbors) {1, 2, 3, 4, 5, 6, 7}; power (p) {1, 2, 5}; weight function (weights) {uniform, distance}; algorithm to compute NN (algorithm) {auto, ball_tree, kd_tree, brute}; leaf size (leaf_size) {10, 20, 30, 50}.
SVM: regularization coefficient (C) {0.1, 1, 10, 100, 1000}; kernel type (kernel) {linear, rbf}; kernel coefficient for rbf (gamma) {1, 0.1, 0.01, 0.001, 0.0001}.
NB: stability (var_smoothing) 1 to 10^-9 with 100 equally spaced points.
NC: threshold for shrinking centroids (shrink_threshold) 0 to 1 with linear steps of 0.01; distance (metric) {euclidean, manhattan}.
GB: number of estimators (n_estimators) {10, 100, 1000}; learning rate (learning_rate) {0.001, 0.01, 0.1}; subsample ratio (subsample) {0.5, 0.7, 1.0}; maximum depth of a tree (max_depth) {3, 7, 9}.

We used grid search over this space and weighted each model using 5-fold, 3-repetition stratified k-fold cross-validation. For the final selection, we chose, for each language, the parameter combination with the highest accuracy during cross-validation; the final models were then fitted on the entire training split. The best hyperparameters are summarized in Table 2.

Table 2: The best hyperparameters for the machine learning models.

English (EN):
  LinearSVM: max_iter = 5000, random_state = 42
  KNN: n_neighbors = 6, p = 5, leaf_size = 10
  SVM: C = 1000, kernel = rbf, gamma = 1
  NB: var_smoothing = 0.8111308307
  NC: shrink_threshold = 0.0, metric = manhattan
  GB: n_estimators = 1000, learning_rate = 0.001, subsample = 0.5

Spanish (ES):
  LinearSVM: max_iter = 5000, random_state = 42
  KNN: n_neighbors = 7, p = 1, leaf_size = 10
  SVM: C = 100, gamma = 1
  NB: var_smoothing = 0.2310129700
  NC: shrink_threshold = 0.93
  GB: learning_rate = 0.001, subsample = 0.5

After every learning algorithm fitted a model for each language, we evaluated the models on the held-out test subset to predict the class labels, obtaining an estimate of how well they perform on unseen data; the accuracies achieved are shown later in Table 3. Finally, based on these test results, we chose the models submitted to the Hate Speech Spreader task.
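A sketch of the selection loop for one model family (the SVM grid of Table 1); the stand-in X and y only mimic the shape of our emotion-space encodings and labels:

<pre>
import numpy as np
from sklearn.model_selection import (GridSearchCV, RepeatedStratifiedKFold,
                                     train_test_split)
from sklearn.svm import SVC

# Stand-in data with the shape of our encodings: 200 authors x 10 emotions.
X = np.random.rand(200, 10)
y = np.random.randint(0, 2, 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=42)
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=42)
grid = GridSearchCV(SVC(),
                    param_grid={"C": [0.1, 1, 10, 100, 1000],
                                "kernel": ["linear", "rbf"],
                                "gamma": [1, 0.1, 0.01, 0.001, 0.0001]},
                    scoring="accuracy", cv=cv)
grid.fit(X_tr, y_tr)
print(grid.best_params_, grid.best_estimator_.score(X_te, y_te))
</pre>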
4. Experimental results

This section presents the experimental results of our approach on the PAN @ CLEF21 Hate Speech Spreader profiling task. As mentioned, we divided the dataset to perform model selection; therefore, we show the results both for this internal process and on the official gold standard.

Figure 3 shows the per-model accuracy distributions obtained with 5-fold, 3-repetition stratified k-fold cross-validation for both English and Spanish. The data points of the SVM classifier for English and the LSVM classifier for Spanish consistently hover around the center values; thus, their predictions have less variation. Likewise, these two models have competitive medians compared to the others.

Figure 3: Accuracy distribution of our models using the cross-validation partitions (a: English, b: Spanish). The higher, the better.

The performance of the machine learning models is shown in Table 3. Note that the highest cross-validation accuracy is achieved by the SVM with RBF kernel for English and by LinearSVC for Spanish. Accordingly, we chose these two models based on the model evaluation (see Fig. 3) and the accuracies in Table 3.

Table 3: Accuracy for the different classifiers applied to our emotion-based author encodings. The higher, the better.

Model       EN   ES
Linear SVC  71%  73%
SVM rbf     78%  70%
KNN         73%  66%
NC          60%  71%
NB          61%  65%
LR          60%  72%
GB          75%  67%

Table 4: Accuracies achieved during the cross-validation process and on the test set.

Model (language)  C-V (training set)  PAN@CLEF21 (test set)
SVM RBF (EN)      78%                 62%
LSVM (ES)         73%                 78%
Average           75%                 70%

As Table 4 shows, the average accuracy of our models was 5% lower on the official test set than in the cross-validation results.

Analysis of emotion-based hate speech spreader models

Gradient boosting is an ensemble of decision trees that, instead of accessing data through a kernel function, accesses attributes directly. Decision trees compute the importance of each attribute as part of their construction algorithm, and this importance can be used outside of the decision tree context to gain insight into the problem. In particular, we use the Gini importance [24] to measure the influence of each attribute on the final decision.

Figure 4 shows the importance of each attribute as seen by our models. For example, in the English case (Fig. 4a) there is a remarkable gap between the four most important features and the rest: Trust, Disgust, Anticipation, and Joy are the most critical predictors, with Trust and Disgust standing out. On the other hand, Fig. 4b shows the attribute importances for the Spanish language; Disgust dominates the other features, clearly standing out for determining hate speech spreaders. Our Spanish model thus closely mimics Plutchik's classification of emotions [28], which links hatred to three primary emotions: Disgust, Anger, and Fear.

Figure 4: Feature importance of the gradient boosting decision tree as applied to our emotion-based author encoding (x-axis: relative importance; a: English, b: Spanish). The higher, the more important.
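The Gini importances discussed above come directly from the fitted Scikit-Learn estimator; a sketch follows, with the GB hyperparameters taken from Table 2 (EN) and stand-in data in place of our encodings:

<pre>
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

EMOTIONS = ["anger", "anticipation", "disgust", "fear", "joy",
            "negative", "positive", "sadness", "surprise", "trust"]

X = np.random.rand(200, 10)        # stand-in for the emotion encodings
y = np.random.randint(0, 2, 200)   # stand-in spreader labels

gb = GradientBoostingClassifier(n_estimators=1000, learning_rate=0.001,
                                subsample=0.5)  # EN settings from Table 2
gb.fit(X, y)
# feature_importances_ holds the impurity-based (Gini) importance per attribute.
for name, imp in sorted(zip(EMOTIONS, gb.feature_importances_),
                        key=lambda pair: -pair[1]):
    print(f"{name:12s} {imp:.3f}")
</pre>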
Figure 5 shows the emotion and sentiment distributions per class, computed on the PAN @ CLEF21 training set. The top row shows the distributions for the English language; here we observe that non hate speech spreaders (negative examples) have larger variations in their emotions compared with those in the positive class. Note that several median values (the white point in the box inside each violin shape) also move dramatically; this effect is most noticeable for the features with the highest Gini importance (see Fig. 4a). Figures 5c and 5d illustrate how emotions are distributed for Spanish-language writers in the negative and positive classes. Again, we observe that the mass concentrates around the median for the positive class; we also observe a noticeable shift of the median in attributes like Anger, Disgust, and Fear, which could be associated with a higher emotional charge among Spanish-language hate speech spreaders.

Figure 5: Distribution of the emotion-space (computed with the emotion projection procedure, see Fig. 2) for both English and Spanish languages: (a) negative class, English; (b) positive class, English; (c) negative class, Spanish; (d) positive class, Spanish.

5. Conclusions

This paper proposed semantic emotion-based models for both the Spanish and English languages to cope with the Profiling Hate Speech Spreaders challenge at PAN @ CLEF21. Our approach was designed to be explainable through the projection of users into an emotion space induced by EmoLex, supporting lexical variations of the same word through semantic representations such as fastText. We conducted a broad model selection study to find the best performing algorithms for our approach; in this sense, we selected the SVM with RBF kernel for English and Linear SVC for Spanish as our emotion-based models. Unfortunately, the accuracy of our models was 5% lower on the test set than in the cross-validation results. There is still room to improve our work; our next steps include exploring different word embeddings and lexicons with different emotion classifications to improve the overall effectiveness of the classification process. Due to our emotion-centered modeling, we noticed that the Spanish model closely mimics Plutchik's classification of emotions for hatred; this behavior requires a more profound exploration to describe its effect at a finer scale.

References

[1] V. Basile, C. Bosco, E. Fersini, D. Nozza, V. Patti, F. Rangel, P. Rosso, M. Sanguinetti, SemEval-2019 Task 5: Multilingual detection of hate speech against immigrants and women in Twitter, in: Proceedings of the 13th International Workshop on Semantic Evaluation (SemEval-2019), Association for Computational Linguistics, 2019.
[2] M. Zampieri, P. Nakov, S. Rosenthal, P. Atanasova, G. Karadzhov, H. Mubarak, L. Derczynski, Z. Pitenis, Ç. Çöltekin, SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020), in: Proceedings of SemEval, 2020.
[3] F. Rangel, Author profile in social media: Identifying information about gender, age, emotions and beyond, in: Proceedings of the 5th BCS IRSG Symposium on Future Directions in Information Access, 2013, pp. 58–60.
[4] F. M. R. Pardo, P. Rosso, Overview of the 7th author profiling task at PAN 2019: Bots and gender profiling in Twitter, in: CLEF (Working Notes), volume 2380 of CEUR Workshop Proceedings, CEUR-WS.org, 2019.
[5] L. Bing, Sentiment analysis: mining opinions, sentiments, and emotions, Cambridge University Press, 2015.
[6] M. E. Aragón, A. P. López-Monroy, L. C. González, M. Montes-y-Gómez, Attention to emotions: Detecting mental disorders in social media, in: TSD: International Conference on Text, Speech, and Dialogue, volume 12284, Springer, Cham, 2020, pp. 231–239.
[7] E. S. Tellez, D. Moctezuma, S. Miranda-Jiménez, M. Graff, An automated text categorization framework based on hyperparameter optimization, Knowledge-Based Systems 149 (2018) 110–123.
[8] J. Bevendorff, B. Chulvi, G. L. D. L. P. Sarracén, M. Kestemont, E. Manjavacas, I. Markov, M. Mayerl, M. Potthast, F. Rangel, P. Rosso, E. Stamatatos, B. Stein, M. Wiegmann, M. Wolska, E. Zangerle, Overview of PAN 2021: Authorship Verification, Profiling Hate Speech Spreaders on Twitter, and Style Change Detection, in: 12th International Conference of the CLEF Association (CLEF 2021), Springer, 2021.
[9] F. Rangel, G. L. D. L. P. Sarracén, B. Chulvi, E. Fersini, P. Rosso, Profiling Hate Speech Spreaders on Twitter Task at PAN 2021, in: G. Faggioli, N. Ferro, A. Joly, M. Maistro, F. Piroi (Eds.), CLEF 2021 Labs and Workshops, Notebook Papers, CEUR-WS.org, 2021.
[10] F. Å. Nielsen, A new ANEW: Evaluation of a word list for sentiment analysis in microblogs, in: M. Rowe, M. Stankovic, A.-S. Dadzie, M. Hardey (Eds.), MSM, volume 718 of CEUR Workshop Proceedings, CEUR-WS.org, 2011, pp. 93–98.
[11] M. Hu, B. Liu, Mining and summarizing customer reviews, in: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, 2004, pp. 168–177.
[12] S. M. Mohammad, P. D. Turney, Crowdsourcing a word-emotion association lexicon, Computational Intelligence 29 (2013) 436–465.
[13] T. Mikolov, E. Grave, P. Bojanowski, C. Puhrsch, A. Joulin, Advances in pre-training distributed word representations, in: Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018), 2018.
[14] J. Pennington, R. Socher, C. D. Manning, GloVe: Global vectors for word representation, in: Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532–1543. URL: http://www.aclweb.org/anthology/D14-1162.
[15] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, arXiv preprint arXiv:1706.03762 (2017).
[16] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[17] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, Language models are unsupervised multitask learners, OpenAI blog 1 (2019) 9.
[18] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al., Language models are few-shot learners, arXiv preprint arXiv:2005.14165 (2020).
[19] S. Ashraf, O. Javed, M. Adeel, H. Iqbal, R. M. A. Nawab, Bots and gender prediction using language independent stylometry-based approach, in: CLEF (Working Notes), 2019.
[20] F. Rangel, P. Rosso, M. Potthast, B. Stein, Overview of the 5th author profiling task at PAN 2017: Gender and language variety identification in Twitter, Working notes papers of the CLEF (2017) 1613–0073.
[21] J. Bevendorff, B. Ghanem, A. Giachanou, M. Kestemont, E. Manjavacas, I. Markov, M. Mayerl, M. Potthast, F. M. R. Pardo, P. Rosso, G. Specht, E. Stamatatos, B. Stein, M. Wiegmann, E. Zangerle, Overview of PAN 2020: Authorship verification, celebrity profiling, profiling fake news spreaders on Twitter, and style change detection, in: A. Arampatzis, E. Kanoulas, T. Tsikrika, S. Vrochidis, H. Joho, C. Lioma, C. Eickhoff, A. Névéol, L. Cappellato, N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction - 11th International Conference of the CLEF Association, CLEF 2020, Thessaloniki, Greece, September 22-25, 2020, Proceedings, volume 12260 of Lecture Notes in Computer Science, Springer, 2020, pp. 372–383.
[22] G. James, D. Witten, T. Hastie, R. Tibshirani, An introduction to statistical learning, volume 112, Springer, 2013.
[23] N. Cristianini, J. Shawe-Taylor, et al., An introduction to support vector machines and other kernel-based learning methods, Cambridge University Press, 2000.
[24] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al., Scikit-learn: Machine learning in Python, the Journal of Machine Learning Research 12 (2011) 2825–2830.
[25] S. Bird, E. Klein, E. Loper, Natural language processing with Python: analyzing text with the natural language toolkit, O'Reilly Media, Inc., 2009.
[26] A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, T. Mikolov, FastText.zip: Compressing text classification models, CoRR abs/1612.03651 (2016). URL: http://arxiv.org/abs/1612.03651.
[27] E. Grave, P. Bojanowski, P. Gupta, A. Joulin, T. Mikolov, Learning word vectors for 157 languages, in: Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018), 2018.
[28] R. Plutchik, The emotions: Facts, theories and a new model, American Journal of Psychology 77 (1964) 518.