UACh at MEX-A3T 2019: Preliminary Results on Detecting Aggressive Tweets by Adding Author Information Via an Unsupervised Strategy

Marco Casavantes, Roberto López, and Luis Carlos González

Universidad Autónoma de Chihuahua, Facultad de Ingeniería, Chihuahua, Chih., México
{p271673,jrlopez,lcgonzalez}@uach.mx

Abstract. In this paper we describe our participation in the Aggressiveness Detection Track of the second edition of MEX-A3T. We evaluate different strategies for text classification, including classifiers such as Support Vector Machines and a Multilayer Perceptron trained on n-grams (words and characters) and word embeddings. We also study the inclusion of features intended to give context to the text messages, exploring whether people attack verbally in different ways depending on their traits and overall environment. Preliminary results show that our strategy is competitive for detecting aggression in tweets, ranking in 2nd place with respect to the participants of 2018 and 2019.

Keywords: Spanish text classification · Aggressiveness Detection · Multilayer Perceptron

1 Introduction

Technology has changed the way in which people communicate with each other, giving rise to new services such as social networks, where an informal style of communication is used. Such social networks, though, present several challenges in keeping communication channels open to the free sharing of ideas. The intolerance and aggressiveness of certain users affect the experience of other consumers and of people interested in joining these communities and their conversations. Not being face to face in the communication channel, and even being able to preserve anonymity, encourages these individuals to express themselves offensively.
However, the volume of messages sent daily, the growth of online communities, and the ease of access to these social networks make the moderation of communication channels a difficult task to handle by conventional means; as people increasingly communicate online, the need for high-quality automated abusive language classifiers becomes much more profound [1].

One of the goals of the second edition of MEX-A3T [2] is to tackle this problem and further the research on this important NLP task: the detection of aggressive tweets in Mexican Spanish. In this work we evaluate strategies proposed before, such as the use of lexical features through TF-IDF representations, together with different approaches for adding features that give context to each text. Surprisingly, even tackling the task with such a basic approach, our proposal offers competitive results, just slightly behind the top performer of this competition in 2018 and 2019, INGEOTEC. Furthermore, we investigate how to incorporate authors' traits by using unsupervised methods and attempting to include this information as features, based on the hypothesis that aggression takes different forms depending on the author's context.

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). IberLEF 2019, 24 September 2019, Bilbao, Spain. Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019).
2 Proposed Method

2.1 Data Pre-processing

After loading the train and test sets, we strip the tweets of non-alphanumeric characters, keeping only some relevant Spanish characters (á, é, í, ó, ú, ñ, and ü), and then lowercase all words. We noticed that both sets contain many different terms expressing laughter (mainly due to how many times "ja" is repeated when the word "jaja" appears, and because of typos), which led us to replace every word containing "jaja" with "risa" (laugh), with the purpose of decreasing the number of terms that represent this emotion. It is worth mentioning that we also created and experimented on a version of the datasets where emojis were converted to text and hashtags were separated into words (e.g., ":)" would turn into "smiling face", and "#FelizMiércoles" would become "feliz miércoles"); however, most hashtags were wrongly separated and the performance of the classifiers decreased when these steps were incorporated, so they were discarded.

2.2 Features

We conducted our research using the following features:

Lexical: We use word n-grams (n = 1, 2) and char n-grams (n = 3, 4) as features; this collection of terms is weighted with its term frequency-inverse document frequency (TF-IDF).

Document Embeddings: The objective was to represent the tweets through word embeddings [3] and try different classifiers with these new features. Each text message was converted to a vector of size 300 (the mean of the vectors of its words). The word model for Spanish was computed with fastText [4] and downloaded from [5].
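The pre-processing steps above can be sketched as follows. The exact regular expressions used by the authors are not given in the paper, so the patterns below are assumptions that match the described behaviour:

```python
import re

def preprocess(tweet: str) -> str:
    """Lowercase, keep only alphanumerics and relevant Spanish characters,
    and normalize laughter variants ("jaja", "jajaja", ...) to "risa"."""
    text = tweet.lower()
    # Keep letters, digits, accented vowels, ñ, ü and spaces; drop everything else.
    text = re.sub(r"[^a-z0-9áéíóúñü ]", " ", text)
    # Replace every word containing "jaja" with "risa" (laugh).
    words = ["risa" if "jaja" in w else w for w in text.split()]
    return " ".join(words)

print(preprocess("JAJAJA qué risa, #FelizMiércoles :)"))  # -> "risa qué risa felizmiércoles"
```

Note that, as in the paper's final configuration, this sketch keeps hashtags as single tokens instead of splitting them into words.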
User Occupation and Location predictions: Although we attempted several strategies to obtain unsupervised author profiles for each document [6], we ended up using the output of the system developed by [7] as predictions of occupation and location values, to explore possible differences in vocabulary according to the profile of the author of the message.

Grouping tweets by theme: An implementation of Self-Organizing Maps (SOM) called MiniSom [8] was used as a clustering strategy, aiming to find groups in the collection of texts based on underlying or non-explicit features. The clustering was performed both including all words and ignoring swear words (to reduce noise and focus on thematic terms). After training the network, we computed the coordinates assigned to each tweet on the map and used these as new features.

Perspicuity score / Inflesz scale: Based on [9], we adapted the idea of capturing the quality of each tweet by using a modified Flesch Reading Ease score (since that test only applies to text written in English), called the Perspicuity score, and its equivalence on the Inflesz scale, following the equation described in [10], where the number of sentences is fixed at one.

All the extra categorical features mentioned above were concatenated following a One-Hot Encoding scheme.

3 Experiments and Results

The datasets were provided by the MEX-A3T team. Table 1 shows the distribution of the training and test partitions for Spanish tweets.

Table 1. Data distribution for the Spanish tweets corpus

Class           Training  Test
Aggressive          2727   N/A
Non-aggressive      4973   N/A
Total               7700  3156

We separated the training set into 67% for training and 33% for validation to evaluate our experiments with the different combinations of features discussed in Section 2.2.
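The perspicuity score described above can be sketched with the Szigriszt-Pazos formula, which is the equation discussed in [10]; the syllable counter below is a naive vowel-group approximation (not the authors' implementation), and the Inflesz band cut-offs are taken from the cited scale, so treat both as assumptions:

```python
import re

def count_syllables(word: str) -> int:
    # Naive Spanish syllable count: each run of vowels approximates one syllable.
    return max(1, len(re.findall(r"[aeiouáéíóúü]+", word.lower())))

def perspicuity(text: str) -> float:
    """Szigriszt-Pazos perspicuity score with the number of sentences fixed at 1,
    as done in the paper for single tweets."""
    words = re.findall(r"[a-záéíóúñü]+", text.lower())
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 62.3 * (syllables / len(words)) - len(words)  # sentences = 1

def inflesz_band(score: float) -> str:
    # Inflesz scale bands (cut-offs as listed on the legible.es scale).
    if score > 80: return "muy fácil"
    if score > 65: return "bastante fácil"
    if score > 55: return "normal"
    if score > 40: return "algo difícil"
    return "muy difícil"
```

For example, `inflesz_band(perspicuity("hola a todos"))` yields "muy fácil", since very short, low-syllable tweets score high on this scale.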
We started our research by recreating the baselines described in the overview of the first edition of MEX-A3T [11], particularly focusing on the character trigrams baseline, as it holds the best performance in comparison to the BoW baseline. We trained Linear Support Vector Machines and a Multilayer Perceptron as classifiers for this task, and decided to use the perceptron as the final system for submitting our predictions, since it exhibited the best results in the validation stage, as shown in Table 2, where we report the macro F1-score and the F-measure over the aggressive class. We performed all modeling regarding the creation of TF-IDF feature matrices and SVM classifiers using scikit-learn [12]; for the Multilayer Perceptron, we used the implementation described in [13]. There was only one instance where this perceptron could not be trained with word embeddings, so we tried another configuration with the MLPClassifier from scikit-learn, obtaining low scores similar to those of the LinearSVM, and therefore cast this approach aside.

3.1 Results

As stated before, the Multilayer Perceptron was chosen as the final system; however, because of time and memory constraints we had to train this model using only character n-grams of range [3,4], even though later results showed better performance with n-grams of range [3,5]. Table 3 lists the top five final rankings for the aggressiveness detection task of 2019; more details of all results of the contest are given in [2]. It is interesting to observe that even though our system relies on such a basic approach, it is able to compete face-to-face with INGEOTEC, a model based on an ensemble of classifiers which specially tailors discriminative features for aggressiveness detection via a Genetic Programming strategy.
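The lexical baseline described above (TF-IDF over word and character n-grams feeding a LinearSVM) can be sketched with scikit-learn. The paper does not give the vectorizer parameters, so those below are assumptions, and the tweets are made-up toy data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC

# Char n-grams of range [3,4] and word n-grams of range [1,2], both TF-IDF weighted.
features = FeatureUnion([
    ("char", TfidfVectorizer(analyzer="char", ngram_range=(3, 4))),
    ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
])
model = Pipeline([("features", features), ("clf", LinearSVC())])

# Toy illustration with invented tweets (1 = aggressive, 0 = non-aggressive).
X = ["eres un idiota", "qué bonito día", "cállate imbécil", "me gusta el café"]
y = [1, 0, 1, 0]
model.fit(X, y)
```

The extra categorical features (occupation, location, SOM coordinates, Inflesz band) would be one-hot encoded and concatenated to this sparse matrix, e.g. via another branch in the `FeatureUnion`.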
3.2 Analysis

To break down our results, we started by extracting the 10 most valuable n-grams at the character level, separated by length, as shown in Table 4. With respect to the aggressive class, our final configuration had more false negatives than false positives, meaning that it was easier for an aggressive tweet to be misclassified as non-aggressive than the other way around. Despite running several experiments and adding new features to give context to the tweets, in hopes of improving classification in this task, these strategies showed, at best, almost unnoticeable changes in the results, and hindered classification at worst. After manual inspection, we observed that this could have happened because:

– Occupation and Location predictions did not group the messages in a balanced way; in fact, most tweets fell under only one of the eight available categories for occupation and one of the six categories for location.
– SOM coordinates did not enhance the classification scores because the clusters were capturing word repetition instead of thematic aspects of each tweet. Later experiments (after submission of results) showed that this behaviour was caused by performing the clustering on n-grams; training the SOM with word embeddings created from the train set of this task (without external resources) solved this issue and did a better job of grouping the tweets by subject.
– There was no relevant pattern in the perspicuity score of each tweet, as there were multiple cases of similar scores assigned to both aggressive and non-aggressive messages.

Table 2. Detailed classification with F1-scores in the validation stage.
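The SOM-coordinate features discussed above can be illustrated with a minimal self-contained sketch. The authors used MiniSom [8]; this is a toy re-implementation (not MiniSom's API), with grid size, learning rate, and iteration counts chosen arbitrarily, showing how each tweet's winning-cell coordinates become two extra features:

```python
import numpy as np

def train_som(X, rows=5, cols=5, iters=500, lr=0.5, sigma=1.0, seed=0):
    """Train a toy rectangular SOM; returns the (rows, cols, dim) weight grid."""
    rng = np.random.default_rng(seed)
    W = rng.random((rows, cols, X.shape[1]))
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    for t in range(iters):
        x = X[rng.integers(len(X))]
        # Best-matching unit: grid cell whose weight vector is closest to x.
        bmu = np.unravel_index(np.linalg.norm(W - x, axis=2).argmin(), (rows, cols))
        decay = 1.0 - t / iters
        # Gaussian neighbourhood around the BMU, shrinking over time.
        h = np.exp(-((grid - np.array(bmu)) ** 2).sum(axis=2) / (2 * (sigma * decay + 1e-9) ** 2))
        W += (lr * decay) * h[..., None] * (x - W)
    return W

def som_coords(W, x):
    """Map coordinates of the winning cell for one document vector."""
    return np.unravel_index(np.linalg.norm(W - x, axis=2).argmin(), W.shape[:2])

# Toy usage: 100 random 300-d "document embeddings" -> two coordinate features each.
docs = np.random.default_rng(1).random((100, 300))
W = train_som(docs)
coord_features = np.array([som_coords(W, d) for d in docs])  # shape (100, 2)
```

The analysis above suggests feeding document embeddings (rather than n-gram vectors) into `train_som` is what made the clusters thematic.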
Added features       Classifier     Char n-gram range  F1-score macro  F1-score (aggressive class)
None                 LinearSVM      [3,3]              0.76            0.68
None                 MLP            [3,3]              0.77            0.66
None                 LinearSVM      [3,4]              0.77            0.69
None                 MLP            [3,4]              0.79            0.69
None                 LinearSVM      [3,5]              0.77            0.70
None                 MLP            [3,5]              0.79            0.70
Word Embeddings      LinearSVM      N/A                0.59            0.39
Word Embeddings      MLPClassifier  N/A                0.56            0.34
Occupation (O)       LinearSVM      [3,3]              0.76            0.69
Occupation (O)       MLP            [3,3]              0.78            0.67
Occupation (O)       LinearSVM      [3,4]              0.77            0.69
Occupation (O)       MLP            [3,4]              0.76            0.68
Location (L)         LinearSVM      [3,3]              0.77            0.68
Location (L)         MLP            [3,3]              0.77            0.65
Location (L)         LinearSVM      [3,4]              0.77            0.70
Location (L)         MLP            [3,4]              0.76            0.67
Perspicuity (P)      LinearSVM      [3,3]              0.76            0.68
Perspicuity (P)      MLP            [3,3]              0.77            0.66
Perspicuity (P)      LinearSVM      [3,4]              0.77            0.70
Perspicuity (P)      MLP            [3,4]              0.76            0.67
SOM Coordinates (S)  LinearSVM      [3,3]              0.76            0.68
SOM Coordinates (S)  MLP            [3,3]              0.78            0.66
SOM Coordinates (S)  LinearSVM      [3,4]              0.76            0.69
SOM Coordinates (S)  MLP            [3,4]              0.79            0.69
O+L+P+S              LinearSVM      [3,3]              0.76            0.69
O+L+P+S              MLP            [3,3]              0.78            0.67
O+L+P+S              LinearSVM      [3,4]              0.77            0.69
O+L+P+S              MLP            [3,4]              0.77            0.69

Table 3. Final scores of the aggressiveness detection task.

Rank  Team                       F1-score (aggressive class)  F1-score (non-aggressive class)  Accuracy
1     INGEOTEC                   0.4796                       0.8131                           0.7250
2     Casavantes (Our approach)  0.4790                       0.8164                           0.7285
3     GLP (run 2)                0.4749                       0.7949                           0.7050
4     GLP (run 4)                0.4635                       0.7774                           0.6854
5     mineriaUNAM (run 2)        0.4549                       0.8016                           0.7075

Table 4. Best n-grams at character level in the training set

Length   n-gram  Frequency in aggressive class  Frequency in non-aggressive class
3 chars  'os '   3074                           3207
         ' de'   2571                           3879
         'as '   2205                           3252
         'que'   1991                           3540
         ' qu'   1965                           3667
4 chars  ' de '  1860                           2798
         'que '  1768                           3262
         ' que'  1649                           2954
         ' put'  1589                           1517
         ' la '  1062                           2195

4 Conclusions and Future Work

In this paper, we describe our strategy to classify aggressive and non-aggressive tweets in Mexican Spanish.
In our best-performing system, we use only lexical features, and our results show better performance than those of most participants. This outcome, together with the fact that the F-measure for the aggressive class is still low compared to the score on the non-aggressive class, motivates future work focusing on feature analysis for aggressiveness detection: exploring which representations are truly relevant (including word embeddings, and bags of words and characters with different n-gram ranges), seeing whether they complement each other, and, if so, how to combine them. We analyzed our clustering strategies, and after changing the way they were trained we observed a slight improvement in classification results, motivating us to keep experimenting with ways to add context to the text messages. We also believe in the potential that neural networks display for this task, and that more research on how to build and train them properly will certainly improve the current state of this task. As future work, we look forward to developing new strategies based on deep neural networks, such as Recurrent Neural Networks, which are tools aimed at working with sequential data similar in nature to time series.

References

1. Chikashi Nobata, Joel Tetreault, Achint Thomas, Yashar Mehdad, and Yi Chang. Abusive language detection in online user content. In Proceedings of the 25th International Conference on World Wide Web, WWW '16, pages 145–153, Republic and Canton of Geneva, Switzerland, 2016. International World Wide Web Conferences Steering Committee.
2. Mario Ezra Aragón, Miguel Á. Álvarez-Carmona, Manuel Montes-y-Gómez, Hugo Jair Escalante, Luis Villaseñor-Pineda, and Daniela Moctezuma. Overview of MEX-A3T at IberLEF 2019: Authorship and aggressiveness analysis in Mexican Spanish tweets. In Notebook Papers of the 1st SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF), Bilbao, Spain, September 2019.
3. Quoc Le and Tomas Mikolov. Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning - Volume 32, ICML '14, pages II-1188–II-1196. JMLR.org, 2014.
4. Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146, 2017.
5. GitHub - mquezada/starsconf2018-word-embeddings: Material for the workshop "Vector representations of words based on neural networks" at StarsConf 2018. https://github.com/mquezada/starsconf2018-word-embeddings. (Accessed on 06/02/2019).
6. Roberto López-Santillán, L. C. González-Gurrola, and Graciela Ramírez-Alonso. Custom document embeddings via the centroids method: Gender classification in an author profiling task. In Linda Cappellato, Nicola Ferro, Jian-Yun Nie, and Laure Soulier, editors, CLEF 2018 Evaluation Labs and Workshop – Working Notes Papers, 10–14 September, Avignon, France. CEUR-WS.org, September 2018.
7. Rosa María Ortega-Mendoza and A. Pastor López-Monroy. The winning approach for author profiling of Mexican users in Twitter at MEX-A3T@IberEval-2018.
8. GitHub - JustGlowing/minisom: MiniSom is a minimalistic implementation of the Self-Organizing Maps. https://github.com/JustGlowing/minisom. (Accessed on 06/03/2019).
9. Thomas Davidson, Dana Warmsley, Michael W. Macy, and Ingmar Weber. Automated hate speech detection and the problem of offensive language. CoRR, abs/1703.04009, 2017.
10. Escala Inflesz — Legible. https://legible.es/blog/escala-inflesz/. (Accessed on 06/02/2019).
11. Miguel Álvarez-Carmona, Estefanía Guzmán-Falcón, Manuel Montes-y-Gómez, Hugo Jair Escalante, Luis Villaseñor-Pineda, Verónica Reyes-Meza, and Antonio Rico-Sulayes.
Overview of MEX-A3T at IberEval 2018: Authorship and aggressiveness analysis in Mexican Spanish tweets. CEUR Workshop Proceedings, 2150:74–96, 2018.
12. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
13. GitHub - afshinrahimi/sparsemultilayerperceptron: Lasagne/Theano-based multilayer perceptron (MLP) which accepts both sparse and dense matrices and is easy to use, with scikit-learn API similarity. https://github.com/afshinrahimi/sparsemultilayerperceptron. (Accessed on 06/03/2019).