=Paper=
{{Paper
|id=Vol-2664/mex-a3t_paper2
|storemode=property
|title=GRU with Author Profiling Information to Detect Aggressiveness
|pdfUrl=https://ceur-ws.org/Vol-2664/mexa3t_paper2.pdf
|volume=Vol-2664
|authors=María Guadalupe Garrido-Espinosa,Alejandro Rosales-Pérez,Adrián Pastor López-Monroy
|dblpUrl=https://dblp.org/rec/conf/sepln/Garrido-Espinosa20
}}
==GRU with Author Profiling Information to Detect Aggressiveness==
María Guadalupe Garrido-Espinosa (a), Alejandro Rosales-Pérez (a), and Adrián Pastor López-Monroy (b)

(a) Mathematics Research Center (CIMAT) Monterrey, Alianza Centro 502, 66629, Nuevo León

(b) Mathematics Research Center (CIMAT), Jalisco s/n Valenciana, 36023, Guanajuato

Abstract

This paper describes our participation in the Aggressiveness Identification Track of the third edition of MEX-A3T. The task focuses on detecting aggressive tweets written in Mexican Spanish. Our approach combines a Bidirectional Gated Recurrent Unit with features derived from author profiling. The challenge results indicate that our proposal outperforms a Support Vector Machine baseline.

Keywords: Aggressiveness Detection, Bidirectional GRU, Author profiling

1. Introduction

Social media enable users to stay in contact with the people they care about. They also offer a way to discuss and disseminate information and to share opinions, with the particularity that people can decide whether to show or hide their identity. This makes it easier for users to express themselves freely, but it also removes the face-to-face incentives to avoid being offensive. Given the huge amount of shared data, it is difficult to catch all aggressive messages manually. Hence, mechanisms that detect them automatically are needed to curb harassment on social media and to prevent physical assaults derived from aggressive comments.

The Aggressiveness Identification Track in MEX-A3T [1] encouraged the development of methods to determine whether a tweet written in Mexican Spanish is aggressive or not. Motivated by the results that [2] obtained by adding author information to aggressiveness identification, we evaluated the use of features derived from author profiling together with a Gated Recurrent Unit (GRU) network. The challenge results showed that our proposal outperforms a Support Vector Machine (SVM) baseline.
This article is organized as follows. Section 2 details the proposed method and how the author profiling characteristics were predicted. Section 3 describes the corpus and the results obtained on the training set. Section 4 presents the results of the competition, and Section 5 presents the conclusions and future work.

Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020). Email: maria.garrido@cimat.mx (M.G. Garrido-Espinosa); alejandro.rosales@cimat.mx (A. Rosales-Pérez); pastor.lopez@cimat.mx (A.P. López-Monroy). © 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073.

Figure 1: Diagram of the architecture proposed to detect aggressive tweets.

2. System

We preserved all the content words in the tweets. For tokenization, all punctuation marks were removed, converting the text into space-separated sequences of words. These sequences were split into lists of tokens to form a vocabulary. Each word in the vocabulary is represented by a pretrained word embedding; we used the 300-dimensional FastText embeddings computed from the Spanish Billion Words Corpus [3].

We propose a bidirectional GRU model that takes words as input. This model is combined with the predicted gender and occupation of each user, obtained with a reference model and one-hot encoded. A ReLU activation is then applied, followed by dropout and a dense layer that makes the predictions; Fig. 1 shows the architecture diagram. In the end, the model used only the gender feature and the sciences and student occupation categories; the remaining categories were discarded by a χ² criterion.

2.1.
Bidirectional Gated Recurrent Unit

Bidirectional recurrent neural networks perform better on certain tasks where order is meaningful, and they are frequently used in natural language processing [4]. A bidirectional GRU consists of two regular GRUs, each processing the input sequence in one direction (left-to-right and right-to-left), and then merges their representations. In this way, the bidirectional GRU can capture patterns that a unidirectional GRU might overlook. A regular GRU calculates each hidden state <math>h_t</math> as follows:

<math display="block">
\begin{align}
z_t &= \sigma(W_z x_t + U_z h_{t-1}) && \text{(update gate)} \\
r_t &= \sigma(W_r x_t + U_r h_{t-1}) && \text{(reset gate)} \\
\tilde{h}_t &= \tanh(W x_t + U (r_t \odot h_{t-1})) && \text{(candidate)} \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{(output)}
\end{align}
</math>

where <math>W_z, U_z, W_r, U_r, W</math> and <math>U</math> are the parameters learned during training, <math>\sigma</math> is the logistic sigmoid function, and <math>\odot</math> denotes element-wise multiplication [5]. The method proposed in this work uses a bidirectional GRU that merges the two GRUs by summing their hidden states: <math>\hat{h}_t = \overrightarrow{h}_t + \overleftarrow{h}_t</math>.

2.2. Author Profiling features

To give the model more information, we used the Mexican author profiling corpus from MEX-A3T 2019 [6] to predict three labels: gender, place of residence, and occupation, training a separate model for each label. The occupation label has eight classes (arts, student, social, sciences, sports, administrative, health, and others), while place of residence has six classes (north, northwest, northeast, center, west, and southeast).

For each attribute, we adopted the n-gram ensemble approach proposed by [7], with a small variation in the n-gram sizes. The approach involves four steps. The first step extracts groups of n-grams of sizes one to three at the word level and three to five at the character level. In the second step, the best n-grams of each group are selected with the χ² criterion.
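The first two steps (n-gram extraction and χ² ranking within a group) can be sketched in plain Python as follows. This is a minimal illustration, not the implementation of [7]: it assumes binary presence features and the textbook χ² test of independence on a presence/absence × class contingency table, and the function names (word_ngrams, char_ngrams, chi2_score, select_best) and the toy tweets are hypothetical. Library routines such as scikit-learn's chi2 compute the statistic from per-class feature totals instead, so their scores will generally differ from this version.

```python
from collections import Counter


def word_ngrams(text, n_min=1, n_max=3):
    """Set of lowercased word n-grams with n_min <= n <= n_max (whitespace tokens)."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(tokens) - n + 1)}


def char_ngrams(text, n_min=3, n_max=5):
    """Set of lowercased character n-grams with n_min <= n <= n_max (spaces kept)."""
    text = text.lower()
    return {text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)}


def chi2_score(doc_sets, labels, feature):
    """Chi-square statistic of independence between 'feature present' and the label.

    Builds the 2 x C table (present/absent x class) and sums (O - E)^2 / E.
    """
    n_docs = len(labels)
    class_totals = Counter(labels)
    present = Counter(label for doc, label in zip(doc_sets, labels) if feature in doc)
    n_present = sum(present.values())
    score = 0.0
    for cls, cls_total in class_totals.items():
        cells = ((present[cls], n_present),
                 (cls_total - present[cls], n_docs - n_present))
        for observed, row_total in cells:
            expected = row_total * cls_total / n_docs
            if expected > 0:
                score += (observed - expected) ** 2 / expected
    return score


def select_best(texts, labels, extractor, k):
    """Rank every n-gram produced by `extractor` by chi-square and keep the top k."""
    doc_sets = [extractor(text) for text in texts]
    vocabulary = set().union(*doc_sets)
    # Sort by descending score; ties are broken alphabetically for determinism.
    ranked = sorted(vocabulary, key=lambda f: (-chi2_score(doc_sets, labels, f), f))
    return ranked[:k]


if __name__ == "__main__":
    texts = ["mañana tengo examen", "otro examen final",
             "voy al trabajo", "mucho trabajo hoy"]
    labels = ["student", "student", "others", "others"]
    print(select_best(texts, labels, word_ngrams, 2))  # ['examen', 'trabajo']
```

If each group corresponds to one n-gram size (our reading of the description), a group such as word bigrams would be ranked with `select_best(texts, labels, lambda t: word_ngrams(t, 2, 2), k)`, each group with its own k.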
The χ² selection kept the best 5,000, 2,000, and 1,000 n-grams at the word level, and the best 2,000, 3,000, and 5,000 at the character level. In the third step, all selected n-grams are concatenated, and in the fourth step they are used by an SVM classifier. Once the predictions are made, one-hot encoding is applied to each predicted label, and the resulting features are further filtered with the χ² criterion to select the best three. This process left three author profiling features: gender, student occupation, and sciences occupation.

3. Experimental Settings and Preliminary Evaluation

In this section, we describe the corpus provided by the organizers, the partitions used in the experiments, the architecture, and the preliminary results obtained. Table 1 shows the distribution of tweets in the training and test sets.

{| class="wikitable"
|+ Table 1: MEX-A3T corpus distribution
! Class !! Train !! Percentage !! Test
|-
| Non-aggressive || 5,222 || 71.2% || -
|-
| Aggressive || 2,110 || 28.8% || -
|-
| Total || 7,332 || 100% || 3,143
|}

{| class="wikitable"
|+ Table 2: F1 scores in the validation stage
! Added features !! F1-score (aggressive class)
|-
| None || 0.7256
|-
| Gender, Sciences, Student || 0.7311
|-
| Gender, Sciences || 0.7328
|}

For the experiments, we partitioned the 7,332 training samples: 70% for training, 10% for validation, and 20% for testing.

Fig. 1 shows the architecture of our bidirectional GRU model. The embedding layer outputs an 86 × 300 matrix (one 300-dimensional embedding per token position) and feeds a bidirectional GRU layer with 128 hidden units. Next, a global max pooling layer and a global average pooling layer flatten the bidirectional GRU output by taking the maximum and the average over time, respectively; both results are concatenated into a vector of size 1 × 256. In a second channel, the author profiling features feed a dense layer with identity activation and 16 hidden units. The output of this layer is concatenated with the pooling output to form a vector of size 1 × 272. This vector is then passed to another dense layer with ReLU activation and 64 units.
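To make the data flow concrete, the following pure-Python sketch runs the recurrence of Section 2.1 in both directions, merges the directions by summation, applies global max and average pooling, and joins the result with a dense projection of the author profiling features before a ReLU layer. It is an illustration under stated assumptions, not the authors' code: toy dimensions replace the real ones (86 × 300 input, 128 GRU units, 16 profiling units, 64 ReLU units) so that it runs quickly, weights are random, bias terms are omitted, dropout and the output layer are left out, and the names GRU, bigru_sum, forward_pass, and concat_sizes are hypothetical.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def init_matrix(rows, cols, rng):
    """Small uniform random weights (the initializer is an assumption)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class GRU:
    """One direction of the Section 2.1 recurrence; bias terms omitted as in the equations."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        self.Wz, self.Uz = init_matrix(hidden_dim, input_dim, rng), init_matrix(hidden_dim, hidden_dim, rng)
        self.Wr, self.Ur = init_matrix(hidden_dim, input_dim, rng), init_matrix(hidden_dim, hidden_dim, rng)
        self.W, self.U = init_matrix(hidden_dim, input_dim, rng), init_matrix(hidden_dim, hidden_dim, rng)

    def step(self, x, h_prev):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h_prev))]  # update gate
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h_prev))]  # reset gate
        gated = [ri * hi for ri, hi in zip(r, h_prev)]  # r_t ⊙ h_{t-1}
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.W, x), matvec(self.U, gated))]  # candidate
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, cand)]  # output h_t

    def run(self, xs):
        h, states = [0.0] * self.hidden_dim, []
        for x in xs:
            h = self.step(x, h)
            states.append(h)
        return states


def bigru_sum(forward, backward, xs):
    """Merge both directions by summation: h_hat_t = h_fwd_t + h_bwd_t."""
    fwd = forward.run(xs)
    bwd = backward.run(xs[::-1])[::-1]  # reverse back so position t lines up with token t
    return [[a + b for a, b in zip(f, g)] for f, g in zip(fwd, bwd)]


def forward_pass(tokens, profile, sizes, rng):
    """Embedded tokens + profiling features -> (pooled, merged, ReLU output)."""
    emb, hid = sizes["emb"], sizes["hidden"]
    states = bigru_sum(GRU(emb, hid, rng), GRU(emb, hid, rng), tokens)
    columns = list(zip(*states))  # one tuple per hidden unit, across time steps
    pooled = [max(c) for c in columns] + [sum(c) / len(c) for c in columns]  # global max ++ global avg
    profile_dense = matvec(init_matrix(sizes["profile_units"], len(profile), rng), profile)  # identity activation
    merged = pooled + profile_dense
    relu = [max(0.0, v) for v in matvec(init_matrix(sizes["relu_units"], len(merged), rng), merged)]
    return pooled, merged, relu


def concat_sizes(hidden_units, profile_units):
    """Lengths of the pooled and merged vectors for a sum-merged BiGRU."""
    pooled = 2 * hidden_units  # max pooling + average pooling, each hidden_units wide
    return pooled, pooled + profile_units


if __name__ == "__main__":
    rng = random.Random(0)
    toy = {"emb": 8, "hidden": 4, "profile_units": 3, "relu_units": 5}
    tokens = [[rng.uniform(-1, 1) for _ in range(toy["emb"])] for _ in range(6)]
    pooled, merged, relu = forward_pass(tokens, [1.0, 0.0, 1.0], toy, rng)
    print(len(pooled), len(merged), len(relu))  # 8 11 5
    print(concat_sizes(128, 16))  # (256, 272)
```

With the paper's sizes, the pooled vector has 2 × 128 = 256 entries because summation keeps every time step at 128 dimensions, and appending the 16 profiling units gives 272, matching the sizes stated above. Concatenating the two directions instead of summing them would double the pooled size to 512.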
Before the final prediction, a dropout layer with a rate of 0.10 regularizes the network.

Table 2 shows the results obtained at the validation stage by the method described in Section 2. The F1 obtained by fusing the gender and sciences features with the Bi-GRU features (0.7328) is slightly better than that of the model that also incorporates the student variable (0.7311), and about 0.7 points higher than that of the model without author profiling features (0.7256).

4. Competition Results

In this section, we present our results in the competition. Table 3 lists the final rankings of the aggressiveness detection task. DeepMath-1 corresponds to the run with the gender and sciences features, while DeepMath-2 also includes the student trait. They ranked ninth and tenth, respectively.

{| class="wikitable"
|+ Table 3: Final scores of the aggressiveness detection task
! Rank !! Team name !! F1-score (aggressive class)
|-
| 1 || CIMAT-1 || 0.7998
|-
| 2 || CIMAT-2 || 0.7971
|-
| 3 || UPB-2 || 0.7969
|-
| 4 || UACh-2 || 0.7720
|-
| 5 || INGEOTEC || 0.7468
|-
| 6 || Idiap-UAM-1 || 0.7255
|-
| - || Baseline (Bi-GRU) || 0.7124
|-
| 7 || Idiap-UAM-2 || 0.7066
|-
| 8 || UACh-1 || 0.7062
|-
| 9 || DeepMath-1 || 0.7001
|-
| 10 || DeepMath-2 || 0.6957
|-
| - || Baseline (BoW-SVM) || 0.6760
|-
| 11 || UMUTeam-2 || 0.6727
|-
| 12 || Intensos-1 || 0.6619
|-
| 13 || UMUTeam-3 || 0.6516
|-
| 14 || UGalileo-2 || 0.6388
|-
| 15 || UGalileo-1 || 0.6387
|-
| 16 || ITCG-SD || 0.6080
|-
| 17 || UMUTeam-1 || 0.5892
|-
| 18 || UPB-1 || 0.3437
|-
| 19 || Intensos-2 || 0.2515
|}

5. Conclusions and future work

In this paper, we reported our participation in the MEX-A3T 2020 shared task on classifying aggressive and non-aggressive tweets written in Mexican Spanish. We proposed a word-level bidirectional GRU combined with author profiling information. The validation results showed that adding extra information such as gender and sciences occupation yields better performance than using no author profiling features. The competition results also showed that the proposed method outperformed the BoW-SVM baseline provided by the organizers, as well as several methods proposed by other competitors.
Future work includes experimenting with a character-level bidirectional GRU to capture dependencies in the text that the word-level model misses.

Acknowledgments

The first author thanks CONACyT for financial support through scholarship number 718246.

References

[1] M. E. Aragón, H. Jarquín, M. Montes-y Gómez, H. J. Escalante, L. Villaseñor-Pineda, H. Gómez-Adorno, G. Bel-Enguix, J.-P. Posadas-Durán, Overview of MEX-A3T at IberLEF 2020: Fake News and Aggressiveness Analysis in Mexican Spanish, in: Notebook Papers of 2nd SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF), Malaga, Spain, September 2020.

[2] M. Casavantes, R. López, L. C. González, UACh at MEX-A3T 2019: Preliminary Results on Detecting Aggressive Tweets by Adding Author Information Via an Unsupervised Strategy, in: Proceedings of the First Workshop for Iberian Languages Evaluation Forum (IberLEF 2019), CEUR Workshop Proceedings, 2019.

[3] C. Cardellino, Spanish Billion Words Corpus and Embeddings, 2019. URL: https://crscardellino.github.io/SBWCE/.

[4] F. Chollet, Deep Learning with Python, Manning Publications, 2018.

[5] J. Chung, C. Gulcehre, K. Cho, Y. Bengio, Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, arXiv preprint arXiv:1412.3555 (2014).

[6] M. E. Aragón, M. Á. Álvarez-Carmona, M. Montes-y Gómez, H. J. Escalante, L. Villaseñor-Pineda, D. Moctezuma, Overview of MEX-A3T at IberLEF 2019: Authorship and Aggressiveness Analysis in Mexican Spanish Tweets, in: Notebook Papers of 1st SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF), Bilbao, Spain, 2019.

[7] M. E. Aragón, A. P. López-Monroy, Author Profiling and Aggressiveness Detection in Spanish Tweets: MEX-A3T 2018, in: IberEval@SEPLN, 2018, pp. 134–139.

[8] M. Graff, S. Miranda-Jiménez, E. S. Tellez, D. Moctezuma, V. Salgado, J. Ortiz-Bejar, C. N. Sánchez, INGEOTEC at MEX-A3T: Author Profiling and Aggressiveness Analysis in Twitter Using μTC and EvoMSA, in: IberEval@SEPLN, 2018, pp. 128–133.

[9] Y. Kim, Convolutional Neural Networks for Sentence Classification, arXiv preprint arXiv:1408.5882 (2014).

[10] N. Albadi, M. Kurdi, S. Mishra, Investigating the Effect of Combining GRU Neural Networks with Handcrafted Features for Religious Hatred Detection on Arabic Twitter Space, Social Network Analysis and Mining 9 (2019) 41.

[11] Z. Zhang, D. Robinson, J. Tepper, Detecting Hate Speech on Twitter Using a Convolution-GRU Based Deep Neural Network, in: European Semantic Web Conference, Springer, 2018, pp. 745–760.

[12] C. E. M. Cuza, G. L. De la Peña Sarracén, P. Rosso, Attention Mechanism for Aggressive Detection, in: CEUR Workshop Proceedings, volume 2150, 2018, pp. 114–118.