
GRU with Author Profiling Information to Detect Aggressiveness

María Guadalupe Garrido-Espinosa

Alejandro Rosales-Pérez

alejandro.rosales@cimat.mx (affiliation 0)

Adrián Pastor López-Monroy

pastor.lopez@cimat.mx (affiliation 1)

0 Mathematics Research Center (CIMAT) Monterrey, Alianza Centro 502, 66629, Nuevo León

1 Mathematics Research Center (CIMAT), Jalisco s/n, Valenciana, 36023, Guanajuato


This paper describes our participation in the Aggressiveness Identification Track of the third edition of MEX-A3T. The task focuses on detecting aggressive tweets written in Mexican Spanish. Our approach combines a Bidirectional Gated Recurrent Unit with features derived from author profiling. The challenge results indicate that our proposal outperforms a Support Vector Machine baseline.

Keywords: Aggressiveness Detection, Bidirectional GRU, Author profiling

1. Introduction

Social media enables users to stay in contact with people they care about. It also offers a way to discuss and disseminate information and to share opinions, with the particularity that people can choose to show or hide their identity. This makes it easier for users to express themselves freely, but it also removes the face-to-face incentives to avoid being offensive.

Given the huge amount of shared data, it is difficult to catch all aggressive messages manually. There is therefore a need for mechanisms that detect them automatically, both to avoid harassment on social media and to prevent physical assaults that stem from aggressive comments.

The Aggressiveness Identification Track in MEX-A3T [1] encouraged the development of methods to determine whether a tweet written in Mexican Spanish is aggressive or not. Building on the results obtained by [2] for the aggressiveness identification problem, we evaluated the use of features derived from author profiling together with a Gated Recurrent Unit (GRU) network. The challenge results showed that our proposal outperforms a Support Vector Machine (SVM) baseline.

This article is organized as follows. Section 2 details the proposed method and how the author profiling characteristics were predicted. Section 3 describes the corpus and the results obtained with the training set. Section 4 presents the results of the competition, and Section 5 presents the conclusions and future work.

2. System

We preserved all the content words in the tweets. To tokenize, all punctuation marks were removed, converting the text into space-separated sequences of words. These sequences were split into lists of tokens to form a vocabulary. Each word in the vocabulary is represented as a vector using pretrained word embeddings. We used 300-dimensional FastText embeddings trained on the Spanish Billion Word Corpus [3].
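The preprocessing described above can be illustrated with a minimal sketch in plain Python. It is not the authors' code: the function names are hypothetical, and it assumes that punctuation means ASCII punctuation plus the Spanish marks ¿ and ¡, that punctuation is replaced by spaces, that case is left unchanged, and that index 0 is shared by padding and out-of-vocabulary words.

```python
import string

# Assumption: ASCII punctuation plus the Spanish opening marks.
PUNCTUATION = string.punctuation + "¿¡"
_STRIP_TABLE = str.maketrans(PUNCTUATION, " " * len(PUNCTUATION))


def tokenize(tweet):
    """Replace punctuation with spaces, then split on whitespace."""
    return tweet.translate(_STRIP_TABLE).split()


def build_vocabulary(tweets):
    """Map each distinct token to an integer id starting at 1 (0 is reserved)."""
    vocab = {}
    for tweet in tweets:
        for token in tokenize(tweet):
            if token not in vocab:
                vocab[token] = len(vocab) + 1
    return vocab


def encode(tweet, vocab, max_len):
    """Turn a tweet into a fixed-length id list: truncate or right-pad with 0.

    Unknown tokens also map to 0 in this sketch.
    """
    ids = [vocab.get(token, 0) for token in tokenize(tweet)][:max_len]
    return ids + [0] * (max_len - len(ids))
```

Each id would then index a row of the pretrained embedding matrix. A `max_len` of 86 would match the 86 × 300 embedding output reported in Section 3.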

We propose a bidirectional GRU model that takes words as inputs and is combined with the predicted gender and occupation of the users (obtained with a reference model and represented with one-hot encoding). A ReLU activation is then applied, followed by dropout and a dense layer that makes the predictions; Fig. 1 shows the architecture diagram. The final model considered only the gender and the Sciences and Student occupation categories (the remaining categories were discarded by a χ² criterion).

2.1. Bidirectional Gated Recurrent Unit

Bidirectional recurrent neural networks perform better on certain tasks where order is meaningful and are frequently used in natural language processing [4].

The Bidirectional GRU is formed by two regular GRUs, each of which processes the input sequence in one direction (left-to-right and right-to-left), and it then merges their representations. In this way, the Bidirectional GRU can capture patterns that might be overlooked by a unidirectional GRU.

A regular GRU calculates each hidden state $h_t$ as follows:

$z_t = \sigma(W_z x_t + U_z h_{t-1})$ (Update Gate)
$r_t = \sigma(W_r x_t + U_r h_{t-1})$ (Reset Gate)
$\tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}))$ (Candidate)
$h_t = (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t$ (Output)

where $W_z$, $U_z$, $W_r$, $U_r$, $W_h$, and $U_h$ are the parameters to be learned in the training phase. The function $\sigma$ is the logistic sigmoid function and $\odot$ is the element-wise multiplication [5].

The method proposed in this work uses a Bidirectional GRU network with $\hat{h}_t = \overrightarrow{h}_t + \overleftarrow{h}_t$ as the way of merging the two GRUs.
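As an illustration, the GRU update and the additive merge of the two directions can be sketched in plain Python. This is a minimal sketch rather than the authors' implementation: biases are omitted as in the equations, the dictionary keys `W_z`, `U_z`, `W_r`, `U_r`, `W_h`, `U_h` and all function names are hypothetical, and a real system would rely on a deep learning library.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h_prev, p):
    """One GRU update; p holds the six weight matrices, no biases."""
    z = [sigmoid(a + b)
         for a, b in zip(matvec(p["W_z"], x), matvec(p["U_z"], h_prev))]
    r = [sigmoid(a + b)
         for a, b in zip(matvec(p["W_r"], x), matvec(p["U_r"], h_prev))]
    gated = [ri * hi for ri, hi in zip(r, h_prev)]  # r_t (element-wise) h_{t-1}
    h_cand = [math.tanh(a + b)
              for a, b in zip(matvec(p["W_h"], x), matvec(p["U_h"], gated))]
    return [(1.0 - zi) * hp + zi * hc
            for zi, hp, hc in zip(z, h_prev, h_cand)]


def run_gru(inputs, p, hidden_size):
    """Run a GRU over a sequence, starting from a zero hidden state."""
    h = [0.0] * hidden_size
    states = []
    for x in inputs:
        h = gru_step(x, h, p)
        states.append(h)
    return states


def bidirectional_gru_sum(inputs, p_fwd, p_bwd, hidden_size):
    """Merge the forward and backward hidden states by element-wise sum."""
    forward = run_gru(inputs, p_fwd, hidden_size)
    # The backward GRU reads the sequence reversed; its states are re-aligned.
    backward = run_gru(inputs[::-1], p_bwd, hidden_size)[::-1]
    return [[f + b for f, b in zip(hf, hb)]
            for hf, hb in zip(forward, backward)]
```

Because the two directions are summed rather than concatenated, each time step keeps the dimensionality of a single GRU.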

2.2. Author Profiling features

To introduce more information into the model, we used the Mexican corpus for author profiling from MEX-A3T 2019 [6] to predict three labels: gender, place of residence, and occupation, with a different model for each label. The occupation label has eight classes: arts, student, social, sciences, sports, administrative, health, and others. The place of residence has six classes: north, northwest, northeast, center, west, and southeast.

For each attribute to predict, we adopted the n-gram ensemble approach proposed by [7], with a small variation in the size of the n-grams. The approach involves four steps. The first step extracts groups of n-grams of size one to three at the word level and of size three to five at the character level. In the second step, the best n-grams of each group are selected using the χ² criterion. This process led us to choose the best five thousand, two thousand, and one thousand n-grams at the word level, and the best two thousand, three thousand, and five thousand at the character level. In the third step all of them are concatenated, and in the fourth step they are used for classification with an SVM.
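The first three steps of the ensemble can be sketched in plain Python as follows. This is an illustrative sketch under several assumptions that the text does not confirm: n-grams are scored as binary presence features with the classical χ² statistic of a 2 × 2 contingency table, the labels are binary (as for gender), and the listed numbers of selected n-grams map to the n-gram sizes in the order given. All names are hypothetical, and the SVM step is left out because it needs an external library.

```python
from collections import Counter

# (level, n, k) groups; the pairing of k with n is an assumption.
GROUPS = [("word", 1, 5000), ("word", 2, 2000), ("word", 3, 1000),
          ("char", 3, 2000), ("char", 4, 3000), ("char", 5, 5000)]


def word_ngrams(text, n):
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def chi2_score(n11, n10, n01, n00):
    """Chi-squared statistic of a 2x2 table.

    n11: present, class 1   n10: present, class 0
    n01: absent,  class 1   n00: absent,  class 0
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_top_k(docs, labels, extract, k):
    """Rank the n-grams returned by `extract` by chi-squared and keep the k best."""
    pos_total = sum(labels)
    neg_total = len(labels) - pos_total
    pos_df, neg_df = Counter(), Counter()
    for doc, label in zip(docs, labels):
        for gram in set(extract(doc)):  # presence, not frequency
            (pos_df if label == 1 else neg_df)[gram] += 1
    scores = {}
    for gram in set(pos_df) | set(neg_df):
        n11, n10 = pos_df[gram], neg_df[gram]
        scores[gram] = chi2_score(n11, n10, pos_total - n11, neg_total - n10)
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(scores, key=lambda g: (-scores[g], g))[:k]


def build_feature_set(docs, labels):
    """Select the best n-grams of every group and concatenate the selections."""
    selected = []
    for level, n, k in GROUPS:
        extractor = word_ngrams if level == "word" else char_ngrams
        best = select_top_k(docs, labels, lambda d: extractor(d, n), k)
        selected.extend((level, n, gram) for gram in best)
    return selected
```

The concatenated feature list returned by `build_feature_set` would define the columns of the matrix passed to the classifier in the fourth step.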

Once the predictions are made, one-hot encoding is applied to each label, and the resulting features are further filtered with the χ² criterion to select the best three. This process left three author profiling features: gender, student occupation, and sciences occupation.
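The sketch below illustrates how the three retained features could be built for one user from the predicted labels. It is only an illustration: the gender class names and the choice of a single binary gender indicator are assumptions, the function names are hypothetical, and the χ² filtering itself is replaced here by picking the three columns that the text reports as selected.

```python
# Occupation classes as listed in the text.
OCCUPATIONS = ["arts", "student", "social", "sciences",
               "sports", "administrative", "health", "others"]
# Assumption: the gender class names are hypothetical placeholders.
GENDERS = ["female", "male"]


def one_hot(label, classes):
    """Return a 0/1 list with a single 1 at the position of `label`."""
    return [1 if label == c else 0 for c in classes]


def profile_features(gender, occupation):
    """Return [gender, student, sciences] for one user's predicted labels."""
    occupation_bits = one_hot(occupation, OCCUPATIONS)
    gender_bit = GENDERS.index(gender)  # single binary indicator (assumption)
    return [gender_bit,
            occupation_bits[OCCUPATIONS.index("student")],
            occupation_bits[OCCUPATIONS.index("sciences")]]
```

A user predicted as any occupation other than student or sciences therefore contributes zeros in both occupation columns.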

3. Experimental Settings and Preliminary Evaluation

In this section we describe the corpus provided by the organizers, the partitions used in the experiments, the architecture used, and the preliminary results obtained. Table 1 shows the distribution of tweets in the training and test sets.

To perform the experiments, we partitioned the set of 7,332 samples: 70% was used for training, 10% for validation, and 20% for testing. Fig. 1 shows the architecture of our Bidirectional GRU model. The embedding layer outputs a matrix of size 86 × 300 that feeds a Bidirectional GRU layer with 128 hidden units. Next, a global max pooling layer and a global average pooling layer flatten the Bidirectional GRU output by taking the maximum and the average value over the sequence, and the two results are concatenated into a vector of size 1 × 256.
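The pooling step can be illustrated with a short sketch in plain Python. It assumes the Bidirectional GRU returns one 128-dimensional state per time step (86 steps), which is consistent with the reported size of 256 after concatenation. The function names are hypothetical and the order of concatenation (average first, then max) is an assumption.

```python
def global_max_pool(states):
    """Maximum over time steps, computed independently for each hidden unit."""
    return [max(column) for column in zip(*states)]


def global_avg_pool(states):
    """Average over time steps, computed independently for each hidden unit."""
    return [sum(column) / len(states) for column in zip(*states)]


def pool_and_concat(states):
    """Concatenate the average-pooled and max-pooled vectors."""
    return global_avg_pool(states) + global_max_pool(states)
```

For a sequence of 86 states with 128 units each, `pool_and_concat` returns a list of 128 + 128 = 256 values.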

In a separate channel, the author profiling features feed a dense layer with identity activation and 16 hidden units. The output of this layer is concatenated with the pooling output to form a vector of size 1 × 272, which is then passed to another layer with ReLU activation and 64 units. Before the final prediction, a dropout layer with a rate of 0.10 is used to regularize the network.
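The fusion of the two channels can be traced with the following forward-pass sketch in plain Python. It is not the authors' implementation: the weights are random placeholders, the sigmoid on the output unit is an assumption (the text only mentions a dense layer for the prediction), and every name is hypothetical. The example uses three profile features; the variant without the student trait would use two.

```python
import math
import random


def dense(inputs, weights, biases, activation=None):
    """Fully connected layer: one weight row and one bias per output unit."""
    outputs = [sum(w * x for w, x in zip(row, inputs)) + b
               for row, b in zip(weights, biases)]
    return [activation(o) for o in outputs] if activation else outputs


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dropout(values, rate, training, rng):
    """Inverted dropout: active only during training, identity otherwise."""
    if not training:
        return list(values)
    return [0.0 if rng.random() < rate else v / (1.0 - rate) for v in values]


def init_layer(n_out, n_in, rng):
    """Random placeholder weights and zero biases for a dense layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def fusion_head(pooled, profile, params, training=False, rng=None):
    """Combine the pooled GRU vector (256) with the profile features."""
    rng = rng or random.Random(0)
    profile_repr = dense(profile, *params["profile"])           # identity, 16 units
    merged = pooled + profile_repr                              # 256 + 16 = 272
    hidden = dense(merged, *params["hidden"], activation=relu)  # 64 units
    hidden = dropout(hidden, 0.10, training, rng)
    return sigmoid(dense(hidden, *params["output"])[0])         # assumed sigmoid
```

With `training=False` the dropout layer is the identity, so the function returns a deterministic score between 0 and 1 for a given set of weights.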

Table 2 shows the results obtained at the validation stage by the method described in Section 2. The F1 obtained by fusing the gender, sciences, and bi-GRU features is slightly better than that of the model that also incorporates the student variable, and nearly a point better than that of the method without author profiling features.
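For reference, the F1 score used in these comparisons can be computed as in the minimal sketch below. It assumes that F1 is measured on a single positive class, here labeled 1 for aggressive; the text does not state which averaging was used, and the function name is hypothetical.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A difference of one point then corresponds to a change of 0.01 in the value returned by this function.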

4. Competition Results

In this section we present our results in the competition. Table 3 lists the final rankings of the challenge for the aggressiveness detection task. DeepMath-1 corresponds to the experiment with gender and sciences, while DeepMath-2 also includes the student trait. They ranked ninth and tenth, respectively.

5. Conclusions and future work

In this paper, we reported our participation in MEX-A3T 2020 to classify aggressive and non-aggressive tweets written in Mexican Spanish. We proposed a word-level Bidirectional GRU with author profiling information. The results showed that using extra information such as gender and sciences occupation yields better performance than that obtained without author profiling features. The competition results also showed that the proposed method outperformed the BoW-SVM baseline provided by the organizers, as well as several methods proposed by other competitors.

Future work includes conducting experiments with Bidirectional GRU at the character level to capture dependencies in text missed by the one at the word level.

Acknowledgments

The first author would like to thank CONACyT for financial support through scholarship number 718246.

References

[1] M. E. Aragón, H. Jarquín, M. Montes-y Gómez, H. J. Escalante, L. Villaseñor-Pineda, H. Gómez-Adorno, G. Bel-Enguix, J.-P. Posadas-Durán, Overview of MEX-A3T at IberLEF 2020: Fake News and Aggressiveness Analysis in Mexican Spanish, in: Notebook Papers of 2nd SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF), Malaga, Spain, September, 2020.

[2] Casavantes, López, L. C. González, UACh at MEX-A3T 2019: Preliminary Results (IberLEF 2019), CEUR WS Proceedings, 2019.

[3] Cardellino, Spanish Billion Words Corpus and Embeddings, 2019. URL: https://crscardellino.github.io/SBWCE/.

[4] Chollet, Deep Learning with Python, Manning Publications, 2018.

[5] Chung, Gulcehre, Cho, Bengio, Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, arXiv preprint arXiv:1412.3555 (2014).

[6] M. E. Aragón, Á. Álvarez-Carmona, M. Montes-y Gómez, H. J. Escalante, L. Villaseñor-Pineda, D. Moctezuma, Overview of MEX-A3T at IberLEF 2019: Authorship and Aggressiveness Analysis in Mexican Spanish Tweets, in: Notebook Papers of 1st SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF), Bilbao, Spain, 2019.

[7] M. E. Aragón, A. P. López-Monroy, Author Profiling and Aggressiveness Detection in Spanish tweets: MEX-A3T 2018, in: IberEval@SEPLN, 2018, pp. 134-139.

[8] Graf, Miranda-Jiménez, E. S. Tellez, Moctezuma, Salgado, Ortiz-Bejar, C. N. Sánchez, INGEOTEC at MEX-A3T: Author Profiling and Aggressiveness Analysis in Twitter Using µTC and EvoMSA, in: IberEval@SEPLN, 2018, pp. 128-133.

[9] Kim, Convolutional Neural Networks for Sentence Classification, arXiv preprint arXiv:1408.5882 (2014).

[10] Albadi, Kurdi, Mishra, Investigating the Effect of Combining GRU Neural Networks, Network Analysis and Mining 9 (2019) 41.

[11] Zhang, Robinson, Tepper, Detecting Hate Speech on Twitter Using a Convolution-GRU Based Deep Neural Network, in: European Semantic Web Conference, Springer, 2018, pp. 745-760.

[12] C. E. M. Cuza, G. L. De la Peña Sarracén, P. Rosso, Attention Mechanism for Aggressive Detection, in: CEUR Workshop Proc., volume 2150, 2018, pp. 114-118.