<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Attention mechanism for aggressive detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
<string-name>Carlos Enrique Muñiz Cuza</string-name>
          <email>carlos@cerpamid.co.cu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Gretel Liz De la Peña Sarracén</string-name>
          <email>gretel@cerpamid.co.cu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paolo Rosso</string-name>
          <email>prosso@dsic.upv.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Pattern Recognition and Data Mining</institution>
          ,
          <country country="CU">Cuba</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
<institution>PRHLT, Universitat Politècnica de València</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <fpage>114</fpage>
      <lpage>118</lpage>
      <abstract>
<p>This paper describes the system we developed for the IberEval 2018 Aggressive Detection track of the Authorship and Aggressiveness Analysis in Twitter task (MEX-A3T, https://mexa3t.wixsite.com/home). The task focuses on the detection of aggressive comments in tweets from Mexican users: systems must determine whether a tweet is aggressive or not. Our approach is an Attention-based Long Short-Term Memory (LSTM) Recurrent Neural Network in which the attention layer helps to calculate the contribution of each word towards the targeted aggressiveness classes. In particular, we build a Bidirectional LSTM to extract information from the word embeddings over the sentence, then apply attention over the hidden states to estimate the importance of each word, and finally feed the resulting context vectors to another LSTM model that estimates whether the tweet is aggressive or not. The experimental results show that our model achieves outstanding results.</p>
      </abstract>
      <kwd-group>
        <kwd>Deep Learning</kwd>
        <kwd>Attention-based Neural Network</kwd>
        <kwd>LSTM Model</kwd>
        <kwd>Aggressive Detection Track</kwd>
        <kwd>Twitter</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
Recurrent Neural Networks (RNNs) are a type of deep neural network designed for sequence modeling. These models are widely studied due to their flexibility in capturing nonlinear relationships. However, traditional RNNs suffer from the problems known as exploding and vanishing gradients and therefore have difficulty capturing long-term dependencies. Long Short-Term Memory networks (LSTM) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] are among the models most used in Natural Language Processing (NLP) to overcome this limitation: they are able to learn dependencies over considerably long sequences.
      </p>
      <p>
Moreover, attention models have become an effective mechanism for obtaining better results [2-6]. In [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], the authors use a hierarchical attention network for document classification. The model has two levels of attention mechanisms, applied at the word and sentence level, enabling it to attend differentially to more and less important content when constructing the document representation. The experiments show that the architecture outperforms previous methods by a substantial margin.
      </p>
      <p>
In this paper, we propose a similar Attention-based LSTM for the IberEval 2018 track on Authorship and aggressiveness analysis in Twitter (MEX-A3T)
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The attention layer is applied on top of a Bidirectional LSTM to generate a context vector for each word embedding, which is then fed to another LSTM network to detect whether the tweet is aggressive or not. To the best of our knowledge, there has been no other work exploring the use of attention-based architectures for this task.
      </p>
<p>The task focuses on the detection of aggressive comments, a topic that has not been widely studied in the community. The aim is to determine whether a tweet, written by a Mexican user, is aggressive or not.</p>
<p>The paper is organized as follows. Section 2 describes our system. Experimental results are then discussed in Section 3. Finally, we present our conclusions with a summary of our findings in Section 4.</p>
    </sec>
    <sec id="sec-2">
      <title>System</title>
      <sec id="sec-2-1">
        <title>Preprocessing</title>
        <p>
In the preprocessing step, the tweets are cleaned. Firstly, emoticons are recognized and replaced by words that express the sentiment they convey. We also remove all links and URLs. Afterwards, the tweets are morphologically analyzed with FreeLing [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], which assigns to each resulting token its lemma. Then, the tweets are represented as vectors with a word embedding model. This model was generated by using the word2vec algorithm [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] on the Spanish Wikipedia collection.
        </p>
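<p>For illustration, the following is a minimal sketch of this pipeline in Python, assuming gensim 4.x for word2vec. The emoticon table, the tokenization-by-split shortcut (standing in for FreeLing lemmatization) and the 300-dimensional vectors are our assumptions, not details given above.</p>
<preformat>
import re
from gensim.models import Word2Vec

# Toy emoticon-to-sentiment-word table (illustrative entries only).
EMOTICONS = {":)": "feliz", ":(": "triste"}

def clean_tweet(text):
    # Replace emoticons with words expressing their sentiment.
    for emo, word in EMOTICONS.items():
        text = text.replace(emo, " " + word + " ")
    # Remove links and URLs.
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    # FreeLing lemmatization would happen here; we just tokenize.
    return text.split()

# Train word2vec on a (cleaned, lemmatized) Spanish corpus.
corpus = [clean_tweet("me encanta :) https://t.co/xyz")]
w2v = Word2Vec(sentences=corpus, vector_size=300, window=5, min_count=1)
print(w2v.wv["feliz"].shape)  # (300,)
</preformat>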
      </sec>
      <sec id="sec-2-2">
        <title>Method</title>
        <p>
We propose a model that consists of a Bidirectional LSTM neural network (Bi-LSTM) at the word level. At each time step t the Bi-LSTM receives as input a word vector w_t with syntactic and semantic information, known as a word embedding [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. Afterwards, an attention layer is applied over each hidden state h_t. The attention weights are learned using the concatenation of the current hidden state h_t of the Bi-LSTM and the past hidden state s_{t-1} of a Post-Attention LSTM (Pos-Att-LSTM). Finally, the target aggressiveness of the tweet is predicted by this final Pos-Att-LSTM network.
        </p>
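<p>The following is a minimal end-to-end sketch of this architecture, assuming PyTorch (the paper does not specify a framework). The layer sizes, the two-class output and the zero-initialized Pos-Att-LSTM state are illustrative assumptions.</p>
<preformat>
import torch
import torch.nn as nn

class AttBiLSTM(nn.Module):
    """Sketch: Bi-LSTM encoder, attention, Pos-Att-LSTM classifier."""
    def __init__(self, emb_dim=300, hidden=128, n_classes=2):
        super().__init__()
        self.encoder = nn.LSTM(emb_dim, hidden, bidirectional=True,
                               batch_first=True)
        self.att = nn.Linear(2 * hidden + hidden, 1)    # scores over [h_t; s]
        self.decoder = nn.LSTMCell(2 * hidden, hidden)  # Pos-Att-LSTM
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, emb):                    # emb: (B, T, emb_dim)
        H, _ = self.encoder(emb)               # (B, T, 2*hidden)
        B, T, _ = H.shape
        s = H.new_zeros(B, self.decoder.hidden_size)  # decoder state s
        c = H.new_zeros(B, self.decoder.hidden_size)  # decoder cell state
        for _ in range(T):                     # one decoder step per word
            s_rep = s.unsqueeze(1).expand(B, T, s.shape[-1])
            e = torch.tanh(self.att(torch.cat([H, s_rep], dim=-1)))
            a = torch.softmax(e, dim=1)        # attention weight per word
            ctx = (a * H).sum(dim=1)           # context vector
            s, c = self.decoder(ctx, (s, c))
        return self.out(s)                     # logits: aggressive or not

logits = AttBiLSTM()(torch.randn(4, 20, 300))  # batch of 4 tweets, 20 words
</preformat>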
      </sec>
      <sec id="sec-2-3">
<title>Bidirectional LSTM Recurrent Neural Network</title>
<p>In NLP problems, a standard LSTM receives sequentially (in left-to-right order) at each time step a word embedding w_t and produces a hidden state h_t. Each hidden state h_t is calculated as follows:</p>
<disp-formula><tex-math>i_t = \sigma(W^{(i)} x_t + U^{(i)} h_{t-1} + b^{(i)})</tex-math></disp-formula>
<disp-formula><tex-math>f_t = \sigma(W^{(f)} x_t + U^{(f)} h_{t-1} + b^{(f)})</tex-math></disp-formula>
<disp-formula><tex-math>o_t = \sigma(W^{(o)} x_t + U^{(o)} h_{t-1} + b^{(o)})</tex-math></disp-formula>
<disp-formula><tex-math>u_t = \tanh(W^{(u)} x_t + U^{(u)} h_{t-1} + b^{(u)})</tex-math></disp-formula>
<disp-formula><tex-math>c_t = i_t \odot u_t + f_t \odot c_{t-1}</tex-math></disp-formula>
<disp-formula><tex-math>h_t = o_t \odot \tanh(c_t)</tex-math></disp-formula>
<p>where all W, U and b are parameters to be learned during training, σ is the sigmoid function, and ⊙ stands for element-wise multiplication.</p>
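<p>As a direct transcription of these equations, the sketch below implements a single LSTM step, assuming PyTorch tensors; the gate-keyed dictionaries and the dimensions in the usage lines are our own illustrative choices.</p>
<preformat>
import torch

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step; W, U, b are dicts keyed by gate name."""
    i_t = torch.sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])  # input gate
    f_t = torch.sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])  # forget gate
    o_t = torch.sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])  # output gate
    u_t = torch.tanh(W["u"] @ x_t + U["u"] @ h_prev + b["u"])     # candidate
    c_t = i_t * u_t + f_t * c_prev             # new cell state
    h_t = o_t * torch.tanh(c_t)                # new hidden state
    return h_t, c_t

d, n = 300, 128                                # embedding and hidden sizes
W = {k: torch.randn(n, d) for k in "ifou"}
U = {k: torch.randn(n, n) for k in "ifou"}
b = {k: torch.zeros(n) for k in "ifou"}
h, c = lstm_step(torch.randn(d), torch.zeros(n), torch.zeros(n), W, U, b)
</preformat>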
<p>The bidirectional LSTM performs the same operations as the standard LSTM, but processes the incoming text in left-to-right and right-to-left order in parallel. Thus, the output at each time step consists of two hidden states, a forward state <inline-formula><tex-math>\overrightarrow{h}_t</tex-math></inline-formula> and a backward state <inline-formula><tex-math>\overleftarrow{h}_t</tex-math></inline-formula>.</p>
<p>The proposed method uses a Bidirectional LSTM network that takes as each new hidden state the concatenation of these two, <inline-formula><tex-math>\hat{h}_t = [\overrightarrow{h}_t; \overleftarrow{h}_t]</tex-math></inline-formula>. The idea of this Bi-LSTM is to capture both long-range and backward dependencies.</p>
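<p>In practice, an off-the-shelf bidirectional LSTM already returns this concatenation at every time step. A quick check, assuming PyTorch and illustrative sizes (300-dimensional embeddings, N_h = 128):</p>
<preformat>
import torch
import torch.nn as nn

bilstm = nn.LSTM(input_size=300, hidden_size=128, bidirectional=True,
                 batch_first=True)
emb = torch.randn(1, 10, 300)                # one tweet of 10 word embeddings
H, _ = bilstm(emb)                           # H: (1, 10, 256), i.e. 2*N_h per step
h_fwd, h_bwd = H[..., :128], H[..., 128:]    # forward / backward halves
</preformat>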
<p>With an attention mechanism we allow the Bi-LSTM to decide which part of the sentence it should "attend" to. Importantly, we let the model learn what to attend to on the basis of the input sentence and of what it has produced so far.</p>
<p>Let <inline-formula><tex-math>H \in \mathbb{R}^{2 N_h \times T_x}</tex-math></inline-formula> be the matrix of hidden states <inline-formula><tex-math>[\hat{h}_1, \hat{h}_2, \ldots, \hat{h}_{T_x}]</tex-math></inline-formula> produced by the Bi-LSTM, where N_h is the size of the hidden state and T_x is the length of the given sentence. The goal is then to derive a context vector c_t that captures relevant information and to feed it as input to the next level (Pos-Att-LSTM). Each c_t is calculated as follows:</p>
<disp-formula><tex-math>c_t = \sum_{t'=1}^{T_x} \alpha_{t,t'} \hat{h}_{t'}</tex-math></disp-formula>
<disp-formula><tex-math>\alpha_{t,i} = \frac{e_{t,i}}{\sum_{j=1}^{T_x} e_{t,j}}</tex-math></disp-formula>
<disp-formula><tex-math>e_{t,i} = \tanh(W_a [\hat{h}_i; s_{t-1}] + b_a)</tex-math></disp-formula>
<p>where W_a and b_a are the trainable attention parameters, s_{t-1} is the past hidden state of the Pos-Att-LSTM and <inline-formula><tex-math>\hat{h}_i</tex-math></inline-formula> is the corresponding Bi-LSTM hidden state. The idea of the concatenation is to take into account not only the input sentence but also the past hidden state when producing the attention weights.</p>
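<p>A sketch of one attention step over the Bi-LSTM states, assuming PyTorch; note that we normalize the scores with the standard softmax, whereas the formula above divides the raw scores directly by their sum. Shapes and parameter initialization are illustrative.</p>
<preformat>
import torch

def attention_step(H, s_prev, W_a, b_a):
    """Context vector for one Pos-Att-LSTM step.
    H: (T, 2*Nh) Bi-LSTM states; s_prev: (Ns,) past decoder state."""
    T = H.shape[0]
    s_rep = s_prev.unsqueeze(0).expand(T, s_prev.shape[0])     # repeat s per word
    e = torch.tanh(torch.cat([H, s_rep], dim=1) @ W_a + b_a)   # scores (T, 1)
    alpha = torch.softmax(e, dim=0)            # weights summing to one
    return (alpha * H).sum(dim=0)              # context vector c_t

H = torch.randn(10, 256)                       # 10 words, Nh = 128
s_prev = torch.zeros(128)
W_a, b_a = torch.randn(256 + 128, 1), torch.zeros(1)
c_t = attention_step(H, s_prev, W_a, b_a)      # (256,)
</preformat>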
      </sec>
      <sec id="sec-2-4">
        <title>Post-Attention LSTM</title>
<p>The goal of the Pos-Att-LSTM is to predict whether the tweet is aggressive or not. At each time step this network receives the context vector c_t, which is propagated up to the final hidden state <inline-formula><tex-math>s_{T_x}</tex-math></inline-formula>. This vector is a high-level representation of the tweet and is used in the final softmax layer as follows:</p>
<disp-formula><tex-math>\hat{y} = \mathrm{softmax}(W_g s_{T_x} + b_g)</tex-math></disp-formula>
<p>where W_g and b_g are the parameters of the softmax layer. Finally, cross entropy is used as the loss function, which is defined as:</p>
<disp-formula id="eq1"><label>(1)</label><tex-math>L = -\sum_i y_i \log(\hat{y}_i)</tex-math></disp-formula>
<p>where y_i is the true class of the tweet.</p>
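<p>A small numerical sketch of the prediction and loss, assuming PyTorch; the batch size, class count and random values are illustrative.</p>
<preformat>
import torch
import torch.nn.functional as F

s_Tx = torch.randn(4, 128)                 # final Pos-Att-LSTM states, batch of 4
W_g, b_g = torch.randn(2, 128), torch.zeros(2)
y_hat = torch.softmax(s_Tx @ W_g.T + b_g, dim=-1)  # class probabilities
y = torch.tensor([0, 1, 1, 0])             # true labels (1 = aggressive)
loss = F.nll_loss(torch.log(y_hat), y)     # L = -sum_i y_i log(y_hat_i)
</preformat>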
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
<p>
        Table 1 shows the results obtained by the proposed method on the aggressive class for two different runs (run1 and run2). For run2, the model was modified by adding a linguistic feature to the tweets. This feature is based on the study carried out in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], where the authors propose a methodology for the detection of obscene and vulgar phrases in Mexican tweets. The feature encodes the presence or absence of obscene or vulgar words in a tweet, according to the resource generated by the cited work. These results reveal that the linguistic feature incorporated in the second run markedly improved performance in terms of the F-measure of the aggressive class, reaching the second position in the ranking among all the participants in the task.
      </p>
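<p>A minimal sketch of this run2 feature; the lexicon entries below are placeholders, not actual entries from the resource of [<xref ref-type="bibr" rid="ref11">11</xref>].</p>
<preformat>
# Hypothetical stand-in for the obscene/vulgar lexicon of [11].
VULGAR_LEXICON = {"palabrota1", "palabrota2"}

def vulgarity_feature(tokens):
    """1.0 if the tweet contains any lexicon word, else 0.0."""
    return float(any(tok.lower() in VULGAR_LEXICON for tok in tokens))
</preformat>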
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>We proposed an Attention-based LSTM Recurrent Neural Network for the IberEval 2018 track on aggressive detection in Twitter. The model consists of a bidirectional LSTM neural network with an attention mechanism that estimates the importance of each word; the resulting context vectors are then used by another LSTM model to estimate whether the tweet is aggressive or not. The results showed that using a linguistic feature based on the occurrence of obscene or vulgar phrases in the tweets improves the F-measure of the aggressive class.</p>
      <p>Acknowledgments. The work of the third author was partially supported by the SomEMBED TIN2015-71147-C2-1-P MINECO research project.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Hochreiter</surname>
          </string-name>
          , Sepp, and
          <article-title>Jurgen Schmidhuber. Long Short-Term Memory</article-title>
          .
          <source>Neural Computation</source>
          <volume>9</volume>
          (
          <issue>8</issue>
          ), pp.
          <volume>1735</volume>
          {
          <fpage>1780</fpage>
          . (
          <year>1997</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Min and Tu, Wenting and Wang, Jingxuan and Xu, Fei and Chen, Xiaojun. Attention Based LSTM for Target Dependent Sentiment Classi cation</article-title>
          .
          <source>In AAAI Conference on Arti cial Intelligence</source>
          . pp.
          <volume>5013</volume>
          {
          <fpage>5014</fpage>
          . (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. Zhang, Yu and Zhang, Pengyuan and Yan, Yonghong.
          <article-title>Attention-based LSTM with Multi-task Learning for Distant Speech Recognition</article-title>
          .
          <source>Proc. Interspeech</source>
          <year>2017</year>
          . pp.
          <volume>3857</volume>
          {
          <fpage>3861</fpage>
          . (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Yequan and Huang, Minlie and Zhao, Li and Zhu, Xiaoyan. Attention-based LSTM for Aspect-level Sentiment Classi cation</article-title>
          .
          <source>In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing</source>
          . pp.
          <volume>606</volume>
          {
          <fpage>615</fpage>
          . (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Lin</surname>
          </string-name>
          , Kai and Lin, Dazhen and Cao, Donglin.
          <source>Sentiment Analysis Model Based on Structure Attention Mechanism. In UK Workshop on Computational Intelligence</source>
          . pp.
          <volume>17</volume>
          {
          <fpage>27</fpage>
          . Springer, Cham (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Rush</surname>
          </string-name>
          ,
          <string-name>
            <surname>Alexander</surname>
            <given-names>M</given-names>
          </string-name>
          and
          <article-title>Chopra, Sumit and Weston, Jason. A Neural Attention Model for Abstractive Sentence Summarization</article-title>
          .
          <source>arXiv preprint arXiv:1509</source>
          .
          <fpage>00685</fpage>
          . (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Zichao and Yang, Diyi and Dyer, Chris and He, Xiaodong and Smola, Alex and Hovy, Eduard. Hierarchical Attention Networks for Document Classi cation</article-title>
          .
          <source>In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          . pp.
          <volume>1480</volume>
          {
          <fpage>1489</fpage>
          . (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Alvarez-Carmona</surname>
            ,
            <given-names>Miguel A</given-names>
          </string-name>
          and
          <string-name>
            <surname>Guzman-Falcon</surname>
          </string-name>
          ,
          <article-title>Estefan a and Montes-y-</article-title>
          <string-name>
            <surname>Gomez</surname>
          </string-name>
          ,
          <article-title>Manuel and Escalante, Hugo Jair and Villasen~or-Pineda, Luis and Reyes-Meza, Veronica and Rico-Sulayes, Antonio. Overview of MEX-A3T at IberEval 2018: Authorship and aggressiveness analysis in Mexican Spanish tweets</article-title>
          .
          <source>Notebook Papers of 3rd SEPLN Workshop on Evaluation of Human Language Technologies for Iberian Languages (IBEREVAL)</source>
          , Seville, Spain,
          <string-name>
            <surname>September.</surname>
          </string-name>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Padro</surname>
          </string-name>
          , Llu s and Stanilovsky,
          <source>Evgeny. Freeling 3</source>
          .0:
          <string-name>
            <given-names>Towards</given-names>
            <surname>Wider</surname>
          </string-name>
          <article-title>Multilinguality</article-title>
          .
          <source>Proceedings of the Language Resources and Evaluation Conference (LREC</source>
          <year>2012</year>
          ).
          <article-title>(</article-title>
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <article-title>Tomas and Sutskever, Ilya and Chen, Kai and Corrado, Greg S and Dean, Je . Distributed Representations of Words and Phrases and their Compositionality</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          . pp.
          <volume>3111</volume>
          {
          <fpage>3119</fpage>
          . (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Guzman</surname>
          </string-name>
          , Estefania and Beltran, Beatriz and Tovar, Mireya and Vazquez, Andres and Mart nez, Rodolfo. Clasi cacion de Frases Obscenas o Vulgares dentro de Tweets. Research in Computing Science.
          <volume>85</volume>
          , pp.
          <volume>65</volume>
          {
          <fpage>74</fpage>
          . (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>