=Paper=
{{Paper
|id=Vol-2150/MEX-A3T_paper_4
|storemode=property
|title=Attention Mechanism for Aggressive Detection
|pdfUrl=https://ceur-ws.org/Vol-2150/MEX-A3T_paper4.pdf
|volume=Vol-2150
|authors=Carlos Enrique Muñiz Cuza,Gretel Liz De la Peña Sarracén,Paolo Rosso
|dblpUrl=https://dblp.org/rec/conf/sepln/CuzaSR18
}}
==Attention Mechanism for Aggressive Detection==
Attention mechanism for aggressive detection

Carlos Enrique Muñiz Cuza¹, Gretel Liz De la Peña Sarracén¹, Paolo Rosso²

¹ Center for Pattern Recognition and Data Mining, Cuba — {carlos,gretel}@cerpamid.co.cu
² PRHLT, Universitat Politècnica de València, Spain — prosso@dsic.upv.es

Abstract. This paper describes the system we developed for the aggressive detection track of the IberEval 2018 task on Authorship and Aggressiveness Analysis in Twitter (MEX-A3T, https://mexa3t.wixsite.com/home). The task focuses on the detection of aggressive comments in tweets written by Mexican users: systems must determine whether a tweet is aggressive or not. Our approach is an attention-based Long Short-Term Memory (LSTM) recurrent neural network, where the attention layer helps to calculate the contribution of each word towards the target aggressiveness classes. In particular, we build a bidirectional LSTM to extract information from the word embeddings over the sentence, apply attention over the hidden states to estimate the importance of each word, and finally feed the resulting context vectors to another LSTM that predicts whether the tweet is aggressive or not. The experimental results show that our model achieves competitive results, reaching the second position in the task ranking.

Keywords: Deep Learning, Attention-based Neural Network, LSTM Model, Aggressive Detection Track, Twitter

1 Introduction

Recurrent Neural Networks (RNNs) are a type of deep neural network designed for sequence modeling. These models are widely studied due to their flexibility in capturing nonlinear relationships. However, traditional RNNs suffer from exploding and vanishing gradients and therefore have difficulty capturing long-term dependencies. Long Short-Term Memory networks (LSTM) [1] are among the models most used in Natural Language Processing (NLP) to overcome this limitation, since they are able to learn dependencies over considerably long sequences.

Moreover, attention models have become an effective mechanism for obtaining better results [2-6]. In [7], the authors use a hierarchical attention network for document classification. The model has two levels of attention mechanisms, applied at the word level and at the sentence level, enabling it to attend differentially to more and less important content when constructing the document representation. The experiments show that the architecture outperforms previous methods by a substantial margin.

In this paper, we propose a similar attention-based LSTM for the IberEval 2018 track on Authorship and Aggressiveness Analysis in Twitter (MEX-A3T) [8]. The attention layer is applied on top of a bidirectional LSTM to generate a context vector at each time step, which is then fed to another LSTM network that detects whether the tweet is aggressive or not. To the best of our knowledge, there has been no other work exploring attention-based architectures for this task.

The task focuses on the detection of aggressive comments, a topic that has not been widely studied in the community. The aim is to determine whether a tweet, written by a Mexican user, is aggressive or not.

The paper is organized as follows. Section 2 describes our system. Experimental results are then discussed in Section 3. Finally, we present our conclusions with a summary of our findings in Section 4.

2 System

2.1 Preprocessing

In the preprocessing step, the tweets are cleaned. First, emoticons are recognized and replaced by words expressing the sentiment they convey. We also remove all links and URLs. Afterwards, the tweets are morphologically analyzed with FreeLing [9], so that each resulting token is assigned its lemma. Finally, the tweets are represented as sequences of vectors with a word embedding model, generated by running the word2vec algorithm [10] on the Spanish Wikipedia collection.
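As an illustration, a minimal version of this pipeline could be written as follows. The emoticon lexicon, the regular expression, the model file name, and the gensim-based embedding lookup are our own assumptions for the sketch; the paper only names FreeLing and word2vec:

```python
import re

# Illustrative emoticon-to-sentiment-word lexicon (an assumption;
# the paper does not publish its mapping).
EMOTICON_SENTIMENT = {":)": "alegria", ":(": "tristeza", ":D": "risa", ">:(": "enojo"}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")

def clean_tweet(text: str) -> str:
    """Replace emoticons with sentiment words and strip links/URLs."""
    for emoticon, word in EMOTICON_SENTIMENT.items():
        text = text.replace(emoticon, " " + word + " ")
    return URL_PATTERN.sub(" ", text).strip()

# Lemmatization is done with FreeLing in the paper; here we assume a
# hypothetical `lemmatize` helper wrapping it, returning one lemma per token:
# tokens = lemmatize(clean_tweet(tweet))

# Embedding lookup, assuming a word2vec model trained on Spanish Wikipedia
# and loaded with gensim (a common choice; the paper does not name a library):
# from gensim.models import KeyedVectors
# w2v = KeyedVectors.load_word2vec_format("eswiki_w2v.bin", binary=True)
# vectors = [w2v[t] for t in tokens if t in w2v]
```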
2.2 Method

We propose a model that consists of a bidirectional LSTM neural network (Bi-LSTM) at the word level. At each time step t, the Bi-LSTM receives as input a word vector x_t carrying syntactic and semantic information, known as a word embedding [10]. Afterwards, an attention layer is applied over each hidden state ĥ_t. The attention weights are learned using the concatenation of the current hidden state ĥ_t of the Bi-LSTM and the past hidden state s_{t-1} of a post-attention LSTM (Pos-Att-LSTM). Finally, the aggressiveness of the tweet is predicted by this final Pos-Att-LSTM network.

2.3 Bidirectional LSTM Recurrent Neural Network

In NLP problems, a standard LSTM receives a word embedding x_t sequentially (in left-to-right order) at each time step and produces a hidden state h_t. Each hidden state h_t is calculated as follows:

i_t = σ(W^(i) x_t + U^(i) h_{t-1} + b^(i))    (input gate)
f_t = σ(W^(f) x_t + U^(f) h_{t-1} + b^(f))    (forget gate)
o_t = σ(W^(o) x_t + U^(o) h_{t-1} + b^(o))    (output gate)
u_t = tanh(W^(u) x_t + U^(u) h_{t-1} + b^(u))    (new memory cell)
c_t = i_t ⊗ u_t + f_t ⊗ c_{t-1}    (final memory cell)
h_t = o_t ⊗ tanh(c_t)

where all W, U and b are parameters to be learned during training, σ is the sigmoid function and ⊗ stands for element-wise multiplication.

The bidirectional LSTM performs the same operations as the standard LSTM but processes the incoming text in left-to-right and right-to-left order in parallel, so at each time step it outputs two hidden states, a forward state →h_t and a backward state ←h_t. The proposed method uses a Bi-LSTM in which each new hidden state is the concatenation of these two, ĥ_t = [→h_t; ←h_t]. The idea of this Bi-LSTM is to capture both long-range and backward dependencies.
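Read as code, the Bi-LSTM encoder is a single off-the-shelf layer in a modern framework. The following is a minimal PyTorch sketch; PyTorch itself, the layer sizes, and the variable names are our illustrative assumptions, since the paper does not report an implementation:

```python
import torch
import torch.nn as nn

# Illustrative sizes; the paper does not report its hyperparameters.
EMBED_DIM, HIDDEN_DIM = 300, 128

bilstm = nn.LSTM(
    input_size=EMBED_DIM,
    hidden_size=HIDDEN_DIM,
    bidirectional=True,   # left-to-right and right-to-left passes in parallel
    batch_first=True,
)

# One tweet of T_x = 20 word embeddings (batch of 1).
embeddings = torch.randn(1, 20, EMBED_DIM)

# h_hat has shape (1, 20, 2 * HIDDEN_DIM): at each time step the forward and
# backward hidden states are concatenated, h_hat_t = [h_fwd_t; h_bwd_t].
h_hat, _ = bilstm(embeddings)
```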
2.4 Attention Layer

With an attention mechanism, we allow the model to decide which parts of the sentence it should "attend" to. Importantly, we let the model learn what to attend to on the basis of the input sentence and of what it has produced so far. Let H ∈ R^(2N_h × T_x) be the matrix of hidden states [ĥ_1, ĥ_2, ..., ĥ_{T_x}] produced by the Bi-LSTM, where N_h is the size of the hidden state and T_x is the length of the given sentence. The goal is to derive a context vector c_t that captures the relevant information and to feed it as input to the next level (Pos-Att-LSTM). Each c_t is calculated as follows:

c_t = Σ_{t'=1}^{T_x} α_{t,t'} ĥ_{t'}

α_{t,t'} = β_{t,t'} / Σ_{j=1}^{T_x} β_{t,j}

β_{t,t'} = tanh(W_a [ĥ_{t'}; s_{t-1}] + b_a)

where W_a and b_a are the trainable attention parameters, s_{t-1} is the past hidden state of the Pos-Att-LSTM and ĥ_{t'} is the Bi-LSTM hidden state being scored. The idea of the concatenation is to take into account not only the input sentence but also the past hidden state when producing the attention weights.

2.5 Post-Attention LSTM

The goal of the Pos-Att-LSTM is to predict whether the tweet is aggressive or not. At each time step this network receives the context vector c_t, and the information is propagated up to the final hidden state s_{T_x}. This vector is a high-level representation of the tweet and is used in a final softmax layer as follows:

ŷ = softmax(W_g s_{T_x} + b_g)

where W_g and b_g are the parameters of the softmax layer. Finally, cross-entropy is used as the loss function, which is defined as:

L = − Σ_i y_i log(ŷ_i)    (1)

where y_i is the true class of the tweet.
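Putting Sections 2.4 and 2.5 together, a compact PyTorch sketch of the attention step and the post-attention classifier could look as follows. This is an assumption-laden sketch, not the authors' code: the sizes, the zero initial states, and the explicit per-step loop are our choices.

```python
import torch
import torch.nn as nn

HIDDEN_DIM = 128              # illustrative size, as before
ENC_DIM = 2 * HIDDEN_DIM      # Bi-LSTM states are concatenated pairs

# Attention scorer: maps [h_hat_i ; s_prev] to a scalar score beta.
attn = nn.Linear(ENC_DIM + HIDDEN_DIM, 1)
post_lstm = nn.LSTMCell(ENC_DIM, HIDDEN_DIM)
classifier = nn.Linear(HIDDEN_DIM, 2)    # aggressive vs. non-aggressive

h_hat = torch.randn(1, 20, ENC_DIM)      # Bi-LSTM states for one tweet (T_x = 20)
s = torch.zeros(1, HIDDEN_DIM)           # s_0: initial Pos-Att-LSTM state (assumed zero)
c_cell = torch.zeros(1, HIDDEN_DIM)

for t in range(h_hat.size(1)):
    # beta_{t,t'} = tanh(W_a [h_hat_{t'} ; s_{t-1}] + b_a), one score per position
    s_rep = s.unsqueeze(1).expand(-1, h_hat.size(1), -1)
    beta = torch.tanh(attn(torch.cat([h_hat, s_rep], dim=-1))).squeeze(-1)
    # Normalize by the sum, as in the paper (note tanh scores may be negative).
    alpha = beta / beta.sum(dim=1, keepdim=True)
    context = (alpha.unsqueeze(-1) * h_hat).sum(dim=1)   # c_t: weighted sum of states
    s, c_cell = post_lstm(context, (s, c_cell))

logits = classifier(s)                    # s_{T_x} feeds the softmax layer
y_hat = torch.softmax(logits, dim=-1)
# Cross-entropy of eq. (1) against a one-hot true label, e.g. "aggressive":
loss = -(torch.tensor([[0.0, 1.0]]) * torch.log(y_hat)).sum()
```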
3 Results

Table 1 shows the results obtained by the proposed method on the aggressive class for two different runs (run 1 and run 2). For run 2, a variation of the model was used in which a linguistic feature is added to the tweets. This feature is based on the study carried out in [11], where the authors propose a methodology for the detection of obscene and vulgar phrases in Mexican tweets; it encodes the presence or absence of obscene or vulgar words in a tweet according to the resource generated by that work. The results reveal that the linguistic feature incorporated in the second run produced a marked improvement in the F-measure on the aggressive class, reaching the second position in the ranking among all the participants in the task.

Table 1. Performance on the testing set (aggressive class)

Run     F-measure   Precision   Recall
run 1   0.3091      0.5724      0.2117
run 2   0.45        0.3815      0.5485

4 Conclusion

We proposed an attention-based LSTM recurrent neural network for the IberEval 2018 track on aggressive detection in Twitter. The model consists of a bidirectional LSTM with an attention mechanism that estimates the importance of each word; the resulting context vectors are then processed by another LSTM that predicts whether the tweet is aggressive or not. The results showed that adding a linguistic feature based on the occurrence of obscene or vulgar phrases in the tweets improves the F-measure on the aggressive class.

Acknowledgments. The work of the third author was partially supported by the SomEMBED TIN2015-71147-C2-1-P MINECO research project.

References

1. Hochreiter, S., Schmidhuber, J.: Long Short-Term Memory. Neural Computation 9(8), pp. 1735-1780 (1997).
2. Yang, M., Tu, W., Wang, J., Xu, F., Chen, X.: Attention Based LSTM for Target Dependent Sentiment Classification. In: AAAI Conference on Artificial Intelligence, pp. 5013-5014 (2017).
3. Zhang, Y., Zhang, P., Yan, Y.: Attention-based LSTM with Multi-task Learning for Distant Speech Recognition. In: Proc. Interspeech 2017, pp. 3857-3861 (2017).
4. Wang, Y., Huang, M., Zhao, L., Zhu, X.: Attention-based LSTM for Aspect-level Sentiment Classification. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 606-615 (2016).
5. Lin, K., Lin, D., Cao, D.: Sentiment Analysis Model Based on Structure Attention Mechanism. In: UK Workshop on Computational Intelligence, pp. 17-27. Springer, Cham (2017).
6. Rush, A.M., Chopra, S., Weston, J.: A Neural Attention Model for Abstractive Sentence Summarization. arXiv preprint arXiv:1509.00685 (2015).
7. Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., Hovy, E.: Hierarchical Attention Networks for Document Classification. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1480-1489 (2016).
8. Álvarez-Carmona, M.Á., Guzmán-Falcón, E., Montes-y-Gómez, M., Escalante, H.J., Villaseñor-Pineda, L., Reyes-Meza, V., Rico-Sulayes, A.: Overview of MEX-A3T at IberEval 2018: Authorship and Aggressiveness Analysis in Mexican Spanish Tweets. In: Notebook Papers of the 3rd SEPLN Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval), Seville, Spain (2018).
9. Padró, L., Stanilovsky, E.: FreeLing 3.0: Towards Wider Multilinguality. In: Proceedings of the Language Resources and Evaluation Conference (LREC 2012) (2012).
10. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed Representations of Words and Phrases and their Compositionality. In: Advances in Neural Information Processing Systems, pp. 3111-3119 (2013).
11. Guzmán, E., Beltrán, B., Tovar, M., Vázquez, A., Martínez, R.: Clasificación de Frases Obscenas o Vulgares dentro de Tweets [Classification of Obscene or Vulgar Phrases in Tweets]. Research in Computing Science 85, pp. 65-74 (2014).