<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Attention mechanism for aggressive detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
<string-name>Carlos Enrique Muñiz Cuza</string-name>
          <email>carlos@cerpamid.co.cu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Gretel Liz De la Peña Sarracén</string-name>
          <email>gretel@cerpamid.co.cu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paolo Rosso</string-name>
          <email>prosso@dsic.upv.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Pattern Recognition and Data Mining</institution>
          ,
          <country country="CU">Cuba</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
<institution>PRHLT, Universitat Politècnica de València</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <fpage>114</fpage>
      <lpage>118</lpage>
      <abstract>
<p>This paper describes the system we developed for the IberEval 2018 Aggressive Detection track of the Authorship and Aggressiveness Analysis in Twitter task (MEX-A3T, https://mexa3t.wixsite.com/home). The task focuses on the detection of aggressive comments in tweets from Mexican users: systems must determine whether a tweet is aggressive or not. Our approach is an Attention-based Long Short-Term Memory (LSTM) Recurrent Neural Network in which the attention layer helps to calculate the contribution of each word towards the targeted aggressiveness classes. In particular, we build a Bidirectional LSTM to extract information from the word embeddings over the sentence, then apply attention over the hidden states to estimate the importance of each word, and finally feed the resulting context vectors to another LSTM model that estimates whether the tweet is aggressive or not. The experimental results show that our model achieves outstanding results.</p>
      </abstract>
      <kwd-group>
        <kwd>Deep Learning</kwd>
        <kwd>Attention-based Neural Network</kwd>
        <kwd>LSTM Model</kwd>
        <kwd>Aggressive Detection Track</kwd>
        <kwd>Twitter</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
Recurrent Neural Networks (RNNs) are a type of deep neural network designed for sequence modeling. These models are widely studied due to their flexibility in capturing nonlinear relationships. However, traditional RNNs suffer from the problems known as exploding and vanishing gradients and therefore have difficulty capturing long-term dependencies. Long Short-Term Memory networks (LSTM) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] are among the models most used in Natural Language Processing (NLP) to overcome this limitation: they are able to learn dependencies over considerably long sequences.
      </p>
      <p>
Moreover, attention models have become an effective mechanism for obtaining better results [2-6]. In [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], the authors use a hierarchical attention network for document classification. The model has two levels of attention mechanisms, applied at the word and sentence level, enabling it to attend differentially to more and less important content when constructing the document representation. The experiments show that the architecture outperforms previous methods by a substantial margin.
      </p>
      <p>
In this paper, we propose a similar Attention-based LSTM for the IberEval 2018 track on Authorship and aggressiveness analysis in Twitter (MEX-A3T)
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The attention layer is applied on top of a Bidirectional LSTM to generate a context vector for each word embedding, which is then fed to another LSTM network to detect whether the tweet is aggressive or not. To the best of our knowledge, there has been no other work exploring the use of attention-based architectures for this task.
      </p>
<p>The task focuses on the detection of aggressive comments, a topic that has not been widely studied in the community. The aim is to determine whether a tweet, written by a Mexican user, is aggressive or not.</p>
<p>The paper is organized as follows. Section 2 describes our system. Experimental results are then discussed in Section 3. Finally, we present our conclusions with a summary of our findings in Section 4.</p>
    </sec>
    <sec id="sec-2">
      <title>System</title>
      <sec id="sec-2-1">
        <title>Preprocessing</title>
        <p>
In the preprocessing step, the tweets are cleaned. Firstly, emoticons are recognized and replaced by words that express the sentiment they convey. We also remove all links and URLs. Afterwards, the tweets are morphologically analyzed with FreeLing [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], which assigns to each resulting token its lemma. Then, the tweets are represented as vectors with a word embedding model. This model was generated by using the word2vec algorithm [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] on the Spanish Wikipedia collection.
        </p>
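<p>For illustration, the following is a minimal sketch of this pipeline in Python, assuming gensim 4.x for word2vec. The emoticon table, the tokenization-by-split shortcut (standing in for FreeLing lemmatization) and the 300-dimensional vectors are our assumptions, not details given above.</p>
<preformat>
import re
from gensim.models import Word2Vec

# Toy emoticon-to-sentiment-word table (illustrative entries only).
EMOTICONS = {":)": "feliz", ":(": "triste"}

def clean_tweet(text):
    # Replace emoticons with words expressing their sentiment.
    for emo, word in EMOTICONS.items():
        text = text.replace(emo, " " + word + " ")
    # Remove links and URLs.
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    # FreeLing lemmatization would happen here; we just tokenize.
    return text.split()

# Train word2vec on a (cleaned, lemmatized) Spanish corpus.
corpus = [clean_tweet("me encanta :) https://t.co/xyz")]
w2v = Word2Vec(sentences=corpus, vector_size=300, window=5, min_count=1)
print(w2v.wv["feliz"].shape)  # (300,)
</preformat>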
      </sec>
      <sec id="sec-2-2">
        <title>Method</title>
        <p>
We propose a model that consists of a Bidirectional LSTM neural network (Bi-LSTM) at the word level. At each time step t the Bi-LSTM receives as input a word vector w_t with syntactic and semantic information, known as a word embedding [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. Afterwards, an attention layer is applied over each hidden state h_t. The attention weights are learned using the concatenation of the current hidden state h_t of the Bi-LSTM and the past hidden state s_{t-1} of a Post-Attention LSTM (Pos-Att-LSTM). Finally, the target aggressiveness of the tweet is predicted by this final Pos-Att-LSTM network.
        </p>
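<p>The following is a minimal end-to-end sketch of this architecture, assuming PyTorch (the paper does not specify a framework). The layer sizes, the two-class output and the zero-initialized Pos-Att-LSTM state are illustrative assumptions.</p>
<preformat>
import torch
import torch.nn as nn

class AttBiLSTM(nn.Module):
    """Sketch: Bi-LSTM encoder, attention, Pos-Att-LSTM classifier."""
    def __init__(self, emb_dim=300, hidden=128, n_classes=2):
        super().__init__()
        self.encoder = nn.LSTM(emb_dim, hidden, bidirectional=True,
                               batch_first=True)
        self.att = nn.Linear(2 * hidden + hidden, 1)    # scores over [h_t; s]
        self.decoder = nn.LSTMCell(2 * hidden, hidden)  # Pos-Att-LSTM
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, emb):                    # emb: (B, T, emb_dim)
        H, _ = self.encoder(emb)               # (B, T, 2*hidden)
        B, T, _ = H.shape
        s = H.new_zeros(B, self.decoder.hidden_size)  # decoder state s
        c = H.new_zeros(B, self.decoder.hidden_size)  # decoder cell state
        for _ in range(T):                     # one decoder step per word
            s_rep = s.unsqueeze(1).expand(B, T, s.shape[-1])
            e = torch.tanh(self.att(torch.cat([H, s_rep], dim=-1)))
            a = torch.softmax(e, dim=1)        # attention weight per word
            ctx = (a * H).sum(dim=1)           # context vector
            s, c = self.decoder(ctx, (s, c))
        return self.out(s)                     # logits: aggressive or not

logits = AttBiLSTM()(torch.randn(4, 20, 300))  # batch of 4 tweets, 20 words
</preformat>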
      </sec>
      <sec id="sec-2-3">
<title>Bidirectional LSTM Recurrent Neural Network</title>
<p>In NLP problems, a standard LSTM receives sequentially (in left-to-right order) at each time step a word embedding w_t and produces a hidden state h_t. Each hidden state h_t is calculated as follows:</p>
<disp-formula><tex-math>i_t = \sigma(W^{(i)} x_t + U^{(i)} h_{t-1} + b^{(i)})</tex-math></disp-formula>
<disp-formula><tex-math>f_t = \sigma(W^{(f)} x_t + U^{(f)} h_{t-1} + b^{(f)})</tex-math></disp-formula>
<disp-formula><tex-math>o_t = \sigma(W^{(o)} x_t + U^{(o)} h_{t-1} + b^{(o)})</tex-math></disp-formula>
<disp-formula><tex-math>u_t = \tanh(W^{(u)} x_t + U^{(u)} h_{t-1} + b^{(u)})</tex-math></disp-formula>
<disp-formula><tex-math>c_t = i_t \odot u_t + f_t \odot c_{t-1}</tex-math></disp-formula>
<disp-formula><tex-math>h_t = o_t \odot \tanh(c_t)</tex-math></disp-formula>
<p>where all W, U and b are parameters to be learned during training, σ is the sigmoid function, and ⊙ stands for element-wise multiplication.</p>
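<p>As a direct transcription of these equations, the sketch below implements a single LSTM step, assuming PyTorch tensors; the gate-keyed dictionaries and the dimensions in the usage lines are our own illustrative choices.</p>
<preformat>
import torch

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step; W, U, b are dicts keyed by gate name."""
    i_t = torch.sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])  # input gate
    f_t = torch.sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])  # forget gate
    o_t = torch.sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])  # output gate
    u_t = torch.tanh(W["u"] @ x_t + U["u"] @ h_prev + b["u"])     # candidate
    c_t = i_t * u_t + f_t * c_prev             # new cell state
    h_t = o_t * torch.tanh(c_t)                # new hidden state
    return h_t, c_t

d, n = 300, 128                                # embedding and hidden sizes
W = {k: torch.randn(n, d) for k in "ifou"}
U = {k: torch.randn(n, n) for k in "ifou"}
b = {k: torch.zeros(n) for k in "ifou"}
h, c = lstm_step(torch.randn(d), torch.zeros(n), torch.zeros(n), W, U, b)
</preformat>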
<p>The bidirectional LSTM performs the same operations as the standard LSTM, but processes the incoming text in left-to-right and right-to-left order in parallel. Thus, the output at each time step consists of two hidden states, a forward state <inline-formula><tex-math>\overrightarrow{h}_t</tex-math></inline-formula> and a backward state <inline-formula><tex-math>\overleftarrow{h}_t</tex-math></inline-formula>.</p>
<p>The proposed method uses a Bidirectional LSTM network that takes as each new hidden state the concatenation of these two, <inline-formula><tex-math>\hat{h}_t = [\overrightarrow{h}_t; \overleftarrow{h}_t]</tex-math></inline-formula>. The idea of this Bi-LSTM is to capture both long-range and backward dependencies.</p>
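<p>In practice, an off-the-shelf bidirectional LSTM already returns this concatenation at every time step. A quick check, assuming PyTorch and illustrative sizes (300-dimensional embeddings, N_h = 128):</p>
<preformat>
import torch
import torch.nn as nn

bilstm = nn.LSTM(input_size=300, hidden_size=128, bidirectional=True,
                 batch_first=True)
emb = torch.randn(1, 10, 300)                # one tweet of 10 word embeddings
H, _ = bilstm(emb)                           # H: (1, 10, 256), i.e. 2*N_h per step
h_fwd, h_bwd = H[..., :128], H[..., 128:]    # forward / backward halves
</preformat>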
<p>With an attention mechanism we allow the Bi-LSTM to decide which part of the sentence it should "attend" to. Importantly, we let the model learn what to attend to on the basis of the input sentence and of what it has produced so far.</p>
<p>Let <inline-formula><tex-math>H \in \mathbb{R}^{2 N_h \times T_x}</tex-math></inline-formula> be the matrix of hidden states <inline-formula><tex-math>[\hat{h}_1, \hat{h}_2, \ldots, \hat{h}_{T_x}]</tex-math></inline-formula> produced by the Bi-LSTM, where N_h is the size of the hidden state and T_x is the length of the given sentence. The goal is then to derive a context vector c_t that captures relevant information and to feed it as input to the next level (Pos-Att-LSTM). Each c_t is calculated as follows:</p>
<disp-formula><tex-math>c_t = \sum_{t'=1}^{T_x} \alpha_{t,t'} \hat{h}_{t'}</tex-math></disp-formula>
<disp-formula><tex-math>\alpha_{t,i} = \frac{e_{t,i}}{\sum_{j=1}^{T_x} e_{t,j}}</tex-math></disp-formula>
<disp-formula><tex-math>e_{t,i} = \tanh(W_a [\hat{h}_i; s_{t-1}] + b_a)</tex-math></disp-formula>
<p>where W_a and b_a are the trainable attention parameters, s_{t-1} is the past hidden state of the Pos-Att-LSTM and <inline-formula><tex-math>\hat{h}_i</tex-math></inline-formula> is the corresponding Bi-LSTM hidden state. The idea of the concatenation is to take into account not only the input sentence but also the past hidden state when producing the attention weights.</p>
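<p>A sketch of one attention step over the Bi-LSTM states, assuming PyTorch; note that we normalize the scores with the standard softmax, whereas the formula above divides the raw scores directly by their sum. Shapes and parameter initialization are illustrative.</p>
<preformat>
import torch

def attention_step(H, s_prev, W_a, b_a):
    """Context vector for one Pos-Att-LSTM step.
    H: (T, 2*Nh) Bi-LSTM states; s_prev: (Ns,) past decoder state."""
    T = H.shape[0]
    s_rep = s_prev.unsqueeze(0).expand(T, s_prev.shape[0])     # repeat s per word
    e = torch.tanh(torch.cat([H, s_rep], dim=1) @ W_a + b_a)   # scores (T, 1)
    alpha = torch.softmax(e, dim=0)            # weights summing to one
    return (alpha * H).sum(dim=0)              # context vector c_t

H = torch.randn(10, 256)                       # 10 words, Nh = 128
s_prev = torch.zeros(128)
W_a, b_a = torch.randn(256 + 128, 1), torch.zeros(1)
c_t = attention_step(H, s_prev, W_a, b_a)      # (256,)
</preformat>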
      </sec>
      <sec id="sec-2-4">
        <title>Post-Attention LSTM</title>
<p>The goal of the Pos-Att-LSTM is to predict whether the tweet is aggressive or not. At each time step this network receives the context vector c_t, which is propagated up to the final hidden state <inline-formula><tex-math>s_{T_x}</tex-math></inline-formula>. This vector is a high-level representation of the tweet and is used in the final softmax layer as follows:</p>
<disp-formula><tex-math>\hat{y} = \mathrm{softmax}(W_g s_{T_x} + b_g)</tex-math></disp-formula>
<p>where W_g and b_g are the parameters of the softmax layer. Finally, cross entropy is used as the loss function, which is defined as:</p>
<disp-formula id="eq1"><label>(1)</label><tex-math>L = -\sum_i y_i \log(\hat{y}_i)</tex-math></disp-formula>
<p>where y_i is the true class of the tweet.</p>
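<p>A small numerical sketch of the prediction and loss, assuming PyTorch; the batch size, class count and random values are illustrative.</p>
<preformat>
import torch
import torch.nn.functional as F

s_Tx = torch.randn(4, 128)                 # final Pos-Att-LSTM states, batch of 4
W_g, b_g = torch.randn(2, 128), torch.zeros(2)
y_hat = torch.softmax(s_Tx @ W_g.T + b_g, dim=-1)  # class probabilities
y = torch.tensor([0, 1, 1, 0])             # true labels (1 = aggressive)
loss = F.nll_loss(torch.log(y_hat), y)     # L = -sum_i y_i log(y_hat_i)
</preformat>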
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
<p>
        Table 1 shows the results obtained by the proposed method on the aggressive class for two different runs (run1 and run2). For run2, the model was modified by adding a linguistic feature to the tweets. This feature is based on the study carried out in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], where the authors propose a methodology for the detection of obscene and vulgar phrases in Mexican tweets. The feature encodes the presence or absence of obscene or vulgar words in a tweet, according to the resource generated by the cited work. These results reveal that the linguistic feature incorporated in the second run markedly improved performance in terms of the F-measure of the aggressive class, reaching the second position in the ranking among all the participants in the task.
      </p>
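<p>A minimal sketch of this run2 feature; the lexicon entries below are placeholders, not actual entries from the resource of [<xref ref-type="bibr" rid="ref11">11</xref>].</p>
<preformat>
# Hypothetical stand-in for the obscene/vulgar lexicon of [11].
VULGAR_LEXICON = {"palabrota1", "palabrota2"}

def vulgarity_feature(tokens):
    """1.0 if the tweet contains any lexicon word, else 0.0."""
    return float(any(tok.lower() in VULGAR_LEXICON for tok in tokens))
</preformat>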
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>We proposed an Attention-based LSTM Recurrent Neural Network for the IberEval 2018 track on aggressive detection in Twitter. The model consists of a bidirectional LSTM neural network with an attention mechanism that estimates the importance of each word; the resulting context vectors are then used by another LSTM model to estimate whether the tweet is aggressive or not. The results showed that using a linguistic feature based on the occurrence of obscene or vulgar phrases in the tweets improves the F-measure of the aggressive class.</p>
      <p>Acknowledgments. The work of the third author was partially supported by the SomEMBED TIN2015-71147-C2-1-P MINECO research project.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Hochreiter</surname>
          </string-name>
          , Sepp, and
          <article-title>Jurgen Schmidhuber. Long Short-Term Memory</article-title>
          .
          <source>Neural Computation</source>
          <volume>9</volume>
          (
          <issue>8</issue>
          ), pp.
          <volume>1735</volume>
          {
          <fpage>1780</fpage>
          . (
          <year>1997</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Min and Tu, Wenting and Wang, Jingxuan and Xu, Fei and Chen, Xiaojun. Attention Based LSTM for Target Dependent Sentiment Classi cation</article-title>
          .
          <source>In AAAI Conference on Arti cial Intelligence</source>
          . pp.
          <volume>5013</volume>
          {
          <fpage>5014</fpage>
          . (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. Zhang, Yu and Zhang, Pengyuan and Yan, Yonghong.
          <article-title>Attention-based LSTM with Multi-task Learning for Distant Speech Recognition</article-title>
          .
          <source>Proc. Interspeech</source>
          <year>2017</year>
          . pp.
          <volume>3857</volume>
          {
          <fpage>3861</fpage>
          . (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Yequan and Huang, Minlie and Zhao, Li and Zhu, Xiaoyan. Attention-based LSTM for Aspect-level Sentiment Classi cation</article-title>
          .
          <source>In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing</source>
          . pp.
          <volume>606</volume>
          {
          <fpage>615</fpage>
          . (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Lin</surname>
          </string-name>
          , Kai and Lin, Dazhen and Cao, Donglin.
          <source>Sentiment Analysis Model Based on Structure Attention Mechanism. In UK Workshop on Computational Intelligence</source>
          . pp.
          <volume>17</volume>
          {
          <fpage>27</fpage>
          . Springer, Cham (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Rush</surname>
          </string-name>
          ,
          <string-name>
            <surname>Alexander</surname>
            <given-names>M</given-names>
          </string-name>
          and
          <article-title>Chopra, Sumit and Weston, Jason. A Neural Attention Model for Abstractive Sentence Summarization</article-title>
          .
          <source>arXiv preprint arXiv:1509</source>
          .
          <fpage>00685</fpage>
          . (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Zichao and Yang, Diyi and Dyer, Chris and He, Xiaodong and Smola, Alex and Hovy, Eduard. Hierarchical Attention Networks for Document Classi cation</article-title>
          .
          <source>In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          . pp.
          <volume>1480</volume>
          {
          <fpage>1489</fpage>
          . (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Alvarez-Carmona</surname>
            ,
            <given-names>Miguel A</given-names>
          </string-name>
          and
          <string-name>
            <surname>Guzman-Falcon</surname>
          </string-name>
          ,
          <article-title>Estefan a and Montes-y-</article-title>
          <string-name>
            <surname>Gomez</surname>
          </string-name>
          ,
          <article-title>Manuel and Escalante, Hugo Jair and Villasen~or-Pineda, Luis and Reyes-Meza, Veronica and Rico-Sulayes, Antonio. Overview of MEX-A3T at IberEval 2018: Authorship and aggressiveness analysis in Mexican Spanish tweets</article-title>
          .
          <source>Notebook Papers of 3rd SEPLN Workshop on Evaluation of Human Language Technologies for Iberian Languages (IBEREVAL)</source>
          , Seville, Spain,
          <string-name>
            <surname>September.</surname>
          </string-name>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Padro</surname>
          </string-name>
          , Llu s and Stanilovsky,
          <source>Evgeny. Freeling 3</source>
          .0:
          <string-name>
            <given-names>Towards</given-names>
            <surname>Wider</surname>
          </string-name>
          <article-title>Multilinguality</article-title>
          .
          <source>Proceedings of the Language Resources and Evaluation Conference (LREC</source>
          <year>2012</year>
          ).
          <article-title>(</article-title>
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <article-title>Tomas and Sutskever, Ilya and Chen, Kai and Corrado, Greg S and Dean, Je . Distributed Representations of Words and Phrases and their Compositionality</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          . pp.
          <volume>3111</volume>
          {
          <fpage>3119</fpage>
          . (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Guzman</surname>
          </string-name>
          , Estefania and Beltran, Beatriz and Tovar, Mireya and Vazquez, Andres and Mart nez, Rodolfo. Clasi cacion de Frases Obscenas o Vulgares dentro de Tweets. Research in Computing Science.
          <volume>85</volume>
          , pp.
          <volume>65</volume>
          {
          <fpage>74</fpage>
          . (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>