<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Detecting Aggressiveness in Mexican Spanish Tweets with LSTM + GRU and LSTM + CNN Architectures</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Victor Peñaloza</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>RLICT: Research Laboratory in Information and Communication Technologies, Universidad Galileo</institution>
          ,
          <addr-line>7a. Avenida, calle Dr. Eduardo Suger Cofiño, Zona 10, Ciudad de Guatemala</addr-line>
          ,
          <country country="GT">Guatemala</country>
        </aff>
      </contrib-group>
      <fpage>280</fpage>
      <lpage>286</lpage>
      <abstract>
        <p>This paper presents a description of our participation in the MEX-A3T 2020 aggressiveness detection track on Mexican Spanish tweets. The goal of this task is to analyze a corpus of Mexican Spanish tweets and identify the aggressiveness level of each tweet (aggressive or not). For this task, we proposed two architectures: the first is BiLSTM + GRU based, and the second is BiLSTM + CNN based. After experimenting and evaluating, our BiLSTM + GRU model achieves a 63.88% F1-score on the aggressive class, and our BiLSTM + CNN model achieves a 63.87% F1-score on the aggressive class.</p>
      </abstract>
      <kwd-group>
        <kwd>Aggressiveness</kwd>
        <kwd>Long Short Term Memory</kwd>
        <kwd>Gated Recurrent Unit</kwd>
        <kwd>Convolutional Neural Network</kwd>
        <kwd>Twitter</kwd>
        <kwd>Mexican Spanish text classification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Data Preprocessing</title>
      <p>Although supervised deep learning models can learn the main features from a dataset, the
performance of such models depends on the quality of the input data [<xref ref-type="bibr" rid="ref1">1</xref>]. Previous sentiment
analysis research on Twitter-based corpora shows that various corpus-preprocessing techniques
provide a significant improvement in model performance. Some techniques merely remove
noise data, and others reduce terms and expressions to a basic meaning [<xref ref-type="bibr" rid="ref2">2</xref>].</p>
      <sec id="sec-2-1">
        <title>2.1. Basic Data Preprocessing</title>
        <p>For the models described in this paper, the following steps were performed on the training data set [<xref ref-type="bibr" rid="ref3">3</xref>] (a minimal code sketch follows the list):
1. Lower-case the input text.
2. Remove URLs: URLs were encoded in the training data set as &lt;URL&gt;.
3. Remove accents, diaereses, and tilde characters: input text is NFKD-normalized, then folded to ASCII.
4. Remove numeric characters.
5. Remove single-character and two-character elements.
6. Remove punctuation symbols.</p>
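        <p>The sketch below illustrates steps 1&#x2013;6 in Python; the exact regular expressions are our assumptions rather than the authors' implementation.</p>
        <preformat><![CDATA[
import re
import unicodedata

def preprocess(text: str) -> str:
    """Basic preprocessing, steps 1-6 (sketch; regexes are assumptions)."""
    text = text.lower()                                     # 1. lower-case
    text = re.sub(r"https?://\S+|www\.\S+", "<URL>", text)  # 2. encode URLs
    # 3. accents/diaereses/tildes: NFKD-decompose, then drop non-ASCII marks
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"\d+", " ", text)                        # 4. numeric characters
    text = re.sub(r"[^\w\s<>]", " ", text)                  # 6. punctuation (keeps <URL>)
    tokens = [t for t in text.split() if len(t) > 2]        # 5. 1- and 2-char elements
    return " ".join(tokens)

print(preprocess("¡Qué día! Más info en https://t.co/abc123, ok?"))
# -> "que dia mas info <URL>"
]]></preformat>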
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Text Sequences Length</title>
        <p>
          LSTM [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and GRU [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] architectures were proposed to learn long-term dependencies. Despite the
success of these architectures, there are concerns about the ability of these networks to manage
such dependencies [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Considering this, we decided to limit the length of text sequences,
looking for a sequence length that preserves the relevant information in a tweet while
reducing model training time. Trimming was done by shortening at the end of each text
sequence.
        </p>
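        <p>As an illustration, sequences can be capped with the Keras pad_sequences utility; the length limit below is an assumed value, since the paper does not state the exact cap used.</p>
        <preformat><![CDATA[
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Toy token-id sequences; MAX_LEN is an assumed cap
token_id_sequences = [[4, 17, 9, 2, 31, 8], [12, 5]]
MAX_LEN = 4

# truncating="post" shortens at the end of each sequence, as described above
padded = pad_sequences(token_id_sequences, maxlen=MAX_LEN,
                       padding="post", truncating="post")
print(padded)  # [[ 4 17  9  2]
               #  [12  5  0  0]]
]]></preformat>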
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Lemmatization</title>
        <p>Lemmatization performs a morphological analysis of words and tries to remove inflectional endings,
returning words to their dictionary form. In previous research, the use of lemmatization
outperformed base algorithms on language modeling [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The pipeline used was:
1. Tokenization.
2. Multiword token expansion.
3. POS labeling.
4. Lemmatization.</p>
        <p>
          For the previous pipeline, we used the Spanish models, trained on the AnCora treebank, from the Python Stanford
NLP package [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
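        <p>A minimal sketch of this pipeline with the stanfordnlp package, assuming the Spanish (AnCora) models have been downloaded:</p>
        <preformat><![CDATA[
import stanfordnlp

# stanfordnlp.download("es")  # one-time download of the Spanish AnCora models
# Processors match steps 1-4: tokenize, multiword-token expansion, POS, lemma
nlp = stanfordnlp.Pipeline(lang="es", processors="tokenize,mwt,pos,lemma")

doc = nlp("Los gatos corrían por la casa.")
lemmas = [word.lemma for sent in doc.sentences for word in sent.words]
print(lemmas)  # e.g. ['el', 'gato', 'correr', 'por', 'el', 'casa', '.']
]]></preformat>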
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Stop Words</title>
        <p>
          We removed stop words using the Spanish stop-word corpus from the open-source Natural Language Toolkit
(NLTK) [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
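        <p>For example (a sketch; the token list is illustrative):</p>
        <preformat><![CDATA[
import nltk
from nltk.corpus import stopwords

# nltk.download("stopwords")  # one-time download
spanish_stops = set(stopwords.words("spanish"))

tokens = ["este", "tweet", "es", "muy", "agresivo"]
filtered = [t for t in tokens if t not in spanish_stops]
print(filtered)  # ['tweet', 'agresivo']
]]></preformat>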
      </sec>
      <sec id="sec-2-5">
        <title>2.5. Word Vectors</title>
        <p>
          As a word-level representation, we used pre-trained embedding vectors built with the FastText [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] library. The embedding vectors were pre-trained on an external corpus of Mexican Spanish tweets. This
pre-trained file contains 1,247.3M tokens with 100 dimensions each. These vectors were provided
by the MEX-A3T 2019 organizers [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
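        <p>A sketch of loading such vectors with gensim and building an embedding matrix for the network's embedding layer; the file name and the toy vocabulary are hypothetical.</p>
        <preformat><![CDATA[
import numpy as np
from gensim.models import KeyedVectors

EMB_DIM = 100
# Hypothetical file name for the organizer-provided vectors [11]
vectors = KeyedVectors.load_word2vec_format("mx_tweet_embeddings.vec")

word_index = {"tweet": 1, "agresivo": 2}  # toy vocabulary -> integer ids
embedding_matrix = np.zeros((len(word_index) + 1, EMB_DIM))
for word, idx in word_index.items():
    if word in vectors:                   # out-of-vocabulary rows stay zero
        embedding_matrix[idx] = vectors[word]
]]></preformat>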
      </sec>
      <sec id="sec-2-6">
        <title>2.6. Balance Dataset</title>
        <p>In unbalanced data sets, different categories are represented unequally. So that the resulting model
is not biased toward the features of the majority class in the classification task, the use of over-sampling
techniques on the minority class has previously been proposed to obtain better classifier performance.
SMOTE is an over-sampling method in which the minority class is over-sampled by creating
“synthetic” samples rather than by over-sampling with replacement [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].</p>
        <p>The MEX-A3T 2020 training corpus was not balanced; we applied the SMOTE method to obtain a
corpus in which the aggressive and non-aggressive classes are equally represented.</p>
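        <p>A minimal SMOTE sketch with the imbalanced-learn library on toy numeric features (SMOTE operates on vector representations, so in practice it is applied after the tweets are vectorized):</p>
        <preformat><![CDATA[
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Toy imbalanced data standing in for vectorized tweets
X, y = make_classification(n_samples=1000, weights=[0.7, 0.3], random_state=0)
print(Counter(y))        # roughly 700 majority vs 300 minority samples

X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)
print(Counter(y_bal))    # both classes equally represented
]]></preformat>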
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Systems Description</title>
      <p>
        Recurrent networks have proven useful in natural language processing tasks for their ability
to carry information from the past [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. On the other hand, convolutional neural networks
have been used and have shown promising results in diverse applications of natural language
processing [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Additionally, the architecture used has proven effective on previous NLP
classification tasks [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and was adapted to this specific domain task.
      </p>
      <p>This paper discusses the performance of two models with slightly different approaches. The first
model (Fig. 1) comprises an embedding input layer, followed by a spatial dropout that feeds
a BiLSTM layer and a BiGRU layer, respectively. Each of the BiLSTM and BiGRU blocks
feeds an independent global average pooling layer and global max-pooling layer. The pooling
layers' outputs are merged and followed by a dense layer with a ReLU activation function. Next,
batch normalization and dropout are applied. The last layer is a dense layer with a SoftMax activation
function.</p>
      <p>The first model (BiLSTM + BiGRU) was trained for 13 epochs using an Adam optimizer (learning rate =
3e-5, epsilon = 1e-8, norm clipping = 1.0), with sparse categorical cross-entropy as the loss
function.</p>
      <p>The second model (Fig. 2) is a slightly different version of the first model, in which the BiGRU
layer was replaced by a 1D convolutional layer; it was trained for 15 epochs. Table 1 shows
in detail the values of the parameters used for each model.</p>
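      <p>The sketch below reconstructs the first model in Keras under stated assumptions: the layer widths, dropout rates, and input sizes are our guesses, while the optimizer settings and loss come from the text; the commented line shows the Conv1D swap that yields the second model.</p>
      <preformat><![CDATA[
from tensorflow.keras import layers, models, optimizers

MAX_LEN, VOCAB_SIZE, EMB_DIM = 50, 20000, 100   # assumed sizes
UNITS = 64                                      # assumed hidden width

inp = layers.Input(shape=(MAX_LEN,))
emb = layers.Embedding(VOCAB_SIZE, EMB_DIM)(inp)  # weights=[embedding_matrix] in practice
x = layers.SpatialDropout1D(0.3)(emb)             # rate assumed

lstm = layers.Bidirectional(layers.LSTM(UNITS, return_sequences=True))(x)
branch = layers.Bidirectional(layers.GRU(UNITS, return_sequences=True))(lstm)
# Model 2 replaces the BiGRU branch with a 1D convolution, e.g.:
# branch = layers.Conv1D(UNITS, kernel_size=3, padding="same", activation="relu")(lstm)

pooled = layers.concatenate([
    layers.GlobalAveragePooling1D()(lstm), layers.GlobalMaxPooling1D()(lstm),
    layers.GlobalAveragePooling1D()(branch), layers.GlobalMaxPooling1D()(branch),
])
x = layers.Dense(128, activation="relu")(pooled)  # width assumed
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.5)(x)                        # rate assumed
out = layers.Dense(2, activation="softmax")(x)

model = models.Model(inp, out)
model.compile(
    optimizer=optimizers.Adam(learning_rate=3e-5, epsilon=1e-8, clipnorm=1.0),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
]]></preformat>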
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>The official competition metric was the F1-score on the aggressive class. Table 2 shows our results
on the MEX-A3T 2020 test dataset, along with results on our own test set used for experiments
during the modeling phase. Our own test set was created by taking 20% of the official
training set provided. Additionally, Table 2 shows two baselines used by the organizers to compare with
participating models, and some results from other participants, ranked by their place in the
competition, are shown too.</p>
      <p>[Figures 1 and 2: architecture diagrams. An Input Layer and Embedding feed a Spatial Dropout1D and a Bidirectional (CuDNNLSTM) block; model 1 follows with a Bidirectional (CuDNNGRU) block and model 2 with a Conv1D block; both blocks feed Global Average Pooling 1D and Global Max Pooling 1D layers, whose outputs are concatenated and passed through Dense, Batch Normalization, Dropout, and a final Dense layer.]</p>
      <p>Based on the results, it should be noted that the two proposed architectures achieved similar
performance. It can be observed that the results achieved on the official test set do not differ much
from the results achieved on our own test set. This indicates that the test data chosen for the modeling
phase represents the proposed task dataset well, and that the proposed models are not overfitting the
training set.</p>
      <p>We achieved 16th place with run 2 (BiLSTM + CNN). Although our results are lower than the
baseline models, this work shows a comparison between two proposed models for
aggressiveness detection in Mexican Spanish tweets and leaves possibilities open for architecture
improvement in further research.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions and Future Work</title>
      <p>In this work, we described our participation in MEX-A3T@IberLEF2020, the Aggressiveness
Identification on Mexican Spanish Tweets track [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].</p>
      <p>We have shown two proposed architectures: the first uses a BiLSTM + BiGRU combination as its
base, and the second is based on a BiLSTM + CNN combination.</p>
      <p>According to our experimental results, the two architectures show similar performance on the
aggressiveness detection task. Although the proposed architectures achieved lower results
compared to the baseline models, it is possible to continue improving them, especially by working on the
corpus-preprocessing phase. We think that we lost task-relevant information during the tweet-preprocessing
phase, which did not allow us to obtain better model performance.</p>
      <p>Additionally, it would be worthwhile to try other embedding vectors and dictionaries that better
represent the particular features of Mexican Spanish.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was supported by the Facultad de Ingeniería de Sistemas, Informática y Ciencias de la
Computación (FISICC) and the Research Laboratory in Information and Communication
Technologies (RLICT), both part of Universidad Galileo, Guatemala.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Kotsiantis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kanellopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. E.</given-names>
            <surname>Pintelas</surname>
          </string-name>
          ,
          <article-title>Data preprocessing for supervised leaning</article-title>
          ,
          <source>World Academy of Science</source>
          , Engineering and Technology,
          <source>International Journal of Computer</source>
          , Electrical, Automation,
          <source>Control and Information Engineering</source>
          <volume>1</volume>
          (
          <year>2007</year>
          )
          <fpage>4104</fpage>
          -
          <lpage>4109</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Angiani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ferrari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Fontanini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fornacciari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Iotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Magliani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Manicardi</surname>
          </string-name>
          ,
          <article-title>A comparison between preprocessing techniques for sentiment analysis in twitter</article-title>
          , in: KDWeb,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Aragón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jarquín</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Montes-y Gómez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Escalante</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Villaseñor-Pineda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Gómez-Adorno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Bel-Enguix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-P.</given-names>
            <surname>Posadas-Durán</surname>
          </string-name>
          ,
          <article-title>Overview of MEX-A3T at IberLEF 2020: Fake news and aggressiveness analysis in Mexican Spanish</article-title>
          ,
          <source>in: Notebook Papers of 2nd SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF)</source>
          , Malaga, Spain, September,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hochreiter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          ,
          <article-title>Long short-term memory</article-title>
          ,
          <source>Neural Computation</source>
          <volume>9</volume>
          (
          <year>1997</year>
          )
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <string-name>
            <surname>B. van Merriënboer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gulcehre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bahdanau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bougares</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schwenk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>Learning phrase representations using RNN encoder-decoder for statistical machine translation</article-title>
          ,
          <source>in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <article-title>Association for Computational Linguistics</article-title>
          , Doha, Qatar,
          <year>2014</year>
          , pp.
          <fpage>1724</fpage>
          -
          <lpage>1734</lpage>
          . URL: https://www.aclweb.org/anthology/D14-1179. doi:10.3115/v1/D14-1179.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lv</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Duan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <article-title>Do RNN and LSTM have long memory?</article-title>
          ,
          <year>2020</year>
          . arXiv:2006.03860.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>V.</given-names>
            <surname>Balakrishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Lloyd-Yemoh</surname>
          </string-name>
          ,
          <article-title>Stemming and lemmatization: A comparison of retrieval performances</article-title>
          ,
          <source>in: Lecture Notes on Software Engineering</source>
          , volume
          <volume>2</volume>
          ,
          <year>2014</year>
          , pp.
          <fpage>262</fpage>
          -
          <lpage>267</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Dozat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , C. D. Manning,
          <article-title>Universal dependency parsing from scratch, in: Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, Association for Computational Linguistics</article-title>
          , Brussels, Belgium,
          <year>2018</year>
          , pp.
          <fpage>160</fpage>
          -
          <lpage>170</lpage>
          . URL: https://nlp.stanford.edu/pubs/qi2018universal.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bird</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Klein</surname>
          </string-name>
          , E. Loper,
          <source>Natural Language Processing with Python</source>
          , 1st ed.,
          <string-name>
            <surname>O'Reilly Media</surname>
          </string-name>
          , Inc.,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bojanowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Grave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
          , T. Mikolov,
          <article-title>Enriching word vectors with subword information</article-title>
          ,
          <source>arXiv preprint arXiv:1607.04606</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11] INGEOTEC,
          <source>FastText Word Embeddings for Spanish Language Variations</source>
          ,
          <year>2019</year>
          (accessed June 10,
          <year>2020</year>
          ). URL: https://github.com/INGEOTEC/RegionalEmbeddings.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>N. V.</given-names>
            <surname>Chawla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. W.</given-names>
            <surname>Bowyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. O.</given-names>
            <surname>Hall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. P.</given-names>
            <surname>Kegelmeyer</surname>
          </string-name>
          , Smote:
          <article-title>Synthetic minority over-sampling technique</article-title>
          ,
          <source>J. Artif. Intell. Res</source>
          .
          <volume>16</volume>
          (
          <year>2002</year>
          )
          <fpage>321</fpage>
          -
          <lpage>357</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Karafiát</surname>
          </string-name>
          , L. Burget,
          <string-name>
            <given-names>J.</given-names>
            <surname>Černocký</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Khudanpur</surname>
          </string-name>
          ,
          <article-title>Recurrent neural network based language model</article-title>
          ,
          <source>in: INTERSPEECH 2010</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>1045</fpage>
          -
          <lpage>1048</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>Convolutional neural networks for sentence classification</article-title>
          ,
          <source>in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          , Association for Computational Linguistics, Doha, Qatar,
          <year>2014</year>
          , pp.
          <fpage>1746</fpage>
          -
          <lpage>1751</lpage>
          . URL: https://www.aclweb.org/anthology/D14-1181. doi:10.3115/v1/D14-1181.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15] E. Garcia, Mercado libre data challenge, https://github.com/eduagarcia/meli-challenge-2019, 2019.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>