<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Detecting Aggressiveness in Mexican Spanish Tweets with LSTM + GRU and LSTM + CNN Architectures</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Victor Peñaloza</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>RLICT: Research Laboratory in Information and Communication Technologies, Universidad Galileo</institution>
          ,
          <addr-line>7a. Avenida, calle Dr. Eduardo Suger Cofiño, Zona 10, Ciudad de Guatemala</addr-line>
          ,
          <country country="GT">Guatemala</country>
        </aff>
      </contrib-group>
      <fpage>280</fpage>
      <lpage>286</lpage>
      <abstract>
        <p>This paper presents a description of our participation in the MEX-A3T 2020 aggressiveness detection track on Mexican Spanish tweets. The goal of this task is to analyze a corpus of Mexican Spanish tweets and identify the aggressiveness level of each tweet (aggressive or not). For this task, we proposed two architectures: the first is BiLSTM + GRU based, and the second is BiLSTM + CNN based. After experimenting and evaluating, our BiLSTM + GRU model achieves a 63.88% F1-score on the aggressive class, and our BiLSTM + CNN model achieves a 63.87% F1-score on the aggressive class.</p>
      </abstract>
      <kwd-group>
        <kwd>Aggressiveness</kwd>
        <kwd>Long Short Term Memory</kwd>
        <kwd>Gated Recurrent Unit</kwd>
        <kwd>Convolutional Neural Network</kwd>
        <kwd>Twitter</kwd>
        <kwd>Mexican Spanish text classification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Data Preprocessing</title>
      <p>Although supervised deep learning models can learn the main features from a dataset, the
performance of such models depends on the quality of the input data [<xref ref-type="bibr" rid="ref1">1</xref>]. Previous sentiment
analysis research on Twitter-based corpora shows that various corpus-preprocessing techniques
provide a significant improvement in model performance. Some techniques merely remove
noise data, and others reduce terms and expressions to a basic meaning [<xref ref-type="bibr" rid="ref2">2</xref>].</p>
      <sec id="sec-2-1">
        <title>2.1. Basic Data Preprocessing</title>
        <p>For the models described in this paper, the following steps were performed on the training data set [<xref ref-type="bibr" rid="ref3">3</xref>] (a minimal code sketch follows the list):
1. Lower-case the input text.
2. Remove URLs: URLs were encoded in the training data set as &lt;URL&gt;.
3. Remove accents, diaereses, and tilde characters: input text is NFKD-normalized, then folded to ASCII.
4. Remove numeric characters.
5. Remove single-character and two-character elements.
6. Remove punctuation symbols.</p>
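        <p>The sketch below illustrates steps 1&#x2013;6 in Python; the exact regular expressions are our assumptions rather than the authors' implementation.</p>
        <preformat><![CDATA[
import re
import unicodedata

def preprocess(text: str) -> str:
    """Basic preprocessing, steps 1-6 (sketch; regexes are assumptions)."""
    text = text.lower()                                     # 1. lower-case
    text = re.sub(r"https?://\S+|www\.\S+", "<URL>", text)  # 2. encode URLs
    # 3. accents/diaereses/tildes: NFKD-decompose, then drop non-ASCII marks
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"\d+", " ", text)                        # 4. numeric characters
    text = re.sub(r"[^\w\s<>]", " ", text)                  # 6. punctuation (keeps <URL>)
    tokens = [t for t in text.split() if len(t) > 2]        # 5. 1- and 2-char elements
    return " ".join(tokens)

print(preprocess("¡Qué día! Más info en https://t.co/abc123, ok?"))
# -> "que dia mas info <URL>"
]]></preformat>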
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Text Sequences Length</title>
        <p>
          LSTM [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and GRU [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] architectures were proposed to learn long-term dependencies. Despite the
success of these architectures, there are concerns about the ability of these networks to manage
such dependencies [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Considering this, we decided to limit the length of text sequences,
looking for a sequence length that preserves the relevant information in a tweet while
reducing model training time. Trimming was done by shortening at the end of each text
sequence.
        </p>
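        <p>As an illustration, sequences can be capped with the Keras pad_sequences utility; the length limit below is an assumed value, since the paper does not state the exact cap used.</p>
        <preformat><![CDATA[
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Toy token-id sequences; MAX_LEN is an assumed cap
token_id_sequences = [[4, 17, 9, 2, 31, 8], [12, 5]]
MAX_LEN = 4

# truncating="post" shortens at the end of each sequence, as described above
padded = pad_sequences(token_id_sequences, maxlen=MAX_LEN,
                       padding="post", truncating="post")
print(padded)  # [[ 4 17  9  2]
               #  [12  5  0  0]]
]]></preformat>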
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Lemmatization</title>
        <p>Lemmatization performs a morphological analysis of words and tries to remove inflectional endings,
returning words to their dictionary form. In previous research, the use of lemmatization
outperformed base algorithms on language modeling [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The pipeline used was:
1. Tokenization.
2. Multiword token expansion.
3. POS labeling.
4. Lemmatization.</p>
        <p>
          For the previous pipeline, we used the Spanish models, trained on the AnCora treebank, from the Python Stanford
NLP package [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
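        <p>A minimal sketch of this pipeline with the stanfordnlp package, assuming the Spanish (AnCora) models have been downloaded:</p>
        <preformat><![CDATA[
import stanfordnlp

# stanfordnlp.download("es")  # one-time download of the Spanish AnCora models
# Processors match steps 1-4: tokenize, multiword-token expansion, POS, lemma
nlp = stanfordnlp.Pipeline(lang="es", processors="tokenize,mwt,pos,lemma")

doc = nlp("Los gatos corrían por la casa.")
lemmas = [word.lemma for sent in doc.sentences for word in sent.words]
print(lemmas)  # e.g. ['el', 'gato', 'correr', 'por', 'el', 'casa', '.']
]]></preformat>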
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Stop Words</title>
        <p>
          We removed stop words using the Spanish stop-word corpus from the open-source Natural Language Toolkit
(NLTK) [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
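        <p>For example (a sketch; the token list is illustrative):</p>
        <preformat><![CDATA[
import nltk
from nltk.corpus import stopwords

# nltk.download("stopwords")  # one-time download
spanish_stops = set(stopwords.words("spanish"))

tokens = ["este", "tweet", "es", "muy", "agresivo"]
filtered = [t for t in tokens if t not in spanish_stops]
print(filtered)  # ['tweet', 'agresivo']
]]></preformat>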
      </sec>
      <sec id="sec-2-5">
        <title>2.5. Word Vectors</title>
        <p>
          As a word-level representation, we used pre-trained embedding vectors built with the FastText [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] library. The embedding vectors were pre-trained on an external corpus of Mexican Spanish tweets. This
pre-trained file contains 1,247.3M tokens with 100 dimensions each. These vectors were provided
by the MEX-A3T 2019 organizers [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
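        <p>A sketch of loading such vectors with gensim and building an embedding matrix for the network's embedding layer; the file name and the toy vocabulary are hypothetical.</p>
        <preformat><![CDATA[
import numpy as np
from gensim.models import KeyedVectors

EMB_DIM = 100
# Hypothetical file name for the organizer-provided vectors [11]
vectors = KeyedVectors.load_word2vec_format("mx_tweet_embeddings.vec")

word_index = {"tweet": 1, "agresivo": 2}  # toy vocabulary -> integer ids
embedding_matrix = np.zeros((len(word_index) + 1, EMB_DIM))
for word, idx in word_index.items():
    if word in vectors:                   # out-of-vocabulary rows stay zero
        embedding_matrix[idx] = vectors[word]
]]></preformat>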
      </sec>
      <sec id="sec-2-6">
        <title>2.6. Balance Dataset</title>
        <p>In unbalanced data sets, different categories are represented unequally. So that the resulting model
is not biased toward the features of the majority class in the classification task, the use of over-sampling
techniques on the minority class has previously been proposed to obtain better classifier performance.
SMOTE is an over-sampling method in which the minority class is over-sampled by creating
“synthetic” samples rather than by over-sampling with replacement [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].</p>
        <p>The MEX-A3T 2020 training corpus was not balanced; we applied the SMOTE method to obtain a
corpus in which the aggressive and non-aggressive classes are equally represented.</p>
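        <p>A minimal SMOTE sketch with the imbalanced-learn library on toy numeric features (SMOTE operates on vector representations, so in practice it is applied after the tweets are vectorized):</p>
        <preformat><![CDATA[
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Toy imbalanced data standing in for vectorized tweets
X, y = make_classification(n_samples=1000, weights=[0.7, 0.3], random_state=0)
print(Counter(y))        # roughly 700 majority vs 300 minority samples

X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)
print(Counter(y_bal))    # both classes equally represented
]]></preformat>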
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Systems Description</title>
      <p>
        Recurrent networks have proven useful in natural language processing tasks for their ability
to carry information from the past [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. On the other hand, convolutional neural networks
have been used and have shown promising results in diverse applications of natural language
processing [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Additionally, the architecture used has proven effective on previous NLP
classification tasks [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and was adapted to this specific domain task.
      </p>
      <p>This paper discusses the performance of two models with slightly different approaches. The first
model (Fig. 1) comprises an embedding input layer, followed by a spatial dropout that feeds
a BiLSTM layer and a BiGRU layer, respectively. Each of the BiLSTM and BiGRU blocks
feeds an independent global average pooling layer and global max-pooling layer. The pooling
layers' outputs are merged and followed by a dense layer with a ReLU activation function. Next,
batch normalization and dropout are applied. The last layer is a dense layer with a SoftMax activation
function.</p>
      <p>The first model (BiLSTM + BiGRU) was trained for 13 epochs using an Adam optimizer (learning rate =
3e-5, epsilon = 1e-8, norm clipping = 1.0), with sparse categorical cross-entropy as the loss
function.</p>
      <p>The second model (Fig. 2) is a slightly different version of the first model, in which the BiGRU
layer was replaced by a 1D convolutional layer; it was trained for 15 epochs. Table 1 shows
in detail the values of the parameters used for each model.</p>
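      <p>The sketch below reconstructs the first model in Keras under stated assumptions: the layer widths, dropout rates, and input sizes are our guesses, while the optimizer settings and loss come from the text; the commented line shows the Conv1D swap that yields the second model.</p>
      <preformat><![CDATA[
from tensorflow.keras import layers, models, optimizers

MAX_LEN, VOCAB_SIZE, EMB_DIM = 50, 20000, 100   # assumed sizes
UNITS = 64                                      # assumed hidden width

inp = layers.Input(shape=(MAX_LEN,))
emb = layers.Embedding(VOCAB_SIZE, EMB_DIM)(inp)  # weights=[embedding_matrix] in practice
x = layers.SpatialDropout1D(0.3)(emb)             # rate assumed

lstm = layers.Bidirectional(layers.LSTM(UNITS, return_sequences=True))(x)
branch = layers.Bidirectional(layers.GRU(UNITS, return_sequences=True))(lstm)
# Model 2 replaces the BiGRU branch with a 1D convolution, e.g.:
# branch = layers.Conv1D(UNITS, kernel_size=3, padding="same", activation="relu")(lstm)

pooled = layers.concatenate([
    layers.GlobalAveragePooling1D()(lstm), layers.GlobalMaxPooling1D()(lstm),
    layers.GlobalAveragePooling1D()(branch), layers.GlobalMaxPooling1D()(branch),
])
x = layers.Dense(128, activation="relu")(pooled)  # width assumed
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.5)(x)                        # rate assumed
out = layers.Dense(2, activation="softmax")(x)

model = models.Model(inp, out)
model.compile(
    optimizer=optimizers.Adam(learning_rate=3e-5, epsilon=1e-8, clipnorm=1.0),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
]]></preformat>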
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>The official competition metric was the F1-score on the aggressive class. Table 2 shows our results
on the MEX-A3T 2020 test dataset, along with results on our own test set used for experiments
during the modeling phase. Our own test set was created by taking 20% of the official
training set provided. Additionally, Table 2 shows two baselines used by the organizers to compare with
participating models, and some results from other participants, ranked by their place in the
competition, are shown too.</p>
      <p>[Figures 1 and 2: architecture diagrams. An Input Layer and Embedding feed a Spatial Dropout1D and a Bidirectional (CuDNNLSTM) block; model 1 follows with a Bidirectional (CuDNNGRU) block and model 2 with a Conv1D block; both blocks feed Global Average Pooling 1D and Global Max Pooling 1D layers, whose outputs are concatenated and passed through Dense, Batch Normalization, Dropout, and a final Dense layer.]</p>
      <p>Based on the results, it should be noted that the two proposed architectures achieved similar
performance. It can be observed that the results achieved on the official test set do not differ much
from the results achieved on our own test set. This indicates that the test data chosen for the modeling
phase represents the proposed task dataset well, and that the proposed models are not overfitting the
training set.</p>
      <p>We achieved 16th place with run 2 (BiLSTM + CNN). Although our results are lower than the
baseline models, this work shows a comparison between two proposed models for
aggressiveness detection in Mexican Spanish tweets and leaves possibilities open for architecture
improvement in further research.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions and Future Work</title>
      <p>In this work, we described our participation in MEX-A3T@IberLEF2020, the Aggressiveness
Identification on Mexican Spanish Tweets track [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].</p>
      <p>We have shown two proposed architectures: the first uses a BiLSTM + BiGRU combination as its
base, and the second is based on a BiLSTM + CNN combination.</p>
      <p>According to our experimental results, the two architectures show similar performance on the
aggressiveness detection task. Although the proposed architectures achieved lower results
compared to the baseline models, it is possible to continue improving them, especially by working on the
corpus-preprocessing phase. We think that we lost task-relevant information during the tweet-preprocessing
phase, which did not allow us to obtain better model performance.</p>
      <p>Additionally, it would be worthwhile to try other embedding vectors and dictionaries that better
represent the particular features of Mexican Spanish.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was supported by the Facultad de Ingeniería de Sistemas, Informática y Ciencias de la
Computación (FISICC) and the Research Laboratory in Information and Communication
Technologies (RLICT), both part of Universidad Galileo, Guatemala.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Kotsiantis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kanellopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. E.</given-names>
            <surname>Pintelas</surname>
          </string-name>
          ,
          <article-title>Data preprocessing for supervised leaning</article-title>
          ,
          <source>World Academy of Science</source>
          , Engineering and Technology,
          <source>International Journal of Computer</source>
          , Electrical, Automation,
          <source>Control and Information Engineering</source>
          <volume>1</volume>
          (
          <year>2007</year>
          )
          <fpage>4104</fpage>
          -
          <lpage>4109</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Angiani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ferrari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Fontanini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fornacciari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Iotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Magliani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Manicardi</surname>
          </string-name>
          ,
          <article-title>A comparison between preprocessing techniques for sentiment analysis in twitter</article-title>
          , in: KDWeb,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Aragón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jarquín</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Montes-y Gómez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Escalante</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Villaseñor-Pineda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Gómez-Adorno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Bel-Enguix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-P.</given-names>
            <surname>Posadas-Durán</surname>
          </string-name>
          ,
          <article-title>Overview of MEX-A3T at IberLEF 2020: Fake news and aggressiveness analysis in Mexican Spanish</article-title>
          ,
          <source>in: Notebook Papers of 2nd SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF)</source>
          , Malaga, Spain, September,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hochreiter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          ,
          <article-title>Long short-term memory</article-title>
          ,
          <source>Neural Computation</source>
          <volume>9</volume>
          (
          <year>1997</year>
          )
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <string-name>
            <surname>B. van Merriënboer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gulcehre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bahdanau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bougares</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schwenk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>Learning phrase representations using RNN encoder-decoder for statistical machine translation</article-title>
          ,
          <source>in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <article-title>Association for Computational Linguistics</article-title>
          , Doha, Qatar,
          <year>2014</year>
          , pp.
          <fpage>1724</fpage>
          -
          <lpage>1734</lpage>
          . URL: https://www.aclweb.org/anthology/D14-1179. doi:10.3115/v1/D14-1179.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lv</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Duan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <article-title>Do RNN and LSTM have long memory?</article-title>
          ,
          <year>2020</year>
          . arXiv:2006.03860.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>V.</given-names>
            <surname>Balakrishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Lloyd-Yemoh</surname>
          </string-name>
          ,
          <article-title>Stemming and lemmatization: A comparison of retrieval performances</article-title>
          ,
          <source>in: Lecture Notes on Software Engineering</source>
          , volume
          <volume>2</volume>
          ,
          <year>2014</year>
          , pp.
          <fpage>262</fpage>
          -
          <lpage>267</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Dozat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , C. D. Manning,
          <article-title>Universal dependency parsing from scratch, in: Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, Association for Computational Linguistics</article-title>
          , Brussels, Belgium,
          <year>2018</year>
          , pp.
          <fpage>160</fpage>
          -
          <lpage>170</lpage>
          . URL: https://nlp.stanford.edu/pubs/qi2018universal.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bird</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Klein</surname>
          </string-name>
          , E. Loper,
          <source>Natural Language Processing with Python</source>
          , 1st ed.,
          <string-name>
            <surname>O'Reilly Media</surname>
          </string-name>
          , Inc.,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bojanowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Grave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
          , T. Mikolov,
          <article-title>Enriching word vectors with subword information</article-title>
          ,
          <source>arXiv preprint arXiv:1607.04606</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11] INGEOTEC,
          <source>FastText Word Embeddings for Spanish Language Variations</source>
          ,
          <year>2019</year>
          (accessed June 10,
          <year>2020</year>
          ). URL: https://github.com/INGEOTEC/RegionalEmbeddings.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>N. V.</given-names>
            <surname>Chawla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. W.</given-names>
            <surname>Bowyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. O.</given-names>
            <surname>Hall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. P.</given-names>
            <surname>Kegelmeyer</surname>
          </string-name>
          , Smote:
          <article-title>Synthetic minority over-sampling technique</article-title>
          ,
          <source>J. Artif. Intell. Res</source>
          .
          <volume>16</volume>
          (
          <year>2002</year>
          )
          <fpage>321</fpage>
          -
          <lpage>357</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Karafiát</surname>
          </string-name>
          , L. Burget,
          <string-name>
            <given-names>J.</given-names>
            <surname>Černocký</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Khudanpur</surname>
          </string-name>
          ,
          <article-title>Recurrent neural network based language model</article-title>
          ,
          <source>in: INTERSPEECH 2010</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>1045</fpage>
          -
          <lpage>1048</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>Convolutional neural networks for sentence classification</article-title>
          ,
          <source>in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          , Association for Computational Linguistics, Doha, Qatar,
          <year>2014</year>
          , pp.
          <fpage>1746</fpage>
          -
          <lpage>1751</lpage>
          . URL: https://www.aclweb.org/anthology/D14-1181. doi:10.3115/v1/D14-1181.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15] E. Garcia, Mercado libre data challenge, https://github.com/eduagarcia/meli-challenge-2019, 2019.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>