UMUTeam at MEX-A3T’2020: Detecting
Aggressiveness with Linguistic Features and Word
Embeddings
José Antonio García-Díaza , Rafael Valencia-Garcíaa
a
    Facultad de Informática, Universidad de Murcia, Campus de Espinardo, 30100, Spain


Abstract
Social networks have become a dangerous place, used by some people to harass others by taking advantage of the anonymity that the Internet provides. This behaviour can cause long-term harm to victims or, in extreme cases, lead them to commit suicide. Due to the large volume of posts created daily, manual supervision to prevent this harmful content becomes impossible. In this paper we describe our participation in MEX-A3T, focused on aggressiveness identification in tweets written in Mexican Spanish. Our proposal is grounded in the combination of linguistic features and pre-trained word embeddings. In our first run, we use Support Vector Machines to train on a combination of linguistic features and sentence embeddings, whereas in the second and third runs the linguistic features are combined with two deep-learning models: a Bidirectional Long Short-Term Memory and a Convolutional Neural Network. Although our proposal does not beat the baseline, it almost achieves the same results while providing an interpretable model.

                                         Keywords
Sentiment Analysis, Text Classification, Deep Learning




1. Introduction
Some people use social networks as a means to harass others by taking advantage of the anonymity
or the perceived social distance that the Internet provides [1]. The consequences of this
behaviour should not be taken lightly, since it can cause long-term harm to victims and, in
extreme situations, lead them to severe isolation and even suicide [2]. Due to the
large volume of posts created every day, manual supervision of harmful material becomes very
complicated, if not impossible, in some cases [3]. In order to prevent this harmful behaviour,
researchers have taken advantage of modern Natural Language Processing (NLP) techniques
to analyse large amounts of data and thereby improve automatic hate-speech detectors.
In line with this, many NLP workshops have proposed the creation
of hate-speech detectors. In HatEval [4], for example, the objective consisted of the multilingual
detection of hate speech against immigrants and women. Other tasks have focused on specific
groups, such as women. In this sense, AMI (Automatic Misogyny Identification) is a task focused
on spotting several misogynistic traits, proposed in [5] and [6], with datasets composed of texts in

Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020)
email: joseantonio.garcia8@um.es (J.A. García-Díaz); valencia@um.es (R. Valencia-García)
orcid: 0000-0002-3651-2660 (J.A. García-Díaz); 0000-0003-2457-1791 (R. Valencia-García)
                                       © 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
English, Spanish, and Italian.
   One of the reasons why hate-speech identification is hard is that it is heavily dependent on
culture and background, even among countries that share the same language [7]. Therefore, in this
paper we describe our participation in the Aggressiveness Identification track of MEX-A3T’2020
[8], which involved the identification of tweets written in Mexican Spanish labelled as aggressive.
Our proposal is grounded on the usage of linguistic features and different forms of pre-trained
word embeddings. The rest of the paper is organised as follows. First, in Section 2, our proposal
and the different runs submitted are described. Then, in Section 3, we describe the achieved
results. Finally, the lessons learned and further work are described in Section 4.


2. System description
In the literature, several approaches to hate-speech identification can be found. Some
of them are oriented towards specific forms of hate speech, such as cyberbullying [9, 10, 11],
misogyny identification [12, 13], or detecting hate speech towards immigrants [14]. Others, such
as HatEval [4], included several target groups.
   The corpus consists of tweets compiled from Mexico City. The training split is composed of
7,332 tweets, 2,110 of which were labelled as aggressive and the remainder as
non-aggressive. We lowercase the tweets, strip multiple white spaces, and remove the
hashtag symbol. Then, we extract the linguistic features with UMUTextStats [15], a tool inspired
by Linguistic Inquiry and Word Count (LIWC) [16]. Although LIWC is available in Spanish
[17, 18], it does not consider some Spanish linguistic phenomena, such as grammatical gender or
a deep classification of parts of speech, morphemes or suffixes. UMUTextStats manages a total
of 311 linguistic features organised into: (1) grammatical features, (2) morphological features,
(3) spelling and stylistic errors, (4) figurative language [19], (5) statistics regarding the sentence
type, (6) punctuation symbols, (7) topics, and (8) a great variety of positive and negative feelings.
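The preprocessing described above (lowercasing, collapsing white spaces, and removing the hashtag symbol) can be sketched as follows; the function name is illustrative, and UMUTextStats itself is not reproduced here:

```python
import re

def preprocess(tweet: str) -> str:
    """Lowercase the tweet, drop the '#' symbol (keeping the hashtag's text),
    and strip multiple white spaces, as done before feature extraction."""
    tweet = tweet.lower()
    tweet = tweet.replace("#", "")               # keep the word, drop the symbol
    tweet = re.sub(r"\s+", " ", tweet).strip()   # collapse runs of white space
    return tweet

print(preprocess("ODIO   este #Lunes  "))  # -> "odio este lunes"
```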
   In our proposal, we combine linguistic features with word embeddings [20] and sentence
embeddings [21]. Word embeddings can be trained from news sites and public encyclopedias
to convey general semantic rules. For the first run, we use Weka [22] to train a Support
Vector Machine (SVM) on the combination of linguistic features and sentence embeddings.
Specifically, we use the open-source library LibLinear [23], an efficient Support Vector
Classifier (SVC) with linear kernels. The sentence embeddings were obtained from pre-trained
word embeddings based on the Spanish model of fastText [24]. The model was trained with the
provided training dataset and evaluated with 10-fold cross-validation. For the second and third
runs we use the functional API of Keras [25]. In both runs we combined the linguistic features
with pre-trained word embeddings, using a Bidirectional Long Short-Term Memory (BiLSTM)
for the second run and a Convolutional Neural Network (CNN) for the third run. The BiLSTM was
selected because it can handle long semantic dependencies, and the CNN was selected because it can
spot local information regardless of its position, obtaining syntactic and semantic information
[26]. Both runs were trained for 10 epochs with a batch size of 32. In a nutshell, the layer
architecture can be described as follows: the linguistic features are the input of a hidden layer
with a dimensionality of 10, and its output is concatenated with the output of the CNN or BiLSTM
(depending on the run). The combined output is then connected sequentially to
two more hidden dense layers (both with Rectified Linear Unit, ReLU, as activation function)
before the final output layer, which performs binary prediction with a sigmoid activation function.
In run 2, the BiLSTM architecture consists of a Bidirectional layer with a dropout of 0.2
and a recurrent dropout of 0.2, connected to a hidden layer with a dimensionality of 10 and
softmax as activation function. In run 3, the CNN architecture consists of a
one-dimensional convolutional layer (Conv1D) with 128 output filters, a 1D window
size of 5, and ReLU as activation function. This layer is connected to a
GlobalMaxPool1D layer, which in turn is connected to two more hidden layers with a dimensionality
of 10.

Table 1
Comparison of our runs with the two baselines and the winner of the task
              Model         F1-OFF   F1-NON     F1 macro       P       R      ACC
              best-result   0.7998     0.9195     0.8596    0.8605   0.8588   0.8851
              baseline1     0.7124     0.8841     0.7983    0.7988   0.7988   0.8348
              baseline2     0.6760     0.8780     0.7770       -        -     0.8228
              run2          0.6727     0.8706     0.7716    0.7744   0.7691   0.8145
              run3          0.6516     0.8771     0.7644    0.7644   0.7503   0.8183
              run1          0.5892     0.8430     0.7161    0.7223   0.7112   0.7728
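A minimal sketch of the run 2 (BiLSTM) architecture with the Keras functional API is shown below. The vocabulary size, sequence length, embedding dimension and number of LSTM units are placeholders (the paper does not state them), and the pre-trained fastText weights are omitted; only the 311 linguistic features and the layer dimensionalities of 10 come from the paper:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB, SEQ_LEN, EMB_DIM, N_FEATS = 10000, 50, 300, 311  # placeholders, except 311

# Branch 1: linguistic features into a hidden layer with a dimensionality of 10
feats_in = layers.Input(shape=(N_FEATS,), name="linguistic_features")
feats_h = layers.Dense(10, activation="relu")(feats_in)

# Branch 2: token ids -> embeddings -> BiLSTM (dropout 0.2, recurrent dropout 0.2)
tokens_in = layers.Input(shape=(SEQ_LEN,), name="tokens")
emb = layers.Embedding(VOCAB, EMB_DIM)(tokens_in)  # fastText weights would go here
lstm = layers.Bidirectional(
    layers.LSTM(64, dropout=0.2, recurrent_dropout=0.2))(emb)
lstm_h = layers.Dense(10, activation="softmax")(lstm)

# Concatenate both branches, then two more ReLU layers and a sigmoid output
merged = layers.Concatenate()([feats_h, lstm_h])
x = layers.Dense(10, activation="relu")(merged)
x = layers.Dense(10, activation="relu")(x)
out = layers.Dense(1, activation="sigmoid")(x)

model = Model(inputs=[feats_in, tokens_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

The run 3 variant would replace the Bidirectional layer with Conv1D(128, 5, activation="relu") followed by GlobalMaxPool1D.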


3. Results
The organisers of MEX-A3T’2020 ranked the participants using the F1 measure on the
aggressive class. They also created two baseline models. The first one is based on a Bag of Words
(BoW) model trained with an SVM, which achieves an F1 measure of 0.6760 on the offensive class,
whereas the second baseline was trained with a Bidirectional Gated Recurrent Unit network (BiGRU),
which achieves an F1 measure of 0.7124 on the offensive class. The comparison of our three runs
with the two baseline models and the winning run is shown in Table 1. This table contains the F1
measure of both the offensive (F1-OFF) and the non-offensive (F1-NON) classes, as well as the
macro F1 measure (F1 macro), the precision (P), recall (R) and accuracy (ACC).
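The ranking metric is the standard F1 score, the harmonic mean of precision and recall; a minimal sketch (the function name is illustrative):

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; the task ranks systems by this
    measure computed on the aggressive class."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

The macro F1 in Table 1 is the arithmetic mean of the two per-class F1 scores, e.g. for run2: (0.6727 + 0.8706) / 2 ≈ 0.7716.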
   Our best result was achieved by run2 (BiLSTM+LF), which almost equals the second
baseline for the F1-offensive measure (0.6727 vs 0.6760), while run3 drops the F1-offensive slightly
(0.6516 vs 0.6760); however, both runs were far from the first baseline. In order to understand which
are the most discriminating features, we calculated the Information Gain (IG) on the training
dataset (not shown). We observed that linguistic features related to negative sentiments are the
most discriminating ones, including offensive language, sadness, anger, and anxiety, among
others. Other discriminating features were swear words (vulgar expressions, but not necessarily
offensive). Twitter mentions and verbs in the first person are also discriminating features, which
suggests that some of the threats and offensive expressions that appear in the texts are made
in the first person and towards specific people. It is worth noting that demonyms appear among the
top twenty linguistic features, which suggests that some of the tweets refer to people belonging to
specific places or ethnic groups.
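The Information Gain of a feature with respect to the class labels is the reduction in class entropy once the feature's value is known. A minimal sketch for discrete feature values is given below; in practice the continuous UMUTextStats counts would first be discretised, and the function names are illustrative:

```python
from collections import Counter
from math import log2

def entropy(labels) -> float:
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels) -> float:
    """IG(class; feature) = H(class) - H(class | feature); higher values mean
    the feature better separates aggressive from non-aggressive tweets."""
    n = len(labels)
    conditional = 0.0
    for v in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == v]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

# A perfectly discriminating binary feature recovers the full class entropy:
print(information_gain([1, 1, 0, 0], ["agg", "agg", "non", "non"]))  # -> 1.0
```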




4. Conclusions
In this paper we have described our participation in the MEX-A3T task on aggressiveness
identification, with experiments that rely on linguistic features and different combinations of
word embeddings; however, our best results fall short of the baseline results by a minimal
margin. After an analysis of the results and the discriminating linguistic features, we draw
the following insights: (1) aggressiveness in social networks is characterised by the usage of
strongly offensive language as well as misspelled words and linguistic errors; (2) the number and
relevance of verbs in the first person singular indicates that threats are commonly made
directly; (3) the number of Twitter mentions and the appearance of demonyms do not make
clear whether the targets are ethnic groups or individuals, so a more detailed analysis of
the results is required.
   However, these findings should be taken with caution. As we observed, our proposal did
not improve on any of the proposed baselines. Furthermore, we consider two main lines of future
work. First, as our proposal was designed with European Spanish in mind, it should
be adapted to handle specific language varieties and their cultural background. Second, we
will include state-of-the-art NLP techniques, such as BERT or ELMo, in order to evaluate the
classification accuracy.


Acknowledgments
This work has been supported by the Spanish National Research Agency (AEI) and the European
Regional Development Fund (FEDER/ERDF) through projects KBS4FIA (TIN2016-76323-R) and
LaTe4PSP (PID2019-107652RB-I00). In addition, José Antonio García-Díaz has been supported
by Banco Santander and University of Murcia through the Doctorado industrial programme.


References
 [1] N. Lapidot-Lefler, A. Barak, Effects of anonymity, invisibility, and lack of eye-contact
     on toxic online disinhibition, Computers in Human Behavior 28 (2012) 434–443. URL:
     http://www.sciencedirect.com/science/article/pii/S0747563211002317.
     doi:10.1016/j.chb.2011.10.014.
 [2] G. S. O’Keeffe, K. Clarke-Pearson, et al., The impact of social media on children, adolescents,
     and families, Pediatrics 127 (2011) 800–804.
 [3] E. Chandrasekharan, U. Pavalanathan, A. Srinivasan, A. Glynn, J. Eisenstein, E. Gilbert,
     You can’t stay here: The efficacy of reddit’s 2015 ban examined through hate speech,
     Proceedings of the ACM on Human-Computer Interaction 1 (2017) 1–22.
 [4] V. Basile, C. Bosco, E. Fersini, D. Nozza, V. Patti, F. M. R. Pardo, P. Rosso, M. Sanguinetti,
     Semeval-2019 task 5: Multilingual detection of hate speech against immigrants and women
     in twitter, in: Proceedings of the 13th International Workshop on Semantic Evaluation,
     2019, pp. 54–63.
 [5] E. Fersini, P. Rosso, M. Anzovino, Overview of the task on automatic misogyny identifica-
     tion at ibereval 2018., in: IberEval@ SEPLN, 2018, pp. 214–228.




 [6] E. Fersini, D. Nozza, P. Rosso, Overview of the evalita 2018 task on automatic misogyny
     identification (ami), EVALITA Evaluation of NLP and Speech Tools for Italian 12 (2018) 59.
 [7] A. M. Croom, Spanish slurs and stereotypes for mexican-americans in the usa: A context-
     sensitive account of derogation and appropriation, Pragmática Sociocultural/Sociocultural
     Pragmatics 8 (2014) 145–179.
 [8] M. E. Aragón, H. Jarquín, M. Montes-y Gómez, H. J. Escalante, L. Villaseñor-Pineda,
     H. Gómez-Adorno, G. Bel-Enguix, J.-P. Posadas-Durán, Overview of mex-a3t at iberlef
     2020: Fake news and aggressiveness analysis in mexican spanish, in: Notebook Papers of
     2nd SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF), Malaga, Spain,
     September, 2020.
 [9] H. Rosa, N. Pereira, R. Ribeiro, P. C. Ferreira, J. P. Carvalho, S. Oliveira, L. Coheur, P. Paulino,
     A. V. Simão, I. Trancoso, Automatic cyberbullying detection: A systematic review, Com-
     puters in Human Behavior 93 (2019) 333–345.
[10] M. A. Al-garadi, K. D. Varathan, S. D. Ravana, Cybercrime detection in online communica-
     tions: The experimental case of cyberbullying detection in the twitter network, Computers
     in Human Behavior 63 (2016) 433–443.
[11] A. López-Martínez, J. A. García-Díaz, R. Valencia-García, A. Ruiz-Martínez, Cyberdect. a
     novel approach for cyberbullying detection on twitter, in: International Conference on
     Technologies and Innovation, Springer, 2019, pp. 109–121.
[12] E. Shushkevich, J. Cardiff, Automatic misogyny detection in social media: A survey,
     Computación y Sistemas 23 (2019). doi:10.13053/cys-23-4-3299.
[13] T. Lynn, P. T. Endo, P. Rosati, I. Silva, G. L. Santos, D. Ging, A comparison of machine
     learning approaches for detecting misogynistic speech in urban dictionary, in: 2019
     International Conference on Cyber Situational Awareness, Data Analytics And Assessment
     (Cyber SA), IEEE, 2019, pp. 1–8.
[14] A. Ben-David, A. M. Fernández, Hate speech and covert discrimination on social media:
     Monitoring the facebook pages of extreme-right political parties in spain, International
     Journal of Communication 10 (2016) 27.
[15] J. A. García-Díaz, M. Cánovas-García, R. Valencia-García, Ontology-driven aspect-based
     sentiment analysis classification: An infodemiological case study regarding infectious
     diseases in latin america, Future Generation Computer Systems 112 (2020) 614–657.
     doi:10.1016/j.future.2020.06.019.
[16] Y. R. Tausczik, J. W. Pennebaker, The psychological meaning of words: Liwc and com-
     puterized text analysis methods, Journal of language and social psychology 29 (2010)
     24–54.
[17] M. del Pilar Salas-Zárate, M. A. Paredes-Valverde, M. Á. Rodríguez-García, R. Valencia-
     García, G. Alor-Hernández, Automatic detection of satire in twitter: A psycholinguistic-
     based approach, Knowl. Based Syst. 128 (2017) 20–33. URL:
     https://doi.org/10.1016/j.knosys.2017.04.009. doi:10.1016/j.knosys.2017.04.009.
[18] M. del Pilar Salas-Zárate, E. López-López, R. Valencia-García, N. Aussenac-Gilles, Á. Almela,
     G. Alor-Hernández, A study on LIWC categories for opinion mining in spanish reviews, J.
     Inf. Sci. 40 (2014) 749–760. URL: https://doi.org/10.1177/0165551514547842.
     doi:10.1177/0165551514547842.
[19] M. del Pilar Salas-Zárate, G. Alor-Hernández, J. L. Sánchez-Cervantes, M. A. Paredes-



     Valverde, J. L. García-Alcaraz, R. Valencia-García, Review of english literature on figurative
     language applied to social networks, Knowledge and Information Systems (2019) 1–33.
[20] T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient estimation of word representations in
     vector space, arXiv preprint arXiv:1301.3781 (2013).
[21] C. Zhang, S. Sah, T. Nguyen, D. Peri, A. Loui, C. Salvaggio, R. W. Ptucha, Semantic sentence
     embeddings for paraphrasing and text summarization, CoRR abs/1809.10267 (2018). URL:
     http://arxiv.org/abs/1809.10267. arXiv:1809.10267.
[22] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, I. H. Witten, The weka data
     mining software: an update, ACM SIGKDD explorations newsletter 11 (2009) 10–18.
[23] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, C.-J. Lin, Liblinear: A library for large
     linear classification, Journal of machine learning research 9 (2008) 1871–1874.
[24] E. Grave, P. Bojanowski, P. Gupta, A. Joulin, T. Mikolov, Learning word vectors for 157
     languages, arXiv preprint arXiv:1802.06893 (2018).
[25] F. Chollet, et al., Keras, https://github.com/fchollet/keras, 2015.
[26] M. A. Paredes-Valverde, R. C. Palacios, M. del Pilar Salas-Zárate, R. Valencia-García,
     Sentiment analysis in spanish for improvement of products and services: A deep learning
     approach, Scientific Programming 2017 (2017) 1329281:1–1329281:6. URL:
     https://doi.org/10.1155/2017/1329281. doi:10.1155/2017/1329281.



