<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Detecting Offensiveness in Social Network Comments</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marta Navarron García</string-name>
          <email>martanavarron@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Isabel Segura Bedmar</string-name>
          <email>isegura@inf.uc3m.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science Department, University Carlos III of Madrid, Avenida de la Universidad 30</institution>
          ,
          <addr-line>28911, Leganés, Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Social media undoubtedly has a significant influence on our lives. Although it offers many advantages, it also has some disadvantages for society, particularly for young people. A very large number of social media users are subjected to different types of abuse (such as harassment, racism, and personal attacks) every day. The main goal of MeOffendEs@IberLEF 2021 is to promote research on the analysis of offensive language in social networks for Spanish. This paper describes our participation in the shared task of MeOffendEs@IberLEF 2021 [40]. We have explored different deep learning models such as Long Short-Term Memory (LSTM) and Bidirectional Encoder Representations from Transformers (BERT), as well as traditional machine learning models such as Logistic Regression and Support Vector Machines (SVM), among others, to classify the comments (written in Spanish) into the four classes defined in the OffendEs corpus. The results of our experiments show that BERT obtains the best results among all of our models.</p>
      </abstract>
      <kwd-group>
        <kwd>Multi-Class Text Classification</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Sentiment Analysis</kwd>
        <kwd>Long Short-Term Memory</kwd>
        <kwd>Bidirectional Encoder Representations from Transformers</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In the last few years, social networks have become a way of life for many people.
People use them to express themselves, make themselves known, advertise,
or simply socialise with other people, and these platforms have become places where
publications constantly attract opinions and comments.
But although they are a means of expression, there are always comments that can
become offensive to a group of people or to a specific person or user, turning
these platforms into a tool of threat which can cause long-term harm to victims.</p>
      <p>Among them, YouTube, Instagram and Twitter are some of the most famous
social networks, having millions of active users around the world. In the case of
Twitter, it allows the user to send or receive small posts called tweets. Tweets are
comments, mostly sentences of no more than 280 characters, in which a
user posts an opinion or comments on a particular topic. Moreover, a post can
include images, videos, links or references to other users. In the case of YouTube,
it is a website dedicated to sharing videos, where users can comment and share
different opinions. Instagram is also a social network whose main function is to
share photos and short videos with other users, who can also comment on them.
Although these social networks already have some measures in place to avoid
inappropriate comments or images that may cause harm to other users, most
of the time these measures are neither very robust nor very fast at detecting all such comments.</p>
      <p>
        There are studies in the field of social networks in which NLP is used to
analyse the behaviour of different user profiles or opinions, as well as to detect
user behaviour or trends. For example, there are studies in which it is possible
to observe and predict the favorability of users towards a political group based on
their comments [<xref ref-type="bibr" rid="ref21">33</xref>]; there are also others in the field of mental health,
in which it is possible to detect the level of depression based on comments on
Twitter [<xref ref-type="bibr" rid="ref19">31</xref>], and many others.
      </p>
      <p>Thus, NLP can be used to analyse social media. The goal of this work is to
explore different NLP and machine learning techniques to detect and classify the
offensiveness that a tweet or a comment may have. This task can be viewed as
a sentiment analysis task, that is, the process of detecting polarity, feelings
or even intentions in texts.</p>
      <p>
        This work describes our participation in the shared task of MeOffendEs@
IberLEF 2021 [<xref ref-type="bibr" rid="ref28">40</xref>], which aims at the analysis of offensive language in social networks
for Spanish. Although the task has four subtasks, where different scenarios are
proposed, we have only participated in the first subtask, where the goal is to classify
the comments (written in Spanish) into the four classes defined in the OffendEs
corpus. We explore different deep learning models such as Long Short-Term
Memory (LSTM) [<xref ref-type="bibr" rid="ref11">23</xref>] and Bidirectional Encoder Representations from
Transformers (BERT) [17], and also traditional machine learning models such as Support Vector
Machines and Logistic Regression, among others. Our approaches use only the
text, without exploiting any contextual information about the users or the
related social media.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>
        In the last few years, the detection of toxic content in social media has received
considerable attention from the NLP community [<xref ref-type="bibr" rid="ref27">39</xref>], [<xref ref-type="bibr" rid="ref35">47</xref>], [52]. Most existing
approaches have been built on classical machine learning (ML) techniques [19],
[<xref ref-type="bibr" rid="ref36">48</xref>], [<xref ref-type="bibr" rid="ref31">43</xref>]; however, recently deep learning methods [<xref ref-type="bibr" rid="ref11">23</xref>], [17], [15] have also been
applied to the task. In this section, we review some of the main studies of
toxic language detection in social media.
      </p>
      <p>[13] represented texts with a set of lexical and syntactic features. SVM and
Naïve Bayes were used for two different tasks, detecting offensive content and
identifying potentially offensive users in social media, with SVM being the best classifier,
achieving an F1 of 96.2% for the task of detecting offensive texts and an F1 of 77.8% for
the task of identifying potentially offensive users.</p>
      <p>
        [9] explored several classical machine learning algorithms to detect abusive
language (racism, sexism, hate speech, aggression and personal attacks). The
authors used the Bag-of-Words model [53] to represent the texts. The Naïve Bayes
algorithm obtained the top F1-score (81.85%) on the Wikipedia talk dataset [<xref ref-type="bibr" rid="ref18">30</xref>].
      </p>
      <p>
        [12] also used an SVM with linear kernel and FastText [<xref ref-type="bibr" rid="ref14">26</xref>], a library for
text classification based on a neural network with only one hidden layer.
The authors only provided recall scores. The experiments showed that the SVM
outperformed FastText for the task of abusive language detection.
      </p>
      <p>In [20], the authors created their own dataset of tweets annotated with five
categories to classify the level of harassment of each tweet. The categories are:
1) most offensive or violent messages, 2) threats, 3) hate speech, 4) directed
harassment, and 5) potentially offensive.</p>
      <p>
        In [10], several classical machine learning algorithms (such as logistic
regression, multinomial Naïve Bayes, and random forest) were applied to
detect abusive comments. The authors used TF-IDF to represent the texts. They
also applied a bidirectional long short-term memory (BiLSTM) [<xref ref-type="bibr" rid="ref11">23</xref>] network to the task.
      </p>
      <p>
        [54] explored some of the most popular language models based on
transformers (such as BERT [17], RoBERTa [<xref ref-type="bibr" rid="ref20">32</xref>] and XLM [15]) applied to the task
of toxic comment classification. Their results show that BERT and RoBERTa
obtained better results than XLM.
      </p>
      <p>The majority of previous studies concerning behaviour detection in social
networks are in English; very few efforts have been made to address this kind
of task in Spanish. Below we describe some of the studies on toxicity detection
in texts written in Spanish.</p>
      <p>
        [<xref ref-type="bibr" rid="ref29">41</xref>] proposed different approaches to detect misogyny and xenophobia
in Spanish tweets. They applied different classical supervised machine learning
techniques such as Naïve Bayes, SVM, logistic regression, decision trees, and an
ensemble voting classifier. They also applied an LSTM model to the task.
Moreover, they developed their own linguistic resource containing a set of
hateful concepts correlated with hateful words towards women and/or immigrants.
The authors also employed the iSOL lexicon [18], a dictionary of positive and
negative words, and word embeddings from the model [<xref ref-type="bibr" rid="ref8">8</xref>]. The authors consider
their results with the lexicon-based approach "are more than acceptable results"
compared to other machine learning approaches. The decision tree shows the worst
results with an F1-score of 0.686, while multinomial Naïve Bayes and logistic
regression obtain the top performance with F1-scores of 0.728 and 0.73,
respectively. Moreover, the LSTM model obtained a similar performance with an
F1-score of 0.704. The authors also developed an ensemble voting classifier that
combined both the multinomial Naïve Bayes and logistic regression, achieving the
best result with an F1-score of 74.2%. Later, in 2021, the same authors [<xref ref-type="bibr" rid="ref30">42</xref>]
explored different pre-trained language models based on transfer learning (BERT,
XLM and BETO). BETO was the approach that obtained the best F1
(77.6%).
      </p>
      <p>Most previous works have focused on toxicity detection in English texts.
MeOffendEs@IberLEF 2021 is a competition to boost research on the detection
of offensive language in social media, a sensitive topic that has hardly been
addressed for the Spanish language. The organizers of the competition have created
a dataset that collects comments written in Spanish from different social networks.</p>
      <p>The organisation proposes a series of tasks, which mainly consist of
classifying the comments into different categories using metadata and additional
information. There are a total of four different subtasks:
– Subtask 1: Non-contextual multiclass classification for generic Spanish.
– Subtask 2: Contextual multiclass classification for generic Spanish.
– Subtask 3: Non-contextual binary classification for Mexican Spanish.
– Subtask 4: Contextual binary classification for Mexican Spanish.</p>
      <p>The main difference between the subtasks is the variant of the language:
generic Spanish or Mexican Spanish. Moreover, while the first and third subtasks
do not provide contextual information, the second and fourth subtasks allow the use of
contextual metadata related to the comment, such as the user
or the related social media.</p>
      <p>We have only participated in subtask 1, whose goal is to detect the
offensiveness of comments written in Spanish using only the texts. There are a
total of four classes, OFG, OFP, NOM and NO, which are described in the
next sections.</p>
    </sec>
    <sec id="sec-3">
      <title>Materials</title>
    </sec>
    <sec id="sec-4">
      <title>Methods</title>
      <p>
        This section starts by describing in detail the dataset of the MeOffendEs@IberLEF
task [<xref ref-type="bibr" rid="ref28">40</xref>]. Then we present the approaches that we have developed for our
participation in the task.
      </p>
      <sec id="sec-4-1">
        <title>Dataset</title>
        <p>The dataset consists of comments from different social media platforms such as
YouTube, Instagram and Twitter. It contains more than 50,000 comments in
Spanish, making this corpus the largest and most varied Spanish dataset for
offensive language analysis. Each comment in the dataset has a text, a numerical
ID and a label that provides the offensiveness level and its target. The different
categories are:
– OFP: the comment is offensive and its target is a person.
– OFG: the comment is offensive and its target is a group of people or
collective.
– NOM: the comment is non-offensive, but uses inadequate language.
– NO: the comment is non-offensive.</p>
        <p>As an example from our data, the comment "verguenza ajena like si crees que
windy parece retrasada", which roughly means "like, if you think that Windy looks
stupid, cringe", is a clear example of the category OFP: its content is offensive,
it uses insulting words and it denigrates a person.</p>
        <p>The organisers provided a training set with 16,710 comments. During the
evaluation, they also provided a test set with a total of 13,607 comments. These
comments are not classified, that is, they do not include their corresponding
label.</p>
        <p>We randomly split this training dataset into two subsets with a ratio of 80:20. The
first subset is used for training our models and the second one to tune their
hyper-parameters.</p>
        <p>Fig. 1 shows the class distribution, which is very similar in both subsets.
There is a strongly unbalanced distribution of the classes, with NO being the class
with the most instances. However, there is still a large number of comments using
offensive language.</p>
        <p>[Fig. 1: Class distribution on the training and validation datasets. A horizontal bar chart plots the number of comments per class (OFP, OFG, NOM, NO) for the training and validation subsets; NO is by far the majority class (10,539 training and 2,673 validation comments), while OFG is the minority class (168 training and 44 validation comments).]</p>
      </sec>
      <sec id="sec-4-2">
        <title>Traditional Machine Learning approach</title>
        <p>
          Data preprocessing Preprocessing techniques help us clean the texts and
reduce the size of the vocabulary used to represent the comments. We have applied
the following techniques to preprocess the comments of the datasets (a minimal
sketch follows the list):
– Convert the comments to lower case.
– Tokenize the text and remove the stopwords (words without semantic
meaning). To do this, we use the NLTK library [<xref ref-type="bibr" rid="ref7">7</xref>].
– Normalise tokens by applying the Snowball stemming technique [<xref ref-type="bibr" rid="ref3">3</xref>].
– Remove different symbols, words with numbers, punctuation, etc.
        </p>
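        <p>As an illustration, the pipeline can be sketched with NLTK and the Snowball stemmer as below; the exact filtering rules are our own illustrative choices and may differ from the experiments.</p>
        <preformat>
import re
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import SnowballStemmer

stemmer = SnowballStemmer('spanish')
spanish_stopwords = set(stopwords.words('spanish'))

def preprocess(comment):
    # Lower-case and replace punctuation and other symbols with spaces
    text = re.sub(r'[^\w\s]', ' ', comment.lower())
    # Tokenize, keep only purely alphabetic tokens, and drop stopwords
    tokens = [t for t in word_tokenize(text, language='spanish')
              if t.isalpha() and t not in spanish_stopwords]
    # Normalise each surviving token with the Snowball stemmer
    return ' '.join(stemmer.stem(t) for t in tokens)
        </preformat>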
        <p>Another aspect that we have previously analysed is the influence of
emoticons. We have carried out an analysis in which we converted emoticons and different
emojis to text, i.e. each emoticon was mapped to a description such as happy,
sad or shy. However, there is little difference between the results obtained by
keeping and transforming these emoticons and those obtained by removing these symbols.</p>
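        <p>For example, such a conversion can be sketched with the third-party emoji package (the paper does not name the library actually used, so this is only one possible implementation):</p>
        <preformat>
import emoji

# Replace each emoji with a textual description,
# e.g. a smiling face becomes ':smiling_face_with_smiling_eyes:'
text_with_descriptions = emoji.demojize(comment)
        </preformat>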
        <p>
          After text preprocessing, we need to transform the text into
a vector representation to be used as input to our models. We have applied two different methods. First,
we convert each sentence into a vector using the TF-IDF model [<xref ref-type="bibr" rid="ref5">5</xref>]. The
TF-IDF score is calculated by multiplying the term frequency (TF)
of a word by its inverse document frequency (IDF). To obtain the IDF, the total
number of documents is divided by the number of documents that contain the
word, and the logarithm is applied to this result. The higher the TF-IDF score of a word,
the more relevant the word is. As a result of applying this method, we obtain the
processed data and we can start to train the models.
        </p>
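        <p>A minimal sketch of this vectorization step with scikit-learn; train_comments and val_comments are hypothetical lists of preprocessed comments.</p>
        <preformat>
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer()                       # TF-IDF weighting, default settings
X_train = vectorizer.fit_transform(train_comments)  # learn vocabulary and IDF on training data
X_val = vectorizer.transform(val_comments)          # reuse the same vocabulary for validation
        </preformat>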
        <p>
          To deal with the problem of unbalanced classes, we have applied different
techniques such as undersampling and oversampling [<xref ref-type="bibr" rid="ref10">22</xref>], and the Synthetic
Minority Over-sampling Technique (SMOTE) [11]. Undersampling and oversampling
techniques handle the imbalance problem by randomly resampling the training
dataset. Undersampling deletes instances from the majority class, while
oversampling duplicates instances from the minority class. SMOTE is an
oversampling technique which focuses on the feature space of each target class and
its nearest neighbours, generating new instances by interpolating
between positive instances that lie together [<xref ref-type="bibr" rid="ref9">21</xref>]. To apply these techniques
we have used the corresponding functions from the Python package imblearn
with their default parameters.
        </p>
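        <p>A sketch of these resampling steps with imblearn, assuming TF-IDF features X_train and labels y_train:</p>
        <preformat>
from imblearn.over_sampling import RandomOverSampler, SMOTE
from imblearn.under_sampling import RandomUnderSampler

# Each technique is used with its default parameters, as in our experiments
for sampler in (RandomOverSampler(), RandomUnderSampler(), SMOTE()):
    X_res, y_res = sampler.fit_resample(X_train, y_train)
    # X_res, y_res is the rebalanced training set for the given technique
        </preformat>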
        <p>Now we briefly explain the different classifiers that we have used.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Random Forest</title>
        <p>Random Forest is a supervised learning algorithm. A Random Forest classifier
consists of a large number of decision trees that operate as an ensemble. The
RandomForestClassifier function from the Python package sklearn is used
to train the model. We use a total of 100 trees and default values for the rest of the
parameters.</p>
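        <p>A minimal sketch of this configuration (X_train, y_train and X_val are the TF-IDF features and labels described above):</p>
        <preformat>
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100)  # 100 trees, remaining parameters by default
rf.fit(X_train, y_train)
rf_predictions = rf.predict(X_val)
        </preformat>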
      </sec>
      <sec id="sec-4-4">
        <title>Support Vector Machine (SVM)</title>
        <p>
          SVM [16] is a supervised machine learning algorithm that uses the kernel trick.
This technique finds the optimal hyperplane that separates the instances of the
classes. SVMs are commonly used for text classification [<xref ref-type="bibr" rid="ref38">50</xref>], where texts are
usually represented using the TF-IDF model. The LinearSVC function from the
Python package sklearn is used to train the model, using the balanced class
weight and default values for the rest of the parameters.
        </p>
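        <p>A minimal sketch of this configuration:</p>
        <preformat>
from sklearn.svm import LinearSVC

# Balanced class weight to compensate for the unbalanced classes,
# other parameters by default
svm = LinearSVC(class_weight='balanced')
svm.fit(X_train, y_train)
        </preformat>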
      </sec>
      <sec id="sec-4-5">
        <title>Nave Bayes</title>
        <p>
          Naïve Bayes is a family of probabilistic algorithms that take advantage of
probability theory and Bayes' theorem [<xref ref-type="bibr" rid="ref6">6</xref>]. It is also used in NLP, applying Bayes'
theorem to predict the "probability for each class such as the probability that
given data point belongs to a particular class" [<xref ref-type="bibr" rid="ref33">45</xref>]. In this study,
multinomial Naïve Bayes is applied, using the MultinomialNB function from the
Python package sklearn to train the model.
        </p>
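        <p>A minimal sketch of this configuration:</p>
        <preformat>
from sklearn.naive_bayes import MultinomialNB

nb = MultinomialNB()        # default parameters
nb.fit(X_train, y_train)    # trained on the TF-IDF vectors described above
        </preformat>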
      </sec>
      <sec id="sec-4-6">
        <title>Logistic Regression</title>
        <p>Logistic regression is a statistical method that is used to predict the probability
of a binary outcome based on a set of independent variables. In our study, we
have used multinomial logistic regression, as we have a total of four classes.
The LogisticRegression function from the Python package sklearn
is used to train the model. As with most of the other models, we use the
balanced class weight and default values for the rest of the parameters.</p>
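        <p>A minimal sketch of one possible configuration consistent with the description above:</p>
        <preformat>
from sklearn.linear_model import LogisticRegression

# multi_class='multinomial' handles our four classes in a single model
lr = LogisticRegression(multi_class='multinomial', class_weight='balanced')
lr.fit(X_train, y_train)
        </preformat>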
      </sec>
      <sec id="sec-4-7">
        <title>Stochastic Gradient Descent (SGD)</title>
        <p>
          Stochastic Gradient Descent is an optimization technique for fitting linear
classifiers and regressors under convex loss functions, such as (linear) Support Vector
Machines and Logistic Regression [<xref ref-type="bibr" rid="ref25">37</xref>]. In this study we have trained a linear
classifier with SGD using the SGDClassifier function from the sklearn
package. We use the default parameters,
meaning that the loss function yields a linear SVM, and we have also used the
balanced class weight.
        </p>
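        <p>A minimal sketch of this configuration:</p>
        <preformat>
from sklearn.linear_model import SGDClassifier

# The default loss ('hinge') yields a linear SVM trained with SGD
sgd = SGDClassifier(class_weight='balanced')
sgd.fit(X_train, y_train)
        </preformat>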
      </sec>
      <sec id="sec-4-8">
        <title>Gradient Boosting Classifier</title>
        <p>
          Gradient Boosting Classifier [<xref ref-type="bibr" rid="ref23">35</xref>] is a machine learning technique that builds an
ensemble of weak prediction models, obtaining as a result a stronger
model. The gradient boosting classifier applies
boosting to optimize alternative loss functions. For this study, the
GradientBoostingClassifier function from the sklearn package is used to
train the model. We have used the default parameters for this task.
        </p>
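        <p>A minimal sketch of this configuration:</p>
        <preformat>
from sklearn.ensemble import GradientBoostingClassifier

gbc = GradientBoostingClassifier()  # default parameters
gbc.fit(X_train, y_train)
        </preformat>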
      </sec>
      <sec id="sec-4-9">
        <title>Deep Learning approach</title>
        <p>
          Data preprocessing and features module First, we clean the texts by removing
different symbols, words with numbers and punctuation. Then, the texts were tokenized
using the Keras tokenizer, with 10,000 as the maximum number of words. To
represent the comments, we use the random-initialization word embedding
technique [<xref ref-type="bibr" rid="ref17">29</xref>], that is, for each token of the vocabulary, a vector of numbers is
randomly created. The comments are truncated and padded so that all comments have the same
size (250 was defined as the maximum number of words in a
comment). Then, the models are initialized with these vectors.
        </p>
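        <p>A minimal sketch of this step with the Keras tokenizer, assuming train_comments is the list of cleaned comments:</p>
        <preformat>
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

tokenizer = Tokenizer(num_words=10000)           # keep the 10,000 most frequent words
tokenizer.fit_on_texts(train_comments)
sequences = tokenizer.texts_to_sequences(train_comments)
X_train = pad_sequences(sequences, maxlen=250)   # truncate/pad every comment to 250 tokens
        </preformat>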
        <p>LSTM for Offensive Classification In this section, we describe the
architecture of the LSTM model that we have used for the task of classifying the
comments into the four classes defined in the OffendEs corpus.</p>
        <p>Long Short-Term Memory (LSTM) is a type of recurrent neural network
capable of learning order dependence in sequence prediction problems, keeping
only relevant information from past inputs during training.</p>
        <p>
          The architecture of our LSTM model is explained layer by layer in the following
steps (a minimal sketch follows the list):
– The first layer of our LSTM model is the embedding layer. The embedding
layer is initialized with the embeddings obtained as a result of the
random initialization process. This layer uses vectors of length 250 to represent
each word.
– Before the LSTM layer, we add a dropout layer with a dropout rate of
0.2. This helps us to prevent overfitting. To do this, we use the
SpatialDropout1D function provided by Keras in Python [<xref ref-type="bibr" rid="ref37">49</xref>].
– The next layer is the LSTM layer with 100 memory units.
– The output layer is a single dense layer whose activation function is the softmax
function, assigning to each instance a probability of belonging to each class.
        </p>
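        <p>A minimal Keras sketch consistent with the layers listed above (the dimensions follow the text; this is not the verbatim training script):</p>
        <preformat>
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SpatialDropout1D, LSTM, Dense

model = Sequential([
    # Randomly initialized embeddings: 10,000-word vocabulary,
    # vectors of length 250, sequences of 250 tokens
    Embedding(input_dim=10000, output_dim=250, input_length=250),
    SpatialDropout1D(0.2),            # dropout rate of 0.2 against overfitting
    LSTM(100),                        # 100 memory units
    Dense(4, activation='softmax'),   # one probability per class
])
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])
        </preformat>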
        <p>
          The network was trained by minimizing the categorical cross-entropy
function, and the learning process was optimized with the
Adam [<xref ref-type="bibr" rid="ref16">28</xref>] algorithm with its default settings.
        </p>
        <p>
          Fig. 2 shows the approach based on the LSTM architecture.
BERT for Offensive Classification The second deep learning
architecture is the BERT model. BERT applies bidirectional training of
transformers, which can read entire sequences of tokens at once, as opposed to
directional models like LSTMs that read sequentially. Transformers are "an
attention mechanism that learns contextual relations between words" [<xref ref-type="bibr" rid="ref12">24</xref>] and consist
of two distinct mechanisms: an encoder and a decoder. The former reads the input,
while the latter produces the task prediction (in our case, a class for the input
comment). This provides a deeper understanding of language flow and context
than one-way language models.
        </p>
        <p>
          The data preprocessing prior to training the model is the same as the one we
applied to the LSTM model. We use the pre-trained BERT tokenizer
provided by HuggingFace [<xref ref-type="bibr" rid="ref2">2</xref>], originally implemented by the Google
team. After the encoding process, the BERT embedding vectors are obtained. This
transformation of the data corresponds to the input layer of the network.
        </p>
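        <p>A sketch of the tokenization step with the HuggingFace transformers library; the checkpoint name below is our assumption, since the text only states that a pre-trained BERT tokenizer from HuggingFace was used:</p>
        <preformat>
from transformers import BertTokenizer

# 'bert-base-multilingual-cased' is an assumed multilingual checkpoint covering Spanish
tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-cased')
encodings = tokenizer(list(train_comments), truncation=True,
                      padding='max_length', max_length=250)
        </preformat>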
        <p>Again, the activation function of the output layer is the softmax function,
assigning to each instance a probability of belonging to each class.</p>
        <p>As with the LSTM model, the network was trained by minimizing the
categorical cross-entropy function, and the learning process
was optimized with the Adam algorithm with its default settings.</p>
      </sec>
      <sec id="sec-4-10">
        <title>Regularization Details</title>
        <p>There are numerous cases where the training performance of a machine learning
algorithm is very high, yet the performance on the test set is poor.
This is common, and it happens due to overfitting of the model. Overfitting
occurs when the model fits the training data too closely (high variance), which makes
it hard for the model to generalize to new data that was not seen during training.</p>
        <p>To address this, different techniques are applied that help us handle the
overfitting problem, such as the already mentioned dropout and early stopping,
both applied in the deep learning models.</p>
        <p>
          Dropout In deep neural networks, dropout is a regularization technique in which
randomly chosen units are dropped during training to improve generalization [<xref ref-type="bibr" rid="ref34">46</xref>].
        </p>
        <p>
          Dropout acts as a regularizer on the network. At the training stage, input
nodes are randomly selected and ignored with probability 1 − p, meaning that
the dropout layer randomly sets input units to 0 with a given rate at each
step during training [<xref ref-type="bibr" rid="ref37">49</xref>]. There are several studies showing that a dropout rate
of 0.5 is effective in most scenarios [<xref ref-type="bibr" rid="ref15">27</xref>].
        </p>
        <p>Despite this, we have decided to choose a rate of 0.2. The decision to choose a
rate lower than 0.5 is motivated by the fact that we have four classes that are very
similar to each other and unbalanced.</p>
        <p>As a result of applying dropout, we obtain a much simpler network.</p>
        <p>Early Stopping Early stopping is another strategy to prevent overfitting
of the models. The objective of this technique is to train sufficiently on the
training data and stop when the performance on the validation data starts to
decline, in order to avoid overfitting. We set a patience of 2 epochs, that is, the model
is allowed to train for 2 more epochs to improve its performance;
if there is no improvement in the validation loss, the training is stopped.</p>
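        <p>A minimal Keras sketch of this early-stopping setup (model, X_train, y_train, X_val and y_val are the objects defined earlier):</p>
        <preformat>
from tensorflow.keras.callbacks import EarlyStopping

# Stop when the validation loss has not improved for 2 consecutive epochs
early_stop = EarlyStopping(monitor='val_loss', patience=2)
model.fit(X_train, y_train, validation_data=(X_val, y_val),
          epochs=15, callbacks=[early_stop])
        </preformat>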
      </sec>
      <sec id="sec-4-11">
        <title>Network Training Details</title>
        <p>
          Optimizer The optimizer of our deep learning architectures is Adaptive
Moment Estimation (Adam), a stochastic gradient descent method. According
to Diederik P. Kingma et al. [<xref ref-type="bibr" rid="ref16">28</xref>], the method is "computationally efficient, has
little memory requirement, invariant to diagonal rescaling of gradients, and is
well suited for problems that are large in terms of data/parameters" [<xref ref-type="bibr" rid="ref16">28</xref>]. The
default parameters are used for the Adam optimizer; with the LSTM we use Keras and
with BERT we use the TensorFlow optimizer. The exception is that we have
chosen a different learning rate for BERT. The values are listed below, followed by a
configuration sketch.
        </p>
        <p>– Learning rate LSTM: 0.001
– Learning rate BERT: 2e-5
– Beta 1: 0.9
– Beta 2: 0.999
– Epsilon: 1e-7</p>
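        <p>As an illustration, these values correspond to the following Keras optimizer configurations:</p>
        <preformat>
from tensorflow.keras.optimizers import Adam

adam_lstm = Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7)
adam_bert = Adam(learning_rate=2e-5, beta_1=0.9, beta_2=0.999, epsilon=1e-7)
        </preformat>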
        <p>Loss The selected loss function is categorical cross-entropy, also called
softmax loss. It is a loss function used in multi-class classification tasks.</p>
        <p>The loss function is the following:</p>
        <p>
          $$\mathrm{Loss} = -\sum_{i=1}^{\text{output size}} y_i \cdot \log \hat{y}_i \qquad (1)$$
where $\hat{y}_i$ is the i-th scalar value in the model output, $y_i$ is the
corresponding target value, and output size is the number of scalar values in the model
output [<xref ref-type="bibr" rid="ref1">1</xref>].
        </p>
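        <p>As a worked illustration of Eq. (1), with an invented one-hot target and softmax output for the four classes:</p>
        <preformat>
import numpy as np

def categorical_cross_entropy(y_true, y_pred):
    # y_true: one-hot target vector; y_pred: predicted softmax probabilities
    return -np.sum(y_true * np.log(y_pred))

# True class NO, predicted probabilities for [NO, NOM, OFG, OFP]
loss = categorical_cross_entropy(np.array([1, 0, 0, 0]),
                                 np.array([0.7, 0.2, 0.05, 0.05]))
print(loss)  # -log(0.7) = 0.357 approximately
        </preformat>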
        <p>Number of epochs and batch size We set a maximum of 15 epochs to fit
both models, the LSTM model and the BERT model.</p>
        <p>The batch size is 100 in the case of the LSTM model and 1114 for the BERT
model.</p>
        <p>Monitoring the loss on the validation data, only 6 epochs were necessary for
the LSTM model and 4 in the case of BERT, as a result of early stopping,
since after those points the validation loss stopped improving.</p>
        <p>
          Software and Hardware Details The experiments were developed in
Python 3.7.7. Specifically, to develop the machine learning algorithms we
used the Python library scikit-learn [<xref ref-type="bibr" rid="ref26">38</xref>], while the deep learning models were
developed using the libraries Keras [14] on top of TensorFlow [<xref ref-type="bibr" rid="ref4">4</xref>] and
PyTorch [<xref ref-type="bibr" rid="ref24">36</xref>].
        </p>
        <p>Our experiments were conducted on Google Colab with the GPU activated.
Google Colab is an open product from Google Research that allows users to create
and execute Python code through the browser, giving access to computational
resources such as GPUs or TPUs.</p>
        <p>There are many other libraries that we have used to plot and visualise the data
and to evaluate the models, among them pandas, numpy, sklearn
and matplotlib.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Evaluation and Discussion</title>
      <p>To evaluate our models, the organisers provided a test dataset of 13,607
comments. These did not include their labels.</p>
      <p>To measure the performance of the models, we have used different standard metrics
such as precision, recall and F1-score, in their micro-averaged,
macro-averaged and weighted macro-averaged versions; we also obtained the mean
squared error (MSE) from the official results. The micro-average is more suitable
for unbalanced datasets. Since we have an unbalanced dataset (see Fig. 1), the
most appropriate metric for comparing the models is the micro-averaged
F1-score, which we will call micro-F1. Analysing further, we have also been able
to obtain results at the class level. With this, we can check the performance of our
models when classifying and predicting a comment that could be offensive.</p>
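      <p>These metrics can be computed with scikit-learn as sketched below, where y_val and y_pred are hypothetical gold and predicted labels:</p>
      <preformat>
from sklearn.metrics import f1_score, classification_report

micro_f1 = f1_score(y_val, y_pred, average='micro')  # our main comparison metric
# Per-class precision, recall and F1, as reported at the class level
print(classification_report(y_val, y_pred))
      </preformat>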
      <p>Table 1 shows the results obtained with the traditional machine learning
methods. We can see that all of the models achieve a micro-F1 ranging
from 0.80 to 0.88, although they show certain differences at the class level. The
only models that obtain results for the minority class (OFG) are the
logistic regression and Gradient Boosting models. The rest of them are not able
to obtain a result, scoring 0 for all the metrics. This may be because the class OFG
has only a few instances.</p>
      <p>Table 2 shows the results obtained with the deep learning architectures:
LSTM and BERT. The micro-F1 scores of these two models are similar, differing
by only 0.01. Again, at the class level, the models obtain a score of 0 for the class
OFG, while the rest of the classes reach scores around 0.9 for NO and 0.5-0.7 for
NOM and OFP. As a result, the best model on the validation dataset
is Stochastic Gradient Descent, achieving a micro-F1 of 0.88, followed by the
random forest with 0.874 and BERT with 0.870. At the class level, the best model is logistic
regression, with F1-scores of 0.93 for the class NO, 0.71 for the NOM class, 0.19 for
the OFG class and 0.59 for the OFP class.</p>
      <p>The three models that we presented to the competition and
evaluated on the test dataset are the two deep learning models (LSTM and
BERT) and logistic regression. We selected these models because we wanted
to focus on the results of the newer algorithms while also keeping a
traditional model as a reference. Although logistic regression is not the best of the
traditional machine learning models, it is the only one that is able
to obtain results for all classes at the class level, so we decided to submit this model to
the competition.</p>
      <p>The results obtained on the test dataset (see Table 3) are lower than those
obtained with these models on the validation dataset, although this is expected.
The best approach is the BERT model. The LSTM model achieved a
micro-F1 of 0.861734 on the validation dataset and 0.80751 in the official results, which
are lower than those obtained with the BERT model: a micro-F1 of 0.870992 on
the validation dataset and 0.84168 on the test dataset. Moreover, we can see
that the logistic regression model works better than the LSTM model. In
particular, logistic regression achieved a micro-F1 of 0.860861 on the
validation dataset and 0.816331 on the test dataset. The official MSE scores
of the LSTM, BERT and logistic regression models are 0.085417, 0.069783 and
0.075155, respectively.</p>
      <p>If we compare the results obtained on both datasets, we can say that even
though we did not submit the best model on the validation dataset, Stochastic
Gradient Descent, the results with BERT, LSTM and logistic regression are as
expected.</p>
      <p>
        In the results of this study, there is a pattern that repeats across all the
models and approaches. In general, the results obtained are quite similar,
even though we applied different data processing or methods for each approach.
Also, the majority class (NO) obtains a higher score for all of the models, while
the classes NOM and OFP obtain similar results to each other. This
happens because these models are trained with unbalanced data. However, the
models are able to obtain a score for the rest of the classes despite this large
imbalance. This fact is also observed in the confusion matrices of the respective
models (see Fig. 3 and Fig. 4). As mentioned, the majority of the models are not able
to obtain a metric other than 0 for the minority class (OFG). This may be
because the dataset is unbalanced and only 1.27% of the comments correspond to
the OFG class (see Fig. 1). Knowing that, we have explored different methods
such as SMOTE, oversampling and undersampling to address the data
imbalance. These techniques were only applied to the traditional machine
learning classifiers, because deep learning models are more robust to imbalanced
data [<xref ref-type="bibr" rid="ref13">25</xref>], [<xref ref-type="bibr" rid="ref32">44</xref>].
      </p>
      <p>However, the results obtained after applying these methods do not show
any significant difference (see Table 4 and Table 5) with respect to the unbalanced data,
except that the models now obtain scores for the minority class (OFG). All the models
(except Naïve Bayes and Gradient Boosting) use the balanced class weight, that is,
the training of the models takes into account the weight of each class. Probably
due to that fact, the results obtained with the unbalanced data and those obtained
applying the rebalancing techniques are similar, and no noticeable improvement is
obtained.</p>
      <p>Even so, we cannot claim that our models find it more difficult
to classify comments with offensive language than those without it,
even though the models have been trained with a greater number of non-offensive
comments. As we have noted, observing the results obtained for all the remaining
classes, and taking this imbalance into account, most of the models are
capable of classifying and detecting the different classes.
[Fig. 3: Confusion matrices of the traditional classifiers: (b) Support Vector Machine
(SVM), (c) Naïve Bayes, (d) Logistic Regression, (e) Stochastic Gradient Descent (SGD),
and (f) Gradient Boosting Classifier. Each confusion matrix shows the true negatives on
the top left, false negatives on the top right, true positives on the bottom right, and
false positives on the bottom left for each class.]</p>
      <p>[Fig. 4: Confusion matrix of the LSTM model.]</p>
    </sec>
    <sec id="sec-6">
      <title>Social-Economic Impact</title>
      <p>
        Nowadays most social media applications and websites have various tools that
prevent different actions that can be dangerous or offensive to users. However,
many of these tools are mainly focused on the treatment of images and videos,
in which different behaviours can be identified, such as violent, unpleasant or
illicit behaviour that could be offensive or sensitive. In these cases, a filter
is added to these types of applications, and such content is usually identified, blocked or
even deleted. In contrast, there are not as many tools implemented to deal with text
on these platforms. Most of the time, when a person suffers discrimination or
cyberbullying on social media [<xref ref-type="bibr" rid="ref39">51</xref>], it is done through text comments or text
messages. Although there are entities involved in identifying and tracking this
kind of behaviour, it would be very valuable to incorporate models dedicated to the
identification of offensive comments.
      </p>
    </sec>
    <sec id="sec-7">
      <title>Law framework</title>
      <p>In Spain, insults, threats and slander are commonplace on social
networks, as they are often justified by appealing to the right to Freedom of Expression,
but they do not go unpunished.</p>
      <p>Freedom of expression is a fundamental right as defined in Article 10 of the
European Convention on Human Rights, and Article 20.1.a) of the Spanish
Constitution. The counterweight to Freedom of Expression is the Right to Honour.
This right is included in Spanish legislation in Article 18 of the Constitution,
and is a fundamental right regulated in Organic Law 1/1982, of 5 May, on the
civil protection of the right to honour, personal and family privacy and one's
own image.</p>
      <p>The different offences that can be committed are typified in the Spanish
Penal Code (slander in Articles 205 et seq. and libel in Articles 208 et seq.),
including harassment or stalking (Article 172 ter CP), sexting (Article 197.7
CP), grooming (Article 183 bis CP), cyberbullying (Article 197 CP), among
others.</p>
      <p>However, despite the fact that these crimes are commonly committed on
social networks, there is no regulatory body that prevents this kind of conduct;
the law simply limits itself to punishing such acts once they have been committed and
reported. It is the companies themselves, such as Facebook or Twitter, that
judge which actions damage the rights of other users, all of which relates to
the problem posed by the limits of these rights.</p>
    </sec>
    <sec id="sec-8">
      <title>Conclusion and Future Work</title>
      <p>
        One of the main goals of this study is to explore different NLP and deep learning
models. In particular, this document describes our participation in the shared
task of MeOffendEs@IberLEF 2021 [<xref ref-type="bibr" rid="ref28">40</xref>]. We have explored different deep
learning models such as Long Short-Term Memory (LSTM) and Bidirectional
Encoder Representations from Transformers (BERT), as well as traditional
machine learning models such as Logistic Regression and Support Vector Machines
(SVM), among others, to classify the comments (written in Spanish) into the
four classes defined in the OffendEs corpus, which label the offensiveness
level and the offensive target of each comment.
      </p>
      <p>The results of our experiments show that, in the test evaluation, BERT
obtains the best results, with an F1-score of 84.16% and an MSE of 0.069.
Comparing this with the other deep learning approach, the LSTM model, we can
see that a bidirectional network works better than a unidirectional model
for the detection of offensive comments. The bidirectional model, BERT, is able
to capture the context of a comment, resulting in better performance. Also,
considering the results of the logistic regression, we can see that for this kind of
task a bidirectional network such as BERT performs better than
logistic regression. Even so, considering the performance of these models on
the validation dataset, logistic regression is the only one of the three that is able
to obtain a result for each class (NO 0.93, NOM 0.71, OFG 0.19, OFP 0.59).</p>
      <p>We have also studied the influence of emoticons by converting them to text.
However, the inclusion of emoticons did not improve the results. In addition,
as mentioned before, several approaches were used to address the data imbalance
problem, such as oversampling, undersampling and SMOTE; however,
we did not pursue these techniques further, as the results obtained
did not improve.</p>
      <p>
        As future work, we plan to address the other subtasks proposed in
MeOffendEs@IberLEF 2021, such as the comparison of using Mexican or generic Spanish
with our models. We will explore other pre-trained models trained on
tweets and comments from other social networks, such as XLM, used in [<xref ref-type="bibr" rid="ref30">42</xref>],
or RoBERTa, also applied in [54]. We will also use the contextual information about
the user and the social media. In addition, we plan to develop a multimodal
system that also exploits the information from images or videos to identify offensive
content in social media.
      </p>
      <p>
        It could also be interesting to relate this task to the other tasks
proposed by IberLEF [<xref ref-type="bibr" rid="ref22">34</xref>]. They propose numerous tasks such as the identification
or classification of emotions, stance and opinions, harmful information, health-related
information extraction and knowledge discovery, humour and irony, and
lexical acquisition. As future work, it could thus be interesting to merge
emotion classification with offensive-language detection, as we could find different
behaviours in the way users react to different types of comments.
      </p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgements</title>
      <p>This work was supported by the NLP4RARE-CM-UC3M, which was developed
under the Interdisciplinary Projects Program for Young Researchers at
University Carlos III of Madrid. The work was also supported by the
Multiannual Agreement with UC3M in the line of Excellence of University Professors
(EPUC3M17), and in the context of the V PRICIT (Regional Programme of
Research and Technological Innovation).</p>
      <p>9. Bourgonje, P., Moreno-Schneider, J., Srivastava, A., Rehm, G.: Automatic classification of abusive language and personal attacks in various forms of online communication. In: International Conference of the German Society for Computational Linguistics and Language Technology. pp. 180–191. Springer, Cham (2017)
10. Chandrika, C., Kallimani, J.S.: Classification of abusive comments using various machine learning algorithms. In: Cognitive Informatics and Soft Computing, pp. 255–262. Springer (2020)
11. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 16, 321–357 (Jun 2002). https://doi.org/10.1613/jair.953
12. Chen, H., McKeever, S., Delany, S.J.: Abusive text detection using neural networks.</p>
      <p>In: McAuley, J., McKeever, S. (eds.) Proceedings of the 25th Irish Conference on Artificial Intelligence and Cognitive Science, Dublin, Ireland, December 7-8, 2017. CEUR Workshop Proceedings, vol. 2086, pp. 258–260. CEUR-WS.org (2017), http://ceur-ws.org/Vol-2086/AICS2017_paper_44.pdf
13. Chen, Y., Zhou, Y., Zhu, S., Xu, H.: Detecting offensive language in social media to protect adolescent online safety. In: 2012 International Conference on Privacy, Security, Risk and Trust and 2012 International Conference on Social Computing. pp. 71–80. IEEE (2012)
14. Chollet, F., et al.: Keras (2015), https://github.com/fchollet/keras
15. Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., Grave, E., Ott, M., Zettlemoyer, L., Stoyanov, V.: Unsupervised cross-lingual representation learning at scale. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. pp. 8440–8451. Association for Computational Linguistics, Online (Jul 2020). https://doi.org/10.18653/v1/2020.acl-main.747, https://www.aclweb.org/anthology/2020.acl-main.747
16. Cortes, C., Vapnik, V.: Support-vector networks. Machine Learning 20(3), 273–297 (1995). https://doi.org/10.1007/bf00994018
17. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). pp. 4171–4186. Association for Computational Linguistics, Minneapolis, Minnesota (Jun 2019). https://doi.org/10.18653/v1/N19-1423, https://www.aclweb.org/anthology/N19-1423
18. Molina-González, M.D., Martínez-Cámara, E., Martín-Valdivia, M.T., Ureña-López, L.A.: A Spanish semantic orientation approach to domain adaptation for polarity classification. Information Processing &amp; Management 51(4), 520–531 (2015). https://doi.org/10.1016/j.ipm.2014.10.002, https://www.sciencedirect.com/science/article/pii/S0306457314000910
19. Domínguez-Almendros, S., Benítez-Parejo, N., González-Ramírez, A.: Logistic regression models. Allergologia et Immunopathologia 39(5), 295–305 (2011)
20. Golbeck, J., Ashktorab, Z., Banjo, R.O., Berlinger, A., Bhagwan, S., Buntain, C., Cheakalos, P., Geller, A.A., Gergory, Q., Gnanasekaran, R.K., Gunasekaran, R.R., Hoffman, K.M., Hottle, J., Jienjitlert, V., Khare, S., Lau, R., Martindale, M.J., Naik, S., Nixon, H.L., Ramachandran, P., Rogers, K.M., Rogers, L., Sarin, M.S., Shahane, G., Thanki, J., Vengataraman, P., Wan, Z., Wu, D.M.: A large labeled corpus for online harassment research. In: Fox, P., McGuinness, D.L., Poirier, L., Boldi, P., Kinder-Kurlanda, K. (eds.) Proceedings of the 2017 ACM on Web Science Conference, WebSci 2017, Troy, NY, USA, June 25-28, 2017. pp. 229–233. ACM (2017). https://doi.org/10.1145/3091478.3091509
52. Xu, J.M., Jun, K.S., Zhu, X., Bellmore, A.: Learning from bullying traces in social media. In: Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. pp. 656–666. Association for Computational Linguistics, Montreal, Canada (Jun 2012), https://www.aclweb.org/anthology/N12-1084
53. Zhang, Y., Jin, R., Zhou, Z.H.: Understanding bag-of-words model: a statistical framework. International Journal of Machine Learning and Cybernetics 1(1-4), 43–52 (2010)
54. Zhao, Z., Zhang, Z., Hopfgartner, F.: A comparative study of using pre-trained language models for toxic comment classification. In: Companion Proceedings of the Web Conference 2021. pp. 500–507 (2021)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. Categorical crossentropy loss function: Peltarion platform, https://peltarion.com/knowledge-center/documentation/modeling-view/build-an-ai-model/loss-functions/categorical-crossentropy</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. Hugging Face – the AI community building the future, https://huggingface.co/</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. snowballstemmer, https://pypi.org/project/snowballstemmer/</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Abadi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agarwal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barham</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brevdo</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Citro</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Davis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Devin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghemawat</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goodfellow</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harp</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Irving</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Isard</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jia</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jozefowicz</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaiser</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kudlur</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levenberg</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mane</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Monga</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moore</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murray</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Olah</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schuster</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shlens</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steiner</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Talwar</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tucker</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanhoucke</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vasudevan</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Viegas</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vinyals</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Warden</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wattenberg</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wicke</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>TensorFlow: Large-scale machine learning on heterogeneous systems</article-title>
          (
          <year>2015</year>
          ), https://www.tensorflow.org/, software available from tensorflow.org
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Baeza-Yates</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ribeiro-Neto</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , et al.:
          <article-title>Modern information retrieval</article-title>
          , vol.
          <volume>463</volume>
          . ACM press New York (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Berrar</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Bayes' theorem and naive bayes classi er</article-title>
          .
          <source>Encyclopedia of Bioinformatics and Computational Biology: ABC of Bioinformatics; Elsevier Science Publisher: Amsterdam</source>
          , The Netherlands pp.
          <fpage>403</fpage>
          -
          <lpage>412</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Bird</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loper</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Natural language processing with Python: analyzing text with the natural language toolkit</article-title>
          . O'Reilly Media, Inc.
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Bojanowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grave</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joulin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Enriching word vectors with subword information</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>5</volume>
          ,
          <fpage>135</fpage>
          -
          <lpage>146</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          21. Happy95:
          <article-title>SMOTE: Overcoming class imbalance problem using SMOTE</article-title>
          (
          <year>Jan 2021</year>
          ), https://www.analyticsvidhya.com/blog/2020/10/overcoming-class-imbalance-using-smote-techniques/
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          22.
          <string-name>
            <surname>Hernandez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carrasco-Ochoa</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martinez-Trinidad</surname>
            ,
            <given-names>J.F.</given-names>
          </string-name>
          :
          <article-title>An empirical study of oversampling and undersampling for instance selection methods on imbalance datasets</article-title>
          .
          <source>Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, Lecture Notes in Computer Science</source>
          pp.
          <fpage>262</fpage>
          -
          <lpage>269</lpage>
          (
          <year>2013</year>
          ). https://doi.org/10.1007/978-3-642-41822-8_33
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          23.
          <string-name>
            <surname>Hochreiter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural computation 9(8)</source>
          ,
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          24.
          <string-name>
            <surname>Horev</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>BERT explained: State of the art language model for NLP</article-title>
          (
          <year>Nov 2018</year>
          ), https://towardsdatascience.com/bert-explained-state-of-the-art-language-model-for-nlp-f8b21a9b6270
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          25.
          <string-name>
            <surname>Jin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jin</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>J.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szolovits</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Is BERT really robust? Natural language attack on text classification and entailment</article-title>
          . CoRR abs/1907.11932 (
          <year>2019</year>
          ), http://arxiv.org/abs/1907.11932
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          26.
          <string-name>
            <surname>Joulin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grave</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bojanowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Douze</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jegou</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>FastText.zip: Compressing text classification models</article-title>
          .
          <source>arXiv preprint arXiv:1612.03651</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          27.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Convolutional neural networks for sentence classification</article-title>
          .
          <source>Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          (
          <year>2014</year>
          ). https://doi.org/10.3115/v1/d14-1181
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          28.
          <string-name>
            <surname>Kingma</surname>
            ,
            <given-names>D.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ba</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Adam: A method for stochastic optimization</article-title>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          29.
          <string-name>
            <surname>Kocmi</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bojar</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>An exploration of word embedding initialization in deep learning tasks</article-title>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          30.
          <string-name>
            <surname>Leskovec</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huttenlocher</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kleinberg</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Predicting positive and negative links in online social networks</article-title>
          .
          <source>In: Proceedings of the 19th international conference on World wide web</source>
          . pp.
          <fpage>641</fpage>
          -
          <lpage>650</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          31.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alvarez-Napagao</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia-Gasulla</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suzumura</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>What are we depressed about when we talk about COVID-19: Mental health analysis on tweets using natural language processing</article-title>
          .
          <source>Lecture Notes in Computer Science, Artificial Intelligence XXXVII</source>
          pp.
          <fpage>358</fpage>
          -
          <lpage>370</lpage>
          (
          <year>2020</year>
          ). https://doi.org/10.1007/978-3-030-63799-6_27
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          32.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ott</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Du</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levy</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stoyanov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>RoBERTa: A robustly optimized BERT pretraining approach</article-title>
          . CoRR abs/1907.11692 (
          <year>2019</year>
          ), http://arxiv.org/abs/1907.11692
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          33.
          <string-name>
            <surname>Maynard</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Funk</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Automatic detection of political opinions in tweets</article-title>
          .
          <source>Lecture Notes in Computer Science, The Semantic Web: ESWC 2011 Workshops</source>
          pp.
          <fpage>88</fpage>
          -
          <lpage>99</lpage>
          (
          <year>2012</year>
          ). https://doi.org/10.1007/978-3-642-25953-1_8
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          34.
          <string-name>
            <surname>Montes</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aragon</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agerri</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alvarez-Carmona</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alvarez Mellado</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carrillo-de Albornoz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chiruzzo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Freitas</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomez-Adorno</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gutierrez</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jimenez-Zafra</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lima</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plaza-de Arco</surname>
            ,
            <given-names>F.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taule</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (eds.):
          <source>Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2021)</source>
          (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          35.
          <string-name>
            <surname>Natekin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Knoll</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Gradient boosting machines, a tutorial</article-title>
          .
          <source>Frontiers in neurorobotics 7</source>
          ,
          <issue>21</issue>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          36.
          <string-name>
            <surname>Paszke</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gross</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Massa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lerer</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bradbury</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chanan</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Killeen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gimelshein</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Antiga</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Desmaison</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kopf</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>DeVito</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raison</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tejani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chilamkurthy</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steiner</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bai</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chintala</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>PyTorch: An imperative style, high-performance deep learning library</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems</source>
          <volume>32</volume>
          , pp.
          <fpage>8024</fpage>
          -
          <lpage>8035</lpage>
          . Curran Associates, Inc. (
          <year>2019</year>
          ), http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          37.
          <string-name>
            <surname>Pedregosa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varoquaux</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gramfort</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thirion</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grisel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blondel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prettenhofer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dubourg</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanderplas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Passos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cournapeau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brucher</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perrot</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duchesnay</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          ,
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          38.
          <string-name>
            <surname>Pedregosa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varoquaux</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gramfort</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thirion</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grisel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blondel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , Muller,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Nothman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Louppe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            ,
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Dubourg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            ,
            <surname>Vanderplas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Passos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Cournapeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Brucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Perrot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Edouard</surname>
          </string-name>
          <string-name>
            <surname>Duchesnay</surname>
          </string-name>
          :
          <article-title>Scikit-learn: Machine learning in python (</article-title>
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          39.
          <string-name>
            <surname>Phd</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Adigun</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>O</surname>
          </string-name>
          ,
          <string-name>
            <surname>O.</surname>
          </string-name>
          :
          <article-title>Identification and classification of toxic comments on social media using machine learning techniques</article-title>
          . pp.
          <volume>2454</volume>
          -
          <issue>6194</issue>
          (11
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          40.
          <string-name>
            <surname>Plaza-del-Arco</surname>
            ,
            <given-names>F.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Casavantes</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Escalante</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin-Valdivia</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montejo-Raez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montes-y-Gomez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jarquin-Vasquez</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Villaseñor-Pineda</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Overview of the MeOffendEs task on offensive text detection at IberLEF 2021</article-title>
          .
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>67</volume>
          (
          <issue>0</issue>
          ) (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          41.
          <string-name>
            <surname>Plaza-Del-Arco</surname>
            ,
            <given-names>F.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Molina-Gonzalez</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ureña-Lopez</surname>
            ,
            <given-names>L.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin-Valdivia</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          :
          <article-title>Detecting misogyny and xenophobia in Spanish tweets using language technologies</article-title>
          .
          <source>ACM Transactions on Internet Technology</source>
          <volume>20</volume>
          (
          <issue>2</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          (
          <year>2020</year>
          ). https://doi.org/10.1145/3369869
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          42.
          <string-name>
            <surname>Plaza-Del-Arco</surname>
            ,
            <given-names>F.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Molina-Gonzalez</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ureña-Lopez</surname>
            ,
            <given-names>L.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin-Valdivia</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          :
          <article-title>Comparing pre-trained language models for Spanish hate speech detection</article-title>
          .
          <source>Expert Systems with Applications</source>
          <volume>166</volume>
          ,
          <issue>114120</issue>
          (
          <year>2021</year>
          ). https://doi.org/10.1016/j.eswa.2020.114120
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          43.
          <string-name>
            <surname>Rish</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          , et al.:
          <article-title>An empirical study of the naive Bayes classifier</article-title>
          .
          <source>In: IJCAI 2001 workshop on empirical methods in artificial intelligence</source>
          . vol.
          <volume>3</volume>
          , pp.
          <fpage>41</fpage>
          -
          <lpage>46</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          44.
          <string-name>
            <surname>Sangiorgio</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dercole</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Robustness of LSTM neural networks for multistep forecasting of chaotic time series</article-title>
          .
          <source>Chaos, Solitons &amp; Fractals</source>
          <volume>139</volume>
          ,
          <issue>110045</issue>
          (
          <year>2020</year>
          ). https://doi.org/10.1016/j.chaos.2020.110045, https://www.sciencedirect.com/science/article/pii/S0960077920304422
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          45.
          <string-name>
            <surname>Saxena</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>How the naive Bayes classifier works in machine learning</article-title>
          .
          <source>Data Science, Machine Learning</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          46.
          <string-name>
            <surname>Shacklett</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          :
          <article-title>What is dropout? Understanding dropout in neural networks</article-title>
          (
          <year>Mar 2021</year>
          ), https://searchenterpriseai.techtarget.com/definition/dropout
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          47.
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sachan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Importance and challenges of social media text</article-title>
          .
          <source>International Journal of Advanced Computer Research</source>
          <volume>8</volume>
          ,
          <fpage>831</fpage>
          -
          <lpage>834</lpage>
          (04
          <year>2017</year>
          ). https://doi.org/10.26483/ijarcs.v8i3.3108
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          48.
          <string-name>
            <surname>Suthaharan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Support vector machine</article-title>
          . In:
          <article-title>Machine learning models and algorithms for big data classification</article-title>
          , pp.
          <fpage>207</fpage>
          -
          <lpage>235</lpage>
          . Springer (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          49.
          <string-name>
            <surname>Team</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Keras documentation: SpatialDropout1D layer</article-title>
          , https://keras.io/api/layers/regularization_layers/spatial_dropout1d/
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          50.
          <string-name>
            <surname>Tong</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koller</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Support vector machine active learning with applications to text classification</article-title>
          .
          <source>Journal of machine learning research 2(Nov)</source>
          ,
          <fpage>45</fpage>
          -
          <lpage>66</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          51.
          <string-name>
            <surname>Whittaker</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kowalski</surname>
            ,
            <given-names>R.M.</given-names>
          </string-name>
          :
          <article-title>Cyberbullying via social media</article-title>
          .
          <source>Journal of School Violence</source>
          <volume>14</volume>
          (
          <issue>1</issue>
          ),
          <fpage>11</fpage>
          -
          <lpage>29</lpage>
          (
          <year>2015</year>
          ). https://doi.org/10.1080/15388220.2014.949377
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>