<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>D. A. Marrugo-Tobón);</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Natural Language Content Evaluation System For Multiclass Detection of Hate Speech in Tweets Using Transformers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Duván Andres Marrugo-Tobón</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Juan Carlos. Martinez-Santos</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Edwin Puertas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidad Tecnologíca de Bolívar, Faculty of Engineering</institution>
          ,
          <addr-line>Cartagena de Indias 17013001</addr-line>
          ,
          <country country="CO">Colombia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>In natural language processing, accurate categorization of tweets, including detecting hate speech, plays a pivotal role in eficient information organization and analysis. This paper presents a Natural Language Contents Evaluation System specifically tailored for multi-class tweet categorization, focusing on hate speech detection. Our system enhances classification accuracy and eficiency by harnessing the power of Transformers, namely BERT and DistilBERT. By leveraging feature extraction techniques, we capture pertinent information from tweets, enabling practical analysis, categorization, and identification of hate speech instances. During training, we also tackle imbalanced corpora by employing techniques to ensure fair representation of diferent tweet categories, including hate speech. Our system achieves impressive accuracy through extensive training of 95%, showcasing Transformers' efectiveness in comprehending and categorizing tweets, including identifying hate speech. Furthermore, our system maintains a good accuracy during testing of 83%, highlighting the robustness and generalizability of the trained models for hate speech detection. This system contributes to advancing automated tweet categorization, specifically in hate speech detection, providing a reliable and eficient solution for organizing and analyzing diverse tweet datasets.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Natural language processing</kwd>
        <kwd>Hate speech detection</kwd>
        <kwd>Transformers</kwd>
        <kwd>DistilBERT</kwd>
        <kwd>BERT</kwd>
        <kwd>Feature extraction</kwd>
        <kwd>Tweet categorization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        With the exponential growth of social media platforms, particularly Twitter, user-generated
content has skyrocketed, making efective categorization and analysis of tweets increasingly
challenging [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The categorization of tweets into meaningful and relevant categories is essential
for various applications such as sentiment analysis, trend detection, and information retrieval.
However, traditional rule-based approaches and simple machine-learning techniques often need
to catch up to capturing tweets’ complex linguistic patterns and nuances.
      </p>
      <p>
        Social media platforms have become a prominent medium for expressing opinions, sharing
information, and fostering social connections in the contemporary digital age. However, the
general nature of online communication has also led to the proliferation of hate speech, posing
significant challenges to maintaining a respectful and inclusive online environment [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
Therefore, recognizing and efectively addressing hate speech is crucial to promoting online safety,
combating discrimination, and ensuring positive user experiences [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        The detection and classification of hate speech in tweets have gained considerable attention
due to this social media platform’s brevity and widespread use. Traditional approaches for
hate speech detection often relied on handcrafted features and rule-based systems, but these
methods needed to capture the nuances and evolving nature of hate speech-language. With
the recent advancements in natural language processing (NLP), specifically the development of
transformer-based models, more sophisticated and accurate techniques for hate speech detection
have emerged [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        To address these challenges, recent advancements in natural language processing (NLP) have
focused on leveraging deep learning models, particularly Transformers, for tweet
categorization [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Transformers, a class of deep learning architectures, have revolutionized NLP tasks
by capturing contextual information and semantic relationships in an unparalleled manner.
Among the popular Transformer models, BERT (Bidirectional Encoder Representations from
Transformers) and DistilBERT (a distilled version of BERT) have emerged as powerful tools for
text classification tasks [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Transformers have been widely employed in the context of tweet classification to improve
accuracy and eficiency. However, to further enhance the performance of tweet categorization
systems, it is crucial to consider feature extraction techniques that capture relevant information
from tweets[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. There are various feature extraction methods explored in the literature, including
the use of character n-grams [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], word embeddings such as Word2Vec [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], and deep
learningbased approaches like Convolutional Neural Networks (CNNs) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. These techniques enable
the models to understand the nuanced characteristics of tweets and improve classification
accuracy.
      </p>
      <p>
        Furthermore, we must address the imbalanced corpora in tweet datasets to ensure fair
representation of diferent categories during training [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Oversampling and undersampling
are two commonly used strategies to balance the corpus. These techniques aim to create a
balanced distribution of tweet categories, providing equal opportunities for the model to learn
from all classes [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Other advanced approaches, such as Synthetic Minority Over-sampling
Technique (SMOTE) [13] and Adaptive Synthetic Sampling (ADASYN) [14], have also been
proposed to generate synthetic samples or adaptively adjust the sampling rate to achieve better
corpus balance.
      </p>
      <p>By combining the power of Transformers, efective feature extraction techniques, and corpus
balancing strategies [15], we aim to enhance the accuracy and robustness of tweet classification
systems, ultimately enabling more precise information retrieval and analysis in the realm of
social media data [16]. These combined techniques allow the models to capture the contextual
information and semantic relationships in tweets, leverage informative features extracted from
tweet content, and ensure fair representation of diferent tweet categories during training. The
integration of these components forms a comprehensive framework that addresses the challenges
posed by imbalanced corpora and enhances the overall performance of tweet classification
systems.</p>
      <p>This paper explores Transformers’ efectiveness, specifically BERT and DistilBERT, in
identifying diferent types of hate speech in tweets using transformer-based models, by classifying
hate speech into distinct categories, such as homophobia intolerance. The contributions of this
work include the application of cutting-edge Transformers for the categorization of tweets, the
exploration of feature extraction techniques to enhance classification accuracy, and a
comprehensive evaluation of the proposed approach using a real-world dataset. This work is as follows:
First, it overviews the related work on text categorization. Next, it describes the methodology
implemented for classification models. Then, it presents experimental validation. Finally, it
summarizes conclusions and future works.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <sec id="sec-2-1">
        <title>2.1. Dataset Description</title>
        <p>The Hate Speech Detection track, organized by the Grupo de Ingeniería Lingüística at the
Universidad Nacional Autónoma de México, focuses on classifying tweets in Mexican Spanish
for LGBT+phobic content. The dataset comprises tweets from 2012 to 2022, providing a
decadelong temporal span for analysis [17].</p>
        <p>Participants in this track used a corpus of Mexican Spanish tweets. They aim to develop
classification models that accurately determine whether a given tweet exhibits LGBT+phobic
content. The classification categories include LGBT+phobic (P), not LGBT+phobic (NP), or
not LGBT+related (NA), enabling a multi-class classification task. For example, the baseline
required for the use of a Transformer was 75%. In turn, the dataset provides three columns
of information: Id, content, and label. As a result, the corpus needs to be more balanced. For
example, 62% are the NP class, 25% are the NA class, and 12% are the P class.</p>
        <p>This competition ofers an opportunity to delve into the challenges of hate speech detection
within the context of Mexican Spanish tweets [17]. Participants can explore the linguistic
nuances, evolving trends, and temporal variations in LGBT+phobic content over the ten years,
facilitating a comprehensive understanding of online hate speech dynamics.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Classification Process</title>
        <p>In Fig. 1, we illustrate the classification process. The dataset is read and stored in a Data-Frame
with 2-column labels. We preprocess the data by removing empty words, URLs, punctuation
marks, and special characters and converting them to lowercase. We also did data balancing.
Furthermore, we conduct exploratory analysis to extract keywords, acronyms, and abbreviations,
utilizing n-grams for feature selection.</p>
        <p>Feature extraction is crucial in representing sentences or documents by assigning a probability
of occurrence to words. We trained the system using 80% of the data. We used the remaining
20% for verification. Evaluation metrics, including F1 score, accuracy, precision, and recall, were
utilized to identify the best-performing method.</p>
        <p>Input
tweets</p>
        <p>Pre-processing</p>
        <p>Cross
validation
Oversampling</p>
        <p>BeErRtT
DistilBERT</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Pre-Processing</title>
        <p>Input data for natural language tasks such as text classification consists of unstructured text. In
contrast to other types of data, such as images or time series, textual information does not have
an intrinsic numerical representation. Therefore, before entering it into a classifier, it has to be
represented in a suitable feature domain. Therefore, preprocessing procedures are fundamental
since, without them, there is no basis for feature extraction or classification algorithms [18].</p>
        <p>Input
texts</p>
        <p>Raw
Tweets</p>
        <p>Data Cleaning</p>
        <p>Remove URL
Remove Puntuation Marks
and Special characters
Remove Numerical</p>
        <p>characters</p>
        <p>Convert to lower case</p>
        <sec id="sec-2-3-1">
          <title>2.3.1. Data Cleaning</title>
          <p>Raw
Tweets
Description</p>
          <p>Stop words</p>
          <p>Lemmatization
Remove Stop Words</p>
          <p>Raw
Tweets
Description</p>
          <p>Exploratory Data</p>
          <p>Analysis
Finding and replace</p>
          <p>Keywords
Replace acronyms
Replace slangs</p>
          <p>Clean Tweets
Data cleaning of tweets involves several steps to preprocess the text and remove unnecessary
or irrelevant information, as shown in Fig. 2. The process includes removing special characters,
punctuation marks, URLs, and mentions and converting the text into individual words. As well
as removing empty words to reduce noise, hashtags, and emoticons were treated depending on
the objectives of the analysis. In addition, text normalization is performed by converting the text
to lowercase letters, processing abbreviations and contractions, and removing or substituting
numbers.</p>
        </sec>
        <sec id="sec-2-3-2">
          <title>2.3.2. Stopword</title>
          <p>When working with Twitter data in Spanish, it is common to apply stopwords to remove words
that do not contribute significantly to the text’s overall meaning. These stopwords include
articles like "el," "la," "los," and "las," pronouns such as "yo," "tú," and "él," as well as prepositions
and conjunctions like "de," "en," "con," and "por." Additionally, common verbs like "ser," "estar,"
and "tener," along with adjectives and adverbs such as "bueno," "malo," "grande," and "poco," are
often included in the stopwords list. Interjections like "ah," "eh," and "oh" are also considered as
stopwords. Removing these stopwords from the Twitter text can shift the focus towards more
meaningful content, allowing for better analysis and understanding of the data.</p>
        </sec>
        <sec id="sec-2-3-3">
          <title>2.3.3. Exploratory Data Analysis</title>
          <p>The text was normalized at this stage by converting it to lowercase and handling slang words
and abbreviations identified based on [ 19]. Visualizing the content of a text document is a
crucial task in text mining. However, many text visualizations indirectly represent the text by
showing linguistic model outputs such as word count, character length, and word sequences.
In our analysis, we initially conducted two types of analysis to explore the word relationships
within each tweet:
• Univariate Analysis: This analysis shows that words like ‘wuebonas’→ ’estupidas’,
’ptm’ → ’puta madre’ etc. These can be replaced by the expanded meaning or removed as
needed. In our case, it increased the accuracy of the models by 0.03% compared to when
we had it.
• Bivariate Analysis: Bigram and Trigram analysis was performed to explore word
relationships within each tweet. Depending on the application, this analysis examined
phonemes, syllables, letters, words, or base pairs. In addition, it aimed to identify any
relationships with words that were not eliminated in previous phases, potentially impacting
the subsequent classification process. The exploratory analysis revealed modified words,
such as "Abbreviations" being transformed into "ntpsdv" and "no te pases de verga".</p>
          <p>Thanks to these analyses, we can see which words afect the classifier’s performance that
we can eliminate. For example, words such as acronyms, abbreviations, grammatical errors, or
slang (’vato’, ’joto’, ’wey’, ’machin’, etc) can be removed or replaced within the corpus.</p>
        </sec>
        <sec id="sec-2-3-4">
          <title>2.3.4. Data Balancing</title>
          <p>In the multiclass classification task, where the categories are LGBT+phobic (P),
nonLGBT+phobic (NP), or non-LGBT+related (NA), data balancing is crucial due to the initial
imbalance of classes in the dataset. Therefore, cross-validation and oversampling techniques
were applied to address the dataset’s initial class imbalance for the multiclass classification task.</p>
          <p>We initially used the oversampling technique to increase the representation of the minority
class (P) by generating synthetic samples. In this case, we used the popular oversampling
technique Synthetic Minority Oversampling Technique (SMOTE). To avoid overfitting, we
performed over-sampling. Initially, we performed a random oversampling with an algorithm
randomness control (Random State) of 42. Then, we performed cross-validation by random
permutation (Shufle Split) to extract the training dataset equivalent to 80% and the test dataset
comparable to 20% of the total data.</p>
          <p>By combining cross-validation and oversampling, we can ensure that the model is trained
on balanced data and evaluate its performance robustly. It helps mitigate the impact of class
imbalance and improves the model’s ability to accurately classify instances into all three classes:
LGBT+phobic (P), non-LGBT+phobic (NP), and non-LGBT+ related (NA).</p>
        </sec>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Transformers</title>
        <sec id="sec-2-4-1">
          <title>Architecture</title>
          <p>Q
K
V</p>
          <p>Stacked attention layers</p>
          <p>Linear
Linear
Linear</p>
          <p>Scale
Softmax</p>
          <p>Concat</p>
          <p>Linear</p>
          <p>A bag of tokens is given without any ordering pattern to initialize the transformers. Then,
the transformer relies on a "self-attention" mechanism to learn the dependencies between the
tokens. In addition, a particular encoding step is performed before the first layer of the encoder
to ensure that embeddings of the same word appearing in a diferent position in the sentence
will have another representation. This step is called positional encoding, and its purpose is to
inject information about the relative positioning between words, which we would otherwise
lose. The critical component of this architecture is the self-attention layer, which intuitively
allows the encoder to look at other words in the input sentence whenever processing one of its
words. Stacking multiple layers of this type creates multi-head attention (MHA) layer, as shown
in Fig. 3. Then, we condensed the individual outputs into a single matrix by concatenating the
head outputs and passing the result through a linear layer.</p>
          <p>• Encoder: In the encoding part, the input embeddings are multiplied by three separate
weight matrices, as indicated in Equation 1, Q (queries), K (keys), and V (values), to
generate diferent word representations.</p>
          <p>=  ·  =  ·   =  · 
,  ,  ∈ ℜ×  are the learned weight matrices. Eventually, we obtain the
representation of each word by multiplying the scaled term with the V matrix containing
the input representation. We define this operation in Equation 2.</p>
          <p>= (, ,  ) =  (
 · 
√
) · 
(1)
(2)
• Decoder: During the decoding phase, every decoder layer receives the output of the
encoder (the K and V matrices) and the output of the previous decoder layer. Additionally,
we modified the self-attention layers into what we defined as "Masked" self-attention
layers. The masked MH self-attention layer ensures the use of only the self-attention
scores. We do it by adding a factor M to the word embeddings in Equation 3. We set M to
-inf for masked positions and 0 otherwise.</p>
          <p>=  (
 ·  + 
√
) · 
(3)
• Preprocessing: For performing the preprocessing, we should note that the two proposed
models, BERT and DistilBert, based on deep neural network architectures, include similar
steps for removing special characters, lemmatization, and stop word removal. In addition,
tokenized documents are truncated or padded with a given number of tokens to ensure
that the model receives uniformly sized input samples (i.e., with the same number of
tokens).</p>
          <p>As mentioned above, we will develop the problem using pre-trained BERT and DistilBERT
for automatic tweet categorization and test diferent optimizer methods. Table 1 shows the
parameters used for each architecture.</p>
        </sec>
        <sec id="sec-2-4-2">
          <title>BERT Architecture</title>
          <p>
            The architecture consists of a stacked coding layer of the transformer [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ]. BERT is composed
of two main steps: pre-training and tuning. During pre-training, there are two unsupervised
tasks to train BERT on a sizable unlabeled corpus: masked language modeling (MLM) and
next sentence prediction (NSP) to produce a pre-trained model. Then, for fitting, the model
is initialized with the pre-trained parameters, and all parameters are fit using labeled data for
specific tasks such as classification.
          </p>
          <p>We select a BERT-base model containing an encoder with 12 transformer blocks, 12
selfattenuating heads, and a hidden size 768. The network takes input from a sequence of no
more than 512 tokens and outputs the sequence representation. The series has one or two
segments in which the first token of the line is always [CLS], which contains the particular
classification embedding. Then, we use another unique token [SEP] to separate the segments.
This study applies the "BERT-base-uncased" model as the base model. In the base model, we
use the tokenizer of the "BERT-base-uncased" model and the fine-tuned BERT architecture for
the classification task. A simple SoftMax classifier is added to the model’s top to predict the
probability of label c. W is the task-specific parameter matrix, and  is a tweets’ category.</p>
        </sec>
        <sec id="sec-2-4-3">
          <title>DistilBERT Architecture</title>
          <p>DistilBERT [20] is a "distilled" version of BERT, which is smaller and faster than BERT and
also protects the accuracy of BERT. Therefore, it is safe to say that DistilBERT is a smaller
transformer model than BERT. This model includes six layers, 768 dimensions, and 12 heads
with 66 million parameters. We conducted experiments with DistilBERT to augment the results</p>
          <p>BERT
vocab size
hidden size
num hidden layers
num attention heads</p>
          <p>intermediate size
hidden dropout prob
attention probs</p>
          <p>dropout prob
max position embeddings
type vocab size
initializer range</p>
          <p>epoch
Optimizer
128000
768
12
12
3072
0.1
0.1
0.1
512</p>
          <p>2
0.02
32
obtained with the "BERT-base-uncased-model". First, the "DistilBERT-base-uncased-model" is
used as a pre-trained model. The final proposed architecture consists of DistilBERT [ 20] with
dropout and linear layers on top of the DistilBERT. Next, we show the general architecture of
the model. A softmax classifier is used on top of the linear layer to predict the probability of the
 tweets category.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental Results</title>
      <p>We initially trained the model designed with 80% of the data the competition provided. The
construction of the BERT and DistilBERT models described before allowed us to obtain two
models with an accuracy of 0.945 and 0.932. Since we performed both the analysis and training,
accuracy and loss are the primary metrics. In Fig. 4 and 5, it is possible to observe the accuracy
behavior for each epoch. Each model has a diferent number of epochs used in training. For the
BERT model, there were 35, while for DistilBERT, 20. Although the accuracy obtained by the
BERT model is higher than that of DistilBERT, it is worth considering how many epochs each
one reaches the maximum level. Using an EarlyStopping, with patience=10, we obtained this
information, which Table 2 shows. Based on the percentage of the duty cycle it takes to complete
its maximum accuracy, DistilBERT does represent the most eficient model since it achieves
similar accuracy to BERT in a shorter duty cycle. However, it is noteworthy to comment that
despite having the highest accuracy.</p>
      <p>In Fig. 6(a), we can also see that despite not having equal or more significant precession than
BERT, the DistilBERT model predicts the P and NA categories much better, with a minimum
percentage error value of 2.25% concerning the data set. At the same time, it manages to be
much more efective with prediction among the NP, which has a more significant number of
data than the other classes, NA and P.</p>
      <p>Finally, we tested the incoming model with a dataset provided without labels. The results
obtained respectively reach 82.4% and 81.3% accuracy.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <p>In conclusion, our objective was to find an optimal model for the multiclass hate speech detection
problem, explicitly focusing on classifying tweets in Mexican Spanish for LGBT+phobic content
using the dataset provided by the Hate Speech Detection track organized by the Grupo de
Ingeniería Lingüística at the Universidad Nacional Autónoma de México.</p>
      <p>Firstly, although the DistilBERT model better predicted the P and NA categories, there is
room for further improvement in accurately predicting the NP category. Fine-tuning the model
or exploring diferent approaches tailored explicitly for handling imbalanced data could help
address this issue. Additionally, while the BERT and DistilBERT models achieved high accuracies
of 94.5% and 93.2%, respectively, there is potential for enhancing their performance even
further. Experimenting with diferent hyperparameters and model architectures or incorporating
additional contextual information could improve classification results.</p>
      <p>We compared the results obtained using two transforms on the same corpus. The incoming
model was then tested on a dataset without labels, achieving accuracies of 82.4% and 81.3%,
respectively. Accuracy and loss were the primary metrics used for evaluation. Comparing
the two transform-based models, we found that DistilBERT outperformed BERT in speed
and accuracy, making it the preferred choice for classification. With an accuracy of 93.2%, it
demonstrated better overall performance among the proposed models.</p>
      <p>In summary, future work should address the remaining challenges in accurately predicting the
NP category, refining the model’s performance through advanced techniques, and broadening
its language capabilities to encompass a more diverse range of languages. By continuing to
iterate and improve upon the existing models, we can advance the field of hate speech detection
and contribute to developing more robust and inclusive language processing solutions.
with Applications 212 (2023) 118817. URL: https://www.sciencedirect.com/science/article/
pii/S0957417422018358. doi:https://doi.org/10.1016/j.eswa.2022.118817.
[13] D. Elreedy, A. F. Atiya, A comprehensive analysis of synthetic minority oversampling
technique (smote) for handling class imbalance, Information Sciences 505 (2019) 32–64.
URL: https://www.sciencedirect.com/science/article/pii/S0020025519306838. doi:https:
//doi.org/10.1016/j.ins.2019.07.070.
[14] A. S. Hussein, T. Li, D. M. Abd Ali, K. Bashir, C. W. Yohannese, A modified adaptive
synthetic sampling method for learning imbalanced datasets, in: Developments of
Artificial Intelligence Technologies in Computation and Robotics: Proceedings of the 14th
International FLINS Conference (FLINS 2020), World Scientific, 2020, pp. 76–83.
[15] M. Aloraini, A. Khan, S. Aladhadh, S. Habib, M. F. Alsharekh, M. Islam, Combining the
transformer and convolution for efective brain tumor classification using mri images,
Applied Sciences 13 (2023). URL: https://www.mdpi.com/2076-3417/13/6/3680. doi:10.
3390/app13063680.
[16] B. Jang, M. Kim, G. Harerimana, S.-u. Kang, J. W. Kim, Bi-lstm model to increase accuracy in
text classification: Combining word2vec cnn and attention mechanism, Applied Sciences 10
(2020). URL: https://www.mdpi.com/2076-3417/10/17/5841. doi:10.3390/app10175841.
[17] G. Bel-Enguix, H. Gómez-Adorno, G. Sierra, J. Vásquez, S.-T. Andersen, S. Ojeda-Trueba,
Overview of HOMO-MEX at Iberlef 2023: Hate speech detection in Online Messages
directed tOwards the MEXican spanish speaking LGBTQ+ population, Procesamiento del
lenguaje natural 71 (2023).
[18] D. M. Eler, D. Grosa, I. Pola, R. Garcia, R. Correia, J. Teixeira, Analysis of document
pre-processing efects in text and opinion mining, Information 9 (2018). URL: https:
//www.mdpi.com/2078-2489/9/4/100. doi:10.3390/info9040100.
[19] D. Huerta-Velasco, H. Calvo, Verbal aggression detection in mexican tweets, Computacion
y Sistemas 26 (2022) 261–269. doi:10.13053/CyS-26-1-4169, publisher Copyright: ©
2022 Instituto Politecnico Nacional. All rights reserved.
[20] R. Silva Barbon, A. T. Akabane, Towards transfer learning techniques—bert, distilbert,
bertimbau, and distilbertimbau for automatic text classification from diferent languages:
A case study, Sensors 22 (2022) 8184.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-S.</given-names>
            <surname>Jeong</surname>
          </string-name>
          ,
          <article-title>Sentiment classification using convolutional neural networks</article-title>
          ,
          <source>Applied Sciences</source>
          <volume>9</volume>
          (
          <year>2019</year>
          ). URL: https://www.mdpi.com/2076-3417/9/11/2347. doi:
          <volume>10</volume>
          .3390/ app9112347.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Galas</surname>
          </string-name>
          ,
          <article-title>Chapter 11 - experimental computational simulation environments for big data analytic in social sciences</article-title>
          , in: V.
          <string-name>
            <surname>Govindaraju</surname>
            ,
            <given-names>V. V.</given-names>
          </string-name>
          <string-name>
            <surname>Raghavan</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          Rao (Eds.),
          <source>Big Data Analytics</source>
          , volume
          <volume>33</volume>
          of Handbook of Statistics, Elsevier,
          <year>2015</year>
          , pp.
          <fpage>259</fpage>
          -
          <lpage>277</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/B9780444634924000113. doi:https:// doi.org/10.1016/B978-0
          <source>-444-63492-4</source>
          .
          <fpage>00011</fpage>
          -
          <lpage>3</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Abro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shaikh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z. H.</given-names>
            <surname>Khand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Khan</surname>
          </string-name>
          , G. Mujtaba,
          <article-title>Automatic hate speech detection using machine learning: A comparative study</article-title>
          ,
          <source>International Journal of Advanced Computer Science and Applications</source>
          <volume>11</volume>
          (
          <year>2020</year>
          ). URL: http://dx.doi.org/10.14569/IJACSA.
          <year>2020</year>
          .
          <volume>0110861</volume>
          . doi:
          <volume>10</volume>
          .14569/IJACSA.
          <year>2020</year>
          .
          <volume>0110861</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F.</given-names>
            <surname>Alkomah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <article-title>A literature review of textual hate speech detection methods and datasets</article-title>
          ,
          <source>Information</source>
          <volume>13</volume>
          (
          <year>2022</year>
          ). URL: https://www.mdpi.com/2078-2489/13/6/273. doi:
          <volume>10</volume>
          . 3390/info13060273.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          , BERT:
          <article-title>Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers),
          <source>Association for Computational Linguistics</source>
          , Minneapolis, Minnesota,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          . URL: https://aclanthology.org/ N19-1423. doi:
          <volume>10</volume>
          .18653/v1/
          <fpage>N19</fpage>
          -1423.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>V.</given-names>
            <surname>Sanh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Debut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chaumond</surname>
          </string-name>
          , T. Wolf,
          <article-title>Distilbert, a distilled version of bert: smaller, faster, cheaper</article-title>
          and lighter,
          <year>2020</year>
          . arXiv:
          <year>1910</year>
          .01108.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H. A.</given-names>
            <surname>Madni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Umer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Abuzinadah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-C.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Saidani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Alsubai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hamdi</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Ashraf</surname>
          </string-name>
          ,
          <article-title>Improving sentiment prediction of textual tweets using feature fusion and deep machine ensemble model</article-title>
          ,
          <source>Electronics</source>
          <volume>12</volume>
          (
          <year>2023</year>
          ). URL: https://www.mdpi.com/2079-9292/12/6/1302. doi:
          <volume>10</volume>
          .3390/electronics12061302.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>K. L.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. P.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. M.</given-names>
            <surname>Lim</surname>
          </string-name>
          ,
          <article-title>A survey of sentiment analysis: Approaches, datasets</article-title>
          , and future research,
          <source>Applied Sciences</source>
          <volume>13</volume>
          (
          <year>2023</year>
          ). URL: https://www.mdpi.com/2076-3417/13/7/ 4550. doi:
          <volume>10</volume>
          .3390/app13074550.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , I. Sutskever,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          , G. Corrado,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <article-title>Distributed representations of words and phrases</article-title>
          and their compositionality,
          <year>2013</year>
          . arXiv:
          <volume>1310</volume>
          .
          <fpage>4546</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Severyn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moschitti</surname>
          </string-name>
          ,
          <article-title>Twitter sentiment analysis with deep convolutional neural networks</article-title>
          ,
          <source>in: Proceedings of the 38th international ACM SIGIR conference on research and development in information retrieval</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>959</fpage>
          -
          <lpage>962</lpage>
          . doi:
          <volume>10</volume>
          .1145/2766462. 2767830.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R. K.</given-names>
            <surname>Behera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Rath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Misra</surname>
          </string-name>
          , Co-lstm:
          <article-title>Convolutional lstm model for sentiment analysis in social big data</article-title>
          ,
          <source>Information Processing &amp; Management</source>
          <volume>58</volume>
          (
          <year>2021</year>
          )
          <article-title>102435</article-title>
          . URL: https://www.sciencedirect.com/science/article/pii/S0306457320309286. doi:https: //doi.org/10.1016/j.ipm.
          <year>2020</year>
          .
          <volume>102435</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>G.-D. Pilar</surname>
            ,
            <given-names>S.-B.</given-names>
          </string-name>
          <string-name>
            <surname>Isabel</surname>
            , P.-M. Diego,
            <given-names>G.</given-names>
          </string-name>
          <article-title>Ávila José Luis, A novel flexible feature extraction algorithm for spanish tweet sentiment analysis based on the context of words, Expert Systems</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>