<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>UAEMemex Participation at Dimemex 2025: Exploring Lexical and Semantic Information to Detect Hate, Inappropriate, and Harmless Memes</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Verónica Neri-Mendoza</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jonathan Rojas-Simon</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yulia Ledeneva</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yorne</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alejandrina Santos-Bobadilla</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Abner Alain Gil-García</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ángel Baron-García</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arnulfo Garcia-Hernández</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Autonomous University of the State of Mexico, Instituto Literario 100</institution>
          ,
          <addr-line>Toluca 50000</addr-line>
          ,
          <country country="MX">México</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Secretariat of Science, Humanities, Technology and Innovation</institution>
          ,
          <addr-line>1582 Insurgentes Sur Avenue, Crédito Constructor Benito Juárez, Mexico City</addr-line>
          ,
          <country country="MX">Mexico</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>As part of the IberLEF 2025 workshop, this paper presents the framework developed by the UAEMemex team for the subtask of classifying hate speech, inappropriate content, and harmless content in Spanish-language memes from Mexico, within the DIMEMEX-2025 competition. Given the growing dissemination and social impact of hate speech and inappropriate content on social media, research in Natural Language Processing (NLP) for their automatic detection has gained relevance. The complexity of identifying these categories, particularly in the meme format that fuses lexical and visual information, requires sophisticated approaches. For this reason, we explored lexical information through ASCII vectorization and semantic information through vector representations obtained with BERT and Doc2Vec to discern distinctive patterns among the three categories. Finally, these representations were used as input for the Logistic Regression, Multilayer Perceptron (MLP), and K-Nearest Neighbors (KNN) algorithms for classification.</p>
      </abstract>
      <kwd-group>
        <kwd>Memes</kwd>
        <kwd>Hate Speech</kwd>
        <kwd>Inappropriate Content</kwd>
        <kwd>ASCII</kwd>
        <kwd>BERT</kwd>
        <kwd>Doc2Vec</kwd>
        <kwd>MLP</kwd>
        <kwd>KNN</kwd>
        <kwd>LR</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        This paper presents the classification framework used by the UAEMemex team in Subtask 1:
Detection of hate speech, inappropriate content, and harmless memes, of the shared task on the
detection of inappropriate memes from Mexico
        <xref ref-type="bibr" rid="ref2">(DIMEMEX 2025)</xref>
        [1], held at the seventh workshop of the Iberian Languages Evaluation Forum
        <xref ref-type="bibr" rid="ref3">(IberLEF 2025)</xref>
        [2]. Participants were free to use any approach of their choice.
      </p>
      <p>The scientific study of hate speech and inappropriate content on multimodal social media, from
a Natural Language Processing (NLP) perspective, has an unquestionable potential for social
impact. With the rapid development of mobile and web technologies, such content has become
increasingly widespread on social media platforms, as it is easy to publish any opinion. It has been
frequently observed that user conversations often veer into inappropriate areas, such as insults and
rude and impolite comments about individuals or certain groups or communities [3].</p>
      <p>Recent studies confirm that exposure to such content has serious offline consequences for
historically disadvantaged communities. Thus, research on the automatic detection of hate speech
and inappropriate content has attracted significant attention. Determining the presence of hate
speech and inappropriate content in a multimodal format is not straightforward, even for human
interpretation.</p>
      <p>The meme format poses significant challenges for automated moderation and semantic
understanding. Memes often convey complex meanings through the interaction of textual and
visual elements, and their potential to spread hate speech and inappropriate material demands the
development of methodologies for their detection.</p>
      <p>Our proposal focused on exploring lexical information through ASCII vectorization and
semantic information in the textual components of memes through BERT (Bidirectional Encoder
Representations from Transformers) vectorization. We proposed an in-depth analysis of lexical
items and their contextual meaning through semantic models that can provide patterns to
differentiate hate speech, inappropriate content, and memes without any such content. In addition,
we implemented Logistic Regression, MLP, and KNN algorithms with the information expressed in
text format.</p>
      <p>This paper is organized as follows: Section 2 presents the relevant state-of-the-art works that
address hate speech and inappropriate content detection; Section 3 describes the proposed
approach; Section 4 presents the empirical results derived from applying the approach; and,
finally, Section 5 presents the conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. State-of-the-art works</title>
      <p>BERT vectorization has been used in several works because BERT analyzes words considering
the complete context surrounding them (both the words before and after them), which makes it
possible to capture the meaning of words. For this reason, this type of representation has been used
in several studies. For example, in [4] the automatic identification of hate speech in social media
was investigated. To address the lack of labeled data and classification biases, a transfer learning
strategy based on BERT, a pre-trained language model, was proposed. Its ability to capture the
context of offensive language was evaluated using fine-tuning techniques, and its performance was
compared with previous approaches on public datasets covering topics such as racism, sexism, and
offensive content on Twitter (currently known as X). The results indicated that the model achieved
significant improvements in precision and recall, in addition to revealing biases in annotation and
data collection that should be addressed to realize the potential of a more accurate model.</p>
      <p>Moreover, hate speech detection on social media was investigated in [5], focusing on content
related to religion, race, gender, and sexual orientation. To capture the semantics of the language
used in these speeches, the BERT language model was used as a vectorization method. The results
showed that BERT achieved an F1-score of up to 96% on a balanced dataset, outperforming other
methods such as LSTM neural networks with domain-specific embeddings, which achieved an
F1-score of 93%.</p>
      <p>On the other hand, lexical features have proven to be fundamental in various NLP tasks, as they
allow the representation of crucial information from documents for applications such as clustering
and classification. In the specific field of text classification, character- or word-level n-grams
constitute one of the most widely used representations. However, their inherent high
dimensionality entails considerable computational costs. With this situation in mind, our objective
focuses on identifying a lower-dimensional representation capable of capturing the stylistic and
lexical particularities of a document. Consequently, the probability of occurrence of each character
in its ASCII encoding within the text was calculated [6], [7].</p>
      <p>In [8], the authors developed a hate speech detection system for Twitter posts using the
K-Nearest Neighbor (KNN) method in conjunction with the TF-IDF feature extraction technique.
The main objective was to identify potential violations of the Indonesian Electronic Information
and Transaction Act (UU ITE). Standard data preprocessing techniques were applied, and TF-IDF
was used to convert texts into numerical vectors. Classification was performed using KNN (K=10),
achieving an accuracy of 67.86% with the cosine distance metric. In an evaluation with 100 new
tweets, the accuracy reached 77%, validated by UU ITE law experts. The study concludes that KNN
combined with TF-IDF is an effective tool for hate speech detection, although its accuracy could be
improved with more balanced data and advanced natural language processing techniques.</p>
      <p>The work in [9] focused on the classification of Indonesian-language hate speech texts, using an
improved version of the KNN algorithm. To improve the representation of text features, a term
weighting technique called TF-IDF-ICSρF was applied. The results indicated that the combination
of TF-IDF-ICSρF with the improved KNN algorithm achieved an average accuracy of 88.11%,
significantly outperforming the traditional KNN approach with TF-IDF, which achieved an
accuracy of 70.30%.</p>
      <p>In [10], the detection of hate speech on Twitter was explored using NLP and machine learning
techniques. A logistic regression model was proposed to classify tweets into three categories: hate
speech, offensive language, and none of the above. Standard preprocessing techniques were
applied, and TF-IDF was used for text vectorization. The results demonstrated that the model
achieved an accuracy of 93%, highlighting its effectiveness in automatically identifying harmful
content on social media. The study underscores the usefulness of logistic regression combined with
NLP representations as an effective solution for content moderation on digital platforms.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Proposed Approach</title>
      <p>This section presents an overview of the proposed approach. First, we explain the data stratification
on which the approach was tested. Second, we describe how the data was preprocessed. Third, we
describe the text representation models employed for each meme. Finally, we describe the
classifiers we used and their hyperparameters.</p>
      <sec id="sec-3-1">
        <title>3.1. Data Stratification</title>
        <p>
          Regarding the
          <xref ref-type="bibr" rid="ref2">subtasks of DIMEMEX-2025</xref>
          , the proposed approach was tested on Subtask 1
(Detection of Hate Speech, Inappropriate, and Harmless Memes). In the first phase, the task
organizers released training and development sets, providing labels only for the training set so that
classification models could be created. The labels of the development set were not available, but
proposed approaches could be evaluated on these data through the CodaLab platform [11].
        </p>
        <p>On the other hand, the data stratification of the training set is shown in Table 1. It consists of
2263 Mexican Spanish memes from social media, of which 62.08% belong to the class “Harmless”,
20.86% to the class "Inappropriate content", and 17.06% to the "Hate speech" class. With this
information in mind, we notice that the distribution of memes is imbalanced, with most memes
belonging to the class "Harmless". This situation represents a challenge for participants when
trying to distinguish any aggression.
In addition, it is worth mentioning that the information for each meme is organized according to
the examples shown in Figure 1. Each meme has its image, OCR text, image caption or description,
and class. For the proposed approach, we focused on analyzing the OCR texts because we assumed
that they contain more specific information that may be related to hate speech and inappropriate
content.</p>
        <sec id="sec-3-1-1">
          <title>Image</title>
        </sec>
        <sec id="sec-3-1-2">
          <title>Text</title>
          <p>NO ESTÁ MAL</p>
        </sec>
        <sec id="sec-3-1-3">
          <title>Description</title>
          <p>Class
La imagen es un
meme que presenta
un primer plano de
un hombre con piel
oscura y cabello
corto. Tiene una
expres …</p>
          <p>Harmless</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Pre-processing</title>
        <p>After extracting the OCR texts of the memes, we filtered them to eliminate unnecessary
information. This process involves the following steps:</p>
        <p>Tokenization: All words were tokenized and separated from non-alphanumeric characters and
punctuation marks to improve the association or “understanding” of words for each meme.</p>
        <p>Normalization: We removed non-alphanumeric characters and punctuation marks from
tokenized texts, obtaining words that were then converted to lowercase.</p>
        <p>Stopwords removal: Finally, the normalized texts were subjected to stopword removal, eliminating
common low-meaning words (e.g., de, la, que, el). In particular, we removed all stopwords from
each text using the Spanish stopwords list provided by the NLTK toolkit
(https://bit.ly/2DqKPvW).</p>
        <p>In previous studies on hate speech detection, the steps mentioned above have been considered
standard procedures to create classification models. However, the proposed approach does not
necessarily rely on these methods. Below, we provide a detailed explanation of its structure.</p>
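        <p>To make the three steps above concrete, the following minimal Python sketch reproduces an equivalent pipeline with the NLTK toolkit mentioned above; the function name preprocess_ocr_text is illustrative and is not part of NLPClassKit.</p>
        <preformat>
import re

import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)
SPANISH_STOPWORDS = set(stopwords.words("spanish"))

def preprocess_ocr_text(text):
    """Tokenize, normalize, and remove Spanish stopwords from an OCR text."""
    # Tokenization: separate words from punctuation and non-alphanumeric characters.
    tokens = re.findall(r"\w+", text, flags=re.UNICODE)
    # Normalization: keep alphanumeric tokens and convert them to lowercase.
    tokens = [t.lower() for t in tokens if t.isalnum()]
    # Stopword removal: drop common low-meaning Spanish words (de, la, que, el, ...).
    return [t for t in tokens if t not in SPANISH_STOPWORDS]

print(preprocess_ocr_text("¡Pero esa mujer te hizo brujería!"))
        </preformat>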
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Model Architecture</title>
        <p>The model architecture is based on the framework of components we call NLPClassKit.
NLPClassKit is an assembly of software components that allows classification models to be
developed for NLP projects. It is currently available in a GitLab repository
(https://gitlab.com/JohnRojas/NLPClassKit), and it considers an architecture of processes according
to Figure 2.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.3.1. Text vectorization</title>
        <p>Once text documents were extracted or preprocessed, they were used as input for text
vectorization methods. Such methods are essential for extracting relevant text information at
different levels (e.g., lexical or semantic). Some of these methods are incorporated into a software
component called Text2Vec. Text2Vec is a component developed in Python and available in a
GitLab repository (https://gitlab.com/MLComponents1/Text2Vec) that encodes texts according to
the following techniques:</p>
        <p>One-hot-encoding (OHE): OHE is a straightforward method for representing words as binary
vectors. It creates a vocabulary from the input text, where each word is assigned a unique index,
and a binary vector is created for each word (i.e., 1 is assigned to the index of the word appearing
in the document and zero elsewhere) [12]. Compared with other methods, OHE does not require
high computational resources, but it generates high-dimensional vectors.</p>
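        <p>As an illustration only (not the Text2Vec implementation), the sketch below shows a document-level variant of this encoding, in which the per-word one-hot vectors of a document are collapsed into a single binary bag-of-words vector:</p>
        <preformat>
def one_hot_vectors(documents):
    """Build a vocabulary and return one binary vector per document."""
    vocabulary = sorted({word for doc in documents for word in doc})
    index = {word: i for i, word in enumerate(vocabulary)}
    vectors = []
    for doc in documents:
        vec = [0] * len(vocabulary)  # one dimension per vocabulary word
        for word in doc:
            vec[index[word]] = 1     # 1 at the index of each word appearing in the document
        vectors.append(vec)
    return vectors

docs = [["meme", "racista"], ["meme", "inofensivo"]]
print(one_hot_vectors(docs))  # [[0, 1, 1], [1, 1, 0]]
        </preformat>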
        <p>TF-IDF (Term Frequency-Inverse Document Frequency): It is a frequency-based method for
evaluating the importance of a word to a document in a collection (corpus). It combines Term
Frequency (how often a word appears in a document) and Inverse Document Frequency (how
unique or rare the word is across all documents).</p>
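        <p>For reference, a TF-IDF representation of the preprocessed OCR texts can be obtained, for example, with scikit-learn; this sketch is only illustrative and the toy corpus is hypothetical:</p>
        <preformat>
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "después de los memes racistas seguimos siendo hermanos",
    "la imagen es un meme inofensivo",
]
vectorizer = TfidfVectorizer()          # term frequency weighted by inverse document frequency
X = vectorizer.fit_transform(corpus)    # sparse matrix: documents x vocabulary terms
print(X.shape, vectorizer.get_feature_names_out()[:5])
        </preformat>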
        <p>Doc2Vec (D2V): It is a Word2Vec-based method that generates vector representations for entire
documents, paragraphs, or sentences instead of individual words [13]. In general, it captures the
context and semantics of a document in a fixed-length numerical vector, allowing for comparison,
classification, or clustering of documents.</p>
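        <p>A minimal Doc2Vec sketch with gensim is shown below; the vector_size, window, epochs, and dbow_words values follow the settings reported in Section 4, while dm=0 and min_count=1 are assumptions made for this toy example:</p>
        <preformat>
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

tokenized_docs = [["meme", "racista"], ["meme", "inofensivo", "gracioso"]]
tagged = [TaggedDocument(words=doc, tags=[str(i)]) for i, doc in enumerate(tokenized_docs)]

# Settings mirroring those reported in Section 4 (vector_size=100, window=5, epochs=40,
# dbow_words=1); dm=0 (the DBOW variant) and min_count=1 are assumptions for this toy corpus.
model = Doc2Vec(tagged, vector_size=100, window=5, epochs=40, dm=0, dbow_words=1, min_count=1)
vector = model.infer_vector(["meme", "nuevo"])   # fixed-length vector for an unseen document
print(vector.shape)                              # (100,)
        </preformat>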
        <p>BERT (Bidirectional Encoder Representations from Transformers): It is a pre-trained method based
on attention mechanisms that understands the context of words in a sentence by looking at both
the left and right sides (bidirectional) [14]. Unlike the methods mentioned above, BERT uses a
transformer architecture to deeply understand meaning and relationships in language, creating
contextual vectors. Moreover, it can be fine-tuned with domain-specific text data, and it does not
require the preprocessing steps mentioned in Section 3.2.</p>
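        <p>Contextual vectors of this kind can be obtained, for example, with the Hugging Face transformers library. The sketch below extracts the [CLS] vector of a multilingual BERT model; the specific checkpoint name and the reading of the maximum length of 256 as a token limit are assumptions, and the code only approximates our pipeline:</p>
        <preformat>
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-multilingual-cased"   # assumed multilingual pre-trained checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)

def bert_cls_vector(text):
    """Return the [CLS] embedding of a raw (non-preprocessed) OCR text."""
    inputs = tokenizer(text, truncation=True, max_length=256, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state[:, 0, :]   # [CLS] token vector, shape (1, 768)

print(bert_cls_vector("Después de los memes seguimos siendo hermanos").shape)
        </preformat>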
        <p>ASCII: This method, proposed in [6], [7], considers the frequency of characters to determine
the probability that each character appears in a given OCR text. Formally, it is shown in Eq. (1):</p>
        <p>ASCII(d) = [p(c1), p(c2), …, p(c255)], with p(ci) = f(ci) / len(d), (1)
where the function ASCII(d) receives the input OCR text d and generates a vector representation of
255 values; each p(ci) represents the probability that the character ci appears in d. Such
probabilities are calculated by dividing the frequency of the character ci, f(ci), by len(d) (a
function that counts the number of characters in d). Furthermore, the value p(c256) is used for
emojis or unknown characters that may appear in the same OCR text.</p>
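        <p>A direct reading of Eq. (1), including the extra entry for emojis and unknown characters, can be implemented as follows (an illustrative sketch, not the Text2Vec code):</p>
        <preformat>
def ascii_vector(d):
    """Character-probability representation of an OCR text d, following Eq. (1)."""
    probs = [0.0] * 256               # p(c1) ... p(c255) plus one bin for emojis/unknown chars
    if not d:
        return probs
    for ch in d:
        code = ord(ch)
        index = code - 1 if code in range(1, 256) else 255   # characters beyond c255 go to the last bin
        probs[index] += 1.0
    return [count / len(d) for count in probs]               # p(ci) = f(ci) / len(d)

vec = ascii_vector("NO ESTÁ MAL 😀")
print(len(vec), round(sum(vec), 6))   # 256 dimensions; the probabilities sum to 1.0
        </preformat>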
      </sec>
      <sec id="sec-3-5">
        <title>3.3.2. Classification</title>
        <p>The vectors obtained from the previous step are used as input to a set of supervised machine
learning algorithms, which are briefly described as follows:</p>
        <p>Naive Bayes (NB): The NB algorithm relies on Bayes' theorem, which deals with probability
calculus. Considering its input features, NB computes the probability that a text belongs to a
specific class.</p>
        <p>K-Nearest Neighbors (KNN): KNN is a well-known machine learning algorithm that operates on
the majority rule principle. It predicts the label of a test data point by assigning it to the class most
common among its K nearest training data points in the feature space.</p>
        <p>Logistic Regression (LR): LR is a supervised machine learning algorithm that measures the
relationship between a set of features (X) and a binary output (y ∈ {0,1}). Mathematically, LR is
expressed as 1/(1 + e^(-z)), where z is the linear combination of the input features (xi) and the
model parameters (wi). The output values range from 0 to 1, indicating the likelihood that the
input belongs to class 1.</p>
        <p>Multilayer Perceptron (MLP): The MLP is a deep learning model composed of three types of fully
connected layers: (i) the input layer, (ii) one or more hidden layers, and (iii) the output layer.
Furthermore, this model can capture complex data relations and solve text classification tasks.</p>
        <p>Support Vector Machines (SVM): SVM is a supervised machine learning algorithm for
classification tasks. It works by finding the optimal hyperplane that best separates data points of
different classes in a high-dimensional space. The key idea is to maximize the margin between the
nearest points (support vectors) of different classes.</p>
      </sec>
      <sec id="sec-3-6">
        <title>3.3.3. Generation and interpretation of results</title>
        <p>Finally, the result of the classification process is given in the last stage, where each algorithm
described in the previous section generates the following information:
 Confusion matrix: Shows the number of texts classified correctly/incorrectly.
 Predictions: Shows the prediction labels generated in the training stage of each algorithm.
 Performance of generated models: Shows a detailed description of the performance of each
classification model in terms of Recall, Precision, F1-score, macro F1-score, and Accuracy.</p>
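        <p>As an illustration of these outputs, the following sketch computes the confusion matrix, the predictions, and the per-class and macro metrics with scikit-learn on toy stand-ins for the vectorized memes; the variable names and the synthetic data are assumptions:</p>
        <preformat>
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, f1_score

# Toy stand-ins for the vectorized memes and their labels
# (0 = Harmless, 1 = Inappropriate content, 2 = Hate speech).
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(60, 20)), rng.integers(0, 3, size=60)
X_dev, y_dev = rng.normal(size=(20, 20)), rng.integers(0, 3, size=20)

clf = LogisticRegression(max_iter=500)
clf.fit(X_train, y_train)

predictions = clf.predict(X_dev)                                    # prediction labels
print(confusion_matrix(y_dev, predictions))                         # correct/incorrect counts per class
print(classification_report(y_dev, predictions, digits=4,
                            zero_division=0))                       # precision, recall, F1-score per class
print("macro F1:", f1_score(y_dev, predictions, average="macro"))
        </preformat>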
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>As we observe, the best result we obtained used BERT as the text vectorization method with
LR (0.43), while the second (0.42) and third (0.37) best results were obtained using the KNN
classifier. We obtained the best results using BERT vectorization due to its advantage in
generating contextualized representations of words, compared to ASCII vectorization, which is a
simpler, frequency-based form. However, despite being simple, ASCII vectorization obtained results
very close to those of the more sophisticated BERT vectorization. Its performance was even better
than that of D2V vectorization (0.33).</p>
      <p>The parameters we used to obtain the best results are described in Table 3. Regarding text
vectorization, we employed the multilingual pre-trained BERT model with a maximum length of
256 features, utilizing the CLS mapping method. In the case of D2V, we tested several parameters;
nevertheless, the best result was obtained with a vector size of 100 and a maximum distance
between the current and predicted words of 5 (Window). Furthermore, this vector representation
was obtained using 40 epochs and word vectors by skip-grams (Dbow words = 1).</p>
      <p>On the other hand, the LR was iterated 500 times using the Limited-memory BFGS optimization
algorithm, which is employed for solving large-scale problems, with l2 as the penalty criterion.
Concerning the KNN algorithm, we tested its performance using different values of K (1, 3, 5, 7,
and 9); however, we obtained the best F1 results by using the nearest neighbor (K=1). Regarding
the MLP, the best model was obtained with 100 epochs, the sparse categorical cross-entropy loss
function, and a learning rate of 0.010. Regarding its architecture, we used the ReLU activation
function in the hidden layers and Softmax in the output layer. Across all layers, the model
employed 10303 parameters.</p>
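      <p>As a reference, the classifier settings described above map onto standard library calls roughly as follows; this is a sketch under the stated hyperparameters, and the MLP layer sizes, optimizer, and input dimensionality are assumptions, since they are not reported:</p>
      <preformat>
import tensorflow as tf
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Logistic Regression: 500 iterations, L-BFGS solver, l2 penalty.
lr = LogisticRegression(max_iter=500, solver="lbfgs", penalty="l2")

# KNN: best F1 obtained with the single nearest neighbor (K=1).
knn = KNeighborsClassifier(n_neighbors=1)

# MLP: ReLU hidden layer, Softmax output, sparse categorical cross-entropy,
# learning rate 0.010, 100 epochs. Layer sizes, input dimensionality (here a
# 100-dimensional vector), and the Adam optimizer are assumptions.
mlp = tf.keras.Sequential([
    tf.keras.Input(shape=(100,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),  # Harmless / Inappropriate / Hate speech
])
mlp.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.010),
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
# mlp.fit(X_train, y_train, epochs=100)
      </preformat>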
      <p>Below, we show in Table 4 the comparison between our team's results and the rest of the
competitors. According to the F1-score results, our team, with the best result, obtained the 6th
position. Concerning the best team (Ryuan), there is a gap of 0.15. However, it is important to
highlight that our experiments were done using straightforward methods, and we only considered
OCR texts from the provided dataset.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>In this paper, we presented the approach that the UAEMemex team employed in Subtask 1
of DIMEMEX-2025, as an effort from the NLP scientific community to minimize digital violence
against several communities. In particular, we experimented with lexical and semantic features from
different text vectorization methods. We described the stages, parameters, and values used to develop
classification models for detecting Hate speech, Inappropriate, and Harmless content. Specifically,
we employed a classification framework called NLPClassKit, which focuses on text vectorization
methods (OHE, TF-IDF, D2V, BERT, and ASCII) and supervised machine learning algorithms (NB,
LR, KNN, MLP, and SVM) to solve NLP classification problems.</p>
      <p>According to the experimentation described in Section 4, the configuration that obtained the
best results was vectorization of texts through BERT and classification by LR. Despite obtaining
lower results than other participants, we noticed that the exploration of semantic features may
better contribute to the detection of hate speech, inappropriate content, and harmless content.
Nevertheless, we observed that the ASCII-based text representation should be incorporated as a
complement to modern classification techniques (see Section 4), as it obtained comparable
performance to other approaches.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Grammarly to check grammar and spelling,
as well as Gemini to improve the clarity of some sentences. After using these tools, the authors
reviewed and edited the content as needed. Furthermore, they take full responsibility for the
content of the publication.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>Villaseñor-Pineda and M. Montes-y-Gómez, “Overview of DIMEMEX at IberLEF 2025: Detection of Inappropriate Memes from Mexico,” Procesamiento del Lenguaje Natural, vol. 75, 2025.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>J. A. González-Barba, L. Chiruzzo, and S. M. Jiménez-Zafra, “Overview of IberLEF 2025: Natural Language Processing Challenges for Spanish and other Iberian Languages,” in Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2025), CEUR-WS.org, 2025.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2025), co-located with the 41st Conference of the Spanish Society for Natural Language Processing (SEPLN 2025), CEUR-WS.org, 2025.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>H. Yenala, A. Jhanwar, M. K. Chinnakotla, and J. Goyal, “Deep learning for detecting inappropriate content in text,” Int. J. Data Sci. Anal., vol. 6, no. 4, pp. 273-286, 2018, doi: 10.1007/s41060-017-0088-4.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>M. Mozafari, R. Farahbakhsh, and N. Crespi, “A BERT-Based Transfer Learning Approach for Hate Speech Detection in Online Social Media,” in Studies in Computational Intelligence, vol. 881 SCI, 2020, pp. 928-940, doi: 10.1007/978-3-030-36687-2_77.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>IEEE Access, vol. 9, pp. 106363-106374, 2021, doi: 10.1109/ACCESS.2021.3100435.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 14755 LNCS, pp. 331-341, 2024, doi: 10.1007/978-3-031-62836-8_31.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>J. Rojas-Simón, Y. Ledeneva, and A. García-Hernández, “A Dimensionality Reduction Approach for Text Vectorization in Detecting Human and Machine-generated Texts,” doi: 10.13053/CyS-28-4-5214.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>V. R. Prasetyo and A. H. Samudra, “Hate speech content detection system on Twitter using K-nearest neighbor method,” in AIP Conference Proceedings, vol. 2470, Apr. 2022, doi: 10.1063/5.0080185.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>N. A. Saputra, K. Aeni, and N. M. Saraswati, “Indonesian Hate Speech Text Classification Using Improved K-Nearest Neighbor with TF-IDF-ICSρF,” Sci. J. Informatics, vol. 11, no. 1, pp. 21-30, 2024, doi: 10.15294/sji.v11i1.48085.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>S. Zehra and F. Doja, “A Logistic Regression Model for Hate Speech Recognition,” May 2023, doi: 10.4108/eai.24-3-2022.2318769.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>A. Pavão et al., “CodaLab Competitions: An Open Source Platform to Organize Scientific Challenges,” J. Mach. Learn. Res., vol. 24, pp. 1-6, 2023. Accessed: May 20, 2025. [Online].</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>J. Rojas-Simon, Y. Ledeneva, and R. A. Garcia-Hernandez, “Evaluation of Text Summaries Based on Linear Optimization of Content Metrics,” vol. 1048, p. 215, 2022, doi: 10.1007/978-3-031-07214-7.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>Q. Le and T. Mikolov, “Distributed Representations of Sentences and Documents.” J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova (Google AI Language), “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” pp. 4171-4186. Accessed: May 20, 2025. [Online]. Available: https://github.com/tensorflow/tensor2tensor.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>