<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>HARGP-BETO: Hierarchical Text Interactions Model for Abuse Detection in Mexican Spanish Memes</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Qiyuan Jin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiang Zhou</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>The Hong Kong University of Science and Technology</institution>
          ,
          <addr-line>Clear Water Bay, Kowloon, 999077</addr-line>
          ,
          <country country="HK">Hong Kong</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>The automatic detection of abusive content in memes presents unique challenges in low-resource languages like Mexican Spanish, where cultural nuances and data scarcity compound traditional NLP difficulties. This study introduces HARGP-BETO, a novel hierarchical framework that combines advanced attention mechanisms with adaptive feature fusion for detecting hate speech and inappropriate content in Mexican Spanish memes. Our approach integrates dual-segment encoding with local-global attention interactions, enhanced through gated fusion and multi-level pooling strategies. Experimental results on the DIMEMEX corpus demonstrate the framework's effectiveness, with a macro-F1 score of 0.6139 and 68.24% accuracy, improving on existing baseline methods. While the model excels at majority class detection (75.4% recall for non-abusive content), analysis reveals persistent challenges with minority class discrimination, particularly for hate speech categories. The results validate text-based approaches as computationally efficient alternatives to multimodal meme systems, while highlighting directions for addressing cultural specificity and data imbalance in Mexican Spanish abusive content detection.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural Language Processing</kwd>
        <kwd>Hybrid Attention</kwd>
        <kwd>Mexican Spanish</kwd>
        <kwd>Abusive Content Detection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Social media is playing an ever more significant role in people’s daily lives [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Memes, combining
images with text, are spreading rapidly and have become a popular way for people to communicate and
express emotions. Nevertheless, some memes contain negative elements like insults, attacks on specific
groups or individuals, and hate speech. Such memes with harmful content can foster bad trends such as
prejudice and discrimination, causing serious negative impact, particularly on the healthy development
of teenagers. The harmony of the online world is a pressing concern. As a result, in recent years, the
detection and analysis of abusive content in social media have emerged as a hot topic in computational
linguistics [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        In English-speaking contexts, with the development of NLP, a variety of automatic detection methods
for hate speech and abusive content have emerged, making remarkable progress [
        <xref ref-type="bibr" rid="ref3 ref4 ref5">3, 4, 5</xref>
        ]. However,
in Spanish-speaking contexts the challenge is considerably more severe. A lack of datasets, coupled
with the distinctive linguistic features of Spanish, including the influence of local languages and the
emergence of localized neologisms [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], has hindered progress in this area. Although
previous work has made some headway and started to fill this gap [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], such as the DIMEMEX shared task
[
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ], robust baselines for abusive meme classification are still limited.
      </p>
      <p>To advance this research, we study the DIMEMEX dataset and develop models that determine
whether a meme contains hate speech, inappropriate content, or neither, so that the
different types of abusive content in Mexican Spanish memes can be detected and classified accurately.
By addressing this challenge, we hope to create a safer and more respectful online
world for Spanish-speaking users in Mexico and beyond.</p>
      <p>
        Text is the primary and most direct information carrier on social media platforms,
and the text in a meme often conveys its core meaning and emotion, which gives text-based models
distinct advantages. Even though memes are essentially multi-modal, text alone can provide rich
semantic information for identifying whether a meme is abusive, which is particularly important when
there are nuances in emotional expression. Compared to multi-modal approaches, text-based detection
has several advantages, such as easier data acquisition and annotation, lower computational costs, and
better interpretability. For all these reasons, this paper focuses on developing an effective text-based
computational model to accurately detect and classify abusive content in Mexican Spanish memes [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The tendency of hate speech to go viral significantly compounds the challenges of content
moderation. For years, numerous scholars have worked on automated detection algorithms
for monitoring negative content on social media. Davidson et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] used a crowd-sourced hate speech lexicon to collect tweets and
train multi-class classifiers; Founta et al.
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] devised an incremental iterative methodology employing crowd-sourcing to annotate large-scale
tweet collections with abuse-related labels; Kiela et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] established the Hateful Memes Challenge
Dataset, addressing gaps in multi-modal hate speech classification datasets; Bai et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] introduced
STATE ToxiCN, the first span-level Chinese hate speech dataset. Spanish, the world’s third most spoken
language, has also seen significant progress in hate speech detection, driven by a growing body of
exploratory work. For instance, scholars such as José Antonio García-Díaz [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ],
Montesinos-Cánovas [15], and Vallecillo-Rodríguez [16] have made notable contributions through their
research.
      </p>
      <p>
        Mexico holds a central place among the world’s Spanish-speaking regions. Its unique
geography fosters a complex digital linguistic ecosystem in which slang, cultural allusions, regional
dialects, and loanwords prevail [16]. These linguistic and cultural phenomena substantially increase the
difficulty of detecting hateful or abusive content. Many researchers have devoted significant effort to
offensive content detection in Mexican Spanish and have achieved notable results.
Bel-Enguix et al. [17] introduced the T-MexNeg corpus, the first corpus of Mexican Spanish tweets
annotated with negation. The MEX-A3T tasks at IberLEF 2019 [18] and IberLEF 2020
[19] focused on the detection of aggressive tweets. Subtasks 3 and 4 of MeOffendEs
2021 [20, 21] addressed the identification of offensive language in the Mexican variant of
Spanish. Related shared tasks continued to advance the field in 2023-2025 [
        <xref ref-type="bibr" rid="ref8">22, 8, 23</xref>
        ]. These studies and
shared tasks demonstrate the steady progress of offensive content detection in Mexican Spanish
in recent years.
      </p>
      <p>Nevertheless, previous work has primarily concentrated on pure text analysis. Memes are an
emerging carrier of hate speech, and their concealed, indirect messaging impedes identification. Developing efficient
detection for Mexican Spanish memes therefore demands further study.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset</title>
      <p>According to the organizers, the DIMEMEX corpus consists of more than 3000 memes
compiled from public Facebook groups rooted in Mexico. Because memes can convey a wide range of emotions,
opinions, and statements, the dataset has been manually annotated with three labels: hate speech,
inappropriate content, and neither (see Figure 1). Based on the
task definition, the dataset supports a three-class classification problem over these labels.
It also enables a more fine-grained classification that
differentiates instances of hate speech into categories such as classism, sexism, racism, and others.
Our primary focus here is the three-class classification problem.</p>
      <p>[Figure 1. Panels: (a) neither, (b) inappropriate content, (c) hate speech.]</p>
      <p>Each meme’s information comes from two parts, text and image. The meme text is extracted from the
images via a state-of-the-art OCR technique, and each meme has a unique meme ID. Table 1 outlines
the training dataset for the three-class classification problem, which contains 2263 memes. In the training
phase of this study, 85% of this data is used to train and optimize the model,
while the remaining 15% serves as the validation set for preliminary performance evaluation and model
tuning. During the different stages of the competition, model predictions are made on the official test
datasets, and the results are submitted to obtain performance scores. This ongoing assessment helps verify
the generalizability and practical effectiveness of the model.</p>
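      <p>The 85%/15% split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the fixed seed, the use of integer placeholder IDs, and rounding the validation size up are all assumptions, and the paper does not say whether the split was stratified by class.</p>
      <preformat><![CDATA[
```python
import math
import random


def train_val_split(items, val_fraction=0.15, seed=42):
    """Shuffle a copy of `items` and split off a validation subset.

    The validation size is rounded up (an assumption; the paper only
    states the 85/15 proportions).
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = math.ceil(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Placeholder IDs standing in for the 2263 training memes.
meme_ids = list(range(2263))
train_ids, val_ids = train_val_split(meme_ids)
print(len(train_ids), len(val_ids))
```
]]></preformat>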
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>In this section, we describe our data preprocessing and the proposed framework, HARGP-BETO, for
multi-aspect meme classification.</p>
      <sec id="sec-4-1">
        <title>4.1. Data Augmentation</title>
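        <p>This paper utilizes the multimodal DIMEMEX dataset, which comprises text-image pair samples. Each sample
<inline-formula><tex-math>x</tex-math></inline-formula> is structured as follows:</p>
        <disp-formula id="eq1"><label>(1)</label><tex-math>x = (T^{\mathrm{ocr}}, T^{\mathrm{ctx}}, I, y)</tex-math></disp-formula>
        <p>where <inline-formula><tex-math>T^{\mathrm{ocr}}</tex-math></inline-formula> represents the OCR text extracted from memes, <inline-formula><tex-math>T^{\mathrm{ctx}}</tex-math></inline-formula> is the description associated with
the memes, <inline-formula><tex-math>I</tex-math></inline-formula> is the image of memes, and <inline-formula><tex-math>y</tex-math></inline-formula> denotes the classification labels.</p>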
        <p>To construct the input for our model, we design a specialized data preprocessing pipeline with the
following key steps. For dynamic text augmentation, we apply synonym replacement (using the nlpaug
library) to 20% of the meme text instances during training to enhance model generalization:</p>
        <disp-formula id="eq2"><label>(2)</label><tex-math>{T^{\mathrm{ocr}}}' = \mathrm{Aug}(T^{\mathrm{ocr}}), \quad P(\mathrm{augment}) = 0.2</tex-math></disp-formula>
        <p>This approach helps to mitigate overfitting by introducing lexical variations while maintaining semantic
consistency.</p>
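        <p>As an illustration of augmenting with probability 0.2, the sketch below applies a synonym swap to roughly one text in five. It is a stand-in under stated assumptions: the paper uses the nlpaug library, whereas this example uses a hypothetical two-entry synonym table (TOY_SYNONYMS) and hypothetical helper names so that it runs with the standard library alone.</p>
        <preformat><![CDATA[
```python
import random

# Hypothetical toy synonym table; the paper relies on nlpaug instead.
TOY_SYNONYMS = {"malo": ["terrible"], "feo": ["horrible"]}


def replace_synonyms(text, synonyms, rng):
    """Swap every word that has a listed synonym for a randomly chosen one."""
    words = text.split()
    return " ".join(rng.choice(synonyms[w]) if w in synonyms else w for w in words)


def maybe_augment(text, rng, p_augment=0.2, synonyms=TOY_SYNONYMS):
    """Return an augmented copy with probability p_augment, else the input unchanged."""
    if rng.random() < p_augment:
        return replace_synonyms(text, synonyms, rng)
    return text


rng = random.Random(0)
samples = ["que meme tan malo"] * 1000
augmented = [maybe_augment(s, rng) for s in samples]
changed = sum(a != s for a, s in zip(augmented, samples))
print(f"{changed} of {len(samples)} texts were augmented")
```
]]></preformat>
        <p>Drawing the augmentation decision per sample at load time, rather than once for the whole dataset, means each epoch sees a different augmented subset.</p>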
        <p>We conducted a quantitative analysis of the text length distributions under the BERT tokenizer, shown in Table 2.
Through hyperparameter experimentation, we chose the maximum input lengths: 56 tokens for
text and 240 tokens for descriptions. To effectively integrate these dual-text sources while preserving
their distinct semantic roles, we design a specialized input format that explicitly segments the content,
as in Equation (3). This method extends standard BETO processing by incorporating explicit boundary
markers between the OCR-extracted meme text and the descriptions. It balances information preservation
against GPU memory constraints and minimizes padding waste.</p>
        <disp-formula id="eq3"><label>(3)</label><tex-math>\mathrm{Input} = [\mathrm{CLS}] \oplus T^{\mathrm{ocr}}_{1:n} \oplus [\mathrm{SEP}] \oplus T^{\mathrm{ctx}}_{1:m} \oplus [\mathrm{SEP}]</tex-math></disp-formula>
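        <p>The segmented input format can be sketched as below. This is a simplified illustration, assuming whitespace-split tokens in place of BETO word pieces and padding appended at the end of the sequence; the paper does not specify the padding layout, and the function name is hypothetical.</p>
        <preformat><![CDATA[
```python
def build_dual_segment_input(ocr_tokens, desc_tokens, max_ocr=56, max_desc=240):
    """Assemble [CLS] ocr [SEP] desc [SEP] plus attention mask and segment ids."""
    ocr = list(ocr_tokens[:max_ocr])      # truncate OCR text to 56 tokens
    desc = list(desc_tokens[:max_desc])   # truncate description to 240 tokens
    tokens = ["[CLS]"] + ocr + ["[SEP]"] + desc + ["[SEP]"]
    # Segment 0: [CLS], OCR text, first [SEP]; segment 1: description, last [SEP].
    token_type_ids = [0] * (len(ocr) + 2) + [1] * (len(desc) + 1)
    attention_mask = [1] * len(tokens)
    # Pad to the fixed total length n + m + 3 (assumed to be trailing padding).
    pad = (max_ocr + max_desc + 3) - len(tokens)
    tokens += ["[PAD]"] * pad
    token_type_ids += [0] * pad
    attention_mask += [0] * pad
    return tokens, attention_mask, token_type_ids


tokens, mask, types = build_dual_segment_input(
    ["hola", "mundo"], ["una", "imagen", "graciosa"]
)
print(tokens[:8])
print(len(tokens), sum(mask))
```
]]></preformat>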
        <p>As our research primarily focuses on text, we only perform basic processing on the image information
for contrastive experiments. The meme images <inline-formula><tex-math>I</tex-math></inline-formula> undergo basic preprocessing to obtain
<inline-formula><tex-math>I'</tex-math></inline-formula>, and standardized visual features are then extracted through a ViT processor.</p>
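        <p>Finally, the text and image inputs are aligned into a single multimodal sample:</p>
        <disp-formula id="eq4"><label>(4)</label><tex-math>I_{\mathrm{pixel}} = \mathrm{ViTProcessor}(I') \in \mathbb{R}^{3 \times 224 \times 224}</tex-math></disp-formula>
        <disp-formula id="eq5"><label>(5)</label><tex-math>x_{\mathrm{final}} = (I_{\mathrm{text}}, M_{\mathrm{attn}}, T_{\mathrm{type}}, I_{\mathrm{pixel}}, y)</tex-math></disp-formula>
        <p>These visual features exclusively support the baseline comparisons in Section 5.1, including the ViT-only
model and the multimodal CLIP benchmark. This underscores our core methodological focus:
demonstrating that textual signals, when properly processed through our proposed model, provide sufficient
discriminative power for abuse detection without visual dependency.</p>
        <p><bold>4.2. Model</bold></p>
        <p>The model is a hierarchical text classification framework combining dual-segment interaction and
adaptive feature fusion. The architecture comprises three key modules, shown in Figure 2:</p>
        <list list-type="order">
          <list-item><p>BETO-based Dual-segment Encoder,</p></list-item>
          <list-item><p>Hybrid attention with local-global interactions,</p></list-item>
          <list-item><p>Gated hierarchical pooling.</p></list-item>
        </list>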
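        <p>BETO-based Dual-segment Encoder: The BETO encoder generates embeddings <inline-formula><tex-math>H \in \mathbb{R}^{B \times (n+m+3) \times d}</tex-math></inline-formula> [24].
We then divide the BETO output into the text sequence <inline-formula><tex-math>T \in \mathbb{R}^{B \times n \times d}</tex-math></inline-formula>
and the description sequence <inline-formula><tex-math>D \in \mathbb{R}^{B \times m \times d}</tex-math></inline-formula> [25]. These are passed through a normalization layer to obtain the
normalized OCR text features <inline-formula><tex-math>T_{\mathrm{norm}} \in \mathbb{R}^{B \times n \times d}</tex-math></inline-formula> and the normalized description features <inline-formula><tex-math>D_{\mathrm{norm}} \in \mathbb{R}^{B \times m \times d}</tex-math></inline-formula>,
where <inline-formula><tex-math>B</tex-math></inline-formula> is the batch size, <inline-formula><tex-math>d</tex-math></inline-formula> is the hidden size, <inline-formula><tex-math>n</tex-math></inline-formula> is the maximum input length of the text, and <inline-formula><tex-math>m</tex-math></inline-formula> is the maximum input length of the descriptions.</p>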
        <p>Hybrid Attention Mechanism: This module jointly models the local contextual patterns of the OCR
text and the global cross-interaction between the OCR text and the descriptions [26, 27]. It
combines the advantages of local causal attention and global cross-segment attention, and
uses an adaptive feature fusion method to dynamically combine local and global features.</p>
        <p>1. Local Causal Attention</p>
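        <p>The window-restricted multi-head attention [28] captures local dependencies by restricting attention
to a fixed-size window, thereby focusing on fine-grained contextual relationships.</p>
        <disp-formula id="eq6"><label>(6)</label><tex-math>Q = T_{\mathrm{norm}} W_Q, \quad K = T_{\mathrm{norm}} W_K, \quad V = T_{\mathrm{norm}} W_V</tex-math></disp-formula>
        <p>We introduce a causal mask <inline-formula><tex-math>M \in \{0, -\infty\}^{n \times n}</tex-math></inline-formula> to restrict attention to a window of size 3 for
locality:</p>
        <disp-formula id="eq7"><label>(7)</label><tex-math><![CDATA[M_{i,j} = \begin{cases} 0 & \text{if } j \le i + 1 \ (\text{window size} = 3) \\ -\infty & \text{otherwise} \end{cases}]]></tex-math></disp-formula>
        <p>The local attention is then computed as in Equation (8), where <inline-formula><tex-math>d_k</tex-math></inline-formula> is the key dimension. Additive masking, which is equivalent
to multiplicative masking, uses large negative values to ensure numerical stability and hardware
efficiency, making it more memory-bandwidth friendly.</p>
        <disp-formula id="eq8"><label>(8)</label><tex-math>A_{\mathrm{local}} = \mathrm{Softmax}\left(\frac{QK^{\top} + M}{\sqrt{d_k}}\right) V \in \mathbb{R}^{B \times n \times d}</tex-math></disp-formula>
        <sec id="sec-4-1-1">
          <title>2. Global Cross-Segment Attention</title>
          <p>To obtain better interactions between the OCR text and the descriptions, we employ a global multi-head
attention mechanism here.</p>
          <disp-formula id="eq9"><label>(9)</label><tex-math>Q = T_{\mathrm{norm}} W_Q, \quad K = D_{\mathrm{norm}} W_K, \quad V = D_{\mathrm{norm}} W_V</tex-math></disp-formula>
          <disp-formula id="eq10"><label>(10)</label><tex-math>A_{\mathrm{global}} = \mathrm{Softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V \in \mathbb{R}^{B \times n \times d}</tex-math></disp-formula>
        </sec>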
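        <p>The two attention branches can be sketched in plain Python as follows. This is a single-head, unbatched illustration with the learned projections omitted (Q, K, and V are passed in directly), and the function names are hypothetical. The default mask predicate, key index at most query index plus one, follows one reading of the mask condition in the text; the index roles and any lower bound on the window are assumptions, so the predicate is left as a parameter.</p>
        <preformat><![CDATA[
```python
import math

NEG_INF = float("-inf")


def local_mask(n, allowed=lambda i, j: j <= i + 1):
    """Additive mask: 0.0 where query i may attend to key j, -inf elsewhere.

    The default predicate is an assumed reading of the paper's mask; a
    symmetric window would use `lambda i, j: abs(i - j) <= 1` instead.
    """
    return [[0.0 if allowed(i, j) else NEG_INF for j in range(n)] for i in range(n)]


def softmax(row):
    """Numerically stable softmax; entries equal to -inf get weight 0."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v, mask=None):
    """Scaled dot-product attention; returns (outputs, attention weights).

    The mask is added after scaling, which matches masking before scaling
    because every mask entry is either 0 or -inf.
    """
    scale = math.sqrt(len(q[0]))
    outputs, weights = [], []
    for i, q_i in enumerate(q):
        scores = [sum(a * b for a, b in zip(q_i, k_j)) / scale for k_j in k]
        if mask is not None:
            scores = [s + m for s, m in zip(scores, mask[i])]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(w_j * v_j[c] for w_j, v_j in zip(w, v))
                        for c in range(len(v[0]))])
    return outputs, weights


# Toy features: 4 OCR-text positions and 3 description positions, hidden size 2.
t_norm = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
d_norm = [[1.0, 2.0], [3.0, 4.0], [0.0, 1.0]]

a_local, w_local = attention(t_norm, t_norm, t_norm, mask=local_mask(len(t_norm)))
a_global, w_global = attention(t_norm, d_norm, d_norm)  # queries: text, keys/values: description
print(w_local[0])
print(len(a_global), len(a_global[0]))
```
]]></preformat>
        <p>Both branches return one vector per OCR-text position, which is what allows them to be fused element-wise afterwards.</p>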
        <sec id="sec-4-1-2">
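          <title>3. Adaptive Feature Fusion</title>
          <p>To dynamically combine local and global features, we introduce a learnable gate that adaptively fuses these features of the input data [29, 30, 31].</p>
          <disp-formula id="eq11"><label>(11)</label><tex-math>G = \sigma\left(W_g [A_{\mathrm{local}}; A_{\mathrm{global}}]\right) \in \mathbb{R}^{B \times n \times d}</tex-math></disp-formula>
          <disp-formula id="eq12"><label>(12)</label><tex-math>A_{\mathrm{fused}} = G \odot A_{\mathrm{local}} + (1 - G) \odot A_{\mathrm{global}} \in \mathbb{R}^{B \times n \times d}</tex-math></disp-formula>
          <p>where <inline-formula><tex-math>\sigma</tex-math></inline-formula> denotes the sigmoid activation and <inline-formula><tex-math>\odot</tex-math></inline-formula> represents element-wise multiplication. This adaptive
fusion allows the model to leverage both local and global information effectively.</p>
          <p>Hierarchical Pooling: To enhance the text understanding capability of the model, we adopt a
hierarchical pooling strategy inspired by previous work [32].</p>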
          <p>
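            This strategy concatenates the CLS token <inline-formula><tex-math>h_{\mathrm{CLS}}</tex-math></inline-formula> from the beginning of the input sequence with the
mean-pooled <inline-formula><tex-math>\bar{A}</tex-math></inline-formula> and max-pooled <inline-formula><tex-math>\tilde{A}</tex-math></inline-formula> outputs from the hybrid attention mechanism, resulting in multilevel
pooling features. The concatenated features are then linearly projected and passed through a GELU activation function
to produce a comprehensive feature representation <inline-formula><tex-math>P</tex-math></inline-formula> [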
            <xref ref-type="bibr" rid="ref15 ref16">33, 34</xref>
            ]. This allows the model to utilize global
semantics, overall statistical features, and salient local information.
          </p>
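          <disp-formula id="eq13"><label>(13)</label><tex-math>P = \mathrm{GELU}\left(W_p [h_{\mathrm{CLS}}; \bar{A}; \tilde{A}]\right) \in \mathbb{R}^{B \times d}</tex-math></disp-formula>
          <p>where:</p>
          <disp-formula><tex-math>\bar{A} = \frac{1}{n} \sum_{i=1}^{n} A_{\mathrm{fused},i}, \qquad \tilde{A} = \max_{1 \le i \le n} A_{\mathrm{fused},i}</tex-math></disp-formula>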
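          <p>The gated fusion and hierarchical pooling steps can be sketched as follows. This is an unbatched, pure-Python illustration with hand-picked weights rather than learned ones; the helper names, the weight shapes (W_g as d x 2d, W_p as d x 3d), the absence of bias terms, and the erf form of GELU are assumptions made for the example.</p>
          <preformat><![CDATA[
```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gelu(x):
    """Exact (erf-based) GELU; assumed here, the paper does not name a variant."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def gated_fusion(a_local, a_global, w_g):
    """Per position: G = sigmoid(W_g [a_l; a_g]); fused = G*a_l + (1 - G)*a_g."""
    fused = []
    for a_l, a_g in zip(a_local, a_global):
        gate = [sigmoid(z) for z in matvec(w_g, a_l + a_g)]  # list concat = [a_l; a_g]
        fused.append([g * x + (1.0 - g) * y for g, x, y in zip(gate, a_l, a_g)])
    return fused


def hierarchical_pool(h_cls, a_fused, w_p):
    """P = GELU(W_p [h_cls; mean-pool; max-pool]) over the sequence axis."""
    n, d = len(a_fused), len(a_fused[0])
    mean_pool = [sum(tok[c] for tok in a_fused) / n for c in range(d)]
    max_pool = [max(tok[c] for tok in a_fused) for c in range(d)]
    return [gelu(z) for z in matvec(w_p, h_cls + mean_pool + max_pool)]


# Toy inputs with sequence length 2 and hidden size 2.
a_local = [[1.0, 2.0], [3.0, 4.0]]
a_global = [[3.0, 0.0], [1.0, 2.0]]
w_g = [[0.0] * 4 for _ in range(2)]      # zero weights give a gate of 0.5 everywhere
a_fused = gated_fusion(a_local, a_global, w_g)

h_cls = [0.0, 0.0]
w_p = [[0.0, 0.0, 1.0, 0.0, 0.0, 0.0],   # picks mean-pooled channel 0
       [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]]   # picks max-pooled channel 1
p = hierarchical_pool(h_cls, a_fused, w_p)
print(a_fused)
print(p)
```
]]></preformat>
          <p>With a gate of 0.5 the fused features are the plain average of the two branches; a trained gate would instead shift weight toward whichever branch is more informative per position and channel.</p>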
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experiments</title>
      <sec id="sec-5-1">
        <title>5.1. Ablation experiment</title>
        <p>We evaluate model performance on the DIMEMEX dataset, monitoring precision, recall, F1-score, and
accuracy. All models are initialized with BETO, with its parameters frozen during training; only the
newly added modules are optimized. Each experiment is repeated five times, with results reported as
averages. Our ablation studies focus on two key aspects: (1) comparing the effectiveness of using cross
attention alone versus hybrid attention, and (2) assessing the contribution of hierarchical pooling.</p>
        <p>To investigate the impact of each module on the performance of the model, we design a series of
ablation experiments. Starting with BETO as the baseline model, we progressively introduce different
components: cross-segment attention, residual connections, gated fusion, and hierarchical pooling. The
experimental setup and results are shown in Table 3.</p>
        <p>HARP-BETO replaces cross-segment attention with hybrid attention,
which integrates local causal attention and global cross-segment attention. This architecture captures
both fine-grained contextual patterns within the OCR text and high-level interactions between the text and
the descriptions. HARGP-BETO further enhances this design by incorporating gated fusion to dynamically
weight local and global features, as in Equations (11) and (12), resulting in a more adaptive feature
representation. Both models employ hierarchical pooling for final feature aggregation.</p>
        <p>We compare our model with both single-modal ViT and multimodal CLIP models. The results are
shown in Table 4.</p>
        <p>[Tables 3 and 4. Component columns: Cross-Segment Attention, Hybrid Attention (Local+Global), Residual Connection, Gated Fusion, Hierarchical Pooling. Models: ViT, CLIP, BETO, A-BETO, AR-BETO, ARG-BETO, ARGP-BETO, HARP-BETO, HARGP-BETO. Metric columns include F1-Score and Precision.]</p>
        <p>The ablation study in Table 4 systematically evaluates the impact of key architectural modules on the
performance of the model. Our architectural decision to build on BETO’s text-based foundation stems
from three key observations: (1) The strong performance of Spanish-specialized BETO (61.77% precision)
versus general multilingual CLIP (57.93%) validates the necessity of language-specific modeling for
nuanced Mexican Spanish. (2) Although CLIP’s visual-text alignment improves recall (+1.66% over BETO),
its limited Spanish pretraining causes precision degradation in culture-specific contexts. (3) ViT’s poor
performance (42.76% F1) confirms that visual patterns alone lack sufficient semantic signals for abusive
content detection.</p>
        <p>We dissect the role of each component based on its incremental contribution. A-BETO
demonstrates that cross-attention enhances feature interaction between segments but lacks hierarchical
refinement. Introducing a residual connection in AR-BETO stabilizes training during
backpropagation to some extent, yielding a gain of 0.34% F1 despite the increase in the parameter count. For
ARG-BETO, the gating mechanism mitigates noise but underperforms on minority classes, with Class 1
(inappropriate content) reaching F1 = 0.49. For ARGP-BETO, multiscale pooling diversifies feature aggregation,
particularly benefiting Class 0 (neither) (F1 = 0.82, +4% vs. AR-BETO). HARP-BETO combines local
and global attention, improving Class 1 recall (43% → 50%), but precision drops due to overfitting on
sparse samples. Finally, HARGP-BETO adds gating to the hybrid attention output to refine the hybrid features
and performs best among these models.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Confusion Matrix</title>
        <p>The HARGP-BETO model demonstrates robust performance in integrating multiscale features through
its hybrid attention mechanism, which effectively combines local and global semantic patterns to
achieve a balanced macro-F1 score of 0.6139, as seen in Figure 3. The model excels in classifying the
majority class (Class 0 with 172 TP and 75.4% recall), showcasing its ability to leverage hierarchical
attention and gated fusion for stable feature aggregation. The multilevel pooling strategy, which
concatenates the [CLS], mean-pooled, and max-pooled representations, further enhances discriminative
power by capturing various statistical signals, contributing to an overall accuracy of 68.24%.</p>
        <p>However, the model has limitations due to data imbalance in the dataset. Class 1 has a low recall of
46.6%, with 23 samples misclassified as Class 0. Class 2 shows moderate performance (61.1% recall) with
14 misclassifications to Class 0. This suggests that the gating mechanism may be biased towards the
weight allocation of majority classes, suppressing the characteristics of low-frequency classes. In the
future, dynamic category weights, contrastive learning losses and hierarchical attention optimization
need to be adopted to alleviate the imbalance and enhance the fine-grained discrimination ability.</p>
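        <p>For reference, the per-class recall, macro-F1, and accuracy figures discussed above are all derived from a confusion matrix, as sketched below. The 3x3 matrix in the example is a hypothetical toy, not the matrix in Figure 3, and the function names are illustrative. The final check uses only numbers reported in the text: 172 true positives at 75.4% recall implies about 228 Class 0 samples, a count that is inferred rather than reported.</p>
        <preformat><![CDATA[
```python
def per_class_metrics(cm):
    """cm[i][j] = number of samples with true class i predicted as class j."""
    k = len(cm)
    metrics = []
    for c in range(k):
        tp = cm[c][c]
        support = sum(cm[c])                         # all samples of true class c
        predicted = sum(cm[r][c] for r in range(k))  # all samples predicted as c
        recall = tp / support if support else 0.0
        precision = tp / predicted if predicted else 0.0
        denom = precision + recall
        f1 = 2.0 * precision * recall / denom if denom else 0.0
        metrics.append({"precision": precision, "recall": recall, "f1": f1})
    return metrics


def macro_f1(cm):
    """Unweighted mean of the per-class F1 scores."""
    scores = [m["f1"] for m in per_class_metrics(cm)]
    return sum(scores) / len(scores)


def accuracy(cm):
    """Fraction of samples on the diagonal."""
    correct = sum(cm[c][c] for c in range(len(cm)))
    return correct / sum(sum(row) for row in cm)


# Hypothetical toy matrix (rows: true class 0/1/2, columns: predicted class).
toy_cm = [[8, 1, 1],
          [2, 5, 3],
          [1, 1, 8]]
toy_metrics = per_class_metrics(toy_cm)
print([round(m["recall"], 3) for m in toy_metrics])
print(round(macro_f1(toy_cm), 4), round(accuracy(toy_cm), 4))

# Consistency check on reported figures; the support of 228 is inferred.
class0_recall = 172 / 228
print(round(class0_recall, 3))
```
]]></preformat>
        <p>Because macro-F1 weights every class equally, weak recall on the two minority classes lowers it much more than it lowers accuracy, which is dominated by Class 0.</p>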
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>This study presents HARGP-BETO, a novel hierarchical framework for detecting abusive content in
Mexican Spanish memes, leveraging hybrid attention mechanisms and adaptive feature fusion to address
the challenges of multimodal and imbalanced data. The proposed HARGP-BETO model achieves a
macro-F1 score of 0.6139 and an accuracy of 68.24%, demonstrating its effectiveness in integrating local
and global semantic patterns through gated fusion and multilevel pooling. The hierarchical architecture,
particularly the hybrid attention design, improves feature interaction between OCR text
and contextual descriptions, enabling robust performance on the majority class (Class 0: 75.4% recall)
while maintaining balanced precision-recall trade-offs. These advancements highlight the potential
of text-based approaches in abusive meme detection, which are computationally efficient compared
to multimodal methods. Additionally, text-based models are more conducive to post-hoc
attribution analysis and visualization after training, making them easier to interpret than
purely image-based methods.</p>
      <p>However, the model’s performance on minority classes (e.g., Class 1 recall = 46.6%) highlights persistent
challenges rooted in data imbalance and feature ambiguity. Future work should focus on dynamic
class-aware gating, contrastive learning for minority-class discrimination, and enhanced local attention
mechanisms to mitigate bias. By refining these aspects, the framework could be extended to other
low-resource languages, fostering safer internet spaces while preserving cultural and linguistic nuances
in abusive content detection.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <sec id="sec-7-1">
        <title>The author(s) have not employed any Generative AI tools.</title>
        <p>[15] E. Montesinos-Cánovas, F. Garcia-Sánchez, J. A. Garcia-Díaz, G. Alcaraz Mármol, R. Valencia García, Spanish hate-speech detection in football (2023).</p>
        <p>[16] H. Gomez-Adorno, G. Bel-Enguix, G. Sierra, J.-C. Barajas, W. Álvarez, Machine learning and deep learning sentiment analysis models: Case study on the sent-covid corpus of tweets in mexican spanish, Informatics 11 (2024). URL: https://www.mdpi.com/2227-9709/11/2/24.</p>
        <p>[17] G. Bel-Enguix, H. Gómez-Adorno, A. Pimentel, S.-L. Ojeda-Trueba, B. Aguilar-Vizuet, Negation detection on mexican spanish tweets: The t-mexneg corpus, Applied Sciences 11 (2021). URL: https://www.mdpi.com/2076-3417/11/9/3880. doi:10.3390/app11093880.</p>
        <p>[18] M. E. Aragón, M. Á. Álvarez-Carmona, M. M. y Gómez, H. J. Escalante, L. V. Pineda, D. Moctezuma, Overview of mex-a3t at iberlef 2019: Authorship and aggressiveness analysis in mexican spanish tweets, in: IberLEF@SEPLN, 2019. URL: https://api.semanticscholar.org/CorpusID:267061516.</p>
        <p>[19] M. E. Aragón, H. J. Jarquín-Vásquez, M. M. y Gómez, H. J. Escalante, L. V. Pineda, H. Gómez-Adorno, J. P. Posadas-Durán, G. Bel-Enguix, Overview of mex-a3t at iberlef 2020: Fake news and aggressiveness analysis in mexican spanish, in: IberLEF@SEPLN, 2020, pp. 222–235. URL: https://ceur-ws.org/Vol-2664/mex-a3t_overview.pdf.</p>
        <p>[20] F. M. Plaza-del Arco, M. Casavantes, H. J. Escalante, M. T. Martín-Valdivia, A. Montejo-Ráez, M. Montes-y Gómez, H. Jarquín-Vásquez, L. Villaseñor-Pineda, Overview of MeOffendEs at IberLEF 2021: Offensive Language Detection in Spanish Variants, Procesamiento del Lenguaje Natural 67 (2021) 183–194. URL: http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6388.</p>
        <p>[21] F. M. Plaza-del Arco, M. Casavantes, H. Escalante, M. Martin-Valdivia, A. Montejo-Ráez, M. Montes-y Gómez, H. Jarquín-Vásquez, L. Villasenor-Pineda, Overview of the meoffendes task on offensive text detection at iberlef 2021, Procesamiento del Lenguaje Natural 67 (2021).</p>
        <p>[22] H. Jarquín-Vásquez, D. I. Hernández-Farías, L. J. Arellano, H. J. Escalante, L. Villaseñor-Pineda, M. Montes, F. Sanchez-Vega, et al., Overview of da-vincis at iberlef 2023: Detection of aggressive and violent incidents from social media in spanish, Procesamiento del Lenguaje Natural 71 (2023) 351–360.</p>
        <p>[23] T.-C. I. H.-F. D. I. E. H. J. V.-P. L. M.-y.-G. M. Jarquín-Vásquez, Horacio, Overview of DIMEMEX at IberLEF2025: Detection of Inappropriate Memes from Mexico, Procesamiento del Lenguaje Natural 75 (2025).</p>
        <p>[24] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), 2019, pp. 4171–4186.</p>
        <p>[25] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, Roberta: A robustly optimized bert pretraining approach, arXiv preprint arXiv:1907.11692 (2019).</p>
        <p>[26] I. Beltagy, M. E. Peters, A. Cohan, Longformer: The long-document transformer, arXiv preprint arXiv:2004.05150 (2020).</p>
        <p>[27] M. Zaheer, G. Guruganesh, K. A. Dubey, J. Ainslie, C. Alberti, S. Ontanon, P. Pham, A. Ravula, Q. Wang, L. Yang, et al., Big bird: Transformers for longer sequences, Advances in neural information processing systems 33 (2020) 17283–17297.</p>
        <p>[28] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, Advances in neural information processing systems 30 (2017).</p>
        <p>[29] F. Liu, S.-Y. Shen, Z.-W. Fu, H.-Y. Wang, A.-M. Zhou, J.-Y. Qi, Lgcct: A light gated and crossed complementation transformer for multimodal speech emotion recognition, Entropy 24 (2022) 1010.</p>
        <p>[30] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, Q. V. Le, Xlnet: Generalized autoregressive pretraining for language understanding, Advances in neural information processing systems 32 (2019).</p>
        <p>[31] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, Y. Bengio, Learning phrase representations using rnn encoder-decoder for statistical machine translation, arXiv preprint arXiv:1406.1078 (2014).</p>
        <p>[32] A. Tao, K. Sapra, B. Catanzaro, Hierarchical multi-scale attention for semantic segmentation, arXiv</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P. F.</given-names>
            <surname>Bruning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. J.</given-names>
            <surname>Alge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-C.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>Social networks and social media: Understanding and managing influence vulnerability in a connected society</article-title>
          ,
          <source>Business Horizons</source>
          <volume>63</volume>
          (
          <year>2020</year>
          )
          <fpage>749</fpage>
          -
          <lpage>761</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Díaz-Torres</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Morán-Méndez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Villasenor-Pineda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Montes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Aguilera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Meneses-Lerín</surname>
          </string-name>
          ,
          <article-title>Automatic detection of offensive language in social media: Defining linguistic criteria to build a mexican spanish dataset</article-title>
          ,
          <source>in: Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>132</fpage>
          -
          <lpage>136</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N. S.</given-names>
            <surname>Mullah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. M. N. W.</given-names>
            <surname>Zainon</surname>
          </string-name>
          ,
          <article-title>Advances in machine learning algorithms for hate speech detection in social media: a review</article-title>
          ,
          <source>IEEE Access</source>
          <volume>9</volume>
          (
          <year>2021</year>
          )
          <fpage>88364</fpage>
          -
          <lpage>88376</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kumar</surname>
          </string-name>
          , et al.,
          <article-title>Exploring hate speech detection: challenges, resources, current research and future directions</article-title>
          ,
          <source>Multimedia Tools and Applications</source>
          (
          <year>2025</year>
          )
          <fpage>1</fpage>
          -
          <lpage>37</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Toktarova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Syrlybay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Myrzakhmetova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Anuarbekova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Rakhimbayeva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhylanbaeva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Suieuova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kerimbekov</surname>
          </string-name>
          ,
          <article-title>Hate speech detection in social networks using machine learning and deep learning methods</article-title>
          ,
          <source>International Journal of Advanced Computer Science and Applications</source>
          <volume>14</volume>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mojedano Batel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Adams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pezik</surname>
          </string-name>
          ,
          <article-title>Native dialect influence detection (NDID): Differentiating between Mexican and Peninsular L1 Spanish in L2 English</article-title>
          ,
          <source>Language and Law/Linguagem e Direito</source>
          <volume>9</volume>
          (
          <year>2022</year>
          )
          <fpage>120</fpage>
          -
          <lpage>145</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R. N. M.</given-names>
            <surname>Mercado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. F. C.</given-names>
            <surname>Chuctaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. G. C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          ,
          <article-title>Automatic cyberbullying detection in spanish-language social networks using sentiment analysis techniques</article-title>
          ,
          <source>International Journal of Advanced Computer Science and Applications</source>
          <volume>9</volume>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>H.</given-names>
            <surname>Jarquín-Vásquez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Tlelo-Coyotecatl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Casavantes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. I.</given-names>
            <surname>Hernández-Farías</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Escalante</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Villaseñor-Pineda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Montes</surname>
          </string-name>
          , et al.,
          <article-title>Overview of DIMEMEX at IberLEF 2024: Detection of inappropriate memes from Mexico</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>73</volume>
          (
          <year>2024</year>
          )
          <fpage>335</fpage>
          -
          <lpage>345</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J. Á.</given-names>
            <surname>González-Barba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chiruzzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Jiménez-Zafra</surname>
          </string-name>
          ,
          <article-title>Overview of IberLEF 2025: Natural Language Processing Challenges for Spanish and other Iberian Languages</article-title>
          ,
          <source>in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2025), co-located with the 41st Conference of the Spanish Society for Natural Language Processing (SEPLN 2025)</source>
          , CEUR-WS.org,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>T.</given-names>
            <surname>Davidson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Warmsley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Macy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Weber</surname>
          </string-name>
          ,
          <article-title>Automated hate speech detection and the problem of offensive language</article-title>
          ,
          <year>2017</year>
          . URL: https://arxiv.org/abs/1703.04009. arXiv:1703.04009.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.-M.</given-names>
            <surname>Founta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Djouvas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chatzakou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Leontiadis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Blackburn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Stringhini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vakali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sirivianos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kourtellis</surname>
          </string-name>
          ,
          <article-title>Large scale crowdsourcing and characterization of twitter abusive behavior</article-title>
          ,
          <year>2018</year>
          . URL: https://arxiv.org/abs/1802.00393. arXiv:1802.00393.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D.</given-names>
            <surname>Kiela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Firooz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mohan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Goswami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ringshia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Testuggine</surname>
          </string-name>
          ,
          <article-title>The hateful memes challenge: Detecting hate speech in multimodal memes</article-title>
          ,
          <year>2021</year>
          . URL: https://arxiv.org/abs/2005.04790. arXiv:2005.04790.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>STATE ToxiCN: A benchmark for span-level target-aware toxicity extraction in Chinese hate speech detection</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2501.15451. arXiv:2501.15451.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>García-Díaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Jiménez-Zafra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>García-Cumbreras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Valencia-García</surname>
          </string-name>
          ,
          <article-title>Evaluating feature combination strategies for hate-speech detection in Spanish using linguistic features and transformers</article-title>
          ,
          <source>Complex &amp; Intelligent Systems</source>
          <volume>9</volume>
          (
          <year>2022</year>
          )
          <fpage>2893</fpage>
          -
          <lpage>2914</lpage>
          . preprint arXiv:2005.10821 (2020).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>A.</given-names>
            <surname>Aberdam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Litman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tsiper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Anschel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Slossberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mazor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Manmatha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Perona</surname>
          </string-name>
          ,
          <article-title>Sequence-to-sequence contrastive learning for text recognition</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>15302</fpage>
          -
          <lpage>15312</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>D.</given-names>
            <surname>Hendrycks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Gimpel</surname>
          </string-name>
          ,
          <article-title>Gaussian error linear units (GELUs)</article-title>
          ,
          <source>arXiv preprint arXiv:1606.08415</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>