<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>FAST-HASOC 2025: Multimodal and Multilingual Approaches for Hate Speech and Offensive Content Detection in Hindi Memes</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Muhammad Rafi</string-name>
          <email>muhammad.rafi@nu.edu.pk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Saif Ur Rehman Awan</string-name>
          <email>saifurrehman@nu.edu.pk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ramsha Jat</string-name>
          <email>ramsha.jat@nu.edu.pk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aiman Falak</string-name>
          <email>aiman.falak@nu.edu.pk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fatimah Ansari</string-name>
          <email>fatimahansari614@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ahmed Raza</string-name>
          <email>ahmedraza9332@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sagar Chabbriya</string-name>
          <email>sagarchhabriya34@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Forum for Information Retrieval Evaluation</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National University of Computer and Emerging Sciences</institution>
          ,
          <addr-line>Islamabad</addr-line>
          ,
          <country country="PK">Pakistan</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>This paper describes Team FAST's participation in the HASOC 2025 shared task on offensive content detection in Hindi memes. The dataset consists of multimodal samples with OCR-extracted text and raw images, annotated across four subtasks: sentiment, sarcasm, vulgarity, and abuse detection. We propose a multimodal framework that integrates classical machine learning and deep learning models. Our contributions are: (i) a tailored preprocessing pipeline for noisy OCR and Hindi–English code-mixing using curated stopword and vulgar-word dictionaries, and (ii) a combination of lightweight classical models (TF-IDF + Random Forest) with neural approaches (CNN, BiLSTM, ResNet50). Our system achieved its best performance in vulgarity detection, with a Macro-F1 score of 0.75. Code, data splits, and preprocessing resources are available at: https://github.com/fatimahansari/hindi-HASOC-2025.</p>
      </abstract>
      <kwd-group>
        <kwd>Hate Speech Detection</kwd>
        <kwd>Multimodal NLP</kwd>
        <kwd>Hindi Memes</kwd>
        <kwd>TF-IDF</kwd>
        <kwd>Random Forest</kwd>
        <kwd>CNN</kwd>
        <kwd>BiLSTM</kwd>
        <kwd>ResNet50</kwd>
        <kwd>HASOC 2025</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Offensive content on social media spans textual and visual modalities, often with subtle cues,
code-mixing, and transliteration. The HASOC 2025 shared task [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] focuses on Hindi memes, combining
OCR-extracted text and meme images, with subtasks in sentiment, sarcasm, vulgarity, and abuse
detection. These challenges are amplified by noisy OCR, informal Hinglish, and implicit insults. We
present our approach, which combines both classical and neural models under a multimodal pipeline.
Unlike prior editions focusing on monomodal text [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], our system explicitly fuses image and text
features for vulgarity and incorporates Hindi-specific lexicons for preprocessing. Our contributions are:
      </p>
      <p>• A preprocessing framework for noisy OCR and Hindi–English code-mixing.</p>
      <p>• A multimodal architecture combining TF–IDF + Random Forest with ResNet50.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Dataset and Resources</title>
      <p>We used the HASOC 2025 Hindi meme dataset:</p>
      <p>• Train: 1133 samples with labels.</p>
      <p>Each sample includes an image and OCR text annotated for four subtasks. Additional resources:</p>
      <p>• hindi-offensive-words-original.json: offensive lexicon mapped to neutral terms.</p>
      <p>These resources are publicly released with our code repository.</p>
    </sec>
    <sec id="sec-3">
      <title>3. System Architecture</title>
      <sec id="sec-3-1">
        <title>3.1. Preprocessing</title>
        <p>Our text preprocessing includes:</p>
        <p>1. Cleaning URLs, emails, and non-Devanagari symbols.</p>
        <p>2. Stopword removal (Hindi/Hinglish).</p>
        <p>3. Offensive word replacement using the vulgar dictionary.</p>
        <p>4. Tokenization and language-aware normalization.</p>
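        <p>The four steps above can be sketched as follows. This is a minimal illustration, not the released pipeline: the tiny STOPWORDS and OFFENSIVE_TO_NEUTRAL dictionaries, the regular expressions, and the function name preprocess are assumptions made for the example. Latin letters are kept alongside Devanagari so that Hinglish tokens survive cleaning, and whitespace tokenization is applied before the dictionary steps for convenience.</p>
        <preformat>
```python
import re
import unicodedata

# Hypothetical miniature resources; the released stopword list and
# hindi-offensive-words-original.json are much larger.
STOPWORDS = {"है", "का", "की", "hai", "ka", "ki"}
OFFENSIVE_TO_NEUTRAL = {"बेवकूफ": "व्यक्ति", "bewakoof": "person"}

URL_RE = re.compile(r"(?:https?://|www\.)\S+")
EMAIL_RE = re.compile(r"\S+@\S+\.\S+")
# Keep Devanagari (U+0900 to U+097F), Latin letters, and whitespace only.
SYMBOL_RE = re.compile(r"[^\u0900-\u097Fa-zA-Z\s]")


def preprocess(text):
    """Return cleaned, normalized tokens for one OCR string."""
    text = unicodedata.normalize("NFC", text)   # consistent Unicode form
    text = URL_RE.sub(" ", text)                # step 1: URLs
    text = EMAIL_RE.sub(" ", text)              # step 1: emails
    text = SYMBOL_RE.sub(" ", text)             # step 1: other symbols
    tokens = text.lower().split()               # step 4: simple tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]       # step 2
    return [OFFENSIVE_TO_NEUTRAL.get(t, t) for t in tokens]  # step 3


example = preprocess("Tu bewakoof hai, mail me a@b.com")
```
        </preformat>
        <p>With the toy dictionaries above, example holds the tokens tu, person, mail, me: the email address, the comma, and the stopword are dropped, and the lexicon entry is replaced by its neutral term.</p>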
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Models</title>
        <p>Sentiment: For sentiment detection, we combined TF–IDF features with a Random
Forest classifier. TF–IDF is effective for representing short social media posts and OCR
text because it captures the importance of words and bigrams while down-weighting terms that
occur in most documents, such as stopwords.
Random Forest, an ensemble of decision trees, is robust to noisy data and works
well with sparse, high-dimensional inputs. We selected this model because sentiment cues in Hindi
memes are often expressed through explicit keywords or short phrases, which makes classical feature-based
approaches suitable. Additionally, Random Forests cope reasonably well with class imbalance and offer
greater interpretability than deep models.</p>
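        <p>A TF–IDF + Random Forest classifier of this kind could be assembled with scikit-learn roughly as shown here. The toy texts, the label names, and every hyperparameter (unigram and bigram range, sublinear term frequency, 200 trees, balanced class weights) are assumptions for illustration and are not the settings used in our experiments.</p>
        <preformat>
```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Toy stand-ins for cleaned OCR text; labels are illustrative only.
texts = [
    "bahut accha din", "kitna sundar pal", "accha laga dekh kar",
    "bahut bura din", "kitna ganda mazak", "bura laga dekh kar",
]
labels = ["positive", "positive", "positive",
          "negative", "negative", "negative"]

model = make_pipeline(
    # Word unigrams and bigrams, weighted by TF-IDF.
    TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),
    # class_weight="balanced" reweights classes inversely to their frequency.
    RandomForestClassifier(n_estimators=200, class_weight="balanced",
                           random_state=0),
)
model.fit(texts, labels)
prediction = model.predict(["accha din"])[0]
```
        </preformat>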
        <p>Sarcasm: Sarcasm is typically expressed through subtle lexical patterns, wordplay, and local context
within a short sequence of text. To model this, we employed a Convolutional Neural Network (CNN)
with an embedding layer, 1D convolutional filters, and global max pooling. The CNN captures n-gram
level features by sliding filters over word embeddings, allowing it to learn important combinations
of words that signal sarcasm. This architecture is lightweight compared to transformers but effective
for short-text sarcasm detection, which often relies on key sarcastic cues rather than long-range
dependencies. We chose CNNs because they generalize well on small datasets, train faster, and are less
prone to overfitting than more complex models.</p>
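        <p>To make the role of the convolution and pooling layers concrete, the sketch below applies a single bigram filter to a token sequence and keeps the maximum response, which is what a Conv1D layer followed by global max pooling computes for each filter. The two-dimensional embeddings, the filter weights, and the function name conv1d_global_max are hypothetical; in a trained model both embeddings and filters are learned.</p>
        <preformat>
```python
def conv1d_global_max(embeddings, kernel):
    """Slide one filter over a token sequence and max-pool the responses.

    embeddings: list of token vectors (sequence_length x dim)
    kernel: list of weight vectors (window x dim), i.e. one n-gram filter
    Assumes the sequence is at least as long as the filter window.
    """
    window = len(kernel)
    responses = []
    for start in range(len(embeddings) - window + 1):
        total = 0.0
        for offset in range(window):
            token_vec = embeddings[start + offset]
            total += sum(w * x for w, x in zip(kernel[offset], token_vec))
        responses.append(max(total, 0.0))  # ReLU activation
    return max(responses)                  # global max pooling


# Hypothetical 2-dimensional embeddings for four Hinglish tokens.
EMB = {"wah": [1.0, 0.0], "kya": [0.0, 1.0],
       "baat": [1.0, 1.0], "hai": [0.0, 0.0]}
bigram_filter = [[0.0, 1.0], [1.0, 1.0]]  # responds most to "kya baat"

seq_a = [EMB[t] for t in ["wah", "kya", "baat", "hai"]]
seq_b = [EMB[t] for t in ["kya", "baat", "hai", "wah"]]
score_a = conv1d_global_max(seq_a, bigram_filter)
score_b = conv1d_global_max(seq_b, bigram_filter)
```
        </preformat>
        <p>Both sequences yield the same score of 3.0 because the cue "kya baat" is detected wherever it occurs, which illustrates why this design suits short texts that hinge on a few key phrases.</p>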
        <p>
          Vulgarity: For vulgarity detection, we combine the TF–IDF + Random Forest text model with
ResNet50 image features through decision-level fusion. This multimodal setup lets the system leverage
complementary strengths: textual features capture linguistic vulgarity, while image embeddings
contribute context when offensiveness is implied visually rather than verbally. Decision-level fusion
also adds robustness, because errors in one modality can be compensated by the other, which leads to
more stable predictions across a wide variety of meme formats. ResNet50 is also relatively
lightweight, which keeps inference fast and computational overhead low and makes the
approach more practical for real-world deployment where large volumes of memes must be processed
efficiently. This balance between accuracy, efficiency, and adaptability matters for
social media platforms, where offensive content spreads rapidly and automated systems must detect
problematic material in near real time while scaling to millions of daily uploads.
Because both modalities provide complementary cues, the framework should also generalize better to
unseen content and rely less on any single visual or linguistic pattern. The fusion
mechanism can likewise improve resilience against adversarial modifications, such as text masking or subtle
image manipulation, which are commonly used to evade detection. By integrating context from both
channels, the model can better differentiate between humor and genuinely vulgar intent, reducing
false positives that undermine user trust. Overall, the multimodal design helps the system
adapt to evolving forms of online vulgarity as meme culture
and offensive patterns shift over time.
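The decision-level fusion described above can be illustrated with a short sketch. The fusion rule is
not spelled out in this paper, so the weighted average of class probabilities, the function name
fuse_decisions, and the text_weight parameter are assumptions for illustration only.
          <preformat>
```python
def fuse_decisions(text_probs, image_probs, text_weight=0.5):
    """Weighted average of per-class probabilities from two models."""
    if set(text_probs) != set(image_probs):
        raise ValueError("both models must score the same classes")
    fused = {
        label: text_weight * text_probs[label]
        + (1.0 - text_weight) * image_probs[label]
        for label in text_probs
    }
    # Return the winning label together with the fused scores.
    return max(fused, key=fused.get), fused


# Hypothetical outputs: the OCR text looks harmless, the image does not.
text_probs = {"vulgar": 0.30, "non_vulgar": 0.70}
image_probs = {"vulgar": 0.90, "non_vulgar": 0.10}
label, fused = fuse_decisions(text_probs, image_probs)
```
          </preformat>
In this toy case the text model leans towards non_vulgar, but the confident image model shifts the
fused decision to vulgar.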
        </p>
        <p>
          Abuse: Abuse detection is more complex than sentiment or vulgarity detection because abusive language is
often indirect, context-dependent, and highly code-mixed, especially in online spaces where users
frequently switch between Hindi and English. To capture sequential dependencies, we employed a
Bidirectional LSTM, which processes text in both forward and backward directions, thereby modeling
long-range dependencies and subtle cues that a unidirectional model might overlook. This helps the
model relate abusive terms to their surrounding context, such as sarcasm
or implicit threats. We initialized the model with FastText embeddings trained on Hindi, which provide
rich semantic representations even for rare and morphologically complex words and adapt well
to spelling variations and colloquial usage. Additionally, we incorporated a binary lexicon feature
indicating the presence of offensive terms, so that explicit abuse is flagged directly rather
than relying solely on contextual embeddings. We chose this hybrid approach because BiLSTMs
effectively capture sequential patterns in short, noisy social media text, while lexicon features act as a
safety net for explicit slurs that embeddings might underrepresent. Together, these components create a
more comprehensive detection system capable of handling both overt insults and subtle, context-driven
abuse.
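The binary lexicon feature mentioned above amounts to a membership test. In the sketch below, the
lexicon contents and the function name lexicon_flag are hypothetical; how the flag is fed to the
BiLSTM (for example, concatenated with its pooled output before the final dense layer) is likewise an
assumption.
          <preformat>
```python
def lexicon_flag(tokens, lexicon):
    """Return 1 if any token appears in the offensive lexicon, else 0."""
    return int(any(token.lower() in lexicon for token in tokens))


# Hypothetical one-entry lexicon for illustration.
toy_lexicon = {"bewakoof"}
flag_explicit = lexicon_flag(["tu", "BEWAKOOF", "hai"], toy_lexicon)
flag_clean = lexicon_flag(["accha", "din"], toy_lexicon)
```
          </preformat>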
        </p>
        <p>
          Why not Transformers? Although transformer-based models such as BERT and mBERT have shown
strong results in offensive language detection, we deliberately chose not to rely on them as our primary
models in this work, for three reasons. First, the HASOC 2025 dataset for Hindi memes is
relatively small (1133 training examples), which makes fine-tuning large transformer models
prone to overfitting [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. In contrast, lighter models such as TF–IDF + Random Forest or CNNs are
more data-efficient and generalize better in low-resource settings. Second, OCR-extracted Hindi text is
noisy and often code-mixed, with Romanized tokens that pretrained multilingual transformers do not
handle well without extensive normalization. Classical models and BiLSTMs with FastText embeddings
proved more robust under these conditions, especially when augmented with curated lexicons [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
Finally, computational efficiency was an important consideration: Random Forests, CNNs, and BiLSTMs
are significantly faster to train and deploy, making them practical for iterative experimentation and
real-world applications where resources are limited. While transformers remain a promising direction,
in this task we prioritized interpretability, efficiency, and robustness under noisy, low-resource
conditions.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <sec>
        <title>4.1. Setup</title>
        <p>All models were implemented in Python (scikit-learn, TensorFlow/Keras). Training used 5-fold stratified
cross-validation (CV). Hyperparameters were tuned empirically. Experiments were run on a single TPU.</p>
      </sec>
      <sec id="sec-4-1">
        <title>4.2. Baselines</title>
        <p>• Majority class prediction.</p>
        <p>• Logistic Regression + TF–IDF (text-only).</p>
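        <p>The majority-class baseline also shows why Macro-F1 is a more informative metric than accuracy on imbalanced labels. The sketch below is a plain-Python illustration; the function names macro_f1 and majority_baseline and the 8-to-2 label split are hypothetical and do not reflect the actual class distribution of the dataset.</p>
        <preformat>
```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def majority_baseline(train_labels, n_test):
    """Predict the most frequent training label for every test sample."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * n_test


# Hypothetical imbalanced labels: 8 non-vulgar memes, 2 vulgar memes.
y_true = ["non_vulgar"] * 8 + ["vulgar"] * 2
y_pred = majority_baseline(y_true, len(y_true))
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
```
        </preformat>
        <p>Here the baseline reaches an accuracy of 0.8 but a Macro-F1 of only about 0.44, because the minority class contributes an F1 of zero to the average.</p>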
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>Our models outperform baselines across all subtasks (Table 1). Vulgarity detection particularly benefited
from multimodal fusion.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <p>Key findings:</p>
      <p>• Preprocessing improved sentiment and abuse detection by handling noisy OCR.</p>
      <p>• CNNs captured lexical sarcasm cues better than linear models.</p>
      <p>• Multimodal fusion was critical for vulgarity detection, where offensiveness was primarily visual.</p>
      <p>• Lexicon-informed features improved recall for rare abusive expressions.</p>
      <p>The offensive term replacement was implemented as a simple dictionary-based lookup: any exact match of
a term in hindi-offensive-words-original.json was replaced with a neutral
placeholder. This is intended to make the model learn the context of the meme rather than
classify on individual vulgar terms alone.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion and Future Work</title>
      <p>We presented Team FAST’s multimodal system for HASOC 2025 Hindi memes. By combining
tailored preprocessing, classical machine learning, and neural architectures, we achieved competitive
performance across all subtasks, with a best Macro-F1 score of 0.75 for vulgarity detection. Future
directions include:</p>
      <p>• End-to-end multimodal transformers (mBERT, CLIP).</p>
      <p>• Larger Hindi-Hinglish pretrained embeddings.</p>
      <p>• Fine-grained abuse target classification.</p>
      <p>All code, stopword lists, offensive word dictionaries, and models are available at: https://github.com/fatimahansari/hindi-HASOC-2025</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Grammarly for grammar and spelling
checks. After using this tool, the authors reviewed and edited the content as needed and
take full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
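          [1]
          <string-name><given-names>K.</given-names> <surname>Ghosh</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Das</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Narzary</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Saha</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Barman</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Mukherjee</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Modha</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Ganguly</surname></string-name>,
          <string-name><given-names>U.</given-names> <surname>Garain</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Jaki</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Mandl</surname></string-name>,
          <article-title>Overview of the HASOC Track at FIRE 2025: Abusive Meme Identification - Shadows Behind the Laughter</article-title>
          , in: K. Ghosh, T. Mandl, S. Pal (Eds.),
          <source>Forum for Information Retrieval Evaluation (Working Notes) (FIRE 2025)</source>
          , December 17-20, Varanasi, India, CEUR-WS.org,
          <year>2025</year>
          .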
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Ranasinghe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Senapati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. E.</given-names>
            <surname>Dmonte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Satapara</surname>
          </string-name>
          ,
          <article-title>Overview of the HASOC subtracks at FIRE 2023: Hate speech and offensive content identification in Assamese, Bengali, Bodo, Gujarati and Sinhala</article-title>
          , in:
          <source>Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>13</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Senapati</surname>
          </string-name>
          ,
          <article-title>Hate speech detection: a comparison of mono and multilingual transformer model with cross-language evaluation</article-title>
          , in: S. Dita,
          <string-name>
            <given-names>A.</given-names>
            <surname>Trillanes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. I.</given-names>
            <surname>Lucas</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation</source>
          , Association for Computational Linguistics, Manila, Philippines,
          <year>2022</year>
          , pp.
          <fpage>853</fpage>
          -
          <lpage>865</lpage>
          . URL: https://aclanthology.org/2022.paclic-1.94/.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <article-title>Data bootstrapping approaches to improve low resource abusive language detection for indic languages</article-title>
          ,
          in:
          <source>Proceedings of the 33rd ACM Conference on Hypertext and Social Media</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>32</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>