<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Exploring Classical Machine Learning and Deep Learning Approaches for Offensive Language Identification in Dravidian Code-Mixed Text</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rachana Nagaraju</string-name>
          <email>rachananagaraju20@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hosahalli Lakshmaiah Shashirekha</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Mangalore University</institution>
          ,
          <addr-line>Mangalore, Karnataka</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Forum for Information Retrieval Evaluation</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>Offensive Language Identification (OLI) has become an important task, particularly in the context of social media, where harmful content such as hate speech, cyberbullying, and toxic discourse can spread rapidly and widely. The challenge becomes even more complex in low-resource, code-mixed settings across Dravidian languages such as Tamil, Malayalam, Kannada, and Tulu. To address these challenges, the Offensive Language Identification in Dravidian Code-Mixed Languages shared task at FIRE 2025 released gold-standard annotated datasets of comments collected from online platforms in these four languages, with the goal of developing automated systems that can classify user comments as either offensive or non-offensive. We, team MUCS, participated in the shared task on OLI by developing two complementary pipelines: i) Off_ML - an ensemble of traditional Machine Learning (ML) estimators (Logistic Regression (LR), Support Vector Machines (SVM), Random Forest (RF), and Naïve Bayes (NB) classifiers) with hard and weighted voting, trained with Term Frequency-Inverse Document Frequency (TF-IDF) of word n-grams in the range (1-3) as features, and ii) Off_DL - a hybrid Deep Learning (DL) architecture that combines a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM). These pipelines highlight the comparative strengths of feature-based ML models and representation-driven DL architectures, while emphasizing the benefits of ensembling for multilingual offensive language detection. Experimental results demonstrated that the classical ML pipeline consistently outperformed the DL architecture in all four languages. For Tamil, the Off_ML model with hard voting obtained a macro-averaged F1-score of 0.452 (Rank 3), compared to the DL model's 0.350 (Rank 8). For Malayalam, the Off_ML model with weighted voting reached a macro-averaged F1-score of 0.712 (Rank 4), while the DL model achieved only 0.350 (Rank 8). For Kannada, the Off_ML model with hard voting obtained a macro-averaged F1-score of 0.421 (Rank 5), whereas the DL model managed 0.344 (Rank 8). Finally, for Tulu, the Off_ML model with weighted voting delivered its strongest performance with a macro-averaged F1-score of 0.790 (Rank 2), while the DL model recorded 0.610 (Rank 7). These results highlight that, despite the increasing prominence of neural methods, traditional ML models with careful feature engineering remain highly competitive and, in several cases, superior for OLI in low-resource, code-mixed Dravidian languages such as Tamil, Malayalam, Kannada, and Tulu.</p>
      </abstract>
      <kwd-group>
        <kwd>Offensive Language Identification</kwd>
        <kwd>Code-Mixed Text</kwd>
        <kwd>Dravidian Languages</kwd>
        <kwd>Social Media</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        OLI is an important application in Natural Language Processing (NLP), particularly for
social media platforms, which serve as primary spaces for public discourse. Online
platforms often suffer from toxic interactions, hate speech, cyberbullying, and abusive content, making
the detection of offensive language a necessary step toward ensuring safe and inclusive communication.
Unlike monolingual formal text, user-generated social media content tends to be short, noisy, and
highly informal, presenting significant challenges for automated text processing systems [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. The
complexity of this task increases in code-mixed environments where the practice of blending words
and/or phrases from multiple languages in a single utterance is widespread. Code-mixed data often
employ a combination of Roman and native scripts, phonetic transliteration, and inconsistent grammar,
all of which complicate the application of standard monolingual models [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Dravidian languages such as Tamil, Malayalam, Kannada, and Tulu are frequently mixed with English
in online discourse, producing a unique set of challenges. Earlier studies of OLI in Dravidian languages
[
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ] have highlighted the scarcity of high-quality datasets and the limited effectiveness of transfer
learning approaches in these low-resource settings. Table 1 presents a few samples of user-generated
content in Dravidian languages, highlighting the informal, noisy, and code-mixed nature of the text
that poses challenges for automated OLI systems.
      </p>
      <p>
        To address the challenges of OLI in code-mixed Dravidian languages, the Offensive Language Identification
in Dravidian Code-Mixed Languages [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] shared task at FIRE 2025 invited researchers to build automated
systems that can distinguish user comments mainly as offensive or non-offensive in four Dravidian
languages - Tamil, Malayalam, Kannada, and Tulu [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Similar shared tasks in the past, such as HASOC [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and
TRAC [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], demonstrated the value of creating benchmark datasets for abusive language
detection. The FIRE 2025 task continues this tradition by focusing on OLI in Dravidian code-mixed data, thus
providing researchers with a test bed for evaluating learning systems on low-resource code-mixed
content.
      </p>
      <p>We, team MUCS, participated in this shared task with two distinct pipelines, aiming to systematically
compare traditional feature-based ML models with modern DL architectures. The first pipeline, Off_ML,
relies on TF–IDF features to train an ensemble of ML estimators (LR, SVM, RF, and NB) with hard and
weighted voting. The second pipeline, Off_DL, explores a hybrid CNN–LSTM model to capture semantic
and contextual information more effectively. Our code is available on GitHub1. Our experimental results
reveal a clear performance gap between the two approaches. For Tamil, the ML pipeline with hard
voting achieved a macro-averaged F1-score of 0.452 (Rank 3), while the DL pipeline achieved 0.350
(Rank 8). For Malayalam, ML with weighted voting delivered a macro-averaged F1-score of 0.712
(Rank 4), compared to the DL model's 0.350 (Rank 8). For Kannada, the ML model with hard voting secured
a macro-averaged F1-score of 0.421 (Rank 5), while DL reached 0.344 (Rank 8). Finally, for Tulu,
ML with weighted voting recorded its strongest result with a macro-averaged F1-score of 0.790 (Rank 2),
whereas DL achieved its best overall performance with a macro-averaged F1-score of 0.610 (Rank 7). These
findings emphasize that, despite the increasing popularity of DL approaches, traditional ML methods
with careful feature engineering remain highly effective in low-resource and code-mixed language
scenarios. Moreover, while DL models showed some promise in Tulu, their overall performance was
inconsistent, highlighting the need for further research in adapting neural architectures for complex
multilingual and code-mixed environments.
1https://github.com/rachanabn20/Ofensive-Language-Identification-in-Dravidian-Code-Mixed-Languages---DravidianCodeMix-FIRE-2025</p>
      <p>The subsequent sections of this paper detail the related works (Section 2), methodology (Section 3),
experiments, results, and implications of our approach (Section 4), followed by conclusion and future
works (Section 5).</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The automatic detection of offensive and abusive language has been widely studied in recent years,
with shared tasks driving the development of robust systems for automatic OLI, particularly in Indian
languages and code-mixed social media texts. The TRAC shared task [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] benchmarked aggression identification in
social media. The HASOC track at FIRE 2019 [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] focused on hate speech and offensive content detection
in Indo-European languages, while HASOC at FIRE 2020 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] extended the focus to Hate Speech and
OLI in Tamil, Malayalam, Hindi, English, and German. Furthermore, HASOC-Dravidian-CodeMix
at FIRE 2021 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] targeted code-mixed Tamil and Malayalam, whereas DravidianLangTech at EACL
2021 [<xref ref-type="bibr" rid="ref11">11</xref>] addressed OLI in code-mixed Dravidian languages (Tamil-English, Malayalam-English, and
Kannada-English). These tasks have significantly contributed to the creation of annotated datasets,
evaluation metrics, and baseline models for offensive language detection in Indian and Dravidian
languages, fostering inclusive and safer digital communication environments.
      </p>
      <p>
        Early surveys by Schmidt and Wiegand [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and Fortuna and Nunes [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] provided comprehensive
overviews of hate speech detection techniques, ranging from lexicon-based approaches to more advanced
ML models. These studies highlighted the limitations of keyword-based models and emphasized the
importance of contextual features for reliable detection. In multilingual and code-mixed scenarios,
the task becomes even more challenging. Jose and Choudhury [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] surveyed sentiment analysis and
opinion mining in code-mixed text, noting difficulties such as transliteration, inconsistent grammar,
and a lack of standardized resources. Code-mixed datasets have also been explored in language pairs
such as Hindi–English and Bengali–English, further illustrating the challenges of feature extraction
and classification in noisy user-generated text [<xref ref-type="bibr" rid="ref12">12</xref>].
      </p>
      <p>
        Saumya et al. [<xref ref-type="bibr" rid="ref13">13</xref>] compared traditional and neural approaches for OLI, reinforcing the importance
of character- and word-level representations in noisy code-mixed data. Chakravarthi et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] reported
the first benchmark results for OLI in Tamil, Malayalam, and Kannada through the Offensive Language
Identification in Dravidian Languages shared task at EACL 2021. The baseline systems included classical
ML models such as LR, SVM, and RF with TF–IDF features, as well as neural models including CNN,
BiLSTM, and multilingual transformers such as mBERT and XLM-R. Interestingly, classical ML models
with TF–IDF often outperformed DL approaches in low-resource conditions, while transformer-based
models achieved competitive performance but required careful tuning and were highly sensitive to noise
in code-mixed data. The shared task's main contributions included the establishment of standardized
datasets and evaluation protocols, demonstrating the continued strength of classical ML models for
small and imbalanced datasets, and the potential of multilingual transformers. However, challenges
such as limited data availability, transliteration inconsistencies, and poor cross-lingual transferability
restricted model generalization.
      </p>
      <p>Prajnashree et al. [<xref ref-type="bibr" rid="ref14">14</xref>] proposed traditional ML models (SVM, RF, and a Passive Aggressive Classifier)
and a Siamese LSTM for OLI in low-resource Indian languages on the HASOC 2023 datasets. The results
showed that SVM achieved strong results for Sinhala (macro F1 = 0.78, Rank 11) and the Siamese LSTM for
Gujarati (macro F1 = 0.72, Rank 12). Although effective in low-resource conditions, these models lagged
behind transformer-based approaches due to limited contextual understanding. Fazlourrahma et al. [<xref ref-type="bibr" rid="ref15">15</xref>]
introduced COOLI, a system for code-mixed OLI in Tamil-English (Ta–En), Malayalam-English (Ma–En),
and Kannada-English (Kn–En). They proposed two models: (i) COOLI-Ensemble (MLP, XGBoost, and
LR in a voting setup) and (ii) COOLI-Keras (a dense neural network). COOLI-Ensemble achieved the best
performance, ranking 1st for Ma–En (F1 = 0.97), 4th for Ta–En (F1 = 0.75), and 6th for Kn–En (F1 = 0.69).
Despite these strong results, dataset imbalance and inconsistent Romanization limited cross-lingual
generalization.</p>
      <p>Overall, prior work underscores the dual challenge of OLI and code-mixing, motivating the need for
specialized approaches that balance classical feature-based ML with DL models tailored for
low-resource, code-mixed settings.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>Our approach to OLI in Dravidian code-mixed texts consists of classical ML and DL models. The
methodology includes text pre-processing, feature extraction, model training, and evaluation. We
developed two independent pipelines, namely Off_ML and Off_DL, to systematically analyze performance
trade-offs between traditional ML models and neural methods for OLI in four Dravidian languages. The
steps involved in the methodology are given below:</p>
      <sec id="sec-3-1">
        <title>3.1. Text Pre-processing</title>
        <p>The dataset comprised user-generated code-mixed texts in four Dravidian languages: Kannada,
Malayalam, Tulu, and Tamil. These texts contained multiple classes, including a few types of offensive categories
and a non-offensive category. Due to the noisy nature of user-generated content, the following
preprocessing steps are applied to clean and prepare the text for further processing (a minimal code
sketch is given after the list):
• Normalization: Repeated characters are reduced (e.g., soooo → soo) to handle exaggerated
expressions.
• Noise Removal: URLs, user mentions, emojis, hashtags, numbers, and non-alphanumeric
symbols are removed.
• Stopwords: Minimal stopword removal was performed to avoid losing functional words that
may contribute to offensive tone.
• Transliteration Variants: Transliterated forms (e.g., phonetic spellings in Roman script, such
as maga (Roman script) vs. ಮಗ (native Kannada script)) are retained as-is without explicit
normalization.</p>
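        <p>The following minimal Python sketch illustrates the cleaning steps listed above. The exact regular expressions and their order are assumptions for illustration, not our verbatim implementation; only the operations named in the list are taken from the pipeline description.</p>
        <preformat>
import re

def preprocess(text):
    # Noise removal: URLs, user mentions, hashtags, and numbers.
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    text = re.sub(r"[@#]\w+", " ", text)
    text = re.sub(r"\d+", " ", text)
    # Drop remaining non-alphanumeric symbols and emojis; \w keeps
    # native-script (e.g., Kannada) letters under Unicode matching.
    text = re.sub(r"[^\w\s]", " ", text)
    # Normalization: squeeze runs of 3 or more repeated characters
    # down to two (soooo becomes soo).
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()

print(preprocess("Soooo good!! @user check https://t.co/x #tag 123"))
# prints: "Soo good check"
        </preformat>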
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Feature Engineering</title>
        <p>Feature representations differed significantly across the two pipelines (a minimal sketch of the
Off_ML feature extraction is given after the list):
• Features for Off_ML Models: Word-level n-grams in the range (1–3) (unigrams, bigrams, and
trigrams) are extracted to capture explicit lexical signals pertaining to offensive expressions and
handle spelling variations. These features are then vectorized using the TfidfVectorizer2
with the vocabulary size restricted to 15,000 features. Even though the feature set is
high-dimensional and sparse, it is highly effective for linear classifiers. However, this work does not
include an analysis of how the TF–IDF features contribute to classification performance, nor does it
provide an inspection of the most discriminative n-grams. Further, no explicit linguistic features
are incorporated to address the challenges arising from code-mixed or morphologically rich text,
which are common in multilingual social media data.
• Features for Off_DL Models: Each word token in the dataset is mapped to a dense vector
representation using an embedding layer with randomly initialized weights.
2https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html</p>
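        <p>A minimal sketch of the Off_ML feature extraction with scikit-learn, using the n-gram range and vocabulary cap stated above; the toy corpus is a placeholder:</p>
        <preformat>
from sklearn.feature_extraction.text import TfidfVectorizer

train_texts = ["super movie", "waste fellow nee"]   # placeholder comments
test_texts = ["super waste"]

# Word n-grams in the range (1-3), vocabulary capped at 15,000 features.
vectorizer = TfidfVectorizer(ngram_range=(1, 3), max_features=15000)
X_train = vectorizer.fit_transform(train_texts)  # sparse document-term matrix
X_test = vectorizer.transform(test_texts)        # reuse the fitted vocabulary

print(X_train.shape, len(vectorizer.vocabulary_))
        </preformat>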
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Model Training</title>
        <p>The two pipelines are trained with different features; the frameworks of these pipelines are shown in
Figures 1 and 2.</p>
        <p>• Off_ML Model: The following estimators are used in the ensemble model to enhance the
performance of the classifier:
– Logistic Regression3: is a linear classifier effective in high-dimensional sparse spaces.
– Support Vector Machine4: is a margin-based classifier that maximizes class separation.
– Random Forest5: is an ensemble of decision trees that improves robustness to feature
noise.</p>
        <p>– Naïve Bayes6: is a probabilistic classifier well-suited for text data.</p>
        <p>These estimators are ensembled with hard voting (predictions from individual classifiers are
combined based on majority voting) for Kannada and Tamil, and weighted voting (classifiers with
higher validation performance are assigned larger weights) for Tulu and Malayalam; a code sketch
of the ensemble is given below.
• Off_DL Model: The hybrid CNN–LSTM model combines CNN layers for local feature extraction
with LSTM layers for sequential modeling, balancing short-span and long-range contexts:
– Convolutional Neural Network7: learns phrase-level features directly from embeddings.
– Long Short-Term Memory8: captures discourse-level and sequential patterns across
tokens.
The final representations are fed to a fully connected layer to output probabilities across the
offensive language categories.
3https://en.wikipedia.org/wiki/Logistic_regression
4https://en.wikipedia.org/wiki/Support_vector_machine
5https://en.wikipedia.org/wiki/Random_forest
6https://en.wikipedia.org/wiki/Naive_Bayes_classifier
7https://en.wikipedia.org/wiki/Convolutional_neural_network
8https://en.wikipedia.org/wiki/Long_short-term_memory</p>
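        <p>A sketch of the Off_ML ensemble with scikit-learn's VotingClassifier follows. The individual estimator settings, the toy corpus, and the voting weights are placeholders (the tuned hyperparameters are listed in Table 2); in particular, realizing weighted voting as a weighted hard vote is our illustrative assumption.</p>
        <preformat>
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC

# Toy corpus standing in for the TF-IDF features of Section 3.2.
texts = ["super movie", "waste fellow", "semma padam",
         "poda dei", "nalla irukku", "thu unga"]
labels = [0, 1, 0, 1, 0, 1]  # 0 = Not_offensive, 1 = offensive (placeholder)
X = TfidfVectorizer(ngram_range=(1, 3), max_features=15000).fit_transform(texts)

estimators = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("svm", SVC(kernel="linear")),
    ("rf", RandomForestClassifier(n_estimators=200)),
    ("nb", MultinomialNB()),
]

# Hard voting (Kannada, Tamil): majority vote over predicted labels.
hard_ensemble = VotingClassifier(estimators, voting="hard").fit(X, labels)

# Weighted voting (Tulu, Malayalam): each classifier's vote is scaled by a
# weight, e.g. its validation macro F1-score (placeholder values below).
weighted_ensemble = VotingClassifier(
    estimators, voting="hard", weights=[0.72, 0.74, 0.65, 0.60]
).fit(X, labels)

print(hard_ensemble.predict(X))
        </preformat>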
        <p>Table 2 summarizes the hyperparameters used for both pipelines. While the classical ML models are
tuned via grid search9, the DL models use validation-based tuning. A sketch of the Off_DL architecture
is given below.</p>
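        <p>The Off_DL architecture can be sketched in Keras as follows. The layer sequence (randomly initialized embeddings, CNN for local features, LSTM for sequential context, dense softmax output) follows the description above, but the layer sizes, kernel width, dropout rate, vocabulary size, sequence length, and class count are illustrative placeholders; the values we actually used are in Table 2.</p>
        <preformat>
from tensorflow.keras import layers, models

VOCAB_SIZE = 20000   # placeholder vocabulary size
MAX_LEN = 100        # placeholder padded sequence length
NUM_CLASSES = 6      # placeholder: offensive categories plus Not_offensive

model = models.Sequential([
    # Dense word vectors with randomly initialized weights (Section 3.2).
    layers.Embedding(VOCAB_SIZE, 128),
    # CNN: phrase-level (local) feature extraction over the embeddings.
    layers.Conv1D(128, kernel_size=3, activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    # LSTM: sequential, longer-range context over the convolved features.
    layers.LSTM(64),
    layers.Dropout(0.5),
    # Fully connected layer producing class probabilities.
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.build(input_shape=(None, MAX_LEN))
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
        </preformat>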
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments and Results</title>
      <p>
        The proposed pipelines are built and evaluated using the Kannada, Malayalam,
and Tamil datasets provided by the organizers of the shared task [
        <xref ref-type="bibr" rid="ref4 ref5">5, 4</xref>
        ]. Recent efforts have introduced
an OLI dataset for another Dravidian language, Tulu [<xref ref-type="bibr" rid="ref16">16</xref>], highlighting the growing importance of OLI
in low-resource contexts. Each language dataset has Train, Validation, and blind Test sets, and the
distribution of these sets is shown in Table 3. Even though the datasets are highly imbalanced, we did
not make any effort to balance them.
      </p>
      <sec id="sec-4-1">
        <title>4.1. Results</title>
        <p>The performance of the models is reported in terms of macro-averaged Precision (mPrecision),
macro-averaged Recall (mRecall), and macro-averaged F1-score (mF1-score) across all the classes. This
captures performance across all the classes rather than letting the scores be dominated by the majority
class (a code sketch of this evaluation is given below). The participating teams are ranked based on the
performances of the models in terms of mF1-score. Tables 4 and 5 present the results of our traditional
ML ensemble approaches on the Validation and Test sets respectively, while Tables 6 and 7 report the
performance of the DL pipelines on the Validation and Test sets respectively.</p>
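        <p>For reference, a minimal sketch of this evaluation with scikit-learn; the label arrays are placeholders:</p>
        <preformat>
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [0, 1, 2, 2, 0, 1]   # placeholder gold labels
y_pred = [0, 1, 2, 0, 0, 2]   # placeholder system predictions

# Macro-averaging weights every class equally, so the small offensive
# categories count as much as the Not_offensive majority class.
m_precision = precision_score(y_true, y_pred, average="macro", zero_division=0)
m_recall = recall_score(y_true, y_pred, average="macro", zero_division=0)
m_f1 = f1_score(y_true, y_pred, average="macro")
print(f"mPrecision={m_precision:.3f} mRecall={m_recall:.3f} mF1={m_f1:.3f}")
        </preformat>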
        <p>From the results, it is clear that the Off_ML models consistently outperformed the Off_DL models
across all languages on the Test sets. For Kannada, the Off_ML model achieved an mF1-score of 0.42 with
5th rank, compared to the Off_DL model's mF1-score of 0.34 with 8th rank. For Malayalam, the Off_ML model
obtained an mF1-score of 0.71 (Rank 4), while the Off_DL model reached only 0.35 (Rank 8). Tulu
showed the strongest Off_ML performance with an mF1-score of 0.79 (Rank 2), substantially better
than the Off_DL model's 0.61 (Rank 8). Tamil also followed this trend, where the Off_ML model
scored an mF1-score of 0.45 (Rank 3) but the Off_DL model lagged at 0.35 (Rank 8). Though
the Off_DL models captured contextual patterns, the TF–IDF based Off_ML models proved significantly more
competitive in the shared task.</p>
        <p>Figures 3a, 3b, 3c, and 3d illustrate the leaderboard rankings based on mF1-score for Tulu, Tamil,
Malayalam, and Kannada respectively, highlighting the relative performance of our systems against
other participating teams.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Confusion Matrices</title>
        <p>The confusion matrices provide further insight into classification performance by highlighting the
misclassifications. Figures 4a, 4b, 4c, and 4d illustrate the performances of the Off_ML models on the Kannada,
Malayalam, Tamil, and Tulu datasets, respectively. Confusion matrices for Kannada, Tulu, Malayalam,
and Tamil, obtained using the Off_DL models, are provided in Figures 5a, 5b, 5c, and 5d, respectively, to show
the contrast with the ML pipelines. The low mF1-scores of the Off_DL models for Kannada and Tamil indicate
that minority offensive categories are under-predicted. This mirrors the behavior observed in the Off_ML
models, though the Off_DL pipelines are better at balancing recall across the major and mid-frequency
classes.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Error Analysis</title>
        <p>While the overall results demonstrate that both Off_ML and Off_DL models are capable of handling OLI
in multiple languages, a closer look at the performance across languages reveals important insights:
• Class imbalance has a significant impact on the performances of the classifiers. Minority categories
such as Offensive_Targeted_Insult_Other or Offensive_Untargeted consistently showed very low
recall, often close to zero in Kannada and Tamil. This suggests that both ML and DL systems leaned
heavily toward predicting the more frequent Not_offensive class, leading to missed detections of
subtler offensive expressions.
• Linguistically, both Offensive_Targeted_Group and Offensive_Targeted_Individual categories rely
on similar cues, such as insults or derogatory expressions, making it difficult for the models
to distinguish whether the offense was aimed at a group or a single person.
• Language-specific differences are striking. Malayalam and Tulu achieved higher weighted
F1-scores, indicating relatively robust detection. These datasets appear cleaner, with fewer ambiguous
cases, which may explain the performance boost. In contrast, Kannada and Tamil suffered from
noisier data and heavier code-mixing, which made offensive intent harder to capture.
• Code-mixing itself remains one of the most challenging factors. Many social media posts combine
English with native Dravidian scripts, producing hybrid structures that traditional ML and DL
models struggled with. For example, a sentence might switch languages mid-phrase, obscuring
both semantic and syntactic cues that are crucial for offensive language detection.</p>
        <p>These observations highlight that while the models perform reasonably well for some languages,
particularly Malayalam and Tulu, further work is needed to improve robustness in noisier and more
code-mixed contexts. Addressing class imbalance and exploring advanced contextual embeddings or
data augmentation techniques could help bridge this gap in the future.</p>
        <p>Table 8 presents representative misclassified samples across languages. Several factors contribute to
the misclassification of Test samples, as given below:
• Code-switching and Mixed Scripts: The classifier struggles when users mix English with
regional languages or emojis. For example: “Enjoyed too much. Superb” was labeled as
Not_Language but predicted as Not_offensive.
• Sarcasm and Context Dependence: Subtle insults or sarcastic remarks are difficult for the
model to interpret. Example: “so poor dialogue delivery From Mohanal...” was a targeted insult but
predicted as untargeted.
• Ambiguity in Offense Type: The model may detect offensive tone but fail to decide whether it
is targeted or untargeted. Example: “Rashmika ide iro avtava” (targeted insult at an individual)
was predicted as untargeted.
• Dialect and Spelling Variations: Regional slang and non-standard spellings confuse the
classifier. Example: “Bvc yrt comment manpva ha yavu” (untargeted insult) was predicted as a targeted
insult.
• Named Entities and Cultural References: The system fails to recognize organizations or
cultural groups as insult targets. Example: “RIP kannada film chamber. No proper plannings” was
an insult towards a group but predicted as non-language.</p>
        <p>• Polite vs. Non-language Confusion: Positive or polite expressions are sometimes misclassified
as Not_Language. Example: “Sir dasara habbada shubashayagalu” was actually non-offensive
but predicted as Not_Language.
• Entity Type Confusion (Individual vs. Group vs. Other): The classifier struggles to
distinguish whether an insult is directed at an individual, a group, or an abstract concept. Example:
“Padam pottum... Charithram akanam kett marakkar varanam” was a group insult but predicted
as non-offensive.</p>
        <p>The above misclassification samples illustrate the challenges posed by noisy code-mixed content:
inconsistent language mixing, informal spellings, and limited annotated data.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Future Work</title>
      <p>In this work, we, team MUCS, explored both an ensemble of traditional ML models (Off_ML) and a DL
pipeline (Off_DL) for OLI in four Dravidian languages: Kannada, Malayalam, Tulu, and Tamil. The
overall results showed that the Off_ML models are more effective, achieving strong leaderboard rankings:
2nd place for Tulu, 3rd place for Tamil, 4th place for Malayalam, and 5th place for Kannada. In contrast,
the Off_DL models struggled, consistently ranking 8th across all languages despite reasonable validation
performance. Our analysis highlighted the key challenges: i) minority offensive categories such as
Offensive_Targeted_Insult_Other and Offensive_Untargeted are severely under-predicted, leading to
low mF1-scores (0.32–0.35 for Kannada and Tamil using Off_DL models), ii) frequent confusions also
occurred between related classes such as Offensive_Targeted_Group and Offensive_Targeted_Individual,
and iii) Malayalam and Tulu benefited from relatively cleaner datasets, while Kannada and Tamil were
impacted by noisy, code-mixed inputs that make offensive intent harder to capture. Looking ahead,
there are several promising directions. Leveraging multilingual transformers (e.g., XLM-R, mBERT) or
instruction-tuned Large Language Models (LLMs) may enhance the ability of the models to capture
subtle cues of offensiveness. Text augmentation, re-sampling, and synthetic text generation could
help alleviate class imbalance and improve recall for minority classes. Code-mixing aware models and
transliteration-based pre-processing are also worth exploring to handle hybrid language scenarios more
effectively. Finally, extending these experiments to additional Dravidian and low-resource languages will
provide broader insights into the generalizability of offensive language detection systems. In summary,
while we achieved competitive leaderboard performance through ML ensembles, further refinements are
required for DL pipelines. Addressing class imbalance, code-mixing, and fine-grained class distinctions
remains crucial for developing robust multilingual offensive language detection systems.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>Generative Artificial Intelligence (AI) tools were used in the preparation of this work for language
refinement, grammar improvement, and formatting suggestions. The research content, analysis, results,
and conclusions were developed independently, and AI tools were not used to generate original research
findings.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegand</surname>
          </string-name>
          ,
          <article-title>A Survey on Hate Speech Detection using Natural Language Processing</article-title>
          ,
          <source>Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Fortuna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nunes</surname>
          </string-name>
          ,
          <article-title>A Survey on Automatic Detection of Hate Speech in Text</article-title>
          ,
          <source>ACM Computing Surveys (CSUR) 51</source>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>30</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Jose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Choudhury</surname>
          </string-name>
          ,
          <article-title>A Survey on Sentiment Analysis and Opinion Mining in Code-Mixed Text</article-title>
          ,
          <source>in: Proceedings of the 28th International Conference on Computational Linguistics</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          , et al.,
          <source>DravidianCodeMix: Sentiment Analysis and Offensive Language Identification Dataset for Dravidian Languages, in: Proceedings of the 12th Language Resources and Evaluation Conference</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          , et al.,
          <source>Findings of the Shared Task on Offensive Language Identification in Dravidian Languages, in: Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N.</given-names>
            <surname>Sripriya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Durairaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bharathi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. C.</given-names>
            <surname>Navaneethakrishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Kumaresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>D</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. R.</given-names>
            <surname>Hegde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Vikram</surname>
          </string-name>
          ,
          <article-title>Overview of the Shared Task on Offensive Language Identification in Dravidian Code-Mixed Languages</article-title>
          ,
          <source>in: Forum for Information Retrieval Evaluation (FIRE 2025)</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          , et al.,
          <source>Overview of the HASOC Track at FIRE</source>
          <year>2019</year>
          :
          <article-title>Hate Speech and Offensive Content Identification in Indo-European Languages</article-title>
          ,
          <source>in: Proceedings of the 11th Forum for Information Retrieval Evaluation</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Ojha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Malmasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          , Benchmarking Aggression Identification in Social Media,
          <source>in: Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-1)</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. Kumar</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <source>Overview of the HASOC Track at FIRE</source>
          <year>2020</year>
          :
          <article-title>Hate Speech and Offensive Language Identification in Tamil, Malayalam, Hindi, English and German</article-title>
          ,
          <source>in: Proceedings of the 12th Annual Meeting of the Forum for Information Retrieval Evaluation, FIRE '20</source>
          , Association for Computing Machinery, New York, NY, USA,
          <year>2021</year>
          , pp.
          <fpage>29</fpage>
          -
          <lpage>32</lpage>
          . URL: https://doi.org/10.1145/3441501.3441517. doi:10.1145/3441501.3441517.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Kumaresan</surname>
          </string-name>
          , Premjith,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sakuntharaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thavareesan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Navaneethakrishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Madasamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <article-title>Findings of Shared Task on Offensive Language Identification in Tamil and Malayalam</article-title>
          ,
          <source>in: Proceedings of the 13th Annual Meeting of the Forum for Information Retrieval Evaluation, FIRE '21</source>
          , Association for Computing Machinery, New York, NY, USA,
          <year>2022</year>
          , pp.
          <fpage>16</fpage>
          -
          <lpage>18</lpage>
          . URL: https://doi.org/10.1145/3503162.3503179. doi:10.1145/3503162.3503179.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] B. R. Chakravarthi, R. Priyadharshini, N. Jose, A. Kumar M, T. Mandl, P. K. Kumaresan, R. Ponnusamy, H. R L, J. P. McCrae, E. Sherly, Findings of the Shared Task on Offensive Language Identification in Tamil, Malayalam, and Kannada, in: B. R. Chakravarthi, R. Priyadharshini, A. Kumar M, P. Krishnamurthy, E. Sherly (Eds.), Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages, Association for Computational Linguistics, Kyiv, 2021, pp. 133-145. URL: https://aclanthology.org/2021.dravidianlangtech-1.17/.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] P. Patwa, G. Aguilar, S. Kar, T. Solorio, SemEval-2020 Task 9: Overview of Sentiment Analysis of Code-Mixed Tweets, in: Proceedings of the 14th International Workshop on Semantic Evaluation, 2020.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] S. K. Saumya, D. Khurana, B. R. Chakravarthi, Offensive Language Identification in Dravidian Code-Mixed Text, in: Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages, 2021.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] P. M, R. K, A. Hegde, K. G, S. Coelho, H. L. Shashirekha, Taming Toxicity: Learning Models for Hate Speech and Offensive Language Detection in Social Media Text, in: Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, 2023. URL: https://ceur-ws.org/Vol-3681/T6-22.pdf.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] F. Balouchzahi, A. B. K, H. L. Shashirekha, MUCS@DravidianLangTech-EACL2021: COOLI - Code-Mixing Offensive Language Identification, in: Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages, Association for Computational Linguistics, 2021, pp. 323-329.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] A. M. D, D. Vikram, B. R. Chakravarthi, P. R. Hegde, Overcoming Low-Resource Barriers in Tulu: Neural Models and Corpus Creation for Offensive Language Identification, 2025. URL: https://arxiv.org/abs/2508.11166. arXiv:2508.11166.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>