<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of the Shared Task on Offensive Language Identification in Dravidian Code-Mixed Languages</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sripriya N</string-name>
          <email>sripriyan@ssn.edu.in</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bharathi Raja Chakravarthi</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thenmozhi Durairaj</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bharathi B</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Prasanna Kumar Kumaresan</string-name>
          <email>P.Kumaresan1@universityofgalway.ie</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Subalalitha C N</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anusha M D</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Parameshwar R Hegde</string-name>
          <email>parameshwarhegde@yenepoya.edu.in</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Deepthi Vikram</string-name>
          <email>deepthisbangera@yenepoya.edu.in</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>SRM Institute of Science And Technology</institution>
          ,
          <addr-line>Tamil Nadu</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Sri Sivasubramaniya Nadar College of Engineering</institution>
          ,
          <addr-line>Tamil Nadu</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Galway</institution>
          ,
          <addr-line>Galway</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Yenepoya Institute of Arts Science Commerce and Management</institution>
          ,
          <addr-line>Mangalore</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>Offensive language detection is a critical task in natural language processing, particularly in the context of online discourse, where harmful content can spread rapidly. Identifying offensive language is challenging due to the varied ways in which offense is conveyed, including subtle linguistic cues, code-mixing, and cultural context. Code-mixing is a prevalent phenomenon in multilingual communities, and code-mixed texts are sometimes written in non-native scripts. Systems trained on monolingual data fail on code-mixed data due to the complexity of code-switching at different linguistic levels in the text. This shared task presents a gold-standard dataset for offensive language detection in Tamil, Malayalam, Kannada, and Tulu, enabling researchers to develop robust classification models. Thirteen teams actively participated and developed systems to identify offensive content in this shared task. This work summarizes the various techniques used by the competing teams. Further, the performance of all systems was analyzed using the macro F1-score, and their rankings are reported.</p>
      </abstract>
      <kwd-group>
        <kwd>Offensive Language Identification</kwd>
        <kwd>Corpus Creation</kwd>
        <kwd>Classification</kwd>
        <kwd>Code-Mixing</kwd>
        <kwd>Dravidian Languages</kwd>
        <kwd>shared task</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The detection of offensive language has become a pivotal task within the field of Natural Language
Processing (NLP), especially in the contemporary digital landscape characterized by the widespread
dissemination of user-generated content across social media platforms [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Online environments such
as YouTube and Instagram facilitate millions of daily interactions encompassing topics related to
entertainment, politics, social issues, and personal viewpoints. While these platforms promote open
communication, they simultaneously function as environments conducive to the propagation of offensive
material, including abusive language, hate speech, cyberbullying, misogyny, targeted harassment and
sarcasm [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ][
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The ramifications of such conduct are extensive, leading to psychological distress,
perpetuating social inequalities, particularly affecting vulnerable populations such as women, and
compromising the safety and well-being of online communities.
      </p>
      <p>Growing concerns regarding online abuse have stimulated increased scholarly attention toward the
development of automated systems designed to detect and mitigate harmful content. Nonetheless, the
task of offensive language detection remains complex due to the intricate linguistic manifestations of
offensive expressions. Such content may be overt or covert, frequently articulated through sarcasm,
metaphorical language, cultural allusions, or contextual nuances that pose significant interpretative
challenges for computational models. These difficulties are further exacerbated in multilingual contexts,
where individuals often engage in code-mixing, i.e., alternating between two or more languages within
a single utterance or discourse.</p>
      <p>Code-mixing is commonly found in South Indian linguistic groups, where languages such as Tamil,
Malayalam, Kannada, and Tulu frequently blend with English during casual conversations. The situation
becomes more complicated when these languages are represented in non-native scripts or Romanized
formats, leading to significant disparities in spelling, morphology, and grammatical structures.
Conventional NLP systems, which are usually trained on monolingual and well-formed text, find it challenging
to cope with such variability, resulting in subpar performance on actual code-mixed data. Furthermore,
languages like Kannada and Tulu are under-resourced, and the lack of annotated datasets hinders the
creation of effective offensive language identification models.</p>
      <p>To fill these gaps, the offensive language detection shared task at FIRE 2025 aims to detect
offensive content in code-mixed Dravidian languages, particularly Tamil-English, Malayalam-English,
Kannada-English, and Tulu-English. The shared task provides a benchmark dataset compiled
from YouTube comments across entertainment, news, and sociopolitical domains. Participants are
asked to categorize comments into classes such as not offensive, offensive untargeted, and offensive
targeted (at an individual or group).</p>
      <p>The challenge is expected to support applications such as automated content moderation on social
media platforms, help law enforcement identify harmful behavior, and help organizations monitor
public opinion while ensuring respectful communication. Beyond direct applications, the overall goal is
to advance multilingual and code-mixed NLP research by encouraging the development of models that
can understand different linguistic patterns, script variations, and culturally specific expressions.</p>
      <p>This highlights the need for comprehensive datasets and methodologies that efectively generalize
across languages, scripts, and communities and help create safer and more equitable online
environments.</p>
      <p>The rest of this document is organized as follows. Section 2 reviews related studies. Sections 3
and 4 describe the task and the dataset. Section 5 presents the methodologies adopted by the
participating teams along with the results, and Section 6 concludes the article.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Offensive speech is communication that offends or hurts the feelings of others. On social media sites,
such content incites hatred, encourages violence, and targets specific people or
groups based on their identity. Hence, it is essential to filter and censor such content in order
to promote healthy online communication and safeguard vulnerable populations [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The
HASOC track at FIRE 2019 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] focused on hate speech and offensive content detection in Indo-European
languages, while HASOC at FIRE 2020 [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] extended the focus to hate speech and offensive language identification (OLI) in Tamil, Malayalam,
Hindi, English, and German. Furthermore, HASOC Dravidian-CodeMix at FIRE 2021 [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
targeted code-mixed Tamil and Malayalam, whereas DravidianLangTech at EACL 2021 [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] addressed OLI in
code-mixed Dravidian languages (Tamil-English, Malayalam-English, and Kannada-English). These tasks have
significantly contributed to the creation of annotated datasets, evaluation metrics, and baseline models
for offensive language identification in Indian and Dravidian languages, fostering inclusive and safer
digital communication environments.
      </p>
      <p>
        A growing number of researchers are using transformer-based models, such as BERT [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], IndicBERT
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] to enhance semantic comprehension. These models are particularly well-suited to handling the
complexities of Indian languages since they were pretrained on a variety of multilingual datasets.
Despite these advancements, Tulu and Kannada remain relatively understudied in comparison to Tamil
and Malayalam [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Task Description and Datasets</title>
      <p>This shared task presents a corpus for offensive language identification of code-mixed text in
Dravidian languages (Tamil-English, Malayalam-English, Kannada-English, and Tulu-English). The task is
further complicated in low-resource languages, where limited annotated datasets exist for offensive
speech detection. It presents a gold-standard dataset for offensive language detection in Tamil,
Malayalam, Kannada, and Tulu, enabling researchers to develop robust classification models.</p>
      <p>The primary goal of this shared task is to build and evaluate systems that can automatically classify
social media text into four predefined categories. Participants were provided with training, development,
and test datasets to develop their models. Given the real-world class imbalance in offensive content,
models must be designed to handle the skewed distribution of data effectively. To the best of our knowledge,
this is the first shared task on offensive language detection in Tulu. By organizing this task, we aim to
foster research in under-resourced languages, improve computational approaches for offense detection
in multilingual and code-mixed settings, and contribute to the responsible use of AI in moderating
harmful content online.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Dataset</title>
      <p>The dataset consists of social media comments and posts that are categorized into four classes:
• Not Offensive (NO): Content without any offensive elements.
• Offensive Untargeted (OU): Offensive content that is not directed at a specific individual or entity.
• Offensive Targeted (OT): Direct attacks on an individual or group, including hate speech targeting
a community, ethnicity, caste, or gender.
• Not Tamil/Not Malayalam/Not Kannada/Not Tulu (NT): Content that is not in the
corresponding language. Table 1 shows the distribution of the training set, validation set, and
test set for this task for all four languages.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Methodology and Results</title>
      <p>A total of 13 teams participated in the Offensive Language Identification shared task at FIRE 2025,
where the participants submitted different runs/models whose performance was measured
using the macro F1-score. The teams were ranked by the macro F1-score, which averages per-class F1
and thereby balances precision and recall across all classes. This metric is well suited to evaluating
offensive language identification models, because it weighs false positives and false negatives evenly
and is not dominated by the majority class. The models created by all the teams show promising
capabilities in spite of the challenges posed by linguistic diversity, code-mixing, and social disparities.</p>
      <p>The team "CoreFour" followed a comprehensive methodology that included data preprocessing,
traditional machine learning, and transformer-based deep learning techniques. As a first step, various
NLP-based preprocessing such as lowercasing, punctuation removal, and normalization of repeated
characters were carried out. To handle multilingual inconsistencies and reduce noise, a custom
self-translation mapping was implemented to correct transliterated or code-mixed words into a more
standardized form. For the classification models, traditional algorithms like Support Vector Machines
(SVM) and Random Forest using TF-IDF features extracted at word, character, and subword levels
were trained. Fine-tuning of the state-of-the-art multilingual transformer models such as mBERT,
Distil-mBERT, IndicBERT, and XLM-RoBERTa to capture rich contextual semantics was performed.
Finally, the outputs of these models were combined using ensemble strategies like soft voting and
model blending to improve robustness and accuracy. This multi-faceted approach effectively addressed
the challenges of linguistic diversity and label imbalance, resulting in more reliable and generalizable
performance. This system has secured a top score of 0.778 for Malayalam-English and top 3 positions in
the remaining three code-mixed languages.</p>
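A minimal sketch, not CoreFour's actual code, of combining word-level and character-level TF-IDF features with a linear SVM in scikit-learn; the training texts and labels below are invented toy examples:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC

# Word-level and character-level TF-IDF features, concatenated side by side.
features = FeatureUnion([
    ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
    ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
])
clf = Pipeline([("tfidf", features), ("svm", LinearSVC(class_weight="balanced"))])

# Toy, invented code-mixed examples; real training data comes from the shared task.
train_texts = ["super movie", "worst fellow poda", "nalla padam", "stupid idiot"]
train_labels = ["NO", "OT", "NO", "OT"]
clf.fit(train_texts, train_labels)
```

Character n-grams are particularly useful for Romanized Dravidian text, where spelling varies widely and word-level vocabularies fragment.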
      <p>The "NLPFusion" [14] team used different transformer models and also employed balanced class weights.
The proposed methodology makes use of Transfer Learning (TL) based Multilingual Bidirectional
Encoder Representations from Transformers (mBERT) model and XLM-RoBERTa. These models achieved
macro F1 scores of 0.465, 0.475, and 0.820, securing first rank in Tamil, Kannada, and Tulu, respectively, and
second rank in Malayalam with a score of 0.774.</p>
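The "balanced" class weights mentioned above are commonly computed as N/(K·n_c); the following small implementation assumes that standard heuristic rather than the team's exact code:

```python
from collections import Counter

def balanced_class_weights(labels):
    """'Balanced' heuristic: weight_c = N / (K * n_c), where N is the number
    of samples, K the number of classes, and n_c the count of class c, so
    rare offensive classes contribute more to the training loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

weights = balanced_class_weights(["NO"] * 8 + ["OT"] * 2)  # {"NO": 0.625, "OT": 2.5}
```

The resulting weights are typically passed to the loss function during fine-tuning so that errors on underrepresented offensive classes are penalized more heavily.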
      <p>The team "MUCS_Off" [15] preprocessed the datasets of Kannada, Malayalam, Tulu, and Tamil as
the first step in a structured methodology for identifying offensive language in multiple Dravidian
languages. Text preprocessing included transforming text to lowercase and eliminating URLs, mentions,
and other superfluous characters while keeping those unique to Indian scripts. To make the
preprocessed text appropriate for neural network models, it was subsequently transformed into
numerical sequences using a tokenizer and padded to guarantee consistent length. Three categories of
deep learning models were constructed: LSTM, CNN, and a hybrid model that combines the two.
CNNs extract local features, LSTMs capture sequential dependencies, and the hybrid model combines
the advantages of both. To avoid overfitting and enhance generalization, these models were trained with
early stopping and learning rate reduction. These systems exhibited average performance compared to
other teams in the shared task.</p>
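The padding step described above can be sketched as follows; this is a simplified stand-in for a framework tokenizer's padding utility, not the team's code:

```python
def pad_sequences(seqs, maxlen, pad_value=0):
    """Truncate or right-pad token-id sequences to a fixed length so they
    can be batched for CNN/LSTM models."""
    return [s[:maxlen] + [pad_value] * max(0, maxlen - len(s)) for s in seqs]

padded = pad_sequences([[3, 7, 1, 9], [4]], maxlen=3)  # [[3, 7, 1], [4, 0, 0]]
```

Uniform lengths let variable-length comments be stacked into the fixed-shape tensors that convolutional and recurrent layers expect.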
      <p>The IREL@IIT-BHU [16] team used a fine-tuned pre-trained XLM-RoBERTa model with and without
early stopping. Across the four languages, these systems ranked among the leading submissions, achieving
4th place in Kannada and Tamil, 5th in Malayalam, and 7th in Tulu, thereby establishing multilingual
transformers as a strong baseline for Dravidian code-mixed offensive language identification.</p>
      <p>The "DUCS" team employed a comprehensive, two-pronged approach, systematically comparing
classical machine learning techniques with a state-of-the-art transformer-based model. The entire
workflow was encapsulated in a robust, end-to-end OffensiveLanguagePipeline for reproducibility. Initially,
a strong baseline using several traditional models was established, including Logistic Regression, SVM,
Random Forest, and Multinomial Naive Bayes. The text data underwent a careful preprocessing pipeline
tailored for code-mixed Tamil-English content, which involved normalizing whitespace, replacing
URLs, mentions, and hashtags with special tokens, and a unique step of converting emojis into textual
representations (e.g., EMOJI_smile) to retain their semantic value. Features for these models were
generated using TF-IDF vectorization with both unigrams and bigrams to capture lexical patterns. The
core of their system is a fine-tuned transformer model, specifically google/muril-base-cased, chosen
for its strong performance on Indian languages and code-mixed text. Recognizing the severe class
imbalance in the dataset, the team implemented two distinct strategies: for the baseline models, SMOTE
(Synthetic Minority Over-sampling Technique) was used to balance the training data at the data level,
while for the transformer model a custom WeightedTrainer class was implemented that
applies class weights directly to the cross-entropy loss function, compelling the model to pay more
attention to underrepresented offensive categories during training. The system was designed to evaluate
both approaches and select the best-performing model for final predictions, and it achieved a macro
F1-score of 0.416.</p>
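A placeholder-token preprocessing step like the one described above might look like the following sketch; the EMOJI_MAP entries and the URL/MENTION/HASHTAG token names (everything beyond EMOJI_smile, which the team mentions) are illustrative assumptions:

```python
import re

# Hypothetical mini emoji lexicon; a real system would use a full mapping.
EMOJI_MAP = {"😊": "EMOJI_smile", "😡": "EMOJI_angry"}

def preprocess(text):
    """Replace URLs, mentions, and hashtags with placeholder tokens, map
    emojis to textual tokens, and normalize whitespace."""
    text = re.sub(r"https?://\S+", " URL ", text)
    text = re.sub(r"@\w+", " MENTION ", text)
    text = re.sub(r"#\w+", " HASHTAG ", text)
    for emoji, token in EMOJI_MAP.items():
        text = text.replace(emoji, f" {token} ")
    return re.sub(r"\s+", " ", text).strip()
```

Converting emojis to textual tokens, rather than deleting them, keeps sentiment cues that are often decisive in short social media comments.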
      <p>In order to address offensive language detection in Tulu, Kannada, Tamil, and Malayalam, the team
"YenLP_CS" [17] used a multi-stage ensemble-based methodology that combined transformer-based
deep learning techniques with conventional machine learning techniques. To establish a solid statistical
baseline, TF-IDF features were extracted from the text and used to train a group of classifiers,
including Logistic Regression, XGBoost, MLP, SVC, and KNN, all of which were combined via a voting
mechanism. They have integrated two transformer-based models to improve contextual understanding:
(1) IndicBERT, where CLS-token embeddings were extracted and fed into a custom MLP classifier, and
(2) XLM-Roberta, where an XGBoost classifier was trained using CLS-token embeddings. Finally, a hard
majority voting strategy was used to fuse the predictions from the TF-IDF ensemble, IndicBERT+MLP,
and XLM-Roberta+XGBoost. The proposed method performed well in identifying offensive language content
in code-mixed Malayalam-English text, securing a macro F1-score of 0.75.</p>
      <p>The "Dravidian_decoders" [18] team adopted an ensemble-based machine learning approach
to perform offensive language classification across two South Indian languages: Kannada and Tulu. The
methodology began with thorough data preprocessing, which included text normalization and label
encoding to make the datasets suitable for machine learning. For feature extraction, TF-IDF vectorization
with both unigrams and bigrams was used to capture important contextual patterns in the text. Three
lightweight yet effective classifiers (Linear Support Vector Machine (SVM), Logistic Regression, and
Multinomial Naive Bayes) were trained separately on each language-specific dataset. These models
were then combined using a majority voting ensemble technique to aggregate predictions and improve
classification robustness. The ensemble model capitalizes on the strengths of each individual classifier,
thereby improving generalization and handling of class imbalances. The models were evaluated using
macro F1-score, yielding average performance and securing 6th and 8th rank in Kannada and
Tulu, respectively.</p>
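A hard majority vote over per-classifier predictions, as used by this team, can be sketched as follows; the tie-breaking policy is an assumption, since the team's actual rule is not specified:

```python
from collections import Counter

def majority_vote(predictions):
    """Fuse per-classifier label lists by hard majority vote.

    `predictions` is a list of lists, one per classifier; ties fall back to
    the first classifier's vote among the tied labels (an assumed policy).
    """
    fused = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        top = max(counts.values())
        fused.append(next(v for v in votes if counts[v] == top))
    return fused

svm_preds = ["NO", "OT", "NO"]
lr_preds = ["NO", "NO", "OT"]
nb_preds = ["OT", "OT", "OT"]
fused = majority_vote([svm_preds, lr_preds, nb_preds])  # ["NO", "OT", "OT"]
```

Voting smooths out the idiosyncratic errors of any single classifier, which is why it tends to improve robustness on noisy code-mixed input.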
      <p>The "Coreminds" [19] team developed a multilingual offensive language classification system for
four low-resource Indian languages. The methodology involved collecting, cleaning, and
preprocessing labeled datasets for each language, ensuring consistent formatting and removing noise. Labels
were encoded using LabelEncoder to convert them into numeric format. Two powerful
transformer-based language models were fine-tuned for training: IndicBERTv2-MLM-only, which is trained
on 12 Indian languages, and TwHIN-BERT-base, a BERT-based model pre-trained on Twitter data, which
is particularly effective for handling social media text. These models were chosen based on the nature
of the data and the language coverage. This approach ensures adaptability to noisy, real-world text in
underrepresented Indian languages. The proposed model performed well for code-mixed Malayalam-English,
securing 3rd rank, and showed average performance for the rest of the languages.</p>
      <p>The "DravidianDefenders" team adopted a traditional machine learning-based approach tailored
individually for each language. The methodology involved extensive preprocessing to clean and
normalize the code-mixed social media comments, including removal of URLs, special characters,
emojis, and non-linguistic artifacts. For text representation, TF-IDF vectorization with unigrams and
bigrams was used to effectively capture the linguistic patterns in code-mixed contexts. Different
classifiers were used for each language: Logistic Regression for Tamil, Random Forest for Malayalam
and Tulu, and Support Vector Machine (SVM) for Kannada. For Kannada, the model was further
optimized using character n-gram TF-IDF features and class-weighted SVM with grid search. The
models exhibited only average performance, obtaining ranks beyond sixth place across all four languages.</p>
      <p>The methodology used by the "langTeam" [20] for this multi-class text classification task involved
several key steps. Initially, the dataset was loaded and subjected to a comprehensive preprocessing
pipeline. This included basic text cleaning, tokenization, stop word removal, stemming, and
lemmatization. Subsequently, the preprocessed text data was transformed into numerical features using both
TF-IDF and Count Vectorization techniques. These vectorizations were applied to various levels of
processed text. Multiple classification models were then employed, including Logistic Regression, Naive
Bayes, Support Vector Classifier, Decision Tree, Random Forest, and Gradient Boosting. A BiLSTM
model was also implemented, utilizing Word2Vec embeddings. Each model was trained and evaluated on
the different vectorized representations of the data. The predictions of the best-performing model
were selected based on its accuracy. The systems achieved F1 scores of 0.267, 0.511, and 0.77 for Tamil,
Malayalam, and Tulu, respectively.</p>
      <p>The "Malayalam_lan_tech" team employed a supervised learning pipeline using transformer-based
sentence embeddings combined with classical machine learning classifiers for performing offensive
language identification in Malayalam. Initially, the data was preprocessed by cleaning and encoding the
labels into numerical format. A lightweight, pre-trained multilingual transformer model was utilized to
generate fixed-size sentence embeddings for the Malayalam code-mixed text. These embeddings served
as features for training multiple classifiers, including K-Nearest Neighbors (KNN), Random Forest (RF),
Support Vector Machine (SVM), and XGBoost. The model was tested on the Malayalam
dataset and obtained a score of 0.14.</p>
      <p>The systems developed by all the participating teams for all four code-mixed language datasets
were evaluated and ranked based on the macro F1-score, as provided in Tables 2, 3, 4, and 5.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this shared task, we have promoted the issue of detecting offensive language in code-mixed Dravidian
languages, specifically Tamil-English, Malayalam-English, Kannada-English, and Tulu-English. Through this
task, the research community is encouraged to investigate new methods for developing strong
and trustworthy offensive language identification systems. Four datasets containing postings
scraped from social media platforms in Tamil, Malayalam, Kannada, and Tulu were provided.
Thirteen teams took part and created systems employing a variety of techniques, such as
transformer-based models and conventional machine learning models. The macro F1-score was used to
assess and rank the outcomes according to each model's effectiveness. Future studies in this area will
benefit from the ideas that each team employed to develop their systems, which have been highlighted here.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work is supported by the Centre for Research Training in Artificial Intelligence grant number
SFI/18/CRT/6223 and a grant from the College of Science and Engineering, University of Galway, Ireland.
Bharathi Raja Chakravarthi was funded by a research grant from Research Ireland under grant number
SFI/12/RC/2289_P2 (Insight).</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>In the course of preparing this manuscript, the author(s) employed the generative AI tool ChatGPT. Its
use was limited to performing checks for grammar and spelling. Following this, the author(s) conducted
a thorough review and revision of the text and assume full responsibility for the final published content.</p>
    </sec>
    <sec id="sec-9">
      <title>Participant System Papers</title>
      <p>[14] H. Asha, S. M, Amrithkala, M. Shazia, C. Sharal, Multilingual pretrained models for offensive
language identification in Dravidian code-mixed text, in: Forum of Information Retrieval and
Evaluation FIRE - 2025, Varanasi, India, 2025.
[15] N. Rachana, S. Hosahalli Lakshmaiah, Exploring classical machine learning and deep learning
approaches for offensive language identification in Dravidian code-mixed text, in: Forum of
Information Retrieval and Evaluation FIRE - 2025, Varanasi, India, 2025.
[16] T. Krishna, C. Supriya, A. P. K, IREL@IIT-BHU@DravidianCodeMix 2025: Offensive language
identification, in: Forum of Information Retrieval and Evaluation FIRE - 2025, Varanasi, India, 2025.
[17] A. Raksha, S. Rathnakara, YenLP_CS@DravidianCodeMix 2025: A trifusion model for offensive
language detection in Dravidian code-mixed text, in: Forum of Information Retrieval and Evaluation
FIRE - 2025, Varanasi, India, 2025.
[18] S. P, A. S, A. V, D. J, DravidianDecoders@DravidianCodeMix 2025: Offensive content classification
in Kannada-Tulu code-mixed texts using classical machine learning, in: Forum of Information
Retrieval and Evaluation FIRE - 2025, Varanasi, India, 2025.
[19] S. P, A. A, V, M. S, Arul, C. T, Coreminds@DravidianCodeMix 2025: Comparative study of
transformer-based models for offensive content detection in Tamil, Malayalam, Kannada, and
Tulu code-mixed texts, in: Forum of Information Retrieval and Evaluation FIRE - 2025, Varanasi,
India, 2025.
[20] S. Y, R. P, Saicharan, K. K, Revanth, S. D.V.L, LangTeam@DravidianCodeMix 2025: Offensive detect,
in: Forum of Information Retrieval and Evaluation FIRE - 2025, Varanasi, India, 2025.
[21] S. R, S, S. U, S, S. M, S. P, S, DravidianDefenders@DravidianCodeMix 2025: Empirical analysis of
classical machine learning approaches in Tamil, Malayalam, and Tulu code-mixed offensive content
classification, in: Forum of Information Retrieval and Evaluation FIRE - 2025, Varanasi, India,
2025.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Fortuna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nunes</surname>
          </string-name>
          ,
          <article-title>A survey on automatic detection of hate speech in text</article-title>
          ,
          <source>ACM Comput. Surv</source>
          .
          <volume>51</volume>
          (
          <year>2018</year>
          ). URL: https://doi.org/10.1145/3232676. doi:10.1145/3232676.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. N</given-names>
            , B. B,
            <surname>N. K</surname>
          </string-name>
          , T. Durairaj,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ponnusamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Kumaresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. K.</given-names>
            <surname>Ponnusamy</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.</surname>
          </string-name>
          <article-title>Rajkumar, Overview of sarcasm identification of dravidian languages in dravidiancodemix@fire-2023, in: Forum of Information Retrieval and Evaluation FIRE -</article-title>
          <year>2023</year>
          , Goa, India,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S. N</given-names>
            , B. B,
            <surname>T. Durairaj</surname>
          </string-name>
          ,
          <string-name>
            <surname>N. K</surname>
          </string-name>
          , R. Ponnusamy,
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Kumaresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. K.</given-names>
            <surname>Ponnusamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Rajkumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <article-title>Overview of sarcasm identification of dravidian languages in dravidiancodemix@fire-2024, in: Forum of Information Retrieval and Evaluation FIRE -</article-title>
          <year>2024</year>
          , DAIICT , Gandhinagar,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <article-title>Towards offensive language identification for Dravidian languages</article-title>
          , in:
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kumar M</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Krishnamurthy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sherly</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages, Association for Computational Linguistics</source>
          , Kyiv,
          <year>2021</year>
          , pp.
          <fpage>18</fpage>
          -
          <lpage>27</lpage>
          . URL: aclanthology.org.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          , et al.,
          <article-title>Overview of the HASOC track at FIRE 2019: Hate speech and offensive content identification in Indo-European languages</article-title>
          ,
          <source>in: Proceedings of the 11th Forum for Information Retrieval Evaluation</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Anandkumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <article-title>Overview of the HASOC track at FIRE 2020: Hate speech and offensive language identification in Tamil, Malayalam, Hindi, English and German</article-title>
          ,
          <source>Proceedings of the 12th Annual Meeting of the Forum for Information Retrieval Evaluation</source>
          (
          <year>2020</year>
          ). URL: https://api.semanticscholar.org/CorpusID:231628577.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Kumaresan</surname>
          </string-name>
          , Premjith,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sakuntharaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thavareesan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Navaneethakrishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Madasamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <article-title>Findings of shared task on offensive language identification in Tamil and Malayalam</article-title>
          ,
          <source>in: Proceedings of the 13th Annual Meeting of the Forum for Information Retrieval Evaluation</source>
          , FIRE '21,
          Association for Computing Machinery, New York, NY, USA,
          <year>2022</year>
          , p.
          <fpage>16</fpage>
          -
          <lpage>18</lpage>
          . URL: https://doi.org/10.1145/3503162.3503179. doi:10.1145/3503162.3503179.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Jose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Kumaresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ponnusamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sherly</surname>
          </string-name>
          , et al.,
          <article-title>Findings of the shared task on offensive language identification in Tamil, Malayalam, and Kannada</article-title>
          ,
          <source>in: Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>133</fpage>
          -
          <lpage>145</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Muralidaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Jose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Suryawanshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sherly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <article-title>DravidianCodeMix: Sentiment analysis and offensive language identification dataset for Dravidian languages in code-mixed text</article-title>
          ,
          <source>Language Resources and Evaluation</source>
          <volume>56</volume>
          (
          <year>2022</year>
          )
          <fpage>765</fpage>
          -
          <lpage>806</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          , in:
          <string-name>
            <given-names>J.</given-names>
            <surname>Burstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Doran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Solorio</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers),
          <source>Association for Computational Linguistics</source>
          , Minneapolis, Minnesota,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          . URL: https://aclanthology.org/N19-1423/. doi:10.18653/v1/N19-1423.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Kakwani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kunchukuttan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Golla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>N.C.</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhattacharyya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Khapra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <article-title>IndicNLPSuite: Monolingual corpora, evaluation benchmarks and pre-trained multilingual language models for Indian languages</article-title>
          , in:
          <string-name>
            <given-names>T.</given-names>
            <surname>Cohn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          (Eds.),
          <source>Findings of the Association for Computational Linguistics: EMNLP</source>
          <year>2020</year>
          ,
          Association for Computational Linguistics
          , Online,
          <year>2020</year>
          , pp.
          <fpage>4948</fpage>
          -
          <lpage>4961</lpage>
          . URL: https://aclanthology.org/2020.findings-emnlp.445/. doi:10.18653/v1/2020.findings-emnlp.445.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. B.</given-names>
            <surname>Jagadeeshan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Palanikumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <article-title>Offensive language identification in Dravidian languages using MPNet and CNN</article-title>
          ,
          <source>International Journal of Information Management Data Insights</source>
          <volume>3</volume>
          (
          <year>2023</year>
          )
          100151
          . URL: https://www.sciencedirect.com/science/article/pii/S2667096822000945. doi:10.1016/j.jjimei.2022.100151.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>M. D</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Vikram</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. R.</given-names>
            <surname>Hegde</surname>
          </string-name>
          ,
          <article-title>Overcoming low-resource barriers in Tulu: Neural models and corpus creation for offensive language identification</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>