<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of Sarcasm Identification of Dravidian Languages in DravidianCodeMix@FIRE-2024</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sripriya N</string-name>
          <email>sripriyan@ssn.edu.in</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bharathi B</string-name>
          <email>bharathib@ssn.edu.in</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thenmozhi Durairaj</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nandhini K</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rahul Ponnusamy</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Prasanna Kumar Kumaresan</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kishore Kumar Ponnusamy</string-name>
          <email>kishorep161002@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Charmathi Rajkumar</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bharathi Raja Chakravarthi</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Central University of Tamil Nadu</institution>
          ,
          <addr-line>Tamil Nadu</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Digital University Kerala</institution>
          ,
          <addr-line>Kerala</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Insight SFI Research Centre for Data Analytics, University of Galway</institution>
          ,
          <addr-line>Galway</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Sri Sivasubramaniya Nadar College of Engineering</institution>
          ,
          <addr-line>Tamil Nadu</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>The American College</institution>
          ,
          <addr-line>Tamil Nadu</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Sarcasm is a linguistic expression that conveys the opposite of what is literally stated. The growth of social media platforms like WhatsApp, Instagram, Twitter, and Facebook has led to the extensive use of sarcastic content among the public. Identifying sarcasm in such data has become highly critical due to its significance in related fields like sentiment analysis and emotion recognition. It could provide businesses and politicians with accurate insight, as it reflects the true opinion of people. In recent times, comments and posts on social media are often code-mixed. Detecting sarcasm in code-mixed text poses greater challenges in the field of NLP. This paper presents an overview of the sarcasm identification shared task conducted as part of DravidianCodeMix@FIRE-2024. The main goal of this task is to encourage researchers to develop systems that identify sarcasm in a dataset of code-mixed social media comments in Dravidian languages, particularly Tamil and Malayalam mixed with English. Twenty-three teams participated in the shared task, with a focus on classifying comments as sarcastic or non-sarcastic. The paper describes the models used by the teams to identify sarcasm in code-mixed text. The performance of all systems developed by the teams was evaluated based on the macro-F1 score, and the results are reported.</p>
      </abstract>
      <kwd-group>
        <kwd>Sarcasm Identification</kwd>
        <kwd>Corpus Creation</kwd>
        <kwd>Classification</kwd>
        <kwd>Code-Mixing</kwd>
        <kwd>Dravidian Languages</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Social media platforms have changed the way the world communicates. People express
their thoughts and feelings on platforms such as Twitter and WhatsApp, reflecting their opinions
about particular topics, products, and news. Sarcasm is a linguistic expression that usually carries
the contrary meaning of what is directly conveyed by words. Sarcasm strongly influences the tone
and interpretation of discussions on these platforms, where it can be used to criticize political news
or mock societal trends [1]. Sentiment analysis is an NLP task that performs contextual analysis of text,
identifying the subjectivity and sentiments present in opinions. Sarcasm expresses negative sentiments
using positive words, and this inherent ambiguity confuses sentiment analysis models. Misinterpretation
of sarcastic remarks can lead to incorrect sentiment classification, seriously affecting the overall
analysis of public sentiment [2]. Sarcasm identification is essential in social media to promote positive
communication and prevent conflicts. Identifying sarcastic intent in text is harder than
detecting it from visual cues such as facial expression and body language. The complex interplay of
linguistic, pragmatic and contextual factors makes the sarcasm identification task challenging [3].</p>
      <p>Studies in the literature [1] show that sarcasm influences the perceived sentiment of a post and also
plays an important role in altering public discourse. Sarcastic comments can go viral, amplifying
their impact and altering the narrative of online conversations. Identifying sarcasm in textual content
has several real-world applications. In social media monitoring, sarcasm detection can help identify
and mitigate harmful online behaviors such as cyberbullying and hate speech. In customer service,
understanding sarcastic intent can improve customer satisfaction and response times. Additionally,
sarcasm detection can be used to enhance the performance of chatbots and virtual assistants, allowing
them to engage in more natural and nuanced conversations. Identifying sarcasm in conversation
remains a persistent challenge, and researchers have started to focus on multimodal sarcasm detection
[4][5].</p>
      <p>Researchers have explored various techniques for sarcasm detection, such as rule-based approaches,
machine learning models, and deep learning frameworks. Early studies proposed linguistic-based
approaches to sarcasm identification by analyzing patterns of irony in text [6]. More recent work
by Ghosh et al. [7] and Jain et al. [8] employed machine learning models that leverage features like
punctuation, word embeddings, and contextual information to improve the accuracy of sarcasm and
offensive content detection. While significant progress has been made in recent years, especially for English,
the task of identifying sarcasm, sentiment and offensive content remains particularly challenging for
under-resourced languages such as the Dravidian languages [9][10][11]. The rise of multilingualism and
code-mixing in online communication adds another layer of complexity to sarcasm detection. Code-mixed texts
are sentences or conversations that comprise words from multiple languages; they are common
in multilingual communities where speakers frequently switch between local languages and English.
There is a growing need for sarcasm identification and sentiment detection in social media
communication in Dravidian languages, which is largely code-mixed [12][13]. Detecting sarcasm in such
code-mixed texts requires specialized models that account for language switching, cultural context, and
nuanced expressions [14]. Research in the literature highlights various challenges in detecting
sarcasm in code-mixed languages and emphasizes the need for multilingual sarcasm identification models
[15][16].</p>
      <p>Sarcasm identification is important for improving the overall performance of NLP systems. Identifying
sarcastic content precisely helps NLP models improve sentiment analysis, enables better content
moderation, and enhances the quality of conversational bots, resulting in more meaningful and
context-aware interactions in social media environments. This real-world importance of sarcasm detection
is the stimulus for conducting this shared task in recent years [17]. Participants were provided
with two datasets containing YouTube comments that are code-mixed in Dravidian languages. The focus
of the shared task was to classify comments in Tamil mixed with English and Malayalam mixed
with English as sarcastic or not.</p>
      <p>The principal aim of this shared task is to develop systems capable of identifying sarcasm in the
given datasets of social media posts that are code-mixed in Tamil and Malayalam. The average
length of a post in the given corpora is one sentence, though a few posts contain multiple
sentences. Each comment is annotated as sarcastic or not. The datasets contain more
non-sarcastic posts than sarcastic ones, which reflects the class imbalance prevalent in real-world data.
In this shared task, participants were given training, validation and test datasets. The challenge involved
binary classification of posts, where participants were asked to determine whether a given YouTube
comment was sarcastic or not. To the best of our knowledge, this shared task series is the first initiative
to address sarcasm identification in Dravidian code-mixed languages.</p>
      <p>The systems developed by the participating teams, along with their results, are discussed in this
paper. The paper is organized as follows: Section 2 describes the shared task,
Section 3 discusses the datasets used, and Section 4 describes the techniques used by the teams.
Finally, Section 5 presents the results and rankings secured by each team, followed by concluding
remarks in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Task Description</title>
      <p>The objective of this task is to identify sarcasm in code-mixed texts involving Dravidian
languages. In particular, the focus was on code-mixed text in Tamil and Malayalam with English. The
dataset was obtained from social media platforms, in particular from YouTube comments. The participants were
given the challenge of predicting sarcasm in the given comments. They were initially given the
training and validation datasets, and the test datasets were released later for evaluation.
This shared task on sarcasm detection in Dravidian languages has been conducted as a series since last year,
and this is the second event of the series. Further information on the task is available on the CodaLab
site<sup>1</sup>.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Datasets</title>
      <p>The datasets of Tamil and Malayalam text mixed with English comprise social media
comments with different types of code-mixed sentences: inter-sentential switching, intra-sentential
switching, and tag switching. The majority of the comments include a combination of native and Roman
script. The data include sentences framed using Tamil or Malayalam grammar mixed with English words,
as well as sentences following English grammar that are interspersed with Malayalam or Tamil
vocabulary. Each dataset is divided into training, validation and test sets. Training and validation sets
were provided with class labels, while the test sets used for evaluation were released unlabeled. The
data distribution of the training, validation, and test sets is given in Table 1. Notably, the datasets
contain a higher proportion of non-sarcastic comments than sarcastic ones, as shown in Table 2.
This class imbalance skews the datasets, which participants needed to take into account while designing
their classification systems.</p>
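      <table-wrap>
        <label>Table 1</label>
        <caption>
          <p>Data distribution of the training, validation (Dev), and test sets.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Language</th><th>Train</th><th>Dev</th><th>Test</th><th>Total</th></tr>
          </thead>
          <tbody>
            <tr><td>Tamil mixed English</td><td>29,570</td><td>6,336</td><td>6,338</td><td>42,244</td></tr>
            <tr><td>Malayalam mixed English</td><td>13,188</td><td>2,826</td><td>2,826</td><td>18,840</td></tr>
          </tbody>
        </table>
      </table-wrap>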
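      <p>As a quick check on the Table 1 counts, the following sketch computes the share of each split. Only the counts come from the table; the dictionary layout and the function name split_shares are illustrative assumptions and are not part of the shared task materials.</p>
      <preformat>
```python
# Counts copied from Table 1; the data structure itself is an illustrative assumption.
COUNTS = {
    "Tamil-English": {"train": 29570, "dev": 6336, "test": 6338},
    "Malayalam-English": {"train": 13188, "dev": 2826, "test": 2826},
}


def split_shares(counts):
    """Return each split's fraction of the total number of comments."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    for language, counts in COUNTS.items():
        shares = split_shares(counts)
        summary = ", ".join(f"{k}: {v:.1%}" for k, v in shares.items())
        print(f"{language} ({sum(counts.values())} comments) -> {summary}")
```
      </preformat>
      <p>Under these counts, both corpora follow an approximately 70/15/15 division into training, validation, and test sets.</p>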
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>Twenty-three teams actively participated in this shared task to identify sarcasm in two code-mixed
language pairs, Tamil mixed English and Malayalam mixed English. The contestants explored a variety
of methodologies to classify a given comment as sarcastic or not [18].</p>
      <sec id="sec-4-1">
        <p>Awsathama team [19] developed a sophisticated combination of transformer models and diverse data
processing techniques to detect sarcasm in the Dravidian languages Tamil and Malayalam. The approach
leveraged the strengths of mBERT, Indic-BERT, XLM-RoBERTa, and MuRIL to capture the nuanced
linguistic features unique to these languages. A baseline LSTM with an attention mechanism was also
used to establish reference performance. To enhance model efficacy, various data augmentation
strategies were implemented. The process began with the original dataset, applying back translation
specifically to the minority classes to achieve balance. Additionally, cross-lingual translation between
Tamil and Malayalam was performed for these minority classes. This comprehensive data augmentation
aimed to improve the models’ generalization abilities, resulting in more accurate sarcasm detection
across different contexts and linguistic variations, ultimately securing a top F1 score of 0.74 on the Tamil
task and 0.75 on the Malayalam task.</p>
        <p><sup>1</sup>https://codalab.lisn.upsaclay.fr/competitions/19310</p>
        <p>Text_Catalysts team [20] investigated three models: DistilBERT, GRU, and LSTM. The study
shows that, among these models, DistilBERT performs best at detecting sarcasm in Tamil text.
DistilBERT, a lightweight but effective model, is well suited to sarcasm detection because of its ability to
capture subtle contextual elements in text. It yields an F1 score of 0.74 on the test set, making it the
best performer.</p>
        <p>Change_Makers team [21] explored the application of conventional algorithms, including logistic
regression, random forest, and a naive Bayes classifier, alongside the transformer-based BERT.
Performance was evaluated across the datasets, focusing on key metrics like accuracy and F1 score. BERT
demonstrated superior performance, effectively capturing contextual nuances in sarcasm detection,
making it a more viable approach for multilingual and code-mixed environments. This team attained
the maximum score of 0.74 in the Tamil-English subtask.</p>
        <p>MUCS team [22] proposed two distinct models to perform sarcasm detection: i) a Long Short-Term
Memory (LSTM) model using Keras embeddings, and ii) an mBERT+CNN model, which combines the
Multilingual Bidirectional Encoder Representations from Transformers (mBERT) tokenizer for
embeddings (a transformer-based approach) with a Convolutional Neural Network (CNN) for classification.
The team handled the data imbalance in the dataset by applying text augmentation
with contextual word embeddings to expand the minority class. Among the proposed
models, the mBERT+CNN model achieved superior performance, securing macro F1 scores of 0.74 for
the Tamil-English subtask and 0.72 for the Malayalam-English subtask, ranking 1st and 2nd, respectively.</p>
        <p>The UMSNH_NLP team’s approach combines bag-of-words and deep learning models, each of which
solves the task independently [23]. A new feature space is then constructed from the decision functions of
the individual models. This feature space is fed into an XGBoost classifier for the final prediction. The
generic text categorization system, FastText, achieves the best performance for both the Tamil and
Malayalam subtasks, with F1 scores of 0.74 and 0.76, respectively.</p>
        <p>IRLab@IITBHU team [24] explored a new technique for sarcasm identification using BERT with
an additional neural network layer. The team also employed ChatGPT for the same task and conducted a
comparative study between GPT and BERT-based models. The experiment demonstrated that the
BERT-based model effectively detected sarcasm, securing an F1 score of 0.74 on both the Tamil and Malayalam
code-mixed datasets, while GPT attained an F1 score of 0.64 on the same datasets. These results reflected
strong overall performance, placing the model third for the Malayalam-English language pair and first for
the Tamil-English language pair.</p>
        <p>Sarcasm_NLP team [25] tackled sarcasm detection in Dravidian languages by exploring the
difficulties posed by code-mixing, dialectal variations, and the scarcity of annotated datasets. The team
investigated three transformer-based models, (i) DistilBERT, (ii) GoogleBERT, and (iii) RoBERTa,
to effectively capture the subtleties of sarcasm in these languages. Experimental results highlight the
potential of transformers to achieve strong performance in multilingual sarcasm detection, with F1 scores
of 0.73 and 0.72 for the Tamil and Malayalam subtasks, respectively.</p>
        <p>PixelPhrase team [26] proposed a model architecture consisting of a BERT encoder and a classification
layer, generating a probability score indicating the likelihood of sarcasm. To assess the performance
of the models, various validation metrics such as recall, precision, F1 score and AUC were used. The
results demonstrated that this method outperformed existing approaches. It obtained an F1 score of 0.73 on
the Tamil test set and 0.72 on the Malayalam test set.</p>
        <p>JUNLP_Amit Barman team [40] developed a hybrid model that combines CNNs, Bi-LSTM networks,
and an AdaBoost classifier for detecting sarcasm. This model demonstrated the use of
deep learning-based features together with classical machine learning techniques for detecting sarcasm in a
multilingual, code-mixed context. Its performance, measured by F1 score, was 0.72 on the Malayalam dataset.</p>
        <p>The CodeSpark team [27] used advanced deep learning models, such as bidirectional LSTMs,
combined with specialized tokenization and embedding techniques, which resulted in substantial
improvements in sarcasm detection. The system achieved a macro-F1 score of 0.72 on the Tamil subtask,
securing third position, and 0.74 on the Malayalam subtask.</p>
        <p>KEC_Tech_Titan team [41] developed a system that achieved 6th place in the Malayalam subtask of
the Dravidian track. Sarcasm, often dependent on context, tone, and cultural nuances, presents significant
challenges for machine learning models. This team explored various machine learning
and deep learning models for identifying sarcasm in Malayalam text. A range of models was used,
including RoBERTa, CNN, Multi-layer Perceptron (MLP), Gated Recurrent Units (GRU), Random Forests
(RF), Hidden Markov Models (HMM), K-Nearest Neighbors (KNN), Logistic Regression (LR) and Gaussian
Mixture Models (GMM).</p>
        <p>Beyond_Tech team [29] studied two deep learning techniques for sarcasm identification in various
languages: a hybrid model that integrates several neural network models and a Bi-LSTM model. The
hybrid model improved sarcasm detection by leveraging long-range dependencies alongside local
feature extraction through the combination of multiple architectures. Comprehensive preprocessing
techniques, including tokenization, padding, and label encoding, were applied to the Malayalam and Tamil
sarcasm datasets. The hybrid model outperformed the Bi-LSTM in accuracy and F1 score,
ranking 5th on the Tamil subtask with a macro F1 score of 0.70 and 7th on the Malayalam subtask
with an F1 score of 0.67.</p>
        <p>The SSN_Language team’s approach uses a multilingual language model designed to handle
code-mixed and multilingual text to obtain relevant features from the input data [30]. The tokenized inputs
are used to derive high-dimensional features, providing robust text representations. These features
are then fed into three machine learning models: Multinomial Naive Bayes, Logistic Regression, and
Random Forest Classifier. Each model is trained on the Tamil and Malayalam code-mixed datasets
to classify the text into sarcastic and non-sarcastic categories, securing F1 scores of 0.70 and 0.62,
respectively.</p>
        <p>Code Crafters team [31] evaluated the efficiency of various models, including machine learning
approaches like XGBoost, LightGBM, and CatBoost, and deep learning models such as LSTM and
GRU. To address class imbalance, SMOTE was applied to the machine learning models and GRU, while
sequence pre-padding was used for LSTM. The results indicate that SMOTE improves macro-average
F1 scores and accuracy across most models, reaching a notable macro F1 score of 0.69.</p>
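        <p>The interpolation idea behind SMOTE can be sketched in a few lines. This is a from-scratch illustration under simplifying assumptions (dense numeric feature vectors, Euclidean distance, at least two minority samples); it is not the team's implementation and not a library API, and the function name smote and its parameters are hypothetical.</p>
        <preformat>
```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Candidate neighbours: every other minority sample, nearest first.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # The new point lies on the segment between base and neighbour.
        lam = rng.random()
        synthetic.append([a + lam * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Toy 2-D feature vectors standing in for vectorised sarcastic comments.
    sarcastic = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote(sarcastic, n_new=3):
        print(point)
```
        </preformat>
        <p>Because every synthetic point is a convex combination of two existing minority samples, oversampling stays inside the region already occupied by the minority class rather than duplicating comments verbatim.</p>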
        <p>CJM team [32] developed a system that used an MLP classifier, with custom-generated embeddings
provided as input. A language-agnostic sentence transformer, which supports both Tamil and Malayalam,
was used to generate text embeddings. Additionally, the LASER encoder pipeline was employed to
create LASER embeddings for all texts. These two sets of embeddings were concatenated to form the
final set of embeddings, which was used to train the MLP classifier. The team achieved a macro F1
score of 0.68 for Tamil and 0.70 for Malayalam.</p>
        <p>MSD team [33] proposed an approach that first translates the Tamil mixed English and Malayalam
mixed English texts into English, followed by fine-tuning of BERT and XLM-RoBERTa
for sarcasm identification. The method demonstrates promising results,
achieving an F1 score of 0.68 for both BERT and XLM-RoBERTa on Tamil-English posts, and a macro
F1 score of 0.71 for Malayalam-English posts.</p>
        <p>The_Three_Musketeers team [34] employed various traditional machine learning approaches
to detect sarcastic content in Tamil and Malayalam comments. Among these, the logistic
regression model achieved an F1 score of 0.68 for Tamil and 0.67 for Malayalam, demonstrating its capability
to capture the complex nuances of sarcasm in code-mixed Dravidian languages.</p>
        <p>KEC_AIDS_79114 team [36] used TF-IDF vectorization to convert the given text into numerical
features. Four models were evaluated for their effectiveness in sarcasm detection: Decision Tree (DT),
K-Nearest Neighbors (KNN), Support Vector Machine (SVM) and Logistic Regression (LR). Logistic
Regression showed the best performance, achieving an F1 score of 0.61 for Tamil code-mixed data and 0.58
for Malayalam code-mixed data. The system demonstrated the robustness of the approach by securing a
notable 9th place in a competitive sarcasm detection task.</p>
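        <p>To make the TF-IDF step concrete, here is a minimal plain-Python sketch of the weighting. It assumes whitespace tokenization, relative term frequency, and one common smoothed IDF variant, ln((1 + N) / (1 + df)) + 1; the team's actual vectorizer settings are not stated here, and the example comments are invented for illustration.</p>
        <preformat>
```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document using tf * smoothed idf."""
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


if __name__ == "__main__":
    # Invented romanised code-mixed comments, used only as placeholders.
    comments = ["padam super", "padam mass mass"]
    for comment, weights in zip(comments, tfidf(comments)):
        print(comment, weights)
```
        </preformat>
        <p>Terms that occur in every comment receive the minimum IDF of 1, while rarer terms are weighted up; the resulting sparse vectors are what a classifier such as Logistic Regression would consume.</p>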
        <p>The TextTitans team [37] used the zero-shot capabilities of GPT-3.5 Turbo to carry out sarcasm
detection through prompting. This approach capitalizes on the advantages of large-scale pretrained models,
accommodates multilingual and code-mixed environments, lessens the reliance on extensive annotated
datasets, facilitates quick experimentation, and ensures scalability across various linguistic contexts.
The use of clear and concise prompts enabled the model to focus on the main task of sarcasm detection,
leading to reliable and interpretable results. The GPT model was prompted at three different
temperature values, namely 0.7, 0.8, and 0.9. The system achieved a macro-F1 score of 0.61 for Tamil and 0.50 for
Malayalam.</p>
        <p>Tech_Chasers team [38] built a system for detecting sarcasm in Tamil code-mixed and Malayalam
code-mixed sentences that uses a neural network architecture combining CNNs with Bi-LSTM layers.
The system was trained with early stopping and checkpointing, achieving an F1 score of approximately
0.5 for both Tamil-English and Malayalam-English code-mixed data.</p>
        <p>Tr4nslate team vectorized the text using a TF-IDF vectorizer. A meta-stacking ensemble model
and an ensemble model comprising SVM, KNN, Logistic Regression, and Decision Tree were used for the Tamil
task and produced an F1 score of 0.71. A Random Forest classifier was used for the Malayalam dataset
and produced a score of 0.67.</p>
        <p>Tech_Army_KEC team [28] combined traditional classifiers such as Support Vector Machines (SVM),
Logistic Regression and Random Forest with advanced methods like CNNs, LSTM, and
Transformer-based models like BERT and ALBERT. Further, Hierarchical Attention Networks (HAN) and Gradient
Boosting techniques were also used. The system identified sarcastic comments across the two
languages, yielding scores of 0.70 for Tamil and 0.67 for Malayalam.</p>
        <p>KEC_AI_InnovationEngineers team [35] used a few machine learning methods: Logistic
Regression, Support Vector Classifier (SVC), and Random Forest. Logistic Regression was used for binary
classification, SVC for non-linear decision boundaries, and Random Forest for ensemble learning. The system
was trained and evaluated on the Tamil sarcasm identification task and produced a score of
0.67.</p>
        <p>DLRG team used Multilingual BERT (mBERT) to identify sarcastic content in Tamil
code-mixed text and obtained an F1 score of 0.49.</p>
        <p>JUNLP team [39] built a model using a CNN followed by an LSTM and an AdaBoost classifier to identify
sarcasm in the given Tamil code-mixed dataset and produced an F1 score of 0.47.</p>
        <p>SSNites fine-tuned the mBERT model on the given Tamil code-mixed and Malayalam
code-mixed data for sarcasm detection and obtained scores of 0.24 and 0.57, respectively.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results and Discussion</title>
      <p>In FIRE 2024, 23 teams competed in the shared task of detecting sarcasm in code-mixed data from
Dravidian languages. The performance of all models was assessed using the macro-averaged F1 score [18]. The F1 score
is the harmonic mean of recall and precision, and hence is very useful for evaluating sarcasm
detection models. It also adequately balances false positives and false negatives. The results illustrate
the complexity of sarcasm recognition across the Tamil and Malayalam languages, stressing its importance
in sentiment research. In spite of the problems of linguistic diversity, code mixing, and social inequities,
all teams created systems with promising potential.
      </p>
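      <p>The metric can be sketched in plain Python as follows. This is an illustration of macro-averaged F1, not the organisers' scoring script; the label strings and the function name macro_f1 are assumptions made for the example.</p>
      <preformat>
```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2PR / (P + R) = 2TP / (2TP + FP + FN); taken as 0 when the denominator is 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # A classifier that always predicts the majority class (hypothetical labels).
    gold = ["Sarcastic", "Non-sarcastic", "Non-sarcastic", "Non-sarcastic"]
    pred = ["Non-sarcastic"] * 4
    print(round(macro_f1(gold, pred), 4))
```
      </preformat>
      <p>In this toy case the always-majority classifier is 75% accurate, yet its macro F1 is only about 0.43, because the sarcastic class contributes an F1 of zero. This is why macro averaging suits the imbalanced datasets used in this task.</p>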
      <p>Recently, transformer-based language models have shown outstanding efficiency by using powerful
embedding representations and self-attention mechanisms, advancing the field of language
comprehension. In this shared task, six teams, “Awsathama”, “Team_Catalysts”, “Change_Makers”,
“MUCS”, “UMSNH_NLP”, and “IRLab@IITBHU”, shared first place in the Tamil code-mixed task with a
macro-F1 score of 0.74, and the team “UMSNH_NLP” achieved first place in the Malayalam code-mixed task.
The “Awsathama” team [19] designed a sophisticated ensemble of transformer models and data processing
techniques to detect sarcasm in Malayalam and Tamil. This approach harnessed the power of mBERT, XLM-RoBERTa,
Indic-BERT, and MuRIL to pinpoint the subtle linguistic nuances specific to these Dravidian languages.
A baseline LSTM with an attention mechanism was also employed to set a performance benchmark. To
further refine model effectiveness, several data augmentation strategies were applied. The “UMSNH_NLP”
team combined bag-of-words and deep learning models, each tackling the task independently. A novel
feature space was subsequently created from the decision functions of the individual models. This
feature space was then fed into an XGBoost classifier for the final prediction. The generic text
categorization system, FastText, attained the highest performance for both the Tamil and Malayalam
code-mixed tasks, with F1 scores of 0.74 and 0.76, respectively. “Team_Catalysts” used DistilBERT, a
lightweight but effective model for detecting sarcasm in Tamil text owing to its capacity to capture
subtle contextual elements. It achieved an F1 score of 0.74 on the test set, which is the highest among
all teams. “Change_Makers” recommended the BERT classifier for text classification tasks like sarcasm
detection because of its strong design and capacity to handle large-scale datasets; it
yielded the highest score on the Tamil-English dataset.</p>
      <p>“MUCS” used a hybrid mBERT+CNN model and achieved superior performance,
securing a macro F1 score of 0.74 for the Tamil-English dataset and 0.72 for the Malayalam-English dataset,
ranking 1st and 2nd, respectively. The “IRLab@IITBHU” team demonstrated that the BERT-based model
effectively detected sarcasm, achieving an F1 score of 0.74 for both the Tamil and Malayalam code-mixed datasets.
The rank lists of the participants in the sarcasm identification task for the Tamil and Malayalam code-mixed
datasets are given in Tables 3 and 4.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>The shared task on sarcasm detection in Dravidian languages emphasized the importance of language
understanding in social media communication. Many social media posts now mix different
languages, a practice known as code-mixing. The objective of the shared task is to design and develop
innovative models to detect the inherent sarcasm in code-mixed data. This task encourages the
research community to explore novel approaches to build robust and reliable sarcasm identification
systems. The contestants were provided with two datasets, in Tamil and Malayalam mixed
with English, containing posts scraped from social media platforms. Twenty-three teams participated and
developed systems using diverse approaches, including traditional machine learning models,
transformer-based models and transfer learning. The results were evaluated and ranked based on the efficiency of
the models using the F1 score. The ideas used by each of the teams in building their systems were also
highlighted, which will aid future research in this direction.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>Author Bharathi Raja Chakravarthi supported this shared task through a research grant obtained
from Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289_P2 (Insight_2). Rahul
Ponnusamy and Prasanna Kumar Kumaresan also rendered support through the Science Foundation
Ireland Centre for Research Training in Artificial Intelligence under Grant No. 18/CRT/6223 and the
College of Science and Engineering, University of Galway, Ireland.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <sec id="sec-8-1">
        <p>The author(s) have not employed any Generative AI tools.</p>
        <p>[1] A. Ghosh, T. Veale, Fracking sarcasm using neural network, in: Proceedings of the 7th workshop on
computational approaches to subjectivity, sentiment and social media analysis, 2016, pp. 161–169.
[2] A. Joshi, P. Bhattacharyya, M. J. Carman, Automatic sarcasm detection: A survey, ACM Computing</p>
        <p>Surveys (CSUR) 50 (2017) 1–22.
[3] A. Reyes, P. Rosso, D. Buscaldi, From humor recognition to irony detection: The figurative
language of social media, Data &amp; Knowledge Engineering 74 (2012) 1–12.
[4] T. Yue, R. Mao, H. Wang, Z. Hu, E. Cambria, Knowlenet: Knowledge fusion network for multimodal
sarcasm detection, Information Fusion 100 (2023) 101921.
[5] Y. Qiao, L. Jing, X. Song, X. Chen, L. Zhu, L. Nie, Mutual-enhanced incongruity learning network for
multi-modal sarcasm detection, in: Proceedings of the AAAI Conference on Artificial Intelligence,
volume 37, 2023, pp. 9507–9515.
[6] U. Shrawankar, C. Chandankhede, Sarcasm detection for workplace stress management,
International Journal of Synthetic Emotions 10 (2019) 1–17. doi:10.4018/ijse.2019070101.
[7] D. Ghosh, A. R. Fabbri, S. Muresan, Sarcasm analysis using conversation context, Computational
Linguistics 44 (2018) 755–792. URL: https://aclanthology.org/J18-4009. doi:10.1162/coli_a_00336.
[8] D. Jain, A. Kumar, G. Garg, Sarcasm detection in mash-up language using soft-attention based
bidirectional lstm and feature-rich cnn, Applied Soft Computing 91 (2020) 106198.
URL: https://www.sciencedirect.com/science/article/pii/S1568494620301381. doi:10.1016/j.asoc.2020.106198.
[9] B. R. Chakravarthi, Hope speech detection in youtube comments, Social Network Analysis and</p>
        <p>Mining 12 (2022) 75.
[10] B. R. Chakravarthi, R. Priyadharshini, S. Banerjee, M. B. Jagadeeshan, P. K. Kumaresan, R.
Ponnusamy, S. Benhur, J. P. McCrae, Detecting abusive comments at a fine-grained level in a
low-resource language, Natural Language Processing Journal 3 (2023) 100006.
[11] B. R. Chakravarthi, M. B. Jagadeeshan, V. Palanikumar, R. Priyadharshini, Offensive language
identification in dravidian languages using mpnet and cnn, International Journal of Information
Management Data Insights 3 (2023) 100151.
[12] B. R. Chakravarthi, A. Hande, R. Ponnusamy, P. K. Kumaresan, R. Priyadharshini, How can
we detect homophobia and transphobia? experiments in a multilingual code-mixed setting for
social media governance, International Journal of Information Management Data Insights 2 (2022)
100119.
[13] S. Divya, N. Sripriya, D. Evangelin, G. Saai Sindhoora, Opinion classification on code-mixed tamil
language, in: International Conference on Speech and Language Technologies for Low-resource
Languages, Springer, 2022, pp. 155–168.
[14] S. Khanuja, S. Dandapat, A. Srinivasan, S. Sitaram, M. Choudhury, GLUECoS: An evaluation
benchmark for code-switched NLP, in: D. Jurafsky, J. Chai, N. Schluter, J. Tetreault (Eds.), Proceedings of
the 58th Annual Meeting of the Association for Computational Linguistics, Association for
Computational Linguistics, Online, 2020, pp. 3575–3585. URL: https://aclanthology.org/2020.acl-main.329.
doi:10.18653/v1/2020.acl-main.329.
[15] B. R. Chakravarthi, R. Priyadharshini, V. Muralidaran, N. Jose, S. Suryawanshi, E. Sherly, J. P.</p>
        <p>McCrae, Dravidiancodemix: Sentiment analysis and offensive language identification dataset for
dravidian languages in code-mixed text, Language Resources and Evaluation 56 (2022) 765–806.
[16] A. Hande, R. Priyadharshini, B. R. Chakravarthi, Kancmd: Kannada codemixed dataset for
sentiment analysis and offensive language detection, in: Proceedings of the Third Workshop on
Computational Modeling of People’s Opinions, Personality, and Emotion’s in Social Media, 2020,
pp. 54–63.
[17] B. R. Chakravarthi, S. N, B. B, N. K, T. Durairaj, R. Ponnusamy, P. K. Kumaresan, K. K. Ponnusamy,
C. Rajkumar, Overview of sarcasm identification of dravidian languages in
dravidiancodemix@fire-2023 (2023).
[18] S. N, B. B, T. Durairaj, N. K, R. Ponnusamy, P. K. Kumaresan, K. K. Ponnusamy, C. Rajkumar,</p>
        <p>
          Overview of sarcasm identification of dravidian languages in dravidiancodemix@fire-2024.
[19] N. Narayan, S. Mohanty, Enhancing sarcasm detection in code-mixed dravidian texts using data
augmentation and transformer models, in: Forum of Information Retrieval and Evaluation FIRE 2024.
[38] A. Chowdhury, S. Paul, S. Kundu, A. K. Thakur, A. Sarkar, A. R. Chaudhuri, A. Ray, D. Mitra,
D. Saha, Sarcasm detection in dravidian languages, in: Forum of Information Retrieval and
Evaluation FIRE - 2024, DAIICT, Gandhinagar, 2024.
[39] P. Maity, D. Saha, S. Das, S. K. Mahata, D. Das, A hybrid approach to sarcasm detection in dravidian
code-mixed texts, in: Forum of Information Retrieval and Evaluation FIRE - 2024, DAIICT,
Gandhinagar, 2024.
[40] A. Barman, A. Mandal, S. K. Naskar, Sarcasm or serious? sarcasm detection in code-mixed
dravidian languages, in: Forum of Information Retrieval and Evaluation FIRE - 2024, DAIICT,
Gandhinagar, 2024.
[41] M. Subramanian, A. S, D. P, D. S, K. S V, Investigation of machine learning and transformer models
for sarcasm detection in dravidian languages, in: Forum of Information Retrieval and Evaluation
FIRE - 2024, DAIICT, Gandhinagar, 2024.
        </p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>2024, DAIICT, Gandhinagar, 2024. [20] K. Shanmugavadivel, S. K, S. Janani J S, R. K, Leveraging transfer learning and deep recurrent</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>Evaluation FIRE - 2024, DAIICT, Gandhinagar, 2024. [21] K. Shanmugavadivel, P. C, V. L, S. S, Leveraging machine learning and bert for sarcasm detection</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>in text, in: Forum of Information Retrieval and Evaluation FIRE - 2024, DAIICT, Gandhinagar,</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>2024. [22] S. D, K. G, H. L. Shashirekha, Unmasking sarcasm: Exploring mbert+cnn and lstm models for</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>and Evaluation FIRE - 2024, DAIICT, Gandhinagar, 2024. [23] J. Cerda-Flores, D. Castro-Pineda, M. G. Juarez, R. I. Hernandez-Mazariegos, J. Cerda-Jacobo,</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>2024, DAIICT, Gandhinagar, 2024. [24] S. Chanda, K. Tewari, A. Mukherjee, S. Pal, Leveraging chatgpt and xlm-roberta for sarcasm</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>FIRE - 2024, DAIICT, Gandhinagar, 2024. [25] S. Chauhan, A. Kumar, A transformer-based model for detecting multilingual sarcasm in social</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>media posts, in: Forum of Information Retrieval and Evaluation FIRE - 2024, DAIICT, Gandhinagar,</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>2024. [26] S. S, J. S, K. J. P, Sarcasm unveiled: Advanced detection techniques for tamil and malayalam using</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>multi modal approaches, in: Forum of Information Retrieval and Evaluation FIRE - 2024, DAIICT,</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>Gandhinagar, 2024. [27] S. B K, S. Priyaa G K, A. P, C. Mahibha, Sarcasm detection in dravidian languages using bi-directional</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>lstm, in: Forum of Information Retrieval and Evaluation FIRE - 2024, DAIICT, Gandhinagar, 2024. [28] M. Subramanian, A. A, A. T, A. M, K. S V, Sarcasm detection in dravidian languages using machine</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>learning and transformer models, in: Forum of Information Retrieval and Evaluation FIRE - 2024,</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>DAIICT, Gandhinagar, 2024. [29] K. Shanmugavadivel, M. Subramanian, S. R, M. Sameer B, M. K, Bi-lstm and hybrid model based</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>Information Retrieval and Evaluation FIRE - 2024, DAIICT, Gandhinagar, 2024. [30] M. A, K. S, P. Priya B, B. B, Sarcasm detection and identification of dravidian language using</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>machine learning approach, in: Forum of Information Retrieval and Evaluation FIRE - 2024,</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>DAIICT, Gandhinagar, 2024. [31] K. Shanmugavadivel, N. K, S. S, Enhanced sarcasm detection in code-mixed tamil-english text</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>Evaluation FIRE - 2024, DAIICT, Gandhinagar, 2024. [32] C. Mahibha, G. Shimi, D. Thenmozhi, Sarcasm detection from dravidian language text, in: Forum</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>of Information Retrieval and Evaluation FIRE - 2024, DAIICT, Gandhinagar, 2024. [33] R. Kumar, A. Kumar, Msd: Multilingual sarcasm detection using deep learning-based model, in:</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>Forum of Information Retrieval and Evaluation FIRE - 2024, DAIICT, Gandhinagar, 2024. [34] S. Karthik, M. Sreekumar, K. Shyam Potta, D. Thenmozhi, Sarcasm identification of dravidian</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>languages malayalam and tamil, in: Forum of Information Retrieval and Evaluation FIRE - 2024,</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>DAIICT, Gandhinagar, 2024. [35] K. Shanmugavadivel, P. Murugan V, P. Sree M, P. Chinnappan D, Automated sarcasm identification</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>Retrieval and Evaluation FIRE - 2024, DAIICT, Gandhinagar, 2024. [36] K. S V, M. Subramanian, P. S P, V. S H, A. M, Detecting sarcasm in social media text using</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>Evaluation FIRE - 2024, DAIICT, Gandhinagar, 2024. [37] A. Deroy, S. Maity, Youtube comments decoded: Leveraging llms for low resource language</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>classification, in: Forum of Information Retrieval and Evaluation FIRE - 2024, DAIICT, Gandhinagar,</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>