<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Forum for Information Retrieval Evaluation, December</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>ments Decoded: Leveraging LLMs for Low Resource Language Classification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aniket Deroy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Subhankar Maity</string-name>
          <email>subhankar.ai@kgpian.iitkgp.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IIT Kharagpur</institution>
          ,
          <addr-line>Kharagpur</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>1</volume>
      <fpage>2</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>Sarcasm detection is a significant challenge in sentiment analysis, particularly due to its nature of conveying opinions where the intended meaning deviates from the literal expression. This challenge is heightened in social media contexts where code-mixing, especially in Dravidian languages, is prevalent. Code-mixing involves the blending of multiple languages within a single utterance, often with non-native scripts, complicating the task for systems trained on monolingual data. This shared task introduces a novel gold standard corpus designed for sarcasm and sentiment detection within code-mixed texts, specifically in Tamil-English and Malayalam-English languages. The primary objective of this task is to identify sarcasm and sentiment polarity within a code-mixed dataset of Tamil-English and Malayalam-English comments and posts collected from social media platforms. Each comment or post is annotated at the message level for sentiment polarity, with particular attention to the challenges posed by class imbalance, reflecting real-world scenarios. In this work, we experiment with state-of-the-art large language models like GPT-3.5 Turbo via prompting to classify comments into sarcastic or non-sarcastic categories. We obtained a macro-F1 score of 0.61 for the Tamil language and a macro-F1 score of 0.50 for the Malayalam language.</p>
      </abstract>
      <kwd-group>
        <kwd>GPT</kwd>
        <kwd>Sarcasm</kwd>
        <kwd>Classification</kwd>
        <kwd>Low-resource Languages</kwd>
        <kwd>Prompt Engineering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Sarcasm, a form of expression where the intended meaning sharply diverges from the literal meaning,
poses a significant challenge for sentiment analysis systems [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This complexity arises because
sarcasm often involves subtle linguistic cues and context-dependent interpretation, making it difficult
for automated systems to accurately detect and classify [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ].
      </p>
      <p>
        Sarcasm is a sharp, often humorous form of verbal irony where someone says the opposite of what
they truly mean, typically to convey disdain or mockery. It relies heavily on tone and context, as the
delivery can transform a seemingly straightforward statement into a biting comment [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. While it can
be a playful way to highlight absurdities or critique behavior, sarcasm can also lead to misunderstandings,
especially in written communication where vocal inflections are absent. Ultimately, it’s a double-edged
sword that can entertain or offend, depending on the audience and the intent behind it.
      </p>
      <p>Overall, the study of code-mixing in NLP is an evolving field with ongoing research aimed at
improving the performance of various NLP tasks in multilingual contexts. As multilingualism continues
to grow globally, the development of models that can effectively handle code-mixed data is becoming
increasingly important.</p>
      <p>
        Code-mixing [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] in NLP refers to the blending of two or more languages within a single sentence
or conversation, commonly seen in multilingual societies. It poses unique challenges for tasks like
language identification, part-of-speech tagging, and sentiment analysis. Traditional NLP models often
struggle with code-mixed data due to the complex interaction between languages. To address this,
researchers have developed advanced neural network models, such as RNNs and CNNs, that better
capture linguistic context. Code-mixed text also complicates machine translation and speech recognition,
requiring specialized approaches to handle language switching effectively. As multilingualism becomes
more prevalent, the importance of developing robust NLP systems capable of processing code-mixed
data continues to grow.
      </p>
      <p>
        Code-mixing is a widespread phenomenon in multilingual societies, particularly in Dravidian
languages such as Tamil and Malayalam, where users frequently switch between native languages and
English [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. These code-mixed texts often appear in non-native scripts, such as Roman, due to the
convenience of typing on digital platforms [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. This creates a unique set of challenges for natural
language processing (NLP) systems, which are typically trained on monolingual data and thus struggle
to handle the intricacies of code-switching across different linguistic levels, such as syntax, semantics,
and phonetics [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        The Dravidian languages, with their rich cultural and linguistic history, present a distinct case for
code-mixed text analysis [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Tamil, with its official status in India, Sri Lanka, and Singapore, and
a significant diaspora community, has a script that evolved from ancient scripts like Tamili and the
Chola-Pallava script [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Similarly, Malayalam, predominantly spoken in Kerala, India, employs an
alpha-syllabic script that blends alphabetic and syllable-based writing systems [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. However, on social
media, these languages are often typed using the Roman script, contributing to a growing corpus of
code-mixed data.
      </p>
      <p>
        To address these challenges, this paper introduces a prompt-based method [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] using GPT-3.5 Turbo
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] for sarcasm and sentiment detection in code-mixed texts, specifically focusing on Tamil-English
and Malayalam-English language pairs. This shared task [
        <xref ref-type="bibr" rid="ref11 ref12 ref13 ref14 ref15 ref16">11, 12, 13, 14, 15, 16</xref>
        ] aims to advance research
in sentiment analysis by providing a dataset that reflects real-world scenarios where code-mixing is
prevalent and by encouraging the development of systems that can accurately classify sarcasm and
sentiment polarity in these under-resourced languages.
      </p>
      <p>Participants in this task were provided development, training, and test datasets composed of comments
and posts collected from social media platforms such as YouTube.</p>
      <p>
        In this work, we explore the capabilities of state-of-the-art large language models, specifically GPT-3.5
Turbo [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], by leveraging prompt-based techniques to classify social media comments into sarcastic or
non-sarcastic categories. GPT-3.5 Turbo [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], a variant of OpenAI’s GPT-3.5 model, is known for its
advanced natural language understanding and generation capabilities, making it a promising candidate
for handling the nuanced task of sarcasm detection, particularly in code-mixed texts.
      </p>
      <p>Our team, TextTitans, participated in a competition aimed at advancing sarcasm detection in Tamil
and Malayalam. The results of our systems provide valuable insights into the current state of NLP for
these languages. For Tamil, our system achieved a macro-F1 score of 0.61, placing us 9th among the
competing teams. The macro-F1 score, which balances precision and recall across multiple classes,
reflects our system’s ability to effectively handle the nuances of Tamil sarcasm, even though several
competing systems performed better.</p>
      <p>In contrast, our system’s performance for Malayalam resulted in a macro-F1 score of 0.50, securing
the 13th position in the rankings. This score, while moderate, highlights the challenges inherent in
processing Malayalam text, particularly in capturing the subtleties of sarcastic expressions. The lower
rank compared to Tamil indicates that our system faced greater difficulties with Malayalam, and that
several other systems were more adept at handling these challenges.</p>
      <p>These results underscore the varying levels of difficulty and competition present in sarcasm detection
for Tamil and Malayalam. The higher performance in Tamil suggests that our system is more attuned
to the linguistic characteristics of Tamil sarcasm, while the more modest performance in Malayalam
points to areas where further research and refinement are necessary. This study not only contributes
to the growing body of work in multilingual sarcasm detection but also highlights the importance of
continuing to develop robust NLP systems capable of addressing the complexities of underrepresented
languages.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Sarcasm detection has been a topic of significant interest in the field of natural language processing (NLP)
due to its complex and context-dependent nature [17]. Traditional approaches to sarcasm detection
primarily relied on feature-based methods, where linguistic cues such as punctuation, lexical patterns,
and syntactic structures were used to identify sarcasm [17]. For instance, works by [18] and [19] utilized
syntactic patterns and hashtag-based supervision to improve sarcasm detection in English texts. These
approaches, however, often struggled with generalization, particularly in scenarios involving nuanced
and culturally specific forms of sarcasm.</p>
      <p>[20] introduced an approach using deep neural networks that leveraged context to improve sarcasm
detection. Similarly, [21] explored attention-based models that better captured the dependencies
within a sentence that could indicate sarcasm. These deep learning models significantly advanced the
state-of-the-art in sarcasm detection, particularly in monolingual contexts.</p>
      <p>One of the primary challenges in dealing with code-mixed text is language identification [22],
which involves determining the language of each word or phrase within a mixed-language sentence.
Traditional methods for this task often relied on n-gram models, dictionary-based approaches, and
supervised learning techniques. However, these methods sometimes struggle with the complex linguistic
patterns present in code-mixed data, whereas more recent neural models are particularly effective at
capturing the contextual information necessary to accurately identify languages in code-mixed text.</p>
      <p>Another critical area of research is part-of-speech tagging [23] for code-mixed sentences. This
task becomes more complicated due to the interaction between different languages’ grammatical
structures. Researchers have employed techniques like Conditional Random Fields (CRFs) and
sequence-to-sequence models, adapting them to handle the intricacies of mixed-language data. The use of
multilingual embeddings, where words from different languages are mapped into a shared vector space,
has also been shown to improve the accuracy of part-of-speech tagging in code-mixed scenarios.</p>
      <p>In the context of sentiment analysis [24], code-mixing introduces additional complexity due to the
potential for differing sentiment expressions across languages. Traditional sentiment analysis tools
often fail when applied directly to code-mixed text. To overcome this, researchers have explored
methods such as using bilingual lexicons, translating the text into a single language before analysis,
and employing deep learning models that can process code-mixed input directly.</p>
      <p>Machine translation [25] for code-mixed text is another challenging area. Standard machine
translation systems are usually trained on monolingual data, making them less effective at handling code-mixed
sentences. Researchers have proposed various strategies to address this, including the use of synthetic
code-mixed data for training, as well as incorporating language tags and contextual embeddings into
translation models. These approaches aim to create more robust translation systems capable of handling
the nuances of code-mixing.</p>
      <p>Additionally, speech recognition [26] in code-mixed environments requires models that can
accurately transcribe spoken language that switches between different languages. This task is particularly
challenging because traditional speech recognition systems are typically designed for single-language
input. Recent advances involve the use of end-to-end models, such as attention-based encoder-decoder
architectures, which have shown promise in recognizing and transcribing code-mixed speech.</p>
      <p>Our contribution is situated at the intersection of sarcasm detection, code-mixed text processing, and
the application of large language models, addressing a critical gap in the literature and advancing the
state-of-the-art in this complex domain.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset</title>
      <p>The Tamil test dataset contains 6,338 comments, and the Malayalam test dataset contains 2,826
comments. Since we effectively used only the test datasets for our predictions, we report only the
statistics corresponding to the test datasets.</p>
      <p>Given that our evaluation focused entirely on the test datasets, the statistics mentioned correspond
to these specific datasets. The decision to use only the test datasets for prediction ensures that the
performance metrics, such as the macro-F1 scores, accurately reflect the model’s generalization capabilities
on unseen data.</p>
      <p>The Tamil testing dataset, with its larger size of 6,338 comments, provides a broad spectrum of inputs
for the model to process, potentially leading to more robust insights into the model’s strengths and
weaknesses. On the other hand, the smaller Malayalam testing dataset, with 2,826 comments, presents
its own set of challenges, particularly in a low-resource language context where data is often scarcer.</p>
      <p>By focusing on these testing datasets, our analysis remains rooted in real-world applicability, offering
a clear view of how well the models perform in practical scenarios without additional interventions or
training phases.</p>
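      <p>As a rough illustration of how these test-set statistics can be reproduced, the following sketch
loads the two test files and prints their sizes; the file names and CSV layout are assumptions made for
illustration rather than the official release format.</p>
      <preformat>
# Illustrative only: file names and column layout are assumptions, not the
# official shared-task release format.
import pandas as pd

tamil_test = pd.read_csv("tamil_sarcasm_test.csv")          # hypothetical path
malayalam_test = pd.read_csv("malayalam_sarcasm_test.csv")  # hypothetical path

# The counts reported in this section: 6,338 Tamil and 2,826 Malayalam comments.
print("Tamil test comments:", len(tamil_test))
print("Malayalam test comments:", len(malayalam_test))
      </preformat>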
    </sec>
    <sec id="sec-4">
      <title>4. Task Definition</title>
      <p>The task is to accurately classify YouTube comments that are written in Tamil-English and
Malayalam-English (code-mixed) into two distinct categories, Sarcastic and Non-sarcastic, using an AI model.</p>
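      <p>A minimal sketch of the label space, together with a normalizer that maps a raw model reply to one
of the two task labels, is given below; the helper name and matching rule are illustrative assumptions
rather than part of the shared-task specification.</p>
      <preformat>
# Binary label space for the task, plus a small normalizer for free-form replies.
# The function name and matching rule are illustrative assumptions.
LABELS = ("Sarcastic", "Non-sarcastic")

def normalize_label(raw_reply):
    """Map a free-form model reply to 'Sarcastic' or 'Non-sarcastic'."""
    text = raw_reply.strip().lower()
    # Check the negative class first, since "non-sarcastic" contains "sarcastic".
    if "non-sarcastic" in text or "not sarcastic" in text:
        return "Non-sarcastic"
    if "sarcastic" in text:
        return "Sarcastic"
    return "Non-sarcastic"  # conservative fallback for unexpected replies
      </preformat>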
    </sec>
    <sec id="sec-5">
      <title>5. Methodology</title>
      <p>
        Prompting [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is an effective approach for solving the problem of sarcasm detection in code-mixed
texts, particularly in under-resourced languages like Tamil-English and Malayalam-English, for several
reasons:
- Natural Language Understanding: Prompting allows the model to interpret and execute
tasks based on human-like instructions. By framing the classification task as a natural language
prompt [27], the model can utilize its language understanding capabilities to classify text according
to the specified categories. The prompt provides context to the model, guiding it to focus on
specific aspects of the input (e.g., determining sarcasm). This ensures that the model’s response
is aligned with the task at hand.
- Leverage of Pretrained Knowledge: Large language models like GPT-3.5 Turbo have been
trained on vast amounts of text data, encompassing diverse linguistic patterns, cultural contexts,
and language nuances [28]. Prompting allows us to tap into this rich, pre-trained knowledge
base, enabling the model to interpret sarcasm even in challenging code-mixed scenarios [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. By
carefully crafting prompts, we can guide the model to focus on specific aspects of the input, such
as tone, context, and language switching, which are critical for sarcasm detection.
- Reduced Need for Fine-Tuning: With the right prompt, large pre-trained models (like GPT) [29]
can perform various classification tasks without the need for extensive fine-tuning on task-specific
data. This is particularly useful when data or computational resources for fine-tuning are limited.
Prompts leverage the model’s existing knowledge, gained from vast pre-training on diverse text
corpora, enabling it to generalize to new tasks with minimal additional training.
- Adaptability to Multilingual and Code-Mixed Texts: Traditional machine learning models
often require large, labeled datasets for each specific task and language, which are scarce for many
code-mixed languages [30]. In contrast, prompting large language models provides a flexible way
to adapt to different languages and dialects, even when training data is limited [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The model’s
inherent ability to understand and generate text in multiple languages, including code-mixed
varieties, makes prompting an attractive approach for this task.
- Improved Interpretability: The explicit nature of prompts makes it easier to understand how
the model is interpreting the task [31], as the instructions are directly embedded in the input.
This transparency can help in analyzing and improving the model’s performance. By examining
the model’s responses to different prompts, researchers can gain insights into where and why the
model might be making classification errors, leading to more targeted improvements.
- Reduced Need for Task-Specific Data: Building a model from scratch or fine-tuning a model
for sarcasm detection in code-mixed texts would typically require a substantial amount of labeled
data, which is often difficult to obtain for under-resourced languages [32]. Prompting, however,
minimizes the need for such extensive labeled datasets, as it allows the model to utilize its general
language understanding capabilities with minimal task-specific adjustments [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. This makes it a
more feasible and efficient solution for resource-constrained settings.
- Flexibility and Rapid Experimentation: Prompting enables rapid experimentation with
different strategies to optimize model performance [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Researchers can quickly test various
prompt formulations to see which one elicits the most accurate and context-aware responses
from the model. This flexibility is particularly beneficial for complex tasks like sarcasm detection,
where the model’s interpretation can vary significantly depending on how the problem is framed.
- Scalability Across Different Scenarios: Once an effective prompting strategy is developed, it
can be easily adapted or scaled to other similar tasks or languages without the need for extensive
retraining [
        <xref ref-type="bibr" rid="ref9">33, 9</xref>
        ]. This is particularly useful in multilingual environments where a single model
might need to handle multiple languages or dialects. Prompting allows for the reuse of the same
model across different linguistic contexts with minimal modifications (a minimal prompt-construction sketch follows this list).
      </p>
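      <p>To make the prompting setup described above concrete, the following minimal sketch builds the
zero-shot classification prompt used in this work for a single comment; the function name and the
example comment are illustrative assumptions.</p>
      <preformat>
# Minimal sketch of zero-shot prompt construction for sarcasm classification.
# The wording mirrors the prompt reported in Section 5; the function name and
# the example comment are illustrative assumptions.
def build_sarcasm_prompt(comment):
    return (
        "Please Check whether the comment-" + comment + " is Sarcastic or "
        "Non-sarcastic. Only state Sarcastic or Non-sarcastic"
    )

print(build_sarcasm_prompt("Enna oru padam... waste of time!"))
      </preformat>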
      <p>In summary, prompting is a powerful and efficient approach to tackling the problem of sarcasm
detection in code-mixed texts, offering the benefits of leveraging large-scale pretrained models, adapting
to multilingual and code-mixed scenarios, reducing dependency on large annotated datasets, enabling
rapid experimentation, and providing scalability across different linguistic contexts.</p>
      <sec id="sec-5-0">
        <title>5.1. Prompt Engineering-Based Approach</title>
        <p>
          We used the GPT-3.5 Turbo model, via prompting in zero-shot mode, to solve the classification task.
Once the prompt is provided to the LLM, the following steps [
          <xref ref-type="bibr" rid="ref10 ref34 ref35">10, 34, 35</xref>
          ] occur internally while the model generates its output:
        </p>
      </sec>
      <sec id="sec-5-1">
        <title>Step 1: Tokenization</title>
        <p>Step 2: Embedding
• Prompt:  = [ 1,  2, … ,   ]
• The input text (prompt) is first tokenized into smaller units called tokens. These tokens are often
subwords or characters, depending on the model’s design.
• Tokenized Input:  = [ 1,  2, … ,   ]
• Each token is converted into a high-dimensional vector (embedding) using an embedding matrix
 .
• Embedding Matrix:  ∈ ℝ | |× , where | | is the size of the vocabulary and  is the embedding
dimension.</p>
        <p>• Embedded Tokens:  emb = [( 1), ( 2), … , (  )]</p>
      </sec>
      <sec id="sec-5-2">
        <title>Step 3: Positional Encoding</title>
        <p>• Since the model processes sequences, it adds positional information to the embeddings to capture
the order of tokens.
• Positional Encoding:  (  )
• Input to the Model:  =  emb +</p>
      </sec>
      <sec id="sec-5-3">
        <title>Step 4: Attention Mechanism (Transformer Architecture)</title>
        <p>• Attention Score Calculation: The model computes attention scores to determine the importance
of each token relative to others in the sequence.
• Attention Formula:</p>
        <p>Attention(,  ,  ) =
softmax (
) 
• where  (query),  (key), and  (value) are linear transformations of the input  .
• This attention mechanism is applied multiple times through multi-head attention, allowing the
model to focus on diferent parts of the sequence simultaneously.</p>
        <p>√ 
• The output of the attention mechanism is passed through feedforward neural networks, which
• Multiple layers of attention and feedforward networks are stacked, each with its own set of
parameters. This forms the ”deep” in deep learning.
• Layer Output:
 () = LayerNorm( () + Attention( () ,  () ,  () ))</p>
        <p>(+1) = LayerNorm( () + FFN( () ))</p>
      </sec>
      <sec id="sec-5-4">
        <title>Step 7: Output Generation</title>
      </sec>
      <sec id="sec-5-5">
        <title>Step 5: Feedforward Neural Networks</title>
        <p>apply non-linear transformations.
• Feedforward Layer:
• where  1,  2 are weight matrices and  1,  2 are biases.</p>
      </sec>
      <sec id="sec-5-6">
        <title>Step 6: Stacking Layers</title>
        <p>FFN() = max(0,  1 +  1) 2 +  2
• where   is the logit corresponding to token  in the vocabulary.
• The model generates the next token in the sequence based on the probability distribution, and
the process repeats until the end of the output sequence is reached.</p>
      </sec>
      <sec id="sec-5-7">
        <title>Step 8: Decoding</title>
        <p>• The predicted tokens are then decoded back into text, forming the final output.
• Output Text:  = [ 1,  2, … ,   ]
of 0.7, 0.8, and 0.9.</p>
        <sec id="sec-5-7-1">
          <title>Tamil and Malayalam.</title>
          <p>We used the following prompt for Tamil language for the purpose of classification: ” Please Check
whether the comment-&lt;Text&gt; is Sarcastic or Non-sarcastic. Only state Sarcastic or Non-sarcastic”. The
ifgure representing the methodology is shown in Figure
1.</p>
          <p>We used the following prompt for Malayalam language for the purpose of classification: ” Please
Check whether the comment-&lt;Text&gt; is Sarcastic or Non-sarcastic. Only state Sarcastic or Non-sarcastic”.
The figure representing the methodology is shown in Figure</p>
        </sec>
        <sec id="sec-5-7-2">
          <title>2. We run the GPT model at temperatures</title>
          <p>We have used two diferent figures (Figure</p>
          <p>1 and Figure 2) for the two sarcasm detection models in
• The final output of the stacked layers is a sequence of vectors.
• These vectors are projected back into the token space using a softmax layer to predict the next
token or word in the sequence.
• Softmax Function:
 (  | ) =</p>
          <p>exp(  )
∑|=| 1 exp(  )
(1)
(2)
(3)
(4)
(5)</p>
          <p>The methodology employed for the classification task involved using language-specific prompts to
determine whether a comment is sarcastic or non-sarcastic, with this approach applied to both Tamil and
Malayalam languages. For Tamil, the prompt used was: “Please Check whether the comment-&lt;Text&gt; is
Sarcastic or Non-sarcastic. Only state Sarcastic or Non-sarcastic”. This prompt was carefully crafted to
direct the model’s focus solely on the content of the comment, ensuring that the output would be either
”Sarcastic” or ”Non-sarcastic” without any additional context or information.</p>
          <p>Similarly, for Malayalam, the prompt used was: “Please Check whether the comment-&lt;Text&gt; is
Sarcastic or Non-sarcastic. Only state Sarcastic or Non-sarcastic”. This prompt, is similar in structure to
the Tamil prompt, was designed to maintain consistency in the classification task across both languages.</p>
          <p>The process began with the preparation of input, where comments in Tamil and Malayalam were fed
into the model using their respective prompts. The model then processed each comment according
to the instructions provided in the prompt, returning a binary classification of either ”Sarcastic” or
”Non-sarcastic” for each comment.</p>
          <p>In conclusion, this methodology ensured a consistent classification task across diferent languages
while accommodating the unique characteristics of Tamil and Malayalam. The use of clear and concise
prompts enabled the model to focus on the core task of sarcasm detection, leading to reliable and
interpretable results.</p>
        </sec>
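        <p>A minimal sketch of how such a zero-shot call can be issued is given below, assuming the openai
Python client; the model identifier, the temperature loop, and the example comment are illustrative,
and details such as batching, retries, and logging used in our experiments are omitted.</p>
        <preformat>
# Zero-shot classification sketch, assuming the openai Python client
# (pip install openai) and an OPENAI_API_KEY in the environment. The prompt
# and the temperatures 0.7, 0.8, and 0.9 mirror the description above; the
# example comment is an illustrative assumption.
from openai import OpenAI

client = OpenAI()

def classify_comment(comment, temperature=0.7):
    prompt = (
        "Please Check whether the comment-" + comment + " is Sarcastic or "
        "Non-sarcastic. Only state Sarcastic or Non-sarcastic"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return response.choices[0].message.content.strip()

for t in (0.7, 0.8, 0.9):
    print(t, classify_comment("Enna oru padam... waste of time!", temperature=t))
        </preformat>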
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Results</title>
      <p>Our team, TextTitans [36], obtained a macro-F1 score of 0.61 for the Tamil language, with a rank of 9th.
For the Malayalam language, the macro-F1 score is 0.50, with a rank of 13th.</p>
      <p>For Malayalam, the system achieved a macro-F1 score of 0.50, i.e., the unweighted mean of the
per-class F1 scores (each the harmonic mean of precision and recall), treating each class equally. This score places the
system in the 13th position among other competing models or systems. A rank of 13th suggests that
while the performance is moderate, there are several other systems that have performed better.</p>
      <p>For Tamil, the system has a macro-F1 score of 0.61, indicating better performance than the Malayalam
system in terms of the balance between precision and recall. However, despite the higher F1 score, this
system is ranked 9th. This lower rank, despite a better score, suggests that the competition for Tamil
language tasks was fiercer, with more systems achieving higher or comparable performance levels.</p>
      <p>Table 1 shows the classification report for Sarcasm Detection for Malayalam language. The model
performs well with a precision of 0.82, indicating that most of the time when it predicts a comment
as non-sarcastic, it is correct. However, with a recall of 0.73, it misses some non-sarcastic comments,
leading to a moderately high F1-score of 0.77. The model struggles with sarcasm detection, evident by a
low precision (0.18) and recall (0.27). This indicates the model has difficulty both identifying sarcastic
comments and correctly predicting them, resulting in a low F1-score of 0.22. The macro average across
both classes (0.50 for precision, recall, and F1-score) averages out the large disparity in performance
between the two classes. The weighted average, which accounts for the number of instances in each
class, gives a better sense of the model’s performance on the dataset as a whole and reflects the skew
towards the Non-sarcastic class due to its larger support.</p>
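      <p>For reference, the per-class precision, recall, and F1 together with the macro and weighted averages
discussed here can be computed as in the following sketch; the label lists are placeholders, and the
figures in Tables 1 and 2 come from the shared-task evaluation, not from this snippet.</p>
      <preformat>
# Sketch of how the per-class metrics and the macro/weighted averages discussed
# here are typically computed. y_true and y_pred are placeholder lists; the
# reported figures come from the shared-task evaluation.
from sklearn.metrics import classification_report, f1_score

y_true = ["Non-sarcastic", "Sarcastic", "Non-sarcastic", "Non-sarcastic"]
y_pred = ["Non-sarcastic", "Non-sarcastic", "Non-sarcastic", "Sarcastic"]

print(classification_report(y_true, y_pred, digits=2))
print("macro-F1:", f1_score(y_true, y_pred, average="macro"))
      </preformat>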
      <p>Table 2 shows the classification report for Sarcasm Detection for Tamil language. The model shows
strong performance with a precision, recall, and F1-score all at 0.79, indicating balanced detection and
prediction of non-sarcastic comments. While still less effective at detecting sarcasm, the model performs
considerably better than for Malayalam, with a precision, recall, and F1-score of 0.43 for the Sarcastic
class. The macro average improves to 0.61 for all three metrics, indicating better overall performance
across both classes, even though the model is still biased towards the Non-sarcastic class. The weighted
average is consistent across precision, recall, and F1-score at 0.69, showing that the model’s performance
is more balanced for Tamil than for Malayalam, likely due to better handling of the Sarcastic class.</p>
      <p>Error Analysis: According to Table 1, the sarcasm detection results for the Malayalam language
using GPT-3.5 reveal significant performance gaps, particularly in identifying sarcastic remarks. The
model’s low precision of 0.18 for the sarcastic class indicates a high rate of false positives, suggesting
that non-sarcastic comments are often misclassified as sarcastic. Additionally, a recall of 0.27 shows
that many actual sarcastic instances go undetected, resulting in a low F1-score of 0.22. In contrast,
the non-sarcastic class performs better, with a precision of 0.82 and an F1-score of 0.77, reflecting the
model’s stronger ability to identify straightforward statements. This imbalance highlights the model’s
struggle with the nuances of sarcasm, indicating the need for further refinement in training strategies
or additional data to enhance performance in Malayalam.</p>
      <p>Similarly, the sarcasm detection results for the Tamil language (See Table 2) indicate critical areas for
improvement. The model’s precision for sarcastic comments is only 0.43, which signifies a considerable
number of false positives, where non-sarcastic statements are misclassified as sarcastic. The recall
for the sarcastic class is also low at 0.43, suggesting that the model fails to identify many genuine
sarcastic instances, leading to an F1-score of 0.43. Conversely, the non-sarcastic class shows stronger
performance, achieving a precision, recall, and F1-score of 0.79. This indicates effective detection of
straightforward statements. The overall balanced performance, with a macro average F1-score of 0.61,
highlights the inconsistency in sarcasm detection. These findings underscore the necessity for improved
training methods or more diverse datasets to bolster the model’s capability to recognize sarcasm in
Tamil more effectively.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>The complexity of code-switching, coupled with the nuanced nature of sarcasm, presents significant
hurdles for traditional sentiment analysis systems, particularly in under-resourced languages. To tackle
these challenges, we experimented with state-of-the-art large language models like GPT-3.5 Turbo,
leveraging prompt-based techniques to classify comments and posts as sarcastic or non-sarcastic. Our
approach demonstrated the potential of prompting as an effective method for sarcasm detection in
multilingual and code-mixed environments, allowing us to utilize the extensive pretrained knowledge
of the model with minimal task-specific data. The presence of a new gold standard corpus for
Tamil-English and Malayalam-English code-mixed text contributes to the advancement of research in this
area, providing a valuable resource for the development and evaluation of sarcasm detection systems.
Overall, our work underscores the viability of prompt-based techniques for addressing the unique
challenges of sarcasm detection in code-mixed languages and paves the way for future studies aimed
at improving systems in multilingual social media contexts. We hope that this research will inspire
continued exploration of advanced language models in tackling the diverse and evolving landscape of
digital communication.</p>
      <p>The evaluation of our sarcasm detection systems for Malayalam and Tamil languages provides
valuable insights into the performance and challenges associated with handling these languages. For
Malayalam, the system achieved a macro-F1 score of 0.50, which, while representing a moderate balance
between precision and recall across classes, placed the system at the 13th position among competing
models. This rank indicates that while our system performs adequately, there are several other models
that have outperformed it in the specific context of Malayalam sarcasm detection.</p>
      <p>In contrast, the Tamil sarcasm detection system achieved a higher macro-F1 score of 0.61, reflecting a
better overall balance between precision and recall compared to the Malayalam system. Despite this
improved score, the Tamil system was ranked 9th among competitors. This lower ranking, even with a
higher F1 score, highlights the more competitive landscape for Tamil language tasks, where several
systems have achieved similar or better performance levels.</p>
      <p>These results underscore the varying degrees of difficulty in sarcasm detection across different
languages and the importance of context in interpreting performance metrics like the macro-F1 score.
The differences in ranking also suggest that the complexity of the language, the nature of the testing
data, and the sophistication of competing models play significant roles in determining a system’s
success.</p>
      <p>In future work, it will be crucial to explore ways to enhance the performance of both systems,
particularly by focusing on the nuances of sarcasm in these languages, improving the quality and
diversity of the training data, and leveraging more advanced modeling techniques. Additionally, a more
in-depth analysis of the specific errors made by the systems could provide further insights into areas
that require targeted improvements. Ultimately, the goal is to develop robust and competitive systems
that can more effectively handle the challenges of sarcasm detection in both Malayalam and Tamil,
contributing to the broader field of natural language processing for low-resource languages.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used ChatGPT in order to: Drafting content, Grammar
and spelling check, etc. After using this tool/service, the author(s) reviewed and edited the content as
needed and take(s) full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H. T.</given-names>
            <surname>Uchiyama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. N.</given-names>
            <surname>Saito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. C.</given-names>
            <surname>Tanabe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Harada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Seki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ohno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Koeda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Sadato</surname>
          </string-name>
          ,
          <article-title>Distinction between the literal and intended meanings of sentences: A functional magnetic resonance imaging study of metaphor and sarcasm</article-title>
          , cortex
          <volume>48</volume>
          (
          <year>2012</year>
          )
          <fpage>563</fpage>
          -
          <lpage>583</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <source>Sarcasm Detection Incorporating Context &amp; World Knowledge, Ph.D. thesis, COOPER UNION</source>
          ,
          <year>1914</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Bin</surname>
          </string-name>
          <string-name>
            <surname>Razali</surname>
          </string-name>
          ,
          <article-title>Figurative language detection using deep learning and contextual features (</article-title>
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Jose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Suryawanshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sherly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <article-title>A survey of current datasets for code-switching research</article-title>
          ,
          <source>in: 2020 6th international conference on advanced computing and communication systems (ICACCS)</source>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>136</fpage>
          -
          <lpage>141</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yuan</surname>
          </string-name>
          , G. Seed,
          <article-title>Building educational technologies for code-switching: Current practices, difficulties and future directions</article-title>
          ,
          <source>Languages</source>
          <volume>7</volume>
          (
          <year>2022</year>
          )
          <fpage>220</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Muralidaran</surname>
          </string-name>
          , N. Jose,
          <string-name>
            <given-names>S.</given-names>
            <surname>Suryawanshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sherly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <article-title>Dravidiancodemix: Sentiment analysis and offensive language identification dataset for dravidian languages in code-mixed text</article-title>
          ,
          <source>Language Resources and Evaluation</source>
          <volume>56</volume>
          (
          <year>2022</year>
          )
          <fpage>765</fpage>
          -
          <lpage>806</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>V.</given-names>
            <surname>Ramaswamy</surname>
          </string-name>
          ,
          <article-title>Historical dictionary of the Tamils</article-title>
          ,
          <source>Rowman &amp; Littlefield</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>N. B. FalDessai</surname>
          </string-name>
          ,
          <article-title>Development of a Text to Speech System for Devanagari Konkani</article-title>
          ,
          <source>Ph.D. thesis</source>
          , Goa University,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hayashi</surname>
          </string-name>
          , G. Neubig,
          <article-title>Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>55</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>T. B. Brown</surname>
          </string-name>
          ,
          <article-title>Language models are few-shot learners</article-title>
          , arXiv preprint arXiv:2005.14165 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          , J. Jose, P. Kumaresan,
          <string-name>
            <given-names>S.</given-names>
            <surname>Muralidaran</surname>
          </string-name>
          ,
          <article-title>Findings of the shared task on sentiment analysis for dravidian languages in code-mixed text</article-title>
          ,
          <source>in: Proceedings of the first workshop on speech and language technologies for Dravidian languages</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>133</fpage>
          -
          <lpage>139</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ponnusamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Kumaresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <article-title>How can we detect homophobia and transphobia? experiments in a multilingual code-mixed setting for social media governance</article-title>
          ,
          <source>International Journal of Information Management Data Insights</source>
          <volume>2</volume>
          (
          <year>2022</year>
          )
          <fpage>100119</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <article-title>Hope speech detection in youtube comments</article-title>
          ,
          <source>Social Network Analysis and Mining</source>
          <volume>12</volume>
          (
          <year>2022</year>
          )
          <fpage>75</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Sripriya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bharathi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Nandhini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. C.</given-names>
            <surname>Navaneethakrishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Durairaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ponnusamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Kumaresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. K.</given-names>
            <surname>Ponnusamy</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.</surname>
          </string-name>
          <article-title>Rajkumar, Overview of the shared task on sarcasm identification of dravidian languages (malayalam and tamil) in dravidiancodemix, in: Forum of Information Retrieval and Evaluation FIRE-</article-title>
          <year>2023</year>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Sripriya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bharathi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Nandhini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. Chinnaudayar</given-names>
            <surname>Navaneethakrishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Durairaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ponnusamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Kumaresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. K.</given-names>
            <surname>Ponnusamy</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.</surname>
          </string-name>
          <article-title>Rajkumar, Overview of the shared task on sarcasm identification of Dravidian languages (Malayalam and Tamil) in DravidianCodeMix, in: Forum of Information Retrieval and Evaluation FIRE -</article-title>
          <year>2023</year>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>N.</given-names>
            <surname>Sripriya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Durairaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Nandhini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bharathi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. K.</given-names>
            <surname>Ponnusamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Rajkumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Kumaresan</surname>
          </string-name>
          ,
          R. Ponnusamy, C. Subalalitha, B. R. Chakravarthi,
          <article-title>Findings of shared task on sarcasm identification in code-mixed Dravidian languages</article-title>
          , FIRE 2023 16 (
          <year>2023</year>
          ) 22.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] A. D. Dave, N. P. Desai, A comprehensive study of classification techniques for sarcasm detection on textual data, in: 2016 International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT), IEEE, 2016, pp. 1985–1991.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] D. Davidov, O. Tsur, A. Rappoport, Enhanced sentiment learning using twitter hashtags and smileys, in: Coling 2010: Posters, 2010, pp. 241–249.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] E. Riloff, A. Qadir, P. Surve, L. De Silva, N. Gilbert, R. Huang, Sarcasm as contrast between a positive sentiment and negative situation, in: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 2013, pp. 704–714.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] A. Ghosh, T. Veale, Fracking sarcasm using neural network, in: Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, 2016, pp. 161–169.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] Y. Tay, L. A. Tuan, S. C. Hui, J. Su, Reasoning with sarcasm by reading in-between, arXiv preprint arXiv:1805.02856 (2018).</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] A. F. Hidayatullah, A. Qazi, D. T. C. Lai, R. A. Apong, A systematic review on language identification of code-mixed text: techniques, data availability, challenges, and framework development, IEEE Access 10 (2022) 122812–122831.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] R. Sequiera, M. Choudhury, K. Bali, POS tagging of Hindi-English code mixed text from social media: Some machine learning experiments, in: Proceedings of the 12th International Conference on Natural Language Processing, 2015, pp. 237–246.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] B. R. Chakravarthi, R. Priyadharshini, V. Muralidaran, S. Suryawanshi, N. Jose, E. Sherly, J. P. McCrae, Overview of the track on sentiment analysis for Dravidian languages in code-mixed text, in: Proceedings of the 12th Annual Meeting of the Forum for Information Retrieval Evaluation, 2020, pp. 21–24.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] G. Jawahar, E. M. B. Nagoudi, M. Abdul-Mageed, L. V. Lakshmanan, Exploring text-to-text transformers for English to Hinglish machine translation with synthetic code-mixing, arXiv preprint arXiv:2105.08807 (2021).</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[26] Y. Long, Y. Li, Q. Zhang, S. Wei, H. Ye, J. Yang, Acoustic data augmentation for Mandarin-English code-switching speech recognition, Applied Acoustics 161 (2020) 107175.</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>[27] J. Allen, Natural language understanding, Benjamin-Cummings Publishing Co., Inc., 1988.</mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>[28] M. U. Hadi, R. Qureshi, A. Shah, M. Irfan, A. Zafar, M. B. Shaikh, N. Akhtar, J. Wu, S. Mirjalili, et al., Large language models: a comprehensive survey of its applications, challenges, limitations, and future prospects, Authorea Preprints (2023).</mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>[29] P. Colon-Hernandez, C. Havasi, J. Alonso, M. Huggins, C. Breazeal, Combining pre-trained language models and structured knowledge, arXiv preprint arXiv:2101.12294 (2021).</mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>[30] A. F. Hidayatullah, A. Qazi, D. T. C. Lai, R. A. Apong, A systematic review on language identification of code-mixed text: techniques, data availability, challenges, and framework development, IEEE Access 10 (2022) 122812–122831.</mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>[31] Y. W. Jie, R. Satapathy, G. S. Mong, E. Cambria, et al., How interpretable are reasoning explanations from prompting large language models?, arXiv preprint arXiv:2402.11863 (2024).</mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>[32] K. Goswami, Unsupervised deep representation learning for low-resourced, LASER 77 (2012) 79–7.</mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>[33] M. G. Z. A. Khan, M. F. Naeem, L. Van Gool, D. Stricker, F. Tombari, M. Z. Afzal, Introducing language guidance in prompt-based continual learning, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 11463–11473.</mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>[34] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, Advances in Neural Information Processing Systems 30 (2017) 5998–6008.</mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>[35] A. Radford, Improving language understanding by generative pre-training (2018).</mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>[36] B. R. Chakravarthi, S. N, B. B, N. K, T. Durairaj, R. Ponnusamy, P. K. Kumaresan, K. K. Ponnusamy, C. Rajkumar, Overview of sarcasm identification of Dravidian languages in DravidianCodeMix@FIRE2024, in: Forum of Information Retrieval and Evaluation FIRE - 2024, DAIICT, Gandhinagar, 2024.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>