<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Forum for Information Retrieval Evaluation, December</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Based on Ensembles of Pre-Trained Models for Sarcasm Identification in Dravidian Languages</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jaime Cerda-Flores</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniel Castro-Pineda</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Miguel G. Juarez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rodrigo I Hernandez-Mazariegos</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jaime Cerda-Jacobo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mario Graff</string-name>
          <email>mario.graff@infotec.mx</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>José Ortiz-Bejar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Stacking</institution>
          ,
          <addr-line>XGBoost, FastText, mTC</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>1</volume>
      <fpage>2</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>This manuscript describes the participation of the UMSNH NLP Team in the Sarcasm Identification of Dravidian Languages: Tamil &amp; Malayalam task at FIRE 2024. Our approach combines bag-of-words and deep learning models, solving the task independently. We then construct a new feature space by leveraging the decision functions of the individual models. This new feature space is fed into an XGBoost classifier to make a final prediction. The generic text categorization system, FastText, achieves the best performance for the Tamil-based task with a macro F1-score of 0.74. However, our combined model improves upon individual performances for the Malayalam task with a macro F1-score of 0.76.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Task description</title>
      <p>
        The task Sarcasm Identification of Dravidian Languages: Tamil &amp; Malayalam aims to identify sarcasm and
sentiment polarity in code-mixed YouTube comments and posts written in Tamil-English and
Malayalam-English. Each corpus entry is annotated as either Sarcastic or Non-Sarcastic. More details about
corpus creation and labeling can be found in references [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5">1, 2, 3, 4, 5</xref>
        ].
      </p>
      <p>For the task, organizers provide two datasets per language: one labeled dataset for training and an
unlabeled one for evaluation. Furthermore, the training set of each language is split into a smaller
validation set and a larger training set. Table 2 shows the composition of the training and testing sets.</p>
      <table-wrap>
        <label>Table 2</label>
        <caption>
          <p>Composition of the training and testing sets.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Language</th><th>Dataset</th><th>Total Instances</th><th>Non-sarcastic</th><th>Sarcastic</th></tr>
          </thead>
          <tbody>
            <tr><td>Malayalam</td><td>Train</td><td/><td/><td/></tr>
            <tr><td>Malayalam</td><td>Test</td><td/><td/><td/></tr>
            <tr><td>Tamil</td><td>Train</td><td/><td/><td/></tr>
            <tr><td>Tamil</td><td>Test</td><td/><td/><td/></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-3">
      <title>3. Related Work</title>
      <p>
        Transformers, introduced by Vaswani et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], revolutionized natural language processing by relying
on a self-attention mechanism to capture long-range dependencies in text. Since their introduction,
transformers have been successfully applied to various tasks, such as language understanding with
BERT [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and text generation through GPT-2 [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], demonstrating state-of-the-art performance
across multiple domains.
      </p>
      <p>
        Recently, transformer-based models have also been widely used for text classification, as can be
observed in the previous edition of the Sarcasm Identification of Dravidian Languages task at the
Forum for Information Retrieval Evaluation 2023 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Among these, the best-performing system for
Tamil-English was that of Bhaumik and Das [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], who used MuRIL [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and
also achieved second place for Malayalam-English.
      </p>
      <p>
        However, not all participants chose to employ transformer-based models. The work presented
by Krishnan et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] used simpler classification methods (MLPs, Random Forests, KNN,
and SVMs) together with count- and frequency-based feature extraction, as opposed to the word
embeddings used in transformers. This implementation proved to be effective, as it placed the authors in
first place for Malayalam-English, demonstrating the continued value of shallow-learning
approaches.
      </p>
      <p>
        FastText, introduced by Joulin et al., is an approach to text classification that can be considered
a modification of the popular Word2Vec algorithm [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Instead of training word embeddings to predict a word from its context, as in the Continuous
Bag of Words model, it trains them to predict the predefined labels directly. In [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], it was shown to perform competitively in sentiment analysis tasks.
      </p>
      <p>XGBoost is an optimized gradient boosting framework that has become highly popular in machine
learning due to its efficiency and performance in predictive modeling [13]. XGBoost has achieved
state-of-the-art results in various structured data tasks, such as the classification of fake news [14].</p>
      <p>Ensembles built by stacking the outputs of different classifiers have been used in [15, 16]. In [15], the authors
used this method to classify sarcastic tweets that did not contain the string “#sarcasm”. The approach
in [16] sought to enhance the performance of individual classifiers and showed competitive results.
Nevertheless, performance seemed to be affected by the use of subsets of the original database for
model training, which was required by computational limitations. This limitation is addressed in this work.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Our approach</title>
      <p>Roughly, our approach consists of first applying text pre-processing. The processed text is used to train
and optimize multiple text classification models. Then, the decision-function values of these models are stacked
to produce a new feature space, which is used to train an XGBoost instance; this flow is depicted by the
orange line in Figure 1. For the prediction phase, after performing the pre-processing, the stacked
space is created from the predictions made by each model and finally fed to the XGBoost classifier to
make the final decision; this phase is indicated by the blue line in Figure 1.</p>
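      <p>The stacking step can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function name stack_outputs and all numeric values are hypothetical, and the resulting rows would be passed to the final classifier in a separate step.</p>
      <preformat>
```python
def stack_outputs(model_outputs):
    """Horizontally stack per-model outputs into one feature matrix.

    model_outputs: list with one entry per model; each entry is a list of
    rows (one row per sample) holding that model's decision values.
    """
    n_samples = len(model_outputs[0])
    for rows in model_outputs:
        if len(rows) != n_samples:
            raise ValueError("every model must score the same samples")
    stacked = []
    for i in range(n_samples):
        features = []
        for rows in model_outputs:
            features.extend(rows[i])  # append this model's values for sample i
        stacked.append(features)
    return stacked


# Hypothetical outputs for three samples (all values are made up).
svm_decision = [[0.8], [-1.2], [0.1]]                   # one signed margin per sample
fasttext_proba = [[0.3, 0.7], [0.9, 0.1], [0.5, 0.5]]   # two class probabilities
bert_proba = [[0.2, 0.8], [0.6, 0.4], [0.4, 0.6]]

features = stack_outputs([svm_decision, fasttext_proba, bert_proba])
print(features[0])
```
      </preformat>
      <p>Each row of the stacked matrix describes one sample, so a model that emits a single margin contributes one column while a model that emits class probabilities contributes two.</p>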
      <p>Figure 1. Diagram nodes: Labelled Corpus, Test set, Preprocessing, Train, Validation, Fit models, Stacked Decision functions, XGBoost.</p>
      <p>The rest of the document is organized as follows: Section 4.1 explains the pre-processing details.
The basis of the different evaluated classifiers is described in Section 4.2. The experimental setup and results
are presented in Section 5. Finally, Section 6 gives some conclusions derived from the results.</p>
      <p>4.1. Pre-processing</p>
      <p>One pre-processing approach is to transliterate the text into a specific alphabet, such as the Malayalam
script or Latin script. This method, while less robust, offers a high level of accuracy, lower computational
cost, and superior consistency for this task.</p>
      <p>Text to Tamil/Malayalam script means transliterating the text into the Tamil/Malayalam alphabet; Text
to Latin Script means transliterating the text from the Tamil/Malayalam alphabet to Latin; Raw text
means no pre-processing was done. Text normalization means the text was processed further, for example
by removing punctuation marks and URLs or by lower casing.</p>
      <sec id="sec-4-1">
        <title>4.1.1. Tamil/Malayalam Script</title>
        <p>The primary objective of this pre-processing task is to identify the alphabet in which the phrase is
written and, if needed, transliterate it to the desired alphabet. This is to ensure that the entire data set,
in both Malayalam and Tamil languages, is in a single alphabet, specifically the Latin alphabet.</p>
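        <p>The first half of this task, identifying the alphabet of a phrase, can be sketched with the Unicode block ranges for Tamil (U+0B80 to U+0BFF) and Malayalam (U+0D00 to U+0D7F). The function name detect_script and the majority-vote rule are assumptions made for illustration; the transliteration itself is not shown.</p>
        <preformat>
```python
def detect_script(text):
    """Return the dominant script among 'tamil', 'malayalam' and 'latin'."""
    counts = {"tamil": 0, "malayalam": 0, "latin": 0}
    for ch in text:
        code = ord(ch)
        if code in range(0x0B80, 0x0C00):      # Unicode Tamil block
            counts["tamil"] += 1
        elif code in range(0x0D00, 0x0D80):    # Unicode Malayalam block
            counts["malayalam"] += 1
        elif ch.isascii() and ch.isalpha():    # basic Latin letters
            counts["latin"] += 1
    best = max(counts, key=counts.get)
    # Text with no letters from any of the three scripts is left unlabeled.
    return best if counts[best] > 0 else "unknown"


print(detect_script("super movie"))
```
        </preformat>
        <p>For code-mixed comments, a per-word variant of the same check would allow only the words in the non-target script to be transliterated.</p>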
      </sec>
      <sec id="sec-4-2">
        <title>4.1.2. To Latin Script</title>
        <p>This pre-processing task involves transliterating Malayalam and Tamil texts from their native scripts into the
Latin alphabet. This is to take advantage of the tokenizers of some multilingual BERT models that
require the input in Latin script.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.1.3. Text Normalization</title>
        <p>Text normalization aims to transform raw text into a standardized format that is easier for machine learning
algorithms to process. It reduces complexities and differences in text, such as spelling, formatting, or
casing, so that tokenization and model training can focus on patterns in the text rather than noise.
Normalization procedures, such as lower casing, punctuation and special character removal, stemming,
lemmatization, stop word removal, handling of numbers, usernames, or emails, dealing with emojis or
emoticons, and handling spelling mistakes, play a vital role in enhancing the quality of text data for
machine learning.</p>
        <p>4.2. Classification Approaches</p>
        <p>4.2.1. μTC</p>
        <p>μTC is a text classification framework designed to efficiently find a competitive text classifier by treating
the process as a combinatorial optimization problem [17]. It defines a large configuration space, which
includes text transformation, tokenization, vectorization, and classification functions. Given the vast
number of possible configurations (over 45 million at the time of writing), evaluating all of them
is computationally impractical. To address this, μTC employs the meta-heuristic search techniques
Random Search and Hill Climbing to navigate the configuration space efficiently.</p>
        <p>In this framework, the objective is to maximize the score function, set as the Macro F1-score, but
this setting is customizable. This score evaluates a classifier trained on a given dataset by comparing
predicted labels against true labels.</p>
        <p>The optimization process begins with Random Search, which evaluates a random subset of configurations and
finds the best-performing one. Hill Climbing then explores the neighborhood of this best configuration
to find further improvements.</p>
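        <p>The two-stage search can be sketched as follows. This is an illustrative toy, not the framework's code: the configuration space SPACE, the function names, and toy_score (a stand-in for the cross-validated macro F1-score of a trained classifier) are all assumptions.</p>
        <preformat>
```python
import random

# Hypothetical, tiny configuration space (the real one is far larger).
SPACE = {
    "lower_case": [True, False],
    "url_handler": ["none", "delete", "group"],
    "weighting": ["tf", "tfidf"],
    "max_ngram": [1, 2, 3],
}


def random_config(rng):
    """Draw one configuration uniformly at random."""
    return {name: rng.choice(values) for name, values in SPACE.items()}


def neighbors(config):
    """All configurations that differ from config in exactly one parameter."""
    for name, values in SPACE.items():
        for value in values:
            if value != config[name]:
                yield {**config, name: value}


def search(score, n_random=8, seed=0):
    """Random search followed by hill climbing; returns (config, score)."""
    rng = random.Random(seed)
    best = max((random_config(rng) for _ in range(n_random)), key=score)
    best_score = score(best)
    improved = True
    while improved:
        improved = False
        for candidate in neighbors(best):
            candidate_score = score(candidate)
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
                improved = True
                break  # restart from the improved configuration
    return best, best_score


def toy_score(config):
    """Stand-in for the macro F1-score of a classifier built from config."""
    return (
        0.5
        + 0.1 * config["lower_case"]
        + 0.1 * (config["weighting"] == "tfidf")
        + 0.05 * config["max_ngram"]
        + 0.05 * (config["url_handler"] == "group")
    )


best, best_score = search(toy_score)
print(best, round(best_score, 2))
```
        </preformat>
        <p>In this toy setting every parameter contributes independently to the score, so hill climbing always reaches the global optimum; with a real score function it may stop at a local optimum, which is why the random-search stage is used to pick a good starting point.</p>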
        <p>
          The configuration space includes functions for text transformation (e.g., handling hashtags, URLs, and
emojis), tokenization (e.g., word n-grams, character q-grams), vectorization (e.g., TF, TF-IDF), and
classification (using an unoptimized SVM with a linear kernel). This flexible setup allows μTC to handle
various text preprocessing and classification tasks.
        </p>
        <p>4.2.2. BERT</p>
        <p>
          BERT (Bidirectional Encoder Representations from Transformers) [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and RoBERTa (Robustly
Optimized BERT Pretraining Approach) [18] are transformer-based text models that have shown excellent
performance on different benchmarks such as GLUE, RACE, and SQuAD, to name a few. Both
RoBERTa and BERT are designed as bidirectional models that can predict words conditioned
on both the left and right contexts. This feature has shown outstanding results in multiple natural
language processing tasks, such as text classification.
        </p>
        <p>The BERT transformer model is pre-trained using static MLM (masked language modeling) and NSP
(next-sentence prediction). In the MLM task, about 15% of the tokens in each sequence are masked, and
the model is trained to predict them. RoBERTa, on the other hand, is built on top of BERT and modifies
key hyperparameters. In the pre-training stage, it uses dynamic MLM, where the masked tokens
change each time a sequence is seen. It also completely removes the NSP step.</p>
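        <p>The difference between static and dynamic masking can be sketched as follows. This is a simplified illustration with hypothetical names: every selected token is replaced by a mask symbol, whereas BERT's actual procedure also leaves some selected tokens unchanged or swaps them for random tokens.</p>
        <preformat>
```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, rng, rate=0.15):
    """Replace roughly `rate` of the tokens (at least one) with MASK."""
    n_masked = max(1, round(rate * len(tokens)))
    positions = set(rng.sample(range(len(tokens)), n_masked))
    return [MASK if i in positions else tok for i, tok in enumerate(tokens)]


tokens = "this movie was so good that i slept through all of it".split()

# Static masking: the pattern is drawn once and reused in every epoch.
static = mask_tokens(tokens, random.Random(0))
static_epochs = [static for _ in range(3)]

# Dynamic masking: a fresh pattern is drawn each time the sequence is seen.
rng = random.Random(0)
dynamic_epochs = [mask_tokens(tokens, rng) for _ in range(3)]

for epoch in dynamic_epochs:
    print(" ".join(epoch))
```
        </preformat>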
        <p>Fine-tuning</p>
        <p>Fine-tuning is the process in which the parameters learned by a pre-trained model are
transferred to a new model, where they serve as the starting point for a natural language task such as text
classification.</p>
      </sec>
      <sec id="sec-4-4">
        <table-wrap>
          <caption>
            <p>Pre-training approach and tokenizer of BERT and RoBERTa.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Model</th><th>Pre-training approach</th><th>Tokenizer</th></tr>
            </thead>
            <tbody>
              <tr><td>BERT</td><td>Static MLM, NSP</td><td>WordPiece</td></tr>
              <tr><td>RoBERTa</td><td>Dynamic MLM</td><td>BPE</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>For the fine-tuning done in this study, a sequence classification instance was generated
for each pre-trained BERT or RoBERTa model. These instances contained labels for two
categories: “sarcastic” and “nonsarcastic”. Data sets were loaded, tokenized, and padded to the maximum length of the
models (512 tokens). Table 3 shows the different models fine-tuned for the sarcasm detection task.</p>
      </sec>
      <sec id="sec-4-7">
        <p>Table 3. Columns: Target Language, Preprocessing, Hyperparameters, Model Name.</p>
        <p>
          MuRIL is a BERT-based model pre-trained on 17 Indian languages and their transliterated versions [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
It has been used in previous text classification competitions with positive results [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. MuRIL follows a
training approach similar to that of multilingual BERT but includes some modifications: training uses translation
and transliteration segment pairs and applies an upsampling factor of 0.3, enhancing performance
for low-resource languages. It is trained on monolingual data (Wikipedia and Common Crawl),
translated data (translations of the monolingual data and the PMINDIA dataset), and transliterated
data (transliterations from Wikipedia and the Dakshina dataset). The model was trained using whole
word masking for 1000K steps, with a batch size of 4096 and a sequence length of 512. It focuses on
self-supervised masked language modeling.
        </p>
      </sec>
      <sec id="sec-4-8">
        <title>4.2.3. FastText</title>
        <p>FastText is a library predominantly used for generating word embeddings. It is also effective for text
classification through techniques like bag-of-words and subword information. It includes various
unsupervised and supervised learning algorithms. FastText has pre-trained models for 294
languages, including German, Spanish, French, and Czech. One main characteristic of fastText is its
ability to generate word vectors even for unknown, out-of-vocabulary (OOV) words or concatenations
of different words. This is possible because word vectors are built from the character substrings of
the OOV words.</p>
        <p>4.3. Ensemble</p>
        <p>The ensemble approach consisted of creating a new Vector Space Model (VSM) by horizontally stacking
the output probabilities and decision functions obtained from the previous classification approaches,
which were observed to perform relatively well individually. The VSM is then used to fit an XGBoost
classifier optimized through randomized search cross-validation, which performs the final prediction
task. The dimension of the VSM used as input data is given by the product of the number of classes, the
number of samples in the dataset, and the number of outputs used, that is, the number of approaches
selected to create the VSM. In this task, the number of classes is always 2 for every
approach, save for μTC, as the output decision function of its SVM classifier is a single number that is positive
for a sarcastic prediction and negative for a non-sarcastic prediction, whereas the
rest of the approaches output a pair of probabilities, one for each class.</p>
        <p>To obtain the outputs used to train the XGBoost classifiers, as well as to validate them and
get the final predictions for the test set, the models obtained from each approach were applied to the
training sets, the validation sets, and the test sets of each language.</p>
        <p>4.3.1. XGBoost</p>
        <p>XGBoost is an open-source machine learning library designed to implement efficient distributed gradient
boosting algorithms [19]. It has been widely used by winning teams in machine learning competitions.
The core concept behind gradient boosting is using weak models, in this case shallow decision trees,
to build a strong ensemble model. This building process is done sequentially, until a specified number
of iterations is met. Each newly built model tries to correct the errors made by the previous models by
minimizing a specified loss function using its gradient, the first derivative of the loss function.
Unlike the standard gradient boosting algorithm, XGBoost uses the second derivative of the loss function,
its Hessian, as well as the first derivative, essentially turning the gradient descent optimization into
Newton-Raphson optimization. It also uses L1 and L2 regularization to control model complexity,
making it less prone to overfitting by penalizing large trees and overly complex models.</p>
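        <p>The role of the first and second derivatives can be made concrete with a small sketch of the second-order (Newton) update for a single tree leaf under the logistic loss, using only L2 regularization. The function names and the example values are assumptions for illustration; the library's own implementation additionally handles L1 regularization, tree structure search, and shrinkage.</p>
        <preformat>
```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def leaf_weight(labels, scores, reg_lambda=1.0):
    """Second-order (Newton) leaf weight for the logistic loss.

    labels: 0/1 targets of the samples that fall into the leaf.
    scores: current raw (pre-sigmoid) predictions for those samples.
    """
    grad_sum = 0.0
    hess_sum = 0.0
    for y, s in zip(labels, scores):
        p = sigmoid(s)
        grad_sum += p - y          # first derivative of the logistic loss
        hess_sum += p * (1.0 - p)  # second derivative of the logistic loss
    # Newton step with an L2 penalty on the leaf weight.
    return -grad_sum / (hess_sum + reg_lambda)


# Three samples in one leaf, all starting from a raw score of 0 (p = 0.5).
w = leaf_weight(labels=[1, 1, 0], scores=[0.0, 0.0, 0.0])
print(w)
```
        </preformat>
        <p>A larger reg_lambda shrinks the leaf weight toward zero, which is one way the regularization term limits how strongly any single tree can correct the current predictions.</p>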
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experiments and results</title>
      <p>For our experiments, we used only the training and validation sets as provided by the competition
organizers. The compositions of each set are shown in Table 5.</p>
      <sec id="sec-5-3">
        <table-wrap>
          <caption>
            <p>Composition of the training and validation sets.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Language</th><th>Dataset</th><th>Total Instances</th></tr>
            </thead>
            <tbody>
              <tr><td>Malayalam</td><td>Train</td><td>13188</td></tr>
              <tr><td>Malayalam</td><td>Validation</td><td>2826</td></tr>
              <tr><td>Tamil</td><td>Train</td><td/></tr>
              <tr><td>Tamil</td><td>Validation</td><td/></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>As the μTC classification pipeline involves a series of text pre-processing steps, the raw datasets were
fed to it as is, specifying the use of 5-fold cross-validation and a configuration-space search over
80 points. A single model was trained per language. Table 5 shows the parameters obtained for
Malayalam and Tamil after optimization.</p>
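        <table-wrap>
          <caption>
            <p>μTC parameters obtained for Malayalam and Tamil after optimization.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Parameter name</th><th>Malayalam</th><th>Tamil</th></tr>
            </thead>
            <tbody>
              <tr><td>lower-case</td><td>false</td><td>true</td></tr>
              <tr><td>emojis-handler</td><td>none</td><td>none</td></tr>
              <tr><td>hashtag-handlers</td><td>none</td><td>delete</td></tr>
              <tr><td>url-handler</td><td>delete</td><td>delete</td></tr>
              <tr><td>user-handler</td><td>group</td><td>delete</td></tr>
              <tr><td>number-handler</td><td>none</td><td>delete</td></tr>
              <tr><td>diacritic-removal</td><td>true</td><td>true</td></tr>
              <tr><td>duplication-removal</td><td>true</td><td>true</td></tr>
              <tr><td>punctuation-removal</td><td>true</td><td>false</td></tr>
              <tr><td>-grams</td><td>1, 3, 4</td><td>1, 2, 3, 5, 9</td></tr>
              <tr><td>-grams</td><td>3, 2, 1</td><td>3, 2</td></tr>
              <tr><td>skip-grams</td><td>(3,1)</td><td>none</td></tr>
              <tr><td>weighting scheme</td><td>tfidf</td><td>tfidf</td></tr>
              <tr><td>token-max-filter</td><td>1</td><td>1</td></tr>
              <tr><td>token-min-filter</td><td>-1</td><td>-1</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-5-4">
        <p>For the use of FastText, all training and validation sets, as well as the test sets, were preprocessed by
performing the following text normalization procedures using a custom function:
• Lower casing
• URL removal
• Removal of usernames
• Removal of non-alphanumerical characters
• Conversion of emojis and emoticons to textual representation</p>
        <p>In addition to this, the training and validation sets were transformed to comply with the default
format required by FastText. That is, the string “__label__” followed by the true label of the sample was
placed at the beginning of every sample in each set.</p>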
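        <p>A minimal sketch of such a normalization function and of the label prefix is given below. The regular expressions and function names are assumptions, not the authors' custom function. The emoji-to-text conversion is omitted because it needs a third-party package, and the non-alphanumeric pattern is suited to Latin-script text; native-script text would need its combining vowel signs preserved.</p>
        <preformat>
```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"@\w+")
NON_ALNUM_RE = re.compile(r"[^\w\s]|_")


def normalize(text):
    """Lower-case and strip URLs, usernames and non-alphanumeric characters."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = USER_RE.sub(" ", text)
    text = NON_ALNUM_RE.sub(" ", text)
    return " ".join(text.split())  # collapse repeated whitespace


def to_fasttext_line(label, text):
    """Prefix the normalized text with the label marker FastText expects."""
    return "__label__" + label + " " + normalize(text)


line = to_fasttext_line("Sarcastic", "Wow @user123 GREAT movie!!! https://example.com/x")
print(line)
```
        </preformat>
        <p>URLs and usernames are removed before the non-alphanumeric filter so that their fragments do not survive as stray tokens.</p>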
        <p>For each language, one model was trained with the default FastText parameters and another with the
hyperparameter optimization option provided by FastText, setting the time limit to 10 minutes. This
resulted in two models per language.</p>
        <p>5.3. BERT</p>
        <p>All BERT and RoBERTa models, save for MuRIL and Multilingual BERT, were fine-tuned for a text classification
task. Subsequently, each model was tested across the target languages to evaluate its performance in
detecting sarcasm. This involved the following methodology:
• Both the BERT and RoBERTa models were fine-tuned using the prepared dataset. This process
involved adjusting the pre-trained models with warm-up steps and learning rates for the sarcasm
classification task.
• After fine-tuning, each model was tested on the test sets provided for each target language.
Performance metrics such as accuracy, precision, recall, and F1 score were calculated to select the
most effective models to be later used in the ensemble.
• Once the most effective models were identified, the probability scores for each dataset were
computed. These scores were then used to create the ensemble model.</p>
        <p>In the case of MuRIL and BERT Multilingual, the pre-trained model was used to obtain document
embeddings for each document in every dataset. Instead of using the document embedding provided
by the model, the embeddings of all tokens in a given document were averaged;
this averaged embedding was used as the document embedding. For each language, these document
embeddings were used to train a logistic regression classifier with
5-fold stratified cross-validation. The classifiers were then used to make predictions and obtain probability
scores for the training, validation, and test datasets for later use in the ensemble model. It is important
to note that prior to passing the text through MuRIL, it was normalized by performing the following
operations: removing URLs, converting emojis and emoticons to textual descriptions, converting the
text to lowercase, replacing usernames with a placeholder, removing specified punctuation and symbols,
and normalizing whitespace. The text was left in code-mixed format instead of being transliterated, as
opposed to the other uses of BERT models. For MuRIL, an experiment was also done with transliterated
text.</p>
        <p>5.4. XGBoost</p>
        <p>The configurations selected to create the VSMs for use with XGBoost are detailed in Table 6. Please
note that a model was created for each language in all configurations, using the output probabilities
from the respective language to build the VSM.</p>
      </sec>
      <sec id="sec-5-5">
        <table-wrap>
          <label>Table 6</label>
          <table>
            <thead>
              <tr><th>XGBoost Configuration</th></tr>
            </thead>
            <tbody>
              <tr><td>MuRIL + fastText</td></tr>
              <tr><td>MuRIL + fastText + Multilingual BERT</td></tr>
              <tr><td>MuRIL + fastText + μTC + l3cube-pune</td></tr>
              <tr><td>MuRIL + fastText + μTC + l3cube-pune + RoBERTaXLM</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Each XGBoost model was built by randomized search cross-validation using stratified k-folds with five
splits. The parameters chosen for the random search, with their respective values, are detailed
in Table 7.</p>
        <table-wrap>
          <label>Table 7</label>
          <table>
            <thead>
              <tr><th>Parameter</th><th>Values</th></tr>
            </thead>
            <tbody>
              <tr><td>n_estimators</td><td/></tr>
              <tr><td>min_child_weight</td><td/></tr>
              <tr><td>gamma</td><td/></tr>
              <tr><td>learning_rate</td><td/></tr>
              <tr><td>subsample</td><td/></tr>
              <tr><td>colsample_bytree</td><td/></tr>
              <tr><td>max_depth</td><td/></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>5.5. Results</p>
        <p>Since the organizers allowed three submissions, we submitted the results obtained with the models
boldfaced in Table 8. For the Tamil case, the ensemble model was ranked at the top of the contestants,
as shown in Table 9. However, for the Malayalam language, the optimized FastText approach was ranked
at the top, tied with five other teams, as shown in Table 10.</p>
        <table-wrap>
          <label>Table 8</label>
          <table>
            <thead>
              <tr><th>Model</th></tr>
            </thead>
            <tbody>
              <tr><td>Default fastText + preprocessing</td></tr>
              <tr><td>2-gram fastText + preprocessing</td></tr>
              <tr><td>bert-(malayalam/tamil)</td></tr>
              <tr><td>μTC</td></tr>
              <tr><td>l3cube-pune/(malayalam/tamil)-bert</td></tr>
              <tr><td>roBERTa XLM</td></tr>
              <tr><td>optimized_fasttext</td></tr>
              <tr><td>bert_muril_malayalam</td></tr>
              <tr><td>bert_muril_tamil</td></tr>
              <tr><td>optimized xgboost (bert muril + fasttext)</td></tr>
              <tr><td>optimized xgboost (bert muril + fasttext + bert multilingual)</td></tr>
              <tr><td>optimized xgboost (bert muril + fasttext + mtc + bert/l3cube-pune)</td></tr>
              <tr><td>optimized xgboost (bert muril + fasttext + mtc + bert/l3cube-pune + RoBERTaXLM)</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>New and efficient language models are appearing at a rapid and constant pace. This
provides researchers with a wide variety of tools, but integrating them can be challenging. Our
team integrated this knowledge using XGBoost over a stacked VSM to take advantage of multiple text
classification approaches. Our approach ranks at the top in both languages. However, even though
ensembles perform consistently better on our validation partition, the ensemble only outperforms the other
approaches for the Malayalam task, while in the Tamil task, FastText is the one ranked first. This
suggests that the best model may depend on the input data, or that the ensemble
model may be overfitting, but more research is needed to clarify this. Also, while this approach proves to be effective, the
setup is relatively complex to implement and computationally intensive. This leaves open the possibility
of further optimizing the process to simplify our implementation.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgment</title>
      <p>Authors from the Universidad Michoacana acknowledge the support provided by CONAHCYT through
Project CF-2023-I-1174 within the framework of the “Ciencia de Frontera 2023”.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Grammarly for grammar and spelling checks. Afterwards,
the authors reviewed and edited the content as needed and took full responsibility for the publication’s
content.</p>
      <p>[13] T. Chen, C. Guestrin, XGBoost: A scalable tree boosting system, in: Proceedings of the 22nd
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16,
Association for Computing Machinery, New York, NY, USA, 2016, pp. 785–794.
URL: https://doi.org/10.1145/2939672.2939785. doi:10.1145/2939672.2939785.</p>
      <p>[14] J. P. Haumahu, S. D. H. Permana, Y. Yaddarabullah, Fake news classification for Indonesian
news using extreme gradient boosting (XGBoost), IOP Conference Series: Materials Science
and Engineering 1098 (2021) 052081. URL: https://dx.doi.org/10.1088/1757-899X/1098/5/052081.
doi:10.1088/1757-899X/1098/5/052081.</p>
      <p>[15] R. Bagate, S. Ramadass, Sarcasm detection of tweets without #sarcasm: Data science approach,
Indonesian Journal of Electrical Engineering and Computer Science 23 (2021) 993.
doi:10.11591/ijeecs.v23.i2.pp993-1001.</p>
      <p>[16] J. Cerda-Flores, R. Hernández-Mazariegos, J. Ortiz-Bejar, F. Calderón-Solorio, J. Ortiz-Bejar, UMSNH
at Rest-Mex 2023: An XGBoost stacking with pre-trained word-embeddings over data batches, 2023,
pp. 1–10. URL: https://ceur-ws.org/Vol-3496/restmex-paper19.pdf.</p>
      <p>[17] E. S. Tellez, D. Moctezuma, S. Miranda-Jiménez, M. Graff, An automated text categorization
framework based on hyperparameter optimization, Knowledge-Based Systems 149 (2018) 110–123.</p>
      <p>[18] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov,
RoBERTa: A robustly optimized BERT pretraining approach, arXiv preprint
arXiv:1907.11692 (2019).</p>
      <p>[19] T. Chen, T. He, M. Benesty, V. Khotilovich, Package ‘xgboost’, R version 90 (2019) 40.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Sripriya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bharathi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Nandhini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. C.</given-names>
            <surname>Navaneethakrishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Durairaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ponnusamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Kumaresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. K.</given-names>
            <surname>Ponnusamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Rajkumar</surname>
          </string-name>
          ,
          <article-title>Overview of the shared task on sarcasm identification of Dravidian languages (Malayalam and Tamil) in DravidianCodeMix</article-title>
          , in:
          <source>Forum of Information Retrieval and Evaluation FIRE-2023</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Sripriya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Durairaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Nandhini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bharathi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. K.</given-names>
            <surname>Ponnusamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Rajkumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Kumaresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ponnusamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Subalalitha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <article-title>Findings of shared task on sarcasm identification in code-mixed dravidian languages</article-title>
          ,
          <source>FIRE 2023</source>
          <volume>16</volume>
          (
          <year>2023</year>
          )
          <fpage>22</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <article-title>Hope speech detection in youtube comments</article-title>
          ,
          <source>Social Network Analysis and Mining</source>
          <volume>12</volume>
          (
          <year>2022</year>
          )
          <fpage>75</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ponnusamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Kumaresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <article-title>How can we detect homophobia and transphobia? experiments in a multilingual code-mixed setting for social media governance</article-title>
          ,
          <source>International Journal of Information Management Data Insights</source>
          <volume>2</volume>
          (
          <year>2022</year>
          )
          <fpage>100119</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
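          <string-name>
            <given-names>S.</given-names>
            <surname>N</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>B</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>K</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Durairaj</surname>
          </string-name>
          ,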
          <string-name>
            <given-names>R.</given-names>
            <surname>Ponnusamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Kumaresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. K.</given-names>
            <surname>Ponnusamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Rajkumar</surname>
          </string-name>
          ,
          <article-title>Overview of sarcasm identification of dravidian languages in dravidiancodemix@fire2024</article-title>
          , in:
          <source>Forum of Information Retrieval and Evaluation FIRE - 2024</source>
          , DAIICT, Gandhinagar,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/1706.03762. arXiv:1706.03762.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>Bert: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <year>2019</year>
          . URL: https://arxiv.org/abs/1810.04805. arXiv:1810.04805.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Luan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <article-title>Language models are unsupervised multitask learners</article-title>
          ,
          <year>2019</year>
          . URL: https://api.semanticscholar.org/CorpusID:160025533.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>M. D. Anik Basu Bhaumik</string-name>
          ,
          <article-title>Sarcasm detection in dravidian code-mixed text using transformer-based models</article-title>
          ,
          <source>FIRE 2023</source>
          <volume>16</volume>
          (
          <year>2023</year>
          )
          <fpage>10</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Khanuja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bansal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mehtani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Khosla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gopalan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. K.</given-names>
            <surname>Margam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. T.</given-names>
            <surname>Nagipogu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dave</surname>
          </string-name>
          , et al.,
          <article-title>Muril: Multilingual representations for indian languages</article-title>
          ,
          <source>arXiv preprint arXiv:2103.10730</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>B. B. Dhanya Krishnan</string-name>
          ,
          <string-name>
            <given-names>Krithika</given-names>
            <surname>Dharanikota</surname>
          </string-name>
          ,
          <article-title>Cross-linguistic sarcasm detection in tamil and malayalam: A multilingual approach</article-title>
          ,
          <source>FIRE 2023</source>
          <volume>16</volume>
          (
          <year>2023</year>
          )
          <fpage>11</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
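          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Grave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bojanowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <article-title>Bag of tricks for efficient text classification</article-title>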
          ,
          <source>arXiv preprint arXiv:1607.01759</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>