<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Sarcasm Detection and Identification of Dravidian Language Using Machine Learning Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Moogambigai A</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kamesh S</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paruvatha Priya B</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bharathi B</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Sri Sivasubramaniya Nadar College of Engineering</institution>
          ,
          <addr-line>Chennai</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Sarcasm detection poses a significant challenge in sentiment analysis, especially on social media, where sarcasm is often used to convey opinions indirectly. This complexity is further exacerbated in multilingual settings, particularly with code-mixed languages like Tamil-English and Malayalam-English, where traditional sentiment analysis systems, typically trained on monolingual data, often fail due to the intricacies of code-switching at different linguistic levels. In this work, we develop an automated system for detecting sarcasm in code-mixed social media texts, with a focus on Tamil-English and Malayalam-English. We employ several machine learning classifiers, including Random Forest, Logistic Regression, Support Vector Machine, and Multinomial Naive Bayes, along with TF-IDF vectorization for feature extraction. Our system is trained on a newly developed gold standard corpus that reflects the real-world class imbalance typical of such datasets. The performance evaluation of the models shows that the Support Vector Machine and Random Forest classifiers achieve the highest accuracy, outperforming existing systems designed for monolingual sarcasm detection. These results represent a significant advancement in handling sarcasm in under-resourced languages and encourage further research into multilingual sentiment analysis in code-mixed contexts.</p>
      </abstract>
      <kwd-group>
<kwd>Sarcasm detection</kwd>
        <kwd>code-mixed languages</kwd>
        <kwd>sentiment analysis</kwd>
        <kwd>Tamil-English</kwd>
        <kwd>Malayalam-English</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
Sarcasm is a way of speaking where the actual meaning is different from the literal words. It is often
used for irony, teasing, or humor, which makes it hard for sentiment analysis systems to detect. For
example, if someone says, "Great job on breaking the vase!", the words seem positive, but the true
meaning is negative. Detecting sarcasm is important for understanding the real sentiment in text, especially
on social media [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], where tone and intent can easily be misread. In recent years, there has been a
growing demand for effective sarcasm and sentiment detection systems tailored to the code-mixed
texts prevalent on social media platforms. The Dravidian languages Tamil and Malayalam are widely
spoken in South India and among the global diaspora. Tamil is recognized as an official language in
India, Sri Lanka, and Singapore, while Malayalam is predominantly spoken in the Indian state of Kerala.
However, the ease of typing in Roman script has led to widespread use of code-mixed Tamil-English and
Malayalam-English in online communication. This paper introduces a new gold standard corpus for the
detection of sarcasm and sentiment in code-mixed Tamil-English and Malayalam-English texts collected
from social media [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The task is to classify YouTube comments as either sarcastic or non-sarcastic at
the message level. This shared task [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] represents the first known effort to address sarcasm detection in
Dravidian code-mixed text, and it aims to foster research that illuminates how sarcasm is expressed in
these multilingual scenarios.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Data Collection and Sources</title>
      <p>For this task, we present a unique dataset composed of Tamil-English and Malayalam-English code-mixed
sentences derived from YouTube video comments. The dataset is specifically curated to encompass all
three types of code-mixed sentences: Inter-Sentential switch, Intra-Sentential switch, and Tag switching.</p>
      <p>
        The comments in the dataset exhibit a mix of native script and Roman script, reflecting the linguistic
complexity of code-mixing in social media [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Many comments are written in either Tamil or Malayalam
script, combined with English lexicon or grammar, creating a rich, diverse dataset that captures various
code-mixing patterns. Additionally, some comments are composed in Tamil or Malayalam script with
English expressions interspersed, further enhancing the dataset’s variety.
      </p>
      <sec id="sec-2-1">
        <title>2.1. Dataset Structure</title>
        <p>
          The dataset structure aligns with the shared task on sarcasm detection for Dravidian code-mixed
languages [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], providing comprehensive training, development, and test sets for building and evaluating
robust models. These datasets not only allow for the development of sarcasm detection systems but
also serve as a resource for further research in code-mixed language processing and sentiment analysis
in under-resourced languages.
        </p>
        <p>Table 1 presents the dataset statistics for both Tamil-English and Malayalam-English code-mixed
comments, indicating the number of samples in the training, development, and test sets. Both datasets
present unique challenges due to their code-mixed nature. Social media comments often combine
Tamil/Malayalam and English in a single sentence, sometimes even switching between languages
within a single word. This characteristic increases the complexity of tokenization, feature extraction,
and sarcasm detection, requiring specialized preprocessing steps. Additionally, the datasets reflect
significant class imbalance, with sarcastic comments being much rarer than non-sarcastic ones.</p>
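        <p>The class imbalance noted above can be measured directly from the labels; a stdlib-only sketch on a few invented samples (the comments below are illustrative, not drawn from the corpus):</p>

```python
from collections import Counter

# Hypothetical labelled comments; the real Tamil-English training set has
# 29,571 samples with sarcastic labels far rarer than non-sarcastic ones.
samples = [
    ("Great job on breaking the vase!", "Sarcastic"),
    ("Super movie, loved the songs", "Non-sarcastic"),
    ("padam nalla irukku", "Non-sarcastic"),
    ("semma comedy scene", "Non-sarcastic"),
]

counts = Counter(label for _, label in samples)
imbalance_ratio = counts["Non-sarcastic"] / counts["Sarcastic"]
print(counts, imbalance_ratio)  # ratio 3.0 on this toy sample
```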
        <sec id="sec-2-1-1">
          <title>2.1.1. Tamil-English:</title>
<p>The Tamil-English dataset is the larger of the two, containing 29,571 samples in the training set, 6,337
in the development set, and 6,339 in the test set. These comments are code-mixed between Tamil
and English, with the majority of the text written in Roman script. The dataset covers a wide
range of sentiment expressions, including sarcastic, positive, neutral, and negative tones. Training set:
comprises 29,571 comments used to train the machine learning models; this set includes
labels identifying whether each comment is sarcastic. Development set: contains
6,337 labeled comments and is used for model validation and hyperparameter tuning during training;
it plays a crucial role in preventing overfitting by allowing iterative testing. Test
set: with 6,339 comments, it is used for the final evaluation of the models; these comments
are provided without labels, and the model's predictions on this set determine its generalization
performance.</p>
        </sec>
        <sec id="sec-2-1-2">
          <title>2.1.2. Malayalam-English:</title>
<p>The Malayalam-English dataset is smaller in size but poses similar challenges due to code-mixing. It
contains 13,189 samples in the training set, 2,827 in the development set, and 2,827 in the test set. Training
set: consists of 13,189 code-mixed Malayalam-English comments used to train the models; like the
Tamil-English dataset, this set contains labels for sarcasm detection. Development set: with
2,827 labeled comments, this set is used for evaluating and refining the model during training; its
smaller size compared to Tamil-English indicates that fewer resources are available for Malayalam,
highlighting it as an under-resourced language. Test set: contains 2,827 comments; similar
to the Tamil-English dataset, this set is unlabeled and serves as the benchmark for testing the model's
performance.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
<p>The framework of the proposed methodology is shown in Figure 1.</p>
      <sec id="sec-3-1">
        <title>3.1. Classification Techniques for Malayalam Dataset</title>
        <p>
          In our case, we tried several machine learning [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] models to classify sarcasm in the Malayalam-English
dataset. Each model was selected based on its ability to handle the unique characteristics of the
code-mixed data.
        </p>
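        <p>As a concrete sketch of this setup, the toy pipeline below (assuming scikit-learn is available; the comments and labels are invented for illustration) feeds one shared TF-IDF representation into each of the classifiers named above:</p>

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical code-mixed comments; the real models were trained on the
# shared-task corpus, not on this toy data.
texts = [
    "semma padam, great job breaking the plot!",
    "worst movie ever, waste of time",
    "super acting, nalla irukku",
    "great great great, what a masterpiece of boredom",
]
labels = ["Sarcastic", "Non-sarcastic", "Non-sarcastic", "Sarcastic"]

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": LinearSVC(),
}

for name, clf in models.items():
    # Each classifier consumes the same TF-IDF representation of the text.
    pipe = make_pipeline(TfidfVectorizer(), clf)
    pipe.fit(texts, labels)
    print(name, pipe.predict(["worst movie ever"])[0])
```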
        <sec id="sec-3-1-1">
          <title>3.1.1. Logistic Regression:</title>
          <p>Logistic Regression provided a straightforward approach to sarcasm detection. It performed well in
identifying non-sarcastic comments, with a precision of 0.85, recall of 0.98, and an F1-score of 0.91.
However, the model struggled with sarcastic comments, achieving a lower precision of 0.72, recall of
0.23, and F1-score of 0.35. The overall accuracy was 0.84, with a macro average F1-score of 0.63.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>3.1.2. Random Forest:</title>
          <p>
            The Random Forest model [
            <xref ref-type="bibr" rid="ref7">7</xref>
], an ensemble of decision trees, offered balanced performance. It achieved
an accuracy of 0.82, precision of 0.79, recall of 0.82, and an F1-score of 0.79. This model effectively
managed the complexity of code-mixed text, providing consistent results across various metrics.
          </p>
        </sec>
        <sec id="sec-3-1-3">
          <title>3.1.3. Support Vector Machine (SVM):</title>
          <p>
            SVM [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ], known for its robustness in text classification, performed slightly better than the Random
Forest model. It achieved an accuracy of 0.83, precision of 0.81, recall of 0.83, and an F1-score of 0.81,
making it a strong contender in sarcasm detection.
          </p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Classification Techniques for Tamil Dataset</title>
        <p>
          For the Tamil-English dataset, we applied a range of classification techniques similar to those used in
the Malayalam-English dataset. Logistic Regression showed mixed results [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], whereas Random Forest
and SVM demonstrated more stable performances, aligning with the findings in the Malayalam-English
dataset.
        </p>
        <sec id="sec-3-2-1">
          <title>3.2.1. Logistic Regression:</title>
          <p>
            The Logistic Regression model [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] provided a robust approach, excelling in non-sarcastic comment
detection with a precision of 0.85, recall of 0.97, and F1-score of 0.91 during training. However, it
struggled with sarcastic comments, especially in the testing phase, where it achieved a precision of 0.71,
recall of 0.42, and F1-score of 0.53. The overall accuracy on the testing data was 0.80.
          </p>
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Random Forest with TF-IDF:</title>
<p>The Random Forest model, combined with TF-IDF vectorization, demonstrated strong performance
during training, but its testing accuracy dropped to 0.78. It was effective in identifying non-sarcastic
comments but faced challenges with sarcasm, resulting in a lower F1-score for sarcastic comments.</p>
        </sec>
        <sec id="sec-3-2-3">
          <title>3.2.3. Multinomial Naive Bayes:</title>
          <p>
            This model [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ] provided balanced performance, achieving an accuracy of 0.84 during training and 0.76
on the testing data. It was particularly strong in detecting non-sarcastic comments but had limitations
in identifying sarcastic ones.
          </p>
        </sec>
        <sec id="sec-3-2-4">
          <title>3.2.4. TF-IDF Vectorizers:</title>
<p>TF-IDF Vectorization [11], while achieving near-perfect accuracy during training, faced generalization challenges on
testing data, achieving an accuracy of 0.74. The model was effective for non-sarcastic comments but
limited in detecting sarcasm.</p>
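          <p>The TF-IDF weighting itself is simple enough to sketch without a library; the snippet below implements one common smoothed variant (idf = ln(N/df) + 1), which differs in detail from library defaults:</p>

```python
import math
from collections import Counter

def tfidf(docs):
    """Per-document term weights: tf(t, d) * (ln(N / df(t)) + 1)."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * (math.log(n / df[term]) + 1)
            for term, count in tf.items()
        })
    return weights

docs = ["great job", "great movie", "semma twist"]
w = tfidf(docs)
# "semma" occurs in only one of three documents, so it is weighted more
# heavily than "great", which occurs in two.
```

This is why the approach emphasizes rare terms: a word confined to few comments gets a larger idf factor than one spread across the corpus.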
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Preprocessing Steps</title>
<p>Given the complexity of code-mixed text, several preprocessing steps were undertaken to prepare
the dataset for modeling. First, text normalization was performed to standardize
informal language, including slang, abbreviations, and non-standard spellings, thereby reducing noise in
the data. Following normalization, tokenization was applied to split the comments into individual words
or phrases, an essential step for feature extraction. Special care was taken in handling mixed-language
tokens, as the dataset contained a combination of Tamil/Malayalam and English words. In cases where
comments were written in native scripts, script conversion was employed to transliterate these into Roman
script, ensuring consistency across the dataset, which predominantly consisted of Romanized text.
This step was crucial for uniform text processing, particularly for code-mixed languages. A significant
challenge was the class imbalance in the dataset, where sarcastic comments were vastly
outnumbered by non-sarcastic ones. To address this, several strategies were implemented, including
oversampling, where the number of sarcastic samples was increased by duplicating existing ones,
and undersampling, where the number of non-sarcastic samples was reduced to balance the classes.
Additionally, synthetic data generation techniques such as SMOTE (Synthetic Minority Over-sampling
Technique) were used to generate synthetic sarcastic samples, further mitigating the imbalance.
Lastly, feature extraction was performed using TF-IDF vectorizers, which emphasize
rare words that may carry sarcastic meaning. This approach enabled the system to capture the subtle
nuances of sarcasm in code-mixed text, facilitating more accurate predictions.</p>
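        <p>The normalization and tokenization steps described above can be sketched as follows; the slang table and regular expressions are illustrative assumptions, not the exact rules used in this work:</p>

```python
import re

# Illustrative slang/abbreviation table; the real normalization map is
# larger and built for Tamil/Malayalam-English social media text.
SLANG = {"u": "you", "r": "are", "gr8": "great"}

def normalize(comment):
    """Lowercase, squeeze repeated letters, strip punctuation, expand slang."""
    text = comment.lower()
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)   # "sooooo" -> "soo"
    text = re.sub(r"[^\w\s]", " ", text)         # drop punctuation noise
    return [SLANG.get(token, token) for token in text.split()]

print(normalize("U r sooooo gr8!!!"))  # ['you', 'are', 'soo', 'great']
```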
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Challenges Faced</title>
        <p>During the training process, we encountered several challenges:</p>
        <sec id="sec-3-4-1">
          <title>3.4.1. Class Imbalance:</title>
<p>As mentioned, the imbalance [12] between sarcastic and non-sarcastic comments posed a significant
challenge. Despite applying techniques like oversampling and SMOTE, achieving a balanced dataset
while maintaining the integrity of the data was difficult.</p>
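          <p>SMOTE interpolates between minority-class neighbours in feature space; as a stdlib-only stand-in, the simpler random-oversampling strategy also used here can be sketched on invented data:</p>

```python
import random

def oversample(samples, labels, minority, seed=0):
    """Duplicate random minority-class samples until the classes balance."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_n = len(labels) - len(minority_idx)
    out_x, out_y = list(samples), list(labels)
    while majority_n > sum(1 for y in out_y if y == minority):
        i = rng.choice(minority_idx)
        out_x.append(samples[i])
        out_y.append(labels[i])
    return out_x, out_y

comments = ["c1", "c2", "c3", "c4", "sarcastic1"]
labels = ["non", "non", "non", "non", "sar"]
bx, by = oversample(comments, labels, "sar")
# Both classes now have four samples each.
```

Duplication balances the label counts but, unlike SMOTE, adds no new variation, which is one reason balancing while preserving data integrity was difficult.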
        </sec>
        <sec id="sec-3-4-2">
          <title>3.4.2. Code-Mixing Complexity:</title>
          <p>The mixture of Tamil/Malayalam [13] with English within single comments introduced additional
complexity. The variations in script, syntax, and linguistic structures made it challenging to develop
models that could accurately capture the context and intent of the text.</p>
        </sec>
        <sec id="sec-3-4-3">
          <title>3.4.3. Ambiguity in Annotation:</title>
<p>Annotating sarcasm [14] is inherently subjective, and even among native speakers, there can be
differences in interpretation. Ensuring consistent and accurate labeling of the training data was a critical
but challenging task.</p>
        </sec>
        <sec id="sec-3-4-4">
          <title>3.4.4. Training and Testing without Labels:</title>
<p>While the training dataset [15] was annotated, the testing dataset provided for this study did not include
labels, which added an extra layer of difficulty in evaluating model performance.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
<title>4. Experiments and Results</title>
      <sec id="sec-4-1">
        <title>4.1. Logistic Regression:</title>
<p>For both the Tamil and Malayalam datasets, we implemented logistic regression models. These models
are particularly effective in binary classification tasks such as sarcasm detection. We used advanced
optimization techniques to handle the large and diverse datasets, ensuring that the models converge
properly even with the intricate nature of code-mixed text.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Random Forest:</title>
<p>We employed Random Forest classifiers due to their robustness and resistance to overfitting, making
them well-suited to our unevenly distributed datasets. The ensemble method, which builds multiple decision
trees and merges them to obtain a more accurate and stable prediction, proved very effective, especially
on the Tamil dataset when combined with TF-IDF vectorization to enhance feature extraction.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Support Vector Machine (SVM):</title>
        <p>The SVM models were configured with a linear kernel to capitalize on their strength in high-dimensional
spaces, which is typical in text classification tasks like ours where features (words or terms) can be
numerous. This model was applied to the Malayalam dataset and tuned to balance precision and recall,
optimizing for the unique challenges of sarcasm detection in this language.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Multinomial Naive Bayes:</title>
<p>For the Tamil dataset, we utilized Multinomial Naive Bayes classifiers. These models are well-suited for
discrete feature classification and were chosen particularly for their efficiency in handling large volumes
of text data. The assumption that features follow a multinomial distribution aligns well with the word
frequencies in code-mixed social media comments, aiding sarcasm identification. Each model
was meticulously tuned to address the specific challenges posed by the sarcasm detection task, taking
into account the complexities of code-switching, emotive content, and linguistic subtleties unique to
Tamil and Malayalam code-mixed text. The performance of these models indicates a promising direction
for further research in sarcasm detection within under-resourced languages.</p>
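        <p>The multinomial assumption can be made concrete with a toy Laplace-smoothed Naive Bayes classifier in log space; this is a stdlib-only sketch on invented data, not the tuned model from the experiments:</p>

```python
import math
from collections import Counter

def train_mnb(docs, labels):
    """Collect class priors and per-class word counts from whitespace tokens."""
    priors = Counter(labels)
    word_counts = {c: Counter() for c in priors}
    for doc, y in zip(docs, labels):
        word_counts[y].update(doc.split())
    vocab = {w for c in word_counts for w in word_counts[c]}
    return priors, word_counts, vocab, len(docs)

def predict_mnb(model, doc):
    priors, word_counts, vocab, n = model
    best, best_lp = None, -math.inf
    for c in priors:
        lp = math.log(priors[c] / n)
        total = sum(word_counts[c].values())
        for w in doc.split():
            # Laplace smoothing: (count + 1) / (total + |V|)
            lp += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = c, lp
    return best

docs = ["great job breaking it", "great movie", "nice song", "wow great fail"]
labels = ["sar", "non", "non", "sar"]
model = train_mnb(docs, labels)
```

Class log-likelihoods accumulate per-token multinomial probabilities, which is exactly the word-frequency assumption discussed above.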
      </sec>
      <sec id="sec-4-5">
        <title>4.5. Performance on Malayalam Validation Set</title>
      </sec>
      <sec id="sec-4-6">
        <title>4.6. Performance on Tamil Validation Set</title>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
<p>In this study, we explored the challenging task of sarcasm detection in code-mixed Dravidian languages,
specifically Tamil-English and Malayalam-English text from social media. By creating and analyzing a
unique dataset of YouTube comments, we highlighted the complexities of code-mixing, including
Inter-Sentential, Intra-Sentential, and Tag switching, and their impact on sarcasm detection. Our approach
involved selecting and extracting features that capture the linguistic and contextual subtleties of the
text. By leveraging lexical, syntactic, sentiment, and code-mixing-specific features, along with advanced
contextual embeddings [16], we aimed to develop a robust model for identifying sarcasm in these
mixed-language scenarios. For the Malayalam dataset, we employed Logistic Regression, Random Forest,
and Support Vector Machine (SVM) as our classification techniques. Similarly, for the Tamil dataset, we
utilized Logistic Regression, Random Forest with TF-IDF, and Multinomial Naive Bayes. The
results demonstrate that integrating various feature types significantly improves sarcasm detection
accuracy. However, the task remains difficult due to the intricate nature of sarcasm and the diversity of
code-mixed languages. Class imbalance and variability in code-switching patterns further complicate
the task, underscoring the need for more research in this area. Overall, this study contributes to the
growing body of work on sentiment analysis in under-resourced languages, emphasizing the importance
of context and specialized models when dealing with code-mixed languages. We submitted three models
each for the Tamil and Malayalam datasets, securing 5th place in the Tamil evaluation and 9th place in
the Malayalam evaluation, reflecting our models' competitiveness and effectiveness in sarcasm detection
within these challenging linguistic contexts.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Declaration on Generative AI</title>
<p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhattacharyya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Carman</surname>
          </string-name>
          ,
<article-title>Automatic sarcasm detection: A survey</article-title>
          ,
          <source>ACM Computing Surveys (CSUR)</source>
          <volume>50</volume>
          (
          <year>2017</year>
          )
          <fpage>1</fpage>
          -
          <lpage>22</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Verma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shukla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shukla</surname>
          </string-name>
          ,
          <article-title>Techniques of sarcasm detection: A review, in: 2021 international conference on advance computing and innovative technologies in engineering (ICACITE)</article-title>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>968</fpage>
          -
          <lpage>972</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. N</given-names>
            , B. B,
            <surname>N. K</surname>
          </string-name>
          , T. Durairaj,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ponnusamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Kumaresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. K.</given-names>
            <surname>Ponnusamy</surname>
          </string-name>
          ,
<string-name>
            <given-names>C.</given-names>
            <surname>Rajkumar</surname>
          </string-name>
          ,
          <article-title>Overview of sarcasm identification of dravidian languages in dravidiancodemix@fire2024</article-title>
          , in: Forum of Information Retrieval and Evaluation (FIRE 2024), DAIICT, Gandhinagar,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          , G. Ou,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lei</surname>
          </string-name>
          ,
          <article-title>Sarcasm detection in social media based on imbalanced classification</article-title>
          ,
          <source>in: Web-Age Information Management: 15th International Conference, WAIM</source>
          <year>2014</year>
          , Macau, China, June 16-18,
          <year>2014</year>
          . Proceedings 15, Springer,
          <year>2014</year>
          , pp.
          <fpage>459</fpage>
          -
          <lpage>471</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Bagate</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Suguna</surname>
          </string-name>
          ,
          <article-title>Sarcasm detection of tweets without# sarcasm: data science approach</article-title>
          ,
          <source>Indonesian Journal of Electrical Engineering and Computer Science</source>
          <volume>23</volume>
          (
          <year>2021</year>
          )
          <fpage>993</fpage>
          -
          <lpage>1001</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Sarsam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Al-Samarraie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. I.</given-names>
            <surname>Alzahrani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wright</surname>
          </string-name>
          ,
          <article-title>Sarcasm detection using machine learning algorithms in twitter: A systematic review</article-title>
          ,
          <source>International Journal of Market Research</source>
          <volume>62</volume>
          (
          <year>2020</year>
          )
          <fpage>578</fpage>
          -
          <lpage>598</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Elgabry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Attia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Abdel-Rahman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Abdel-Ate</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Girgis</surname>
          </string-name>
          ,
          <article-title>A contextual word embedding for arabic sarcasm detection with random forests</article-title>
          ,
          <source>in: Proceedings of the Sixth Arabic Natural Language Processing Workshop</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>340</fpage>
          -
          <lpage>344</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Garg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Duhan</surname>
          </string-name>
          ,
          <article-title>Sarcasm detection on twitter data using support vector machine</article-title>
          .,
          <source>ICTACT Journal on Soft Computing</source>
          <volume>10</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>A multinomial logistic regression modeling approach for anomaly intrusion detection</article-title>
          ,
          <source>Computers &amp; Security</source>
          <volume>24</volume>
          (
          <year>2005</year>
          )
          <fpage>662</fpage>
          -
          <lpage>674</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>K.</given-names>
            <surname>Sarkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bhowmick</surname>
          </string-name>
          ,
          <article-title>Sentiment polarity detection in bengali tweets using multinomial naïve bayes and support vector machines</article-title>
          ,
          <source>in: 2017 IEEE Calcutta Conference (CALCON)</source>
          , IEEE,
          <year>2017</year>
          , pp.
          <fpage>31</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] S. Mihi, B. Ait Ben Ali, I. El Bazi, S. Arezki, N. Laachfoubi, Automatic sarcasm detection in dialectal arabic using bert and tf-idf, in: The Proceedings of the International Conference on Smart City Applications, Springer, 2021, pp. 837–847.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] A. Banerjee, M. Bhattacharjee, K. Ghosh, S. Chatterjee, Synthetic minority oversampling in addressing imbalanced sarcasm detection in social media, Multimedia Tools and Applications 79 (2020) 35995–36031.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] S. Chanda, A. Mishra, S. Pal, Sarcasm detection in tamil and malayalam dravidian code-mixed text, in: FIRE (Working Notes), 2023, pp. 336–343.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] P. Chaudhari, C. Chandankhede, Literature survey of sarcasm detection, in: 2017 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), IEEE, 2017, pp. 2041–2046.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] P. Goel, R. Jain, A. Nayyar, S. Singhal, M. Srivastava, Sarcasm detection using deep learning and ensemble learning, Multimedia Tools and Applications 81 (2022) 43229–43252.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] N. Babanejad, H. Davoudi, A. An, M. Papagelis, Affective and contextual embedding for sarcasm detection, in: Proceedings of the 28th International Conference on Computational Linguistics, 2020, pp. 225–243.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>