<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Enhancing Word-Level Language Identification in Code-Mixed Dravidian Languages</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sonith D</string-name>
          <email>sonithksd@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kavya G</string-name>
          <email>kavyamujk@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Asha Hegde</string-name>
          <email>hegdekasha@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>H L Shashirekha</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Mangalore University</institution>
          ,
          <addr-line>Mangalore, Karnataka</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Forum for Information Retrieval Evaluation</institution>
          ,
          <addr-line>FIRE</addr-line>
        </aff>
      </contrib-group>
      <abstract>
<p>Code-mixing is the practice of combining two or more languages in a single utterance, and users on social networking platforms often employ code-mixed text for ease of use. This phenomenon reflects the dynamic linguistic landscape of multilingual societies, where speakers fluidly switch between languages. Language Identification (LI), which aims to automatically recognize the language of a text, is a crucial preliminary step for many Natural Language Processing (NLP) applications. Word-Level Language Identification (WLLI) is LI of each word in a given code-mixed text. The difficulties presented by informal and non-standard language, such as slang, abbreviations, and partial words, in user-generated code-mixed text prompt the need for WLLI. To explore strategies for WLLI, in this paper, we - team MUCS - describe the models submitted to “Word Level Language Identification in Code-Mixed Dravidian Languages" - a shared task organized at the Forum for Information Retrieval Evaluation (FIRE) 2024. The shared task is offered in four code-mixed Dravidian languages - Malayalam, Kannada, Tamil, and Tulu. We have explored WLLI as: i) a Sequence Labeling problem (CoLi_CNN - using Multilingual Representations for Indian Languages (MuRIL) and a Convolutional Neural Network (CNN), and CoLi_TNN - a customized Transformer Neural Network (TNN) model) and ii) a Sequence-to-Sequence (Seq2Seq) learning approach (using a Bidirectional Long Short Term Memory (BiLSTM)-to-Long Short Term Memory (LSTM) model), for WLLI in code-mixed Dravidian languages. Among the proposed models, the CoLi_CNN model outperformed the other models with macro F1 scores of 0.8028, 0.8400, 0.6994, and 0.7854 for the Malayalam, Kannada, Tamil, and Tulu datasets respectively, securing 6th rank in all the languages.</p>
      </abstract>
      <kwd-group>
        <kwd>Word-Level Language Identification</kwd>
        <kwd>Sequence Labeling</kwd>
        <kwd>Sequence-to-Sequence Learning Approach</kwd>
        <kwd>Code-mixed Text</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>LI refers to the process of determining the natural language in which a given piece of text is written. The
increase in multilingual text comprising multiple languages or dialects on digital platforms, particularly
in regions with diverse linguistic landscapes, makes LI an essential task. For tasks like sentiment
analysis, information retrieval, content moderation, and machine translation, accurate LI is crucial, as it
enables systems to process and comprehend text correctly. Without effective LI, processing multilingual
data can lead to errors, misinterpretations, and inefficiencies, making LI indispensable in modern NLP
applications.</p>
      <p>
        India is a multilingual country with a rich heritage of languages, and Indians, who are often hooked
to social media platforms, can comfortably read, write, and speak two or three languages in addition
to English. They usually use a combination of two or more languages in their informal communication
on social media platforms such as Twitter, Instagram, and Facebook, to express themselves more
comfortably [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ]. This phenomenon of mixing languages at different linguistic units such as
sentence, word, or sub-word, is known as code-mixing and it poses significant challenges for identifying
the language of these linguistic units. LI involves analyzing various linguistic features and patterns
within the text to accurately determine the language it belongs to [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>
        ]. To process the code-mixed
content, it is necessary to go beyond traditional LI and focus on identifying the language of each word in
a sentence. WLLI, which addresses the challenge of automatically discerning the language of each word
within a sentence or phrase, is crucial for effectively processing and understanding multilingual content
on social media and other digital platforms. By accurately identifying languages at the word level, WLLI
not only enhances the usability of digital tools and social media analytics but also contributes to preserving
linguistic diversity, enabling more inclusive communication platforms [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ]. As digital interactions
continue to evolve in multilingual societies like India, the significance of WLLI in code-mixed text
remains paramount for fostering effective communication across diverse linguistic contexts. Further,
exploring the complexities of code-mixed text and developing innovative solutions for WLLI provide
new opportunities for language technology and promote greater linguistic diversity and inclusion in
the digital sphere. However, challenges in WLLI include the fluidity of language switching within
sentences, variations in spelling and grammar across languages, and the scarcity of annotated data,
particularly for under-resourced languages.
      </p>
      <p>Malayalam, Tamil, Kannada, and Tulu, primarily spoken in the southern part of the country,
belong to the Dravidian language family and are known for their unique linguistic features and scripts. In
spite of their popularity, these languages are under-resourced. Further, code-mixing of these languages
with English is quite common on social media platforms. To address the challenges of WLLI in
code-mixed Dravidian languages - Malayalam, Kannada, Tamil, and Tulu - in this paper, we - Team MUCS -
describe the models submitted to the "Word-Level Language Identification in Dravidian Languages" shared
task (https://codalab.lisn.upsaclay.fr/competitions/19357) organized at FIRE 2024. With the aim of
developing robust models for WLLI despite the challenges posed by code-mixed text, we propose:
sequence labeling (CoLi_CNN: using MuRIL with CNN, and CoLi_TNN: a customized TNN model) and
a Seq2Seq learning approach with a BiLSTM-to-LSTM model, to identify the language at the word level
in Malayalam, Kannada, Tamil, and Tulu code-mixed texts. The given dataset is in romanized form, and
sample Malayalam, Kannada, Tamil, and Tulu words from the given datasets are shown in Table 1.</p>
      <p>The rest of the paper is organized as follows: Section 2 describes the recent literature on WLLI, and
Section 3 focuses on the description of the proposed models, followed by the experiments and results in
Section 4. The paper concludes with future work in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>WLLI in code-mixed language environments is crucial for accurately processing multilingual texts
found on social media platforms. This not only improves user engagement but also enhances content
personalization and fosters better communication across diverse linguistic communities online. In
this direction, several studies have been conducted on WLLI in Dravidian languages and some of the
notable studies are mentioned below:</p>
      <p>Sushma et al. [9] proposed two distinct models: i) CoLI-Ensemble - an ensemble of Machine Learning
(ML) classifiers (Support Vector Machine (SVM), Random Forest (RF), and Logistic Regression (LR)) with
hard voting trained with Term Frequency-Inverse Document Frequency (TF-IDF) of character n-grams
in the range (1, 3) and fastText pre-trained word vectors individually, and ii) CoLI-CRF - a Conditional
Random Field (CRF) algorithm trained with text-based features, for WLLI in Tulu. Their proposed
CoLI-CRF model outperformed the other model with a macro F1 score of 0.77. Yigezu et al. [10] proposed
LSTM, BiLSTM, and RF models, to identify the language of words in code-mixed Kannada texts in
CoLI-Kanglish shared task at ICON 2022. Their proposed BiLSTM model outperformed the other models
with a weighted F1-score of 0.82. Tash et al. [11] proposed ML models (k-Nearest Neighbors (k-NN),
SVM) trained with TF-IDF of word n-grams in the range (1, 2) for WLLI in Kannada-English Texts and
their proposed k-NN and SVM models achieved macro F1 scores of 0.58 and 0.47 respectively. Thara and
Poornachandran [12] employed a transformer model with various Bidirectional Encoder Representations
from Transformers (BERT) variants (Cross-lingual Language Model - Robustly Optimized BERT approach
(XLM-RoBERTa), CamemBERT, Distilled Version of BERT (DistilBERT), and Efficiently Learning an
Encoder that Classifies Token Replacements Accurately (ELECTRA)) for the WLLI in Malayalam-English
code-mixed dataset and their proposed ELECTRA model outperformed other models with a weighted
F1 score of 0.99. Bansal et al. [13] proposed ML models (LR, Decision Tree (DT), and Gaussian Naive
Bayes (GNB)) for LI in an English-Punjabi code-mixed sentiment analysis social media dataset. Among
the proposed models, the LR classifier outperformed the other ML classifiers with an accuracy and F1 score of
86.63% and 0.88 respectively.</p>
      <p>
        Shashirekha et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] developed code-mixed Kannada-English dataset, code-mixed Kannada-English
embeddings (for words, sub-words, and characters), and implemented four learning models: i)
CoLI-ngrams: an ensemble of ML classifiers (Linear Support Vector Classifier (LSVC), Multi-Layer Perceptron
(MLP), and LR) with ‘soft’ voting trained with Byte Pair Embeddings, ii) CoLI-vectors: an ensemble
model trained with CountVectorizer of sub-words in the range (1, 5) and characters in the range (2, 5),
iii) CoLI-BiLSTM: a sequence processing model based on BiLSTM architecture, and iv) CoLI-ULMFiT: a
Universal Language Model Fine-Tuning (ULMFiT) utilizing Transfer Learning (TL) based approach, for
Kannada-English code-mixed LI task at word level. Among the proposed models, CoLI-ngrams model
outperformed all other models with an average macro F1 score of 0.64.
      </p>
      <p>The related work emphasizes research on WLLI using various ML, Deep Learning (DL), and transformer models.
However, the performance of all the models is not promising due to the challenges of processing variations
in code-mixed text generated by creative users. Further, the scarcity of annotated data for WLLI in
low-resource Dravidian languages such as Malayalam, Kannada, Tamil, and Tulu adds its share of challenges
to developing models for WLLI. This creates a significant opportunity for further research in this field.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>Pre-processing involves cleaning the data to remove noise in order to enhance the performance of
the learning models. However, as the given dataset is clean, to enhance the models' ability to process and
identify the language of the words accurately, all numerical values in the text are converted into their
corresponding word forms. For instance, the number "100" is transformed into "one hundred". This
conversion helps avoid potential confusion caused by numerical digits and ensures that all
elements of the text are treated uniformly.</p>
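      <p>A minimal sketch of this number-to-words normalization, assuming values below one thousand; the normalize_numbers helper and the example sentence are hypothetical, as the paper does not specify the implementation:</p>

```python
import re

UNITS = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
         "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def number_to_words(n: int) -> str:
    # Convert 0..999 to its English word form (a minimal sketch).
    if n >= 100:
        hundreds, rest = divmod(n, 100)
        words = UNITS[hundreds] + " hundred"
        return words + (" " + number_to_words(rest) if rest else "")
    if n >= 20:
        tens, rest = divmod(n, 10)
        return TENS[tens] + (" " + UNITS[rest] if rest else "")
    return UNITS[n]

def normalize_numbers(text: str) -> str:
    # Replace each standalone digit sequence with its word form.
    return re.sub(r"\d+", lambda m: number_to_words(int(m.group())), text)

print(normalize_numbers("100 rs discount"))  # -> one hundred rs discount
```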
      <p>While a sequence labeling problem assigns a label to every element in a sequence, like tagging
each word in a sentence with its part-of-speech tag, Seq2Seq learning, on the other hand, focuses on
mapping the entire input sequence to an output sequence. The methodologies for the proposed models
are explained below:</p>
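      <p>The difference between the two formulations can be illustrated with a toy example; the tokens and language tags below are invented for illustration, not drawn from the shared task data.</p>

```python
# Hypothetical code-mixed Kannada-English sentence with invented tags.
tokens = ["super", "movie", "aagide"]
tags = ["en", "en", "kn"]

# Sequence labeling: one language tag per token, aligned by position.
labeled = list(zip(tokens, tags))
print(labeled)  # [('super', 'en'), ('movie', 'en'), ('aagide', 'kn')]

# Seq2Seq: the whole token sequence maps to a tag sequence of the same length.
print((tokens, tags))
```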
      <sec id="sec-3-1">
        <title>3.1. Sequence Labeling</title>
        <p>Two models: i) CoLi_CNN and ii) CoLi_TNN, are proposed using sequence labeling. The CoLi_CNN model
employs MuRIL embeddings to train a CNN, while the CoLi_TNN model utilizes self-attention mechanisms
to effectively capture contextual relationships in a sequence labeling approach. The description of the
models is given below:</p>
        <sec id="sec-3-1-1">
          <title>3.1.1. CoLi_CNN Model</title>
          <p>In this approach, MuRIL (https://huggingface.co/google/muril-base-cased) - a transformer model pre-trained on 17 Indian languages (including English,
Malayalam, Tamil, Kannada, etc.) - is used to represent the given text. MuRIL excels at capturing the
semantic meaning of text through its deep layers and provides contextualized representations of text
[14]. These embeddings are then passed to a CNN, which applies convolutional filters to detect local
patterns and features in the data. The CNN architecture includes multiple convolutional layers followed
by pooling layers to reduce dimensionality, and a dense layer with a softmax activation function to
generate the final classification probabilities. The CNN classifier, a type of feed-forward artificial neural
network, effectively learns complex patterns and sequential dependencies within the data. A Dropout
component is also used to regularize the model and prevent overfitting. This approach combines the
contextual understanding provided by MuRIL with the CNN's capability to learn local patterns, optimizing the model's performance
for WLLI in code-mixed content. The framework and hyperparameters used in the proposed CoLi_CNN
model are shown in Figure 1 and Table 2 respectively.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>3.1.2. CoLi_TNN Model</title>
          <p>CoLi_TNN is a customized TNN architecture proposed for WLLI. Unlike traditional sequence
transduction models that rely on RNNs or CNNs [15], the transformer architecture in CoLi_TNN uses
self-attention mechanisms to compute representations of the input sequence. In this study, a standard
transformer architecture is customized to suit token-level classification. This customized model has
embedding layers that convert input tokens and positional information into dense vectors, followed by a
series of transformer blocks consisting of multi-head attention and feed-forward networks that allow the
model to capture complex relationships between tokens in the sequence. Custom residual connections
are used to retain the original token-level information in case any token-specific information is
missed, and layer normalization is applied to stabilize training. This ensures consistent activations
within each layer, leading to smoother learning and improved training efficiency. The model outputs
tag predictions for each token via a dense layer with sparse categorical cross entropy loss, dropout
layers for regularization, followed by a softmax layer. The hyperparameters and their values used in this
model are shown in Table 3.</p>
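          <p>The core of each transformer block described above can be sketched as a single-head scaled dot-product self-attention step with a residual connection and layer normalization, written in NumPy; all dimensions and weights are illustrative assumptions, with Table 3 listing the actual hyperparameters.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(x, wq, wk, wv):
    # Scaled dot-product self-attention over one token sequence (single head).
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])   # token-to-token affinities
    return softmax(scores) @ v                # context-mixed token vectors

seq_len, d_model = 5, 64
x = rng.normal(size=(seq_len, d_model))       # token + positional embeddings
wq, wk, wv = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(3))

out = self_attention(x, wq, wk, wv)
# Residual connection and layer normalization, as in the customized blocks.
res = x + out
normed = (res - res.mean(axis=-1, keepdims=True)) / res.std(axis=-1, keepdims=True)
print(normed.shape)  # (5, 64)
```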
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Sequence to Sequence Learning (Seq2Seq)</title>
        <p>Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on
difficult learning tasks. A Seq2Seq model is a type of DNN architecture designed to transform one
sequence into another [16, 17], and the framework of the proposed Seq2Seq model is shown in Figure 2. This
model consists of an encoder-decoder architecture designed for Seq2Seq learning. While tokenization
converts text sequences into numerical tokens, padding ensures uniform sequence lengths for batch
processing. The encoder, implemented with a BiLSTM layer, processes the input sequence by capturing
underlying patterns in both directions (forward and backward), creating a rich sequence representation.
The decoder, utilizing an LSTM layer, generates the output sequence based on the context vector produced
by the encoder. Both the input text and labels are embedded into high-dimensional vector spaces using
embedding layers, while the final output is predicted using a fully connected softmax layer, providing a
probability distribution over possible labels for each time step. The hyperparameters and their values
used in the Seq2Seq model are given in Table 4.</p>
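        <p>The tokenization and padding steps described above can be sketched as follows; the sentences, reserved ids, and maximum length are hypothetical, for illustration only.</p>

```python
PAD, UNK = 0, 1  # reserved ids for padding and out-of-vocabulary words

def build_vocab(sentences):
    # Map each word to an integer id; ids 0 and 1 are reserved for PAD/UNK.
    vocab = {}
    for sent in sentences:
        for word in sent.split():
            vocab.setdefault(word, len(vocab) + 2)
    return vocab

def encode(sent, vocab, max_len):
    # Tokenize on whitespace, map to ids, and pad/truncate to max_len.
    ids = [vocab.get(w, UNK) for w in sent.split()]
    return (ids + [PAD] * max_len)[:max_len]

sents = ["nanna phone hogide", "enna phone"]  # hypothetical code-mixed lines
vocab = build_vocab(sents)
batch = [encode(s, vocab, max_len=4) for s in sents]
print(batch)  # [[2, 3, 4, 0], [5, 3, 0, 0]]
```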
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments and Results</title>
      <p>Various experiments were carried out using different learning models to identify the language of the
words in the given code-mixed Kannada, Malayalam, Tamil, and Tulu texts. The label distribution of
the Malayalam, Kannada, Tamil, and Tulu datasets is shown in Table 5. The performances of the models
are evaluated based on macro F1 score, and the performances of the proposed models on the Validation
and Test sets using the sequence labeling (CoLi_CNN and CoLi_TNN) and Seq2Seq approaches are shown in
Tables 6 and 7 respectively.</p>
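      <p>Macro F1, the evaluation measure used above, averages per-label F1 scores so that rare and frequent labels carry equal weight. The following is a small self-contained sketch; the gold and predicted tags are invented for illustration.</p>

```python
def macro_f1(gold, pred):
    # Average per-label F1 over all labels seen in gold or predictions.
    labels = set(gold) | set(pred)
    f1s = []
    for lab in labels:
        tp = sum(g == lab and p == lab for g, p in zip(gold, pred))
        fp = sum(g != lab and p == lab for g, p in zip(gold, pred))
        fn = sum(g == lab and p != lab for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Invented word-level tags for a five-word code-mixed example.
gold = ["kn", "en", "kn", "en", "mixed"]
pred = ["kn", "en", "en", "en", "mixed"]
print(round(macro_f1(gold, pred), 4))  # 0.8222
```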
      <p>Figure 3 gives a comparison of the macro F1 scores of all the participating teams in the shared task for all
four languages. Among the submitted models, the proposed CoLi_CNN model obtained better macro F1
scores, securing 6th rank for all four languages in the shared task. These macro F1 scores indicate
that the proposed CoLi_CNN model has performed competitively.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Future Work</title>
      <p>In this paper, we - team MUCS - describe the models submitted to the ‘Word-Level Language Identification in
Dravidian Languages’ shared task at FIRE 2024, to identify the languages in code-mixed Malayalam,
Kannada, Tamil, and Tulu texts. Experiments were carried out with sequence labeling (CoLi_CNN and
CoLi_TNN) and Seq2Seq approaches. The CoLi_CNN model employs MuRIL word embeddings to train
the CNN model, whereas the CoLi_TNN and Seq2Seq models incorporate Keras embeddings for feature
extraction. Among the proposed models, the CoLi_CNN model outperformed the other models with macro
F1 scores of 0.8028, 0.8400, 0.6994, and 0.7854 for the Malayalam, Kannada, Tamil, and Tulu languages
respectively, securing 6th rank for all the languages in the shared task. In future work, optimized feature
combinations and diverse learning approaches will be explored, in addition to examining methods for
addressing data imbalance.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used ChatGPT for grammar and spelling
checks. After using this tool/service, the author(s) reviewed and edited the content as needed and take(s)
full responsibility for the publication’s content.</p>
    </sec>
    <sec id="sec-7">
      <title>References</title>
      <p>[9] N. Sushma, A. Hegde, H. L. Shashirekha, Word-level Language Identification in Code-mixed Tulu Texts, in: FIRE (Working Notes), 2023, pp. 213–222.</p>
      <p>[10] M. G. Yigezu, A. L. Tonja, O. Kolesnikova, M. S. Tash, G. Sidorov, A. Gelbukh, Word Level Language Identification in Code-mixed Kannada-English Texts using Deep Learning Approach, in: Proceedings of the 19th International Conference on Natural Language Processing (ICON): Shared Task on Word Level Language Identification in Code-mixed Kannada-English Texts, 2022, pp. 29–33.</p>
      <p>[11] M. S. Tash, Z. Ahani, A. Tonja, M. Gemeda, N. Hussain, O. Kolesnikova, Word Level Language Identification in Code-mixed Kannada-English Texts using Traditional Machine Learning Algorithms, in: Proceedings of the 19th International Conference on Natural Language Processing (ICON): Shared Task on Word Level Language Identification in Code-mixed Kannada-English Texts, 2022, pp. 25–28.</p>
      <p>[12] S. Thara, P. Poornachandran, Transformer based Language Identification for Malayalam-English Code-mixed Text, in: IEEE Access, volume 9, IEEE, 2021, pp. 118837–118850.</p>
      <p>[13] N. Bansal, V. Goyal, S. Rani, Experimenting Language Identification for Sentiment Analysis of English Punjabi Code mixed Social Media Text, in: International Journal of E-Adoption (IJEA), volume 12, IGI Global, 2020, pp. 52–62.</p>
      <p>[14] T. Bao, N. Ren, R. Luo, B. Wang, G. Shen, T. Guo, A BERT-based Hybrid Short Text Classification Model Incorporating CNN and Attention-based BiGRU, in: Journal of Organizational and End User Computing (JOEUC), volume 33, IGI Global, 2021, pp. 1–21.</p>
      <p>[15] M. Bilkhu, S. Wang, T. Dobhal, Attention is All You Need for Videos: Self-attention Based Video Summarization using Universal Transformers, 2019.</p>
      <p>[16] I. Sutskever, O. Vinyals, Q. V. Le, Sequence to Sequence Learning with Neural Networks, 2014. URL: https://arxiv.org/abs/1409.3215. arXiv:1409.3215.</p>
      <p>[17] S. Palaskar, F. Metze, Acoustic-to-word Recognition with Sequence-to-Sequence Models, in: 2018 IEEE Spoken Language Technology Workshop (SLT), IEEE, 2018, pp. 397–404.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] A. Hegde, F. Balouchzahi, S. Butt, S. Coelho, K. G, H. S. Kumar, S. D, S. Hosahalli Lakshmaiah, A. Agrawal, Overview of CoLI-Dravidian: Word-level Code-mixed Language Identification in Dravidian Languages, in: Forum for Information Retrieval Evaluation FIRE - 2024, 2024.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F.</given-names>
            <surname>Balouchzahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Butt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hegde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ashraf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Shashirekha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sidorov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gelbukh</surname>
          </string-name>
          ,
          <article-title>Overview of CoLI-Kanglish: Word Level Language Identification in Code-mixed</article-title>
          <source>Kannada-English Texts at Icon</source>
          <year>2022</year>
          ,
          <source>in: Proceedings of the 19th International Conference on Natural Language Processing (ICON): Shared Task on Word Level Language Identification in Code-mixed Kannada-English Texts</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>38</fpage>
          -
          <lpage>45</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] A. Hegde, F. Balouchzahi, S. Coelho, S. H L, H. A. Nayel, S. Butt, CoLI@FIRE2023: Findings of Word-level Language Identification in Code-mixed Tulu Text, FIRE '23, Association for Computing Machinery, New York, NY, USA, 2024, pp. 25–26. URL: https://doi.org/10.1145/3632754.3633075. doi:10.1145/3632754.3633075.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Shetty</surname>
          </string-name>
          ,
          <article-title>Word-Level Language Identification of Code-Mixed Tulu-English Data</article-title>
          .,
          <source>in: FIRE (Working Notes)</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>198</fpage>
          -
          <lpage>204</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>F.</given-names>
            <surname>Balouchzahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Butt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hegde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ashraf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. Hosahalli</given-names>
            <surname>Lakshmaiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sidorov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gelbukh</surname>
          </string-name>
          , Overview of CoLI-Kanglish:
          <article-title>Word Level Language Identification in Code-mixed Kannada-English Texts at ICON 2022</article-title>
          , in: 19th
          <source>International Conference on Natural Language Processing Proceedings</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] H. L. Shashirekha, F. Balouchzahi, M. D. Anusha, G. Sidorov, CoLI-Machine Learning Approaches for Code-mixed Language Identification at the Word Level in Kannada-English Texts, in: arXiv preprint arXiv:2211.09847, 2022.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] A. Hegde, M. D. Anusha, S. Coelho, H. L. Shashirekha, B. R. Chakravarthi, Corpus Creation for Sentiment Analysis in Code-mixed Tulu Text, in: Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages, 2022, pp. 33–40.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] S. H. Lakshmaiah, F. Balouchzahi, M. D. Anusha, G. Sidorov, CoLI-Machine Learning Approaches for Code-mixed Language Identification at the Word Level in Kannada-English Texts, in: Acta Polytechnica Hungarica, volume 19, 2022.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>