<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Word Level Language Identification in Code-mixed Dravidian Languages</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Harshitha S Kumar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sharal Coelho</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Asha Hegde</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kavya G</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hosahalli Lakshmaiah Shashirekha</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Mangalore University</institution>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Natural Language Processing</institution>
          ,
          <addr-line>Dravidian Languages, CRF model, Language Identification, Code-mixed data</addr-line>
        </aff>
      </contrib-group>
      <abstract>
<p>In social media posts, it is common to see a mix of languages at the word level, and identifying the language of each word in these posts is essential for applications like Machine Translation. Word-level Language Identification (LI) deals with identifying the language of each word in a sentence or document. The limited availability of labeled data for word-level LI in low-resource Indian languages such as Kannada, Tulu, Tamil, Telugu, etc., hinders the development of robust LI models for these languages. To meet this requirement, the "CoLI-Dravidian 2024: Word-level Code-Mixed Language Identification in Dravidian Languages" shared task organized at FIRE-2024 invites researchers to develop models that address the challenges of LI in four different Dravidian languages (Kannada, Tulu, Tamil, and Malayalam). In this paper, we, team MUCSNLPLab, describe a Conditional Random Field (CRF) model trained with text-based features to identify the language of each word in a given sentence. The proposed CRF model obtained Macro F1 scores of 0.608, 0.869, 0.767, and 0.772 for Tamil, Kannada, Malayalam, and Tulu, respectively.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural Language Processing</kwd>
        <kwd>Dravidian Languages</kwd>
        <kwd>CRF model</kwd>
        <kwd>Language Identification</kwd>
        <kwd>Code-mixed data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Dravidian languages are a well-known language family spoken by more than 250 million people,
mainly in South India, Sri Lanka, and other parts of South Asia. Kannada, Telugu, Tamil, Malayalam,
and Tulu are the most widely spoken Dravidian languages. Kannada is one of the scheduled
languages of India, with 40+ million speakers mainly from Karnataka state. Tulu has a rich cultural and
literary heritage and is spoken by a community of about 2.5 million native speakers in the coastal regions
of southern India, predominantly in Karnataka state [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Tamil is the official language of Tamil
Nadu and Puducherry and is one of the oldest languages in India. Malayalam is another widely
spoken language in the southern region of India, with nearly 35+ million speakers [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Despite their
rich linguistic history, these languages are considered low-resource languages, as they lack extensive
digital tools, resources, and formal computational language processing infrastructure.
      </p>
      <p>
        Recently, social media has allowed speakers of low-resource languages to create and share content
in their native and/or regional languages mixed with English on platforms like Twitter, WhatsApp,
Facebook, etc. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This combination of multiple languages within a paragraph, sentence, or word is termed
code-mixing or code-switching, and such data is called code-mixed data. Code-mixed data has become
the default language of social media and has attracted the attention of researchers in the Natural
Language Processing (NLP) domain. Social media platforms have given users the freedom to write text
informally, often ignoring the grammar conventions of the specific language used. This has resulted in
considerable growth in user-generated texts, which are characterized by code-mixed words like "moviege"
(Kannada-English) and "trailerg" (Tulu-English), user-defined abbreviations ("ASAP" for "As Soon As Possible"),
the repetition of characters ("toooo goood" for "too good"), and so on. Such contents are difficult to
understand due to their informal linguistic arrangement.
      </p>
      <p>
        Word-level Language Identification (LI) was developed as a way to handle code-mixing and informal
content on platforms like social media, chatbots, and real-time systems: each word in a text is classified
as belonging to a single language or a group of languages. This has encouraged NLP researchers to
explore the problem of LI at the word level. Word-level LI is necessary for applications such as Machine
Translation (MT), Part-Of-Speech (POS) tagging, and Named Entity Recognition (NER) in multilingual
contexts. With massive amounts of social media data generated every day, manual word-level LI is
difficult. Additionally, social media text often lacks context, making it harder to interpret meaning and
intent accurately. The effectiveness of incorporating linguistic features can vary greatly depending on
the specific languages and features used, and finding the optimal combination needs to be explored.
Hence, automated tools and techniques are required to identify the language of each
word [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Word-level LI is modeled as a sequence labeling task, which involves assigning a label to
each element in an input sequence while maintaining the order of the words. This approach allows for better
handling of the complexities inherent in code-mixed content, where multiple languages may be present
within the same sentence.
      </p>
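      <p>To make the sequence labeling formulation concrete, a minimal illustrative sketch in Python is given below; the sentence and its tags are hypothetical examples, not drawn from the shared task data.</p>
      <preformat>
# Word-level LI as sequence labeling: the model assigns one language
# tag per word, preserving word order (hypothetical example).
words = ["ee", "movie", "trailer", "super", "agi", "ide"]
tags = ["kn", "en", "en", "en", "kn", "kn"]

for word, tag in zip(words, tags):
    print(f"{word}/{tag}")  # e.g. "movie/en"
      </preformat>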
      <p>
        The increasing prevalence of code-mixed content, especially on social media, has made word-level LI a
critical challenge. Despite its importance, the development of reliable tools for word-level LI in Dravidian
languages continues to face significant challenges, mainly due to the lack of standardized resources
and the scarcity of large-scale annotated datasets. While Deep Learning (DL) models, such as neural
networks, have demonstrated strong performance in various NLP tasks, they often require vast amounts
of labeled data to function effectively. In contrast, traditional Machine Learning (ML) techniques like
Conditional Random Fields (CRF), when integrated with well-designed linguistic features, provide
a more resource-efficient alternative. These methods offer a promising balance between accuracy
and computational feasibility, making them particularly suitable for low-resource languages like the
Dravidian languages. To address the challenges of word-level LI in the Dravidian languages Tamil, Kannada,
Malayalam, and Tulu, in this paper we, team MUCSNLPLab, describe the learning models submitted
to the "CoLI-Dravidian 2024: Word-level Code-Mixed Language Identification in Dravidian Languages"
shared task organized at FIRE 2024 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Word-level LI is modeled as a multi-class classification task using a
CRF model trained with text-based features to identify the language of each word in the given text.
      </p>
      <p>The rest of the paper is organized as follows: Section 2 describes the Related Work, Section 3
describes the Methodology, and Section 4 presents the Experiments, Results, and Observations,
followed by the Conclusion in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The task of word-level LI has become increasingly important as multilingual and code-mixed content
continues to grow, especially on digital platforms like Facebook, YouTube, etc. Researchers have
explored various approaches to address word-level LI for languages where extensive corpora and
linguistic resources are readily available [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. However, the challenge of processing languages with
limited resources, often referred to as low-resource languages, for word-level LI has gained significant
attention. Some of the related works on word-level LI are described below:
      </p>
      <p>
        Chaitanya et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] proposed a word-level LI model for Hindi-English code-mixed data using word
embeddings (Continuous Bag of Words (CBOW) and Skip-gram) to effectively capture word semantics
and relationships. Using Support Vector Machine (SVM) classifiers, they achieved 67.33% and 67.34% for
the CBOW and Skip-gram models, respectively. Veena et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] implemented a word-level LI system for code-mixed
Malayalam-English and Tamil-English Facebook data and generated character embedding features using
the skip-gram architecture. They employed 10-fold cross-validation to train and evaluate an SVM model,
ensuring robust performance, and obtained accuracies of 93% and 95% for
Malayalam-English and Tamil-English text, respectively. Thara and Poornachandran [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] have scraped YouTube
comments to identify bilingual Malayalam-English code-mixed text. To filter out the comments they
have removed English alphabets, numbers, special characters, and emoticons. They used transformer
models (CamemBERT, XLMRoBERTa, ELECTRA, and DistilBERT) to predict language tags at the
wordlevel. The results of this study showed that ELECTRA performed better than other models by obtaining
F1-score of 0.993. Deka et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] proposed a Bidirectional Encoder Representations from Transformers
(BERT) based approach for LI using a Kannada-English code-mixed corpus. Their approach achieved a
weighted average F1-score of 86% and a macro average F1-score of 57%. To identify the language of words
in code-mixed Kannada texts, Yigezu et al. [<xref ref-type="bibr" rid="ref11">11</xref>] proposed a Bi-LSTM model with attention that
integrates BERT features to enhance word-level LI accuracy.
      </p>
      <p>Mandal and Singh [<xref ref-type="bibr" rid="ref12">12</xref>] proposed a bidirectional LSTM network with a CRF layer (Bi-LSTM-CRF) for
word-level LI in Bengali-English and Hindi-English data and obtained accuracies of 93.28% and 93.32% for
Bengali-English and Hindi-English, respectively. Gundapu et al. [<xref ref-type="bibr" rid="ref13">13</xref>] proposed Naive Bayes, Random
Forest, CRF, and Hidden Markov Models for word-level LI in English-Telugu code-mixed data. Among
these models, the CRF-based model obtained an F1-score of 0.91.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>In the proposed methodology, the word-level LI task is modeled as a sequence labeling problem where
the goal is to assign a label to each word in a sequence. This is achieved by training a CRF model with
text-based features such as the previous words, next words, etc. The framework of the CRF model is shown in Figure 1,
and the steps involved in the framework are described in the following subsections.</p>
      <sec id="sec-3-1">
        <title>3.1. Data Preparation and Feature Extraction</title>
        <p>The data preparation phase necessitates a careful examination of the provided data and arranging the
data to fit the selected feature extraction technique, so that text features can be extracted to train the
CRF model.</p>
        <p>In this work, the CRF model utilizes text features such as word length, previous words, next words,
digits, and punctuation to capture the dependencies and relationships between the words in the given
sequence. These text features play an essential role in determining the performance of a CRF model.
The text features for words in the sample sentences are shown in Table 1, and a few of these features are
described below:
• A word: the current word.
• Word length: the number of characters in a word.
• Local context: two preceding and two succeeding words.
• Is current word digit: checks whether the word contains only numerical characters. This text
feature helps in identifying tokens belonging to the 'Number' or 'Other' classes.
• Is current word punctuation: checks whether the word contains only punctuation marks like comma
(,), full stop (.), asterisk (*), etc. This helps in identifying tokens belonging to the 'sym' category.
These text features assist the learning models in making more accurate predictions.</p>
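        <p>A minimal sketch of how the features listed above can be encoded for a CRF is given below; the function names and feature keys are illustrative assumptions, not the authors' exact implementation.</p>
        <preformat>
# Build a feature dictionary for the i-th word of a tokenized sentence,
# following the sklearn_crfsuite convention of dict-valued features.
def word2features(sent, i):
    word = sent[i]
    features = {
        'word': word.lower(),        # the current word
        'word_length': len(word),    # number of characters
        'is_digit': word.isdigit(),  # cue for 'Number'/'Other' classes
        'is_punct': not any(c.isalnum() for c in word),  # cue for 'sym'
    }
    # Local context: two preceding and two succeeding words.
    for offset in (-2, -1, 1, 2):
        j = i + offset
        if j in range(len(sent)):
            features['word_%+d' % offset] = sent[j].lower()
    if i == 0:
        features['BOS'] = True   # beginning-of-sentence marker
    if i == len(sent) - 1:
        features['EOS'] = True   # end-of-sentence marker
    return features

def sent2features(sent):
    return [word2features(sent, i) for i in range(len(sent))]
        </preformat>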
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Model Construction</title>
        <p>A CRF model is a probabilistic model often used for sequence prediction tasks, and it performs well in
modeling the conditional probability distribution of tags in tasks such as NER or POS tagging. It captures the
dependencies between tags by taking into account both previous and subsequent observations and
learns from the text features to predict the sequence of labels for the input sequence. Due to their ability
to capture relationships between words, CRF models have drawn much attention for labeling sequence
data.</p>
        <p>In this work, CRF is implemented using the sklearn_crfsuite library
(https://pypi.org/project/sklearn-crfsuite/), which provides a scikit-learn compatible estimator wrapping
CRFsuite. This library simplifies the classifier construction process by handling the transformation of
textual features into feature vectors and the training of the CRF classifier.</p>
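        <p>A minimal training sketch using sklearn_crfsuite is shown below; the toy data and hyperparameter values are illustrative assumptions, not the submitted settings.</p>
        <preformat>
import sklearn_crfsuite

# Toy code-mixed sentences and tags, purely for illustration.
train_sents = [["ee", "movie", "super", "agi", "ide"],
               ["trailer", "tumba", "chennagide"]]
train_tags = [["kn", "en", "en", "kn", "kn"],
              ["en", "kn", "kn"]]

# sent2features() is the feature extractor sketched in Section 3.1.
X_train = [sent2features(s) for s in train_sents]
y_train = train_tags

crf = sklearn_crfsuite.CRF(
    algorithm='lbfgs',   # 'l2sgd' is the L2-SGD alternative noted in Section 4.1
    c1=0.1, c2=0.1,      # L1/L2 regularization strengths (assumed values)
    max_iterations=100,
    all_possible_transitions=True,
)
crf.fit(X_train, y_train)

# Predict one tag sequence per input sentence.
print(crf.predict([sent2features(["trailer", "super"])]))
        </preformat>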
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>
        The experiments are carried out on the datasets provided by the shared task for word-level LI in
four code-mixed Dravidian languages: Tulu-English [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], Kannada-English [<xref ref-type="bibr" rid="ref14">14</xref>, <xref ref-type="bibr" rid="ref15">15</xref>], Tamil-English, and
Malayalam-English [<xref ref-type="bibr" rid="ref16">16</xref>]. The data provided in all four languages are in the Romanized script. Table 2
displays the tag-wise distribution of the train and validation sets for the four code-mixed datasets. A few
of the tags in the annotated data are given below:
• en/mal/tam: English / Malayalam / Tamil
• mixed: Code-mixed words
• sym/SYM: Symbol
• tmen: Tamil-English
      </p>
      <sec id="sec-4-1">
        <title>4.1. Results and Observations</title>
        <p>The performance of the classifiers is evaluated based on the Macro F1-Score (M_F1). Macro scores are
preferred for evaluating performance across all classes without bias. The performances of the
proposed CRF models on the Test sets are shown in Table 5. The results in the table reveal that M_F1
is considerably low for the Tamil language for Run1 compared to the other languages, because the limited
Tamil-English data restricts the model's ability to learn patterns and nuances in the language, resulting in
lower performance. For the Kannada language, the Run2 experiment showed poor performance, whereas Run1
achieved better outcomes. For all Run2 experiments, the models showed poor performance because
L2-SGD is specifically designed to work well with large datasets, whereas the provided datasets limit
the diversity of the training samples, making it difficult for the model to learn meaningful patterns
effectively.</p>
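        <p>For reference, a minimal sketch of how the macro and weighted scores can be computed with sklearn_crfsuite is given below; the variable names follow the training sketch in Section 3.2 and are assumptions.</p>
        <preformat>
from sklearn_crfsuite import metrics

# y_val: gold tag sequences; y_pred = crf.predict(X_val).
# Macro F1 averages per-class F1 equally, so minority tags such as
# 'mixed' or 'sym' weigh as much as the majority tags; weighted F1
# scales each class by its support instead.
macro_f1 = metrics.flat_f1_score(y_val, y_pred,
                                 average='macro', labels=crf.classes_)
weighted_f1 = metrics.flat_f1_score(y_val, y_pred,
                                    average='weighted', labels=crf.classes_)

# Per-tag precision/recall/F1 breakdown for error analysis.
print(metrics.flat_classification_report(y_val, y_pred,
                                         labels=crf.classes_, digits=3))
        </preformat>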
        <p>Considering the performance across all evaluation metrics in both macro and weighted forms, the CRF
models demonstrate reliable performance across languages. The results indicate that the Run1 model
exhibited a better M_F1 for Kannada, Malayalam, and Tulu, while the third experiment (Run3) gave
better results for the Tamil language. The Run1 model achieved Macro F1 scores of 0.869, 0.767, and 0.772
for the Kannada, Malayalam, and Tulu languages, securing the 7th, 9th, and 7th ranks, respectively. This is
demonstrated by the fact that the model performs more or less consistently, with scores ranging
from 60% to 86% across the four code-mixed languages.</p>
        <p>The performance variation could be due to: (i) unique grammatical structures, syntax, and morphology,
which may not be captured well by the model; (ii) imbalanced or limited training data, leading to poor
classification; and (iii) annotation errors. Additionally, besides Romanized words, Arabic
words found in the Malayalam datasets posed a significant challenge. In the given datasets, words like
'good', 'message', and 'super' are written as "goood", "msg", and "superrruu"; such nonstandard usage
is accepted by social media users but is an issue for a word-level LI model. Table 6 shows a few
misclassification samples, highlighting instances where imbalanced data, mixed-language annotations,
and numeric content in words contributed to prediction errors.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this paper, we describe a CRF model submitted to the "CoLI-Dravidian 2024: Word-level Code-Mixed
Language Identification in Dravidian Languages" shared task at FIRE 2024 for word-level LI in four
code-mixed Dravidian languages (Kannada, Tulu, Tamil, and Malayalam). By training the CRF model
with text-based features, the proposed model obtained a Macro F1 score of 0.608 for Tamil and secured the 3rd
rank. Efficient techniques will be explored in the future to handle the imbalanced datasets and improve
the performance of the proposed models.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used ChatGPT-4 for grammar and spelling checking.
After using this tool, the author(s) reviewed and edited the content as needed and take full responsibility
for the publication's content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hegde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Anusha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Coelho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. L.</given-names>
            <surname>Shashirekha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <article-title>Corpus Creation for Sentiment Analysis in Code-Mixed Tulu text</article-title>
          ,
          <source>in: Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>33</fpage>
          -
          <lpage>40</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Coelho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hegde</surname>
          </string-name>
          , G. Kavya,
          <string-name>
            <given-names>H. L.</given-names>
            <surname>Shashirekha</surname>
          </string-name>
          , Mucs@ dravidianlangtech2023:
          <article-title>Malayalam Fake News Detection using Machine Learning Approach</article-title>
          ,
          <source>in: Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>288</fpage>
          -
          <lpage>292</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Thara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Poornachandran</surname>
          </string-name>
          ,
          <article-title>Code-Mixing: A Brief Survey</article-title>
          ,
          <source>in: 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI)</source>
          , IEEE,
          <year>2018</year>
          , pp.
          <fpage>2382</fpage>
          -
          <lpage>2388</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Coelho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hegde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lamani</surname>
          </string-name>
          , G. Kavya,
          <string-name>
            <given-names>H. L.</given-names>
            <surname>Shashirekha</surname>
          </string-name>
          , MUCSD@ DravidianLangTech2023:
          <article-title>Predicting Sentiment in Social Media Text using Machine Learning Techniques</article-title>
          ,
          <source>in: Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>282</fpage>
          -
          <lpage>287</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hegde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Balouchzahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Butt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Coelho</surname>
          </string-name>
          , K. G,
          <string-name>
            <given-names>H. S.</given-names>
            <surname>Kumar</surname>
          </string-name>
          , S. D, S. Hosahalli Lakshmaiah,
          <string-name>
            <given-names>A.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <article-title>Overview of CoLI-Dravidian: Word-Level Code-Mixed Language Identification in Dravidian Languages</article-title>
          ,
          <source>in: Forum for Information Retrieval Evaluation (FIRE) 2024</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hegde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Balouchzahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Coelho</surname>
          </string-name>
          ,
          <string-name>
            <surname>S. HL</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. A.</given-names>
            <surname>Nayel</surname>
          </string-name>
          , S. Butt, CoLI@ FIRE2023:
          <article-title>Findings of Word-level Language Identification in Code-Mixed Tulu Text</article-title>
          ,
          <source>in: Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>25</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>I.</given-names>
            <surname>Chaitanya</surname>
          </string-name>
          , I. Madapakula,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thara</surname>
          </string-name>
          ,
          <article-title>Word Level Language Identification in CodeMixed Data using Word Embedding Methods for Indian Languages</article-title>
          , in: 2018
          <source>International Conference on Advances in Computing, Communications and Informatics (ICACCI)</source>
          , IEEE,
          <year>2018</year>
          , pp.
          <fpage>1137</fpage>
          -
          <lpage>1141</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Veena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Soman</surname>
          </string-name>
          ,
          <article-title>An Efective Way of Word-Level Language Identification for Code-Mixed Facebook Comments using Word-Embedding via Character-Embedding</article-title>
          , in: 2017
          <source>International Conference on Advances in Computing, Communications and Informatics (ICACCI)</source>
          , IEEE,
          <year>2017</year>
          , pp.
          <fpage>1552</fpage>
          -
          <lpage>1556</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Thara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Poornachandran</surname>
          </string-name>
          ,
          <article-title>Transformer Based Language Identification for Malayalam-English Code-Mixed Text</article-title>
          ,
          <source>IEEE Access 9</source>
          (
          <year>2021</year>
          )
          <fpage>118837</fpage>
          -
          <lpage>118850</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Deka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. J.</given-names>
            <surname>Kalita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Sarma</surname>
          </string-name>
          ,
          <article-title>BERT-Based Language Identification in Code-Mix Kannada-English Text at the CoLI-Kanglish Shared Task@ ICON 2022</article-title>
          , in:
          <source>Proceedings of the 19th International Conference on Natural Language Processing (ICON): Shared Task on Word Level Language Identification in Code-mixed Kannada-English Texts</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>12</fpage>
          -
          <lpage>17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] M. G. Yigezu, A. L. Tonja, O. Kolesnikova, M. S. Tash, G. Sidorov, A. Gelbukh, <article-title>Word Level Language Identification in Code-Mixed Kannada-English Texts using Deep Learning Approach</article-title>, <source>in: Proceedings of the 19th International Conference on Natural Language Processing (ICON): Shared Task on Word Level Language Identification in Code-mixed Kannada-English Texts</source>, <year>2022</year>, pp. <fpage>29</fpage>-<lpage>33</lpage>.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] S. Mandal, A. K. Singh, <article-title>Language Identification in Code-Mixed Data using Multichannel Neural Networks and Context Capture</article-title>, <source>arXiv preprint arXiv:1808.07118</source> (<year>2018</year>).</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] S. Gundapu, R. Mamidi, <article-title>Word Level Language Identification in English Telugu Code Mixed Data</article-title>, <source>arXiv preprint arXiv:2010.04482</source> (<year>2020</year>).</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] S. Hosahalli Lakshmaiah, F. Balouchzahi, A. Mudoor Devadas, G. Sidorov, <article-title>CoLI-Machine Learning Approaches for Code-Mixed Language Identification at the Word Level in Kannada-English Texts</article-title>, <source>Acta Polytechnica Hungarica</source> (<year>2022</year>).</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] F. Balouchzahi, S. Butt, A. Hegde, N. Ashraf, S. Hosahalli Lakshmaiah, G. Sidorov, A. Gelbukh, <article-title>Overview of CoLI-Kanglish: Word Level Language Identification in Code-Mixed Kannada-English Texts at ICON 2022</article-title>, <source>in: Proceedings of the 19th International Conference on Natural Language Processing (ICON)</source>, <year>2022</year>.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] A. Hegde, F. Balouchzahi, S. Coelho, S. H L, H. A. Nayel, S. Butt, <article-title>CoLI@FIRE2023: Findings of Word-Level Language Identification in Code-Mixed Tulu Text</article-title>, <source>FIRE '23, Association for Computing Machinery</source>, New York, NY, USA, <year>2024</year>, pp. <fpage>25</fpage>-<lpage>26</lpage>. URL: https://doi.org/10.1145/3632754.3633075. doi:10.1145/3632754.3633075.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>