<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Language Identification in Dravidian Languages</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Asha Hegde</string-name>
          <email>hegdekasha@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fazlourrahman Balouchzahi</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sabur Butt</string-name>
          <email>saburb@tec.mx</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sharal Coelho</string-name>
          <email>sharalmucs@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sudha V</string-name>
          <email>srisudhavenkatesh22@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shashirekha Hosahalli Lakshmaiah</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ameeta Agrawal</string-name>
          <email>ameeta@pdx.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mexico</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Word-level Language Identification, Code-mixed, Dravidian Languages, Data Collection</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Mangalore University</institution>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science, Portland State University</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Forum for Information Retrieval Evaluation</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>IFE</institution>
          ,
          <addr-line>Tecnologico de Monterrey</addr-line>
          ,
          <country country="MX">Mexico</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Language Identification (LI) has traditionally been performed at the document or sentence level and specifically for high-resource languages. The rise of social media communication in multilingual regions such as India has seen users generate code-mixed texts, which typically combine English with local languages. Here, word-level LI is of primary importance: a system assigns a language label to each word in the text, providing the fine granularity essential for capturing frequent language switches in informal, transliterated content. It is also an important step for downstream NLP tasks such as machine translation, sentiment analysis, and conversational systems. This becomes all the more important because many languages of the Dravidian family are spoken by millions but are characteristically low-resourced. The CoLI-Dravidian shared task attempts to fill this gap by taking up word-level LI in Roman-script code-mixed Dravidian datasets. A total of eight teams participated in this shared task, and the top models achieved macro F1 scores of 0.743, 0.921, 0.827, 0.952, and 0.908 for Tamil, Kannada, Malayalam, Telugu, and Tulu, respectively, indicating both the complexity and the progress in this domain.</p>
      </abstract>
      <kwd-group>
        <kwd>Word-level Language Identification</kwd>
        <kwd>Code-mixed</kwd>
        <kwd>Dravidian Languages</kwd>
        <kwd>Data Collection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The Dravidian languages are a family of approximately 80 languages spoken by more than 220 million
people in South Asia, with a rich and ancient history. As per a recent study, the Dravidian language
family, consisting of major languages such as Tamil, Telugu, Kannada, and Malayalam, is approximately
4,500 years old [1]. Usually, speakers of these regional languages are comfortable using more than one
language, including English, for daily communication. This multilingual environment causes users to
frequently switch between languages and scripts, especially on informal platforms like social media.
Hence, code-mixing has emerged as one of the widespread and natural linguistic phenomena in online
communication [2]. Code-mixing can occur at several levels (paragraph, sentence, word, or even
subword) depending on the speaker’s fluency and communicative intent [3]. This dynamic multilingual
behavior has introduced new opportunities and challenges for computational linguistics research.</p>
      <p>One of the major tasks in multilingual computational linguistics is the identification of the language
for every word in a code-mixed sentence. It forms the base for more accurate NLP tools that enable
applications such as machine translation, sentiment analysis, and social media analytics [4]. Since
Dravidian languages are highly morphologically rich, the challenge becomes even more complex. These
languages have extensive inflection, agglutination, and derivational morphology, which contributes to
their large vocabularies and numerous word variations. Morphological complexity introduces ambiguity,
complicates tokenization, and makes feature extraction problematic for NLP models. Moreover, most
code-mixed data lacks standardized spellings, transliterations, and grammar. Hence, it calls for
specialized approaches toward the development of robust NLP systems that can take code-mixing
and heavy morphological structures into consideration effectively.</p>
      <p>We have thus organized a shared task titled “CoLI-Dravidian: Word-level Code-Mixed Language
Identification in Dravidian Languages” (https://www.codabench.org/competitions/7902/) as part of FIRE 2025
to address the challenges associated with word-level LI in the Dravidian languages. Code-mixed datasets
in five languages, namely Kannada, Tamil, Malayalam, Telugu, and Tulu, were provided to develop advanced
models for accurate LI in these morphologically rich languages. The shared task consisted of two major
phases: a training and validation phase, and a testing phase. In the first phase, participants were provided
with labeled training and validation datasets in all five languages to develop and tune their systems. In the
testing phase, unlabeled test sets were released and participants submitted their predictions through the
Codabench platform for automatic evaluation. A maximum of five submissions per language was allowed
per team, with only the best-performing submission taken into account for the final ranking. Of 35 registered
teams, a total of 10 teams submitted valid predictions for the final evaluation, and 8 of them provided
detailed working notes documenting their systems.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>In recent years, there has been a growing interest among researchers in code-mixed text,
particularly in low-resource and under-resourced languages, for various applications [2] [3] [5] [6] [7] [8].
To address the challenges of LI in code-mixed text, several studies have been conducted
employing various Machine Learning (ML) and Deep Learning (DL) algorithms. Gundapu and Mamidi [9] performed LI
on Telugu-English code-mixed text using Conditional Random Fields (CRF) classifiers and obtained an
accuracy of 91.28% by considering previous, current, and next words, their POS tags, word length, and
character n-grams in the range (1–3) as features. Veena et al. [10] explored SVM models trained with
word and character 5-gram embeddings for LI in code-mixed Hindi-English text and achieved better
accuracy.</p>
      <p>Chaitanya et al. [11] proposed a word-level LI model for Hindi-English code-mixed data using word
embeddings (Continuous Bag of Words (CBOW) and Skip-gram) to effectively capture word semantics
and relationships. They achieved accuracies of 67.33% and 67.34% for the CBOW and Skip-gram models,
respectively, using Support Vector Machine (SVM) classifiers. Veena et al. [12] implemented a word-level LI system for
code-mixed Malayalam-English and Tamil-English Facebook data and generated character embedding
features using the skip-gram architecture. They employed 10-fold cross-validation to train and evaluate the
SVM model, ensuring the robust performance of the model, and obtained 93% and 95% accuracies for
Malayalam-English and Tamil-English text, respectively.</p>
      <p>In the context of word-level LI in code-mixed Kannada-English texts, particularly during the ICON
2022 competition, a notable trend emerged. As documented by Balouchzahi et al. [2], teams that used
neural network (NN) architectures and transformer-based models consistently outperformed traditional
machine learning classifiers and baseline models. Hegde et al. [7] provided an in-depth overview of the
methodologies and outcomes of the “CoLI-Tunglish: Word-level Language Identification in Code-mixed
Tulu Texts” shared task. This task featured participation from five distinct teams, each addressing LI in
code-mixed Tulu texts. Among the approaches, a machine learning model that employed a stacking
ensemble of multiple classifiers trained on character n-grams emerged as the top performer. This
model achieved a significant macro F1 score of 0.813, highlighting its effectiveness in addressing the
complexities of code-mixed Tulu text processing.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Datasets</title>
      <p>The shared task is built on a multilingual corpus of user-generated social media text spanning five
Dravidian languages: Kannada (KAN), Malayalam (MAL), Telugu (TL), Tamil (TM), and Tulu. The raw data
were sourced from YouTube comments collected with a web scraper and underwent preprocessing steps
such as script normalization, tokenization, and removal of sensitive information. Each dataset is annotated
at the token level using a unified labeling scheme to support consistent cross-language modeling.</p>
      <p>The tagset includes:
• LANG, which represents the primary language label of the respective dataset (KAN, MAL, TL, TM, or TULU),
• EN for English tokens,
• SYM for symbols, emoji, punctuation, and informal markers,
• NUMBER for numeric expressions,
• NAME for named entities (primarily person names),
• LOCATION for geographic references,
• MIXED for intra-token code-mixed forms,
• and OTHER for residual or ambiguous cases.</p>
      <p>Annotation was carried out by native speakers of each language following unified guidelines. Table 1
presents the distribution of tags across the Train, Development, and Test splits for all languages. In the
shared-task setup, the Train and Development sets were released with gold-standard labels, whereas
the Test sets were provided in blinded form to enable unbiased system evaluation.</p>
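      <p>For illustration, the following sketch shows how a word-level annotated instance under this tagset could be represented in Python. The tokens below are invented examples for illustration only and are not drawn from the shared-task corpus.</p>
      <preformat>
# Hypothetical illustration of the word-level tagset described above.
# The tokens are invented examples, not taken from the shared-task data.
annotated_comment = [
    ("ee", "KAN"),             # primary-language token (Kannada dataset)
    ("movie", "EN"),           # English token
    ("thumba", "KAN"),
    ("channagide", "KAN"),
    ("Rajkumar", "NAME"),      # named entity (person name)
    ("Bengaluru", "LOCATION"), # geographic reference
    ("100", "NUMBER"),         # numeric expression
    ("moviegalu", "MIXED"),    # intra-token mix: English root + Kannada suffix
    ("!!", "SYM"),             # symbols / punctuation / informal markers
]

# Word-level LI systems predict the second element for each token.
words = [w for w, tag in annotated_comment]
labels = [tag for w, tag in annotated_comment]
      </preformat>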
    </sec>
    <sec id="sec-4">
      <title>4. System Description</title>
      <p>This section gives concise system descriptions for the teams that submitted working notes. These
descriptions highlight the core methodologies and key results of their approaches to the
shared task.</p>
      <p>Team 1: The authors employ an SVM-based framework enriched with morpho-phonological features
designed specifically for Dravidian languages. They combine TF-IDF vectorization over character (1–4)
and word (1–2) n-grams with affix patterns and phonotactic cues to handle agglutination and
Romanization variability. A class-weighted, linear-kernel SVM (C=5) optimized on the Kannada
word-level LI dataset delivers strong performance, achieving a macro F1 score of 0.911 and securing
4th rank in the shared task.</p>
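      <p>A minimal sketch of this kind of pipeline (not the team's actual code), assuming scikit-learn, is shown below; LinearSVC stands in for the class-weighted linear-kernel SVM, and the affix and phonotactic features described above are omitted.</p>
      <preformat>
# Sketch: character + word TF-IDF n-grams with a class-weighted linear SVM,
# assuming scikit-learn; C=5 follows the description above.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.svm import LinearSVC

pipeline = Pipeline([
    ("features", FeatureUnion([
        ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 4))),
        ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
    ])),
    ("svm", LinearSVC(C=5, class_weight="balanced")),
])

# words: list of Romanized tokens; tags: list of language labels (e.g., KAN, EN, SYM)
# pipeline.fit(words, tags)
# predictions = pipeline.predict(test_words)
      </preformat>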
      <p>Team 2: In this working note, the authors describe a word-level LI system for the Dravidian languages
Tamil, Telugu, Malayalam, Kannada, and Tulu by combining TF-IDF character n-grams in the range 2–5,
handcrafted linguistic features (word length, capitalization, and alphanumeric check), and FastText
embeddings. Their proposed system obtained macro F1 scores of 0.908, 0.921, 0.734, 0.818, and 0.791 for
Tulu, Kannada, Tamil, Malayalam, and Telugu, respectively, securing first place in the Tulu and Kannada
tracks and second place in the rest of the tracks.</p>
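      <p>As a rough sketch of how such heterogeneous features could be combined (assuming scikit-learn and SciPy; the FastText embeddings mentioned above would be stacked as additional dense columns in the same way and are omitted here):</p>
      <preformat>
# Sketch: combine sparse TF-IDF character n-grams with dense handcrafted features.
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer

def handcrafted(word):
    # word length, capitalization flag, alphanumeric flag
    return [len(word), float(word[:1].isupper()), float(word.isalnum())]

words = ["Namaskara", "friend", "2025"]          # toy examples
tfidf = TfidfVectorizer(analyzer="char", ngram_range=(2, 5))
X_char = tfidf.fit_transform(words)
X_hand = csr_matrix(np.array([handcrafted(w) for w in words]))
X = hstack([X_char, X_hand])                     # feed X to any classifier
      </preformat>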
      <p>Team 3: The authors of this system approached the shared task with a traditional yet robust
pipeline built around language-specific feature engineering and Conditional Random Fields (CRF).
They designed lexical, orthographic, and contextual features tailored to the morphological properties
and word formation patterns of each Dravidian language, enabling the models to adapt effectively to
code-mixed and noisy inputs. Separate CRF models were trained for each language, leveraging these
handcrafted features to capture fine-grained linguistic cues often missed by end-to-end neural systems.
Their proposed system obtained macro F1 scores of 0.729, 0.796, 0.762, 0.904, and 0.596 for the Telugu, Tulu,
Malayalam, Kannada, and Tamil languages, securing 4th, 5th, 6th, 6th, and 8th ranks, respectively, in the
shared task.</p>
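      <p>A minimal sketch of a CRF setup of this kind is given below, assuming the sklearn-crfsuite library (which the working note does not name); the feature templates are generic placeholders for the language-specific lexical, orthographic, and contextual features described above.</p>
      <preformat>
# Sketch of a CRF tagger over word sequences, assuming sklearn-crfsuite.
import sklearn_crfsuite

def word_features(sent, i):
    w = sent[i]
    return {
        "lower": w.lower(),
        "suffix3": w[-3:],
        "prefix3": w[:3],
        "is_digit": w.isdigit(),
        "prev": sent[i - 1].lower() if i else "BOS",
    }

def sent_features(sent):
    return [word_features(sent, i) for i in range(len(sent))]

# X_train: list of comments (each a list of words); y_train: list of label lists
# crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
# crf.fit([sent_features(s) for s in X_train], y_train)
      </preformat>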
      <p>Team 4: The authors fine-tuned LaBSE on the shared-task datasets for the five Dravidian languages.
The model was trained for 10 epochs using the Adam optimizer with a batch size of 32, and the
classification head was adapted to the number of language labels in the dataset (four or five). Input words were
tokenized, embeddings were generated, and the [CLS] representation from LaBSE was passed through
the classifier to predict the language of each word. The system achieved strong results, with macro-F1
scores of 0.8995 for Kannada, 0.7434 for Tamil, 0.8271 for Malayalam, 0.9515 for Telugu, and 0.8224 for
Tulu, securing 1st place for Tamil, Malayalam, and Telugu, 2nd place for Tulu, and 7th place for Kannada
on the leaderboard.</p>
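      <p>A rough sketch of such a fine-tuning setup is shown below (not the team's actual code), assuming the Hugging Face transformers checkpoint sentence-transformers/LaBSE and PyTorch; hyperparameters follow the description above, and the number of labels is adapted per dataset.</p>
      <preformat>
# Sketch: fine-tuning LaBSE with a word-level classification head,
# assuming Hugging Face transformers and PyTorch.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/LaBSE")
model = AutoModelForSequenceClassification.from_pretrained(
    "sentence-transformers/LaBSE", num_labels=5)   # adapted to the dataset's label set
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)

words = ["chala", "nice", "123"]                   # toy batch of words
labels = torch.tensor([0, 1, 2])                   # numeric label ids
batch = tokenizer(words, padding=True, truncation=True, return_tensors="pt")

model.train()
outputs = model(**batch, labels=labels)            # pooled [CLS]-based classification
outputs.loss.backward()
optimizer.step()
      </preformat>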
      <p>
        Team 5: The authors propose a lightweight approach to word-level LI across five Dravidian languages
(Kannada, Malayalam, Telugu, Tamil, and Tulu) using character-level TF-IDF features combined with
classical machine learning classifiers. Words are vectorized as character n-grams (1–4), transformed
into sparse TF-IDF matrices, and then passed to classifiers, with ExtraTrees showing the strongest
performance. Label encoding is used to convert language tags into numeric form for supervised learning.
The method was evaluated on the shared-task datasets, achieving macro-F1 scores of 0.8987 for Kannada,
0.7938 for Malayalam, 0.7084 for Tamil, 0.7572 for Telugu, and 0.7925 for Tulu. On the leaderboard, this
corresponded to 8th place for Kannada, 4th for Malayalam, 3rd for Tamil and Telugu, and 6th for Tulu.
      </p>
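      <p>A compact sketch of this approach, assuming scikit-learn, could look as follows (toy words and tags for illustration only):</p>
      <preformat>
# Sketch: character TF-IDF features with an ExtraTrees classifier and label encoding.
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder

words = ["inta", "good", "cinema", "2024"]      # toy training words
tags = ["TL", "EN", "TL", "NUMBER"]

vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(1, 4))
X = vectorizer.fit_transform(words)
encoder = LabelEncoder()
y = encoder.fit_transform(tags)                 # language tags as integers

clf = ExtraTreesClassifier(n_estimators=200, random_state=0)
clf.fit(X, y)
pred = encoder.inverse_transform(clf.predict(vectorizer.transform(["super"])))
      </preformat>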
      <p>Team 6: In this work, the authors present a system that combines mBERT and GRU models (mBERT+GRU)
for word-level LI in five Dravidian languages: Kannada, Malayalam, Tamil, Telugu, and Tulu. This
model merges the contextual multilingual transformer embeddings from mBERT with GRU-based
sequential modeling to learn both global and local linguistic patterns in code-mixed text. Training is
performed using a learning rate of 2e-5, weight decay of 0.01, batch size of 16, and up to 150 epochs with
early stopping to avoid overfitting. Focal Loss and oversampling techniques are used to combat
class imbalance, with post-processing prediction cleaning aimed at removing invalid or irrelevant tags.
Evaluation on the official FIRE 2025 shared task dataset shows competitive performance across all
languages, with macro F1 scores of 0.642, 0.620, 0.493, 0.732, and 0.791 for the Malayalam, Tamil, Telugu,
Tulu, and Kannada languages, respectively, obtaining 7th, 7th, 8th, 8th, and 9th ranks in the shared task.</p>
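      <p>A schematic sketch of such an mBERT+GRU architecture is given below, assuming PyTorch and Hugging Face transformers; the focal loss, oversampling, and post-processing steps are omitted, and the exact wiring is an assumption rather than the team's implementation.</p>
      <preformat>
# Sketch of an mBERT + GRU word classifier: each word is tokenized into subwords,
# mBERT embeddings of the subwords are fed to a bidirectional GRU, and the final
# GRU states are classified into language tags.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class MBertGru(nn.Module):
    def __init__(self, num_labels):
        super().__init__()
        self.bert = AutoModel.from_pretrained("bert-base-multilingual-cased")
        self.gru = nn.GRU(input_size=768, hidden_size=256,
                          batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(512, num_labels)

    def forward(self, input_ids, attention_mask):
        hidden = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        _, h_n = self.gru(hidden)
        # concatenate the final forward and backward GRU states
        final = torch.cat([h_n[-2], h_n[-1]], dim=1)
        return self.classifier(final)

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = MBertGru(num_labels=8)
batch = tokenizer(["enakku", "movie"], padding=True, return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])
      </preformat>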
      <p>Team 7: The author employs a CountVectorizer-based representation using character n-grams in
the range of 1–4 to capture subword patterns common in code-mixed Dravidian text. These features
are fed into a linear SVM classifier. The model is trained on the shared task datasets and optimized for
word-level LI across multiple Dravidian languages. Their proposed model obtained macro F1 scores of
0.917, 0.817, 0.790, 0.684, and 0.706 for the Kannada, Tulu, Malayalam, Tamil, and Telugu languages, securing
3rd, 3rd, 5th, 5th, and 6th ranks, respectively, in the shared task.</p>
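      <p>A minimal sketch of this pipeline, assuming scikit-learn, is:</p>
      <preformat>
# Sketch: character n-gram counts with a linear SVM, assuming scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

pipeline = Pipeline([
    ("counts", CountVectorizer(analyzer="char", ngram_range=(1, 4))),
    ("svm", LinearSVC()),
])
# pipeline.fit(train_words, train_tags); pipeline.predict(test_words)
      </preformat>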
    </sec>
    <sec id="sec-5">
      <title>5. Ranking and Findings</title>
      <p>Out of thirty-three registrations for the shared task, Ten teams took part, and seven teams submitted
their working notes. Tables 2, 3 and 4 show the results of the models submitted by the participants,
and the ranks decided by their macro F1 scores (M.F1). The top-performing models achieved macro F1
scores of 0.921, 0908, 0.952, 0.827, and 0.743 for code-mixed Kannada, Tulu, Telugu, Malayalam, and
Tamil code-mixed texts respectively, emphasizing the challenges and accomplishments of the word-level
LI task.</p>
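      <p>For reference, the macro F1 score used for ranking averages the per-class F1 scores, giving each tag equal weight regardless of its frequency; a minimal computation sketch with toy labels, assuming scikit-learn, is:</p>
      <preformat>
# Macro F1 averages the per-class F1 scores over all tags.
from sklearn.metrics import f1_score

y_true = ["KAN", "EN", "KAN", "SYM", "EN"]     # toy gold labels
y_pred = ["KAN", "EN", "EN", "SYM", "EN"]      # toy system predictions
macro_f1 = f1_score(y_true, y_pred, average="macro")
      </preformat>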
      <p>Most of the teams employed a variety of ML models (SVM, ExtraTrees, etc.) for LI in code-mixed
text. In addition, participants also explored Conditional Random Fields (CRF), fine-tuned LaBSE, and
mBERT+GRU models. Further, the ML models proposed by the participants are commonly trained with
TF-IDF of character n-grams or CountVectorizer-based representations of character n-grams. The models
proposed and the features employed by the participating teams highlight the scarcity of computational
resources for handling Dravidian texts.</p>
      <p>
        The team (Team 2) that utilized TF-IDF character n-grams (2–5), handcrafted linguistic features (word
length, capitalization, and alphanumeric check), and FastText embeddings outperformed the other models,
including the fine-tuned LaBSE and CRF models. Most participating teams opted for language-independent
features (TF-IDF of character n-grams) rather than investigating the few available
pre-trained models.
      </p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Future Works</title>
      <p>Although LI is frequently overlooked in low-resource languages, it is an essential first step for many
NLP projects. The amount of text data in low-resource languages has significantly increased as a
result of recent technological developments, especially on social media platforms where code-mixed
content—a combination of local and regional languages and English—is frequently found. Word-level
LI is required in code-mixed texts when many languages are combined at the word level. For various
ranges of “n,” the majority of teams have investigated machine learning models trained with TF-IDF of
character n-grams. The results obtained by the models of the participating teams suggest a promising
avenue for addressing LI challenges in low-resource and code-mixed language scenarios. As future
work, we plan to extend the task by incorporating additional languages to widen its applicability.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used ChatGPT (GPT-4) for grammar and spelling
checking. Paraphrasing was handled via QuillBot. With these tools, the author(s) reviewed and revised the
content as required, while assuming full responsibility for the publication’s integrity.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>V.</given-names>
            <surname>Kolipakam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Jordan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dunn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Greenhill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bouckaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. D.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Verkerk</surname>
          </string-name>
          ,
          <article-title>A Bayesian Phylogenetic Study of the Dravidian Language Family</article-title>
          ,
          <source>Royal Society open science 5</source>
          (
          <year>2018</year>
          )
          <fpage>171504</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F.</given-names>
            <surname>Balouchzahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Butt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hegde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ashraf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Shashirekha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sidorov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gelbukh</surname>
          </string-name>
          ,
          <article-title>Overview of coli-kanglish: Word level language identification in code-mixed kannada-english texts at icon 2022</article-title>
          ,
          <source>in: Proceedings of the 19th International Conference on Natural Language Processing (ICON): Shared Task on Word Level Language Identification in Code-mixed Kannada-English Texts</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>38</fpage>
          -
          <lpage>45</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hegde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Anusha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Coelho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. L.</given-names>
            <surname>Shashirekha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <article-title>Corpus creation for sentiment analysis in code-mixed tulu text</article-title>
          ,
          <source>in: Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>33</fpage>
          -
          <lpage>40</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Sushma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hegde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. L.</given-names>
            <surname>Shashirekha</surname>
          </string-name>
          ,
          <article-title>Word-level language identification in code-mixed tulu texts</article-title>
          .,
          <source>in: FIRE (Working Notes)</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>213</fpage>
          -
          <lpage>222</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hegde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Balouchzahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Butt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Coelho</surname>
          </string-name>
          , K. G,
          <string-name>
            <given-names>H. S.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <surname>S. HL</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          , Coli@ fire2024:
          <article-title>Findings of word-level code-mixed language identification in dravidian languages</article-title>
          ,
          <source>in: Proceedings of the 16th Annual Meeting of the Forum for Information Retrieval Evaluation</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>7</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hegde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Balouchzahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Coelho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Shashirekha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. A.</given-names>
            <surname>Nayel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Butt</surname>
          </string-name>
          ,
          <article-title>Overview of coli-tunglish: Word-level language identification in code-mixed tulu text at fire 2023</article-title>
          ., in: FIRE (Working Notes),
          <year>2023</year>
          , pp.
          <fpage>179</fpage>
          -
          <lpage>190</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hegde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Balouchzahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Coelho</surname>
          </string-name>
          ,
          <string-name>
            <surname>S. HL</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. A.</given-names>
            <surname>Nayel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Butt</surname>
          </string-name>
          , Coli@ fire2023:
          <article-title>Findings of word-level language identification in code-mixed tulu text</article-title>
          ,
          <source>in: Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>25</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>H. L.</given-names>
            <surname>Shashirekha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Balouchzahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Anusha</surname>
          </string-name>
          , G. Sidorov,
          <article-title>Coli-machine learning approaches for code-mixed language identification at the word level in kannada-english texts</article-title>
          ,
          <source>arXiv preprint arXiv:2211.09847</source>
          (
          <year>2022</year>
          )
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9] S. Gundapu, R. Mamidi, Word level language identification in english telugu code mixed data, in: Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation, 2018.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10] P. Veena, M. Anand Kumar, K. Soman, Character embedding for language identification in hindi-english code-mixed social media text, Computación y Sistemas 22 (2018) 65–74.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11] I. Chaitanya, I. Madapakula, S. K. Gupta, S. Thara, Word level language identification in code-mixed data using word embedding methods for indian languages, in: 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI), IEEE, 2018, pp. 1137–1141.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12] P. Veena, M. A. Kumar, K. Soman, An effective way of word-level language identification for code-mixed facebook comments using word-embedding via character-embedding, in: 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), IEEE, 2017, pp. 1552–1556.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>