<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>CoLIR: Late Interaction based Code-Mixed Information Retrieval for English-Bengali language pair</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
<string-name>Devansh Shroff</string-name>
          <email>devansh.shroff@gmail.com</email>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
<p>There has been limited work on developing information retrieval models for Indic languages, and even less on code-mixed information retrieval. Code-mixing is a challenge for Indic languages, as it is usually accompanied by transliteration, and no standard CMIR model exists for it. On this premise, the paper attempts to develop a model specifically for the English-Bengali language pair. This work uses ColBERT on a test dataset containing code-mixed and transliterated queries developed from content extracted from social media in English and Bengali. Query expansion using a phonetic algorithm is performed on the dataset to make the parsing of the texts more effective. A few experiments are performed by combining classical ranking models as initial rankers of the documents with a neural re-ranker, in this case ColBERT. The experiments are carried out using PyTerrier. The results place Team IRSolver 7th with a mAP score of 0.063702, and indicate that neural IR on the type of documents in our dataset seems to be the way forward for effective CMIR on Indic language texts.</p>
      </abstract>
      <kwd-group>
        <kwd>code-mixed</kwd>
        <kwd>transliteration</kwd>
        <kwd>ColBERT</kwd>
        <kwd>Query Expansion</kwd>
        <kwd>PyTerrier</kwd>
        <kwd>CMIR</kwd>
        <kwd>neural IR</kwd>
        <kwd>information retrieval</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>Indic languages have always been approached meticulously in various NLP tasks due to the diversity and uniqueness of the languages in terms of morphology and scripts. This, however, poses certain problems, especially for models that must parse a document written in those languages effectively. The problem is aggravated when the text is code-mixed, and more so when it is transliterated as well. For example, a word like আছে is often transliterated as aache, while on social media it sometimes turns up as a6e. Consider another word like ভালো, which is transliterated as bhaalo and, a lot of times, as vaalo, valo, bhalo etc. online. In information retrieval this is a major hurdle, as our model must be robust enough to handle the linguistic dynamics in play in order to retrieve meaningful answers effectively. In that case, how can we retrieve answers from documents that cannot even be parsed properly? And how can we ensure that our model captures the linguistic nuances of Indian languages, even if we can parse them?</p>
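<p>As an illustration of why such spelling variants matter for retrieval, the sketch below collapses common Roman-Bengali variants to a single phonetic key. The rules here (6→chh, bh/v→b, long-vowel collapse) are illustrative assumptions made for this example only; they are not the Hindex algorithm used later in this paper.</p>

```python
import re

# Illustrative normalisation rules for Roman-transliterated Bengali.
# These substitutions are assumptions for demonstration; a real phonetic
# algorithm such as Hindex defines its own phoneme-based rule set.
RULES = [
    (r"6", "chh"),    # social-media shorthand: a6e -> achhe
    (r"bh|v", "b"),   # bhalo / valo -> balo
    (r"chh|ch", "c"), # collapse aspirated/plain affricates
    (r"aa", "a"),     # collapse doubled vowels
    (r"ee", "i"),
    (r"oo", "u"),
]

def phonetic_key(word: str) -> str:
    """Collapse common transliteration variants to one canonical key."""
    key = word.lower()
    for pattern, repl in RULES:
        key = re.sub(pattern, repl, key)
    return key
```

Under these rules, bhaalo, valo and bhalo all map to the same key, as do aache and a6e, so a retrieval system matching on keys rather than surface forms would treat the variants as one term.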
<p>The paper aims to develop a model that can compute the relevance of a query to a particular document, where both are code-mixed and transliterated, specifically across English, Roman-transliterated Bengali and Brahmi-script Bengali. Given a query and a document, the goal is to determine the relevance score of the query to the document and rank the documents accordingly. This involves handling the complexities of code-mixing, where elements from both languages are used within the same text, as well as dealing with the informal and non-standardised nature of the language. The system must accurately capture the semantic relationship between the query and the document despite these linguistic challenges. This matters because, first of all, comparatively little work has been documented for this specific use case. A significant outcome in this direction would pave the way for highly robust search engines that return truly meaningful answers to queries in the vernacular language, with a fundamental understanding of modern social contexts. Moreover, it would widen access for groups previously disadvantaged by the absence of search systems that process vernacular low-resource languages, especially on social media platforms and online community forums.</p>
<p>CEUR Workshop Proceedings (ISSN 1613-0073)</p>
    </sec>
<sec id="sec-2">
      <title>2. Dataset</title>
      <p>
        This shared task provides a single dataset [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ] for code-mixed information retrieval. The corpus consists of 107,900 documents and 50 queries in total, written in Roman-transliterated Bengali mixed with English.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Related Work</title>
      <p>
        The importance of considering the encoding of transliterated text is best shown in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], which report a significant difference in performance when phonetic encoding is brought into play. Phonetic encoding has generally not been considered as a supplementary tool with neural rankers. This work explores the difference in performance obtained by incorporating phonetic encoding, in the form of query expansion, alongside a neural ranker and re-ranker, to observe whether it increases the accuracy of document retrieval.
      </p>
      <p>For the purposes of this paper, the state of the art was narrowed down to three major techniques pertinent to developing effective information retrieval models for this topic: prompting, sparse retrieval and late interaction.</p>
      <sec id="sec-3-1">
        <title>3.1. Prompting</title>
        <p>
          Prompting of LLMs has been a common occurrence in information retrieval papers in the years following the development of the transformer [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. It essentially involves zero-shot or few-shot prompting of LLMs to make them rank documents by their relevance to a given query. One of the initial approaches ranked documents using Pointwise Ranking Prompting [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ][
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], wherein a relevance generation method [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] calculates a relevance score based on a probabilistic function. Additionally, a query generation method [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] has been used, where an LLM is prompted to generate a query based on the document and the probability of generating the actual query is checked. A more effective approach, devised by [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], is Pairwise Ranking Prompting, in which relevance is checked pairwise, owing to LLMs' inherent ability to understand the pairwise relevance order of documents. Experiments were carried out on the test sets of TREC-DL2019 and TREC-DL2020 along with the BEIR dataset, giving a higher NDCG@10 score than RankGPT and almost on-par performance with GPT-4 and GPT-3.5 Turbo. An improvement over this is the Setwise approach [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], which improved on the performance of [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] by comparing multiple candidates at once. Experiments for [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] were likewise performed on BEIR and TREC-DL2019, giving a slightly higher NDCG@10 score than [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
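<p>A minimal sketch of the pairwise ranking idea: documents are ordered using only pairwise preference judgements. Here a toy term-overlap heuristic stands in for the LLM's pairwise preference call; the function names and heuristic are illustrative assumptions, not the prompts used in [8].</p>

```python
from functools import cmp_to_key

def llm_prefers(query, doc_a, doc_b):
    """Stand-in for an LLM prompt such as 'Given the query, which
    passage is more relevant: A or B?'. A toy term-overlap heuristic
    substitutes for the model's answer (positive means A preferred)."""
    overlap = lambda d: len(set(query.split()) & set(d.split()))
    return overlap(doc_a) - overlap(doc_b)

def pairwise_rank(query, docs):
    """Order documents from pairwise preferences alone, in the spirit
    of Pairwise Ranking Prompting: the comparator asks only which of
    two documents is preferred, never for an absolute score."""
    cmp = lambda a, b: llm_prefers(query, b, a)  # preferred doc first
    return sorted(docs, key=cmp_to_key(cmp))
```

The Setwise variant would generalise the comparator to several candidates per call, reducing the number of comparisons needed.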
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Sparse Retrieval</title>
        <p>
          Sparse retrieval is a technique wherein queries and documents are converted into high-dimensional sparse vectors over a vocabulary, and an inverted index is used to look up exact or weighted term matches, followed by ranking. One of the significant works in this sphere is [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], which introduced the DeepCT (Deep Contextualized Term Weighting) framework that maps BERT's contextualised embeddings to context-aware term weights for sentences and passages [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] for first-stage retrieval. Another very important development in this area is [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], which introduced SPLADE (Sparse Lexical and Expansion Model) with a primary focus on first-stage ranking, giving much higher NDCG@10 and MRR@10 scores than DeepCT [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] and other sparse retrieval models when evaluated on the MS MARCO dataset.
        </p>
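<p>A minimal sketch of the sparse-retrieval machinery described above, using raw term frequencies as the term weights; DeepCT and SPLADE would replace these with learned, contextualised weights (and, in SPLADE's case, add expansion terms). The toy documents are illustrative assumptions.</p>

```python
from collections import defaultdict

def build_index(docs):
    """Inverted index mapping term -> {doc_id: weight}. Here the
    weight is the raw term frequency; learned sparse models assign
    context-aware weights to the same structure."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

def search(index, query, k=2):
    """Score documents by summing the weights of matching query
    terms, then return the top-k document ids."""
    scores = defaultdict(float)
    for term in query.split():
        for doc_id, weight in index.get(term, {}).items():
            scores[doc_id] += weight
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Because only the terms present in the query are looked up, scoring touches a small fraction of the collection, which is what makes sparse first-stage retrieval fast.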
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Late-Interaction</title>
        <p>
          Late-interaction methods, introduced recently, have proven to be a game-changer in retrieval tasks due to their unique mechanism of encoding queries and documents independently and then estimating similarity with the MaxSim function (the MaxSim operator computes the maximum similarity between a query embedding and the document embeddings; the scalar outputs of multiple such operators are summed across query terms [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]). This preserves important token-level context in both, unlike coalescing each into a single high-dimensional vector. Late interaction was introduced by [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] through the ColBERT model (Figure 1). ColBERT gives better MRR@10 and Recall scores than existing methods on MS MARCO. Its variant ColBERTv2 [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] performs even better when compared head to head with SPLADEv2 on the BEIR and LoTTE benchmarks. Moreover, ColBERTv2 improves upon the latency of ColBERT, in addition to enabling higher retrieval quality and a much smaller index, thus saving space.
        </p>
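<p>The MaxSim computation can be sketched in a few lines: for each query token embedding, take the maximum similarity over all document token embeddings, then sum those maxima. The toy two-dimensional token embeddings below are illustrative; a real ColBERT model would produce contextualised BERT embeddings instead.</p>

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def maxsim_score(query_emb, doc_emb):
    """Late-interaction relevance: for each query token embedding,
    take the maximum similarity over all document token embeddings,
    then sum these per-token maxima across the query [13]."""
    return sum(max(cosine(q, d) for d in doc_emb) for q in query_emb)
```

Because each query token interacts with every document token only at scoring time, document embeddings can be precomputed and indexed offline, which is the source of ColBERT's efficiency relative to cross-encoders.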
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Approaches specifically for CMIR</title>
        <p>
          Although none of the models mentioned above was designed specifically with code-mixing in mind, they are important cornerstones to build on when developing a robust model that can tackle it. Considering code-mixed IR work on Indic languages specifically, benchmark works for this use case are [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] and [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], which experimented on corpus-based stopword removal and phonetic encoding for query expansion, respectively; both proved quite effective and gave a completely new way to think about Indic texts for retrieval. An addition to the aforementioned papers, instrumental in devising a new method to effectively parse Indic texts, is [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], which introduces a new phonetic algorithm called Hindex. It defines a set of rules to encode transliterated Hindi texts based on the phonemes of the consonants and vowels in Hindi. [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] achieved a 16% increase in mAP score over the removal of non-corpus-based stopwords. [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] achieved over a 15% increase in mAP for Hindex-encoded Named Entity and English tags, showcasing the efficacy of the Hindex phonetic algorithm. [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] and [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] have explored this topic through Sentence-BERT combined with a Graph Neural Network (GNN), and through a mathematical model based on prompting GPT-3.5 Turbo and the sequential nature of the documents, respectively. An important contribution towards Indic IR in terms of resources has been IndicIRSuite [18], which introduces open-source ColBERT models called IndicColBERT, fine-tuned on 11 Indian languages using the Indic-MS MARCO dataset.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>
        The aim is to perform empirical research on selected information retrieval techniques that may prove effective for code-mixed, transliterated social media data, in order to achieve a significant mAP score and obtain meaningful search results when applied to queries with the same constraints, for English and Bengali. Moreover, a comparison is made between traditional IR methods and these experimental methods to understand the difference in performance. The main research questions based on the above premise are as follows.
      </p>
      <p>RQ1. Can we rely solely on a neural model for CMIR, without a first-stage retrieval?</p>
      <p>RQ2. How important is accurate phoneme-based parsing in improving performance on code-mixed, transliterated Indic language texts?</p>
      <p>
        The experiments presented in this paper are carried out using PyTerrier, a Python interface for the Terrier IR engine developed by [19], which enables the creation of flexible retrieval pipelines. The models and indexers used here are taken from the PyTerrier libraries, such as TRECCollectionIndexer for building the indexes used by algorithms like BM25, and ColBERTIndexer for indexing documents according to ColBERT's index structure. The models used for this experiment [20] are BM25 and ColBERT, wherein BM25 is loaded using PyTerrier's Retriever class (https://pyterrier.readthedocs.io/en/latest/_modules/pyterrier/terrier/retriever.html) and ColBERT is initialised via the PyTerrier ColBERT extension (https://github.com/terrierteam/pyterrier_colbert/blob/main/pyterrier_colbert/indexing.py). A small comparative study is drawn up between BM25, ColBERT, and a pipeline containing BM25 as the first-stage ranker with ColBERT as a neural re-ranker. This is done using the then operator (represented by »), which combines two transformers [19] (retrieval transformers in this case). The experiments were carried out on the training data provided by [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], with one case using non-expanded queries and the other using Hindex-expanded queries. Hindex is applied to Named-Entity tags and English tags only, since this setup yielded the best results in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Additionally, the BM25»ColBERT model is applied to the provided test queries [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This paper used a pre-trained ColBERT model for the purposes of experimentation; no explicit fine-tuning was carried out on ColBERT.
      </p>
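<p>As a self-contained sketch of this two-stage design (not the actual PyTerrier pipeline, which composes transformers over real index structures), the code below chains a minimal BM25 first-stage ranker with a stub re-ranker standing in for ColBERT; the documents, query and the overlap heuristic in the re-ranker are illustrative assumptions.</p>

```python
from math import log

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Minimal BM25 over whitespace tokens (first-stage ranker)."""
    n_docs = len(docs)
    avgdl = sum(len(d.split()) for d in docs.values()) / n_docs
    scores = {}
    for doc_id, text in docs.items():
        terms = text.split()
        score = 0.0
        for t in set(query.split()):
            tf = terms.count(t)
            df = sum(1 for d in docs.values() if t in d.split())
            idf = log((n_docs - df + 0.5) / (df + 0.5) + 1)
            score += idf * tf * (k1 + 1) / (
                tf + k1 * (1 - b + b * len(terms) / avgdl))
        scores[doc_id] = score
    return scores

def neural_rerank(query, candidates, docs):
    """Stub re-ranker standing in for ColBERT's MaxSim scoring;
    here a toy term-overlap heuristic reorders the candidates."""
    overlap = lambda d: len(set(query.split()) & set(docs[d].split()))
    return sorted(candidates, key=overlap, reverse=True)

def pipeline(query, docs, k=3):
    """First stage retrieves top-k by BM25; second stage re-ranks
    them, mirroring the BM25 » ColBERT composition."""
    s = bm25_scores(query, docs)
    top = sorted(s, key=s.get, reverse=True)[:k]
    return neural_rerank(query, top, docs)
```

The second stage only ever sees the k candidates the first stage surfaces, which is why a cheap first-stage ranker makes an expensive neural re-ranker affordable over large collections.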
      <p>The metrics used to measure the performance of each model are mAP (Mean Average Precision), NDCG (Normalised Discounted Cumulative Gain), P@5 (Precision@5) and P@10 (Precision@10), with greater emphasis on the mAP score.</p>
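<p>Since mAP is the headline metric, a minimal reference implementation may help make it concrete; the ranked lists and relevance sets below are illustrative.</p>

```python
def average_precision(ranked_ids, relevant):
    """AP: mean over the relevant set of precision@i at each rank i
    where a relevant document appears in the ranking."""
    hits, precisions = 0, []
    for i, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """mAP: mean of AP over queries, where `runs` is a list of
    (ranked_ids, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

Note that AP rewards placing relevant documents early: a relevant document at rank 1 contributes a precision of 1, while the same document at rank 10 contributes only 0.1.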
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>It is evident from Table 1 and Table 2 that query expansion improves the mAP, NDCG, P@5 and P@10 scores of our chosen models, answering RQ2: phoneme-based parsing of transliterated code-mixed English-Bengali text introduces variations of NE and English tags into the expanded query that aid retrieval. It is also evident that solely using a neural ranker does not yield a significant mAP score; rather, a more sophisticated approach is required if an effective single-stage neural ranker is to be obtained, answering RQ1. The scores obtained on the test set were quite low, and hence underscore the need to adopt a stronger approach.</p>
      <table-wrap id="tbl-models">
        <caption>
          <p>Performance of the models on the training queries (mAP, NDCG, P@5 and P@10).</p>
        </caption>
        <table>
          <thead>
            <tr><th>Model</th><th>mAP</th><th>NDCG</th><th>P@5</th><th>P@10</th></tr>
          </thead>
          <tbody>
            <tr><td>BM25</td><td>0.175802</td><td>0.397553</td><td>0.34</td><td>0.25</td></tr>
            <tr><td>ColBERT</td><td>0.074576</td><td>0.347530</td><td>0.22</td><td>0.145</td></tr>
            <tr><td>BM25»ColBERT</td><td>0.119620</td><td>0.225578</td><td>0.37</td><td>0.250</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <sec id="sec-5-4">
        <title>Test Set Results</title>
        <table-wrap id="tbl-sample">
          <caption>
            <p>Sample test query and retrieved documents from the test set.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Query</th><th>Document</th></tr>
            </thead>
            <tbody>
              <tr>
                <td rowspan="4">hi hyderabad e rapid antigen test kothay kora hochhe keu janate parben its urgent jate test korar 1 2 ghontar modhhe result pete pari</td>
                <td>acha tmr chele thikache to covid test korar por test korar por kono problm hochhe na toh</td>
              </tr>
              <tr><td>test er no e o onek kichu edik odik ache like delhi rt oct half kore half er besi rapid test korche</td></tr>
              <tr><td>rapid test authentic na apni normal ta karun citizen hospital e retcr ta korate parben</td></tr>
              <tr><td>pdf search korar valo kichhu website janate parben</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The scores obtained on the test set, after successful submission of the obtained ranking of relevant documents for the individual test queries, are presented in Table 3, along with a sample query and documents with their relevance scores from the test set in Table 4. Team "IRSolver" [21, 22] secured 7th rank amongst all submissions, with a mAP score of 0.063702 when evaluated on the test data by the organisers.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>Code-mixing and transliteration complicate retrieval tasks on Indic language texts due to the sheer diversity of the scripts and the linguistic dynamics, which demands a more specialised approach; traditional information retrieval methods cannot adequately address this problem.</p>
      <p>This paper explores the efficacy of ColBERT as a single-stage neural ranker for retrieval, and highlights the superiority, for the code-mixed information retrieval task, of a two-stage retrieval engine, with BM25 as the first-stage ranker and ColBERT as the neural re-ranker, over the former. Moreover, the paper highlights the usefulness of the phoneme-based Hindex approach for encoding texts in queries to retrieve the most relevant documents, and the boost in performance brought about by the combination of these two approaches.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author used Grammarly to check grammar and spelling. The author reviewed and edited the content as needed and takes full responsibility for the publication's content.</p>
      <p>[18] S. Haq, A. Sharma, P. Bhattacharyya, IndicIRSuite: Multilingual Dataset and Neural Information Models for Indian Languages, 2023. URL: https://arxiv.org/abs/2312.09508. arXiv:2312.09508.
[19] C. Macdonald, N. Tonellotto, Declarative Experimentation in Information Retrieval using PyTerrier, in: Proceedings of the 2020 ACM SIGIR International Conference on Theory of Information Retrieval, 2020, pp. 161-168.
[20] C. Macdonald, N. Tonellotto, S. MacAvaney, I. Ounis, PyTerrier: Declarative experimentation in Python from BM25 to dense retrieval, in: Proceedings of the 30th ACM International Conference on Information &amp; Knowledge Management, 2021, pp. 4526-4533.
[21] S. Chanda, K. Tewari, S. Pal, Findings of the Code-Mixed Information Retrieval from Social Media Data (CMIR) Shared Task at FIRE 2025, in: Forum for Information Retrieval Evaluation (Working Notes) (FIRE 2025), December 17-20, Varanasi, India, CEUR-WS.org, 2025.
[22] S. Chanda, K. Tewari, S. Pal, Overview of the CMIR Track at FIRE 2025: Code-Mixed Information Retrieval from Social Media Data, in: FIRE '25: Proceedings of the 17th Annual Meeting of the Forum for Information Retrieval Evaluation, December 17-20, Varanasi, India, Association for Computing Machinery (ACM), New York, NY, USA, 2025.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chanda</surname>
          </string-name>
          ,
          <source>Text Processing on Code Mixed Social Media Data</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D. K.</given-names>
            <surname>Prabhakar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <article-title>Query Expansion for Transliterated Text Retrieval</article-title>
          ,
          <source>Transactions on Asian and Low-Resource Language Information Processing</source>
          <volume>20</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>34</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chanda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <article-title>Overview of the shared task on code-mixed information retrieval from social media data</article-title>
          , in: Proceedings of the 16th Annual Meeting of the Forum for Information Retrieval Evaluation, FIRE '24, Association for Computing Machinery, New York, NY, USA,
          <year>2025</year>
          , pp.
          <fpage>29</fpage>
          -
          <lpage>31</lpage>
          . URL: https://doi.org/10.1145/3734947.3735670. doi:10.1145/3734947.3735670.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chanda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <article-title>Overview of the shared task on code-mixed information retrieval from social media data</article-title>
          ,
          <source>in: FIRE 2024 Working Notes, CEUR Workshop Proceedings</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>124</fpage>
          -
          <lpage>128</lpage>
          . URL: https://ceur-ws.org/Vol-4054/T2-1.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ł.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          , Attention is All You Need,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bommasani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tsipras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Soylu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yasunaga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Narayanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kumar</surname>
          </string-name>
          , et al.,
          <source>Holistic Evaluation of Language Models, arXiv preprint arXiv:2211.09110</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Sachan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Aghajanyan</surname>
          </string-name>
          , W.-t. Yih,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pineau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <article-title>Improving Passage Retrieval with Zero-Shot Question Generation</article-title>
          ,
          <source>arXiv preprint arXiv:2204.07496</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jagerman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Metzler</surname>
          </string-name>
          , et al.,
          <source>Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting, arXiv preprint arXiv:2306.17563</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Koopman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          ,
          <article-title>A Setwise Approach for Effective and Highly Efficient Zero-Shot Ranking with Large Language Models</article-title>
          ,
          <source>in: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>38</fpage>
          -
          <lpage>47</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Callan</surname>
          </string-name>
          ,
          <article-title>Context-Aware Sentence/Passage Term Importance Estimation for First Stage Retrieval</article-title>
          , arXiv preprint arXiv:1910.10687 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Callan</surname>
          </string-name>
          ,
          <article-title>Context-Aware Term Weighting for First Stage Passage Retrieval</article-title>
          ,
          <source>in: Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1533</fpage>
          -
          <lpage>1536</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>T.</given-names>
            <surname>Formal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Piwowarski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Clinchant</surname>
          </string-name>
          ,
          <article-title>SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking</article-title>
          ,
          <source>in: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>2288</fpage>
          -
          <lpage>2292</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>O.</given-names>
            <surname>Khattab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zaharia</surname>
          </string-name>
          ,
          <article-title>ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT</article-title>
          ,
          <source>in: Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>39</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>K.</given-names>
            <surname>Santhanam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Khattab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Saad-Falcon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Potts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zaharia</surname>
          </string-name>
          ,
          <article-title>ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction</article-title>
          ,
          <source>arXiv preprint arXiv:2112.01488</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chanda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <article-title>The Effect of Stopword Removal on Information Retrieval for Code-Mixed Data Obtained Via Social Media</article-title>
          ,
          <source>SN Comput. Sci. 4</source>
          (
          <year>2023</year>
          ). URL: https://doi.org/10.1007/s42979-023-01942-7. doi:10.1007/s42979-023-01942-7.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <article-title>InfoTextCM: Addressing Code-Mixed Data Retrieval Challenges via Text Classification</article-title>
          ,
          <source>in: Forum for Information Retrieval Evaluation (FIRE-2024)</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>A.</given-names>
            <surname>Deroy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Maity</surname>
          </string-name>
          ,
          <article-title>RetrieveGPT: Merging Prompts and Mathematical Models for Enhanced Code-Mixed Information Retrieval</article-title>
          ,
          <source>arXiv preprint arXiv:2411.04752</source>
          (
          <year>2025</year>
          ). URL: https://arxiv.org/abs/2411.04752.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>