<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Multi-Model Lexical Fusion Approach for English-Bengali Code-Mixed Information Retrieval</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kunal Chakma</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Subhajit Datta</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Institute of Technology</institution>
          ,
          <addr-line>Agartala, Tripura, 799046</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Sister Nivedita University</institution>
          ,
          <addr-line>Newtown, West Bengal, 700156</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>The CMIR-2025 shared task focuses on retrieving relevant social media posts written in English-Bengali code-mixed text, a setting characterized by noisy transliteration and informal language use. This paper presents a multi-model lexical retrieval framework developed for the task, which integrates two-stage text normalization and hybrid retrieval. The normalization stage combines dictionary-based replacement and fuzzy string matching to handle spelling and transliteration variations. Subsequently, multiple classical models (BM25, TF-IDF, PL2, InL2, and Hiemstra_LM) are applied, and their ranked outputs are merged using Reciprocal Rank Fusion (RRF). The proposed system, submitted under the team name NITA_CMIR, achieved a Mean Average Precision (MAP) of 0.1518, an nDCG of 0.4151 (ranked 2nd overall), a P@5 of 0.30, and a P@10 of 0.237 on the official evaluation dataset. The results highlight the effectiveness of integrating normalization and rank fusion for robust information retrieval in code-mixed social media data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The remainder of the paper is organized into the following sections. The related works are discussed
in Section 2. The dataset is discussed in Section 3. The proposed work is discussed in Section 4. The
results are discussed in Section 5, followed by a discussion on the observations in Section 6. The paper
concludes in Section 7.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>
        Code-mixed information retrieval (CMIR) has gained attention in recent years, particularly for
low-resource languages such as the English-Bengali mixtures common in Indian social networks. Earlier
works [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], such as those in the FIRE (Forum for Information Retrieval Evaluation) tracks, focused on
handling transliteration and spelling variations in multilingual queries. For instance, the CMIR-2024
shared task [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] introduced benchmarks for retrieving relevant social media posts in English-Bengali
code-mixed data, emphasizing the challenges of noisy text and informal language. Chanda and Pal [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
provide an overview of CMIR challenges in FIRE-2024, highlighting issues like orthographic variations
and lack of standardization.
      </p>
      <p>
        Classical lexical models such as BM25 [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and TF-IDF [8] remain baselines in CMIR due to their
efficiency, but they struggle with vocabulary mismatch caused by Romanization and slang [9]. To
mitigate this, normalization techniques, including dictionary-based mapping and fuzzy matching, have
been proposed [10]. For example, tools such as RapidFuzz<sup>1</sup> have been used to handle orthographic
variations in transliterated text. Recent advances incorporate neural embeddings for semantic matching,
such as multilingual variants of BERT [11] fine-tuned on code-mixed data [12]. However, these require
substantial computational resources, which makes hybrid approaches that combine lexical retrieval
with fusion methods such as Reciprocal Rank Fusion (RRF) [13] practical for resource-constrained
settings. Our work builds on these by integrating normalization with multiple lexical retrievers and
RRF for robust run generation in English-Bengali IR.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset</title>
      <p>The CMIR-2025 dataset for the shared task, obtained from platforms such as Facebook, comprises
approximately 107,900 social media posts in code-mixed English-Bengali text (Romanized Bengali and
English). Each post is considered as a document. These documents comprise fields such as DOCNO
(unique identifier), HEAD (optional header), and BODY (main content), frequently containing slang,
abbreviations, and spelling variations.</p>
      <p>The training set provided by the organisers comprises 20 code-mixed queries with their query relevance
judgments (qrels). The test set includes 30 code-mixed queries that simulate real-world searches on subjects such as
daily life and entertainment. Queries integrate Romanized Bengali (e.g., “ki”, “hobe”) alongside English.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Proposed Work</title>
      <sec id="sec-4-1">
        <title>4.1. Problem definition</title>
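        <p>
          The task requires retrieving ranked lists of relevant documents (runs) for English-Bengali code-mixed
queries. Code-mixed IR (CMIR) [14], [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], [9] can be defined as retrieving code-mixed documents for
given code-mixed queries. Formally, CMIR can be defined as follows: for a given query
<inline-formula><tex-math>q_{\langle l(i),\, s(j) \rangle}</tex-math></inline-formula>, where
<inline-formula><tex-math>\langle l(i),\, s(j) \rangle</tex-math></inline-formula> is the language and script, the task is to retrieve the relevant documents from a pool of documents
<inline-formula><tex-math>D</tex-math></inline-formula> written in language <inline-formula><tex-math>l(i)</tex-math></inline-formula> and script <inline-formula><tex-math>s(j)</tex-math></inline-formula>, where <inline-formula><tex-math>n &gt; 1</tex-math></inline-formula>. Therefore, in such a scenario, the document
pool is defined as:
        </p>
        <disp-formula id="eq-1">
          <label>(1)</label>
          <tex-math>D = \bigcup D_{l(i),\, s(j)}</tex-math>
        </disp-formula>
        <p>where language <inline-formula><tex-math>l = \{l_1, l_2, \ldots, l_n\}</tex-math></inline-formula> and script <inline-formula><tex-math>s = \{s_1, s_2, \ldots, s_m\}</tex-math></inline-formula>. In the CMIR 2025 shared task, <inline-formula><tex-math>l = \{l_1, l_2\}</tex-math></inline-formula>
and <inline-formula><tex-math>s = \{s_1\}</tex-math></inline-formula>.</p>
        <sec id="sec-4-1-1">
          <title><sup>1</sup> https://github.com/rapidfuzz/RapidFuzz</title>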
          <p>Our submission uses fused rankings from multiple lexical models. The workflow of the proposed
system is shown in Figure 1 and described in the following sections.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Preprocessing Steps</title>
        <sec id="sec-4-2-1">
          <title>4.2.1. Dictionary normalization</title>
          <p>The raw queries and documents contain many informal spellings, numeric-letter
substitutions, and slang words in both English and Bengali. To address this, we construct normalization
dictionaries for both languages.</p>
          <p>• English Slang Dictionary: Maps common abbreviations and non-standard forms, such
as “gd” → “good”, “hpy” → “happy”, “frnd” → “friend”, “luv” → “love”, “gr8” → “great”, and “plz” →
“please”, to their standard equivalents. This helps reduce spelling noise in the English fragments
of code-mixed text.
• Bengali Slang Dictionary (Romanized): Normalizes frequently observed Romanized
Bengali slang terms. Examples include “ache” → “aache”, “6ilo” → “chilo”, “onk” → “onek”, “erkm” →
“erokom”, “hye6e” → “hoyeche”, and “ki6u” → “kichu”.</p>
        </sec>
        <sec id="sec-4-2-2">
          <title>4.2.2. Tokenization</title>
          <p>Following normalization, each sentence is split into tokens. In addition to simple whitespace-based
tokenization, we also perform language tagging to identify whether a token belongs to English (en)
or Romanized Bengali (bn). Each token is then aligned with its normalized form, if available. Table 1
shows the normalized form of some of the words found in the corpus.</p>
          <table-wrap id="tab-1">
            <label>Table 1</label>
            <table>
              <thead>
                <tr><th>Word</th><th>Language</th></tr>
              </thead>
              <tbody>
                <tr><td>gd</td><td>en</td></tr>
                <tr><td>frnd</td><td>en</td></tr>
                <tr><td>bdy</td><td>en</td></tr>
                <tr><td>6ilo</td><td>bn</td></tr>
                <tr><td>krechilo</td><td>bn</td></tr>
                <tr><td>n8</td><td>en</td></tr>
                <tr><td>fvrt</td><td>en</td></tr>
                <tr><td>coz</td><td>en</td></tr>
                <tr><td>h66e</td><td>bn</td></tr>
              </tbody>
            </table>
          </table-wrap>
        </sec>
        <sec id="sec-4-2-3">
          <title>4.2.3. Fuzzy Matching with RapidFuzz</title>
          <p>After dictionary-based slang normalization, additional preprocessing is applied to handle spelling
variations and noisy Romanized text. We use RapidFuzz, a fuzzy string matching library, to further
normalize tokens. RapidFuzz computes similarity scores between strings, which allows us to map
out-of-vocabulary words to their closest standardized form when the similarity exceeds
a threshold of 85.</p>
          <p>For example:
“frndz” is mapped to “friend”,
“ferends” is mapped to “friends”,
“achee” is mapped to “aache”.</p>
          <p>This two-stage normalization process (dictionary replacement followed by fuzzy matching) reduces
the effect of orthographic variations, thereby improving retrieval consistency in code-mixed queries.</p>
        </sec>
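        <p>The two normalization stages described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the submitted implementation: the dictionaries hold only the example entries quoted in the text, the function names are hypothetical, and Python's standard-library difflib stands in for RapidFuzz so that the sketch runs without extra dependencies (its ratio is similar to, but not identical with, RapidFuzz scores). The sketch also assumes that unknown tokens are compared against the dictionary keys, which the text does not specify.</p>
        <preformat>
```python
from difflib import SequenceMatcher

# Example entries quoted in the text; the full dictionaries are not reproduced here.
EN_SLANG = {"gd": "good", "hpy": "happy", "frnd": "friend",
            "luv": "love", "gr8": "great", "plz": "please"}
BN_SLANG = {"ache": "aache", "6ilo": "chilo", "onk": "onek",
            "erkm": "erokom", "hye6e": "hoyeche", "ki6u": "kichu"}
SLANG = {**EN_SLANG, **BN_SLANG}

THRESHOLD = 85  # similarity cut-off on a 0-100 scale, as stated in the text


def similarity(a, b):
    """Similarity on a 0-100 scale (difflib stand-in for a RapidFuzz ratio)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def normalize_token(token):
    """Stage 1: exact dictionary lookup. Stage 2: fuzzy match on dictionary keys."""
    token = token.lower()
    if token in SLANG:
        return SLANG[token]
    # Assumption: unknown tokens are compared against the known slang keys.
    best_key = max(SLANG, key=lambda key: similarity(token, key))
    if similarity(token, best_key) >= THRESHOLD:
        return SLANG[best_key]
    return token  # leave the token unchanged when nothing is close enough


def normalize_text(text):
    """Whitespace-tokenize the text and normalize every token."""
    return " ".join(normalize_token(tok) for tok in text.split())


print(normalize_text("amar frndz onk gd 6ilo"))
```
        </preformat>
        <p>Matching against dictionary keys rather than against standard forms is a design choice of this sketch: a noisy variant such as “frndz” is usually closer to another slang spelling than to the fully spelled-out word.</p>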
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Indexing and Corpus Preparation</title>
        <p>After tokenization and normalization, the cleaned corpus is converted into a PyTerrier<sup>2</sup>-compatible
format for indexing. The document collection is read into a DataFrame where the docno and title
columns are used directly as the document identifier and text, respectively. All fields are explicitly cast
to string types to ensure consistency during indexing.</p>
        <p>For the query set, each query is assigned a unique identifier (qid) and its text is taken from the TITLE
column. Queries containing special characters or Bengali script are sanitized to remove characters that
could break the Terrier parser. When necessary, queries are translated into English using the Google
Translate API<sup>3</sup>.</p>
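        <p>Query sanitization of this kind could look like the sketch below. The exact set of characters removed is not given in the text, so the rule used here (keep ASCII letters, digits, and whitespace, and replace everything else with a space) and the function name sanitize_query are assumptions made for illustration only.</p>
        <preformat>
```python
import re

# Assumption: anything other than ASCII letters, digits, and whitespace is unsafe.
_UNSAFE = re.compile(r"[^A-Za-z0-9\s]+")


def sanitize_query(text):
    """Replace unsafe characters with spaces and collapse repeated whitespace."""
    cleaned = _UNSAFE.sub(" ", text)
    return " ".join(cleaned.split())


print(sanitize_query("ki hobe? (new movie) 'review'!"))
```
        </preformat>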
        <p>The final corpus is indexed using IterDictIndexer, with the document text stored for retrieval
and docno preserved as metadata. This index serves as the foundation for all subsequent retrieval
experiments.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Retrieval Models</title>
        <p>
          After indexing, we experiment with a set of classical retrieval models provided by PyTerrier with the
default settings. These models serve as strong baselines for code-mixed information retrieval tasks:
• TF-IDF [8]: A vector space model based on term frequency–inverse document frequency
weighting.
• BM25 [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]: A probabilistic retrieval model that normalizes term frequency and document length.
• PL2 [15]: A divergence-from-randomness model based on the Poisson distribution.
• InL2 [16]: A variant of divergence-from-randomness that applies the Laplace after-effect.
• Hiemstra_LM [17]: A language model-based retrieval approach using linear smoothing.
        </p>
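        <p>To make the role of term-frequency and document-length normalization concrete, the following sketch scores a toy collection with a textbook BM25 formulation. It is not Terrier's exact implementation: the IDF variant, the parameter values k1 = 1.2 and b = 0.75, the function name bm25_scores, and the toy documents are all assumptions chosen for illustration.</p>
        <preformat>
```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs, k1=1.2, b=0.75):
    """Score every document in `docs` (docno to token list) against the query."""
    n_docs = len(docs)
    avg_len = sum(len(toks) for toks in docs.values()) / n_docs
    # Document frequency of each distinct query term.
    df = {t: sum(1 for toks in docs.values() if t in toks)
          for t in set(query_tokens)}
    scores = {}
    for docno, toks in docs.items():
        tf = Counter(toks)
        score = 0.0
        for term in query_tokens:
            if tf[term] == 0:
                continue  # terms absent from the document contribute nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalization: longer-than-average documents are damped.
            norm = tf[term] + k1 * (1.0 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores[docno] = score
    return scores


# Hypothetical toy collection of already-normalized tokens.
docs = {
    "d1": ["ami", "kichu", "bolbo", "na"],
    "d2": ["good", "friend", "chilo"],
    "d3": ["kichu", "kichu", "hobe", "na", "friend", "good", "great"],
}
scores = bm25_scores(["kichu", "hobe"], docs)
print(sorted(scores, key=scores.get, reverse=True))
```
        </preformat>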
        <p>Each query from the prepared query set passes through these retrieval models. The retrieved results
for each query are ranked according to model-specific scores. To leverage the complementary strengths
of all models, the individual retrieval results are later combined using Reciprocal Rank Fusion
(RRF) [13]. This fusion method produces a single, stronger ranking per query by assigning higher
scores to documents that appear near the top across multiple models.</p>
        <sec id="sec-4-4-1">
          <title><sup>2</sup> https://pyterrier.readthedocs.io/en/latest/ <sup>3</sup> https://cloud.google.com/translate</title>
        </sec>
      </sec>
      <sec id="sec-4-5">
        <title>4.5. Run Generation via Reciprocal Rank Fusion</title>
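        <p>Since qrels for the test queries are not provided by the track organizers for the English-Bengali
code-mixed dataset, we construct pseudo-qrels using the fused results obtained from Reciprocal Rank Fusion
(RRF). The RRF score [13] of a document <inline-formula><tex-math>d</tex-math></inline-formula> for a query is defined as:</p>
        <disp-formula id="eq-2">
          <label>(2)</label>
          <tex-math>\mathrm{RRFscore}(d) = \sum_{r \in R} \frac{1}{k + r(d)}</tex-math>
        </disp-formula>
        <p>where <inline-formula><tex-math>r(d)</tex-math></inline-formula> is the rank of document <inline-formula><tex-math>d</tex-math></inline-formula> in run <inline-formula><tex-math>r</tex-math></inline-formula>, <inline-formula><tex-math>R</tex-math></inline-formula> is the set of all runs, and <inline-formula><tex-math>k</tex-math></inline-formula> is a constant (set to
60). Documents that appear higher across multiple models receive higher RRF scores.</p>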
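        <p>The fusion step defined above can be sketched in a few lines. The function name reciprocal_rank_fusion, the run names, and the toy rankings are hypothetical; the sketch assumes that a document missing from a run simply contributes nothing for that run and that ties are broken by document identifier.</p>
        <preformat>
```python
from collections import defaultdict


def reciprocal_rank_fusion(runs, k=60):
    """Fuse ranked lists of docnos; `runs` maps a run name to its ranking."""
    scores = defaultdict(float)
    for ranking in runs.values():
        for rank, docno in enumerate(ranking, start=1):
            scores[docno] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by docno for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical rankings for a single query from three retrieval models.
runs = {
    "BM25": ["d1", "d2", "d3"],
    "TF_IDF": ["d2", "d1", "d4"],
    "PL2": ["d2", "d3", "d1"],
}
fused = reciprocal_rank_fusion(runs)
print([docno for docno, _ in fused])
```
        </preformat>
        <p>Each (docno, score) pair in the fused list corresponds to one row of the per-query output once the query identifier is attached.</p>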
        <p>The fused ranking per query is then used to create the pseudo-qrels. Each row contains the qid,
docno, and the RRF-based score. These pseudo-qrels serve as a reference for evaluation and further
experiments. This approach allows us to leverage the complementary strengths of different retrieval
models while keeping the pipeline computationally lightweight.</p>
        <p>Since the organizers did not release the official relevance judgments (qrels) for the CMIR-2025 test
queries, all experiments beyond the official submissions were performed using pseudo-qrels constructed
from the fused RRF rankings. These pseudo-qrels were used only for internal validation, parameter
tuning, and ablation analysis on the training data. All leaderboard scores reported in Table 2 correspond
to the official evaluation performed by the organizers.</p>
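        <table-wrap id="tab-3">
          <label>Table 3</label>
          <caption>
            <p>Comparison of NITA_CMIR with the top-performing systems. Values in parentheses give the relative difference from the best score.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Team</th><th>nDCG</th><th>MAP</th><th>P@5</th><th>P@10</th><th>Leaderboard</th></tr>
            </thead>
            <tbody>
              <tr><td>MUCS_Run2&amp;3</td><td>0.486</td><td>0.212</td><td>0.420</td><td>0.300</td><td>1 (All scores)</td></tr>
              <tr><td>Defense_NLP2</td><td>0.377</td><td>0.187</td><td>0.393</td><td>0.290</td><td>2 (MAP)</td></tr>
              <tr><td>Defense_NLP1</td><td>0.366</td><td>0.179</td><td>0.367</td><td>0.290</td><td>3 (MAP)</td></tr>
              <tr><td>NITA_CMIR (Ours)</td><td>0.415 (-14.6%)</td><td>0.152 (-28.3%)</td><td>0.300 (-28.6%)</td><td>0.237 (-21.0%)</td><td>2 (nDCG)</td></tr>
            </tbody>
          </table>
        </table-wrap>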
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <sec id="sec-5-1">
        <title>5.1. Team Performance Analysis</title>
        <p>The organizers published the results of the run submissions, which are shown in Table 2. These results
show that our run achieves an nDCG of 0.415057 (ranked 2nd overall), a MAP of 0.151773 (ranked
5th), a P@5 of 0.300 (ranked 7th), and a P@10 of 0.236667 (ranked 7th). While our nDCG score is
competitive, it does not surpass the top-performing runs (e.g., MUCS runs 2 and 3, which reach 0.485517).
Similarly, our MAP and precision scores trail those of stronger systems such as MUCS and Defense_NLP,
which achieve higher MAP and P@5 values.</p>
        <p>These results suggest that, while our retrieval approach is effective in ranking relevant documents
highly (as reflected in the nDCG score), it lags in precision-oriented measures. This indicates potential
room for improvement in early precision, possibly through better query expansion, reranking, or hybrid
ensemble techniques.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Performance Comparison</title>
        <p>A comparison of our submitted system (NITA_CMIR) with the top-performing systems is shown in
Table 3. We observe that our system is competitive in terms of overall
ranking quality, with an nDCG of 0.415, which is about 14.6% lower than that of the best-performing
system (MUCS_Run2&amp;3). However, our MAP (0.152) is approximately 28.3% below the top score,
and our precision at early cutoffs (P@5 and P@10) also lags behind by 28.6% and 21.0%, respectively.
These findings highlight that while our approach is effective at ranking relevant documents higher
overall, it falls short in early precision compared to the strongest systems. This indicates the need for
improvements in query refinement and reranking strategies to better capture highly relevant documents
at the top ranks.</p>
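        <p>The percentage gaps quoted above follow from a simple relative difference against the best score for each metric. The sketch below recomputes them from the rounded scores of NITA_CMIR and MUCS_Run2&amp;3 reported in the comparison; the helper name relative_gap is hypothetical.</p>
        <preformat>
```python
def relative_gap(ours, best):
    """Signed percentage difference of `ours` relative to `best`."""
    return round((ours - best) / best * 100.0, 1)


# (ours, best) pairs taken from the rounded scores reported in the comparison.
reported = {
    "nDCG": (0.415, 0.486),
    "MAP": (0.152, 0.212),
    "P@5": (0.300, 0.420),
    "P@10": (0.237, 0.300),
}
gaps = {metric: relative_gap(ours, best)
        for metric, (ours, best) in reported.items()}
print(gaps)
```
        </preformat>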
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <p>Our team, NITA_CMIR, achieves a strong nDCG of 0.415, ranking second overall among all
submissions. This indicates that our system is effective at ranking relevant documents higher in the result lists,
even though it does not outperform competing runs in MAP or precision-oriented measures (P@5 and
P@10).</p>
      <p>One possible reason for this performance pattern lies in the nature of nDCG, which emphasizes
both rank position and graded relevance. Our pipeline, which combines dictionary-based normalization
with RapidFuzz fuzzy similarity filtering, appears effective at promoting highly relevant
documents towards the upper ranks. As a result, our system scores well on overall ranking quality, though it
retrieves fewer relevant documents in the very top positions than the strongest systems (e.g.,
MUCS and Defense_NLP).</p>
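        <p>The difference between the two kinds of measure can be illustrated with a small sketch. The relevance grades below are hypothetical and are not taken from the task data, and the nDCG shown here is simplified: it normalizes against the ideal ordering of the retrieved list only, whereas an official evaluation would use the full set of judged documents.</p>
        <preformat>
```python
import math


def precision_at_k(relevances, k):
    """Fraction of the top-k results with a relevance grade above zero."""
    return sum(1 for rel in relevances[:k] if rel > 0) / k


def dcg(relevances):
    """Discounted cumulative gain with a log2 rank discount."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical graded relevance of the top 10 results for one query.
ranking = [0, 2, 0, 1, 0, 2, 1, 0, 1, 0]
print(precision_at_k(ranking, 5), round(ndcg(ranking), 3))
```
        </preformat>
        <p>In this toy ranking, relevant documents below rank 5 still contribute to nDCG through the discounted gain, while P@5 ignores them entirely.</p>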
      <p>The comparatively lower MAP and P@k scores suggest limitations in early precision. This may
stem from incomplete query coverage due to limited handling of code-mixing variations, as well as
insufficient query expansion and reranking. In addition, reliance on classical retrieval models without
deeper neural rerankers could have restricted the system’s ability to consistently place the most relevant
documents within the top-5 or top-10 results.</p>
      <p>Overall, the results demonstrate that our approach is well-suited for ranking effectiveness in noisy,
code-mixed social media text, but future work should focus on enhancing early precision through
stronger query expansion, entity-aware rewriting, and the integration of neural rerankers.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>The NITA_CMIR system, developed for the CMIR-2025 shared task on English-Bengali code-mixed
retrieval, combines dictionary-based normalization with RapidFuzz fuzzy matching and classical retrieval
models in PyTerrier. Despite its lightweight design, the system achieves competitive performance,
ranking highly relevant documents near the top. Normalization and fuzzy lexical retrieval are strong
bases for code-mixed IR, especially in noisy, low-resource social media contexts. However, performance
gaps in MAP and precision suggest the need for enhanced query expansion, entity-aware rewriting,
and neural reranking strategies. Future work aims to integrate hybrid approaches to improve ranking
quality and early precision.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used ChatGPT and Quillbot for grammar
and spelling checks. After using these tool(s)/service(s), the author(s) reviewed and edited the content as
needed and take full responsibility for the publication’s content.</p>
      <p>[8] G. Salton, C. Buckley, Term-weighting approaches in automatic text retrieval, Information Processing &amp; Management 24 (1988) 513–523. doi:10.1016/0306-4573(88)90021-0.</p>
      <p>[9] K. Chakma, A. Das, CMIR: A corpus for evaluation of code mixed information retrieval of Hindi-English tweets, Computación y Sistemas 20 (2016) 425–434. URL: https://api.semanticscholar.org/CorpusID:11152913.</p>
      <p>[10] T. Mandl, S. Modha, G. K. Shahi, A. K. Jaiswal, D. Nandini, D. Patel, P. Majumder, J. Schäfer, Overview of the HASOC track at FIRE 2020: Hate speech and offensive content identification in Indo-European languages, CoRR abs/2108.05927 (2021). URL: https://arxiv.org/abs/2108.05927. arXiv:2108.05927.</p>
      <p>[11] T. Pires, E. Schlinger, D. Garrette, How multilingual is multilingual BERT?, in: A. Korhonen, D. Traum, L. Màrquez (Eds.), Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Florence, Italy, 2019, pp. 4996–5001. URL: https://aclanthology.org/P19-1493/. doi:10.18653/v1/P19-1493.</p>
      <p>[12] A. Patil, V. Patwardhan, A. Phaltankar, G. Takawane, R. Joshi, Comparative study of pre-trained BERT models for code-mixed Hindi-English data, in: 2023 IEEE 8th International Conference for Convergence in Technology (I2CT), IEEE, 2023, pp. 1–7. URL: http://dx.doi.org/10.1109/I2CT57861.2023.10126273. doi:10.1109/i2ct57861.2023.10126273.</p>
      <p>[13] G. V. Cormack, C. L. A. Clarke, S. Buettcher, Reciprocal rank fusion outperforms Condorcet and individual rank learning methods, in: Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’09, Association for Computing Machinery, New York, NY, USA, 2009, pp. 758–759. URL: https://doi.org/10.1145/1571941.1572114. doi:10.1145/1571941.1572114.</p>
      <p>[14] S. Chanda, S. Pal, Overview of the shared task on code-mixed information retrieval from social media data, in: FIRE 2024 Working Notes, CEUR Workshop Proceedings, 2024, pp. 124–128. URL: https://ceur-ws.org/Vol-4054/T2-1.pdf.</p>
      <p>[15] G. Amati, C. J. V. Rijsbergen, Probabilistic models of information retrieval based on measuring the divergence from randomness, ACM Transactions on Information Systems (TOIS) 20 (2002) 357–389. doi:10.1145/582415.582416.</p>
      <p>[16] G. Amati, Probabilistic models for information retrieval based on divergence from randomness, Ph.D. thesis, University of Glasgow, 2003. URL: http://theses.gla.ac.uk/1750/.</p>
      <p>[17] D. Hiemstra, A linguistically motivated probabilistic model of information retrieval, in: C. Nikolaou, C. Stephanidis (Eds.), Research and Advanced Technology for Digital Libraries, Springer Berlin Heidelberg, Berlin, Heidelberg, 1998, pp. 569–584.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Muysken</surname>
          </string-name>
          (Ed.), The Cambridge Handbook of Linguistic Code-switching, Cambridge University Press, Cambridge,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chanda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <article-title>Overview of the shared task on code-mixed information retrieval from social media data</article-title>
          , in:
          <source>Proceedings of the 16th Annual Meeting of the Forum for Information Retrieval Evaluation</source>
          , FIRE '24, Association for Computing Machinery, New York, NY, USA,
          <year>2025</year>
          , p.
          <fpage>29</fpage>
          -
          <lpage>31</lpage>
          . URL: https://doi.org/10.1145/3734947.3735670. doi:10.1145/3734947.3735670.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chanda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tewari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <article-title>Findings of the code-mixed information retrieval from social media data (CMIR) shared task at FIRE 2025</article-title>
          , in: K. Ghosh, T. Mandl, S. Pal, S. Majumdar, A. Chakraborty (Eds.),
          <source>Forum for Information Retrieval Evaluation (Working Notes) (FIRE 2025)</source>
          , December 17-20, Varanasi, India, CEUR-WS.org,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chanda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tewari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <article-title>Overview of the CMIR track at FIRE 2025: Code-mixed information retrieval from social media data</article-title>
          , in:
          <source>FIRE '25: Proceedings of the 17th Annual Meeting of the Forum for Information Retrieval Evaluation</source>
          , December 17-20, Varanasi, India, Association for Computing Machinery (ACM), New York, NY, USA,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. E.</given-names>
            <surname>Banchs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Choudhury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <article-title>Query expansion for mixed-script information retrieval</article-title>
          , in:
          <source>Proceedings of the 37th International ACM SIGIR Conference on Research &amp; Development in Information Retrieval</source>
          , SIGIR '14, Association for Computing Machinery, New York, NY, USA,
          <year>2014</year>
          , p.
          <fpage>677</fpage>
          -
          <lpage>686</lpage>
          . URL: https://doi.org/10.1145/2600428.2609622. doi:10.1145/2600428.2609622.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chanda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <article-title>The effect of stopword removal on information retrieval for code-mixed data obtained via social media</article-title>
          ,
          <source>SN Comput. Sci.</source>
          <volume>4</volume>
          (
          <year>2023</year>
          ). URL: https://doi.org/10.1007/s42979-023-01942-7. doi:10.1007/s42979-023-01942-7.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Robertson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zaragoza</surname>
          </string-name>
          ,
          <article-title>The probabilistic relevance framework: BM25 and beyond</article-title>
          ,
          <source>Found. Trends Inf. Retr.</source>
          <volume>3</volume>
          (
          <year>2009</year>
          )
          <fpage>333</fpage>
          -
          <lpage>389</lpage>
          . URL: https://doi.org/10.1561/1500000019. doi:10.1561/1500000019.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>