<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Math Beyond Language Barriers: Retrieving Mathematical Content using Sentence Transformers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sharal Coelho</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Asha Hegde</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohammed Zaher Taljeh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amaan Ahmad</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Mangalore University</institution>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science, Manipal Institute of Technology (MIT)</institution>
          ,
          <addr-line>Bengaluru</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>Cross-Lingual Mathematical Information Retrieval (CLMIR) aims to facilitate the retrieval of mathematical content across languages, thereby enhancing accessibility for diverse user communities. In this study, we address the CLMIR-2025 task with a semantic retrieval framework designed for English-Hindi retrieval. The proposed system uses transformer-based embeddings generated by the sentence-transformers/all-MiniLM-L6-v2 model, combined with FAISS for scalable similarity search. Multilingual training data are preprocessed and subsequently segmented into manageable chunks to handle long texts while preserving contextual information. Normalized embeddings are then computed and indexed in a FAISS vector store, enabling efficient retrieval of the top-50 candidate documents for a given query. Initial evaluation indicates promising performance, with Precision@10 of 0.118, Mean Average Precision (MAP) of 0.149, and NDCG@10 of 0.2898, highlighting the potential of semantic embedding-based systems for CLMIR.</p>
      </abstract>
      <kwd-group>
        <kwd>Information Retrieval</kwd>
        <kwd>FAISS</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Mathematical Information Retrieval</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Natural Language Processing (NLP) plays a crucial role in automating various tasks such as machine
translation [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], Information Retrieval (IR) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], text classification [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and sentiment analysis [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
With the large amount of textual data being generated every day, from social media posts to research
articles, NLP has become an important tool for extracting useful information and acquiring insights
from unstructured text. However, the challenge lies in finding the most relevant information from
this large pool of data. This is where IR becomes essential. IR refers to the process of searching and
retrieving meaningful information from a large collection of data based on user questions [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The main
purpose of IR is to return results that are both accurate and contextually relevant. Text-based queries
are traditionally handled quite well by IR systems [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. This is due to the fact that textual data usually
has a linear structure, which facilitates indexing and query matching. However, IR is not limited to just
textual data but also includes the retrieval of other types of data, such as images, videos, and speech.
Among these, mathematical information presents unique challenges [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Mathematical expressions
are symbolic and non-linear, in contrast to simple text. It is challenging for traditional IR systems to
correctly understand and retrieve these statements due to their structural complexity and lack of a
natural word order. When a user inputs a mathematical formula as a search query, most general-purpose
IR systems fail to provide relevant results. This is because they are not designed to understand the
syntax and semantics of mathematical formulas. The field of Mathematical Information Retrieval (MIR)
has emerged to fill this gap [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Its main goal is to create specialized systems that can search, retrieve,
and analyze mathematical data from databases, documents, and online sources.
      </p>
      <p>
        Finding and retrieving mathematical data, such as formulas, equations, symbols, and other relevant
scientific expressions, is the goal of MIR. Unlike traditional search engines like Google and Bing, which
mostly handle unstructured text, MIR systems are made to handle intricate mathematical formulas and
their many scripting styles, which are different from ordinary text processing [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. MIR systems help
teachers, researchers, and students find mathematical material stored in databases, scientific publications,
or digital libraries. When a user submits a query in the form of plain text, a mathematical expression,
or a mixture of both, the system first understands the query’s structure. It splits the textual and the
mathematical portions and processes each part, applying suitable methods. In regard to mathematical
expressions, the system often uses formula parsing, pattern matching, or semantic analysis to decode
the meaning and the structure of the provided input.
      </p>
      <p>However, the dependence of MIR systems on English-language queries has become a limitation. It restricts
access for users who are more comfortable expressing their information needs in their mother
tongue or another language. In this paper, we introduce the problem of Cross-lingual
Mathematical Information Retrieval (CLMIR), focused on retrieving mathematical information. In
this study, we use sentence-transformers/all-MiniLM-L6-v2 for semantic embeddings, which
capture contextual nuances better than dictionary-based methods, and FAISS for efficient retrieval,
inspired by neural IR advancements.</p>
      <p>The remainder of this paper is organized as follows: Section 2 reviews the related work. Section
3 describes the proposed methodology. Section 4 outlines the experiments and results, and Section 5
presents the discussion. Finally, Section 6 concludes the paper and discusses directions for future
research.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        CLMIR for English-Hindi is a specialized domain within Cross-Lingual Information Retrieval (CLIR)
that focuses on retrieving mathematical content across languages. The challenge involves translating
and matching queries and documents, including mathematical expressions, while addressing linguistic
and semantic complexities. Chinnakotla et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] developed Hindi-to-English and Marathi-to-English
CLIR systems. They adopted a query translation approach using bilingual dictionaries, with rule-based
transliteration for out-of-vocabulary (OOV) words. Multiple translation/transliteration candidates were
disambiguated using an iterative PageRank-style algorithm based on term-term co-occurrence statistics.
Their system achieved a Mean Average Precision (MAP) of 0.2366 for Hindi using query titles and
0.2952 with titles and descriptions. This work highlights the importance of disambiguation in CLIR,
which is relevant for handling ambiguous mathematical terms in CLMIR. However, their reliance on
dictionaries limits performance for specialized domains like mathematics, where terminology may not
be well-covered. Bajpai et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] proposed an English-to-Hindi CLIR system with a focus on query
disambiguation using a "Two-Level Disambiguation" method. The system was tested on 30 queries and
achieved high precision. Their approach addressed lexical ambiguity in Hindi, a morphologically rich
language, by prioritizing salient context words over treating all query terms equally.
      </p>
      <p>
        Chandra et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] explored Query Expansion (QE) for Hindi-to-English CLIR using the FIRE 2012
dataset. They proposed a location-based algorithm to resolve query drift issues in QE, translating Hindi
queries to English with back-translation for improved accuracy. Documents were ranked using Okapi
BM25, achieving a 12% improvement in relevancy compared to non-QE baselines. Their use of 50 Hindi
queries and three test collections, namely the FIRE dataset, document snippets, and nearest-neighbor words,
demonstrates the value of QE in enhancing retrieval, which could be adapted for CLMIR to expand
mathematical queries with related terms.
      </p>
      <p>
        Paheli et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] utilized word embeddings for Hindi-to-English CLIR, addressing OOV words using
a context-based translation algorithm. They used large monolingual corpora and a small bilingual
parallel corpus, outperforming baseline statistical machine translation on FIRE datasets. Their approach
achieved better handling of OOV terms, which is critical for CLMIR, where mathematical terms may
lack direct translations. Their focus was on general text rather than mathematical content, suggesting
the need for domain-specific embeddings in CLMIR.
      </p>
      <p>
        Haq et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] introduced IndicIRSuite, comprising INDIC-MARCO, a multilingual dataset
for 11 Indian languages, including Kannada, Telugu, Assamese, Hindi, Malayalam, and Marathi, and
Indic-ColBERT, a set of monolingual neural IR models. Their system achieved a 47.47% improvement
in MRR@10 and 12.26% in NDCG@10 for Hindi over baselines like BM25 and mBERT. While focused
on monolingual IR, their work showed potential for cross-lingual extensions, relevant for CLMIR.
These outcomes highlight key challenges in CLIR, such as query translation, OOV handling, and
disambiguation. Unlike general CLIR, CLMIR requires precise matching of mathematical concepts
across languages, where direct translations are difficult.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>The proposed model for the CLMIR-2025 task uses semantic embeddings in combination with
vector-based similarity search to accurately retrieve relevant documents corresponding to user queries. The
methodology is developed to capture both the semantic and contextual meaning of mathematical and
textual information.</p>
      <p>The considerable textual fields from the training data are aggregated and merged into comprehensive
document representations. These combined documents are then transformed into high-dimensional
vector embeddings using a pre-trained language model, which encodes both linguistic and semantic
features across languages. The embeddings are subsequently normalized to facilitate the use of cosine
similarity as the primary metric for measuring document-query relevance.</p>
      <p>
        Given the possible presence of lengthy documents, a chunking strategy is employed to partition large
texts into smaller, manageable segments without losing contextual coherence. This allows the system
to handle long inputs efficiently while preserving semantic integrity. For the retrieval mechanism,
we employed a FAISS (Facebook AI Similarity Search) vector store, which provides highly efficient
similarity search at scale. Further, the cosine similarity scores are normalized and manually recomputed,
ensuring they remain bounded within the [0, 1] range. Overall, the proposed methodology combines
effective data pre-processing, semantic embedding generation, scalable similarity search, and score
normalization to provide a reliable and accurate framework for CLMIR.
      </p>
      <sec id="sec-3-1">
        <title>3.1. Data Preparation</title>
        <p>For each entry in the training dataset, a unified textual representation is constructed by combining the
Title, Body, and Tags fields, with missing values replaced by empty strings to ensure consistency. To
address the challenge of lengthy documents and to improve the granularity of the embeddings, the
RecursiveCharacterTextSplitter module from LangChain (https://python.langchain.com/api_reference/reference.html) is employed. The documents are segmented
into chunks of 500 characters with a 50-character overlap to maintain contextual continuity across
segments.</p>
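        <p>A minimal sketch of the chunking behaviour described above (the actual system uses LangChain's RecursiveCharacterTextSplitter; the helper below is an illustrative stand-in with the same 500-character window and 50-character overlap, and the sample entry is hypothetical):</p>

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size chunks with overlap (illustrative stand-in
    for LangChain's RecursiveCharacterTextSplitter)."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, max(len(text), 1), step):
        piece = text[start:start + chunk_size]
        if piece:
            chunks.append(piece)
        if start + chunk_size >= len(text):
            break
    return chunks

# Unified document representation: Title + Body + Tags,
# with missing fields replaced by empty strings (entry is a hypothetical example).
entry = {"Title": "Integral of x^2", "Body": "How do I compute the integral ...", "Tags": None}
document = " ".join((entry.get(k) or "") for k in ("Title", "Body", "Tags"))
chunks = chunk_text(document)
```

        <p>Overlapping chunks ensure that a sentence cut at a chunk boundary still appears intact in the neighbouring segment, preserving contextual continuity for the embedding step.</p>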
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Embedding Generation and Vector Store Creation</title>
        <p>
          To generate embeddings, the HuggingFaceEmbeddings framework is employed, utilizing the pre-trained
model sentence-transformers/all-MiniLM-L6-v2 (https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). This model is specifically optimized for semantic
similarity tasks and produces dense vector representations of 384 dimensions, which strike a balance
between computational efficiency and representational power. Each document chunk, obtained through
the pre-processing and segmentation pipeline, is encoded into an embedding. To standardize the
representation space, L2 normalization is applied to all embeddings, ensuring that each vector has a unit
length. The normalized embeddings are then stored in a FAISS index, which is designed for scalable
and efficient similarity search in high-dimensional vector spaces [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. During index construction, the
parameter normalize_L2=True is set, ensuring that all vectors are stored in their normalized form. This
allows the retrieval system to perform k-nearest neighbor searches directly using cosine similarity as
the underlying distance metric.
        </p>
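        <p>The role of L2 normalization can be sketched with NumPy: once every vector has unit length, the inner product computed by an inner-product FAISS index is exactly the cosine similarity (a minimal illustration under random placeholder vectors, not the actual system code):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(4, 384))  # e.g. 4 chunks, 384-dim MiniLM-style vectors

# L2-normalize each row to unit length, as done before indexing.
normalized = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

query = rng.normal(size=384)
query = query / np.linalg.norm(query)

# With unit vectors, inner product equals cosine similarity, so an
# inner-product index effectively performs cosine-similarity search.
inner = normalized @ query
cosine = np.array([
    v @ query / (np.linalg.norm(v) * np.linalg.norm(query)) for v in embeddings
])
assert np.allclose(inner, cosine)
```

        <p>This equivalence is why setting normalize_L2=True at index construction lets k-nearest-neighbour search over inner products behave as cosine-similarity retrieval.</p>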
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Query Processing and Retrieval</title>
        <p>
          The query text is formed by concatenating the query and context columns. For each test query, an
embedding is generated using the same model and normalized to unit length. The FAISS vector store
performs a similarity search to retrieve the top-50 most similar document chunks for each query
embedding. To enhance precision, the retrieved chunks’ embeddings are re-generated and normalized,
and cosine similarities are manually computed using scikit-learn’s cosine_similarity function (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.cosine_similarity.html). Scores
are clamped to the [0, 1] range by taking the maximum of 0 and the computed value. For each query,
results are collected including the query ID, search ID, run number, and the computed similarity score.
        </p>
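        <p>A minimal NumPy sketch of this retrieval step, assuming unit-normalized embeddings (the function name top_k_clamped and the random data are illustrative; the actual system uses the FAISS vector store and scikit-learn's cosine_similarity):</p>

```python
import numpy as np

def top_k_clamped(query_vec, doc_matrix, k=50):
    """Rank documents by cosine similarity (unit vectors assumed) and
    clamp scores into [0, 1] by taking max(0, score)."""
    scores = doc_matrix @ query_vec           # inner product = cosine similarity
    k = min(k, len(scores))
    order = np.argsort(-scores)[:k]           # indices of the top-k documents
    clamped = np.maximum(scores[order], 0.0)  # negative similarities become 0
    return order, clamped

# Hypothetical corpus of 200 unit-normalized chunk embeddings.
rng = np.random.default_rng(1)
docs = rng.normal(size=(200, 384))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
q = rng.normal(size=384)
q /= np.linalg.norm(q)
indices, scores = top_k_clamped(q, docs)
```

        <p>Each (query ID, retrieved chunk, clamped score) triple can then be written out as a run entry in the required submission format.</p>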
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments and Results</title>
      <sec id="sec-4-1">
        <title>4.1. Dataset Description</title>
        <p>The dataset for the CLMIR 2025 task (https://clmir2025.github.io/) has been taken from the Math Stack Exchange corpus in
ARQMath-1, featuring 39,862 training instances. Each entry is structured around scientific content bodies
that contain mathematical equations, expressions, and accompanying textual explanations in Hindi,
linked to a corresponding search ID. Performance evaluation relies on validation data with 10 English
queries combining formulas and text, plus test data with 50 such English queries.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Experimental Results</title>
        <p>The performance of our proposed model in the CLMIR 2025 task is evaluated using three standard
metrics: Precision at 10 (P@10), Mean Average Precision (MAP), and Normalized Discounted Cumulative
Gain (nDCG). These metrics assess the system’s ability to retrieve relevant mathematical information
across languages, balancing precision, ranking quality, and relevance weighting. The results for our
four submitted runs are presented in Table 1.</p>
        <p>Across the four runs, we observe a consistent improvement in performance from Run 1 to Run 4
across all metrics. Run 4 achieves the highest scores, with a P@10 of 0.118, MAP of 0.149, and nDCG of
0.2898, indicating the best overall performance. Run 1 and Run 2 exhibit lower performance, with P@10
scores of 0.048 and 0.046, respectively, and MAP scores of 0.0972 and 0.0794. Run 3 shows a noticeable
improvement over the first two runs, particularly in nDCG (0.2523), suggesting better ranking quality.
The progressive enhancement in scores suggests that modifications made across the runs, such as improved
query processing, feature engineering, or model tuning, positively impacted retrieval effectiveness.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>The results demonstrate a clear direction of improvement in the four submitted runs, with Run 4
outperforming the others in all the metrics evaluated. The increase in P@10 from 0.048 in Run 1 to 0.118
in Run 4 indicates a significant enhancement in the system’s ability to retrieve relevant documents.
This improvement likely stems from enhancements in cross-lingual query expansion or better handling
of mathematical expressions, which are critical in the CLMIR task. Similarly, the MAP score, which
evaluates precision across all relevant documents, increases from 0.0794 in Run 2 to 0.149 in Run
4, reflecting improved ranking consistency. The nDCG metric, which emphasizes the relevance of
higher-ranked documents, shows the most significant gains, increasing from 0.1521 in Run 1 to 0.2898
in Run 4. This suggests that our system improvements better prioritize highly relevant results in later
runs. The performance gap between runs may be attributed to several factors. Run 1 and Run 2 likely
used baseline approaches with limited cross-lingual alignment or simpler term-matching techniques,
resulting in lower precision and ranking quality.</p>
      <p>The low P@10 scores suggest that the system struggles to consistently place relevant documents in
the top 10, which is critical for user satisfaction in information retrieval tasks. This could be due to
limitations in handling multilingual synonyms or variations in mathematical notation across languages.
Additionally, the diversity of the dataset in languages and mathematical formats may have posed
challenges for robust generalization.</p>
      <p>In conclusion, the progressive improvement across the four runs demonstrates the effectiveness of
iterative refinements in our system design. Run 4’s results, while the strongest, indicate that there is
still room for improvement in CLMIR. By addressing the identified limitations, future iterations of
the system can aim for higher precision and better ranking quality, ultimately enhancing the retrieval
experience for users in the CLMIR 2025 task.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>The proposed model developed for the CLMIR-2025 task, focusing on English-Hindi retrieval,
effectively addresses the challenge of accessing mathematical content across languages. By leveraging
transformer-based semantic embeddings from the sentence-transformers/all-MiniLM-L6-v2 model
and FAISS for efficient vector-based similarity search, the system enables robust retrieval of relevant
documents, including mathematical expressions and associated text, regardless of the query’s language.
The methodology, encompassing data aggregation, document chunking, normalized embedding
generation, and manual cosine similarity recomputation, demonstrated strong performance in retrieving
semantically relevant documents, as evidenced by Precision@10 (0.118), MAP
(0.149), and NDCG@10 (0.2898). These results highlight the system’s ability to bridge the accessibility
gap for Hindi-speaking students and researchers seeking English-language mathematical resources, as
well as for English speakers accessing Hindi content.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>In preparing this work, the author(s) utilized ChatGPT-4 and Grok for grammar and spelling checks.
Paraphrasing was handled via QuillBot. With these tools, the author(s) reviewed and revised the content
as required, while assuming full responsibility for the publication’s integrity.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hegde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. L.</given-names>
            <surname>Shashirekha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Madasamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <article-title>A study of machine translation models for kannada-tulu</article-title>
          ,
          <source>in: Congress on Intelligent Systems</source>
          , Springer,
          <year>2022</year>
          , pp.
          <fpage>145</fpage>
          -
          <lpage>161</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lian</surname>
          </string-name>
          , J.-Y. Nie,
          <article-title>Information retrieval meets large language models</article-title>
          ,
          <source>in: Companion Proceedings of the ACM Web Conference</source>
          <year>2024</year>
          ,
          <year>2024</year>
          , pp.
          <fpage>1586</fpage>
          -
          <lpage>1589</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Coelho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hegde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. L.</given-names>
            <surname>Shashirekha</surname>
          </string-name>
          , et al.,
          <article-title>MUCS@DravidianLangTech2023: Malayalam fake news detection using machine learning approach</article-title>
          ,
          <source>in: Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>288</fpage>
          -
          <lpage>292</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Shetty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Aljunid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Manjaiah</surname>
          </string-name>
          ,
          <article-title>Sentiment exploring on feedback of e-commerce data using machine learning algorithms</article-title>
          , in: International Conference on Emerging Research in Computing, Information,
          <source>Communication and Applications</source>
          , Springer,
          <year>2023</year>
          , pp.
          <fpage>107</fpage>
          -
          <lpage>129</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Coelho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hegde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lamani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. L.</given-names>
            <surname>Shashirekha</surname>
          </string-name>
          , et al.,
          <article-title>MUCSD@DravidianLangTech2023: Predicting sentiment in social media text using machine learning techniques</article-title>
          ,
          <source>in: Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>282</fpage>
          -
          <lpage>287</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. X.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Tu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. T. R.</given-names>
            <surname>Laskar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhuiyan</surname>
          </string-name>
          ,
          <article-title>Utilizing bert for information retrieval: Survey, applications, resources, and challenges</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>56</volume>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>33</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>K. A.</given-names>
            <surname>Hambarde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Proenca</surname>
          </string-name>
          ,
          <article-title>Information retrieval: recent advances and beyond</article-title>
          ,
          <source>IEEE Access 11</source>
          (
          <year>2023</year>
          )
          <fpage>76581</fpage>
          -
          <lpage>76604</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Dadure</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pakray</surname>
          </string-name>
          , S. Bandyopadhyay,
          <article-title>Mathematical information retrieval: A review</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>57</volume>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>34</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Aizawa</surname>
          </string-name>
          , M. Kohlhase, Mathematical information retrieval,
          <source>Evaluating Information Retrieval and Access Tasks</source>
          <volume>43</volume>
          (
          <year>2021</year>
          )
          <fpage>169</fpage>
          -
          <lpage>185</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Zanibbi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mansouri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          , et al.,
          <article-title>Mathematical information retrieval: Search and question answering</article-title>
          ,
          <source>Foundations and Trends® in Information Retrieval</source>
          <volume>19</volume>
          (
          <year>2025</year>
          )
          <fpage>1</fpage>
          -
          <lpage>190</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M. K.</given-names>
            <surname>Chinnakotla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ranadive</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhattacharyya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. P.</given-names>
            <surname>Damani</surname>
          </string-name>
          ,
          <article-title>Hindi and Marathi to English cross language information retrieval at CLEF 2007</article-title>
          ,
          <source>Advances in Multilingual and Multimodal Information Retrieval</source>
          (
          <year>2008</year>
          )
          <fpage>111</fpage>
          -
          <lpage>118</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bajpai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Verma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. Q.</given-names>
            <surname>Abbas</surname>
          </string-name>
          ,
          <article-title>English-Hindi cross language information retrieval system: Query perspective</article-title>
          ,
          <source>J. Comput. Sci.</source>
          <volume>14</volume>
          (
          <year>2018</year>
          )
          <fpage>705</fpage>
          -
          <lpage>713</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>G.</given-names>
            <surname>Chandra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Dwivedi</surname>
          </string-name>
          ,
          <article-title>Query expansion using proposed location-based algorithm for Hindi-English CLIR: Analyzing three test collections</article-title>
          ,
          <source>International Journal of Pattern Recognition and Artificial Intelligence</source>
          <volume>38</volume>
          (
          <year>2024</year>
          )
          <fpage>2459001</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sarkar</surname>
          </string-name>
          ,
          <article-title>Using word embeddings for query translation for Hindi to English cross language information retrieval</article-title>
          ,
          <source>Computación y Sistemas</source>
          <volume>20</volume>
          (
          <year>2016</year>
          )
          <fpage>435</fpage>
          -
          <lpage>447</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Haq</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhattacharyya</surname>
          </string-name>
          ,
          <article-title>IndicIRSuite: Multilingual dataset and neural information models for Indian languages</article-title>
          ,
          <source>arXiv preprint arXiv:2312.09508</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Ghadekar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mohite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>More</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Patil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mangrule</surname>
          </string-name>
          , et al.,
          <article-title>Sentence meaning similarity detector using FAISS</article-title>
          ,
          <source>in: 2023 7th International Conference On Computing, Communication, Control And Automation (ICCUBEA)</source>
          , IEEE,
          <year>2023</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>