<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Overview of Cross-Lingual Mathematical Information Retrieval at FIRE 2025</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ayushi Malik</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pankaj Dadure</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sahinur Rahman Laskar</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>UPES Dehradun</institution>
          ,
          <addr-line>Uttarakhand</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>11</volume>
      <fpage>17</fpage>
      <lpage>20</lpage>
      <abstract>
        <p>This abstract provides a short overview of the first edition of the shared task on Cross-Lingual Mathematical Information Retrieval (CLMIR) organized at the 17th Forum for Information Retrieval Evaluation (FIRE 2025). A more detailed discussion of approaches used by the participating teams is available in the track overview paper. The CLMIR shared task at FIRE 2025 is designed to encourage participants to develop retrieval approaches that can process queries in one language and retrieve relevant results written in another, thereby bridging the accessibility gap between diverse user groups. Although the CLMIR shared task witnessed encouraging interest with 9 teams registering, only 6 teams submitted their results.</p>
      </abstract>
      <kwd-group>
        <kwd>Information Retrieval</kwd>
        <kwd>Mathematical Information Retrieval</kwd>
        <kwd>Cross-lingual Mathematical Information Retrieval</kwd>
        <kwd>Digital Libraries</kwd>
        <kwd>Search Engines</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Natural Language Processing (NLP) is a transformative field that enables computers to understand,
interpret, and generate human language [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ][2]. With advancements in machine learning and deep
learning, NLP applications have expanded from text summarization [3] and question answering [4] to
complex tasks such as cross-lingual understanding. A key domain where NLP shows practical significance is
Information Retrieval (IR) [5], which focuses on searching, ranking, and retrieving relevant information
from large repositories. Traditional IR systems relied on keyword- or semantic-based retrieval [6][7]
of text data, but the growth of digital information has introduced structured, semi-structured, and
multimodal data [8][9][10][11][12]. Among these, mathematical data poses particular challenges due
to its symbolic, non-linear structure and its representation in markup languages such as LaTeX and MathML.
General-purpose IR systems often fail to process mathematical queries effectively, which led to the emergence of
Mathematical Information Retrieval (MIR) [13]. MIR deals with retrieving mathematical expressions,
equations, and theorems while addressing issues of structural representation and integration with
textual data. Despite these advances, most MIR systems, such as MathWebSearch [14][15][16], SimSearch
[17][18][19], Mathcat [20][21], Tangent [22][23][24][25][26][27][28][29], and Wolfram Alpha [30][31],
remain monolingual, predominantly focusing on English. This limits accessibility for researchers
and learners who work in other languages. To bridge this gap, cross-lingual mathematical information
retrieval (CLMIR) extends the scope of MIR by enabling users to query in one language and retrieve
mathematical data in another [32]. For example, a user may enter a query in English and retrieve relevant
mathematical data written in Hindi, thereby lowering the linguistic barriers to accessing mathematical
knowledge. However, CLMIR presents a unique set of challenges: the scarcity of cross-lingual mathematical
datasets, difficulties in aligning mathematical symbols with multilingual textual descriptions, handling
structural and semantic variations of mathematical expressions across languages, and ensuring effective
ranking of results in cross-lingual settings. Addressing these challenges is essential for developing
robust approaches that support cross-lingual access to mathematical knowledge on a global scale
[33].
      </p>
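      <p>As an illustration of the representational gap noted above, the same expression can be written
compactly in LaTeX or far more verbosely in Presentation MathML, and a retrieval system must treat the
two encodings as equivalent; a small hypothetical example:</p>
      <preformat>
LaTeX:   x^{2} + 1

MathML:  &lt;math&gt;
           &lt;msup&gt;&lt;mi&gt;x&lt;/mi&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt;
           &lt;mo&gt;+&lt;/mo&gt;&lt;mn&gt;1&lt;/mn&gt;
         &lt;/math&gt;
      </preformat>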
    </sec>
    <sec id="sec-2">
      <title>2. Track Description</title>
      <p>The CLMIR shared task at FIRE 2025 is designed to address the challenges of retrieving mathematical
data across languages, with a specific focus on the English-Hindi cross-lingual setting. The
task encourages participants to develop retrieval approaches that can process queries in one
language and retrieve relevant results written in another. The primary objectives of CLMIR are to
promote research in cross-lingual retrieval of mathematical data, to create a benchmark dataset for
English-Hindi MIR that enables a fair comparison of retrieval models, and to encourage the development of
novel retrieval approaches that combine symbolic mathematics understanding with cross-lingual natural
language processing. This is the first shared task focused on cross-lingual retrieval of mathematical
content, making CLMIR an important step toward cross-lingual accessibility in STEM education and
research.</p>
      <sec id="sec-2-1">
        <title>2.1. Use Case</title>
        <p>A representative use case demonstrates how a CLMIR model retrieves relevant mathematical content
across languages: a query posed in English is matched against candidate results in Hindi, and the model
must distinguish relevant from irrelevant results by matching both the mathematical expressions and the
textual meaning accurately.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Participation</title>
        <p>A total of 9 teams registered for the task: Archisha Dhyani (UPES), CJM (IISER Kolkata), Tends
(NIT Kurukshetra), IReL (IIT BHU Varanasi), j22 (IISER Kolkata), NLP Fusion (Mangalore University),
MathQA_AUS (Assam University), DUCS_CLMIR (University of Delhi), and Retriever (Dhirubhai Ambani
University). Of these 9 teams, only 6, Archisha Dhyani, Tends, IReL, NLP Fusion, DUCS_CLMIR, and
Retriever, submitted results. Among these 6 teams, only 4, Tends, IReL, NLP Fusion, and Retriever,
submitted working notes.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset and Evaluation</title>
      <p>The dataset for the CLMIR task is curated from the Math Stack Exchange corpus of ARQMath-1
and contains 39,862 instances. Each instance includes a title and a body containing mathematical
formulas expressed in LaTeX along with supporting textual descriptions. The provided corpus is
structured to maintain cross-lingual consistency between Hindi and English, ensuring that the
mathematical content and contextual information are accurately aligned across both languages. The data
covers content in Hindi, including mathematical equations, expressions, and related textual
information, and is designed to support the development and evaluation of cross-lingual mathematical
information retrieval approaches. To assess performance, 50 formula- and text-based queries are
provided, and participants are expected to submit a results file containing the relevant search results
for each query. The performance of participants’ approaches in the CLMIR 2025 task is evaluated using
three metrics: Precision@10 (P@10), Mean Average Precision (MAP), and Normalized Discounted Cumulative
Gain (nDCG).</p>
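      <p>All three metrics can be computed from a ranked result list and a set of relevance judgments. The
following is a minimal sketch assuming binary relevance (the graded-relevance form commonly used for
nDCG generalizes the gain term); it is illustrative, not the official evaluation script:</p>
      <preformat>
```python
import math

def precision_at_k(ranked, relevant, k=10):
    """P@k: fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k

def average_precision(ranked, relevant):
    """AP: mean of precision values at each rank where a relevant document occurs."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def ndcg_at_k(ranked, relevant, k=10):
    """nDCG@k with binary gains: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(ranked[:k], start=1) if doc in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0
```
      </preformat>
      <p>MAP is then the mean of the per-query average precision over all 50 queries.</p>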
    </sec>
    <sec id="sec-4">
      <title>4. System Description</title>
      <p>The CLMIR shared task attracted diverse participation, with teams proposing a variety of approaches
to address the challenges of retrieving mathematical data across languages. This section provides a
detailed description of the baseline approach and the methodologies submitted by participating teams.</p>
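      <p>At their core, most of the dense systems described below follow the same embed-and-rank recipe:
encode the query and every document as a vector, score by cosine similarity, and keep the top-k. A
minimal sketch follows; the vectors here are illustrative stand-ins for the multilingual transformer
embeddings the actual systems compute:</p>
      <preformat>
```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k_by_cosine(query_vec, doc_vecs, k=50):
    """Score every document vector against the query; return the k best (index, score) pairs."""
    scored = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```
      </preformat>
      <p>In practice the exhaustive scan above is replaced by an approximate nearest-neighbour index such
as FAISS, which several teams used.</p>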
      <sec id="sec-4-1">
        <title>4.1. Baseline System (Organizer)</title>
        <p>The baseline system was developed by first translating the input query into Hindi using the Google
Translate API to ensure consistency in language representation. Following translation, embeddings
for both the query and the training data were generated using the
sentence-transformers/paraphrase-multilingual-mpnet-base-v2 model. This model was selected for its proven multilingual capability,
robust performance in capturing semantic similarity across languages, and demonstrated effectiveness
on Hindi text. Cosine similarity was then used to measure the closeness between the query and
training-data embeddings, and the top 50 most relevant results were retrieved based on these similarity
scores.</p>
      </sec>
      <sec id="sec-4-1-1">
        <title>4.2. Tends</title>
        <p>Team Tends designed a cross-lingual retrieval pipeline that integrates machine translation, dense
retrieval, and re-ranking. The Hindi corpus from ARQMath-1 was first translated into English using the
AI4Bharat IndicTrans2 model, ensuring that both queries and documents resided in the same language
space for retrieval. Each document was represented as a composite of its title, body, and tags, which
were encoded into contextualized token embeddings using ColBERTv2, a late-interaction dense retrieval
model optimized for fine-grained query-document matching. The embeddings were stored in a Facebook AI
Similarity Search (FAISS) index, enabling efficient approximate nearest-neighbour search. At retrieval
time, FAISS retrieved the top 100 candidate documents, which were further refined by a cross-encoder
(MiniLM-L6-v2) that jointly encoded each query-document pair to produce precise semantic relevance
scores. The final ranked list of the top 50 documents was returned as output. Since the corpus was
fully translated into English, incoming English queries, including both text and mathematical
expressions, were processed directly without any additional translation, making the system both
efficient and semantically robust [34].</p>
      </sec>
      <sec id="sec-4-1-2">
        <title>4.3. IReL</title>
        <p>Team IReL proposed a hybrid retrieval system that integrates multilingual embeddings and symbolic
formula similarity. Their pipeline first applied preprocessing to separate textual and mathematical
components, transliterating Hindi text into Latin script for compatibility with transformer-based
encoders. For semantic alignment, they used MiniLM embeddings in Run 1 and MPNet embeddings in Run 2,
with candidate retrieval performed through FAISS indexing. Mathematical similarity was modeled using
two strategies: a lightweight token-overlap method in Run 1 and a more expressive tree-based structural
similarity in Run 2 [35].</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.4. Retriever</title>
        <p>Team Retriever developed a two-stage English-Hindi CLMIR system combining BM25-based sparse
retrieval with LLM-based pointwise re-ranking. English text and formula queries were translated into
Hindi using GPT-4, and BM25 retrieved the top 100 candidate documents from the ARQMath-1 corpus. These
candidates were then re-ranked using instruction-tuned Gemma-2 models (4B and 12B) in a zero-shot
setting, with binary relevance predictions converted into probabilities for ranking [36].</p>
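        <p>The first stage of this pipeline is standard Okapi BM25. The following sketch uses the
conventional default parameters k1 = 1.5 and b = 0.75, which are assumptions; the team's exact
configuration and tokenization are not specified in the source:</p>
        <preformat>
```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # document frequency of each query term across the corpus
    df = {t: sum(1 for d in docs if t in d) for t in set(query_terms)}
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            denom = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[t] * (k1 + 1) / denom
        scores.append(s)
    return scores
```
        </preformat>
        <p>The top-scoring candidates from such a function would then be passed to the LLM re-ranker.</p>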
      </sec>
      <sec id="sec-4-3">
        <title>4.5. NLP_Fusion</title>
        <p>
          Team NLP_Fusion proposed a semantic retrieval system for the CLMIR 2025 shared task, focusing
on English-Hindi cross-lingual mathematical information retrieval. Their approach used
sentence-transformers/all-MiniLM-L6-v2 embeddings to generate multilingual semantic representations of
documents and queries, combined with FAISS for efficient vector-based similarity search. The Hindi
documents from ARQMath-1 were preprocessed by concatenating titles, bodies, and tags, then split into
manageable chunks to preserve contextual coherence. The system retrieved the top 50 most relevant
chunks for each query, with additional post-processing to keep scores within the [0, 1] range for
consistency [37].
        </p>
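        <p>The working notes do not specify the exact score transform, but mapping raw similarity scores
into [0, 1] is typically done with a min-max normalization; an assumed sketch:</p>
        <preformat>
```python
def minmax_normalize(scores):
    """Map a list of raw scores linearly onto the [0, 1] interval."""
    lo, hi = min(scores), max(scores)
    if hi == lo:  # all scores equal: return a constant list
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]
```
        </preformat>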
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experimental Results and Analysis</title>
      <p>We evaluated all team submissions for the CLMIR 2025 shared task using three metrics: P@10, MAP,
and nDCG. The results are shown in Table 1. Each team was allowed up to three runs, and the
differences in performance reflect the design of their systems. The baseline system used Google
Translate to convert queries into Hindi, generated multilingual embeddings with the MPNet model,
and ranked documents using cosine similarity. Team Archisha Dhyani showed steady improvements,
with Run 3 giving the best results, which suggests that their system became stronger after refinements.
DUCS_CLMIR produced stable results across runs, but at a lower level than other teams, likely
because their method did not add new improvements across submissions. IReL performed much better
in Run 2, with a hybrid retrieval system that integrates multilingual embeddings and symbolic formula
similarity. NLP_Fusion gave consistent results across all three runs, with the third run being the
strongest; they used sentence-transformers/all-MiniLM-L6-v2 embeddings to generate multilingual
semantic representations of documents and queries, combined with FAISS for efficient vector-based
similarity search. Retriever improved substantially in later runs, with Run 3 achieving one of the
best MAP and nDCG scores overall; its combination of BM25 retrieval with large language model
re-ranking helped capture both word overlap and deeper meaning. Finally, Tends submitted only one run,
but it still achieved strong results. Their system, which used translation, dense retrieval, and
re-ranking, showed solid performance even without multiple submissions.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>The CLMIR shared task at FIRE 2025 represents the first attempt to systematically evaluate the retrieval
of mathematical data across languages, with a focus on the English-Hindi pair. By providing a large-scale
cross-lingual dataset, well-defined queries, and a clear evaluation framework, this task establishes a
foundation for research in a less-explored area of cross-lingual mathematical information retrieval.
In future editions, we aim to expand the dataset to cover additional language pairs and incorporate
more diverse query types (including multi-formula and contextual queries). Furthermore, by releasing
resources such as baseline models, smaller benchmark datasets, and multilingual extensions, we intend
to make the task more accessible and encourage participation from a broader range of researchers.
Through these steps, CLMIR seeks to foster innovation in cross-lingual mathematical information
retrieval, broaden access to STEM resources, and build a long-term research community dedicated to
advancing cross-lingual access to mathematical knowledge.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgement</title>
      <p>This research work is supported under the Visvesvaraya PhD Scheme for Electronics and IT, implemented
by the Ministry of Electronics and Information Technology (MeitY), Government of India. The authors
also thank UPES Dehradun for providing the necessary support and research infrastructure.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
    <sec id="sec-9">
      <title>References</title>
      <p>[22] K. D. V. Wangari, R. Zanibbi, A. Agarwal, Discovering real-world use cases for a multimodal
math search interface, in: Proceedings of the 37th International ACM SIGIR Conference on Research &amp;
Development in Information Retrieval, 2014, pp. 947–950.</p>
      <p>[23] D. Stalnaker, R. Zanibbi, Math expression retrieval using an inverted index over symbol
pairs, in: Document Recognition and Retrieval XXII, volume 9402, SPIE, 2015, pp. 34–45.</p>
      <p>[24] R. Zanibbi, K. Davila, A. Kane, F. Tompa, The Tangent search engine: Improved similarity
metrics and scalability for math formula search, arXiv preprint arXiv:1507.06235 (2015).</p>
      <p>[25] R. Zanibbi, K. Davila, A. Kane, F. W. Tompa, Multi-stage math formula search: Using
appearance-based similarity metrics at scale, in: Proceedings of the 39th International ACM SIGIR
Conference on Research and Development in Information Retrieval, 2016, pp. 145–154.</p>
      <p>[26] K. Davila, R. Zanibbi, Layout and semantics: Combining representations for mathematical
formula search, in: Proceedings of the 40th International ACM SIGIR Conference on Research and
Development in Information Retrieval, 2017, pp. 1165–1168.</p>
      <p>[27] D. Fraser, A. Kane, F. W. Tompa, Choosing math features for BM25 ranking with Tangent-L,
in: Proceedings of the ACM Symposium on Document Engineering 2018, 2018, pp. 1–10.</p>
      <p>[28] W. Zhong, R. Zanibbi, Structural similarity search for formulas using leaf-root paths in
operator subtrees, in: Advances in Information Retrieval: 41st European Conference on IR Research,
ECIR 2019, Cologne, Germany, April 14–18, 2019, Proceedings, Part I, Springer, 2019, pp. 116–129.</p>
      <p>[29] B. Mansouri, S. Rohatgi, D. W. Oard, J. Wu, C. L. Giles, R. Zanibbi, Tangent-CFT: An
embedding model for mathematical formulas, in: Proceedings of the 2019 ACM SIGIR International
Conference on Theory of Information Retrieval, 2019, pp. 11–18.</p>
      <p>[30] V. E. Dimiceli, A. S. Lang, L. Locke, Teaching calculus with Wolfram|Alpha, International
Journal of Mathematical Education in Science and Technology 41 (2010) 1061–1071.</p>
      <p>[31] W. N. S. Wan Mohd Rosly, S. S. Syed Abdullah, F. N. Ahmad Shukri, The uses of Wolfram Alpha
in mathematics, Teaching and Learning in Higher Education (TLHE) 1 (2020) 96–103.</p>
      <p>[32] J. Gore, J. Polletta, B. Mansouri, CrossMath: Towards cross-lingual math information
retrieval, in: Proceedings of the 2024 ACM SIGIR International Conference on Theory of Information
Retrieval, 2024, pp. 101–105.</p>
      <p>[33] A. Malik, P. Dadure, S. R. Laskar, A review of mathematical information retrieval: Bridging
symbolic representation and intelligent retrieval, Archives of Computational Methods in Engineering
(2025) 1–35.</p>
      <p>[34] A. Sur, V. Shukla, A. Rai, L. Garg, Enhancing multilingual mathematical document retrieval:
A Hindi to English translation and ColBERT based approach, in: Proceedings of the Forum for
Information Retrieval Evaluation (FIRE 2025), ACM, 2025.</p>
      <p>[35] K. Tewari, S. Chanda, R. Tripathi, S. Pal, MIRACLE: Multilingual information retrieval with
cross-lingual embeddings for mathematical expressions, in: Proceedings of the Forum for Information
Retrieval Evaluation (FIRE 2025), 2025. To appear.</p>
      <p>[36] K. Kachhadiya, P. Patel, Cross-lingual mathematical information retrieval with BM25 and
LLM-based pointwise re-ranking, in: Proceedings of the Forum for Information Retrieval Evaluation
(FIRE 2025), 2025. To appear.</p>
      <p>[37] K. S. Coelho, A. Hegde, M. Z. Taljeh, A. Ahmad, Math beyond language barriers: Retrieving
mathematical content using sentence transformers, in: Proceedings of the Forum for Information
Retrieval Evaluation (FIRE 2025), 2025. To appear.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F.</given-names>
            <surname>Enríquez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. L.</given-names>
            <surname>Cruz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. J.</given-names>
            <surname>Ortega</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. G.</given-names>
            <surname>Vallejo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Troyano</surname>
          </string-name>
          ,
          <article-title>A comparative study of classifier combination applied to nlp tasks</article-title>
          ,
          <source>Information Fusion</source>
          <volume>14</volume>
          (
          <year>2013</year>
          )
          <fpage>255</fpage>
          -
          <lpage>267</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>