<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Second International Workshop on Scholarly Information Access (SCOLIA), April</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Scientific knowledge injection and multilingual alignment for concept-driven retrieval with sentence embedding models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nicolau Duran-Silva</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pablo Accuosto</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Horacio Saggion</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>LaSTUS Lab, TALN Group, Universitat Pompeu Fabra</institution>
          ,
          <addr-line>Barcelona</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>SIRIS Lab, Research Division of SIRIS Academic</institution>
          ,
          <addr-line>Barcelona</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <volume>2</volume>
      <issue>2026</issue>
      <fpage>0000</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>Accessing research and innovation information increasingly requires effective retrieval across languages, document types, and levels of textual granularity. In many research ecosystems, content is inherently multilingual and queries are short, concept-driven, and underspecified, posing challenges for traditional lexical retrieval methods, while the performance of general-purpose dense retrieval remains limited. In this work, we present an empirical evaluation of multilingual dense retrieval for scholarly documents in Catalan, Spanish, and English. We analyse the behaviour of general-purpose and domain-adapted embedding models across monolingual and cross-lingual settings, query types, and query lengths, and compare dense retrieval against strong sparse baselines. Using weakly supervised query-passage and triplet datasets derived from open research information, we show that domain-specific multilingual fine-tuning substantially improves retrieval effectiveness, semantic alignment, and embedding coherence. Our results highlight the importance of domain and multilingual adaptation for robust scholarly information access. These capabilities are particularly important for research mapping and scientometric analysis tools, where retrieval quality can directly influence downstream analytical modules such as topic and collaboration analysis, or research portfolio mapping.</p>
      </abstract>
      <kwd-group>
        <kwd>Scholarly Information Access</kwd>
        <kwd>Dense Retrieval</kwd>
        <kwd>Multilingual Semantic Search</kwd>
        <kwd>Domain Adaptation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Scientific and technical information is increasingly available through open databases of research projects,
scholarly publications, and patents [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ], which contain an enormous quantity of textual information
that details current challenges, proposed advancements, used technologies, and expected impact of the
research and innovation process [
        <xref ref-type="bibr" rid="ref35 ref4">4</xref>
        ]. In principle, this growing body of available information could foster new discoveries and advances
in research. However, reading such a large and growing number of documents would be extremely
time-consuming, and therefore infeasible for humans [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        These documents form the basis of research mapping platforms [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6, 7, 8</xref>
        ], which allow researchers and
policymakers to search, compare, and analyse research and innovation activities and outputs
across languages, institutions, and funding instruments. Search in scholarly and project repositories is
therefore often multilingual and concept-driven. Research information systems aggregate outputs from
different territories and communities with distinct dominant languages, and publicly funded research
projects managed by national or regional funding agencies frequently provide titles and descriptions in
local languages [
        <xref ref-type="bibr" rid="ref10 ref11 ref6 ref7 ref9">9, 10, 6, 7, 11</xref>
        ]. These challenges are particularly evident in publicly funded research
data, where documents are distributed across local, national, and international repositories, and titles
and descriptions may vary substantially in length, detail, and availability.
      </p>
      <p>The aim of research mapping platforms often goes beyond traditional document ranking and relevance,
because search results are also commonly used as input to analytical modules aimed at understanding
scientific specialisation, organisational performance, or thematic trends across research portfolios. In
this sense, retrieval can be interpreted as a form of classification. Users frequently issue brief queries such
as ‘cancer’, ‘artificial intelligence’ or ‘blue economy’, expecting to retrieve relevant documents that may
be written in different languages and described using diverse and more specific scientific terminology.
These fine-grained scientific concepts are the basis of scientific queries [12]. However, a major
difficulty in scholarly information retrieval is the background knowledge behind the words, which the
searcher is expected to know or understand [13]. This is especially relevant for queries that are not
self-descriptive or whose meaning is known only to domain experts.</p>
      <p>While traditional keyword-based information retrieval systems can handle lexical representations [15],
they fail to capture the semantic relationships and contextual meaning that characterise modern scientific
language. For example, they cannot recognise that ‘oncology’ and ‘cancer research’ refer to related
concepts, match a query in Spanish to an English abstract on the same topic, or process
more complex queries such as ‘AI for energy transition’. Recent advances in multilingual large language
models have enabled more natural and concept-oriented interaction with textual databases through
semantic search [16]. Embedding-based retrieval [17, 18] represents search queries and documents in
a shared semantic space, supporting retrieval beyond exact word matches. Sentence encoder models are
increasingly used in applications ranging from scholarly RAG systems [19] to modern
scientometric topic modelling approaches [20, 21]. While language models are trained to represent
context, concept-based queries (e.g. ‘cancer’) lack sufficient context to produce informative dense
representations.</p>
      <p>However, semantic search systems often rely on similarity thresholds or ranking signals derived from
embedding similarity, and general-purpose embedding models may introduce biases that affect scholarly
retrieval effectiveness. Figure 1 illustrates this with similarity scores computed as the cosine similarity between
query and document embeddings produced by SentenceTransformers [17]. Despite being multilingual,
the model displays cross-lingual differences in similarity distributions:
documents written in the query language (English, in this case) tend to receive higher similarity
scores, while texts in less-represented languages, such as Catalan, obtain lower scores. While retrieval
metrics such as Recall@k and MRR depend on the relative ranking of documents rather than absolute
similarity values, systematic shifts in similarity distributions may still influence ranking outcomes when
relevant documents consistently receive lower similarity scores than competing candidates. In addition,
similarity scores correlate with passage length, with shorter texts (e.g., titles) often ranked ahead of
longer passages regardless of their semantic completeness given a concept-based query. We have observed
this behaviour in the practical development of dense retrieval search systems with
industry-standard models, together with the narrow range of similarity scores that general-purpose
models assign to scientific documents, which makes them hard to discriminate. Although these behaviours
are partly expected, the length effect is particularly
relevant in open scholarly knowledge graphs, where abstract availability has recently decreased due to
publisher restrictions [22].</p>
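      <p>The kind of comparison described above can be illustrated with a minimal sketch. It assumes that embeddings are already available as plain lists of floats; the toy vectors, the language codes, and the helper name mean_similarity_by_language are hypothetical and are not taken from our experiments.</p>
      <preformat>
```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_similarity_by_language(query_vec, docs):
    """Average query-document cosine similarity per document language.

    docs is a list of (language_code, embedding) tuples.
    """
    scores = defaultdict(list)
    for lang, vec in docs:
        scores[lang].append(cosine(query_vec, vec))
    return {lang: sum(vals) / len(vals) for lang, vals in scores.items()}


# Toy 3-dimensional embeddings (hypothetical values, not model output).
query = [1.0, 0.0, 0.0]
docs = [
    ("en", [1.0, 0.0, 0.0]),
    ("en", [1.0, 1.0, 0.0]),
    ("ca", [0.0, 1.0, 0.0]),
]
by_lang = mean_similarity_by_language(query, docs)
```
      </preformat>
      <p>Comparing the per-language averages (or the full score distributions) for a fixed query is one simple way to surface the systematic shifts discussed above.</p>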
      <p>Our goal is to evaluate and improve dense retrieval models for multilingual documents by adapting
models with:
• scientific domain knowledge,
• multilingual alignment across Catalan, Spanish, and English,
• ranking-oriented behaviour for search,
• fine-grained cosine separation to support classification and analytical tasks.</p>
      <p>
        Our contribution is primarily empirical rather than architectural. Instead of proposing new training
objectives or model architectures, we evaluate multilingual semantic search for scholarly publications
and projects, analysing how general-purpose embedding models behave across languages, passage
lengths, and similarity ranges, and how they can be adapted to the scientific multilingual domain.
This study builds on our previous work on multilingual semantic retrieval and query segmentation
for scientific information access, expanding both evaluation and experiments. These models are developed
in the context of building new text search capabilities for open research information
platforms in Catalonia, with tools like RIS3MCAT [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]<sup>1</sup>. For this reason, we focus on titles and abstracts
from both research publications and funded projects in Catalan, Spanish, and English.
      </p>
      <p>These capabilities are directly relevant to scientometric and research policy analysis. Research
mapping platforms such as RIS3MCAT integrate search with analytical modules that support the
exploration of research portfolios, collaboration networks, thematic specialisation, and funding impact.
In such systems, retrieval acts as a filtering and classification step that determines which documents are
included in downstream analytical workflows. Consequently, the quality of semantic retrieval directly
affects the reliability of scientometric analyses derived from these platforms.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Scholarly information access research has addressed the challenge raised by the rapid growth of
scientific literature by exploring how to address the information needs of researchers [23], developing
recommendation systems [24] and discovery tools [25], and using publication textual content and
metadata [26]. Semantic similarity search is dominated by dense retrieval methods, which encode
queries and documents into a shared embedding space and rank candidates by vector similarity [27, 28].
These approaches enable concept-level search beyond lexical overlap, but also face known limitations
[29] in capturing rare terminology and fine-grained distinctions, issues especially relevant for scientific
and multilingual contexts. Several studies show that dense retrieval does not consistently
outperform strong lexical baselines such as BM25 [
        <xref ref-type="bibr" rid="ref12">30</xref>
        ], while also highlighting that fine-tuning a dense model on
domain-specific data leads to improved performance, surpassing BM25 in most metrics. Dense retrieval
models trained in one domain do not generalise properly to others [
        <xref ref-type="bibr" rid="ref13">31</xref>
        ]; in particular, general-purpose
semantic representation models often fail to capture fine-grained scientific concepts [12].
      </p>
      <p>
        A key challenge for dense retrieval of scholarly documents is the lack of annotated data for training
and testing. Creating supervised query-passage pairs for scientific literature is costly and generally requires
domain experts [12]. To address this gap, prior studies have explored generating
training labels with unsupervised and weakly supervised approaches [
        <xref ref-type="bibr" rid="ref14">32</xref>
        ], including pseudo-labelling
strategies [
        <xref ref-type="bibr" rid="ref13">31</xref>
        ], automatic generation of negative examples [
        <xref ref-type="bibr" rid="ref15">33</xref>
        ], and query expansion with LLMs [12].
The significant improvements they report suggest that dense retrievers can be trained without manually labelled
data. Weakly supervised dataset creation has also been addressed in scholarly document processing
for the same reason: labels are costly to produce and generally require domain
experts [
        <xref ref-type="bibr" rid="ref16 ref17">34, 35</xref>
        ].
      </p>
      <sec id="sec-2-1">
        <p><sup>1</sup> Available at https://ris3mcat.gencat.cat/.</p>
        <p>
          The sentence-transformers framework [17] provides a widely adopted pipeline for training dense
retrievers, typically using Multiple Negatives Ranking Loss (MNRL) [
          <xref ref-type="bibr" rid="ref18">36</xref>
          ], where in-batch examples act as
implicit negatives to efficiently learn discriminative representations. Recent advances refine contrastive
objectives through better negative sampling, hard negative mining [
          <xref ref-type="bibr" rid="ref19">37</xref>
          ], cross-lingual pairs [
          <xref ref-type="bibr" rid="ref20">38</xref>
          ], and
improved optimisation strategies [
          <xref ref-type="bibr" rid="ref21">18, 39</xref>
          ], all contributing to more robust vectorial representations.
Hybrid retrieval architectures partially address the weaknesses of dense-only methods in handling
exact matches and rare entities (e.g., uncommon organisation names, specialised technical terms, or
newly coined concepts that appear infrequently in training corpora). Late-interaction models such as
ColBERT [
          <xref ref-type="bibr" rid="ref22">40</xref>
          ] preserve token-level granularity while maintaining efficiency, and recent multi-vector
approaches [
          <xref ref-type="bibr" rid="ref23">41</xref>
          ] further improve retrieval precision through fixed-dimensional encodings. Others [
          <xref ref-type="bibr" rid="ref24">42</xref>
          ]
have explored extraction and indexing of relevant dimensions of scholarly abstracts like directions or
challenges described. For deployment, vector indexes based on HNSW graphs [
          <xref ref-type="bibr" rid="ref25">43</xref>
          ] remain the standard
for low-latency large-scale retrieval.
        </p>
        <p>
          In the multilingual domain, several model families are particularly relevant. The multilingual E5
models [
          <xref ref-type="bibr" rid="ref26">44</xref>
          ] show strong cross-lingual transfer from large-scale retrieval corpora. Multilingual
RoBERTa-based encoders trained on a trilingual query-relevance dataset (65k CA-ES-EN query-passage pairs)
demonstrate effectiveness when trained with domain-appropriate data [
          <xref ref-type="bibr" rid="ref27">45</xref>
          ]. These models benefit
substantially from domain-specific contrastive fine-tuning, which improves discrimination between
closely related scientific concepts. In the scientific domain, SPECTER [
          <xref ref-type="bibr" rid="ref28">46</xref>
          ] leverages citation networks
to specialise embeddings for scientific papers (though predominantly in English). However, recent work
[12] compares E5 and Specter2 [18], finding that E5 achieves the best results, outperforming BM25 and hybrid
baselines.
        </p>
        <p>
          Dense retrieval also plays a central role in retrieval-augmented generation (RAG) frameworks,
improving factual accuracy for LLMs [
          <xref ref-type="bibr" rid="ref29 ref30">47, 48</xref>
          ]. However, adapting dense retrievers to specialised
multilingual scientific domains remains challenging due to domain-specific terminology, code-switching,
and limited non-English training data [
          <xref ref-type="bibr" rid="ref31 ref32">49, 50</xref>
          ]. Our approach follows the contrastive paradigm while
introducing domain-specific multilingual pairs to strengthen semantic alignment across Catalan, Spanish,
and English research texts.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <p>
        This section describes the methodology used to evaluate and adapt dense retrieval models for
multilingual scholarly search, with a focus on cross-lingual alignment, ranking behaviour, and sensitivity to
passage length. Following prior studies [
        <xref ref-type="bibr" rid="ref14 ref17">35, 32</xref>
        ], we rely on weak supervision derived from latent and
author-provided publication metadata and machine translation for training models. This setting reflects
realistic constraints in multilingual scholarly information access, where large-scale expert annotation is
not available. An alternative approach to multilingual retrieval would consist of translating all queries
and documents into a pivot language such as English; however, in this work we focus on multilingual
embeddings to avoid full-corpus translation and preserve original-language representations.
      </p>
      <sec id="sec-3-1">
        <title>3.1. Retrieval Models</title>
        <p>
          Base Models. We evaluate a set of multilingual or scientific-domain sentence encoder models:
• Multilingual E5<sup>2</sup> [
          <xref ref-type="bibr" rid="ref26">44</xref>
          ]: a multilingual text embedding encoder trained on the MS-MARCO dataset
[
          <xref ref-type="bibr" rid="ref33">51</xref>
          ], a large-scale passage retrieval dataset derived from Bing search queries.
• mRoBERTA_retrieval<sup>3</sup>: a trilingual RoBERTa model pre-trained on CA, ES, and EN data.
• distilRoBERTa<sup>4</sup> [17]: a lightweight English sentence encoder used as a general-purpose baseline.
• SPECTER<sup>5</sup> [
          <xref ref-type="bibr" rid="ref28">46</xref>
          ]: a scientific-domain encoder trained on English documents for citation similarity,
providing a strong baseline for semantic paper retrieval.
        </p>
        <p><sup>2</sup> huggingface.co/intfloat/multilingual-e5-base</p>
        <p><sup>3</sup> huggingface.co/langtech-innovation/mRoBERTA_retrieval</p>
        <p><sup>4</sup> huggingface.co/sentence-transformers/all-distilroberta-v1</p>
        <p>
          <sup>5</sup> huggingface.co/sentence-transformers/allenai-specter
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Datasets</title>
        <p>We build several weakly supervised datasets from openly available scholarly collections of publications
and projects to support training, evaluation and analysis.</p>
        <p>Trilingual Research Project Corpus.</p>
        <p>
          This dataset of 1.5K publicly funded research projects is used to analyse retrieval behaviour
across languages and document granularities. It consists of 500 English projects from the European
Commission’s CORDIS platform<sup>6</sup>, 500 Catalan projects from RIS3CAT [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]<sup>7</sup>, and 500 Spanish projects
from AEI<sup>8</sup> and CDTI<sup>9</sup>. Each record includes a title and, when available, a description. We manually
annotated relevant documents for 5 different concept-driven queries using a pooling strategy:
for each query, we pooled the top 30 candidate projects retrieved by keyword search, BM25, and dense model
variants.
        </p>
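        <p>A pooling step of this kind can be sketched as follows. The sketch assumes that each retrieval system returns a ranked list of project identifiers; the function name pool_candidates, the system names, and the identifiers are hypothetical.</p>
        <preformat>
```python
def pool_candidates(rankings, depth=30):
    """Union of the top-`depth` results of every system, without duplicates.

    rankings maps a system name to its ranked list of document identifiers.
    The pooled list keeps first-seen order so annotators see each
    candidate exactly once.
    """
    pooled = []
    seen = set()
    for ranked_ids in rankings.values():
        for doc_id in ranked_ids[:depth]:
            if doc_id not in seen:
                seen.add(doc_id)
                pooled.append(doc_id)
    return pooled


# Toy rankings for one query (hypothetical identifiers).
rankings = {
    "keyword": ["p1", "p2", "p3"],
    "bm25": ["p2", "p4"],
    "dense": ["p5", "p1"],
}
pool = pool_candidates(rankings, depth=2)
```
        </preformat>
        <p>Only the pooled candidates are then judged for relevance, which keeps the annotation effort bounded by the number of systems times the pool depth.</p>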
        <p>Query-Passage Dataset.<sup>10</sup>
Our primary training dataset comprises 76k query-text pairs, equally distributed across English, Catalan,
and Spanish. We collect 30K scientific publications in English from several bibliographic databases<sup>11</sup>,
extracting their titles, abstracts and author keywords. Individual author keywords are treated as queries,
while titles and abstracts serve as passages. To obtain multilingual supervision samples, all textual
fields (author keywords, titles and abstracts) are automatically translated into Catalan and Spanish
using the Google Translate API. The original and translated texts
are then aligned to construct both monolingual and cross-lingual query-passage pairs across the three
languages. There are no repeated articles between languages. Approximately 90% of the examples
correspond to keyword→text pairs (where
the text may consist of the title, abstract, or title+abstract), reflecting the typical use of short concept
queries in scholarly search and the fact that abstracts are missing for some records. The remaining 10% consist
of title→abstract pairs, included to preserve document-level semantic similarity during training and to
support searches for textually equivalent documents. While automatic translation may introduce some noise or semantic drift,
keywords are typically short technical terms, which reduces the likelihood of substantial translation
errors. Contrastive training has been shown to be robust to moderate noise in supervision signals. The
overall data construction process is illustrated in Figure 2. This dataset can therefore be considered a
weakly supervised resource, where author keywords act as implicit relevance signals. In a low-resource
multilingual setting, this approach enables the generation of cross-lingual training pairs with a low
investment of resources. The dataset is split into 80/10/10 partitions. Splitting is performed at pair level,
ensuring that each query-passage pair appears in only one partition. Because queries correspond to
author keywords representing scientific concepts, the same query term (e.g. “cancer”) may occur across
splits paired with different passages, while longer and more specific queries appear less frequently. This
setup reflects realistic retrieval scenarios where common concepts may correspond to many different
documents.</p>
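        <p>The pair construction and pair-level split can be sketched as below. This is an illustration under stated assumptions rather than our actual pipeline: the record layout, the function names make_pairs and split_pairs, the partition names, and the fixed random seed are all hypothetical, and the translation step is assumed to have already produced the per-language fields.</p>
        <preformat>
```python
import random


def make_pairs(record, query_lang, passage_lang):
    """Build keyword-to-text pairs for one publication.

    record maps a language code to that language's version of the
    keywords, title and (optional) abstract.
    """
    source = record[passage_lang]
    parts = (source["title"], source.get("abstract"))
    passage = " ".join(part for part in parts if part)
    setting = "monolingual" if query_lang == passage_lang else "cross-lingual"
    return [
        {"query": keyword, "passage": passage, "setting": setting}
        for keyword in record[query_lang]["keywords"]
    ]


def split_pairs(pairs, seed=0):
    """Pair-level 80/10/10 split: every list item lands in one partition."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_train = int(0.8 * len(shuffled))
    n_dev = int(0.1 * len(shuffled))
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]
    return train, dev, test


# Toy record (hypothetical content); the Catalan abstract is missing.
record = {
    "en": {
        "keywords": ["cancer", "immunotherapy"],
        "title": "Immune response in tumours",
        "abstract": "We study ...",
    },
    "ca": {
        "keywords": ["càncer", "immunoteràpia"],
        "title": "Resposta immune en tumors",
    },
}
pairs = make_pairs(record, "ca", "en") + make_pairs(record, "en", "en")

toy_pairs = [{"query": f"kw{i}", "passage": f"text{i}"} for i in range(20)]
train, dev, test = split_pairs(toy_pairs)
```
        </preformat>
        <p>Because the split is made over pairs and not over queries, a frequent keyword can appear in several partitions with different passages, as noted above.</p>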
        <p>
          Classification Dataset.<sup>12</sup>
We additionally use the classification dataset scidocs-mag [
          <xref ref-type="bibr" rid="ref28">46</xref>
          ], translating one third of it to Catalan
and one third to Spanish. Its documents are annotated with 19 scientific categories corresponding to Microsoft
Academic Graph’s Fields of Science at level 0. This dataset is used to compute the polarity score and to
search for the optimal similarity threshold.
        </p>
        <p><sup>6</sup> cordis.europa.eu/projects</p>
        <p><sup>7</sup> https://ris3mcat.gencat.cat/</p>
        <p><sup>8</sup> aei.gob.es/ayudas-concedidas/buscador-ayudas-concedidas</p>
        <p><sup>9</sup> cdti.es/datos-abiertos-creditos-subvenciones-y-lineas</p>
        <p><sup>10</sup> huggingface.co/datasets/nicolauduran45/multilingual_research_pairs</p>
        <p><sup>11</sup> huggingface.co/datasets/nicolauduran45/scidocs-keywords-exkeyliword</p>
        <p>
          <sup>12</sup> huggingface.co/datasets/nicolauduran45/multilingual-research-classification
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Model Fine-tuning</title>
        <p>To assess different fine-tuning strategies on our query–passage dataset, we experimented with
different dataset configurations and loss functions. This allows us to analyse how training objectives
influence multilingual alignment and ranking behaviour.</p>
        <p>Loss Functions. We evaluate four complementary loss functions [17] commonly used in
dense retrieval.</p>
        <p>
• Multiple Negatives Ranking Loss (MNRL) [
          <xref ref-type="bibr" rid="ref34">52</xref>
          ], which performs contrastive learning using in-batch
negatives. For each query–passage positive pair, all other passages in the same mini-batch (batch
size = 32) are treated as implicit negatives, without explicit negative sampling.
• Contrastive Loss [
          <xref ref-type="bibr" rid="ref18">36</xref>
          ], a pairwise objective that learns to separate relevant and non-relevant
query–document pairs using a margin-based formulation. We adapt the dataset by pairing each
query with its associated positive passage and sampling a single explicit negative passage per
query. Negatives are selected randomly under the constraint that they do not share any annotated
positive keywords with the query, ensuring semantically safe negative examples. Positive pairs
are labelled with similarity 1 and negative pairs with 0.
• Triplet Loss, which explicitly enforces relative similarity constraints between queries, relevant
passages, and hard negatives. For each query, we form triplets consisting of an anchor (query), a
positive passage, and a sampled negative passage. Negative passages are selected as in Contrastive
Loss. A cosine-distance margin of 0.5 is used.
• Cosine Similarity Loss, which optimises cosine similarity scores for query–document pairs. We
use the same pairwise dataset construction as for Contrastive Loss.
        </p>
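        <p>To make the difference between the in-batch and the triplet objectives concrete, the following sketch computes both on toy vectors in plain Python. It is not the training code: embeddings are assumed to be given as lists of floats, the scale factor of 20 applied to the cosine similarities is an assumption of this sketch, and the function names are hypothetical. The margin of 0.5 is the one stated above.</p>
        <preformat>
```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mnrl(queries, passages, scale=20.0):
    """In-batch ranking loss: passage i is the positive for query i and
    every other passage in the batch acts as an implicit negative.

    Returns the mean cross-entropy over the scaled similarity rows.
    """
    total = 0.0
    for i, query in enumerate(queries):
        logits = [scale * cosine(query, passage) for passage in passages]
        log_norm = math.log(sum(math.exp(x) for x in logits))
        total += log_norm - logits[i]
    return total / len(queries)


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge on cosine distances: max(0, d(a, p) - d(a, n) + margin)."""
    d_pos = 1.0 - cosine(anchor, positive)
    d_neg = 1.0 - cosine(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy batch of two queries (hypothetical 2-dimensional embeddings).
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each positive matches its query
swapped = [[0.0, 1.0], [1.0, 0.0]]   # positives point the wrong way

loss_aligned = mnrl(queries, aligned)
loss_swapped = mnrl(queries, swapped)
```
        </preformat>
        <p>The in-batch loss is close to zero when each query is most similar to its own passage and grows when another passage in the batch is more similar, which is why it directly optimises relative ranking within the batch.</p>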
        <p>Training Setup. Models are fine-tuned using the SentenceTransformers framework [17]. Training is
performed for three epochs with identical optimisation settings across models, and models are evaluated on held-out
test data. Detailed hyperparameter settings are reported in Appendix B.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation and analysis</title>
      <p>We assess the retrieval capacity of the models and analyse their performance in monolingual and cross-lingual settings,
the impact of query length, and their behaviour on lexical and semantic queries. We also
compare sparse and dense retrieval on 5 example queries and explore how to choose
the best similarity thresholds for dense retrieval.</p>
      <sec id="sec-4-1">
        <title>4.1. Evaluation metrics</title>
        <p>
          Models are evaluated on the held-out multilingual test split using the following metrics:
• Top-k Recall (k ∈ {1, 5, 10}): proportion of queries for which the paired passage appears among
the top-k retrieved results.
• Cosine@1: average cosine similarity between positive query–passage pairs, measuring
embedding alignment quality.
• Mean Reciprocal Rank (MRR): evaluates the ranking quality by measuring the inverse rank of
the first correct passage within the top-10 retrieved candidates.
• Neighbourhood Polarity: following the formulation of [
          <xref ref-type="bibr" rid="ref21">39</xref>
          ], we compute the proportion of the top-k
nearest neighbours in the embedding space that share the same class label (discipline) as the
target document. Higher polarity indicates more coherent semantic neighbourhoods and stronger
clustering of scientific topics.
        </p>
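        <p>The two ranking metrics can be written down in a few lines. The sketch below assumes that, for every query, the 1-based rank of the first correct passage has already been computed; the function names and the toy ranks are hypothetical.</p>
        <preformat>
```python
def recall_at_k(ranks, k):
    """Share of queries whose correct passage is ranked within the top k.

    ranks holds the 1-based rank of the first correct passage per query.
    """
    return sum(1 for rank in ranks if k >= rank) / len(ranks)


def mrr_at_k(ranks, k=10):
    """Mean reciprocal rank, counting only hits within the top k."""
    return sum(1.0 / rank if k >= rank else 0.0 for rank in ranks) / len(ranks)


# Toy ranks for four queries (hypothetical values).
ranks = [1, 3, 12, 2]
r_at_1 = recall_at_k(ranks, 1)
r_at_5 = recall_at_k(ranks, 5)
mrr = mrr_at_k(ranks, k=10)
```
        </preformat>
        <p>In this toy example the third query contributes nothing to MRR because its correct passage falls outside the top 10.</p>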
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Overall model performance</title>
        <p>Table 1 reports retrieval performance of the base and fine-tuned embedding models on the test split of our multilingual
Query-Passage Dataset. We report R@k, MRR, cosine accuracy, and neighbourhood polarity. To
ensure robust evaluation given the weakly supervised construction of the dataset, retrieval is performed
over batches of 64 candidate documents. Because supervision signals are derived from author keywords,
relevance annotations are incomplete: a document may be relevant to a query even if it
is not paired with it in the dataset. Evaluating over the full corpus would therefore introduce many
false negatives, that is, semantically related but non-paired passages counted as errors, and would
artificially penalise correct semantic matches. Restricting the candidate pool allows us to measure
the capacity of the model to distinguish relevant passages from semantically related distractors while
mitigating noise from incomplete supervision.</p>
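        <p>A minimal sketch of ranking within a restricted candidate pool is given below. It assumes that similarity scores between one query and the passages of its pool are already available; the helper names chunk and rank_of_first_positive and the toy scores are hypothetical. The set of positive identifiers may contain more than one passage.</p>
        <preformat>
```python
def chunk(items, size=64):
    """Split a list into consecutive candidate pools of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def rank_of_first_positive(scores, positive_ids):
    """1-based rank of the best-ranked positive passage within one pool.

    scores maps passage identifiers to their similarity with the query.
    Returns None when the pool contains no positive passage.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    for rank, passage_id in enumerate(ordered, start=1):
        if passage_id in positive_ids:
            return rank
    return None


# Toy pool of three passages (hypothetical scores).
scores = {"a": 0.9, "b": 0.7, "c": 0.8}
pools = chunk(list(range(130)))
```
        </preformat>
        <p>The ranks returned per query are the input to Recall@k and MRR; accepting several positive identifiers lowers the chance that a correct match is counted as an error.</p>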
        <p>In addition, we also treat as valid positives any passages associated with additional author keywords
from the same source document, reflecting the many-to-many nature of author keywords, where a
single document may correspond to multiple conceptually related queries. The neighbourhood polarity
score, derived from the classification dataset, measures whether the top-k neighbours (here, k = 16)
share the same scientific field. This provides an external estimate of whether fine-tuned models preserve
meaningful semantic structure across scientific domains.</p>
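        <p>One way to compute such a neighbourhood score is sketched below. It is a simplified reading of the metric and does not claim to reproduce the cited formulation exactly: neighbours are found by exhaustive cosine search, the score is averaged over all documents, and the function name and toy data are hypothetical.</p>
        <preformat>
```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def neighbourhood_polarity(vectors, labels, k=16):
    """Mean share of each document's k nearest neighbours with its label.

    Requires at least two documents; a document is never its own neighbour.
    """
    shares = []
    for i, vec in enumerate(vectors):
        others = [j for j in range(len(vectors)) if j != i]
        nearest = sorted(
            others, key=lambda j: cosine(vec, vectors[j]), reverse=True
        )[:k]
        same = sum(1 for j in nearest if labels[j] == labels[i])
        shares.append(same / len(nearest))
    return sum(shares) / len(shares)


# Toy embeddings with two well-separated fields (hypothetical values).
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["biology", "biology", "computer science", "computer science"]
polarity_k1 = neighbourhood_polarity(vectors, labels, k=1)
polarity_k3 = neighbourhood_polarity(vectors, labels, k=3)
```
        </preformat>
        <p>With k = 1 every toy document has a same-field nearest neighbour, so the score is 1; with k = 3 the neighbourhood covers all other documents and the score drops to one third.</p>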
        <p>Comparing loss functions, MNR loss consistently provides the largest gains, reflecting its explicit
optimisation of relative similarity among in-batch candidates for retrieval. The results indicate that
domain-specific and multilingual enrichment significantly improves both ranking and semantic
organisation of the embedding space. While E5 achieves the strongest overall performance, all models
benefit from fine-tuning with MNR loss, including encoders originally trained only on English data.
In contrast, neighbourhood polarity improves to a similar extent across all loss functions, suggesting
that most objectives encourage comparable levels of inter-document semantic cohesion, even when the
improvements in retrieval are smaller.</p>
        <p>In the following sections, we focus on the best-performing fine-tuned models, those with MNR loss,
and analyse their behaviour in more detail.</p>
        <p>[Table 1: retrieval performance on the Query-Passage Dataset test split for E5, mRoBERTA, DistilRoBERTa, and Specter, as base models and fine-tuned with ContrastiveLoss, CosineSimilarityLoss, MNRLoss, and TripletLoss. Only one column fragment survives (MRR: .70, .59, .58, .47); the rows it belongs to and the remaining values are not recoverable.]</p>
        <sec id="sec-4-2-1">
          <title>4.2.1. Performance by Query Type</title>
          <p>To better analyse retrieval behaviour, and to assess how well dense retrieval preserves
lexical retrieval capacities, we further distinguish between lexical and semantic query-passage matches.
A pair is considered lexical when the query string appears verbatim in the passage (e.g., query “cancer”
and passage containing “breast cancer”). Otherwise, it is labelled as semantic, meaning that retrieval success
requires conceptual inference or paraphrasing (e.g., query “cancer” and passage mentioning “basal cell
carcinoma”). This classification is language-agnostic: cross-lingual pairs are still considered lexical if
translated forms match directly. Table 2 reports Recall@1 and Recall@10 across lexical and semantic
matches, comparing base models and models fine-tuned with MNRLoss. Base models show a pronounced gap
between lexical and semantic performance, indicating a strong reliance on surface-level term overlap.
Fine-tuning with MNRLoss substantially improves retrieval performance for both match types, with
lexical and semantic recall doubling in most cases. These results suggest that training on a mixture of
lexical and semantic query–passage pairs strengthens both exact-match sensitivity and deeper semantic
generalisation.</p>
          <table-wrap id="tab-2">
            <label>Table 2</label>
            <caption>
              <p>Recall@1 and Recall@10 on lexical (Lex.) and semantic (Sem.) matches for base models and models fine-tuned with MNRLoss. Recall@10 values for the fine-tuned models are not available.</p>
            </caption>
            <table>
              <thead>
                <tr><th>Model</th><th>R@1 Lex.</th><th>R@1 Sem.</th><th>R@10 Lex.</th><th>R@10 Sem.</th></tr>
              </thead>
              <tbody>
                <tr><td>Base Models</td><td></td><td></td><td></td><td></td></tr>
                <tr><td>E5</td><td>.48</td><td>.34</td><td>.83</td><td>.73</td></tr>
                <tr><td>mRoBERTA</td><td>.29</td><td>.27</td><td>.68</td><td>.68</td></tr>
                <tr><td>DistilRoBERTa</td><td>.37</td><td>.29</td><td>.68</td><td>.62</td></tr>
                <tr><td>Specter</td><td>.21</td><td>.19</td><td>.54</td><td>.55</td></tr>
                <tr><td>Fine-tuned Models</td><td></td><td></td><td></td><td></td></tr>
                <tr><td>E5</td><td>.79</td><td>.64</td><td></td><td></td></tr>
                <tr><td>mRoBERTA</td><td>.69</td><td>.54</td><td></td><td></td></tr>
                <tr><td>DistilRoBERTa</td><td>.66</td><td>.50</td><td></td><td></td></tr>
                <tr><td>Specter</td><td>.63</td><td>.51</td><td></td><td></td></tr>
              </tbody>
            </table>
          </table-wrap>
        </sec>
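        <p>The lexical versus semantic labelling of a query-passage pair can be sketched as a substring test. The sketch makes two assumptions that are not stated in the text: matching is case-insensitive, and translated forms of the query are supplied by the caller through a hypothetical translations argument.</p>
        <preformat>
```python
def match_type(query, passage, translations=()):
    """Label a query-passage pair as 'lexical' or 'semantic'.

    A pair is lexical when the query, or one of its supplied translated
    forms, appears verbatim (ignoring case) in the passage.
    """
    text = passage.casefold()
    forms = (query, *translations)
    if any(form.casefold() in text for form in forms):
        return "lexical"
    return "semantic"


# Toy pairs (hypothetical passages).
label_a = match_type("cancer", "Advances in breast cancer screening")
label_b = match_type("cancer", "Treatment of basal cell carcinoma")
label_c = match_type(
    "cancer", "Cribratge del càncer de mama", translations=["càncer"]
)
```
        </preformat>
        <p>A plain substring test also matches the query inside longer words, so a stricter token-level comparison may be preferable when queries are very short.</p>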
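        <p>Table 3: Recall@1 (R@1) for monolingual (Mono.) and cross-lingual (Cross.) pairs; – marks values that were not recovered.
Model | R@1 Mono. | R@1 Cross.
Base Models
E5 | .54 | .27
mRoBERTA | .31 | .25
DistilRoBERTa | .40 | .25
Specter | .25 | .15
Fine-tuned Models
E5 | – | –
mRoBERTA | – | –
DistilRoBERTa | – | –
Specter | – | –</p>
        <p>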
4.2.2. Performance by Language Configuration
To analyse the impact of multilingualism on retrieval quality, we evaluate the models separately on
monolingual and cross-lingual pairs. In the monolingual scenario, queries and passages are written in
the same language, while the cross-lingual scenario contains pairs where the query and the target text
are in different languages. This distinction allows us to measure both in-language semantic retrieval
and the ability of models to align concepts across languages. Table 3 reports Recall@1 and Recall@10
under both conditions for all base and fine-tuned models. Fine-tuning considerably reduces the
gap between monolingual and cross-lingual performance.
4.2.3. Monolingual Performance by Language and Query Length
We further analyse monolingual retrieval performance by grouping test pairs according to the language
of the target passage. This analysis examines how models handle scientific text in each language
independently, isolating retrieval accuracy from cross-lingual alignment effects. Table 4 reports Recall@1
and Recall@10 for all models across the three languages. English is the strongest language,
likely because it is better represented in the training data. This gap narrows after
fine-tuning. For most models, Catalan is the weakest language, possibly due to its lower representation and resource availability.
Finally, Table 5 reports retrieval performance as a function of query length. Queries are grouped
into three categories: short (single token), medium (2-3 tokens), and long (4 or more tokens). This grouping captures
some of the challenges of concept-driven keyword searches, ranging from queries with no context to
more descriptive ones. Across all models, fine-tuning yields substantial improvements for all query
lengths, indicating enhanced robustness to limited contextual information.</p>
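        <p>The query-length analysis can be sketched as below. This is an illustrative sketch under stated assumptions: tokens are taken to be whitespace-separated words, each query is assumed to have a single target passage (so Recall@k reduces to a hit rate), and the names length_bucket, recall_at_k, and recall_by_bucket are hypothetical helpers rather than part of the released code.</p>

```python
from collections import defaultdict


def length_bucket(query: str) -> str:
    """Assign a query to the short / medium / long category used in the text."""
    n_tokens = len(query.split())  # assumption: whitespace tokenisation
    if n_tokens == 0:
        raise ValueError("empty query")
    if n_tokens == 1:
        return "short"
    if n_tokens in (2, 3):
        return "medium"
    return "long"


def recall_at_k(ranked_ids, relevant_id, k):
    """Return 1.0 if the single relevant passage is in the top k, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0


def recall_by_bucket(examples, k):
    """Average Recall@k per query-length bucket.

    `examples` is an iterable of (query, ranked_passage_ids, relevant_id).
    """
    hits = defaultdict(list)
    for query, ranked_ids, relevant_id in examples:
        hits[length_bucket(query)].append(recall_at_k(ranked_ids, relevant_id, k))
    return {bucket: sum(values) / len(values) for bucket, values in hits.items()}


# Toy rankings, invented for illustration only.
examples = [
    ("cancer", ["p3", "p1", "p7"], "p1"),
    ("blue economy", ["p2", "p9"], "p9"),
    ("sustainable food supply chains", ["p4", "p5"], "p8"),
]
recall_at_2 = recall_by_bucket(examples, k=2)
```

        <p>On the toy data, the short and medium queries are recovered within the top two results while the long query is not; real scores would come from the model rankings over the test pairs.</p>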
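        <p>Table 4: Monolingual Recall@1 (R@1) and Recall@10 (R@10) by language of the target passage (CA: Catalan, EN: English, ES: Spanish); – marks values that were not recovered.
Model | R@1 CA | R@1 EN | R@1 ES | R@10 CA | R@10 EN | R@10 ES
Base Models
E5 | .45 | .62 | .54 | .86 | .90 | .89
mRoBERTA | .26 | .35 | .32 | .66 | .75 | .69
DistilRoBERTa | .31 | .59 | .29 | .66 | .84 | .68
Specter | .18 | .34 | .21 | .54 | .70 | .55
Fine-tuned Models
E5 | .70 | .79 | .75 | – | – | –
mRoBERTA | .60 | .67 | .67 | – | – | –
DistilRoBERTa | .57 | .73 | .61 | – | – | –
Specter | .55 | .71 | .59 | – | – | –</p>
        <p>Partially recovered columns, presumably from Table 5 (performance by query length): unlabelled column .78, .69, .68, .68; Long column .70, .43, .55, .39.</p>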
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Comparing sparse and dense retrieval</title>
        <p>To compare dense retrieval with well-established sparse methods, we conduct a small-scale
analysis using exact keyword matching and BM25 as baselines. We evaluate five representative,
concept-driven queries that we annotated over the trilingual research project corpus, spanning well-established
scientific topics, emerging policy-oriented concepts, and semantically complex queries that are not
always lexically explicit in project descriptions. Table 6 reports Precision@10 across retrieval methods.
While BM25 provides a strong and stable baseline, particularly for topics with consistent terminology,
base embedding models do not consistently outperform it. In contrast, fine-tuned embedding models,
especially E5, achieve higher precision across all queries, with the largest gains observed for
concept-driven queries of different natures. Although this analysis is small in scale, a deeper comparison,
including its use for query routing strategies, is an interesting direction for future work.
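</p>
        <p>Table 6: Precision@10 by retrieval method. Rows: Exact Keyword Match, BM25, E5 (base), mRoBERTa (base), DistilRoBERTa (base), Specter (base), E5 (ft), mRoBERTa (ft), DistilRoBERTa (ft), Specter (ft). Recovered query columns: 'Nuclear Energy', 'Sustainable Food', 'Blue Economy'. The remaining query labels and the numeric values were not recovered.</p>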
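        <p>For reference, the exact keyword match baseline and the Precision@10 metric can be sketched as follows. This is a simplified illustration: the helper names, the case-insensitive substring matching, and the toy documents are assumptions, and the annotated relevance judgements used in the paper are not reproduced here.</p>

```python
def exact_keyword_match(query, documents):
    """Return ids of documents that contain the query string verbatim.

    `documents` maps a document id to its text. Case-insensitive substring
    matching is an assumption of this sketch.
    """
    needle = query.casefold()
    return [doc_id for doc_id, text in documents.items() if needle in text.casefold()]


def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = ranked_ids[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant_ids) / k


# Toy corpus and judgements, invented for illustration only.
documents = {
    "d1": "Nuclear energy policy in southern Europe",
    "d2": "Offshore wind farms and grid integration",
    "d3": "Safety assessment of nuclear energy plants",
}
retrieved = exact_keyword_match("nuclear energy", documents)
score = precision_at_k(retrieved, relevant_ids={"d1"}, k=2)
```

        <p>Here two documents match the keyword and one of them is judged relevant, giving a precision of 0.5 at k=2; the same metric applies unchanged to BM25 or dense rankings.</p>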
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Selecting Similarity Thresholds for Retrieval and Classification</title>
        <p>A key challenge in dense retrieval is determining an appropriate relevance threshold on cosine similarity,
particularly when retrieval outputs are used for analytical tasks. Unlike ranking-based evaluation, these
applications require a binary decision on document relevance, making threshold selection both critical
and model-dependent. To address this question, we use the classification dataset to estimate the cosine
similarity thresholds that maximise the F1 score on the test set. Table 7 reports the average optimal
threshold and the corresponding F1 score across 19 subject categories, providing practical, empirical
guidance for selecting similarity thresholds under different embedding models.</p>
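        <p>One way to estimate such a threshold for a single category is an exhaustive sweep over the observed similarity values, keeping the cut-off with the highest F1. The sketch below is an assumption about the procedure, not the authors' code; the function names and the toy scores are hypothetical.</p>

```python
def f1_score(labels, predictions):
    """F1 of boolean predictions against boolean gold labels."""
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def best_threshold(similarities, labels):
    """Return (threshold, f1) maximising F1 when predicting similarity >= threshold."""
    best_t, best_f1 = None, -1.0
    for candidate in sorted(set(similarities)):
        predictions = [s >= candidate for s in similarities]
        f1 = f1_score(labels, predictions)
        if f1 > best_f1:  # ties keep the lowest threshold
            best_t, best_f1 = candidate, f1
    return best_t, best_f1


# Toy cosine similarities and relevance labels, invented for illustration only.
similarities = [0.91, 0.84, 0.80, 0.62, 0.55, 0.30]
labels = [True, True, False, True, False, False]
threshold, f1 = best_threshold(similarities, labels)
```

        <p>On the toy data the sweep selects 0.62. Averaging the per-category thresholds over all subject categories would then give a single value per model, as reported in Table 7.</p>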
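        <p>Table 7: Average optimal cosine similarity threshold per model; only the base-model thresholds were recovered.
Model | Base threshold
E5 | .79
mRoBERTA | .70
DistilRoBERTa | .21
Specter | .71</p>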
        <p>To illustrate the effect of threshold selection, Figure 3 visualises the cosine
similarity distributions of True and False samples for a representative query (Environmental science)
under the base and fine-tuned E5 models. The histograms show how fine-tuning increases the
separation between relevant and non-relevant documents, leading to a higher F1 score.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>Across all experiments, fine-tuning consistently improves retrieval quality for every model and
evaluation setting. Gains are especially pronounced in cross-lingual retrieval, where base encoders struggle
to align Catalan, Spanish, and English scientific content. Models show 20–30 point improvements
in Recall@1 after contrastive fine-tuning, confirming that domain-specific multilingual adaptation is
important for multilingual scientific search.</p>
      <p>Improvements extend across match types. While lexical queries are naturally easier, fine-tuning
also boosts semantic retrieval capacity, demonstrating that the models learn to generalise beyond
surface forms. Importantly, dense models do not lose lexical retrieval capacity: fine-tuning strengthens
both lexical and semantic abilities. Even weaker base encoders become competitive multilingual
retrievers after adaptation. Monolingual performance by language shows clear asymmetries that reflect
underlying resource availability. English remains the easiest setting, with the highest scores even before
adaptation. Catalan and Spanish lag behind in base models, particularly Catalan, which suffers from
limited representation in openly available corpora. After fine-tuning, however, these gaps narrow
substantially: Catalan gains the largest relative improvements, and Spanish reaches parity with English
in some models.</p>
      <p>These results show that the combination of multilingual contrastive learning and modest
domain-specific supervision yields robust multilingual and cross-lingual semantic search capabilities, which are crucial
for accessing R&amp;I information in ecosystems where English, Spanish, and Catalan coexist. These models
are able to improve upon strong sparse retrieval baselines. Finally, our analysis highlights that effective
scholarly retrieval requires not only strong ranking performance but also interpretable similarity scores.
Using the classification dataset to identify thresholds provides guidance for bridging retrieval and
analytical applications. From a scientometric perspective, improving retrieval quality is critical
because research intelligence platforms rely on search results as input for analytical modules and facets that
compute indicators such as thematic specialization or collaboration networks. Reliable multilingual
retrieval therefore supports more accurate mapping and monitoring of research ecosystems.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this work, we examine the performance of multilingual embedding models for accessing scientific
and innovation data in a trilingual setting characteristic of many R&amp;I information systems. Our results
demonstrate that lightweight and domain-adapted models, including Catalan-centric variants, can
effectively adapt to domain-specific data. Beyond these findings, we contribute new multilingual datasets,
model checkpoints, and evaluation resources designed to support future research on cross-lingual
scientific information access. Taken together, our work underscores the importance of domain-specific
adaptation and robust multilingual alignment for enabling reliable and scalable access to open research
information. Beyond improving document retrieval, these capabilities are also relevant for scientometrics
and research analysis, where semantic search systems are used to identify and analyse research portfolios
and collaboration patterns.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used the following generative AI tools and services:
ChatGPT, Claude, DeepL, and LanguageTool. These tools were used exclusively to support
writing-related tasks, including grammar and spelling checking, paraphrasing and sentence rephrasing, and
general proofreading of the manuscript. In addition, generative AI tools were used for assistance in code
development, documentation, and testing during the preparation of experimental scripts. After using
these tools/services, the authors reviewed and edited the content as needed and take full responsibility
for the publication’s content.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This work was supported by the Industrial Doctorates Plan of the Department of Research and Universities of the
Generalitat de Catalunya (Departament de Recerca i Universitats de la Generalitat de Catalunya;
grant reference 2022/DI/00017).</p>
      <p>We thank the anonymous reviewers for their constructive feedback and suggestions, which improved
the clarity and quality of this work.</p>
      <p>[12] Y. Zhang, R. Yang, S. Jiao, S. Kang, J. Han, Scientific paper retrieval with llm-guided semantic-based ranking, arXiv preprint arXiv:2505.21815 (2025). doi:10.48550/arXiv.2505.21815.
[13] C. Friedman, P. Kra, A. Rzhetsky, Two biomedical sublanguages: a description based on the theories of zellig harris, Journal of biomedical informatics 35 (2002) 222–235. doi:10.1016/S1532-0464(03)00012-1.
[14] L. Wang, N. Yang, X. Huang, L. Yang, R. Majumder, F. Wei, Multilingual e5 text embeddings: A technical report, arXiv preprint arXiv:2402.05672 (2024). doi:10.48550/arXiv.2402.05672.
[15] S. Robertson, H. Zaragoza, The probabilistic relevance framework: BM25 and beyond, volume 4, Now Publishers Inc, 2009. doi:10.1561/1500000019.
[16] A. Biswal, L. Patel, S. Jha, A. Kamsetty, S. Liu, J. E. Gonzalez, C. Guestrin, M. Zaharia, Text2sql is not enough: Unifying ai and databases with tag, arXiv preprint arXiv:2408.14717 (2024). doi:10.48550/arXiv.2408.14717.
[17] N. Reimers, I. Gurevych, Sentence-bert: Sentence embeddings using siamese bert-networks, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2019. URL: https://arxiv.org/abs/1908.10084.
[18] A. Singh, M. D’Arcy, A. Cohan, D. Downey, S. Feldman, Scirepeval: A multi-format benchmark for scientific document representations, in: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023, pp. 5548–5566. doi:10.18653/v1/2023.emnlp-main.338.
[19] M. D. Skarlinski, S. Cox, J. M. Laurent, J. D. Braza, M. Hinks, M. J. Hammerling, M. Ponnapati, S. G. Rodriques, A. D. White, Language agents achieve superhuman synthesis of scientific knowledge, arXiv preprint arXiv:2409.13740 (2024). doi:10.48550/arXiv.2409.13740.
[20] A. Glazkova, Identifying topics of scientific articles with bert-based approaches and topic modeling, in: Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, 2021, pp. 98–105. doi:10.1007/978-3-030-75015-2_10.
[21] N. Bovenzi, N. Duran-Silva, F. A. Massucci, F. Multari, J. Pujol-Llatse, Mapping sti ecosystems via open data: overcoming the limitations of conflicting taxonomies. a case study for climate change research in denmark, in: International Conference on Theory and Practice of Digital Libraries, Springer, 2022, pp. 495–499. doi:10.1007/978-3-031-16802-4_52.
[22] B. Kramer, More open abstracts? comparing abstract coverage in crossref and openalex, 2024. URL: https://doi.org/10.5281/zenodo.11580550. doi:10.5281/zenodo.11580550.
[23] I. Frommholz, P. Mayr, G. Cabanac, S. Verberne, Bibliometric-enhanced information retrieval: 14th international bir workshop (bir 2024), in: European Conference on Information Retrieval, Springer, 2024, pp. 442–446. doi:10.1007/978-3-031-56069-9_61.
[24] S.-Y. Yang, C.-L. Hsu, S.-H. Lu, Developing an ontology-supported information recommending system for scholars, in: 2009 Joint Conferences on Pervasive Computing (JCPC), 2009, pp. 223–228. doi:10.1109/JCPC.2009.5420185.
[25] S. Volkova, P. Bautista, A. Hiriyanna, G. Ganberg, I. Erickson, Z. Klinefelter, N. Abele, H.-T. Kao, G. Engberson, Cross-disciplinary knowledge retrieval and synthesis: A compound ai architecture for scientific discovery, arXiv preprint arXiv:2511.18298 (2025). doi:10.48550/arXiv.2511.18298.
[26] T. Strohman, W. B. Croft, D. Jensen, Recommending citations for academic papers, in: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, 2007, pp. 705–706. doi:10.1145/1277741.1277868.
[27] V. Karpukhin, B. Oguz, S. Min, P. S. Lewis, L. Wu, S. Edunov, D. Chen, W.-t. Yih, Dense passage retrieval for open-domain question answering, in: EMNLP (1), 2020, pp. 6769–6781. doi:10.18653/v1/2020.emnlp-main.550.
[28] G. Izacard, M. Caron, L. Hosseini, S. Riedel, P. Bojanowski, A. Joulin, E. Grave, Unsupervised dense information retrieval with contrastive learning, arXiv preprint arXiv:2112.09118 (2021). doi:10.48550/arXiv.2112.09118.
[29] O. Weller, M. Boratko, I. Naim, J. Lee, On the theoretical limitations of embedding-based retrieval, 2025. URL: https://arxiv.org/abs/2508.21038.</p>
    </sec>
    <sec id="sec-9">
      <title>A. Online Resources</title>
      <sec id="sec-9-1">
        <p>The datasets and models are available at: GitHub; Datasets &amp; Models.</p>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>B. Fine-tuning Hyperparameters</title>
      <p>We provide experimental details of our baseline fine-tuning approaches for sentence encoder models for
content retrieval. Training was run on a single 24 GB GPU for all models with the hyperparameters defined
in Table 8, which lists the loss function, number of epochs, batch size, learning rate, and selection criterion.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Manghi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Atzori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Baglioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Manola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schirrwagen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Príncipe</surname>
          </string-name>
          ,
          <source>The openaire research graph data model</source>
          ,
          <year>2019</year>
          . URL: https://api.semanticscholar.org/CorpusID:182277225.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Priem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. A.</given-names>
            <surname>Piwowar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Orr</surname>
          </string-name>
          ,
          <article-title>Openalex: A fully-open index of scholarly works, authors, venues, institutions, and concepts</article-title>
          ,
          <source>ArXiv abs/2205.01833</source>
          (
          <year>2022</year>
          ). URL: https://api.semanticscholar.org/CorpusID:248512771.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yin</surname>
          </string-name>
          , L. Liu,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Sciscinet: A large-scale open data lake for the science of science research</article-title>
          ,
          <source>Scientific Data</source>
          <volume>10</volume>
          (
          <year>2023</year>
          )
          <elocation-id>315</elocation-id>
          . doi:10.1038/s41597-023-02198-9.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Fuster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. A.</given-names>
            <surname>Massucci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Matusiak</surname>
          </string-name>
          ,
          <article-title>Identifying specialisation domains beyond taxonomies: mapping scientific and technological domains of specialisation via semantic analyses</article-title>
          , in:
          <source>Quantitative Methods for Place-Based Innovation Policy</source>
          , Edward Elgar Publishing,
          <year>2020</year>
          , pp.
          <fpage>195</fpage>
          -
          <lpage>234</lpage>
          . doi:10.4337/9781789905519.00014.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>I.</given-names>
            <surname>Frommholz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mayr</surname>
          </string-name>
          , G. Cabanac,
          <string-name>
            <given-names>S.</given-names>
            <surname>Verberne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. K.</given-names>
            <surname>Kreutz</surname>
          </string-name>
          ,
          <article-title>The first workshop on scholarly information access (scolia)</article-title>
          , in:
          <source>European Conference on Information Retrieval</source>
          , Springer,
          <year>2025</year>
          , pp.
          <fpage>326</fpage>
          -
          <lpage>331</lpage>
          . doi:10.1007/978-3-031-88720-8_50.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>E.</given-names>
            <surname>Fuster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Fernández</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Carretero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Duran-Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Guixé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pujol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Rondelli</surname>
          </string-name>
          , G. Rull,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cortijo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Romagosa</surname>
          </string-name>
          ,
          <article-title>Towards building a monitoring platform for a challenge-oriented smart specialisation with ris3-mcat</article-title>
          ,
          <source>arXiv preprint arXiv:2401.10900</source>
          (
          <year>2023</year>
          ). doi:10.48550/arXiv.2401.10900.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          ART-ER, SIRIS Academic,
          <source>Monitoring Platform: Methodology Document - Smart Specialization Strategy 2021-2027</source>
          , Technical Report, Emilia-Romagna Region,
          <year>2024</year>
          . URL: https://monitoraggios3.art-er.it/documents/metodologia/S3%20Monitoring%20Methodology%20document.pdf, platform release updated as of November 5, 2024.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Chaves</surname>
          </string-name>
          ,
          <article-title>Product: The lens-patent and scholarly search analysis</article-title>
          ,
          <source>Journal of the Canadian Health Libraries Association (JCHLA)</source>
          <volume>46</volume>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Baruch</surname>
          </string-name>
          ,
          <article-title>Open access developments in france: the hal open archives system</article-title>
          ,
          <source>Learned Publishing</source>
          <volume>20</volume>
          (
          <year>2007</year>
          )
          <fpage>267</fpage>
          -
          <lpage>282</lpage>
          . doi:10.1087/095315107X239636.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          S. M. d. Santos, G. Fraumann,
          <string-name>
            <given-names>S.</given-names>
            <surname>Belli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mugnaini</surname>
          </string-name>
          ,
          <article-title>The relationship between the publication language and its impact on public and collective health</article-title>
          (
          <year>2020</year>
          ). doi:10.1590/SciELOPreprints.1549.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Packer</surname>
          </string-name>
          ,
          <article-title>Multilingualism in scientific literature communicated by journals from the scielo brazil collection</article-title>
          ,
          <source>European Review</source>
          <volume>32</volume>
          (
          <year>2024</year>
          )
          <fpage>S124</fpage>
          -
          <lpage>S144</lpage>
          . doi:10.1017/S1062798724000103.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>L.</given-names>
            <surname>Mori</surname>
          </string-name>
          , C. Sousa de Oliveira,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yih</surname>
          </string-name>
          ,
          M. Ventresca,
          <article-title>Assessing the performance gap between lexical and semantic models for information retrieval with formulaic legal language</article-title>
          , in:
          <source>Proceedings of the Twentieth International Conference on Artificial Intelligence and Law</source>
          , ICAIL '25, Association for Computing Machinery, New York, NY, USA,
          <year>2026</year>
          , pp.
          <fpage>114</fpage>
          -
          <lpage>128</lpage>
          . URL: https://doi.org/10.1145/3769126.3769205. doi:10.1145/3769126.3769205.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>N.</given-names>
            <surname>Thakur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>Injecting domain adaptation with learning-to-hash for effective and efficient zero-shot dense retrieval</article-title>
          ,
          <source>arXiv preprint arXiv:2205.11498</source>
          (
          <year>2022</year>
          ). doi:10.48550/arXiv.2205.11498.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>D.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Yadav</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Afzal</surname>
          </string-name>
          , G. Tsatsaronis,
          <article-title>Unsupervised dense retrieval for scientific articles</article-title>
          , in: Y. Li, A. Lazaridou (Eds.),
          <source>Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track</source>
          , Association for Computational Linguistics, Abu Dhabi, UAE,
          <year>2022</year>
          , pp.
          <fpage>313</fpage>
          -
          <lpage>321</lpage>
          . URL: https://aclanthology.org/2022.emnlp-industry.32/. doi:10.18653/v1/2022.emnlp-industry.32.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sinha</surname>
          </string-name>
          , P. S,
          <string-name>
            <given-names>R.</given-names>
            <surname>Balaji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Bhatt</surname>
          </string-name>
          ,
          <article-title>Bica: Effective biomedical dense retrieval with citation-aware hard negatives</article-title>
          (
          <year>2025</year>
          ). doi:10.48550/arXiv.2511.08029.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>A.</given-names>
            <surname>Yakimovich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Beaugnon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          E. Ozkirimli,
          <article-title>Labels in a haystack: Approaches beyond supervised learning in biomedical applications</article-title>
          ,
          <source>Patterns</source>
          <volume>2</volume>
          (
          <year>2021</year>
          ). doi:10.1016/j.patter.2021.100383.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>M.</given-names>
            <surname>Pàmies</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Llop</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Multari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Duran-Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Parra-Rojas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gonzalez-Agirre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. A.</given-names>
            <surname>Massucci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Villegas</surname>
          </string-name>
          ,
          <article-title>A weakly supervised textual entailment approach to zero-shot text classification</article-title>
          ,
          <source>in: Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>286</fpage>
          -
          <lpage>296</lpage>
          . doi:10.18653/v1/2023.eacl-main.22.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>R.</given-names>
            <surname>Hadsell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chopra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>LeCun</surname>
          </string-name>
          ,
          <article-title>Dimensionality reduction by learning an invariant mapping</article-title>
          ,
          <source>in: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)</source>
          , volume
          <volume>2</volume>
          ,
          <year>2006</year>
          , pp.
          <fpage>1735</fpage>
          -
          <lpage>1742</lpage>
          . doi:10.1109/CVPR.2006.100.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>L.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-F.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bennett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Overwijk</surname>
          </string-name>
          ,
          <article-title>Approximate nearest neighbor negative contrastive learning for dense text retrieval</article-title>
          ,
          <source>arXiv preprint arXiv:2007.00808</source>
          (
          <year>2020</year>
          ). doi:10.48550/arXiv.2007.00808.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>F.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Arivazhagan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Language-agnostic bert sentence embedding</article-title>
          ,
          <source>in: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>878</fpage>
          -
          <lpage>891</lpage>
          . doi:10.18653/v1/2022.acl-long.62.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>T. E.</given-names>
            <surname>Jørgensen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Breitung</surname>
          </string-name>
          ,
          <article-title>Margins in contrastive learning: Evaluating multi-task retrieval for sentence embeddings</article-title>
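          , in:
          <string-name>
            <given-names>R.</given-names>
            <surname>Johansson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Stymne</surname>
          </string-name>
          (Eds.),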
          <source>Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies</source>
          (NoDaLiDa/Baltic-HLT 2025), University of Tartu Library, Tallinn, Estonia,
          <year>2025</year>
          , pp.
          <fpage>269</fpage>
          -
          <lpage>278</lpage>
          . URL: https://aclanthology.org/2025.nodalida-1.28/.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>O.</given-names>
            <surname>Khattab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zaharia</surname>
          </string-name>
          ,
          <article-title>ColBERT: Efficient and effective passage search via contextualized late interaction over BERT</article-title>
          ,
          <source>in: Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>39</fpage>
          -
          <lpage>48</lpage>
          . doi:10.1145/3397271.3401075.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>L.</given-names>
            <surname>Dhulipala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hadian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jayaram</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Mirrokni</surname>
          </string-name>
          ,
          <article-title>Muvera: multi-vector retrieval via fixed dimensional encodings</article-title>
          ,
          <source>in: Proceedings of the 38th International Conference on Neural Information Processing Systems</source>
          , NIPS '24, Curran Associates Inc., Red Hook, NY, USA,
          <year>2024</year>
          . URL: https://proceedings.neurips.cc/paper_files/paper/2024/file/b71cfefae46909178603b5bc6c11d3ae-Paper-Conference.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>D.</given-names>
            <surname>Lahav</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Falcon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kuehl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Johnson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Parasa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shomron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. H.</given-names>
            <surname>Chau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Horvitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Weld</surname>
          </string-name>
          , et al.,
          <article-title>A search engine for discovery of scientific challenges and directions</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>36</volume>
          ,
          <year>2022</year>
          , pp.
          <fpage>11982</fpage>
          -
          <lpage>11990</lpage>
          . doi:10.1609/aaai.v36i11.21456.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>Y. A.</given-names>
            <surname>Malkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Yashunin</surname>
          </string-name>
          ,
          <article-title>Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs</article-title>
          ,
          <source>IEEE Trans. Pattern Anal. Mach. Intell.</source>
          <volume>42</volume>
          (
          <year>2020</year>
          )
          <fpage>824</fpage>
          -
          <lpage>836</lpage>
          . URL: https://doi.org/10.1109/TPAMI.2018.2889473. doi:10.1109/TPAMI.2018.2889473.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <source>Multilingual e5 text embeddings: A technical report</source>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2402.05672.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>C.</given-names>
            <surname>Rodriguez-Penagos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Armentano-Oller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Villegas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Melero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gonzalez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. d. G.</given-names>
            <surname>Bonet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. C.</given-names>
            <surname>Pio</surname>
          </string-name>
          ,
          <article-title>The Catalan language club</article-title>
          ,
          <source>arXiv preprint arXiv:2112.01894</source>
          (
          <year>2021</year>
          ). doi:10.48550/arXiv.2112.01894.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>A.</given-names>
            <surname>Cohan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Feldman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Beltagy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Downey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Weld</surname>
          </string-name>
          ,
          <article-title>Specter: Document-level representation learning using citation-informed transformers</article-title>
          ,
          <source>in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>2270</fpage>
          -
          <lpage>2282</lpage>
          . doi:10.18653/v1/2020.acl-main.207.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [47]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Piktus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Karpukhin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Küttler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-t.</given-names>
            <surname>Yih</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rocktäschel</surname>
          </string-name>
          , et al.,
          <article-title>Retrieval-augmented generation for knowledge-intensive nlp tasks</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>33</volume>
          (
          <year>2020</year>
          )
          <fpage>9459</fpage>
          -
          <lpage>9474</lpage>
          . URL: https://proceedings.neurips.cc/paper/2020/file/6b493230205f780e1bc26945df7481e5-Paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [48]
          <string-name>
            <given-names>S.</given-names>
            <surname>Borgeaud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mensch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hofmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Rutherford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Millican</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. B.</given-names>
            <surname>Van Den Driessche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-B.</given-names>
            <surname>Lespiau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Damoc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Clark</surname>
          </string-name>
          , et al.,
          <article-title>Improving language models by retrieving from trillions of tokens</article-title>
          ,
          <source>in: International Conference on Machine Learning</source>
          , PMLR
          ,
          <year>2022</year>
          , pp.
          <fpage>2206</fpage>
          -
          <lpage>2240</lpage>
          . URL: https://proceedings.mlr.press/v162/borgeaud22a.html.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [49]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>Mr. TyDi: A multi-lingual benchmark for dense retrieval</article-title>
          , in:
          <string-name>
            <given-names>D.</given-names>
            <surname>Ataman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Birch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Conneau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Firat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ruder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. G.</given-names>
            <surname>Sahin</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 1st Workshop on Multilingual Representation Learning</source>
          , Association for Computational Linguistics, Punta Cana, Dominican Republic,
          <year>2021</year>
          , pp.
          <fpage>127</fpage>
          -
          <lpage>137</lpage>
          . URL: https://aclanthology.org/2021.mrl-1.12/. doi:10.18653/v1/2021.mrl-1.12.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [50]
          <string-name>
            <given-names>R.</given-names>
            <surname>Litschko</surname>
          </string-name>
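          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Vulić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Glavaš</surname>
          </string-name>
          ,
          <article-title>Parameter-efficient neural reranking for cross-lingual and multilingual retrieval</article-title>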
          , in:
          <string-name>
            <given-names>N.</given-names>
            <surname>Calzolari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-R.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pustejovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wanner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-S.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-M.</given-names>
            <surname>Ryu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Donatelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kurohashi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Paggio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Xue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hahm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Santus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-H.</given-names>
            <surname>Na</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 29th International Conference on Computational Linguistics</source>
          ,
          International Committee on Computational Linguistics
          , Gyeongju, Republic of Korea,
          <year>2022</year>
          , pp.
          <fpage>1071</fpage>
          -
          <lpage>1082</lpage>
          . URL: https://aclanthology.org/2022.coling-1.90/.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [51]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bajaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Campos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Craswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>McNamara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          , et al.,
          <article-title>MS MARCO: A human generated machine reading comprehension dataset</article-title>
          ,
          <source>arXiv preprint arXiv:1611.09268</source>
          (
          <year>2016</year>
          ). doi:10.48550/arXiv.1611.09268.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [52]
          <string-name>
            <given-names>M.</given-names>
            <surname>Henderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Al-Rfou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Strope</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-H.</given-names>
            <surname>Sung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lukács</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Miklos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kurzweil</surname>
          </string-name>
          ,
          <article-title>Efficient natural language response suggestion for smart reply</article-title>
          ,
          <source>arXiv preprint arXiv:1705.00652</source>
          (
          <year>2017</year>
          ). doi:10.48550/arXiv.1705.00652.
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          <article-title>Setting 4 different losses 3 32 per device 2e-5 (with 0.1 warm-up ratio</article-title>
          )
          <source>Best model selected based on R@1</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>