<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Exploring Retrieval-Reranking and LLM-Based Answer Generation for Biomedical QA</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Poojan Vachharajani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Netaji Subhas University of Technology</institution>
          ,
          <addr-line>New Delhi</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
<p>This paper presents the methodology and results of our participation in Task 13b of the BioASQ 13 challenge under the team name PJs-team. We developed five distinct system configurations focusing on semantic retrieval, snippet selection, and large language model (LLM)-based answer generation. In Phase A, we utilized e5-base to embed PubMed titles and abstracts, retrieving the top 10,000 articles per query, followed by reranking using models such as modernbert-embed-base, GTE-large, and Granite-125M. An ensemble strategy in config-4 achieved the highest Mean Precision of 0.0619 for documents and 0.0284 for snippets. Snippet selection was guided by MiniLM-L6 similarity scoring. In Phase A+, we employed Claude Sonnet 3 for prompt-based answer generation. Notably, config-3 achieved 100% accuracy on Yes/No questions, while config-5 led in List question precision. For ideal answers, config-1 achieved the best ROUGE-2 recall (0.2551). In Phase B, we evaluated both proprietary and open-source LLMs, including a LoRA-finetuned Qwen2.5-3B. While the finetuned model was competitive in ideal answer generation, proprietary LLMs outperformed it on exact-answer tasks. Our findings highlight the value of strategic reranking and model ensembling, and emphasize the trade-offs between open and proprietary models in biomedical question answering.</p>
      </abstract>
      <kwd-group>
<kwd>Biomedical Question Answering</kwd>
        <kwd>Semantic Retrieval</kwd>
        <kwd>Large Language Models (LLMs)</kwd>
        <kwd>Reranking</kwd>
        <kwd>BioASQ</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. System Description and Methodology</title>
      <p>Our system architecture for BioASQ Task 13b is modular, consisting of several key stages: initial
document retrieval, document reranking, snippet selection, and answer generation using LLMs. We
developed five configurations, primarily varying the reranking models and ensemble strategies in Phase
A, and the LLMs used for answer generation in Phases A+ and B.</p>
      <sec id="sec-2-1">
        <title>2.1. Phase A: Document and Snippet Retrieval</title>
        <p>Phase A required participants to retrieve relevant documents and snippets for given biomedical questions.
Our pipeline for this phase is detailed below.</p>
        <sec id="sec-2-1-1">
          <title>2.1.1. Initial Document Retrieval</title>
          <p>
            The primary corpus for document retrieval was the PubMed Annual Baseline Repository, focusing on
titles and abstracts. Initial data preparation involved concatenating multiple CSV files of PubMed
data and then generating embeddings for this corpus.
• Embedding Model: We utilized the ‘intfloat/e5-base‘ model [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ] to generate embeddings for
all PubMed titles and abstracts. This model was chosen for its strong performance on semantic
retrieval tasks. The embedding process was carried out in chunks to manage memory.
• Retrieval Process: For each query, we first embedded the query text (prefixed with "query: "
as per ‘e5-base‘ best practices). We then performed a semantic search against the pre-computed
PubMed embeddings using cosine similarity. An HNSWlib index [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ] was built for efficient
similarity search over the large embedding space. The top 10,000 articles were retrieved for each
query to serve as candidates for the reranking stage.
          </p>
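The retrieval step above can be sketched as follows. A brute-force cosine scan stands in for the HNSWlib index used in practice, and the toy vectors stand in for e5-base embeddings of the "query: "-prefixed query and the documents; both substitutions are illustrative assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, doc_vecs, top_k=10_000):
    """Rank documents by cosine similarity to the query embedding.

    In our pipeline the query text is prefixed with "query: " before
    embedding (the e5-base convention), and an HNSWlib approximate
    index replaces this exact scan for speed over the full corpus.
    """
    scored = [(cosine(query_vec, d), i) for i, d in enumerate(doc_vecs)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:top_k]]
```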
        </sec>
        <sec id="sec-2-1-2">
          <title>2.1.2. Document Reranking</title>
          <p>
            The initially retrieved 10,000 documents were subsequently reranked to improve relevance. We
experimented with three different reranking models across our configurations:
• ‘nomic-ai/modernbert-embed-base‘ (referred to as ‘mo‘) [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ]
• ‘thenlper/gte-large‘ (referred to as ‘gt‘) [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ]
• ‘ibm-granite/granite-embedding-125m-english‘ (referred to as ‘gr‘) [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ]
These models were selected to explore the impact of architectural diversity and training paradigms on
reranking performance: ‘mo‘ was expected to excel on modern web-like content, ‘gt‘ served as a strong
baseline for general semantic similarity tasks, and ‘gr‘ was included to assess how a compact, efficient
model from IBM performs in comparison to larger counterparts.
          </p>
          <p>The query was embedded using the respective reranking model (with "search_query: " prefix for
‘modernbert-embed-base‘), and the top-k documents were selected based on cosine similarity between
the query embedding and the document embeddings (concatenation of title and abstract, prefixed with
"search_document: " for ‘modernbert-embed-base‘). The specific configurations for reranking were:
• Config-1: Reranked using ‘modernbert-embed-base‘.
• Config-2: Reranked using ‘gte-large‘.
• Config-3: Reranked using ‘granite-embedding-125m-english‘.
• Config-4 and Config-5 (Ensemble): These configurations employed an ensemble strategy. The
top 5 results from each of the three individual rerankers (‘mo‘, ‘gt‘, ‘gr‘) were collected. The union
of these (up to 15) documents was then formed, and the final top 10 documents were selected
from this union based on their original scores or a simple unweighted combination.</p>
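A minimal sketch of this ensemble step, assuming each reranker returns (doc_id, score) pairs already sorted by score (the function name and the unweighted best-score combination rule are our reconstruction):

```python
def ensemble_rerank(ranked_lists, per_model=5, final_k=10):
    """Combine rankings from several rerankers (e.g. mo, gt, gr).

    ranked_lists: one list per reranker of (doc_id, score) pairs,
    sorted by that model's score, highest first. The top `per_model`
    hits of each list are pooled; the union (up to 15 documents for
    three rerankers) is re-sorted by the best score each document
    received, and the final `final_k` documents are returned.
    """
    best = {}
    for ranking in ranked_lists:
        for doc_id, score in ranking[:per_model]:
            if doc_id not in best or score > best[doc_id]:
                best[doc_id] = score
    pooled = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [doc_id for doc_id, _ in pooled[:final_k]]
```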
        </sec>
        <sec id="sec-2-1-3">
          <title>2.1.3. Snippet Selection</title>
          <p>
            From the top 10 reranked documents for each configuration, we selected relevant snippets.
• Model: ‘sentence-transformers/all-MiniLM-L6-v2‘ [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] was used for its efficiency and effectiveness
in semantic similarity tasks at the sentence level.
• Method: For each of the top 10 documents, the abstract was segmented into sentences using
NLTK’s sentence tokenizer. The query and each sentence were embedded using
‘all-MiniLM-L6-v2‘. The sentence with the highest cosine similarity to the query was selected as the primary
snippet from that document. Up to 10 such snippets were selected across the top documents,
ordered by their similarity scores. The exact offset information was recorded as required by the
BioASQ guidelines.
          </p>
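The snippet step can be sketched as below; a simple regex splitter stands in for NLTK's sentence tokenizer, and the `score` callback stands in for MiniLM cosine similarity between query and sentence embeddings (both are illustrative assumptions):

```python
import re

def split_sentences(text):
    """Crude stand-in for nltk.sent_tokenize: split after . ! or ?"""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def best_snippet(abstract, score):
    """Pick the abstract sentence most similar to the query.

    `score(sentence)` is a placeholder for the cosine similarity of
    all-MiniLM-L6-v2 embeddings of the query and the sentence.
    Character offsets within the abstract are recorded, as BioASQ
    requires begin/end offsets for each submitted snippet.
    """
    best = None
    for sent in split_sentences(abstract):
        begin = abstract.find(sent)
        cand = (score(sent), sent, begin, begin + len(sent))
        if best is None or cand[0] > best[0]:
            best = cand
    return best  # (score, sentence, offset_begin, offset_end)
```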
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Phase A+: Answer Generation from System-Retrieved Context</title>
        <p>In Phase A+, we used the documents and snippets retrieved by our Phase A configurations to generate
exact and ideal answers.</p>
        <p>• LLM Used: ‘Claude Sonnet 3‘ (via Amazon Bedrock) was employed for this task.
• Prompting Strategy: We constructed a detailed prompt that included:
1. A system prompt outlining the task, desired JSON output format (including "ideal_answer"
and "exact_answer" fields), and instructions for each answer type (factoid, list, summary,
yes/no) as provided in the BioASQ guidelines and detailed in the notebooks.
2. A user prompt containing:
– Context: The selected snippets from Phase A, numbered and concatenated.
– Question: The original biomedical question.
– Format Instructions: Specific instructions for the expected ‘exact_answer‘ format
based on the question type (factoid, list, summary, yesno).</p>
        <p>The context provided to the LLM was a concatenation of the text from the selected snippets. The
LLM was instructed to generate its response within ‘&lt;JSON&gt;...&lt;/JSON&gt;‘ tags. Temperature was
set to 0.01 to promote factual responses.</p>
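Recovering the answer from the tagged output can be handled as in this sketch (the helper name and the regex-plus-`json.loads` fallback are ours, not a library API):

```python
import json
import re

def parse_llm_answer(raw):
    """Pull the JSON payload out of <JSON>...</JSON> tags.

    Returns the parsed dict (with "ideal_answer"/"exact_answer"
    fields) or None when the model failed to follow the format,
    which does happen and must be handled downstream.
    """
    match = re.search(r"<JSON>(.*?)</JSON>", raw, re.DOTALL)
    if not match:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
```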
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Phase B: Answer Generation from Gold Context</title>
        <p>For Phase B, gold standard documents and snippets were provided by the BioASQ organizers. We
evaluated various LLMs for generating answers based on this gold context.</p>
        <p>
          • Proprietary LLMs: We utilized several models available through Amazon Bedrock, including
‘Claude Sonnet 3‘, ‘Claude Sonnet 3.7‘, ‘Claude Haiku 3.5‘, and ‘Amazon Nova Pro‘. The same
prompting strategy as in Phase A+ was used, but with the gold snippets as context.
• Open-Source LLM: We finetuned ‘Qwen/Qwen2.5-3B-Instruct‘ [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] using Low-Rank Adaptation
(LoRA) [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. The finetuning dataset was prepared from the BioASQ training data, formatting it
into a conversational structure with system prompts, user prompts (containing context, question,
and format instructions), and assistant responses (the gold JSON answers).
• Prompting for Finetuned Model: The finetuned ‘Qwen2.5-3B-Instruct‘ was inferenced using
vLLM [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], with a similar prompt structure to the proprietary models, adapted to its chat template.
        </p>
        <p>The context was formed from the provided gold snippets.</p>
        <p>For all LLM-based answer generation, the temperature was set to a low value (e.g., 0.01) to encourage
factual and deterministic outputs. Context length was managed by trimming input to the LLM to fit
within its maximum token limit, prioritizing the most recent or relevant parts of the context if necessary.</p>
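The trimming step can be sketched as follows; whitespace splitting is a crude stand-in for the model's real tokenizer, and snippets are assumed pre-sorted by relevance so the highest-priority context survives the cut:

```python
def trim_context(snippets, max_tokens):
    """Fit snippets into a token budget, most relevant first.

    snippets: context passages sorted by priority (relevance).
    Whole-word counting approximates the model tokenizer; passages
    that would exceed the budget are dropped.
    """
    kept, used = [], 0
    for snip in snippets:
        n = len(snip.split())
        if used + n > max_tokens:
            break
        kept.append(snip)
        used += n
    return "\n".join(kept)
```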
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental Setup</title>
      <sec id="sec-3-1">
        <title>3.1. Dataset</title>
        <p>The primary dataset for this work was the BioASQ 13 Task 13b dataset. The development ("dry-run")
dataset, consisting of 5389 questions, was used for system development and internal validation. For
training the LoRA-finetuned ‘Qwen2.5-3B-Instruct‘ model, we utilized the official BioASQ training data
(‘training13b.json‘). The test phase involved processing questions released in batches by the BioASQ
organizers. The PubMed corpus used for retrieval was preprocessed and embeddings were generated.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Evaluation Metrics</title>
<p>We adhered to the official BioASQ evaluation metrics for each phase and question type:
• Phase A (Documents and Snippets): Mean Precision (MP) for the top 10 retrieved documents
and snippets.
• Phase A+ and Phase B:
– Yes/No Questions: Accuracy (F1-score is also an official metric).
– Factoid Questions: Strict Accuracy (SA), Lenient Accuracy (LA), and Mean Reciprocal Rank (MRR).
– List Questions: Mean F1-score, Mean Precision, and Mean Recall.
– Ideal Answers (Summary, Factoid, List, Yes/No): ROUGE-2 Recall and ROUGE-SU4
Recall, as well as manual assessment scores (Readability, Information Recall, Information
Precision, Information Repetition).</p>
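As a concrete reference for one of these metrics, MRR over a batch of factoid questions can be computed as in this sketch (the inputs are illustrative):

```python
def mean_reciprocal_rank(gold, predictions):
    """MRR: average of 1/rank of the first correct answer per question.

    gold: one set of acceptable answers per question.
    predictions: one ranked list of candidate answers per question.
    A question with no correct candidate contributes 0 to the mean.
    """
    total = 0.0
    for answers, ranked in zip(gold, predictions):
        for rank, cand in enumerate(ranked, start=1):
            if cand in answers:
                total += 1.0 / rank
                break
    return total / len(gold)
```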
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
<p>This section details the performance of our five system configurations (config-1 to config-5) in
BioASQ 13 Task 13b, Test Batch 1, based on the official results released by BioASQ.</p>
      <sec id="sec-4-1">
        <title>4.1. Phase A: Document and Snippet Retrieval</title>
        <p>Our ensemble strategy in Config-4 achieved the highest Mean Precision of 0.0619 for documents and
0.0284 for snippets among our configurations. The other configurations also performed competitively.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Phase A+: Answer Generation from System-Retrieved Context</title>
<p>Config-3 achieved 100% accuracy on Yes/No questions. Config-5 led our submissions
in List question F-Measure (0.2338). For ideal answers, Config-1 achieved the best ROUGE-2 Recall
(0.2551) among our configurations.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Phase B: Answer Generation from Gold Context</title>
<p>Our Phase B configurations showed competitive results, with Config-5 (an ensemble of Claude Sonnet 3.7
and Claude Sonnet 3) achieving 0.5926 Strict Accuracy on Factoid questions and an F-measure of 0.5444
on List questions in Test Batch 1. The LoRA-finetuned Qwen2.5-3B model (represented by Config-1 in the
Phase B ideal-answer table, which, as implied by the abstract, used the finetuned model for some ideal-answer
evaluations) showed good generative capabilities.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>Our participation in BioASQ 13 Task 13b provided several insights into building efective biomedical
question answering systems.</p>
      <p>The multi-stage retrieval and reranking pipeline in Phase A demonstrated its utility. While initial
semantic retrieval with ‘e5-base‘ cast a wide net, the subsequent reranking stage was crucial for refining
the document set. The ensemble strategy (Config-4) outperforming individual rerankers highlights the
benefit of combining diverse relevance signals. The ‘MiniLM-L6‘ model proved adequate for snippet
selection, though more advanced techniques could further improve this stage.</p>
      <p>The Phase A+ results underscored the dependency of LLM-based answer generation on the quality
of the retrieved context. The variation in best-performing configurations across different question
types (Config-3 for Yes/No, Config-5 for List, Config-1 for Ideal Answers) suggests that different
retrieval/reranking strategies might be optimal for different answer granularities.</p>
      <p>Phase B allowed for a more direct comparison of LLM capabilities using gold context. The
LoRA-finetuned ‘Qwen2.5-3B-Instruct‘ showed promise, particularly for generating coherent ideal answers.
This indicates that even smaller open-source models, when appropriately finetuned on domain-specific
data, can be competitive. However, for tasks requiring precise extraction of facts (factoid, list, and yes/no
exact answers), larger proprietary models like those from the Claude family generally exhibited superior
performance. This suggests a trade-off: finetuned models offer customization and potentially lower cost,
while state-of-the-art proprietary models often provide higher accuracy on specific extraction tasks
out-of-the-box. The observed "ERROR" outputs from some Bedrock models (e.g., Nova Pro on certain
queries in our internal tests) highlight potential robustness issues or strict input format requirements
for these APIs, which might have impacted our official submissions if similar issues occurred.</p>
      <p>A key challenge remains the effective handling of long contexts and the synthesis of information from
multiple snippets. While LLMs are increasingly capable, ensuring they focus on the most relevant pieces
of information within a large context and avoid hallucination is critical, especially in the biomedical
domain where accuracy is paramount. The structured JSON output requirement also posed a challenge,
as LLMs can sometimes fail to adhere strictly to complex formatting instructions, necessitating robust
parsing and error handling.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Future Work</title>
      <p>PJs-team’s participation in BioASQ 13 Task 13b successfully demonstrated a comprehensive pipeline for
biomedical question answering, integrating semantic retrieval, multi-model reranking, and LLM-based
answer generation. Our ensemble reranking strategy (Config-4) yielded our best results in Phase A for
both document and snippet retrieval. In Phases A+ and B, different configurations leveraging proprietary
LLMs (primarily Claude Sonnet 3 and its variants) showed strong performance across various question
types, particularly Config-3 for Yes/No questions and Config-5 for List questions in Phase A+. Our
LoRA-finetuned ‘Qwen2.5-3B-Instruct‘ was competitive for ideal answer generation in Phase B.</p>
      <p>Future work will focus on several areas. Firstly, we plan to explore more advanced reranking
models, possibly including cross-encoders, and to develop more sophisticated ensemble techniques.
Secondly, refining prompting strategies for LLMs, perhaps by incorporating few-shot examples or
developing adaptive prompting based on question complexity, could improve answer accuracy and
coherence. Thirdly, continued exploration of finetuning various open-source LLMs on larger and more
diverse biomedical QA datasets is crucial to improve their performance on exact answer tasks. Finally,
integrating methods for explicit biomedical entity recognition and relation extraction prior to or in
conjunction with LLM generation could further improve the precision and factuality of the answers.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>We thank the BioASQ organizers for providing the dataset and the platform for this challenge. We also
acknowledge the developers of the open-source models and tools used in this work.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used Gemini 2.5 Pro for grammar and spelling
checking. After using this tool, the author(s) reviewed and edited the content as needed and take(s) full
responsibility for the publication’s content.</p>
    </sec>
    <sec id="sec-9">
      <title>A. Prompts Used for LLM-Based Answer Generation</title>
      <p>The core prompt structure used for interacting with the Large Language Models (LLMs) in Phases A+
and B is detailed below.</p>
      <sec id="sec-9-1">
        <title>A.1. System Prompt</title>
        <p>The following system prompt was provided to the LLM to set the context and overall task:
You will be given a biomedical question, some abstracts of
relevant research papers, and the format you have to answer in.
answer accurately in JSON in &lt;JSON&gt;..&lt;/JSON&gt; tags
Also output "ideal_answer" which is a single paragraph-sized text
ideally summarizing the most relevant information from articles and
snippets.It is intended to approximate a short text that a
biomedical expert would write to answer the corresponding question
(e.g., including prominent supportive information).
output:
&lt;JSON&gt;
{
}
&lt;/JSON&gt;</p>
      </sec>
      <sec id="sec-9-2">
        <title>A.2. User Prompt Template</title>
        <p>Context: {CONTEXT_SNIPPETS}
Question: {QUESTION_BODY}
Format: {FORMAT_INSTRUCTIONS}
The user prompt was dynamically constructed based on the question, retrieved context (snippets), and
the question type. The general template was:
• {CONTEXT_SNIPPETS}: Contained the concatenated text from the selected (or gold) snippets,
typically numbered for clarity.
• {QUESTION_BODY}: The biomedical question.
• {FORMAT_INSTRUCTIONS}: This section varied based on the question type (‘factoid‘, ‘list‘,
‘summary‘, ‘yesno‘) and included the specific instructions and example JSON output for the
‘exact_answer‘ field, as detailed in the BioASQ guidelines and reflected in our notebook implementations.</p>
        <p>For example, for a "factoid" question, this section would include:
These are questions that, strictly speaking, require a particular
entity name (e.g., of a disease, drug, or gene), a number, or a
similar short expression as an answer, though again a longer answer
may be desirable in practice.</p>
        <p>Return a list of lists. Each of the inner list (up to 5 inner lists
are allowed) should contain the name of the entity (or number, or
other similar short expression) sought by the question.</p>
        <p>No multiple names (synonyms) should be submitted for any entity,
therefore each inner list should only contain one element.</p>
        <p>Example output:
&lt;JSON&gt;
{
"ideal_answer":"...",
"exact_answer":[["autosomal dominant"],</p>
        <p>["Facioscapulohumeral muscular dystrophy (FSHD)"]]
}
&lt;/JSON&gt;</p>
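Assembling the user prompt from this template can be sketched as follows (the helper name is ours):

```python
def build_user_prompt(snippets, question, format_instructions):
    """Fill the Context/Question/Format template from Appendix A.2.

    Snippets are numbered for clarity, as in our submissions;
    format_instructions carries the per-question-type text
    (factoid, list, summary, yesno) with its example JSON output.
    """
    context = "\n".join(f"{i}. {s}" for i, s in enumerate(snippets, 1))
    return (
        f"Context: {context}\n"
        f"Question: {question}\n"
        f"Format: {format_instructions}"
    )
```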
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] G. Tsatsaronis, G. Balikas, P. Malakasiotis, I. Partalas, M. Zschunke, M. R. Alvers, D. Weissenborn, A. Krithara, S. Petridis, D. Polychronopoulos, Y. Almirantis, J. Pavlopoulos, N. Baskiotis, P. Gallinari, T. Artiéres, A.-C. Ngonga Ngomo, N. Heino, E. Gaussier, L. Barrio-Alvers, M. Schroeder, I. Androutsopoulos, G. Paliouras, An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition, BMC Bioinformatics 16 (2015) 138. doi:10.1186/s12859-015-0564-6.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] A. Nentidis, G. Katsimpras, A. Krithara, M. Krallinger, M. Rodríguez-Ortega, E. Rodriguez-López, N. Loukachevitch, A. Sakhovskiy, E. Tutubalina, D. Dimitriadis, G. Tsoumakas, G. Giannakoulas, A. Bekiaridou, A. Samaras, G. M. Di Nunzio, S. Marchesin, M. Martinelli, G. Silvello, G. Paliouras, Overview of BioASQ 2025: The thirteenth BioASQ challenge on large-scale biomedical semantic indexing and question answering, in: J. Carrillo-de-Albornoz, J. Gonzalo, et al. (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Sixteenth International Conference of the CLEF Association (CLEF 2025), 2025.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] A. Krithara, A. Nentidis, K. Bougiatiotis, G. Paliouras, BioASQ-QA: A manually curated corpus for Biomedical Question Answering, Scientific Data 10 (2023) 170.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] L. Wang, N. Yang, X. Huang, B. Jiao, L. Yang, D. Jiang, R. Majumder, F. Wei, Text embeddings by weakly-supervised contrastive pre-training, arXiv preprint arXiv:2212.03533 (2022).</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] Y. A. Malkov, D. A. Yashunin, Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs, IEEE Transactions on Pattern Analysis and Machine Intelligence 42 (2020) 824-836. doi:10.1109/TPAMI.2018.2889473.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] Z. Nussbaum, J. X. Morris, B. Duderstadt, A. Mulyar, Nomic embed: Training a reproducible long context text embedder, 2024. arXiv:2402.01613.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] Z. Li, X. Zhang, Y. Zhang, D. Long, P. Xie, M. Zhang, Towards general text embeddings with multi-stage contrastive learning, arXiv preprint arXiv:2308.03281 (2023).</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>I.</given-names>
            <surname>Abdelaziz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Basu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kumaravel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stallone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Panda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Rizk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. P. S.</given-names>
            <surname>Bhargav</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Crouse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gunasekara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ikbal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Karanam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Munawar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Neelam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Raghu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Meza Soria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sreedhar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Venkateswaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Unuvar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cox</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Roukos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lastras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kapanipathi</surname>
          </string-name>
          ,
          <article-title>Granite-function calling model: Introducing function calling abilities via multi-task learning of granular tasks</article-title>
          ,
          <source>in: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024)</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>MiniLM: Deep self-attention distillation for task-agnostic compression of pre-trained transformers</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems</source>
          <volume>33</volume>
          (NeurIPS
          <year>2020</year>
          ), pp.
          <fpage>5776</fpage>
          -
          <lpage>5788</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Dang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Men</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <article-title>Qwen technical report</article-title>
          ,
          <source>arXiv preprint arXiv:2309.16609</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wallis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Allen-Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>LoRA: Low-rank adaptation of large language models</article-title>
          ,
          <source>arXiv preprint arXiv:2106.09685</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>W.</given-names>
            <surname>Kwon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. H.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Gonzalez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Stoica</surname>
          </string-name>
          ,
          <article-title>Efficient memory management for large language model serving with PagedAttention</article-title>
          ,
          <source>in: Proceedings of the 29th ACM Symposium on Operating Systems Principles (SOSP '23)</source>
          , ACM,
          <year>2023</year>
          . doi:10.1145/3600006.3613165.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>