<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anastasios Nentidis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Georgios Katsimpras</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anastasia Krithara</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Georgios Paliouras</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>NCSR Demokritos</institution>
          ,
          <addr-line>Athens</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents an overview of the Question Answering (QA) tasks in the thirteenth edition of the BioASQ challenge, on large-scale biomedical semantic indexing and QA, which is part of the Conference and Labs of the Evaluation Forum (CLEF) 2025. For more than a decade, BioASQ has been serving as a key platform for advancing the state-of-the-art in biomedical information retrieval, NLP, and QA. In this paper, we present a comprehensive overview of the biomedical QA tasks 13b and Synergy13 of the thirteenth BioASQ challenge (BioASQ 13). This year, 49 teams with more than 160 systems participated in these two tasks of the challenge, with 46 focusing on task 13b, on biomedical semantic QA, and 5 on task Synergy13, on QA for open questions on developing biomedical topics. The competitive performance achieved by several participating systems in the QA tasks of BioASQ 13 highlights the continuous advancement of state-of-the-art methods in the field, in alignment with previous editions of the tasks.</p>
      </abstract>
      <kwd-group>
        <kwd>Biomedical knowledge</kwd>
        <kwd>Semantic Indexing</kwd>
        <kwd>Question Answering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Overview of the Tasks</title>
      <sec id="sec-2-1">
        <title>2.1. Biomedical semantic QA - Task 13b</title>
        <p>
          Task 13b introduces a comprehensive question-answering challenge in the biomedical field, requiring
participants to develop systems that address all stages of question answering. As in previous editions,
the task focuses on four question types: ‘yes/no,’ ‘factoid,’ ‘list,’ and ‘summary’ questions [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
        <p>
          In the thirteenth edition of the BioASQ Challenge, participating teams received an updated version of
the BioASQ QA training dataset [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], containing 5,389 questions that had been annotated with relevant
golden elements and answers from previous task versions [
          <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
          ]. These questions served as the basis
for developing their systems. Table 1 provides a detailed overview of both the training and testing sets
for task 13b. A notable observation from these statistics is the significantly larger average number of
documents and snippets within the training data compared to the test batches. This can be attributed
to two main factors. First, in the early years of BioASQ the annotation with relevant documents and
snippets by the experts was exhaustive, in an attempt to identify as many relevant items as possible in
the corpus. These questions are part of the training datasets, affecting the average number of relevant
items per question. Currently, only a sufficient number of relevant answers is required when the initial
version of the data is developed. Still, when the participants submit their responses, the experts assess
the submitted items and enrich the ground-truth data with potential additional relevant items detected
by the participants. The numbers of relevant items for the test sets in Table 1 are preliminary, before the
enrichment by the assessment process, which is still in progress. The final evaluation of the participants
will be against these enriched relevant items, ensuring that all the submitted items that are relevant are
indeed handled as such.
        </p>
        <p>
          Task 13b, similar to the previous version of the task (12b), was structured into three phases [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. As in
older task versions, task 13b was divided into four independent bi-weekly batches, and the three phases
for each batch ran for two consecutive days [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. The three phases of task 13b were: (phase A)
the retrieval of the required information, (phase A+) answering the question without golden feedback,
and (phase B) answering the question with golden feedback. Participants were given 24 hours to
submit their system’s responses after receiving the
test set for each respective phase. This year, each test set consisted of 85 questions. For each test
set, the respective questions, written in English, were released for phase A, requiring participants to
identify and submit relevant elements from designated resources, including PubMed/MedLine articles
and snippets extracted from these articles. Then, these questions were also released in phase A+ and the
participating systems were asked to respond with exact answers, that is, entity names or short phrases,
and ideal answers, that is, natural language summaries of the requested information. Finally, during
phase B, manually selected relevant articles and snippets related to these questions were also made
available, and participating systems were once again asked to provide exact answers and ideal answers.
        </p>
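        <p>For illustration, a question and a system response in this setting can be represented roughly as follows. This is a sketch of the shape of the data only; field names follow the publicly released BioASQ JSON datasets, but the authoritative schema is the one provided in the participants’ area.</p>
```python
# Illustrative sketch of a task 13b question and a system response.
# Field names follow the public BioASQ JSON data; the authoritative
# schema is the one in the participants' area.
question = {
    "id": "q1",                # hypothetical question identifier
    "type": "factoid",         # yesno | factoid | list | summary
    "body": "Which gene is mutated in cystic fibrosis?",
}

response = {
    "id": question["id"],
    # Phase A: up to 10 relevant articles (hypothetical PMID) and snippets.
    "documents": ["http://www.ncbi.nlm.nih.gov/pubmed/1380673"],
    "snippets": [],
    # Phases A+ and B: exact answer (lists of synonyms for factoid
    # questions) and an ideal answer in natural language.
    "exact_answer": [["CFTR"]],
    "ideal_answer": "Cystic fibrosis is caused by mutations in the CFTR gene.",
}
```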
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Synergy13 Task</title>
        <p>Introduced in the ninth BioASQ edition [13], the Synergy task aimed to foster collaboration between
biomedical experts studying COVID-19 and automated question-answering systems participating in
BioASQ. The core objective is to create a synergy where experts assess system responses, and this
feedback is used to iteratively improve the systems [14]. The task continued to focus on COVID-19
in the tenth edition of BioASQ [15], and it was extended to any developing biomedical topic in the
subsequent editions of the challenge [16, 17, 18].</p>
        <p>In the process depicted in Figure 1, competing systems provide their initial responses to open
questions related to emerging problems. These responses, along with relevant documents and snippets,
are evaluated by experts. Subsequently, the experts provide feedback to the systems and address any
new or pending questions.</p>
        <p>
          As in previous years, the Synergy task (Synergy13) comprised four rounds, with a two-week interval
between each round, and focused on emerging biomedical topics, drawing from relevant documents
in the current PubMed version [
          <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
          ]. Consistent with earlier versions, the questions posed were
open-ended, allowing for dynamic responses [19, 20].
        </p>
        <p>
          In the Synergy task, during each round, the system responses and expert feedback address the same
questions, unless those questions have already been closed by experts due to receiving a comprehensive
and definite answer. Specifically, in Synergy13, a group of four biomedical experts contributed a total of
74 open biomedical questions. This set includes 47 new questions on developing health topics, such as
infectious, rare, and genetic diseases, and women’s and reproductive health, and 27 questions from the
previous version of the task [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], which remained open and were enriched with more recent evidence
and updated answers. The experts evaluated the retrieved material (including documents and snippets)
and the responses submitted by participating systems in all four rounds. Table 2 shows the details of
the datasets used in task Synergy13.
        </p>
        <p>Synergy13, similar to task 13b, addresses four question types: yes/no, factoid, list, and summary, and
two types of answers, exact and ideal. Moreover, the evaluation of systems relies on the same measures
used in task 13b. Upon completing the Synergy13 task, relevant material was identified for answering
roughly 80% of the questions. Additionally, around 47% of the questions had at least one ideal answer
submitted by the systems, which was deemed satisfactory by the expert who posed the question.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Overview of participation</title>
      <p>In this year’s BioASQ challenge, over 160 distinct systems engaged in tasks 13b and Synergy13 with a
total of 49 teams, 39 of which were new. The high percentage of teams joining the BioASQ challenge
for the first time, which is almost 80%, indicates the enduring interest of the community in large-scale
biomedical question answering. Specifically, 46 of these teams submitted to at least one phase of task
13b and 5 to task Synergy13, with several teams participating in more than one task and phase, as
demonstrated in Figure 2.</p>
      <p>
        In line with previous years [
        <xref ref-type="bibr" rid="ref12">12, 20</xref>
        ], task b attracted more participants than Synergy, with a large
increase in the total number of participating teams this year in comparison to last year, as illustrated in
Figure 3. The increased participation in task 13b can be partly attributed to 15 student teams from a
course on Advanced Information Retrieval at TU Wien, which incorporated phase A of BioASQ task
13b into its course assignments, highlighting the educational potential of BioASQ. Specifically,
these teams account for about 38% of all new teams in the QA tasks of BioASQ 13.
      </p>
      <p><bold>3.1. Task 13b</bold></p>
      <p>This year, 46 teams participated in task 13b, submitting a total of 146 distinct systems across all three
phases A, A+, and B. Specifically, 34, 20, and 26 teams competed in phases A, A+, and B, with 95, 79, and
88 distinct systems, respectively. Eleven of these teams were involved in all three phases, as depicted in
Figure 2.</p>
      <p>An overview of the technologies utilized by the teams is outlined in Table 3. Additional details for
specific systems can be found in the workshop proceedings. As in previous years, the open-source
system OAQA [21], which achieved top performance in older editions of BioASQ [22, 23], was used
as a baseline for phase B exact answers. This system is based on the UIMA framework and relies on
traditional NLP and Machine Learning approaches and tools, such as MetaMap and LingPipe.</p>
      <p>The UA team from the University of Aveiro participated in all three phases of the task with five
systems. In phase A, their systems followed a two-stage retrieval pipeline in line with previous years’
submissions [41, 42]. To enhance the BM25 results they used the BGE-M3 model, and reciprocal rank
fusion to combine model outputs. For phases A+ and B, their systems adapted RAG with prompts
incorporating relevant abstracts using instruction-based transformer models such as llama-3 70B and
Gemma-3 27B.</p>
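        <p>Reciprocal rank fusion combines ranked lists by scoring each document with the sum of 1/(k + rank) over the lists it appears in. A minimal sketch follows; k = 60 is the value commonly used in the literature, and the UA team’s exact configuration may differ.</p>
```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists (e.g. a BM25 run and a dense BGE-M3 run)
    into one list, scoring each document by sum(1 / (k + rank))."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = reciprocal_rank_fusion([
    ["d1", "d2", "d3"],   # e.g. a BM25 ranking
    ["d1", "d3", "d2"],   # e.g. a dense-retrieval ranking
])
```
        <p>With this scheme, documents ranked highly by either retriever rise to the top, and agreement between the two rankings is rewarded.</p>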
      <p>The UR team from the Universität Regensburg competed in all phases of the task with five systems.
Their systems employed few-shot prompting (typically 0-shot or 10-shot configurations) and an
experimental two-step self-feedback mechanism. In this feedback approach, an LLM generated an initial
answer, then provided critical feedback on its own output, and finally refined the answer based on
this critique. This process was applied across yes/no, factoid, list, and ideal answer types. For the
retrieval stages, their systems utilized LLM-driven query expansion, which also incorporated a similar
self-feedback loop for query refinement based on initial search results, coupled with Elasticsearch for
document retrieval, followed by LLM-based snippet extraction and re-ranking.</p>
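        <p>The two-step self-feedback mechanism described above can be sketched as a generate-critique-refine loop. Here <monospace>ask_llm</monospace> is a hypothetical callable standing in for whatever chat-completion client the systems actually used:</p>
```python
def answer_with_self_feedback(question, snippets, ask_llm):
    """Generate an answer, let the LLM critique its own draft,
    then refine the draft based on that critique."""
    context = "\n".join(snippets)
    draft = ask_llm(f"Using these snippets:\n{context}\nAnswer: {question}")
    critique = ask_llm(f"Question: {question}\nDraft answer: {draft}\n"
                       "List factual or formatting problems in this draft.")
    final = ask_llm(f"Question: {question}\nDraft: {draft}\n"
                    f"Critique: {critique}\nRewrite the answer, fixing "
                    "the listed problems.")
    return final
```
        <p>The same loop reportedly served query refinement in the retrieval stage, with initial search results taking the place of the snippets.</p>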
      <p>The NCU team from the National Central University participated with five systems. Their systems
employed a basic Retrieval-Augmented Generation (RAG) framework consisting of a retriever, a reranker,
and an LLM for language generation. The initial retriever was BM25, and the documents were
further re-ranked using the bge-reranker-v2-m3 model to identify the most relevant articles and snippets.
For answer generation, their systems used the meta-llama/Llama-3.1-8B-Instruct and GPT-4o models.
Furthermore, for the Phase A+ task, they extended the answer generation pipeline previously developed
by Chih et al. [43]. For phase B, the systems utilized GPT-4o, o3-mini, and Gemini 2.0 Flash. Each
system employed either direct or two-stage generation for ideal and exact answers.</p>
      <p>Another team participating in all phases is the team from the BSRC Alexander Fleming Institute. For
Phases A and A+, their systems focused on document retrieval using the BM25 algorithm enhanced
with RM3, and re-ranking based on the relevance of associated text snippets, similar to their previous
year’s participation [44]. In phases A+ and B, their systems explored three distinct prompting strategies
for generating exact answers (snippet-based, abstract-based, and extended abstract-based). To further
enhance answer quality, their systems built upon the team’s earlier LLM ensemble strategy [44], extending
it to all exact answer types. For list and factoid questions, they used the union of answers across multiple
LLMs to maximize recall.</p>
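        <p>This recall-oriented union of answers can be sketched as follows; the normalization step (lowercasing and trimming) is our assumption, not necessarily the team’s exact matching rule:</p>
```python
def union_of_answers(per_model_answers):
    """Merge list answers from several LLMs, de-duplicating on a
    normalized form while keeping the first surface form seen."""
    seen = {}
    for answers in per_model_answers:
        for answer in answers:
            key = answer.strip().lower()
            if key not in seen:
                seen[key] = answer.strip()
    return list(seen.values())

merged = union_of_answers([["CFTR", "TP53"], ["tp53", "BRCA1"]])
# merged keeps one entry per distinct gene
```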
      <p>The PJs team participated in all three phases, submitting five system configurations in total. For
Phase A, their systems employed dense retrieval using the e5-base model and then re-ranking using
different models (e.g. modernbert-embed-base, gte-large, granite-25m, gte ensemble, modernbert, and
granite). For snippet selection, MiniLM-L6 was utilized. For Phase A+, their systems used Claude Sonnet
3 with prompt engineering on the top retrieved results, while for Phase B, the same prompt design
was utilized with different LLM models including Qwen2.5-3B, Claude Sonnet 3.7, Claude Haiku 3.5,
and an ensemble of Sonnet 3.7 and Sonnet 3.0 with post-processing to correct errors.</p>
      <p>Also, the JU_NLP team from Jadavpur University participated in all phases. For phase A, their
systems perform dense retrieval using a fine-tuned minilm model. Then, a RoBERTa-base-SQuAD2
model extracts the answer snippet. For phase B, a separately fine-tuned T5 model for generating ideal
and exact answers is utilized. Finally, phase A+ combines these processes, first using the Phase A system
to extract relevant articles and snippets, and then feeding those results into the Phase B system to
produce ideal and exact answers.</p>
      <p>The UniTor team from Università degli Studi di Roma Tor Vergata participated in all three phases.
Their systems employed a multi-stage pipeline that combined sparse and dense retrieval techniques,
supervised re-ranking, supervised LLM snippet extraction, and supervised LLM answer generation.
All components were trained and evaluated using official BioASQ datasets and external biomedical
resources such as PubMed.</p>
      <p>The UniBol team from Universidad Tecnológica de Bolivar participated in all three phases. Their
systems employed the PubMedBERT model to perform dense retrieval and Qwen3 8B with prompt
engineering to generate answers.</p>
      <p>Another team that participated in all phases is the lasigeBioTM team. Their systems used Mistral as
the baseline model for all phases. Furthermore, they incorporated external knowledge from various
ontologies and knowledge bases like Human Disease Ontology and NCBI Gene, with BENT tool
used for named-entity recognition and linking. Also, some systems employed a Decoding on Graphs
methodology, utilizing Monarch KG and MARISA Trie for structured reasoning.</p>
      <p>The AQAMS team from Universidad Europea took part in all phases. For phases A and A+ their
systems used a two-stage pipeline. First, LlamaIndex embeddings were used to perform dense retrieval.
Then, OpenAI GPT models with structured prompt templates were utilized to generate answers. For
phase B, their systems focused on biomedical NER using a fine-tuned model, optionally mapping terms
to UMLS concepts, and generating answers, all deployed via a user-friendly Streamlit interface.</p>
      <p>The ZUT team from Zhongyuan University of Technology participated in all phases. For phase A,
their systems employed a query expansion-driven multi-stage retrieval and re-ranking framework that
utilized different DeepSeek models. In phases A+ and B, their systems applied different prompting
strategies and also experimented with supervised fine-tuning of LLMs.</p>
      <p>The FSU team from Florida State University participated in phases A and A+. Their systems followed
a multi-stage retrieval and reasoning framework, leveraging an LLM for keyword extraction and answer
generation, along with an internal search engine for document retrieval. A re-ranker model and an
LLM-based scoring mechanism filter retrieved documents for quality. Additionally, their systems
integrate structured biomedical knowledge from IKraph, an in-house knowledge graph built from
PubMed abstracts using large-scale relation extraction and causal inference.</p>
      <p>The GT team from Georgia Tech participated in both phases A and B. For phase A, their systems
employed a custom index built with the bge-large-en embedding model and used a fine-tuned
ms-marco12 model for re-ranking, with some configurations also incorporating GPT-4o as an additional re-ranker.
In phase B, the team utilized Mistral-7B-Instruct-v0.3 and GPT-4o-turbo, with a prompt-based approach
using few-shot examples to generate answers.</p>
      <p>The MQU team from Macquarie University participated in phases A+ and B. Their systems employed
a zero-shot QA framework, prompting multiple LLMs (Gemini variants and Claude) to generate
answers based on snippets and full abstracts. A secondary synthesis step with confidence-based re-tries
refines the candidate answers into a final response, improving precision and consistency across yes/no
and factoid questions.</p>
      <p>Also, the HCMC team from the University of Science participated in phases A+ and B. Their systems
first distinguish queries as single-hop or multi-hop; multi-hop queries are decomposed into sub-queries.
Relevant documents are retrieved using the PubMed API, with queries reformulated into Conjunctive
Normal Form (CNF). Document sentences are encoded using SciSpaCy, and dense retrieval is performed
using the all-MiniLM-L12-v2 model. Finally, GPT-4o-mini is utilized to generate answers.</p>
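        <p>Reformulating a query into CNF for a boolean search interface amounts to AND-ing OR-groups of synonymous terms. A hypothetical sketch (the actual reformulation in these systems is LLM-driven):</p>
```python
def to_cnf_query(concept_synonyms):
    """Build a boolean query in CNF: AND across concepts,
    OR within each concept's synonym group."""
    clauses = []
    for synonyms in concept_synonyms:
        clauses.append("(" + " OR ".join(f'"{s}"' for s in synonyms) + ")")
    return " AND ".join(clauses)

query = to_cnf_query([["heart failure", "cardiac failure"], ["SGLT2"]])
# '("heart failure" OR "cardiac failure") AND ("SGLT2")'
```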
      <p>The VCU team from Virginia Commonwealth University participated with four different systems in
phase B. Their systems are based on a zero-shot learning approach using generative LLMs, including
Synthia-13B-GPTQ and llama-3. Their systems heavily relied on prompt engineering and answer
processing.</p>
      <p>The Evidence team from Evidence Prime participated in phase B. Their systems employed dense
retrieval using the nomic-embed-text-v1 model. Various open-source LLMs and proprietary models
were used, including GPT-4o, GPT-4.1, Claude 3.5, and Claude 3.7. For Yes/No and Factoid/List questions,
an ensemble strategy was implemented, using voting or frequency analysis of outputs from multiple
LLMs. Summary responses were generated using carefully designed, hand-crafted prompts to balance
contextual similarity and appropriate length.</p>
      <p>The DMIS team from Korea University participated in phase B. Their systems employed multiple
LLMs, including GPT-4o-mini, GPT-4, and Claude. They explored various prompting strategies, such
as standard instruction, one-by-one snippet querying, randomized snippet order, and a no-snippet
condition relying on prior knowledge. Final answers were derived through ensemble techniques, either
by aggregating log probability scores or using majority voting.</p>
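        <p>Majority voting, one of the ensemble techniques mentioned above, reduces to counting the answers returned by the individual LLMs; a minimal sketch for yes/no questions (the tie-handling policy here is our own choice, not necessarily the team’s):</p>
```python
from collections import Counter

def majority_vote(votes):
    """Return the most common answer among the LLM outputs; on a tie,
    Counter.most_common keeps the first-seen answer on top."""
    return Counter(votes).most_common(1)[0][0]

majority_vote(["yes", "yes", "no"])  # "yes"
```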
      <p>Another team that participated only in phase B is the UTB team from the Universidad Tecnológica
de Bolívar. Their systems were based on the well-known BERT model.</p>
      <sec id="sec-3-1">
        <title>3.2. Synergy13 Task</title>
        <p>In the thirteenth edition of BioASQ, five teams participated in the Synergy task (Synergy13). These
teams submitted 46 runs from 21 distinct systems. Two of these teams participated in task 13b as
well, while the remaining three focused exclusively on task Synergy13. An overview of systems and
approaches employed is provided in Table 4.</p>
        <p>In particular, the BSRC Alexander Fleming team participated with four systems. Similar to task
13b, their systems focused on LLMs, specifically the DeepSeek-R1 model, with optimized prompts and
majority voting. Also, the SINAI team from the Universidad de Jaén competed with five systems. Their
systems built upon prior research by integrating a lightweight NER-based query module, dynamic
indexing of PubMed abstracts, and few-shot prompt templates tailored to different question types.
These components are utilized within a RAG framework based on a biomedical fine-tuned LLaMA
model. The system employs a multi-stage pipeline—comprising data preparation, context extraction,
and response generation—designed specifically for biomedical question answering across the required
formats (summary, yes/no, factoid, list).</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p><bold>4.1. Task 13b</bold></p>
      <p>This section presents the evaluation measures and preliminary results for task 13b. These results are
preliminary, as the final results will be available after the manual assessment of the system responses by
the BioASQ team of experts and the enrichment of the ground truth with potential additional relevant
items, answer elements, and/or synonyms, which is still in progress.</p>
      <p>Phase A: The Mean Average Precision (MAP) was used for evaluation on document retrieval. In
particular, since BioASQ8 [46], MAP calculation is based on a modified version of Average Precision
(AP) that considers both the limit of 10 elements allowed per question in each submission and the
actual number of golden elements that is often less than 10 in practice [47]. For snippets, where a
single ground-truth snippet may overlap with several submitted ones, the interpretation of MAP is
less straightforward. Hence, since BioASQ9 [13], we use the F-measure which is based on character
overlaps2 [48]. Tables 5 and 6 present some indicative results in batch 1 for document and snippet
retrieval, respectively. The full 13b results for phase A are available online3.</p>
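      <p>Concretely, the modified AP normalizes by min(limit, |golden|) instead of |golden|, so that a submission capped at 10 documents can still reach a perfect score when more than 10 golden documents exist. A sketch of the per-question measure and the resulting MAP:</p>
```python
def modified_average_precision(retrieved, golden, limit=10):
    """Average precision with the denominator min(limit, |golden|),
    as used for BioASQ document retrieval since BioASQ8."""
    hits, precision_sum = 0, 0.0
    for k, doc in enumerate(retrieved[:limit], start=1):
        if doc in golden:
            hits += 1
            precision_sum += hits / k
    return precision_sum / min(limit, len(golden))

def mean_average_precision(per_question_runs):
    """MAP for a batch: the mean of per-question modified AP values."""
    return sum(modified_average_precision(retrieved, golden)
               for retrieved, golden in per_question_runs) / len(per_question_runs)
```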
      <p>
          Phases A+ and B: The official ranking for systems providing ideal answers is based on manual scores
assigned by the BioASQ team of experts, which assesses each ideal answer in the responses [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The final
position of systems providing exact answers is based on their average ranking in the three question
types where exact answers are required, that is “yes/no”, “list”, and “factoid”. Summary questions for
which no exact answers are submitted are not considered in this ranking. In particular, the mean F1
measure is used for the ranking in list questions, the Mean Reciprocal Rank (MRR) is used for the
ranking in factoid questions, and the F1 measure, macro-averaged over the classes of yes and no, is
used for yes/no questions. Tables 7 and 8 present some indicative results on exact answer extraction.
The full 13b results for both phase A+4 and B5 are available online.
      </p>
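      <p>For factoid questions, for example, MRR credits a system with 1/rank of the first returned candidate that matches a golden answer. A simplified sketch with plain case-insensitive matching (the official evaluation also handles synonyms):</p>
```python
def reciprocal_rank(candidates, golden_synonyms):
    """1/rank of the first candidate matching a golden synonym, else 0."""
    gold = {g.lower() for g in golden_synonyms}
    for rank, candidate in enumerate(candidates, start=1):
        if candidate.lower() in gold:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(per_question):
    """MRR: mean reciprocal rank over (candidates, golden) pairs."""
    return sum(reciprocal_rank(c, g) for c, g in per_question) / len(per_question)
```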
      <p>The top performance of the participating systems in exact answer generation for each type of question
during the thirteen years of BioASQ is presented in Figure 4. The preliminary results for task 13b reveal
that the participating systems keep achieving high scores in answering all types of questions, despite
the addition of two new experts to the BioASQ team. In batch 2 of phase B, for instance, presented in
Table 8, several systems manage to correctly answer all yes/no questions, as they also do in batches
1 and 4. In phase A+, on the other hand, where no ground-truth relevant material is available, correctly
answering all yes/no questions is more challenging, though not infeasible. In batch 2, for instance,
presented in Table 7, only one system manages to do so. The preliminary results of task 13b, phase B,
reveal improved and more consistent performance for list questions compared to previous years,
but room for improvement remains, as it does for factoid questions, where the performance across
batches fluctuates more.</p>
      <p>2 http://participants-area.bioasq.org/Tasks/b/eval_meas_2022/
3 http://participants-area.bioasq.org/results/13b/phaseA/
4 http://participants-area.bioasq.org/results/13b/phaseAplus/
5 http://participants-area.bioasq.org/results/13b/phaseB/</p>
      <sec id="sec-4-1">
        <title>4.2. Task Synergy13</title>
        <p>In task Synergy13, no relevant material was initially available for new questions. For old questions,
however, feedback from previous rounds was provided per question, that is, the documents and snippets
submitted by the participants with manual annotations of their relevance. Hence, the documents and
snippets of the feedback, that have already been assessed and released, were not considered valid for
submission in the subsequent rounds. As in task 13b, the evaluation measures for document and snippet
retrieval are MAP and F-measure respectively.</p>
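        <p>In practice this means that, before each round, a system must subtract the already-assessed material from its candidate submission; a minimal sketch:</p>
```python
def filter_new_submissions(candidate_docs, feedback):
    """Drop documents already assessed in earlier Synergy rounds.
    `feedback` maps a document id to its expert-assigned relevance label."""
    assessed = set(feedback)
    return [doc for doc in candidate_docs if doc not in assessed]

fresh = filter_new_submissions(
    ["pmid:1", "pmid:2", "pmid:3"],
    {"pmid:1": True, "pmid:3": False},   # relevant / not relevant in feedback
)
# only "pmid:2" remains eligible for submission
```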
        <p>
          In addition, due to the developing nature of the topics, answers are not yet available for all of the open
questions in each round. Therefore, only the questions indicated as “answer ready” were evaluated for
exact and ideal answers in each round. Regarding the ideal answers, the systems were ranked according
to manual scores assigned to them by the BioASQ experts during the assessment of systems responses
as in phase B of task 13b [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. As regards evaluation for the exact answers, similarly to task 13b, the mean
F1 measure, the Mean Reciprocal Rank (MRR), and the macro F1 measure are used for the ranking in list,
factoid, and yes/no questions respectively. Any exact or ideal answer that was assessed as ground-truth
quality by the experts, was included in the feedback and provided to the participants before the next
round.
        </p>
        <p>Some indicative results for the Synergy task are presented in Table 9. The full Synergy13 results are
available online6. Overall, the collaboration between participating biomedical experts and
question-answering systems allowed the progressive identification of relevant material and the extraction of exact
and ideal answers for several open questions on developing problems, such as infectious, rare, and
genetic diseases, and women’s and reproductive health. In total, after the four rounds of Synergy13,
enough relevant material was identified to provide an answer to about 80% of the questions. In addition,
about 47% of the questions had at least one ideal answer, submitted by the systems, that was considered
of ground-truth quality by the respective expert.</p>
        <p>6 http://participants-area.bioasq.org/results/synergy_v2025/</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>In this paper, we introduced the thirteenth version of the BioASQ challenge, focusing on the question
answering tasks b and Synergy. These tasks have been well-established through previous versions of
the challenge and remain timely and relevant, as indicated by the increased participation.</p>
      <p>The preliminary results of task 13b reveal the strong performance of top participating systems,
particularly in generating yes/no answers, even under the constraints of Phase A+, where no
ground-truth relevant documents and snippets were provided. System performance on list and factoid questions
was more variable, especially in Phase A+, highlighting room for improvement. These
results suggest that access to ground-truth relevant material significantly enhances response quality for
more complex question types. This emphasizes the critical role of Phase A, which involves the automatic
retrieval of relevant documents and snippets for answering a biomedical question. Performance in
Phase A showed greater variability across batches, potentially influenced by the domain expertise
of the experts who authored the questions. A diverse set of retrieval and generation strategies was
employed, including traditional IR methods, large language model (LLM)-based approaches, and systems
enriched with domain-specific biomedical knowledge. Finally, the results of Synergy13 highlight that
state-of-the-art QA systems can be useful in aiding biomedical researchers to address their specialized
information needs, in alignment with the results of previous versions of the task, despite the persisting
challenges and room for improvement.</p>
      <p>Overall, several participating systems achieved competitive performance on the QA tasks of BioASQ
13, and some of them managed to improve over the state-of-the-art performance from previous years.
Therefore, twelve years after its initial introduction, BioASQ keeps pushing the research frontier in
biomedical question answering, offering two QA tasks and several response types that cover a range of
biomedical information needs.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>The thirteenth edition of BioASQ is sponsored by Ovid, Atypon Systems Inc, and Elsevier. The
MEDLINE/PubMed data resources considered in this work were accessed courtesy of the U.S. National
Library of Medicine. BioASQ is grateful to the CMU team for providing the exact answer baselines for
task 13b.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Grammarly and ChatGPT to perform the following
tasks: grammar and spelling checks, paraphrasing, and rewording. After using these tools, the authors
reviewed and edited the content as needed and take full responsibility for the publication’s content.</p>
    </sec>
    <sec id="sec-8">
      <title>References</title>
      <p>[13] BioASQ at CLEF2021: Large-Scale Biomedical Semantic Indexing and Question Answering, in:
Advances in Information Retrieval, Springer International Publishing, Springer International
Publishing, Cham, 2021.
[14] A. Krithara, A. Nentidis, E. Vandorou, G. Katsimpras, Y. Almirantis, M. Arnal, A. Bunevicius,
E. Farré-Maduell, M. Kassiss, V. Konstantakos, S. Matis-Mitchell, D. Polychronopoulos,
J. Rodriguez-Pascual, E. G. Samaras, M. Samiotaki, D. Sanoudou, A. Vozi, G. Paliouras, BioASQ Synergy: a
dialogue between question-answering systems and biomedical experts for promoting
COVID-19 research, Journal of the American Medical Informatics Association (2024) ocae232. URL:
https://doi.org/10.1093/jamia/ocae232. doi:10.1093/jamia/ocae232.
[15] A. Nentidis, A. Krithara, G. Paliouras, L. Gasco, M. Krallinger, BioASQ at CLEF2022: The Tenth
Edition of the Large-scale Biomedical Semantic Indexing and Question Answering Challenge,
in: Advances in Information Retrieval: 44th European Conference on IR Research, ECIR 2022,
Stavanger, Norway, April 10–14, 2022, Proceedings, Part II, Springer, Springer, 2022. URL: https:
//link.springer.com/chapter/10.1007/978-3-030-99739-7_53.
[16] A. Nentidis, A. Krithara, G. Paliouras, E. Farre-Maduell, S. Lima-Lopez, M. Krallinger, BioASQ at
CLEF2023: The Eleventh Edition of the Large-Scale Biomedical Semantic Indexing and Question
Answering Challenge, in: Advances in Information Retrieval: 45th European Conference on
Information Retrieval, ECIR 2023, Dublin, Ireland, April 2–6, 2023, Proceedings, Part III, Springer,
2023, pp. 577–584.
[17] A. Nentidis, A. Krithara, G. Paliouras, M. Krallinger, L. G. Sanchez, S. Lima, E. Farre,
N. Loukachevitch, V. Davydova, E. Tutubalina, BioASQ at CLEF2024: The Twelfth Edition of the
Large-Scale Biomedical Semantic Indexing and Question Answering Challenge, in: European
Conference on Information Retrieval, Springer, 2024, pp. 490–497.
[18] A. Nentidis, G. Katsimpras, A. Krithara, M. Krallinger, M. R. Ortega, N. Loukachevitch,
A. Sakhovskiy, E. Tutubalina, G. Tsoumakas, G. Giannakoulas, A. Bekiaridou, A. Samaras, G. M.
Di Nunzio, N. Ferro, S. Marchesin, L. Menotti, G. Silvello, G. Paliouras, BioASQ at CLEF2025: The
Thirteenth Edition of the Large-Scale Biomedical Semantic Indexing and Question Answering
Challenge, in: Advances in Information Retrieval, Springer Nature Switzerland, Cham, 2025, pp.
407–415.
[19] A. Nentidis, G. Katsimpras, E. Vandorou, A. Krithara, G. Paliouras, Overview of bioasq tasks 9a, 9b
and synergy in clef2021, in: Proceedings of the 9th BioASQ Workshop A challenge on large-scale
biomedical semantic indexing and question answering, 2021. URL: http://ceur-ws.org/Vol-2936/
paper-10.pdf.
[20] A. Nentidis, G. Katsimpras, E. Vandorou, A. Krithara, G. Paliouras, Overview of BioASQ Tasks
10a, 10b and Synergy10 in CLEF2022, in: Proceedings of the 10th BioASQ Workshop A challenge
on large-scale biomedical semantic indexing and question answering, 2022. URL: https://ceur-ws.
org/Vol-3180/paper-10.pdf.
[21] Z. Y. Y. Z. E. Nyberg, Learning to Answer Biomedical Questions: OAQA at BioASQ 4B, ACL 2016
(2016) 23.
[22] G. Balikas, A. Kosmopoulos, A. Krithara, G. Paliouras, I. Kakadiaris, Results of the BioASQ tasks
of the question answering lab at CLEF 2015, CEUR Workshop Proceedings 1391 (2015).
[23] A. Krithara, A. Nentidis, G. Paliouras, I. Kakadiaris, Results of the 4th edition of BioASQ Challenge,
in: Proceedings of the Fourth BioASQ workshop, Association for Computational Linguistics,
Stroudsburg, PA, USA, 2016, pp. 1–7. URL: http://aclweb.org/anthology/W16-3101. doi:10.18653/
v1/W16-3101.
[24] R. A. A. Jonker, T. Almeida, J. Almeida, S. Matos, BIT.UA at BioASQ 13B: Revisiting Evaluation,
DPRF-Enhanced Retrieval and Fine-Tuned LLMs, in: G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.),
CLEF 2025 Working Notes, 2025.
[25] S. Ateia, U. Kruschwitz, Can Language Models Critique Themselves? Investigating Self-Feedback
for Retrieval Augmented Generation at BioASQ 2025 , in: G. Faggioli, N. Ferro, P. Rosso, D. Spina
(Eds.), CLEF 2025 Working Notes, 2025.
[26] C. Bing-Chen, J.-C. Han, H.-C. Hung, R. T.-H. Tsai, NCU-IISR: Biomedical Question Answering via
Gemini and GPT APIs in the BioASQ 13b Phase B Challenge , in: G. Faggioli, N. Ferro, P. Rosso,
D. Spina (Eds.), CLEF 2025 Working Notes, 2025.
[27] J.-C. Han, B.-C. Chih, H.-C. Hung, R. T.-H. Tsai, A Retrieval-Augmented Generation Approach
for BioASQ 13b Phase A and A+ , in: G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), CLEF 2025
Working Notes, 2025.
[28] D. Panou, A. Dimopoulos, M. Koubarakis, M. Reczko, Harnessing Collective Intelligence of LLMs
for Robust Biomedical QA: A Multi-Model Approach , in: G. Faggioli, N. Ferro, P. Rosso, D. Spina
(Eds.), CLEF 2025 Working Notes, 2025.
[29] P. Vachharajani, Exploring Retrieval-Reranking and LLM-Based Answer Generation for Biomedical</p>
      <p>QA , in: G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), CLEF 2025 Working Notes, 2025.
[30] H. P. Gupta, R. Banerjee, LLMs for Biomedical NER , in: G. Faggioli, N. Ferro, P. Rosso, D. Spina
(Eds.), CLEF 2025 Working Notes, 2025.
[31] F. Borazio, D. Croce, R. Basili, UniTor at BioASQ 2025: Modular Biomedical QA with Synthetic
Snippets and Multiple Task Answer Generation , in: G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.),
CLEF 2025 Working Notes, 2025.
[32] A. Morillo, E. Puertas, J. C. M. Santos, J. S. Castaneda, C. A. Palomino, VerbanexAI at BioASQ
13B: PubMed API and LLM-Driven Hybrid Retrieval for Biomedical Question Answering , in:
G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), CLEF 2025 Working Notes, 2025.
[33] P. R. C. Lopes, S. I. R. Conceição, M. Fernandes, F. M. Couto, lasigeBioTM: A lean biomedical QA
system empowered by structured knowledge , in: G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.),
CLEF 2025 Working Notes, 2025.
[34] J. Angulo, V. Yeste, AQAMS and AQAMS2: Multi Agent Systems for Biomedical Question
Answering , in: G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), CLEF 2025 Working Notes, 2025.
[35] J. Tang, H. Yang, K. Xiong, H. Li, P. Quaresma, H. Yu, W. Zhang, M. Song, Y. Jiang, Applying
DeepSeek to BioASQ Task 13B: Using Supervised Fine-Tuning and Few-Shot Learning , in:
G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), CLEF 2025 Working Notes, 2025.
[36] S. Verma, F. Jiang, X. Xue, Beyond Retrieval: Ensembling Cross-Encoders and GPT Rerankers with
LLMs for Biomedical QA , in: G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), CLEF 2025 Working
Notes, 2025.
[37] D. Galat, D. Molla-Aliod, LLM Ensemble for RAG: Role of context length in zero-shot Question
Answering for BioASQ Challenge , in: G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), CLEF 2025
Working Notes, 2025.
[38] E. Quiñones, E. A. P. D. Castillo, Evaluation of a System for Generating Exact and Ideal Responses
, in: G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), CLEF 2025 Working Notes, 2025.
[39] D. Stachura, J. Konieczna, A. Nowak, Are Smaller Open-Weight LLMs Closing the Gap to
Proprietary Models for Biomedical Question Answering? , in: G. Faggioli, N. Ferro, P. Rosso, D. Spina
(Eds.), CLEF 2025 Working Notes, 2025.
[40] H. Kim, H. Lee, Y. Cho, J. Park, J. Park, S. Park, Y. T. Chok, S. Baek, D. Lee, J. Kang, Prompting
Matters: Snippet-Aware Strategies for Biomedical QA with LLMs in BioASQ 13b , in: G. Faggioli,
N. Ferro, P. Rosso, D. Spina (Eds.), CLEF 2025 Working Notes, 2025.
[41] T. Almeida, R. A. A. Jonker, R. Poudel, J. M. Silva, S. Matos, Bit. ua at bioasq 11b: Two-stage ir
with synthetic training and zero-shot answer generation., in: G. Faggioli, N. Ferro, P. Galuščáková,
A. García Seco de Herrera (Eds.), Working Notes of CLEF 2024 - Conference and Labs of the
Evaluation Forum, 2024.
[42] T. Almeida, R. A. Jonker, J. Reis, J. R. Almeida, S. Matos, Bit. ua at bioasq 12: From retrieval to
answer generation, in: G. Faggioli, N. Ferro, P. Galuščáková, A. García Seco de Herrera (Eds.),
Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum, 2024.
[43] B.-C. Chih, J.-C. Han, R. Tzong-Han Tsai, NCU-IISR: Enhancing Biomedical Question Answering
with GPT-4 and Retrieval Augmented Generation in BioASQ 12b Phase B, in: G. Faggioli, N. Ferro,
P. Galuščáková, A. García Seco de Herrera (Eds.), CLEF Working Notes, 2024.
[44] D. Panou, A. Dimopoulos, M. Reczko, Farming Open LLMs for Biomedical Question Answering,
in: G. Faggioli, N. Ferro, P. Galuščáková, A. García Seco de Herrera (Eds.), CLEF Working Notes,
2024.
[45] S. D. Romero, L. A. Ureña-López, E. Martínez-Cámara, SINAI at CLEF 2025: A Multi-Stage RAG
Pipeline for Biomedical Semantic Question Answering , in: G. Faggioli, N. Ferro, P. Rosso, D. Spina
(Eds.), CLEF 2025 Working Notes, 2025.
[46] M. Krallinger, A. Krithara, A. Nentidis, G. Paliouras, M. Villegas, BioASQ at CLEF2020: large-scale
biomedical semantic indexing and question answering, in: European Conference on Information
Retrieval, Springer, 2020, pp. 550–556.
[47] A. Nentidis, A. Krithara, K. Bougiatiotis, M. Krallinger, C. Rodriguez-Penagos, M. Villegas,
G. Paliouras, Overview of BioASQ 2020: The eighth BioASQ challenge on Large-Scale Biomedical
Semantic Indexing and Question Answering, in: Experimental IR Meets Multilinguality,
Multimodality, and Interaction Proceedings of the Eleventh International Conference of the CLEF
Association (CLEF 2020), Thessaloniki, Greece, September 22–25, 2020, Proceedings, volume 12260,
Springer, 2020.
[48] A. Nentidis, G. Katsimpras, E. Vandorou, A. Krithara, L. Gasco, M. Krallinger, G. Paliouras, Overview
of BioASQ 2021: The Ninth BioASQ Challenge on Large-Scale Biomedical Semantic Indexing and
Question Answering, in: International Conference of the Cross-Language Evaluation Forum for
European Languages, Springer, 2021, pp. 239–263.
[49] A. Nentidis, A. Krithara, K. Bougiatiotis, G. Paliouras, I. Kakadiaris, Results of the sixth edition
of the BioASQ Challenge, in: Proceedings of the 6th BioASQ Workshop A challenge on
largescale biomedical semantic indexing and question answering, 1, Association for Computational
Linguistics, Brussels, Belgium, 2018, pp. 1–10. URL: http://aclweb.org/anthology/W18-5301. doi:10.
18653/v1/W18-5301.
[50] A. Nentidis, G. Katsimpras, E. Vandorou, A. Krithara, A. Miranda-Escalada, L. Gasco, M. Krallinger,
G. Paliouras, Overview of BioASQ 2022: The Tenth BioASQ Challenge on Large-Scale Biomedical
Semantic Indexing and Question Answering, in: Lecture Notes in Computer Science (including
subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), volume
13390 LNCS, 2022, pp. 337–361. doi:10.1007/978-3-031-13643-6_22. arXiv:2210.06852.</p>
    </sec>
  </body>
  <back>
  </back>
</article>