<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Biomedical Semantic Question Answering - Answering Systems Using Different LLMs for Subtasks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Samitinjaya</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dipankar Das</string-name>
          <email>dipankar.dipnil2005@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Jadavpur University</institution>
          ,
          <addr-line>Kolkata</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>This work presents our systems for BioASQ Task 13b, for which we built baseline systems using sentence transformers and Large Language Models (LLMs). The systems address diverse biomedical questions by integrating Information Retrieval (IR), extractive QA, and summarization. The information retrieval component extracts related articles (documents) and returns them in decreasing order of relevance. Article retrieval is performed with the Bio.Entrez module, and the articles are ranked for relevance using the sentence-transformers/all-MiniLM-L6-v2 model. Snippets that contain the answer, or part of it, are then retrieved from each document using deepset/roberta-base-squad2. Exact and ideal answers, the extractive and generative answers to each question respectively, are both produced with a fine-tuned T5 model (google-t5/t5-base). The paper first explains the problem statement, the phases of the challenge, and the answers required in each phase, for an easy understanding of the task, and then gives a brief overview of our approach. For clarity, we divide the three phases of the challenge into multiple subtasks, each solving a particular part of the challenge, and provide an overview of how the systems are created and how the subtasks are approached. The implementation part describes how each subtask has been implemented to create systems that solve the challenge phases, and the framework of the system is given for an in-depth understanding. The official preliminary results achieved by our systems on Test Batch 4 of BioASQ Task 13b are shared here; these are based on an automated evaluation, as the organizers have not yet performed a manual evaluation.
The overall performance of our systems remains limited, but the systems are functional, and since each part of a system can be treated separately, the parts can be improved individually to obtain better output. Our system lays a foundation for future enhancements. Future work can focus on improving document relevance, integrating domain-specific knowledge, and optimizing answer-generation quality.</p>
      </abstract>
      <kwd-group>
        <kwd>BioASQ Task 13b</kwd>
        <kwd>System Approach</kwd>
        <kwd>System Framework</kwd>
        <kwd>CEUR-WS</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The rapid growth of the biomedical literature presents both opportunities and challenges in the quest
to access relevant and accurate information efficiently. With millions of new citations added annually
to databases such as PubMed, practitioners and researchers often face difficulties in retrieving
precise answers and comprehensive summaries from this vast and unstructured data. Traditional
keyword-based search methods are limited in addressing issues such as synonymy and contextual
relevance, which are prevalent in biomedical terminology.</p>
      <p>In this paper, we share our approach to addressing the issues stated above.</p>
      <p>Semantic Question Answering (QA) systems have emerged as a vital solution to these challenges by
leveraging advanced natural language processing (NLP) techniques and machine learning to understand,
retrieve, and generate relevant answers. These systems go beyond simple keyword matching by
interpreting the underlying intent of questions and providing contextually appropriate answers to the
user.</p>
      <p>Within this scope, the BioASQ challenge (https://www.bioasq.org/) has been instrumental in promoting research and
development in biomedical semantic indexing, information retrieval, question answering, and summarization.
BioASQ Task 13b (https://participants-area.bioasq.org/general_information/Task13b/) focuses on generating accurate and comprehensive answers to biomedical questions,
requiring the integration of retrieval and summarization techniques to provide the answers.</p>
      <p>This work aims to develop and implement methods that effectively integrate information retrieval,
answer extraction, and summarization processes to tackle complex biomedical queries. By participating
in and addressing the challenges posed by BioASQ Task 13b, the work contributes to advancing the
capabilities of automated systems in providing reliable, precise, and concise biomedical knowledge,
ultimately supporting researchers and healthcare professionals in decision-making processes.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background And Motivation</title>
      <p>The exponential growth of biomedical literature, with millions of articles in repositories such as PubMed,
poses significant challenges for researchers and clinicians seeking quick and precise information. As
the volume of available knowledge continues to expand rapidly, it becomes increasingly difficult to
efficiently retrieve relevant, evidence-based answers to complex biomedical questions using traditional
search methods.</p>
      <p>Conventional keyword-based search systems often fall short in capturing the true semantic intent
behind user queries, leading to retrieval of irrelevant information or missing critical data. This gap
underscores the need for advanced systems capable of understanding, reasoning, and summarizing
biomedical knowledge effectively.</p>
      <p>Our aim here is to develop systems that bridge this gap by integrating information retrieval, natural
language understanding, and summarization techniques to support the biomedical community more
effectively. The task tackles multiple problems that address the critical need for automated
and scalable solutions to manage the overwhelming influx of biomedical data.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Outline Of The Paper</title>
      <p>This paper gives an overview of how the task has been approached and carried out. It is
organized as follows.</p>
      <p>The Introduction provides the details and an overview of the challenge: what the challenge is
about and what it aims to achieve. The background and motivation for this work are then discussed.</p>
      <p>Next, all the details related to the task are described. The problem statement defines the
task, followed by a detailed description of it and of the approach we used to solve it. Each part
of the approach is elaborated to give a detailed overview of the techniques used.</p>
      <p>The Implementation section describes how the methodology has been implemented, including the
structure of our systems after implementing every part. The Results section gives an
insight into the performance and overall usefulness of the implementation. The Discussion section then
provides interpretations of the results along with various improvements that can be made
to the system.</p>
      <p>Finally, the Conclusions and Future Work section gives the conclusions of the project
along with ideas for improvement and related future work. The references follow.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Problem Statement</title>
      <p>To develop a system (or systems) that can accurately retrieve relevant biomedical documents and generate precise
and comprehensive answers to biomedical questions of different types, thereby reducing the manual
effort, inefficiency, and limitations of traditional literature search methods.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Detailed Description Of Task</title>
      <p>To address this challenge, BioASQ Task 13b provides a benchmark task that promotes the development
of Biomedical Semantic Question Answering (QA) systems capable of automatically handling such
queries. The task focuses on four types of questions: factoid, list, yes/no, and summary.</p>
      <p>The system under development is required to:
• Retrieve relevant documents from biomedical repositories (e.g., PubMed)
• Extract relevant snippets of text from the retrieved documents
• Extract exact answers (facts, lists, yes/no)
• Generate ideal answers (concise, paragraph-sized summaries)
BioASQ Task 13b organizes this into three phases, each representing a key sub-task:
• Phase A: The system retrieves up to 10 relevant documents, ranked by decreasing relevance,
and returns snippets, i.e., text spans that may help answer the question (typically extracted from
abstracts).
• Phase A+: No gold-standard documents/snippets are provided. Participants must use their Phase
A outputs to directly extract exact answers and generate ideal answers.
• Phase B: Gold-standard relevant documents and snippets (manually curated) are provided to
the participants. Using these high-quality resources, the task is to extract exact answers and
generate ideal answers.</p>
      <sec id="sec-5-1">
        <title>5.1. Required answers in each of the phases:</title>
        <p>5.1.1. In Phase A of Task 13b
In Phase A of Task 13b, the participants will be provided with English questions q1, q2, .... For each
question, each participating system will be required to return any (ideally all) of the following lists:
• A list of at most 10 relevant articles (documents) d1, d2, d3, ... from the designated article
repositories. The list should be ordered by decreasing confidence, i.e., d1 should be the
article that the system considers to be the most relevant to the question, d2 should be the
article that the system considers to be the second most relevant, etc. A single article list will be
returned per question and participating system, and the list may contain articles from multiple
designated repositories. The returned article list will actually contain unique article identifiers
(obtained from the repositories).
• A list of at most 10 relevant text snippets s1, s2, s3, ... from the returned articles. Again, the list
should be ordered by decreasing confidence. A single snippet list will be returned per question
and participating system, and the list may contain any number of snippets (or none) from any of the
returned articles d1, d2, d3, .... Each snippet will be represented by the unique identifier of
the article it comes from, the identifier of the section the snippet starts in, the offset of the first
character of the snippet in the section the snippet starts in, the identifier of the section the snippet
ends in, and the offset of the last character of the snippet in the section the snippet ends in. The
snippets themselves will also have to be returned (as strings).
5.1.2. In Phase A+ of Task 13b
In Phase A+ of Task 13b, the participants will be provided with English questions as in Phase A (above),
but will be required to return "exact" and/or "ideal" answers, as for Phase B (below).
5.1.3. In Phase B of Task 13b
In Phase B of Task 13b, the participants will be provided with the same questions q1, q2, ... as in Phase
A, but this time they will also be given gold (correct) lists of articles and snippets. The "gold" lists
will contain articles and snippets identified by biomedical experts as relevant and providing enough
information to answer the questions. For each question, each participating system may return an ideal
answer, i.e., a paragraph-sized summary of relevant information. In the case of yes/no, factoid, and list
questions, the systems may also return exact answers; for summary questions, no exact answers will be
returned. The participants will be told the type of each question. A participating system may return
only "exact" answers, or only "ideal" answers, or (ideally) both "exact" and "ideal" answers.</p>
        <p>Exact Answers
• For each yes/no question, the exact answer of each participating system will have to be either
"yes" or "no".
• For each factoid question, each participating system will have to return a list of up to 5 entity
names (e.g., up to 5 names of drugs), numbers, or similar short expressions, ordered by decreasing
confidence.
• For each list question, each participating system will have to return a single list of entity names,
numbers, or similar short expressions, jointly taken to constitute a single answer (e.g., the most
common symptoms of a disease). The returned list will have to contain no more than 100 entries
of no more than 100 characters each.</p>
        <p>• No exact answers will be returned for summary questions.</p>
        <p>Ideal Answers</p>
        <p>For each question (yes/no, factoid, list, summary), each participating system of Phase B may also return
an ideal answer, i.e., a single paragraph-sized text ideally summarizing the most relevant information
from articles and snippets retrieved in Phase A. Each returned "ideal" answer is intended to approximate
a short text that a biomedical expert would write to answer the corresponding question (e.g., including
prominent supportive information), whereas the "exact" answers are only "yes"/"no" responses, entity
names or similar short expressions, or lists of entity names and similar short expressions; and there are
no "exact" answers in the case of summary questions. The maximum allowed length of each "ideal"
answer is 200 words.</p>
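        <p>The 200-word cap on "ideal" answers can be enforced with a small helper before submission. This is a minimal sketch; the function name is our own:</p>

```python
def truncate_ideal_answer(text, max_words=200):
    """Trim a generated ideal answer to the task's 200-word limit."""
    return " ".join(text.split()[:max_words])
```

        <p>Applying this after generation guarantees that no submitted ideal answer exceeds the allowed length.</p>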
      </sec>
      <sec id="sec-5-2">
        <title>5.2. The main challenges inherent in this task are:</title>
        <p>• Accurately retrieving semantically relevant documents beyond simple keyword matching
• Extracting precise factual entities or lists from complex biomedical text
• Generating coherent and comprehensive ideal answers (summaries) from multiple documents
This work aims to automate these tasks by integrating modern Natural Language Processing (NLP)
techniques, thereby improving the efficiency, consistency, and quality of biomedical information retrieval
and question answering, ultimately supporting biomedical research and clinical decision-making.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. JSON Format Of Data</title>
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Input Output Examples</title>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Proposed Approach</title>
      <p>The main task can be divided between two systems (or pipelines): one system for Phase A, and one for
Phase A+ and Phase B. A Phase A+ problem can easily be converted into a Phase B problem. In Phase B
we are given gold documents and snippets for each question and must return exact and
ideal answers to these questions. In Phase A+ we are given only the questions, with no gold
documents or snippets, so we can use System 1 to get documents and snippets for each question and
then use System 2 to get our answers for Phase A+. Since the test data in Phase A+ is the same as the
test data in Phase A, we can reuse the Phase A outputs of System 1 and
run System 2 on them to obtain the Phase A+ answers. Thus only two systems are required for the whole project.</p>
      <p>The whole project can be divided into 5 Subtasks for a structured approach.</p>
      <p>Relevant Document Retrieval =&gt; Subtask 1
Ranking Documents =&gt; Subtask 2
Snippet Extraction =&gt; Subtask 3
Extracting Exact Answers =&gt; Subtask 4
Generating Ideal Answers (Summarization) =&gt; Subtask 5
System 1 solves Subtasks 1, 2, and 3.</p>
      <p>System 2 solves Subtasks 4 and 5.</p>
      <sec id="sec-6-1">
        <title>6.1. Approach For Each Subtask</title>
        <p>6.1.1. Subtask 1 – Document Retrieval
In this subtask the main goal is to retrieve relevant documents related to the given question. In our
approach the question is transformed into a query consisting of important keywords (e.g.,
nouns). This query is then used to fetch the related documents. The method used
to form the query is discussed in subsection 7.3 (System 1 implementation) of section 7
(Implementation).</p>
        <p>To retrieve relevant documents, we utilize the Entrez Programming Utilities (E-utilities,
https://www.ncbi.nlm.nih.gov/books/NBK25497/), a suite of web-based APIs provided by the National Center for
Biotechnology Information (NCBI, https://www.ncbi.nlm.nih.gov/) for programmatic access to the PubMed database.</p>
        <p>The query formed is then submitted to Entrez using the following utilities:
• esearch: retrieves PubMed IDs (PMIDs) of articles (documents) matching the query.
• efetch: fetches metadata and abstracts of the retrieved articles based on the PMIDs.</p>
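        <p>As an illustration, the two utilities above can be called through Biopython's Bio.Entrez module roughly as follows. This is a minimal sketch rather than our exact system code: the function names and placeholder email are our own, and the Bio.Entrez import is deferred so the helpers load even where Biopython is absent.</p>

```python
def search_pubmed(query, email, retmax=10):
    """esearch: return up to retmax PMIDs of PubMed articles matching query."""
    from Bio import Entrez  # Biopython; deferred import (assumed installed)
    Entrez.email = email    # NCBI asks callers to identify themselves
    handle = Entrez.esearch(db="pubmed", term=query, retmax=retmax)
    record = Entrez.read(handle)
    handle.close()
    return list(record["IdList"])

def fetch_abstracts(pmids, email):
    """efetch: return plain-text metadata and abstracts for the given PMIDs."""
    from Bio import Entrez
    Entrez.email = email
    handle = Entrez.efetch(db="pubmed", id=",".join(pmids),
                           rettype="abstract", retmode="text")
    text = handle.read()
    handle.close()
    return text

def pmid_to_url(pmid):
    """BioASQ answers identify each document by its PubMed URL."""
    return "http://www.ncbi.nlm.nih.gov/pubmed/" + str(pmid)
```

        <p>The PMIDs returned by search_pubmed are turned into document URLs for the answer data, and the abstracts fetched by fetch_abstracts feed the ranking step.</p>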
        <p>Entrez is used because it offers direct and authoritative access to the entire PubMed database, making it
ideal for biomedical information retrieval. It enables automated, efficient, and large-scale querying,
ensuring comprehensive coverage of relevant literature while maintaining scalability and reliability in
a production system.
6.1.2. Subtask 2 – Document Ranking
Subtask 2 deals with ranking a given set of documents with respect to a given question. It returns a
list sorted in descending order of relevance.</p>
        <p>To accurately rank retrieved documents based on their relevance to a question, we employ the
sentence-transformers/all-MiniLM-L6-v2 model (https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2),
a lightweight and efficient transformer-based model that captures semantic similarity between text pairs,
going beyond simple keyword matching.</p>
        <p>Given a biomedical question and the abstract of an article, our fine-tuned model encodes both into
fixed-size vector representations (embeddings). These embeddings capture the contextual meaning of
the text, allowing us to compute their cosine similarity and rank documents by their semantic closeness
to the question.</p>
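        <p>The ranking step above can be sketched as follows. In the real system the embeddings come from encoding the question and each abstract with the fine-tuned all-MiniLM-L6-v2 model (e.g., via SentenceTransformer.encode); here we assume the 384-dimensional vectors are already computed and show only the cosine-similarity ranking, with function and variable names of our own choosing.</p>

```python
import numpy as np

def rank_documents(question_emb, doc_embs, pmids):
    """Return PMIDs sorted by decreasing cosine similarity to the question.

    question_emb: 1-D vector for the question; doc_embs: one row per abstract.
    """
    q = np.asarray(question_emb, dtype=float)
    d = np.asarray(doc_embs, dtype=float)
    q = q / np.linalg.norm(q)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    scores = d @ q                  # cosine similarity per document
    order = np.argsort(-scores)     # indices in descending relevance
    return [pmids[i] for i in order]
```

        <p>Because the embeddings are normalized once, ranking reduces to a single matrix-vector product, which keeps the step fast even for many candidate documents.</p>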
        <p>Model Architecture: The model is based on a distilled Transformer architecture (MiniLM), a compact
and faster variant of BERT (https://huggingface.co/docs/transformers/en/model_doc/bert). Key characteristics include:
• 6 Transformer layers
• 384-dimensional embeddings
• Dense, fixed-size vectors suitable for fast similarity computation</p>
        <p>The main reason to choose MiniLM is that it enables context-aware document ranking that captures
semantic relationships between questions and abstracts, making it more accurate and robust, while
remaining computationally efficient for large-scale use.</p>
        <p>The model is fine-tuned using the Multiple Negatives Ranking Loss on a large corpus of
sentence pairs. This objective trains the model to embed semantically similar sentences closer together
in the embedding space, improving its ability to assess relevance effectively.
6.1.3. Subtask 3 – Extracting Snippets
In this subtask, snippet extraction is performed. Snippets are extracted from the abstracts of the
retrieved documents using a pre-trained extractive question-answering model; no fine-tuned
model is used here.</p>
        <p>We utilize the deepset/roberta-base-squad2 model (https://huggingface.co/deepset/roberta-base-squad2),
a robust question answering (QA) model capable of identifying exact answer spans within a given context.</p>
        <p>This model is based on RoBERTa-base (Robustly Optimized BERT Pretraining Approach,
https://huggingface.co/FacebookAI/roberta-base) and is fine-tuned on the SQuAD 2.0 dataset
(https://huggingface.co/datasets/rajpurkar/squad_v2), which enables it to:
• Extract concise answer spans from the provided text when a valid answer exists.
• Abstain from answering gracefully when no appropriate answer is found in the context.</p>
        <p>For each biomedical question, the abstract of a retrieved document serves as the context. The QA
model takes the question-context pair as input and predicts the start and end positions of the most
relevant answer span within the abstract. If no suitable answer is detected, the model returns a null
output, minimizing false positives. Five snippets are extracted with this model, and the one with the
best combination of score and length is selected.</p>
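        <p>In code, this step amounts to running the QA pipeline with several candidates and keeping the best one. The pipeline call (the transformers question-answering pipeline for deepset/roberta-base-squad2, invoked with top_k=5) returns candidate dicts with "answer" and "score" fields; the score-length trade-off below is one plausible heuristic and not necessarily the exact rule used in our runs.</p>

```python
def select_best_snippet(candidates, min_len=20, max_len=300):
    """Pick one snippet from the top-k QA candidates.

    candidates: dicts with "answer" (str) and "score" (float), as produced by
    a question-answering pipeline. Prefers spans of reasonable length, then
    the highest model score. Returns None when every candidate is empty.
    """
    viable = [c for c in candidates if c["answer"].strip()]
    if not viable:
        return None  # the model abstained on every attempt (SQuAD 2.0 style)
    def key(c):
        length_ok = len(c["answer"]) in range(min_len, max_len + 1)
        return (length_ok, c["score"])
    return max(viable, key=key)
```

        <p>Sorting on a (length_ok, score) tuple means a reasonably sized span always beats a tiny or overlong one, and score only breaks ties within the same length class.</p>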
        <p>Model Architecture and Capabilities
• Built on the RoBERTa-base transformer architecture (12 layers, 768 hidden size).
• Fine-tuned explicitly for extractive question answering with a "no answer" option (a key feature
of SQuAD 2.0).
• Capable of both precise extraction and answerability detection, making it well-suited for
biomedical contexts where relevant information may not always be explicitly present.
6.1.4. Subtask 4 – Extracting Exact Answers
For generating concise, fact-based exact answers, we employ T5 (Text-to-Text Transfer Transformer),
a flexible Transformer-based encoder-decoder model that frames all NLP tasks as text-to-text
problems. We fine-tune google-t5/t5-base (https://huggingface.co/google-t5/t5-base) for this task.</p>
        <p>The input to the model consists of:
• The biomedical question
• Its question type (e.g., yes/no, factoid, list)
• A concatenation of all answer snippets from the relevant documents</p>
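        <p>Concretely, these three components can be joined into a single input string before tokenization. The literal prompt template below is our own assumption; the paper fixes only the components, not their exact formatting.</p>

```python
def build_exact_answer_input(question, question_type, snippets):
    """Join question, question type, and all snippets into one T5 input string."""
    context = " ".join(s.strip() for s in snippets)
    return "question: {} type: {} context: {}".format(question, question_type, context)
```

        <p>The resulting string is tokenized and passed to the fine-tuned google-t5/t5-base model, whose generated string is then post-processed according to the question type.</p>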
        <sec id="sec-6-1-1">
          <p>The T5 model is fine-tuned for this exact-answer task on a training dataset of around 5,300
questions. The model is fine-tuned by providing the required input and output format. This combination
makes the model learn the pattern of extracting the exact answer from a given input.</p>
          <p>The output must be post-processed according to the question type, as
each type requires a different kind of exact answer.</p>
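          <p>A sketch of this post-processing step, assuming the model emits a plain string and that list/factoid answers are comma- or semicolon-separated (an assumed convention); the task's own limits of 5 factoid entries and 100 list entries of 100 characters each are applied:</p>

```python
import re

def postprocess_exact_answer(raw, question_type):
    """Convert a raw T5 output string into the exact-answer format per type."""
    text = raw.strip()
    if question_type == "summary":
        return None  # summary questions take no exact answer
    if question_type == "yesno":
        return "yes" if text.lower().startswith("yes") else "no"
    # factoid / list: split on commas or semicolons (assumed convention),
    # capping each entry at 100 characters as the task requires
    items = [s.strip()[:100] for s in re.split(r"[;,]", text) if s.strip()]
    if question_type == "factoid":
        return items[:5]    # at most 5 short expressions
    return items[:100]      # list questions: at most 100 entries
```

          <p>Each question type thus maps the same generated string onto the answer structure the submission format expects.</p>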
          <p>T5 is specifically used because its flexible text-to-text architecture allows explicit control over output
formats based on question type, improving exact-answer accuracy across diverse biomedical questions.
6.1.5. Subtask 5 – Generating Ideal Answers
For generating ideal answers, comprehensive, well-formed summaries that synthesize information
from multiple snippets, we again leverage the T5 model, fine-tuned for generative summarization.</p>
          <p>The input to the model includes:
• The biomedical question
• All extracted snippets concatenated into a single context</p>
          <p>The model generates a coherent, paragraph-style summary that integrates the relevant information,
providing a richer, more contextualized answer to the question.</p>
          <p>Again, the model (google-t5/t5-base) is fine-tuned on a large set of question-answer pairs, around 5,300
questions, which enables it to learn the patterns required for ideal-answer generation. For fine-tuning,
the input is the question along with all the relevant snippets as context, and the output is the ideal
answer from the dataset. This fine-tuning helps the model generate long-form, informative summaries aligned with our needs.</p>
          <p>T5 is used because its encoder-decoder framework and large pretrained knowledge base make it
well suited to multi-document summarization, providing comprehensive and context-aware answers.</p>
          <p>These subtasks are solved by the pipeline Systems 1 and 2. The division was made to clearly delineate
all the subtasks that make up the main task.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Implementation</title>
      <sec id="sec-7-1">
        <title>The approach is implemented in the following manner:</title>
        <sec id="sec-7-1-1">
          <title>7.1. Pre-Processing</title>
          <p>The training (development) dataset provided to us contains questions along with related documents and
answers, but no abstracts of the documents. For fine-tuning we need the abstracts, as the
document field contains just the link itself. Thus, for each question, we need to extract the related
abstracts. The difficulty is the rate limit of Entrez, which is used to extract abstracts via efetch: Entrez
allows fetching only about 3 documents per second. For such a large dataset we therefore cannot extract all
abstracts in a single run, as it would take many hours.</p>
          <p>Thus the dataset is first divided into 6 parts, the first 5 having 1,000 questions each and the last having
300 questions. Abstracts are then extracted for each of these parts using Entrez, and finally the abstracts from
all the parts are appended to the main dataset, creating a complete training dataset with all the data
needed for fine-tuning.</p>
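          <p>The splitting and rate-limited fetching can be sketched as follows; the part size mirrors the 5 x 1,000 + 300 split above, while the function names and the injected fetch_one callable are our own.</p>

```python
import time

def chunk_dataset(questions, part_size=1000):
    """Split the question list into parts of at most part_size (5 x 1,000 + 300 here)."""
    return [questions[i:i + part_size] for i in range(0, len(questions), part_size)]

def fetch_abstracts_slowly(pmids, fetch_one, delay=1.0 / 3):
    """Fetch one abstract per PMID, pausing to respect Entrez's ~3 requests/second limit."""
    abstracts = {}
    for pmid in pmids:
        abstracts[pmid] = fetch_one(pmid)  # e.g., a wrapper around Entrez.efetch
        time.sleep(delay)
    return abstracts
```

          <p>Running each part separately also means a failed run loses at most one part's worth of fetches rather than the whole dataset.</p>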
        </sec>
        <sec id="sec-7-1-2">
          <title>7.2. Fine-tuning</title>
          <p>For Subtasks 2, 4, and 5, the models to be used need to be fine-tuned first. Each model is
fine-tuned as described in the approach above. Models are fine-tuned in Google Colab, as our
local machine cannot perform computation- and GPU-intensive tasks like fine-tuning.</p>
          <p>The fine-tuning stage was also time-consuming, as one needs to tune training arguments such as the
learning rate and number of epochs to obtain a fine-tuned model that gives the best answers.</p>
        </sec>
        <sec id="sec-7-1-3">
          <title>7.3. System 1 Implementation:</title>
          <p>System 1 contains a Python function that accepts a test JSON file and an email address and returns the answer
data in the required submission format. The test JSON file contains the questions
for which answers must be produced, and the email is used for the Entrez login. The ranking model
used here is minilm_ranker (our fine-tuned MiniLM), stored in the appropriate directory and
loaded inside this function for article (document) ranking. The model
used for extracting snippets is the extractive question-answering model “deepset/roberta-base-squad2”
from Hugging Face, loaded inside this function using the pipeline module from the
transformers library and used for snippet extraction from relevant articles (documents).</p>
          <p>Inside this function, two helper functions are defined, each of which takes a question. One returns
nouns, proper nouns, and verbs separated by commas, while the other returns only nouns and proper nouns.
This step assists in the query-creation process.</p>
          <p>To get the relevant articles for each question, we create the query using the method above, and then
for each query we get the PMIDs of relevant articles using Entrez.esearch. First, the nouns, proper nouns,
and verbs are searched as the query; if no articles are found, only the nouns and proper nouns
are searched. If still no PMIDs or articles are found, the whole question is searched
using Entrez.esearch. After getting the PMIDs, the document links are created from them and
appended to the answer data. This completes our article (document) retrieval process.</p>
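          <p>This fallback cascade can be expressed compactly; search here is an injected callable (in our system, a wrapper around Entrez.esearch) so that the control flow can be shown, and tested, without a live connection.</p>

```python
def retrieve_with_fallback(question, nouns_verbs_query, nouns_query, search):
    """Try progressively broader queries until some PMIDs are found.

    Order: nouns + proper nouns + verbs, then nouns + proper nouns only,
    then the whole question. search(query) returns a list of PMIDs.
    """
    for query in (nouns_verbs_query, nouns_query, question):
        pmids = search(query)
        if pmids:
            return pmids
    return []  # nothing found for any formulation
```

          <p>Starting from the most specific query keeps retrieval precise, while the broader fallbacks guarantee that almost every question yields at least some candidate articles.</p>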
          <p>Next, to rank the articles (documents), we need the abstract of each document,
because we cannot rank articles based on the article links alone. So, for every PMID, abstracts are fetched
using Entrez.efetch. Then the minilm_ranker model (fine-tuned MiniLM) is loaded. For each question,
we call a function that takes the question, the ranking model, the article abstracts, and the PMIDs as input.
This function ranks the documents according to relevance and returns a list containing the PMIDs and
articles in descending order of relevance. The PMIDs are used to identify the article links. These
ranked articles (documents) are appended to the question data in the dictionary, replacing the original
unranked ones. This dictionary is returned at the end, after all the steps, giving us the answer data in the
required submission format.</p>
          <p>After this, the snippet extraction is performed. For that we use the deepset/roberta-base-squad2 model
from Hugging Face and create a QA pipeline using pipeline from transformers. Every question
is passed in QA format to this pipeline to extract the answer snippet: the input is the question plus the abstract of an
article, and the output is the extracted snippet. The other fields, such as offsetInBeginSection and offsetInEndSection,
are filled from start_pos, end_pos, etc. The document (article) link is also appended, along with beginSection
and endSection, which are both “abstract”. The snippets thus extracted are appended to the answer data.</p>
          <p>Finally, the answer data is returned. This data contains the articles (documents) ranked in order of
decreasing relevance, along with the snippet text from each article, for every question.</p>
          <p>This answer data is in Python dictionary format; it is then converted to a .json file
as required for the submission.</p>
        </sec>
        <sec id="sec-7-1-4">
          <title>7.4. System 2 Implementation:</title>
          <p>System 2 contains a Python function that accepts only the .json file containing the questions,
along with the gold articles and snippets, for which exact and ideal answers are required. The models used
here are the fine-tuned T5 models T5_ideal_answer and T5_exact_answer. These
models need to be uploaded to the required directories so that they can be loaded inside the function.</p>
          <p>Inside this function, two functions named get_ideal_answer and get_exact_answer are defined. The
get_exact_answer function takes the question, question type, its snippets, the model, and the tokenizer as inputs.
The question, question type, and snippets are concatenated to form the model input, from which the exact answer
is obtained. Since T5 is a text-to-text model, the extracted exact answer is a string. However, the exact
answer for list and factoid questions must be a list, for yes/no questions it must be “yes” or “no”, and no exact
answer is to be given for summary questions. So, according to each question type, post-processing is applied and
the required format of the exact answer is derived from the answer string. The get_ideal_answer function takes
the question, its snippets, the model, and the tokenizer as input and generates an ideal answer for the question
using all the snippets as context. The ideal answer is a summarized answer to the question based on its
related snippets.</p>
          <p>These functions are run for every question to get the ideal and exact answers, which are then appended
to the answer data returned by the main function. This answer data is in Python
dictionary format; it is then converted to a .json file as required for the submission.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>8. Results</title>
      <p>We participated in the shared task BioASQ Task13b, Test Batch 4. These are the official results
from the BioASQ website.</p>
      <p>Manual evaluation has not been done yet; these are auto-generated results from the evaluation scripts.
Various metrics have been used: each phase, with its different required answers, uses a different set of
metrics. The results, along with the metrics used, are provided in tabular form below.</p>
      <sec id="sec-8-1">
        <title>8.1. Phase A Results</title>
        <p>The results of Phase A: Documents are in Table 1. The results of Phase A: Snippets are in Table 2.</p>
      </sec>
      <sec id="sec-8-2">
        <title>8.2. Phase B Results</title>
        <p>The results of Phase B: Exact Answers are in Table 3. The results of Phase B: Ideal Answers are in Table 4.</p>
        <p>Mean precision: 0.0154</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>9. Discussions</title>
      <p>The current systems present a functional baseline that integrates all required subtasks: document
retrieval, ranking, snippet extraction, and answer extraction and generation. While the overall results
are not strong, they do validate the feasibility of the modular approach. Each component/subtask in
the system, though individually limited in performance, demonstrates that the intended function is
operational. This confirms that the foundational design of the system is sound and capable of supporting
future improvements.</p>
      <p>The underwhelming results highlight the need for significant enhancements in several areas. Some
subtasks, particularly document ranking and answer extraction, require more robust techniques to
capture biomedical relevance and generate high-quality responses. However, the current framework is
flexible and modular, allowing for easy substitution or augmentation of individual components. This
means that alternative models, especially those more specialized or fine-tuned for biomedical domains,
can be tested and incorporated without requiring an overhaul of the entire system.</p>
      <p>In essence, the baseline system successfully establishes a starting point. It confirms that the problem
is addressable using current tools and provides a clear structure for future experimentation. The path
forward involves systematic testing of alternative models, targeted fine-tuning, and iterative evaluation
to gradually replace weaker components and improve the overall performance.</p>
    </sec>
    <sec id="sec-9-1">
      <title>10. Conclusion and Future Work</title>
      <p>This work presents Biomedical Question Answering (QA) systems that integrate information retrieval
(IR), extractive QA, and abstractive summarization to handle a variety of biomedical question types.
The systems employ a combination of MiniLM (for semantic document ranking), RoBERTa_SQuAD2
(for answer span extraction), and T5 (for both exact and ideal answer generation). Despite the limited
performance of the current approach, the system demonstrates that such a modular and semantically
driven architecture is viable and provides a foundation for further research in biomedical information
access.</p>
      <p>The results indicate that the baseline is functional but lacks the robustness needed for high-quality
biomedical QA. Several components, particularly in document ranking and long-form answer generation,
require further refinement.</p>
      <sec id="sec-9-1-1">
        <title>10.1. Future Work</title>
        <p>To improve system performance and answer quality, future work can incorporate the following points:
• Model Improvement: Experiment with a broader range of pretrained models, especially those
fine-tuned specifically on biomedical corpora, to enhance domain-specific understanding.
• Relevance Optimization: Develop advanced document ranking strategies that consider
domain-specific signals and query intent to boost retrieval precision.
• Enhanced Summarization: Improve the abstractive summarization component by fine-tuning on
biomedical summarization datasets or using models designed for medical discourse generation.
• System Improvement: Conduct error analysis across subtasks to identify weaknesses and
iteratively improve individual modules without compromising the overall system structure.</p>
        <p>By addressing these points, the baseline systems can evolve into a more accurate and reliable biomedical
QA solution capable of supporting clinicians, researchers, and end-users in accessing critical medical
information.</p>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>Acknowledgments</title>
      <p>Thanks to the developers of the ACM consolidated LaTeX styles https://github.com/borisveytsman/acmart
and to the developers of the Elsevier updated LaTeX templates https://www.ctan.org/tex-archive/macros/
latex/contrib/els-cas-templates.</p>
      <p>During the preparation of this work, the author(s) used ChatGPT-4 in order to check grammar and
spelling. After using these tool(s)/service(s), the author(s) reviewed and edited the content as needed and
take(s) full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Nentidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Katsimpras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Krithara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krallinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rodríguez-Ortega</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Rodriguez-López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Loukachevitch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sakhovskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Tutubalina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dimitriadis</surname>
          </string-name>
          , G. Tsoumakas,
          <string-name>
            <given-names>G.</given-names>
            <surname>Giannakoulas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bekiaridou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Samaras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Di Nunzio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Marchesin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Martinelli</surname>
          </string-name>
          , G. Silvello, G. Paliouras,
          <article-title>Overview of BioASQ 2025: The thirteenth BioASQ challenge on large-scale biomedical semantic indexing and question answering</article-title>
          , in: Jorge Carrillo-de Albornoz, Julio Gonzalo, et al. (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Sixteenth International Conference of the CLEF Association (CLEF 2025)</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Nentidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Katsimpras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Krithara</surname>
          </string-name>
          , G. Paliouras,
          <article-title>Overview of BioASQ Tasks 13b and Synergy13 in CLEF2025</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          , D. Spina (Eds.),
          <source>CLEF 2025 Working Notes</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Nentidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Katsimpras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Krithara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. Lima</given-names>
            <surname>López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Farré-Maduell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gasco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krallinger</surname>
          </string-name>
          , G. Paliouras,
          <article-title>Overview of BioASQ 2023: The Eleventh BioASQ Challenge on Large-Scale Biomedical Semantic Indexing and Question Answering</article-title>
          , Springer Nature Switzerland,
          <year>2023</year>
          , pp.
          <fpage>227</fpage>
          -
          <lpage>250</lpage>
          . URL: http://dx.doi.org/10.1007/978-3-031-42448-9_19. doi:10.1007/978-3-031-42448-9_19.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Krithara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nentidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bougiatiotis</surname>
          </string-name>
          , G. Paliouras, BioASQ-QA:
          <article-title>A manually curated corpus for Biomedical Question Answering</article-title>
          ,
          <source>Scientific Data</source>
          <volume>10</volume>
          (
          <year>2023</year>
          )
          <fpage>170</fpage>
          . URL: https://doi.org/10.1038/s41597-023-02068-4. doi:10.1038/s41597-023-02068-4.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Krithara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. G.</given-names>
            <surname>Mork</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nentidis</surname>
          </string-name>
          , G. Paliouras,
          <article-title>The road from manual to automatic semantic indexing of biomedical literature: a 10 years journey</article-title>
          ,
          <source>Frontiers in Research Metrics and Analytics</source>
          <volume>8</volume>
          (
          <year>2023</year>
          ). URL: https://www.frontiersin.org/journals/research-metrics-and-analytics/articles/10.3389/frma.2023.1250930. doi:10.3389/frma.2023.1250930.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Robertson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zaragoza</surname>
          </string-name>
          ,
          <article-title>The probabilistic relevance framework: Bm25 and beyond</article-title>
          ,
          <source>Foundations and Trends® in Information Retrieval</source>
          <volume>3</volume>
          (
          <year>2009</year>
          )
          <fpage>333</fpage>
          -
          <lpage>389</lpage>
          . URL: http://dx.doi.org/10.1561/1500000019. doi:10.1561/1500000019.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          , Bert:
          <article-title>Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <year>2019</year>
          . URL: https://arxiv.org/abs/1810.04805. arXiv:1810.04805.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>Roberta: A robustly optimized bert pretraining approach</article-title>
          ,
          <year>2019</year>
          . URL: https://arxiv.org/abs/1907.11692. arXiv:1907.11692.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Rajpurkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <article-title>Know what you don't know: Unanswerable questions for squad</article-title>
          ,
          <year>2018</year>
          . URL: https://arxiv.org/abs/1806.03822. arXiv:1806.03822.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , Minilm:
          <article-title>Deep self-attention distillation for task-agnostic compression of pre-trained transformers</article-title>
          ,
          <year>2020</year>
          . URL: https://arxiv.org/abs/2002.10957. arXiv:2002.10957.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C.</given-names>
            <surname>Raffel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Matena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Exploring the limits of transfer learning with a unified text-to-text transformer</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>21</volume>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>67</lpage>
          . URL: http://jmlr.org/papers/v21/20-074.html.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>