<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Agentic MCS: A Multilingual Clinical Summarization Framework</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Johanna Angulo</string-name>
          <email>johanna.angulo@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Víctor Yeste</string-name>
          <email>victor.yeste@universidadeuropea.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Science, Engineering and Design, Universidad Europea de Valencia</institution>
          ,
          <addr-line>Paseo de la Alameda, 7, 46010 Valencia</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This research presents Agentic MCS, a LangGraph-based framework for multilingual biomedical text summarization, evaluated across English, Spanish, French, and Portuguese clinical case report datasets from the BioASQ MultiClinSum challenge. The investigation implements a multi-agent workflow integrating extractive preprocessing, neural abstractive summarization, and quality enhancement through comparative experimental runs. The framework employs diverse architectural approaches including hybrid BM25-neural pipelines, NER-integrated entity preservation systems, knowledge graph-guided summarization, and dense retrieval re-ranking mechanisms. Core models encompass domain-specific transformers (BioMistral-7B), multilingual fine-tuned models (mBART-large-50, mT5-base), general-purpose language models (Mistral 7B, Llama 3.1 8B), and proprietary LLMs as enhancement systems (GPT-4o). The framework achieved comprehensive multilingual coverage with language-specific optimizations. Evaluation reveals consistently high BERTScore results (0.71-0.84) indicating strong semantic fidelity, while lower ROUGE scores (0.17-0.25) reflect high abstraction rather than lexical extraction. This work establishes a robust foundation for intelligent agentic systems in biomedical text processing.</p>
      </abstract>
      <kwd-group>
        <kwd>Multilingual summarization</kwd>
        <kwd>Multi-agent systems</kwd>
        <kwd>Medical text mining</kwd>
        <kwd>Transformer models</kwd>
        <kwd>Knowledge graphs</kwd>
        <kwd>Entity preservation</kwd>
        <kwd>Clinical NLP</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Biomedical summarization has evolved from traditional extractive methods to sophisticated neural
approaches. Comprehensive surveys of document summarization techniques [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ] show this evolution
from rule-based to neural approaches. Early work focused on rule-based systems and statistical methods,
while recent advances leverage transformer architectures and domain-specific models. Multilingual
biomedical summarization remains under-explored, with most systems designed for English. The
MultiClinSum challenge addresses this gap by providing standardized evaluation across four languages.
Large language models are increasingly being integrated into clinical decision support systems [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ],
though questions remain about their role in enhancing versus replacing human expertise.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. MultiClinSum</title>
      <p>
        MultiClinSum represents a shared task addressing the automatic summarization of clinical case reports
across four major languages: English, Spanish, French, and Portuguese. This challenge responds to
the critical need in healthcare and biomedical research domains, where the exponential growth of
clinical documentation creates significant barriers for healthcare professionals, researchers, and patients
attempting to extract essential medical knowledge from extensive clinical texts [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        The challenge is structured into four independent sub-tracks (MultiClinSum-en, -es, -fr, -pt),
providing participants the flexibility to focus on specific languages or develop comprehensive multilingual
approaches [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Teams may submit up to five runs per language, with evaluation conducted using
ROUGE-L-Sum and BERTScore metrics against human-generated reference summaries. This task
addresses critical healthcare applications including clinical decision support, discharge summary
generation, medical literature review, multilingual clinical communication, and patient-oriented summary
creation, connecting natural language processing research with practical clinical applications across
linguistic boundaries [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <sec id="sec-3-1">
        <title>3.1. Gold and Large-Scale Dataset</title>
        <p>
          The MultiClinSum challenge utilizes a multilingual corpus comprising both training and evaluation
datasets across four major languages: English, Spanish, French, and Portuguese. The corpus is structured
into two distinct training components and corresponding test sets, all made publicly available through
the Zenodo repository to ensure reproducibility and accessibility for the research community [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
        <p>The training data encompasses both gold-standard and large-scale variants, with the gold-standard
datasets containing 592 document-summary pairs per language, providing high-quality reference
materials for supervised learning approaches.</p>
        <p>
          The large-scale training datasets expand the available data with 25,902 document-summary pairs
per language, enabling the development and fine-tuning of data-intensive models while maintaining
consistent cross-lingual coverage [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Test Dataset</title>
        <p>
          The evaluation framework (test dataset) employs language-specific test datasets containing between
3,396 and 3,469 full-text clinical cases per language, with slight variations reflecting the natural
distribution of available clinical literature across linguistic domains. All datasets maintain consistent
organizational structure, with full-text documents and their corresponding summaries stored in separate
directories as UTF-8 encoded plain text files [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>
        MCS is a multi-agent and multimodal system designed for summarizing complex biomedical texts from
various languages [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. It leverages LangGraph to create a stateful, resilient workflow, LlamaIndex for
Retrieval-Augmented Generation (RAG) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], and a suite of specialized agents and tools for different
tasks using Large Language Models (LLMs) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        This framework functions as an abstraction, defining a flexible multi-agent system wherein different
architectural configurations for summarization can be implemented and compared [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The architectures
detailed below represent specific instances developed and deployed for the challenge. It is important
to note that the runs submitted for the challenge may align with one or more of these conceptual
architectures.
      </p>
      <p>The scope of the MCS system is considerably broader than what was implemented for this specific
challenge. Consequently, only the architectures directly relevant to our submitted runs are presented
herein.</p>
      <p>Agentic Decision Making: The Route to Architecture component uses similarity analysis of input
documents against stored performance vectors to automatically select optimal architectures. The system
calculates cosine similarity between document embeddings and historical performance patterns, routing
to the architecture with highest expected performance (similarity threshold ≥ 0.85).</p>
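      <p>As an illustration, the routing step can be sketched in a few lines of Python. The function name route_to_architecture and the toy performance profiles below are hypothetical, not the exact implementation:</p>

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def route_to_architecture(doc_embedding, performance_vectors,
                          threshold=0.85, default="A"):
    """Route to the architecture whose stored performance vector is most
    similar to the document embedding; fall back below the threshold."""
    best_arch, best_sim = default, 0.0
    for arch, vec in performance_vectors.items():
        sim = cosine(doc_embedding, vec)
        if sim > best_sim:
            best_arch, best_sim = arch, sim
    return best_arch if best_sim >= threshold else default

# Toy 3-d "performance profiles" per architecture (illustrative only)
profiles = {"A": [1.0, 0.0, 0.0], "A2": [0.0, 1.0, 0.0], "C": [0.0, 0.0, 1.0]}
print(route_to_architecture([0.1, 0.99, 0.05], profiles))  # close to "A2"
```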
      <p>RAG Configuration: Retrieval uses k=5 most similar patterns, with BM25 + dense retrieval hybrid
ranking. Deduplication removes patterns with ≥ 90% overlap before final selection.</p>
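      <p>The hybrid ranking and deduplication described above can be sketched as follows; the 50/50 score blend and the token-overlap duplicate test are illustrative assumptions, not the production scoring code:</p>

```python
def hybrid_rank(candidates, bm25_scores, dense_scores, k=5, overlap_cap=0.90):
    """Rank candidates by a blend of (min-max normalized) BM25 and dense
    scores, then drop near-duplicates whose token overlap with an
    already-selected candidate is >= overlap_cap; keep the top k."""
    def norm(scores):
        lo, hi = min(scores), max(scores)
        return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]
    blended = [0.5 * b + 0.5 * d
               for b, d in zip(norm(bm25_scores), norm(dense_scores))]
    order = sorted(range(len(candidates)), key=lambda i: blended[i], reverse=True)
    selected = []
    for i in order:
        toks = set(candidates[i].lower().split())
        dup = any(len(toks & set(s.lower().split())) / max(len(toks), 1) >= overlap_cap
                  for s in selected)
        if not dup:
            selected.append(candidates[i])
        if len(selected) == k:
            break
    return selected
```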
      <sec id="sec-4-1">
        <title>4.1. Architecture A: Extractive with Neural Reranking</title>
        <p>
          This architecture leverages a sequential pipeline of well-established Natural Language Processing
(NLP) techniques. The process commences with an extractive phase, where an initial set of candidate
summaries is generated using traditional methods such as BM25 [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. These candidates are subsequently
processed by an LLM to produce more fluent, abstractive versions [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
        <p>
          A critical component of this architecture is the final reranking stage, which serves as a quality control
mechanism [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. Reranking is conceptualized as a two-stage process [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. This methodology allows
for the strategic application of powerful models on a pre-filtered set of promising candidates, thereby
optimizing the trade-off between computational cost and output quality.
        </p>
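        <p>For reference, the extractive phase can be captured by a minimal BM25 scorer. This is a sketch of the standard Okapi formula over pre-tokenized sentences, not our production code:</p>

```python
from math import log

def bm25_scores(query_tokens, docs, k1=1.5, b=0.75):
    """Score each document (a list of tokens) against the query with BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = {}  # document frequency per term
    for d in docs:
        for t in set(d):
            df[t] = df.get(t, 0) + 1
    scores = []
    for d in docs:
        s = 0.0
        for t in query_tokens:
            if t not in df:
                continue
            idf = log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            tf = d.count(t)
            s += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores
```

In the extractive phase, the top-scoring sentences against key query terms form the candidate summary passed on to the abstractive stage.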
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Architecture A2: Graph-Based Ensemble</title>
        <p>
          This architecture introduces a graph-based approach, constructing a structured representation of the source
document’s content. It begins by extracting key entities and their semantic relations to build a
document-specific knowledge graph (KG) using the networkx library [
          <xref ref-type="bibr" rid="ref18 ref5">5, 18</xref>
          ]. This graph is then condensed or
serialized back into a textual format, which effectively serves as a dense, structured summary of the
original document [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. Knowledge graph-guided retrieval approaches [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] have shown effectiveness
in maintaining factual consistency.
        </p>
        <p>
          An LLM is then prompted to generate an abstractive summary, with the serialized knowledge graph
provided as strong contextual guidance [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. This ensures that the generated summary is not only fluent
but also factually anchored to the core information captured in the KG. The final output is a hybrid
summary that synergistically combines the KG-guided abstractive text with a traditional extractive
summary produced by BM25, thus balancing narrative coherence with factual fidelity [
          <xref ref-type="bibr" rid="ref14 ref19">19, 14</xref>
          ].
        </p>
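        <p>A minimal sketch of this build-then-serialize step follows. The actual system uses networkx; a plain dict keeps the example dependency-free, and the triples are invented for illustration:</p>

```python
def build_kg(triples):
    """Collect (subject, relation, object) triples into an adjacency map."""
    kg = {}
    for s, r, o in triples:
        kg.setdefault(s, []).append((r, o))
    return kg

def serialize_kg(kg):
    """Flatten the graph back into text usable as LLM context."""
    lines = []
    for s in sorted(kg):
        for r, o in kg[s]:
            lines.append(f"{s} {r} {o}.")
    return " ".join(lines)

# Invented clinical triples for illustration
triples = [("patient", "presents_with", "dyspnea"),
           ("patient", "diagnosed_with", "pneumonia"),
           ("pneumonia", "treated_with", "amoxicillin")]
print(serialize_kg(build_kg(triples)))
```

The serialized string is then injected into the LLM prompt as structured context, anchoring the abstractive summary to the extracted facts.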
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Architecture B: Dual Hybrid + QA Enhancements</title>
        <p>
          Building upon the principles of Architecture A2, this configuration incorporates a highly optimized
post-processing system. The objective is to merge the strengths of fine-tuned models, such as mBART and
mT5 [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], with the advanced reasoning and generation capabilities of GPT-4o [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. This enhancement
is informed by a detailed analysis of gold-standard summaries.
        </p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Architecture C: Supervised Fine-Tuning (SFT) of Multilingual Models</title>
        <p>
          This architecture is centered on exploring the efficacy of domain adaptation via Supervised
Fine-Tuning (SFT) [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. This process adapts the model’s general capabilities to the specific nuances of the
summarization task at hand. The models subjected to SFT in this study were mBART, mT5 [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ], and
BioMistral [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ].
        </p>
      </sec>
      <sec id="sec-4-5">
        <title>4.5. Architecture D: Few-Shot Prompting on General-Purpose LLMs</title>
        <p>
          This set of architectures leverages the powerful in-context learning capabilities of large-scale,
non-finetuned LLMs. The core technique employed is Few-Shot Prompting [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ], which involves conditioning
the model by providing a small number of illustrative examples or example patterns (i.e., “shots”) of the
task directly within the prompt. This guides the model to generate output that conforms to the desired
format and style, all without the need for updating the model’s underlying parameters. To enhance
the effectiveness of the prompts, they were enriched with several layers of information such as NER
Extraction [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ] and gold document patterns. This approach was systematically tested on two prominent
models: Llama 3.1 8B and Mistral 7B. Additionally, for a subset of the Spanish-language runs, a
post-processing step was implemented using GPT-4o. This step, which also falls under the paradigm of
few-shot prompting, was designed to further refine and enhance the quality of the generated summaries.
        </p>
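        <p>A sketch of how such a prompt could be assembled is shown below. The function name, section wording, and target length are illustrative assumptions, not our exact prompt template:</p>

```python
def build_fewshot_prompt(document, shots, entities, target_words=120):
    """Assemble a few-shot prompt: instruction, example case/summary pairs
    ("shots"), NER entities to preserve, then the document to summarize."""
    parts = ["Summarize the clinical case in about "
             f"{target_words} words, preserving the listed entities.\n"]
    for src, ref in shots:
        parts.append(f"Case: {src}\nSummary: {ref}\n")
    parts.append("Entities to preserve: " + ", ".join(entities))
    parts.append(f"Case: {document}\nSummary:")
    return "\n".join(parts)
```

The model completes the trailing "Summary:" line, imitating the format and style of the provided examples without any parameter updates.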
      </sec>
      <sec id="sec-4-6">
        <title>4.6. A Comprehensive Analysis of Medical Gold Summaries</title>
        <p>A comprehensive analysis of medical gold summaries was conducted to extract optimal patterns for
automated summarization systems. The analysis employed both traditional NLP metrics and advanced
NER (Named Entity Recognition) [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ] techniques to identify key characteristics that define
high-quality medical summaries. For each language, the analysis was conducted independently, and the
corresponding outputs were encoded into dense vector representations and stored in a vector database.
These embeddings were later retrieved during inference using a Retrieval-Augmented Generation (RAG)
[
          <xref ref-type="bibr" rid="ref29 ref6">29, 6</xref>
          ] architecture to provide contextual grounding for the language model [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </p>
        <p>Fine-tuning and NER configuration per language:

Language      Model(s)                       NER Model                        Epochs  Batch  Grad. Accum.  LR
English       BioMistral-7B                  Helios9/BioMed                   3       1      16            2e-4
Spanish (1)   mBART-large-50, mT5-base       BSC-NLP4BIA/bsc-bio-ehr-es       3       1      8             2e-5
Spanish (2)   BioMistral-7B                  BSC-NLP4BIA/bsc-bio-ehr-es       3       1      16            2e-4
French        mBART-large-50, Mixtral-8x7B*  TypicaAI/HealthcareNER-Fr        3/2*    1      8/8           2e-5/2e-4
Portuguese    mBART-large-50, mT5-base       HUMADEX/portuguese_medical_ner   3       1      8             2e-5

* Mixtral-8x7B faced device management issues during inference and was not successfully deployed.
* Mixtral used 2 epochs for lighter fine-tuning; values show mBART/Mixtral where different.</p>
        <p>Key Findings &amp; Pattern Extraction. The quantitative analysis presented below focuses on Spanish gold-standard biomedical summaries as a
representative example. Similar analyses were conducted for each language in our multilingual dataset,
with language-specific patterns informing the respective summarization strategies.
• Target Length: 120.1 words (from gold dataset analysis)
• Target Sentences: 5 sentences
• Average Sentence Length: ~24 words per sentence</p>
        <p>Words in Summaries. The distribution is centered around a mean of 120.1 words, indicating moderate
summary length with a right-skewed tail due to a few longer samples.</p>
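        <p>The corpus statistics above can be reproduced with a small helper of this shape (an illustrative sketch; the actual analysis pipeline is more involved):</p>

```python
import re
from statistics import mean

def summary_stats(summaries):
    """Mean word count, mean sentence count, and words per sentence
    over a list of summary strings."""
    words = [len(s.split()) for s in summaries]
    sents = [len([x for x in re.split(r"[.!?]+\s*", s) if x.strip()])
             for s in summaries]
    return {"mean_words": mean(words),
            "mean_sentences": mean(sents),
            "words_per_sentence": mean(words) / mean(sents)}
```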
        <p>Medical Terms. Most summaries contain fewer than 10 medical terms, with a mean of 4.0. These
medical terms are obtained through regex rules designed to capture common medical terminology
not included in the NER model. This suggests that while summaries are concise, they embed relevant
domain-specific terminology.</p>
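        <p>The regex-based term counting can be sketched as below. The patterns shown are illustrative examples of common medical suffixes and dosage forms; the actual rule set used in the analysis is more extensive:</p>

```python
import re

# Illustrative patterns only; the real rule set is larger.
MEDICAL_PATTERNS = [
    r"\b\w+itis\b",    # inflammations (hepatitis, gastritis, ...)
    r"\b\w+emia\b",    # blood conditions (anemia, septicemia, ...)
    r"\b\w+ectomy\b",  # surgical removals
    r"\b\d+\s?mg\b",   # dosages
]

def count_medical_terms(text):
    """Count matches of the medical-terminology regexes in a summary."""
    return sum(len(re.findall(p, text, flags=re.IGNORECASE))
               for p in MEDICAL_PATTERNS)

print(count_medical_terms("Hepatitis treated with 500 mg amoxicillin; mild anemia."))  # → 3
```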
        <p>NER Entities. The Named Entity Recognition (NER) distribution shows a broader spread, with a
mean of 5.2 entities per summary. For NER processing, we employed different biomedical NER models
per language, with the specific models detailed in the table above. This reflects consistent inclusion of
structured biomedical concepts across different linguistic contexts.</p>
      </sec>
      <sec id="sec-4-7">
        <title>4.7. Fine-Tuning Configuration</title>
        <p>Training Methodology: The fine-tuning process varied by model architecture. For
sequence-to-sequence models (mBART, mT5), we used standard Seq2SeqTrainer with target-specific tokenization.
For causal language models (BioMistral, Mixtral), we applied Parameter-Efficient Fine-Tuning (PEFT)
using LoRA (Low-Rank Adaptation) with rank=16, alpha=32, and dropout=0.1, targeting attention
projections and feed-forward layers.</p>
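        <p>Conceptually, LoRA freezes the base weight matrix W and trains only a low-rank update, y = x(W + (alpha/r)·A·B). The sketch below illustrates that rule with toy matrices and naive matrix multiplication; it is not the training code, which uses the PEFT library:</p>

```python
def matmul(X, Y):
    """Naive matrix multiply for small illustrative matrices."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lora_forward(x, W, A, B, alpha=32, r=16):
    """LoRA forward pass: y = x(W + (alpha/r)*A*B).
    Only A (d x r) and B (r x k) are trained; W stays frozen.
    Defaults match the rank/alpha used in our fine-tuning."""
    scale = alpha / r
    delta = matmul(A, B)
    W_eff = [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
             for i in range(len(W))]
    return matmul(x, W_eff)
```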
        <p>Language-Specific Configurations: mBART models were configured with appropriate language
codes (en_XX, es_XX, fr_XX, pt_XX) for source and target languages. Mixtral employed
instruction-tuned prompts with chat formatting. All models used a maximum sequence length of 1024 tokens for
input and 256 for output summaries.</p>
        <p>Training Hyperparameters:
• Loss Function: Cross-entropy loss for all models (sequence-to-sequence for mBART/mT5,
language modeling for BioMistral/Mixtral)
• Optimizer: AdamW with weight decay 0.01
• Warmup: 25-50 steps depending on dataset size
• Scheduler: Linear decay after warmup
• Precision: FP16 enabled where supported, BF16 for Mixtral
• Evaluation: Loss-based with early stopping disabled
• Saving: Best model based on evaluation loss, limited to 1-2 checkpoints</p>
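        <p>The warmup-then-linear-decay schedule above can be sketched as a pure function of the step index (illustrative; the runs used the trainer's built-in scheduler, and the step counts below are example values):</p>

```python
def lr_at_step(step, total_steps, base_lr=2e-5, warmup_steps=25):
    """Linear warmup from 0 to base_lr over warmup_steps,
    then linear decay back to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = total_steps - step
    return base_lr * max(remaining, 0) / (total_steps - warmup_steps)
```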
        <p>Dataset Preparation: Training datasets consisted of 592 gold standard document-summary pairs
per language, except for Spanish Run 2 which utilized an extended dataset of 12,100 pairs. For causal
models, documents were formatted with instruction templates emphasizing biomedical summarization
objectives and clinical relevance.</p>
      </sec>
      <sec id="sec-4-8">
        <title>4.8. Language-Specific Implementation Details</title>
        <p>We provide comprehensive implementation details for each run across all languages:</p>
        <p>English Runs:
• Run 1 (Architecture A): Phase 1: BM25 extractive preprocessing. Phase 2: BioMistral fine-tuning.
Phase 3: Hybrid generation with semantic reranking. Strategy: 3-stage pipeline optimization.
• Run 2 (Architecture B): Phase 1: NER entity extraction. Phase 2: BioMistral generation. Phase
3: Coverage enhancement validation. Strategy: NER-integrated summarization.</p>
        <p>Spanish Runs:
• Run 1 (Architecture A2): Phase 1: Model fine-tuning (mBART, mT5). Phase 2: Knowledge
graph ensemble. Phase 3: NER preservation validation. Strategy: Dual-model hybrid approach.
• Run 2 (Architecture A+D): Phase 1: BM25 extractive preprocessing. Phase 2: Llama 3.1 8B
generation. Phase 3: GPT-4o quality enhancement. Strategy: 3-stage quality assurance.
• Run 3 (Architecture A+D): Phase 1: BM25 extractive preprocessing. Phase 2: Mistral 7B
generation. Phase 3: Dense retrieval reranking. Strategy: Dense retrieval optimization.
• Run 4 (Architecture B+D): Phase 1: NER-guided KG summarization. Phase 2: Hybrid generation.
Phase 3: GPT-4o quality assurance. Strategy: Dual-model with proprietary enhancement.
• Run 5 (Architecture C): Phase 1: BioMistral fine-tuning on large-scale data. Phase 2: Hybrid
summarization. Phase 3: Semantic reranking. Strategy: Parameter-efficient adaptation.</p>
        <p>Portuguese Runs:
• Run 1 (Architecture C+A): Phase 1: Fine-tuning (mBART, mT5). Phase 2: Initial hybrid approach
(failed). Phase 3: Revised BM25-neural hybrid. Strategy: Fine-tuning with fallback mechanism.
• Run 2 (Architecture A): Phase 1: Re-ranking with multiple summarization modalities. Phase 2:
Gold pattern extraction. Phase 3: Quality control pipeline. Strategy: Pattern-based reranking.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>French Run 1 (Architecture C): Phase 1: Fine-tuning (mBART, Mixtral). Phase 2: NER-guided
generation. Phase 3: Quality normalization with BM25 fallback. Strategy: Quality threshold optimization.</p>
      <p>Architecture Legend: A: Extractive with Neural Reranking; A2: Graph-Based Ensemble; B: Dual
Hybrid + QA Enhancements; C: Supervised Fine-Tuning; D: Few-Shot Prompting.</p>
      <p>English Results. Two experimental runs demonstrated consistent performance with minimal
variation. BERTScore results showed good semantic similarity (F1: 0.842, 0.840) with Run 1 slightly
outperforming Run 2 (Precision: 0.848, Recall: 0.837). ROUGE metrics indicated more constrained
lexical overlap (F1: 0.176, 0.167), with Run 1 achieving superior performance (Precision: 0.213, Recall:
0.165). Lower ROUGE scores compared to Spanish suggest greater lexical diversity in English biomedical
references.</p>
      <p>Spanish Results. Five runs showed consistent performance with moderate variance. BERTScore F1
ranged 0.701–0.742, with es_run_2 achieving highest performance (F1: 0.742, Precision: 0.752, Recall:
0.732). All runs maintained balanced precision-recall trade-offs with recall above 0.710. ROUGE F1
scores ranged 0.190–0.247, with Run 2 demonstrating superior lexical overlap (F1: 0.247, Precision:
0.283, Recall: 0.242). Lower ROUGE compared to BERTScore indicates effective semantic capture despite
challenging exact lexical matching.</p>
      <p>Portuguese Results. Run 2 achieved higher BERTScore F1 (0.71 vs 0.689) while Run 1 demonstrated
superior ROUGE F1 (0.196 vs 0.187). Both runs showed identical BERTScore precision (0.70), with Run
2 exhibiting higher recall (0.71 vs 0.67) and Run 1 displaying greater ROUGE precision (0.23 vs 0.173).</p>
      <p>French Results. Single run fr_run_1 achieved solid semantic performance (BERTScore F1: 0.712,
Precision: 0.716, Recall: 0.709), positioning between Spanish (0.701–0.742) and English (0.84+) results.
ROUGE metrics yielded F1: 0.196 (Precision: 0.210, Recall: 0.213), with recall exceeding precision,
indicating comprehensive summary generation with some redundancy.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Analysis and Discussion</title>
      <sec id="sec-6-1">
        <title>6.1. Cross-Lingual Performance Patterns</title>
        <p>The results reveal distinct performance hierarchies across languages as shown in Figure 2.
• Semantic Similarity (BERTScore): English (0.842) &gt; Spanish (0.742) &gt; French (0.712) &gt;
Portuguese (0.710)
• Lexical Overlap (ROUGE): Spanish (0.247) &gt; French/Portuguese (0.197) &gt; English (0.176)</p>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Architecture Effectiveness</title>
        <p>Three-stage pipelines (Spanish Runs 2-3) achieved best performance by combining extractive
preprocessing, neural generation, and quality enhancement. The integration of GPT-4o for post-processing
(Spanish Run 2) yielded the best overall results across both metrics. The GPT-4o quality assurance
operates as an intelligent agent within our LangGraph multi-agent architecture framework, functioning as a
specialized tool that performs automated quality validation, content refinement, and error correction
on generated summaries. While the system architecture supports both GPT-4o and GPT-4.1 models, we
utilized GPT-4o for all quality assurance operations in this implementation.</p>
        <p>Knowledge Graph approaches (Spanish Runs 1, 4) demonstrated competitive performance, with
GPT-4o enhancement providing improvements (+3.2% ROUGE, +3.7% BERTScore over baseline KG
approaches). The multi-agent framework enables seamless integration where the knowledge graph
extraction agent passes structured medical entities to the GPT-4o quality assurance agent for validation
and summary enhancement.</p>
        <p>Fine-tuning strategies showed mixed results. BioMistral fine-tuning (Spanish Run 5)
underperformed despite extensive parameter-efficient training, likely due to the English-centric pre-training
conflicting with Spanish medical terminology requirements.</p>
      </sec>
      <sec id="sec-6-3">
        <title>6.3. Semantic vs. Lexical Trade-offs</title>
        <p>The analysis reveals a trade-off between semantic coherence and lexical overlap (Pearson r = -0.490, p
&lt; 0.05). English achieves maximum semantic performance but lowest lexical overlap, while Spanish
demonstrates the most balanced profile across both metrics.</p>
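        <p>For reference, the correlation statistic reported above is the standard sample Pearson coefficient, computed over per-run (BERTScore, ROUGE) pairs; the sketch below shows the formula, not our evaluation script:</p>

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient between two series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```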
        <p>This pattern suggests that different languages require distinct optimization strategies: English benefits
from semantic preservation approaches, while Spanish allows for better balance between abstractive
generation and lexical alignment.</p>
      </sec>
      <sec id="sec-6-4">
        <title>6.4. Error Analysis</title>
        <p>
          The challenges of building multilingual language models for medicine [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ] are reflected in our
cross-lingual performance variations. Key limitations identified include:
• Language-specific model adaptation challenges, particularly for domain-specific models trained
primarily in English
• Entity preservation vs. fluency trade-offs when implementing NER-guided generation
• Computational resource constraints affecting architecture selection, especially for large
multilingual models
        </p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions</title>
      <p>This work presents Agentic MCS, a multilingual framework for biomedical text summarization evaluated
across four languages. Key findings include:</p>
      <p>English achieves high semantic similarity (BERTScore: 0.842) through optimized neural architectures,
demonstrating the effectiveness of domain-specific fine-tuning with BioMistral and hybrid reranking
approaches.</p>
      <p>Spanish delivers the most balanced performance across both metrics (BERTScore: 0.742, ROUGE:
0.247), with three-stage pipelines and GPT-4o enhancement proving most effective. The diversity of
architectural approaches (5 runs) provides valuable insights into optimal strategy selection.</p>
      <p>Portuguese and French achieve good results (BERTScore: 0.71+, ROUGE: 0.197) through adapted
architectures, though with distinct optimization patterns reflecting language-specific characteristics.
Portuguese benefits from pattern mining approaches, while French shows effectiveness of quality
normalization strategies.</p>
      <p>Architecture effectiveness varies significantly by language, with ensemble approaches and quality
enhancement mechanisms outperforming single-model systems across all languages. The systematic
trade-off between semantic and lexical optimization provides insights for future multilingual biomedical
summarization systems.</p>
      <p>The consistent semantic performance (BERTScore 0.71-0.84) across languages demonstrates the
framework’s robust abstractive capabilities, while varying lexical overlap reflects language-specific
adaptation requirements.</p>
    </sec>
    <sec id="sec-8">
      <title>8. Future Work</title>
      <p>The current analysis reveals several promising directions for advancing multilingual biomedical
summarization capabilities. Building upon our demonstrated semantic fidelity across languages, future
research will pursue three complementary directions.</p>
      <p>Comprehensive Evaluation Framework: Comprehensive baseline comparisons and ablation
studies are planned for the extended journal version to quantify the individual contributions of each
architectural component and validate the necessity of multi-agent complexity.</p>
      <p>Enhanced Quality Assurance Framework: We plan to develop enhanced RAG-based quality
assurance mechanisms that leverage our multi-agent architecture to maintain consistency across
languages while addressing the identified limitations in entity preservation and fluency trade-offs.</p>
      <p>
        Multimodal Integration: The evolution toward a multimodal clinical summarization framework
represents our primary long-term objective. This development will focus on efficiently processing
multimodal biomedical inputs prevalent in clinical and research settings, aligning with advances in
multimodal biomedical foundation models [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ] that can process both text and image data effectively.
      </p>
      <p>Advanced Cross-lingual Adaptation: Future work will develop systematic approaches to model
fine-tuning with incrementally larger datasets combined with quality-supervised training methodologies.
This includes expanded cross-lingual transfer learning strategies to better address language-specific
medical terminology requirements and optimize the semantic versus lexical trade-offs identified in our
analysis.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used GPT-4 for grammar and spelling checking.
After using this tool, the author(s) reviewed and edited the content as needed and take(s)
full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Bornmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Haunschild</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mutz</surname>
          </string-name>
          ,
          <article-title>Growth rates of modern science: a latent piecewise growth curve approach to model publication numbers from established and new literature databases</article-title>
          ,
          <source>Humanities and Social Sciences Communications</source>
          <volume>8</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Holmgren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Adler-Milstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Apathy</surname>
          </string-name>
          ,
          <article-title>Electronic health record documentation burden crowds out health information exchange use by primary care physicians: Article examines electronic health record documentation burden</article-title>
          ,
          <source>Health Afairs 43</source>
          (
          <year>2024</year>
          )
          <fpage>1538</fpage>
          -
          <lpage>1545</lpage>
          . doi:
          <volume>10</volume>
          .1377/hlthaff.
          <year>2024</year>
          .
          <volume>00398</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.</given-names>
            <surname>Asgari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kaur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Nuredini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Balloch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Taylor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Sebire</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Robinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sridharan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pimenta</surname>
          </string-name>
          ,
          <article-title>Impact of electronic health record use on cognitive load and burnout amongst clinicians: A narrative review (preprint)</article-title>
          ,
          <source>JMIR Medical Informatics</source>
          <volume>12</volume>
          (
          <year>2023</year>
          ). doi: 10.2196/55499.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><given-names>D.</given-names> <surname>Van Veen</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Van Uden</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Blankemeier</surname></string-name>,
          <string-name><given-names>J.-B.</given-names> <surname>Delbrouck</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Aali</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Blüthgen</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Pareek</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Polacin</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Reis</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Seehofnerova</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Rohatgi</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Hosamani</surname></string-name>,
          <string-name><given-names>W.</given-names> <surname>Collins</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Ahuja</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Langlotz</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Hom</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Gatidis</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Chaudhari</surname></string-name>,
          <article-title>Clinical text summarization: Adapting large language models can outperform human experts</article-title>
          ,
          <source>Research Square</source>
          (
          <year>2023</year>
          ). doi: 10.21203/rs.3.rs-3483777/v1.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>X.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-C. J.</given-names>
            <surname>Kuo</surname>
          </string-name>
          , et al.,
          <article-title>Knowledge graph embedding: An overview</article-title>
          ,
          <source>APSIPA Transactions on Signal and Information Processing</source>
          <volume>13</volume>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Arslan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ghanem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Munawar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Cruz</surname>
          </string-name>
          ,
          <article-title>A survey on RAG with LLMs</article-title>
          ,
          <source>Procedia Computer Science</source>
          <volume>246</volume>
          (
          <year>2024</year>
          )
          <fpage>3781</fpage>
          -
          <lpage>3790</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H. Y.</given-names>
            <surname>Koh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ju</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <article-title>An empirical survey on long document summarization: Datasets, models and metrics</article-title>
          ,
          <source>ACM Comput. Surv</source>
          .
          <volume>55</volume>
          (
          <year>2022</year>
          ). URL: https://doi.org/10.1145/3545176. doi: 10.1145/3545176.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Widyassari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rustad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. F.</given-names>
            <surname>Shidik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Noersasongko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Syukur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Afandy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R. I. M.</given-names>
            <surname>Setiadi</surname>
          </string-name>
          ,
          <article-title>Review of automatic text summarization techniques &amp; methods</article-title>
          ,
          <source>Journal of King Saud University - Computer and Information Sciences</source>
          <volume>34</volume>
          (
          <year>2022</year>
          )
          <fpage>1029</fpage>
          -
          <lpage>1046</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lyu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Large language models-powered clinical decision support: Enhancing or replacing human expertise?</article-title>
          ,
          <source>Intelligent Medicine</source>
          <volume>05</volume>
          (
          <year>2025</year>
          )
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          . doi: 10.1016/j.imed.2025.01.001.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Rodríguez-Ortega</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Rodríguez-Lopez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lima-López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Escolano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Melero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pratesi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Vigil-Gimenez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fernandez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Farré-Maduell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krallinger</surname>
          </string-name>
          ,
          <article-title>Overview of MultiClinSum task at BioASQ 2025: evaluation of clinical case summarization strategies for multiple languages: data, evaluation, resources and results</article-title>
          , in:
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spina</surname>
          </string-name>
          (Eds.),
          <source>CLEF 2025 Working Notes</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name><given-names>D. B.</given-names> <surname>Acharya</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Kuppan</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Divya</surname></string-name>,
          <article-title>Agentic AI: Autonomous intelligence for complex goals - a comprehensive survey</article-title>
          ,
          <source>IEEE Access</source>
          <volume>13</volume>
          (
          <year>2025</year>
          )
          <fpage>18912</fpage>
          -
          <lpage>18936</lpage>
          . doi: 10.1109/ACCESS.2025.3532853.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. B.</given-names>
            <surname>McCoy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wright</surname>
          </string-name>
          ,
          <article-title>Improving large language model applications in biomedicine with retrieval-augmented generation: a systematic review, meta-analysis, and clinical development guidelines</article-title>
          ,
          <source>Journal of the American Medical Informatics Association</source>
          (
          <year>2025</year>
          )
          ocaf008.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Thapa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Adhikari</surname>
          </string-name>
          ,
          <article-title>ChatGPT, Bard, and Large Language Models for Biomedical Research: Opportunities and Pitfalls</article-title>
          ,
          <source>Annals of Biomedical Engineering</source>
          <volume>51</volume>
          (
          <year>2023</year>
          )
          <fpage>2647</fpage>
          -
          <lpage>2651</lpage>
          . URL: https://doi.org/10.1007/s10439-023-03284-0. doi: 10.1007/s10439-023-03284-0.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Trotman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Puurula</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Burgess</surname>
          </string-name>
          ,
          <article-title>Improvements to BM25 and language models examined</article-title>
          ,
          in:
          <source>Proceedings of the 19th Australasian Document Computing Symposium, ADCS '14</source>
          , Association for Computing Machinery, New York, NY, USA,
          <year>2014</year>
          , pp.
          <fpage>58</fpage>
          -
          <lpage>65</lpage>
          . URL: https://doi.org/10.1145/2682862.2682863. doi: 10.1145/2682862.2682863.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>W.</given-names>
            <surname>Lan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. P.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <article-title>The large language models on biomedical data analysis: A survey</article-title>
          ,
          <source>IEEE Journal of Biomedical and Health Informatics</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Rybinski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Karimi</surname>
          </string-name>
          ,
          <article-title>Clinical trial search: Using biomedical language understanding models for re-ranking</article-title>
          ,
          <source>Journal of Biomedical Informatics</source>
          <volume>109</volume>
          (
          <year>2020</year>
          )
          103530. URL: https://www.sciencedirect.com/science/article/pii/S1532046420301581. doi: 10.1016/j.jbi.2020.103530.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>E.</given-names>
            <surname>Yoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <article-title>Summary-sentence level hierarchical supervision for re-ranking model of two-stage abstractive summarization framework</article-title>
          ,
          <source>Mathematics</source>
          <volume>12</volume>
          (
          <year>2024</year>
          )
          <fpage>521</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kaur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kaur</surname>
          </string-name>
          ,
          <article-title>Implementation of enhanced graph layout algorithm for visualizing social network data using NetworkX Library</article-title>
          ,
          <source>International Journal of Advanced Research in Computer Science</source>
          <volume>8</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ban</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Usman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Guan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Knowledge graph quality control: A survey</article-title>
          ,
          <source>Fundamental Research</source>
          <volume>1</volume>
          (
          <year>2021</year>
          )
          <fpage>607</fpage>
          -
          <lpage>626</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <article-title>Knowledge Graph-Guided Retrieval Augmented Generation</article-title>
          ,
          <year>2025</year>
          . doi: 10.48550/arXiv.2502.06864. arXiv: 2502.06864.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>P.-E.</given-names>
            <surname>Genest</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lapalme</surname>
          </string-name>
          ,
          <article-title>Framework for abstractive summarization using text-to-text generation</article-title>
          ,
          <source>in: Proceedings of the workshop on monolingual text-to-text generation</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>64</fpage>
          -
          <lpage>73</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>P.</given-names>
            <surname>Wilman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Atara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Suhartono</surname>
          </string-name>
          ,
          <article-title>Abstractive English document summarization using BART model with chunk method</article-title>
          ,
          <source>Procedia Computer Science</source>
          <volume>245</volume>
          (
          <year>2024</year>
          )
          <fpage>1010</fpage>
          -
          <lpage>1019</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S1877050924031375. doi: 10.1016/j.procs.2024.10.329. 9th International Conference on Computer Science and Computational Intelligence 2024 (ICCSCI 2024).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gallifant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fiske</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. A.</given-names>
            <surname>Levites Strekalova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Osorio-Valencia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Parke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mwavu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Martinez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Gichoya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ghassemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Demner-Fushman</surname>
          </string-name>
          , et al.,
          <article-title>Peer review of GPT-4 technical report and systems card</article-title>
          ,
          <source>PLOS Digital Health</source>
          <volume>3</volume>
          (
          <year>2024</year>
          )
          e0000417.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jiang</surname>
          </string-name>
          , et al.,
          <article-title>Parameter-efficient fine-tuning in large language models: a survey of methodologies</article-title>
          ,
          <source>Artificial Intelligence Review</source>
          <volume>58</volume>
          (
          <year>2025</year>
          ). URL: https://doi.org/10.1007/s10462-025-11236-4. doi: 10.1007/s10462-025-11236-4.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>K.</given-names>
            <surname>Pająk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pająk</surname>
          </string-name>
          ,
          <article-title>Multilingual fine-tuning for grammatical error correction</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>200</volume>
          (
          <year>2022</year>
          )
          116948. URL: https://www.sciencedirect.com/science/article/pii/S0957417422003773. doi: 10.1016/j.eswa.2022.116948.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Labrak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bazoge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Morin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-A.</given-names>
            <surname>Gourraud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rouvier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Dufour</surname>
          </string-name>
          ,
          <article-title>BioMistral: A collection of open-source pretrained large language models for medical domains</article-title>
          , in:
          <string-name>
            <given-names>L.-W.</given-names>
            <surname>Ku</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Martins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Srikumar</surname>
          </string-name>
          (Eds.),
          <source>Findings of the Association for Computational Linguistics: ACL</source>
          <year>2024</year>
          ,
          Association for Computational Linguistics, Bangkok, Thailand,
          <year>2024</year>
          , pp.
          <fpage>5848</fpage>
          -
          <lpage>5864</lpage>
          . URL: https://aclanthology.org/2024.findings-acl.348/. doi: 10.18653/v1/2024.findings-acl.348.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>T.</given-names>
            <surname>Schick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schütze</surname>
          </string-name>
          ,
          <article-title>True few-shot learning with prompts - A real-world perspective</article-title>
          ,
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>10</volume>
          (
          <year>2022</year>
          )
          <fpage>716</fpage>
          -
          <lpage>731</lpage>
          . URL: https://aclanthology.org/2022.tacl-1.41/. doi: 10.1162/tacl_a_00485.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>N.</given-names>
            <surname>Perera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dehmer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Emmert-Streib</surname>
          </string-name>
          ,
          <article-title>Named entity recognition and relation detection for biomedical information extraction</article-title>
          ,
          <source>Frontiers in Cell and Developmental Biology</source>
          <volume>8</volume>
          (
          <year>2020</year>
          )
          <fpage>673</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          , et al.,
          <article-title>Pseudo-Knowledge Graph: Meta-Path Guided Retrieval and In-Graph Text for RAG-Equipped LLM</article-title>
          , https://arxiv.org/html/2503.00309v1,
          <year>2025</year>
          . doi: 10.48550/arXiv.2503.00309. arXiv: 2503.00309.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>P.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <article-title>Towards building multilingual language model for medicine</article-title>
          ,
          <source>Nature Communications</source>
          <volume>15</volume>
          (
          <year>2024</year>
          )
          <fpage>8384</fpage>
          . doi:10.1038/s41467-024-52417-z.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Usuyama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bagga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tinn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Preston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Valluri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tupini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mazzola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shukla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Liden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Crabtree</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Piening</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bifulco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. P.</given-names>
            <surname>Lungren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Naumann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Poon</surname>
          </string-name>
          ,
          <article-title>BiomedCLIP: A multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs</article-title>
          ,
          <year>2025</year>
          . doi:10.48550/
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>