<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using LLMs to Generate Patient Journeys in Portuguese: an Experiment</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tahsir Ahmed Munna</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ana Luísa Fernandes</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Purificação Silvano</string-name>
          <email>msilvano@letras.up.pt</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nuno Guimarães</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alípio Jorge</string-name>
          <email>amjorge@fc.up.pt</email>
        </contrib>
        <aff>INESC TEC, Portugal</aff>
      </contrib-group>
      <abstract>
        <p>The relationship of a patient with a hospital from admission to discharge is often recorded in a series of textual documents that describe the patient's journey. These documents are essential for analyzing the different steps of the clinical process and for conducting aggregated studies of patient paths in the hospital. In this paper, we explore the potential of Large Language Models (LLMs) to generate realistic and comprehensive patient journeys in European Portuguese, addressing the scarcity of medical data in this specific context. We employed Google's Gemini 1.5 Flash model and used a dataset of 285 case reports in European Portuguese, published on the website of the Portuguese Society of Internal Medicine (SPMI), as references for generating synthetic medical reports. Our methodology follows a sequential approach to generating a synthetic patient journey. First, we generate an admission report, followed by a discharge report. Then, we generate a comprehensive patient journey that integrates the admission, multiple daily progress reports, and the discharge into a cohesive narrative. This end-to-end process yields a realistic and detailed representation of the patient's clinical pathway. The generated reports were evaluated by medical and linguistic professionals, as well as with automatic metrics measuring the inclusion of key medical entities, similarity to the case report, and use of the correct Portuguese variant. Both qualitative and quantitative evaluations confirmed that the generated synthetic reports are predominantly written in European Portuguese without losing important medical information from the case reports. This work contributes to developing high-quality synthetic medical data for training LLMs and to advancing AI-driven healthcare applications in under-resourced language settings.</p>
      </abstract>
      <kwd-group>
        <kwd>Large Language Model</kwd>
        <kwd>Patient Journey</kwd>
        <kwd>Medical Text Generation</kwd>
        <kwd>Gemini</kwd>
        <kwd>Prompt Engineering</kwd>
        <kwd>European Portuguese</kwd>
        <kwd>Contextual Coherence</kwd>
        <kwd>Semantic Accuracy</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In recent years, Large Language Models (LLMs) have driven advances across a variety of
complex tasks, including question answering [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ], code generation [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ], and text generation [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ].
The multimodal capabilities [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] of LLMs are another notable feature. Models like GPT-4 [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] combine
textual and visual inputs, expanding the range of possible applications of these models.
      </p>
      <p>
        LLMs have also achieved outstanding results in domain-specific tasks, with the medical field
being a prime example [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. For instance, LLMs are used to generate clinical
summaries [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], improve diagnostic processes [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], and provide medical decision support [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The
capacity to process unstructured clinical narratives and combine multimodal data, including textual
inputs and medical images, has considerably improved their value in healthcare [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. These features
establish LLMs as transformative tools for improving healthcare services and research.
      </p>
      <p>
        Despite these developments, the lack of high-quality, annotated medical data prevents LLMs
from being widely used in the medical field. Strict privacy laws such as GDPR<sup>1</sup> and HIPAA<sup>2</sup>, as well as
the difficulties in obtaining patient consent [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], usually restrict access to real-world medical datasets.
Furthermore, the high variability of medical information makes it difficult to build complete and useful
real-world medical datasets. Although a significant amount of medical data is available in English, such data
are far harder to obtain in other languages such as Portuguese, especially in its European variant. This lack
of data hampers the training and fine-tuning of LLMs in specific languages [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. To address these obstacles, several approaches
for synthetic data generation [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] have been proposed. In particular, injecting synthetic data into
the training process of LLMs has been shown to improve their performance on specific medical-domain
tasks [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
        To support healthcare applications, synthetic data generation has typically focused on generating
individual medical reports such as admission and discharge reports [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], summarizing medical text
[
        <xref ref-type="bibr" rid="ref16 ref6">16, 6</xref>
        ], and supporting tasks such as medical question answering [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. However, most medical language models, including ClinicalBERT [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], MedPaLM [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], BioGPT [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ],
PMCLLaMA [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], and BioMedLM [21], as well as the synthetic medical data generated with LLMs, are predominantly in English,
with limited coverage of other languages such as Portuguese. In particular, European Portuguese remains comparatively
underrepresented, especially in the medical field, and is often classified as a low- to mid-resource language in this
context [22]. Despite efforts by the research community and industry to develop language-unified or language-specific LLMs
[23], significant gaps persist.
      </p>
      <p>In this paper, we help mitigate the lack of clinical corpora in European
Portuguese by proposing a method to generate a specific corpus that can also be adapted to other
languages. To the best of our knowledge, no prior research has explicitly focused on generating
comprehensive synthetic medical reports in European Portuguese that encapsulate a patient’s entire
hospitalization journey—from admission and daily progress updates to discharge, all integrated into a
cohesive narrative.</p>
      <p>This work has the following main contributions:
• Generation of Synthetic Medical Dataset: The goal of this research is to generate synthetic
datasets that convey the full journey of a patient’s hospital stay, including admission reports,
daily progress reports, and discharge summaries. In contrast to traditional datasets, which usually
contain single reports, the method proposed in this study captures the entire range of a patient’s
experience. Fine-tuning LLMs on such a synthetic medical dataset could enable LLM-based medical
support systems to better understand patient hospitalization processes, potentially improving diagnoses,
personalized treatments, and overall care.
• Mitigating the Data Scarcity Problem: This work is part of a project aimed at establishing
Portugal as a global hub for innovative healthcare solutions. It seeks to address the scarcity of data
in European Portuguese for AI-driven medical decision support. By generating comprehensive
synthetic medical datasets, we aim to meet the specific demands of this project and to
contribute to broader advancements in this field.
• Evaluation of an LLM for Generating Patients’ Journeys in European Portuguese (PT-PT):
This study assesses an LLM’s proficiency in generating medical text in European Portuguese.</p>
      <p>The remainder of the paper is structured as follows: Section 2 provides an overview of the existing
literature on natural language generation, with a focus on advances in medical text generation.
Section 3 outlines the proposed pipeline for generating synthetic medical reports. Section 4 details the
qualitative and quantitative evaluation, including exploratory results. Finally, Section 5
concludes the paper, discusses its limitations, and explores potential directions for future research.</p>
      <sec id="sec-1-1">
        <title>1 https://gdpr-info.eu/; 2 https://www.hhs.gov/programs/hipaa/index.html</title>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The rapid growth of LLM technology, exemplified by OpenAI’s GPT-4 [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], DeepSeek [24], and similar
architectures, has sparked significant interest in potential applications within the healthcare domain.
These models, trained on vast amounts of data, demonstrate remarkable capabilities in understanding
and generating human-like text, which can be leveraged for tasks such as medical documentation
[25], patient communication [26], and clinical decision support [27]. Several studies also demonstrate
the effectiveness of LLMs in medical text summarization, where they outperform traditional methods
in both speed and accuracy [28, 29]. Beyond summarization, LLMs have shown promising results in
other critical medical tasks. As Benary et al. [27] explain, LLMs can identify patterns and relationships
within large datasets, enabling applications in clinical decision support. Furthermore, LLMs are used
to analyze medical images, such as X-rays and CT scans, and to extract relevant diagnostic features
[30, 31, 32]. Moreover, the potential of LLMs in drug discovery is increasingly recognized, as researchers
leverage these models to predict molecular properties, optimize drug design, and accelerate the
identification of potential drug candidates [33].
      </p>
      <p>
        However, Omiye et al. [34] note that the lack of high-quality annotated medical data is a significant
limitation when developing robust and reliable LLM-based models for the medical field. This issue is
particularly acute for specific languages and variants, such as European Portuguese, where data scarcity
directly limits the ability to create and validate accurate
models in the target language [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. The limitations imposed by real-world data scarcity have
fueled research into methods for generating synthetic medical data, such as synthetic electronic
health records (EHRs) and clinical notes, offering a potential solution to the data
scarcity problem [35, 36, 37]. However, generating realistic synthetic reports that capture the
temporal sequence of events of a patient journey in European Portuguese presents unique challenges,
demanding a more sophisticated approach beyond existing techniques [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. The generation of a patient
journey requires not only accurate medical information but also a deep understanding of the linguistic
and cultural context [38]. While these applications showcase the versatility of LLMs in healthcare, their
adaptation to the specific challenge of patient journey generation remains largely unexplored. Existing
research primarily focuses on individual tasks such as generating admission reports and discharge notes
[
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>Finally, constructing a comprehensive patient journey requires integrating diverse elements,
including the temporal progression of medical events and settings. In this study, we show that an
existing LLM, Gemini, can generate medical text in a resource-constrained language by leveraging
case reports as references, producing comprehensive patient journeys that can be used to train and
evaluate LLMs.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Proposed Approach</title>
      <p>While existing research often concentrates on generating individual medical reports, we propose a
method for generating synthetic patient journeys that represent a patient’s entire hospital experience,
from admission and daily progress notes to discharge. This holistic approach offers several key advantages.
First, it provides a more complete and nuanced picture of disease progression, enabling more thorough
analysis. Second, it supports the training of AI-driven medical support systems, which require
comprehensive patient information to make accurate decisions. Our approach consists of a sequence
of generations: the initial admission report, which details the patient’s condition upon arrival; daily
progress notes documenting examinations, medications, treatments, and procedures (including
surgeries, if applicable); and finally, the discharge report, which summarizes the patient’s stay and overall
outcome.</p>
      <p>To generate synthetic patient journeys, we used the Gemini 1.5 Flash model<sup>3</sup> via its API. This
model was selected for its fast and versatile performance across a wide range of tasks, as well as our
access to its paid version. However, other powerful LLMs could potentially achieve similar results.</p>
      <sec id="sec-3-1">
        <title>3 https://ai.google.dev/gemini-api/docs/models/gemini</title>
        <p>Figure 1: Pipeline for generating a synthetic patient journey (I = input, O = output, G = text generator, Case Report = individual case report). Step 1: I1 = Case Report + 1st Prompt → G → O1 (Admission Report). Step 2: I2 = Case Report + O1 + 2nd Prompt → G → O2 (Discharge Report). Step 3: I3 = O1 + O2 + 3rd Prompt → G → O3 (Full Journey).</p>
        <p>The pipeline for generating synthetic patient journeys is presented in Figure 1. The
process relies on a generator G (the Gemini 1.5 Flash model), which produces the admission report,
the discharge report, and the patient’s full journey, including daily progress notes. In the first step,
a reference case report and the first prompt are provided as input (I1).</p>
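        <p>The three generation steps can be sketched as follows. This is a minimal illustration and not the authors' code, which is available in their repository. The generator is modelled as a hypothetical Python callable that maps a prompt string to generated text, standing in for a Gemini 1.5 Flash API call. The helper names (generate_journey, PatientJourney) and the way each input is concatenated, including labels such as "CASE REPORT:", are assumptions.</p>
        <preformat preformat-type="code">
```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for the text generator G (Gemini 1.5 Flash in the study):
# any function that takes a prompt string and returns generated text.
Generator = Callable[[str], str]


@dataclass
class PatientJourney:
    admission: str     # O1
    discharge: str     # O2
    full_journey: str  # O3


def generate_journey(case_report: str, prompts: tuple[str, str, str], g: Generator) -> PatientJourney:
    """Run the three sequential generation steps (I1 -> O1, I2 -> O2, I3 -> O3)."""
    first, second, third = prompts
    # Step 1: I1 = case report + 1st prompt -> O1 (admission report)
    o1 = g(f"{first}\n\nCASE REPORT:\n{case_report}")
    # Step 2: I2 = case report + O1 + 2nd prompt -> O2 (discharge report)
    o2 = g(f"{second}\n\nCASE REPORT:\n{case_report}\n\nADMISSION REPORT:\n{o1}")
    # Step 3: I3 = O1 + O2 + 3rd prompt -> O3 (full journey); the case report is deliberately left out
    o3 = g(f"{third}\n\nADMISSION REPORT:\n{o1}\n\nDISCHARGE REPORT:\n{o2}")
    return PatientJourney(admission=o1, discharge=o2, full_journey=o3)


if __name__ == "__main__":
    seen: list[str] = []

    def stub_generator(prompt: str) -> str:
        # Records each prompt and returns a numbered placeholder output.
        seen.append(prompt)
        return f"OUTPUT {len(seen)}"

    journey = generate_journey("Case report text", ("Prompt 1", "Prompt 2", "Prompt 3"), stub_generator)
    print(journey.full_journey)      # OUTPUT 3
    print("CASE REPORT" in seen[2])  # False
```
        </preformat>
        <p>Passing the generator in as a parameter keeps the pipeline independent of any particular LLM client. This is consistent with the remark that other LLMs could play the role of G. It also makes the data flow testable with a stub, which confirms that the third step receives only O1 and O2.</p>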
        <p>The reference case reports were extracted from a publicly available collection of Portuguese clinical
articles in internal medicine, sourced from the SPMI website<sup>4</sup>. A total of 863 articles were initially
collected. Articles that did not contain case reports or were written in languages other than Portuguese
were excluded, resulting in a final dataset of 285 case reports. Each case report describes a patient’s
clinical case. It provides key information for the admission report, such as symptoms, signs, and
relevant medical history, as well as information for the discharge report, such as treatment summaries,
exam results, discharge medications, and follow-up instructions. These case reports were
selected because they are publicly available and can be used without raising privacy or ethical concerns. However, significant
differences in textual structure between case reports and medical reports make our task more
challenging. The case reports also serve as a grounding reference for the generated reports, promoting
accuracy, consistency, and relevance to the specific clinical context. They provide a
reliable foundation for the model to produce coherent and contextually appropriate outputs, reducing
the risk of errors or irrelevant information.</p>
        <p>The prompt specifies the desired output format and content, ensuring clarity and precision in the
generated text. As highlighted by Jin et al. [39], refining prompts often requires multiple
iterations to achieve clear and well-defined instructions. We therefore experimented extensively
to optimize the prompt wording and ensure the accuracy and completeness of the
patient journey. Interestingly, during the experiments, we found that English prompts instructing
the model to generate Portuguese text yielded better results than prompts written directly in Portuguese.
As a result, all prompts in this study were written in English. To further improve prompt quality,
ChatGPT was used to test and refine the prompts before their use with the Gemini model. This
step was taken as an additional precaution to mitigate potential biases or errors that might arise during
generation by the Gemini model, although relying on the Gemini model alone may not have made a
significant difference. The prompts were also reviewed by a linguist to ensure their linguistic
accuracy. To make the generated reports more realistic and closer to human-written reports, the
prompt instructed the model to include occasional typos, appearing infrequently and only in a few
samples. This iterative process was intended to improve the reliability and quality of the final outputs.</p>
        <p>We started by providing the generator with the first prompt, instructing it to generate an admission
report (O1). Once the admission report was generated, we proceeded to the second generation step.
Here, the admission report, along with the case report and a second prompt, was used as input (I2)
to generate a discharge report (O2). We generate the discharge report before the full patient
journey because the full journey includes the admission report, the discharge report, and daily updates
during the hospital stay. Both reports are therefore needed so that G can generate an accurate and
well-structured full patient journey. Finally, for the third and principal generation step, the
patient’s full journey, we used the generated admission and discharge reports
as input (I3), excluding the case report. This omission gave the model some freedom, while the
inclusion of the synthetic admission and discharge reports provided the constraints needed to maintain
realism. After all generations were completed, a sample of the journeys was reviewed by a specialist with expertise
in linguistics and pharmaceutical sciences to evaluate their coherence, accuracy, and overall suitability
as authentic medical records. An example of a generated full patient journey is presented in Figure 2.</p>
        <p>4 https://casosclinicosonline.spmi.pt/index.php</p>
        <p>Figure 2: Example of a generated full patient journey (in European Portuguese).</p>
        <p>*Relatório de Admissão</p>
        <p>Doente masculino, 36 anos, autónomo, com história de asma na infância e rinite alérgica, medicado com fluticasona/salmeterol SOS há 1 ano. Tabagista (10 maços/ano), consumo alcoólico &lt;12g/dia. Viagem à República Dominicana (zonas rurais) há 2 anos. Apresenta dor e edema escrotal assimétrico (direito &gt; esquerdo) e edema perineal após traumatismo perineal minor (queda de bicicleta). Queixa de início súbito. Edema de membros inferiores e superiores (predominante em membros inferiores). Observam-se placas edematosas não eritematosas, pouco dolorosas, na região extensora dos antebraços e perna esquerda. Artrite tibiotársica direita. Agravamento progressivo do edema escrotal e perineal com áreas equimóticas e descolamento epidérmico, sem necrose. Eosinofilia (670 µL, 9%), trombocitopenia (85000 µL). PCR levemente elevada (2,4 mg/dL). Negava febre, queixas digestivas, respiratórias e urinárias. Iniciou doxiciclina e AINE em consulta de urologia e posteriormente amoxicilina/ácido clavulânico. Administrou-se metilprednisolona 40mg VO. Diagnóstico inicial: suspeita de gangrena de Fournier. Internamento no serviço de cirurgia.</p>
        <p>*Relatório do Internamento - Dias 1-10</p>
        <p>Paciente internado com diagnóstico inicial de gangrena de Fournier. Evolução clínica com agravamento do edema e celulite perineal. Manutenção da metilprednisolona 40mg/dia. Realizados exames microbiológicos (hemoculturas, uroculturas, coproculturas) todos negativos. Serologias para várias infecções também negativas. Eosinofilia em aumento (670 µL no dia 1 para 1980 µL no dia 10). Trombocitopenia persistente. TC pélvica demonstrou celulite perineal sem envolvimento fascial. Sem melhora significativa do quadro clínico.</p>
        <p>*Relatório do Internamento - Dia 11</p>
        <p>Iniciada corticoterapia com metilprednisolona 40mg/dia. Observada melhoria clínica e analítica, com redução da eosinofilia e normalização da trombocitopenia.</p>
        <p>*Relatório do Internamento - Dias 12-24</p>
        <p>Realizadas biópsias de pele: paniculite eosinofílica com vasculite. Inicialmente considerada Granulomatose Eosinofílica com Poliangeíte (Síndrome de Churg-Strauss), ANCA negativo. Broncofibroscopia com Lavado Bronco-Alveolar (LBA): trofozoítos e cistos compatíveis com Entamoeba histolytica. Diagnóstico revisto para infecção pulmonar por E. histolytica com vasculite associada. Iniciado tratamento com metronidazol 750mg t.i.d. durante 14 dias, com redução gradual da corticoterapia. Melhora clínica progressiva, com resolução completa das lesões cutâneas e melhoria do derrame pleural (não descrito anteriormente, mas inferido pela melhora clínica). Azatioprina suspensa devido à trombocitopenia possivelmente iatrogênica.</p>
        <p>*Relatório de Alta</p>
        <p>Doente com 25 dias de internamento. Alta hospitalar com resolução completa das lesões cutâneas e melhoria significativa do quadro clínico. Diagnóstico final: infecção pulmonar por Entamoeba histolytica com vasculite associada. Orientação para seguimento em consulta externa de doenças autoimunes e repetição de LBA para confirmar erradicação da E. histolytica. Recomenda-se seguimento próximo para monitorização da eosinofilia e função renal e hepática. Observa-se uma possível relação entre a viagem à República Dominicana e a infecção parasitária.</p>
        <p>All sample materials, including the code, case reports, prompts, and the generated admission,
discharge, and full patient journey reports, are available in the project’s GitHub repository<sup>5</sup>.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation</title>
      <p>In this section, we assess the quality of the generated patient journeys. The evaluation protocol includes
qualitative and quantitative assessments, carried out through expert review and automatic metrics, respectively.</p>
      <sec id="sec-4-1">
        <title>4.1. Evaluation Protocol (EP)</title>
        <p>Qualitative EP: The linguistic and content analysis of the reports generated by the LLM was
conducted by an expert in linguistics and pharmaceutical sciences with experience in the analysis of
medical reports. Drawing on the expert’s experience in previous projects and on a comprehensive
examination of the real case reports used in this study, six key parameters were
identified for assessment based on their distinct characteristics:
1. Specialised Medical Language: The case reports employ specialized medical language that is
precise and appropriate to the clinical context.
2. Narrative Nature: The case reports exhibit a narrative character (i.e., they present a story organised
around a sequence of events, with a beginning, a middle, and an end).
3. Coherence: The case reports follow a logical and consistent structure, without internal
contradictions.
4. Use of Inter-sentential Connectors: The case reports feature few or no inter-sentential
connectors (e.g., "as a result", "in conclusion").
5. Occurrence of Typographical Errors: The case reports often contain minor typographical
errors; however, these do not hinder comprehension of the affected words.
6. Essential Medical Information: The case reports include all the medical information necessary
to understand the clinical case, such as the reason for hospitalisation (in admission
reports), the patient’s progress during hospitalisation up to discharge (in discharge reports), and
the patient’s complete clinical journey (in full journey reports).</p>
        <p>5 https://github.com/tahsirmunna/patients_journey.git</p>
        <p>The expert assessed each parameter on a Likert scale [40] ranging from 1 to 5.
For instance, for the first parameter, "Specialized Medical Language," the question posed was: "Does
the report use specialized medical language (technical terms appropriate for the medical context)?"
Five response options were provided: (1) Not specialized; (2) Slightly specialised; (3) Moderately
specialised; (4) Quite specialised; (5) Fully specialized. A detailed description of the parameters can be
found in the project’s GitHub repository. The development and interpretation of the Likert scale
followed the recommendations of [41].</p>
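        <p>Scoring and aggregating these ratings can be sketched as follows. This is a minimal illustration under stated assumptions. The response labels are the ones quoted above for the first parameter, and the example ratings are hypothetical, not the study's data. The helper names are also hypothetical. The sample standard deviation is used here (rather than the population one) as an assumption, since the text does not say which was reported.</p>
        <preformat preformat-type="code">
```python
from statistics import mean, stdev

# Response options for the "Specialized Medical Language" parameter, in scale order (1-5).
SPECIALISED_LANGUAGE_OPTIONS = [
    "Not specialized",
    "Slightly specialised",
    "Moderately specialised",
    "Quite specialised",
    "Fully specialized",
]


def to_score(option: str) -> int:
    """Map a response label to its Likert value (1 = lowest, 5 = highest)."""
    return SPECIALISED_LANGUAGE_OPTIONS.index(option) + 1


def summarise(scores: list[int]) -> tuple[float, float]:
    """Arithmetic mean and sample standard deviation of the Likert scores for one parameter.

    stdev needs at least two scores (it raises statistics.StatisticsError otherwise).
    """
    if any(s not in range(1, 6) for s in scores):
        raise ValueError("Likert scores must be integers from 1 to 5")
    return mean(scores), stdev(scores)


# Hypothetical ratings for three reports (not the study's results).
ratings = ["Fully specialized", "Fully specialized", "Quite specialised"]
m, sd = summarise([to_score(r) for r in ratings])
print(f"{m:.2f} ± {sd:.2f}")  # 4.67 ± 0.58
```
        </preformat>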
        <p>Quantitative EP: To evaluate the quality of the generated patient journeys, we used three different
quantitative methods to measure (1) the inclusion in the generated reports of key information presented in the case
report (such as symptoms, diagnoses, and exams); (2) the semantic and textual
similarity between the case report and the generated reports; and (3) the identification of European
Portuguese in the generated text.</p>
        <p>To evaluate the inclusion of key medical information in the generated reports, we applied
MediAlbertina [42], a state-of-the-art Named Entity Recognition (NER) model specifically designed to
extract medical entities (such as diagnoses, medications, and procedures) from medical texts in
European Portuguese. We extracted the entities from each case report and from the corresponding generated
reports, and then verified whether the generated reports included the key medical entities present
in the case reports, as follows. Let E<sub>s</sub> be the set of entities in a generated report
(admission, discharge, or full journey) and E<sub>o</sub> the set of entities in the corresponding
case report. We verify that the entities in the generated report form a subset of the entities
in the case report, i.e., E<sub>s</sub> ⊆ E<sub>o</sub>. We then count the matches between E<sub>s</sub> and E<sub>o</sub>
to obtain a score.</p>
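        <p>One way to turn this subset check into a score is sketched below. It assumes the score is the fraction of entities in the generated report that also occur in the case report, |E<sub>s</sub> ∩ E<sub>o</sub>| / |E<sub>s</sub>|, which equals 1 exactly when E<sub>s</sub> ⊆ E<sub>o</sub>. The study does not state its exact normalisation, so this formula and the case-insensitive string matching are both assumptions, as are the helper names. The entity lists would come from MediAlbertina, whose interface is not shown here. The example entities are hypothetical.</p>
        <preformat preformat-type="code">
```python
def normalise(entity: str) -> str:
    """Assumed matching rule: case-insensitive, with runs of whitespace collapsed."""
    return " ".join(entity.lower().split())


def inclusion_score(generated_entities: list[str], case_entities: list[str]) -> float:
    """Fraction of entities in the generated report (E_s) also found in the case report (E_o)."""
    e_s = {normalise(e) for e in generated_entities}
    e_o = {normalise(e) for e in case_entities}
    if not e_s:
        # An empty E_s is trivially a subset of E_o (an assumption for degenerate reports).
        return 1.0
    return len(e_s.intersection(e_o)) / len(e_s)


# Hypothetical entities, loosely based on the example journey in Figure 2.
case = ["eosinofilia", "Metronidazol", "Entamoeba  histolytica", "TC pélvica"]
print(inclusion_score(["Eosinofilia", "metronidazol", "Entamoeba histolytica"], case))  # 1.0
print(inclusion_score(["eosinofilia", "azatioprina"], case))  # 0.5
```
        </preformat>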
        <p>To assess the semantic similarity between a candidate text (the generated report) and a reference text
(the case report), we used BERTScore [43], which uses BERT embeddings to measure contextual
alignment. It first generates contextual BERT embeddings for each token in both texts. Then,
it computes the cosine similarity [44] between each pair of candidate and reference tokens. Precision
is the average similarity of candidate tokens to their closest reference tokens, while recall
is the average similarity of reference tokens to their closest candidate tokens. The final BERTScore is
the harmonic mean (F1 score) of precision and recall. A higher BERTScore indicates greater semantic
similarity between the original and generated texts. In addition, we applied the BLEU score [45], a
common metric for evaluating machine translation that measures n-gram overlap between
the original and generated texts. Because BLEU relies on exact matches, its scores are lower when the
two texts use different wording, even if the meaning is preserved.</p>
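        <p>The greedy matching behind BERTScore can be sketched in plain Python, given token embeddings that have already been computed. This illustrates the precision, recall, and F1 definitions above. It does not replace the reference implementation (for example, the bert-score package), which also obtains the embeddings from a pretrained model and supports options such as IDF weighting and baseline rescaling. The helper name bertscore and the toy two-dimensional vectors are hypothetical.</p>
        <preformat preformat-type="code">
```python
import math

Vector = list[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two (non-zero) embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def bertscore(candidate: list[Vector], reference: list[Vector]) -> tuple[float, float, float]:
    """Greedy-matching precision, recall and F1 over token embeddings."""
    # sims[i][j] = similarity between candidate token i and reference token j
    sims = [[cosine(c, r) for r in reference] for c in candidate]
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(max(row) for row in sims) / len(candidate)
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(max(col) for col in zip(*sims)) / len(reference)
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy embeddings: one candidate token matches the reference token, the other is orthogonal to it.
p, r, f = bertscore([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]])
print(round(p, 3), round(r, 3), round(f, 3))  # 0.5 1.0 0.667
```
        </preformat>
        <p>The extra candidate token lowers precision but not recall. This is why BERTScore reports both and combines them with a harmonic mean.</p>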
        <p>Finally, we applied a Portuguese Language Variety Identifier (LVI) [46], which is
specifically trained to distinguish between European and Brazilian Portuguese texts. Using the LVI, we
verified whether each patient journey was generated in the European Portuguese variant.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Qualitative Results</title>
        <p>The six quality parameters described above were analyzed across 30 synthetic
reports corresponding to the medical histories of 10 patients. Each patient had an admission report, a
discharge report, and a full journey report. To ensure diversity in our sample, we selected 10 unique
patients from the 285 available using k-means clustering [47]. Table 1 presents the arithmetic means
and standard deviations of the scores for this sample of 10 patient journeys.</p>
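        <p>Selecting diverse patients can be sketched as follows. The text does not say how the case reports were represented for clustering or how one patient was chosen per cluster. This sketch therefore assumes that each case report is already encoded as a numeric feature vector (for example, TF-IDF or sentence embeddings) and that the report closest to each centroid is selected. It uses a small hand-written k-means (Lloyd's algorithm) so that it runs without external libraries. In practice, a library implementation such as scikit-learn's KMeans would typically be used. The helper names and toy vectors are hypothetical.</p>
        <preformat preformat-type="code">
```python
import random

Point = list[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: list[Point], k: int, iters: int = 100, seed: int = 0) -> list[Point]:
    """Lloyd's algorithm, initialised with k distinct data points drawn at random."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters: list[list[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster mean (keep it if the cluster is empty).
        updated = [
            [sum(coords) / len(cluster) for coords in zip(*cluster)] if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:  # converged
            break
        centroids = updated
    return centroids


def pick_representatives(points: list[Point], k: int, seed: int = 0) -> list[int]:
    """Index of the point closest to each centroid, never picking the same point twice."""
    chosen: list[int] = []
    for centroid in kmeans(points, k, seed=seed):
        ranked = sorted(range(len(points)), key=lambda i: sq_dist(points[i], centroid))
        chosen.append(next(i for i in ranked if i not in chosen))
    return chosen


# Toy 2-D feature vectors: two well-separated groups of "case reports".
features = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
print(sorted(pick_representatives(features, k=2)))  # [0, 2]
```
        </preformat>
        <p>Choosing the member nearest each centroid, rather than the centroid itself, guarantees that every selected item is a real case report. Skipping already chosen indices keeps the k selected patients unique.</p>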
        <p>As evidenced by the results in Table 1, all the synthetic reports analysed, like the
case reports, exhibited specialised, consistent, and concise language appropriate to the clinical context.
Similarly, all reports displayed a narrative nature, characterised by coherence and logical structure. As
with the case reports, inter-sentential connectors were rare or entirely absent (only one connector in the full
journey report of id_41, two connectors in the discharge report of id_52, and one connector in the
discharge report of id_270).</p>
        <p>Regarding typographical errors, some occurrences were noted, similar to those found in the case
reports, without compromising the understanding of the words. However, in the discharge report
for the patient identified as id_270, the LLM introduced an unusual alteration, replacing the word
“cotovelos” (elbows) with “coelhos” (rabbits) and adding the following sentence at the end of the report:
"Desculpe pelo erro tipográfico em “coelho” - era “cotovelo”" ("Apologies for the typographical error in
“coelho” – it should have been “cotovelo”"). This change is not a typographical error, and
the closing sentence is not something typically found in a case report. Another relevant example
was observed in id_41, where the term “influenzais” was used in the sentence "Recomenda-se vacinação
anti-pneumocócica e influenzais" ("Pneumococcal and influenza vaccination is recommended"). This
term is neither recognised by dictionaries nor commonly used in medical jargon. Lastly, in the full journey
report for the same patient, the expression “36 UMA” (unidades maço-ano, i.e., pack-years) was incorrectly
rendered by the LLM as “36 anos-pack”, a non-existent unit of measurement. This “hallucination” was
isolated, as the other reports maintained the correct terminology.</p>
        <p>Regarding the inclusion of essential medical information, the admission reports exhibited significant
gaps. In several cases, essential details needed to justify the therapeutic decisions, such as the results
of medical and laboratory tests, were missing. In contrast, the discharge and full
journey reports were notably comprehensive, with the LLM adding coherent and contextually relevant
information (not present in the case reports) that significantly enriched the synthetic reports.</p>
        <p>In terms of the variety of Portuguese, the synthetic reports were predominantly written in European
Portuguese. Only one case displayed a feature of Brazilian Portuguese, specifically in the sentence "O
quadro clínico se agravou progressivamente" ("The clinical condition progressively worsened"), from
id_91. In this sentence, the clitic pronoun “se” appeared in a proclitic position (“se agravou”), typical of
Brazilian Portuguese. In European Portuguese, the typical construction would have been in an enclitic
position (“agravou-se”).</p>
        <p>Overall, the synthetic reports generated by the LLM demonstrated good quality in terms of specialised
language, narrative structure, and clinical context appropriateness, aligning closely with the standards
observed in the case reports. However, occasional flaws were identified, such as an incorrectly rendered
unit of measurement and omissions of relevant information in some admission reports. Despite
these limitations, the results suggest that the LLM holds great potential for generating medical reports.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Quantitative Results</title>
        <p>In addition to the qualitative analysis, we computed quantitative metrics, shown in Table 2, to
provide a comprehensive evaluation of the performance and effectiveness of the generated outputs. The
right side of Table 2 shows that average NER-inclusive scores of 1.00 were achieved for all three
report types, indicating that the key medical entities extracted from the case reports were present in the
generated texts. These results were achieved through multiple iterations of prompt refinement to ensure
appropriate generation and the inclusion of key entities from the case reports. The average NER-inclusive
score for the full journey report also remains 1.00 even though the case report was not provided
during the generation of the full journey. This is because the admission and discharge reports,
which were generated from the case report, were used as input. At the same time, during full journey generation, the
model generates text with some degree of freedom and occasionally adds treatments or medications
not present in the case report, as explained in Section 4.2.</p>
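        <p>The NER-inclusive score is not restated in this section. As a minimal sketch, assuming the score is the fraction of distinct (normalised) entities found in a case report that also appear among the entities extracted from the corresponding generated report, it could be computed as below. Entity extraction itself is out of scope, and the function names and toy entities are hypothetical. Under this assumed definition, extra entities in the generated text (such as added medications) do not lower the score, which is consistent with the full journey reports still averaging 1.00.</p>

```python
def normalize(entity: str) -> str:
    """Lowercase an entity and collapse internal whitespace."""
    return " ".join(entity.lower().split())


def ner_inclusion_score(case_entities: list[str], generated_entities: list[str]) -> float:
    """Hypothetical definition: share of distinct case-report entities found in the generated report.

    Returns 1.0 when the case report has no entities, since nothing can be missed.
    """
    reference = {normalize(e) for e in case_entities}
    candidate = {normalize(e) for e in generated_entities}
    if not reference:
        return 1.0
    return len(reference.intersection(candidate)) / len(reference)


def average_inclusion(pairs: list[tuple[list[str], list[str]]]) -> float:
    """Mean score over (case entities, generated entities) pairs, one pair per report."""
    scores = [ner_inclusion_score(case, gen) for case, gen in pairs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Invented entities for illustration; not taken from the SPMI case reports.
    case = ["febre", "dor torácica", "Amoxicilina"]
    generated = ["Febre", "dor  torácica", "amoxicilina", "paracetamol"]
    # The extra entity "paracetamol" does not lower the score.
    print(ner_inclusion_score(case, generated))  # prints 1.0
```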
        <p>Furthermore, the BERTScores, which measure semantic similarity, are also high,
with average scores ranging from 0.75 to 0.77. This indicates a high level of semantic alignment
between the generated and case reports. In contrast, BLEU scores, which measure exact n-gram overlap, were much lower,
with averages ranging from 0.026 to 0.148, showing that the generated texts are not word-for-word
copies of the case reports. This does not mean the texts are poor; rather, it shows that they convey
the intended meaning without repeating the exact wording.</p>
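        <p>To illustrate why a faithful paraphrase can obtain a low BLEU score, the following sketch implements a simplified sentence-level BLEU: clipped n-gram precisions up to 4-grams, combined by a geometric mean and multiplied by a brevity penalty. It is not the implementation used in this study; the whitespace tokenisation, the add-one smoothing applied to every n-gram order, and the invented example sentences are assumptions, and reported scores are usually computed with a standard tool such as sacreBLEU [45].</p>

```python
import math
from collections import Counter


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count the n-grams (as tuples) occurring in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference: str, candidate: str, max_n: int = 4) -> float:
    """Simplified single-reference sentence BLEU with add-one smoothing on every order.

    Assumes whitespace tokenisation and sentences of at least max_n tokens.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not cand:
        return 0.0
    log_sum = 0.0
    for n in range(1, max_n + 1):
        cand_ngrams = ngram_counts(cand, n)
        ref_ngrams = ngram_counts(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = sum(cand_ngrams.values())
        # Add-one smoothing keeps an order with zero matches from zeroing the score.
        log_sum += math.log((overlap + 1) / (total + 1))
    geometric_mean = math.exp(log_sum / max_n)
    # Brevity penalty: only candidates not longer than the reference can be penalised.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * geometric_mean


if __name__ == "__main__":
    # Invented sentences with similar meaning but different wording.
    reference = "o doente foi internado por pneumonia adquirida na comunidade"
    paraphrase = "internou-se o paciente devido a uma pneumonia da comunidade"
    print(round(sentence_bleu(reference, reference), 3))  # prints 1.0
    print(round(sentence_bleu(reference, paraphrase), 3))
```

        <p>With these example sentences, the verbatim copy scores 1.0, whereas the paraphrase shares only three unigrams and no longer n-grams with the reference and scores about 0.17, even though its meaning is close.</p>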
        <p>Finally, the left side of Table 2 shows the distribution of Portuguese variants in
the generated texts. A strong predominance of European Portuguese (PT-PT) is evident, with an average
score consistently exceeding 95% across all three report types. The presence of Brazilian Portuguese
(PT-BR) was minimal, with an average score never surpassing 4.58% in any of the generated reports.
This indicates that the model predominantly adheres to European Portuguese linguistic norms.</p>
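        <p>The phrase "average score" can be read in more than one way. As a minimal sketch, assuming a variety classifier that outputs, for each generated report, a probability for PT-PT and one for PT-BR, the per-report-type averages could be obtained as the mean of these probabilities expressed as a percentage. The dictionary keys and the example probabilities below are hypothetical and do not come from Table 2.</p>

```python
from collections import defaultdict
from statistics import mean


def variety_distribution(predictions: list[dict]) -> dict[str, dict[str, float]]:
    """Mean PT-PT and PT-BR probabilities per report type, as percentages.

    Each prediction is a dict with the hypothetical keys "report_type", "pt_pt"
    and "pt_br"; the two probabilities are assumed to come from a variety
    classifier and to sum to 1 for each report.
    """
    grouped: dict[str, list[dict]] = defaultdict(list)
    for pred in predictions:
        grouped[pred["report_type"]].append(pred)
    return {
        report_type: {
            "PT-PT %": 100 * mean(row["pt_pt"] for row in rows),
            "PT-BR %": 100 * mean(row["pt_br"] for row in rows),
        }
        for report_type, rows in grouped.items()
    }


if __name__ == "__main__":
    # Invented probabilities for two admission reports (not the study's data).
    preds = [
        {"report_type": "admission", "pt_pt": 0.98, "pt_br": 0.02},
        {"report_type": "admission", "pt_pt": 0.94, "pt_br": 0.06},
    ]
    print(variety_distribution(preds))
```

        <p>An alternative reading would count the share of reports assigned to each variety by majority vote; the two readings can differ when the classifier is uncertain.</p>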
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Exploratory Results</title>
        <p>The exploratory results provide a detailed analysis of the reports generated from the 285 case reports, focusing on key
metrics such as the average number of tokens and the most frequent terms with their occurrences. This analysis
highlights the distinct characteristics of admission, discharge, and full journey reports, demonstrating
the model’s ability to generate coherent and contextually appropriate medical narratives in European
Portuguese.</p>
        <p>Table 3 summarizes the exploratory analysis of the generated medical reports, which reveals distinct patterns
in their text and structure.</p>
        <table-wrap>
          <label>Table 3</label>
          <caption>
            <p>Average number of tokens and the ten most frequent terms (with occurrences) in the generated reports, by report type.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Report type</th>
                <th>Average tokens</th>
                <th>Most frequent terms (occurrences)</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>Admission</td>
                <td>133.80</td>
                <td>anos: 417, exame: 278, antecedentes: 230, dor: 162, sexo: 154, refere: 144, história: 136, físico: 134, urgência: 131, febre: 124</td>
              </tr>
              <tr>
                <td>Discharge</td>
                <td>323.57</td>
                <td>alta: 586, paciente: 410, anos: 375, seguimento: 360, consulta: 301, doente: 298, tratamento: 265, sexo: 259, avaliação: 258, revelou: 257</td>
              </tr>
              <tr>
                <td>Full journey</td>
                <td>565.99</td>
                <td>relatório: 1424, dia: 1110, paciente: 968, alta: 853, anos: 664, internamento: 614, dias: 606, dor: 471, tratamento: 452, realizada: 451</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The admission reports are the most concise, averaging 133.80 tokens,
with frequent terms such as "anos" (age), "exame" (examination), "dor" (pain), and "febre" (fever) highlighting
their focus on initial patient assessment, symptoms, and diagnostic procedures. The discharge reports
are more detailed, averaging 323.57 tokens, and emphasize terms like "alta" (discharge),
"seguimento" (follow-up), and "tratamento" (treatment), reflecting their role in summarizing hospital stays,
treatment outcomes, and post-discharge care plans. Finally, the full journey reports are the
most comprehensive, averaging 565.99 tokens, with terms such as "relatório" (report), "internamento"
(hospitalization), and "tratamento" (treatment) indicating thorough documentation of the patient’s entire
hospital experience, from admission to discharge. These differences indicate diversity across the three
types of medical reports and support the model’s ability to generate coherent and contextual reports in
European Portuguese that capture key medical terminology and structural nuances, underscoring its
effectiveness in producing realistic synthetic patient journeys. For better understanding, we present a
word cloud visualization [48] in Figure 3.</p>
        <fig>
          <label>Figure 3</label>
          <caption>
            <p>Word clouds of the most frequent terms in the generated reports: (a) Admission Report; (b) Discharge Report; (c) Full Journey Report.</p>
          </caption>
        </fig>
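        <p>Statistics of the kind reported in Table 3 (average tokens per report and the most frequent terms) can be approximated with a few lines of standard-library Python. This is a sketch under stated assumptions: tokens are taken to be lowercase word tokens matched by a Unicode-aware regular expression, numbers and a small hypothetical list of Portuguese stop words are excluded from the term counts, and the example reports are invented. The tokenizer and stop-word list used in the study may differ, so counts produced this way would not necessarily match Table 3.</p>

```python
import re
from collections import Counter

# Small, hypothetical stop-word list; a real analysis would use a complete Portuguese list.
STOPWORDS = {"a", "o", "os", "as", "de", "da", "do", "das", "dos", "e", "em",
             "no", "na", "com", "por", "para", "um", "uma", "que", "se"}

# In Python 3, \w in a str pattern is Unicode-aware, so "história" stays one token.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    """Split a report into lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


def exploratory_stats(reports: list[str], top_k: int = 10) -> tuple[float, list[tuple[str, int]]]:
    """Return the average token count per report and the top_k most frequent terms.

    The average length counts every token; term frequencies skip stop words and numbers.
    """
    token_lists = [tokenize(report) for report in reports]
    avg_tokens = sum(len(tokens) for tokens in token_lists) / len(token_lists)
    term_counts = Counter(
        token
        for tokens in token_lists
        for token in tokens
        if token not in STOPWORDS and not token.isdigit()
    )
    return avg_tokens, term_counts.most_common(top_k)


if __name__ == "__main__":
    # Two invented admission-style snippets.
    reports = [
        "Doente do sexo masculino, 67 anos, com febre e dor.",
        "Doente de 45 anos com dor abdominal.",
    ]
    avg, top = exploratory_stats(reports, top_k=3)
    print(avg)  # prints 8.5
    print(top)  # prints [('doente', 2), ('anos', 2), ('dor', 2)]
```

        <p>Excluding stop words matters here: without it, function words such as "de" or "com" would dominate the frequency lists, whereas the terms in Table 3 are content words.</p>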
        <p>Figure 3 visualizes the most frequently occurring terms in the generated admission, discharge, and full
patient journey reports. This visualization helps identify key themes and terminology used across
different stages of the patient journey. In the admission reports, frequent terms include "anos" (age), "exame"
(examination), "antecedentes" (medical history), "urgência" (emergency), and "dor" (pain). This reflects a
focus on initial patient evaluation, medical history, and the acute symptoms that led to hospitalization.
The discharge reports prominently feature terms such as "alta" (discharge), "seguimento" (follow-up),
"tratamento" (treatment), and "avaliação" (evaluation), indicating an emphasis on therapeutic
interventions, clinical progress, and the patient’s condition at the time of discharge. The full journey reports
combine terms from both admission and discharge, with dominant words like "relatório" (report),
"internamento" (hospitalization), "tratamento" (treatment), and "realizada" (carried out). This highlights
the comprehensive nature of the full journey, which encompasses the entire patient trajectory from diagnosis
and treatment to follow-up assessments. Overall, the word clouds show that each stage of the
hospital journey has a distinct medical focus: admission reports concentrate on symptoms and history,
discharge reports emphasize treatment outcomes, and full journey reports provide a detailed and cohesive
medical narrative.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions, Limitations and Future Work</title>
      <p>This research highlights the potential of LLMs to address the scarcity of open clinical textual records,
particularly in European Portuguese, by generating realistic and comprehensive patient journeys. This
marks a significant advancement, given the limited data available for training and evaluating LLMs in
this under-resourced language and domain. Our findings show the effectiveness of the Gemini 1.5
Flash model in producing synthetic patient journeys that closely mirror the structure and content of
real-world medical records. The proposed generation approach, referencing real-world clinical case reports,
proved particularly effective in ensuring both the coherence and clinical accuracy of the generated
texts. Quantitative analysis confirms the accurate generation and transfer of key medical entities
from the case reports to the generated reports. BERTScores demonstrated strong semantic alignment
with the case reports, while the lower BLEU scores indicate that the generated texts are not exact replicas
of the case reports; together, these results support the successful creation of high-quality European Portuguese
full patient journeys. Additionally, qualitative evaluations by a linguistics and pharmaceutical sciences expert
experienced in assessing medical reports further validated the clinical accuracy and linguistic coherence
of the generated reports. The generated texts were also found to be strongly associated with
European Portuguese, as verified by both evaluation processes. Finally, the exploratory results confirmed
the diversity across the three types of medical reports.</p>
      <p>While our paper demonstrates the promising capability of LLMs to generate realistic European
Portuguese patient journeys, it is important to acknowledge its limitations. The dataset we used for
reference, while carefully selected, comprises only 285 anonymized internal medicine case reports,
which do not fully represent the diversity of patient experiences or clinical scenarios. In addition, the
evaluation process, although comprehensive, relies in part on automated metrics such as BERTScore,
which have their own biases and limitations, especially in capturing the nuanced semantic meaning
of medical language. Furthermore, the study’s focus on a single language limits the generalization of
these findings to other under-resourced languages and healthcare settings. Finally, healthcare
demands high precision, as minor errors can have severe consequences, whereas general-purpose LLMs
may produce inaccurate or misleading information, increasing the risk of harmful or
unreliable outputs in medical contexts.</p>
      <p>Future research directions are multifaceted. First, expanding the dataset to include a wider variety of
patient cases and clinical scenarios should improve the generalizability and robustness of the proposed
approach for generating patient journeys. Second, the integration of additional data modalities, such as medical
images and laboratory results, presents a promising avenue for generating even more comprehensive
and realistic synthetic patient journeys. Finally, adapting this methodology to other under-resourced
languages will contribute significantly to the development of diverse and widely accessible AI-driven
healthcare tools.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work is co-financed by Component 5 - Capitalization and Business Innovation, integrated in
the Resilience Dimension of the Recovery and Resilience Plan within the scope of the Recovery and
Resilience Mechanism (MRR) of the European Union (EU), framed in the Next Generation EU, for the
period 2021 - 2026, within project HfPT, with reference 41.</p>
      <p>ocae045.
[21] E. Bolton, A. Venigalla, M. Yasunaga, D. Hall, B. Xiong, T. Lee, R. Daneshjou, J. Frankle, P. Liang,
M. Carbin, et al., Biomedlm: A 2.7 b parameter language model trained on biomedical text, arXiv
preprint arXiv:2403.18421 (2024).
[22] A. Névéol, H. Dalianis, S. Velupillai, G. Savova, P. Zweigenbaum, Clinical natural language
processing in languages other than english: opportunities and challenges, Journal of biomedical
semantics 9 (2018) 1–13.
[23] J. Y. Wang, N. Sukiennik, T. Li, W. Su, Q. Hao, J. Xu, Z. Huang, F. Xu, Y. Li, A survey on
human-centric llms, arXiv preprint arXiv:2411.14491 (2024).
[24] Z. Wu, X. Chen, Z. Pan, X. Liu, W. Liu, D. Dai, H. Gao, Y. Ma, C. Wu, B. Wang, et al.,
Deepseek-vl2: Mixture-of-experts vision-language models for advanced multimodal understanding, arXiv
preprint arXiv:2412.10302 (2024).
[25] S. Goyal, E. Rastogi, S. P. Rajagopal, D. Yuan, F. Zhao, J. Chintagunta, G. Naik, J. Ward, Healai: A
healthcare llm for effective medical documentation, in: Proceedings of the 17th ACM International
Conference on Web Search and Data Mining, 2024, pp. 1167–1168.
[26] C. R. Subramanian, D. A. Yang, R. Khanna, Enhancing health care communication with large
language models—the role, challenges, and future directions, JAMA Network Open 7 (2024)
e240347–e240347.
[27] M. Benary, X. D. Wang, M. Schmidt, D. Soll, G. Hilfenhaus, M. Nassir, C. Sigler, M. Knödler, U. Keller,
D. Beule, et al., Leveraging large language models for decision support in personalized oncology,
JAMA Network Open 6 (2023) e2343689–e2343689.
[28] H. Jin, Y. Zhang, D. Meng, J. Wang, J. Tan, A comprehensive survey on process-oriented automatic
text summarization with exploration of llm-based methods, arXiv preprint arXiv:2403.02901 (2024).
[29] D. Van Veen, C. Van Uden, L. Blankemeier, J.-B. Delbrouck, A. Aali, C. Bluethgen, A. Pareek,
M. Polacin, E. P. Reis, A. Seehofnerová, et al., Adapted large language models can outperform
medical experts in clinical text summarization, Nature medicine 30 (2024) 1134–1142.
[30] S. Wang, Z. Zhao, X. Ouyang, Q. Wang, D. Shen, Chatcad: Interactive computer-aided diagnosis
on medical image using large language models, arXiv preprint arXiv:2302.07257 (2023).
[31] D. Tian, S. Jiang, L. Zhang, X. Lu, Y. Xu, The role of large language models in medical image
processing: a narrative review, Quantitative Imaging in Medicine and Surgery 14 (2023) 1108.
[32] S. Lee, J. Youn, H. Kim, M. Kim, S. H. Yoon, Cxr-llava: a multimodal large language model for
interpreting chest x-ray images, European Radiology (2025) 1–13.
[33] J.-P. Vert, How will generative ai disrupt data science in drug discovery?, Nature Biotechnology
41 (2023) 750–751.
[34] J. A. Omiye, H. Gui, S. J. Rezaei, J. Zou, R. Daneshjou, Large language models in medicine: the
potentials and pitfalls: a narrative review, Annals of Internal Medicine 177 (2024) 210–220.
[35] R. J. Chen, M. Y. Lu, T. Y. Chen, D. F. Williamson, F. Mahmood, Synthetic data in machine learning
for medicine and healthcare, Nature Biomedical Engineering 5 (2021) 493–497.
[36] R. Tang, X. Han, X. Jiang, X. Hu, Does synthetic data generation of llms help clinical text mining?,
arXiv preprint arXiv:2303.04360 (2023).
[37] A. Bauer, S. Trapp, M. Stenger, R. Leppich, S. Kounev, M. Leznik, K. Chard, I. Foster, Comprehensive
exploration of synthetic data generation: A survey, arXiv preprint arXiv:2401.02524 (2024).
[38] J. C. Chow, K. Li, Ethical considerations in human-centered ai: Advancing oncology chatbots
through large language models, JMIR Bioinformatics and Biotechnology 5 (2024) e64406.
[39] H. Jin, H. Che, Y. Lin, H. Chen, Promptmrg: Diagnosis-driven prompts for medical report
generation, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, 2024,
pp. 2607–2615.
[40] R. Likert, A technique for the measurement of attitudes, Archives of Psychology, New York,
1932.
[41] L. South, D. Saffo, O. Vitek, C. Dunne, M. A. Borkin, Effective use of likert scales in visualization
evaluations: A systematic review, Computer Graphics Forum 41 (2022) 43–55. URL: https://doi.
org/10.1111/cgf.14521. doi:10.1111/cgf.14521.
[42] M. J. B. Nunes, MediAlbertina: A family of European Portuguese medical language models, Master’s
thesis, 2024.
[43] T. Zhang, V. Kishore, F. Wu, K. Q. Weinberger, Y. Artzi, Bertscore: Evaluating text generation with
bert, arXiv preprint arXiv:1904.09675 (2019).
[44] F. Rahutomo, T. Kitasuka, M. Aritsugi, et al., Semantic cosine similarity, in: The 7th international
student conference on advanced science and technology ICAST, volume 4, University of Seoul
South Korea, 2012, p. 1.
[45] M. Post, A call for clarity in reporting bleu scores, arXiv preprint arXiv:1804.08771 (2018).
[46] H. Sousa, R. Almeida, P. Silvano, I. Cantante, R. Campos, A. Jorge, Enhancing portuguese variety
identification with cross-domain approaches, arXiv preprint arXiv:2502.14394 (2025).
[47] A. Ahmad, L. Dey, A k-mean clustering algorithm for mixed numeric and categorical data, Data &amp;
Knowledge Engineering 63 (2007) 503–527.
[48] F. Heimerl, S. Lohmann, S. Lange, T. Ertl, Word cloud explorer: Text analytics based on word clouds,
in: 2014 47th Hawaii international conference on system sciences, IEEE, 2014, pp. 1833–1842.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>K.</given-names>
            <surname>Singhal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gottweis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sayres</surname>
          </string-name>
          , E. Wulczyn,
          <string-name>
            <given-names>M.</given-names>
            <surname>Amin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Pfohl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Cole-Lewis</surname>
          </string-name>
          , et al.,
          <article-title>Toward expert-level medical question answering with large language models</article-title>
          ,
          <source>Nature Medicine</source>
          (
          <year>2025</year>
          )
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>Jmlr: Joint medical llm and retrieval training for enhancing reasoning and professional question answering capability</article-title>
          ,
          <source>arXiv preprint arXiv:2402.17887</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Qian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Cong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Communicative agents for software development</article-title>
          ,
          <source>arXiv preprint arXiv:2307.07924 6</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Kim</surname>
          </string-name>
          , et al.,
          <article-title>When llm-based code generation meets the software development process</article-title>
          ,
          <source>arXiv preprint arXiv:2403.15852</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Kumichev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Blinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kuzkina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Goncharov</surname>
          </string-name>
          , G. Zubkova,
          <string-name>
            <given-names>N.</given-names>
            <surname>Zenovkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Goncharov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Savchenko</surname>
          </string-name>
          ,
          <article-title>Medsyn: Llm-based synthetic medical text generation framework</article-title>
          ,
          <source>in: Joint European Conference on Machine Learning and Knowledge Discovery in Databases</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>215</fpage>
          -
          <lpage>230</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>Jung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Seo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kim</surname>
          </string-name>
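          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Kee</surname>
          </string-name>
          ,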
          <string-name>
            <given-names>S.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kim</surname>
          </string-name>
          , et al.,
          <article-title>Enhancing clinical efficiency through llm: Discharge note generation for cardiac patients</article-title>
          ,
          <source>arXiv preprint arXiv:2404.05144</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Fei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Qu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Ji</surname>
          </string-name>
          , T.-S. Chua,
          <article-title>Next-gpt: Any-to-any multimodal llm</article-title>
          ,
          <source>arXiv preprint arXiv:2309.05519</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>H.</given-names>
            <surname>Nori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>King</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>McKinney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Carignan</surname>
          </string-name>
          , E. Horvitz,
          <article-title>Capabilities of gpt-4 on medical challenge problems</article-title>
          ,
          <source>arXiv preprint arXiv:2303.13375</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-S.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <article-title>A domain-specific next-generation large language model (llm) or chatgpt is required for biomedical engineering and research</article-title>
          ,
          <source>Annals of biomedical engineering 52</source>
          (
          <year>2024</year>
          )
          <fpage>451</fpage>
          -
          <lpage>454</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>E.</given-names>
            <surname>Ullah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Parwani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Baig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <article-title>Challenges and barriers of using large language models (llm) such as chatgpt for diagnostic medicine with a focus on digital pathology-a recent scoping review</article-title>
          ,
          <source>Diagnostic pathology 19</source>
          (
          <year>2024</year>
          )
          <fpage>43</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Oniani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Visweswaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kapoor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kooragayalu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Polanska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Enhancing large language models for clinical decision support by incorporating clinical practice guidelines</article-title>
          ,
          <source>arXiv preprint arXiv:2401.11120</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wiertz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Boldt</surname>
          </string-name>
          ,
          <article-title>Ethical, legal, and practical concerns surrounding the implemention of new forms of consent for health data research: Qualitative interview study</article-title>
          ,
          <source>Journal of Medical Internet Research</source>
          <volume>26</volume>
          (
          <year>2024</year>
          )
          <elocation-id>e52180</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>V.</given-names>
            <surname>Kumar</surname>
          </string-name>
          , E. Ntoutsi,
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Rajawat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Medda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Recupero</surname>
          </string-name>
          ,
          <article-title>Unlocking llms: Addressing scarce data and bias challenges in mental health</article-title>
          ,
          <source>arXiv preprint arXiv:2412.12981</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>L. F. P.</given-names>
            <surname>Henriques</surname>
          </string-name>
          ,
          <article-title>Narrative Extraction from Synthetic Clinical Texts in Portuguese</article-title>
          , Master's thesis
          ,
          <source>Universidade do Porto (Portugal)</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>I.</given-names>
            <surname>Hartsock</surname>
          </string-name>
          , G. Rasool,
          <article-title>Vision-language models for medical report generation and visual question answering: A review</article-title>
          ,
          <source>Frontiers in Artificial Intelligence</source>
          <volume>7</volume>
          (
          <year>2024</year>
          )
          <fpage>1430984</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>D.</given-names>
            <surname>Van Veen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Van Uden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Blankemeier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-B.</given-names>
            <surname>Delbrouck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Aali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bluethgen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pareek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Polacin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. P.</given-names>
            <surname>Reis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Seehofnerova</surname>
          </string-name>
          , et al.,
          <article-title>Clinical text summarization: adapting large language models can outperform human experts</article-title>
          ,
          <source>Research Square</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>E.</given-names>
            <surname>Alsentzer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Murphy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Boag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-H.</given-names>
            <surname>Weng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Naumann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>McDermott</surname>
          </string-name>
          ,
          <article-title>Publicly available clinical BERT embeddings</article-title>
          ,
          <source>arXiv preprint arXiv:1904.03323</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chowdhery</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bosma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Mishra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Barham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. W.</given-names>
            <surname>Chung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Sutton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gehrmann</surname>
          </string-name>
          , et al.,
          <article-title>PaLM: Scaling language modeling with pathways</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>24</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>113</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>R.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
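          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Poon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>BioGPT: generative pre-trained transformer for biomedical text generation and mining</article-title>
          ,
          <source>Briefings in Bioinformatics</source>
          <volume>23</volume>
          (
          <year>2022</year>
          )
          <elocation-id>bbac409</elocation-id>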
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>C.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>PMC-LLaMA: toward building open-source language models for medicine</article-title>
          ,
          <source>Journal of the American Medical Informatics Association</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>