<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Pilot Assessment of Transparency of LLM-based Systems to Support Emergency Rooms</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Michał Chojnicki</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Katarzyna Kaczmarek-Majer</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paweł Burchardt</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yanwu Ren</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marek Z. Reformat</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff6">6</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Cardiology, J. Struś Hospital</institution>
          ,
          <addr-line>61-285 Poznań</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Electrical and Computer Engineering, University of Alberta</institution>
          ,
          <addr-line>Edmonton</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Department of Hypertension, Angiology and Internal Medicine, Poznań University of Medical Sciences</institution>
          ,
          <addr-line>61-848 Poznań</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Systems Research Institute Polish Academy of Sciences</institution>
          ,
          <addr-line>Newelska 6, 01-147 Warsaw</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>The Department of Medical Biology, Poznan University of Medical Sciences</institution>
          ,
          <country country="PL">Poland</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>University of Ostrava, Institute for Research and Applications of Fuzzy Modeling</institution>
          ,
          <addr-line>70103 Ostrava</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff6">
          <label>6</label>
          <institution>University of Social Sciences</institution>
          ,
          <addr-line>90-113 Łódź</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>One of the main challenges when developing medical decision support systems for the emergency room is adequately filtering the most relevant information. High workload, stress, and the necessity for urgent decisions require precise answers to the questions posed. Although LLM-based systems can provide abundant information, physicians need concise and relevant data in this particular clinical setting. In this study, we perform a pilot assessment of the transparency of selected LLM-based systems. The comparative analysis includes ChatGPT o1 model, which was asked to produce responses with varying temperatures and a pilot graph-based RAG specializing in cardiovascular diseases. A survey was conducted among 33 clinicians regarding the amount of information contained in the provided prompts. Physicians favored the most readable, specific, and helpful answers in emergency department conditions. Reliable medical data and the form in which answers are delivered are crucial for physicians working in the emergency room. We conclude that physicians have preferences for LLM responses at a specific temperature. Further research should be expanded to enable tailoring responses not only to the clinical situation but also to the experience of the asking physician.</p>
      </abstract>
      <kwd-group>
        <kwd>Trustworthiness assessment</kwd>
        <kwd>Medical decision support systems</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Retrieval-Augmented Generation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Attempts to implement Medical Decision Support Systems (MDSS) in emergency rooms based on text
processing by Large Language Models (LLMs) have significantly intensified in recent years [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
The work of a physician in emergency medicine requires that decisions be made particularly rapidly.
Therefore, the decision support provided by LLMs must be substantive and precise. On the other hand,
overly limited information delivery may necessitate additional queries directed to the LLM by medical
personnel to obtain more comprehensive data. The motivation behind this study is to assess whether
variation in the temperature parameter of a large language model, or the selection of the
model itself, significantly affects the understanding of its answers.
      </p>
      <p>
        Although various LLMs are able to deliver answers to medical questions including the control of
their temperature, there is a need to perform advanced validation with the dedicated end users of such
systems and to ensure that the deployed LLM-based systems are trustworthy. Trustworthiness requires
that they be technically accurate and follow appropriate ethical and legal principles. Following the
Ethics Guidelines for Trustworthy AI [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and the EU Artificial Intelligence Act [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], there are seven key requirements that AI
systems should meet to be deemed trustworthy: (i) human agency and oversight; (ii) technical robustness
and safety; (iii) privacy and data governance; (iv) transparency; (v) diversity, non-discrimination, and
fairness; (vi) societal and environmental wellbeing; (vii) accountability. In this work, we focus on the
transparency-related aspects of the AI-based systems and perform a pilot assessment across domain
experts.
      </p>
      <p>
        In this work, we propose a pilot version of the MedicalGraphRAG specialising in cardiovascular
diseases, aiming to support emergency room clinicians (denoted as D in the remainder of this work),
extending [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. As baselines, the ChatGPT o1 model [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] was also examined with varying temperature settings:
A. ChatGPT o1, temperature set to 0.2 in the prompt;
B. ChatGPT o1, temperature set to 0.5 in the prompt;
C. ChatGPT o1, temperature set to 0.8 in the prompt.
      </p>
      <p>The structure of the paper is as follows. In Section 2, we present the proposed pilot MedicalGraphRAG
system to support emergency rooms. Section 3 describes the survey and its results. In Section 4, the
main conclusions and future work are outlined.</p>
    </sec>
    <sec id="sec-2">
      <title>2. The pilot MedicalGraphRAG in Cardiovascular Diseases</title>
      <p>Our system transforms the medical literature into structured knowledge graphs to enable effective
question answering with verifiable citations, i.e., references to the sections of the source
information/documents. The process consists of three main phases: document processing and segmentation, knowledge
graph construction, and the question-answering process supporting tracking of the information source.
The system is designed to handle medical literature’s complexity and domain-specific nature while
maintaining high accuracy and providing transparent source attribution.</p>
      <p>The development of the MedicalGraphRAG system starts with the split of documents into fragments,
the extraction of information, and the construction of knowledge graphs, as shown by the flow-chart in
Figure 1.</p>
      <sec id="sec-2-1">
        <title>2.1. Document Processing and Segmentation</title>
        <p>Medical literature presents unique challenges for automated processing due to complex formatting,
specialized terminology, and varying document structures. We address these challenges through a
multi-stage processing pipeline that preserves semantic coherence while enabling efficient downstream
processing.</p>
        <sec id="sec-2-1-1">
          <title>2.1.1. Content Extraction</title>
          <p>
            We extract content from PDF medical literature using MinerU [
            <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
            ], an open-source tool specifically
designed for precise document content extraction. Unlike traditional PDF parsers that often corrupt
formatting and lose structural information, MinerU preserves document formatting while converting to
Markdown, maintaining critical elements such as tables, figures, and hierarchical section structures
that are essential for medical document comprehension.
          </p>
        </sec>
        <sec id="sec-2-1-2">
          <title>2.1.2. Intelligent Text Segmentation</title>
          <p>
            Given that medical documents typically exceed the token limits of current large language models (LLMs)
and that LLM comprehension quality degrades significantly with excessively long contexts [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ],
intelligent text segmentation becomes crucial. We implement a relevance-based sliding window approach
that maintains semantic coherence across segment boundaries.
          </p>
          <p>Our segmentation algorithm operates as follows: starting with an empty window w, each new
paragraph p is evaluated by an LLM using the prompt template:
"Given the current context [CONTEXT w], determine if the following paragraph is semantically
related and should be included in the same section: [PARAGRAPH p]. Respond with ’INCLUDE’
or ’NEW_SECTION’."
Figure 1: Flowchart for constructing the MedicalGraphRAG.</p>
          <p>When the LLM determines that paragraph p is contextually relevant to the current window (section)
content w, we update it: w ← w ∪ {p}. Otherwise, we finalize the current section s = w, reset the
window (creating a new section) w = {p}, and continue processing. This approach ensures that each
segment maintains topical coherence while respecting token limitations.</p>
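The sliding-window procedure above can be sketched in Python. The function `is_related` stands in for the LLM call that answers ’INCLUDE’ or ’NEW_SECTION’; here it is replaced by a toy keyword-overlap check purely for illustration:

```python
from typing import Callable, List

def segment_paragraphs(
    paragraphs: List[str],
    is_related: Callable[[List[str], str], bool],
) -> List[List[str]]:
    """Relevance-based sliding-window segmentation.

    is_related(window, paragraph) stands in for the LLM prompt that
    decides between 'INCLUDE' and 'NEW_SECTION'.
    """
    sections: List[List[str]] = []
    window: List[str] = []
    for p in paragraphs:
        if not window or is_related(window, p):
            window.append(p)          # w <- w U {p}
        else:
            sections.append(window)   # finalize the current section s = w
            window = [p]              # reset the window to {p}
    if window:
        sections.append(window)       # flush the last open window
    return sections

# Toy relevance check: paragraphs sharing any word stay in one section.
def share_keyword(window, p):
    seen = set(w for q in window for w in q.lower().split())
    return bool(seen & set(p.lower().split()))

docs = ["troponin rises in injury", "injury may be acute", "statins lower LDL"]
print(segment_paragraphs(docs, share_keyword))
```

In the deployed pipeline the relevance decision is made by an LLM, so segment boundaries reflect semantic rather than lexical coherence; the control flow, however, is exactly the loop above.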
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Knowledge Graph Construction</title>
        <p>Our knowledge graph construction operates at multiple semantic granularities to capture both
fine-grained factual information and high-level conceptual relationships. The process creates a hierarchical
knowledge representation that supports both detailed fact retrieval and contextual understanding.</p>
        <sec id="sec-2-2-1">
          <title>2.2.1. Multi-Level Information Extraction</title>
          <p>For each section s, we perform parallel extraction of two complementary types of information: atomic
propositions and a section summary.</p>
          <p>
            Atomic Proposition Extraction: Following the methodology of Chen et al. [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ], we extract atomic
propositions that represent the smallest meaningful factual units. Each proposition p is a simplified,
self-contained statement that captures a single fact or relationship. For example, the complex sentence
"This type of myocardial injury is characterized by myocyte necrosis and elevated troponin levels
due to mechanisms other than myocardial ischemia and can be acute or chronic." is decomposed into
propositions such as:
• p1: "This type of myocardial injury is characterized by myocyte necrosis."
• p2: "This type of myocardial injury is characterized by elevated troponin levels."
• p3: "Myocardial injury may be acute."
• p4: "Myocardial injury may be chronic."
          </p>
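Whichever LLM produces the decomposition, its bulleted reply still has to be parsed back into individual propositions. A minimal sketch follows; the bullet markers it accepts are an assumption about the reply format, not a guarantee of the system's actual prompt contract:

```python
def parse_propositions(llm_output: str) -> list:
    """Parse a bulleted LLM reply into a list of atomic propositions.

    Accepts '-', '*', or '•' bullets (an assumed output convention).
    """
    props = []
    for line in llm_output.splitlines():
        line = line.strip()
        if line.startswith(("-", "*", "•")):
            props.append(line.lstrip("-*• ").strip())
    return props

# Example reply mirroring the decomposition shown in the text above.
reply = """\
- This type of myocardial injury is characterized by myocyte necrosis.
- This type of myocardial injury is characterized by elevated troponin levels.
- Myocardial injury may be acute.
- Myocardial injury may be chronic."""
print(parse_propositions(reply))
```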
          <p>Summarization: Simultaneously, each section undergoes structured summarization to extract
high-level information relevant to potential medical queries. Our summarization prompt specifically targets
key medical concepts, including:
• Overall description
• Key medical methods
• Indications
• Mechanisms of action
• Efficacy and safety
• Advantages and limitations
• Clinical applications
• Patient outcomes
• Future directions</p>
          <p>As a result, a summary of each section is created that contains information related to the
above-mentioned concepts.</p>
        </sec>
        <sec id="sec-2-2-2">
          <title>2.2.2. Hierarchical Knowledge Organization</title>
          <p>We create a three-level hierarchical structure to support different query types and retrieval granularities:
- Document Level: All section summaries for a document d are aggregated using an LLM to
create a comprehensive document-level summary node n_d. This aggregation process
identifies common themes, resolves potential contradictions, and creates a coherent overview of
the entire document’s content.
- Section Level: Each section s generates a corresponding section summary node n_s
that captures the essential information within that specific section while maintaining links to the
parent document node.
- Proposition Level: Atomic propositions extracted from each section are semantically grouped
using our LLM-based clustering approach. Similar to the initial segmentation method,
propositions are grouped based on semantic relevance, forming coherent chunks C =
{p1, p2, ..., pm}.</p>
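One way to represent the three levels is with plain data classes, as in the sketch below. The class and field names are illustrative assumptions, not the system's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class PropositionChunk:
    """Proposition level: a semantically grouped chunk of atomic facts."""
    propositions: list

@dataclass
class SectionNode:
    """Section level: summary node plus links down to its chunks."""
    section_id: str
    summary: str
    chunks: list = field(default_factory=list)

@dataclass
class DocumentNode:
    """Document level: LLM-aggregated summary over all section summaries."""
    doc_id: str
    summary: str
    sections: list = field(default_factory=list)

# Build a tiny hierarchy for one hypothetical guideline document.
doc = DocumentNode("esc-acs-2023", "Management of acute coronary syndromes.")
sec = SectionNode("s1", "ECG criteria for STEMI.")
sec.chunks.append(PropositionChunk(["ST-segment elevation at the J point suggests STEMI."]))
doc.sections.append(sec)
print(doc.sections[0].chunks[0].propositions[0])
```

Keeping the three levels as explicit parent-child links is what later lets the retriever descend from document summaries to section summaries to proposition chunks.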
        </sec>
        <sec id="sec-2-2-3">
          <title>2.2.3. Node and Relationship Generation</title>
          <p>Within each proposition chunk C, we apply structured information extraction methods derived from
the Camel framework to generate knowledge graph nodes and relationships. The process involves:
• Entity Recognition and Typing;
• Relationship Extraction;
• Node Deduplication.</p>
        </sec>
        <sec id="sec-2-2-4">
          <title>2.2.4. Traceability and Provenance</title>
          <p>A critical aspect of our system is maintaining bidirectional traceability between knowledge graph
elements and source documents. Each generated node stores:</p>
          <p>• Source document identifier;
• Section identifier;
• Original text span.</p>
          <p>This provenance information enables automatic citation generation and allows users to verify the
reliability of extracted knowledge.</p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Question Answering with Citation Support</title>
        <p>Our question-answering pipeline combines semantic retrieval with source attribution to provide accurate
and verifiable answers to medical queries.</p>
        <sec id="sec-2-3-1">
          <title>2.3.1. Query Processing and Restructuring</title>
          <p>Medical queries often require domain-specific interpretation and may need restructuring to align with
the knowledge graph’s structure. We employ an LLM to transform user queries into a standardized
medical inquiry format.</p>
        </sec>
        <sec id="sec-2-3-2">
          <title>2.3.2. Two-Stage Retrieval Process</title>
          <p>To efficiently navigate the large-scale knowledge graph, we implement a coarse-to-fine retrieval strategy:</p>
          <p>Document-Level Retrieval: The structured query is first matched against document-level summary
nodes using semantic similarity. We identify the most relevant documents based on similarity score
distribution.</p>
          <p>Section-Level Refinement: Within the selected documents, we perform fine-grained matching
against section-level nodes to identify the most relevant subsections.</p>
          <p>This two-stage approach significantly reduces computational overhead while maintaining high recall.</p>
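A minimal sketch of the coarse-to-fine strategy, assuming precomputed embedding vectors for the query, the document summaries, and the section summaries. The toy two-dimensional vectors and index layout are purely illustrative:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def two_stage_retrieve(query_vec, index, top_docs=1, top_secs=2):
    """Stage 1: rank documents by summary similarity.
    Stage 2: rank sections inside the selected documents only."""
    ranked = sorted(index, key=lambda d: cosine(query_vec, index[d]["vec"]),
                    reverse=True)[:top_docs]
    hits = []
    for d in ranked:
        for s, v in index[d]["sections"].items():
            hits.append((cosine(query_vec, v), d, s))
    hits.sort(key=lambda h: h[0], reverse=True)
    return [(d, s) for _, d, s in hits[:top_secs]]

# Hypothetical index with two documents and their section embeddings.
index = {
    "esc-acs-2023": {"vec": [1.0, 0.1],
                     "sections": {"stemi-ecg": [1.0, 0.0], "therapy": [0.5, 0.5]}},
    "lipids-2019":  {"vec": [0.0, 1.0],
                     "sections": {"statins": [0.1, 1.0]}},
}
print(two_stage_retrieve([1.0, 0.0], index))
```

Because stage 2 only scores sections inside the documents surviving stage 1, the number of similarity computations stays far below a flat scan over every section, which is the source of the computational savings noted above.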
        </sec>
        <sec id="sec-2-3-3">
          <title>2.3.3. Context Assembly and Answer Generation</title>
          <p>Retrieved content is assembled into a structured context that includes relevant knowledge graph triplets.
The assembled context is provided to a medical-domain fine-tuned LLM along with the original query
to generate comprehensive answers.</p>
        </sec>
        <sec id="sec-2-3-4">
          <title>2.3.4. Automatic Citation Generation</title>
          <p>A key innovation of our system is the automatic generation of verifiable citations. Using the stored
provenance information, we trace each piece of information used in answer generation back to its
original source location.</p>
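With provenance stored on every node, citation generation reduces to a lookup and de-duplication pass over the nodes used in the answer. The dictionary keys below are assumptions about the stored fields, not the system's exact schema:

```python
def generate_citations(used_node_ids, nodes):
    """Map node IDs used in an answer back to unique source locations
    (document, section), preserving first-use order."""
    seen, citations = set(), []
    for nid in used_node_ids:
        prov = nodes[nid]                       # provenance stored per node
        key = (prov["doc"], prov["section"])
        if key not in seen:                     # cite each location once
            seen.add(key)
            citations.append(f'{prov["doc"]}, sec. {prov["section"]}')
    return citations

# Hypothetical nodes carrying the provenance fields listed in Sec. 2.2.4.
nodes = {
    "n1": {"doc": "ESC Guidelines 2023", "section": "3.1", "span": "ST-segment elevation ..."},
    "n2": {"doc": "ESC Guidelines 2023", "section": "3.1", "span": "cardiac troponin ..."},
    "n3": {"doc": "ESC Guidelines 2023", "section": "5.2", "span": "immediate invasive strategy ..."},
}
print(generate_citations(["n1", "n2", "n3"], nodes))
```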
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <p>The purpose of the survey is to measure users’ satisfaction with the answers provided by the selected
LLM-based systems and to check whether introducing explanations adds value to the system. Clinicians
were asked to evaluate answers generated by ChatGPT o1 with the temperature set to 0.2 (model A),
0.5 (model B), and 0.8 (model C), as well as answers by the pilot MedicalGraphRAG
(model D).</p>
      <p>We asked each model (A-D) to answer the following question:</p>
      <p>"The patient has ST-segment elevation at the J point. When to diagnose acute coronary syndrome?"
The respondents evaluated the four answers against the same criteria. Physicians evaluated the helpfulness,
coherence, and presence of redundant information in the model responses.</p>
      <sec id="sec-3-1">
        <title>3.1. Illustrative Example: End-to-end Answer Generation Process</title>
        <p>The illustration of the answer generation process performed by the MedicalGraphRAG is presented below.</p>
        <p>
          First, the retriever compared all imported materials and returned the most relevant document, that
is [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. A second LLM pass ranked intra-document sections by semantic similarity to the query, after
which the retriever collected the subject–predicate–object triples of the top passages.
(Acute coronary syndromes (ACS))–[PERTAINS_TO]–(ESC Guidelines 2023)
(Acute coronary syndromes (ACS))–[ASSOCIATED_WITH]–(ST-segment elevation)
(Acute coronary syndrome)–[SUGGESTS_DIAGNOSIS_OF]–(Acute chest discomfort)
(Acute coronary syndrome)–[PERFORM]–(12-lead ECG)
(Acute coronary syndrome)–[RECOMMENDED]–(Immediate invasive strategy)
(ST-segment elevation myocardial infarction (STEMI))–[RECOMMENDED]–(PPCI)
(Invasive strategy)–[RECOMMENDS]–(Acute coronary syndrome (ACS))
(ST-segment elevation myocardial infarction (STEMI))–[REQUIRES]–(Fibrinolysis)
(Acute coronary syndrome)–[REQUIRES]–(Electrocardiogram (ECG))
(Cardiac troponin)–[CHARACTERIZED_BY]–(Myocardial injury)
        </p>
        <p>These triples were, in turn, supplied, together with the supporting passages, to the LLM, prompting it
to draft a clinical answer and to cite the source document as a reference. The final generated answer is as follows:
Answer
ST-segment elevation at the J point in a patient is one of the key diagnostic criteria for the recognition of
acute coronary syndrome, in particular ST-segment elevation myocardial infarction (STEMI). According to
the European Society of Cardiology (ESC) guidelines, acute coronary syndrome can be diagnosed on the
basis of electrocardiogram (ECG) changes, such as ST-segment elevation, together with elevated cardiac
troponin levels indicating myocardial injury.</p>
        <p>When ST-segment elevation is identified, an immediate invasive strategy is recommended, comprising
primary percutaneous coronary intervention (PPCI) or reperfusion therapy, including fibrinolysis if PPCI
is not available in a timely manner. The ESC guidelines emphasize the importance of rapid action to
minimize myocardial damage and improve patient prognosis.</p>
        <p>References: 2023 ESC Guidelines for the management of acute coronary syndromes
Translation based on: 2023 ESC Guidelines for the management of acute coronary syndromes (European
Heart Journal, 2023 — doi.org/10.1093/eurheartj/ehad191).</p>
        <p>Patients with acute coronary syndrome (ACS) may initially present with a wide variety of clinical
symptoms, and it is important that both the public and healthcare professionals are aware of this. [...] [The
remainder of the references is omitted.]</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Survey Results</title>
        <p>We received responses from 33 physicians who completed their medical studies between 1979 and 2024.
The study invitation explained that the AI-based system would serve as a tool to support clinical
decision-making. The purpose of the survey was to determine which responses, in their opinion, represented
the best answers to the given question. The individual questionnaire included two questions:
1. If such an answer was given in everyday work, how helpful is it?
2. Does it contain too much unnecessary information?
Then, they were asked to answer and justify the following summary question:
• Which of the above versions of the answer: A, B, C, or D, would be the best when working in the
Hospital Emergency Department?</p>
        <p>Figures 2-3 present the main results. With the first survey question (question 1), we aimed to assess whether
the AI-generated answers would be helpful in everyday practice. To evaluate the comprehensiveness
and usefulness of these answers in clinical decision-making, we employed a 10-point scale ranging
from ‘not at all’ to ‘contains everything I need’. The vast majority of respondents selected the model
with an intermediate temperature of 0.5, whose answers contained specific information regarding the
diagnosis of an emergency condition. For the second question (question 2), we asked respondents to evaluate
the presence of unnecessary information. Model A had the lowest temperature setting and was rated as
the most concise. Nevertheless, for each model, a significant percentage of physicians (ranging from 18%
to 63%) indicated that the amount of information provided was appropriate.</p>
        <p>Figure 4 presents results for the third, summarizing, survey question (question 3) about selecting the
best answer, while Table 1 summarizes the reasons for this selection. Interestingly, most respondents
selected model B (temperature 0.5). The remaining models were rated similarly (temperature 0.2,
temperature 0.8, and the MedicalGraphRAG system). Two individuals did not see a clear favourite.
One person concluded that none of the answers met their expectations. Yet, when asked
about the thoroughness of the answer, the overwhelming majority of participants pointed to the
MedicalGraphRAG system, Table 1.</p>
        <p>The present study indicates that, besides the essential substantive support provided by medical
decision support systems, it is very important how the responses are presented. A group of physicians
demands specific, concise responses from LLMs, while another group expects comprehensive answers
that include an extensive clinical context.</p>
        <p>This pilot study does not analyze all factors influencing the selection of responses at the current
stage; however, it demonstrates that various forms of presentation are valued. These differences may
stem from both professional experience and individual preferences.
[Table 1: Reasons given for selecting the best answer: 1. The most precise one, it contained everything
I needed; 2. The most comprehensive one, it explained all the aspects; 3. Simple, clear message.]</p>
        <p>Collaboration between physicians and decision-support systems, particularly regarding the level
of detail in the provided content, should be tailored to the clinical context and individual users. One
limitation of the present study is that it evaluates a single clinical scenario typical of an emergency
department, assessed by physicians from various specialities. A cardiologist may consider one response
valid if the scenario pertains to their field, whereas an otolaryngologist might prefer a different answer.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions and Future Work</title>
      <p>Significant differences exist in the responses preferred by physicians from LLM systems depending on
the model’s temperature. Most physicians choose models whose responses are described as ‘the most
precise,’ whereas a second group prefers comprehensive answers containing an extensive description of
the suggested actions. Further research is needed to tailor responses to up-to-date medical knowledge in
the clinical context and physicians’ expectations. This study highlights the need for extensive validation
of LLM-based systems, including their robustness, ethical aspects, transparency, and explicability.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work is supported by the project "Research of Excellence on Digital Technologies and Wellbeing
CZ.02.01.01/00/22_008/0004583", which is co-financed by the European Union. Katarzyna
Kaczmarek-Majer is supported by the project "ExplainMe: Explainable Artificial Intelligence for Monitoring
Acoustic Features extracted from Speech" (FENG.02.02-IP.05-0302/23), carried out within the First Team
programme of the Foundation for Polish Science, co-financed by the European Union under the European
Funds for Smart Economy 2021-2027 (FENG). The authors thank all clinicians involved in the user study
for helping with the evaluation of answers.
During the preparation of this work, the author(s) used X-GPT-4 to generate samples for the survey.
After using these tool(s), the author(s) reviewed and edited the content as needed and take(s) full
responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Glicksberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Timsina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sawant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Raut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Charney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Apakama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Carr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Freeman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Nadkarni</surname>
          </string-name>
          , E. Klang,
          <article-title>Evaluating the accuracy of a state-of-the-art large language model for prediction of admissions from the emergency room</article-title>
          ,
          <source>J Am Med Inform Assoc</source>
          <volume>1</volume>
          :
          <fpage>31</fpage>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Shekhar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kimbrell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Saharan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Stebel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ashley</surname>
          </string-name>
          , E. Abbott,
          <article-title>Use of a large language model (llm) for ambulance dispatch and triage</article-title>
          ,
          <source>Am J Emerg Med</source>
          <volume>89</volume>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C. Y. K.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zack</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. Y.</given-names>
            <surname>Miao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sushil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. E.</given-names>
            <surname>Kornblith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Butte</surname>
          </string-name>
          ,
          <article-title>Use of a large language model to assess clinical acuity of adults in the emergency department</article-title>
          ,
          <source>JAMA Network Open</source>
          <volume>7</volume>
          (
          <year>2024</year>
          )
          <fpage>e248895</fpage>
          -
          <lpage>e248895</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          European Commission, Directorate-General for Communications Networks, Content and Technology, and the High-Level Expert Group on Artificial Intelligence:
          <article-title>Ethics guidelines for trustworthy AI</article-title>
          ,
          Publications Office,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          European Parliament and Council:
          <article-title>Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act)</article-title>
          ,
          <source>Official Journal of the European Union, 12 July 2024</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Menolascina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Grau</surname>
          </string-name>
          ,
          <article-title>Medical Graph RAG: Towards safe medical large language model via graph retrieval-augmented generation</article-title>
          ,
          <source>arXiv preprint arXiv:2408.04187</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <collab>OpenAI</collab>
          ,
          <article-title>Introducing OpenAI o1</article-title>
          , retrieved December 6, 2024 (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ouyang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Qu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Shang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Sui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Qiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <article-title>MinerU: An open-source solution for precise document content extraction</article-title>
          ,
          <source>arXiv preprint arXiv:2409.18839</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>OpenDataLab: Empowering general artificial intelligence with open datasets</article-title>
          ,
          <source>arXiv preprint arXiv:2407.13773</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>N. F.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hewitt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Paranjape</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bevilacqua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <article-title>Lost in the middle: How language models use long contexts</article-title>
          ,
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>12</volume>
          (
          <year>2024</year>
          )
          <fpage>157</fpage>
          -
          <lpage>173</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>Dense X retrieval: What retrieval granularity should we use?</article-title>
          ,
          <source>arXiv preprint arXiv:2312.06648</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Byrne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Rossello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Coughlan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Barbato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Berry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chieffo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Claeys</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.-A.</given-names>
            <surname>Dan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Dweck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Galbraith</surname>
          </string-name>
          , et al.,
          <article-title>2023 ESC Guidelines for the management of acute coronary syndromes</article-title>
          ,
          <source>Polish Heart Journal (Kardiologia Polska)</source>
          <volume>81</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>102</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>