<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Cancer-Answer: Empowering Cancer Care with Advanced Large Language Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aniket Deroy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Subhankar Maity</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IIT Kharagpur</institution>
          ,
          <addr-line>Kharagpur</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Gastrointestinal (GI) tract cancers account for a substantial portion of the global cancer burden, where early diagnosis is critical for improved management and patient outcomes. The complex aetiologies and overlapping symptoms across GI cancers often delay diagnosis, leading to suboptimal treatment strategies. Cancer-related queries are crucial for timely diagnosis, treatment, and patient education, as access to accurate, comprehensive information can significantly influence outcomes. However, the complexity of cancer as a disease, combined with the vast amount of available data, makes it difficult for clinicians and patients to quickly find precise answers. To address these challenges, we leverage large language models (LLMs) such as GPT-3.5 Turbo to generate accurate, contextually relevant responses to cancer-related queries. Pre-trained with medical data, these models provide timely, actionable insights that support informed decision-making in cancer diagnosis and care, ultimately improving patient outcomes. We calculate two metrics: A1 (which represents the fraction of entities present in the model-generated answer compared to the gold standard) and A2 (which represents the linguistic correctness and meaningfulness of the model-generated answer with respect to the gold standard), achieving maximum values of 0.546 and 0.881, respectively.</p>
      </abstract>
      <kwd-group>
        <kwd>GPT</kwd>
        <kwd>Medical</kwd>
        <kwd>Cancer</kwd>
        <kwd>Question Answering</kwd>
        <kwd>Prompt Engineering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Gastrointestinal (GI) tract cancers [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ] represent a significant portion of cancer-related morbidity and mortality worldwide, encompassing malignancies of the esophagus, stomach, liver, pancreas, and intestines. Early detection and accurate diagnosis are paramount for improving prognosis and patient survival. However, these cancers present a unique set of challenges due to their complex aetiologies and overlapping symptoms, which often result in delayed diagnosis and misclassification. Differentiating between various GI tract cancers remains a formidable task for clinicians, who must navigate a wide array of symptoms that can mimic benign conditions or other malignancies. In this context, cancer-related queries are crucial for timely diagnosis, treatment, and patient education, as access to accurate and comprehensive information can significantly impact outcomes [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The sheer volume of data and the complexity of cancer as a disease make it difficult for clinicians and patients to quickly access precise answers.
      </p>
      <p>
        Traditional diagnostic methods [
        <xref ref-type="bibr" rid="ref5">5, 6</xref>
        ], including imaging, endoscopy, and histopathological examination, although valuable, sometimes fall short in providing rapid and precise differentiation of these cancer types. As a result, delays in diagnosis can compromise the effectiveness of treatment, leading to suboptimal patient outcomes. The need for more advanced, data-driven diagnostic tools has never been greater. In recent years, the advent of large language models (LLMs) [7, 8] such as GPT-3.5 Turbo has opened new possibilities in medical diagnostics and decision support. These models, when prompted appropriately, have demonstrated remarkable potential in generating human-like text and answering complex queries. Leveraging LLMs for medical diagnostics offers a promising approach to addressing the diagnostic challenges posed by GI tract cancers. In this work, we explore the use of prompted LLMs to generate answers to medical queries, with a particular focus on their applicability to differentiating between GI tract cancers. By harnessing the power of these models, we aim to offer new insights into how artificial intelligence can assist clinicians in making more timely and accurate diagnoses, ultimately improving patient outcomes.
      </p>
      <p>We explore prompted large language models (LLMs), such as GPT-3.5 Turbo [9], by designing and using specific prompts to generate relevant and coherent answers for various medical queries. These prompts guide the model to focus on producing medically accurate, context-appropriate responses based on the input. By leveraging the capabilities of GPT-3.5 Turbo, we aim to harness its vast knowledge base and advanced natural language understanding to assist in addressing a range of medical questions effectively. The use of prompts ensures that the model responds in a structured manner, providing meaningful information that aligns with the specific medical context of the queries.</p>
      <p>Metrics A1 and A2 represent distinct components or processes within the evaluated system, and their values offer valuable information about the operational efficiency and potential areas for enhancement. By examining the results from multiple runs, we aim to identify trends, improvements, or inconsistencies that could impact the overall effectiveness of the system. This evaluation not only helps in understanding the current performance but also guides future adjustments and optimizations.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>The application of artificial intelligence (AI) in healthcare [10, 11, 12] has seen rapid growth in recent years, particularly in the areas of diagnostics and decision support. Several studies have explored the potential of machine learning (ML) and deep learning techniques to assist in the detection and classification of gastrointestinal (GI) cancers [13, 14, 15]. These approaches range from traditional supervised learning models to more advanced AI systems, including convolutional neural networks (CNNs) and natural language processing (NLP) models.</p>
      <p>Early work in AI-assisted GI cancer diagnostics primarily focused on image-based techniques. For instance, CNNs have been employed to analyze endoscopic and radiological images for detecting specific types of GI cancers such as esophageal and colorectal cancer [16, 17, 18]. Studies such as [19, 20, 21] demonstrated the ability of deep learning algorithms to match or even surpass human-level performance in identifying cancerous lesions. While these methods have made significant strides, they are primarily limited to image processing tasks, requiring large amounts of annotated data and sophisticated preprocessing techniques.</p>
      <p>NLP models, on the other hand, have recently been leveraged to analyze clinical reports and patient records. Various works, such as [22, 23, 24], utilized deep learning models to extract relevant information from unstructured text data, aiming to support clinical decision-making. However, these approaches often rely on predefined rules or training with vast, labeled datasets, which limits their generalizability to diverse clinical scenarios, including GI cancers.</p>
      <p>The emergence of large language models (LLMs) like GPT-3 and GPT-3.5 has introduced a new frontier in NLP for healthcare [25, 26, 27]. These models are pre-trained on massive amounts of text data and can generate highly contextualized responses to medical queries with minimal fine-tuning. Studies such as [28, 29, 30] have begun exploring the utility of LLMs in medical applications, showing promising results in generating accurate, coherent responses to clinical questions. However, most research to date has focused on general medical knowledge or specific diseases, with limited exploration of their potential for diagnosing GI tract cancers.</p>
      <p>The idea of using prompted LLMs to aid in GI cancer diagnostics [25, 31, 32] remains underexplored.
While existing NLP systems provide valuable insights, they often lack the ability to dynamically respond
to complex medical queries without extensive fine-tuning. In contrast, GPT-3.5 Turbo and similar models
can be prompted to generate medically relevant text with minimal training, potentially addressing some
of the limitations faced by previous systems. This work builds on the foundation of AI in healthcare by
investigating the use of prompted LLMs to generate responses to GI cancer-related medical queries,
contributing to the growing body of research on AI-powered diagnostics in oncology.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset</title>
    </sec>
    <sec id="sec-4">
      <title>4. Task Definition</title>
      <p>The training set contains 30 queries related to GI cancer, and the testing set contains 50 queries related to GI cancer.</p>
      <p>We need to design a question-answering-based conversational system that can provide answers to queries related to GI cancer, using an AI model.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Methodology</title>
      <p>Prompting [33] is an emerging technique in the development of question-answering (QA) systems, particularly with the advent of large language models (LLMs) like GPT-3.5. Unlike traditional machine learning methods, which often require large amounts of labeled data and extensive fine-tuning, prompting involves crafting specific input instructions that guide LLMs to generate relevant answers to user queries. This approach has been adopted for the following key reasons:
- Minimal Data and Fine-Tuning Requirements: One of the major advantages of prompting is that it minimizes the need for extensive training data and domain-specific fine-tuning [34]. Traditional QA systems rely on massive datasets to train models for each specific task. With prompting, models like GPT-3.5 can leverage their pre-existing knowledge from vast amounts of pre-trained data, allowing them to generate accurate answers across different domains with minimal additional data. This is particularly useful in medical domains like gastrointestinal (GI) cancer diagnostics, where high-quality labeled data can be scarce and time-consuming to acquire.
- Generalization Across Diverse Topics: LLMs pre-trained on large and diverse corpora are capable of handling a wide variety of questions without being confined to a narrow domain [35]. In contrast, conventional QA systems typically require specialized models for specific areas. By using well-designed prompts, LLMs can provide answers to medical questions across different GI tract cancers without needing separate models for each type of cancer or medical condition. This flexibility allows the same model to respond to queries about symptoms, diagnostics, and treatment, improving efficiency.
- Reduced Development Time: The prompt-based approach reduces the time and complexity involved in developing QA systems [36]. Traditional systems require careful preprocessing, feature extraction, and extensive model training. By contrast, prompting requires only well-constructed input prompts that instruct the LLM to generate a response. This simplifies the development process and allows for rapid iteration, enabling QA systems to be deployed quickly in clinical environments.
- Dynamic and Contextual Responses: LLMs are designed to understand the context of a question and generate dynamic, human-like responses [37]. By using specific prompts, QA systems can better interpret the nuances of medical questions, which is critical in complex domains such as GI cancer. The models can adapt to variations in question phrasing, offering contextually relevant answers that align with the complexity of medical knowledge. For example, they can handle follow-up questions or clarify answers based on additional information provided by the user.
- Scalability and Adaptability: Prompt-based QA systems are highly scalable [37], as they do not require retraining or large-scale infrastructure changes when applied to new domains or updated with new information. This is particularly useful in rapidly evolving fields like medicine, where new research and findings continuously emerge. The adaptability of LLMs to new topics through updated prompts allows QA systems to stay current with the latest medical knowledge without the need for re-engineering the entire system.
- Cost-Effective Solution: Developing and maintaining traditional QA systems can be resource-intensive due to the need for large datasets, computing power, and expertise in model training [38]. Prompting offers a cost-effective alternative, as it capitalizes on the power of pre-trained LLMs. This approach reduces the dependency on large-scale infrastructure and can be easily implemented without requiring extensive computational resources.</p>
      <p>In summary, prompting is an efficient, flexible, and scalable solution for building question-answering systems in specialized domains like medical diagnostics. By leveraging LLMs through well-designed prompts, QA systems can generate accurate, context-aware responses, significantly reducing development time, data requirements, and costs, while improving the overall quality and accessibility of information. For the field of GI cancer diagnostics, prompted LLMs offer a promising tool for clinicians, allowing them to access critical information and make informed decisions more effectively.
5.1. Prompt Engineering-Based Approach
We used the GPT-3.5 Turbo model (https://platform.openai.com/docs/models/gpt-3-5-turbo) in zero-shot mode via prompting to solve the question-answering task. After the prompt is provided, the following steps occur internally within the LLM while generating the output:</p>
      <sec id="sec-5-1">
        <title>Step 1: Tokenization</title>
        <p>• The input text (prompt) is first tokenized into smaller units called tokens. These tokens are often subwords or characters, depending on the model’s design.
• Prompt: P = [p_1, p_2, . . . , p_n]
• Tokenized Input: T = [t_1, t_2, . . . , t_n]
Step 2: Embedding
• Each token is converted into a high-dimensional vector (embedding) using an embedding matrix E.
• Embedding Matrix: E ∈ R^(|V| × d), where |V| is the size of the vocabulary and d is the embedding dimension.</p>
        <p>• Embedded Tokens: X_emb = [E(t_1), E(t_2), . . . , E(t_n)]</p>
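        <p>As a toy sketch of Steps 1–2, the token and embedding lookups can be written in plain Python. The word-level vocabulary and embedding values below are invented for illustration; real models use learned subword tokenizers and learned embedding matrices.</p>

```python
# Toy sketch of Steps 1-2 (tokenization and embedding lookup).
# VOCAB and the embedding matrix E are made-up illustration values.

VOCAB = {"what": 0, "is": 1, "gi": 2, "cancer": 3, "?": 4}

# Embedding matrix E with |V| = 5 rows and d = 3 columns.
E = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
    [1.3, 1.4, 1.5],
]

def tokenize(text):
    """Split a prompt into known vocabulary tokens (toy word-level tokenizer)."""
    return [tok for tok in text.lower().replace("?", " ?").split() if tok in VOCAB]

def embed(tokens):
    """Map each token to its d-dimensional embedding row E[VOCAB[token]]."""
    return [E[VOCAB[tok]] for tok in tokens]

tokens = tokenize("What is GI cancer?")
X_emb = embed(tokens)
```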
      </sec>
      <sec id="sec-5-2">
        <title>Step 3: Positional Encoding</title>
        <p>• Since the model processes sequences, it adds positional information to the embeddings to capture the order of tokens.
• Positional Encoding: PE(pos)
• Input to the Model: X = X_emb + PE</p>
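        <p>The sinusoidal encoding of the original Transformer is one common concrete choice for PE(pos); GPT-style models actually learn their positional embeddings, so the sketch below is illustrative only.</p>

```python
import math

def positional_encoding(seq_len, d):
    """Sinusoidal positional encoding PE(pos, i): sin for even dimensions,
    cos for odd dimensions (the original Transformer's formulation)."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d):
            angle = pos / (10000 ** (2 * (i // 2) / d))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

def add_positions(x_emb, pe):
    """X = X_emb + PE, added element-wise at each position."""
    return [[e + p for e, p in zip(erow, prow)] for erow, prow in zip(x_emb, pe)]
```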
      </sec>
      <sec id="sec-5-3">
        <title>Step 4: Attention Mechanism (Transformer Architecture)</title>
        <p>• Attention Score Calculation: The model computes attention scores to determine the importance of each token relative to others in the sequence.
• Attention Formula:</p>
        <p>Attention(Q, K, V) = softmax(Q K^T / √d_k) V    (1)
• where Q (query), K (key), and V (value) are linear transformations of the input X, and d_k is the key dimension.
• This attention mechanism is applied multiple times through multi-head attention, allowing the model to focus on different parts of the sequence simultaneously.</p>
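        <p>Equation (1) for a single attention head can be sketched in pure Python (a toy illustration on nested lists, not the model's actual implementation):</p>

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def matmul(a, b):
    """Multiply matrices a (n x k) and b (k x m) given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def attention(Q, K, V):
    """Scaled dot-product attention, Eq. (1): softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    KT = [list(col) for col in zip(*K)]              # K^T
    scores = matmul(Q, KT)                           # Q K^T
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = [softmax(row) for row in scaled]       # row-wise softmax
    return matmul(weights, V)                        # weighted sum of values
```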
      </sec>
      <sec id="sec-5-4">
        <title>Step 5: Feedforward Neural Networks</title>
        <p>• The output of the attention mechanism is passed through feedforward neural networks, which apply non-linear transformations.
• Feedforward Layer:</p>
        <p>FFN(x) = max(0, x W_1 + b_1) W_2 + b_2    (2)
• where W_1, W_2 are weight matrices and b_1, b_2 are biases.</p>
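        <p>Equation (2) for a single position x can be sketched as follows (the weights are toy values supplied by the caller; real models learn them):</p>

```python
def ffn(x, W1, b1, W2, b2):
    """Position-wise feedforward, Eq. (2): FFN(x) = max(0, x W1 + b1) W2 + b2,
    for a single position x given as a list of floats."""
    # Hidden layer with ReLU: hidden_j = max(0, sum_i x_i * W1[i][j] + b1[j])
    hidden = [max(0.0, sum(xi * W1[i][j] for i, xi in enumerate(x)) + b1[j])
              for j in range(len(b1))]
    # Output layer: out_k = sum_j hidden_j * W2[j][k] + b2[k]
    return [sum(hj * W2[j][k] for j, hj in enumerate(hidden)) + b2[k]
            for k in range(len(b2))]
```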
      </sec>
      <sec id="sec-5-5">
        <title>Step 6: Stacking Layers</title>
        <p>• Multiple layers of attention and feedforward networks are stacked, each with its own set of parameters. This forms the "deep" in deep learning.
• Layer Output:
H̃^(l) = LayerNorm(H^(l) + Attention(Q^(l), K^(l), V^(l)))    (3)</p>
        <p>H^(l+1) = LayerNorm(H̃^(l) + FFN(H̃^(l)))    (4)
Step 7: Output Generation
• The final output of the stacked layers is a sequence of vectors.
• These vectors are projected back into the token space using a softmax layer to predict the next token or word in the sequence.
• Softmax Function:
P(y_i | X) = exp(z_i) / Σ_{j=1}^{|V|} exp(z_j)    (5)
• where z_i is the logit corresponding to token i in the vocabulary.
• The model generates the next token in the sequence based on the probability distribution, and the process repeats until the end of the output sequence is reached.</p>
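        <p>The softmax projection of Eq. (5) and the choice of the next token can be sketched as follows. Greedy argmax is shown for simplicity; at a non-zero temperature such as 0.7, the model instead samples from the distribution.</p>

```python
import math

def next_token_distribution(logits):
    """Eq. (5): P(y_i | X) = exp(z_i) / sum_j exp(z_j) over the vocabulary."""
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def greedy_next_token(logits, vocab):
    """Pick the highest-probability token (greedy decoding sketch)."""
    probs = next_token_distribution(logits)
    return vocab[probs.index(max(probs))]
```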
      </sec>
      <sec id="sec-5-6">
        <title>Step 8: Decoding</title>
        <p>• The predicted tokens are then decoded back into text, forming the final output.</p>
        <p>• Output Text: O = [o_1, o_2, . . . , o_m]</p>
        <p>(i) We used GPT-3.5 Turbo in zero-shot mode at temperature 0.7 with the following prompt: "Generate an answer that includes the key ideas corresponding to the question &lt;question&gt;."
(ii) We used GPT-3.5 Turbo in zero-shot mode at temperature 0.7 with a simpler prompt: "&lt;query&gt;" (the query was directly passed as the prompt without additional instructions).
(iii) We used GPT-3.5 Turbo in zero-shot mode at temperature 0.7 with the following prompt: "Please summarize the key ideas of the answer to the following cancer-related question in one paragraph: &lt;question&gt;".</p>
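        <p>The three prompt variants can be collected in a small helper. The function name `build_prompt` is our own illustration; the wording of the prompts follows the paper, and the resulting string would be sent to GPT-3.5 Turbo as a chat message at temperature 0.7 (the API call itself is omitted here).</p>

```python
def build_prompt(query, variant):
    """Construct one of the three zero-shot prompt variants described above."""
    if variant == 1:
        return ("Generate an answer that includes the key ideas "
                f"corresponding to the question {query}.")
    if variant == 2:
        return query  # the query is passed directly as the prompt
    if variant == 3:
        return ("Please summarize the key ideas of the answer to the "
                f"following cancer-related question in one paragraph: {query}")
    raise ValueError("variant must be 1, 2, or 3")
```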
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Results</title>
      <p>We define two metrics:</p>
      <p>A1: A1 measures the fraction of entities in the gold standard answer that are also present in the model-generated answer.</p>
      <p>A1 = |(Entities in Model-Generated Answer) ∩ (Entities in Gold Standard Answer)| / |(Entities in Gold Standard Answer)|</p>
      <p>We manually identify the entities in every model-generated answer and the corresponding gold standard answer to calculate the value of A1 for a particular (Query, Answer) pair. The values are summed over the 50 samples in the test set and then averaged.</p>
      <p>A2: A2 measures the linguistic correctness and meaningfulness of the model-generated answers with respect to the gold standard data.</p>
      <p>
        We manually assign a value in [0, 1] representing the linguistic correctness and meaningfulness of the generated answer to calculate the value of A2 for a particular (Query, Answer) pair. The values are summed over the 50 samples in the test set and then averaged.
      </p>
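        <p>Given the manually extracted entity sets and the manual linguistic judgments, the two metrics can be sketched as follows (the helper names `a1_score` and `average_metric` are our own; entity extraction itself is done by hand in this work, so the sets are simply passed in):</p>

```python
def a1_score(model_entities, gold_entities):
    """A1 for one (Query, Answer) pair: the fraction of gold standard
    entities that also appear in the model-generated answer."""
    gold = set(gold_entities)
    if not gold:
        return 0.0
    return len(set(model_entities) & gold) / len(gold)

def average_metric(per_sample_scores):
    """Average a per-sample metric (A1 values, or the manual A2 judgments)
    over the test set."""
    return sum(per_sample_scores) / len(per_sample_scores)
```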
      <p>Table 1 shows the values of the two metrics, A1 and A2, for three different runs.</p>
      <p>For both A1 and A2, the upward trend across runs is positive and suggests that the system is becoming more effective. It would be worth investigating which changes between runs led to these improvements.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>In this work, we explored the potential of large language models (LLMs), particularly GPT-3.5 Turbo, as a question-answering (QA) tool to address the challenges associated with diagnosing gastrointestinal (GI) tract cancers. GI cancers pose unique difficulties due to overlapping symptoms and complex aetiologies, often leading to delayed diagnoses and suboptimal treatment strategies. By leveraging the power of prompted LLMs, we demonstrated the capability of these models to generate coherent, contextually relevant answers to medical queries, providing a flexible and efficient approach for assisting clinicians in differentiating between various GI cancers.</p>
      <p>The analysis of the metrics A1 and A2 across three runs reveals important insights into their behavior and effectiveness. For both metrics, the consistent improvement in performance across the runs indicates a successful enhancement of the underlying system or methodology. This positive trend suggests that the adjustments or optimizations implemented are yielding favorable results and warrants continued focus and refinement.</p>
      <p>Our findings highlight the advantages of using prompt-based systems in healthcare, including the
ability to generalize across a wide range of medical topics, minimal data requirements, and the flexibility
to dynamically adapt to new information. These characteristics make LLMs a promising tool for
augmenting clinical decision-making, particularly in resource-constrained environments where access
to specialized diagnostic expertise may be limited.</p>
      <p>However, it is important to recognize the limitations of current LLMs in handling highly specialized
or nuanced medical cases, underscoring the need for ongoing research to improve model accuracy
and reliability. Future work could focus on further fine-tuning LLMs with domain-specific data or
incorporating additional knowledge sources to enhance their diagnostic capabilities.</p>
      <p>Overall, the integration of LLMs into clinical workflows has the potential to improve the accuracy and timeliness of cancer diagnoses, particularly in complex cases like GI tract cancers, ultimately contributing to better patient outcomes and more efficient healthcare delivery.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used ChatGPT for drafting content and for grammar and spelling checks. After using this tool/service, the author(s) reviewed and edited the content as needed and take(s) full responsibility for the publication’s content.</p>
      <p>[6] S. Coda, A. V. Thillainayagam, State of the art in advanced endoscopic imaging for the detection and evaluation of dysplasia and early cancer of the gastrointestinal tract, Clinical and Experimental Gastroenterology (2014) 133–150.
[7] T. I. Wilhelm, J. Roos, R. Kaczmarczyk, Large language models for therapy recommendations across 3 clinical specialties: comparative study, Journal of Medical Internet Research 25 (2023) e49324.
[8] Z. A. Nazi, W. Peng, Large language models in healthcare and medical domain: A review, in: Informatics, volume 11, MDPI, 2024, p. 57.
[9] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al., Language models are few-shot learners, Advances in Neural Information Processing Systems 33 (2020) 1877–1901.
[10] M. Y. Shaheen, Applications of artificial intelligence (AI) in healthcare: A review, ScienceOpen Preprints (2021).
[11] A. Väänänen, K. Haataja, K. Vehviläinen-Julkunen, P. Toivanen, AI in healthcare: A narrative review, F1000Research 10 (2021) 6.
[12] A. Panesar, Machine Learning and AI for Healthcare, Springer, 2019.
[13] C. Röcken, Molecular classification of gastric cancer, Expert Review of Molecular Diagnostics 17 (2017) 293–301.
[14] S. Kuntz, E. Krieghoff-Henning, J. N. Kather, T. Jutzi, J. Höhn, L. Kiehl, A. Hekler, E. Alwers, C. von Kalle, S. Fröhling, et al., Gastrointestinal cancer classification and prognostication from histology using deep learning: Systematic review, European Journal of Cancer 155 (2021) 200–215.
[15] O. Serra, M. Galán, M. Ginesta, M. Calvo, N. Sala, R. Salazar, Comparison and applicability of molecular classifications for gastric cancer, Cancer Treatment Reviews 77 (2019) 29–34.
[16] G. Liu, J. Hua, Z. Wu, T. Meng, M. Sun, P. Huang, X. He, W. Sun, X. Li, Y. Chen, Automatic classification of esophageal lesions in endoscopic images using a convolutional neural network, Annals of Translational Medicine 8 (2020).
[17] F. Xie, K. Zhang, F. Li, G. Ma, Y. Ni, W. Zhang, J. Wang, Y. Li, Diagnostic accuracy of convolutional neural network–based endoscopic image analysis in diagnosing gastric cancer and predicting its invasion depth: a systematic review and meta-analysis, Gastrointestinal Endoscopy 95 (2022) 599–609.
[18] B. P. Mohan, S. R. Khan, L. L. Kassab, S. Ponnada, P. S. Dulai, G. S. Kochhar, Accuracy of convolutional neural network-based artificial intelligence in diagnosis of gastrointestinal lesions based on endoscopic images: A systematic review and meta-analysis, Endoscopy International Open 8 (2020) E1584–E1594.
[19] A. Mitsala, C. Tsalikidis, M. Pitiakoudis, C. Simopoulos, A. K. Tsaroucha, Artificial intelligence in colorectal cancer screening, diagnosis and treatment. A new era, Current Oncology 28 (2021) 1581–1607.
[20] H. He, S. Yan, D. Lyu, M. Xu, R. Ye, P. Zheng, X. Lu, L. Wang, B. Ren, Deep learning for biospectroscopy and biospectral imaging: state-of-the-art and perspectives, 2021.
[21] Z. Omar, Deep Learning Applications in Medical Bioinformatics, Master’s thesis, University of Windsor (Canada), 2021.
[22] D. Zhang, C. Yin, J. Zeng, X. Yuan, P. Zhang, Combining structured and unstructured data for predictive models: a deep learning approach, BMC Medical Informatics and Decision Making 20 (2020) 1–11.
[23] I. Spasic, G. Nenadic, et al., Clinical text data in machine learning: systematic review, JMIR Medical Informatics 8 (2020) e17984.
[24] T. M. Seinen, E. A. Fridgeirsson, S. Ioannou, D. Jeannetot, L. H. John, J. A. Kors, A. F. Markus, V. Pera, A. Rekkas, R. D. Williams, et al., Use of unstructured text in prognostic clinical prediction models: a systematic review, Journal of the American Medical Informatics Association 29 (2022) 1292–1302.
[25] Y. Liu, T. Han, S. Ma, J. Zhang, Y. Yang, J. Tian, H. He, A. Li, M. He, Z. Liu, et al., Summary of ChatGPT-related research and perspective towards the future of large language models, Meta-Radiology (2023) 100017.
[26] M. U. Hadi, R. Qureshi, A. Shah, M. Irfan, A. Zafar, M. B. Shaikh, N. Akhtar, J. Wu, S. Mirjalili, et al., A survey on large language models: Applications, challenges, limitations, and practical usage, Authorea Preprints (2023).
[27] M. U. Hadi, R. Qureshi, A. Shah, M. Irfan, A. Zafar, M. B. Shaikh, N. Akhtar, J. Wu, S. Mirjalili, et al., Large language models: a comprehensive survey of its applications, challenges, limitations, and future prospects, Authorea Preprints (2023).
[28] Z. Z. Chen, J. Ma, X. Zhang, N. Hao, A. Yan, A. Nourbakhsh, X. Yang, J. McAuley, L. Petzold, W. Y. Wang, A survey on large language models for critical societal domains: Finance, healthcare, and law, arXiv preprint arXiv:2405.01769 (2024).
[29] W. Khan, S. Leem, K. B. See, J. K. Wong, S. Zhang, R. Fang, A comprehensive survey of foundation models in medicine, arXiv preprint arXiv:2406.10729 (2024).
[30] M. S. Treder, S. Lee, K. A. Tsvetanov, Introduction to large language models (LLMs) for dementia care and research, Frontiers in Dementia 3 (2024) 1385303.
[31] P. Hager, F. Jungmann, K. Bhagat, I. Hubrecht, M. Knauer, J. Vielhauer, R. Holland, R. Braren, M. Makowski, G. Kaissis, et al., Evaluating and mitigating limitations of large language models in clinical decision making, medRxiv (2024) 2024–01.
[32] P. Hager, F. Jungmann, R. Holland, K. Bhagat, I. Hubrecht, M. Knauer, J. Vielhauer, M. Makowski, R. Braren, G. Kaissis, et al., Evaluation and mitigation of the limitations of large language models in clinical decision-making, Nature Medicine (2024) 1–10.
[33] S. Maity, A. Deroy, S. Sarkar, Exploring the capabilities of prompted large language models in educational and assessment applications, arXiv preprint arXiv:2405.11579 (2024).
[34] C. Ge, R. Huang, M. Xie, Z. Lai, S. Song, S. Li, G. Huang, Domain adaptation via prompt learning, IEEE Transactions on Neural Networks and Learning Systems (2023).
[35] R. Patil, V. Gudivada, A review of current trends, techniques, and challenges in large language models (LLMs), Applied Sciences 14 (2024) 2074.
[36] Z. Gekhman, N. Oved, O. Keller, I. Szpektor, R. Reichart, On the robustness of dialogue history representation in conversational question answering: a comprehensive study and a new prompt-based method, Transactions of the Association for Computational Linguistics 11 (2023) 351–366.
[37] B. Alsafari, E. Atwell, A. Walker, M. Callaghan, Towards effective teaching assistants: From intent-based chatbots to LLM-powered teaching assistants, Natural Language Processing Journal (2024) 100101.
[38] R. Y. Cohen, V. P. Kovacheva, A methodology for a scalable, collaborative, and resource-efficient platform to facilitate healthcare AI research, arXiv preprint arXiv:2112.06883 (2021).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Payne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Dvorak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Garewal</surname>
          </string-name>
          ,
          <article-title>Field defects in progression to gastrointestinal tract cancers</article-title>
          ,
          <source>Cancer letters 260</source>
          (
          <year>2008</year>
          )
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Bijlsma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sadanandam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Vermeulen</surname>
          </string-name>
          ,
          <article-title>Molecular subtypes in cancers of the gastrointestinal tract</article-title>
          ,
          <source>Nature reviews Gastroenterology &amp; hepatology 14</source>
          (
          <year>2017</year>
          )
          <fpage>333</fpage>
          -
          <lpage>342</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Islami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Kamangar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Aghcheli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Fahimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Semnani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Taghavi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Marjani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Merat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nasseri-Moghaddam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pourshams</surname>
          </string-name>
          , et al.,
          <article-title>Epidemiologic features of upper gastrointestinal tract cancers in northeastern Iran</article-title>
          ,
          <source>British Journal of Cancer 90</source>
          (
          <year>2004</year>
          )
          <fpage>1402</fpage>
          -
          <lpage>1406</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hyatt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cox</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Humphries</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Varlow</surname>
          </string-name>
          ,
          <article-title>How can we improve information for people affected by cancer? A national survey exploring gaps in current information provision, and challenges with accessing cancer information online</article-title>
          ,
          <source>Patient Education and Counseling</source>
          <volume>105</volume>
          (
          <year>2022</year>
          )
          <fpage>2763</fpage>
          -
          <lpage>2770</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rodríguez-Fernández</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gómez-Río</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Medina-Benítez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. V.-d.</given-names>
            <surname>Moral</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ramos-Font</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Ramia-Ángel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Llamas-Elvira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Ferrón-Orihuela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lardelli-Claret</surname>
          </string-name>
          ,
          <article-title>Application of modern imaging methods in diagnosis of gallbladder cancer</article-title>
          ,
          <source>Journal of Surgical Oncology 93</source>
          (
          <year>2006</year>
          )
          <fpage>650</fpage>
          -
          <lpage>664</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>