<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <article-id pub-id-type="doi">10.3389/fmed.2024.1392555</article-id>
      <title-group>
        <article-title>Gastrointestinal Cancer Related Question Answering Using BERT</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>S ArunaDevi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>J Abirami</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>B Bharathi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of CSE Sri Sivasubramaniya Nadar College of Engineering</institution>
          ,
          <addr-line>Tamil Nadu</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>3</volume>
      <issue>2017</issue>
      <fpage>412</fpage>
      <lpage>421</lpage>
      <abstract>
        <p>In today's world, Large Language Models (LLMs) have made significant advancements across various fields. While there are numerous models tailored for specific purposes, there remains a scarcity of models dedicated to the medical domain. This paper details our participation in the shared task “Conversational System for Differential Diagnosis of GI Cancer” at FIRE 2024, which addresses this gap. We employed a BERT model specifically trained for question answering. This task involved responding to inquiries posed by both doctors and patients.</p>
      </abstract>
      <kwd-group>
        <kwd>BERT</kwd>
        <kwd>Gastrointestinal cancer</kwd>
        <kwd>Large Language Model</kwd>
        <kwd>BLEU</kwd>
        <kwd>ROUGE</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>This enables BERT to comprehend a word’s context by taking into account both its left and right
surroundings. BERT can be used to solve a variety of NLP problems, including text categorization,
question answering, and more.</p>
      <p>To sum up, BERT models are highly effective in deciphering the meaning and context of language,
which makes them helpful for a variety of NLP applications requiring in-depth language comprehension.
This paper is sectioned as follows: Section 2 describes the previous work that has been done by
various authors in the field of medical question answering. Section 3 provides a detailed explanation of
the dataset. Section 4 provides an overview of the work done. Section 5 deals with the development
of the model for interactive question answering for different types of gastrointestinal cancer. Section 6
analyses the results obtained from our system. Section 7 provides the conclusion of this paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>Joseph et al. [8] developed a BERT-based system to automatically extract detailed tumor site and
histology information from oncological pathology reports. They first trained a base language model to
comprehend the technical language in pathology reports, which involved unsupervised learning on a
training corpus. Then they trained a question-and-answer (Q&amp;A) model that connects a Q&amp;A layer
to the base pathology language model to answer pathology questions. Their final system, called the
CancerBERT network (caBERTnet), consisted of three BERT-based models.</p>
      <p>Qingqing Zhou et al. [6] focused on the application of RAG in the field of clinical gastroenterology
in China, aiming to address the continuous increase in the infection rate of Helicobacter pylori and the
rising incidence of gastric cancer. Their fine-tuned model exhibited an 18% improvement in hit rate
compared to its base model, gte-base-zh. Moreover, it outperformed OpenAI’s Embedding model by 20%.
For fine-tuning the gte-base-zh model, they employed GPT-3.5 Turbo to aid in generating
question-answer pairs. Adi Lahat et al. [10] evaluated the performance of ChatGPT in answering
patients’ questions regarding gastrointestinal health; ChatGPT was able to provide accurate and clear
answers to patients’ questions in some cases, but not in others.
Jiajia Yuan et al. [11] found that prompt engineering affects large language models’ performance
in GI oncology. They designed the prompts as follows: initially, the models are given a more
sophisticated introduction prompt, intricately crafted with complex semantics. Then an advanced method
of in-context learning is introduced, encouraging the models to extract knowledge and patterns from
various contexts rather than individual sentences, fostering a more comprehensive understanding of
the text. Lastly, they implemented an iterative feedback loop through multi-round
question-and-answer sessions, reinforcing the model’s ability to comprehend, retain, and apply information over
successive interactions.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset Description</title>
      <p>In this shared task, we were not given an explicit dataset to train the model, and no specific
methodology was given for accessing public data sources; instead, we were asked to use an API or other
public data sources. Considering this, we collected our dataset from Wikipedia for each type of
gastrointestinal cancer, covering general information such as symptoms, causes, diagnosis and treatments.
For the genetic mutations and their effects in each of the cancers, we used both Wikipedia and various
other public data sources such as journals, articles and websites (data sources: Esophageal Cancer and
its genetic mutations; Pancreatic Cancer and its SMAD4, p53, ARID1A, GNAS, KRAS and MEN1 mutations;
Gallbladder Cancer and its CDKN2A mutation; Stomach Cancer; Liver Cancer; Anal Cancer; Colorectal
Cancer and its PIK3CA mutation; and Gastrointestinal Stromal Tumor).
In our dataset, we collected all of the above-mentioned information for eight gastrointestinal
cancers: Gastrointestinal Stromal Tumor (GIST), esophageal, pancreatic, gallbladder, stomach, liver,
anal and colorectal cancers. Each cancer’s data is stored as a paragraph, and the data is then labelled
appropriately to distinguish between the different factors of that cancer.</p>
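      <p>The paragraph-per-cancer storage with factor labels can be pictured as a nested mapping; the snippet below is an illustrative sketch only, with hypothetical field names and abbreviated text, not our actual data file.</p>

```python
# Illustrative sketch: each cancer maps its labelled factors to a free-text
# paragraph. Field names ("symptoms", "genetic_mutations", ...) are assumed.
dataset = {
    "gallbladder": {
        "symptoms": "Steady pain in the upper right abdomen, indigestion, ...",
        "diagnosis": "Transabdominal ultrasound, CT scan, endoscopic ultrasound, ...",
    },
    "pancreatic": {
        "genetic_mutations": "SMAD4, p53, ARID1A, GNAS, KRAS and MEN1 mutations ...",
    },
}

def get_context(cancer: str, factor: str) -> str:
    """Return the labelled paragraph for one cancer/factor pair ('' if absent)."""
    return dataset.get(cancer, {}).get(factor, "")
```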
      <sec id="sec-3-1">
        <title>A sample of the dataset:</title>
        <p>Gallbladder cancer: Symptoms: steady pain in the upper right abdomen, indigestion (dyspepsia), bilious
vomit, weakness, loss of appetite, weight loss, jaundice and vomiting due to obstruction; early symptoms
mimic gallbladder inflammation due to gallstones. Diagnosis: transabdominal ultrasound, CT scan,
endoscopic ultrasound, MRI, and MR cholangiopancreatography (MRCP) can be used for diagnosis.
The entire dataset that we used for this task is uploaded on our GitHub page.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Proposed Work</title>
      <p>The proposed system architecture, as illustrated in Figure 1, follows a structured methodology designed
for efficient question answering. The methodology is composed of the following key steps: 1. Selecting
the context according to the question given to the system, so that the model can extract the answer from
it. 2. Simplifying the input question; this transformation helps the model
better understand the essence of the question. 3. Dividing the selected context into multiple
chunks; this segmentation allows the system to handle large contexts more efficiently. 4. Parsing each chunk
of context individually to locate potential answers; this step allows the model to consider all
possible segments of the text that could contain the answer. 5. Calculating a score for each
parsed chunk. 6. Finally, comparing the scores of all the chunks and choosing the answer with the best
score.</p>
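      <p>The six steps above can be sketched end to end as follows. This is a minimal illustration, not our actual implementation: the question simplification is a naive stop-word filter, and the chunk-scoring function is a simple word-overlap stand-in for the BERT QA model’s span scores.</p>

```python
def simplify_question(question: str) -> list[str]:
    """Step 2 (stand-in): lowercase and drop filler words to keep the essence."""
    stop = {"what", "are", "is", "the", "a", "an", "of", "for", "can", "you", "on", "in", "this"}
    return [w.strip("?.,") for w in question.lower().split() if w not in stop]

def split_into_chunks(context: str, chunk_size: int = 50) -> list[list[str]]:
    """Step 3: divide the selected context into fixed-size word chunks."""
    words = context.split()
    return [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]

def score_chunk(chunk: list[str], keywords: list[str]) -> int:
    """Step 5 (stand-in): keyword-overlap count instead of BERT logits."""
    chunk_set = {w.lower().strip(".,") for w in chunk}
    return sum(1 for k in keywords if k in chunk_set)

def answer(question: str, context: str) -> str:
    """Steps 1-6: score every chunk and return the best-scoring one."""
    keywords = simplify_question(question)
    chunks = split_into_chunks(context)
    best = max(chunks, key=lambda c: score_chunk(c, keywords))
    return " ".join(best)
```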
    </sec>
    <sec id="sec-5">
      <title>5. Implementation</title>
      <p>LLaMA (Large Language Model Meta AI) models are large language models built to handle more
general-purpose language understanding and generation. These models are usually larger in scale and require
more memory and computational power: LLaMA 2 models range from 7 to 70 billion parameters, and
LLaMA 3 from 8 to 70 billion, whereas the BERT large model has 340 million parameters. Due to the large
computational requirements of LLaMA models, we were not able to deploy them on our systems.</p>
      <p>Hence, we used the method of Question Answering (QA), utilizing a BERT model
(bert-large-uncased-whole-word-masking-finetuned-squad) to find an answer inside a given text passage based
on the question asked by the doctor. Tokenization, input preprocessing and output extraction are the
main steps in the process. With a pre-trained tokenizer built for BERT, the question and text passage
are transformed into tokens. Tokenization divides the text into more manageable units called tokens,
which are then translated to integer IDs that the BERT model can comprehend. Usually, BERT models
can only handle 512 tokens, so the context is divided into three segments: (i) from causes to symptoms, (ii)
from symptoms to genetic mutations, and (iii) from genetic mutations to treatments.</p>
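      <p>A generic way to respect the 512-token limit (we used three topical segments instead) is to slice the context token IDs into overlapping windows, each leaving room for the question and special tokens. A minimal sketch, with the window arithmetic assumed rather than taken from our code:</p>

```python
def make_windows(context_ids, question_len, max_len=512, stride=128):
    """Split context token IDs into overlapping windows so that each
    question+window pair fits BERT's 512-token limit. Three positions
    are reserved for the [CLS]/[SEP]/[SEP] special tokens."""
    budget = max_len - question_len - 3   # tokens available for context
    windows = []
    start = 0
    while start < len(context_ids):
        windows.append(context_ids[start:start + budget])
        if start + budget >= len(context_ids):
            break                          # last window reached the end
        start += budget - stride           # overlap so answers aren't cut
    return windows
```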
      <p>After the above division into segments, the input IDs are produced as tensors and fed into the BERT
model together with the attention mask, which indicates which tokens are actual input and which are
padding. The model returns two sets of logits, the start scores and end scores, which represent the
likelihood that each token marks the beginning or end of the answer, respectively.</p>
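      <p>Selecting the answer span from these two sets of logits can be sketched as below; plain Python lists stand in for the model’s logit tensors, and the maximum answer length is an assumed hyperparameter.</p>

```python
def best_span(start_scores, end_scores, max_answer_len=30):
    """Pick the (start, end) token positions maximizing
    start_scores[s] + end_scores[e], subject to s <= e and a cap on length."""
    best = (0, 0)
    best_score = float("-inf")
    for s, s_score in enumerate(start_scores):
        for e in range(s, min(s + max_answer_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best, best_score
```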
      <p>The BERT model predicts the most likely span of tokens that answers the question by finding the
token positions with the highest start and end scores. Subword tokens between these positions are
merged correctly by stitching them back into a human-readable format (by removing continuation
indicators). The model produces an invalid answer (e.g., predicting [SEP]) if the start or end scores are
too low, or it returns a fallback message indicating that no answer could be found.</p>
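      <p>The stitching step can be sketched as follows: in BERT’s WordPiece vocabulary a leading “##” marks a continuation of the previous token, so merging is a matter of joining those pieces and dropping special tokens.</p>

```python
def stitch_tokens(tokens):
    """Merge WordPiece subtokens back into readable text: a leading '##'
    continues the previous word; special tokens are dropped."""
    words = []
    for tok in tokens:
        if tok in ("[CLS]", "[SEP]", "[PAD]"):
            continue
        if tok.startswith("##") and words:
            words[-1] += tok[2:]      # glue continuation onto previous word
        else:
            words.append(tok)
    return " ".join(words)
```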
    </sec>
    <sec id="sec-6">
      <title>6. Results</title>
      <p>Since we were not provided with an explicit training dataset, we collected data from various sites on
our own and used it as the context to be parsed; we did not use any dataset to train our model. Hence, the
results obtained may be low in accuracy. We tested the outputs our model produced using
the test data that FIRE 2024 provided, and the following outcomes were attained.</p>
      <p>From Table 1, we can infer that BLEU and ROUGE-1 scores are low. BLEU (Bilingual Evaluation
Understudy) score measures the quality of machine-generated answers by comparing them to a set of
reference answers. It computes the overlap of n-grams (sequences of n words) between the predicted
and reference answers, with higher n-gram precision indicating better alignment. The brevity penalty
(BP) is a component of the BLEU score designed to penalize machine-generated translations or answers
that are shorter than the reference text.</p>
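      <p>The brevity penalty has a simple closed form: 1 when the candidate is at least as long as the reference, and exp(1 − r/c) otherwise, where r and c are the reference and candidate lengths. A minimal sketch:</p>

```python
import math

def brevity_penalty(candidate_len: int, reference_len: int) -> float:
    """BLEU brevity penalty: 1 if the candidate is at least as long as the
    reference, else exp(1 - r/c), which shrinks toward 0 as c shrinks."""
    if candidate_len >= reference_len:
        return 1.0
    if candidate_len == 0:
        return 0.0
    return math.exp(1 - reference_len / candidate_len)
```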
      <p>ROUGE-1 score measures the overlap of unigrams between the generated answer and the reference
answer. It provides a measure of how well the predicted answer covers the important words found in
the reference. If the generated answer is shorter than the reference, recall typically drops,
potentially lowering the overall ROUGE score. ROUGE-2 score measures the overlap of bigrams (pairs of
consecutive words) between the generated answer and the reference answer. Since bigrams focus on
word pairs, this metric evaluates both content and some level of fluency or coherence in the generated
text. Our dataset contains only limited information due to the token limit in BERT. This makes the
answers generated by our system shorter than the original answers, which is why our BLEU, ROUGE-1
and ROUGE-2 scores are low.
Question: This patient likely has pancreatic cancer. Can you provide information on the role of BRCA
mutations in pancreatic cancer and potential implications for treatment?
Answer: The mutations BRCA1 and BRCA2 increase a person’s lifetime risk of developing pancreatic
cancer. Their normal function is to repair damage to DNA, but when BRCA1 or BRCA2 is mutated and
doesn’t work correctly, the accumulation of unrepaired DNA damage can ultimately lead to unregulated
cell growth, or cancer.</p>
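      <p>The ROUGE-1 computation described above reduces to a clipped unigram-overlap count; a minimal sketch of precision, recall and F1 (whitespace tokenization assumed, unlike full ROUGE implementations which also stem and normalize):</p>

```python
from collections import Counter

def rouge1(candidate: str, reference: str):
    """ROUGE-1: clipped unigram overlap between candidate and reference,
    reported as (precision, recall, f1)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())   # per-word minimum of the two counts
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

<p>A short generated answer drives recall down, which matches the behaviour described above: a candidate covering only half the reference words caps recall at 0.5 no matter how precise it is.</p>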
      <p>Though the model can extract answers from the given text, when the chunk size is too large for the
model to handle, it is not able to extract the answer from the context. This reduces the reliability of the
system.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions</title>
      <p>The applications of large language models (LLMs) like BERT are vast, yet their use in the medical field
remains in its developmental stages. To address this gap, we have used a BERT model specifically
tailored to answer questions related to gastrointestinal cancer. Our approach involved preparing the
dataset for various gastrointestinal cancers and selecting the relevant context based on the specific
cancer mentioned in the query. This was followed by tokenizing the input and extracting the appropriate
answers.</p>
      <p>Looking ahead, this model can be trained on larger datasets to enhance its performance further.
Additionally, it can be adapted to extract data directly from medical resources, thereby improving its
accuracy and reliability.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used ChatGPT-4o to check grammar and spelling.
After using this tool, the author(s) reviewed and edited the content as needed and take full
responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G. S.</given-names>
            <surname>Dunham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Pacak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. W.</given-names>
            <surname>Pratt</surname>
          </string-name>
          ,
          <article-title>Automatic indexing of pathology data</article-title>
          ,
          <source>Journal of the American Society for Information Science</source>
          <volume>29</volume>
          (
          <year>1978</year>
          )
          <fpage>81</fpage>
          -
          <lpage>90</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Datta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. V.</given-names>
            <surname>Bernstam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <article-title>A frame semantic overview of nlp-based information extraction for cancer-related ehr notes</article-title>
          ,
          <source>Journal of biomedical informatics 100</source>
          (
          <year>2019</year>
          )
          <fpage>103301</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Burger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Abu-Hanna</surname>
          </string-name>
          , N. de Keizer, R. Cornet,
          <article-title>Natural language processing in pathology: a scoping review</article-title>
          ,
          <source>Journal of clinical pathology 69</source>
          (
          <year>2016</year>
          )
          <fpage>949</fpage>
          -
          <lpage>955</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Seifert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Rubin</surname>
          </string-name>
          , N. de Wit,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lionis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hungin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Palka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mendive</surname>
          </string-name>
          ,
          <article-title>The management of common gastrointestinal disorders in general practice: a survey by the european society for primary care gastroenterology (espcg) in six european countries</article-title>
          ,
          <source>Digestive and Liver Disease</source>
          <volume>40</volume>
          (
          <year>2008</year>
          )
          <fpage>659</fpage>
          -
          <lpage>666</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>