<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Conversational System for Differential Diagnosis of GI Cancer: Track Overview</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Manjira Sinha</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rajat Pal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tirthankar Dasgupta</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Tata Consultancy Services, Research and Innovation</institution>
          ,
          <addr-line>Kolkata</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Gastrointestinal (GI) tract cancers, encompassing malignancies in the esophagus, stomach, liver, pancreas, and colon, represent a significant burden on global health, contributing to high mortality rates worldwide. The diagnosis of GI cancers is particularly challenging due to the overlapping symptoms shared by various gastrointestinal conditions and the complex etiologies involved. As a result, accurately differentiating between these cancers remains a formidable task for clinicians, often leading to delays in diagnosis and sub-optimal management. This diagnostic uncertainty has profound consequences. Artificial Intelligence (AI)-based systems can assist physicians in reaching faster diagnoses and more effective outcomes.</p>
      </abstract>
      <kwd-group>
        <kwd>Health Analytics</kwd>
        <kwd>GI Tract Cancer</kwd>
        <kwd>Diagnostic models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Gastrointestinal (GI) tract cancers, encompassing malignancies in the esophagus, stomach, liver,
pancreas, and colon, represent a significant burden on global health, contributing to high mortality rates
worldwide. The diagnosis of GI cancers is particularly challenging due to the overlapping symptoms
shared by various gastrointestinal conditions and the complex etiologies involved. As a result, accurately
differentiating between these cancers remains a formidable task for clinicians, often leading to delays
in diagnosis and suboptimal management. This diagnostic uncertainty has profound consequences,
contributing to the alarming statistic that medical errors are the third leading cause of death in the
United States, with misdiagnoses playing a central role in this issue.</p>
      <p>Additionally, the time constraints placed on healthcare providers—especially given their substantial
time spent on administrative tasks—further exacerbate the problem. Physicians often struggle to balance
these duties with critical patient care, leaving less time for nuanced diagnosis. This can result in delayed
treatment, which negatively impacts prognosis, especially in cancers where early intervention is key to
survival. Studies suggest that administrative burdens contribute significantly to clinician burnout and
inefficiency, with a consequent reduction in the quality of patient care.</p>
      <p>The need for more effective diagnostic support has led to increasing interest in Artificial Intelligence
(AI)-driven diagnostic assistants. Recent surveys indicate that a majority of physicians believe AI could
significantly enhance their diagnostic accuracy and improve treatment decisions, underscoring the
demand for AI-based solutions in clinical practice. AI technologies, such as machine learning and deep
learning, have the potential to augment human expertise by analyzing large datasets, including medical
imaging and histopathology, to provide faster, more accurate assessments. These innovations offer a
promising avenue for addressing the diagnostic challenges in GI tract cancers, potentially reducing
misdiagnoses, enhancing early detection, and improving patient outcomes.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Task Description</title>
      <p>The task was to develop a question-answering-based conversational system that can help in the early
detection of GI cancer, given information on a patient's general symptoms, diagnosis, and medical history.</p>
      <p>Participants were provided with a sample set of 30 questions and corresponding model
answers to help them develop their systems' capabilities. They were encouraged
to leverage other open-source biomedical datasets and knowledge bases, and were expected to extract
relevant biomedical entities and their relationships and to standardize them to their canonical
forms.</p>
      <sec id="sec-2-1">
        <title>2.1. Evaluation Metrics</title>
        <p>For evaluation, 50 test questions were shared with the participants, who submitted the
corresponding answers generated by their conversational systems.</p>
        <p>The answers were evaluated against our ground-truth answers on the following criteria:
• Concepts, entities, and relationships correctly identified
• Linguistic correctness and meaningfulness of the answers
• Consistency of the answers when similar questions were asked with different paraphrases
• Confidence in the answers when they were doubted</p>
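        <p>As an illustration only, two of these criteria can be approximated automatically. The sketch below is not the scoring procedure used by the organizers, which is not specified here. It assumes that entities have already been extracted as plain strings, scores them with set-based precision, recall, and F1, and approximates paraphrase consistency as the mean pairwise Jaccard similarity of the answer token sets. The function names entity_prf, jaccard, and paraphrase_consistency are hypothetical.</p>
        <preformat>
```python
from itertools import combinations


def entity_prf(predicted, gold):
    """Set-based precision, recall, and F1 over normalized entity strings."""
    pred = {e.strip().lower() for e in predicted}
    ref = {e.strip().lower() for e in gold}
    overlap = len(pred.intersection(ref))
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def jaccard(a, b):
    """Jaccard similarity between the lowercase token sets of two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta.union(tb)
    return len(ta.intersection(tb)) / len(union) if union else 1.0


def paraphrase_consistency(answers):
    """Mean pairwise Jaccard similarity of answers to paraphrased questions."""
    pairs = list(combinations(answers, 2))
    if not pairs:
        return 1.0  # a single answer is trivially consistent with itself
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical system output and ground truth for one test question.
    print(entity_prf(["Dysphagia", "weight loss", "anemia"],
                     ["dysphagia", "weight loss"]))
    # Hypothetical answers to two paraphrases of the same question.
    print(paraphrase_consistency(["endoscopy with biopsy",
                                  "biopsy during endoscopy"]))
```
        </preformat>
        <p>Token-set overlap is a crude proxy: two answers can agree clinically while sharing few words, so a semantic similarity measure or human judgment would be needed for a faithful assessment.</p>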
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Participants</title>
        <p>In total, five teams from various institutions submitted results for the task.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Observation</title>
      <p>It was interesting to observe that different teams approached the problem very differently. While some
teams opted for high-performing LLMs such as GPT-3.5, others preferred transformer models, both general-purpose
and domain-specific, and also leveraged retrieval-augmented generation. Apart from the model diversity,
teams also augmented their datasets from sources such as Wikipedia and PubMed.</p>
      <p>The details of each team’s implementation and relative performance are discussed in the individual
team working notes. We present here the overall findings:
1. A GPT-3.5 Turbo-based conversational system demonstrates potential for generating accurate and
relevant information regarding GI cancers. The system also scored high on entity accuracy and on
linguistic correctness and meaningfulness.
2. A combined system using query categorization, RoBERTa-based retrieval from a vector database,
keyword boosting, and BioGPT-based response generation effectively interprets complex and
unstructured user queries related to GI cancers.
3. A BERT-based question-answering system proposed by one of the teams demonstrates the
potential of using alternatives to LLMs to provide accurate and relevant information about GI
cancers. However, it scores comparatively poorly on BLEU and ROUGE.
4. A system leveraging Electronic Health Records (EHR) data provides significant performance
improvement over the baseline.</p>
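      <p>To make the retrieval step in finding 2 concrete, the following is a minimal sketch of similarity-based retrieval with keyword boosting. It is not the team's implementation: bag-of-words counts stand in for RoBERTa embeddings, an in-memory list stands in for the vector database, the passages, the keyword set, and the boost value are all hypothetical, and the BioGPT generation step is omitted.</p>
      <preformat>
```python
import math
from collections import Counter

# Hypothetical passages standing in for a vector database of GI cancer text.
PASSAGES = [
    "Dysphagia and weight loss are common symptoms of esophageal cancer.",
    "Jaundice and abdominal pain can indicate pancreatic cancer.",
    "Rectal bleeding and changes in bowel habits may suggest colon cancer.",
]

# Hypothetical domain keywords whose exact match earns a score boost.
KEYWORDS = {"dysphagia", "jaundice", "bleeding"}

PUNCTUATION = ".,;:?!"


def tokenize(text):
    """Lowercase the text and strip surrounding punctuation from each token."""
    tokens = (t.strip(PUNCTUATION).lower() for t in text.split())
    return [t for t in tokens if t]


def embed(text):
    """Toy bag-of-words vector; a real system would use a neural encoder."""
    return Counter(tokenize(text))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(query, passages=PASSAGES, boost=0.2, top_k=1):
    """Rank passages by cosine similarity plus a boost per shared keyword."""
    q_vec = embed(query)
    q_keys = KEYWORDS.intersection(q_vec)
    scored = []
    for passage in passages:
        p_vec = embed(passage)
        score = cosine(q_vec, p_vec) + boost * len(q_keys.intersection(p_vec))
        scored.append((score, passage))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [passage for _, passage in scored[:top_k]]


if __name__ == "__main__":
    print(retrieve("I have jaundice and pain in my abdomen"))
```
      </preformat>
      <p>In a full pipeline of this kind, the retrieved passages would be passed to a generative model as context for the answer. The additive boost is one simple design choice among several; it favors passages that mention a clinically salient term exactly, even when the overall similarity is low.</p>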
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>The shared task on building conversational agents for aiding the diagnosis of GI cancer has yielded
valuable insights into the potential and challenges of leveraging large language models (LLMs) and
transformer-based architectures for healthcare applications. Teams developed a range of solutions,
integrating both general-purpose models and domain-specific models tailored to the nuances of
gastrointestinal oncology. One key finding is that domain-specific models, when properly fine-tuned,
demonstrated enhanced accuracy in understanding and responding to medical queries, especially those
related to complex diagnostic processes and symptom interpretation. However, the general-purpose
models, while versatile, often required additional contextualization and retraining to perform optimally
in this specialized domain. Another important conclusion is the critical role of data quality and the
ethical considerations surrounding the use of sensitive medical data.</p>
      <p>The impact of this task is profound, as it highlights the ability of conversational agents to assist
healthcare professionals by streamlining the diagnostic process, improving patient outcomes, and
offering scalable solutions to support clinicians. This collaborative effort marks a significant step toward
more intelligent, efficient healthcare solutions in oncology.</p>
    </sec>
    <sec id="sec-5">
      <title>Declaration on Generative AI</title>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>