Conversational System for Differential Diagnosis of GI Cancer: Track Overview

Manjira Sinha

Rajat Pal

Tirthankar Dasgupta

Tata Consultancy Services, Research and Innovation, Kolkata, India

Gastrointestinal (GI) tract cancers, encompassing malignancies of the esophagus, stomach, liver, pancreas, and colon, represent a significant burden on global health and contribute to high mortality rates worldwide. Diagnosing GI cancers is particularly challenging because many gastrointestinal conditions share overlapping symptoms and have complex etiologies. As a result, accurately differentiating between these cancers remains a formidable task for clinicians, often leading to delays in diagnosis and suboptimal management. This diagnostic uncertainty has profound consequences. Artificial Intelligence-based systems can assist physicians in reaching a faster diagnosis and a more effective outcome.

Keywords: Health Analytics, GI Tract Cancer, Diagnostics models

1. Introduction

Gastrointestinal (GI) tract cancers, encompassing malignancies of the esophagus, stomach, liver, pancreas, and colon, represent a significant burden on global health and contribute to high mortality rates worldwide. Diagnosing GI cancers is particularly challenging because many gastrointestinal conditions share overlapping symptoms and have complex etiologies. As a result, accurately differentiating between these cancers remains a formidable task for clinicians, often leading to delays in diagnosis and suboptimal management. This diagnostic uncertainty has profound consequences, contributing to the alarming statistic that medical errors are the third leading cause of death in the United States, with misdiagnoses playing a central role in this issue.

Additionally, the time constraints placed on healthcare providers, especially given the substantial time they spend on administrative tasks, further exacerbate the problem. Physicians often struggle to balance these duties with critical patient care, leaving less time for nuanced diagnosis. This can result in delayed treatment, which negatively impacts prognosis, especially in cancers where early intervention is key to survival. Studies suggest that administrative burdens contribute significantly to clinician burnout and inefficiency, with a consequent reduction in the quality of patient care.

The need for more effective diagnostic support has led to increasing interest in Artificial Intelligence (AI)-driven diagnostic assistants. Recent surveys indicate that a majority of physicians believe AI could significantly enhance their diagnostic accuracy and improve treatment decisions, underscoring the demand for AI-based solutions in clinical practice. AI technologies, such as machine learning and deep learning, have the potential to augment human expertise by analyzing large datasets, including medical imaging and histopathology, to provide faster, more accurate assessments. These innovations offer a promising avenue for addressing the diagnostic challenges in GI tract cancers, potentially reducing misdiagnoses, enhancing early detection, and improving patient outcomes.

2. Task Description

The task was to develop a question-answering conversational system that can help in the early detection of GI cancer, given information on the general symptoms, diagnosis, and medical history of a patient.

Participants were provided with a sample set of 30 questions and corresponding model answers to help them develop the capabilities of their systems. They were encouraged to leverage other open-source biomedical datasets and knowledge bases. They were also expected to extract relevant biomedical entities and their relationships and to standardize them to their canonical forms.
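As an illustration of what standardizing entities to canonical forms can look like, the following is a minimal sketch and not the method of any participating team. The synonym table `CANONICAL_FORMS` and the function `extract_canonical_entities` are hypothetical; a real system would more plausibly draw its mappings from an established biomedical vocabulary.

```python
import re

# Hypothetical synonym table (an assumption for illustration only):
# surface form -> canonical form.
CANONICAL_FORMS = {
    "stomach cancer": "gastric cancer",
    "gastric carcinoma": "gastric cancer",
    "difficulty swallowing": "dysphagia",
    "trouble swallowing": "dysphagia",
    "yellowing of the skin": "jaundice",
}


def extract_canonical_entities(text):
    """Return the sorted canonical forms of known entities mentioned in text."""
    lowered = text.lower()
    found = set()
    for surface, canonical in CANONICAL_FORMS.items():
        # Match whole phrases only, so a surface form is not found inside a longer word.
        if re.search(r"\b" + re.escape(surface) + r"\b", lowered):
            found.add(canonical)
    return sorted(found)


if __name__ == "__main__":
    note = "Patient reports difficulty swallowing; family history of stomach cancer."
    print(extract_canonical_entities(note))
```

Dictionary lookup of this kind only covers surface forms listed in the table; handling unseen variants would require a learned entity linker or fuzzy matching, which this sketch does not attempt.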

2.1. Evaluation Metrics

For evaluation, 50 test questions were shared with the participants, who submitted the corresponding answers generated by their conversational systems.

The answers were evaluated against our ground-truth answers on the following criteria:
• Concepts, entities, and relationships correctly identified
• Linguistic correctness and meaningfulness of the answers
• Consistency of the answers when similar questions are asked with different paraphrases
• Confidence in the answers when they are doubted
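The consistency criterion can be made concrete with a simple automatic proxy. The sketch below is an assumption for illustration, not the scoring procedure used in this track: it averages the token-level Jaccard overlap between every pair of answers that a system gives to paraphrases of the same question. The function names are hypothetical.

```python
from itertools import combinations


def token_set(answer):
    """Lowercase an answer and split it on whitespace into a set of tokens."""
    return set(answer.lower().split())


def jaccard(a, b):
    """Jaccard overlap between the token sets of two answers."""
    ta, tb = token_set(a), token_set(b)
    if not ta and not tb:
        return 1.0  # two empty answers are treated as identical
    return len(ta & tb) / len(ta | tb)


def paraphrase_consistency(answers):
    """Mean pairwise Jaccard overlap over answers to paraphrased questions."""
    pairs = list(combinations(answers, 2))
    if not pairs:
        return 1.0  # a single answer is trivially consistent with itself
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A score near 1 means the system gives nearly the same wording for every paraphrase, and a score near 0 means the answers share almost no vocabulary. Lexical overlap is a coarse signal, since two answers can agree in meaning while using different words, so a score of this kind would complement human judgment and not replace it.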

2.2. Participants

A total of five teams from various institutions submitted their results for the task.

3. Observation

It was interesting to observe that different teams approached the problem very differently. While some teams opted for high-performing LLMs such as GPT-3.5, others preferred transformer models, both general and domain-specific, and also leveraged retrieval-augmented generation. Beyond this model diversity, teams also augmented their datasets from sources such as Wikipedia and PubMed.

The details of each team’s implementation and relative performance are discussed in the individual team working notes. We present here the overall findings:
1. A GPT-3.5 Turbo based conversational system demonstrates potential for generating accurate and relevant information regarding GI cancers. The system also scored high in entity accuracy and in linguistic correctness and meaningfulness.
2. A combined system using query categorization, RoBERTa-based retrieval from a vector database, keyword boosting, and BioGPT-based response generation effectively interprets complex and unstructured user queries related to GI cancers.
3. A BERT-based question-answering system proposed by one of the teams demonstrates the potential of using alternatives to LLMs to provide accurate and relevant information about GI cancers. However, it performs less well on BLEU and ROUGE scores.
4. A system leveraging Electronic Health Records (EHR) data provides a significant performance improvement over the baseline.
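To illustrate how retrieval with keyword boosting, as mentioned in the second finding, can work in principle, the following is a minimal sketch under stated assumptions. It is not the team's implementation: a bag-of-words vector stands in for the RoBERTa embeddings, an in-memory list stands in for the vector database, and the `boost` weight and all function names are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in embedding: bag-of-words counts (a real system would use a neural encoder)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(query, passages, keywords, boost=0.2, top_k=1):
    """Rank passages by cosine similarity plus a bonus per shared domain keyword."""
    q_vec = embed(query)
    q_tokens = set(q_vec)
    scored = []
    for passage in passages:
        p_vec = embed(passage)
        score = cosine(q_vec, p_vec)
        # Keyword boosting: add a fixed bonus for every domain keyword
        # that appears in both the query and the passage.
        shared = q_tokens & set(p_vec) & set(keywords)
        score += boost * len(shared)
        scored.append((score, passage))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [passage for _, passage in scored[:top_k]]


if __name__ == "__main__":
    passages = [
        "dysphagia is a common symptom of esophageal cancer",
        "the weather is a common topic of conversation",
    ]
    print(retrieve("is dysphagia a symptom of cancer", passages, {"dysphagia", "cancer"}))
```

In a full pipeline of the kind described, the retrieved passages would then be passed to a generation model to compose the final answer; that step is omitted here.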

4. Conclusion

The shared task on building conversational agents for aiding the diagnosis of GI cancer has yielded valuable insights into the potential and challenges of leveraging large language models (LLMs) and transformer-based architectures for healthcare applications. Teams developed a range of solutions, integrating both general-purpose models and domain-specific models tailored to the nuances of gastrointestinal oncology. One key finding is that domain-specific models, when properly fine-tuned, demonstrated enhanced accuracy in understanding and responding to medical queries, especially those related to complex diagnostic processes and symptom interpretation. However, the general-purpose models, while versatile, often required additional contextualization and retraining to perform optimally in this specialized domain. Another important conclusion is the critical role of data quality and the ethical considerations surrounding the use of sensitive medical data.

The impact of this task is profound, as it highlights the ability of conversational agents to assist healthcare professionals by streamlining the diagnostic process, improving patient outcomes, and offering scalable solutions to support clinicians. This collaborative effort marks a significant step toward more intelligent, efficient healthcare solutions in oncology.

Declaration on Generative AI