<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>MUCS@ - Question Answering in Hindi for Tourism: Evaluation of Transformer-Based Approaches on VATIKA</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rachana Nagaraju</string-name>
          <email>rachananagaraju20@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hosahalli Lakshmaiah Shashirekha</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Mangalore University</institution>
          ,
          <addr-line>Mangalore, Karnataka</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>Question Answering (QA) systems play a vital role in the field of Natural Language Processing (NLP), as they are designed to automatically generate precise answers to user queries expressed in natural language. In the tourism domain, QA systems are especially significant, as they assist travelers by providing reliable and context-aware multilingual information, thereby enhancing the overall visitor experience. With the growing demand for intelligent information retrieval, such systems contribute directly to promoting cultural heritage, supporting sustainable tourism, and improving the accessibility of knowledge for domestic and international tourists. In view of these objectives, the VATIKA: Varanasi Tourism in Question Answer System shared task emphasizes domain-specific QA for tourism in Varanasi (Kashi), one of the world's oldest living cities and a prominent cultural-spiritual hub. The dataset spans ten tourism-relevant domains, such as Ganga Aarti, Cruise, Temples, Ashrams, Food Courts, and Museums, in Hindi, and the system must accurately answer factual, navigational, and experiential queries. This task is crucial for enabling a smoother, enriched, and hassle-free tourism experience. In this paper, we, team MUCS, describe a transformer-based QA pipeline that fine-tunes MuRIL, a pre-trained multilingual model, for extractive QA. We explored three fine-tuning strategies: the Hugging Face Trainer, a Custom AdamW Trainer, and a Simplified Trainer variant. On the Test-A set, the Hugging Face Trainer achieved an F1 score of 0.4972, BLEU score of 0.3529, and ROUGE-L score of 0.5239; the Custom AdamW Trainer obtained an F1 score of 0.5003, BLEU score of 0.3454, and ROUGE-L score of 0.5300; while the Simplified Trainer produced an F1 score of 0.4510, BLEU score of 0.3175, and ROUGE-L score of 0.5095.
On the more challenging Test-B set, the Hugging Face Trainer delivered the best overall results with an F1 score of 0.3351, BLEU score of 0.2214, and ROUGE-L score of 0.3621, compared to the Custom AdamW Trainer's F1, BLEU, and ROUGE-L scores of 0.0416, 0.2810, and 0.2024, and the Simplified Trainer's 0.0582, 0.1956, and 0.2165, respectively. These results highlight the effectiveness of the Hugging Face Trainer fine-tuning strategy in capturing contextual semantics and maintaining robustness across diverse tourism-related queries in Hindi.</p>
      </abstract>
      <kwd-group>
        <kwd>Question Answer</kwd>
        <kwd>Tourism</kwd>
        <kwd>Hindi</kwd>
        <kwd>Transformer Models</kwd>
        <kwd>Information Retrieval</kwd>
        <kwd>Sustainable Tourism</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Tourism plays a vital role in economic development worldwide, contributing to income generation,
employment opportunities, and the preservation of cultural heritage. In India, tourism not only boosts
regional pride and cultural exchange but also strengthens sustainable development by encouraging
infrastructure growth and global awareness. Among Indian cities, Varanasi (also known as Kashi
or Banaras) holds a unique position as one of the world’s oldest living cities and as a spiritual and
cultural hub. Millions of domestic and international tourists visit Varanasi every year, seeking spiritual
enrichment, cultural experiences, and historical exploration. Hence, providing tourists with timely,
accurate, and multilingual information has become increasingly important for improving their overall
travel experience.</p>
      <p>
        QA systems have emerged as a core application in the field of NLP. They are designed to automatically
return precise answers to natural language queries posed by users, leveraging structured knowledge
bases or unstructured documents. Unlike traditional search engines, which return a ranked list of
documents, QA systems directly address the user’s information need, thereby reducing cognitive
load and making information retrieval more user-friendly [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. The significance of QA becomes
particularly evident in specialized domains such as healthcare [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], education [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and tourism [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], where
domain-specific queries require reliable and contextualized answers.
      </p>
      <p>Most QA systems focus on high-resource languages such as English, leaving low-resource languages
largely unexplored in this direction. Many Indian languages are low-resourced due to limited annotated
datasets, diverse scripts, and the lack of robust language models tailored to regional nuances. In addition,
many Indian languages have multiple dialects and informal usage patterns, which complicate
accurate understanding and response generation. Further, code-mixed queries and script variations add
their share of challenges. Overall, the ecosystem is still evolving, with promising research but
significant gaps in resources and infrastructure.</p>
      <p>The VATIKA: Varanasi Tourism in Question Answer System shared task at the Forum for Information
Retrieval Evaluation (FIRE) 2025 invites researchers to develop models that address the challenges of
QA in Hindi, a low-resource Indian language, in the tourism sector. The dataset is provided in
structured JSON format, organized by domain → context → QAs. Each QA pair includes a unique ID,
a question in Hindi, and its corresponding answer, as shown in Figures 1 and 2. The VATIKA dataset is curated
to cover ten important tourism-relevant domains of Varanasi, including Ganga Aarti, Cruise, Temples,
Ashrams, Kunds, Museums, Food Courts, Travel Agencies, Public Toilets, and General Information.
It consists of Hindi-language contexts written in Devanagari script, paired with realistic QA pairs
simulating actual tourist queries. This makes VATIKA one of the first Hindi QA datasets targeting a
domain-specific real-world application in tourism. The VATIKA shared task highlights the importance of
specialized QA systems in tourism, which not only support individual travelers but also contribute
significantly to cultural promotion and sustainable tourism growth.</p>
      <p>In this paper, we, team MUCS, describe the transformer-based QA models submitted to the VATIKA
shared task to answer tourism-related queries in Hindi. We experimented with fine-tuning MuRIL, a
pretrained multilingual model, using the Hugging Face Trainer, a Custom AdamW Trainer, and a Simplified
Trainer variant, for extractive QA. The models are evaluated using standard QA metrics: F1
score, BLEU, and ROUGE-L. Our model fine-tuned with the Hugging Face Trainer emerged as the
best-performing system, achieving an F1 score of 0.3351, ROUGE-L score of 0.3621, and BLEU score of
0.2214, demonstrating the effectiveness of our pipeline in capturing semantic alignment between
questions and contexts in Hindi. Our code is available on GitHub (https://github.com/rachanabn20/VATIKA-Varanasi-Tourism-in-Question-Answer-System) to reproduce the results and explore
further. The importance of this task lies not only in advancing research on Hindi QA systems but also in
its direct applicability to real-world tourism. By enabling intelligent, accurate, and accessible information
delivery to tourists, such systems can enhance cultural promotion, improve visitor satisfaction, and
foster sustainable tourism growth in cities like Varanasi. This paper presents our approach to the shared
task, detailing our methodology, experimental setup, and results, followed by an analysis of system
performance, and discussion on future directions.</p>
      <p>The remainder of this paper details the related work (Section 2), methodology (Section 3),
experiments, results, and implications of our approach (Section 4), and the declaration on generative AI (Section
5), followed by the conclusion and future work (Section 6).</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>
        Research in multilingual and Indic-language QA has seen rapid growth in recent years, particularly
with the advancement of transformer-based architectures. Singh et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] explored multilingual QA
approaches for Indic languages, benchmarking transformer models such as mBERT, XLM-R, IndicBERT,
and MuRIL. Their experiments reported F1 scores ranging between 58–72% across Indic languages, with
IndicBERT showing superior results in low-resource contexts. Clark et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] introduced TyDi QA, a
multilingual benchmark that has been widely used for training and evaluating QA models in diverse
languages, including Hindi. They reported an average F1 score of 65% for high-resource languages and
45% for low-resource ones, demonstrating the challenges in handling morphologically rich languages
such as Hindi. Artetxe et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] introduced the XQuAD benchmark, which consists of 1,190 QA pairs
translated into ten languages, to evaluate zero-shot cross-lingual transfer. Their experiments showed
that multilingual models such as mBERT, when fine-tuned on English, achieved nontrivial transfer
performance across target languages. Subsequent studies reported that XLM-R outperformed mBERT
by 7–10 points in both Exact Match score and F1 score across several languages, reaching around 70%
F1 score for Hindi and Spanish, while performance dropped substantially for low-resource languages.
      </p>
      <p>
        Li et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] focused on domain-specific QA, proposing a tourism knowledge graph-based system.
Their evaluations on tourism datasets achieved an F1 score of 81% and BLEU-4 of 26, demonstrating
the efficiency of integrating structured knowledge with neural architectures for domain QA tasks.
Contractor et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] investigated QA for tourism-related queries, emphasizing user-generated reviews
and geo-entity retrieval. Their neural QA system reported precision of 73% and recall of 68%, with
an overall F1 score of 70%, highlighting the utility of combining entity retrieval with neural encoders.
Nguyen et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] proposed a tourism-oriented QA framework for conversational systems. Using
transformer-based architectures, they demonstrated BLEU-4 scores of 28.5, ROUGE-L of 61.2, and F1 of
76% on a Vietnamese tourism corpus, showing strong applicability of QA models in tourism dialogue
systems. Lee et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] advanced QA systems by incorporating verification mechanisms to ensure
answer reliability. Their evaluations across multilingual QA tasks showed F1 improvements of 4–6
points over standard transformers, reporting final F1 scores between 68–74% depending on the dataset,
and emphasized trustworthiness in QA responses.
      </p>
      <p>In summary, recent work shows significant progress in QA systems across different languages and in
tourism-specific applications. However, only a few studies have explored Hindi-centric QA systems for tourism,
underscoring the importance of contributions like VATIKA in bridging this research gap.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>The proposed methodology fine-tunes MuRIL, a multilingual pretrained model, for QA on the
VATIKA dataset. To provide clarity on the fine-tuning process, we describe the end-to-end pipeline in
detail. Three different training strategies are implemented to explore the impact of fine-tuning choices.
The end-to-end pipeline for fine-tuning the MuRIL model on VATIKA is illustrated in Figure 3.</p>
      <sec id="sec-3-1">
        <title>3.1. Dataset Preparation</title>
        <p>The VATIKA dataset contains multiple domains, each with contexts paired with corresponding QA pairs. The
JSON files are parsed so that each instance aligns a context with its corresponding question and
gold-standard answer. These instances are then converted into Hugging Face Dataset objects for training,
validation, and evaluation.</p>
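As a concrete illustration, the domain → context → QAs structure can be flattened into SQuAD-style records before wrapping them in a Dataset object. This is a minimal sketch; the field names ("context", "qas", "question", "answer") and the file name are assumptions about the JSON layout, not the official VATIKA schema:

```python
import json

def flatten_vatika(raw):
    """Flatten a domain -> context -> QAs structure into SQuAD-style records.

    The field names used here are assumptions about the VATIKA JSON layout,
    not the official schema.
    """
    records = []
    for domain, entries in raw.items():
        for entry in entries:
            context = entry["context"]
            for qa in entry["qas"]:
                # Extractive QA needs the character offset of the answer
                # inside the context; fall back to find() if it is absent.
                start = qa.get("answer_start", context.find(qa["answer"]))
                records.append({
                    "id": qa["id"],
                    "question": qa["question"],
                    "context": context,
                    "answers": {"text": [qa["answer"]], "answer_start": [start]},
                })
    return records

# Hypothetical usage (the file name is an assumption):
# with open("vatika_train.json", encoding="utf-8") as f:
#     records = flatten_vatika(json.load(f))
# from datasets import Dataset
# train_ds = Dataset.from_list(records)
```

`Dataset.from_list(records)` from the datasets library then yields the object consumed by the tokenization step.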
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Pre-processing</title>
        <p>The MuRIL tokenizer (https://huggingface.co/google/muril-base-cased) is applied to both questions and contexts. The following pre-processing steps ensure
compatibility with extractive QA fine-tuning:
• Questions and contexts are tokenized with a maximum sequence length of 512 tokens.
• A sliding window with a stride of 128 tokens is applied to cover long contexts.
• Character-level answer spans are mapped to token indices to obtain start and end positions for supervision.
• Sequences are padded to a fixed length with attention masks for batching.
These steps collectively prepare the input data in a structured format suitable for training the QA model.</p>
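The span-mapping step above can be sketched as follows. With a fast tokenizer, `tokenizer(questions, contexts, truncation="only_second", max_length=512, stride=128, return_overflowing_tokens=True, return_offsets_mapping=True, padding="max_length")` produces the sliding windows together with per-token character offsets; the helper below is a simplified version of the usual extractive-QA preprocessing, not the exact competition code:

```python
def answer_token_span(offset_mapping, sequence_ids, char_start, char_end):
    """Map a character-level answer span in the context to token indices.

    offset_mapping: per-token (char_start, char_end) pairs from a fast
    tokenizer called with return_offsets_mapping=True.
    sequence_ids: 0 for question tokens, 1 for context tokens, None for
    special tokens (the Hugging Face tokenizer convention).
    Returns (start_token, end_token); (0, 0) marks windows that do not
    contain the answer, following the common SQuAD-style convention of
    pointing them at the [CLS] token.
    """
    # Locate the first and last context tokens in this window.
    ctx_start = sequence_ids.index(1)
    ctx_end = len(sequence_ids) - 1 - sequence_ids[::-1].index(1)

    # The answer is not fully inside this sliding window.
    if (offset_mapping[ctx_start][0] > char_start
            or offset_mapping[ctx_end][1] < char_end):
        return 0, 0

    # Walk inward to the tokens that cover the answer characters.
    start_tok = ctx_start
    while start_tok <= ctx_end and offset_mapping[start_tok][0] <= char_start:
        start_tok += 1
    end_tok = ctx_end
    while end_tok >= ctx_start and offset_mapping[end_tok][1] >= char_end:
        end_tok -= 1
    return start_tok - 1, end_tok + 1
```

Because each long context yields several overlapping windows, the same answer is labeled in whichever windows contain it, and windows without it point at position (0, 0).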
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Fine-tuning Strategies</title>
        <p>Fine-tuning a pretrained model for QA involves adapting that model to understand and extract answers
from the context passages based on given questions. During fine-tuning, MuRIL model is extended
with a QA-specific output head, consisting of a linear layer that predicts the start and end positions
of the answer span within the context. The input is structured as a concatenation of the question
and context, separated by special tokens, and tokenized using MuRIL’s native tokenizer to preserve
linguistic nuances in Hindi. While no modifications are made to the core architecture of MuRIL, the
task-specific output layer is randomly initialized and trained from scratch. This ensures that MuRIL’s
multilingual representations can be leveraged while learning task-specific parameters for the Hindi
QA task.</p>
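In the Hugging Face ecosystem, this is the head that AutoModelForQuestionAnswering adds on top of the encoder (and reports as newly initialized). A minimal stand-alone sketch of the span-prediction head and its training loss, with hidden_size=768 assumed to match muril-base-cased:

```python
import torch
import torch.nn as nn

class QAOutputHead(nn.Module):
    """Minimal sketch of the span-prediction head added on top of MuRIL.

    A single linear layer maps each token's hidden state to two logits:
    one for being the answer start and one for being the answer end.
    hidden_size=768 matches muril-base-cased.
    """
    def __init__(self, hidden_size=768):
        super().__init__()
        self.qa_outputs = nn.Linear(hidden_size, 2)

    def forward(self, hidden_states):
        logits = self.qa_outputs(hidden_states)        # (batch, seq, 2)
        start_logits, end_logits = logits.split(1, dim=-1)
        return start_logits.squeeze(-1), end_logits.squeeze(-1)

def span_loss(start_logits, end_logits, start_positions, end_positions):
    """Mean cross-entropy over the gold start and end positions, the same
    objective used internally by the transformers QA models."""
    ce = nn.CrossEntropyLoss()
    return (ce(start_logits, start_positions)
            + ce(end_logits, end_positions)) / 2
```

At inference time, the predicted span is the (start, end) pair with the highest combined logit score inside the context window.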
        <p>We fine-tuned the MuRIL model for Hindi QA using three distinct strategies to evaluate training efficiency
and model performance. The strategies are explained below:
1. Hugging Face Trainer uses the Trainer API, which automatically manages batching, forward and backward
passes, loss computation, gradient updates, and optimizer scheduling. Training
runs with the AdamW optimizer under this framework, while the training loss is monitored across epochs.
2. Custom AdamW Trainer is a manual training loop implemented without the high-level
Trainer. Each epoch involves:
• a forward pass of the model on a mini-batch,
• loss computation using the predicted and gold answer spans,
• backpropagation with loss.backward(), and
• parameter updates with the AdamW optimizer.
This setup allows explicit control over gradient accumulation, optimizer steps, and evaluation
checkpoints.
3. Simplified Trainer is a reduced version of the Trainer, focusing exclusively on fine-tuning
with the training data. Unlike the full setup, this variant omits additional evaluation and logging steps
during training, serving as a lightweight baseline for comparison.</p>
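The steps of the Custom AdamW Trainer can be sketched as a plain PyTorch loop. This is an illustrative sketch rather than the exact competition code: it assumes a Hugging Face-style model whose forward returns an object carrying a .loss when gold start/end positions are included in the batch, and the actual hyperparameters are those in Table 1.

```python
import torch
from torch.optim import AdamW

def train_loop(model, dataloader, epochs=3, lr=3e-5, device="cpu"):
    """Manual fine-tuning loop (strategy 2 above), sketched.

    Assumes each batch is a dict of tensors and that model(**batch)
    returns an object with a .loss attribute, as transformers QA models
    do when start_positions/end_positions are supplied.
    """
    model.to(device)
    optimizer = AdamW(model.parameters(), lr=lr)
    for epoch in range(epochs):
        model.train()
        total = 0.0
        for batch in dataloader:
            batch = {k: v.to(device) for k, v in batch.items()}
            outputs = model(**batch)   # forward pass on a mini-batch
            loss = outputs.loss        # CE over gold answer spans
            loss.backward()            # backpropagation
            optimizer.step()           # AdamW parameter update
            optimizer.zero_grad()
            total += loss.item()
        print(f"epoch {epoch}: mean loss {total / len(dataloader):.4f}")
```

Strategy 1 replaces this loop with transformers.Trainer and TrainingArguments, and strategy 3 is essentially the same loop with evaluation and logging removed.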
        <p>In all three approaches, the fine-tuning process is supervised using question–context pairs from
the VATIKA dataset. Each input pair is tokenized and aligned with annotated answer spans, enabling
the model to learn the semantic correspondence between questions and context passages. This design
allowed us to evaluate the effect of different optimization strategies on MuRIL’s ability to generalize in
a low-resource Hindi QA setting. The specific hyperparameters used in our experiments are summarized
in Table 1.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments and Results</title>
      <p>
        The experiments are designed to evaluate the performance of transformer-based models on the VATIKA
dataset in Hindi. The goal is to assess the effectiveness of the models in handling natural language queries
related to tourist information and services in Hindi, ensuring a smooth and informative experience for
users. The VATIKA dataset is a domain-specific QA dataset comprising contexts and QA pairs written in
Hindi, covering multiple tourism-related domains such as Ganga Aarti, temples, cruises, museums, and
public services. The dataset is divided into training, validation, and two test sets (Test-A and Test-B),
and the statistics of the datasets [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] are provided in Table 2.
      </p>
      <sec id="sec-4-1">
        <title>4.1. Results</title>
        <p>We evaluated the models on the validation and test sets using multiple metrics: F1 score, BLEU, and
ROUGE-L; the results for the three fine-tuning strategies are presented in Table 3. The evaluation
focused on the ability of the models to predict accurate and fluent answers aligned with the gold-standard
annotations. The results indicate that Hugging Face Trainer-based fine-tuning achieved the
best performance, with the highest F1 and ROUGE-L scores, demonstrating strong alignment with the
gold-standard answers. The Custom AdamW Trainer and Simplified Trainer strategies, while
showing some competitiveness in BLEU scores, lagged behind in overall performance. This suggests
that Hugging Face Trainer-based fine-tuning provided a better balance between exactness and fluency,
making it the most effective configuration for the task.</p>
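As an illustration of the exactness side of this balance, token-overlap F1 can be computed as below. This is a simplified sketch in the spirit of SQuAD-style evaluation (whitespace tokenization, no answer normalization; the official VATIKA scoring scripts may differ), with BLEU and ROUGE-L typically computed via libraries such as sacrebleu and rouge-score:

```python
from collections import Counter

def token_f1(prediction, reference):
    """Token-overlap F1 between a predicted and a gold answer string.

    Whitespace tokenization is a simplification; shared-task scorers
    often also lowercase and strip punctuation before comparing.
    """
    pred_toks, ref_toks = prediction.split(), reference.split()
    common = Counter(pred_toks) & Counter(ref_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(ref_toks)
    return 2 * precision * recall / (precision + recall)
```

A partially correct answer thus earns partial credit, which is why F1 is more forgiving than exact match for paraphrased Hindi answers.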
        <p>The findings highlight the challenges of QA in Hindi, particularly in the tourism domain, where
answers may be diverse, context-specific, and phrased differently across contexts. However, the
promising results of Hugging Face Trainer-based fine-tuning underscore the feasibility of building
robust QA systems tailored for tourism applications in Hindi.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Declaration on Generative AI</title>
      <p>Generative Artificial Intelligence (GenAI) tools were used in the preparation of this paper exclusively
for language refinement, grammar correction, and LaTeX formatting assistance. GenAI was not used for
generating research ideas, experiments, datasets, results, or conclusions. All core research activities,
including data preprocessing, model training, evaluation, and interpretation, were performed entirely by
the research team.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Future Work</title>
      <p>This work presented a Hindi QA system for the tourism domain of Varanasi using the VATIKA dataset,
submitted by our team MUCS. The QA system is developed by implementing three strategies, a Hugging
Face Trainer-based setup, a Custom AdamW Trainer, and a Simplified Trainer, to fine-tune the pretrained
MuRIL model for extractive QA. Each fine-tuning strategy followed the same dataset preparation and
pre-processing steps but differed in model optimization and training. Among the three models submitted
by our team MUCS, Hugging Face Trainer-based fine-tuning achieved the best results with F1 score
of 0.3351, BLEU score of 0.2214, and ROUGE-L score of 0.3621. These results confirm the feasibility
of building robust domain-specific QA systems in Indian languages, where linguistic diversity and
complex query styles pose unique challenges. Comparatively, the Custom AdamW Trainer and Simplified
Trainer fine-tuning strategies delivered lower F1 and ROUGE-L scores, underscoring the effectiveness
of Hugging Face Trainer-based fine-tuning as the most reliable configuration. This work demonstrates
the role of domain-adapted QA systems in enhancing the accessibility of tourism information, thereby
enriching visitor experiences in culturally significant cities. Future work will aim to further improve
contextual understanding, extend the system’s applicability across diverse queries, and explore practical
deployment strategies in real-world tourism scenarios.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Hirschman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gaizauskas</surname>
          </string-name>
          ,
          <source>Natural Language Processing And Question Answering, Natural Language Engineering</source>
          <volume>7</volume>
          (
          <year>2001</year>
          )
          <fpage>275</fpage>
          -
          <lpage>300</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Rajpurkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lopyrev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <article-title>SQuAD: 100,000+ Questions for Machine Comprehension of Text</article-title>
          ,
          <source>in: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>2383</fpage>
          -
          <lpage>2392</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Weissenborn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Wiese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Seife</surname>
          </string-name>
          ,
          <article-title>Making Neural Qa as Simple as Possible but Not Simpler</article-title>
          , in
          <source>: Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL)</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>271</fpage>
          -
          <lpage>280</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <article-title>HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering</article-title>
          ,
          <source>in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>2369</fpage>
          -
          <lpage>2380</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          , Information Extraction And Question Answering: Emerging Directions,
          <source>in: Proceedings of the 2003 Conference on Computational Linguistics and Intelligent Text Processing</source>
          , Springer,
          <year>2003</year>
          , pp.
          <fpage>473</fpage>
          -
          <lpage>483</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          , P. Bansal,
          <article-title>IndicQA: Multilingual Question Answering for Indic Languages</article-title>
          ,
          <source>Journal of Natural Language Engineering</source>
          <volume>30</volume>
          (
          <year>2024</year>
          )
          <fpage>145</fpage>
          -
          <lpage>162</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Clark</surname>
          </string-name>
          , E. Choi, M. Collins,
          <string-name>
            <given-names>D.</given-names>
            <surname>Garrette</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kwiatkowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Palomaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. P.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          TyDi QA:
          <article-title>A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages</article-title>
          ,
          <source>in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>4544</fpage>
          -
          <lpage>4560</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Artetxe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ruder</surname>
          </string-name>
          ,
          <article-title>Xquad: A Cross-lingual Question Answering Dataset</article-title>
          ,
          <source>in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>253</fpage>
          -
          <lpage>261</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , M. Chen,
          <string-name>
            <surname>TourismKB-QA</surname>
          </string-name>
          :
          <article-title>A Knowledge Graph Based Question Answering Framework for Tourism</article-title>
          ,
          <source>Information Processing &amp; Management</source>
          <volume>59</volume>
          (
          <year>2022</year>
          )
          <fpage>103097</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D.</given-names>
            <surname>Contractor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <article-title>TourismQA: Neural Question Answering for Tourism Information Retrieval</article-title>
          ,
          <source>in: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1265</fpage>
          -
          <lpage>1268</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Le</surname>
          </string-name>
          , M. Vo, SaigonTourism-QA:
          <article-title>Transformer Based Conversational Question Answering for Tourism</article-title>
          ,
          <source>in: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <year>2025</year>
          , pp.
          <fpage>2112</fpage>
          -
          <lpage>2125</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>H.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          VerifiedQA:
          <article-title>Enhancing Answer Reliability in Multilingual Question Answering</article-title>
          ,
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>13</volume>
          (
          <year>2025</year>
          )
          <fpage>122</fpage>
          -
          <lpage>138</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>P.</given-names>
            <surname>Gatla</surname>
          </string-name>
          , Anushka,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kanwar</surname>
          </string-name>
          , G. Sahoo,
          <string-name>
            <given-names>R. K.</given-names>
            <surname>Mundotiya</surname>
          </string-name>
          ,
          <article-title>Tourism Question Answer System in Indian Language using Domain-Adapted Foundation Models</article-title>
          ,
          <source>arXiv preprint</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>