<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>VATIKA: A Hindi Machine Reading Comprehension Approach for Varanasi Tourism Question Answering using mT5</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Harsh Mishra</string-name>
          <email>harshmishra83022@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Naina Yadav</string-name>
          <email>nainayadav585@gmail.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nirbhay Kumar Tagore</string-name>
          <email>nktagore@rgipt.ac.in</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ramakant Kumar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Department of Computer Engineering and Applications, GLA University</institution>
          ,
          <addr-line>Mathura, Uttar Pradesh</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Engineering and Applications, GLA University</institution>
          ,
          <addr-line>Mathura, Uttar Pradesh</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Department of Computer Science and Engineering, Dr. B. R. Ambedkar National Institute of Technology (NIT) Jalandhar</institution>
          ,
          <addr-line>Punjab</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Department of Computer Science and Engineering, Rajiv Gandhi Institute of Petroleum Technology (RGIPT)</institution>
          ,
          <addr-line>Jais, Amethi</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
<p>Tourism-focused question answering in Hindi remains a low-resource challenge, despite the language being spoken by over 600 million people. Existing multilingual QA systems often underperform due to cultural, linguistic, and domain-specific gaps. To address this, we propose an approach based on the VATIKA dataset, a high-quality Hindi Machine Reading Comprehension (MRC) dataset comprising 2,902 question-answer pairs covering ten tourism domains in Varanasi, released as part of the FIRE 2025 tracks. We fine-tune two multilingual generative models, mT5-small and mBART-50, and evaluate them using BLEU, ROUGE, and QA-F1 scores. Comparative analysis shows that mT5-small achieves competitive QA-F1 (0.4529) and strong BLEU-1 (56.5), while mBART-50 performs better on longer n-gram fluency (BLEU-4 = 19.6), compared with the baseline models IndicBERT and XLM-R.</p>
      </abstract>
      <kwd-group>
<kwd>Domain-Specific QA</kwd>
        <kwd>Generative Models</kwd>
        <kwd>Low-Resource Languages</kwd>
        <kwd>Hindi Question Answering</kwd>
        <kwd>mBART-50</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>With the rise of digital tourism platforms and increased smartphone use, tourists today expect instant
and personalized access to information in their native languages. However, most AI-based tourist
support systems prioritize high-resource languages like English, thereby neglecting millions of
non-English speakers, especially in linguistically diverse regions like India. Hindi, spoken by over 600
million people, remains significantly underrepresented in many domain-specific tasks such as question
answering.</p>
      <p>
Varanasi (Kashi) is one of the world’s oldest and most culturally rich cities, a prominent pilgrimage
site, and home to thousands of temples, sacred ghats, spiritual ashrams, and unique experiences like the
Ganga Aarti. Tourists often ask questions in Hindi, such as:
अस्सी घाट पर गंगा आरती कब होती है? (When does the Ganga Aarti take place at Assi Ghat?)
काशी विश्वनाथ मंदिर के दर्शन के लिए क्या समय है? (What are the darshan timings at Kashi Vishwanath Temple?)
वाराणसी में शाकाहारी भोजन कहाँ मिलेगा? (Where can one find vegetarian food in Varanasi?)
While these questions may appear simple, they are often highly contextual, culturally specific, and
embedded in domain-specific semantics. Unfortunately, existing Question Answering (QA) systems fall
short in handling such intricacies for Hindi due to the lack of annotated datasets, fine-tuned models, and
domain-aware linguistic understanding. Most multilingual or cross-lingual QA models rely on generic
datasets like TyDi QA, MLQA, or XQuAD. These datasets do not reflect tourism-specific queries, lack
deep cultural grounding, and rarely provide contextual answers in native sentence structure. Hindi also
presents challenges, including morphological richness, free word order, complex question phrasing,
and diverse dialects. These characteristics make vanilla translation-based approaches or zero-shot
multilingual models inadequate for domain-specific Hindi answer generation. To address this, we
fine-tune mT5-small on the VATIKA dataset, a Varanasi tourism dataset with 2,902 Hindi QA pairs
across ten tourism-related domains (kunds, ashrams, temples, travel and transport, Ganga Aarti, cruise
tourism, museums, public toilets, food courts, and general FAQs). Each question is grounded in an
authentic contextual passage sourced from web portals, brochures, blogs, and local guides. Because
mT5 is a generative sequence-to-sequence model, it can produce free-form, fluent Hindi answers rather
than only extracting spans. Through fine-tuning on VATIKA we aim to generate concise, contextually
accurate responses for real tourist queries in Hindi [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <title>2.1. Non-English and Indian Language QA</title>
        <p>
While English QA datasets such as SQuAD have become standard benchmarks, comparable resources
for non-English languages remain scarce. Chandra et al. [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] surveyed non-English QA datasets and
noted that most efforts concentrate on European and East Asian languages, leaving significant gaps
elsewhere. In the context of Indian languages, early contributions include the use of mBERT for span
prediction in Hindi [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] and the development of Indic-Transformers [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], both of which reported
competitive baselines across several Indic languages. More recently, the CHAIi dataset [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] introduced
short Hindi and Tamil QA pairs in the clinical domain; however, it does not address tourism-specific
questions, nor does it support generative QA tasks [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Tourism-Specific QA Systems</title>
        <p>
          Most existing work on tourism QA has focused on high-resource languages. Contractor et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], for
instance, developed a large-scale English dataset built from restaurant and hotel reviews, using
retrieval-based methods for answer generation. Later studies expanded this direction to multilingual settings,
examining the classification of tourism-related content such as reviews and tweets in languages like
French and Spanish [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Despite these advances, research on Hindi remains limited. In particular,
generative QA for regional tourism has not been explored, highlighting the importance of culturally
specific resources such as VATIKA [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Multilingual Pretrained Models for QA</title>
        <p>
Recent progress in transformer-based multilingual pretraining has substantially advanced cross-lingual
QA. The mT5 model [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], for example, demonstrates strong results on benchmarks such as MLQA and
TyDiQA, with its encoder–decoder design supporting free-form answer generation. Similarly,
mBART-50 [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] extends the BART framework to 50 languages and achieves state-of-the-art performance in both
machine translation and cross-lingual generation, making it a promising option for generative QA in
low-resource settings. Other models, including IndicBERT and XLM-R [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], have also been applied
to extractive QA tasks; however, their limitations in generative fluency make them less suitable for
conversational applications such as tourism assistance.
        </p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Positioning Our Work in the State of the Art</title>
        <p>
Based on the existing literature, three key gaps can be identified. First, there is no dedicated dataset for
Hindi tourism-specific QA. Second, generative QA in low-resource Indic languages has received little
attention. Third, evaluations of multilingual encoder–decoder models such as mT5 and mBART on
culturally grounded datasets remain scarce. Our study addresses these gaps by benchmarking mT5-small
and mBART-50 on VATIKA, a domain-specific Hindi MRC dataset for tourism in Varanasi. Empirical
results indicate that models fine-tuned on VATIKA outperform prior Indic baselines such as IndicBERT
and XLM-R in QA-F1, thereby setting a new state of the art for Hindi generative QA [
          <xref ref-type="bibr" rid="ref15 ref8">8, 15</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Overview of VATIKA Dataset</title>
<p>This section outlines the methodology followed in our study. The objective of this work is to evaluate
multilingual generative QA models on a culturally grounded Hindi dataset. For this purpose, we
employ the VATIKA dataset (Varanasi Tourism in Question Answering), which contains domain-specific
tourism queries in Hindi. Unlike generic multilingual benchmarks, VATIKA provides context-question-answer
triples grounded in real-world tourism scenarios, thereby enabling the assessment of models
in low-resource, domain-specific settings. An overview of the dataset creation pipeline is shown in
Figure 1, and the detailed workflow is described below.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Dataset and Setup</title>
<p>The VATIKA dataset comprises 2,902 context-question-answer triples across ten tourism-related
domains: temples, kunds (sacred ponds), ashrams, ghats, cruise tourism, museums, food courts,
transportation, public toilets, and general FAQs. Each sample consists of a context passage in Hindi, a
natural-language question, and a corresponding human-annotated answer. The dataset was designed
to capture contextual diversity, linguistic richness, and domain specificity, making it a suitable
benchmark for generative QA models in Hindi. Detailed dataset statistics are given in Table 1.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Dataset Description</title>
        <p>
          The VATIKA dataset was introduced [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] as part of the shared task “Varanasi Tourism in Question
Answer System (Indian Language),” with the aim of supporting the development of question-answering
systems in Hindi for the tourism domain. The track was designed to provide authentic information
covering key aspects of Varanasi (Kashi), thereby contributing to a more user-friendly and informative
tourism experience for visitors. The dataset includes queries and answers across ten tourism-relevant
domains: Ganga Aarti, Cruise, Food Court, Public Toilet, Kund, Museum, Travel Agencies, Ashram,
Temple, and General FAQs. Each entry consists of a paragraph-level Hindi context followed by natural
question-answer pairs, reflecting realistic tourist information needs. The dataset is written in Hindi
using the Devanagari script and supports both contextual machine reading comprehension and
open-domain QA. It has been partitioned into training and validation splits to facilitate model development
and evaluation. The training set comprises 5,358 contexts with 13,408 question-answer pairs, while the
validation set contains 1,158 contexts with 2,963 question-answer pairs. These splits ensure balanced
coverage across the ten domains while maintaining diversity in query intent, ranging from factual and
navigational to experiential questions.
        </p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Annotation Process</title>
        <sec id="sec-3-4-1">
          <title>3.4.1. Annotation Format</title>
          <p>Each entry was manually annotated with three fields:
• Context: a Hindi paragraph containing factual or descriptive information.
• Question: a natural tourist query.
• Answer: a concise, human-written response grounded in the context.</p>
          <p>Example:
संदर्भ (Context): मणिकर्णिका कुंड में...
प्रश्न (Question): मणिकर्णिका कुंड में कौन सी पूजा होती है? (Which pujas are performed at Manikarnika Kund?)
उत्तर (Answer): यहाँ मुख्यतः पिंडदान और मृत्यु-संस्कार से जुड़ी पूजा होती है। (Mainly pind-daan and last-rite rituals are performed here.)</p>
        </sec>
        <sec id="sec-3-4-2">
          <title>3.4.2. Dataset Splits</title>
          <p>The dataset is divided into the following components:
• Training set: 2,902 QA pairs used for model learning.
• Validation set: used for hyperparameter tuning and early stopping.
• Test-A: contains gold-standard answers for offline evaluation.
• Test-B (unlabeled): used for blind leaderboard submissions.</p>
          <p>This dataset structure follows Machine Reading Comprehension (MRC) and closed-book QA
protocols, requiring models to understand paragraph-level Hindi text and generate context-aware, free-form
answers.</p>
        </sec>
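<p>Concretely, each entry can be viewed as a context-question-answer record; the following is a minimal sketch with illustrative field names (not the official VATIKA schema), reusing the Manikarnika Kund example above:</p>

```python
# One VATIKA-style MRC record. Field names are illustrative, not the
# official schema; the Hindi text mirrors the example in Section 3.4.
example = {
    "context": "मणिकर्णिका कुंड में ...",  # paragraph-level Hindi passage
    "question": "मणिकर्णिका कुंड में कौन सी पूजा होती है?",
    "answer": "यहाँ मुख्यतः पिंडदान और मृत्यु-संस्कार से जुड़ी पूजा होती है।",
    "domain": "kund",  # one of the ten tourism domains
}
```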
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Model Selection</title>
      <sec id="sec-4-1">
        <title>4.1. Model Selection Overview</title>
        <p>For this study, we focused on multilingual encoder–decoder architectures well suited to low-resource
generative QA. These models enable not only span extraction from a given context but also the
generation of fluent, contextually rich answers.</p>
        <sec id="sec-4-1-1">
          <title>4.1.1. Model Based on mT5-small</title>
          <p>
            We selected google/mt5-small, a 300M-parameter variant of the multilingual T5 model pretrained on
the mC4 corpus, covering over 100 languages including Hindi [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ] [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ]. Several factors motivated this choice:
• Language Coverage: mT5-small supports Hindi, making it effective for fine-tuning in
low-resource conditions.
• Generative Architecture: as an encoder–decoder model, it enables free-form answer
generation, an advantage over purely extractive models such as BERT.
• Span Corruption Pretraining: its denoising (span corruption) objective equips the model to
handle long-range dependencies and produce coherent responses.
• Efficiency: with a relatively compact size, mT5-small can be trained on GPUs with limited
memory, making it suitable for deployment in resource-constrained environments, including
edge devices.
          </p>
        </sec>
        <sec id="sec-4-1-2">
          <title>4.1.2. Comparison with Other Models</title>
          <p>To contextualize our choice, we compared mT5-small against two alternatives:
• mrm8488/bert-multi-cased-finetuned-xquadv1 is inherently extractive, limiting its ability to
generate fluent, contextually rich answers in a generative QA setting.
• AVISHKAARAM/avishkaarak-ekta-hindi is a Hindi-specific language model, but its
performance on tourism-focused QA tasks is limited by weaker generalization and domain-transfer
capabilities.</p>
        </sec>
        <sec id="sec-4-1-3">
          <title>4.1.3. Additional Baseline: mBART-50</title>
          <p>
            For a broader perspective, we also evaluated mBART-50, a multilingual sequence-to-sequence model
developed by Meta AI [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ]. Pretrained across 50 languages, mBART-50 has demonstrated strong
performance in machine translation and cross-lingual generation. Its generative capacity makes it a robust
baseline for benchmarking Hindi QA performance alongside mT5-small.
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Input Representation and Preprocessing</title>
      <sec id="sec-5-1">
        <title>5.1. Input Format</title>
        <sec id="sec-5-1-1">
          <title>Each QA instance was formatted using the following template:</title>
<p>प्रश्न: &lt;question&gt; संदर्भ: &lt;context&gt;</p>
<p>This structure encourages the model to attend jointly to the question and the context during
generation. The input was tokenized with the mT5 tokenizer (Hugging Face), which preserves the
Devanagari script; context passages were truncated to a maximum of 512 tokens, and question
diversity and dialectal variations were retained during preprocessing. The expected output was the
corresponding Hindi answer.</p>
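<p>A minimal sketch of this formatting step, assuming whitespace-based truncation for illustration only (the actual pipeline truncates with the mT5 subword tokenizer):</p>

```python
def format_example(question: str, context: str, max_context_tokens: int = 512) -> str:
    # Clip the context to the 512-token budget described in Section 5.3.
    # Whitespace splitting stands in for the mT5 subword tokenizer here.
    clipped = " ".join(context.split()[:max_context_tokens])
    # Hindi field labels mirror the Section 5.1 template:
    # "प्रश्न:" (question) followed by "संदर्भ:" (context).
    return f"प्रश्न: {question} संदर्भ: {clipped}"
```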
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Tokenization</title>
<p>Tokenization was performed with the MT5Tokenizer provided by Hugging Face, which preserves the
Devanagari script, including punctuation and diacritics. Particular care was taken to ensure accurate
handling of compound words and expressions unique to the Hindi spoken in Varanasi and its
surrounding regions.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Preprocessing Enhancements</title>
<p>To ensure efficient and robust training, several preprocessing strategies were applied. Context
passages were truncated to a maximum of 512 tokens to manage GPU memory usage. Question
diversity was preserved by including a wide range of interrogatives such as where and what, as well as
yes/no queries, reflecting natural tourist questions. Finally, regional and dialectal variations, including
Bhojpuri-influenced phrasing, were retained to mirror real-world usage.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Results and Analysis</title>
<p>Fine-tuning was carried out using MT5ForConditionalGeneration. The loss function was standard
cross-entropy with masking applied to padding tokens. Optimization employed AdamW with a
learning rate of 3e-5, supported by a scheduler that combined linear warmup with cosine decay.
Experiments were conducted on an NVIDIA V100 GPU (16 GB) with mixed-precision (fp16) training to reduce
memory consumption. The configuration parameters are summarized in Table 2. A custom PyTorch
dataset wrapper was used to dynamically tokenize each context–question–answer triple during
training, and a data collator handled sequence padding at the batch level, ensuring efficient GPU utilization.</p>
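<p>The warmup-plus-cosine schedule can be expressed as a multiplier applied to the peak learning rate; a minimal sketch, with illustrative step counts (the paper reports only the 3e-5 peak rate and the schedule shape):</p>

```python
import math

def warmup_cosine(step: int, warmup_steps: int = 500, total_steps: int = 10000) -> float:
    # Multiplier applied to the peak AdamW learning rate (3e-5):
    # linear warmup for the first warmup_steps, then cosine decay to zero.
    warm = min(1.0, step / max(1, warmup_steps))
    progress = min(1.0, max(0.0, step - warmup_steps) / max(1, total_steps - warmup_steps))
    return warm * 0.5 * (1.0 + math.cos(math.pi * progress))
```

In PyTorch this multiplier can be attached to the optimizer via torch.optim.lr_scheduler.LambdaLR.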
      <sec id="sec-6-1">
        <title>6.1. Experimental Analysis</title>
        <p>The model exhibited steady convergence, with loss decreasing substantially across epochs. Table 3
reports the average training loss per epoch.</p>
        <p>
          Automatic evaluation used BLEU [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], ROUGE [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], and QA-F1 metrics for combined fluency and
accuracy assessment [
          <xref ref-type="bibr" rid="ref11 ref12 ref13">11–13</xref>
          ].
        </p>
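<p>Of these metrics, QA-F1 is the least standardized; a minimal sketch of token-overlap F1 in the SQuAD style, assuming whitespace tokenization (the paper does not specify its exact normalization for Hindi):</p>

```python
from collections import Counter

def qa_f1(prediction: str, reference: str) -> float:
    # Token-overlap F1: the harmonic mean of precision and recall over
    # tokens shared between the generated and the reference answer.
    # Whitespace tokenization is an assumption made for illustration.
    pred, ref = prediction.split(), reference.split()
    pred_counts, ref_counts = Counter(pred), Counter(ref)
    overlap = sum(min(count, ref_counts[tok]) for tok, count in pred_counts.items())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```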
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Results and Discussion</title>
        <p>We evaluated mT5-small and mBART-50 on the VATIKA dataset using BLEU, ROUGE, and QA-F1.
This section presents the quantitative metrics, visual comparisons, and confusion matrix analysis. The
detailed result for our proposed model are in Table 4 and 5.</p>
<p>A few key patterns emerge from these results. First, mT5-small achieves a substantially higher
QA-F1 score (0.4529 vs. 0.1069), indicating stronger factual accuracy. Second, mBART-50 demonstrates
better performance on longer n-gram overlaps, achieving BLEU-3 of 25.5 and BLEU-4 of 19.6, which
suggests improved fluency in generating multi-word expressions. Finally, although overall ROUGE
scores remain modest, mT5-small consistently outperforms mBART-50 across ROUGE-1, ROUGE-2,
and ROUGE-L. Figures 2 and 3 illustrate the BLEU and ROUGE score comparisons. The BLEU analysis
highlights complementary strengths between the two models: mT5-small achieves higher BLEU-1 and
BLEU-2, showing that it is more effective at reproducing key words and short sequences, which is
essential for factual accuracy, while mBART-50 excels in BLEU-3 and BLEU-4, reflecting its ability to
produce longer, more fluent sequences that capture broader syntactic structures. The ROUGE results,
though modest overall, reinforce this distinction and suggest that mT5-small is better aligned with the
structural patterns of the reference sentences. Together, these results point to a trade-off: mBART-50
emphasizes fluency and longer-sequence coherence, while mT5-small delivers stronger lexical
precision and factual alignment.</p>
<p>Confusion Matrix: Figure 4 presents confusion matrices for the two models. The analysis shows
that mT5-small yields a larger number of true positives and fewer false negatives than mBART-50,
which explains its superior QA-F1 score: it is more effective at identifying relevant information and
generating factually correct responses. In contrast, mBART-50 struggles with recall, leading to higher
false-negative rates despite its advantage in generating fluent, longer sequences. Overall, the results
suggest that mBART-50 is better suited for generating longer, fluent responses, while mT5-small offers
a stronger balance of precision, structural similarity, and factual correctness. For domain-specific QA
in Hindi tourism, where accuracy is critical, mT5-small provides more reliable performance.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Discussion</title>
      <p>
The results reveal an important balance between factual accuracy and linguistic fluency in multilingual
generative QA. The mT5-small model achieved notably higher factual correctness (QA-F1: 0.4529),
making it more dependable for tourism-related applications where precise information is essential.
In contrast, mBART-50 demonstrated stronger performance on longer n-gram overlaps (BLEU-3 and
BLEU-4), reflecting its capacity to produce more fluent and contextually rich sequences. These findings
indicate that compact multilingual transformers such as mT5 are particularly effective for domain-specific,
low-resource QA tasks, whereas larger encoder–decoder models like mBART-50 may be more
appropriate for dialogue-oriented or multi-turn conversational systems. Furthermore, the VATIKA
dataset sets a new benchmark for Hindi tourism QA, with results that surpass prior Indic baselines,
including IndicBERT and XLM-R, particularly in terms of factual accuracy [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
    </sec>
    <sec id="sec-8">
      <title>8. Conclusion and Future Work</title>
<p>This study introduced VATIKA, the first Hindi Machine Reading Comprehension dataset dedicated
to the tourism domain in Varanasi. The dataset contains 2,902 high-quality question–answer pairs
spanning ten domains, capturing both factual and experiential aspects of tourist information. Using
this dataset, we benchmarked mT5-small and mBART-50, showing that while mT5-small provides
stronger factual accuracy, mBART-50 demonstrates superior fluency in generating longer sequences.
Together, these experiments advance the state of the art in Hindi generative QA and emphasize the
importance of culturally specific datasets for building practical QA systems. Looking ahead, several
directions remain open for exploration:
• Fine-tuning more powerful multilingual architectures, such as mT5-base or IndicTrans2, may
improve fluency without sacrificing accuracy.
• Extending VATIKA to additional Indian languages (e.g., Bhojpuri, Marathi, Bengali) would
broaden its applicability and promote cross-lingual research.
• Integrating Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) would allow the
development of interactive, speech-based tourist assistants.
• Incorporating maps, images, and GPS data could enable richer multimodal QA, enhancing
navigation and overall user experience for tourists.</p>
    </sec>
    <sec id="sec-9">
      <title>9. Acknowledgement</title>
      <p>I would like to sincerely thank Dr. Naina Yadav for her significant mentorship and support with this
paper. I would also like to thank the lab team at NIT Jalandhar, where I was fortunate enough to
work as an intern. The use of the lab’s GPU resources significantly aided in the experiments and
analysis conducted as part of this work. Their encouragement, suggestions, and technical support
have contributed greatly to the acceptance of this paper.
As we wrote the paper, we only employed a generative AI assistant in a limited way to facilitate the
writing process. The AI was mostly used to help refine the language, help structure sections, and
maintain consistency in LaTeX format. All technical content, experimental design, model development, and
reported results were conceptualised, implemented, and validated solely by the authors. The
generative AI assistant offered no new research ideas and had no influence over the reported findings;
it served only as a supportive resource, comparable to grammar-checking or typesetting tools.
All content in this paper was critically reviewed and approved by the authors.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chandra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fahrizain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ibrahim</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Willyanto</surname>
          </string-name>
          .
          <article-title>A Survey on Non-English Question Answering Datasets</article-title>
          .
          <source>arXiv preprint arXiv:2112.13634</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fisch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weston</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Bordes</surname>
          </string-name>
          .
          <article-title>Reading Wikipedia to Answer Open-Domain Questions</article-title>
          .
          <source>In Proceedings of ACL</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Conneau</surname>
          </string-name>
          et al.
          <article-title>XLM-R: A Strong Baseline for Cross-Lingual Understanding</article-title>
          .
          <source>In Proceedings of ACL</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Contractor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Partap</surname>
          </string-name>
          , Mausam, and
          <string-name>
            <given-names>P.</given-names>
            <surname>Singla</surname>
          </string-name>
          .
          <article-title>Large-Scale Question Answering Using Tourism Data</article-title>
          .
          <source>In Proceedings of ACL</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Gatla</surname>
          </string-name>
          , Anushka,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kanwar</surname>
          </string-name>
          , G. Sahoo, and
          <string-name>
            <given-names>R. K.</given-names>
            <surname>Mundotiya</surname>
          </string-name>
          .
          <article-title>Tourism Question Answer System in Indian Language using Domain-Adapted Foundation Models</article-title>
          .
          <source>arXiv preprint</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Gupta</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Khade</surname>
          </string-name>
          .
          <article-title>BERT-based Multilingual Machine Comprehension in English and Hindi</article-title>
          .
          <source>arXiv preprint arXiv:2006.01432</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>K.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Deshpande</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Shridhar</surname>
          </string-name>
          et al.
          <article-title>Indic-Transformers: Analyzing Transformers for Indian Languages</article-title>
          . arXiv preprint arXiv:2011.02323,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Jain</surname>
          </string-name>
          et al.
          <article-title>IndicBERT: A Multilingual Model for 10 Indian Languages</article-title>
          . arXiv preprint arXiv:2008.00401,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>V.</given-names>
            <surname>Karpukhin</surname>
          </string-name>
          et al.
          <article-title>Dense Passage Retrieval for Open-Domain QA</article-title>
          .
          <source>In EMNLP</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          et al.
          <source>chaii: Hindi and Tamil Question Answering Dataset. Kaggle Dataset</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          et al.
          <article-title>Retrieval-Augmented Generation for Knowledge-Intensive NLP</article-title>
          .
          <source>In NeurIPS</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>C. Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <article-title>ROUGE: A Package for Automatic Evaluation of Summaries</article-title>
          .
          <source>In ACL Workshop</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>K.</given-names>
            <surname>Papineni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Roukos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ward</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W. J.</given-names>
            <surname>Zhu</surname>
          </string-name>
          .
          <article-title>BLEU: A Method for Automatic Evaluation of Machine Translation</article-title>
          .
          <source>In ACL</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>C.</given-names>
            <surname>Raffel</surname>
          </string-name>
          et al.
          <article-title>Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (T5)</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          et al.
          <article-title>IndicTrans: Transformer Models for Indian Languages</article-title>
          .
          <source>In ACL Findings</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tang</surname>
          </string-name>
          et al.
          <article-title>mBART-50: Multilingual Machine Translation with BART</article-title>
          .
          <source>In ACL</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>L.</given-names>
            <surname>Xue</surname>
          </string-name>
          et al.
          <article-title>mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer</article-title>
          .
          <source>In NAACL</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>