<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>TIRTHA: Tourism Information Retrieval and Text-based Hindi Answering</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Krishna Tewari</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Supriya Chanda</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aarya Chaturvedi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Bennett University</institution>
          ,
          <addr-line>Greater Noida</addr-line>
          ,
          <country country="IN">INDIA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Indian Institute of Technology (BHU)</institution>
          ,
          <addr-line>Varanasi</addr-line>
          ,
          <country country="IN">INDIA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
<p>Hindi Tourism QA (HTQA) addresses the challenge of extracting precise answers from Hindi context paragraphs within the specialized tourism domain of Varanasi, where limited annotated resources and complex linguistic structures pose significant hurdles. As part of the FIRE 2025 VATIKA shared task, which focuses on Hindi-language QA, we developed and evaluated multiple QA approaches using a structured dataset consisting of context-question pairs in JSON format. Three main strategies were explored: (i) fine-tuning the multilingual mT5 model, which demonstrated reasonable language support but occasionally produced fallback answers; (ii) span-based extractive modeling using XLM-RoBERTa, enhanced with post-processing techniques to refine short-span predictions; and (iii) a zero-shot approach leveraging ChatGPT with batch-wise prompt engineering applied over 50 context-question pairs purely for comparative analysis. Evaluation was performed using BLEU (1-4), ROUGE-L, and QA-F1 metrics. While ChatGPT achieved higher metric scores, only open-source models are considered for leaderboard results; hence, the ChatGPT results are reported separately as an ablation.</p>
      </abstract>
      <kwd-group>
        <kwd>QA</kwd>
        <kwd>Extractive QA</kwd>
        <kwd>XLM-RoBERTa</kwd>
        <kwd>ChatGPT</kwd>
        <kwd>Zero-shot Learning</kwd>
        <kwd>Tourism</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The rich cultural and spiritual heritage of Varanasi, also known as Kashi, makes it one of the world’s
oldest living cities and a prominent pilgrimage destination in India. Renowned for its sacred kunds,
temples, and ghats, the city attracts millions of tourists and devotees each year. However, most
information about these landmarks exists in unstructured textual formats, which poses significant barriers
for Hindi-speaking visitors seeking concise, accurate, and reliable knowledge.</p>
      <p>
        Hindi-language question answering (QA) systems offer a solution by automatically extracting precise answers from
large bodies of text, enabling efficient information retrieval for end-users. A typical QA task involves
processing a question Q = (q_1, q_2, …, q_n) posed in natural Hindi and retrieving the correct answer
span from a given context paragraph C = (c_1, c_2, …, c_m). However, several challenges complicate this
process in low-resource and specialized domains. First, the lack of large-scale annotated datasets in
Hindi limits supervised training of robust models [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. Second, domain-specific variability in phrasing,
complex syntactic structures, and culturally grounded concepts further increase modeling difficulty
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Third, ambiguity in question formulation and answer granularity creates additional hurdles in
achieving precise and reliable retrieval [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        To advance research in this direction, the Forum for Information Retrieval Evaluation (FIRE)
introduced the VATIKA shared task in 2025 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], focusing on Hindi QA in the tourism domain of Varanasi.
The dataset comprises structured JSON instances pairing context passages about sacred sites with
corresponding questions, providing a valuable benchmark for systematic development and evaluation of
QA systems.
      </p>
      <p>
        In this work, we benchmark multiple QA approaches in this culturally rich, low-resource setting,
including transformer-based fine-tuning and zero-shot prompting strategies using large language models
as comparative analysis [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Our study demonstrates the potential of these approaches and highlights
key challenges in building robust Hindi QA systems for specialized domains, pointing toward
promising directions for future research.
      </p>
      <p>The rest of the paper is structured as follows: Section 2 discusses related work; Section 3 describes
the dataset; Section 4 presents the proposed methodology; Section 5 reports results and analysis; and
Section 6 concludes with key findings.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The field of QA has been fundamentally reshaped by the introduction of the Transformer
architecture, which enabled large pre-trained language models (PLMs) to excel across NLP tasks [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Early
breakthroughs such as BERT established the dominant pre-train and fine-tune paradigm, learning rich
contextual representations from vast text corpora to achieve state-of-the-art performance on many
language tasks [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        Two primary paradigms for QA have emerged. Extractive QA, popularized by benchmark datasets
like SQuAD [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], formulates the task as span prediction over a context paragraph. Cross-lingual
transformers such as XLM-RoBERTa have demonstrated strong performance in this space by enabling
transfer learning across languages [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. In contrast, Generative QA treats the task as a text-to-text problem,
where models like T5 unify multiple NLP tasks into a single sequence-to-sequence framework [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        With the advent of Large Language Models (LLMs) such as GPT-3 and GPT-4, zero-shot and few-shot
prompting strategies have gained significant attention. These models perform tasks by interpreting
instructions embedded in prompts, often achieving competitive results without task-specific fine-tuning
[
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ]. Zero-shot prompting has proven especially viable for low-resource settings, allowing models
to generalize to unseen tasks [
        <xref ref-type="bibr" rid="ref13 ref6">6, 13</xref>
        ].
      </p>
      <p>While most research in QA has focused on high-resource languages such as English, several efforts
have extended QA to low-resource and cross-lingual settings. IndicQA and TyDi QA are notable
benchmarks focusing on diverse Indian languages, highlighting challenges such as code-mixing,
transliteration, and limited data availability [14, 15]. Transfer learning and multilingual pretraining strategies
have been proposed to overcome these challenges, demonstrating that models pretrained on
multilingual corpora (e.g., mBERT) show strong cross-lingual transferability [16, 17].</p>
      <p>Domain-specific QA has also seen increasing interest. Specialized benchmarks in medical, legal,
and scientific domains have revealed that generic models often struggle with domain-specific jargon
and knowledge representation [18, 19, 20]. Fine-tuning on domain-specific data significantly improves
performance but remains challenging in low-resource settings.</p>
      <p>Recent studies have started exploring hybrid architectures that combine neural and symbolic
methods to improve robustness and interpretability [21, 22]. Such models aim to bridge the gap between
purely data-driven approaches and rule-based systems, often improving precision and reducing
ambiguity in specialized applications.</p>
      <p>Despite these advances, a direct comparative analysis of extractive, generative, and zero-shot paradigms
on a low-resource, culturally specific dataset such as the VATIKA Hindi QA remains underexplored.
Our work benchmarks these paradigms in a tourism domain setting, shedding light on their practical
effectiveness and identifying key areas for future improvement.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset</title>
      <p>The dataset used in this study is released as part of the FIRE 2025 VATIKA Shared Task on Hindi QA.
It is designed to support machine reading comprehension (MRC) and QA applications in the tourism
domain of Varanasi, focusing on cultural and spiritual heritage. The dataset is provided in a structured
JSON format, organized by domain → context → question-answer pairs.</p>
      <p>Each entry is organized into three primary fields: Context, a factual, descriptive paragraph in Hindi
(Devanagari script) detailing specific landmarks (e.g., temples, kunds, ghats), historical events, or
cultural rituals in Varanasi; Question, a fact-seeking wh-question in Hindi (e.g., “कहाँ,” “कब,” “कौन”),
designed to be answerable based solely on the provided context; and Answer, the ground-truth
answer, a verbatim span directly extracted from the context paragraph, enforcing an extractive span
prediction task.</p>
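      <p>As a rough illustration, each instance can be loaded and traversed as sketched below; the file name and field keys are assumptions for illustration only, since the exact schema is defined by the shared-task release.</p>
      <preformat>
import json

# Load the released training split; the file name and field keys here are
# illustrative assumptions, not the official schema.
with open("vatika_train.json", encoding="utf-8") as f:
    data = json.load(f)

for domain, contexts in data.items():        # e.g. "kund", "temple", ...
    for block in contexts:
        context = block["context"]           # Hindi paragraph in Devanagari
        for qa in block["qas"]:              # question-answer pairs for this context
            print(qa["id"], qa["question"], qa["answer"])
      </preformat>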
      <p>The dataset is pre-divided into training, validation, and test splits to ensure standardized evaluation.
The training set contains 2,452 question-answer pairs, the validation set contains 273 pairs, and the
blind test set contains 915 pairs. The full distribution is summarized in Table 1.</p>
      <p>The VATIKA dataset covers 10 tourism-relevant domains: Ganga Aarti, Cruise, Food Court, Public
Toilet, Kund, Museum, General, Ashram, Temple, and Travel. Each domain includes detailed
paragraph-level contexts followed by multiple question-answer pairs, simulating real-world information-seeking
behavior in natural Hindi language.</p>
      <p>A representative structured entry from the “kund” domain is shown below:</p>
      <sec id="sec-3-1">
        <title>Domain: kund</title>
      </sec>
      <sec id="sec-3-2">
        <title>Contexts:</title>
        <p>• मणिकर्णिका चक्र पुष्करिणी कुंड लाल बहादुर शास्त्री अंतरराष्ट्रीय ह ...</p>
        <p>– QID: kund_1467</p>
        <p>Question: मणिकर्णिका चक्र पुष्करिणी कुंड लाल बहादुर शास्त्री अंतरराष्ट्रीय हवाई
अड्डे (वाराणसी) से कितनी दूर है?
Answer: मणिकर्णिका चक्र पुष्करिणी कुंड लाल बहादुर शास्त्री अंतरराष्ट्रीय हवाई
अड्डे (वाराणसी) से 25.8 किलोमीटर दूर है।
– QID: kund_1468</p>
        <p>Question: मणिकर्णिका चक्र पुष्करिणी कुंड लाल बहादुर शास्त्री अंतरराष्ट्रीय हवाई
अड्डे के पास से कैसे पहुँचा जा सकता है?
Answer: मणिकर्णिका चक्र पुष्करिणी कुंड लाल बहादुर शास्त्री अंतरराष्ट्रीय हवाई
अड्डे से यह दूरी टैक्सी या अन्य निजी परिवहन के माध्यम से तय की जा सकती है।</p>
        <p>A qualitative review of the data highlights several key characteristics. Contexts are rich in proper
nouns (e.g., place names, deity names), dates, and factual details. The questions are predominantly
factoid, focusing on the retrieval of specific entities rather than complex reasoning or synthesis.
Answer spans are typically short phrases directly extracted from the context. This structured and curated
dataset provides a robust benchmark for evaluating extractive QA models in a specialized low-resource
Hindi setting, promoting research toward domain-specific QA systems.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>We address the problem of developing a robust Hindi QA system for the Varanasi tourism domain.
Formally, given a question Q in Hindi and a set of context paragraphs C, the goal is to produce an
answer A that is fluent, factually consistent, and derived strictly from the provided context. This can
be expressed as:
Â = arg max_{A ∈ S(C)} P(A | Q, C),</p>
      <p>where S(C) denotes the set of plausible answer spans or sequences within the context. For
extractive methods, S(C) is restricted to spans of text that exist verbatim in C, while for large language model
(LLM) approaches, S(C) encompasses all possible text sequences that can be generated from the
context.</p>
      <p>The design of our QA system is centered around three complementary computational paradigms:
generative QA using fine-tuned mT5, extractive QA using XLM-RoBERTa with post-processing, and
zero-shot answer generation using a large language model. These paradigms were selected to leverage
their respective strengths: flexibility for generative QA, precision and interpretability for extractive
QA, and contextual fluency and completeness for LLM-based generation.</p>
      <sec id="sec-4-1">
        <title>4.1. Generative QA with Fine-Tuned mT5</title>
        <p>
          Our first approach employed the mT5-small model, a multilingual version of the Text-to-Text Transfer
Transformer (T5) [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. The T5 framework uniquely treats all NLP tasks as a text-generation problem,
making it a flexible choice for generative QA. We fine-tuned the model on the official training set by
providing the question and context as input, with the objective of teaching the model’s decoder to
generate the ground-truth answer. Despite its potential to produce fluent responses, this approach proved
underwhelming. The model often defaulted to generic, uninformative answers (e.g., “उत्तर नहीं है”, “there is no answer”),
suggesting that the limited size of the training corpus was insufficient for robust domain adaptation.
This highlighted the significant data and computational requirements of fine-tuning generative models
for specialized tasks.
        </p>
        <p>All experiments were conducted using the PyTorch framework and the Hugging Face Transformers
library. For the generative approach, the google/mt5-small model was fine-tuned for 5 epochs with
a batch size of 8 and a learning rate of 2e-5 using the AdamW optimizer.</p>
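        <p>A condensed sketch of this configuration with the Hugging Face Transformers Trainer API is shown below; the dataset wrapping and the input template are illustrative assumptions, while the checkpoint, epochs, batch size, learning rate, and optimizer follow the settings above.</p>
        <preformat>
from datasets import Dataset
from transformers import (MT5ForConditionalGeneration, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "google/mt5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = MT5ForConditionalGeneration.from_pretrained(model_name)

# `records` stands in for the official training split: a list of dicts with
# "question", "context", "answer" (illustrative field names).
records = [{"question": "...", "context": "...", "answer": "..."}]
train_dataset = Dataset.from_list(records)

def preprocess(example):
    # Cast QA as text-to-text: question plus context in, ground-truth answer out.
    source = f"question: {example['question']} context: {example['context']}"
    model_inputs = tokenizer(source, max_length=512, truncation=True)
    labels = tokenizer(text_target=example["answer"], max_length=64, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

train_dataset = train_dataset.map(preprocess, remove_columns=train_dataset.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="mt5-vatika",
    num_train_epochs=5,              # as stated above
    per_device_train_batch_size=8,   # as stated above
    learning_rate=2e-5,              # as stated above
    optim="adamw_torch",             # AdamW optimizer
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
        </preformat>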
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Extractive QA with XLM-RoBERTa and Post-Processing</title>
        <p>
          The extractive paradigm employs XLM-RoBERTa (XLM-R) [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], a transformer-based model pretrained
for cross-lingual understanding, capable of processing Hindi text directly. The model formulates QA
as a span prediction problem: given a context paragraph   ∈  , it predicts a start token  and an end
token  such that the answer is extracted as:
        </p>
        <p>=    +1 …   ,
where   denotes the  -th token of the context.</p>
        <p>Challenges in raw predictions: Despite the model’s accuracy at identifying relevant tokens, we
observed two recurring issues: 1. Incomplete spans: The predicted spans were often too short, omitting
critical contextual information necessary for coherent understanding. 2. Low-confidence predictions:
In cases involving ambiguous questions or rare domain-specific vocabulary, the model occasionally
generated predictions with very low confidence scores, leading to unreliable outputs.</p>
        <p>To address these challenges, we devised a two-step post-processing pipeline that improves answer
completeness and reliability:</p>
        <p>1. Sentence Expansion: The predicted span (s, e) is mapped back to the full sentence containing
it, producing a more comprehensive answer:</p>
        <p>A_expanded = sentence_containing(t_s … t_e).</p>
        <p>2. Confidence Filtering: Predictions with confidence below a threshold (empirically set at 0.05)
that are unusually short are further analyzed. We check for the presence of domain-specific keywords
(e.g., names of locations, temples, or ghats relevant to Varanasi). If keywords are missing, the answer
is replaced with a standard fallback message:</p>
        <p>A_final = A_expanded if the confidence is high or keywords are present; otherwise A_final = “उत्तर उपलब्ध नहीं है” (“the answer is not available”).</p>
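        <p>A minimal sketch of these two post-processing steps is given below, under the assumption that sentences are delimited by the Devanagari danda (।) and that a domain keyword list is supplied externally:</p>
        <preformat>
FALLBACK = "उत्तर उपलब्ध नहीं है"  # standard fallback message: "the answer is not available"

def expand_to_sentence(context, start, end):
    """Map a predicted character span back to the danda-delimited sentence containing it."""
    left = context.rfind("।", 0, start) + 1
    right = context.find("।", end)
    right = len(context) if right == -1 else right + 1
    return context[left:right].strip()

def filter_prediction(expanded, score, keywords, threshold=0.05):
    """Keep the expanded answer if the model is confident or a domain keyword is present;
    otherwise return the fallback message."""
    if score >= threshold or any(kw in expanded for kw in keywords):
        return expanded
    return FALLBACK
        </preformat>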
        <p>Implementation Details: We use the deepset/xlm-roberta-base-squad2 checkpoint. Context
paragraphs are tokenized using XLM-R’s SentencePiece tokenizer. The model processes inputs in
batches of 16. By integrating sentence expansion and confidence filtering, this extractive pipeline
produces answers that are both accurate and contextually complete while remaining interpretable.</p>
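        <p>Putting the pieces together, inference with the stated checkpoint and batch size might look like the following sketch, reusing the expand_to_sentence and filter_prediction helpers above; the keyword list is an illustrative placeholder.</p>
        <preformat>
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="deepset/xlm-roberta-base-squad2",
    tokenizer="deepset/xlm-roberta-base-squad2",
    batch_size=16,
)

keywords = ["कुंड", "घाट", "मंदिर", "वाराणसी"]  # illustrative domain keywords

def answer(question, context):
    pred = qa(question=question, context=context)   # returns "score", "start", "end", "answer"
    expanded = expand_to_sentence(context, pred["start"], pred["end"])
    return filter_prediction(expanded, pred["score"], keywords)
        </preformat>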
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Zero-Shot Prompting with a Large Language Model</title>
        <p>
          As an ablation experiment, we used a large language model (LLM), specifically ChatGPT / GPT-4o mini
[
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], in a zero-shot setting. This model was not part of the official runs due to task restrictions
prohibiting closed-source systems. Unlike extractive QA, LLMs generate answers as free-form sequences
of text rather than extracting spans. This approach does not require fine-tuning on domain-specific
data.
        </p>
        <p>For each question-context pair (Q, C), we construct a detailed prompt that instructs the model to
answer strictly using the provided context. The prompt is formulated as:
prompt = ‘Please provide answer based on the given context only’
Q = मणिकर्णिका चक्र पुष्करिणी कुंड लाल बहादुर शास्त्री अंतरराष्ट्रीय हवाई अड्डे (वाराणसी) से कितनी दूर है?
C = मणिकर्णिका चक्र पुष्करिणी कुंड लाल बहादुर शास्त्री अंतरराष्ट्रीय ह...</p>
        <p>Advantages and rationale: The zero-shot LLM approach offers several benefits:
• Fluency: Answers are generated in grammatically correct and natural Hindi.
• Contextual completeness: The model can combine information from multiple sentences to
produce richer answers.
• High performance without fine-tuning: The model performs well in this domain, making
zero-shot prompting effective.</p>
        <p>Implementation Details: Prompts are submitted in batches of 50 question-context pairs via the
OpenAI API. Responses are parsed to extract the answer segment, discarding any additional commentary.</p>
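        <p>A minimal sketch of this prompting loop with the OpenAI Python client is shown below; the chunked iteration mirrors the batch size of 50, while the exact message layout is an assumption for illustration.</p>
        <preformat>
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = "Please provide answer based on the given context only"

def zero_shot_answers(pairs, model="gpt-4o-mini", chunk=50):
    """pairs: list of (question, context) tuples; processed in chunks of 50."""
    answers = []
    for i in range(0, len(pairs), chunk):
        for question, context in pairs[i:i + chunk]:
            resp = client.chat.completions.create(
                model=model,
                messages=[
                    {"role": "system", "content": PROMPT},
                    {"role": "user", "content": f"Q = {question}\nC = {context}"},
                ],
            )
            answers.append(resp.choices[0].message.content.strip())
    return answers
        </preformat>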
        <p>Results from this ablation are reported separately for reference and are excluded from official
leaderboard discussion.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>The VATIKA 2025 Shared Task evaluated submissions on Test Data-II using three complementary
families of metrics: (i) QA-F1, the primary measure balancing precision and recall; (ii) BLEU-1 to BLEU-4,
assessing lexical overlap and fluency across increasing n-gram lengths; and (iii) ROUGE-L, capturing
the longest common subsequence and content coverage. The official leaderboard, covering all
participating teams and runs, is presented in Table 2.</p>
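      <p>For reference, QA-F1 is typically computed as a token-level F1 between the predicted and gold answers, as in SQuAD-style evaluation; a minimal sketch (whitespace tokenization, no task-specific normalization) is shown below, though the official scorer may differ in details.</p>
      <preformat>
from collections import Counter

def qa_f1(prediction, reference):
    """Token-level F1 between a predicted and a gold answer (SQuAD-style)."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    ref_counts = Counter(ref_tokens)
    overlap = sum(min(n, ref_counts[tok]) for tok, n in Counter(pred_tokens).items())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
      </preformat>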
      <p>IReL’s submissions show a clear progression across runs. Run 1 established a baseline (QA-F1 of
0.4169, BLEU-4 of 15.4), but its precision and recall were limited. Run 2 improved moderately in QA-F1
(0.4612), indicating better overall ranking, while maintaining similar BLEU and ROUGE-L values.</p>
      <p>Compared with other teams, IReL’s Run 2 is highly competitive. Its QA-F1 of 0.4612 surpasses all
runs from CSE_SVNIT, MUCS, Namaste NLP, NLP Fusion, and IIIT Surat, while also outperforming
AiNauts (best QA-F1 of 0.4529). In summary, IReL demonstrated steady improvements across its two
runs, culminating in Run 2, which achieved competitive performance against the best systems in the
task.</p>
      <sec id="sec-5-1">
        <title>Team</title>
        <p>AiNauts
CSE_SVNIT
IIIT Surat</p>
      </sec>
      <sec id="sec-5-2">
        <title>IReL</title>
        <p>MUCS
Namaste NLP
NLP Fusion
Scalar
VA-BO-INTERN
Run
Run 1
Run 2
Run 1
Run 2
Run 3
Run 1
Run 2
Run 3
Run 1
Run 2
Run 1
Run 2
Run 3
Run 1
Run 2
Run 3
Run 1
Run 1
Run 2
Run 3
Run 1
Run 2
Run 3</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.1. Ablation: Zero-Shot Closed-Source Baseline</title>
        <p>While Runs 1 and 2 were submitted officially, an additional ablation using a closed-source ChatGPT
model (Run 3) yielded higher scores (QA-F1 of 0.5507, BLEU-1 of 61.5, BLEU-4 of 17.9 and ROUGE-L of
0.0824). These results are provided solely for diagnostic comparison and are excluded from task
evaluation due to the use of proprietary models. However, this study indicates the potential of large models
for low-resource Hindi QA, motivating exploration of open-source instruction-tuned counterparts in
the future.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Future Work</title>
      <p>The VATIKA 2025 Shared Task showed the difficulty of Hindi question answering in the tourism
domain of Varanasi, where data scarcity and linguistic complexity limit system performance. Among the
official open-source submissions, Run 2 achieved the best performance. An additional ablation with
ChatGPT indicated the potential of large models for low-resource Hindi QA. These results confirm
that careful refinement leads to better balance across lexical fluency, semantic coverage, and retrieval
precision. Still, challenges remain. Systems struggle with domain-specific terms, long contexts, and
ambiguous user queries. Future work should focus on fine-tuning multilingual transformers on Hindi
tourism data, and using retrieval-augmented generation to improve context–answer alignment.
Post-processing can help make outputs more complete and fluent. Hybrid pipelines combining extractive
accuracy with generative flexibility may further improve results. Incorporating structured knowledge
of cultural sites can add robustness. Domain-adaptive evaluation and query expansion strategies may
also raise coverage. Together, these directions can push Hindi QA toward more accurate, fluent, and
user-friendly systems in specialized low-resource settings.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT and Grammarly for grammar and
spelling checking and for paraphrasing and rewording. After using these tools, the authors reviewed
and edited the content as needed and take full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Rajpurkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lopyrev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          , Squad:
          <volume>100</volume>
          ,000+
          <article-title>questions for machine comprehension of text</article-title>
          ,
          <source>in: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>2383</fpage>
          -
          <lpage>2392</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sun</surname>
          </string-name>
          , D. Cheng,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Gan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Investigating transferability of pre-trained language models for neural question answering</article-title>
          , arXiv preprint arXiv:
          <year>1908</year>
          .
          <volume>08962</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ruder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Swayamdipta</surname>
          </string-name>
          , T. Wolf,
          <article-title>Transfer learning in natural language processing</article-title>
          ,
          <source>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorials</source>
          (
          <year>2019</year>
          )
          <fpage>15</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fisch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bordes</surname>
          </string-name>
          ,
          <article-title>Reading wikipedia to answer open-domain questions</article-title>
          ,
          <source>in: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL)</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1870</fpage>
          -
          <lpage>1879</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Gatla</surname>
          </string-name>
          , Anushka,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kanwar</surname>
          </string-name>
          , G. Sahoo,
          <string-name>
            <given-names>R. K.</given-names>
            <surname>Mundotiya</surname>
          </string-name>
          ,
          <article-title>Tourism question answer system in indian language using domain-adapted foundation models, arXiv preprint (</article-title>
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Ling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Zero-shot question answering by prompting pre-trained language models</article-title>
          , arXiv preprint arXiv:
          <year>2009</year>
          .
          <volume>07118</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems (NeurIPS)</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>5998</fpage>
          -
          <lpage>6008</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          , Bert:
          <article-title>Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Conneau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Khandelwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chaudhary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Wenzek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Guzmán</surname>
          </string-name>
          , E. Grave,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>Unsupervised cross-lingual representation learning at scale, in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</article-title>
          , Online,
          <year>2020</year>
          , pp.
          <fpage>8440</fpage>
          -
          <lpage>8451</lpage>
          . doi:
          <volume>10</volume>
          .18653/v1/
          <year>2020</year>
          . acl- main.747.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C.</given-names>
            <surname>Rafel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Matena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Exploring the limits of transfer learning with a unified text-to-text transformer</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>21</volume>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>67</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>T. B. Brown</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Mann</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Ryder</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Subbiah</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Kaplan</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Dhariwal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Neelakantan</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Shyam</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Sastry</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Askell</surname>
          </string-name>
          , et al.,
          <article-title>Language models are few-shot learners</article-title>
          , arXiv preprint arXiv:
          <year>2005</year>
          .
          <volume>14165</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>OpenAI</surname>
          </string-name>
          , Gpt-4
          <source>technical report, arXiv preprint arXiv:2303.08774</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chanda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tewari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <article-title>Leveraging chatgpt and xlm-roberta for sarcasm detection in dravidian code-mixed languages</article-title>
          ,
          <source>in: Proceedings of FIRE (Working Notes)</source>
          ,
          <source>Forum for Information Retrieval Evaluation</source>
          ,
          <year>2024</year>
          , India,
          <year>2024</year>
          . URL: https://ceur-ws.
          <source>org/</source>
          Vol-
          <volume>4054</volume>
          /
          <fpage>T4</fpage>
          -14.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] D. Kakwani, A. Ghosal, M. Shrivastava, S. Sitaram, V. Sastry, P. Talukdar, IndicNLP suite: Monolingual corpora, evaluation benchmarks and pre-trained multilingual language models for Indian languages, in: Findings of the Association for Computational Linguistics (EMNLP), 2020, pp. 4947-4958.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] C. Clerwall, D. Y. Tang, A survey of question answering in low-resource languages, ACM Computing Surveys 54 (2021) 1-34.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] T. Pires, E. Schlinger, D. Garrette, How multilingual is multilingual BERT?, in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Florence, Italy, 2019, pp. 4996-5001. doi:10.18653/v1/P19-1493.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] J. Phang, X. Guo, K. Tran, K. Cho, English is enough! Leveraging English data in code-switching language modeling, in: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL), 2021, pp. 2421-2435.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C. H. So, J. Kang, BioBERT: a pre-trained biomedical language representation model for biomedical text mining, Bioinformatics 36 (2020) 1234-1240.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] I. Chalkidis, M. Fergadiotis, P. Malakasiotis, N. Aletras, I. Androutsopoulos, LEGAL-BERT: The muppets straight out of law school, in: Findings of the Association for Computational Linguistics: EMNLP 2020, Association for Computational Linguistics, Online, 2020, pp. 2898-2904. doi:10.18653/v1/2020.findings-emnlp.261.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] I. Beltagy, K. Lo, A. Cohan, SciBERT: A pretrained language model for scientific text, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong, China, 2019, pp. 3615-3620. doi:10.18653/v1/D19-1371.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] S. Gupta, P. Malik, A. Jaiswal, S. Jha, R. Prasad, Neural-symbolic approaches in natural language processing: A survey, arXiv preprint arXiv:2105.06375 (2021).</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] Z. Dai, Y. Sun, Y. Zhang, Q. Liu, A survey of knowledge-enhanced text generation, IEEE Transactions on Knowledge and Data Engineering 33 (2021) 3567-3584.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>