<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>Varanasi Tourism in Question Answer System Track: IIIT SURAT @ FIRE'25 Shared Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ritesh Kumar</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sumit Chand Jaiswal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dhiraj Bhatia</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Biological Sciences and Engineering, Indian Institute of Technology Gandhinagar</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science and Engineering, Indian Institute of Information Technology Surat</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
<p>This paper presents our approach to the VATIKA: Varanasi Tourism in Question Answering System Track at FIRE 2025, conducted by the Indian Institute of Information Technology Surat. The task focuses on developing a domain-specific Question Answering (QA) system for tourism-related queries in Hindi, particularly centered on the culturally significant city of Varanasi. To address this challenge, we propose a hybrid architecture that integrates semantic retrieval with extractive question answering. Our system leverages Facebook AI Similarity Search (FAISS) for efficient similarity search in high-dimensional vector spaces. Contextual embeddings are generated using IndicBERT, a multilingual ALBERT-based transformer model pretrained on major Indic languages. These embeddings are indexed within FAISS to enable fast and accurate retrieval of semantically relevant contexts for a given user query. The retrieved context is subsequently processed by a fine-tuned IndicBERT-based extractive QA model, which predicts the start and end token positions of the answer span within the passage. This two-stage retrieval and comprehension framework improves computational efficiency while maintaining contextual relevance. We submitted three system runs for the shared task. Although IndicBERT proved effective for both embedding generation and question answering, the overall performance was constrained by challenges in capturing nuanced linguistic characteristics of pure Hindi text, particularly domain-specific expressions and culturally grounded references. Our findings highlight the importance of domain adaptation and language-specific fine-tuning for Hindi QA systems. Future improvements may include enhanced Hindi-specific pretraining, incorporation of linguistic features, and improved retrieval strategies to better address semantic variability in tourism-related queries.</p>
      </abstract>
      <kwd-group>
        <kwd>VATIKA</kwd>
        <kwd>FAISS</kwd>
        <kwd>ALBERT</kwd>
        <kwd>IndicBERT</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Tourism plays a vital role in India’s socio-economic development by generating income, creating
employment opportunities, and supporting local businesses [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Beyond its economic contribution,
tourism also facilitates cultural exchange, promotes the preservation of heritage, and accelerates
infrastructural growth [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. By attracting global visitors, tourism not only enhances international
visibility but also instills regional pride, positioning itself as a key driver of sustainable development
and global cooperation in the travel and tourism sector [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Among India’s most prominent destinations,
Varanasi (Kashi) holds a unique position as one of the world’s oldest living cities. It is revered as a
cultural and spiritual hub, attracting millions of domestic and international tourists seeking spiritual
awakening, cultural enrichment, and experiential travel [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Known for its Bhakti-Bhaav (devotional
ethos), Varanasi reflects the living traditions of India and continues to be a vibrant center for pilgrimage,
cultural festivities, and heritage tourism [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Despite its global significance, the tourism experience in Varanasi can often be hindered by limited
access to authentic and structured information. Tourists frequently seek reliable guidance regarding
religious rituals such as the Ganga Aarti, local services including cruise rides, food courts, public facilities,
travel agencies, ashrams, temples, kunds, museums, and general cultural events [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Traditional modes
of information dissemination, such as guidebooks or physical helpdesks, are often insufficient in meeting
the diverse and immediate queries posed by modern tourists, particularly in Indian languages [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. This work was carried out for the shared task at the Forum for Information Retrieval Evaluation, December 17-20, 2025, Varanasi, India.
      </p>
      <p>
In this context, Natural Language Processing (NLP) offers promising solutions through Question
Answering (QA) systems, which are designed to automatically respond to user queries in natural
language using structured databases or unstructured text resources [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. By combining domain specificity
with multilingual capabilities, QA systems can enhance tourist experiences by offering precise,
user-friendly, and context-aware information [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>However, Hindi presents challenges including morphological richness, free word order, and limited
annotated datasets. To address these issues, we participated in the VATIKA shared task at FIRE 2025,
proposing a retrieval-augmented QA framework integrating FAISS with IndicBERT.</p>
      <p>Our contributions are:
• A domain-specific Hindi QA pipeline for tourism.
• Integration of FAISS for semantic retrieval.
• Fine-tuning of IndicBERT for extractive QA.
• Empirical evaluation on VATIKA Test Data-II.</p>
      <p>The rest of the paper is organized as follows. Section 2 reviews related work, Section 3 describes the
dataset, Section 4 presents our methodology and experimental setup, and Section 5 discusses results and
analysis. Finally, we conclude in Section 6 with directions for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        QA systems evolved significantly with the introduction of datasets such as SQuAD [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
Transformer-based architectures like BERT [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] improved contextual understanding, while ALBERT [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] reduced
parameter redundancy.
      </p>
      <p>
        Multilingual BERT extended support to multiple languages but showed limitations for low-resource
languages. AI4Bharat introduced IndicBERT [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], a multilingual ALBERT-based model trained on
Indian languages.
      </p>
      <p>
        Retrieval-Augmented QA approaches combine semantic retrieval with answer extraction. FAISS [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]
enables efficient similarity search in high-dimensional vector spaces and is widely used in open-domain
QA frameworks. Tourism-based QA systems remain underexplored for Indian languages. Our work
contributes by combining IndicBERT with FAISS for Hindi tourism QA.
      </p>
      <p>
        For multilingual settings, mBERT and XLM-R [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] extended Transformer architectures to support
multiple languages. However, studies have shown that multilingual models often underperform on
low-resource languages due to limited language-specific supervision. IndicBERT [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], noted above, was trained specifically on Indian language
corpora and demonstrated promising results in tasks such as classification, NER, and QA
across several Indic languages. In addition to extractive QA, open-domain QA systems have gained
attention. Dense Passage Retrieval (DPR) [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] introduced dual-encoder retrieval models that learn dense
embeddings for efficient document retrieval. Retrieval-Augmented Generation (RAG) [16] combined
retrieval with generative models, improving answer quality in open-domain settings. These systems
highlight the importance of retrieval mechanisms for improving context relevance. For Indian language
QA, research remains comparatively limited. Several studies have explored Hindi QA using mBERT and
multilingual Transformer models, but domain-specific tourism QA datasets have been scarce. The FIRE
evaluation campaigns have played a crucial role in promoting Indian language IR and QA research. The
VATIKA shared task focuses specifically on tourism queries in Hindi, providing a structured benchmark
for evaluating domain-adapted QA systems.
      </p>
      <p>In the tourism domain, conversational agents and chatbots have been proposed to assist travelers
with itinerary planning and local information access. However, many of these systems are primarily
English-centric and rely on generative approaches without robust domain grounding. Our work differs
by focusing on extractive QA with domain-specific retrieval in Hindi, leveraging IndicBERT embeddings
and FAISS-based semantic indexing.</p>
      <p>Overall, prior research highlights three important directions:
• leveraging Transformer-based contextual encoders,
• integrating retrieval mechanisms for improved relevance, and
• adapting models to low-resource languages.
Our approach builds upon these principles to design a retrieval-augmented Hindi tourism QA system
tailored to the VATIKA benchmark.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Data</title>
      <p>We use the test collection provided by the FIRE 2025 VATIKA organizers for the shared task [17]. The
test data is divided into two parts: Test Data-I and Test Data-II. Test Data-I was provided for the initial
stage, and Test Data-II for the final system submission. Focused on the culturally significant city of Varanasi,
the dataset captures realistic queries that travelers and pilgrims commonly raise regarding locations,
services, logistics, and spiritual landmarks.</p>
      <p>VATIKA is distinctive in its coverage of ten tourism-relevant domains: Ganga Aarti, Cruise, Food
Court, Public Toilet, Kund, Museum, General Queries, Ashram, Temple, and Travel. Each domain
contains carefully curated Hindi passages (in Devanagari script), paired with multiple question–answer
sets. The dataset is designed to simulate authentic information-seeking behavior by including questions
that span factual, navigational, and experiential types, thereby ensuring comprehensive coverage of
diverse tourist concerns. Entirely developed in Hindi, VATIKA provides paragraph-level contexts with
associated QA pairs, making it a valuable linguistic resource for the Indian tourism sector. It supports
both open-domain QA and contextual MRC-style QA, offering researchers and developers a benchmark
for building and evaluating robust, user-centric systems tailored to Indian language contexts. The
provided split yields a training set of 5,538 contexts with 13,408 QA pairs and a validation set of
1,158 contexts with 2,963 QA pairs.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology and Experimental Setup</title>
      <p>The system operates in two main phases: training and inference. Data preprocessing was required
for the task. We present a Hindi extractive question-answering (QA) framework that integrates text
normalization, data structuring, tokenization, fine-tuning, and context retrieval into a unified pipeline.
Hindi text, including both questions and answers, is first standardized through text normalization
to ensure consistency and improve downstream performance. Question–answer pairs and their contexts
are then parsed from JSON data, with the script augmenting each context to guarantee inclusion of the
correct answer span, an essential step for extractive QA tasks. The text is subsequently tokenized with
the IndicBERT tokenizer, which converts the input into numerical input_ids and attention_mask tensors
while mapping character-level answer boundaries to token-level indices for supervised training. A
pre-trained IndicBERT model is fine-tuned for QA using the AutoModelForQuestionAnswering
class, and the Hugging Face Trainer API manages optimization under specified hyperparameters such
as epoch count, batch size, and logging frequency. For inference, the system employs an embedding
model with FAISS-based semantic search to retrieve the most relevant context for a given user query.
The fine-tuned model then predicts the start and end tokens of the answer span, which are decoded
back into fluent Hindi text, with fallback responses provided when no confident answer is available.</p>
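      <p>The character-to-token boundary mapping described above can be sketched with a simplified whitespace tokenizer; the real pipeline would rely on the IndicBERT subword tokenizer's offset mapping, so this is an illustrative stand-in, not the system's actual code:</p>
      <preformat>
```python
def tokenize_with_offsets(text):
    """Whitespace tokenizer recording (start, end) character offsets; a
    simplified stand-in for a subword tokenizer's offset mapping."""
    tokens, offsets, pos = [], [], 0
    for tok in text.split():
        start = text.index(tok, pos)
        tokens.append(tok)
        offsets.append((start, start + len(tok)))
        pos = start + len(tok)
    return tokens, offsets

def char_span_to_token_span(offsets, ans_start, ans_end):
    """Map character-level answer boundaries to token indices, the step used
    to build start/end labels for extractive QA training."""
    start_tok = end_tok = None
    for i, (s, e) in enumerate(offsets):
        if start_tok is None and ans_start in range(s, e):
            start_tok = i
        if ans_end in range(s + 1, e + 1):
            end_tok = i
    return start_tok, end_tok

context = "Ganga Aarti is held daily at Dashashwamedh Ghat in Varanasi"
answer = "Dashashwamedh Ghat"
ans_start = context.index(answer)
tokens, offsets = tokenize_with_offsets(context)
span = char_span_to_token_span(offsets, ans_start, ans_start + len(answer))
print(span, " ".join(tokens[span[0]:span[1] + 1]))  # → (6, 7) Dashashwamedh Ghat
```
      </preformat>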
    </sec>
    <sec id="sec-5">
      <title>5. Results and Analysis</title>
      <p>For this work, we employ IndicBERT (ai4bharat/indic-bert), a multilingual ALBERT-based model
pretrained on 12 major Indic languages, including Hindi. The model is utilized for two core tasks within
our system:
• Embedding Generation: IndicBERT is used to encode text into dense vector representations that
capture semantic meaning. These embeddings form the foundation of the system’s similarity
search component.
• Question Answering: The model is further fine-tuned for extractive QA, enabling it to predict the
start and end positions of answers within a given passage, thereby supporting precise information
retrieval.</p>
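      <p>The start/end prediction step can be illustrated with a small decoding sketch: given per-token start and end scores, pick the highest-scoring valid span. The length cap, the threshold-based fallback, and the toy scores below are our own illustrative choices, not documented details of the submitted system:</p>
      <preformat>
```python
import math

def decode_span(start_logits, end_logits, tokens, threshold=0.0, max_len=10):
    """Choose the best answer span: maximize start_logits[i] + end_logits[j]
    over pairs with j in [i, i + max_len). Mirrors standard extractive-QA
    decoding; the threshold fallback is an illustrative heuristic."""
    best_score, best_span = -math.inf, None
    for i, s in enumerate(start_logits):
        for j in range(i, min(i + max_len, len(end_logits))):
            score = s + end_logits[j]
            if score > best_score:
                best_score, best_span = score, (i, j)
    if best_span is None or threshold > best_score:
        return "क्षमा करें, उत्तर उपलब्ध नहीं है।"  # fallback when confidence is low
    i, j = best_span
    return " ".join(tokens[i:j + 1])

tokens = ["गंगा", "आरती", "दशाश्वमेध", "घाट", "पर", "होती", "है"]
start_logits = [0.1, 0.2, 3.0, 0.3, 0.1, 0.0, 0.0]
end_logits = [0.0, 0.1, 0.5, 2.5, 0.2, 0.1, 0.0]
print(decode_span(start_logits, end_logits, tokens))  # → दशाश्वमेध घाट
```
      </preformat>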
      <p>To enable efficient retrieval, we integrate FAISS for similarity search in high-dimensional vector spaces.
All contextual embeddings produced by IndicBERT are indexed using FAISS. Upon receiving a query,
the system retrieves the most semantically relevant context from the index, which is subsequently
passed to the QA model. This significantly enhances both the accuracy and efficiency of the system.
Our implementation relies heavily on the Hugging Face ecosystem, particularly the transformers and
datasets libraries. The transformers library provides access to the pre-trained IndicBERT model and the
Trainer API for fine-tuning, while the datasets library supports preprocessing, formatting, and efficient
handling of training and validation data.</p>
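      <p>The retrieval step can be sketched without external dependencies. The tiny index below performs exact inner-product search over L2-normalized vectors, which is what faiss.IndexFlatIP does at scale; the toy vectors are placeholders for IndicBERT embeddings, and the class is our own stand-in, not the FAISS API:</p>
      <preformat>
```python
import math

def normalize(v):
    """L2-normalize so that inner product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

class FlatIPIndex:
    """Minimal stand-in for faiss.IndexFlatIP: exact inner-product search."""
    def __init__(self):
        self.vectors = []
    def add(self, vecs):
        self.vectors.extend(vecs)
    def search(self, query, k=1):
        scores = [sum(q * x for q, x in zip(query, v)) for v in self.vectors]
        order = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
        return [(i, scores[i]) for i in order]

# Toy context embeddings (placeholders for IndicBERT sentence vectors).
contexts = ["Ganga Aarti timings", "Museum entry fee", "Cruise booking"]
embs = [normalize(v) for v in [[1.0, 0.2, 0.0], [0.0, 1.0, 0.3], [0.2, 0.1, 1.0]]]
index = FlatIPIndex()
index.add(embs)

query = normalize([0.9, 0.1, 0.1])  # a hypothetical "aarti"-like query vector
top = index.search(query, k=1)
print(contexts[top[0][0]])  # → Ganga Aarti timings
```
      </preformat>
      <p>The retrieved context would then be passed to the fine-tuned QA model exactly as in the pipeline above.</p>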
      <p>Our model’s performance is hindered by its struggle with the nuances of pure Hindi datasets, particularly
when utilizing the AI4Bharat IndicBERT model. Key contributing factors include potential mismatches
between the model’s training data and our specific dataset, as well as limitations in capturing linguistic
intricacies unique to Hindi. These challenges suggest avenues for improvement, such as fine-tuning
the model with Hindi-specific datasets or incorporating additional linguistic features tailored to the
language. The scores obtained by our three runs are given in Table 1; we used Test Data-II
for testing our system. The official evaluation measures of VATIKA'25 are F1 score, BLEU score, and
ROUGE-L score. Our best performance comes from run IIIT Surat-03-05072025, which uses IndicBERT.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Future Work</title>
      <p>This year we participated in the VATIKA shared task: Varanasi Tourism in Question Answer System.
We integrated FAISS, which enables efficient similarity search in high-dimensional vector
spaces. By indexing the contextual embeddings generated by IndicBERT, the system can quickly
identify the most semantically relevant context for a given query. Passing this retrieved context
to the QA model ensures more accurate and context-aware responses, thereby enhancing both the
effectiveness and efficiency of the overall system. Although our overall performance was poor, the
initial results indicate what should be done next. Future efforts may explore conversational agents
capable of handling multi-turn dialogues, allowing tourists to refine and contextualize their queries in
real time. Integration with real-time services, such as transport schedules, weather updates, and
ticketing platforms, could further enhance the system's practical utility. Additionally, personalized
recommendation systems based on user preferences (e.g., spiritual, cultural, or culinary tourism)
represent an interesting research direction. We plan to explore some of these directions in future work.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Acknowledgment</title>
      <p>This work and the author’s participation in the conference were supported by the ANRF-PAIR Scheme,
Government of India (Sanction Order No. ANRF/PAIR/2025/000008/PAIR).</p>
    </sec>
    <sec id="sec-8">
      <title>8. Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used OpenAI GPT-4 for grammar and spelling checking.
After using this tool, the authors reviewed and edited the content as needed and take full responsibility
for the publication's content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>W.</given-names>
            <surname>Travel</surname>
          </string-name>
          , T. E. Impact,
          <source>World travel &amp; tourism council (wttc)</source>
          ,
          <source>Travel &amp; Tourism Economic Impact</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K. S.</given-names>
            <surname>Rana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. V.</given-names>
            <surname>Manshin</surname>
          </string-name>
          ,
          <article-title>Contribution of tourism in india's gdp in pre-and post-pandemic scenarios</article-title>
          , in: Sustainable Development of Transport: Economy, Transformation, Logistics and
          <string-name>
            <given-names>ESG</given-names>
            <surname>Agenda</surname>
          </string-name>
          . Volume
          <volume>2</volume>
          , Springer,
          <year>2025</year>
          , pp.
          <fpage>223</fpage>
          -
          <lpage>231</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Hall</surname>
          </string-name>
          , G. Richards,
          <article-title>Tourism and sustainable community development</article-title>
          , volume
          <volume>1</volume>
          ,
          <string-name>
            <surname>Routledge</surname>
            <given-names>London</given-names>
          </string-name>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R. P.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <article-title>Professor rb singh (1955˜ 2021), an icon of indian geography: A passage on the path of lineage, legacy and liminality</article-title>
          ,
          <source>Space and Culture</source>
          <volume>9</volume>
          (
          <year>2021</year>
          )
          <fpage>06</fpage>
          -
          <lpage>49</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <article-title>Religious tourism and ascetic integrity: A sociological study of economic dependency and sacred authenticity in varanasi (</article-title>
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T. N.</given-names>
            <surname>Feizabadi</surname>
          </string-name>
          ,
          <article-title>A critical review of the sustainability of tourism in varanasi</article-title>
          , Department of Geography, Banaras Hindu University (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pande</surname>
          </string-name>
          ,
          <article-title>Religious tourism in uttar pradesh: A case study of varanasi</article-title>
          ,
          <source>CASEPEDIA</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Crawford</surname>
          </string-name>
          ,
          <article-title>Linguistic changes in spontaneous speech for detecting parkinson's disease using large language models</article-title>
          ,
          <source>PLOS Digital Health</source>
          <volume>4</volume>
          (
          <year>2025</year>
          )
          <article-title>e0000757</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Rajpurkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lopyrev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          , Squad:
          <volume>100</volume>
          ,000+
          <article-title>questions for machine comprehension of text</article-title>
          ,
          <source>arXiv preprint arXiv:1606.05250</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          , Bert:
          <article-title>Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers</article-title>
          ),
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Goodman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Gimpel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sharma</surname>
          </string-name>
          , R. Soricut,
          <string-name>
            <surname>Albert:</surname>
          </string-name>
          <article-title>A lite bert for self-supervised learning of language representations</article-title>
          , arXiv preprint arXiv:
          <year>1909</year>
          .
          <volume>11942</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D.</given-names>
            <surname>Kakwani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kunchukuttan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Golla</surname>
          </string-name>
          ,
          <string-name>
            <surname>G. NC</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhattacharyya</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. M. Khapra</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Kumar</surname>
          </string-name>
          , Indicnlpsuite:
          <article-title>Monolingual corpora, evaluation benchmarks and pre-trained multilingual language models for indian languages, in: Findings of the association for computational linguistics</article-title>
          :
          <source>EMNLP</source>
          <year>2020</year>
          ,
          <year>2020</year>
          , pp.
          <fpage>4948</fpage>
          -
          <lpage>4961</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Johnson</surname>
          </string-name>
          , M. Douze,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jégou</surname>
          </string-name>
          ,
          <article-title>Billion-scale similarity search with gpus</article-title>
          ,
          <source>IEEE Transactions on Big Data</source>
          <volume>7</volume>
          (
          <year>2019</year>
          )
          <fpage>535</fpage>
          -
          <lpage>547</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Conneau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Khandelwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chaudhary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Wenzek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Guzmán</surname>
          </string-name>
          , E. Grave,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>Unsupervised cross-lingual representation learning at scale, in: Proceedings of the 58th annual meeting of the association for computational linguistics</article-title>
          ,
          <year>2020</year>
          , pp.
          <fpage>8440</fpage>
          -
          <lpage>8451</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>V.</given-names>
            <surname>Karpukhin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Oguz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Edunov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          , W.-t. Yih,
          <article-title>Dense passage retrieval for open-domain question answering</article-title>
          .,
          <source>in: EMNLP (1)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>6769</fpage>
          -
          <lpage>6781</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel, et al., Retrieval-augmented generation for knowledge-intensive NLP tasks, Advances in Neural Information Processing Systems 33 (2020) 9459–9474.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] P. Gatla, Anushka, N. Kanwar, G. Sahoo, R. K. Mundotiya, Tourism question answer system in Indian language using domain-adapted foundation models, arXiv preprint (2025).</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>