<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Exploring Text-Embedding Retrieval Models for the Italian Language</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yuri Noviello</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabio Tamburini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>FICLIT - University of Bologna</institution>
          ,
          <addr-line>Via Zamboni, 32</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>11</volume>
      <issue>2023</issue>
      <fpage>0000</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>Text retrieval systems have become essential in the field of natural language processing (NLP), serving as the backbone for applications such as search engines, document indexing, and information retrieval. With the rise of generative AI, particularly Retrieval-Augmented Generation (RAG) systems, the demand for robust text retrieval models has increased. However, existing large language models (LLMs) and datasets are often insufficiently optimized for Italian, limiting their performance in Italian text retrieval tasks. This paper addresses this gap by proposing both a data collection and specialized models tailored for Italian text retrieval. Through extensive experimentation, we analyze the improvements and limitations in retrieval performance, paving the way for more effective Italian NLP applications.</p>
      </abstract>
      <kwd-group>
        <kwd>Italian embedding</kwd>
        <kwd>text embedding</kwd>
        <kwd>retrieval model</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In recent years, text retrieval systems have emerged as a cornerstone of the natural language processing (NLP) field. These systems are crucial in various applications, including search engines, document indexing, and information retrieval tasks. Their primary function is to fetch relevant pieces of text from large corpora, enabling efficient and accurate information access. This capability is crucial for numerous industries, including the legal, medical, and customer service sectors, where timely and precise information retrieval can significantly impact decision-making processes.</p>
      <p>
        With the advent of generative AI, the importance of text retrieval systems has only amplified. Advanced systems, particularly chatbots based on Retrieval-Augmented Generation (RAG) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], have become essential tools for various purposes. RAG systems combine retrieval mechanisms with generative models to produce contextually relevant and accurate responses in conversational AI applications. This integration has enhanced the capabilities of chatbots, making them more efficient in providing precise information and engaging in meaningful dialogues.
      </p>
      <p>Despite the impressive performance of recent large language models (LLMs) as conversational agents in Italian contexts, there remains a notable gap in the resources and models specifically designed for Italian text retrieval tasks. This shortfall highlights a significant area for improvement and development within the Italian NLP community.</p>
      <p>To address this gap, our work aims to propose both novel datasets and specialized models optimized for Italian text retrieval. By focusing exclusively on the Italian language, we strive to enhance the performance of retrieval tasks.</p>
      <p>The primary contribution of this paper is the introduction of a comprehensive Italian text retrieval system, encompassing both a curated dataset collection and specialized language models. Through extensive experimentation and rigorous evaluation, we demonstrate the effectiveness of our approach, setting the stage for more advanced and reliable Italian text retrieval solutions applicable across diverse tasks.</p>
    </sec>
    <sec id="sec-1b">
      <title>2. Related Works</title>
      <p>
        The development of text embedding models has seen significant advancements over the years, evolving from simple word representations to sophisticated contextual embeddings. Early models like Word2Vec [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and GloVe [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] set the foundation by capturing semantic relationships between words through fixed-size vector representations. These models, however, lacked the ability to understand context, leading to the development of more advanced techniques.
      </p>
      <p>
        Transformers have revolutionized the field of NLP by introducing mechanisms to capture context and relationships across entire sentences. BERT (Bidirectional Encoder Representations from Transformers [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]) marked a significant milestone, providing deep contextualized word embeddings by considering both left and right context, and it has served as the basis for various large language models (LLMs), such as GPT-3 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and T5 [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which further extend the capabilities of transformers by scaling up model size and training data.
      </p>
      <p>
        Sentence Transformers, an extension of the transformer architecture [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], focus on generating embeddings for whole sentences rather than individual words. Models like SBERT (Sentence-BERT) enhance the performance of sentence-level tasks, such as semantic textual similarity and information retrieval, by fine-tuning BERT specifically for sentence embeddings. This approach has demonstrated significant improvements in capturing the semantic meaning of sentences, but specific training corpora, annotated with sentence similarity scores, must be provided for setting up the system.
      </p>
      <p>In the realm of multilingual models, the multilingual E5 family has emerged as a robust solution for handling multiple languages within a single model architecture [8]. These models are pre-trained on a multilingual corpus, enabling them to perform effectively across different linguistic contexts. The multilingual E5 models leverage the strengths of transformer architectures to provide high-quality embeddings for numerous languages, including less-resourced ones. This makes them particularly valuable for tasks requiring cross-lingual understanding and retrieval.</p>
      <p>The continuous evolution of text embedding models, from standard embeddings to advanced transformer-based approaches, highlights the dynamic nature of NLP research. Each progression addresses the limitations of its predecessors, contributing to more accurate and context-aware representations, which are crucial for a wide array of applications in natural language understanding and information retrieval.</p>
    </sec>
    <sec id="sec-2">
      <title>3. Data</title>
      <p>The quality and abundance of the data are among the main aspects of obtaining high-quality text embedding models. The data used in this work for training the models were adapted from the following datasets: MIRACL [9], SQuAD-it [10], MLDR [11] and WikipediaQA-ita [12]. Among these, only the Multilingual Long-Document Retrieval (MLDR) dataset was used as-is, as it already contains 2,151 examples of Italian triplets in the form of query-positive passage-negative passage. The following sections detail the processing of the other datasets.</p>
      <sec id="sec-2-1">
        <title>3.1. MIRACL-it</title>
        <p>The Multilingual Information Retrieval Across a Continuum of Languages (MIRACL) dataset is widely used for building multilingual information retrieval models, such as the multilingual E5 models family [8]. Although the dataset encompasses 18 different languages, it does not include any Italian data. Given the dataset's high quality, particularly in defining hard negatives through manual annotation, we decided to translate the dataset into Italian using automated methods. In particular, we focused on the English section of the dataset, which is organized as shown in Table 1.</p>
        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption><p>English data organization of MIRACL</p></caption>
          <table>
            <thead>
              <tr><th>Split</th><th>Query</th><th>Passage</th></tr>
            </thead>
            <tbody>
              <tr><td>train</td><td>2,863</td><td>29,416</td></tr>
              <tr><td>dev</td><td>799</td><td>8,350</td></tr>
              <tr><td>corpus</td><td>-</td><td>32,893,221</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The translation process aimed to preserve these qualities while adapting the content to Italian, thereby creating a robust resource for training and evaluating Italian text retrieval models. To translate the dataset, we experimented with two different approaches: a large language model (LLM) translation via the PaLM 2 API [13] and an open-source offline translation via Argos Translate [14]. The translation quality was evaluated to ensure that the Italian version maintained the dataset's integrity and usefulness for training effective retrieval models.</p>
        <sec id="sec-2-1-1">
          <title>3.1.1. Datasets translation using PaLM 2</title>
          <p>We performed the translation of the whole training and development English sets of MIRACL using the PaLM 2 API [13]. Due to budget constraints, we did not translate the entire corpus, as it would have required approximately €10,000, given the huge number of documents. We used the following prompt to obtain the Italian translation:
Translate the following text in Italian.
Write the translation only:
{text}</p>
          <p>We used the same prompt for both queries and documents. For documents, we used the model text-bison-32k@002, and for queries, we relied on text-bison@002. This resulted in a total of 37,351 API calls, as some documents are associated with multiple queries.</p>
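          <p>As an illustration, the sketch below shows how such a call can be issued through the Vertex AI SDK; the project id is a placeholder, and this is a minimal reconstruction under those assumptions rather than the exact script used for the API calls.</p>
          <preformat>
# Sketch: translating MIRACL text with the PaLM 2 API via the Vertex AI SDK.
# Assumes a Google Cloud project with Vertex AI enabled; model names follow Sec. 3.1.1.
import vertexai
from vertexai.language_models import TextGenerationModel

vertexai.init(project="my-gcp-project", location="us-central1")  # placeholder project id

PROMPT = "Translate the following text in Italian.\nWrite the translation only:\n{text}"

def translate(text: str, model_name: str) -> str:
    model = TextGenerationModel.from_pretrained(model_name)
    response = model.predict(PROMPT.format(text=text), temperature=0.0)
    return response.text

# Queries use the smaller model, documents the 32k-context variant.
italian_query = translate("What is the capital of Italy?", "text-bison@002")
italian_doc = translate("Rome is the capital city of Italy...", "text-bison-32k@002")
          </preformat>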
        </sec>
        <sec id="sec-2-1-2">
          <title>3.1.2. Open-source offline translation using Argos Translate</title>
          <p>Argos Translate is an open-source library that uses OpenNMT for translation and supports multiple language model packages [14]. We utilized the English-to-Italian model to translate the training and development sets of MIRACL, including the entire corpus.</p>
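          <p>A minimal sketch of this offline pipeline, following the documented Argos Translate API, is shown below; the package installation step is only required once.</p>
          <preformat>
# Sketch: offline English-to-Italian translation with Argos Translate.
import argostranslate.package
import argostranslate.translate

# Download and install the en->it package once (requires network on first run).
argostranslate.package.update_package_index()
available = argostranslate.package.get_available_packages()
pkg = next(p for p in available if p.from_code == "en" and p.to_code == "it")
argostranslate.package.install_from_path(pkg.download())

italian = argostranslate.translate.translate("Rome is the capital of Italy.", "en", "it")
print(italian)
          </preformat>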
        </sec>
        <sec id="sec-2-1-3">
          <title>3.1.3. Translations quality evaluation</title>
          <p>The translation performed by PaLM 2, as reported in the Technical Report [13] and confirmed by our empirical tests, is considered high-quality. To measure the quality of the translation performed by Argos Translate, we used the SOTA automatic metric BLEURT [15], with the PaLM 2 translations as references. Since we do not have the entire corpus translated by the LLM, we conducted the evaluation only on the overlapping portion of the translated datasets, resulting in a corpus of 33,689 documents. The average BLEURT score of 0.625 indicates that Argos Translate produced a decent translation, validating its use as a cost-effective alternative for text embedding model fine-tuning and evaluation.</p>
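          <p>The scoring step can be reproduced along the lines of the following sketch, which assumes the reference BLEURT implementation and a downloaded checkpoint (BLEURT-20 here, as an assumption):</p>
          <preformat>
# Sketch: scoring Argos Translate output against PaLM 2 references with BLEURT.
# Assumes the google-research/bleurt package is installed and a checkpoint downloaded.
from bleurt import score

scorer = score.BleurtScorer("BLEURT-20")  # path to the downloaded checkpoint

references = ["Roma è la capitale d'Italia."]     # PaLM 2 translations
candidates = ["Roma è la capitale dell'Italia."]  # Argos Translate translations

scores = scorer.score(references=references, candidates=candidates)
avg = sum(scores) / len(scores)  # the paper reports an average of 0.625
print(f"average BLEURT: {avg:.3f}")
          </preformat>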
        </sec>
      </sec>
      <sec id="sec-2-1a">
        <title>3.2. SQuAD-it</title>
        <p>SQuAD-it is obtained through semi-automatic translation of the SQuAD dataset into Italian; it contains more than 60,000 question-answer pairs. For these experiments, we considered only the question and context attributes of each example. Then, since we need triplets in the form of query-positive passage-negative passage, we performed hard negative mining. We used the standard BM25 algorithm [16] to extract the top-10 similar documents for each query, excluding positive passages for the given query. This process ensured that the dataset was suitably challenging for training robust retrieval models.</p>
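        <p>The following sketch illustrates this mining strategy with the rank_bm25 package (an implementation choice for illustration only; the paper specifies the BM25 algorithm [16] but not a particular library):</p>
        <preformat>
# Sketch: BM25 hard-negative mining, here with the rank_bm25 package.
from rank_bm25 import BM25Okapi

corpus = ["passage one ...", "passage two ...", "passage three ..."]
tokenized_corpus = [doc.lower().split() for doc in corpus]
bm25 = BM25Okapi(tokenized_corpus)

def mine_hard_negatives(query: str, positive_ids: set, k: int = 10) -> list:
    """Return the ids of the top-k BM25 passages, excluding known positives."""
    scores = bm25.get_scores(query.lower().split())
    ranked = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked if i not in positive_ids][:k]

negatives = mine_hard_negatives("which passage ...?", positive_ids={0})
        </preformat>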
      </sec>
      <sec id="sec-2-2">
        <title>3.3. WikipediaQA-ita</title>
        <p>WikipediaQA-ita is a dataset synthetically generated using a custom model from ReDiX Informatica; it has been created in Italian and specifically designed for RAG fine-tuning. It contains more than 100,000 question-answer pairs. Similar to SQuAD-it, we considered only the question and context attributes for each example and applied the same hard negative mining strategy using the BM25 algorithm.</p>
      </sec>
    </sec>
    <sec id="sec-2b">
      <title>4. Methodology</title>
      <sec id="sec-2b-1">
        <title>4.1. Contrastive learning on labeled data</title>
        <p>This work implements a dual-encoder model that uses a combination of supervised loss functions to achieve effective learning. The dual-encoder model encodes queries and passages separately to produce their respective embeddings:
q = Encoder_query(query)   (1)
p = Encoder_passage(passage)   (2)</p>
        <p>The similarity score between a query q and a passage p is computed as the dot product of their embeddings:
s(q, p) = q · p   (3)</p>
        <p>The embeddings are normalized before computing the dot product:
q̂ = q / ‖q‖ and p̂ = p / ‖p‖   (4)
so that the similarity score becomes the cosine similarity:
s(q, p) = q̂ · p̂   (5)</p>
        <p>For a batch of queries and passages, the contrastive loss encourages higher similarity scores for matching query-passage pairs and lower scores for non-matching pairs. The loss function is defined as:
L_cont = -(1/N) Σ_{i=1}^{N} log [ exp(s(q_i, p_i)/τ) / Σ_{j=1}^{N} exp(s(q_i, p_j)/τ) ]   (6)
where N is the batch size, τ is the temperature parameter, and s(q_i, p_i) represents the similarity score for the matching query-passage pair.</p>
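        <p>In PyTorch terms, Eq. (6) with in-batch negatives reduces to a cross-entropy over the scaled similarity matrix, as in the following minimal sketch (names and the temperature value are illustrative):</p>
        <preformat>
# Sketch: in-batch contrastive loss of Eq. (6) in PyTorch.
import torch
import torch.nn.functional as F

def contrastive_loss(q: torch.Tensor, p: torch.Tensor, tau: float = 0.05) -> torch.Tensor:
    """q, p: (N, d) query/passage embeddings; row i of p is the positive for row i of q."""
    q = F.normalize(q, dim=-1)            # Eq. (4): unit-norm embeddings
    p = F.normalize(p, dim=-1)
    sim = q @ p.T / tau                   # Eq. (5): cosine similarities, scaled by temperature
    targets = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(sim, targets)  # -log softmax of the matching pair, batch-averaged

loss = contrastive_loss(torch.randn(8, 768), torch.randn(8, 768))
        </preformat>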
      </sec>
      <sec id="sec-2b-2">
        <title>4.2. Fine-tuning procedure</title>
        <p>We performed our experiments using the following base models:
1. Minerva-1B [17],
2. Qwen2-1.5B [18],
3. Gemma-2B [19].
We relied on the foundational versions of these models.</p>
        <p>To speed up the computation, we implemented a LoRA fine-tuning procedure. As a pooling strategy, we used EOS (End-Of-Sequence) pooling and normalized the embeddings. While we did not apply any prefix to passages, we added the following prefix to queries:
Given a search query, retrieve relevant
passages that answer the query.\nQuery:</p>
        <p>We also experimented with using an Italian text prefix but found no significant difference in performance. Therefore, we opted for an English prefix to maintain consistency with other open-source models.</p>
        <p>The fine-tuning process was executed on a weighted mixture of the datasets reported in Table 2. During this phase, the tokenization of the dataset documents was truncated at 512 tokens. We trained the model in mixed precision for 3 epochs, using a learning rate of 10^-5.</p>
        <table-wrap id="tab2">
          <label>Table 2</label>
          <caption><p>Fine-tuning datasets organization</p></caption>
          <table>
            <thead>
              <tr><th>Source</th><th>Sample</th></tr>
            </thead>
            <tbody>
              <tr><td>MIRACL-it</td><td>100%</td></tr>
              <tr><td>MLDR-it</td><td>100%</td></tr>
              <tr><td>SQuAD-it</td><td>20%</td></tr>
              <tr><td>WikipediaQA-ita</td><td>10%</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>For each model, we conducted two fine-tuning experiments: one using the dataset with MIRACL data translated with PaLM 2 and another using the dataset translated with Argos Translate.</p>
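        <p>The following sketch illustrates this setup with the transformers and peft libraries; the LoRA hyper-parameters and target modules are assumptions for illustration, not the values used in our experiments:</p>
        <preformat>
# Sketch: LoRA adapter plus EOS pooling for a decoder-only base model.
# Illustrative configuration only; hyper-parameters are assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer
from peft import LoraConfig, get_peft_model

name = "Qwen/Qwen2-1.5B"  # one of the three base models
tokenizer = AutoTokenizer.from_pretrained(name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = get_peft_model(
    AutoModel.from_pretrained(name),
    LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"]),
)

PREFIX = "Given a search query, retrieve relevant passages that answer the query.\nQuery: "

def embed(texts, is_query=False):
    if is_query:
        texts = [PREFIX + t for t in texts]           # prefix queries only, not passages
    texts = [t + tokenizer.eos_token for t in texts]  # ensure a final EOS token to pool
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=512, return_tensors="pt")
    hidden = model(**batch).last_hidden_state         # (batch, seq_len, dim)
    last = batch["attention_mask"].sum(dim=1) - 1     # last real token (right padding)
    eos = hidden[torch.arange(hidden.size(0)), last]  # EOS pooling
    return F.normalize(eos, dim=-1)                   # normalized embeddings
        </preformat>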
      </sec>
      <sec id="sec-2b-3">
        <title>4.3. Evaluation procedure</title>
        <p>For the evaluation, we considered only the datasets for which we already had the relevance judgments (Qrels) in the TREC standard format [20], namely MIRACL-it and MLDR-it. This setup allows for a comprehensive evaluation of retrieval systems for the Italian language, encompassing both small/medium and large documents.</p>
        <p>As with the training procedure, we evaluated each model using both the dataset with MIRACL data translated with PaLM 2 and the dataset translated with Argos Translate. To ensure consistency, we conducted evaluations only on the overlapping portions of the datasets between the two translations.</p>
        <p>After creating the embeddings for both the test queries and documents, we used FAISS [21] to retrieve relevant documents. Finally, we employed the original implementation of TREC-eval for metrics computation. We evaluated the models using the following metrics:
1. MRR@10 (Mean Reciprocal Rank): measures the average of the reciprocal ranks of the first relevant document retrieved.
2. Recall@100: measures the proportion of relevant documents retrieved among the top 100 results.
3. nDCG@10 (Normalized Discounted Cumulative Gain): measures the ranking quality by comparing the order of results to the ideal ranking, emphasizing higher ranks.</p>
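        <p>The retrieval step, together with the first of these metrics, can be sketched as follows (exact inner-product search over normalized embeddings; random vectors stand in for real embeddings):</p>
        <preformat>
# Sketch: retrieval with FAISS [21] and MRR@10 over normalized embeddings.
import faiss
import numpy as np

d = 768
doc_emb = np.random.rand(1000, d).astype("float32")  # stand-ins for real embeddings
query_emb = np.random.rand(10, d).astype("float32")
faiss.normalize_L2(doc_emb)                          # inner product == cosine similarity
faiss.normalize_L2(query_emb)

index = faiss.IndexFlatIP(d)                         # exact inner-product search
index.add(doc_emb)
_, ranked_ids = index.search(query_emb, 100)         # top-100 for Recall@100

def mrr_at_10(ranked_ids, relevant):
    """relevant[i]: set of relevant doc ids for query i (the Qrels)."""
    total = 0.0
    for i, ids in enumerate(ranked_ids):
        for rank, doc_id in enumerate(ids[:10], start=1):
            if doc_id in relevant[i]:
                total += 1.0 / rank
                break
    return total / len(ranked_ids)
        </preformat>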
      </sec>
    </sec>
    <sec id="sec-3">
      <title>5. Discussion and Analysis</title>
      <p>We propose a comparison of the performance of different models on our Italian benchmark. For this analysis, we considered the Multilingual Sentence Transformers models [22] and the multilingual versions of the E5 models family. The scores are reported in Table 3.</p>
      <sec id="sec-3-1">
        <title>5.1. Argos vs PaLM</title>
        <p>By observing the performance on the MIRACL sets translated with PaLM 2 and Argos Translate, we found that every model achieved better results on the dataset translated with the PaLM 2 API. This behavior can be attributed to the higher translation quality provided by PaLM 2, which likely offers clearer sentence structures for the models to process.</p>
        <p>However, since the difference in the results is very marginal, we can state that the machine translation provided by Argos Translate is a valid and cost-effective alternative for text embedding modeling.</p>
        <p>On the contrary, we did not find any significant correlation between the models trained with the different translation versions, given their small difference in scores, except for the MLDR-it evaluation of gemma-2B-Argos, which will be discussed later. This indicates that while translation quality can impact performance, the overall difference may not be substantial enough to render one method vastly superior to the other in practical applications for this specific task.</p>
      </sec>
      <sec id="sec-3-2">
        <title>5.2. Multilingual Sentence Transformers</title>
        <p>Generally, the performance of the Multilingual Sentence Transformers is similar when evaluated on the MIRACL-it sets. However, there is a notably significant performance gap on the MLDR-it dataset. We attribute the very poor performance of the paraphrase-multi-MiniLM-L12-v2 model to its small maximum input token length of 128 tokens, which is unsuitable for datasets containing long documents. As expected, both our proposed models and the E5 models outperform all the Multilingual Sentence Transformers across all metrics on every dataset.</p>
      </sec>
      <sec id="sec-3-3">
        <title>5.3. Multilingual E5 Models</title>
        <p>The Multilingual E5 models achieved very high scores in the evaluation of both datasets. In particular, the multilingual-E5-large model achieved the best MRR@10, Recall@100, and nDCG@10 scores on both translations of the MIRACL dataset. As expected, multilingual-E5-large outperformed the base version, although the performance gap narrows with longer documents (MLDR-it).</p>
      </sec>
      <sec id="sec-3-4">
        <title>5.4. Proposed Models</title>
        <p>By observing the scores obtained by our proposed models, it appears that the models based on Minerva-1B achieved lower scores compared to the others, suggesting that it may not be the most suitable foundation model for this type of task.</p>
        <p>The results obtained by the Gemma-2B and Qwen2-1.5B based models are very similar, except for the low MRR@10 and nDCG@10 scores obtained by gemma-2B-Argos on the MLDR-it dataset, which could indicate worse training stability caused by data translated with Argos Translate. However, the model achieved the best Recall@100 score on the same dataset, suggesting that this behavior may be caused by random noise during fine-tuning.</p>
        <p>Finally, our proposed models achieved both the first and second best scores for each metric associated with the MLDR-it test set, demonstrating their effectiveness in handling long document retrieval tasks.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>6. Conclusions</title>
      <p>This work presents a comprehensive study on models and datasets focused on Information Retrieval (IR) for Italian documents. The primary contribution of this paper lies in illustrating a strategy for fine-tuning Large Language Models (LLMs) to achieve effective semantic representations of Italian texts. Additionally, we provide original models and datasets that serve as a starting point to bridge the performance gap between models designed for Italian and those optimized for other languages.</p>
      <p>Our results demonstrate that the proposed models achieve performance comparable with state-of-the-art models for medium-sized documents and even surpass them when dealing with datasets containing very long documents. This suggests that our tailored approach to Italian text retrieval is not only viable but also highly effective.</p>
      <sec id="sec-4-1">
        <title>6.1. Limitations and Future works</title>
        <p>One of the main limitations of this study is the limited availability of hardware resources. Our fine-tuning process involved a significantly smaller number of dataset examples, well below 50,000, compared to the multilingual E5 models, which were pre-trained on over 2 billion text pairs and fine-tuned on more than 1 million.</p>
        <p>Additionally, we were unable to evaluate the proposed models on the complete MIRACL corpus, as it would have required more than 100 hours of computation per model. This restriction has highlighted a key area for potential improvement in our research. Future work could benefit significantly from experiments involving larger quantities of Italian data and the application of more advanced model architectures.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>7. Online Resources</title>
      <p>The fine-tuned adapters and the datasets have been made available online (Models: https://huggingface.co/collections/yuri-no/italian-retrieval-llm-adapters-667ab367ce13150b7c774078; Datasets: https://huggingface.co/collections/yuri-no/italian-retrieval-datasets-667acdccf922286634ef603b).</p>
    </sec>
    <sec id="sec-6">
      <title>8. Implementation Details</title>
      <p>All the experiments were executed on a Compute Engine Virtual Machine with 2 NVIDIA L4 GPUs.</p>
      <sec id="sec-6-1">
        <title>8.1. Translation</title>
        <p>While the offline translation relies on the model proposed by Argos Translate, to speed up computation we directly utilized the API of CTranslate2 [23].</p>
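        <p>A minimal sketch of this setup is shown below; the model directory and SentencePiece tokenizer paths are placeholders for an OpenNMT/Argos model converted to the CTranslate2 format:</p>
        <preformat>
# Sketch: batched offline translation through CTranslate2 [23].
# Paths below are placeholders for a converted OpenNMT/Argos model.
import ctranslate2
import sentencepiece as spm

translator = ctranslate2.Translator("en_it_ct2_model/", device="cuda")
sp = spm.SentencePieceProcessor(model_file="en_it_ct2_model/sentencepiece.model")

sentences = ["Rome is the capital of Italy."]
tokens = [sp.encode(s, out_type=str) for s in sentences]
results = translator.translate_batch(tokens)
translations = [sp.decode(r.hypotheses[0]) for r in results]
        </preformat>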
      </sec>
      <sec id="sec-6-2">
        <title>8.2. Fine-tuning</title>
        <p>The fine-tuning experiments were conducted using an adaptation of the code from the Tevatron Toolkit [24]. The primary modifications included excluding the "title" attribute from document encoding, to simulate a realistic scenario, and filtering out queries not associated with negative passages.</p>
      </sec>
      <sec id="sec-6-3">
        <title>8.3. Evaluation</title>
          <p>Similar to the fine-tuning process, the evaluation was
conducted without considering the "title" attribute for
documents. Each model was evaluated according to the
instructions provided by the authors. For creating
embeddings with the Multilingual Sentence Transformers,
we relied on the sentence-transformers
implementation. For all other models, we used the transformers
library [25].</p>
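          <p>For the Sentence Transformers baselines, embedding creation reduces to a few lines; the hub model name below is our assumption for the checkpoint referred to as paraphrase-multi-MiniLM-L12-v2 in Section 5.2:</p>
          <preformat>
# Sketch: creating normalized embeddings with the sentence-transformers implementation.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
embeddings = model.encode(
    ["Qual è la capitale d'Italia?", "Roma è la capitale d'Italia."],
    normalize_embeddings=True,  # cosine similarity via dot product downstream
)
          </preformat>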
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>We would like to thank Dinova Srl for funding this research and providing access to the Google Cloud Virtual Machines used in this project. Their support has been essential for this work.</p>
    </sec>
    <sec id="sec-8">
      <title>Credit author statement</title>
      <p>YN: Conceptualization, Investigation, Software, Formal Analysis. FT: Methodology, Supervision, Writing - Review &amp; Editing.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Piktus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Karpukhin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Küttler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          , W.-t. Yih,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rocktäschel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kiela</surname>
          </string-name>
          ,
          <article-title>Retrieval-augmented generation for knowledge-intensive NLP tasks</article-title>
          ,
          <source>in: Proceedings of the 34th International Conference on Neural Information Processing Systems</source>
          , NIPS '20, Curran Associates Inc.,
          Red Hook, NY, USA,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          , G. Corrado,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <article-title>Efficient estimation of word representations in vector space</article-title>
          ,
          <source>Proceedings of Workshop at ICLR</source>
          <year>2013</year>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pennington</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          , C. Manning, GloVe:
          <article-title>Global vectors for word representation</article-title>
          , in: A.
          <string-name>
            <surname>Moschitti</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Pang</surname>
          </string-name>
          , W. Daelemans (Eds.),
          <source>Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <article-title>Association for Computational Linguistics</article-title>
          , Doha, Qatar,
          <year>2014</year>
          , pp.
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          . URL: https://aclanthology.org/D14-1162. doi:10.3115/v1/D14-1162.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          , BERT:
          <article-title>Pre-training of deep bidirectional transformers for language understanding</article-title>
          , in: J.
          <string-name>
            <surname>Burstein</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Doran</surname>
          </string-name>
          , T. Solorio (Eds.),
          <source>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers),
          <source>Association for Computational Linguistics</source>
          , Minneapolis, Minnesota,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          . URL: https://aclanthology.org/N19-1423. doi:10.18653/v1/N19-1423.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ryder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Subbiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhariwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neelakantan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shyam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Herbert-Voss</surname>
          </string-name>
          , G. Krueger,
          <string-name>
            <given-names>T.</given-names>
            <surname>Henighan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ziegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Winter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hesse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          , E. Sigler,
          <string-name>
            <given-names>M.</given-names>
            <surname>Litwin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chess</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Berner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>McCandlish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <article-title>Language models are few-shot learners</article-title>
          , in: H.
          <string-name>
            <surname>Larochelle</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Ranzato</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Hadsell</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Balcan</surname>
          </string-name>
          , H. Lin (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          , volume
          <volume>33</volume>
          , Curran Associates, Inc.,
          <year>2020</year>
          , pp.
          <fpage>1877</fpage>
          -
          <lpage>1901</lpage>
          . URL: https://proceedings.neurips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Raffel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Matena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Exploring the limits of transfer learning with a unified text-to-text transformer</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>21</volume>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>67</lpage>
          . URL: http://jmlr.org/papers/v21/20-074.html.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Gurevych</surname>
          </string-name>
          , Sentence-BERT:
          <article-title>Sentence embeddings using Siamese BERT-networks</article-title>
          , in: K. Inui,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ng</surname>
          </string-name>
          ,
          <string-name>
            <surname>X.</surname>
          </string-name>
          Wan (Eds.), Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong, China, 2019, pp. 3982-3992.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] M. Douze, A. Guzhva, C. Deng, J. Johnson, G. Szilvasy, P.-E. Mazaré, M. Lomeli, L. Hosseini, H. Jégou, The Faiss library (2024). arXiv:2401.08281.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] Y. Yang, D. Cer, A. Ahmad, M. Guo, J. Law, N. Constant, G. Hernandez Abrego, S. Yuan, C. Tar, Y.-h. Sung, B. Strope, R. Kurzweil, Multilingual universal sentence encoder for semantic retrieval, in: A. Celikyilmaz, T.-H. Wen (Eds.), Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Association for Computational Linguistics, Online, 2020, pp. 87-94. URL: https://aclanthology.org/2020.acl-demos.12. doi:10.18653/v1/2020.acl-demos.12.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] OpenNMT, CTranslate2, https://github.com/OpenNMT/CTranslate2, 2019.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] L. Gao, X. Ma, J. Lin, J. Callan, Tevatron: An efficient and flexible toolkit for neural retrieval, in: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '23, Association for Computing Machinery, New York, NY, USA, 2023, pp. 3120-3124. URL: https://doi.org/10.1145/3539618.3591805. doi:10.1145/3539618.3591805.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, C. Ma, Y. Jernite, J. Plu, C. Xu, T. Le Scao, S. Gugger, M. Drame, Q. Lhoest, A. M. Rush, Transformers: State-of-the-art natural language processing, Association for Computational Linguistics, 2020, pp. 38-45. URL: https://www.aclweb.org/anthology/2020.emnlp-demos.6.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>