<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Extending Translate-Train for ColBERT-X to African Language CLIR</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Eugene Yang</string-name>
          <email>eugene.yang@jhu.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dawn J. Lawrie</string-name>
          <email>lawrie@jhu.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paul McNamee</string-name>
          <email>mcnamee@jhu.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>James Mayfield</string-name>
          <email>mayfield@jhu.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>Human Language Technology Center of Excellence, Johns Hopkins University</institution>, <addr-line>Baltimore, Maryland</addr-line>, <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes the HLTCOE team's submission runs for the CIRAL CLIR tasks for African languages at FIRE 2023. Our submissions use machine translation models to translate the documents and the training passages, and ColBERT-X as the retrieval model. Additionally, we present a set of unofficial runs that use an alternative training procedure with a similar training setting.</p>
      </abstract>
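      <conference>
        <conf-name>FIRE'23: Forum for Information Retrieval Evaluation</conf-name>
      </conference>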
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR
ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        We use MS MARCO [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] training triples with English queries and machine-translated African
language passages (Hausa, Somali, Swahili, and Yoruba) to perform Translate-Train for
ColBERT-X based on the out-of-the-box XLM-RoBERTa-Large model [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] (Run 4) and our MLM-fine-tuned
version (Run 6). We also compare with the ColBERT-X model trained with English MS MARCO,
i.e., English-Trained (Runs 3 and 5), to understand the quality and usefulness of the
machine-translated MS MARCO in the African languages. Finally, we experiment with a new technique,
JH POLO [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], that uses large language models to generate English training queries drawn from
the retrieval collection to perform in-domain retrieval fine-tuning (Run 7).
      </p>
    </sec>
    <sec id="sec-3">
      <title>2. Machine Translation</title>
      <p>We used automated machine translation (MT) in two principal ways for the evaluation. First, we
used document translation to create English-language representations of the CIRAL document
collections, as this directly enables search using English queries. Second, we translated the MS
MARCO passages from English to the four African languages.</p>
      <p>
        Transformer-based models were trained using Amazon’s Sockeye v2 toolkit [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] with training
data that came principally from the open-source repository OPUS [<xref ref-type="bibr" rid="ref11">11</xref>]. Preprocessing steps
included running the Moses tokenizer, removing duplicate lines, and learning subword
units using the subword-nmt toolkit. Case was retained. Notable hyperparameters include:
6 layers in both encoder and decoder; 512-dimensional embeddings; 8 attention heads;
2,048 hidden units per layer; 30,000 subword byte pair encoding (BPE) units, learned separately for the source
and target languages; a batch size of 4,096; and the Adam optimizer with an initial learning rate of
2 × 10<sup>−4</sup>.
      </p>
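      <p>As a concrete illustration, the minimal sketch below mirrors the preprocessing just described using the sacremoses and subword-nmt Python packages; the file names and the English side are illustrative, and the actual pipeline may have used the equivalent command-line tools instead.</p>
      <preformat>
# A sketch of the bitext preprocessing described above, using the sacremoses
# and subword-nmt Python packages; file names are illustrative.
from sacremoses import MosesTokenizer
from subword_nmt.learn_bpe import learn_bpe
from subword_nmt.apply_bpe import BPE

def tokenize_and_dedupe(raw_path, tok_path, lang):
    """Moses-tokenize, drop duplicate lines, and retain case."""
    seen = set()
    tokenizer = MosesTokenizer(lang=lang)
    with open(raw_path, encoding="utf-8") as fin, open(tok_path, "w", encoding="utf-8") as fout:
        for line in fin:
            tok = tokenizer.tokenize(line.strip(), return_str=True)
            if tok and tok not in seen:
                seen.add(tok)
                fout.write(tok + "\n")

tokenize_and_dedupe("train.raw.en", "train.tok.en", "en")

# Learn 30,000 BPE merges for this side of the bitext (the other side gets
# its own, separately learned codes).
with open("train.tok.en", encoding="utf-8") as fin, open("bpe.codes.en", "w", encoding="utf-8") as fout:
    learn_bpe(fin, fout, num_symbols=30000)

# Segment text with the learned codes before handing it to sockeye.train.
with open("bpe.codes.en", encoding="utf-8") as codes:
    bpe = BPE(codes)
print(bpe.process_line("Cross-language retrieval needs good translations ."))
      </preformat>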
      <sec id="sec-3-1">
        <title>2.1. Document Translation</title>
        <p>When the query language is known ahead of time, it is possible to translate documents into
the query language, effectively reducing the CLIR problem to a monolingual task. Of course,
the quality of automated machine translation can vary considerably, and some queries can
materially suffer if named entities or other essential query elements are mistranslated. When
languages have fewer resources, and when source and target languages differ in linguistic
typology, translation can be challenging.</p>
        <p>To increase the likelihood of producing better quality document translations we created
synthetic training bitext so that our neural machine translation models would have larger
quantities of data to work with. In recently published work, McNamee and Duh [<xref ref-type="bibr" rid="ref12">12</xref>] showed
that back-translation can be particularly efficacious in lower-resource settings and helps with
lexical coverage in the resulting translation system. For this setup, we first trained
English-to-Other models, and used these initial four models to back-translate 7 million sentences of
web-crawled English news. Then for each language these 7 million synthetic translations were
added to our human produced training data (i.e., bitext from OPUS) to then train the forward
models which were used to create English language translations of the four African document
collections.</p>
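        <p>For clarity, the back-translation data flow can be sketched as follows; en2xx_translate is a hypothetical helper standing in for decoding with one of the initial English-to-Other Sockeye models, and the assembled pairs feed the forward (African-language-to-English) model.</p>
        <preformat>
# A minimal sketch of the back-translation data flow described above.
# en2xx_translate is a hypothetical stand-in for decoding with a trained
# English-to-Other Sockeye model.

def build_forward_training_data(en_mono_sentences, en2xx_translate, opus_pairs):
    """Return (source, target) pairs for training the forward xx-to-en model.

    Synthetic pairs place machine-translated African-language text on the
    source side and the original English news sentence on the target side.
    """
    synthetic = [(en2xx_translate(en), en) for en in en_mono_sentences]
    # The 7 million synthetic pairs are combined with the human-produced
    # OPUS bitext before training the forward model.
    return list(opus_pairs) + synthetic
        </preformat>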
      </sec>
      <sec id="sec-3-2">
        <title>2.2. Translating MS MARCO</title>
        <p>MS MARCO was created to support neural IR over English texts. To support the Translate-Train
approach for the cross language setting, we wanted to produce translations of MS MARCO into
Hausa, Somali, Swahili, and Yoruba. The original English dataset consists of 8,841,823 passages
containing 497 million words. Table 2 shows the quantity of training bitext, translation quality
scores on a commonly used benchmark, and the number of words in the translated MS MARCO
dataset, by language.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Training Pipeline</title>
      <p>Our full training pipeline for ColBERT-X starts from the pretrained XLM-RoBERTa Large model,
followed by masked language model (MLM) fine-tuning, retrieval fine-tuning with
Translate-Train, and finally, in-domain fine-tuning with JH POLO. This section describes each fine-tuning
step.</p>
      <sec id="sec-4-1">
        <title>3.1. Masked Language Model Fine-tuning</title>
        <p>
          Since XLM-RoBERTa [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] pretraining does not include Yoruba, we designed a fine-tuning step
to accommodate this absence. However, presenting only Yoruba text to the model during
fine-tuning risks catastrophic forgetting of the model's knowledge of other languages. Specifically, we would
like the language model to retain language knowledge related to the four African languages and
to the query language – English. Therefore, we present documents in Hausa, Somali, Swahili,
Yoruba, and English round-robin to perform masked language model fine-tuning. We used
Common Crawl documents in the AfriBERTa corpus [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] for the four African languages and collected
additional English Common Crawl documents to match the genre.
        </p>
        <p>We fine-tune the model for 200,000 update steps using a learning rate of 1 × 10<sup>−5</sup> and a batch
size of 48 text sequences with a maximum length of 512 tokens each. We used four NVIDIA A100
GPUs to train the model. Fine-tuning took around 34 hours to complete.</p>
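        <p>A minimal sketch of this round-robin MLM fine-tuning with the Hugging Face transformers library is shown below; stream_documents is a hypothetical helper yielding raw documents for one language, and the single-GPU loop omits distributed-training details.</p>
        <preformat>
# A sketch of the round-robin MLM fine-tuning described above with Hugging
# Face transformers. stream_documents is a hypothetical helper yielding raw
# text documents for one language; distributed-training details are omitted.
import itertools
import torch
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling)

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-large")
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# One document stream per language; batches cycle through the streams
# round-robin so Yoruba text does not crowd out the other languages.
streams = {lang: stream_documents(lang) for lang in ("hau", "som", "swa", "yor", "eng")}
cycle = itertools.cycle(streams.values())

for step in range(200_000):
    texts = [next(next(cycle)) for _ in range(48)]  # batch of 48 sequences
    features = [tokenizer(t, truncation=True, max_length=512) for t in texts]
    batch = collator(features)  # applies dynamic masking and padding
    loss = model(**batch).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
        </preformat>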
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Retrieval Fine-tuning with Translate-Train</title>
        <p>
          To transform a multilingual language model into a CLIR ColBERT-X model, we fine-tuned
the language model using MS MARCO small training triples with the original English queries
and translated passages (Translate-Train) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. We evaluate this Translate-Train with both the
pretrained XLM-RoBERTa model and our MLM-fine-tuned language model. The model is
trained with a contrastive cross-entropy loss over the positive and negative passages
of each training query. For comparison, we also fine-tuned the language model with English
MS MARCO without translation (English-Train).
        </p>
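        <p>The following sketch shows this pairwise objective in PyTorch: the positive and negative late-interaction (MaxSim) scores of each training triple are treated as a two-way classification problem scored with cross-entropy, with the positive passage always as class 0. This is a sketch of the standard ColBERT-style objective, not a verbatim excerpt of the training code.</p>
        <preformat>
# A sketch of the pairwise objective: the positive and negative MaxSim scores
# of each training triple form a two-way classification scored with
# cross-entropy (the positive passage is always class 0).
import torch
import torch.nn.functional as F

def pairwise_ce_loss(pos_scores, neg_scores):
    """pos_scores, neg_scores: (batch,) late-interaction relevance scores."""
    logits = torch.stack([pos_scores, neg_scores], dim=1)   # (batch, 2)
    labels = torch.zeros(logits.size(0), dtype=torch.long)  # positive first
    return F.cross_entropy(logits, labels)
        </preformat>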
        <p>
          We fine-tune the language model with the retrieval objective for 200,000 update steps with a
learning rate of 5 × 10<sup>−6</sup> and a batch size of 64 triples (query, positive passage, and negative passage).
Following the ColBERT-X [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] training setup, we pad the queries to 32 tokens with
[MASK] tokens. Each ColBERT-X model is trained with eight NVIDIA V100 GPUs for around 50
hours. For the official submissions, we used the PLAID [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] implementation of ColBERT training.
However, after the submission, we discovered that the ColBERT-X implementation (https://github.com/hltcoe/ColBERT-X), which is
based on the ColBERT v1 [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] codebase, provides a more stable and effective training process.
Thus, we also report a set of unofficial runs using this implementation.
        </p>
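        <p>A sketch of this query augmentation step is shown below; with XLM-RoBERTa, the tokenizer's mask token plays the role of BERT's [MASK], and the query marker tokens used by the actual ColBERT-X implementation are omitted for brevity.</p>
        <preformat>
# A sketch of ColBERT-style query augmentation: queries are padded to a fixed
# 32 tokens with the tokenizer's mask token, which the encoder can use for
# implicit query expansion. Marker tokens used by the real implementation
# are omitted here.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")

def encode_query_ids(query, query_maxlen=32):
    ids = tokenizer(query, truncation=True, max_length=query_maxlen)["input_ids"]
    ids = ids + [tokenizer.mask_token_id] * (query_maxlen - len(ids))
    return ids
        </preformat>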
        <p>
          For retrieval, we use the PLAID [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] retrieval implementation, which uses K-means clustering
and compression to approximate and accelerate retrieval. We compress each dimension of the
document token residual vectors down to one bit, resulting in a 128-bit residual representation for
each document token.
        </p>
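        <p>The following sketch illustrates the idea of one-bit residual compression with NumPy; PLAID's actual quantizer learns per-dimension bucket boundaries, so the sign-based binarization here is a simplification of the same idea.</p>
        <preformat>
# A sketch of one-bit residual compression with NumPy. PLAID's actual
# quantizer learns bucket boundaries per dimension; keeping only the sign of
# each residual dimension is a simplification of the same idea.
import numpy as np

def compress_residuals(residuals):
    """residuals: (num_tokens, 128) float array of token vectors minus centroids."""
    bits = (residuals > 0).astype(np.uint8)  # one bit per dimension
    return np.packbits(bits, axis=1)         # (num_tokens, 16) bytes = 128 bits

def decompress_residuals(packed, scale=1.0):
    bits = np.unpackbits(packed, axis=1).astype(np.float32)
    return (2.0 * bits - 1.0) * scale        # map bits back to +/- scale
        </preformat>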
      </sec>
      <sec id="sec-4-3">
        <title>3.3. JH POLO In-Domain Retrieval Fine-tuning</title>
        <p>Training data for the CIRAL languages is quite limited. One option for new training data is
Translate-Train: translating the documents of an existing retrieval training collection, such as
MS MARCO, into the target languages. However, machine translation for the CIRAL languages
was not particularly good at the time of the evaluation. Furthermore, there is no guarantee that
the documents of an existing evaluation collection will be a good match for those of the target
collection. Creating new training examples using the target collection itself for the documents
would eliminate these problems; documents would be naturally occurring, and would therefore
not exhibit “translationese.” And there would never be a mismatch between the genre or style
of the documents in the training collection and those in the target collection.</p>
        <fig id="fig1">
          <caption><p>Figure 1: The GPT-4 prompt used to generate JH POLO training queries; «first» and «second» are replaced with the two selected documents.</p></caption>
          <preformat>You must write questions for a news quiz to appear in the newspaper. A news quiz asks about
events in the news, NOT about news articles. Here are two articles that appeared in this week's
news: «first» «second» For each article give five factual news quiz English questions, one per
line with no extraneous words, that are answered by the events described in that document and
are not answered by the events described in the other document. The quiz questions must never
refer to individual news articles, or assume the quiz-taker has seen those articles. Precede the
first five with DOCA: and the second with DOCB:</preformat>
        </fig>
        <p>
          JH POLO [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] is a methodology for creating such training data. It relies on the existence of a
large generative language model that includes coverage of the target language. The process
begins by selecting two documents from the target collection that have some topic overlap. One
of the documents will end up as a relevant document in a training example, and the other will
become a non-relevant document in the same example. Selecting document pairs that are closer
in meaning will lead to harder negative examples in the training examples produced.
        </p>
        <p>Once the documents have been selected, the generative language model is prompted to create
a query for which the first document is relevant and the second document is not. This query,
and the two documents, are bundled to form a single new training example. This process can
be repeated to generate as many training examples as desired.</p>
        <p>We used the JH POLO methodology to create training data for the four CIRAL languages.
We used GPT-4 as the generative language model. While GPT-4 would occasionally complain
that it was unable to handle documents in one of the CIRAL languages, in almost all cases it
would willingly process the documents without being told what language they were written in.
In addition to allowing naturally-occurring documents in the training set, this approach hits
the sweet spot of most generative LLMs: producing short, English texts. Our prompt is shown
in Figure 1. The prompt accomplishes several things:</p>
        <list list-type="bullet">
          <list-item><p>It identifies the task as question answering.</p></list-item>
          <list-item><p>It attempts to focus the questions on the content of the news articles, not on the articles
themselves. It also tries to prevent the generated queries from assuming that the searcher
knows the content of any article a priori. GPT-4 had a difficult time conforming to these
requirements.</p></list-item>
          <list-item><p>It includes the text of the two documents or passages.</p></list-item>
          <list-item><p>It asks for ten training examples for each document pair, with each document serving as
the relevant document for five of those queries.</p></list-item>
          <list-item><p>It specifies how the output should be formatted.</p></list-item>
        </list>
        <p>To select document pairs, we first used each document with more than a fixed number of
characters (the “query document”) as a query over the other documents using a BM25 sparse
retrieval model. We considered each of the top twenty documents in the resulting ranked list (the
“candidate document”), not including the query document. We eliminated from consideration
any candidate document that met any of the following criteria:</p>
        <list list-type="bullet">
          <list-item><p>the ratio of the score of the candidate document to that of the query document was greater
than 0.65;</p></list-item>
          <list-item><p>the longest common substring between the query document and the candidate document
was more than 60% of the entire candidate document;</p></list-item>
          <list-item><p>fewer than twenty characters of the candidate document were not part of the longest
common substring; or</p></list-item>
          <list-item><p>the candidate document had fewer than 150 characters.</p></list-item>
        </list>
        <p>We selected for inclusion in the training collection the pairs that were not rejected by the above
criteria and that maximized the size of the training collection, given that no document was
allowed to be part of more than one pair.</p>
        <fig id="fig2">
          <caption><p>Figure 2: GPT-4 output for a completed prompt.</p></caption>
          <preformat>{
  "id": "chatcmpl-7ruApne33HFrrRiJB25563Ctcr3Ud",
  "object": "chat.completion",
  "created": 1693081671,
  "model": "gpt-4-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "DOCA:\n1. Who reflected on President Buhari's leadership in the recent public discourse?\n2. What is the criticism regarding the opposition's treatment raised in the recent debate?\n3. What were the financial implications mentioned in recent political critiques?\n4. What are the societal issues addressed by the speaker?\n5. Who made the prayer for wisdom and understanding at the end of the recent speech?\n\nDOCB:\n1. What were the tragic aspects of Yunusa and Ese's love story mentioned in recent reports?\n2. Who recently criticized some religious leaders for committing wrongdoings?\n3. What legal judgement was recently confirmed as punishment for an offender?\n4. What issue of child exploitation came to light recently?\n5. What phrase has been adopted by vocal sympathizers to describe the prevailing situation?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 742,
    "completion_tokens": 161,
    "total_tokens": 903
  }
}</preformat>
        </fig>
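        <p>These four rejection criteria can be expressed compactly in Python; the sketch below uses difflib for the longest common substring and assumes the BM25 scores are supplied by the sparse retrieval engine used for pair selection.</p>
        <preformat>
# A sketch of the four rejection criteria above; difflib supplies the longest
# common substring, and the BM25 scores are assumed to come from the sparse
# retrieval engine used for pair selection.
from difflib import SequenceMatcher

def reject_candidate(query_doc, cand_doc, cand_score, query_score):
    """Return True if the candidate document fails any of the four criteria."""
    match = SequenceMatcher(None, query_doc, cand_doc).find_longest_match(
        0, len(query_doc), 0, len(cand_doc))
    novel_chars = len(cand_doc) - match.size
    return (
        cand_score / query_score > 0.65       # scores too close: near-duplicate risk
        or match.size > 0.60 * len(cand_doc)  # shared substring covers over 60%
        or 20 > novel_chars                   # fewer than 20 characters are novel
        or 150 > len(cand_doc)                # candidate document too short
    )
        </preformat>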
        <p>Once the document pairs were selected, we embedded the text of each document in the GPT-4
prompt and ran the prompt. In most cases, GPT-4 successfully produced ten output
queries per prompt. Figure 2 shows the GPT-4 output for a completed prompt.</p>
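        <p>A sketch of issuing the prompt with the (legacy) OpenAI Python SDK is shown below; PROMPT_TEMPLATE is a hypothetical constant holding the Figure 1 text, and the model string matches the response object in Figure 2.</p>
        <preformat>
# A sketch of issuing the Figure 1 prompt with the legacy OpenAI Python SDK.
# PROMPT_TEMPLATE is a hypothetical constant holding the Figure 1 text with
# the «first» and «second» placeholders.
import openai

def generate_quiz_queries(doc_a, doc_b):
    prompt = PROMPT_TEMPLATE.replace("«first»", doc_a).replace("«second»", doc_b)
    response = openai.ChatCompletion.create(
        model="gpt-4-0613",  # matches the response object shown in Figure 2
        messages=[{"role": "user", "content": prompt}],
    )
    # The content interleaves DOCA:/DOCB: blocks of five questions each.
    return response["choices"][0]["message"]["content"]
        </preformat>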
        <p>We applied two forms of automated quality control to the JH POLO outputs. First, because
GPT-4 had a difficult time omitting mention of the documents in its output queries and not
assuming the user knew anything about those documents, we eliminated any query that
contained any of the words <italic>articles</italic>, <italic>reports</italic>, <italic>speaker</italic>, or <italic>these</italic>. Second, to try to eliminate examples
where the relevant and non-relevant documents were too close together, we used an mMiniLM
cross-encoder (cross-encoder/mmarco-mMiniLMv2-L12-H384-v1) to compare the query to
each of the documents; we eliminated any example where the cross-encoder score (between
0 and 1) for the positive document was not at least 0.15 above the score of the non-relevant
document. The result was a collection of 48,459 training examples over 14,323 document pairs
in the four CIRAL languages combined.</p>
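        <p>A sketch of both quality-control filters using the sentence-transformers CrossEncoder, which by default maps single-label model outputs into (0, 1) with a sigmoid, might look as follows; the banned-word check here is a simple token-level approximation.</p>
        <preformat>
# A sketch of the two QC filters described above using sentence-transformers.
from sentence_transformers import CrossEncoder

# Single-label cross-encoders return sigmoid scores in (0, 1) by default.
model = CrossEncoder("cross-encoder/mmarco-mMiniLMv2-L12-H384-v1")

BANNED_WORDS = {"articles", "reports", "speaker", "these"}

def keep_example(query, pos_doc, neg_doc, margin=0.15):
    """Apply both QC filters: the banned-word list and the score margin."""
    if BANNED_WORDS.intersection(query.lower().split()):
        return False
    pos_score, neg_score = model.predict([(query, pos_doc), (query, neg_doc)])
    return pos_score - neg_score >= margin
        </preformat>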
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Results</title>
      <sec id="sec-5-1">
        <title>4.1. Unofficial Runs</title>
        <p>Based on other experiments, we discovered that the PLAID training implementation (essentially
version 3 of the ColBERT implementation) leads to degraded performance in the resulting IR
model. We retrained the models using the original ColBERT-X implementation and present the
results in Table 4. Since these runs were produced after the submission deadline, they are
not part of the pooling assessments. Therefore, only around 50% to 60% of the top 20 retrieved
documents are judged. While treating unjudged documents as non-relevant is a common
assumption in IR evaluation, this also means that the results presented in Table 3 (and other
official submissions) and Table 4 are not perfectly comparable.</p>
        <p>Based on the results in Table 4, models trained with the ColBERT-X implementation seem to be
generally more effective. While the trend of the contribution provided by each training step is
less clear, Translate-Train without MLM still provides more effective models than English-Train,
except for Somali.</p>
        <p>However, based on this set of results, the benefit of the additional MLM fine-tuning step
is smaller. In fact, the knowledge in the AfriBERTa corpus and in the machine-translated MS
MARCO seems to be contradictory. While performing only Translate-Train or only MLM fine-tuning
still leads to similar effectiveness, doing both does not give us an additional advantage.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] K. Santhanam, O. Khattab, C. Potts, M. Zaharia, PLAID: An efficient engine for late interaction retrieval, in: Proceedings of the 31st ACM International Conference on Information &amp; Knowledge Management, 2022, pp. 1747-1756.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] O. Khattab, M. Zaharia, ColBERT: Efficient and effective passage search via contextualized late interaction over BERT, in: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020, pp. 39-48.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] S. Nair, E. Yang, D. Lawrie, K. Duh, P. McNamee, K. Murray, J. Mayfield, D. W. Oard, Transfer learning approaches for building cross-language dense retrieval models, in: Advances in Information Retrieval: 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10-14, 2022, Proceedings, Part I, Springer-Verlag, Berlin, Heidelberg, 2022, pp. 382-396. URL: https://doi.org/10.1007/978-3-030-99736-6_26.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] D. Lawrie, S. MacAvaney, J. Mayfield, P. McNamee, D. W. Oard, L. Soldaini, E. Yang, Overview of the TREC 2022 NeuCLIR track, 2023. arXiv:2304.12367.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] K. Santhanam, O. Khattab, J. Saad-Falcon, C. Potts, M. Zaharia, ColBERTv2: Effective and efficient retrieval via lightweight late interaction, in: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Seattle, United States, 2022, pp. 3715-3734. URL: https://aclanthology.org/2022.naacl-main.272.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] K. Ogueji, Y. Zhu, J. Lin, Small data? No problem! Exploring the viability of pretrained multilingual language models for low-resourced languages, in: Proceedings of the 1st Workshop on Multilingual Representation Learning, Association for Computational Linguistics, Punta Cana, Dominican Republic, 2021, pp. 116-126. URL: https://aclanthology.org/2021.mrl-1.11.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] T. Nguyen, M. Rosenberg, X. Song, J. Gao, S. Tiwary, R. Majumder, L. Deng, MS MARCO: A human generated machine reading comprehension dataset, CoRR abs/1611.09268 (2016). URL: http://arxiv.org/abs/1611.09268. arXiv:1611.09268.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, V. Stoyanov, Unsupervised cross-lingual representation learning at scale, in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Online, 2020, pp. 8440-8451. URL: https://aclanthology.org/2020.acl-main.747.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] J. Mayfield, E. Yang, D. Lawrie, S. Barham, O. Weller, M. Mason, S. Nair, S. Miller, Synthetic cross-language information retrieval training data, arXiv preprint arXiv:2305.00331 (2023).</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, in: Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17, Curran Associates Inc., Red Hook, NY, USA, 2017, pp. 6000-6010.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] J. Tiedemann, Parallel data, tools and interfaces in OPUS, in: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), European Language Resources Association (ELRA), Istanbul, Turkey, 2012, pp. 2214-2218. URL: http://www.lrec-conf.org/proceedings/lrec2012/pdf/463_Paper.pdf.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] P. McNamee, K. Duh, An extensive exploration of back-translation in 60 languages, in: Findings of the Association for Computational Linguistics: ACL 2023, Association for Computational Linguistics, Toronto, Canada, 2023, pp. 8166-8183. URL: https://aclanthology.org/2023.findings-acl.518. doi:10.18653/v1/2023.findings-acl.518.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] E. Yang, S. Nair, R. Chandradevan, R. Iglesias-Flores, D. W. Oard, C3: Continued pretraining with contrastive weak supervision for cross language ad-hoc retrieval, in: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '22, Association for Computing Machinery, New York, NY, USA, 2022, pp. 2507-2512. URL: https://doi.org/10.1145/3477495.3531886. doi:10.1145/3477495.3531886.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>