<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Learning a Sparse Representation Model for Neural CLIR</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Suraj Nair</string-name>
          <email>srnair@cs.umd.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eugene Yang</string-name>
          <email>eugene.yang@jhu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dawn Lawrie</string-name>
          <email>lawrie@jhu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>James Mayfield</string-name>
          <email>mayfield@jhu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Douglas W. Oard</string-name>
          <email>oard@umd.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>HLTCOE, Johns Hopkins University</institution>
          ,
          <addr-line>Baltimore MD 21211</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Maryland</institution>
          ,
          <addr-line>College Park MD 20742</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <abstract>
        <p>In monolingual retrieval, sparse representations learned atop BERT-style models offer a complementary approach to the unsupervised BM25 model. Inspired by this line of work, we explore adapting such models to the Cross-Language Information Retrieval (CLIR) setting, in which queries and documents are in different languages. The lack of lexical match between queries and documents inhibits a naive replication of these monolingual models for CLIR. We propose SPLADE-X, a cross-language expansion model for CLIR that performs complementarily to a strong PSQ baseline. We further identify the challenges in developing such models, make connections to existing CLIR models, and highlight future directions that could make learning sparse representations feasible for CLIR.</p>
      </abstract>
      <kwd-group>
        <kwd>Sparse Representation</kwd>
        <kwd>Neural CLIR</kwd>
        <kwd>Multilingual Language Models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Learning sparse representation models using pretrained language models (e.g., BERT [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ])
has risen in prominence in monolingual information retrieval applications, particularly those
that access English content. The main idea is to represent queries and documents in a
high-dimensional space spanning BERT’s vocabulary, where only a few dimensions (which correspond
to vocabulary terms) are non-zero. The non-zero document term weights can then be stored in a
standard inverted index. This allows us to exploit the efficiency of traditional sparse vector space
“bag of words” retrieval approaches. This framework also enables query or document expansion
by generating weights for terms that do not, but plausibly could have, appeared in either the
queries or the documents. This partially mitigates the vocabulary mismatch faced by bag of
words models such as BM25 [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. In this paper, our goal is to build a sparse representation
model for Cross-Language Information Retrieval (CLIR), in which the queries and documents
are expressed in different languages.
      </p>
      <p>
        With the availability of large scale training collections such as MS MARCO [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] that have
been translated into multiple languages [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ] and an increasing variety of sparse representation
models for monolingual retrieval [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13 ref14 ref7 ref8 ref9">7, 8, 9, 10, 11, 12, 13, 14</xref>
        ], a natural question is whether
extending these ideas to CLIR involves anything more than simply replacing a monolingual
pretrained model (e.g., BERT) with a multilingual model (e.g., mBERT [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] or XLM-R [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]).
To answer this question, we need to look at how the existing models generate sparse term
weights for queries or documents. We can group the existing models into two categories: a)
exact-match, where weights are changed for terms that occur in queries or documents but
no nonzero weights are added for any additional terms; or b) lexical expansion, in which the
number of terms with nonzero weights is still limited in some way, but some terms that did
not appear in the original query or the original document can be given non-zero weights. In
the case of CLIR, the exact-match approach will not work (or at least it will not work very
well!) because the queries and documents are expressed in diferent languages, generally using
diferent words. Thus our natural point of comparison should be lexical expansion.
      </p>
      <p>
        For monolingual lexical expansion, existing approaches rely on either applying a document
expansion model (e.g., doc2query [
        <xref ref-type="bibr" rid="ref16 ref17">16, 17</xref>
        ] or TILDE [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]), or utilizing a pretrained language
model head to expand a low-dimensional dense representation back out to a sparse
representation that spans the full vocabulary space (e.g., SPLADE [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]). Building a document expansion
model that generalizes well beyond English is already a challenging problem [
        <xref ref-type="bibr" rid="ref18 ref19">18, 19</xref>
        ], one that
becomes even more challenging given the explosive growth in the vocabulary size of
multilingual pretrained models such as mBERT (110k) and XLM-R (250k). These are 3 to 7 times the
size of the monolingual BERT vocabulary (35k). It is these two factors, the need to generalize
across languages and the potential benefits of limiting the vocabulary size, that distinguish
CLIR applications of lexical expansion methods from their monolingual cousins.
      </p>
      <p>
        We propose SPLADE-X, a cross-language generalization of the lexical expansion framework
of SPLADE [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], built on top of mBERT. To manage the large vocabulary of mBERT, we use a
vocabulary reduction technique in which we choose dimensions corresponding to vocabulary
terms (i.e., subwords) belonging to only the query language (which in our case is English).
This choice forces the model to learn cross-language lexical expansions for the non-English
documents, which roughly corresponds to an encoder-only translation task. To train SPLADE-X,
we explore several cross-lingual transfer learning strategies, including zero-shot [20],
translate-train [21], and bilingual training. In the zero-shot setup, we train the model on pairs consisting
of English queries and English passages from MS MARCO, and then we simply use that trained
system with our test collections that contain English queries and documents in some language
other than English. In the translate-train setup, we train the model on pairs consisting of
English queries from MS MARCO and translated MS MARCO passages, where we use machine
translation to produce those translated passages.
      </p>
      <p>For bilingual training, we train on triples of English queries and English passages from MS
MARCO, together with a translation of the MS MARCO passage. Inspired by Reimers and
Gurevych [22], we propose a bilingual alignment loss that encourages the sparse representation
of the English passage and its translation to be similar. Furthermore, we extend the monolingual
distillation loss proposed by Yang et al. [23] to the cross-language setting. Specifically, we distill
the similarity matrix produced by the monolingual SPLADE (teacher) model to the multilingual
SPLADE-X (student) model.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background and Related Work</title>
      <p>In this section, we describe the preliminaries of sparse representation learning, and we
describe existing CLIR models that use different techniques to generate similar representations.</p>
      <sec id="sec-2-1">
        <title>2.1. Sparse representation learning</title>
        <p>Using the notation defined in Lin [24], given a query q and document d, we can generate
fixed-length vectors η_q(q) and η_d(d) for the query and document respectively. Here, η_q and η_d are
the query and document encoders, initialized using a pretrained language model.1 Using the
generated vectors, we can compute a relevance score for query q and document d using a custom
scoring function φ as follows:
s(q, d) = φ(η_q(q), η_d(d))
(1)</p>
        <p>The most common approach to train these models is to define a ranking loss such that the
relevance score of a query q and a relevant document d+ is higher than the relevance score of
the same query q and a non-relevant document d− (i.e., s(q, d+) &gt; s(q, d−)). Specifically, the
ranking loss is defined as follows:
ℒrank = − log [ exp(s(q, d+)) / ( exp(s(q, d+)) + Σ_{d−} exp(s(q, d−)) ) ]
(2)</p>
        <p>There are several ways of sampling non-relevant (“negative”) documents, including in-batch
negative samples [25] or using a large queue of negatives [26, 27]. In the case of a sparse
retrieval model, the fixed length of the query and document representations corresponds to the
vocabulary size of the pretrained model. However, training with the ranking loss does not ensure
that the resulting representations are sparse. Existing work enforces sparsity in these representations
either by choosing specific terms that can receive non-zero weights (e.g., the terms with the
highest weights in the query or document representations [23]) or by optimizing for some form
of regularization loss (such as L1 regularization [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] or that of the FLOPS optimizer [28] used
in SPLADE [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]). Among the existing sparse models, SPLADE has the best performance on
the monolingual BEIR benchmark [29]. Hence, we have chosen to generalize SPLADE’s basic
approach for application to CLIR tasks in this paper.</p>
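        <p>The ranking loss of Eqn. 2 can be sketched in a few lines. This is a minimal sketch under our own assumptions, not the actual training code: sparse representations are stored as plain term-to-weight dictionaries, the scoring function φ is a dot product, and the helper names are hypothetical.</p>
        <p>
```python
import math

def score(q, d):
    # Dot product of sparse term-weight vectors stored as {term: weight} dicts.
    return sum(w * d.get(t, 0.0) for t, w in q.items())

def ranking_loss(q, d_pos, d_negs):
    # Softmax cross-entropy ranking loss: -log of the probability assigned
    # to the relevant document d+ among the sampled negatives d-.
    scores = [score(q, d_pos)] + [score(q, d) for d in d_negs]
    m = max(scores)  # subtract the max before exponentiating, for stability
    log_denom = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_denom - scores[0]
```
        </p>
        <p>The loss approaches zero as the relevant document outscores the negatives, and grows whenever a negative overtakes it.</p>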
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Similarly Structured CLIR models</title>
        <p>Our application of SPLADE to CLIR takes a sequence of terms from a non-English document as
input, and it outputs a corresponding set of (unordered) term weights for (only) English terms.
This is essentially the same behavior that we would expect from a statistical machine translation
system that lacks a language model for the generated English, or a neural translation model that
decodes isolated terms. The first approach, modeled on statistical machine translation, has been
called Probabilistic Structured Queries (PSQ) [30]. PSQ maps term frequency vectors from the
document language to the query language using a matrix of translation probabilities (normalized
to sum to one in the document-language to query-language direction) as a simple matrix-vector
product. This results in a sparse document representation, which contains nonzero term weights
only for plausible translations of terms that appear in the document. Any traditional term
weighting function that can accept partial term counts (e.g., BM25 or query likelihood) can then
be computed on the resulting term frequency vector.</p>
        <p>1In this paper, we set the query and document encoder to be the same.</p>
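        <p>PSQ’s matrix-vector product can be sketched with dictionaries standing in for sparse rows of the translation matrix. This is a minimal sketch; the tiny translation table is a hypothetical illustration, whereas the real probabilities come from aligned parallel text.</p>
        <p>
```python
def psq_map(doc_tf, translation_probs):
    # Map a document-language term-frequency vector into query-language
    # term weights: for each document term f, distribute its count over
    # plausible translations e according to p(e | f).
    weights = {}
    for f, count in doc_tf.items():
        for e, p in translation_probs.get(f, {}).items():
            weights[e] = weights.get(e, 0.0) + p * count
    return weights
```
        </p>
        <p>The resulting query-language weight vector can then be fed to any term weighting function that accepts partial counts.</p>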
        <p>One limitation of PSQ is that it pays no attention to the terms that precede or follow the
term being translated; the translation probabilities for a term are constant regardless of context.
But the same result, a weighted mapping of each term to its plausible translations, can also be
achieved in ways that leverage context. Perhaps the most straightforward such approach is
simply to use a weighted n-best decoder in a neural MT system to generate weighted alternatives
for each translated term. Neural translation systems are data hungry at training time, however,
and translation sequences that are rare in the training data may not be well modeled. Rare
terms are particularly useful in information retrieval tasks (because of their specificity), so
techniques that leverage context on the source (document) side, but not the target (English)
side have also been explored. Two such examples are Searcher [31], which leverages pretrained
language models, and the Neural Network Lexical Translation Model (NNLTM) [32], built using
a character-level Convolutional Neural Network (CNN) model.</p>
        <p>The key difference between our approach and those summarized in this section is the training
objective. In the CLIR techniques we have described, the first step leverages parallel text to
generate translation probabilities; those probabilities are then used in a modular way with some
(traditional or neural) ranked retrieval method. Our approach also learns from parallel text, but
with two key differences: a) the parallel texts in our case are the English and non-English versions
of the documents for which we have training judgments; and b) we jointly optimize translation
and retrieval by balancing multiple loss functions. Because these techniques generate conformal
representations, we can experiment with either early fusion, in which we combine alternative
ways of estimating term weights in the query language [33], or late fusion, in which we combine
ranked retrieval results. In this paper we experiment with Reciprocal Rank Fusion (RRF), a
late fusion technique [34], finding that our approach yields results complementary to those
obtained using PSQ.</p>
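        <p>Reciprocal Rank Fusion itself is simple enough to sketch. This is a minimal sketch assuming each system supplies a ranked list of document ids; k=60 is the constant from Cormack et al. [34].</p>
        <p>
```python
def reciprocal_rank_fusion(rankings, k=60):
    # Each system contributes 1/(k + rank) for every document it returns;
    # documents ranked highly by several systems accumulate the most mass.
    fused = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)
```
        </p>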
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Sparse Model for CLIR</title>
      <p>Here we describe the SPLADE framework and introduce the training procedure for our CLIR
extension, which we call SPLADE-X.</p>
      <p>3.1. SPLADE → SPLADE-X</p>
      <p>SPLADE is a sparse lexical expansion model that generates |V|-dimensional vectors for queries
and documents, where the weights represent the importance of the underlying terms.</p>
      <p>To generalize the model to a cross-language setting, let V_in be the vocabulary space of the
input text and V_out that of the output, where V = V_in = V_out in the original monolingual SPLADE
model. Given an input (either query or document) text sequence t of length n over V_in, SPLADE
uses a BERT Masked Language Model (MLM) head to get the term weights for every input
subword. Specifically, for an input subword t_i, the model generates the term weight w_ij for each
candidate output subword j ∈ V_out as
w_ij = transform(h_i)ᵀ E_j + b_j
(3)
where transform is a composition of a linear layer with GeLU activation and LayerNorm, applied to
the contextual BERT embedding h_i of t_i. Here E_j denotes the j-th row of the learnable BERT MLM
decoder matrix and b_j stands for the token-level bias.</p>
      <p>
        To produce an aggregate score for each candidate output token   ∈   , SPLADEv2 [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]
proposes max pooling over the input sequence dimension, as follows:
      </p>
      <p>w_j = max_i log(1 + ReLU(w_ij))
(4)</p>
      <p>While the original implementation of SPLADE used the FLOPS optimizer to sparsify the
representation, more recent work has proposed simply selecting the top-k dimensions from the
final |V_out|-sized vectors [23]. Owing to the simplicity of the approach, we use that top-k masking
approach in this paper.</p>
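      <p>Eqn. 4 and the top-k masking step can be sketched together. This is a minimal sketch under our own assumptions: per-token MLM logits arrive as plain lists of floats, and the function and variable names are ours, not SPLADE’s.</p>
      <p>
```python
import math

def splade_vector(token_logits, k):
    # token_logits: one row per input subword, one column per output-vocabulary
    # subword. Max-pool log(1 + ReLU(w_ij)) over input positions (Eqn. 4),
    # then keep only the k largest dimensions (top-k masking).
    vocab_size = len(token_logits[0])
    pooled = [max(math.log(1.0 + max(row[j], 0.0)) for row in token_logits)
              for j in range(vocab_size)]
    top = sorted(range(vocab_size), key=lambda j: pooled[j], reverse=True)[:k]
    return {j: pooled[j] for j in top if pooled[j] > 0.0}
```
      </p>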
      <p>Generalizing SPLADE to the CLIR setting, we create SPLADE-X by replacing the BERT
encoder with multilingual BERT (mBERT). Since the size of the mBERT vocabulary (110k) is
roughly 3x that of BERT (35k), we create a vocabulary mask that filters out any subword that
does not belong to the query language (in our case, English). This forms an output vocabulary
space V_out containing only English subwords. To do so, we tokenize the MS MARCO corpus (in
English) using the mBERT tokenizer and select only those subwords that contain alphanumeric
or punctuation characters.2 This gives us a list of 33k unique subwords that we use for
SPLADE-X modeling. In addition to constraining vocabulary size, this also forces the model to learn
cross-language expansions for non-English documents.</p>
      <sec id="sec-3-1">
        <title>3.2. Bilingual Training</title>
        <p>
          To train SPLADE-X, we propose a bilingual training recipe that utilizes the monolingual MS
MARCO [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] queries and passages along with translations of the MS MARCO passages [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] into the
document language that we plan to search. Here, we refer to English as the source language
s and non-English as the target language t, since that is the direction in which MS MARCO
was translated. Our training loss is composed of three components: a source-only loss (ℒs), a
target-only loss (ℒt), and a source-target loss (ℒst).
        </p>
        <p>For the source-only loss (ℒs), we use the ranking loss in Eqn. 2 with the English queries and
English passages, plus an additional distillation loss using the monolingual SPLADE model,
as proposed in Yang et al. [23]. The distillation loss enforces knowledge transfer from the
monolingual teacher to the multilingual student model. We compute the target-only loss (ℒt)
in the same way as the source-only loss, except that we replace the English passages with their
translated versions. We further introduce an alignment loss between source and target (ℒst)
that brings the representations of English passages and their translated versions closer, using an
MSE loss.</p>
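        <p>The alignment term can be sketched as an MSE over the shared English output vocabulary. This is a minimal sketch with dict-based sparse vectors and our own naming, not the actual training code.</p>
        <p>
```python
def alignment_loss(src_vec, tgt_vec):
    # MSE between the sparse vector of an English passage and that of its
    # translation, computed over the union of their non-zero dimensions;
    # absent terms count as weight 0.
    terms = set(src_vec) | set(tgt_vec)
    if not terms:
        return 0.0
    diffs = ((src_vec.get(t, 0.0) - tgt_vec.get(t, 0.0)) ** 2 for t in terms)
    return sum(diffs) / len(terms)
```
        </p>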
        <p>2If we had wanted to experiment with using non-English queries, we would have instead used the translated
MS MARCO corpora.</p>
        <p>Our final training loss is summarized as follows:
ℒ = ℒs + ℒt + ℒst
(5)</p>
        <p>We also experiment with a zero-shot variant with only the source loss, and a translate-train
variant with only the target loss.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <sec id="sec-4-1">
        <title>Test collection</title>
        <p>Here we describe test collections, training, evaluation and results.</p>
        <p>For evaluation we use CLIR test collections from the CLEF 2003 multilingual ad-hoc retrieval
task, with queries in English and news documents in German, French, Italian, or Spanish [35].3
Each collection includes 60 English topics, for which we use the title field as the query.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Training &amp; Inference setup</title>
        <p>For training SPLADE-X, we use the Tevatron toolkit4 that supports the HuggingFace
Transformers [36] library. We initialize the encoder using mBERT and train three variants of SPLADE-X:
zero-shot (ZS), translate-train (TT), and our proposed bilingual training (BI). We train the ZS
and TT variants using the Adam optimizer with a learning rate of 1e-5 and a batch size of 32
using 4 V100 GPUs for 100k steps. For our new BI variant, we train on 8 V100 GPUs for 60k
steps, keeping the rest of the parameters the same.</p>
        <p>We fix the query and passage lengths to be 32 and 128 respectively, and choose k to be 1%
of the total mBERT vocabulary size. For distillation, we use the publicly available
DistilSPLADE-max checkpoint,5 initialized using coCondenser-medium [37].</p>
        <p>For inference, we split the CLEF documents into overlapping passages of length 128 with a
stride of 42 tokens. We index the SPLADE-X passage term weights using Anserini [38]. To rank
documents, we first run a passage-retrieval task using the SPLADE-X queries and the indexed
passages using Anserini. The output is then passed through a score aggregation function that
selects the maximum passage score as the corresponding document score.</p>
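        <p>The score aggregation step can be sketched as follows; a minimal sketch assuming passage scores arrive as (document id, score) pairs, with hypothetical names.</p>
        <p>
```python
def aggregate_max_passage(passage_scores):
    # Document score = maximum score over its passages; returns documents
    # sorted by that score, best first.
    doc_scores = {}
    for doc_id, s in passage_scores:
        doc_scores[doc_id] = max(doc_scores.get(doc_id, float("-inf")), s)
    return sorted(doc_scores.items(), key=lambda kv: kv[1], reverse=True)
```
        </p>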
      </sec>
      <sec id="sec-4-3">
        <title>Baseline</title>
        <p>As a traditional alternative to our SPLADE-X variants, we also report a PSQ baseline. Specifically,
we implement a PSQ-based Query Likelihood model [39] to estimate the relevance of a document
in some other language given a query in English. To obtain the translation probabilities to be
used in PSQ, we rely on the word alignment output from the GIZA++ [40] aligner. We train
GIZA++ using a combination of parallel sentences from Europarl [41] and the Panlex [42]
dictionaries. For each language pair, we have 2.5 to 3 million sentence pairs for training.
Translation probabilities that are less than 1e-5 are filtered out.</p>
        <p>3We omit evaluation on translated MS MARCO collections due to their synthetic nature.
4https://github.com/texttron/tevatron
5http://download-de.europe.naverlabs.com/Splade_Release_Jan22/splade_distil_CoCodenser_medium.tar.gz</p>
      </sec>
      <sec id="sec-4-4">
        <title>Evaluation</title>
        <p>To evaluate our CLIR models and baseline we compute Mean Average Precision (MAP) using
trec_eval.6 For system combination, we perform Reciprocal Rank Fusion using TrecTools [43].
Differences in means are tested for significance using a two-tailed paired t-test (p&lt;0.05) with
Bonferroni-Holm correction.</p>
      </sec>
      <sec id="sec-4-5">
        <title>Results</title>
        <p>Table 1 shows results for different SPLADE-X variants and the PSQ baseline, which performs on
par with BM25 on human-translated queries (Mono. BM25) and machine-translated queries
(QMT BM25) using a Marian MT model.7 Contrary to findings in monolingual retrieval (where
comparisons are to a traditional monolingual retrieval baseline), we observe that none of the
SPLADE-X variants consistently outperforms the PSQ baseline. Comparing among the variants,
ZS performs worst, as expected, illustrating the challenges of relying on mBERT alone
to CLIRize a monolingual IR method. This aligns with the finding that tasks involving
cross-language input sequences are harder to generalize [44]. BI generally performs numerically better
than the TT variant, except in Italian, but none of those apparent differences are statistically
significant.</p>
        <p>When using Reciprocal Rank Fusion to combine each variant with the PSQ baseline, we observe
statistically significant improvements over PSQ alone for both TT and BI. Looking at the pattern
of results, we consistently see smaller improvements from Reciprocal Rank Fusion of PSQ
with BI (28%, from 0.317 to 0.405 in German) than of PSQ with ZS (59%, from 0.213 to 0.339 in
German), suggesting that BI may be learning some lexical translations from its more powerful
training scheme, and thus receives less advantage from PSQ when fusing. In the future, we plan to
explore early fusion of PSQ and SPLADE-X (BI) so as to also be able to take advantage
of their synergies during training.</p>
        <p>6https://github.com/usnistgov/trec_eval
7https://huggingface.co/Helsinki-NLP</p>
        <p>[Figure 1: A relevant French document, reporting the German spelling reform agreed by experts
from three countries (applicable from the end of 1995 and binding after a six-year transition from
2001), shown with the top-20 query-language terms assigned to it by PSQ and by SPLADE-X (BI).]</p>
        <p>Figure 1 shows a relevant French document and the top-20 terms in the PSQ and SPLADE-X
document vectors. It is interesting to see that both systems rank this document highly, despite
very different term weights. Among the top terms, each system covers two query terms,
whereas their union covers all of them. This might shed some light on why the combination
of the two systems improves over the individual systems, and perhaps can open up ways to
achieve a low indexing footprint with the union of the top-k entries from the two lists.</p>
      </sec>
      <sec id="sec-4-6">
        <title>The Curious Case of XLM-R</title>
        <p>Our choice of mBERT as the multilingual encoder for SPLADE-X was empirical, despite the
fact that XLM-R has been shown to outperform mBERT on several tasks [45]. Figure 2
shows Kernel Density Estimation (KDE) plots generated using the mBERT and XLM-R MLM
decoders for a specific query term. We observe that XLM-R has a relatively higher mean
than mBERT, which affects the SPLADE-X model because fewer terms are zeroed out by the
ReLU. To counteract this effect, we used an additional LayerNorm that normalizes the weights
following Eqn. 3. We then trained monolingual SPLADE models with both mBERT and
XLM-R embeddings on English MS MARCO and tested them on the English MS MARCO dev set.
We observed an MRR@10 of 0.347 for mBERT, compared to 0.295 for XLM-R. Investigating
the output generated by XLM-R, we see that the distribution of weights is quite uniform relative to
mBERT. In future work, we intend to investigate ways to get a sharper distribution of weights
from XLM-R models, which might in turn lead to more effective retrieval.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>We propose SPLADE-X, a sparse retrieval model that performs cross-language lexical expansion.
We introduce a joint bilingual training procedure using both the monolingual and the translated
MS MARCO collections, with an additional alignment loss between the two. Our experiments
show that our model performs on par with a strong PSQ baseline on several CLIR test collections
and, when combined with that baseline, performs significantly better than either system alone. In the
future, we would like to explore better ways to combine PSQ with SPLADE-X training. Another
direction is to investigate integrating multilingual models of the XLM-R family into the
SPLADE-X framework.
[20] S. MacAvaney, L. Soldaini, N. Goharian, Teaching a new dog old tricks: Resurrecting
multilingual retrieval using zero-shot learning, in: Proceedings of the 42nd European
Conference on Information Retrieval Research, 2020, pp. 246–254. URL: https://link.springer.
com/chapter/10.1007/978-3-030-45442-5_31. doi:10.1007/978-3-030-45442-5_31.
[21] P. Shi, J. Lin, Cross-lingual relevance transfer for document retrieval, arXiv preprint
arXiv:1911.02989 (2019).
[22] N. Reimers, I. Gurevych, Making monolingual sentence embeddings multilingual using
knowledge distillation, arXiv preprint arXiv:2004.09813 (2020).
[23] J.-H. Yang, X. Ma, J. Lin, Sparsifying sparse representations for passage retrieval by top-k
masking, arXiv preprint arXiv:2112.09628 (2021).
[24] J. Lin, A proposed conceptual framework for a representational approach to information
retrieval, arXiv preprint arXiv:2110.01529 (2021).
[25] V. Karpukhin, B. Oğuz, S. Min, P. Lewis, L. Wu, S. Edunov, D. Chen, W.-t. Yih, Dense
passage retrieval for open-domain question answering, arXiv preprint arXiv:2004.04906
(2020).
[26] Z. Wu, Y. Xiong, S. X. Yu, D. Lin, Unsupervised feature learning via non-parametric
instance discrimination, in: Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, 2018, pp. 3733–3742.
[27] K. He, H. Fan, Y. Wu, S. Xie, R. Girshick, Momentum contrast for unsupervised visual
representation learning, in: Proceedings of the IEEE/CVF Conference on Computer Vision
and Pattern Recognition, 2020, pp. 9729–9738.
[28] B. Paria, C.-K. Yeh, I. E. Yen, N. Xu, P. Ravikumar, B. Póczos, Minimizing FLOPs to learn
efficient sparse representations, arXiv preprint arXiv:2004.05665 (2020).
[29] N. Thakur, N. Reimers, A. Rücklé, A. Srivastava, I. Gurevych, BEIR: A heterogenous
benchmark for zero-shot evaluation of information retrieval models, arXiv preprint
arXiv:2104.08663 (2021).
[30] K. Darwish, D. W. Oard, Probabilistic structured query methods, in: Proceedings of
the 26th Annual International ACM SIGIR Conference on Research and Development in
Information Retrieval, 2003, pp. 338–344.
[31] J. Barry, E. Boschee, M. Freedman, S. Miller, SEARCHER: Shared embedding architecture
for effective retrieval, in: Proceedings of the Workshop on Cross-Language Search and
Summarization of Text and Speech (CLSSTS2020), European Language Resources Association,
Marseille, France, 2020, pp. 22–25. URL: https://aclanthology.org/2020.clssts-1.4.
[32] R. Zbib, L. Zhao, D. Karakos, W. Hartmann, J. DeYoung, Z. Huang, Z. Jiang, N. Rivkin,
L. Zhang, R. Schwartz, et al., Neural-network lexical translation for cross-lingual IR from
text and speech, in: Proceedings of the 42nd International ACM SIGIR Conference on
Research and Development in Information Retrieval, 2019, pp. 645–654.
[33] S. Nair, P. Galuscakova, D. W. Oard, Combining contextualized and non-contextualized
query translations to improve CLIR, in: Proceedings of the 43rd International ACM SIGIR
Conference on Research and Development in Information Retrieval, 2020, pp. 1581–1584.
[34] G. V. Cormack, C. L. Clarke, S. Buettcher, Reciprocal rank fusion outperforms condorcet and
individual rank learning methods, in: Proceedings of the 32nd International ACM SIGIR
Conference on Research and Development in Information Retrieval, 2009, pp. 758–759.
[35] M. Braschler, C. Peters, CLEF 2003 methodology and metrics, in: Comparative Evaluation
of Multilingual Information Access Systems, Springer Berlin Heidelberg, 2004, pp. 7–20.
[36] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf,
M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. L. Scao,
S. Gugger, M. Drame, Q. Lhoest, A. M. Rush, Transformers: State-of-the-art natural
language processing, in: Proceedings of the 2020 Conference on Empirical Methods in Natural
Language Processing: System Demonstrations, Association for Computational Linguistics,
Online, 2020, pp. 38–45. URL: https://www.aclweb.org/anthology/2020.emnlp-demos.6.
[37] L. Gao, J. Callan, Unsupervised corpus aware language model pre-training for dense
passage retrieval, arXiv preprint arXiv:2108.05540 (2021).
[38] P. Yang, H. Fang, J. Lin, Anserini: Enabling the use of lucene for information retrieval
research, in: Proceedings of the 40th International ACM SIGIR Conference on Research and
Development in Information Retrieval, SIGIR ’17, Association for Computing Machinery,
New York, NY, USA, 2017, pp. 1253–1256.
[39] J. Xu, R. Weischedel, Cross-lingual information retrieval using hidden Markov models, in:
2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and
Very Large Corpora, 2000, pp. 95–103.
[40] F. J. Och, H. Ney, A systematic comparison of various statistical alignment models,
Computational Linguistics (2003).
[41] P. Koehn, Europarl: A parallel corpus for statistical machine translation, in: Machine
Translation Summit, 2005, pp. 79–86.
[42] D. Kamholz, J. Pool, S. Colowick, PanLex: Building a resource for panlingual lexical
translation, in: Proceedings of the Ninth International Conference on Language Resources
and Evaluation (LREC’14), European Language Resources Association (ELRA), Reykjavik,
Iceland, 2014, pp. 3145–3150. URL: http://www.lrec-conf.org/proceedings/lrec2014/pdf/
1029_Paper.pdf.
[43] J. Palotti, H. Scells, G. Zuccon, TrecTools: An open-source Python library for information
retrieval practitioners involved in TREC-like campaigns, in: Proceedings of the 42nd
International ACM SIGIR Conference on Research and Development in Information Retrieval,
2019, pp. 1325–1328.
[44] Z. Wang, S. Mayhew, D. Roth, et al., Cross-lingual ability of multilingual BERT: An
empirical study, arXiv preprint arXiv:1912.07840 (2019).
[45] J. Hu, S. Ruder, A. Siddhant, G. Neubig, O. Firat, M. Johnson, XTREME: A massively
multilingual multi-task benchmark for evaluating cross-lingual generalisation, in: International
Conference on Machine Learning, PMLR, 2020, pp. 4411–4421.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          , BERT:
          <article-title>Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers),
          <source>Association for Computational Linguistics</source>
          , Minneapolis, Minnesota,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          . URL: https://aclanthology.org/N19-1423. doi:10.18653/v1/N19-1423.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G. W.</given-names>
            <surname>Furnas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. K.</given-names>
            <surname>Landauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Dumais</surname>
          </string-name>
          ,
          <article-title>The vocabulary problem in human-system communication</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>30</volume>
          (
          <year>1987</year>
          )
          <fpage>964</fpage>
          -
          <lpage>971</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Robertson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Walker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hancock-Beaulieu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gatford</surname>
          </string-name>
          ,
          <article-title>Okapi at TREC-3</article-title>
          ,
          <source>in: TREC</source>
          ,
          <year>1994</year>
          , pp.
          <fpage>109</fpage>
          -
          <lpage>123</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bajaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Campos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Craswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>McNamara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rosenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Stoica</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tiwary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          , MS MARCO:
          <article-title>A human generated machine reading comprehension dataset</article-title>
          ,
          <year>2018</year>
          . arXiv:1611.09268.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L. H.</given-names>
            <surname>Bonifacio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Campiotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Jeronymo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Lotufo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Nogueira</surname>
          </string-name>
          ,
          <article-title>mMARCO: A multilingual version of the MS MARCO passage ranking dataset</article-title>
          ,
          <source>arXiv preprint arXiv:2108.13897</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Nair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lawrie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Duh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>McNamee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Murray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mayfield</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. W.</given-names>
            <surname>Oard</surname>
          </string-name>
          ,
          <article-title>Transfer learning approaches for building cross-language dense retrieval models</article-title>
          ,
          <source>in: Advances in Information Retrieval: 44th European Conference on IR Research</source>
          , ECIR
          <year>2022</year>
          , Stavanger, Norway, April 10-14, 2022, Proceedings, Part I,
          <year>2022</year>
          , pp.
          <fpage>382</fpage>
          -
          <lpage>396</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zamani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dehghani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Croft</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Learned-Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          ,
          <article-title>From neural re-ranking to neural ranking: Learning a sparse representation for inverted indexing</article-title>
          ,
          <source>in: Proceedings of the 27th ACM International Conference on Information and Knowledge Management</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>497</fpage>
          -
          <lpage>506</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Callan</surname>
          </string-name>
          ,
          <article-title>Context-aware sentence/passage term importance estimation for first stage retrieval</article-title>
          ,
          <source>arXiv preprint arXiv:1910.10687</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mallia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Khattab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Suel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tonellotto</surname>
          </string-name>
          ,
          <article-title>Learning passage impacts for inverted indexes</article-title>
          ,
          <source>in: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>1723</fpage>
          -
          <lpage>1727</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <article-title>A few brief notes on DeepImpact, COIL, and a conceptual framework for information retrieval techniques</article-title>
          ,
          <source>arXiv preprint arXiv:2106.14807</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          ,
          <article-title>TILDE: Term independent likelihood model for passage re-ranking</article-title>
          ,
          <source>in: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>1483</fpage>
          -
          <lpage>1492</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Shang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Sparterm: Learning term-based sparse representation for fast text retrieval</article-title>
          ,
          <source>arXiv preprint arXiv:2010.00768</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>T.</given-names>
            <surname>Formal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Piwowarski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Clinchant</surname>
          </string-name>
          , SPLADE:
          <article-title>Sparse lexical and expansion model for first stage ranking</article-title>
          ,
          <source>in: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>2288</fpage>
          -
          <lpage>2292</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>T.</given-names>
            <surname>Formal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lassance</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Piwowarski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Clinchant</surname>
          </string-name>
          , SPLADE v2:
          <article-title>Sparse lexical and expansion model for information retrieval</article-title>
          ,
          <source>arXiv preprint arXiv:2109.10086</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Conneau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Khandelwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chaudhary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Wenzek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Guzmán</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Grave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>Unsupervised cross-lingual representation learning at scale</article-title>
          ,
          <year>2020</year>
          . arXiv:1911.02116.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>R.</given-names>
            <surname>Nogueira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <article-title>Document expansion by query prediction</article-title>
          ,
          <source>arXiv preprint arXiv:1904.08375</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>R.</given-names>
            <surname>Nogueira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          , From doc2query to docTTTTTquery,
          <source>Technical Report</source>
          , University of Waterloo,
          <year>2019</year>
          . URL: https://cs.uwaterloo.ca/~jimmylin/publications/Nogueira_Lin_2019_docTTTTTquery-v2.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.-L.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <article-title>Cross-lingual natural language generation via pre-training</article-title>
          ,
          <source>Proceedings of the AAAI Conference on Artificial Intelligence</source>
          <volume>34</volume>
          (
          <year>2020</year>
          )
          <fpage>7570</fpage>
          -
          <lpage>7577</lpage>
          . URL: https://ojs.aaai.org/index.php/AAAI/article/view/6256. doi:10.1609/aaai.v34i05.6256.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>P.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>Cross-lingual training with dense retrieval for document</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>