<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Doc2Token: Bridging Vocabulary Gap by Predicting Missing Tokens for E-commerce Search</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kaihao Li</string-name>
          <email>kaihao.li@walmart.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Juexin Lin</string-name>
          <email>juexin.lin@walmart.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tony Lee</string-name>
          <email>tony.lee@walmart.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Walmart Global Technology</institution>,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Addressing the “vocabulary mismatch” issue in information retrieval is a central challenge for e-commerce search engines, because product pages often miss important keywords that customers search for. Doc2Query [1] is a popular document-expansion technique that predicts search queries for a document and includes the predicted queries with the document for retrieval. However, this approach can be inefficient for e-commerce search, because the predicted query tokens are often already present in the document. In this paper, we propose Doc2Token, a technique that predicts relevant tokens (instead of queries) that are missing from the document and includes these tokens in the document for retrieval. For the task of predicting missing tokens, we introduce a new metric, the “novel ROUGE score”. Doc2Token is demonstrated to be superior to Doc2Query in terms of novel ROUGE score and diversity of predictions. Doc2Token also exhibits efficiency gains by reducing both training and inference times. We deployed the feature to production, observed a significant revenue gain in an online A/B test, and launched the feature to full traffic on Walmart.com.</p>
      </abstract>
      <kwd-group>
        <kwd>Document Expansion</kwd>
        <kwd>Information Retrieval</kwd>
        <kwd>E-commerce Search</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>The vocabulary gap problem in e-commerce search is a central challenge, as it arises from
discrepancies between the vocabulary used by customers and sellers when describing products.
Customer queries are often short and ambiguous, while product descriptions tend to be more
detailed and explicit. For instance, a customer might search for “small building set”, intending to
find a set that offers simpler building experiences for young children. However, in the product
catalog, those products are often characterized by piece count and target age group, which do
not align directly with this search query.</p>
      <p>
        Different approaches have been proposed to address the vocabulary mismatch issue. In the
context of lexical retrieval, query expansion [
        <xref ref-type="bibr" rid="ref2 ref3 ref4 ref5 ref6">2, 3, 4, 5, 6</xref>
        ] and document expansion [
        <xref ref-type="bibr" rid="ref1 ref7 ref8">1, 7, 8</xref>
        ]
are two effective techniques. Query expansion enriches user queries with additional terms
or synonyms to better capture the user’s intent, while document expansion enriches product
information with additional keywords or phrases. Doc2Query [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] is a document expansion
technique that predicts and indexes search queries for documents. Recently, embedding-based
dense retrieval models [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] have demonstrated their ability to align queries and documents
by projecting them into a representation space to learn their semantic similarity. Although
dense retrieval with approximate nearest neighbor search has shown impressive results, lexical
retrieval remains an important component of e-commerce search due to its desirable properties,
such as interpretability, scalability, and its handling of rare words and numerical tokens.
      </p>
      <p>eCom’24: ACM SIGIR Workshop on eCommerce, July 18, 2024, Washington, D.C., USA. © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>In this paper, we propose a new document expansion technique, called Doc2Token, which
generalizes Doc2Query for application in e-commerce search, as depicted in Figure 1. Our task
is to, given a product, generate relevant keywords that are absent from the product’s indexed
metadata to ensure that the product is retrieved when customers search using these keywords.
We call these “novel tokens”. We observed that Doc2Query’s predicted queries often contain
tokens already in the product metadata rather than novel tokens, which makes it inefficient for
our task. In contrast, Doc2Token is designed to predict novel tokens (instead of queries). The
approach is to prepare a dataset of product and novel-token pairs, then train a seq2seq model
on that dataset. By design, Doc2Token efficiently generates tokens with a high probability of
being novel, rather than producing long and redundant sequences as in Doc2Query. Using
the product shown in Figure 1 as an example, Doc2Query predicts “6 year old boy toy”, “3
in 1 creator”, “sea animal toy”, and “building toy for boy”. On the other hand, Doc2Token
provides a more diversified set of tokens, including “small”, “kit”, and “tank”. To incorporate
the Doc2Token predictions into the search system, we added them to the product’s metadata
and indexed them for retrieval matching and ranking.</p>
      <p>
        To assess performance on the novel-token-prediction task, we introduce a new evaluation
metric called “novel ROUGE score”, denoted by “nROUGE”, to measure the ROUGE score [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
specifically for novel tokens. Our results indicate that Doc2Token surpasses Doc2Query in terms
of nROUGE score. Regarding efficiency, Doc2Token is capable of predicting more diverse results
while significantly reducing training and inference times. The effectiveness of this approach
is further demonstrated through online relevance evaluation and A/B testing, confirming that
the generated tokens are not only novel but also relevant to the products, ultimately driving
customer engagement.
      </p>
      <p>Our contributions are summarized as follows:
• We propose a novel technique, Doc2Token, for document expansion in e-commerce search, encompassing both the training setup and the modification of the loss function.
• We introduce a new metric, the “novel ROUGE score”, to evaluate the performance of predicting novel tokens.
• We demonstrate that Doc2Token achieves improvements in both effectiveness and efficiency compared to Doc2Query.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Methodology</title>
      <p>We define the task of novel token prediction in this section. For a product p, let d and V represent
the product text and the set of tokens extracted from d, respectively. The goal is to predict
novel tokens, i.e., tokens that are absent from V. To achieve this, we first collect a list of relevant
queries (Section 3.1) for the product based on historical search logs. Next, for each product, we
assemble a set of unique tokens, disregarding their sequence, through the following process.
We concatenate all queries, divide the concatenated sequence into individual tokens, count
their frequencies, and exclude tokens already present in product d. As a result, for product p,
we have a target token set ⋃<sub>j</sub> {(t<sub>j</sub>, f<sub>j</sub>)}, where t<sub>j</sub> ∉ V represents the jth unique token from the target
queries for product p, and f<sub>j</sub> denotes the frequency of token t<sub>j</sub>. Instead of using all tokens as
one training target, we divide them into separate training instances.</p>
      <p>
        We train a seq2seq generative language model, T5 [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], with an encoder-decoder structure. It takes the product text d as
input and outputs a sequence of tokens t̂ in an autoregressive manner. More formally,
      </p>
      <p>t̂ = Decoder(Encoder(d)). (1)</p>
      <p>To account for the token frequency, we modify the T5 loss as follows. The weighted loss for a
product-token pair is
L<sub>weighted</sub>(t<sub>j</sub>, t̂) = (f<sub>j</sub>)<sup>α</sup> · L<sub>T5</sub>(t<sub>j</sub>, t̂), (2)
where α is the smoothing factor, set to 0.5 in our implementation. This value is chosen to balance
the contribution of token frequency to the overall loss. In practice, we use the cross-entropy
loss for L<sub>T5</sub>.</p>
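      <p>As an illustration of the frequency-weighted loss described above, the following sketch (plain Python; the function names and token probabilities are ours, with a simple negative log-likelihood standing in for the T5 cross-entropy) shows how the exponent 0.5 damps the influence of high-frequency tokens:</p>
      <preformat>
```python
import math

def t5_cross_entropy(subtoken_probs):
    # Illustrative stand-in for the seq2seq cross-entropy loss:
    # negative log-likelihood of the target token's subtokens.
    return -sum(math.log(p) for p in subtoken_probs)

def weighted_loss(freq, subtoken_probs, alpha=0.5):
    # L_weighted(t_j, t_hat) = (f_j)^alpha * L_T5(t_j, t_hat)
    return (freq ** alpha) * t5_cross_entropy(subtoken_probs)

probs = [0.8, 0.9]  # hypothetical model probabilities for a token's subtokens
base = t5_cross_entropy(probs)
# A token appearing 9 times in the target queries contributes 3x (9**0.5),
# not 9x, the loss of a token appearing once, damping head-token dominance.
assert abs(weighted_loss(9, probs) - 3 * base) < 1e-9
```
      </preformat>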
      <p>
        For model inference, we employ beam search [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] to generate the top N predictions. Since
T5 tokenizes words into subtokens [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], the target token t<sub>j</sub> and predicted token t̂ may consist
of multiple subtokens, although they are always single words. We utilize the beam score [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] to
determine the confidence of a prediction, and we only retain predictions with scores greater
than a predetermined cutoff value (more details are discussed in Section 3.4).
      </p>
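      <p>The cutoff step amounts to thresholding the per-prediction beam scores; a minimal sketch (the tokens and score values below are made up for illustration, and in practice the scores come from the beam search):</p>
      <preformat>
```python
def filter_by_beam_score(predictions, cutoff):
    # Keep only predictions whose beam score exceeds the tuned cutoff.
    return [token for token, score in predictions if score > cutoff]

# Hypothetical top-N beam-search output as (token, beam score) pairs.
top_n = [("float", 0.62), ("kid", 0.41), ("floaty", 0.35), ("life", 0.21)]
kept = filter_by_beam_score(top_n, cutoff=0.33)  # -> ["float", "kid", "floaty"]
```
      </preformat>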
    </sec>
    <sec id="sec-4">
      <title>3. Experiments</title>
      <sec id="sec-4-1">
        <title>3.1. Datasets</title>
        <p>We sampled product-query pairs from user engagement data on Walmart.com with at least
a certain number of add-to-carts (ATCs) over a two-year period. We then applied the following
preprocessing steps. The product information used throughout the experiments includes product
title, product type, brand, color, gender, and description.</p>
        <sec id="sec-4-1-1">
          <title>Preprocessing</title>
          <p>The preprocessing stages are: none; RF; RF + FMF + PTF (Doc2Query); and RF + FMF + PTF + tokenization + OTF (Doc2Token). The filters are described below.</p>
          <p>
            Relevance filter (RF). The engaged products are not always relevant to the search query, because
users’ decisions are influenced by factors other than relevance, such as price, visual appeal,
ranking, etc. Additionally, the minimal match criterion [
            <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
            ] for our lexical retrieval is not
always 100%. As a result, customers may be shown products that do not fully match their search
terms. For instance, a customer may search for “vanilla ice cream” but end up buying chocolate
ice cream. To mitigate such noise, we removed product-query pairs predicted to be irrelevant
by a relevance model. (The relevance model is a BERT [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ] cross-encoder with
a classifier head that takes the query and product information as input and outputs a relevance
score. It was trained on manually annotated relevance data.)
Full match filter (FMF). For the purpose of predicting novel tokens, we focused on product-query
pairs with a vocabulary mismatch, meaning that at least one query token was not found in
the product information. Thus, we removed pairs where all query tokens were in the product
information.
          </p>
          <p>Price token filter (PTF). Customer queries sometimes include price and deal intent (e.g., “under
$500” or “on sale”). Such phrases are not very useful for our task, since prices and deals can
fluctuate rapidly, so it does not make sense to include them as training labels. We utilized
regular expressions to identify and eliminate these phrases from the queries.</p>
          <p>Overlapping token filter (OTF). This step excludes all query tokens that are present in the product
information. (This is a stronger extension of the full match filter.) After this step, only novel
tokens remain.</p>
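          <p>The FMF, PTF, and OTF steps can be sketched as follows (a simplified illustration: whitespace tokenization and a toy price/deal regex stand in for our production tokenizer and patterns, and the relevance filter, which requires a trained model, is omitted):</p>
          <preformat>
```python
import re
from collections import Counter

# Toy stand-in for the production price/deal patterns.
PRICE_DEAL = re.compile(r"\bunder\s*\$?\d+\b|\$\d+|\bon sale\b|\bclearance\b")

def novel_token_targets(product_text, queries):
    doc_tokens = set(product_text.lower().split())
    counts = Counter()
    for query in queries:
        query = PRICE_DEAL.sub(" ", query.lower())   # PTF: drop price/deal intent
        tokens = query.split()
        if all(t in doc_tokens for t in tokens):     # FMF: skip fully matched pairs
            continue
        for t in tokens:
            if t not in doc_tokens:                  # OTF: keep only novel tokens
                counts[t] += 1
    return counts                                    # token -> frequency f_j

product = "toddler swim vest for boys and girls"
queries = ["toddler floaties", "swim vest for kids", "kids floaties under $20"]
targets = novel_token_targets(product, queries)
# "floaties" and "kids" each survive with frequency 2; "swim", "vest", and
# the price phrase "under $20" are filtered out.
```
          </preformat>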
          <p>Detailed data statistics can be found in Table 1. In later sections, we show results for
both Doc2Query and Doc2Token. For the Doc2Query dataset, we applied the first three filters,
resulting in 14.9M product-query pairs. For the Doc2Token dataset, we built upon the Doc2Query
dataset by further dividing the queries into tokens and applying the overlapping token filter,
resulting in 10.3M product-token pairs. For each dataset, we partitioned it by product into
training, validation, and test sets in an 8:1:1 ratio to ensure no product overlap between the sets.</p>
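          <p>Partitioning by product (rather than by pair) is what prevents leakage between the splits; one way to realize it, sketched here with a hash-based assignment of our own choosing (the paper does not specify the mechanism), is:</p>
          <preformat>
```python
import hashlib

def split_by_product(pairs, ratios=(8, 1, 1)):
    # Partition (product_id, token) pairs into train/val/test by hashing the
    # product id, so every pair for a given product lands in the same split.
    total = sum(ratios)
    splits = ([], [], [])
    for product_id, token in pairs:
        h = int(hashlib.md5(product_id.encode()).hexdigest(), 16) % total
        bucket = 0 if h < ratios[0] else (1 if h < ratios[0] + ratios[1] else 2)
        splits[bucket].append((product_id, token))
    return splits

pairs = [("p1", "floaties"), ("p1", "kids"), ("p2", "tank")]
train, val, test = split_by_product(pairs)
# All of p1's pairs end up in the same split, ensuring no product overlap.
```
          </preformat>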
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Metrics</title>
        <p>
          To evaluate model performance, we utilize the standard ROUGE score, a widely used evaluation
metric for summarization tasks [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. The ROUGE score assesses the quality of generated text by
comparing it to a reference text, considered as ground truth, using n-gram overlaps. Unlike the
summarization task, which may require complex evaluations based on n-grams, lexical retrieval
primarily relies on unigrams. Therefore, we measure the ROUGE scores in terms of unigrams.
In our context, we formulate the ROUGE score as follows:
        </p>
        <p>ROUGE-precision = (1/N) Σ<sub>i</sub> [Σ<sub>t∈y<sub>i</sub></sub> match(t, ŷ<sub>i</sub>)] / len(ŷ<sub>i</sub>), (3)</p>
        <p>ROUGE-recall = (1/N) Σ<sub>i</sub> [Σ<sub>t∈y<sub>i</sub></sub> match(t, ŷ<sub>i</sub>)] / len(y<sub>i</sub>), (4)</p>
        <p>where y<sub>i</sub> and ŷ<sub>i</sub> represent the reference text and predicted text for product i, respectively; len(·)
denotes the number of tokens; match(t, ŷ<sub>i</sub>) denotes the number of co-occurrences of a reference
token t in the predicted text; and N is the number of products.</p>
        <p>However, a higher ROUGE score does not necessarily indicate better performance at
predicting novel tokens. We observed that text predictions with high ROUGE scores often exhibit
substantial token overlap with the product information. To assess the performance for novel
tokens, we introduce a new metric, “novel ROUGE score” (nROUGE), where the reference text
consists solely of novel tokens. Formally, the nROUGE score is defined as follows.</p>
        <p>nROUGE-precision = (1/N) Σ<sub>i</sub> [Σ<sub>t∈y*<sub>i</sub></sub> match(t, ŷ<sub>i</sub>)] / len(ŷ<sub>i</sub>), (5)</p>
        <p>nROUGE-recall = (1/N) Σ<sub>i</sub> [Σ<sub>t∈y*<sub>i</sub></sub> match(t, ŷ<sub>i</sub>)] / len(y*<sub>i</sub>), (6)</p>
        <p>where y*<sub>i</sub> represents the novel reference text for product i, i.e., the tokens in the reference text
but not in the product information.</p>
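        <p>The unigram ROUGE and nROUGE scores can be computed as in this sketch (set-based unigram matching for simplicity; the function names and example tokens are ours):</p>
        <preformat>
```python
def unigram_prf(reference, predicted):
    # Unigram precision, recall, and F1 between two token lists.
    ref, pred = set(reference), set(predicted)
    hits = len(ref & pred)
    p = hits / len(pred) if pred else 0.0
    r = hits / len(ref) if ref else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def nrouge(reference, predicted, doc_tokens):
    # nROUGE restricts the reference to novel tokens, i.e. reference
    # tokens absent from the product information.
    novel_ref = [t for t in reference if t not in set(doc_tokens)]
    return unigram_prf(novel_ref, predicted)

doc = ["toddler", "swim", "vest"]    # indexed product tokens
ref = ["swim", "floaties", "kids"]   # engaged-query tokens
pred = ["floaties", "kids", "baby"]  # model predictions
# Standard ROUGE rewards predicting "swim", which is already indexed;
# nROUGE only credits the novel tokens "floaties" and "kids".
```
        </preformat>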
      </sec>
      <sec id="sec-4-3">
        <title>3.3. Experiment Setup</title>
        <p>For model training, we fine-tuned the public T5-base model using eight Nvidia A100 GPUs, a
learning rate of 1e-4, a batch size of 64, a maximum input sequence length of 256, and a maximum
output sequence length of 32. For model inference, we employed a top-k beam search strategy with a beam
size of 10. The input to the model is a text of product information consisting of the product title
and attributes such as brand, color, etc.</p>
      </sec>
      <sec id="sec-4-4">
        <title>3.4. Results</title>
        <p>In this section, we compare our proposed Doc2Token model to the baseline Doc2Query model
from both effectiveness and efficiency perspectives. To assess effectiveness, we measure
performance based on the ROUGE and nROUGE scores. In terms of efficiency, we report the resources
used for model training and inference. We evaluated four models: both the Doc2Query
and Doc2Token models, each with and without the full-match filter in the data preprocessing step.</p>
        <p>,

)
,

)</p>
        <p>,

)
(3)
(4)
(5)
(6)</p>
        <sec id="sec-4-4-1">
          <title>3.4.1. Offline evaluation results</title>
          <p>
In Table 2, we present the results for each model based on the top 10 predictions with various
beam score cutoffs. The predictions were chosen if their beam scores exceeded the respective
cutoff value. These cutoff values were tuned, for each model, to achieve the optimal nROUGE
F1 score. Additionally, to assess the models’ efficiency in predicting novel tokens, we concatenated
the predictions and calculated the total number of predicted tokens and the number of predicted
novel tokens. For the Doc2Query models, we present the result without any beam score cutoff,
as well as the result that achieves the highest nROUGE F1 score. For the Doc2Token models,
we report three results: one without any cutoff, one with the optimal nROUGE F1 score, and
one generating a similar number of novel tokens as the optimal Doc2Query model.</p>
          <p>To evaluate the effectiveness of the full-match filter in data preprocessing, we trained the
models without incorporating that step. We observed no apparent impact on the ROUGE
F1 score. However, there was a substantial improvement in the nROUGE F1 score for both
Doc2Query (from 0.438 to 0.481) and Doc2Token (from 0.469 to 0.500) at the optimal cutoff
values. This is expected, as with the full-match filter, our training data primarily relies on labels
containing novel tokens.</p>
          <p>In our comparison between the Doc2Query and Doc2Token models, we observed that the
Doc2Query models tend to achieve higher ROUGE F1 scores than the Doc2Token models.
However, Doc2Token excels in achieving superior nROUGE F1 scores. Comparing the models
with a similar number of predicted novel tokens, for example, the Doc2Query model with a
cutoff of 0.51 and the Doc2Token model with a cutoff of 0.29, the Doc2Token model outperforms the
Doc2Query model in both nROUGE precision and nROUGE recall, yielding a higher nROUGE F1
score. With optimal cutoffs, the Doc2Token model shows superior performance compared
to the Doc2Query model, achieving a 3.95% higher nROUGE F1 score (from 0.481 to 0.500).
This improvement is statistically significant, with a 95% confidence interval for the Doc2Token
F1 score of (0.498, 0.501) obtained through bootstrap resampling. Moreover, the Doc2Token
model is more efficient in generating novel tokens, achieving nearly 100% of predicted tokens
being novel. In contrast, Doc2Query produces only 20% novel tokens, indicating a higher degree
of redundancy. This is expected, as the Doc2Token model is designed to predict more diverse
novel tokens.</p>
        </sec>
        <sec id="sec-4-4-6">
          <title>3.4.2. Model efficiency</title>
          <p>
Table 3 presents the results for training and inference times. The training time is primarily
affected by the size of the training data. Without the full-match filter, splitting queries into
tokens explodes the data size, resulting in a longer training time for Doc2Token compared
to Doc2Query. However, with the full-match filter, the situation changes: the Doc2Token
strategy significantly reduces the dataset size, leading to shorter training times than Doc2Query.
For inference time, we sampled 100,000 products from the test dataset and conducted model
inference on the top 10 results with a batch size of 16 using a single K80 GPU machine. The
inference time for Doc2Token is faster than that for Doc2Query, as the output of Doc2Token is
generally shorter. The results are in agreement with the efficiency discussions from Table 2.</p>
        </sec>
      </sec>
      <sec id="sec-4-5">
        <title>3.5. Examples</title>
        <sec id="sec-4-5-1">
          <title>Table 4: Example products and predictions</title>
          <p>Product input title: Toddler Floaties, Swim Vest for Boys and Girls Age 2-7 Years Old, 20-50 Pounds Children Water Wings Arm Floaties in Puddle/Sea/Pool/Beach (Dinosaur); brand: Dark Lightning; color: Blue; gender: Unisex</p>
          <p>Doc2Query: “swimming vest for kid”, “toddler boy swim vest”, “swim vest for kid”, “boy floaty”, “kid floaty”</p>
          <p>Doc2Token: “float”, “kid”, “floaty”, “floater”, “salvavida”, “swimmy”, “baby”, “floatation”, “children”, “life”</p>
          <p>Product input title: Hanno Muller-Brachmann - North German Poets - Classical - CD; brand: Artists; color: white</p>
          <p>Doc2Query: “classical cds”, “germany cd”, “country music cd”, “north germany cd”, “west germany cd”</p>
          <p>Doc2Token: “music”, “country”, “5”, “b”, “classical”, “christmas”, “soundtrack”</p>
          <p>
            Table 4 showcases two example products along with their corresponding Doc2Query and
Doc2Token predictions. The Doc2Query model produces the top 5 queries, while the Doc2Token
model generates the top 10 tokens. The novel tokens, after stemming [
            <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
            ],
are shown in bold. In general, the Doc2Query model produces queries containing tokens that are
already present in the product information, whereas all
tokens produced by the Doc2Token model are relevant and absent from the product. In the first
positive example, the Doc2Token model is capable of predicting a Spanish word “salvavida”
(“lifeguard” in English), indicating its ability to handle Spanish queries. Queries in Spanish are
commonly observed in US e-commerce search. The second example, a product from northern
Germany, illustrates some bad predictions from the models. The predicted tokens “country”,
“west”, “christmas” are irrelevant. This is mainly due to a lack of media-related data in our
training set. Replacing T5 with a more knowledgeable LLM could potentially address this issue.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4b">
      <title>4. Implementation and online tests</title>
      <p>
        We implemented the Doc2Token model in production because of its superior performance
compared to the Doc2Query model, as shown in Section 3.4. We ran the model inference on
all products in our catalog using cost-effective K80 GPUs and a batch size of 16. We predicted
the top 10 tokens and retained the predictions with scores above 0.33. The inference process
is conducted offline on a daily basis. For online usage, the Doc2Token predictions serve as
an additional text matching field in Solr [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], an enterprise search platform built on
Apache Lucene [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] that is used for search retrieval at Walmart.com.
      </p>
      <p>We evaluated the performance of the Doc2Token feature from both relevance and engagement
perspectives. For the relevance evaluation, we enlisted human annotators to assess the top
10 ranked products from impacted queries. These assessments were based on a three-point
scale (exact match, substitute, irrelevant), considering factors such as product title, image, and
product page at Walmart.com. We then computed NDCG@10 based on this three-point scale,
showing a 0.49% lift (p-value = 0.066). For engagement assessment, we conducted a two-week
A/B test for the feature on live traffic. The test revealed a statistically significant 0.28% lift
in revenue (p-value = 0.013). While the NDCG@10 improvement is statistically marginal,
the statistically significant revenue increase demonstrates the effectiveness of the Doc2Token
feature. By introducing relevant products in the retrieval process, the Doc2Token feature
enhances the end-to-end search results, helping customers find what they are searching for.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>In this study, we present Doc2Token, a novel document expansion technique for e-commerce
search engines. We introduce the novel ROUGE score, a new metric crafted to evaluate the
efficacy of document expansion efforts. Our analysis demonstrates that Doc2Token
surpasses Doc2Query in terms of efficiency and effectiveness in addressing the vocabulary
mismatch challenge. The Doc2Token feature has been deployed and evaluated online, resulting
in a significant improvement in both relevance and revenue.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Nogueira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <article-title>Document expansion by query prediction</article-title>
          ,
          <source>arXiv preprint arXiv:1904.08375</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Rocchio</surname>
          </string-name>
          ,
          <article-title>Relevance feedback in information retrieval</article-title>
          , in: G. Salton (Ed.),
          <article-title>The Smart retrieval system - experiments in automatic document processing</article-title>
          , Englewood Cliffs, NJ: Prentice-Hall,
          <year>1971</year>
          , pp.
          <fpage>313</fpage>
          -
          <lpage>323</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <article-title>WordNet: a lexical database for English</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>38</volume>
          (
          <year>1995</year>
          )
          <fpage>39</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lv</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhai</surname>
          </string-name>
          ,
          <article-title>A comparative study of methods for estimating query language models with pseudo feedback</article-title>
          ,
          <source>in: Proceedings of the 18th ACM conference on Information and knowledge management</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>1895</fpage>
          -
          <lpage>1898</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Stenetorp</surname>
          </string-name>
          ,
          <article-title>Query expansion using contextual clue sampling with language models</article-title>
          ,
          <source>arXiv preprint arXiv:2210.07093</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <article-title>Query2doc: Query expansion with large language models</article-title>
          ,
          <source>arXiv preprint arXiv:2303.07678</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Nogueira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>From doc2query to docTTTTTquery</article-title>
          ,
          <source>Online preprint 6</source>
          (
          <year>2019</year>
          ) 2.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>T.</given-names>
            <surname>Formal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Piwowarski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Clinchant</surname>
          </string-name>
          , Splade:
          <article-title>Sparse lexical and expansion model for first stage ranking</article-title>
          ,
          <source>in: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>2288</fpage>
          -
          <lpage>2292</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-F.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bennett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Overwijk</surname>
          </string-name>
          ,
          <article-title>Approximate nearest neighbor negative contrastive learning for dense text retrieval</article-title>
          ,
          <source>arXiv preprint arXiv:2007.00808</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C.-Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>ROUGE: A package for automatic evaluation of summaries</article-title>
          ,
          <source>in: Text Summarization Branches Out</source>
          ,
          <year>2004</year>
          , pp.
          <fpage>74</fpage>
          -
          <lpage>81</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C.</given-names>
            <surname>Raffel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Matena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Exploring the limits of transfer learning with a unified text-to-text transformer</article-title>
          ,
          <source>The Journal of Machine Learning Research</source>
          <volume>21</volume>
          (
          <year>2020</year>
          )
          <fpage>5485</fpage>
          -
          <lpage>5551</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <article-title>Breaking the beam search curse: A study of (re-) scoring methods and stopping criteria for neural machine translation</article-title>
          ,
          <source>arXiv preprint arXiv:1808.09582</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kudo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Richardson</surname>
          </string-name>
          ,
          <article-title>SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing</article-title>
          ,
          <source>arXiv preprint arXiv:1808.06226</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D.</given-names>
            <surname>Shahi</surname>
          </string-name>
          ,
          <source>Apache Solr</source>
          , Springer,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <source>Apache Lucene</source>
          , http://lucene.apache.org,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>CoRR abs/1810.04805</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>A.</given-names>
            <surname>See</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <article-title>Get to the point: Summarization with pointer-generator networks</article-title>
          ,
          <source>arXiv preprint arXiv:1704.04368</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>