=Paper= {{Paper |id=Vol-3784/short5 |storemode=property |title=A Product-Aware Query Auto-Completion Framework for E-Commerce Search via Retrieval-Augmented Generation Method |pdfUrl=https://ceur-ws.org/Vol-3784/short5.pdf |volume=Vol-3784 |authors=Fangzheng Sun,Tianqi Zheng,Aakash Kolekar,Rohit Patki,Hossein Khazaei,Xuan Guo,Ziheng Cai,David Liu,Ruirui Li,Yupin Huang,Dante Everaert,Hanqing Lu,Garima Patel,Monica Cheng |dblpUrl=https://dblp.org/rec/conf/ir-rag/SunZKPKGCL0HELC24 }}
                         A Product-Aware Query Auto-Completion Framework for
                         E-Commerce Search via Retrieval-Augmented Generation
                         Method
                         Fangzheng Sun*,† , Tianqi Zheng† , Aakash Kolekar, Rohit Patki, Hossein Khazaei, Xuan Guo,
                         Ziheng Cai, David Liu, Ruirui Li, Yupin Huang, Dante Everaert, Hanqing Lu, Garima Patel and
                         Monica Cheng
                         Amazon Search, Palo Alto, CA, USA


                                         Abstract
Query Auto-Completion (QAC) is a fundamental component of the user search experience on e-commerce websites. It assists in finding user-intended products by automatically presenting search queries as users type in the search bar. Traditional QAC systems build upon query popularity to suggest a list of potential completions, but they fall short for unforeseen search prefixes. A generative Large Language Model (LLM) can complete even unforeseen prefixes, but the relevance of the generated suggestions to the product catalog is not guaranteed. To the best of our knowledge, there is no existing study using LLMs to generate product-aware search query completion suggestions.
                                             This paper proposes a generative approach named "Product-RAG", to incorporate product metadata and adapt Retrieval Augmented
                                         Generation (RAG) in the development of QAC systems. Product-RAG contains two components: (1) a retrieval model that identifies
                                         top-K most relevant products from the product catalog given a user-input prefix, and (2) a generative model that offers suggestions
                                         based on both the given prefix and the retrieved product metadata. We evaluate this approach for its ability to match user-input prefixes
                                         to user-intended products, using the metrics of ROUGE scores, Mean Reciprocal Rank (MRR) and Hit Ratio (HR) in downstream product
                                         search. We observe that the proposed Product-RAG approach outperforms state-of-the-art generative models in auto-completing
                                         e-commerce search queries.

                                         Keywords
                                         Query Auto-Complete, Retrieval-Augmented Generation, E-Commerce, Product-aware



1. Introduction

Query auto-completion (QAC) [1, 2, 3, 4, 5] refers to an information retrieval system for search engines that, given partial context typed by the user (i.e., a prefix), offers one or multiple query suggestions to the user. In modern e-commerce, where user experience is pivotal, QAC stands as an important feature shaping the way consumers interact with search engines and plays a crucial role in smoothing all the downstream shopping experiences [6, 7, 8, 9]. By leveraging personalized signals, product-related knowledge, and advanced recommendation algorithms, QAC not only accelerates the search experience but also ensures that users receive tailored suggestions based on their unique preferences.

One major challenge of QAC tasks in e-commerce is to understand user shopping intent from an incomplete search query and to provide relevant auto-complete suggestions. A typical production QAC system works as follows: given a prefix entered by a user, the system obtains a collection of queries satisfying the prefix from the query log and adopts a selection process, often based on forecasted popularity, to select candidate queries to send to the query ranker [10, 11, 12]. This framework lacks an understanding of user shopping intent. To give more weight to users' intents and to provide more personalized and relevant QAC suggestions, a number of works explore context-aware and personalized QAC systems [13, 14, 15, 16, 17].

Another challenge for QAC in e-commerce is to attain product awareness by recognizing and predicting users' intent related to specific products, brands, or categories, and by providing auto-complete suggestions that align with the user's potential shopping targets. When the query log falls short for unseen or rarely seen prefixes, product knowledge is particularly helpful for predicting users' shopping intent and generating corresponding suggestions (e.g., in the case of Figure 1). Nevertheless, in spite of efforts to understand users' shopping intent with product catalogs or product attributes [18, 19], we could not find any work bridging the gap between partially complete search queries and product knowledge for e-commerce QAC systems. Herein we propose a generative approach, named Product-RAG, that leverages product knowledge in e-commerce QAC systems based on the Retrieval-Augmented Generation (RAG) framework [20] and is capable of improving QAC systems by providing accurate auto-completion sug-

Figure 1: A search of "guidebook for edible mus" returns an empty QAC suggestion list while multiple related products are available.

IR-RAG @ SIGIR24 workshop: The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, July 18, 2024, Washington D.C., USA
* Corresponding author.
† These authors contributed equally.
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).



                      Figure 2: Schematic architecture of Product-RAG model for retrieving top-K products relevant to a prefix and generating K
                      product-aware query auto-completion suggestions from them.


gestions based on product knowledge relevant to search prefixes. The RAG framework exhibits outstanding efficacy in domain-specific sequence generation tasks by exploiting domain knowledge as additional context when generating the target sequences. Under the RAG scheme, generative large language models (LLMs) are grounded in relevant information retrieved from heterogeneous and predetermined knowledge sources and, consequently, present accurate and controlled text output. This is a compelling benefit for e-commerce QAC systems, where the relevancy of QAC suggestions is essential for user trust. Additionally, the RAG framework gives QAC systems great versatility against frequently updated information sources. This attribute allows the vast universal product knowledge base to be utilized in e-commerce QAC systems without incurring the high cost of re-training the generative model on the giant product catalog. The superiority of the proposed Product-RAG model is empirically demonstrated on human-annotated e-commerce QAC tasks. We illustrate that the proposed method outperforms state-of-the-art generative LLMs by generating QAC suggestions that 1) are more similar to the ground truths in terms of ROUGE scores, and 2) lead to more relevant product search results in terms of Mean Reciprocal Rank (MRR) and Hit Ratio (HR).

2. Related Work

Most QAC systems work on a sourcing-and-ranking basis: they first source candidates from a large pool to limit the scope, and then rank the sourced candidates. This paper focuses on the sourcing part of the QAC system, ensuring that the sourced candidates are product-aware before they are sent to the query ranker.

Established approaches for sourcing candidate query completions often rely on the most popular candidate (MPC) approach, which cannot incorporate important semantic information from the prefix and session context. Newer approaches incorporate semantic information using neural network representations learned from the query prefix, candidate query completions, and session context [21, 22]. Context-aware QAC considers users' past queries instead of relying solely on popularity [13, 14, 15, 16, 17]. Meanwhile, recent advances in natural language processing [23] have inspired enormous work on semantically understanding users' search context and intent. These methods employ a high-dimensional space to measure the semantic similarities between search queries in contextual representations, and either offer users more contextually relevant suggestions from the query log [24, 25, 26, 27] or generate new keywords [28, 29, 30]. Candidate sourcing approaches usually fall into one of three categories: retriever-only approaches, pure generative approaches, and retrieval-augmented generation.

Retriever-only approaches. This class of approaches retrieves candidates from a pool that is usually built by scraping historical logs of the QAC system. The candidates are often selected with popularity-based, lexical, or neural matching approaches [31, 32, 33]. In MPC-based approaches, candidates with higher forecasted popularity are selected; in lexical matching approaches, candidates with higher lexical similarities are selected; neural matching approaches allow candidates to be selected in semantic representations [34]. Session context [21, 17] or personal profile signals [35, 36] can be considered as auxiliary inputs. However, since these approaches are limited to the existing candidates, offering suggestions for every prefix is not guaranteed. In addition, the achieved product-awareness is biased toward historically popular candidates.

Generative approaches. Generative approaches use language models to generate candidates based on inputs such as the prefix, session context, and personalization signals [37, 38, 39], and they can provide candidates even for prefixes that appear for the first time. Two major challenges facing pure generative approaches in e-commerce QAC system
development are: (1) they may suffer from hallucinations, generating plausible queries without reference to product information; (2) in a dynamic e-commerce environment where products change continuously, they lack a mechanism to automatically incorporate new information post-training. This necessitates periodic fine-tuning to maintain the model's relevance and accuracy, which can be prohibitively costly and impractical.

Retrieval Augmented Generation (RAG). RAG approaches [20, 40, 41] extend the capabilities of language models by integrating external knowledge sources as auxiliary inputs to enhance the performance of the overall system. To the best of our knowledge, our work is the first study that adapts the RAG framework for QAC systems. Prior to this, RAG has been applied to various tasks and application domains such as question answering [42] and text summarization [20]. Since RAG uses product information as auxiliary input for the language model during generation, and the product retrieval updates in response to underlying database updates, the language model does not need to be fine-tuned to capture new products.

3. Product-RAG

Existing studies show that the RAG framework is effective and efficient in extending the already powerful capabilities of LLMs to specific domains without the need to retrain the model on a heterogeneous database. The proposed Product-RAG model is based upon the architecture of the RAG-Sequence Model [20] and generates one suggestion for each relevant product. The schematic architecture of the Product-RAG model for generating product-aware query auto-completion suggestions from relevant products is depicted in Figure 2. Our framework leverages two components:

    1. A retrieval model η that retrieves the best-matching product titles or catalog entries from the product pool.
    2. A generative LLM θ that outputs auto-complete suggestions for a given prefix and the products retrieved by the retriever.

Task formulation. We denote a search prefix as x and the target QAC suggestions as y. Given the prefix x, the retriever p_η(Z|x) consumes a product knowledge base P and returns the top-K relevant products Z = {z_1, z_2, ..., z_K}, z_i ∈ P. For each z ∈ Z, the generative LLM p_θ(y|x, z) generates a QAC suggestion for x with context from the retrieved product, rendering the top-K suggestions Y = {y_1, y_2, ..., y_K}. Product-RAG can be parameterized as

    p_Product-RAG(Y|x) = Σ_{z_i ∈ top-K(p_η(·|x))} p_η(z_i|x) p_θ(y_i|x, z_i)    (1)

Multi-vector retrieval model. State-of-the-art methods typically fine-tune deep pre-trained language models, such as BERT [43], to generate dense vector representations for both input queries and documents. The top-K documents with the highest similarity scores are then retrieved. Inspired by recent advances in multi-vector representations [34, 44], we adopt a retrieval model, as depicted in Figure 2, in which we fine-tune the prefix and product encoders with e-commerce data.

Precisely, given the representations of a prefix x and a product z, the relevance score of z to x, denoted as S_{x,z}, is defined as the sum, over the prefix tokens, of the maximum cosine similarity between each vector E_{x_i} in the prefix embedding and the vectors in a product bag E_T:

    S_{x,z} := Σ_{i=1}^{|x|} max_{j∈T} E_{x_i} · E_{T_j},  z ∈ T    (2)

Offline product knowledge indexing. We pre-compute all product embeddings and index these vector representations offline to support efficient lookup of relevant products. The index includes 1) centroids representing the centers that partition product embeddings into bags, 2) residuals storing each product embedding relative to its nearest centroid, and 3) an inverted index mapping each centroid to its products to support fast nearest-neighbor search. These are encoded offline and loaded into the memory of the QAC service. Given a prefix, the prefix encoder vectorizes it, and the retrieval model looks for the top-K most relevant products by computing MaxSim between the prefix embedding and the product index already loaded in memory.

Offline indexing of pre-computed product embeddings also makes it convenient to refresh the product knowledge pool frequently at low cost, with no model re-training required to adapt to newly added products.

QAC Suggestion Generation. For the generative component of Product-RAG, we use a generative LLM whose input is a prompt containing both the prefix x and the top-K retrieved products z ∈ Z, and whose outputs are K product-aware QAC suggestions Y for the prefix. As we train the retrieval model and the generative model separately, we can use any state-of-the-art generative LLM, such as Mistral-7B [45], PaLM [46], or GPT-4 [47], as long as it can perform text summarization and QAC or equivalent tasks. In our proposed Product-RAG we empirically choose Mistral-7B based on offline evaluations of the performance and latency of different generative LLMs on QAC tasks. Moreover, we are able to fine-tune Mistral-7B with e-commerce search and product data.

4. Experiments

We now evaluate Product-RAG on e-commerce QAC tasks, testing its ability to generate QAC suggestions for a given prefix in the e-commerce domain. We define a baseline LLM by fine-tuning the Mistral-7B model on the e-commerce QAC database without the help of the RAG framework. In the Product-RAG framework, we employ the multi-vector retrieval model (denoted as MultiVec), fine-tuned as outlined in the previous section, as the primary retrieval component. To demonstrate the effectiveness of our proposed retrieval method, we establish a baseline retrieval model, BM25 [48], within the RAG framework for comparison. The generative component for both the Product-RAG-MultiVec and Product-RAG-BM25 frameworks is a fine-tuned Mistral-7B.

Experimental dataset. We perform an experiment on 1,500 search queries corresponding to book products with the help of human expert annotation: given a search prefix, a human expert manually annotates an auto-completion
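The MaxSim relevance score S_{x,z} of Eq. (2) can be sketched in a few lines of NumPy. This is an illustrative sketch only: it assumes token-level embedding matrices are already produced by the (unpublished) prefix and product encoders, normalizes them so dot products become cosine similarities, and replaces the centroid/residual inverted index with a brute-force scan over a small pool.

```python
import numpy as np

def maxsim_score(prefix_emb: np.ndarray, product_emb: np.ndarray) -> float:
    """Late-interaction (MaxSim) relevance score S_{x,z}.

    prefix_emb:  (n_prefix_tokens, d) token embeddings of prefix x.
    product_emb: (n_product_tokens, d) token embeddings of product z.
    Rows are L2-normalized so dot products equal cosine similarities.
    """
    prefix_emb = prefix_emb / np.linalg.norm(prefix_emb, axis=1, keepdims=True)
    product_emb = product_emb / np.linalg.norm(product_emb, axis=1, keepdims=True)
    # (n_prefix, n_product) similarity matrix: max over product tokens,
    # then sum over prefix tokens, as in Eq. (2).
    sim = prefix_emb @ product_emb.T
    return float(sim.max(axis=1).sum())

def retrieve_top_k(prefix_emb, products, k=3):
    """Rank a small product pool by MaxSim and return the top-k indices.

    `products` is a list of (n_tokens, d) embedding matrices. A production
    system would consult the centroid-based inverted index instead of
    scoring every product.
    """
    scores = [maxsim_score(prefix_emb, p) for p in products]
    return sorted(range(len(products)), key=lambda i: -scores[i])[:k]
```

A product whose token bag contains close matches for every prefix token receives a score near the number of prefix tokens, which is what makes the score robust to extra, unrelated tokens in long product titles.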
    Table 1
    Evaluation scores of generated QAC suggestions. Each model generates 3 suggestions, and we take the maximum evaluation
    score over these suggestions as the score of the data point.

                           Model             ROUGE-1       ROUGE-2          ROUGE-L       MRR@10        HR@10
                         Mistral-7B              77.2            66.7           76.6         0.65         0.76
                     Product-RAG-BM25            75.3            64.9           74.6         0.62         0.74
                   Product-RAG-MultiVec         82.2             74.1          81.5          0.75         0.87
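The product-level metrics reported in Table 1 admit a compact sketch. The helpers below are illustrative (the actual ranking comes from the downstream product search engine; `ranked_ids` is a hypothetical result list), showing MRR@10, HR@10, and the per-data-point maximum over the 3 generated suggestions.

```python
def mrr_at_10(ranked_ids, target_id):
    """Reciprocal rank of the target product in the top-10 results (0 if absent)."""
    top = list(ranked_ids)[:10]
    return 1.0 / (top.index(target_id) + 1) if target_id in top else 0.0

def hr_at_10(ranked_ids, target_id):
    """1.0 if the target product appears in the top-10 results, else 0.0."""
    return 1.0 if target_id in list(ranked_ids)[:10] else 0.0

def best_over_suggestions(result_lists, target_id, metric):
    """Per data point: the maximum metric value over the generated suggestions,
    matching the 'maximum out of 3 suggestions' convention of Table 1."""
    return max(metric(r, target_id) for r in result_lists)
```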




keyword as the ground truth QAC suggestion and, through the product search page, finds an available book product, which we use as the ground truth target product. Thus, each evaluation data point is composed of a triple of prefix, ground-truth QAC suggestion, and ground-truth target product. For each data point, we use the proposed models to generate the top-3 suggestions based on the prefix. For the Product-RAG models, we employ 7 million book products as product knowledge.

Evaluation metrics. We evaluate the generated suggestions at both the query and product levels. We compute

    1. the similarity between the generated suggestions and the annotated ground-truth QAC suggestions in terms of ROUGE-1/2/L F1 scores. We use the maximum score out of the 3 suggestions for each data point.
    2. for each target product, the Mean Reciprocal Rank in the top 10 results (MRR@10) and the Hit Ratio (whether the target product is included) in the top 10 results (HR@10) on the product search page triggered by the generated suggestions. We likewise use the maximum MRR@10 and HR@10 out of the 3 generated suggestions.

Evaluation results. We report the ROUGE-1, ROUGE-2, and ROUGE-L F1 scores, MRR@10, and HR@10 for the 3 experimented models in Table 1. We observe that the proposed Product-RAG-MultiVec outperforms both the baseline generative Mistral-7B and the counterpart based on the BM25 retrieval model on all ROUGE scores and the 2 product-level metrics. These findings demonstrate the superiority of the Product-RAG-MultiVec model in generating high-quality QAC suggestions for e-commerce systems.

Discussions. In the experiment above, we notice a negative impact on these metrics from the BM25 retriever compared with the baseline Mistral-7B. Inspecting the top-3 retrieved products from BM25 and those from the MultiVec model, we notice a non-trivial gap between the two retrievers: in our experiment, the MultiVec model successfully retrieves the target product or its equivalents in 74.7% of cases (e.g., Harry Potter and the Chamber of Secrets: Gryffindor Edition Red or its equivalent Harry Potter and the Chamber of Secrets), whereas the BM25 retriever succeeds in only 37.1% of cases. This accuracy gap explains the performance gap between the two Product-RAG models and leads to the conclusion that, when the retriever provides highly relevant products, the proposed Product-RAG framework is capable of improving upon state-of-the-art generative approaches in e-commerce QAC tasks. We believe that improving the precision of the retriever model is one future direction for refining the proposed Product-RAG framework.

5. Conclusions

In this work, we introduce Product-RAG, a RAG framework for e-commerce QAC systems that retrieves relevant products for a search prefix and informs product-aware suggestions. This framework generates suggestions close to user search intention, and it highlights product relevance at an early stage of the shopping journey, before downstream product searches. Through empirical experiments on auto-completing search queries, we compare the proposed framework with a baseline LLM and test various retrieval models. In particular, we find that Product-RAG-MultiVec remarkably outperforms its counterparts in terms of query similarity and product relevance. This work sheds light on bridging the semantic gap between partial search queries and product knowledge in the scenario of e-commerce QAC systems.

References

[1] M. Jakobsson, Autocompletion in full text transaction entry: a method for humanized input, ACM SIGCHI Bulletin 17 (1986) 327–332.
[2] H. Bast, I. Weber, Type less, find more: fast autocompletion search with a succinct index, in: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, 2006, pp. 364–371.
[3] H. Bast, D. Majumdar, I. Weber, Efficient interactive query expansion with complete search, in: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, 2007, pp. 857–860.
[4] F. Cai, M. De Rijke, et al., A survey of query auto completion in information retrieval, Foundations and Trends® in Information Retrieval 10 (2016) 273–363.
[5] C. Xiao, J. Qin, W. Wang, Y. Ishikawa, K. Tsuda, K. Sadakane, Efficient error-tolerant query autocompletion, Proceedings of the VLDB Endowment 6 (2013) 373–384.
[6] M. A. Hasan, N. Parikh, G. Singh, N. Sundaresan, Query suggestion for e-commerce sites, in: Proceedings of the fourth ACM international conference on Web Search and Data Mining, 2011, pp. 765–774.
[7] S. K. Karmaker Santu, P. Sondhi, C. Zhai, On application of learning to rank for e-commerce search, in: Proceedings of the 40th international ACM SIGIR conference on research and development in information retrieval, 2017, pp. 475–484.
[8] L. Wu, D. Hu, L. Hong, H. Liu, Turning clicks into purchases: Revenue optimization for product search in e-commerce, in: The 41st International ACM SIGIR
     Conference on Research & Development in Informa-            [22] S. Wang, W. Guo, H. Gao, B. Long, Efficient neu-
     tion Retrieval, 2018, pp. 365–374.                               ral query auto completion, in: Proceedings of the
 [9] A. Block, R. Kidambi, D. N. Hill, T. Joachims, I. S.             29th ACM International Conference on Information
     Dhillon, Counterfactual learning to rank for utility-            & Knowledge Management, CIKM ’20, Association
     maximizing query autocompletion, in: Proceedings                 for Computing Machinery, New York, NY, USA, 2020,
     of the 45th International ACM SIGIR Conference on                p. 2797–2804. URL: https://doi.org/10.1145/3340531.
     Research and Development in Information Retrieval,               3412701. doi:10.1145/3340531.3412701.
     2022, pp. 791–802.                                          [23] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit,
[10] G. Di Santo, R. McCreadie, C. Macdonald, I. Ounis,               L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Atten-
     Comparing approaches for query autocompletion, in:               tion is all you need, Advances in neural information
     Proceedings of the 38th International ACM SIGIR Con-             processing systems 30 (2017).
     ference on Research and Development in Information          [24] B. Mitra, N. Craswell, Query auto-completion for rare
     Retrieval, 2015, pp. 775–778.                                    prefixes, in: Proceedings of the 24th ACM interna-
[11] M. Shokouhi, K. Radinsky, Time-sensitive query auto-             tional on conference on information and knowledge
     completion, in: Proceedings of the 35th international            management, 2015, pp. 1755–1758.
     ACM SIGIR conference on Research and development            [25] F. Cai, M. de Rijke, Learning from homologous queries
     in information retrieval, 2012, pp. 601–610.                     and semantically related terms for query auto comple-
[12] A. Strizhevskaya, A. Baytin, I. Galinskaya,                      tion, Information Processing & Management 52 (2016)
     P. Serdyukov,        Actualization of query sugges-              628–643.
     tions using query logs, in: Proceedings of the 21st         [26] T. Shao, H. Chen, W. Chen, Query auto-completion
     International Conference on World Wide Web, 2012,                based on word2vec semantic similarity, in: Journal of
     pp. 611–612.                                                     Physics: Conference Series, volume 1004, IOP Publish-
[13] Z. Bar-Yossef, N. Kraus, Context-sensitive query auto-           ing, 2018, p. 012018.
     completion, in: Proceedings of the 20th international       [27] K. Arkoudas, M. Yahya, Semantically driven auto-
     conference on World wide web, 2011, pp. 107–116.                 completion, in: Proceedings of the 28th ACM Inter-
[14] F. Cai, S. Liang, M. De Rijke, Time-sensitive person-            national Conference on Information and Knowledge
     alized query auto-completion, in: Proceedings of the             Management, 2019, pp. 2693–2701.
     23rd ACM international conference on conference on          [28] S. Wang, W. Guo, H. Gao, B. Long, Efficient neu-
     information and knowledge management, 2014, pp.                  ral query auto completion, in: Proceedings of the
     1599–1608.                                                       29th ACM International Conference on Information
[15] M. Shokouhi, Learning to personalize query auto-                 & Knowledge Management, 2020, pp. 2797–2804.
     completion, in: Proceedings of the 36th international       [29] Y. M. Kang, W. Liu, Y. Zhou, Queryblazer: efficient
     ACM SIGIR conference on Research and development                 query autocompletion framework, in: Proceedings
     in information retrieval, 2013, pp. 103–112.                     of the 14th ACM International Conference on Web
[16] S. Whiting, J. M. Jose, Recent and robust query auto-            Search and Data Mining, 2021, pp. 1020–1028.
     completion, in: Proceedings of the 23rd international       [30] D. Maxwell, P. Bailey, D. Hawking, Large-scale gen-
     conference on World wide web, 2014, pp. 971–982.                 erative query autocompletion, in: Proceedings of the
[17] N. Yadav, R. Sen, D. N. Hill, A. Mazumdar, I. S. Dhillon,        22nd Australasian Document Computing Symposium,
     Session-aware query auto-completion using extreme                2017, pp. 1–8.
     multi-label ranking, in: Proceedings of the 27th ACM        [31] S. Whiting, J. M. Jose, Recent and robust query auto-
     SIGKDD Conference on Knowledge Discovery & Data                  completion, in: Proceedings of the 23rd International
     Mining, 2021, pp. 3835–3844.                                     Conference on World Wide Web, WWW ’14, Associa-
[18] J. Zhao, H. Chen, D. Yin, A dynamic product-aware                tion for Computing Machinery, New York, NY, USA,
     learning model for e-commerce query intent under-                2014, p. 971–982. URL: https://doi.org/10.1145/2566486.
     standing, in: Proceedings of the 28th ACM Inter-                 2568009. doi:10.1145/2566486.2568009.
     national Conference on Information and Knowledge            [32] F. Cai, S. Liang, M. de Rijke, Time-sensitive person-
     Management, 2019, pp. 1843–1852.                                 alized query auto-completion, in: Proceedings of the
[19] C. Luo, R. Goutam, H. Zhang, C. Zhang, Y. Song, B. Yin,          23rd ACM International Conference on Conference on
     Implicit query parsing at amazon product search, in:             Information and Knowledge Management, CIKM ’14,
     Proceedings of the 46th International ACM SIGIR Con-             Association for Computing Machinery, New York, NY,
     ference on Research and Development in Information               USA, 2014, p. 1599–1608. URL: https://doi.org/10.1145/
     Retrieval, 2023, pp. 3380–3384.                                  2661829.2661921. doi:10.1145/2661829.2661921.
[20] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin,    [33] M. Shokouhi, K. Radinsky, Time-sensitive query auto-
     N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rock-              completion, in: Proceedings of the 35th International
     täschel, et al., Retrieval-augmented generation for              ACM SIGIR Conference on Research and Develop-
     knowledge-intensive nlp tasks, Advances in Neural                ment in Information Retrieval, SIGIR ’12, Associa-
     Information Processing Systems 33 (2020) 9459–9474.              tion for Computing Machinery, New York, NY, USA,
[21] B. Mitra, Exploring session context using distributed            2012, p. 601–610. URL: https://doi.org/10.1145/2348283.
     representations of queries and reformulations, in: Pro-          2348364. doi:10.1145/2348283.2348364.
     ceedings of the 38th International ACM SIGIR Con-           [34] O. Khattab, M. Zaharia, Colbert: Efficient and effec-
     ference on Research and Development in Informa-                  tive passage search via contextualized late interaction
     tion Retrieval, SIGIR ’15, Association for Comput-               over bert, in: Proceedings of the 43rd International
     ing Machinery, New York, NY, USA, 2015, p. 3–12.                 ACM SIGIR conference on research and development
     URL: https://doi.org/10.1145/2766462.2767702. doi:10.            in Information Retrieval, 2020, pp. 39–48.
     1145/2766462.2767702.                                       [35] G. Aslanyan, A. Mandal, P. Senthil Kumar, A. Jaiswal,
     M. Rangasamy Kannadasan, Personalized ranking in                 arXiv:2112.01488 (2021).
     ecommerce search, in: Companion Proceedings of              [45] A. Q. Jiang, A. Sablayrolles, A. Mensch, C. Bamford,
     the Web Conference 2020, WWW ’20, Association for                D. S. Chaplot, D. d. l. Casas, F. Bressand, G. Lengyel,
     Computing Machinery, New York, NY, USA, 2020, p.                 G. Lample, L. Saulnier, et al., Mistral 7b, arXiv preprint
     96–97. URL: https://doi.org/10.1145/3366424.3382715.             arXiv:2310.06825 (2023).
     doi:10.1145/3366424.3382715.                                [46] A. Chowdhery, S. Narang, J. Devlin, M. Bosma,
[36] A. Jaech, M. Ostendorf, Personalized language model              G. Mishra, A. Roberts, P. Barham, H. W. Chung, C. Sut-
     for query auto-completion, in: I. Gurevych, Y. Miyao             ton, S. Gehrmann, et al., Palm: Scaling language mod-
     (Eds.), Proceedings of the 56th Annual Meeting of                eling with pathways, Journal of Machine Learning
     the Association for Computational Linguistics (Vol-              Research 24 (2023) 1–113.
     ume 2: Short Papers), Association for Computa-              [47] J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya,
     tional Linguistics, Melbourne, Australia, 2018, pp. 700–         F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman,
     705. URL: https://aclanthology.org/P18-2111. doi:10.             S. Anadkat, et al., Gpt-4 technical report, arXiv
     18653/v1/P18-2111.                                               preprint arXiv:2303.08774 (2023).
[37] D. Yin, J. Tan, Z. Zhang, H. Deng, S. Huang,                [48] S. Robertson, H. Zaragoza, et al., The probabilistic
     J. Chen, Learning to generate personalized query auto-           relevance framework: Bm25 and beyond, Foundations
     completions via a multi-view multi-task attentive ap-            and Trends® in Information Retrieval 3 (2009) 333–
     proach, in: Proceedings of the 26th ACM SIGKDD                   389.
     International Conference on Knowledge Discovery &
     Data Mining, KDD ’20, Association for Computing
     Machinery, New York, NY, USA, 2020, p. 2998–3007.
     URL: https://doi.org/10.1145/3394486.3403350. doi:10.
     1145/3394486.3403350.
[38] D. Maxwell, P. Bailey, D. Hawking, Large-scale gen-
     erative query autocompletion, in: Proceedings of the
     22nd Australasian Document Computing Symposium,
     ADCS ’17, Association for Computing Machinery, New
     York, NY, USA, 2017. URL: https://doi.org/10.1145/
     3166072.3166083. doi:10.1145/3166072.3166083.
[39] A. Sordoni, Y. Bengio, H. Vahabi, C. Lioma, J. Grue Si-
     monsen, J.-Y. Nie, A hierarchical recurrent encoder-
     decoder for generative context-aware query sugges-
     tion, in: Proceedings of the 24th ACM Interna-
     tional on Conference on Information and Knowledge
     Management, CIKM ’15, Association for Computing
     Machinery, New York, NY, USA, 2015, p. 553–562.
     URL: https://doi.org/10.1145/2806416.2806493. doi:10.
     1145/2806416.2806493.
[40] P. Xu, W. Ping, X. Wu, L. McAfee, C. Zhu, Z. Liu, S. Sub-
     ramanian, E. Bakhturina, M. Shoeybi, B. Catanzaro,
     Retrieval meets long context large language models,
     arXiv preprint arXiv:2310.03025 (2023).
[41] N. Kandpal, H. Deng, A. Roberts, E. Wallace, C. Raf-
     fel, Large language models struggle to learn long-tail
     knowledge, in: Proceedings of the 40th International
     Conference on Machine Learning, ICML’23, JMLR.org,
     2023.
[42] Y. Mao, P. He, X. Liu, Y. Shen, J. Gao, J. Han,
     W. Chen, Generation-augmented retrieval for open-
     domain question answering, in: C. Zong, F. Xia,
     W. Li, R. Navigli (Eds.), Proceedings of the 59th An-
     nual Meeting of the Association for Computational
     Linguistics and the 11th International Joint Confer-
     ence on Natural Language Processing (Volume 1:
     Long Papers), Association for Computational Lin-
     guistics, Online, 2021, pp. 4089–4100. URL: https:
     //aclanthology.org/2021.acl-long.316. doi:10.18653/
     v1/2021.acl-long.316.
[43] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert:
     Pre-training of deep bidirectional transformers for lan-
     guage understanding, arXiv preprint arXiv:1810.04805
     (2018).
[44] K. Santhanam, O. Khattab, J. Saad-Falcon, C. Potts,
     M. Zaharia, Colbertv2: Effective and efficient re-
     trieval via lightweight late interaction, arXiv preprint