A Product-Aware Query Auto-Completion Framework for E-Commerce Search via Retrieval-Augmented Generation Method

Fangzheng Sun*,†, Tianqi Zheng†, Aakash Kolekar, Rohit Patki, Hossein Khazaei, Xuan Guo, Ziheng Cai, David Liu, Ruirui Li, Yupin Huang, Dante Everaert, Hanqing Lu, Garima Patel and Monica Cheng
Amazon Search, Palo Alto, CA, USA

Abstract
Query Auto-Completion (QAC) is a fundamental component of the user search experience on e-commerce websites. It assists users in finding intended products by automatically presenting search queries as they type in the search bar. Traditional QAC systems build upon query popularity to suggest a list of potential completions, but they fall short for unforeseen search prefixes. A generative Large Language Model (LLM) can complete even unforeseen prefixes, but the relevance of the generated suggestions to the product catalog is not guaranteed. To the best of our knowledge, no existing study uses LLMs to generate product-aware search query completion suggestions. This paper proposes a generative approach, named "Product-RAG", that incorporates product metadata and adapts Retrieval-Augmented Generation (RAG) to the development of QAC systems. Product-RAG contains two components: (1) a retrieval model that identifies the top-K most relevant products from the product catalog given a user-input prefix, and (2) a generative model that offers suggestions based on both the given prefix and the retrieved product metadata. We evaluate this approach on its ability to match user-input prefixes to user-intended products, using ROUGE scores, Mean Reciprocal Rank (MRR), and Hit Ratio (HR) in downstream product search. We observe that the proposed Product-RAG approach outperforms state-of-the-art generative models in auto-completing e-commerce search queries.

Keywords
Query Auto-Complete, Retrieval-Augmented Generation, E-Commerce, Product-aware
1. Introduction

Query auto-completion (QAC) [1, 2, 3, 4, 5] refers to an information retrieval system for search engines that, given the partial context typed by the user (i.e., the prefix), offers one or more query suggestions. In modern e-commerce, where user experience is pivotal, QAC is an important feature shaping the way consumers interact with search engines and plays a crucial role in smoothing all downstream shopping experiences [6, 7, 8, 9]. By leveraging personalized signals, product-related knowledge, and advanced recommendation algorithms, QAC not only accelerates the search experience but also ensures that users receive suggestions tailored to their unique preferences.

One major challenge of QAC in e-commerce is to understand user shopping intent from an incomplete search query and provide relevant auto-complete suggestions. A typical production QAC system works as follows: given a prefix entered by a user, the system obtains a collection of queries satisfying the prefix from the query log and adopts a selection process, often based on forecasted popularity, to select candidate queries to send to the query ranker [10, 11, 12]. This framework lacks an understanding of user shopping intent. To give more importance to users' intents and provide more personalized and relevant suggestions, a number of works explore context-aware and personalized QAC systems [13, 14, 15, 16, 17].

Another challenge for QAC in e-commerce is to attain product awareness by recognizing and predicting users' intent related to specific products, brands, or categories and providing auto-complete suggestions that align with the user's potential shopping targets. When the query log falls short for unseen or rarely seen prefixes, product knowledge is particularly helpful for predicting users' shopping intent and generating corresponding suggestions (e.g., the case in Figure 1). Nevertheless, in spite of efforts to understand users' shopping intent with product catalogs or product attributes [18, 19], we could not find any work bridging the gap between partially complete search queries and product knowledge for e-commerce QAC systems. Herein we propose a generative approach that leverages product knowledge in e-commerce QAC systems based on the Retrieval-Augmented Generation (RAG) framework [20], namely Product-RAG, which is capable of improving QAC systems by providing accurate auto-completion suggestions based on product knowledge relevant to search prefixes.

IR-RAG @ SIGIR24 workshop: The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, July 18, 2024, Washington D.C., USA
* Corresponding author.
† These authors contributed equally.
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

Figure 1: A search of "guidebook for edible mus" returns an empty QAC suggestion list while multiple related products are available.

Figure 2: Schematic architecture of the Product-RAG model for retrieving the top-K products relevant to a prefix and generating K product-aware query auto-completion suggestions from them.
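The retrieve-then-generate flow depicted in Figure 2 can be illustrated with a minimal, self-contained mock-up. The functions `retrieve_top_k` and `generate_suggestion` below are hypothetical stand-ins (a naive token-overlap scorer and a title-echoing generator) for the MaxSim retriever and the generative LLM described in Section 3:

```python
def retrieve_top_k(prefix, catalog, k=3):
    """Rank catalog titles by a naive token-overlap score with the prefix
    (a toy stand-in for the MaxSim retrieval of Section 3)."""
    prefix_tokens = prefix.lower().split()

    def score(title):
        title_tokens = title.lower().split()
        # Count prefix tokens that match a title token exactly or as a prefix.
        return sum(
            1 for t in prefix_tokens
            if any(w == t or w.startswith(t) for w in title_tokens)
        )

    return sorted(catalog, key=score, reverse=True)[:k]


def generate_suggestion(prefix, product_title):
    """Toy stand-in for the generative LLM: ground the completion
    directly in the retrieved product title."""
    return product_title.lower()


def product_rag_qac(prefix, catalog, k=3):
    """One suggestion per retrieved product, as in the RAG-Sequence setup."""
    return [generate_suggestion(prefix, z) for z in retrieve_top_k(prefix, catalog, k)]


catalog = [
    "100 Things To See In The Kimberley",
    "100 Things to See in the Night Sky",
    "100 Things to See in the National Parks",
    "Edible Mushrooms: A Field Guide",
]
# Reproduces the three suggestions shown in Figure 2.
print(product_rag_qac("100 things to see", catalog, k=3))
```

The real system replaces both stand-ins with learned components, but the overall control flow (retrieve K products, then generate one grounded suggestion per product) is the same.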
The RAG framework exhibits outstanding efficacy in domain-specific sequence generation tasks by exploiting domain knowledge as additional context when generating the target sequences. Under the RAG scheme, generative large language models (LLMs) are supervised by relevant information retrieved from heterogeneous, pre-determined knowledge sources and consequently produce accurate, controlled text output. This is a compelling benefit for e-commerce QAC systems, where the relevancy of suggestions is essential for user trust. Additionally, the RAG framework gives QAC systems great versatility against frequently updated information sources. This attribute allows the vast universal product knowledge base to be utilized in e-commerce QAC systems without incurring the high cost of re-training the generative model on the giant product catalog.

The superiority of the proposed Product-RAG model is empirically demonstrated on human-annotated e-commerce QAC tasks. We illustrate that the proposed method can outperform state-of-the-art generative Large Language Models (LLMs) by generating QAC suggestions that 1) are more similar to the ground truths in terms of ROUGE scores, and 2) lead to more relevant product search results in terms of Mean Reciprocal Rank (MRR) and Hit Ratio (HR).

2. Related Work

Most QAC systems work on a sourcing-and-ranking basis: they first source candidates from a large pool to limit the scope, and then rank the sourced candidates. This paper focuses on the sourcing part of the QAC system, ensuring that the sourced candidates are product-aware before they are sent to the query ranker.

Established approaches for sourcing candidate query completions often rely on the most popular candidate (MPC) approach, which cannot incorporate important semantic information from the prefix and session context. Newer approaches incorporate semantic information using neural network representations learned from the query prefix, candidate query completions, and session context [21, 22]. Context-aware QAC considers users' past queries instead of relying solely on popularity [13, 14, 15, 16, 17]. Meanwhile, recent advances in natural language processing [23] have inspired substantial work on semantically understanding users' search context and intent. These methods employ a high-dimensional space to measure semantic similarities between search queries in contextual representations, and either offer users more contextually relevant suggestions from the query log [24, 25, 26, 27] or generate new keywords [28, 29, 30].

Candidate sourcing approaches usually fall into one of three categories: retriever-only approaches, pure generative approaches, and retrieval-augmented generation.

Retriever-only approaches. This class of approaches retrieves candidates from a pool that is usually built by scraping historical logs of the QAC system. Candidates are often selected by popularity-based approaches such as MPC or by neural matching [31, 32, 33]. In MPC-based approaches, candidates with higher forecasted popularity are selected; in lexical matching approaches, candidates with higher lexical similarity are selected; neural matching approaches allow candidates to be selected in semantic representations [34]. Session context [21, 17] or personal profile signals [35, 36] can be considered as auxiliary inputs. However, since these approaches are limited to the existing candidates, offering suggestions for every prefix is not guaranteed. In addition, the achieved product-awareness is biased toward historically popular candidates.
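To make the retriever-only baseline concrete, a minimal MPC-style sourcer can be sketched as a popularity-ranked prefix lookup over a query log. The log entries and counts here are invented for illustration:

```python
from collections import Counter

# Hypothetical query log: query -> historical frequency (illustrative values).
query_log = Counter({
    "harry potter box set": 950,
    "harry potter and the chamber of secrets": 700,
    "harry potter lego": 400,
    "harmonica for beginners": 120,
})


def mpc_candidates(prefix, log, k=3):
    """Most Popular Completion: keep logged queries matching the prefix,
    ranked by (forecasted) popularity, and return the top k."""
    matches = [(q, n) for q, n in log.items() if q.startswith(prefix)]
    return [q for q, _ in sorted(matches, key=lambda x: -x[1])[:k]]


print(mpc_candidates("harry", query_log))
# An unseen prefix yields no candidates at all: the gap Product-RAG targets.
print(mpc_candidates("guidebook for edible mus", query_log))
```

The empty result for the unseen prefix illustrates exactly why retriever-only sourcing cannot guarantee suggestions for every prefix.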
Generative approaches. Generative approaches use language models to generate candidates based on inputs such as the prefix, session context, and personalization signals [37, 38, 39], and they can provide candidates even for prefixes that appear for the first time. Two major challenges face pure generative approaches in e-commerce QAC system development: (1) they may suffer from hallucinations, generating plausible queries without any reference to product information; and (2) in the dynamic environment of e-commerce, where products change continuously, they lack a mechanism to automatically incorporate new information post-training. The latter necessitates periodic fine-tuning to maintain the model's relevance and accuracy, which can be prohibitively costly and impractical.

Retrieval-Augmented Generation (RAG). RAG approaches [20, 40, 41] extend the capabilities of language models by integrating external knowledge sources as auxiliary inputs to enhance the performance of the overall system. To the best of our knowledge, our work is the first study that adapts the RAG framework to QAC systems. Prior to this, RAG has been applied to various tasks and application domains such as question answering [42] and text summarization [20]. Since RAG uses product information as auxiliary input to the language model during generation, and the product retrieval updates in response to underlying database updates, the language model does not need to be fine-tuned to capture new products.

3. Product-RAG

Existing studies show that the RAG framework is effective and efficient at extending the already powerful capabilities of LLMs to specific domains without retraining the model on a heterogeneous database. The proposed Product-RAG model builds upon the architecture of the RAG-Sequence Model [20] to generate one suggestion for each relevant product. The schematic architecture of the Product-RAG model for generating product-aware query auto-completion suggestions from relevant products is depicted in Figure 2. Our framework leverages two components:

1. A retrieval model 𝜂 that retrieves the best-matching product titles or catalog entries from the product pool.
2. A generative LLM 𝜃 that outputs auto-complete suggestions for a given prefix and the products retrieved by the retriever.

Task formulation. We denote a search prefix as 𝑥 and a target QAC suggestion as 𝑦. The retriever 𝑝𝜂(Z|𝑥) consumes a product knowledge base P and, given the prefix 𝑥, returns the top-K relevant products Z = {𝑧1, 𝑧2, ..., 𝑧𝐾}, 𝑧𝑖 ∈ P. For each 𝑧 ∈ Z, the generative LLM 𝑝𝜃(𝑦|𝑥, 𝑧) generates a QAC suggestion for 𝑥 with context from the retrieved product, rendering the top-K suggestions Y = {𝑦1, 𝑦2, ..., 𝑦𝐾}. Product-RAG can be parameterized as

    𝑝Product-RAG(Y|𝑥) = Σ_{𝑧𝑖 ∈ top-K(𝑝𝜂(·|𝑥))} 𝑝𝜂(𝑧𝑖|𝑥) 𝑝𝜃(𝑦𝑖|𝑥, 𝑧𝑖)    (1)

Multi-vector retrieval model. State-of-the-art retrieval methods typically fine-tune deep pre-trained language models, such as BERT [43], to generate dense vector representations for both input queries and documents; the top-K documents with the highest similarity scores are then retrieved. Inspired by recent advances in multi-vector representations [34, 44], we adopt a multi-vector retrieval model in the proposed approach, as depicted in Figure 2, where we fine-tune the prefix and product encoders with e-commerce data. Precisely, given the representations of a prefix 𝑥 and a product 𝑧, the relevance score of 𝑧 to 𝑥, denoted 𝑆𝑥,𝑧, is defined as the sum of the maximum cosine similarities between each vector E𝑥𝑖 in the prefix embedding and the vectors in the product bag E𝑇:

    𝑆𝑥,𝑧 := Σ_{𝑖=1}^{|𝑥|} max_{𝑗∈𝑇} E𝑥𝑖 · E𝑇𝑗 ,  𝑧 ∈ 𝑇    (2)

Offline product knowledge indexing. We pre-compute all product embeddings and index these vector representations offline to support efficient lookup of relevant products. The index includes 1) centroids representing the centers that partition product embeddings into bags, 2) residuals storing each product embedding relative to its nearest centroid, and 3) an inverted index mapping each centroid to its products to support fast nearest-neighbor search. These structures are encoded offline and loaded into the memory of the QAC service. Given a prefix, the prefix encoder vectorizes it, and the retrieval model looks for the top-K most relevant products by computing MaxSim between the prefix embedding and the product index already loaded in memory. Offline indexing of pre-computed product embeddings also makes it convenient to refresh the product knowledge pool frequently at low cost, with no need to re-train models to accommodate newly added products.

QAC suggestion generation. For the generative component of Product-RAG, we use a generative LLM whose input is a prompt containing both the prefix 𝑥 and the top-K retrieved products 𝑧 ∈ Z, and whose outputs are K product-aware QAC suggestions Y for the prefix. As we train the retrieval model and the generative model separately, we can use any state-of-the-art generative LLM, such as Mistral-7B [45], PaLM [46], or GPT-4 [47], as long as it can perform text summarization and QAC or equivalent tasks. In our proposed Product-RAG we empirically choose Mistral-7B based on offline evaluations of the performance and latency of different generative LLMs on QAC tasks. Moreover, we are able to fine-tune Mistral-7B with e-commerce search and product data.
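As a concrete sketch of Eq. (2), the MaxSim scoring and top-K lookup can be written in a few lines of NumPy. The 2-D toy vectors below are stand-ins for the L2-normalized token embeddings produced by the fine-tuned encoders (so dot product equals cosine similarity):

```python
import numpy as np


def maxsim(prefix_vecs, product_vecs):
    """Late-interaction score of Eq. (2): for each prefix token embedding,
    take its maximum similarity over the product's token embeddings, then sum."""
    sims = prefix_vecs @ product_vecs.T   # shape: (n_prefix_tokens, n_product_tokens)
    return float(sims.max(axis=1).sum())


def maxsim_top_k(prefix_vecs, product_bags, k=3):
    """Rank offline-indexed product token-embedding bags by MaxSim score
    and return the indices of the k best products."""
    scores = np.array([maxsim(prefix_vecs, bag) for bag in product_bags])
    return np.argsort(-scores)[:k].tolist()


# Toy 2-D "embeddings" (real encoders output high-dimensional vectors).
prefix = np.array([[1.0, 0.0], [0.0, 1.0]])
bags = [
    np.array([[1.0, 0.0]]),               # matches one prefix token
    np.array([[1.0, 0.0], [0.0, 1.0]]),   # matches both prefix tokens
]
print(maxsim_top_k(prefix, bags, k=2))    # -> [1, 0]
```

A production system would not score every bag exhaustively; the centroid/residual inverted index described above narrows the candidate set before MaxSim is applied.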
4. Experiments

We now evaluate Product-RAG on e-commerce QAC tasks, testing its ability to generate QAC suggestions for a given prefix in the e-commerce domain. We define a baseline LLM by fine-tuning the Mistral-7B model on the e-commerce QAC database without the help of the RAG framework. In the Product-RAG framework, we employ the multi-vector retrieval model (denoted MultiVec), fine-tuned as outlined in the previous section, as the primary retrieval component. To demonstrate the effectiveness of our proposed retrieval method, we establish a baseline retrieval model, BM25 [48], within the RAG framework for comparison. The generative component for both the Product-RAG-MultiVec and Product-RAG-BM25 frameworks is a fine-tuned Mistral-7B.

Experimental dataset. We perform an experiment on 1,500 search queries corresponding to book products, with the help of human expert annotation: given a search prefix, a human expert manually annotates an auto-completion keyword as the ground-truth QAC suggestion and, through the product search page, finds an available book product, which we use as the ground-truth target product. Thus, each evaluation data point is a triple of prefix, ground-truth suggestion, and ground-truth product. For each data point, we use the proposed models to generate the top-3 suggestions based on the prefix. For the Product-RAG models, we employ 7 million book products as the product knowledge base.

Evaluation metrics. We evaluate the generated suggestions at both the query level and the product level. We compute:

1. the similarity between generated suggestions and the annotated ground-truth QAC suggestions in terms of ROUGE-1/2/L F1 scores; we use the maximum score over the 3 suggestions for each data point.
2. for each target product, the Mean Reciprocal Rank in the top 10 results (MRR@10) and the Hit Ratio (whether the target product is included) in the top 10 results (HR@10) on the product search page triggered by the generated suggestions; we likewise use the maximum MRR@10 and HR@10 over the 3 generated suggestions.

Evaluation results. We report the ROUGE-1, ROUGE-2, and ROUGE-L F1 scores, MRR@10, and HR@10 for the 3 experimented models in Table 1.

Table 1: Evaluation scores of generated QAC suggestions. Each model generates 3 suggestions and we take the maximum evaluation score over these suggestions as the score of the data point.

Model                  ROUGE-1  ROUGE-2  ROUGE-L  MRR@10  HR@10
Mistral-7B             77.2     66.7     76.6     0.65    0.76
Product-RAG-BM25       75.3     64.9     74.6     0.62    0.74
Product-RAG-MultiVec   82.2     74.1     81.5     0.75    0.87

We observe that the proposed Product-RAG-MultiVec outperforms both the baseline generative Mistral-7B and the counterpart based on the BM25 retrieval model on all ROUGE scores and both product-level metrics. These findings demonstrate the superiority of the Product-RAG-MultiVec model in generating high-quality QAC suggestions for e-commerce systems.

Discussion. In the experiment above, we notice a negative impact on these metrics from the BM25 retriever compared with the baseline Mistral-7B. Inspecting the top-3 retrieved products from BM25 and from the MultiVec model, we notice a non-trivial gap between the performance of the two retrievers: in our experiment, the MultiVec model successfully retrieves the target product or its equivalents in 74.7% of cases (e.g., Harry Potter and the Chamber of Secrets: Gryffindor Edition Red or its equivalent Harry Potter and the Chamber of Secrets), whereas the BM25 retriever succeeds in only 37.1% of cases. The accuracy of the two retrievers explains the performance gap between the two Product-RAG models. This leads to the conclusion that when the retriever provides highly relevant products, the proposed Product-RAG framework is capable of improving upon state-of-the-art generative approaches in e-commerce QAC tasks. We believe that improving the precision of the retrieval model is one future direction for refining the proposed Product-RAG framework.

5. Conclusions

In this work, we introduce Product-RAG, a RAG framework that retrieves products relevant to a search prefix and informs product-aware suggestions. This framework generates suggestions close to the user's search intention, and it highlights product relevance at an early stage of the shopping journey, before downstream product searches. Through empirical experiments on auto-completing search queries, we compare the proposed framework with a baseline LLM and test various retrieval models. In particular, we find that Product-RAG-MultiVec remarkably outperforms its counterparts in terms of query similarity and product relevance. This work sheds light on bridging the semantic gap between partial search queries and product knowledge in e-commerce QAC systems.

References

[1] M. Jakobsson, Autocompletion in full text transaction entry: a method for humanized input, ACM SIGCHI Bulletin 17 (1986) 327–332.
[2] H. Bast, I. Weber, Type less, find more: fast autocompletion search with a succinct index, in: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2006, pp. 364–371.
[3] H. Bast, D. Majumdar, I. Weber, Efficient interactive query expansion with complete search, in: Proceedings of the 16th ACM Conference on Information and Knowledge Management, 2007, pp. 857–860.
[4] F. Cai, M. de Rijke, et al., A survey of query auto completion in information retrieval, Foundations and Trends® in Information Retrieval 10 (2016) 273–363.
[5] C. Xiao, J. Qin, W. Wang, Y. Ishikawa, K. Tsuda, K. Sadakane, Efficient error-tolerant query autocompletion, Proceedings of the VLDB Endowment 6 (2013) 373–384.
[6] M. A. Hasan, N. Parikh, G. Singh, N. Sundaresan, Query suggestion for e-commerce sites, in: Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, 2011, pp. 765–774.
[7] S. K. Karmaker Santu, P. Sondhi, C. Zhai, On application of learning to rank for e-commerce search, in: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017, pp. 475–484.
[8] L. Wu, D. Hu, L. Hong, H. Liu, Turning clicks into purchases: Revenue optimization for product search in e-commerce, in: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, 2018, pp. 365–374.
[9] A. Block, R. Kidambi, D. N. Hill, T. Joachims, I. S. Dhillon, Counterfactual learning to rank for utility-maximizing query autocompletion, in: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2022, pp. 791–802.
[10] G. Di Santo, R. McCreadie, C. Macdonald, I. Ounis, Comparing approaches for query autocompletion, in: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015, pp. 775–778.
[11] M. Shokouhi, K. Radinsky, Time-sensitive query auto-completion, in: Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2012, pp. 601–610.
[12] A. Strizhevskaya, A. Baytin, I. Galinskaya, P. Serdyukov, Actualization of query suggestions using query logs, in: Proceedings of the 21st International Conference on World Wide Web, 2012, pp. 611–612.
[13] Z. Bar-Yossef, N. Kraus, Context-sensitive query auto-completion, in: Proceedings of the 20th International Conference on World Wide Web, 2011, pp. 107–116.
[14] F. Cai, S. Liang, M. de Rijke, Time-sensitive personalized query auto-completion, in: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, 2014, pp. 1599–1608.
[15] M. Shokouhi, Learning to personalize query auto-completion, in: Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2013, pp. 103–112.
[16] S. Whiting, J. M. Jose, Recent and robust query auto-completion, in: Proceedings of the 23rd International Conference on World Wide Web, 2014, pp. 971–982.
[17] N. Yadav, R. Sen, D. N. Hill, A. Mazumdar, I. S. Dhillon, Session-aware query auto-completion using extreme multi-label ranking, in: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021, pp. 3835–3844.
[18] J. Zhao, H. Chen, D. Yin, A dynamic product-aware learning model for e-commerce query intent understanding, in: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019, pp. 1843–1852.
[19] C. Luo, R. Goutam, H. Zhang, C. Zhang, Y. Song, B. Yin, Implicit query parsing at amazon product search, in: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023, pp. 3380–3384.
[20] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel, et al., Retrieval-augmented generation for knowledge-intensive nlp tasks, Advances in Neural Information Processing Systems 33 (2020) 9459–9474.
[21] B. Mitra, Exploring session context using distributed representations of queries and reformulations, in: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015, pp. 3–12.
[22] S. Wang, W. Guo, H. Gao, B. Long, Efficient neural query auto completion, in: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020, pp. 2797–2804.
[23] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, Advances in Neural Information Processing Systems 30 (2017).
[24] B. Mitra, N. Craswell, Query auto-completion for rare prefixes, in: Proceedings of the 24th ACM International Conference on Information and Knowledge Management, 2015, pp. 1755–1758.
[25] F. Cai, M. de Rijke, Learning from homologous queries and semantically related terms for query auto completion, Information Processing & Management 52 (2016) 628–643.
[26] T. Shao, H. Chen, W. Chen, Query auto-completion based on word2vec semantic similarity, in: Journal of Physics: Conference Series, volume 1004, IOP Publishing, 2018, p. 012018.
[27] K. Arkoudas, M. Yahya, Semantically driven auto-completion, in: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019, pp. 2693–2701.
[28] S. Wang, W. Guo, H. Gao, B. Long, Efficient neural query auto completion, in: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020, pp. 2797–2804.
[29] Y. M. Kang, W. Liu, Y. Zhou, Queryblazer: efficient query autocompletion framework, in: Proceedings of the 14th ACM International Conference on Web Search and Data Mining, 2021, pp. 1020–1028.
[30] D. Maxwell, P. Bailey, D. Hawking, Large-scale generative query autocompletion, in: Proceedings of the 22nd Australasian Document Computing Symposium, 2017, pp. 1–8.
[31] S. Whiting, J. M. Jose, Recent and robust query auto-completion, in: Proceedings of the 23rd International Conference on World Wide Web, 2014, pp. 971–982.
[32] F. Cai, S. Liang, M. de Rijke, Time-sensitive personalized query auto-completion, in: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, 2014, pp. 1599–1608.
[33] M. Shokouhi, K. Radinsky, Time-sensitive query auto-completion, in: Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2012, pp. 601–610.
[34] O. Khattab, M. Zaharia, Colbert: Efficient and effective passage search via contextualized late interaction over bert, in: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020, pp. 39–48.
[35] G. Aslanyan, A. Mandal, P. Senthil Kumar, A. Jaiswal, M. Rangasamy Kannadasan, Personalized ranking in ecommerce search, in: Companion Proceedings of the Web Conference 2020, 2020, pp. 96–97.
[36] A. Jaech, M. Ostendorf, Personalized language model for query auto-completion, in: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2018, pp. 700–705.
[37] D. Yin, J. Tan, Z. Zhang, H. Deng, S. Huang, J. Chen, Learning to generate personalized query auto-completions via a multi-view multi-task attentive approach, in: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020, pp. 2998–3007.
[38] D. Maxwell, P. Bailey, D. Hawking, Large-scale generative query autocompletion, in: Proceedings of the 22nd Australasian Document Computing Symposium, 2017, pp. 1–8.
[39] A. Sordoni, Y. Bengio, H. Vahabi, C. Lioma, J. Grue Simonsen, J.-Y. Nie, A hierarchical recurrent encoder-decoder for generative context-aware query suggestion, in: Proceedings of the 24th ACM International Conference on Information and Knowledge Management, 2015, pp. 553–562.
[40] P. Xu, W. Ping, X. Wu, L. McAfee, C. Zhu, Z. Liu, S. Subramanian, E. Bakhturina, M. Shoeybi, B. Catanzaro, Retrieval meets long context large language models, arXiv preprint arXiv:2310.03025 (2023).
[41] N. Kandpal, H. Deng, A. Roberts, E. Wallace, C. Raffel, Large language models struggle to learn long-tail knowledge, in: Proceedings of the 40th International Conference on Machine Learning, 2023.
[42] Y. Mao, P. He, X. Liu, Y. Shen, J. Gao, J. Han, W. Chen, Generation-augmented retrieval for open-domain question answering, in: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021, pp. 4089–4100.
[43] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[44] K. Santhanam, O. Khattab, J. Saad-Falcon, C. Potts, M. Zaharia, Colbertv2: Effective and efficient retrieval via lightweight late interaction, arXiv preprint arXiv:2112.01488 (2021).
[45] A. Q. Jiang, A. Sablayrolles, A. Mensch, C. Bamford, D. S. Chaplot, D. d. l. Casas, F. Bressand, G. Lengyel, G. Lample, L. Saulnier, et al., Mistral 7b, arXiv preprint arXiv:2310.06825 (2023).
[46] A. Chowdhery, S. Narang, J. Devlin, M. Bosma, G. Mishra, A. Roberts, P. Barham, H. W. Chung, C. Sutton, S. Gehrmann, et al., Palm: Scaling language modeling with pathways, Journal of Machine Learning Research 24 (2023) 1–113.
[47] J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat, et al., Gpt-4 technical report, arXiv preprint arXiv:2303.08774 (2023).
[48] S. Robertson, H. Zaragoza, et al., The probabilistic relevance framework: Bm25 and beyond, Foundations and Trends® in Information Retrieval 3 (2009) 333–389.