=Paper=
{{Paper
|id=Vol-3784/short5
|storemode=property
|title=A Product-Aware Query Auto-Completion Framework for E-Commerce Search via Retrieval-Augmented Generation Method
|pdfUrl=https://ceur-ws.org/Vol-3784/short5.pdf
|volume=Vol-3784
|authors=Fangzheng Sun,Tianqi Zheng,Aakash Kolekar,Rohit Patki,Hossein Khazaei,Xuan Guo,Ziheng Cai,David Liu,Ruirui Li,Yupin Huang,Dante Everaert,Hanqing Lu,Garima Patel,Monica Cheng
|dblpUrl=https://dblp.org/rec/conf/ir-rag/SunZKPKGCL0HELC24
}}
==A Product-Aware Query Auto-Completion Framework for E-Commerce Search via Retrieval-Augmented Generation Method==
Fangzheng Sun*,†, Tianqi Zheng†, Aakash Kolekar, Rohit Patki, Hossein Khazaei, Xuan Guo, Ziheng Cai, David Liu, Ruirui Li, Yupin Huang, Dante Everaert, Hanqing Lu, Garima Patel and Monica Cheng
Amazon Search, Palo Alto, CA, USA
Abstract
Query Auto-Completion (QAC) is a fundamental component of the user search experience on e-commerce websites. It assists in finding
user-intended products by automatically presenting search queries as users type in the search bar. Traditional QAC systems build
upon query popularity to suggest a list of potential completions, but they fall short for unforeseen search prefixes. A generative Large
Language Model (LLM) can complete even unforeseen prefixes, but the relevance of the generated suggestions to the product catalog is
not guaranteed. To the best of our knowledge, there is no existing study using LLMs to generate product-aware search query completion
suggestions.
This paper proposes a generative approach named "Product-RAG" that incorporates product metadata and adapts Retrieval-Augmented
Generation (RAG) to the development of QAC systems. Product-RAG contains two components: (1) a retrieval model that identifies
the top-K most relevant products from the product catalog given a user-input prefix, and (2) a generative model that offers suggestions
based on both the given prefix and the retrieved product metadata. We evaluate this approach for its ability to match user-input prefixes
to user-intended products, using the metrics of ROUGE scores, Mean Reciprocal Rank (MRR) and Hit Ratio (HR) in downstream product
search. We observe that the proposed Product-RAG approach outperforms state-of-the-art generative models in auto-completing
e-commerce search queries.
Keywords
Query Auto-Complete, Retrieval-Augmented Generation, E-Commerce, Product-aware
IR-RAG @ SIGIR24 workshop: The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, July 18, 2024, Washington D.C., USA
* Corresponding author.
† These authors contributed equally.

1. Introduction

Query auto-completion (QAC) [1, 2, 3, 4, 5] refers to an information retrieval system for search engines, for which, given partial context typed by the user (i.e., prefix), it offers one or multiple query suggestions to the user. In modern e-commerce, where user experience is pivotal, QAC stands as an important feature shaping the way consumers interact with search engines and plays a crucial role in smoothing all the downstream shopping experiences [6, 7, 8, 9]. By leveraging personalized signals, product-related knowledge, and advanced recommendation algorithms, QAC not only accelerates the search experience but also ensures that users receive tailored suggestions based on their unique preferences.

One major challenge of QAC tasks in e-commerce is to understand user shopping intent from an incomplete search query and provide relevant auto-complete suggestions. A typical production QAC works as follows: given a prefix entered by a user, the QAC system obtains a collection of queries satisfying the prefix from the query log, and adopts a selection process, often based on forecasted popularity, to select candidate queries to send to the query ranker [10, 11, 12]. This framework lacks an understanding of user shopping intent. To give more importance to users' intents and provide more personalized and relevant QAC suggestions, a number of works explore context-aware and personalized QAC systems [13, 14, 15, 16, 17].

Another challenge for QAC in e-commerce is to attain product awareness by recognizing and predicting users' intent related to specific products, brands, or categories, and providing auto-complete suggestions that align with the user's potential shopping targets. When the query log falls short for unseen or rarely seen prefixes, product knowledge is particularly helpful to predict users' shopping intent and generate corresponding suggestions (e.g., in the case of Figure 1). Nevertheless, in spite of efforts attempting to understand users' shopping intent with product catalogs or product attributes [18, 19], we could not find any work bridging the gap between partially complete search queries and product knowledge for e-commerce QAC systems. Herein we propose a generative approach, namely Product-RAG, that leverages product knowledge in e-commerce QAC systems based on the Retrieval-Augmented Generation (RAG) framework [20] and is capable of improving QAC systems by providing accurate auto-completion suggestions based on product knowledge relevant to search prefixes.

Figure 1: A search of "guidebook for edible mus" returns an empty QAC suggestion list while multiple related products are available.

The RAG framework exhibits outstanding efficacy in domain-specific sequence generation tasks through exploiting domain knowledge as additional context when generating the target sequences. Under the RAG scheme, generative large language models (LLMs) are supervised by retrieved relevant information from heterogeneous and pre-determined knowledge sources and, consequently, present accurate and controlled text output. This is a compelling benefit for e-commerce QAC systems, where the relevancy of QAC suggestions is essential for user trust. Additionally, the RAG framework brings QAC systems great versatility against frequently updated information sources. This attribute allows the utilization of the vast universal product knowledge base in e-commerce QAC systems without undergoing the high costs of re-training the generative model on the giant product catalog. The superiority of the proposed Product-RAG model is empirically demonstrated on human-annotated e-commerce QAC tasks. We illustrate that the proposed method outperforms state-of-the-art generative LLMs by generating QAC suggestions that 1) are more similar to the ground truths in terms of ROUGE scores, and 2) lead to more relevant product search results in terms of Mean Reciprocal Rank (MRR) and Hit Ratio (HR).

2. Related Work

Most QAC systems work on a sourcing-and-ranking basis: they first source candidates from a large pool to limit the scope, and then rank the sourced candidates. This paper focuses on the sourcing part of the QAC system, ensuring that the sourced candidates are product-aware before sending them to the query ranker.

Established approaches for sourcing candidate query completions often rely on the most popular candidate (MPC) approach, which cannot incorporate important semantic information of the prefix and session context. Newer approaches incorporate semantic information using neural network representations learned from the query prefix, candidate query completions, and session context [21, 22]. Context-aware QAC considers users' past queries instead of solely relying on popularity [13, 14, 15, 16, 17]. Meanwhile, recent advances in natural language processing [23] have inspired enormous work on semantically understanding users' search context and intent. These methods employ a high-dimensional space to measure the semantic similarities between search queries in contextual representations, and either offer users more contextually relevant suggestions from the query log [24, 25, 26, 27] or generate new keywords [28, 29, 30].

Candidate sourcing approaches usually fall into one of three categories: retriever-only approaches, pure generative approaches, and retrieval-augmented generation.

Retriever-only approaches. This class of approaches is based on retrieving the candidates from a pool that is usually built by scraping historical logs of the QAC system. The candidates are often selected based on popularity-based approaches such as MPC and neural matching [31, 32, 33]. In MPC-based approaches, candidates with higher forecasted popularity are selected; in lexical matching approaches, candidates with higher similarities are selected; neural matching approaches for candidate retrieval allow selecting queries by their semantic representations [34]. Session context [21, 17] or personal profile signals [35, 36] can be considered as auxiliary inputs. However, since these approaches are limited to the existing candidates, offering suggestions is not guaranteed. In addition, the achieved product-awareness is biased toward historically popular candidates.

Generative approaches. Generative approaches use language models to generate the candidates based on inputs such as the prefix, session context, and personalization signals [37, 38, 39]. Generative approaches can provide candidates for prefixes that appear for the first time. Two major challenges facing pure generative approaches in e-commerce QAC system development are: (1) they may suffer from hallucinations, generating plausible queries without a reference to product information; (2) in the dynamic environment of e-commerce, where products change continuously, they lack a mechanism to automatically incorporate new information post-training. This necessitates periodic fine-tuning to maintain the model's relevance and accuracy, which can be prohibitively costly and impractical.

Retrieval Augmented Generation (RAG). RAG approaches [20, 40, 41] extend the capabilities of language models by integrating external knowledge sources as auxiliary inputs to enhance the performance of the overall system. To the best of our knowledge, our work is the first study that adapts the RAG framework for QAC systems. Prior to this, RAG has been applied to various tasks and application domains such as question answering [42] and text summarization [20]. Since RAG uses product information as auxiliary inputs for the language model during generation, and the product retrieval updates in response to underlying database updates, the language model does not need to be fine-tuned to capture new products.

3. Product-RAG

Existing studies show that the RAG framework is effective and efficient in extending the already powerful capabilities of LLMs to specific domains without the need to retrain the model on a heterogeneous database. The proposed Product-RAG model is based upon the architecture of the RAG-Sequence Model [20] and generates one suggestion for each relevant product. The schematic architecture of the Product-RAG model for generating product-aware query auto-completion suggestions from relevant products is depicted in Figure 2. Our framework leverages two components:

1. A retrieval model η that retrieves the best-matching product titles or catalog entries from the product pool.
2. A generative LLM θ that outputs auto-complete suggestions for a given prefix and the products retrieved by the retriever.

Figure 2: Schematic architecture of the Product-RAG model for retrieving the top-K products relevant to a prefix and generating K product-aware query auto-completion suggestions from them.

Task formulation. We denote a search prefix as x and the target QAC suggestions as y. The retriever p_η(Z|x) consumes a product knowledge base P and returns the top-K relevant products Z = {z_1, z_2, ..., z_K}, z_i ∈ P, given the prefix x. For each z ∈ Z, the generative LLM p_θ(y|x, z) generates a QAC suggestion for x with context from the retrieved product, rendering the top-K suggestions Y = {y_1, y_2, ..., y_K}. Product-RAG can be parameterized as

p_{Product-RAG}(Y|x) ≐ Σ_{z_i ∈ top-K(p(·|x))} p_η(z_i|x) p_θ(y_i|x, z_i)    (1)

Multi-vector retrieval model. State-of-the-art methods typically fine-tune deep pre-trained language models, such as BERT [43], to generate dense vector representations for both input queries and documents. The top-K documents with the highest similarity scores are then retrieved. Inspired by recent advances in multi-vector representations [34, 44], we adopt a retrieval model in the proposed approach, as depicted in Figure 2, where we fine-tune the prefix and product encoders with e-commerce data. Precisely, given the representation of a prefix x and a product z, the relevance score of z to x, denoted as S_{x,z}, is defined as the sum of the maximum cosine similarities between each vector E_{x_i} in the prefix embedding and the vectors in a product bag E_T:

S_{x,z} := Σ_{i=1}^{x} max_{j∈T} E_{x_i} · E_{T_j},  z ∈ T    (2)

Offline product knowledge indexing. We pre-compute all product embeddings and index these vector representations offline to support the efficient lookup of relevant products. The index includes 1) centroids representing the centers partitioning product embeddings into bags, 2) residuals storing the embedding of a product relative to its nearest centroid, and 3) an index inversion representing an inverted map from a centroid to products to support fast nearest-neighbor search. They are encoded offline and loaded into the memory of the QAC service. Given a prefix, the prefix encoder vectorizes it and the retrieval model looks for the top-K most relevant products by operating MaxSim between the prefix embedding and the product index already loaded in memory.

Offline indexing of pre-computed product embeddings also makes it convenient to refresh the product knowledge pool frequently at low cost, and requires no re-training of models to adapt to newly added products.

QAC Suggestion Generation. For the generative component of Product-RAG, we use a generative LLM where the input is a prompt containing both the prefix x and the top-K retrieved products z ∈ Z, and the outputs are K product-aware QAC suggestions Y for the prefix. As we train the retrieval model and the generative model separately, we can use any state-of-the-art generative LLM, such as Mistral-7B [45], PaLM [46], or GPT-4 [47], as long as it can perform text summarization and QAC or equivalent tasks. In our proposed Product-RAG we empirically choose Mistral-7B based on offline evaluations of the performance and latency of different generative LLMs on QAC tasks. Moreover, we are able to fine-tune Mistral-7B with e-commerce search and product data.

4. Experiments

We now evaluate Product-RAG on e-commerce QAC tasks, testing its ability to generate QAC suggestions for a given prefix in the e-commerce domain. We define a baseline LLM by fine-tuning the Mistral-7B model on the e-commerce QAC database without the help of the RAG framework. In the Product-RAG framework, we employ the multi-vector retrieval model (denoted as MultiVec), fine-tuned as outlined in the previous section, as the primary retrieval component. To demonstrate the effectiveness of our proposed retrieval method, we establish a baseline retrieval model, BM25 [48], within the RAG framework for comparison. The generative component for both the Product-RAG-MultiVec and Product-RAG-BM25 frameworks is a fine-tuned Mistral-7B.
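The retrieve-then-prompt flow described above can be sketched in a few lines. This is a minimal, illustrative NumPy sketch, not the paper's production system: it scores product bags with the MaxSim rule of Eq. (2) by brute force (whereas the paper uses a centroid-based inverted index), and the encoder outputs, product identifiers, and prompt template are hypothetical stand-ins.

```python
import numpy as np

def maxsim_score(prefix_vecs: np.ndarray, product_vecs: np.ndarray) -> float:
    """Sum, over prefix token vectors, of the maximum cosine similarity
    against any token vector in the product bag (Eq. (2)-style MaxSim)."""
    # Normalize rows so that dot products become cosine similarities.
    p = prefix_vecs / np.linalg.norm(prefix_vecs, axis=1, keepdims=True)
    t = product_vecs / np.linalg.norm(product_vecs, axis=1, keepdims=True)
    sim = p @ t.T                        # (prefix tokens) x (product tokens)
    return float(sim.max(axis=1).sum())  # max over product tokens, sum over prefix tokens

def retrieve_top_k(prefix_vecs: np.ndarray, product_index: dict, k: int = 3) -> list:
    """Score every product bag and keep the k best. Brute-force scan for
    illustration; the paper instead uses centroids, residuals, and an
    inverted centroid-to-product map for fast nearest-neighbor search."""
    scored = [(maxsim_score(prefix_vecs, vecs), pid)
              for pid, vecs in product_index.items()]
    scored.sort(reverse=True)
    return [pid for _, pid in scored[:k]]

def build_prompt(prefix: str, product_titles: list) -> str:
    """Assemble an illustrative prompt pairing the prefix with the
    retrieved product titles, one completion expected per product."""
    lines = [f"{i + 1}. {title}" for i, title in enumerate(product_titles)]
    return (f"Complete the shopping query '{prefix}' into one search query "
            "per product below:\n" + "\n".join(lines))
```

The prompt returned by `build_prompt` would then be sent to the generative LLM (a fine-tuned Mistral-7B in the paper) to obtain the K product-aware suggestions.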
Table 1
Evaluation scores of generated QAC suggestions. Each model generates 3 suggestions and we obtain the maximum evaluation
scores out of these suggestions as the evaluation score of the data point.
Model ROUGE-1 ROUGE-2 ROUGE-L MRR@10 HR@10
Mistral-7B 77.2 66.7 76.6 0.65 0.76
Product-RAG-BM25 75.3 64.9 74.6 0.62 0.74
Product-RAG-MultiVec 82.2 74.1 81.5 0.75 0.87
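The product-level scoring behind Table 1 (taking the maximum metric value over the 3 generated suggestions per data point) can be sketched as follows. This is a hedged illustration: `search_fn` stands in for the product search page, which is not part of the published code, and the helper names are hypothetical.

```python
def mrr_at_10(ranked_ids: list, target_id: str) -> float:
    """Reciprocal rank of the target product within the top 10 results, else 0."""
    top = list(ranked_ids)[:10]
    return 1.0 / (top.index(target_id) + 1) if target_id in top else 0.0

def hr_at_10(ranked_ids: list, target_id: str) -> float:
    """1.0 if the target product appears in the top 10 results, else 0.0."""
    return 1.0 if target_id in list(ranked_ids)[:10] else 0.0

def best_over_suggestions(suggestions: list, target_id: str, search_fn, metric) -> float:
    """Run each generated suggestion through the (stand-in) product search,
    score the result list with the metric, and keep the maximum, mirroring
    the max-over-3 pooling used for each data point."""
    return max(metric(search_fn(s), target_id) for s in suggestions)
```

The ROUGE-1/2/L F1 scores at the query level are pooled the same way, taking the maximum over the 3 suggestions against the annotated ground-truth query.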
Experimental dataset. We perform an experiment on 1,500 search queries corresponding to book products with the help of human expert annotation: given a search prefix, a human expert manually annotates an auto-completion keyword as the ground-truth QAC suggestion and, through the product search page, finds an available book product, which we use as the ground-truth target product. Thus, each evaluation data point is composed of a triple of prefix, ground-truth QAC suggestion, and ground-truth target product. For each data point, we use the proposed models to generate the top-3 suggestions based on the prefix. For the Product-RAG models, we employ 7 million book products as product knowledge.

Evaluation metrics. We evaluate the generated suggestions at both the query and product levels. We compute

1. the similarity between generated suggestions and annotated ground-truth QAC suggestions in terms of ROUGE-1/2/L F1 scores. We use the maximum score out of the 3 suggestions for each data point.
2. for each target product, the Mean Reciprocal Rank in the top 10 results (MRR@10) and the Hit Ratio (target product is included) in the top 10 results (HR@10) on the product search page triggered by the generated suggestions. We likewise use the maximum MRR@10 and HR@10 out of the 3 generated suggestions.

Evaluation results. We report the ROUGE-1, ROUGE-2, and ROUGE-L F1 scores, MRR@10, and HR@10 for the 3 experimented models in Table 1. We observe that the proposed Product-RAG-MultiVec outperforms both the baseline generative Mistral-7B and the counterpart based on the BM25 retrieval model in terms of all ROUGE scores and the 2 product-level metrics. These findings demonstrate the superiority of the Product-RAG-MultiVec model in generating high-quality QAC suggestions for e-commerce systems.

Discussions. In the experiment above, we notice a negative impact on these metrics from the BM25 retriever compared with the baseline Mistral-7B. Inspecting the top-3 retrieved products from BM25 and those from the MultiVec model, we notice a non-trivial gap between the performance of the two retrievers: in our experiment, the MultiVec model is able to successfully retrieve the target product or its equivalents in 74.7% of cases (e.g., Harry Potter and the Chamber of Secrets: Gryffindor Edition Red or its equivalent Harry Potter and the Chamber of Secrets), whereas the BM25 retriever only succeeds in 37.1% of cases. The accuracy of the two retrievers explains the performance gap between the two Product-RAG models. This leads to the conclusion that when the retriever provides highly relevant products, the proposed Product-RAG framework is capable of improving upon state-of-the-art generative approaches in e-commerce QAC tasks. We believe that improving the precision of the retriever model is one future direction for refining the proposed Product-RAG framework.

5. Conclusions

In this work, we introduce Product-RAG, a RAG framework retrieving relevant products for a search prefix and informing product-aware suggestions. This framework generates suggestions close to user search intention, and it highlights product relevance at an early stage of the shopping journey, before downstream product searches. Through empirical experiments on auto-completing search queries, we compare the proposed framework with a baseline LLM and we test various retrieval models. In particular, we find that Product-RAG-MultiVec remarkably outperforms its counterparts in terms of query similarity and product relevance. This work sheds light on bridging the semantic gap between partial search queries and product knowledge in the scenario of e-commerce QAC systems.

References

[1] M. Jakobsson, Autocompletion in full text transaction entry: a method for humanized input, ACM SIGCHI Bulletin 17 (1986) 327–332.
[2] H. Bast, I. Weber, Type less, find more: fast autocompletion search with a succinct index, in: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, 2006, pp. 364–371.
[3] H. Bast, D. Majumdar, I. Weber, Efficient interactive query expansion with complete search, in: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, 2007, pp. 857–860.
[4] F. Cai, M. De Rijke, et al., A survey of query auto completion in information retrieval, Foundations and Trends® in Information Retrieval 10 (2016) 273–363.
[5] C. Xiao, J. Qin, W. Wang, Y. Ishikawa, K. Tsuda, K. Sadakane, Efficient error-tolerant query autocompletion, Proceedings of the VLDB Endowment 6 (2013) 373–384.
[6] M. A. Hasan, N. Parikh, G. Singh, N. Sundaresan, Query suggestion for e-commerce sites, in: Proceedings of the fourth ACM international conference on Web Search and Data Mining, 2011, pp. 765–774.
[7] S. K. Karmaker Santu, P. Sondhi, C. Zhai, On application of learning to rank for e-commerce search, in: Proceedings of the 40th international ACM SIGIR conference on research and development in information retrieval, 2017, pp. 475–484.
[8] L. Wu, D. Hu, L. Hong, H. Liu, Turning clicks into purchases: Revenue optimization for product search in e-commerce, in: The 41st International ACM SIGIR
Conference on Research & Development in Information Retrieval, 2018, pp. 365–374.
[9] A. Block, R. Kidambi, D. N. Hill, T. Joachims, I. S. Dhillon, Counterfactual learning to rank for utility-maximizing query autocompletion, in: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2022, pp. 791–802.
[10] G. Di Santo, R. McCreadie, C. Macdonald, I. Ounis, Comparing approaches for query autocompletion, in: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015, pp. 775–778.
[11] M. Shokouhi, K. Radinsky, Time-sensitive query auto-completion, in: Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, 2012, pp. 601–610.
[12] A. Strizhevskaya, A. Baytin, I. Galinskaya, P. Serdyukov, Actualization of query suggestions using query logs, in: Proceedings of the 21st International Conference on World Wide Web, 2012, pp. 611–612.
[13] Z. Bar-Yossef, N. Kraus, Context-sensitive query auto-completion, in: Proceedings of the 20th international conference on World wide web, 2011, pp. 107–116.
[14] F. Cai, S. Liang, M. De Rijke, Time-sensitive personalized query auto-completion, in: Proceedings of the 23rd ACM international conference on conference on information and knowledge management, 2014, pp. 1599–1608.
[15] M. Shokouhi, Learning to personalize query auto-completion, in: Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval, 2013, pp. 103–112.
[16] S. Whiting, J. M. Jose, Recent and robust query auto-completion, in: Proceedings of the 23rd international conference on World wide web, 2014, pp. 971–982.
[17] N. Yadav, R. Sen, D. N. Hill, A. Mazumdar, I. S. Dhillon, Session-aware query auto-completion using extreme multi-label ranking, in: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021, pp. 3835–3844.
[18] J. Zhao, H. Chen, D. Yin, A dynamic product-aware learning model for e-commerce query intent understanding, in: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019, pp. 1843–1852.
[19] C. Luo, R. Goutam, H. Zhang, C. Zhang, Y. Song, B. Yin, Implicit query parsing at amazon product search, in: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023, pp. 3380–3384.
[20] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel, et al., Retrieval-augmented generation for knowledge-intensive nlp tasks, Advances in Neural Information Processing Systems 33 (2020) 9459–9474.
[21] B. Mitra, Exploring session context using distributed representations of queries and reformulations, in: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '15, Association for Computing Machinery, New York, NY, USA, 2015, p. 3–12. URL: https://doi.org/10.1145/2766462.2767702. doi:10.1145/2766462.2767702.
[22] S. Wang, W. Guo, H. Gao, B. Long, Efficient neural query auto completion, in: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, CIKM '20, Association for Computing Machinery, New York, NY, USA, 2020, p. 2797–2804. URL: https://doi.org/10.1145/3340531.3412701. doi:10.1145/3340531.3412701.
[23] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, Advances in neural information processing systems 30 (2017).
[24] B. Mitra, N. Craswell, Query auto-completion for rare prefixes, in: Proceedings of the 24th ACM international on conference on information and knowledge management, 2015, pp. 1755–1758.
[25] F. Cai, M. de Rijke, Learning from homologous queries and semantically related terms for query auto completion, Information Processing & Management 52 (2016) 628–643.
[26] T. Shao, H. Chen, W. Chen, Query auto-completion based on word2vec semantic similarity, in: Journal of Physics: Conference Series, volume 1004, IOP Publishing, 2018, p. 012018.
[27] K. Arkoudas, M. Yahya, Semantically driven auto-completion, in: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019, pp. 2693–2701.
[28] S. Wang, W. Guo, H. Gao, B. Long, Efficient neural query auto completion, in: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020, pp. 2797–2804.
[29] Y. M. Kang, W. Liu, Y. Zhou, Queryblazer: efficient query autocompletion framework, in: Proceedings of the 14th ACM International Conference on Web Search and Data Mining, 2021, pp. 1020–1028.
[30] D. Maxwell, P. Bailey, D. Hawking, Large-scale generative query autocompletion, in: Proceedings of the 22nd Australasian Document Computing Symposium, 2017, pp. 1–8.
[31] S. Whiting, J. M. Jose, Recent and robust query auto-completion, in: Proceedings of the 23rd International Conference on World Wide Web, WWW '14, Association for Computing Machinery, New York, NY, USA, 2014, p. 971–982. URL: https://doi.org/10.1145/2566486.2568009. doi:10.1145/2566486.2568009.
[32] F. Cai, S. Liang, M. de Rijke, Time-sensitive personalized query auto-completion, in: Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, CIKM '14, Association for Computing Machinery, New York, NY, USA, 2014, p. 1599–1608. URL: https://doi.org/10.1145/2661829.2661921. doi:10.1145/2661829.2661921.
[33] M. Shokouhi, K. Radinsky, Time-sensitive query auto-completion, in: Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '12, Association for Computing Machinery, New York, NY, USA, 2012, p. 601–610. URL: https://doi.org/10.1145/2348283.2348364. doi:10.1145/2348283.2348364.
[34] O. Khattab, M. Zaharia, Colbert: Efficient and effective passage search via contextualized late interaction over bert, in: Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, 2020, pp. 39–48.
[35] G. Aslanyan, A. Mandal, P. Senthil Kumar, A. Jaiswal, M. Rangasamy Kannadasan, Personalized ranking in ecommerce search, in: Companion Proceedings of the Web Conference 2020, WWW '20, Association for Computing Machinery, New York, NY, USA, 2020, p. 96–97. URL: https://doi.org/10.1145/3366424.3382715. doi:10.1145/3366424.3382715.
[36] A. Jaech, M. Ostendorf, Personalized language model for query auto-completion, in: I. Gurevych, Y. Miyao (Eds.), Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Association for Computational Linguistics, Melbourne, Australia, 2018, pp. 700–705. URL: https://aclanthology.org/P18-2111. doi:10.18653/v1/P18-2111.
[37] D. Yin, J. Tan, Z. Zhang, H. Deng, S. Huang, J. Chen, Learning to generate personalized query auto-completions via a multi-view multi-task attentive approach, in: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD '20, Association for Computing Machinery, New York, NY, USA, 2020, p. 2998–3007. URL: https://doi.org/10.1145/3394486.3403350. doi:10.1145/3394486.3403350.
[38] D. Maxwell, P. Bailey, D. Hawking, Large-scale generative query autocompletion, in: Proceedings of the 22nd Australasian Document Computing Symposium, ADCS '17, Association for Computing Machinery, New York, NY, USA, 2017. URL: https://doi.org/10.1145/3166072.3166083. doi:10.1145/3166072.3166083.
[39] A. Sordoni, Y. Bengio, H. Vahabi, C. Lioma, J. Grue Simonsen, J.-Y. Nie, A hierarchical recurrent encoder-decoder for generative context-aware query suggestion, in: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, CIKM '15, Association for Computing Machinery, New York, NY, USA, 2015, p. 553–562. URL: https://doi.org/10.1145/2806416.2806493. doi:10.1145/2806416.2806493.
[40] P. Xu, W. Ping, X. Wu, L. McAfee, C. Zhu, Z. Liu, S. Subramanian, E. Bakhturina, M. Shoeybi, B. Catanzaro, Retrieval meets long context large language models, arXiv preprint arXiv:2310.03025 (2023).
[41] N. Kandpal, H. Deng, A. Roberts, E. Wallace, C. Raffel, Large language models struggle to learn long-tail knowledge, in: Proceedings of the 40th International Conference on Machine Learning, ICML'23, JMLR.org, 2023.
[42] Y. Mao, P. He, X. Liu, Y. Shen, J. Gao, J. Han, W. Chen, Generation-augmented retrieval for open-domain question answering, in: C. Zong, F. Xia, W. Li, R. Navigli (Eds.), Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Association for Computational Linguistics, Online, 2021, pp. 4089–4100. URL: https://aclanthology.org/2021.acl-long.316. doi:10.18653/v1/2021.acl-long.316.
[43] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[44] K. Santhanam, O. Khattab, J. Saad-Falcon, C. Potts, M. Zaharia, Colbertv2: Effective and efficient retrieval via lightweight late interaction, arXiv preprint arXiv:2112.01488 (2021).
[45] A. Q. Jiang, A. Sablayrolles, A. Mensch, C. Bamford, D. S. Chaplot, D. d. l. Casas, F. Bressand, G. Lengyel, G. Lample, L. Saulnier, et al., Mistral 7b, arXiv preprint arXiv:2310.06825 (2023).
[46] A. Chowdhery, S. Narang, J. Devlin, M. Bosma, G. Mishra, A. Roberts, P. Barham, H. W. Chung, C. Sutton, S. Gehrmann, et al., Palm: Scaling language modeling with pathways, Journal of Machine Learning Research 24 (2023) 1–113.
[47] J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat, et al., Gpt-4 technical report, arXiv preprint arXiv:2303.08774 (2023).
[48] S. Robertson, H. Zaragoza, et al., The probabilistic relevance framework: Bm25 and beyond, Foundations and Trends® in Information Retrieval 3 (2009) 333–389.